The R operator %in% is handy for working with vectors, but how do you use it the opposite way? You may want something like %notin% that excludes anything that is in a vector.
There is no built-in %notin% operator in R, but the same result is easy to get, as explained below.
For example, suppose you have two data frames and want to select the rows of the second data frame whose ids match the ids in the first one, or do the opposite.
```r
df1 <- data.frame(
  id = c("00129", "00121", "00124", "00127", "00130", "00128", "00111"),
  agent = c("David", "John", "Paul", "Kate", "Thomas", "Alma", "Grace")
)
#      id  agent
# 1 00129  David
# 2 00121   John
# 3 00124   Paul
# 4 00127   Kate
# 5 00130 Thomas
# 6 00128   Alma
# 7 00111  Grace

df2 <- data.frame(
  id = c("00123", "00124", "00124", "00125", "00126", "00127", "00128"),
  v1 = c(701.82, 698.73, NA, NA, 698.71, 698.59, 690.83),
  v2 = c(697.87, NA, 76.44, 95.53, 629.38, 486.48, 328.51),
  v3 = c(283.02, 783.89, NA, NA, 902.52, 990.53, 812.63),
  v4 = c(201.40, 215.42, 57.47, 301.33, NA, NA, NA)
)
#      id     v1     v2     v3     v4
# 1 00123 701.82 697.87 283.02 201.40
# 2 00124 698.73     NA 783.89 215.42
# 3 00124     NA  76.44     NA  57.47
# 4 00125     NA  95.53     NA 301.33
# 5 00126 698.71 629.38 902.52     NA
# 6 00127 698.59 486.48 990.53     NA
# 7 00128 690.83 328.51 812.63     NA
```
R %in% operator
You could use left_join from dplyr and then filter the result, but a join is not necessary when all you need is the vector of ids. With the %in% operator, this is easy. Here are other examples of filtering data frames in R.
```r
subset(df2, df2$id %in% df1$id)
#      id     v1     v2     v3     v4
# 2 00124 698.73     NA 783.89 215.42
# 3 00124     NA  76.44     NA  57.47
# 6 00127 698.59 486.48 990.53     NA
# 7 00128 690.83 328.51 812.63     NA
```
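Why does this work on every row at once? The base R documentation describes %in% as a thin wrapper around match(): an element counts as "in" the table when match() finds a position for it. The sketch below re-creates that on plain vectors; ids_wanted and ids are illustrative copies of df1$id and df2$id, not names from the original example. It also shows that %in% returns FALSE rather than NA for a missing value.

```r
# Illustrative copies of df1$id and df2$id as plain character vectors
ids_wanted <- c("00129", "00121", "00124", "00127", "00130", "00128", "00111")
ids <- c("00123", "00124", "00124", "00125", "00126", "00127", "00128")

# match() gives the position of each id in ids_wanted, or 0 when it is absent
positions <- match(ids, ids_wanted, nomatch = 0L)
positions
# [1] 0 3 3 0 0 4 6

# A non-zero position means "found", which is what %in% reports
found <- positions > 0L
identical(found, ids %in% ids_wanted)
# [1] TRUE

# %in% never returns NA; an NA id is FALSE here because ids_wanted has no NA
c(NA, "00124") %in% ids_wanted
# [1] FALSE  TRUE
```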
R %not in% operator, opposite to %in%
There is no built-in "not in" operator. Instead, negate the result of %in% with the NOT operator (!), like this:
```r
subset(df2, !(df2$id %in% df1$id))
#      id     v1     v2     v3     v4
# 1 00123 701.82 697.87 283.02 201.40
# 4 00125     NA  95.53     NA 301.33
# 5 00126 698.71 629.38 902.52     NA
```
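If you need the negation often, you can define the operator yourself. A minimal sketch, assuming you are happy to add a helper named %notin% (a hypothetical name, not part of base R): base R's Negate() wraps a function so that it returns the logical opposite. The vectors ids_wanted and ids are illustrative copies of df1$id and df2$id.

```r
# Hypothetical helper, not part of base R: the logical opposite of %in%.
# Equivalent to: `%notin%` <- function(x, table) !(x %in% table)
`%notin%` <- Negate(`%in%`)

ids_wanted <- c("00129", "00121", "00124", "00127", "00130", "00128", "00111")  # like df1$id
ids <- c("00123", "00124", "00124", "00125", "00126", "00127", "00128")         # like df2$id

ids %notin% ids_wanted
# [1]  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE
```

With df1 and df2 defined as above, subset(df2, id %notin% df1$id) returns the same three rows (ids 00123, 00125 and 00126) as the negated %in% call. The helper only exists in sessions where you define it, so scripts shared with others need to include the definition.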
To compare both conditions side by side, store them in separate columns:
```r
df2$match <- df2$id %in% df1$id
df2$no_match <- !(df2$id %in% df1$id)
df2
#      id     v1     v2     v3     v4 match no_match
# 1 00123 701.82 697.87 283.02 201.40 FALSE     TRUE
# 2 00124 698.73     NA 783.89 215.42  TRUE    FALSE
# 3 00124     NA  76.44     NA  57.47  TRUE    FALSE
# 4 00125     NA  95.53     NA 301.33 FALSE     TRUE
# 5 00126 698.71 629.38 902.52     NA FALSE     TRUE
# 6 00127 698.59 486.48 990.53     NA  TRUE    FALSE
# 7 00128 690.83 328.51 812.63     NA  TRUE    FALSE
```
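A logical vector like the no_match column can be used in other ways, too. The sketch below (again with illustrative copies of the id columns, named ids_wanted and ids) shows that the parentheses around %in% are optional, because %in% binds more tightly than !, and that which() and sum() turn the logical vector into row positions and a count. If you only need the distinct non-matching ids rather than whole rows, base R's setdiff() returns them directly.

```r
ids_wanted <- c("00129", "00121", "00124", "00127", "00130", "00128", "00111")  # like df1$id
ids <- c("00123", "00124", "00124", "00125", "00126", "00127", "00128")         # like df2$id

# %in% has higher precedence than !, so both forms give the same result
no_match <- !ids %in% ids_wanted
identical(no_match, !(ids %in% ids_wanted))
# [1] TRUE

which(no_match)  # positions (row numbers) of ids without a match
# [1] 1 4 5
sum(no_match)    # how many ids have no match
# [1] 3

# Distinct ids that appear in ids but not in ids_wanted
setdiff(ids, ids_wanted)
# [1] "00123" "00125" "00126"
```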
What’s next?
Check out my favorite RStudio tips and tricks, for example, how to comment out multiple lines of an R script at once.
If you are using the dplyr package in R, then take a look at these useful tips and tricks.