R “not in” operator, opposite of “in”

The %in% operator is handy for testing whether the elements of one vector appear in another, but how do you do the opposite? You might want something like %notin%, which would pick out everything that is not in a vector.

There is no built-in %notin% operator in R, but the same result is easy to get by negating %in%, as explained below.

For example, suppose you have two data frames and want to select the rows of the second one whose ids appear in the first one, or the rows whose ids do not.

df1 <- data.frame(
  id = as.character(c("00129", "00121", "00124", "00127", "00130", "00128", "00111")),
  agent = as.character(c("David", "John", "Paul", "Kate", "Thomas", "Alma", "Grace"))
)

#     id  agent
#1 00129  David
#2 00121   John
#3 00124   Paul
#4 00127   Kate
#5 00130 Thomas
#6 00128   Alma
#7 00111  Grace

df2 <- data.frame(
  id = as.character(c("00123", "00124", "00124", "00125", "00126", "00127", "00128")),
  v1 = as.numeric(c(701.82, 698.73, NA, NA, 698.71, 698.59, 690.83)),
  v2 = as.numeric(c(697.87, NA, 76.44, 95.53, 629.38, 486.48, 328.51)),
  v3 = as.numeric(c(283.02, 783.89, NA, NA, 902.52, 990.53, 812.63)),
  v4 = as.numeric(c(201.40, 215.42, 57.47, 301.33, NA, NA, NA))
)

#     id     v1     v2     v3     v4
#1 00123 701.82 697.87 283.02 201.40
#2 00124 698.73     NA 783.89 215.42
#3 00124     NA  76.44     NA  57.47
#4 00125     NA  95.53     NA 301.33
#5 00126 698.71 629.38 902.52     NA
#6 00127 698.59 486.48 990.53     NA
#7 00128 690.83 328.51 812.63     NA

R %in% operator

You could do a left_join from dplyr and then filter, but that is unnecessary when all you need is a vector of ids. With the %in% operator, a single subset() call is enough.

subset(df2, df2$id %in% df1$id)

#     id     v1     v2     v3     v4
#2 00124 698.73     NA 783.89 215.42
#3 00124     NA  76.44     NA  57.47
#6 00127 698.59 486.48 990.53     NA
#7 00128 690.83 328.51 812.63     NA
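To see why this works, it helps to look at what %in% returns on its own: a logical vector with one element per value on the left-hand side. Here is a minimal sketch with plain vectors (the names ids, known and is_known are made up for illustration), using bracket indexing as an alternative to subset():

```r
# Hypothetical vectors standing in for df2$id and df1$id
ids   <- c("00123", "00124", "00124", "00125")
known <- c("00129", "00124", "00127")

# %in% returns one TRUE/FALSE per element of the left-hand side
is_known <- ids %in% known
is_known
# [1] FALSE  TRUE  TRUE FALSE

# The logical vector can be used directly for indexing
ids[is_known]
# [1] "00124" "00124"
```

The same pattern applies to data frames: df2[df2$id %in% df1$id, ] should return the same rows as the subset() call above.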

R %notin% operator, opposite of %in%

There is no actual %notin% operator. Instead, negate the result of %in% with the NOT operator (!), like this.

subset(df2, !(df2$id %in% df1$id))

#     id     v1     v2     v3     v4
#1 00123 701.82 697.87 283.02 201.40
#4 00125     NA  95.53     NA 301.33
#5 00126 698.71 629.38 902.52     NA
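If you use this negation often, you can wrap it in your own infix operator. The sketch below assumes you are happy to define %notin% yourself; the name is our own choice, and the vectors ids and known are made up for illustration.

```r
# User-defined infix operator: TRUE where x is NOT found in table
`%notin%` <- function(x, table) !(x %in% table)

# Hypothetical vectors for illustration
ids   <- c("00123", "00124", "00125")
known <- c("00124", "00127")

ids %notin% known
# [1]  TRUE FALSE  TRUE

ids[ids %notin% known]
# [1] "00123" "00125"
```

With this definition in place, subset(df2, id %notin% df1$id) should select the same rows as the negated version above. Keep in mind that the operator exists only in scripts or sessions where you have defined it.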

To see how this works row by row, store both results in separate columns.

df2$match <- df2$id %in% df1$id
df2$no_match <- !(df2$id %in% df1$id)

df2

#     id     v1     v2     v3     v4 match no_match
#1 00123 701.82 697.87 283.02 201.40 FALSE     TRUE
#2 00124 698.73     NA 783.89 215.42  TRUE    FALSE
#3 00124     NA  76.44     NA  57.47  TRUE    FALSE
#4 00125     NA  95.53     NA 301.33 FALSE     TRUE
#5 00126 698.71 629.38 902.52     NA FALSE     TRUE
#6 00127 698.59 486.48 990.53     NA  TRUE    FALSE
#7 00128 690.83 328.51 812.63     NA  TRUE    FALSE
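One detail worth knowing when negating %in%: it never returns NA. A missing id on the left-hand side therefore counts as "not in", unless the lookup vector itself contains NA. Here is a small sketch with made-up vectors (ids and known are illustrative names):

```r
# Hypothetical ids, one of them missing
ids   <- c("00123", NA, "00127")
known <- c("00127", "00128")

# NA is simply "not found", so the result has no NA values
ids %in% known
# [1] FALSE FALSE  TRUE

# After negation, the missing id lands in the "no match" group
!(ids %in% known)
# [1]  TRUE  TRUE FALSE
```

If rows with a missing id should be excluded from both groups, add an explicit !is.na() check to the condition.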
