dplyr distinct with exceptions

How to use dplyr distinct with exceptions to select unique rows in R

Here are several examples of how to use dplyr distinct to select only unique rows in a data frame. The examples start with cases where you name the columns used to find distinct rows and end with dplyr distinct with exceptions. Instead of naming one or multiple columns, it is sometimes more practical to use all columns except one or several.

Here is my data frame.

df <- data.frame(
  manager = as.character(c("David", "David", "David", "Kate", "Kate", "Alma", "Alma")),
  month_name = as.character(c("January", "January", "February", "January", "January", "January", "February")),
  value = as.numeric(c(33, 33, 58, 10, 97, 88, 88))
)

df

#  manager month_name value
#1   David    January    33
#2   David    January    33
#3   David   February    58
#4    Kate    January    10
#5    Kate    January    97
#6    Alma    January    88
#7    Alma   February    88

 

dplyr distinct by using one or multiple columns

The distinct function from dplyr is easy to use, but be aware of the .keep_all argument. If you want to keep the columns that are not listed in distinct, set .keep_all = TRUE; for each combination of the listed columns, the first matching row is then kept in full. Otherwise the result contains only the columns you listed.

library(dplyr)

df %>% distinct(manager, .keep_all = TRUE)

#  manager month_name value
#1   David    January    33
#2    Kate    January    10
#3    Alma    January    88
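For comparison, here is a minimal sketch of the same call with and without .keep_all. The data frame is rebuilt so the snippet runs on its own; the names only_manager and first_rows are just illustrative.

```r
library(dplyr)

# Same data as above, rebuilt so this snippet is self-contained
df <- data.frame(
  manager = c("David", "David", "David", "Kate", "Kate", "Alma", "Alma"),
  month_name = c("January", "January", "February", "January", "January", "January", "February"),
  value = c(33, 33, 58, 10, 97, 88, 88)
)

# Default (.keep_all = FALSE): only the column named in distinct is returned
only_manager <- df %>% distinct(manager)
names(only_manager)

# .keep_all = TRUE: the first row found for each manager is kept with all columns
first_rows <- df %>% distinct(manager, .keep_all = TRUE)
names(first_rows)
```

If the other columns matter later in the pipeline, the second form is usually the one you want.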

Using distinct with multiple columns works the same way. List every column that should define uniqueness.

library(dplyr)

df %>% distinct(manager, month_name, .keep_all = TRUE)

#  manager month_name value
#1   David    January    33
#2   David   February    58
#3    Kate    January    10
#4    Alma    January    88
#5    Alma   February    88

 

dplyr distinct by using all columns

To apply distinct to the whole data frame, call it without any column arguments. Rows count as duplicates only when they match in every column.

library(dplyr)

df %>% distinct()

#  manager month_name value
#1   David    January    33
#2   David   February    58
#3    Kate    January    10
#4    Kate    January    97
#5    Alma    January    88
#6    Alma   February    88
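As a rough cross-check, the result can be compared with base R's unique. This is only a sketch under the assumption that the two differ in nothing but row names; via_dplyr and via_base are illustrative names.

```r
library(dplyr)

# Same data as above, rebuilt so this snippet is self-contained
df <- data.frame(
  manager = c("David", "David", "David", "Kate", "Kate", "Alma", "Alma"),
  month_name = c("January", "January", "February", "January", "January", "January", "February"),
  value = c(33, 33, 58, 10, 97, 88, 88)
)

via_dplyr <- df %>% distinct()

# Base R equivalent; unique() keeps the original row names, so reset them
via_base <- unique(df)
rownames(via_base) <- NULL

# Number of fully duplicated rows that were removed
removed <- nrow(df) - nrow(via_dplyr)
removed

# Compare the two results, ignoring attributes such as row names
all.equal(via_dplyr, via_base, check.attributes = FALSE)
```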

 

dplyr distinct with exceptions

Here is how to use dplyr distinct with exceptions to get unique rows in an R data frame. Instead of listing the columns that define uniqueness, it is sometimes more practical to use all columns except one or several. With the across function you can express that exception as a negative selection, such as across(-value). across is also useful elsewhere, for example to apply ifelse to a range of columns.

library(dplyr)

df %>% distinct(across(-value), .keep_all = TRUE)

#  manager month_name value
#1   David    January    33
#2   David   February    58
#3    Kate    January    10
#4    Alma    January    88
#5    Alma   February    88
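The same pattern should extend to excluding several columns at once. The sketch below excludes both month_name and value, which leaves manager as the only column deciding uniqueness, and compares that with naming manager directly. A base R version of the single-column exception is included for reference. All object names here are illustrative.

```r
library(dplyr)

# Same data as above, rebuilt so this snippet is self-contained
df <- data.frame(
  manager = c("David", "David", "David", "Kate", "Kate", "Alma", "Alma"),
  month_name = c("January", "January", "February", "January", "January", "January", "February"),
  value = c(33, 33, 58, 10, 97, 88, 88)
)

# Exclude two columns: uniqueness is decided by manager alone
by_exclusion <- df %>% distinct(across(-c(month_name, value)), .keep_all = TRUE)

# Naming the remaining column directly should give the same rows
by_name <- df %>% distinct(manager, .keep_all = TRUE)
all.equal(by_exclusion, by_name, check.attributes = FALSE)

# Base R version of "all columns except value": keep rows whose
# combination of the other columns has not been seen before
key_cols <- setdiff(names(df), "value")
base_except_value <- df[!duplicated(df[key_cols]), ]
base_except_value
```

The negative selection is most convenient for wide data frames, where listing every key column by hand would be tedious.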

 

Additional situations

Use dplyr distinct to remove duplicates and keep the last row.

Use dplyr distinct to keep the first and last row by a group in the R data frame.

Here is an easy way to count unique values in one or multiple columns in R.
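As an illustration of the first situation above, here is one possible way to keep the last row per manager instead of the first. It relies on distinct keeping the first row it meets, so the rows are reversed beforehand. This is a sketch, not necessarily the method a dedicated write-up would use, and last_rows is an illustrative name.

```r
library(dplyr)

# Same data as above, rebuilt so this snippet is self-contained
df <- data.frame(
  manager = c("David", "David", "David", "Kate", "Kate", "Alma", "Alma"),
  month_name = c("January", "January", "February", "January", "January", "January", "February"),
  value = c(33, 33, 58, 10, 97, 88, 88)
)

# Reverse the row order, then let distinct keep the first row per manager,
# which is the last row per manager in the original order
last_rows <- df %>%
  slice(rev(seq_len(n()))) %>%
  distinct(manager, .keep_all = TRUE)

last_rows
```

Note that the result comes back in reversed order (Alma, Kate, David); reorder afterwards if the original ordering matters.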

It is good to know dplyr tips and tricks

Here are my favorite top 10 dplyr tips and tricks.




