Here are several examples of how to use dplyr distinct to keep only unique rows in a data frame. The examples start with cases where you name the columns that define a unique row and end with distinct applied to all columns except one or several. Listing the columns to exclude is sometimes more practical than listing every column to include.
Here is my data frame.
```r
df <- data.frame(
  manager = as.character(c("David", "David", "David", "Kate", "Kate", "Alma", "Alma")),
  month_name = as.character(c("January", "January", "February", "January", "January", "January", "February")),
  value = as.numeric(c(33, 33, 58, 10, 97, 88, 88))
)
df
#   manager month_name value
# 1   David    January    33
# 2   David    January    33
# 3   David   February    58
# 4    Kate    January    10
# 5    Kate    January    97
# 6    Alma    January    88
# 7    Alma   February    88
```
dplyr distinct by using one or multiple columns
The distinct function from dplyr is easy to use, but pay attention to the .keep_all argument. By default, distinct returns only the columns you pass to it, so all other columns are dropped. To keep them, set .keep_all = TRUE; for each unique combination, distinct then keeps the first matching row.
```r
library(dplyr)
df %>% distinct(manager, .keep_all = TRUE)
#   manager month_name value
# 1   David    January    33
# 2    Kate    January    10
# 3    Alma    January    88
```
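To see what .keep_all changes, here is a small sketch that rebuilds the same df and runs distinct(manager) with and without it. The object names managers_only and first_rows are only illustrative.

```r
library(dplyr)

# Same example data as above
df <- data.frame(
  manager = c("David", "David", "David", "Kate", "Kate", "Alma", "Alma"),
  month_name = c("January", "January", "February", "January", "January", "January", "February"),
  value = c(33, 33, 58, 10, 97, 88, 88)
)

# Without .keep_all, only the listed column (manager) is returned
managers_only <- df %>% distinct(manager)
names(managers_only)  # just "manager"

# With .keep_all = TRUE, the first row of each manager is kept with all columns
first_rows <- df %>% distinct(manager, .keep_all = TRUE)
names(first_rows)     # "manager", "month_name", "value"
```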
Using distinct with multiple columns works the same way: list every column that should define a unique row.
```r
library(dplyr)
df %>% distinct(manager, month_name, .keep_all = TRUE)
#   manager month_name value
# 1   David    January    33
# 2   David   February    58
# 3    Kate    January    10
# 4    Alma    January    88
# 5    Alma   February    88
```
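Because .keep_all = TRUE keeps the first row of each combination, the row order decides which value survives. Kate has two January rows (10 and 97), and the output above keeps 10 only because that row comes first. A minimal sketch, assuming you would rather keep the highest value per manager and month: sort with arrange(desc(value)) before calling distinct. The object names are illustrative.

```r
library(dplyr)

df <- data.frame(
  manager = c("David", "David", "David", "Kate", "Kate", "Alma", "Alma"),
  month_name = c("January", "January", "February", "January", "January", "January", "February"),
  value = c(33, 33, 58, 10, 97, 88, 88)
)

# Original order: Kate/January keeps 10, the row that appears first
first_seen <- df %>% distinct(manager, month_name, .keep_all = TRUE)

# Sorted by value (descending): Kate/January now keeps 97
highest_value <- df %>%
  arrange(desc(value)) %>%
  distinct(manager, month_name, .keep_all = TRUE)
```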
dplyr distinct by using all columns
To keep rows that are unique across all columns, call distinct without any column arguments.
```r
library(dplyr)
df %>% distinct()
#   manager month_name value
# 1   David    January    33
# 2   David   February    58
# 3    Kate    January    10
# 4    Kate    January    97
# 5    Alma    January    88
# 6    Alma   February    88
```
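To check which rows distinct() drops, base R's duplicated() flags every row that repeats an earlier row in all columns. This sketch uses base R only for the comparison; the object names are illustrative.

```r
library(dplyr)

df <- data.frame(
  manager = c("David", "David", "David", "Kate", "Kate", "Alma", "Alma"),
  month_name = c("January", "January", "February", "January", "January", "January", "February"),
  value = c(33, 33, 58, 10, 97, 88, 88)
)

# duplicated() flags rows that repeat an earlier row in every column
dup_flags <- duplicated(df)
df[dup_flags, ]  # only row 2 (David, January, 33) is flagged

# Dropping the flagged rows leaves the same columns and values as distinct();
# base R keeps the original row names, so compare the columns only
base_unique <- df[!dup_flags, ]
identical(as.list(base_unique), as.list(distinct(df)))
# [1] TRUE
```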
dplyr distinct with exceptions
Here is how to use dplyr distinct with exceptions to get unique rows in an R data frame. Instead of listing every column that defines a unique row, you name only the columns to leave out. Wrapping a negative selection in across, such as across(-value), tells distinct to use every column except value. The across function can also apply other functions, even ifelse, to a range of columns.
```r
library(dplyr)
df %>% distinct(across(-value), .keep_all = TRUE)
#   manager month_name value
# 1   David    January    33
# 2   David   February    58
# 3    Kate    January    10
# 4    Alma    January    88
# 5    Alma   February    88
```
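distinct() evaluates its arguments like mutate() (data masking) rather than as a tidy selection, so writing distinct(-value) would compute a negated copy of value instead of excluding it; wrapping the selection in across() avoids that. The sketch below excludes two columns at once and also shows pick(), which dplyr 1.1.0 and later recommend over across() without a function. It assumes dplyr 1.1.0 or newer; the object names are illustrative.

```r
library(dplyr)

df <- data.frame(
  manager = c("David", "David", "David", "Kate", "Kate", "Alma", "Alma"),
  month_name = c("January", "January", "February", "January", "January", "January", "February"),
  value = c(33, 33, 58, 10, 97, 88, 88)
)

# Exclude two columns: uniqueness is decided by manager alone
by_manager <- df %>%
  distinct(across(-c(month_name, value)), .keep_all = TRUE)

# dplyr >= 1.1.0: pick() selects columns without applying a function,
# giving the same rows as distinct(across(-value), .keep_all = TRUE)
by_pick <- df %>%
  distinct(pick(-value), .keep_all = TRUE)
```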
Additional situations
Use dplyr distinct to remove duplicates and keep the last row.
Use dplyr distinct to keep the first and last row of each group in an R data frame.
Here is an easy way to count unique values in one or multiple columns in R.
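For a rough idea of these situations, here are short sketches on the same df. They are not necessarily the methods used in the related posts: the last-row case reverses the rows before distinct(), the first-and-last case uses filter() with row_number() instead of distinct(), and the counts use n_distinct(). The object names are illustrative.

```r
library(dplyr)

df <- data.frame(
  manager = c("David", "David", "David", "Kate", "Kate", "Alma", "Alma"),
  month_name = c("January", "January", "February", "January", "January", "January", "February"),
  value = c(33, 33, 58, 10, 97, 88, 88)
)

# Keep the last row per manager: reverse the rows so that distinct(),
# which keeps the first row it sees, picks the originally last one
last_rows <- df %>%
  slice(rev(seq_len(n()))) %>%
  distinct(manager, .keep_all = TRUE)

# Keep the first and last row of each manager's group
first_and_last <- df %>%
  group_by(manager) %>%
  filter(row_number() == 1 | row_number() == n()) %>%
  ungroup()

# Count unique values in every column, and unique manager/month_name pairs
unique_counts <- df %>% summarise(across(everything(), n_distinct))
n_pairs <- n_distinct(df$manager, df$month_name)
```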
It is good to know dplyr tips and tricks
Here are my favorite top 10 dplyr tips and tricks.