Remove duplicates and keep last in R

Here is how to remove duplicates but keep the last row in the R data frame.

Here is my data frame that contains agents and managers. Let’s say the last one is that I would like to get.

df <- data.frame(
agent = as.character(c("David", "David", "David", "Kate", "Kate", "Alma", "Alma")),
manager = as.character(c("Lisa", "Monica", "Karl", "Kianna", "Luna", "Kaylin", "Georgia"))
)

It is easy to keep the last unique record by using dplyr. Distinct functions return the first record and that is the reason you should use a little workaround.

Group by at least one category that you are interested in. Function row_count will get the count of rows by each group. Filter by last row number, which equals the result of function n. N that gives the current group size.

df <- df %>%
group_by(agent) %>%
filter(row_number() == n())

How to get top or bottom values by each group in R

4 comments

thank you very much, i was struggling with this problem and didn’t know about row_number() function. Greetings from Perú

Janis Sturis says:

October 25, 2021 at 19:41

Thank you for your feedback!
Looks like world is united by R.

Reply
Santiago says:

October 25, 2021 at 19:52

Is it possible to do the same but with the content of the column and not by the position? (intead of last row number, “Karl”). Thanks

Reply
1. Janis Sturis says:
  
  October 25, 2021 at 21:47
  
  Maybe. If there is some principle that can define sequence and works in all situations like the alphabet or group index. In that case, you can use arrange() and distinct() from dplyr.
  
  Reply

4 comments

Leave a comment Cancel reply

Leave a comment