Remove duplicates and keep last in R

Here is how to remove duplicates from an R data frame while keeping the last row of each group.

Here is my data frame, which contains agents and managers. Let's say I want to keep only the last manager listed for each agent.

df <- data.frame(
  agent = as.character(c("David", "David", "David", "Kate", "Kate", "Alma", "Alma")),
  manager = as.character(c("Lisa", "Monica", "Karl", "Kianna", "Luna", "Kaylin", "Georgia"))
)

It is easy to keep the last record of each group by using dplyr. The distinct() function keeps the first record of each group, which is why you need a small workaround.
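
To see why the workaround is needed, here is a small sketch of what distinct() does with the sample data. The name first_rows is only illustrative, and stringsAsFactors = FALSE is added so the example behaves the same on older R versions.

```r
library(dplyr)

# Same sample data as in the post
df <- data.frame(
  agent = c("David", "David", "David", "Kate", "Kate", "Alma", "Alma"),
  manager = c("Lisa", "Monica", "Karl", "Kianna", "Luna", "Kaylin", "Georgia"),
  stringsAsFactors = FALSE
)

# distinct() keeps the first row it meets for each agent
first_rows <- df %>%
  distinct(agent, .keep_all = TRUE)

first_rows$manager
# "Lisa" "Kianna" "Kaylin"
```

Each agent ends up with the first manager listed, not the last one.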

Group by at least one category that you are interested in. The row_number() function returns the position of each row within its group. Keep only the rows whose position equals n(), which gives the current group size, so only the last row of each group remains.

library(dplyr)

df <- df %>%
  group_by(agent) %>%
  filter(row_number() == n())  # keep only the last row of each group
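
As a quick check, the sketch below runs the same filter on the sample data and compares it with a base R alternative built on duplicated(). The base R variant is not part of the approach above, just another way to get the same rows. The names last_rows and last_rows_base are only illustrative.

```r
library(dplyr)

# Same sample data as in the post
df <- data.frame(
  agent = c("David", "David", "David", "Kate", "Kate", "Alma", "Alma"),
  manager = c("Lisa", "Monica", "Karl", "Kianna", "Luna", "Kaylin", "Georgia"),
  stringsAsFactors = FALSE
)

# dplyr: keep the row whose position equals the group size
last_rows <- df %>%
  group_by(agent) %>%
  filter(row_number() == n()) %>%
  ungroup()

last_rows$manager
# "Karl" "Luna" "Georgia"

# Base R alternative: scanning from the end, every row except the
# last one per agent counts as a duplicate
last_rows_base <- df[!duplicated(df$agent, fromLast = TRUE), ]

last_rows_base$manager
# "Karl" "Luna" "Georgia"
```

The ungroup() call at the end is optional. It only drops the grouping so that later steps work on the whole data frame again.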


Comments

4 responses to “Remove duplicates and keep last in R”

  1. Santiago

    Thank you very much, I was struggling with this problem and didn’t know about the row_number() function. Greetings from Perú

    1. Janis Sturis

      Thank you for your feedback!
      Looks like the world is united by R.

    2. Santiago

      Is it possible to do the same but by the content of the column and not by the position? (Instead of the last row number, “Karl”.) Thanks

      1. Janis Sturis

        Maybe, if there is some principle that defines the sequence and works in all situations, such as alphabetical order or a group index. In that case, you can use arrange() and distinct() from dplyr.
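
The arrange() and distinct() suggestion in the reply above could look roughly like this. The ordering rule here (the manager name that comes last alphabetically) is only an assumption chosen for illustration, and by_content is a made-up name. Swap in whatever rule defines the row you want to keep.

```r
library(dplyr)

# Same sample data as in the post
df <- data.frame(
  agent = c("David", "David", "David", "Kate", "Kate", "Alma", "Alma"),
  manager = c("Lisa", "Monica", "Karl", "Kianna", "Luna", "Kaylin", "Georgia"),
  stringsAsFactors = FALSE
)

# Assumed rule: sort so the wanted row comes first within each agent,
# then let distinct() keep that first row
by_content <- df %>%
  arrange(agent, desc(manager)) %>%
  distinct(agent, .keep_all = TRUE)

by_content
# agent: "Alma" "David" "Kate"; manager: "Kaylin" "Monica" "Luna"
```

Because distinct() keeps the first row per agent, the sort order decides which record survives, regardless of its original position.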
