first and last row of category in R

Get the first and last row of the data group in R

If your data frame content is categorized and you are interested to extract the first and the last row of each group in R, this post is for you. It is a useful solution to detect the minimum and the maximum by a group in big data sets. In my experience, it works faster than this solution with dplyr.

Let’s take a look at the R dataset iris. I will keep only two columns “Species” and “Petal.Length”.

require(dplyr)

head(iris %>% select(Species, Petal.Length))

#  Species Petal.Length
#1  setosa          1.4
#2  setosa          1.4
#3  setosa          1.3
#4  setosa          1.5
#5  setosa          1.4
#6  setosa          1.7

Data is already grouped by species, and I would like to get the first and the last row of each of these groups.

Get the first and last row of the data group in R

1. Order your data in a way that is right for your goal. You can do that by using the function arrange from dplyr.

2. Use the dplyr filter function to get the first and the last row of each group. This is a combination of duplicates removal that leaves the first and last row at the same time.

df <- iris %>%
  select(Species, Petal.Length) %>%
  group_by(Species) %>%
  filter(row_number() == 1 | row_number() == n())

head(df)

## A tibble: 6 x 2
## Groups:   Species [3]
#  Species    Petal.Length
#               
#1 setosa              1.4
#2 setosa              1.4
#3 versicolor          4.7
#4 versicolor          4.1
#5 virginica           6  
#6 virginica           5.1

Combine the first and last row of each group in R

If you would like to combine the first and last rows, here is how to do that. One of the solutions is to group this information and concatenate it in one column.

If you would like to see the first and last values of each group in separate columns, the first thing is to create a column with repeated a combination of necessary labels.

require(tidyr)

df$first_last <- rep(c("first", "last"), times = nrow(df)/2)

head(df)
## A tibble: 6 x 3
## Groups:   Species [3]
#  Species    Petal.Length first_last
#                     
#1 setosa              1.4 first     
#2 setosa              1.4 last      
#3 versicolor          4.7 first     
#4 versicolor          4.1 last      
#5 virginica           6   first     
#6 virginica           5.1 last

After that, use pivot_wider to transform rows to columns.

df %>% pivot_wider(names_from = first_last, values_from = Petal.Length)

## A tibble: 3 x 3
## Groups:   Species [3]
#  Species    first  last
#         
#1 setosa       1.4   1.4
#2 versicolor   4.7   4.1
#3 virginica    6     5.1

Posted

August 4, 2021

in

Comments

Leave a Reply Cancel reply