dense rank in R

determine rank in R

Sometimes you have to calculate a rank in R to indicate an order by one or multiple criteria. It is done by using multiple principles and in various order. I will use dense rank that will not create gaps between ranks, and in the situation when something is identical, the ranks will also be identical.

I will use the dplyr and Starwars dataset that comes with this package. Here is my sample from that dataset. It contains 4 columns, 2 sexes, and records that have available height and mass.

require(dplyr)

df <- starwars %>%
  select(name, height, mass, sex) %>%
  filter(sex %in% c("male", "female") & !is.na(height) & !is.na(mass))

df %>% head() %>% as.data.frame()

#                name height mass    sex
#1     Luke Skywalker    172   77   male
#2        Darth Vader    202  136   male
#3        Leia Organa    150   49 female
#4          Owen Lars    178  120   male
#5 Beru Whitesun lars    165   75 female
#6  Biggs Darklighter    183   84   male

 

rank by using dplyr

Here are all dplyr ranking functions that are available. I’m interested in dense rank and will use the dense_rank function.

Here is how to create a rank for each of the separate columns. I would like to have rank number 1 for the largest value, and it is important to use the desc function in that case. That will tell the dense_rank function to look at the data in descending order. Otherwise, it will generate by ascending order, and desc is not necessary.

df <- df %>% mutate(
  "height_rank" = dense_rank(desc(height)),
  "mass_rank" = dense_rank(desc(mass))
)

Here is my top 10 by height rank. If you are interested in selecting top or bottom values in R, I recommend this post.

df %>% slice_min(height_rank, n = 10) %>% as.data.frame()

#              name height mass  sex height_rank mass_rank
#1          Tarfful    234  136 male           1         2
#2          Lama Su    229   88 male           2        10
#3        Chewbacca    228  112 male           3         5
#4     Roos Tarpals    224   82 male           4        15
#5         Grievous    216  159 male           5         1
#6       Tion Medon    206   80 male           6        16
#7      Darth Vader    202  136 male           7         2
#8     Ki-Adi-Mundi    198   82 male           8        15
#9  Dexter Jettster    198  102 male           8         7
#10   Jar Jar Binks    196   66 male           9        23
#11       Kit Fisto    196   87 male           9        11

As you can see, a couple of them are with the same height have the same rank accordingly. But what if I should create a rank based on several columns?

 

rank in R based on multiple columns

If I want to do the ranking by multiple columns, I prefer the rank of the mean value of different ranks.

Meanwhile, the easiest way to create rank based on multiple columns is by using the frank function from data.table package. The trickiest part is to set descending order. If it is necessary to get descending order in the frank function, add a minus sign to the corresponding column.

df$total_rank <- df %>% data.table::frank(-height, -mass, ties.method = "dense")

df %>% slice_min(total_rank, n = 10) %>% as.data.frame()

#              name height mass  sex height_rank mass_rank total_rank
#1          Tarfful    234  136 male           1         2          1
#2          Lama Su    229   88 male           2        10          2
#3        Chewbacca    228  112 male           3         5          3
#4     Roos Tarpals    224   82 male           4        15          4
#5         Grievous    216  159 male           5         1          5
#6       Tion Medon    206   80 male           6        16          6
#7      Darth Vader    202  136 male           7         2          7
#8  Dexter Jettster    198  102 male           8         7          8
#9     Ki-Adi-Mundi    198   82 male           8        15          9
#10       Kit Fisto    196   87 male           9        11         10

The result is not satisfying. It is formed primarily from the first column.

If I calculate the mean of the two separate ranks and create ar rank out of the result, the result looks logical to me.

df <- df %>%
  rowwise() %>%
  mutate("mean_rank" = mean(c(height_rank, mass_rank))) %>%
  ungroup()

df <- df %>% mutate("total_rank" = dense_rank(mean_rank)) %>% select(-mean_rank)

df %>% slice_min(total_rank, n = 10) %>% as.data.frame()

#              name height mass  sex height_rank mass_rank total_rank
#1          Tarfful    234  136 male           1         2          1
#2         Grievous    216  159 male           5         1          2
#3        Chewbacca    228  112 male           3         5          3
#4      Darth Vader    202  136 male           7         2          4
#5          Lama Su    229   88 male           2        10          5
#6  Dexter Jettster    198  102 male           8         7          6
#7            Bossk    190  113 male          12         4          7
#8     Qui-Gon Jinn    193   89 male          10         9          8
#9      Nute Gunray    191   90 male          11         8          8
#10    Roos Tarpals    224   82 male           4        15          8

 

create rank by a group in R

Dplyr is a great package that can execute calculations by groups. For example, percentage by group, minimum or maximum value by group, or cumulative sum or count.

Here is how to create a rank by a group.

df <- df %>%
  group_by(sex) %>%
  mutate("height_by_sex_rank" = dense_rank(desc(height)))

df %>% slice_min(height_by_sex_rank, n = 3) %>% as.data.frame()

#         name height mass    sex height_rank mass_rank total_rank height_by_sex_rank
#1  Adi Gallia    184   50 female          15        28         24                  1
#2 Ayla Secura    178   55 female          19        27         26                  2
#3    Shaak Ti    178   57 female          19        25         25                  2
#4     Tarfful    234  136   male           1         2          1                  1
#5     Lama Su    229   88   male           2        10          5                  2
#6   Chewbacca    228  112   male           3         5          3                  3





Posted

in

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *