Sometimes you have to calculate a rank in R to indicate an order by one or multiple criteria. It is done by using multiple principles and in various order. I will use dense rank that will not create gaps between ranks, and in the situation when something is identical, the ranks will also be identical.
I will use the dplyr and Starwars dataset that comes with this package. Here is my sample from that dataset. It contains 4 columns, 2 sexes, and records that have available height and mass.
require(dplyr) df <- starwars %>% select(name, height, mass, sex) %>% filter(sex %in% c("male", "female") & !is.na(height) & !is.na(mass)) df %>% head() %>% as.data.frame() # name height mass sex #1 Luke Skywalker 172 77 male #2 Darth Vader 202 136 male #3 Leia Organa 150 49 female #4 Owen Lars 178 120 male #5 Beru Whitesun lars 165 75 female #6 Biggs Darklighter 183 84 male
rank by using dplyr
Here are all dplyr ranking functions that are available. I’m interested in dense rank and will use the dense_rank function.
Here is how to create a rank for each of the separate columns. I would like to have rank number 1 for the largest value, and it is important to use the desc function in that case. That will tell the dense_rank function to look at the data in descending order. Otherwise, it will generate by ascending order, and desc is not necessary.
df <- df %>% mutate( "height_rank" = dense_rank(desc(height)), "mass_rank" = dense_rank(desc(mass)) )
Here is my top 10 by height rank. If you are interested in selecting top or bottom values in R, I recommend this post.
df %>% slice_min(height_rank, n = 10) %>% as.data.frame() # name height mass sex height_rank mass_rank #1 Tarfful 234 136 male 1 2 #2 Lama Su 229 88 male 2 10 #3 Chewbacca 228 112 male 3 5 #4 Roos Tarpals 224 82 male 4 15 #5 Grievous 216 159 male 5 1 #6 Tion Medon 206 80 male 6 16 #7 Darth Vader 202 136 male 7 2 #8 Ki-Adi-Mundi 198 82 male 8 15 #9 Dexter Jettster 198 102 male 8 7 #10 Jar Jar Binks 196 66 male 9 23 #11 Kit Fisto 196 87 male 9 11
As you can see, a couple of them are with the same height have the same rank accordingly. But what if I should create a rank based on several columns?
rank in R based on multiple columns
If I want to do the ranking by multiple columns, I prefer the rank of the mean value of different ranks.
Meanwhile, the easiest way to create rank based on multiple columns is by using the frank function from data.table package. The trickiest part is to set descending order. If it is necessary to get descending order in the frank function, add a minus sign to the corresponding column.
df$total_rank <- df %>% data.table::frank(-height, -mass, ties.method = "dense") df %>% slice_min(total_rank, n = 10) %>% as.data.frame() # name height mass sex height_rank mass_rank total_rank #1 Tarfful 234 136 male 1 2 1 #2 Lama Su 229 88 male 2 10 2 #3 Chewbacca 228 112 male 3 5 3 #4 Roos Tarpals 224 82 male 4 15 4 #5 Grievous 216 159 male 5 1 5 #6 Tion Medon 206 80 male 6 16 6 #7 Darth Vader 202 136 male 7 2 7 #8 Dexter Jettster 198 102 male 8 7 8 #9 Ki-Adi-Mundi 198 82 male 8 15 9 #10 Kit Fisto 196 87 male 9 11 10
The result is not satisfying. It is formed primarily from the first column.
If I calculate the mean of the two separate ranks and create ar rank out of the result, the result looks logical to me.
df <- df %>% rowwise() %>% mutate("mean_rank" = mean(c(height_rank, mass_rank))) %>% ungroup() df <- df %>% mutate("total_rank" = dense_rank(mean_rank)) %>% select(-mean_rank) df %>% slice_min(total_rank, n = 10) %>% as.data.frame() # name height mass sex height_rank mass_rank total_rank #1 Tarfful 234 136 male 1 2 1 #2 Grievous 216 159 male 5 1 2 #3 Chewbacca 228 112 male 3 5 3 #4 Darth Vader 202 136 male 7 2 4 #5 Lama Su 229 88 male 2 10 5 #6 Dexter Jettster 198 102 male 8 7 6 #7 Bossk 190 113 male 12 4 7 #8 Qui-Gon Jinn 193 89 male 10 9 8 #9 Nute Gunray 191 90 male 11 8 8 #10 Roos Tarpals 224 82 male 4 15 8
create rank by a group in R
Dplyr is a great package that can execute calculations by groups. For example, percentage by group, minimum or maximum value by group, or cumulative sum or count.
Here is how to create a rank by a group.
df <- df %>% group_by(sex) %>% mutate("height_by_sex_rank" = dense_rank(desc(height))) df %>% slice_min(height_by_sex_rank, n = 3) %>% as.data.frame() # name height mass sex height_rank mass_rank total_rank height_by_sex_rank #1 Adi Gallia 184 50 female 15 28 24 1 #2 Ayla Secura 178 55 female 19 27 26 2 #3 Shaak Ti 178 57 female 19 25 25 2 #4 Tarfful 234 136 male 1 2 1 1 #5 Lama Su 229 88 male 2 10 5 2 #6 Chewbacca 228 112 male 3 5 3 3
Leave a Reply