Columns with different names to join data frames in R by using functions from dplyr, like left_join or others, are not very handy but can be used. It is better if you have data frames with matching key column names. In that kind of scenario, you can sometimes join without specifying them if there are no other matching names.
Here is the data frame that I’m using in this example.
head(swiss) # Fertility Agriculture Examination Education Catholic Infant.Mortality #Courtelary 80.2 17.0 15 12 9.96 22.2 #Delemont 83.1 45.1 6 9 84.84 22.2 #Franches-Mnt 92.5 39.7 5 5 93.40 20.2 #Moutier 85.8 36.5 12 7 33.77 20.3 #Neuveville 76.9 43.5 17 15 5.16 20.6 #Porrentruy 76.1 35.3 9 7 90.57 26.6
To demonstrate how to use columns with different names to join data frames in R, I created two additional data frames from the previous one.
require(dplyr) p1 <- swiss %>% add_rownames(var = "Province") %>% select(Province, 2:3) p1 %>% as.data.frame() %>% head() # Province Fertility Agriculture #1 Courtelary 80.2 17.0 #2 Delemont 83.1 45.1 #3 Franches-Mnt 92.5 39.7 #4 Moutier 85.8 36.5 #5 Neuveville 76.9 43.5 #6 Porrentruy 76.1 35.3 p2 <- swiss %>% add_rownames(var = "Territory") %>% select(Territory, 4:5) p2 %>% as.data.frame() %>% head() # Territory Examination Education #1 Courtelary 15 12 #2 Delemont 6 9 #3 Franches-Mnt 5 5 #4 Moutier 12 7 #5 Neuveville 17 15 #6 Porrentruy 9 7
If two data frames have no common column name, by argument should be supplied. In the situation when you are using columns with different names to join data frames in R, you can specify them as below. In this case, there is left_join from dplyr.
left_join(p1, p2, by = c("Province" = "Territory")) %>% as.data.frame() %>% head() # Province Fertility Agriculture Examination Education #1 Courtelary 80.2 17.0 15 12 #2 Delemont 83.1 45.1 6 9 #3 Franches-Mnt 92.5 39.7 5 5 #4 Moutier 85.8 36.5 12 7 #5 Neuveville 76.9 43.5 17 15 #6 Porrentruy 76.1 35.3 9 7
It is also possible to do the left_join from dplyr and specify the column from the second data frame by index.
left_join(p1, p2, by = c("Province" = names(p2)[1])) %>% as.data.frame() %>% head()
If data frames have identical column names that you want to use in the join operation, you can join them without specifying them. For example, the function left_join from dplyr will automatically detect column names that can be used and will tell you that, like in the message below.
names(p2)[1] <- "Province" left_join(p1, p2) %>% as.data.frame() %>% head() #Joining, by = "Province" #Province Fertility Agriculture Examination Education #1 Courtelary 80.2 17.0 15 12 #2 Delemont 83.1 45.1 6 9 #3 Franches-Mnt 92.5 39.7 5 5 #4 Moutier 85.8 36.5 12 7 #5 Neuveville 76.9 43.5 17 15 #6 Porrentruy 76.1 35.3 9 7
There are other interesting scenarios that might be useful if you are using the join functions from dplyr.
For example, here is how to join only selected columns from the data frame in R or execute multiple dplyr left_joins at once.
Otherwise, here are my top 10 dplyr tips and tricks.
Leave a Reply