different names to join data frames in R, dplyr left_join with different column names

Columns with different names to join data frames in R dplyr

Columns with different names to join data frames in R by using functions from dplyr, like left_join or others, are not very handy but can be used. It is better if you have data frames with matching key column names. In that kind of scenario, you can sometimes join without specifying them if there are no other matching names.

 

Here is the data frame that I’m using in this example.

head(swiss)

#             Fertility Agriculture Examination Education Catholic Infant.Mortality
#Courtelary        80.2        17.0          15        12     9.96             22.2
#Delemont          83.1        45.1           6         9    84.84             22.2
#Franches-Mnt      92.5        39.7           5         5    93.40             20.2
#Moutier           85.8        36.5          12         7    33.77             20.3
#Neuveville        76.9        43.5          17        15     5.16             20.6
#Porrentruy        76.1        35.3           9         7    90.57             26.6

To demonstrate how to use columns with different names to join data frames in R, I created two additional data frames from the previous one.

require(dplyr)


p1 <- swiss %>%
  add_rownames(var = "Province") %>%
  select(Province, 2:3)

p1 %>% as.data.frame() %>% head()

#      Province Fertility Agriculture
#1   Courtelary      80.2        17.0
#2     Delemont      83.1        45.1
#3 Franches-Mnt      92.5        39.7
#4      Moutier      85.8        36.5
#5   Neuveville      76.9        43.5
#6   Porrentruy      76.1        35.3


p2 <- swiss %>% 
  add_rownames(var = "Territory") %>% 
  select(Territory, 4:5)

p2 %>% as.data.frame() %>% head()

#     Territory Examination Education
#1   Courtelary          15        12
#2     Delemont           6         9
#3 Franches-Mnt           5         5
#4      Moutier          12         7
#5   Neuveville          17        15
#6   Porrentruy           9         7

If two data frames have no common column name, by argument should be supplied. In the situation when you are using columns with different names to join data frames in R, you can specify them as below. In this case, there is left_join from dplyr.

left_join(p1, p2, by = c("Province" = "Territory")) %>% 
  as.data.frame() %>% 
  head()

#      Province Fertility Agriculture Examination Education
#1   Courtelary      80.2        17.0          15        12
#2     Delemont      83.1        45.1           6         9
#3 Franches-Mnt      92.5        39.7           5         5
#4      Moutier      85.8        36.5          12         7
#5   Neuveville      76.9        43.5          17        15
#6   Porrentruy      76.1        35.3           9         7

It is also possible to do the left_join from dplyr and specify the column from the second data frame by index.

left_join(p1, p2, by = c("Province" = names(p2)[1])) %>%
  as.data.frame() %>%
  head()

 

If data frames have identical column names that you want to use in the join operation, you can join them without specifying them. For example, the function left_join from dplyr will automatically detect column names that can be used and will tell you that, like in the message below.

names(p2)[1] <- "Province" 

left_join(p1, p2) %>% 
  as.data.frame() %>%
  head()

#Joining, by = "Province"
#Province Fertility Agriculture Examination Education
#1   Courtelary      80.2        17.0          15        12
#2     Delemont      83.1        45.1           6         9
#3 Franches-Mnt      92.5        39.7           5         5
#4      Moutier      85.8        36.5          12         7
#5   Neuveville      76.9        43.5          17        15
#6   Porrentruy      76.1        35.3           9         7

 

There are other interesting scenarios that might be useful if you are using the join functions from dplyr.

For example, here is how to join only selected columns from the data frame in R or execute multiple dplyr left_joins at once.

Otherwise, here are my top 10 dplyr tips and tricks.


Posted

in

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *