bind R data frames vertically

How to bind R data frames vertically

Here is how to bind two or more R data frames vertically using different methods and dealing with possible errors. Binding R data frames together by rows might be easy, but there might be situations where columns don’t match or there are more than two data frames.

Bind R data frames vertically with the base function

By using the R base function rbind, you can bind two data frames with matching column names.

df1 <- head(iris)
df1$select <- "head"
df2 <- tail(iris)
df2$select <- "tail"

rbind(df1, df2)


#    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species select
#1            5.1         3.5          1.4         0.2    setosa   head
#2            4.9         3.0          1.4         0.2    setosa   head
#3            4.7         3.2          1.3         0.2    setosa   head
#4            4.6         3.1          1.5         0.2    setosa   head
#5            5.0         3.6          1.4         0.2    setosa   head
#6            5.4         3.9          1.7         0.4    setosa   head
#145          6.7         3.3          5.7         2.5 virginica   tail
#146          6.7         3.0          5.2         2.3 virginica   tail
#147          6.3         2.5          5.0         1.9 virginica   tail
#148          6.5         3.0          5.2         2.0 virginica   tail
#149          6.2         3.4          5.4         2.3 virginica   tail
#150          5.9         3.0          5.1         1.8 virginica   tail

rbind error – number of columns of arguments do not match

If you are dealing with this error, column names of data frames are not matching. Here is more about this error. If this mismatch is acceptable, then the appending with dplyr bind_rows is the best solution. See that below.

Compare R data frame columns and see the mismatch

If the differences in column names are unexpected, then here is a quick method to find them.

You can also use the setdiff function to get unmatching column names of R data frames.

df1 <- head(iris)[1:4]
df1$select <- "head"
df2 <- tail(iris)
df2$select <- "tail"

rbind(df1, df2)
#Error in rbind(deparse.level, ...) : 
#  numbers of columns of arguments do not match

setdiff(names(df1), names(df2))
#character(0)
setdiff(names(df2), names(df1))
#[1] "Species"

 

Bind R data frames vertically with dplyr bind_rows

Here is a function bind_rows that helps to deal easily with several problematic situations binding R data frames. The standard situation when you are stacking two data frames on top of each other looks like this.

It is possible to add an id column that is useful to examine the results after appending.

require(dplyr)

df1 <- head(iris)
df2 <- tail(iris)

bind_rows(df1, df2, .id = 'id')

#   id Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#1   1          5.1         3.5          1.4         0.2    setosa
#2   1          4.9         3.0          1.4         0.2    setosa
#3   1          4.7         3.2          1.3         0.2    setosa
#4   1          4.6         3.1          1.5         0.2    setosa
#5   1          5.0         3.6          1.4         0.2    setosa
#6   1          5.4         3.9          1.7         0.4    setosa
#7   2          6.7         3.3          5.7         2.5 virginica
#8   2          6.7         3.0          5.2         2.3 virginica
#9   2          6.3         2.5          5.0         1.9 virginica
#10  2          6.5         3.0          5.2         2.0 virginica
#11  2          6.2         3.4          5.4         2.3 virginica
#12  2          5.9         3.0          5.1         1.8 virginica

In the situation when there are differences in data frame column names or column count, bind_rows will append them together. Be careful that there is no mistake with column names.

df1 <- head(iris)
df2 <- tail(iris)[1:4]


bind_rows(df1, df2, .id = 'id')

#   id Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1   1          5.1         3.5          1.4         0.2  setosa
#2   1          4.9         3.0          1.4         0.2  setosa
#3   1          4.7         3.2          1.3         0.2  setosa
#4   1          4.6         3.1          1.5         0.2  setosa
#5   1          5.0         3.6          1.4         0.2  setosa
#6   1          5.4         3.9          1.7         0.4  setosa
#7   2          6.7         3.3          5.7         2.5    NA
#8   2          6.7         3.0          5.2         2.3    NA
#9   2          6.3         2.5          5.0         1.9    NA
#10  2          6.5         3.0          5.2         2.0    NA
#11  2          6.2         3.4          5.4         2.3    NA
#12  2          5.9         3.0          5.1         1.8    NA

If you have multiple R data frames that you want to bind together, add them all to the bind_rows.

df1 <- head(iris)
df2 <- tail(iris)
df3 <- iris[10:15,]


bind_rows(df1, df2, df3, .id = 'id')

#   id Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#1   1          5.1         3.5          1.4         0.2    setosa
#2   1          4.9         3.0          1.4         0.2    setosa
#3   1          4.7         3.2          1.3         0.2    setosa
#4   1          4.6         3.1          1.5         0.2    setosa
#5   1          5.0         3.6          1.4         0.2    setosa
#6   1          5.4         3.9          1.7         0.4    setosa
#7   2          6.7         3.3          5.7         2.5 virginica
#8   2          6.7         3.0          5.2         2.3 virginica
#9   2          6.3         2.5          5.0         1.9 virginica
#10  2          6.5         3.0          5.2         2.0 virginica
#11  2          6.2         3.4          5.4         2.3 virginica
#12  2          5.9         3.0          5.1         1.8 virginica
#13  3          4.9         3.1          1.5         0.1    setosa
#14  3          5.4         3.7          1.5         0.2    setosa
#15  3          4.8         3.4          1.6         0.2    setosa
#16  3          4.8         3.0          1.4         0.1    setosa
#17  3          4.3         3.0          1.1         0.1    setosa
#18  3          5.8         4.0          1.2         0.2    setosa

Bind list of R data frames vertically

You can also get a list of available R data frames and bind them together by rows. The advantage is that after binding together, the id column contains the names of data frames.

require(dplyr)

df1 <- head(iris)
df2 <- tail(iris)
df3 <- iris[10:15,]

dlist <- list(df1, df2, df3)
names(dlist) <- c("df1", "df2", "df3")

bind_rows(dlist, .id = 'id')

#    id Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#1  df1          5.1         3.5          1.4         0.2    setosa
#2  df1          4.9         3.0          1.4         0.2    setosa
#3  df1          4.7         3.2          1.3         0.2    setosa
#4  df1          4.6         3.1          1.5         0.2    setosa
#5  df1          5.0         3.6          1.4         0.2    setosa
#6  df1          5.4         3.9          1.7         0.4    setosa
#7  df2          6.7         3.3          5.7         2.5 virginica
#8  df2          6.7         3.0          5.2         2.3 virginica
#9  df2          6.3         2.5          5.0         1.9 virginica
#10 df2          6.5         3.0          5.2         2.0 virginica
#11 df2          6.2         3.4          5.4         2.3 virginica
#12 df2          5.9         3.0          5.1         1.8 virginica
#13 df3          4.9         3.1          1.5         0.1    setosa
#14 df3          5.4         3.7          1.5         0.2    setosa
#15 df3          4.8         3.4          1.6         0.2    setosa
#16 df3          4.8         3.0          1.4         0.1    setosa
#17 df3          4.3         3.0          1.1         0.1    setosa
#18 df3          5.8         4.0          1.2         0.2    setosa

You can also bind multiple data frames by part of their names. After creating a list of R data frames, you can use that in bind_rows. Results are the same as above.

dlist <- mget(ls(pattern = 'df'))

bind_rows(dlist, .id = 'id')

 

Load and bind together at the same time

Sometimes the need for binding R data frames is a result of importing multiple data frames.
In that case, you should try this method that can load and combine multiple RDS files at once.

Ensure that the data frame contains defined columns

Let’s imagine that during the data processing, the count of data frame columns can change. At the same time, you want to be sure that the columns of the end result always stay the same.

Here is a good solution that shows how to create an empty R data frame. By using bind_rows, it will ensure that there are always all necessary columns.

If you like to use the dplyr package, make sure that you know some of the dplyr tips and tricks.





Posted

in

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *