Here is how to bind two or more R data frames vertically using different methods and dealing with possible errors. Binding R data frames together by rows might be easy, but there might be situations where columns don’t match or there are more than two data frames.
Bind R data frames vertically with the base function
By using the R base function rbind, you can bind two data frames with matching column names.
df1 <- head(iris) df1$select <- "head" df2 <- tail(iris) df2$select <- "tail" rbind(df1, df2) # Sepal.Length Sepal.Width Petal.Length Petal.Width Species select #1 5.1 3.5 1.4 0.2 setosa head #2 4.9 3.0 1.4 0.2 setosa head #3 4.7 3.2 1.3 0.2 setosa head #4 4.6 3.1 1.5 0.2 setosa head #5 5.0 3.6 1.4 0.2 setosa head #6 5.4 3.9 1.7 0.4 setosa head #145 6.7 3.3 5.7 2.5 virginica tail #146 6.7 3.0 5.2 2.3 virginica tail #147 6.3 2.5 5.0 1.9 virginica tail #148 6.5 3.0 5.2 2.0 virginica tail #149 6.2 3.4 5.4 2.3 virginica tail #150 5.9 3.0 5.1 1.8 virginica tail
rbind error – number of columns of arguments do not match
If you are dealing with this error, column names of data frames are not matching. Here is more about this error. If this mismatch is acceptable, then the appending with dplyr bind_rows is the best solution. See that below.
Compare R data frame columns and see the mismatch
If the differences in column names are unexpected, then here is a quick method to find them.
You can also use the setdiff function to get unmatching column names of R data frames.
df1 <- head(iris)[1:4] df1$select <- "head" df2 <- tail(iris) df2$select <- "tail" rbind(df1, df2) #Error in rbind(deparse.level, ...) : # numbers of columns of arguments do not match setdiff(names(df1), names(df2)) #character(0) setdiff(names(df2), names(df1)) #[1] "Species"
Bind R data frames vertically with dplyr bind_rows
Here is a function bind_rows that helps to deal easily with several problematic situations binding R data frames. The standard situation when you are stacking two data frames on top of each other looks like this.
It is possible to add an id column that is useful to examine the results after appending.
require(dplyr) df1 <- head(iris) df2 <- tail(iris) bind_rows(df1, df2, .id = 'id') # id Sepal.Length Sepal.Width Petal.Length Petal.Width Species #1 1 5.1 3.5 1.4 0.2 setosa #2 1 4.9 3.0 1.4 0.2 setosa #3 1 4.7 3.2 1.3 0.2 setosa #4 1 4.6 3.1 1.5 0.2 setosa #5 1 5.0 3.6 1.4 0.2 setosa #6 1 5.4 3.9 1.7 0.4 setosa #7 2 6.7 3.3 5.7 2.5 virginica #8 2 6.7 3.0 5.2 2.3 virginica #9 2 6.3 2.5 5.0 1.9 virginica #10 2 6.5 3.0 5.2 2.0 virginica #11 2 6.2 3.4 5.4 2.3 virginica #12 2 5.9 3.0 5.1 1.8 virginica
In the situation when there are differences in data frame column names or column count, bind_rows will append them together. Be careful that there is no mistake with column names.
df1 <- head(iris) df2 <- tail(iris)[1:4] bind_rows(df1, df2, .id = 'id') # id Sepal.Length Sepal.Width Petal.Length Petal.Width Species #1 1 5.1 3.5 1.4 0.2 setosa #2 1 4.9 3.0 1.4 0.2 setosa #3 1 4.7 3.2 1.3 0.2 setosa #4 1 4.6 3.1 1.5 0.2 setosa #5 1 5.0 3.6 1.4 0.2 setosa #6 1 5.4 3.9 1.7 0.4 setosa #7 2 6.7 3.3 5.7 2.5 NA #8 2 6.7 3.0 5.2 2.3 NA #9 2 6.3 2.5 5.0 1.9 NA #10 2 6.5 3.0 5.2 2.0 NA #11 2 6.2 3.4 5.4 2.3 NA #12 2 5.9 3.0 5.1 1.8 NA
If you have multiple R data frames that you want to bind together, add them all to the bind_rows.
df1 <- head(iris) df2 <- tail(iris) df3 <- iris[10:15,] bind_rows(df1, df2, df3, .id = 'id') # id Sepal.Length Sepal.Width Petal.Length Petal.Width Species #1 1 5.1 3.5 1.4 0.2 setosa #2 1 4.9 3.0 1.4 0.2 setosa #3 1 4.7 3.2 1.3 0.2 setosa #4 1 4.6 3.1 1.5 0.2 setosa #5 1 5.0 3.6 1.4 0.2 setosa #6 1 5.4 3.9 1.7 0.4 setosa #7 2 6.7 3.3 5.7 2.5 virginica #8 2 6.7 3.0 5.2 2.3 virginica #9 2 6.3 2.5 5.0 1.9 virginica #10 2 6.5 3.0 5.2 2.0 virginica #11 2 6.2 3.4 5.4 2.3 virginica #12 2 5.9 3.0 5.1 1.8 virginica #13 3 4.9 3.1 1.5 0.1 setosa #14 3 5.4 3.7 1.5 0.2 setosa #15 3 4.8 3.4 1.6 0.2 setosa #16 3 4.8 3.0 1.4 0.1 setosa #17 3 4.3 3.0 1.1 0.1 setosa #18 3 5.8 4.0 1.2 0.2 setosa
Bind list of R data frames vertically
You can also get a list of available R data frames and bind them together by rows. The advantage is that after binding together, the id column contains the names of data frames.
require(dplyr) df1 <- head(iris) df2 <- tail(iris) df3 <- iris[10:15,] dlist <- list(df1, df2, df3) names(dlist) <- c("df1", "df2", "df3") bind_rows(dlist, .id = 'id') # id Sepal.Length Sepal.Width Petal.Length Petal.Width Species #1 df1 5.1 3.5 1.4 0.2 setosa #2 df1 4.9 3.0 1.4 0.2 setosa #3 df1 4.7 3.2 1.3 0.2 setosa #4 df1 4.6 3.1 1.5 0.2 setosa #5 df1 5.0 3.6 1.4 0.2 setosa #6 df1 5.4 3.9 1.7 0.4 setosa #7 df2 6.7 3.3 5.7 2.5 virginica #8 df2 6.7 3.0 5.2 2.3 virginica #9 df2 6.3 2.5 5.0 1.9 virginica #10 df2 6.5 3.0 5.2 2.0 virginica #11 df2 6.2 3.4 5.4 2.3 virginica #12 df2 5.9 3.0 5.1 1.8 virginica #13 df3 4.9 3.1 1.5 0.1 setosa #14 df3 5.4 3.7 1.5 0.2 setosa #15 df3 4.8 3.4 1.6 0.2 setosa #16 df3 4.8 3.0 1.4 0.1 setosa #17 df3 4.3 3.0 1.1 0.1 setosa #18 df3 5.8 4.0 1.2 0.2 setosa
You can also bind multiple data frames by part of their names. After creating a list of R data frames, you can use that in bind_rows. Results are the same as above.
dlist <- mget(ls(pattern = 'df')) bind_rows(dlist, .id = 'id')
Load and bind together at the same time
Sometimes the need for binding R data frames is a result of importing multiple data frames.
In that case, you should try this method that can load and combine multiple RDS files at once.
Ensure that the data frame contains defined columns
Let’s imagine that during the data processing, the count of data frame columns can change. At the same time, you want to be sure that the columns of the end result always stay the same.
Here is a good solution that shows how to create an empty R data frame. By using bind_rows, it will ensure that there are always all necessary columns.
If you like to use the dplyr package, make sure that you know some of the dplyr tips and tricks.
Leave a Reply