combine Excel files in R, append xslx files in R

How to combine Excel files in R, including transformations

Here are two quick solutions to combine Excel files in R with the necessary transformations. Fundamentally it is a similar approach that I used to combine multiple RDS files in R.

 

In this post, I will take a look at two scenarios of how to combine Excel files in R. In both of them, I will use the function map_dfr from the purrr and the function read_excel from the readxl. There is a couple of ways how to read data from an Excel file in R and here is a compilation of them.

The usage of purrr I found more convenient than other ways to work with loops in R.

This post will also include path detection of multiple files in R.

 

Simple scenario

If there is not necessary to do additional data transformations to successfully combine Excel files in R, then here is how to do that. This approach comes from one of the previous posts that contain multiple ways how to read data from Excel workbooks or multiple sheets. I highly recommend you this post because it contains multiple ways how to import data from Excel in R. In your situation read_excel from the readxl package might not be ideal.

My Excel files are located in one directory and one subdirectory.

Excel files to combine in R

The content is a table in one worksheet.

file content to read in R

Here is a combination of all of them. Package pure with set_names can handle additional columns with file ID.

require(purrr)
require(readxl)


path <- "C:/Users/dc/Downloads/MultiFiles"

list.files(
  path = path,
  pattern = "*.xlsx",
  full.names = TRUE,
  recursive = TRUE
) %>%
  set_names() %>%
  map_dfr(read_excel, .id = "Source")

## A tibble: 6 x 3
#  Source                                                     Name  Value
#                                                         
#1 C:/Users/dc/Downloads/MultiFiles/another file/My.File3.xlsx E         5
#2 C:/Users/dc/Downloads/MultiFiles/another file/My.File3.xlsx F         6
#3 C:/Users/dc/Downloads/MultiFiles/MyFile1.xlsx               A         1
#4 C:/Users/dc/Downloads/MultiFiles/MyFile1.xlsx               B         2
#5 C:/Users/dc/Downloads/MultiFiles/MyFile2.xlsx               C         3
#6 C:/Users/dc/Downloads/MultiFiles/MyFile2.xlsx               D         4

 

Combine Excel files in R, including necessary transformations

If combining your files is not as simple as in the previous example, then here is how to do them with map_dfr.

The good news is that you can do some of the transformations while reading Excel files with read_excel. For example, you can read the first row as a header by using the parameter skip.

Some of to transformations for example with data types you can do after the loop that combines Excel files. In this situation, I will focus on something important inside the loop, for example, a column that contains an Excel file name or modification date. If you have to add Excel file creation or modification timestamp then here is how to do that.

require(purrr)
require(dplyr)
require(readxl)

xlsxfiles <-
  list.files(
    path = "C:/Users/dc/Downloads/MultiFiles",
    pattern = "*.xlsx",
    full.names = TRUE,
    recursive = TRUE
  )

map_df(xlsxfiles, function(x) {
  read_excel(x) %>%
    mutate("Source" = basename(x),
           "Mod DT" = file.info(x)$mtime)
})

## A tibble: 6 x 4   
#  Name  Value Source        `Mod DT`            
#                
#1 E         5 My.File3.xlsx 2022-08-26 18:58:11
#2 F         6 My.File3.xlsx 2022-08-26 18:58:11
#3 A         1 MyFile1.xlsx  2022-01-04 16:28:59
#4 B         2 MyFile1.xlsx  2022-01-04 16:28:59
#5 C         3 MyFile2.xlsx  2022-01-04 18:00:26
#6 D         4 MyFile2.xlsx  2022-01-04 18:00:26

Alternatively, you can get the same result by writing differently.

xlsxfiles %>%
  map_df(.x, .f = ~ read_excel(.x) %>% mutate(
    "Source" = basename(.x),
    "Mod DT" = file.info(.x)$mtime
  ))

 

You can export all collected data from multiple Excel files into a single Excel file. Here is how to save that into Excel using R.

If you want to add a progress bar to the purrr loop, look into this datacornering.com post.

script progress bar in R, purrr progress bar, lapply progress bar