Here are two quick solutions to combine Excel files in R with the necessary transformations. Fundamentally it is a similar approach that I used to combine multiple RDS files in R.
In this post, I will take a look at two scenarios of how to combine Excel files in R. In both of them, I will use the function map_dfr from the purrr and the function read_excel from the readxl. There is a couple of ways how to read data from an Excel file in R and here is a compilation of them.
The usage of purrr I found more convenient than other ways to work with loops in R.
This post will also include path detection of multiple files in R.
Simple scenario
If there is not necessary to do additional data transformations to successfully combine Excel files in R, then here is how to do that. This approach comes from one of the previous posts that contain multiple ways how to read data from Excel workbooks or multiple sheets. I highly recommend you this post because it contains multiple ways how to import data from Excel in R. In your situation read_excel from the readxl package might not be ideal.
My Excel files are located in one directory and one subdirectory.
The content is a table in one worksheet.
Here is a combination of all of them. Package purrr with set_names can handle additional columns with file ID.
require(purrr) require(readxl) path <- "C:/Users/dc/Downloads/MultiFiles" list.files( path = path, pattern = "*.xlsx", full.names = TRUE, recursive = TRUE ) %>% set_names() %>% map_dfr(read_excel, .id = "Source") ## A tibble: 6 x 3 # Source Name Value # #1 C:/Users/dc/Downloads/MultiFiles/another file/My.File3.xlsx E 5 #2 C:/Users/dc/Downloads/MultiFiles/another file/My.File3.xlsx F 6 #3 C:/Users/dc/Downloads/MultiFiles/MyFile1.xlsx A 1 #4 C:/Users/dc/Downloads/MultiFiles/MyFile1.xlsx B 2 #5 C:/Users/dc/Downloads/MultiFiles/MyFile2.xlsx C 3 #6 C:/Users/dc/Downloads/MultiFiles/MyFile2.xlsx D 4
Combine Excel files in R, including necessary transformations
If combining your files is not as simple as in the previous example, then here is how to do them with map_dfr.
The good news is that you can do some of the transformations while reading Excel files with read_excel. For example, you can read the first row as a header by using the parameter skip.
Some of to transformations for example with data types you can do after the loop that combines Excel files. In this situation, I will focus on something important inside the loop, for example, a column that contains an Excel file name or modification date. If you have to add Excel file creation or modification timestamp then here is how to do that.
require(purrr) require(dplyr) require(readxl) xlsxfiles <- list.files( path = "C:/Users/dc/Downloads/MultiFiles", pattern = "*.xlsx", full.names = TRUE, recursive = TRUE ) map_df(xlsxfiles, function(x) { read_excel(x) %>% mutate("Source" = basename(x), "Mod DT" = file.info(x)$mtime) }) ## A tibble: 6 x 4 # Name Value Source `Mod DT` # #1 E 5 My.File3.xlsx 2022-08-26 18:58:11 #2 F 6 My.File3.xlsx 2022-08-26 18:58:11 #3 A 1 MyFile1.xlsx 2022-01-04 16:28:59 #4 B 2 MyFile1.xlsx 2022-01-04 16:28:59 #5 C 3 MyFile2.xlsx 2022-01-04 18:00:26 #6 D 4 MyFile2.xlsx 2022-01-04 18:00:26
Alternatively, you can get the same result by writing differently.
xlsxfiles %>% map_df(.x, .f = ~ read_excel(.x) %>% mutate( "Source" = basename(.x), "Mod DT" = file.info(.x)$mtime ))
You can export all collected data from multiple Excel files into a single Excel file. Here is how to save that into Excel using R.
If you want to add a progress bar to the purrr loop, look into this datacornering.com post.
Leave a Reply