Excel is possibly the most popular tool in work with data, and it is good to know how to read Excel files in R. Unfortunately, there are multiple possible scenarios and Excel file formats that might be necessary to deal with. Some of the problems by reading Excel files in R might be prevented by educating file authors.
In that case, this article about data organization in spreadsheets is my number one recommendation.
My Excel files
Here is my collection of various Excel files in the Windows Downloads folder that I will use.
The content is simple. For example, here is the data in MyFile.xlsx.
To read the file content, I will create a path that contains my Windows username.
username <- Sys.getenv("USERNAME") directory <- paste0("C:/Users/", username, "/Downloads/") filename <- "MyFile.xlsx"
Excel file content summary in R
Sometimes it is necessary to take a look at the Excel file content without opening it. One of the best approaches is by using the capabilities of package tidyxl.
In the Excel file, I have 4 sheets. One of them is hidden, and the other is very hidden. As you can see, that is not a problem.
filename <- "MyFile2Sheets.xlsx" tidyxl::xlsx_sheet_names(paste0(directory, filename)) #[1] "Sheet1" "Sheet2" "HiddenSheet" "VeryHiddenSheet"
Here is how to get a much richer Excel file summary in R.
require(tidyverse) tidyxl::xlsx_cells(paste0(directory, filename)) # A tibble: 24 x 21 # sheet address row col is_blank data_type error logical numeric date character # # 1 Sheet1 A1 1 1 FALSE character NA NA NA NA Name # 2 Sheet1 B1 1 2 FALSE character NA NA NA NA Value # 3 Sheet1 A2 2 1 FALSE character NA NA NA NA A # 4 Sheet1 B2 2 2 FALSE numeric NA NA 1 NA NA # 5 Sheet1 A3 3 1 FALSE character NA NA NA NA B # 6 Sheet1 B3 3 2 FALSE numeric NA NA 2 NA NA # 7 Sheet2 A1 1 1 FALSE character NA NA NA NA Name # 8 Sheet2 B1 1 2 FALSE character NA NA NA NA Value # 9 Sheet2 A2 2 1 FALSE character NA NA NA NA C #10 Sheet2 B2 2 2 FALSE numeric NA NA 3 NA NA # ... with 14 more rows, and 10 more variables: character_formatted , formula , is_array , # formula_ref , formula_group , comment , height , width , style_format , # local_format_id
read Excel file content by using Windows clipboard in R
If your Excel file is already open and you want to do a quick analysis, then it is possible to get data via clipboard. Here is a post that explains 3 ways how to copy data from Excel to R or vice versa. It is great if it is possible to keep it simple.
read Excel files in R without opening them
There are at least 10 R packages that can help you to get data from Excel in R. Here are 2 of them that I found versatile and useful in working with Excel files.
read Excel files in R with readxl
One of my favorites is the readxl package. By using the read_excel function, you can easily read content from xlsx, xls, and xlsm files.
require(readxl) filename <- "MyFile.xlsx" read_excel(paste0(directory, filename)) # A tibble: 2 x 2 # Name Value # #1 A 1 #2 B 2
If your Excel file has multiple sheets you can specify which one is necessary.
filename <- "MyFile2Sheets.xlsx" excel_sheets(paste0(directory, filename)) #[1] "Sheet1" "Sheet2" "HiddenSheet" "VeryHiddenSheet" read_excel(paste0(directory, filename), sheet = "Sheet2") # A tibble: 2 x 2 # Name Value # #1 C 3 #2 D 4
If you have a complex data structure in an Excel sheet, I recommend taking a look at this post about the combination of tidyxl and readxl capabilities.
It looks like the readxl package cannot read xlsb format files, but that is not a problem for excel.link package.
read Excel files in R with excel.link
This package can read Excel file formats like xls, xslb, xlsm, and xlsx. If you have a password-protected Excel file, that is not a problem for this package. This set of such properties makes this package my favorite.
Here is a simple situation with importing an xlsx format file.
require(excel.link) filename <- "MyFile.xlsx" xl.read.file(paste0(directory, filename)) # Name Value #2 A 1 #3 B 2
Here is a situation with importing the xlsb file in R.
filename <- "MyFile.xlsb" xl.read.file(paste0(directory, filename)) # Name Value #2 A 1 #3 B 2
read password protected Excel files in R
If you have a password-protected Excel file and you know the password, you can read its content by using additional parameters of xl.read.file from excel.link package. In my case, there is a password-encrypted Excel file.
require(excel.link) filename <- "MyFileProtected.xlsx" xl.read.file(paste0(directory, filename), password = "123") # Name Value #2 A 1 #3 B 2
read Azure Information Protection encrypted Excel files in R
If you want to read the Azure Information Protection encrypted file in R, try to use xl.read.file function from excel.link package. It worked for me.
But if you are using the readxl package, you will get the error message “Evaluation error: zip file”. Besides using excel.link package, I can think about the other three rational solutions.
Firstly, change the Sensitivity level to the public in Excel by using a button in the Excel ribbon.
Secondly, sometimes copying data from Excel to R via the clipboard is enough. Don’t overcomplicate if it is not necessary.
Thirdly, try to use the data in a different file format like CSV or change the data source. Sometimes Excel is a data export result from other data sources. CSV is a great format if you have only plain data. You can use that as a data source in multiple tools like Excel, Power BI, or R.
By the way, here is an example of how to combine multiple files in R.
read and combine multiple Excel sheets in R
Here is how to combine multiple Excel sheets and add sheet names.
require(purrr) require(readxl) filename <- "MyFile2Sheets.xlsx" directory <- paste0("C:/Users/", username, "/Downloads/") paste0(directory, filename) %>% excel_sheets() %>% set_names() %>% map_df( ~ read_excel(path = paste0(directory, filename), sheet = .x), .id = "sheet") # A tibble: 8 x 3 # sheet Name Value # #1 Sheet1 A 1 #2 Sheet1 B 2 #3 Sheet2 C 3 #4 Sheet2 D 4 #5 HiddenSheet from hidden 0 #6 HiddenSheet from hidden 0 #7 VeryHiddenSheet from veryhidden 0 #8 VeryHiddenSheet from veryhidden 0
More of that you can find here.
read and combine multiple Excel files in R
Below is a similar approach that I used to combine multiple RDS files in R. By using function set_names, you can add a column with the file name.
require(purrr) require(readxl) username <- Sys.getenv("USERNAME") path <- paste0("C:/Users/", username, "/Downloads/MultiFiles/") list.files( path = path, pattern = "*.xlsx", full.names = TRUE ) %>% set_names() %>% map_dfr(read_excel, .id = "source")
Besides password-protected or encrypted files, there are plenty of things that might be problematic when reading Excel file content in R.
Be prepared to deal with data wrangling with these dplyr tips and tricks.
Leave a Reply