There are at least two ways how to fill down values in R data frame columns. The fastest way is to use zoo package function na.locf that is combined with dplyr function mutate. Besides, sometimes in R, it is necessary to replace NA with 0. All of those things in this post.
Here are my data frame and code for that.
df <- data.frame( Date = as.POSIXct(c("2019-12-01", "2019-12-02", "2019-12-03", "2019-12-04", "2019-12-05", "2019-12-06", "2019-12-07", "2019-12-08", "2019-12-09", "2019-12-10")), Sales1 = as.numeric(c("2", NA, "7", NA, NA, "9", "12", "14", NA, NA)), Sales2 = as.numeric(c(NA, "8", "9", "10", "13", NA, "17", NA, NA, "20")), Sales3 = as.numeric(c(NA, NA, "3", "5", "7", "10", NA, "14", "19", "22")) )
As a result, columns should be filled like this.
Fill data frame values with fill function from the tidyr package
Tidyr has a handy function fill. This approach is easy to understand but pretty slow in large data frames with more than 100 000 rows.
In this case, I’m interested to fill values in all columns. To save some time, you can call all data frame columns with function names or colnames.
df <- data.frame( Date = as.POSIXct(c("2019-12-01", "2019-12-02", "2019-12-03", "2019-12-04", "2019-12-05", "2019-12-06", "2019-12-07", "2019-12-08", "2019-12-09", "2019-12-10")), Sales1 = as.numeric(c("2", NA, "7", NA, NA, "9", "12", "14", NA, NA)), Sales2 = as.numeric(c(NA, "8", "9", "10", "13", NA, "17", NA, NA, "20")), Sales3 = as.numeric(c(NA, NA, "3", "5", "7", "10", NA, "14", "19", "22")) ) require(tidyr) require(dplyr) df <- df %>% fill(names(df), .direction = "down")
Another option is to select all columns with dplyr function everything.
df <- df %>% fill(everything(), .direction = "down")
Fill R data frame NA values with 0
In some cases, there is necessary to replace NA with 0. As you can see in the previous figure, some of the columns start with NA, and that might be logical. In R, you can do it by using square brackets.
# replace NA with 0 df[is.na(df)] <- 0
Fill R data frame values with na.locf function from zoo package
This approach is the fastest. There is a handy zoo package function na.locf that replaces NA value with the most recent non-NA value. In combination with mutate it can replace existing columns. It is not so flexible as tidyr fill where you can change filling options but on large data frames at least a few times faster.
df <- data.frame( Date = as.POSIXct(c("2019-12-01", "2019-12-02", "2019-12-03", "2019-12-04", "2019-12-05", "2019-12-06", "2019-12-07", "2019-12-08", "2019-12-09", "2019-12-10")), Sales1 = as.numeric(c("2", NA, "7", NA, NA, "9", "12", "14", NA, NA)), Sales2 = as.numeric(c(NA, "8", "9", "10", "13", NA, "17", NA, NA, "20")), Sales3 = as.numeric(c(NA, NA, "3", "5", "7", "10", NA, "14", "19", "22")) ) require(zoo) require(dplyr) df <- df %>% mutate( Sales1 = na.locf(Sales1, na.rm = F), Sales2 = na.locf(Sales2, na.rm = F), Sales3 = na.locf(Sales3, na.rm = F) ) # replace NA with 0 df[is.na(df)] <- 0
If you are new to R, check out this other post on how to switch from Excel to R.
Leave a Reply