Running, moving, rolling average in R, dplyr

You can calculate the moving average (also called a running or rolling average) in different ways by using R packages.

Running average with dplyr

Here is one scenario that can be handled with dplyr. I will use R’s built-in dataset airquality.

head(airquality)

#  Ozone Solar.R Wind Temp Month Day
#1    41     190  7.4   67     5   1
#2    36     118  8.0   72     5   2
#3    12     149 12.6   74     5   3
#4    18     313 11.5   62     5   4
#5    NA      NA 14.3   56     5   5
#6    28      NA 14.9   66     5   6

It contains daily wind measurements for each month. Let’s say I want to calculate the running average of wind speed within each month.

With dplyr, this can be done arithmetically. I will create a temporary column rec that holds 1 for every row. Within each month, cumsum(rec) counts the rows seen so far, and dividing cumsum(Wind) by that count gives the average wind speed up to each day. Strictly speaking, this is a cumulative average that grows from the start of the month rather than a fixed-width window.

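library(dplyr)

airquality <- airquality %>% 
  group_by(Month) %>%                              # restart the average each month
  mutate(rec = 1) %>%                              # helper column: 1 per row
  mutate(rollavg = cumsum(Wind)/cumsum(rec)) %>%   # running sum / running count
  select(-rec)                                     # drop the helper column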

head(as.data.frame(airquality))
#  Ozone Solar.R Wind Temp Month Day   rollavg
#1    41     190  7.4   67     5   1  7.400000
#2    36     118  8.0   72     5   2  7.700000
#3    12     149 12.6   74     5   3  9.333333
#4    18     313 11.5   62     5   4  9.875000
#5    NA      NA 14.3   56     5   5 10.760000
#6    28      NA 14.9   66     5   6 11.450000
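
As a cross-check, the same per-month running average can be sketched in base R without the helper column. This is a minimal sketch under a few assumptions, not the method used above: running_mean is a hypothetical helper name, aq is a fresh copy of the built-in dataset so the original stays untouched, and ave applies the helper within each month.

```r
# Fresh copy of the built-in dataset (hypothetical name aq)
aq <- datasets::airquality

# Hypothetical helper: running sum divided by the number of values seen so far
running_mean <- function(x) cumsum(x) / seq_along(x)

# ave() applies the helper separately within each Month and keeps row order
aq$rollavg <- ave(aq$Wind, aq$Month, FUN = running_mean)

head(aq$rollavg, 3)
# [1] 7.400000 7.700000 9.333333
```

Because cumsum propagates NA, a column with missing values (Ozone, for example) would produce NA from the first gap onward within each month. Wind has no missing values, so the issue does not arise here.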

If you don’t like a lot of decimal places, you can use rounding or formatting. Note that format returns character strings, so it is best applied as a last step, for display only.

airquality$rollavg <- format(airquality$rollavg, digits = 3)

head(as.data.frame(airquality))
#  Ozone Solar.R Wind Temp Month Day rollavg
#1    41     190  7.4   67     5   1    7.40
#2    36     118  8.0   72     5   2    7.70
#3    12     149 12.6   74     5   3    9.33
#4    18     313 11.5   62     5   4    9.88
#5    NA      NA 14.3   56     5   5   10.76
#6    28      NA 14.9   66     5   6   11.45
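
The difference between formatting and rounding can be sketched on a small vector. This is an illustrative example with assumed names (x, formatted, rounded), and the values are copied from the output above: format produces text, while round keeps the values numeric so they can still be used in calculations.

```r
# Illustrative values taken from the rollavg column above
x <- c(7.4, 7.7, 9.333333, 9.875, 10.76, 11.45)

formatted <- format(x, digits = 3)  # character strings, for display
rounded   <- round(x, 2)            # still numeric, usable in further maths

class(formatted)
# [1] "character"
class(rounded)
# [1] "numeric"
```

If the column is needed for later calculations, round is the safer choice. format fits best at the point where the table is printed or exported.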

Moving, rolling average in R

One of the most convenient ways to calculate a rolling average, or any other rolling statistic, in R is the RcppRoll package. It provides a family of functions whose names start with “roll…” that calculate the rolling mean, minimum, maximum, etc. The window width is a parameter, so you can calculate many variations: a 7-day rolling average, a 14-day rolling average, and so on. You can also use these functions inside dplyr mutate, like cumsum in the previous example.

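A 7-day moving average in R goes like this. With align = "right", each value averages the current day and the six days before it, and fill = NA pads the first six rows, where a full window is not yet available. Note that this call runs over the whole Wind column, so the windows are not restarted at month boundaries.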

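library(RcppRoll)

# Trailing 7-day window: the current day plus the six days before it
airquality$d7_rollavg <- roll_mean(airquality$Wind, n = 7, align = "right", fill = NA)

# Format for display only; this turns the column into character strings
airquality$d7_rollavg <- format(airquality$d7_rollavg, digits = 3)

head(as.data.frame(airquality), n = 10)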
#   Ozone Solar.R Wind Temp Month Day rollavg d7_rollavg
#1     41     190  7.4   67     5   1    7.40         NA
#2     36     118  8.0   72     5   2    7.70         NA
#3     12     149 12.6   74     5   3    9.33         NA
#4     18     313 11.5   62     5   4    9.88         NA
#5     NA      NA 14.3   56     5   5   10.76         NA
#6     28      NA 14.9   66     5   6   11.45         NA
#7     23     299  8.6   65     5   7   11.04      11.04
#8     19      99 13.8   59     5   8   11.39      11.96
#9      8      19 20.1   61     5   9   12.36      13.69
#10    NA     194  8.6   69     5  10   11.98      13.11
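
The same function can be combined with group_by and mutate so that the 7-day window restarts in every month. The following is a sketch under stated assumptions rather than a definitive recipe: it assumes dplyr and RcppRoll are installed, and it works on a fresh copy of the dataset (hypothetical name aq) so that the columns created above are not touched.

```r
library(dplyr)
library(RcppRoll)

# Fresh copy of the built-in dataset (hypothetical name aq)
aq <- datasets::airquality %>%
  group_by(Month) %>%
  # Trailing 7-day mean, computed separately within each month
  mutate(d7_rollavg = roll_mean(Wind, n = 7, align = "right", fill = NA)) %>%
  ungroup()

# In June the first six rows are NA because the window starts over
aq %>%
  filter(Month == 6) %>%
  head(7)
```

Whether the window should restart each month depends on the question being asked. For a continuous daily series, the ungrouped version above is usually the natural choice.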

 



