Running, moving, rolling average in R, dplyr

You can calculate the moving average (also called a running or rolling average) in different ways by using R packages.

Running average with dplyr

Here is one scenario that can be handled with dplyr. I will use R’s built-in airquality dataset.

head(airquality)

#  Ozone Solar.R Wind Temp Month Day
#1    41     190  7.4   67     5   1
#2    36     118  8.0   72     5   2
#3    12     149 12.6   74     5   3
#4    18     313 11.5   62     5   4
#5    NA      NA 14.3   56     5   5
#6    28      NA 14.9   66     5   6

It contains a wind measurement for every day of each month. Let’s say I want to calculate the running average within each month.

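With dplyr, this can be done arithmetically. I will create a temporary column rec filled with ones. The base function cumsum returns a cumulative sum, so dividing the cumulative sum of Wind by the cumulative sum of rec (the number of days so far) gives the average wind speed up to each day. Because of group_by(Month), the calculation restarts on the first day of every month, so the result is a cumulative average per month rather than a fixed-window moving average.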

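library(dplyr)

airquality <- airquality %>%
  group_by(Month) %>%                               # restart the calculation every month
  mutate(rec = 1) %>%                               # temporary counter column of ones
  mutate(rollavg = cumsum(Wind) / cumsum(rec)) %>%  # running sum / running count
  select(-rec)                                      # drop the temporary column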

head(as.data.frame(airquality))
#  Ozone Solar.R Wind Temp Month Day   rollavg
#1    41     190  7.4   67     5   1  7.400000
#2    36     118  8.0   72     5   2  7.700000
#3    12     149 12.6   74     5   3  9.333333
#4    18     313 11.5   62     5   4  9.875000
#5    NA      NA 14.3   56     5   5 10.760000
#6    28      NA 14.9   66     5   6 11.450000
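
As a cross-check, the same per-month cumulative average can be sketched without the temporary rec column by dividing by dplyr’s row_number(), which counts rows within each group. This is only an alternative sketch, not a replacement for the code above; the names aq and cumavg are made up for illustration, and it works on a fresh copy of the built-in dataset.

```r
library(dplyr)

# Work on a fresh copy so the airquality object modified above is left untouched.
# `aq` and `cumavg` are illustrative names, not part of the original example.
aq <- datasets::airquality %>%
  group_by(Month) %>%
  # row_number() is 1, 2, 3, ... within each month, the same as cumsum(rec)
  mutate(cumavg = cumsum(Wind) / row_number()) %>%
  ungroup()

# Wind has no missing values here; with NAs, cumsum would return NA
# from the first missing value onwards.
head(as.data.frame(aq))
```

Ending with ungroup() is a design choice: it returns an ordinary ungrouped tibble, so later steps do not silently run per month.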

If you don’t like so many decimal places, you can use rounding or formatting. Note that format returns character strings, so the column is no longer numeric afterwards; use round if you still need to calculate with it.

airquality$rollavg <- format(airquality$rollavg, digits = 3)

head(as.data.frame(airquality))
#  Ozone Solar.R Wind Temp Month Day rollavg
#1    41     190  7.4   67     5   1    7.40
#2    36     118  8.0   72     5   2    7.70
#3    12     149 12.6   74     5   3    9.33
#4    18     313 11.5   62     5   4    9.88
#5    NA      NA 14.3   56     5   5   10.76
#6    28      NA 14.9   66     5   6   11.45
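
To make the difference between the two options concrete, here is a small sketch on a plain vector. The values are copied from the rollavg column shown earlier, and the variable names are illustrative only.

```r
# Illustrative values taken from the rollavg output shown earlier
x <- c(7.4, 7.7, 9.333333, 9.875, 10.76, 11.45)

rounded   <- round(x, 2)            # stays numeric
formatted <- format(x, digits = 3)  # becomes character

is.numeric(rounded)      # TRUE
is.character(formatted)  # TRUE

# Arithmetic still works on the rounded values,
# whereas mean(formatted) would only return NA with a warning.
mean(rounded)
```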

Moving, rolling average in R

One of the most convenient ways to calculate a rolling average in R, or any other rolling statistic, is the RcppRoll package. It has a lot of functions that start with “roll…” and calculate the rolling average, rolling minimum, maximum, etc. You can also choose the window size freely: a 7-day rolling average, a 14-day rolling average, etc. These functions can also be used inside dplyr mutate, like cumsum in the previous example.

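A 7-day moving average in R looks like this. With align = "right", each value is the mean of the current day and the six days before it, so the first six rows are NA. The call works on the whole Wind column, so the window also runs across month boundaries.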

library(RcppRoll)

airquality$d7_rollavg <- roll_mean(airquality$Wind, n = 7, align = "right", fill = NA)

airquality$d7_rollavg <- format(airquality$d7_rollavg, digits = 3)

head(as.data.frame(airquality), n = 10)
#   Ozone Solar.R Wind Temp Month Day rollavg d7_rollavg
#1     41     190  7.4   67     5   1    7.40         NA
#2     36     118  8.0   72     5   2    7.70         NA
#3     12     149 12.6   74     5   3    9.33         NA
#4     18     313 11.5   62     5   4    9.88         NA
#5     NA      NA 14.3   56     5   5   10.76         NA
#6     28      NA 14.9   66     5   6   11.45         NA
#7     23     299  8.6   65     5   7   11.04      11.04
#8     19      99 13.8   59     5   8   11.39      11.96
#9      8      19 20.1   61     5   9   12.36      13.69
#10    NA     194  8.6   69     5  10   11.98      13.11
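
If the window should not cross month boundaries, roll_mean can be called inside a grouped mutate, in the same way as cumsum earlier. The following is a minimal sketch under that assumption; aq7 and d7_by_month are illustrative names, and it uses a fresh copy of the built-in dataset.

```r
library(dplyr)
library(RcppRoll)

# `aq7` and `d7_by_month` are illustrative names, not part of the original example.
aq7 <- datasets::airquality %>%
  group_by(Month) %>%
  # trailing 7-day mean, computed separately within each month
  mutate(d7_by_month = roll_mean(Wind, n = 7, align = "right", fill = NA)) %>%
  ungroup()

# The first six days of every month are NA because the window is not yet full
aq7 %>%
  filter(Day <= 8) %>%
  select(Month, Day, Wind, d7_by_month) %>%
  head(10)
```

Whether to group depends on the question being asked: a per-month window loses the first six days of every month, while the ungrouped version mixes the end of one month with the start of the next.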

 





Comments

2 responses to “Running, moving, rolling average in R, dplyr”

  1. Lukas

    Hi Janis,
    just stumbled across this blog post.

    I do believe your construction “rollavg = cumsum(Wind)/cumsum(rec)” does not really provide you with the rolling average.
    For example, the first entry in every month will always give you the value itself (since the first element in cumsum will be the first element in the array you are summing), and not an average. This is not a general property of the rolling average.

    Best, Lukas

    1. Janis Sturis

      Thank you for your comment, Lukas!
      You’re right, in that specific case group_by(Month) does that. It is something like a cumulative moving average for each month.
      The article will need to be improved.
