Here is how to calculate a cumulative sum or a cumulative count in R, using R's built-in datasets.
Cumulative sum in R
Here is data from the R built-in airmiles dataset. This data comes in time-series format, so first of all I will create a data frame.
The cumulative sum is calculated with the cumsum function.
am <- data.frame("year" = time(airmiles), "miles" = airmiles)
am$cm_miles <- cumsum(am$miles)
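To make the behaviour of cumsum concrete, here is a minimal sketch on a small hand-made vector (the values are illustrative and not taken from airmiles). It also shows that a missing value turns every later element of the running total into NA, which is worth knowing before applying it to real data.

```r
# Illustrative values only (not from the airmiles dataset)
x <- c(1, 2, 3, 4)
running_total <- cumsum(x)
print(running_total)  # 1 3 6 10

# A missing value makes that element and every later one NA
with_na <- cumsum(c(1, NA, 2))
print(with_na)  # 1 NA NA
```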
Cumulative calculation within a group in R
Sometimes the cumulative sum is needed within a group, for example within a year, a month, or any other grouping variable.
You can do it by adding group_by from dplyr.
library(dplyr)
# select data from Starwars dataset
sw <- starwars %>%
  select(name, species, mass) %>%
  filter(!is.na(mass)) %>%
  arrange(species)
# calculate cumulative sum of mass by species
sw <- sw %>%
  group_by(species) %>%
  mutate("cm_mass" = cumsum(mass))
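If you would rather not depend on dplyr, a grouped cumulative sum can also be sketched in base R with ave. This is an alternative to the approach above, not a replacement for it, and the toy data frame below is made up for illustration.

```r
# Hypothetical toy data; species and mass values are made up for illustration
toy <- data.frame(
  species = c("Droid", "Droid", "Human", "Human", "Human"),
  mass    = c(75, 32, 77, 136, 49)
)

# ave() applies FUN to mass separately within each species
# and returns the results in the original row order
toy$cm_mass <- ave(toy$mass, toy$species, FUN = cumsum)
print(toy$cm_mass)  # 75 107 77 213 262
```

Because ave keeps the original row order, the data does not have to be sorted by group first.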
In this example, I actually ran into the dplyr "unused argument" error, because a function named select also exists in the MASS package. Check out this post on how to deal with that.
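One common way around this kind of name clash is to call the function with its package prefix. The sketch below assumes dplyr is installed, and it is not necessarily the fix described in the linked post. The name sw_small is only used for illustration.

```r
library(dplyr)

# If MASS is attached after dplyr, a bare select() resolves to MASS::select.
# The dplyr:: prefix makes the call unambiguous regardless of load order.
sw_small <- starwars %>%
  dplyr::select(name, species, mass)

print(names(sw_small))  # "name" "species" "mass"
```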
Cumulative count or group index in R
You can do it in at least two different ways: by taking the cumulative sum of a column of ones within each group, or with rowid from data.table. I'm continuing the previous example.
# create column with 1
sw$rec <- 1
# calculate cumulative count
sw <- sw %>%
  group_by(species) %>%
  mutate("cm_count" = cumsum(rec)) %>%
  select(-rec) # remove column that is not needed
# alternative: rowid from data.table
library(data.table)
sw$cm_count2 <- rowid(sw$species)
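A third option, sketched here in base R only, is to let ave number the rows inside each group with seq_along. The species vector below is hypothetical toy data, not the Starwars dataset.

```r
# Hypothetical toy data for illustration
species <- c("Droid", "Droid", "Human", "Human", "Human")

# seq_along() numbers the rows 1..n inside each species group
cm_count <- ave(seq_along(species), species, FUN = seq_along)
print(cm_count)  # 1 2 1 2 3
```

This avoids both the helper column of ones and the extra package, at the cost of a slightly less readable call.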