Calculation of weighted mean in R is done easily by dedicated function or mathematically. Here are a couple of situations and solutions to successfully do weighted mean calculations in R.
Excel users might prefer ar term weighted average, but actually, there might be a difference between mean and average. Mostly in terminology.
Weighted mean in R
Here is my data frame. The third column already contains calculations that come from different amounts. To calculate the mean from all the separate means, it should be weighted.
df <- data.frame( 'cat' = c('A', 'A', 'B', 'B'), 'var amount' = c(88, 31, 84, 41), 'var mean time' = c(312, 437, 211, 818) ) df # cat var.amount var.mean.time #1 A 88 312 #2 A 31 437 #3 B 84 211 #4 B 41 818
A very convenient way to calculate the weighted mean in R is by using weighted.mean function that comes from the stats package.
weighted.mean(df$var.mean.time, df$var.amount) #[1] 378.1352
If there are NA values in your data, weighted.mean have an additional argument to ignore those. An example of that is below.
Calculate weighted mean in dplyr pipe
If you like to use dplyr and want to calculate the weighted mean by using the capabilities of this package, then here is how to do that.
require(dplyr) df %>% summarise(wm_var = weighted.mean(var.mean.time, var.amount)) # wm_var #1 378.1352
Weighted mean by group in R
If you want to calculate the weighted mean by subcategories in your data, then it is easily done by adding the group_by function.
df %>% group_by(cat) %>% summarise(wm_var = weighted.mean(var.mean.time, var.amount)) %>% as.data.frame() # cat wm_var #1 A 344.563 #2 B 410.096
The results of the weighted.mean function might appear without decimal places in the console, but I fixed that by transforming the result into the data frame.
Similarly, you can calculate cumulative sum, count, percentage, or do other calculations by a group in R.
Weighted mean across columns in R
Here is how to calculate weighted mean using several columns within the R data frame. Here is a data frame that has weights and means separated into columns. To calculate the weighted mean, there is necessary to do that for each row in a data frame and by using relevant pairs of columns.
df <- data.frame( 'var1 amount' = c(88, 31, 84), 'var1 mean time' = c(312, 437, 211), 'var2 amount' = c(75, 45, 91), 'var3 mean time' = c(761, 321, 752) ) df # var1.amount var1.mean.time var2.amount var3.mean.time #1 88 312 75 761 #2 31 437 45 321 #3 84 211 91 752
Package dplyr has a set of useful functions that help to select necessary columns and do a calculation across them.
require(dplyr) df %>% rowwise() %>% mutate("wt_mean" = weighted.mean(across(ends_with("time")), across(ends_with("amount")))) %>% as.data.frame() # var1.amount var1.mean.time var2.amount var3.mean.time wt_mean #1 88 312 75 761 518.5951 #2 31 437 45 321 368.3158 #3 84 211 91 752 492.3200
Here is a different way to do this calculation by multiplying pairs of columns and summarising them in rows.
df %>% mutate("wt mean" = rowSums(across(ends_with("amount")) * across(ends_with("time"))) / rowSums(across(ends_with("amount")))) # var1.amount var1.mean.time var2.amount var3.mean.time wt mean #1 88 312 75 761 518.5951 #2 31 437 45 321 368.3158 #3 84 211 91 752 492.3200
If your data frame has NA values, you can easily ignore them with an additional argument.
df <- data.frame( 'var1 amount' = c(88, 31, 84), 'var1 mean time' = c(312, 437, 211), 'var2 amount' = c(75, 45, NA), 'var3 mean time' = c(761, 321, NA) ) df %>% rowwise() %>% mutate("wt_mean" = weighted.mean(across(ends_with("time")), across(ends_with("amount")), na.rm = T)) %>% as.data.frame() # var1.amount var1.average.time var2.amount var3.average.time wt_mean #1 88 312 75 761 518.5951 #2 31 437 45 321 368.3158 #3 84 211 NA NA 211.0000
If you want to round the result of weighted mean, then check out this post to avoid unexpected results.
Here are my top 10 favorite dplyr tips and tricks that are a great addition to this post.
If you are new to R but an experienced Excel user, you can compare the result of the weighted average in Excel to double-check.
Leave a Reply