weighted mean in R

How to calculate weighted mean in R

Calculation of weighted mean in R is done easily by dedicated function or mathematically. Here are a couple of situations and solutions to successfully do weighted mean calculations in R.

Excel users might prefer ar term weighted average, but actually, there might be a difference between mean and average. Mostly in terminology.

 

Weighted mean in R

Here is my data frame. The third column already contains calculations that come from different amounts. To calculate the mean from all the separate means, it should be weighted.

df <- data.frame(
  'cat' = c('A', 'A', 'B', 'B'),
  'var amount' = c(88, 31, 84, 41),
  'var mean time' = c(312, 437, 211, 818)
)

df

#  cat var.amount var.mean.time
#1   A         88           312
#2   A         31           437
#3   B         84           211
#4   B         41           818

A very convenient way to calculate the weighted mean in R is by using weighted.mean function that comes from the stats package.

weighted.mean(df$var.mean.time, df$var.amount)

#[1] 378.1352

If there are NA values in your data, weighted.mean have an additional argument to ignore those. An example of that is below.

 

Calculate weighted mean in dplyr pipe

If you like to use dplyr and want to calculate the weighted mean by using the capabilities of this package, then here is how to do that.

require(dplyr)

df %>% summarise(wm_var = weighted.mean(var.mean.time, var.amount))

#    wm_var
#1 378.1352

 

Weighted mean by group in R

If you want to calculate the weighted mean by subcategories in your data, then it is easily done by adding the group_by function.

df %>%
  group_by(cat) %>%
  summarise(wm_var = weighted.mean(var.mean.time, var.amount)) %>% as.data.frame()

#  cat  wm_var
#1   A 344.563
#2   B 410.096

The results of the weighted.mean function might appear without decimal places in the console, but I fixed that by transforming the result into the data frame.

Similarly, you can calculate cumulative sum, count, percentage, or do other calculations by a group in R.

 

Weighted mean across columns in R

Here is how to calculate weighted mean using several columns within the R data frame. Here is a data frame that has weights and means separated into columns. To calculate the weighted mean, there is necessary to do that for each row in a data frame and by using relevant pairs of columns.

df <- data.frame(
  'var1 amount' = c(88, 31, 84),
  'var1 mean time' = c(312, 437, 211),
  'var2 amount' = c(75, 45, 91),
  'var3 mean time' = c(761, 321, 752)
)


df

#  var1.amount var1.mean.time var2.amount var3.mean.time
#1          88            312          75            761
#2          31            437          45            321
#3          84            211          91            752

Package dplyr has a set of useful functions that help to select necessary columns and do a calculation across them.

require(dplyr)

df %>%
  rowwise() %>%
  mutate("wt_mean" = weighted.mean(across(ends_with("time")), across(ends_with("amount")))) %>%
  as.data.frame()

#  var1.amount var1.mean.time var2.amount var3.mean.time  wt_mean
#1          88            312          75            761 518.5951
#2          31            437          45            321 368.3158
#3          84            211          91            752 492.3200

Here is a different way to do this calculation by multiplying pairs of columns and summarising them in rows.

df %>% 
  mutate("wt mean" = rowSums(across(ends_with("amount")) * across(ends_with("time"))) / rowSums(across(ends_with("amount"))))

#  var1.amount var1.mean.time var2.amount var3.mean.time  wt mean
#1          88            312          75            761 518.5951
#2          31            437          45            321 368.3158
#3          84            211          91            752 492.3200

 

If your data frame has NA values, you can easily ignore them with an additional argument.

df <- data.frame(
'var1 amount' = c(88, 31, 84),
'var1 mean time' = c(312, 437, 211),
'var2 amount' = c(75, 45, NA),
'var3 mean time' = c(761, 321, NA)
)

df %>%
  rowwise() %>%
  mutate("wt_mean" = weighted.mean(across(ends_with("time")), across(ends_with("amount")), na.rm = T)) %>% 
  as.data.frame()

#  var1.amount var1.average.time var2.amount var3.average.time  wt_mean
#1          88               312          75               761 518.5951
#2          31               437          45               321 368.3158
#3          84               211          NA                NA 211.0000

If you want to round the result of weighted mean, then check out this post to avoid unexpected results.

Here are my top 10 favorite dplyr tips and tricks that are a great addition to this post.

If you are new to R but an experienced Excel user, you can compare the result of the weighted average in Excel to double-check.





Posted

in

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *