Use R dplyr mutate to add new and remove existing columns

The main purpose of the dplyr function mutate is to add new variables computed from existing ones. Here is how to use mutate to add a new column and, in the same call, remove the existing columns you no longer need. This removes an extra step from your data-wrangling process.

The function mutate is one of the most useful functions in the dplyr package. If you want to learn more of my favorite dplyr tips and tricks, take a look at this post.

Use dplyr mutate to add and remove existing data frame columns in R

Adding and removing columns in a single step is not obvious if you don't know about mutate and the argument that lets you do it quickly.

Some tasks are relatively simple, such as overwriting a single column, replacing values in a data frame, or running a simple calculation like rounding across several columns.
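For reference, here is a minimal sketch of two of those simpler tasks on the built-in airquality data: overwriting an existing column in place and rounding several columns at once with across() (available in dplyr 1.0.0 and later). Converting Temp from Fahrenheit to Celsius is only an example transformation chosen for illustration.

```r
library(dplyr)

aq <- airquality %>%
  # Overwrite an existing column: Temp is recorded in degrees Fahrenheit,
  # so convert it to Celsius in place (no new column is created)
  mutate(Temp = (Temp - 32) * 5 / 9) %>%
  # Round several numeric columns at once with across()
  mutate(across(c(Wind, Temp), round))

head(aq)
```

Because the result keeps the original column names, nothing is added or removed here, which is what makes these cases simple.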

Multiple or more complex transformations take a bit more work. For example, here is the built-in airquality data frame with daily air quality measurements taken in New York from May to September 1973.

head(airquality)

#  Ozone Solar.R Wind Temp Month Day
#1    41     190  7.4   67     5   1
#2    36     118  8.0   72     5   2
#3    12     149 12.6   74     5   3
#4    18     313 11.5   62     5   4
#5    NA      NA 14.3   56     5   5
#6    28      NA 14.9   66     5   6

The data contains month and day numbers, but I would like to add a new column with the full date and, at the same time, remove the Month and Day columns. There is more on how to combine separate components into a date in R here, but in this scenario I will use the base R functions ISOdate and as.Date.

library(dplyr)

airquality %>%
  mutate(Date = as.Date(ISOdate(1973, Month, Day))) %>%
  head()

#  Ozone Solar.R Wind Temp Month Day       Date
#1    41     190  7.4   67     5   1 1973-05-01
#2    36     118  8.0   72     5   2 1973-05-02
#3    12     149 12.6   74     5   3 1973-05-03
#4    18     313 11.5   62     5   4 1973-05-04
#5    NA      NA 14.3   56     5   5 1973-05-05
#6    28      NA 14.9   66     5   6 1973-05-06

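By default, mutate keeps all existing columns, so Month and Day are still in the result and would have to be removed in a separate step.

Solution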

The mutate argument .keep (added in dplyr 1.0.0) lets you create new variables and drop the columns used to compute them in the same call. With .keep = "unused", the columns used to create the new variable are removed from the result.

airquality %>%
  mutate(Date = as.Date(ISOdate(1973, Month, Day)),
         .keep = "unused") %>%
  head()

#  Ozone Solar.R Wind Temp       Date
#1    41     190  7.4   67 1973-05-01
#2    36     118  8.0   72 1973-05-02
#3    12     149 12.6   74 1973-05-03
#4    18     313 11.5   62 1973-05-04
#5    NA      NA 14.3   56 1973-05-05
#6    28      NA 14.9   66 1973-05-06
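Besides "unused", .keep accepts "all" (the default), "used", and "none". The sketch below compares the columns each setting returns for the same Date column; make_date() is a hypothetical helper written only for this comparison.

```r
library(dplyr)

# Hypothetical helper: build the Date column with a given .keep setting
make_date <- function(keep) {
  airquality %>%
    mutate(Date = as.Date(ISOdate(1973, Month, Day)), .keep = keep)
}

names(make_date("all"))     # every original column plus Date
names(make_date("used"))    # only the inputs Month and Day, plus Date
names(make_date("unused"))  # Ozone, Solar.R, Wind, Temp, plus Date
names(make_date("none"))    # only Date, like transmute()
```

The "used" setting can be handy for checking a calculation, because it puts the inputs right next to the new variable.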

Remove existing columns after mutate with the function select

If you prefer not to use .keep, you can follow mutate with the dplyr function select and list the columns to drop. This approach also works well when you need some of the original columns a little longer, for example for filtering or grouping, and want to drop them only at the end.

airquality %>%
  mutate(Date = as.Date(ISOdate(1973, Month, Day))) %>%
  select(-Month, -Day) %>%
  head()

#  Ozone Solar.R Wind Temp       Date
#1    41     190  7.4   67 1973-05-01
#2    36     118  8.0   72 1973-05-02
#3    12     149 12.6   74 1973-05-03
#4    18     313 11.5   62 1973-05-04
#5    NA      NA 14.3   56 1973-05-05
#6    28      NA 14.9   66 1973-05-06
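Here is a minimal sketch of keeping a column a little longer: Month is used once more in filter() before select() drops it. Filtering to July is only an example condition.

```r
library(dplyr)

july <- airquality %>%
  mutate(Date = as.Date(ISOdate(1973, Month, Day))) %>%
  # Month is still available here, so it can be used for filtering
  filter(Month == 7) %>%
  # Drop the date components only once they are no longer needed
  select(-Month, -Day)

head(july)
```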

As you can see, the dplyr function select can not only keep the variables you list but also remove and rename them. More on that here.
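A minimal sketch of those three uses of select, applied to the airquality data with the new Date column. The particular columns are arbitrary examples.

```r
library(dplyr)

aq_dates <- airquality %>%
  mutate(Date = as.Date(ISOdate(1973, Month, Day)), .keep = "unused")

# Keep: list only the columns you want, in the order you want them
kept <- aq_dates %>% select(Date, Ozone, Temp)

# Remove: prefix unwanted columns with a minus sign
dropped <- aq_dates %>% select(-Solar.R, -Wind)

# Rename while selecting: new_name = old_name
renamed <- aq_dates %>% select(Date, Temperature = Temp)

head(renamed)
```

If you only want to rename a column and keep all the others, dplyr's rename() does that without dropping anything.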