The dplyr function mutate might be one of the most popular functions in R. It is used, for example, to create a new data frame column. It is easy to use, and that might be the reason why some R users never get to know everything the function can do.
Here are 8 examples of how to use dplyr mutate in R.
- Add a new data frame column with mutate in a specific location
- Add multiple data frame columns with mutate in R
- Use newly created variables inside the next variables within mutate in R
- Add a new data frame column and drop used columns with mutate in R
- Use mutate together with across
- Use mutate together with recode
- Dplyr mutate together with if_else or case_when
- Debug mutate results with the function browser
Here is a simple example that shows you how to add a new column by using the function mutate from dplyr. The airquality dataset contains day and month numbers from the year 1973, and the new column contains the full date.
library(dplyr)

airquality %>%
  mutate("Date" = as.Date(ISOdate(1973, Month, Day))) %>%
  head()

#  Ozone Solar.R Wind Temp Month Day       Date
#1    41     190  7.4   67     5   1 1973-05-01
#2    36     118  8.0   72     5   2 1973-05-02
#3    12     149 12.6   74     5   3 1973-05-03
#4    18     313 11.5   62     5   4 1973-05-04
#5    NA      NA 14.3   56     5   5 1973-05-05
#6    28      NA 14.9   66     5   6 1973-05-06
1. Add a new data frame column with mutate in a specific location
The newly created column, by default, is placed on the far right. If you want to change the position, you can use the mutate arguments .before and .after.
airquality %>%
  mutate("Date" = as.Date(ISOdate(1973, Month, Day))
         , .before = Month) %>%
  head()

#  Ozone Solar.R Wind Temp       Date Month Day
#1    41     190  7.4   67 1973-05-01     5   1
#2    36     118  8.0   72 1973-05-02     5   2
#3    12     149 12.6   74 1973-05-03     5   3
#4    18     313 11.5   62 1973-05-04     5   4
#5    NA      NA 14.3   56 1973-05-05     5   5
#6    28      NA 14.9   66 1973-05-06     5   6
Otherwise, it is possible to move a column in R to a specific position by using the function relocate.
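As a minimal sketch of that alternative, the column can be created first and moved afterwards with relocate. The object name with_date is only used for illustration.

```r
library(dplyr)

# Create the column at the default position (far right), then move it
with_date <- airquality %>%
  mutate(Date = as.Date(ISOdate(1973, Month, Day))) %>%
  relocate(Date, .before = Month)

names(with_date)
# [1] "Ozone"   "Solar.R" "Wind"    "Temp"    "Date"    "Month"   "Day"
```

This gives the same column order as .before = Month inside mutate, and it is handy when the column already exists.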
2. Add multiple data frame columns with mutate in R
It is possible to add multiple new data frame columns with mutate. Separate them with commas and specify a name for each of them.
Here is one additional column in the same mutate call: temperatures converted from Fahrenheit to Celsius. More conversion examples in R are here.
airquality %>%
  mutate("Date" = as.Date(ISOdate(1973, Month, Day))
         , "Temp.C" = round(measurements::conv_unit(Temp, "F", "C"), 0)
  ) %>%
  head()

#  Ozone Solar.R Wind Temp Month Day       Date Temp.C
#1    41     190  7.4   67     5   1 1973-05-01     19
#2    36     118  8.0   72     5   2 1973-05-02     22
#3    12     149 12.6   74     5   3 1973-05-03     23
#4    18     313 11.5   62     5   4 1973-05-04     17
#5    NA      NA 14.3   56     5   5 1973-05-05     13
#6    28      NA 14.9   66     5   6 1973-05-06     19
In my case, I like to put the comma before each new variable because that makes it easier for me to comment out or uncomment the necessary parts, especially the last element.
airquality %>%
  mutate("Date" = as.Date(ISOdate(1973, Month, Day))
         # , "Temp.C" = round(measurements::conv_unit(Temp, "F", "C"), 0)
  ) %>%
  head()
By the way, here is how to make multiline comments in R.
3. Use newly created variables inside the next variables within mutate in R
It is possible to create multiple variables inside the mutate. The good news is that it is also possible to reference some of the previous variables like this.
airquality %>%
  mutate("Date" = as.Date(ISOdate(1973, Month, Day))
         , "IsoWeek" = lubridate::isoweek(Date)
         , "Temp.C" = round(measurements::conv_unit(Temp, "F", "C"), 0)
  ) %>%
  head()

#  Ozone Solar.R Wind Temp Month Day       Date IsoWeek Temp.C
#1    41     190  7.4   67     5   1 1973-05-01      18     19
#2    36     118  8.0   72     5   2 1973-05-02      18     22
#3    12     149 12.6   74     5   3 1973-05-03      18     23
#4    18     313 11.5   62     5   4 1973-05-04      18     17
#5    NA      NA 14.3   56     5   5 1973-05-05      18     13
#6    28      NA 14.9   66     5   6 1973-05-06      18     19
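Here is a small sketch of the same idea that needs only dplyr. It assumes the usual conversion formula (F - 32) * 5 / 9 instead of the measurements package, and the column Is.Warm with its threshold of 20 is a made-up example.

```r
library(dplyr)

temps <- airquality %>%
  mutate(
    # First new variable: Fahrenheit converted to Celsius
    Temp.C = round((Temp - 32) * 5 / 9, 0),
    # Second new variable: built from the one defined just above
    Is.Warm = Temp.C >= 20
  )

head(temps[, c("Temp", "Temp.C", "Is.Warm")])
```

The order matters here: Is.Warm can see Temp.C only because Temp.C is defined earlier in the same mutate call.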
4. Add a new data frame column and drop used columns with mutate in R
If you no longer need the columns that were used to build the new one, set the mutate argument .keep = "unused", and they are dropped in the same step. It will help you to eliminate extra steps in your data-wrangling process. Here is more.
airquality %>%
  mutate("Date" = as.Date(ISOdate(1973, Month, Day))
         , .keep = "unused") %>%
  head()

#  Ozone Solar.R Wind Temp       Date
#1    41     190  7.4   67 1973-05-01
#2    36     118  8.0   72 1973-05-02
#3    12     149 12.6   74 1973-05-03
#4    18     313 11.5   62 1973-05-04
#5    NA      NA 14.3   56 1973-05-05
#6    28      NA 14.9   66 1973-05-06
It also works if there are multiple new variables.
airquality %>%
  mutate("Date" = as.Date(ISOdate(1973, Month, Day))
         , "IsoWeek" = lubridate::isoweek(Date)
         , "Temp.C" = round(measurements::conv_unit(Temp, "F", "C"), 0)
         , .keep = "unused"
  ) %>%
  head()

#  Ozone Solar.R Wind       Date IsoWeek Temp.C
#1    41     190  7.4 1973-05-01      18     19
#2    36     118  8.0 1973-05-02      18     22
#3    12     149 12.6 1973-05-03      18     23
#4    18     313 11.5 1973-05-04      18     17
#5    NA      NA 14.3 1973-05-05      18     13
#6    28      NA 14.9 1973-05-06      18     19
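The argument .keep accepts other values too. As a short sketch, "used" keeps only the columns that went into the new variable, and "none" keeps only the new variable itself. The object names keep_used and keep_none are only for illustration.

```r
library(dplyr)

# Keep the input columns (Month, Day) plus the new column
keep_used <- airquality %>%
  mutate(Date = as.Date(ISOdate(1973, Month, Day)), .keep = "used")

# Keep only the new column
keep_none <- airquality %>%
  mutate(Date = as.Date(ISOdate(1973, Month, Day)), .keep = "none")

names(keep_used)
names(keep_none)
```

The default is .keep = "all", which is what all the earlier examples rely on.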
5. Use mutate together with across
If you want to do the same transformations for multiple columns in R, try to combine the function mutate with across.
Here is the situation when it might be necessary to round numbers in multiple data frame columns.
head(USPersonalExpenditure)

#                      1940   1945  1950 1955  1960
#Food and Tobacco    22.200 44.500 59.60 73.2 86.80
#Household Operation 10.500 15.500 29.00 36.5 46.20
#Medical and Health   3.530  5.760  9.71 14.0 21.10
#Personal Care        1.040  1.980  2.45  3.4  5.40
#Private Education    0.341  0.974  1.80  2.6  3.64
With the functions mutate and across, it is easy, and you can find out more about that here.
# A lambda is used here because passing extra arguments such as digits
# directly through across() is deprecated since dplyr 1.1.0
USPersonalExpenditure %>%
  as.data.frame() %>%
  mutate(across(everything(), ~ round(.x, digits = 0)))

#                    1940 1945 1950 1955 1960
#Food and Tobacco      22   44   60   73   87
#Household Operation   10   16   29   36   46
#Medical and Health     4    6   10   14   21
#Personal Care          1    2    2    3    5
#Private Education      0    1    2    3    4
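When only some of the columns should be transformed, across can be given a selection helper instead of everything(). The following is a sketch on the iris dataset, with where(is.numeric) picking the numeric columns and the object name iris_rounded chosen only for illustration.

```r
library(dplyr)

# Round only the numeric columns; the factor column Species stays untouched
iris_rounded <- iris %>%
  mutate(across(where(is.numeric), ~ round(.x, digits = 0)))

head(iris_rounded)
```

The same pattern works with other selection helpers, for example starts_with("Sepal").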
6. Use mutate together with recode
If you want to quickly fix and overwrite categorical values in the data frame column, it is easy to do with mutate and recode.
Here are the names of the species in the iris dataset.
iris %>%
  distinct(Species)

#     Species
#1     setosa
#2 versicolor
#3  virginica
I can overwrite the column with species names by using the same name as the new column or create an additional column with recoded names like this.
iris %>%
  mutate("Species.Short" = recode(Species
                                  , "setosa" = "SET"
                                  , "versicolor" = "VER"
                                  , "virginica" = "VIR"
  )) %>%
  head()

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Species.Short
#1          5.1         3.5          1.4         0.2  setosa           SET
#2          4.9         3.0          1.4         0.2  setosa           SET
#3          4.7         3.2          1.3         0.2  setosa           SET
#4          4.6         3.1          1.5         0.2  setosa           SET
#5          5.0         3.6          1.4         0.2  setosa           SET
#6          5.4         3.9          1.7         0.4  setosa           SET
For more examples and other solutions in a similar scenario, look here.
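For completeness, here is a sketch of the other option mentioned above: overwriting the existing column by reusing its name. The object name iris_short is only for illustration.

```r
library(dplyr)

# Reusing the name Species replaces the column instead of adding a new one
iris_short <- iris %>%
  mutate(Species = recode(Species
                          , "setosa" = "SET"
                          , "versicolor" = "VER"
                          , "virginica" = "VIR"
  ))

iris_short %>%
  distinct(Species)
```

The number of columns stays the same, and only the labels in Species change.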
7. Dplyr mutate together with if_else or case_when
While you can use the R base function ifelse with mutate, there are good alternatives like if_else or case_when. The function case_when is useful if you have to look at multiple conditions or, in other words, multiple if_else statements.
airquality %>%
  mutate(temp_cat = case_when(
    Temp > 70 ~ "high",
    Temp <= 70 & Temp > 60 ~ "medium",
    TRUE ~ "low"
  )) %>%
  head()

#  Ozone Solar.R Wind Temp Month Day temp_cat
#1    41     190  7.4   67     5   1   medium
#2    36     118  8.0   72     5   2     high
#3    12     149 12.6   74     5   3     high
#4    18     313 11.5   62     5   4   medium
#5    NA      NA 14.3   56     5   5      low
#6    28      NA 14.9   66     5   6   medium
The function if_else has an additional argument, missing, in comparison to ifelse, and it is stricter about data types: the values for true, false, and missing have to be of compatible types, while ifelse silently coerces them.
airquality %>%
  mutate("Solar.R.C" = ifelse(Solar.R > 300
                              , "over 300"
                              , "under 300")
         , .after = Solar.R) %>%
  head()

#  Ozone Solar.R Solar.R.C Wind Temp Month Day
#1    41     190 under 300  7.4   67     5   1
#2    36     118 under 300  8.0   72     5   2
#3    12     149 under 300 12.6   74     5   3
#4    18     313  over 300 11.5   62     5   4
#5    NA      NA        NA 14.3   56     5   5
#6    28      NA        NA 14.9   66     5   6

airquality %>%
  mutate("Solar.R.C" = if_else(Solar.R > 300
                               , "over 300"
                               , "under 300"
                               , missing = "missing")
         , .after = Solar.R) %>%
  head()

#  Ozone Solar.R Solar.R.C Wind Temp Month Day
#1    41     190 under 300  7.4   67     5   1
#2    36     118 under 300  8.0   72     5   2
#3    12     149 under 300 12.6   74     5   3
#4    18     313  over 300 11.5   62     5   4
#5    NA      NA   missing 14.3   56     5   5
#6    28      NA   missing 14.9   66     5   6
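To see the difference in type handling, here is a small sketch on a plain vector. The values in x and the object names loose and strict are made up for illustration. Mixing a character value with a number works in ifelse but fails in if_else.

```r
library(dplyr)

x <- c(190, 313, NA)

# Base ifelse silently coerces the number 0 to the string "0"
loose <- ifelse(x > 300, "over 300", 0)

# dplyr if_else refuses to mix character and numeric values;
# the error is caught here so that the script keeps running
strict <- tryCatch(
  if_else(x > 300, "over 300", 0),
  error = function(e) "type error"
)

loose
strict
```

In a long pipeline that error is usually welcome, because a silently coerced column is much harder to notice later.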
8. Debug mutate results with the function browser
You can add the browser function inside mutate and take a quick look at the results of a new variable like this. Great for debugging. Kudos for this tip to Twitter user @_ColinFay.
airquality %>%
  mutate("Date" = as.Date(ISOdate(1973, Month, Day)), browser()) %>%
  head()

#Browse[1]> Date[1:5]
#[1] "1973-05-01" "1973-05-02" "1973-05-03" "1973-05-04" "1973-05-05"
Here is how to quickly look at just the part of the data frame that you need.
airquality %>%
  mutate("Date" = as.Date(ISOdate(1973, Month, Day))) %>%
  browser() %>%
  head()

#Browse[1]> .[1:5, 5:7]
#  Month Day       Date
#1     5   1 1973-05-01
#2     5   2 1973-05-02
#3     5   3 1973-05-03
#4     5   4 1973-05-04
#5     5   5 1973-05-05
Bonus. Use count instead of mutate when necessary
By using the function add_count, you can quickly get a column with a count by group and keep the records ungrouped. It is a simpler solution to get the same result as with the functions group_by and mutate.
Here is an example that shows, for each earthquake, how many records in the dataset were reported by the same number of stations.
quakes %>%
  add_count(stations, name = "cnt_stations") %>%
  head()

#     lat   long depth mag stations cnt_stations
#1 -20.42 181.62   562 4.8       41           12
#2 -20.62 181.03   650 4.2       15           34
#3 -26.00 184.10    42 5.4       43           14
#4 -17.97 181.66   626 4.1       19           29
#5 -20.42 181.96   649 4.0       11           28
#6 -19.68 184.31   195 4.0       12           25
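For comparison, here is a sketch of the longer route with group_by and mutate next to add_count. The object names with_add_count and with_group_by are only for illustration.

```r
library(dplyr)

# Short version: one call, the result stays ungrouped
with_add_count <- quakes %>%
  add_count(stations, name = "cnt_stations")

# Longer version: group, count the rows per group with n(), then ungroup
with_group_by <- quakes %>%
  group_by(stations) %>%
  mutate(cnt_stations = n()) %>%
  ungroup()

# Both approaches produce the same counts in the same row order
identical(with_add_count$cnt_stations, with_group_by$cnt_stations)
```

The counts match, so add_count mainly saves the group_by and ungroup steps.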
Here is more about the count in R.
Dplyr function count is simple and often useful. If you want to know more tips and tricks, please take a look at my favorite dplyr tips and tricks.