How to add a column to the R data frame

If you are new to R, one of your first questions might be how to add a column to a data frame. The scenarios below cover the most common ways to handle this task.

By using the $ sign in R, you can manipulate data frame content. You can look at a column that is already there, overwrite it, or create a new one.
I will use the airquality data set to demonstrate that.

head(airquality)

#  Ozone Solar.R Wind Temp Month Day
#1    41     190  7.4   67     5   1
#2    36     118  8.0   72     5   2
#3    12     149 12.6   74     5   3
#4    18     313 11.5   62     5   4
#5    NA      NA 14.3   56     5   5
#6    28      NA 14.9   66     5   6

The Temp column contains temperatures in degrees Fahrenheit.

head(airquality$Temp)

#[1] 67 72 74 62 56 66

I will convert these Fahrenheit temperatures into Celsius with the formula C = 5 / 9 * (F - 32). Here is how to add a new column to the R data frame by using the $ sign.

airquality$TempC <- 5 / 9 * (airquality$Temp - 32)

head(airquality)

#  Ozone Solar.R Wind Temp Month Day    TempC
#1    41     190  7.4   67     5   1 19.44444
#2    36     118  8.0   72     5   2 22.22222
#3    12     149 12.6   74     5   3 23.33333
#4    18     313 11.5   62     5   4 16.66667
#5    NA      NA 14.3   56     5   5 13.33333
#6    28      NA 14.9   66     5   6 18.88889
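The three uses of the $ sign mentioned at the start of this section (looking at a column, creating one, and overwriting one) can be sketched together. This is a minimal example that works on a copy named aq, which is my own name for illustration, so the built-in data set stays untouched.

```r
# Work on a copy so the built-in airquality data set is not modified
aq <- datasets::airquality

# 1. Look at an existing column
head(aq$Temp)

# 2. Create a new column: the name TempC does not exist yet, so it is appended
aq$TempC <- 5 / 9 * (aq$Temp - 32)

# 3. Overwrite a column: the name TempC exists now, so its values are replaced
aq$TempC <- round(aq$TempC, digits = 1)

head(aq$TempC)
#[1] 19.4 22.2 23.3 16.7 13.3 18.9
```

The same assignment syntax does both jobs, so a typo in the column name silently creates a new column instead of overwriting the old one.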

 

Add a column with mutate function from dplyr

The dplyr package is one of the most popular packages in R, and its mutate function is useful in this scenario. With mutate, the Fahrenheit to Celsius conversion from the previous example looks like this.

library(dplyr)

airquality <- airquality %>% mutate("TempC" = 5 / 9 * (Temp - 32))

head(airquality)

#  Ozone Solar.R Wind Temp Month Day    TempC
#1    41     190  7.4   67     5   1 19.44444
#2    36     118  8.0   72     5   2 22.22222
#3    12     149 12.6   74     5   3 23.33333
#4    18     313 11.5   62     5   4 16.66667
#5    NA      NA 14.3   56     5   5 13.33333
#6    28      NA 14.9   66     5   6 18.88889

The mutate function has two useful arguments, .before and .after, that let you add a column and place it at the necessary position at the same time.

library(dplyr)

airquality %>%
  mutate("TempC" = 5 / 9 * (Temp - 32), .before = Month) %>%
  head()

#  Ozone Solar.R Wind Temp    TempC Month Day
#1    41     190  7.4   67 19.44444     5   1
#2    36     118  8.0   72 22.22222     5   2
#3    12     149 12.6   74 23.33333     5   3
#4    18     313 11.5   62 16.66667     5   4
#5    NA      NA 14.3   56 13.33333     5   5
#6    28      NA 14.9   66 18.88889     5   6
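The .after argument works the same way from the other side. Here is a small sketch, assuming dplyr 1.0.0 or later is installed. It starts from a fresh copy (aq is my own name) so that TempC is a genuinely new column when mutate places it.

```r
library(dplyr)

# Fresh copy, so TempC does not exist yet and is positioned as a new column
aq <- datasets::airquality

# Place the new column directly after Temp
aq_after <- aq %>%
  mutate(TempC = 5 / 9 * (Temp - 32), .after = Temp)

names(aq_after)
#[1] "Ozone"   "Solar.R" "Wind"    "Temp"    "TempC"   "Month"   "Day"
```

Because Temp sits right before Month, .after = Temp gives the same layout here as .before = Month.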

Dplyr is a very useful package in R, and these 10 tips might help you quickly elevate your skills.

 

Add multiple columns to the R data frame

With the help of the mutate function from dplyr, you can easily add multiple columns to the R data frame in a single call.
Let’s say that I want to create one column with the temperature converted from Fahrenheit to Celsius and another with the rounded result. The second column can refer to the first one because mutate evaluates its arguments in order. Rounding might look like a simple task, but I highly recommend taking a deeper look at some of the specific situations in R with rounding.

airquality <- airquality %>% mutate(
  "TempC" = 5 / 9 * (Temp - 32),
  "RoundedTempC" = round(TempC, digits = 0)
)

head(airquality)

#  Ozone Solar.R Wind Temp Month Day    TempC RoundedTempC
#1    41     190  7.4   67     5   1 19.44444           19
#2    36     118  8.0   72     5   2 22.22222           22
#3    12     149 12.6   74     5   3 23.33333           23
#4    18     313 11.5   62     5   4 16.66667           17
#5    NA      NA 14.3   56     5   5 13.33333           13
#6    28      NA 14.9   66     5   6 18.88889           19

If you add only two columns, you can also stick with the basic $ approach from the beginning of this post and assign each column on its own line.
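For comparison, this is roughly what the two-column case looks like with the $ sign only. It is a sketch on a copy (aq is my own name) and needs no packages.

```r
# Copy of the built-in data set
aq <- datasets::airquality

# One assignment per new column; the second line uses the first new column
aq$TempC <- 5 / 9 * (aq$Temp - 32)
aq$RoundedTempC <- round(aq$TempC, digits = 0)

head(aq$RoundedTempC)
#[1] 19 22 23 17 13 19
```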

 

Add an empty column in R

Here is how to create an empty data frame column. Assigning a single NA is enough because R recycles the value to every row.

airquality$EmptyCol <- NA

head(airquality)

#  Ozone Solar.R Wind Temp Month Day    TempC RoundedTempC EmptyCol
#1    41     190  7.4   67     5   1 19.44444           19       NA
#2    36     118  8.0   72     5   2 22.22222           22       NA
#3    12     149 12.6   74     5   3 23.33333           23       NA
#4    18     313 11.5   62     5   4 16.66667           17       NA
#5    NA      NA 14.3   56     5   5 13.33333           13       NA
#6    28      NA 14.9   66     5   6 18.88889           19       NA
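A plain NA is of type logical. If you already know what the column will hold later, you could use one of the typed missing values instead. The column names in this sketch are made up for illustration.

```r
# Copy of the built-in data set
aq <- datasets::airquality

aq$EmptyLogical <- NA             # plain NA is logical
aq$EmptyNumber  <- NA_real_       # missing value of type double
aq$EmptyText    <- NA_character_  # missing value of type character

sapply(aq[, c("EmptyLogical", "EmptyNumber", "EmptyText")], class)
#EmptyLogical  EmptyNumber    EmptyText
#   "logical"    "numeric"  "character"
```

In most cases R converts a logical NA column as soon as you fill it, so the typed versions mainly help when a later step checks column types.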

The alternative is the add_column function from tibble, which fits better into the dplyr workflow. Its .before and .after arguments let you place the new column at a certain position. You can also reposition columns afterwards, for example with relocate from dplyr, but that takes an additional step.

airquality <- airquality %>% tibble::add_column("EmptyCol2" = NA)

head(airquality)

#  Ozone Solar.R Wind Temp Month Day    TempC RoundedTempC EmptyCol EmptyCol2
#1    41     190  7.4   67     5   1 19.44444           19       NA        NA
#2    36     118  8.0   72     5   2 22.22222           22       NA        NA
#3    12     149 12.6   74     5   3 23.33333           23       NA        NA
#4    18     313 11.5   62     5   4 16.66667           17       NA        NA
#5    NA      NA 14.3   56     5   5 13.33333           13       NA        NA
#6    28      NA 14.9   66     5   6 18.88889           19       NA        NA
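Here is a short sketch of the positioning arguments of add_column, assuming the tibble package is installed. It uses a fresh copy (aq is my own name) so the column order is easy to follow.

```r
# Copy of the built-in data set
aq <- datasets::airquality

# Add an empty column and place it directly before Month
aq_pos <- tibble::add_column(aq, EmptyCol2 = NA, .before = "Month")

names(aq_pos)
#[1] "Ozone"     "Solar.R"   "Wind"      "Temp"      "EmptyCol2" "Month"     "Day"
```

Unlike the $ sign, add_column refuses to overwrite an existing column, which can protect you from replacing data by accident.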

 

Create an empty data frame in R

An empty data frame might sound easy to create because it contains only empty columns. Another post on this blog includes simple instructions on how to do that.
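As a quick illustration, these are two common base R ways to get a data frame with columns but no rows. The column names Temp and Label are only examples.

```r
# Declare the columns with zero-length vectors of the desired types
empty_df <- data.frame(
  Temp = numeric(0),
  Label = character(0),
  stringsAsFactors = FALSE  # keeps Label as character on R versions before 4.0
)

dim(empty_df)
#[1] 0 2

# Or keep the columns and types of an existing data frame and drop all rows
aq_empty <- datasets::airquality[0, ]

dim(aq_empty)
#[1] 0 6
```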

 

Create a new column in R with a condition

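One of the best solutions for beginners is the ifelse function, which works similarly to the IF function in Excel: the first argument is the condition, the second is the value when it is true, and the third is the value when it is false.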

airquality$CondCol <- ifelse(airquality$Temp > 70, "high", "low")

head(airquality)

#  Ozone Solar.R Wind Temp Month Day    TempC RoundedTempC EmptyCol EmptyCol2 CondCol
#1    41     190  7.4   67     5   1 19.44444           19       NA        NA     low
#2    36     118  8.0   72     5   2 22.22222           22       NA        NA    high
#3    12     149 12.6   74     5   3 23.33333           23       NA        NA    high
#4    18     313 11.5   62     5   4 16.66667           17       NA        NA     low
#5    NA      NA 14.3   56     5   5 13.33333           13       NA        NA     low
#6    28      NA 14.9   66     5   6 18.88889           19       NA        NA     low
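One thing to keep in mind is that ifelse returns NA wherever the condition itself is NA. Temp has no missing values, but Ozone does, so here is a sketch with that column. The threshold of 30 and the label "unknown" are arbitrary choices for illustration.

```r
# Copy of the built-in data set
aq <- datasets::airquality

# Missing Ozone values stay missing in the result
aq$OzoneLevel <- ifelse(aq$Ozone > 30, "high", "low")

head(aq$OzoneLevel)
#[1] "high" "high" "low"  "low"  NA     "low"

# Handle missing values first if they need their own label
aq$OzoneLevel2 <- ifelse(
  is.na(aq$Ozone), "unknown",
  ifelse(aq$Ozone > 30, "high", "low")
)

head(aq$OzoneLevel2)
#[1] "high"    "high"    "low"     "low"     "unknown" "low"
```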

If you have a simple condition and TRUE or FALSE is enough as the result, you can create the conditional column more simply by assigning the comparison itself.

airquality$CondCol <- airquality$Temp > 70

head(airquality)

#  Ozone Solar.R Wind Temp Month Day    TempC RoundedTempC EmptyCol EmptyCol2 CondCol
#1    41     190  7.4   67     5   1 19.44444           19       NA        NA   FALSE
#2    36     118  8.0   72     5   2 22.22222           22       NA        NA    TRUE
#3    12     149 12.6   74     5   3 23.33333           23       NA        NA    TRUE
#4    18     313 11.5   62     5   4 16.66667           17       NA        NA   FALSE
#5    NA      NA 14.3   56     5   5 13.33333           13       NA        NA   FALSE
#6    28      NA 14.9   66     5   6 18.88889           19       NA        NA   FALSE

These logical values can be converted into 1 and 0, for example with as.integer.
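A minimal sketch of that conversion with base R follows. The column name CondColInt is my own.

```r
# Copy of the built-in data set
aq <- datasets::airquality

# Logical column: TRUE when the temperature is above 70
aq$CondCol <- aq$Temp > 70

# as.integer turns TRUE into 1 and FALSE into 0
aq$CondColInt <- as.integer(aq$CondCol)

head(aq$CondColInt)
#[1] 0 1 1 0 0 0
```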

 

Opposite of adding columns to the R data frame

If you want to do the opposite and drop something from the data frame, there is a separate post that explains how to remove unnecessary columns in R in at least 4 different ways.

A related post shows how to split an existing data frame column in R.
