ifelse and NA problem in R

If your data frame contains NA values, the R function ifelse might return results you don’t expect: wherever the test condition is NA, the result is NA too. Being aware of that is the first victory.

Here are the first rows of the airquality data frame, which contains NA values in some of its columns.

head(airquality)

#  Ozone Solar.R Wind Temp Month Day
#1    41     190  7.4   67     5   1
#2    36     118  8.0   72     5   2
#3    12     149 12.6   74     5   3
#4    18     313 11.5   62     5   4
#5    NA      NA 14.3   56     5   5
#6    28      NA 14.9   66     5   6

Let’s say I’m not aware of that and would like to create a column with simple ifelse logic.

airquality$Tag <- ifelse(airquality$Ozone > 20, "my tag", "no tag")

head(airquality)

#  Ozone Solar.R Wind Temp Month Day    Tag
#1    41     190  7.4   67     5   1 my tag
#2    36     118  8.0   72     5   2 my tag
#3    12     149 12.6   74     5   3 no tag
#4    18     313 11.5   62     5   4 no tag
#5    NA      NA 14.3   56     5   5   <NA>
#6    28      NA 14.9   66     5   6 my tag

As you can see in the fifth row, ifelse does not treat NA as FALSE. Ozone is missing there, so the test Ozone > 20 is NA, and the result is a missing value too.
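The root cause is that any comparison with NA returns NA, and ifelse passes that NA straight through to the result. A minimal base R sketch on a made-up vector x (toy values, not taken from airquality):

```r
# Toy values: the third element is missing
x <- c(41, 12, NA)

# Comparisons with NA return NA, not FALSE
test <- x > 20
print(test)
# [1]  TRUE FALSE    NA

# ifelse() copies the NA from the test into the result,
# so the third element is NA instead of "no tag"
tag <- ifelse(test, "my tag", "no tag")
print(is.na(tag))
# [1] FALSE FALSE  TRUE
```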

Combined with another ifelse statement or a nested ifelse, the results get even weirder: a missing Solar.R value wipes out a tag that was already set.

airquality$Tag <- ifelse(airquality$Solar.R > 200, "my tag2", airquality$Tag)

head(airquality)

#  Ozone Solar.R Wind Temp Month Day     Tag
#1    41     190  7.4   67     5   1  my tag
#2    36     118  8.0   72     5   2  my tag
#3    12     149 12.6   74     5   3  no tag
#4    18     313 11.5   62     5   4 my tag2
#5    NA      NA 14.3   56     5   5    <NA>
#6    28      NA 14.9   66     5   6    <NA>
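A quick way to see which rows ended up with a missing Tag is which(is.na(...)). This sketch reruns both ifelse steps on the first six rows only; aq is just a local copy made for the illustration:

```r
# Local copy of the first six rows, so the global airquality is untouched
aq <- head(airquality)

aq$Tag <- ifelse(aq$Ozone > 20, "my tag", "no tag")
aq$Tag <- ifelse(aq$Solar.R > 200, "my tag2", aq$Tag)

# Row 5: Ozone and Solar.R are both NA
# Row 6: had "my tag" after the first step, but Solar.R is NA,
#        so the second ifelse() replaced it with NA
print(which(is.na(aq$Tag)))
# [1] 5 6
```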

Solution to the ifelse NA problem

You can add extra NA handling to the ifelse test, but it gets cumbersome with multiple ifelse statements.

airquality$Tag <- ifelse(airquality$Ozone > 20 &
                           !is.na(airquality$Ozone),
                         "my tag",
                         "no tag")

head(airquality)

#  Ozone Solar.R Wind Temp Month Day    Tag
#1    41     190  7.4   67     5   1 my tag
#2    36     118  8.0   72     5   2 my tag
#3    12     149 12.6   74     5   3 no tag
#4    18     313 11.5   62     5   4 no tag
#5    NA      NA 14.3   56     5   5 no tag
#6    28      NA 14.9   66     5   6 my tag
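The is.na() check works because FALSE & NA is FALSE in R. A shorter base R alternative, not from the original post, is to compare the logical test with %in% TRUE: %in% never returns NA, so missing values become FALSE and land in the "no tag" branch. A sketch on the first six Ozone values:

```r
# Why the is.na() version works: FALSE & NA is FALSE, not NA
print(FALSE & NA)
# [1] FALSE

# First six Ozone values: 41 36 12 18 NA 28
oz <- head(airquality$Ozone)

# `%in% TRUE` keeps TRUE and turns both FALSE and NA into FALSE
safe_test <- (oz > 20) %in% TRUE
tag <- ifelse(safe_test, "my tag", "no tag")
print(tag)
# [1] "my tag" "my tag" "no tag" "no tag" "no tag" "my tag"
```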

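If you have multiple ifelse statements, or the extra is.na() check makes the code too long, case_when from the dplyr package might be what you’re looking for. It treats an NA condition as no match and moves on to the next one, so the final TRUE catches the missing values.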

library(dplyr)

# Inside mutate(), refer to columns by name rather than airquality$...
airquality <- airquality %>%
  mutate(Tag = case_when(
    Ozone > 20 ~ "my tag",
    Solar.R > 200 ~ "my tag2",
    TRUE ~ "no tag"
  ))

head(airquality)

#  Ozone Solar.R Wind Temp Month Day     Tag
#1    41     190  7.4   67     5   1  my tag
#2    36     118  8.0   72     5   2  my tag
#3    12     149 12.6   74     5   3  no tag
#4    18     313 11.5   62     5   4 my tag2
#5    NA      NA 14.3   56     5   5  no tag
#6    28      NA 14.9   66     5   6  my tag
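If dplyr is not available, the same logic can be sketched in base R. The helper na_to_false() is hypothetical (written for this example, not part of base R); it treats NA as FALSE, which mirrors how case_when handles an NA condition. Assigning the rules in reverse order lets the first rule win, as it does in case_when:

```r
# Local copy of the first six rows
aq <- head(airquality)

# Hypothetical helper: vectorised "is TRUE", with NA counted as FALSE
na_to_false <- function(x) !is.na(x) & x

# Default first, then rules from lowest to highest priority,
# so the earliest case_when() rule overwrites the later ones
aq$Tag <- "no tag"
aq$Tag[na_to_false(aq$Solar.R > 200)] <- "my tag2"
aq$Tag[na_to_false(aq$Ozone > 20)] <- "my tag"

print(aq$Tag)
```

Note the difference in priority compared with the two chained ifelse calls earlier: there the Solar.R rule ran last, so a row matching both conditions would get "my tag2", while case_when gives it "my tag". None of the first six rows matches both conditions, so the difference does not show up in the output above.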


In some situations it is acceptable to replace NA values instead, and here is how to do that.
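A minimal base R sketch of that idea, assuming that treating a missing Ozone reading as 0 is acceptable for your analysis (often it is not, so choose the fill value deliberately):

```r
# First six Ozone values: 41 36 12 18 NA 28
oz <- head(airquality$Ozone)

# Assumption: 0 is an acceptable stand-in for a missing reading here
oz_filled <- replace(oz, is.na(oz), 0)

# No NA left in the test, so ifelse() returns no NA either
tag <- ifelse(oz_filled > 20, "my tag", "no tag")
print(tag)
# [1] "my tag" "my tag" "no tag" "no tag" "no tag" "my tag"
```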

If you want to detect which columns contain NA values, check this Datacornering post.
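For a quick check in base R, colSums(is.na(...)) counts the missing values per column. This sketch uses datasets::airquality to get the untouched data set, since the examples above added a Tag column to the copy in your workspace:

```r
# is.na() on a data frame gives a logical matrix; colSums() counts TRUEs per column
na_counts <- colSums(is.na(datasets::airquality))
print(na_counts)

# Names of the columns that have at least one NA
na_cols <- names(na_counts)[na_counts > 0]
print(na_cols)
# [1] "Ozone"   "Solar.R"
```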

I hope that some of the examples helped you. Here is another post with some tips and tricks that might be helpful while working with RStudio.
