Count excluding NA in R, count non missing values in R, count non-NA in R. count filled values in R

Count excluding NA in R

If you want to count values excluding NA in R, here is a simple way to do that. You can detect non-NA values and get results as TRUE or FALSE that can be used by the sum function to get the result.

 

Here is the airquality data set from R. First column contains NA values, but I would like to count all values in that column, excluding NA.

head(airquality)

#  Ozone Solar.R Wind Temp Month Day
# 1    41     190  7.4   67     5   1
# 2    36     118  8.0   72     5   2
# 3    12     149 12.6   74     5   3
# 4    18     313 11.5   62     5   4
# 5    NA      NA 14.3   56     5   5
# 6    28      NA 14.9   66     5   6

Here is the number of total rows.

NROW(airquality$Ozone)

#[1] 153

If you have to do calculations with the sum function, it contains an argument to remove NA values, but there is not that kind of option for counting. Luckily you can sum all the indicators that show where the values are not missing.

Count excluding NA in R

Here is how to get non-missing value indicators.

head(
  !is.na(airquality$Ozone)
  )

#[1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE

That leads to the count excluding NA values.

sum(
  !is.na(airquality$Ozone)
)

#[1] 116

You can convert TRUE and FALSE to 1 and 0 in R, but that makes no difference in the previous calculation.

head(
  as.integer(!is.na(airquality$Ozone))
)

#[1] 1 1 1 1 0 1

 

Count non-NA values in a row or column

You can easily count excluding NA values in all rows or columns with the functions rowSums and colSums. Here is how to do that.

head(
rowSums(!is.na(airquality))
)

#[1] 6 6 6 6 4 5

colSums(!is.na(airquality))

#  Ozone Solar.R    Wind    Temp   Month     Day 
#    116     146     153     153     153     153

Here is how to do the opposite and get a summary of NA values in the R data frame.

 

Count non-missing values by the group in R

If you want to count by the group in R and, in this case, count non-missing values, try this approach using the package dplyr. Here is how to count non-NA values by filtering out the component that you don’t want to use.

require(dplyr)

airquality %>% 
  filter(!is.na(Ozone)) %>% 
  group_by(Month) %>% 
  count(name = "non-NA values in column Ozone by Month")

#  Month non-NA values in column Ozone by Month
# 1     5                                     26
# 2     6                                      9
# 3     7                                     26
# 4     8                                     26
# 5     9                                     29

It is possible to summarise and calculate the difference with the total rows in the group.

airquality %>% 
  group_by(Month) %>% 
  summarise("non-NA values in column Ozone by Month" = sum(!is.na(Ozone)),
            "all rows by Month" = n(),
            "difference"  = `all rows by Month`  - `non-NA values in column Ozone by Month`)

#  Month non-NA values in column Ozone by Month all rows by Month difference
# 1     5                                     26                31          5
# 2     6                                      9                30         21
# 3     7                                     26                31          5
# 4     8                                     26                31          5
# 5     9                                     29                30          1

 

Here is how to do that and other useful tips and tricks for counting in R.

If you love to work with other tasks using the dplyr, here are my top 10 favorite tips and tricks that can help to increase productivity.


Posted

in

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *