Count number of observations in R

Count in R, more than 10 examples

Counting in R is one of the quickest ways to get useful insight into data. Sometimes it is all a simple analysis needs. In this post, I collected more than 10 different examples of how to count values in R.

Count in R by using base functionality

Here is how to count the rows (observations) and columns (variables) of a data frame.

nrow(iris)
#[1] 150

ncol(iris)
#[1] 5

By using the dim function, you can count rows and columns at the same time.

dim(iris)
#[1] 150   5
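To count the observations in a single column, note that a column is a vector rather than a data frame, so `length()` is the natural tool. The two numbers returned by `dim()` can also be taken apart by index. A small sketch using the same iris data:

```r
# A column is a vector, so length() gives its number of observations
# (length() also counts NA values, if there are any)
n_obs <- length(iris$Species)
n_obs
#[1] 150

# dim() returns c(rows, columns); index it to pick one of the two values
dims <- dim(iris)
dims[1]  # number of rows
#[1] 150
dims[2]  # number of columns
#[1] 5
```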

The table function complements these by counting the values in each category of a column.

table(iris$Species)

#    setosa versicolor  virginica 
#        50         50         50
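The table function also accepts two columns at once and returns a cross-tabulation, one count per combination of categories. A minimal sketch with mtcars; individual cells can be looked up by their level names, and combinations that never occur show up as 0:

```r
# Cross-tabulate two columns: rows are cyl levels, columns are gear levels
cyl_gear <- table(mtcars$cyl, mtcars$gear)
cyl_gear

# Look up single cells by level name (dimnames are character strings)
cyl_gear["4", "4"]  # 4-cylinder cars with 4 gears: 8
cyl_gear["8", "4"]  # no 8-cylinder car has 4 gears: 0
```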

 

Count conditionally in R

You can use base R to create conditions and count the number of occurrences in a column. If you are an Excel user, it is similar to the function COUNTIF.

Here are three ways to count conditionally in R and get the same result.

nrow(iris[iris$Species == "setosa", ])
#[1] 50
nrow(subset(iris, Species == "setosa"))
#[1] 50
length(which(iris$Species == "setosa"))
#[1] 50
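A fourth, shorter option relies on the fact that a comparison returns TRUE/FALSE values and `sum()` treats TRUE as 1. The sketch below also combines two conditions (similar to COUNTIFS in Excel) and shows `na.rm = TRUE`, which is needed when the column contains missing values; the thresholds are arbitrary examples.

```r
# Each TRUE counts as 1, so sum() returns the number of matches
n_setosa <- sum(iris$Species == "setosa")
n_setosa
#[1] 50

# Several conditions: & means "and", | means "or"
n_long_setosa <- sum(iris$Species == "setosa" & iris$Sepal.Length > 5)

# A comparison with NA gives NA, so drop those before summing
n_high_ozone <- sum(airquality$Ozone > 100, na.rm = TRUE)
```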

 

Count in R by using dplyr

function count

The count function from the dplyr package is easy and intuitive to use. That’s one of my top 10 favorite dplyr tips and tricks.

library(dplyr)

mtcars %>% count(cyl)

#  cyl  n
#1   4 11
#2   6  7
#3   8 14
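Because count returns an ordinary data frame with the counts in column `n`, it is easy to turn the counts into shares of the total. A sketch, where the column name `prop` is just an illustrative choice:

```r
library(dplyr)

# Count by cyl, then divide each count by the total number of rows
cyl_share <- mtcars %>%
  count(cyl) %>%
  mutate(prop = n / sum(n))
cyl_share
# prop is 11/32, 7/32 and 14/32 for 4, 6 and 8 cylinders
```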

You can name the count column directly with the name parameter, which saves a renaming step.

mtcars %>% count(cyl, name = "CountByCyl")

#  cyl CountByCyl
#1   4         11
#2   6          7
#3   8         14

To sort the results by the count, use the dedicated sort parameter.

mtcars %>% count(cyl, name = "CountByCyl", sort = TRUE)

#  cyl CountByCyl
#1   8         14
#2   4         11
#3   6          7

It is possible to count by multiple columns at the same time.

mtcars %>% count(cyl, gear, name = "count", sort = TRUE)

#  cyl gear count
#1   8    3    12
#2   4    4     8
#3   6    4     4
#4   4    5     2
#5   6    3     2
#6   8    5     2
#7   4    3     1
#8   6    5     1
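Note that combinations with zero rows (for example, 8 cylinders with 4 gears) are missing from this output. One way to keep them, assuming the grouping columns are converted to factors first, is the `.drop = FALSE` argument of count:

```r
library(dplyr)

# Factors know all their levels, so empty combinations can be reported
counts_all <- mtcars %>%
  mutate(cyl = factor(cyl), gear = factor(gear)) %>%
  count(cyl, gear, name = "count", .drop = FALSE)
counts_all
# 9 rows: every cyl/gear pair, including 8 cylinders with 4 gears (count 0)
```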

If you combine that with the filter function, you can create a conditional count in R.

mtcars %>%
  filter(gear > 3) %>%
  count(cyl, name = "CountByCyl", sort = TRUE)

#  cyl CountByCyl
#1   4         10
#2   6          5
#3   8          2

If you want to count by multiple conditions, add them all to the filter function.

mtcars %>%
  filter(gear == 4 & hp > 100) %>%
  count(cyl, name = "CountByCyl", sort = TRUE)

#  cyl CountByCyl
#1   6          4
#2   4          1

The count function from the dplyr package is simple, and sometimes it is all you need at the beginning of an analysis.

 

function add_count

With the function add_count, you can quickly add a column with the count per group while keeping every original row. If you are using the dplyr package, this is a great addition to the function count.

Here is an example that shows how often each number of reporting stations occurs among the earthquakes in the quakes dataset.

library(dplyr)

quakes %>% 
  add_count(stations, name = "cnt_stations") %>% 
  head()

#     lat   long depth mag stations cnt_stations
#1 -20.42 181.62   562 4.8       41           12
#2 -20.62 181.03   650 4.2       15           34
#3 -26.00 184.10    42 5.4       43           14
#4 -17.97 181.66   626 4.1       19           29
#5 -20.42 181.96   649 4.0       11           28
#6 -19.68 184.31   195 4.0       12           25
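For comparison, the same column can be built with group_by and mutate, which is roughly what add_count saves you from typing. A typical follow-up is filtering on the new column; the threshold of 20 below is an arbitrary example.

```r
library(dplyr)

with_add_count <- quakes %>% add_count(stations, name = "cnt_stations")

# Equivalent by hand: count rows per group, keep all rows, then ungroup
with_mutate <- quakes %>%
  group_by(stations) %>%
  mutate(cnt_stations = n()) %>%
  ungroup()

# Keep only earthquakes whose station count occurs at least 20 times
common_stations <- with_add_count %>% filter(cnt_stations >= 20)
```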

 

Count and do other calculations by a group in R, function n

You can use the function n(), for example, inside the summarise function. If you only need a count, the count function from the previous section requires less typing.

This approach might be handy if you want to do other calculations in a group. For example, percentage by group, minimum or maximum value by group, or cumulative sum or count.

mtcars %>%
  group_by(cyl, gear) %>%
  summarise(cnt = n(), .groups = "drop") %>% 
  as.data.frame()

#  cyl gear cnt
#1   4    3   1
#2   4    4   8
#3   4    5   2
#4   6    3   2
#5   6    4   4
#6   6    5   1
#7   8    3  12
#8   8    5   2
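Here is a sketch of the "other calculations" mentioned above: the count, minimum, and maximum per group in one summarise call, followed by a cumulative count. The column names are illustrative, and mpg is just an example variable.

```r
library(dplyr)

cyl_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(cnt = n(), min_mpg = min(mpg), max_mpg = max(mpg)) %>%
  # summarise drops the single grouping level, so cumsum runs over all rows
  mutate(cum_cnt = cumsum(cnt))
cyl_summary
# cnt is 11, 7, 14 and cum_cnt is 11, 18, 32
```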

 

Count NA values in a column or data frame

With the methods described above, there are multiple ways to count NA values.

Here is a different approach that works on the whole data frame. The built-in airquality dataset contains missing values, which makes it useful for this demonstration.

sum(is.na(airquality))
#[1] 44

If you want to count NA values for each data frame column, then here is how to do that.

colSums(is.na(airquality))
#  Ozone Solar.R    Wind    Temp   Month     Day 
#     37       7       0       0       0       0

Here is the number of non-missing observations in each column.

colSums(!is.na(airquality))
#  Ozone Solar.R    Wind    Temp   Month     Day 
#    116     146     153     153     153     153
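To count rows rather than columns, base R's `complete.cases()` flags rows with no missing value in any column, and `rowSums(is.na(...))` gives the number of NA values per row:

```r
# Rows with no NA in any column
n_complete <- sum(complete.cases(airquality))
n_complete
#[1] 111

# Rows with at least one NA
n_incomplete <- sum(rowSums(is.na(airquality)) > 0)
n_incomplete
#[1] 42
```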

If you want to analyze the appearance of NA values in your data more broadly, take a look at this post.

For more complex column-wise calculations, try the dplyr capabilities.
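As a sketch, assuming dplyr 1.0 or later (where `across()` is available), the NA count per column can be written in dplyr, and adding group_by gives NA counts per group, here per month:

```r
library(dplyr)

# NA count for every column, same numbers as colSums(is.na(airquality))
na_counts <- airquality %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts

# NA count in one column, broken down by month
na_by_month <- airquality %>%
  group_by(Month) %>%
  summarise(ozone_na = sum(is.na(Ozone)))
na_by_month
```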

 

Count unique values in R

The previous methods can also be used for this. If they are not enough, here is another post that might be useful.
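A short sketch of the most direct options: `length(unique())` in base R, `n_distinct()` from dplyr, and `unique()` on several columns for distinct combinations.

```r
library(dplyr)

# Base R: number of distinct values in one column
n_cyl_base <- length(unique(mtcars$cyl))
n_cyl_base
#[1] 3

# dplyr: n_distinct() gives the same result
n_cyl_dplyr <- n_distinct(mtcars$cyl)

# Distinct combinations of two columns, in base R and in dplyr
n_combo_base <- nrow(unique(mtcars[, c("cyl", "gear")]))
n_combo_dplyr <- n_distinct(mtcars$cyl, mtcars$gear)
n_combo_dplyr
#[1] 8
```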