Filter data in R


Here are more than five examples of how to filter data in R, either to take a quick look at it or to extract a subset. The right solution depends on your goal.

 

Filter by using the RStudio viewer

RStudio has a spreadsheet-style data viewer, which you usually open with the function View. Here are some RStudio tips and tricks that show how to open the data viewer by clicking.

You can try it with the built-in iris dataset.

View(iris)

The viewer has a Filter button, as shown in the picture below.

filter in RStudio

Increase the number of columns shown in the RStudio viewer

By default, the RStudio viewer shows only a limited number of columns. In recent RStudio versions the limit might be 50. To change it, for example to 500, run:

rstudioapi::writeRStudioPreference("data_viewer_max_columns", 500L)

This might not work in older RStudio versions such as 1.2.1335. In that case you may get an error like: Error: unexpected ',' in "("data_viewer_max_columns",".

 

Filter by using base R

You can use the base R function subset to keep only the rows you need. The Species column of the iris dataset contains 3 different values, with 50 records for each of them.

table(iris$Species)

#    setosa versicolor  virginica 
#        50         50         50

Here are a couple of other examples of how to count values in R.

To get only the rows that contain a specific value, use a condition like this.

head(subset(iris, Species == "virginica"))

#    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#101          6.3         3.3          6.0         2.5 virginica
#102          5.8         2.7          5.1         1.9 virginica
#103          7.1         3.0          5.9         2.1 virginica
#104          6.3         2.9          5.6         1.8 virginica
#105          6.5         3.0          5.8         2.2 virginica
#106          7.6         3.0          6.6         2.1 virginica
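subset() is a convenience wrapper. The same rows can also be selected with plain logical indexing. The two differ when the condition contains NA: subset() drops those rows, while bracket indexing returns an extra row of NAs for them. The small data frame na_example below is made-up example data used only to show that difference.

```r
# Logical indexing: the same rows as subset(iris, Species == "virginica")
virginica_rows <- iris[iris$Species == "virginica", ]
nrow(virginica_rows)
#[1] 50

# Made-up data with a missing value to show how NA conditions are handled
na_example <- data.frame(id = 1:3, x = c(1, NA, 3))
subset(na_example, x > 1)                  # NA row is dropped: only id 3 is kept
na_example[na_example$x > 1, ]             # the NA condition adds an all-NA row
na_example[which(na_example$x > 1), ]      # which() ignores NA, like subset()
```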

 

Filter multiple values with %in% in R

If you want to filter by multiple values, use the %in% operator like this.

head(subset(iris, Species %in% c("setosa", "virginica")))

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4         0.2  setosa
#2          4.9         3.0          1.4         0.2  setosa
#3          4.7         3.2          1.3         0.2  setosa
#4          4.6         3.1          1.5         0.2  setosa
#5          5.0         3.6          1.4         0.2  setosa
#6          5.4         3.9          1.7         0.4  setosa

If you want to keep the filter criteria separate, store them in a vector first.

criteria <- c("setosa"
              , "virginica")

subset(iris, Species %in% criteria)

If you want to create a not-in condition in R, here is how to do that.
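In short, a not-in filter is the %in% condition wrapped in !(). Some people also define a small helper operator with Negate(); the name %notin% below is a hypothetical choice, not an operator that ships with base R.

```r
criteria <- c("setosa", "virginica")

# Keep rows whose Species is NOT one of the criteria
not_in_rows <- subset(iris, !(Species %in% criteria))

# Hypothetical helper operator: Negate() returns a function with the opposite logical result
`%notin%` <- Negate(`%in%`)
same_rows <- subset(iris, Species %notin% criteria)
```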

Take a look at this post if you want to filter by partial match in R using grepl.
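As a quick illustration of the partial-match idea, grepl() returns TRUE for every value that matches a regular expression, so it can be used directly as a subset() condition. The pattern "^v" (values starting with "v") is only an example.

```r
# grepl() coerces the Species factor to character and tests the regular expression
starts_with_v <- subset(iris, grepl("^v", Species))
unique(as.character(starts_with_v$Species))
#[1] "versicolor" "virginica"
```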

 

Filter function from dplyr

The dplyr package has a function that is actually named filter, which might make the code a little easier to read.

library(dplyr)

iris %>%
  filter(Species == "virginica") %>%
  head()
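Attaching dplyr masks the stats function of the same name (stats::filter, which applies linear filters to time series). When several packages are loaded, calling dplyr::filter with its namespace avoids any ambiguity. A minimal sketch:

```r
# Explicit namespace: works even if dplyr is not attached or filter is masked
virginica <- dplyr::filter(iris, Species == "virginica")
nrow(virginica)
#[1] 50
```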

#  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#1          6.3         3.3          6.0         2.5 virginica
#2          5.8         2.7          5.1         1.9 virginica
#3          7.1         3.0          5.9         2.1 virginica
#4          6.3         2.9          5.6         1.8 virginica
#5          6.5         3.0          5.8         2.2 virginica
#6          7.6         3.0          6.6         2.1 virginica

To filter one column by multiple values, you can use %in% inside filter as well.

iris %>%
  filter(Species %in% c("setosa", "virginica")) %>%
  head()

When the filter involves multiple columns, combine the conditions with the & (and) and | (or) operators.

iris %>%
  filter(Species %in% c("setosa", "virginica") & Sepal.Width > 4) %>%
  head()
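In dplyr's filter, conditions separated by commas are combined with &, so the call above can also be written with a comma. For an either/or condition use |. The threshold Petal.Length > 6.5 below is only an example value.

```r
library(dplyr)

# Comma-separated conditions are combined with AND
with_comma <- iris %>%
  filter(Species %in% c("setosa", "virginica"), Sepal.Width > 4)

with_and <- iris %>%
  filter(Species %in% c("setosa", "virginica") & Sepal.Width > 4)

# | keeps rows that meet at least one of the conditions
either <- iris %>%
  filter(Sepal.Width > 4 | Petal.Length > 6.5)
```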

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.7         4.4          1.5         0.4  setosa
#2          5.2         4.1          1.5         0.1  setosa
#3          5.5         4.2          1.4         0.2  setosa

 

Filter by date interval in R

You can filter either by fixed dates from the dataset or relative to today's date, which the R function Sys.Date returns.

Sys.Date()
#[1] "2022-01-12"

Take a look at these examples of how to subtract days from a date. For example, assuming df has a column Date of class Date, filtering the data from the last 7 days looks like this.

df <- df %>% filter(Date >= Sys.Date() - 7 & Date < Sys.Date())
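Here is a reproducible sketch. The data frame is made-up (one row per day for the last 10 days), and today's date is stored once in today so every comparison uses the same value. Note that this condition keeps the 7 days before today and excludes today itself.

```r
library(dplyr)

today <- Sys.Date()
# Hypothetical data: one row per day, from 9 days ago up to today
df <- data.frame(Date = today - 0:9, value = 1:10)

last_7_days <- df %>% filter(Date >= today - 7 & Date < today)
```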

 

Filter last month in R

To filter last month's data, try the function rollback from lubridate, which returns the last day of the previous month.

lubridate::rollback(Sys.Date())
#[1] "2021-12-31"
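# keep dates after the end of the month before last, up to and including the end of last month
df <- df %>% filter(Date > lubridate::rollback(lubridate::rollback(Sys.Date())) & Date <= lubridate::rollback(Sys.Date()))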
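To check which dates a last-month filter keeps, here is a sketch with a fixed "today" (2022-01-12, the date used in the outputs above) and made-up daily data. Applying rollback twice gives the last day of the month before last, so the filter keeps exactly the previous calendar month.

```r
library(dplyr)
library(lubridate)

today <- as.Date("2022-01-12")  # fixed date so the result is reproducible
# Hypothetical data: one row per day from late November up to "today"
df <- data.frame(Date = seq(as.Date("2021-11-25"), today, by = "day"))

end_prev_month <- rollback(today)             # "2021-12-31"
end_month_before <- rollback(end_prev_month)  # "2021-11-30"

last_month <- df %>% filter(Date > end_month_before & Date <= end_prev_month)
range(last_month$Date)
#[1] "2021-12-01" "2021-12-31"
```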

To get data from the last 3 months (the current month plus the two before it), subtract 2 months from the current date and roll back to the end of the previous month.

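library(lubridate)
# %m-% moves invalid dates (e.g. 31 March minus 1 month) to the last day of the month instead of returning NA
rollback(Sys.Date() %m-% months(2))
#[1] "2021-10-31"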
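The rolled-back date can then serve as the lower bound of the filter. This sketch again assumes a fixed "today" and made-up daily data; it keeps November, December and January up to "today".

```r
library(dplyr)
library(lubridate)

today <- as.Date("2022-01-12")  # fixed date for a reproducible result
# Hypothetical data: one row per day since September 2021
df <- data.frame(Date = seq(as.Date("2021-09-01"), today, by = "day"))

start <- rollback(today %m-% months(2))  # "2021-10-31"
last_3_months <- df %>% filter(Date > start & Date <= today)
range(last_3_months$Date)
#[1] "2021-11-01" "2022-01-12"
```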




