The RStudio data viewer is a great tool for looking into data, but sometimes it is necessary to filter by data frame row number in R. When importing files, you might get a parsing warning that points to specific row numbers, and those rows then need further investigation.
Imagine that you are importing a text file with the read_delim function from the readr package. Sometimes, as a result, you might get a warning like the one below. What exactly happened in the one or more rows that are causing parsing failures?
Warning: 2 parsing failures.
   row col expected actual               file
 14756  X7 a double   NULL 'C:/source/my.txt'
107524  X7 a double   NULL 'C:/source/my.txt'
Here is a data frame that I will use in the examples below.
head(mtcars)
#                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
More general examples of filtering in R are covered separately.
Filter by data frame row number in R
base
It is quite simple to filter by data frame row number in R if you know how the square brackets work. The first position inside the brackets is for rows and the second is for columns. The order is easy to remember if you are an Excel user and know the R1C1 cell reference style: row first, then column.
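As a small illustration of the row-then-column order, the sketch below indexes the built-in mtcars data frame in a few ways. The variable names are made up for this example.

```r
# Row index comes first, column index second: mtcars[row, column]
one_cell   <- mtcars[3, 1]   # third row, first column (mpg)
one_row    <- mtcars[3, ]    # third row, all columns
one_column <- mtcars[, 1]    # all rows, first column, returned as a vector

# Columns can also be addressed by name
two_columns <- mtcars[3, c("mpg", "cyl")]

one_cell
# [1] 22.8
```

Leaving a position empty means "everything" for that dimension, which is why the comma is still needed when only rows are selected.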
Here is how to get the third row from the data frame.
mtcars[3,]
#            mpg cyl disp hp drat   wt  qsec vs am gear carb
#Datsun 710 22.8   4  108 93 3.85 2.32 18.61  1  1    4    1
Here is how to filter multiple separate rows from the data frame in R.
mtcars[c(3,5),]
#                   mpg cyl disp  hp drat   wt  qsec vs am gear carb
#Datsun 710        22.8   4  108  93 3.85 2.32 18.61  1  1    4    1
#Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
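Row numbers taken from a warning can be kept in a vector and reused. The sketch below assumes a hypothetical vector called wanted_rows. An index beyond the last row would come back as a row of NA values, so such indices are dropped first.

```r
# Hypothetical row numbers, e.g. copied from a parsing warning
wanted_rows <- c(3, 5, 40)

# mtcars has 32 rows, so keep only the indices that actually exist
valid_rows <- wanted_rows[wanted_rows <= nrow(mtcars)]

subset_rows <- mtcars[valid_rows, ]
rownames(subset_rows)
# [1] "Datsun 710"        "Hornet Sportabout"

# A consecutive range of rows works the same way
range_rows <- mtcars[3:5, ]
```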
dplyr
Here is the same task as in base R, this time using the dplyr functions row_number and filter.
library(dplyr)
mtcars %>% filter(row_number() == 3)
#            mpg cyl disp hp drat   wt  qsec vs am gear carb
#Datsun 710 22.8   4  108 93 3.85 2.32 18.61  1  1    4    1
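You can also use the base function row instead of row_number, but it returns a matrix of row indices, so compare a single column of it, for example row(mtcars)[, 1] == 3.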
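A minimal sketch of the row approach in base R, assuming the built-in mtcars data frame. Since row gives back a matrix with one column per data frame column, only the first column of that matrix is used here.

```r
# row() returns a matrix of row indices with the same dimensions as its input
row_index <- row(mtcars)[, 1]   # first column of that matrix: 1, 2, ..., 32

# Base R subsetting with the resulting logical vector
third_row <- mtcars[row_index == 3, ]
rownames(third_row)
# [1] "Datsun 710"
```

The same logical comparison should also be usable as a condition inside filter, although row_number is shorter to write there.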
It is even easier to subset rows by index with the dplyr function slice.
mtcars %>% slice(3)
#            mpg cyl disp hp drat   wt  qsec vs am gear carb
#Datsun 710 22.8   4  108 93 3.85 2.32 18.61  1  1    4    1
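Beyond a single index, slice also accepts ranges and negative indices. The following is a short sketch with made-up variable names, assuming dplyr is installed.

```r
library(dplyr)

# A consecutive range of rows
rows_3_to_5 <- mtcars %>% slice(3:5)

# Negative indices drop rows instead of keeping them
without_first_three <- mtcars %>% slice(-(1:3))

# n() gives the number of rows, so this keeps only the last row
last_row <- mtcars %>% slice(n())
```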
If you want to filter several separate rows with dplyr, you can use the %in% operator inside filter, or do it more quickly with the slice function.
mtcars %>% filter(row_number() %in% c(3, 5))
#                   mpg cyl disp  hp drat   wt  qsec vs am gear carb
#Datsun 710        22.8   4  108  93 3.85 2.32 18.61  1  1    4    1
#Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
mtcars %>% slice(3, 5)
#                   mpg cyl disp  hp drat   wt  qsec vs am gear carb
#Datsun 710        22.8   4  108  93 3.85 2.32 18.61  1  1    4    1
#Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
Here is how to implement the “not in” operator in R.
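One way to exclude rows by number, sketched under the assumption that dplyr is installed: negate the %in% test, or use negative indices in base R. The %notin% operator below is a hypothetical helper defined just for this example; it is not part of base R or dplyr.

```r
library(dplyr)

# Keep every row except the third and fifth by negating %in%
kept_dplyr <- mtcars %>% filter(!(row_number() %in% c(3, 5)))

# Base R equivalent with negative indices
kept_base <- mtcars[-c(3, 5), ]

# Hypothetical helper operator built from %in%
`%notin%` <- Negate(`%in%`)
kept_helper <- mtcars %>% filter(row_number() %notin% c(3, 5))

nrow(kept_dplyr)
# [1] 30
```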
Pull values from the row in R
If you want to pull only values from a data frame row to create a vector, then here is how to do that.
as.character(as.vector(mtcars[3,]))
# [1] "22.8" "4" "108" "93" "3.85" "2.32" "18.61" "1" "1" "4" "1"
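If the values should stay numeric rather than become character strings, unlist is one alternative. This is a small sketch using the same third row of mtcars, with illustrative variable names.

```r
# unlist() flattens the one-row data frame into a named numeric vector
row_values <- unlist(mtcars[3, ])
row_values[["mpg"]]
# [1] 22.8

# Drop the names when only the values are needed
plain_values <- unlist(mtcars[3, ], use.names = FALSE)
```

This works cleanly here because every column of mtcars is numeric; with mixed column types, unlist would coerce everything to a common type.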