Here is how to detect strings that start or end with a given pattern in R. You can do it with base R's grepl and a little regex, or with the stringr package. For this task grepl is a better choice than grep: grepl returns a logical vector with one TRUE/FALSE per element, which is exactly what you need to detect and filter records, while grep returns only the indices (or, with value = TRUE, the values) of the matches.
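To make the difference concrete, here is a minimal sketch on a small made-up vector (the values are only for illustration, not the dataset used below). It shows what grep and grepl each return for the same pattern.

```r
# A small illustrative vector (not the swiss data used below)
x <- c("Lausanne", "Vevey", "La Vallee")

# grep() returns the indices of the elements that match
idx <- grep("^La", x)
print(idx)    # 1 3

# grep(value = TRUE) returns the matching strings themselves
vals <- grep("^La", x, value = TRUE)
print(vals)   # "Lausanne" "La Vallee"

# grepl() returns one TRUE/FALSE per element, usable directly as a filter
hits <- grepl("^La", x)
print(hits)   # TRUE FALSE TRUE
print(x[hits])
```

Because the logical vector has the same length as the input, it can be passed straight to subset() or dplyr::filter(), which is why grepl is used throughout this post.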
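Here is the dataset. It is built from the row names of R's built-in swiss data frame and has one column, province, holding the province names as strings. This column is used in all examples below.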
snames <- data.frame('province' = row.names(swiss))
head(snames)
#       province
# 1   Courtelary
# 2     Delemont
# 3 Franches-Mnt
# 4      Moutier
# 5   Neuveville
# 6   Porrentruy
Detect strings that start with a pattern by using grepl in R
Let's say I want to keep all rows where the string starts with “La”. In a regular expression, the ^ anchor matches the beginning of the string, so the pattern '^La' matches only strings that begin with “La”.
subset(snames, grepl('^La', snames$province))
#        province
# 18     Lausanne
# 19    La Vallee
# 20       Lavaux
# 40 La Chauxdfnd
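The ^ anchor is what restricts the match to the start of the string. A short sketch with hypothetical strings shows the difference between an unanchored and an anchored pattern.

```r
# Hypothetical strings: the second one contains "La" but does not start with it
x <- c("Lausanne", "Villa Lavaux", "La Vallee")

# Unanchored: "La" may appear anywhere in the string
anywhere <- grepl("La", x)
print(anywhere)   # TRUE TRUE TRUE

# Anchored with ^: "La" must be at the very beginning
at_start <- grepl("^La", x)
print(at_start)   # TRUE FALSE TRUE
```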
Ignore case by using grepl
By default, grepl is case sensitive. To ignore case, set the ignore.case parameter to TRUE.
subset(snames, grepl('^la', snames$province, ignore.case = TRUE))
#        province
# 18     Lausanne
# 19    La Vallee
# 20       Lavaux
# 40 La Chauxdfnd
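Here is a minimal sketch of the effect of ignore.case on a small vector with mixed capitalisation (the strings are made up for illustration).

```r
x <- c("Lausanne", "lavaux", "LA VALLEE", "Vevey")

# Default: case sensitive, so only the lowercase "la" prefix matches
sensitive <- grepl("^la", x)
print(sensitive)    # FALSE TRUE FALSE FALSE

# ignore.case = TRUE: "La", "la" and "LA" all match
insensitive <- grepl("^la", x, ignore.case = TRUE)
print(insensitive)  # TRUE TRUE TRUE FALSE
```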
The pattern can also contain a space. For example, '^La ' matches only strings where “La” is followed by a space, which leaves out Lausanne and Lavaux.
subset(snames, grepl('^La ', snames$province))
#        province
# 19    La Vallee
# 40 La Chauxdfnd
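A trailing space requires a space after “La”, so a string that is exactly “La” would not match. If that matters, a word boundary is one possible alternative (it is not used in the original examples). This sketch assumes perl = TRUE so that \\b is interpreted by the PCRE engine; the strings are hypothetical.

```r
x <- c("La Vallee", "Lavaux", "La Chauxdfnd", "Lausanne", "La")

# A trailing space requires "La" to be followed by a space
with_space <- grepl("^La ", x)
print(with_space)     # TRUE FALSE TRUE FALSE FALSE

# A word boundary (PCRE via perl = TRUE) also accepts "La" at the end of the string
with_boundary <- grepl("^La\\b", x, perl = TRUE)
print(with_boundary)  # TRUE FALSE TRUE FALSE TRUE
```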
Detect strings that start with a pattern by using str_starts from stringr
An alternative to grepl is the stringr package. Its str_starts function checks whether each string starts with a pattern; the pattern is treated as a regular expression by default, and no ^ anchor is needed. To filter records based on the result, use it inside filter in a dplyr pipe.
library(dplyr)  # provides %>% and filter()

snames %>% filter(stringr::str_starts(province, 'La'))
#       province
# 1     Lausanne
# 2    La Vallee
# 3       Lavaux
# 4 La Chauxdfnd
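A minimal sketch, assuming stringr is installed, that shows str_starts returns the same logical vector as grepl with a ^ anchor, and that its pattern is a regular expression unless wrapped in fixed(). The strings are illustrative.

```r
library(stringr)

x <- c("Lausanne", "Vevey", "La Vallee")

# str_starts() returns one TRUE/FALSE per element, like grepl() with ^
starts <- str_starts(x, "La")
print(starts)                              # TRUE FALSE TRUE
print(identical(starts, grepl("^La", x)))  # TRUE

# The pattern is a regex by default, so "." matches any character;
# fixed() turns it into a literal match
y <- c("L.x", "Lax")
as_regex <- str_starts(y, "L.")
as_fixed <- str_starts(y, fixed("L."))
print(as_regex)  # TRUE TRUE
print(as_fixed)  # TRUE FALSE
```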
str_starts is also case sensitive. There are two ways to make it case insensitive.
The first option is to convert the text to lowercase before matching (and write the pattern in lowercase too).
snames %>% filter(stringr::str_starts(tolower(province), 'la'))
#       province
# 1     Lausanne
# 2    La Vallee
# 3       Lavaux
# 4 La Chauxdfnd
The second option is to wrap the pattern in stringr's regex function with ignore_case = TRUE.
snames %>% filter(stringr::str_starts(province, stringr::regex('la', ignore_case = TRUE)))
#       province
# 1     Lausanne
# 2    La Vallee
# 3       Lavaux
# 4 La Chauxdfnd
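Both options give the same result when the pattern is lowercase. The sketch below, assuming stringr is installed and using illustrative strings, compares them and shows one pitfall of the tolower approach: an uppercase letter in the pattern can never match lowercased text.

```r
library(stringr)

x <- c("Lausanne", "lavaux", "LA VALLEE", "Vevey")

# Option 1: lowercase the text, then match a lowercase pattern
opt1 <- str_starts(tolower(x), "la")

# Option 2: keep the text as is and make the regex case insensitive
opt2 <- str_starts(x, regex("la", ignore_case = TRUE))

print(opt1)  # TRUE TRUE TRUE FALSE
print(opt2)  # TRUE TRUE TRUE FALSE

# Pitfall of option 1: "La" never matches text that was lowercased
pitfall <- str_starts(tolower(x), "La")
print(pitfall)  # FALSE FALSE FALSE FALSE
```

The regex() option avoids this pitfall because the text is never modified and the case rule lives in the pattern itself.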
Detect strings that end with a pattern by using grepl in R
To check how a string ends, use grepl with the $ anchor, which matches the end of the string. For example, 'y$' matches strings whose last character is y.
subset(snames, grepl('y$', snames$province))
#      province
# 1  Courtelary
# 6  Porrentruy
# 15   Cossonay
# 29      Vevey
# 31    Conthey
# 34   Martigwy
# 35    Monthey
# 39     Boudry
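The sketch below uses a few province names from swiss to show the effect of the $ anchor, and how ^ and $ can be combined in one pattern to constrain both ends.

```r
# A few province names from swiss, picked to show the effect of $
x <- c("Vevey", "Yverdon", "Boudry", "Payerne")

# Unanchored: "y" anywhere (case sensitive, so the capital Y in Yverdon does not count)
contains_y <- grepl("y", x)
print(contains_y)  # TRUE FALSE TRUE TRUE

# Anchored with $: "y" must be the last character
ends_y <- grepl("y$", x)
print(ends_y)      # TRUE FALSE TRUE FALSE

# ^ and $ combined: starts with "V" and ends with "y"
v_and_y <- grepl("^V.*y$", x)
print(v_and_y)     # TRUE FALSE FALSE FALSE
```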
Detect strings that end with a pattern by using str_ends from stringr
To check the end of a string with stringr, use its str_ends function.
snames %>% filter(stringr::str_ends(province, 'y'))
#     province
# 1 Courtelary
# 2 Porrentruy
# 3   Cossonay
# 4      Vevey
# 5    Conthey
# 6   Martigwy
# 7    Monthey
# 8     Boudry
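A minimal sketch, assuming stringr is installed, showing that str_ends matches grepl with a $ anchor, and using its negate parameter to keep the strings that do not end with the pattern.

```r
library(stringr)

x <- c("Courtelary", "Lausanne", "Vevey")

# str_ends() gives the same logical vector as grepl() with a $ anchor
ends <- str_ends(x, "y")
print(ends)                               # TRUE FALSE TRUE
print(identical(ends, grepl("y$", x)))    # TRUE

# negate = TRUE flips the result
not_ends <- str_ends(x, "y", negate = TRUE)
print(not_ends)                           # FALSE TRUE FALSE
```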
I hope this helps you detect strings that start or end with certain patterns in R.
If you want to detect several strings at once, take a look at this post.
If you want to extract text based on the results, then take a look at this solution.