grepl string start or end with in R

Detect strings that start or end with a pattern in R

Here is how to detect strings that start or end with a given pattern in R. You can do that with grepl and a little regex, or with the stringr package. Here grepl is a better choice than grep because it returns a logical vector (one TRUE or FALSE per string), which is convenient for detecting and filtering records, whereas grep returns the positions of the matches by default.
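To make the difference between grepl and grep concrete, here is a minimal sketch on a small vector. The name x and its three values are assumptions chosen for illustration; they are not the dataset used in the rest of this post.

```r
# Illustrative vector; `x` is a hypothetical name used only in this sketch
x <- c("Lausanne", "Moutier", "La Vallee")

# grepl() returns one TRUE/FALSE per element, ready to use as a row filter
is_match <- grepl("^La", x)
print(is_match)

# grep() returns the positions of the matching elements instead
match_idx <- grep("^La", x)
print(match_idx)

# Both can be used for subsetting and select the same strings here
print(x[is_match])
print(x[match_idx])
```

The logical vector has the same length as the input, which is why it slots directly into subset or filter.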

Here is my dataset, built from the row names of the built-in swiss data set. Its single column contains the strings used below.

snames <- data.frame('province' = row.names(swiss))

head(snames)

#      province
#1   Courtelary
#2     Delemont
#3 Franches-Mnt
#4      Moutier
#5   Neuveville
#6   Porrentruy

Detect strings that start with a pattern by using grepl in R

Let’s say I want to filter all rows where the string starts with “La”. In a regex, the caret (^) anchors the pattern to the beginning of the string, so '^La' matches only strings that begin with “La”.

subset(snames, grepl('^La', snames$province))

#       province
#18     Lausanne
#19    La Vallee
#20       Lavaux
#40 La Chauxdfnd

Ignore the case by using grepl

By default, grepl is case sensitive. To ignore case, set the ignore.case argument to TRUE.

subset(snames, grepl('^la', snames$province, ignore.case = TRUE))

#       province
#18     Lausanne
#19    La Vallee
#20       Lavaux
#40 La Chauxdfnd
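The dataset above happens to capitalize every name, so the effect of ignore.case is easier to see on mixed-case input. A small sketch, assuming a made-up vector x (the lowercase spelling is invented for illustration):

```r
# Hypothetical mixed-case input; not part of the original dataset
x <- c("Lausanne", "la vallee", "Moutier")

# Case-sensitive match: only the lowercase "la" qualifies
sensitive <- grepl("^la", x)
print(sensitive)

# Case-insensitive match: "La" and "la" both qualify
insensitive <- grepl("^la", x, ignore.case = TRUE)
print(insensitive)

# Lowercasing the input first is an equivalent approach for this pattern
lowered <- grepl("^la", tolower(x))
print(lowered)
```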

A space in the grepl pattern is matched literally, so you can include one to keep only the names where “La” is a separate word.

subset(snames, grepl('^La ', snames$province))

#       province
#19    La Vallee
#40 La Chauxdfnd
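When the prefix is plain text rather than a regex, base R (3.3.0 and later) also offers startsWith, which compares the prefix literally. This is an aside rather than the approach used in this post. The sketch below assumes a hypothetical vector x with values borrowed from the dataset, and shows why literal matching matters when the prefix contains a regex metacharacter such as the dot.

```r
# Hypothetical vector; the values are taken from the province names
x <- c("V. De Geneve", "Vevey", "Val de Ruz")

# In a regex the dot matches any character, so all three names match
regex_dot <- grepl("^V.", x)
print(regex_dot)

# Escaping the dot makes it a literal dot
regex_escaped <- grepl("^V\\.", x)
print(regex_escaped)

# startsWith() never interprets the prefix as a regex;
# it expects a character vector, so convert factors with as.character() first
literal <- startsWith(x, "V.")
print(literal)
```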

Detect strings that start with a pattern by using str_starts from stringr

An alternative to grepl is the family of functions in the stringr package. Helpers such as str_starts are user-friendly, and they still accept regex patterns. To filter records based on the result, use them inside a dplyr pipe.

library(dplyr)  # provides %>% and filter()
snames %>% filter(stringr::str_starts(province, 'La'))

#      province
#1     Lausanne
#2    La Vallee
#3       Lavaux
#4 La Chauxdfnd
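Outside a pipe, str_starts works on a plain character vector just like grepl does. The following is a minimal sketch, assuming stringr 1.4.0 or later is installed; the vector x is hypothetical, with values borrowed from the dataset.

```r
library(stringr)

# Hypothetical vector; the values are taken from the province names
x <- c("Lausanne", "Moutier", "La Vallee")

# str_starts() returns one TRUE/FALSE per element
starts_la <- str_starts(x, "La")
print(starts_la)

# The pattern is a regex by default, so this is the grepl() equivalent
base_la <- grepl("^La", x)
print(base_la)

# negate = TRUE flips the result: strings that do NOT start with "La"
not_la <- str_starts(x, "La", negate = TRUE)
print(not_la)
```

Note that str_starts adds the start anchor itself, so the pattern is written without the caret.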

The str_starts function is also case sensitive. To match case-insensitively, you have two options.
The first option is to convert the text to lowercase before matching.

snames %>% filter(stringr::str_starts(tolower(province), 'la'))

#      province
#1     Lausanne
#2    La Vallee
#3       Lavaux
#4 La Chauxdfnd

The second option is to wrap the pattern in the regex function from stringr and set ignore_case = TRUE.

snames %>% filter(stringr::str_starts(province, stringr::regex('la', ignore_case = TRUE)))

#      province
#1     Lausanne
#2    La Vallee
#3       Lavaux
#4 La Chauxdfnd

 

Detect strings that end with a pattern by using grepl in R

To check how a string ends, use grepl with a regex that ends in the dollar sign ($), which anchors the pattern to the end of the string.

subset(snames, grepl('y$', snames$province))

#     province
#1  Courtelary
#6  Porrentruy
#15   Cossonay
#29      Vevey
#31    Conthey
#34   Martigwy
#35    Monthey
#39     Boudry
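The start and end anchors can also be combined in one pattern. A minimal sketch on a hypothetical vector x (values borrowed from the dataset), contrasting "starts with La and ends with e" with "starts with La or ends with y":

```r
# Hypothetical vector; the values are taken from the province names
x <- c("Lausanne", "Vevey", "La Vallee", "Moutier")

# Starts with "La" AND ends with "e" (.* allows any characters in between)
both <- grepl("^La.*e$", x)
print(both)

# Starts with "La" OR ends with "y"
either <- grepl("^La|y$", x)
print(either)

# Base R endsWith() is a literal (non-regex) alternative for the suffix test
ends_y <- endsWith(x, "y")
print(ends_y)
```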

 

Detect strings that end with a pattern by using str_ends from stringr

To check the end of a string with stringr, use the str_ends function.

snames %>% filter(stringr::str_ends(province, 'y'))

#    province
#1 Courtelary
#2 Porrentruy
#3   Cossonay
#4      Vevey
#5    Conthey
#6   Martigwy
#7    Monthey
#8     Boudry
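One practical difference shows up when the pattern contains alternatives. str_ends wraps the whole pattern before adding the end anchor, while a hand-written grepl pattern needs explicit parentheses. The sketch below assumes stringr 1.4.0 or later is installed; the vector x is hypothetical, with values borrowed from the dataset.

```r
library(stringr)

# Hypothetical vector; the values are taken from the province names
x <- c("Vevey", "Sarine", "Paysd'enhaut", "Moutier")

# str_ends() anchors the whole alternation: ends with "y" or ends with "e"
ends_stringr <- str_ends(x, "y|e")
print(ends_stringr)

# Without parentheses, $ binds only to "e": this also matches any string
# that merely contains a "y", such as "Paysd'enhaut"
ends_unanchored <- grepl("y|e$", x)
print(ends_unanchored)

# Grouping the alternatives gives the same result as str_ends()
ends_grouped <- grepl("(y|e)$", x)
print(ends_grouped)
```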

 

I hope this will help you detect strings that start or end with a given pattern in R.

If you want to detect several strings at once, then take a look at this post.

If you want to extract text based on the results, then take a look at this solution.




