
String Matching in R Programming

Last Updated : 23 May, 2024

String matching is an important aspect of any language. It is used to find, replace, and remove strings or parts of strings. To understand string matching in R, we first need to know which functions R provides for it. These functions accept a pattern that can be either a literal string or a regular expression. A regular expression is a string made of ordinary characters and special symbols (metacharacters) that describes a pattern, and it is used to find and extract the information needed from the given data.
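The difference between a literal string and a regular expression is easiest to see side by side. The following is a minimal sketch using base R's grepl() and its fixed argument; the sample vectors are made up for illustration. The second half shows anchors (^, $) and the + quantifier, and that a space inside a pattern is matched as an ordinary character.

```r
# Made-up sample vector for illustration
x <- c("a.b", "axb", "ab")

# As a regular expression, "." matches any single character
regex_hits <- grepl("a.b", x)                  # TRUE TRUE FALSE

# fixed = TRUE treats the pattern as a literal string, so "." is just a dot
literal_hits <- grepl("a.b", x, fixed = TRUE)  # TRUE FALSE FALSE

# ^ anchors to the start, $ to the end, + means "one or more of the previous item"
bits <- c("1001", "11", "10012", "101")
anchored <- grepl("^10+1$", bits)              # TRUE FALSE FALSE TRUE

# Spaces are ordinary characters: this pattern requires literal spaces
spaced <- grepl("10 + 1", bits)                # FALSE FALSE FALSE FALSE

print(regex_hits)
print(literal_hits)
print(anchored)
print(spaced)
```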

Operations on String Matching

Finding a String

To search for a particular pattern in a string, R provides several functions. If we need the location of the matching elements, we can use grep(). If we only need to know whether the pattern occurs, we can use the logical function grepl(), which returns TRUE or FALSE for each element. Let us look at these functions in turn.
  • grep() function: It returns the indices of the elements of the vector in which the pattern is found. If several elements match, it returns an integer vector of all their indices. This is useful because it tells us not only whether the pattern occurs but also where it occurs in the vector.
    Syntax: grep(pattern, x, ignore.case = FALSE)
    Parameters:
    pattern: A regular expression pattern.
    x: The character vector to be searched.
    ignore.case: Whether to ignore case in the search. This parameter is optional and is set to FALSE by default.
    Example 1: To find all elements that contain 'he'.
    str <- c("Hello", "hello", "hi", "hey")
    grep('he', str)
    
    Output:
    [1] 2 4
    
    As the output shows, 'Hello' was not matched because 'H' and 'h' differ in case. To ignore case, set the parameter ignore.case to TRUE (it is FALSE by default).
    Example 2: To find all elements that contain 'he', irrespective of case.
    str <- c("Hello", "hello", "hi", "hey")
    grep('he', str, ignore.case = TRUE)
    
    Output:
    [1] 1 2 4
    
  • grepl() function: It is a logical function that returns a logical vector of the same length as the input, with TRUE for each element in which the pattern is found and FALSE for each element in which it is not.
    Syntax: grepl(pattern, x, ignore.case = FALSE)
    Parameters:
    pattern: A regular expression pattern.
    x: The character vector to be searched.
    ignore.case: Whether to ignore case in the search. This parameter is optional and is set to FALSE by default.
    Example 1: To check which elements contain 'the'.
    str <- c("Hello", "hello", "hi", "hey")
    grepl('the', str)
    
    Output:
    [1] FALSE FALSE FALSE FALSE
    
    Example 2: To check which elements contain 'he'.
    str <- c("Hello", "hello", "hi", "hey")
    grepl('he', str)
    
    Output:
    [1] FALSE  TRUE FALSE  TRUE
    
  • regexpr() function: It searches every element of the character vector separately. For example, if the vector contains n strings, all n strings are searched for the pattern. For each element it returns the character position at which the first match starts, or -1 if there is no match, so the output has the same length as the input. The result also carries a match.length attribute giving the length of each match (-1 where there is none); R prints this and other attributes below the positions, but they are omitted from the outputs shown here.
    Syntax: regexpr(pattern, text, ignore.case = FALSE)
    Parameters:
    pattern: A regular expression pattern.
    text: The character vector to be searched, where each element is searched separately.
    ignore.case: Whether to ignore case in the search. This parameter is optional and is set to FALSE by default.
    Example 1: To find where 'he' first occurs in each string of the vector.
    str <- c("Hello", "hello", "hi", "ahey")
    regexpr('he', str)
    
    Output:
    [1] -1  1 -1  2
    
    Example 2: To check whether each string of the vector starts with a vowel.
    str <- c("abra", "Ubra", "hunt", "quirky")
    regexpr('^[aeiouAEIOU]', str)
    
    Output:
    [1]  1  1 -1 -1
    Example 3: To check whether each string of the vector has the form '10+1', i.e. a 1, one or more 0s, and a final 1.
    str <- c("1001", "11", "10012", "101")
    regexpr('^10+1$', str)
    
    Output:
    [1]  1 -1 -1  1
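The three search functions above answer different questions about the same vector. The sketch below runs them side by side on one sample vector (the values from the regexpr() example) and adds two base R features not covered above: grep(..., value = TRUE), which returns the matching elements themselves, and any(), which collapses the result of grepl() into a single TRUE or FALSE. The variable names are illustrative only.

```r
# Sample vector (same values as the regexpr() example)
words <- c("Hello", "hello", "hi", "ahey")

# grep(): indices of the elements that contain "he"
idx <- grep("he", words)                       # 2 4

# grep(value = TRUE): the matching elements themselves
matched <- grep("he", words, value = TRUE)     # "hello" "ahey"

# grepl(): one TRUE/FALSE per element, same length as words
hits <- grepl("he", words)                     # FALSE TRUE FALSE TRUE

# any() collapses the logical vector into a single yes/no answer
found <- any(hits)                             # TRUE

# regexpr(): start of the first match in each element, -1 if none;
# as.vector() drops the attributes so only the positions remain
pos <- regexpr("he", words)
starts <- as.vector(pos)                       # -1 1 -1 2
lens <- as.vector(attr(pos, "match.length"))   # -1 2 -1 2

print(idx)
print(matched)
print(hits)
print(found)
print(starts)
print(lens)
```

Note that words[hits] selects the same elements as grep("he", words, value = TRUE), so grepl() is the natural choice when the result is used as a filter.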

Finding and Replacing Strings

To search for and replace a particular string, we can use two functions, sub() and gsub(). sub() replaces only the first occurrence of the pattern in each element and returns the modified string. gsub(), on the other hand, replaces all occurrences of the pattern and returns the modified string.
Syntax: sub(pattern, replacement, x, ignore.case = FALSE)
gsub(pattern, replacement, x, ignore.case = FALSE)
Parameters:
pattern: A regular expression pattern.
replacement: The string that replaces each match.
x: The character vector to be searched for the pattern to be replaced.
ignore.case: Whether to ignore case in the search. This parameter is optional and is set to FALSE by default.
Example 1: To replace the first occurrence of 'he' with 'aa'
str <- "heutabhe"
sub('he', 'aa', str)
Output:
[1] "aautabhe"
Example 2: To replace all occurrences of 'he' with 'aa'
str <- "heutabhe"
gsub('he', 'aa', str)
Output:
[1] "aautabaa"
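Both functions also work element by element on a vector and accept ignore.case. The following sketch, with a made-up input vector, shows that sub() changes only the first match in each string, gsub() changes every match, and elements without a match are returned unchanged.

```r
# Made-up input vector for illustration
x <- c("heutabhe", "HEllo he", "none")

# sub(): only the first match in each element is replaced
first_only <- sub("he", "aa", x, ignore.case = TRUE)  # "aautabhe" "aallo he" "none"

# gsub(): every match in each element is replaced
every <- gsub("he", "aa", x, ignore.case = TRUE)      # "aautabaa" "aallo aa" "none"

print(first_only)
print(every)
```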

Finding and Removing Strings

To search for and remove a particular string or pattern, we can use two functions from the stringr package, str_remove() and str_remove_all(). str_remove() removes only the first occurrence of the pattern in each element and returns the modified string. str_remove_all(), on the other hand, removes all occurrences of the pattern and returns the modified string.
Syntax: str_remove(string, pattern)
str_remove_all(string, pattern)
Parameters:
string: The character vector to be searched for the pattern to be removed.
pattern: A regular expression pattern. These functions have no ignore.case argument; to ignore case, wrap the pattern as regex(pattern, ignore_case = TRUE).
Example 1: Removing the first vowel from each element of the vector
library(stringr)
x <- c("apple", "pear", "banana")
str_remove(x, "[aeiou]")
Output:
[1] "pple"  "par"   "bnana"
Example 2: Removing all vowels from each element of the vector
library(stringr)
x <- c("apple", "pear", "banana")
str_remove_all(x, "[aeiou]")
Output:
[1] "ppl" "pr"  "bnn"
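If stringr is not available, removal can be sketched in base R as replacement with an empty string: sub(pattern, "", x) removes the first match in each element and gsub(pattern, "", x) removes all of them. For the simple vowel class used here this gives the same results as the str_remove() examples; note that stringr uses the ICU regex engine while base R uses TRE by default, so more elaborate patterns may behave differently.

```r
# Same input as the str_remove() examples
x <- c("apple", "pear", "banana")

# Replacing the first match with "" removes it
first_removed <- sub("[aeiou]", "", x)   # "pple" "par" "bnana"

# Replacing every match with "" removes all of them
all_removed <- gsub("[aeiou]", "", x)    # "ppl" "pr" "bnn"

print(first_removed)
print(all_removed)
```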
