Filtering rows that contain a certain string using dplyr in R
Last Updated: 28 Jul, 2021
In this article, we will learn how to filter rows that contain a certain string using the dplyr package in the R programming language.
Functions Used
The two main functions used for this task are:
- filter(): the dplyr package's filter() function keeps the rows that satisfy a condition and drops the rest
Syntax: filter(df, condition)
Parameters:
- df: The data frame object
- condition: The condition to filter the data upon
- grepl(): base R's grepl() function returns TRUE for each element of a character vector in which the specified pattern is found, and FALSE for each element in which it is not
Syntax: grepl(pattern, x, ignore.case = FALSE)
Parameters:
- pattern: a regular expression pattern
- x: the character vector to be searched
- ignore.case: whether to ignore case when matching. This parameter is optional and defaults to FALSE.
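Because filter() keeps the rows for which its condition is TRUE, it helps to see what grepl() returns on its own. Below is a minimal base-R sketch that uses the role strings from the data frame below as a plain character vector. The variable name `roles` here is just for illustration.

```r
# Role strings from the example data frame, as a plain character vector
roles <- c("Software Eng.", "Software Dev", "Data Analyst",
           "Data Eng.", "FrontEnd Dev")

# Matching is case-sensitive by default: no element contains lowercase "dev"
grepl("dev", roles)
#> [1] FALSE FALSE FALSE FALSE FALSE

# With ignore.case = TRUE, "Dev" now matches as well
grepl("dev", roles, ignore.case = TRUE)
#> [1] FALSE  TRUE FALSE FALSE  TRUE
```

grepl() returns one TRUE/FALSE value per element, which is exactly the kind of logical vector filter() expects as a condition.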
Dataframe in Use:
marks | age | roles
---|---|---
20.1 | 21 | Software Eng.
30.2 | 22 | Software Dev
40.3 | 23 | Data Analyst
50.4 | 24 | Data Eng.
60.5 | 25 | FrontEnd Dev
Filtering rows that contain the given string
Pass the string to search for and the column to search in to grepl(). It returns TRUE or FALSE for each row, and filter() keeps only the rows where the result is TRUE.
Syntax: df %>% filter(grepl('Pattern', column_name))
Parameters:
- df: data frame object
- grepl(): searches for the pattern
- "Pattern": the pattern (string) to search for
- column_name: the column in which the pattern is searched
Example:
R
library(dplyr)
df <- data.frame( marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
age = c(21:25),
roles = c('Software Eng.', 'Software Dev',
'Data Analyst', 'Data Eng.',
'FrontEnd Dev'))
df %>% filter(grepl('Dev', roles))
Output:
marks age roles
1 30.2 22 Software Dev
2 60.5 25 FrontEnd Dev
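The same filter can be made case-insensitive through grepl()'s ignore.case parameter described earlier. The sketch below uses the lowercase pattern "dev" purely to show the effect. It should keep the same two rows as the example above (Software Dev and FrontEnd Dev).

```r
library(dplyr)

df <- data.frame(marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
                 age   = 21:25,
                 roles = c("Software Eng.", "Software Dev",
                           "Data Analyst", "Data Eng.",
                           "FrontEnd Dev"))

# Lowercase pattern; ignore.case = TRUE lets it match "Dev" as well
dev_rows <- df %>% filter(grepl("dev", roles, ignore.case = TRUE))
dev_rows
```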
Filtering rows that do not contain the given string
The only difference from the previous approach is the '!' (NOT) operator. It inverts the logical vector returned by grepl(), turning TRUE into FALSE and vice versa, so filter() keeps only the rows that do not contain the pattern and drops the rows that do.
Syntax: df %>% filter(!grepl('Pattern', column_name))
Parameters:
- df: data frame object
- grepl(): searches for the pattern
- "Pattern": the pattern (string) to search for
- column_name: the column in which the pattern is searched
Example:
R
library(dplyr)
df <- data.frame( marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
age = c(21:25),
roles = c('Software Eng.', 'Software Dev',
'Data Analyst', 'Data Eng.',
'FrontEnd Dev'))
df %>% filter(!grepl('Eng.', roles))
Output:
marks age roles
1 30.2 22 Software Dev
2 40.3 23 Data Analyst
3 60.5 25 FrontEnd Dev
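In the pattern 'Eng.', the dot is a regular-expression metacharacter that matches any single character. The example above works only because no role contains "Eng" followed by some other character. To match the literal text "Eng.", pass fixed = TRUE to grepl(). This treats the pattern as a plain string. The sketch below shows this; the string "Engineer" is a hypothetical value used only to show the difference.

```r
library(dplyr)

df <- data.frame(marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
                 age   = 21:25,
                 roles = c("Software Eng.", "Software Dev",
                           "Data Analyst", "Data Eng.",
                           "FrontEnd Dev"))

# fixed = TRUE: "Eng." is matched literally, dot included
not_eng <- df %>% filter(!grepl("Eng.", roles, fixed = TRUE))
not_eng

# Why it matters: as a regex, "." matches any character
grepl("Eng.", "Engineer")                # TRUE  ("Eng" followed by "i")
grepl("Eng.", "Engineer", fixed = TRUE)  # FALSE (no literal "Eng.")
```

Escaping the dot as "Eng\\." is an alternative when the rest of the pattern still needs regex features.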
Filtering rows containing multiple patterns (strings)
This approach is similar to the ones above. The only difference is that multiple patterns (strings) are passed to grepl() in a single string, separated by the regular-expression alternation operator '|' (OR) with no surrounding spaces. filter() then keeps every row that contains at least one of the patterns.
Syntax:
df %>% filter(grepl('Patt.1|Patt.2', column_name))
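Whitespace inside a regular expression is significant. If spaces are written around '|', they become part of each alternative: "Dev | Eng." searches for "Dev " (with a trailing space) or " Eng." (with a leading space), not for "Dev" or "Eng.". A small base-R check on the role strings illustrates this:

```r
roles <- c("Software Eng.", "Software Dev", "Data Analyst",
           "Data Eng.", "FrontEnd Dev")

# No spaces: the alternatives are "Dev" and "Eng."
grepl("Dev|Eng.", roles)
#> [1]  TRUE  TRUE FALSE  TRUE  TRUE

# With spaces: the alternatives are "Dev " and " Eng.",
# so "Dev" at the end of a string no longer matches
grepl("Dev | Eng.", roles)
#> [1]  TRUE FALSE FALSE  TRUE FALSE
```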
Example:
R
library(dplyr)
df <- data.frame( marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
age = c(21:25),
roles = c('Software Eng.', 'Software Dev',
'Data Analyst', 'Data Eng.',
'FrontEnd Dev'))
df %>% filter(grepl('Dev|Eng.', roles))
Output:
marks age roles
1 20.1 21 Software Eng.
2 30.2 22 Software Dev
3 50.4 24 Data Eng.
4 60.5 25 FrontEnd Dev
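When the search terms are stored in a character vector, the alternation pattern can be built with paste(..., collapse = "|") instead of being typed by hand. The sketch below assumes a hypothetical vector `patterns`; each term is still interpreted as a regular expression, so the dot in "Eng." keeps its "any character" meaning.

```r
library(dplyr)

df <- data.frame(marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
                 age   = 21:25,
                 roles = c("Software Eng.", "Software Dev",
                           "Data Analyst", "Data Eng.",
                           "FrontEnd Dev"))

# Hypothetical vector of search terms, joined into "Dev|Eng."
patterns <- c("Dev", "Eng.")
pattern  <- paste(patterns, collapse = "|")

# Keeps every row whose role contains at least one of the terms
either <- df %>% filter(grepl(pattern, roles))
either
```

This should return the same four rows as the example above.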
Filtering rows that do not contain multiple patterns (strings)
This is the negated version of the previous approach. The '!' (NOT) operator inverts the logical vector returned by grepl(), turning TRUE into FALSE and vice versa, so filter() keeps only the rows that contain none of the specified patterns and drops the rows that contain any of them.
Syntax:
df %>% filter(!grepl('Patt.1|Patt.2', column_name))
Example:
R
library(dplyr)
df <- data.frame( marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
age = c(21:25),
roles = c('Software Eng.', 'Software Dev',
'Data Analyst', 'Data Eng.',
'FrontEnd Dev'))
df %>% filter(!grepl('Data|Front', roles))
Output:
marks age roles
1 20.1 21 Software Eng.
2 30.2 22 Software Dev
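Negating a single alternation is equivalent to requiring that each pattern be absent: "not (A or B)" equals "not A and not B". filter() combines comma-separated conditions with AND. The sketch below uses this to check that both formulations keep the same rows. The result names are illustrative.

```r
library(dplyr)

df <- data.frame(marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
                 age   = 21:25,
                 roles = c("Software Eng.", "Software Dev",
                           "Data Analyst", "Data Eng.",
                           "FrontEnd Dev"))

# One negated alternation
excluded_once <- df %>% filter(!grepl("Data|Front", roles))

# Each pattern negated separately; comma-separated conditions are ANDed
excluded_each <- df %>% filter(!grepl("Data", roles), !grepl("Front", roles))

identical(excluded_once$roles, excluded_each$roles)
#> [1] TRUE
```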