Filter data by multiple conditions in R using dplyr
Last Updated: 25 Jan, 2022
In this article, we will learn how to filter a data frame by multiple conditions in the R programming language using the dplyr package.
The filter() function produces a subset of a data frame, retaining all rows that satisfy the specified conditions. It can be applied to both grouped and ungrouped data. Conditions can be built from comparison operators (==, >, >=), logical operators (&, |, !, xor()), helper functions such as between() for range checks and near() for tolerance-based comparison of floating-point numbers, and NA checks with is.na(). filter() does not modify the original data frame, so the result has to be assigned to a variable if the subset is to be kept.
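As a rough illustration of the operator families listed above, the following sketch applies each of them to a small made-up data frame. The scores data frame and its column names are assumptions for this example only and do not appear elsewhere in the article.

```r
library(dplyr)

# Hypothetical sample data, used only for illustration
scores <- data.frame(
  id    = 1:6,
  score = c(55, 72, NA, 90, 38, 64),
  ratio = c(0.5, sqrt(2)^2, 1.0, 2.0, 0.25, 2.0),
  pass  = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# Comparison + logical operators: score of at least 60 AND pass is TRUE
high_pass <- filter(scores, score >= 60 & pass)        # ids 2, 4, 6

# between() is inclusive on both ends: 50 <= score <= 75
mid_range <- filter(scores, between(score, 50, 75))    # ids 1, 2, 6

# near() compares floating-point numbers with a small tolerance;
# sqrt(2)^2 == 2 is FALSE, but near(sqrt(2)^2, 2) is TRUE
ratio_two <- filter(scores, near(ratio, 2))            # ids 2, 4, 6

# xor(): exactly one of the two conditions holds
# (the row with a missing score evaluates to NA and is dropped)
one_only <- filter(scores, xor(score > 60, pass))      # id 1

# On grouped data the condition is evaluated within each group:
# here, the top score among passes and among non-passes
top_per_group <- scores %>%
  group_by(pass) %>%
  filter(score == max(score, na.rm = TRUE))            # ids 4, 5

print(high_pass)
print(top_per_group)
```

Each call returns a new data frame, and scores itself is left unchanged.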
Method 1: Using filter() directly
Here the data frame and the conditions are passed directly to filter(). The function evaluates the conditions against every row and returns the rows that satisfy them.
Syntax: filter(df, condition)
Parameters:
df: The data frame object
condition: The condition (or conditions) that rows must satisfy to be kept
Example: R program to filter rows using filter() function
R
library(dplyr)

# Create the sample data frame
df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Keep rows where x is below 50 AND z is TRUE
filter(df, x < 50 & z == TRUE)
Output:
x y z
1 12 22.1 TRUE
2 31 44.5 TRUE
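The example above combines its conditions with &. As a small sketch on the same df, the other logical operators can be used the same way, and the result can be stored in a variable. The variable names low_or_high and not_small are chosen here purely for illustration.

```r
library(dplyr)

df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# OR: keep rows where x is below 10 or y is above 90
low_or_high <- filter(df, x < 10 | y > 90)    # rows with x = 4 and x = 78

# NOT combined with AND: x is not <= 30, and z is TRUE
not_small <- filter(df, !(x <= 30) & z)       # rows with x = 31, 66, 78

print(low_or_high)
print(not_small)

# df itself still has all five rows; only the new variables hold the subsets
nrow(df)
```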
Method 2: Using %>% with filter()
The pipe operator %>% passes the data frame on its left as the first argument to filter(). This tends to read more cleanly when there are many conditions or when filter() is one step in a longer chain of operations. Conditions separated by commas inside filter() are combined with AND, so a row must satisfy all of them to be kept.
Syntax: df %>% filter(condition)
Parameters:
df: The data frame object
condition: The condition (or conditions) that rows must satisfy to be kept
Example: R program to filter using %>%
R
library(dplyr)

# Create the sample data frame
df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Keep rows where y is below 45 AND z is not FALSE
df %>%
  filter(y < 45, z != FALSE)
Output:
x y z
1 12 22.1 TRUE
2 31 44.5 TRUE
3 66 43.1 TRUE
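To make the role of the comma explicit, the sketch below runs the same filter twice on the same df, once with comma-separated conditions and once with &, and compares the results. The names with_commas, with_and and chained are illustrative only.

```r
library(dplyr)

df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Comma-separated conditions are combined with AND
with_commas <- df %>% filter(y < 45, z != FALSE)

# The same filter written with an explicit &
with_and <- df %>% filter(y < 45 & z != FALSE)

# Chaining two filter() calls also applies both conditions;
# since z is already logical, `z != FALSE` can be written as just `z`
chained <- df %>%
  filter(y < 45) %>%
  filter(z)

# Compare the three results
all.equal(with_commas, with_and)
all.equal(with_commas, chained)
```

Which form to use is largely a matter of readability. An explicit & or | is needed as soon as conditions have to be mixed with OR.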
Method 3: Using NA with filter()
The is.na() function takes a value (or a vector of values) and returns TRUE where the value is NA and FALSE where it is not. Negating it with ! keeps only the rows that have a value in the given column.
Syntax: df %>% filter(!is.na(x))
Parameters:
is.na(): Checks whether each value is NA
x: A column of the data frame
Example: R program to filter dataframe using NA
R
library(dplyr)

# Create a sample data frame with missing values in x
df <- data.frame(x = c(12, 31, NA, NA, NA),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Keep only the rows where x is not NA
df %>% filter(!is.na(x))
Output:
x y z
1 12 22.1 TRUE
2 31 44.5 TRUE
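A related point when combining an NA check with other conditions: filter() keeps a row only when the condition evaluates to TRUE, so rows where a comparison yields NA are dropped as well. The sketch below shows this on the same df, with result names chosen only for illustration.

```r
library(dplyr)

df <- data.frame(x = c(12, 31, NA, NA, NA),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# x > 20 evaluates to NA for the missing values, so those rows are dropped
over_20 <- df %>% filter(x > 20)                          # only the x = 31 row

# To keep the rows with missing x as well, ask for them explicitly
missing_or_over_20 <- df %>% filter(is.na(x) | x > 20)    # 4 rows

# NA check combined with another condition: x is present AND y is below 40
present_and_low_y <- df %>% filter(!is.na(x), y < 40)     # only the x = 12 row

print(over_20)
print(missing_or_over_20)
print(present_and_low_y)
```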
Method 4: Using ‘%in%’ operator with filter()
The %in% operator checks whether each value of a column matches any of the elements of a vector. Used inside filter(), it keeps only the rows whose value in that column appears in the vector.
Syntax: filter(df, column %in% c("data1", "data2", ..., "data N"))
Parameters:
column: Column name of the data frame
c("data1", "data2", ..., "data N"): A vector containing the values to look for in the column
Example: R program to filter dataframe using %in%
R
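library(dplyr)

# Create a sample data frame with a character column
df <- data.frame(x = c(12, 31, 10, 2, 99),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c("Apple", "Guava", "Mango", "Apple", "Mango"))

# Keep rows where z is either "Apple" or "Mango"
# (the operator must be written %in%, with no spaces inside it)
df %>%
  filter(z %in% c("Apple", "Mango"))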
Output:
x y z
1 12 22.1 Apple
2 10 6.1 Mango
3 2 10.0 Apple
4 99 99.0 Mango
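As with the other conditions, %in% can be negated or combined with further conditions. The following is a minimal sketch on the same df. The wanted vector and the result names are assumptions made for this example.

```r
library(dplyr)

df <- data.frame(x = c(12, 31, 10, 2, 99),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c("Apple", "Guava", "Mango", "Apple", "Mango"))

# Hypothetical lookup vector, stored once so it can be reused
wanted <- c("Apple", "Mango")

# Negation: keep rows whose z is NOT in the vector
others <- df %>% filter(!(z %in% wanted))          # the "Guava" row

# %in% combined with a second condition: z in the vector AND x above 5
wanted_large <- df %>% filter(z %in% wanted, x > 5)   # x = 12, 10, 99

print(others)
print(wanted_large)
```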