Filter data by multiple conditions in R using Dplyr
Last Updated: 25 Jan, 2022
In this article, we will learn how to filter a data frame by multiple conditions in the R programming language using the dplyr package.
The filter() function produces a subset of a data frame, retaining all rows for which the specified conditions evaluate to TRUE; rows for which a condition evaluates to FALSE or NA are dropped. It can be applied to both grouped and ungrouped data. Conditions can be built from comparison operators (==, >, >= and so on), logical operators (&, |, !, xor()), helper functions such as between() for inclusive range checks and near() for tolerance-based comparison of floating-point numbers, and NA checks with is.na(). filter() does not modify the original data frame, so the result has to be assigned to a variable if it is to be kept.
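As a minimal sketch of the operator families described above, the following uses a small made-up data frame. The name `scores` and its columns are illustrative assumptions, not part of the original examples. It shows an inclusive range check with between(), a logical OR, filtering on grouped data, and the fact that each result has to be stored in its own variable.

```r
library(dplyr)

# Hypothetical sample data, used only for illustration
scores <- data.frame(
  id    = 1:6,
  marks = c(35, 50, 72, 88, NA, 50),
  pass  = c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)
)

# Range check: between() includes both end points; the NA row is dropped
in_range <- filter(scores, between(marks, 50, 80))

# Logical OR: keep rows where at least one condition is TRUE
low_or_failed <- filter(scores, marks < 40 | !pass)

# Grouped data: the condition is evaluated separately within each group
top_per_group <- scores %>%
  group_by(pass) %>%
  filter(marks == max(marks, na.rm = TRUE)) %>%
  ungroup()

# The original data frame is left unchanged
nrow(scores)
```

Here `in_range`, `low_or_failed` and `top_per_group` each hold a separate subset, while `scores` still contains all six rows.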
Method 1: Using filter() directly
Here the conditions are passed directly to filter() together with the data frame. The function evaluates them against every row and returns the rows that satisfy them.
Syntax: filter(df , condition)
Parameters:
df: The data frame object
condition: filtering based upon this condition
Example : R program to filter rows using filter() function
R
library(dplyr)
# sample data
df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))
# keep rows where x is below 50 and z is TRUE
filter(df, x < 50 & z == TRUE)
Output:
   x    y    z
1 12 22.1 TRUE
2 31 44.5 TRUE
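When AND and OR are mixed in one call, parentheses make the grouping explicit. The sketch below reuses the sample data from this method; the variable name `extremes` is an illustrative assumption. It also relies on the fact that a logical column such as z can be used as a condition on its own, without writing z == TRUE.

```r
library(dplyr)

# Same sample data as in the example above
df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Keep rows where x is either below 20 or above 70, and z is TRUE.
# The parentheses group the OR before the AND is applied.
extremes <- filter(df, (x < 20 | x > 70) & z)
extremes
```

Only the rows with x equal to 12 and 78 are kept: the row with x equal to 4 also passes the range test, but its z value is FALSE.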
Method 2: Using %>% with filter()
This approach is often considered cleaner when there are many conditions or several processing steps: the data frame is passed to filter() with the pipe operator %>%, and the conditions are listed inside filter(). Conditions separated by commas are combined with a logical AND.
Syntax: df %>% filter ( condition )
Parameters:
df: The data frame object
condition: filtering based upon this condition
Example : R program to filter using %>%
R
library(dplyr)
# sample data
df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))
# keep rows where y is below 45 and z is not FALSE
df %>%
  filter(y < 45, z != FALSE)
Output:
   x    y    z
1 12 22.1 TRUE
2 31 44.5 TRUE
3 66 43.1 TRUE
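Separating conditions with commas inside filter() has the same effect as joining them with &. A small sketch on the same sample data illustrates this; the variable names `with_commas` and `with_and` are illustrative assumptions.

```r
library(dplyr)

# Same sample data as in the example above
df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Conditions separated by commas
with_commas <- df %>% filter(y < 45, z != FALSE)

# The same conditions joined explicitly with &
with_and <- df %>% filter(y < 45 & z != FALSE)

# Both calls select the same rows
all.equal(with_commas, with_and)
```

The comma form tends to read better with long lists of conditions, while & is needed when an AND has to be nested inside a larger expression that also uses |.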
Method 3: Using is.na() with filter()
The is.na() function accepts a value and returns TRUE if it is an NA value and FALSE otherwise. Negating it with ! inside filter() keeps only the rows where the column is not NA.
Syntax: df %>% filter(!is.na(x))
Parameters:
is.na(): checks whether each value is NA
x: a column of the data frame
Example: R program to filter dataframe using NA
R
library(dplyr)
# sample data with missing values in x
df <- data.frame(x = c(12, 31, NA, NA, NA),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))
# keep rows where x is not NA
df %>% filter(!is.na(x))
Output:
   x    y    z
1 12 22.1 TRUE
2 31 44.5 TRUE
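NA checks can be combined with other conditions in the same way. The sketch below uses a hypothetical data frame `df_na` (an illustrative assumption, not the article's data) to show three cases: requiring several columns to be non-missing, the way a plain comparison silently drops NA rows, and keeping those rows explicitly.

```r
library(dplyr)

# Hypothetical sample data with missing values in both columns
df_na <- data.frame(x = c(12, 31, NA, NA, 5),
                    y = c(22.1, NA, 6.1, 10, 99))

# Keep rows where neither x nor y is NA
complete <- df_na %>% filter(!is.na(x), !is.na(y))

# A comparison with NA yields NA, and filter() drops those rows
small_x <- df_na %>% filter(x < 20)

# To keep the rows with a missing x as well, test for NA explicitly
small_or_missing <- df_na %>% filter(is.na(x) | x < 20)
```

In this sketch `complete` and `small_x` each contain two rows, while `small_or_missing` contains four, because the two rows with a missing x are kept on purpose.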
Method 4: Using '%in%' operator with filter()
The %in% operator is used to keep only the rows whose column value matches one of the values in a given vector.
Syntax: df %>% filter(column %in% c("data1", "data2"...."data N"))
Parameters:
column: column name of the dataframe
c("data1", "data2"...."data N"): A vector containing the values to match against the column.
Example: R program to filter dataframe using %in%
R
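library(dplyr)
# sample data with a character column
df <- data.frame(x = c(12, 31, 10, 2, 99),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c("Apple", "Guava", "Mango", "Apple", "Mango"))
# keep rows where z is either "Apple" or "Mango"
df %>%
  filter(z %in% c("Apple", "Mango"))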
Output:
   x    y     z
1 12 22.1 Apple
2 10  6.1 Mango
3  2 10.0 Apple
4 99 99.0 Mango
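A %in% test can also be negated or combined with further conditions. The following sketch reuses the sample data from this method; the variable names `others` and `large_matches` are illustrative assumptions.

```r
library(dplyr)

# Same sample data as in the example above
df <- data.frame(x = c(12, 31, 10, 2, 99),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c("Apple", "Guava", "Mango", "Apple", "Mango"))

# Negation: keep rows whose z value is NOT in the vector
others <- df %>% filter(!(z %in% c("Apple", "Mango")))

# Combine the membership test with a numeric condition
large_matches <- df %>% filter(z %in% c("Apple", "Mango"), x > 10)
```

Here `others` contains only the "Guava" row, and `large_matches` contains the rows with x equal to 12 and 99.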