How to filter R dataframe by multiple conditions?
Last Updated: 23 May, 2021
In the R programming language, a data frame can be filtered by applying conditions to its columns, producing a smaller subset of rows. When conditions are applied, the following properties hold:
- Rows are considered to be a subset of the input.
- Rows in the subset appear in the same order as the original data frame.
- Columns remain unmodified.
- The number of groups may be reduced, based on conditions.
- Data frame attributes are preserved during the data filter.
- Row numbers may not be retained in the final output.
Multiple conditions can be applied to the rows of a data frame by combining them with the logical operators AND (&) and OR (|). Rows for which the combined condition evaluates to TRUE are retained in the final output.
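As a minimal sketch of how these operators behave, the snippet below combines two conditions on a small vector. The vector x and the variable names are hypothetical and chosen only for illustration. Note that & and | work element by element, unlike && and ||, which are meant for single values.

```r
# Hypothetical vector used only for illustration
x <- c(1, 5, 10)

is_big   <- x > 3   # FALSE  TRUE  TRUE
is_small <- x < 8   #  TRUE  TRUE FALSE

# Element-wise AND: TRUE only where both conditions hold
both <- is_big & is_small     # FALSE  TRUE FALSE

# Element-wise OR: TRUE where at least one condition holds
either <- is_big | is_small   #  TRUE  TRUE  TRUE

# A logical vector used as an index keeps the TRUE positions
print(x[both])                # 5
```

The same logical vectors can be used as the row index of a data frame, which is what the methods in this article rely on.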
Method 1: Using indexing method and which() function
Any data frame column in R can be referenced either by name, df$col_name, or by its index position, df[[col_index]] (df[col_index] returns a one-column data frame rather than the column vector). Applying a comparison or logical condition to a column yields a logical vector with one element per row, and passing that vector as the row index of the data frame returns the rows that satisfy the condition. Several conditions can be combined and wrapped in which(), which returns the positions of the elements that are TRUE.
Syntax: which(x, arr.ind = FALSE)
Parameters:
x - The logical vector (the result of the conditions) whose TRUE positions are returned
arr.ind - Whether array indices should be returned when x is an array
The %in% operator checks, for each value on its left, whether it occurs in the vector on its right, and returns a logical vector.
Syntax:
val %in% vec
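To make the two building blocks concrete, here is a small sketch on a plain character vector before they are applied to a data frame. The names letters_vec, in_set and positions are hypothetical.

```r
# Hypothetical character vector used only for illustration
letters_vec <- c("b", "b", "d", "e", "d")

# %in% returns one logical value per element of the left-hand side
in_set <- letters_vec %in% c("b", "e")   # TRUE TRUE FALSE TRUE FALSE

# which() turns the logical vector into the positions of its TRUE values
positions <- which(in_set)               # 1 2 4

print(in_set)
print(positions)
```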
Example:
R
# declaring a data frame
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)

# keep rows where col1 is "b" or "e",
# or where col2 is greater than 4
data_frame_mod <- data_frame[which(data_frame$col1 %in% c("b", "e") |
                                     data_frame$col2 > 4), ]
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
4    e    4  TRUE
5    d    5  TRUE
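The conditions can also be combined and used directly as a logical row index, without which(). One difference is worth knowing: which() drops positions where the condition is NA, whereas a logical index returns a row of NAs for each of them.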
Example:
R
# declaring a data frame
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)

# keep rows where col1 is "b" or "e"
# AND col2 is greater than 4
# (no row satisfies both, so the result is empty)
data_frame_mod <- data_frame[data_frame$col1 %in% c("b", "e") &
                               data_frame$col2 > 4, ]
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
[1] col1 col2 col3
<0 rows> (or 0-length row.names)
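The two indexing styles of Method 1 differ when a condition evaluates to NA. The sketch below assumes a hypothetical data frame df_na with a missing value in col2; it is not part of the original examples.

```r
# Hypothetical data frame with a missing value in col2
df_na <- data.frame(col1 = c("b", "d", "e"),
                    col2 = c(0, NA, 5))

# The comparison yields NA where col2 is missing
cond <- df_na$col2 > 4            # FALSE NA TRUE

# Logical indexing returns an all-NA row for the NA condition
by_logical <- df_na[cond, ]

# which() keeps only positions that are TRUE, so the NA is dropped
by_which <- df_na[which(cond), ]

print(nrow(by_logical))           # 2
print(nrow(by_which))             # 1
```

When the data may contain missing values, wrapping the condition in which() is therefore the safer of the two styles.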
Method 2: Using dplyr package
The dplyr package, once installed and loaded, provides functions for data manipulation. Its filter() function produces a subset of the data frame, retaining all rows that satisfy the specified conditions, and can be applied to both grouped and ungrouped data. The conditions may use comparison operators (==, >, >=), logical operators (&, |, !, xor()), helper functions such as between() for ranges and near() for floating-point comparison, and NA checks such as is.na() on the column values. Several conditions passed as separate arguments are combined with AND. filter() does not modify the data frame in place, so the result has to be assigned to a variable.
Syntax: filter(df, cond)
Parameters:
df - The data frame object
cond - The condition (or conditions) to filter the data on
Unlike the indexing approach, filter() does not retain the original row numbers of the data frame; the rows of the result are renumbered from 1.
Example:
R
library("dplyr")

# declaring a data frame
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)

# keep rows where col1 is "b"
# and col3 is not TRUE
data_frame_mod <- filter(data_frame, col1 == "b" & col3 != TRUE)
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
  col1 col2  col3
1    b    2 FALSE
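As a further sketch, assuming dplyr is installed, the snippet below shows that conditions passed to filter() as separate arguments behave like conditions joined with &, and uses between() for a range check. The variable names are hypothetical, and dplyr::filter() is written out in full to avoid any clash with stats::filter().

```r
library(dplyr)

# Same shape of data as in the example above
df <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                 col2 = c(0, 2, 1, 4, 5),
                 col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))

# Conditions joined explicitly with &
with_and <- dplyr::filter(df, col1 == "b" & !col3)

# Conditions passed as separate arguments are combined with AND
with_comma <- dplyr::filter(df, col1 == "b", !col3)

# between(x, left, right) is inclusive on both ends
in_range <- dplyr::filter(df, between(col2, 1, 4))

print(identical(with_and, with_comma))   # TRUE
print(in_range$col2)                     # 2 1 4
```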
Method 3: Using subset method
The subset() function in base R returns the subset of a vector, matrix, or data frame that satisfies the given condition. For a data frame, the condition selects rows, and column names can be used in it directly without the df$ prefix; an optional select argument chooses which columns to keep. Rows for which the condition is NA are dropped. The original row numbers are retained in the result.
Syntax: subset(df, cond)
Arguments:
df - The data frame object
cond - The condition to filter the data on
Example:
R
# declaring a data frame
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)

# keep rows where col1 is "b"
# or col2 is greater than 4
data_frame_mod <- subset(data_frame, col1 == "b" | col2 > 4)
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
5    d    5  TRUE
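The sketch below illustrates two further aspects of subset(): its optional select argument and the way it treats NA conditions. It assumes a hypothetical variant of the data frame in which one col2 value is missing.

```r
# Hypothetical variant of the data frame with a missing col2 value
df <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                 col2 = c(0, 2, NA, 4, 5),
                 col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))

# Row 3 evaluates to FALSE | NA, which is NA, so subset() drops it.
# select keeps only the named columns.
res <- subset(df, col1 == "b" | col2 > 4, select = c(col1, col2))

print(res)
print(rownames(res))   # "1" "2" "5": original row numbers are kept
```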