How to filter R dataframe by multiple conditions?
Last Updated: 23 May, 2021
In the R programming language, the rows of a data frame can be filtered by placing conditions on its columns, which produces a smaller subset. When the conditions are applied, the following properties hold:
- Rows are considered to be a subset of the input.
- Rows in the subset appear in the same order as the original data frame.
- Columns remain unmodified.
- For grouped data, the number of groups may be reduced, depending on the conditions.
- Data frame attributes are preserved during the data filter.
- Row numbers may not be retained in the final output, depending on the method used.
Multiple conditions can be applied to the rows of a data frame by combining them with logical operators such as AND (&) and OR (|). The rows for which the combined condition is TRUE are retained in the final output.
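As a minimal sketch of how combined conditions evaluate (the vectors x and y are made up for illustration), & and | work element by element and return one logical value per position:

```r
# Hypothetical vectors, used only for illustration
x <- c(1, 5, 8)
y <- c("a", "b", "a")

# AND: both conditions must hold for the element
both <- x > 2 & y == "a"
print(both)
# [1] FALSE FALSE  TRUE

# OR: at least one condition must hold
either <- x > 2 | y == "a"
print(either)
# [1] TRUE TRUE TRUE
```

The single-character forms & and | are vectorised, which is what row filtering needs. The doubled forms && and || are meant for single values, for example inside if().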
Method 1: Using indexing method and which() function
Any data frame column in R can be referenced either by its name, df$col_name, or by its index position in the data frame, df[[col_index]]. Comparing the values of a column against a condition yields a logical vector with one value per row. Placing that vector in the row position of df[rows, ] returns only the rows that satisfy the condition. The condition can also be wrapped in the which() function, which returns the positions of the TRUE values in a logical vector.
Syntax: which(x, arr.ind = FALSE)
Parameters:
x - A logical vector or array, typically the result of a condition
arr.ind - Whether to return array indices when x is an array
The %in% operator checks whether each value on its left-hand side occurs in the vector on its right-hand side.
Syntax:
val %in% vec
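As a small sketch of how the two pieces fit together (the vector letters_vec is hypothetical), %in% produces a logical vector and which() turns it into row positions:

```r
# Hypothetical vector, used only for illustration
letters_vec <- c("b", "b", "d", "e", "d")

# %in% returns one TRUE/FALSE per element of the left-hand side
is_match <- letters_vec %in% c("b", "e")
print(is_match)
# [1]  TRUE  TRUE FALSE  TRUE FALSE

# which() converts the logical vector to the positions of its TRUE values
positions <- which(is_match)
print(positions)
# [1] 1 2 4
```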
Example:
R
# declaring a data frame
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)
# keeping the rows where col1 is "b" or "e",
# or where col2 is greater than 4
data_frame_mod <- data_frame[which(data_frame$col1 %in% c("b", "e") |
                                   data_frame$col2 > 4), ]
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
4    e    4  TRUE
5    d    5  TRUE
The conditions can also be combined directly, without using which(). In that case the logical vector itself serves as the row index.
Example:
R
# declaring a data frame
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)
# keeping the rows where col1 is "b" or "e"
# and col2 is greater than 4;
# no row satisfies both, so the result is empty
data_frame_mod <- data_frame[data_frame$col1 %in% c("b", "e") &
                             data_frame$col2 > 4, ]
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
[1] col1 col2 col3
<0 rows> (or 0-length row.names)
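The two indexing styles behave differently when a condition evaluates to NA. The sketch below uses a hypothetical data frame df_na with one missing value to show this: logical indexing returns an all-NA row for the NA condition, while which() drops it.

```r
# Hypothetical data frame with a missing value in col2
df_na <- data.frame(col1 = c("b", "d", "e"),
                    col2 = c(0, NA, 5))

# Logical indexing: the NA condition yields a row filled with NA
by_logical <- df_na[df_na$col2 > 4, ]
print(nrow(by_logical))
# [1] 2

# which() ignores NA, so only rows where the condition is TRUE remain
by_which <- df_na[which(df_na$col2 > 4), ]
print(nrow(by_which))
# [1] 1
```

When a column may contain missing values, wrapping the condition in which() is one way to avoid the extra NA rows.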
Method 2: Using dplyr package
The dplyr package, once installed and loaded, provides functions for data manipulation. Its filter() function produces a subset of the data frame, retaining all rows that satisfy the specified conditions. filter() can be applied to both grouped and ungrouped data. The conditions can use comparison operators (==, >, >=), logical operators (&, |, !, xor()), helper functions such as between() for ranges and near() for floating-point comparison, as well as NA checks with is.na(). filter() does not modify its input, so the result has to be assigned to a variable to be kept.
Syntax: filter(df, cond)
Parameters:
df - The data frame object
cond - The condition to filter the data upon
Unlike the base R approaches, filter() does not retain the default row numbers of the data frame. The rows of the result are renumbered starting from 1.
Example:
R
library(dplyr)
# declaring a data frame
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)
# keeping the rows where col1 is "b"
# and col3 is not TRUE
data_frame_mod <- filter(data_frame, col1 == "b" & col3 != TRUE)
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
  col1 col2  col3
1    b    2 FALSE
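filter() also accepts several conditions separated by commas, which are combined with AND, and helpers such as between() can be mixed with logical operators. The following is a sketch only, assuming dplyr is installed; the names with_comma and in_range are chosen for illustration.

```r
library(dplyr)

# Data frame with the same columns as in the examples above
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))

# Conditions separated by commas are combined with AND
with_comma <- filter(data_frame, col1 == "b", !col3)

# between(x, left, right) keeps values with left <= x <= right;
# here it is combined with a second condition using OR
in_range <- filter(data_frame, between(col2, 1, 4) | col1 == "b")

print(with_comma)
print(in_range)
```

Writing col1 == "b", !col3 gives the same rows as col1 == "b" & col3 != TRUE in the example above.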
Method 3: Using subset method
The subset() function in base R returns the subset of a vector, matrix, or data frame that satisfies the given conditions. For a data frame, the condition selects rows, and an optional select argument chooses columns. The original row numbers are retained in the result.
Syntax: subset(df, cond)
Arguments:
df - The data frame object
cond - The condition to filter the data upon
Example:
R
# declaring a data frame
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)
# keeping the rows where col1 is "b"
# or col2 is greater than 4
data_frame_mod <- subset(data_frame, col1 == "b" | col2 > 4)
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
5    d    5  TRUE
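subset() can also restrict the columns while it filters rows, through its select argument. A minimal sketch, using the same kind of data frame as above (the name data_frame_sel is chosen for illustration):

```r
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))

# Keep rows where col3 is TRUE and col2 is at least 4,
# and return only col1 and col2
data_frame_sel <- subset(data_frame, col3 & col2 >= 4,
                         select = c(col1, col2))
print(data_frame_sel)
#   col1 col2
# 4    e    4
# 5    d    5
```

The row numbers 4 and 5 carry over from the original data frame, and subset() treats a condition that evaluates to NA as FALSE.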