How to filter R DataFrame by values in a column?
Last Updated: 30 May, 2021
In R Programming Language, conditions can be applied to dataframe columns to produce a smaller subset of the rows. When such a filter is applied, the following properties hold:
- Rows are a subset of the input.
- Rows in the subset appear in the same order as in the original dataframe.
- Columns are not modified.
- For grouped data (as in dplyr), the number of groups may be reduced, depending on the conditions.
- With dplyr's filter(), dataframe attributes are preserved.
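The first three properties can be checked on a small example. The sketch below uses base R indexing (Method 1) on a made-up dataframe; the columns id and value are hypothetical and exist only for this illustration.

```r
# A made-up dataframe for illustration only
df <- data.frame(id = 1:6,
                 value = c(5, 3, 8, 1, 9, 2))

# Keep the rows whose value is greater than 2
subset_df <- df[df$value > 2, ]
print(subset_df)

# Every returned id exists in the input (rows are a subset) ...
print(all(subset_df$id %in% df$id))
# ... the ids are still in their original order ...
print(!is.unsorted(subset_df$id))
# ... and the columns (names and types) are unchanged
print(identical(sapply(subset_df, class), sapply(df, class)))
```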
Method 1: Using dataframe indexing
Any dataframe column in R can be referenced either by its name, df$col_name, or by its index position, df[[col_index]] (note that df[col_index] returns a one-column dataframe rather than the column values). The values of this column can then be tested with logical or comparison conditions, which return one TRUE/FALSE value per row. This logical vector is used as the row index of the dataframe, so only the rows that satisfy the condition are returned.
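A minimal sketch of this mechanism, using the sample data from the examples in this article: the comparison produces a logical vector, and referencing the column by name or by position with [[ ]] gives the same subset.

```r
# Sample data in the same shape as the examples in this article
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "e"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))

# A comparison on a column returns one TRUE/FALSE value per row
keep <- data_frame$col2 > 1
print(keep)

# By position, use [[ ]]: data_frame[2] would return a one-column
# dataframe rather than the vector of values
print(identical(keep, data_frame[[2]] > 1))

# The logical vector goes in the row slot; the empty column slot keeps all columns
by_name <- data_frame[keep, ]
by_position <- data_frame[data_frame[[2]] > 1, ]
print(by_name)
```

Base indexing keeps the original row names (2, 4 and 5 here), which is also visible in the outputs of the examples below.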
- Selection based on missing (NA) values
Cells in a dataframe can contain missing values (NA). These can be detected with the is.na() function, and negating it with ! keeps only the rows where the column is not NA.
Example:
data_frame <- data.frame(col1 = c(NA, "b", NA, "e", "e"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)
# keep only the rows where col1 is not NA
data_frame_mod <- data_frame[!is.na(data_frame$col1), ]
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
col1 col2 col3
1 <NA> 0 TRUE
2 b 2 FALSE
3 <NA> 1 FALSE
4 e 4 TRUE
5 e 5 TRUE
[1] "Modified dataframe"
col1 col2 col3
2 b 2 FALSE
4 e 4 TRUE
5 e 5 TRUE
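To see what the filter above indexes with, the sketch below, on the same data, prints the intermediate logical vector. It also shows a related pitfall that the example above does not cover: a plain comparison such as == returns NA for missing cells, and base indexing turns each NA into a row filled with NA, which wrapping the condition in which() avoids.

```r
data_frame <- data.frame(col1 = c(NA, "b", NA, "e", "e"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))

# is.na() flags the missing cells of col1 ...
is_missing <- is.na(data_frame$col1)
print(is_missing)
# ... and ! flips the flags, so TRUE now marks the rows to keep
data_frame_mod <- data_frame[!is_missing, ]
print(nrow(data_frame_mod))

# Pitfall: == returns NA for the missing cells, and each NA in a
# logical row index produces a row filled with NA
with_na_rows <- data_frame[data_frame$col1 == "e", ]
print(nrow(with_na_rows))

# which() returns only the positions where the condition is TRUE
only_e <- data_frame[which(data_frame$col1 == "e"), ]
print(nrow(only_e))
```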
- Selection based on a single comparative condition on a column
Column values can be tested against a condition to filter and subset the data. The condition can check for a specific value (for example, col3 == TRUE) or for a range of values (for example, col2 > 1).
Example:
data_frame <- data.frame(col1 = c("b", "b", "e", "e", "e"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)
# keep only the rows where col3 is TRUE
data_frame_mod <- data_frame[data_frame$col3 == TRUE, ]
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
3 e 1 FALSE
4 e 4 TRUE
5 e 5 TRUE
[1] "Modified dataframe"
col1 col2 col3
1 b 0 TRUE
4 e 4 TRUE
5 e 5 TRUE
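The example above tests for a specific value. A minimal sketch of the other two cases on the same data: a logical column can be used as the row index directly (equivalent to == TRUE), and a range check joins two comparisons with &. The bounds 1 and 4 are chosen only for illustration.

```r
data_frame <- data.frame(col1 = c("b", "b", "e", "e", "e"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))

# col3 is already logical, so it can index the rows directly;
# this gives the same rows as data_frame$col3 == TRUE
only_true <- data_frame[data_frame$col3, ]

# Range check: keep rows whose col2 lies between 1 and 4 (inclusive)
in_range <- data_frame[data_frame$col2 >= 1 & data_frame$col2 <= 4, ]
print(in_range)
```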
- Selection based on multiple comparative conditions on a column
Several conditions can be combined with the logical operators & (and) and | (or). The example below uses the %in% operator, which checks whether each value matches any of the values in a specified vector.
Example:
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "e"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)
# %in% must be written without spaces; keep rows whose col1 is "b" or "e"
data_frame_mod <- data_frame[data_frame$col1 %in% c("b", "e"), ]
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
3 d 1 FALSE
4 e 4 TRUE
5 e 5 TRUE
[1] "Modified dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
4 e 4 TRUE
5 e 5 TRUE
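A short sketch of how &, | and ! combine with %in% on the same data: %in% is equivalent to OR-ing several == tests, & can join conditions on different columns, and ! inverts a condition.

```r
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "e"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))

# %in% is shorthand for OR-ing several equality tests
with_or <- data_frame[data_frame$col1 == "b" | data_frame$col1 == "e", ]
with_in <- data_frame[data_frame$col1 %in% c("b", "e"), ]

# & requires both conditions to hold, here across two columns
both <- data_frame[data_frame$col1 %in% c("b", "e") & data_frame$col3, ]
print(both)

# ! negates a condition: rows whose col1 is NOT "b" or "e"
excluded <- data_frame[!(data_frame$col1 %in% c("b", "e")), ]
print(excluded)
```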
Method 2: Using the dplyr library
The dplyr package, which is used for data manipulation, can be installed with install.packages("dplyr") and loaded into the workspace with library("dplyr").
The filter() function produces a subset of the dataframe, retaining all rows that satisfy the specified conditions. It can be applied to both grouped and ungrouped data. The conditions can use comparison operators (==, >, >=), logical operators (&, |, !, xor()), helper functions such as between() (a range check) and near() (a tolerant numeric comparison), and NA checks such as is.na() on the column values. filter() does not modify its input, so the subset has to be stored in a separate variable.
Syntax:
filter(df, cond)
Parameters:
df – The dataframe object
cond – One or more conditions to filter the data on; rows for which a condition evaluates to FALSE or NA are dropped
Example:
library("dplyr")
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "e"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)
# keep the rows where col2 is greater than 1 (row names are renumbered)
data_frame_mod <- filter(data_frame, col2 > 1)
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
3 d 1 FALSE
4 e 4 TRUE
5 e 5 TRUE
[1] "Modified dataframe"
col1 col2 col3
1 b 2 FALSE
2 e 4 TRUE
3 e 5 TRUE
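filter() also accepts several conditions at once. In the sketch below (assuming dplyr is installed; the bounds and values are chosen only for illustration), conditions separated by commas must all hold, exactly as if they were joined with &, while | keeps rows that satisfy either condition.

```r
library("dplyr")

data_frame <- data.frame(col1 = c("b", "b", "d", "e", "e"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))

# Comma-separated conditions: col2 between 1 and 4 (inclusive) AND col3 is FALSE
in_range_false <- filter(data_frame, between(col2, 1, 4), !col3)
print(in_range_false)

# | keeps rows where col2 is above 3 OR col1 is "b"
either <- filter(data_frame, col2 > 3 | col1 == "b")
print(either)
```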
The %in% operator can also be used inside filter() to keep the rows whose column value matches any element of a given vector.
Example:
library("dplyr")
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "e"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)
# %in% must be written without spaces; keep rows whose col1 is "b" or "e"
data_frame_mod <- filter(data_frame, col1 %in% c("b", "e"))
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
3 d 1 FALSE
4 e 4 TRUE
5 e 5 TRUE
[1] "Modified dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
4 e 4 TRUE
5 e 5 TRUE
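filter() also works on grouped data, as noted at the start of this method. In the sketch below (assuming dplyr is installed and using the %>% pipe that dplyr re-exports), the condition col2 == max(col2) is evaluated separately inside each col1 group. Groups left with no rows are dropped, which is how filtering can reduce the number of groups.

```r
library("dplyr")

data_frame <- data.frame(col1 = c("b", "b", "d", "e", "e"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))

# Grouped by col1, max(col2) is computed within each group,
# so the row with the largest col2 of every group is kept
top_per_group <- data_frame %>%
  group_by(col1) %>%
  filter(col2 == max(col2)) %>%
  ungroup()
print(top_per_group)

# Ungrouped, max(col2) is the maximum of the whole column,
# so only one row survives
top_overall <- filter(data_frame, col2 == max(col2))
print(top_overall)

# Only group "e" has rows with col2 > 3, so one group remains
remaining <- data_frame %>%
  group_by(col1) %>%
  filter(col2 > 3)
print(n_groups(remaining))
```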