How to set level of a factor column in R dataframe to NA?
Last Updated: 26 May, 2021
A data frame may contain columns of different classes. A factor column stores a categorical variable: each value in the column is mapped to one of a fixed set of unique levels. The levels list the distinct categories the variable can take. The set of levels can be changed both in number (levels added or removed) and in value (levels renamed or replaced) using base methods in the R Programming Language.
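As a minimal sketch of how a factor maps values to levels, consider the snippet below. The vector `sizes` and its values are made up for illustration and are not part of this article's data.

```r
# hypothetical example vector, used only for illustration
sizes <- factor(c("low", "high", "low", "mid"))

# the distinct categories, sorted alphabetically by default
size_levels <- levels(sizes)

# the integer code stored for each element (an index into the levels)
size_codes <- as.integer(sizes)

print(size_levels)
print(size_codes)
```

Each element is stored as an integer index into the levels vector, which is why changing a level affects every element that points to it.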
Method 1: Using levels() method
The levels() method in R programming language gives access to the levels attribute of a factor. Any level can be reassigned to a different value, including the missing value NA. Assigning NA to a level removes it from the levels attribute, so the number of levels decreases by the number of levels set to NA. Every occurrence of the reassigned level in the data frame column becomes NA as well.
Syntax:
levels(df$col_name)[levels(df$col_name) == val] <- NA
Example:
R
# declaring columns of data frame
col1 <-as.factor(sample(letters[1:3],15,replace=TRUE))
col2<-sample(5:10,15,replace=TRUE)
col3 <- letters[1:15]
# creating a data frame
data_frame <- data.frame(col1, col2, col3)
print ("Original DataFrame")
print (data_frame)
# getting levels of col1
print ("Levels of col1")
levels(data_frame$col1)
# set the level "b" of col1 to NA; every row
# that held "b" becomes a missing value
levels(data_frame$col1)[levels(data_frame$col1) == "b"] <- NA
print ("Modified DataFrame")
print (data_frame)
print ("Levels of col1")
levels(data_frame$col1)
Output
[1] "Original DataFrame"
col1 col2 col3
1 a 8 a
2 c 10 b
3 c 9 c
4 a 8 d
5 b 8 e
6 b 8 f
7 a 5 g
8 c 9 h
9 b 8 i
10 c 9 j
11 c 8 k
12 c 8 l
13 a 6 m
14 b 10 n
15 c 6 o
[1] "Levels of col1"
[1] "a" "b" "c"
[1] "Modified DataFrame"
col1 col2 col3
1 a 8 a
2 c 10 b
3 c 9 c
4 a 8 d
5 <NA> 8 e
6 <NA> 8 f
7 a 5 g
8 c 9 h
9 <NA> 8 i
10 c 9 j
11 c 8 k
12 c 8 l
13 a 6 m
14 <NA> 10 n
15 c 6 o
[1] "Levels of col1"
[1] "a" "c"
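Because the example above draws its data with sample(), its output differs from run to run. The following sketch applies the same assignment to a small fixed factor so the result is reproducible. The factor `f` and the helper variable names are assumptions introduced for illustration.

```r
# fixed input so the result is reproducible (illustrative data only)
f <- factor(c("a", "b", "c", "b", "a"))

# set the level "b" to NA; this edits the levels attribute itself
levels(f)[levels(f) == "b"] <- NA

remaining_levels <- levels(f)     # "b" is no longer a level
na_positions <- which(is.na(f))   # positions that held "b" are now NA

print(remaining_levels)
print(na_positions)
```

Here the second and fourth elements become NA, and only "a" and "c" remain as levels.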
The %in% operator in R language checks whether each element of its left operand is present in the vector or list on its right. It returns a logical vector with one TRUE or FALSE for each element of the left operand.
Syntax:
val %in% vec
Using the %in% operator, multiple values can be checked at once, so more than one level can be set to NA in a single assignment.
Example:
R
# declaring columns of data frame
col1 <-as.factor(sample(letters[1:3],15,replace=TRUE))
col2<-sample(5:10,15,replace=TRUE)
col3 <- letters[1:15]
# creating a data frame
data_frame <- data.frame(col1, col2, col3)
print ("Original DataFrame")
print (data_frame)
# getting levels of col1
print ("Levels of col1")
levels(data_frame$col1)
vec <- c("a","b")
# set every level listed in vec ("a" and "b")
# to a missing value
levels(data_frame$col1)[levels(data_frame$col1) %in% vec] <- NA
print ("Modified DataFrame")
print (data_frame)
print ("Levels of col1")
levels(data_frame$col1)
Output
[1] "Original DataFrame"
col1 col2 col3
1 a 7 a
2 b 5 b
3 b 7 c
4 a 5 d
5 b 5 e
6 a 10 f
7 c 8 g
8 c 5 h
9 a 6 i
10 a 8 j
11 a 5 k
12 b 10 l
13 b 5 m
14 b 6 n
15 c 8 o
[1] "Levels of col1"
[1] "a" "b" "c"
[1] "Modified DataFrame"
col1 col2 col3
1 <NA> 7 a
2 <NA> 5 b
3 <NA> 7 c
4 <NA> 5 d
5 <NA> 5 e
6 <NA> 10 f
7 c 8 g
8 c 5 h
9 <NA> 6 i
10 <NA> 8 j
11 <NA> 5 k
12 <NA> 10 l
13 <NA> 5 m
14 <NA> 6 n
15 c 8 o
[1] "Levels of col1"
[1] "c"
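To see what the %in% test contributes in the example above, the sketch below prints the logical mask on its own and then uses it on a small fixed factor. The names `lv`, `mask`, and `f` are hypothetical and exist only in this illustration.

```r
lv <- c("a", "b", "c")   # levels, as levels() would return them
vec <- c("a", "b")       # levels that should be set to NA

# one logical value per element of lv: TRUE where the level is in vec
mask <- lv %in% vec
print(mask)

# the same mask, built inline, selects the levels to overwrite
f <- factor(c("a", "c", "b", "c"))
levels(f)[levels(f) %in% vec] <- NA

print(levels(f))
print(f)
```

The mask is TRUE for "a" and "b" and FALSE for "c", so only "c" survives as a level.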
Method 2: Using indexing method
The desired value can be compared against the data frame column using the == operator, and every matching entry can then be set to NA. However, this method is less thorough: it only replaces the occurrences of the value in the column with NA and does not modify the levels attribute, so the replaced level stays in the set of levels even though no row uses it any more.
Syntax:
df$col_name[df$col_name == val] <- NA
Example:
R
# declaring columns of data frame
col1 <-as.factor(sample(letters[1:3],15,replace=TRUE))
col2<-sample(5:10,15,replace=TRUE)
col3 <- letters[1:15]
# creating a data frame
data_frame <- data.frame(col1, col2, col3)
print ("Original DataFrame")
print (data_frame)
# getting levels of col1
print ("Levels of col1")
levels(data_frame$col1)
# replace every entry of col1 equal to "b" with NA
# (the levels attribute is left unchanged)
data_frame$col1[data_frame$col1 == "b"] <- NA
print ("Modified DataFrame")
print (data_frame)
print ("Levels of col1")
levels(data_frame$col1)
Output
[1] "Original DataFrame"
col1 col2 col3
1 c 7 a
2 b 5 b
3 b 6 c
4 c 5 d
5 c 7 e
6 c 10 f
7 b 7 g
8 a 9 h
9 c 9 i
10 a 8 j
11 c 9 k
12 a 10 l
13 a 8 m
14 b 10 n
15 a 7 o
[1] "Levels of col1"
[1] "a" "b" "c"
[1] "Modified DataFrame"
col1 col2 col3
1 c 7 a
2 <NA> 5 b
3 <NA> 6 c
4 c 5 d
5 c 7 e
6 c 10 f
7 <NA> 7 g
8 a 9 h
9 c 9 i
10 a 8 j
11 c 9 k
12 a 10 l
13 a 8 m
14 <NA> 10 n
15 a 7 o
[1] "Levels of col1"
[1] "a" "b" "c"
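As the output shows, "b" is still listed as a level after indexing. One possible follow-up, not covered by the method itself, is base R's droplevels(), which removes levels that no element uses. The sketch below assumes a small fixed factor `f` made up for illustration.

```r
# illustrative factor, not the article's data frame
f <- factor(c("a", "b", "c", "b"))

# Method 2: overwrite the matching elements only
f[f == "b"] <- NA
levels_after_indexing <- levels(f)   # "b" is still a level

# drop levels that are no longer used by any element
f2 <- droplevels(f)
levels_after_drop <- levels(f2)

print(levels_after_indexing)
print(levels_after_drop)
```

For a data frame column, the same call can be applied as data_frame$col1 <- droplevels(data_frame$col1).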