How to set level of a factor column in R dataframe to NA?

Last Updated : 26 May, 2021

A data frame may contain columns of different classes. A factor column stores a categorical variable: each value in the column is mapped to one of a fixed set of levels. The levels describe the distinct categories the variable can take. Levels can be changed both in number (added or removed) and in value (renamed) using base functions in the R programming language.

Method 1: Using levels() method

The levels() function in R gives access to the levels attribute of a factor. Any level can be assigned a different value, including the missing value NA. Assigning NA to a level removes it from the set of levels, so the number of levels decreases by the number of levels assigned NA. Every element of the factor column that had that level becomes NA in the data frame.

Syntax:

levels(df$col_name)[levels(df$col_name) == val] <- NA

Example:

# declaring columns of data frame
# (sample() is random, so the values differ from run to run)
col1 <- as.factor(sample(letters[1:3], 15, replace = TRUE))
col2 <- sample(5:10, 15, replace = TRUE)
col3 <- letters[1:15]

# creating a data frame
data_frame <- data.frame(col1, col2, col3)
print ("Original DataFrame")
print (data_frame)

# getting levels of col1
print ("Levels of col1")
levels(data_frame$col1)

# if a level of col1 is "b", set that level
# to the missing value
levels(data_frame$col1)[levels(data_frame$col1) == "b"] <- NA

print ("Modified DataFrame")
print (data_frame)

print ("Levels of col1")
levels(data_frame$col1)

Output

[1] "Original DataFrame"
  col1 col2 col3
1     a    8    a
2     c   10    b
3     c    9    c
4     a    8    d
5     b    8    e
6     b    8    f
7     a    5    g
8     c    9    h
9     b    8    i
10    c    9    j
11    c    8    k
12    c    8    l
13    a    6    m
14    b   10    n
15    c    6    o
[1] "Levels of col1"
[1] "a" "b" "c"
[1] "Modified DataFrame"
  col1 col2 col3
1     a    8    a
2     c   10    b
3     c    9    c
4     a    8    d
5  <NA>    8    e
6  <NA>    8    f
7     a    5    g
8     c    9    h
9  <NA>    8    i
10    c    9    j
11    c    8    k
12    c    8    l
13    a    6    m
14 <NA>   10    n
15    c    6    o
[1] "Levels of col1"
[1] "a" "c"
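
Because sample() is random, the output above changes from run to run. As a minimal deterministic sketch (the factor f below is a made-up example, not part of the data frame above), the same assignment can be checked on a small fixed factor:

```r
# hypothetical fixed factor, used only for illustration
f <- factor(c("a", "b", "c", "b", "a"))

# set the level "b" to NA
levels(f)[levels(f) == "b"] <- NA

print(levels(f))      # "a" "c"
print(is.na(f))       # FALSE TRUE FALSE TRUE FALSE
print(as.integer(f))  # 1 NA 2 NA 1
```

The integer codes show that the remaining levels are renumbered ("c" moves from code 3 to code 2), while the elements that were "b" no longer map to any level.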

The %in% operator in R checks whether each value on its left-hand side is present in the vector or list on its right-hand side. It returns a logical vector with one element for each value checked.

Syntax:

val %in% vec

Using the %in% operator, multiple values can be checked at once, so more than one level can be assigned to NA at the same time.
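
As a small sketch of what %in% returns (the vectors lv and vec are hypothetical stand-ins for a factor's levels and the levels to be removed), the result is a logical mask that can be used to index the levels:

```r
# lv stands in for levels(df$col_name); vec holds the levels to match
lv <- c("a", "b", "c")
vec <- c("a", "b")

# one logical value per element of lv
mask <- lv %in% vec
print(mask)      # TRUE TRUE FALSE

# the elements of lv selected by the mask
print(lv[mask])  # "a" "b"
```

Indexing levels() with such a mask and assigning NA affects every matched level in a single statement.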

Example:

# declaring columns of data frame
col1 <- as.factor(sample(letters[1:3], 15, replace = TRUE))
col2 <- sample(5:10, 15, replace = TRUE)
col3 <- letters[1:15]

# creating a data frame
data_frame <- data.frame(col1, col2, col3)
print ("Original DataFrame")
print (data_frame)

# getting levels of col1
print ("Levels of col1")
levels(data_frame$col1)
vec <- c("a","b")

# if a level of col1 is in vec ("a" or "b"),
# set that level to the missing value
levels(data_frame$col1)[levels(data_frame$col1) %in% vec] <- NA

print ("Modified DataFrame")
print (data_frame)

print ("Levels of col1")
levels(data_frame$col1)

Output

[1] "Original DataFrame"
  col1 col2 col3
1     a    7    a
2     b    5    b
3     b    7    c
4     a    5    d
5     b    5    e
6     a   10    f
7     c    8    g
8     c    5    h
9     a    6    i
10    a    8    j
11    a    5    k
12    b   10    l
13    b    5    m
14    b    6    n
15    c    8    o
[1] "Levels of col1"
[1] "a" "b" "c"
[1] "Modified DataFrame"
  col1 col2 col3
1  <NA>    7    a
2  <NA>    5    b
3  <NA>    7    c
4  <NA>    5    d
5  <NA>    5    e
6  <NA>   10    f
7     c    8    g
8     c    5    h
9  <NA>    6    i
10 <NA>    8    j
11 <NA>    5    k
12 <NA>   10    l
13 <NA>    5    m
14 <NA>    6    n
15    c    8    o
[1] "Levels of col1"
[1] "c"

Method 2: Using indexing method

The desired value can be compared against the data frame column using the == operator, and every matching element can then be assigned NA. However, this method only replaces the occurrences of the value in the data frame. It does not modify the levels attribute, so the level still exists even though no element uses it.

Syntax:

df$col_name[df$col_name == val] = NA

Example:

# declaring columns of data frame
col1 <- as.factor(sample(letters[1:3], 15, replace = TRUE))
col2 <- sample(5:10, 15, replace = TRUE)
col3 <- letters[1:15]

# creating a data frame
data_frame <- data.frame(col1, col2, col3)
print ("Original DataFrame")
print (data_frame)

# getting levels of col1
print ("Levels of col1")
levels(data_frame$col1)

# if value of column1 is b then replace 
# by missing value
data_frame$col1[data_frame$col1 == "b"] = NA

print ("Modified DataFrame")
print (data_frame)

print ("Levels of col1")
levels(data_frame$col1)

Output

[1] "Original DataFrame"
  col1 col2 col3
1     c    7    a
2     b    5    b
3     b    6    c
4     c    5    d
5     c    7    e
6     c   10    f
7     b    7    g
8     a    9    h
9     c    9    i
10    a    8    j
11    c    9    k
12    a   10    l
13    a    8    m
14    b   10    n
15    a    7    o
[1] "Levels of col1"
[1] "a" "b" "c"
[1] "Modified DataFrame"
  col1 col2 col3
1     c    7    a
2  <NA>    5    b
3  <NA>    6    c
4     c    5    d
5     c    7    e
6     c   10    f
7  <NA>    7    g
8     a    9    h
9     c    9    i
10    a    8    j
11    c    9    k
12    a   10    l
13    a    8    m
14 <NA>   10    n
15    a    7    o
[1] "Levels of col1"
[1] "a" "b" "c"
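
If the unused level should also disappear after the indexing method, one option in base R is droplevels(), which drops levels that no element uses. This is a sketch on a made-up factor f, not a step from the example above:

```r
# hypothetical fixed factor, used only for illustration
f <- factor(c("a", "b", "c", "b"))

# indexing method: elements become NA, levels are untouched
f[f == "b"] <- NA
print(levels(f))  # "a" "b" "c"
print(table(f))   # level "b" is still listed, with count 0

# drop levels that no longer occur in the data
g <- droplevels(f)
print(levels(g))  # "a" "c"
```

After droplevels(), the result matches what the levels() method produces directly.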

mallikagupta90
