How to Create Categorical Variables in R?
Last Updated: 19 Dec, 2021
In this article, we will learn how to create categorical variables in the R programming language.
In statistics, variables can be divided into two types: categorical variables and quantitative variables. Quantitative variables hold numerical, measurable values. A categorical variable can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property.
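In R, categorical variables are usually stored as factors. The short sketch below illustrates the "limited, fixed number of possible values" idea: a factor records its set of allowed levels, and assigning a value outside that set produces `NA` instead of a new category. The vector `sizes` and its values are made up purely for illustration.

```r
# Hypothetical example: a factor with three allowed levels.
# "medium" is an allowed level even though no element uses it yet.
sizes <- factor(c("small", "large", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)   # the fixed set of possible values
nlevels(sizes)  # number of possible values: 3

# Assigning a value that is not one of the levels gives NA (R also warns)
sizes_bad <- sizes
suppressWarnings(sizes_bad[2] <- "huge")
sizes_bad
```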
Method 1: Categorical Variable from Scratch
To create a categorical variable from scratch, i.e. by supplying a value for each row manually, we pass a vector of those values to the factor() function and store the result as a new column of the data frame. The factor() function converts a vector (character or numeric) into a factor, R's categorical type: each distinct value becomes a level, and all elements with the same value share that level.
Syntax:
df$categorical_variable <- factor(categorical_vector)
where
- df: the data frame.
- categorical_variable: the name of the new column that will hold the categorical data.
- categorical_vector: the vector to be converted into a factor.
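By default, factor() uses the sorted unique values of the vector as its levels. The sketch below, using a made-up vector, shows how the levels argument of factor() can instead fix the level order and even include a category that does not occur in the data yet.

```r
# Hypothetical input vector
group_vector <- c("B", "A", "C", "A")

# Default: levels are the sorted unique values of the vector
default_f <- factor(group_vector)
levels(default_f)   # "A" "B" "C"

# Explicit levels fix the order and may include categories not yet observed
custom_f <- factor(group_vector, levels = c("C", "B", "A", "D"))
levels(custom_f)    # "C" "B" "A" "D"
table(custom_f)     # "D" is kept with a count of 0
```

Fixing the levels up front is mainly useful when the level order matters (for example in plots or model contrasts) or when every data set should share the same set of categories.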
Example:
Here is a basic data frame in which a new column, group, is added as a categorical variable.
R
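# Create a data frame with two numeric columns
df <- data.frame(x = c(10, 23, 13, 41, 15),
                 y = c(71, 17, 28, 32, 12))
# One manually chosen category per row
group_vector <- c('A', 'B', 'C', 'D', 'E')
# Convert the vector to a factor and store it as a new column
df$group <- factor(group_vector)
df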
Output:
x y group
1 10 71 A
2 23 17 B
3 13 28 C
4 41 32 D
5 15 12 E
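The printed data frame looks the same whether group is stored as text or as a factor. A quick way to confirm that the new column really is categorical is to inspect its class, its levels, and the integer codes R uses internally, as in this sketch that rebuilds the example data:

```r
df <- data.frame(x = c(10, 23, 13, 41, 15),
                 y = c(71, 17, 28, 32, 12))
df$group <- factor(c("A", "B", "C", "D", "E"))

class(df$group)       # "factor"
levels(df$group)      # "A" "B" "C" "D" "E"
as.integer(df$group)  # underlying integer codes: 1 2 3 4 5
str(df)               # shows group as a factor with 5 levels
```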
Method 2: Categorical Variable from an Existing Column Using Two Values
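To create a categorical variable from an existing column, we wrap a call to ifelse() in as.factor(). Because ifelse() is vectorized, it checks the condition for every row and returns one value where the condition is true and another where it is false; as.factor() then converts the resulting vector into a factor.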
Syntax:
df$categorical_variable <- as.factor(ifelse(condition, val1, val2))
where
- df: the data frame.
- categorical_variable: the name of the new column that will hold the categorical data.
- condition: the logical test applied to each row; rows where it is true get val1, all other rows get val2.
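One detail worth knowing before applying ifelse() to real data: if the condition is NA for a row (for example because the column has a missing value), ifelse() returns NA for that row, and the factor simply has a missing entry. The sketch below uses a made-up vector x; the extra "Unknown" category is a hypothetical choice, shown only as one way to keep missing values as their own group.

```r
# Hypothetical column with a missing value
x <- c(10, 23, NA, 41)

# ifelse() works element by element; an NA condition yields NA
grp <- as.factor(ifelse(x > 20, "A", "B"))
grp          # B A <NA> A, with levels A and B

# Hypothetical alternative: give missing values their own category
grp2 <- as.factor(ifelse(is.na(x), "Unknown", ifelse(x > 20, "A", "B")))
levels(grp2) # "A" "B" "Unknown"
```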
Example:
Here is a basic data frame in which a new column, group, is added as a categorical variable based on an if-else condition.
R
df <- data.frame(x = c(10, 23, 13, 41, 15),
                 y = c(71, 17, 28, 32, 12))
# Rows with x greater than 20 get 'A', all other rows get 'B'
df$group <- as.factor(ifelse(df$x > 20, 'A', 'B'))
df
Output:
x y group
1 10 71 B
2 23 17 A
3 13 28 B
4 41 32 A
5 15 12 B
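With as.factor(ifelse(...)), the levels end up in alphabetical order ("A", then "B"). As an alternative sketch, not the method used above, the logical result of the comparison can be passed straight to factor() with labels, which also lets you choose the level order explicitly:

```r
df <- data.frame(x = c(10, 23, 13, 41, 15),
                 y = c(71, 17, 28, 32, 12))

# FALSE (x <= 20) is labelled "B", TRUE (x > 20) is labelled "A";
# the level order follows the order given in `levels`
df$group <- factor(df$x > 20, levels = c(FALSE, TRUE), labels = c("B", "A"))
df$group
```

The row assignments are the same as in the ifelse() version; only the level order differs.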
Method 3: Categorical Variable from an Existing Column Using Multiple Values
When more than two categories are needed, we nest several ifelse() calls inside as.factor(). The conditions are checked in order, and each row receives the value paired with the first condition that is true; if none of the conditions is true, the row receives the final else value.
Syntax:
df$categorical_variable <- as.factor(ifelse(condition, val, ifelse(condition, val, ifelse(condition, val, ifelse(condition, val, val_else)))))
where
- df: the data frame.
- categorical_variable: the name of the new column that will hold the categorical data.
- condition: a logical test; rows where it is true (and no earlier condition was true) get the paired val.
- val_else: the value used when none of the conditions is true.
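When the categories are just consecutive numeric ranges, as in the example that follows, base R's cut() function is a common alternative to nested ifelse() calls; it is a different technique from the one this method describes. The sketch below assumes the same cutoffs (20, 30, 50, 90) and uses right = FALSE so that each interval is [lower, upper), which matches the x < cutoff tests. The column names group_cut and group_ifelse are hypothetical.

```r
df <- data.frame(x = c(10, 23, 13, 41, 15, 11, 23, 45, 95, 23, 75),
                 y = c(71, 17, 28, 32, 12, 13, 41, 15, 11, 23, 34))

# Intervals: [-Inf, 20) -> A, [20, 30) -> B, [30, 50) -> C, [50, 90) -> D, [90, Inf) -> E
df$group_cut <- cut(df$x,
                    breaks = c(-Inf, 20, 30, 50, 90, Inf),
                    labels = c("A", "B", "C", "D", "E"),
                    right = FALSE)

# The nested ifelse() version, for comparison
df$group_ifelse <- as.factor(ifelse(df$x < 20, "A",
                             ifelse(df$x < 30, "B",
                             ifelse(df$x < 50, "C",
                             ifelse(df$x < 90, "D", "E")))))

identical(as.character(df$group_cut), as.character(df$group_ifelse))  # TRUE
```

One practical difference: cut() keeps every label as a level even if no row falls into that interval, while as.factor(ifelse(...)) only creates levels for values that actually occur.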
Example:
Here is a basic data frame in which a new column, group, is added as a categorical variable based on multiple if-else conditions.
R
df <- data.frame(x = c(10, 23, 13, 41, 15, 11, 23, 45, 95, 23, 75),
                 y = c(71, 17, 28, 32, 12, 13, 41, 15, 11, 23, 34))
# Conditions are checked in order; the first true one decides the group
df$group <- as.factor(ifelse(df$x < 20, 'A',
                      ifelse(df$x < 30, 'B',
                      ifelse(df$x < 50, 'C',
                      ifelse(df$x < 90, 'D', 'E')))))
df
Output:
x y group
1 10 71 A
2 23 17 B
3 13 28 A
4 41 32 C
5 15 12 A
6 11 13 A
7 23 41 B
8 45 15 C
9 95 11 E
10 23 23 B
11 75 34 D
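Once the column is a factor, table() counts the rows in each category. Because these groups follow increasing ranges of x, the sketch below also converts them to an ordered factor so that comparisons such as "above C" are meaningful; the column name group_ord is hypothetical.

```r
df <- data.frame(x = c(10, 23, 13, 41, 15, 11, 23, 45, 95, 23, 75),
                 y = c(71, 17, 28, 32, 12, 13, 41, 15, 11, 23, 34))
df$group <- as.factor(ifelse(df$x < 20, "A",
                      ifelse(df$x < 30, "B",
                      ifelse(df$x < 50, "C",
                      ifelse(df$x < 90, "D", "E")))))

# Number of rows in each category: A 4, B 3, C 2, D 1, E 1
table(df$group)

# An ordered factor records that A < B < C < D < E
df$group_ord <- factor(df$group, levels = c("A", "B", "C", "D", "E"), ordered = TRUE)
df$x[df$group_ord > "C"]   # x values in groups D and E: 95 75
```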