Dummy Variables in R Programming
Last Updated: 17 Apr, 2025
Dummy variables are binary variables used to represent categorical data in numerical form. Each one represents a single characteristic of an observation. For example, gender can be represented as 1 for male and 0 for female, or vice versa. New columns are created to hold these binary values, such as gender_m for male and gender_f for female.
Figure: the original data frame.
Figure: the data frame after creating dummy variables.
Dummy variables are essential in statistical models and machine learning algorithms because most algorithms require numerical input. By converting categories into binary values, dummy variables allow these models to process and analyze categorical features effectively. In this article, we will create dummy variables in R using two methods: the ifelse() function and the dummy_cols() function.
1. Using ifelse() function
The ifelse() function tests a condition for each element of a vector and returns the yes value where the test is TRUE and the no value where it is FALSE. Using this function, a dummy variable can be created from any logical condition on a column.
Syntax: ifelse(test, yes, no)
Parameters:
- test: the condition to test, evaluated element by element
- yes: the value returned where the test condition is TRUE
- no: the value returned where the test condition is FALSE
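Because ifelse() works element by element, a single call can label an entire column. The following is a minimal sketch of that behaviour; the vector `groups` and the name `is_ctrl` are made up for illustration and are not part of the article's data.

```r
# Hypothetical input vector, used only for illustration
groups <- c("ctrl", "trt1", "ctrl", NA)

# test is evaluated element by element: TRUE -> yes (1), FALSE -> no (0)
is_ctrl <- ifelse(groups == "ctrl", 1, 0)

print(is_ctrl)

# A missing value in test gives NA in the result rather than 0
print(is.na(is_ctrl))
```

Note that a missing category stays missing in the dummy variable, so it may be worth checking for NA values before modelling.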
Example 1:
In this example, we load the built-in PlantGrowth dataset and create a dummy variable group_ctr1, which is 1 if the group is "ctrl" (control group) and 0 otherwise. This transformation makes the categorical group variable suitable for numerical analysis.
R
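# Copy the built-in PlantGrowth dataset
pg <- PlantGrowth
cat("Original dataset:\n")
head(pg, 5)
# 1 if the observation belongs to the control group, 0 otherwise
pg$group_ctr1 <- ifelse(pg$group == "ctrl", 1, 0)
cat("After creating dummy variable:\n")
head(pg, 5)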
Output:
Original Data
Data with dummy variables
Example 2:
In this example, we create a data frame df with categorical and numerical variables. We then generate two dummy variables: gender_m, which is 1 if gender is "m" and 0 otherwise, and gender_f, which is 1 if gender is "f" and 0 otherwise. This allows the gender variable to be represented in a numerical format suitable for analysis.
R
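# Small data frame with two categorical columns and one numeric column
df <- data.frame(gender = c("m", "f", "m"),
                 age = c(19, 20, 20),
                 city = c("Delhi", "Mumbai", "Delhi"))
head(df)
# One 0/1 indicator column per gender value
df$gender_m <- ifelse(df$gender == "m", 1, 0)
df$gender_f <- ifelse(df$gender == "f", 1, 0)
head(df)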
Output:
Original Data Frame
After creating dummy variables
2. Using dummy_cols() function
The dummy_cols() function is provided by the fastDummies package. It creates dummy variables according to the arguments passed to it. If no columns are selected in the function call, dummy variables are created for all character and factor columns in the data frame.
Syntax: dummy_cols(.data, select_columns = NULL)
Parameters:
- .data: the object (typically a data frame) for which dummy columns are to be created.
- select_columns: the columns for which dummy variables are to be created. If NULL (the default), all character and factor columns are used.
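To make the two parameters concrete, here is a small sketch that restricts dummy creation to one column. The data frame `toy` and its column names are hypothetical, and the example assumes the fastDummies package is already installed; it is guarded so that it does nothing if the package is missing.

```r
# Assumes fastDummies is installed: install.packages("fastDummies")
if (requireNamespace("fastDummies", quietly = TRUE)) {
  # Hypothetical toy data frame, used only for illustration
  toy <- data.frame(colour = c("red", "blue", "red"),
                    size   = c(1, 2, 3))

  # Restrict dummy creation to the colour column
  toy_dummies <- fastDummies::dummy_cols(toy, select_columns = "colour")

  # New columns are named <column>_<value>; the original column is kept
  print(names(toy_dummies))
  print(toy_dummies)
}
```

The numeric column size is left untouched, and the new columns sit alongside the original colour column rather than replacing it.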
Example 1:
In this example, we use the fastDummies package to automatically create dummy variables for the group column in the PlantGrowth dataset. The dummy_cols() function generates a separate binary column for each category in group, enabling easy use of categorical data in numerical analysis.
R
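# Install the package once if it is not already available
install.packages("fastDummies")
library(fastDummies)
data <- PlantGrowth
# Create one dummy column per level of the group column
data <- dummy_cols(data, select_columns = "group")
head(data, 5)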
Output:
Using dummy_cols function
Example 2:
In this example, we create a data frame df and use the dummy_cols() function from the fastDummies package to automatically generate dummy variables for all categorical columns (gender and city). This converts each category into a separate binary column, making the data ready for numerical analysis.
R
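# Assumes library(fastDummies) has been loaded as in Example 1
df <- data.frame(gender = c("m", "f", "m"),
                 age = c(19, 20, 20),
                 city = c("Delhi", "Mumbai", "Delhi"))
# No select_columns given, so all character and factor columns are converted
df <- dummy_cols(df)
head(df)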
Output:
Using dummy_cols function
In this article, we explored how to create dummy variables in R using two approaches, manually with the ifelse() function and automatically with the dummy_cols() function from the fastDummies package, to convert categorical data into a numerical format suitable for analysis.
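As a bridge between the two approaches, the manual ifelse() method can be extended to any number of categories with a short loop. This is only a base R sketch, not how fastDummies works internally; the helper name `new_name` and the `city_` prefix are illustrative choices.

```r
# Same toy data frame as in the article's second example
df <- data.frame(gender = c("m", "f", "m"),
                 age    = c(19, 20, 20),
                 city   = c("Delhi", "Mumbai", "Delhi"))

# Create one 0/1 column per distinct value of df$city
for (value in unique(df$city)) {
  new_name <- paste0("city_", value)   # e.g. "city_Delhi"
  df[[new_name]] <- ifelse(df$city == value, 1, 0)
}

print(df)
```

This avoids an extra package dependency for small tasks, while dummy_cols() remains more convenient when many columns need converting at once.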