
Dummy Variables in R Programming

Last Updated : 17 Apr, 2025

Dummy variables are binary variables used to represent categorical data in numerical form. Each one represents a characteristic of an observation. For example, gender can be represented as 1 for male and 0 for female, or vice versa. New columns are created to hold these binary values, such as gender_m for male and gender_f for female.

[Image: the original data frame]
[Image: the data frame after creating dummy variables]

Dummy variables are essential in statistical models and machine learning algorithms because most algorithms require numerical input. By converting categories into binary values, dummy variables allow these models to process and analyze categorical features effectively. In this article, we will create dummy variables in R using two methods: the ifelse() function and the dummy_cols() function.

1. Using ifelse() function

The ifelse() function evaluates a test condition on each element of a vector and returns the yes value where the condition is true and the no value where it is false. A dummy variable can be created by testing whether a column equals a given category and returning 1 or 0 accordingly.

Syntax: ifelse(test, yes, no)

Parameters:

  • test: the condition to test
  • yes: the value returned where the test condition is satisfied
  • no: the value returned where the test condition is not satisfied
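
As a minimal sketch of how these parameters work together, the snippet below applies ifelse() to a small vector. The smoker vector is a made-up example, not one of the datasets used in this article. Note that a missing value in the input stays missing in the result.

```r
# Hypothetical example vector (an assumption, not from the article's datasets)
smoker <- c("yes", "no", "no", "yes", NA)

# ifelse() is vectorised: the test is evaluated for every element,
# returning 1 where it is TRUE and 0 where it is FALSE
smoker_yes <- ifelse(smoker == "yes", 1, 0)

# The NA element produces NA rather than 0
print(smoker_yes)
```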

Example 1: 

In this example, we loaded the built-in PlantGrowth dataset and created a dummy variable group_ctr1, which is 1 if the group is "ctrl" (control group) and 0 otherwise. This transformation makes the categorical group variable suitable for numerical analysis.

R
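# Load the built-in PlantGrowth dataset
pg <- PlantGrowth
cat("Original dataset:\n")
head(pg, 5)

# 1 if the plant belongs to the control group, 0 otherwise
pg$group_ctr1 <- ifelse(pg$group == "ctrl", 1, 0)

cat("After creating dummy variable:\n")
head(pg, 5)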

Output:

[Image: original data]
[Image: data with dummy variables]

Example 2: 

In this example, we created a data frame df with categorical and numerical variables. We then generated two dummy variables: gender_m, which is 1 if gender is "m" and 0 otherwise, and gender_f, which is 1 if gender is "f" and 0 otherwise. This allows the gender variable to be represented in a numerical format suitable for analysis.

R
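# Sample data frame with two categorical columns and one numeric column
df <- data.frame(gender = c("m", "f", "m"),
                 age = c(19, 20, 20),
                 city = c("Delhi", "Mumbai", "Delhi"))

head(df)

# One dummy column per gender level
df$gender_m <- ifelse(df$gender == "m", 1, 0)
df$gender_f <- ifelse(df$gender == "f", 1, 0)

head(df)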

Output:

[Image: original data frame]
[Image: data frame after creating dummy variables]
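
When a column has several categories, writing one ifelse() call per category becomes repetitive. One possible approach, sketched here under the assumption that the column is a factor, is to loop over its levels. The group_<level> naming scheme is a choice made for this illustration.

```r
pg <- PlantGrowth

# Create one dummy column per level of the group factor,
# named group_<level> (naming scheme chosen for this sketch)
for (lvl in levels(pg$group)) {
  pg[[paste0("group_", lvl)]] <- ifelse(pg$group == lvl, 1, 0)
}

# Sanity check: every row should have exactly one dummy set to 1
dummy_names <- paste0("group_", levels(pg$group))
print(all(rowSums(pg[dummy_names]) == 1))

head(pg, 3)
```

The row-sum check is a quick way to confirm that the dummy columns are mutually exclusive and cover every observation.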

2. Using dummy_cols() function

The dummy_cols() function is provided by the fastDummies package. It creates dummy variables for the columns named in the function call. If no columns are selected, dummy variables are created for all character and factor columns in the data frame.

Syntax: dummy_cols(.data, select_columns = NULL)

Parameters:

  • .data: the object for which dummy columns are to be created
  • select_columns: the columns for which dummy variables are to be created
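
To make the default behaviour described above more concrete, here is a rough base-R sketch of the same idea. The make_dummies() helper and the people data frame are hypothetical names invented for this illustration. This is not how fastDummies is implemented, it makes no attempt to handle missing values, and its column order and types may differ from what dummy_cols() returns.

```r
# Hypothetical helper (NOT part of fastDummies): a base-R sketch of the idea
make_dummies <- function(df, columns = NULL) {
  # If no columns are given, pick every character or factor column
  if (is.null(columns)) {
    is_cat <- vapply(df,
                     function(x) is.character(x) || is.factor(x),
                     logical(1))
    columns <- names(df)[is_cat]
  }
  # Add one 0/1 column per distinct value, named <column>_<value>
  for (col in columns) {
    for (lvl in unique(as.character(df[[col]]))) {
      df[[paste(col, lvl, sep = "_")]] <- as.integer(df[[col]] == lvl)
    }
  }
  df
}

# Made-up data for the sketch
people <- data.frame(gender = c("m", "f", "m"),
                     age = c(19, 20, 20))

# age is numeric, so only gender gets dummy columns
names(make_dummies(people))
```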

Example 1: 

In this example, we used the fastDummies package to automatically create dummy variables for the group column in the PlantGrowth dataset. The dummy_cols() function generates separate binary columns for each category in group, enabling easy use of categorical data in numerical analysis.

R
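# Install the package once if needed, then load it
install.packages("fastDummies")
library(fastDummies)

data <- PlantGrowth

# Create one dummy column for each category of group
data <- dummy_cols(data, select_columns = "group")

head(data, 5)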

Output:

[Image: PlantGrowth data after using the dummy_cols function]

Example 2: 

In this example, we created a data frame df and used the dummy_cols() function from the fastDummies package to automatically generate dummy variables for all categorical columns (gender and city). This converts each category into separate binary columns, making the data ready for numerical analysis.

R
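# Assumes fastDummies is already loaded with library(fastDummies)
df <- data.frame(gender = c("m", "f", "m"),
                 age = c(19, 20, 20),
                 city = c("Delhi", "Mumbai", "Delhi"))

# No columns selected: all character and factor columns are converted
df <- dummy_cols(df)

head(df)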

Output:

[Image: data frame after using the dummy_cols function]

In this article, we explored how to create dummy variables in R using two approaches: manually with the ifelse() function and automatically with the dummy_cols() function from the fastDummies package. Both convert categorical data into a numerical format suitable for analysis.

