Factor in R - Categorical & Continuous Variables

This document discusses factors in R, which are variables that take on a limited number of categorical values. It describes categorical and continuous variables, and how to convert categorical variables to factors. Factors allow categorical variables to be used in machine learning algorithms. Categorical variables can be nominal, with no inherent ordering, or ordinal, with a natural ordering. Continuous variables are numeric and stored as such by default in R.

Uploaded by

dobomighie

26/01/2020 Factor in R: Categorical & Continuous Variables

Factor in R: Categorical & Continuous Variables


What is a Factor in R?
Factors are variables in R which take on a limited number of different values; such
variables are often referred to as categorical variables.

In a dataset, we can distinguish two types of variables: categorical and continuous.

In a categorical variable, the value is limited to a particular finite set of groups.
For example, a categorical variable can be country, year, gender, or occupation.
A continuous variable, however, can take any numeric value, whether integer or decimal.
Examples include revenue and the price of a share.
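
As a rough illustration of the distinction, the sketch below builds one variable of each kind. The vector names and values are made up for this example and do not come from any dataset in this tutorial.

```r
# Hypothetical example data (not from the tutorial)
occupation  <- c("engineer", "teacher", "engineer", "nurse")  # categorical
share_price <- c(101.25, 99.80, 102.40, 100.00)               # continuous

# A categorical variable draws its values from a small, finite set
unique(occupation)

# A continuous variable is stored as a number and can take decimal values
class(share_price)
```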

Categorical Variables
R stores categorical variables in a factor. The code below converts a character
variable into a factor variable. Many machine learning algorithms cannot work with
character strings directly; a factor handles this by storing each value as an integer
code that maps to a level.

Syntax

factor(x = character(), levels, labels = levels, ordered = is.ordered(x))

Arguments:

x: A vector of data, usually character or integer rather than decimal.
levels: A vector of the possible values taken by x. This argument is optional. The
default value is the sorted list of unique items in the vector x.
labels: Optional labels for the levels, given in the same order as levels. For example,
1 can take the label `male` while 0 takes the label `female`.
ordered: Determines whether the levels should be treated as ordered.
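
To make the levels and labels arguments concrete, here is a minimal sketch that follows the male/female example above. The input vector coded_gender is a hypothetical name and its values are invented for illustration.

```r
# Hypothetical integer-coded data: 1 stands for male, 0 for female
coded_gender <- c(1, 0, 0, 1, 1)

# levels lists the values found in x; labels renames them in the same order
gender <- factor(coded_gender, levels = c(0, 1), labels = c("female", "male"))

gender
levels(gender)
```

Because labels are matched to levels by position, swapping the order of either vector would silently swap the meaning of the codes.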


Example:

Let's create a factor from a character vector.

# Create gender vector
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
class(gender_vector)
# Convert gender_vector to a factor
factor_gender_vector <- factor(gender_vector)
class(factor_gender_vector)

Output:

## [1] "character"
## [1] "factor"

It is important to transform strings into factors before performing a machine learning task.
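
One way to see what the conversion does is to inspect the factor's internals. The sketch below reuses gender_vector from the example above and relies only on base R's levels() and as.integer(); it is meant as an illustration, not as a required step.

```r
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_gender_vector <- factor(gender_vector)

# The distinct values, sorted alphabetically by default
levels(factor_gender_vector)

# The integer code stored for each element (its position within levels)
as.integer(factor_gender_vector)
```

Here "Female" is level 1 and "Male" is level 2, so the codes come out as 2 1 1 2 2.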

Categorical variables can be divided into nominal categorical variables and ordinal
categorical variables.

Nominal Categorical Variable


A nominal categorical variable has several values, but their order does not matter. For
instance, a gender variable with the values male and female has no ordering.

# Create a color vector
color_vector <- c('blue', 'red', 'green', 'white', 'black', 'yellow')
# Convert the vector to factor
factor_color <- factor(color_vector)
factor_color

Output:

## [1] blue   red    green  white  black  yellow
## Levels: black blue green red white yellow

The levels are listed alphabetically, which is only R's default sorting; factor_color
itself carries no meaningful order.
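
A quick way to confirm that a factor is nominal is sketched below, using base R's is.ordered(). The comparison at the end is included only to show how R reacts when asked to rank unordered levels.

```r
color_vector <- c('blue', 'red', 'green', 'white', 'black', 'yellow')
factor_color <- factor(color_vector)

# A factor created without ordered = TRUE is nominal
is.ordered(factor_color)

# Comparing two levels is not meaningful: R returns NA and emits a warning,
# which is suppressed here to keep the output clean
suppressWarnings(factor_color[1] < factor_color[2])
```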

Ordinal Categorical Variable


Ordinal categorical variables do have a natural ordering. We specify that order, from
the lowest to the highest, in the levels argument and set ordered = TRUE so that R
treats the factor as ordered. With ordered = FALSE, R keeps the levels but treats the
factor as nominal.

Example:

We can use summary() to count how many times each level occurs.

# Create Ordinal categorical vector
day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
# Convert `day_vector` to a factor with ordered level
factor_day <- factor(day_vector, ordered = TRUE,
                     levels = c('morning', 'midday', 'afternoon', 'evening', 'midnight'))
# Print the new variable
factor_day

Output:

## [1] evening   morning   afternoon midday    midnight  evening
## Levels: morning < midday < afternoon < evening < midnight

Example:

# Append the line to above code
# Count the number of occurrences of each level
summary(factor_day)

Output:

##   morning    midday afternoon   evening  midnight
##         1         1         1         2         1

R ordered the levels from 'morning' to 'midnight', as specified in the levels argument.
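
Once a factor is ordered, comparisons and sorting follow the declared levels rather than the alphabet. The sketch below rebuilds factor_day from the example above and uses only base R operators and functions; it is one possible way to check the ordering.

```r
day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
factor_day <- factor(day_vector, ordered = TRUE,
                     levels = c('morning', 'midday', 'afternoon', 'evening', 'midnight'))

# Comparisons follow the level order: is 'evening' later than 'morning'?
factor_day[1] > factor_day[2]

# Sorting and max() also respect the declared order
sort(factor_day)
max(factor_day)
```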

Continuous Variables
Continuous variables need no conversion: R stores them as numeric or integer by
default. We can see this in the dataset below. mtcars is a built-in dataset that gathers
information on different car models. We can load it by referring to mtcars and check the
class of the variable mpg (miles per gallon). It returns "numeric", indicating a
continuous variable.

dataset <- mtcars
class(dataset$mpg)

Output:

## [1] "numeric"
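
A numeric class does not always mean the variable is continuous in practice. As a hedged illustration that goes slightly beyond the original example, the cyl column of mtcars (number of cylinders) is stored as numeric but takes only a few distinct values, so it can reasonably be treated as categorical. The name cyl_factor is introduced here for the sketch.

```r
dataset <- mtcars

# cyl is stored as a number, just like mpg
class(dataset$cyl)

# Converting it to a factor makes the categories explicit
cyl_factor <- factor(dataset$cyl)
levels(cyl_factor)

# Count the cars in each category
summary(cyl_factor)
```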
