Label Encoding in R programming
Last Updated: 09 Oct, 2022
Data used for analysis and modeling must be represented in a form that algorithms can process. Many machine learning models operate only on numbers, so string-valued categorical variables have to be converted before training and prediction. Label encoding assigns an integer to each distinct category so that the variable can be fed into such models. A label encoder performs this conversion of categorical values into integers, and a decoder performs the reverse operation.
Label Encoding in R programming
A label encoder takes a vector of categorical values as input and returns a numerical vector in which each distinct category is represented by its own integer.
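To make the idea concrete, here is a minimal base R sketch of what an encoder and a decoder do. It assumes zero-based codes assigned in order of first appearance, which is only one possible convention. The names `encode_labels`, `decode_labels`, and `colours` are hypothetical and exist only for this illustration; they are not part of any package.

```r
# Hypothetical helpers for illustration; not part of any package.
encode_labels <- function(x) {
  classes <- unique(x)               # distinct categories, in order of first appearance
  codes <- match(x, classes) - 1L    # zero-based integer code for each element
  list(codes = codes, classes = classes)
}

decode_labels <- function(codes, classes) {
  classes[codes + 1L]                # map each code back to its category
}

colours <- c("red", "green", "red", "blue")
enc <- encode_labels(colours)
print(enc$codes)                               # 0 1 0 2
print(decode_labels(enc$codes, enc$classes))   # "red" "green" "red" "blue"
```

The lookup table (`classes`) has to be kept alongside the codes, because the integers alone cannot be decoded.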
There are two ways to implement label encoding in the R programming language:
- Using superml
- Using factor()
Let's discuss both methods below:
Using superml to Get Label Encoding in R programming
The superml package is designed to unify the model training process in R. It can be installed using the following command:
install.packages("superml")
First, a new label encoder object is instantiated using LabelEncoder$new(). The input vector is then used to fit the encoder, that is, to learn the mapping from categories to integers. The fit_transform method fits the encoder and transforms the vector in a single step, while transform applies an already fitted mapping. The final result is a numerical vector.
The following methods are called in sequence:
- encoder$fit(x)
- encoder$fit_transform(x)
- encoder$transform(x)
Arguments:
- x - The vector to be encoded
The vector in the following code snippet contains two distinct groups, so the encoded result is a binary vector of 0s and 1s.
After installing the superml package with the command above, we can run the code below.
R
# load the superml package
library(superml)
x <- c("Geekster", "GeeksforGeeks", "Geekster", "Geekster",
       "GeeksforGeeks", "GeeksforGeeks", "Geekster", "GeeksforGeeks",
       "Geekster", "Geekster")
print("Original Data Vector")
print(x)
# create a label encoder object
encoder <- LabelEncoder$new()
# fit the encoder on the x vector
encoder$fit(x)
# fit and transform the data in a single step
encoder$fit_transform(x)
# transform the data with the fitted encoder and print the result
encoder$transform(x)
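The introduction mentions that decoders perform the reverse operation. A hedged sketch of that round trip with superml follows. It assumes the package is installed and that the encoder's `inverse_transform` method behaves as described in the superml documentation; the `requireNamespace` guard simply skips the example when the package is missing. The shortened vector `x` is illustrative.

```r
# Assumes the superml package is installed; the guard skips the example otherwise.
if (requireNamespace("superml", quietly = TRUE)) {
  x <- c("Geekster", "GeeksforGeeks", "Geekster", "GeeksforGeeks")

  encoder <- superml::LabelEncoder$new()
  codes <- encoder$fit_transform(x)            # fit and encode in one step
  decoded <- encoder$inverse_transform(codes)  # map the codes back to labels

  print(codes)
  print(decoded)
}
```

Keeping the fitted `encoder` object around is what makes decoding possible, since it stores the mapping learned during fitting.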
Using factor() to Get Label Encoding in R programming
The factor() function in base R converts a vector into a categorical variable (a factor). Each distinct value becomes a level, and every level is stored internally as an integer code. To obtain these integer codes, we can apply as.numeric() to the factor.
Syntax : factor(x)
Arguments : x - The vector to be encoded
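As a small sketch of how `factor()`, `levels()`, and `as.integer()` relate, consider the illustrative vector `sizes` below (it is not part of the article's data). The example assumes the default behaviour, in which the levels are the sorted unique values.

```r
sizes <- c("small", "large", "medium", "small")

f <- factor(sizes)
print(levels(f))       # "large" "medium" "small"
print(as.integer(f))   # 3 1 2 3
```

Note that the codes follow alphabetical order of the labels, not any natural ordering of the categories.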
In the following code, the distinct values of the companies vector are sorted lexicographically to form the levels of the factor. The levels are then mapped to integers beginning with 1. The value "GeeksforGeeks" sorts first, so it is assigned level 1, and all its occurrences are replaced with 1 in the final output.
R
# creating a data vector
companies <- c("Geekster", "TCS", "Geekster", "Geekster",
               "GeeksforGeeks",
               "Wipro", "Geekster",
               "GeeksforGeeks",
               "Geekster", "Wipro", "TCS")
# printing the original vector
print("Original Data")
print(companies)
# converting the data to factors
factors <- factor(companies)
# printing a heading for the label encoded values
print("Label Encoded Data")
# printing the numeric equivalents of these vector values
print(as.numeric(factors))
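Output : the final call prints the encoded vector 2 3 2 2 1 4 2 1 2 4 3, where GeeksforGeeks = 1, Geekster = 2, TCS = 3 and Wipro = 4.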
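The default alphabetical level order is not always what a model or a report needs. The sketch below shows, under assumed illustrative data, how the order can be fixed with the `levels` argument of `factor()`, how to obtain zero-based codes, how to decode, and how to reuse the same mapping on new data. The level order in `lvls` and the unseen label "Infosys" are arbitrary choices made for this example.

```r
companies <- c("Geekster", "TCS", "GeeksforGeeks", "Wipro", "TCS")

# Fix the level order explicitly instead of relying on alphabetical sorting
lvls <- c("TCS", "Wipro", "Geekster", "GeeksforGeeks")
f <- factor(companies, levels = lvls)

codes <- as.integer(f)        # one-based codes: 3 1 4 2 1
zero_based <- codes - 1L      # zero-based codes: 2 0 3 1 0
decoded <- levels(f)[codes]   # back to the original strings

# Reuse the same mapping on new data; labels not in lvls become NA
new_codes <- as.integer(factor(c("Wipro", "Infosys"), levels = lvls))  # 2 NA

print(codes)
print(zero_based)
print(decoded)
print(new_codes)
```

Passing the same `levels` vector every time keeps the encoding consistent between training data and data seen later.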