Exp 6

Uploaded by

hemanthsairavipati


Experiment-6

Write a program to implement Categorical Encoding, One-hot Encoding:

Categorical Encoding:

What is categorical data?

Since we will be working with categorical variables in this experiment, here is a quick refresher with a couple of examples. Categorical variables are usually represented as 'strings' or 'categories' and take a finite number of values. Here are a few examples:

1. The city where a person lives: Delhi, Mumbai, Ahmedabad, Bangalore, etc.

2. The department a person works in: Finance, Human resources, IT, Production.

3. The highest degree a person has: High school, Diploma, Bachelors, Masters, PhD.

4. The grades of a student: A+, A, B+, B, B- etc.

In the above examples, each variable can take only a fixed set of possible values. Further, there are two kinds of categorical data:

- Ordinal data: the categories have an inherent order

- Nominal data: the categories do not have an inherent order

Regardless of the encoding method, the aim is the same: to replace each value of a categorical variable with a fixed-length numeric vector. The choice of method depends on which of the two types the variable belongs to:

1. Nominal → Athens, Cairo, Paris, Tokyo, New Delhi, etc.

2. Ordinal → High School Diploma, BS, MS, PhD
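The difference between the two types can be made explicit in code. The following is a minimal sketch using pandas Categorical; the sample values are taken from the lists above, and the variable names city and degree are made up for illustration.

```python
import pandas as pd

# Nominal: the categories carry no order.
city = pd.Categorical(["Paris", "Cairo", "Tokyo", "Paris"])

# Ordinal: the order is declared explicitly, from lowest to highest.
degree = pd.Categorical(
    ["MS", "High School Diploma", "PhD", "BS"],
    categories=["High School Diploma", "BS", "MS", "PhD"],
    ordered=True,
)

print(city.ordered)           # False
print(degree.ordered)         # True
# Integer position of each value in the declared order
print(degree.codes.tolist())  # [2, 0, 3, 1]
```

Declaring the order by hand matters because pandas has no way of knowing that a PhD ranks above an MS; without the categories argument it would simply sort the strings.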

Label Encoding or Ordinal Encoding

We use this encoding technique when the categorical feature is ordinal. In that case the order matters, so the encoding should reflect the sequence of the categories.

In label encoding, each label is converted into an integer value. A typical candidate is a variable that contains the categories representing the education qualification of a person.

Label encoding converts the labels into numeric form so that they become machine-readable. Machine learning algorithms can then operate on those labels directly. It is an important pre-processing step for structured datasets in supervised learning.

Example:
Suppose we have a column Height in some dataset, with the values tall, medium and short.

After applying label encoding, the Height column holds integers, where 0 is the label for tall, 1 is the label for medium, and 2 is the label for short.
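The Height example can be sketched with an explicit dictionary, which reproduces exactly the mapping described above. The rows of the table are made up for illustration, and the names heights and height_codes are hypothetical. Note that scikit-learn's LabelEncoder sorts the labels alphabetically, so on the same column it would assign medium = 0, short = 1 and tall = 2 instead.

```python
import pandas as pd

# Made-up sample rows for the Height column
heights = pd.DataFrame({"Height": ["tall", "medium", "short", "tall", "medium"]})

# The mapping stated in the text: tall -> 0, medium -> 1, short -> 2
height_codes = {"tall": 0, "medium": 1, "short": 2}

heights["Height_encoded"] = heights["Height"].map(height_codes)
print(heights)
```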

# Importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# Importing the dataset
data_set = pd.read_csv('/content/Iris.csv')

# Inspect the data and the distinct species names
print(data_set.head())
print(data_set['Species'].unique())

# Import the label encoder
from sklearn import preprocessing

# LabelEncoder maps each distinct label to an integer (labels are sorted first)
label_encoder = preprocessing.LabelEncoder()

# Encode the labels in the 'Species' column
data_set['Species'] = label_encoder.fit_transform(data_set['Species'])

# The column now holds integer codes instead of names
print(data_set['Species'].unique())
print(data_set.head())

Here the three species names are converted into the integer labels 0, 1 and 2.
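For Species the integer codes are arbitrary, which is fine for a class label. For an ordinal feature the codes should follow the real order, and LabelEncoder does not guarantee that because it sorts labels alphabetically. The sketch below contrasts it with scikit-learn's OrdinalEncoder, which accepts the order explicitly; the degree values are reused from the ordinal example earlier, and the variable names are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

degrees = np.array(["BS", "PhD", "High School Diploma", "MS", "BS"])

# LabelEncoder: codes follow alphabetical order of the labels
le = LabelEncoder()
label_codes = le.fit_transform(degrees)
print(list(le.classes_))     # ['BS', 'High School Diploma', 'MS', 'PhD']
print(label_codes.tolist())  # [0, 3, 1, 2, 0]

# OrdinalEncoder: codes follow the order we declare (lowest to highest).
# It expects a 2-D input, one column per feature.
order = [["High School Diploma", "BS", "MS", "PhD"]]
oe = OrdinalEncoder(categories=order)
ordinal_codes = oe.fit_transform(degrees.reshape(-1, 1)).ravel().astype(int).tolist()
print(ordinal_codes)         # [1, 3, 0, 2, 1]
```

With LabelEncoder, BS (0) ends up below High School Diploma (1); with the declared order the codes increase with the level of the degree.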

One-hot Encoding:

One-hot encoding is a technique used to represent categorical variables as numerical values in a machine learning model. The advantages of using one-hot encoding include:
1. It allows the use of categorical variables in models that require numerical input.
2. It can improve model performance by providing more information to the model about the categorical variable.
3. It avoids the problem of ordinality, where integer codes (e.g. 0 < 1 < 2) suggest an order among categories that have none.

In this technique, each category gets its own column. For a variable with the labels Male and Female, for instance, wherever the value is Male there is a 1 in the Male column and a 0 in the Female column, and vice versa. Let's understand this with an example: consider data where fruits and their corresponding prices are given.

Fruit    Price
apple    5
mango    10
apple    15
orange   20

The output after one-hot encoding of the data is as follows:

apple  mango  orange  price
1      0      0       5
0      1      0       10
1      0      0       15
0      0      1       20

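The fruit table can be reproduced with pandas. This is a minimal sketch using pd.get_dummies rather than the scikit-learn encoder used later in this experiment; the names fruits and encoded are made up for illustration, and the dummy columns carry the default Fruit_ prefix.

```python
import pandas as pd

# The fruit/price table from the example
fruits = pd.DataFrame({
    "Fruit": ["apple", "mango", "apple", "orange"],
    "Price": [5, 10, 15, 20],
})

# One column per distinct fruit; dtype=int gives 0/1 instead of booleans
encoded = pd.get_dummies(fruits, columns=["Fruit"], dtype=int)

# Put the dummy columns first, as in the table above
encoded = encoded[["Fruit_apple", "Fruit_mango", "Fruit_orange", "Price"]]
print(encoded)
```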
Step-1: Data Pre-processing:
The very first step is data pre-processing, which consists of the steps below:

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

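# Importing the dataset
data_set = pd.read_csv('50_CompList.csv')

# Inspect the data, count missing values per column, and summarise the numeric columns
print(data_set.head())
print(data_set.isna().sum())
print(data_set.describe())

# Extracting the independent variables (all columns except the last)
# and the dependent variable (column index 4)
x = data_set.iloc[:, :-1].values
y = data_set.iloc[:, 4].values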

Encoding Dummy Variables:

We have one categorical variable (State), which cannot be passed to the model directly, so we will encode it. Converting it to integers with the LabelEncoder class is not sufficient, because the integer codes still imply a relational order between the states, which may produce a wrong model. To remove this problem, we will use OneHotEncoder, which creates the dummy variables. Below is the code for it:

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

# One-hot encode column 3 (State) and pass the remaining columns through unchanged
ct = ColumnTransformer([("State", OneHotEncoder(), [3])], remainder='passthrough')
x = ct.fit_transform(x)

# Avoiding the dummy variable trap: drop the first dummy column
x = x[:, 1:]
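The dummy variable trap arises because the dummy columns of one variable always sum to 1, so any one of them can be derived from the others. The sketch below shows this on a small made-up State column (the state names are assumptions, not taken from the dataset) and checks that OneHotEncoder(drop="first") gives the same result as dropping the first column by hand.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Made-up State values, one column, four rows
states = np.array([["New York"], ["California"], ["Florida"], ["California"]])

# Full encoding: one column per state, in sorted order
full = OneHotEncoder().fit_transform(states).toarray()

# Let the encoder drop the first category itself
reduced = OneHotEncoder(drop="first").fit_transform(states).toarray()

print(full.shape)                            # (4, 3)
print(reduced.shape)                         # (4, 2)
print(full.sum(axis=1))                      # every row sums to 1
print(np.array_equal(full[:, 1:], reduced))  # True
```

Slicing with x[:, 1:] works in the code above only because the dummy columns come first in the transformed array; letting the encoder drop the column avoids depending on that position.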

# Splitting the dataset into training and test set.

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.2, random_state=0)

#Fitting the MLR model to the training set:

from sklearn.linear_model import LinearRegression
regressor= LinearRegression()
regressor.fit(x_train, y_train)

# Predicting the test set result
y_pred = regressor.predict(x_test)

print('Train Score: ', regressor.score(x_train, y_train))
print('Test Score: ', regressor.score(x_test, y_test))
print(" Actual output \n {}".format(y_test), "\n predict outputs:\n {}".format(y_pred))

# Calculation of Mean Squared Error (MSE)
from sklearn.metrics import mean_squared_error
print("Mean Squared Error (MSE):", mean_squared_error(y_test, y_pred))

# Calculation of the R2 score
from sklearn.metrics import r2_score
r2 = r2_score(y_test, y_pred)
print('R2 score (%):', r2 * 100)
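
To make the two metrics concrete, the sketch below computes them by hand on a few made-up numbers and compares the results with the scikit-learn functions. The arrays y_true and y_hat are illustrative values, not output from the model above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

# MSE: mean of the squared errors
mse_manual = np.mean((y_true - y_hat) ** 2)

# R2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_hat))  # both 0.375
print(r2_manual, r2_score(y_true, y_hat))             # both 0.925
```

An R2 of 1 would mean the predictions match the actual values exactly, so multiplying by 100 simply expresses the score as a percentage.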
