5) RandomForest.ipynb - Colaboratory

This document summarizes the steps of a machine learning workflow that classifies car evaluations: 1) the data is read from a CSV file and split into feature (X) and target (y) variables; 2) the data is split into training and test sets; 3) categorical variables are encoded using ordinal encoding; and 4) a Random Forest classifier is trained and evaluated.

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # data visualization
import seaborn as sns # statistical data visualization
%matplotlib inline
 

from google.colab import files
uploaded = files.upload()

Saving car evaluation(1).csv to car evaluation(1) (2).csv

import io
df = pd.read_csv(io.BytesIO(uploaded['car_evaluation(1).csv']))
df

      vhigh vhigh.1      2   2.1  small   low  unacc
0     vhigh   vhigh      2     2  small   med  unacc
1     vhigh   vhigh      2     2  small  high  unacc
2     vhigh   vhigh      2     2    med   low  unacc
3     vhigh   vhigh      2     2    med   med  unacc
4     vhigh   vhigh      2     2    med  high  unacc
...     ...     ...    ...   ...    ...   ...    ...
1722    low     low  5more  more    med   med   good
1723    low     low  5more  more    med  high  vgood
1724    low     low  5more  more    big   low  unacc
1725    low     low  5more  more    big   med   good
1726    low     low  5more  more    big  high  vgood

1727 rows × 7 columns

df.shape

(1727, 7)

df.head()
  vhigh vhigh.1  2 2.1  small   low  unacc
0 vhigh   vhigh  2   2  small   med  unacc
1 vhigh   vhigh  2   2  small  high  unacc
2 vhigh   vhigh  2   2    med   low  unacc
3 vhigh   vhigh  2   2    med   med  unacc
4 vhigh   vhigh  2   2    med  high  unacc

df

(same output as the full DataFrame shown above: 1727 rows × 7 columns)

col_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']

df.columns = col_names

col_names

['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']

df.head()

  buying  maint doors persons lug_boot safety  class
0  vhigh  vhigh     2       2    small    med  unacc
1  vhigh  vhigh     2       2    small   high  unacc
2  vhigh  vhigh     2       2      med    low  unacc
3  vhigh  vhigh     2       2      med    med  unacc
4  vhigh  vhigh     2       2      med   high  unacc

#summary of data set
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1727 entries, 0 to 1726

Data columns (total 7 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 buying 1727 non-null object

1 maint 1727 non-null object

2 doors 1727 non-null object

3 persons 1727 non-null object

4 lug_boot 1727 non-null object

5 safety 1727 non-null object

6 class 1727 non-null object

dtypes: object(7)

memory usage: 94.6+ KB
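
Because every column is categorical, df.describe() with include='object' gives a quick per-column summary (count, unique values, most frequent value); a small optional check:

# optional: summarize the categorical columns (count, unique, top, freq)
df.describe(include='object')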

#Frequency distribution of values in variables

#Now, check the frequency counts of the categorical variables.
col_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']

for col in col_names:
    print(df[col].value_counts())
 

high 432

med 432

low 432

vhigh 431

Name: buying, dtype: int64

high 432

med 432

low 432

vhigh 431

Name: maint, dtype: int64

3 432

4 432

5more 432

2 431

Name: doors, dtype: int64

4 576

more 576

2 575

Name: persons, dtype: int64

big 576

med 576

small 575

Name: lug_boot, dtype: int64

high 576

med 576

low 575

Name: safety, dtype: int64

unacc 1209

acc 384

good 69

vgood 65

Name: class, dtype: int64

 
 
df['class'].value_counts()
 
 

unacc 1209

acc 384

good 69

vgood 65

Name: class, dtype: int64

# check missing values in variables
df.isnull().sum()

buying 0

maint 0

doors 0

persons 0

lug_boot 0

safety 0

class 0

dtype: int64

#Declare feature vector and target variable 
X = df.drop(['class'], axis=1)
 
y = df['class']

X

     buying  maint  doors persons lug_boot safety
0     vhigh  vhigh      2       2    small    med
1     vhigh  vhigh      2       2    small   high
2     vhigh  vhigh      2       2      med    low
3     vhigh  vhigh      2       2      med    med
4     vhigh  vhigh      2       2      med   high
...     ...    ...    ...     ...      ...    ...
1722    low    low  5more    more      med    med
1723    low    low  5more    more      med   high
1724    low    low  5more    more      big    low
1725    low    low  5more    more      big    med
1726    low    low  5more    more      big   high

1727 rows × 6 columns

y

0       unacc
1       unacc
2       unacc
3       unacc
4       unacc
        ...
1722     good
1723    vgood
1724    unacc
1725     good
1726    vgood
Name: class, Length: 1727, dtype: object

#Split data into separate training and test set 
from sklearn.model_selection import train_test_split
 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)  # random_state value truncated in the source; 42 assumed
 

# check the shape of X_train and X_test
 
X_train.shape, X_test.shape

((1157, 6), (570, 6))
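
Since the class distribution is heavily imbalanced (unacc accounts for 1209 of 1727 rows), one could also stratify the split so that class proportions stay the same in both sets; a minimal optional sketch, with the random_state value assumed as above:

# optional: stratified split preserves the class proportions in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y)  # random_state assumed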

#Feature Engineering 
# check data types in X_train
 
X_train.dtypes

buying object

maint object

doors object

persons object

lug_boot object

safety object

dtype: object

#Encode categorical variables
X_train.head()
buying maint doors persons lug_boot safety

83 vhigh vhigh 5more 2 med low

48 vhigh vhigh 3 more med med

468 high vhigh 3 4 small med

155 vhigh high 3 more med low

1043 med high 4 more small low


!pip install category_encoders

Requirement already satisfied: category_encoders in /usr/local/lib/python3.7/dist-pac


Requirement already satisfied: numpy>=1.14.0 in /usr/local/lib/python3.7/dist-package
Requirement already satisfied: patsy>=0.5.1 in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: statsmodels>=0.9.0 in /usr/local/lib/python3.7/dist-pa
Requirement already satisfied: scikit-learn>=0.20.0 in /usr/local/lib/python3.7/dist-
Requirement already satisfied: scipy>=1.0.0 in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: pandas>=0.21.1 in /usr/local/lib/python3.7/dist-packag
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dis
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from pa
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages

# import category encoders
 
import category_encoders as ce

# encode categorical variables with ordinal encoding
 
encoder = ce.OrdinalEncoder(cols=['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety'])

 
X_train = encoder.fit_transform(X_train)
 
X_test = encoder.transform(X_test)

/usr/local/lib/python3.7/dist-packages/category_encoders/utils.py:21: FutureWarning:
elif pd.api.types.is_categorical(cols):

X_train.head()

      buying  maint  doors  persons  lug_boot  safety
83         1      1      1        1         1       1
48         1      1      2        2         1       2
468        2      1      2        3         2       2
155        1      2      2        2         1       1
1043       3      2      3        2         2       1

X_test.head()

      buying  maint  doors  persons  lug_boot  safety
599        2      2      3        1         3       1
932        3      1      3        3         3       1
628        2      2      1        1         3       3
1497       4      2      1        3         1       2
1262       3      4      3        2         1       1
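
To see which integer each category was assigned, the fitted encoder's learned mapping can be inspected; a minimal sketch, assuming category_encoders exposes the fitted mapping attribute on OrdinalEncoder:

# inspect the category-to-integer mapping learned for each column
for m in encoder.mapping:
    print(m['col'], ':')
    print(m['mapping'])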

 
 
# import Random Forest classifier
 
from sklearn.ensemble import RandomForestClassifier

# instantiate the classifier 
 
rfc = RandomForestClassifier(random_state=0)
 

# fit the model
 
rfc.fit(X_train, y_train)
 

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,

criterion='gini', max_depth=None, max_features='auto',

max_leaf_nodes=None, max_samples=None,

min_impurity_decrease=0.0, min_impurity_split=None,

min_samples_leaf=1, min_samples_split=2,

min_weight_fraction_leaf=0.0, n_estimators=100,

n_jobs=None, oob_score=False, random_state=0, verbose=0,

warm_start=False)

# Predict the Test set results
 
y_pred = rfc.predict(X_test)
 

# Check accuracy score 
 
from sklearn.metrics import accuracy_score
 
#print('Model accuracy score with 10 decision-trees : {0:0.4f}'.format(accuracy_score(y_test, y_pred)))

#Here, we build the Random Forest Classifier model with the default parameter of n_estimators = 100.

# instantiate the classifier with n_estimators = 100
 
rfc_100 = RandomForestClassifier(n_estimators=100, random_state=0)

# fit the model to the training set
 
rfc_100.fit(X_train, y_train)
 

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,

criterion='gini', max_depth=None, max_features='auto',

max_leaf_nodes=None, max_samples=None,

min_impurity_decrease=0.0, min_impurity_split=None,

min_samples_leaf=1, min_samples_split=2,

min_weight_fraction_leaf=0.0, n_estimators=100,

n_jobs=None, oob_score=False, random_state=0, verbose=0,

warm_start=False)

# Predict on the test set results
 
y_pred_100 = rfc_100.predict(X_test)
 

 
# Check accuracy score 
 
print('Model accuracy score with 100 decision-trees : {0:0.4f}'.format(accuracy_score(y_test, y_pred_100)))

Model accuracy score with 100 decision-trees : 0.9649
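
As a quick overfitting check, the training-set accuracy can be compared against the 0.9649 test accuracy above, using the fitted rfc_100 model:

# compare training accuracy with the test accuracy reported above
print('Training set accuracy : {0:0.4f}'.format(rfc_100.score(X_train, y_train)))
print('Test set accuracy     : {0:0.4f}'.format(rfc_100.score(X_test, y_test)))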

#Find important features with Random Forest model 
# create the classifier with n_estimators = 100
 
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# fit the model to the training set
 
clf.fit(X_train, y_train)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,

criterion='gini', max_depth=None, max_features='auto',

max_leaf_nodes=None, max_samples=None,

min_impurity_decrease=0.0, min_impurity_split=None,

min_samples_leaf=1, min_samples_split=2,

min_weight_fraction_leaf=0.0, n_estimators=100,

n_jobs=None, oob_score=False, random_state=0, verbose=0,

warm_start=False)

# view the feature scores
 
feature_scores = pd.Series(clf.feature_importances_, index=X_train.columns).sort_values(ascending=False)
 
feature_scores

safety 0.291657

persons 0.235380

buying 0.160692

maint 0.134143

lug_boot 0.111595

doors 0.066533

dtype: float64

# Creating a seaborn bar plot
 
sns.barplot(x=feature_scores, y=feature_scores.index)
# Add title to the graph
 
plt.title("Visualizing Important Features")
 
 
 
# Visualize the graph
 
plt.show()
 

#Build Random Forest model on selected features
# declare feature vector and target variable
 
X = df.drop(['class', 'doors'], axis=1)
 
y = df['class']

# split data into training and testing sets
 
from sklearn.model_selection import train_test_split
 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)  # random_state value truncated in the source; 42 assumed
X

buying maint persons lug_boot safety

0 vhigh vhigh 2 small med

1 vhigh vhigh 2 small high

2 vhigh vhigh 2 med low

3 vhigh vhigh 2 med med

4 vhigh vhigh 2 med high

... ... ... ... ... ...

1722 low low more med med

1723 low low more med high

1724 low low more big low

1725 low low more big med

1726 low low more big high

1727 rows × 5 columns

#encode categorical variables with ordinal encoding
 
encoder = ce.OrdinalEncoder(cols=['buying', 'maint', 'persons', 'lug_boot', 'safety'])
 
 
X_train = encoder.fit_transform(X_train)
 
X_test = encoder.transform(X_test)

/usr/local/lib/python3.7/dist-packages/category_encoders/utils.py:21: FutureWarning:
elif pd.api.types.is_categorical(cols):

# instantiate the classifier (n_estimators defaults to 100)
 
clf = RandomForestClassifier(random_state=0)

# fit the model to the training set
 
clf.fit(X_train, y_train)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,

criterion='gini', max_depth=None, max_features='auto',

max_leaf_nodes=None, max_samples=None,

min_impurity_decrease=0.0, min_impurity_split=None,

min_samples_leaf=1, min_samples_split=2,

min_weight_fraction_leaf=0.0, n_estimators=100,

n_jobs=None, oob_score=False, random_state=0, verbose=0,

warm_start=False)
# Predict on the test set results
 
y_pred = clf.predict(X_test)
 

# Check accuracy score 
 
print('Model accuracy score with doors variable removed : {0:0.4f}'.format(accuracy_score(y_test, y_pred)))

Model accuracy score with doors variable removed : 0.9263
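
Note that accuracy dropped from 0.9649 to 0.9263 after removing doors, so even the least important feature still carried some predictive signal.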

 
# Classification Report
 
 
 
#The classification report is another way to evaluate classification model performance. It displays the precision, recall, f1-score and support scores for the model.
 
#We can print a classification report as follows:-
 
from sklearn.metrics import classification_report
 
print(classification_report(y_test, y_pred))
 
 

precision recall f1-score support

acc 0.88 0.85 0.86 127

good 0.62 0.56 0.59 18

unacc 0.97 0.97 0.97 399

vgood 0.75 0.81 0.78 26

accuracy 0.93 570

macro avg 0.80 0.80 0.80 570

weighted avg 0.93 0.93 0.93 570
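
A confusion matrix complements the classification report by showing exactly which classes are confused with one another; a minimal sketch, reusing y_test and y_pred from the cells above:

# rows are true classes, columns are predicted classes
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, y_pred))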
