Machine Learning

Transport Choice of Employees

Senthil Kumar M
22.Sep.2019
Machine Learning (PGP-BABI)
by Great Learning

Table of Contents

INTRODUCTION

Observation

Step by step approach

Exploratory Data Analysis
EDA Summary
Logistic Regression
KNN
Naive Bayes

REFERENCES

Great Learning PGP(BABI)

INTRODUCTION
This project aims to understand the determinants of the transport choice made by employees.
The given data contains information about each employee's mode of transport as well as
personal and professional details such as age, salary and work experience. We need to predict
whether or not an employee will use a car as a mode of transport, and to identify which
variables are significant predictors of this decision.

We are going to use multiple models and performance metrics to arrive at a better model that
can describe the variables influencing an employee to use a car as a mode of transport. The
input variables include employee personal details such as Age, Salary and Work.Exp. We are
going to use

Process Map


The structure of the input variables is tabulated below:

Given the dataset, we are required to perform the following tasks to complete this
project successfully:

1. EDA
2. Data Preparation
3. Modeling
4. Actionable Insights & Recommendations


Observation
Employees use a 2 wheeler, public transport or a car to commute to their workplace. We
have been given 418 rows of data with 9 variables. We may need to clean up the dataset
and convert its variables to appropriate types before processing it for analysis.

The problem statement is to predict whether or not an employee will use a car as a
mode of transport, and to identify which variables are significant predictors behind
the decision.

Step by step approach

We shall carry out the following steps to perform the analysis and conclude this project.

1. Exploratory Data Analysis
2. Logistic Regression
3. KNN
4. Naive Bayes
5. Performance Measurement
6. Conclusion

1. Exploratory Data Analysis

We start the EDA by converting the categorical variables to factors.


The following graph gives an overview of how the variables are spread by volume of usage:


Structure of the dataset printed for reference.

Note that there is a missing value in the variable MBA. There are several ways to treat
missing values, but we will remove the whole record as there is only 1 missing value.
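The removal of an incomplete record can be sketched as follows. The report's own analysis was done in R, so this Python snippet is only an illustration; the records, field values and the helper name drop_incomplete are hypothetical and not taken from the dataset.

```python
# Hypothetical records; None marks a missing value, like the single
# missing MBA entry described above. Values are invented for illustration.
records = [
    {"Age": 28, "MBA": 0, "Salary": 14.3},
    {"Age": 24, "MBA": None, "Salary": 8.5},
    {"Age": 31, "MBA": 1, "Salary": 22.8},
]


def drop_incomplete(rows):
    """Keep only the rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


complete = drop_incomplete(records)
print(len(records) - len(complete), "record(s) removed")
```

Dropping the row is reasonable here only because a single record is affected; with many missing values, imputation would usually be considered instead.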

There are several automated packages in R for exploratory data analysis; we use one such
package, “dlookr”, in this project. The EDA report from the “dlookr” package gives a
detailed count of distinct values in each variable along with normality tests,
correlation coefficients and other descriptive statistics, as elaborated below:


Normality Test of Numeric Variable:

The normality test statistics indicate that the Age and Distance variables are close to
normally distributed, while Work Exp and Salary are positively skewed in the dataset. The
numeric variables were individually tested for normality, and the skewness values and QQ
plots for each variable are printed for reference.
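To make "positive skew" concrete, here is a minimal sketch of the moment-based skewness coefficient in Python. It is not the computation performed by the dlookr report, and the two sample lists are invented for illustration.

```python
def skewness(values):
    """Moment-based skewness: third central moment / (second central moment)^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print("symmetric:", skewness(symmetric))
print("right-tailed:", skewness(right_tailed))
```

A value near zero suggests a roughly symmetric distribution, while a clearly positive value corresponds to the long right tail described for Work Exp and Salary.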


Univariate Distribution: Histogram


Churn Ratio by numerical predictors:

We can see that employees with higher salary and age are more likely to use a car. There
is a clear indication that employees aged 30 and above, as well as those with a salary of
30k and above, prefer to use a car as a mode of transport. It is also evident in this
dataset that employees who travel more than 15 miles and have a higher salary choose a car.
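A comparison of this kind amounts to computing the share of car users within each group. The following Python sketch shows the idea under stated assumptions: the employee rows, the field names and the helper car_share are hypothetical, and only the age threshold of 30 comes from the text above.

```python
def car_share(rows, predicate):
    """Share of car users among the rows that satisfy the predicate."""
    group = [row for row in rows if predicate(row)]
    if not group:
        return float("nan")
    return sum(row["car"] for row in group) / len(group)


# Hypothetical employees; salary is in thousands, car is 1 for car users.
employees = [
    {"age": 24, "salary": 9.0, "car": 0},
    {"age": 27, "salary": 13.5, "car": 0},
    {"age": 29, "salary": 15.0, "car": 0},
    {"age": 31, "salary": 22.0, "car": 0},
    {"age": 33, "salary": 34.0, "car": 1},
    {"age": 36, "salary": 41.0, "car": 1},
]

older = car_share(employees, lambda r: r["age"] >= 30)
younger = car_share(employees, lambda r: r["age"] < 30)
print(f"age >= 30: {older:.2f}, age < 30: {younger:.2f}")
```

The same helper can be reused with a salary or distance threshold by passing a different predicate.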


The figure above shows that car usage among female employees is much lower than among
male employees, whereas qualification does not show any correlation with car usage.
License does matter, as we can assume that an employee without a license uses public
transport.

Target based Analysis: (Categorical Variables)


Target based Analysis: (Numerical Variables)

AGE:


Work Exp:


Salary:


Distance:


Grouped Correlation Plot of Numerical Variables


EDA Summary:
1. There is 1 NA in the entire dataset.
2. Correlation between predictor variables was found, and the correlated variables were
removed from the dataset.
3. We had challenges with numeric variables that were positively correlated; removing the
variables Age and Work.Exp reduced the numeric predictors to only 2 for the model. We
could have used other methods such as PCA to address this, but since the correlation
is about 90% we are retaining only Salary from the personal details to train our
model.

Data Preparation:
Our primary interest, as per the problem statement, is to understand the factors
influencing car usage. Hence we will create a new column for car usage. It takes the
value 0 for Public Transport and 2 Wheeler, and 1 for Car. We then look at the proportion
of cars in the Transport mode.
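The encoding described above can be sketched in a few lines of Python. This is an illustration rather than the report's R code: the sample of transport modes, the exact label strings and the helper car_usage are assumptions.

```python
from collections import Counter

# Hypothetical sample of the transport column; the label spellings are assumed.
transport = ["Public Transport", "2Wheeler", "Car", "Public Transport", "2Wheeler"]


def car_usage(mode):
    """Return 1 when the mode is Car, and 0 for any other mode."""
    return 1 if mode == "Car" else 0


car_flags = [car_usage(mode) for mode in transport]
counts = Counter(car_flags)
share = counts[1] / len(car_flags)
print(f"Car share in this sample: {share:.0%}")
```

Collapsing the three modes into a binary flag turns the task into a two-class problem, which is what makes the proportion of the positive class worth checking before modelling.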


Only 8% of the employees in the dataset use a car as a mode of transport.

SMOTE the Data

Figure panels: Before SMOTE, After SMOTE
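SMOTE balances the classes by creating synthetic minority-class points on the line segments between existing minority points and their nearest minority neighbours. The Python sketch below shows that core idea only; it is not the R implementation used for this report, and the function name smote, the parameter defaults and the sample (Salary, Distance) pairs are all hypothetical.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical (Salary, Distance) pairs for the minority class (car users).
car_users = [(36.6, 18.1), (42.0, 20.7), (39.9, 17.2), (45.0, 21.4)]
new_points = smote(car_users, n_new=6)
print(len(car_users) + len(new_points), "minority points after oversampling")
```

Because every synthetic point lies between two real minority points, the new data stays within the region already occupied by car users. Oversampling of this kind is normally applied to the training data only.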


Model Building:


Improving the model


VIF scores are used to check for multicollinearity. The Work.Exp variable scores above 10,
which confirms that multicollinearity exists in the dataset.

After dropping the Age and Work.Exp variables, the VIF values are significantly lower,
and we can conclude that the data is free from multicollinearity. We can go ahead and
train the model with the remaining variables.
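The variance inflation factor of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on the remaining predictors. With only two predictors, R^2 is simply the squared Pearson correlation between them, which allows a compact Python sketch. This simplified two-variable case is an illustration, not the multi-predictor calculation reported above, and the Age and Work.Exp values are invented.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def vif_two_predictors(x, y):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical values in which experience rises almost in step with age.
age = [23, 25, 28, 30, 34, 38, 40]
work_exp = [1, 3, 5, 8, 12, 15, 18]

print(f"VIF: {vif_two_predictors(age, work_exp):.1f}")
```

A VIF of 1 means the predictor is uncorrelated with the others; values above 10 are a common rule of thumb for problematic multicollinearity, matching the threshold applied to Work.Exp.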
