Uber Data Analysis

Uploaded by

Dev Agarwal

Uber Data Analysis
PRESENTED BY:

UNNATI GOYAL
(181500768)

SAUMYA GUPTA
(181500632)

NANDINEE GUPTA
(181500414)

ROSHNI RAWAT
(181500594)
Aim
Complete Data Analysis and
Exploration of Uber Dataset

Data Set:
Kaggle
CSV Format

Shape:
322844 rows, 56 columns

Libraries

Exploratory Data Analysis
Exploratory Data Analysis (EDA) is the critical process of performing
initial investigations on data in order to discover patterns, spot
anomalies, test hypotheses, and check assumptions with the help of
summary statistics and graphical representations.

It is good practice to understand the data first and to gather as many
insights from it as possible.

EDA is all about making sense of the data in hand.
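
As a minimal sketch of the summary-statistics side of EDA, the snippet below uses pandas on a small made-up table. The column names (`name`, `distance`, `price`) and all values are assumptions for illustration, not the actual Kaggle data.

```python
import pandas as pd

# Toy stand-in for the ride data; column names and values are illustrative only.
rides = pd.DataFrame({
    "name": ["UberX", "UberXL", "UberX", "Black", "UberXL", "Black"],
    "distance": [1.2, 3.4, 0.8, 2.5, 4.1, 1.9],
    "price": [8.5, 16.0, 7.0, 27.5, None, 22.0],
})

n_rows, n_cols = rides.shape                 # size of the table
summary = rides.describe()                   # count, mean, std, min, quartiles, max
missing = rides.isna().sum()                 # number of missing values per column
mean_price_by_name = rides.groupby("name")["price"].mean()  # NaNs are skipped

print(n_rows, n_cols)
print(summary)
print(missing)
print(mean_price_by_name)
```

The same calls apply unchanged to a DataFrame loaded from a CSV file with `pd.read_csv`.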

Feature Engineering
All machine learning algorithms use some input data to create outputs.
This input data comprises features, which are usually in the form of
structured columns. Algorithms require features with specific
characteristics to work properly, which is why feature engineering is
needed.

Feature engineering efforts mainly have two goals:

• Preparing a proper input dataset, compatible with the requirements of
the machine learning algorithm.

• Improving the performance of machine learning models.
Strip-plot between Name and Price

Label Encoding
Label encoding converts categorical labels into numeric form so that
they are machine-readable. Machine learning algorithms can then operate
on those labels directly. It is an important pre-processing step for
structured datasets in supervised learning.
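
A minimal sketch of label encoding with scikit-learn's `LabelEncoder` follows. The cab names are made-up example values, and the slides do not say which encoder class was actually used.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative category values; not taken from the actual dataset.
cab_names = ["UberX", "Black", "UberXL", "UberX", "Black"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(cab_names)    # classes are sorted, then numbered from 0
decoded = encoder.inverse_transform(encoded)  # map the integers back to the labels

print(list(encoder.classes_))
print(encoded.tolist())
print(list(decoded))
```

The integer codes carry an arbitrary order, so tree-based models usually tolerate them better than linear models do. scikit-learn documents `LabelEncoder` for target labels and offers `OrdinalEncoder` and `OneHotEncoder` for feature columns.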
NaNs (missing values)

Our dataset contains NaNs only in the Price column.

The count of NaNs is 55095.

The NaNs are filled with the median of the other values.
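
Median imputation can be sketched in pandas as below. The `price` column name and the values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy price column with two missing entries; values are illustrative only.
rides = pd.DataFrame({"price": [8.5, np.nan, 16.0, 27.5, np.nan, 7.0]})

n_missing = int(rides["price"].isna().sum())   # count NaNs before filling
median_price = rides["price"].median()         # median() skips NaNs by default
rides["price"] = rides["price"].fillna(median_price)

print(n_missing)
print(median_price)
print(rides["price"].tolist())
```

The median is less sensitive to extreme fares than the mean, which is a common reason to prefer it for skewed price data.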

Feature Selection
Feature selection is the process of selecting a subset of
relevant features (variables, predictors) for use in model
construction.

Recursive Feature Elimination:
Recursive feature elimination (RFE) is a feature
selection method that fits a model and removes the
weakest feature (or features) until the specified
number of features is reached.
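
The procedure can be sketched with scikit-learn's `RFE` class wrapped around a linear regression. The synthetic data from `make_regression` and the target of 4 features are assumptions chosen to keep the example small. They do not reproduce the project's 56-column dataset.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic regression data: 10 features, of which 4 actually drive the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)

# Refit the model repeatedly, dropping the weakest feature (step=1) each round
# until only n_features_to_select remain.
selector = RFE(estimator=LinearRegression(), n_features_to_select=4, step=1)
selector.fit(X, y)

X_reduced = selector.transform(X)   # keep only the selected columns

print(selector.support_)   # boolean mask of the kept features
print(selector.ranking_)   # 1 = kept; larger numbers were eliminated earlier
print(X_reduced.shape)
```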
After applying RFE with Linear Regression to the dataset, we found the
following accuracy for different numbers of columns:

Serial No.   No. of Features   Accuracy
1            56                0.805483422
2            40                0.8050662132
3            25                0.80553551515
4            15                0.8050457819
Final Dataset

Modeling

After completing the Recursive Feature Elimination process, we can
model the final dataset.
Linear Regression: Linear regression is a supervised machine
learning algorithm where the predicted output is continuous and has a
constant slope. It is used to predict values within a continuous range.

Decision Tree: A decision tree is a graphical representation of all
the possible solutions to a decision based on certain conditions.

Random Forest: Random forest is a popular machine learning
algorithm that belongs to the supervised learning technique. It can be
used for both classification and regression problems in ML. It is
based on the concept of ensemble learning, which is the process
of combining multiple models to solve a complex problem and to
improve the performance of the model.

Gradient Boosting Regressor: Gradient boosting is a machine
learning technique for regression and classification problems, which
produces a prediction model in the form of an ensemble of weak
prediction models, typically decision trees. It builds the model in a
stage-wise fashion like other boosting methods do, and it generalizes
them by allowing optimization of an arbitrary differentiable loss function.
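
A rough sketch of fitting the four models side by side with scikit-learn is shown below. It assumes synthetic data and a 75/25 train/test split. It also assumes that "accuracy" means the R² value returned by each regressor's `score` method, which the slides do not confirm.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the final dataset (assumption, not the Uber data).
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)   # R^2 on the held-out split

for name, r2 in scores.items():
    print(f"{name}: {r2:.4f}")
```

Because the synthetic target is linear by construction, linear regression does well here. The ranking on real ride data can differ.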

After applying different models to the final dataset, we found the
accuracies given below:

Serial No.   Model                         Accuracy
1            Linear Regression             0.74754507316
2            Decision Tree                 0.961791729999
3            Random Forest                 0.96226947434198
4            Gradient Boosting Regressor   0.96318719462782
Testing

In relation to machine learning models, "testing" primarily means
evaluating model performance in terms of the accuracy or precision of
the model.
Using the linear regression and random forest models, we predict the
price, plot the actual values against the predicted values, and find
the following errors:

For linear regression:

MAE : 3.406077219
MSE : 20.03343709
RMSE : 4.47587277

For random forest:

MAE : 0.998137009
MSE : 2.944653619
RMSE : 1.71599930
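
The three error measures can be computed as sketched below with scikit-learn and NumPy. The `y_true` and `y_pred` arrays are made-up values for illustration. The last lines check that the square root of the linear regression MSE quoted above reproduces the third figure reported for that model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Illustrative actual and predicted prices (not project data).
y_true = np.array([10.0, 12.5, 7.0, 30.0])
y_pred = np.array([11.0, 12.0, 9.0, 27.0])

mae = mean_absolute_error(y_true, y_pred)   # mean of |actual - predicted|
mse = mean_squared_error(y_true, y_pred)    # mean of (actual - predicted)^2
rmse = np.sqrt(mse)                         # square root of the MSE

print(mae, mse, rmse)

# Consistency check on the linear regression figures quoted above.
reported_mse = 20.03343709
print(np.sqrt(reported_mse))   # close to the reported 4.47587277
```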
Price Prediction Function

Finally, we create a price prediction function that takes the cab
name, source, surge multiplier, and icon as input and predicts the
price.
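
One way such a function could look is sketched below. Everything here is hypothetical: the toy training rows, the column names, the category values, the choice of `RandomForestRegressor`, and the name `predict_price` are assumptions for illustration, since the slides do not show the actual implementation.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import LabelEncoder

# Toy training data; every value is made up for illustration.
rides = pd.DataFrame({
    "name": ["UberX", "UberXL", "Black", "UberX", "Black", "UberXL"],
    "source": ["Back Bay", "Fenway", "Back Bay", "Fenway", "Fenway", "Back Bay"],
    "surge_multiplier": [1.0, 1.0, 1.5, 1.25, 1.0, 2.0],
    "icon": ["clear", "rain", "cloudy", "rain", "clear", "cloudy"],
    "price": [8.0, 14.0, 30.0, 10.5, 21.0, 26.0],
})

# Fit one label encoder per categorical column and keep it for prediction time.
categorical = ["name", "source", "icon"]
encoders = {col: LabelEncoder().fit(rides[col]) for col in categorical}

features = rides.drop(columns="price").copy()
for col in categorical:
    features[col] = encoders[col].transform(rides[col])

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(features, rides["price"])


def predict_price(name, source, surge_multiplier, icon):
    """Hypothetical helper: encode the inputs as in training, then predict."""
    row = pd.DataFrame([{
        "name": encoders["name"].transform([name])[0],
        "source": encoders["source"].transform([source])[0],
        "surge_multiplier": surge_multiplier,
        "icon": encoders["icon"].transform([icon])[0],
    }])
    # Reorder columns to match the training layout before predicting.
    return float(model.predict(row[features.columns])[0])


print(predict_price("UberX", "Back Bay", 1.0, "clear"))
```

Reusing the fitted encoders matters: encoding a category with a fresh encoder at prediction time could assign it a different integer than the model saw during training. A category unseen in training makes `transform` raise a `ValueError`.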
