Bike Sharing Analysis

The document summarizes a student's analysis of a bike sharing dataset for a course on regression and time series analysis. It includes preprocessing steps like feature engineering, missing value analysis, outlier detection, and correlation analysis. Visualization techniques are used to understand patterns in bike rentals by month, season, hour, weekday, and user type. A random forest model is fit to the data and achieves an RMSLE score of 0.102804484141.

Uploaded by

Devansh Sharma

Name- Gaurav Gupta

Roll no- 16HS20013

Subject- Regression and Time series
Course code- MA20005

Bike Sharing Analysis

Workflow
● About Dataset
● Feature Engineering
● Missing Value Analysis
● Outlier Analysis
● Correlation Analysis
● Visualizing Distribution Of Data
● Visualizing Count Vs (Month, Season, Hour, Weekday, Usertype)
● Filling 0s In Windspeed Using Random Forest
● Random Forest Model

About Dataset

Bike sharing systems are a means of renting bicycles in which obtaining membership,
renting a bike and returning it are automated via a network of kiosk locations
throughout a city. Using these systems, people can rent a bike at one location
and return it at a different one on an as-needed basis. Currently, there are over 500
bike-sharing programs around the world.

Feature Engineering
In the given dataset, the columns "season", "holiday", "workingday" and "weather" should
be of the "categorical" data type, but they are currently stored as "int". We transform the
dataset in the following ways before starting the exploratory data analysis (EDA).
● Create new columns "date", "hour", "weekDay" and "month" from the "datetime" column.
● Coerce the data type of "season", "holiday", "workingday" and "weather" to category.
● Drop the "datetime" column, since the useful features have already been extracted from it.
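The three steps above can be sketched in plain Python. This is a minimal, dependency-free sketch rather than the report's own code (which is not shown); it assumes each record is a dict and that "datetime" is a string such as "2011-01-01 05:00:00". The helper name `engineer_features` is hypothetical, and because the report does not give the meaning of the integer codes, "coercing to category" is modelled simply as turning each code into a string label. With pandas, the same coercion is usually written as `df[col].astype("category")`.

```python
from datetime import datetime

# Columns the report says should be categorical rather than int.
CATEGORICAL_COLUMNS = ("season", "holiday", "workingday", "weather")


def engineer_features(row: dict) -> dict:
    """Return a new row with date/hour/weekDay/month derived from "datetime".

    Assumes (hypothetically) that "datetime" looks like "2011-01-01 05:00:00".
    """
    stamp = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S")
    # Copy every column except "datetime", which is dropped after extraction.
    out = {k: v for k, v in row.items() if k != "datetime"}
    out["date"] = stamp.date().isoformat()
    out["hour"] = stamp.hour
    out["weekDay"] = stamp.strftime("%A")  # full weekday name, e.g. "Saturday"
    out["month"] = stamp.strftime("%B")    # full month name, e.g. "January"
    # Treat the integer codes as category labels (strings), not numbers.
    for col in CATEGORICAL_COLUMNS:
        out[col] = str(out[col])
    return out
```

Applying `engineer_features` to every record reproduces the three bullet points in one pass; the day and month names assume the default C locale.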

Missing Value Analysis

A missing value analysis found no missing values in the given dataset.
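A missing value check amounts to counting empty entries per column. The sketch below assumes gaps would appear as `None`, an empty string or a float NaN (or as an absent key); `count_missing` and `is_missing` are hypothetical helpers, and the report does not say which tool it used.

```python
import math


def is_missing(value) -> bool:
    """Treat None, empty strings and float NaN as missing (an assumption)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def count_missing(rows: list[dict], columns: list[str]) -> dict[str, int]:
    """Count missing entries per column; an absent key also counts as missing."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if is_missing(row.get(col)):
                counts[col] += 1
    return counts
```

A result where every count is 0 corresponds to the "no missing values" finding above.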

Outlier Analysis
At first glance, the "count" variable contains a lot of outlier data points, which skew the
distribution to the right (there are more data points beyond the outer quartile limit). In
addition, the following inferences can be made from the simple boxplots given below.

● Spring has a relatively lower count; the dip in the median value in the
boxplot gives evidence for this.
● The boxplot of "Hour Of The Day" is quite interesting: the median values
are relatively higher at 7AM - 8AM and 5PM - 6PM. This can be attributed to
regular school and office users at those times.
● Most of the outlier points come from "Working Day" rather than
"Non Working Day", as is clearly visible in figure 4.
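The "outer quartile limit" of a boxplot is normally the whisker fence at Q1 - 1.5·IQR and Q3 + 1.5·IQR (Tukey's rule, the default whisker length in matplotlib and seaborn boxplots). The sketch below flags points outside those fences; the 1.5 multiplier, the "inclusive" quartile interpolation and the helper names are assumptions, since the report does not state exactly how its outliers were identified.

```python
from statistics import quantiles


def iqr_fences(values: list[float], whisker: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) boxplot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def split_outliers(values: list[float], whisker: float = 1.5):
    """Split values into (inliers, outliers) using the fences above."""
    low, high = iqr_fences(values, whisker)
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers
```

For the toy list `[1, 2, ..., 9, 100]` the fences are -3.5 and 14.5, so only 100 is flagged; a long upper tail of such points is what pulls "count" to the right.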

Correlation Analysis
We plot the correlation between "count" and ["temp", "atemp", "humidity", "windspeed"].

● The "temp" and "humidity" features have positive and negative correlations with
"count", respectively. Although these correlations are not very strong, "count"
still has some dependence on "temp" and "humidity".
● "windspeed" is unlikely to be a useful numerical feature, as its correlation
value with "count" shows.
● "atemp" is not taken into account, since "atemp" and "temp" are strongly
correlated with each other. During model building, one of the two variables
has to be dropped, since together they would introduce multicollinearity.
● "casual" and "registered" are also not taken into account, since they are
leakage variables (their sum is exactly "count") and need to be dropped during
model building.
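The correlation values discussed above are Pearson coefficients. A minimal standard-library sketch of each feature's correlation with "count" follows; the column names come from the report, while `pearson` and `correlations_with` are hypothetical helpers. (`statistics.correlation` also exists from Python 3.10; the formula is written out here to make it explicit.)

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation r = cov(x, y) / (std(x) * std(y)).

    Raises ZeroDivisionError if either column is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlations_with(target: str, features: list[str],
                      columns: dict[str, list[float]]) -> dict[str, float]:
    """Correlation of each feature column with the target column."""
    return {f: pearson(columns[f], columns[target]) for f in features}
```

The same `pearson` call on "temp" and "atemp" would show the strong mutual correlation that motivates dropping one of them.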

Visualizing Distribution Of Data

As the figures below show, the "count" variable is skewed to the right. We apply a log
transformation to the "count" variable after removing outlier data points. After the
transformation the data looks much better (its skewness is reduced).
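The skewness reduction described above can be checked numerically. This sketch computes the (biased) Fisher-Pearson sample skewness before and after a log transform; using `log1p` (log(1 + x)) rather than a plain log is an assumption that keeps zero counts finite, and the helper names are hypothetical.

```python
import math


def skewness(values: list[float]) -> float:
    """Sample skewness g1 = m3 / m2**1.5 (positive means right-skewed).

    Raises ZeroDivisionError for constant data (m2 == 0).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(counts: list[float]) -> list[float]:
    """Apply log(1 + count) to every value."""
    return [math.log1p(c) for c in counts]
```

On a right-skewed list such as `[1, 2, 2, 3, 3, 3, 4, 5, 8, 20, 50]`, `skewness(log_transform(x))` comes out well below `skewness(x)`, which is the effect the figures illustrate.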

Visualizing Count Vs (Month, Season, Hour, Weekday, Usertype)

● People tend to rent bikes during summer, since the weather is most conducive to
cycling in that season. Therefore June, July and August have relatively higher
demand for bicycles.
● On weekdays, more people tend to rent bicycles around 7AM-8AM and 5PM-6PM
(regular office and school hours).
● This pattern is not observed on Saturday and Sunday, when more people tend to
rent bicycles between 10AM and 4PM.
● The peak user count around 7AM-8AM and 5PM-6PM comes almost entirely from
registered users.
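The weekday/weekend commuting pattern above comes from averaging counts per hour within each group. A minimal sketch, assuming the rows already carry the engineered "hour" and "weekDay" fields from the Feature Engineering step and that "weekDay" holds English day names; `hourly_profile` and `peak_hour` are hypothetical helpers. Passing `value="registered"` or `value="casual"` gives the per-user-type breakdown.

```python
from collections import defaultdict
from statistics import mean

WEEKEND = {"Saturday", "Sunday"}


def hourly_profile(rows: list[dict], value: str = "count") -> dict[str, dict[int, float]]:
    """Mean of `value` per hour, split into "weekday" and "weekend" groups."""
    groups: dict[str, dict[int, list[float]]] = {
        "weekday": defaultdict(list),
        "weekend": defaultdict(list),
    }
    for row in rows:
        kind = "weekend" if row["weekDay"] in WEEKEND else "weekday"
        groups[kind][row["hour"]].append(row[value])
    # Average each hour's values, keeping hours in ascending order.
    return {kind: {h: mean(vs) for h, vs in sorted(by_hour.items())}
            for kind, by_hour in groups.items()}


def peak_hour(profile: dict[int, float]) -> int:
    """Hour with the highest mean value."""
    return max(profile, key=profile.get)
```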

Random Forest Model
The random forest model achieves an RMSLE (root mean squared logarithmic error) of 0.102804484141.
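RMSLE compares predictions and actual values on a log scale, so it penalises relative rather than absolute errors. The sketch below uses the common definition sqrt(mean((log(1 + p) - log(1 + a))^2)); the report does not spell out its formula, so treat this as an assumption, and `rmsle` is a hypothetical helper name.

```python
import math


def rmsle(predicted: list[float], actual: list[float]) -> float:
    """Root mean squared logarithmic error.

    sqrt(mean((log(1 + p) - log(1 + a))**2)); all inputs must be > -1.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((math.log1p(p) - math.log1p(a)) ** 2
                for p, a in zip(predicted, actual))
    return math.sqrt(total / len(predicted))
```

If the model is trained on the log-transformed "count", its predictions should be mapped back to the original scale (exp for log, expm1 for log1p) before being passed to `rmsle`, otherwise the score is computed on doubly logged values.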
