KaggleTaxiTrip

Latest commit 1a29d9d · Aug 28, 2017

Name	Last commit message	Last commit date
pic	Add picture for new e-mail	Aug 28, 2017
Exploring the dataset.ipynb	XGB/RF Predictions	Jul 26, 2017
README.md	edit READMEs	Jul 26, 2017

README.md

New York City Taxi Trip Duration

Kaggle playground to predict the total ride duration of taxi trips in New York City.

First part - Data exploration

The first part analyzes the dataframe and looks for correlations between variables.
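A minimal sketch of this kind of check, using NumPy rather than the notebook's own code. It assumes the features are already numeric and correlates each one with log1p(trip duration), since the competition score is a logarithmic error. The feature names (`passenger_count`, `distance_km`) and all values are hypothetical toy data.

```python
import numpy as np


def correlations_with_log_target(features, durations_s):
    """Pearson correlation of each feature with log1p(trip duration).

    features:    dict mapping a column name to a 1-D sequence of numbers
    durations_s: trip durations in seconds (same length as each feature)
    """
    # The score is RMSLE, so the log-transformed duration is the scale that matters.
    log_target = np.log1p(np.asarray(durations_s, dtype=float))
    result = {}
    for name, values in features.items():
        x = np.asarray(values, dtype=float)
        # np.corrcoef returns a 2x2 matrix; entry [0, 1] is r(x, log_target)
        result[name] = float(np.corrcoef(x, log_target)[0, 1])
    return result


# Toy values for illustration only, not taken from the Kaggle data
durations = [300, 600, 1200, 2400]
toy_features = {
    "passenger_count": [1, 2, 1, 3],
    "distance_km": [1.0, 2.1, 4.0, 8.2],
}
for name, r in correlations_with_log_target(toy_features, durations).items():
    print(f"{name}: {r:+.2f}")
```

With a real dataframe the same idea is usually a one-liner on the numeric columns; the dict version just keeps the sketch free of extra dependencies.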

Second part - Clustering

The goal of this playground is to predict the trip durations in the test set. Some neighborhoods are more congested than others, so I used K-Means to compute geographic clusters of pickup and drop-off locations.
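As an illustration of geo-clustering, here is a minimal Lloyd's-algorithm K-Means in NumPy. It is a sketch under stated assumptions, not the notebook's implementation (which may well use a library such as scikit-learn). The helper names `kmeans` and `assign`, the choice of `k`, and the coordinates are hypothetical. One set of centroids is fitted on both pickup and drop-off points, then each end of a trip gets its own cluster label.

```python
import numpy as np


def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each point."""
    points = np.asarray(points, dtype=float)
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    return d2.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's K-Means; returns a (k, 2) array of centroids."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        new_labels = assign(points, centroids)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centroids will not move again
        labels = new_labels
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids


# Toy coordinates (lat, lon): two points near Midtown, two near JFK
pickup = np.array([[40.75, -73.99], [40.76, -73.98], [40.64, -73.78], [40.65, -73.79]])
dropoff = np.array([[40.64, -73.78], [40.75, -73.98], [40.76, -73.99], [40.65, -73.78]])

# Fit one set of centroids on both ends of every trip, then label each end separately
centroids = kmeans(np.vstack([pickup, dropoff]), k=2)
pickup_cluster = assign(pickup, centroids)
dropoff_cluster = assign(dropoff, centroids)
print(pickup_cluster, dropoff_cluster)
```

Clustering raw latitude/longitude with Euclidean distance is an approximation: a degree of longitude is shorter than a degree of latitude at New York's latitude, which matters little for grouping neighborhoods.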

Third part - Cleaning and feature selection

I found some abnormally long trips, for example trips lasting about a day with a mean speed below 1 km/h.
I removed these outliers.
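A sketch of one possible outlier rule based on mean speed. The thresholds (1 km/h, 24 hours) and the choice to drop a trip when *either* condition holds are assumptions; the README does not state the exact criterion. `distance_km` is a hypothetical precomputed distance, and `trip_duration` is assumed to be in seconds.

```python
def is_plausible_trip(distance_km, duration_s, min_speed_kmh=1.0, max_duration_s=24 * 3600):
    """Hypothetical rule: reject trips lasting a day or more, or slower than min_speed_kmh."""
    if duration_s <= 0 or duration_s >= max_duration_s:
        return False
    mean_speed_kmh = distance_km / (duration_s / 3600.0)
    return mean_speed_kmh >= min_speed_kmh


def remove_outliers(trips, **thresholds):
    """Keep only trips (dicts with 'distance_km' and 'trip_duration' in seconds) that look plausible."""
    return [
        t for t in trips
        if is_plausible_trip(t["distance_km"], t["trip_duration"], **thresholds)
    ]


trips = [
    {"distance_km": 3.0, "trip_duration": 900},     # 12 km/h: kept
    {"distance_km": 0.5, "trip_duration": 3600},    # 0.5 km/h: dropped
    {"distance_km": 20.0, "trip_duration": 90000},  # lasts longer than a day: dropped
]
print(remove_outliers(trips))
```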

I also derived features from the available data: Haversine distance, Manhattan distance, cluster means, and a PCA rotation of the coordinates.
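A sketch of the distance and rotation features, vectorized with NumPy so it works on scalars or whole columns. The function names are hypothetical. "Manhattan distance" is assumed here to mean an east-west leg plus a north-south leg, each measured with the haversine formula; the PCA rotation is computed with an SVD of the centered coordinates, which gives the same principal axes as a PCA fit (possibly with a sign flip or reflection).

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, scalars or NumPy arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def manhattan_km(lat1, lon1, lat2, lon2):
    """East-west leg plus north-south leg, each measured with the haversine formula."""
    return haversine_km(lat1, lon1, lat1, lon2) + haversine_km(lat1, lon1, lat2, lon1)


def fit_pca_rotation(coords):
    """Mean and principal axes (rows of vt) of an (n, 2) array of (lat, lon)."""
    coords = np.asarray(coords, dtype=float)
    mean = coords.mean(axis=0)
    _, _, vt = np.linalg.svd(coords - mean, full_matrices=False)
    return mean, vt


def apply_pca_rotation(coords, mean, vt):
    """Center the coordinates and project them onto the principal axes (orthogonal change of basis)."""
    return (np.asarray(coords, dtype=float) - mean) @ vt.T


# Toy trip between two approximate Manhattan and JFK coordinates
print(haversine_km(40.758, -73.985, 40.641, -73.778))
print(manhattan_km(40.758, -73.985, 40.641, -73.778))

# Fit the rotation on the toy points, then rotate them
points = np.array([[40.758, -73.985], [40.641, -73.778], [40.730, -73.990], [40.780, -73.960]])
mean, vt = fit_pca_rotation(points)
print(apply_pca_rotation(points, mean, vt))
```

Because the rotation is orthogonal, it preserves distances between points; it only re-expresses the coordinates along axes that follow the dominant direction of the data, which tree models can split on more easily than raw latitude and longitude.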

Fourth part - Prediction

I compared Random Forest and XGBoost.
Current root mean squared logarithmic error (RMSLE): 0.391
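For reference, a small stdlib sketch of the RMSLE metric behind the score reported above, written as a plain function (the name `rmsle` is hypothetical). The durations in the example are toy values, not results from this project.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error: sqrt(mean((log(1 + p) - log(1 + t))^2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy durations in seconds
print(rmsle([600, 1200, 300], [540, 1500, 320]))
```

Since RMSLE on durations equals RMSE on log1p(durations), a common approach (an assumption here, not necessarily what the notebook does) is to train the models on log1p(trip_duration) and map predictions back with expm1.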

Feature importance for RF & XGBoost
