Coursera Capstone Project Final

The document summarizes a Coursera capstone project aimed at helping future entrepreneurs choose the best location to open small or medium businesses in New York City. The project uses data on New York City neighborhoods and venues from Foursquare to cluster neighborhoods based on common venue types. K-means clustering is applied to venue data from the top 10 most common venue categories. The results divide neighborhoods into clusters that are visualized on a map. Cluster 4 is identified as prime for restaurants, containing 9 neighborhoods in the Bronx.

Uploaded by

Yader Carrillo


Coursera Capstone Project: Applied Data Science

Yader Rafael Carrillo Jaime

[email protected]
National Autonomous University of Nicaragua, Managua

1) Introduction/Business Problem
Around the world, hundreds of people try every day to open small and medium businesses.
Whatever city they choose, they look for the best location in order to increase their
earnings. This project is intended to help future entrepreneurs choose the best location
for their businesses in New York City by providing data about neighborhood characteristics
and the common venues in each area.

To reach this goal, the results need to be presented in a particular structure. In this
case, we were asked to follow the typical data science methodology. I hope to do my best
throughout the project.

2) Data

To reach the goal of this project and provide information to stakeholders, I use New York
City neighborhood data and the Foursquare API to extract competitors in the same
neighborhoods. The New York data can be found at https://geo.nyu.edu/catalog/nyu_2451_34572

2.1 Neighborhoods
The neighborhood data for New York can be extracted from the JSON file found at
https://cocl.us/new_york_dataset.
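
As a rough illustration of this step, the sketch below flattens a GeoJSON-style file into one record per neighborhood. The field names (`features`, `properties`, `borough`, `name`, `geometry`, `coordinates`) are assumptions about the file layout, the helper `neighborhoods_from_geojson` is hypothetical, and the sample coordinates are placeholders rather than real data.

```python
import json

def neighborhoods_from_geojson(data):
    """Flatten a GeoJSON-style FeatureCollection into one record per neighborhood."""
    rows = []
    for feature in data["features"]:
        props = feature["properties"]
        # GeoJSON stores point coordinates as (longitude, latitude).
        lon, lat = feature["geometry"]["coordinates"]
        rows.append({
            "Borough": props["borough"],        # assumed field name
            "Neighborhood": props["name"],      # assumed field name
            "Latitude": lat,
            "Longitude": lon,
        })
    return rows

# Hand-made sample in the assumed layout; coordinates are placeholders.
sample = json.loads("""
{"features": [
  {"properties": {"borough": "Bronx", "name": "Morris Park"},
   "geometry": {"type": "Point", "coordinates": [-73.0, 40.0]}},
  {"properties": {"borough": "Queens", "name": "Astoria"},
   "geometry": {"type": "Point", "coordinates": [-73.5, 40.5]}}
]}
""")

# Keep only the borough of interest.
bronx = [row for row in neighborhoods_from_geojson(sample) if row["Borough"] == "Bronx"]
print(bronx)
```

With the real file, the same function could be applied to the result of `json.load` on the downloaded dataset, provided the layout matches these assumptions.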

2.2 Geopy library

I used this library to get the latitude and longitude of the Bronx.

2.3 Venue Data

From the location data obtained previously, the venue data is retrieved by passing the
required parameters to the Foursquare API and creating another DataFrame that contains all
the venue details along with their respective neighborhoods.
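
The report does not say which endpoint or parameter values were used, so the following is only a sketch of how such a request could be assembled. It assumes the legacy Foursquare v2 `venues/explore` endpoint; the helper `build_explore_url`, the credentials, and the default `version`, `radius`, and `limit` values are all hypothetical. The URL is built but not sent.

```python
from urllib.parse import urlencode

# Assumed endpoint: the legacy Foursquare v2 "explore" API.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build the query URL for venues around one neighborhood's coordinates."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date (hypothetical value)
        "ll": f"{lat},{lng}",            # latitude,longitude of the neighborhood
        "radius": radius,                # search radius in meters (hypothetical)
        "limit": limit,                  # maximum venues returned (hypothetical)
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

url = build_explore_url(40.0, -73.0, "MY_ID", "MY_SECRET")
print(url)
```

Calling this once per neighborhood and collecting the venue name and category from each response would yield the rows of the venue DataFrame described above.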

3) Methodology
3.1 Folium

Folium builds on the data wrangling strengths of the Python ecosystem and the mapping
strengths of the Leaflet.js library. All cluster visualization is done with the help of
Folium, which in turn generates a Leaflet map based on OpenStreetMap.

3.2 One hot encoding

One-hot encoding is a process by which categorical variables are converted into a numerical
form that ML algorithms can work with. For the K-means clustering algorithm, all unique
values under Venue Category are one-hot encoded.
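
A minimal pure-Python sketch of the idea follows; in practice a library routine would normally do this. The venue rows are made up for illustration, and averaging the encoded rows per neighborhood is an assumption about how the encoded data is prepared for clustering, not something the report states.

```python
from collections import defaultdict

# Made-up (neighborhood, venue category) rows for illustration only.
venues = [
    ("Morris Park", "Pizza Place"),
    ("Morris Park", "Italian Restaurant"),
    ("Morris Park", "Pizza Place"),
    ("Van Nest", "Deli / Bodega"),
    ("Van Nest", "Pizza Place"),
]

# One column per unique category, in a fixed (sorted) order.
categories = sorted({category for _, category in venues})

def one_hot(category):
    """Return a 0/1 vector with a single 1 in the column of `category`."""
    return [1 if category == c else 0 for c in categories]

encoded = [(hood, one_hot(category)) for hood, category in venues]

# Assumed follow-up step: average the vectors per neighborhood so each
# neighborhood becomes one row of category frequencies.
sums = defaultdict(lambda: [0] * len(categories))
counts = defaultdict(int)
for hood, vector in encoded:
    sums[hood] = [s + v for s, v in zip(sums[hood], vector)]
    counts[hood] += 1

frequencies = {hood: [s / counts[hood] for s in sums[hood]] for hood in sums}
print(categories)
print(frequencies)
```

Each neighborhood ends up as a fixed-length numeric vector, which is the form a distance-based algorithm such as K-means needs.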

3.3 Top 10 most common venues

Because the venues vary widely, only the top 10 most common venues are selected, and a new
DataFrame is built from them to train the K-means clustering algorithm.
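
The selection could look like the sketch below, which ranks categories by frequency and keeps the first `n`. It assumes the ranking is done per neighborhood; the helper `top_venues` and the frequency values are hypothetical, and `n=3` is used only to keep the example short.

```python
def top_venues(freq_by_category, n=10):
    """Return the n most frequent categories, ties broken alphabetically."""
    ranked = sorted(freq_by_category.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]

# Made-up category frequencies per neighborhood, for illustration only.
frequencies = {
    "Morris Park": {"Pizza Place": 0.4, "Italian Restaurant": 0.3,
                    "Bakery": 0.2, "Bank": 0.1},
    "Van Nest": {"Deli / Bodega": 0.5, "Pizza Place": 0.25,
                 "Bus Station": 0.25},
}

top = {hood: top_venues(freqs, n=3) for hood, freqs in frequencies.items()}
print(top)
```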
3.4 K-means clustering

The K-means clustering algorithm is then fitted to the venue data to obtain the clusters on
which the analysis is based. K-means was chosen because the number of variables (venue
categories) is large, and in such situations K-means is generally computationally faster
than many other clustering algorithms.
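
To make the clustering step concrete, here is a small pure-Python sketch of Lloyd's algorithm, the standard K-means iteration. It is not the implementation used in the project, which is not specified and most likely relied on a library. The function `kmeans`, the plain random initialization, the seed, and the toy frequency vectors are all assumptions for illustration.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # plain random initialization
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy frequency vectors: two neighborhoods dominated by the first category,
# two dominated by the second.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Neighborhoods with similar venue profiles receive the same label, and that label is what is later used to color them on the map.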

4) Results

The neighborhoods are divided into n clusters, where n is the number of clusters found
using the optimal approach. The clustered neighborhoods are visualized using different
colors so as to make them distinguishable.
6) Discussion

After analyzing the clusters produced by the machine learning algorithm, cluster no. 4 is a
prime fit for the problem of finding a cluster in which restaurants are a common venue.
Nine neighborhoods (Pelham Parkway, Morris Park, Van Nest, Throgs Neck, Belmont, North
Riverdale, Pelham Bay, Edgewater Park, and Bronxdale) are the best places to set up the
venture in the Bronx, New York City.
