INTRO TO DATA ANALYTICS
Our Goal
By the end of this workshop, our goal is to build a simple mathematical model.
A model may help to explain the effects of different components and, depending on the type of model, can help you make predictions.
General Assembly: Intro to Data Analytics
TOOLS WE’LL BE USING
Google Sheets
For this workshop, we’ll be using Google Sheets
● It’s free
● No installation required
● Works with most browsers
● All you need is a Gmail account
Browser
We recommend using Google Chrome.
It's free and is the most compatible with Google Sheets.
QUICK PRIMER: STATISTICS
Statistics
The mathematical science pertaining to the collection, analysis, interpretation and presentation of data.
● Descriptive: summarized information about a collection of data, also called a dataset
○ Ex. United States: population 323 million, median age 37.6
● Inferential: models that let us draw conclusions about a population using sample data
○ Ex. Election polls
● Predictive: models that let us anticipate how members of a population are likely to behave
○ Ex. Weather forecast, recommendation engine
Descriptive
Information that is most indicative of a large set of data → AVERAGE
Data (13 observations): 20, 25, 32, 40, 40, 23, 33, 50, 21, 40, 29, 37, 40
MEAN = 33.08
Sum: 20 + 25 + 32 + 40 + 40 + 23 + 33 + 50 + 21 + 40 + 29 + 37 + 40 = 430
Total observations: 13
Mean = 430 / 13 ≈ 33.08
MODE = 40
Most repeated observation: 40 (appears 4 times)
MEDIAN = 33
Sorted: 20 21 23 25 29 32 33 37 40 40 40 40 50
The median is the middle value: the 7th of the 13 sorted observations.
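As a quick check of the arithmetic above, here is a minimal Python sketch using the standard-library `statistics` module on the same 13 observations. (In Google Sheets, the built-in `AVERAGE`, `MEDIAN` and `MODE` functions return the same values.)

```python
import statistics

# The 13 observations from the example above
observations = [20, 25, 32, 40, 40, 23, 33, 50, 21, 40, 29, 37, 40]

mean = statistics.mean(observations)      # sum / count = 430 / 13
median = statistics.median(observations)  # middle value once the data is sorted
mode = statistics.mode(observations)      # most frequently repeated value

print(f"Mean:   {mean:.2f}")   # Mean:   33.08
print(f"Median: {median}")     # Median: 33
print(f"Mode:   {mode}")       # Mode:   40
```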
Correlation
Co- (together) + relation: the relationship between two sets of data
Positive: both values increase together
Negative: when one value increases, the other decreases
Example of a positive correlation:
HEIGHT    SHOE SIZE
109       3
121       5
138       6
155       8
160       9
180       12
191       14
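To put a number on a relationship like the one in the height and shoe size table, analysts often use the Pearson correlation coefficient, which runs from -1 (perfect negative) to +1 (perfect positive). The sketch below computes it by hand in Python for the seven pairs in the table; the slide does not say which correlation measure it has in mind, so Pearson is an assumption here. In Google Sheets, `=CORREL(range1, range2)` computes the same coefficient.

```python
import math

# Height and shoe size pairs from the table above
heights = [109, 121, 138, 155, 160, 180, 191]
shoe_sizes = [3, 5, 6, 8, 9, 12, 14]


def pearson(xs, ys):
    """Pearson correlation: +1 = perfect positive, -1 = perfect negative, 0 = none."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Do the two values sit above/below their means at the same time?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


r = pearson(heights, shoe_sizes)
print(f"r = {r:.2f}")  # close to +1: a strong positive correlation
```

Reversing one list (so taller heights pair with smaller shoe sizes) would flip the sign and show a negative correlation.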
DATA ANALYTICS
Data
Information that exists in a variety of formats and sizes
Name: Gus the Cat
Age: 4
Parents: Adam Webb
City: Toronto
Phone: 416-555-MEOW
Twitter: @guscat13
Hobby: Nap, Nip and Netflix
Data Analytics
Process of examining data to draw conclusions about that information.
Example: "If you like this, you will also like this." (@GusCat13)
DATA ANALYTICS PROCESS
Identify the Problem → Obtain the Data → Explore the Data → Prepare the Data → Analyze the Data → Present the Data
Our hands-on exercise will focus
on some essential skills here:
Identify the Problem → Obtain the Data → Explore the Data → Prepare the Data → Analyze the Data → Present the Data
Identify the Problem
Before you begin working with any data, you
must understand the what and the why of
the problem that you’re trying to solve.
Identify the Problem
Netflix wants to make recommendations for
what its users should watch after completing
a show.
Obtain the Data
To work with the data, you first have to find
it or collect it – and it has to be the right
data to help you answer the question.
Obtain the Data
Examples of how the data can be collected:
● Request data: from Data Engineers or the IT Department
● Pull data: from Reports, Databases or APIs
Obtain the Data
Someone from Netflix IT provides you with the data in an Excel file.
But how did Netflix collect this information?
Obtain the Data
● Logging start & stop
● "Still watching?" prompts
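As a rough illustration of how logging start & stop events could turn into viewing time, the Python sketch below subtracts each session's start timestamp from its stop timestamp. The record layout, field names and timestamps are hypothetical; the slides do not describe Netflix's actual logging format.

```python
from datetime import datetime

# Hypothetical session log: each session has a start and a stop event.
# Field names and values are illustrative, not Netflix's real schema.
sessions = [
    {"subscriber_id": 1, "show": "Show A", "start": "2023-01-05 20:00", "stop": "2023-01-05 20:47"},
    {"subscriber_id": 1, "show": "Show B", "start": "2023-01-05 21:00", "stop": "2023-01-05 21:30"},
]

TIME_FORMAT = "%Y-%m-%d %H:%M"

for session in sessions:
    start = datetime.strptime(session["start"], TIME_FORMAT)
    stop = datetime.strptime(session["stop"], TIME_FORMAT)
    # Time between the start and stop events = minutes spent watching
    session["minutes_watched"] = (stop - start).total_seconds() / 60
    print(session["subscriber_id"], session["show"], session["minutes_watched"])
```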
Explore the Data
Ensure you can correctly interpret the
results and trust the data.
Prepare the Data
● Make sure the data doesn’t contain
incorrect or missing values
● Perform any transformation to the data so
it is in a format that can be easily analyzed
○ Ex. Adding new columns or pivoting
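A minimal Python sketch of the first check, assuming hypothetical rows that use the viewing-data columns that appear later in this workshop (subscriber ID, show, episodes watched, episode length in minutes). Here a row counts as incorrect if a field is missing or a number is negative; those rules are illustrative, and in practice you may choose to fix a value rather than drop the row.

```python
# Hypothetical raw rows (illustrative values only)
raw_rows = [
    {"subscriber_id": 1, "show": "Show A", "episodes_watched": 3, "episode_length_min": 45},
    {"subscriber_id": 2, "show": "Show B", "episodes_watched": None, "episode_length_min": 30},
    {"subscriber_id": 3, "show": "Show A", "episodes_watched": -2, "episode_length_min": 45},
    {"subscriber_id": 3, "show": "", "episodes_watched": 1, "episode_length_min": 22},
]


def is_valid(row):
    """Keep a row only if every field is present and the numbers make sense."""
    if not row["show"]:  # missing show name
        return False
    for field in ("episodes_watched", "episode_length_min"):
        value = row[field]
        if value is None or value < 0:  # missing or impossible value
            return False
    return True


clean_rows = [row for row in raw_rows if is_valid(row)]
print(f"Kept {len(clean_rows)} of {len(raw_rows)} rows")
```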
Prepare the Data
● Add a column to calculate 'Total Minutes' (episodes watched × episode length)
● Group information by subscriber with total
minutes for each show
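The two preparation steps above can be sketched in Python. The rows are hypothetical, and 'Total Minutes' is assumed to be episodes watched × episode length in minutes. The grouping step does what a pivot table (rows = subscriber, columns = show, values = SUM of Total Minutes) does in Google Sheets.

```python
from collections import defaultdict

# Hypothetical cleaned viewing rows (illustrative values only)
rows = [
    {"subscriber_id": 1, "show": "Show A", "episodes_watched": 3, "episode_length_min": 45},
    {"subscriber_id": 1, "show": "Show B", "episodes_watched": 2, "episode_length_min": 30},
    {"subscriber_id": 2, "show": "Show A", "episodes_watched": 1, "episode_length_min": 45},
    {"subscriber_id": 1, "show": "Show A", "episodes_watched": 2, "episode_length_min": 45},
]

# Step 1: new 'Total Minutes' column = episodes watched x episode length
for row in rows:
    row["total_minutes"] = row["episodes_watched"] * row["episode_length_min"]

# Step 2: group by subscriber, summing total minutes for each show
minutes_by_subscriber = defaultdict(lambda: defaultdict(int))
for row in rows:
    minutes_by_subscriber[row["subscriber_id"]][row["show"]] += row["total_minutes"]

for subscriber, shows in sorted(minutes_by_subscriber.items()):
    print(subscriber, dict(shows))
```

Subscriber 1's two 'Show A' rows collapse into a single total, which is the shape the analysis step needs.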
Analyze the Data
Work with the clean and transformed data to
answer questions:
● Descriptive Stats
● Modelling and Algorithms
● Predictive Stats
Analyze the Data
Descriptive statistics questions:
● What are the most watched shows?
● What shows have the highest average viewing
time?
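A minimal Python sketch answering both questions, assuming a hypothetical table of total minutes per subscriber per show (the kind of grouped data produced in Prepare the Data). "Most watched" is taken to mean the most total minutes, and "average viewing time" the mean minutes among subscribers who watched the show; other definitions are possible.

```python
# Hypothetical grouped data: total minutes per subscriber per show
minutes_by_subscriber = {
    1: {"Show A": 225, "Show B": 60},
    2: {"Show A": 45, "Show C": 300},
    3: {"Show B": 90, "Show C": 120},
    4: {"Show A": 180},
}

# Collect every subscriber's minutes for each show
minutes_by_show = {}
for shows in minutes_by_subscriber.values():
    for show, minutes in shows.items():
        minutes_by_show.setdefault(show, []).append(minutes)

# Most watched: total minutes across all subscribers
total = {show: sum(m) for show, m in minutes_by_show.items()}
# Highest average viewing time: mean minutes per subscriber who watched the show
average = {show: sum(m) / len(m) for show, m in minutes_by_show.items()}

print("Most watched:", max(total, key=total.get))
print("Highest average:", max(average, key=average.get))
```

With this sample data the two answers differ: Show A has the most total minutes, while Show C, watched by fewer subscribers, has the highest average.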
Analyze the Data
Problem: What shows should we recommend
viewers watch next?
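Calculating the correlation between shows tells us how closely viewing time for 'show A' moves with viewing time for 'show B' across subscribers: a strong positive correlation suggests that people who watch one also tend to watch the other.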
Analyze the Data
A Correlation Matrix allows us to understand
relationships between multiple variables: in this
case, between time spent watching shows.
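The sketch below shows one way to build such a matrix in Python and turn it into a simple "watch next" rule. The viewing minutes are hypothetical, unwatched shows count as 0 minutes, Pearson correlation is assumed, and the rule of recommending only a positively correlated show is an illustrative choice, not Netflix's method.

```python
import math

# Hypothetical pivot: minutes each subscriber spent on each show (0 = not watched)
shows = ["Show A", "Show B", "Show C"]
minutes = {  # subscriber ID -> [Show A, Show B, Show C] minutes
    1: [225, 200, 0],
    2: [45, 30, 300],
    3: [0, 10, 120],
    4: [180, 150, 20],
    5: [90, 100, 60],
}


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# One column of viewing minutes per show, across all subscribers
columns = {show: [row[i] for row in minutes.values()] for i, show in enumerate(shows)}

# Correlation matrix: every show compared with every other show
matrix = {a: {b: pearson(columns[a], columns[b]) for b in shows} for a in shows}

for a in shows:
    print(a, " ".join(f"{matrix[a][b]:+.2f}" for b in shows))


def recommend_after(show):
    """Suggest the other show whose viewing time correlates most strongly with `show`."""
    others = [s for s in shows if s != show]
    best = max(others, key=lambda s: matrix[show][s])
    # Assumption: only recommend when viewing time actually moves together
    return best if matrix[show][best] > 0 else None


print("After Show A, recommend:", recommend_after("Show A"))
```

With this sample data, Show A and Show B move together, so each is suggested after the other; Show C correlates negatively with both, so no recommendation is made for it.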
Analyze the Data
Descriptive Statistics:
Analyze the Data
Correlation Matrix:
Present the Data
Determine the best way to share your results with
others.
● Charts
● Infographics
● Dashboards
Getting Started
1. Type this URL into your browser: ga.co/2iddFCZ
2. In the top left corner click: File > Make a copy
3. Select any folder you’ll easily find for the
duration of the workshop
CONGRATS! YOU HAVE BUILT YOUR FIRST MODEL!
Business Implications
SUBSCRIBER ID | SHOW | EPISODES WATCHED | EPISODE LENGTH (MINUTES)
PROFILE: NAME, EMAIL, LOCATION, …
Intuitively…
Based on our model…
Next Steps
1. What else might you want to consider when
making recommendations?
2. How else could Netflix use the insights on
correlation between watching shows?
Data Analytics
We’ve walked through the analytics process
today in one use case, but businesses use data
for many reasons including…
1. Managing inventory and pricing
2. Forecasting demand
3. Understanding success and failure