Big Data Analytics
2022
Outline
➔ Data science: an interdisciplinary field of study that uses data to extract knowledge and insight.
- Gain customer insights
- Predict market trends
- Make a better product
- Manage the business efficiently
Data Engineer: works in a variety of settings to build systems that collect, manage, and convert raw data into usable information for data scientists and data analysts to interpret.
Data Analyst: works closely with business stakeholders, creating reports and dashboards or generating insights that support them in decision making.
Data Scientist: needs advanced skills and techniques, such as designing data modeling processes and creating algorithms and predictive models, to embed insights into the business.
Data analytics is simply a branch under the wider concept of data science.
Data analytics involves an inquiry into a hypothesis with the primary objective of uncovering
insights that would support and grow a business in a particular area.
Skills:
- SQL
- Microsoft Excel
- R or Python Programming
- Data Visualization
- Presentation Skills
- Critical Thinking
- Machine Learning
Level of Analytics
Analytics Process
1. Business Understanding
2. KPI Metrics: What indicators will you measure?
3. Initiatives: What actions do you need to take?
4. Data Science Solution: Which data science solutions do you need to implement?
Business Understanding
➔ Dig beneath the surface to uncover the structure of the business problem and the data that are available, and then match them to one or more data mining tasks for which we may have substantial science and technology to apply.
Data Preparation
➔ How to cast the business problem as one or more data science problems. Framing a business problem in terms of expected value allows us to systematically decompose it into data mining tasks.
Level of Measurements
Data is either categorical (values that can be grouped) or numerical (values that are measured).
Data Types
Database Relationship
Transaction (Fact Table): Transaction ID, Customer ID, Product ID, Time ID, Location ID, Price, Discount, Qty Sold, Sales
Product: Product ID, Product Name, Product Category
Customer: Customer ID, Customer Name, Customer LTV
Time: Time ID, Day, Month, Year
Location: Location ID, City, Province
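To make the relationship concrete, here is a minimal sketch that uses plain Python dictionaries as stand-in tables. Only the column names come from the schema above; every row value (product name, city, the "T001" time key) is hypothetical. Each foreign key in the Transaction fact table is looked up in its dimension table, which is what a SQL join on those keys does.

```python
# Hypothetical dimension tables keyed by their ID column.
products = {"TA00000001": {"Product Name": "Rice", "Product Category": "Food"}}
customers = {"CS00000001": {"Customer Name": "Alice", "Customer LTV": 500_000}}
times = {"T001": {"Day": 1, "Month": 9, "Year": 2021}}
locations = {"JKT": {"City": "Jakarta", "Province": "DKI Jakarta"}}

# Hypothetical fact-table row; it stores IDs plus the numeric measures.
transactions = [
    {"Transaction ID": "TRX0000001", "Customer ID": "CS00000001",
     "Product ID": "TA00000001", "Time ID": "T001", "Location ID": "JKT",
     "Price": 10_000, "Discount": 0, "Qty Sold": 10, "Sales": 100_000},
]

def denormalize(fact_rows):
    """Join each fact row to its dimension rows via the foreign-key IDs."""
    joined = []
    for row in fact_rows:
        record = dict(row)
        record.update(products[row["Product ID"]])
        record.update(customers[row["Customer ID"]])
        record.update(times[row["Time ID"]])
        record.update(locations[row["Location ID"]])
        joined.append(record)
    return joined

wide = denormalize(transactions)
print(wide[0]["Product Name"], wide[0]["City"], wide[0]["Sales"])  # Rice Jakarta 100000
```

Keeping descriptive attributes in dimension tables means a product or city name is stored once, while the fact table grows by one compact row per transaction.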
Describe Data
Column | Description | Data Type | Completeness | Distribution | Example
Transaction ID | Identity number for the transaction | Char(10) | 100% | 200,000 unique values | TRX0000001
Customer ID | Identity number for the customer | Char(10) | 80% | 10,000 unique values | CS00000001
Product ID | Identity number for the product | Char(10) | 95% | 500 unique values | TA00000001
Location ID | Identity number for the transaction location | Char(3) | 70% | 100 unique values | JKT
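The Completeness and Distribution columns above can be computed per column. This is a minimal sketch, assuming a column arrives as a Python list with None for missing values; the sample IDs are hypothetical. "max_length" is a rough check against a declared Char(n) type.

```python
def profile(values):
    """Completeness (% non-null), distinct count, and max length of a text column."""
    non_null = [v for v in values if v is not None]
    return {
        "completeness_pct": 100 * len(non_null) / len(values) if values else 0.0,
        "unique_values": len(set(non_null)),
        "max_length": max((len(v) for v in non_null), default=0),
    }

# Hypothetical Customer ID column: 5 rows, 1 missing, 3 distinct IDs.
customer_ids = ["CS00000001", "CS00000002", None, "CS00000001", "CS00000003"]
print(profile(customer_ids))
# {'completeness_pct': 80.0, 'unique_values': 3, 'max_length': 10}
```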
Data Preprocessing
➔ The data are manipulated and converted into forms that yield better results.
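Handle Missing Values
Example data with a missing value:
1 | 10,000,000 | 100,000
2 | NULL | 200,000
Drop Row: drop row 2, which leaves only
1 | 10,000,000 | 100,000
Fill Missing Values: fill with the average, median, or mode of the column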
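Both options can be sketched on the two example rows, assuming each row is a tuple (row number, first column, second column) with None standing in for NULL. The helper names are hypothetical; the statistic is passed in so the same function covers average, median, or mode.

```python
import statistics

# Example rows from above as (row, first column, second column); None marks NULL.
rows = [(1, 10_000_000, 100_000), (2, None, 200_000)]

def drop_missing(rows):
    """Option 1: drop every row that contains a missing value."""
    return [r for r in rows if None not in r]

def fill_missing(rows, col, stat=statistics.mean):
    """Option 2: replace missing values in `col` with a statistic of the observed
    values; pass statistics.mean, statistics.median, or statistics.mode as `stat`."""
    observed = [r[col] for r in rows if r[col] is not None]
    value = stat(observed)
    return [r[:col] + (value,) + r[col + 1:] if r[col] is None else r for r in rows]

print(drop_missing(rows))                                 # [(1, 10000000, 100000)]
print(fill_missing(rows, col=1, stat=statistics.median))  # [(1, 10000000, 100000), (2, 10000000, 200000)]
```

Dropping is simplest but loses the row's other values (here the 200,000); filling keeps the row at the cost of an estimated value. The median is less sensitive to outliers than the average.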
Handle Outliers
➔ Outliers are values at the extreme ends of a dataset.
Steps:
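One widely used way to flag outliers is Tukey's interquartile range (IQR) rule: values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are treated as outliers. The sketch below assumes that rule (not necessarily the exact steps intended here) and uses hypothetical sales values; the 1.5 multiplier is the conventional default and is exposed as `k`.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Tukey's rule: values outside [Q1 - k*IQR, Q3 + k*IQR] are outliers."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(values, k=1.5):
    """Return (inliers, outliers) according to the IQR bounds."""
    low, high = iqr_bounds(values, k)
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers

sales = [100, 120, 110, 130, 125, 115, 105, 5_000]  # hypothetical values
inliers, outliers = split_outliers(sales)
print(outliers)  # [5000]
```

After flagging, an outlier can be removed, capped at the bound, or kept and investigated, depending on whether it is an error or a genuine extreme value.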
Data Transformation
1. Categorical Data
- Convert to numeric
ID | Gender | Education | Male | S1 | SMA | SMP
1 | Female | SMP | 0 | 0 | 0 | 1
2 | Male | SMA | 1 | 0 | 1 | 0
3 | Male | S1 | 1 | 1 | 0 | 0
4 | Female | S1 | 0 | 1 | 0 | 0
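The encoding in the table above can be reproduced in a few lines: gender becomes a single 0/1 column (Male = 1) and education becomes one 0/1 column per level (one-hot encoding). The column order S1, SMA, SMP is inferred from the table's 0/1 patterns; the function name is hypothetical.

```python
def encode(rows, levels):
    """Binary-encode gender (Male = 1) and one-hot encode education levels."""
    encoded = []
    for gender, education in rows:
        male = 1 if gender == "Male" else 0
        onehot = [1 if education == level else 0 for level in levels]
        encoded.append([male] + onehot)
    return encoded

levels = ["S1", "SMA", "SMP"]  # column order inferred from the table
data = [("Female", "SMP"), ("Male", "SMA"), ("Male", "S1"), ("Female", "S1")]
for row in encode(data, levels):
    print(row)
# [0, 0, 0, 1]
# [1, 0, 1, 0]
# [1, 1, 0, 0]
# [0, 1, 0, 0]
```

Because SMP, SMA, and S1 are ordered education levels, an ordinal code (for example SMP = 0, SMA = 1, S1 = 2) is an alternative that keeps the ordering in a single column.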
Data Transformation
2. Numerical Data
- Feature Scaling
- Convert to Group
- Transformation
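The three numerical techniques listed above can be sketched as follows. This assumes min-max scaling and z-score standardization for feature scaling, fixed bin edges for converting to groups, and log(1 + x) as the transformation; the ages, bin edges, and group labels are hypothetical.

```python
import math
from statistics import mean, pstdev

def min_max_scale(xs):
    """Feature scaling to [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Feature scaling to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

def to_group(x, edges, labels):
    """Convert to group: first label whose upper edge x does not exceed."""
    for edge, label in zip(edges, labels):
        if x <= edge:
            return label
    return labels[-1]

def log_transform(xs):
    """Compress right-skewed values; log1p(x) = log(1 + x) also handles zeros."""
    return [math.log1p(x) for x in xs]

ages = [18, 25, 40, 60]  # hypothetical values
print(min_max_scale(ages))
print([to_group(a, [24, 44], ["young", "adult", "senior"]) for a in ages])
# ['young', 'adult', 'adult', 'senior']
```

Min-max scaling keeps values in [0, 1] but is sensitive to outliers, since one extreme value sets the range; standardization is usually preferred for distance-based or gradient-based models.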
Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn.
Supervised Learning: Classification, Regression
Unsupervised Learning: Clustering, Association Rules
Supervised Learning
Supervised learning is defined by its use of labeled datasets to train algorithms to make predictions.
Classification algorithms: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, Naive Bayes, Nearest Neighbour, Neural Network
Regression algorithms: Linear Regression, Decision Tree, Random Forest, Support Vector Regressor, Neural Network
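Of the classification algorithms listed, nearest neighbour is the shortest to sketch: a new point gets the majority label of the k labeled points closest to it. This is a minimal illustration with hypothetical 2-D data and Euclidean distance; `k=3` is an arbitrary choice.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled data: (features, label)
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]
print(knn_predict(train, (1.1, 1.0)))  # A
print(knn_predict(train, (5.1, 5.0)))  # B
```

The labels are what make this supervised: the model simply memorizes labeled examples, so feature scaling matters because every feature contributes directly to the distance.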
Unsupervised Learning
Unsupervised learning uses machine learning algorithms to analyze and cluster unlabeled datasets. These algorithms interpret the input data and automatically discover natural groupings in it.
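Clustering unlabeled data is often done with k-means. This is a minimal sketch of Lloyd's algorithm (assign each point to its nearest centroid, then move each centroid to the mean of its points) on hypothetical 2-D points. The deterministic farthest-first initialization is an assumption made so the result is reproducible; library implementations typically use randomized k-means++ seeding.

```python
import math

def farthest_first(points, k):
    """Pick k starting centroids that are spread far apart (deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute means."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

No labels are used anywhere; the two groups emerge only from distances between points, which is the "natural grouping" described above.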
Use cases:
- Customer Segmentation
- Sentiment Analysis
- Churn Prediction
- Pricing
- Recommendation Engine
- A/B Testing
Customer Segmentation
Pricing
Dataset: Transaction ID, Transaction Time, Quantity, Price (period: 1-30 Sep 2021)
Price | Qty | Sales
10,000 | 10 | 100,000
8,000 | 15 | 120,000
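The two price points above are enough for a rough elasticity estimate. This sketch assumes the simple definition (percentage change in quantity divided by percentage change in price, both measured from the starting point); the midpoint (arc) formula would give a different number.

```python
def sales(price, qty):
    """Revenue for one price point."""
    return price * qty

def elasticity(p0, q0, p1, q1):
    """Price elasticity of demand: % change in quantity / % change in price."""
    pct_q = (q1 - q0) / q0
    pct_p = (p1 - p0) / p0
    return pct_q / pct_p

e = elasticity(10_000, 10, 8_000, 15)
print(sales(10_000, 10), sales(8_000, 15))  # 100000 120000
print(e)  # -2.5
```

A 20% price cut raised quantity by 50%, so |e| > 1 (elastic demand), which is why total sales rose from 100,000 to 120,000.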
Goals: To find associations and correlations between the different items that customers buy.
Customer 3: Rice, chicken, burger
Customer N: Offer rice, chicken
Promotion Effectiveness
Channels: Paid search, Referral, Social media
Methodology: First click, last click, linear approach, Markov model, etc.
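The three rule-based methods listed differ only in how a conversion's credit is split across the channels a customer touched. A minimal sketch, assuming each converting customer is represented by an ordered list of channel touchpoints (the paths are hypothetical):

```python
from collections import defaultdict

def attribute(paths, model="linear"):
    """Split each conversion's credit (1.0) across the channels on its path."""
    credit = defaultdict(float)
    for path in paths:
        if model == "first_click":
            credit[path[0]] += 1.0
        elif model == "last_click":
            credit[path[-1]] += 1.0
        elif model == "linear":
            for channel in path:
                credit[channel] += 1.0 / len(path)
        else:
            raise ValueError(f"unknown model: {model}")
    return dict(credit)

# Hypothetical converting paths (ordered channel touchpoints).
paths = [["Paid search", "Social media", "Referral"],
         ["Social media", "Referral"],
         ["Paid search"]]
for m in ("first_click", "last_click", "linear"):
    print(m, attribute(paths, m))
```

First click favours channels that start journeys (here Paid search), last click favours channels that close them (here Referral), and linear spreads credit evenly. A Markov model instead estimates each channel's removal effect from transition probabilities between touchpoints, which needs more than a short sketch.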
Recommendation Engine
Sentiment Analysis
Positive / Negative
Churn Prediction
➔ A churn model estimates the likelihood that a customer will leave in the next period of time.
Output variable: Churn or not churn
Methodology: Classification
Action: Run an experiment by giving a marketing action to customers who are likely to churn.
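Since churn prediction is framed as classification, logistic regression is a natural baseline: it outputs a churn probability that can be thresholded into churn or not churn. This is a minimal sketch trained by batch gradient descent on log-loss. The two features (scaled recency and scaled purchase frequency), the training rows, the learning rate, and the 0.5 threshold are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """Churn probability; the last weight is the bias term."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the average log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Hypothetical features: [time since last purchase, purchase frequency], both scaled to 0-1.
X = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.8, 0.1], [0.9, 0.2], [0.7, 0.3]]
y = [0, 0, 0, 1, 1, 1]  # 1 = churned
w = train_logistic(X, y)
for x in ([0.85, 0.15], [0.15, 0.85]):
    p = predict_proba(w, x)
    print(x, round(p, 3), "churn" if p >= 0.5 else "not churn")
```

The predicted probabilities can rank customers for the marketing experiment, so the action targets those most likely to churn.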
A/B Testing
Exploratory analysis: what you do to understand the data and figure out what might be noteworthy or interesting to highlight to others.
Explanatory analysis: a specific thing you want to explain, a specific story you want to tell. Turn the data into information that can be consumed by an audience.
Storytelling (1/2)
- Introduce the plot, building the context for your audience. This section sets up the essential elements of the story.
- Throughout your communication, make the information specific and relevant to your audience. The story should ultimately be about your audience, not about you.
- End with a call to action. Make it totally clear to your audience what you want them to do with the new understanding or knowledge that you've imparted to them.
Storytelling (2/2)
Narrative has to be central to the communication. These are words written, spoken, or a combination
of the two that tell the story in an order that makes sense and convinces the audience why it’s important
or interesting.
Create Presentation
1. Define questions/problem/context/background/objectives
2. Create hypothesis / list of analyses to answer the questions
3. Create story and outline presentation
4. Create visualization
5. Create executive summary
6. Create recommendation
Data Visualization
Presentation Example
Business Objectives
How can we retain customers in order to increase demand and sales, and spend the promotion budget cost-effectively?
Sales Performance
Total Sales: 10Mn
Quantity Sold: 5.3Mn
Avg. Sales / product: 2,423
Avg. Trx / product: 127
Avg. Qty / product: 1,281
Product Sales:
- DOTCOM POSTAGE: 206K
- REGENCY CAKESTAND 3 TIER: 164K
- PARTY BUNTING: 98K
- WHITE HANGING HEART T-LIGHT HOLDER: 97K
- JUMBO BAG RED RETROSPOT: 92K
Sales by Country
The United Kingdom alone contributes 85.6% of sales.
Customer Performance
One strategy to increase sales is to persuade customers to transact more often and spend more money.
Avg. Product: 61
* Distribution shown at the 10%, 25%, 50%, 75%, and 90% percentiles
Customer Retention
The customer retention rate fluctuated month by month, averaging around 20-30%.
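A monthly retention rate like this can be computed from the set of active customers in each month. The definition below is a common one, assumed here since no formula is given: retention for a month is the share of the previous month's active customers who are active again. The month keys and customer IDs are hypothetical.

```python
def monthly_retention(active_by_month):
    """Retention(m) = |active(m-1) & active(m)| / |active(m-1)|."""
    months = sorted(active_by_month)
    rates = {}
    for prev, curr in zip(months, months[1:]):
        base = active_by_month[prev]
        if base:  # skip months with no active customers to retain
            rates[curr] = len(base & active_by_month[curr]) / len(base)
    return rates

# Hypothetical sets of customers who transacted in each month.
active = {
    "2021-07": {"c1", "c2", "c3", "c4", "c5"},
    "2021-08": {"c1", "c6", "c7"},
    "2021-09": {"c1", "c6", "c8", "c9"},
}
print(monthly_retention(active))  # {'2021-08': 0.2, '2021-09': 0.6666666666666666}
```

Cohort-based retention (tracking customers by the month of their first purchase) is another common definition and gives different numbers, so the definition should be stated alongside the metric.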
RFM Segmentation
Combining the scores for each of these metrics (recency, frequency, and monetary value) creates different groups, which can then be clustered into several segments.
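A minimal RFM scoring sketch, assuming quintile scores from 1 to 5 per metric (recency inverted, since fewer days since the last purchase is better). The transactions, the reference date, and the two segment rules are hypothetical; real segment maps such as Champion or Cannot Lose Them use many more score combinations.

```python
from datetime import date
from statistics import quantiles

def rfm_metrics(transactions, today):
    """Recency (days since last purchase), frequency (count), monetary (sum)."""
    last, freq, money = {}, {}, {}
    for cid, day, amount in transactions:
        last[cid] = max(last.get(cid, day), day)
        freq[cid] = freq.get(cid, 0) + 1
        money[cid] = money.get(cid, 0) + amount
    recency = {cid: (today - d).days for cid, d in last.items()}
    return recency, freq, money

def quintile_score(metric, higher_is_better=True):
    """Score each customer 1-5 by which quintile its value falls into."""
    cutoffs = quantiles(metric.values(), n=5)
    scores = {}
    for cid, value in metric.items():
        s = 1 + sum(value > c for c in cutoffs)
        scores[cid] = s if higher_is_better else 6 - s
    return scores

def segment(r, f, m):
    """Hypothetical rules for two of the segments."""
    if r >= 4 and f >= 4 and m >= 4:
        return "Champion"
    if r <= 2 and f <= 2 and m <= 2:
        return "Lost Customer"
    return "Other"

# Hypothetical transactions: (customer_id, purchase_date, amount)
tx = [
    ("c1", date(2021, 9, 28), 500), ("c1", date(2021, 9, 20), 400),
    ("c1", date(2021, 9, 10), 300), ("c1", date(2021, 8, 30), 300),
    ("c2", date(2021, 9, 25), 200), ("c2", date(2021, 9, 1), 200),
    ("c2", date(2021, 8, 15), 100),
    ("c3", date(2021, 9, 15), 150), ("c3", date(2021, 8, 1), 100),
    ("c4", date(2021, 8, 20), 120), ("c4", date(2021, 7, 1), 80),
    ("c5", date(2021, 5, 1), 50),
]
recency, frequency, monetary = rfm_metrics(tx, today=date(2021, 10, 1))
r = quintile_score(recency, higher_is_better=False)  # fewer days = better
f = quintile_score(frequency)
m = quintile_score(monetary)
rfm = {cid: (r[cid], f[cid], m[cid]) for cid in recency}
for cid, scores in rfm.items():
    print(cid, scores, segment(*scores))
```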
Customer Segmentation
➔ The Champion segment already contributes more than 60% of sales.
➔ Focusing your efforts on critical segments of customers is likely to give you much higher return on investment.
No. | Segment | Customers | Characteristics | Action
1 | Champion | 779 (18%) | Transacted recently, buy often, and spend the most | Reward them; they can be early adopters of new products and will promote your brand.
3 | Potential Loyalist | 461 (11%) | Recent customers who spent a good amount and bought more than once | Offer membership/loyalty programs and recommend other products.
5 | Cannot Lose Them | 199 (5%) | Made the biggest purchases, and often, but haven't returned for a long time | Win them back with aggressive promotions; don't lose them to competitors.
8 | Lost Customer | 651 (15%) | Lowest recency, frequency, and monetary values | Run a reach-out campaign, but don't put extra effort into retaining them.
Example rules: (Support 5%, Confidence 96%) and (Support 6%, Confidence 82%)
* Support: how popular an itemset is, measured as the proportion of transactions in which the itemset appears.
* Confidence: how likely item Y is to be purchased when item X is purchased.
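Both measures follow directly from the definitions above, with confidence(X → Y) = support(X ∪ Y) / support(X). The sketch below uses hypothetical baskets echoing the rice/chicken/burger example and hypothetical thresholds, and only enumerates item pairs, a simplification of algorithms such as Apriori that search larger itemsets.

```python
from itertools import combinations

def support(itemset, baskets):
    """Share of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(x, y, baskets):
    """Estimated P(Y | X) = support(X | Y) / support(X)."""
    return support(set(x) | set(y), baskets) / support(x, baskets)

def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """All one-item -> one-item rules passing both thresholds."""
    items = sorted(set().union(*baskets))
    found = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):
            s = support({x, y}, baskets)
            if s >= min_support:
                c = confidence({x}, {y}, baskets)
                if c >= min_confidence:
                    found.append((x, y, s, c))
    return found

# Hypothetical baskets (one set of items per transaction).
baskets = [{"rice", "chicken", "burger"}, {"rice", "chicken"}, {"rice", "chicken"},
           {"burger", "fries"}, {"rice"}]
for x, y, s, c in pair_rules(baskets):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

Confidence is asymmetric: chicken → rice holds in every chicken basket (confidence 1.00), while rice → chicken holds in only 3 of 4 rice baskets (0.75), which is why an offer of chicken to rice buyers is weaker than the reverse.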
Price Elasticity
Demand for each product varies with price, so we can find the optimum price and projected sales for each product. Decreasing the price of an elastic product should give us higher sales.
StockCode | Elasticity | Optimum Price | Increase Price | Total Sales (historical) | Projected Sales
Executive Summary
➔ Across 4,147 products and 4,380 customers, total sales reach 10Mn; the top 10% of products already contribute 62% of sales, and the United Kingdom contributes the most sales (85.6%).
➔ The average monthly customer retention rate is around 20-30%. To increase overall sales, we need initiatives that raise the retention rate, such as personalized promotions, product bundling recommendations, and price-elasticity-based pricing.
➔ Give personalized promotions based on each customer's segment; focusing promotions on the critical segments is likely to give a much higher return on investment.
➔ Adjust the prices of elastic products to their optimum price in order to increase demand.
Recommendation
Book Recommendation
Appendix
Linear Regression
Linear regression is a linear model, i.e. a model that assumes a linear relationship between the input variables (x) and the single output variable (y), also called the dependent variable.
Output:
Algorithm:
Stopping criteria:
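Linear regression can be fit by gradient descent. This minimal sketch assumes that setup: the output is a fitted line y ≈ b0 + b1·x, the algorithm is batch gradient descent on mean squared error, and training stops at whichever comes first of a maximum iteration count or the loss improving by less than a tolerance. The names `lr`, `max_iter`, and `tol` and the toy data are illustrative.

```python
def fit_linear_regression(xs, ys, lr=0.01, max_iter=10_000, tol=1e-9):
    """Fit y ~ b0 + b1*x by batch gradient descent on mean squared error.
    Returns (b0, b1, last iteration index)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    prev_loss = float("inf")
    for it in range(max_iter):                   # stopping criterion 1: iteration cap
        errors = [b0 + b1 * x - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n    # mean squared error
        if prev_loss - loss < tol:               # stopping criterion 2: loss stopped improving
            break
        prev_loss = loss
        # Simultaneous update using gradients of the MSE w.r.t. b0 and b1.
        b0 -= lr * 2 * sum(errors) / n
        b1 -= lr * 2 * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1, it

# Toy data generated from y = 1 + 2x.
b0, b1, iters = fit_linear_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(round(b0, 2), round(b1, 2), iters)
```

For a single linear model the ordinary least squares solution also has a closed form; gradient descent is shown because it exposes the iterative algorithm and stopping criteria explicitly.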