Data Mining Overview: by Dr. Sunil D. Lakdawala
Content
Case Data Mining – Supervised Learning
Case Data Mining – Unsupervised Learning
Definition
Applications
Techniques
Supervised Learning
Unsupervised Learning
• Target Marketing
• Attrition Prediction/Churn Analysis
• Fraud Detection
• Credit Scoring
Predicting, for every case, which class it belongs to, or its
probability of success, based on the data for its predictor
variables
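The idea of predicting a class or a probability of success from predictor data can be sketched with a small logistic regression. This is only one possible technique, chosen here for illustration; the slides do not name a method at this point. The function names, the learning rate, the 0.5 threshold, and the toy data are all assumptions.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(success) = sigmoid(w * x + b) by batch gradient descent.

    xs: one predictor value per case; ys: 0/1 outcome per case.
    lr and epochs are illustrative defaults, not tuned values.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def predict_proba(x, w, b):
    """Estimated probability of success for one case."""
    return sigmoid(w * x + b)

def predict_class(x, w, b, threshold=0.5):
    """Assign class 1 when the probability reaches the (assumed) threshold."""
    return 1 if predict_proba(x, w, b) >= threshold else 0

# Hypothetical toy data: one predictor, binary outcome.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
for x in (1.0, 6.0):
    print(x, round(predict_proba(x, w, b), 3), predict_class(x, w, b))
```

The same fitted model gives both outputs mentioned above: a probability per case, and a class once a cut-off is applied.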
• Forecasting sales
• Predicting price fluctuations
• Predicting profitability of business units
• Predicting market value of assets
• Predicting yield or consumption of critical
inputs
Predicting, for every case, a numeric value based on the data
for its predictor variables
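Predicting a value from predictor data can be illustrated with a least-squares line through one predictor. This is a minimal sketch under the assumption of a single predictor and a linear relationship; the slides do not specify a method here, and the function names and toy figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_value(x, slope, intercept):
    """Predicted value for one case."""
    return slope * x + intercept

# Hypothetical toy data: e.g. a predictor against sales.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(predict_value(5.0, slope, intercept))
```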
Overview of Techniques - 3
Clustering and Dimension Reduction
k-Means Clustering: For a given number of clusters (the k value), develops
clusters based on the minimum distance between the cluster centers and the
cases in each cluster.
Hierarchical Clustering: Builds clusters through successive steps by grouping
the cases with the least dissimilarity, finally creating a single cluster. The
user can choose the number of clusters corresponding to a distance measure.
Principal Components: Creates new variables, called Principal Components, that
are uncorrelated and that explain the majority of the variability in the
original data.
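The k-Means description above can be sketched as a loop that assigns each case to its nearest center and then moves each center to the mean of its cases. This is a minimal illustration, not a production implementation: using the first k cases as starting centers is an assumption made to keep the run deterministic, and the points are hypothetical.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means on tuples of coordinates.

    Assumption: the first k points serve as initial centers.
    """
    centers = [points[i] for i in range(k)]

    def nearest(p):
        # Index of the center with minimum Euclidean distance to p.
        return min(range(k), key=lambda j: math.dist(p, centers[j]))

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        new_centers = []
        for j in range(k):
            if clusters[j]:
                dims = len(clusters[j][0])
                size = len(clusters[j])
                new_centers.append(
                    tuple(sum(p[d] for p in clusters[j]) / size for d in range(dims))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers

    labels = [nearest(p) for p in points]
    return centers, labels

# Hypothetical cases with two variables each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2)
print(labels)
```

Because k is fixed in advance, a different k or different starting centers can give a different grouping.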
Dimension Reduction
• When there are many dimensions
(predictors), say 20, 30 or 50
• Or when several predictors are correlated
• Develop new variables that:
– Explain the major portion of variability in data,
and
– Are uncorrelated
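The two properties listed above (new variables that explain most of the variability and are uncorrelated) can be checked on a small example. The sketch below assumes only two correlated predictors so that the principal components can be computed in closed form from the 2x2 covariance matrix; the function name and the data values are illustrative, not from the slides.

```python
import math

def pca_2d(data):
    """Principal components for two variables via the 2x2 covariance matrix."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    centered = [(x - mean_x, y - mean_y) for x, y in data]

    # Sample variances and covariance.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Eigenvalues of [[sxx, sxy], [sxy, syy]] = variance along each component.
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    variances = (half_trace + spread, half_trace - spread)

    # Direction of the first component; the second is perpendicular to it.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))

    # New variables: each case projected onto the two components.
    scores = [(x * v1[0] + y * v1[1], x * v2[0] + y * v2[1]) for x, y in centered]
    explained = variances[0] / (variances[0] + variances[1])
    return variances, scores, explained

# Illustrative values for two correlated predictors.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
variances, scores, explained = pca_2d(data)
print("share of variability explained by the first component:", round(explained, 3))
```

With many predictors the same idea applies, but the eigen-decomposition is normally left to a numerical library.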
• Market segmentation
• Product grouping based on customer preferences
• Grouping of business units based on performance
parameters
• Grouping channel partners based on performance
parameters
Grouping of homogeneous cases based on the data for
predefined variables
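One way to group homogeneous cases is the successive merging used in hierarchical clustering: start with every case as its own cluster and repeatedly merge the two closest clusters. The sketch below assumes a single variable and single linkage (distance between the closest members); both choices, the function name, and the values are assumptions for illustration.

```python
def agglomerative(values, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters and the list of merges as
    (distance, left_cluster, right_cluster).
    """
    clusters = [[v] for v in values]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((d, clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical values of one performance parameter for five cases.
values = [1.0, 1.2, 5.0, 5.3, 9.0]
two_groups, _ = agglomerative(values, n_clusters=2)
one_group, merges = agglomerative(values, n_clusters=1)
print(sorted(sorted(c) for c in two_groups))
```

Stopping at a chosen number of clusters, or at a chosen merge distance, is how the user picks the final grouping.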
Overview of Techniques - 4
Market Basket Analysis / Affinity
[Figure: distribution shown using histograms. X-axis: Bin - Alcohol Content (11.5, 12.5, 13.5, 14.5, More). Series: Frequency and Cumulative %.]
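The Frequency and Cumulative % series of a histogram like the one for Alcohol Content can be computed as sketched below. The bin limits (11.5, 12.5, 13.5, 14.5, More) are read from the chart axis; treating each limit as an inclusive upper bound is an assumption, and the alcohol-content values are hypothetical, since the underlying data is not given.

```python
def histogram_with_cumulative(values, bin_limits):
    """Count values per bin and accumulate the running percentage.

    Each bin holds values up to and including its upper limit (assumed);
    anything above the last limit goes into a final "More" bin.
    """
    counts = [0] * (len(bin_limits) + 1)
    for v in values:
        for idx, limit in enumerate(bin_limits):
            if v <= limit:
                counts[idx] += 1
                break
        else:
            counts[-1] += 1  # larger than every limit

    labels = [str(limit) for limit in bin_limits] + ["More"]
    total = len(values)
    running = 0
    rows = []
    for label, count in zip(labels, counts):
        running += count
        rows.append((label, count, 100.0 * running / total))
    return rows

# Hypothetical alcohol-content readings.
values = [11.0, 12.0, 12.3, 13.0, 13.2, 13.4, 14.0, 14.8]
rows = histogram_with_cumulative(values, [11.5, 12.5, 13.5, 14.5])
for label, count, cumulative in rows:
    print(f"{label:>5}  frequency={count}  cumulative={cumulative:.2f}%")
```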