Data analytics and its processes, models, and methods
• Data analysts examine raw data to find insights and trends. They use a range of methods, tools, and techniques to help organizations make better decisions and succeed.
Knowledge vs Wisdom
What is knowledge?
Source: https://www.ontotext.com/knowledgehub/fundamentals/dikw-pyramid/
What is wisdom?
Source: https://www.ontotext.com/knowledgehub/fundamentals/dikw-pyramid/
Data or Information?
Invoice Date: 2/22/14    Invoice #: 123
Customer: ABC company

Item #   Qty   Price
99       3     $20
Source: https://www.ontotext.com/knowledgehub/fundamentals/dikw-
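In DIKW terms, one reading of the invoice is that the bare values 99, 3 and $20 are data; once they are labelled and combined (for example into a line total) they start to carry information about the order. A minimal Python sketch of that step, where `InvoiceLine` and `line_total` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass
class InvoiceLine:
    item_number: int
    quantity: int
    unit_price: float  # in dollars


# The raw values from the invoice: on their own, just data.
line = InvoiceLine(item_number=99, quantity=3, unit_price=20.0)


def line_total(l: InvoiceLine) -> float:
    """Combine quantity and unit price into a derived, meaningful figure."""
    return l.quantity * l.unit_price


print(f"Item {line.item_number}: {line.quantity} x ${line.unit_price:.2f} = ${line_total(line):.2f}")
# prints "Item 99: 3 x $20.00 = $60.00"
```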
A Data to Knowledge Continuum
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2020), Analytics, Data Science, & Artificial Intelligence: Systems for Decision Support, 11e, Global Edition, Pearson Education
DATA ANALYTICS AND DATA PRIVACY TRAINING COURSE
Analytics Applications and Business Questions
What are business values?
Analytics Applications, Business Questions, and Business Values
Step 2: Understand and collect data
Before analysis can begin, data must be available for use from a data source.
Yeye He, et al., Transform-Data-by-Example (TDE): An Extensible Search Engine for Data Transformations. PVLDB, 11(10): 1165-1177, 2018.
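A first pass at understanding newly collected data is often a quick profile: how many rows, which columns, and where values are missing. The sketch below assumes the data arrives as CSV; the `RAW` extract and its column names are hypothetical, and only the Python standard library is used.

```python
import csv
import io

# Hypothetical CSV extract standing in for a real data source.
RAW = """invoice_id,customer,item,qty,price
123,ABC company,99,3,20
124,XYZ ltd,17,,35
125,ABC company,99,1,20
"""


def profile(text: str) -> dict:
    """Return the row count, column names and missing-value count per column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    # An empty field (e.g. ",," in the CSV) is treated as missing.
    missing = {c: sum(1 for r in rows if r[c] == "") for c in columns}
    return {"rows": len(rows), "columns": columns, "missing": missing}


print(profile(RAW))
```

Knowing early that `qty` has a gap tells you a cleaning step will be needed before modelling.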
EDA with Scatter Plot
What happened?
[Figure: scatter plot with axes Units and Amount]
EDA with Box Plots
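A box plot draws the five-number summary and marks outliers. The sketch below computes those values, assuming the common Tukey convention (whiskers reach the most extreme points inside 1.5 x IQR of the quartiles). Quartile conventions differ between tools, so a plotting library may place the box slightly differently; the data list is hypothetical.

```python
import statistics


def box_plot_stats(values):
    """Five-number summary plus outliers using the 1.5 x IQR fence rule."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": outliers,
    }


print(box_plot_stats([2, 4, 4, 5, 6, 7, 8, 9, 30]))
```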
CRISP-DM (2 of 2)
• The six-step CRISP-DM process: (1) Business Understanding, (2) Data Understanding, (3) Data Preparation, (4) Model Building, (5) Testing and Evaluation, (6) Deployment
• SEMMA (Sample, Explore, Modify, Model, and Assess) Data Mining Process
• Developed by SAS Institute
[Figure: the SEMMA cycle, with a feedback loop]
  • Sample: generate a representative sample of the data
  • Explore: visualization and basic description of the data
  • Modify: select variables, transform variable representations
  • Model: use a variety of statistical and machine learning models
  • Assess: evaluate the accuracy and usefulness of the models
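The Sample step asks for a representative sample. One common way to keep a sample representative of known groups is stratified sampling: draw the same fraction from each group. SEMMA does not prescribe this particular method; this is a minimal sketch assuming a hypothetical `segment` field and a fixed seed for reproducibility.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=42):
    """Draw the same fraction from every stratum so the sample keeps the population's mix."""
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    strata = defaultdict(list)
    for r in records:
        strata[key(r)].append(r)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # keep at least one record per stratum
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical population: 80 "retail" and 20 "corporate" customers.
population = [{"id": i, "segment": "retail" if i < 80 else "corporate"} for i in range(100)]
sample = stratified_sample(population, key=lambda r: r["segment"], fraction=0.1)
print(len(sample), "records sampled")
```

A plain random sample of 10 could, by chance, contain no corporate customers at all; stratifying rules that out.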
• KDD (Knowledge Discovery in Databases) Process
[Figure: the KDD process, with feedback between steps]
  1. Data Selection: sources for raw data → target data
  2. Data Cleaning: target data → preprocessed data
  3. Data Transformation: preprocessed data → transformed data
  4. Data Mining: transformed data → extracted patterns
  5. Internalization: extracted patterns → knowledge (“Actionable Insight”)
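The five KDD steps can be sketched as a chain of small functions, each producing the input of the next. Everything here is hypothetical and for illustration only: the function names, the toy records, the $90 "high spend" threshold and the 0.5 support cut-off. The feedback arrows of the figure are not modelled.

```python
from collections import Counter

# Hypothetical raw records from several sources.
RAW_SOURCES = [
    {"customer": "A", "region": "north", "amount": 120, "note": "web"},
    {"customer": "B", "region": "north", "amount": 95, "note": "store"},
    {"customer": "C", "region": "south", "amount": None, "note": "web"},
    {"customer": "D", "region": "south", "amount": 15, "note": "web"},
    {"customer": "E", "region": "north", "amount": 130, "note": "web"},
]


def select(raw):
    """Step 1, Data Selection: keep only the fields relevant to the question."""
    return [{"region": r["region"], "amount": r["amount"]} for r in raw]


def clean(target):
    """Step 2, Data Cleaning: drop records with missing values."""
    return [r for r in target if r["amount"] is not None]


def transform(pre):
    """Step 3, Data Transformation: discretise amount into a spend category."""
    return [{"region": r["region"], "spend": "high" if r["amount"] >= 90 else "low"} for r in pre]


def mine(transformed, min_support=0.5):
    """Step 4, Data Mining: find (region, spend) combinations that occur often."""
    counts = Counter((r["region"], r["spend"]) for r in transformed)
    n = len(transformed)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def internalize(patterns):
    """Step 5, Internalization: express patterns as statements people can act on."""
    return [
        f"{share:.0%} of cleaned records are {spend}-spend customers in the {region}"
        for (region, spend), share in sorted(patterns.items())
    ]


insights = internalize(mine(transform(clean(select(RAW_SOURCES)))))
print(insights)
```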
Which Process is the Best?
[Figure: bar chart (scale 0 to 70) comparing data mining methodologies/processes: My own, SEMMA, KDD Process, My organization's, Domain-specific methodology, None]
[Figure: machine learning tasks]
• Classification (supervised): Decision Trees, Neural Networks, Support Vector Machines, kNN, Naïve Bayes, GA
• Association
• Segmentation
• Decision tree
• Random forest
• Statistical analysis
• Neural networks
• Support vector machines
• Bayesian classifiers
• …
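kNN, one of the supervised classification techniques named above, is simple enough to sketch in a few lines: classify a new point by majority vote among its k closest labelled points. This sketch assumes Euclidean distance and k = 3; the training data and class labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled data: (features, class label).
train = [
    ((1.0, 1.0), "low risk"), ((1.5, 2.0), "low risk"), ((2.0, 1.0), "low risk"),
    ((6.0, 6.0), "high risk"), ((6.5, 7.0), "high risk"), ((7.0, 6.0), "high risk"),
]
print(knn_predict(train, (1.8, 1.4)))  # low risk
print(knn_predict(train, (6.2, 6.4)))  # high risk
```

Choosing an odd k avoids tied votes when there are two classes.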
Random Forest
• Random forest is a commonly used machine learning algorithm, trademarked by Leo Breiman and Adele Cutler, that combines the output of multiple decision trees to reach a single result.
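The two ideas behind a random forest, bootstrap sampling of rows and random selection of features, can be shown with a deliberately simplified sketch in which each "tree" is a one-split stump and the forest predicts by majority vote. Real implementations (for example scikit-learn's `RandomForestClassifier`) grow full trees and sample features at every split; the function names, parameters and data here are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Best single-split 'tree' over the given feature indices (fewest training errors)."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split in this sample: predict its majority class
        lab = majority(y)
        return lambda row: lab
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap: rows drawn with replacement
        feats = rng.sample(range(len(X[0])), max_features)   # random subset of features
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def predict_forest(trees, row):
    """Each tree votes; the most common class wins."""
    return majority([tree(row) for tree in trees])


# Hypothetical two-feature data with two well-separated classes.
X = [(1, 2), (2, 1), (2, 2), (1, 1), (8, 9), (9, 8), (9, 9), (8, 8)]
y = ["A"] * 4 + ["B"] * 4
forest = fit_forest(X, y)
print(predict_forest(forest, (0.5, 1.0)), predict_forest(forest, (9.5, 8.5)))
```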
Logistic regression
Logistic regression estimates the probability of an event occurring, such as whether a person voted or didn't vote, based on a given dataset of independent variables.
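For one independent variable x, the model is P(y = 1 | x) = 1 / (1 + e^-(b0 + b1·x)), and b0, b1 are chosen to minimise the log-loss on the data. A minimal sketch that fits the two coefficients by plain batch gradient descent; the "campaign contacts vs. voted" data, the learning rate and the epoch count are hypothetical choices, and statistics packages normally use faster solvers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on the log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # derivative of the log-loss w.r.t. the linear score
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical data: campaign contacts (x) and whether the person voted (1) or not (0).
contacts = [0, 0, 1, 1, 2, 3, 3, 4, 5, 6]
voted = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
b0, b1 = fit_logistic(contacts, voted)
for x in (0, 3, 6):
    print(f"contacts={x}: P(voted) = {sigmoid(b0 + b1 * x):.2f}")
```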
Artificial neural networks (ANNs) consist of input, hidden, and output layers of connected neurons (nodes), a structure loosely inspired by the human brain.
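To make the layer idea concrete, here is a forward pass through a tiny network with two inputs, two hidden neurons and one output neuron. The weights are hand-picked (hypothetical) so that the network computes XOR; in practice weights are learned from data, for example by backpropagation, and smooth activations such as sigmoid or ReLU replace the step function used here.

```python
def step(z):
    """Threshold activation: the neuron 'fires' (1) when its weighted input exceeds 0."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """One fully connected layer: each neuron takes a weighted sum of all inputs plus a bias."""
    return [step(sum(w * x for w, x in zip(ws, inputs)) + b) for ws, b in zip(weights, biases)]


# Hand-picked weights for a 2-2-1 network that computes XOR.
HIDDEN_W = [[1, 1], [1, 1]]  # hidden neuron 1 acts like OR, neuron 2 like AND
HIDDEN_B = [-0.5, -1.5]
OUTPUT_W = [[1, -1]]         # output fires for "OR and not AND", i.e. XOR
OUTPUT_B = [-0.5]


def forward(x1, x2):
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)  # input layer -> hidden layer
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]   # hidden layer -> output layer


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", forward(a, b))
```

XOR is a classic case where the hidden layer matters: no single neuron with these inputs can separate the two classes.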
Ensemble Models for Predictive Analytics
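One simple way to combine predictive models into an ensemble is hard (majority) voting; bagging, as in random forests, and boosting are other common strategies. In this sketch the three rule-based "models" and their thresholds are hypothetical stand-ins for trained models; the point is that the ensemble can still be right when one member is wrong.

```python
from collections import Counter


# Three hypothetical, deliberately simple "models" for a yes/no churn prediction.
def model_a(c):
    return "churn" if c["complaints"] >= 2 else "stay"


def model_b(c):
    return "churn" if c["months_active"] < 6 else "stay"


def model_c(c):
    return "churn" if c["monthly_spend"] < 20 else "stay"


def ensemble_predict(models, case):
    """Hard voting: each model casts one vote and the most common label wins."""
    votes = Counter(m(case) for m in models)
    return votes.most_common(1)[0][0]


customer = {"complaints": 3, "months_active": 24, "monthly_spend": 15}
print([m(customer) for m in (model_a, model_b, model_c)])  # ['churn', 'stay', 'churn']
print(ensemble_predict([model_a, model_b, model_c], customer))  # churn
```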
Cluster Analysis
• A Graphical Illustration of the Steps in the k-Means Algorithm
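The k-means steps can be written out directly: pick k initial centroids, assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. A minimal sketch of this standard (Lloyd's) version, assuming Euclidean distance and random initial centroids drawn from the data; the points are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means: alternate assignment and centroid update until centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: pick k initial centroids from the data
    for _ in range(iterations):
        # step 2: assign every point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # step 3: move each centroid to the mean of its cluster (keep the old one if empty)
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # step 4: stop once the centroids no longer move
            break
        centroids = new_centroids
    return centroids, clusters


data = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(data, k=2)
print(sorted(centroids))
```

Because the result depends on the starting centroids, k-means is often run several times with different seeds and the tightest clustering kept.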