DATA MINING
Instructor:
Doaa Adil Mohamed Altayeb
Email : [email protected]
12/1/20
Content
Current Situation.
Why mine data?
What is data mining?
What is not data mining?
Related disciplines.
Decisions in Data Mining.
Data Mining tasks.
Data mining and different concepts.
Mining market.
Current Situation …
Data mining:
[Figure: process diagram with the stages Databases, Data Integration, Data Cleaning, Task-relevant Data, and Data mining.]
Examples: What is (not) Data Mining?
[Figure: disciplines related to Data Mining: Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines.]
Decisions in Data Mining
Databases to be mined
Knowledge to be mined
Techniques utilized
Applications adapted
Data Mining Tasks
Task categories:
─ Predictive tasks.
─ Descriptive tasks.
Predictive Tasks
Predictive:
• Classification
• Regression
• Deviation Detection
Descriptive:
• Clustering
• Association Rule Discovery
• Sequential Pattern Discovery
Classification Definition
Training data (Refund: categorical, Marital Status: categorical, Taxable Income: continuous, Cheat: class):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Decision tree (splitting attributes: Refund, MarSt, TaxInc):

Refund
  Yes: NO
  No: MarSt
    Married: NO
    Single, Divorced: TaxInc
      < 80K: NO
      > 80K: YES
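As a rough illustration, the decision tree from this slide can be written as nested conditions and checked against the ten training records. This is only a sketch: the function name `predict_cheat` and the record layout are hypothetical, and placing an income of exactly 80K on the YES side is an assumption, since the slide only shows the branches < 80K and > 80K.

```python
# Training records from the slide: (Tid, Refund, Marital Status, Taxable Income in K, Cheat)
RECORDS = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]


def predict_cheat(refund, marital_status, taxable_income_k):
    """Follow the slide's tree: Refund, then MarSt, then TaxInc."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on taxable income.
    # Assumption: exactly 80K is treated as the YES branch.
    return "No" if taxable_income_k < 80 else "Yes"


# Count how many training records the tree labels correctly.
correct = sum(predict_cheat(r, m, t) == c for _, r, m, t, c in RECORDS)
print(f"{correct} of {len(RECORDS)} training records classified correctly")
```

Each `if` corresponds to one splitting attribute, so adding a record to `RECORDS` shows immediately whether the tree still fits the data.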
Classification Application
Direct Marketing
Goal: Reduce cost of mailing by targeting
a set of consumers likely to buy a new
cell-phone product.
Regression
Illustrating Clustering (cont.)
[Figure: the original points, followed by the clusters after Iteration 1, Iteration 2, and Iteration 3; axes x and y.]
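The slides do not name the clustering algorithm behind these iterations. Assuming it is a centroid-based method such as k-means, the loop below sketches what one iteration does: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sample points, the starting centroids, and the function name `kmeans` are hypothetical and are not read from the figure.

```python
def kmeans(points, centroids, iterations=3):
    """Run a fixed number of k-means iterations on 2-D points (illustrative sketch)."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for px, py in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (px - centroids[i][0]) ** 2 + (py - centroids[i][1]) ** 2,
            )
            clusters[nearest].append((px, py))
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical data: two loose groups of three points each.
points = [(-1.5, 0.5), (-1.0, 1.0), (-1.25, 0.75), (1.0, 2.0), (1.5, 2.5), (1.25, 2.25)]
final_centroids, final_clusters = kmeans(points, centroids=[(0.0, 0.0), (0.0, 3.0)])
print(final_centroids)
```

Running more iterations than needed is harmless here, because the assignments stop changing once the centroids settle.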
Clustering Application
Market Segmentation:
Goal: Subdivide a market into distinct subsets of customers,
where each subset can be reached with a distinct marketing mix.
Association Rule Definition
TID  Items
1    Bread, Coke, Milk
2    Water, Bread
3    Water, Coke, Diaper, Milk
4    Water, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Water}
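One common way to judge rules like these is by their support and confidence. The slide does not define these measures, so the standard definitions are assumed here: support is the fraction of transactions containing all items of the rule, and confidence is the fraction of transactions containing the left-hand side that also contain the right-hand side. The helper names `support` and `confidence` are hypothetical.

```python
# The five transactions from the slide, one set of items per TID.
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Water", "Bread"},
    {"Water", "Coke", "Diaper", "Milk"},
    {"Water", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in TRANSACTIONS) / len(TRANSACTIONS)


def confidence(lhs, rhs):
    """Among transactions containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs) / support(lhs)


# Evaluate the two rules listed on the slide.
for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Water"})]:
    print(sorted(lhs), "-->", sorted(rhs),
          f"support={support(lhs | rhs):.2f}", f"confidence={confidence(lhs, rhs):.2f}")
```

For {Milk} --> {Coke}, Milk appears in four transactions and Milk with Coke in three, which gives a support of 3/5 and a confidence of 3/4.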
Sequential Pattern Discovery: Definition
Healthcare area:
Goal: To help properly identify a disease based on the
sequence of symptoms.