DM-u3
Why Is Frequent Pattern Mining Important?
•Frequent patterns: patterns that occur frequently in data.
Support and Confidence
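Transaction database (Example 1)
TID   Products
1     A, B, E
2     B, D
3     B, C
4     A, B, D
5     A, C
6     B, C
7     A, C
8     A, B, C, E
9     A, B, C
Example rule: A → C
•Support: 4/9 = 44% (4 of the 9 transactions contain both A and C)
•Confidence: 4/6 = 67% (of the 6 transactions that contain A, 4 also contain C)
[Figure: customers who buy A, customers who buy C, and customers who buy both A and C]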
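The support and confidence figures for the rule A → C can be checked with a few lines of Python. This is a minimal sketch over the nine transactions of Example 1; the helper names `support_count`, `support`, and `confidence` are illustrative and do not come from the slides.

```python
# Transactions from Example 1 (TID -> set of products).
transactions = {
    1: {"A", "B", "E"},
    2: {"B", "D"},
    3: {"B", "C"},
    4: {"A", "B", "D"},
    5: {"A", "C"},
    6: {"B", "C"},
    7: {"A", "C"},
    8: {"A", "B", "C", "E"},
    9: {"A", "B", "C"},
}


def support_count(itemset, db):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)


def support(itemset, db):
    """Fraction of all transactions that contain `itemset`."""
    return support_count(itemset, db) / len(db)


def confidence(antecedent, consequent, db):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    return support_count(antecedent | consequent, db) / support_count(antecedent, db)


rule_support = support({"A", "C"}, transactions)
rule_confidence = confidence({"A"}, {"C"}, transactions)
print(f"support(A -> C)    = {rule_support:.1%}")
print(f"confidence(A -> C) = {rule_confidence:.1%}")
```

Support is measured against the whole database, while confidence is measured only against the transactions that contain the left-hand side of the rule.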
Market Basket Analysis (cont'd)
•LIMITATIONS
–takes over 18 months to implement
–market basket analysis only identifies hypotheses, which need to be tested
•neural network, regression, decision tree analyses
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are
        contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
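The level-wise loop above can be sketched in Python as follows. This is an illustrative version, not an optimized one: the function name `apriori`, the use of an absolute count `min_count` in place of a relative min_support, and the threshold of 2 are all assumptions. The data is the Example 1 transaction database.

```python
from itertools import combinations


def apriori(db, min_count):
    """Return {itemset: count} for every itemset found in at least `min_count` transactions."""
    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)

    k = 1
    while level:  # loop until Lk is empty
        # C(k+1): unions of two frequent k-itemsets that have exactly k+1 items,
        # kept only if every k-item subset is itself frequent.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                candidates.add(union)
        # One pass over the database to count the candidates contained in each transaction.
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        # L(k+1): candidates that meet the minimum support.
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent


# Example 1 transactions; min_count = 2 is an assumed threshold.
db = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]
frequent = apriori(db, min_count=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), count)
```

Each pass of the `while` loop corresponds to one iteration of the pseudo-code: generate Ck+1, scan the database once, and keep the candidates that reach the threshold.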
Implementation of Apriori
• How to generate candidates?
– Step 1: self-joining Lk
– Step 2: pruning
• Example of Candidate-generation
– L3={abc, abd, acd, ace, bcd}
– Self-joining: L3*L3
• abcd from abc and abd
• acde from acd and ace
– Pruning:
• acde is removed because ade is not in L3
– C4 = {abcd}
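The two steps above can be reproduced on the L3 example with a short sketch. It assumes itemsets are stored as sorted tuples and that two k-itemsets are joined when they share their first k-1 items; the function names `self_join` and `apriori_gen` are illustrative.

```python
from itertools import combinations


def self_join(Lk):
    """Step 1: join pairs of k-itemsets that agree on their first k-1 items."""
    Lk = sorted(Lk)
    joined = []
    for i, p in enumerate(Lk):
        for q in Lk[i + 1:]:
            if p[:-1] == q[:-1]:
                # p sorts before q, so appending q's last item keeps the tuple sorted.
                joined.append(p + (q[-1],))
    return joined


def apriori_gen(Lk):
    """Step 2: drop joined candidates that have a k-item subset missing from Lk."""
    if not Lk:
        return []
    frequent = set(Lk)
    k = len(Lk[0])
    return [
        c for c in self_join(Lk)
        if all(sub in frequent for sub in combinations(c, k))
    ]


L3 = [tuple(s) for s in ["abc", "abd", "acd", "ace", "bcd"]]
print(["".join(c) for c in self_join(L3)])    # candidates before pruning
print(["".join(c) for c in apriori_gen(L3)])  # C4 after pruning
```

Pruning relies on the fact that every subset of a frequent itemset must itself be frequent, so a candidate with an infrequent 3-item subset can be discarded without scanning the database.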
How to Count Supports of Candidates?
If the minimum confidence threshold is, say, 70%, then only the second,
third, and last rules are output, because these are the only ones generated that are strong.
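The rules referred to here are not listed in this section. As an illustration only, the sketch below assumes they are the six rules that can be formed from the frequent itemset {A, B, E} of the Example 1 database, and filters them with a 70% minimum confidence. The names `count` and `rules_from` are hypothetical helpers.

```python
from itertools import combinations

# Example 1 transactions.
db = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t)


def rules_from(itemset):
    """Yield (antecedent, consequent, confidence) for each non-empty proper subset used as antecedent."""
    items = sorted(itemset)
    # Larger antecedents first: AB, AE, BE, then A, B, E.
    for size in range(len(items) - 1, 0, -1):
        for antecedent in combinations(items, size):
            a = frozenset(antecedent)
            yield a, itemset - a, count(itemset) / count(a)


min_conf = 0.70  # threshold taken from the text
itemset = frozenset("ABE")  # assumed source itemset
all_rules = list(rules_from(itemset))
for a, c, conf in all_rules:
    label = "strong" if conf >= min_conf else "rejected"
    print(f"{''.join(sorted(a))} -> {''.join(sorted(c))}: {conf:.0%} ({label})")
```

A rule is reported only when its confidence, the support count of the whole itemset divided by the support count of the antecedent, reaches the minimum confidence.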
Further Improvement of the Apriori Method