Class Notes 10jan2023
Madhavan Mukund
https://www.cmi.ac.in/~madhavan
Two thresholds
How frequently does $X \subseteq t_j$ imply $Y \subseteq t_j$?
How significant is this pattern overall?
Apriori observation
If $Z$ is not a frequent itemset, no superset $Y \supseteq Z$ can be frequent
$F_1$: Scan $T$, maintain a counter for each $x \in I$
$C_2 = \{\{x, y\} \mid x, y \in F_1\}$: candidates in level 2
$F_2$: Scan $T$, maintain a counter for each $X \in C_2$
$C_3 = \{\{x, y, z\} \mid \{x, y\}, \{x, z\}, \{y, z\} \in F_2\}$
$F_3$: Scan $T$, maintain a counter for each $X \in C_3$
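The level-by-level scans above can be sketched in Python. This is a minimal illustration, not the lecture's own code: the toy transactions, the name `count_frequent`, and the absolute threshold `min_count` are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
min_count = 3  # assumed absolute count threshold


def count_frequent(candidates, transactions, min_count):
    """Scan the transactions once, keeping one counter per candidate."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate is contained in the transaction
                counts[c] += 1
    return {c for c, n in counts.items() if n >= min_count}


items = set().union(*transactions)

# Level 1: one counter per item.
F1 = count_frequent({frozenset([x]) for x in items}, transactions, min_count)
frequent_items = sorted(x for s in F1 for x in s)

# Level 2: candidates are pairs of frequent items.
C2 = {frozenset(p) for p in combinations(frequent_items, 2)}
F2 = count_frequent(C2, transactions, min_count)

# Level 3: candidates are triples whose three pairs are all in F2.
C3 = {
    frozenset(c)
    for c in combinations(frequent_items, 3)
    if all(frozenset(p) in F2 for p in combinations(c, 2))
}
F3 = count_frequent(C3, transactions, min_count)
```

On this toy data every item and every pair is frequent, so the single triple becomes a candidate, but it occurs in only two transactions and is rejected at level 3.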
Madhavan Mukund Lecture 2: 10 January, 2023 DMML Jan–Apr 2023 8 / 16
Apriori algorithm
$C_k$ = subsets of size $k$ such that every $(k-1)$-subset is in $F_{k-1}$
How do we generate Ck ?
Naïve: enumerate subsets of size $k$ and check each one
Expensive!
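To see why the naïve approach is expensive, here is a small sketch that counts how many subsets it examines. The function name `naive_candidates`, the 20 item ids, and the contents of `F2` are hypothetical values chosen for illustration.

```python
from itertools import combinations


def naive_candidates(items, F_prev, k):
    """Enumerate every k-subset of items and keep those whose
    (k-1)-subsets all appear in F_prev."""
    candidates = set()
    examined = 0
    for subset in combinations(sorted(items), k):
        examined += 1
        if all(frozenset(s) in F_prev for s in combinations(subset, k - 1)):
            candidates.add(frozenset(subset))
    return candidates, examined


items = range(1, 21)  # 20 hypothetical item ids
F2 = {frozenset(p) for p in [(1, 2), (1, 3), (2, 3), (4, 5)]}
C3, examined = naive_candidates(items, F2, 3)
```

With 20 items the loop examines all 1140 triples to find a single candidate, even though only four pairs are frequent.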
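Merge two $(k-1)$-itemsets that agree on their first $k-2$ items (items listed in a fixed order):
$X = \{i_1, i_2, \ldots, i_{k-2}, i_{k-1}\}$
$X' = \{i_1, i_2, \ldots, i_{k-2}, i'_{k-1}\}$
$\text{Merge}(X, X') = \{i_1, i_2, \ldots, i_{k-2}, i_{k-1}, i'_{k-1}\}$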
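$C'_k = \{\text{Merge}(X, X') \mid X, X' \in F_{k-1}\}$
Claim: $C_k \subseteq C'_k$
Suppose $Y = \{i_1, i_2, \ldots, i_{k-1}, i_k\} \in C_k$
Every $(k-1)$-subset of $Y$ is in $F_{k-1}$, so
$X = \{i_1, i_2, \ldots, i_{k-2}, i_{k-1}\} \in F_{k-1}$ and
$X' = \{i_1, i_2, \ldots, i_{k-2}, i_k\} \in F_{k-1}$
Hence $Y = \text{Merge}(X, X') \in C'_k$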
$C'_k$ can be generated efficiently
Arrange $F_{k-1}$ in dictionary order
Split into blocks that differ only on the last element
Merge all pairs within each block
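The sort, block, and merge steps above can be sketched as follows. Itemsets are represented as sorted tuples, which is an assumption of this example, as are the function name `merge_candidates` and the sample `F2`.

```python
from itertools import combinations, groupby


def merge_candidates(F_prev):
    """Build C'_k from F_{k-1} by merging itemsets that share
    their first k-2 items."""
    # Dictionary order over itemsets written as sorted tuples.
    ordered = sorted(tuple(sorted(x)) for x in F_prev)
    candidates = set()
    # Each block holds the itemsets with a common prefix
    # (everything except the last item).
    for _prefix, block in groupby(ordered, key=lambda x: x[:-1]):
        for a, b in combinations(list(block), 2):
            # a precedes b in the block, so a's last item is smaller.
            candidates.add(a + (b[-1],))
    return candidates


F2 = {frozenset(p) for p in [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]}
C3_prime = merge_candidates(F2)
```

Here `C3_prime` contains `(1, 2, 4)` although `{2, 4}` is not in `F2`. The merged set may over-approximate $C_k$, which is harmless because infrequent candidates are discarded in the counting scan.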
Apriori algorithm
$C_1 = \{\{x\} \mid x \in I\}$
$F_1 = \{Z \mid Z \in C_1,\ Z.\text{count} \ge \sigma \cdot M\}$
For $k \in \{2, 3, \ldots\}$
  $C'_k = \{\text{Merge}(X, X') \mid X, X' \in F_{k-1}\}$
  $F_k = \{Z \mid Z \in C'_k,\ Z.\text{count} \ge \sigma \cdot M\}$
When do we stop?
k exceeds the size of the largest transaction
$F_k$ is empty
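Putting the loop and both stopping conditions together gives the sketch below. It is one possible rendering under stated assumptions: `min_count` stands in for the count threshold, itemsets are sorted tuples, and the function name `apriori` and the toy data are illustrative.

```python
from itertools import combinations, groupby


def apriori(transactions, min_count):
    """Return {k: F_k} for every level k that has a frequent itemset."""

    def frequent(candidates):
        # One scan of the transactions per level.
        counts = {
            c: sum(1 for t in transactions if set(c) <= t) for c in candidates
        }
        return {c for c, n in counts.items() if n >= min_count}

    items = sorted(set().union(*transactions))
    levels = {1: frequent({(x,) for x in items})}
    max_len = max(len(t) for t in transactions)

    k = 2
    # Stop when k exceeds the largest transaction or F_{k-1} is empty.
    while k <= max_len and levels[k - 1]:
        merged = set()
        blocks = groupby(sorted(levels[k - 1]), key=lambda x: x[:-1])
        for _prefix, block in blocks:
            for a, b in combinations(list(block), 2):
                merged.add(a + (b[-1],))
        levels[k] = frequent(merged)
        k += 1
    return {level: f for level, f in levels.items() if f}


# Hypothetical toy transactions.
toy = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
levels_strict = apriori(toy, 3)  # level 3 has no frequent itemset
levels_loose = apriori(toy, 2)   # stops because k exceeds transaction size
```

With `min_count = 3` the level-3 set comes out empty, and with `min_count = 2` the loop ends because no transaction has four items.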
Association rules
For every frequent itemset $Z$
Enumerate all pairs $X, Y \subseteq Z$, $X \cap Y = \emptyset$
Check $\frac{(X \cup Y).\text{count}}{X.\text{count}} \ge \chi$
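A direct rendering of this enumeration is sketched below. The toy transactions, the helper names, and the confidence threshold `min_conf` are assumptions for illustration. The itemset `Z` used here occurs in two transactions, so it counts as frequent only under an assumed count threshold of 2.

```python
from itertools import combinations

# Hypothetical toy transactions; not data from the lecture.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def nonempty_subsets(items):
    ordered = sorted(items)
    return [
        frozenset(c)
        for r in range(1, len(ordered) + 1)
        for c in combinations(ordered, r)
    ]


def naive_rules(Z, min_conf):
    """Try every disjoint pair of nonempty X, Y inside Z."""
    rules, examined = [], 0
    for X in nonempty_subsets(Z):
        for Y in nonempty_subsets(Z - X):
            examined += 1
            if count(X | Y) / count(X) >= min_conf:
                rules.append((X, Y))
    return rules, examined


Z = frozenset({"bread", "butter", "milk"})
rules, examined = naive_rules(Z, min_conf=0.6)
```

For this three-item `Z` the loop examines 12 pairs, and the same pairs are examined again when the two-item subsets of `Z` are processed as frequent itemsets in their own right.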
Can we do better?
Sufficient to check all partitions of Z
Suppose $X, Y \subseteq Z$, $X \to Y$ is a valid association rule, but $X \cup Y$ is a proper subset of $Z$
$X \cup Y = Z' \subsetneq Z$
$Z'$ is also a frequent itemset (a priori)
$X, Y$ partition $Z'$
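Since every valid rule partitions some frequent itemset, it is enough to loop over partitions. A minimal sketch follows, assuming a brute-force list of frequent itemsets, a count threshold of 2, and a confidence threshold `min_conf`. All names and data are illustrative.

```python
from itertools import combinations

# Hypothetical toy transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def partition_rules(frequent_itemsets, min_conf):
    """For each frequent Z, test only the splits Z = X ∪ Y with X, Y disjoint."""
    rules = set()
    for Z in frequent_itemsets:
        for r in range(1, len(Z)):
            for left in combinations(sorted(Z), r):
                X = frozenset(left)
                Y = Z - X
                if count(Z) / count(X) >= min_conf:
                    rules.add((X, Y))
    return rules


items = sorted(set().union(*transactions))
# Brute-force frequent itemsets for the toy data (assumed count threshold 2).
frequent_itemsets = [
    frozenset(c)
    for r in range(1, len(items) + 1)
    for c in combinations(items, r)
    if count(frozenset(c)) >= 2
]
rules = partition_rules(frequent_itemsets, min_conf=0.6)
```

Each rule is now tested exactly once, at the frequent itemset $X \cup Y$ that it partitions.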
Association rules
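Suppose $X \to Y$ is a valid rule and $y \in Y$
Know $\frac{(X \cup Y).\text{count}}{X.\text{count}} \ge \chi$
Check $\frac{(X \cup Y).\text{count}}{(X \cup \{y\}).\text{count}} \ge \chi$
$X.\text{count} \ge (X \cup \{y\}).\text{count}$, always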
The second fraction has the same numerator and a denominator that is no larger, so
$(X \cup \{y\}) \to Y \setminus \{y\}$ is also a valid rule
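The argument can be written as a single chain of inequalities. The symbol $\chi$ for the confidence threshold is an assumption of this sketch. The numerator does not change because $(X \cup \{y\}) \cup (Y \setminus \{y\}) = X \cup Y$ whenever $y \in Y$.

```latex
\[
\frac{\bigl((X \cup \{y\}) \cup (Y \setminus \{y\})\bigr).\text{count}}
     {(X \cup \{y\}).\text{count}}
= \frac{(X \cup Y).\text{count}}{(X \cup \{y\}).\text{count}}
\;\ge\; \frac{(X \cup Y).\text{count}}{X.\text{count}}
\;\ge\; \chi
\]
```

The middle inequality uses $X.\text{count} \ge (X \cup \{y\}).\text{count}$: any transaction containing $X \cup \{y\}$ also contains $X$.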