
Mining Frequent Patterns

without Candidate Generation


Association Rule Mining
 FP-growth is an improvement over the
Apriori Algorithm, which is widely used for
frequent pattern mining.
 It is an analytical process that finds
frequent patterns or associations in data sets.
For example, grocery store transaction data
might show a frequent pattern that people
usually buy chips and beer together.

2
Apriori Algorithm
 The Apriori Algorithm produces frequent
patterns by generating candidate itemsets and
keeping those whose support meets a threshold,
the “minimum support count”.
 It greatly reduces the number of candidate
itemsets with one simple principle:
 If an itemset is frequent, then all of its
subsets must also be frequent.

3
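The generate-and-test idea above can be sketched in Python (a minimal, illustrative sketch of level-wise candidate generation with subset pruning, not an optimized Apriori implementation; the function name and data layout are assumptions):

```python
from itertools import combinations

def apriori_frequent(transactions, min_support):
    """Level-wise Apriori: join frequent (k-1)-itemsets into k-candidates,
    prune candidates with an infrequent subset, then count supports."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    current = {frozenset([i]) for i, c in counts.items() if c >= min_support}
    frequent = {s: counts[next(iter(s))] for s in current}
    k = 2
    while current:
        # join: union pairs of frequent (k-1)-itemsets into k-candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # test: one full database scan per level to count supports
        supp = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c for c, n in supp.items() if n >= min_support}
        frequent.update({c: supp[c] for c in current})
        k += 1
    return frequent
```

Note the "test" line: one full scan of the database per level — this is exactly the cost that FP-growth avoids.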
Apriori Algorithm
 The Apriori Algorithm is an influential algorithm for
mining frequent itemsets for Boolean association
rules.
 It uses prior knowledge of frequent itemset
properties (the Apriori property).
 It uses frequent k-itemsets to find frequent
(k+1)-itemsets.
 It is based on three concepts: frequent itemsets,
the Apriori property, and join operations.
4
Apriori Algorithm
 Advantages :
 Easy to understand and implement.
 Can be easily parallelized.

 Uses large itemset property.

 Disadvantages:
 Requires many database scans.
 Assumes transaction database is memory resident.

5
 The Apriori Algorithm has a major shortfall. Using
Apriori requires multiple scans of the database
to check the support count of each item and
itemset.
 When the database is huge, this costs a
significant amount of disk I/O and computing
power.
 Therefore the FP-Growth algorithm was created
to overcome this shortfall. It scans the
database only twice and uses a tree structure (the FP-
tree) to store all the information.
6
FP Tree
 The root represents null,
 each node represents an
item,
 the association of the nodes in
an itemset is the order
maintained while forming
the tree.
 The FP-tree is concise and is
used to directly generate
large itemsets.
 Once an FP-tree has been
constructed, FP-Growth uses a
recursive divide-and-
conquer approach to mine
the frequent itemsets.
7
FP-Tree

8
Introduction
 Terminology
 Apriori-like Algorithms
 Generate-and-Test
 Cost Bottleneck
 FP-Tree and FP-Growth Algorithm
 FP-Tree: Frequent Pattern Tree
 FP-Growth: Mining frequent patterns with FP-Tree

11
Terminology
 Item set
 A set of items: I = {a1, a2, ……, am}
 Transaction database
 DB = <T1, T2, ……, Tn>
 Pattern
 A set of items: A
 Support
 The number of transactions containing A in DB
 Frequent pattern
 A’s support ≥ minimum support threshold ξ
 Frequent Pattern Mining Problem
 The problem of finding the complete set of frequent patterns
12
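Support as defined above is just a count over the transaction database; a one-line sketch (the helper name and toy data are illustrative):

```python
def support(pattern, db):
    """Support of pattern A: the number of transactions in DB containing A."""
    return sum(1 for t in db if set(pattern) <= set(t))

# toy DB: pattern {a, c} occurs in the first two transactions
db = [["f", "a", "c"], ["a", "b", "c"], ["b", "f"]]
assert support(["a", "c"], db) == 2
```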
FP-Tree and FP-Growth Algorithm
 FP-Tree: Frequent Pattern Tree
Compact representation of the DB without information loss.
 Easy to traverse, can quickly find out patterns associated with
a certain item.
 Well-ordered by item frequency.
 FP-Growth Algorithm
 Start mining from length-1 patterns
 Recursively do the following
  Construct its conditional FP-tree
  Concatenate patterns from the conditional FP-tree with the suffix
 Divide-and-Conquer mining technique
14
Outline
 Introduction
 Constructing FP-Tree
 Example 1
 Mining Frequent Patterns using FP-Tree
 Example 2
 Performance Evaluation
 Discussions

15
FP-Tree Definition
 Three components:
 One root: labeled as “null”
 A set of item prefix subtrees
 A frequent-item header table

[Figure: example FP-tree. Paths from the root: f:4 → c:3 → a:3 → m:2 → p:2, with side branches a:3 → b:1 → m:1 and f:4 → b:1, plus a second root branch c:1 → b:1 → p:1. The header table lists items f, c, a, b, m, p, each with the head of its node-links.]
16
FP-Tree Definition (cont.)
 Each node in the item prefix subtree consists of
three fields:
 item-name
 node-link

 count

 Each entry in the frequent-item header table


consists of two fields:
 item-name
 head of node-link

17
Example 1: FP-Tree Construction
 The transaction database used (first two columns only):
TID Items Bought
100 f,a,c,d,g,i,m,p
200 a,b,c,f,l,m,o
300 b,f,h,j,o
400 b,c,k,s,p
500 a,f,c,e,l,p,m,n

minimum support threshold ξ = 3

18
Example 1 (cont.)
 First Scan: //count and sort
 count the frequencies of each item
 collect length-1 frequent items, then sort them in
support descending order into L, frequent item list.
L = {(f:4), (c:4), (a:3), (b:3), (m:3), (p:3)}
TID Items Bought (Ordered) Frequent Items
100 f,a,c,d,g,i,m,p f,c,a,m,p
200 a,b,c,f,l,m,o f,c,a,b,m
300 b,f,h,j,o f,b
400 b,c,k,s,p c,b,p
500 a,f,c,e,l,p,m,n f,c,a,m,p
19
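The first scan can be sketched in Python (a minimal sketch with illustrative names; ties among equally frequent items may be broken in a different order than the slide's L = {(f:4), (c:4), (a:3), (b:3), (m:3), (p:3)}):

```python
from collections import Counter

def first_scan(db, min_sup):
    """Scan 1: count item frequencies, keep the frequent items sorted in
    support-descending order (the list L), and reorder each transaction
    to contain only its frequent items in L's order."""
    counts = Counter(item for t in db for item in t)
    L = [item for item, c in counts.most_common() if c >= min_sup]
    rank = {item: i for i, item in enumerate(L)}
    ordered = [sorted((i for i in t if i in rank), key=rank.get)
               for t in db]
    return L, ordered
```

On the sample database, transaction 300 (b,f,h,j,o) keeps only f and b, ordered as (f, b) since f has higher support.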
Example 1 (cont.)
 Second Scan://create the tree and header table
 create the root, label it as “null”
 for each transaction Trans, do
 select and sort the frequent items in Trans
increase node counts or create new nodes
 If a prefix node already exists, increase its count by 1;
 if there is no prefix node, create it and set its count to 1.
 build the item header table
 nodes with the same item-name are linked in sequence
via node-links
20
Example 1 (cont.)
The building process of the tree

[Figure: create the root; after trans 1 (f,c,a,m,p) the tree holds the path f:1 → c:1 → a:1 → m:1 → p:1; after trans 2 (f,c,a,b,m) the shared prefix becomes f:2 → c:2 → a:2 and a branch b:1 → m:1 hangs off a:2; after trans 3 (f,b) f:3 gains a child b:1.]
21
Example 1 (cont.)
The building process of the tree (cont.)

[Figure: after trans 4 (c,b,p) the root gains a second branch c:1 → b:1 → p:1; after trans 5 (f,c,a,m,p) the counts along the main path become f:4 → c:3 → a:3 → m:2 → p:2.]
22
Example 1 (cont.)
Build the item header table

[Figure: the completed FP-tree with its header table. Each header entry (f, c, a, b, m, p) points to the first node carrying that item; nodes with the same item-name are chained via node-links.]
23
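The second scan can be sketched with a small node class (an illustrative sketch; the fields mirror the slide's node definition — item-name, count, node-link — plus a children map and parent pointer for navigation):

```python
class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}   # item -> FPNode
        self.next = None     # node-link to the next node with the same item

def build_fptree(ordered_db):
    """Scan 2: insert each ordered transaction into the tree.
    Shared prefix nodes get their counts increased; new nodes are
    created with count 1 and linked into the header table's chain."""
    root = FPNode(None, None)
    header = {}              # item -> head of its node-link chain
    for trans in ordered_db:
        node = root
        for item in trans:
            child = node.children.get(item)
            if child is None:            # no prefix node: create and link it
                child = FPNode(item, node)
                node.children[item] = child
                child.next = header.get(item)
                header[item] = child
            child.count += 1             # shared prefix: just bump the count
            node = child
    return root, header
```

Feeding it the five ordered transactions of Example 1 reproduces the tree on the slide (f:4 and c:1 under the root, m:2 deep in the main branch, and p's two nodes summing to 3).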
FP-Tree Properties
 Completeness
 Each transaction’s frequent items are
mapped to one path.
 Prefix sharing causes no ambiguity, since only
paths starting from the root represent transactions.
 Compactness
 Number of nodes bounded by overall occurrence of
frequent items.
 Height of tree bounded by maximal number of
frequent items in any transaction.

24
FP-Tree Properties (cont.)
 Traversal Friendly (for mining task)
 For any frequent item ai, all the possible frequent
patterns that contain ai can be obtained by
following ai’s node-links.
 This property is important for divide-and-conquer.
It assures the soundness and completeness of
problem reduction.

25
Outline
 Introduction
 Constructing FP-Tree
 Example 1
 Mining Frequent Patterns using FP-Tree
 Example 2
 Performance Evaluation
 Discussions

31
FP-Growth Algorithm
 Functionality:
 Mining frequent patterns using FP-Tree generated before
 Input:
 FP-tree constructed earlier
 minimum support threshold ξ

 Output:
 The complete set of frequent patterns
 Main algorithm:
 Call FP-growth(FP-tree, null)
32
FP-growth(Tree, α)
Procedure FP-growth(Tree, α)
{
  if (Tree contains only a single path P)
  {   for each combination β of the nodes in P
      {   generate pattern β ∪ α;
          support = min(support of all nodes in β);
      }
  }
  else   // Tree contains more than one path
  {   for each ai in the header of Tree
      {   generate pattern β = ai ∪ α;
          β.support = ai.support;
          construct β’s conditional pattern base;
          construct β’s conditional FP-tree Treeβ;
          if (Treeβ ≠ Φ)
              FP-growth(Treeβ, β);
      }
  }
}
33
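A compact runnable version of this recursion (a sketch, with illustrative names, that represents each conditional FP-tree implicitly as a weighted list of prefix paths rather than as a physical tree — the divide-and-conquer structure is the same):

```python
from collections import Counter

def fp_growth(db, min_sup):
    """Mine the complete set of frequent patterns from db (a list of
    transactions) by FP-growth-style recursive problem reduction."""
    counts = Counter(i for t in db for i in t)
    items = [i for i, c in counts.most_common() if c >= min_sup]
    rank = {i: r for r, i in enumerate(items)}
    # each transaction becomes an ordered list of its frequent items, weight 1
    weighted = [([i for i in sorted(set(t), key=lambda x: rank.get(x, len(rank)))
                  if i in rank], 1) for t in db]
    results = {}
    _mine(weighted, min_sup, frozenset(), results)
    return results

def _mine(base, min_sup, suffix, results):
    # count item occurrences in the (weighted) conditional pattern base
    counts = Counter()
    for path, w in base:
        for i in path:
            counts[i] += w
    for item, c in counts.items():
        if c < min_sup:
            continue
        pattern = suffix | {item}      # β = ai ∪ α
        results[pattern] = c
        # item's conditional pattern base: the prefix before item in each path
        cond = [(path[:path.index(item)], w) for path, w in base
                if item in path and path.index(item) > 0]
        _mine(cond, min_sup, pattern, results)
```

On the Example 1 database with ξ = 3 this yields the same patterns derived on the following slides, e.g. (p:3), (cp:3), and (fcam:3).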
Example 2
 Start from the bottom of the header table: node p
 Two paths, transformed into prefix paths
 p’s conditional pattern base
 {(f:2, c:2, a:2, m:2), (c:1, b:1)}
 p’s conditional FP-tree
 Only one branch (c:3), giving
pattern (cp:3)
 Patterns:
 (p:3)
 (cp:3)
34
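Collecting a conditional pattern base like p's by chasing node-links can be sketched as follows (self-contained, so it repeats a minimal node/insert helper; all names are illustrative — each prefix path is returned with the count of the suffix node it leads to):

```python
class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children, self.next = {}, None

def insert(root, header, trans):
    """Insert one ordered transaction, maintaining header node-links."""
    node = root
    for item in trans:
        if item not in node.children:
            child = Node(item, node)
            child.next, header[item] = header.get(item), child
            node.children[item] = child
        node = node.children[item]
        node.count += 1

def conditional_pattern_base(header, item):
    """Follow item's node-links; for each node, walk parent pointers up to
    the root to recover its prefix path, weighted by the node's count."""
    base, node = [], header.get(item)
    while node:
        path, p = [], node.parent
        while p and p.item is not None:
            path.append(p.item)
            p = p.parent
        if path:
            base.append((list(reversed(path)), node.count))
        node = node.next
    return base
```

For p this yields the paths (c, b) with count 1 and (f, c, a, m) with count 2 — the same base the slide writes as {(f:2, c:2, a:2, m:2), (c:1, b:1)}.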
Example 2 (cont.)
 Continue with node m
 Two paths
 m’s conditional pattern base
 {(f:2, c:2, a:2), (f:1, c:1, a:1, b:1)}
 m’s conditional FP-tree:
 (f:3, c:3, a:3)
 Call mine(<f:3, c:3, a:3> | m)
 Patterns:
 (m:3)
 see next slide
35
mine(<f:3, c:3, a:3> | m)
 node a:
 (am:3)
 call mine(<f:3, c:3> | am)
 (cam:3)
 call mine(<f:3> | cam)
 (fcam:3)
 (fam:3)
 node c:
 (cm:3)
 call mine(<f:3> | cm)
 (fcm:3)
 node f:
 (fm:3)
 All the patterns: (m:3, am:3, cm:3, fm:3, cam:3, fam:3, fcm:3, fcam:3)
 Conclusion: a single-path FP-tree can be mined by outputting all the
combinations of the items in the path.

[Figure: conditional FP-tree of “m”: root → f:3 → c:3 → a:3, with header entries f, c, a.]
36
Example 2 (cont.)
 Continue with node b
 Three paths
 b’s conditional pattern base
 {(f:1, c:1, a:1), (f:1), (c:1)}
 b’s conditional FP-tree
 Φ (no item reaches the minimum support)
 Patterns:
 (b:3)
37
Example 2 (cont.)
 Continue with node a
 One path
 a’s conditional pattern base
 {(f:3, c:3)}
 a’s conditional FP-tree
 {(f:3, c:3)}
 Patterns:
 (a:3)
 (ca:3)
 (fa:3)
 (fca:3)
38
Example 2 (cont.)
 Continue with node c
 Two paths
 c’s conditional pattern base
 {(f:3)}
 c’s conditional FP-tree
 {(f:3)}
 Patterns:
 (c:4)
 (fc:3)
39
Example 2 (cont.)
 Continue with node f
 One path
 f’s conditional pattern base
 Φ
 f’s conditional FP-tree
 Φ
 Patterns:
 (f:4)
40
Example 2 (cont.)
 Final results:

item  conditional pattern base                   conditional FP-tree
p     {(f:2, c:2, a:2, m:2), (c:1, b:1)}         {(c:3)} | p
m     {(f:2, c:2, a:2), (f:1, c:1, a:1, b:1)}    {(f:3, c:3, a:3)} | m
b     {(f:1, c:1, a:1), (f:1), (c:1)}            Φ
a     {(f:3, c:3)}                               {(f:3, c:3)} | a
c     {(f:3)}                                    {(f:3)} | c
f     Φ                                          Φ

41
FP-Growth Properties
 Property 3.2 : Prefix path property
To calculate the frequent patterns for a node ai in a
path P, only the prefix subpath of node ai in P needs
to be accumulated, and the frequency count of
every node in the prefix path should carry the same
count as node ai.
 Lemma 3.1 : Fragment growth
 Let α be an itemset in DB, B be α’s conditional
pattern base, and β be an itemset in B. Then the
support of α ∪ β in DB is equivalent to the support of
β in B.
42
FP-Growth Properties (cont.)
 Corollary 3.1 (Pattern growth)
Let α be a frequent itemset in DB, B be α's conditional
pattern base, and β be an itemset in B. Then α ∪ β is frequent
in DB if and only if β is frequent in B.
 Lemma 3.2 (Single FP-tree path pattern generation)
 Suppose an FP-tree T has a single path P. The complete set
of the frequent patterns of T can be generated by the
enumeration of all the combinations of the subpaths of P
with the support being the minimum support of the items
contained in the subpath.

43
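Lemma 3.2 can be sketched directly (an illustrative helper; `path` is assumed to be the single path as a list of (item, count) pairs, and `suffix` the pattern the tree is conditioned on):

```python
from itertools import combinations

def single_path_patterns(path, suffix=()):
    """Enumerate every non-empty combination of the single path's items;
    each pattern's support is the minimum count among the chosen nodes."""
    patterns = {}
    for k in range(1, len(path) + 1):
        for combo in combinations(path, k):
            items = frozenset(i for i, _ in combo) | frozenset(suffix)
            patterns[items] = min(c for _, c in combo)
    return patterns
```

Applied to m's conditional FP-tree <f:3, c:3, a:3> with suffix m, this yields the seven patterns am, cm, fm, cam, fam, fcm, fcam (all with support 3) from the earlier slide.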
Outline
 Introduction
 Constructing FP-Tree
 Example 1
 Mining Frequent Patterns using FP-Tree
 Example 2
 Performance Evaluation
 Discussions

44
Performance Evaluation:
FP-Tree vs. Apriori
 Scalability with Support Threshold

45
Performance Evaluation:
FP-Tree vs. Apriori (Cont.)
Per-item runtime actually decreases as the support
threshold decreases.

46
Performance Evaluation:
FP-Tree vs. Apriori (Cont.)
 Scalability with DB size.

47
