FP Tree
Apriori Algorithm
The Apriori algorithm produces frequent
patterns by generating candidate itemsets and
keeping those whose support meets a user-specified
threshold, the minimum support count.
It greatly reduces the number of candidate itemsets
by one simple principle:
If an itemset is frequent, then all of its
subsets must also be frequent (equivalently, if an
itemset is infrequent, none of its supersets can be frequent).
Apriori Algorithm
The Apriori algorithm is an influential algorithm for
mining frequent itemsets for Boolean association
rules.
It uses prior (a priori) knowledge of frequent-itemset
properties.
It uses frequent k-itemsets to find candidate
(k+1)-itemsets.
It is based on three concepts: frequent itemsets,
the Apriori property, and join operations.
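The level-wise generate-and-test loop described above can be sketched in Python. This is an illustrative, unoptimized sketch; the function and variable names are my own, not from the original algorithm's pseudocode:

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Illustrative Apriori sketch (unoptimized).

    transactions: list of item sets; min_count: minimum support count.
    Returns {frozenset(itemset): support count} for all frequent itemsets.
    """
    def scan(candidates):
        # One full database scan per candidate length.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Length-1 frequent itemsets.
    items = {frozenset([i]) for t in transactions for i in t}
    freq = {c: s for c, s in scan(items).items() if s >= min_count}
    result, k = dict(freq), 1
    while freq:
        # Join: combine frequent k-itemsets into (k+1)-candidates.
        cands = {a | b for a in freq for b in freq if len(a | b) == k + 1}
        # Prune (Apriori property): every k-subset must itself be frequent.
        cands = {c for c in cands
                 if all(frozenset(s) in freq for s in combinations(c, k))}
        freq = {c: s for c, s in scan(cands).items() if s >= min_count}
        result.update(freq)
        k += 1
    return result
```

Note that `scan` is called once per itemset length, which is exactly the repeated-database-scan cost the next slides criticize.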
Apriori Algorithm
Advantages:
Easy to understand and implement.
Can be easily parallelized.
Disadvantages:
Requires many database scans.
Assumes transaction database is memory resident.
The Apriori algorithm has a major shortfall: it
requires multiple scans of the database
to check the support count of each candidate
itemset.
When the database is huge, this costs a
significant amount of disk I/O and computing
power.
The FP-Growth algorithm was therefore created
to overcome this shortfall. It scans the
database only twice and uses a tree structure
(the FP-tree) to store all the information.
FP Tree
The root represents null;
each other node represents an
item.
The links between nodes record the
association of the items in the
itemsets; the item order is
maintained while forming
the tree.
The FP-tree is concise and is
used to generate large
itemsets directly.
Once an FP-tree has been
constructed, a recursive
divide-and-conquer approach
mines the frequent itemsets.
Introduction
Terminology
Apriori-like Algorithms
Generate-and-Test
Cost Bottleneck
Terminology
Item set
A set of items: I = {a1, a2, …, am}
Transaction database
DB = <T1, T2, …, Tn>
Pattern
A set of items: A
Support
The number of transactions containing A in DB
Frequent pattern
A pattern A whose support ≥ the minimum support threshold ξ
Frequent pattern mining problem
The problem of finding the complete set of frequent patterns
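These definitions can be made concrete with a short Python sketch, using the five-transaction database from Example 1 later in the deck (the `support` helper name is my own):

```python
def support(pattern, db):
    """Support of pattern A: the number of transactions in DB containing A."""
    return sum(1 for t in db if pattern <= t)

# The five-transaction database of Example 1:
DB = [set('facdgimp'), set('abcflmo'), set('bfhjo'),
      set('bcksp'), set('afcelpmn')]
xi = 3  # minimum support threshold

assert support({'f', 'c', 'a'}, DB) == 3   # frequent: 3 >= xi
assert support({'d'}, DB) == 1             # infrequent: below xi
```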
FP-Tree and FP-Growth Algorithm
FP-Tree: Frequent Pattern Tree
A compact representation of the DB without information loss.
Easy to traverse, can quickly find out patterns associated with
a certain item.
Well-ordered by item frequency.
FP-Growth Algorithm
Start mining from length-1 patterns.
Recursively do the following:
construct the pattern's conditional FP-tree,
then concatenate patterns from the conditional FP-tree with the suffix.
FP-Tree Definition
Three components:
One root, labeled "null"
A set of item-prefix subtrees, in which each node carries an item and a count
A frequent-item header table
(figure: an example FP-tree with branches f:4 and c:1 under the root)
Example 1: FP-Tree Construction
The transaction database used (first two columns only):
TID Items Bought
100 f,a,c,d,g,i,m,p
200 a,b,c,f,l,m,o
300 b,f,h,j,o
400 b,c,k,s,p
500 a,f,c,e,l,p,m,n
Example 1 (cont.)
First Scan: //count and sort
count the frequencies of each item
collect the length-1 frequent items, then sort them in
support-descending order into L, the frequent-item list.
L = {(f:4), (c:4), (a:3), (b:3), (m:3), (p:3)}
TID Items Bought (Ordered) Frequent Items
100 f,a,c,d,g,i,m,p f,c,a,m,p
200 a,b,c,f,l,m,o f,c,a,b,m
300 b,f,h,j,o f,b
400 b,c,k,s,p c,b,p
500 a,f,c,e,l,p,m,n f,c,a,m,p
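The first scan can be sketched as follows. One caveat: ties among equally frequent items (here f and c, both with count 4) may be broken in any fixed order, so the exact order of L can differ from the slide's as long as it is used consistently:

```python
from collections import Counter

def first_scan(db, xi):
    """First scan sketch: count item frequencies, keep items with
    support >= xi sorted by descending count (the list L), and reorder
    each transaction's frequent items according to L."""
    counts = Counter(i for t in db for i in t)
    # most_common sorts by descending count; tie order is arbitrary here.
    L = [i for i, c in counts.most_common() if c >= xi]
    rank = {i: r for r, i in enumerate(L)}
    ordered = [sorted((i for i in t if i in rank), key=rank.get) for t in db]
    return L, ordered

db = [set('facdgimp'), set('abcflmo'), set('bfhjo'),
      set('bcksp'), set('afcelpmn')]
L, ordered = first_scan(db, 3)
```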
Example 1 (cont.)
Second Scan://create the tree and header table
create the root, label it as “null”
for each transaction Trans, do:
select and sort the frequent items in Trans,
then increase node counts or create new nodes along the corresponding path
(figure: tree snapshots after each insertion; the f-branch grows f:1 → f:2 → f:3 → f:4, and a c:1 branch appears under the root)
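The second scan can be sketched with a tiny node class (a minimal structure of my own; the header table and node-links are omitted for brevity):

```python
class Node:
    """FP-tree node: an item label, a count, and children keyed by item."""
    def __init__(self, item=None):
        self.item, self.count, self.children = item, 0, {}

def insert(root, ordered_items):
    """Insert one transaction's ordered frequent items, sharing prefixes
    with existing paths and incrementing counts along the way."""
    node = root
    for item in ordered_items:
        child = node.children.get(item)
        if child is None:
            child = node.children[item] = Node(item)
        child.count += 1
        node = child

root = Node()  # the "null" root
for t in [list('fcamp'), list('fcabm'), list('fb'),
          list('cbp'), list('fcamp')]:
    insert(root, t)
```

After the five insertions, the root's f-child has count 4 and its separate c-child (from transaction 400) has count 1, matching the tree on the slide.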
FP-Tree Properties
Completeness
Each transaction that contains a frequent pattern is
mapped to a path.
Prefix sharing does not cause path ambiguity, since
only a path starting from the root represents a transaction.
Compactness
Number of nodes bounded by overall occurrence of
frequent items.
Height of tree bounded by maximal number of
frequent items in any transaction.
FP-Tree Properties (cont.)
Traversal Friendly (for mining task)
For any frequent item ai, all the possible frequent
patterns that contain ai can be obtained by
following ai’s node-links.
This property is important for divide-and-conquer.
It assures the soundness and completeness of
problem reduction.
Outline
Introduction
Constructing FP-Tree
Example 1
Mining Frequent Patterns using FP-Tree
Example 2
Performance Evaluation
Discussions
FP-Growth Algorithm
Functionality:
Mining frequent patterns using the FP-tree generated before
Input:
FP-tree constructed earlier
minimum support threshold ξ
Output:
The complete set of frequent patterns
Main algorithm:
Call FP-growth(FP-tree, null)
FP-growth(Tree, α)
Procedure FP-growth(Tree, α)
{
  if (Tree contains only a single path P)
  {
    for each combination β of the nodes in P
    {
      generate pattern β ∪ α;
      β.support = min(support of all nodes in β);
    }
  }
  else  // Tree contains more than one path
  {
    for each ai in the header table of Tree
    {
      generate pattern β = ai ∪ α;
      β.support = ai.support;
      construct β's conditional pattern base;
      construct β's conditional FP-tree Treeβ;
      if (Treeβ ≠ Φ)
        FP-growth(Treeβ, β);
    }
  }
}
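A compact runnable sketch of this procedure in Python. For brevity it represents each conditional FP-tree implicitly as its conditional pattern base, a list of (prefix-path, count) pairs, rather than as an explicit tree; the recursion structure matches the pseudocode, but a real implementation would build and traverse the conditional FP-tree:

```python
from collections import defaultdict

def fp_growth(base, xi, suffix=frozenset()):
    """FP-growth sketch over a conditional pattern base.

    base: list of (ordered_item_list, count) pairs; for the top-level call
    this is just the ordered transactions, each with count 1.
    Yields (frequent_pattern, support) pairs.
    """
    # Support of each item within this (conditional) base.
    counts = defaultdict(int)
    for items, n in base:
        for i in items:
            counts[i] += n
    for item, sup in counts.items():
        if sup < xi:
            continue
        beta = suffix | {item}  # pattern beta = ai ∪ alpha
        yield beta, sup
        # beta's conditional pattern base: prefix paths ending before item.
        cond = []
        for items, n in base:
            if item in items:
                prefix = items[:items.index(item)]
                if prefix:
                    cond.append((prefix, n))
        yield from fp_growth(cond, xi, beta)

# Ordered frequent items of the five transactions from Example 1:
ordered = [list('fcamp'), list('fcabm'), list('fb'),
           list('cbp'), list('fcamp')]
patterns = dict(fp_growth([(t, 1) for t in ordered], 3))
```

On this database the result includes, for example, (p:3), (cp:3), and (fcam:3), in agreement with Example 2.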
Example 2
Start from the bottom of the header table: node p
Two transformed prefix paths
p's conditional pattern base:
{(f:2, c:2, a:2, m:2), (c:1, b:1)}
Example 2 (cont.)
Continue with node b
Three paths
b's conditional pattern base:
{(f:1, c:1, a:1), (f:1), (c:1)}
b's conditional FP-tree: Φ
(no item reaches the minimum support ξ = 3)
Continue with node c
c's conditional pattern base and conditional FP-tree:
{(f:3)}
Patterns: (c:4), (fc:3)
Example 2 (cont.)
Continue with node f
One path
f's conditional pattern base: Φ
f's conditional FP-tree: Φ
FP-Growth Properties
Property 3.2 (Prefix path property)
To calculate the frequent patterns for a node ai in a
path P, only the prefix subpath of node ai in P needs
to be accumulated, and the frequency count of
every node in the prefix path should carry the same
count as node ai.
Lemma 3.1 (Fragment growth)
Let α be an itemset in DB, B be α's conditional
pattern base, and β be an itemset in B. Then the
support of α ∪ β in DB is equivalent to the support of
β in B.
FP-Growth Properties (cont.)
Corollary 3.1 (Pattern growth)
Let α be a frequent itemset in DB, B be α's conditional
pattern base, and β be an itemset in B. Then α ∪ β is frequent
in DB if and only if β is frequent in B.
Lemma 3.2 (Single FP-tree path pattern generation)
Suppose an FP-tree T has a single path P. The complete set
of the frequent patterns of T can be generated by the
enumeration of all the combinations of the subpaths of P
with the support being the minimum support of the items
contained in the subpath.
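Lemma 3.2 can be illustrated directly. In the paper's example, m's conditional FP-tree is the single path <f:3, c:3, a:3>; enumerating all non-empty combinations of its nodes yields seven patterns, each with support 3 (concatenation with the suffix m is omitted here, and the helper name is my own):

```python
from itertools import combinations

def single_path_patterns(path):
    """Lemma 3.2 sketch: a single-path FP-tree yields one pattern per
    non-empty combination of its nodes; the pattern's support is the
    minimum count among the chosen nodes."""
    out = {}
    for r in range(1, len(path) + 1):
        for combo in combinations(path, r):
            items = frozenset(i for i, _ in combo)
            out[items] = min(c for _, c in combo)
    return out

# m's conditional FP-tree is the single path <f:3, c:3, a:3>:
pats = single_path_patterns([('f', 3), ('c', 3), ('a', 3)])
```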
Outline
Introduction
Constructing FP-Tree
Example 1
Mining Frequent Patterns using FP-Tree
Example 2
Performance Evaluation
Discussions
Performance Evaluation:
FP-Tree vs. Apriori
Scalability with Support Threshold
Performance Evaluation:
FP-Tree vs. Apriori (Cont.)
Per-item runtime actually decreases as the support
threshold decreases.
Performance Evaluation:
FP-Tree vs. Apriori (Cont.)
Scalability with DB size.