FP-Growth
In data mining, finding frequent patterns in large databases is an important task that has been studied extensively in recent years. Unfortunately, it is computationally expensive, especially when many patterns exist.
The FP-Growth algorithm, proposed by Han, is an efficient and scalable method for mining the complete set of frequent patterns by pattern fragment growth. It uses an extended prefix-tree structure, called the frequent-pattern tree (FP-tree), to store compressed, crucial information about frequent patterns. In his study, Han showed that his method outperforms other popular methods for mining frequent patterns, such as the Apriori algorithm and TreeProjection. Later work showed that FP-Growth also performs better than other methods, including Eclat and Relim. The popularity and efficiency of FP-Growth have motivated many studies that propose variations to improve its performance.
o After first compressing the input database into an FP-tree, the algorithm divides the compressed database into a set of conditional databases, each associated with one frequent pattern, and mines each of them separately.
Using this divide-and-conquer strategy, FP-Growth reduces search costs by recursively looking for short patterns and then concatenating them into long frequent patterns.
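The steps above can be sketched in Python as follows. This is a minimal illustration, not Han's original implementation: the names `Node`, `build_tree`, and `fp_growth` are hypothetical, items are assumed to be mutually comparable (e.g. strings) so that ties in support are broken deterministically, and each transaction is passed as an `(items, count)` pair so the same function can mine both the original database and the conditional databases derived from it.

```python
from collections import defaultdict


class Node:
    """FP-tree node: an item, its count, a parent link, and children keyed by item."""

    def __init__(self, item, parent):
        self.item = item          # None marks the null root
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns the supports of the frequent items and a header table mapping each
    frequent item to every tree node that holds it.
    """
    # Scan 1: count the support of every item.
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = Node(None, None)
    header = defaultdict(list)
    # Scan 2: insert each transaction's frequent items in descending support order.
    for items, count in weighted_transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return frequent, header


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Recursively mine every frequent itemset; returns {frozenset: support}."""
    frequent, header = build_tree(weighted_transactions, min_support)
    patterns = {}
    for item, item_support in frequent.items():
        pattern = suffix | {item}
        patterns[pattern] = item_support
        # Conditional database for `item`: the prefix path above each of its nodes,
        # weighted by that node's count.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Grow the pattern with the frequent items of the conditional database.
        patterns.update(fp_growth(conditional, min_support, pattern))
    return patterns


transactions = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"],
                ["a", "d", "e"], ["a", "b", "c"]]
patterns = fp_growth([(t, 1) for t in transactions], min_support=2)
print(patterns[frozenset({"a", "d", "e"})])  # 2
```

Keying the result by `frozenset` keeps each itemset next to its support, and the recursion stops naturally when a conditional database contains no frequent items.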
For large databases, it may be impossible to hold the whole FP-tree in main memory. One strategy to cope with this problem is to partition the database into a set of smaller databases (called projected databases) and then construct an FP-tree from each of these smaller databases.
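One way to build such projected databases can be sketched as follows, assuming frequent items are ranked by descending support. The helper `project_database` is hypothetical: for every transaction, each frequent item receives the higher-ranked items that precede it, so any frequent itemset can be mined from exactly one partition (the one for its lowest-ranked item), and each partition could be turned into its own, smaller FP-tree.

```python
from collections import Counter


def project_database(transactions, min_support):
    """Partition a database into one projected database per frequent item.

    Hypothetical helper: frequent items are ranked by descending support (ties
    broken by item). For every transaction containing item i, the projected
    database of i receives the transaction's frequent items ranked before i.
    """
    support = Counter(item for t in transactions for item in set(t))
    ranked = sorted((i for i, s in support.items() if s >= min_support),
                    key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(ranked)}

    projected = {item: [] for item in ranked}
    for t in transactions:
        # Keep only frequent items, in rank order.
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        for pos, item in enumerate(ordered):
            # An empty prefix still records one occurrence of `item` itself.
            projected[item].append(ordered[:pos])
    return projected


transactions = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"],
                ["a", "d", "e"], ["a", "b", "c"]]
projected = project_database(transactions, min_support=2)
print(projected["e"])  # [['a', 'c', 'd'], ['a', 'd']]
```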
FP-Tree
The frequent-pattern tree (FP-tree) is a compact data structure that stores quantitative
information about frequent patterns in a database. Each transaction is read and then
mapped onto a path in the FP-tree. This is done until all transactions have been read.
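Transactions that share items keep the tree compact: because the items of every transaction are inserted in the same fixed order (typically descending support), transactions with a common prefix share the same path, and only the counts along that path grow.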
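A minimal sketch of mapping transactions onto shared paths, assuming every transaction is sorted by the same hypothetical `order` list. In practice the order would be descending item support with infrequent items removed; this sketch skips that filtering and expects `order` to contain every item.

```python
class FPNode:
    """One FP-tree node: an item, how many transactions pass through it, and its children."""

    def __init__(self, item=None):
        self.item = item          # None marks the root
        self.count = 0
        self.children = {}        # item -> FPNode


def build_fp_tree(transactions, order):
    """Map each transaction onto a root-to-leaf path, reusing shared prefixes."""
    root = FPNode()
    for t in transactions:
        node = root
        # Sorting by the same order makes common prefixes land on the same nodes.
        for item in sorted(t, key=order.index):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item)
            child.count += 1
            node = child
    return root


def count_nodes(node):
    """Count non-root nodes to show how much sharing compacts the tree."""
    return sum(1 + count_nodes(c) for c in node.children.values())


root = build_fp_tree([["a", "b"], ["b", "a", "c"], ["a", "b", "d"]],
                     order=["a", "b", "c", "d"])
print(count_nodes(root))  # 4 nodes for 8 item occurrences
```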
An FP-tree is built from the initial itemsets of the database. Its purpose is to mine the frequent patterns efficiently. Each node of the FP-tree represents an item of an itemset.
The root node is null, while each lower node represents an item together with a count. The links between parent and child nodes preserve the associations between items while the tree is built, so every path from the root corresponds to an itemset shared as a common prefix by one or more transactions.
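The node structure described above can be sketched as a small Python class. The class `FPNode` and its `child` and `path` methods are illustrative assumptions: the null root has `item=None`, each node keeps a count and a link to its parent, and following parent links back to the root recovers the itemset on a path.

```python
class FPNode:
    """FP-tree node with a parent link, so associations between items are kept."""

    def __init__(self, item=None, parent=None):
        self.item = item        # None for the null root
        self.parent = parent
        self.count = 0
        self.children = {}      # item -> FPNode

    def child(self, item):
        """Return the child for `item`, creating it under this node if needed."""
        if item not in self.children:
            self.children[item] = FPNode(item, self)
        return self.children[item]

    def path(self):
        """Walk parent links up to the null root and return the itemset on the path."""
        items, node = [], self
        while node.item is not None:
            items.append(node.item)
            node = node.parent
        return items[::-1]


root = FPNode()                                  # the null root
for transaction in (["f", "c", "a"], ["f", "c", "b"]):
    node = root
    for item in transaction:
        node = node.child(item)
        node.count += 1
leaf = root.children["f"].children["c"].children["b"]
print(leaf.path())                               # ['f', 'c', 'b']
print(root.children["f"].count)                  # 2
```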
Advantages of FP-Growth
o This algorithm needs to scan the database only twice, whereas Apriori scans the transactions in every iteration.
o It is efficient and scalable for mining both long and short frequent patterns.
Disadvantages of FP-Growth
o Building the FP-tree may be expensive.
o The FP-tree may not fit in main memory when the database is large.
Apriori vs. FP-Growth
o Apriori: because it scans the database at each step, it becomes time-consuming for data with a large number of items.
o FP-Growth: it scans the database only twice, at the beginning, to count item supports and build the FP-tree; mining then works on the tree, so it consumes less time.
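To make the difference in database scans concrete, the following sketch wraps the database in a hypothetical `ScanCounter` that counts full passes over it. `apriori_scans` is a simplified level-wise Apriori (join and subset-pruning steps only), and `fp_growth_scans` only builds the FP-tree, since the later FP-Growth mining reads the tree rather than the database. The number of Apriori passes depends on the data, in particular on the size of the longest frequent itemset.

```python
from collections import Counter


class ScanCounter:
    """Wraps a transaction list and counts how many full passes are made over it."""

    def __init__(self, transactions):
        self.transactions = transactions
        self.scans = 0

    def __iter__(self):
        self.scans += 1
        return iter(self.transactions)


def apriori_scans(db, min_support):
    """Simplified level-wise Apriori; returns how many passes it made over `db`."""
    counts = Counter(i for t in db for i in set(t))            # pass for 1-itemsets
    frequent = {frozenset([i]) for i, s in counts.items() if s >= min_support}
    k = 2
    while frequent:
        # Join step, then prune candidates that have an infrequent (k-1)-subset.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = {c for c in candidates if all(c - {x} in frequent for x in c)}
        if not candidates:
            break
        counts = Counter()
        for t in db:                                            # one pass per size k
            items = set(t)
            counts.update(c for c in candidates if c <= items)
        frequent = {c for c in candidates if counts[c] >= min_support}
        k += 1
    return db.scans


def fp_growth_scans(db, min_support):
    """Build an FP-tree (as nested dicts); returns how many passes it made over `db`."""
    support = Counter(i for t in db for i in set(t))           # pass 1: item supports
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = {}                                                   # item -> [count, children]
    for t in db:                                                # pass 2: build the tree
        node = root
        for item in sorted((i for i in set(t) if i in frequent),
                           key=lambda i: (-frequent[i], i)):
            entry = node.setdefault(item, [0, {}])
            entry[0] += 1
            node = entry[1]
    # Mining would continue on `root` alone, without touching `db` again.
    return db.scans


transactions = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"],
                ["a", "d", "e"], ["a", "b", "c"]]
print(apriori_scans(ScanCounter(transactions), 2))    # 3
print(fp_growth_scans(ScanCounter(transactions), 2))  # 2
```

On this five-transaction example, Apriori needs one pass per itemset size up to the longest frequent itemset ({a, d, e}), while building the FP-tree always takes exactly two passes.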