FPgrowth

FP Growth Algorithm in Data Mining

In data mining, finding frequent patterns in large databases is an important task that
has been studied extensively in the past few years. Unfortunately, this task is
computationally expensive, especially when many patterns exist.

The FP-Growth Algorithm, proposed by Han, is an efficient and scalable method for mining
the complete set of frequent patterns by pattern fragment growth. It uses an extended
prefix-tree structure, named the frequent-pattern tree (FP-tree), to store compressed and
crucial information about frequent patterns. In his study, Han showed that his method
outperforms other popular methods for mining frequent patterns, e.g. the Apriori
Algorithm and TreeProjection. Some later works reported that FP-Growth also performs
better than other methods, including Eclat and Relim. The popularity and efficiency of
the FP-Growth Algorithm have led to many studies that propose variations to improve its
performance.

What is FP Growth Algorithm?


The FP-Growth Algorithm is an alternative way to find frequent item sets without using
candidate generation, thus improving performance. To do so, it uses a divide-and-conquer
strategy. The core of this method is a special data structure named the frequent-pattern
tree (FP-tree), which retains the item set association information.

This algorithm works as follows:

o First, it compresses the input database, creating an FP-tree instance to represent
frequent items.

o After this first step, it divides the compressed database into a set of conditional
databases, each associated with one frequent pattern.

o Finally, each such database is mined separately.

Using this strategy, FP-Growth reduces the search costs by recursively looking for
short patterns and then concatenating them into long frequent patterns.
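
The divide-and-conquer steps above can be sketched in Python. This is a simplified
illustration under stated assumptions, not Han's implementation: each conditional
database is kept as a plain list of (items, count) pairs instead of a linked FP-tree, and
the function name fp_growth, the sample transactions, and the min_support threshold are
all hypothetical.

```python
from collections import defaultdict


def fp_growth(weighted_db, min_support, suffix=()):
    """Return {frozenset(pattern): support} for a list of (items, count) pairs."""
    # Count the support of every item in this (conditional) database.
    support = defaultdict(int)
    for items, count in weighted_db:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    # Fix an order: most frequent first, ties broken alphabetically.
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}

    patterns = {}
    for item in order:
        pattern = (item,) + suffix
        patterns[frozenset(pattern)] = frequent[item]

        # Conditional database of `item`: for each transaction containing it,
        # keep only the frequent items that come earlier in the order.
        conditional = []
        for items, count in weighted_db:
            if item in items:
                prefix = [i for i in items if i in rank and rank[i] < rank[item]]
                if prefix:
                    conditional.append((prefix, count))

        # Mine the conditional database separately and grow the pattern.
        patterns.update(fp_growth(conditional, min_support, pattern))
    return patterns


# Hypothetical sample data: five transactions, minimum support of 2.
transactions = [
    ["a", "b"],
    ["b", "c", "d"],
    ["a", "c", "d", "e"],
    ["a", "d", "e"],
    ["a", "b", "c"],
]
patterns = fp_growth([(t, 1) for t in transactions], min_support=2)
for itemset, count in sorted(patterns.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), count)
```

With this sample data, the longer pattern {a, d, e} is reached by growing the suffix e
with d and then with a, without enumerating candidate item sets first.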

For large databases, the FP-tree may not fit in main memory. A strategy to cope with
this problem is to partition the database into a set of smaller databases (called
projected databases) and then construct an FP-tree from each of these smaller
databases.
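
One simple way to form such projected databases is sketched below in Python. It assumes
a scheme in which the projected database of an item holds, for every transaction that
contains the item, only the frequent items that precede it in descending-frequency
order. The function name, the sample transactions, and the threshold are illustrative
assumptions.

```python
from collections import Counter, defaultdict


def projected_databases(transactions, min_support):
    """Map each frequent item to the list of prefixes that precede it."""
    counts = Counter(item for t in transactions for item in set(t))
    order = sorted(
        (i for i in counts if counts[i] >= min_support),
        key=lambda i: (-counts[i], i),
    )
    rank = {item: r for r, item in enumerate(order)}

    projected = defaultdict(list)
    for t in transactions:
        # Drop infrequent items and sort the rest by descending frequency.
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        # The transaction contributes one prefix to each item after the first.
        for pos in range(1, len(items)):
            projected[items[pos]].append(items[:pos])
    return dict(projected)


# Hypothetical sample data: five transactions, minimum support of 2.
transactions = [
    ["a", "b"],
    ["b", "c", "d"],
    ["a", "c", "d", "e"],
    ["a", "d", "e"],
    ["a", "b", "c"],
]
projected = projected_databases(transactions, min_support=2)
for item, prefixes in sorted(projected.items()):
    print(item, prefixes)
```

Each projected database is smaller than the original one, so an FP-tree can be built
from it and mined on its own, one projected database at a time.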

FP-Tree
The frequent-pattern tree (FP-tree) is a compact data structure that stores quantitative
information about frequent patterns in a database. Each transaction is read and then
mapped onto a path in the FP-tree. This is done until all transactions have been read.
Different transactions with common subsets allow the tree to remain compact because
their paths overlap.

An FP-tree is built from the item sets (transactions) of the database. The purpose of
the FP-tree is to mine the frequent patterns. Each node of the FP-tree represents an
item of an item set.
The root node represents null, while the lower nodes represent the items. The
associations between a node and the nodes below it, that is, between items that occur
together in item sets, are maintained while forming the tree.
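
A minimal Python sketch of this construction follows. It assumes the usual two-pass
scheme: one pass counts item frequencies, and a second pass inserts each transaction as
a path of its frequent items sorted by descending frequency. The class FPNode, the
functions build_fp_tree and count_nodes, and the sample data are hypothetical names
chosen for illustration, and the header table with node links is left out.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item          # None for the root (the "null" node)
        self.count = 0
        self.parent = parent
        self.children = {}        # item -> FPNode


def build_fp_tree(transactions, min_support):
    # Pass 1: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Pass 2: map every transaction onto a path in the tree.
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1      # shared prefixes only increase the count
            node = child
    return root, frequent


def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical sample data: five transactions, minimum support of 2.
transactions = [
    ["a", "b"],
    ["b", "c", "d"],
    ["a", "c", "d", "e"],
    ["a", "d", "e"],
    ["a", "b", "c"],
]
root, frequent = build_fp_tree(transactions, min_support=2)
print(count_nodes(root), "nodes for",
      sum(len(t) for t in transactions), "item occurrences")
```

In this sample the five transactions contain 15 item occurrences, but the tree needs
only 11 nodes because paths that start with the same items overlap.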

Advantages of FP Growth Algorithm


The FP-Growth algorithm has the following advantages:

o This algorithm needs to scan the database only twice, whereas Apriori scans the
transactions in each iteration.

o No candidate item pairings are generated in this algorithm, which makes it faster.

o The database is stored in a compact version in memory.

o It is efficient and scalable for mining both long and short frequent patterns.

Disadvantages of FP-Growth Algorithm


This algorithm also has some disadvantages:

o The FP-tree is more cumbersome and difficult to build than the item sets used by Apriori.

o Building the FP-tree may be expensive.

o The FP-tree may not fit in main memory when the database is large.

Difference between Apriori and FP Growth Algorithm


Apriori and FP-Growth are the most basic frequent itemset mining (FIM) algorithms. There
are some basic differences between them:

Apriori                                     FP Growth

Apriori generates frequent patterns by      FP Growth generates an FP-Tree for
making the itemsets using pairings such     making frequent patterns.
as single item set, double itemset, and
triple itemset.

Apriori uses candidate generation, where    FP-Growth generates a conditional FP-Tree
frequent subsets are extended one item      for every item in the data.
at a time.

Since Apriori scans the database in each    FP-Growth scans the database only twice,
step, it becomes time-consuming for data    in its beginning steps, so it consumes
where the number of items is larger.        less time.

A converted version of the database is      A set of conditional FP-trees, one for
saved in memory.                            every item, is saved in memory.

It uses a breadth-first search.             It uses a depth-first search.
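
For contrast, the level-wise (breadth-first) candidate generation attributed to Apriori
in the comparison can be sketched in Python as follows. This is a minimal illustration,
not a tuned implementation: the function name apriori, the scans counter (which counts
passes over an in-memory list standing in for database scans), and the sample data are
assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return ({frozenset: support}, number of passes over the data)."""
    db = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item.
    level = {frozenset([item]) for t in db for item in t}
    frequent, scans, k = {}, 0, 1

    while level:
        scans += 1  # one full pass over the data per level
        counts = {c: sum(1 for t in db if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)

        # Candidate generation: join frequent k-item sets into (k+1)-item sets
        # and keep only those whose k-item subsets are all frequent.
        level = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in current for s in combinations(union, k)
            ):
                level.add(union)
        k += 1
    return frequent, scans


# Hypothetical sample data: five transactions, minimum support of 2.
transactions = [
    ["a", "b"],
    ["b", "c", "d"],
    ["a", "c", "d", "e"],
    ["a", "d", "e"],
    ["a", "b", "c"],
]
frequent, scans = apriori(transactions, min_support=2)
print(len(frequent), "frequent item sets found in", scans, "passes")
```

On this sample the sketch makes one pass per item set size (single, double, triple),
which is the per-step scanning described in the comparison above.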
