Mining Frequent Patterns Unit-3
Basic Concepts in Frequent Pattern Mining
Frequent pattern mining is the process of identifying patterns or
associations that occur frequently within a dataset. It typically involves
analyzing large datasets to find items, or sets of items, that often appear
together.
Frequent itemset mining is a market basket analysis technique that finds
patterns in the shopping behaviour of users across different shopping
platforms, specifically sets of products that are frequently bought together.
These relationships are expressed as association rules. Frequent itemset
mining is widely used because frequent patterns underlie many other data
mining tasks, including correlation analysis, constraint-based mining, and
sequential pattern mining.
Consider the following dataset, for which we will find the frequent itemsets
and generate association rules:
Candidate 2-Itemsets
Frequent 2-itemsets:
{Bread, Milk} meets the 50% threshold, but {Butter, Milk} and
{Bread, Butter} do not meet the threshold, so they are omitted.
Step 4: Generate Candidate 3-Itemsets
Combine the frequent 2-itemsets into groups of 3 and calculate their
support.
For the triplet there is only one case, {Bread, Butter, Milk}, and we
calculate its support.
Candidate 3-Itemsets
Since this does not meet the 50% threshold, there are no frequent
3-itemsets.
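The level-by-level support counting described above can be sketched in Python. The original dataset table is missing from these notes, so the transactions below are an assumption, chosen only to agree with the pair and triplet results stated in the text. The function names are illustrative. The sketch counts every combination by brute force and does not implement the Apriori join-and-prune step.

```python
from itertools import combinations

# ASSUMED transactions: the original dataset table is not available,
# so this data is illustrative only.
transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Butter", "Milk"},
    {"Bread", "Milk"},
]
MIN_SUPPORT = 0.5  # the 50% threshold used in the text


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_itemsets(transactions, size, min_support):
    """Brute-force scan of all itemsets of one size (no Apriori pruning)."""
    items = sorted(set().union(*transactions))
    result = {}
    for combo in combinations(items, size):
        s = support(set(combo), transactions)
        if s >= min_support:
            result[combo] = s
    return result


pairs = frequent_itemsets(transactions, 2, MIN_SUPPORT)
triples = frequent_itemsets(transactions, 3, MIN_SUPPORT)
print(pairs)    # only the pair (Bread, Milk) survives
print(triples)  # empty: no frequent 3-itemsets
```

Under these assumed transactions, only {Bread, Milk} reaches the threshold among the pairs, and the single triplet falls below it, which matches the outcome described in the steps above.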
Support of {Bread} = 4.
Confidence = 2/4 = 50% (Failed threshold).
Rule 2: Butter → Bread (if a customer buys butter, the customer
will also buy bread)
Support of {Bread, Butter} = 3.
Support of {Butter} = 3.
Confidence = 3/3 = 100% (Passes threshold).
Rule 3: Bread → Milk (if a customer buys bread, the customer will
also buy milk)
Support of {Bread, Milk} = 3.
Support of {Bread} = 4.
Confidence = 3/4 = 75% (Passes threshold).
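The confidence calculations above all follow the same pattern: the support count of the combined itemset divided by the support count of the antecedent. A minimal Python sketch follows. The function name `confidence` is illustrative and not part of any library.

```python
def confidence(support_xy, support_x):
    """Confidence of the rule X -> Y.

    support_xy: number of transactions containing both X and Y
    support_x:  number of transactions containing X
    """
    if support_x == 0:
        raise ValueError("the antecedent never occurs in the data")
    return support_xy / support_x


# Rule 3 (Bread -> Milk), using the counts given in the text.
print(confidence(3, 4))  # 0.75
```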
Mining the Tree: The algorithm then examines this tree to identify patterns
that appear frequently based on a minimum support threshold. It does this
by breaking the tree down into smaller "conditional" trees for each item,
which makes the process more efficient.
Generating Patterns: Once the tree is built and analyzed, the algorithm
generates the frequent patterns (itemsets) and the rules that describe
relationships between items.
Transaction ID   Items
T1               {E,K,M,N,O,Y}
T2               {D,E,K,N,O,Y}
T3               {A,E,K,M}
T4               {C,K,M,U,Y}
T5               {C,E,I,K,O,O}
Item   Frequency
A      1
C      2
D      1
E      4
I      1
K      5
M      3
N      2
O      3
U      1
Y      3
Frequency here is the number of transactions that contain the item, so the
repeated O in T5 is counted once.
Let the minimum support be 3. A Frequent Pattern set L is built containing
every item whose frequency is greater than or equal to the minimum support.
These items are stored in descending order of their respective frequencies.
After insertion of the relevant items, the set L looks like this:
L = {K : 5, E : 4, M : 3, O : 3, Y : 3}
Now, for each transaction, the respective Ordered-Item set is built. This is
done by iterating over the Frequent Pattern set and checking whether the
current item is contained in the transaction in question. If it is, the item
is inserted into the Ordered-Item set for the current transaction. The
following table is built for all the transactions:
Transaction ID   Items             Ordered-Item-Set
T1               {E,K,M,N,O,Y}     {K,E,M,O,Y}
T2               {D,E,K,N,O,Y}     {K,E,O,Y}
T3               {A,E,K,M}         {K,E,M}
T4               {C,K,M,U,Y}       {K,M,Y}
T5               {C,E,I,K,O,O}     {K,E,O}
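The construction of the Ordered-Item sets can be sketched in Python as follows. Two assumptions are made here that the notes do not state: each item is counted once per transaction (so the repeated O in T5 counts once), and items with equal frequency are ordered alphabetically. The variable names are illustrative.

```python
from collections import Counter

transactions = {
    "T1": ["E", "K", "M", "N", "O", "Y"],
    "T2": ["D", "E", "K", "N", "O", "Y"],
    "T3": ["A", "E", "K", "M"],
    "T4": ["C", "K", "M", "U", "Y"],
    "T5": ["C", "E", "I", "K", "O", "O"],
}
MIN_SUPPORT = 3

# Assumption: count each item once per transaction (set() drops repeats).
counts = Counter(
    item for items in transactions.values() for item in set(items)
)

# Keep only items that reach the minimum support.
frequent = {item: c for item, c in counts.items() if c >= MIN_SUPPORT}

# Descending frequency; assumption: ties are broken alphabetically.
order = sorted(frequent, key=lambda item: (-frequent[item], item))

# For each transaction, keep the frequent items in the global order.
ordered_sets = {
    tid: [item for item in order if item in items]
    for tid, items in transactions.items()
}

for tid, items in ordered_sets.items():
    print(tid, items)
```

With these assumptions the sketch reproduces the Ordered-Item-Set column of the table above.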
Now all the Ordered-Item sets are inserted into a Tree Data Structure.
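A minimal sketch of this insertion step in Python is shown below. The class and function names (`FPNode`, `insert`, `build_tree`, `show`) are illustrative, and the header table with node links that a full FP-growth implementation keeps is left out for brevity.

```python
class FPNode:
    """One node of the tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def insert(root, ordered_items):
    """Insert one Ordered-Item set, sharing any existing prefix."""
    node = root
    for item in ordered_items:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, parent=node)
            node.children[item] = child
        child.count += 1
        node = child


def build_tree(ordered_sets):
    root = FPNode(None)  # the root carries no item
    for items in ordered_sets:
        insert(root, items)
    return root


def show(node, depth=0):
    """Return the tree as indented 'item:count' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(show(child, depth + 1))
    return lines


# Ordered-Item sets of T1..T5 from the table above.
ordered_sets = [
    ["K", "E", "M", "O", "Y"],
    ["K", "E", "O", "Y"],
    ["K", "E", "M"],
    ["K", "M", "Y"],
    ["K", "E", "O"],
]
root = build_tree(ordered_sets)
print("\n".join(show(root)))
```

Because every Ordered-Item set starts with K, all five transactions share the same first node, which is how the tree compresses the data.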
Note that the items in the table below are arranged in ascending order of
their frequencies.
Now, for each item, the Conditional Frequent Pattern Tree is built. This is
done by taking the set of elements that is common to all the paths in the
Conditional Pattern Base of that item and calculating its support count by
summing the support counts of all the paths in the Conditional Pattern Base.
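The rule just described can be sketched in Python. As a shortcut, and as an assumption of this sketch, the Conditional Pattern Base is derived directly from the Ordered-Item sets (the prefix that precedes the item in each set) rather than by walking the tree. For this data the two give the same prefix paths. The function names are illustrative.

```python
from collections import Counter

# Ordered-Item sets of T1..T5 from the table above.
ordered_sets = [
    ["K", "E", "M", "O", "Y"],
    ["K", "E", "O", "Y"],
    ["K", "E", "M"],
    ["K", "M", "Y"],
    ["K", "E", "O"],
]


def conditional_pattern_base(item, ordered_sets):
    """Map each prefix path that leads to `item` to its count."""
    base = Counter()
    for items in ordered_sets:
        if item in items:
            prefix = tuple(items[: items.index(item)])
            if prefix:  # an item at the top of the tree has no prefix
                base[prefix] += 1
    return base


def conditional_fp_tree(base):
    """Items common to all paths, with the summed path counts."""
    if not base:
        return set(), 0
    common = set.intersection(*(set(path) for path in base))
    return common, sum(base.values())


for item in ["Y", "O", "M", "E"]:
    base = conditional_pattern_base(item, ordered_sets)
    print(item, dict(base), conditional_fp_tree(base))
```

For example, the paths leading to O are {K,E,M} with count 1 and {K,E} with count 2, so the conditional tree for O is {K,E} with support count 3.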
Association Rule
Association rule mining finds interesting associations and relationships
among large sets of data items. An association rule shows how frequently an
itemset occurs in a set of transactions. A typical example is Market Basket
Analysis, one of the key techniques used by large retailers to show
associations between items. It allows retailers to identify relationships
between the items that people frequently buy together. Given a set of
transactions, we can find rules that predict the occurrence of an item
based on the occurrences of other items in the transaction.
OR
Association rule mining is a technique in data mining for discovering
interesting relationships, frequent patterns, associations, or correlations,
between variables in large datasets. It’s widely used in various fields such as
market basket analysis, web usage mining, bioinformatics, and more. The
basic idea is to find rules that predict the occurrence of an item based on the
occurrences of other items in the transaction.
Example
Lift(l) - The lift of the rule X=>Y is the confidence of the rule divided by
the expected confidence, assuming that the itemsets X and Y are
independent of each other. Under that assumption, the expected confidence
is simply the frequency (support) of {Y}, so
lift = confidence(X=>Y) / support({Y}).
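As a small illustration, lift can be computed in Python using the standard definition, confidence(X=>Y) divided by the relative frequency of {Y}. The function name `lift` and the numbers in the usage line are hypothetical and are not taken from the datasets in these notes.

```python
def lift(confidence_xy, support_y):
    """Lift of X => Y.

    confidence_xy: confidence of the rule, as a fraction between 0 and 1
    support_y:     fraction of all transactions that contain Y
    """
    if support_y <= 0:
        raise ValueError("the consequent never occurs in the data")
    return confidence_xy / support_y


# Hypothetical values: confidence 0.75, and Y appears in 80% of transactions.
print(lift(0.75, 0.8))
```

A lift above 1 suggests that X and Y occur together more often than independence would predict. A lift of 1 is what independence predicts, and a lift below 1 suggests they occur together less often.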
Antecedent → Consequent
· Every association rule has two parts: the antecedent (if) and
the consequent (then). For instance, in the rule {Pasta} → {Sauce}, ‘Pasta’ is
the antecedent, and ‘Sauce’ is the consequent.
· Antecedent (X): This is the first part of the rule, the condition. It’s the set
of items found in the database that you are examining for patterns. In the
rule {X} → {Y}, X is the antecedent.
· Consequent (Y): This is the second part of the rule, which is inferred from
the presence of the antecedent in transactions. In the rule {X} → {Y}, Y is the
consequent.