
Mining Frequent Patterns

Unit-3
Basic Concepts in Frequent Pattern Mining
Frequent pattern mining in data mining is the process of identifying patterns
or associations within a dataset that occur frequently. This is typically done
by analyzing large datasets to find items or sets of items that appear
together frequently.

Frequent Itemsets in a Dataset


Frequent itemsets are a fundamental concept in association rule mining, a
technique used in data mining to discover relationships between items in a
dataset. The goal of association rule mining is to identify items that
frequently occur together in a dataset and to express those co-occurrences
as association rules.
A frequent itemset is a set of items that occur together frequently in a
dataset.

Frequent itemset mining is the methodology behind market basket analysis:
it finds patterns in the shopping behaviour of customers across different
shopping platforms, and these relationships are represented in the form of
association rules. Frequent itemset (or pattern) mining is widely used
because it underpins many other data mining tasks, such as mining
correlations, constraint-based patterns, and sequential patterns. Most
commonly, the technique is used to find sets of products that are
frequently bought together.

A set of items together is called an itemset. If an itemset contains k
items, it is called a k-itemset. An itemset that occurs frequently is
called a frequent itemset. Thus frequent itemset mining is a data mining
technique for identifying the items that often occur together.
• For example: bread and butter, or a laptop and antivirus software.
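To make the definition concrete, here is a minimal Python sketch of support counting; the toy transactions and the 50% threshold are illustrative assumptions, not part of the original notes:

from itertools import combinations

# Toy transaction list (each transaction is a set of items).
transactions = [
    {"bread", "butter"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in `itemset`.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# All 2-itemsets whose support meets a 50% minimum:
items = sorted(set().union(*transactions))
frequent_pairs = [set(p) for p in combinations(items, 2)
                  if support(set(p), transactions) >= 0.5]
print(frequent_pairs)  # -> the pairs {bread, butter} and {bread, milk}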

Apriori Algorithm

The Apriori Algorithm is a foundational method in data mining used for
discovering frequent itemsets and generating association rules. Its
significance lies in its ability to identify relationships between items in
large datasets, which is particularly valuable in market basket analysis.
For example, if a grocery store finds that customers who buy bread often
also buy butter, it can use this information to optimise product placement
or marketing strategies.

How the Apriori Algorithm Works

The Apriori Algorithm operates through a systematic process that involves
several key steps (a runnable sketch follows the list):
1. Identifying Frequent Itemsets: The algorithm begins by scanning the
dataset to count individual items (1-itemsets) and their frequencies. A
minimum support threshold determines whether an itemset is considered
frequent.
2. Creating Candidate Itemsets: Once frequent 1-itemsets (single items)
are identified, the algorithm generates candidate 2-itemsets by combining
frequent items. This process continues iteratively, forming larger
itemsets (k-itemsets), until no more frequent itemsets can be found.
3. Pruning Infrequent Itemsets: The algorithm employs a pruning technique
based on the Apriori property, which states that if an itemset is
infrequent, all its supersets must also be infrequent. This significantly
reduces the number of combinations that need to be evaluated.
4. Generating Association Rules: After identifying frequent itemsets, the
algorithm generates association rules that describe how items relate to
one another, using metrics like support, confidence, and lift to evaluate
the strength of these relationships.
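Here is a compact Python sketch of this generate-and-prune loop. It is an illustrative implementation of the steps above (the transaction format and names are assumptions), not an optimized one:

from itertools import combinations

def apriori(transactions, min_support):
    # Return all frequent itemsets; `transactions` is a list of sets.
    n = len(transactions)
    # Step 1: count 1-itemsets and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, c in counts.items() if c / n >= min_support}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        # Step 2: join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # Step 3: Apriori pruning - every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # Count support of the surviving candidates against the data.
        frequent = {c for c in candidates
                    if sum(1 for t in transactions if c <= t) / n >= min_support}
        all_frequent |= frequent
        k += 1
    return all_frequent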

Consider the following dataset of five grocery transactions (an
illustrative table, consistent with the support counts used in the steps
below); we will find the frequent itemsets and then generate association
rules from them:

Transaction ID | Items
T1 | {Bread, Butter}
T2 | {Bread, Milk}
T3 | {Bread, Butter, Milk}
T4 | {Bread, Milk}
T5 | {Butter, Milk}


Steps 1-2: Find Frequent 1-Itemsets
Scan the dataset and count each item: Bread appears in 4 of 5 transactions
(80%), Butter in 3 (60%), and Milk in 4 (80%). All items have support ≥
50%, so they all qualify as frequent 1-itemsets. If any item had support
below 50%, it would be omitted from the frequent 1-itemsets.
Step 3: Generate Candidate 2-Itemsets
Combine the frequent 1-itemsets into pairs and calculate their support.
For this example we get 3 item pairs, (Bread, Butter), (Bread, Milk) and
(Butter, Milk), and we calculate their support in the same way as in
Step 2.
Frequent 2-itemsets:
 {Bread, Milk} (3 of 5, 60%) meets the 50% threshold, but {Butter, Milk}
(2 of 5, 40%) and {Bread, Butter} (2 of 5, 40%) do not, so they are
omitted.
Step 4: Generate Candidate 3-Itemsets
Combine the frequent 2-itemsets into groups of 3 and calculate their
support. For the triplet we get only one case, {Bread, Butter, Milk}, and
we calculate its support: it appears in only 1 of 5 transactions (20%).
Since this does not meet the 50% threshold, there are no frequent
3-itemsets.

Step 5: Generate Association Rules

Now we generate rules from the itemsets and calculate their confidence,
assuming a minimum confidence threshold of 60%. (Strictly speaking,
Apriori generates rules only from frequent itemsets; Rules 1 and 2 below
are included to illustrate the confidence calculation.)
Rule 1: Bread → Butter (if a customer buys bread, the customer will also
buy butter)
 Support of {Bread, Butter} = 2.
 Support of {Bread} = 4.
 Confidence = 2/4 = 50% (fails the threshold).
Rule 2: Butter → Bread (if a customer buys butter, the customer will also
buy bread)
 Support of {Bread, Butter} = 2.
 Support of {Butter} = 3.
 Confidence = 2/3 ≈ 67% (passes the threshold).
Rule 3: Bread → Milk (if a customer buys bread, the customer will also
buy milk)
 Support of {Bread, Milk} = 3.
 Support of {Bread} = 4.
 Confidence = 3/4 = 75% (passes the threshold).
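These confidence values can be checked mechanically. The short Python sketch below re-derives them from the illustrative transaction table shown above:

# Illustrative five-transaction dataset, consistent with the support
# counts used in Steps 1-5 above.
transactions = [
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Bread", "Milk"},
    {"Butter", "Milk"},
]

def count(itemset):
    # Number of transactions containing every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent):
    return count(antecedent | consequent) / count(antecedent)

print(confidence({"Bread"}, {"Butter"}))  # 2/4 = 0.50 -> fails 60%
print(confidence({"Butter"}, {"Bread"}))  # 2/3 ~ 0.67 -> passes
print(confidence({"Bread"}, {"Milk"}))    # 3/4 = 0.75 -> passes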

Applications of the Apriori Algorithm

Below are some applications of the Apriori algorithm in companies and
startups today:
1. E-commerce: Used to recommend products that are often bought
together, like laptop + laptop bag, increasing sales.
2. Food Delivery Services: Identifies popular combos, such as burger +
fries, to offer combo deals to customers.
3. Streaming Services: Recommends related movies or shows based on
what users often watch together, like action + superhero movies.
4. Financial Services: Analyzes spending habits to suggest personalised
offers, such as credit card deals based on frequent purchases.
5. Travel & Hospitality: Creates travel packages (e.g., flight + hotel) by
finding commonly purchased services together.
6. Health & Fitness: Suggests workout plans or supplements based on
users' past activities, like protein shakes + workouts .

Frequent Pattern Growth Algorithm


The FP-Growth (Frequent Pattern Growth) algorithm efficiently mines
frequent itemsets from large transactional datasets.

How FP-Growth Works


Here's how it works in simple terms:

Data Compression: First, FP-Growth compresses the dataset into a smaller
structure called the Frequent Pattern Tree (FP-Tree). This tree stores
information about itemsets (collections of items) and their frequencies
without needing to generate candidate sets the way Apriori does.

Mining the Tree: The algorithm then examines this tree to identify
patterns that appear frequently, based on a minimum support threshold. It
does this by breaking the tree down into smaller "conditional" trees for
each item, making the process more efficient.

Generating Patterns: Once the tree is built and analyzed, the algorithm
generates the frequent patterns (itemsets) and the rules that describe
relationships between items.
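In practice FP-Growth is rarely implemented from scratch; for example, the open-source mlxtend library provides an implementation. A minimal sketch, assuming mlxtend and pandas are installed:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# The five-transaction example used in the next section.
transactions = [
    ["E", "K", "M", "N", "O", "Y"],
    ["D", "E", "K", "N", "O", "Y"],
    ["A", "E", "K", "M"],
    ["C", "K", "M", "U", "Y"],
    ["C", "E", "I", "K", "O"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Minimum support of 3 out of 5 transactions = 0.6.
print(fpgrowth(df, min_support=0.6, use_colnames=True))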

Working of the FP-Growth Algorithm

Let's walk through the FP-Growth algorithm and how it works on real-life
data. Consider the following data:

Transaction ID | Items
T1 | {E, K, M, N, O, Y}
T2 | {D, E, K, N, O, Y}
T3 | {A, E, K, M}
T4 | {C, K, M, U, Y}
T5 | {C, E, I, K, O}

The above is a hypothetical dataset of transactions, with each letter
representing an item. First, the frequency of each individual item is
computed:


Item | Frequency
A | 1
C | 2
D | 1
E | 4
I | 1
K | 5
M | 3
N | 2
O | 3
U | 1
Y | 3

Let the minimum support be 3. A Frequent Pattern set L is built,
containing all the items whose frequency is greater than or equal to the
minimum support; these items are stored in descending order of their
respective frequencies (items with equal frequency are kept in a fixed
order). After insertion of the relevant items, the set L looks like this:

L = {K : 5, E : 4, M : 3, O : 3, Y : 3}
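A minimal sketch of how L drives the next step: each transaction is filtered down to the frequent items and rewritten in L's order (the fixed order for the tied items M, O, Y is carried over from above):

# Frequent items with minimum support 3, in descending-frequency order.
L = {"K": 5, "E": 4, "M": 3, "O": 3, "Y": 3}
order = list(L)  # ['K', 'E', 'M', 'O', 'Y'] (dicts preserve insertion order)

def ordered_itemset(transaction):
    # Keep only frequent items, in the order fixed by L.
    return [item for item in order if item in transaction]

print(ordered_itemset({"E", "K", "M", "N", "O", "Y"}))  # ['K', 'E', 'M', 'O', 'Y']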

Now for each transaction, the respective Ordered-Item set is built. This
is done by iterating over the Frequent Pattern set L and checking whether
each item is contained in the transaction in question; if it is, the item
is appended to the Ordered-Item set for that transaction. The following
table is built for all the transactions:

Transaction ID | Items | Ordered-Item Set
T1 | {E, K, M, N, O, Y} | {K, E, M, O, Y}
T2 | {D, E, K, N, O, Y} | {K, E, O, Y}
T3 | {A, E, K, M} | {K, E, M}
T4 | {C, K, M, U, Y} | {K, M, Y}
T5 | {C, E, I, K, O} | {K, E, O}

Now all the Ordered-Item sets are inserted into a tree data structure
(the FP-Tree).

a) Inserting the set {K, E, M, O, Y}

Here all the items are simply linked one after the other in the order of
occurrence in the set, and the support count of each item is initialized
to 1. To insert {K, E, M, O, Y}, we traverse the tree from the root. If a
node already exists for an item, we increase its support count; if it
doesn't exist, we create a new node for that item and link it to the
previous item.



b) Inserting the set {K, E, O, Y}
Up to the insertion of the elements K and E, the support counts are simply
increased by 1. On inserting O, we see that there is no direct link
between E and O, so a new node for item O is initialized with a support
count of 1 and linked under E. On inserting Y, we likewise initialize a
new node for item Y with a support count of 1 and link it under the new O
node.

c) Inserting the set {K, E, M}


Here simply the support count of each element is increased by 1.

Dr. Vidya Pawar, Vedanta Degree College, Raichur Page 8


Mining Frequent Patterns

d) Inserting the set {K, M, Y}


Similar to step b), first the support count of K is increased, then new nodes
for M and Y are initialized and linked accordingly.

e) Inserting the set {K, E, O}

Here simply the support counts of the respective elements are increased.
Note that it is the support count of the O node created in step b) that is
increased (not the O node that sits under M).
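All five insertions follow one rule: reuse an existing child node and increment its count, otherwise create a new child with count 1. A minimal Python sketch of that insertion logic (the node structure and names are illustrative assumptions):

class Node:
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}  # item -> Node

def insert(root, ordered_items):
    # Walk from the root, extending the path item by item.
    node = root
    for item in ordered_items:
        child = node.children.get(item)
        if child is None:            # no existing branch: create a node
            child = Node(item)
            node.children[item] = child
        child.count += 1             # existing branch: bump the support
        node = child

root = Node(None)
for t in [["K", "E", "M", "O", "Y"], ["K", "E", "O", "Y"], ["K", "E", "M"],
          ["K", "M", "Y"], ["K", "E", "O"]]:
    insert(root, t)
# root.children["K"].count == 5, matching the tree built in steps a)-e).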


Next, for each item the Conditional Pattern Base is listed: the set of
prefix paths in the tree that lead to that item, each with its support
count. Note that the items in the table below are arranged in ascending
order of their frequencies:

Item | Conditional Pattern Base
Y | {K, E, M, O : 1}, {K, E, O : 1}, {K, M : 1}
O | {K, E, M : 1}, {K, E : 2}
M | {K, E : 2}, {K : 1}
E | {K : 4}
K | (empty)

Now for each item, the Conditional Frequent Pattern Tree is built. It is
done by taking the set of elements that is common to all the paths in the
Conditional Pattern Base of that item and calculating the support count of
each such element by summing the support counts of the paths it appears
in:

Item | Conditional FP-Tree
Y | {K : 3}
O | {K, E : 3}
M | {K : 3}
E | {K : 4}
K | (empty)

From each Conditional FP-Tree, the frequent patterns are generated by
appending the item to every combination of the elements in its tree:

Item | Frequent Patterns Generated
Y | {K, Y : 3}
O | {K, O : 3}, {E, O : 3}, {K, E, O : 3}
M | {K, M : 3}
E | {K, E : 4}
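Because every ordered transaction here corresponds to a path of the tree, the conditional pattern bases can be sketched directly from the Ordered-Item sets: for each frequent item, collect the prefix of items preceding it, with each matching transaction contributing a count of 1 (a simplification of the tree-based extraction):

# Ordered-item sets from the table above.
ordered = [["K", "E", "M", "O", "Y"], ["K", "E", "O", "Y"], ["K", "E", "M"],
           ["K", "M", "Y"], ["K", "E", "O"]]

def conditional_pattern_base(item):
    # Prefix paths that lead to `item`, one per transaction containing it.
    base = []
    for t in ordered:
        if item in t:
            prefix = t[:t.index(item)]
            if prefix:
                base.append(prefix)
    return base

print(conditional_pattern_base("Y"))
# [['K', 'E', 'M', 'O'], ['K', 'E', 'O'], ['K', 'M']]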


Association Rule
Association rule mining finds interesting associations and relationships
among large sets of data items. An association rule shows how frequently
an itemset occurs in a transaction. A typical example is market basket
analysis, one of the key techniques used by large retailers to discover
associations between items: it allows retailers to identify relationships
between the items that people frequently buy together. Given a set of
transactions, we can find rules that will predict the occurrence of an
item based on the occurrences of other items in the transaction.
OR
Association rule mining is a technique in data mining for discovering
interesting relationships, frequent patterns, associations, or
correlations between variables in large datasets. It is widely used in
fields such as market basket analysis, web usage mining, and
bioinformatics. The basic idea is to find rules that predict the
occurrence of an item based on the occurrences of other items in a
transaction.

Imagine a small dataset representing transactions in a grocery store: each
row represents a transaction (a customer's purchase) with a unique ID, and
an 'Items Purchased' column lists the items bought in that transaction.


· An association rule is a fundamental concept in data mining that reveals
how items within a dataset are connected. It expresses a strong,
potentially useful relationship between two sets of items.

· These rules are written as "If-Then" statements, typically in the form
{X} → {Y}, where X and Y are different sets of items.

Example

Let us first see the basic definitions, using the following set of five
transactions:

TID | Items
1 | {Bread, Milk}
2 | {Bread, Diaper, Beer, Eggs}
3 | {Milk, Diaper, Beer, Coke}
4 | {Bread, Milk, Diaper, Beer}
5 | {Bread, Milk, Diaper, Coke}

Support Count (σ): the frequency of occurrence of an itemset.
Here σ({Milk, Bread, Diaper}) = 2, since Milk, Bread, and Diaper appear
together in transactions 4 and 5.

Frequent Itemset: an itemset whose support is greater than or equal to a
minsup threshold.

Association Rule: an implication expression of the form X → Y, where X
and Y are any two itemsets.

Example: {Milk, Diaper} → {Beer}


Rule Evaluation Metrics:
 Support (s): the number of transactions that include all the items in
{X} and {Y}, as a percentage of the total number of transactions. It
measures how frequently the collection of items occurs together across
all transactions.

Support(X → Y) = σ(X ∪ Y) ÷ (total number of transactions), i.e. the
fraction of transactions that contain both X and Y.

 Confidence (c): the ratio of the number of transactions that include
all items in both X and Y to the number of transactions that include all
items in X. It measures how often the items in Y appear in transactions
that also contain the items in X.

Conf(X → Y) = Supp(X ∪ Y) ÷ Supp(X)

 Lift (l): the confidence of the rule X → Y divided by the expected
confidence, where the expected confidence is simply the support of {Y}
(the confidence the rule would have if the itemsets X and Y were
independent of each other).

Lift(X → Y) = Conf(X → Y) ÷ Supp(Y)

A lift value near 1 indicates that X and Y appear together about as often
as expected under independence; a value greater than 1 means they appear
together more often than expected; a value less than 1 means less often.
Greater lift values indicate a stronger association. (All three metrics
are computed in the sketch below.)
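As a worked check, the snippet below computes all three metrics for the rule {Milk, Diaper} → {Beer} on the five-transaction example table above:

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
n = len(transactions)

def supp(itemset):
    # Fraction of transactions containing every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t) / n

X, Y = {"Milk", "Diaper"}, {"Beer"}
support = supp(X | Y)                # 2/5 = 0.40
confidence = supp(X | Y) / supp(X)   # (2/5) / (3/5) ~ 0.67
lift = confidence / supp(Y)          # 0.67 / 0.60 ~ 1.11
print(support, confidence, lift)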

To illustrate, consider a rule like {Diapers} → {Baby Wipes}. This rule
suggests that in transactions where diapers are bought, there is a strong
likelihood that baby wipes are also purchased.

Market Basket Analysis


This is a primary application of association rules, focusing on analyzing
purchase patterns. By examining combinations of items that frequently occur
together in purchases, businesses can gain insights into marketing and sales
strategies.

Components of Association Rules: Antecedent and Consequent

{BREAD, MILK} → {BEER}

Antecedent → Consequent
· Every association rule has two parts: the antecedent (if) and
the consequent (then). For instance, in the rule {Pasta} → {Sauce},
'Pasta' is the antecedent and 'Sauce' is the consequent.

· Antecedent (X): This is the first part of the rule, the condition. It
is the set of items found in the database whose presence you are
examining for patterns. In the rule {X} → {Y}, X is the antecedent.

· Consequent (Y): This is the second part of the rule, which is inferred
from the presence of the antecedent in transactions. In the rule
{X} → {Y}, Y is the consequent.

These two sets are disjoint, meaning they do not overlap.
