
Mining Frequent Patterns

Unit-3
Basic Concepts in Frequent Pattern Mining
Frequent pattern mining in data mining is the process of identifying patterns
or associations within a dataset that occur frequently. This is typically done
by analyzing large datasets to find items or sets of items that appear
together frequently.

Frequent Itemsets in a Dataset


Frequent itemsets are a fundamental concept in association rule mining, a
technique used in data mining to discover relationships between items in a
dataset. The goal of association rule mining is to identify items that
frequently occur together in a dataset and to express those co-occurrences
as association rules.
A frequent itemset is a set of items that occur together frequently in a
dataset.

Frequent itemset mining is the methodology behind market basket analysis:
it finds patterns in the shopping behaviour of customers across different
shopping platforms, and these relationships are represented in the form of
association rules. Frequent itemset (or pattern) mining is widely used
because it underpins many other data mining tasks, such as mining
correlations, constraint-based patterns, and sequential patterns. Most
commonly, the technique is used to find sets of products that are
frequently bought together.

A set of items together is called an itemset. If an itemset contains k
items, it is called a k-itemset. An itemset that occurs frequently is
called a frequent itemset. Thus frequent itemset mining is a data mining
technique for identifying the items that often occur together.
• For example: bread and butter, or a laptop and antivirus software.
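To make the definition concrete, here is a minimal Python sketch of support counting; the toy transactions and the 50% threshold are illustrative assumptions, not part of the original notes:

from itertools import combinations

# Toy transaction list (each transaction is a set of items).
transactions = [
    {"bread", "butter"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in `itemset`.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# All 2-itemsets whose support meets a 50% minimum:
items = sorted(set().union(*transactions))
frequent_pairs = [set(p) for p in combinations(items, 2)
                  if support(set(p), transactions) >= 0.5]
print(frequent_pairs)  # -> the pairs {bread, butter} and {bread, milk}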

Apriori Algorithm

The Apriori Algorithm is a foundational method in data mining used for
discovering frequent itemsets and generating association rules. Its
significance lies in its ability to identify relationships between items in
large datasets, which is particularly valuable in market basket analysis.
For example, if a grocery store finds that customers who buy bread often
also buy butter, it can use this information to optimise product placement
or marketing strategies.

How the Apriori Algorithm Works

The Apriori Algorithm operates through a systematic process that involves
several key steps (a runnable sketch follows the list):
1. Identifying Frequent Itemsets: The algorithm begins by scanning the
dataset to count individual items (1-itemsets) and their frequencies. A
minimum support threshold determines whether an itemset is considered
frequent.
2. Creating Candidate Itemsets: Once frequent 1-itemsets (single items)
are identified, the algorithm generates candidate 2-itemsets by combining
frequent items. This process continues iteratively, forming larger
itemsets (k-itemsets), until no more frequent itemsets can be found.
3. Pruning Infrequent Itemsets: The algorithm employs a pruning technique
based on the Apriori property, which states that if an itemset is
infrequent, all its supersets must also be infrequent. This significantly
reduces the number of combinations that need to be evaluated.
4. Generating Association Rules: After identifying frequent itemsets, the
algorithm generates association rules that describe how items relate to
one another, using metrics like support, confidence, and lift to evaluate
the strength of these relationships.
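Here is a compact Python sketch of this generate-and-prune loop. It is an illustrative implementation of the steps above (the transaction format and names are assumptions), not an optimized one:

from itertools import combinations

def apriori(transactions, min_support):
    # Return all frequent itemsets; `transactions` is a list of sets.
    n = len(transactions)
    # Step 1: count 1-itemsets and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, c in counts.items() if c / n >= min_support}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        # Step 2: join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # Step 3: Apriori pruning - every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # Count support of the surviving candidates against the data.
        frequent = {c for c in candidates
                    if sum(1 for t in transactions if c <= t) / n >= min_support}
        all_frequent |= frequent
        k += 1
    return all_frequent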

Consider the following dataset of five grocery transactions (an
illustrative table, consistent with the support counts used in the steps
below); we will find the frequent itemsets and then generate association
rules from them:

Transaction ID | Items
T1 | {Bread, Butter}
T2 | {Bread, Milk}
T3 | {Bread, Butter, Milk}
T4 | {Bread, Milk}
T5 | {Butter, Milk}


Steps 1-2: Find Frequent 1-Itemsets
Scan the dataset and count each item: Bread appears in 4 of 5 transactions
(80%), Butter in 3 (60%), and Milk in 4 (80%). All items have support ≥
50%, so they all qualify as frequent 1-itemsets. If any item had support
below 50%, it would be omitted from the frequent 1-itemsets.
Step 3: Generate Candidate 2-Itemsets
Combine the frequent 1-itemsets into pairs and calculate their support.
For this example we get 3 item pairs, (Bread, Butter), (Bread, Milk) and
(Butter, Milk), and we calculate their support in the same way as in
Step 2.
Frequent 2-itemsets:
 {Bread, Milk} (3 of 5, 60%) meets the 50% threshold, but {Butter, Milk}
(2 of 5, 40%) and {Bread, Butter} (2 of 5, 40%) do not, so they are
omitted.
Step 4: Generate Candidate 3-Itemsets
Combine the frequent 2-itemsets into groups of 3 and calculate their
support. For the triplet we get only one case, {Bread, Butter, Milk}, and
we calculate its support: it appears in only 1 of 5 transactions (20%).
Since this does not meet the 50% threshold, there are no frequent
3-itemsets.

Step 5: Generate Association Rules

Now we generate rules from the itemsets and calculate their confidence,
assuming a minimum confidence threshold of 60%. (Strictly speaking,
Apriori generates rules only from frequent itemsets; Rules 1 and 2 below
are included to illustrate the confidence calculation.)
Rule 1: Bread → Butter (if a customer buys bread, the customer will also
buy butter)
 Support of {Bread, Butter} = 2.
 Support of {Bread} = 4.
 Confidence = 2/4 = 50% (fails the threshold).
Rule 2: Butter → Bread (if a customer buys butter, the customer will also
buy bread)
 Support of {Bread, Butter} = 2.
 Support of {Butter} = 3.
 Confidence = 2/3 ≈ 67% (passes the threshold).
Rule 3: Bread → Milk (if a customer buys bread, the customer will also
buy milk)
 Support of {Bread, Milk} = 3.
 Support of {Bread} = 4.
 Confidence = 3/4 = 75% (passes the threshold).
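These confidence values can be checked mechanically. The short Python sketch below re-derives them from the illustrative transaction table shown above:

# Illustrative five-transaction dataset, consistent with the support
# counts used in Steps 1-5 above.
transactions = [
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Bread", "Milk"},
    {"Butter", "Milk"},
]

def count(itemset):
    # Number of transactions containing every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent):
    return count(antecedent | consequent) / count(antecedent)

print(confidence({"Bread"}, {"Butter"}))  # 2/4 = 0.50 -> fails 60%
print(confidence({"Butter"}, {"Bread"}))  # 2/3 ~ 0.67 -> passes
print(confidence({"Bread"}, {"Milk"}))    # 3/4 = 0.75 -> passes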

Applications of the Apriori Algorithm

Below are some applications of the Apriori algorithm in companies and
startups today:
1. E-commerce: Used to recommend products that are often bought
together, like laptop + laptop bag, increasing sales.
2. Food Delivery Services: Identifies popular combos, such as burger +
fries, to offer combo deals to customers.
3. Streaming Services: Recommends related movies or shows based on
what users often watch together, like action + superhero movies.
4. Financial Services: Analyzes spending habits to suggest personalised
offers, such as credit card deals based on frequent purchases.
5. Travel & Hospitality: Creates travel packages (e.g., flight + hotel) by
finding commonly purchased services together.
6. Health & Fitness: Suggests workout plans or supplements based on
users' past activities, like protein shakes + workouts .

Frequent Pattern Growth Algorithm


The FP-Growth (Frequent Pattern Growth) algorithm efficiently mines
frequent itemsets from large transactional datasets.

How FP-Growth Works


Here's how it works in simple terms:

Data Compression: First, FP-Growth compresses the dataset into a smaller
structure called the Frequent Pattern Tree (FP-Tree). This tree stores
information about itemsets (collections of items) and their frequencies
without needing to generate candidate sets the way Apriori does.

Mining the Tree: The algorithm then examines this tree to identify
patterns that appear frequently, based on a minimum support threshold. It
does this by breaking the tree down into smaller "conditional" trees for
each item, making the process more efficient.

Generating Patterns: Once the tree is built and analyzed, the algorithm
generates the frequent patterns (itemsets) and the rules that describe
relationships between items.
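In practice FP-Growth is rarely implemented from scratch; for example, the open-source mlxtend library provides an implementation. A minimal sketch, assuming mlxtend and pandas are installed:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# The five-transaction example used in the next section.
transactions = [
    ["E", "K", "M", "N", "O", "Y"],
    ["D", "E", "K", "N", "O", "Y"],
    ["A", "E", "K", "M"],
    ["C", "K", "M", "U", "Y"],
    ["C", "E", "I", "K", "O"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Minimum support of 3 out of 5 transactions = 0.6.
print(fpgrowth(df, min_support=0.6, use_colnames=True))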

Working of the FP-Growth Algorithm

Let's walk through the FP-Growth algorithm and how it works on real-life
data. Consider the following data:

Transaction ID | Items
T1 | {E, K, M, N, O, Y}
T2 | {D, E, K, N, O, Y}
T3 | {A, E, K, M}
T4 | {C, K, M, U, Y}
T5 | {C, E, I, K, O}

The above is a hypothetical dataset of transactions, with each letter
representing an item. First, the frequency of each individual item is
computed:


Item | Frequency
A | 1
C | 2
D | 1
E | 4
I | 1
K | 5
M | 3
N | 2
O | 3
U | 1
Y | 3

Let the minimum support be 3. A Frequent Pattern set L is built,
containing all the items whose frequency is greater than or equal to the
minimum support; these items are stored in descending order of their
respective frequencies (items with equal frequency are kept in a fixed
order). After insertion of the relevant items, the set L looks like this:

L = {K : 5, E : 4, M : 3, O : 3, Y : 3}
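A minimal sketch of how L drives the next step: each transaction is filtered down to the frequent items and rewritten in L's order (the fixed order for the tied items M, O, Y is carried over from above):

# Frequent items with minimum support 3, in descending-frequency order.
L = {"K": 5, "E": 4, "M": 3, "O": 3, "Y": 3}
order = list(L)  # ['K', 'E', 'M', 'O', 'Y'] (dicts preserve insertion order)

def ordered_itemset(transaction):
    # Keep only frequent items, in the order fixed by L.
    return [item for item in order if item in transaction]

print(ordered_itemset({"E", "K", "M", "N", "O", "Y"}))  # ['K', 'E', 'M', 'O', 'Y']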

Now for each transaction, the respective Ordered-Item set is built. This
is done by iterating over the Frequent Pattern set L and checking whether
each item is contained in the transaction in question; if it is, the item
is appended to the Ordered-Item set for that transaction. The following
table is built for all the transactions:

Transaction ID | Items | Ordered-Item Set
T1 | {E, K, M, N, O, Y} | {K, E, M, O, Y}
T2 | {D, E, K, N, O, Y} | {K, E, O, Y}
T3 | {A, E, K, M} | {K, E, M}
T4 | {C, K, M, U, Y} | {K, M, Y}
T5 | {C, E, I, K, O} | {K, E, O}

Now all the Ordered-Item sets are inserted into a tree data structure
(the FP-Tree).

a) Inserting the set {K, E, M, O, Y}

Here all the items are simply linked one after the other in the order of
occurrence in the set, and the support count of each item is initialized
to 1. To insert {K, E, M, O, Y}, we traverse the tree from the root. If a
node already exists for an item, we increase its support count; if it
doesn't exist, we create a new node for that item and link it to the
previous item.



b) Inserting the set {K, E, O, Y}
Up to the insertion of the elements K and E, the support counts are simply
increased by 1. On inserting O, we see that there is no direct link
between E and O, so a new node for item O is initialized with a support
count of 1 and linked under E. On inserting Y, we likewise initialize a
new node for item Y with a support count of 1 and link it under the new O
node.

c) Inserting the set {K, E, M}


Here simply the support count of each element is increased by 1.

Dr. Vidya Pawar, Vedanta Degree College, Raichur Page 8


Mining Frequent Patterns

d) Inserting the set {K, M, Y}


Similar to step b), first the support count of K is increased, then new nodes
for M and Y are initialized and linked accordingly.

e) Inserting the set {K, E, O}

Here simply the support counts of the respective elements are increased.
Note that it is the support count of the O node created in step b) that is
increased (not the O node that sits under M).
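All five insertions follow one rule: reuse an existing child node and increment its count, otherwise create a new child with count 1. A minimal Python sketch of that insertion logic (the node structure and names are illustrative assumptions):

class Node:
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}  # item -> Node

def insert(root, ordered_items):
    # Walk from the root, extending the path item by item.
    node = root
    for item in ordered_items:
        child = node.children.get(item)
        if child is None:            # no existing branch: create a node
            child = Node(item)
            node.children[item] = child
        child.count += 1             # existing branch: bump the support
        node = child

root = Node(None)
for t in [["K", "E", "M", "O", "Y"], ["K", "E", "O", "Y"], ["K", "E", "M"],
          ["K", "M", "Y"], ["K", "E", "O"]]:
    insert(root, t)
# root.children["K"].count == 5, matching the tree built in steps a)-e).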


Next, for each item the Conditional Pattern Base is listed: the set of
prefix paths in the tree that lead to that item, each with its support
count. Note that the items in the table below are arranged in ascending
order of their frequencies:

Item | Conditional Pattern Base
Y | {K, E, M, O : 1}, {K, E, O : 1}, {K, M : 1}
O | {K, E, M : 1}, {K, E : 2}
M | {K, E : 2}, {K : 1}
E | {K : 4}
K | (empty)

Now for each item, the Conditional Frequent Pattern Tree is built. It is
done by taking the set of elements that is common to all the paths in the
Conditional Pattern Base of that item and calculating the support count of
each such element by summing the support counts of the paths it appears
in:

Item | Conditional FP-Tree
Y | {K : 3}
O | {K, E : 3}
M | {K : 3}
E | {K : 4}
K | (empty)

From each Conditional FP-Tree, the frequent patterns are generated by
appending the item to every combination of the elements in its tree:

Item | Frequent Patterns Generated
Y | {K, Y : 3}
O | {K, O : 3}, {E, O : 3}, {K, E, O : 3}
M | {K, M : 3}
E | {K, E : 4}
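Because every ordered transaction here corresponds to a path of the tree, the conditional pattern bases can be sketched directly from the Ordered-Item sets: for each frequent item, collect the prefix of items preceding it, with each matching transaction contributing a count of 1 (a simplification of the tree-based extraction):

# Ordered-item sets from the table above.
ordered = [["K", "E", "M", "O", "Y"], ["K", "E", "O", "Y"], ["K", "E", "M"],
           ["K", "M", "Y"], ["K", "E", "O"]]

def conditional_pattern_base(item):
    # Prefix paths that lead to `item`, one per transaction containing it.
    base = []
    for t in ordered:
        if item in t:
            prefix = t[:t.index(item)]
            if prefix:
                base.append(prefix)
    return base

print(conditional_pattern_base("Y"))
# [['K', 'E', 'M', 'O'], ['K', 'E', 'O'], ['K', 'M']]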


Association Rule
Association rule mining finds interesting associations and relationships
among large sets of data items. An association rule shows how frequently
an itemset occurs in a transaction. A typical example is market basket
analysis, one of the key techniques used by large retailers to discover
associations between items: it allows retailers to identify relationships
between the items that people frequently buy together. Given a set of
transactions, we can find rules that will predict the occurrence of an
item based on the occurrences of other items in the transaction.
OR
Association rule mining is a technique in data mining for discovering
interesting relationships, frequent patterns, associations, or
correlations between variables in large datasets. It is widely used in
fields such as market basket analysis, web usage mining, and
bioinformatics. The basic idea is to find rules that predict the
occurrence of an item based on the occurrences of other items in a
transaction.

Imagine a small dataset representing transactions in a grocery store: each
row represents a transaction (a customer's purchase) with a unique ID, and
an 'Items Purchased' column lists the items bought in that transaction.


· An association rule is a fundamental concept in data mining that reveals
how items within a dataset are connected. It expresses a strong,
potentially useful relationship between two sets of items.

· These rules are written as "If-Then" statements, typically in the form
{X} → {Y}, where X and Y are different sets of items.

Example

Let us first see the basic definitions, using the following set of five
transactions:

TID | Items
1 | {Bread, Milk}
2 | {Bread, Diaper, Beer, Eggs}
3 | {Milk, Diaper, Beer, Coke}
4 | {Bread, Milk, Diaper, Beer}
5 | {Bread, Milk, Diaper, Coke}

Support Count (σ): the frequency of occurrence of an itemset.
Here σ({Milk, Bread, Diaper}) = 2, since Milk, Bread, and Diaper appear
together in transactions 4 and 5.

Frequent Itemset: an itemset whose support is greater than or equal to a
minsup threshold.

Association Rule: an implication expression of the form X → Y, where X
and Y are any two itemsets.

Example: {Milk, Diaper} → {Beer}


Rule Evaluation Metrics:
 Support (s): the number of transactions that include all the items in
{X} and {Y}, as a percentage of the total number of transactions. It
measures how frequently the collection of items occurs together across
all transactions.

Support(X → Y) = σ(X ∪ Y) ÷ (total number of transactions), i.e. the
fraction of transactions that contain both X and Y.

 Confidence (c): the ratio of the number of transactions that include
all items in both X and Y to the number of transactions that include all
items in X. It measures how often the items in Y appear in transactions
that also contain the items in X.

Conf(X → Y) = Supp(X ∪ Y) ÷ Supp(X)

 Lift (l): the confidence of the rule X → Y divided by the expected
confidence, where the expected confidence is simply the support of {Y}
(the confidence the rule would have if the itemsets X and Y were
independent of each other).

Lift(X → Y) = Conf(X → Y) ÷ Supp(Y)

A lift value near 1 indicates that X and Y appear together about as often
as expected under independence; a value greater than 1 means they appear
together more often than expected; a value less than 1 means less often.
Greater lift values indicate a stronger association. (All three metrics
are computed in the sketch below.)
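As a worked check, the snippet below computes all three metrics for the rule {Milk, Diaper} → {Beer} on the five-transaction example table above:

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
n = len(transactions)

def supp(itemset):
    # Fraction of transactions containing every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t) / n

X, Y = {"Milk", "Diaper"}, {"Beer"}
support = supp(X | Y)                # 2/5 = 0.40
confidence = supp(X | Y) / supp(X)   # (2/5) / (3/5) ~ 0.67
lift = confidence / supp(Y)          # 0.67 / 0.60 ~ 1.11
print(support, confidence, lift)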

To illustrate, consider a rule like {Diapers} → {Baby Wipes}. This rule
suggests that in transactions where diapers are bought, there is a strong
likelihood that baby wipes are also purchased.

Market Basket Analysis


This is a primary application of association rules, focusing on analyzing
purchase patterns. By examining combinations of items that frequently occur
together in purchases, businesses can gain insights into marketing and sales
strategies.

Components of Association Rules: Antecedent and Consequent

{BREAD, MILK} → {BEER}

Antecedent → Consequent
· Every association rule has two parts: the antecedent (if) and
the consequent (then). For instance, in the rule {Pasta} → {Sauce},
'Pasta' is the antecedent and 'Sauce' is the consequent.

· Antecedent (X): This is the first part of the rule, the condition. It
is the set of items found in the database whose presence you are
examining for patterns. In the rule {X} → {Y}, X is the antecedent.

· Consequent (Y): This is the second part of the rule, which is inferred
from the presence of the antecedent in transactions. In the rule
{X} → {Y}, Y is the consequent.

These two sets are disjoint, meaning they do not overlap.
