
FUNDAMENTALS OF DATA SCIENCE

SMD COLLEGE, HOSAPETE

UNIT 3: Mining Frequent Patterns

PREPARED BY:
KUMAR MUTTURAJ
BCA DEPT


Basic Concepts:
Frequent pattern mining in data mining is the process of identifying patterns or associations
within a dataset that occur frequently. This is typically done by analyzing large datasets to
find items or sets of items that appear together frequently.
Frequent pattern mining is an essential task in data mining that aims to uncover
recurring patterns or itemsets in a given dataset. It involves identifying collections
of items that frequently occur together in a transactional or relational database. This
process can offer valuable insight into the relationships and associations among
different items or attributes within the data.

There are several different algorithms used for frequent pattern mining, including:
1. Apriori algorithm: This is one of the most commonly used algorithms for frequent
pattern mining. It uses a “bottom-up”, level-wise approach to identify frequent
itemsets and then generates association rules from those itemsets.
2. ECLAT algorithm: This algorithm uses a depth-first search over vertical
transaction-id lists to identify frequent itemsets, as sketched below. It is
particularly efficient for datasets with a large number of items.
3. FP-growth algorithm: This algorithm uses a compression technique (an FP-tree) to
find frequent patterns efficiently. It is particularly efficient for datasets with a
large number of transactions.
Frequent pattern mining has many applications, such as Market Basket Analysis,
Recommender Systems, Fraud Detection, and many more.
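To make the ECLAT idea concrete, the following is a minimal, illustrative Python sketch; the toy transactions and min_support value are assumptions for illustration, not part of this unit. ECLAT works on a vertical layout: each item is mapped to the set of transaction IDs (a tid-list) in which it occurs, and itemsets are extended depth-first by intersecting tid-lists.

```python
def eclat(prefix, items, min_support, results):
    """Depth-first search over vertical tid-lists (the core of ECLAT)."""
    while items:
        item, tids = items.pop()
        if len(tids) >= min_support:
            # Record the frequent itemset together with its support count
            results[frozenset(prefix + [item])] = len(tids)
            # Extend the prefix: intersect this tid-list with the rest
            suffix = [(other, tids & other_tids)
                      for other, other_tids in items
                      if len(tids & other_tids) >= min_support]
            eclat(prefix + [item], suffix, min_support, results)

# Toy transactions; the vertical layout maps item -> set of transaction IDs
transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"milk", "butter"}, {"bread", "milk", "butter"}]
vertical = {}
for tid, basket in enumerate(transactions):
    for item in basket:
        vertical.setdefault(item, set()).add(tid)

results = {}
eclat([], sorted(vertical.items()), min_support=2, results=results)
print(results)  # every itemset occurring in at least 2 transactions
```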

Frequent Itemset Mining Methods:


Association mining searches for frequent itemsets in the data set. Frequent pattern
mining usually uncovers interesting associations and correlations between itemsets in
transactional and relational databases. In short, frequent mining shows which items
appear together in a transaction or relationship.

Need of Association Mining: Frequent mining enables the generation of association rules
from a transactional dataset. If two items X and Y are frequently purchased together,
then it is good to put them together in stores, or to provide some discount offer on one
item on purchase of the other. This can really increase sales. For example, it is likely
to find that if a customer buys milk and bread, he/she also buys butter. So the
association rule is {milk, bread} => {butter}, and the seller can suggest that a customer
who buys milk and bread also buy butter.

Advantages of using frequent item sets and association rule mining include:
1. Efficient discovery of patterns: Association rule mining algorithms are efficient at
discovering patterns in large datasets, making them useful for tasks such as market
basket analysis and recommendation systems.
2. Easy to interpret: The results of association rule mining are easy to understand and
interpret, making it possible to explain the patterns found in the data.
3. Can be used in a wide range of applications: Association rule mining can be used in a
wide range of applications such as retail, finance, and healthcare, which can help to
improve decision-making and increase revenue.
4. Handling large datasets: These algorithms can handle large datasets with many items
and transactions, which makes them suitable for big-data scenarios.
Disadvantages of using frequent item sets and association rule mining include:
1. Large number of generated rules: Association rule mining can generate a large
number of rules, many of which may be irrelevant or uninteresting, which can make
it difficult to identify the most important patterns.
2. Limited in detecting complex relationships: Association rule mining is limited in its
ability to detect complex relationships between items, and it only considers the co-
occurrence of items in the same transaction.
3. Can be computationally expensive: As the number of items and transactions
increases, the number of candidate item sets also increases, which can make the
algorithm computationally expensive.
4. Need to define the minimum support and confidence threshold: The minimum
support and confidence threshold must be set before the association rule mining
process, which can be difficult and requires a good understanding of the data.

Apriori algorithm:
The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for finding
frequent itemsets in a dataset for Boolean association rules. The algorithm is named
Apriori because it uses prior knowledge of frequent itemset properties. It applies an
iterative, level-wise search in which frequent k-itemsets are used to find frequent
(k+1)-itemsets.

To improve the efficiency of the level-wise generation of frequent itemsets, an
important property called the Apriori property is used, which helps by reducing the
search space.

Apriori Property –
All non-empty subsets of a frequent itemset must be frequent. The key concept of the
Apriori algorithm is the anti-monotonicity of the support measure. Apriori assumes that:
All subsets of a frequent itemset must be frequent (Apriori property).
If an itemset is infrequent, all its supersets will be infrequent.
Consider the following dataset; we will find the frequent itemsets and generate
association rules for them.

A set of transactions consistent with every support count used in the steps below (the
classic example from Han and Kamber's textbook) is:

TID     Items
T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3

minimum support count is 2

minimum confidence is 50%

Step-1: K=1
(I) Create a table containing the support count of each item present in the dataset,
called C1 (the candidate set).

(II) Compare each candidate itemset's support count with the minimum support count (here
min_support = 2; if the support count of a candidate itemset is less than min_support,
remove it). This gives us the itemset L1.

Step-2: K=2
 Generate candidate set C2 using L1 (this is called the join step). The condition
for joining two itemsets of Lk-1 is that they have (k-2) elements in common.
 Check whether all subsets of each itemset are frequent and, if not, remove that
itemset. (For example, the subsets of {I1, I2} are {I1} and {I2}, which are
frequent. Check this for each itemset.)
 Now find the support count of these itemsets by searching the dataset.

(II) Compare the candidate set (C2) support counts with the minimum support count (here
min_support = 2; if the support count of a candidate itemset is less than min_support,
remove it). This gives us the itemset L2.

Step-3:
o Generate candidate set C3 using L2 (join step). The condition for joining two
itemsets of Lk-1 is that they have (k-2) elements in common, so here, for L2,
the first element should match.
The itemsets generated by joining L2 are {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5},
{I2, I3, I4}, {I2, I4, I5}, {I2, I3, I5}.
o Check whether all subsets of these itemsets are frequent and, if not, remove
that itemset. (Here the subsets of {I1, I2, I3} are {I1, I2}, {I2, I3}, {I1, I3},
which are all frequent. For {I2, I3, I4}, the subset {I3, I4} is not frequent, so
remove that itemset. Similarly check every itemset.)
o Find the support count of the remaining itemsets by searching the dataset.

(II) Compare the candidate set (C3) support counts with the minimum support count (here
min_support = 2; if the support count of a candidate itemset is less than min_support,
remove it). This gives us the itemset L3.

Step-4:
o Generate candidate set C4 using L3 (join step). The condition for joining two
itemsets of Lk-1 (with k = 4) is that they have (k-2) elements in common, so
here, for L3, the first 2 elements (items) should match.
o Check whether all subsets of these itemsets are frequent (here the itemset
formed by joining L3 is {I1, I2, I3, I5}, and its subset {I1, I3, I5} is not
frequent). So there is no itemset in C4.
o We stop here because no further frequent itemsets are found.

Thus, we have discovered all the frequent itemsets. Now the generation of strong
association rules comes into the picture. For that we need to calculate the confidence
of each rule.

Confidence –
A confidence of 60% means that 60% of the customers who purchased milk and bread also
bought butter.

Confidence(A->B)=Support_count(A∪B)/Support_count(A)

So here, taking one of the frequent itemsets as an example, we will show the rule
generation.
Itemset {I1, I2, I3} // from L3

So the rules can be:
{I1, I2} => {I3} // confidence = sup({I1, I2, I3})/sup({I1, I2}) = 2/4 = 50%
{I1, I3} => {I2} // confidence = sup({I1, I2, I3})/sup({I1, I3}) = 2/4 = 50%
{I2, I3} => {I1} // confidence = sup({I1, I2, I3})/sup({I2, I3}) = 2/4 = 50%
{I1} => {I2, I3} // confidence = sup({I1, I2, I3})/sup({I1}) = 2/6 = 33%
{I2} => {I1, I3} // confidence = sup({I1, I2, I3})/sup({I2}) = 2/7 = 28%
{I3} => {I1, I2} // confidence = sup({I1, I2, I3})/sup({I3}) = 2/6 = 33%
Since the minimum confidence is 50%, the first 3 rules can be considered strong
association rules.
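As a summary of the whole procedure, here is a minimal Python sketch of level-wise Apriori, assuming the nine transactions reconstructed above; it reproduces L1 through L3 and the rule confidences computed in the steps.

```python
from itertools import combinations

# The nine transactions consistent with the support counts used above
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_SUPPORT = 2       # minimum support count
MIN_CONFIDENCE = 0.5  # minimum confidence (50%)

def support_count(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

# Level-wise search: frequent k-itemsets (Lk) build candidates C(k+1)
items = sorted({i for t in transactions for i in t})
level = [frozenset([i]) for i in items if support_count({i}) >= MIN_SUPPORT]
frequent = list(level)
k = 1
while level:
    k += 1
    # Join step: unite itemsets of the previous level that differ in one item
    candidates = {a | b for a in level for b in level if len(a | b) == k}
    # Prune step (Apriori property): drop candidates with an infrequent subset
    prev = set(level)
    candidates = {c for c in candidates
                  if all(frozenset(s) in prev for s in combinations(c, k - 1))}
    level = [c for c in candidates if support_count(c) >= MIN_SUPPORT]
    frequent.extend(level)

# Rule generation: confidence(A => B) = sup(A ∪ B) / sup(A)
for itemset in (f for f in frequent if len(f) > 1):
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            conf = support_count(itemset) / support_count(antecedent)
            if conf >= MIN_CONFIDENCE:
                print(sorted(antecedent), "=>", sorted(itemset - antecedent),
                      f"confidence = {conf:.0%}")
```

With the minimum confidence of 50%, this prints the three strong rules derived from {I1, I2, I3} above, together with the strong rules that come from the frequent 2-itemsets.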

Frequent Pattern Growth:

Frequent Pattern Growth (FP-Growth) is a popular algorithm in data mining for finding
frequent patterns (sets of items that frequently occur together) in transactional
databases. It improves upon the Apriori algorithm by using an FP-tree (Frequent Pattern
tree) structure to store frequent itemsets in a compressed and structured form.

Suppose we have a transactional database where each transaction consists of items
purchased by customers:

Transaction 1: {bread, milk, butter}
Transaction 2: {bread, coffee}
Transaction 3: {bread, butter}
Transaction 4: {bread, milk, coffee}
Transaction 5: {bread, milk}
And let's say our minimum support threshold is 2 (meaning we are interested in itemsets
that appear in at least 2 transactions).

Step 1: Counting item frequencies


First, we count the frequency of each item:
 bread: 5
 milk: 3
 butter: 2
 coffee: 2
Since the minimum support threshold is 2, all items meet this criterion.

Step 2: Building the FP-Tree

Next, we construct the FP-tree:

1. Transaction 1: {bread, milk, butter}
o Insert items into the tree: bread -> milk -> butter
2. Transaction 2: {bread, coffee}
o Insert items into the tree: bread -> coffee
3. Transaction 3: {bread, butter}
o Insert items into the tree: bread -> butter
4. Transaction 4: {bread, milk, coffee}
o Insert items into the tree: bread -> milk -> coffee
5. Transaction 5: {bread, milk}
o Insert items into the tree: bread -> milk

The FP-tree structure will look like this:

FP-tree:
- bread (5)
  - milk (3)
    - butter (1)
    - coffee (1)
  - butter (1)
  - coffee (1)

Here, the number in parentheses is the count of transactions whose path passes through
that node.
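The tree-building step can be sketched in a few lines of Python. This is an illustrative fragment rather than a full FP-Growth implementation: it only builds and prints the prefix tree shown above, with items inserted in descending frequency order.

```python
class FPNode:
    """A node of the FP-tree: an item, a count, and child nodes."""
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

def insert(root, ordered_items):
    """Insert one transaction (items already in frequency order)."""
    node = root
    for item in ordered_items:
        child = node.children.setdefault(item, FPNode(item))
        child.count += 1   # shared prefixes accumulate their counts
        node = child

# The five example transactions, with items ranked by overall frequency
order = {"bread": 0, "milk": 1, "butter": 2, "coffee": 3}
baskets = [{"bread", "milk", "butter"}, {"bread", "coffee"},
           {"bread", "butter"}, {"bread", "milk", "coffee"},
           {"bread", "milk"}]
root = FPNode(None)
for basket in baskets:
    insert(root, sorted(basket, key=order.get))

def show(node, depth=0):
    """Print the tree with the same indentation as the diagram above."""
    for child in node.children.values():
        print("  " * depth + f"- {child.item} ({child.count})")
        show(child, depth + 1)

show(root)
```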

Step 3: Mining frequent itemsets

From the FP-tree, we can generate frequent itemsets. Starting from the bottom of the
tree (the least frequent items), we obtain:
 {bread}: 5 transactions
 {milk}: 3 transactions
 {butter}: 2 transactions
 {coffee}: 2 transactions
 {bread, milk}: 3 transactions
 {bread, butter}: 2 transactions
 {bread, coffee}: 2 transactions

Thus, the frequent itemsets meeting the minimum support threshold (2 transactions) are
{bread}, {milk}, {butter}, {coffee}, {bread, milk}, {bread, butter}, and {bread, coffee}.

This example demonstrates the basic process of building an FP-tree from transactional
data and using it to efficiently mine frequent itemsets without generating candidate
itemsets explicitly, which is the key advantage of the FP-Growth algorithm.
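In practice, FP-Growth is usually taken from a library rather than hand-coded. As one illustration, the third-party mlxtend library (an assumption here; any frequent-pattern library would do) exposes FP-Growth over a one-hot encoded DataFrame. Note that mlxtend expresses minimum support as a fraction, so a count of 2 out of 5 transactions becomes 0.4.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

baskets = [["bread", "milk", "butter"], ["bread", "coffee"],
           ["bread", "butter"], ["bread", "milk", "coffee"],
           ["bread", "milk"]]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(baskets).transform(baskets), columns=te.columns_)

# min_support is a fraction: 2 of 5 transactions = 0.4
print(fpgrowth(onehot, min_support=0.4, use_colnames=True))
```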

Mining Association Rules:

Association rule learning is a type of unsupervised learning technique that checks for
the dependency of one data item on another and maps them accordingly, so that the
results can be made more profitable. It tries to find interesting relations or
associations among the variables of a dataset, applying different rules to discover
interesting relations between variables in the database.

Association rule learning is one of the very important concepts of machine learning,
and it is employed in market basket analysis, web usage mining, continuous production,
etc. Market basket analysis is a technique used by various big retailers to discover
the associations between items. We can understand it by taking the example of a
supermarket, where all products that are frequently purchased together are placed
together.

For example, if a customer buys bread, he most likely will also buy butter, eggs, or
milk, so these products are stored on the same shelf or mostly nearby.

Association rule learning can be divided into three types of algorithms:
1. Apriori
2. Eclat
3. FP-Growth Algorithm

Association rule learning works on the concept of if-then statements, such as “if A
then B”.

Here the “if” element is called the antecedent, and the “then” statement is called the
consequent. A relationship in which an association is found between two single items
is known as single cardinality; as the number of items in a rule increases, the
cardinality increases accordingly. So, to measure the associations between thousands of
data items, there are several metrics. These metrics are given below:

o Support
o Confidence
o Lift

Let's understand each of them:

Support
Support is the frequency of A, or how frequently an itemset appears in the dataset. It
is defined as the fraction of the transactions T that contain the itemset X. For
transactions T, it can be written as:

Support(X) = Freq(X) / |T|

Confidence
Confidence indicates how often the rule has been found to be true, or how often the
items X and Y occur together in the dataset given that X has already occurred. It is
the ratio of the number of transactions that contain both X and Y to the number of
transactions that contain X:

Confidence(X => Y) = Freq(X ∪ Y) / Freq(X)

Lift
Lift is the strength of a rule, defined by the formula below. It is the ratio of the
observed support to the support expected if X and Y were independent of each other:

Lift(X => Y) = Support(X ∪ Y) / (Support(X) × Support(Y))

It has three possible cases:
o If Lift = 1: The occurrence of the antecedent and that of the consequent are
independent of each other.
o If Lift > 1: It indicates the degree to which the two itemsets are dependent on
each other.
o If Lift < 1: It tells us that one item is a substitute for the other, meaning one
item has a negative effect on the occurrence of the other.
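All three metrics can be computed directly from their definitions. The following is a minimal sketch; the toy baskets are assumptions chosen only to illustrate the formulas.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if set(itemset) <= set(t)) / len(transactions)

def confidence(X, Y, transactions):
    """How often Y appears in transactions that already contain X."""
    return support(set(X) | set(Y), transactions) / support(X, transactions)

def lift(X, Y, transactions):
    """Observed support of X ∪ Y relative to X and Y being independent."""
    return confidence(X, Y, transactions) / support(Y, transactions)

baskets = [["bread", "milk", "butter"], ["bread", "coffee"],
           ["bread", "butter"], ["bread", "milk", "coffee"],
           ["bread", "milk"]]
print(confidence({"milk"}, {"butter"}, baskets))  # 0.33: one of three milk baskets has butter
print(lift({"bread"}, {"milk"}, baskets))         # 1.0: bread is in every basket
```

Because bread appears in every basket, knowing that a basket contains bread says nothing extra about milk, which is why the lift comes out to exactly 1.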

Applications of Association Rule Learning:

It has various applications in machine learning and data mining. Below are some popular
applications of association rule learning:
o Market Basket Analysis: It is one of the popular examples and applications of
association rule mining. This technique is commonly used by big retailers to
determine the association between items.

o Medical Diagnosis: With the help of association rules, patients can be treated more
effectively, as the rules help in identifying the probability of illness for a
particular disease.

o Protein Sequence: Association rules help in determining the synthesis of artificial
proteins.

o It is also used for catalog design, loss-leader analysis, and many other
applications.
