DM-u3

Unit III covers mining frequent patterns, focusing on concepts such as the Apriori algorithm for finding frequent itemsets, generating association rules, and mining multilevel associations. It emphasizes the importance of frequent pattern analysis in various applications, including market basket analysis and data classification. The document also discusses the limitations and benefits of market basket analysis, as well as improvements to the Apriori method for efficient candidate generation and support counting.

Unit III

Mining Frequent Patterns


Unit - III
• Basic Concepts

• Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation

• Generating Association Rules from Frequent Itemsets

• Mining Multilevel Associations

• Constraint-Based Frequent Pattern Mining


What Is Frequent Pattern Analysis?
• Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.)
that occurs frequently in a data set
• First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of
frequent itemsets and association rule mining
• Motivation: Finding inherent regularities in data
– What products were often purchased together?— Beer and diapers?!
– What are the subsequent purchases after buying a PC?
– What kinds of DNA are sensitive to this new drug?
– Can we automatically classify web documents?
• Applications
– Basket data analysis, cross-marketing, catalog design, sale campaign
analysis, Web log (click stream) analysis, and DNA sequence analysis.

Why Is Freq. Pattern Mining Important?

• Freq. pattern: An intrinsic and important property of datasets


• Foundation for many essential data mining tasks
– Association, correlation, and causality analysis
– Sequential, structural (e.g., sub-graph) patterns
– Pattern analysis in spatiotemporal, multimedia, time-series,
and stream data
– Classification: discriminative, frequent pattern analysis
– Cluster analysis: frequent pattern-based clustering
– Data warehousing: iceberg cube and cube-gradient
– Semantic data compression: fascicles
– Broad applications
Basic Concepts: Frequent Patterns

Tid  Items bought
10   Beer, Nuts, Diaper
20   Beer, Coffee, Diaper
30   Beer, Diaper, Eggs
40   Nuts, Eggs, Milk
50   Nuts, Coffee, Diaper, Eggs, Milk

• itemset: a set of one or more items
• k-itemset: an itemset X = {x1, …, xk} containing k items
• (absolute) support, or support count, of X: the frequency or number of occurrences of itemset X
• (relative) support, s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X)
• An itemset X is frequent if X’s support is no less than a minsup threshold
Basic Concepts: Association Rules

Tid  Items bought
10   Beer, Nuts, Diaper
20   Beer, Coffee, Diaper
30   Beer, Diaper, Eggs
40   Nuts, Eggs, Milk
50   Nuts, Coffee, Diaper, Eggs, Milk

• Find all the rules X ⇒ Y with minimum support and confidence
  – support, s: probability that a transaction contains X ∪ Y
  – confidence, c: conditional probability that a transaction having X also contains Y
• Let minsup = 50%, minconf = 50%
  – Frequent patterns: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer, Diaper}:3
  – Association rules (many more exist):
    • Beer ⇒ Diaper (support 60%, confidence 100%)
    • Diaper ⇒ Beer (support 60%, confidence 75%)
Closed Patterns and Max-Patterns
• A long pattern contains a combinatorial number of sub-patterns; e.g., {a1, …, a100} contains C(100,1) + C(100,2) + … + C(100,100) = 2^100 − 1 ≈ 1.27×10^30 sub-patterns!
• Solution: mine closed patterns and max-patterns instead
• An itemset X is closed if X is frequent and there exists no super-pattern Y ⊃ X with the same support as X (proposed by Pasquier, et al. @ ICDT’99)
• An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern Y ⊃ X (proposed by Bayardo @ SIGMOD’98)
• Closed patterns are a lossless compression of frequent patterns
  – Reducing the number of patterns and rules
Closed Patterns and Max-Patterns
• Exercise. DB = {<a1, …, a100>, <a1, …, a50>}
  – Min_sup = 1.
• What is the set of closed itemsets?
  – <a1, …, a100>: 1
  – <a1, …, a50>: 2
• What is the set of max-patterns?
  – <a1, …, a100>: 1
• What is the set of all frequent patterns?
  – All 2^100 − 1 nonempty subsets of {a1, …, a100}: far too many to enumerate!
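The exercise can be checked by brute force on a scaled-down analogue; the DB below is a hypothetical 5-item stand-in for the 100-item one, so that enumeration stays feasible:

```python
from itertools import combinations

# Scaled-down analogue of the exercise DB (hypothetical): 5 items, min_sup = 1
db = [frozenset(f"a{i}" for i in range(1, 6)),   # <a1, ..., a5>
      frozenset(f"a{i}" for i in range(1, 4))]   # <a1, ..., a3>
min_sup = 1

items = sorted(set().union(*db))
# support count of every frequent itemset
support = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = frozenset(combo)
        cnt = sum(1 for t in db if s <= t)
        if cnt >= min_sup:
            support[s] = cnt

# closed: frequent, and no proper superset has the same support
closed = {x for x in support
          if not any(x < y and support[y] == support[x] for y in support)}
# max: frequent, and no frequent proper superset exists at all
maximal = {x for x in support if not any(x < y for y in support)}
```

With min_sup = 1 every one of the 2^5 − 1 = 31 nonempty subsets is frequent, but only two itemsets are closed and only one is maximal, mirroring the 100-item exercise.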
Frequent Pattern Mining

•Frequent Patterns: patterns that occur frequently in data.

•Three types of frequent patterns:
–Frequent itemset
–Frequent sequential pattern
–Frequent structured pattern
Frequent Pattern Mining (cntd…)

•Frequent itemset: a set of items, such as milk and bread, that appear frequently together in a transaction data set.

•Frequent sequential pattern: a subsequence that occurs frequently, e.g., in a shopping history database.

•Frequent structured pattern: a substructure that occurs frequently.
Frequent Pattern Mining (cntd…)

•Searches for recurring relationships in a given data set.
•Plays an essential role in association mining.
•Helps in data classification, clustering, and other data mining tasks.
Market Basket Analysis

•The earliest form of frequent pattern mining is market basket analysis.
•Consider a shopping cart filled with several items.
•From a marketing perspective, the goal is to determine which items are frequently purchased together within the same transaction.
Market Basket Analysis (cntd…)
•To categorize customer purchase behavior
•To identify actionable information
–purchase profiles
–profitability of each purchase profile
–use for marketing:
•store layouts
•design catalogs
•select products for promotion
•space allocation, product placement
•To plan marketing or advertising strategies.
•To plan which items to put on sale at reduced prices.
Transactions database Example 1

TID  Products
1    A, B, E
2    B, D
3    B, C
4    A, B, D
5    A, C
6    B, C
7    A, C
8    A, B, C, E
9    A, B, C

Attributes converted to binary flags:

TID  A  B  C  D  E
1    1  1  0  0  1
2    0  1  0  1  0
3    0  1  1  0  0
4    1  1  0  1  0
5    1  0  1  0  0
6    0  1  1  0  0
7    1  0  1  0  0
8    1  1  1  0  1
9    1  1  1  0  0
Support and Confidence

Transactions database Example 1 (as above):

TID  Products
1    A, B, E
2    B, D
3    B, C
4    A, B, D
5    A, C
6    B, C
7    A, C
8    A, B, C, E
9    A, B, C

Example rule: A ⇒ C
•Support: 4/9 ≈ 44% (4 of the 9 transactions contain both A and C)
•Confidence: 4/6 ≈ 66% (of the 6 transactions containing A, 4 also contain C)
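These numbers can be reproduced directly from the table; a minimal sketch in Python:

```python
# Reproducing support and confidence of the rule A => C from the
# 9-transaction example database above.
transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"},
    {"A", "B", "D"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]

def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

sup_rule = support({"A", "C"})           # 4/9, about 44%
confidence = sup_rule / support({"A"})   # (4/9) / (6/9) = 4/6, about 66%
```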
Market Basket Analysis (cntd…)
•LIMITATIONS
–takes over 18 months to implement
–market basket analysis only identifies hypotheses, which need to be tested (e.g., by neural network, regression, or decision tree analyses)
–measurement of impact needed
–difficult to identify product groupings
–complexity grows exponentially
Market Basket Analysis (cntd…)
•BENEFITS
–simple computations
–can be undirected (don’t have to have hypotheses before analysis)
–different data forms can be analyzed
Apriori: A Candidate Generation & Test Approach
• Apriori Property: Any subset of a frequent itemset must be
frequent
• Apriori pruning principle: If there is any itemset which is
infrequent, its superset should not be generated/tested!
(Agrawal & Srikant @VLDB’94, Mannila, et al. @ KDD’ 94)
• Method:
– Initially, scan DB once to get frequent 1-itemset
– Generate length (k+1) candidate itemsets from length k
frequent itemsets
– Test the candidates against DB
– Terminate when no frequent or candidate set can be
generated
Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation: An Example

Sup_min = 2

Database TDB:
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

C1 (after 1st scan):
Itemset  sup
{A}      2
{B}      3
{C}      3
{D}      1
{E}      3

L1:
Itemset  sup
{A}      2
{B}      3
{C}      3
{E}      3

C2 (generated from L1, counted in 2nd scan):
Itemset  sup
{A, B}   1
{A, C}   2
{A, E}   1
{B, C}   2
{B, E}   3
{C, E}   2

L2:
Itemset  sup
{A, C}   2
{B, C}   2
{B, E}   3
{C, E}   2

C3 (generated from L2, counted in 3rd scan):   L3:
Itemset    sup                                 Itemset    sup
{B, C, E}  2                                   {B, C, E}  2
The Apriori Algorithm (Pseudo-Code)
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
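The pseudo-code can be turned into a short runnable sketch in Python (an unoptimized translation, using a set-union self-join as one way of realizing "candidates generated from Lk"):

```python
from itertools import combinations

def apriori(db, min_sup):
    """Return {frequent itemset: support count}, following the pseudo-code."""
    db = [frozenset(t) for t in db]
    items = sorted(set().union(*db))
    # L1 = {frequent items}
    Lk = {frozenset([i]) for i in items
          if sum(1 for t in db if i in t) >= min_sup}
    freq = {s: sum(1 for t in db if s <= t) for s in Lk}
    k = 1
    while Lk:
        # Ck+1: self-join Lk, then prune candidates with an infrequent k-subset
        cands = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        cands = {c for c in cands
                 if all(frozenset(s) in Lk for s in combinations(c, k))}
        # one DB scan counts every candidate contained in each transaction
        counts = {c: sum(1 for t in db if c <= t) for c in cands}
        Lk = {c for c, n in counts.items() if n >= min_sup}
        freq.update({c: counts[c] for c in Lk})
        k += 1
    return freq

tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
freq = apriori(tdb, min_sup=2)
```

On the TDB example above this reproduces L1 = {A:2, B:3, C:3, E:3}, L2 = {AC:2, BC:2, BE:3, CE:2}, and L3 = {BCE:2}.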
Implementation of Apriori
• How to generate candidates?
– Step 1: self-joining Lk
– Step 2: pruning
• Example of Candidate-generation
– L3={abc, abd, acd, ace, bcd}
– Self-joining: L3*L3
• abcd from abc and abd
• acde from acd and ace
– Pruning:
• acde is removed because ade is not in L3
– C4 = {abcd}
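The two steps can be sketched as follows. Note that this simple union-based join also generates a few extra candidates (such as abce) that the prune step then removes; the prefix-based join of the full algorithm avoids generating them in the first place.

```python
from itertools import combinations

# L3 from the example above
L3 = {frozenset(s) for s in ("abc", "abd", "acd", "ace", "bcd")}

# Step 1: self-join - union pairs of L3 whose combined size is k+1 = 4
joined = {a | b for a in L3 for b in L3 if len(a | b) == 4}

# Step 2: prune - keep only candidates whose every 3-subset is in L3
C4 = {c for c in joined
      if all(frozenset(s) in L3 for s in combinations(sorted(c), 3))}
```

C4 comes out as {abcd}: acde is pruned because ade is not in L3, and abce because abe is not.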
How to Count Supports of Candidates?

• Why is counting supports of candidates a problem?
– The total number of candidates can be very huge
• Method:
– Candidate itemsets are stored in a hash-tree
– Leaf node of hash-tree contains a list of itemsets and
counts
– Interior node contains a hash table
– Subset function: finds all the candidates contained in a
transaction
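A hash-tree implementation is more than a slide's worth of code, but the subset function's job can be sketched with a plain dict standing in for the tree (the candidate set below is hypothetical; the hash-tree's benefit is avoiding work for subsets that match no candidate):

```python
from itertools import combinations

# Hypothetical candidate 2-itemsets, stored with their running counts
candidate_counts = {frozenset(c): 0 for c in
                    ({"A", "C"}, {"B", "C"}, {"B", "E"}, {"C", "E"})}
k = 2

def count_candidates(transaction):
    """Subset function: bump the count of every candidate contained in t."""
    for sub in combinations(sorted(transaction), k):
        s = frozenset(sub)
        if s in candidate_counts:
            candidate_counts[s] += 1

count_candidates({"A", "B", "C", "E"})   # contains all four candidates
count_candidates({"B", "E"})             # contains only {B, E}
```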
Example
Generating Association Rules from Frequent Itemsets

[Rule list lost in extraction: candidate rules from a frequent itemset, with their confidences]
If the minimum confidence threshold is, say, 70%, then only the second, third, and last rules are output, because these are the only ones generated that are strong.
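The procedure is: for each frequent itemset l, generate every nonempty proper subset s and output the rule s ⇒ (l − s) whenever confidence = support(l) / support(s) clears the threshold. A sketch on the small TDB from the Apriori example (not the example the sentence above refers to):

```python
from itertools import combinations

db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]

def support_count(itemset):
    return sum(1 for t in db if itemset <= t)

def rules_from(freq_itemset, min_conf):
    """Yield (lhs, rhs, confidence) for every strong rule from one itemset."""
    l = frozenset(freq_itemset)
    for r in range(1, len(l)):
        for lhs in combinations(sorted(l), r):
            lhs = frozenset(lhs)
            conf = support_count(l) / support_count(lhs)
            if conf >= min_conf:
                yield lhs, l - lhs, conf

strong = list(rules_from({"B", "C", "E"}, min_conf=0.7))
```

Only {B, C} ⇒ {E} and {C, E} ⇒ {B} survive (confidence 100% each); the single-item antecedents have confidence 2/3 ≈ 67% and are dropped.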
Further Improvement of the Apriori Method

• Major computational challenges


– Multiple scans of transaction database
– Huge number of candidates
– Tedious workload of support counting for candidates
• Improving Apriori: general ideas
– Reduce passes of transaction database scans
– Shrink number of candidates
– Facilitate support counting of candidates
•Association rules from frequent itemsets
–multilevel association rules
–multidimensional association rules
