
Association Rule Mining

Part 2
(under construction!)

Introduction to Data Mining with Case Studies


Author: G. K. Gupta
Prentice Hall India, 2006.

Bigger Example


Frequency of Items


Frequent Items
Assume 25% support. With 25 transactions, a frequent item must occur in at least 7 transactions. The frequent 1-itemsets, or L1, are given below. How many candidates are in C2? List them.
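To make the counting step concrete, here is a minimal Python sketch of how L1 can be computed; the transaction list is a made-up placeholder, not the table from the slides.

import math
from collections import Counter

# Hypothetical transactions; the actual table is shown on the slide.
transactions = [
    {"B", "M", "T"}, {"A", "B"}, {"A", "B", "M"}, {"T", "S"},
    {"B", "M"}, {"A", "C"}, {"B", "T"}, {"A", "B", "M"},
]

min_support = 0.25
# With 25 transactions this would be ceil(6.25) = 7.
min_count = math.ceil(min_support * len(transactions))

# Count each item once for every transaction it appears in.
item_counts = Counter(item for t in transactions for item in t)

# L1: the frequent 1-itemsets.
L1 = sorted(item for item, count in item_counts.items() if count >= min_count)
print(L1)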


L2
The following pairs are frequent. Now find C3, and then L3 and the rules.


Rules
The full set of rules is given below. Could some rules be removed?

Comment: Study the above rules carefully.


Improving the Apriori Algorithm


Many techniques have been proposed for improving its efficiency:
Pruning (already mentioned)
Hashing-based technique
Transaction reduction
Partitioning
Sampling
Dynamic itemset counting


Pruning
Pruning can reduce the size of the candidate set Ck. We want to transform Ck into the set of frequent itemsets Lk. To reduce the work of checking, we may use the rule that all subsets of a candidate in Ck must also be frequent.


Example
Suppose the items are A, B, C, D, E, F, ..., X, Y, Z.
Suppose L1 is {A, C, E, P, Q, S, T, V, W, X}.
Suppose L2 is {A, C}, {A, F}, {A, P}, {C, P}, {E, P}, {E, G}, {E, V}, {H, J}, {K, M}, {Q, S}, {Q, X}.
Are you able to identify errors in the L2 list?
What is C3?
How to prune C3?
C3 is {A, C, P}, {E, P, V}, {Q, S, X}.
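As a rough Python sketch of how this C3 is generated from the corrected L2 above (the pairs containing items outside L1 removed) and then pruned:

from itertools import combinations

# L2 after removing the erroneous pairs (those containing items not in L1).
L2 = [("A", "C"), ("A", "P"), ("C", "P"), ("E", "P"), ("E", "V"), ("Q", "S"), ("Q", "X")]
L2_set = set(L2)

# Join step: combine pairs that share their first item to form candidate triples.
C3 = set()
for a in L2:
    for b in L2:
        if a < b and a[0] == b[0]:
            C3.add(tuple(sorted(set(a) | set(b))))
print(sorted(C3))   # {A, C, P}, {E, P, V}, {Q, S, X}

# Prune step: keep a candidate only if every 2-item subset is in L2.
pruned = [c for c in sorted(C3) if all(pair in L2_set for pair in combinations(c, 2))]
print(pruned)       # only {A, C, P} survives: {P, V} and {S, X} are not frequent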


Hashing
The direct hashing and pruning (DHP) algorithm attempts to generate large itemsets efficiently and to reduce the size of the transaction database. When generating L1, the algorithm also generates all the 2-itemsets for each transaction, hashes them into a hash table and keeps a count for each bucket.


Example
Consider the transaction database in the first table below used
in an earlier example. The second table below shows all possible
2-itemsets for each transaction.


Hashing Example
The possible 2-itemsets in the last table are now hashed to the hash table below. The last column shown in the table is not required in the hash table, but we have included it to explain the technique.


Hash Function Used


For each pair, a numeric value is obtained by first representing B by 1, C by 2, E by 3, J by 4, M by 5 and Y by 6. Each pair can then be represented by a two-digit number, for example (B, E) by 13 and (C, M) by 25.
The two-digit number is then reduced modulo 8 (dividing by 8 and taking the remainder). This is the bucket address.
A count of the number of pairs hashed to each bucket is kept. Buckets whose count reaches the support count have their bit set to 1, otherwise 0.
All pairs in rows (buckets) that have a zero bit are removed.
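A minimal sketch of this bucket counting, using the hash function described above; the transactions and the support count are placeholders, not the earlier table.

from itertools import combinations

code = {"B": 1, "C": 2, "E": 3, "J": 4, "M": 5, "Y": 6}

def bucket(pair):
    # Two-digit code for the pair, e.g. (B, E) -> 13, then modulo 8 gives the bucket.
    a, b = sorted(pair, key=code.get)
    return (10 * code[a] + code[b]) % 8

# Placeholder transactions.
transactions = [{"B", "C", "E"}, {"B", "M", "Y"}, {"C", "E", "J"}, {"B", "E"}]
min_count = 2

# Hash every 2-itemset of every transaction and count per bucket.
bucket_counts = [0] * 8
for t in transactions:
    for pair in combinations(sorted(t), 2):
        bucket_counts[bucket(pair)] += 1

# Bit vector: 1 for buckets whose count reaches the support count, else 0.
bit_vector = [1 if c >= min_count else 0 for c in bucket_counts]

# A pair is kept as a C2 candidate only if it falls in a bucket with bit 1
# (the full DHP algorithm also requires both of its items to be in L1).
C2 = sorted({pair for t in transactions for pair in combinations(sorted(t), 2)
             if bit_vector[bucket(pair)] == 1})
print(bucket_counts, bit_vector, C2)

Note how a bucket collision can let an infrequent pair slip into C2; this is exactly the issue discussed on the next slide.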

Find C2
The major aim of the algorithm is to reduce the size of C2. It is therefore essential that the hash table is large enough that collisions are rare. Collisions reduce the effectiveness of the hash table. This is what happened in the example, where collisions in three of the eight rows of the hash table required us to work out which of the colliding pairs were actually frequent.


Transaction Reduction
As discussed earlier, any transaction that does
not contain any frequent k-itemsets cannot
contain any frequent (k+1)-itemsets and such a
transaction may be marked or removed.


Example
Frequent items (L1) are A, B, D, M, T. We are not able to use these to eliminate any transactions, since every transaction contains at least one of the items in L1. The frequent pairs (L2) are {A, B} and {B, M}. How can we reduce transactions using these? A short sketch of this reduction follows the table.
TID    Items bought
001    B, M, T, Y
002    B, M
003    T, S, P
004    A, B, C, D
005    A, B
006    T, Y, E
007    A, B, M
008    B, C, D, T, P
009    D, T, S
010    A, B, M
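A sketch of the reduction in Python, using the table above and the frequent pairs {A, B} and {B, M} (the table is typed in by hand here):

# Transactions from the table above.
transactions = {
    "001": {"B", "M", "T", "Y"}, "002": {"B", "M"}, "003": {"T", "S", "P"},
    "004": {"A", "B", "C", "D"}, "005": {"A", "B"}, "006": {"T", "Y", "E"},
    "007": {"A", "B", "M"}, "008": {"B", "C", "D", "T", "P"},
    "009": {"D", "T", "S"}, "010": {"A", "B", "M"},
}

# Frequent pairs (L2).
L2 = [{"A", "B"}, {"B", "M"}]

# A transaction containing no frequent 2-itemset cannot contain a frequent
# 3-itemset, so it can be dropped (or marked) before the next pass.
reduced = {tid: items for tid, items in transactions.items()
           if any(pair <= items for pair in L2)}

print(sorted(set(transactions) - set(reduced)))   # 003, 006, 008, 009 are removed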


Partitioning
The set of transactions may be divided into a
number of disjoint subsets. Then each partition is
searched for frequent itemsets. These frequent
itemsets are called local frequent itemsets.
How can information about local itemsets be
used in finding frequent itemsets of the global
set of transactions?
In the example on the next slide, we have divided a set of transactions into two partitions. Find the frequent itemsets for each partition. Are these local frequent itemsets useful?


Example

[Transaction tables for the two partitions]

Partitioning
Phase 1
Divide n transactions into m partitions
Find the frequent itemsets in each partition
Combine all local frequent itemsets to form
candidate itemsets
Phase 2
Find global frequent itemsets
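A rough sketch of the two phases in Python; the partitions, the support threshold and the brute-force itemset finder are all stand-ins for illustration only.

import math
from itertools import combinations

def frequent_itemsets(transactions, min_count, max_size=3):
    # Brute-force frequent itemset finder; fine for a tiny example.
    items = sorted(set().union(*transactions))
    found = set()
    for k in range(1, max_size + 1):
        for cand in combinations(items, k):
            if sum(1 for t in transactions if set(cand) <= t) >= min_count:
                found.add(cand)
    return found

# Hypothetical transactions already split into two partitions.
partitions = [
    [{"A", "B"}, {"A", "B", "C"}, {"B", "C"}],
    [{"A", "C"}, {"A", "B", "C"}, {"A", "B"}],
]
min_support = 0.5

# Phase 1: local frequent itemsets in each partition form the candidate set.
# (Any globally frequent itemset must be locally frequent in at least one partition.)
candidates = set()
for part in partitions:
    candidates |= frequent_itemsets(part, math.ceil(min_support * len(part)))

# Phase 2: one scan of the full database to count every candidate globally.
all_t = [t for part in partitions for t in part]
global_min = math.ceil(min_support * len(all_t))
frequent = sorted(c for c in candidates
                  if sum(1 for t in all_t if set(c) <= t) >= global_min)
print(frequent)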


Sampling
A random sample (usually large enough to fit in
the main memory) may be obtained from the
overall set of transactions and the sample is
searched for frequent itemsets. These frequent
itemsets are called sample frequent itemsets.
How can information about sample itemsets be
used in finding frequent itemsets of the global set
of transactions?


Sampling
Sampling is not guaranteed to be accurate, but we sacrifice accuracy for efficiency. A lower support threshold may be used for the sample to reduce the chance of missing any frequent itemsets.
The actual frequencies of the sample frequent
itemsets are then obtained.
More than one sample could be used to improve
accuracy.
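A short sketch of the sampling approach, reusing the frequent_itemsets helper from the partitioning sketch above; the transactions and thresholds are placeholders.

import math
import random

# Placeholder transaction database.
transactions = [{"A", "B"}, {"B", "C"}, {"A", "B", "C"}, {"A", "C"}, {"B"}] * 20
min_support = 0.4
lowered_support = 0.3   # lower threshold on the sample to reduce the risk of misses

# Mine a random sample that fits in memory (frequent_itemsets as defined earlier).
sample = random.sample(transactions, 30)
sample_frequent = frequent_itemsets(sample, math.ceil(lowered_support * len(sample)))

# Verify the sample-frequent itemsets with one scan of the full database.
global_min = math.ceil(min_support * len(transactions))
confirmed = sorted(c for c in sample_frequent
                   if sum(1 for t in transactions if set(c) <= t) >= global_min)
print(confirmed)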


Problems with Association Rule Algorithms
Users are overwhelmed by the number of rules identified; how can the number of rules be reduced to those that are relevant to the user's needs?
The Apriori algorithm assumes sparsity, since the number of items in each record is quite small. Some applications produce dense data, which may also have
many frequently occurring items
strong correlations
many items in each record

Problems with Association Rules


Also consider:
AB → C (90% confidence)
and A → C (92% confidence)
Clearly the first rule is of no use. We should look for more complex rules only if they are better than simple rules.
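A small sketch of the check behind this, with made-up support counts: confidence(X → C) = support(X ∪ {C}) / support(X), and the longer rule is only worth keeping if its confidence beats the simpler rule with the same consequent.

# Made-up support counts chosen to reproduce the figures above.
support = {
    frozenset("A"): 100,
    frozenset("AB"): 50,
    frozenset("AC"): 92,
    frozenset("ABC"): 45,
}

def confidence(antecedent, consequent):
    a = frozenset(antecedent)
    return support[a | frozenset(consequent)] / support[a]

print(confidence("A", "C"))     # 0.92
print(confidence("AB", "C"))    # 0.90
# A -> C already has higher confidence, so AB -> C adds nothing and can be dropped.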


Top Down Approach


Algorithms considered so far were bottom up, i.e. they started by looking at each frequent item, then each pair, and so on.
Is it possible to design top-down algorithms that consider the largest group of items first and then find the smaller groups? Let us first look at the itemset ABCD, which can be frequent only if all of its subsets are frequent.


Subsets of ABCD

ABCD
3-itemsets: ABC, ABD, ACD, BCD
2-itemsets: AB, AC, AD, BC, BD, CD

Closed and Maximal Itemsets


A frequent closed itemset is a frequent itemset X such that no proper superset of X has the same support count as X.
A frequent itemset Y is maximal if it is not a proper subset of any other frequent itemset.
Therefore a maximal frequent itemset is also a closed itemset, but a closed itemset is not necessarily maximal.
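A sketch of the two definitions applied to a small made-up collection of frequent itemsets and their support counts (the counts echo the example on a later slide):

# Made-up frequent itemsets with their support counts.
frequent = {
    frozenset("B"): 10, frozenset("C"): 9, frozenset("D"): 9,
    frozenset("BC"): 8, frozenset("BD"): 8, frozenset("CD"): 9,
    frozenset("BCD"): 8,
}

def frequent_supersets(x):
    return [y for y in frequent if x < y]

# Closed: no frequent proper superset has the same support count.
closed = [x for x in frequent
          if all(frequent[y] < frequent[x] for y in frequent_supersets(x))]

# Maximal: no frequent proper superset at all.
maximal = [x for x in frequent if not frequent_supersets(x)]

print([set(x) for x in closed])    # {B}, {C, D} and {B, C, D} are closed
print([set(x) for x in maximal])   # {B, C, D} is the only maximal itemset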


Closed and Maximal Itemsets


The frequent maximal itemsets uniquely determine all frequent itemsets. Therefore the aim of an association rule algorithm is to find all the maximal frequent itemsets.


Closed and Maximal Itemsets


In the earlier example, we found that {B, D} and {B, C, D} had the same support of 8, while {C, D} had a support of 9. {C, D} is therefore a closed itemset but not maximal, since its superset {B, C, D} is also frequent. On the other hand, {B, C, D} is frequent and no superset of it is frequent. This itemset is therefore maximal as well as closed.


Closed and maximal itemsets


Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets


Performance Evaluation of Algorithms
The FP-growth method was usually better than the best implementation of the Apriori algorithm.
CHARM was also usually better than Apriori. In some cases, CHARM was better than the FP-growth method.
Apriori was generally better than the other algorithms if the required support was high, since high support leads to a smaller number of frequent items, which suits the Apriori algorithm.
At very low support, the number of frequent items became large and none of the algorithms was able to handle large frequent itemsets gracefully.

