DataMining_Chapter2
In this chapter we present one of the most commonly used models in data mining: association
rules.
2.1 INTRODUCTION:
The association rules model is one of the most important and well-researched models in data
mining. It was first introduced by Srikant and Agrawal in 1995 [8]. It aims to extract
interesting correlations, associations or causal structures from huge sets of data.
Association rules are widely used in various areas such as market basket analysis, medical
diagnosis, telecommunication networks, risk management, inventory control, etc. [9].
Many companies accumulate voluminous amounts of data in their IT systems as a result of
day-to-day operations. For example, hypermarkets collect large amounts of data on consumer
purchases. Table 2.1 gives an illustration of this type of data (note that we are interested in the
list of products and not in quantities or prices).
Table 2.1 Example of shopping basket
TID Items
1 {Bread, Milk}
2 {Bread, Chocolate, Juice, Eggs}
3 {Milk, Chocolate, Juice, Lemonade}
4 {Bread, Milk, Chocolate, Juice}
5 {Bread, Milk, Chocolate, Lemonade}
Each line corresponds to a transaction and includes the ticket number (TID) and the list of
products purchased. Commercial companies are interested in analysing this type of data to gain
a better understanding of their customers' purchasing behaviour. For example, in a hypermarket,
decision-makers may want to know which products are often bought together by customers.
The association rules model can answer this question. It is recommended for problems involving
the search for hidden relationships in large databases. For example, the following rule can be
extracted from the table above:
{Bread} → {Milk}
This rule suggests that there is a strong relationship between the sale of bread and milk:
3 of the 5 transactions contain both products, and 3 of the 4 customers who buy bread also buy milk.
Table 2.2 Shopping basket data in binary representation
TID Bread Milk Chocolate Juice Eggs Lemonade
1 1 1 0 0 0 0
2 1 0 1 1 1 0
3 0 1 1 1 0 1
4 1 1 1 1 0 0
5 1 1 1 0 0 1
For the rest of the chapter, we will use the following notation: let I = {i1, i2, ..., id} be the set
of all items in the baskets, and let T = {t1, t2, ..., tn} be the set of transactions, each transaction
being a subset of I. The volume (width) of a transaction is the number of items it contains.
For example, transaction no. 1 has a volume of 2, while all the others have a volume of 4.
The support of a set of items X is the number of transactions that contain X; it is often expressed
as a fraction of the total number of transactions. For example, the support of the set
{Juice, Chocolate, Milk} is equal to 2, since it is contained in transactions 3 and 4. The support of
a rule X → Y is the support of X ∪ Y, and its confidence is the fraction of transactions containing
X that also contain Y, i.e. support(X ∪ Y) / support(X).
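To make these definitions concrete, here is a minimal Python sketch that computes the volume of a transaction and the support count of an item set on the data of Table 2.1 (the helper names width and support_count are illustrative choices, not part of the chapter):

transactions = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Chocolate", "Juice", "Eggs"},
    3: {"Milk", "Chocolate", "Juice", "Lemonade"},
    4: {"Bread", "Milk", "Chocolate", "Juice"},
    5: {"Bread", "Milk", "Chocolate", "Lemonade"},
}

def width(transaction):
    # Volume (width) of a transaction: the number of items it contains
    return len(transaction)

def support_count(itemset, transactions):
    # Number of transactions that contain every item of the given item set
    return sum(1 for t in transactions.values() if itemset <= t)

print(width(transactions[1]))                                       # 2
print(support_count({"Juice", "Chocolate", "Milk"}, transactions))  # 2
print(support_count({"Bread", "Milk"}, transactions))               # 3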
Intuitively, a “good association rule” is one that has both high support and high confidence.
Indeed, low support means that the rule is rarely observed in the data.
This is why support is often used to eliminate uninteresting rules.
If d = 6 (the example of the previous shopping basket), the total number of possible rules is
R = 3^d − 2^(d+1) + 1 = 602. But if we apply the thresholds minsup = 20% and minconf = 50%,
we can eliminate more than 80% of the rules (so we keep about 20% of the rules, or nearly 120
association rules).
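The figure R = 602 can be checked with the short Python sketch below, which compares the closed-form count R = 3^d − 2^(d+1) + 1 with a brute-force enumeration over all possible antecedent/consequent splits (an illustrative check, not part of the chapter):

from itertools import combinations

def rule_count_formula(d):
    # Total number of possible rules over d items: R = 3^d - 2^(d+1) + 1
    return 3 ** d - 2 ** (d + 1) + 1

def rule_count_brute_force(d):
    # For every item set of size k, each of its 2^k - 2 non-empty proper
    # subsets can serve as the antecedent X, the rest being the consequent Y
    count = 0
    for k in range(1, d + 1):
        for itemset in combinations(range(d), k):
            count += 2 ** len(itemset) - 2
    return count

print(rule_count_formula(6))      # 602
print(rule_count_brute_force(6))  # 602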
A first step towards improving the performance of a rule search algorithm is to separate the
requirements on support and confidence. The definition of support shows that the support of a
rule X → Y depends only on the support of its item set (X ∪ Y). For example, the following
rules have the same support because they all come from the same set {Juice, Chocolate, Milk}:
{Juice, Chocolate} → {Milk}
{Juice, Milk} → {Chocolate}
{Juice} → {Milk, Chocolate}
{Milk} → {Juice, Chocolate}
If this set of items is infrequent, then all the rules derived from it are infrequent. They can
therefore be eliminated even before their confidence is calculated.
So a better strategy for finding association rules would be to split the process into two stages:
calculating the sets of frequent items and generating rules from the sets calculated.
1. Generation of frequent item sets: the aim is to find all the item sets that satisfy the minsup
threshold.
2. Rule generation: the objective is to extract all high-confidence rules from the frequent item
sets found in the previous step. These rules are called strong rules.
This is the strategy applied by the Apriori algorithm (see next section).
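The two-stage strategy can be summarised by the Python skeleton below (the function names apriori_frequent_itemsets and generate_rules are illustrative placeholders; possible implementations are sketched later in this chapter):

def mine_association_rules(transactions, minsup, minconf):
    # Stage 1: find all item sets whose support is at least minsup
    frequent = apriori_frequent_itemsets(transactions, minsup)
    # Stage 2: keep only the rules whose confidence is at least minconf
    return generate_rules(frequent, minconf)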
2.3.2 Apriori algorithm
The Apriori algorithm (Agrawal & Srikant, 1995) [8] reduces the number of candidate sets
considered during the generation of frequent item sets. It is based on the following principle:
if a set of items is frequent, then all its subsets are also frequent. Conversely, if a set {a, b} is
infrequent, then all its supersets are also infrequent (i.e. all the sets of items containing {a, b}).
So if we know that {a, b} is infrequent, we can eliminate a priori all the sets containing it. This
strategy is called support-based pruning.
This strategy is made possible by a key property of the support measure: the support of a set
of items is never greater than the support of any of its subsets (support is anti-monotone).
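As an illustration of support-based pruning, here is a minimal Python sketch of the frequent-item-set stage (a simplified level-wise implementation written for this text, not the authors' original algorithm):

from itertools import combinations

def apriori_frequent_itemsets(transactions, minsup):
    # transactions: list of sets of items; minsup: minimum support as a fraction.
    # Returns a dictionary mapping each frequent item set to its support count.
    min_count = minsup * len(transactions)

    def support_count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent individual items
    items = {item for t in transactions for item in t}
    counts = {frozenset([item]): support_count(frozenset([item])) for item in items}
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-sets into k-sets, and keep
        # a candidate only if all its (k-1)-subsets are frequent (pruning)
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        counts = {c: support_count(c) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent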
Let's execute the algorithm on the following example with minsup=40%.
N° items
01 F, M, B, E
02 O, F, E
03 O, A, H, S
04 D, B, F, E
05 D, A, H, S
06 O, M, E
07 O, A, D, H, S
Candidate 2-itemset Support
{O, A} 2/7 = 28.6%
{O, H} 2/7 = 28.6%
{O, S} 2/7 = 28.6%
{O, D} 1/7 = 14.3%
{A, H} 3/7 = 42.8%
{A, S} 3/7 = 42.8%
{A, D} 2/7 = 28.6%
{H, S} 3/7 = 42.8%
{H, D} 2/7 = 28.6%
{S, D} 2/7 = 28.6%
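Running the sketch above on this seven-transaction example with minsup = 40% reproduces these counts and keeps only the frequent sets (an illustrative check):

transactions = [
    {"F", "M", "B", "E"},
    {"O", "F", "E"},
    {"O", "A", "H", "S"},
    {"D", "B", "F", "E"},
    {"D", "A", "H", "S"},
    {"O", "M", "E"},
    {"O", "A", "D", "H", "S"},
]

frequent = apriori_frequent_itemsets(transactions, minsup=0.4)
for itemset in sorted(frequent, key=len):
    print(set(itemset), f"{frequent[itemset]}/7")
# Frequent pairs: {A, H}, {A, S}, {H, S} and {F, E}; the only frequent
# 3-itemset is {A, H, S}, contained in 3 of the 7 transactions (42.8%)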
Algorithm 2.2 Apriori, Part 2: Association Rules Generation
Algorithm: Association rules generation
Input: the set F of frequent item sets and the confidence threshold minconf.
Output: the set of association rules whose confidence is greater than or equal to minconf.
Begin
For each frequent item set L in F
Do Begin
Generate all non-empty proper subsets of L
For each subset s
Do If support(L) / support(s) ≥ minconf
Then generate the rule s → (L – s)
End
End
For example, the rules generated from the frequent set {A, H, S} are:
Rule Confidence
{A} → {H, S} 100%
{H} → {A, S} 100%
{S} → {A, H} 100%
{A, H} → { S } 100%
{A, S} → { H } 100%
{H, S} → { A } 100%
In total, 13 association rules that meet the confidence criterion (minconf = 80%) are extracted
from the frequent item sets of this example.
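As a complement, here is a minimal Python sketch of the rule-generation stage described by Algorithm 2.2, reusing the frequent dictionary (item set → support count) computed above (an illustrative implementation, not the chapter's code):

from itertools import combinations

def generate_rules(frequent, minconf):
    # frequent: dictionary mapping each frequent item set to its support count.
    # The confidence of s -> (L - s) is support(L) / support(s).
    rules = []
    for L, support_L in frequent.items():
        if len(L) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for k in range(1, len(L)):
            for subset in combinations(L, k):
                s = frozenset(subset)
                # every subset of a frequent set is itself frequent, so its
                # support is available in the dictionary
                confidence = support_L / frequent[s]
                if confidence >= minconf:
                    rules.append((set(s), set(L - s), confidence))
    return rules

rules = generate_rules(frequent, minconf=0.8)
print(len(rules))  # 13 rules on the example above with minconf = 80%
for antecedent, consequent, conf in rules:
    print(antecedent, "->", consequent, f"{conf:.0%}")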
In this chapter, we presented the association rules model and its basic measures (support and
confidence). We discussed the problem of finding association rules. We presented the Apriori
algorithm with an example.
EXERCISES
…….