Data Mining Task - Association Rule Mining

This document discusses frequent pattern analysis and association rule mining. It describes how association rule mining uses a two-step process: 1) finding frequent itemsets and 2) generating strong association rules from those itemsets. The Apriori algorithm is introduced as a way to efficiently find frequent itemsets by pruning candidates that do not meet the minimum support threshold. An example application of market basket analysis is provided to demonstrate how association rule mining can help retailers.

 What is Frequent Pattern Analysis?

 Association Rule Mining


 A Two-Step Process of Association Rule Mining
 The Apriori Algorithm
 Frequent Itemset Generation
 Rule Generation
 Mining Association Rules: An Example
Frequent Pattern Analysis
- A frequent pattern: a pattern that occurs frequently in a data set
- Frequent itemset
  - A set of items, such as milk and bread, that appear frequently together in a transaction data set
- Frequent sequential pattern
  - Buying first a PC, then a digital camera, and then a memory card, if this sequence occurs frequently in a shopping-history database
Frequent Pattern Analysis
- Motivation: finding inherent regularities in data
  - What products were often purchased together? Beer and diapers?!
  - What are the subsequent purchases after buying a PC?
- Applications
  - Market-basket analysis, cross-marketing, catalog design, sales campaign analysis
Market Basket Analysis
- Helps retailers plan marketing or advertising strategies, or design a new catalog
- Helps retailers design different store layouts
- Helps retailers plan which items to put on sale at reduced prices
Association Rule Mining
- Mining for interesting rules (= gold) in a large database (= mountain)
- "Interesting" rules tell you something about your database that you did not already know and probably could not articulate explicitly, because the data set is so large
Problem Statement
- Given a set of items in a transaction database
- Retrieve all possible patterns in the form of association rules
- The number of rules may be massively large
- A filter may be needed to select the most valuable or interesting rules
Components of Association Rule
- A -> B, [support = %, confidence = %]
- Milk -> Bread, [support = 3%, confidence = 80%]
  - If milk is purchased, then bread is purchased 80 percent of the time, and this pattern occurs in 3 percent of all shopping baskets
Components of Association Rule
- A -> B, [support = %, confidence = %]
- A and B are sets of items, i.e. itemsets. For example, A = {bread, milk} and B = {jam, eggs}.
- A = antecedent
- B = consequent
- Support = P(A U B)
- Confidence = P(B | A)
Association Rule Mining
- Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction
- Example of association rules:
  - {Diaper} -> {Beer}
  - {Milk, Bread} -> {Eggs, Coke}
  - {Beer, Bread} -> {Milk}

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Binary Representation
- Each row corresponds to a transaction
- Each column corresponds to an item
- The entry is 1 if the item is present in the transaction, 0 otherwise

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

TID  Bread  Milk  Diaper  Beer  Eggs  Coke
1    1      1     0       0     0     0
2    1      0     1       1     1     0
3    0      1     1       1     0     1
4    1      1     1       1     0     0
5    1      1     1       0     0     1
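A minimal sketch of this binary representation in plain Python (the `transactions` and `items` names are illustrative, not taken from the slides):

```python
# Illustrative sketch: the five example transactions as a 0/1 matrix.
transactions = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diaper", "Beer", "Eggs"},
    3: {"Milk", "Diaper", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diaper", "Beer"},
    5: {"Bread", "Milk", "Diaper", "Coke"},
}

items = ["Bread", "Milk", "Diaper", "Beer", "Eggs", "Coke"]

# Each row is a transaction, each column an item: 1 if present, 0 otherwise.
binary_matrix = {
    tid: [1 if item in basket else 0 for item in items]
    for tid, basket in transactions.items()
}

for tid, row in binary_matrix.items():
    print(tid, row)
```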
Definitions
- Itemset X = {x1, ..., xk}
  - A collection of one or more items
  - Example: {Milk, Bread, Diaper}
- k-itemset
  - An itemset that contains k items
  - Example: {Milk, Bread} is a 2-itemset
- Support count (σ)
  - Frequency of occurrence of an itemset
  - Example: σ({Milk, Bread, Diaper}) = 2
- Support (s)
  - Percentage of transactions that contain an itemset
  - Example: s({Milk, Bread, Diaper}) = 2/5
Definitions
- Frequent itemset
  - An itemset whose support is greater than or equal to a minsup threshold
- Association rule
  - An implication expression of the form X -> Y, where X and Y are disjoint itemsets (X ∩ Y = Ø)
  - Example: {Milk, Diaper} -> {Beer}
Rule Evaluation Metrics
- Support (s)
  - Percentage of transactions in D that contain both X and Y
  - Support(X => Y) = P(X U Y)
- Confidence (c)
  - Percentage of transactions in D containing X that also contain Y
  - Confidence(X => Y) = P(Y | X) = σ(X U Y) / σ(X)
- Example: {Milk, Diaper} -> {Beer}
  - s = σ({Milk, Diaper, Beer}) / |T| = 2 / 5 = 0.4
  - c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2 / 3 = 0.67

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
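These two metrics follow directly from the definitions above. The sketch below assumes the `transactions` dictionary from the earlier binary-representation snippet; the helper names (`support_count`, `support`, `confidence`) are illustrative:

```python
def support_count(itemset, transactions):
    """sigma(itemset): number of transactions containing every item in itemset."""
    return sum(1 for basket in transactions.values() if itemset <= basket)

def support(X, Y, transactions):
    """support(X -> Y) = sigma(X u Y) / |T|"""
    return support_count(X | Y, transactions) / len(transactions)

def confidence(X, Y, transactions):
    """confidence(X -> Y) = sigma(X u Y) / sigma(X)"""
    return support_count(X | Y, transactions) / support_count(X, transactions)

# {Milk, Diaper} -> {Beer}: s = 2/5 = 0.4, c = 2/3 ≈ 0.67
print(support({"Milk", "Diaper"}, {"Beer"}, transactions))
print(confidence({"Milk", "Diaper"}, {"Beer"}, transactions))
```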
Association Rule Mining
- Example of rules:
  - {Milk, Diaper} -> {Beer} (s=0.4, c=0.67)
  - {Milk, Beer} -> {Diaper} (s=0.4, c=1.0)
  - {Diaper, Beer} -> {Milk} (s=0.4, c=0.67)
  - {Beer} -> {Milk, Diaper} (s=0.4, c=0.67)
  - {Diaper} -> {Milk, Beer} (s=0.4, c=0.5)
  - {Milk} -> {Diaper, Beer} (s=0.4, c=0.5)
Association Rule Mining Task
- Given a set of transactions T, the goal of association rule mining is to find all rules having
  - support ≥ minsup threshold
  - confidence ≥ minconf threshold
- Rules that satisfy both minsup and minconf are called strong rules
Two-Step Process
- 1. Find all frequent itemsets
  - Generate all itemsets whose support >= minsup
- 2. Generate strong association rules from the frequent itemsets
  - Generate strong rules from each frequent itemset; these rules must satisfy minsup and minconf
- The overall performance of mining association rules is determined by Step 1
Frequent Itemset Generation
Reducing the Number of Candidates
- The Apriori property: any subset of a frequent itemset must be frequent
  - If {beer, diaper, nuts} is frequent, so is {beer, diaper}
  - i.e., every transaction having {beer, diaper, nuts} also contains {beer, diaper}
  - For all X, Y: (X ⊆ Y) -> s(X) >= s(Y)
- Note that the support of an itemset never exceeds the support of its subsets
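As a quick sanity check of this anti-monotone property on the example data, reusing the illustrative `support_count` helper and `transactions` dictionary sketched earlier:

```python
# Anti-monotonicity: a subset is at least as frequent as any of its supersets.
sub = {"Diaper", "Beer"}
sup = {"Milk", "Diaper", "Beer"}
assert sub <= sup
assert support_count(sub, transactions) >= support_count(sup, transactions)  # 3 >= 2
```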
Apriori: A Candidate Generation and Test Approach
- Apriori pruning principle: if any itemset is infrequent, its supersets should not be generated or tested!
- Method:
  - Initially, scan the DB once to get the frequent 1-itemsets
  - Generate length-(k+1) candidate itemsets from length-k frequent itemsets
  - Test the candidates against the DB
  - Terminate when no frequent or candidate set can be generated
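A compact, unoptimized sketch of this candidate-generation-and-test loop is given below (plain Python; `minsup_count` is a support-count threshold, matching the worked example that follows). It illustrates the control flow of Apriori rather than the textbook pseudocode.

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Return {frozenset itemset: support count} for all frequent itemsets.

    transactions: dict mapping TID -> set of items
    minsup_count: minimum support count (e.g. 2 in the example slides)
    """
    # Step 1: scan the DB once to get the frequent 1-itemsets.
    counts = {}
    for basket in transactions.values():
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= minsup_count}
    all_frequent = dict(frequent)

    k = 1
    while frequent:
        # Generate length-(k+1) candidates by joining frequent k-itemsets,
        # pruning any candidate with an infrequent k-subset (Apriori property).
        candidates = set()
        prev = list(frequent)
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k + 1 and all(
                    frozenset(sub) in frequent for sub in combinations(union, k)
                ):
                    candidates.add(union)

        # Test the candidates against the DB.
        frequent = {}
        for cand in candidates:
            count = sum(1 for basket in transactions.values() if cand <= basket)
            if count >= minsup_count:
                frequent[cand] = count
        all_frequent.update(frequent)
        k += 1  # terminates when no frequent (k+1)-itemset is generated

    return all_frequent
```

With the example transactions and minsup_count = 2, this returns, among others, the frequent 3-itemsets {Milk, Diaper, Beer} and {Bread, Milk, Diaper}, each with a support count of 2.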
Illustrating the Apriori Principle
[Figure-only slides]

The Apriori Algorithm: An Example
[Figure: generation of candidate itemsets and frequent itemsets, where minsup count = 2]
Generating Association Rules from Frequent Itemsets

Mining Association Rules: An Example
[Example tables: frequent 1-itemsets, 2-itemsets, 3-itemsets, and the resulting rules]
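A matching sketch of the rule-generation step (again illustrative, assuming the `apriori` output from the previous snippet): for each frequent itemset, every non-empty proper subset is tried as an antecedent, and the rule is kept if its confidence meets minconf. The rule's support equals the support of the whole itemset, so it already satisfies minsup.

```python
from itertools import combinations

def generate_rules(frequent_itemsets, minconf):
    """Return (antecedent, consequent, confidence) triples for strong rules.

    frequent_itemsets: dict mapping frozenset -> support count (from apriori)
    minconf: minimum confidence threshold, e.g. 0.6
    """
    rules = []
    for itemset, count in frequent_itemsets.items():
        if len(itemset) < 2:
            continue
        # Every non-empty proper subset X gives a candidate rule X -> (itemset - X);
        # X is frequent by the Apriori property, so its count is in the dict.
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = count / frequent_itemsets[antecedent]
                if conf >= minconf:
                    rules.append((set(antecedent), set(itemset - antecedent), conf))
    return rules

# Hypothetical driver tying the two steps together:
# freq = apriori(transactions, minsup_count=2)
# for X, Y, c in generate_rules(freq, minconf=0.6):
#     print(X, "->", Y, f"(confidence={c:.2f})")
```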
