This document discusses frequent pattern analysis and association rule mining. It describes how association rule mining uses a two-step process: 1) finding frequent itemsets and 2) generating strong association rules from those itemsets. The Apriori algorithm is introduced as a way to efficiently find frequent itemsets by pruning candidates that do not meet the minimum support threshold. An example application of market basket analysis is provided to demonstrate how association rule mining can help retailers.
Data Mining Task - Association Rule Mining
What is Frequent Pattern Analysis?
Association Rule Mining
A Two-Step Process of Association Rule Mining
The Apriori Algorithm
Frequent Itemset Generation
Rule Generation
Mining Association Rules: An Example

Frequent Pattern Analysis

A frequent pattern is a pattern that occurs frequently in a data set.
  - Frequent itemset: a set of items, such as milk and bread, that appear together frequently in a transaction data set.
  - Frequent sequential pattern: a sequence, such as buying first a PC, then a digital camera, and then a memory card, that occurs frequently in a shopping history database.

Motivation: finding inherent regularities in data.
  - What products were often purchased together? Beer and diapers?!
  - What are the subsequent purchases after buying a PC?

Applications: market-basket analysis, cross-marketing, catalog design, sale campaign analysis.

Market Basket Analysis

  - Helps retailers plan marketing or advertising strategies, or design a new catalog.
  - Helps retailers design different store layouts.
  - Helps retailers plan which items to put on sale at reduced prices.

Association Rule Mining

Mining for interesting rules (= gold) through a large database (= mountain). "Interesting" rules tell you something about your database that you did not already know and probably could not articulate explicitly because the data is so large.

Problem Statement

  - Given a set of items in a transaction database, retrieve all possible patterns in the form of association rules.
  - The number of rules may be massively large.
  - A filter may be needed to select the most valuable or interesting rules.

Components of an Association Rule

A -> B, [support = %, confidence = %]

Example: Milk -> Bread, [support = 3%, confidence = 80%]
If milk is purchased, then bread is purchased 80 percent of the time, and this pattern occurs in 3 percent of all shopping baskets.

  - A and B are sets of items, i.e. itemsets. For example, A = {bread, milk} and B = {jam, eggs}.
  - A = antecedent, B = consequent
  - Support = P(A ∪ B), the fraction of transactions that contain every item of A and of B
  - Confidence = P(B | A)

Association Rule Mining

Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

  TID   Items
  1     Bread, Milk
  2     Bread, Diaper, Beer, Eggs
  3     Milk, Diaper, Beer, Coke
  4     Bread, Milk, Diaper, Beer
  5     Bread, Milk, Diaper, Coke

Example of association rules:
  {Diaper} -> {Beer}
  {Milk, Bread} -> {Eggs, Coke}
  {Beer, Bread} -> {Milk}

Binary Representation

Each row corresponds to a transaction and each column corresponds to an item. An entry is 1 if the item is present in the transaction and 0 otherwise.

  TID   Bread   Milk   Diaper   Beer   Eggs   Coke
  1     1       1      0        0      0      0
  2     1       0      1        1      1      0
  3     0       1      1        1      0      1
  4     1       1      1        1      0      0
  5     1       1      1        0      0      1

Definitions

  - Itemset X = {x1, …, xk}: a collection of one or more items. Example: {Milk, Bread, Diaper}
  - k-itemset: an itemset that contains k items. Example: {Milk, Bread} is a 2-itemset.
  - Support count (σ): frequency of occurrence of an itemset. Example: σ({Milk, Bread, Diaper}) = 2
  - Support (s): fraction of transactions that contain an itemset. Example: s({Milk, Bread, Diaper}) = 2/5
  - Frequent itemset: an itemset whose support is greater than or equal to a minsup threshold.
  - Association rule: an implication expression of the form X -> Y, where X and Y are disjoint itemsets (X ∩ Y = Ø). Example: {Milk, Diaper} -> {Beer}

Rule Evaluation Metrics

  - Support (s): fraction of transactions in T that contain both X and Y.
    Support(X -> Y) = P(X ∪ Y) = σ(X ∪ Y) / |T|
  - Confidence (c): fraction of transactions in T containing X that also contain Y.
    Confidence(X -> Y) = P(Y | X) = σ(X ∪ Y) / σ(X)

Example: {Milk, Diaper} -> {Beer}, using the five transactions above:
  s = σ({Milk, Diaper, Beer}) / |T| = 2 / 5 = 0.4
  c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2 / 3 = 0.67

Example of rules generated from the itemset {Milk, Diaper, Beer}:

  Rule                        s      c
  {Milk, Diaper} -> {Beer}    0.4    0.67
  {Milk, Beer} -> {Diaper}    0.4    1.0
  {Diaper, Beer} -> {Milk}    0.4    0.67
  {Beer} -> {Milk, Diaper}    0.4    0.67
  {Diaper} -> {Milk, Beer}    0.4    0.5
  {Milk} -> {Diaper, Beer}    0.4    0.5

All six rules come from the same itemset, so they share the same support; only the confidence differs.

Association Rule Mining Task

Given a set of transactions T, the goal of association rule mining is to find all rules having
  - support ≥ minsup threshold
  - confidence ≥ minconf threshold
Rules that satisfy both minsup and minconf are called strong rules.

Two-step process
  1. Find all frequent itemsets: generate all itemsets whose support ≥ minsup.
  2. Generate strong association rules from the frequent itemsets: generate strong rules from each frequent itemset. These rules must satisfy minsup and minconf.
The overall performance of mining association rules is determined by Step 1.

Frequent Itemset Generation: Reducing the Number of Candidates

The Apriori property: any subset of a frequent itemset must be frequent.
  - If {beer, diaper, nuts} is frequent, so is {beer, diaper}, because every transaction having {beer, diaper, nuts} also contains {beer, diaper}.
  - For all X, Y: (X ⊆ Y) -> s(X) ≥ s(Y)
  - Note that the support of an itemset never exceeds the support of its subsets.

Apriori: A Candidate Generation and Test Approach

Apriori pruning principle: if any itemset is infrequent, its supersets should not be generated or tested!
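The support and confidence figures for the five-transaction table, and the subset property that Apriori pruning relies on, can be checked with a few lines of Python. This is a minimal sketch rather than part of the original slides; the helper names (support_count, support, confidence, is_anti_monotone) are illustrative assumptions.

```python
from itertools import combinations

# The five example transactions (TID 1 to 5) from the table in the text.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions=TRANSACTIONS):
    """sigma(itemset): number of transactions that contain every item."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions=TRANSACTIONS):
    """s(itemset): fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """c(X -> Y) = sigma(X u Y) / sigma(X).

    Assumes the antecedent occurs in at least one transaction.
    """
    x = set(antecedent)
    both = x | set(consequent)
    return support_count(both, transactions) / support_count(x, transactions)


def is_anti_monotone(itemset, transactions=TRANSACTIONS):
    """Check s(subset) >= s(itemset) for every non-empty proper subset."""
    items = sorted(itemset)
    s_full = support(items, transactions)
    return all(
        support(sub, transactions) >= s_full
        for k in range(1, len(items))
        for sub in combinations(items, k)
    )


print(support({"Milk", "Diaper", "Beer"}))                   # 0.4
print(round(confidence({"Milk", "Diaper"}, {"Beer"}), 2))    # 0.67
print(confidence({"Milk", "Beer"}, {"Diaper"}))              # 1.0
print(is_anti_monotone({"Milk", "Diaper", "Beer"}))          # True
```

Because every subset's support is at least that of the full itemset, an itemset with an infrequent subset can never be frequent, which is what justifies skipping its supersets.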
Method:
  1. Initially, scan the DB once to get the frequent 1-itemsets.
  2. Generate length-(k+1) candidate itemsets from the length-k frequent itemsets.
  3. Test the candidates against the DB.
  4. Terminate when no frequent or candidate set can be generated.

[Figure: Illustrating the Apriori Principle]

The Apriori Algorithm: An Example

[Figure: Generation of candidate itemsets and frequent itemsets (where minsup count = 2)]

Generating Association Rules from Frequent Itemsets

Mining Association Rules: An Example

[Figure columns: 1-Itemsets, 2-Itemsets, 3-Itemsets, Rules]
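The two-step process (find frequent itemsets level by level, then generate strong rules) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the deck's own worked example: it reuses the five-transaction table from earlier with minsup count = 2, the function names apriori and strong_rules are hypothetical, the minconf value of 0.8 is chosen only for demonstration, and the join step unions any two frequent k-itemsets instead of using the classic prefix-based join.

```python
from itertools import combinations

# Assumption: the five-transaction table from earlier in the text.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def apriori(transactions, minsup_count):
    """Return {frozenset: support count} for every frequent itemset."""

    def count(candidates):
        # One pass over the database per level.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Step 1: scan once to get the frequent 1-itemsets.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = {c: n for c, n in count(singles).items() if n >= minsup_count}
    frequent = dict(current)
    k = 1
    while current:
        # Generate (k+1)-candidates by joining frequent k-itemsets.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # Prune candidates that have an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k))}
        # Test the surviving candidates against the database.
        current = {c: n for c, n in count(candidates).items()
                   if n >= minsup_count}
        frequent.update(current)
        k += 1
    return frequent


def strong_rules(frequent, minconf):
    """Return (antecedent, consequent, confidence) for rules >= minconf."""
    rules = []
    for itemset, n in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent,
                # so its count is already stored.
                conf = n / frequent[lhs]
                if conf >= minconf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


frequent = apriori(TRANSACTIONS, minsup_count=2)
print(len(frequent))  # 17 frequent itemsets (5 + 8 + 4)
for lhs, rhs, conf in strong_rules(frequent, minconf=0.8):
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

The loop stops at level 4 because every 4-item candidate contains an infrequent 3-subset and is pruned before the database is scanned again.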