0% found this document useful (0 votes)
76 views

Mining Frequent, Patterns, Associations, and Correlations

This document discusses frequent pattern mining and association rule mining. It defines frequent patterns as patterns that appear frequently in a dataset, such as items purchased together. Association rule mining analyzes customer purchase data to find relationships between items purchased together. It represents these relationships as rules with metrics like support (how frequently an itemset occurs) and confidence (the probability an item is purchased given another item).

Uploaded by

Sasi S INDIA
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
76 views

Mining Frequent, Patterns, Associations, and Correlations

This document discusses frequent pattern mining and association rule mining. It defines frequent patterns as patterns that appear frequently in a dataset, such as items purchased together. Association rule mining analyzes customer purchase data to find relationships between items purchased together. It represents these relationships as rules with metrics like support (how frequently an itemset occurs) and confidence (the probability an item is purchased given another item).

Uploaded by

Sasi S INDIA
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 13

Mining Frequent,Patterns,

Associations, and Correlations


• Frequent patterns are patterns (such as itemsets, subsequences, or

substructures) that appear in a data set frequently. For example, a set of

items, such as milk and bread, that appear frequently together in a transaction

data set is a frequent itemset.

• A subsequence, such as buying first a PC, then a digital camera, and then a

memory card, if it occurs frequently in a shopping history database, is a

(frequent) sequential pattern.

• A substructure can refer to different structural forms, such as subgraphs,

subtrees, or sub lattices, which may be combined with itemsets or

subsequences.

• If a substructure occurs frequently, it is called a (frequent) structured pattern.


Basic Concepts and a Road Map - Market
Basket Analysis
• Example of frequent itemset mining is market
basket analysis.
• This process analyzes customer buying habits by
finding associations between the different items
that customers place in their “shopping baskets”
• The discovery of such associations can help
retailers develop marketing strategies by gaining
insight into which items are frequently
purchased together by customers.
• The Boolean vectors can be analyzed for buying
patterns that reflect items that are frequently
associated or purchased together.
• These patterns can be represented in the form of
association rules.
• For example, the information that customers who
purchase computers also tend to buy antivirus
software at the same time is represented in
Association Rule :
• Computer=>antivirus software [support = 2%;
confidence = 60%]
• Association Rule support and confidence are two measures of rule
interestingness

• A support of 2% for Association Rule (5.1) means that 2% of all the


transactions under analysis show that computer and antivirus software
are purchased together.

• A confidence of 60% means that 60% of the customers who purchased


a computer also bought the software.

• Typically, association rules are considered interesting if they satisfy both


a minimum support threshold and a minimum confidence threshold.

• Such thresholds can be set by users or domain experts.


Frequent Itemsets, Closed Itemsets, and Association
Rules
• Itemset: a set of items
• k-itemset: an itemset containing k items
• Let I = {I1, …, Im} be the set of all possible items
• T  I is a transaction, each associated with TID
• DB is a set of transactions
• An association rule is of the form A =>B with minimum support and confidence,
where A  I, B  I, A  B = 

Transaction-id Items bought


10 a, b, d
20 a, c, d
• In the example, I = {a, b, c, d, e, f}
30 a, d, e
• A possible association rule: d => a
– Better notation {d} => {a} 40 b, e, f
50 b, c, d, e, f
• support (A=>B) = P(A U B)
– Probability that a transaction contains A  B, i.e., each item in the union; not to be confused with
P(A or B), which indicates either A or B

– Percentage of transactions in DB that contain A  B,


– Also referred to as relative support
– Absolute support: number of transactions that contain A  B. also called occurrence frequency,
frequency, support count, count,
– Relative support = absolute support / |DB|
• confidence(A=>B) = P(B|A)
– conditional probability that a transaction having A also contains B
– confidence(A=>B) = P(B|A) = support(A  B) / support (A)

= support_count(A  B) / support_count(A)

• confidence(A=>B) can be easily derived from the support count of A and A  B,


association rule mining can be reduced to frequent pattern mining
• frequent (large) itemset: itemset satisfying min_sup threshold
• strong rule: rules satisfying both min_sup and min_conf thresholds
• In general, association rule mining can be
viewed as a two-step process:
1. Find all frequent itemsets: By definition, each
of these itemsets will occur at least as
frequently as a predetermined minimum
support count, min sup.
2. Generate strong association rules from the
frequent itemsets: By definition, these rules
must satisfy minimum support and minimum
confidence.
Closed Patterns and Max-Patterns
• A long pattern contains a combinatorial number of sub-
patterns, e.g., {a1, …, a100} contains (1001) + (1002) + … + (110000) =
2100 – 1 = 1.27*1030 sub-patterns!
• Solution: Mine closed patterns and max-patterns instead
• An itemset X is closed if X is frequent and there exists no super-
pattern Y ‫ כ‬X, with the same support as X
• An itemset X is a max-pattern if X is frequent and there exists
no frequent super-pattern Y ‫ כ‬X
• Closed pattern is a lossless compression of freq. patterns
– Reducing the # of patterns and rules
• A frequent itemset is one that occurs in at
least a user-specific percentage of the
database. That percentage is called support.
• An itemset is closed if none of its immediate
supersets has the same support as the
itemset.
• An itemset is maximal frequent if none of its
immediate supersets is frequent.

You might also like