Dar Lec 15 Association Rules

Association rules are a data mining technique used to identify relationships within large datasets, particularly in market basket analysis. Key components include antecedents and consequents, with measurements such as support, confidence, and lift to assess the strength of these associations. The document also outlines steps to create association rules using R, including the use of the apriori function to generate and filter rules based on specified criteria.


Association rules

Compiled and presented by


Dr. Chetna Arora
• Association rules are a data mining technique
used to uncover interesting relationships,
patterns, or associations within large datasets.
They are widely used in market basket analysis
to identify products frequently purchased
together.
• Components of an Association Rule
• Antecedent (If):
– These are the items or conditions in the "if" part of the rule.
– Example: In a supermarket, bread is the antecedent in the rule: "If a
customer buys bread, they are likely to buy butter."
• Consequent (Then):
– These are the items or outcomes in the "then" part of the rule.
– Example: Butter is the consequent in the rule: "If a customer buys
bread, they are likely to buy butter."
• Rule Example:
• If {bread} then {butter}
• This means customers who buy bread are also likely to buy butter.
Key Measurements for Association Rules

• Support:
• Definition: Measures how frequently an itemset
appears in the dataset.
• Formula:
Support = (Number of transactions containing both antecedent and consequent) / (Total number of transactions)
• Example:
If bread and butter appear together in 40 out of 1,000
transactions:
• Support = 40/1000 = 0.04 (4% of transactions)
• Confidence:
• Definition: Measures how often the rule is true when the
antecedent occurs.
• Formula:
Confidence = (Number of transactions containing both antecedent and consequent) / (Number of transactions containing antecedent)
• Example:
If 50 transactions contain bread, and 40 of these also contain
butter:
• Confidence = 40/50 = 0.8 (80% confidence). This means
80% of the time, customers who buy bread also buy butter.
• Lift:
• Definition: Measures how much more often the antecedent
and consequent occur together than would be expected if
they were independent.
• Formula: Lift = Confidence / Support of consequent
• Example:
If butter appears in 100 out of 1,000 transactions:
Support of butter=100/1000=0.1
• Using the previous confidence (0.8): Lift=0.80/0.1=8
• A lift of 8 means that customers buying bread are 8 times
as likely to buy butter as customers in general.
• Lift > 1: This indicates that the antecedent and consequent
are positively associated—the occurrence of the
antecedent makes the consequent more likely to occur
than by random chance.
• Lift = 1: This indicates no association—the antecedent and
consequent occur together as frequently as they would if
they were independent.
• Lift < 1: This indicates a negative association—the
occurrence of the antecedent makes the consequent less
likely to occur than by random chance.
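The three formulas can be checked with a few lines of base R. This is a minimal sketch that reuses the counts from the bread-and-butter example; the variable names and the interpret_lift() helper are illustrative assumptions, not part of any package.

```r
# Counts taken from the bread-and-butter example
n_total  <- 1000  # all transactions
n_both   <- 40    # transactions containing bread and butter
n_bread  <- 50    # transactions containing bread (antecedent)
n_butter <- 100   # transactions containing butter (consequent)

support    <- n_both / n_total
confidence <- n_both / n_bread
lift       <- confidence / (n_butter / n_total)

# Hypothetical helper: describe a lift value in words
interpret_lift <- function(lift, tol = 1e-8) {
  if (abs(lift - 1) < tol) {
    "no association"
  } else if (lift > 1) {
    "positive association"
  } else {
    "negative association"
  }
}

print(c(support = support, confidence = confidence, lift = lift))
print(interpret_lift(lift))
```

The small tolerance in interpret_lift() is there because floating-point division rarely yields exactly 1.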
• Example:
Transaction ID | Items Bought
1 | Bread, Butter, Milk
2 | Bread, Butter
3 | Bread, Milk
4 | Butter, Milk
5 | Bread, Butter, Eggs

Rule:
If {Bread} → {Butter}
Support: Bread and Butter appear together in 3 out of 5
transactions. Support=3/5=0.6 (60%)
Confidence: Bread appears in 4 transactions, and 3 of those
include Butter. Confidence=3/4=0.75 (75%)
Lift: Butter appears in 4 out of 5 transactions.
Lift=0.75/0.8*=0.9375
A lift below 1 suggests that buying Bread makes Butter slightly less
likely than its overall rate of 0.8.
*Support of consequent is how frequently Butter appears in the
dataset, regardless of whether Bread is purchased or not. In this case,
if Butter appears in 4 transactions out of 5,
the Support of Butter is:
Support of Butter=4/5=0.8
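The same calculation can be reproduced directly from the five transactions. The sketch below uses base R only; the baskets list and the itemset_support() helper are names introduced here for illustration.

```r
# The five transactions from the example table
baskets <- list(
  c("Bread", "Butter", "Milk"),
  c("Bread", "Butter"),
  c("Bread", "Milk"),
  c("Butter", "Milk"),
  c("Bread", "Butter", "Eggs")
)

# Hypothetical helper: fraction of baskets that contain every item in `items`
itemset_support <- function(items, baskets) {
  mean(vapply(baskets, function(basket) all(items %in% basket), logical(1)))
}

antecedent <- "Bread"
consequent <- "Butter"

support    <- itemset_support(c(antecedent, consequent), baskets)
confidence <- support / itemset_support(antecedent, baskets)
lift       <- confidence / itemset_support(consequent, baskets)

print(round(c(support = support, confidence = confidence, lift = lift), 4))
```

Dividing the rule's support by the antecedent's support gives the same confidence as dividing the raw counts (3/4), because the total number of transactions cancels out.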
Metric | Meaning | Interpretation
Support | How often the itemset (both antecedent and consequent) occurs in the dataset. | Higher support indicates a frequent pattern.
Confidence | The likelihood of the consequent occurring when the antecedent is present. | Higher confidence means the rule is more reliable.
Lift | How much the presence of the antecedent increases the likelihood of the consequent. | Lift > 1 indicates a positive association, Lift = 1 means independence, Lift < 1 indicates a negative association.
• Steps to Create Association Rules in R
• Install and load the arules package.
• Load a transactional dataset.
• Use the apriori() function to generate rules.
• Inspect and interpret the rules.
• Example in R
• Here’s a step-by-step guide with a simple
example:
• Step 1: Install and Load Required Package
• install.packages("arules") # Install only if not already installed
• library(arules)
• Step 2: Load Dataset
• We’ll use the built-in Groceries dataset from
the arules package.
• data("Groceries")
• summary(Groceries)
• Step 3: Generate Association Rules
• rules <- apriori(Groceries, parameter = list(support = 0.01, confidence = 0.5)) # Adjust values as needed
• *apriori()
• apriori() is a function from the arules package in R used to apply the Apriori
algorithm to a dataset. The Apriori algorithm finds frequent itemsets and generates
association rules based on the given support and confidence thresholds.
• In simple terms, it finds patterns like "if item X is bought, item Y is likely to be bought
too."
• support = 0.01: This specifies the minimum support threshold.
• Support measures how frequently an item or itemset appears in the dataset. With a
minimum support of 0.01, only itemsets that appear in at least 1% of the total
transactions are kept.
• confidence = 0.5: This specifies the minimum confidence threshold.
• Confidence measures how often the consequent (the item in the "then" part of the
rule) appears in transactions where the antecedent (the item in the "if" part)
appears.
• If you set confidence = 0.5, you're looking for rules where, when the antecedent is
bought, the consequent is bought at least 50% of the time.
• For example, a rule like {Bread} → {Butter} would be kept if at least 50% of the
transactions that contain bread also contain butter.
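One way to confirm that the thresholds behave as described is to look at the quality measures of the generated rules. This is a minimal sketch, assuming the arules package is installed; the control = list(verbose = FALSE) argument is an optional addition that only suppresses the progress log.

```r
library(arules)   # assumes the arules package is installed
data("Groceries")

rules <- apriori(
  Groceries,
  parameter = list(support = 0.01, confidence = 0.5),
  control   = list(verbose = FALSE)  # suppress the progress log
)

# quality() returns a data frame with one row per rule
q <- quality(rules)

print(length(rules))        # number of rules found
print(range(q$support))     # should not fall below the support threshold
print(range(q$confidence))  # should not fall below the confidence threshold
```

Lowering either threshold generally produces more rules, and raising it produces fewer.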
• Step 4: View and Inspect Rules
• # View the top 5 rules
• inspect(head(rules, 5))
• Output: 5 rows, each showing a rule (if, then) with its support, confidence, and lift
• Step 5: Filter Rules (Optional)
• Filter rules to focus on specific criteria, like high lift:
• filtered_rules <- subset(rules, lift > 1.5)
• inspect(head(filtered_rules, 5))
• This code keeps only the rules in the rules object whose lift
is greater than 1.5.
• Why 1.5?: A lift threshold above 1 keeps only rules whose
items occur together more often than independence would
predict. If you set the threshold too low, you may end up with
too many weak associations, whereas a higher threshold, like
1.5, keeps only rules whose antecedent raises the likelihood of
the consequent by at least 50%. The exact cut-off is a judgment
call that depends on the dataset.
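Before settling on a cut-off, it can help to look at how lift is distributed and to rank the surviving rules. The sketch below assumes the arules package is installed and rebuilds the rules object so that it runs on its own; the name strong_rules is illustrative.

```r
library(arules)   # assumes the arules package is installed
data("Groceries")

rules <- apriori(
  Groceries,
  parameter = list(support = 0.01, confidence = 0.5),
  control   = list(verbose = FALSE)  # suppress the progress log
)

# Check the spread of lift values before choosing a threshold
print(summary(quality(rules)$lift))

# Keep rules above the threshold, then rank them from highest lift down
strong_rules <- subset(rules, lift > 1.5)
strong_rules <- sort(strong_rules, by = "lift")  # decreasing order by default

inspect(head(strong_rules, 5))
```

Sorting by lift puts the rules with the largest departure from independence at the top, which is usually where inspection starts.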
• Real-World Example
• If a rule says: {milk} => {bread} [support=0.02,
confidence=0.8, lift=3]
• 2% of transactions include milk and bread
together.
• 80% of the time, bread is bought when milk is
bought.
• Customers who buy milk are 3 times as likely to buy
bread as customers in general.
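The three numbers in such a rule are linked, so the individual item supports can be recovered from them. This is a short base R sketch using the values from the rule above; the variable names are illustrative.

```r
# Values quoted in the rule {milk} => {bread}
support    <- 0.02  # share of transactions with milk and bread
confidence <- 0.8   # share of milk transactions that also contain bread
lift       <- 3

# confidence = support / support(milk)  =>  support(milk) = support / confidence
support_milk <- support / confidence

# lift = confidence / support(bread)    =>  support(bread) = confidence / lift
support_bread <- confidence / lift

# Joint share expected if milk and bread were independent
expected_joint <- support_milk * support_bread

print(c(support_milk = support_milk, support_bread = support_bread))
print(support / expected_joint)  # observed vs. expected joint share, i.e. the lift
```

Read this way, lift is the ratio of the observed joint support to the joint support expected under independence.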
