0% found this document useful (0 votes)
18 views

Patterning in Multilevel and Multidimensional Space

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
18 views

Patterning in Multilevel and Multidimensional Space

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

Mining Multilevel Associations

A concept hierarchy defines a sequence of mappings from a set of low-level concepts to a


higher-level, more general concept set. Data can be generalized by replacing low-level concepts
within the data by their corresponding higher-level concepts, or ancestors, from a concept
hierarchy.
Example: The task-relevant set of transactional data in Table 7.1 for sales in an AllElectronics
store, showing the items purchased for each transaction. The concept hierarchy for the items is
shown in Figure 7.2.
Association rules generated from mining data at multiple abstraction levels are called multiplelevel
or multilevel association rules. Multilevel association rules can be mined efficiently using
concept hierarchies under a support-confidence framework.
In general, a top-down strategy is employed, where counts are accumulated for the calculation of
frequent itemsets at each concept level, starting at concept level 1 and working downward in the
hierarchy toward the more specific concept levels, until no more frequent itemsets can be found.
11
For each level, any algorithm for discovering frequent itemsets may be used, such as
Apriori or its variations.
A number of variations to this approach are described next, where each variation involves
“playing” with the support threshold in a slightly different way.
1. Using uniform minimum support for all levels (referred to as uniform support): The
same minimum support threshold is used when mining at each abstraction level. For
example, in Figure 7.3, a minimum support threshold of 5% is used throughout (e.g., for
mining from“computer” downward to “laptop computer”). Both “computer” and “laptop
computer” are found to be frequent, whereas “desktop computer” is not.
2. Using reduced minimum support at lower levels (referred to as reduced support):
Each abstraction level has its own minimum support threshold. The deeper the
abstraction level, the smaller the corresponding threshold. For example, in Figure 7.4, the
minimum support thresholds for levels 1 and 2 are 5% and 3%, respectively. In this way,
“computer,” “laptop computer,” and “desktop computer” are all considered frequent.
12
3. Using item or group-based minimum support (referred to as group-based support):
Because users or experts often have insight as to which groups are more important than others, it
is sometimes more desirable to set up user-specific, item, or group-based minimal support
thresholds when miningmultilevel rules. For example, a user could set up the minimum support
thresholds based on product price or on items of interest, such as by setting particularly low
support thresholds for “camera with price over $1000” or “Tablet PC.
Note: A serious side effect of mining multilevel association rules is its generation of many
redundant rules across multiple abstraction levels due to the “ancestor” relationships
among items.
3.5 Mining Multidimensional Associations
 A single dimensional or intradimensional association rule because it contains a single
distinct predicate (e.g., buys) with multiple occurrences (i.e., the predicate occurs more
than once within the rule).
 Association rules that involve two or more dimensions or predicates can be referred to as
multidimensional association rules. Rule below contains three predicates (age,
occupation, and buys), each of which occurs only once in the rule. Hence, we say that it
has no repeated predicates. Multidimensional association rules with no repeated

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 1/14
predicates are called interdimensional association rules.
Multidimensional association rules with repeated predicates, which contain multiple occurrences
of some predicates. These rules are called hybrid-dimensional association rules.
Database attributes can be nominal or quantitative.
The values of nominal (or categorical) attributes are “names of things.” Nominal attributes have
a finite number of possible values, with no ordering among the values (e.g., occupation, brand,
color).
Quantitative attributes are numeric and have an implicit ordering among values (e.g.,age,
income, price).
13
There are two basic approaches regarding the treatment of quantitative attributes, they are
 In the first approach, quantitative attributes are discretized using predefined concept
hierarchies. For example, a concept hierarchy for income may be used to replace the
original numeric values of this attribute by interval labels such as “0..20K,” “21K..30K,”
“31K..40K,” and so on. Here, discretization is static and predetermined. This is refer as
mining multidimensional association rules using static discretization of quantitative
attributes.
 In the second approach, quantitative attributes are discretized or clustered into “bins”
based on the data distribution. These bins may be further combined during the mining
process. The discretization process is dynamic and established so as to satisfy some
mining criteria such as maximizing the confidence of the rules mined. It is referred to as
(dynamic) quantitative association rules
Mining Quantitative Association Rules
There are three methods
(1) a data cube method,
(2) a clustering-based method, and
(3) a statistical analysis method to uncover exceptional behaviors.
1) Data Cube–Based Mining of Quantitative Associations
Data cubes are well suited for the mining of multidimensional association rules: They store
aggregates (e.g., counts) in multidimensional space, which is essential for computing the support
and confidence of multidimensional association rules.
Figure 7.5 shows the lattice of cuboids defining a data cube for the dimensions age, income, and
buys. The cells of an n-dimensional cuboid can be used to store the support counts of the
corresponding n-predicate sets. The base cuboid aggregates the task-relevant data by age,
income, and buys; the 2-D cuboid, (age, income), aggregates by age and income, and so on; the
0-D (apex) cuboid contains the total number of transactions in the task-relevant data.
14
2) Mining Clustering-Based Quantitative Associations
 In Top Down approach, each quantitative dimension, a standard clustering algorithm
(e.g., k-means or a density-based clustering algorithm) can be applied to find clusters in
this dimension that satisfy the minimum support threshold.
 For each cluster, we then examine the 2-D spaces generated by combining the cluster
with a cluster or nominal value of another dimension to see if such a combination passes
the minimum support threshold.
 A bottom-up approach for finding clustering-based frequent patterns works by first
clustering in high-dimensional space to form clusters with support that satisfies the
minimum support threshold, and then projecting and merging those clusters in the space
containing fewer dimensional combinations.
3) Using Statistical Theory to Disclose Exceptional Behavior
 It is possible to discover quantitative association rules that disclose exceptional behavior,
where “exceptional” is defined based on a statistical theory. For example, the following
Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 2/14
association rule may indicate exceptional behavior:
15
Mining Rare Patterns and Negative Patterns
An infrequent (or rare) pattern is a pattern with a frequency support that is below (or far
below) a user-specified minimum support threshold.
Example: In jewelry sales data, sales of diamond watches are rare;
Negative Patterns
Example: In supermarket data, if we find that customers frequently buy Coca-Cola Classic or
Diet Coke but not both, then buying Coca-Cola Classic and buying Diet Coke together is
considered a negative (correlated) pattern.
3.6 Constraint-Based Frequent Pattern Mining
Constraint-based mining allows the users to specify intuition or expectations as constraints to
confine the search space.
The constraints can include the following:
 Knowledge type constraints: These specify the type of knowledge to be mined, such as
association, correlation, classification, or clustering.
 Data constraints: These specify the set of task-relevant data.
 Dimension/level constraints: These specify the desired dimensions (or attributes) of the
data, the abstraction levels, or the level of the concept hierarchies to be used in mining.
 Interestingness constraints: These specify thresholds on statistical measures of rule
interestingness such as support, confidence, and correlation.
 Rule constraints: These specify the form of, or conditions on, the rules to be mined.
Such constraints may be expressed as metarules (rule templates), as the maximum or
minimum number of predicates that can occur in the rule antecedent or consequent, or as
relationships among attributes, attribute values, and/or aggregates.
Metarule-Guided Mining of Association Rules
16
A metarule can be used to specify the information describing the form of rules you are interested
in finding. An example of such a metarule is
where P1 and P2 are predicate variables that are instantiated to attributes from the given
database during the mining process, X is a variable representing a customer, and Y and W take on
values of the attributes assigned to P1 and P2, respectively.
Pruning Pattern Space with Pattern Pruning Constraints
Based on how a constraint may interact with the pattern mining process, there are fivevcategories
of pattern mining constraints: (1) antimonotonic, (2) monotonic, (3) succinct, (4) convertible, and
(5) inconvertible.
 If an itemset does not satisfy the rule constraint, none of its supersets can satisfy the
constraint. If a rule constraint obeys this property, it is antimonotonic.
 if an itemset satisfies the rule constraint, so do all of its supersets. If a rule constraint
obeys this property, it is monotonic.
 Succinct enumerate all and only those sets that are guaranteed to satisfy the constraint.
That is,
 if a rule constraint is succinct, we can directly generate precisely the sets that satisfy it,
even before support counting begins.
 The constraint which belongs to none of the above category they are convertible rules.i need
indetailed explantaion with understandable english for 12 marks for this quesion readabove
content and answer for this 1.How patering is done in Multilevel and multidimensional space.

Here's a well-structured, detailed, and easy-to-understand explanation for your 12-mark answer on
"How patterning is done in multilevel and multidimensional space."

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 3/14
Patterning in Multilevel and Multidimensional Space
Pattern mining involves identifying frequently occurring itemsets or associations in a dataset. When
dealing with complex data, patterning can be done at multiple levels of abstraction (multilevel) or across
different dimensions (multidimensional). Let’s explore how these two approaches work.

1. Multilevel Patterning
Multilevel pattern mining involves extracting frequent itemsets across different levels of abstraction
within a concept hierarchy. A concept hierarchy is a structured representation where data is
categorized into different levels of generalization. For example:

Low-level concept: "Dell Inspiron 15"


Higher-level concept: "Laptop Computer"

Even higher-level concept: "Computer"

How Multilevel Patterning is Done

A top-down approach is generally followed, starting from a higher-level category and moving downward.
The frequent itemsets are mined at each level using algorithms like Apriori. The process is as follows:

1. Start from the Highest Level: The algorithm first identifies frequent itemsets at the highest level
of abstraction (e.g., "computer").
2. Move Down the Hierarchy: If an itemset is frequent, then its subcategories (e.g., "laptop
computer" and "desktop computer") are examined at a lower level.

3. Stop When No More Frequent Itemsets are Found: This continues until no more frequent
itemsets can be discovered at a deeper level.

Variations in Multilevel Patterning

Different techniques are used to adjust the support threshold (minimum frequency required for an
itemset to be considered frequent):

Uniform Support: A fixed support threshold is used across all levels. (e.g., 5% at all levels)

Reduced Support: Lower levels have a reduced support threshold to allow for more specific
itemsets to be discovered. (e.g., 5% at level 1, 3% at level 2)
Group-Based Support: Different support thresholds are set for specific groups of items based on
business priorities. (e.g., lower threshold for high-priced items like "Luxury Watches")

Challenges in Multilevel Patterning

Redundant Rules: If "Computer → Printer" is a frequent rule, then "Laptop → Printer" may also
appear as a rule, leading to redundancy.
Choosing the Right Support Levels: If the support threshold is too high, detailed patterns at
lower levels may be missed.

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 4/14
2. Multidimensional Patterning
Multidimensional pattern mining involves identifying associations between different attributes
(dimensions) in a dataset. Unlike multilevel mining, where data is generalized within a hierarchy,
multidimensional mining looks for patterns across different data fields.

How Multidimensional Patterning is Done

1. Identify Dimensions: The dataset is divided into multiple dimensions (e.g., Age, Income,
Occupation, and Purchase Behavior).

2. Find Frequent Patterns in Each Dimension: The algorithm searches for frequent associations
within each dimension separately.
3. Combine Patterns Across Dimensions: Patterns from different dimensions are combined to find
meaningful relationships.

4. Evaluate Rule Strength: Rules are evaluated using support and confidence to ensure meaningful
associations.

Types of Multidimensional Associations

1. Interdimensional Association Rules

These rules involve different dimensions, each occurring only once.


Example:
Rule: (Age = 25-35) & (Income = 50K-70K) → (Buys = Laptop)

Interpretation: Customers aged 25-35 with an income of 50K-70K are likely to buy a
laptop.
2. Hybrid-Dimensional Association Rules

These rules involve repeated dimensions.


Example:
Rule: (Age = 25-35) & (Age = 36-45) → (Buys = Smartwatch)

Interpretation: A person in the age group of 25-35 or 36-45 is likely to buy a


smartwatch.

Handling Quantitative Attributes in Multidimensional Mining

Since attributes like income, age, and price are numeric, they need to be categorized before mining
patterns. This can be done using:

Static Discretization: Predefined intervals like "Low (0-20K)", "Medium (21K-50K)", "High (51K+)"
are used.
Dynamic Discretization: Clustering techniques like k-means dynamically group values based on
distribution.

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 5/14
3. Mining Quantitative Patterns
Three major techniques are used to extract patterns from numeric attributes:

1. Data Cube Method

A multidimensional data cube stores aggregated data for quick retrieval.


Example: A cube storing sales data based on "Age, Income, and Buys" allows easy
identification of trends.

2. Clustering-Based Method
Top-Down Approach: Clusters similar data points and then finds frequent patterns.
Bottom-Up Approach: Finds frequent patterns first and then merges them into clusters.

3. Statistical Analysis Method


Identifies patterns that significantly deviate from normal behavior.
Example: Customers earning $10K usually don’t buy luxury cars, but if a rule says otherwise, it
may indicate fraudulent behavior.

4. Rare and Negative Pattern Mining


Rare Patterns: Items that occur with very low frequency. Example: Diamond watch sales in a
jewelry store.

Negative Patterns: If two items never appear together, it indicates a negative correlation.
Example: "Customers buy Coca-Cola Classic OR Diet Coke, but not both."

5. Constraint-Based Frequent Pattern Mining


Users can limit the search space by setting constraints.

Types of Constraints

1. Knowledge Constraints: Focus on association rules, classification, or clustering.


2. Data Constraints: Specify which data to consider (e.g., transactions from 2024 only).

3. Level Constraints: Choose specific concept levels in the hierarchy.

4. Interestingness Constraints: Set thresholds for support and confidence to avoid trivial rules.
5. Rule Constraints: Define the format of acceptable rules. Example: "Customer (X) must be in the
Age group 20-30".

Metarule-Guided Mining

Metarules specify what kind of patterns should be extracted.


Example:

Metarule: “If (Customer, Age) → (Buys, Product)”

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 6/14
Rule found: “If (Age = 25-35) → (Buys = Smartwatch)”

Pattern Pruning Constraints

To optimize mining, constraints help remove unnecessary patterns:

1. Antimonotonic Constraints: If an itemset isn’t frequent, its superset won’t be either.


2. Monotonic Constraints: If an itemset is frequent, all its supersets are too.

3. Succinct Constraints: Directly list valid sets before counting support.

4. Convertible Constraints: Can be converted to simpler forms.


5. Inconvertible Constraints: Cannot be optimized easily.

Conclusion
Multilevel patterning allows mining at different abstraction levels, following a top-down
approach.

Multidimensional patterning finds patterns across multiple attributes using interdimensional or


hybrid rules.

Quantitative patterns are extracted using data cubes, clustering, and statistics.

Rare and negative patterns help identify uncommon trends.


Constraints help optimize mining by focusing on relevant patterns and avoiding redundant rules.

By understanding and applying these techniques, businesses can extract meaningful insights from large
datasets effectively! 🚀

1.How patering is done in Multilevel and multidimensional space. in data mining and data
warehousing in understandable english with indeatiled explanation for 12 marks

Patterning in Multilevel and Multidimensional Space in Data


Mining and Data Warehousing

Introduction
Patterning in data mining and data warehousing refers to the discovery of relationships, trends, or
structures in large datasets. These patterns help businesses and organizations make data-driven
decisions. Two major approaches to patterning are multilevel association rules and multidimensional
association rules.

Multilevel association rules identify relationships at different levels of abstraction using a


concept hierarchy.

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 7/14
Multidimensional association rules identify relationships among multiple attributes or
dimensions in a dataset.

Now, let’s discuss each of these in detail.

1. Multilevel Patterning in Data Mining


Concept of Multilevel Association Rules

Multilevel association rules involve mining frequent patterns at different levels of abstraction in a
concept hierarchy. A concept hierarchy is a structured way of organizing data from general to specific.

For example, consider an electronic store selling items. A concept hierarchy for products might look
like this:

nginx

`Electronics
├── Computer
│ ├── Laptop
│ ├── Desktop
├── Mobile
│ ├── Smartphone
│ ├── Feature Phone
`

In multilevel association rule mining, we start from the highest level (general category) and move
down to more specific categories.

How Multilevel Patterning Works

Multilevel association rule mining follows a top-down approach:

1. Step 1: Mining at the Top Level

Find frequent itemsets using a minimum support threshold (e.g., 5%).

Example: “Electronics” appears in 20% of transactions, so it is frequent.


2. Step 2: Move to Lower Levels

If "Electronics" is frequent, check the next level: “Computer” and “Mobile”.

If "Computer" is frequent, break it down further into "Laptop" and "Desktop".


3. Step 3: Stop When No More Frequent Patterns Are Found

The process continues until we reach a level where no frequent patterns exist.

Methods for Multilevel Patterning

1. Uniform Support

The same minimum support is applied at all levels.


Example: If we set 5% as the minimum support, all levels must meet this requirement.

2. Reduced Support

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 8/14
Lower levels have lower minimum support to allow deeper pattern discovery.

Example: Level 1 (5%), Level 2 (3%).


3. Group-Based Support

Different minimum support is assigned to different item groups.

Example: Expensive items (e.g., "Cameras over $1000") may have a lower threshold.

Challenges in Multilevel Patterning

Redundant Rules: Many rules may repeat at different levels (e.g., “Electronics → Buy” and “Laptop
→ Buy”).

Complex Computation: More levels mean more calculations and time consumption.

2. Multidimensional Patterning in Data Mining


Concept of Multidimensional Association Rules

Unlike multilevel rules that focus on a single attribute hierarchy (e.g., product type), multidimensional
association rules examine relationships across multiple attributes.

For example, a retail store collects data on:

Customer Age

Customer Occupation
Items Purchased

An example of a multidimensional association rule:

scss

`(Young Adults, Students) → (Buy Laptop)


`

This means that young adults who are students are likely to buy laptops.

Types of Multidimensional Patterns

1. Interdimensional Association Rules

Each attribute appears only once.


Example: “(Age 25-35, Salary > $5000) → (Buy Smartphone)”

2. Hybrid-Dimensional Association Rules

Some attributes appear multiple times.

Example: “(Customer Age 20-30, Buys Laptop, Buys Smartphone) → (Buys Headphones)”

How Multidimensional Patterning Works

1. Step 1: Identify Relevant Dimensions

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 9/14
Choose attributes like age, occupation, income, and purchase behavior.
2. Step 2: Discretization of Quantitative Attributes
Convert numerical values into meaningful groups.

Example: Instead of using exact income values, group them as:

Low: $0 - $20K
Medium: $21K - $50K

High: > $50K

3. Step 3: Apply an Association Rule Mining Algorithm


Use Apriori or other algorithms to find frequent itemsets.

4. Step 4: Generate Rules and Analyze Patterns

Identify relationships, such as:


“(Occupation = Student, Age = 18-25) → (Buys Gaming Laptop)”

“(Income > $100K, Age > 40) → (Buys Luxury Car)”

Challenges in Multidimensional Patterning

Handling Large Datasets: More dimensions mean more combinations to check.

Data Discretization Issues: Poor grouping of values can lead to misleading patterns.

Comparison: Multilevel vs. Multidimensional Patterning


Feature Multilevel Patterning Multidimensional Patterning

Focus Hierarchy within a single attribute Relationships across multiple attributes


Example Electronics → Computer → Laptop (Age, Income) → (Buys Laptop)

Technique Top-down approach using concept hierarchies Mining across multiple dimensions

Challenge Redundant rules at different levels Handling large datasets

Conclusion
Patterning in multilevel and multidimensional spaces is a powerful approach in data mining and data
warehousing. Multilevel patterning focuses on discovering relationships within a single hierarchy,
while multidimensional patterning finds relationships across multiple attributes. Both approaches
help businesses understand customer behavior, optimize inventory, and improve decision-making.

By using efficient algorithms, adjusting support thresholds, and handling data complexity, organizations
can extract meaningful insights from large datasets.
This detailed explanation provides a structured understanding of patterning in multilevel and
multidimensional spaces, ensuring clarity for academic purposes. 🚀

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 10/14
what is the difefrence between two answers u provided ineed to study for my exam

The two answers I provided are actually the same. I didn’t provide a different explanation, but I made
sure the response was structured and detailed so that you can easily understand and study for your
exam.

How to Study This Answer Efficiently?

Here’s a quick breakdown of what you should focus on:

1. Multilevel Patterning (Hierarchical Approach)

Focus: Finding patterns at different levels of abstraction within a single attribute.

Uses a Concept Hierarchy: Starts from general categories and moves to specific ones.

Example:
Level 1: Electronics → frequent

Level 2: Computers & Mobiles → frequent

Level 3: Laptops & Desktops → frequent


Real-life Use: Analyzing product categories in retail stores.

2. Multidimensional Patterning (Attribute-Based Approach)

Focus: Finding relationships across multiple attributes.

Uses Multiple Dimensions: Age, Income, Occupation, Product Purchased, etc.

Example:
(Age = 18-25, Student) → (Buys Laptop)

(Income > $100K, Age > 40) → (Buys Luxury Car)

Real-life Use: Understanding customer purchase behavior based on demographics.

Key Difference

Feature Multilevel Patterning Multidimensional Patterning


Structure Uses a hierarchy within one attribute Uses multiple attributes

Example Electronics → Computers → Laptops (Age, Income) → (Buys Laptop)


Finding patterns at different levels of the same Finding relationships between different data
Purpose
data type types

If you need a short revision before your exam, focus on this table and the key concepts above. Let me
know if you need further simplification or diagrams to understand better! 😊

how to calculate confidence in apriori algorithm for 3-itmset indetailed

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 11/14
Confidence Calculation in Apriori Algorithm for a 3-itemset

In the Apriori algorithm, confidence is a measure of how often a rule X → Y is found to be true. It is
defined as:

Support(X ∪ Y )
Confidence(X → Y ) =
Support(X)

where:

Support(X ∪ Y) = The number of transactions containing both X and Y.


Support(X) = The number of transactions containing X alone.

Now, let's go step by step to calculate confidence for a 3-itemset.

Step-by-Step Explanation with an Example

Let's say we have a dataset of 10 transactions, and we are analyzing the frequent 3-itemset:

{M ilk, Bread, Butter}

with the following frequency counts:

Itemset Support Count (Number of Transactions)


{Milk, Bread, Butter} 4
{Milk, Bread} 6

{Milk, Butter} 5
{Bread, Butter} 7
{Milk} 8

{Bread} 9
{Butter} 7

We will now calculate confidence for all possible association rules from this 3-itemset.

Step 1: Generate Association Rules from {Milk, Bread, Butter}

Since we are given a 3-itemset, we can generate rules by splitting it into antecedents (LHS) and
consequents (RHS).

Rule 1: {Milk, Bread} → {Butter}

Support({M ilk, Bread, Butter})


Confidence =
Support({M ilk, Bread})

= \frac{4}{6} = 0.6667 \text{ (or 66.67%)}

Rule 2: {Milk, Butter} → {Bread}

Support({M ilk, Bread, Butter})


Confidence =
Support({M ilk, Butter})

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 12/14
= \frac{4}{5} = 0.8 \text{ (or 80%)}

Rule 3: {Bread, Butter} → {Milk}

Support({M ilk, Bread, Butter})


Confidence =
Support({Bread, Butter})

= \frac{4}{7} = 0.5714 \text{ (or 57.14%)}

Rule 4: {Milk} → {Bread, Butter}

Support({M ilk, Bread, Butter})


Confidence =
Support({M ilk})

= \frac{4}{8} = 0.5 \text{ (or 50%)}

Rule 5: {Bread} → {Milk, Butter}

Support({M ilk, Bread, Butter})


Confidence =
Support({Bread})

= \frac{4}{9} = 0.4444 \text{ (or 44.44%)}

Rule 6: {Butter} → {Milk, Bread}

Support({M ilk, Bread, Butter})


Confidence =
Support({Butter})

= \frac{4}{7} = 0.5714 \text{ (or 57.14%)}

Step 2: Interpreting Confidence Values

The rule {Milk, Butter} → {Bread} has the highest confidence (80%), meaning that if someone
buys Milk and Butter, there is an 80% chance they will also buy Bread.

The rule {Bread} → {Milk, Butter} has the lowest confidence (44.44%), meaning that if someone
buys Bread, they are less likely to buy both Milk and Butter together.

Step 3: Setting a Confidence Threshold

In Apriori, we usually set a minimum confidence threshold (e.g., 50%).

If a rule has confidence ≥ threshold, it is considered a strong rule.

If confidence is below the threshold, it is discarded.

Example Decision:

If minimum confidence = 60%, then:


✅ {Milk, Butter} → {Bread} (80%) → Accepted
✅ {Milk, Bread} → {Butter} (66.67%) → Accepted
Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 13/14
❌ {Bread} → {Milk, Butter} (44.44%) → Rejected
Final Summary

1. Confidence measures the strength of an association rule.


2. It is calculated as:

Support(X ∪ Y )
Confidence(X → Y ) =
Support(X)

3. For a 3-itemset, you generate all possible rules and compute confidence for each.

4. Only rules that meet the confidence threshold are considered useful.

Exam Tip:

Explain the formula clearly.


Provide a step-by-step example with calculations.
Mention why confidence matters in pattern discovery.

Discuss the minimum confidence threshold and how it filters useful rules.

Would you like a simplified version or any practice questions to prepare for your exam? 😊

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 14/14

You might also like