
BHARATIYA VIDYA BHAVAN’S

SARDAR PATEL INSTITUTE OF TECHNOLOGY


(Empowered Autonomous Institute Affiliated to University of Mumbai)
[Knowledge is Nectar]

Department of Computer Science Engineering

Course - Data Analytics

UID 2021600022, 2021600033

Name Mahek Gupta, Shruti Kedari

Class and Batch BE AIML Batch B

Date 10-11-2024

Lab 9

Aim To perform association rule mining on a dataset (Apriori Algorithm)

Objective Association rule mining identifies patterns or relationships between items in large datasets.
In market basket analysis, it uncovers frequent item combinations, such as customers
buying bread and butter also purchasing milk. These insights help businesses optimize
product placements, promotions, and inventory management.

Theory
1. Association Rule Mining:

Association rule mining is a data mining technique used to find rules that predict the
occurrence of an item based on the occurrences of other items in a transaction. The rules
are typically represented in the form of "If-Then" statements, where if a particular set of
items (antecedent) is present, then there is a likelihood that another item or set of items
(consequent) will also be present in the same transaction.

Each rule has two main components:

● Support: Measures the frequency of occurrence of an itemset in the dataset. A higher support indicates that the rule is more frequently applicable.
● Confidence: Measures the likelihood that items in the consequent will also be present in transactions containing the antecedent.

For example: In a supermarket dataset, a rule like "If a customer buys bread and butter,
then they are 70% likely to buy milk" could be a common association.
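These two measures can be computed directly by counting transactions. A minimal sketch on a toy set of transactions (the item names here are illustrative, not taken from the lab dataset):

```python
# Compute support and confidence for the rule {bread, butter} -> {milk}
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

antecedent = {"bread", "butter"}
consequent = {"milk"}

# Support: fraction of all transactions containing the full itemset
both = sum(1 for t in transactions if (antecedent | consequent) <= t)
support = both / len(transactions)

# Confidence: among transactions containing the antecedent,
# the fraction that also contain the consequent
ante_count = sum(1 for t in transactions if antecedent <= t)
confidence = both / ante_count

print(support)     # 0.4
print(confidence)  # ~0.667
```

Here 2 of 5 transactions contain all three items (support 0.4), and 2 of the 3 transactions containing bread and butter also contain milk (confidence ≈ 0.667).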

2. Apriori Algorithm:

The Apriori algorithm is a widely used algorithm in association rule mining for generating
frequent itemsets. It works by identifying the frequent individual items and extending them
to larger itemsets as long as they meet a minimum support threshold. This approach
reduces computational complexity by avoiding the generation of non-frequent itemsets.

Key steps in the Apriori algorithm:

1. Identify Frequent 1-itemsets: Find all individual items that meet the minimum
support threshold.
2. Generate Candidates for k-itemsets: From the (k-1)-itemsets that meet the
support threshold, create candidate k-itemsets by combining pairs of (k-1)-itemsets
that share a common prefix.
3. Prune Non-frequent Itemsets: Remove any candidate k-itemsets that do not meet
the minimum support.
4. Generate Association Rules: For each frequent itemset, generate association
rules and calculate their confidence. If the confidence meets the threshold, keep the
rule; otherwise, discard it.

Apriori Principle: This principle states that any subset of a frequent itemset must also be
frequent. The algorithm uses this property to prune the search space, reducing the number
of candidate itemsets.
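The steps above can be sketched in plain Python. This is a simplified illustration, not the optimized implementation used below: candidates are generated by taking unions of frequent (k-1)-itemsets, whereas the classic algorithm joins only itemsets sharing a common prefix.

```python
from itertools import combinations

def apriori_frequent(transactions, min_support):
    """Return all frequent itemsets (as frozensets) meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 1: frequent 1-itemsets
    items = {i for t in transactions for i in t}
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = list(level)

    k = 2
    while level:
        # Step 2: candidate k-itemsets from unions of frequent (k-1)-itemsets
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Step 3: prune candidates below the minimum support
        level = [c for c in candidates if support(c) >= min_support]
        frequent.extend(level)
        k += 1
    return frequent

# Example: with min_support = 0.5, all singles and all pairs survive,
# but the 3-itemset appears in only 1 of 4 transactions and is pruned
transactions = [
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]
print(sorted(sorted(s) for s in apriori_frequent(transactions, 0.5)))
```

The Apriori principle is what makes the pruning in step 3 safe: no superset of a pruned candidate needs to be examined, since it cannot be frequent either.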

Implementation / Code

# Import necessary libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
import matplotlib.pyplot as plt

# Step 1: Create the dataset
data = {
    'Milk': [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    'Bread': [1, 1, 0, 1, 0, 1, 0, 1, 1, 1],
    'Butter': [1, 1, 1, 1, 1, 0, 1, 1, 0, 1],
    'Cheese': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    'Eggs': [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    'Apples': [0, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    'Bananas': [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
}

# Convert the dictionary to a DataFrame
df = pd.DataFrame(data)
df.index.name = 'Transaction'

# Save the DataFrame to a CSV file
df.to_csv('grocery_transactions.csv')
print("Dataset created and saved as 'grocery_transactions.csv'.")

# Step 2: Perform Association Rule Mining
# Load the dataset
basket = pd.read_csv('grocery_transactions.csv', index_col=0)

# Apply the Apriori algorithm to find frequent itemsets with a
# minimum support of 0.3 (30%)
frequent_itemsets = apriori(basket, min_support=0.3, use_colnames=True)

# Generate the association rules with a minimum confidence of 0.7 (70%)
rules = association_rules(frequent_itemsets, metric="confidence",
                          min_threshold=0.7)

# Display the generated rules
print("Association Rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

# Step 3: Visualization
# Scatter plot for Support vs. Confidence
plt.figure(figsize=(10, 6))
plt.scatter(rules['support'], rules['confidence'], alpha=0.5, color='purple')
plt.xlabel('Support')
plt.ylabel('Confidence')
plt.title('Support vs Confidence')
plt.show()

# Histogram of Lift values
plt.figure(figsize=(10, 6))
plt.hist(rules['lift'], bins=10, alpha=0.7, color='blue')
plt.xlabel('Lift')
plt.ylabel('Frequency')
plt.title('Distribution of Lift Values')
plt.show()

Output

Conclusion In conclusion, association rule mining, particularly through the Apriori algorithm, is a
powerful tool for discovering meaningful relationships and patterns in large datasets. By
identifying frequently occurring itemsets and generating association rules, it provides
valuable insights that can drive strategic business decisions, optimize product offerings,
and enhance customer experiences. This technique is widely applied in areas like retail,
healthcare, and marketing, where understanding item correlations is crucial for success.

References https://www.youtube.com/watch?v=zi_ydmbWfAs
