Assignment 3 Aim: Association Rule Mining Using Apriori Algorithm. Objectives

The document discusses implementing the Apriori algorithm for association rule mining. It explains the objectives are to generate frequent patterns and association rules using Apriori with different minimum support and confidence thresholds. It then provides details on the steps of the Apriori algorithm, including generating candidates, calculating support, and pruning. An example application of the algorithm on a sample dataset is shown.

Uploaded by

Abhinay Surve

Assignment 3

Aim: Association rule mining using Apriori Algorithm.

Objectives:
 To generate Frequent patterns and association rules using Apriori algorithm

Problem Statement:
1. Consider transactions in array or CSV form.
2. Implement the Apriori algorithm in Python using a library function for the
following dataset, and generate rules for different minimum support and
minimum confidence thresholds.

3. Write a function to generate candidates for apriori algorithm using python for
the following dataset
[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]
4. Write a function to generate frequent patterns using apriori algorithm. Use
python programming.
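As a sketch of task 3, a candidate-generation function for the dataset above might look as follows (the function name and structure are an assumption, not part of the assignment's reference solution):

```python
from itertools import combinations

def generate_candidates(prev_frequent, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets.

    Join: take every k-combination of items seen in prev_frequent.
    Prune: keep a candidate only if all its (k-1)-subsets are frequent.
    """
    prev = set(prev_frequent)
    items = sorted({i for itemset in prev for i in itemset})
    return {
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in prev for sub in combinations(combo, k - 1))
    }

# Frequent 1-itemsets of [1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]
# with minimum support 2 (item 4 appears only once, so it is dropped):
L1 = [frozenset({1}), frozenset({2}), frozenset({3}), frozenset({5})]
C2 = generate_candidates(L1, 2)  # all six pairs over {1, 2, 3, 5}
```

Every 1-subset of a pair is frequent here, so no pair is pruned; pruning starts to matter from C3 onward.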
Theory:
Explain:
• Write Apriori Algorithm
The Apriori algorithm is used for mining frequent itemsets and deriving
association rules from a transactional database. It relies on two parameters,
“support” and “confidence”: support is an itemset’s frequency of occurrence,
and confidence is a conditional probability.
The following are the main steps of the algorithm:
1. Calculate the support of the itemsets of size k = 1 in the transactional
database (note that support is the frequency of occurrence of an itemset).
This is called generating the candidate set.
2. Prune the candidate set by eliminating itemsets with a support below the
given threshold.
3. Join the surviving frequent itemsets to form candidate sets of size k + 1,
and repeat the above steps until no new itemsets can be formed, i.e., until
every candidate set falls below the given support threshold.
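The three steps above can be sketched as a short pure-Python function (a minimal sketch, assuming an absolute support count as the threshold; function and variable names are mine):

```python
from collections import defaultdict

def apriori(transactions, min_sup):
    """Return {frozenset: support count} of all frequent itemsets.

    min_sup is an absolute support count (number of transactions).
    """
    transactions = [set(t) for t in transactions]

    def frequent(candidates):
        counts = defaultdict(int)
        for t in transactions:
            for c in candidates:
                if c <= t:  # candidate occurs in this transaction
                    counts[c] += 1
        # Prune step: drop candidates below the support threshold.
        return {c: n for c, n in counts.items() if n >= min_sup}

    # k = 1: the candidate set C1 contains every individual item.
    level = frequent({frozenset([i]) for t in transactions for i in t})
    result, k = dict(level), 2
    while level:
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        prev = list(level)
        level = frequent({a | b for a in prev for b in prev if len(a | b) == k})
        result.update(level)
        k += 1
    return result

freq = apriori([[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]], min_sup=2)
```

On the assignment's dataset this yields nine frequent itemsets, including {2, 3, 5} with support 2.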
Let’s go over an example to see the algorithm in action. Suppose that the
given support is 3 and the required confidence is 80%.
The transactional database (table omitted in the source).
Now let’s create the association rules. This is where the given confidence is
required. For a rule X -> Y, the confidence is calculated as
Support(X and Y) / Support(X).
The following rules can be obtained from the frequent itemsets of size two
(2-frequent itemsets):
1. I2 -> I3: Confidence = 3/3 = 100%.
2. I3 -> I2: Confidence = 3/4 = 75%.
3. I3 -> I4: Confidence = 3/4 = 75%.
4. I4 -> I3: Confidence = 3/3 = 100%.
Since our required confidence is 80%, only rules 1 and 4 are included in the
result. Therefore, it can be concluded that customers who bought item two
(I2) always bought item three (I3) with it, and customers who bought item
four (I4) always bought item three (I3) with it.
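This filtering step can be checked with a few lines of Python; the support counts below are transcribed from the worked example (the underlying transaction table is not reproduced in this text):

```python
# Support counts taken from the worked example above.
support = {
    frozenset({"I2"}): 3, frozenset({"I3"}): 4, frozenset({"I4"}): 3,
    frozenset({"I2", "I3"}): 3, frozenset({"I3", "I4"}): 3,
}

def confidence(x, y):
    """conf(X -> Y) = Support(X and Y) / Support(X)."""
    return support[x | y] / support[x]

candidate_rules = [("I2", "I3"), ("I3", "I2"), ("I3", "I4"), ("I4", "I3")]
# Keep only rules meeting the 80% confidence threshold from the text.
strong = [(x, y) for x, y in candidate_rules
          if confidence(frozenset({x}), frozenset({y})) >= 0.8]
# Only I2 -> I3 and I4 -> I3 survive.
```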
• Solve Example considered using Apriori algorithm.
Step-1: Calculating C1 and L1:
o In the first step, we create a table that contains the support count (the
frequency of each itemset individually in the dataset) of each itemset in
the given dataset. This table is called the candidate set C1.

o Next, we keep all the itemsets whose support count is greater than or
equal to the minimum support (2). This gives us the table for the frequent
itemset L1.
Every itemset meets the minimum support except E, so the itemset E is
removed.

Step-2: Candidate Generation C2, and L2:


o In this step, we generate C2 with the help of L1. In C2, we form all pairs
of the itemsets of L1.
o After creating these pairs, we again find the support count from the main
transaction table of the dataset, i.e., how many times each pair occurs
together in the given dataset. This gives us the below table for C2:

o Again, we compare the C2 support counts with the minimum support count;
itemsets with a lower support count are eliminated from C2. This gives us
the below table for L2.

Step-3: Candidate generation C3, and L3:


o For C3, we repeat the same two steps, but now we form the C3 table with
itemsets of three items and calculate their support count from the
dataset. This gives the below table:

o Now we create the L3 table. As the above C3 table shows, there is only one
itemset whose support count equals the minimum support count. So L3 has
only one combination, i.e., {A, B, C}.
Step-4: Finding the association rules for the subsets:
To generate the association rules, we first create a new table with the
possible rules from the combination {A, B, C}. For each rule, we calculate
the confidence using the formula sup(A ^ B)/sup(A). After calculating the
confidence value for all rules, we exclude the rules whose confidence is
below the minimum threshold (50%).
Consider the below table:
Rules        Support   Confidence
A ^ B → C    2         sup(A ^ B ^ C)/sup(A ^ B) = 2/4 = 0.5 = 50%
B ^ C → A    2         sup(B ^ C ^ A)/sup(B ^ C) = 2/4 = 0.5 = 50%
A ^ C → B    2         sup(A ^ C ^ B)/sup(A ^ C) = 2/4 = 0.5 = 50%
C → A ^ B    2         sup(C ^ A ^ B)/sup(C) = 2/5 = 0.4 = 40%
A → B ^ C    2         sup(A ^ B ^ C)/sup(A) = 2/6 ≈ 0.33 = 33.33%
B → A ^ C    2         sup(B ^ A ^ C)/sup(B) = 2/7 ≈ 0.29 = 28.57%

As the given threshold or minimum confidence is 50%, the first three rules
(A ^ B → C, B ^ C → A, and A ^ C → B) can be considered strong association
rules for the given problem.
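The table above can also be reproduced programmatically; this is a minimal sketch, with the support counts read off the worked example (sup(A) = 6, sup(B) = 7, sup(C) = 5, each pair 4, and {A, B, C} = 2):

```python
from itertools import combinations

# Support counts read off the worked A/B/C example above.
support = {
    frozenset("A"): 6, frozenset("B"): 7, frozenset("C"): 5,
    frozenset("AB"): 4, frozenset("BC"): 4, frozenset("AC"): 4,
    frozenset("ABC"): 2,
}

def rules_from(itemset, min_conf):
    """Each non-empty proper subset X yields a candidate rule
    X -> itemset - X; keep those with sup(itemset)/sup(X) >= min_conf."""
    itemset = frozenset(itemset)
    kept = []
    for r in range(1, len(itemset)):
        for x in map(frozenset, combinations(sorted(itemset), r)):
            conf = support[itemset] / support[x]
            if conf >= min_conf:
                kept.append((x, itemset - x, conf))
    return kept

strong = rules_from("ABC", min_conf=0.5)
# Exactly the three rules with a two-item antecedent reach 50%.
```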

Implementation Guidelines:
Input of the algorithm: (Transactions considered)
1. A database D.
2. A support threshold min_sup.
3. A confidence threshold min_conf.

Output of the algorithm: (Frequent Patterns.)


1. The set of frequent itemsets in D.
2. The set of valid association rules in D.
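Putting the pieces together, a minimal end-to-end sketch matching this input/output specification might look as follows (for brevity it counts supports by brute force over all item combinations rather than Apriori's level-wise search; all names are mine):

```python
from itertools import combinations

def mine(D, min_sup, min_conf):
    """Return (frequent itemsets with counts, valid association rules)."""
    D = [set(t) for t in D]
    items = sorted({i for t in D for i in t})
    # Frequent itemsets: brute-force support counting for brevity.
    sup = {}
    for k in range(1, len(items) + 1):
        for c in map(frozenset, combinations(items, k)):
            n = sum(c <= t for t in D)
            if n >= min_sup:
                sup[c] = n
    # Valid rules: X -> S - X with confidence sup(S)/sup(X) >= min_conf.
    rules = []
    for s, n in sup.items():
        for r in range(1, len(s)):
            for x in map(frozenset, combinations(sorted(s), r)):
                if n / sup[x] >= min_conf:
                    rules.append((x, s - x))
    return sup, rules

sup, rules = mine([[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]],
                  min_sup=2, min_conf=0.8)
```

Varying min_sup and min_conf here directly shows how the thresholds change the output, as asked in the problem statement.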

Platform: Windows
Conclusion: Thus, we have learned to generate frequent patterns and
association rules using the Apriori algorithm.

FAQ’s:
1) What is association rule mining?
Association rule mining finds interesting associations and relationships
among large sets of data items. An association rule shows how frequently an
itemset occurs in a transaction. A typical example is Market Basket
Analysis.
Market Basket Analysis is one of the key techniques used by large retailers
to show associations between items. It allows retailers to identify
relationships between the items that people frequently buy together.

2) What is support and confidence?


The terms support and confidence are used in implementing Market Basket
Analysis. They help in identifying joint purchasing and associations
between products.
Support represents the popularity of a product across all product
transactions. The support of a product is calculated as the ratio of the
number of transactions that include that product to the total number of
transactions:
Support(product) = (Number of transactions that include the product) /
(Total number of transactions)
Confidence can be interpreted as the likelihood of purchasing both products
A and B. It is calculated as the number of transactions that include both A
and B divided by the number of transactions that include product A:
Confidence(A => B) = (Number of transactions that include both A and B) /
(Number of transactions that include A)
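These two formulas can be written directly in Python; the basket data below is purely hypothetical, chosen only to illustrate the definitions:

```python
# Hypothetical basket data for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(product):
    """(Transactions that include the product) / (total transactions)."""
    return sum(product in b for b in baskets) / len(baskets)

def confidence(a, b):
    """(Transactions with both A and B) / (transactions with A)."""
    with_a = [t for t in baskets if a in t]
    return sum(b in t for t in with_a) / len(with_a)
```

For example, bread appears in 3 of the 4 baskets, so support("bread") is 0.75, and 2 of those 3 baskets also contain milk, so confidence("bread", "milk") is 2/3.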

3) What are different algorithms available for association rule mining?


Association rule mining algorithms are commonly systematized by how they
traverse the itemset lattice (BFS or DFS) and how they determine supports
(counting occurrences or TID-list intersections):
1) BFS and counting occurrences (e.g., Apriori)
2) BFS and TID-list intersections (e.g., Partition)
3) DFS and counting occurrences (e.g., FP-growth)
4) DFS and TID-list intersections (e.g., Eclat)
