Assignment 3
Aim: Association Rule Mining Using Apriori Algorithm
Objectives:
To generate frequent patterns and association rules using the Apriori algorithm.
Problem Statement:
1. Consider transactions given in array or CSV form.
2. Implement the Apriori algorithm using a library function in Python for the
following dataset and generate rules for different minimum support and
minimum confidence thresholds (a library-based sketch is given after this list).
3. Write a Python function to generate candidates for the Apriori algorithm for
the following dataset:
[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]
4. Write a Python function to generate frequent patterns using the Apriori
algorithm.
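The following is a minimal sketch of item 2, assuming the mlxtend and pandas libraries are available (they are not prescribed by this assignment; any equivalent library may be used):

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Transactions from item 3, written as lists of item labels.
transactions = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Frequent itemsets for a chosen minimum support (here 50% of transactions).
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
print(frequent_itemsets)

# Association rules for a chosen minimum confidence threshold.
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])

Different values of min_support and min_threshold can be passed to observe how the number of rules changes.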
Theory:
Explain:
• Write the Apriori Algorithm
The Apriori algorithm is used for mining frequent itemsets and deriving
association rules from a transactional database. The parameters “support”
and “confidence” are used. Support refers to an itemset’s frequency of
occurrence; confidence is a conditional probability.
The following are the main steps of the algorithm:
1. Calculate the support of item sets (of size k = 1) in the transactional
database (note that support is the frequency of occurrence of an itemset).
This is called generating the candidate set.
2. Prune the candidate set by eliminating items with a support less than the
given threshold.
3. Join the frequent itemsets to form candidate sets of size k + 1, and repeat
the above steps until no more frequent itemsets can be formed, i.e., until
every newly formed set has a support below the given threshold (a sketch of
this loop is given after these steps).
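A minimal sketch of this candidate-generate / prune / join loop in plain Python is given below; the function names (generate_candidates, apriori_frequent) and the use of frozensets are illustrative choices, not part of the algorithm's definition. It can also serve as a starting point for items 3 and 4 of the problem statement.

from itertools import combinations

def support_count(itemset, transactions):
    # Number of transactions that contain every item of the itemset.
    return sum(1 for t in transactions if itemset.issubset(t))

def generate_candidates(prev_frequent, k):
    # Join (k-1)-frequent itemsets to form candidate k-itemsets, pruning any
    # candidate that has an infrequent (k-1)-subset.
    candidates = set()
    prev = list(prev_frequent)
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]
            if len(union) == k and all(
                frozenset(s) in prev_frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates

def apriori_frequent(transactions, min_sup):
    # Return all frequent itemsets together with their support counts.
    transactions = [frozenset(t) for t in transactions]
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    current = {c for c in items if support_count(c, transactions) >= min_sup}
    k = 2
    while current:
        frequent.update({c: support_count(c, transactions) for c in current})
        current = {c for c in generate_candidates(current, k)
                   if support_count(c, transactions) >= min_sup}
        k += 1
    return frequent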
Let’s go over an example to see the algorithm in action. Suppose that the
given support is 3 and the required confidence is 80%.
The transactional database (table not reproduced here).
Now let’s create the association rules. This is where the given confidence is
required. For a rule X -> Y, the confidence is calculated as
Support(X and Y) / Support(X).
The following rules can be obtained from the frequent itemsets of size two
(2-frequent itemsets):
1. I2 -> I3, Confidence = 3/3 = 100%
2. I3 -> I2, Confidence = 3/4 = 75%
3. I3 -> I4, Confidence = 3/4 = 75%
4. I4 -> I3, Confidence = 3/3 = 100%
Since our required confidence is 80%, only rules 1 and 4 are included in the
result. Therefore, it can be concluded that customers who bought item two
(I2) always bought item three (I3) with it, and customers who bought item
four (I4) always bought item 3 (I3) with it.
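The confidence arithmetic above can be checked with a short snippet that uses only the support counts quoted in rules 1 to 4 (the full transaction table is not reproduced in this manual):

supports = {
    frozenset(["I2"]): 3,
    frozenset(["I3"]): 4,
    frozenset(["I4"]): 3,
    frozenset(["I2", "I3"]): 3,
    frozenset(["I3", "I4"]): 3,
}

min_conf = 0.8
rules = [("I2", "I3"), ("I3", "I2"), ("I3", "I4"), ("I4", "I3")]
for x, y in rules:
    conf = supports[frozenset([x, y])] / supports[frozenset([x])]
    verdict = "kept" if conf >= min_conf else "discarded"
    print(f"{x} -> {y}: confidence = {conf:.0%} ({verdict})")

Only the two rules with 100% confidence survive the 80% threshold, matching the conclusion above.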
• Solve the example considered, using the Apriori algorithm.
Step-1: Calculating C1 and L1:
o In the first step, we will create a table that contains the support count (the
frequency of each itemset individually in the dataset) of each itemset in
the given dataset. This table is called the candidate set, or C1.
o Now, we will take out all the itemsets that have a support count greater
than or equal to the Minimum Support (2). This will give us the table for the
frequent itemset L1.
Since all the itemsets except E have a support count greater than or equal to
the minimum support, the itemset E will be removed (a short sketch of this
step is given below).
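Below is a minimal sketch of Step-1 over a hypothetical transaction list on items A to E (the original example table is not reproduced here); any item with a support count below the minimum support of 2, such as E, is pruned:

from collections import Counter

transactions = [["A", "B"], ["A", "B", "C"], ["A", "C"], ["B", "C", "E"]]  # hypothetical data
min_sup = 2

# C1: support count of every individual item.
c1 = Counter(item for t in transactions for item in t)
print("C1:", dict(c1))

# L1: items whose support count is at least the minimum support.
l1 = {item: count for item, count in c1.items() if count >= min_sup}
print("L1:", l1)  # E (support 1) is pruned, matching the discussion above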
o Now we will create the L3 table. As we can see from the C3 table above,
there is only one itemset combination whose support count is equal to the
minimum support count. So, L3 will have only one combination, i.e.,
{A, B, C}.
Step-4: Finding the association rules for the subsets:
To generate the association rules, first, we will create a new table with the
possible rules from the combination {A, B, C}. For all the rules, we will
calculate the confidence using the formula sup(A ^ B) / sup(A). After
calculating the confidence value for all rules, we will exclude the rules that
have a confidence lower than the minimum threshold (50%). A sketch of this
step is shown after the table below.
Consider the below table of rules with their support and confidence values
(table not reproduced here).
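A minimal sketch of Step-4 is shown below; it enumerates the candidate rules X -> ({A, B, C} - X) and filters them by confidence = sup(A ^ B ^ C) / sup(X). The support counts used here are illustrative placeholders, not the values from the original worked table:

from itertools import combinations

def rules_from_itemset(itemset, supports, min_conf):
    # Yield (antecedent, consequent, confidence) for rules meeting min_conf.
    itemset = frozenset(itemset)
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            conf = supports[itemset] / supports[antecedent]
            if conf >= min_conf:
                yield antecedent, itemset - antecedent, conf

# Illustrative (hypothetical) support counts for {A, B, C} and its subsets.
supports = {
    frozenset("A"): 6, frozenset("B"): 7, frozenset("C"): 5,
    frozenset("AB"): 4, frozenset("AC"): 4, frozenset("BC"): 4,
    frozenset("ABC"): 2,
}
for x, y, conf in rules_from_itemset("ABC", supports, min_conf=0.5):
    print(set(x), "->", set(y), f"confidence = {conf:.0%}")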
Implementation Guidelines:
Input of the algorithm: (Transactions considered)
1. A database D.
2. A support threshold min_sup.
3. A confidence threshold min_conf.
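A possible top-level driver that ties these three inputs together is sketched below; it reuses the illustrative apriori_frequent and rules_from_itemset functions defined in the Theory section, so those definitions must be in scope:

def run_apriori(D, min_sup, min_conf):
    # D: list of transactions; min_sup: support-count threshold;
    # min_conf: confidence threshold in [0, 1].
    frequent = apriori_frequent(D, min_sup)  # frequent itemsets with counts
    rules = []
    for itemset in (s for s in frequent if len(s) >= 2):
        rules.extend(rules_from_itemset(itemset, frequent, min_conf))
    return frequent, rules

# Example call on the dataset from the Problem Statement.
frequent, rules = run_apriori([[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]],
                              min_sup=2, min_conf=0.8)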
Platform: Windows
Conclusion: Thus, we have learned to generate frequent patterns and association
rules using the Apriori algorithm.
FAQ’s:
1) What is association rule mining?
Association rule mining finds interesting associations and relationships among
large sets of data items. These rules show how frequently an itemset occurs in
a transaction. A typical example is Market Basket Analysis.
Market Basket Analysis is one of the key techniques used by large retailers to
show associations between items. It allows retailers to identify relationships
between the items that people buy together frequently.