Apriori Algorithm
Introduction
In data mining, Apriori is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or records of the pages visited on a website).
Other algorithms are designed for finding association rules in data having no transactions (Winepi and Minepi), or having no timestamps (DNA sequencing).
Overview
The point of the algorithm (and of data mining in general) is to extract useful information from large amounts of data. For example, the information that a customer who purchases a keyboard also tends to buy a mouse at the same time is captured by the association rule below:
Keyboard -> Mouse [support = 6%, confidence = 70%]
Support: The percentage of task-relevant data transactions for which the pattern is true.
Support (Keyboard -> Mouse) = (number of transactions containing both Keyboard and Mouse) / (total number of transactions)
Confidence: The measure of certainty or trustworthiness associated with each discovered pattern.
Confidence (Keyboard -> Mouse) = (number of transactions containing both Keyboard and Mouse) / (number of transactions containing Keyboard)
The algorithm aims to find the rules which satisfy both a minimum support threshold and a minimum confidence threshold (Strong Rules).
Item: an article in the basket.
Itemset: a group of items purchased together in a single transaction.
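The two measures can be illustrated with a small sketch. The class name RuleMetrics, its method names, and the four baskets are all invented for this illustration and are not part of the article's source code; the sketch also assumes the antecedent occurs in at least one transaction.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RuleMetrics
{
    // Fraction of transactions that contain every item of the itemset.
    public static double Support(List<HashSet<string>> transactions, params string[] itemset)
    {
        int hits = transactions.Count(t => itemset.All(item => t.Contains(item)));
        return (double)hits / transactions.Count;
    }

    // Confidence(X -> Y) = Support(X and Y together) / Support(X).
    // Assumes the antecedent appears in at least one transaction.
    public static double Confidence(List<HashSet<string>> transactions,
                                    string[] antecedent, string[] consequent)
    {
        double both = Support(transactions, antecedent.Concat(consequent).ToArray());
        return both / Support(transactions, antecedent);
    }

    public static void Main()
    {
        // Hypothetical baskets, made up for illustration only.
        var transactions = new List<HashSet<string>>
        {
            new HashSet<string> { "Keyboard", "Mouse" },
            new HashSet<string> { "Keyboard", "Mouse", "Monitor" },
            new HashSet<string> { "Keyboard" },
            new HashSet<string> { "Monitor" },
        };

        Console.WriteLine(Support(transactions, "Keyboard", "Mouse"));
        Console.WriteLine(Confidence(transactions,
            new[] { "Keyboard" }, new[] { "Mouse" }));
    }
}
```

In these made-up baskets, two of the four transactions contain both items, so the support is 2/4, and two of the three transactions containing a keyboard also contain a mouse, so the confidence is 2/3.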
// Builds the candidate rules from every frequent itemset.
// m_dicAllFrequentItems and AddItem are members of the containing class (not shown here).
private List<clssRules> GenerateRules()
{
    List<clssRules> lstRulesReturn = new List<clssRules>();
    foreach (string strItem in m_dicAllFrequentItems.Keys)
    {
        // Each character of the key is used as one item, so only keys
        // with more than one character can be split into a rule.
        if (strItem.Length > 1)
        {
            int nMaxCombinationLength = strItem.Length / 2;
            GenerateCombination(strItem, nMaxCombinationLength, ref lstRulesReturn);
        }
    }
    return lstRulesReturn;
}

// Passes combinations of the items in strItem to AddItem.
private void GenerateCombination(string strItem, int nCombinationLength, ref List<clssRules> lstRulesReturn)
{
    int nItemLength = strItem.Length;
    if (nItemLength == 2)
    {
        AddItem(strItem[0].ToString(), strItem, ref lstRulesReturn);
        return;
    }
    else if (nItemLength == 3)
    {
        for (int i = 0; i < nItemLength; i++)
        {
            AddItem(strItem[i].ToString(), strItem, ref lstRulesReturn);
        }
        return;
    }
    else
    {
        for (int i = 0; i < nItemLength; i++)
        {
            GetCombinationRecursive(strItem[i].ToString(), strItem, nCombinationLength, ref lstRulesReturn);
        }
    }
}

// Adds strCombination, then extends or shifts it by the next character of strItem
// until the last character of strItem has been reached.
private string GetCombinationRecursive(string strCombination, string strItem, int nCombinationLength, ref List<clssRules> lstRulesReturn)
{
    AddItem(strCombination, strItem, ref lstRulesReturn);
    char cLastTokenCharacter = strCombination[strCombination.Length - 1];
    int nLastTokenCharcaterIndex = strCombination.IndexOf(cLastTokenCharacter);
    int nLastTokenCharcaterIndexInParent = strItem.IndexOf(cLastTokenCharacter);
    char cNextCharacter;
    char cLastItemCharacter = strItem[strItem.Length - 1];
    if (strCombination.Length == nCombinationLength)
    {
        // Maximum length reached: replace the last character with its successor in strItem.
        if (cLastTokenCharacter != cLastItemCharacter)
        {
            strCombination = strCombination.Remove(nLastTokenCharcaterIndex, 1);
            cNextCharacter = strItem[nLastTokenCharcaterIndexInParent + 1];
            string strNewToken = strCombination + cNextCharacter;
            return (GetCombinationRecursive(strNewToken, strItem, nCombinationLength, ref lstRulesReturn));
        }
        else
        {
            return string.Empty;
        }
    }
    else
    {
        // Still below the maximum length: append the next character of strItem.
        if (strCombination != cLastItemCharacter.ToString())
        {
            cNextCharacter = strItem[nLastTokenCharcaterIndexInParent + 1];
            string strNewToken = strCombination + cNextCharacter;
            return (GetCombinationRecursive(strNewToken, strItem, nCombinationLength, ref lstRulesReturn));
        }
        else
        {
            return string.Empty;
        }
    }
}
Example
A database has five transactions. Let min_sup = 50% and min_conf = 80%.
Solution
Step 1: Find all Frequent Itemsets
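As a rough sketch of how this step can be carried out level by level: count the candidates of size k, keep the frequent ones, join them into candidates of size k + 1, and prune any candidate that has an infrequent subset. Everything here is an assumption made for illustration: the class AprioriSketch, its method names, and the representation of each transaction or itemset as a string of single-character items. The four transactions in Main are a hypothetical toy database, not the five-transaction database of this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AprioriSketch
{
    // Number of transactions that contain every item of the candidate.
    static int CountSupport(List<string> transactions, string candidate)
    {
        return transactions.Count(t => candidate.All(c => t.IndexOf(c) >= 0));
    }

    // Returns every frequent itemset (as a sorted string) with its support count.
    public static Dictionary<string, int> FindFrequentItemsets(List<string> transactions, double minSupport)
    {
        var frequent = new Dictionary<string, int>();

        // Level 1 candidates: the distinct single items, in sorted order.
        List<string> candidates = transactions
            .SelectMany(t => t.ToCharArray())
            .Distinct()
            .OrderBy(c => c)
            .Select(c => c.ToString())
            .ToList();

        while (candidates.Count > 0)
        {
            var level = new List<string>();
            foreach (string candidate in candidates)
            {
                int count = CountSupport(transactions, candidate);
                if (count >= minSupport * transactions.Count)
                {
                    frequent[candidate] = count;
                    level.Add(candidate);
                }
            }
            candidates = GenerateCandidates(level, frequent);
        }
        return frequent;
    }

    // Join step: merge two frequent k-itemsets that share their first k-1 items.
    // Prune step: drop a candidate if any of its k-item subsets is not frequent.
    static List<string> GenerateCandidates(List<string> level, Dictionary<string, int> frequent)
    {
        var next = new List<string>();
        for (int i = 0; i < level.Count; i++)
        {
            for (int j = i + 1; j < level.Count; j++)
            {
                string a = level[i], b = level[j];
                int k = a.Length;
                if (a.Substring(0, k - 1) != b.Substring(0, k - 1)) continue;

                string candidate = a + b[k - 1];
                bool allSubsetsFrequent = Enumerable.Range(0, candidate.Length)
                    .All(skip => frequent.ContainsKey(candidate.Remove(skip, 1)));
                if (allSubsetsFrequent) next.Add(candidate);
            }
        }
        return next;
    }

    public static void Main()
    {
        // Hypothetical toy database; each character is one item.
        var transactions = new List<string> { "ACD", "BCE", "ABCE", "BE" };
        foreach (var pair in FindFrequentItemsets(transactions, 0.5))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

With this toy input and a 50% threshold, an itemset needs at least two supporting transactions, so D (one transaction) is discarded at the first level and no candidate containing D is ever generated. That early elimination is the saving the prune step is meant to provide.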
Lattice
Closed Itemset: a frequent itemset none of whose parents (immediate supersets) has the same support as the itemset itself.
Maximal Itemset: a frequent itemset all of whose parents (immediate supersets) are infrequent.
Itemset {C} is closed because the supports of its parents (supersets) {A C}:2, {B C}:2, {C D}:1 and {C E}:2 all differ from the support of {C}:3. The same holds for {A C}, {B E} and {B C E}.
Itemset {A C} is maximal because all of its parents (supersets) {A B C}, {A C D} and {A C E} are infrequent. The same holds for {B C E}.
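The two definitions can be checked mechanically once the frequent itemsets and their support counts are known. The sketch below is only an illustration: the class LatticeSketch, its method names, the string-of-characters itemset representation, and the hard-coded table of support counts are assumptions. The counts for C, AC, BC and CE are the ones quoted above; the remaining counts are hypothetical values chosen to be consistent with them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LatticeSketch
{
    // True when 'parent' contains every item of 'itemset' plus exactly one more.
    static bool IsParent(string parent, string itemset)
    {
        return parent.Length == itemset.Length + 1
            && itemset.All(c => parent.IndexOf(c) >= 0);
    }

    // Closed: no parent has the same support. Infrequent parents have a
    // strictly lower support, so only the frequent ones need checking.
    public static bool IsClosed(Dictionary<string, int> frequent, string itemset)
    {
        return !frequent.Any(p => IsParent(p.Key, itemset) && p.Value == frequent[itemset]);
    }

    // Maximal: no parent is frequent.
    public static bool IsMaximal(Dictionary<string, int> frequent, string itemset)
    {
        return !frequent.Keys.Any(p => IsParent(p, itemset));
    }

    public static void Main()
    {
        // Frequent itemsets with support counts (partly hypothetical, see above).
        var frequent = new Dictionary<string, int>
        {
            { "A", 2 }, { "B", 3 }, { "C", 3 }, { "E", 3 },
            { "AC", 2 }, { "BC", 2 }, { "BE", 3 }, { "CE", 2 },
            { "BCE", 2 },
        };

        foreach (string itemset in frequent.Keys)
        {
            Console.WriteLine(itemset
                + " closed=" + IsClosed(frequent, itemset)
                + " maximal=" + IsMaximal(frequent, itemset));
        }
    }
}
```

Under these assumed counts the sketch reports C, AC, BE and BCE as closed, and AC and BCE as maximal, which agrees with the lattice discussion above. Every maximal itemset is also closed, because it has no frequent parent that could share its support.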
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)