III Unit-DM

Association Rule



Association rule mining finds interesting associations and relationships among large sets of data items. A rule of this kind shows how frequently an itemset occurs in a transaction. A typical example is Market Basket Analysis, one of the key techniques used by large retailers to discover associations between items. It allows retailers to identify relationships between the items that people frequently buy together. Given a set of transactions, we can find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.
TID | Items
----|---------------------------
1   | Bread, Milk
2   | Bread, Diaper, Beer, Eggs
3   | Milk, Diaper, Beer, Coke
4   | Bread, Milk, Diaper, Beer
5   | Bread, Milk, Diaper, Coke


Before we start defining the rule, let us first see the basic definitions.
Support Count (σ) – Frequency of occurrence of an itemset.
Here σ({Milk, Bread, Diaper}) = 2
Frequent Itemset – An itemset whose support is greater than or equal to the minsup threshold.
Association Rule – An implication expression of the form X -> Y, where X and Y are any two itemsets.
Example: {Milk, Diaper} -> {Beer}
Rule Evaluation Metrics –
 Support(s) – The number of transactions that include all items in both the X and Y parts of the rule, as a percentage of the total number of transactions. It is a measure of how frequently the collection of items occurs together, as a fraction of all transactions.
 Support(X -> Y) = σ(X ∪ Y) / |T| – It is interpreted as the fraction of transactions that contain both X and Y.
 Confidence(c) – It is the ratio of the number of transactions that include all items in both X and Y to the number of transactions that include all items in X.
 Conf(X -> Y) = Supp(X ∪ Y) / Supp(X) – It measures how often the items in Y appear in transactions that also contain the items in X.
 Lift(l) – The lift of the rule X -> Y is the confidence of the rule divided by the expected confidence, assuming that the itemsets X and Y are independent of each other. The expected confidence is simply the support (frequency) of Y.
 Lift(X -> Y) = Conf(X -> Y) / Supp(Y) – A lift value near 1 indicates that X and Y appear together about as often as expected; a value greater than 1 means they appear together more often than expected, and a value less than 1 means less often than expected. Greater lift values indicate a stronger association.
Example – From the above table, {Milk, Diaper} -> {Beer}

s = σ({Milk, Diaper, Beer}) / |T|
  = 2/5
  = 0.4

c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper})
  = 2/3
  = 0.67

l = Supp({Milk, Diaper, Beer}) / (Supp({Milk, Diaper}) * Supp({Beer}))
  = 0.4 / (0.6 * 0.6)
  = 1.11
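The calculations above can be reproduced with a short script. This is a minimal sketch; the transaction list mirrors the TID table at the start of this section, and the `support` helper is introduced here for illustration.

```python
# Transactions from the TID table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"Milk", "Diaper"}, {"Beer"}
s = support(X | Y)               # 2/5  = 0.4
c = support(X | Y) / support(X)  # (2/5)/(3/5) ≈ 0.67
l = c / support(Y)               # 0.67/0.6 ≈ 1.11
```

The subset test `itemset <= t` does the work of the support count σ: it is true exactly for the transactions that contain the whole itemset.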
Association rules are very useful for analyzing such datasets. The data is collected using bar-code scanners in supermarkets. Such databases consist of a large number of transaction records, each listing all items bought by a customer in a single purchase. From these rules the manager can learn whether certain groups of items are consistently purchased together, and use this information for adjusting store layouts, cross-selling, and promotions.


Frequent Item set in Data set (Association Rule Mining)




INTRODUCTION:

1. Frequent item sets are a fundamental concept in association rule mining, a technique used in data mining to discover relationships between items in a dataset. The goal of association rule mining is to identify items that frequently occur together in a dataset and the relationships between them.
2. A frequent item set is a set of items that occur together frequently in a dataset.
The frequency of an item set is measured by the support count, which is the
number of transactions or records in the dataset that contain the item set. For
example, if a dataset contains 100 transactions and the item set {milk, bread}
appears in 20 of those transactions, the support count for {milk, bread} is 20.
3. Association rule mining algorithms, such as Apriori or FP-Growth, are used to find frequent item sets and generate association rules. These algorithms work by iteratively generating candidate item sets and pruning those that do not meet the minimum support threshold. Once the frequent item sets are found, association rules can be generated using the concept of confidence, which is the ratio of the number of transactions that contain the whole item set to the number of transactions that contain the antecedent (left-hand side) of the rule.
4. Frequent item sets and association rules can be used for a variety of tasks such
as market basket analysis, cross-selling and recommendation systems.
However, it should be noted that association rule mining can generate a large
number of rules, many of which may be irrelevant or uninteresting. Therefore,
it is important to use appropriate measures such as lift and conviction to
evaluate the interestingness of the generated rules.
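The candidate-generate-and-prune loop described in point 3 can be sketched as follows. This is a simplified Apriori, not an optimized implementation; the function name and the sample call are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return all frequent itemsets as a {frozenset: support_count} dict."""
    items = {i for t in transactions for i in t}
    candidates = [frozenset([i]) for i in items]  # start with 1-itemsets
    frequent = {}
    while candidates:
        # Count support for this level's candidates.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-candidates.
        keys = list(level)
        merged = {a | b for a, b in combinations(keys, 2)
                  if len(a | b) == len(a) + 1}
        # Prune step: keep only candidates whose every k-subset is frequent.
        candidates = [c for c in merged
                      if all(frozenset(s) in level
                             for s in combinations(c, len(c) - 1))]
    return frequent

# Using the TID table from earlier in this document:
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
frequent = apriori(transactions, min_count=3)
```

With a minimum support count of 3, this finds four frequent 1-itemsets and four frequent 2-itemsets, and prunes candidates such as {Bread, Milk, Diaper} whose count falls below the threshold.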
Association mining searches for frequent item sets in the data set. In frequent mining, interesting associations and correlations between item sets in transactional and relational databases are found. In short, frequent mining shows which items appear together in a transaction or relationship.
Need of Association Mining: Frequent mining is the generation of association rules from a transactional dataset. If two items X and Y are purchased together frequently, then it is good to place them together in stores, or to offer a discount on one item with the purchase of the other. This can really increase sales. For example, it is likely to find that if a customer buys milk and bread, he/she also buys butter. So the association rule is {milk, bread} => {butter}. The seller can then suggest butter to a customer who buys milk and bread.

Important Definitions :

 Support: It is one of the measures of interestingness. It tells about the usefulness and certainty of rules. 5% support means that 5% of all transactions in the database follow the rule.
Support(A -> B) = Support_count(A ∪ B) / (total number of transactions)
 Confidence: A confidence of 60% means that 60% of the customers who purchased milk and bread also bought butter.
Confidence(A -> B) = Support_count(A ∪ B) / Support_count(A)
If a rule satisfies both minimum support and minimum confidence, it is a strong rule.
 Support_count(X): The number of transactions in which X appears. If X is A ∪ B, then it is the number of transactions in which both A and B are present.
 Maximal Itemset: An itemset is maximal frequent if it is frequent and none of its supersets are frequent.
 Closed Itemset: An itemset is closed if none of its immediate supersets has the same support count as the itemset itself.
 K-Itemset: An itemset that contains K items is a K-itemset. An itemset is frequent if its support count is greater than or equal to the minimum support count.
Example on finding frequent itemsets – Consider the given dataset with the given transactions.
 Let's say the minimum support count is 3.
 The relation that holds is: maximal frequent => closed => frequent.
1-frequent:
{A} = 3 // not closed due to {A, C}; not maximal
{B} = 4 // not closed due to {B, D}; not maximal
{C} = 4 // not closed due to {C, D}; not maximal
{D} = 5 // closed, since no immediate superset has the same count; not maximal
2-frequent:
{A, B} = 2 // not frequent because support count < minimum support count, so ignore
{A, C} = 3 // not closed due to {A, C, D}
{A, D} = 3 // not closed due to {A, C, D}
{B, C} = 3 // not closed due to {B, C, D}
{B, D} = 4 // closed, but not maximal due to {B, C, D}
{C, D} = 4 // closed, but not maximal due to {B, C, D}
3-frequent:
{A, B, C} = 2 // ignore, not frequent (support count < minimum support count)
{A, B, D} = 2 // ignore, not frequent (support count < minimum support count)
{A, C, D} = 3 // maximal frequent
{B, C, D} = 3 // maximal frequent
4-frequent:
{A, B, C, D} = 2 // ignore, not frequent
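The closed/maximal judgments in the walkthrough above can be checked mechanically. This sketch hard-codes the support counts listed in the example (infrequent sets such as {A, B} are omitted) rather than recomputing them from raw transactions:

```python
# Support counts of the frequent itemsets from the walkthrough above
# (minimum support count = 3).
support = {
    frozenset("A"): 3, frozenset("B"): 4,
    frozenset("C"): 4, frozenset("D"): 5,
    frozenset("AC"): 3, frozenset("AD"): 3, frozenset("BC"): 3,
    frozenset("BD"): 4, frozenset("CD"): 4,
    frozenset("ACD"): 3, frozenset("BCD"): 3,
}
frequent = set(support)

def is_maximal(itemset):
    # Maximal frequent: no proper superset is frequent.
    return not any(itemset < other for other in frequent)

def is_closed(itemset):
    # Closed: no frequent proper superset has the same support count.
    return not any(itemset < other and support[other] == support[itemset]
                   for other in frequent)

maximal = {s for s in frequent if is_maximal(s)}  # {A,C,D} and {B,C,D}
```

Running the checks confirms the walkthrough: only {A, C, D} and {B, C, D} are maximal, while {D}, {B, D}, {C, D} and the two maximal sets are closed.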
ADVANTAGES OR DISADVANTAGES:

Advantages of using frequent item sets and association rule mining include:

1. Efficient discovery of patterns: Association rule mining algorithms are efficient at discovering patterns in large datasets, making them useful for tasks such as market basket analysis and recommendation systems.
2. Easy to interpret: The results of association rule mining are easy to understand
and interpret, making it possible to explain the patterns found in the data.
3. Can be used in a wide range of applications: Association rule mining can be
used in a wide range of applications such as retail, finance, and healthcare,
which can help to improve decision-making and increase revenue.
4. Handling large datasets: These algorithms can handle large datasets with many
items and transactions, which makes them suitable for big-data scenarios.

Disadvantages of using frequent item sets and association rule mining include:

1. Large number of generated rules: Association rule mining can generate a large
number of rules, many of which may be irrelevant or uninteresting, which can
make it difficult to identify the most important patterns.
2. Limited in detecting complex relationships: Association rule mining is limited
in its ability to detect complex relationships between items, and it only
considers the co-occurrence of items in the same transaction.
3. Can be computationally expensive: As the number of items and transactions
increases, the number of candidate item sets also increases, which can make the
algorithm computationally expensive.
4. Need to define the minimum support and confidence threshold: The minimum
support and confidence threshold must be set before the association rule mining
process, which can be difficult and requires a good understanding of the data.

Multilevel Association Rule in data mining


Last Updated : 16 Dec, 2021



Multilevel Association Rule :
Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.
Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework.
Rules at a high concept level may add to common sense, while rules at a low concept level may not always be useful.
Using uniform minimum support for all levels :
 When a uniform minimum support threshold is used, the search procedure is simplified.
 The method is also simple, in that users are required to specify only a single minimum support threshold.
 The same minimum support threshold is used when mining at each level of abstraction (for example, for mining from "computer" down to "laptop computer"). Both "computer" and "laptop computer" may be found to be frequent, while a sibling item such as "desktop computer" is not.
Need of Multilevel Association Rules :
 Sometimes at a low data level, the data does not show any significant pattern, but there is useful information hidden behind it.
 The aim is to find the hidden information within and between levels of abstraction.
Approaches to multilevel association rule mining :
1. Uniform Support(Using uniform minimum support for all level)
2. Reduced Support (Using reduced minimum support at lower levels)
3. Group-based Support(Using item or group based support)
Let’s discuss one by one.
1. Uniform Support –
When a uniform minimum support threshold is used, the search method is simplified. The technique is also basic, in that users are required to specify only a single minimum support threshold. An optimization can be adopted, based on the knowledge that an ancestor is a superset of its descendants: the search avoids examining item sets containing any item whose ancestors do not have minimum support. The uniform support approach, however, has some difficulties. It is unlikely that items at lower levels of abstraction will occur as frequently as those at higher levels of abstraction. If the minimum support threshold is set too high, it could miss several meaningful associations occurring at low abstraction levels. This provides the motivation for the following approach.
2. Reduced Support –
For mining multiple-level associations with reduced support, there are several alternative search strategies, as follows.
 Level-by-level independent –
This is a full-breadth search, where no background knowledge of frequent item sets is used for pruning. Each node is examined, regardless of whether its parent node is found to be frequent.
 Level-cross filtering by single item –
An item at the i-th level is examined if and only if its parent node at the (i-1)-th level is frequent. In other words, we investigate a more specific association starting from a more general one. If a node is frequent, its children will be examined; otherwise, its descendants are pruned from the search.
 Level-cross filtering by k-itemset –
A k-itemset at the i-th level is examined if and only if its corresponding parent k-itemset at the (i-1)-th level is frequent.
3. Group-based support –
The group-wise threshold values for support and confidence are input by the user or an expert. A group is selected based on product price or item type, because experts often have insight as to which groups are more important than others.
Example –
For example, experts may be interested in the purchase patterns of laptops or clothes, in the electronic and non-electronic categories. Therefore a low support threshold is set for these groups, to give attention to these items' purchase patterns.
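The reduced-support scheme with level-cross filtering by single item can be sketched as follows. The concept hierarchy, transactions, and thresholds below are all illustrative assumptions, not values from the text:

```python
from collections import Counter

# Hypothetical concept hierarchy: leaf item -> higher-level category.
hierarchy = {
    "laptop": "computer", "desktop": "computer",
    "shirt": "clothing", "jeans": "clothing",
    "bread": "food",
}
transactions = [
    {"laptop", "shirt"},
    {"desktop", "shirt"},
    {"laptop", "jeans"},
    {"shirt", "bread"},
    {"laptop"},
]

# Reduced support: a lower threshold at the lower (leaf) level.
min_support = {"high": 3, "low": 2}

# Support counts at each level; a category is counted once per
# transaction even if several of its leaf items appear together.
low_counts = Counter(item for t in transactions for item in t)
high_counts = Counter()
for t in transactions:
    for category in {hierarchy[item] for item in t}:
        high_counts[category] += 1

frequent_high = {c for c, n in high_counts.items()
                 if n >= min_support["high"]}
# Level-cross filtering by single item: a leaf item is examined only
# if its parent category is frequent at the higher level.
frequent_low = {i for i, n in low_counts.items()
                if hierarchy[i] in frequent_high and n >= min_support["low"]}
```

Here "food" fails the high-level threshold, so "bread" is pruned without even checking its leaf-level count, while "laptop" and "shirt" survive both their parent's filter and the reduced leaf-level threshold.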


Mining multidimensional association rules from relational databases and data warehouses

https://www.youtube.com/watch?v=M3wyG3HKuNg&t=552s

From association mining to correlation analysis

https://www.youtube.com/watch?v=Dy9urawfXos&t=47s

Constraint Based Association Mining

https://www.youtube.com/watch?v=wmzpgKeI8QI
