III Unit-DM
INTRODUCTION:
1. Frequent item sets are a fundamental concept in association rule mining, which
is a technique used in data mining to discover relationships between items in a
dataset. Frequent item sets are not the same thing as association rules: the item
sets are the combinations of items that occur together often, and the rules are
the implications derived from them. The goal of association rule mining is to
identify items in a dataset that frequently occur together.
2. A frequent item set is a set of items that occur together frequently in a dataset.
The frequency of an item set is measured by the support count, which is the
number of transactions or records in the dataset that contain the item set. For
example, if a dataset contains 100 transactions and the item set {milk, bread}
appears in 20 of those transactions, the support count for {milk, bread} is 20.
3. Association rule mining algorithms, such as Apriori and FP-Growth, are used to
find frequent item sets and generate association rules. Apriori works by
iteratively generating candidate item sets and pruning those that do not meet
the minimum support threshold. FP-Growth avoids explicit candidate generation
by compressing the transactions into a prefix tree (the FP-tree) and mining
that tree. Once the frequent item sets are found, association rules are
generated using the concept of confidence, which is the ratio of the number of
transactions that contain the whole item set (antecedent and consequent
together) to the number of transactions that contain the antecedent (left-hand
side) of the rule.
4. Frequent item sets and association rules can be used for a variety of tasks such
as market basket analysis, cross-selling and recommendation systems.
However, it should be noted that association rule mining can generate a large
number of rules, many of which may be irrelevant or uninteresting. Therefore,
it is important to use appropriate measures such as lift and conviction to
evaluate the interestingness of the generated rules.
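The level-wise search described in point 3 can be sketched as follows. This is a minimal illustration of the Apriori idea in Python, not a tuned implementation; the function names `apriori` and `generate_rules` and the five sample baskets are assumptions made for this example.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {item set: support count} for every frequent item set."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support count = number of transactions containing the candidate.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = {c: n for c, n in count(singles).items() if n >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c: n for c, n in count(candidates).items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent

def generate_rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) for rules above min_conf."""
    rules = []
    for itemset, n in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                # confidence = count(whole item set) / count(antecedent)
                conf = n / frequent[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules

# Hypothetical baskets, used only to exercise the sketch.
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
frequent = apriori(baskets, min_count=2)
found = generate_rules(frequent, min_conf=0.6)
for antecedent, consequent, conf in found:
    print(sorted(antecedent), "=>", sorted(consequent), round(conf, 2))
```

The lookup `frequent[antecedent]` is safe because every subset of a frequent item set is itself frequent, so it was recorded at an earlier level.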
Association mining searches for frequent item sets in a dataset. Frequent pattern
mining typically uncovers interesting associations and correlations between item
sets in transactional and relational databases. In short, it shows which items
appear together in a transaction or relation.
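As a small illustration of "which items appear together", the sketch below counts how often each pair of items shares a transaction. It is written in Python, and the baskets and the helper name `support_count` are hypothetical, chosen only for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one basket.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

# Count every pair of items that occurs in the same transaction.
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

for pair, n in pair_counts.most_common():
    print(pair, n)
```

Pairs are stored as sorted tuples so that ("bread", "milk") and ("milk", "bread") are counted as the same pair.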
Need for Association Mining: Association rules are generated from a transactional
dataset. If two items X and Y are frequently purchased together, it makes sense
to place them together in the store or to offer a discount on one item with the
purchase of the other, which can increase sales. For example, we may find that a
customer who buys milk and bread is also likely to buy butter. The association
rule is then [‘milk’]^[‘bread’]=>[‘butter’], so the seller can suggest butter to
a customer who buys milk and bread.
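How strong a rule such as milk and bread => butter is can be estimated from the data. The sketch below, in Python, computes support, confidence, lift and conviction using their standard definitions; the function name `rule_measures` and the sample baskets are assumptions for illustration, and the antecedent is assumed to occur in at least one transaction.

```python
import math

def rule_measures(antecedent, consequent, transactions):
    """Support, confidence, lift and conviction of antecedent => consequent."""
    n = len(transactions)
    x, y = set(antecedent), set(consequent)
    supp_x = sum(1 for t in transactions if x <= t) / n
    supp_y = sum(1 for t in transactions if y <= t) / n
    supp_xy = sum(1 for t in transactions if (x | y) <= t) / n

    # Assumes supp_x > 0, i.e. the antecedent occurs at least once.
    confidence = supp_xy / supp_x
    # Lift > 1 means X and Y occur together more often than if independent.
    lift = confidence / supp_y
    # Conviction is infinite for a rule that never fails.
    if confidence == 1:
        conviction = math.inf
    else:
        conviction = (1 - supp_y) / (1 - confidence)
    return {"support": supp_xy, "confidence": confidence,
            "lift": lift, "conviction": conviction}

# Hypothetical baskets, used only to exercise the sketch.
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
measures = rule_measures({"milk", "bread"}, {"butter"}, baskets)
for name, value in measures.items():
    print(name, round(value, 3))
```

A lift close to 1 suggests the rule adds little beyond the base popularity of the consequent, which is one way to filter out uninteresting rules.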
Important Definitions :
transactions.
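Let's say the minimum support count is 3.
The relation that holds is: maximal frequent => closed => frequent. That is, every
maximal frequent item set is closed, and every closed frequent item set is frequent.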
1-frequent:
{A} = 3 // not closed due to {A, C}; not maximal
{B} = 4 // not closed due to {B, D}; not maximal
{C} = 4 // not closed due to {C, D}; not maximal
{D} = 5 // closed, since no immediate superset has the same count; not maximal
2-frequent:
{A, B} = 2 // not frequent because support count < minimum support count, so ignore
{A, C} = 3 // not closed due to {A, C, D}
{A, D} = 3 // not closed due to {A, C, D}
{B, C} = 3 // not closed due to {B, C, D}
{B, D} = 4 // closed but not maximal due to {B, C, D}
{C, D} = 4 // closed but not maximal due to {B, C, D}
3-frequent:
{A, B, C} = 2 // not frequent because support count < minimum support count, so ignore
{A, B, D} = 2 // not frequent because support count < minimum support count, so ignore
{A, C, D} = 3 // maximal frequent
{B, C, D} = 3 // maximal frequent
4-frequent:
{A, B, C, D} = 2 // not frequent, so ignore
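The classification above can be checked mechanically. The Python sketch below uses a hypothetical five-transaction table, because the transaction table for this example is not reproduced in these notes; the table was chosen so that it yields the support counts listed above. An item set is treated as closed when no immediate superset has the same count, and as maximal when no immediate superset is frequent.

```python
from itertools import combinations

# ASSUMPTION: hypothetical transactions chosen to reproduce the listed counts.
transactions = [
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"A", "B", "C", "D"},
    {"B", "D"},
    {"A", "B", "C", "D"},
]
MIN_COUNT = 3

items = sorted(set().union(*transactions))

# Support count of every non-empty item set (brute force; fine for 4 items).
counts = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = frozenset(combo)
        counts[s] = sum(1 for t in transactions if s <= t)

frequent = {s: n for s, n in counts.items() if n >= MIN_COUNT}

def immediate_supersets(s):
    """All item sets obtained by adding exactly one item to s."""
    return [s | {i} for i in items if i not in s]

# Closed: no immediate superset has the same support count.
closed = {s for s in frequent
          if all(counts[sup] < counts[s] for sup in immediate_supersets(s))}
# Maximal: no immediate superset is frequent.
maximal = {s for s in frequent
           if all(sup not in frequent for sup in immediate_supersets(s))}

for s in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    tags = [name for name, group in (("closed", closed), ("maximal", maximal))
            if s in group]
    print(sorted(s), frequent[s], ", ".join(tags) or "frequent only")
```

Checking only immediate supersets is enough here because adding items can never increase a support count.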
ADVANTAGES AND DISADVANTAGES:
Advantages of using frequent item sets and association rule mining include:
Disadvantages of using frequent item sets and association rule mining include:
1. Large number of generated rules: Association rule mining can generate a large
number of rules, many of which may be irrelevant or uninteresting, which can
make it difficult to identify the most important patterns.
2. Limited in detecting complex relationships: Association rule mining is limited
in its ability to detect complex relationships between items, and it only
considers the co-occurrence of items in the same transaction.
3. Can be computationally expensive: As the number of items and transactions
increases, the number of candidate item sets also increases, which can make the
algorithm computationally expensive.
4. Need to define the minimum support and confidence thresholds: Both thresholds
must be set before the association rule mining process starts, which can be
difficult and requires a good understanding of the data.
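To put a number on point 3: with n distinct items there are 2^n - 1 possible non-empty item sets, so the search space doubles with every additional item. The short Python sketch below tabulates this growth; the function names are hypothetical and the item counts are arbitrary examples.

```python
from math import comb

def candidates_per_size(n_items):
    """Number of possible k-item sets for each k from 1 to n_items."""
    return {k: comb(n_items, k) for k in range(1, n_items + 1)}

def total_item_sets(n_items):
    """Total number of non-empty item sets; equals 2**n_items - 1."""
    return sum(candidates_per_size(n_items).values())

for n in (4, 10, 20, 30):
    print(n, "items ->", total_item_sets(n), "possible item sets")
```

Pruning by minimum support is what keeps algorithms such as Apriori from having to count all of these item sets.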
https://www.youtube.com/watch?v=M3wyG3HKuNg&t=552s
https://www.youtube.com/watch?v=Dy9urawfXos&t=47s
https://www.youtube.com/watch?v=wmzpgKeI8QI