Pattern Mining [1]
The Apriori algorithm is used to calculate association rules between objects, that is, how two or more objects are related to one another. In other words, the Apriori algorithm is an association rule learning method that analyzes, for example, whether people who bought product A also bought product B.
The primary objective of the Apriori algorithm is to create association rules between different objects. An association rule describes how two or more objects are related to one another. The Apriori algorithm is also referred to as frequent pattern mining.
Generally, you operate the Apriori algorithm on a database that consists of a huge number of transactions. As a running example, suppose you go to Big Bazar and buy different products. Mining such associations helps customers buy their products with ease and increases the sales performance of Big Bazar. In this tutorial, we will discuss the Apriori algorithm with examples.
The algorithm relies on three measures:
1. Support
2. Confidence
3. Lift
Let's take an example to understand this concept.
As discussed above, you need a huge database containing a large number of transactions. Suppose there are 4000 customer transactions in Big Bazar. You have to calculate the Support, Confidence, and Lift for two products, say Biscuits and Chocolates, because customers frequently buy these two items together.
Out of the 4000 transactions, 400 contain Biscuits and 600 contain Chocolates, and 200 transactions contain both Biscuits and Chocolates. Using this data, we will find the support, confidence, and lift.
Support
Support refers to the default popularity of any product. You find the support by dividing the number of transactions containing that product by the total number of transactions. Hence, we get
Support(Biscuits) = (Transactions containing Biscuits) / (Total transactions) = 400/4000 = 10 percent.
Confidence
Confidence refers to the likelihood that customers who bought Biscuits also bought Chocolates. To get the confidence, you divide the number of transactions that contain both Biscuits and Chocolates by the number of transactions that contain Biscuits.
Hence,
Confidence(Biscuits -> Chocolates) = 200/400 = 50 percent.
It means that 50 percent of customers who bought Biscuits bought Chocolates as well.
Lift
Continuing the example, lift refers to the increase in the ratio of the sale of Chocolates when you sell Biscuits. Here it is computed as
Lift = Confidence / Support = 50/10 = 5.
It means that people are five times more likely to buy Biscuits and Chocolates together than would be expected from the popularity of Biscuits alone. If the lift value is below one, the items are unlikely to be bought together; the larger the value, the better the combination.
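To make the arithmetic above concrete, here is a minimal Python sketch that reproduces the three measures from the example's numbers. The variable names are illustrative only; note that the text computes lift by dividing by the support of Biscuits, whereas many references divide by the support of the consequent (Chocolates) instead.

# Support, confidence, and lift for the Biscuits -> Chocolates example above
total_transactions = 4000
biscuits = 400        # transactions containing Biscuits
chocolates = 600      # transactions containing Chocolates
both = 200            # transactions containing both products

support_biscuits = biscuits / total_transactions   # 400/4000 = 10 percent
confidence = both / biscuits                        # 200/400  = 50 percent
lift = confidence / support_biscuits                # 50/10    = 5, as in the text

print(f"Support    = {support_biscuits:.0%}")
print(f"Confidence = {confidence:.0%}")
print(f"Lift       = {lift:.1f}")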
Consider a Big Bazar scenario where the product set is P = {Rice, Pulse, Oil, Milk,
Apple}. The database comprises six transactions where 1 represents the presence of
the product and 0 represents the absence of the product.
Transaction  Rice  Pulse  Oil  Milk  Apple
t1           1     1      1    0     0
t2           0     1      1    1     0
t3           0     0      0    1     1
t4           1     1      0    1     0
t5           1     1      1    0     1
t6           1     1      1    1     1
Step 1
Make a frequency table of all the products that appear in the transactions. Then shorten the table, keeping only the products whose support is over the 50 percent threshold; Apple, which appears in only 3 of the 6 transactions, is therefore dropped. We get the following frequency table.
Product    Frequency
Rice (R)   4
Pulse (P)  5
Oil (O)    4
Milk (M)   4
The above table indicates the products frequently bought by the customers.
Step 2
Create pairs of products such as RP, RO, RM, PO, PM, OM and count how often each pair occurs. You will get the following frequency table.
Pair  Frequency
RP    4
RO    3
RM    2
PO    4
PM    3
OM    2
Step 3
Apply the same 50 percent support threshold and keep only the pairs that meet it. In our case, that means a count of at least 3, so RP, RO, PO, and PM are retained.
Step 4
Now, look for sets of three products that the customers buy together. Combining the retained pairs gives the candidate combinations RPO and POM.
Step 5
Calculate the frequency of these three-item sets, and you will get the following frequency table.
Itemset  Frequency
RPO      3
POM      2
If you apply the threshold again, you can figure out that the customers' set of three products is RPO.
We have considered an easy example to discuss the apriori algorithm in data mining.
In reality, you find thousands of such combinations.
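For readers who want to verify the counts in Steps 1 to 5, the following is a small, illustrative Python sketch (not an optimized Apriori implementation): it simply counts every candidate itemset of a given size over the six Big Bazar transactions and keeps those meeting the 50 percent threshold.

from itertools import combinations

# Big Bazar transactions from the table above (R = Rice, P = Pulse, O = Oil, M = Milk, A = Apple)
transactions = [
    {"R", "P", "O"},
    {"P", "O", "M"},
    {"M", "A"},
    {"R", "P", "M"},
    {"R", "P", "O", "A"},
    {"R", "P", "O", "M", "A"},
]
min_count = 3  # 50 percent of 6 transactions

def frequent_itemsets(transactions, size, min_count):
    """Count every itemset of the given size and keep those meeting min_count."""
    items = sorted(set().union(*transactions))
    counts = {}
    for candidate in combinations(items, size):
        counts[candidate] = sum(1 for t in transactions if set(candidate) <= t)
    return {c: n for c, n in counts.items() if n >= min_count}

for k in (1, 2, 3):
    print(k, frequent_itemsets(transactions, k, min_count))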
In hash-based itemset counting, a k-itemset whose corresponding hash bucket count is below the threshold is excluded as an infrequent itemset.
Transaction Reduction
In transaction reduction, a transaction that does not contain any frequent k-itemset is useless in subsequent scans and can be removed from consideration.
The primary requirements for finding association rules in data mining are given below.
In the brute-force approach, you analyze all possible rules and find the support and confidence levels for each individual rule, then eliminate the rules whose support and confidence fall below the threshold values.
The two-step approach described next is a better option for finding association rules than the brute-force method.
Step 1
We have already discussed how to create the frequency table and calculate the itemsets having a support value greater than the threshold support.
Step 2
To create association rules, you use binary partitions of the frequent itemsets and choose the rules with the highest confidence levels.
In the above example, the RPO combination was the frequent itemset. Now, we find all the rules using RPO:
RP -> O, RO -> P, PO -> R, O -> RP, P -> RO, R -> PO
You can see that there are six different combinations. In general, if a frequent itemset has n elements, there will be 2^n - 2 candidate association rules.
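The 2^n - 2 figure can be checked with a short, illustrative snippet that enumerates every non-empty proper subset of the frequent itemset {R, P, O} as a rule antecedent:

from itertools import combinations

frequent_itemset = {"R", "P", "O"}

rules = []
items = sorted(frequent_itemset)
for r in range(1, len(items)):               # proper, non-empty subsets only
    for antecedent in combinations(items, r):
        consequent = frequent_itemset - set(antecedent)
        rules.append((set(antecedent), consequent))

print(len(rules))   # 2**3 - 2 = 6 candidate rules
for a, c in rules:
    print(a, "->", c)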
Apriori was one of the first algorithms proposed for frequent itemset mining. It was later improved by R. Agrawal and R. Srikant and came to be known as Apriori. This algorithm uses two steps, "join" and "prune", to reduce the search space, and it is an iterative approach to discover the most frequent itemsets.
Apriori says: all non-empty subsets of a frequent itemset must also be frequent (the Apriori, or anti-monotone, property).
Steps In Apriori
#1) In the first iteration of the algorithm, each item is taken as a 1-itemset candidate. The algorithm counts the occurrences of each item.
#2) Let there be some minimum support, min_sup (e.g., 2). The set of 1-itemsets whose occurrence satisfies min_sup is determined. Only the candidates whose count is greater than or equal to min_sup are taken forward to the next iteration; the others are pruned.
#3) Next, frequent 2-itemsets with min_sup are discovered. For this, in the join step, the 2-itemset candidates are generated by joining the frequent 1-itemsets with each other.
#4) The 2-itemset candidates are pruned using the min_sup threshold value. Now the table contains only the 2-itemsets that satisfy min_sup.
#5) The next iteration forms 3-itemsets using the join and prune steps. This iteration follows the anti-monotone property: the 2-itemset subsets of each candidate 3-itemset must themselves satisfy min_sup. If all 2-itemset subsets are frequent, the candidate is kept; otherwise it is pruned.
#6) The next step forms 4-itemsets by joining 3-itemsets with each other and pruning any candidate whose subsets do not meet the min_sup criterion. The algorithm stops when no further frequent itemsets can be generated.
Example of Apriori: Support threshold = 50%, Confidence = 60%
TABLE-1
Transaction  Items
T1           I1, I2, I3
T2           I2, I3, I4
T3           I4, I5
T4           I1, I2, I4
T5           I1, I2, I3, I5
T6           I1, I2, I3, I4
Solution:
1. Count Step: Count the support of each item from TABLE-1.
TABLE-2
Item Count
I1 4
I2 5
I3 4
I4 4
I5 2
2. Prune Step: TABLE-2 shows that item I5 does not meet min_sup=3, thus it is deleted; only I1, I2, I3, and I4 meet the min_sup count.
TABLE-3
Item Count
I1 4
I2 5
I3 4
I4 4
3. Join Step: Form 2-itemsets. From TABLE-1, find the occurrences of each 2-itemset.
TABLE-4
Item Count
I1,I2 4
I1,I3 3
I1,I4 2
I2,I3 4
I2,I4 3
I3,I4 2
4. Prune Step: TABLE-4 shows that itemsets {I1, I4} and {I3, I4} do not meet min_sup, so they are deleted.
TABLE-5
Item Count
I1,I2 4
I1,I3 3
I2,I3 4
I2,I4 3
5. Join and Prune Step: Form 3-itemsets. From TABLE-1, find the occurrences of each 3-itemset. From TABLE-5, check that all 2-itemset subsets meet min_sup.
We can see that for itemset {I1, I2, I3}, the subsets {I1, I2}, {I1, I3}, and {I2, I3} all occur in TABLE-5, thus {I1, I2, I3} is frequent.
For itemset {I1, I2, I4}, the subsets are {I1, I2}, {I1, I4}, and {I2, I4}; {I1, I4} is not frequent, as it does not occur in TABLE-5, thus {I1, I2, I4} is not frequent and is deleted.
TABLE-6 (candidate 3-itemsets from the join step)
Item
I1,I2,I3
I1,I2,I4
I1,I3,I4
I2,I3,I4
After pruning, only {I1, I2, I3} remains frequent; it occurs in T1, T5, and T6, giving a support count of 3.
6. Generate Association Rules: From the frequent itemset {I1, I2, I3} discovered above, the candidate association rules and their confidences are:
{I1, I2} => {I3}: Confidence = support {I1, I2, I3} / support {I1, I2} = (3/4) * 100 = 75%
{I1, I3} => {I2}: Confidence = support {I1, I2, I3} / support {I1, I3} = (3/3) * 100 = 100%
{I2, I3} => {I1}: Confidence = support {I1, I2, I3} / support {I2, I3} = (3/4) * 100 = 75%
{I1} => {I2, I3}: Confidence = support {I1, I2, I3} / support {I1} = (3/4) * 100 = 75%
{I2} => {I1, I3}: Confidence = support {I1, I2, I3} / support {I2} = (3/5) * 100 = 60%
{I3} => {I1, I2}: Confidence = support {I1, I2, I3} / support {I3} = (3/4) * 100 = 75%
This shows that all the above association rules are strong if the minimum confidence threshold is 60%.
The Apriori Algorithm: Pseudo Code
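A compact, runnable sketch of the Apriori procedure is given below in Python; it follows the join-and-prune steps described above rather than reproducing any particular published pseudo code, and the helper name apriori is illustrative.

from itertools import combinations

def apriori(transactions, min_sup):
    """Compact Apriori sketch: returns {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Count supports with one scan of the database
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(current)
        k += 1
    return frequent

# Example: the transaction table used above with min_sup = 3
data = [{"I1","I2","I3"}, {"I2","I3","I4"}, {"I4","I5"},
        {"I1","I2","I4"}, {"I1","I2","I3","I5"}, {"I1","I2","I3","I4"}]
for itemset, count in sorted(apriori(data, 3).items(), key=lambda x: (len(x[0]), -x[1])):
    print(set(itemset), count)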
Advantages
1. The join and prune steps are easy to understand and implement, even on large itemsets in large databases.
Disadvantages
1. It requires heavy computation if the itemsets are very large and the minimum support is kept very low.
2. The entire database needs to be scanned repeatedly.
Frequent Pattern Growth Algorithm
This tree structure maintains the associations between the itemsets. The database is fragmented using one frequent item, and this fragmented part is called a "pattern fragment". The itemsets of these fragmented patterns are analyzed. With this method, the search for frequent itemsets is reduced considerably.
FP Tree
A Frequent Pattern Tree is a tree-like structure built from the initial itemsets of the database. The purpose of the FP tree is to mine the most frequent patterns. Each node of the FP tree represents an item of an itemset.
The root node represents null, while the lower nodes represent the itemsets. The associations of the nodes with the lower nodes, that is, of the itemsets with the other itemsets, are maintained while forming the tree.
Frequent Pattern Algorithm Steps
The frequent pattern growth method lets us find the frequent pattern without
candidate generation.
Let us see the steps followed to mine the frequent pattern using frequent
pattern growth algorithm:
#1) The first step is to scan the database to find the occurrences of the itemsets in
the database. This step is the same as the first step of Apriori. The count of
1-itemsets in the database is called support count or frequency of 1-itemset.
#2) The second step is to construct the FP tree. For this, create the root of the tree.
The root is represented by null.
#3) The next step is to scan the database again and examine the transactions. Examine the first transaction and find the itemsets in it. The itemset with the maximum count is taken at the top, followed by the itemset with the next lower count, and so on. It means that the branch of the tree is constructed with the transaction's itemsets in descending order of count.
#4) The next transaction in the database is examined. Its itemsets are ordered in descending order of count. If any itemset of this transaction is already present in another branch (for example, from the first transaction), then this transaction's branch shares a common prefix starting from the root.
This means that the common itemset is linked to a new node for the remaining itemsets of this transaction.
#5) Also, the count of an itemset is incremented as it occurs in the transactions: both common nodes and new nodes have their counts increased by 1 as they are created and linked according to the transactions.
#6) The next step is to mine the created FP tree. For this, the lowest node is examined first, along with the links to the lowest nodes. The lowest nodes represent frequent patterns of length 1. From these, traverse the paths in the FP tree; these paths are called the conditional pattern base.
Table 1
Transaction  Items
T1           I1, I2, I3
T2           I2, I3, I4
T3           I4, I5
T4           I1, I2, I4
T5           I1, I2, I3, I5
T6           I1, I2, I3, I4
Solution:
1. Count the support of each item:
Table 2
Item Count
I1 4
I2 5
I3 4
I4 4
I5 2
2. Sort the items in descending order of count and discard the infrequent item I5:
Table 3
Item Count
I2 5
I1 4
I3 4
I4 4
3. Build FP Tree
Mining of the FP tree is summarized below:
1. The lowest node item I5 is not considered, as it does not meet the minimum support count; hence it is deleted.
2. The next lowest node is I4. I4 occurs in 2 branches: {I2,I1,I3,I4:1} and {I2,I3,I4:1}. Therefore, considering I4 as the suffix, the prefix paths are {I2,I1,I3:1} and {I2,I3:1}. This forms the conditional pattern base.
3. The conditional pattern base is treated as a transaction database, and an FP tree is constructed from it. This will contain {I2:2, I3:2}; I1 is not considered, as it does not meet the minimum support count.
4. This path will generate all combinations of frequent patterns: {I2,I4:2}, {I3,I4:2}, {I2,I3,I4:2}.
5. For I3, the prefix paths would be {I2,I1:3} and {I2:1}. This generates a two-node conditional FP tree, {I2:4, I1:3}, and the frequent patterns generated are {I2,I3:4}, {I1,I3:3}, {I2,I1,I3:3}.
6. For I1, the prefix path would be {I2:4}. This generates a single-node FP tree, {I2:4}, and the frequent pattern generated is {I2,I1:4}.
The diagram given below depicts the conditional FP tree associated with the
conditional node I3.
Advantages of the FP Growth Algorithm
1. This algorithm needs to scan the database only twice, compared to Apriori, which scans the transactions in each iteration.
2. The pairing of items is not done in this algorithm, which makes it faster.
3. The database is stored in a compact version in memory.
4. It is efficient and scalable for mining both long and short frequent patterns.
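In practice the FP tree is usually built by a library rather than by hand. The sketch below assumes the third-party mlxtend and pandas packages are installed and uses mlxtend's fpgrowth function on the same six transactions; if your version exposes a different API, treat this purely as an illustration.

# Assumes: pip install mlxtend pandas  (third-party packages, not part of the text above)
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

transactions = [["I1", "I2", "I3"], ["I2", "I3", "I4"], ["I4", "I5"],
                ["I1", "I2", "I4"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3", "I4"]]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Mine frequent itemsets with a 50 percent support threshold (3 of 6 transactions)
print(fpgrowth(onehot, min_support=0.5, use_colnames=True))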
FP Growth vs Apriori
Pattern Generation: FP Growth generates patterns by constructing an FP tree, while Apriori generates patterns by pairing the items into singletons, pairs, and triplets.
Candidate Generation: FP Growth requires no candidate generation, while Apriori relies on candidate generation.
Process: FP Growth is generally faster because the database is scanned only twice, while Apriori is comparatively slower because candidates are counted against the database in every iteration.
Memory Usage: FP Growth keeps a compact version of the database (the FP tree) in memory, while Apriori must keep the candidate combinations in memory.
ECLAT
The above methods, Apriori and FP Growth, mine frequent itemsets using the horizontal data format. ECLAT is a method of mining frequent itemsets using the vertical data format: it transforms the data from the horizontal format into the vertical format.
Horizontal data format:
Transaction  Items
T1           I1, I2, I3
T2           I2, I3, I4
T3           I4, I5
T4           I1, I2, I4
T5           I1, I2, I3, I5
T6           I1, I2, I3, I4
Vertical data format (item -> TID set):
I1  {T1, T4, T5, T6}
I2  {T1, T2, T4, T5, T6}
I3  {T1, T2, T5, T6}
I4  {T2, T3, T4, T6}
I5  {T3, T5}
This method forms 2-itemsets, 3-itemsets, and in general k-itemsets in the vertical data format. The process is repeated, with k increased by 1, until no candidate itemsets are found. Optimization techniques such as diffsets can be used with this approach.
This method has an advantage over Apriori, as it does not require scanning the database to find the support of (k+1)-itemsets: the TID set of an itemset directly carries its count of occurrences (its support), and the TID set of a larger itemset is obtained by intersecting the TID sets of its subsets. The bottleneck comes when there are many transactions, since intersecting large TID sets takes huge amounts of memory and computational time.
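The TID-set idea can be illustrated with a few lines of Python. The sketch below (illustrative only, not a full ECLAT implementation) stores the vertical format from the table above and finds frequent 2-itemsets by intersecting TID sets, so the support of each pair is simply the size of the intersection.

# ECLAT-style mining with vertical TID sets
tid_sets = {
    "I1": {"T1", "T4", "T5", "T6"},
    "I2": {"T1", "T2", "T4", "T5", "T6"},
    "I3": {"T1", "T2", "T5", "T6"},
    "I4": {"T2", "T3", "T4", "T6"},
    "I5": {"T3", "T5"},
}
min_sup = 3

# Support of a k-itemset is the size of the intersection of its members' TID sets,
# so no extra database scan is needed.
items = sorted(tid_sets)
frequent_pairs = {}
for i, a in enumerate(items):
    for b in items[i + 1:]:
        common = tid_sets[a] & tid_sets[b]
        if len(common) >= min_sup:
            frequent_pairs[(a, b)] = common

for pair, tids in frequent_pairs.items():
    print(pair, sorted(tids), "support =", len(tids))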
Conclusion
The Apriori algorithm is used for mining association rules. It works on the principle that "all non-empty subsets of a frequent itemset must also be frequent". It forms k-itemset candidates from (k-1)-itemsets and scans the database to find the frequent itemsets.
The Frequent Pattern Growth algorithm is a method of finding frequent patterns without candidate generation. It constructs an FP tree rather than using the generate-and-test strategy of Apriori. The focus of the FP Growth algorithm is on fragmenting the paths of the items and mining frequent patterns.
https://www.softwaretestinghelp.com/weka-explorer-tutorial/
INTRODUCTION:
Association mining searches for frequent items in a data set. In frequent pattern mining, interesting associations and correlations between itemsets in transactional and relational databases are found. In short, frequent pattern mining shows which items appear together in a transaction or relationship.
Need of Association Mining: Frequent mining is the generation of association rules from a transactional dataset. If two items X and Y are purchased together frequently, then it is good to put them together in stores or to offer a discount on one item with the purchase of the other. This can really increase sales. For example, it is likely that if a customer buys Milk and Bread, he or she also buys Butter. So the association rule is {Milk, Bread} => {Butter}, and the seller can suggest that a customer who buys Milk and Bread also buy Butter.
Important Definitions:
Support count: the number of transactions in which an itemset appears.
Frequent itemset: an itemset whose support count is at least the minimum support count.
Closed itemset: a frequent itemset is closed if no immediate superset has the same support count.
Maximal itemset: a frequent itemset is maximal if none of its supersets are frequent.
Example on finding frequent itemsets: consider a dataset of transactions over the items A, B, C, and D, with a minimum support count of 3. The counts of the individual itemsets are given below.
1-frequent itemsets:
{A} = 3  // not closed due to {A, C}; not maximal
{B} = 4  // not closed due to {B, D}; not maximal
{C} = 4  // not closed due to {C, D}; not maximal
{D} = 5  // closed itemset, since no immediate superset has the same count; not maximal
2-frequent itemsets:
{A, B} = 2  // not frequent because support count < minimum support count, so ignore
{A, C} = 3  // not closed due to {A, C, D}
{A, D} = 3  // not closed due to {A, C, D}
{B, C} = 3  // not closed due to {B, C, D}
{B, D} = 4  // closed but not maximal due to {B, C, D}
{C, D} = 4  // closed but not maximal due to {B, C, D}
3-frequent itemsets:
{A, B, C} = 2  // not frequent because support count < minimum support count, so ignore
{A, B, D} = 2  // not frequent because support count < minimum support count, so ignore
{A, C, D} = 3  // maximal frequent
{B, C, D} = 3  // maximal frequent
4-frequent itemsets:
{A, B, C, D} = 2  // not frequent, so ignore
ADVANTAGES OR DISADVANTAGES:
Advantages of using frequent item sets and association rule mining include:
1. They uncover hidden relationships and co-occurrence patterns in large transactional datasets.
2. The resulting rules are easy to interpret and can directly support decisions such as product placement, cross-selling, and recommendations.
Disadvantages of using frequent item sets and association rule mining include:
1. Mining can be computationally expensive on large datasets, especially with low support thresholds.
2. A very large number of rules may be generated, many of them spurious or uninteresting, so careful choice of support and confidence thresholds and further evaluation are required.
Pattern evaluation methods help analysts draw reliable conclusions from enormous volumes of data. Data mining professionals can assess discovered patterns with several metrics and criteria, including support, confidence, and lift. Finding patterns, trends, and correlations in data allows for the discovery of hidden information that can help with decision-making, and pattern evaluation and pattern discovery go hand in hand, since the assessment standards and metrics adopted are frequently shaped by the aims and purposes of the mining operation.
An association rule, for instance, may show that consumers who buy diapers also frequently buy infant formula in a market basket study; businesses might conduct these analyses to guide promotions and product placement.
Support describes how frequently an itemset appears in a dataset, that is, the fraction of transactions that contain the itemset, indicating how common the relationship is. Confidence is the conditional likelihood of the consequent given the antecedent. Further measures for association rules include the lift and conviction metrics. Lift quantifies how dependent the antecedent and consequent are on each other, measuring the difference between the observed support for the rule and the support expected under independence. Conviction measures how likely it is that the consequent would appear without the antecedent; it is based on the consequent's support, and conviction values well above 1 imply strong links between the items.
Sequential Patterns
Data mining also uses sequential patterns, which concentrate on time-ordered trends in sequential data. Sequential patterns, for instance, might identify the order in which items are most frequently purchased. Sequence length, frequency, and predictive metrics, including predictive accuracy and predictive power, are typical assessment criteria. These assessment metrics assist analysts in locating significant and useful patterns within sequential data.
Evaluation Methods for Sequential Patterns
Sequential Pattern Evaluation
Evaluation of sequential patterns entails determining the importance and reliability of the discovered sequences. A common approach grows patterns by extending shorter sequences step by step, making sure that each extension is still common in the dataset. This technique allows analysts to quickly find and assess sequential patterns of various lengths.
Episode Evaluation
Another assessment technique utilized in the study of sequential patterns is episode evaluation. The term "episode" refers to a group of related events that take place close together in a sequence; in medical data, for example, episodes could stand in for groups of symptoms that frequently coexist in a given condition. Determining how often and how significantly such groups of events occur together is the main goal of episode assessment. By examining episodes, analysts can obtain insight into the patterns of how events occur together and can find significant relationships in the data.
Conclusion
The lift and conviction measures for association rules, the sequential pattern growth approach, and episode assessment for sequential patterns are only a few of the available pattern evaluation methods. They help organizations make better decisions and optimize their operations using the data's trustworthy patterns and relationships.
However, not all patterns are created equally. Some patterns may be spurious or meaningless, while
others may be highly predictive and useful. This is where pattern evaluation methods come in - a set
of techniques used to assess the quality and usefulness of patterns discovered through data mining.
Let's dive into pattern evaluation methods in data mining and their importance in data science, along with the key takeaways.
Accuracy
The accuracy of a data mining model may be defined as the extent to which it correctly predicts the target values. After the model has been trained, it is evaluated with a separate test dataset to determine how well it performs. One of the most common approaches to measuring accuracy is to track the proportion of predictions that are correct; this proportion is the so-called "accuracy rate". A model cannot be considered high quality if it achieves 100% accuracy on the data used for training but only 50% accuracy on the data used for testing: such a model is overfitting and cannot be relied on to analyze new data. A respectable model should achieve roughly 80% or higher accuracy on both the training data and the test data to be considered credible; only then can the model be used in general to predict newly collected data.
Accuracy is not the only thing to consider for data mining models. The mean absolute error (MAE) is calculated by averaging the absolute differences between predictions and actual values, and the root mean squared error (RMSE) is calculated by taking the square root of the mean of the squared errors.
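As an illustration, MAE and RMSE can be computed by hand; the values below are made-up predictions, used only to show the formulas.

import math

# Illustrative values only: actual targets and model predictions
actual    = [3.0, 5.0, 2.5, 7.0, 4.5]
predicted = [2.5, 5.0, 3.0, 8.0, 4.0]

errors = [p - a for p, a in zip(predicted, actual)]
mae  = sum(abs(e) for e in errors) / len(errors)             # mean absolute error
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))   # root mean squared error

print(f"MAE  = {mae:.3f}")
print(f"RMSE = {rmse:.3f}")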
Clustering Accuracy
This statistic determines how well the patterns discovered by the algorithm can be used to cluster newly collected data accurately. Most of the time, this is done by applying the detected patterns to a set of data that has already been tagged with known cluster labels. Accuracy may then be determined by examining the degree to which the predicted labels agree with the existing labels.
The effectiveness of a clustering algorithm can be evaluated using various criteria, including the following:-
■ Internal indices judge clustering quality without depending on any external data. The Dunn index is one of the most commonly used internal indices.
■ Stability quantifies how well the clustering holds up under changes in the data. A strategy is considered stable when it consistently produces the same clustering results over a wide variety of data samples.
■ External indices evaluate how well the algorithm's clusters align with an external reference; measures such as the Rand index and the Jaccard coefficient can be used when ground-truth labels are known (see the sketch after this list).
■ How quickly the algorithm can cluster the data appropriately is another important indicator of its effectiveness.
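As a brief illustration of external indices, the sketch below assumes scikit-learn is installed and compares a set of predicted cluster labels against known reference labels using the adjusted Rand index and normalized mutual information (two widely available external measures); the label values themselves are made up.

# External-index evaluation sketch; assumes scikit-learn is installed
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

true_labels      = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # known "ground truth" cluster labels
predicted_labels = [0, 0, 1, 1, 1, 1, 2, 2, 0]   # labels produced by a clustering algorithm

# Both scores compare predicted clusters against the reference labelling;
# 1.0 means perfect agreement, values near 0 mean agreement no better than chance.
print("Adjusted Rand index:", adjusted_rand_score(true_labels, predicted_labels))
print("Normalized mutual information:", normalized_mutual_info_score(true_labels, predicted_labels))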
Classification Accuracy
This metric evaluates how well the patterns the algorithm found can be leveraged to label new data. Typically, this is accomplished by applying the identified patterns to a dataset already classified with known class labels. Accuracy may then be calculated by checking how well the predicted labels match the true ones. For classification models, a standard performance measure is classification accuracy, which is simply the proportion of times the model gets its predictions right. Even though classification accuracy is a simple and straightforward statistic, it can be deceiving in some circumstances.
Measures of classification model performance such as precision and recall are more illuminating on unbalanced data sets. The model's precision measures how often its positive predictions for a class are correct, while its recall indicates how many instances of that class it correctly identified. Seeing where a model does well or poorly makes it easier to find its weaknesses. An additional instrument for evaluating classification models is the confusion matrix: a table that describes the proportion of correct and incorrect predictions made by the model, broken down by class.
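A short sketch of these measures, assuming scikit-learn is installed and using made-up labels, is shown below.

# Confusion matrix, precision, and recall sketch; assumes scikit-learn is installed
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # true class labels (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # labels predicted by a classifier

print(confusion_matrix(y_true, y_pred))            # rows: true class, columns: predicted class
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))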
Visual Examination
By visually examining the data, the data miner may use this method, arguably the most common one, to decide whether or not the discovered patterns make sense. Visual analysis involves plotting the data and then inspecting the patterns that emerge. This method is used when the data is not overly complex and can be displayed straightforwardly, and it is also frequently used for categorical data. The process of determining patterns in data by looking at it directly is referred to as "visual inspection" in data mining. One can look at the raw data or at a graph or plot to accomplish this. This approach is commonly used to identify irregularities and patterns that do not conform to the norm.
Running Time
The time it takes to train a model and produce predictions is a frequent metric for evaluating the effectiveness of a machine learning algorithm, though it is by no means the only one. The time it takes for the algorithm to analyze the data and identify patterns is quantified here, usually in seconds or minutes. This type of assessment is sometimes called the "running time pattern".
Measuring the execution time of an algorithm requires attention to several factors. The first consideration is how long it takes for the data to be loaded into memory. Second, you must account for the time it takes to pre-process the data. Last, you must factor in the time necessary to train the model and generate forecasts.
Algorithm execution time tends to grow proportionally with data size. This is because a more
extensive data set requires more processing power from the learning algorithm. While most
algorithms can handle enormous datasets, some perform better than others. It is essential to consider
the dataset being utilized while comparing algorithms. Different kinds of data may require different
algorithms. Hardware can also play a role in how long something takes to operate.
Support
A pattern's strength is measured by the percentage of records in the whole dataset that contain the pattern. Pattern evaluation methods in data mining and machine learning programs frequently include support-based evaluation, which aims to find intriguing and valuable patterns in the data. To aid decision-making, the discovered association patterns must be evaluated to see whether any are of interest.
Several approaches exist for gauging how interesting a particular pattern is. Using a support metric, which counts how many times a particular pattern appears in a dataset, is a typical method. Employing a lift metric, which compares the actual frequency of a pattern to its expected frequency, is another popular strategy.
Confidence
In pattern evaluation, the quality of identified patterns is assessed. Standard methods for making this assessment include counting the number of occurrences of a pattern in a given data set and comparing that number to the number of occurrences expected by chance. A pattern is considered to inspire high confidence if the frequency with which it is observed is far higher than would be expected by chance alone. One may also measure the reliability of a pattern by the proportion of times it is validated as correct.
Lift
A pattern's lift compares the pattern's actual frequency with the frequency that would be expected if the items occurred independently.
A related evaluation plots the true positive rate (TPR) against the false positive rate (FPR). TPR measures how well a model correctly classifies positive examples, whereas FPR measures how often negative examples are wrongly labeled as positive. While a TPR of 100% and an FPR of 0% would be ideal, this is rarely the case in the real world. When the curve lies near the diagonal, the TPR and FPR are roughly identical, meaning the model labels comparable proportions of cases as positive and negative and shows little discriminative power. Such behavior can stem from numerous issues, such as skewed data, inadequate feature selection, and model overfitting; the further the curve rises above the diagonal, the better the model performs.
Prediction
A prediction pattern's accuracy rate may be estimated by the proportion of times it is validated on data the model has not seen. A measure of a model's predictive ability is used to see how well it can extrapolate from historical data, and evaluating a model's performance or comparing several models is possible and valuable with prediction-pattern evaluation.
To assess a prediction pattern, it is common practice to divide the data set into training and test sets. One set, called the training set, is used to teach the model, and the other set, called the test set, is used to evaluate how well it did. The prediction error is computed to assess the performance of the model. Evaluating prediction patterns makes it possible to enhance the precision of prediction models: predictive models may be modified to better suit the data by using a test set, for example by adding additional features to the data set or adjusting the model parameters.
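The train/test procedure can be sketched in a few lines. The example below assumes scikit-learn is installed and uses a synthetic regression dataset; the model and error metric are illustrative choices, not prescribed by the text.

# Train/test evaluation sketch; assumes scikit-learn is installed
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)          # learn from the training set
test_error = mean_absolute_error(y_test, model.predict(X_test))
print("Prediction error (MAE) on the held-out test set:", round(test_error, 2))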
Precision
Data from a wide range of sources can be analyzed with the help of precision pattern evaluation methods. This technique may be used to assess the reliability of data and to spot trends and patterns within the information at hand. Errors in the data may be detected, and their root causes investigated, with the help of precision pattern evaluation; the effect of those inaccuracies on the reliability of the data as a whole may also be estimated using this technique.
Pattern evaluation in data mining can significantly benefit from precision pattern evaluation, since the strategy can be used to improve data quality and spot trends and patterns.
Bootstrapping
Bootstrapping consists of sampling the data with replacement, training the model on the sampled data, and then testing it on data not included in the sample. This can be used to obtain a distribution of the model's performance, which sheds light on the model's stability. To gauge a model's precision, statisticians employ this resampling method: the process entails training the model on a randomly selected subset of the original dataset, putting the trained model through its paces on a separate data set, repeating this over several iterations, and reporting the model's average accuracy.
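A minimal bootstrapping sketch is shown below, assuming scikit-learn is available. It uses a synthetic dataset, resamples rows with replacement, and (as one common variant) tests each model on the rows left out of that resample; the number of iterations is an arbitrary illustrative choice.

# Bootstrapping sketch: estimate a model's accuracy distribution by resampling with replacement
import random
from statistics import mean

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X, y = X.tolist(), y.tolist()

scores = []
for _ in range(20):                                             # number of bootstrap iterations
    idx = [random.randrange(len(X)) for _ in range(len(X))]     # sample rows with replacement
    oob = [i for i in range(len(X)) if i not in set(idx)]       # out-of-bag rows for testing
    model = LogisticRegression(max_iter=1000).fit([X[i] for i in idx], [y[i] for i in idx])
    scores.append(accuracy_score([y[i] for i in oob], model.predict([X[i] for i in oob])))

print("Mean bootstrap accuracy:", round(mean(scores), 3))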
https://www.geeksforgeeks.org/multilevel-association-rule-in-data-mining/
https://codinginfinite.com/fp-growth-algorithm-explained-with-numerical-example/
https://www.geeksforgeeks.org/ml-frequent-pattern-growth-algorithm/
6. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):
(a) Compute the Euclidean distance between the two objects.
(b) Compute the Manhattan distance between the two objects.
(c) Compute the Minkowski distance between the two objects, using q = 3.
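A short Python check of the three distances (the answers in the comments follow directly from the formulas):

# Worked answers for question 6 (Euclidean, Manhattan, and Minkowski distances)
x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

diffs = [abs(a - b) for a, b in zip(x, y)]                 # |2|, |1|, |6|, |2|

euclidean = sum(d ** 2 for d in diffs) ** 0.5              # sqrt(45)  ~ 6.71
manhattan = sum(diffs)                                     # 11
minkowski_q3 = sum(d ** 3 for d in diffs) ** (1 / 3)       # 233^(1/3) ~ 6.15

print(f"(a) Euclidean: {euclidean:.3f}")
print(f"(b) Manhattan: {manhattan}")
print(f"(c) Minkowski (q=3): {minkowski_q3:.3f}")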