Unit 3 Final
Association Rule Mining – Mining Single-Dimensional Boolean Association Rules from
Transactional Databases – Multilevel Association Rules – Classification and
Prediction – Classification by Decision Tree Induction – Bayesian Classification –
Prediction.
Before we start defining the rules, let us first look at the basic definitions.
Support Count (σ) – The frequency of occurrence of an itemset, i.e., the number of
transactions that contain it. In the standard five-transaction market-basket example,
σ({Milk, Bread, Diaper}) = 2.
Frequent Itemset – An itemset whose support is greater than or equal to the minsup
threshold. Association Rule – An implication expression of the form X ⇒ Y, where
X and Y are any two disjoint itemsets.
Example: {Milk, Diaper} ⇒ {Beer}
Rule Evaluation Metrics –
• Support (s) –
The number of transactions that include all items in both the {X} and {Y} parts of the
rule, as a percentage of the total number of transactions. It is a measure of how
frequently the collection of items occurs together, as a percentage of all
transactions.
• Support(X ⇒ Y) = σ(X ∪ Y) ÷ total number of transactions –
It is interpreted as the fraction of transactions that contain both X and Y.
• Confidence (c) –
It is the ratio of the number of transactions that include all items in both {X} and
{Y} to the number of transactions that include all items in {X}.
• Conf(X ⇒ Y) = Supp(X ∪ Y) ÷ Supp(X) –
It measures how often the items in Y appear in transactions that also contain the
items in X.
• Lift (l) –
The lift of the rule X ⇒ Y is the confidence of the rule divided by the expected
confidence, assuming that the itemsets X and Y are independent of each other. Under
independence, the expected confidence is simply the frequency (support) of {Y}.
• Lift(X ⇒ Y) = Conf(X ⇒ Y) ÷ Supp(Y) –
A lift value near 1 indicates that X and Y appear together about as often as
expected; greater than 1 means they appear together more often than expected, and
less than 1 means they appear together less often than expected. Greater lift values
indicate a stronger association.
Example – For the rule {Milk, Diaper} ⇒ {Beer} over the five-transaction example,
Support = σ({Milk, Diaper, Beer}) ÷ 5
= 2/5
= 0.4
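The three metrics above can be computed directly. The sketch below uses the classic five-transaction market-basket example; the transaction contents are an assumption, since the original table is not reproduced in the text:

```python
# Support, confidence and lift for the rule {Milk, Diaper} => {Beer}.
# These five transactions are the classic textbook example (assumed here,
# as the table itself is not shown in the text).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions containing every item."""
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"Milk", "Diaper"}, {"Beer"}
n = len(transactions)

support = support_count(X | Y, transactions) / n                                  # 2/5
confidence = support_count(X | Y, transactions) / support_count(X, transactions)  # 2/3
lift = confidence / (support_count(Y, transactions) / n)                          # (2/3)/(3/5)

print(round(support, 3), round(confidence, 3), round(lift, 3))
```

Note that σ({Milk, Bread, Diaper}) = 2 in this database, matching the support count quoted earlier.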
Examples:
Rule form: “Body ⇒ Head [support, confidence]”, e.g. buys(x, “diapers”) ⇒
buys(x, “beers”) [0.5%, 60%].
• Applications
– * ⇒ maintenance agreement (what the store should do to boost
maintenance agreement sales)
– Home electronics ⇒ * (what other products should the store stock up on?)
– Attached mailing in direct marketing
– Detecting “ping-pong”-ing of patients, faulty “collisions”
A ⇒ C (50%, 66.6%)
C ⇒ A (50%, 100%)
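The two rules above can be verified by brute-force counting. The four-transaction database below is an assumption, chosen so that both rules come out with the quoted support and confidence:

```python
# Assumed four-transaction database under which A => C holds with
# (support 50%, confidence 66.6%) and C => A with (support 50%, confidence 100%).
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
n = len(transactions)

def supp(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / n

def rule_metrics(body, head):
    """Return (support, confidence) of the rule body => head."""
    s = supp(body | head)
    return s, s / supp(body)

print(rule_metrics({"A"}, {"C"}))  # support 0.5, confidence ~0.667
print(rule_metrics({"C"}, {"A"}))  # support 0.5, confidence 1.0
```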
There are two forms of data analysis that can be used for extracting models describing
important classes or to predict future data trends. These two forms are as follows −
• Classification
• Prediction
Classification models predict categorical class labels, while prediction models predict
continuous-valued functions. For example, we can build a classification model to
categorize bank loan applications as either safe or risky, or a prediction model to
predict the expenditure in dollars of potential customers on computer equipment,
given their income and occupation.
What is classification?
Following are examples of cases where the data analysis task is classification −
• A bank loan officer wants to analyze the data in order to know which customers
(loan applicants) are risky and which are safe.
• A marketing manager at a company needs to predict whether a customer with a
given profile will buy a new computer.
In both of the above examples, a model or classifier is constructed to predict the
categorical labels. These labels are “risky” or “safe” for the loan application data and
“yes” or “no” for the marketing data.
What is prediction?
Following is an example of a case where the data analysis task is prediction −
Suppose the marketing manager needs to predict how much a given customer will
spend during a sale at his company. In this example we are asked to predict a
numeric value, so the data analysis task is an example of numeric prediction. In
this case, a model or predictor is constructed that predicts a continuous-valued
function, or an ordered value.
Note − Regression analysis is a statistical methodology that is most often used for
numeric prediction.
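As a concrete illustration of numeric prediction, the sketch below fits a simple least-squares regression line relating income to expenditure; the data points are invented for illustration only:

```python
# Simple linear regression (least squares) for numeric prediction:
# predict a customer's expenditure from their income. Data is hypothetical.
incomes = [30, 40, 50, 60, 70]        # income, thousands of dollars
spend   = [1.0, 1.4, 1.8, 2.2, 2.6]   # expenditure, thousands of dollars

n = len(incomes)
mean_x = sum(incomes) / n
mean_y = sum(spend) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(incomes, spend))
         / sum((x - mean_x) ** 2 for x in incomes))
intercept = mean_y - slope * mean_x

def predict(income):
    """Predicted expenditure for a given income (continuous value)."""
    return intercept + slope * income

print(round(predict(55), 2))  # a value between the 50 and 60 data points
```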
With the help of the bank loan application that we have discussed above, let us
understand the working of classification. The data classification process includes two
steps −
Building the Classifier or Model
In this step, a classifier is built by applying a classification algorithm to a training
set of database tuples together with their associated class labels. This is the
learning step.
• Each tuple that constitutes the training set is assumed to belong to a predefined
category or class. These tuples can also be referred to as samples, objects, or data
points.
Using the Classifier for Classification
In this step, the classifier is used for classification. Here the test data is used to
estimate the accuracy of the classification rules. The classification rules can be
applied to the new data tuples if the accuracy is considered acceptable.
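The two-step process can be sketched with a toy one-attribute, rule-based classifier: build it from labeled training tuples, then estimate its accuracy on held-out test tuples. The loan data and the learned rule below are hypothetical:

```python
from collections import Counter, defaultdict

# Step 1 (learning): build a classifier from training tuples with known
# class labels. Data and the one-attribute rule are hypothetical.
train = [({"income": "high"}, "safe"), ({"income": "high"}, "safe"),
         ({"income": "low"}, "risky"), ({"income": "low"}, "risky")]

counts = defaultdict(Counter)
for tup, label in train:
    counts[tup["income"]][label] += 1
# For each attribute value, predict its majority class in the training set.
rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}

def classify(tup):
    return rule[tup["income"]]

# Step 2 (classification): estimate accuracy on held-out test tuples;
# apply the rules to new data only if the accuracy is acceptable.
test = [({"income": "high"}, "safe"), ({"income": "low"}, "safe"),
        ({"income": "low"}, "risky")]
accuracy = sum(classify(t) == y for t, y in test) / len(test)
print(accuracy)  # 2 of 3 test tuples classified correctly
```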
The major issue is preparing the data for classification and prediction. Preparing the
data involves the following activities −
• Data Cleaning − Data cleaning involves removing noise and treating missing
values. Noise is removed by applying smoothing techniques, and the problem of
missing values is solved by replacing a missing value with the most commonly
occurring value for that attribute.
• Relevance Analysis − The database may also have irrelevant attributes.
Correlation analysis is used to know whether any two given attributes are related.
• Data Transformation and Reduction − The data can be transformed by any of the
following methods.
  o Normalization − The data is transformed using normalization. Normalization
    involves scaling all values for a given attribute in order to make them fall
    within a small specified range. Normalization is used when, in the learning
    step, neural networks or methods involving distance measurements are used.
  o Generalization − The data can also be transformed by generalizing it to a
    higher-level concept. For this purpose we can use concept hierarchies.
Note − Data can also be reduced by other methods such as wavelet
transformation, binning, histogram analysis, and clustering.
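Min-max scaling is one common way to perform the normalization described above. A minimal sketch, using hypothetical attribute values:

```python
# Min-max normalization: scale an attribute's values into the range [0, 1].
# The raw values (hypothetical incomes) are assumptions for illustration.
values = [12000, 35000, 54000, 73600, 98000]

lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

print([round(v, 3) for v in normalized])  # smallest maps to 0.0, largest to 1.0
```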
Here are the criteria for comparing the methods of classification and prediction −
• Accuracy − Accuracy of the classifier refers to the ability of the classifier to
predict the class label correctly; accuracy of the predictor refers to how well a
given predictor can guess the value of the predicted attribute for new data.
• Speed − This refers to the computational cost in generating and using the
classifier or predictor.
• Robustness − It refers to the ability of the classifier or predictor to make correct
predictions from given noisy data.
• Scalability − Scalability refers to the ability to construct the classifier or predictor
efficiently, given a large amount of data.
• Interpretability − It refers to the extent to which the classifier or predictor can
be understood.
Decision Tree Induction
A decision tree is a structure that includes a root node, branches, and leaf nodes. Each
internal node denotes a test on an attribute, each branch denotes an outcome of the test,
and each leaf node holds a class label. The topmost node in the tree is the root node.
The following decision tree is for the concept buy_computer, which indicates whether a
customer at a company is likely to buy a computer or not. Each internal node
represents a test on an attribute. Each leaf node represents a class.
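Such a tree can be read as a sequence of attribute tests. The sketch below hard-codes one plausible version of the buy_computer tree; the attribute names age, student, and credit_rating and the tree structure are assumptions, since the figure itself is not reproduced here:

```python
# A hand-coded decision tree for buy_computer. Each `if` is an internal
# node (a test on an attribute); each returned label is a leaf node.
# The tree structure and attribute names are assumed, as the original
# figure is not shown in the text.
def buy_computer(age, student, credit_rating):
    if age == "youth":                 # root node: test on age
        return "yes" if student == "yes" else "no"
    elif age == "middle_aged":
        return "yes"                   # leaf node: class "yes"
    else:                              # age == "senior"
        return "yes" if credit_rating == "fair" else "no"

print(buy_computer("youth", "yes", "fair"))
print(buy_computer("senior", "no", "excellent"))
```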
The benefits of having a decision tree are as follows −
• It does not require any domain knowledge.
• It is easy to comprehend.
• The learning and classification steps of a decision tree are simple and fast.
Tree Pruning
Tree pruning is performed in order to remove anomalies in the training data due to
noise or outliers. The pruned trees are smaller and less complex.
Tree Pruning Approaches
There are two approaches to prune a tree −
• Pre-pruning − The tree is pruned by halting its construction early.
• Post-pruning − This approach removes a sub-tree from a fully grown tree.
Cost Complexity
The cost complexity is measured by the following two parameters −
• Number of leaves in the tree, and
• Error rate of the tree.
Bayes' Theorem
Bayes' Theorem is named after Thomas Bayes. There are two types of probabilities −
• Posterior Probability [P(H/X)]
• Prior Probability [P(H)]
where X is a data tuple and H is some hypothesis. According to Bayes' Theorem,
P(H/X) = P(X/H) · P(H) / P(X)
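Bayes' Theorem, P(H/X) = P(X/H) · P(H) / P(X), can be applied directly once the prior, the likelihood, and the evidence are known. The numbers below are hypothetical:

```python
# Posterior probability via Bayes' Theorem, with hypothetical numbers:
# H = "customer will buy a computer", X = "customer is a student".
p_h = 0.5          # prior P(H)
p_x_given_h = 0.6  # likelihood P(X/H)
p_x = 0.4          # evidence P(X)

p_h_given_x = p_x_given_h * p_h / p_x  # posterior P(H/X)
print(p_h_given_x)  # 0.75
```

Knowing that a customer is a student raises the probability of a purchase from the prior 0.5 to the posterior 0.75.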
Bayesian Belief Networks specify joint conditional probability distributions. They are
also known as Belief Networks, Bayesian Networks, or Probabilistic Networks.
• A Belief Network allows class conditional independencies to be defined between
subsets of variables.
• It provides a graphical model of causal relationship on which learning can be
performed.
• We can use a trained Bayesian Network for classification.
There are two components that define a Bayesian Belief Network −
• A directed acyclic graph
• A set of conditional probability tables
The following diagram shows a directed acyclic graph for six Boolean variables.
The arcs in the diagram allow the representation of causal knowledge. For example, lung
cancer is influenced by a person's family history of lung cancer, as well as by whether or
not the person is a smoker. It is worth noting that the variable PositiveXray is
independent of whether the patient has a family history of lung cancer or is a
smoker, given that we know the patient has lung cancer.
The conditional probability table for the values of the variable LungCancer (LC)
showing each possible combination of the values of its parent nodes, FamilyHistory
(FH), and Smoker
(S) is as follows −
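Since the table itself is not reproduced here, the sketch below uses hypothetical CPT entries to show how such a table is stored and queried: each row gives P(LC = yes) for one combination of the parent values FH and S, and the LC = no entries are the complements.

```python
# Conditional probability table for LungCancer (LC) given its parents
# FamilyHistory (FH) and Smoker (S). The probability values are
# hypothetical, as the original table is not reproduced in the text.
cpt_lc_yes = {
    (True,  True):  0.8,   # P(LC=yes | FH=yes, S=yes)
    (True,  False): 0.5,   # P(LC=yes | FH=yes, S=no)
    (False, True):  0.7,   # P(LC=yes | FH=no,  S=yes)
    (False, False): 0.1,   # P(LC=yes | FH=no,  S=no)
}

def p_lc(lc, fh, s):
    """P(LC = lc | FH = fh, S = s); LC=no rows are complements of LC=yes."""
    p_yes = cpt_lc_yes[(fh, s)]
    return p_yes if lc else 1.0 - p_yes

print(p_lc(True, True, True))    # P(LC=yes | FH=yes, S=yes)
print(p_lc(False, False, True))  # P(LC=no  | FH=no,  S=yes)
```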