L05 - Advanced Analytical Theory and Methods: Classification
Madava Viranjan
What is Classification?
• Classification is a form of data analysis that extracts models describing important data classes
• The classifier is presented with a set of examples that are already classified, and from these
examples it learns to assign class labels to unseen examples
• Since we use a dataset with class labels to train the classifier, classification falls under
supervised learning
Some Examples
• A bank officer needs to classify a loan application as safe or risky. Categorical prediction:
“safe” or “risky”
• A medical researcher wants to analyze breast cancer data to predict which one of three
specific treatments a patient should receive. Categorical prediction: “treatment A”,
“treatment B”, or “treatment C”
Classification vs Numeric Prediction
• In the previous example, if the marketing manager of the electronics shop wants to know how
much a given customer will spend in one visit, then the task is numeric prediction.
• Regression analysis is one of the most common numeric prediction methodologies.
Steps in Classification
1. Learning step
Construct the classifier by learning from the training dataset
Steps in Classification ctd.
2. Classification step
The model is used for classification.
The accuracy of the prediction needs to be measured. If we use the same training dataset for this,
we will get optimistic measures, because the classifier tends to overfit the training data. Therefore a
separate dataset, called the test dataset, is needed
Decision Trees
Introduction
• Uses a tree structure to specify sequences of
decisions and consequences.
• Nodes
• Tests some attribute
• Root node is the top node
• Internal nodes are the decision or test points
• Leaf nodes are at the end of the last branch,
and they represent the class label
Introduction ctd.
• Branching of a node is called a split
• For each split, the most informative attribute should be used as the splitting attribute
• The most common way of selecting the most informative attribute is an entropy-based method
• In the end this looks like a bunch of if-else statements, so why is it considered data mining or
machine learning? Because the tree, unlike hand-written rules, is learned automatically from the data
• Consider the dataset below and identify how the data points will be arranged in the given
decision tree
[Figure: example decision tree with internal-node tests X0 <= -2, X0 <= 2, and X1 <= 2, and class labels at the leaves]
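The tree in the figure can be read directly as nested if-else tests. The class labels at the leaves below are hypothetical placeholders, since the figure only shows the split conditions.

```python
def classify(x0, x1):
    """Walk the example tree: each internal node tests one attribute, each leaf is a class label."""
    if x0 <= -2:              # root node test
        return "class 1"      # hypothetical leaf label
    if x0 <= 2:               # internal node: tighter bound on the same attribute
        if x1 <= 2:           # internal node: test on the second attribute
            return "class 2"  # hypothetical leaf label
        return "class 3"      # hypothetical leaf label
    return "class 4"          # hypothetical leaf label

print(classify(-3, 0))  # falls into the X0 <= -2 branch
```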
Splitting an Attribute
• At any splitting point, the attribute with the maximum information gain will be selected as
the splitting attribute
• Information gain is computed from entropy: Info(D) = −Σ pᵢ log₂(pᵢ), where
pᵢ = probability that a tuple in D belongs to class i
Splitting an Attribute: Example
• In the previous dataset there are two attributes that can be used to split. Consider the nodes
below, each of which tests one attribute value. Show which attribute should be selected
as the splitting attribute based on information gain.
[Figure: two candidate splits at node A — left: X0 <= -2 with children B and C; right: X1 <= 0.8 with children D and E]
How Does the Algorithm Work?
• The algorithm is called with three parameters
• D – Data partition. Initially, it is the complete set of training tuples
• Attribute_list – list of attributes describing the tuples
• Attribute_selection_method – procedure for selecting the attribute that best discriminates the given tuples
How Does the Algorithm Work?
• The expected information needed to classify a tuple in D is given by
Info(D) = −Σ pᵢ log₂(pᵢ)
• After splitting D on attribute A into v partitions Dⱼ, the information still needed is
Info_A(D) = Σ (|Dⱼ|/|D|) × Info(Dⱼ), and the information gain is Gain(A) = Info(D) − Info_A(D)
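As a check on the formulas, here is a small sketch computing Info(D) and the gain of a candidate split. The class distribution (9 positive, 5 negative tuples) and the three-way partition are illustrative assumptions.

```python
from math import log2

def entropy(counts):
    """Info(D) = -sum(p_i * log2(p_i)) over the class distribution, given raw class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """Gain(A) = Info(D) - sum(|D_j|/|D| * Info(D_j)) over the partitions of a split."""
    total = sum(parent_counts)
    info_a = sum(sum(child) / total * entropy(child) for child in child_counts_list)
    return entropy(parent_counts) - info_a

# Illustrative distribution: 9 positive and 5 negative tuples, split by a
# hypothetical attribute into partitions with counts (2,3), (4,0), (3,2).
print(entropy([9, 5]))                                       # ≈ 0.940 bits
print(information_gain([9, 5], [[2, 3], [4, 0], [3, 2]]))    # ≈ 0.247 bits
```

The attribute with the largest such gain would be chosen as the splitting attribute at this node.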
[Table: training dataset with attributes RID, Age, Income, Student, Credit_rating and class label buys_computer]

Tree Pruning
• Prepruning
• The tree is pruned by halting its construction early.
• E.g.: decide not to further split the subset at a given node; in that case the current node becomes a leaf.
• A threshold is set based on measures such as statistical significance, information gain, or the Gini index.
• Postpruning
• Remove subtrees from a fully grown tree
• The most common approach
Why Are Decision Trees Popular?
• Do not require any domain knowledge to construct a decision tree
Problems With Decision Trees
• Repetition and replication of tree branches lead to large trees
Statistical Classification
Naïve Bayes Classification
• Algorithm based on the Bayes theorem
• Bayes' Theorem
• The conditional probability of event C occurring, given that event A has already occurred:
P(C|A) = P(A|C) · P(C) / P(A)
Naïve Bayes Classification
• Predicts that tuple X belongs to class Cᵢ if and only if P(Cᵢ|X) > P(Cⱼ|X) for all j ≠ i
• How to maximize P(Cᵢ|X)? By Bayes' theorem,
P(Cᵢ|X) = P(X|Cᵢ) · P(Cᵢ) / P(X)
and since P(X) is the same for every class, it is enough to maximize P(X|Cᵢ) · P(Cᵢ)
[Table: training dataset with attributes RID, Age, Income, Student, Credit_rating and class label buys_computer, used for a Naïve Bayes worked example]
Naïve Bayes Classification
Text Tag
“A great game” Sports
“The election was over” Not sports
“Very clean match” Sports
“A clean but forgettable game” Sports
“It was a close election” Not sports
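The text-classification example above can be worked end to end with a small multinomial Naïve Bayes sketch. The Laplace (add-one) smoothing and the test sentence are illustrative choices, not part of the slides.

```python
from collections import Counter
from math import log

train = [
    ("A great game", "Sports"),
    ("The election was over", "Not sports"),
    ("Very clean match", "Sports"),
    ("A clean but forgettable game", "Sports"),
    ("It was a close election", "Not sports"),
]

# Per-tag word counts, per-tag document counts, and the overall vocabulary
word_counts = {}
doc_counts = Counter()
vocab = set()
for text, tag in train:
    doc_counts[tag] += 1
    counts = word_counts.setdefault(tag, Counter())
    for word in text.lower().split():
        counts[word] += 1
        vocab.add(word)

def classify(text):
    """Pick the tag maximizing log P(tag) + sum of log P(word|tag)."""
    best_tag, best_score = None, float("-inf")
    for tag, counts in word_counts.items():
        score = log(doc_counts[tag] / len(train))  # log prior P(C_i)
        total = sum(counts.values())
        for word in text.lower().split():
            # Laplace smoothing so unseen words do not zero out the product
            score += log((counts[word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag

print(classify("a very close game"))  # → Sports
```

Working in log space avoids underflow when many word probabilities are multiplied, and dropping P(X) is safe because it is identical for both tags.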
Rule-based Classification
Rule-based Classification
• Use IF-THEN rules for classification
• IF age = youth AND student = yes THEN buys_computer = yes
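An IF-THEN rule set like the one above can be applied as an ordered list of condition/conclusion pairs. The second rule, the default class, and the function names are illustrative assumptions.

```python
# Each rule: (conditions that must all hold, predicted class label)
rules = [
    ({"age": "youth", "student": "yes"}, "yes"),
    ({"age": "senior", "credit_rating": "fair"}, "no"),  # hypothetical second rule
]

def classify(tuple_, default="no"):
    """Fire the first rule whose antecedent matches the tuple; fall back to a default class."""
    for conditions, label in rules:
        if all(tuple_.get(attr) == value for attr, value in conditions.items()):
            return label
    return default

print(classify({"age": "youth", "student": "yes", "income": "high"}))  # → yes
```

Ordering matters here: when rules overlap, the first matching rule wins, which is one common conflict-resolution strategy for rule-based classifiers.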
Evaluate Classifier Performance
Outcome of the Classification
• True Positive (TP)
• Positive tuples that were correctly labeled by the classifier
• Error Rate
• The proportion of misclassified tuples: (FP + FN) / (P + N)
Evaluate the Outcome
• Sensitivity
• True positive rate (recognition rate): TP / P
• Specificity
• True negative rate: TN / N
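These rates fall straight out of the confusion-matrix counts. The counts in the example call below are illustrative assumptions.

```python
def rates(tp, fn, tn, fp):
    """Sensitivity = TP / P, specificity = TN / N, error rate = (FP + FN) / (P + N)."""
    p = tp + fn          # all actually-positive tuples
    n = tn + fp          # all actually-negative tuples
    sensitivity = tp / p
    specificity = tn / n
    error_rate = (fp + fn) / (p + n)
    return sensitivity, specificity, error_rate

# Hypothetical outcome: 90 TP, 10 FN, 80 TN, 20 FP
print(rates(90, 10, 80, 20))  # → (0.9, 0.8, 0.15)
```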
Evaluate the Outcome
• What can we tell about the classification results below?