
Advanced Analytical Theory and Methods: Classification
Madava Viranjan

1
What is Classification?
• Classification is a form of data analysis that extracts models describing important data classes

• These models, or classifiers, predict categorical class labels

• The classifier is presented with a set of examples that are already classified, and from these examples it learns to assign class labels to unseen examples

• Classification is a two-step process consisting of a learning step and a classification step

• Since we use a dataset with known class labels to train the classifier, classification comes under supervised learning

2
Some Examples
• A bank officer needs to analyze whether a loan application is safe or not. Categorical prediction as “safe” or “risky”

• A marketing manager of a reputed electronics shop needs to identify whether a given customer profile will buy a computer or not. Categorical prediction as “yes” or “no”

• A medical researcher wants to analyze breast cancer data to predict which one of three specific treatments a patient should receive. Categorical prediction as “treatment A”, “treatment B”, or “treatment C”

3
Classification vs Numeric Prediction
• In the previous example, if the marketing manager of the electronics shop wants to know how much a given customer will spend in one visit, then that task is a numeric prediction

• In numeric prediction we work with continuous-valued functions, but in classification we work with categorical values

• Regression analysis is one of the most common numeric prediction methodologies

4
Steps in Classification
1. Learning step
 Construct the classifier by learning from the training dataset

5
Steps in Classification
2. Classification step
 The model is used for classification.
 The accuracy of the prediction needs to be measured. If the same training dataset is used for this, the measures will be optimistic because the classifier tends to overfit the data. Therefore a separate dataset, called the test dataset, is needed

6
Decision Trees

7
Introduction
• Uses a tree structure to specify sequences of decisions and consequences.

• Input to a decision tree can be categorical or continuous.

• Nodes
• Tests some attribute
• Root node is the top node
• Internal nodes are the decision or test points
• Leaf nodes are at the end of the last branch, and they represent the class label
8
Introduction ctd.
• Branching of a node is called a split

• For each split it is required to use the most informative attribute as the splitting attribute

• A common way of selecting the most informative attribute is to use entropy-based methods

• In the end this seems to be just a bunch of if-else statements, so why is it considered data mining or machine learning?

• Consider the dataset below and identify how the data points will be arranged in the given decision tree
9
10
[Figure: example decision tree with split nodes X0 <= -2, X0 <= 2, and X1 <= 2]

11
Splitting an Attribute
• At any splitting point, the attribute with the maximum Information Gain will be selected as the splitting attribute

$Entropy(D) = -\sum_i p_i \log_2 p_i$, where $p_i$ = probability of class $i$

$Gain = Entropy(parent) - \sum_i w_i \, Entropy(child_i)$, where $w_i$ = size of a child relative to the parent
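A short Python sketch (not from the slides) of these two formulas; the class-label lists at the end are made up for illustration:

import math
from collections import Counter

def entropy(labels):
    # Entropy = -sum(p_i * log2(p_i)) over the class proportions p_i.
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(parent, children):
    # Gain = Entropy(parent) - sum(w_i * Entropy(child_i)),
    # where w_i is the size of child i relative to the parent.
    total = len(parent)
    weighted = sum((len(c) / total) * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = ["yes"] * 6 + ["no"] * 4
left = ["yes"] * 5 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 3
print(information_gain(parent, [left, right]))  # ~0.26 bits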

12
Splitting an Attribute: Example
• In the previous dataset there are two attributes that can be used to split. Consider the nodes below, each of which checks one attribute value. Show which attribute should be selected as the splitting attribute based on the information gain.

[Figure: two candidate splits at node A: a split on X0 <= -2 with children B and C, and a split on X1 <= 0.8 with children D and E]

13
How Does the Algorithm Work?
• The algorithm is called with three parameters
• D – data partition; initially, the complete set of training tuples
• Attribute_list – list of attributes describing the tuples
• Attribute_selection_method – procedure for selecting the attribute that best discriminates between the given tuples

Step 1: The tree starts with a single node N
Step 2: If all the tuples in D are in the same class, then N is a leaf
Step 3: Otherwise, Attribute_selection_method determines the splitting criterion
Step 4: The tuples in D are partitioned accordingly; the split can be on
• Discrete values
• Continuous values
• Discrete values with a binary tree
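A compact recursive sketch of this procedure for the discrete-value case (an illustration, not the exact algorithm from the slides; it reuses the entropy and information_gain helpers defined earlier):

from collections import Counter

def select_attribute(D, attribute_list):
    # Attribute_selection_method: pick the attribute with maximum information gain.
    def gain(a):
        values = {attrs[a] for attrs, _ in D}
        children = [[lab for attrs, lab in D if attrs[a] == v] for v in values]
        return information_gain([lab for _, lab in D], children)
    return max(attribute_list, key=gain)

def build_tree(D, attribute_list):
    """D is a list of (attributes_dict, label) tuples."""
    labels = [label for _, label in D]
    # Step 2: if all tuples are in the same class (or no attributes remain),
    # node N becomes a leaf labeled with the majority class.
    if len(set(labels)) == 1 or not attribute_list:
        return Counter(labels).most_common(1)[0][0]
    # Step 3: the attribute-selection method determines the splitting criterion.
    best = select_attribute(D, attribute_list)
    # Step 4: partition the tuples on each discrete value of the chosen attribute.
    node = {}
    for value in {attrs[best] for attrs, _ in D}:
        subset = [(attrs, label) for attrs, label in D if attrs[best] == value]
        remaining = [a for a in attribute_list if a != best]
        node[(best, value)] = build_tree(subset, remaining)
    return node

Continuous attributes would instead be split on a threshold, such as X0 <= -2 in the earlier figure.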

14
How Does the Algorithm Work?
• The expected information needed to classify a tuple in D is given by

$Info(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$

• The additional information required to arrive at an exact classification after partitioning D on attribute A is

$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$

• The information gain that can be obtained from the partitioning is

$Gain(A) = Info(D) - Info_A(D)$
15
16
RID Age Income Student Credit_rating Class: buys_computer
1 Youth High No Fair No
2 Youth High No Excellent No
3 Middle High No Fair Yes
4 Senior Medium No Fair Yes
5 Senior Low Yes Fair Yes
6 Senior Low Yes Excellent No
7 Middle Low Yes Excellent Yes
8 Youth Medium No Fair No
9 Youth Low Yes Fair Yes
10 Senior Medium Yes Fair Yes
11 Youth Medium Yes Excellent Yes
12 Middle Medium No Excellent Yes
13 Middle High Yes Fair Yes
14 Senior Medium No Excellent No

17
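Working these formulas through on the dataset above (values rounded to three decimals):

$Info(D) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$ bits

$Info_{Age}(D) = \frac{5}{14}(0.971) + \frac{4}{14}(0) + \frac{5}{14}(0.971) = 0.694$ bits, so $Gain(Age) = 0.940 - 0.694 = 0.246$ bits

Similarly, $Gain(Income) = 0.029$, $Gain(Student) = 0.151$, and $Gain(Credit\_rating) = 0.048$, so Age is selected as the splitting attribute at the root.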
Tree Pruning
• After constructing a tree there will be many branches that reflect anomalies in the data
• Tree pruning removes the least reliable branches
• Pruned trees tend to be smaller, less complex, and faster at classifying

• Prepruning
• The tree is pruned by halting its construction early.
• E.g. decide not to further split the subset at a given node; the current node then becomes a leaf.
• A threshold is set based on a measure such as statistical significance, information gain, or the Gini index.

• Postpruning
• Remove subtrees from the fully grown tree
• The most common approach

18
Why Are Decision Trees Popular?
• Do not require any domain knowledge to construct

• Can handle multidimensional data

• Since the knowledge is represented as a tree, it is easy for humans to understand

• The learning and classification steps are fast

• Provide good accuracy

19
Problems With Decision Trees
• Repetition and replication of tree branches lead to large trees

• The entire dataset must be loaded into memory

20
Statistical Classification

21
Naïve Bayes Classification
• Algorithm based on Bayes' theorem

• Assumes conditional independence of the attributes

• Computes the probability that a tuple belongs to a particular class

• Bayes' Theorem
• The conditional probability of event C occurring, given that event A has already occurred, is

$P(C|A) = \frac{P(A|C) \, P(C)}{P(A)}$
22
Naïve Bayes Classification
• Predicts that tuple X belongs to class $C_i$ if and only if

$P(C_i|X) > P(C_j|X) \text{ for } 1 \le j \le m,\ j \ne i$

• How to maximize $P(C_i|X)$? By Bayes' theorem,

$P(C_i|X) = \frac{P(X|C_i) \, P(C_i)}{P(X)}$

• Since $P(X)$ is constant for all classes, calculating the conditional probability of each attribute value given the class is required

23
RID Age Income Student Credit_rating Class: buys_computer

1 Youth High No Fair No


2 Youth High No Excellent No
3 Middle High No Fair Yes
4 Senior Medium No Fair Yes
5 Senior Low Yes Fair Yes
6 Senior Low Yes Excellent No
7 Middle Low Yes Excellent Yes
8 Youth Medium No Fair No
9 Youth Low Yes Fair Yes
10 Senior Medium Yes Fair Yes
11 Youth Medium Yes Excellent Yes
12 Middle Medium No Excellent Yes
13 Middle High Yes Fair Yes
14 Senior Medium No Excellent No 24
Naïve Bayes Classification
X = (Age = Youth, Income = Medium, Student = Yes, Credit_rating = Fair)

X = (Age = Senior, Income = High, Student = No, Credit_rating = Excellent)

• What if one of the conditional probabilities becomes 0? The whole product becomes 0

• Use the Laplacian correction: add 1 to each count so that no probability is exactly 0
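A small Python sketch (illustrative, not from the slides) that classifies the first tuple X above against the buys_computer dataset:

from collections import Counter

# The buys_computer dataset from the earlier slide: (age, income, student, credit, class).
data = [
    ("Youth","High","No","Fair","No"), ("Youth","High","No","Excellent","No"),
    ("Middle","High","No","Fair","Yes"), ("Senior","Medium","No","Fair","Yes"),
    ("Senior","Low","Yes","Fair","Yes"), ("Senior","Low","Yes","Excellent","No"),
    ("Middle","Low","Yes","Excellent","Yes"), ("Youth","Medium","No","Fair","No"),
    ("Youth","Low","Yes","Fair","Yes"), ("Senior","Medium","Yes","Fair","Yes"),
    ("Youth","Medium","Yes","Excellent","Yes"), ("Middle","Medium","No","Excellent","Yes"),
    ("Middle","High","Yes","Fair","Yes"), ("Senior","Medium","No","Excellent","No"),
]

priors = Counter(row[-1] for row in data)        # class counts: 9 Yes, 5 No
X = ("Youth", "Medium", "Yes", "Fair")

scores = {}
for c, prior_count in priors.items():
    rows = [row for row in data if row[-1] == c]
    score = prior_count / len(data)              # P(Ci)
    for k, value in enumerate(X):                # multiply by P(xk | Ci) for each attribute
        score *= sum(1 for row in rows if row[k] == value) / len(rows)
    scores[c] = score

print(scores)  # {'Yes': ~0.028, 'No': ~0.007} -> predicts buys_computer = Yes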

25
Naïve Bayes Classification

Text Tag
“A great game” Sports
“The election was over” Not sports
“Very clean match” Sports
“A clean but forgettable game” Sports
“It was a close election” Not sports

“A very close game” Sports or Not sports?
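One way to work this out (a sketch, assuming a bag-of-words model with the Laplacian correction; the five training texts contain a 14-word vocabulary, with 11 word occurrences in the Sports texts and 9 in the Not-sports texts):

$P(Sports \mid X) \propto \frac{3}{5} \cdot \frac{2+1}{11+14} \cdot \frac{1+1}{25} \cdot \frac{0+1}{25} \cdot \frac{2+1}{25} \approx 2.76 \times 10^{-5}$

$P(Not\ sports \mid X) \propto \frac{2}{5} \cdot \frac{1+1}{9+14} \cdot \frac{0+1}{23} \cdot \frac{1+1}{23} \cdot \frac{0+1}{23} \approx 5.72 \times 10^{-6}$

so “A very close game” is classified as Sports.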

26
Rule-based
Classification

27
Rule-based Classification
• Uses IF-THEN rules for classification
• R1: IF age = youth AND student = yes THEN buys_computer = yes

• Coverage and Accuracy: with $n_{covers}$ = number of tuples covered by rule R, $n_{correct}$ = number of those correctly classified, and $|D|$ = number of tuples in the dataset,

$coverage(R) = \frac{n_{covers}}{|D|} \qquad accuracy(R) = \frac{n_{correct}}{n_{covers}}$

• What is the coverage and accuracy of R1? (worked out below)
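A worked answer using the buys_computer dataset above: R1 covers the tuples with age = youth and student = yes (RIDs 9 and 11), and both have buys_computer = yes, so coverage(R1) = 2/14 ≈ 14.3% and accuracy(R1) = 2/2 = 100%.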

28
Evaluate Classifier Performance

29
Outcome of the Classification
• True Positive (TP)
• Positive tuples that were correctly labeled by the classifier

• True Negative (TN)
• Negative tuples that were correctly labeled by the classifier

• False Positive (FP)
• Negative tuples that were incorrectly labeled as positive by the classifier

• False Negative (FN)
• Positive tuples that were incorrectly labeled as negative by the classifier
30
Evaluate the Outcome
• Accuracy

$accuracy = \frac{TP + TN}{P + N}$

• Error Rate

$error\ rate = \frac{FP + FN}{P + N}$

where $P$ and $N$ are the total numbers of positive and negative tuples
31
Evaluate the Outcome

Actual \ Predicted Buys_computer = Yes Buys_computer = No Total
Buys_computer = Yes 6954 46 7000
Buys_computer = No 412 2588 3000
Total 7366 2634 10000
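From this confusion matrix: accuracy = (6954 + 2588)/10000 = 95.42% and error rate = (46 + 412)/10000 = 4.58%.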

32
Evaluate the Outcome
• Sensitivity
• True positive rate (recognition rate)

$sensitivity = \frac{TP}{P}$

• Specificity
• True negative rate

$specificity = \frac{TN}{N}$
33
Evaluate the Outcome
• What can we tell about the classification results below?

Actual \ Predicted Cancer = Yes Cancer = No Total
Cancer = Yes 90 210 300
Cancer = No 140 9560 9700
Total 230 9770 10000
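Working it out: accuracy = (90 + 9560)/10000 = 96.5%, which looks excellent, but sensitivity = 90/300 = 30% and specificity = 9560/9700 ≈ 98.6%. The classifier misses 70% of actual cancer cases, so high overall accuracy can be misleading when the classes are imbalanced.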

34
