DM Lect 9_Classification - Decision Trees
Supervised Learning
— Chapter 8 —
INTRODUCTION
• Given the following dataset of objects, first without class labels and then with a class label assigned to each object:

Objects  X  Y  Z
OB-1     1  4  1
OB-2     1  2  2
OB-3     1  4  2
OB-4     2  1  2
OB-5     1  1  1
OB-6     2  4  2
OB-7     1  1  2
OB-8     2  1  1

Objects  X  Y  Z  Class
OB-1     1  4  1  A
OB-2     1  2  2  B
OB-3     1  4  2  B
OB-4     2  1  2  A
OB-5     1  1  1  A
OB-6     2  4  2  B
OB-7     1  1  2  A
OB-8     2  1  1  A
Supervised vs. Unsupervised Learning
◼ Model construction: the model is represented as classification rules, decision trees, or mathematical formulae
◼ Model usage: for classifying future or unknown objects
◼ Estimate accuracy of the model on a test set that is independent of the training set
◼ Note: If the test set is used to select models, it is called validation (test) set
Process (1): Model Construction
[Figure: a classification algorithm is run on the Training Data to build the Classifier (the learned model); the Classifier is then applied to Testing Data and finally to Unseen Data, e.g. the tuple (Jeff, Professor, 4), to answer "Qualified?"]

NAME     RANK            YEARS  Qualified
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
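A minimal sketch of both steps, assuming scikit-learn is available; the integer encoding of RANK and the use of the four-row table above as stand-in training data are illustration choices, not part of the lecture.

```python
from sklearn.tree import DecisionTreeClassifier

# Step 1: model construction from labelled tuples (NAME is only an identifier and is dropped).
rank_code = {"Assistant Prof": 0, "Associate Prof": 1, "Professor": 2}  # hypothetical encoding
X_train = [[rank_code["Assistant Prof"], 2],
           [rank_code["Associate Prof"], 7],
           [rank_code["Professor"],      5],
           [rank_code["Assistant Prof"], 7]]
y_train = ["no", "no", "yes", "yes"]

clf = DecisionTreeClassifier(criterion="entropy")  # entropy-based splits, as in ID3/C4.5
clf.fit(X_train, y_train)

# Step 2: model usage, classifying an unseen tuple, e.g. (Jeff, Professor, 4).
jeff = [[rank_code["Professor"], 4]]
print(clf.predict(jeff))  # the model's answer to "Qualified?"
```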
Chapter 8. Classification: Basic Concepts
Algorithm for Decision Tree Induction
◼ Basic algorithm (a greedy algorithm; a minimal code sketch follows this list)
◼ Tree is constructed in a top-down recursive divide-and-conquer manner
◼ At start, all the training examples are at the root
◼ Attributes are categorical (if continuous-valued, they are discretized in advance)
◼ Examples are partitioned recursively based on selected attributes
◼ Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
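A minimal sketch of this greedy, top-down, divide-and-conquer procedure in plain Python. The function names (entropy, info_gain, build_tree) and the dict-based tree representation are illustration choices; the slides do not prescribe an implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Expected information needed to classify a tuple with this label distribution."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Reduction in entropy obtained by partitioning the rows on one attribute."""
    total = len(rows)
    split_info = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        split_info += (len(subset) / total) * entropy(subset)
    return entropy(labels) - split_info

def build_tree(rows, labels, attrs):
    # Stop: all samples at this node belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no remaining attributes, so use majority voting for the leaf.
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy choice: the attribute with the highest information gain.
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attrs if a != best])
    return tree
```

Called as build_tree(rows, labels, list_of_attribute_names), where each row is a dict of categorical attribute values, it returns either a class label (a leaf) or a nested dict keyed first by the chosen root attribute.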
Brief Review of Entropy
◼ Attribute 1: 4 objects of Class A, 4 of Class B
  H(Attribute 1) = −(4/8)·log2(4/8) − (4/8)·log2(4/8) = 1

◼ Attribute 2 (X, Y): 0 objects of Class A, 8 of Class B
  H(Attribute 2) = −(0/8)·log2(0/8) − (8/8)·log2(8/8) = 0   (taking 0·log2 0 = 0)

◼ Attribute 3 (M, N): 5 objects of Class A, 3 of Class B
  H(Attribute 3) = −(5/8)·log2(5/8) − (3/8)·log2(3/8) = 0.424 + 0.531 = 0.955

◼ Attribute 4 (K, L): 2 objects of Class A, 6 of Class B
  H(Attribute 4) = −(2/8)·log2(2/8) − (6/8)·log2(6/8) = 0.811
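The four values above can be checked with a few lines of plain Python; this is a verification sketch, not code from the lecture.

```python
import math

def entropy(counts):
    """Entropy of a class distribution given as a list of class counts."""
    total = sum(counts)
    h = -sum(c / total * math.log2(c / total) for c in counts if c > 0)
    return h + 0.0  # + 0.0 turns the -0.0 of a pure split into 0.0

# Class counts (Class A, Class B) for the four attributes above.
for name, counts in [("Attribute 1", [4, 4]),
                     ("Attribute 2", [0, 8]),
                     ("Attribute 3", [5, 3]),
                     ("Attribute 4", [2, 6])]:
    print(name, round(entropy(counts), 3))
# Prints 1.0, 0.0, 0.954 and 0.811; the slide reports 0.955 for Attribute 3
# because it rounds the two terms (0.424 and 0.531) before adding them.
```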
Attribute Selection Measure:
Information Gain (ID3/C4.5)
◼ Select the attribute with the highest information gain
◼ Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_i,D| / |D|
◼ Expected information (entropy) needed to classify a tuple in D:
  Info(D) = − Σ_{i=1}^{m} p_i · log2(p_i)
◼ Information needed (after using A to split D into v partitions) to classify D:
  Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) × Info(D_j)
◼ Information gained by branching on attribute A:
  Gain(A) = Info(D) − Info_A(D)
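These formulas translate directly into code. In the sketch below the class distributions are given as lists of counts; the helper names (info, info_after_split, gain) are illustrative, not from the slides.

```python
import math

def info(counts):
    """Info(D) = -sum_i p_i * log2(p_i), computed from the class counts in D."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def info_after_split(partitions):
    """Info_A(D) = sum_j |D_j|/|D| * Info(D_j); each partition is a list of class counts."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * info(p) for p in partitions)

def gain(class_counts, partitions):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(class_counts) - info_after_split(partitions)

# Example: a split that separates the classes perfectly recovers the full entropy,
# e.g. gain([5, 3], [[5, 0], [0, 3]]) is about 0.954.
```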
Decision Trees Using ID3 Algorithm
b. Which attribute should you choose as the root of a decision tree?

Training tuples visible on this slide (the remaining tuples of the 14-tuple set are not shown here):

age    income  student  credit_rating  buys_computer
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no

◼ Entropy of the full training set D (14 tuples, 9 Yes / 5 No): Info(D) = 0.940
◼ Splitting on age:
  <=30: 5 tuples [3 No, 2 Yes]
  31..40: 4 tuples [0 No, 4 Yes]
  >40: 5 tuples [2 No, 3 Yes]
  Gain(age) ≈ 0.246
◼ Gain(income) = 0.029
◼ Gain(student) = 0.151
◼ Gain(credit_rating) = 0.048
◼ So, Age is the root of the tree, because it has the greatest information gain value (a verification sketch follows below).
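A quick check of this choice from the counts shown above; only Gain(age) is recomputed here, the other gain values are taken from the slide.

```python
import math

def info(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

info_d = info([9, 5])                                     # 9 Yes, 5 No -> about 0.940
branches = [[3, 2], [0, 4], [2, 3]]                       # [No, Yes] counts per age branch
info_age = sum(sum(b) / 14 * info(b) for b in branches)   # about 0.694
print(round(info_d - info_age, 3))                        # about 0.247; rounding the Info
                                                          # values first gives the usual 0.246

# Compared with Gain(income) = 0.029, Gain(student) = 0.151 and
# Gain(credit_rating) = 0.048, age gives the largest gain, so it becomes the root.
```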
Decision Trees Using ID3 Algorithm
[Exercise table residue: only the class column, Edible, is recoverable here: 3 objects labelled Yes and 5 labelled No. As in the previous example, the root is the attribute whose information gain with respect to the class, e.g. IG(Edible/Smooth), is greatest.]
Decision Trees Using ID3 Algorithm
◼ Too many branches may reflect anomalies due to noise or outliers
◼ Poor accuracy for unseen samples
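A small illustration of these two points, assuming scikit-learn is available; the synthetic noisy dataset and the depth limit are illustration choices, not part of the lecture. The fully grown tree typically fits its training data almost perfectly yet scores lower on held-out samples than a depth-limited tree.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with deliberately noisy labels (flip_y adds label noise).
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)                 # fully grown tree
small = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)   # depth-limited tree

print("fully grown :", full.score(X_tr, y_tr), full.score(X_te, y_te))
print("max_depth=3 :", small.score(X_tr, y_tr), small.score(X_te, y_te))
```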