Information Gain
These slides were assembled by Byron Boots, with grateful acknowledgement to Eric Eaton and the many
others who made their course materials freely available online. Feel free to reuse or adapt these slides for
your own academic purposes, provided that you include proper attribution.
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
Last Time: Basic Algorithm for
Top-Down Learning of Decision Trees
[ID3, C4.5 by Quinlan]
Based on slide by Pedro Domingos
Sample Entropy
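For reference, the definition the worked example below relies on: for a sample S with a proportion p⊕ of positive examples and p⊖ of negative examples, the sample entropy is H(S) = -p⊕ · log2 p⊕ - p⊖ · log2 p⊖.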
Based on slide by Pedro Domingos
From Entropy to Information Gain
Entropy H(X) of a random variable X:
H(X) = -Σi P(X = i) · log2 P(X = i)
child impurity: entropy = -(1/13) · log2(1/13) - (12/13) · log2(12/13) = 0.391
(Weighted) average entropy of children = (17/30) · 0.787 + (13/30) · 0.391 = 0.615
Information gain = 0.996 - 0.615 = 0.38
Based on slide by Pedro Domingos
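A minimal Python sketch of this computation. The class counts below are an assumption chosen to be consistent with the slide's numbers (14 positive / 16 negative at the parent, splitting into children with 13+/4- and 1+/12-):

import math

def entropy(counts):
    # Entropy (base 2) of a class distribution given as a list of counts
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Assumed class counts matching the slide's numbers:
# parent 14+/16-, children 13+/4- (17 examples) and 1+/12- (13 examples)
parent = [14, 16]
children = [[13, 4], [1, 12]]

parent_entropy = entropy(parent)                  # ~0.996
child_entropies = [entropy(c) for c in children]  # ~0.787, ~0.391
n = sum(parent)
avg_child = sum(sum(c) / n * h
                for c, h in zip(children, child_entropies))  # ~0.615
info_gain = parent_entropy - avg_child            # ~0.38
print(round(parent_entropy, 3), round(avg_child, 3), round(info_gain, 3))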
Entropy-Based Automatic Decision
Tree Construction
Based on slide by Pedro Domingos
Using Information Gain to Construct a Decision Tree

Choose the attribute A with the highest information gain for the full training set X at the root of the tree. Construct a child node for each value v1, v2, ..., vk of A; each child has an associated subset X' = {x ∈ X | value(A) = vi} of vectors in which A takes that particular value. Repeat recursively on each subset. Till when?

[Figure: the full training set X is split at the root on attribute A, with one branch per value v1, v2, ..., vk]
Based on slide by Pedro Domingos
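The recursion above can be written compactly. The following is a minimal sketch of the procedure (not Quinlan's full implementation), assuming examples are represented as dicts mapping attribute names to values; the two stopping rules (pure node, or no attributes left) are one answer to "till when?":

import math
from collections import Counter

def entropy(labels):
    # Entropy (base 2) of a list of class labels
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(examples, labels, attr):
    # Information gain of splitting the sample on attribute attr
    n = len(labels)
    gain = entropy(labels)
    for v in set(x[attr] for x in examples):
        subset = [y for x, y in zip(examples, labels) if x[attr] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

def id3(examples, labels, attributes):
    # Recursive top-down construction. Stops when the node is pure or no
    # attributes remain, falling back to a majority vote at impure leaves.
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    a = max(attributes, key=lambda attr: info_gain(examples, labels, attr))
    children = {}
    for v in set(x[a] for x in examples):
        idx = [i for i, x in enumerate(examples) if x[a] == v]
        children[v] = id3([examples[i] for i in idx],
                          [labels[i] for i in idx],
                          [b for b in attributes if b != a])
    return (a, children)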
Sample Dataset (was Tennis Played?)
• Columns denote features Xi
• Rows denote labeled instances ⟨xi, yi⟩
• Class label denotes whether a tennis game was played
Slide by Tom Mitchell
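For illustration, a few rows in the format the id3 sketch above expects. The attribute names follow Mitchell's PlayTennis data, but this tiny subset is purely illustrative:

# Hypothetical subset of PlayTennis-style rows (feature dicts plus labels)
examples = [
    {"Outlook": "Sunny",    "Humidity": "High",   "Wind": "Weak"},
    {"Outlook": "Sunny",    "Humidity": "Normal", "Wind": "Strong"},
    {"Outlook": "Overcast", "Humidity": "High",   "Wind": "Weak"},
    {"Outlook": "Rain",     "Humidity": "High",   "Wind": "Strong"},
]
labels = ["No", "Yes", "Yes", "No"]   # was tennis played?
tree = id3(examples, labels, ["Outlook", "Humidity", "Wind"])
print(tree)   # nested (attribute, {value: subtree}) tuples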
Which Tree Should We Output?
• ID3 performs a heuristic search through the space of decision trees
• It stops at the smallest acceptable tree. Why?