
Decision Trees:

Information Gain

These slides were assembled by Byron Boots, with grateful acknowledgement to Eric Eaton and the many
others who made their course materials freely available online. Feel free to reuse or adapt these slides for
your own academic purposes, provided that you include proper attribution.
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
Last Time: Basic Algorithm for
Top-Down Learning of Decision Trees
[ID3, C4.5 by Quinlan]

node = root of decision tree


Main loop (each step is sketched in code below):
1. A ← the "best" decision attribute for the next node.
2. Assign A as decision attribute for node.
3. For each value of A, create a new descendant of node.
4. Sort training examples to leaf nodes.
5. If training examples are perfectly classified, stop. Else,
recurse over new leaf nodes.
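
A minimal Python sketch of this loop, assuming training examples are stored as (feature_dict, label) pairs; best_attribute is a placeholder for the attribute-scoring criterion developed on the following slides (names and structure are illustrative, not Quinlan's code):

from collections import Counter

def build_tree(examples, attributes, best_attribute):
    """Top-down decision-tree learning (ID3-style skeleton).

    examples:       list of (feature_dict, label) pairs
    attributes:     attribute names still available for splitting
    best_attribute: scoring function, e.g. information gain (defined later)
    """
    labels = [y for _, y in examples]
    # Stop when the node is pure or no attributes remain: return a leaf
    # labeled with the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    a = best_attribute(examples, attributes)           # 1. pick "best" attribute
    tree = {a: {}}                                     # 2. assign A to this node
    for v in {x[a] for x, _ in examples}:              # 3. one branch per value of A
        subset = [(x, y) for x, y in examples if x[a] == v]    # 4. sort examples
        rest = [b for b in attributes if b != a]
        tree[a][v] = build_tree(subset, rest, best_attribute)  # 5. recurse
    return tree

The recursion mirrors step 5: it stops once a node's examples are perfectly classified or no attributes remain.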

How do we choose which attribute is best?


Entropy

Entropy H(X) of a random variable X, where n is the number of possible values of X:

    H(X) = - Σ_{i=1}^{n} P(X = i) log2 P(X = i)

H(X) is the expected number of bits needed to encode a randomly drawn value of X (under the most efficient code).

Why? Information theory:
• The most efficient code assigns -log2 P(X = i) bits to encode the message X = i.
• So, the expected number of bits to encode one random X is:

    Σ_{i=1}^{n} P(X = i) (- log2 P(X = i)) = H(X)

Slide by Tom Mitchell
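
A direct translation of this definition into Python, for concreteness (a minimal sketch; it takes the distribution of X as a list of probabilities):

import math

def entropy(probs):
    """H(X) = -sum_i P(X=i) * log2 P(X=i), measured in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin takes 1 bit to encode on average; a fair 8-sided die takes 3 bits.
print(entropy([0.5, 0.5]))    # 1.0
print(entropy([1/8] * 8))     # 3.0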



2-Class Cases:

Entropy: H(X) = - Σ_{i=1}^{n} P(x = i) log2 P(x = i)

• What is the entropy of a group in which all examples belong to the same class?
  – entropy = - 1 log2 1 = 0   (minimum impurity)

• What is the entropy of a group with 50% in either class?
  – entropy = - 0.5 log2 0.5 - 0.5 log2 0.5 = 1   (maximum impurity)

Based on slide by Pedro Domingos
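
To see how impurity varies between these two extremes, here is a minimal sketch of the 2-class entropy as a function of the class-1 fraction p (the helper name binary_entropy is illustrative):

import math

def binary_entropy(p):
    """Entropy of a 2-class group in which a fraction p of examples is in class 1."""
    if p in (0.0, 1.0):
        return 0.0          # pure group: minimum impurity
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5):
    print(f"p={p:.1f}  H={binary_entropy(p):.3f}")
# rises from 0 bits for a pure group to 1 bit for a 50/50 split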
Sample Entropy

S is a sample of training examples; p+ is the proportion of positive examples in S and p- the proportion of negative examples. The entropy of S measures its impurity:

    H(S) = - p+ log2 p+ - p- log2 p-

Slide by Tom Mitchell


Information Gain
• We want to determine which attribute in a given set of training feature vectors is most useful for discriminating between the classes to be learned.

• Information gain tells us how important a given attribute of the feature vectors is.

• We will use it to decide the ordering of attributes in the nodes of a decision tree.

Based on slide by Pedro Domingos
From Entropy to Information Gain

Entropy H(X) of a random variable X:
    H(X) = - Σ_{i=1}^{n} P(X = i) log2 P(X = i)

Specific conditional entropy H(X|Y=v) of X given Y = v:
    H(X|Y=v) = - Σ_{i=1}^{n} P(X = i | Y = v) log2 P(X = i | Y = v)

Conditional entropy H(X|Y) of X given Y:
    H(X|Y) = Σ_{v ∈ values(Y)} P(Y = v) H(X|Y=v)

Mutual information (aka Information Gain) of X and Y:
    I(X, Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)

Slide by Tom Mitchell
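
These definitions translate directly into code; the sketch below estimates each quantity from paired observations of X and Y (function and variable names are illustrative, not from the slides):

import math
from collections import Counter

def entropy(values):
    """Empirical entropy H(X) of a list of observed values, in bits."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def conditional_entropy(xs, ys):
    """Empirical H(X|Y) = sum_v P(Y=v) * H(X|Y=v)."""
    n = len(ys)
    h = 0.0
    for v, c in Counter(ys).items():
        x_given_v = [x for x, y in zip(xs, ys) if y == v]   # specific H(X|Y=v)
        h += (c / n) * entropy(x_given_v)
    return h

def information_gain(xs, ys):
    """Mutual information I(X;Y) = H(X) - H(X|Y)."""
    return entropy(xs) - conditional_entropy(xs, ys)

# X is perfectly predicted by Y here, so I(X;Y) = H(X) = 1 bit.
xs = ["a", "a", "b", "b"]
ys = [0, 0, 1, 1]
print(information_gain(xs, ys))   # 1.0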



Information Gain

Information Gain is the mutual information between input attribute A and target variable Y.

Information Gain is the expected reduction in entropy of target variable Y for data sample S, due to sorting on variable A:

    Gain(S, A) = H_S(Y) - H_S(Y | A)

Slide by Tom Mitchell


Calculating Information Gain

Information Gain = entropy(parent) – [average entropy(children)]

Entire population (30 instances), split into a 17-instance child and a 13-instance child:

parent entropy = - (14/30) log2 (14/30) - (16/30) log2 (16/30) = 0.996

child 1 impurity (17 instances):
    entropy = - (13/17) log2 (13/17) - (4/17) log2 (4/17) = 0.787

child 2 impurity (13 instances):
    entropy = - (1/13) log2 (1/13) - (12/13) log2 (12/13) = 0.391

(Weighted) Average Entropy of Children = (17/30) × 0.787 + (13/30) × 0.391 = 0.615

Information Gain = 0.996 - 0.615 = 0.38
Based on slide by Pedro Domingos
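
The arithmetic above is easy to check in a few lines of Python (a minimal sketch; the class counts are taken from the slide):

import math

def entropy(counts):
    """Entropy (in bits) of a node, given its per-class example counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

parent = [14, 16]              # full population: 30 instances
children = [[13, 4], [1, 12]]  # the two child nodes: 17 and 13 instances

parent_h = entropy(parent)
avg_child_h = sum(sum(c) / sum(parent) * entropy(c) for c in children)
gain = parent_h - avg_child_h

print(f"parent={parent_h:.3f}  children={avg_child_h:.3f}  gain={gain:.2f}")
# parent=0.997  children=0.616  gain=0.38
# (the slide rounds the intermediate values to 0.996 and 0.615)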
Entropy-Based Automatic Decision Tree Construction

Training set X = {x1, x2, …, xn}, where each xi = (fi1, fi2, …, fim).
At Node 1 (the root of the tree): what feature should be used? What values?

Quinlan suggested information gain in his ID3 system.

Based on slide by Pedro Domingos
Using Information Gain to Construct a Decision Tree

• Choose the attribute A with the highest information gain for the full training set X at the root of the tree.
• Construct child nodes for each value v1, v2, …, vk of A. Each child has an associated subset of vectors X′ = {x ∈ X | value(A) = v} in which A has that particular value.
• Repeat recursively (see the sketch below). Till when?

Based on slide by Pedro Domingos
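
The attribute-selection step follows directly from this recipe; a minimal sketch (illustrative names, assuming the same (feature_dict, label) representation as the earlier skeleton, where best_attribute would be passed in as the scoring function):

import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute):
    """Reduction in label entropy from splitting `examples` on `attribute`."""
    labels = [y for _, y in examples]
    gain = entropy(labels)
    for v in {x[attribute] for x, _ in examples}:
        # X' = {x in X | value(A) = v}: the subset sent down the branch for value v
        branch = [y for x, y in examples if x[attribute] == v]
        gain -= (len(branch) / len(examples)) * entropy(branch)
    return gain

def best_attribute(examples, attributes):
    """Choose the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(examples, a))

As for "till when?": in the earlier skeleton the recursion stops when a node's examples all share one label, or when no attributes remain, at which point the node becomes a leaf with the majority label.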
Sample Dataset (was Tennis Played?)
• Columns denote features Xi
• Rows denote labeled instances ⟨xi, yi⟩
• Class label denotes whether a tennis game was played

Slide by Tom Mitchell
Which Tree Should We Output?
• ID3 performs a heuristic search through the space of decision trees.
• It stops at the smallest acceptable tree. Why?

Occam's razor: prefer the simplest hypothesis that fits the data.

Slide by Tom Mitchell
