
18. Decision Tree

The document discusses decision trees, focusing on the ID3 algorithm, which uses entropy and information gain to create a model for classification. It explains the process of selecting attributes based on their information gain and outlines the characteristics and potential issues of decision trees, such as overfitting. Additionally, it introduces the concept of Occam's Razor in relation to hypothesis selection in machine learning.


Decision Trees

■ Decision tree representation
■ ID3 (Iterative Dichotomizer 3) learning algorithm
■ Entropy, information gain
■ Overfitting
ID3: Entropy

Entropy(S) = -p+ log2 p+ - p- log2 p-

[Figure: samples of positive (+) and negative (-) training examples]

• S is a sample of training examples taken from a population
• p+ is the proportion of positive examples; when p+ is very high or very low, entropy is low
• p- is the proportion of negative examples
• Entropy measures the impurity of S: the more impure S is, the higher the entropy
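As a concrete illustration (not part of the original slides), here is a minimal Python sketch of this entropy formula; the (9, 5) counts in the check correspond to the Play Tennis table used later in the deck:

```python
import math

def entropy(pos, neg):
    """Entropy of a sample with `pos` positive and `neg` negative examples.

    Entropy(S) = -p+ log2 p+ - p- log2 p-, with 0 * log2(0) treated as 0.
    """
    total = pos + neg
    if total == 0:
        return 0.0
    result = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:
            result -= p * math.log2(p)
    return result

print(entropy(9, 5))   # ~0.940, the [9+,5-] Play Tennis table used later
print(entropy(7, 7))   # 1.0 -> maximally impure sample
print(entropy(14, 0))  # 0.0 -> pure sample
```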
Top-Down Induction of Decision Trees - ID3, a Greedy BFS Method

1. Start by calculating the entropy of the decision node.
2. A ← the "best" decision attribute for the next node (the split with the least entropy, i.e. the highest information gain).
3. For each value of A, create a new descendant.
4. If all training examples are perfectly classified (same value of the target attribute), stop; else iterate over the new leaf nodes.
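A minimal, self-contained Python sketch of this greedy procedure (an illustrative reconstruction, not code from the slides); it assumes each example is a dict of attribute values with a 'PlayTennis' target, as in the table on the next slide:

```python
import math
from collections import Counter

def entropy(examples, target='PlayTennis'):
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(examples, attr, target='PlayTennis'):
    total = len(examples)
    remainder = 0.0
    for value in set(ex[attr] for ex in examples):
        subset = [ex for ex in examples if ex[attr] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, attributes, target='PlayTennis'):
    labels = [ex[target] for ex in examples]
    # Step 4: stop if the examples are perfectly classified
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:                       # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-2: pick the attribute with the highest information gain
    best = max(attributes, key=lambda a: info_gain(examples, a, target))
    # Step 3: one branch per value of the chosen attribute, then recurse
    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, rest, target)
    return tree
```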
Entropy(S) = -p+ log2 p+ - p- log2 p-

Training Examples

Day  Outlook   Temp.  Humidity  Wind    Play Tennis
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Strong  Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Weak    Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No
Entropy of the Table's Decision Field

• Entropy(S) = -(9/14) x log2(9/14) - (5/14) x log2(5/14) = 0.940
• This measures the amount of disorder in the decision (Play Tennis) column at the ROOT decision node
Information Gain

• Gain(S,A): the expected reduction in entropy due to sorting S on attribute A

  Gain(S,A) = Entropy(S) - Σ v∈Values(A) (|Sv|/|S|) x Entropy(Sv)
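A small counts-based Python sketch of this formula for the two-class case (an illustration, not from the slides); the (pos, neg) numbers in the check are the Humidity split shown on the next slide:

```python
import math

def entropy(pos, neg):
    """Two-class entropy from raw counts; a zero count contributes 0."""
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

def info_gain(parent, branches):
    """Gain(S,A) = Entropy(S) - sum over values v of |Sv|/|S| * Entropy(Sv).

    `parent` is the (pos, neg) count of S; `branches` lists one
    (pos, neg) count per value of attribute A.
    """
    total = sum(parent)
    weighted = sum((p + n) / total * entropy(p, n) for p, n in branches)
    return entropy(*parent) - weighted

# Humidity split from the next slide: High = [3+,4-], Normal = [6+,1-]
print(info_gain((9, 5), [(3, 4), (6, 1)]))  # ~0.152 (the slide's 0.151 uses rounded branch entropies)
```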
Selecting the Next Attribute

S = [9+,5-], E(S) = 0.940

Humidity:
  High   → [3+,4-], E = 0.985
  Normal → [6+,1-], E = 0.592
Gain(S,Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151

Wind:
  Weak   → [6+,2-], E = 0.811
  Strong → [3+,3-], E = 1.0
Gain(S,Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048

Humidity provides greater information gain than Wind with respect to the target classification.
Selecting the Next Attribute

S = [9+,5-], E(S) = 0.940

Outlook:
  Sunny    → [2+,3-], E = 0.971
  Overcast → [4+,0-], E = 0.0
  Rain     → [3+,2-], E = 0.971
Gain(S,Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247
Temperature

S = [9+,5-], E(S) = 0.940

Temperature:
  Hot  → [2+,2-], E = 1.00
  Cool → [3+,1-], E = 0.81
  Mild → [4+,2-], E = 0.92
Gain(S,Temperature) = 0.940 - (4/14)*1.00 - (4/14)*0.81 - (6/14)*0.92 = 0.029
Selecting the Next Attribute

The information gain values for the 4 attributes are:
• Gain(S,Outlook)     = 0.247
• Gain(S,Humidity)    = 0.151
• Gain(S,Wind)        = 0.048
• Gain(S,Temperature) = 0.029

where S denotes the collection of training examples.
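To tie the worked examples together, here is a self-contained Python sketch (an illustration, not from the slides) that recomputes these four gains directly from the training table above:

```python
import math
from collections import Counter

# The 14 Play Tennis examples from the training table (Day column omitted)
DATA = [
    # (Outlook, Temp, Humidity, Wind, PlayTennis)
    ('Sunny', 'Hot', 'High', 'Weak', 'No'),
    ('Sunny', 'Hot', 'High', 'Strong', 'No'),
    ('Overcast', 'Hot', 'High', 'Weak', 'Yes'),
    ('Rain', 'Mild', 'High', 'Weak', 'Yes'),
    ('Rain', 'Cool', 'Normal', 'Weak', 'Yes'),
    ('Rain', 'Cool', 'Normal', 'Strong', 'No'),
    ('Overcast', 'Cool', 'Normal', 'Strong', 'Yes'),
    ('Sunny', 'Mild', 'High', 'Weak', 'No'),
    ('Sunny', 'Cool', 'Normal', 'Weak', 'Yes'),
    ('Rain', 'Mild', 'Normal', 'Weak', 'Yes'),
    ('Sunny', 'Mild', 'Normal', 'Strong', 'Yes'),
    ('Overcast', 'Mild', 'High', 'Strong', 'Yes'),
    ('Overcast', 'Hot', 'Normal', 'Weak', 'Yes'),
    ('Rain', 'Mild', 'High', 'Strong', 'No'),
]
COLS = {'Outlook': 0, 'Temperature': 1, 'Humidity': 2, 'Wind': 3}

def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def gain(rows, attr):
    i, n = COLS[attr], len(rows)
    values = set(r[i] for r in rows)
    remainder = sum(len([r for r in rows if r[i] == v]) / n *
                    entropy([r for r in rows if r[i] == v]) for v in values)
    return entropy(rows) - remainder

for attr in COLS:
    print(attr, round(gain(DATA, attr), 3))
# Outlook ~0.247, Temperature ~0.029, Humidity ~0.152 (0.151 on the slide), Wind ~0.048
```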
ID3 Algorithm

Root: Outlook, S = [D1,D2,...,D14], [9+,5-]
  Sunny    → Ssunny = [D1,D2,D8,D9,D11], [2+,3-] → ?
  Overcast → [D3,D7,D12,D13], [4+,0-]            → Yes
  Rain     → [D4,D5,D6,D10,D14], [3+,2-]         → ?

Gain(Ssunny,Humidity) = 0.970 - (3/5)*0.0 - (2/5)*0.0 = 0.970
Gain(Ssunny,Temp.)    = 0.970 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0 = 0.570
Gain(Ssunny,Wind)     = 0.970 - (2/5)*1.0 - (3/5)*0.918 = 0.019
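These Ssunny gains can be checked directly from the (positive, negative) counts of the five Sunny-day examples; a minimal sketch, not from the slides, that mirrors the arithmetic above:

```python
import math

def H(*counts):
    """Entropy from class counts (zero counts contribute 0)."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

# Ssunny = 5 examples, [2+,3-]
e = H(2, 3)                                                      # ~0.97
print(e - (3/5) * H(0, 3) - (2/5) * H(2, 0))                     # Humidity (High, Normal) -> ~0.97
print(e - (2/5) * H(0, 2) - (2/5) * H(1, 1) - (1/5) * H(1, 0))   # Temp (Hot, Mild, Cool)  -> ~0.57
print(e - (3/5) * H(1, 2) - (2/5) * H(1, 1))                     # Wind (Weak, Strong)     -> ~0.02
```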
ID3 Algorithm

Outlook
  Sunny    → Humidity
               High   → No   [D1,D2,D8]
               Normal → Yes  [D9,D11]
  Overcast → Yes  [D3,D7,D12,D13]
  Rain     → Wind
               Strong → No   [D6,D14]
               Weak   → Yes  [D4,D5,D10]
Converting a Tree to Rules

Outlook
  Sunny    → Humidity
               High   → No
               Normal → Yes
  Overcast → Yes
  Rain     → Wind
               Strong → No
               Weak   → Yes

R1: If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No
R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes
R3: If (Outlook=Overcast) Then PlayTennis=Yes
R4: If (Outlook=Rain) ∧ (Wind=Strong) Then PlayTennis=No
R5: If (Outlook=Rain) ∧ (Wind=Weak) Then PlayTennis=Yes
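The five rules map directly onto a plain conditional; a small Python sketch (the function name is illustrative, not from the slides):

```python
def play_tennis(outlook, humidity, wind):
    """Classify one day using rules R1-R5 read off the tree above."""
    if outlook == 'Sunny':
        return 'No' if humidity == 'High' else 'Yes'    # R1, R2
    if outlook == 'Overcast':
        return 'Yes'                                    # R3
    if outlook == 'Rain':
        return 'No' if wind == 'Strong' else 'Yes'      # R4, R5
    raise ValueError(f'unknown outlook: {outlook}')

print(play_tennis('Sunny', 'High', 'Weak'))   # No  (R1)
print(play_tennis('Rain', 'Normal', 'Weak'))  # Yes (R5)
```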
Some Characteristics of Decision Trees
• Discrete-valued target function
• Suitable for discrete attribute values, though continuous attributes are also possible
• The target function is expressed as a disjunction of conjunctions
• Searches the complete space of finite discrete-valued functions for the correct hypothesis - the space of decision trees
• However, it returns one single final hypothesis
• Greedy search with no backtracking, so the solution may be sub-optimal
• Uses all training examples at each step
• Robust, but may suffer from overfitting
Occam's Razor
"If two theories explain the facts equally well, then the simpler theory is to be preferred."

Arguments in favor:
• There are fewer short hypotheses than long hypotheses
• A short hypothesis that fits the data is unlikely to be a coincidence
• A long hypothesis that fits the data might be a coincidence

Arguments opposed:
• There are many ways to define small sets of hypotheses

Occam's Razor supports ID3's inductive bias: after training, the learner justifies future classifications (generalization) by preferring the shortest tree.
