Decision-Tree Learning
• Introduction
– Decision Trees
– TDIDT: Top-Down Induction of Decision Trees
• ID3
– Attribute selection
– Entropy, Information, Information Gain
– Gain Ratio
• C4.5
– Numeric Values
– Missing Values
– Pruning
• Regression and Model Trees
Acknowledgements: Many slides based on Frank & Witten, a few on Tan, Steinbach & Kumar
Decision Trees
To classify an example:
1. start at the root
2. perform the test at the current node
3. follow the edge corresponding to the outcome of the test
4. go to 2 unless the current node is a leaf
5. predict the class associated with the leaf
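A minimal sketch of this traversal in Python, assuming a hypothetical nested-dict tree representation (an internal node is a dict with a test attribute and one branch per outcome; a leaf is just a class label). The hand-built weather tree used for illustration assumes the usual rainy → Windy split, which is not shown in this excerpt:

```python
# Sketch only: hypothetical nested-dict representation of a decision tree.
# Internal node: {"attribute": <test attribute>, "branches": {outcome: subtree}}
# Leaf: the predicted class label (a plain string).

def classify(tree, example):
    """Classify one example by following tests from the root to a leaf."""
    node = tree
    while isinstance(node, dict):           # repeat until a leaf is reached (step 4)
        test_attribute = node["attribute"]  # perform the test (step 2)
        outcome = example[test_attribute]
        node = node["branches"][outcome]    # follow the matching edge (step 3)
    return node                             # predict the class at the leaf (step 5)

# Hand-built example tree for the weather data (the rainy -> Windy split is an
# assumption for illustration; it does not appear in this excerpt).
weather_tree = {
    "attribute": "Outlook",
    "branches": {
        "overcast": "yes",
        "sunny": {"attribute": "Humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "rainy": {"attribute": "Windy",
                  "branches": {"true": "no", "false": "yes"}},
    },
}

print(classify(weather_tree, {"Outlook": "sunny", "Humidity": "normal"}))  # -> yes
```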
Decision Tree Learning
• Function ID3
– Input: Example set S
– Output: Decision Tree DT
• If all examples in S belong to the same class c
– return a new leaf and label it with c
• Else
– i. Select an attribute A according to some heuristic function
– ii. Generate a new node DT with A as test
– iii. For each value vi of A
• (a) Let Si = all examples in S with A = vi
• (b) Use ID3 to construct a decision tree DTi for example set Si
• (c) Generate an edge that connects DT and DTi
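A runnable Python sketch of this recursion, under assumptions not stated in the pseudocode: examples are dicts with a "class" key, the selection heuristic is information gain (developed in the following slides), and a majority-class leaf is returned when no attributes remain:

```python
import math
from collections import Counter

def entropy(examples, target="class"):
    """E(S) = -sum_c p_c * log2(p_c) over the class distribution of S."""
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def id3(examples, attributes, target="class"):
    """Return a leaf label or a node {"attribute": A, "branches": {value: subtree}}."""
    classes = {ex[target] for ex in examples}
    # If all examples belong to the same class c, return a leaf labelled c
    if len(classes) == 1:
        return classes.pop()
    # Assumption (not in the pseudocode): majority-class leaf when attributes run out
    if not attributes:
        return Counter(ex[target] for ex in examples).most_common(1)[0][0]
    # (i) Select an attribute A by the heuristic (here: highest information gain)
    def gain(a):
        value_counts = Counter(ex[a] for ex in examples)
        avg = sum((n / len(examples)) *
                  entropy([ex for ex in examples if ex[a] == v], target)
                  for v, n in value_counts.items())
        return entropy(examples, target) - avg
    best = max(attributes, key=gain)
    # (ii) Generate a new node with A as the test
    node = {"attribute": best, "branches": {}}
    # (iii) For each value v of A: build the subtree for S_v and attach it via an edge
    for v in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == v]
        node["branches"][v] = id3(subset, [a for a in attributes if a != best], target)
    return node
```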
A Different Decision Tree
Entropy
• S is a set of examples
• p⊕ is the proportion of examples in class ⊕
• p⊖ = 1 − p⊕ is the proportion of examples in class ⊖
• Interpretation: entropy measures the amount of unorderedness in the class distribution of S
Example: Attribute Outlook
• Outlook = rainy: 2 examples yes, 3 examples no
• Note: the term 0 · log2 0 (which arises when a subset is pure) is normally undefined; here it is taken to be 0
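For the class counts in the bullet above (2 yes, 3 no out of 5), the entropy works out to

\[
E = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} \approx 0.971 \text{ bits,}
\]

whereas a pure subset (all yes or all no) has entropy 0, which is where the 0 · log2 0 convention above is needed.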
Entropy (for more classes)
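The generalisation itself is not shown in this excerpt; for c classes with proportions p1, …, pc it is presumably the standard form

\[
E(S) = -\sum_{i=1}^{c} p_i \log_2 p_i .
\]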
• Problem:
– Entropy only computes the quality of a single (sub-)set of examples
• corresponds to a single value
– How can we compute the quality of the entire split?
• corresponds to an entire attribute
• Solution:
– Compute the weighted average over all sets resulting from the split
• weighted by their size
I(S, A) = Σi (|Si| / |S|) · E(Si), where Si is the subset of examples with A = vi
• Example:
• Average entropy for attribute Outlook:
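The numbers themselves are missing here; assuming the standard 14-example weather data, where Outlook splits the examples into subsets of size 5, 4 and 5 with entropies 0.971, 0 and 0.971, the weighted average is

\[
I(S, \text{Outlook}) = \tfrac{5}{14}\cdot 0.971 + \tfrac{4}{14}\cdot 0 + \tfrac{5}{14}\cdot 0.971 \approx 0.693 \text{ bits.}
\]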
Information Gain
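The definition is not reproduced in this excerpt; information gain is the reduction in entropy achieved by splitting on A,

\[
\text{Gain}(S, A) = E(S) - I(S, A).
\]

Assuming the standard weather data (9 yes, 5 no, so E(S) ≈ 0.940 bits), Gain(S, Outlook) ≈ 0.940 − 0.693 = 0.247 bits, the largest gain of the four attributes, which is why Outlook is chosen below.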
• Outlook is selected as the root node
[Tree: the root tests Outlook, with branches sunny, overcast and rainy; the overcast branch becomes a Yes leaf, because Outlook = overcast contains only examples of class yes; the sunny and rainy branches still require further splitting]
Example (Ctd.)
[Tree so far: root Outlook; the sunny branch tests Humidity (normal → Yes, high → No); the overcast branch is a Yes leaf; the rainy branch still requires further splitting]
• The Humidity leaves are pure → no further expansion necessary there
Final decision tree
Properties of Entropy
• Entropy is the only function that satisfies all of the following three properties
• When node is pure, measure should be zero
• When impurity is maximal (i.e. all classes equally likely), measure should be
maximal
• Measure should obey multistage property:
• p, q, r are classes in set S, and T is the set of examples whose class is q ∨ r
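In this notation the multistage property (as stated, e.g., in Witten & Frank) says the measure can be computed in stages: first separate class p from the merged class q ∨ r, then split T into q and r,

\[
E_{p,q,r}(S) = E_{p,\,q\vee r}(S) + \frac{|T|}{|S|}\, E_{q,r}(T).
\]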
Entropy of the split (intrinsic information):
IntI(S, A) = − Σi (|Si| / |S|) · log2(|Si| / |S|)
• Example:
– Intrinsic information of the Day attribute (worked out below)
• Observation:
– Attributes with higher intrinsic information are less useful
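The Day computation referred to above is missing in this excerpt; assuming Day takes a distinct value for each of the 14 examples, every subset has size 1 and

\[
\text{IntI}(S, \text{Day}) = 14 \times \left(-\tfrac{1}{14}\log_2\tfrac{1}{14}\right) = \log_2 14 \approx 3.807 \text{ bits.}
\]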
Gain Ratio
• modification of the information gain that reduces its bias towards multi-valued
attributes
• takes number and size of branches into account when choosing an attribute
• corrects the information gain by taking the intrinsic information of a split into
account
• Definition of Gain Ratio (see the formula below)
• Example:
• Gain Ratio of Day attribute
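The definition and the Day example are not reproduced here; presumably they are the standard ones: the gain ratio divides the information gain by the intrinsic information of the split,

\[
\text{GR}(S, A) = \frac{\text{Gain}(S, A)}{\text{IntI}(S, A)},
\qquad
\text{GR}(S, \text{Day}) = \frac{0.940}{3.807} \approx 0.246,
\]

assuming the standard weather data, where splitting on Day yields pure singleton subsets and therefore the maximal gain E(S) ≈ 0.940 bits. The high intrinsic information of Day thus sharply reduces its apparent usefulness.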
Gain ratios for weather data
• Gini Gain
– could be defined analogously to information gain
– but typically the average Gini index of the split is minimized instead of maximizing the Gini gain
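For reference (the Gini index is not defined in this excerpt), the standard definitions the bullet refers to are

\[
\text{Gini}(S) = 1 - \sum_i p_i^2,
\qquad
\text{Gini}(S, A) = \sum_i \frac{|S_i|}{|S|}\,\text{Gini}(S_i).
\]

Since Gini(S) is constant for a given node, minimizing the average Gini(S, A) over candidate attributes is equivalent to maximizing the Gini gain Gini(S) − Gini(S, A).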
Comparison among Splitting Criteria
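A small Python sketch of the kind of comparison this slide presumably shows: how the three common impurity measures for a two-class node (entropy, Gini index, misclassification error) behave as the proportion p of positive examples varies. The function names below are illustrative, not from the slides:

```python
import math

def entropy(p):
    """Two-class entropy as a function of the positive-class proportion p."""
    if p in (0.0, 1.0):
        return 0.0                      # 0 * log2(0) taken to be 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gini(p):
    """Two-class Gini index: 1 - p^2 - (1 - p)^2."""
    return 1 - p * p - (1 - p) * (1 - p)

def misclassification_error(p):
    """Error of always predicting the majority class."""
    return min(p, 1 - p)

# All three measures are 0 for pure nodes and maximal at p = 0.5,
# but they differ in between, which can lead to different split choices.
for p in [i / 10 for i in range(11)]:
    print(f"p={p:.1f}  entropy={entropy(p):.3f}  "
          f"gini={gini(p):.3f}  error={misclassification_error(p):.3f}")
```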