Uploaded by Seema Mehla

Decision Trees

Slides are assembled from various online sources with a grateful acknowledgement to all those who made them publicly available on the web.
Decision Tree Classifier

[Figure: Ross Quinlan's insect example. A plot of Antenna Length vs. Abdomen Length (both on a 1–10 scale) is classified by a tree: Abdomen Length > 7.1? yes → Katydid; no → Antenna Length > 6.0? yes → Katydid, no → Grasshopper.]
Example tree

Intermediate nodes: attribute value tests

Edges: attribute values

Leaf nodes: class predictions

Example algorithms: ID3, C4.5, SPRINT, CART


Decision Tree schematic

[Figure: a training data set with attributes a1–a6 is split into branches X, Y, and Z. Two branches lead to impure nodes, where the best attribute is selected and splitting continues; one leads to a pure node, which becomes a leaf of class RED.]
Apply Model to Test Data

Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Start from the root of the tree and follow the matching branches:
• Refund? Yes → NO; No → test MarSt
• MarSt? Single or Divorced → test TaxInc; Married → NO
• TaxInc? < 80K → NO; ≥ 80K → YES
Apply Model to Test Data (final step)

For the test record (Refund = No, Marital Status = Married, Taxable Income = 80K), the path Refund = No → MarSt = Married ends at the leaf NO, so Cheat is assigned "No".
How to Build Decision Trees

• Greedy strategy:
– Split the records based on an attribute test that optimizes a certain criterion.
• Issues:
– Determining how to split the records:
• How to specify the attribute test condition?
• How to determine the best split?
– Determining when to stop splitting
How to specify the attribute test condition

• Idea: choose the attribute that leads to the greatest increase in "purity"

• Depends on attribute types


– Nominal
– Ordinal
– Continuous

• Depends on number of ways to split


– 2-way split
– Multi-way split
Splitting Based on Nominal Attributes

• Multi-way split: use as many partitions as there are distinct values.

  CarType → {Family}, {Sports}, {Luxury}

• Binary split: divides the values into two subsets; need to find the optimal partitioning.

  CarType → {Sports, Luxury} vs. {Family}, OR CarType → {Family, Luxury} vs. {Sports}
Splitting Based on Ordinal Attributes

• Multi-way split: use as many partitions as there are distinct values.

  Size → {Small}, {Medium}, {Large}

• Binary split: divides the values into two subsets; need to find the optimal partitioning.

  Size → {Small, Medium} vs. {Large}, OR Size → {Small} vs. {Medium, Large}

• What about the split Size → {Small, Large} vs. {Medium}? It violates the ordering of an ordinal attribute.
Splitting Based on Continuous Attributes

• Different ways of handling:

– Discretization to form an ordinal categorical attribute
• Static: discretize once at the beginning
• Dynamic: ranges can be found by equal-interval bucketing, equal-frequency bucketing (percentiles), or clustering

– Binary decision: (A < v) or (A ≥ v)
• Consider all possible splits and find the best cut
• Can be more compute-intensive
How to determine the Best Split

Before splitting: 10 records of class C0, 10 records of class C1.

Candidate test conditions:
• Own Car? Yes: C0: 6, C1: 4; No: C0: 4, C1: 6
• Car Type? Family: C0: 1, C1: 3; Sports: C0: 8, C1: 0; Luxury: C0: 1, C1: 7
• Student ID? c1 … c20, each child holding a single record (C0: 1, C1: 0 for c1–c10; C0: 0, C1: 1 for c11–c20)

Which test condition is the best?
How to determine the Best Split
• Greedy approach:
– Nodes with a homogeneous class distribution are preferred
• Need a measure of node impurity:

  C0: 5, C1: 5 — non-homogeneous, high degree of impurity
  C0: 9, C1: 1 — homogeneous, low degree of impurity
Measures of Node Impurity

• Entropy

• Gini Index

• Misclassification error
Entropy
• The entropy (impurity) of a set of examples S, relative to a binary classification, is:

  Entropy(S) = −p₁ log₂(p₁) − p₀ log₂(p₀)

• where p₁ is the fraction of positive examples in S and p₀ is the fraction of negatives.
• If all examples are in one category, the entropy is zero.
• If examples are equally mixed (p₁ = p₀ = 0.5), entropy is at its maximum of 1.
• For multi-class problems with c categories, entropy generalizes to:

  Entropy(S) = Σᵢ −pᵢ log₂(pᵢ),  summing over the c categories.
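The two formulas above can be checked with a few lines of Python; this is a sketch, with a helper name of my own choosing:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a collection of class labels, in bits.
    Works for binary and multi-class alike (the generalized formula)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Pure set -> zero; equally mixed binary set -> maximum of 1.
print(entropy(["+"] * 5))               # prints -0.0, i.e. zero
print(entropy(["+", "-", "+", "-"]))    # prints 1.0
print(entropy(["+"] * 9 + ["-"] * 5))   # ~0.940, the 9+/5- mix used later
```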
Entropy Plot for Binary Classification

[Figure: entropy as a function of p₁, rising from 0 at p₁ = 0 to a maximum of 1 at p₁ = 0.5 and back to 0 at p₁ = 1.]
Information Gain

• The information gain of an attribute F is the expected reduction in entropy resulting from splitting on this feature:

  Gain(S, F) = Entropy(S) − Σᵥ∈Values(F) (|Sᵥ| / |S|) · Entropy(Sᵥ)

  where Sᵥ is the subset of S having value v for feature F.

• The entropy of each resulting subset is weighted by its relative size.
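Gain(S, F) as defined above can be sketched in Python (the dict-based row layout and function names are my own):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Gain(S, F): entropy of S minus the size-weighted entropy of each
    subset S_v obtained by splitting on `feature`."""
    n = len(labels)
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[feature], []).append(y)
    weighted = sum(len(sub) / n * entropy(sub) for sub in by_value.values())
    return entropy(labels) - weighted

# Illustrative check against the Wind attribute of the PlayTennis example:
# 8 Weak days (6+, 2-) and 6 Strong days (3+, 3-).
rows = [{"Wind": "Weak"}] * 8 + [{"Wind": "Strong"}] * 6
play = ["Yes"] * 6 + ["No"] * 2 + ["Yes"] * 3 + ["No"] * 3
print(round(information_gain(rows, play, "Wind"), 3))  # 0.048
```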


Example

PlayTennis training data (S):

Day    Outlook   Temperature  Humidity  Wind    PlayTennis
Day1   Sunny     Hot          High      Weak    No
Day2   Sunny     Hot          High      Strong  No
Day3   Overcast  Hot          High      Weak    Yes
Day4   Rain      Mild         High      Weak    Yes
Day5   Rain      Cool         Normal    Weak    Yes
Day6   Rain      Cool         Normal    Strong  No
Day7   Overcast  Cool         Normal    Strong  Yes
Day8   Sunny     Mild         High      Weak    No
Day9   Sunny     Cool         Normal    Weak    Yes
Day10  Rain      Mild         Normal    Weak    Yes
Day11  Sunny     Mild         Normal    Strong  Yes
Day12  Overcast  Mild         High      Strong  Yes
Day13  Overcast  Hot          Normal    Weak    Yes
Day14  Rain      Mild         High      Strong  No

E(S) = −P⊕ log₂ P⊕ − P⊖ log₂ P⊖

E(S) = E(9+, 5−) = −(9/14) log₂(9/14) − (5/14) log₂(5/14) = 0.94

E(Outlook = Sunny) = E(2+, 3−) = −(2/5) log₂(2/5) − (3/5) log₂(3/5) = 0.971

E(Outlook = Overcast) = E(4+, 0−) = −(4/4) log₂(4/4) − (0/4) log₂(0/4) = 0

E(Outlook = Rainy) = E(3+, 2−) = −(3/5) log₂(3/5) − (2/5) log₂(2/5) = 0.971
Example (continued)

E(Temperature = Hot) = E(2+, 2−) = −(2/4) log₂(2/4) − (2/4) log₂(2/4) = 1

E(Temperature = Mild) = E(4+, 2−) = −(4/6) log₂(4/6) − (2/6) log₂(2/6) = 0.918

E(Temperature = Cool) = E(3+, 1−) = −(3/4) log₂(3/4) − (1/4) log₂(1/4) = 0.811

E(Humidity = High) = E(3+, 4−) = −(3/7) log₂(3/7) − (4/7) log₂(4/7) = 0.985

E(Humidity = Normal) = E(6+, 1−) = −(6/7) log₂(6/7) − (1/7) log₂(1/7) = 0.592
Example (continued)

E(Wind = Weak) = E(6+, 2−) = −(6/8) log₂(6/8) − (2/8) log₂(2/8) = 0.811

E(Wind = Strong) = E(3+, 3−) = −(3/6) log₂(3/6) − (3/6) log₂(3/6) = 1
Example (continued)

Gain(S, F) = Entropy(S) − Σᵥ∈Values(F) (|Sᵥ| / |S|) · Entropy(Sᵥ)

Gain(S, Outlook) = 0.94 − (|S_sunny| / |S|) · E(Outlook = Sunny)
                        − (|S_overcast| / |S|) · E(Outlook = Overcast)
                        − (|S_rainy| / |S|) · E(Outlook = Rainy)

Gain(S, Outlook) = 0.94 − (5/14) · 0.971 − (4/14) · 0 − (5/14) · 0.971 = 0.2465


Example (continued)

Gain(S, Temperature) = 0.94 − (4/14) · 1 − (6/14) · 0.918 − (4/14) · 0.811 = 0.0291

Gain(S, Humidity) = 0.94 − (7/14) · 0.985 − (7/14) · 0.592 = 0.151

Gain(S, Wind) = 0.94 − (8/14) · 0.811 − (6/14) · 1 = 0.048

Outlook has the highest gain of the four attributes, so it is selected for the root node.
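The four gains can be reproduced end to end; the following sketch recomputes them from the PlayTennis table (the tuple layout and helper names are mine; note that Humidity comes out as 0.152 before the slide's truncation to 0.151):

```python
import math
from collections import Counter

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
DATA = [  # (Outlook, Temperature, Humidity, Wind, PlayTennis)
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, attr_index):
    """Entropy of the whole set minus the size-weighted entropy of each subset."""
    labels = [r[-1] for r in rows]
    parts = {}
    for r in rows:
        parts.setdefault(r[attr_index], []).append(r[-1])
    return entropy(labels) - sum(
        len(p) / len(rows) * entropy(p) for p in parts.values())

for i, a in enumerate(ATTRS):
    print(f"Gain(S, {a}) = {gain(DATA, i):.3f}")
```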
Example (continued)

Splitting the root on Outlook:

D1, …, D14 [9+, 5−]
└─ Outlook
   ├─ Sunny: D1, D2, D8, D9, D11 [2+, 3−] → ?
   ├─ Overcast: D3, D7, D12, D13 [4+, 0−] → Yes
   └─ Rain: D4, D5, D6, D10, D14 [3+, 2−] → ?

Which attribute should be tested at the Sunny node?
Example (continued)

E(Outlook = Sunny) = E(2+, 3−) = −(2/5) log₂(2/5) − (3/5) log₂(3/5) = 0.971

E(Outlook = Sunny ∧ Temperature = Hot) = E(0+, 2−) = 0
E(Outlook = Sunny ∧ Temperature = Mild) = E(1+, 1−) = 1
E(Outlook = Sunny ∧ Temperature = Cool) = E(1+, 0−) = 0

Gain(Outlook = Sunny, Temperature) = E(Outlook = Sunny)
  − (|Sunny ∧ Hot| / |Sunny|) · E(Sunny ∧ Hot)
  − (|Sunny ∧ Mild| / |Sunny|) · E(Sunny ∧ Mild)
  − (|Sunny ∧ Cool| / |Sunny|) · E(Sunny ∧ Cool)
Example (continued)

Gain(Outlook = Sunny, Temperature) = 0.971 − (2/5) · 0 − (2/5) · 1 − (1/5) · 0 = 0.571
Example (continued)

E(Outlook = Sunny) = E(2+, 3−) = −(2/5) log₂(2/5) − (3/5) log₂(3/5) = 0.971

E(Outlook = Sunny ∧ Humidity = High) = E(0+, 3−) = 0
E(Outlook = Sunny ∧ Humidity = Normal) = E(2+, 0−) = 0

Gain(Outlook = Sunny, Humidity) = E(Outlook = Sunny)
  − (|Sunny ∧ High| / |Sunny|) · E(Sunny ∧ High)
  − (|Sunny ∧ Normal| / |Sunny|) · E(Sunny ∧ Normal)

Gain(Outlook = Sunny, Humidity) = 0.971 − (3/5) · 0 − (2/5) · 0 = 0.971
Example (continued)

E(Outlook = Sunny) = E(2+, 3−) = −(2/5) log₂(2/5) − (3/5) log₂(3/5) = 0.971

E(Outlook = Sunny ∧ Wind = Weak) = E(1+, 2−) = 0.918
E(Outlook = Sunny ∧ Wind = Strong) = E(1+, 1−) = 1

Gain(Outlook = Sunny, Wind) = E(Outlook = Sunny)
  − (|Sunny ∧ Weak| / |Sunny|) · E(Sunny ∧ Weak)
  − (|Sunny ∧ Strong| / |Sunny|) · E(Sunny ∧ Strong)

Gain(Outlook = Sunny, Wind) = 0.971 − (3/5) · 0.918 − (2/5) · 1 = 0.020
Example (continued)

On the Sunny subset, Humidity has the highest gain (0.971, vs. 0.571 for Temperature and 0.020 for Wind), so it is tested there:

D1, …, D14 [9+, 5−]
└─ Outlook
   ├─ Sunny: D1, D2, D8, D9, D11 [2+, 3−] → Humidity
   │   ├─ High: D1, D2, D8 [0+, 3−] → No
   │   └─ Normal: D9, D11 [2+, 0−] → Yes
   ├─ Overcast: D3, D7, D12, D13 [4+, 0−] → Yes
   └─ Rain: D4, D5, D6, D10, D14 [3+, 2−] → ?

Which attribute should be tested at the Rain node?
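The full procedure this example walks through (pick the highest-gain attribute, split, and recurse until nodes are pure) can be condensed into a short recursive sketch of ID3; the nested-dict tree representation is my own:

```python
import math
from collections import Counter

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
RAW = [
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attrs):
    """Grow a tree as nested dicts: {attribute: {value: subtree-or-class}}."""
    if len(set(labels)) == 1:          # pure node -> leaf
        return labels[0]
    if not attrs:                      # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]

    def gain(a):
        parts = {}
        for r, y in zip(rows, labels):
            parts.setdefault(r[a], []).append(y)
        return entropy(labels) - sum(
            len(p) / len(labels) * entropy(p) for p in parts.values())

    best = max(attrs, key=gain)        # greedy: highest information gain
    node = {best: {}}
    for v in {r[best] for r in rows}:
        keep = [(r, y) for r, y in zip(rows, labels) if r[best] == v]
        node[best][v] = id3([r for r, _ in keep], [y for _, y in keep],
                            [a for a in attrs if a != best])
    return node

rows = [dict(zip(ATTRS, t[:4])) for t in RAW]
labels = [t[4] for t in RAW]
tree = id3(rows, labels, ATTRS)
print(tree)  # Outlook at the root; Humidity under Sunny; Wind under Rain
```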


Binary Decision
• For a continuous attribute A:
– Partition the continuous values of attribute A into a discrete set of intervals
– Create a new Boolean attribute A_c by looking for a threshold c:

  A_c = true if A < c, false otherwise

How do we choose c?
Continuous Attributes

• For efficient computation, for each attribute:
– Sort the records on the attribute's values
– Linearly scan these values, updating the class counts on each side of the candidate cut
– Choose the split position that gives the best purity

Example: the Cheat data, with Taxable Income as the continuous attribute:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Sorted incomes: 60, 70, 75, 85, 90, 95, 100, 120, 125, 220 (labels No, No, No, Yes, Yes, Yes, No, No, No, No). The candidate split positions are 55, 65, 72, 80, 87, 92, 97, 110, 122, 172, 230; for each, count the Yes/No records on the ≤ and > sides and evaluate the purity. For instance, at the cut 97 the ≤ side holds 3 Yes and 3 No, while the > side holds 0 Yes and 4 No.
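The sort-and-scan search can be sketched as follows, here scoring each candidate cut with the weighted Gini impurity (one of the purity measures these slides cover); cut points are taken as midpoints between consecutive distinct values, so they differ slightly from the slide's rounded positions:

```python
def gini(labels):
    """Gini impurity 1 - sum(p_j^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """Sort once, then linearly scan candidate cut points (midpoints between
    consecutive distinct values); return the cut with the lowest weighted Gini."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between equal attribute values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score

# Taxable Income (in K) and Cheat labels from the slide's table
income = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]
cheat = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]
best_cut, best_score = best_threshold(income, cheat)
print(best_cut, best_score)  # 97.5 with weighted Gini 0.3
```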
Issue with Information Gain

• There is a natural bias in the information gain measure: it favors attributes with many values over those with few.
• As an extreme example, consider the attribute Date, which has a very large number of possible values; it would have the highest information gain of any attribute.
• This is because Date alone perfectly predicts the target attribute over the training data.
• Thus it would be selected as the decision attribute for the root node and lead to a (quite broad) tree of depth one that perfectly classifies the training data.
• Of course, this tree would fare poorly on subsequent examples: it perfectly separates the training data, yet it is not a useful predictor.
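This bias is easy to demonstrate: give each record a unique Date-like value and its gain equals the full entropy of S, beating any genuine attribute (a sketch; the attribute alignment below is illustrative, not the PlayTennis table):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(values, labels):
    """Information gain of splitting `labels` by the parallel `values`."""
    parts = {}
    for v, y in zip(values, labels):
        parts.setdefault(v, []).append(y)
    return entropy(labels) - sum(
        len(p) / len(labels) * entropy(p) for p in parts.values())

labels = ["Yes"] * 9 + ["No"] * 5           # the 9+/5- class mix from the example
days = [f"Day{i}" for i in range(1, 15)]    # unique value per record, like Date
wind = ["Weak"] * 8 + ["Strong"] * 6        # an ordinary attribute

# Every Date subset is a single pure record, so the gain equals Entropy(S):
print(gain(days, labels))                   # ~0.940, the maximum possible
print(gain(wind, labels) < gain(days, labels))  # True: Date wins, yet generalizes worst
```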
Measure of impurity: Gini index
• Gini index for a given node t:

  GINI(t) = 1 − Σⱼ [p(j | t)]²

  where p(j | t) is the relative frequency of class j at node t.

– Maximum (1 − 1/n_c) when records are equally distributed among all classes, implying the least information (n_c = number of classes).
– Minimum (0.0) when all records belong to one class, implying the most information.

Examples:
  C1: 0, C2: 6 → Gini = 0.000
  C1: 1, C2: 5 → Gini = 0.278
  C1: 2, C2: 4 → Gini = 0.444
  C1: 3, C2: 3 → Gini = 0.500
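The four example nodes can be verified with a short sketch:

```python
def gini(counts):
    """Gini index 1 - sum(p_j^2), computed from per-class record counts at a node."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

# The four example nodes from the slide:
for c1, c2 in [(0, 6), (1, 5), (2, 4), (3, 3)]:
    print(f"C1={c1}, C2={c2}: Gini={gini([c1, c2]):.3f}")
# Gini = 0.000, 0.278, 0.444, 0.500 respectively
```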
Practical Issues

• Underfitting and Overfitting


• Missing Values
• Costs of Classification
Underfitting and Overfitting

[Figure: overfitting illustration; test error rises while training error keeps falling as the tree grows.]

Underfitting: when the model is too simple, both training and test errors are large.
Overfitting due to Noise

The decision boundary is distorted by a noise point.


Overfitting due to Insufficient Examples

A lack of data points in the lower half of the diagram makes it difficult to correctly predict the class labels in that region. An insufficient number of training records there causes the decision tree to classify test examples using other training records that are irrelevant to the classification task.
Notes on Overfitting

• Overfitting results in decision trees that are more complex


than necessary
• Training error no longer provides a good estimate of how
well the tree will perform on previously unseen records
How to Address Overfitting

• Pre-Pruning (Early Stopping Rule)


– Stop the algorithm before it becomes a fully-grown tree
– Typical stopping conditions for a node:
• Stop if all instances belong to the same class
• Stop if all the attribute values are the same
– More restrictive conditions:
• Stop if number of instances is less than some user-specified
threshold
• Stop if expanding the current node does not improve impurity
measures.
How to Address Overfitting

• Post-pruning
– Grow decision tree to its entirety
– Trim the nodes of the decision tree in a bottom-up fashion
– If generalization error improves after trimming, replace sub-tree by a
leaf node.
– Class label of leaf node is determined from majority class of instances
in the sub-tree
Handling Missing Attribute Values

• Assume the attribute can take the value "blank".
• Assign the most common value of A among the training data at node n.
• Assign the most common value of A among the training data at node n that have the same target class.
• Assign a probability pᵢ to each possible value vᵢ of A:
– Assign a fraction pᵢ of the example to each descendant in the tree
– This method is used in C4.5
Strengths

• Can generate understandable rules


• Perform classification without much computation
• Can handle continuous and categorical variables
• Provide a clear indication of which fields are most important for prediction or
classification
Weakness

• Perform poorly with many classes and small data.


• Computationally expensive to train.
– At each node, each candidate splitting field must be sorted before its best split can be
found.
– In some algorithms, combinations of fields are used and a search must be made for
optimal combining weights.
– Pruning algorithms can also be expensive since many candidate sub-trees must be
formed and compared.
References

• Machine Learning, Tom Mitchell.
• http://didattica.cs.unicam.it/lib/exe/fetch.php?media=didattica:magistrale:kebi:ay_1516:ke-7_learning_decision_trees.pdf
• A few other online sources.
