Decision Trees
Slides are assembled from various online sources, with grateful acknowledgement to all who made them publicly available on the web.
Decision Tree Classifier
(Pictured: Ross Quinlan, creator of the ID3 and C4.5 decision tree algorithms.)
[Figure: insects plotted by Abdomen Length (x-axis, 1-10) against Antenna Length (y-axis, 1-10), separated by the decision tree below.]
Abdomen Length > 7.1?
  yes → Katydid
  no  → Antenna Length > 6.0?
          yes → Katydid
          no  → Grasshopper
Example tree
[Figure: decision tree learned from a training set with attributes Refund, Marital Status (MarSt), and Taxable Income (TaxInc), predicting Cheat.]
Refund?
  Yes → NO
  No  → MarSt?
          Single, Divorced → TaxInc?
                               < 80K → NO
                               ≥ 80K → YES
          Married → NO
Apply Model to Test Data
Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
Tracing the record down the tree:
1. Root: Refund = No → follow the "No" branch to the MarSt node.
2. MarSt = Married → follow the "Married" branch, reaching the leaf NO.
3. Assign Cheat to "No".
A tiny traversal routine implementing this walk is sketched below.
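To make the walk concrete, here is a minimal Python sketch. The nested-dict encoding of the tree and the classify helper are illustrative choices, not part of the slides; by assumption, an income of exactly 80K is sent to the "≥ 80K" branch.

# Hypothetical nested-dict encoding of the example tree above.
tree = {"attr": "Refund",
        "Yes": "NO",
        "No": {"attr": "MarSt",
               "Married": "NO",
               "Single": {"attr": "TaxInc", "<80K": "NO", ">=80K": "YES"},
               "Divorced": {"attr": "TaxInc", "<80K": "NO", ">=80K": "YES"}}}

def classify(record, node):
    # Descend from the root until a leaf (a plain string) is reached.
    while isinstance(node, dict):
        attr = node["attr"]
        if attr == "TaxInc":                 # continuous test: threshold at 80K
            key = "<80K" if record[attr] < 80_000 else ">=80K"
        else:
            key = record[attr]
        node = node[key]
    return node

print(classify({"Refund": "No", "MarSt": "Married", "TaxInc": 80_000}, tree))  # -> NO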
How to Build Decision Trees
• Greedy strategy:
– Split the records based on an attribute test that optimizes a chosen criterion.
• Issues:
– Determining how to split the records:
  • How to specify the attribute test condition?
  • How to determine the best split?
– Determining when to stop splitting.
A minimal sketch of the greedy recursion follows below.
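This is an illustrative Python sketch of the greedy strategy, not a specific algorithm from the slides. Records are assumed to be dicts keyed by attribute name, and criterion is any scoring function for an attribute test (for example, the information gain defined later).

from collections import Counter

def build_tree(records, labels, attributes, criterion, depth=0, max_depth=5):
    """Greedy top-down induction: choose the best test, split, recurse."""
    # Stop splitting: pure node, no attributes left, or depth limit reached.
    if len(set(labels)) == 1 or not attributes or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]   # majority-class leaf
    # Greedy step: the attribute whose test scores best under the criterion.
    attr = max(attributes, key=lambda a: criterion(records, labels, a))
    node = {"attr": attr, "children": {}}
    for value in {r[attr] for r in records}:
        subset = [(r, y) for r, y in zip(records, labels) if r[attr] == value]
        sub_records, sub_labels = map(list, zip(*subset))
        node["children"][value] = build_tree(
            sub_records, sub_labels,
            [a for a in attributes if a != attr], criterion, depth + 1, max_depth)
    return node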
How to specify the attribute test condition
• Nominal attributes, multi-way split: use as many partitions as there are distinct values.
  CarType → Family | Sports | Luxury
• Nominal attributes, binary split: divide the values into two subsets.
  CarType → {Sports, Luxury} vs. {Family}   OR   CarType → {Family, Luxury} vs. {Sports}
Splitting Based on Ordinal Attributes
• Ordinal values can also be grouped into binary splits.
• What about this split? Size → {Small, Large} vs. {Medium}
  (It violates the ordering Small < Medium < Large, so it should not be used.)
Splitting Based on Continuous Attributes
A continuous attribute is split with a binary test against a threshold (details below).
Which split is better? Compare the class distributions of the resulting nodes:
• C0: 5, C1: 5 — non-homogeneous, high degree of impurity
• C0: 9, C1: 1 — homogeneous, low degree of impurity
Nodes with homogeneous class distributions are preferred, which motivates the impurity measures below.
Measures of Node Impurity
• Entropy
• Gini Index
• Misclassification error
Entropy
• The entropy (impurity) of a set of examples S, relative to a binary classification, is:
  $Entropy(S) = -p_1 \log_2(p_1) - p_0 \log_2(p_0)$
  where $p_1$ is the proportion of positive examples in S and $p_0$ the proportion of negative examples.
• Example, for S the PlayTennis data set shown below:
  $E(\text{Outlook}_{rain}) = [3+, 2-]: -\tfrac{3}{5}\log_2\tfrac{3}{5} - \tfrac{2}{5}\log_2\tfrac{2}{5} = 0.971$
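A direct translation of the formula into Python (a sketch; the helper name entropy is ours):

import math

def entropy(pos, neg):
    """Entropy of a binary-labeled set with pos positive and neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                       # 0 * log2(0) is taken to be 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(entropy(3, 2), 3))   # Outlook = Rain, [3+, 2-] -> 0.971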
Example
S: the PlayTennis data set.

Day  | Outlook  | Temperature | Humidity | Wind   | PlayTennis
D1   | Sunny    | Hot         | High     | Weak   | No
D2   | Sunny    | Hot         | High     | Strong | No
D3   | Overcast | Hot         | High     | Weak   | Yes
D4   | Rain     | Mild        | High     | Weak   | Yes
D5   | Rain     | Cool        | Normal   | Weak   | Yes
D6   | Rain     | Cool        | Normal   | Strong | No
D7   | Overcast | Cool        | Normal   | Strong | Yes
D8   | Sunny    | Mild        | High     | Weak   | No
D9   | Sunny    | Cool        | Normal   | Weak   | Yes
D10  | Rain     | Mild        | Normal   | Weak   | Yes
D11  | Sunny    | Mild        | Normal   | Strong | Yes
D12  | Overcast | Mild        | High     | Strong | Yes
D13  | Overcast | Hot         | Normal   | Weak   | Yes
D14  | Rain     | Mild        | High     | Strong | No

Entropy of each attribute-value subset:
$E(\text{Temperature}_{hot}) = [2+, 2-]: -\tfrac{2}{4}\log_2\tfrac{2}{4} - \tfrac{2}{4}\log_2\tfrac{2}{4} = 1$
$E(\text{Temperature}_{mild}) = [4+, 2-]: -\tfrac{4}{6}\log_2\tfrac{4}{6} - \tfrac{2}{6}\log_2\tfrac{2}{6} = 0.918$
$E(\text{Temperature}_{cool}) = [3+, 1-]: -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} = 0.811$
$E(\text{Humidity}_{high}) = [3+, 4-]: -\tfrac{3}{7}\log_2\tfrac{3}{7} - \tfrac{4}{7}\log_2\tfrac{4}{7} = 0.985$
$E(\text{Humidity}_{normal}) = [6+, 1-]: -\tfrac{6}{7}\log_2\tfrac{6}{7} - \tfrac{1}{7}\log_2\tfrac{1}{7} = 0.592$
Example
The information gain of an attribute F over a data set S is the reduction in entropy achieved by splitting on F:

$Gain(S, F) = Entropy(S) - \sum_{v \in Values(F)} \frac{|S_v|}{|S|} Entropy(S_v)$

At the root, with S the full PlayTennis data set ($Entropy(S) = 0.94$):

$Gain(S, \text{Outlook}) = 0.94 - \frac{|S_{sunny}|}{|S|} Entropy(S_{sunny}) - \frac{|S_{overcast}|}{|S|} Entropy(S_{overcast}) - \frac{|S_{rain}|}{|S|} Entropy(S_{rain})$

Outlook is chosen at the root; it partitions S into:
• Sunny: D1, D2, D8, D9, D11 [2+, 3−] → ?
• Overcast: D3, D7, D12, D13 [4+, 0−] → Yes
• Rain: D4, D5, D6, D10, D14 [3+, 2−] → ?

Next, grow the sunny branch. Candidate attribute: Temperature.
$E(S_{sunny} \wedge \text{Temperature}_{hot}) = [0+, 2-] = 0$
$E(S_{sunny} \wedge \text{Temperature}_{mild}) = [1+, 1-] = 1$
$E(S_{sunny} \wedge \text{Temperature}_{cool}) = [1+, 0-] = 0$
$Gain(S_{sunny}, \text{Temperature}) = E(S_{sunny}) - \sum_{v} \frac{|S_{sunny} \wedge \text{Temperature}_v|}{|S_{sunny}|} E(S_{sunny} \wedge \text{Temperature}_v) = 0.971 - \tfrac{2}{5}\cdot 0 - \tfrac{2}{5}\cdot 1 - \tfrac{1}{5}\cdot 0 = 0.571$

A code sketch of this gain computation appears right below.
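The gain can be computed directly from the formula. A sketch reusing the entropy helper from the earlier block; examples is assumed to be a list of dicts (one per row of the table) with a "PlayTennis" label key:

from collections import defaultdict

def information_gain(examples, attr, label="PlayTennis"):
    """Gain(S, F) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v)."""
    def H(rows):
        pos = sum(1 for r in rows if r[label] == "Yes")
        return entropy(pos, len(rows) - pos)   # entropy() from the earlier sketch
    partitions = defaultdict(list)             # S_v for each value v of attr
    for r in examples:
        partitions[r[attr]].append(r)
    return H(examples) - sum(
        len(part) / len(examples) * H(part) for part in partitions.values())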
Candidate attribute: Humidity.
$E(S_{sunny} \wedge \text{Humidity}_{high}) = [0+, 3-] = 0$
$E(S_{sunny} \wedge \text{Humidity}_{normal}) = [2+, 0-] = 0$
$Gain(S_{sunny}, \text{Humidity}) = E(S_{sunny}) - \tfrac{3}{5}\cdot 0 - \tfrac{2}{5}\cdot 0 = 0.971$
Candidate attribute: Wind.
$E(S_{sunny}) = [2+, 3-]: -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} = 0.971$
$E(S_{sunny} \wedge \text{Wind}_{weak}) = [1+, 2-] = 0.918$
$E(S_{sunny} \wedge \text{Wind}_{strong}) = [1+, 1-] = 1$
$Gain(S_{sunny}, \text{Wind}) = 0.971 - \tfrac{3}{5}\cdot 0.918 - \tfrac{2}{5}\cdot 1 = 0.020$

Humidity yields the highest gain (0.971, vs. 0.571 for Temperature and 0.020 for Wind), so the sunny branch is split on Humidity.
Example
The tree after choosing Outlook at the root and Humidity for the sunny branch:

S = {D1, ..., D14} [9+, 5−]
Outlook
• Sunny → {D1, D2, D8, D9, D11} [2+, 3−] → split on Humidity (High | Normal)
• Overcast → {D3, D7, D12, D13} [4+, 0−] → Yes
• Rain → {D4, D5, D6, D10, D14} [3+, 2−] → ?
A binary test on a continuous attribute A compares it against a threshold c:

$A_c = \begin{cases} true & \text{if } A < c \\ false & \text{otherwise} \end{cases}$

How to choose c?
[Figure: the Taxable Income values of the Tid/Refund/Marital Status/Cheat table, sorted, with candidate split positions between adjacent values; the class counts on either side of each candidate (No: 0|7, 1|6, 2|5, 3|4, 3|4, 3|4, 3|4, 4|3, 5|2, 6|1, 7|0) determine the purity of each candidate split.]
A sketch of this threshold search appears below.
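A sketch of the search for c, under the common convention that candidate thresholds are midpoints between adjacent sorted values; it reuses the entropy helper from the earlier block and assumes binary "Yes"/"No" labels:

def best_threshold(values, labels):
    """Scan candidate thresholds; return the one with lowest weighted impurity."""
    pairs = sorted(zip(values, labels))

    def H(group):
        pos = sum(1 for y in group if y == "Yes")
        return entropy(pos, len(group) - pos)   # entropy() from the earlier sketch

    best_c, best_impurity = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                             # no boundary between equal values
        c = (pairs[i - 1][0] + pairs[i][0]) / 2  # candidate split position
        left = [y for v, y in pairs if v < c]
        right = [y for v, y in pairs if v >= c]
        weighted = (len(left) * H(left) + len(right) * H(right)) / len(pairs)
        if weighted < best_impurity:
            best_c, best_impurity = c, weighted
    return best_c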
Issue with Information Gain
• There is a natural bias in the information gain measure: it favors attributes with many values over those with few.
• As an extreme example, consider the attribute Date. Because it has a very large number of possible values, it would have the highest information gain of any attribute.
• This is because Date alone perfectly predicts the target attribute over the training data.
• Thus it would be selected as the decision attribute for the root node, leading to a (very broad) tree of depth one that perfectly classifies the training data.
• Of course, this tree would fare poorly on subsequent examples: it is not a useful predictor, despite the fact that it perfectly separates the training data. (See the demonstration below.)
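The bias is easy to demonstrate with the information_gain sketch from above on a toy data set (the rows below are illustrative, not the PlayTennis table):

# An ID-like attribute such as Day produces one pure single-example partition
# per value, so its gain equals Entropy(S), the maximum possible value --
# even though Day is useless for predicting unseen days.
toy = [{"Day": f"D{i}", "Outlook": o, "PlayTennis": y}
       for i, (o, y) in enumerate(
           [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"),
            ("Rain", "Yes"), ("Rain", "Yes"), ("Rain", "No")], start=1)]

print(information_gain(toy, "Day"))      # 1.0: "perfect", but meaningless
print(information_gain(toy, "Outlook"))  # ~0.54: smaller, yet generalizes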
Measure of impurity: Gini index
• Gini index for a given node t:

$GINI(t) = 1 - \sum_j [p(j \mid t)]^2$

where p(j | t) is the relative frequency of class j at node t.

C1: 0, C2: 6 → Gini = 0.000
C1: 1, C2: 5 → Gini = 0.278
C1: 2, C2: 4 → Gini = 0.444
C1: 3, C2: 3 → Gini = 0.500
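The same table, computed directly from the definition (a sketch; the helper name gini is ours):

def gini(counts):
    """GINI(t) = 1 - sum_j p(j|t)^2, from the per-class counts at node t."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

for c1, c2 in [(0, 6), (1, 5), (2, 4), (3, 3)]:
    print(f"C1={c1}, C2={c2}: Gini={gini([c1, c2]):.3f}")
# -> 0.000, 0.278, 0.444, 0.500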
Practical Issues
Overfitting
• Underfitting: the model is too simple, so both training and test errors are large.
• Overfitting: the model is too complex, so training error keeps falling while test error rises.
Overfitting due to Noise
[Figure not recovered.]
Overfitting due to Insufficient Examples
• A lack of data points in the lower half of the diagram makes it difficult to predict correctly the class labels of that region.
• An insufficient number of training records in the region causes the decision tree to predict the test examples using other training records that are irrelevant to the classification task.
Notes on Overfitting
• Post-pruning:
– Grow the decision tree to its full size.
– Trim the nodes of the decision tree in a bottom-up fashion.
– If the generalization error improves after trimming, replace the sub-tree with a leaf node.
– The class label of the leaf node is determined by the majority class of the instances in the sub-tree.
A practical post-pruning sketch follows below.
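One practical way to post-prune, shown as a sketch rather than the exact procedure on the slide: scikit-learn's cost-complexity pruning grows the full tree, enumerates pruning strengths, and we keep the pruned tree that scores best on held-out data (standing in for the "generalization error improves" test). The data set here is just a convenient built-in example.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Candidate pruning levels derived from the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# Refit at each level and keep the tree with the best held-out accuracy.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_val, y_val))
print(best.get_n_leaves(), best.score(X_val, y_val))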
Handling Missing Attribute Values