CS-13410
Introduction to Machine Learning
Lecture # 07
(Decision Trees – Ch # 3 by Tom Mitchell)
by
Mudasser Naseer
Assignment – 1
Assignment 1 is uploaded on Moodle. It is due on 12-03-2021 by 4:00 pm.
Measuring Node Impurity
p(i|t): the fraction of records at node t belonging to class i.

Entropy(t) = – Σi p(i|t) log2 p(i|t)
• Used in ID3 and C4.5
Gini(t) = 1 – Σi [p(i|t)]²
• Used in CART, SLIQ, SPRINT.
Classification error(t) = 1 – maxi p(i|t)
Example

P(C1) = 0/6 = 0    P(C2) = 6/6 = 1
Gini = 1 – (P(C1))² – (P(C2))² = 1 – 0 – 1 = 0
Entropy = – 0 log 0 – 1 log 1 = – 0 – 0 = 0
Error = 1 – max(0, 1) = 1 – 1 = 0

P(C1) = 1/6    P(C2) = 5/6
Gini = 1 – (1/6)² – (5/6)² = 0.278
Entropy = – (1/6) log2(1/6) – (5/6) log2(5/6) = 0.65
Error = 1 – max(1/6, 5/6) = 1 – 5/6 = 1/6

P(C1) = 2/6    P(C2) = 4/6
Gini = 1 – (2/6)² – (4/6)² = 0.444
Entropy = – (2/6) log2(2/6) – (4/6) log2(4/6) = 0.92
Error = 1 – max(2/6, 4/6) = 1 – 4/6 = 1/3
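The three impurity measures above can be reproduced with a short script. A minimal sketch; the function names are mine, not from the lecture:

```python
import math

def gini(probs):
    """Gini index: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    """Entropy in bits; 0 * log2(0) is taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def class_error(probs):
    """Classification error: 1 minus the largest class probability."""
    return 1.0 - max(probs)

# Node with class counts C1 = 1, C2 = 5 (the second case above)
probs = [1/6, 5/6]
print(round(gini(probs), 3))         # → 0.278
print(round(entropy(probs), 2))      # → 0.65
print(round(class_error(probs), 3))  # → 0.167
```

All three give 0 for a pure node such as `[0, 1]`, matching the first case on the slide.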
Impurity measures
All of the impurity measures take their minimum value, zero, at a pure node, where a single class has probability 1.
All of the impurity measures take their maximum value when the class distribution in a node is uniform.
Splitting Based on GINI
When a node p is split into k partitions (children), the quality of the split is computed as

GINI_split = Σ(i=1..k) (ni/n) · GINI(i)

where ni = number of records at child i, and n = number of records at node p.
Binary Attributes: Computing the GINI Index
• Splits into two partitions
• Effect of weighting partitions:
  – Larger and purer partitions are sought.

B?
Yes → Node N1 (C1 = 5, C2 = 2)    No → Node N2 (C1 = 1, C2 = 4)

Gini(N1) = 1 – (5/7)² – (2/7)² = 0.408
Gini(N2) = 1 – (1/5)² – (4/5)² = 0.320
Gini(children) = 7/12 × 0.408 + 5/12 × 0.320 = 0.371
This is the quality of the split on attribute B.
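The weighted computation can be checked with a small sketch; the child counts are read off the fractions above, and the function names are illustrative:

```python
def gini(counts):
    """Gini index of a node given its raw class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(children):
    """Quality of a split: Gini of each child weighted by its share of records."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)

# Split on B: N1 has (C1=5, C2=2), N2 has (C1=1, C2=4)
n1, n2 = [5, 2], [1, 4]
print(round(gini(n1), 3))              # → 0.408
print(round(gini_split([n1, n2]), 3))  # → 0.371
```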
Categorical Attributes
For binary attributes, split into the two values.
For multivalued attributes: for each distinct value, gather the counts for each class in the dataset.
Use the count matrix to choose between a multi-way split and a two-way split (find the best binary partition of values).
Continuous Attributes
Use binary decisions based on one value v, giving partitions A ≤ v and A > v.
Choices for the splitting value:
  Number of possible splitting values = number of distinct values.
Each splitting value has an associated count matrix: the class counts in each of the partitions A ≤ v and A > v.
Exhaustive method to choose the best v:
  For each v, scan the database to gather the count matrix and compute the impurity index.
  Computationally inefficient! Repetition of work.
Continuous Attributes
For efficient computation, for each attribute:
  Sort the records on the attribute's values.
  Linearly scan these values, each time updating the count matrix and computing the impurity.
  Choose the split position that has the least impurity.
(Figure: sorted attribute values with the candidate split positions between them)
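The sort-then-scan procedure can be sketched as follows, assuming a Gini criterion and a simple list-of-values dataset; the function and variable names are my own:

```python
def best_split(values, labels):
    """Sort once, then scan candidate thresholds between distinct values,
    updating the class counts incrementally instead of rescanning the data."""
    pairs = sorted(zip(values, labels))
    classes = sorted(set(labels))
    left = {c: 0 for c in classes}   # counts for A <= v
    right = {c: 0 for c in classes}  # counts for A > v
    for _, y in pairs:
        right[y] += 1
    n = len(pairs)

    def gini(counts, total):
        if total == 0:
            return 0.0
        return 1.0 - sum((c / total) ** 2 for c in counts.values())

    best_v, best_impurity = None, float("inf")
    for i in range(n - 1):
        v, y = pairs[i]
        left[y] += 1    # move one record from the right partition to the left
        right[y] -= 1
        if v == pairs[i + 1][0]:
            continue    # only split between distinct values
        nl = i + 1
        w = nl / n * gini(left, nl) + (n - nl) / n * gini(right, n - nl)
        if w < best_impurity:
            best_v = (v + pairs[i + 1][0]) / 2  # midpoint threshold
            best_impurity = w
    return best_v, best_impurity

values = [60, 70, 75, 85, 90, 95]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]
v, imp = best_split(values, labels)
print(v)  # → 72.5
```

One sort plus one linear scan replaces the exhaustive rescan of the database for every candidate v.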
Splitting based on impurity
Impurity measures favor attributes with a large number of categories.
A test condition with a large number of outcomes may not be desirable: the number of records in each partition may be too small to make reliable predictions.
Gain Ratio
The information gain measure tends to prefer attributes with large numbers of possible categories.
Gain ratio: a modification of the information gain that reduces its bias towards high-branch attributes.
The split information (intrinsic information) should be
  large when the data is evenly spread across the branches,
  small when all the data belong to one branch.
Gain ratio takes the number and size of branches into account when choosing an attribute:
it corrects the information gain by taking the intrinsic information of the split into account.

GainRatio(A) = Gain(A) / SplitInfo_A(D)
SplitInfo_A(D) = – Σ(j=1..v) (|Dj|/|D|) log2(|Dj|/|D|)

(or the same formulas with S in place of D).
Gain Ratio
Adjusts the information gain by the entropy of the partitioning (SplitINFO). A higher-entropy partitioning (a large number of small partitions) is penalized!
Used in C4.5.
Designed to overcome the information gain's bias towards multivalued attributes.
Example (play tennis):
More on the gain ratio
"Outlook" still comes out top.
However, "ID code" has an even greater gain ratio.
  Standard fix: in particular applications we can use an ad hoc test to prevent splitting on that type of attribute.
Problem with gain ratio: it may overcompensate.
  It may choose an attribute just because its intrinsic information is very low.
  Standard fix:
  • First, only consider attributes with greater-than-average information gain.
  • Then, compare them on gain ratio.
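Gain ratio and the average-gain fix can be sketched together. The dataset layout (rows of attribute tuples) and all function names here are my own assumptions, not from the lecture:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Entropy of the parent minus the weighted entropy of the children."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def split_info(rows, attr):
    """Intrinsic information: entropy of the partition sizes themselves."""
    n = len(rows)
    counts = Counter(row[attr] for row in rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def gain_ratio(rows, labels, attr):
    si = split_info(rows, attr)
    return info_gain(rows, labels, attr) / si if si > 0 else 0.0

def choose_attribute(rows, labels, attrs):
    """Standard fix: keep only attributes with at least average gain,
    then pick the survivor with the highest gain ratio."""
    gains = {a: info_gain(rows, labels, a) for a in attrs}
    avg = sum(gains.values()) / len(gains)
    candidates = [a for a in attrs if gains[a] >= avg]
    return max(candidates, key=lambda a: gain_ratio(rows, labels, a))
```

For example, on rows `[("sunny","hot"), ("sunny","mild"), ("rain","mild"), ("rain","hot")]` with labels `["No","No","Yes","Yes"]`, attribute 0 separates the classes perfectly and is chosen.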
Comparing Attribute Selection Measures
The three measures, in general, return good results, but:
Information gain
  Biased towards multivalued attributes.
Gain ratio
  Tends to prefer unbalanced splits in which one partition is much smaller than the other.
Gini index
  Biased towards multivalued attributes.
  Has difficulties when the number of classes is large.
  Tends to favor tests that result in equal-sized partitions with purity in both partitions.
Stopping Criteria for Tree Induction
Stop expanding a node when all the
records belong to the same class
Stop expanding a node when all the
records have similar attribute values
Decision Tree Based Classification
Advantages:
Inexpensive to construct
Extremely fast at classifying unknown records
Easy to interpret for small-sized trees
Accuracy is comparable to other classification
techniques for many simple data sets
Example: C4.5
Simple depth-first construction.
Uses Information Gain
Sorts Continuous Attributes at each
node.
Needs the entire dataset to fit in memory.
Unsuitable for large datasets: would need out-of-core sorting.
You can download the software from:
[Link]
Practical Issues of Classification
Underfitting and Overfitting
Evaluation
Underfitting and Overfitting
(Figure: training and test error versus model complexity, showing the underfitting and overfitting regions)
Underfitting: when the model is too simple, both the training and test errors are large.
Overfitting: when the model is too complex, it models the details of the training set and fails on the test set.
Overfitting due to Noise
The decision boundary is distorted by noise points.
Notes on Overfitting
Overfitting results in decision trees that
are more complex than necessary
Training error no longer provides a
good estimate of how well the tree will
perform on previously unseen records
The model does not generalize well
Need new ways for estimating errors
How to Address Overfitting:
Tree Pruning
Pre-Pruning (Early Stopping Rule)
Stop the algorithm before it becomes a fully-grown tree
Typical stopping conditions for a node:
• Stop if all instances belong to the same class
• Stop if all the attribute values are the same
More restrictive conditions:
• Stop if number of instances is less than some user-specified threshold
• Stop if the class distribution of instances is independent of the available features (e.g., using the χ² test)
• Stop if expanding the current node does not improve the impurity measures (e.g., Gini or information gain), or if the improvement falls below a threshold value.
Upon halting, the node becomes a leaf
The leaf may hold the most frequent class among the subset
tuples.
Problem:
• Difficult to choose an appropriate threshold.
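The pre-pruning conditions above can be collected into a single check. A sketch only; the threshold names and default values are illustrative, not from the lecture:

```python
def should_stop(labels, rows, min_instances=5, min_gain=1e-3, best_gain=None):
    """Return True if the node should become a leaf instead of being split."""
    if len(set(labels)) <= 1:
        return True  # all instances belong to the same class
    if all(r == rows[0] for r in rows):
        return True  # all attribute values are the same
    if len(labels) < min_instances:
        return True  # too few records to split reliably
    if best_gain is not None and best_gain < min_gain:
        return True  # best candidate split barely improves impurity
    return False
```

The hard part, as the slide notes, is picking `min_instances` and `min_gain` well: too strict and the tree underfits, too loose and pre-pruning does nothing.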
How to Address Overfitting…
Post-pruning
Grow the decision tree to its full size.
Trim the nodes of the decision tree in a bottom-up fashion.
If the generalization error improves after trimming, replace the sub-tree by a leaf node.
The class label of the leaf node is determined from the majority class of instances in the sub-tree.
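A minimal sketch of bottom-up post-pruning against a held-out validation set. The tuple-based tree representation (a leaf is a label; an internal node is `(attr_index, branches, majority_label)`) is an assumption for illustration:

```python
def classify(tree, row):
    """Follow branches until a leaf label is reached; unseen values
    fall back to the node's majority class."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        tree = branches.get(row[attr], majority)
    return tree

def error(tree, rows, labels):
    """Number of misclassified validation records."""
    return sum(classify(tree, r) != y for r, y in zip(rows, labels))

def prune(tree, rows, labels):
    """Bottom-up: prune the children first, then replace the sub-tree with its
    majority-class leaf if that does not increase the validation error."""
    if not isinstance(tree, tuple):
        return tree
    attr, branches, majority = tree
    for v in branches:
        idx = [i for i, r in enumerate(rows) if r[attr] == v]
        branches[v] = prune(branches[v],
                            [rows[i] for i in idx],
                            [labels[i] for i in idx])
    pruned = (attr, branches, majority)
    if error(majority, rows, labels) <= error(pruned, rows, labels):
        return majority  # the leaf is at least as good on validation data
    return pruned
```

For example, a split whose "No" branch only misclassifies validation records collapses into the majority leaf "Yes".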
Prune the Tree OR Prune the Rules
To reduce the complexity of the decision procedure we have two options: (i) prune the tree first and then derive the rules, or (ii) derive the rules and then prune the rules.
Which is better?
(ii) is better. Why? Each rule can be pruned independently: a condition can be dropped from one rule without affecting any other, whereas removing a node from the tree changes every rule whose path passes through it.