Tree Based Methods
Which ML technique is most appropriate?
Tree Based Methods
• Capable of representing non-linear relationships
• Easy to interpret (if … then … rules)
Decision Trees
• Supervised learning algorithm
• Applicable to continuous and discrete responses
• Example:
Decision Trees: Advantages
• Advantages
– Can handle various situations:
• Sparse, skewed, continuous, categorical
• Less influenced by outliers and missing values
– No need to assume the form of the relationship of the response
variable to the predictors
• E.g. linear, as required in Linear Regression
– Simple to understand
– Easy to interpret
– Data preparation required is minimal
– Algorithmic complexity is not high: ~log(data_points)
– Implicitly performs feature selection
Decision Trees: Limitations
• Limitations
– Over-complexity at times: the tree may not be a “generalized
representation” of the data
– Instability: small changes in the data can lead to large structural changes
– If the predictor → response relationship does not follow
rectangular sub-spaces
• high prediction errors will result
– Algorithms are heuristic: a global optimum is not guaranteed
– Biased trees, if some classes dominate
– Not really appropriate for continuous variables (?)
• Overcome using methods like:
• Random Forest, Bootstrap Aggregation (BAGGING)
Tree Based Methods
• Belong to “non-parametric” techniques
– No assumptions about the nature of the relationship
between the predictors and the response variable
• Tree based models
– Decision trees
– Random Forest
– Boosted trees
• Can be used for
– Regression
– Classification
Tree Based Methods
• Basic idea
– Partition the solution space into rectangular areas
• Which predictors to use? Where to split?
• Decided by minimizing a cost function
– Within each rectangle, fit a model (… a constant), as in the sketch below
• Training stops when at least a specified number of training
instances is assigned to each leaf node
• Use: Classification and Regression (CART)
– CART: also the name of the algorithm
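A minimal sketch of this idea, assuming scikit-learn's DecisionTreeRegressor and synthetic data (both assumptions, not the slides' own tooling): the tree partitions the predictor space and predicts a constant, the mean of the training responses, within each leaf.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 10, 80)).reshape(-1, 1)   # one predictor
    y = np.sin(X).ravel() + rng.normal(0, 0.2, 80)       # non-linear response

    # Splitting stops once a leaf would receive fewer than min_samples_leaf
    # training instances, mirroring the stopping rule described above.
    tree = DecisionTreeRegressor(min_samples_leaf=10).fit(X, y)

    print(tree.predict([[2.5]]))   # prediction = mean of the training y in that leaf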
Classification and Regression Trees
Algorithm
• All “features” are “searched” to find the optimum
split
• Once a split happens, each resulting partition is
then recursively split
– By searching all “features” for the optimum split, as above
– Until a termination criterion is reached
• Termination criteria:
– Number of terminal nodes
– Depth of the tree
Regression Trees: Explanation
Consider the following data set
Type of learning:
– Supervised
Technique to be used:
– Recursively Partitioned Regression Tree
Regression Tree: Algorithm
The split search: for each candidate split of the predictor X, the SSE of the
resulting partitions is computed (see the sketch after the table)
SPLIT   SSE
X=1     164
X=2     123.354
X=3     84.823
X=4     63.548
X=5     59.948
X=6     57.328
X=7     71.35
X=8     120.84
The split at X = 6, which gives the minimum SSE (57.328), is chosen.
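A sketch of how the SSE column in the table above can be computed: for each candidate split point, a constant (the mean) is fitted on each side and the squared errors are summed. The toy arrays are placeholders, since the slides' data set is not reproduced here.

    import numpy as np

    def split_sse(x, y, split):
        """SSE of a split: squared deviations from the mean on each side."""
        left, right = y[x <= split], y[x > split]
        sse = 0.0
        for part in (left, right):
            if part.size:
                sse += np.sum((part - part.mean()) ** 2)
        return sse

    x = np.arange(1, 11, dtype=float)                        # placeholder predictor
    y = np.array([2, 3, 5, 4, 6, 7, 12, 13, 14, 15], float)  # placeholder response

    sse_table = {s: split_sse(x, y, s) for s in range(1, 9)}  # candidates X = 1 .. 8
    best = min(sse_table, key=sse_table.get)                  # split with minimum SSE
    print(sse_table, best)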
Pruning of Regression Trees
Pruning of the tree is required when
– The size of the tree is large
– And hence there is a possibility of over-fitting the training
data set
Pruning is carried out using
– Cost complexity tuning
– That is: minimize SSE + cp × (number of terminal nodes)
– cp is known as the complexity parameter
– The best pruned tree is found by varying cp over a range
– SSE or RMSE is used as the selection criterion (see the sketch below)
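A minimal pruning sketch. The slides' cp is the rpart-style complexity parameter; scikit-learn (assumed here as the implementation) exposes the analogous knob as ccp_alpha, and the data below is synthetic.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    X = rng.uniform(0, 10, (120, 1))
    y = np.where(X.ravel() < 6, 2.0, 8.0) + rng.normal(0, 1.0, 120)

    # Candidate complexity parameters derived from the full (unpruned) tree
    path = DecisionTreeRegressor().cost_complexity_pruning_path(X, y)

    best_alpha, best_rmse = None, np.inf
    for alpha in path.ccp_alphas:
        scores = cross_val_score(DecisionTreeRegressor(ccp_alpha=alpha), X, y,
                                 scoring="neg_root_mean_squared_error", cv=5)
        rmse = -scores.mean()              # RMSE as the selection criterion
        if rmse < best_rmse:
            best_alpha, best_rmse = alpha, rmse

    pruned = DecisionTreeRegressor(ccp_alpha=best_alpha).fit(X, y)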
Pruning of Regression Trees
Consider the following data
Full Tree
Trees with cp = 0, 0.1, 0.2
Surrogate Splits
Technique for handling missing values
• Missing data is ignored when the splits are made
• However, alternate splits whose results are similar
are also remembered
• If predictors are not available at some split
– One of the surrogate splits is used
A number of surrogate splits may be stored
for each split in the tree
Classification Trees
How do splits happen in Classification Trees?
Criteria for creating splits in trees
• Continuous response variable: splitting based on
– Variance
• Categorical response variable: splitting based on
– Classification error rates
– Gini Index
– Entropy
Classification Trees: Gini Score
• In classification trees
– Goal: Partition data into smaller homogeneous groups
– The “class” predicted for a data point is the “mode” (most frequent class) of its leaf
– One of the methods for branching: Gini Score
• Given p1 and p2 as the probabilities of Class-1 and
Class-2 respectively of a node, the Gini Score of
the node is defined as:
G = p1 * (1-p1) + p2 * (1-p2)
G = (2 * p1 * p2) … for the two-class problem
Derivation of Gini Score for 2 Class Situation
Gini: Sounds like "Jee-nee" (with a soft 'g' like in "giraffe" and the 'i' like in "jean").
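With two classes, p2 = 1 − p1, so the two-class form quoted above follows in one line:

G = p1 * (1-p1) + p2 * (1-p2) = p1 * p2 + p2 * p1 = 2 * p1 * p2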
Understanding the Gini Score
• The Gini Score represents the impurity of the data
• The lower the score, the higher the purity (a pure node has G = 0;
an evenly mixed two-class node has G = 0.5)
Calculating and Interpreting Gini Scores
The following figures illustrate Gini Score calculations when a dataset is subdivided
• When sub-divided, as shown, the overall Gini Score is the weighted sum of the
Gini Score of each part. After the split we see that the overall score has reduced.
• We say that the split has improved the classification quality by increasing class
purity in the resulting subsets.
Example: Decision based on Gini Index
• The overall Gini Score of a split is the weighted
sum of the Gini Indices of the individual nodes
• If the main node is split on the basis of Gender,
the score of the split is computed as in the sketch below
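A sketch of the weighted-sum rule for a hypothetical Gender split; the class counts below are illustrative assumptions, not the numbers from the slides' figures.

    def gini(counts):
        """Gini Score of a node, given the class counts in that node."""
        n = sum(counts)
        return sum((c / n) * (1 - c / n) for c in counts)

    def split_gini(children):
        """Overall Gini of a split: weighted sum of the children's scores."""
        total = sum(sum(c) for c in children)
        return sum(sum(c) / total * gini(c) for c in children)

    parent = [20, 10]                  # e.g. 20 of Class-1, 10 of Class-2
    male, female = [13, 2], [7, 8]     # hypothetical split on Gender

    print(gini(parent))                # impurity before the split (~0.444)
    print(split_gini([male, female]))  # lower value (~0.364) => purer children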
Classification Trees: Cross-Entropy
• Entropy: An alternative to the Gini Index
• Cross-entropy of a node: D = - Σk pk * log(pk),
where pk is the proportion of class k in the node
• Interpretation:
– The purer the node, the closer the cross-entropy is to zero
– Cross-entropy and the Gini Index take similar values
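A small companion to the Gini sketch above, computing the cross-entropy of a node from its class counts (the counts are illustrative); like the Gini Score, it is zero for a pure node.

    from math import log

    def cross_entropy(counts):
        n = sum(counts)
        return -sum((c / n) * log(c / n) for c in counts if c > 0)

    print(cross_entropy([15, 0]))   # pure node -> 0.0
    print(cross_entropy([10, 10]))  # evenly mixed two-class node -> log(2) ~ 0.693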
Classification Trees: Classification Error Rates
Classification error rate
– The fraction of training observations in the candidate
region that do not belong to the most common class:
E = 1 - max_k(pk)
Observation:
– The classification error rate is not sensitive enough for
growing the trees
Governing Parameters in Tree-based Algorithms
• Parameters that need tuning:
– Minimum samples for a node-split
– Minimum samples for a terminal node
– Maximum depth of the tree
– Maximum number of terminal nodes
– Maximum features to consider for split
• E.g. sqrt(total_number_of_features), as in the sketch below
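How the parameters listed above map onto one concrete API; the scikit-learn parameter names are an assumption of this sketch and the values are purely illustrative.

    from sklearn.tree import DecisionTreeClassifier

    tree = DecisionTreeClassifier(
        min_samples_split=20,   # minimum samples required to split a node
        min_samples_leaf=5,     # minimum samples required at a terminal node
        max_depth=6,            # maximum depth of the tree
        max_leaf_nodes=30,      # maximum number of terminal nodes
        max_features="sqrt",    # e.g. sqrt(total_number_of_features) per split
    )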
Ensemble Techniques in Trees
• Ensemble
– A group of items viewed as a whole rather than
individually
– E.g. a group of musicians who play together
• In the context of trees
– Methods that combine many models’ predictions
• Specific techniques
– BAGGING
– Random Forest
– Boosting
Bootstrap Aggregation (BAGGING)
• Bootstrap Aggregation (BAGGING)
– General method for reducing the variance of a statistical
learning technique
• Method to reduce variance:
– Take a number of samples from the population
• But this is not always possible
– Solution:
• Take repeated samples from the available data set
– Sampling with replacement (bootstrapping)
• Construct a decision tree using each such sample
– Trees are grown deep, not pruned
– Such trees have high variance but low bias
– The overall variance is reduced by aggregating the output of the trees
• Statistical basis of BAGGING: averaging B independent estimates,
each with variance σ², yields an estimate with variance σ²/B
Bootstrap Aggregation (BAGGING)
• Aggregation
– Regression: take the mean of the individual results (as in the sketch below)
– Classification: majority vote
• Note:
– Results are more accurate
– But interpretability goes down
• Bagging improves prediction at the cost of interpretability
– Multiple full trees are generated:
• Hundreds, even thousands of trees may be generated
• Computational time increases
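A minimal sketch of the BAGGING loop described above: repeated bootstrap samples (sampling with replacement), a deep unpruned tree on each, and the mean of the individual predictions as the aggregate. The synthetic data, B = 25 and the scikit-learn trees are assumptions of the sketch.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(2)
    X = rng.uniform(0, 10, (200, 1))
    y = np.sin(X).ravel() + rng.normal(0, 0.3, 200)

    B = 25
    trees = []
    for _ in range(B):
        idx = rng.integers(0, len(X), len(X))   # bootstrap: sample with replacement
        trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))  # deep, unpruned

    X_new = np.array([[2.5], [7.0]])
    y_hat = np.mean([t.predict(X_new) for t in trees], axis=0)  # aggregate = mean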
Data Set and Normal (Full) Tree
BAGGING of Trees
BAGGING with 25 iterations
Random Forest
• Essentially based on the BAGGING concept
– A number of trees are built from bootstrapped samples of
the training data
– Handles the situation of “strong predictors”
• Strong predictors make the bagged trees highly correlated
• ➔ Variance is not reduced as much as expected
• With one key tweak
– At each node, select a random subset of m predictors out of the total p
– And search for the best split only among those m
• Advantages
– Lower variance (the advantage of BAGGING)
– Lower bias (the number of predictors considered can be tuned)
– A very large number of predictors can be handled
Random Forest Algorithm
(Source: Applied Predictive Modeling, Kuhn and Johnson)
Random Forest
• Tuning parameters (see the sketch after this list)
– Node size: can be small (goal: reduce bias)
– Number of trees
– Number of predictors sampled at each split (m):
• Guideline: one-third of the predictors (p/3) for regression
• Guideline: square root of the predictors (√p) for classification
• Limitations
– Cannot extrapolate beyond the range of the training data
– Training and prediction can be slow
– For classification, performance degrades when there are too
many classes
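A sketch of the guidelines above using scikit-learn's random-forest estimators (an assumed implementation, not the slides' own tooling); the hyperparameter values are illustrative.

    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    # Regression: consider roughly one-third of the predictors at each split
    reg = RandomForestRegressor(n_estimators=500, max_features=1/3,
                                min_samples_leaf=5)

    # Classification: consider roughly sqrt(p) predictors at each split
    clf = RandomForestClassifier(n_estimators=500, max_features="sqrt")

    # reg.fit(X_train, y_train); clf.fit(X_train, y_train)   # with your own data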
Data Set 2: With Single Tree
Data Set 2: BAGGED Tree
Data Set 2: Random Forest
Data Set 2: Random Forest – Error Trend
Boosted Trees
• Essential idea
– Use a set of (potentially weak) classifiers to boost the
overall result
• Boosting:
– There is no bootstrap sampling
– Each tree is grown using information from the previously grown trees
– Boosting fits each new tree to the residuals of the previous tree
• Boosted trees are very popular …
– Originally developed for CLASSIFICATION
– Example: Adaptive Boosting (ADABOOST)
Single Tree: All points with equal weights
Single Tree: First 100 points with weights=5
Single Tree: 1:100->wts=5; 500:720->wts=0
Boosting Algorithms
• First, build a tree with equal weights assigned
to all observations
• Compute the prediction error for each
observation
• For observations in error, increase the weights; for
observations not in error, decrease the weights
• Re-fit a new tree with the newly weighted
observations
• Continue until the pre-decided number of trees has
been fit
• Predicted value = weighted / voted value from all the
trees generated (see the sketch below)
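A sketch of the reweighting loop above in the style of ADABOOST; the specific weight-update rule, the synthetic data and the use of depth-1 trees are simplifying assumptions, not the exact scheme behind the slides' figures.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(3)
    X = rng.normal(size=(300, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    n_trees = 100
    w = np.full(len(X), 1 / len(X))           # start with equal weights
    trees, alphas = [], []
    for _ in range(n_trees):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y
        err = np.sum(w[miss]) / np.sum(w)
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # this tree's vote weight
        w *= np.exp(np.where(miss, alpha, -alpha))  # up-weight errors, down-weight the rest
        w /= w.sum()
        trees.append(stump)
        alphas.append(alpha)

    # Predicted value: weighted vote over all the trees generated
    votes = sum(a * (2 * t.predict(X) - 1) for a, t in zip(alphas, trees))
    y_hat = (votes > 0).astype(int)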
Boosted Tree (100 trees)
Boosted Tree (500 Trees)
Boosted Tree (100 trees)
Boosted Tree (500 trees)