CHL5230 2025w Lecture 07 v1
Nicholas Mitsakakis
CHL5230H - Applied Machine Learning for Health Data
Introduction
- Tree-based methods split the feature space recursively, using one feature at a time
- The resulting segmentation of the space can be summarized and visualized by a tree-like structure
- They can be used for both classification and regression
- They are often called decision trees
- One of the most popular methods, Classification And Regression Trees (CART), was introduced by Leo Breiman and colleagues in 1984
Recursive partitioning
- After the initial partition, each of the two resulting regions can be partitioned further
- For each region, a feature and a threshold are again selected to define the split
- The result has a tree-like structure
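The recursion described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the CART implementation: the function names (`gini`, `best_split`, `grow`, `predict`), the `max_depth` stopping rule, and the choice of the weighted Gini index as the split criterion are all assumptions made for the example.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels (0 for a pure node)."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature index, threshold) pair that most reduces the
    weighted Gini index, or None if no split improves on the parent."""
    n = len(y)
    best, best_score = None, gini(y)
    for j in range(len(X[0])):
        # Candidate thresholds: observed values, excluding the largest
        # so that both child regions are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best


def grow(X, y, depth=0, max_depth=3):
    """Recursively partition (X, y); leaves hold the majority class."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    j, t = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([X[i] for i in left], [y[i] for i in left],
                     depth + 1, max_depth),
        "right": grow([X[i] for i in right], [y[i] for i in right],
                      depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(tree, dict):
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree


# Toy data (made up for illustration): the class depends only on feature 1.
X = [[1, 1], [2, 1], [1, 3], [2, 3], [3, 1], [3, 3]]
y = ["a", "a", "b", "b", "a", "b"]
tree = grow(X, y)
```

On this toy data a single split on the second feature already separates the two classes, so the recursion stops after one level. Real implementations add further stopping rules (for example a minimum node size) and pruning.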
For classification
Measuring purity
- Gini index: $G = \sum_{k=1}^{K} \hat{p}_{mk} (1 - \hat{p}_{mk})$
- Cross-entropy: $D = -\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}$
- $\hat{p}_{mk}$ is the proportion of training observations in the m-th region that belong to the k-th class
- Both measures take values close to 0 when every $\hat{p}_{mk}$ is close to either 0 or 1, i.e. when the region is nearly pure
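As a quick numerical check of the two purity measures, the sketch below evaluates them for a pure region and for an evenly mixed two-class region. The function names and the example proportions are illustrative assumptions; the natural logarithm is used, and 0 log 0 is taken to be 0.

```python
import math


def gini_index(proportions):
    """Gini index: sum over classes of p * (1 - p)."""
    return sum(p * (1.0 - p) for p in proportions)


def cross_entropy(proportions):
    """Cross-entropy: minus the sum of p * log(p), skipping classes
    with p == 0 (the convention 0 * log 0 = 0)."""
    return -sum(p * math.log(p) for p in proportions if p > 0)


pure = [1.0, 0.0]    # every observation in the region is from class 1
mixed = [0.5, 0.5]   # the two classes are equally represented

pure_scores = (gini_index(pure), cross_entropy(pure))
mixed_scores = (gini_index(mixed), cross_entropy(mixed))
```

Both measures are 0 for the pure region. For the evenly mixed region the Gini index is 0.5 and the cross-entropy is log 2, which are the largest values either measure can take with two classes.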
Purity of partitions
Here the P(survival) values for the leaves are 0.73, 0.17, 0.05, and 0.89
Missing data
[Figure: classification tree with root split on Thal:a and leaves labelled Yes/No, alongside a plot of Training, Cross-Validation, and Test error (y-axis: Error) against tree size (x-axis: Tree Size)]
Regression trees
Predictions
[Figure: four panels plotting X2 against X1, with both axes running from −2 to 2]
Discussion