3 Decision Trees_LMS
⚫ Entropy(S) = − ∑ᵢ₌₁ⁿ pᵢ log₂ pᵢ
⚫ where,
⚫ n is the total number of classes in the target column (output)
⚫ pᵢ is the probability of class ‘i’
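As a quick check of the definition, here is a minimal Python sketch (the helper name `entropy` is my own; the 9-Yes/5-No counts come from the PlayTennis table later in this section):

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# PlayTennis target: 9 Yes, 5 No  ->  about 0.940 bits
print(round(entropy([9, 5]), 3))  # 0.94
```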
⚫ Information Gain for a feature A is computed as
⚫ IG(S, A) = Entropy(S) − ∑ᵥ (|Sᵥ| / |S|) · Entropy(Sᵥ)
⚫ where the sum runs over the values v of A, and Sᵥ is the subset of S for which A has value v
⚫ Example: Wind has 8 Light and 6 Strong examples, so
⚫ IG(S, Wind) = 0.94 − (8/14) · 0.811 − (6/14) · 1.0
⚫ = 0.048
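The 0.048 figure can be reproduced with a short sketch (class counts per Wind value are taken from the PlayTennis table later in the section; the helper name is my own):

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Whole set S: 9 Yes / 5 No; Wind=Light: 6 Yes / 2 No (8 days); Wind=Strong: 3 Yes / 3 No (6 days)
gain_wind = entropy([9, 5]) - (8 / 14) * entropy([6, 2]) - (6 / 14) * entropy([3, 3])
print(round(gain_wind, 3))  # 0.048
```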
Info Gain for all parameters
⚫ IG(S, Wind) = 0.048
⚫ IG(S, Outlook) = 0.246
⚫ IG(S, Temperature) = 0.029
⚫ IG(S, Humidity) = 0.15
⚫ Outlook has the largest information gain, so it is chosen as the root node (see the sketch below).
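As a tiny sketch of that selection step (the dict literal simply restates the gains listed above):

```python
gains = {"Outlook": 0.246, "Humidity": 0.15, "Wind": 0.048, "Temperature": 0.029}
root = max(gains, key=gains.get)   # attribute with the largest information gain
print(root)  # Outlook
```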
Dealing with continuous-valued attributes
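The slide body for this topic is not included here. The usual approach (as in C4.5) is to sort the numeric values, form candidate thresholds midway between adjacent examples whose class labels differ, and evaluate each binary test A ≤ t by information gain. A minimal sketch under that assumption; the temperature values and labels below are purely illustrative:

```python
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

def best_threshold(values, labels):
    """Pick the candidate threshold t (midpoint between adjacent sorted values
    whose labels differ) that maximises the information gain of the test value <= t."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, -1.0)
    for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
        if l1 == l2 or v1 == v2:
            continue
        t = (v1 + v2) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = base - len(left) / len(pairs) * entropy(left) \
                    - len(right) / len(pairs) * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical temperature readings with PlayTennis labels, for illustration only
temps = [40, 48, 60, 72, 80, 90]
play  = ["No", "No", "Yes", "Yes", "Yes", "No"]
print(best_threshold(temps, play))  # best split at t = 54.0
```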
Example for play tennis
Day  Outlook   Temperature  Humidity  Wind    PlayTennis?
1 Sunny Hot High Light No
2 Sunny Hot High Strong No
3 Overcast Hot High Light Yes
4 Rain Mild High Light Yes
5 Rain Cool Normal Light Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Light No
9 Sunny Cool Normal Light Yes
10 Rain Mild Normal Light Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Light Yes
14 Rain Mild High Strong No
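To tie the table back to the information-gain figures quoted earlier, the following sketch encodes the 14 rows and recomputes the gain of each attribute (function and variable names are my own; Humidity comes out as 0.151, shown rounded to 0.15 above):

```python
from math import log2

# The 14 PlayTennis examples as (Outlook, Temperature, Humidity, Wind, PlayTennis)
data = [
    ("Sunny", "Hot", "High", "Light", "No"),        ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Light", "Yes"),    ("Rain", "Mild", "High", "Light", "Yes"),
    ("Rain", "Cool", "Normal", "Light", "Yes"),     ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),("Sunny", "Mild", "High", "Light", "No"),
    ("Sunny", "Cool", "Normal", "Light", "Yes"),    ("Rain", "Mild", "Normal", "Light", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),   ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Light", "Yes"),  ("Rain", "Mild", "High", "Strong", "No"),
]
attrs = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def info_gain(rows, col):
    labels = [r[-1] for r in rows]
    gain = entropy(labels)
    for v in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

for name, col in attrs.items():
    print(name, round(info_gain(data, col), 3))
# Outlook 0.246, Temperature 0.029, Humidity 0.151, Wind 0.048
```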
Problems with information gain
⚫ Information gain favours attributes with many distinct values (e.g. a Day/ID column would get maximal gain but no predictive power); the gain ratio corrects for this by dividing the gain by the split information.
⚫ SplitInformation(S, A) = − ∑ᵢ₌₁ᶜ (|Sᵢ| / |S|) · log₂ (|Sᵢ| / |S|)
⚫ Sᵢ are the sets obtained by partitioning on value i of A
⚫ GainRatio(S, A) = IG(S, A) / SplitInformation(S, A)
Information gain for Wind
⚫ 8 Light, 6 Strong
⚫ IG(S, Wind) = 0.94 − (8/14) · 0.811 − (6/14) · 1.0 = 0.048
⚫ SplitInformation(S, Wind) = − (8/14) log₂ (8/14) − (6/14) log₂ (6/14) = 0.985
⚫ GainRatio(S, Wind) = 0.048 / 0.985 ≈ 0.049
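The split information and gain ratio for Wind can be checked with a short sketch (helper names are my own):

```python
from math import log2

def split_information(sizes):
    """SplitInformation over the subset sizes produced by partitioning on an attribute."""
    total = sum(sizes)
    return -sum((s / total) * log2(s / total) for s in sizes)

split_info_wind = split_information([8, 6])   # 8 Light, 6 Strong
gain_ratio_wind = 0.048 / split_info_wind     # IG(S, Wind) from above
print(round(split_info_wind, 3), round(gain_ratio_wind, 3))  # 0.985 0.049
```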
Algorithm
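The slide body for the algorithm is not included here. Since the section builds the tree top-down by information gain, the recursion is presumably ID3-style; here is a minimal sketch of that recursion under that assumption, reusing the list-of-tuples data layout from the earlier snippets (all names are my own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, col):
    labels = [r[-1] for r in rows]
    gain = entropy(labels)
    for v in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

def id3(rows, attributes):
    """Grow a tree top-down: dict nodes keyed by (attribute, value), leaves are class labels."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:              # all examples agree -> leaf
        return labels[0]
    if not attributes:                     # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, attributes[a]))
    remaining = {a: c for a, c in attributes.items() if a != best}
    tree = {}
    for v in set(r[attributes[best]] for r in rows):
        branch = [r for r in rows if r[attributes[best]] == v]
        tree[(best, v)] = id3(branch, remaining)
    return tree
```

With the `data` and `attrs` from the table snippet above, `id3(data, attrs)` puts Outlook at the root, splits the Sunny branch on Humidity and the Rain branch on Wind, and makes Overcast a pure Yes leaf.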
⚫ Read more about random forests here:
⚫ https://towardsdatascience.com/understanding-random-forest-58381e0602d2
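For completeness, a minimal scikit-learn example of a random forest (an ensemble of decision trees grown on bootstrap samples with random feature subsets); the dataset and parameters are placeholders, not taken from the slides:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 100 decision trees whose majority vote gives the prediction
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # test-set accuracy
```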