
Decision Trees

Interview for Selection of Engineer

⚫ Suppose 50 candidates are interviewed; each feature below shows how many answer Yes and how many No.
⚫ Well Dressed: 46 Yes, 4 No
⚫ 3 years' experience: 12 Yes, 38 No
⚫ G.K.: 8 Yes, 42 No
⚫ B.Tech: 32 Yes, 18 No
⚫ PhD: 1 Yes, 49 No
⚫ Internship done: 20 Yes, 30 No
⚫ Good physique: 40 Yes, 10 No
⚫ Entropy(S) = − ∑ pᵢ * log₂(pᵢ) ; i = 1 to n

⚫ where n is the total number of classes in the target column (output)
⚫ and pᵢ is the probability of class ‘i’

⚫ There will be a number of features.

⚫ Calculate the information gain for each one.
⚫ Entropy(S) = − ∑ pᵢ * log₂(pᵢ) ; i = 1 to n
⚫ Change of base (for calculators): log_b x = log_a x / log_a b, so log₂ can be computed from log₁₀
⚫ For a 46 Yes / 4 No split:
⚫ − (46/50) log₂ (46/50) − (4/50) log₂ (4/50) = 0.402
⚫ For a 28 Yes / 22 No split:
⚫ − (28/50) log₂ (28/50) − (22/50) log₂ (22/50) = 0.989
⚫ A heavily skewed split has low entropy; a nearly even split has entropy close to 1.

⚫ Information Gain for a feature A is computed as
⚫ IG(S, A) = Entropy(S) − ∑ (|Sᵥ| / |S|) * Entropy(Sᵥ), summed over the values v of A

⚫ where Sᵥ is the set of rows in S for which the feature column A has value v,
⚫ |Sᵥ| is the number of rows in Sᵥ,
⚫ and likewise |S| is the number of rows in S.
⚫ Suppose there are 3 features: A, B and C.
⚫ Compute IG(S, A), IG(S, B), IG(S, C).
⚫ Whichever is maximum, select that feature as the root of the decision tree.
⚫ That tends to result in a tree of small height.
⚫ Today’s example was from
⚫ https://towardsdatascience.com/decision-trees-for-classification-id3-algorithm-explained-89df76e72df1
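A minimal Python sketch of these two formulas (the function and variable names are illustrative, not from the slides):

import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum_i p_i * log2(p_i) over the class labels in S.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    # IG(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v).
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# A 46 Yes / 4 No split has low entropy; a 28 Yes / 22 No split is close to 1.
print(entropy(["Yes"] * 46 + ["No"] * 4))    # ~0.402
print(entropy(["Yes"] * 28 + ["No"] * 22))   # ~0.989

The feature with the largest information_gain over the training rows is then chosen as the root.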
Choosing the attribute split

◼ IDEA: evaluate an attribute according to its power of separation between near instances

◼ Values of a good attribute should distinguish between near instances from different classes and be similar for near instances from the same class

◼ Numerical values can be discretized


Choosing the attribute

◼ Main difference among decision-tree learners: the divide (split) criterion

◼ Which attribute should be tested at each node in the tree? The attribute that is most useful for classifying examples.
Example for play tennis
Day Outlook Temperature Humidity Wind PlayTennis?
1 Sunny Hot High Light No
2 Sunny Hot High Strong No
3 Overcast Hot High Light Yes
4 Rain Mild High Light Yes
5 Rain Cool Normal Light Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Light No
9 Sunny Cool Normal Light Yes
10 Rain Mild Normal Light Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Light Yes
14 Rain Mild High Strong No
Entropy of the whole set

⚫ Entropy of the whole set (9 Yes, 5 No out of 14 days)


⚫ – (9/14) log₂ (9/14) – (5/14) log₂ (5/14)
⚫ = 0.94
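This value can be checked with a couple of lines of Python (a quick sanity check, not part of the original slides):

import math

# Entropy of the full PlayTennis set: 9 Yes, 5 No out of 14 days.
p_yes, p_no = 9 / 14, 5 / 14
print(-p_yes * math.log2(p_yes) - p_no * math.log2(p_no))   # ~0.940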
Info Gain for Wind parameter

⚫ Entropy of Light wind component


⚫ 8 Light (2 No, 6 Yes)


⚫ – (2/8) log₂ (2/8) – (6/8) log₂ (6/8)
⚫ = 0.811
Info Gain for Wind parameter

⚫ Entropy of strong wind component


⚫ 6 Strong (3 No, 3 Yes)


⚫ – (3/6) log₂ (3/6) – (3/6) log₂ (3/6)
⚫ = 1.0
Info Gain for Wind parameter

⚫ Information Gain for wind

⚫ 8 Light, 6 Strong
⚫ 0.94 – (8/14) 0.811 – (6/14) 1.0
⚫ = 0.048
Info Gain for all parameters

⚫ Information Gains
⚫ IG(S, Wind) = 0.048
⚫ IG(S, Outlook) = 0.246
⚫ IG(S, Temperature) = 0.029
⚫ IG(S, Humidity) = 0.15
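These four gains can be reproduced from the table with a short script (a sketch; DATA, FEATURES and the helper names are illustrative):

import math
from collections import Counter

# (Outlook, Temperature, Humidity, Wind, PlayTennis) for the 14 days in the table.
DATA = [
    ("Sunny", "Hot", "High", "Light", "No"),         ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Light", "Yes"),     ("Rain", "Mild", "High", "Light", "Yes"),
    ("Rain", "Cool", "Normal", "Light", "Yes"),      ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Light", "No"),
    ("Sunny", "Cool", "Normal", "Light", "Yes"),     ("Rain", "Mild", "Normal", "Light", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Light", "Yes"),   ("Rain", "Mild", "High", "Strong", "No"),
]
FEATURES = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(col):
    labels = [row[-1] for row in DATA]
    groups = {}
    for row in DATA:
        groups.setdefault(row[col], []).append(row[-1])
    remainder = sum(len(g) / len(DATA) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

for i, name in enumerate(FEATURES):
    print(name, round(info_gain(i), 3))
# ~0.247, ~0.029, ~0.152, ~0.048 (the slide values, up to rounding);
# Outlook has the largest gain and is chosen as the root.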
Dealing with continuous-valued attributes

▪ So far we have assumed discrete values for attributes and for the outcome.
▪ Given a continuous-valued attribute A, dynamically create a new Boolean attribute Ac:
Ac = True if A < c, False otherwise
▪ How do we determine the threshold value c?



Dealing with continuous-valued attributes

▪ Example: suppose Temperature takes the numeric values below in the PlayTennis example.
▪ Sort the examples according to Temperature:
Temperature: 40  48 | 60  72  80 | 90
PlayTennis:  No  No | Yes Yes Yes | No
▪ Determine candidate thresholds by averaging consecutive values where there is a change in classification: (48+60)/2 = 54 and (80+90)/2 = 85.
▪ Evaluate the candidate thresholds (attributes) according to information gain. The best is Temperature > 54.
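One way to generate those candidate thresholds in code (a sketch over (value, label) pairs; the names are illustrative):

def candidate_thresholds(values, labels):
    # Midpoints between consecutive sorted values where the classification changes.
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if c1 != c2 and v1 != v2:
            thresholds.append((v1 + v2) / 2)
    return thresholds

# Temperature example from this slide:
print(candidate_thresholds([40, 48, 60, 72, 80, 90],
                           ["No", "No", "Yes", "Yes", "Yes", "No"]))   # [54.0, 85.0]

Each candidate c is then scored by the information gain of the Boolean attribute A < c, and the best one is kept.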
Problems with information gain

⚫ Natural bias of information gain: it favours attributes with many possible values.
⚫ Consider the attribute Date in the PlayTennis example.

– Date would have the highest information gain, since it perfectly separates the training data.
– It would be selected at the root, resulting in a very broad tree.
– Very good on the training data, this tree would perform poorly at predicting unknown instances: overfitting.

Problems with information gain

⚫ The problem is that the partition is too specific: too many small classes are generated.
⚫ We need to look at alternative measures …



C4.5
An alternative measure: gain ratio

⚫ SplitInformation(S, A) = − ∑ (|Sᵢ| / |S|) * log₂ (|Sᵢ| / |S|) ; i = 1 to c

⚫ Sᵢ are the sets obtained by partitioning on value i of A

⚫ SplitInformation measures the entropy of S with respect to the values of A. The more uniformly dispersed the data, the higher it is.
⚫ GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)



An alternative measure: gain ratio

⚫ GainRatio penalizes attributes that split the examples into many small classes, such as Date. Let |S| = n; Date splits the examples into n classes:
– SplitInformation(S, Date) = −[(1/n) log₂ (1/n) + … + (1/n) log₂ (1/n)] = −log₂ (1/n) = log₂ n
⚫ Compare with an attribute A that splits the data into two even classes:
– SplitInformation(S, A) = −[(1/2) log₂ (1/2) + (1/2) log₂ (1/2)] = −[−1/2 − 1/2] = 1
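A small Python sketch of SplitInformation and GainRatio, reproducing the Date-versus-even-split comparison (the names are illustrative):

import math
from collections import Counter

def split_information(values):
    # Entropy of S with respect to the values of attribute A.
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(gain, values):
    return gain / split_information(values)

# A Date-like attribute: each of the 14 examples has a distinct value.
print(split_information(list(range(14))))           # ~3.807 = log2(14)
# An attribute that splits the data into two even halves.
print(split_information(["a"] * 7 + ["b"] * 7))     # 1.0

The large denominator for Date drives its gain ratio down, even though its raw information gain is maximal.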



Example for play tennis (numeric Temperature and Humidity)
Day Outlook Temperature Humidity Wind PlayTennis?
1 Sunny 85 85 Light No
2 Sunny 80 90 Strong No
3 Overcast 83 78 Light Yes
4 Rain 70 96 Light Yes
5 Rain 68 80 Light Yes
6 Rain 65 70 Strong No
7 Overcast 64 65 Strong Yes
8 Sunny 72 95 Light No
9 Sunny 69 70 Light Yes
10 Rain 75 80 Light Yes
11 Sunny 75 70 Strong Yes
12 Overcast 72 90 Strong Yes
13 Overcast 81 75 Light Yes
14 Rain 71 80 Strong No
Info Gain for Wind parameter

⚫ Information Gain for wind

⚫ 8 Light, 6 Strong
⚫ 0.94 – (8/14) 0.811 – (6/14) 1.0
⚫ = 0.048
Gain ratio for Wind

⚫ SplitInfo: how many Light and how many Strong?

⚫ SplitInfo = − (8/14) log₂ (8/14) − (6/14) log₂ (6/14)
⚫ = 0.985

⚫ Gain Ratio = 0.048 / 0.985
⚫ ≈ 0.049
Gain ratio for Outlook

⚫ Info Gain = 0.246

⚫ SplitInfo (5 Sunny, 4 Overcast, 5 Rain):
⚫ − (5/14) log₂ (5/14) − (4/14) log₂ (4/14) − (5/14) log₂ (5/14)
⚫ = 1.577

⚫ Gain Ratio = 0.246 / 1.577
⚫ ≈ 0.156
Gain ratio for Humidity

⚫ Humidity is a continuous attribute.
⚫ C4.5 proposes to split it using a threshold.
⚫ Split at the threshold with maximum gain (try a number of candidate splits).

⚫ First partition the table at Humidity 75: five days with Humidity ≤ 75 and nine days with Humidity > 75.
⚫ Entropy for Humidity ≤ 75 (5 days: 1 No, 4 Yes)
⚫ = − (1/5) log₂ (1/5) − (4/5) log₂ (4/5) = 0.721
⚫ Entropy for Humidity > 75 (9 days: 4 No, 5 Yes)
⚫ = − (5/9) log₂ (5/9) − (4/9) log₂ (4/9) = 0.991
⚫ Gain(Humidity) = 0.94 − (5/14) 0.721 − (9/14) 0.991
⚫ = 0.045
⚫ SplitInfo = − (5/14) log₂ (5/14) − (9/14) log₂ (9/14) = 0.94

⚫ Gain ratio Humidity75 = 0.045 / 0.94 ≈ 0.048


⚫ Gain ratio Humidity70= 0.016
⚫ Gain ratio Humidity80 = 0.107
⚫ Gain ratio Temp83 = 0.305

⚫ The Temperature split (at 83) happens to give the highest gain ratio.
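The threshold comparison above can be reproduced with a short script over the numeric Humidity column (a sketch; the third decimal differs slightly from the slide values because the slides round intermediate results):

import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

# Humidity and PlayTennis for the 14 days in the numeric table above.
humidity = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

def gain_ratio_at(threshold):
    low = [c for h, c in zip(humidity, play) if h <= threshold]
    high = [c for h, c in zip(humidity, play) if h > threshold]
    gain = (entropy(play)
            - (len(low) / 14) * entropy(low)
            - (len(high) / 14) * entropy(high))
    split_info = entropy(["low"] * len(low) + ["high"] * len(high))
    return gain / split_info

for t in (70, 75, 80):
    print(t, round(gain_ratio_at(t), 3))   # ~0.017, ~0.048, ~0.109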


Random Forests
Definition

⚫ Random forest (or random forests) is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees.

Algorithm

Each tree is constructed using the following algorithm:


1. Let the number of training cases be N, and the number of variables in the
classifier be M.
2. We are told the number m of input variables to be used to determine the
decision at a node of the tree; m should be much less than M.
3. Choose a training set for this tree by sampling N times with replacement from all N available training cases (i.e. take a bootstrap sample).
4. For each node of the tree, randomly choose m variables on which to base
the decision at that node. Calculate the best split based on these m
variables in the training set.
5. Each tree is fully grown and not pruned
⚫ For prediction, a new sample is pushed down each tree and is assigned the label of the training samples in the terminal node it ends up in. This procedure is iterated over all trees in the ensemble, and the majority vote of all trees is reported as the random forest prediction.
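The algorithm above maps fairly directly onto scikit-learn's RandomForestClassifier; the snippet below is an illustrative usage sketch on synthetic data (the parameter choices are assumptions, not prescribed by the slides):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for N training cases with M input variables.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees in the ensemble
    max_features="sqrt",   # m variables tried at each node, m << M
    bootstrap=True,        # each tree is trained on a bootstrap sample
    oob_score=True,        # internal (out-of-bag) estimate of generalization error
    random_state=0,
)
forest.fit(X_train, y_train)
print(forest.oob_score_)                  # out-of-bag accuracy estimate
print(forest.score(X_test, y_test))       # majority vote over the trees on held-out data
print(forest.feature_importances_[:5])    # variable-importance estimates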
Features and Advantages

The advantages of random forest are:


⚫ It is one of the most accurate learning algorithms available. For
many data sets, it produces a highly accurate classifier.
⚫ It runs efficiently on large databases.
⚫ It can handle thousands of input variables without variable
deletion.
⚫ It gives estimates of what variables are important in the
classification.
⚫ It generates an internal unbiased estimate of the generalization
error as the forest building progresses.
Features and Advantages

⚫ It has methods for balancing error in data sets where the class populations are unbalanced.
⚫ Generated forests can be saved for future use on other data.
⚫ Prototypes are computed that give information about the relation
between the variables and the classification.
⚫ The capabilities of the above can be extended to unlabeled data,
leading to unsupervised clustering, data views and outlier
detection.
⚫ It offers an experimental method for detecting variable
interactions.
Disadvantages

⚫ Random forests have been observed to overfit for some datasets with noisy classification/regression tasks.
⚫ For data including categorical variables with different numbers of levels, random forests are biased in favor of the attributes with more levels. Therefore, the variable importance scores from a random forest are not reliable for this type of data.

⚫ Read more about random forests here:
⚫ https://towardsdatascience.com/understanding-random-forest-58381e0602d2
