Tasks on Decision Trees

The document provides solutions and explanations for assignments related to decision trees in a machine learning course. It covers concepts such as entropy, information gain, and the process of building decision trees using top-down induction. The document includes specific assignment questions and their correct answers, along with detailed calculations and explanations of the underlying principles.


NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Topic: Learning of Decision Trees

Solutions and Explanations for Assignment Q 4 and 5 - Week 4


Assignment tasks - Week 4 2021

Problem # 4 Correct Marks: 2 Theme: Learning of Decision Trees

What is the entropy for a decision tree data-set with 6 positive and 4 negative examples?

A. 0.97
B. 0.840
C. 0.41

Answer: A
Assignment tasks - Week 4 2021

Problem # 5 Correct Marks: 2 Theme: Learning of Decision Trees

What is the value of the Information Gain in the following partitioning?

Parent node: N = 100, Entropy = 0.7
Child nodes: N = 60, Entropy = 0.4 and N = 40, Entropy = 0.2

A. 0.26
B. 0.38
C. 0.42
D. 0.18

Answer: B
Learning of Decision Trees
We will focus on a particular category of learning techniques called
Top-Down Induction of Decision Trees (TDIDT).

The scenario for learning is supervised non-incremental data-driven learning from examples.

The systems are presented with a set of instances and develop a decision tree from the top down, guided by frequency information in the examples. The trees are constructed beginning with the root of the tree and proceeding down to its leaves.

The order in which instances are presented is not supposed to influence the build-up of the trees. The systems typically examine and re-examine all of the instances at many stages during learning.

Building the tree from the top downward, the issue is to choose and order features that discriminate data-items (instances) in an optimal way; a minimal code sketch of this procedure follows the subtopics listed below.

Subtopics:
- Use of information theoretic measures to guide the selection and ordering of features
- Avoiding underfitting and overfitting by pruning of the tree
- Generation of several decision trees in parallel (e.g. random forest).
- Introduction of some kind of Inductive Bias (e.g. Occam's razor)
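
A minimal Python sketch of this top-down procedure, not the course's reference implementation: instances are assumed to be dicts mapping feature names to values, with the target class stored under "class", and the feature-selection criterion (for example one based on Information Gain, introduced below) is passed in as choose_best_feature.

```python
from collections import Counter

def majority_class(instances):
    # Most common target class among the given instances.
    return Counter(inst["class"] for inst in instances).most_common(1)[0][0]

def tdidt(instances, features, choose_best_feature):
    # Top-Down Induction of Decision Trees: recursively partition the instances.
    classes = {inst["class"] for inst in instances}
    if len(classes) == 1:          # pure node: return the single class as a leaf
        return classes.pop()
    if not features:               # no features left: return the majority class
        return majority_class(instances)

    best = choose_best_feature(instances, features)     # e.g. highest Information Gain
    branches = {}
    for value in sorted({inst[best] for inst in instances}):
        subset = [inst for inst in instances if inst[best] == value]
        remaining = [f for f in features if f != best]
        branches[value] = tdidt(subset, remaining, choose_best_feature)
    return (best, branches)        # internal node: chosen feature plus one branch per value
```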
Purity or Homogeneity

The entire data-set (all training instances) is associated with the tree as a whole (the root).

For every decision split based on a chosen feature and its values, the data-set is partitioned and the sub-sets become associated with the child nodes.

This is repeated recursively down to the leaves.

Purity or homogeneity refers to the distribution of data-items over the k target classes, both for the root and for each of the nodes. A lower degree of class mixing implies higher purity.

Most algorithms aim to maximize the purity of all nodes.

The purity or impurity of nodes is measured by one of a set of alternative information-theoretic metrics.
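
As a small illustration of partitioning and purity, the sketch below splits a toy data-set on a single feature and prints the class distribution of each resulting subset; the toy data and names are invented purely for illustration.

```python
from collections import Counter, defaultdict

# Toy instances as (feature value, target class) pairs -- invented for illustration.
data = [("sunny", "+"), ("sunny", "-"), ("rain", "+"),
        ("rain", "+"), ("rain", "-"), ("overcast", "+")]

# Partition the data-set on the feature value; each subset belongs to one child node.
partitions = defaultdict(list)
for value, cls in data:
    partitions[value].append(cls)

# The class distribution of each subset indicates its purity:
# the less mixed the distribution, the purer (more homogeneous) the node.
for value, classes in partitions.items():
    print(value, dict(Counter(classes)))
# sunny {'+': 1, '-': 1}
# rain {'+': 2, '-': 1}
# overcast {'+': 1}
```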
Entropy measures

Entropy is a statistical measure from information theory that characterizes the impurity of an arbitrary collection of examples S.

For binary classification: H(S) = −p(+) * log2 p(+) − p(−) * log2 p(−)

For n-ary classification: H(S) = − Σ_{c ∈ target classes} p(c) * log2 p(c)
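
A small Python helper sketching the n-ary entropy formula above; the counts-based interface is an assumption made for convenience.

```python
import math

def entropy(counts):
    """Entropy H(S) computed from class counts, e.g. entropy([9, 5]) for [9+, 5-]."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:                  # the term for an empty class is taken as 0
            p = c / total
            h -= p * math.log2(p)
    return h

print(round(entropy([9, 5]), 3))   # 0.94, the value used in the example below
```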
Information Gain

Information Gain is a statistical measure that indicates how well a given feature F separates (discriminates) the instances of an arbitrary collection of examples S according to the target classes.

|S| = the cardinality of S. S_v = the subset of S whose instances have value v for feature F.

Gain(S, F) = Entropy(S) − Σ_{v ∈ values(F)} (|S_v| / |S|) * Entropy(S_v)
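
A matching sketch of Gain(S, F), again working from class counts; the interface (parent counts plus a list of per-subset counts) is an assumption made for brevity, and the entropy helper repeats the one sketched above.

```python
from math import log2

def entropy(counts):
    # Entropy from class counts (same as the earlier sketch).
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, subsets_counts):
    """Gain(S, F) = Entropy(S) - sum over v of (|S_v| / |S|) * Entropy(S_v)."""
    total = sum(parent_counts)
    weighted = sum(sum(sub) / total * entropy(sub) for sub in subsets_counts)
    return entropy(parent_counts) - weighted

# [9+, 5-] split into [6+, 2-] and [3+, 3-], as in the Wind example below:
print(round(information_gain([9, 5], [[6, 2], [3, 3]]), 3))   # 0.048
```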
Example
Entropy H of S (the whole data-set)

S = {D1, ..., D14} = [9+, 5−]
H(S) = −(9/14) * log2(9/14) − (5/14) * log2(5/14) = 0.940

Information Gain for the Wind feature

S_Weak = {D1, D3, D4, D5, D8, D9, D10, D13} = [6+, 2−]
S_Strong = {D2, D6, D7, D11, D12, D14} = [3+, 3−]

Gain(S, Wind) = H(S) − Σ_{v ∈ values(Wind)} (|S_v| / |S|) * H(S_v)
             = H(S) − (8/14) * H(S_Weak) − (6/14) * H(S_Strong)
             = 0.940 − (8/14) * 0.811 − (6/14) * 1.0 = 0.048

Information gains for the four features:

Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029

Outlook has the highest Information Gain and is the preferred feature to discriminate
among data-items.
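
The calculation above can be checked numerically; below is a minimal stand-alone snippet, using a two-class entropy helper defined just for this check.

```python
from math import log2

def H(pos, neg):
    # Binary entropy from positive/negative counts.
    total = pos + neg
    return -(pos / total) * log2(pos / total) - (neg / total) * log2(neg / total)

H_S = H(9, 5)                                              # about 0.940
gain_wind = H_S - (8 / 14) * H(6, 2) - (6 / 14) * H(3, 3)  # about 0.048
print(round(H_S, 3), round(gain_wind, 3))
```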
Solution Q4

What is the entropy for a decision tree data-set S with 6 positive and 4 negative examples?

H(S) = −p(+) * log2 p(+) − p(−) * log2 p(−)

H(S) = −0.6 * log2(0.6) − 0.4 * log2(0.4) = −(0.6 * −0.737 + 0.4 * −1.322) = 0.97
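
The same number can be reproduced with a few lines of Python (a quick check, not part of the original solution):

```python
from math import log2

# 6 positive and 4 negative examples: p(+) = 0.6, p(-) = 0.4
H = -0.6 * log2(0.6) - 0.4 * log2(0.4)
print(round(H, 2))   # 0.97
```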
Solution Q5

What is the value of the Information Gain in the following partitioning?

Parent node: N = 100, Entropy = 0.7
Child nodes: N = 60, Entropy = 0.4 and N = 40, Entropy = 0.2

Gain(S, F) = Entropy(S) − Σ_{v ∈ values(F)} (|S_v| / |S|) * Entropy(S_v)

Gain(S, F) = 0.7 − (60/100 * 0.4 + 40/100 * 0.2) = 0.7 − 0.32 = 0.38
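
And a short numerical check of this partitioning (again just a sketch of the arithmetic):

```python
# Parent entropy minus the size-weighted entropies of the two child nodes.
gain = 0.7 - (60 / 100 * 0.4 + 40 / 100 * 0.2)
print(round(gain, 2))   # 0.38
```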
