Tasks on Decision Trees

The document provides solutions and explanations for assignments related to decision trees in a machine learning course. It covers concepts such as entropy, information gain, and the process of building decision trees using top-down induction. The document includes specific assignment questions and their correct answers, along with detailed calculations and explanations of the underlying principles.


NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Topic: Learning of Decision Trees

Solutions and Explanations for Assignment Q 4 and 5 - Week 4


Assignment tasks - Week 4 2021

Problem # 4 Correct Marks: 2 Theme: Learning of Decision Trees

What is the entropy for a decision tree data-set with 6 positive and 4 negative examples?

A. 0.97
B. 0.840
C. 0.41

Answer: A
Assignment tasks - Week 4 2021

Problem # 5 Correct Marks: 2 Theme: Learning of Decision Trees

What is the value of the Information Gain in the following partitioning?

Parent node: N = 100, Entropy = 0.7
Child nodes: N = 60, Entropy = 0.4 and N = 40, Entropy = 0.2

A. 0.26
B. 0.38
C. 0.42
D. 0.18

Answer: B
Learning of Decision Trees
We will focus on a particular category of learning techniques called
Top-Down Induction of Decision Trees (TDIDT).

The scenario for learning is supervised non-incremental data-driven learning from examples.

The systems are presented with a set of instances and develop a decision tree from the top down, guided by frequency information in the examples. The trees are constructed beginning with the root of the tree and proceeding down to its leaves.

The order in which instances are presented is not supposed to influence the build-up of the trees. The systems typically examine and re-examine all of the instances at many stages during learning.

Building the tree from the top downward, the issue is to choose and order features that discriminate data-items (instances) in an optimal way; a minimal code sketch of this procedure follows the subtopics listed below.

Subtopics:
- Use of information theoretic measures to guide the selection and ordering of features
- Avoiding underfitting and overfitting by pruning of the tree
- Generation of several decision trees in parallel (e.g. random forest).
- Introduction of some kind of Inductive Bias (e.g. Occam's razor)
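
A minimal Python sketch of this top-down procedure, not the course's reference implementation: instances are assumed to be dicts mapping feature names to values, with the target class stored under "class", and the feature-selection criterion (for example one based on Information Gain, introduced below) is passed in as choose_best_feature.

```python
from collections import Counter

def majority_class(instances):
    # Most common target class among the given instances.
    return Counter(inst["class"] for inst in instances).most_common(1)[0][0]

def tdidt(instances, features, choose_best_feature):
    # Top-Down Induction of Decision Trees: recursively partition the instances.
    classes = {inst["class"] for inst in instances}
    if len(classes) == 1:          # pure node: return the single class as a leaf
        return classes.pop()
    if not features:               # no features left: return the majority class
        return majority_class(instances)

    best = choose_best_feature(instances, features)     # e.g. highest Information Gain
    branches = {}
    for value in sorted({inst[best] for inst in instances}):
        subset = [inst for inst in instances if inst[best] == value]
        remaining = [f for f in features if f != best]
        branches[value] = tdidt(subset, remaining, choose_best_feature)
    return (best, branches)        # internal node: chosen feature plus one branch per value
```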
Purity or Homogeneity

The entire data-set (all training instances) is associated with the tree as a whole (the root).

For every decision split based on a chosen feature and its values, the data-set is partitioned and the sub-sets become associated with the child nodes.

This is repeated recursively down to the leaves.

Purity or homogeneity refers to the distribution of data-items over the k target classes, both for the root and for each of the nodes. A lower degree of class mixing implies higher purity.

Most algorithms aim to maximize the purity of all nodes.

The purity or impurity of nodes is measured by one of a set of alternative information-theoretic metrics.
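
As a small illustration of partitioning and purity, the sketch below splits a toy data-set on a single feature and prints the class distribution of each resulting subset; the toy data and names are invented purely for illustration.

```python
from collections import Counter, defaultdict

# Toy instances as (feature value, target class) pairs -- invented for illustration.
data = [("sunny", "+"), ("sunny", "-"), ("rain", "+"),
        ("rain", "+"), ("rain", "-"), ("overcast", "+")]

# Partition the data-set on the feature value; each subset belongs to one child node.
partitions = defaultdict(list)
for value, cls in data:
    partitions[value].append(cls)

# The class distribution of each subset indicates its purity:
# the less mixed the distribution, the purer (more homogeneous) the node.
for value, classes in partitions.items():
    print(value, dict(Counter(classes)))
# sunny {'+': 1, '-': 1}
# rain {'+': 2, '-': 1}
# overcast {'+': 1}
```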
Entropy measures

Entropy is a statistical measure from information theory that characterizes the impurity of an arbitrary collection of examples S.

For binary classification: H(S) = −p(+) * log2 p(+) − p(−) * log2 p(−)

For n-ary classification: H(S) = − Σ_{c ∈ target classes} p(c) * log2 p(c)
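
A small Python helper sketching the n-ary entropy formula above; the counts-based interface is an assumption made for convenience.

```python
import math

def entropy(counts):
    """Entropy H(S) computed from class counts, e.g. entropy([9, 5]) for [9+, 5-]."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:                  # the term for an empty class is taken as 0
            p = c / total
            h -= p * math.log2(p)
    return h

print(round(entropy([9, 5]), 3))   # 0.94, the value used in the example below
```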
Information Gain

Information Gain is a statistical measure that indicates how well a given feature F separates (discriminates) the instances of an arbitrary collection of examples S according to the target classes.

|S| = the cardinality of S. S_v = the subset of S whose instances have value v for feature F.

Gain(S, F) = Entropy(S) − Σ_{v ∈ values(F)} (|S_v| / |S|) * Entropy(S_v)
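
A matching sketch of Gain(S, F), again working from class counts; the interface (parent counts plus a list of per-subset counts) is an assumption made for brevity, and the entropy helper repeats the one sketched above.

```python
from math import log2

def entropy(counts):
    # Entropy from class counts (same as the earlier sketch).
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, subsets_counts):
    """Gain(S, F) = Entropy(S) - sum over v of (|S_v| / |S|) * Entropy(S_v)."""
    total = sum(parent_counts)
    weighted = sum(sum(sub) / total * entropy(sub) for sub in subsets_counts)
    return entropy(parent_counts) - weighted

# [9+, 5-] split into [6+, 2-] and [3+, 3-], as in the Wind example below:
print(round(information_gain([9, 5], [[6, 2], [3, 3]]), 3))   # 0.048
```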
Example
Entropy H of S (the whole data-set)

S = {D1, ..., D14} = [9+, 5−]
H(S) = −(9/14) * log2(9/14) − (5/14) * log2(5/14) = 0.940

Information Gain for the Wind feature

S_Weak = {D1, D3, D4, D5, D8, D9, D10, D13} = [6+, 2−]
S_Strong = {D2, D6, D7, D11, D12, D14} = [3+, 3−]

Gain(S, Wind) = H(S) − Σ_{v ∈ values(Wind)} (|S_v| / |S|) * H(S_v)
             = H(S) − (8/14) * H(S_Weak) − (6/14) * H(S_Strong)
             = 0.940 − (8/14) * 0.811 − (6/14) * 1.0 = 0.048

Information gains for the four features:

Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029

Outlook has the highest Information Gain and is the preferred feature to discriminate
among data-items.
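
The calculation above can be checked numerically; below is a minimal stand-alone snippet, using a two-class entropy helper defined just for this check.

```python
from math import log2

def H(pos, neg):
    # Binary entropy from positive/negative counts.
    total = pos + neg
    return -(pos / total) * log2(pos / total) - (neg / total) * log2(neg / total)

H_S = H(9, 5)                                              # about 0.940
gain_wind = H_S - (8 / 14) * H(6, 2) - (6 / 14) * H(3, 3)  # about 0.048
print(round(H_S, 3), round(gain_wind, 3))
```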
Solution Q4

What is the entropy for a decision tree data-set S with 6 positive and 4 negative examples?

H(S) = −p(+) * log2 p(+) − p(−) * log2 p(−)

H(S) = −0.6 * log2(0.6) − 0.4 * log2(0.4) = −(0.6 * −0.737 + 0.4 * −1.322) = 0.97
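
The same number can be reproduced with a few lines of Python (a quick check, not part of the original solution):

```python
from math import log2

# 6 positive and 4 negative examples: p(+) = 0.6, p(-) = 0.4
H = -0.6 * log2(0.6) - 0.4 * log2(0.4)
print(round(H, 2))   # 0.97
```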
Solution Q5

What is the value of the Information Gain in the following partitioning?

Parent node: N = 100, Entropy = 0.7
Child nodes: N = 60, Entropy = 0.4 and N = 40, Entropy = 0.2

Gain(S, F) = Entropy(S) − Σ_{v ∈ values(F)} (|S_v| / |S|) * Entropy(S_v)

Gain(S, F) = 0.7 − (60/100 * 0.4 + 40/100 * 0.2) = 0.7 − 0.32 = 0.38
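
And a short numerical check of this partitioning (again just a sketch of the arithmetic):

```python
# Parent entropy minus the size-weighted entropies of the two child nodes.
gain = 0.7 - (60 / 100 * 0.4 + 40 / 100 * 0.2)
print(round(gain, 2))   # 0.38
```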
