Lecture 7

Parametric vs. Non-parametric Methods (II)

Ghada Khoriba
[email protected]

1
Recommending App

For a woman who works at an office, which app do we recommend?

For a man who works at a factory, which app do we recommend?

ML asks: between Gender and Occupation, which one seems more decisive for predicting which app the users will download?

2
Recommending App
[Decision tree: split on Occupation — School → Pokemon Go; Work → split on Gender — F → WhatsApp, M → Snapchat]
3
Between a horizontal
and a vertical line, which
one would cut the data
better?

4
5
Non-parametric Estimation
• A non-parametric model is not fixed, but its complexity
depends on the size of the training set or, rather, the
complexity of the problem inherent in the data.
• A nonparametric model does not mean that the model has no
parameters; it means that the number of parameters is not
fixed and that their number can grow depending on the size
of the data or, better still, depending on the complexity of the
regularity that underlies the data.

6
Decision tree
• A decision tree is a hierarchical data structure implementing
the divide-and-conquer strategy.
• It is an efficient nonparametric method that can be used for
both classification and regression.
• A decision tree is also a nonparametric model in the sense that we do not assume any parametric form for the class densities and the tree structure is not fixed a priori; rather, the tree grows, and branches and leaves are added during learning, depending on the complexity of the problem inherent in the data.

7
Function Approximation

Problem Setting
• Set of possible instances X
• Set of possible labels Y
• Unknown target function f : X → Y
• Set of function hypotheses H = { h | h : X → Y }

Input: training examples of the unknown target function f
{⟨x_i, y_i⟩}_{i=1}^n = {⟨x_1, y_1⟩, ..., ⟨x_n, y_n⟩}

Output: hypothesis h ∈ H that best approximates f

8
Ref: https://www.seas.upenn.edu, Eric Eaton.
Based on slide by Tom Mitchell
Sample Dataset (was Tennis Played?)
• Columns denote features X_i
• Rows denote labeled instances ⟨x_i, y_i⟩
• Class label denotes whether a tennis game was played

9
Ref: https://www.seas.upenn.edu, Eric Eaton.
Decision Tree
• A possible decision tree for the data:

• Each internal node: test one attribute X_i
• Each branch from a node: selects one value for X_i
• Each leaf node: predict Y

10
Ref: https://www.seas.upenn.edu, Eric Eaton.
Based on slide by Tom Mitchell
Decision Tree
• A possible decision tree for the data:

• What prediction would we make for
  <outlook=sunny, temperature=hot, humidity=high, wind=weak> ?
Ref: https://www.seas.upenn.edu, Eric Eaton. 11


Based on slide by Tom Mitchell
Decision Tree – Decision Boundary
• Decision trees divide the feature space into axis-parallel (hyper-)rectangles
• Each rectangular region is labeled with one label
  – or a probability distribution over labels

[Figure: decision boundary in the feature space]

12
Ref: https://www.seas.upenn.edu, Eric Eaton.
Stages of (Batch) Machine Learning

Given: labeled training data X, Y = {⟨x_i, y_i⟩}_{i=1}^n
• Assumes each x_i ~ D(X) with y_i = f_target(x_i)

Train the model:
model ← classifier.train(X, Y)

Apply the model to new data:
• Given: new unlabeled instance x ~ D(X)
y_prediction ← model.predict(x)

[Diagram: X, Y → learner → model; x → model → y_prediction]

13
Ref: https://www.seas.upenn.edu, Eric Eaton.
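As a concrete illustration of this train/predict pattern, here is a minimal sketch using scikit-learn's DecisionTreeClassifier; the toy arrays X, Y, and x_new are invented for the example and are not from the lecture.

```python
# Minimal sketch of the batch train/predict pattern with scikit-learn.
# The toy data below is made up purely for illustration.
from sklearn.tree import DecisionTreeClassifier

# Labeled training data: each row of X is a feature vector x_i, each entry of Y is y_i.
X = [[0, 1], [1, 1], [1, 0], [0, 0]]
Y = ["yes", "yes", "no", "no"]

# Train the model:  model <- classifier.train(X, Y)
model = DecisionTreeClassifier()
model.fit(X, Y)

# Apply the model to a new, unlabeled instance:  y_prediction <- model.predict(x)
x_new = [[1, 0]]
y_prediction = model.predict(x_new)
print(y_prediction)  # e.g. ['no']
```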
Basic Algorithm for Top-Down Learning of Decision Trees
[ID3, C4.5 by Quinlan]

node = root of decision tree

Main loop:
1. A ← the “best” decision attribute for the next node.
2. Assign A as the decision attribute for node.
3. For each value of A, create a new descendant of node.
4. Sort training examples to leaf nodes.
5. If training examples are perfectly classified, stop. Else, recurse over new leaf nodes.

How do we choose which attribute is best?

14
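A short Python sketch of this top-down loop, using information gain as the “best”-attribute heuristic. This is not Quinlan's exact ID3/C4.5 implementation; the helper names (entropy, information_gain, id3) and the dictionary-based instance format are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum_i p_i * log2(p_i) over the class proportions in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Expected reduction in label entropy from splitting on `attribute`."""
    n = len(labels)
    remainder = 0.0
    for value in set(ex[attribute] for ex in examples):
        subset = [y for ex, y in zip(examples, labels) if ex[attribute] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, labels, attributes):
    """Grow the tree top-down: pick the 'best' attribute, split, recurse."""
    if len(set(labels)) == 1:                # perfectly classified: make a leaf
        return labels[0]
    if not attributes:                       # nothing left to split on: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        tree[best][value] = id3([examples[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return tree

# Toy usage (attribute values invented for illustration):
data = [{"outlook": "sunny", "wind": "weak"}, {"outlook": "rain", "wind": "strong"},
        {"outlook": "sunny", "wind": "strong"}, {"outlook": "rain", "wind": "weak"}]
play = ["no", "no", "no", "yes"]
print(id3(data, play, ["outlook", "wind"]))
```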
Choosing the Best Attribute
Key problem: choosing which attribute to split a given set of examples
• Some possibilities are:
– Random: Select any attribute at random
– Least-Values: Choose the attribute with the smallest number of possible values
– Most-Values: Choose the attribute with the largest number of possible values
– Max-Gain: Choose the attribute that has the largest expected information gain
• i.e., attribute that results in smallest expected size of subtrees rooted at its children

• The ID3 algorithm uses the Max-Gain method of selecting the best
attribute

15
Information Gain
Which test is more informative?
Split over whether Balance exceeds 50K, or split over whether the applicant is employed?

[Figure: two candidate splits: Balance ≤ 50K vs. over 50K, and Unemployed vs. Employed]

16
Based on slide by Pedro Domingos
Information Gain
Impurity/Entropy (informal)
– Measures the level of impurity in a group of examples

17
Based on slide by Pedro Domingos
18
Entropy: a common way to measure impurity

19
2-Class Cases:

Entropy: H(x) = -Σ_{i=1}^{n} P(x = i) log2 P(x = i)

• What is the entropy of a group in which all examples belong to the same class?
  – entropy = -1 log2 1 = 0
  → Minimum impurity: not a good training set for learning

• What is the entropy of a group with 50% in either class?
  – entropy = -0.5 log2 0.5 - 0.5 log2 0.5 = 1
  → Maximum impurity: a good training set for learning

20
Based on slide by Pedro Domingos
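A quick numerical check of these two cases; the helper name two_class_entropy is invented for this sketch.

```python
import math

def two_class_entropy(p):
    """H for a two-class group where one class has proportion p (0 <= p <= 1)."""
    if p in (0.0, 1.0):          # log2(0) is undefined; a pure group has zero entropy
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(two_class_entropy(1.0))   # 0.0 -> minimum impurity (all examples in one class)
print(two_class_entropy(0.5))   # 1.0 -> maximum impurity (50% in either class)
```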
Sample Entropy

21
Information Gain
• We want to determine which attribute in a given set of training feature vectors is most useful for discriminating between the classes to be learned.
• Information gain tells us how important a given attribute of the feature vectors is.
• We will use it to decide the ordering of attributes in the nodes of a decision tree.

22
Based on slide by Pedro Domingos
From Entropy to Information Gain

Entropy H(X) of a random variable X

Specific conditional entropy H(X | Y = v) of X given Y = v

Conditional entropy H(X | Y) of X given Y

Mutual information (aka Information Gain) of X and Y

Information Gain is the mutual information between input attribute A and target variable Y.
Information Gain is the expected reduction in entropy of target variable Y for data sample S, due to sorting on variable A.

Slide by Tom Mitchell
23
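The formulas on the original slide are images and do not survive as text; the standard definitions they refer to can be written as:

```latex
% Entropy of a random variable X
H(X) = -\sum_{x} P(X = x)\,\log_2 P(X = x)

% Specific conditional entropy of X given Y = v
H(X \mid Y = v) = -\sum_{x} P(X = x \mid Y = v)\,\log_2 P(X = x \mid Y = v)

% Conditional entropy of X given Y
H(X \mid Y) = \sum_{v} P(Y = v)\, H(X \mid Y = v)

% Mutual information (information gain) of X and Y
I(X, Y) = H(X) - H(X \mid Y)

% Information gain of attribute A for target Y on data sample S
\mathrm{Gain}(S, A) = H_S(Y) - H_S(Y \mid A)
```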
Calculating Information Gain

Information Gain = entropy(parent) - [average entropy(children)]

Entire population (30 instances):
parent entropy = -(14/30) log2(14/30) - (16/30) log2(16/30) = 0.996

Child 1 (17 instances):
child entropy (impurity) = -(13/17) log2(13/17) - (4/17) log2(4/17) = 0.787

Child 2 (13 instances):
child entropy (impurity) = -(1/13) log2(1/13) - (12/13) log2(12/13) = 0.391

(Weighted) Average Entropy of Children = (17/30) × 0.787 + (13/30) × 0.391 = 0.615

Information Gain = 0.996 - 0.615 = 0.38

24
Based on slide by Pedro Domingos
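The same worked example can be checked with a few lines of Python; the class counts (14/16 in the parent, 13/4 and 1/12 in the children) are taken from the slide above.

```python
import math

def entropy(counts):
    """Entropy of a group given its per-class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

parent = entropy([14, 16])     # 0.996 for the 30-instance population
left   = entropy([13, 4])      # 0.787 for the 17-instance child
right  = entropy([1, 12])      # 0.391 for the 13-instance child

children = (17 / 30) * left + (13 / 30) * right   # weighted average = 0.615
gain = parent - children                          # 0.996 - 0.615 = 0.38
print(round(parent, 3), round(children, 3), round(gain, 2))
```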
Slide by Tom Mitchell
25

Slide by Tom Mitchell
26
Which Tree Should We Output?
• ID3 performs heuristic
search through space of
decision trees
• It stops at smallest
acceptable tree. Why?

Occam’s razor: prefer the


simplest hypothesis that
fits the data

Slide by Tom Mitchell


27
Overfitting in Decision Trees
• Many kinds of noise can occur in the examples:
  – Two examples have same attribute/value pairs, but different classifications
  – Some values of attributes are incorrect because of errors in the data acquisition process or the preprocessing phase
  – The instance was labeled incorrectly (+ instead of -)
• Also, some attributes are irrelevant to the decision-making process
  – e.g., color of a die is irrelevant to its outcome

28
Based on Slide from M. desJardins & T. Finin
Overfitting in Decision Trees
• Irrelevant attributes can result in overfitting the training example data
  – If hypothesis space has many dimensions (large number of attributes), we may find meaningless regularity in the data that is irrelevant to the true, important, distinguishing features
• If we have too little training data, even a reasonable hypothesis space will overfit

29
Based on Slide from M. desJardins & T. Finin
Avoiding Overfitting in Decision Trees
How can we avoid overfitting?
• Stop growing when data split is not statistically significant
• Acquire more training data
• Remove irrelevant attributes (manual process – not always possible)
• Grow full tree, then post-prune

How to select the “best” tree:
• Measure performance over training data
• Measure performance over separate validation data set
• Add complexity penalty to performance measure
  (heuristic: simpler is better)

30
Based on Slide by Pedro Domingos
Reduced-Error Pruning
Split training data further into training and validation sets
Grow tree based on training set
Do until further pruning is harmful:
1. Evaluate impact on validation set of pruning each possible node (plus those below it)
2. Greedily remove the node that most improves validation set accuracy

31
Slide by Pedro Domingos
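A hedged Python sketch of reduced-error pruning, assuming the dictionary-based tree format from the earlier ID3 sketch ({attribute: {value: subtree_or_label}}). For brevity, a pruned node is collapsed to the validation set's majority label rather than the training-set majority at that node, so this is an approximation of the procedure, not a reference implementation.

```python
import copy
from collections import Counter

def predict(tree, example, default):
    """Follow the dict-based tree; fall back to `default` for unseen attribute values."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(example.get(attr), default)
    return tree

def accuracy(tree, examples, labels, default):
    hits = sum(predict(tree, ex, default) == y for ex, y in zip(examples, labels))
    return hits / len(labels)

def prunable_nodes(tree, path=()):
    """Yield the path to every internal node as a tuple of (attribute, value) steps."""
    if not isinstance(tree, dict):
        return
    yield path
    attr = next(iter(tree))
    for value, subtree in tree[attr].items():
        yield from prunable_nodes(subtree, path + ((attr, value),))

def replace_with_leaf(tree, path, label):
    """Return a copy of `tree` with the node at `path` collapsed to the leaf `label`."""
    if not path:
        return label
    new_tree = copy.deepcopy(tree)
    node = new_tree
    for attr, value in path[:-1]:
        node = node[attr][value]
    attr, value = path[-1]
    node[attr][value] = label
    return new_tree

def reduced_error_pruning(tree, val_examples, val_labels):
    """Greedily prune nodes while validation accuracy does not get worse."""
    default = Counter(val_labels).most_common(1)[0][0]
    best_acc = accuracy(tree, val_examples, val_labels, default)
    while True:
        best_candidate, best_candidate_acc = None, -1.0
        for path in prunable_nodes(tree):
            candidate = replace_with_leaf(tree, path, default)
            acc = accuracy(candidate, val_examples, val_labels, default)
            if acc > best_candidate_acc:
                best_candidate, best_candidate_acc = candidate, acc
        if best_candidate is None or best_candidate_acc < best_acc:
            return tree          # any further pruning would hurt validation accuracy
        tree, best_acc = best_candidate, best_candidate_acc
```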
Effect of Reduced-Error Pruning

The tree is pruned back to the red line where it gives more accurate results on the test data

32
Based on Slide by Pedro Domingos
Summary: Decision Tree Learning
• Representation: decision trees
• Bias: prefer small decision trees
• Search algorithm: greedy
• Heuristic function: information gain or information content or others
• Overfitting / pruning

• Widely used in practice
• Strengths include
  – Fast and simple to implement
  – Can convert to rules
  – Handles noisy data
• Weaknesses include
  – Univariate splits/partitioning using only one attribute at a time, which limits the types of possible trees
  – Large decision trees may be hard to understand
  – Requires fixed-length feature vectors
  – Non-incremental (i.e., batch method)

33
Slide by Pedro Domingos
K-Nearest Neighbor
• The nearest neighbor class of estimators adapts the amount
of smoothing to the local density of data. The degree of
smoothing is controlled by k, the number of neighbors taken
into account, which is much smaller than N, the sample size.
• 1‐Nearest Neighbor

34
35
36
37
KNN has three basic steps, sketched in code below:
1. Calculate the distance.
2. Find the k nearest neighbors.
3. Vote for classes.

38
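A minimal sketch of these three steps for classification, using Euclidean distance and a majority vote; the function name knn_predict and the toy data are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    # 1. Calculate the distance from `query` to every training point.
    distances = [
        (math.dist(query, x), y)          # Euclidean distance, paired with the label
        for x, y in zip(train_X, train_y)
    ]
    # 2. Find the k nearest neighbors.
    neighbors = sorted(distances, key=lambda d: d[0])[:k]
    # 3. Vote for classes: the majority label among the k neighbors wins.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy usage (made-up 2-D points):
X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
y = ["A", "A", "B", "B"]
print(knn_predict(X, y, (1.1, 0.9), k=3))   # -> "A"
```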
