
Summer 2022

Data Mining and Machine Learning


(CSE 321)
Topic – 5.2: Decision Trees

Course Teacher:
Md. Aynul Hasan Nahid
Lecturer
Department of Computer Science and Engineering
Daffodil International University
Recommended Reading

• “Introduction to Data Mining,” Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison Wesley, 2006.
  ☞ Chapter 4 (Classification: Basic Concepts, Decision Trees, and Model Evaluation)

2
Classification: Definition
• Given a collection of records (the training set)
  – Each record contains a set of attributes; one of the attributes is the class.
• Find a model that predicts the class attribute as a function of the
values of the other attributes.
• Goal: previously unseen records should be
assigned a class as accurately as possible.
  – A test set is used to determine the accuracy of the
model. Usually, the given data set is divided into training
and test sets, with the training set used to build the model
and the test set used to validate it.
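
A minimal sketch of this workflow in Python, assuming scikit-learn and pandas are available (the file name "records.csv" and the column name "class" are illustrative, not from the slides):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv("records.csv")        # hypothetical collection of records
X = data.drop(columns=["class"])         # the other attributes
y = data["class"]                        # the class attribute
# (categorical attributes would need to be encoded numerically first)

# Divide the given data set into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier()         # build the model on the training set
model.fit(X_train, y_train)

# Use the test set to estimate accuracy on previously unseen records
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))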

3
Illustrating Classification Task

4
Classification Techniques
• Decision Tree-based Methods
• Rule-based Methods
• Memory-based Reasoning
• Neural Networks
• Naïve Bayes and Bayesian Belief Networks
• Support Vector Machines

5
Example of a Decision Tree
[Training data table: two categorical attributes (Refund, Marital Status), one continuous attribute (Taxable Income), and the class (Cheat)]

Splitting Attributes:

Refund
  Yes → NO
  No  → MarSt
          Single, Divorced → TaxInc
                               < 80K → NO
                               > 80K → YES
          Married → NO

Training Data                      Model: Decision Tree

6
Another Example of Decision Tree
[Same training data as before]

MarSt
  Single, Divorced → Refund
                       Yes → NO
                       No  → TaxInc
                                < 80K → NO
                                > 80K → YES
  Married → NO

There could be more than one tree that fits the same data!

7
Decision Tree Classification Task

Decision Tree

8
Apply Model to Test Data
Test Data
Start from the root of the tree.

Refund
  Yes → NO
  No  → MarSt
          Single, Divorced → TaxInc
                               < 80K → NO
                               > 80K → YES
          Married → NO

9
Apply Model to Test Data
Test Data

Refund
  Yes → NO
  No  → MarSt
          Single, Divorced → TaxInc
                               < 80K → NO
                               > 80K → YES
          Married → NO  (assign Cheat to “No”)

14
Decision Tree Terminology

15
Decision Tree Classification Task

Decision Tree

16
Decision Tree Induction
• Many Algorithms:
– Hunt’s Algorithm (one of the earliest)
– CART
– ID3, C4.5
– SLIQ,SPRINT

• John Ross Quinlan is a computer science researcher in data mining and
decision theory. He has contributed extensively to the development of
decision tree algorithms, including inventing the canonical C4.5 and ID3
algorithms.

17
Decision Tree Classifier

[Scatter plot of training insects: Abdomen Length (x-axis, 1–10) vs. Antenna Length (y-axis, 1–10); photo of Ross Quinlan]

Abdomen Length > 7.1?
  no  → Antenna Length > 6.0?
           no  → Grasshopper
           yes → Katydid
  yes → Katydid

18
Antennae shorter than body?
  Yes → Grasshopper
  No  → 3 Tarsi?
          Yes → Cricket
          No  → Foretibia has ears?
                   Yes → Katydid
                   No  → Camel Cricket

Decision trees predate computers.

19


Definition
● A decision tree is a classifier in the form of a tree structure
– Decision node: specifies a test on a single attribute
– Leaf node: indicates the value of the target attribute
– Arc/edge: one outcome of the split on an attribute
– Path: a conjunction of tests leading to the final decision

● Decision trees classify instances or examples by starting at the root of the
tree and moving through it until a leaf node is reached.

20
Decision Tree Classification
• Decision tree generation consists of two phases
– Tree construction
• At start, all the training examples are at the root
• Partition examples recursively based on selected attributes
– Tree pruning
• Identify and remove branches that reflect noise or outliers
• Use of decision tree: Classifying an unknown sample
– Test the attribute values of the sample against the decision tree

21
Decision Tree Representation
• Each internal node tests an attribute
• Each branch corresponds to attribute value
• Each leaf node assigns a classification

outlook
  sunny    → humidity
                high   → no
                normal → yes
  overcast → yes
  rain     → wind
                strong → no
                weak   → yes
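
One simple way to hold such a tree in code is nested dictionaries keyed by attribute name and then by attribute value; the sketch below (our own illustration, not from the slides) encodes the outlook tree above and classifies an example by walking from the root to a leaf:

weather_tree = {
    "outlook": {
        "sunny":    {"humidity": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain":     {"wind": {"strong": "no", "weak": "yes"}},
    }
}

def classify(tree, example):
    # Walk down the tree until we reach a leaf (a plain class label)
    while isinstance(tree, dict):
        attribute = next(iter(tree))                 # attribute tested at this node
        tree = tree[attribute][example[attribute]]   # follow the matching branch
    return tree

print(classify(weather_tree, {"outlook": "sunny", "humidity": "normal"}))  # yes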

22
How do we construct the
decision tree?
• Basic algorithm (a greedy algorithm)
– Tree is constructed in a top-down recursive divide-and-conquer manner
– At start, all the training examples are at the root
– Attributes are categorical (if continuous-valued, they can be discretized in
advance)
– Examples are partitioned recursively based on selected attributes.
– Test attributes are selected on the basis of a heuristic or statistical measure
(e.g., information gain)
• Conditions for stopping partitioning
– All samples for a given node belong to the same class
– There are no remaining attributes for further partitioning – majority voting is
employed for classifying the leaf
– There are no samples left

23
Top-Down Decision Tree Induction

• Main loop:
1. A 🡨 the “best” decision attribute for next node
2. Assign A as decision attribute for node
3. For each value of A, create new descendant of node
4. Sort training examples to leaf nodes
5. If training examples perfectly classified,
Then STOP, Else iterate over new leaf nodes
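
A minimal sketch of this main loop in Python, assuming categorical attributes, examples stored as dictionaries, and information gain as the "best attribute" heuristic (function names such as build_tree are ours, not a reference implementation):

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(examples, labels, attributes):
    # Step 1: pick the attribute with the highest information gain
    def gain(a):
        remainder = 0.0
        for v in set(ex[a] for ex in examples):
            subset = [lab for ex, lab in zip(examples, labels) if ex[a] == v]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attributes, key=gain)

def build_tree(examples, labels, attributes):
    if len(set(labels)) == 1:                       # perfectly classified: STOP
        return labels[0]
    if not attributes:                              # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(examples, labels, attributes)         # steps 1-2
    tree = {a: {}}
    for v in set(ex[a] for ex in examples):                  # step 3: one descendant per value
        idx = [i for i, ex in enumerate(examples) if ex[a] == v]   # step 4: sort examples
        tree[a][v] = build_tree([examples[i] for i in idx],
                                [labels[i] for i in idx],
                                [b for b in attributes if b != a])  # step 5: iterate
    return tree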

24
Tree Induction
• Greedy strategy.
– Split the records based on an attribute test that optimizes a certain criterion.

• Issues
– Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
– Determine when to stop splitting

25
How To Split Records
• Random Split
– The tree can grow huge
– These trees are hard to understand.
– Larger trees are typically less accurate than smaller trees.

• Principled Criterion
– Selection of an attribute to test at each node - choosing the most useful attribute
for classifying examples.
– How?
– Information gain
• measures how well a given attribute separates the training examples
according to their target classification
• This measure is used to select among the candidate attributes at each step
while growing the tree

26
Tree Induction
• Greedy strategy:
– Split the records based on an attribute test that optimizes a certain criterion
– Hunt’s algorithm: recursively partition training records into
successively purer subsets. How do we measure purity/impurity?
• Entropy and information gain (covered in the lecture slides)
• Gini (covered in the textbook)
• Classification error

27
How to determine the Best Split
Before Splitting: 10 records of class 0,
10 records of class 1

[Figure: candidate test conditions, e.g., splitting on Gender vs. splitting on Student ID]

Which test condition is the best?
Why is Student ID a bad feature to use?

28
How to determine the Best Split
• Greedy approach:
– Nodes with homogeneous class distribution are preferred
• Need a measure of node impurity:

Non-homogeneous, Homogeneous,
High degree of impurity Low degree of impurity

29
Picking a Good Split Feature
• Goal is to have the resulting tree be as small as possible, per Occam’s
razor.
• Finding a minimal decision tree (nodes, leaves, or depth) is an NP-hard
optimization problem.
• The top-down divide-and-conquer method does a greedy search for a simple
tree but is not guaranteed to find the smallest.
– General lesson in Machine Learning and Data Mining: “Greed is good.”
• Want to pick a feature that creates subsets of examples that are relatively
“pure” in a single class so they are “closer” to being leaf nodes.
• There are a variety of heuristics for picking a good test; a popular one is
based on information gain, which originated with the ID3 system of Quinlan
(1979).

R. Mooney, UT Austin
30
Information Theory
• Think of playing "20 questions": I am thinking of an integer between 1 and
1,000 -- what is it? What is the first question you would ask?
• Why?

• Entropy measures how much more information you need before you can
identify the integer.
• Initially, there are 1000 possible values, which we assume are equally
likely.
• What is the maximum number of questions you need to ask? (At most ⌈log2 1000⌉ = 10, if each question halves the remaining range.)

31
Entropy
• Entropy (disorder, impurity) of a set of examples S, relative to a binary
classification, is:

Entropy(S) = – p1 log2(p1) – p0 log2(p0)

where p1 is the fraction of positive examples in S and p0 is the fraction of
negatives.
• If all examples are in one category, entropy is zero (we define 0⋅log(0)=0)
• If examples are equally mixed (p1=p0=0.5), entropy is a maximum of 1.
• Entropy can be viewed as the number of bits required on average to encode
the class of an example in S where data compression (e.g. Huffman coding) is
used to give shorter codes to more likely cases.
• For multi-class problems with c categories, entropy generalizes to:

Entropy(S) = – Σi pi log2(pi),  summed over the c categories, where pi is the fraction of examples in category i
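
As a quick sketch, the same definition in Python (the function name is ours; zero probabilities are skipped so that 0·log(0) counts as 0):

import math

def entropy(proportions):
    # proportions: the class fractions p1, ..., pc of the examples in S (summing to 1)
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([0.5, 0.5]))   # 1.0  -- equally mixed binary set
print(entropy([1.0, 0.0]))   # 0.0  -- all examples in one category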

R. Mooney, UT Austin
32
Entropy Plot for Binary
Classification
• The entropy is 0 if the outcome is certain.
• The entropy is maximum if we have no knowledge of the system
(or any outcome is equally possible).

[Figure: entropy of a 2-class problem as a function of the proportion of one of the two classes]

33
Information Gain
• Information gain is the expected reduction in entropy caused by partitioning the examples
according to a given attribute A:

Gain(S, A) = Entropy(S) – Σv (|Sv| / |S|) Entropy(Sv),  summed over the values v of A

• Equivalently, it is the number of bits saved when encoding the target value of an arbitrary
member of S, by knowing the value of attribute A.
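
A small sketch of Gain(S, A) in Python (our own helper names; `values` holds each example's value of attribute A and `labels` the corresponding classes):

import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(values, labels):
    n = len(labels)
    remainder = 0.0
    for v in set(values):                    # one subset S_v per value of A
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder       # Entropy(S) minus weighted subset entropy

print(information_gain(["a", "a", "b", "b"], ["+", "+", "-", "-"]))  # 1.0, a perfect split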

34
Information Gain in Decision
Tree Induction
• Assume that using attribute A, the current set S will be partitioned into some
number of child sets S1, ..., Sv

• The encoding information that would be gained by branching on A:

Gain(A) = Entropy(S) – Σi (|Si| / |S|) Entropy(Si)

Note: entropy is at its minimum (zero) when the collection of objects is completely pure, i.e., all objects belong to one class

35
Examples for Computing Entropy
NOTE: p( j | t) is computed as the relative frequency of class j at node t

P(C1) = 0/6 = 0 P(C2) = 6/6 = 1


Entropy = – 0 log2 0 – 1 log2 1 = – 0 – 0 = 0

P(C1) = 1/6 P(C2) = 5/6


Entropy = – (1/6) log2 (1/6) – (5/6) log2 (5/6) = 0.65

P(C1) = 2/6 P(C2) = 4/6


Entropy = – (2/6) log2 (2/6) – (4/6) log2 (4/6) = 0.92

P(C1) = 3/6=1/2 P(C2) = 3/6 = 1/2


Entropy = – (1/2) log2 (1/2) – (1/2) log2 (1/2) = –(1/2)(–1) – (1/2)(–1) = ½ + ½ = 1

36
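
The arithmetic above can be re-checked with a few lines of Python (a throwaway sketch):

import math

def entropy2(p1, p2):
    return -sum(p * math.log2(p) for p in (p1, p2) if p > 0)

print(entropy2(0/6, 6/6))   # 0.0
print(entropy2(1/6, 5/6))   # ~0.65
print(entropy2(2/6, 4/6))   # ~0.92
print(entropy2(3/6, 3/6))   # 1.0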
How to Calculate log2x
• Many calculators only have buttons for log10(x) and
loge(x) (note: log with no base typically means log10)
• You can calculate the log for any base b as follows:
– logb(x) = logk(x) / logk(b)
– Thus log2(x) = log10(x) / log10(2)
– Since log10(2) = .301, just calculate the log base 10 and
divide by .301 to get log base 2.
– You can use this for HW if needed
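
In Python, for example, the change of base is not even needed, though both routes give the same answer:

import math

x = 8
print(math.log10(x) / math.log10(2))   # change of base: 3.0
print(math.log2(x))                    # direct log base 2: 3.0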

37
Splitting Based on INFO...

• Information Gain:

GAINsplit = Entropy(p) – Σi (ni / n) Entropy(i),  i = 1, ..., k

where the parent node p is split into k partitions and
ni is the number of records in partition i
– Measures the reduction in entropy achieved because of the
split. Choose the split that achieves the most reduction
(maximizes GAIN)
– Used in ID3 and C4.5
– Disadvantage: tends to prefer splits that result in a large
number of partitions, each being small but pure.
Continuous Attribute?
(more on it later)

• Each non-leaf node is a test; its edges partition the attribute values into
subsets (easy for a discrete attribute).
• For a continuous attribute
– Partition the continuous values of attribute A into a discrete set of
intervals
– Create a new Boolean attribute Ac that is true if A < c and false otherwise,
for some threshold c

How do we choose c?
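
One common way to choose c (an assumption on our part, not spelled out on this slide) is to sort the examples by A, take candidate thresholds at midpoints between adjacent distinct values, and keep the candidate with the highest information gain. A sketch:

import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def best_threshold(values, labels):
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_c, best_gain = None, -1.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                  # no boundary between equal values
        c = (pairs[i - 1][0] + pairs[i][0]) / 2       # candidate threshold (midpoint)
        left = [lab for v, lab in pairs if v < c]     # Ac is true
        right = [lab for v, lab in pairs if v >= c]   # Ac is false
        g = entropy(labels) - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if g > best_gain:
            best_c, best_gain = c, g
    return best_c, best_gain

# Toy example: six income values (in K) and a binary class
print(best_threshold([60, 70, 75, 85, 90, 95], ["No", "No", "No", "Yes", "Yes", "Yes"]))
# -> (80.0, 1.0): the threshold 80K separates the classes perfectly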

39
Person   Hair Length   Weight   Age   Class
Homer    0"            250      36    M
Marge    10"           150      34    F
Bart     2"            90       10    M
Lisa     6"            78       8     F
Maggie   4"            20       1     F
Abe      1"            170      70    M
Selma    8"            160      41    F
Otto     10"           180      38    M
Krusty   6"            200      45    M

Comic    8"            290      38    ?

40
Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9)
= 0.9911
Let us try splitting on Hair Length:

Hair Length <= 5?
  yes (1F,3M): Entropy = -(1/4)log2(1/4) - (3/4)log2(3/4) = 0.8113
  no  (3F,2M): Entropy = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.9710

Gain(Hair Length <= 5) = 0.9911 – (4/9 * 0.8113 + 5/9 * 0.9710) = 0.0911


41
Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9)
= 0.9911
Let us try splitting on Weight:

Weight <= 160?
  yes (4F,1M): Entropy = -(4/5)log2(4/5) - (1/5)log2(1/5) = 0.7219
  no  (0F,4M): Entropy = -(0/4)log2(0/4) - (4/4)log2(4/4) = 0

Gain(Weight <= 160) = 0.9911 – (5/9 * 0.7219 + 4/9 * 0) = 0.5900

42


Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9)
= 0.9911
Let us try splitting on Age:

Age <= 40?
  yes (3F,3M): Entropy = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1
  no  (1F,2M): Entropy = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183

Gain(Age <= 40) = 0.9911 – (6/9 * 1 + 3/9 * 0.9183) = 0.0183

43
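
The three gains above can be reproduced with a short script over the table from the earlier slide (a verification sketch; the helper names are ours):

import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def gain(rows, labels, test):
    yes = [lab for row, lab in zip(rows, labels) if test(row)]
    no = [lab for row, lab in zip(rows, labels) if not test(row)]
    n = len(labels)
    return (entropy(labels)
            - len(yes) / n * (entropy(yes) if yes else 0.0)
            - len(no) / n * (entropy(no) if no else 0.0))

# (hair length, weight, age, class) rows from the table on the earlier slide
people = [(0, 250, 36, "M"), (10, 150, 34, "F"), (2, 90, 10, "M"),
          (6, 78, 8, "F"), (4, 20, 1, "F"), (1, 170, 70, "M"),
          (8, 160, 41, "F"), (10, 180, 38, "M"), (6, 200, 45, "M")]
rows = [p[:3] for p in people]
labels = [p[3] for p in people]

print(gain(rows, labels, lambda r: r[0] <= 5))    # Hair Length <= 5  -> ~0.0911
print(gain(rows, labels, lambda r: r[1] <= 160))  # Weight <= 160     -> ~0.5900
print(gain(rows, labels, lambda r: r[2] <= 40))   # Age <= 40         -> ~0.0183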


Of the 3 features we had, Weight was best.
But while people who weigh over 160 are
perfectly classified (as males), the under 160
people are not perfectly classified… So we
simply recurse!
Weight <= 160?   (yes / no)

This time we find that we can split on Hair Length, and we are done!

Hair Length <= 2?   (yes / no)

44
We don’t need to keep the data around, just the test conditions:

Weight <= 160?
  yes → Hair Length <= 2?
           yes → Male
           no  → Female
  no  → Male

How would these people be classified?

45
It is trivial to convert Decision Trees to rules…

Weight <= 160?
  yes → Hair Length <= 2?
           yes → Male
           no  → Female
  no  → Male

Rules to Classify Males/Females:

If Weight greater than 160, classify as Male
Elseif Hair Length less than or equal to 2, classify as Male
Else classify as Female
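
Written as ordinary code, the three rules above are just an if/elif/else chain:

def classify(weight, hair_length):
    if weight > 160:
        return "Male"
    elif hair_length <= 2:
        return "Male"
    else:
        return "Female"

print(classify(weight=290, hair_length=8))   # the Comic row from the earlier table -> Male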
46
Once we have learned the decision tree, we don’t even need a computer!
This decision tree is attached to a medical machine, and is designed to help
nurses make decisions about what type of doctor to call.

Decision tree for a typical shared-care setting applying the system for the
diagnosis of prostatic obstructions.
47
The worked examples we have seen were
performed on small datasets. However, with
small datasets there is a great danger of
overfitting the data…

When you have few data points, there are
many possible splitting rules that perfectly
classify the data but will not generalize to
future datasets.

Wears green?
  Yes → Female
  No  → Male

For example, the rule “Wears green?” perfectly classifies the data; so does
“Mother’s name is Jacqueline?”; so does “Has blue shoes?”…
48
How to Find the Best Split: GINI

Before splitting, the parent node has impurity M0.

Split on A?  (Yes / No): child nodes N1 and N2, with impurities M1 and M2; weighted impurity M12
Split on B?  (Yes / No): child nodes N3 and N4, with impurities M3 and M4; weighted impurity M34

Compare Gain = M0 – M12 vs. M0 – M34 and choose the split with the larger gain.

49
Measure of Impurity: GINI (at node t)
• Gini Index for a given node t with classes j:

GINI(t) = 1 – Σj [p( j | t)]^2

NOTE: p( j | t) is computed as the relative frequency of class j at node t

• Example: Two classes C1 & C2 and node t has 5 C1
and 5 C2 examples. Compute Gini(t)
– Gini(t) = 1 – [p(C1|t)^2 + p(C2|t)^2] = 1 – [(5/10)^2 + (5/10)^2]
– = 1 – [¼ + ¼] = ½.
– Do you think this Gini value indicates a good split or bad
split? Is it an extreme value?
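
A one-function sketch of the Gini index from per-class counts (our own helper, matching the formula above):

def gini(counts):
    # counts: number of examples of each class at node t
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

print(gini([5, 5]))    # 0.5 -- the maximally mixed 2-class node computed above
print(gini([10, 0]))   # 0.0 -- a pure node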

50
More on Gini
• Worst Gini corresponds to probabilities of 1/nc, where nc is
the number of classes.
– For 2-class problems the worst Gini will be ½
• How do we get the best Gini? Come up with an example for
node t with 10 examples for classes C1 and C2
– 10 C1 and 0 C2
– Now what is the Gini?
• 1 – [(10/10)^2 + (0/10)^2] = 1 – [1 + 0] = 0
– So 0 is the best Gini
• So for 2-class problems:
– Gini varies from 0 (best) to ½ (worst).

51
Some More Examples
• Below we see the Gini values for 4 nodes with
different distributions. They are ordered from best to
worst. See next slide for details
– Note that thus far we are only computing GINI for one
node. We need to compute it for a split and then compute
the change in Gini from the parent node.

52
Examples for computing GINI

P(C1) = 0/6 = 0 P(C2) = 6/6 = 1


Gini = 1 – P(C1)^2 – P(C2)^2 = 1 – 0 – 1 = 0

P(C1) = 1/6 P(C2) = 5/6


Gini = 1 – (1/6)^2 – (5/6)^2 = 0.278

P(C1) = 2/6 P(C2) = 4/6


Gini = 1 – (2/6)^2 – (4/6)^2 = 0.444
Splitting Criteria based on
Classification Error
• Classification error at a node t:

Error(t) = 1 – maxj p( j | t)
• Measures misclassification error made by a node.


• Maximum (1 - 1/nc) when records are equally distributed among all
classes, implying least interesting information
• Minimum (0.0) when all records belong to one class, implying most
interesting information

54
Examples for Computing Error

P(C1) = 0/6 = 0 P(C2) = 6/6 = 1


Error = 1 – max (0, 1) = 1 – 1 = 0

P(C1) = 1/6 P(C2) = 5/6


Error = 1 – max (1/6, 5/6) = 1 – 5/6 = 1/6

P(C1) = 2/6 P(C2) = 4/6


Error = 1 – max (2/6, 4/6) = 1 – 4/6 = 1/3

55
Comparison among Splitting Criteria
For a 2-class problem:
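
The comparison figure is not reproduced here, but the three criteria are easy to tabulate for a 2-class node as the fraction p of class 1 varies (all three are 0 for a pure node and peak at p = 0.5):

import math

def entropy(p):
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def gini(p):
    return 1.0 - p ** 2 - (1 - p) ** 2

def class_error(p):
    return 1.0 - max(p, 1 - p)

for p in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]:
    print(f"p={p:.1f}  entropy={entropy(p):.3f}  gini={gini(p):.3f}  error={class_error(p):.3f}")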

56
57
