
Supervised Learning - Classification
Classification: Decision Tree
Distance Measures and Cosine Similarity

K-nearest Neighbour Classification

Bayes Classification Methods

Classification: Model Construction

Decision Tree Induction

Model Evaluation

Decision Tree Induction: An Example

Training data set: Buys_computer (the class attribute is buys_computer)

  age     income  student  credit_rating  buys_computer
  <=30    high    no       fair           no
  <=30    high    no       excellent      no
  31..40  high    no       fair           yes
  >40     medium  no       fair           yes
  >40     low     yes      fair           yes
  >40     low     yes      excellent      no
  31..40  low     yes      excellent      yes
  <=30    medium  no       fair           no
  <=30    low     yes      fair           yes
  >40     medium  yes      fair           yes
  <=30    medium  yes      excellent      yes
  31..40  medium  no       excellent      yes
  31..40  high    yes      fair           yes
  >40     medium  no       excellent      no

Resulting tree:

  age?
  ├─ <=30   -> student?
  │            ├─ no  -> no
  │            └─ yes -> yes
  ├─ 31..40 -> yes
  └─ >40    -> credit_rating?
               ├─ excellent -> no
               └─ fair      -> yes
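For the code sketches later in these notes, the 14 training tuples can be written down directly in Python. The record layout below (a list of plain dicts, with these particular field names) is an illustrative choice, not part of the original slides.

```python
# The 14 Buys_computer training tuples from the slide, encoded as plain dicts.
TRAINING_DATA = [
    {"age": "<=30",   "income": "high",   "student": "no",  "credit_rating": "fair",      "buys_computer": "no"},
    {"age": "<=30",   "income": "high",   "student": "no",  "credit_rating": "excellent", "buys_computer": "no"},
    {"age": "31..40", "income": "high",   "student": "no",  "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "medium", "student": "no",  "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "low",    "student": "yes", "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "low",    "student": "yes", "credit_rating": "excellent", "buys_computer": "no"},
    {"age": "31..40", "income": "low",    "student": "yes", "credit_rating": "excellent", "buys_computer": "yes"},
    {"age": "<=30",   "income": "medium", "student": "no",  "credit_rating": "fair",      "buys_computer": "no"},
    {"age": "<=30",   "income": "low",    "student": "yes", "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "medium", "student": "yes", "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": "<=30",   "income": "medium", "student": "yes", "credit_rating": "excellent", "buys_computer": "yes"},
    {"age": "31..40", "income": "medium", "student": "no",  "credit_rating": "excellent", "buys_computer": "yes"},
    {"age": "31..40", "income": "high",   "student": "yes", "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "medium", "student": "no",  "credit_rating": "excellent", "buys_computer": "no"},
]
```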
Algorithm for Decision Tree Induction

Basic algorithm (a greedy algorithm)

During the late 1970s and early 1980s, J. Ross Quinlan, a researcher in machine
learning, developed a decision tree algorithm known as ID3 (Iterative Dichotomiser).
Quinlan later presented C4.5 (a successor of ID3), a benchmark against which newer
supervised learning algorithms are often compared.

◦ The tree is constructed in a top-down recursive divide-and-conquer manner
◦ At the start, all the training examples are at the root
◦ Attributes are categorical (if continuous-valued, they are discretized in advance)
◦ Examples are partitioned recursively based on selected attributes
◦ Attributes are selected on the basis of a heuristic or statistical measure
  (e.g., information gain); a scikit-learn sketch follows below
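As a practical aside, here is a minimal sketch of growing an entropy-criterion tree with scikit-learn on the TRAINING_DATA records encoded above. This is an assumption-laden illustration, not the ID3/C4.5 algorithm itself: scikit-learn's DecisionTreeClassifier implements an optimized CART with binary splits, so the learned tree may differ in shape from the multiway tree on the example slide.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.DataFrame(TRAINING_DATA)                       # records encoded earlier
X = pd.get_dummies(df.drop(columns="buys_computer"))   # one-hot encode the categorical attributes
y = df["buys_computer"]

# criterion="entropy" uses the same information-based measure discussed in these slides
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

print(export_text(clf, feature_names=list(X.columns)))  # text rendering of the learned tree
```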
Brief Review of Entropy

(the two-class case, m = 2)
Question: How do you determine which attribute best classifies the data?
Answer: Entropy!

Information gain:
◦ A statistical quantity measuring how well an attribute classifies the data.
◦ Calculate the information gain for each attribute.
◦ Choose the attribute with the greatest information gain.

But how do you measure information?
◦ Claude Shannon, working at Bell Labs, established the field of information theory in 1948.
◦ A mathematical function, entropy, measures the information content of a random process:
  ◦ It takes on its largest value when all events are equiprobable.
  ◦ It takes on its smallest value when only one event has non-zero probability.
◦ For two states, the entropy of a set S is denoted H(S).
◦ With positive and negative examples from set S:

  H(S) = -p+ log2(p+) - p- log2(p-)
(Entropy curve: Boolean functions with the same number of ones and zeros have the largest entropy.)
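A quick numeric check of these properties, as a sketch (the helper name entropy2 and the sample calls are illustrative, not from the slides):

```python
import math

def entropy2(p_pos, p_neg):
    """Binary entropy H(S) = -p+ log2(p+) - p- log2(p-), treating 0*log2(0) as 0."""
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

print(entropy2(0.5, 0.5))     # equiprobable events -> largest value, 1.0
print(entropy2(1.0, 0.0))     # only one event has non-zero probability -> 0.0
print(entropy2(9/14, 5/14))   # the Buys_computer training set -> about 0.940
```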
(Back to the story of ID3)
Information gain is our metric for how well one attribute Ai classifies the training data.

Information gain for a particular attribute = the reduction in entropy of the target function
once the value of that attribute is known (i.e., entropy minus conditional entropy).

Mathematical expression for information gain:

  Gain(S, Ai) = H(S) - Σ_{v ∈ Values(Ai)} P(Ai = v) · H(Sv)

where H(S) is the entropy of the whole set S, and H(Sv) is the entropy of the subset Sv
of examples for which attribute Ai takes the value v.
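A direct translation of this formula into Python, reusing only the standard library (the helper names entropy_of and info_gain are illustrative; the target column defaults to the Buys_computer label from the earlier example):

```python
import math
from collections import Counter

def entropy_of(examples, target="buys_computer"):
    """H(S) over any number of class labels; 0*log2(0) is treated as 0."""
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(examples, attribute, target="buys_computer"):
    """Gain(S, Ai) = H(S) - sum over v in Values(Ai) of P(Ai = v) * H(Sv)."""
    total = len(examples)
    remainder = 0.0
    for v in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == v]
        remainder += (len(subset) / total) * entropy_of(subset, target)
    return entropy_of(examples, target) - remainder
```

On the TRAINING_DATA records, info_gain(TRAINING_DATA, "age") should come out to about 0.246, matching the worked example later in these slides.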
ID3 algorithm (boolean-valued function)

Calculate the entropy for all training examples:
◦ positive and negative cases
◦ p+ = #pos/Total, p- = #neg/Total
◦ H(S) = -p+ log2(p+) - p- log2(p-)

Determine which single attribute best classifies the training examples using information gain:
◦ For each attribute, find

  Gain(S, Ai) = H(S) - Σ_{v ∈ Values(Ai)} P(Ai = v) · H(Sv)

◦ Use the attribute with the greatest information gain as the root, and recurse to build
  each internal node (a recursive sketch follows below).
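Putting the helpers together, a compact recursive ID3 sketch for this procedure (the nested-dict tree representation and the majority_label fallback are implementation choices, not prescribed by the slides; info_gain and TRAINING_DATA are defined above):

```python
from collections import Counter

def majority_label(examples, target="buys_computer"):
    """Most common class label in a set of examples (used when no attributes remain)."""
    return Counter(ex[target] for ex in examples).most_common(1)[0][0]

def id3(examples, attributes, target="buys_computer"):
    labels = {ex[target] for ex in examples}
    if len(labels) == 1:                 # stop: all samples belong to the same class
        return labels.pop()
    if not attributes:                   # stop: no remaining attributes -> majority vote
        return majority_label(examples, target)
    best = max(attributes, key=lambda a: info_gain(examples, a, target))
    tree = {best: {}}
    for v in {ex[best] for ex in examples}:       # one branch per observed value of best
        subset = [ex for ex in examples if ex[best] == v]
        rest = [a for a in attributes if a != best]
        tree[best][v] = id3(subset, rest, target)
    return tree

tree = id3(TRAINING_DATA, ["age", "income", "student", "credit_rating"])
print(tree)   # the root should split on "age", as in the example slide
```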


Attribute Selection Measure:
Information Gain (ID3/C4.5)

◦ Select the attribute with the highest information gain
◦ Let pi be the probability that an arbitrary tuple in D belongs to class Ci,
  estimated by |Ci,D| / |D|
◦ Expected information (entropy) needed to classify a tuple in D:

  Info(D) = - Σ_{i=1..m} pi log2(pi)

◦ Information needed (after using A to split D into v partitions) to classify D:

  Info_A(D) = Σ_{j=1..v} (|Dj| / |D|) × Info(Dj)

◦ Information gained by branching on attribute A:

  Gain(A) = Info(D) - Info_A(D)
Attribute Selection: Information Gain
◦ Class P: buys_computer = "yes" (9 tuples)
◦ Class N: buys_computer = "no" (5 tuples)

  Info(D) = I(9,5) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

  Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

Here (5/14) I(2,3) means that "age <= 30" covers 5 of the 14 samples, with 2 yes's and
3 no's. (Reminder: log2(a/b) = log10(a/b) / log10(2).)

  Gain(age) = Info(D) - Info_age(D) = 0.940 - 0.694 = 0.246

Similarly,

  Gain(income) = 0.029
  Gain(student) = 0.151
  Gain(credit_rating) = 0.048
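These numbers can be checked with the info_gain sketch defined earlier (the loop below is illustrative; values are rounded to three decimals):

```python
# Verify the slide's gains on the Buys_computer records encoded earlier.
for attr in ("age", "income", "student", "credit_rating"):
    print(attr, round(info_gain(TRAINING_DATA, attr), 3))
# Expected, per the slide: age 0.246, income 0.029, student 0.151, credit_rating 0.048
```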
Data to be classified:
  X = (age <= 30, income = medium, student = yes, credit_rating = fair)

  age?
  ├─ <=30   -> student?
  │            ├─ no  -> no
  │            └─ yes -> yes
  ├─ 31..40 -> yes
  └─ >40    -> credit_rating?
               ├─ excellent -> no
               └─ fair      -> yes

Following X down the tree (age <= 30, then student = yes) classifies X as buys_computer = "yes".
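A small sketch of this classification step, hard-coding the slide's resulting tree in the same nested-dict shape that the id3 sketch above produces (the classify helper is an assumption):

```python
# The slide's resulting tree, hard-coded as nested dicts.
RESULT_TREE = {
    "age": {
        "<=30":   {"student": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40":    {"credit_rating": {"excellent": "no", "fair": "yes"}},
    }
}

def classify(tree, example):
    """Walk the nested dicts until a leaf label (a plain string) is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))          # the attribute tested at this node
        tree = tree[attribute][example[attribute]]
    return tree

X = {"age": "<=30", "income": "medium", "student": "yes", "credit_rating": "fair"}
print(classify(RESULT_TREE, X))   # -> "yes"
```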
Conditions for stopping partitioning
◦ All samples for a given node belong to the same class
◦ There are no samples left
◦ There are no remaining attributes for further partitioning
BP Angiogram TMT Echo Class
Yes No Yes Yes A
No No No No B
Yes Yes No No C
Yes No No Yes A
No Yes No No C
No Yes No No C
Yes No Yes Yes A
Yes No Yes Yes A
Yes No No Yes A
Yes No No No B
No Yes No No C
No Yes No No C
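The same helpers can be reused on this table to see which attribute gives the highest information gain at the root. The record encoding below (with "Class" as the target attribute) is an assumed reading of the table; no answer is asserted here.

```python
# The table above encoded as records; "Class" is treated as the target attribute.
EXERCISE_DATA = [
    {"BP": "Yes", "Angiogram": "No",  "TMT": "Yes", "Echo": "Yes", "Class": "A"},
    {"BP": "No",  "Angiogram": "No",  "TMT": "No",  "Echo": "No",  "Class": "B"},
    {"BP": "Yes", "Angiogram": "Yes", "TMT": "No",  "Echo": "No",  "Class": "C"},
    {"BP": "Yes", "Angiogram": "No",  "TMT": "No",  "Echo": "Yes", "Class": "A"},
    {"BP": "No",  "Angiogram": "Yes", "TMT": "No",  "Echo": "No",  "Class": "C"},
    {"BP": "No",  "Angiogram": "Yes", "TMT": "No",  "Echo": "No",  "Class": "C"},
    {"BP": "Yes", "Angiogram": "No",  "TMT": "Yes", "Echo": "Yes", "Class": "A"},
    {"BP": "Yes", "Angiogram": "No",  "TMT": "Yes", "Echo": "Yes", "Class": "A"},
    {"BP": "Yes", "Angiogram": "No",  "TMT": "No",  "Echo": "Yes", "Class": "A"},
    {"BP": "Yes", "Angiogram": "No",  "TMT": "No",  "Echo": "No",  "Class": "B"},
    {"BP": "No",  "Angiogram": "Yes", "TMT": "No",  "Echo": "No",  "Class": "C"},
    {"BP": "No",  "Angiogram": "Yes", "TMT": "No",  "Echo": "No",  "Class": "C"},
]

# Information gain of each attribute, using the info_gain sketch defined earlier.
for attr in ("BP", "Angiogram", "TMT", "Echo"):
    print(attr, round(info_gain(EXERCISE_DATA, attr, target="Class"), 3))
```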
