
Evaluation Metrics for Classification
Confusion Matrix, Precision, Recall, F-score, Accuracy, ROC Curve

Machine Learning, CS354

 Slides courtesy: Nuriev Sirojiddin, Tuong Le

1
CONTENTS

BINARY CLASSIFIER
CONFUSION MATRIX
ACCURACY
PRECISION
RECALL
ROC CURVE
- HOW TO PLOT ROC CURVE
2
Binary Classifier

 A binary classifier produces an output with two class values or labels, such as Yes/No, 1/0, or Positive/Negative, for a given input
 For performance evaluation, the observed (true) labels are compared with the labels predicted by the classifier
 If the classifier is perfect, the predicted labels match the observed labels exactly
 In practice, it is rare to be able to develop a perfect classifier

3
Confusion Matrix

 A confusion matrix is formed from the four possible outcomes of binary classification:
 True positive (TP): correct positive prediction
 False positive (FP): incorrect positive prediction
 True negative (TN): correct negative prediction
 False negative (FN): incorrect negative prediction

4
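A minimal sketch (not from the slides) of how these four counts can be tallied from true and predicted labels, assuming labels are encoded as 1 (positive) and 0 (negative):

# Count TP, FP, TN, FN from true and predicted labels (1 = positive, 0 = negative).
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

# Example with four labelled examples:
print(confusion_counts([1, 0, 1, 0], [1, 1, 0, 0]))  # (1, 1, 1, 1)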
Confusion Matrix

 Binary Classification: Green Vs. Grey

• Rows represent the predicted label, and columns represent the actual label.

5
Confusion Matrix

Confusion Matrix (Green vs. Grey, with green as the positive class):

True Positives: green examples correctly identified as green
False Positives: grey examples falsely identified as green
False Negatives: green examples falsely identified as grey
True Negatives: grey examples correctly identified as grey

6
Accuracy

 Accuracy is calculated as the number of correct predictions divided by the total number of examples in the dataset
 The best accuracy is 1.0, whereas the worst is 0.0

Accuracy = (TP + TN) / (TP + FP + TN + FN)

 For the example confusion matrix (TP = 9, FP = 2, FN = 1, TN = 8):
Accuracy = (9 + 8) / (9 + 2 + 1 + 8) = 0.85

7
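As a quick check, a sketch reusing the counts above:

# Accuracy from the example counts: TP = 9, FP = 2, FN = 1, TN = 8
tp, fp, fn, tn = 9, 2, 1, 8
accuracy = (tp + tn) / (tp + fp + tn + fn)
print(accuracy)  # 0.85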
Precision

 Precision is calculated as the number of correct positive predictions divided by the total number of positive predictions
 The best precision is 1.0, whereas the worst is 0.0

Precision = TP / (TP + FP)

 Precision = 9 / (9 + 2) ≈ 0.82

8
Recall

 Sensitivity = Recall = True Positive Rate
 Recall is calculated as the number of correct positive predictions divided by the total number of actual positives
 The best recall is 1.0, whereas the worst is 0.0

Recall = TP / (TP + FN)

 Recall = 9 / (9 + 1) = 0.9

9
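Continuing the same sketch with the counts from the example confusion matrix:

# Precision and recall from the example counts: TP = 9, FP = 2, FN = 1
tp, fp, fn = 9, 2, 1
precision = tp / (tp + fp)   # 9 / 11, about 0.82
recall = tp / (tp + fn)      # 9 / 10 = 0.9
print(round(precision, 2), recall)  # 0.82 0.9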
Example 1

 Example: classify whether an image contains a dog or a cat
 The dataset contains 25000 images of dogs and cats
 Training data: 75% of the 25000 images (25000 * 0.75 = 18750)
 Validation data: the remaining 25% (25000 * 0.25 = 6250)
 Test data: 5 cats and 5 dogs
 Suppose that, on the test data, the classifier gives TP = 2, FP = 0, FN = 3, TN = 5:

 Precision = 2 / (2 + 0) * 100% = 100%
 Recall = 2 / (2 + 3) * 100% = 40%
 Accuracy = (2 + 5) / (2 + 0 + 3 + 5) * 100% = 70%
10
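A sketch that reproduces these three figures from the stated counts:

# Example 1 counts from the slide: TP = 2, FP = 0, FN = 3, TN = 5
tp, fp, fn, tn = 2, 0, 3, 5
precision = tp / (tp + fp)                  # 1.0 -> 100%
recall = tp / (tp + fn)                     # 0.4 -> 40%
accuracy = (tp + tn) / (tp + fp + fn + tn)  # 0.7 -> 70%
print(precision, recall, accuracy)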
Example 2 (Multi-class classification)

(The per-class values below come from a 3x3 confusion matrix shown in the original slide, which is not reproduced here.)

 Accuracy = 170/300 ≈ 0.567
 Per-class precision values: 0.5, 0.5, 0.667; macro-averaged precision ≈ 0.556
 Per-class recall values: 0.3, 0.6, 0.8

11
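A sketch of macro-averaged precision and recall for a multi-class problem, assuming a confusion matrix with rows as predicted classes and columns as actual classes (as in the earlier slides); the matrix values here are illustrative, not the ones from the slide figure:

# Macro-averaged precision/recall from a confusion matrix
# cm[i][j] = number of examples predicted as class i whose actual class is j
cm = [[40, 10, 10],
      [ 5, 60, 15],
      [ 5, 30, 80]]
n = len(cm)

precisions = [cm[i][i] / sum(cm[i]) for i in range(n)]                    # per row (predicted)
recalls = [cm[i][i] / sum(cm[r][i] for r in range(n)) for i in range(n)]  # per column (actual)
accuracy = sum(cm[i][i] for i in range(n)) / sum(map(sum, cm))

print([round(p, 3) for p in precisions], round(sum(precisions) / n, 3))
print([round(r, 3) for r in recalls], round(sum(recalls) / n, 3))
print(round(accuracy, 3))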
Precision vs Recall

 There is typically a tradeoff between precision and recall.
 Tuning a classifier for higher precision usually lowers its recall, and vice versa.
 Because of this tradeoff, precision or recall alone is not very informative.
 We need to look at both measures to get a true picture.
 There is another measure that takes both precision and recall into account: the F-measure.

12
The F-measure

 F-measure: a combined measure that assesses the precision/recall tradeoff
 It is computed as a weighted harmonic mean of precision and recall:

F = 1 / (α(1/P) + (1 - α)(1/R)) = (β² + 1)PR / (β²P + R)

 People usually use the balanced F1 measure, i.e. the F-measure with β = 1 (equivalently α = ½)
 Thus, the F1-measure can be computed using the following equation:

F1 = 2PR / (P + R)

 For example 1, precision was 1.0 and recall was 0.4; the F1-measure is computed on the next slide
13
The F1-measure (Example)

 In example 1:
 Precision was 1.0
 Recall was 0.4
 Therefore, the F1-measure can be computed as:

F1 = 2PR / (P + R) = (2 * 1.0 * 0.4) / (1.0 + 0.4) = 0.8 / 1.4 ≈ 0.57

 Therefore, F1-measure ≈ 0.57


 Try yourself:
 Precision=0.8, recall=0.3
 Precision=0.4, recall=0.9

14
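A small sketch of the F-measure as a function (the general F-beta form, with F1 as the default), reproducing the value for example 1:

# Weighted harmonic mean of precision and recall; beta = 1 gives the balanced F1 measure.
def f_measure(precision, recall, beta=1.0):
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

print(round(f_measure(1.0, 0.4), 2))  # 0.57 (example 1)

The same function can be used to check the "try yourself" pairs above.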
ROC Curve
Binary classification Confusion Matrix
ROC Curve basics

 The ROC curve is an evaluation tool based on two basic evaluation measures:
- specificity and sensitivity
 Sensitivity = Recall = True Positive Rate = TP / (TP + FN)
 Specificity = True Negative Rate = TN / (TN + FP)
 Specificity is calculated as the number of correct negative predictions divided by the total number of actual negatives.

17
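A quick sketch, reusing the counts from the earlier binary example (TP = 9, FP = 2, FN = 1, TN = 8):

tp, fp, fn, tn = 9, 2, 1, 8
sensitivity = tp / (tp + fn)  # true positive rate (recall) = 0.9
specificity = tn / (tn + fp)  # true negative rate = 0.8
print(sensitivity, specificity)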
ROC Curve basics (contd.)

 To understand these measures, consider a disease detection (binary classification) problem.
 Here, the positive class represents diseased people and the negative class represents healthy people.
 Broadly speaking, the two quantities tell us how good we are at detecting diseased people (sensitivity) and healthy people (specificity).
 Sensitivity is the proportion of the diseased people (TP + FN) that we correctly classify as diseased (TP).
 Specificity is the proportion of the healthy people (TN + FP) that we correctly classify as healthy (TN).

18
ROC Curve basics (contd.)

 If we diagnosed everyone as healthy, we would have a specificity of 1 (very good: we diagnose all healthy people correctly) but a sensitivity of 0 (we diagnose all diseased people incorrectly), which is very bad.
 Ideally, we would like Se = Sp = 1, i.e. perfect sensitivity and specificity.
 It is often convenient to combine sensitivity and specificity into a single value.
 This can be achieved by evaluating the area under the receiver operating characteristic (ROC) curve.
19
What is an ROC Curve?

 ROC stands for Receiver Operating Characteristic.
 ROC curves were originally used in signal detection to show the tradeoff between hit rate and false alarm rate over a noisy channel, hence the term 'receiver operating characteristic'.
 The ROC curve is a visual tool for evaluating the performance of a binary classifier.
 It illustrates the diagnostic ability of a binary classification system as its discrimination threshold is varied.

 An example ROC curve is shown in the figure (not reproduced here).
 The blue line represents the ROC curve.
 The dashed line is a reference curve.

20
ROC Curve presumption
 Before we can plot the ROC curve, it is assumed that the classifier produces a positivity score, which can then be used to determine a discrete label.
 E.g. in the case of a Naive Bayes classifier, the positivity score is the probability of the positive class given the test example. Usually, a cut-off threshold v = 0.5 is applied to this positivity score to produce the output label.
 For ML algorithms that produce only a discrete label for the test examples, slight modifications can be made to produce a positivity score, e.g. (see the sketch after this list):
1. In the KNN algorithm with K=3, majority voting is used to produce a discrete output label.
 However, a positivity score can be produced by computing the ratio of positive neighbors of the test example.
 I.e. if K=3 and the labels of the closest neighbors are [1, 0, 1], then the positivity score may be (1+0+1)/3 = 2/3 ≈ 0.67. A cut-off threshold v can be applied to this positivity score to produce an output label, usually v = 0.5.
 For the decision tree algorithm, refer to the following link:
https://stats.stackexchange.com/questions/105760/how-we-can-draw-an-roc-curve-for-decision-trees

21
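A minimal sketch of the KNN positivity score idea described above (plain Python, hypothetical helper name):

# Positivity score for KNN: fraction of the K nearest neighbours that are positive.
def knn_positivity_score(neighbor_labels):
    return sum(neighbor_labels) / len(neighbor_labels)

score = knn_positivity_score([1, 0, 1])  # 2/3, about 0.67
label = 1 if score >= 0.5 else 0         # apply cut-off threshold v = 0.5
print(score, label)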
ROC Curve presumption

 An ROC curve is plotted by varying the cut-off threshold v.
 The y-axis represents the sensitivity (true positive rate).
 The x-axis represents 1 - specificity, also called the false positive rate (FPR).

 As the threshold v is varied (e.g. from 0 to 1), each value gives a sensitivity/specificity pair, which forms a single point on the ROC curve.
 E.g. for v = 0.5, say sensitivity = 0.8 and specificity = 0.6 (hence FPR = 1 - specificity = 0.4), giving the point (x, y) = (0.4, 0.8). If v = 0.6, say sensitivity = 0.7 and specificity = 0.7 (hence FPR = 0.3), giving (x, y) = (0.3, 0.7).
 The final ROC curve is drawn by connecting all these points, e.g. as given below:

22
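A tiny sketch of the mapping from (sensitivity, specificity) pairs to ROC points, using the illustrative values above:

# Each (sensitivity, specificity) pair becomes the ROC point (1 - specificity, sensitivity).
pairs = [(0.8, 0.6), (0.7, 0.7)]  # (sensitivity, specificity) at v = 0.5 and v = 0.6
roc_points = [(round(1 - spec, 2), sens) for sens, spec in pairs]
print(roc_points)  # [(0.4, 0.8), (0.3, 0.7)]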
How to Plot ROC Curve - Example

 Dynamic cut-off thresholds
(Figure: the same example evaluated at three cut-off thresholds, 0.020, 0.015 and 0.010; not reproduced here.)

23
How to Plot ROC Curve - Example

 True positive rate (TPR) = TP / (TP + FN) and false positive rate (FPR) = FP / (FP + TN)
 Use different cut-off thresholds (0.00, 0.01, 0.02, ..., 1.00), calculate the TPR and FPR at each threshold, and plot the resulting points as a graph.
 That graph is the receiver operating characteristic (ROC) curve.
 Example (one point per cut-off threshold from the previous slide):

Cut-off = 0.020: TPR = 0.5, FPR = 0
Cut-off = 0.015: TPR = 1, FPR = 0.167
Cut-off = 0.010: TPR = 1, FPR = 0.667

24
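A sketch of this procedure with a small, made-up set of scores and labels (the numbers are illustrative, not the ones from the slide):

# Sweep cut-off thresholds over positivity scores and collect (FPR, TPR) ROC points.
def roc_points(y_true, scores, thresholds):
    points = []
    for v in thresholds:
        y_pred = [1 if s >= v else 0 for s in scores]
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
thresholds = [i / 100 for i in range(0, 101)]  # 0.00, 0.01, ..., 1.00
print(sorted(set(roc_points(y_true, scores, thresholds))))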
How to Plot ROC Curve

 An ROC curve is created by connecting the ROC points of a classifier
 An ROC point is a point with a pair of x and y values, where x is 1 - specificity (FPR) and y is sensitivity (TPR)
 The curve starts at (0.0, 0.0) and ends at (1.0, 1.0)

25
ROC Curve

 A classifier with random performance always shows a straight diagonal line from (0, 0) to (1, 1)
 This line separates the ROC space into two areas:
 ROC curves in the area towards the top-left corner indicate good performance levels
 ROC curves in the area towards the bottom-right corner indicate poor performance levels

26
ROC Curve

 A classifier with perfect performance shows a combination of two straight lines: from (0, 0) straight up to (0, 1), then across to (1, 1)
 It is important to notice that the ROC curves of classifiers with meaningful performance levels usually lie in the area between the random ROC curve and the perfect ROC curve

27
The AUC measure

 AUC (Area Under the ROC Curve) score
 An advantage of using the ROC curve is that it can be summarised by a single measure, the AUC score
 As the name indicates, it is the area under the curve calculated in the ROC space
 Although the theoretical range of the AUC score is between 0 and 1, the actual scores of meaningful classifiers are greater than 0.5, which is the AUC score of a random classifier

 In the example figure, the ROC curves clearly show that classifier A outperforms classifier B

28
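A sketch of computing AUC from a list of ROC points via the trapezoidal rule (the points are illustrative; in practice a library routine such as sklearn.metrics.roc_auc_score is typically used):

# Area under an ROC curve from (FPR, TPR) points, using the trapezoidal rule.
def auc(points):
    pts = sorted(points)  # sort by FPR
    area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        area += (x2 - x1) * (y1 + y2) / 2
    return area

# Illustrative curve well above the random diagonal (whose AUC would be 0.5).
points = [(0.0, 0.0), (0.0, 0.5), (0.167, 1.0), (0.667, 1.0), (1.0, 1.0)]
print(round(auc(points), 3))  # high AUC for this curve (about 0.96)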
