Notes 03
EE514 – CS535
Zubair Khalid
https://www.zubairkhalid.org/ee514_2023.html
Outline
- For each test-point, the loss is either 0 or 1, depending on whether the prediction is correct or incorrect.
- Averaged over n data-points, this loss is the ‘Misclassification Rate’ (see the formulas after this list).
Interpretation:
- Misclassification Rate: Estimate of the probability that a point is incorrectly classified.
- Accuracy = 1 - Misclassification Rate
Issue:
- Not meaningful when the classes are imbalanced or skewed.
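In symbols (the notation y_i for the true label and ŷ_i for the prediction is assumed here, since the slide's own formula is not reproduced in the extracted text):

```latex
\ell_{0/1}(y_i, \hat{y}_i) = \mathbb{1}[\hat{y}_i \neq y_i], \qquad
\text{Misclassification Rate} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[\hat{y}_i \neq y_i], \qquad
\text{Accuracy} = 1 - \text{Misclassification Rate}.
```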
Evaluation of Classification Performance
Classification Accuracy (0/1 Loss):
Example:
- Predict if a bowler will not bowl a no-ball?
- Assuming 15 no-balls in an inning, a model that says ‘Yes’ all the time will have about 95% accuracy.
- Using accuracy as the performance metric, we would call this model very accurate, but it is in fact neither useful nor valuable.
Why?
- Total points: 315 (assuming other balls are legal ☺)
- No-ball label: Class 0 (4.76% of the balls are from this class)
- Not a no-ball label: Class 1 (95.24% of the balls are from this class)
- The classes are therefore imbalanced.
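A quick numerical check of these figures; the always-‘Yes’ baseline is a hypothetical model used only for illustration:

```python
# Always-"not a no-ball" baseline on the example data: 315 balls, 15 no-balls.
total_balls = 315
no_balls = 15                              # Class 0
regular_balls = total_balls - no_balls     # Class 1

# The baseline predicts Class 1 ("not a no-ball") for every ball,
# so it is correct exactly on the regular balls.
baseline_accuracy = regular_balls / total_balls
print(f"Baseline accuracy: {baseline_accuracy:.4f}")            # 0.9524 (about 95%)
print(f"Class 0 fraction:  {no_balls / total_balls:.4f}")       # 0.0476
print(f"Class 1 fraction:  {regular_balls / total_balls:.4f}")  # 0.9524
```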
Evaluation of Classification Performance
TP, TN, FP and FN:
- Consider a binary classification problem with one class designated Positive and the other Negative.
- TP (True Positive): a positive point correctly predicted as positive.
- TN (True Negative): a negative point correctly predicted as negative.
- FP (False Positive): a negative point incorrectly predicted as positive.
- FN (False Negative): a positive point incorrectly predicted as negative.
Evaluation of Classification Performance
TP, TN, FP and FN:
Example:
- Predict if a bowler will not bowl a no-ball?
- 15 no-balls in an inning (Total balls: 315)
- Bowl no-ball (Class 0), Bowl regular ball (Class 1)
- Model(*) predicted 10 no-balls (8 correct predictions, 2 incorrect)
* Assume you have a model that has been observing the bowlers for the last 15 years
and used these observations for learning.
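As a sketch of how the four counts follow from this example, assuming (the slide does not state it) that ‘no-ball’ is treated as the positive event:

```python
# Worked counts for the no-ball example above.
total_balls = 315
actual_no_balls = 15
predicted_no_balls = 10
correct_no_ball_predictions = 8       # of the 10 predicted no-balls, 8 were real

TP = correct_no_ball_predictions      # 8   predicted no-ball, was a no-ball
FP = predicted_no_balls - TP          # 2   predicted no-ball, was a regular ball
FN = actual_no_balls - TP             # 7   missed no-balls
TN = total_balls - TP - FP - FN       # 298 predicted regular, was regular

print(TP, FP, FN, TN)                 # 8 2 7 298
assert TP + FP + FN + TN == total_balls
```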
Evaluation of Classification Performance
Confusion Matrix (Contingency Table):
- (TP, TN, FP, FN) are usefully summarized in a table, referred to as the confusion matrix:
  - the rows correspond to the predicted class (ŷ)
  - and the columns to the true class (y)
                               Actual Labels
                               1 (Positive)     0 (Negative)     Total
Predicted    1 (Positive)      TP               FP               Predicted Total Positives = TP + FP
Labels       0 (Negative)      FN               TN               Predicted Total Negatives = FN + TN
             Total             P = TP + FN      N = FP + TN
                               (Actual Total    (Actual Total
                               Positives)       Negatives)
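A minimal sketch of building a confusion matrix with scikit-learn; the toy label arrays are made up for illustration. Note that sklearn.metrics.confusion_matrix puts true labels on the rows and predicted labels on the columns, i.e. the transpose of the convention used on this slide.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # toy ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]   # toy predictions

# labels=[1, 0] orders the classes as (Positive, Negative) to match the slide.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
tp, fn = cm[0]        # row 0: true positives, then missed positives
fp, tn = cm[1]        # row 1: false alarms, then true negatives
print(cm)
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
```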
Evaluation of Classification Performance
Confusion Matrix:
Example:
- Disease Detection: given pathology reports and scans, predict heart disease.
- Yes: 1, No: 0

                               Actual Labels
                               1 (Positive)     0 (Negative)     Total
Predicted    1 (Positive)      TP = 100         FP = 10          110
Labels       0 (Negative)      FN = 5           TN = 50          55
             Total             105              60               165
- Sensitivity or Recall or True Positive Rate (TPR): How often does it predict Positive
when it is actually Positive?
Evaluation of Classification Performance
Confusion Matrix:
Metrics using Confusion Matrix:
- Specificity or True Negative Rate (TNR): When it is actually Negative, how often does it predict Negative?
- Accuracy: (TP + TN)/Total = (100 + 50)/165 = 0.91
- Misclassification Rate: (FP + FN)/Total = (10 + 5)/165 = 0.09
- Sensitivity or Recall or True Positive Rate (TPR): When it is actually positive, how often does the model detect the disease?
  TPR = TP/(TP + FN) = 100/105 = 0.95
Evaluation of Classification Performance
Confusion Matrix:
Metrics using Confusion Matrix (Example: Disease Prediction):
- False Positive Rate (FPR): When actually healthy, how often does it predict disease?
  FPR = FP/(FP + TN) = 10/60 = 0.17
- Specificity or True Negative Rate (TNR): When actually healthy, how often does it predict healthy?
  TNR = TN/(FP + TN) = 50/60 = 0.83
- Precision or Positive Predictive Value: When it predicts disease, how often is it correct?
  Precision = TP/(TP + FP) = 100/110 = 0.91
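A short numerical check that reproduces the figures above from the disease-prediction confusion matrix:

```python
# Metrics for the disease-prediction example: TP=100, FP=10, FN=5, TN=50.
TP, FP, FN, TN = 100, 10, 5, 50
total = TP + FP + FN + TN                 # 165

accuracy      = (TP + TN) / total         # 0.91
misclass_rate = (FP + FN) / total         # 0.09
sensitivity   = TP / (TP + FN)            # TPR / recall: 100/105 ~ 0.95
fpr           = FP / (FP + TN)            # 10/60 ~ 0.17
specificity   = TN / (FP + TN)            # TNR: 50/60 ~ 0.83
precision     = TP / (TP + FP)            # 100/110 ~ 0.91

for name, value in [("Accuracy", accuracy), ("Misclassification", misclass_rate),
                    ("Sensitivity (TPR)", sensitivity), ("FPR", fpr),
                    ("Specificity (TNR)", specificity), ("Precision", precision)]:
    print(f"{name:>18}: {value:.2f}")
```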
Evaluation of Classification Performance
Confusion Matrix:
Metrics using Confusion Matrix:
When to use which?
- Specificity (Sp) and Sensitivity (Se) measure how good we are at detecting healthy and diseased people, respectively.
- If we diagnose everyone as healthy, Sp = 1 (all healthy people diagnosed correctly) but Se = 0 (all diseased people diagnosed incorrectly).
- Ideally, we want Sp = Se = 1 (perfect sensitivity and specificity), but this is unrealistic in practice.
Evaluation of Classification Performance
Confusion Matrix:
Sensitivity and Specificity Trade-off:
How do we judge how good a pair of sensitivity and specificity values is?
- Is Sp = 0.8, Se = 0.7 better than Sp = 0.7, Se = 0.8?
- The trade-off is controlled by the classifier's decision threshold.
- In disease diagnosis, for example, higher levels of recall may be obtained at the price of lower values of precision.
- We need to define a single measure that combines recall and precision (or other metrics) to evaluate the performance of a classifier.
- One measure that captures the recall-precision trade-off is the weighted harmonic mean (HM) of recall and precision, that is,
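A standard way to write this weighted harmonic mean; the weight β and the symbols P, R for precision and recall are assumptions, since the slide's formula is not in the extracted text. F1 is the β = 1 case:

```latex
F_\beta = \frac{(1+\beta^2)\, P \, R}{\beta^2 P + R}, \qquad
F_1 = \frac{2\, P \, R}{P + R}.
```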
Evaluation of Classification Performance
F1 Score:
Why harmonic mean?
- We could also use the arithmetic mean (AM) or geometric mean (GM), but the harmonic mean is dominated by the smaller of precision and recall, so a classifier cannot score well by doing well on only one of the two.
- Matthews correlation coefficient (MCC) measures the correlation between the true class and the predicted class. The higher the correlation between true and predicted values, the better the prediction.
- Defined as
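The slide's formula is not in the extracted text; the standard definition of MCC in terms of the confusion-matrix entries is:

```latex
\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}.
```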
- For each value of recall, determine the precision and take the average of these precision values; this is referred to as the average precision (AP).
- This amounts to uniformly-spaced sampling of the Precision-Recall curve and taking the average value.
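One common way to write this sampling; the particular recall grid (e.g. the 11 levels 0, 0.1, ..., 1.0) is an assumption, as the slide does not fix it:

```latex
\text{AP} = \frac{1}{|\mathcal{R}|} \sum_{r \in \mathcal{R}} \text{Precision}(r),
\qquad \mathcal{R} = \{0, 0.1, \ldots, 1.0\}.
```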
Examples:
- Emotion Detection.
- Vehicle type, make, model, and color of the vehicle from the images streamed by a safe-city camera.
- State (rest, ramp-up, normal, ramp-down) of a process machine in the plant.
- Take an image of the sky and determine the pollution level (healthy, moderate, hazardous).
- Record home WiFi signals and identify the type of appliance being operated.
Multi-Class Classification
Implementation (Possible options using binary classifiers):
Option 1: Build a one-vs-all (OvA), also called one-vs-rest (OvR), classifier (a minimal sketch follows):
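A minimal one-vs-rest sketch using scikit-learn; the dataset (Iris) and the base estimator (logistic regression) are placeholders chosen for illustration, not part of the notes:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One binary classifier per class: class k vs. the rest.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_train, y_train)
print("Number of binary classifiers:", len(ovr.estimators_))
print("Test accuracy:", ovr.score(X_test, y_test))
```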
- How do we define the measures for the evaluation of the performance of multi-class classifier?
- Micro-averaging: Compute confusion matrix after collecting decisions for all classes and then
evaluate.
Evaluation of Classification Performance
Multiclass Classification:
Confusion Matrix
- Predict whether a bowler will bowl a no-ball, a wide ball, or a regular ball.
- 15 no-balls and 20 wide-balls in an inning (Total balls: 335)
- Model Predictions:
                                  Actual
                       No-ball   Wide-ball   Regular ball
Classifier  No-ball         8         5            20
Output      Wide-ball       2        10            10
            Regular ball    5         5           270

(Precision is computed along the rows of the classifier output; Recall along the columns of the actual labels.)
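A short numerical check of the per-class precision, recall, and accuracy implied by the table above (rows: classifier output, columns: actual, as on the slide):

```python
import numpy as np

classes = ["No-ball", "Wide-ball", "Regular ball"]
C = np.array([[  8,  5,  20],
              [  2, 10,  10],
              [  5,  5, 270]])

for i, name in enumerate(classes):
    precision = C[i, i] / C[i, :].sum()   # correct / all predicted as class i
    recall    = C[i, i] / C[:, i].sum()   # correct / all actually in class i
    print(f"{name:>12}: precision={precision:.2f}, recall={recall:.2f}")

accuracy = np.trace(C) / C.sum()          # 288/335 ~ 0.86
print(f"Accuracy: {accuracy:.2f}")
```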
Evaluation of Classification Performance
Multiclass Classification:
Confusion Matrix – Recall and Precision:
Recall
- For the i-th class, recall represents the fraction of data-points actually in class i that are classified correctly, that is,
Precision
- For the i-th class, precision represents the fraction of data-points predicted to be in class i that are actually in class i, that is,
Accuracy
- Fraction of data points classified correctly, that is,
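The three quantities above can be written in terms of the multiclass confusion matrix; the notation C_ij (entry in row i, column j, with rows as classifier output and columns as actual class, matching the table above) is an assumption, since the slide's own formulas are not in the extracted text:

```latex
\text{Recall}_i = \frac{C_{ii}}{\sum_{j} C_{ji}}, \qquad
\text{Precision}_i = \frac{C_{ii}}{\sum_{j} C_{ij}}, \qquad
\text{Accuracy} = \frac{\sum_{i} C_{ii}}{\sum_{i,j} C_{ij}}.
```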
Evaluation of Classification Performance
Multiclass Classification:
Confusion Matrix – Macro-Averaging:
- We compute performance for
each class and then average.
One-vs-rest (binarized) confusion matrices of the classifier output against the actual labels:

No-ball vs. rest:        TP = 8,   FP = 25,  FN = 7,   TN = 295
Wide-ball vs. rest:      TP = 10,  FP = 12,  FN = 10,  TN = 303
Regular ball vs. rest:   TP = 270, FP = 10,  FN = 30,  TN = 25
Recall is computed per class from these binarized matrices.
Macro-average Recall: the mean of the per-class recalls (worked computation below).
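Worked values from the binarized matrices above:

```latex
\text{Recall}_{\text{no-ball}} = \tfrac{8}{15} \approx 0.53, \quad
\text{Recall}_{\text{wide}} = \tfrac{10}{20} = 0.50, \quad
\text{Recall}_{\text{regular}} = \tfrac{270}{300} = 0.90,
\qquad
\text{Macro-average Recall} = \tfrac{1}{3}(0.53 + 0.50 + 0.90) \approx 0.64.
```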
Evaluation of Classification Performance
Multiclass Classification:
Confusion Matrix – Micro-Averaging:
- Compute the confusion matrix after collecting (pooling) the decisions for all classes and then evaluate.
- Summing the three one-vs-rest matrices above entry-wise gives the pooled matrix (rows: classifier output, columns: actual):

              True    False
  True         288       47
  False         47      623

Micro-average Recall: see the computation below.
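From the pooled matrix above:

```latex
\text{Micro-average Recall} = \frac{\sum_i TP_i}{\sum_i (TP_i + FN_i)}
  = \frac{288}{288 + 47} = \frac{288}{335} \approx 0.86.
```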
Evaluation of Classification Performance
Multiclass Classification:
Micro-Averaging vs. Macro-Averaging:
- Note: micro-average recall = micro-average precision = micro-average F1 score = accuracy (computed from the confusion matrix).
- The micro-average is termed a global metric: every data-point contributes equally, so the large classes dominate it.
- Consequently, it is not a good measure when the classes are not balanced.
Weighted-average Recall:
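The slide's formula is not in the extracted text; the usual definition weights each per-class recall by its class support n_i (the number of actual points in class i) out of n total points. With the per-class recalls computed above:

```latex
\text{Weighted-average Recall} = \sum_{i} \frac{n_i}{n}\,\text{Recall}_i
  = \frac{15}{335}(0.53) + \frac{20}{335}(0.50) + \frac{300}{335}(0.90) \approx 0.86.
```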
Evaluation of Classification Performance
References:
- KM 5.7.2