Lesson 4 - Performance Metrics
“Numbers have an important story to tell. They rely on you to give them a voice.”
— Stephen Few
• Machine learning models are evaluated against the performance measures you choose.
• Performance metrics help evaluate the efficiency and accuracy of machine learning models.
Performance Metrics
Topic 2: Key Methods of Performance Metrics
• The confusion matrix is one of the most intuitive and easiest metrics used for finding the correctness and accuracy of a model.
• It is used for classification problems where the output can belong to two or more classes.
Confusion Matrix: Example
• TP: True Positive
• TN: True Negative
• FP: False Positive
• FN: False Negative
True Positive (TP)
• True positives are the cases where the actual class of the data point is 1 (true) and the predicted class is also 1 (true).
• The case where a person has cancer and the model classifies the case as cancer positive comes under true positive.
True Negative (TN)
• True negatives are the cases when the actual class of the data point is 0 (false) and the predicted class is also 0 (false).
• The case where a person does not have cancer and the model classifies the case as cancer negative comes under true negative.
False Positive (FP)
• False positives are the cases when the actual class of the data point is 0 (false) and the predicted class is 1 (true).
• The case where a person does not have cancer and the model classifies the case as cancer positive comes under false positive.
False Negative (FN)
• False negatives are the cases when the actual class of the data point is 1 (true) and the predicted class is 0 (false).
• It is false because the model has predicted incorrectly.
• It is negative because the predicted class was negative.
• The case where a person has cancer and the model classifies the case as cancer negative comes under false negative.
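The four outcomes above can be counted directly from a model's predictions. Below is a minimal sketch using scikit-learn's confusion_matrix; the labels are made up for illustration.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions for 10 patients
# (1 = has cancer, 0 = does not have cancer)
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]

# For binary labels [0, 1], ravel() returns the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=2, TN=6, FP=1, FN=1
```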
Minimize False Cases
• There are no fixed rules for deciding which type of false case should be minimized.
• Whether to minimize false negatives or false positives is determined by business needs and the context of the problem.
Minimize False Negatives: Example
• Consider a bad cancer detection model: out of 100 people, only 5 are actual cancer patients.
• The model predicts everyone as non-cancerous, so its accuracy is 95%, yet it misses every cancer patient.
• In this context, false negatives (missed cancer patients) are far more costly, so they must be minimized.
Minimize False Positives: Example
• The model needs to classify an email as spam or ham (the term used for a genuine email).
• An important email wrongly marked as spam is more business critical than a spam email diverted to the inbox.
• In this context, false positives (genuine emails classified as spam) must be minimized.
Accuracy: Calculation

                      Actual Positives (1)   Actual Negatives (0)
Predicted Positive            TP                     FP
Predicted Negative            FN                     TN

Accuracy = (TP + TN) / (TP + FP + FN + TN)
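As a small sketch, the same formula written as code (the counts are illustrative):

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + FP + FN + TN)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts from a hypothetical confusion matrix
print(accuracy(tp=2, tn=6, fp=1, fn=1))  # 0.8
```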
Accuracy: Example
• Accuracy is a good measure when the target variable classes in the data are nearly balanced.
• For example, consider a dataset in which 60% of the samples are apples and 40% are oranges.
• With this type of data, the machine learning model will have approximately 97% accuracy on new predictions.
Accuracy as a Measure
• Consider the previously discussed cancer detection example where only 5 out of 100 people have cancer.
• Suppose the model predicts every case as non-cancerous. In doing so, it classifies the 95 non-cancerous patients correctly and the 5 cancerous patients as non-cancerous.
• Even though the model failed to predict any of the cancer patients, its accuracy is 95%.
Note:
When the majority of the target variable classes in the data belong to a single class, accuracy should not be used as a measure.
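A short sketch of the note above, using synthetic labels: a model that predicts everyone as non-cancerous still reaches 95% accuracy on the imbalanced cancer data.

```python
from sklearn.metrics import accuracy_score

# 100 people: 5 actually have cancer (1), 95 do not (0)
y_true = [1] * 5 + [0] * 95
# A bad model that predicts everyone as non-cancerous
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))  # 0.95 -- high accuracy, yet no cancer case is caught
```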
Precision
• In general, precision refers to the closeness of two or more measurements to each other.
• As a metric, it aims at identifying the proportion of positive identifications that are actually correct.
Precision: Calculation

[Confusion matrix as above: TP, FP / FN, TN]

Precision = TP / (TP + FP)
Precision: Example
• Consider the previously discussed cancer detection example where only 5 out of 100 people have cancer.
• Precision helps you identify the proportion of patients diagnosed with cancer who actually have cancer.
• The predicted positives are the people predicted to have cancer; they include true positives and false positives.
• The actual positives are the people who actually have cancer; they include the true positives.
Precision: Example (contd.)
Consider that the model is bad and predicts every case as cancer positive. In such a scenario:

Precision = TP / (TP + FP)

• The numerator (a person has cancer and the model predicts the case as cancer positive) is 5.
• The denominator (every person predicted as cancer positive) is 100, so the precision of this model is 5/100 = 5%.
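A sketch of this scenario with synthetic labels: 5 actual cancer cases out of 100, and a bad model that predicts every case as cancer positive.

```python
from sklearn.metrics import precision_score

y_true = [1] * 5 + [0] * 95   # 5 actual cancer cases out of 100
y_pred = [1] * 100            # bad model: everyone is predicted as cancer positive

# Precision = TP / (TP + FP) = 5 / (5 + 95)
print(precision_score(y_true, y_pred))  # 0.05, i.e., 5%
```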
Recall or Sensitivity: Calculation

[Confusion matrix as above: TP, FP / FN, TN]

Recall = TP / (TP + FN)
Recall or Sensitivity: Example
• Consider the previously discussed cancer detection example where only 5 out of 100 people have cancer.
• Recall helps you identify the proportion of actual cancer patients that are diagnosed by the algorithm.

Recall = TP / (TP + FN)

• For the bad model that predicts every case as cancer positive, the numerator (actual cancer cases predicted as cancer) is 5 and the denominator (all actual cancer cases) is also 5, so the recall is 5/5 = 100%.
• Precision is about being precise, whereas recall is about capturing all the cases.
• Therefore, even if the model flags only one case as cancer positive and that case is correct, the model is 100% precise.
• If the model flags every case as cancer positive, it has 100% recall.
• If the focus is on minimizing false negatives, aim for 100% recall with a good precision score.
• If the focus is on minimizing false positives, aim for 100% precision.
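Continuing the same sketch: the all-positive model captures every actual cancer case, so it has 100% recall even though its precision is only 5%.

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1] * 5 + [0] * 95   # 5 actual cancer cases out of 100
y_pred = [1] * 100            # bad model: everyone is predicted as cancer positive

# Recall = TP / (TP + FN) = 5 / (5 + 0)
print(recall_score(y_true, y_pred))     # 1.0  -> 100% recall
print(precision_score(y_true, y_pred))  # 0.05 -> 5% precision
```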
Specificity
• Specificity measures the proportion of actual negatives that are correctly identified.
• Specificity tries to identify the probability of a negative prediction when the input is a negative example.
Specificity: Calculation

[Confusion matrix as above: TP, FP / FN, TN]

Specificity = TN / (TN + FP)
Specificity: Example
• Consider the previously discussed cancer detection example where out of 100 people, only 5 people have cancer.
• Specificity identifies the proportion of non-cancerous patients who are correctly predicted as not having cancer.

Specificity = TN / (TN + FP)

• For the bad model that predicts every case as cancer positive, the numerator (a person does not have cancer and the model predicts the case as non-cancerous) is 0.
• Since every case is predicted as cancerous, the specificity of this model is 0%.
Note:
Specificity is the exact opposite of recall: recall measures how many actual positives are identified correctly, while specificity measures how many actual negatives are identified correctly.
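scikit-learn has no dedicated specificity function, so this sketch derives it from the confusion matrix; for the all-positive model used above, TN = 0 and the specificity is 0%.

```python
from sklearn.metrics import confusion_matrix

y_true = [1] * 5 + [0] * 95   # 5 actual cancer cases out of 100
y_pred = [1] * 100            # bad model: everyone is predicted as cancer positive

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
specificity = tn / (tn + fp)  # Specificity = TN / (TN + FP)
print(specificity)            # 0.0 -> the model never identifies a negative correctly
```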
F1 Score
• Consider 100 credit card transactions, out of which 97 are legitimate and 3 are fraudulent.
• Consider a model that predicts every transaction as fraud: it catches all 3 fraud cases but also flags the 97 legitimate ones.

Precision = 3 / 100 = 3%
Recall = 3 / 3 = 100%
Arithmetic mean = (3% + 100%) / 2 = 51.5%
Note:
A model that predicts every transaction as fraud should not be given a moderate score.
Instead of arithmetic mean, harmonic mean can be used.
Harmonic Mean
• With reference to the fraud detection example, the F1 score can be calculated as:

F1 Score = (2 × Precision × Recall) / (Precision + Recall)
         = (2 × 3% × 100%) / (3% + 100%) ≈ 5.8%
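A sketch of the fraud example with synthetic labels: flagging all 100 transactions as fraud yields a flattering arithmetic mean but a very low F1 score.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1] * 3 + [0] * 97   # 3 fraudulent transactions out of 100
y_pred = [1] * 100            # model flags every transaction as fraud

p = precision_score(y_true, y_pred)   # 3 / 100 = 0.03
r = recall_score(y_true, y_pred)      # 3 / 3   = 1.0

print((p + r) / 2)                    # 0.515  -> misleadingly moderate
print(f1_score(y_true, y_pred))       # ~0.058 -> the harmonic mean exposes the bad model
print(2 * p * r / (p + r))            # same value, computed by hand
```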
Key Takeaways
The confusion matrix is used for finding the correctness and accuracy of the model.
Accuracy is the number of correct predictions made by the model over all kinds of predictions made.
Precision tries to answer the question, “What proportion of positive identifications was actually correct?”
Recall measures the proportion of actual positives that are identified correctly.
Specificity measures the proportion of actual negatives that are identified correctly.
F1 score gives a single score that represents both precision (P) and recall (R).
The harmonic mean is used when the sample data contains extreme values (too big or too small) because it is more balanced than the arithmetic mean.
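As a wrap-up sketch, scikit-learn's classification_report prints precision, recall, and F1 for each class in a single call; the labels below are the synthetic cancer example used earlier.

```python
from sklearn.metrics import classification_report

y_true = [1] * 5 + [0] * 95   # 5 positives out of 100
y_pred = [1] * 100            # the all-positive model from the earlier examples

# zero_division=0 avoids a warning for the class that is never predicted
print(classification_report(y_true, y_pred,
                            target_names=["negative", "positive"],
                            zero_division=0))
```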
Quiz
QUIZ
1. What is precision?
b. Precision is also known as the positive predictive value. It is a measure of the number of accurate positives the model claims compared with the number of positives it actually claims.
c. Precision is also known as the positive predictive value. Recall is the number of correct predictions made by the model over all kinds of predictions made.
d. Precision is the number of correct predictions made by the model over all kinds of predictions made.

Answer: b. Precision is also known as the positive predictive value, and it is a measure of the number of accurate positives the model claims compared with the number of positives it actually claims.
QUIZ
2. Which is more important: model accuracy or model performance?
c. There are models with higher accuracy that can perform worse in predictive power. This is because model accuracy is only a subset of model performance.
d. The confusion matrix is the key to model performance and is determined by accuracy: the higher the accuracy, the better the model.

Answer: c. There are models with higher accuracy that can perform worse in predictive power, because model accuracy is only a subset of model performance.
This concludes “Performance Metrics.”