
Accuracy, Precision, Recall & F1 Score: Interpretation of Performance Measures

A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known. So, let's talk about its four parameters first.

True positives and true negatives are the observations that are correctly predicted (typically shown in green in a color-coded confusion matrix). We want to minimize false positives and false negatives, so they are typically shown in red. These terms are a bit confusing, so let's take each term one by one and understand it fully.

True Positives (TP) - These are the correctly predicted positive values, which means that the value of the actual class is yes and the value of the predicted class is also yes. E.g., the actual class indicates that this passenger survived and the predicted class tells you the same thing.

True Negatives (TN) - These are the correctly predicted negative values, which means that the value of the actual class is no and the value of the predicted class is also no. E.g., the actual class says this passenger did not survive and the predicted class tells you the same thing.

False positives and false negatives occur when the actual class contradicts the predicted class.

False Positives (FP) - When the actual class is no but the predicted class is yes. E.g., the actual class says this passenger did not survive, but the predicted class tells you that this passenger will survive.

False Negatives (FN) - When the actual class is yes but the predicted class is no. E.g., the actual class indicates that this passenger survived, but the predicted class tells you that the passenger will die.
Once you understand these four parameters, we can calculate Accuracy, Precision, Recall and F1 score.
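As a minimal sketch of how these four counts can be tallied in practice, the snippet below counts them from parallel lists of actual and predicted labels. The "yes"/"no" labels, the function name, and the sample data are illustrative assumptions, not taken from the original model.

```python
# Minimal sketch: tally TP, TN, FP, FN from parallel lists of actual and
# predicted class labels. Labels and sample data are illustrative only.
def confusion_counts(actual, predicted, positive="yes"):
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1      # actual yes, predicted yes
        elif a != positive and p != positive:
            tn += 1      # actual no, predicted no
        elif a != positive and p == positive:
            fp += 1      # actual no, predicted yes
        else:
            fn += 1      # actual yes, predicted no
    return tp, tn, fp, fn

actual    = ["yes", "no", "yes", "no", "yes"]
predicted = ["yes", "no", "no",  "no", "yes"]
print(confusion_counts(actual, predicted))  # (2, 2, 0, 1)
```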

Accuracy - Accuracy is the most intuitive performance measure; it is simply the ratio of correctly predicted observations to the total observations. One may think that if we have high accuracy then our model is the best. Yes, accuracy is a great measure, but only when you have symmetric datasets where the numbers of false positives and false negatives are almost the same. Otherwise, you have to look at other parameters to evaluate the performance of your model. For our model, we have got 0.803, which means our model is approximately 80% accurate.

Accuracy = (TP + TN) / (TP + FP + FN + TN)
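As a small illustrative sketch, accuracy can be computed directly from the four counts. The counts below are made up for the example; they are not the Titanic model's actual numbers.

```python
# Accuracy from the four confusion-matrix counts; counts here are made up.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=50, tn=35, fp=10, fn=5))  # 0.85 for these illustrative counts
```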

Precision - Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. The question this metric answers is: of all passengers that were labeled as survived, how many actually survived? High precision relates to a low false positive rate. We have got 0.788 precision, which is pretty good.

Precision = TP / (TP + FP)

What do you notice about the denominator? The denominator is actually the Total Predicted Positive, so the formula can also be read as Precision = TP / Total Predicted Positive. Immediately, you can see that precision tells you how precise your model is: of those predicted positive, how many are actually positive.

Precision is a good measure to use when the cost of a False Positive is high. For instance, consider email spam detection. A false positive means that a non-spam email (actual negative) has been identified as spam (predicted positive). The email user might lose important emails if the precision of the spam detection model is not high.
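A minimal sketch of the precision calculation, with an added guard for the edge case where nothing is predicted positive (the guard and the example counts are my own assumptions):

```python
# Precision: of everything flagged positive, what fraction really is positive?
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

# In the spam example: tp = spam correctly flagged, fp = good mail wrongly flagged.
print(precision(tp=90, fp=10))  # 0.9 for these illustrative counts
```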

Recall (Sensitivity) - Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class. The question recall answers is: of all the passengers that truly survived, how many did we correctly label as survived? We have got a recall of 0.631, which is good for this model as it is above 0.5.

Recall = TP / (TP + FN)
There you go! So, recall calculates how many of the actual positives our model captures by labeling them as positive (True Positives). Applying the same understanding, recall should be the metric we use to select our best model when there is a high cost associated with a False Negative.

For instance, consider fraud detection or sick patient detection. If a fraudulent transaction (actual positive) is predicted as non-fraudulent (predicted negative), the consequence can be very bad for the bank.

Similarly, in sick patient detection, if a sick patient (actual positive) goes through the test and is predicted as not sick (predicted negative), the cost associated with that False Negative will be extremely high if the sickness is contagious.
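A matching sketch for recall, again with an assumed zero-division guard and illustrative counts of my own:

```python
# Recall: of all actual positives, what fraction did the model catch?
def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

# In the fraud example: tp = frauds caught, fn = frauds missed.
print(recall(tp=80, fn=20))  # 0.8 for these illustrative counts
```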

F1 score - The F1 Score is the harmonic mean of Precision and Recall. Therefore, this score takes both false positives and false negatives into account. Intuitively it is not as easy to understand as accuracy, but F1 is usually more useful than accuracy, especially if you have an uneven class distribution. Accuracy works best if false positives and false negatives have similar costs. If the cost of false positives and false negatives is very different, it is better to look at both Precision and Recall. In our case, the F1 score is 0.701.

F1 Score = 2*(Recall * Precision) / (Recall + Precision)
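As a quick sanity check of this formula against the precision (0.788) and recall (0.631) quoted above, here is a small sketch (the function name and the guard are my own):

```python
# F1 is the harmonic mean of precision and recall.
def f1(p, r):
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

print(round(f1(0.788, 0.631), 3))  # 0.701, matching the value quoted above
```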

The F1 Score is needed when you want to seek a balance between Precision and Recall. Right, so what is the difference between F1 Score and Accuracy then? We have previously seen that accuracy can be inflated by a large number of True Negatives, which in most business circumstances we do not focus on much, whereas False Negatives and False Positives usually have business costs (tangible and intangible). Thus, the F1 Score may be a better measure to use if we need to seek a balance between Precision and Recall and there is an uneven class distribution (a large number of Actual Negatives).

So, whenever you build a model, this should help you figure out what these parameters mean and how well your model has performed. As a worked example, consider the following confusion matrix:
              Predicted: Yes   Predicted: No
Actual: Yes   78 (TP)          89 (FN)
Actual: No    55 (FP)          66 (TN)

Accuracy = (TP + TN) / (TP + FP + TN + FN)

= (78 + 66) / (78 + 66 + 55 + 89)

= 144 / 288

= 0.50 = 50% Accuracy (accuracy always lies between 0 and 1)
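A short sketch that reproduces this calculation from the matrix above (the variable names are mine):

```python
# Worked example: counts taken from the confusion matrix above.
tp, fn, fp, tn = 78, 89, 55, 66
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.5
```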
