Lecture 3: Precision vs Recall in Machine Learning

The document discusses the importance of understanding precision and recall in machine learning, emphasizing that accuracy alone can be misleading, especially in imbalanced datasets. It explains how precision measures the relevance of detected items while recall measures the detection of relevant elements, highlighting the trade-offs between the two based on specific use cases. The document also introduces the F-measure as a way to combine precision and recall for better model evaluation.


Precision vs Recall in Machine Learning

Lecture: 03
Agenda
 Introduction: Precision vs Recall in Machine Learning
 Why you shouldn't blindly use your most accurate ML model
 Measuring relevance: Dealing with high-priority classes
 Combining precision and recall: The F-measure
 "To minimize the mistakes your AI will make, you should use the most
accurate Machine Learning model." Sounds straightforward, right?
However, making the least mistakes should not always be your goal
Introduction Precision since different types of mistakes can have varying impacts. ML models
vs Recall in Machine will make mistakes and it is, therefore, crucial to decide which mistakes
you can better live with.
Learning
 To choose the right ML model and make informed decisions based on
its predictions, it is important to understand different measures of
relevance.
Precision
 The precision of a model describes how many detected items are truly relevant. It is calculated by dividing the true positives by all predicted positives (true positives plus false positives).

Recall
 Recall is a measure of how many relevant elements were detected. It divides the true positives by the total number of relevant elements (true positives plus false negatives).
Why you shouldn't blindly use your most accurate ML model
 First, let's start by defining accuracy: the accuracy of an ML model describes how many data points were detected correctly.
 To use a practical example, let's look at an image classification problem in which the AI is tasked to label an image dataset containing images of 500 cats and 500 dogs.
 The model correctly labels 500 dogs and 499 cats; mistakenly, it labels one cat as "dog".
 The corresponding accuracy is therefore 99.9%. Here, accuracy is a good assessment of model quality.
 For comparison, let's look at a second, less balanced example: a hospital looks for cancer in 1,000 images.
 In reality, two of those images contain evidence of cancer, but the model only detects one of them.
 Since the model makes only one mistake, labeling one cancerous image as "healthy", its accuracy is also 99.9%.
 In this case, the 99.9% accuracy gives a wrong impression, as the model actually missed 50% of the relevant items. Without a doubt, it would be preferable to reduce the accuracy to 99% and mistakenly detect 8 healthy images as "cancerous" if, in return, the second cancerous image could be detected: the trade-off of manually checking 10 images to discover two relevant elements is hardly worth discussing.
 If your dataset is not well balanced, or if mistakes have a varying impact, your model's accuracy is not a good measure of performance.
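The arithmetic behind the two examples is simple to verify; the counts (999 of 1,000 predictions correct in both datasets, but only 1 of 2 cancerous images found) come from the slides above:

```python
def accuracy(correct, total):
    # fraction of all predictions that are correct
    return correct / total

# balanced cat/dog dataset: 999 of 1,000 images labeled correctly
print(accuracy(999, 1000))  # 0.999

# imbalanced cancer dataset: also 999 of 1,000 correct overall,
# yet only 1 of the 2 cancerous images was detected
cancer_recall = 1 / 2
print(accuracy(999, 1000), cancer_recall)  # 0.999 0.5
```

Identical accuracy, wildly different usefulness: the balanced dataset's 99.9% is meaningful, while the imbalanced dataset's 99.9% hides a 50% miss rate on the class that matters.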
Measuring relevance: Dealing with high-priority classes
 Whenever you are looking for specific information, the main task is often to differentiate between the relevant data you are looking for and the irrelevant information that clouds your view. It is therefore more important to analyze model performance with respect to the relevant elements than with respect to the overall dataset.
 Back to our first example: if the objective is to detect dogs, all dogs are relevant elements, whereas cats are irrelevant elements.
 In this task, the AI can make two types of mistakes:
1. It can miss the detection of a dog (a false negative), or
2. It can wrongly identify a cat as a dog (a false positive).
 For a detailed description of the different mistakes, their possible implications, and how you can systematically control them, see the article on how to control AI-enabled workflow automation.
 Ideally, the AI should detect all dogs without a miss and make no mistake by labeling a cat as a dog. Hence, there are two main dimensions along which the correctness of Machine Learning models can be compared.
Precision
 As we outlined at the beginning, the precision of a model describes how many detected items are actually relevant.
 Our first example compares the number of dogs that were detected to the number of dogs and dressed-up cats that were all detected as dogs. Since missed detections of dogs are not considered in the calculation, precision can be increased by setting a higher threshold for when a dog should be detected as such.
 In the cat/dog example, the precision is 99.8%, since out of the 501 animals that were detected as dogs, only one was a cat. If we look at the cancer example, we get a perfect score of 100%, since the model detected no healthy image as cancerous.
Recall
 Recall is a measure of how many relevant elements were detected. Our cat/dog example compares the dogs that were detected to the overall number of dogs in the dataset (disguised or not). Hence, the recall of the model is a perfect 100%.
 In contrast, the cancer-detection model has terrible recall: since only one of the two examples of cancer was detected, the recall is 50%. While accuracy and precision suggested that the model is suitable for detecting cancer, calculating recall reveals its weakness.
 As with precision, analyzing recall alone can also give a wrong impression of model performance. A model labeling all animals in the dataset as "dog" would have a recall of 100%, since it would detect all dogs without a miss. The 500 wrongly labeled cats would have no impact on recall.
 For an individual element, the recall percentage gives the probability that a randomly selected relevant item from the dataset will be detected.
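The degenerate model described above, which labels every animal as "dog", can be checked directly: recall is perfect while precision collapses to 50%.

```python
# degenerate model: predict "dog" for every animal in the 1,000-image dataset
labels = ["dog"] * 500 + ["cat"] * 500
preds = ["dog"] * 1000

tp = sum(1 for t, p in zip(labels, preds) if t == "dog" and p == "dog")
fp = sum(1 for t, p in zip(labels, preds) if t == "cat" and p == "dog")
fn = sum(1 for t, p in zip(labels, preds) if t == "dog" and p == "cat")

print(tp / (tp + fn))  # recall    = 1.0 (no dog missed)
print(tp / (tp + fp))  # precision = 0.5 (500 cats wrongly labeled "dog")
```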
 Going back to the question of how to select the right model: there is a trade-off between trying to detect all relevant items and avoiding wrong detections. In the end, the decision depends on your use case.
 Put differently, you will need to consider these questions: How crucial is it that you detect every relevant element? Are you willing to manually sort out irrelevant elements in return for optimal recall?
 In the cancer-diagnosis example, false negatives should be avoided at all costs, since they can have lethal consequences. Here, recall is a better measure than precision.
 If you were to optimize recommendations on YouTube, false negatives are less important, since only a small subset of recommendations is shown anyway. Most importantly, false positives (bad recommendations) should be avoided. Hence, the model should be optimized for precision.
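The trade-off can be made concrete with a threshold sweep. The confidence scores below are made-up illustrative values, not taken from the lecture; the pattern they show is general: lowering the detection threshold raises recall at the cost of precision.

```python
# hypothetical model confidences for "cancerous"; truth 1 = truly cancerous
scores = [0.95, 0.40, 0.30, 0.20, 0.10, 0.05]
truth  = [1,    1,    0,    0,    0,    0]

for threshold in (0.5, 0.3, 0.1):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p and t for p, t in zip(preds, truth))
    fp = sum(p and not t for p, t in zip(preds, truth))
    fn = sum((not p) and t for p, t in zip(preds, truth))
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    print(threshold, round(prec, 2), round(rec, 2))
    # 0.5 1.0 0.5  -> strict threshold: precise but misses a cancer case
    # 0.3 0.67 1.0 -> looser: finds both cases, one false alarm
    # 0.1 0.4 1.0  -> loosest: both cases found, three false alarms
```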
Combining precision and recall: The F-measure
 There is also a way to combine the two: the F-measure (the F1 score) is the harmonic mean of precision and recall, so a model only scores well when both measures are high. Unless you are preparing for a statistics exam, however, precision and recall are usually the two measures you need to worry about in practice.
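As a quick sketch, the harmonic-mean formula F1 = 2PR / (P + R) penalizes imbalance between the two measures; the cancer model's perfect precision cannot compensate for its 50% recall:

```python
def f1(precision, recall):
    # harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# cancer model from the lecture: precision 1.0 but recall only 0.5
print(round(f1(1.0, 0.5), 3))  # 0.667
# a balanced model with both measures at 0.5 scores the same as its inputs
print(f1(0.5, 0.5))            # 0.5
```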
Supporting Material
 https://levity.ai/blog/precision-vs-recall
 https://www.educba.com/deep-learning/
 https://www.educba.com/deep-learning-toolbox/
 https://www.educba.com/deep-learning-software
 https://towardsdatascience.com
 https://www.edureka.com
 https://www.geeksforgeeks.org
Q/A Thanks
