Confusion Matrix
Introduction
❖ Supervised Machine Learning deals with two types of problems: Regression problems and Classification problems.
❖ Regression techniques or models are used when the dependent variable is continuous in nature, whereas Classification techniques are used when the dependent variable is categorical.
❖ When a Machine Learning model is built, various evaluation metrics are used to check its quality or performance.
❖ For classification models, metrics such as Accuracy, Confusion Matrix, Classification report (i.e. Precision, Recall, F1 score), and the AUC-ROC curve are used.
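• As a rough sketch of where these metrics come from in practice (assuming scikit-learn is available; the label and score lists below are made-up placeholders, not data from this lecture):

    from sklearn.metrics import accuracy_score, classification_report, roc_auc_score

    # Placeholder actual labels, predicted labels, and predicted probabilities
    y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred  = [1, 0, 0, 1, 0, 1, 1, 0]
    y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.6, 0.3]

    print(accuracy_score(y_true, y_pred))            # overall accuracy
    print(classification_report(y_true, y_pred))     # precision, recall, F1 per class
    print(roc_auc_score(y_true, y_score))            # AUC-ROC needs scores, not hard labels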
What is a Confusion Matrix?
❖ A Confusion Matrix is a visual representation of the Actual vs. Predicted values.
❖ It summarizes the performance of our Machine Learning classification model in a table-like structure.
❖A Confusion matrix is an N x N matrix used for
evaluating the performance of a classification
model, where N is the number of target classes.
❖The matrix compares the actual target values
with those predicted by the machine learning
model.
• For a binary classification problem, we would have a 2 x 2 matrix with 4 values: True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN), as sketched below.
✓ This 2-dimensional array is the Confusion Matrix, which evaluates the performance of a classification model on a set of predicted values for which the true (or actual) values are known.
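• A minimal sketch of how such a matrix can be obtained in code (assuming scikit-learn; the two label lists are placeholders):

    from sklearn.metrics import confusion_matrix

    y_actual    = [1, 0, 1, 1, 0, 1, 0, 0]   # placeholder true labels
    y_predicted = [1, 0, 0, 1, 0, 1, 1, 0]   # placeholder model predictions

    # For binary 0/1 labels, scikit-learn lays the 2 x 2 result out as:
    # [[TN, FP],
    #  [FN, TP]]
    cm = confusion_matrix(y_actual, y_predicted)
    tn, fp, fn, tp = cm.ravel()
    print(cm)
    print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")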
How does the Confusion Matrix evaluate
the model’s performance?
• Now, all we need to see is how this matrix is built from an actual dataset.
Feature 1 | Feature 2 | … | Target (has cancer: 1, doesn't have cancer: 0) | Prediction (what our model predicted)
…         | …         | … | 1 | 1
…         | …         | … | 0 | 1
…         | …         | … | 1 | 1
…         | …         | … | 1 | 0
…         | …         | … | 0 | 0
…         | …         | … | 0 | 0
…         | …         | … | 1 | 1
…         | …         | … | 1 | 1
…         | …         | … | 0 | 0
❖ In the above classification dataset, Feature 1, Feature 2, and up to Feature n are the independent variables.
❖ For the Target (dependent variable), we have assigned 1 to the positive value (i.e. has cancer) and 0 to the negative value (i.e. doesn't have cancer).
❖ After we have trained our model and obtained its predictions, we want to evaluate its performance; this is how the confusion matrix would look.
Confusion matrix
✓ TP = 4: there are four cases in the dataset where the model predicted 1 and the target was also 1.
✓ FP = 1: there is only one case in the dataset where the model predicted 1 but the target was 0.
✓ FN = 1: there is only one case in the dataset where the model predicted 0 but the target was 1.
✓ TN = 3: there are three cases in the dataset where the model predicted 0 and the target was also 0.
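• As a quick check, the same four counts can be recovered from the Target and Prediction columns of the dataset above (a minimal plain-Python sketch; the variable names are my own):

    # Target and Prediction columns from the cancer example above
    y_true = [1, 0, 1, 1, 0, 0, 1, 1, 0]   # actual values (1 = has cancer)
    y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0]   # model predictions

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    print(tp, fp, fn, tn)   # 4 1 1 3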
Elements of Confusion Matrix
✓ It represents the different combinations of Actual vs. Predicted values.
1. True Positive (TP)
• The predicted value matches the actual value: the actual value was positive and the model predicted a positive value.
2. True Negative (TN)
• The predicted value matches the actual value: the actual value was negative and the model predicted a negative value.
3. False Positive (FP)
• The predicted value does not match the actual value: the actual value was negative but the model predicted a positive value.
4. False Negative (FN)
• The predicted value does not match the actual value: the actual value was positive but the model predicted a negative value.
From these four values, the following performance metrics are computed:
1. Accuracy
2. Precision (Positive Predictive Value)
3. Recall (True Positive Rate or Sensitivity)
4. F-beta Score
1. ACCURACY:
Accuracy is the number of correctly (True) predicted results out of the total:
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (4 + 3) / 9 ≈ 0.78
• Accuracy should be considered when TP and TN are the most important outcomes and the dataset is balanced, because in that case the model will not get biased by the class distribution.
• But in real-life classification problems, an imbalanced class distribution often exists.
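• A small sketch of the accuracy calculation above in code (the helper function is my own naming; the counts are the ones from the cancer example):

    def accuracy(tp, tn, fp, fn):
        # Fraction of all predictions that were correct
        return (tp + tn) / (tp + tn + fp + fn)

    print(accuracy(tp=4, tn=3, fp=1, fn=1))   # 0.777... ≈ 0.78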
Why Do We Need a Confusion Matrix?
• Let's say you want to predict how many people are infected with a contagious virus before they show symptoms, so that you can isolate them from the healthy population.
• The two values for our target variable would be: Sick and Not Sick.
Our dataset is an example of an imbalanced dataset:
• There are 947 data points for the negative class and 3 data points for the positive class.
This is how we'll calculate the accuracy:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Let's see how our model performed:
• The total outcome values are:
• TP = 30, TN = 930, FP = 30, FN = 10
• So, the accuracy for our model turns out to be:
• Accuracy = (30 + 930) / (30 + 930 + 30 + 10) = 960 / 1000 = 0.96
• 96%! Not bad!
• But it gives the wrong idea about the result. Think about it.
• Our model seems to be saying, “I can predict sick people 96% of the time.”
• However, it is doing the opposite.
• It is predicting the people who will not get sick with 96% accuracy, while the sick are spreading the virus!
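• A rough sketch, using only the counts given above (the variable names are my own), of what the single 96% figure hides: the model recognizes the healthy far better than the sick:

    # Counts from the virus example above
    tp, tn, fp, fn = 30, 930, 30, 10

    overall_accuracy = (tp + tn) / (tp + tn + fp + fn)   # 960/1000 = 0.96
    healthy_caught   = tn / (tn + fp)                    # 930/960 ≈ 0.97
    sick_caught      = tp / (tp + fn)                    # 30/40 = 0.75, i.e. 10 sick people missed

    print(overall_accuracy, healthy_caught, sick_caught)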
Precision vs. Recall
• Here is how to calculate Precision: