Confusion matrix
In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an
error matrix,[8] is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised
learning one (in unsupervised learning it is usually called a matching matrix). Each row of the matrix represents the instances in
a predicted class while each column represents the instances in an actual class (or vice versa).[3] The name stems from the fact
that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).
It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both
dimensions (each combination of dimension and class is a variable in the contingency table).
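As a rough illustration (the article itself gives no code), a confusion matrix can be built by simply counting (actual, predicted) label pairs; the function and variable names below are illustrative only, and rows are taken to represent actual classes and columns predicted classes, one of the two conventions mentioned above:

    from collections import Counter

    def confusion_matrix(actual, predicted, classes):
        # Count how often each (actual, predicted) pair of labels occurs.
        counts = Counter(zip(actual, predicted))
        # Row i corresponds to actual class classes[i],
        # column j to predicted class classes[j].
        return [[counts[(a, p)] for p in classes] for a in classes]

    actual = ["cat", "cat", "dog", "cat", "dog"]
    predicted = ["cat", "dog", "dog", "cat", "cat"]
    print(confusion_matrix(actual, predicted, ["cat", "dog"]))
    # [[2, 1], [1, 1]]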
Example
If a classification system has been trained to distinguish between cats and dogs, a confusion matrix will summarize the results of
testing the algorithm for further inspection. Assuming a sample of 13 animals — 8 cats and 5 dogs — the resulting confusion
matrix could look like the table below:
                      Predicted class
                      Cat      Dog
    Actual    Cat      5        3
    class     Dog      2        3
In this confusion matrix, of the 8 actual cats, the system predicted that 3 were dogs, and of the 5 actual dogs, it predicted that 2 were cats. All correct predictions are located on the diagonal of the table, so it is easy to visually inspect the table for prediction errors, as they are represented by values outside the diagonal.
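The same counts can be reproduced with a library routine; as a sketch, scikit-learn's confusion_matrix (not mentioned in the article, just one common tool) returns rows for actual classes and columns for predicted classes when given the class labels in order:

    from sklearn.metrics import confusion_matrix

    # 8 actual cats (5 predicted as cats, 3 as dogs) and
    # 5 actual dogs (2 predicted as cats, 3 as dogs), as in the table above.
    y_true = ["cat"] * 8 + ["dog"] * 5
    y_pred = ["cat"] * 5 + ["dog"] * 3 + ["cat"] * 2 + ["dog"] * 3

    print(confusion_matrix(y_true, y_pred, labels=["cat", "dog"]))
    # [[5 3]
    #  [2 3]]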
In abstract terms, the confusion matrix is as follows:

                      Predicted class
                      P      N
    Actual    P       TP     FN
    class     N       FP     TN

where: P = positive; N = negative; TP = true positive; FP = false positive; TN = true negative; FN = false negative.
Table of confusion
In predictive analytics, a table of confusion (sometimes also called a confusion matrix) is a table with two rows and two columns that reports the number of false positives, false negatives, true positives, and true negatives. This allows more detailed analysis than the mere proportion of correct classifications (accuracy). Accuracy yields misleading results if the data set is unbalanced, that is, when the numbers of observations in different classes vary greatly. For example, if there were 95 cats and only 5 dogs in the data, a particular classifier might classify all the observations as cats. The overall accuracy would be 95%, but in more detail the classifier would have a 100% recognition rate (sensitivity) for the cat class and a 0% recognition rate for the dog class. The F1 score is even more unreliable in such cases, and here would yield over 97.4%, whereas informedness removes such bias and yields 0 as the probability of an informed decision for any form of guessing (here always guessing cat).
According to Davide Chicco and Giuseppe Jurman, the most informative metric to evaluate a confusion matrix is the Matthews
correlation coefficient (MCC)[6].
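As a small worked sketch (not part of the article), the figures quoted above follow directly from the four counts when cats are treated as the positive class; the variable names are illustrative, and how an undefined MCC is reported differs between tools:

    import math

    # Unbalanced example above: 95 cats, 5 dogs, classifier always predicts "cat".
    # "cat" is treated as the positive class.
    TP, FN, FP, TN = 95, 0, 5, 0

    accuracy     = (TP + TN) / (TP + TN + FP + FN)       # 0.95
    sensitivity  = TP / (TP + FN)                        # 1.0  (recognition rate for cats)
    specificity  = TN / (TN + FP)                        # 0.0  (recognition rate for dogs)
    precision    = TP / (TP + FP)                        # 0.95
    f1           = 2 * precision * sensitivity / (precision + sensitivity)  # ~0.974
    informedness = sensitivity + specificity - 1         # 0.0

    # Matthews correlation coefficient; its denominator is 0 here because the
    # classifier never predicts "dog", so MCC is undefined (often reported as 0).
    denom = math.sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
    mcc = (TP * TN - FP * FN) / denom if denom else 0.0

    print(accuracy, sensitivity, specificity, precision, round(f1, 3), informedness, mcc)
    # 0.95 1.0 0.0 0.95 0.974 0.0 0.0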
Assuming the confusion matrix above, its corresponding table of confusion, for the cat class, would be:
                      Predicted class
                      Cat                    Non-cat
    Actual    Cat     5 True Positives       3 False Negatives
    class     Non-cat 2 False Positives      3 True Negatives

The final table of confusion would contain the average values for all classes combined (one way to derive and average these per-class counts is sketched after the terminology list below).

Let us define an experiment from P positive instances and N negative instances for some condition. The four outcomes can be formulated in a 2×2 confusion matrix, as follows:

                              True condition
                              Positive (P)            Negative (N)
    Predicted    Positive     True positive (TP)      False positive (FP)
    condition    Negative     False negative (FN)     True negative (TN)

Terminology and derivations from a confusion matrix

condition positive (P)
    the number of real positive cases in the data
condition negative (N)
    the number of real negative cases in the data
true positive (TP)
    eqv. with hit
true negative (TN)
    eqv. with correct rejection
false positive (FP)
    eqv. with false alarm, Type I error
false negative (FN)
    eqv. with miss, Type II error
accuracy (ACC)
    ACC = (TP + TN) / (P + N)
F1 score
    the harmonic mean of precision and sensitivity: F1 = 2TP / (2TP + FP + FN)
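The article does not spell out how the per-class counts or their "average values for all classes combined" are obtained; as one plausible reading (all names below are illustrative), the sketch reduces the cat-and-dog matrix to a one-vs-rest table of confusion per class and then macro-averages the counts:

    # The cat-and-dog example; rows are actual classes, columns are predicted classes.
    matrix = [[5, 3],   # actual cat: 5 predicted cat, 3 predicted dog
              [2, 3]]   # actual dog: 2 predicted cat, 3 predicted dog
    classes = ["cat", "dog"]

    def one_vs_rest(matrix, k):
        # Return (TP, FN, FP, TN) treating class index k as the positive class.
        total = sum(sum(row) for row in matrix)
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                 # actual k, predicted something else
        fp = sum(row[k] for row in matrix) - tp  # predicted k, actually something else
        tn = total - tp - fn - fp
        return tp, fn, fp, tn

    tables = {c: one_vs_rest(matrix, i) for i, c in enumerate(classes)}
    print(tables)   # {'cat': (5, 3, 2, 3), 'dog': (3, 2, 3, 5)}

    # "Average values for all classes combined": a simple macro-average of the counts.
    avg = tuple(sum(t[i] for t in tables.values()) / len(tables) for i in range(4))
    print(avg)      # (4.0, 2.5, 2.5, 4.0)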
References
1. Balayla, Jacques (2020). "Prevalence Threshold and the Geometry of Screening Curves". arXiv preprint arXiv:2006.00398. https://arxiv.org/abs/2006.00398
2. Fawcett, Tom (2006). "An Introduction to ROC Analysis" (http://people.inf.elte.hu/kiss/11dwhdm/roc.pdf). Pattern Recognition Letters. 27 (8): 861–874. doi:10.1016/j.patrec.2005.10.010.
3. Powers, David M W (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation" (https://www.researchgate.net/publication/228529307_Evaluation_From_Precision_Recall_and_F-Factor_to_ROC_Informedness_Markedness_Correlation). Journal of Machine Learning Technologies. 2 (1): 37–63.
4. Ting, Kai Ming (2011). Encyclopedia of Machine Learning (https://link.springer.com/referencework/10.1007%2F978-0-387-30164-8). Springer. ISBN 978-0-387-30164-8.
5. Brooks, Harold; Brown, Barb; Ebert, Beth; Ferro, Chris; Jolliffe, Ian; Koh, Tieh-Yong; Roebber, Paul; Stephenson, David (2015-01-26). "WWRP/WGNE Joint Working Group on Forecast Verification Research" (https://www.cawcr.gov.au/projects/verification/). Collaboration for Australian Weather and Climate Research. World Meteorological Organisation. Retrieved 2019-07-17.
6. Chicco D, Jurman G (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6941312). BMC Genomics. 21 (6). doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477.
7. Tharwat A (August 2018). "Classification assessment methods". Applied Computing and Informatics. doi:10.1016/j.aci.2018.08.003.
8. Stehman, Stephen V. (1997). "Selecting and interpreting measures of thematic classification accuracy". Remote Sensing of Environment. 62 (1): 77–89. Bibcode:1997RSEnv..62...77S. doi:10.1016/S0034-4257(97)00083-7.