Confusion Matrix
The confusion matrix is an N×N matrix used to evaluate the performance of a classification
model, where N is the number of target classes. The matrix compares the actual target values
with those predicted by the machine learning model.
o For a classifier with 2 prediction classes, the matrix is a 2×2 table; for 3 classes, it
is a 3×3 table; and so on.
o The matrix has two dimensions, predicted values and actual values, along with the
total number of predictions.
o Predicted values are the values predicted by the model, and actual values are the
true values for the given observations.
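o It looks like the table below (rows are actual values, columns are predicted values):

    n = total predictions | Predicted: No  | Predicted: Yes
    Actual: No            | True Negative  | False Positive
    Actual: Yes           | False Negative | True Positive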
o True Negative: The model predicted No, and the actual value was also No.
o True Positive: The model predicted Yes, and the actual value was also Yes.
o False Negative: The model predicted No, but the actual value was Yes. It is also
called a Type-II error.
o False Positive: The model predicted Yes, but the actual value was No. It is also
called a Type-I error.
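The four cells above can be counted directly from paired lists of actual and predicted labels. The sketch below is a minimal illustration, not a library API: the function name `confusion_matrix_counts` and the toy label lists are hypothetical. It builds the general N×N matrix (rows = actual class, columns = predicted class) and, for the two-class case with labels ordered ["No", "Yes"], unpacks it into TN, FP, FN and TP.

```python
def confusion_matrix_counts(actual, predicted, labels):
    """Return an N x N list of lists: rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Hypothetical toy data for a two-class ("No"/"Yes") classifier
actual    = ["No", "No", "Yes", "Yes", "No", "Yes"]
predicted = ["No", "Yes", "Yes", "No", "No", "Yes"]

m = confusion_matrix_counts(actual, predicted, labels=["No", "Yes"])
# With labels ordered ["No", "Yes"], the 2x2 layout is [[TN, FP], [FN, TP]]
(tn, fp), (fn, tp) = m
print(m)               # [[2, 1], [1, 2]]
print(tn, fp, fn, tp)  # 2 1 1 2
```

The same function produces a 3×3 matrix when given three labels; only the `labels` list changes.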
Suppose we want to build a model that predicts whether or not a person has a particular
disease. The confusion matrix for this model is:

    n = 100     | Predicted: No | Predicted: Yes
    Actual: No  | TN = 65       | FP = 8
    Actual: Yes | FN = 3        | TP = 24
o The table is for a two-class classifier with two predictions, "Yes" and "No". Here,
"Yes" means that the patient has the disease, and "No" means that the patient does not
have the disease.
o The classifier made a total of 100 predictions: 89 were correct and 11 were
incorrect.
o The model predicted "Yes" 32 times and "No" 68 times, whereas the actual value was
"Yes" 27 times and "No" 73 times.
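The totals in this example are enough to recover all four cells of the 2×2 matrix. Since (TP + FP) − (TP + FN) = FP − FN, the difference between predicted "Yes" and actual "Yes" gives FP − FN, and the number of incorrect predictions gives FP + FN. The short worked calculation below solves those two equations and checks the result against every stated total; the variable names are only for illustration.

```python
# Totals stated for the disease example
total = 100
correct = 89
predicted_yes, predicted_no = 32, 68
actual_yes, actual_no = 27, 73

incorrect = total - correct           # FP + FN = 11
diff = predicted_yes - actual_yes     # (TP + FP) - (TP + FN) = FP - FN = 5

fp = (incorrect + diff) // 2          # 8
fn = (incorrect - diff) // 2          # 3
tp = predicted_yes - fp               # 24
tn = predicted_no - fn                # 65

# Sanity checks against every total given in the example
assert tp + tn == correct
assert tp + fn == actual_yes and tn + fp == actual_no
print(f"TN={tn} FP={fp} FN={fn} TP={tp}, accuracy={correct / total}")
# TN=65 FP=8 FN=3 TP=24, accuracy=0.89
```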
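Recall: Out of all the actual positive cases, recall is the fraction that the model
correctly predicted as positive. Recall should be as high as possible. It can be calculated
using the formula below:

$$\text{Recall} = \frac{TP}{TP + FN}$$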
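As a small check of the recall formula, the sketch below applies it to the disease example, assuming the counts TP = 24 and FN = 3 that follow from the example's totals (24 of the 27 actual "Yes" cases were predicted "Yes"). Returning 0.0 when there are no actual positives is a choice made for this sketch to avoid division by zero, not part of the definition.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model labelled positive."""
    # Guard against division by zero when there are no actual positives
    return tp / (tp + fn) if tp + fn else 0.0

r = recall(tp=24, fn=3)
print(round(r, 4))  # 0.8889
```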
Precision: Out of all the cases that the model predicted as positive, precision is the
fraction that were actually positive. It can be calculated using the formula below:

$$\text{Precision} = \frac{TP}{TP + FP}$$
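The precision formula can be checked the same way. This sketch assumes the disease example's counts TP = 24 and FP = 8 (the model predicted "Yes" 32 times, and 24 of those were correct); as in any hand-rolled version, the 0.0 fallback for "no positive predictions" is a choice of this sketch.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are actually positive."""
    # Guard against division by zero when the model never predicts positive
    return tp / (tp + fp) if tp + fp else 0.0

p = precision(tp=24, fp=8)
print(p)  # 0.75
```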
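F-measure: When one model has high recall but low precision and another has the reverse,
it is difficult to compare the two. For this purpose, we can use the F-score, which
evaluates recall and precision at the same time by taking their harmonic mean. Because the
harmonic mean is pulled toward the smaller value, for a given average of precision and
recall the F-score is highest when recall equals precision. It can be calculated using the
formula below:

$$F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$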
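The sketch below computes the F-score for the disease example, assuming TP = 24, FP = 8 and FN = 3 as derived from the example's totals, and then illustrates the point about balance: two precision/recall pairs with the same average (0.6) give different F-scores, and the balanced pair scores higher. The function name `f_score` is illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

tp, fp, fn = 24, 8, 3
p = tp / (tp + fp)   # precision = 0.75
r = tp / (tp + fn)   # recall ~ 0.8889
f = f_score(p, r)
print(round(f, 4))   # 0.8136

# Same average of precision and recall (0.6), different balance
print(round(f_score(0.6, 0.6), 4), round(f_score(0.9, 0.3), 4))  # 0.6 0.45
```

Equivalently, F1 = 2TP / (2TP + FP + FN) = 48 / 59 for these counts.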
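import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# The file has no header row, so name the four feature columns and the class column
names = ["sepal-length", "sepal-width", "petal-length", "petal-width", "Class"]
dataset = pd.read_csv(url, names=names)
#dataset.head()
X = dataset.iloc[:, :-1].values  # features: every column except the last
y = dataset.iloc[:, 4].values    # target: the class label
# Hold out 20% of the rows as test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))       # rows = actual, columns = predicted
print(classification_report(y_test, y_pred))  # precision, recall, F1 per class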