
How to produce a confusion matrix and find the misclassification rate of the Naive Bayes Classifier in R?

Last Updated : 23 Jul, 2025

Producing a confusion matrix and calculating the misclassification rate of a Naive Bayes classifier in R takes only a few steps. In this guide, we'll use the built-in iris dataset to train the classifier, evaluate it on held-out test data, and interpret the results.

Understanding the Confusion Matrix

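A confusion matrix is a table that summarizes the performance of a classification model by counting its true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions. With more than two classes, it becomes a K × K table in which each cell counts how many observations of one actual class were predicted as a given class; the diagonal holds the correct predictions. In caret's output, rows correspond to predicted classes and columns to actual (reference) classes.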
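As a minimal illustration, assuming two made-up label vectors (`actual` and `predicted` are hypothetical toy data, not part of the iris example), base R's `table()` cross-tabulates predictions against actual labels, and the four counts can then be read off the cells, treating "yes" as the positive class:

```r
# Hypothetical toy labels for a binary problem with positive class "yes"
actual    <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "yes"),
                    levels = c("yes", "no"))
predicted <- factor(c("yes", "no",  "yes", "no", "yes", "no", "no", "yes"),
                    levels = c("yes", "no"))

# Rows = predicted class, columns = actual class (the layout caret prints)
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Read the four counts from the cells, treating "yes" as the positive class
TP <- cm["yes", "yes"]  # predicted yes, actually yes
FP <- cm["yes", "no"]   # predicted yes, actually no
FN <- cm["no",  "yes"]  # predicted no,  actually yes
TN <- cm["no",  "no"]   # predicted no,  actually no
cat("TP:", TP, "FP:", FP, "FN:", FN, "TN:", TN, "\n")
```

With these toy vectors, TP = 3, FP = 1, FN = 1 and TN = 3.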

Misclassification Rate

The misclassification rate measures the overall error rate of a classification model: the proportion of all predictions that are incorrect. It is the complement of accuracy.
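Written out, the rate reads as follows for the binary case; for K classes it is the off-diagonal share of the K × K confusion matrix C, where C_ij (notation introduced here for illustration) counts observations predicted as class i whose actual class is j:

```latex
\text{Misclassification rate}
  = \frac{FP + FN}{TP + TN + FP + FN}
  = 1 - \text{Accuracy},
\qquad
\text{Misclassification rate}_{K\ \text{classes}}
  = 1 - \frac{\sum_{i=1}^{K} C_{ii}}{\sum_{i=1}^{K} \sum_{j=1}^{K} C_{ij}}
```

For example, 41 correct predictions out of 45 give a rate of 1 - 41/45 = 4/45 ≈ 0.0889.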

Let's apply the Naïve Bayes classifier to the well-known iris dataset and go through a step-by-step implementation in the R programming language that produces a confusion matrix and calculates the misclassification rate.

Step 1: Install and Load Required Packages

First, install and load the necessary packages: e1071 provides naiveBayes(), and caret provides createDataPartition() and confusionMatrix().

R
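# Install the packages only if they are not already available
if (!requireNamespace("e1071", quietly = TRUE)) install.packages("e1071")
if (!requireNamespace("caret", quietly = TRUE)) install.packages("caret")

# e1071 provides naiveBayes(); caret provides createDataPartition() and confusionMatrix()
library(e1071)
library(caret)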

Step 2: Load and Prepare Dataset

For this example, we'll use the built-in iris dataset, which is commonly used for classification tasks.

R
# Load the dataset
data(iris)

# Inspect the structure of the dataset
head(iris)

Output:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
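Before splitting, it can help to check the class balance, because it sets the baseline a classifier must beat: caret reports the largest class proportion in the reference data as the No Information Rate. A quick check using only base R and the built-in iris data:

```r
# Count observations per species in the built-in iris data
class_counts <- table(iris$Species)
print(class_counts)

# Share of each class; the largest share is the accuracy of a baseline
# that always predicts the most common class
class_props <- prop.table(class_counts)
print(round(class_props, 4))
baseline_accuracy <- max(class_props)
```

iris has 50 observations of each of the three species, so every proportion is 1/3 and the majority-class baseline is about 0.3333.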

Step 3: Split Data into Training and Testing Sets

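Split the dataset into training and testing sets, using 70% of the data for training and 30% for testing. Because createDataPartition() samples within each level of Species, the class proportions are preserved in both sets.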

R
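set.seed(123)  # Set seed for reproducibility

# Stratified 70/30 split: returns the row indices of the training set
trainIndex <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
data_train <- iris[trainIndex, ]   # 70% of the rows for training
data_test <- iris[-trainIndex, ]   # remaining 30% for testing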
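To make the idea of a stratified split concrete, here is a minimal base-R sketch of a 70/30 split done class by class. The helper stratified_split() is hypothetical, written only for illustration; it rounds each class's share up and will not reproduce caret's exact row selection, even with the same seed:

```r
# Hypothetical helper: sample a fraction p of the row indices within each class
stratified_split <- function(y, p) {
  idx_by_class <- split(seq_along(y), y)      # row indices grouped by class
  train_idx <- unlist(lapply(idx_by_class, function(i) {
    n_train <- ceiling(p * length(i))         # rows to keep from this class (rounded up)
    i[sample.int(length(i), n_train)]         # random subset of this class
  }), use.names = FALSE)
  sort(train_idx)
}

set.seed(123)
train_idx <- stratified_split(iris$Species, p = 0.7)
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# Each species keeps the same share in both sets
print(table(train_set$Species))
print(table(test_set$Species))
```

With 50 rows per species, this keeps 35 of each for training and leaves 15 of each for testing, the same 45-row, 15-per-class test set size summarized by the confusion matrix in Step 5.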

Step 4: Train Naïve Bayes Classifier

Train the Naive Bayes classifier on the training data. The formula Species ~ . uses all four measurements as predictors of Species.

R
# Train Naïve Bayes classifier
model <- naiveBayes(Species ~ ., data = data_train)
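For numeric predictors such as the iris measurements, naiveBayes() models each feature within each class as a normal distribution; the fitted object keeps the class frequencies in model$apriori and the per-class means and standard deviations in model$tables. The base-R sketch below reimplements that idea to show what the classifier computes: for each class it adds the log prior to the sum of the per-feature normal log-densities, then picks the class with the largest total. The helpers fit_gaussian_nb() and predict_gaussian_nb() are hypothetical and skip details of e1071 such as its handling of zero standard deviations, so predictions may differ in edge cases:

```r
# Hypothetical helper: fit a Gaussian Naive Bayes model
# X: numeric data frame or matrix of features, y: class labels
fit_gaussian_nb <- function(X, y) {
  X <- as.matrix(X)
  y <- as.factor(y)
  classes <- levels(y)
  # Prior probability of each class = its share of the training rows
  priors <- as.numeric(table(y)) / length(y)
  names(priors) <- classes
  # Per-class mean and standard deviation of every feature (rows = classes)
  means <- t(sapply(classes, function(k) colMeans(X[y == k, , drop = FALSE])))
  sds   <- t(sapply(classes, function(k) apply(X[y == k, , drop = FALSE], 2, sd)))
  list(classes = classes, priors = priors, means = means, sds = sds)
}

# Hypothetical helper: predict the class with the highest log-posterior score
predict_gaussian_nb <- function(fit, X) {
  X <- as.matrix(X)
  scores <- sapply(fit$classes, function(k) {
    # Normal log-density of each feature value under class k (features x rows)
    log_lik <- matrix(dnorm(t(X), mean = fit$means[k, ], sd = fit$sds[k, ],
                            log = TRUE), nrow = ncol(X))
    # "Naive" independence assumption: per-feature log-densities simply add up
    log(fit$priors[[k]]) + colSums(log_lik)
  })
  scores <- matrix(scores, nrow = nrow(X))  # keep a rows x classes layout
  factor(fit$classes[max.col(scores, ties.method = "first")],
         levels = fit$classes)
}

# Usage on the full iris data (no train/test split, for brevity)
fit  <- fit_gaussian_nb(iris[, 1:4], iris$Species)
pred <- predict_gaussian_nb(fit, iris[, 1:4])
print(table(Predicted = pred, Actual = iris$Species))
cat("Training accuracy:", mean(pred == iris$Species), "\n")
```

Working in log space avoids numerical underflow when many small densities are multiplied together.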

Step 5: Predict Classes on Test Data and Produce Confusion Matrix

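Use the trained model to predict classes for the test data, then compare the predictions with the actual species using caret's confusionMatrix().

R
# Predict class labels for the test set
predicted_class <- predict(model, newdata = data_test)

# Create confusion matrix: predictions vs. actual (reference) labels
conf_matrix <- confusionMatrix(data = predicted_class, reference = data_test$Species)
conf_matrix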

Output:

Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         15          0         0
  versicolor      0         13         2
  virginica       0          2        13

Overall Statistics

               Accuracy : 0.9111
                 95% CI : (0.7878, 0.9752)
    No Information Rate : 0.3333
    P-Value [Acc > NIR] : 8.467e-16

                  Kappa : 0.8667

 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            0.8667           0.8667
Specificity                 1.0000            0.9333           0.9333
Pos Pred Value              1.0000            0.8667           0.8667
Neg Pred Value              1.0000            0.9333           0.9333
Prevalence                  0.3333            0.3333           0.3333
Detection Rate              0.3333            0.2889           0.2889
Detection Prevalence        0.3333            0.3333           0.3333
Balanced Accuracy           1.0000            0.9000           0.9000
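To see where these statistics come from, the sketch below re-enters the nine counts from the printed matrix by hand (rows = Prediction, columns = Reference, as caret prints them) and recomputes accuracy, Kappa, and each class's sensitivity and specificity in base R. Copying the counts is only for illustration; with the objects from Step 5 in memory, conf_matrix$table holds the same table:

```r
# Counts copied from the printed confusion matrix (rows = Prediction, cols = Reference)
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(15,  0,  0,
                0, 13,  2,
                0,  2, 13),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm)
accuracy <- sum(diag(cm)) / n   # correct predictions / all predictions

# Cohen's Kappa: agreement beyond what the row/column totals would give by chance
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa <- (accuracy - expected) / (1 - expected)

# One-vs-rest counts for each class
tp <- diag(cm)
fp <- rowSums(cm) - tp   # predicted as the class, but actually another class
fn <- colSums(cm) - tp   # actually the class, but predicted as another class
tn <- n - tp - fp - fn

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)

print(round(c(Accuracy = accuracy, Kappa = kappa), 4))
print(round(rbind(Sensitivity = sensitivity, Specificity = specificity), 4))
```

The recomputed values (Accuracy 0.9111, Kappa 0.8667, versicolor sensitivity 13/15 and specificity 28/30) match the caret output.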

Step 6: Calculate Misclassification Rate

The misclassification rate is the complement of accuracy, so compute it as 1 minus the Accuracy value that confusionMatrix() stores in conf_matrix$overall.

R
# Misclassification rate
misclassification_rate <- 1 - conf_matrix$overall['Accuracy']
cat("Misclassification Rate:", misclassification_rate, "\n")

Output:

Misclassification Rate: 0.08888889 
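The same number can be obtained without caret. As a sketch, the base-R function below (misclassification_rate_from_table() is a hypothetical helper) takes any square confusion matrix and returns the share of off-diagonal counts; applied to the counts from the printed matrix, it gives 4/45. With raw label vectors, mean(predicted != actual) is an equivalent one-liner:

```r
# Hypothetical helper: misclassification rate from a square confusion matrix
# (works for any number of classes; correct predictions sit on the diagonal)
misclassification_rate_from_table <- function(cm) {
  stopifnot(nrow(cm) == ncol(cm))
  1 - sum(diag(cm)) / sum(cm)
}

# Counts copied from the printed confusion matrix (rows = Prediction, cols = Reference)
cm <- matrix(c(15,  0,  0,
                0, 13,  2,
                0,  2, 13),
             nrow = 3, byrow = TRUE)
rate <- misclassification_rate_from_table(cm)
cat("Misclassification Rate:", rate, "\n")

# Equivalent calculation from label vectors: the share of mismatches
actual    <- c("a", "a", "b", "b", "c")   # hypothetical toy labels
predicted <- c("a", "b", "b", "b", "a")
toy_rate  <- mean(predicted != actual)    # 2 mismatches out of 5
```

The cat() call prints the same 0.08888889 as the caret-based calculation.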

Conclusion

By following these steps, you can produce a confusion matrix and calculate the misclassification rate for a Naïve Bayes classifier in R. Together they show how well the classifier predicts classes for test data it did not see during training. The confusion matrix breaks the predictions down by class into correct and incorrect counts (true and false positives and negatives), while the misclassification rate condenses them into one overall error rate: here 4 of the 45 test observations, or about 8.9%, were misclassified. The per-class view also shows where to look for improvements: setosa is classified perfectly, and every error involves versicolor and virginica. Re-running this evaluation after changing the training data or the model settings shows whether such adjustments actually reduce the error rate.
