LAB7
After selecting an appropriate model and training it on the training data, we can evaluate its
performance using the testing data. Let's assume the test set contains 200 negative and 200
positive samples. The model classified 180 of the negative samples correctly (true negatives)
and misclassified the remaining 20 as positive (false positives). Similarly, it classified 150 of
the positive samples correctly (true positives) and misclassified the remaining 50 as negative
(false negatives).
Using these counts, we can calculate the accuracy, precision, and recall, and build the
confusion matrix.
Accuracy:
Accuracy measures the proportion of correctly classified samples out of the total number of
samples. It is calculated as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Where TP is the number of true positive samples, TN is the number of true negative samples,
FP is the number of false positive samples, and FN is the number of false negative samples.
In our example, the accuracy is calculated as follows:
Accuracy = (150 + 180) / (150 + 180 + 20 + 50) = 330 / 400 = 0.825 or 82.5%
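As a quick sanity check, this can be reproduced in a few lines of Python (a minimal sketch;
the counts come from the example above, the variable names are our own):

# Counts from the example: 200 positive and 200 negative test samples
TP, TN, FP, FN = 150, 180, 20, 50

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)  # 0.825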
Precision:
Precision measures the proportion of correctly classified positive samples out of the total
number of positive samples predicted by the model. It is calculated as follows:
Precision = TP / (TP + FP)
In our example, the precision is calculated as follows:
Precision = 150 / (150 + 20) = 0.88 or about 88%
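Continuing the same sketch in Python:

precision = TP / (TP + FP)   # 150 / 170
print(round(precision, 2))   # 0.88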
Recall:
Recall measures the proportion of correctly classified positive samples out of the total
number of positive samples in the dataset. It is calculated as follows:
Recall = TP / (TP + FN)
In our example, the recall is calculated as follows:
Recall = 150 / (150 + 50) = 0.75 or 75%
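And likewise for recall:

recall = TP / (TP + FN)   # 150 / 200
print(recall)             # 0.75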
Confusion matrix:
A confusion matrix is a table that summarizes the performance of a model on a testing
dataset. It shows the number of true positive, false positive, true negative, and false negative
predictions made by the model.
In our example, the confusion matrix is as follows:
                     Actual Positive   Actual Negative
Predicted Positive        150 (TP)          20 (FP)
Predicted Negative         50 (FN)         180 (TN)
The confusion matrix provides a more detailed understanding of the performance of the
model. It helps to identify the types of errors made by the model and can be used to optimize
the model by adjusting the threshold or selecting a different algorithm.
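If scikit-learn is available, the same matrix can be produced directly from label vectors. The
sketch below rebuilds the example's 400 test labels from the counts; note that scikit-learn's
confusion_matrix puts actual classes on the rows and predicted classes on the columns,
which is the transpose of the table above:

from sklearn.metrics import confusion_matrix

# Rebuild the 400 test labels from the example counts
y_true = [1] * 200 + [0] * 200                         # 200 actual positives, then 200 actual negatives
y_pred = [1] * 150 + [0] * 50 + [0] * 180 + [1] * 20   # TP, FN, TN, FP

print(confusion_matrix(y_true, y_pred))
# [[180  20]    row 0 = actual negative: TN, FP
#  [ 50 150]]   row 1 = actual positive: FN, TP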
MSE (Mean Squared Error) and RMSE (Root Mean Squared Error) are metrics used in
regression analysis to measure the difference between the predicted values and actual values.
The lower the MSE and RMSE values, the better the model's performance.
MSE is calculated by taking the average of the squared differences between the predicted
and actual values. It is given by the formula:
MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
where n is the number of observations, yᵢ is the actual value, and ŷᵢ is the predicted value.
RMSE is calculated by taking the square root of the MSE. It is given by the formula:
RMSE = √MSE
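Both formulas take only a couple of lines with NumPy (a minimal sketch; the actual and
predicted values below are made-up numbers for illustration only):

import numpy as np

y_actual = np.array([3.0, 5.0, 2.5, 7.0])  # hypothetical actual values
y_pred = np.array([2.5, 5.0, 3.0, 8.0])    # hypothetical predicted values

mse = np.mean((y_actual - y_pred) ** 2)    # average of squared errors
rmse = np.sqrt(mse)                        # square root of the MSE
print(mse, rmse)                           # 0.375 0.6123724356957945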
F1 score is a metric used in binary classification to measure a model's overall performance.
It is the harmonic mean of precision and recall, and it ranges from 0 to 1, with 1 being the
best possible score.
Precision is the number of true positives divided by the total number of predicted positives.
Recall is the number of true positives divided by the total number of actual positives. The F1
score is given by the formula:
F1 score = 2 * (precision * recall) / (precision + recall)
It is a useful metric when the classes are imbalanced, meaning that one class has more
observations than the other. The F1 score takes into account both precision and recall, which
makes it a better metric than accuracy in imbalanced datasets.
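Applying the formula to the counts from the classification example above (again a minimal
sketch):

precision = 150 / (150 + 20)  # ≈ 0.88
recall = 150 / (150 + 50)     # 0.75

f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 2))           # 0.81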
Conclusion:
Evaluating the performance of a supervised learning model involves calculating metrics
such as accuracy, precision, recall, F1 score, and the confusion matrix for classification,
and MSE and RMSE for regression. These metrics help to understand the strengths and
weaknesses of the model and can be used to optimize its performance.
FAQs
1. What is accuracy?
2. What is precision?
3. What is recall?
4. What is a confusion matrix?
5. How do you interpret a confusion matrix?