Evaluating Model Performance Unit 6

• Evaluating student performance
• Evaluating employee performance
• Evaluating machine learning algorithm performance
Measuring performance for classification
• Evaluating the performance of any medical test
• The goal of evaluating a classification model is to
have a better understanding of how its
performance will extrapolate to future cases.
• Though we've evaluated classifiers in the prior
chapters, it's worth reflecting on the types of
data at our disposal:
• Actual class values
• Predicted class values
• Estimated probability of the prediction
• The actual and predicted class values may be self-
evident, but they are the key to evaluation. Just
like a teacher uses an answer key to assess the
student's answers, we need to know the correct
answer for a machine learner's predictions. The
goal is to maintain two vectors of data: one
holding the correct or actual class values, and the
other holding the predicted class values. Both
vectors must have the same number of values
stored in the same order. The predicted and
actual values may be stored as separate R vectors
or columns in a single R data frame.
• Obtaining this data is easy. The actual class
values come directly from the target feature
in the test dataset. Predicted class values are
obtained from the classifier built upon the
training data, and applied to the test data. For
most machine learning packages, this involves
applying the predict() function to a model
object and a data frame of test data, such as:
predicted_outcome <- predict(model,
test_data).
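• For instance, a minimal sketch of assembling the two
vectors, assuming a fitted model object named model and a
test_data data frame whose target column is hypothetically
named actual_class:

# actual class values come straight from the test dataset's target feature
actual_outcome <- test_data$actual_class
# predicted class values come from applying the trained model to the test data
predicted_outcome <- predict(model, test_data)
# both vectors must have the same length and be in the same order
length(actual_outcome) == length(predicted_outcome)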
• Studying these internal prediction probabilities
provides useful data to evaluate a model's
performance. If two models make the same
number of mistakes, but one is more capable of
accurately assessing its uncertainty, then it is a
smarter model. It's ideal to find a learner that is
extremely confident when making a correct
prediction, but timid in the face of doubt. The
balance between confidence and caution is a key
part of model evaluation.
• Unfortunately, obtaining internal prediction
probabilities can be tricky because the method to
do so varies across classifiers. In general, for
most classifiers, the predict() function is used to
specify the desired type of prediction. To obtain a
single predicted class, such as spam or ham, you
typically set the type = "class" parameter. To
obtain the prediction probability, the type
parameter should be set to one of "prob",
"posterior", "raw", or "probability" depending on
the classifier used.
• For example, to output the predicted
probabilities for the C5.0 classifier (covered in
Classification Using Decision Trees and Rules),
use the predict() function with type = "prob"
as follows:
• > predicted_prob <- predict(credit_model,
credit_test, type = "prob")
• In most cases, the predict() function returns a
probability for each category of the outcome. For
example, in the case of a two-outcome model
like the SMS classifier, the predicted probabilities
might be a matrix or data frame as shown here:
• > head(sms_test_prob)
• For convenience during the evaluation process, it
can be helpful to construct a data frame
containing the predicted class values, actual class
values, as well as the estimated probabilities of
interest.
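• As a sketch, assuming the SMS objects from this unit are
available under the hypothetical names sms_test_labels
(actual classes), sms_test_pred (predicted classes), and
sms_test_prob (the probability matrix shown above), such a
data frame could be built as:

sms_results <- data.frame(
  actual_type  = sms_test_labels,          # actual class values
  predict_type = sms_test_pred,            # predicted class values
  prob_spam    = sms_test_prob[, "spam"]   # assumes a column named "spam"
)
head(sms_results)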
Confusion Matrix
• A confusion matrix is a table that categorizes
predictions according to whether they match the
actual value. One of the table's dimensions
indicates the possible categories of predicted
values, while the other dimension indicates the
same for actual values. Although we have only
seen 2 x 2 confusion matrices so far, a matrix can
be created for models that predict any number of
class values. The following figure depicts the
familiar confusion matrix for a two-class binary
model as well as the 3 x 3 confusion matrix for a
three-class model.
• When the predicted value is the same as the
actual value, it is a correct classification.
Correct predictions fall on the diagonal in the
confusion matrix (denoted by O). The off-
diagonal matrix cells (denoted by X) indicate
the cases where the predicted value differs
from the actual value. These are incorrect
predictions.
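• As a rough sketch of the two-class layout described
above, with O marking correct predictions on the diagonal
and X marking errors off the diagonal:

                     Predicted positive   Predicted negative
  Actual positive            O                    X
  Actual negative            X                    O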
• The most common performance measures consider the model's ability
to discern one class versus all others. The class of interest is known as
the positive class, while all others are known as negative.
• The relationship between the positive class and negative class
predictions can be depicted as a 2 x 2 confusion matrix that tabulates
whether predictions fall into one of the four categories:
• True Positive (TP): Correctly classified as the class of interest
• True Negative (TN): Correctly classified as not the class of interest
• False Positive (FP): Incorrectly classified as the class of interest
• False Negative (FN): Incorrectly classified as not the class of interest
Using confusion matrix to measure
accuracy
• An easy way to tabulate a classifier's
predictions into a confusion matrix is to use
R's table() function. The command to create a
confusion matrix for the SMS data is shown as
follows. The counts in this table could then be
used to calculate accuracy and other statistics:
• > table(sms_results$actual_type, sms_results$predict_type)
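• As a sketch, the four counts can be pulled out of this
table directly, assuming spam is the positive class:

conf <- table(sms_results$actual_type, sms_results$predict_type)
TP <- conf["spam", "spam"]   # spam correctly classified as spam
TN <- conf["ham", "ham"]     # ham correctly classified as ham
FP <- conf["ham", "spam"]    # ham incorrectly classified as spam
FN <- conf["spam", "ham"]    # spam incorrectly classified as ham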
• If you would like to create a confusion matrix with more
informative output, the CrossTable() function in the
gmodels package offers a customizable solution. Before
proceeding, you will need to install the package using the
install.packages("gmodels") command.
• By default, the CrossTable() output includes proportions in
each cell that indicate the cell count as a percentage of
the table's row, column, or overall total counts. The output also
includes row and column totals. As shown in the following
code, the syntax is similar to the table() function:
• > library(gmodels)
• > CrossTable(sms_results$actual_type, sms_results$predict_type)
• We can use the confusion matrix to obtain the accuracy
and error rate. Since the accuracy is (TP + TN) / (TP + TN +
FP + FN), we can calculate it using following command:
• > (152 + 1203) / (152 + 1203 + 4 + 31)
• [1] 0.9748201
• We can also calculate the error rate (FP + FN) / (TP + TN +
FP + FN) as:
• > (4 + 31) / (152 + 1203 + 4 + 31)
• [1] 0.02517986
• This is the same as one minus accuracy:
• > 1 - 0.9748201
• [1] 0.0251799
Other Performance Measures
• The Classification and Regression Training
package caret by Max Kuhn includes functions to
compute many such performance measures. This
package provides a large number of tools to
prepare, train, evaluate, and visualize machine
learning models and data.
• Before proceeding, you will need to install the
package using the install.packages("caret")
command.
• For caret's measures of model performance that
consider the ability to classify the positive class,
a positive parameter should be specified. In this
case, since the SMS
classifier is intended to detect spam, we will
set positive = "spam" as follows:
• > library(caret)
• > confusionMatrix(sms_results$predict_type,
• sms_results$actual_type, positive = "spam")
The kappa statistic
• The kappa statistic adjusts accuracy by
accounting for the possibility of a correct
prediction occurring by chance alone.
• Kappa values range from 0 to a maximum of 1,
which indicates perfect agreement between
the model's predictions and the true values.
Values less than one indicate imperfect
agreement.
• Poor agreement = less than 0.20
• Fair agreement = 0.20 to 0.40
• Moderate agreement = 0.40 to 0.60
• Good agreement = 0.60 to 0.80
• Very good agreement = 0.80 to 1.00
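• As a sketch of the calculation behind the statistic,
kappa = (pr_a - pr_e) / (1 - pr_e), where pr_a is the
observed agreement and pr_e is the agreement expected by
chance; assuming the sms_results data frame from earlier
(the confusionMatrix() output from caret also reports this
value):

# proportion of actual agreement between predictions and true values
pr_a <- mean(sms_results$actual_type == sms_results$predict_type)
# proportion of agreement expected by chance, from the marginal proportions
pr_e <- sum(prop.table(table(sms_results$actual_type)) *
            prop.table(table(sms_results$predict_type)))
kappa <- (pr_a - pr_e) / (1 - pr_e)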
Sensitivity and specificity
• The sensitivity of a model (also called the
true positive rate) measures the proportion
of positive examples that were correctly
classified. Therefore, as shown in the
following formula, it is calculated as the
number of true positives divided by the total
number of positives, both correctly classified
(the true positives) as well as incorrectly
classified (the false negatives):
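• In symbols, following the definition above:
sensitivity = TP / (TP + FN)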
• The specificity of a model (also called the
true negative rate) measures the proportion
of negative examples that were correctly
classified. As with sensitivity, this is computed
as the number of true negatives, divided by
the total number of negatives—the true
negatives plus the false positives:
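• In symbols: specificity = TN / (TN + FP)
• As a sketch, both measures can also be computed with
caret's sensitivity() and specificity() functions, again
assuming the sms_results data frame and spam as the
positive class:

library(caret)
sensitivity(sms_results$predict_type, sms_results$actual_type,
            positive = "spam")
specificity(sms_results$predict_type, sms_results$actual_type,
            negative = "ham")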
Precision and recall
• The precision (also known as the positive
predictive value) is defined as the proportion
of positive examples that are truly positive; in
other words, when a model predicts the
positive class, how often is it correct? A
precise model will only predict the positive
class in cases that are very likely to be
positive. It will be very trustworthy.
• On the other hand, recall is a measure of how
complete the results are.
• A model with a high recall captures a large
portion of the positive examples, meaning that it
has wide breadth. For example, a search engine
with a high recall returns a large number of
documents pertinent to the search query.
Similarly, the SMS spam filter has a high recall if
the majority of spam messages are correctly
identified.
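• In symbols: precision = TP / (TP + FP) and
recall = TP / (TP + FN); note that recall is the same
quantity as sensitivity. A sketch of computing both from
the confusion matrix, assuming the sms_results data frame
and spam as the positive class:

conf <- table(sms_results$actual_type, sms_results$predict_type)
precision <- conf["spam", "spam"] / sum(conf[, "spam"])   # TP / (TP + FP)
recall    <- conf["spam", "spam"] / sum(conf["spam", ])   # TP / (TP + FN)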
The F-measure
• A measure of model performance that combines
precision and recall into a single number is
known as the F-measure (also sometimes called
the F1 score or F-score). The F-measure
combines precision and recall using the harmonic
mean, a type of average that is used for rates of
change. The harmonic mean is used rather than
the common arithmetic mean since both
precision and recall are expressed as proportions
between zero and one, which can be interpreted
as rates. The following is the formula for the F-
measure:
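• F-measure = (2 * precision * recall) / (precision +
recall), which is equivalent to (2 * TP) / (2 * TP + FP +
FN). As a sketch in R, continuing from the precision and
recall computed in the previous section:

f_measure <- (2 * precision * recall) / (precision + recall)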
