Evaluating Model Performance
• Evaluating student performance
• Evaluating employee performance
• Evaluating machine learning algorithm performance
• Evaluating the performance of any medical test

Measuring performance for classification
• The goal of evaluating a classification model is to have a better understanding of how its performance will extrapolate to future cases.
• Though we've evaluated classifiers in the prior chapters, it's worth reflecting on the types of data at our disposal:
• Actual class values
• Predicted class values
• Estimated probability of the prediction
• The actual and predicted class values may be self-evident, but they are the key to evaluation. Just as a teacher uses an answer key to assess a student's answers, we need to know the correct answer for a machine learner's predictions. The goal is to maintain two vectors of data: one holding the correct or actual class values, and the other holding the predicted class values. Both vectors must have the same number of values stored in the same order. The predicted and actual values may be stored as separate R vectors or as columns in a single R data frame.
• Obtaining this data is easy. The actual class values come directly from the target feature in the test dataset. Predicted class values are obtained from the classifier built upon the training data and applied to the test data. For most machine learning packages, this involves applying the predict() function to a model object and a data frame of test data, such as: predicted_outcome <- predict(model, test_data).
• Studying the estimated prediction probabilities provides useful data for evaluating a model's performance. If two models make the same number of mistakes, but one is more capable of accurately assessing its uncertainty, then it is a smarter model. It is ideal to find a learner that is extremely confident when making a correct prediction, but timid in the face of doubt. The balance between confidence and caution is a key part of model evaluation.
• Unfortunately, obtaining these prediction probabilities can be tricky because the method to do so varies across classifiers. In general, for most classifiers, the predict() function is used with a parameter specifying the desired type of prediction. To obtain a single predicted class, such as spam or ham, you typically set the type = "class" parameter. To obtain the prediction probability, the type parameter should be set to one of "prob", "posterior", "raw", or "probability", depending on the classifier used.
• For example, to output the predicted probabilities for the C5.0 classifier (see Classification Using Decision Trees and Rules), use the predict() function with type = "prob" as follows:
• > predicted_prob <- predict(credit_model, credit_test, type = "prob")
• In most cases, the predict() function returns a probability for each category of the outcome. For example, in the case of a two-outcome model like the SMS classifier, the predicted probabilities might be a matrix or data frame inspected with:
• > head(sms_test_prob)
• For convenience during the evaluation process, it can be helpful to construct a data frame containing the predicted class values, actual class values, and the estimated probabilities of interest, as in the sketch below.
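• The following is a minimal sketch of this workflow, assuming a trained classifier sms_classifier, a test data frame sms_test, and a vector of actual labels sms_test_labels with levels "ham" and "spam" (these object names and the prob_spam column are hypothetical; the exact type argument depends on the classifier used):

# predicted class values and predicted probabilities for the test data
sms_test_pred <- predict(sms_classifier, sms_test, type = "class")
sms_test_prob <- predict(sms_classifier, sms_test, type = "prob")

# combine actual values, predicted values, and the estimated probability of spam
sms_results <- data.frame(
  actual_type  = sms_test_labels,
  predict_type = sms_test_pred,
  prob_spam    = sms_test_prob[, "spam"]
)

head(sms_results)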
Confusion Matrix
• A confusion matrix is a table that categorizes predictions according to whether they match the actual value. One of the table's dimensions indicates the possible categories of predicted values, while the other dimension indicates the same for actual values.
• Although we have only seen 2 x 2 confusion matrices so far, a matrix can be created for models that predict any number of class values. The following figure depicts the familiar confusion matrix for a two-class binary model as well as the 3 x 3 confusion matrix for a three-class model.
• When the predicted value is the same as the actual value, it is a correct classification. Correct predictions fall on the diagonal of the confusion matrix (denoted by O). The off-diagonal cells (denoted by X) indicate the cases where the predicted value differs from the actual value. These are incorrect predictions.
• The most common performance measures consider the model's ability to discern one class versus all others. The class of interest is known as the positive class, while all others are known as negative.
• The relationship between positive class and negative class predictions can be depicted as a 2 x 2 confusion matrix that tabulates whether predictions fall into one of four categories:
• True Positive (TP): Correctly classified as the class of interest
• True Negative (TN): Correctly classified as not the class of interest
• False Positive (FP): Incorrectly classified as the class of interest
• False Negative (FN): Incorrectly classified as not the class of interest

Using the confusion matrix to measure accuracy
• An easy way to tabulate a classifier's predictions into a confusion matrix is to use R's table() function. The command to create a confusion matrix for the SMS data is shown as follows. The counts in this table could then be used to calculate accuracy and other statistics:
• > table(sms_results$actual_type, sms_results$predict_type)
• If you would like to create a confusion matrix with more informative output, the CrossTable() function in the gmodels package offers a customizable solution. If the package is not already installed, you will need to do so using the install.packages("gmodels") command.
• By default, the CrossTable() output includes proportions in each cell that indicate the cell count as a percentage of the table's row, column, or overall total counts. The output also includes row and column totals. As shown in the following code, the syntax is similar to the table() function:
• > library(gmodels)
• > CrossTable(sms_results$actual_type, sms_results$predict_type)
• We can use the confusion matrix to obtain the accuracy and error rate. Since accuracy is (TP + TN) / (TP + TN + FP + FN), we can calculate it using the following command:
• > (152 + 1203) / (152 + 1203 + 4 + 31)
• [1] 0.9748201
• We can also calculate the error rate, (FP + FN) / (TP + TN + FP + FN), as:
• > (4 + 31) / (152 + 1203 + 4 + 31)
• [1] 0.02517986
• This is the same as one minus accuracy:
• > 1 - 0.9748201
• [1] 0.0251799
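• Rather than typing the four cell counts by hand, the accuracy and error rate can also be computed directly from the table object. This is a small sketch assuming the sms_results data frame built earlier; the diagonal of the table holds the correct predictions:

# cross-tabulate actual versus predicted class values
sms_confusion <- table(sms_results$actual_type, sms_results$predict_type)

# accuracy: correct predictions (the diagonal) divided by all predictions
accuracy <- sum(diag(sms_confusion)) / sum(sms_confusion)

# error rate: the remaining proportion, equal to one minus accuracy
error_rate <- 1 - accuracy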
Other Performance Measures
• The Classification and Regression Training package caret by Max Kuhn includes functions to compute many such performance measures. This package provides a large number of tools to prepare, train, evaluate, and visualize machine learning models and data.
• Before proceeding, you will need to install the package using the install.packages("caret") command.
• Because caret provides measures of model performance that consider the ability to classify the positive class, a positive parameter should be specified. In this case, since the SMS classifier is intended to detect spam, we will set positive = "spam" as follows:
• > library(caret)
• > confusionMatrix(sms_results$predict_type, sms_results$actual_type, positive = "spam")

The kappa statistic
• The kappa statistic adjusts accuracy by accounting for the possibility of a correct prediction by chance alone.
• Kappa values range from 0 to a maximum of 1, which indicates perfect agreement between the model's predictions and the true values. Values less than one indicate imperfect agreement. A common interpretation of kappa values is:
• Poor agreement = less than 0.20
• Fair agreement = 0.20 to 0.40
• Moderate agreement = 0.40 to 0.60
• Good agreement = 0.60 to 0.80
• Very good agreement = 0.80 to 1.00

Sensitivity and specificity
• The sensitivity of a model (also called the true positive rate) measures the proportion of positive examples that were correctly classified. Therefore, it is calculated as the number of true positives divided by the total number of positives, both those correctly classified (the true positives) and those incorrectly classified (the false negatives):
• sensitivity = TP / (TP + FN)
• The specificity of a model (also called the true negative rate) measures the proportion of negative examples that were correctly classified. As with sensitivity, this is computed as the number of true negatives divided by the total number of negatives, the true negatives plus the false positives:
• specificity = TN / (TN + FP)

Precision and recall
• The precision (also known as the positive predictive value) is defined as the proportion of examples predicted as positive that are truly positive; in other words, when a model predicts the positive class, how often is it correct? A precise model will only predict the positive class in cases that are very likely to be positive. It will be very trustworthy.
• precision = TP / (TP + FP)
• On the other hand, recall is a measure of how complete the results are. It is computed in the same way as sensitivity: the number of true positives divided by the total number of positives.
• recall = TP / (TP + FN)
• A model with high recall captures a large portion of the positive examples, meaning that it has wide breadth. For example, a search engine with high recall returns a large number of documents pertinent to the search query. Similarly, the SMS spam filter has high recall if the majority of spam messages are correctly identified.

The F-measure
• A measure of model performance that combines precision and recall into a single number is known as the F-measure (also sometimes called the F1 score or F-score). The F-measure combines precision and recall using the harmonic mean, a type of average that is used for rates of change. The harmonic mean is used rather than the common arithmetic mean since both precision and recall are expressed as proportions between zero and one, which can be interpreted as rates. The following is the formula for the F-measure:
• F-measure = (2 × precision × recall) / (recall + precision) = (2 × TP) / (2 × TP + FP + FN)
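• As a quick illustration, the following sketch computes these measures in base R from the four cell counts of the SMS confusion matrix used earlier, taking 152 as the true positives (spam correctly caught), 1203 as the true negatives, 4 as the false positives, and 31 as the false negatives. The chance-agreement formula is the standard Cohen's kappa definition, which the slides do not show; the confusionMatrix() output from caret reports many of the same statistics.

# counts from the SMS confusion matrix, with spam as the positive class
TP <- 152; TN <- 1203; FP <- 4; FN <- 31
n  <- TP + TN + FP + FN

# kappa: observed agreement adjusted for the agreement expected by chance
pr_a  <- (TP + TN) / n
pr_e  <- ((TP + FP) * (TP + FN) + (TN + FN) * (TN + FP)) / n^2
kappa <- (pr_a - pr_e) / (1 - pr_e)

sensitivity <- TP / (TP + FN)   # true positive rate (identical to recall)
specificity <- TN / (TN + FP)   # true negative rate
precision   <- TP / (TP + FP)   # positive predictive value
recall      <- sensitivity
f_measure   <- (2 * precision * recall) / (precision + recall)

c(kappa = kappa, sensitivity = sensitivity, specificity = specificity,
  precision = precision, f_measure = f_measure)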