
PERFORMANCE MEASURES

PERFORMANCE MEASURE - REGRESSION

Performance measures for regression are used to evaluate learning algorithms and form an important aspect of machine learning.

• These measures can also be used as heuristics to build learning models

• Performance Measures / Loss functions:
  o Mean Squared Error (MSE)
  o Mean Absolute Error (MAE)
  o Root Mean Square Error (RMSE)

What is our Aim?

• A typical regression-based machine learning model produces continuous values (predicted values)
• Our primary objective is to keep these predicted values as close as possible to the actual values

❑ Pi - predicted values
❑ Ai - observed / actual values

Residual Error = Pi - Ai

The ideal (hypothetical) condition is that this error (difference) is 0, which would mean the model predicts every value correctly (this never happens in practice).

MEAN-ABSOLUTE ERROR

• Mean Absolute Error (MAE) is a measure of the difference between two continuous variables

• It is the average of the absolute errors:

  MAE = (1/n) Σ |Pi - Ai|

❑ Pi - predicted values
❑ Ai - observed values
❑ n - number of observations

• More intuitive and less sensitive to outliers than squared-error measures

• It doesn't consider the direction of the error, that is, positive or negative

If the direction is considered, the measure is called Mean Bias Error (MBE), which averages the signed errors (differences).

Example:
Actual values: [3, −0.5, 2, 7]
Predicted values: [2.5, 0.0, 2, 8]

MAE = (|3 − 2.5| + |−0.5 − 0.0| + |2 − 2| + |7 − 8|) / 4 = 0.5

MEAN-SQUARED ERROR

• Measures the average of the squares of the errors:

  MSE = (1/n) Σ (Pi - Ai)²

❑ Pi - predicted values
❑ Ai - observed values
❑ n - number of observations

• Always non-negative, and values closer to zero are better
• A good performance metric

Example (same actual and predicted values as above):
MSE = ((3 − 2.5)² + (−0.5 − 0.0)² + (2 − 2)² + (7 − 8)²) / 4 = 0.375

ROOT MEAN-SQUARED ERROR (RMSE)

• Measures the differences between the values predicted by a model (Pi) and the values actually observed (Oi):

  RMSE = sqrt( (1/n) Σ (Pi - Oi)² )

❑ Pi - predicted values
❑ Oi - observed values
❑ n - number of observations

• It indicates the spread of the residual errors. It is always non-negative, and a lower value indicates better performance. The ideal value would be 0, but it is never achieved in practice.

• The effect of each error on RMSE is proportional to the squared error; therefore, RMSE is sensitive to outliers and can exaggerate results if there are outliers in the data set.
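To make these three measures concrete, here is a minimal sketch in plain Python (the function and variable names are illustrative, not from any particular library) that computes MAE, MSE and RMSE for the running example:

    import math

    def regression_metrics(actual, predicted):
        """Compute MAE, MSE and RMSE for paired lists of actual/predicted values."""
        n = len(actual)
        abs_errors = [abs(p - a) for p, a in zip(predicted, actual)]
        sq_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
        mae = sum(abs_errors) / n
        mse = sum(sq_errors) / n
        rmse = math.sqrt(mse)
        return mae, mse, rmse

    # Running example from the slides
    actual = [3, -0.5, 2, 7]
    predicted = [2.5, 0.0, 2, 8]
    print(regression_metrics(actual, predicted))   # 0.5, 0.375, ~0.612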

PERFORMANCE MEASURE - CLASSIFICATION

In a classification problem, you can represent the errors using a "Confusion Matrix".

Confusion Matrix

• A table with two dimensions, viz. "Actual" and "Predicted"
• It is the easiest way to measure the performance of a classification problem where the output can be of two or more classes.

Metrics for Performance Evaluation

Focus on the predictive capability of a model, rather than on how fast it classifies, how long it takes to build the model, scalability, etc.

Confusion Matrix:

                              PREDICTED CLASS
                         Class=Yes      Class=No
ACTUAL     Class=Yes         a              b
CLASS      Class=No          c              d

a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)
Type-I error: a false positive (FP). Type-II error: a false negative (FN).

Metrics for Performance Evaluation…

                              PREDICTED CLASS
                         Class=Yes      Class=No
ACTUAL     Class=Yes       a (TP)         b (FN)
CLASS      Class=No        c (FP)         d (TN)

Most widely-used metric:

Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
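As a minimal sketch of how the four cells are tallied in practice (plain Python, with illustrative names and a made-up label list):

    def confusion_counts(actual, predicted, positive="Yes"):
        """Tally a (TP), b (FN), c (FP), d (TN) from paired label lists."""
        a = b = c = d = 0
        for y, p in zip(actual, predicted):
            if y == positive and p == positive:
                a += 1      # TP: actual Yes, predicted Yes
            elif y == positive:
                b += 1      # FN: actual Yes, predicted No
            elif p == positive:
                c += 1      # FP: actual No, predicted Yes
            else:
                d += 1      # TN: actual No, predicted No
        return a, b, c, d

    actual    = ["Yes", "Yes", "No", "No",  "Yes", "No"]
    predicted = ["Yes", "No",  "No", "Yes", "Yes", "No"]
    print(confusion_counts(actual, predicted))   # (2, 1, 1, 2)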
ACCURACY

Accuracy is the percent of predictions that were correct.
Example: the proportion of correctly classified emails (both Spam and Not Spam) out of the total number of emails.

For example, consider 10,000 customer records:

Action                                    Actually bought    Actually did not buy
Predicted (that there will be a buy)      TP: 500            FP: 100
Predicted (that there will be no buy)     FN: 400            TN: 9,000

The "accuracy" is (9,000 + 500) out of 10,000 = 95%

PRECISION (Positive predictive value)

Precision is the percent of positive predictions that were correct:

  Precision = TP / (TP + FP)

Example: the proportion of correctly predicted Spam emails out of all emails predicted as Spam.

For the same 10,000 customer records as above:

The "precision" is 500 out of 600 = 83.33%

RECALL or Sensitivity or True positive rate

Recall is the percent of positive cases that the model was able to catch:

  Recall = TP / (TP + FN)

Example: the proportion of actual Spam emails correctly identified by the model.

For the same 10,000 customer records as above:

The "recall" is 500 out of 900 = 55.56%

F1 Score

F1-score: the harmonic mean of Precision and Recall. It balances the two metrics:

  F1 = 2 × (Precision × Recall) / (Precision + Recall)

For the same 10,000 customer records as above, with Precision = 83.33% and Recall = 55.56%:

  F1 = 2 × (0.8333 × 0.5556) / (0.8333 + 0.5556) ≈ 0.667
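A minimal sketch (plain Python, illustrative names) that computes all four metrics from the confusion-matrix counts of the running 10,000-record example:

    def classification_metrics(tp, fp, fn, tn):
        """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        return accuracy, precision, recall, f1

    # Running example: 10,000 customer records
    acc, prec, rec, f1 = classification_metrics(tp=500, fp=100, fn=400, tn=9000)
    print(acc, prec, rec, f1)   # 0.95, 0.8333..., 0.5555..., 0.6666...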

Each metric highlights different aspects of model performance:

1. Accuracy is good for balanced datasets but can be misleading on imbalanced ones (see the sketch below).
2. Precision focuses on minimizing false positives.
3. Recall focuses on minimizing false negatives.
4. F1-Score balances Precision and Recall, making it a robust measure for imbalanced datasets.
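As a quick illustration of point 1, consider a hypothetical dataset with 9,900 negatives and 100 positives, and a model that always predicts the negative class: accuracy looks excellent while recall is zero.

    # Hypothetical imbalanced dataset: 9,900 negatives, 100 positives.
    # A "model" that always predicts the negative class:
    tp, fp = 0, 0          # it never predicts positive
    fn, tn = 100, 9900     # all positives are missed, all negatives are "caught"

    accuracy = (tp + tn) / (tp + tn + fp + fn)   # 0.99 -> looks excellent
    recall = tp / (tp + fn)                      # 0.0  -> catches no positives
    print(accuracy, recall)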
ROC (Receiver Operating Characteristic)

• Developed in the 1950s for signal detection theory, to analyze noisy signals
• Characterizes the trade-off between positive hits and false alarms
• The ROC curve plots TPR (on the y-axis) against FPR (on the x-axis)

TPR = TP / (TP + FN)   - fraction of positive instances predicted as positive
FPR = FP / (FP + TN)   - fraction of negative instances predicted as positive

                   PREDICTED CLASS
                     Yes        No
Actual    Yes      a (TP)     b (FN)
          No       c (FP)     d (TN)
RECEIVER OPERATING CHARACTERISTICS (ROC) CURVE

• Shows the tradeoff between True Positive Rate (sensitivity) and False Positive Rate (1 - specificity)

• The closer the curve is to the left-hand border and the top border of the ROC space, the more accurate the test

• The closer the curve is to the 45-degree diagonal of the ROC space, the less accurate the test

• The area under the curve (AUC) is a measure of test accuracy
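A minimal sketch (plain Python, toy scores and labels chosen for illustration) of how ROC points are obtained: sweep a decision threshold over the model's scores and record (FPR, TPR) at each threshold; plotting these pairs gives the ROC curve, and the area under it is the AUC.

    def roc_points(scores, labels):
        """Return (FPR, TPR) pairs obtained by sweeping a threshold over the scores."""
        points = []
        for threshold in sorted(set(scores), reverse=True):
            predictions = [s >= threshold for s in scores]
            tp = sum(p and y for p, y in zip(predictions, labels))
            fp = sum(p and not y for p, y in zip(predictions, labels))
            fn = sum((not p) and y for p, y in zip(predictions, labels))
            tn = sum((not p) and (not y) for p, y in zip(predictions, labels))
            points.append((fp / (fp + tn), tp / (tp + fn)))
        return points

    # Toy example: model scores and true labels (1 = positive, 0 = negative)
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1,   1,   0,   1,   0,    1,   0,   0]
    print(roc_points(scores, labels))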

Sensitivity and Specificity

Sensitivity = TP / (TP + FN)        Specificity = TN / (TN + FP)

For a multi-class problem, these are computed per class (one-vs-rest) from the confusion matrix:

                        Actual
              Cat       Dog       Horse
Predicted
  Cat          12       102        93
  Dog         112        23        77
  Horse        83        92        17

Below, Sensitivity and Specificity are computed for each class: Cat, Dog, and Horse.

Cat:

True Positive (TP): 12
False Positive (FP): 102 + 93 = 195
False Negative (FN): 112 + 83 = 195
True Negative (TN): 23 + 77 + 92 + 17 = 209


Sensitivity_cat = 12 / (12 + (112 + 83)) = 12 / 207 ≈ 0.06
Specificity_cat = (23 + 77 + 92 + 17) / ((23 + 77 + 92 + 17) + (102 + 93)) = 209 / 404 ≈ 0.52


Dog:

Sensitivity_dog = 23 / (23 + (102 + 92)) = 23 / 217 ≈ 0.11
Specificity_dog = (12 + 93 + 83 + 17) / ((12 + 93 + 83 + 17) + (112 + 77)) = 205 / 394 ≈ 0.52


Horse:

Sensitivity_horse = 17 / (17 + (77 + 93)) = 17 / 187 ≈ 0.09
Specificity_horse = (12 + 102 + 112 + 23) / ((12 + 102 + 112 + 23) + (83 + 92)) = 249 / 424 ≈ 0.59
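The per-class arithmetic above can be checked with a short sketch (plain Python, names chosen for illustration) that derives TP, FP, FN and TN for each class from the 3x3 confusion matrix (rows = predicted, columns = actual):

    classes = ["cat", "dog", "horse"]
    # Rows = predicted class, columns = actual class (same layout as the table above)
    matrix = [
        [12, 102, 93],   # predicted cat
        [112, 23, 77],   # predicted dog
        [83,  92, 17],   # predicted horse
    ]

    total = sum(sum(row) for row in matrix)
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        fp = sum(matrix[i]) - tp                      # predicted as this class, actually another
        fn = sum(row[i] for row in matrix) - tp       # actually this class, predicted as another
        tn = total - tp - fp - fn
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        print(name, round(sensitivity, 2), round(specificity, 2))
    # cat 0.06 0.52, dog 0.11 0.52, horse 0.09 0.59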

Confusion Matrix

                                  Actual
              Thing 1    Thing 2    Thing 3    Thing 4
Predicted
  Thing 1        12        102         93         56
  Thing 2       112         23         77         36
  Thing 3        83         92         17         45
  Thing 4        12         45         68         48
SAMPLE QUESTION

What does a confusion matrix primarily help to evaluate in a classification model?

A. The feature importance


B. The model's classification accuracy
C. The correlation between variables
D. The computational complexity of the model

In a confusion matrix for a binary classifier, what does the term 'True Positives' (TP) refer to?
A. The instances correctly labeled as the negative class
B. The instances incorrectly labeled as the negative class
C. The instances correctly labeled as the positive class
D. The instances incorrectly labeled as the positive class

What does the 'False Negative' (FN) cell of a confusion matrix represent in a binary classification problem?

A. Positive instances incorrectly labeled as negative


B. Negative instances incorrectly labeled as positive
C. Positive instances correctly labeled as negative
D. Negative instances correctly labeled as positive
HOW TO MEASURE PURITY?

Entropy: a measure of impurity
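A minimal sketch, assuming the standard Shannon entropy definition commonly used to quantify node impurity in decision trees:

    import math

    def entropy(class_counts):
        """Shannon entropy of a class distribution; 0 means a completely pure node."""
        total = sum(class_counts)
        probabilities = [c / total for c in class_counts if c > 0]
        return -sum(p * math.log2(p) for p in probabilities)

    print(entropy([10, 0]))    # 0.0 -> completely pure
    print(entropy([5, 5]))     # 1.0 -> maximally impure for two classes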


TYPES OF ERROR

Training Error

• Fraction of training data misclassified


Test Error

• Fraction of test data misclassified

Generalization Error

• Probability of misclassifying new random data



The data used to train a machine learning model affects the correctness of the predictions it makes. Inadequate data may result in inconsistent results; generally, the more data you have, the higher the accuracy you can achieve.

Next Class on Bias and Variance Tradeoff
