
Model Evaluation and Selection
Unit 4
Model Evaluation
 Model evaluation is the process of using different evaluation metrics
to understand a machine learning model's performance, as well as its
strengths and weaknesses.
 It is a crucial step in the development and deployment of machine
learning systems.
 The primary goal of model evaluation is to determine how well the
model generalizes to unseen data and whether it meets the desired
objectives.
Model Evaluation techniques
 Data mining and machine learning
 Independent and dependent variables
 Splitting of data
 Underfitting and overfitting
 Model evaluation and selection
Data mining:

 Data mining is the process of discovering patterns, trends, and insights from large sets of data.
 It involves extracting useful information and knowledge from raw
data by using various techniques, including statistical analysis,
machine learning, and artificial intelligence.
 Data mining helps businesses and researchers make informed
decisions, predict future trends, identify relationships between
variables, and gain a deeper understanding of their data.
 It is widely used across industries such as finance, marketing,
healthcare, and telecommunications to uncover valuable
insights that can lead to improved strategies, increased
efficiency, and better decision-making.
Independent and dependent variable

Independent Variables: The variables that are not affected by other variables are called independent variables.
Dependent Variables: The variables that depend on other variables or factors are called dependent variables.
Machine Learning:
Training Data, Validation Data, Testing Data

 Training Data:
 Training data are collections of examples or samples that are
used to 'teach' or 'train' the machine learning model.
 The model uses a training data set to understand the patterns
and relationships within the data, thereby learning to make
predictions or decisions without being explicitly programmed to
perform a specific task.
 It is the set of data that is used to train and make the model
learn the hidden features/patterns present in the data.
 Validation Data:
 The validation data is a set of data that is used to validate the
model performance during training.
 This data is held aside from the training set and is used to check the
model's performance while the model is being developed, for example when tuning hyperparameters.
 After training a machine learning model using the training data,
the model's performance is evaluated using the validation data.
 This evaluation typically involves measuring metrics such as
accuracy, precision, recall, F1 score, or other relevant
performance indicators, depending on the nature of the
problem being solved.
 Testing Data:
 The testing data is used to evaluate the accuracy of the trained
algorithm.
 It is data that is held aside during the modelling process and used
only to evaluate the model after the modelling is complete.
 Test data has the same variables as the training data, the same
set of independent variables and the dependent variable.
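A minimal sketch of such a three-way split using scikit-learn; the iris dataset and the 60/20/20 ratio are only illustrative choices:

# Hypothetical split of a dataset into train (60%), validation (20%) and test (20%) sets.
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# First split off the test set (20% of the data).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Then split the remaining data into train and validation sets (0.25 x 0.8 = 0.2 of the whole).
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # roughly 90 / 30 / 30 samples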
Overfitting:
 Definition:
 Overfitting occurs when a model learns the training data too well,
capturing noise or random fluctuations in the data as if they were
genuine patterns. Consequently, the model performs well on the
training data but fails to generalize to new, unseen data.
 Characteristics:
 Low bias: The model has low bias because it fits the training data very
closely. Bias is the inability of an ML model to capture the true relationship
between the variables.
 High variance: However, it has high variance because it fails to
generalize well to unseen data. In ML, the difference in fit between
data sets is called variance.
 It will have excellent performance on training data but poor
performance on test data.
 Causes:
 Using a too complex model or algorithm.
 Having too many features relative to the amount of training data.
 Insufficient regularization. Regularization refers to techniques that
are used to calibrate machine learning models by adding a penalty to
the loss function, which discourages overly complex models and helps prevent
overfitting or underfitting. Using regularization, we can fit our machine learning
model appropriately and hence reduce its errors on unseen data.
Loss functions are a measurement of how good your model is in
terms of predicting the expected outcome.
 Regularization in machine learning is a technique used to prevent
overfitting and improve the generalization ability of a model.
 Remedies:
 Simplify the model by reducing the number of features or
decreasing its complexity.
 Cross-validation to tune hyperparameters and prevent
overfitting.
 Early stopping during training to prevent the model from
learning noise in the data.
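As a hedged illustration of the regularization idea mentioned under Causes, the sketch below compares an unregularized linear model with an L2-regularized (ridge) model on a small noisy dataset; the dataset and the alpha value are made up for illustration:

# Illustrative comparison of an unregularized and an L2-regularized linear model.
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=60, n_features=40, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

plain = LinearRegression().fit(X_tr, y_tr)
ridge = Ridge(alpha=10.0).fit(X_tr, y_tr)   # alpha controls the strength of the L2 penalty

# An overfit model tends to score much higher on training data than on test data.
print("plain:", plain.score(X_tr, y_tr), plain.score(X_te, y_te))
print("ridge:", ridge.score(X_tr, y_tr), ridge.score(X_te, y_te))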
Underfitting
Definition: Underfitting occurs when a model is too simple to capture the underlying
structure of the data. In other words, the model fails to learn the patterns in the
training data, resulting in poor performance not only on the training data but also on
unseen data (test data).
Characteristics:
 High bias: The model is biased toward a certain set of assumptions and fails to
capture the complexity of the data.
 Poor performance: Both on training and test data, the model's performance is
poor.
Causes:
 Using a too simple model or algorithm.
 Insufficient training data.
 Insufficient training time.
Remedies:
 Increase model complexity by adding more features or
increasing the model's capacity.
 Use more advanced algorithms that can capture complex
patterns.
 Gather more training data.
 Train the model for longer periods.
How to overcome overfitting and underfitting in a model?
 Introduce a validation set
 Variance-bias tradeoff
 Cross-validation
 Hyperparameter tuning
 Regularization
Introduce Validation set:
 A validation set is a set of data used to train the model with the goal of
finding and optimizing the best model to solve a given problem.
 The training set is used to train the model. The validation set is used to
fine-tune the model's hyperparameters. The test set serves as a
benchmark to assess the model's performance on new data.

Variance-bias tradeoff:
 If the algorithm is too simple, it may end up with high bias and low
variance and thus be error-prone. If the algorithm is too complex,
it may end up with high variance and low bias.
 In the latter condition, the model will not perform well on new data. There is
a middle ground between these two conditions, known as the
Bias-Variance Trade-off.
 This tradeoff in complexity is why there is a tradeoff between bias and
variance. An algorithm can't be more complex and less complex at the
same time.
Cross-validation
 Cross validation is a technique used in machine learning to
evaluate the performance of a model on unseen data. It involves
dividing the available data into multiple folds or subsets, using
one of these folds as a validation set, and training the model on
the remaining folds. This process is repeated multiple times,
each time using a different fold as the validation set. Finally, the
results from each validation step are averaged to produce a
more robust estimate of the model’s performance.
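A minimal sketch of this procedure with scikit-learn's cross_val_score; the 5-fold setting and the logistic regression model are only examples:

# Illustrative 5-fold cross-validation: the data are split into 5 folds and the
# model is trained/evaluated 5 times, each time holding out a different fold.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)        # one accuracy score per fold
print(scores, "mean accuracy:", scores.mean())     # averaged for a more robust estimate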
Hyperparameter tuning:
 When you're training machine learning models, each dataset and
model needs a different set of hyperparameters, which are
configuration settings that are not learned from the data. The only way to
determine good values is through multiple experiments, where you pick a set of
hyperparameters and run them through your model. This is called hyperparameter
tuning. In essence, you're training your model repeatedly with
different sets of hyperparameters.
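One common way to run such experiments is a grid search over candidate hyperparameter values combined with cross-validation; the SVC model and the parameter grid below are purely illustrative:

# Illustrative hyperparameter tuning with GridSearchCV.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}   # candidate hyperparameters
search = GridSearchCV(SVC(), param_grid, cv=5)                  # 5-fold CV for each combination
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)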
Model Evaluation Metrics
 Model evaluation is the process of using different evaluation metrics to
understand a machine learning model's performance, as well as its strengths
and weaknesses.
 To evaluate the performance of a classification model, different metrics are
used, and some of them are as follows:
1. Accuracy
2. Confusion Matrix
3. Precision
4. Recall
5. F-Score
6. AUC-ROC (Area Under the ROC Curve)
Confusion Matrix
 Classification is the process of categorizing a given set of data into
different categories.
 In machine learning, to measure the performance of the classification
model, we use the confusion matrix
 The confusion matrix is a tool used to evaluate the performance of a
model and is visually represented as a table.
 It provides a deeper layer of insight to data practitioners on the
model's performance, errors, and weaknesses.
The Confusion Matrix Structure

Let's learn about the basic structure of a confusion matrix, using the example of
identifying an email as spam or not spam.
 True Positive (TP) - Your model correctly predicted the
positive class. For example, identifying a
spam email as spam.
 True Negative (TN) - Your model correctly
predicted the negative class. For example,
identifying a regular email as not spam.
 False Positive (FP) - Your model incorrectly
predicted the positive class. For example,
identifying a regular email as spam.
 False Negative (FN) - Your model incorrectly
predicted the negative class. For example,
identifying a spam email as a regular email.
 In general, the table is divided into four
terminologies, which are as follows:
 True Positive (TP): In this case, the
model predicts the positive class, and the
actual class is also positive.
 True Negative (TN): In this case, the
model predicts the negative class, and the
actual class is also negative.
 False Positive (FP): In this case, the
model predicts the positive class, but the
actual class is negative.
 False Negative (FN): In this case, the
model predicts the negative class, but the
actual class is positive.
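A minimal sketch (with made-up labels and scikit-learn's confusion_matrix) showing how these four counts can be read off for the spam example, with 1 marking spam and 0 marking a regular email:

# Illustrative confusion matrix for a spam classifier (1 = spam, 0 = not spam).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual labels (hypothetical)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions (hypothetical)

# With labels=[1, 0] the matrix is laid out as [[TP, FN], [FP, TN]].
tp, fn, fp, tn = confusion_matrix(y_true, y_pred, labels=[1, 0]).ravel()
print("TP:", tp, "FN:", fn, "FP:", fp, "TN:", tn)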
Accuracy
 Accuracy is a metric that measures how often a
machine learning model correctly predicts the outcome.
 You can calculate accuracy by dividing the number of
correct predictions by the total number of predictions

It can be formulated as:
Accuracy = Number of correct predictions / Total number of predictions
         = (TP + TN) / (TP + TN + FP + FN)
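Continuing the hypothetical spam labels from the confusion matrix sketch above, accuracy can be computed directly from the formula or with scikit-learn's accuracy_score:

# Illustrative accuracy computation (labels are hypothetical).
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

correct = sum(t == p for t, p in zip(y_true, y_pred))
print(correct / len(y_true))              # 6 correct out of 8 -> 0.75
print(accuracy_score(y_true, y_pred))     # same value via scikit-learn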
Precision
 The precision metric is used to overcome a limitation of accuracy. Precision
determines the proportion of positive predictions that were actually
correct.
 Precision is defined as the ratio of correctly classified positive samples (True
Positive) to a total number of classified positive samples (either correctly or
incorrectly).
 Precision measures the accuracy of the positive predictions made by the
model.
 It is the ratio of true positive predictions to all positive predictions made by the
model, including both true positives and false positives
 A high precision indicates that the model has a low false positive rate, meaning
it is good at avoiding misclassifying negative instances as positive.
 Precision = TP / (TP + FP)
 TP - True Positive
 FP - False Positive
Examples to calculate the Precision in the
machine learning model
 Below are some examples for
calculating Precision in Machine
Learning:
 Case 1- In the below-mentioned
scenario, the model correctly
classified two positive samples while
incorrectly classified one negative
sample as positive. Hence, according
to precision formula;

 Precision = TP / (TP + FP)
 Precision = 2 / (2 + 1) = 2/3 = 0.667
 Case 2- In this scenario, we have
three Positive samples that are
correctly classified, and one
Negative sample is incorrectly
classified.

 Putting TP = 3 and FP = 1 in the precision formula, we get:
 Precision = 3 / (3 + 1) = 0.75
 Case 3- In this scenario, we have three Positive samples
that are correctly classified but no Negative sample which
is incorrectly classified.
 Putting TP = 3 and FP = 0 in the precision formula, we get:
 Precision = TP / (TP + FP)
 Precision = 3 / (3 + 0) = 3/3 = 1
 Hence, in the last scenario, we have a precision value of
1 or 100% when all positive samples are classified as
positive and no negative sample is
incorrectly classified.
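Case 1 can also be checked in code; the labels below are made up so that there are two true positives and one false positive:

# Illustrative precision computation for Case 1 (TP = 2, FP = 1).
from sklearn.metrics import precision_score

y_true = [1, 1, 0, 0, 1]   # hypothetical actual labels
y_pred = [1, 1, 1, 0, 0]   # two positives predicted correctly, one negative predicted as positive

print(precision_score(y_true, y_pred))   # 2 / (2 + 1) = 0.667 (approximately)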
Recall:
 It is similar to the precision metric; however, it aims to calculate the
proportion of actual positives that were identified correctly.
 It is the ratio of true positive predictions to all actual positive instances in
the dataset, including both true positives and false negatives.
 Recall measures the ability of the model to correctly identify all positive
samples.
 A high recall indicates that the model has a low false negative rate,
meaning it is good at capturing positive instances without missing many.
 Examples to calculate the Recall in the machine learning model
Example 1 - Let's understand the calculation of Recall with a few different cases
that differ in how the positive samples are classified.

In this case, we have two positive samples that are correctly classified as positive,
while one positive sample is incorrectly classified as negative.
Hence, the true positive count is 2 and the false negative count is 1.
Recall = TP / (TP + FN)
       = 2 / (2 + 1)
       = 0.667
Example-2
Now, we have another scenario where all positive
samples are classified correctly as positive. Hence,
the true positive count is 3 while the false negative
count is 0.

Recall = TP / (TP + FN)
       = 3 / (3 + 0)
       = 1

If the recall is 100%, it tells us that the model has
detected all positive samples as positive; it says nothing about
how the negative samples are classified by
the model.
 Example-3
In this scenario, the model does not classify any positive sample correctly.
All positive samples are incorrectly classified as
negative. Hence, the true positive count is 0, and the false negative count is 3.
Then Recall will be:
Recall = TP / (TP + FN)
       = 0 / (0 + 3)
       = 0
This means the model has not correctly classified any Positive Samples.
Note:
From the above definitions of Precision and Recall, we can say that recall
determines the performance of a classifier with respect to a false negative,
whereas precision gives information about the performance of a classifier with
respect to a false positive.
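As a small check of Example 1, the same kind of made-up labels (two true positives, one false negative) give the matching recall value with scikit-learn's recall_score:

# Illustrative recall computation for Example 1 (TP = 2, FN = 1).
from sklearn.metrics import recall_score

y_true = [1, 1, 0, 0, 1]   # hypothetical actual labels
y_pred = [1, 1, 1, 0, 0]   # one positive sample is missed (false negative)

print(recall_score(y_true, y_pred))   # 2 / (2 + 1) = 0.667 (approximately)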
F1 Score
 The F1 score is the harmonic mean of precision and recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
 It balances both precision and recall, making it a useful metric for
situations where there is an imbalance between the classes or when both
false positives and false negatives are equally important.

 The F1 score ranges from 0 to 1, where 1 indicates perfect precision and
recall, and 0 indicates the worst possible performance.
 It's used as a single measure to compare different models or to tune
model parameters in classification tasks.
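A short sketch computing the F1 score from hypothetical labels, both from the harmonic-mean formula and with scikit-learn's f1_score:

# Illustrative F1 score computation (labels are hypothetical).
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
print(2 * p * r / (p + r))          # harmonic mean of precision and recall
print(f1_score(y_true, y_pred))     # same value via scikit-learn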
Difference between Precision and Recall in Machine
Learning:
Precision measures how many of the predicted positives are actually positive (it is sensitive to false positives), while recall measures how many of the actual positives are correctly identified (it is sensitive to false negatives).
What is the purpose of using validation set during model
evaluation?

 Model Assessment: The validation set provides an independent dataset
that was not used during training.
 Hyperparameter Tuning: Machine learning models often have
hyperparameters that need to be tuned for optimal performance. These
hyperparameters cannot be learned from the training data and need to be set
based on performance on a validation set.
 Preventing Overfitting: Regularly evaluating the model on a validation set
helps to monitor for signs of overfitting. If the model's performance on the
training set continues to improve while its performance on the validation set
starts to degrade, it may be a sign that the model is overfitting the training
data. In such cases, adjustments can be made to prevent overfitting, such as
reducing model complexity or adding regularization.
 Avoiding Data Leakage: Using a separate validation set ensures that the
model is evaluated on data that it has not seen during training. This helps to
avoid data leakage, where information from the validation set inadvertently
influences the model's training process, leading to overly optimistic
performance estimates.
Explain the concept of confusion matrices and their role in model evaluation.

A confusion matrix is a table that summarizes the performance of a classification
model by comparing the predicted class labels with the actual class labels in the test
dataset. It provides a detailed breakdown of the model's predictions, highlighting the
correct and incorrect classifications for each class. Confusion matrices are a
fundamental tool for evaluating the performance of classification models and
assessing their strengths and weaknesses.
The components of a confusion matrix include:
 True Positives (TP): Instances that are correctly predicted as belonging to the
positive class.
 True Negatives (TN): Instances that are correctly predicted as belonging to the
negative class.
 False Positives (FP): Instances that are incorrectly predicted as belonging to the
positive class when they actually belong to the negative class (Type I error).
 False Negatives (FN): Instances that are incorrectly predicted as belonging to
the negative class when they actually belong to the positive class (Type II error).
The role of confusion matrices in model evaluation includes:
 Performance Metrics Calculation: Confusion matrices are used to compute
various performance metrics such as accuracy, precision, recall (sensitivity),
specificity, F1-score, and area under the receiver operating characteristic (ROC)
curve (AUC-ROC). These metrics provide insights into different aspects of the
model's performance and help assess its effectiveness in classifying instances
correctly.
 Understanding Model Behavior: Confusion matrices provide a detailed
breakdown of the model's predictions for each class, allowing analysts to
understand which classes are being correctly classified and which are being
confused with others. This information helps identify potential sources of errors
and guides model improvement efforts.
 Class Imbalance Assessment: In datasets with class imbalance (i.e., unequal
distribution of classes), confusion matrices help assess how well the model
performs for minority classes. By examining the distribution of true positives and
false negatives across classes, analysts can identify whether the model is biased
towards predicting the majority class and take appropriate corrective measures.
 Model Selection and Comparison: Confusion matrices facilitate the
comparison of multiple models by providing a concise summary of their
classification performance. Analysts can compare the distributions of TP, FP, FN,
and TN across models to identify the most suitable one for their specific use case.
Describe the importance of selecting appropriate
performance measures when evaluating machine learning
models
 Performance Assessment: Metrics help you quantify how well your model is
doing. Without them, it’s challenging to compare different models or
iterations of the same model.
 Goal Alignment: Choosing the right metric aligns your model’s performance
assessment with the ultimate goal of your project. For example, in a medical
diagnosis task, you might prioritize minimizing false negatives (missed
diagnoses) over false positives (false alarms).
 Model Selection: When experimenting with different algorithms and hyper
parameters, evaluation metrics guide you in choosing the best-performing
model.
 Iterative Improvement: Metrics provide feedback that allows you to fine-tune
your model, addressing its weaknesses and enhancing its strengths.
Define cross-validation and explain its importance in model
evaluation.
Cross-validation is a technique used to assess the performance of a machine learning model by
partitioning the dataset into multiple subsets, training the model on some subsets, and evaluating
it on the remaining subset(s). The main idea behind cross-validation is to use multiple train-test
splits of the data to obtain more reliable estimates of the model's performance.

Importance of Cross-Validation in Model Evaluation:


1.Robustness: Cross-validation provides a more robust estimate of a model's performance by
averaging the evaluation results across multiple train-test splits. This helps reduce the variability in
performance estimates that can arise from using a single train-test split.

2.Avoiding Overfitting: Cross-validation helps detect overfitting by assessing how well the model
generalizes to unseen data. By evaluating the model on multiple test sets, it becomes more
evident if the model is learning patterns specific to the training data rather than capturing
underlying relationships.

3.Hyperparameter Tuning: Cross-validation is essential for tuning hyperparameters effectively.
By evaluating the model's performance across different parameter values or model configurations,
practitioners can select the optimal set of hyperparameters that yield the best performance on
average.

4.Maximizing Data Usage: Cross-validation allows for maximal utilization of the available data
by using each data point for both training and testing purposes across different folds. This helps in
making the most out of limited data resources, especially in cases where the dataset is small.
Describe at least two commonly used techniques for cross-
validation and discuss their differences

Commonly Used Techniques for Cross-Validation:


1. K-Fold Cross-Validation:
•In k-fold cross-validation, the dataset is divided into k equal-sized folds. The model is
trained and evaluated k times, each time using a different fold as the test set and the
remaining folds as the training set. The performance metrics are then averaged
across all folds to obtain the final performance estimate.
•Advantages: Provides a robust estimate of performance, ensures that each data
point is used for both training and testing, suitable for most machine learning tasks.
•Disadvantages: Can be computationally expensive, especially for large datasets
and complex models.

2. Stratified K-Fold Cross-Validation:


•Stratified k-fold cross-validation is similar to k-fold cross-validation, but it ensures
that each fold preserves the class distribution of the original dataset. This is
particularly useful for imbalanced datasets where one class is much more prevalent
than others.
•Advantages: Ensures that class distribution is maintained in each fold, leading to
more reliable performance estimates for imbalanced datasets.
•Disadvantages: Adds computational overhead compared to regular k-fold cross-
validation.
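A minimal sketch contrasting the two techniques on a made-up imbalanced dataset; stratified folds keep roughly the same class proportions in every fold, while plain k-fold may leave some folds without any minority-class samples:

# Illustrative comparison of KFold and StratifiedKFold on an imbalanced dataset.
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 16 + [1] * 4)      # imbalanced labels: 80% class 0, 20% class 1

for name, splitter in [("KFold", KFold(n_splits=4)),
                       ("StratifiedKFold", StratifiedKFold(n_splits=4))]:
    print(name)
    for _, test_idx in splitter.split(X, y):
        # Stratified folds each contain one positive sample; plain KFold may contain none.
        print("  positives in test fold:", int(y[test_idx].sum()))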
Define bias and variance in the context of machine learning
models.
Bias:
Bias refers to the error introduced by approximating a real-world problem with a simplified model. A model
with high bias makes strong assumptions about the underlying data distribution and may fail to capture the
true relationship between the features and the target variable. High bias often leads to underfitting, where
the model is too simple to capture the complexities of the data.
Characteristics of a model with high bias:
 Consistently makes systematic errors regardless of the training data.
 Tends to oversimplify the underlying data distribution.
 Poor performance on both training and test datasets.
Variance:
Variance refers to the error introduced by the model's sensitivity to fluctuations or noise in the training
data. A model with high variance is overly complex and captures random fluctuations in the training data
as if they were meaningful patterns. High variance often leads to overfitting, where the model performs
well on the training data but poorly on unseen data.
Characteristics of a model with high variance:
 Fits the training data too closely and captures noise or outliers.
 Sensitive to small changes in the training data.
 Good performance on the training dataset but poor generalization to unseen data.
How does Bias and Variance contribute to the overall error of a
model?
The overall error of a model can be decomposed into three components: bias,
variance, and irreducible error.
 Bias-Variance Trade-off: Bias and variance have an inverse relationship – as
one decreases, the other increases. Achieving a balance between bias and
variance is essential for minimizing the overall error of the model.
 Irreducible Error: Irreducible error is the inherent noise or randomness in the
data that cannot be reduced by the model. It sets a lower bound on the overall
error that cannot be eliminated, even with a perfect model.
 Bias-Variance Decomposition: The bias-variance decomposition of the
mean squared error (MSE) provides a framework for understanding how bias
and variance contribute to the overall error of the model. The MSE can be
expressed as the sum of squared bias, variance, and irreducible error.
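For squared-error loss, this decomposition is usually written as follows (a standard textbook form, with sigma squared denoting the irreducible noise):

\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2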
When would you prioritize one model evaluation metric over another, and what insights can
each metric provide about model performance?

The choice of model evaluation metric depends on the specific characteristics of the problem at hand, the
objectives of the analysis, and the preferences of stakeholders.
 Accuracy is suitable when all classes are equally important, and false positives and false negatives have
similar costs.
 Precision and recall are important when the costs of false positives and false negatives differ significantly,
such as in medical diagnosis or fraud detection.
 F1-score is appropriate when there is an uneven class distribution or when both precision and recall need
to be considered simultaneously.

Explain the differences between accuracy, precision, recall, and F1 score.


 Accuracy measures the overall correctness of the model's predictions, while precision, recall,
and F1 score focus on specific aspects of the model's performance related to true positives,
false positives, and false negatives.
 Precision emphasizes the avoidance of false positives, recall emphasizes the capture of true
positives, and the F1 score balances these two aspects.
 Accuracy may be misleading when classes are imbalanced, whereas precision, recall, and F1
score provide more nuanced insights into model performance, especially in scenarios with
class imbalance.
