SML Updated UNIT 4
Unit 4
Model Evaluation
Model evaluation is the process of using different evaluation metrics
to understand a machine learning model's performance, as well as its
strengths and weaknesses.
It is a crucial step in the development and deployment of machine
learning systems.
The primary goal of model evaluation is to determine how well the
model generalizes to unseen data and whether it meets the desired
objectives.
Model Evaluation techniques
Training Data:
Training data is a collection of examples or samples that is
used to 'teach' or 'train' the machine learning model.
The model uses a training data set to understand the patterns
and relationships within the data, thereby learning to make
predictions or decisions without being explicitly programmed to
perform a specific task.
It is the set of data that is used to train the model and make it
learn the hidden features/patterns present in the data.
Validation Data:
The validation data is a set of data that is used to validate the
model performance during training.
This data is held aside from training and is used to evaluate
the model during development, for example when comparing
candidate models or tuning hyperparameters.
After training a machine learning model using the training data,
the model's performance is evaluated using the validation data.
This evaluation typically involves measuring metrics such as
accuracy, precision, recall, F1 score, or other relevant
performance indicators, depending on the nature of the
problem being solved.
Testing Data:
The testing data is used to evaluate the accuracy of the trained
algorithm.
This data is held aside during the modelling process and is used
only to evaluate the model after modelling is complete.
Test data has the same variables as the training data: the same
set of independent variables and the same dependent variable.
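As a sketch of the three-way split described above (plain Python, with a made-up dataset of 20 samples and an assumed 70/15/15 ratio):

```python
# Hypothetical example: splitting 20 samples into training (70%),
# validation (15%), and test (15%) sets. In practice the data would
# be shuffled first and each sample would carry features and a label.
data = list(range(20))           # stand-in for 20 labelled samples

n_train = int(0.70 * len(data))  # 14 samples for training
n_val = int(0.15 * len(data))    # 3 samples for validation

train_set = data[:n_train]                 # used to fit the model
val_set = data[n_train:n_train + n_val]    # used to tune/compare models
test_set = data[n_train + n_val:]          # held aside until the very end
```

The three sets are disjoint and together cover the whole dataset, so no sample is used for both fitting and final evaluation.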
Overfitting:
Definition:
Overfitting occurs when a model learns the training data too well,
capturing noise or random fluctuations in the data as if they were
genuine patterns. Consequently, the model performs well on the
training data but fails to generalize to new, unseen data.
Characteristics:
Low bias: The model has low bias because it fits the training data very
closely. Bias is the inability of an ML model to capture the true
relationship between variables.
High variance: However, it has high variance because it fails to
generalize well to unseen data. In ML, the difference in fits between
data sets is called variance.
It will have excellent performance on training data but poor
performance on test data.
Causes:
Using a model or algorithm that is too complex.
Having too many features relative to the amount of training data.
Insufficient regularization. Regularization refers to techniques that
constrain a machine learning model in order to minimize an adjusted
loss function and prevent overfitting or underfitting.
Using regularization, we can fit our machine learning model
appropriately on a given data set and hence reduce its errors on
unseen data.
A loss function is a measurement of how good your model is at
predicting the expected outcome.
In short, regularization in machine learning is a technique used to
prevent overfitting and improve the generalization ability of a model.
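The adjusted loss function described above can be sketched as mean squared error plus an L2 (ridge) penalty on the weights; all numbers below are made up for illustration:

```python
# Sketch of a regularized loss: MSE plus an L2 (ridge) penalty.
# The penalty grows with the weights, discouraging overly complex fits.
y_true = [3.0, -0.5, 2.0, 7.0]   # hypothetical targets
y_pred = [2.5, 0.0, 2.0, 8.0]    # hypothetical model predictions
weights = [0.5, -1.5]            # hypothetical model weights
lam = 0.1                        # regularization strength (a hyperparameter)

# Mean squared error: average squared difference between truth and prediction
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# L2 penalty: lambda times the sum of squared weights
l2_penalty = lam * sum(w ** 2 for w in weights)

loss = mse + l2_penalty
```

Larger `lam` makes the penalty dominate (pushing weights toward zero and risking underfitting); `lam = 0` removes regularization entirely.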
Remedies:
Simplify the model by reducing the number of features or
decreasing its complexity.
Use cross-validation to tune hyperparameters and prevent
overfitting.
Early stopping during training to prevent the model from
learning noise in the data.
Underfitting
Definition: Underfitting occurs when a model is too simple to capture the underlying
structure of the data. In other words, the model fails to learn the patterns in the
training data, resulting in poor performance not only on the training data but also on
unseen (test) data.
Characteristics:
High bias: The model is biased toward a certain set of assumptions and fails to
capture the complexity of the data.
Poor performance: Both on training and test data, the model's performance is
poor.
Causes:
Using a model or algorithm that is too simple.
Insufficient training data.
Insufficient training time.
Remedies:
Increase model complexity by adding more features or
increasing the model's capacity.
Use more advanced algorithms that can capture complex
patterns.
Gather more training data.
Train the model for longer periods.
How to overcome overfitting and underfitting in a model?
Variance-bias trade-off:
If the algorithm is too simple, it may be in a high-bias, low-variance
condition and thus be error-prone. If the algorithm fits too complex a
model, it may be in a high-variance, low-bias condition.
In the latter condition, the model will not perform well on new entries.
There is a middle ground between these two conditions, known as the
trade-off or bias-variance trade-off.
This trade-off in complexity is why there is a trade-off between bias and
variance: an algorithm cannot be more complex and less complex at the
same time.
Cross-validation
Cross validation is a technique used in machine learning to
evaluate the performance of a model on unseen data. It involves
dividing the available data into multiple folds or subsets, using
one of these folds as a validation set, and training the model on
the remaining folds. This process is repeated multiple times,
each time using a different fold as the validation set. Finally, the
results from each validation step are averaged to produce a
more robust estimate of the model’s performance.
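The fold-by-fold procedure above can be sketched in plain Python. The "model" here is a trivial baseline that predicts the mean of the training folds, standing in for a real learner; the data is synthetic:

```python
# Minimal k-fold cross-validation sketch (no libraries).
def k_fold_scores(data, k):
    fold_size = len(data) // k
    scores = []
    for i in range(k):
        # Hold out fold i as the validation set...
        val = data[i * fold_size:(i + 1) * fold_size]
        # ...and train on everything else.
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        pred = sum(train) / len(train)   # "train" the mean baseline
        # Score: mean squared error on the held-out fold
        mse = sum((v - pred) ** 2 for v in val) / len(val)
        scores.append(mse)
    return scores

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = k_fold_scores(data, k=3)           # one score per fold
avg_score = sum(scores) / len(scores)       # averaged estimate
```

Every sample serves as validation data exactly once, and the averaged score is less sensitive to any single lucky or unlucky split.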
Hyperparameter tuning:
When you're training machine learning models, each dataset and
model needs a different set of hyperparameters, which are a
kind of variable. The only way to determine these is through
multiple experiments, where you pick a set of hyperparameters
and run them through your model. This is called hyperparameter
tuning. In essence, you're training your model sequentially with
different sets of hyperparameters.
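A minimal sketch of this experiment loop, assuming a 1-D ridge regression with the closed-form solution w = Σxy / (Σx² + λ) and synthetic data; each candidate λ is fitted on the training set and scored on the validation set:

```python
# Hyperparameter tuning sketch: try several regularization strengths
# and keep the one with the lowest validation error.
x_train, y_train = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # noiseless y = 2x
x_val, y_val = [4.0, 5.0], [8.0, 10.0]

def fit_ridge(xs, ys, lam):
    # Closed-form 1-D ridge solution: w = sum(x*y) / (sum(x^2) + lam)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

best_lam, best_err = None, float("inf")
for lam in [0.0, 0.1, 1.0, 10.0]:          # candidate hyperparameters
    w = fit_ridge(x_train, y_train, lam)
    err = sum((y - w * x) ** 2 for x, y in zip(x_val, y_val)) / len(x_val)
    if err < best_err:
        best_lam, best_err = lam, err
```

Because this toy data is noiseless, the unregularized fit (λ = 0) wins; on real, noisy data a nonzero λ typically gives the lowest validation error.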
Model Evaluation Metrics
To evaluate the performance of a classification model, different metrics are
used, and some of them are as follows:
1. Accuracy
2. Confusion Matrix
3. Precision
4. Recall
5. F-Score
6. AUC-ROC (Area Under the ROC Curve)
Confusion Matrix
Classification is the process of categorizing a given set of data into
different categories.
In machine learning, to measure the performance of the classification
model, we use the confusion matrix
The confusion matrix is a tool used to evaluate the performance of a
model and is visually represented as a table.
It provides a deeper layer of insight to data practitioners on the
model's performance, errors, and weaknesses.
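A minimal sketch of building a 2x2 confusion matrix by hand for binary labels (1 = positive, 0 = negative); the predictions here are made up:

```python
# Count the four confusion-matrix cells by comparing truth to prediction.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]   # hypothetical actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical model predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives

confusion = [[tp, fn],
             [fp, tn]]   # rows: actual class, columns: predicted class
```

The four cells always sum to the number of samples, and the off-diagonal cells (FN, FP) expose the two distinct kinds of error the model makes.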
The Confusion Matrix Structure
Case 1: In this scenario, two Positive samples are correctly
classified, and one Negative sample is incorrectly classified as
Positive (TP = 2, FP = 1).
Precision = TP/(TP + FP) = 2/(2 + 1) = 2/3 = 0.667
Case 2: In this scenario, three Positive samples are correctly
classified, and one Negative sample is incorrectly classified; no
Positive samples are missed (TP = 3, FN = 0).
Recall = TP/(TP + FN) = 3/(3 + 0) = 1
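The two worked cases above can be checked in code, using the counts taken from the examples (Case 1: TP = 2, FP = 1; Case 2: TP = 3, FN = 0):

```python
# Precision: of everything predicted positive, how much really was?
def precision(tp, fp):
    return tp / (tp + fp)

# Recall: of everything actually positive, how much did we find?
def recall(tp, fn):
    return tp / (tp + fn)

case1_precision = precision(tp=2, fp=1)   # 2 / (2 + 1)
case2_recall = recall(tp=3, fn=0)         # 3 / (3 + 0)
```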
2. Avoiding Overfitting: Cross-validation helps detect overfitting by assessing how well the model
generalizes to unseen data. By evaluating the model on multiple test sets, it becomes more
evident if the model is learning patterns specific to the training data rather than capturing
underlying relationships.
4. Maximizing Data Usage: Cross-validation allows for maximal utilization of the available data
by using each data point for both training and testing purposes across different folds. This helps in
making the most out of limited data resources, especially in cases where the dataset is small.
Describe at least two commonly used techniques for cross-
validation and discuss their differences
The choice of model evaluation metric depends on the specific characteristics of the problem at hand, the
objectives of the analysis, and the preferences of stakeholders.
Accuracy is suitable when all classes are equally important, and false positives and false negatives have
similar costs.
Precision and recall are important when the costs of false positives and false negatives differ significantly,
such as in medical diagnosis or fraud detection.
F1-score is appropriate when there is an uneven class distribution or when both precision and recall need
to be considered simultaneously.
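These points can be illustrated on a small synthetic example: a model that predicts "negative" almost everywhere looks accurate on imbalanced data but has poor recall, which the F1-score exposes:

```python
# Imbalanced toy data: 2 positives among 10 samples (labels are made up).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # actual labels
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # model finds only one positive

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
```

Accuracy comes out high (9 of 10 correct) even though half the positives are missed; recall and the F1-score make that failure visible.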