ML Unit-3 - RTU
Introduction to Statistical Learning Theory

Evaluating Machine Learning Models and Model Selection

Model Selection→
Model selection is the process of choosing the best machine learning model for a given task. It is
done by comparing various candidate models on chosen evaluation metrics calculated on a designed
evaluation scheme. Choosing the correct evaluation scheme, whether a simple train-test split
or a complex cross-validation strategy, is the crucial first step of building any machine learning
solution.
Model Evaluation→
Model evaluation is the process of assessing the model's performance on a chosen evaluation
setup. It is done by calculating quantitative performance metrics like F1 score or RMSE, or by
having subject-matter experts assess the results qualitatively. The machine learning evaluation
metrics you choose should reflect the business metrics you want to optimize with the machine
learning solution.
Steps for Model Selection and Evaluation:
Step 1: Choose a proper validation strategy. Without a reliable way to validate your model
performance, no amount of hyperparameter tuning and state-of-the-art models will help you.
Step 2: Choose the right evaluation metric. Figure out the business case behind your model and
try to use the machine learning metric that correlates with it. Typically, no single metric is ideal
for the problem, so calculate multiple metrics and base your decision on them.
Sometimes you need to combine classic ML metrics with a subject-matter-expert evaluation,
and that is fine.
Step 3: Keep track of your experiment results. Whether you use a spreadsheet or a
dedicated experiment tracker, make sure to log all the important metrics, learning curves,
dataset versions, and configurations. You will thank yourself later.
Step 4: Compare experiments and pick a winner. Regardless of the metrics and validation
strategy you choose, at the end of the day you want to find the best model. In practice, no model
is ever truly "best", but some are good enough.
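As a rough illustration of these steps, here is a minimal sketch (assuming scikit-learn is installed; the synthetic dataset, the two candidate models, and the choice of F1 as the metric are illustrative assumptions) that fixes a validation strategy, evaluates the candidates, logs the scores, and picks the better one.

# Minimal sketch of the selection workflow described above.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)  # placeholder data

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
}

results = {}  # Step 3: keep track of experiment results
for name, model in candidates.items():
    # Steps 1 and 2: 5-fold cross-validation with F1 as the evaluation metric
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    results[name] = scores.mean()

# Step 4: compare experiments and pick a winner
best = max(results, key=results.get)
print(results, "->", best)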

Model Selection Strategies:


1. Resampling methods-
Resampling methods, as the name suggests, are simple techniques of rearranging data samples
to check whether the model performs well on data samples that it has not been trained on. In
other words, resampling helps us understand whether the model will generalize well.
2. Random Split-
Random Splits are used to randomly sample a percentage of data into training, testing, and
preferably validation sets. The advantage of this method is that there is a good chance that the
original population is well represented in all three sets. In more formal terms, random
splitting helps prevent biased sampling of the data.
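A minimal sketch of a random split using scikit-learn's train_test_split (the synthetic data and the 60/20/20 proportions are illustrative assumptions):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)          # illustrative feature matrix
y = np.random.randint(0, 2, 1000)    # illustrative binary labels

# First carve out 20% as the test set, then split the rest 75/25 into
# train/validation, giving roughly 60% train, 20% validation, 20% test overall.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)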
3. Time-Based Split-
There are some types of data where random splits are not possible. For example, if we have to
train a model for weather forecasting, we cannot randomly divide the data into training and
testing sets. This will jumble up the seasonal pattern! Such data is often referred to as
time-series data.
In such cases, a time-wise split is used. The training set can have data for the last three years
and 10 months of the present year. The last two months can be reserved for the testing or
validation set.
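A minimal sketch of a time-based split using pandas (the illustrative DataFrame, date range, and cut-off date are assumptions):

import pandas as pd

# Illustrative time-indexed data: roughly four years of daily observations.
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=1400, freq="D"),
    "value": range(1400),
})

cutoff = pd.Timestamp("2023-09-01")   # hold out the last two months (illustrative)
train = df[df["date"] < cutoff]       # older data for training
test = df[df["date"] >= cutoff]       # most recent data for testing/validation

For rolling evaluation on time series, scikit-learn also offers TimeSeriesSplit, which generates time-ordered folds instead of random ones.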
4. K-Fold Cross-Validation
The cross-validation technique works by randomly shuffling the dataset and then splitting it
into k groups. Thereafter, on iterating over each group, the group needs to be considered as a
test set while all other groups are clubbed together into the training set. The model is tested on
the test group and the process continues for k groups.
Thus, by the end of the process, one has k different results on k different test groups. These
scores are usually averaged, and the candidate model with the best average score is selected.
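A minimal sketch of k-fold cross-validation with scikit-learn (the synthetic data, k = 5, and logistic regression are illustrative choices):

import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = np.random.rand(200, 4)          # illustrative features
y = np.random.randint(0, 2, 200)    # illustrative labels

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    # Each group takes a turn as the test set; the rest form the training set.
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

print(np.mean(scores))  # average score across the k folds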
5. Bootstrap
Bootstrap is one of the most powerful ways to obtain a stabilized model. It is close to the
random splitting technique since it follows the concept of random sampling.
The first step is to select a sample size (which is usually equal to the size of the original dataset).
Thereafter, a data point is randomly selected from the original dataset and added to the
bootstrap sample. After the addition, the point is put back into the original dataset, i.e. the
sampling is done with replacement. This process is repeated N times, where N is the sample size.
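A minimal sketch of drawing one bootstrap sample with NumPy (the tiny dataset is illustrative):

import numpy as np

data = np.arange(10)                  # illustrative original dataset
N = len(data)                         # bootstrap sample size = original size

rng = np.random.default_rng(0)
indices = rng.integers(0, N, size=N)  # sample N indices with replacement
bootstrap_sample = data[indices]

# Points never drawn form the "out-of-bag" set, often used for evaluation.
out_of_bag = np.setdiff1d(data, bootstrap_sample)
print(bootstrap_sample, out_of_bag)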
6. Probabilistic measures
Probabilistic Measures do not just take into account the model performance but also the model
complexity. Model complexity is the measure of the model’s ability to capture the variance in
the data.
For example, a high-bias model such as linear regression is less complex, whereas a neural
network is very high in complexity.
7. Bayesian Information Criterion (BIC)
BIC was derived from the Bayesian probability concept and is suited for models that are trained
under maximum likelihood estimation. It is computed as:

BIC = K·ln(N) − 2·ln(L)

where
K = number of independent variables (estimated parameters)
L = maximized value of the likelihood function
N = number of samples/data points in the training set
BIC penalizes the model for its complexity and is preferably used when the size of the dataset
is not very small (otherwise it tends to settle on very simple models).
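A minimal sketch of computing BIC from a fitted model's maximized log-likelihood (the Gaussian-noise linear regression setup, the synthetic data, and the variable names are assumptions for illustration):

import numpy as np

# Illustrative data and an ordinary least squares fit.
rng = np.random.default_rng(0)
N, K = 100, 3                               # N data points, K fitted parameters
X = rng.normal(size=(N, K))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=N)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
sigma2 = np.mean(residuals**2)

# Maximized Gaussian log-likelihood of the fitted model.
log_L = -0.5 * N * (np.log(2 * np.pi * sigma2) + 1)

BIC = K * np.log(N) - 2 * log_L             # lower BIC is better
print(BIC)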
Evaluation of ML Models:
Models can be evaluated using multiple metrics. However, the right choice of evaluation
metric is crucial and often depends upon the problem being solved. A clear understanding
of a wide range of metrics helps the evaluator find an appropriate match between the
problem statement and a metric.
1. Classification metrics (Confusion Matrix)-
For every classification model prediction, a matrix called the confusion matrix can be
constructed which demonstrates the number of test cases correctly and incorrectly classified.
It looks something like this (considering 1 = Positive and 0 = Negative as the target classes):

              Actual 0                Actual 1
Predicted 0   True Negatives (TN)     False Negatives (FN)
Predicted 1   False Positives (FP)    True Positives (TP)

TN: Number of negative cases correctly classified
TP: Number of positive cases correctly classified
FN: Number of positive cases incorrectly classified as negative
FP: Number of negative cases incorrectly classified as positive
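A minimal sketch of building a confusion matrix with scikit-learn (the label vectors are illustrative; note that scikit-learn arranges rows as actual classes, i.e. transposed relative to the table above):

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]   # actual classes (illustrative)
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]   # model predictions (illustrative)

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)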
2. Accuracy-
Accuracy is the simplest metric and can be defined as the number of test cases correctly
classified divided by the total number of test cases:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

It can be applied to most generic problems but is not very useful when it comes to imbalanced
datasets.
3. Precision-
Precision is the metric used to identify the correctness of positive classifications:

Precision = TP / (TP + FP)

Intuitively, this equation is the ratio of correct positive classifications to the total number of
predicted positive classifications. The greater the fraction, the higher the precision, which
means the better the model is at correctly classifying the positive class.
4. Recall-
Recall tells us the number of positive cases correctly identified out of the total number of
positive cases:

Recall = TP / (TP + FN)
5. F1 Score-
F1 score is the harmonic mean of Recall and Precision and therefore balances out the strengths
of each:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
It is useful in cases where both recall and precision can be valuable – like in the identification
of plane parts that might require repairing. Here, precision will be required to save on the
company’s cost (because plane parts are extremely expensive) and recall will be required to
ensure that the machinery is stable and not a threat to human lives.
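A minimal sketch computing accuracy, precision, recall, and F1 score with scikit-learn (the label vectors are illustrative):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))    # (TP + TN) / total cases
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred))          # harmonic mean of precision and recall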

6. AUC-ROC-
ROC curve is a plot of the true positive rate (recall, TP / (TP + FN)) against the false positive
rate (FP / (FP + TN)) at various classification thresholds.
AUC-ROC stands for Area Under the Receiver Operating Characteristic curve; the higher the
area, the better the model performance.
If the curve is somewhere near the 50% diagonal line, it suggests that the model randomly
predicts the output variable.
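A minimal sketch of computing ROC-AUC from predicted probabilities with scikit-learn (the labels and scores are illustrative):

from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_scores = [0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6]   # predicted probabilities of class 1

fpr, tpr, thresholds = roc_curve(y_true, y_scores)    # points along the ROC curve
auc = roc_auc_score(y_true, y_scores)                 # area under that curve
print(auc)   # 0.5 ≈ random guessing, 1.0 = perfect ranking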

7. Mean Squared Error or MSE-
MSE is a simple metric that calculates the difference between the actual value and the predicted
value (the error), squares it, and then provides the mean of all the errors:

MSE = (1/n) Σ (yᵢ − ŷᵢ)²

MSE is very sensitive to outliers and will show a very high error value even if a few outliers
are present in otherwise well-fitted model predictions.
8. Root Mean Squared Error or RMSE-
RMSE is the square root of MSE and is beneficial because it brings the scale of the errors back
closer to the scale of the actual values, making it more interpretable:

RMSE = √MSE
9. Mean Absolute Error or MAE
MAE is the mean of the absolute error values (actuals − predictions):

MAE = (1/n) Σ |yᵢ − ŷᵢ|

If one wants to reduce the influence of outliers to a certain degree, MAE is the choice, since it
lowers the penalty on outliers significantly by removing the squared term.
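A minimal sketch computing MSE, RMSE, and MAE (the actual and predicted values are illustrative):

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 8.0])

mse = mean_squared_error(y_true, y_pred)    # mean of squared errors
rmse = np.sqrt(mse)                         # back on the scale of the target
mae = mean_absolute_error(y_true, y_pred)   # mean of absolute errors
print(mse, rmse, mae)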
10. R-Squared-
R-Square helps to identify the proportion of the variance of the target variable that can be
captured with the help of the independent variables or predictors:

R² = 1 − SS_res / SS_tot = 1 − Σ (yᵢ − ŷᵢ)² / Σ (yᵢ − ȳ)²
11. Adjusted R-Squared-
Adjusted R-Square addresses the problem that R-Square never decreases as more features are
added. It penalizes the score as more features are added:

Adjusted R² = 1 − [(1 − R²)(N − 1) / (N − p − 1)]

where N is the number of data points and p is the number of predictors. The term (N − p − 1)
in the denominator shrinks as more features are added, which inflates the penalty factor.
Therefore, a significant increase in R² is required to increase the overall adjusted value.
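A minimal sketch computing R-Squared with scikit-learn and deriving Adjusted R-Squared from it (the data, N, and p are illustrative):

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.2, 6.1])
y_pred = np.array([2.8, 5.4, 2.0, 8.0, 4.0, 5.9])

r2 = r2_score(y_true, y_pred)   # 1 - SS_res / SS_tot

N = len(y_true)                 # number of data points
p = 2                           # number of predictors used by the model (illustrative)
adjusted_r2 = 1 - (1 - r2) * (N - 1) / (N - p - 1)
print(r2, adjusted_r2)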
