18 - Deep Learning Frameworks - Data Augmentation - Under-Fitting vs Over-Fitting (22-08-2024)


In deep learning, overfitting and underfitting are two common issues that negatively affect
the performance of models on unseen data. They represent two extremes of the bias-variance
tradeoff.
Overfitting
 Overfitting occurs when a deep learning model learns the training data too well,
including the noise and fluctuations.
 The model becomes extremely well-tuned to the specifics of the training data, making
it perform exceptionally well on this data but poorly on new, unseen data, because it has
memorized the training set rather than learned the underlying patterns.
Characteristics of Overfitting:
- High accuracy on training data but poor generalization to new data.
- The model captures noise and random fluctuations in the training data as if they were
meaningful concepts.
- The learning curve shows that as training progresses, the training error decreases, but
the validation error starts to increase after a certain point.
Common Causes:
- Excessively complex model with too many parameters.
- Insufficient amount of training data.
- Lack of regularization or too little regularization.
- Training for too many epochs, so that the model starts to memorize the data.
How to Combat Overfitting:
- Simplify the model (reduce its complexity).
- Collect more training data or augment the existing dataset.
- Apply regularization techniques (L1, L2, dropout).
- Use early stopping during training (a sketch combining these remedies follows this list).
- Implement cross-validation.
- Utilize data augmentation techniques to increase the diversity of the training set.
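
As a rough illustration of how several of these remedies fit together, here is a minimal Keras sketch; the 20-feature input, the layer sizes, the 0.001 L2 factor, and the 0.3 dropout rate are illustrative assumptions rather than recommended values.

from keras.models import Sequential
from keras.layers import Dense, Dropout, Input
from keras.regularizers import l2
from keras.callbacks import EarlyStopping

# A small binary classifier combining L2 weight penalties, dropout and early
# stopping; the architecture and hyperparameters are illustrative assumptions.
model = Sequential([
    Input(shape=(20,)),                                   # 20 input features (assumed)
    Dense(64, activation='relu', kernel_regularizer=l2(0.001)),
    Dropout(0.3),                                         # drop 30% of the units during training
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Stop once validation loss stops improving and roll back to the best weights.
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
# model.fit(x_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])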
Underfitting
Underfitting, on the other hand, occurs when a model is too simple to capture the underlying
structure of the data. It does not learn well from the training data and, consequently, performs
poorly on both the training data and unseen data.
Characteristics of Underfitting:
- Low accuracy on both training and validation data.
- The model is too simple to capture the complexities and patterns in the data.
- The learning curves show that both the training and validation errors remain high,
indicating high bias.
Common Causes:
- The model is too simple with very few parameters (high bias).
- Insufficient training (too few epochs).
- Overly strong regularization that prevents the model from fitting the data well.
- Poor choice of features in the input data that fails to capture important characteristics.
How to Combat Underfitting:
- Increase model complexity (e.g., more layers, more units per layer); see the sketch after this list.
- Train longer or with more epochs until performance improves.
- Reduce regularization strength.
- Feature engineering to ensure that important characteristics of the data are being fed
into the model.
- Tune model hyperparameters to find a better architecture for the problem at hand.
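
To make the first remedy concrete, the sketch below contrasts a deliberately small model with a higher-capacity one; the 20-feature input and the layer sizes are illustrative assumptions.

from keras.models import Sequential
from keras.layers import Dense, Input

# An underfitting-prone model: a single, very small hidden layer.
small_model = Sequential([
    Input(shape=(20,)),            # 20 input features (assumed)
    Dense(4, activation='relu'),
    Dense(1, activation='sigmoid'),
])

# A higher-capacity alternative: more layers and more units per layer.
larger_model = Sequential([
    Input(shape=(20,)),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid'),
])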
The key to addressing overfitting and underfitting is to strike a balance where the model is
complex enough to learn the underlying patterns in the data but not so complex that it learns
the noise and details specific to the training set. This balance can typically be achieved
through a combination of model selection, regularization, and tuning, alongside a robust
training methodology.
Regularization techniques are methods used to prevent overfitting by imposing
constraints on the model or its learning process. Here are some common regularization
techniques used in deep learning:

L1 Regularization (Lasso)
L1 regularization, also known as Lasso (Least Absolute Shrinkage and Selection Operator),
adds a penalty equal to the absolute value of the magnitude of the coefficients. This can lead
not only to smaller weights but can also produce some coefficients that are exactly zero,
effectively performing feature selection.

from keras.layers import Dense
from keras.regularizers import l1

# Adding L1 regularization to a Dense layer in Keras
# (0.01 is the strength of the L1 penalty applied to the layer's weights)
layer = Dense(64, activation='relu', kernel_regularizer=l1(0.01))  # 64 units chosen for illustration

L2 Regularization (Ridge)
L2 regularization, also known as Ridge, adds a penalty equal to the square of the magnitude
of the coefficients. This penalty encourages the weights to be small but does not enforce them
to be zero.

from keras.layers import Dense
from keras.regularizers import l2

# Adding L2 regularization to a Dense layer in Keras
# (0.01 is the strength of the L2 penalty applied to the layer's weights)
layer = Dense(64, activation='relu', kernel_regularizer=l2(0.01))  # 64 units chosen for illustration
Elastic Net Regularization
Elastic Net is a combination of L1 and L2 regularization and is useful when multiple features
are correlated with one another.

from keras.layers import Dense
from keras.regularizers import l1_l2

# Adding Elastic Net (combined L1 + L2) regularization to a Dense layer in Keras
layer = Dense(64, activation='relu', kernel_regularizer=l1_l2(l1=0.01, l2=0.01))  # 64 units chosen for illustration

Dropout
Dropout is a regularization technique that involves randomly setting a fraction of input units
to 0 at each update during training time, which helps prevent overfitting.

from keras.layers import Dropout

# Adding Dropout to a model in Keras (assumes `model` is an existing Sequential model)
model.add(Dropout(0.5))  # randomly zeroes 50% of the incoming units during training

Early Stopping
Early stopping involves stopping training before the model has fully fitted to the training
data. When the performance on the validation set starts to deteriorate, training is halted to
prevent overfitting.

from keras.callbacks import EarlyStopping

# Using EarlyStopping in Keras: halt once val_loss has not improved for 5 consecutive epochs
early_stopping = EarlyStopping(monitor='val_loss', patience=5)
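
On its own, the callback does nothing until it is handed to training. A minimal usage sketch, assuming a compiled `model` and hypothetical `x_train`, `y_train` arrays:

# Hypothetical training call: the callback is passed via `callbacks`, and the
# validation split provides the val_loss being monitored.
history = model.fit(
    x_train, y_train,
    validation_split=0.2,   # hold out 20% of the training data for validation
    epochs=100,             # upper bound; early stopping usually halts sooner
    callbacks=[early_stopping],
)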

Batch Normalization
Although primarily used to improve training stability and speed (by reducing internal
covariate shift), batch normalization can also have a mild regularizing effect, because the
statistics of each mini-batch add a small amount of noise to the activations.

from keras.layers import BatchNormalization


# Adding Batch Normalization to a model in Keras
model.add(BatchNormalization())

Data Augmentation
Data augmentation artificially increases the size and diversity of the training dataset by
applying random, but realistic, transformations to the input images, such as rotation, scaling,
and cropping.
from keras.preprocessing.image import ImageDataGenerator

# Example of using ImageDataGenerator for data augmentation:
# random rotations, horizontal/vertical shifts and horizontal flips
datagen = ImageDataGenerator(rotation_range=20,
                             width_shift_range=0.2,
                             height_shift_range=0.2,
                             horizontal_flip=True)
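
The generator then supplies augmented batches on the fly during training. A minimal usage sketch, assuming a compiled `model` and hypothetical `x_train` image and `y_train` label arrays:

# Hypothetical usage: train on augmented batches produced on the fly by the generator.
model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=10)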
Noise Injection
Adding noise to inputs can improve robustness and reduce overfitting, as the model learns to
ignore the noise and focus on the underlying data patterns.

from keras.layers import GaussianNoise

# Adding Gaussian noise to a model in Keras (the noise is only applied during training)
model.add(GaussianNoise(0.1))  # zero-mean noise with standard deviation 0.1

Ensemble Methods
Combining the predictions from multiple models can improve generalization by reducing the
model's variance. Ensemble methods include techniques like bagging, boosting, and stacking.
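
Keras has no dedicated ensembling API, but a simple averaging ensemble can be sketched as below, assuming hypothetical, independently trained models `model_a`, `model_b`, `model_c` and test inputs `x_test`:

import numpy as np

# Hypothetical averaging ensemble: each trained model predicts independently,
# and the predictions are averaged to reduce variance.
members = [model_a, model_b, model_c]
predictions = [m.predict(x_test) for m in members]
ensemble_prediction = np.mean(predictions, axis=0)

Averaging smooths out the individual models' errors, which is the variance-reduction effect described above.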

Weight Constraints
By imposing constraints on the norm of the weights, you can ensure that no individual weight
can have a disproportionately large impact on the outcome, promoting a more distributed and
generalized model.

from keras.layers import Dense
from keras.constraints import max_norm

# Adding a max-norm weight constraint to a Dense layer:
# the norm of each unit's incoming weight vector is capped at 2.0
model.add(Dense(64, activation='relu', kernel_constraint=max_norm(2.)))  # 64 units chosen for illustration

These regularization techniques can be used alone or in combination to combat overfitting.
The choice of technique(s) often depends on the specific problem, data characteristics, and
the model being used.

The bias-variance tradeoff is a fundamental concept in supervised learning that describes the
tradeoff between two types of error that affect the performance of a machine learning model:
Bias
Bias refers to the error that results from incorrect assumptions in the learning algorithm. High
bias can cause the model to miss relevant relations between features and target outputs
(underfitting), leading the model to perform poorly on both the training data and unseen data.
 High Bias: Simplistic models with high bias pay little attention to the training data
and oversimplify the model, which can lead to a model that does not capture the
complexity of the data (e.g., linear models).
 Low Bias: Complex models with low bias pay more attention to the training data and
can capture more complex relationships (e.g., deep neural networks).
Variance
Variance refers to the error that results from sensitivity to small fluctuations in the training
set. High variance can cause an algorithm to model the random noise in the training data
rather than the intended outputs (overfitting), leading to poor performance on unseen data.
 High Variance: Models with high variance follow the training data very closely (e.g.,
a model with many parameters such as a deep neural network without proper
regularization).
 Low Variance: Models with low variance are not as affected by the specifics of the
training data and are simpler (e.g., linear models).
Tradeoff
The tradeoff is that to achieve a good model performance, you need to find a balance between
bias and variance, minimizing both errors. Here's why:
 High Bias/Low Variance Models often lead to underfitting, where the model is not
complex enough to capture underlying patterns in the data, and hence has low
predictive performance on both training and unseen data.
 Low Bias/High Variance Models often lead to overfitting, where the model is overly
complex and captures noise in the training data, performing well on the training data
but poorly on unseen data.
In general, as model complexity increases, bias tends to decrease and variance tends to
increase, and vice versa. The ideal situation is to have both low bias and low variance, but in
practice, it's often necessary to compromise:
 Simple models (few parameters): tend to have high bias and low variance.
 Complex models (many parameters, such as deep learning): tend to have low bias
and high variance.
Minimizing the Tradeoff
Machine learning practitioners aim to minimize this tradeoff by:
 Choosing the right model complexity for the given problem and data.
 Using techniques like cross-validation to estimate model performance (a minimal sketch follows this list).
 Implementing regularization techniques to reduce overfitting.
 Gathering more data or constructing a more representative feature set to reduce bias
without increasing variance.
 Using ensemble methods that combine the predictions of several models to reduce
variance.
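
As an illustration of the cross-validation point above, here is a minimal k-fold sketch; the `x` and `y` arrays, the 5-fold split, and the small architecture are all illustrative assumptions rather than a prescribed setup.

import numpy as np
from sklearn.model_selection import KFold
from keras.models import Sequential
from keras.layers import Dense, Input

# Hypothetical k-fold cross-validation loop; x and y are assumed to be NumPy
# arrays of features and binary labels.
def build_model(input_dim):
    model = Sequential([
        Input(shape=(input_dim,)),
        Dense(32, activation='relu'),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(x):
    model = build_model(x.shape[1])                       # fresh model for every fold
    model.fit(x[train_idx], y[train_idx], epochs=20, verbose=0)
    _, accuracy = model.evaluate(x[val_idx], y[val_idx], verbose=0)
    scores.append(accuracy)

print('Mean validation accuracy:', np.mean(scores))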
Understanding and balancing the bias-variance tradeoff is key to building models that
generalize well to new, unseen data.
