UNIT-II Regularization in Deep Learning

The document discusses overfitting in deep learning, explaining how it occurs when a model learns noise from training data, leading to poor generalization on new data. It introduces regularization techniques, such as L1 and L2 regularization, early stopping, dropout, data augmentation, and batch normalization, which help improve model performance by preventing overfitting. The document also includes a practical lab activity demonstrating the implementation of dropout layers in a multi-layer neural network using TensorFlow.


Regularization in Deep Learning

What is Overfitting?
• When a model trains on sample data for too long or becomes overly complex, it may begin to learn "noise", i.e., unimportant patterns, from the dataset.
• Once the model memorizes this noise it becomes "overfitted" and can no longer generalize successfully to new data.
• A model that cannot generalize to new data cannot reliably carry out the classification or prediction tasks it was designed for.
What is Regularization?
• Regularization acts as a guiding principle that keeps a neural network from becoming too focused on the training data alone, so that it can cope with entirely new data.
• By slightly altering the learning process, regularization encourages the model to learn patterns that generalize, which improves its performance on unseen data.
Why Regularization?
• Regularization applies a "penalty" to large coefficients; in deep learning it is the nodes' weight matrices that are penalized, which ultimately reduces the variance of the model.
• With regularization, a better-optimized and more accurate model is obtained.
How does Regularization work?
• When modeling data, a low-bias, high-variance scenario is referred to as overfitting.
• To handle this, regularization techniques trade a small increase in bias for a larger reduction in variance.
• Effective regularization strikes the optimal balance between bias and variance.
• Regularization also orders candidate models from least to most complex and adds larger penalties to the more complicated ones.
• It rests on the assumption that smaller weights lead to simpler models and therefore help prevent overfitting.
Techniques of Regularization
L1 Regularization (Lasso Regression)
L2 Regularization (Ridge Regression)
Early stopping
Dropout Regularization
Data Augmentation
Batch Normalization
L1 Regularization
• L1 regularization adds the absolute values of weights to the loss
function as a penalty.
• This encourages some weights to shrink to exactly zero, effectively
eliminating those parameters from the model.
• This is particularly useful for feature selection, as it helps the model
focus on only the most important inputs while ignoring irrelevant
ones.
• The mathematical representation of L1 regularization is:

  $J(w) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}_i, y_i) + \frac{\lambda}{m}\sum_{j} |w_j|$

  Here λ (lambda) is the regularization parameter and m is the number of training samples.
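
As a minimal illustration of how such a penalty is attached in practice (assuming the Keras API also used in the lab activity at the end of this unit; the penalty factor 0.01 and the layer sizes are arbitrary illustrative choices):

import tensorflow as tf
from tensorflow.keras import layers, regularizers

# A small model whose hidden-layer weights carry an L1 penalty.
# The factor 0.01 plays the role of lambda and is illustrative, not a recommendation.
model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,),
                 kernel_regularizer=regularizers.l1(0.01)),
    layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])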
L2 Regularization
• L2 regularization adds the squared values of the weights to the loss function as a penalty; by shrinking the coefficients while keeping all the variables, it helps address multi-collinearity.
• The relative importance of predictors can be estimated under L2 regularization, and on that basis the less important predictors are penalized.
• The mathematical representation for the L2 regularization is:
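
Written in the same notation as the L1 case, a standard form of the ridge-regularized cost is (the factor 1/2 is a common convention that simplifies the gradient):

  $J(w) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}_i, y_i) + \frac{\lambda}{2m}\sum_{j} w_j^2$

In Keras this corresponds to passing kernel_regularizer=regularizers.l2(...) to a layer, analogous to the L1 sketch above.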
Early Stopping
• Stops the training process when performance on a validation set
starts to degrade.
• Prevents the model from overfitting by halting training before the
model becomes overly complex.
Metrics used to perform early stopping
Steps to perform Early Stopping
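
As a minimal sketch of these steps in Keras (monitoring validation loss with a patience of 3 epochs is an illustrative choice; the model and training data are assumed to exist, as in the lab activity below):

import tensorflow as tf

# Stop training when the validation loss has not improved for 3 consecutive epochs,
# and restore the weights from the best epoch seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                              patience=3,
                                              restore_best_weights=True)

# Hypothetical usage: pass the callback to model.fit together with a validation split.
# history = model.fit(x_train, y_train,
#                     epochs=100,              # upper bound; training may stop earlier
#                     validation_split=0.2,
#                     callbacks=[early_stop])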
Dropout Regularization
• Randomly sets a fraction of neurons' outputs to zero during training.
• Prevents co-adaptation of neurons and forces the network to learn more robust features.
• Used primarily in fully connected layers of neural networks.
• Dropout is turned off during testing.
Steps to perform Dropout Regularization
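
As a minimal standalone sketch of these steps (the rate 0.5 and the toy input are illustrative), showing that dropout is active only in training mode, consistent with the note above that dropout is turned off during testing:

import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)   # randomly zero 50% of the inputs during training
x = tf.ones((1, 4))

print(drop(x, training=True))    # roughly half the values are zeroed, survivors are scaled up
print(drop(x, training=False))   # unchanged: dropout is disabled at inference/testing time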
Data Augmentation
• Increases the diversity of the training data by applying
transformations like rotations, flipping, cropping, or color
adjustments.
• Improves generalization by exposing the model to a broader range of
input variations.
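
A minimal sketch using Keras preprocessing layers (the input shapes and transformation factors are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers

# Random transformations applied independently to each image in a batch.
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip('horizontal'),   # random horizontal flips
    layers.RandomRotation(0.1),        # rotate by up to ~10% of a full turn
    layers.RandomZoom(0.1),            # zoom in/out by up to 10%
])

# Hypothetical batch of 32 RGB images of size 64x64.
images = tf.random.uniform((32, 64, 64, 3))
augmented = data_augmentation(images, training=True)  # augmentation only runs in training mode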
Batch Normalization
• Normalizes the inputs to each layer by scaling and shifting them to a
standard distribution.
• Acts as a regularizer by reducing the internal covariate shift, making
the network less sensitive to weight initialization and learning rates.
For a layer with activations $x_1, \dots, x_m$:

  $\mu = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \mu)^2$

  $\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta$

where m is the number of neurons in that layer, µ is the mean, σ is the standard deviation, and γ, β are learnable scale and shift parameters.
Steps to perform Batch Normalization
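
A minimal sketch of these steps in Keras, placing a batch normalization layer between a dense layer and its activation (the layer sizes and the MNIST-style input shape are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Dense(256, input_shape=(784,)),
    layers.BatchNormalization(),       # normalize activations, then apply learnable gamma and beta
    layers.Activation('relu'),
    layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])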
Lab Activity: Build a multi-layer neural network and improve its test accuracy using dropout layers.
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load and preprocess the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize the images to the range [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0

# Flatten the images (28x28 to 784) and convert labels to one-hot encoding
x_train = x_train.reshape(-1, 28 * 28)
x_test = x_test.reshape(-1, 28 * 28)
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Build the neural network with dropout layers
model = Sequential([
    Dense(512, activation='relu', input_shape=(28 * 28,)),  # First hidden layer
    Dropout(0.2),                                           # Dropout with 20% probability
    Dense(256, activation='relu'),                          # Second hidden layer
    Dropout(0.3),                                           # Dropout with 30% probability
    Dense(128, activation='relu'),                          # Third hidden layer
    Dropout(0.4),                                           # Dropout with 40% probability
    Dense(10, activation='softmax')                         # Output layer for 10 classes
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2)

# Evaluate the model
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f"\nTest Accuracy: {test_accuracy * 100:.2f}%")

# Plot training history
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))

# Plot training & validation accuracy values
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

# Plot training & validation loss values
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()

Output: training/validation accuracy and loss plots.
