L1/L2 Regularization in PyTorch

Last Updated : 31 Jul, 2024

L1 and L2 regularization techniques help prevent overfitting by adding penalties to model parameters, thus improving generalization and model robustness. PyTorch simplifies the implementation of regularization techniques like L1 and L2 through its flexible neural network framework and built-in optimization routines, making it easier to build and train regularized models.

This article illustrates how to apply L1 and L2 regularization in the PyTorch framework.

Understanding Regularization in Deep Learning

Regularization is a technique used in machine learning to improve a model's performance by reducing its complexity. The main purpose of regularization is to prevent overfitting, where the model learns noise in the training data rather than the underlying pattern.

Role in Preventing Overfitting

Regularization helps prevent overfitting by discouraging the model from fitting noise or overly complex patterns in the training data. This ensures that the model captures the underlying trends without becoming too specific to the training set.

Types of Regularization

  1. L1 Regularization (Lasso): Adds a penalty proportional to the absolute value of the coefficients. It encourages sparsity by driving some coefficients to zero, leading to a simpler, more interpretable model.
  2. L2 Regularization (Ridge): Adds a penalty proportional to the square of the coefficients. It prevents the coefficients from becoming too large, reducing model complexity and improving generalization.

Both regularization techniques can be used simultaneously in a combined approach known as Elastic Net, which leverages the strengths of both L1 and L2 regularization.

Concept of L1 Regularization

L1 regularization adds a penalty proportional to the sum of the absolute values of the model’s coefficients to the loss function. Mathematically, it can be represented as:

L_{\text{total}} = L_{\text{original}} + \lambda \sum_{i=1}^{n} |w_i|

where:

  • L_{\text{total}}​ is the total loss with regularization.
  • L_{\text{original}} is the original loss (e.g., mean squared error).
  • \lambda is the regularization parameter controlling the strength of the penalty.
  • w_i​ represents the model parameters (weights).
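
To see how this formula maps to code, here is a minimal sketch that adds the L1 term over all trainable parameters. The small linear model, dummy base_loss, and lambda_reg value are illustrative placeholders; the full training loop later in the article follows the same pattern.

import torch
import torch.nn as nn

# Illustrative setup: a small linear model and a dummy unregularized loss
model = nn.Linear(10, 1)
base_loss = nn.functional.mse_loss(model(torch.randn(8, 10)), torch.randn(8, 1))
lambda_reg = 0.01  # regularization strength (lambda), value chosen for illustration

# L1 penalty: sum of absolute values of all trainable parameters, i.e. sum_i |w_i|
l1_penalty = sum(p.abs().sum() for p in model.parameters())

# Total loss: L_original + lambda * sum_i |w_i|
total_loss = base_loss + lambda_reg * l1_penalty
total_loss.backward()  # gradients now include the L1 term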

Characteristics and Impact on Model

  • Sparsity: L1 regularization encourages sparsity in the model by pushing some weights to exactly zero. This can simplify the model and aid in feature selection by automatically eliminating less important features.
  • Interpretability: Sparse models are often easier to interpret because they rely on fewer features.
  • Robustness: L1 regularization can make models more robust to noise by reducing the complexity of the model.

Concept of L2 Regularization

L2 regularization, also known as Ridge regularization or weight decay, is a technique used to prevent overfitting by adding a penalty to the loss function proportional to the sum of the squares of the model’s weights. Unlike L1 regularization, which promotes sparsity, L2 regularization encourages the weights to be small but does not necessarily push them to zero.

L2 regularization adds a term to the loss function that is proportional to the sum of the squares of the weights. The regularized loss function can be expressed as:

L_{\text{total}} = L_{\text{original}} + \lambda \sum_{i=1}^{n} w_i^2

where:

  • L_{\text{total}}​ is the total loss with regularization.
  • L_{\text{original}} is the original loss (e.g., mean squared error).
  • \lambda is the regularization parameter controlling the strength of the penalty.
  • w_i represents the model parameters (weights).
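
The L2 penalty can be added in exactly the same way as in the L1 sketch above, replacing the absolute value with the square of each parameter. Again, the model, base_loss, and lambda_reg below are illustrative placeholders.

import torch
import torch.nn as nn

# Illustrative setup, mirroring the L1 sketch above
model = nn.Linear(10, 1)
base_loss = nn.functional.mse_loss(model(torch.randn(8, 10)), torch.randn(8, 1))
lambda_reg = 0.01  # regularization strength (lambda), value chosen for illustration

# L2 penalty: sum of squared parameter values, i.e. sum_i w_i^2
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())

# Total loss: L_original + lambda * sum_i w_i^2
total_loss = base_loss + lambda_reg * l2_penalty
total_loss.backward()  # gradients now include the L2 term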

Characteristics and Impact on Model

  • Weight Shrinkage: L2 regularization shrinks the weights towards zero but does not force them to be exactly zero. This results in smaller weights, which can reduce model complexity and prevent overfitting.
  • Smoothness: It tends to produce models with more evenly distributed weights, avoiding scenarios where a few weights are excessively large.
  • Computational Stability: L2 regularization can improve the numerical stability of the optimization process, especially when features are highly correlated (multicollinearity).
  • Interpretability: While L2 regularization does not produce sparse models, it can still contribute to improved model performance by reducing the influence of less important features.

Elastic Net: Combined L1/L2 Regularization

Elastic Net regularization combines L1 and L2 regularization to leverage the advantages of both methods. It balances the sparsity induced by the L1 penalty against the weight shrinkage of the L2 penalty, and it is particularly useful when there are many features or when features are correlated.

By combining these two methods, Elastic Net can offer a more flexible regularization approach, allowing the model to avoid overfitting while still keeping useful features.

The Elastic Net regularized loss function can be expressed as:

L_{\text{total}} = L_{\text{original}} + \lambda_1 \sum_{i=1}^{n} |w_i| + \lambda_2 \sum_{i=1}^{n} w_i^2

In Elastic Net, \lambda_1​ and \lambda_2​ are used to control the balance between L1 and L2 regularization. The total regularization strength is determined by these parameters, which can be adjusted through cross-validation to achieve optimal model performance.
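
A minimal sketch of the combined penalty, using the same illustrative setup as the earlier sketches and arbitrarily chosen lambda_1 and lambda_2 values:

import torch
import torch.nn as nn

# Illustrative setup, mirroring the earlier sketches
model = nn.Linear(10, 1)
base_loss = nn.functional.mse_loss(model(torch.randn(8, 10)), torch.randn(8, 1))
lambda1, lambda2 = 0.01, 0.001  # L1 and L2 strengths, values chosen for illustration

l1_penalty = sum(p.abs().sum() for p in model.parameters())
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())

# Total loss: L_original + lambda_1 * sum_i |w_i| + lambda_2 * sum_i w_i^2
total_loss = base_loss + lambda1 * l1_penalty + lambda2 * l2_penalty
total_loss.backward()  # gradients now include both penalty terms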

Implementing L1/L2 Regularization in PyTorch

Step 1: Import Libraries

This step involves importing necessary libraries for data manipulation, model training, and visualization. Libraries like PyTorch and scikit-learn are used for creating datasets, defining models, and optimizing parameters, while matplotlib is used for plotting.

import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
import numpy as np
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

Step 2: Set Seeds for Reproducibility

Setting seeds ensures that the results are reproducible. Random operations will produce the same output each time the code is run.

# Set seed for reproducibility
np.random.seed(0)
torch.manual_seed(0)

Step 3: Create a Synthetic Dataset

Generate a synthetic dataset for training and testing. The features are random vectors and the binary labels are drawn independently of the features, so the data only serves to demonstrate the training mechanics (accuracy will hover around chance level). The dataset is then split into training and testing sets.

# Create a synthetic dataset
X = np.random.randn(1000, 10).astype(np.float32)
y = (np.random.randn(1000) > 0).astype(np.float32)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

train_dataset = TensorDataset(torch.tensor(X_train), torch.tensor(y_train))
test_dataset = TensorDataset(torch.tensor(X_test), torch.tensor(y_test))

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

Step 4: Define the Neural Network Model

Define a simple feedforward neural network with one hidden layer using PyTorch. This model will be used for training and evaluation.

# Define the neural network model
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(10, 50)
        self.fc2 = nn.Linear(50, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x
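
As a quick sanity check (not part of the article's pipeline), the model can be instantiated and run on a dummy batch to confirm that it returns one raw score (logit) per sample:

model = SimpleNN()
dummy_batch = torch.randn(4, 10)   # batch of 4 samples, 10 features each
logits = model(dummy_batch)        # raw logits, no sigmoid applied yet
print(logits.shape)                # torch.Size([4, 1])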

Step 5: Define the Training Function with Regularization

Implement the training function that includes options for L1 and L2 regularization. The function calculates the loss, applies regularization, and updates the model weights.

# Define the training function with regularization
def train_model(model, criterion, optimizer, train_loader, regularization_type=None, lambda_reg=0.01, epochs=20):
    epoch_losses = []

    for epoch in range(epochs):
        model.train()
        running_loss = 0.0

        for inputs, targets in train_loader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs.squeeze(), targets)

            # Apply L1 regularization
            if regularization_type == 'L1':
                l1_norm = sum(p.abs().sum() for p in model.parameters())
                loss += lambda_reg * l1_norm

            # Apply L2 regularization
            elif regularization_type == 'L2':
                l2_norm = sum(p.pow(2).sum() for p in model.parameters())
                loss += lambda_reg * l2_norm

            loss.backward()
            optimizer.step()

            running_loss += loss.item() * inputs.size(0)

        epoch_loss = running_loss / len(train_loader.dataset)
        epoch_losses.append(epoch_loss)
        print(f"Epoch {epoch+1}/{epochs}, Loss: {epoch_loss:.4f}")

    return epoch_losses

Step 6: Define the Evaluation Function

Create a function to evaluate the trained model's performance on the test dataset. This function calculates and prints the accuracy of the model.

# Define the evaluation function
def evaluate_model(model, test_loader):
    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, targets in test_loader:
            outputs = model(inputs)
            # Model outputs raw logits, so threshold at 0 (equivalent to sigmoid > 0.5)
            predicted = (outputs.squeeze() > 0).float()
            total += targets.size(0)
            correct += (predicted == targets).sum().item()

    accuracy = correct / total
    print(f"Accuracy: {accuracy:.4f}")
    return accuracy

Step 7: Plot Training Loss Over Epochs

Define a function to plot the training loss over epochs to visualize the training progress and performance.

# Plot loss over epochs
def plot_training_loss(losses, title):
    plt.figure(figsize=(10, 5))
    plt.plot(range(1, len(losses) + 1), losses, marker='o')
    plt.title(title)
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.grid(True)
    plt.show()

Step 8: Train and Evaluate with L1 Regularization

Train the model using L1 regularization, evaluate its performance, and plot the training loss.

# Training and evaluating with L1 regularization
print("Training with L1 Regularization:")
model = SimpleNN()
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
l1_losses = train_model(model, criterion, optimizer, train_loader, regularization_type='L1', lambda_reg=0.01)
evaluate_model(model, test_loader)
plot_training_loss(l1_losses, 'Training Loss with L1 Regularization')

Step 9: Reinitialize and Train with L2 Regularization

Reinitialize the model and optimizer, then train using L2 regularization, evaluate its performance, and plot the training loss.

# Reinitialize model, optimizer, and train with L2 regularization
print("\nTraining with L2 Regularization:")
model = SimpleNN()
optimizer = optim.SGD(model.parameters(), lr=0.01)
l2_losses = train_model(model, criterion, optimizer, train_loader, regularization_type='L2', lambda_reg=0.01)
evaluate_model(model, test_loader)
plot_training_loss(l2_losses, 'Training Loss with L2 Regularization')
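
As an alternative to the manual L2 loop in train_model, PyTorch optimizers expose a weight_decay argument that applies an L2-style penalty directly in the parameter update. A minimal sketch follows; the decay value is chosen for illustration and is not exactly equivalent to the manual lambda_reg above.

# Built-in L2 regularization via the optimizer's weight_decay argument
print("\nTraining with optimizer weight_decay:")
model = SimpleNN()
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=0.01)
wd_losses = train_model(model, criterion, optimizer, train_loader)  # no manual penalty needed
evaluate_model(model, test_loader)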

Complete code:

Python
import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
import numpy as np
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

# Set seed for reproducibility
np.random.seed(0)
torch.manual_seed(0)

# Create a synthetic dataset
X = np.random.randn(1000, 10).astype(np.float32)
y = (np.random.randn(1000) > 0).astype(np.float32)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

train_dataset = TensorDataset(torch.tensor(X_train), torch.tensor(y_train))
test_dataset = TensorDataset(torch.tensor(X_test), torch.tensor(y_test))

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# Define the neural network model
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(10, 50)
        self.fc2 = nn.Linear(50, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Define the training function with regularization
def train_model(model, criterion, optimizer, train_loader, regularization_type=None, lambda_reg=0.01, epochs=20):
    epoch_losses = []
    
    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        
        for inputs, targets in train_loader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs.squeeze(), targets)
            
            # Apply L1 regularization
            if regularization_type == 'L1':
                l1_norm = sum(p.abs().sum() for p in model.parameters())
                loss += lambda_reg * l1_norm
            
            # Apply L2 regularization
            elif regularization_type == 'L2':
                l2_norm = sum(p.pow(2).sum() for p in model.parameters())
                loss += lambda_reg * l2_norm
            
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item() * inputs.size(0)
        
        epoch_loss = running_loss / len(train_loader.dataset)
        epoch_losses.append(epoch_loss)
        print(f"Epoch {epoch+1}/{epochs}, Loss: {epoch_loss:.4f}")
    
    return epoch_losses

# Define the evaluation function
def evaluate_model(model, test_loader):
    model.eval()
    correct = 0
    total = 0
    
    with torch.no_grad():
        for inputs, targets in test_loader:
            outputs = model(inputs)
            # Model outputs raw logits, so threshold at 0 (equivalent to sigmoid > 0.5)
            predicted = (outputs.squeeze() > 0).float()
            total += targets.size(0)
            correct += (predicted == targets).sum().item()
    
    accuracy = correct / total
    print(f"Accuracy: {accuracy:.4f}")
    return accuracy

# Plot loss over epochs
def plot_training_loss(losses, title):
    plt.figure(figsize=(10, 5))
    plt.plot(range(1, len(losses) + 1), losses, marker='o')
    plt.title(title)
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.grid(True)
    plt.show()

# Training and evaluating with L1 regularization
print("Training with L1 Regularization:")
model = SimpleNN()
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
l1_losses = train_model(model, criterion, optimizer, train_loader, regularization_type='L1', lambda_reg=0.01)
evaluate_model(model, test_loader)
plot_training_loss(l1_losses, 'Training Loss with L1 Regularization')

# Reinitialize model, optimizer, and train with L2 regularization
print("\nTraining with L2 Regularization:")
model = SimpleNN()
optimizer = optim.SGD(model.parameters(), lr=0.01)
l2_losses = train_model(model, criterion, optimizer, train_loader, regularization_type='L2', lambda_reg=0.01)
evaluate_model(model, test_loader)
plot_training_loss(l2_losses, 'Training Loss with L2 Regularization')

Output:

Training with L1 Regularization:
Epoch 1/20, Loss: 1.6103
Epoch 2/20, Loss: 1.5943
Epoch 3/20, Loss: 1.5787
Epoch 4/20, Loss: 1.5632
Epoch 5/20, Loss: 1.5481
Epoch 6/20, Loss: 1.5332
Epoch 7/20, Loss: 1.5185
Epoch 8/20, Loss: 1.5041
Epoch 9/20, Loss: 1.4900
Epoch 10/20, Loss: 1.4759
Epoch 11/20, Loss: 1.4620
Epoch 12/20, Loss: 1.4482
Epoch 13/20, Loss: 1.4345
Epoch 14/20, Loss: 1.4210
Epoch 15/20, Loss: 1.4077
Epoch 16/20, Loss: 1.3945
Epoch 17/20, Loss: 1.3813
Epoch 18/20, Loss: 1.3684
Epoch 19/20, Loss: 1.3557
Epoch 20/20, Loss: 1.3430
Accuracy: 0.4900
[Figure: Training Loss with L1 Regularization]

Training with L2 Regularization:
Epoch 1/20, Loss: 0.8900
Epoch 2/20, Loss: 0.8854
Epoch 3/20, Loss: 0.8819
Epoch 4/20, Loss: 0.8789
Epoch 5/20, Loss: 0.8762
Epoch 6/20, Loss: 0.8737
Epoch 7/20, Loss: 0.8713
Epoch 8/20, Loss: 0.8691
Epoch 9/20, Loss: 0.8670
Epoch 10/20, Loss: 0.8648
Epoch 11/20, Loss: 0.8628
Epoch 12/20, Loss: 0.8608
Epoch 13/20, Loss: 0.8587
Epoch 14/20, Loss: 0.8567
Epoch 15/20, Loss: 0.8548
Epoch 16/20, Loss: 0.8529
Epoch 17/20, Loss: 0.8512
Epoch 18/20, Loss: 0.8493
Epoch 19/20, Loss: 0.8475
Epoch 20/20, Loss: 0.8456
Accuracy: 0.4900
[Figure: Training Loss with L2 Regularization]

Conclusion

L1 and L2 regularization techniques are essential for enhancing model generalization and combating overfitting. L1 regularization fosters sparsity by driving some weights to zero, leading to simpler and more interpretable models. In contrast, L2 regularization reduces model complexity by shrinking weights, improving numerical stability and overall performance. By implementing these techniques in PyTorch, we can effectively control model complexity and improve performance, ensuring more robust and reliable machine learning models.

