PyTorch Lightning is an open-source deep learning framework that acts as a lightweight wrapper around PyTorch models, simplifying their training and testing. It has a built-in logging system that keeps track of metrics such as loss and accuracy after every step or epoch. In this article, we will explore how to extract these metrics by epoch using the PyTorch Lightning logger.
Understanding Logging in PyTorch Lightning
Logging means keeping a record of the losses and accuracies calculated during the training, validation and testing of the model. Reviewing these records helps developers check whether the model is overfitting or underfitting.
PyTorch Lightning has built-in support for capturing metrics. The most common way to record them is the self.log method, which is part of the LightningModule class. Lightning also integrates with external loggers such as TensorBoard and MLflow that persist the logged values, and loggers can additionally record hyperparameters.
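As a quick illustration, here is a minimal sketch of our own (not one of the examples below) showing where these calls live: self.log is invoked inside a LightningModule hook such as training_step, while save_hyperparameters() asks Lightning to record the constructor arguments with whatever logger is attached.
Python
import pytorch_lightning as pl
import torch

class DemoModule(pl.LightningModule):  # illustrative toy module
    def __init__(self, lr=0.001):
        super().__init__()
        self.save_hyperparameters()  # records 'lr' with the attached logger
        self.layer = torch.nn.Linear(10, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.layer(x), y)
        # on_step/on_epoch control whether the value is logged per batch,
        # aggregated per epoch, or both
        self.log('train_loss', loss, on_step=True, on_epoch=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)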
Extracting Loss and Accuracy by Epoch
Since Lightning can track a model's metrics both step-wise and epoch-wise, there are several techniques for extracting its loss and accuracy.
1. Using TensorBoard Logger
The TensorBoard logger is the most commonly used logger for recording metrics. When the Trainer's logger argument is left at its default (True), all results are stored in the lightning_logs/ directory.
To view the results interactively, start the TensorBoard server with the command tensorboard --logdir lightning_logs/ (or point --logdir at your custom log folder if you defined one).
Python
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger
import torch
import numpy as np
from torch.utils.data import DataLoader, TensorDataset

# Generate random data
X_train = torch.tensor(np.random.rand(100, 10), dtype=torch.float32)  # 100 samples, 10 features
y_train = torch.tensor(np.random.randint(0, 2, size=(100,)), dtype=torch.long)  # 100 labels (binary classification)

# Create a DataLoader
train_dataset = TensorDataset(X_train, y_train)
train_dataloader = DataLoader(train_dataset, batch_size=16)

# Define a simple model (LightningModule)
class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(10, 2)  # 10 input features, 2 output classes

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = torch.nn.functional.cross_entropy(logits, y)
        # Log loss to TensorBoard
        self.log('train_loss', loss)
        # Calculate accuracy
        preds = torch.argmax(logits, dim=1)
        acc = (preds == y).float().mean()
        self.log('train_acc', acc)  # Log accuracy to TensorBoard
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

# Initialize TensorBoard logger
tb_logger = TensorBoardLogger("logs/", name="my_model")

# Set up trainer with the logger
trainer = pl.Trainer(logger=tb_logger, max_epochs=10)

# Train the model
trainer.fit(LitModel(), train_dataloaders=train_dataloader)
Here we generate random NumPy data consisting of 100 samples with 10 features each; the target is a binary variable. We then wrap the tensors in a TensorDataset and create a DataLoader.
Inside the LightningModule we define a simple one-layer model together with its forward pass and optimizer. In the training step, we compute the loss and accuracy and capture both with self.log. Finally, we initialize the TensorBoard logger and pass it to the Trainer.
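If you want the logged values back in Python rather than in the TensorBoard UI, the event files written by the logger can be read with TensorBoard's EventAccumulator. Below is a minimal sketch of ours; it assumes the logs/my_model/version_0 directory that a first run of the logger above would create.
Python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Assumed path: first run of the TensorBoardLogger configured above
ea = EventAccumulator("logs/my_model/version_0")
ea.Reload()  # parse the event file from disk

# Each scalar event carries the global step and the logged value
for event in ea.Scalars('train_loss'):
    print(event.step, event.value)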
2. CSV Logger
As the name suggests, this logger stores the training and testing results in CSV files. It is the technique to use when you want the results in a plain tabular format.
Python
import pytorch_lightning as pl
from pytorch_lightning.loggers import CSVLogger
import torch
import numpy as np
from torch.utils.data import DataLoader, TensorDataset

# Generate random data
X_train = torch.tensor(np.random.rand(100, 10), dtype=torch.float32)
y_train = torch.tensor(np.random.randint(0, 2, size=(100,)), dtype=torch.long)

# Create a DataLoader
train_dataset = TensorDataset(X_train, y_train)
train_dataloader = DataLoader(train_dataset, batch_size=16)

# Define a simple model (LightningModule)
class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(10, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = torch.nn.functional.cross_entropy(logits, y)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

# Initialize CSV logger
csv_logger = CSVLogger("logs/", name="my_csv_model")

# Print log directory
print(f"Logs will be saved to: {csv_logger.log_dir}")

# Set up trainer with the logger
trainer = pl.Trainer(logger=csv_logger, max_epochs=12)

# Train the model
trainer.fit(LitModel(), train_dataloaders=train_dataloader)
Here too, after setting up the model, we initialize the CSV logger; the results are stored in logs/my_csv_model/version_0/metrics.csv. Also make sure the number of epochs is large enough that there are logs worth inspecting.
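To get per-epoch numbers out of that file programmatically, the CSV can be loaded with pandas. A minimal sketch of ours, assuming the version_0 path shown above: because train_loss is logged per step in this run, rows where it was not recorded hold NaN, so we drop those and average the rest within each epoch.
Python
import pandas as pd

# Assumed path: first run of the CSVLogger configured above
metrics = pd.read_csv("logs/my_csv_model/version_0/metrics.csv")

# Drop rows without a train_loss entry, then average per epoch
loss_by_epoch = (
    metrics.dropna(subset=["train_loss"])
           .groupby("epoch")["train_loss"]
           .mean()
)
print(loss_by_epoch)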
3. Custom Callbacks
Callbacks are small pieces of code that run at particular points in the training loop to capture the model's behavior. They are particularly useful when we want to use checkpoints to guard against overfitting or to record how the model behaves during training.
Python
import pytorch_lightning as pl
import torch
from torch import nn
from torch.optim import Adam

# Define a simple model
class SampleModel(pl.LightningModule):
    def __init__(self):
        super(SampleModel, self).__init__()
        self.layer = nn.Linear(10, 2)  # Input size of 10 and 2 output classes

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = nn.CrossEntropyLoss()(logits, y)
        self.log("train_loss", loss)  # Log training loss for metrics
        return loss

    def configure_optimizers(self):
        return Adam(self.parameters(), lr=0.001)

# Custom Callback
class LogLossCallback(pl.Callback):
    def on_train_epoch_start(self, trainer, pl_module):
        print(f"Starting epoch {trainer.current_epoch}")

# Create random data
def random_data(num_samples=100, input_dim=10):
    X = torch.randn(num_samples, input_dim)  # Random input data
    y = torch.randint(0, 2, (num_samples,))  # Random targets (binary classification)
    return X, y

# Example usage with Trainer
if __name__ == "__main__":
    model = SampleModel()

    # Generate random data
    X, y = random_data(num_samples=1000)

    # Create a DataLoader
    train_loader = torch.utils.data.DataLoader(list(zip(X, y)), batch_size=32, shuffle=True)

    # Instantiate the Trainer with the custom callback
    trainer = pl.Trainer(callbacks=[LogLossCallback()], max_epochs=5)
    trainer.fit(model, train_loader)
Here we create a custom callback class that inherits from Callback. At the start of every epoch, the on_train_epoch_start hook is invoked and the message is printed.
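The same mechanism can capture metric values instead of just printing messages. As a small sketch of ours (the class name is illustrative, not part of the example above), an on_train_epoch_end hook can read trainer.callback_metrics, the dictionary Lightning fills from self.log calls, and store one entry per epoch:
Python
import pytorch_lightning as pl

class EpochLossRecorder(pl.Callback):  # illustrative name
    def __init__(self):
        self.losses = []

    def on_train_epoch_end(self, trainer, pl_module):
        loss = trainer.callback_metrics.get("train_loss")
        if loss is not None:
            self.losses.append(loss.item())  # one value per finished epoch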
Accessing Logged Metrics After Training
Below we use the Iris dataset and combine the CSV logger with a custom callback.
Python
import torch
from torch.utils.data import DataLoader, TensorDataset
import torch.nn.functional as F
import pytorch_lightning as pl
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from pytorch_lightning.callbacks import Callback
from pytorch_lightning.loggers import CSVLogger

# 1. Load and Preprocess the Iris Dataset
iris = load_iris()
X = iris['data']
y = iris['target']

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Convert to PyTorch tensors
X_tensor = torch.tensor(X_scaled, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.long)

# Split into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_tensor, y_tensor, test_size=0.2, random_state=42)

# Create DataLoaders
train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)

# 2. Define the PyTorch Lightning Model
class IrisClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer_1 = torch.nn.Linear(4, 16)  # Input is 4-dimensional (features in Iris dataset)
        self.layer_2 = torch.nn.Linear(16, 3)  # Output is 3-dimensional (three Iris classes)

    def forward(self, x):
        x = F.relu(self.layer_1(x))
        x = self.layer_2(x)
        return x

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        # Log training loss and accuracy
        acc = (y_hat.argmax(dim=1) == y).float().mean()
        self.log('train_loss_epoch', loss, on_epoch=True, prog_bar=True)
        self.log('train_acc_epoch', acc, on_epoch=True, prog_bar=True)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        # Log validation loss and accuracy
        acc = (y_hat.argmax(dim=1) == y).float().mean()
        self.log('val_loss', loss, on_epoch=True, prog_bar=True)
        self.log('val_acc', acc, on_epoch=True, prog_bar=True)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=0.001)
        return optimizer

# 3. Define a Custom Callback to Extract Loss and Accuracy
class MetricsLogger(Callback):
    def __init__(self):
        super().__init__()
        self.train_losses = []
        self.val_losses = []
        self.train_accs = []
        self.val_accs = []

    def on_validation_epoch_end(self, trainer, pl_module):
        # Extract metrics from the trainer's logger after the validation epoch ends
        train_loss = trainer.callback_metrics.get('train_loss_epoch', None)
        val_loss = trainer.callback_metrics.get('val_loss', None)
        train_acc = trainer.callback_metrics.get('train_acc_epoch', None)
        val_acc = trainer.callback_metrics.get('val_acc', None)

        # Only append metrics if they exist
        if train_loss is not None:
            self.train_losses.append(train_loss.item())
        if val_loss is not None:
            self.val_losses.append(val_loss.item())
        if train_acc is not None:
            self.train_accs.append(train_acc.item())
        if val_acc is not None:
            self.val_accs.append(val_acc.item())

        # Optionally, print metrics for each epoch
        if train_loss is not None and val_loss is not None:
            print(f'Epoch {trainer.current_epoch}: train_loss={train_loss:.4f}, val_loss={val_loss:.4f}')
        if train_acc is not None and val_acc is not None:
            print(f'Epoch {trainer.current_epoch}: train_acc={train_acc:.4f}, val_acc={val_acc:.4f}')

# 4. Train the Model with PyTorch Lightning
# Initialize the model, logger, and callback
model = IrisClassifier()
logger = CSVLogger(save_dir='logs/', name='iris_model')
metrics_logger = MetricsLogger()

# Trainer
trainer = pl.Trainer(
    max_epochs=20,
    logger=logger,
    callbacks=[metrics_logger]
)

# Fit the model
trainer.fit(model, train_dataloaders=train_loader, val_dataloaders=val_loader)

# After training, you can access the stored metrics from the callback
print("Training Losses by Epoch:", metrics_logger.train_losses)
print("Validation Losses by Epoch:", metrics_logger.val_losses)
print("Training Accuracies by Epoch:", metrics_logger.train_accs)
print("Validation Accuracies by Epoch:", metrics_logger.val_accs)
Output:
Output of the custom callback:
Training Losses by Epoch: [1.163916826248169, 1.1702001094818115, 0.9137388467788696, 0.9981975555419922, 1.1598992347717285, 0.9526604413986206, 0.8650658130645752, 0.9340900778770447, 1.1416115760803223, 0.9190675616264343, 0.8127445578575134, 0.9559628963470459, 0.887129545211792, 0.8117339015007019, 0.6771875023841858, 0.731735110282898, 0.4703018367290497, 0.6701182723045349, 0.6709480285644531, 0.6414933204650879]
Validation Losses by Epoch: [1.1944286823272705, 1.159570574760437, 1.1258552074432373, 1.094372272491455, 1.0632723569869995, 1.0325226783752441, 1.0021071434020996, 0.9709540605545044, 0.9395963549613953, 0.9076734185218811, 0.8769798874855042, 0.8452874422073364, 0.8138272762298584, 0.7839321494102478, 0.75382399559021, 0.7244547605514526, 0.6960629224777222, 0.6684385538101196, 0.6421023011207581, 0.6166333556175232, 0.593155562877655]
Training Accuracies by Epoch: [0.25, 0.125, 0.75, 0.25, 0.125, 0.5, 0.625, 0.5, 0.25, 0.5, 0.875, 0.625, 0.625, 1.0, 0.75, 0.625, 0.875, 0.625, 0.75, 0.875]
Validation Accuracies by Epoch: [0.0, 0.13333334028720856, 0.1666666716337204, 0.20000000298023224, 0.3333333432674408, 0.36666667461395264, 0.6000000238418579, 0.6000000238418579, 0.6333333253860474, 0.6333333253860474, 0.6333333253860474, 0.699999988079071, 0.699999988079071, 0.7333333492279053, 0.800000011920929, 0.8333333134651184, 0.8999999761581421, 0.8999999761581421, 0.8999999761581421, 0.8999999761581421, 0.8999999761581421]
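Once these lists are collected, plotting them is straightforward. Here is a minimal sketch with matplotlib, reusing the metrics_logger object from the example above (note that the validation list can hold one extra entry from Lightning's pre-training sanity check, which is why it has 21 values for 20 epochs here):
Python
import matplotlib.pyplot as plt

# Curves collected by the MetricsLogger callback during training
plt.plot(metrics_logger.train_losses, label="train loss")
plt.plot(metrics_logger.val_losses, label="val loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()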
Conclusion
PyTorch Lightning is an efficient library that scales from quick experiments to industry-level projects. Developers can focus on the core logic of the model without tracking metrics by hand: they can use the CSV logger, the TensorBoard logger, or custom callbacks as their requirements dictate.