
PyTorch Lightning with TensorBoard

Last Updated : 24 Sep, 2024

PyTorch Lightning is a popular deep learning framework. It wraps PyTorch models to simplify their training and testing, and it is especially useful for distributed training, since models can be trained across devices without much complex code. To inspect the resulting metrics interactively, we turn to TensorBoard: a powerful visualization tool that displays the loss, accuracy, and other metrics of a model and also helps with debugging.

Integrating PyTorch Lightning with TensorBoard, a powerful visualization tool, enhances the ability to monitor metrics, model performance, and training progress in real time.

Setting Up TensorBoard with PyTorch Lightning

To set up PyTorch Lightning with TensorBoard, we first have to ensure that PyTorch is installed. To install it with conda, use the command below (pip works as well):

conda install pytorch torchvision torchaudio cpuonly -c pytorch

After installing PyTorch, we need to install PyTorch Lightning. To install the library, use the pip command:

pip install pytorch-lightning


Now, to install the TensorBoard library, use the command below:

pip install tensorboard
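To confirm the installations, an optional sanity check is to print the library versions (assuming the standard __version__ attributes these packages normally expose):

Python
import torch
import pytorch_lightning as pl
import tensorboard

# Print the installed version of each library
print(torch.__version__)
print(pl.__version__)
print(tensorboard.__version__)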


Why Use PyTorch Lightning with TensorBoard?

Using PyTorch Lightning and TensorBoard together has multiple benefits:

  • Automated Logging: PyTorch Lightning automatically logs metrics, making it easier to monitor the training process.
  • Visualization: TensorBoard visualizes training progress, making debugging and analysis more efficient.
  • Scalability: PyTorch Lightning scales models across multiple GPUs and TPUs, while TensorBoard keeps track of metrics across these distributed systems.

Logging Metrics with PyTorch Lightning to TensorBoard

TensorBoard works hand in hand with PyTorch Lightning. Whatever metrics we log with PyTorch Lightning, TensorBoard automatically captures, turns into interactive visualizations, and serves on localhost. To record a value we use the self.log method, which passes the data on to the TensorBoard logger. For example:

Python
def test_step(self, batch, batch_idx):
    # Inside a LightningModule: compute the loss and accuracy, then log both
    x, y = batch
    y_hat = self(x)
    loss = self.loss_fn(y_hat, y)
    acc = (y_hat.argmax(dim=1) == y).float().mean()
    self.log('test_loss', loss, prog_bar=True)  # prog_bar=True also shows the value in the progress bar
    self.log('test_acc', acc, prog_bar=True)

The self.log method works the same way in the training, validation, and test steps.
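self.log also accepts optional arguments that control how a metric is aggregated and displayed; for example, on_step and on_epoch choose between logging per-batch values and epoch-level averages. A minimal sketch of a training step using these options:

Python
def training_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self(x)
    loss = self.loss_fn(y_hat, y)
    # Log both the per-batch value and the epoch-level average under one name
    self.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True)
    return loss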

  • After running the model, launch TensorBoard to see the logged metrics and their corresponding visualizations.
  • The dashboard is served at http://localhost:6006/ by default. TensorBoard is also useful for comparing many versions of a model side by side.

The command to launch TensorBoard is as follows:

tensorboard --logdir=lightning_logs/
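By default, the Trainer writes to lightning_logs/ with auto-numbered version subfolders. To control the directory and run name yourself, you can pass an explicit TensorBoardLogger to the Trainer; this is a minimal sketch, and the run name 'mnist_mlp' is just an illustrative choice:

Python
from pytorch_lightning.loggers import TensorBoardLogger

# Logs will be written under lightning_logs/mnist_mlp/version_<n>/
logger = TensorBoardLogger(save_dir='lightning_logs', name='mnist_mlp')
trainer = pl.Trainer(max_epochs=5, logger=logger)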


Example: Neural Network with PyTorch Lightning and TensorBoard

Here we use the MNIST dataset and define the model as a LightningModule class with PyTorch Lightning.

  • The class contains three fully connected layers along with methods for the forward pass, optimizer configuration, training steps to fit the model, validation steps to guard against overfitting, and test steps to evaluate it.
  • We apply transformations to the dataset, such as conversion to tensors and normalization. Finally, we create a Trainer object to train and test the model. The hyperparameters used are listed below.
Batch Size: 32
Epochs: 5
Learning Rate: 0.001
Optimizer: Adam
Activation Function: ReLU
Loss: Cross Entropy
Python
import pytorch_lightning as pl
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import transforms, datasets

# Step 1: Define the LightningModule
class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(28 * 28, 128)
        self.layer_2 = nn.Linear(128, 256)
        self.layer_3 = nn.Linear(256, 10)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, x):
        # Flatten the input (28x28 images to 784)
        x = x.view(x.size(0), -1)
        x = torch.relu(self.layer_1(x))
        x = torch.relu(self.layer_2(x))
        x = self.layer_3(x)
        return x

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = self.loss_fn(y_hat, y)
        acc = (y_hat.argmax(dim=1) == y).float().mean()  # Accuracy for training
        self.log('train_loss', loss)
        self.log('train_acc', acc)  # Logging training accuracy
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = self.loss_fn(y_hat, y)
        acc = (y_hat.argmax(dim=1) == y).float().mean()  # Validation accuracy
        self.log('val_loss', loss, prog_bar=True)
        self.log('val_acc', acc, prog_bar=True)  # Logging validation accuracy

    def test_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = self.loss_fn(y_hat, y)
        acc = (y_hat.argmax(dim=1) == y).float().mean()  # Testing accuracy
        self.log('test_loss', loss, prog_bar=True)
        self.log('test_acc', acc, prog_bar=True)  # Logging test accuracy

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Step 2: Prepare Data
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
mnist_train = datasets.MNIST(root='.', train=True, download=True, transform=transform)
mnist_test = datasets.MNIST(root='.', train=False, download=True, transform=transform)

train_loader = DataLoader(mnist_train, batch_size=32, shuffle=True)  # shuffle training batches each epoch
test_loader = DataLoader(mnist_test, batch_size=32)

# Step 3: Create Trainer and Train Model
model = LitModel()
trainer = pl.Trainer(max_epochs=5, accelerator='cpu')

# Step 4: Train the model (the test set doubles as validation data here, for simplicity)
trainer.fit(model, train_loader, test_loader)

# Step 5: Test the model
trainer.test(model, test_loader)


After the model has been fully trained and tested, run the command 'tensorboard --logdir=lightning_logs/', since all the logs are stored in the lightning_logs directory. Then open the localhost link that TensorBoard prints.
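If port 6006 is already taken, TensorBoard's standard --port flag serves the dashboard on a different port, for example:

tensorboard --logdir=lightning_logs/ --port=6007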

As we can see, the training loss decreases over the training steps, and the same holds for the test loss. TensorBoard also shows the relative time taken to train the model. The overall test accuracy of the model is 96.67%.

Conclusion

TensorBoard is a powerful tool, as it captures all the logs generated during the training and testing of a model. By pairing it with a PyTorch Lightning model and simply using the self.log method, every record gets captured, and a single command serves those logs on localhost, where TensorBoard turns them into interactive visualizations, all with very little extra code.

