PyTorch Lightning with TensorBoard
PyTorch Lightning is a popular deep learning framework. It wraps PyTorch models to simplify their training and testing, and it is especially useful for distributed training, since a model can be trained across devices without much extra code. To view the resulting metrics in an interactive interface, we need TensorBoard. TensorBoard is a powerful visualization tool that displays a model's loss, accuracy, and other metrics, and it also helps us debug our models.
Integrating PyTorch Lightning with TensorBoard makes it possible to monitor metrics, model performance, and training progress in real time.
Setting Up TensorBoard with PyTorch Lightning
To set up PyTorch Lightning with TensorBoard, first make sure PyTorch itself is installed. With conda, use the command below; a pip alternative follows it.
conda install pytorch torchvision torchaudio cpuonly -c pytorch
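Alternatively, PyTorch can be installed with pip:
pip install torch torchvision torchaudio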
After installing PyTorch, we need to install PyTorch Lightning. To install the library use pip command:
pip install pytorch-lightning
Now install the TensorBoard library with the command below:
pip install tensorboard
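As a quick sanity check (not part of the original walkthrough), all three packages should now import cleanly and report their versions:
Python
import torch
import pytorch_lightning as pl
import tensorboard

# Each package exposes a __version__ attribute.
print(torch.__version__)
print(pl.__version__)
print(tensorboard.__version__)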
Why Use PyTorch Lightning with TensorBoard?
Using PyTorch Lightning and TensorBoard together has multiple benefits:
- Automated Logging: PyTorch Lightning automatically logs metrics, making it easier to monitor the training process (see the sketch after this list).
- Visualization: TensorBoard visualizes training progress, making debugging and analysis more efficient.
- Scalability: PyTorch Lightning scales models across multiple GPUs and TPUs, while TensorBoard keeps track of metrics across these distributed systems.
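As a minimal sketch of how that automated logging can be directed (the experiment name below is illustrative, not from the article), Lightning's TensorBoardLogger can be attached to the Trainer explicitly; by default the Trainer already creates one that writes to lightning_logs/:
Python
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger

# Explicit logger; save_dir and name determine where event files are written:
# lightning_logs/mnist_experiment/version_0, version_1, ...
logger = TensorBoardLogger(save_dir="lightning_logs", name="mnist_experiment")
trainer = pl.Trainer(max_epochs=5, logger=logger)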
Logging Metrics with PyTorch Lightning to TensorBoard
TensorBoard works hand in hand with PyTorch Lightning. Whatever metrics we log from a Lightning module, TensorBoard automatically captures, turns into interactive visualizations, and serves on localhost. To record a value we use the self.log method, which forwards it to the attached TensorBoard logger. For example, inside a test step:
Python
def test_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self(x)                                  # forward pass
    loss = self.loss_fn(y_hat, y)
    acc = (y_hat.argmax(dim=1) == y).float().mean()  # batch accuracy
    self.log('test_loss', loss, prog_bar=True)       # sent to TensorBoard
    self.log('test_acc', acc, prog_bar=True)
The self.log method can be used in the training, validation, and testing steps alike.
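self.log also accepts flags that control when and where each metric is recorded. Below is a minimal sketch (the flag values are chosen for illustration, not taken from the article):
Python
def training_step(self, batch, batch_idx):
    x, y = batch
    loss = self.loss_fn(self(x), y)
    # on_step=True: log every batch; on_epoch=True: also log the epoch average.
    # prog_bar=True: show in the progress bar; logger=True: send to TensorBoard.
    self.log('train_loss', loss,
             on_step=True, on_epoch=True,
             prog_bar=True, logger=True)
    return loss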
- After running the model, launch TensorBoard to see the logged metrics and their corresponding visualizations.
- The dashboard is usually served at http://localhost:6006/. TensorBoard is also useful for comparing many versions of a model.
The command to launch TensorBoard is as follows:
tensorboard --logdir=lightning_logs/
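Besides the browser dashboard, the event files under lightning_logs/ can also be read programmatically. Here is a small sketch using TensorBoard's EventAccumulator; the version_0 path and the train_loss tag are assumptions that depend on what your run actually logged:
Python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point at one run directory (version_0 here is an assumed folder name).
acc = EventAccumulator("lightning_logs/version_0")
acc.Reload()  # parse the event files on disk

print(acc.Tags()['scalars'])               # list all scalar tags that were logged
for event in acc.Scalars('train_loss'):    # assumed tag name
    print(event.step, event.value)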
Example: Neural Network with PyTorch Lightning and TensorBoard
Here we use the MNIST dataset and define the model class with PyTorch Lightning.
- The class contains three fully connected layers, along with a forward pass, an optimizer configuration, a training step to train the model, a validation step to guard against overfitting, and a test step to evaluate the model.
- On the dataset we apply transformations such as conversion to tensors and normalization. Finally, we call the Trainer object to train and test the model. The hyperparameters used are:
- Batch Size: 32
- Epochs: 5
- Learning Rate: 0.001
- Optimizer: Adam
- Activation Function: ReLU
- Loss: Cross Entropy
Python
import pytorch_lightning as pl
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import transforms, datasets


# Step 1: Define the LightningModule
class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(28 * 28, 128)
        self.layer_2 = nn.Linear(128, 256)
        self.layer_3 = nn.Linear(256, 10)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, x):
        # Flatten the input (28x28 images to 784)
        x = x.view(x.size(0), -1)
        x = torch.relu(self.layer_1(x))
        x = torch.relu(self.layer_2(x))
        x = self.layer_3(x)
        return x

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = self.loss_fn(y_hat, y)
        acc = (y_hat.argmax(dim=1) == y).float().mean()  # Accuracy for training
        self.log('train_loss', loss)
        self.log('train_acc', acc)  # Logging training accuracy
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = self.loss_fn(y_hat, y)
        acc = (y_hat.argmax(dim=1) == y).float().mean()  # Validation accuracy
        self.log('val_loss', loss, prog_bar=True)
        self.log('val_acc', acc, prog_bar=True)  # Logging validation accuracy

    def test_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = self.loss_fn(y_hat, y)
        acc = (y_hat.argmax(dim=1) == y).float().mean()  # Testing accuracy
        self.log('test_loss', loss, prog_bar=True)
        self.log('test_acc', acc, prog_bar=True)  # Logging test accuracy

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# Step 2: Prepare Data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
mnist_train = datasets.MNIST(root='.', train=True, download=True, transform=transform)
mnist_test = datasets.MNIST(root='.', train=False, download=True, transform=transform)

train_loader = DataLoader(mnist_train, batch_size=32)
test_loader = DataLoader(mnist_test, batch_size=32)

# Step 3: Create the model and the Trainer
model = LitModel()
trainer = pl.Trainer(max_epochs=5, accelerator='cpu')

# Step 4: Train the model (the test set doubles as the validation set here)
trainer.fit(model, train_loader, test_loader)

# Step 5: Test the model
trainer.test(model, test_loader)
Output:
After training and testing complete, run the command 'tensorboard --logdir=lightning_logs/', since all the logs are stored in lightning_logs, and open the link it prints (usually http://localhost:6006/).
In the dashboard we can see that the training loss decreases with the step count, and the test loss behaves similarly. It also shows the relative time taken to train the model. The overall test accuracy of the model is 96.67%.
Conclusion
TensorBoard is a powerful tool because it captures all the logs generated during the training and testing of a model. By pairing it with a PyTorch Lightning model and simply calling the self.log method, every record is captured, and a single command serves those logs on localhost. TensorBoard then turns them into interactive visualizations, all with very little extra code.