
Extracting Loss and Accuracy from PyTorch Lightning Logger

Last Updated : 26 Sep, 2024

PyTorch Lightning is an open-source deep learning framework that acts as a lightweight wrapper around PyTorch models, simplifying their training and testing. It has a built-in logging system that keeps track of metrics such as loss and accuracy after every step or epoch. In this article, we explore how to extract these metrics by epoch using the PyTorch Lightning logger.

Understanding Logging in PyTorch Lightning

Logging means keeping records of the losses and accuracies that are calculated during the training, validation, and testing of the model. This is useful because it helps developers check whether the model is prone to overfitting or underfitting.

PyTorch Lightning has built-in support for capturing metrics. The most common way to record them is the self.log method, which is part of the LightningModule class. Lightning also integrates with external loggers such as TensorBoard and MLflow, which persist the logged values for the user. Loggers can also be used to record hyperparameters.
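Below is a minimal sketch of how self.log is typically used inside a LightningModule. The on_step, on_epoch, prog_bar, and logger flags are part of Lightning's self.log API; the model and metric names are just illustrative.

Python
import pytorch_lightning as pl
import torch

class SketchModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(10, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self(x), y)
        # on_step/on_epoch control whether the value is recorded per step,
        # aggregated per epoch, or both; prog_bar shows it in the progress
        # bar, and logger=True sends it to the attached logger
        self.log('train_loss', loss, on_step=True, on_epoch=True,
                 prog_bar=True, logger=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())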

Extracting Loss and Accuracy by Epoch

Since Lightning can track metrics both step-wise and epoch-wise, there are several techniques for extracting the loss and accuracy of a model.

1. Using the TensorBoard Logger

The TensorBoard logger is the most commonly used logger for recording metrics. When the Trainer's logger argument is left at its default (logger=True), all results are stored in the lightning_logs/ directory.

To view the results interactively, start the TensorBoard server with the command tensorboard --logdir lightning_logs/ (pointing --logdir at your custom log directory if you defined one).

Python
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger
import torch
import numpy as np
from torch.utils.data import DataLoader, TensorDataset

# Generate random data
X_train = torch.tensor(np.random.rand(100, 10), dtype=torch.float32)  # 100 samples, 10 features
y_train = torch.tensor(np.random.randint(0, 2, size=(100,)), dtype=torch.long)  # 100 labels (binary classification)

# Create a DataLoader
train_dataset = TensorDataset(X_train, y_train)
train_dataloader = DataLoader(train_dataset, batch_size=16)

# Define a simple model (LightningModule)
class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(10, 2)  # 10 input features, 2 output classes

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = torch.nn.functional.cross_entropy(logits, y)

        # Log the loss once per epoch (Lightning averages the step values)
        self.log('train_loss', loss, on_step=False, on_epoch=True)

        # Calculate accuracy
        preds = torch.argmax(logits, dim=1)
        acc = (preds == y).float().mean()
        # Log the accuracy once per epoch as well
        self.log('train_acc', acc, on_step=False, on_epoch=True)

        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

# Initialize TensorBoard logger
tb_logger = TensorBoardLogger("logs/", name="my_model")

# Set up trainer with the logger
trainer = pl.Trainer(logger=tb_logger, max_epochs=10)

# Train the model
trainer.fit(LitModel(), train_dataloaders=train_dataloader)


Here we generate random NumPy data consisting of 100 samples with 10 features each; the target is a binary variable. We then wrap the tensors in a DataLoader.

Inside the LightningModule we define a simple one-layer model along with its forward pass and optimizer. In training_step we compute the loss and accuracy and capture them with the self.log method. Finally, we initialize the TensorBoard logger and pass it to the Trainer.
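If you want the logged values back in Python rather than in the TensorBoard UI, you can read the event files directly with TensorBoard's EventAccumulator. A minimal sketch follows; the log path assumes the first run (version_0) of the example above, and the tag name must match what was passed to self.log.

Python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point the accumulator at the run directory written by TensorBoardLogger
ea = EventAccumulator("logs/my_model/version_0")
ea.Reload()  # load the event files from disk

# List the available scalar tags, then read the values of one of them
print(ea.Tags()['scalars'])
for event in ea.Scalars('train_loss'):
    print(event.step, event.value)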

(Screenshot: TensorBoard dashboard showing train_loss and train_acc logged by epoch)

2. CSV Logger

As the name suggests, the CSV logger stores the training and testing results in CSV files. This technique is useful when we want to inspect or post-process the results as plain tabular data.

Python
import pytorch_lightning as pl
from pytorch_lightning.loggers import CSVLogger
import torch
import numpy as np
from torch.utils.data import DataLoader, TensorDataset

# Generate random data
X_train = torch.tensor(np.random.rand(100, 10), dtype=torch.float32)
y_train = torch.tensor(np.random.randint(0, 2, size=(100,)), dtype=torch.long)

# Create a DataLoader
train_dataset = TensorDataset(X_train, y_train)
train_dataloader = DataLoader(train_dataset, batch_size=16)

# Define a simple model (LightningModule)
class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(10, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = torch.nn.functional.cross_entropy(logits, y)
        # Log once per epoch so every epoch produces a row in the CSV
        self.log('train_loss', loss, on_step=False, on_epoch=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

# Initialize CSV logger
csv_logger = CSVLogger("logs/", name="my_csv_model")

# Print log directory
print(f"Logs will be saved to: {csv_logger.log_dir}")

# Set up trainer with the logger
trainer = pl.Trainer(logger=csv_logger, max_epochs=12)

# Train the model
trainer.fit(LitModel(), train_dataloaders=train_dataloader)


Here too, after setting up the model, we initialize the CSV logger; the results are stored in logs/my_csv_model/version_0/metrics.csv. Epoch-level values are written at the end of every epoch, whereas step-level values are only written every log_every_n_steps steps (50 by default), so make sure enough steps or epochs have run for rows to appear.
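Because the metrics are plain CSV, you can load them back into Python with pandas. A minimal sketch, assuming pandas is installed and the run directory is version_0; columns that were not logged at a given row appear as NaN.

Python
import pandas as pd

# Read the metrics written by CSVLogger
metrics = pd.read_csv("logs/my_csv_model/version_0/metrics.csv")

# Average the logged loss per epoch (a no-op if it was already logged
# once per epoch); NaN rows are ignored automatically
epoch_loss = metrics.groupby("epoch")["train_loss"].mean()
print(epoch_loss)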

(Screenshot: metrics.csv showing train_loss logged by epoch)

3. Using Callbacks to Extract Metrics

Callbacks are small pieces of code that hook into the training loop at particular points. They are particularly useful for checkpointing a model to guard against overfitting, or for recording the model's behavior as it trains.

Python
import pytorch_lightning as pl
import torch
from torch import nn
from torch.optim import Adam

# Define a simple model
class SampleModel(pl.LightningModule):
    def __init__(self):
        super(SampleModel, self).__init__()
        self.layer = nn.Linear(10, 2)  # Input size of 10 and 2 output classes

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = nn.CrossEntropyLoss()(logits, y)
        self.log("train_loss", loss)  # Log training loss for metrics
        return loss

    def configure_optimizers(self):
        return Adam(self.parameters(), lr=0.001)

# Custom callback: prints a message at the start of each epoch and the
# logged training loss at the end of each epoch
class LogLossCallback(pl.Callback):
    def on_train_epoch_start(self, trainer, pl_module):
        print(f"Starting epoch {trainer.current_epoch}")

    def on_train_epoch_end(self, trainer, pl_module):
        train_loss = trainer.callback_metrics.get("train_loss")
        if train_loss is not None:
            print(f"Epoch {trainer.current_epoch}: train_loss={train_loss:.4f}")


# Create random data
def random_data(num_samples=100, input_dim=10):
    X = torch.randn(num_samples, input_dim)  # Random input data
    y = torch.randint(0, 2, (num_samples,))   # Random targets (binary classification)
    return X, y

# Example usage with Trainer
if __name__ == "__main__":
    model = SampleModel()
    
    # Generate random data
    X, y = random_data(num_samples=1000)
    
    # Create a DataLoader
    train_loader = torch.utils.data.DataLoader(list(zip(X, y)), batch_size=32, shuffle=True)
    
    # Instantiate the Trainer with the custom callback
    trainer = pl.Trainer(callbacks=[LogLossCallback()], max_epochs=5)
    trainer.fit(model, train_loader)
    

Here we created a custom callback class that inherits from pl.Callback. At the start of every epoch the on_train_epoch_start hook is called and a message is printed; at the end of every epoch the logged training loss is read from trainer.callback_metrics and printed.

(Screenshot: console output showing the callback messages printed each epoch)

Accessing Logged Metrics After Training

Below, we use the Iris dataset and combine the CSV logger with a custom callback that stores the loss and accuracy of every epoch.

Python
import torch
from torch.utils.data import DataLoader, TensorDataset
import torch.nn.functional as F
import pytorch_lightning as pl
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from pytorch_lightning.callbacks import Callback
from pytorch_lightning.loggers import CSVLogger

# 1. Load and Preprocess the Iris Dataset
iris = load_iris()
X = iris['data']
y = iris['target']

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Convert to PyTorch tensors
X_tensor = torch.tensor(X_scaled, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.long)

# Split into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_tensor, y_tensor, test_size=0.2, random_state=42)

# Create DataLoaders
train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)

# 2. Define the PyTorch Lightning Model
class IrisClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer_1 = torch.nn.Linear(4, 16)  # Input is 4-dimensional (features in Iris dataset)
        self.layer_2 = torch.nn.Linear(16, 3)  # Output is 3-dimensional (three Iris classes)

    def forward(self, x):
        x = F.relu(self.layer_1(x))
        x = self.layer_2(x)
        return x

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)

        # Log training loss and accuracy
        acc = (y_hat.argmax(dim=1) == y).float().mean()
        self.log('train_loss_epoch', loss, on_epoch=True, prog_bar=True)
        self.log('train_acc_epoch', acc, on_epoch=True, prog_bar=True)

        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)

        # Log validation loss and accuracy
        acc = (y_hat.argmax(dim=1) == y).float().mean()
        self.log('val_loss', loss, on_epoch=True, prog_bar=True)
        self.log('val_acc', acc, on_epoch=True, prog_bar=True)

        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=0.001)
        return optimizer

# 3. Define a Custom Callback to Extract Loss and Accuracy
class MetricsLogger(Callback):
    def __init__(self):
        super().__init__()
        self.train_losses = []
        self.val_losses = []
        self.train_accs = []
        self.val_accs = []

    def on_validation_epoch_end(self, trainer, pl_module):
        # Extract metrics from the trainer's logger after the validation epoch ends
        train_loss = trainer.callback_metrics.get('train_loss_epoch', None)
        val_loss = trainer.callback_metrics.get('val_loss', None)
        train_acc = trainer.callback_metrics.get('train_acc_epoch', None)
        val_acc = trainer.callback_metrics.get('val_acc', None)

        # Only append metrics if they exist
        if train_loss is not None:
            self.train_losses.append(train_loss.item())
        if val_loss is not None:
            self.val_losses.append(val_loss.item())
        if train_acc is not None:
            self.train_accs.append(train_acc.item())
        if val_acc is not None:
            self.val_accs.append(val_acc.item())

        # Optionally, print metrics for each epoch
        if train_loss is not None and val_loss is not None:
            print(f'Epoch {trainer.current_epoch}: train_loss={train_loss:.4f}, val_loss={val_loss:.4f}')
        if train_acc is not None and val_acc is not None:
            print(f'Epoch {trainer.current_epoch}: train_acc={train_acc:.4f}, val_acc={val_acc:.4f}')

# 4. Train the Model with PyTorch Lightning

# Initialize the model, logger, and callback
model = IrisClassifier()
logger = CSVLogger(save_dir='logs/', name='iris_model')
metrics_logger = MetricsLogger()

# Trainer
trainer = pl.Trainer(
    max_epochs=20,
    logger=logger,
    callbacks=[metrics_logger]
)

# Fit the model
trainer.fit(model, train_dataloaders=train_loader, val_dataloaders=val_loader)

# After training, you can access the stored metrics from the callback
print("Training Losses by Epoch:", metrics_logger.train_losses)
print("Validation Losses by Epoch:", metrics_logger.val_losses)
print("Training Accuracies by Epoch:", metrics_logger.train_accs)
print("Validation Accuracies by Epoch:", metrics_logger.val_accs)

Output:

Output of the custom callback:

Training Losses by Epoch: [1.163916826248169, 1.1702001094818115, 0.9137388467788696, 0.9981975555419922, 1.1598992347717285, 0.9526604413986206, 0.8650658130645752, 0.9340900778770447, 1.1416115760803223, 0.9190675616264343, 0.8127445578575134, 0.9559628963470459, 0.887129545211792, 0.8117339015007019, 0.6771875023841858, 0.731735110282898, 0.4703018367290497, 0.6701182723045349, 0.6709480285644531, 0.6414933204650879]
Validation Losses by Epoch: [1.1944286823272705, 1.159570574760437, 1.1258552074432373, 1.094372272491455, 1.0632723569869995, 1.0325226783752441, 1.0021071434020996, 0.9709540605545044, 0.9395963549613953, 0.9076734185218811, 0.8769798874855042, 0.8452874422073364, 0.8138272762298584, 0.7839321494102478, 0.75382399559021, 0.7244547605514526, 0.6960629224777222, 0.6684385538101196, 0.6421023011207581, 0.6166333556175232, 0.593155562877655]
Training Accuracies by Epoch: [0.25, 0.125, 0.75, 0.25, 0.125, 0.5, 0.625, 0.5, 0.25, 0.5, 0.875, 0.625, 0.625, 1.0, 0.75, 0.625, 0.875, 0.625, 0.75, 0.875]
Validation Accuracies by Epoch: [0.0, 0.13333334028720856, 0.1666666716337204, 0.20000000298023224, 0.3333333432674408, 0.36666667461395264, 0.6000000238418579, 0.6000000238418579, 0.6333333253860474, 0.6333333253860474, 0.6333333253860474, 0.699999988079071, 0.699999988079071, 0.7333333492279053, 0.800000011920929, 0.8333333134651184, 0.8999999761581421, 0.8999999761581421, 0.8999999761581421, 0.8999999761581421, 0.8999999761581421]

Note that the validation lists contain one more entry than the training lists: by default, Lightning runs a short validation sanity check before training starts, which triggers on_validation_epoch_end once before the first training epoch.
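The final epoch's values also remain available on the trainer object itself after fit() returns. A short continuation of the example above (the keys must match the names passed to self.log):

Python
# trainer.callback_metrics still holds the last logged values as tensors
final_val_loss = trainer.callback_metrics.get('val_loss')
final_val_acc = trainer.callback_metrics.get('val_acc')
if final_val_loss is not None and final_val_acc is not None:
    print(f"Final val_loss={final_val_loss:.4f}, val_acc={final_val_acc:.4f}")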

Conclusion

PyTorch Lightning is an efficient library that scales to industry-level work. Developers can focus on the core logic of the model and need not worry about keeping track of metrics by hand: they can use the CSV logger, the TensorBoard logger, or custom callbacks as their requirements dictate.

