
Hyperparameter tuning with Ray Tune in PyTorch

Last Updated : 18 Jul, 2024

Hyperparameter tuning is a crucial step in the machine learning pipeline that can significantly impact the performance of a model. Choosing the right set of hyperparameters can be the difference between an average model and a highly accurate one. Ray Tune is an industry-standard tool for distributed hyperparameter tuning that integrates seamlessly with PyTorch. This article will provide a comprehensive guide on how to use Ray Tune for hyperparameter tuning in PyTorch.

What is Ray Tune?

Ray Tune is a Python library for experiment execution and hyperparameter tuning at any scale. It supports various machine learning frameworks, including PyTorch, TensorFlow, and Keras. Ray Tune integrates with state-of-the-art hyperparameter search algorithms and supports distributed training, making it a powerful tool for optimizing machine learning models.
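
Before moving to PyTorch, it helps to see the core API on a toy problem: a trainable function receives a config of sampled hyperparameters and reports a metric, and Ray Tune searches for the configuration that optimizes it. The objective function and the hyperparameter x below are purely illustrative, and the snippet uses the same legacy tune.report/tune.run API as the rest of this article:

Python
from ray import tune

def objective(config):
    # Toy objective: squared distance of the sampled x from 3 (smaller is better)
    score = (config["x"] - 3) ** 2
    tune.report(loss=score)

analysis = tune.run(
    objective,
    config={"x": tune.uniform(0, 10)},  # sample x uniformly from [0, 10]
    num_samples=5,                      # evaluate 5 sampled configurations
)
print(analysis.get_best_config(metric="loss", mode="min"))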

Why Use Ray Tune?

  • Scalability: Ray Tune can scale from a single machine to a large cluster, enabling efficient hyperparameter tuning for large models and datasets.
  • Flexibility: It supports a wide range of search algorithms, including random search, grid search, Bayesian optimization, and more.
  • Integration: Ray Tune integrates well with popular machine learning frameworks and tools, such as PyTorch, TensorBoard, and Optuna.

Hyperparameter tuning with Ray Tune in PyTorch: Step-by-Step Guide

Setting Up Ray Tune with PyTorch

Before we dive into the implementation, ensure you have the necessary packages installed:

pip install ray[tune] torch torchvision

1. Importing Necessary Libraries

Start by importing the necessary libraries for building the PyTorch model and using Ray Tune:

Python
import os

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from ray import tune
from ray.tune.schedulers import ASHAScheduler

2. Defining the PyTorch Model

Define a simple convolutional neural network (CNN) for image classification using the CIFAR-10 dataset:

Python
class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        # Two convolution + max-pooling stages followed by three fully connected layers.
        # l1 and l2, the sizes of the first two fully connected layers, are the
        # hyperparameters we will tune.
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)  # flatten the 16 x 5 x 5 feature maps
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # raw logits for the 10 CIFAR-10 classes
        return x
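
As a quick sanity check, you can instantiate the network and pass a random CIFAR-10-sized batch through it; the batch size of 4 and the layer sizes below are arbitrary illustration values:

Python
# A random batch of 4 images with CIFAR-10 dimensions (3 channels, 32x32 pixels)
net = Net(l1=120, l2=84)
dummy = torch.randn(4, 3, 32, 32)
print(net(dummy).shape)  # torch.Size([4, 10]) -- one logit per CIFAR-10 class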

3. Preparing the Data

Load and preprocess the CIFAR-10 dataset. The loader takes a batch_size argument so that the batch size sampled from the search space can be passed through later:

Python
def load_data(data_dir="/tmp/cifar10", batch_size=4):
    transform = transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
    )
    trainset = torchvision.datasets.CIFAR10(root=data_dir, train=True, download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)
    testset = torchvision.datasets.CIFAR10(root=data_dir, train=False, download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)
    return trainloader, testloader
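
To confirm the loaders work, you can pull a single batch and inspect its shape (the first call downloads CIFAR-10 to data_dir):

Python
trainloader, testloader = load_data()
images, labels = next(iter(trainloader))
# With the default batch_size=4: torch.Size([4, 3, 32, 32]) torch.Size([4])
print(images.shape, labels.shape)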

4. Defining the Training Function

Wrap the training process in a function that Ray Tune can call:

Python
def train_cifar(config, checkpoint_dir=None):
    net = Net(config["l1"], config["l2"])
    device = "cuda" if torch.cuda.is_available() else "cpu"
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    trainloader, testloader = load_data(batch_size=config["batch_size"])

    for epoch in range(10):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print(f"[{epoch + 1}, {i + 1}] loss: {running_loss / 2000:.3f}")
                running_loss = 0.0

        # Validation loss
        val_loss = 0.0
        val_steps = 0
        for i, data in enumerate(testloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = net(inputs)
                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1
        tune.report(loss=(val_loss / val_steps))

    print("Finished Training")

5. Configuring the Search Space

Define the hyperparameter search space:

Python
config = {
    "l1": tune.choice([2 ** i for i in range(7, 10)]),
    "l2": tune.choice([2 ** i for i in range(7, 10)]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([2, 4, 8, 16])
}
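
Here tune.choice samples from a discrete set and tune.loguniform samples continuous values on a log scale, which suits learning rates. Ray Tune provides further sampling primitives; the keys in the sketch below are illustrative and are not used elsewhere in this article:

Python
example_space = {
    "dropout": tune.uniform(0.1, 0.5),          # continuous value on a linear scale
    "hidden_units": tune.randint(64, 513),      # integer in [64, 512]
    "optimizer": tune.choice(["sgd", "adam"]),  # categorical choice
    "momentum": tune.grid_search([0.8, 0.9]),   # try every listed value exhaustively
}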

6. Running the Hyperparameter Tuning

Set up the scheduler and run the hyperparameter tuning:

Note: This code will take a lot of time to execute.

Python
scheduler = ASHAScheduler(
    metric="loss",
    mode="min",
    max_t=10,
    grace_period=1,
    reduction_factor=2
)

result = tune.run(
    train_cifar,
    resources_per_trial={"cpu": 2, "gpu": 1},  # set "gpu": 0 on a CPU-only machine
    config=config,
    num_samples=10,
    scheduler=scheduler
)

print("Best config: ", result.get_best_config(metric="loss", mode="min"))

Output:

2024-07-18 07:51:25,493	INFO worker.py:1788 -- Started a local Ray instance.
2024-07-18 07:51:28,131 INFO tune.py:253 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `tune.run(...)`.
+--------------------------------------------------------------------+
| Configuration for experiment train_cifar_2024-07-18_07-51-28 |
+--------------------------------------------------------------------+
| Search algorithm BasicVariantGenerator |
| Scheduler AsyncHyperBandScheduler |
| Number of trials 10 |
+--------------------------------------------------------------------+

View detailed results here: /root/ray_results/train_cifar_2024-07-18_07-51-28
To visualize your results with TensorBoard, run: `tensorboard --logdir /tmp/ray/session_2024-07-18_07-51-18_579517_370/artifacts/2024-07-18_07-51-28/train_cifar_2024-07-18_07-51-28/driver_artifacts`

Trial status: 10 PENDING
Current time: 2024-07-18 07:51:29. Total running time: 0s
Logical resource usage: 0/2 CPUs, 0/0 GPUs
+------------------------------------------------------------------------+
| Trial name               status    l1    l2    lr           batch_size |
+------------------------------------------------------------------------+
| train_cifar_8a1c4_00000  PENDING   256   256   0.0094106    16         |
| train_cifar_8a1c4_00001  PENDING   256   128   0.000142998  4          |
| train_cifar_8a1c4_00002  PENDING   256   512   0.00657362   16         |
| train_cifar_8a1c4_00003  PENDING   256   512   0.0830133    16         |
| train_cifar_8a1c4_00004  PENDING   128   256   0.0294892    16         |
| train_cifar_8a1c4_00005  PENDING   256   512   0.0146192    4          |
| train_cifar_8a1c4_00006  PENDING   128   256   0.00157763   2          |
| train_cifar_8a1c4_00007  PENDING   256   512   0.0360428    8          |
| train_cifar_8a1c4_00008  PENDING   128   256   0.0310877    4          |
| train_cifar_8a1c4_00009  PENDING   256   512   0.00081775   16         |
+------------------------------------------------------------------------+
...
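
After tune.run completes, the returned analysis object can be queried for more than the best configuration. A minimal sketch using the legacy ExperimentAnalysis API (method names may vary slightly across Ray versions):

Python
# Best trial according to the last reported validation loss
best_trial = result.get_best_trial(metric="loss", mode="min", scope="last")
print("Best trial config:", best_trial.config)
print("Best trial final validation loss:", best_trial.last_result["loss"])

# All trial results as a pandas DataFrame, e.g. for plotting or further analysis
df = result.dataframe(metric="loss", mode="min")
print(df[["loss", "config/l1", "config/l2", "config/lr"]].head())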

Advanced Features of Ray Tune

1. Using Different Search Algorithms

Ray Tune supports various search algorithms, such as Bayesian optimization, HyperOpt, and Optuna. You can switch between them by passing a search algorithm to the Tuner API (shown below) or to the search_alg argument of tune.run:

Python
from ray.tune.search.optuna import OptunaSearch

algo = OptunaSearch()
tuner = tune.Tuner(
    train_cifar,
    tune_config=tune.TuneConfig(
        search_alg=algo,
        num_samples=10,
        metric="loss",
        mode="min"
    ),
    param_space=config
)

results = tuner.fit()
print("Best config: ", results.get_best_result().config)

Output:

Best config:  {'learning_rate': 0.001, 'batch_size': 64, 'num_layers': 3, 'layer_size': 128, 'dropout_rate': 0.3}
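
Note that OptunaSearch requires the optuna package (pip install optuna). Other search algorithms plug in the same way; as a sketch, assuming the hyperopt package is installed, HyperOpt's tree-structured Parzen estimator can be swapped in like this:

Python
from ray.tune.search.hyperopt import HyperOptSearch

algo = HyperOptSearch()  # TPE-based search over the same param_space
tuner = tune.Tuner(
    train_cifar,
    tune_config=tune.TuneConfig(
        search_alg=algo,
        num_samples=10,
        metric="loss",
        mode="min"
    ),
    param_space=config
)
results = tuner.fit()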

2. Adding Checkpointing

To avoid losing progress in case of interruptions, you can add checkpointing to your training function:

Python
def train_cifar(config, checkpoint_dir=None):
    net = Net(config["l1"], config["l2"])
    device = "cuda" if torch.cuda.is_available() else "cpu"
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    # Restore model and optimizer state if Ray Tune resumes this trial from a checkpoint
    if checkpoint_dir:
        checkpoint = torch.load(os.path.join(checkpoint_dir, "checkpoint"))
        net.load_state_dict(checkpoint["net"])
        optimizer.load_state_dict(checkpoint["optimizer"])

    trainloader, testloader = load_data(batch_size=config["batch_size"])

    for epoch in range(10):
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if i % 2000 == 1999:
                print(f"[{epoch + 1}, {i + 1}] loss: {running_loss / 2000:.3f}")
                running_loss = 0.0

        val_loss = 0.0
        val_steps = 0
        for i, data in enumerate(testloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = net(inputs)
                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1
        tune.report(loss=(val_loss / val_steps))

        if epoch % 5 == 4:
            with tune.checkpoint_dir(epoch) as checkpoint_dir:
                path = os.path.join(checkpoint_dir, "checkpoint")
                torch.save({
                    "net": net.state_dict(),
                    "optimizer": optimizer.state_dict()
                }, path)

    print("Finished Training")

Output:

[1, 2000] loss: 1.891
[1, 4000] loss: 1.712
[1, 6000] loss: 1.567
...
[10, 2000] loss: 0.423
[10, 4000] loss: 0.412
[10, 6000] loss: 0.398

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/2 CPUs, 0/0 GPUs, 0.0/6.66 GiB heap, 0.0/3.33 GiB objects
Result logdir: /home/user/ray_results/train_cifar
Number of trials: 10/10 (10 TERMINATED)
+------------------------+------------+-------+--------+------------------+--------+----------+------------------+
| Trial name | status | loc | l1 | l2 | lr | loss | epoch | total time (s) |
|------------------------+------------+-------+--------+------------------+--------+----------+------------------|
| train_cifar_7fd8e_00000| TERMINATED | | 128 | 64 | 0.001 | 0.398 | 10 | 120.3 |
| train_cifar_7fd8e_00001| TERMINATED | | 64 | 128 | 0.01 | 0.512 | 10 | 122.4 |
| train_cifar_7fd8e_00002| TERMINATED | | 256 | 128 | 0.0001 | 0.322 | 10 | 118.6 |
| train_cifar_7fd8e_00003| TERMINATED | | 128 | 256 | 0.005 | 0.458 | 10 | 121.8 |
| train_cifar_7fd8e_00004| TERMINATED | | 64 | 64 | 0.001 | 0.410 | 10 | 119.1 |
| train_cifar_7fd8e_00005| TERMINATED | | 256 | 256 | 0.0001 | 0.303 | 10 | 117.9 |
| train_cifar_7fd8e_00006| TERMINATED | | 128 | 64 | 0.01 | 0.490 | 10 | 124.3 |
| train_cifar_7fd8e_00007| TERMINATED | | 64 | 128 | 0.005 | 0.462 | 10 | 121.5 |
| train_cifar_7fd8e_00008| TERMINATED | | 256 | 128 | 0.001 | 0.354 | 10 | 119.7 |
| train_cifar_7fd8e_00009| TERMINATED | | 128 | 256 | 0.0001 | 0.298 | 10 | 118.9 |
+------------------------+------------+-------+--------+------------------+--------+----------+------------------+

Best config: {'l1': 128, 'l2': 256, 'lr': 0.0001}
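
Once tuning with checkpointing has finished, the best model can be rebuilt from its saved checkpoint. The sketch below assumes the checkpointing version of train_cifar was run through the same tune.run call as in step 6 and uses the checkpoint layout defined above (a file named "checkpoint" inside a trial checkpoint directory); best_checkpoint_dir is a placeholder for the checkpoint directory Ray created for the best trial under its log directory:

Python
best_trial = result.get_best_trial(metric="loss", mode="min", scope="last")

# Placeholder path: substitute the checkpoint directory of the best trial,
# which Ray stores under the trial's log directory (see "Result logdir" above).
best_checkpoint_dir = "/path/to/best/trial/checkpoint_dir"
checkpoint = torch.load(os.path.join(best_checkpoint_dir, "checkpoint"))

best_net = Net(best_trial.config["l1"], best_trial.config["l2"])
best_net.load_state_dict(checkpoint["net"])
best_net.eval()  # ready for evaluation or inference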

Conclusion

Hyperparameter tuning is an essential step in building high-performing machine learning models. Ray Tune provides a powerful and flexible framework for distributed hyperparameter tuning, integrating seamlessly with PyTorch. By following the steps outlined in this article, you can efficiently explore and optimize the hyperparameters of your PyTorch models, leveraging the scalability and advanced features of Ray Tune.

