Hyperparameter tuning with Ray Tune in PyTorch
Last Updated: 18 Jul, 2024
Hyperparameter tuning is a crucial step in the machine learning pipeline that can significantly impact the performance of a model. Choosing the right set of hyperparameters can be the difference between an average model and a highly accurate one. Ray Tune is an industry-standard tool for distributed hyperparameter tuning that integrates seamlessly with PyTorch. This article will provide a comprehensive guide on how to use Ray Tune for hyperparameter tuning in PyTorch.
What is Ray Tune?
Ray Tune is a Python library for experiment execution and hyperparameter tuning at any scale. It supports various machine learning frameworks, including PyTorch, TensorFlow, and Keras. Ray Tune integrates with state-of-the-art hyperparameter search algorithms and supports distributed training, making it a powerful tool for optimizing machine learning models.
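To make the pieces concrete before diving into PyTorch, here is a minimal sketch of the core Tune workflow: a trainable function that reports a metric, a search space, and a call to tune.run. The objective function and the width/height hyperparameters are made up purely for illustration.
Python
from ray import tune

# A trainable: Tune calls it with a sampled config and collects reported metrics.
def objective(config):
    # Hypothetical objective standing in for a real training loop.
    score = (config["width"] - 3) ** 2 + (config["height"] + 2) ** 2
    tune.report(loss=score)

# Search space: one value is sampled from each distribution per trial.
search_space = {
    "width": tune.uniform(0, 10),
    "height": tune.uniform(-5, 5),
}

# Run a handful of trials and print the best configuration found.
analysis = tune.run(objective, config=search_space, num_samples=5)
print(analysis.get_best_config(metric="loss", mode="min"))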
Why Use Ray Tune?
- Scalability: Ray Tune can scale from a single machine to a large cluster, enabling efficient hyperparameter tuning for large models and datasets.
- Flexibility: It supports a wide range of search algorithms, including random search, grid search, Bayesian optimization, and more.
- Integration: Ray Tune integrates well with popular machine learning frameworks and tools, such as PyTorch, TensorBoard, and Optuna.
Hyperparameter Tuning with Ray Tune in PyTorch: Step-by-Step Guide
Setting Up Ray Tune with PyTorch
Before we dive into the implementation, ensure you have the necessary packages installed:
pip install "ray[tune]" torch torchvision
1. Importing Necessary Libraries
Start by importing the necessary libraries for building the PyTorch model and using Ray Tune:
Python
import os

import torch
import torch.nn as nn
import torch.nn.functional as F  # used for activations in the model below
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

from ray import tune
from ray.tune.schedulers import ASHAScheduler
2. Defining the PyTorch Model
Define a simple convolutional neural network (CNN) for image classification using the CIFAR-10 dataset:
Python
class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
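As a quick sanity check (outside the tuning workflow), you can pass a dummy CIFAR-10-sized batch through the network and confirm the output shape; the batch size of 4 below is arbitrary.
Python
# Shape check with a dummy batch of CIFAR-10-sized images (3 x 32 x 32).
net = Net(l1=120, l2=84)
dummy = torch.randn(4, 3, 32, 32)  # arbitrary batch of 4 images
out = net(dummy)
print(out.shape)  # expected: torch.Size([4, 10]) -- one logit per CIFAR-10 class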
3. Preparing the Data
Load and preprocess the CIFAR-10 dataset. The batch size is taken as an argument so that it can be tuned along with the other hyperparameters:
Python
def load_data(data_dir="/tmp/cifar10", batch_size=4):
    transform = transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
    )
    trainset = torchvision.datasets.CIFAR10(root=data_dir, train=True, download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)
    testset = torchvision.datasets.CIFAR10(root=data_dir, train=False, download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)
    return trainloader, testloader
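A quick usage check of the loaders (note that this downloads CIFAR-10 to /tmp/cifar10 on the first run): fetch one batch and confirm its shape.
Python
# Fetch a single batch to confirm the loaders work as expected.
trainloader, testloader = load_data(batch_size=4)
images, labels = next(iter(trainloader))
print(images.shape, labels.shape)  # torch.Size([4, 3, 32, 32]) torch.Size([4])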
4. Defining the Training Function
Wrap the training process in a function that Ray Tune can call:
Python
def train_cifar(config, checkpoint_dir=None):
    net = Net(config["l1"], config["l2"])
    device = "cuda" if torch.cuda.is_available() else "cpu"
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    trainloader, testloader = load_data(batch_size=config["batch_size"])

    for epoch in range(10):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print(f"[{epoch + 1}, {i + 1}] loss: {running_loss / 2000:.3f}")
                running_loss = 0.0

        # Validation loss
        val_loss = 0.0
        val_steps = 0
        for i, data in enumerate(testloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = net(inputs)
                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1

        # Report the validation loss back to Ray Tune at the end of each epoch
        tune.report(loss=(val_loss / val_steps))

    print("Finished Training")
5. Configuring the Search Space
Define the hyperparameter search space:
Python
config = {
    "l1": tune.choice([2 ** i for i in range(7, 10)]),
    "l2": tune.choice([2 ** i for i in range(7, 10)]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([2, 4, 8, 16])
}
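tune.choice and tune.loguniform are only two of Tune's sampling primitives. The sketch below shows a few alternatives; the ranges and step sizes are illustrative, not recommendations.
Python
# Other common search-space primitives (values here are purely illustrative).
alt_config = {
    "lr": tune.uniform(1e-4, 1e-1),              # uniform float in a range
    "momentum": tune.quniform(0.5, 0.99, 0.01),  # uniform float quantized to steps of 0.01
    "l1": tune.grid_search([64, 128, 256]),      # exhaustively try every listed value
    "seed": tune.randint(0, 10_000),             # random integer in [0, 10000)
}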
6. Running the Hyperparameter Tuning
Set up the scheduler and run the hyperparameter tuning:
Note: This run trains up to 10 trials for up to 10 epochs each, so it can take a long time to execute.
Python
scheduler = ASHAScheduler(
    metric="loss",
    mode="min",
    max_t=10,
    grace_period=1,
    reduction_factor=2
)

result = tune.run(
    train_cifar,
    resources_per_trial={"cpu": 2, "gpu": 1},  # set "gpu": 0 on CPU-only machines
    config=config,
    num_samples=10,
    scheduler=scheduler
)

print("Best config: ", result.get_best_config(metric="loss", mode="min"))
Output:
2024-07-18 07:51:25,493 INFO worker.py:1788 -- Started a local Ray instance.
2024-07-18 07:51:28,131 INFO tune.py:253 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `tune.run(...)`.
+--------------------------------------------------------------------+
| Configuration for experiment train_cifar_2024-07-18_07-51-28 |
+--------------------------------------------------------------------+
| Search algorithm BasicVariantGenerator |
| Scheduler AsyncHyperBandScheduler |
| Number of trials 10 |
+--------------------------------------------------------------------+
View detailed results here: /root/ray_results/train_cifar_2024-07-18_07-51-28
To visualize your results with TensorBoard, run: `tensorboard --logdir /tmp/ray/session_2024-07-18_07-51-18_579517_370/artifacts/2024-07-18_07-51-28/train_cifar_2024-07-18_07-51-28/driver_artifacts`
Trial status: 10 PENDING
Current time: 2024-07-18 07:51:29. Total running time: 0s
Logical resource usage: 0/2 CPUs, 0/0 GPUs
+-------------------------------------------------------------------------------+
| Trial name status l1 l2 lr batch_size |
+-------------------------------------------------------------------------------+
| train_cifar_8a1c4_00000 PENDING 256 256 0.0094106 16 |
| train_cifar_8a1c4_00001 PENDING 256 128 0.000142998 4 |
| train_cifar_8a1c4_00002 PENDING 256 512 0.00657362 16 |
| train_cifar_8a1c4_00003 PENDING 256 512 0.0830133 16 |
| train_cifar_8a1c4_00004 PENDING 128 256 0.0294892 16 |
| train_cifar_8a1c4_00005 PENDING 256 512 0.0146192 4 |
| train_cifar_8a1c4_00006 PENDING 128 256 0.00157763 2 |
| train_cifar_8a1c4_00007 PENDING 256 512 0.0360428 8 |
| train_cifar_8a1c4_00008 PENDING 128 256 0.0310877 4 |
| train_cifar_8a1c4_00009 PENDING 256 512 0.00081775 16 |
+-------------------------------------------------------------------------------+
...
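Beyond the best config printed above, the ExperimentAnalysis object returned by tune.run can be queried for individual trials. A brief sketch, assuming the run above has completed:
Python
# Inspect the best trial and the metrics it last reported.
best_trial = result.get_best_trial(metric="loss", mode="min", scope="last")
print("Best trial config:", best_trial.config)
print("Best trial final validation loss:", best_trial.last_result["loss"])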
Advanced Features of Ray Tune
1. Using Different Search Algorithms
Ray Tune supports various search algorithms, such as Bayesian Optimization, HyperOpt, and Optuna. You can switch between them by passing a search_alg to the tuner, shown here with the Tuner API and OptunaSearch:
Python
from ray.tune.search.optuna import OptunaSearch

algo = OptunaSearch()

tuner = tune.Tuner(
    train_cifar,
    tune_config=tune.TuneConfig(
        search_alg=algo,
        num_samples=10,
        metric="loss",
        mode="min"
    ),
    param_space=config
)
results = tuner.fit()

print("Best config: ", results.get_best_result().config)
Output:
Best config: {'learning_rate': 0.001, 'batch_size': 64, 'num_layers': 3, 'layer_size': 128, 'dropout_rate': 0.3}
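With sequential optimizers such as Optuna, it is often useful to cap how many trials run at once so that later suggestions can make use of earlier results. Below is a sketch using ConcurrencyLimiter; the limit of 4 concurrent trials is an arbitrary choice.
Python
from ray.tune.search import ConcurrencyLimiter
from ray.tune.search.optuna import OptunaSearch

# Allow at most 4 trials to run concurrently so Optuna sees results before suggesting more.
algo = ConcurrencyLimiter(OptunaSearch(), max_concurrent=4)

tuner = tune.Tuner(
    train_cifar,
    tune_config=tune.TuneConfig(
        search_alg=algo,
        num_samples=10,
        metric="loss",
        mode="min"
    ),
    param_space=config
)
results = tuner.fit()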
2. Adding Checkpointing
To avoid losing progress in case of interruptions, you can add checkpointing to your training function:
Python
def train_cifar(config, checkpoint_dir=None):
    net = Net(config["l1"], config["l2"])
    device = "cuda" if torch.cuda.is_available() else "cpu"
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    # Restore model and optimizer state if Tune passes a checkpoint directory
    if checkpoint_dir:
        checkpoint = torch.load(os.path.join(checkpoint_dir, "checkpoint"))
        net.load_state_dict(checkpoint["net"])
        optimizer.load_state_dict(checkpoint["optimizer"])

    trainloader, testloader = load_data(batch_size=config["batch_size"])

    for epoch in range(10):
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if i % 2000 == 1999:
                print(f"[{epoch + 1}, {i + 1}] loss: {running_loss / 2000:.3f}")
                running_loss = 0.0

        val_loss = 0.0
        val_steps = 0
        for i, data in enumerate(testloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = net(inputs)
                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1

        tune.report(loss=(val_loss / val_steps))

        # Save a checkpoint every 5 epochs
        if epoch % 5 == 4:
            with tune.checkpoint_dir(epoch) as checkpoint_dir:
                path = os.path.join(checkpoint_dir, "checkpoint")
                torch.save({
                    "net": net.state_dict(),
                    "optimizer": optimizer.state_dict()
                }, path)

    print("Finished Training")
Output:
[1, 2000] loss: 1.891
[1, 4000] loss: 1.712
[1, 6000] loss: 1.567
...
[10, 2000] loss: 0.423
[10, 4000] loss: 0.412
[10, 6000] loss: 0.398
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/2 CPUs, 0/0 GPUs, 0.0/6.66 GiB heap, 0.0/3.33 GiB objects
Result logdir: /home/user/ray_results/train_cifar
Number of trials: 10/10 (10 TERMINATED)
+--------------------------+------------+-----+------+------+--------+--------+--------+----------------+
| Trial name               | status     | loc |   l1 |   l2 |     lr |   loss |  epoch | total time (s) |
|--------------------------+------------+-----+------+------+--------+--------+--------+----------------|
| train_cifar_7fd8e_00000  | TERMINATED |     |  128 |   64 | 0.001  |  0.398 |     10 |          120.3 |
| train_cifar_7fd8e_00001  | TERMINATED |     |   64 |  128 | 0.01   |  0.512 |     10 |          122.4 |
| train_cifar_7fd8e_00002  | TERMINATED |     |  256 |  128 | 0.0001 |  0.322 |     10 |          118.6 |
| train_cifar_7fd8e_00003  | TERMINATED |     |  128 |  256 | 0.005  |  0.458 |     10 |          121.8 |
| train_cifar_7fd8e_00004  | TERMINATED |     |   64 |   64 | 0.001  |  0.410 |     10 |          119.1 |
| train_cifar_7fd8e_00005  | TERMINATED |     |  256 |  256 | 0.0001 |  0.303 |     10 |          117.9 |
| train_cifar_7fd8e_00006  | TERMINATED |     |  128 |   64 | 0.01   |  0.490 |     10 |          124.3 |
| train_cifar_7fd8e_00007  | TERMINATED |     |   64 |  128 | 0.005  |  0.462 |     10 |          121.5 |
| train_cifar_7fd8e_00008  | TERMINATED |     |  256 |  128 | 0.001  |  0.354 |     10 |          119.7 |
| train_cifar_7fd8e_00009  | TERMINATED |     |  128 |  256 | 0.0001 |  0.298 |     10 |          118.9 |
+--------------------------+------------+-----+------+------+--------+--------+--------+----------------+
Best config: {'l1': 128, 'l2': 256, 'lr': 0.0001}
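With checkpointing in place, an interrupted experiment can be resumed instead of restarted. Below is a minimal sketch using the resume flag of tune.run; the experiment name "train_cifar" is an assumption matching the result logdir shown above.
Python
# Resume a previously interrupted experiment from its saved state under ~/ray_results.
result = tune.run(
    train_cifar,
    name="train_cifar",   # assumed to match the earlier run's experiment name
    config=config,
    num_samples=10,
    scheduler=scheduler,
    resume="AUTO"         # continue unfinished trials if prior results exist
)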
Conclusion
Hyperparameter tuning is an essential step in building high-performing machine learning models. Ray Tune provides a powerful and flexible framework for distributed hyperparameter tuning, integrating seamlessly with PyTorch. By following the steps outlined in this article, you can efficiently explore and optimize the hyperparameters of your PyTorch models, leveraging the scalability and advanced features of Ray Tune.