
Building a Convolutional Neural Network using PyTorch

Last Updated : 11 Feb, 2025

Convolutional Neural Networks (CNNs) are deep learning models widely used for image processing tasks. They automatically learn spatial hierarchies of features from images through convolutional, pooling, and fully connected layers. In this article, we'll build a CNN model using PyTorch: defining the network architecture, preparing the data, training the model and evaluating its performance.

Step-by-Step Implementation of a Convolutional Neural Network in PyTorch

Step 1: Import necessary libraries

Here we import the core PyTorch modules: torch, torch.nn for network layers, torch.optim for optimizers and torch.nn.functional for activation and pooling functions, along with torchvision and its transforms module for loading and preprocessing the CIFAR-10 dataset.

Python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torch.nn.functional as F

Step 2: Prepare the dataset

  • This code sets up the CIFAR-10 dataset for training and testing the network.
  • It defines a sequence of image transformations that converts images to PyTorch tensors and normalizes each RGB channel to the range [-1, 1]. It then creates dataset objects for the training and test splits of CIFAR-10, specifying the root directory, whether the split is for training or testing, and the transformation pipeline.
  • Next, it wraps both datasets in data loaders, which load the data in mini-batches of 4 images, shuffle the training set and use two worker processes for faster loading. A quick sanity check of the loaded batches follows the code below.
  • Finally, it defines the class labels for CIFAR-10, the 10 object categories in the dataset.
Python
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
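
To confirm the loaders produce what we expect, you can pull a single batch and inspect it. This is a minimal sketch; with the settings above, each batch should contain 4 RGB images of size 32x32.

Python
# Fetch one batch from the training loader as a quick sanity check.
images, labels = next(iter(trainloader))

print(images.shape)                      # expected: torch.Size([4, 3, 32, 32])
print(labels.shape)                      # expected: torch.Size([4])
print([classes[int(l)] for l in labels])  # human-readable class names for this batch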

Step 3: Define the CNN architecture

  • This code defines the network architecture using the nn.Module class from PyTorch. The Net class inherits from nn.Module and declares its layers in the __init__ method.
  • It has two convolutional layers (conv1 and conv2), each followed by a ReLU activation and a 2x2 max pooling layer (pool, defined once and reused). Three fully connected layers (fc1, fc2 and fc3) then map the extracted features to the 10 class scores.
  • The forward method defines the forward pass, passing the input x through each layer in sequence. The view call flattens the output of the second pooling layer (16 channels of 5x5 feature maps, i.e. 400 values per image) so it can be fed to the fully connected layers; the shape check after the code confirms these dimensions. Finally, an instance of the Net class is created as net, representing the model we will train.
Python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels (RGB) -> 6 feature maps, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 max pooling, reused after each conv layer
        self.conv2 = nn.Conv2d(6, 16, 5)       # 6 -> 16 feature maps, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # flattened 16x5x5 feature maps -> 120 units
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 output scores, one per CIFAR-10 class

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)             # flatten to (batch_size, 400)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
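
The 16 * 5 * 5 input size of fc1 comes from tracing the spatial dimensions: a 32x32 image becomes 28x28 after conv1 (5x5 kernel, no padding), 14x14 after pooling, 10x10 after conv2 and 5x5 after the second pooling, with 16 channels. A quick way to verify this with a random dummy batch (a sketch, not part of training):

Python
# Pass a dummy input through the layers to confirm the expected shapes.
dummy = torch.randn(1, 3, 32, 32)                 # one fake CIFAR-10-sized image
features = net.pool(F.relu(net.conv1(dummy)))
features = net.pool(F.relu(net.conv2(features)))
print(features.shape)                             # expected: torch.Size([1, 16, 5, 5])
print(net(dummy).shape)                           # expected: torch.Size([1, 10])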

Step 4: Define loss function and optimizer

  • In this code, nn.CrossEntropyLoss() is used as the loss function (criterion) for training the network. It is the standard choice for multi-class classification and computes the loss between the network's raw output scores (logits) and the true class labels, applying log-softmax internally (see the small sketch after the code).
  • The optimizer (optim.SGD) updates the weights of the network during training. Stochastic Gradient Descent (SGD) is used here with a learning rate of 0.001 and momentum of 0.9.
Python
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
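
To illustrate what the loss function expects, here is a tiny sketch with dummy values: CrossEntropyLoss takes unnormalized scores of shape (batch, num_classes) and integer class indices, and returns a scalar loss.

Python
# Dummy example: 4 samples, 10 classes, random logits and random integer targets.
dummy_logits = torch.randn(4, 10)
dummy_targets = torch.randint(0, 10, (4,))
print(criterion(dummy_logits, dummy_targets))  # scalar loss, roughly ln(10) ~ 2.3 for random scores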

Step 5: Train the network

This code trains the network (net) on the CIFAR-10 training set for 2 epochs, using the loss function (criterion) and optimizer (optimizer) defined above, and prints the average loss every 2,000 mini-batches. A short sketch after the training loop shows how to save the learned weights.

Python
for epoch in range(2):  # loop over the training set twice

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data              # data is a list of [inputs, labels]

        optimizer.zero_grad()              # zero the parameter gradients

        outputs = net(inputs)              # forward pass
        loss = criterion(outputs, labels)
        loss.backward()                    # backward pass
        optimizer.step()                   # update the weights

        running_loss += loss.item()
        if i % 2000 == 1999:               # print the average loss every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
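
Once training finishes, the learned weights can be saved with state_dict and reloaded later. A minimal sketch (the file path below is just an example, not a fixed convention):

Python
# Save the trained parameters to disk (example path).
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

# Later, rebuild the model and load the saved parameters.
net_reloaded = Net()
net_reloaded.load_state_dict(torch.load(PATH))
net_reloaded.eval()  # switch to evaluation mode before inference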

Step 6: Testing the network

This code evaluates the trained network (net) on the test set (testloader). Gradient tracking is disabled with torch.no_grad() since no training takes place. For each batch, it computes the network's outputs, takes the class with the highest score as the prediction and compares the predictions with the true labels to compute the overall accuracy. A per-class breakdown follows the code.

Python
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
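
Overall accuracy can hide large differences between classes. Here is a sketch that breaks accuracy down per class using the same test loader (each CIFAR-10 class has 1,000 test images):

Python
# Count correct predictions separately for each of the 10 classes.
correct_per_class = {cls: 0 for cls in classes}
total_per_class = {cls: 0 for cls in classes}

with torch.no_grad():
    for images, labels in testloader:
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        for label, prediction in zip(labels, predicted):
            cls = classes[label]
            total_per_class[cls] += 1
            if label == prediction:
                correct_per_class[cls] += 1

for cls in classes:
    print('Accuracy for %5s: %.1f %%' %
          (cls, 100 * correct_per_class[cls] / total_per_class[cls]))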

Complete Code to Build a CNN Using PyTorch

Python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torch.nn.functional as F

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2): 

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data

        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 2000 == 1999: 
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Output:

[1, 2000] loss: 2.279
[1, 4000] loss: 1.992
[1, 6000] loss: 1.718
[1, 8000] loss: 1.589
[1, 10000] loss: 1.513
[1, 12000] loss: 1.492
[2, 2000] loss: 1.410
[2, 4000] loss: 1.375
[2, 6000] loss: 1.366
[2, 8000] loss: 1.343
[2, 10000] loss: 1.325
[2, 12000] loss: 1.263
Finished Training
Accuracy of the network on the 10000 test images: 55 %

The model's accuracy of 55% shows that it is underperforming, which is expected given the simple architecture and the short two-epoch training run. To improve it, we can train for more epochs, adjust the learning rate and momentum, or switch to a different optimizer such as Adam, as shown in the sketch below. These changes can help the model reach higher accuracy.
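
For example, swapping SGD for Adam is a one-line change before the training loop; the learning rate of 0.001 here is a common starting point, not a tuned value.

Python
# Replace the SGD optimizer with Adam; the rest of the training loop stays the same.
optimizer = optim.Adam(net.parameters(), lr=0.001)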

