
Building a Convolutional Neural Network using PyTorch

Last Updated : 11 Feb, 2025

Convolutional Neural Networks (CNNs) are deep learning models widely used for image processing tasks. They automatically learn spatial hierarchies of features from images through convolutional, pooling, and fully connected layers. In this article, we'll build a CNN model using PyTorch: defining the network architecture, preparing the data, training the model and evaluating its performance.

Step-by-Step Implementation of a Convolutional Neural Network in PyTorch

Step 1: Import necessary libraries

Here we import the core PyTorch modules: torch, torch.nn for network layers, torch.optim for optimizers and torch.nn.functional for activation and pooling functions, along with torchvision and its transforms module for loading and preprocessing the CIFAR-10 dataset.

Python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torch.nn.functional as F

Step 2: Prepare the dataset

  • This code sets up the CIFAR-10 dataset for training and testing the network.
  • It defines a sequence of image transformations that converts images to PyTorch tensors and normalizes each RGB channel to the range [-1, 1]. It then creates dataset objects for the training and test splits of CIFAR-10, specifying the root directory, whether the split is for training or testing, and the transformation pipeline.
  • Next, it wraps both datasets in data loaders, which load the data in mini-batches of 4 images, shuffle the training set and use two worker processes for faster loading. A quick sanity check of the loaded batches follows the code below.
  • Finally, it defines the class labels for CIFAR-10, the 10 object categories in the dataset.
Python
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
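
To confirm the loaders produce what we expect, you can pull a single batch and inspect it. This is a minimal sketch; with the settings above, each batch should contain 4 RGB images of size 32x32.

Python
# Fetch one batch from the training loader as a quick sanity check.
images, labels = next(iter(trainloader))

print(images.shape)                      # expected: torch.Size([4, 3, 32, 32])
print(labels.shape)                      # expected: torch.Size([4])
print([classes[int(l)] for l in labels])  # human-readable class names for this batch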

Step 3: Define the CNN architecture

  • This code defines the network architecture using the nn.Module class from PyTorch. The Net class inherits from nn.Module and declares its layers in the __init__ method.
  • It has two convolutional layers (conv1 and conv2), each followed by a ReLU activation and a 2x2 max pooling layer (pool, defined once and reused). Three fully connected layers (fc1, fc2 and fc3) then map the extracted features to the 10 class scores.
  • The forward method defines the forward pass, passing the input x through each layer in sequence. The view call flattens the output of the second pooling layer (16 channels of 5x5 feature maps, i.e. 400 values per image) so it can be fed to the fully connected layers; the shape check after the code confirms these dimensions. Finally, an instance of the Net class is created as net, representing the model we will train.
Python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels (RGB) -> 6 feature maps, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 max pooling, reused after each conv layer
        self.conv2 = nn.Conv2d(6, 16, 5)       # 6 -> 16 feature maps, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # flattened 16x5x5 feature maps -> 120 units
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 output scores, one per CIFAR-10 class

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)             # flatten to (batch_size, 400)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
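
The 16 * 5 * 5 input size of fc1 comes from tracing the spatial dimensions: a 32x32 image becomes 28x28 after conv1 (5x5 kernel, no padding), 14x14 after pooling, 10x10 after conv2 and 5x5 after the second pooling, with 16 channels. A quick way to verify this with a random dummy batch (a sketch, not part of training):

Python
# Pass a dummy input through the layers to confirm the expected shapes.
dummy = torch.randn(1, 3, 32, 32)                 # one fake CIFAR-10-sized image
features = net.pool(F.relu(net.conv1(dummy)))
features = net.pool(F.relu(net.conv2(features)))
print(features.shape)                             # expected: torch.Size([1, 16, 5, 5])
print(net(dummy).shape)                           # expected: torch.Size([1, 10])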

Step 4: Define loss function and optimizer

  • In this code, nn.CrossEntropyLoss() is used as the loss function (criterion) for training the network. It is the standard choice for multi-class classification and computes the loss between the network's raw output scores (logits) and the true class labels, applying log-softmax internally (see the small sketch after the code).
  • The optimizer (optim.SGD) updates the weights of the network during training. Stochastic Gradient Descent (SGD) is used here with a learning rate of 0.001 and momentum of 0.9.
Python
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
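
To illustrate what the loss function expects, here is a tiny sketch with dummy values: CrossEntropyLoss takes unnormalized scores of shape (batch, num_classes) and integer class indices, and returns a scalar loss.

Python
# Dummy example: 4 samples, 10 classes, random logits and random integer targets.
dummy_logits = torch.randn(4, 10)
dummy_targets = torch.randint(0, 10, (4,))
print(criterion(dummy_logits, dummy_targets))  # scalar loss, roughly ln(10) ~ 2.3 for random scores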

Step 5: Train the network

This code trains the network (net) on the CIFAR-10 training set for 2 epochs, using the loss function (criterion) and optimizer (optimizer) defined above, and prints the average loss every 2,000 mini-batches. A short sketch after the training loop shows how to save the learned weights.

Python
for epoch in range(2):  # loop over the training set twice

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data              # data is a list of [inputs, labels]

        optimizer.zero_grad()              # zero the parameter gradients

        outputs = net(inputs)              # forward pass
        loss = criterion(outputs, labels)
        loss.backward()                    # backward pass
        optimizer.step()                   # update the weights

        running_loss += loss.item()
        if i % 2000 == 1999:               # print the average loss every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
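
Once training finishes, the learned weights can be saved with state_dict and reloaded later. A minimal sketch (the file path below is just an example, not a fixed convention):

Python
# Save the trained parameters to disk (example path).
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

# Later, rebuild the model and load the saved parameters.
net_reloaded = Net()
net_reloaded.load_state_dict(torch.load(PATH))
net_reloaded.eval()  # switch to evaluation mode before inference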

Step 6: Testing the network

This code evaluates the trained network (net) on the test set (testloader). Gradient tracking is disabled with torch.no_grad() since no training takes place. For each batch, it computes the network's outputs, takes the class with the highest score as the prediction and compares the predictions with the true labels to compute the overall accuracy. A per-class breakdown follows the code.

Python
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
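
Overall accuracy can hide large differences between classes. Here is a sketch that breaks accuracy down per class using the same test loader (each CIFAR-10 class has 1,000 test images):

Python
# Count correct predictions separately for each of the 10 classes.
correct_per_class = {cls: 0 for cls in classes}
total_per_class = {cls: 0 for cls in classes}

with torch.no_grad():
    for images, labels in testloader:
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        for label, prediction in zip(labels, predicted):
            cls = classes[label]
            total_per_class[cls] += 1
            if label == prediction:
                correct_per_class[cls] += 1

for cls in classes:
    print('Accuracy for %5s: %.1f %%' %
          (cls, 100 * correct_per_class[cls] / total_per_class[cls]))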

Complete Code to Build a CNN Using PyTorch

Python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torch.nn.functional as F

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2): 

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data

        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 2000 == 1999: 
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Output:

[1, 2000] loss: 2.279
[1, 4000] loss: 1.992
[1, 6000] loss: 1.718
[1, 8000] loss: 1.589
[1, 10000] loss: 1.513
[1, 12000] loss: 1.492
[2, 2000] loss: 1.410
[2, 4000] loss: 1.375
[2, 6000] loss: 1.366
[2, 8000] loss: 1.343
[2, 10000] loss: 1.325
[2, 12000] loss: 1.263
Finished Training
Accuracy of the network on the 10000 test images: 55 %

The model's accuracy of 55% shows that it is underperforming, which is expected given the simple architecture and the short two-epoch training run. To improve it, we can train for more epochs, adjust the learning rate and momentum, or switch to a different optimizer such as Adam, as shown in the sketch below. These changes can help the model reach higher accuracy.
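
For example, swapping SGD for Adam is a one-line change before the training loop; the learning rate of 0.001 here is a common starting point, not a tuned value.

Python
# Replace the SGD optimizer with Adam; the rest of the training loop stays the same.
optimizer = optim.Adam(net.parameters(), lr=0.001)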

