Building a Convolutional Neural Network using PyTorch
Last Updated: 19 May, 2025
Convolutional Neural Networks (CNNs) are deep learning models used for image processing tasks. They automatically learn spatial hierarchies of features from images through convolutional, pooling and fully connected layers. In this article, we'll learn how to build a CNN model using PyTorch which includes defining the network architecture, preparing the data, training the model and evaluating its performance.
1. Importing necessary libraries
We import the necessary modules from the PyTorch library.
Python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torch.nn.functional as F
2. Preparing Dataset
We are setting up the CIFAR-10 dataset for training and testing in PyTorch. We apply basic image transformations, load the datasets and use data loaders to handle batching and shuffling. Finally, we define the 10 class labels for the dataset.
Python
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
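Before defining the model, we can optionally pull one mini-batch from the train loader to confirm the tensor shapes; this quick check is not part of the original walkthrough:
Python
# Fetch a single mini-batch to verify shapes: CIFAR-10 images are 3 x 32 x 32
images, labels = next(iter(trainloader))
print(images.shape)                        # torch.Size([4, 3, 32, 32])
print(labels.shape)                        # torch.Size([4])
print([classes[int(l)] for l in labels])   # human-readable class names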
3. Define CNN Architecture
We are defining a neural network by creating a class Net that inherits from nn.Module. It includes two convolutional layers with ReLU and max pooling, followed by three fully connected layers. In the forward method, we pass the input through these layers, flattening it before the dense layers. Finally, we create an instance of this model as net.
Python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
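The input size of fc1 (16 * 5 * 5) follows from the shapes of the feature maps: a 32x32 image becomes 28x28 after conv1 (5x5 kernel), 14x14 after pooling, 10x10 after conv2 and 5x5 after the second pooling, with 16 channels. As an optional check (not in the original article), we can trace these shapes with a dummy input:
Python
# Trace feature-map shapes through the conv/pool layers with a dummy input
with torch.no_grad():
    dummy = torch.randn(1, 3, 32, 32)
    out = net.pool(F.relu(net.conv1(dummy)))
    print(out.shape)   # torch.Size([1, 6, 14, 14])
    out = net.pool(F.relu(net.conv2(out)))
    print(out.shape)   # torch.Size([1, 16, 5, 5]) -> flattened to 16 * 5 * 5 = 400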
4. Defining Loss Function and Optimizer
We are setting up the training components of the model. nn.CrossEntropyLoss() is used as the loss function for handling classification tasks by comparing predicted outputs with true labels. optim.SGD is chosen as the optimizer to update model weights using Stochastic Gradient Descent (SGD) with a learning rate of 0.001 and momentum of 0.9.
Python
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
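As an optional sanity check (an addition to the original steps), we can run a single forward and backward pass on one batch to confirm the loss computes before launching the full training loop:
Python
# One-batch sanity check: an untrained 10-class model should give a loss near ln(10), about 2.3
inputs, labels = next(iter(trainloader))
optimizer.zero_grad()
loss = criterion(net(inputs), labels)
loss.backward()
optimizer.step()
print(loss.item())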
5. Training Network
We are training the neural network (net) on the CIFAR-10 dataset for 2 epochs. During training we use the defined loss function and optimizer and print the average loss every 2000 mini-batches to monitor progress.
Python
for epoch in range(2):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data

        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 2000 == 1999:
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
Output:
[Image: Training a CNN model]
6. Testing Network
We are evaluating the trained network (net) on the test dataset by computing predictions and comparing them with the actual labels. This helps us calculate the overall accuracy of the model.
Python
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
Output:
Accuracy of the network on the 10000 test images: 53 %
The model's accuracy of around 53% shows that it is underperforming due to the simple network architecture and the short two-epoch training run. To improve this, we can experiment with adjusting the learning rate and momentum, train for more epochs, or use a better optimization technique such as the Adam optimizer, as sketched below. These changes can help the model achieve higher accuracy.
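As an illustrative sketch (not part of the original code), swapping SGD for Adam and training longer might look like this; the epoch count here is an arbitrary example:
Python
# Hypothetical variation: replace SGD with Adam and train for more epochs
optimizer = optim.Adam(net.parameters(), lr=0.001)

for epoch in range(10):  # example value, longer than the original 2 epochs
    for inputs, labels in trainloader:
        optimizer.zero_grad()
        loss = criterion(net(inputs), labels)
        loss.backward()
        optimizer.step()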
You can download source code from here.