Computer Vision with PyTorch
Last Updated: 28 Mar, 2024
PyTorch is a powerful framework applicable to various computer vision tasks. This article outlines the PyTorch features that help developers build and train neural networks for computer vision, and then demonstrates how the PyTorch framework can be used on a concrete image classification task.
Computer vision is a branch of AI concerned with interpreting images and video. PyTorch makes it easy to customize models and experiment with them, which helps researchers and developers find effective solutions to difficult vision problems. Common computer vision tasks include:
- Image Classification: Determining which objects (for instance, a dog, a car, or a beach) appear in an image.
- Object Detection: Locating the objects in an image, such as people, cars, and traffic signs, and drawing bounding boxes around them.
- Image Segmentation: Dividing an image into regions that share distinct features, such as the background, the foreground objects, and the individual parts of an object.
- Video Processing: Analyzing video, for example identifying activities (such as whether a person is walking, running, or dancing), categorizing the content (such as sports, news, or entertainment), or tracking an object as it moves.
PyTorch Capabilities for Computer Vision Tasks
- Torchvision, PyTorch's companion library for computer vision, provides pre-trained models, datasets, and utilities. It gives access to popular deep learning architectures such as ResNet, VGG, and DenseNet, which can serve as a starting point for building your own models.
- PyTorch makes it straightforward to load and prepare image datasets for training. Torchvision includes loaders for standard datasets such as ImageNet, CIFAR, and COCO, and the same dataset and data loader abstractions can be used with custom datasets.
- Torchvision transforms support data augmentation. Random transformations such as cropping, resizing, and color adjustments can be applied to images during training, which helps the model generalize better.
- PyTorch integrates with CUDA, allowing users to run training on a GPU. This can speed up training considerably, especially for large datasets and complex architectures.
- PyTorch uses a dynamic computation graph, which is built and can be modified at runtime. This flexibility lets users experiment with different model architectures and control flow easily, which suits rapid development in computer vision.
- PyTorch's autograd engine provides automatic differentiation. It computes gradients efficiently during training, which simplifies implementing and optimizing the complex neural network architectures used in computer vision.
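The last three capabilities can be illustrated together in a small sketch. The tensors and variable names below are made up for illustration: the code picks a GPU if one is available, lets an ordinary Python `if` decide which operations go into the graph, and then asks autograd for the gradient.

```python
import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical parameters and inputs, chosen only for illustration.
weights = torch.tensor([1.0, -2.0, 3.0], device=device, requires_grad=True)
inputs = torch.tensor([0.5, 1.0, 2.0], device=device)

score = (weights * inputs).sum()  # 0.5 - 2.0 + 6.0 = 4.5

# Dynamic graph: plain Python control flow selects the operations recorded.
if score.item() > 0:
    loss = score ** 2
else:
    loss = -score

# Autograd: d(loss)/d(weights) = 2 * score * inputs on the branch taken here.
loss.backward()

print(loss.item())   # 20.25
print(weights.grad)  # values 4.5, 9.0, 18.0
```

Because the graph is rebuilt on every forward pass, a different input could take the other branch and produce a different gradient expression without any change to the code.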
Computer Vision Hands on with PyTorch
In this section, we will use the CIFAR-10 dataset, a popular dataset for image classification. It contains 60,000 32x32 color images in 10 classes, with 6,000 images per class, split into 50,000 training images and 10,000 test images. We'll load the dataset, prepare data loaders, build a simple convolutional neural network (CNN) as a baseline model, train it, and evaluate it.
- We import the necessary libraries including torch for PyTorch functionalities and torchvision for datasets and transformations.
- We define transformations to normalize the data using transforms.Compose.
Step 1: Loading the Dataset
We load the CIFAR-10 dataset using torchvision.datasets.CIFAR10 and create data loaders for the training and test sets using torch.utils.data.DataLoader.
Python
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim

# Step 1: Loading the CIFAR-10 dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Normalize to [-1, 1]
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
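To see what a data loader yields without downloading CIFAR-10, the following sketch wraps random tensors in a `TensorDataset`. The fake data and the sample count of 20 are assumptions made for illustration; the batch size of 4 matches the loaders above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for CIFAR-10: 20 random "images" with random labels.
fake_images = torch.randn(20, 3, 32, 32)
fake_labels = torch.randint(0, 10, (20,))
dataset = TensorDataset(fake_images, fake_labels)

# num_workers=0 loads data in the main process, which keeps the sketch simple.
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

images, labels = next(iter(loader))
print(images.shape)  # torch.Size([4, 3, 32, 32]) -> (batch, channels, height, width)
print(labels.shape)  # torch.Size([4])
print(len(loader))   # 5 batches, since 20 samples / 4 per batch
```

The real training loader behaves the same way, except that it yields 12,500 batches per epoch (50,000 training images divided by a batch size of 4).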
Step 2: Defining the Model
In this step, we define a simple CNN model (Net) by subclassing nn.Module. It has two convolutional layers, each followed by a ReLU activation and max pooling, and three fully connected layers that map the extracted features to the 10 CIFAR-10 classes.
Python
# Step 2: Defining the CNN model
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels -> 6 feature maps, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # halves height and width
        self.conv2 = nn.Conv2d(6, 16, 5)       # 6 -> 16 feature maps, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 16 maps of size 5x5 after the second pooling
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # one output per class

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)             # flatten to (batch, 400)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)                        # raw class scores (logits)
        return x

net = Net()
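The `16 * 5 * 5` input size of the first fully connected layer follows from the image size: a 5x5 convolution without padding shrinks each side by 4, and each 2x2 pooling halves it, so 32 becomes 28, 14, 10, and finally 5. The sketch below traces those shapes with standalone layers that mirror the ones in the model; the intermediate variable names are introduced here only for illustration.

```python
import torch
import torch.nn as nn

# Standalone copies of the convolution and pooling layers used in the model.
conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.zeros(1, 3, 32, 32)             # one dummy CIFAR-10-sized image

after_conv1 = conv1(x)                    # 32 - 5 + 1 = 28
after_pool1 = pool(after_conv1)           # 28 / 2 = 14
after_conv2 = conv2(after_pool1)          # 14 - 5 + 1 = 10
after_pool2 = pool(after_conv2)           # 10 / 2 = 5
flattened = torch.flatten(after_pool2, 1) # 16 * 5 * 5 = 400 features

print(after_conv1.shape)  # torch.Size([1, 6, 28, 28])
print(after_pool1.shape)  # torch.Size([1, 6, 14, 14])
print(after_conv2.shape)  # torch.Size([1, 16, 10, 10])
print(after_pool2.shape)  # torch.Size([1, 16, 5, 5])
print(flattened.shape)    # torch.Size([1, 400])
```

If the input resolution or the kernel sizes change, the input size of the first linear layer has to be recomputed in the same way.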
Step 3: Defining Loss Function and optimizer
Next, we define the loss function (nn.CrossEntropyLoss) and the optimizer (optim.SGD with momentum) used to train the baseline model.
Python
# Step 3: Defining loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
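`nn.CrossEntropyLoss` expects raw logits and integer class labels, and applies log-softmax internally, which is why the model's last layer has no softmax. The sketch below checks this against a manual computation; the logits and labels are made-up values for a hypothetical 3-class problem.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical logits for 2 samples and 3 classes, with their true labels.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)

# Manual equivalent: mean negative log-probability of the correct class.
log_probs = F.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(len(labels)), labels].mean()

print(torch.allclose(loss, manual_loss))  # True
```

Adding a softmax layer before this loss would apply the normalization twice and generally hurts training.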
Step 4: Model Training Process
Now we train the model for two epochs. Each iteration loads a mini-batch, resets the gradients, runs the forward pass, computes the loss, backpropagates, and updates the weights. The average running loss is printed every 2,000 mini-batches.
Python
# Step 4: Training the model
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training')
Output:
[1, 2000] loss: 2.140
[1, 4000] loss: 1.808
[1, 6000] loss: 1.638
[1, 8000] loss: 1.562
[1, 10000] loss: 1.505
[1, 12000] loss: 1.441
[2, 2000] loss: 1.378
[2, 4000] loss: 1.356
[2, 6000] loss: 1.343
[2, 8000] loss: 1.330
[2, 10000] loss: 1.282
[2, 12000] loss: 1.292
Finished Training
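The training loop in this walkthrough runs on the CPU. As a sketch of the CUDA support described earlier, the same steps can run on a GPU by moving both the model and each batch to a device. To stay self-contained, the example below uses a tiny hypothetical linear model and one random batch instead of Net and the CIFAR-10 loader.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Use the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical tiny model standing in for Net; .to(device) moves its parameters.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# One random batch standing in for a batch from the training loader.
inputs = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))

# Each batch must live on the same device as the model.
inputs, labels = inputs.to(device), labels.to(device)

optimizer.zero_grad()              # clear gradients from the previous step
outputs = model(inputs)            # forward pass
loss = criterion(outputs, labels)  # compute the loss
loss.backward()                    # backpropagate
optimizer.step()                   # update the weights

print(outputs.shape)  # torch.Size([4, 10])
```

In the full walkthrough, this would amount to calling `net.to(device)` once and moving `inputs` and `labels` to the device inside the loop.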
Step 5: Model Evaluation
In this step, we evaluate the network on the test set by iterating through the test data loader and counting how many predictions match the true labels.
Python
# Step 5: Evaluating the model on the test set
correct = 0
total = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for data in testloader:
        images, labels = data
        outputs = net(images)
        # the class with the highest score is the prediction
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
Output:
Accuracy of the network on the 10000 test images: 54 %
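A single overall accuracy figure can hide large differences between classes. As an optional follow-up, the sketch below computes accuracy per class from predicted and true labels. The helper `per_class_accuracy` and the small example tensors are hypothetical and introduced here only for illustration; in the walkthrough the inputs would be the predictions and labels collected over the test loader.

```python
import torch

def per_class_accuracy(predicted, labels, num_classes):
    """Return a tensor holding the fraction of correct predictions per class."""
    correct_mask = predicted == labels
    # Number of samples belonging to each true class.
    totals = torch.bincount(labels, minlength=num_classes)
    # Number of correctly predicted samples for each true class.
    corrects = torch.bincount(labels[correct_mask], minlength=num_classes)
    # clamp avoids division by zero for classes with no samples.
    return corrects.float() / totals.clamp(min=1).float()

# Hypothetical predictions and labels for a 3-class problem.
predicted = torch.tensor([0, 1, 1, 2, 2, 2])
labels = torch.tensor([0, 1, 0, 2, 2, 1])

accuracy = per_class_accuracy(predicted, labels, num_classes=3)
print(accuracy)  # tensor([0.5000, 0.5000, 1.0000])
```

Classes with noticeably lower accuracy are natural candidates for more data, augmentation, or a larger model.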