
Convolutional Neural Network (CNN)

This document provides an overview of convolutional neural networks (CNNs), including their inspiration from biological processes, key components such as convolutional layers, pooling layers, and fully connected layers, and applications in computer vision tasks like image classification.

Uploaded by pcjoshi02
Copyright © All Rights Reserved

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
Intro

❖ A Convolutional Neural Network (CNN) is also known as a ConvNet.

❖ It is a specialized type of deep learning algorithm designed mainly for tasks that require object recognition, including image classification, detection, and segmentation.

❖ CNNs are employed in a variety of practical scenarios, such as autonomous vehicles, security camera systems, and others.
The importance of CNNs

➔ Autonomous feature extraction at a large scale: CNNs learn features directly from the data, bypassing the need for manual feature engineering and thereby improving efficiency.

➔ Translation-invariant characteristics: these let a CNN identify and extract patterns and features largely regardless of their position in the input. Tolerance to other variations, such as orientation or scale, is only partial and is mostly learned from the training data.
Contd…

➔ A variety of pre-trained CNN architectures, including VGG-16, ResNet50, InceptionV3, and EfficientNet, have achieved top-tier performance on image classification benchmarks such as ImageNet. These models can be adapted to new tasks with relatively little data through a process known as fine-tuning.

➔ Beyond image classification tasks, CNNs are versatile and can be applied to a
range of other domains, such as natural language processing, time series
analysis, and speech recognition.
Inspiration Behind CNN and Parallels With The Human Visual System

Convolutional neural networks were inspired by the layered architecture of the human visual cortex. The main parallels are:

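Hierarchical architecture: like the visual cortex, a CNN extracts simple features (such as edges) in its early layers and combines them into more complex features in deeper layers.

Local connectivity: neurons in the visual cortex respond only to a small region of the visual field; similarly, each neuron in a convolutional layer is connected only to a small local region of its input.

Translation invariance: visual cortex neurons can detect a feature wherever it appears in the visual field; in a CNN, shared filter weights and pooling provide a degree of translation invariance.

Multiple feature maps: the visual cortex extracts many different features at each processing stage; a CNN mirrors this by applying multiple filters in each convolutional layer, producing multiple feature maps.

Non-linearity: biological neurons respond non-linearly to their input; CNNs introduce non-linearity through activation functions such as ReLU applied after each convolution.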
Key Components of a CNN

A convolutional neural network is made up of four main components:

● Convolutional layers
● Rectified Linear Unit (ReLU for short)
● Pooling layers
● Fully connected layers
We look at each of these components through the following example: the classification of a handwritten digit.
Convolution layers
● This is the first building block of a CNN. Its main mathematical operation is called convolution: a sliding-window function applied to the matrix of pixels that represents an image. The matrix that slides over the image is called a kernel or filter; the two terms are used interchangeably.
● In the convolution layer, several filters of the same size are applied, and each filter is used to recognize a specific pattern in the image, such as the curves of the digits, their edges, the whole shape of the digits, and more.
● In the convolution layer, we use small grids (called filters or kernels) that move over the image. Each small grid acts like a mini magnifying glass that looks for specific patterns in the image, such as lines, curves, or shapes. As it moves across the image, it creates a new grid (the feature map) that highlights where it found these patterns.
● For example, one filter might be good at finding straight lines, another might
find curves, and so on. By using several different filters, the CNN can get a
good idea of all the different patterns that make up the image.
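To make the idea of several filters concrete, here is a minimal plain-Python sketch. The 5x5 image (a vertical stroke) and the two hand-picked edge filters are illustrative assumptions; in a real CNN the filter weights are learned during training rather than chosen by hand. Each filter produces its own feature map, and here the vertical-edge filter responds strongly while the horizontal-edge filter does not respond at all.

```python
# Minimal sketch: a bank of filters produces one feature map per filter.
# Assumptions (not from the original): a tiny 5x5 image with a vertical
# stroke and two hand-picked 3x3 edge filters; real CNNs learn their filters.

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1); return the feature map."""
    k = len(kernel)
    out_size = len(image) - k + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]

# 5x5 grayscale image with a bright vertical stroke in the middle column
image = [[0, 0, 1, 0, 0] for _ in range(5)]

# Hypothetical hand-picked filters
filters = {
    "vertical_edge": [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]],
    "horizontal_edge": [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]],
}

# One feature map per filter; together they form the layer's output
feature_maps = {name: conv2d_valid(image, k) for name, k in filters.items()}

for name, fmap in feature_maps.items():
    print(name, fmap)
```

The vertical-edge map peaks (value 6) in its middle column, exactly where the stroke is, while the horizontal-edge map is all zeros.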
Let’s consider a grayscale image of a handwritten digit.
Now let’s consider the kernel used for the convolution: a 3x3 matrix.

● The weight of each element of the kernel is represented in the grid.
● Zeros represent black cells and ones represent white cells.
The convolution operation is performed by repeatedly taking a dot product between the kernel and a patch of the image, as follows:

1. Place the kernel over the top-left corner of the image matrix.
2. Multiply the overlapping values element-wise.
3. Sum the products.
4. The resulting value is the first (top-left) entry of the convolved matrix (the feature map).
5. Slide the kernel to the right by the stride (the step size of the sliding window); at the end of a row, move it down by the stride and start again from the left.
6. Repeat steps 2 to 5 until the kernel has covered the whole image matrix.
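The procedure above can be sketched in plain Python as follows. This is a minimal sketch assuming no padding and a stride of 1 by default; the 6x6 binary image (a rough "2") and the anti-diagonal 3x3 kernel are hypothetical values chosen only for illustration. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation, which is what CNNs conventionally call convolution.

```python
def convolve(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    n, f = len(image), len(kernel)
    out_size = (n - f) // stride + 1
    feature_map = []
    for top in range(0, out_size * stride, stride):        # move down row by row
        row = []
        for left in range(0, out_size * stride, stride):   # move left to right
            total = 0
            for a in range(f):
                for b in range(f):
                    # element-wise multiplication, accumulated into a sum
                    total += image[top + a][left + b] * kernel[a][b]
            row.append(total)                              # one feature-map entry
        feature_map.append(row)
    return feature_map


# Hypothetical 6x6 binary image of a rough "2" (0 = black, 1 = white)
image = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
]

# Hypothetical 3x3 kernel that responds to anti-diagonal strokes
kernel = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]

feature_map = convolve(image, kernel)
for row in feature_map:
    print(row)
```

With a 6x6 image and a 3x3 kernel the result is a 4x4 feature map; the largest possible value, 3, appears where the diagonal stroke of the "2" lines up with the kernel.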
https://towardsdatascience.com/gentle-dive-into-math-behind-convolutional-neural-networks-79a07dd44cf9
● As we have seen, when we perform convolution over the 6x6 image with a 3x3 kernel, we get a 4x4 feature map.
● This is because there are only 16 (4 x 4) unique positions where we can place our filter inside this picture.
● Since our image shrinks every time we perform convolution, we can do it only a limited number of times before our image disappears completely.
● We also see that pixels located on the border contribute to far fewer output values than those in the center of the image.
● This way we lose some of the information contained in the picture, particularly near its edges.
● To solve both of these problems, we can pad our image with an additional border. For example, if we use 1px padding, we increase the size of our image to 8x8, so that the output of the convolution with the 3x3 filter will be 6x6.

● Usually in practice we fill in additional padding with zeros.

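● For a "same" convolution (stride 1, output the same size as the input), the padding width should satisfy the following equation, where p is the padding and f is the filter dimension (usually odd):

$$p = \frac{f - 1}{2}$$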
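The following minimal sketch illustrates both problems and the padding fix, assuming stride 1 and a hypothetical 6x6 image. The helper names `zero_pad` and `output_positions_covering` are illustrative, not from any library. It counts how many kernel positions include a given pixel: without padding, a corner pixel is used once while a central pixel is used nine times; with 1px of zero padding for the 3x3 filter, the output stays 6x6 and the corner pixel is used four times.

```python
def zero_pad(image, p):
    """Surround `image` with a border of zeros that is `p` pixels wide."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded += [[0] * width for _ in range(p)]
    return padded


def output_positions_covering(n, f, p, r, c):
    """Count how many stride-1 kernel positions include original pixel (r, c)."""
    n_out = n + 2 * p - f + 1
    rr, cc = r + p, c + p  # pixel coordinates inside the padded image
    rows = sum(1 for i in range(n_out) if i <= rr < i + f)
    cols = sum(1 for j in range(n_out) if j <= cc < j + f)
    return rows * cols


n, f = 6, 3
p = (f - 1) // 2                      # "same" padding for an odd filter size
image = [[1] * n for _ in range(n)]   # hypothetical pixel values
padded = zero_pad(image, p)

print("padded size:", len(padded), "x", len(padded[0]))
print("output size:", len(padded) - f + 1)
print("corner pixel used", output_positions_covering(n, f, 0, 0, 0), "time(s) without padding")
print("corner pixel used", output_positions_covering(n, f, p, 0, 0), "time(s) with padding")
print("center pixel used", output_positions_covering(n, f, 0, 2, 2), "time(s) without padding")
```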
Strided Convolution
When designing our CNN architecture, we can decide to increase the step (the stride) by which the kernel moves if we want the receptive fields to overlap less or if we want smaller spatial dimensions for our feature map.

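The dimensions of the output matrix, taking into account padding and stride, can be calculated using the following formula, where n_in is the input dimension, p the padding, f the filter dimension, and s the stride:

$$n_{out} = \left\lfloor \frac{n_{in} + 2p - f}{s} \right\rfloor + 1$$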
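The output-size calculation can be sketched as a one-line function. This is a minimal sketch assuming square inputs and square filters; the name `conv_output_size` and the example sizes are illustrative. Python's integer floor division (`//`) implements the floor, so positions where the filter would run past the border are dropped.

```python
def conv_output_size(n_in, f, p=0, s=1):
    """Output size of a convolution: floor((n_in + 2p - f) / s) + 1."""
    return (n_in + 2 * p - f) // s + 1


print(conv_output_size(6, 3))             # 6x6 input, 3x3 filter, no padding
print(conv_output_size(6, 3, p=1))        # 1px padding keeps the size
print(conv_output_size(7, 3, s=2))        # stride 2 roughly halves the size
print(conv_output_size(6, 3, s=2))        # the floor drops an incomplete step
print(conv_output_size(28, 5, p=2, s=2))  # e.g. a 28x28 digit image
```

These calls print 4, 6, 3, 2, and 14 respectively.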
Activation function

● A ReLU activation function, ReLU(x) = max(0, x), is applied after each convolution operation.

● This function helps the network learn non-linear relationships between the features in the image, making the network more robust at identifying different patterns.
● It also helps mitigate the vanishing gradient problem, because its gradient is 1 for every positive input instead of saturating as the sigmoid and tanh functions do.
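A minimal sketch of ReLU applied element-wise to a small, made-up feature map, together with its gradient. For comparison it also computes the sigmoid's gradient, whose maximum is 0.25: multiplied across many layers, such small factors shrink the gradient, whereas ReLU passes a gradient of exactly 1 for positive inputs. Treating the ReLU gradient at exactly 0 as 0 is a convention assumed here, and the comparison ignores the weights, which also enter the chain rule.

```python
import math


def relu(x):
    """ReLU(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (0 at x == 0 is a convention)
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)  # at most 0.25, reached at x = 0


# Hypothetical feature map produced by a convolution
feature_map = [[-3.0, 6.0], [0.5, -1.0]]

activated = [[relu(v) for v in row] for row in feature_map]
print(activated)  # negative values are set to 0

# Best-case gradient factor from 10 stacked activations (weights ignored)
print("sigmoid:", sigmoid_grad(0.0) ** 10)  # about 9.5e-07
print("relu:", relu_grad(1.0) ** 10)        # 1.0
```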
Pooling layer

● The goal of the pooling layer is to extract the most significant features from the convolved matrix (feature map).
● This is done by applying an aggregation operation over small windows, such as taking the maximum (max pooling) or the average (average pooling), which reduces the dimensions of the feature map.
● This reduces the memory and computation needed while training the network.
● Pooling can also help mitigate overfitting.
The output of the last pooling layer is flattened into a vector so that it can be processed by the fully connected layer.
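A minimal sketch of 2x2 max pooling with stride 2, followed by flattening, on a hypothetical 4x4 feature map. The helper names are illustrative and the code assumes a square feature map; average pooling would replace `max` with the mean of each window. The 4x4 map shrinks to 2x2 (a quarter of the values), and the flattened vector is what a fully connected layer would receive.

```python
def max_pool(fmap, size=2, stride=2):
    """Max pooling over size x size windows (square feature map assumed)."""
    n_out = (len(fmap) - size) // stride + 1
    return [
        [
            max(fmap[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size))
            for j in range(n_out)
        ]
        for i in range(n_out)
    ]


def flatten(fmap):
    """Concatenate the rows of a 2-D feature map into one vector."""
    return [v for row in fmap for v in row]


# Hypothetical feature map after convolution and ReLU
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 3, 2],
]

pooled = max_pool(feature_map)
vector = flatten(pooled)
print(pooled)  # [[6, 2], [2, 7]]
print(vector)  # [6, 2, 2, 7]
```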
Kernel/Filter

A kernel is a matrix that is slid across the image and multiplied with the input so that the output is enhanced in a certain desirable manner.
Example
Consider the two input image arrangements shown in the example.

For the first image, the new center value is

3*5 + 2*(-1) + 2*(-1) + 2*(-1) + 2*(-1) = 7.

The value 3 increased to 7.

For the second image, the new center value is

1*5 + 2*(-1) + 2*(-1) + 2*(-1) + 2*(-1) = -3.

The value 1 decreased to -3.

The contrast between the two center values therefore grows from 3 versus 1 (a difference of 2) to 7 versus -3 (a difference of 10), which in turn sharpens the image.
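The two calculations can be reproduced with a minimal sketch. It assumes the common 3x3 sharpening kernel (center weight 5, the four edge-adjacent weights -1, corner weights 0), which matches the products shown, and assumes every neighboring pixel is 2; because the kernel's corner weights are 0, the corner pixel values do not affect the result.

```python
# Assumed sharpening kernel consistent with the calculations above
SHARPEN = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]


def apply_at_center(patch, kernel):
    """Weighted sum of a 3x3 patch, i.e. the new value of its center pixel."""
    return sum(patch[a][b] * kernel[a][b] for a in range(3) for b in range(3))


first = [[2, 2, 2], [2, 3, 2], [2, 2, 2]]   # center pixel 3
second = [[2, 2, 2], [2, 1, 2], [2, 2, 2]]  # center pixel 1

print(apply_at_center(first, SHARPEN))   # 7
print(apply_at_center(second, SHARPEN))  # -3
```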
