Convolutional Neural Networks

1. Convolutional neural networks (CNNs) are a type of deep learning algorithm used for computer vision tasks. They apply filters to images to extract features and learn patterns in the data.
2. CNNs are inspired by the visual cortex in the human brain and are better than regular neural networks for images because they can capture spatial and temporal dependencies. They reduce the number of parameters needed compared to flattening images.
3. CNNs apply filters in the form of kernels that convolve across input images to extract features. The kernels learn features like edges and colors through training. Multiple convolutional layers can learn both low and high-level features from images.

Uploaded by

Srikanth Yadav

4/13/2019 A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way

A Comprehensive Guide to Convolutional Neural Networks - the ELI5 way
Sumit Saha
Dec 15, 2018 · 7 min read

Artificial Intelligence has been witnessing monumental growth in bridging the gap between the capabilities of humans and machines. Researchers and enthusiasts alike work on numerous aspects of the field to make amazing things happen. One of many such areas is the domain of Computer Vision.

The agenda for this field is to enable machines to view the world as humans do, perceive it in a similar manner, and even use that knowledge for a multitude of tasks such as Image & Video Recognition, Image Analysis & Classification, Media Recreation, Recommendation Systems, Natural Language Processing, etc. The advancements in Computer Vision with Deep Learning have been constructed and perfected with time, primarily over one particular algorithm: a Convolutional Neural Network.

Introduction

https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53

A CNN sequence to classify handwritten digits

A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other. The pre-processing required in a ConvNet is much lower as compared to other classification algorithms. While in primitive methods filters are hand-engineered, with enough training, ConvNets have the ability to learn these filters/characteristics.

The architecture of a ConvNet is analogous to the connectivity pattern of Neurons in the Human Brain and was inspired by the organization of the Visual Cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the Receptive Field. A collection of such fields overlap to cover the entire visual area.

Why ConvNets over Feed-Forward Neural Nets?


Flattening of a 3x3 image matrix into a 9x1 vector

An image is nothing but a matrix of pixel values, right? So why not just flatten the image (e.g. 3x3 image matrix into a 9x1 vector) and feed it to a Multi-Layer Perceptron for classification purposes? Uh.. not really.

In cases of extremely basic binary images, the method might show an average precision score while performing prediction of classes but would have little to no accuracy when it comes to complex images having pixel dependencies throughout.
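
To make the flattening step concrete, here is a minimal NumPy sketch. The 3x3 image and the helper `flat_index` are illustrative assumptions, not part of the original article. It shows that two pixels sitting directly above one another in the image end up three positions apart in the flattened vector, so the vector no longer encodes which pixels are neighbours.

```python
import numpy as np

# Hypothetical 3x3 single-channel image; the values are simply pixel indices.
image = np.arange(9).reshape(3, 3)

# Row-major flattening into a 9x1 column vector.
flat = image.reshape(-1, 1)

def flat_index(row, col, width=3):
    """Position of pixel (row, col) in the flattened vector (illustrative helper)."""
    return row * width + col

# Pixels (0, 1) and (1, 1) are vertical neighbours in the image,
# but they are `width` positions apart once the image is flattened.
gap = flat_index(1, 1) - flat_index(0, 1)

print(flat.shape)  # (9, 1)
print(gap)         # 3
```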

A ConvNet is able to successfully capture the Spatial and Temporal dependencies in an image through the application of relevant filters. The architecture fits the image dataset better due to the reduction in the number of parameters involved and the reusability of weights. In other words, the network can be trained to understand the sophistication of the image better.
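
The parameter reduction can be illustrated with a rough count. The sketch below is only an assumption-laden comparison: the 28x28x1 image size, the 100 dense units, and the 32 filters of size 3x3 are made-up figures, and `dense_params` and `conv_params` are hypothetical helpers. The point it shows is that a convolutional layer reuses the same small kernel at every position, so its parameter count does not grow with the image size.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def conv_params(kernel_size, in_channels, n_filters):
    """Weights plus biases of a convolutional layer (independent of image size)."""
    return kernel_size * kernel_size * in_channels * n_filters + n_filters

# Hypothetical sizes chosen only for illustration.
height, width, channels = 28, 28, 1

print(dense_params(height * width * channels, 100))  # 78500
print(conv_params(3, channels, 32))                  # 320
```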

Input Image


4x4x3 RGB Image

In the figure, we have an RGB image which has been separated by its three color planes: Red, Green, and Blue. There are a number of such color spaces in which images exist: Grayscale, RGB, HSV, CMYK, etc.

You can imagine how computationally intensive things would get once the images reach dimensions of, say, 8K (7680×4320). The role of the ConvNet is to reduce the images into a form which is easier to process, without losing features which are critical for getting a good prediction. This is important when we design an architecture which is not only good at learning features but is also scalable to massive datasets.
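
As a quick back-of-the-envelope check of that claim, the snippet below counts the raw input values of a single 8K frame. It assumes three color channels and, for the memory estimate, 4 bytes per value (32-bit floats); both are assumptions made here for illustration.

```python
width, height, channels = 7680, 4320, 3   # 8K resolution from the text, RGB assumed

n_values = width * height * channels
print(n_values)  # 99532800

bytes_float32 = n_values * 4              # assumption: 4 bytes per value
print(bytes_float32 / 2**20)              # approximate size of one frame in MiB
```

Feeding that many values directly into a fully connected layer would require roughly that many weights per neuron, which is why reducing the image first matters.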

Convolution Layer - The Kernel


Convolving a 5x5x1 image with a 3x3x1 kernel to get a 3x3x1 convolved feature

Image Dimensions = 5 (Height) x 5 (Breadth) x 1 (Number of channels; an RGB image would have 3)

In the above demonstration, the green section represents our 5x5x1 input image, I. The element involved in carrying out the convolution operation in the first part of a Convolutional Layer is called the Kernel/Filter, K, represented in the color yellow. We have selected K as a 3x3x1 matrix.

Kernel/Filter, K =

1 0 1
0 1 0
1 0 1

With Stride Length = 1 (Non-Strided), the Kernel takes 9 positions over the image, every time performing an element-wise multiplication between K and the portion P of the image over which the kernel is hovering, and summing the products into a single output value.
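
The operation can be sketched in a few lines of NumPy. The kernel K is the one given above; the 5x5 input values are an assumption made for illustration, since the original figure is not reproduced in the text, and `convolve_valid` is a hypothetical helper that assumes a square kernel. As in most deep learning libraries, the kernel is not flipped (strictly speaking this is cross-correlation), which makes no difference for this symmetric K.

```python
import numpy as np

def convolve_valid(image, kernel, stride=1):
    """Slide a square kernel over the image (no padding) and sum the element-wise products."""
    k = kernel.shape[0]
    out_h = (image.shape[0] - k) // stride + 1
    out_w = (image.shape[1] - k) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Portion P of the image currently under the kernel.
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)
    return out

# Assumed 5x5x1 input image (illustrative values only).
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])

# Kernel K from the text.
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

feature = convolve_valid(image, kernel)
print(feature)
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```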


Movement of the Kernel

The filter moves to the right with a certain Stride Value till it parses the complete width. Moving on, it hops down to the beginning (left) of the image with the same Stride Value and repeats the process until the entire image is traversed.
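
The traversal order can be written out explicitly. The helper below, `kernel_positions`, is a hypothetical name introduced for this sketch; it lists the top-left corner of every window the kernel visits, left to right and then top to bottom, assuming no padding.

```python
def kernel_positions(height, width, k, stride):
    """Top-left corners visited by a k x k kernel: left to right, then top to bottom."""
    return [(row, col)
            for row in range(0, height - k + 1, stride)
            for col in range(0, width - k + 1, stride)]

positions = kernel_positions(5, 5, 3, 1)
print(len(positions))   # 9
print(positions[:4])    # [(0, 0), (0, 1), (0, 2), (1, 0)]

# With a stride of 2 the kernel skips every other position.
print(kernel_positions(5, 5, 3, 2))  # [(0, 0), (0, 2), (2, 0), (2, 2)]
```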

Convolution operation on a MxNx3 image matrix with a 3x3x3 Kernel

In the case of images with multiple channels (e.g. RGB), the Kernel has the same depth as that of the input image. Element-wise multiplication is performed between each Kn and In pair ([K1, I1]; [K2, I2]; [K3, I3]), and all the results are summed with the bias to give us a squashed one-depth channel Convolved Feature Output.
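
A minimal sketch of the multi-channel case follows. The 6x6x3 input, the random values, and the function name `convolve_multichannel` are assumptions for illustration; the code only demonstrates that the per-channel products are summed together with a bias into a single-depth output.

```python
import numpy as np

def convolve_multichannel(image, kernel, bias=0.0):
    """image: (H, W, C), kernel: (k, k, C). Returns a one-channel (H-k+1, W-k+1) feature."""
    k = kernel.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + k, j:j + k, :]
            # Multiply every channel pair (Kn, In) element-wise, sum everything, add the bias.
            out[i, j] = np.sum(patch * kernel) + bias
    return out

rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))    # hypothetical RGB input
kernel = rng.random((3, 3, 3))   # kernel depth matches the input depth
feature = convolve_multichannel(image, kernel, bias=1.0)
print(feature.shape)  # (4, 4)
```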


Convolution Operation with Stride Length = 2

The objective of the Convolution Operation is to extract features, such as edges, from the input image. ConvNets need not be limited to only one Convolutional Layer. Conventionally, the first ConvLayer is responsible for capturing the Low-Level features such as edges, color, gradient orientation, etc. With added layers, the architecture adapts to the High-Level features as well, giving us a network which has a holistic understanding of the images in the dataset, similar to how we would.

There are two types of results to the operation: one in which the convolved feature is reduced in dimensionality as compared to the input, and the other in which the dimensionality is either increased or remains the same. This is done by applying Valid Padding in the case of the former, or Same Padding in the case of the latter.


SAME padding: 5x5x1 image is padded with 0s to create a 7x7x1 image

When we augment the 5x5x1 image into a 7x7x1 image (one ring of zeros on every side) and then apply the 3x3x1 kernel over it, we find that the convolved matrix turns out to be of dimensions 5x5x1. Hence the name: Same Padding.

On the other hand, if we perform the same operation without padding, we are presented with a 3x3x1 matrix, which here happens to match the dimensions of the Kernel itself. This is Valid Padding.
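
Both cases follow from the standard output-size relation for a convolution, floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel size, s the stride, and p the number of zeros added on each side. The helper name `conv_output_size` is an assumption introduced for this sketch.

```python
def conv_output_size(n, k, stride=1, pad=0):
    """Output length along one dimension for input size n, kernel size k, and pad zeros per side."""
    return (n + 2 * pad - k) // stride + 1

print(conv_output_size(5, 3))                    # 3  (Valid Padding: no zeros added)
print(conv_output_size(5, 3, pad=1))             # 5  (Same Padding: one zero on each side)
print(conv_output_size(5, 3, stride=2, pad=1))   # 3  (padded input with Stride Length = 2)
```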

The following repository houses many such GIFs which would help you
get a better understanding of how Padding and Stride Length work
together to achieve results relevant to our needs.

vdumoulin/conv_arithmetic

A technical report on convolution arithmetic in the context of deep learning -…
github.com


Pooling Layer

3x3 pooling over 5x5 convolved feature

Similar to the Convolutional Layer, the Pooling layer is responsible for reducing the spatial size of the Convolved Feature. This is to decrease the computational power required to process the data through dimensionality reduction. Furthermore, it is useful for extracting dominant features which are rotationally and positionally invariant, thus keeping the training of the model effective.

There are two types of Pooling: Max Pooling and Average Pooling. Max
Pooling returns the maximum value from the portion of the image
covered by the Kernel. On the other hand, Average Pooling returns the
average of all the values from the portion of the image covered by the
Kernel.
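
The two pooling types differ only in how each window is reduced. The sketch below uses a made-up 4x4 feature map and a 2x2 window with stride 2 (not the 3x3-over-5x5 setting of the figure); `pool2d` is a hypothetical helper written for illustration.

```python
import numpy as np

def pool2d(feature, size, stride, mode="max"):
    """Reduce each size x size window to its maximum ("max") or its average ("mean")."""
    reduce = np.max if mode == "max" else np.mean
    out_h = (feature.shape[0] - size) // stride + 1
    out_w = (feature.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = reduce(window)
    return out

# Hypothetical convolved feature (illustrative values only).
feature = np.array([[1, 3, 2, 4],
                    [5, 6, 1, 2],
                    [7, 2, 9, 0],
                    [1, 4, 3, 8]], dtype=float)

max_pooled = pool2d(feature, 2, 2, "max")
avg_pooled = pool2d(feature, 2, 2, "mean")
print(max_pooled)  # [[6. 4.] [7. 9.]]
print(avg_pooled)  # [[3.75 2.25] [3.5 5.]]
```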

Max Pooling also performs as a Noise Suppressant. It discards the noisy activations altogether and also performs de-noising along with dimensionality reduction. On the other hand, Average Pooling simply performs dimensionality reduction as a noise suppressing mechanism. Hence, Max Pooling often performs better than Average Pooling in practice.


Types of Pooling

The Convolutional Layer and the Pooling Layer together form the i-th layer of a Convolutional Neural Network. Depending on the complexities in the images, the number of such layers may be increased for capturing low-level details even further, but at the cost of more computational power.

After going through the above process, we have successfully enabled the model to understand the features. Moving on, we are going to flatten the final output and feed it to a regular Neural Network for classification purposes.

Classification - Fully Connected Layer (FC Layer)


Adding a Fully-Connected layer is a (usually) cheap way of learning non-linear combinations of the high-level features as represented by the output of the convolutional layer. The Fully-Connected layer is learning a possibly non-linear function in that space.

Now that we have converted our input image into a suitable form for our Multi-Layer Perceptron, we shall flatten the image into a column vector. The flattened output is fed to a feed-forward neural network and backpropagation is applied to every iteration of training. Over a series of epochs, the model is able to distinguish between dominating and certain low-level features in images and classify them using the Softmax Classification technique.
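
The forward pass of this last stage can be sketched as follows. Everything here is an assumption for illustration: the pooled feature maps are random, the single fully connected layer has random, untrained weights, and there are 10 output classes (as for handwritten digits). Training with backpropagation is not shown.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (shifted for numerical stability)."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / np.sum(exp)

rng = np.random.default_rng(0)

pooled = rng.random((2, 2, 4))               # hypothetical pooled feature maps
flat = pooled.reshape(-1)                    # flatten into a vector of 16 values

weights = rng.normal(size=(10, flat.size))   # untrained FC layer, 10 classes assumed
bias = np.zeros(10)

probs = softmax(weights @ flat + bias)
predicted = int(np.argmax(probs))            # index of the most probable class
print(probs.shape)  # (10,)
```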

There are various architectures of CNNs available which have been key
in building algorithms which power and shall power AI as a whole in
the foreseeable future. Some of them have been listed below:

1. LeNet

2. AlexNet

3. VGGNet

4. GoogLeNet

5. ResNet

6. ZFNet

. . .


GitHub Notebook - Recognising Hand Written Digits using MNIST Dataset with TensorFlow

ss-is-master-chief/MNIST-Digit.Recognizer-CNNs

Implementation of CNN to recognize hand written digits (MNIST) running for 10 epochs. Accuracy:…
github.com
