Convolutional Neural Networks

The document provides an introduction to computer vision and convolutional neural networks. It discusses how images are processed numerically and explains the key concepts of CNNs including convolution, activation functions, pooling, and fully connected layers. Examples of convolution and different kernel types like Gaussian blur and Sobel operator are also covered.


Introduction to Computer Vision

CNNs: Convolutional Neural Networks
A class of neural networks built around a special mathematical operation (convolution), used to extract important information from images.
CONTENT

What are Grayscale/RGB images?
The Convolution Operation
Layers of a Convolutional Neural Network:
Convolution
Activation Functions
Pooling
Fully Connected Network
HOW DO COMPUTERS PROCESS IMAGES?
We convert the image into a grayscale or RGB representation.

In a grayscale image, each pixel has a different intensity of white, ranging from 0 (black) to 255 (white). We therefore convert each pixel into its corresponding numerical value.
RED GREEN BLUE IMAGES (RGB)
In most cases we deal with colour images, which can be represented simply by their red, green and blue channels. Just as in grayscale images, each pixel in the red, green and blue channels is converted into its corresponding numerical value, depending on the intensity of that colour in the image.

WHY? A neural network takes the numerical value of each pixel as an input. For an image 24 pixels wide and 16 pixels tall, that's 24x16 = 384 nodes in the input layer of our neural network!
Each input node is connected to every node of the hidden layer. If we assume a hidden layer of 36 nodes, we need 384*36 = 13,824 weights (plus a bias for each hidden node)! This takes a huge amount of time to compute. Thus we need to decrease the number of inputs without losing the important features of the image.
Now, for a different image of the digit 8, the previous network will not be as accurate, because instead of capturing the features of the image, it has been trained to assign weights and biases to each individual pixel.

Hence traditional neural networks are ineffective for image detection.

The solution? CNNs
THE CONVOLUTION OPERATION

Kernel
A kernel is a small matrix which extracts the required features from the given image. The kernel is much smaller than the input image, and we have different kernels for different tasks like blurring, sharpening or edge detection.

0 1 2
2 2 0
0 1 2

Kernels are typically of size 3x3. The values of the kernels are updated by backpropagation.
a small part of the image matrix

Image            Kernel
3 3 2 1 0
0 0 1 3 1        0 1 2
3 1 2 2 3    *   2 2 0   =
2 0 0 2 2        0 1 2
2 0 0 0 1

Top-left 3x3 patch:
3(0) + 3(1) + 2(2) +
0(2) + 0(2) + 1(0) +
3(0) + 1(1) + 2(2) = 12

Moving the kernel one step to the right:
3(0) + 2(1) + 1(2) +
0(2) + 1(2) + 3(0) +
1(0) + 2(1) + 2(2) = 12
THE CONVOLUTION OPERATOR

4(1) + 9(0) + 0(1)
+ 8(0) + 6(2) + 1(0)
+ 7(1) + 2(0) + 34(1)
= 57
CONVOLUTION OF AN IMAGE
50(1) + 165(0) + 67(-1) + ...

Image                 Kernel        The new image
50  165  67   0
94   23  88  12       1 0 -1         83  179
178  56  90  64   *   2 0 -2   =
234 204  78 123       1 0 -1        338   76
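The 2x2 output above can be checked with a short script. This is a minimal sketch of the valid (no padding, stride 1) convolution used here; the function name conv2d_valid is our own.

```python
def conv2d_valid(image, kernel):
    """2D cross-correlation (the 'convolution' used in CNNs), no padding, stride 1."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # multiply the kernel with the patch anchored at (i, j) and sum
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(s)
        out.append(row)
    return out

image = [[50, 165, 67, 0],
         [94, 23, 88, 12],
         [178, 56, 90, 64],
         [234, 204, 78, 123]]
sobel_v = [[1, 0, -1],
           [2, 0, -2],
           [1, 0, -1]]
print(conv2d_valid(image, sobel_v))  # [[83, 179], [338, 76]]
```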


WHY IS CONVOLUTION NEEDED FOR IMAGE DETECTION?
By convolving the image matrix with the given kernel, we extract the information from each pixel along with the influence of its neighbouring pixels. Doing this allows us to extract the features of the image.

By applying the convolution we focus on the central pixel and extract its information along with that of its neighbours.

3 3 2 1 0
0 0 1 3 1
3 1 2 2 3
2 0 0 2 2
2 0 0 0 1
FILTERS VS KERNELS
Often in computer vision, the terms kernel and filter are used interchangeably, but there is a slight difference between them. Filters are groups of kernels and are used in convolving RGB images, which have three channels (red, green, blue) instead of just one. Within each filter the kernels might be the same or different, according to the features we need to extract.
The resultant three matrices are then combined using matrix addition to get a single output.

Remember that this is only for RGB images. In a grayscale image, the filter is equal to the kernel, because we only have one channel of input.
A few more terminologies in CNNs

STRIDE
Stride is the number of steps we move the kernel each time we convolve. Normally we use a stride of 1, so every possible section of the image is taken; a different stride value might be chosen according to the needs of the CNN model.

Stride = 1
Stride = 3
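For intuition, the positions the kernel visits depend on the stride. A small sketch (the function name window_starts is ours):

```python
def window_starts(input_size, kernel_size, stride):
    # starting positions where a kernel of width kernel_size fits in the input
    return list(range(0, input_size - kernel_size + 1, stride))

print(window_starts(7, 3, 1))  # stride 1: [0, 1, 2, 3, 4] - every section is taken
print(window_starts(7, 3, 3))  # stride 3: [0, 3] - the kernel skips positions
```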
PADDING
During the convolution operation we extract the information of each pixel by applying a kernel over it such that the required pixel is at the middle of the kernel. However, we end up losing the data on the edges of the image, because we cannot centre the kernel over those pixels.

We retain the information of the interior pixels but lose out on the edge pixels.

THE SOLUTION?
Padding the image
Padding is adding a layer of zeros around the image. It helps in convolving the entire image without distorting any information, as the zeros have no effect on the convolution.

Padded image         Kernel       Output
0 0 0 0 0 0 0
0 3 3 2 1 0 0        0 1 2         6 14 17 11  3
0 0 0 1 3 1 0    *   2 2 0   =    14 12 12 17 11
0 3 1 2 2 3 0        0 1 2         8 10 17 19 13
0 2 0 0 2 2 0                     11  9  6 14 12
0 2 0 0 0 1 0                      6  4  4  6  4
0 0 0 0 0 0 0

Thus, by padding the image we have retained its size and the information of the edges.
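The padded example above can be verified with a short sketch (pad_zeros and conv2d_valid are our own helper names; stride 1):

```python
def pad_zeros(image, p=1):
    """Surround the image with p layers of zeros."""
    w = len(image[0]) + 2 * p
    top = [[0] * w for _ in range(p)]
    body = [[0] * p + row + [0] * p for row in image]
    return top + body + [[0] * w for _ in range(p)]

def conv2d_valid(image, kernel):
    """2D cross-correlation with no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

image = [[3, 3, 2, 1, 0],
         [0, 0, 1, 3, 1],
         [3, 1, 2, 2, 3],
         [2, 0, 0, 2, 2],
         [2, 0, 0, 0, 1]]
kernel = [[0, 1, 2],
          [2, 2, 0],
          [0, 1, 2]]
out = conv2d_valid(pad_zeros(image), kernel)
print(out[0])       # first row of the 5x5 output: [6, 14, 17, 11, 3]
print(len(out))     # 5 - padding preserved the input size
```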
PADDING

Padded image              Kernel       Output
0   0   0   0   0  0
0  50 165  67   0  0      1 0 -1      -353  28 341 222
0  94  23  88  12  0  *   2 0 -2  =   -267  83 179 333
0 178  56  90  64  0      1 0 -1      -343 338  76 346
0 234 204  78 123  0                  -472 400 162 246
0   0   0   0   0  0

We add zeros around the image. The kernel is the same, but now the output is the same size as the input and the entire information of the image is retained.
Congratulations! You just learnt the basics of a convolutional layer!

Before we move forward, let's look at the syntax of convolutions in PyTorch:

nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)

in_channels: number of input channels (depends on the colours we have)
out_channels: number of output channels (depends on the number of filters we use)
kernel_size, stride, padding: as described above
Also, here's a simple formula. It's just for reference, don't worry about memorizing this:

o = floor((i + 2p - k) / s) + 1

where
o is the output size
i is the input size
k is the kernel size
p is the padding
s is the stride
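The formula is a one-liner in code (the function name conv_output_size is ours):

```python
def conv_output_size(i, k, p, s):
    # o = floor((i + 2p - k) / s) + 1
    return (i + 2 * p - k) // s + 1

print(conv_output_size(4, 3, 0, 1))  # 2: the 4x4 Sobel example gave a 2x2 output
print(conv_output_size(5, 3, 1, 1))  # 5: padding of 1 preserves a 5x5 input
```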
Some Examples of Kernels

GAUSSIAN BLUR
Let's apply Gaussian blur on this image!

Gaussian blurring is a technique that blurs a pixel, giving more weight to the middle pixel, similar to the Gaussian curve. Blurring the image reduces computation time by filtering out noise (unwanted data) in the image while retaining the context.
Blurring the image retained the important information but also got rid of unnecessary noise.
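A common 3x3 approximation of the Gaussian kernel (a standard choice, not taken from the slides) weights the centre pixel most and is normalized so the weights sum to 1:

```python
raw = [[1, 2, 1],
       [2, 4, 2],
       [1, 2, 1]]
total = sum(sum(row) for row in raw)            # 16
gaussian = [[v / total for v in row] for row in raw]
# the weights sum to 1, so overall brightness is preserved
print(sum(sum(row) for row in gaussian))        # 1.0
```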
THE SOBEL OPERATOR

Vertical Edges      Horizontal Edges
1 0 -1               1  2  1
2 0 -2               0  0  0
1 0 -1              -1 -2 -1

The kernel shown before is actually the Sobel operator - it's used to find the edges in an image.
SOBEL OPERATOR

Original image | vertical Sobel | horizontal Sobel

We then combine both outputs to get our final result.

Gaussian blur also reduces noise, as you can see when you compare the edges detected in both images:
Sobel vs. Gaussian Blur + Sobel (horizontal and vertical combined)
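The slides don't specify how the two outputs are combined; the usual choice is the per-pixel gradient magnitude, sketched here:

```python
import math

# hypothetical vertical and horizontal Sobel responses at a single pixel
gx, gy = 3, 4
magnitude = math.hypot(gx, gy)  # sqrt(gx**2 + gy**2)
print(magnitude)  # 5.0
```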
Other Layers of the
CNN
Images by default have intensities
ranging from 0 to 255, but neural NORMALIZATION
networks usually work best if inputs are
normalized, i.e, between 0 and 1. This
helps in reducing the learning time and
helps in preventing overfitting of the
model PyTorch code
So, we divide our numpy array before
applying convolutions.
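The division itself is one line; here a plain Python list stands in for the numpy array:

```python
pixels = [0, 64, 128, 255]               # raw intensities in [0, 255]
normalized = [p / 255 for p in pixels]   # now every value lies in [0, 1]
print(normalized[0], normalized[-1])     # 0.0 1.0
```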
ACTIVATION FUNCTIONS

Just like in artificial neural networks, we can make use of activation functions like ReLU here.

After we apply convolution and batch normalization to an image, we can apply an activation function to each pixel.

ReLU: f(x) = max(0, x), where x is the value of each pixel in the image array
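Applied pixel-wise, ReLU simply clips negative values to zero. A sketch using values from the padded Sobel output above:

```python
def relu(x):
    return max(0, x)

feature_map = [[-353, 28], [-267, 83]]
activated = [[relu(v) for v in row] for row in feature_map]
print(activated)  # [[0, 28], [0, 83]]
```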
POOLING
We reduce the dimensionality of the convolved image by pooling it. Normally, a stride of the size of the pooling block is used (so there's no overlap).

This greatly reduces the size of the image, and also removes unnecessary detail (preventing overfitting).

The two common types are Max Pooling and Average Pooling.
MAX POOLING

Max pooling is a pooling operation that selects the maximum element from the region of the feature map covered by the filter. Thus, the output after a max-pooling layer is a feature map containing the most prominent features of the previous feature map.

AVERAGE POOLING

Average pooling gives us the average of the features within each filter-sized region.
TYPES OF POOLING (2x2 blocks, stride 2)

Image                  Max Pooling     Average Pooling
50  165  67   0
94   23  88  12        165  88          83    41.75
178  56  90  64        234 123         168    88.75
234 204  78 123
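Both poolings above (2x2 blocks, stride 2) can be reproduced with a short sketch (the helper name pool2x2 is ours):

```python
def pool2x2(image, reduce_fn):
    """Apply reduce_fn to each non-overlapping 2x2 block of the image."""
    return [[reduce_fn([image[i][j], image[i][j + 1],
                        image[i + 1][j], image[i + 1][j + 1]])
             for j in range(0, len(image[0]), 2)]
            for i in range(0, len(image), 2)]

avg = lambda vals: sum(vals) / len(vals)
image = [[50, 165, 67, 0],
         [94, 23, 88, 12],
         [178, 56, 90, 64],
         [234, 204, 78, 123]]
print(pool2x2(image, max))  # [[165, 88], [234, 123]]
print(pool2x2(image, avg))  # [[83.0, 41.75], [168.0, 88.75]]
```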


FLATTENING
The output of the final pooling layer is fed into a fully connected neural network. To do this we "flatten" the outputs into a single column vector, e.g.

124
95
49
121
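Flattening just concatenates the rows of the pooled output into one vector:

```python
pooled = [[124, 95],
          [49, 121]]
flat = [v for row in pooled for v in row]  # row-by-row into a single list
print(flat)  # [124, 95, 49, 121]
```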
FULLY CONNECTED LAYER
Finally, we input our results into a neural network to run the classification.

First, we take our image, which is a matrix, and flatten it. This acts as the input layer for our artificial neural network. Next, we add hidden layers and output layers as in a normal neural network. Convolutions just let the network learn features of the image; we then use these in a normal network to get our results.
THE ENTIRE NETWORK

BACKPROPAGATION

Backpropagation is applied on the kernels in CNNs.

A CNN can be considered a partially connected neural network, by only considering the connected neurons, and PyTorch can do the standard backpropagation for us.
Image                 Kernel       Output
50  165  67   0
94   23  88  12       1 0 -1        83 179
178  56  90  64   *   2 0 -2   =   338  76
234 204  78 123       1 0 -1

Our target is to update w11, the top-left weight of the kernel. To do that, let's see which pixels it affects.

Here x11, x12 and so on are the pixel values from the image (like 50, 165),
t11, t12 are the targets for each output pixel, and
y11, y12 etc. are the present output values, that is 83, 179, ...
