Convolutional Neural Networks
Computer Vision
CNNs
A Convolutional Neural Network (CNN) is a class of neural network used to
extract important information from images. "Convolutional" refers to a special
mathematical operation, the convolution.
Consider an input image of 24 × 16 pixels, that is, 384 input values.
Each input node is connected to every node of the hidden layer. If we
assume a hidden layer of 36 nodes, we need 384 × 36 = 13,824 weights, plus 36
biases (one per hidden node). This takes a huge amount of time to compute, so
we need to reduce the number of inputs without losing the important features
of the image.
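The count above can be checked with a few lines of Python. This is a minimal sketch, assuming one weight per input-to-hidden connection and one bias per hidden node; the function name `dense_layer_params` is hypothetical.

```python
def dense_layer_params(n_inputs, n_hidden):
    """Return (weights, biases) for one fully connected layer."""
    weights = n_inputs * n_hidden  # one weight per input-hidden connection
    biases = n_hidden              # one bias per hidden node
    return weights, biases

height, width = 16, 24             # image size from the slide
n_inputs = height * width          # every pixel is an input node
weights, biases = dense_layer_params(n_inputs, 36)
print(n_inputs, weights, biases)   # 384 13824 36
```

The weight count grows with the product of the two layer sizes, which is why larger images quickly become expensive for a fully connected network.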
Now, for a different image of an 8, the previous
network will not be as accurate: instead
of capturing the features of the image, it has been
trained to assign weights and biases to each
pixel.
The solution?
CNNs
THE CONVOLUTION OPERATION
Kernel
A kernel is a small matrix that extracts the required features from the
given image. The kernel is much smaller than the input image, and different
kernels are used for different tasks such as blurring, sharpening or edge
detection.
Image (5 × 5):
3 3 2 1 0
0 0 1 3 1
3 1 2 2 3
2 0 0 2 2
2 0 0 0 1

Kernel (3 × 3):
0 1 2
2 2 0
0 1 2

The kernel is placed over a small part of the image matrix; the overlapping
values are multiplied and the products are added up.

First position (top-left 3 × 3 part of the image):
3(0)+3(1)+2(2)+
0(2)+0(2)+1(0)+
3(0)+1(1)+2(2) = 12

Second position (one column to the right):
3(0)+2(1)+1(2)+
0(2)+1(2)+3(0)+
1(0)+2(1)+2(2) = 12
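The sliding-window arithmetic above can be sketched in NumPy. This is a minimal illustration, not an optimized implementation; it assumes a stride of 1 and no padding, and like the arithmetic on the slide it does not flip the kernel. The function name `conv2d_valid` is hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1), dtype=image.dtype)
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + kh, c:c + kw]   # part of the image under the kernel
            out[r, c] = np.sum(patch * kernel)  # multiply elementwise, then add
    return out

image = np.array([[3, 3, 2, 1, 0],
                  [0, 0, 1, 3, 1],
                  [3, 1, 2, 2, 3],
                  [2, 0, 0, 2, 2],
                  [2, 0, 0, 0, 1]])
kernel = np.array([[0, 1, 2],
                   [2, 2, 0],
                   [0, 1, 2]])

result = conv2d_valid(image, kernel)
print(result)
# [[12 12 17]
#  [10 17 19]
#  [ 9  6 14]]
```

The first two entries of the top row are the two values of 12 worked out by hand above.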
THE CONVOLUTION OPERATOR
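CONVOLUTION OF AN IMAGE

Image (4 × 4):
 50 165  67   0
 94  23  88  12
178  56  90  64
234 208  78 123

Kernel (3 × 3):
1 0 -1
2 0 -2
1 0 -1

Output (2 × 2):
 83 179
338  80

For example, the top-left output value is 50(1) + 165(0) + 67(-1) + ... = 83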
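The same result can be reproduced without explicit loops. This is a sketch assuming NumPy 1.20 or newer (for `sliding_window_view`). The last image row (234, 208, 78, 123) is an assumption: it is the row that reproduces the 2 × 2 output on the slide. The variable names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.array([[ 50, 165,  67,   0],
                  [ 94,  23,  88,  12],
                  [178,  56,  90,  64],
                  [234, 208,  78, 123]])  # last row is assumed, see the note above
sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]])

# All 3 x 3 windows of the image, shape (2, 2, 3, 3)
windows = sliding_window_view(image, sobel_x.shape)

# Multiply every window by the kernel and sum over the window axes
edges = np.einsum("ijab,ab->ij", windows, sobel_x)
print(edges)
# [[ 83 179]
#  [338  80]]
```

Each output value is large where brightness changes from left to right inside the window, which is what makes this kernel useful for edge detection.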
THE SOLUTION?
Padding the image
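Without padding, the output is smaller than the input (a 5 × 5 image gives a
3 × 3 output) and the border pixels are covered by fewer kernel positions.
Padding is adding a layer of zeros around the image. It helps in convolving the
entire image without distorting any information, as the zeros have no effect
on the convolution.

Padded image (7 × 7):
0 0 0 0 0 0 0
0 3 3 2 1 0 0
0 0 0 1 3 1 0
0 3 1 2 2 3 0
0 2 0 0 2 2 0
0 2 0 0 0 1 0
0 0 0 0 0 0 0

Kernel (3 × 3):
0 1 2
2 2 0
0 1 2

Output (5 × 5):
 6 14 17 11  3
14 12 12 17 11
 8 10 17 19 13
11  9  6 14 12
 6  4  4  6  4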
Padded image (6 × 6):
0   0   0   0   0 0
0  50 165  67   0 0
0  94  23  88  12 0
0 178  56  90  64 0
0 234 208  78 123 0
0   0   0   0   0 0

Kernel (3 × 3):
1 0 -1
2 0 -2
1 0 -1

Output (4 × 4):
-353  -28 341 222
-267   83 179 333
-343  338  80 346
-472  400 162 246
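Zero padding can be sketched with `np.pad`. This is a minimal example using the 5 × 5 image and 3 × 3 kernel from the kernel section; the function name `conv2d` and its `padding` parameter are hypothetical.

```python
import numpy as np

def conv2d(image, kernel, padding=0):
    """Convolve with stride 1 after adding `padding` layers of zeros around the image."""
    padded = np.pad(image, padding, mode="constant", constant_values=0)
    kh, kw = kernel.shape
    oh = padded.shape[0] - kh + 1
    ow = padded.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=padded.dtype)
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out

image = np.array([[3, 3, 2, 1, 0],
                  [0, 0, 1, 3, 1],
                  [3, 1, 2, 2, 3],
                  [2, 0, 0, 2, 2],
                  [2, 0, 0, 0, 1]])
kernel = np.array([[0, 1, 2],
                   [2, 2, 0],
                   [0, 1, 2]])

print(conv2d(image, kernel).shape)        # (3, 3): output shrinks without padding
print(conv2d(image, kernel, padding=1))   # 5 x 5: same size as the input
# [[ 6 14 17 11  3]
#  [14 12 12 17 11]
#  [ 8 10 17 19 13]
#  [11  9  6 14 12]
#  [ 6  4  4  6  4]]
```

With one layer of zeros and a 3 × 3 kernel, the output keeps the size of the input, and the inner 3 × 3 part of the padded result equals the unpadded result.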
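The output size is given by

$$o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1$$

where
o is the output size
i is the input size
k is the kernel size
p is the padding
s is the stride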
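The symbols above fit the usual output-size relation o = floor((i + 2p - k) / s) + 1. The sketch below assumes that relation and checks it against the examples in these slides; the function name `conv_output_size` is hypothetical.

```python
def conv_output_size(i, k, p=0, s=1):
    """Output size along one dimension for input i, kernel k, padding p, stride s."""
    return (i + 2 * p - k) // s + 1

print(conv_output_size(5, 3))         # 3: the 5 x 5 image, 3 x 3 kernel, no padding
print(conv_output_size(5, 3, p=1))    # 5: the same image with one layer of zeros
print(conv_output_size(4, 3))         # 2: the 4 x 4 image, 3 x 3 kernel
print(conv_output_size(4, 2, s=2))    # 2: a 2 x 2 block moved with stride 2
```

The same relation applies to each dimension separately, so a non-square image just needs two calls.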
Some Examples of Kernels
Let's apply Gaussian blur to this image!
GAUSSIAN BLUR

Edge-detection (Sobel) kernels, for vertical edges (left) and horizontal
edges (right):
1 0 -1        1  2  1
2 0 -2        0  0  0
1 0 -1       -1 -2 -1
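A small experiment shows how these kernels behave on simple patches. This is a sketch: the two edge kernels are the ones listed above, while the 3 × 3 Gaussian kernel is an assumed common approximation (binomial weights divided by 16), not a kernel taken from the slides.

```python
import numpy as np

# Assumed 3 x 3 Gaussian approximation; the weights sum to 1
gaussian_blur = np.array([[1, 2, 1],
                          [2, 4, 2],
                          [1, 2, 1]]) / 16.0
sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]])
sobel_y = np.array([[ 1,  2,  1],
                    [ 0,  0,  0],
                    [-1, -2, -1]])

flat = np.full((3, 3), 100.0)        # patch with constant brightness
step = np.array([[100, 100, 0]] * 3) # bright on the left, dark on the right

print(np.sum(flat * gaussian_blur))  # 100.0: blurring keeps a flat area unchanged
print(np.sum(flat * sobel_x))        # 0.0: no edge, no response
print(np.sum(step * sobel_x))        # 400: strong response to the vertical edge
print(np.sum(step * sobel_y))        # 0: this kernel looks for horizontal edges
```

The blur kernel averages neighbouring pixels, while the two edge kernels take differences in one direction and are transposes of each other.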
Pooling
We can shrink the image by pooling it. Normally, a stride equal to the size of
the block is used, so the blocks do not overlap. Two common types are:
Max Pooling
Average Pooling
MAX POOLING
Max pooling is a pooling operation that selects the maximum element from the region of the
feature map covered by the filter. Thus, the output of a max-pooling layer is a feature
map containing the most prominent features of the previous feature map.
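AVERAGE POOLING
Average pooling takes the mean of the elements in each region instead of the maximum.

Example with 2 × 2 blocks and a stride of 2, on the first two rows of the image:
50 165 67  0
94  23 88 12

Max pooling:      165  88
Average pooling:   83  41.75

[Other values in the figure: 124, 95, 49, 121]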
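Both pooling operations can be sketched with a single reshape. This is a minimal example that assumes non-overlapping blocks (stride equal to the block size) and uses only the two image rows shown above; the function name `pool2d` is hypothetical.

```python
import numpy as np

def pool2d(image, block, reduce_fn):
    """Split the image into non-overlapping block x block regions and reduce each one."""
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("image size must be a multiple of the block size")
    # Axes become: (block row, row inside block, block column, column inside block)
    blocks = image.reshape(h // block, block, w // block, block)
    return reduce_fn(blocks, axis=(1, 3))

rows = np.array([[50, 165, 67,  0],
                 [94,  23, 88, 12]])

print(pool2d(rows, 2, np.max))    # [[165  88]]
print(pool2d(rows, 2, np.mean))   # [[83.   41.75]]
```

Max pooling keeps the strongest response in each block, while average pooling smooths the block into a single mean value.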
FULLY CONNECTED LAYER
Finally, we feed our results into a fully connected neural network to run the
classification.
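The final step can be sketched as flattening the pooled feature map and passing it through one dense layer with a softmax. Everything specific here is an assumption for illustration: the feature values, the 10 classes, and the random, untrained weights.

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    z = z - np.max(z)          # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

def classify(feature_map, weights, biases):
    """Flatten the feature map and apply one fully connected layer."""
    x = feature_map.reshape(-1)        # 2-D feature map -> 1-D input vector
    scores = weights @ x + biases      # one score per class
    return softmax(scores)

rng = np.random.default_rng(0)
feature_map = np.array([[165.0, 88.0],
                        [124.0, 121.0]]) / 255.0   # hypothetical pooled features
n_classes = 10                                     # hypothetical: digits 0 to 9
weights = rng.normal(scale=0.1, size=(n_classes, feature_map.size))
biases = np.zeros(n_classes)

probs = classify(feature_map, weights, biases)
print(probs.shape)             # (10,)
print(int(np.argmax(probs)))   # index of the most likely class
```

Because the convolution and pooling stages have already reduced the image to a few features, this last layer needs far fewer weights than a network connected to every pixel.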
Backpropagation is applied to the kernels in CNNs: the kernel values are the
weights that training adjusts.
As in the earlier example, the image is convolved with the kernel

1 0 -1
2 0 -2
1 0 -1

to give the output

 83 179
338  80
where x11, x12 and so on are pixel values from the image, such as 50 and 165,
t11, t12 and so on are the targets for each output pixel, and
y11, y12 and so on are the current output values, that is, 83 and 179.
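One way to make this concrete is to assume a squared-error loss, E = 1/2 Σ (y_ij - t_ij)², so that the gradient for each kernel weight is the sum of (y_ij - t_ij) times the pixel that weight touched. The sketch below rests on that assumption. The target values and learning rate are hypothetical, and the last image row is the assumed row that reproduces the output 83, 179, 338, 80.

```python
import numpy as np

def correlate_valid(a, b):
    """Slide b over a (stride 1, no padding) and sum the elementwise products."""
    bh, bw = b.shape
    oh, ow = a.shape[0] - bh + 1, a.shape[1] - bw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(a[r:r + bh, c:c + bw] * b)
    return out

def loss(x, w, t):
    """Assumed squared-error loss between the output y and the targets t."""
    y = correlate_valid(x, w)
    return 0.5 * np.sum((y - t) ** 2)

def kernel_gradient(x, w, t):
    """dE/dw: slide the error map (y - t) over the image."""
    y = correlate_valid(x, w)
    return correlate_valid(x, y - t)   # same shape as the kernel

x = np.array([[ 50, 165,  67,   0],
              [ 94,  23,  88,  12],
              [178,  56,  90,  64],
              [234, 208,  78, 123]], dtype=float)   # last row is assumed
w = np.array([[1, 0, -1],
              [2, 0, -2],
              [1, 0, -1]], dtype=float)
t = np.array([[100.0, 150.0],
              [300.0, 100.0]])                      # hypothetical targets

grad = kernel_gradient(x, w, t)
learning_rate = 1e-6                                # small, because pixel values are large
w_new = w - learning_rate * grad                    # one gradient-descent step

print(loss(x, w, t))                        # 1487.0
print(loss(x, w_new, t) < loss(x, w, t))    # True: the step reduces the loss
```

The gradient has the same 3 × 3 shape as the kernel, so every kernel value gets its own update, just like the weights of an ordinary neural network.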