6 Lecture CNN

The document discusses various aspects of convolutional neural networks including spatial convolution, activation functions, max pooling, and dropout layers. It describes how convolutional filters are applied to images through sliding windows and produce activation maps. Common activation functions like tanh, sigmoid, ReLU, and leaky ReLU are explained and their properties discussed. Max pooling and global average pooling are also introduced.

Insight of Convolutional Neural Networks
Dr. Dinesh Kumar Vishwakarma
Professor, Department of Information Technology
Delhi Technological University, Shahabad Daulatpur, Bawana Road, Delhi-110042
Topics Covered

• Spatial convolution
• Effect of Stride and padding
• Activations – Unit step, Tanh, Sigmoid, ReLU
• Max-Pooling and Global Average Pooling
• Dropout Layer
• Fully Connected Layer
• Gradient Descent vs. Stochastic Gradient Descent Optimisation Algorithms

Introduction to Convolutional Neural Networks
Convolutional Neural Networks are everywhere!


Image Captioning [Vinyals et al., 2015] [Karpathy and Fei-Fei, 2015]
Example captions: "a white teddy bear sitting in the grass", "a man riding a wave on top of a surfboard", "a cat sitting on a suitcase on the floor", "a woman is holding a cat in her hand", "a woman standing on a beach holding a surfboard".

Convolutions in CNNs

Original image: 32 × 32 × 3 (width × height × depth)
Filter: 5 × 5 × 3
Convolve the filter with the image, i.e. slide it over the image spatially, computing dot products.

Convolutions in CNNs

Convolving a 5 × 5 × 3 filter with the 32 × 32 × 3 original image produces a 28 × 28 × 1 activation map for 1 filter (the convolved image).
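To make the sliding-window idea concrete, here is a minimal NumPy sketch (not from the slides; the image and filter are random placeholders, and only a single filter with stride 1 is shown):

```python
import numpy as np

# Hypothetical data: a 32x32x3 image and a single 5x5x3 filter (random placeholders).
image = np.random.rand(32, 32, 3)
filt = np.random.rand(5, 5, 3)
bias = 0.0

N, F, stride = 32, 5, 1
out_size = (N - F) // stride + 1              # (32 - 5)/1 + 1 = 28
activation_map = np.zeros((out_size, out_size))

for i in range(out_size):
    for j in range(out_size):
        # 5x5x3 window at this spatial position
        patch = image[i * stride:i * stride + F, j * stride:j * stride + F, :]
        activation_map[i, j] = np.sum(patch * filt) + bias   # dot product + bias

print(activation_map.shape)  # (28, 28) -> a 28x28x1 activation map for one filter
```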

CNN- Activations
• Activation functions are mathematical equations that define how the weighted sum of the inputs to a neural node is transformed into an output.
• They add non-linearity to a neural network, allowing the network to learn complex patterns in the data.

As in the previous slide, a 5 × 5 × 3 filter convolved with the 32 × 32 × 3 original image, followed by an activation g, produces a 28 × 28 × 1 activation map:

g = f(Σ wᵢ·xᵢ + b), where f is the activation function, wᵢ is the filter coefficient and b is the bias.

CNN- Activations

The activation function f(·) is a mathematical "gate" between the input feeding the current neuron and its output going to the next layer; it turns the neuron's output on and off depending on a rule or threshold.
It can also be defined as a transformation that maps the input signals into the output signals needed for the neural network to function.

g = f(Σ wᵢ·xᵢ + b)

CNN- Activations
• The choice of activation function has a significant impact on network performance.

Types of activations:
Tanh
Sigmoid
ReLU
Leaky ReLU

CNN- Non-Linear Activation Functions

TanH:       Y = tanh(x)
Sigmoid:    Y = 1 / (1 + e^(−x))
ReLU:       Y = max(0, x)
Leaky ReLU: Y = max(αx, x), 0 < α < 1
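For reference, a small NumPy sketch of these four activations (a hedged illustration; α = 0.01 is chosen arbitrarily):

```python
import numpy as np

def tanh(x):
    return np.tanh(x)                      # squashes to [-1, 1]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # squashes to (0, 1)

def relu(x):
    return np.maximum(0.0, x)              # zero for negative inputs

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)        # small slope for negative inputs

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(tanh(x), sigmoid(x), relu(x), leaky_relu(x), sep="\n")
```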

CNN- Tanh(x)
Tanh or hyperbolic tangent Activation Function

• It squashes a real-valued number to the range [−1, 1].
• Its activations saturate.
• Its output is zero-centered; therefore, in practice the tanh non-linearity is always preferred to the sigmoid non-linearity.
• Also note that the tanh neuron is simply a scaled sigmoid neuron: tanh(x) = 2σ(2x) − 1.
• The tanh function is mainly used for classification between two classes.
CNN- Sigmoid Activation

• Sigmoid is not zero-centred: later layers of processing in a neural network would receive data that is not zero-centered, i.e. the data coming into a neuron is always positive. This results in undesirable zig-zagging dynamics in the gradient updates for the weights.

• Gradient saturation problem: when the neuron's activation saturates at either tail of 0 or 1, the gradient in these regions is almost zero (it "kills" the gradients), so learning barely happens.

CNN- ReLU Activation
• The Rectified Linear Unit (ReLU) simply thresholds the input at zero: f(x) = max(0, x).
• It greatly accelerates (e.g. by a factor of 6 in Krizhevsky et al.) the convergence of stochastic gradient descent compared to the sigmoid/tanh functions. It is argued that this is due to its linear, non-saturating form.
• Dying ReLU problem: if the learning rate is set too high, a large gradient flowing through a ReLU neuron can cause the weights to update in such a way that the neuron never activates on any datapoint again. Two common causes are 1) a high learning rate and 2) a large negative bias.
If the learning rate (α) is set too high, there is a significant chance that the new weights end up in the highly negative value range, since a large number is subtracted from the old weights. These negative weights result in negative inputs to ReLU, thereby causing the dying ReLU problem.
ReLU Activation…
• Advantages:
  • ReLU takes less time to learn and is computationally less expensive than other common activation functions (e.g., tanh, sigmoid). Because it outputs 0 whenever its input is negative, fewer neurons will be activated, leading to network sparsity and thus higher computational efficiency.
  • ReLU involves simpler mathematical operations compared to tanh and sigmoid, thereby boosting its computational performance further.
  • tanh and sigmoid functions are prone to the vanishing gradient problem, where gradients shrink drastically in backpropagation such that the network is no longer able to learn. ReLU avoids this by preserving the gradient since:
    • its linear portion (in the positive input range) allows gradients to flow well on active paths of neurons and remain proportional to node activations;
    • it is an unbounded function (i.e., no max value).

CNN- Leaky ReLU Activation

• Instead of the function being zero when x < 0, a leaky ReLU has a small negative slope (of 0.01, or so).
• Some people report success with this form of activation function, but the results are not always consistent.
• It fixes the "dying ReLU" problem, as it does not have zero-slope parts.
• It can speed up training: there is evidence that having the mean activation close to 0 makes training faster.

CNN – Effect of Stride
A Closer Look at Spatial Dimension: for an N × N input and an F × F filter,

Output size: (N − F) / stride + 1

N = 7, F = 3:
stride 1 => (7 − 3)/1 + 1 = 5 × 5
stride 2 => (7 − 3)/2 + 1 = 3 × 3
stride 3 => (7 − 3)/3 + 1 = 2.33 (does not fit)

A closer look at spatial dimensions: with a 7x7 input (spatially) and a 3x3 filter,
• applied with stride 1 => 5x5 output;
• applied with stride 2 => 3x3 output;
• applied with stride 3? It doesn't fit! A 3x3 filter cannot be applied to a 7x7 input with stride 3.


Summary of Stride

For an N × N input and an F × F filter:
Output size: (N − F) / stride + 1
e.g. N = 7, F = 3:
stride 1 => (7 − 3)/1 + 1 = 5
stride 2 => (7 − 3)/2 + 1 = 3
stride 3 => (7 − 3)/3 + 1 = 2.33 (not an integer, so a stride of 3 does not fit)
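A small helper to sanity-check these output sizes (a sketch, not from the slides; it raises an error when the filter does not fit):

```python
def conv_output_size(n: int, f: int, stride: int = 1) -> int:
    """Spatial output size of a convolution: (N - F) / stride + 1."""
    span = n - f
    if span % stride != 0:
        raise ValueError(f"a {f}x{f} filter with stride {stride} does not fit an {n}x{n} input")
    return span // stride + 1

print(conv_output_size(7, 3, stride=1))  # 5
print(conv_output_size(7, 3, stride=2))  # 3
# conv_output_size(7, 3, stride=3) -> ValueError: does not fit
```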

Convolutions in CNNs

Output size = (32 − 5)/1 + 1 = 28

Convolving 2 different 5 × 5 × 3 filters with the 32 × 32 × 3 original image gives a 28 × 28 × 2 activation map (convolved image).
There will be 2 different neurons all looking at the same region in the input volume.

Convolutions in CNNs

Adding a 3rd 5 × 5 × 3 filter, the 32 × 32 × 3 original image gives a 28 × 28 × 3 activation map (convolved image).
There will be 3 different neurons all looking at the same region in the input volume.

Convolutions in CNNs
With 4 filters, the activation map is 28 × 28 × 4; in general, n different 5 × 5 × 3 filters applied to the 32 × 32 × 3 original image give a 28 × 28 × n activation map (convolved image), with n different neurons all looking at the same region in the input volume.

Convolutions in CNNs
Stacking convolution layers (stride = 1) on the 32 × 32 × 3 original image:
• Conv Layer 1: 5 filters of size 5 × 5 × 3 → 28 × 28 × 5 activation map
• Conv Layer 2: 5 filters of size 5 × 5 × 5 → 24 × 24 × 5 activation map
• Conv Layer 3: 4 filters of size 5 × 5 × 5 → 20 × 20 × 4 activation map
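A hedged Keras sketch of this three-layer stack (assuming TensorFlow/Keras is available; filter counts and sizes are taken from the slide, with stride 1 and no padding, and ReLU is an assumed choice of activation):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(5, (5, 5), activation="relu"),  # -> 28 x 28 x 5
    layers.Conv2D(5, (5, 5), activation="relu"),  # -> 24 x 24 x 5
    layers.Conv2D(4, (5, 5), activation="relu"),  # -> 20 x 20 x 4
])
model.summary()
```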

CNN - Padding
The spatial size of the input image shrinks after every convolution layer → solution: padding.

Without padding, a 32 × 32 × 3 input passed through a convolution layer (n filters of size 5 × 5) gives a 28 × 28 × n output.

CNN - Padding
For filters of size F × F, zero-padding with (F − 1)/2 will preserve the spatial size:

F = 3: zero padding = (3 − 1)/2 = 1
F = 5: zero padding = (5 − 1)/2 = 2

CNN - Padding
Output size: (N + 2P − F) / stride + 1

N = 32, F = 5, P = 2, stride = 1:
Output size: (32 + 2 × 2 − 5)/1 + 1 = 32, i.e. 32 × 32

With padding, the 36 × 36 × 3 padded input passed through the convolution layer (n filters of size 5 × 5) preserves the 32 × 32 spatial size.
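A quick numeric check of this formula, using the values from the slide:

```python
n, f, p, stride = 32, 5, 2, 1
output = (n + 2 * p - f) // stride + 1
print(output)  # 32 -> zero padding of 2 with a 5x5 filter preserves the 32x32 spatial size
```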

CNNs
Number of parameters required for each convolution layer?
• Input volume: 32 × 32 × 3
• Number of filters: 5
• Each filter size: 5 × 5
• Stride: 1
• Zero pad: 2
• Number of parameters in the conv layer per filter: (5 × 5 × 3) + 1 (for bias) = 76 parameters per filter
• Total number of parameters = 76 × 5 = 380 parameters
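This count can be reproduced with a short Keras sketch (assuming TensorFlow/Keras; padding="same" plays the role of zero pad 2 for a 5 × 5 filter):

```python
from tensorflow import keras
from tensorflow.keras import layers

conv = layers.Conv2D(filters=5, kernel_size=(5, 5), strides=1, padding="same")
model = keras.Sequential([layers.Input(shape=(32, 32, 3)), conv])
print(conv.count_params())  # 380 = 5 filters x (5*5*3 weights + 1 bias)
```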

Pooling Layer
• Makes the representations smaller and more manageable.
• Operates over each activation map independently.
• Pooling is of two types: max and average.
• The pooling layer is responsible for reducing the spatial size of the convolved feature.
• This decreases the computational power required to process the data, through dimensionality reduction.
• Furthermore, it is useful for extracting dominant features that are rotationally and positionally invariant, thus helping the model train effectively.
CNN- Max Pooling and Average Pooling

Max pooling with a 2 × 2 filter and stride 2 downsamples a 28 × 28 × 5 activation map to 14 × 14 × 5.
The filter size can be changed, e.g. a 3 × 3 filter with stride 1.
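A minimal NumPy sketch of 2 × 2 max pooling with stride 2 on a single 28 × 28 channel (illustrative only):

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max-pool a 2-D feature map with a square window."""
    h, w = x.shape
    out_h, out_w = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()   # keep the largest value in each window
    return out

feature_map = np.random.rand(28, 28)
print(max_pool2d(feature_map).shape)  # (14, 14)
```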
CNN- Max Pooling and Average Pooling
1. Pooling gives the model a degree of invariance to local translation.
2. It downsamples the feature maps.

CNN- Global Average Pooling Layer
• Instead of downsampling patches of the input feature map, global pooling downsamples each entire feature map to a single value. It was introduced with the GoogLeNet architecture.
• This is the same as setting the pooling filter size to N × N instead of 2 × 2, where N is the size of the input feature map.
• It is also used in models as an alternative to a fully connected layer for the transition from feature maps to an output prediction.
• In Keras it is provided by the GlobalAveragePooling2D and GlobalMaxPooling2D classes, respectively.
GAP improved the top-1 accuracy by about 0.6%.
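A hedged Keras sketch of global average pooling used in place of flatten + fully connected layers (assuming TensorFlow/Keras; the 10-class softmax output is a placeholder):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(5, (5, 5), activation="relu"),   # -> 28 x 28 x 5
    layers.GlobalAveragePooling2D(),               # -> 5 values, one per feature map
    layers.Dense(10, activation="softmax"),        # hypothetical 10-class output
])
model.summary()
```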

CNN- Fully Connected Layer
• Fully connected (FC) layers are used at the end of the network.
• All inputs are connected to each output.
• No. of parameters required = (7 × 7 ×
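As an illustration of how FC-layer parameter counts arise, a hedged Keras sketch; the 7 × 7 × 512 feature-map shape and the 4096 units are hypothetical, chosen only to show the flatten + dense pattern:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(7, 7, 512)),   # hypothetical final feature-map shape
    layers.Flatten(),                  # 7 * 7 * 512 = 25088 inputs
    layers.Dense(4096),                # (25088 + 1) * 4096 = 102,764,544 parameters
])
print(model.layers[-1].count_params())
```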

CNN- Dropout Layer

Figure: a standard neural network and the same network after applying dropout.

CNN- Dropout Layer

• Large weights in a neural network are a sign of a more complex network that has overfit the training data.
• Probabilistically dropping out nodes in the network is a simple and effective regularization method.
• A large network, more training, and the use of a weight constraint are suggested when using dropout.
• Dropout regularizes the weights and improves performance.
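A hedged Keras sketch of a dropout layer placed between fully connected layers (assuming TensorFlow/Keras; the 0.5 drop rate and layer sizes are illustrative, with the 20 × 20 × 4 input taken from the earlier conv stack):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(20, 20, 4)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                   # randomly drops 50% of activations during training
    layers.Dense(10, activation="softmax"),
])
model.summary()
```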


Thank You
