Lecture 6: CNN
Neural Networks
Dr. Dinesh Kumar Vishwakarma
Professor, Department of Information Technology
Delhi Technological University, Shahabad Daulatpur, Bawana Road, Delhi-110042
Topics Covered
• Spatial convolution
• Effect of Stride and padding
• Activations – Unit step, Tanh, Sigmoid, ReLU
• Max-Pooling and Global Average Pooling
• Dropout Layer
• Fully Connected Layer
• Gradient Descent vs. Stochastic Gradient Descent optimisation algorithms
Image Captioning [Vinyals et al., 2015] [Karpathy and Fei-Fei, 2015]
Example generated captions: "A white teddy bear sitting in the grass", "A man riding a wave on top of a surfboard", "A cat sitting on a suitcase on the floor", "A woman is holding a cat in her hand", "A woman standing on a beach holding a surfboard".
Spatial Convolution
A 32 × 32 × 3 original image (32 wide, 32 high, 3 channels) is convolved with a 5×5×3 filter: sliding the filter over every spatial position produces a 28 × 28 × 1 activation map (convolved image) for 1 filter.
Each entry of the activation map is produced by an activation function g applied to the filter response:
g = f(Σ wᵢ xᵢ + b), where wᵢ is the filter coefficient and b is the bias.
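A minimal NumPy sketch of this single-filter spatial convolution (the array contents are random and purely illustrative; only the shapes come from the slide):

import numpy as np

image = np.random.rand(32, 32, 3)   # 32 x 32 x 3 original image
w = np.random.rand(5, 5, 3)         # 5 x 5 x 3 filter coefficients w_i
b = 0.1                             # bias

F = w.shape[0]
out = np.zeros((32 - F + 1, 32 - F + 1))          # 28 x 28 activation map

for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        patch = image[i:i + F, j:j + F, :]        # region the filter currently covers
        out[i, j] = max(np.sum(w * patch) + b, 0.0)   # g = f(sum w_i x_i + b), with f = ReLU

print(out.shape)   # (28, 28)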
Types of Activations:
• Tanh
• Sigmoid
• ReLU
• Leaky ReLU
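These activations can be written in a few lines of NumPy (a simple sketch; the 0.01 negative slope for Leaky ReLU is an assumed default, not taken from the slide):

import numpy as np

def tanh(x):
    return np.tanh(x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):            # alpha: assumed small slope for negative inputs
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))         # -> [0.0, 0.0, 2.0]
print(leaky_relu(x))   # -> [-0.02, 0.0, 2.0]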
Causes of the dying ReLU problem:
1) High learning rate: the weight update can push the weights into a highly negative range, since the old weights are reduced by a large amount.
2) Large negative bias.
Such negative weights (or a large negative bias) produce negative inputs to ReLU, so the affected neurons never activate on a datapoint again; this is the dying ReLU problem.
ReLU Activation…
• Advantages
- ReLU takes less time to learn and is computationally less expensive than other common activation functions (e.g., tanh, sigmoid). Because it outputs 0 whenever its input is negative, fewer neurons are activated, leading to network sparsity and thus higher computational efficiency.
- ReLU involves simpler mathematical operations than tanh and sigmoid, boosting its computational performance further.
- tanh and sigmoid are prone to the vanishing gradient problem, where gradients shrink drastically during backpropagation until the network can no longer learn. ReLU avoids this and preserves the gradient because:
  - its linear portion (in the positive input range) allows gradients to flow well along active paths of neurons and remain proportional to node activations;
  - it is an unbounded function (i.e., it has no maximum value).
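A small numerical illustration of the contrast (the sample inputs are arbitrary): the sigmoid derivative shrinks toward zero for large inputs, while the ReLU derivative stays 1 on active (positive) paths.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.0, 2.0, 5.0, 10.0])
sig_grad = sigmoid(x) * (1.0 - sigmoid(x))   # sigmoid derivative -> vanishes for large |x|
relu_grad = (x > 0).astype(float)            # ReLU derivative -> 1 wherever the unit is active

print(sig_grad)    # approx [0.25, 0.105, 0.0066, 0.000045]
print(relu_grad)   # [0. 1. 1. 1.]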
A closer look at spatial dimensions:
7x7 input (spatially), assume a 3x3 filter:
• applied with stride 1 => 5x5 output
• applied with stride 2 => 3x3 output!
• applied with stride 3? Doesn't fit! A 3x3 filter cannot be applied to a 7x7 input with stride 3.

Output size: (N - F) / stride + 1
e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 :\
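The output-size formula is easy to check with a tiny helper (the function name is mine, not from the slides):

def conv_output_size(N, F, stride):
    # Spatial output size for an N x N input and an F x F filter, no padding
    return (N - F) / stride + 1

for s in (1, 2, 3):
    print(s, conv_output_size(7, 3, s))   # 5.0, 3.0, 2.33... (stride 3 does not fit)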
For the 32 × 32 × 3 original image and a 5×5×3 filter:
Output size = (32 − 5)/1 + 1 = 28
With 2 filters this gives a 28 × 28 × 2 activation map (convolved image).
With a 3rd 5×5×3 filter the activation map becomes 28 × 28 × 3 (convolved image); there will be 3 different neurons all looking at the same region in the input volume. Adding a 4th filter gives a 28 × 28 × 4 activation map: 4 different neurons all looking at the same region in the input volume.
Stacking convolution layers (5×5 filters, stride 1, no padding):
Conv Layer 1 (5 filters): 32 × 32 × 3 → 28 × 28 × 5 activation map
Conv Layer 2 (5 filters): 28 × 28 × 5 → 24 × 24 × 5 activation map
Conv Layer 3 (4 filters): 24 × 24 × 5 → 20 × 20 × 4 activation map
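A hedged sketch of this three-layer stack in PyTorch (PyTorch is not used in the slides; the random input and variable names are illustrative):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)            # (batch, channels, height, width)

conv1 = nn.Conv2d(3, 5, kernel_size=5)   # 5 filters of size 5x5x3
conv2 = nn.Conv2d(5, 5, kernel_size=5)   # 5 filters of size 5x5x5
conv3 = nn.Conv2d(5, 4, kernel_size=5)   # 4 filters of size 5x5x5

h1 = torch.relu(conv1(x))    # -> (1, 5, 28, 28)
h2 = torch.relu(conv2(h1))   # -> (1, 5, 24, 24)
h3 = torch.relu(conv3(h2))   # -> (1, 4, 20, 20)
print(h1.shape, h2.shape, h3.shape)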
Convolution Layer with padding (n filters of size 5×5):
Without padding: 32 × 32 × 3 → 28 × 28 × n
With zero padding of 2 pixels on each side (padded input 36 × 36 × 3):
Output size = (32 + 2·2 − 5)/1 + 1 = 32, i.e. 32 × 32 × n (spatial size preserved).
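The padded case can be verified the same way (a sketch; padding=2 matches the 2-pixel zero padding above, and n = 6 filters is an arbitrary choice):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)
conv_same = nn.Conv2d(3, 6, kernel_size=5, padding=2)
print(conv_same(x).shape)     # torch.Size([1, 6, 32, 32]) -> spatial size preserved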
Max Pooling
A 2×2 filter with stride 2 downsamples the 28 × 28 × 5 activation map to 14 × 14 × 5.
The filter size can be changed, e.g. a 3×3 filter with stride 1.
CNN – Max Pooling and Average Pooling
1. Pooling adds invariance to local translation to the model.
2. Pooling down-samples the feature maps.
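A brief sketch of both pooling operations in PyTorch (illustrative shapes; global average pooling is realised here with AdaptiveAvgPool2d):

import torch
import torch.nn as nn

x = torch.randn(1, 5, 28, 28)                      # e.g. a 28 x 28 x 5 activation map

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(max_pool(x).shape)                           # torch.Size([1, 5, 14, 14])

gap = nn.AdaptiveAvgPool2d(1)                      # global average pooling: one value per channel
print(gap(x).shape)                                # torch.Size([1, 5, 1, 1])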
Thank You