
V[2,0,0] = np.sum(X[4:9,:5,:] * W0) + b0
V[3,0,0] = np.sum(X[6:11,:5,:] * W0) + b0

Remember that in numpy, the operation * above denotes elementwise multiplication between
the arrays. Notice also that the weight vector W0 is the weight vector of that neuron and b0 is
the bias. Here, W0 is assumed to be of shape W0.shape: (5,5,4) , since the filter size is 5
and the depth of the input volume is 4. Notice that at each point, we are computing the dot
product as seen before in ordinary neural networks. Also, we see that we are using the same
weight and bias (due to parameter sharing), and where the dimensions along the width are
increasing in steps of 2 (i.e. the stride). To construct a second activation map in the output
volume, we would have:

V[0,0,1] = np.sum(X[:5,:5,:] * W1) + b1


V[1,0,1] = np.sum(X[2:7,:5,:] * W1) + b1
V[2,0,1] = np.sum(X[4:9,:5,:] * W1) + b1
V[3,0,1] = np.sum(X[6:11,:5,:] * W1) + b1
V[0,1,1] = np.sum(X[:5,2:7,:] * W1) + b1 (example of going along y)
V[2,3,1] = np.sum(X[4:9,6:11,:] * W1) + b1 (or along both)

where we see that we are indexing into the second depth dimension in V (at index 1) because
we are computing the second activation map, and that a different set of parameters ( W1 ) is now
used. In the example above, we are for brevity leaving out some of the other operations the Conv
Layer would perform to fill the other parts of the output array V . Additionally, recall that these
activation maps are often passed elementwise through an activation function such as ReLU, but
this is not shown here.
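For concreteness, the full loop that the snippets above are samples of can be sketched in plain numpy. This is only an illustrative sketch matching the running example (a depth-4 input, two 5×5×4 filters W0 and W1 with biases b0 and b1, stride 2); the 11×11 spatial size and the random data are assumptions for the demo:

```python
import numpy as np

# Illustrative shapes matching the example in the text:
X = np.random.randn(11, 11, 4)        # input volume (already padded)
W0, W1 = np.random.randn(2, 5, 5, 4)  # two filters of shape (5, 5, 4)
b0, b1 = 0.1, 0.2                     # their biases
F, S = 5, 2                           # filter size and stride
out = (X.shape[0] - F) // S + 1       # = 4 positions along each spatial axis

V = np.zeros((out, out, 2))
for d, (W, b) in enumerate([(W0, b0), (W1, b1)]):  # one activation map per filter
    for i in range(out):
        for j in range(out):
            patch = X[i*S:i*S+F, j*S:j*S+F, :]     # local region of the input
            V[i, j, d] = np.sum(patch * W) + b     # dot product plus bias

V = np.maximum(V, 0)  # the elementwise ReLU mentioned above
```

Note how `V[2, 0, 0]` in this loop is exactly `np.sum(X[4:9, :5, :] * W0) + b0` (before the ReLU), as in the example.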

Summary. To summarize, the Conv Layer:

- Accepts a volume of size W1 × H1 × D1
- Requires four hyperparameters:
  - Number of filters K ,
  - their spatial extent F ,
  - the stride S ,
  - the amount of zero padding P .
- Produces a volume of size W2 × H2 × D2 where:
  - W2 = (W1 − F + 2P )/S + 1
  - H2 = (H1 − F + 2P )/S + 1 (i.e. width and height are computed equally by symmetry)
  - D2 = K
- With parameter sharing, it introduces F ⋅ F ⋅ D1 weights per filter, for a total of (F ⋅ F ⋅ D1 ) ⋅ K weights and K biases.
- In the output volume, the d -th depth slice (of size W2 × H2 ) is the result of performing a valid convolution of the d -th filter over the input volume with a stride of S , and then offset by the d -th bias.
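These size formulas can be evaluated directly; here is a minimal helper (the function name is my own) with one worked example:

```python
def conv_output_size(W1, H1, D1, K, F, S, P):
    """Output volume size (W2, H2, D2) of a conv layer, per the formulas above."""
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    D2 = K
    return W2, H2, D2

# Example: a 227x227x3 input convolved with 96 filters of size 11x11
# at stride 4 and no zero padding:
print(conv_output_size(227, 227, 3, K=96, F=11, S=4, P=0))  # (55, 55, 96)
```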

A common setting of the hyperparameters is F = 3, S = 1, P = 1 . However, there are
common conventions and rules of thumb that motivate these hyperparameters. See the ConvNet
architectures section below.
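One property that makes the F = 3, S = 1, P = 1 setting convenient: plugging it into the formula above gives (W1 − 3 + 2·1)/1 + 1 = W1, so the spatial size is preserved. A quick numeric check:

```python
# With F=3, S=1, P=1, the output width equals the input width:
for W1 in (5, 7, 32, 224):
    W2 = (W1 - 3 + 2 * 1) // 1 + 1
    assert W2 == W1  # spatial size is preserved
print("F=3, S=1, P=1 preserves width and height")
```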

Convolution Demo. Below is a running demo of a CONV layer. Since 3D volumes are hard to
visualize, all the volumes (the input volume (in blue), the weight volumes (in red), the output
volume (in green)) are visualized with each depth slice stacked in rows. The input volume is of
size W1 = 5, H1 = 5, D1 = 3 , and the CONV layer parameters are
K = 2, F = 3, S = 2, P = 1 . That is, we have two filters of size 3 × 3 , and they are applied

with a stride of 2. Therefore, the output volume size has spatial size (5 - 3 + 2)/2 + 1 = 3.
Moreover, notice that a padding of P = 1 is applied to the input volume, making the outer border
of the input volume zero. The visualization below iterates over the output activations (green), and
shows that each element is computed by elementwise multiplying the highlighted input (blue)
with the filter (red), summing it up, and then offsetting the result by the bias.
[Interactive demo not reproduced here: it shows the padded input volume x (7x7x3, blue), the two filters w0 and w1 (3x3x3 each, red) with their biases b0 and b1 (1x1x1), and the output volume o (3x3x2, green), stepping through each output activation.]

Implementation as Matrix Multiplication. Note that the convolution operation essentially
performs dot products between the filters and local regions of the input. A common
implementation pattern of the CONV layer is to take advantage of this fact and formulate the
forward pass of a convolutional layer as one big matrix multiply as follows:

1. The local regions in the input image are stretched out into columns in an operation
commonly called im2col. For example, if the input is [227x227x3] and it is to be convolved
with 11x11x3 filters at stride 4, then we would take [11x11x3] blocks of pixels in the input
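This stretching step can be sketched as follows; a minimal, unoptimized im2col (the function name mirrors the operation's common name, the rest is my own sketch, and padding is assumed to have been applied already):

```python
import numpy as np

def im2col(X, F, S):
    """Stretch each FxFxD local region of X (shape H, W, D) into one column."""
    H, W, D = X.shape
    out_h = (H - F) // S + 1
    out_w = (W - F) // S + 1
    cols = np.empty((F * F * D, out_h * out_w))
    c = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, c] = X[i*S:i*S+F, j*S:j*S+F, :].ravel()
            c += 1
    return cols

# 227x227x3 input with 11x11x3 filters at stride 4, as in the text:
X = np.random.randn(227, 227, 3)
cols = im2col(X, F=11, S=4)
print(cols.shape)  # (363, 3025): 11*11*3 values per column, 55*55 locations
```

Each column then holds one receptive field, so the convolution itself reduces to a single matrix multiply of the row-flattened filters against `cols`.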
