Lecture_8.2

A step-by-step forward pass and backpropagation example

There are multiple libraries (PyTorch, TensorFlow) that can assist you in
implementing almost any neural network architecture. This article is not
about solving a neural net with one of those libraries; there are already
plenty of articles and videos on that. Instead, we'll walk through a
step-by-step forward pass (forward propagation) and backward pass
(backpropagation) example. We'll take a neural network with a single hidden
layer and solve one complete cycle of forward propagation and
backpropagation.

Getting to the point, we will work step by step to understand how weights
are updated in neural networks. A neural network learns by updating its
weight parameters during the training phase. Several concepts are needed to
fully understand the working mechanism of neural networks: linear algebra,
probability, and calculus. I'll revisit calculus for the chain rule and set
linear algebra (vectors, matrices, tensors) aside for this article. We'll
work through every computation, and in the end we'll update all the weights
of the example network for one complete cycle of forward propagation and
backpropagation. Let's get started.

Here’s a simple neural network on which we’ll be working.


Example Neural Network

I think the above example neural network is self-explanatory. There are two
units in the Input Layer, two units in the Hidden Layer, and two units in the
Output Layer. The weights w1, w2, …, w8 are the respective connection
weights; b1 and b2 are the biases for the Hidden Layer and the Output Layer,
respectively.

In this article, we'll pass in two inputs, i1 and i2, perform a forward
pass to compute the total error, and then perform a backward pass to
distribute the error through the network and update the weights accordingly.

Before getting started, let us deal with two basic concepts which should be
sufficient to comprehend this article.

Peeking inside a single neuron


Inside h1 (first unit of the hidden layer)

Inside a unit, two operations happen: (i) computation of the weighted sum and
(ii) squashing of the weighted sum using an activation function. The result
of the activation function becomes an input to the next layer (unless the
unit is in the Output Layer). In this example, we'll use the Sigmoid
function (Logistic function) as the activation function. The Sigmoid
function takes any real-valued input and squashes it into the range (0, 1).
We'll discuss activation functions in later articles. For now, what you
should note is that the two operations stated above happen inside every
neural network unit. We can think of the input layer as applying a linear
(identity) function that produces the same value as its input.
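As a minimal sketch, the two operations inside a single unit such as h1 can be written in Python. The numeric values below are hypothetical placeholders, not the values from the article's figure:

```python
import math

def sigmoid(x):
    # Logistic function: squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def unit(inputs, weights, bias):
    # Operation (i): weighted sum of the inputs plus the bias
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    # Operation (ii): squash the sum with the activation function
    return sigmoid(weighted_sum)

# Hypothetical numbers, purely to show the two operations in order
out_h1 = unit([0.05, 0.10], [0.15, 0.20], 0.35)
```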

Chain Rule in Calculus
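The figure that originally illustrated this section is not reproduced here; the rule it relies on is the standard chain rule from calculus. If a quantity y depends on u, and u in turn depends on x, then:

```latex
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}
```

Backpropagation is essentially this rule applied repeatedly, one layer at a time.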

The Forward Pass


Remember that each unit of a neural network performs two operations:
compute the weighted sum and pass that sum through an activation function.
The outcome of the activation function determines how strongly that
particular unit fires, i.e., how much it contributes to the next layer.

Let’s get started with the forward pass.

For h1,

Now we pass this weighted sum through the logistic function (sigmoid
function) to squash it into the range (0, 1). The logistic function is the
activation function for our example neural network. Similarly, for h2 we
compute the weighted sum sumh2 and its activation value outputh2.

Now, outputh1 and outputh2 will be considered as inputs to the next layer.
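Putting the whole forward pass together as a sketch (all numeric values here are hypothetical, since the figure's values are not reproduced in the text):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical inputs, weights, and biases -- placeholders for illustration
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # input -> hidden
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # hidden -> output
b1, b2 = 0.35, 0.60

# Hidden layer: weighted sum, then activation
sum_h1 = w1 * i1 + w2 * i2 + b1
sum_h2 = w3 * i1 + w4 * i2 + b1
out_h1, out_h2 = sigmoid(sum_h1), sigmoid(sum_h2)

# Output layer: the hidden activations become the inputs
sum_o1 = w5 * out_h1 + w6 * out_h2 + b2
sum_o2 = w7 * out_h1 + w8 * out_h2 + b2
out_o1, out_o2 = sigmoid(sum_o1), sigmoid(sum_o2)

print(out_o1, out_o2)
```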

For o1,
Computing the total error
We started off supposing the expected outputs to be 0.05 and 0.95,
respectively, for outputo1 and outputo2. Now we will compute the errors
based on the outputs received so far and the expected outputs.

We’ll use the following error formula,
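The formula image is not reproduced above; a squared-error formula commonly used in walkthroughs of this kind, and consistent with the per-output errors E1 and E2 used later, is E = ½(target − output)², with the total error being the sum over the output units. As a sketch:

```python
def squared_error(target, output):
    # Half squared error: the factor of 1/2 cancels when differentiating
    return 0.5 * (target - output) ** 2

# Expected outputs are taken from the article; the actual outputs are
# hypothetical placeholders for the forward-pass results.
target_o1, target_o2 = 0.05, 0.95
out_o1, out_o2 = 0.75, 0.77
E1 = squared_error(target_o1, out_o1)
E2 = squared_error(target_o2, out_o2)
E_total = E1 + E2
```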


The Backpropagation
The aim of backpropagation (the backward pass) is to distribute the total
error back through the network so that the weights can be updated to minimize
the cost function (loss). The weights are updated in such a way that when the
next forward pass uses the updated weights, the total error is reduced by a
certain margin (until the minimum is reached).

For weights in the output layer (w5, w6, w7, w8)


For w5,

Let's compute how much contribution w5 has to E1. Once we are clear on how
w5 is updated, it will be easy to generalize the same procedure to the rest
of the weights. Looking closely at the example neural network, we can see
that E1 is affected by outputo1, outputo1 is affected by sumo1, and sumo1 is
affected by w5. It's time to recall the Chain Rule.
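Written out, the chain described above is ∂E1/∂w5 = (∂E1/∂outputo1) · (∂outputo1/∂sumo1) · (∂sumo1/∂w5). Assuming the half-squared-error E1 = ½(targeto1 − outputo1)² and a sigmoid activation, each factor has a closed form:

```python
def dE1_dw5(target_o1, out_o1, out_h1):
    # Factor 1: dE1/d(out_o1) for E1 = 0.5 * (target_o1 - out_o1)**2
    dE1_dout = -(target_o1 - out_o1)
    # Factor 2: d(out_o1)/d(sum_o1) for a sigmoid activation
    dout_dsum = out_o1 * (1.0 - out_o1)
    # Factor 3: d(sum_o1)/d(w5) is the input that w5 multiplies, out_h1
    dsum_dw5 = out_h1
    return dE1_dout * dout_dsum * dsum_dw5
```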
Component 2: partial derivative of Output w.r.t. Sum

The output section of a unit of a neural network uses non-linear activation


functions. The activation function used in this example is Logistic Function.
When we compute the derivative of the Logistic Function, we get:
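For reference (the derived expression is not reproduced above), the derivative of the logistic function σ(x) = 1/(1 + e^(−x)) is σ′(x) = σ(x)(1 − σ(x)), which is convenient because the value already computed in the forward pass can be reused:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# Quick sanity check against a central finite difference
h = 1e-6
approx = (sigmoid(0.3 + h) - sigmoid(0.3 - h)) / (2 * h)
print(abs(sigmoid_derivative(0.3) - approx) < 1e-8)  # True
```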
For weights in the hidden layer (w1, w2, w3, w4)
Similar calculations are made to update the weights in the hidden layer;
however, this time the chain becomes a bit longer. It does not matter how
deep the neural network goes: all we need to find out is how much error a
particular weight contributes to the total error of the network. For that
purpose, we need the partial derivative of the Error w.r.t. that particular
weight. Let's work on updating w1, and we'll be able to generalize similar
calculations to update the rest of the weights.

For w1 (with respect to E1),

Let's quickly go through the above chain. We know that E1 is affected
by outputo1, outputo1 is affected by sumo1, sumo1 is affected by outputh1,
outputh1 is affected by sumh1, and finally sumh1 is affected by w1. It is
quite easy to comprehend, isn't it?
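Written out in full, the chain for w1 (with respect to E1) is:

```latex
\frac{\partial E_1}{\partial w_1}
= \frac{\partial E_1}{\partial \mathrm{output}_{o1}}
\cdot \frac{\partial \mathrm{output}_{o1}}{\partial \mathrm{sum}_{o1}}
\cdot \frac{\partial \mathrm{sum}_{o1}}{\partial \mathrm{output}_{h1}}
\cdot \frac{\partial \mathrm{sum}_{h1}}{\partial \mathrm{sum}_{h1}}^{-1}
\cdot \frac{\partial \mathrm{output}_{h1}}{\partial \mathrm{sum}_{h1}}
\cdot \frac{\partial \mathrm{sum}_{h1}}{\partial w_1}
```

Each factor mirrors one link in the chain of dependencies described above.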
For the first component of the above chain,

We've already computed the second component. This is one of the benefits of
the chain rule: as we go deeper into the network, previous computations
become reusable.
Once we've computed all the new weights, we replace the old weights with
them. When the weights are updated, one backpropagation cycle is finished.
Then another forward pass is performed and the new total error is computed,
and based on this newly computed total error the weights are updated again.
This goes on until the loss converges to a minimum. In this way, a neural
network starts with random values for its weights and finally converges to
optimum values.
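The update step described above is plain gradient descent. As a sketch (the learning rate value here is a hypothetical choice, not from the article):

```python
def update_weight(w_old, dE_dw, learning_rate=0.5):
    # Gradient descent: step against the gradient of the error so that
    # the next forward pass produces a smaller total error.
    # The learning rate of 0.5 is a hypothetical choice.
    return w_old - learning_rate * dE_dw

# One training cycle: forward pass -> total error -> gradients -> update
# every weight. Repeating the cycle drives the total error toward a minimum.
```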

I hope you found this article useful. I’ll see you in the next one.
