Lecture_8.2
There are multiple libraries (PyTorch, TensorFlow) that can help you implement almost any neural network architecture. This article is not about solving a neural net using one of those libraries; there are already plenty of articles and videos on that. Instead, we'll walk through a step-by-step example of a forward pass (forward propagation) and a backward pass (backpropagation). We'll take a neural network with a single hidden layer and work through one complete cycle of forward propagation and backpropagation.
Getting to the point, we will work step by step to understand how weights are updated in a neural network. The way a neural network learns is by updating its weight parameters during the training phase. Several concepts are needed to fully understand how neural networks work: linear algebra, probability, and calculus. I'll revisit just enough calculus to cover the chain rule, and I'll set aside the linear algebra (vectors, matrices, tensors) for this article. We'll work through every computation, and by the end we'll have updated all the weights of the example neural network for one complete cycle of forward propagation and backpropagation. Let's get started.
Our example neural network is simple: there are two units in the Input Layer, two units in the Hidden Layer, and two units in the Output Layer. The weights w1, w2, w3, …, w8 connect these units, and b1 and b2 are the biases for the Hidden Layer and the Output Layer, respectively.
In this article, we'll pass two inputs, i1 and i2, through the network, perform a forward pass to compute the total error, and then perform a backward pass to distribute the error through the network and update the weights accordingly.
Before getting started, let us cover two basic concepts that should be sufficient to follow this article.
Inside a unit, two operations happen: (i) computation of a weighted sum and (ii) squashing of the weighted sum using an activation function. The result of the activation function becomes an input to the next layer (for the Output Layer, it is the network's final output). In this example, we'll use the Sigmoid function (logistic function) as the activation function. The Sigmoid function takes an input and squashes it into the range (0, 1). We'll discuss activation functions in later articles; for now, what you should note is that inside a neural network unit, the two operations stated above happen. We can think of the Input Layer as applying an identity (linear) function that simply passes each input value through unchanged.
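As a concrete illustration of the two operations inside a unit, here is a minimal Python sketch; the variable names and numeric values are hypothetical placeholders, not values from the example network.

```python
import math

def sigmoid(x):
    # Logistic function: squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# Operation (i): weighted sum of the unit's inputs plus the bias
inputs  = [0.1, 0.3]   # hypothetical incoming values
weights = [0.4, 0.2]   # hypothetical weights into this unit
bias = 0.5
weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias

# Operation (ii): squash the weighted sum with the activation function
activation = sigmoid(weighted_sum)
print(weighted_sum, activation)
```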
For h1, the weighted sum is sum_h1 = w1 * i1 + w2 * i2 + b1.
Now we pass this weighted sum through the logistic (sigmoid) function to squash it into the range (0, 1), giving output_h1 = sigmoid(sum_h1). The logistic function is the activation function for our example neural network.
Similarly for h2, we compute the weighted sum sum_h2 = w3 * i1 + w4 * i2 + b1 and the activation value output_h2 = sigmoid(sum_h2).
For o1, the inputs are the hidden-layer activations: sum_o1 = w5 * output_h1 + w6 * output_h2 + b2, and output_o1 = sigmoid(sum_o1). Similarly, sum_o2 = w7 * output_h1 + w8 * output_h2 + b2 gives output_o2 = sigmoid(sum_o2).
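Putting the hidden and output layers together, a full forward pass for this 2-2-2 network might look like the sketch below. Since the concrete input and weight values from the original figure are not reproduced in the text, the numbers here are hypothetical placeholders; only the structure of the computation matters.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical inputs, weights, and biases (placeholders, not the article's values)
i1, i2 = 0.10, 0.30
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # input -> hidden
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # hidden -> output
b1, b2 = 0.35, 0.60

# Hidden layer: weighted sum, then squash with the sigmoid
sum_h1 = w1 * i1 + w2 * i2 + b1
sum_h2 = w3 * i1 + w4 * i2 + b1
output_h1 = sigmoid(sum_h1)
output_h2 = sigmoid(sum_h2)

# Output layer: the hidden-layer activations are the inputs here
sum_o1 = w5 * output_h1 + w6 * output_h2 + b2
sum_o2 = w7 * output_h1 + w8 * output_h2 + b2
output_o1 = sigmoid(sum_o1)
output_o2 = sigmoid(sum_o2)

print(output_o1, output_o2)
```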
Computing the total error
We started off supposing the expected (target) outputs to be 0.05 and 0.95 for output_o1 and output_o2, respectively. Now we compute the errors based on the outputs obtained so far and these expected outputs.
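A common choice for the error in worked backpropagation examples of this kind is the squared error summed over the output units; assuming that is the loss intended here, the computation looks like this (the output values are hypothetical forward-pass results):

```python
# Targets given in the article; outputs taken from a forward pass
target_o1, target_o2 = 0.05, 0.95
output_o1, output_o2 = 0.75, 0.77   # hypothetical forward-pass results

# Squared error per output unit; the 1/2 factor simplifies the derivative
error_o1 = 0.5 * (target_o1 - output_o1) ** 2
error_o2 = 0.5 * (target_o2 - output_o2) ** 2
total_error = error_o1 + error_o2
print(total_error)
```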
To distribute the error, we use the chain rule to write the gradient of the total error with respect to each weight as a product of simpler partial derivatives. We've already computed the second of these components during the forward pass; this is one of the benefits of using the chain rule. As we go deeper into the network, previous computations become reusable.
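To make the chain-rule decomposition concrete, here is a sketch of the gradient of the total error with respect to one output-layer weight (say w5), assuming the squared-error loss and sigmoid activation used above; the middle factor, the derivative of the sigmoid, reuses the already-computed activation. The numeric values are hypothetical placeholders.

```python
# Values carried over from the forward pass (hypothetical placeholders)
target_o1 = 0.05
output_o1 = 0.75
output_h1 = 0.60

# Chain rule: dE/dw5 = dE/doutput_o1 * doutput_o1/dsum_o1 * dsum_o1/dw5
dE_doutput_o1   = output_o1 - target_o1           # from 1/2 * (target - output)^2
doutput_o1_dsum = output_o1 * (1.0 - output_o1)   # sigmoid derivative, reuses output_o1
dsum_o1_dw5     = output_h1                       # since sum_o1 = w5*output_h1 + w6*output_h2 + b2

dE_dw5 = dE_doutput_o1 * doutput_o1_dsum * dsum_o1_dw5
print(dE_dw5)
```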
Once we've computed all the new weights, we replace the old weights with them. When the weights have been updated, one backpropagation cycle is finished. Then another forward pass is performed and the new total error is computed, and based on this newly computed total error the weights are updated again. This goes on until the loss value converges to a minimum. In this way, a neural network starts with random values for its weights and gradually converges to values that minimize the error.
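The update itself is ordinary gradient descent: each weight takes a small step against its gradient. Here is a minimal sketch of one such update; the learning rate and the numeric values are hypothetical choices, not taken from the example.

```python
learning_rate = 0.5          # hypothetical hyperparameter

# New weight = old weight - learning_rate * gradient of the error w.r.t. that weight
w5_old = 0.40                # placeholder current value of w5
dE_dw5 = 0.08                # placeholder gradient from the backward pass
w5_new = w5_old - learning_rate * dE_dw5

# Repeating forward pass -> backward pass -> update for every weight, over many
# iterations, drives the total error toward a minimum.
print(w5_new)
```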
I hope you found this article useful. I’ll see you in the next one.