Backpropagation is used to update the weights and biases in a neural network using gradient descent. The document focuses on calculating the derivative of the sum of squared residuals (SSR) with respect to the last bias (b5) in order to update it through gradient descent. By the chain rule, this derivative works out to -2 times the sum of the differences between the true and predicted values, times 1. Calculating the derivative of the loss function with respect to each weight and bias in the same way allows all network parameters to be updated through backpropagation.

Neural Networks

Part 2

Week 16
Backpropagation

Middlesex University Dubai; CST4050, Fall 2021


Instructor: Ivan Reznikov
Backpropagation
Gradient descent is widely used to update weights and biases. Let's look at how the last bias, b5, is updated, and then slowly work backward through the network. Updating network parameters in this fashion is called backpropagation.
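As a minimal sketch of a single gradient descent update (the learning-rate and gradient values below are illustrative, not taken from the slides):

```python
# One gradient descent step: move a parameter against the gradient of the loss.
def gd_step(param, grad, lr=0.1):
    return param - lr * grad

b5 = 0.0                       # current value of the last bias
d_ssr_d_b5 = -15.7             # hypothetical gradient of SSR w.r.t. b5
b5 = gd_step(b5, d_ssr_d_b5)   # b5 moves opposite to the gradient
print(b5)                      # 1.57
```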

Updating bias
On the right is the fitting curve our neural network generates for the data. As one may notice, the predicted curve has a high bias. Let's reduce it by updating b5.

Updating bias
Similarly to the previous case, we can use the sum of squared residuals as the loss function. Finding the best fit is something we are already comfortable with: for this purpose, we'll use gradient descent.
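Before moving on, here is a minimal sketch of the SSR loss in Python; the true/pred sample values are made up for illustration:

```python
# Sum of squared residuals (SSR): the loss we minimize with gradient descent.
def ssr(true, pred):
    return sum((t - p) ** 2 for t, p in zip(true, pred))

true = [0.0, 1.0, 0.0]   # made-up observed values
pred = [0.2, 0.8, 0.1]   # made-up network predictions
print(round(ssr(true, pred), 6))  # 0.09
```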

Updating bias through GD

Updating bias
In order to calculate the gradient, we need to find the following derivative:

d(SSR)/d(b5)

As

SSR = ∑(Truei – Predi)²
Predi = softplus(i1,1 × w3 + b3 + i1,2 × w4 + b4) + b5

Using the chain rule:

d(SSR)/d(b5) = d(SSR)/d(Pred) × d(Pred)/d(b5)
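As a sketch, the prediction formula above can be written out in Python, with softplus defined explicitly (the function names and example values are ours, not from the slides):

```python
import math

def softplus(x):
    # softplus(x) = ln(1 + e^x)
    return math.log(1.0 + math.exp(x))

def pred(i11, i12, w3, w4, b3, b4, b5):
    # Predi = softplus(i1,1 × w3 + b3 + i1,2 × w4 + b4) + b5, as above
    return softplus(i11 * w3 + b3 + i12 * w4 + b4) + b5

# Made-up inputs and parameters, purely for illustration
print(pred(0.5, 1.0, w3=0.3, w4=-0.2, b3=0.1, b4=0.4, b5=0.0))
```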
Updating bias
d(SSR)/d(b5) = d(SSR)/d(Pred) × d(Pred)/d(b5)

SSR = ∑(Truei – Predi)²  =>  d(SSR)/d(Pred) = -2 × ∑(Truei – Predi)

Predi = softplus(i1,1 × w3 + b3 + i1,2 × w4 + b4) + b5  =>

d(Pred)/d(b5) = d(softplus(i1,1 × w3 + b3 + i1,2 × w4 + b4) + b5)/d(b5) = d(0 + b5)/d(b5) = 1

since the softplus term does not depend on b5 and differentiates to 0. Putting the two factors together:

d(SSR)/d(b5) = -2 × ∑(Truei – Predi) × 1
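A sketch that computes this gradient and checks it against a finite-difference estimate (the sample values are made up; nudging b5 by eps adds eps to every prediction, since d(Pred)/d(b5) = 1):

```python
def ssr(true, pred):
    return sum((t - p) ** 2 for t, p in zip(true, pred))

# Analytic gradient from the slide: d(SSR)/d(b5) = -2 × ∑(Truei – Predi)
def d_ssr_d_b5(true, pred):
    return -2.0 * sum(t - p for t, p in zip(true, pred))

true = [0.0, 1.0, 0.0]   # made-up values
pred = [0.2, 0.8, 0.1]
grad = d_ssr_d_b5(true, pred)

# Finite-difference check: shifting b5 by eps shifts every Pred_i by eps.
eps = 1e-6
fd = (ssr(true, [p + eps for p in pred]) - ssr(true, pred)) / eps

print(round(grad, 6), round(fd, 4))  # both ≈ 0.2
```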
Updating weights and biases
d(SSR)/d(b5) = d(SSR)/d(Pred) × d(Pred)/d(b5)

d(SSR)/d(b3) = d(SSR)/d(Pred) × d(Pred)/d(b3)

d(SSR)/d(w3) = d(SSR)/d(Pred) × d(Pred)/d(w3)

d(SSR)/d(b1) = d(SSR)/d(Pred) × d(Pred)/d(i1,1) × d(i1,1)/d(b1)

d(SSR)/d(w1) = d(SSR)/d(Pred) × d(Pred)/d(i1,1) × d(i1,1)/d(w1)
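As a sketch, the same chain rule can be applied numerically for w3, b3, and b5 under the prediction formula used above (the data and parameter values are made up; the code relies on the fact that the derivative of softplus is the logistic sigmoid):

```python
import math

def softplus(x):
    return math.log(1.0 + math.exp(x))

def sigmoid(x):
    # d/dx softplus(x) = 1 / (1 + e^-x), the logistic sigmoid
    return 1.0 / (1.0 + math.exp(-x))

# Made-up samples ((i1,1, i1,2), True) and made-up parameter values.
data = [((0.0, 0.5), 0.0), ((0.5, 1.0), 1.0), ((1.0, 0.0), 0.0)]
w3, w4, b3, b4, b5 = 0.3, -0.2, 0.1, 0.4, 0.0

g_w3 = g_b3 = g_b5 = 0.0
for (i11, i12), t in data:
    z = i11 * w3 + b3 + i12 * w4 + b4
    p = softplus(z) + b5                      # Predi
    d_ssr_d_pred = -2.0 * (t - p)             # d(SSR)/d(Pred), one sample
    g_w3 += d_ssr_d_pred * sigmoid(z) * i11   # × d(Pred)/d(w3)
    g_b3 += d_ssr_d_pred * sigmoid(z)         # × d(Pred)/d(b3)
    g_b5 += d_ssr_d_pred * 1.0                # × d(Pred)/d(b5) = 1

print(g_w3, g_b3, g_b5)
```

Each parameter can then be updated with a gradient descent step, exactly as was done for b5.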
