Back Propagation
How Does the Forward Pass Work?
In the forward pass, the input data is fed into the input layer. These inputs, combined with their
respective weights, are passed to hidden layers.
For example, in a network with two hidden layers (h1 and h2 as shown in Fig. (a)), the output from h1
serves as the input to h2. Before applying an activation function, a bias is added to the weighted inputs.
Each hidden layer applies an activation function like ReLU (Rectified Linear Unit), which returns the
input if it’s positive and zero otherwise. This adds non-linearity, allowing the model to learn complex
relationships in the data. Finally, the outputs from the last hidden layer are passed to the output layer,
where an activation function, such as softmax, converts the weighted outputs into probabilities for
classification.
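As a rough sketch of this forward pass (not the network from Fig. (a); the layer sizes, weights, and inputs below are made up purely for illustration), two ReLU hidden layers followed by a softmax output can be written in a few lines of NumPy:

```python
import numpy as np

def relu(z):
    # ReLU: returns the input if positive, zero otherwise
    return np.maximum(0, z)

def softmax(z):
    # subtract the max for numerical stability, then normalize to probabilities
    e = np.exp(z - np.max(z))
    return e / e.sum()

# hypothetical 3-input network with two hidden layers and a 2-class output
rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 0.3])                   # input layer
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # input -> h1 weights and bias
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)    # h1 -> h2
W3, b3 = rng.normal(size=(2, 4)), np.zeros(2)    # h2 -> output

h1 = relu(W1 @ x + b1)          # weighted inputs plus bias, then ReLU
h2 = relu(W2 @ h1 + b2)         # output of h1 serves as input to h2
probs = softmax(W3 @ h2 + b3)   # softmax turns the scores into class probabilities
print(probs, probs.sum())       # probabilities sum to 1
```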
How Does the Backward Pass Work?
In the backward pass, the error (the difference between the predicted and actual output) is propagated back through the network to adjust the weights and biases. One common way to measure this error is the Mean Squared Error (MSE):
MSE = (1/n) × ∑ (ytarget − ypredicted)²
For a single training example, this reduces to the squared difference between the target and the predicted output.
Once the error is calculated, the network adjusts weights using gradients, which are computed with the
chain rule. These gradients indicate how much each weight and bias should be adjusted to minimize the
error in the next iteration. The backward pass continues layer by layer, ensuring that the network learns
and improves its performance. The activation function, through its derivative, plays a crucial role in
computing these gradients during backpropagation.
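To make the chain-rule step concrete, here is a minimal sketch for a single sigmoid unit trained with squared error. The inputs, weights, target, and learning rate are illustrative values (chosen to match the worked example below), and the unit is connected directly to the target, so this is a simplification rather than the full network:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# one sigmoid unit with two inputs (illustrative values)
x1, x2 = 0.35, 0.7          # inputs
w1, w2 = 0.2, 0.2           # current weights
target, lr = 0.5, 1.0       # desired output and learning rate

a = w1 * x1 + w2 * x2       # weighted sum
y = sigmoid(a)              # prediction
error = target - y          # how far off the prediction is

# Chain rule: dE/dw_i = dE/dy * dy/da * da/dw_i
# with E = 0.5 * (target - y)**2 this gives delta = y * (1 - y) * (target - y)
delta = y * (1 - y) * error
w1 += lr * delta * x1       # gradient step on each weight
w2 += lr * delta * x2
print(y, delta, w1, w2)
```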
Example
Consider a small network with two inputs (x1 = 0.35, x2 = 0.7), two hidden neurons h1 and h2, and one output neuron O3. The initial weights are w1,1 = w2,1 = 0.2 into h1, w1,2 = w2,2 = 0.3 into h2, and w1,3 = 0.3, w2,3 = 0.9 from h1 and h2 into O3. Assume the neurons use the sigmoid activation function for the forward and backward pass. The target output is 0.5, and the learning rate is 1.
Forward Propagation
1. Initial Calculation
The weighted sum at each node is calculated using:
aj = ∑ (wi,j × xi)
Where,
aj is the weighted sum of all the inputs and weights at node j,
wi,j is the weight connecting the ith input to the jth neuron,
xi is the value of the ith input.
Sigmoid Function
The sigmoid function returns a value between 0 and 1, introducing non-linearity into the model.
yj = 1 / (1 + e^(−aj))
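In code, the weighted sum and the sigmoid activation are each a one-liner (a minimal Python sketch):

```python
import math

def weighted_sum(weights, inputs):
    # a_j = sum over i of w_{i,j} * x_i
    return sum(w * x for w, x in zip(weights, inputs))

def sigmoid(a):
    # y_j = 1 / (1 + e^(-a_j))
    return 1.0 / (1.0 + math.exp(-a))
```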
Computing Outputs
At the h1 node,
a1 = (w1,1 × x1) + (w2,1 × x2) = (0.2 × 0.35) + (0.2 × 0.7) = 0.21
Once we have calculated a1, we can find y3, the output of h1:
y3 = F(a1) = 1 / (1 + e^(−0.21))
y3 ≈ 0.56
Similarly, find the values of y4 at h2 and y5 at O3:
a2 = (w1,2 × x1) + (w2,2 × x2) = (0.3 × 0.35) + (0.3 × 0.7) = 0.315
y4 = F(0.315) = 1 / (1 + e^(−0.315)) ≈ 0.59
a3 = (w1,3 × y3) + (w2,3 × y4) = (0.3 × 0.56) + (0.9 × 0.59) = 0.699
y5 = F(0.699) = 1 / (1 + e^(−0.699)) ≈ 0.67
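This forward pass can be reproduced with a short sketch; small differences in the last decimal arise because the text rounds y3 and y4 before reusing them:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

x1, x2 = 0.35, 0.7            # inputs
w11, w21 = 0.2, 0.2           # weights into h1
w12, w22 = 0.3, 0.3           # weights into h2
w13, w23 = 0.3, 0.9           # weights from h1, h2 into O3

a1 = w11 * x1 + w21 * x2      # 0.21
y3 = sigmoid(a1)              # ~0.55 (the text rounds this to 0.56)
a2 = w12 * x1 + w22 * x2      # 0.315
y4 = sigmoid(a2)              # ~0.58 (the text rounds this to 0.59)
a3 = w13 * y3 + w23 * y4      # ~0.69
y5 = sigmoid(a3)              # ~0.67
print(y3, y4, y5)
```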
Error Calculation
Note that our target output is 0.5, but we obtained 0.67.
To calculate the error, we can use the formula below:
Error = ytarget − y5
Error = 0.5 − 0.67 = −0.17
Using this error value, we can now backpropagate.
Backpropagation
1. Calculating Gradients
The change in each weight is calculated as:
Δwi,j = η × δj × Oi
Where:
δj is the error term of the unit that the weight feeds into,
Oi is the output of the unit the weight comes from (an input xi or a hidden output),
η is the learning rate.
2. Output Unit Error
For O3:
δ5 = y5 (1 − y5) (ytarget − y5)
= 0.67 × (1 − 0.67) × (−0.17) = −0.0376
3. Hidden Unit Error
For h1:
δ3 = y3 (1 − y3) (w1,3 × δ5)
= 0.56 × (1 − 0.56) × (0.3 × −0.0376) ≈ −0.0027
For h2:
δ4 = y4 (1 − y4) (w2,3 × δ5)
= 0.59 × (1 − 0.59) × (0.9 × −0.0376) ≈ −0.0082
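These error terms can be checked directly in code (a sketch using the rounded outputs from the forward pass):

```python
y3, y4, y5 = 0.56, 0.59, 0.67            # rounded outputs from the forward pass
w13, w23 = 0.3, 0.9                      # hidden -> output weights
target = 0.5

delta5 = y5 * (1 - y5) * (target - y5)   # output unit error term, ~ -0.0376
delta3 = y3 * (1 - y3) * (w13 * delta5)  # h1 error term, ~ -0.0028 (the text truncates to -0.0027)
delta4 = y4 * (1 - y4) * (w23 * delta5)  # h2 error term, ~ -0.0082
print(delta5, delta3, delta4)
```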
4. Weight Updates
For the weights from the hidden layer to the output layer:
Δw2,3 = 1 × (−0.0376) × 0.59 = −0.022184
New weight:
w2,3(new) = 0.9 + (−0.022184) = 0.877816
For the weights from the input layer to the hidden layer:
Δw1,1 = 1 × (−0.0027) × 0.35 = −0.000945
New weight:
w1,1(new) = 0.2 + (−0.000945) = 0.199055
Similarly, the other weights are updated:
w1,2(new) ≈ 0.29713
w1,3(new) ≈ 0.27894
w2,1(new) ≈ 0.19811
w2,2(new) ≈ 0.29426
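The same updates can be written out as a short sketch (η = 1, reusing the error terms computed above):

```python
eta = 1.0                                            # learning rate
x1, x2 = 0.35, 0.7                                   # inputs
y3, y4 = 0.56, 0.59                                  # hidden outputs
delta5, delta3, delta4 = -0.0376, -0.0027, -0.0082   # error terms from above

# current weights
w11, w21, w12, w22 = 0.2, 0.2, 0.3, 0.3              # input -> hidden
w13, w23 = 0.3, 0.9                                  # hidden -> output

# hidden -> output updates are scaled by the hidden outputs
w13 += eta * delta5 * y3                             # ~0.2789
w23 += eta * delta5 * y4                             # ~0.8778
# input -> hidden updates are scaled by the inputs
w11 += eta * delta3 * x1                             # ~0.1991
w21 += eta * delta3 * x2                             # ~0.1981
w12 += eta * delta4 * x1                             # ~0.2971
w22 += eta * delta4 * x2                             # ~0.2943
print(w11, w21, w12, w22, w13, w23)
```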
Repeating the forward pass with these updated weights gives a new output of roughly y5 ≈ 0.66, so the error becomes
Error = ytarget − y5 = 0.5 − 0.66 = −0.16
Since this is still not the target output of 0.5, the error is propagated back again and the weights are updated once more. This cycle of forward pass, error calculation, and backpropagation repeats until the network's output is acceptably close to the target, which is how backpropagation iteratively minimizes the error.
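Putting the whole procedure together, the following sketch repeats the forward pass, error calculation, and weight updates for this small network until the output is close to the target. It uses the same inputs, initial weights, target, and learning rate as the example; the iteration cap and the stopping tolerance of 1e-3 are arbitrary choices:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

x1, x2, target, eta = 0.35, 0.7, 0.5, 1.0
w11, w21, w12, w22 = 0.2, 0.2, 0.3, 0.3      # input -> hidden weights
w13, w23 = 0.3, 0.9                          # hidden -> output weights

for step in range(1000):
    # forward pass
    y3 = sigmoid(w11 * x1 + w21 * x2)
    y4 = sigmoid(w12 * x1 + w22 * x2)
    y5 = sigmoid(w13 * y3 + w23 * y4)
    error = target - y5
    if abs(error) < 1e-3:                    # close enough to the target output
        break
    # backward pass: error terms
    d5 = y5 * (1 - y5) * error
    d3 = y3 * (1 - y3) * (w13 * d5)
    d4 = y4 * (1 - y4) * (w23 * d5)
    # weight updates
    w13 += eta * d5 * y3
    w23 += eta * d5 * y4
    w11 += eta * d3 * x1
    w21 += eta * d3 * x2
    w12 += eta * d4 * x1
    w22 += eta * d4 * x2

print(step, y5)   # the output y5 moves toward the 0.5 target
```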