Lect 15: MLP Introduction and Backprop
Algorithm: Backpropagation
Multilayer Perceptron
[Figure: a multilayer perceptron with an input layer, a hidden layer, and an output layer]
Non-Linear Model: Mathematical Representation of the Sigmoid Activation Function
[Figure: a single unit with inputs x1 … xn, weights w1 … wn, activation a, and output y]
• Weighted sum (activation): a = Σ_{i=1..n} w_i x_i
• Sigmoid output: y = σ(a) = 1 / (1 + e^(−a))
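As a quick illustration of the unit above, here is a minimal Python sketch (the function and variable names are our own, not from the lecture) that computes the weighted sum a and the sigmoid output y:

```python
import math

def sigmoid(a):
    # Logistic activation: maps any real activation into (0, 1)
    return 1.0 / (1.0 + math.exp(-a))

def unit_output(weights, inputs):
    # Weighted sum a = sum_i w_i * x_i, then squash with the sigmoid
    a = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(a)

# Example with two arbitrary inputs and weights
print(unit_output([0.4, 0.1], [0.5, 0.7]))
```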
Learning with hidden units
• Networks without hidden units are very limited in the input-output mappings they can model.
– More layers of linear units do not help: the composition is still linear (see the sketch below).
– Fixed output non-linearities are not enough.
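A minimal NumPy sketch (our own illustration, not from the lecture) of why stacking linear layers does not help: two linear layers collapse into a single equivalent linear map.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # first linear layer
W2 = rng.normal(size=(2, 3))   # second linear layer
x = rng.normal(size=4)

two_layers = W2 @ (W1 @ x)     # stacked linear layers
one_layer = (W2 @ W1) @ x      # a single equivalent linear layer
print(np.allclose(two_layers, one_layer))  # True: no extra expressive power
```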
[Figure: a layered network with input units i (inputs x_i), hidden units k connected by weights w_ki, and output units j (outputs y_j) connected by weights w_jk. Forward step: propagate activation from the inputs to the outputs. Backward step: propagate errors from the output layer back to the hidden layer.]
The idea behind Backpropagation
• We don’t know what the hidden units ought to do, but
we can compute how fast the error changes as we
change a hidden activity.
– Instead of using desired activities to train the hidden units, use
error derivatives w.r.t. hidden activities.
– Each hidden activity can affect many output units and can therefore have many separate effects on the error. These effects must be combined (see the sketch after this list).
– We can compute error derivatives for all the hidden units
efficiently.
– Once we have the error derivatives for the hidden activities, it is easy to get the error derivatives for the weights going into a hidden unit.
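A small Python sketch of the "combining" step described above (the names and values are our own, purely illustrative): the error derivative with respect to one hidden activity is the sum of its effects through every output unit it feeds.

```python
# dE/d(hidden activity h_j) combines contributions from all output units:
# each output unit i contributes w_ji * delta_i, where delta_i is that
# output unit's error term.
def hidden_activity_error_derivative(w_j_to_outputs, output_deltas):
    return sum(w_ji * delta_i for w_ji, delta_i in zip(w_j_to_outputs, output_deltas))

# Example: one hidden unit feeding two output units
print(hidden_activity_error_derivative([0.4, -0.2], [0.05, 0.1]))
```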
Formalizing learning in MLP using Backpropagation
[Figure: a layered network with input units k (activations a_k), hidden units j, and output units i, connected by weights W_k,j (input → hidden) and W_j,i (hidden → output)]
• Output-layer weight update (eq. 1): W_j,i ← W_j,i + α × a_j × Δ_i, with Δ_i = (y_i − a_i) g′(in_i)
• Hidden-layer weight update (eq. 2): W_k,j ← W_k,j + α × a_k × Δ_j
• Equations 1 and 2 are similar in nature.
• Δ_j = g′(in_j) Σ_i W_j,i Δ_i is the error term at hidden unit j; g′ is the derivative of the sigmoid activation (whose output lies in the range 0–1).
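A hedged Python sketch of the two update rules above, for a single hidden layer of sigmoid units (the variable names and list-of-lists weight layout are our own assumptions):

```python
def sigmoid_derivative(activation_value):
    # For the sigmoid, g'(in) = g(in) * (1 - g(in)); we pass in g(in) directly.
    return activation_value * (1.0 - activation_value)

def backprop_updates(a_inputs, a_hidden, a_outputs, targets,
                     W_in_hidden, W_hidden_out, alpha):
    # Output-layer error terms: Delta_i = (y_i - a_i) * g'(in_i)
    delta_out = [(y - a) * sigmoid_derivative(a) for y, a in zip(targets, a_outputs)]

    # Hidden-layer error terms: Delta_j = g'(in_j) * sum_i W_{j,i} * Delta_i
    delta_hidden = [
        sigmoid_derivative(a_hidden[j]) * sum(W_hidden_out[j][i] * delta_out[i]
                                              for i in range(len(delta_out)))
        for j in range(len(a_hidden))
    ]

    # eq. 1: W_{j,i} <- W_{j,i} + alpha * a_j * Delta_i
    for j in range(len(a_hidden)):
        for i in range(len(delta_out)):
            W_hidden_out[j][i] += alpha * a_hidden[j] * delta_out[i]

    # eq. 2: W_{k,j} <- W_{k,j} + alpha * a_k * Delta_j
    for k in range(len(a_inputs)):
        for j in range(len(delta_hidden)):
            W_in_hidden[k][j] += alpha * a_inputs[k] * delta_hidden[j]

    return W_in_hidden, W_hidden_out
```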
Error Computation (chain rule)
With squared error E = ½ Σ_i (y_i − a_i)², the gradient with respect to an input-to-hidden weight W_k,j is:

∂E/∂W_k,j = −Σ_i (y_i − a_i) ∂a_i/∂W_k,j
          = −Σ_i (y_i − a_i) ∂g(in_i)/∂W_k,j
          = −Σ_i (y_i − a_i) g′(in_i) ∂in_i/∂W_k,j
          = −Σ_i Δ_i ∂in_i/∂W_k,j
          = −Σ_i Δ_i ∂/∂W_k,j (Σ_j W_j,i a_j)
          = −Σ_i Δ_i W_j,i ∂a_j/∂W_k,j
          = −Σ_i Δ_i W_j,i g′(in_j) ∂in_j/∂W_k,j
          = −Σ_i Δ_i W_j,i g′(in_j) ∂/∂W_k,j (Σ_k W_k,j a_k)
          = −Σ_i Δ_i W_j,i g′(in_j) a_k
          = −a_k Δ_j
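To sanity-check the result ∂E/∂W_k,j = −a_k Δ_j, here is a small finite-difference check in Python for a 2-2-1 sigmoid network (all names, weights, and the target value below are made up for illustration; they are not the lecture's example):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, W_in_hidden, W_hidden_out):
    hidden = [sigmoid(sum(W_in_hidden[k][j] * x[k] for k in range(len(x))))
              for j in range(len(W_in_hidden[0]))]
    output = sigmoid(sum(W_hidden_out[j] * hidden[j] for j in range(len(hidden))))
    return hidden, output

def squared_error(x, y_target, W_in_hidden, W_hidden_out):
    _, output = forward(x, W_in_hidden, W_hidden_out)
    return 0.5 * (y_target - output) ** 2

# Illustrative values only
x, y_target = [0.3, 0.8], 1.0
W_in_hidden = [[0.2, -0.4], [0.5, 0.1]]   # W[k][j]: input k -> hidden j
W_hidden_out = [0.7, -0.3]                # W[j]: hidden j -> single output

# Analytic gradient for W_{k,j} with k = 0, j = 1
hidden, output = forward(x, W_in_hidden, W_hidden_out)
delta_out = (y_target - output) * output * (1 - output)
delta_j = hidden[1] * (1 - hidden[1]) * W_hidden_out[1] * delta_out
analytic = -x[0] * delta_j                # dE/dW_{k,j} = -a_k * Delta_j

# Finite-difference gradient for the same weight
eps = 1e-6
W_in_hidden[0][1] += eps
e_plus = squared_error(x, y_target, W_in_hidden, W_hidden_out)
W_in_hidden[0][1] -= 2 * eps
e_minus = squared_error(x, y_target, W_in_hidden, W_hidden_out)
W_in_hidden[0][1] += eps
numeric = (e_plus - e_minus) / (2 * eps)

print(analytic, numeric)  # the two values should agree closely
```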
Example Network
• Bias activation is 1, learning rate α = 0.05
• Activation function: y = σ(a) = 1 / (1 + e^(−a))
[Figure: a 2-2-1 network with inputs X1 = 0, X2 = 1, hidden units Z1, Z2, and output unit O1]
• Weights: X1→Z1 = 0.6, X2→Z1 = −0.1, X1→Z2 = −0.3, X2→Z2 = 0.4, Z1→O1 = 0.4, Z2→O1 = 0.1
• Biases: B1 = 0.3 (into Z1), B2 = 0.5 (into Z2), B3 = −0.2 (into O1)
Steps to solve the problem
• Feed-Forward Phase
– Calculate the net input at Z1 and Z2
– Calculate the net input at O1
– Compute the error at O1
• Back-Prop Phase
– Update the weights between the hidden and output layer
– Compute the error at Z1 and Z2 w.r.t. the input layer
– Update the weights between the input and hidden layer
– Compute the final weights of the network
Feed-Forward Computation
• Net input at Z1
Z1 = 0 * 0.6 + 1 * -0.1 + 1 * 0.3 = 0.2
az1 = f ( 0.2 ) =0.5498
• Net input at Z2
Z2= -0.3 * 0 + 0.4 * 1 + 1 * 0.5 = 0.9
az2 = f(0.9) = 0.7109
• Net input at O1
– O1 = 0.5498 * 0.4 + 0.7109 * 0.1 + 1 * (-0.2) = 0.091
– ao1 = f(0.091) = 0.5227
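The feed-forward phase above can be reproduced with a short Python sketch (the function and variable names are our own; the weights, biases, and inputs are the ones listed for this example):

```python
import math

def f(a):
    # Sigmoid activation used throughout the example
    return 1.0 / (1.0 + math.exp(-a))

# Inputs and parameters of the example network
x1, x2 = 0, 1
w_x1_z1, w_x2_z1, b1 = 0.6, -0.1, 0.3
w_x1_z2, w_x2_z2, b2 = -0.3, 0.4, 0.5
w_z1_o1, w_z2_o1, b3 = 0.4, 0.1, -0.2

# Net inputs and activations at the hidden units
z1_in = x1 * w_x1_z1 + x2 * w_x2_z1 + 1 * b1   # 0.2
z2_in = x1 * w_x1_z2 + x2 * w_x2_z2 + 1 * b2   # 0.9
a_z1, a_z2 = f(z1_in), f(z2_in)                # ~0.5498, ~0.7109

# Net input and activation at the output unit
o1_in = a_z1 * w_z1_o1 + a_z2 * w_z2_o1 + 1 * b3   # ~0.091
a_o1 = f(o1_in)                                     # ~0.5227

print(round(a_z1, 4), round(a_z2, 4), round(o1_in, 4), round(a_o1, 4))
```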