
MLP and its Learning Algorithm: Backpropagation
Multilayer Perceptron

1. One or more hidden layers of computation nodes
2. Learning by the backpropagation method
3. Input propagates in the forward direction on a layer-by-layer basis
   – also called a Multilayer Feedforward Network (MLP)
MLP Distinctive Characteristics
• Non-linear activation function
  – differentiable
  – sigmoidal (logistic) function: y_j = 1 / (1 + exp(-v_j))
• One or more layers of hidden neurons
  – progressively extract more meaningful features from the input patterns
• High degree of connectivity
• The nonlinearity and high degree of connectivity make theoretical analysis difficult
• The learning process is hard to visualize
• BP is a landmark in neural networks: a computationally efficient training method
Preliminaries
• Function signal
  – input signals come in at the input end of the network
  – propagate forward to the output nodes
• Error signal
  – originates at the output neurons
  – propagates backward to the input nodes
• Two computations in training
  – computation of the function signal
  – computation of an estimate of the gradient vector
    • the gradient of the error surface with respect to the weights
Multi-Layer Networks

[Figure: a multilayer network with an input layer, a hidden layer, and an output layer]
Non-Linear Model: Mathematical Representation of the Sigmoid Activation Function

[Figure: inputs x1, ..., xn with weights w1, ..., wn feeding a summing junction and a sigmoid output y]

  activation:  a = Σ_i w_i x_i   (i = 1, ..., n)
  output:      y = σ(a) = 1 / (1 + e^(-a))
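A minimal sketch of this single sigmoid unit in Python (the input and weight values are arbitrary illustrations, not taken from any slide):

import math

def sigmoid_neuron(x, w):
    """Single sigmoid unit: a = sum_i w_i * x_i, y = 1 / (1 + e^(-a))."""
    a = sum(wi * xi for wi, xi in zip(w, x))  # weighted sum (activation a)
    return 1.0 / (1.0 + math.exp(-a))         # logistic output y

# Example with two inputs and illustrative weights
print(sigmoid_neuron([0.0, 1.0], [0.6, -0.1]))  # ~0.475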
Learning with hidden units
• Networks without hidden units are very limited in the input-output mappings they can model.
  – More layers of linear units do not help; the result is still linear (see the sketch after this list).
  – Fixed output non-linearities are not enough.
• We need multiple layers of adaptive non-linear hidden units. This gives us a universal approximator. But how can we train such nets?
  – We need an efficient way of adapting all the weights, not just the last layer. This is hard. Learning the weights going into hidden units is equivalent to learning features.
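The sketch below illustrates the point about linear units: two stacked linear layers (with made-up 2x2 weight matrices) compose into a single equivalent linear layer, so depth adds nothing without non-linearity.

import numpy as np

# Two "layers" of linear units with arbitrary illustrative weights
W1 = np.array([[0.6, -0.1],
               [-0.3, 0.4]])
W2 = np.array([[0.4, 0.1],
               [0.2, -0.5]])
x = np.array([1.0, 2.0])

two_layers = W2 @ (W1 @ x)     # apply layer 1, then layer 2
one_layer = (W2 @ W1) @ x      # one layer whose weights are the matrix product

print(np.allclose(two_layers, one_layer))  # True: the mapping is still linear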
Learning by disturbing weights
• Randomly disturb one weight and see if it improves performance. If so, save the change (a sketch follows this slide).
  – Very inefficient. We need to do multiple forward passes on a representative set of training data just to change one weight.
  – Towards the end of learning, large weight perturbations will nearly always make things worse.
• We could randomly perturb all the weights in parallel and correlate the performance gain with the weight changes.
  – Not better, because we need lots of trials to "see" the effect of changing one weight through the noise created by all the others.

[Figure: input units → hidden units → output units. Learning the hidden-to-output weights is easy; learning the input-to-hidden weights is hard.]
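A minimal sketch of the single-weight perturbation idea (the loss function and data set are left as parameters; the point is only the trial-and-keep loop and why it is so expensive):

import random

def perturb_and_keep(weights, loss_fn, data, step=0.01, trials=100):
    """Disturb one randomly chosen weight at a time; keep the change only if
    the loss over a representative data set improves. Every trial needs a
    full pass over the data just to test one weight."""
    best = loss_fn(weights, data)
    for _ in range(trials):
        i = random.randrange(len(weights))    # pick one weight at random
        delta = random.choice([-step, step])  # small random disturbance
        weights[i] += delta
        new = loss_fn(weights, data)
        if new < best:
            best = new                        # improvement: save the change
        else:
            weights[i] -= delta               # otherwise undo it
    return weights, best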
MLP Learning Algorithm: Backpropagation

[Figure: inputs x_i feed hidden units x_k through weights w_ki; hidden units feed outputs y_j through weights w_jk.
 Forward step: propagate activation from the input layer to the output layer.
 Backward step: propagate errors from the output layer back to the hidden layer.]
The idea behind Backpropagation
• We don’t know what the hidden units ought to do, but
we can compute how fast the error changes as we
change a hidden activity.
– Instead of using desired activities to train the hidden units, use
error derivatives w.r.t. hidden activities.
– Each hidden activity can affect many output units and can
therefore have many separate effects on the error. These
effects must be combined.
– We can compute error derivatives for all the hidden units
efficiently.
– Once we have the error derivatives for the hidden activities, it's easy to get the error derivatives for the weights going into a hidden unit.
Formalizing learning in MLP using Backpropagation

[Figure: output unit i (where the error occurs) receives input from hidden unit j through weight W_j,i; unit j receives input from unit k through weight W_k,j. A fraction of the output error is returned back to hidden unit j.]

We distribute the error at each output unit to the hidden units feeding it, i.e. we backpropagate the error to the hidden units. Each hidden unit is thus assigned its share of the blame for the error, and the weight-update rule is then applied.
Learning in MLP has two phases:

1. Feedforward pass: computes the 'function signal', the feedforward propagation of the input pattern signals through the network.

2. Backward pass: computes the 'error signal' and propagates the error backwards through the network, starting at the output units (where the error is the difference between the actual and desired output values).
Feed Forward Phase

[Figure: unit k feeds hidden unit j through weight W_k,j; unit j feeds output unit i through weight W_j,i.]

• Compute values for the hidden units:  a_j = g(in_j),  where in_j = Σ_k W_k,j a_k
• Compute values for the output units:  a_i = g(in_i),  where in_i = Σ_j W_j,i a_j

Using these, the activations at all units are calculated.
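A minimal sketch of the feed-forward phase for one hidden layer, following the notation above (g is the sigmoid; the weight matrices, biases, and input are assumed arguments):

import numpy as np

def g(v):
    """Sigmoid activation g(v) = 1 / (1 + e^(-v))."""
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, W_kj, b_j, W_ji, b_i):
    """in_j = sum_k W_k,j a_k; a_j = g(in_j); in_i = sum_j W_j,i a_j; a_i = g(in_i)."""
    in_j = W_kj @ x + b_j      # net input to the hidden units
    a_j = g(in_j)              # hidden activations
    in_i = W_ji @ a_j + b_i    # net input to the output units
    a_i = g(in_i)              # output activations
    return a_j, a_i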


SIGMOID ACTIVATION FUNCTION

[Figure: inputs x0 = 1, x1, ..., xn with weights w0, w1, ..., wn feeding a summing junction and a sigmoid output o]

  net = Σ_i w_i x_i   (i = 0, ..., n)
  o = σ(net) = 1 / (1 + e^(-net))

f(x) is the sigmoid function 1 / (1 + e^(-x)), with range (0, 1).

Derivative of the sigmoid:
  df(x)/dx = f(x) (1 - f(x))
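The derivative identity can be verified numerically; a small sketch:

import math

def f(x):
    """Sigmoid f(x) = 1 / (1 + e^(-x)), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def f_prime(x):
    """Analytic derivative: f'(x) = f(x) * (1 - f(x))."""
    return f(x) * (1.0 - f(x))

# Compare against a finite-difference approximation at an arbitrary point
x, h = 0.3, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)
print(abs(numeric - f_prime(x)) < 1e-8)  # True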


Backpropagation Phase

[Figure: unit k feeds hidden unit j through weight W_k,j; unit j feeds output unit i through weight W_j,i.]

1. Updating rule for W_j,i (hidden → output):
     W_j,i ← W_j,i + η × a_j × Δ_i                 (eq. 1)
   where Δ_i = Err_i × g'(in_i)   (by the delta rule)

2. Updating rule for W_k,j (input → hidden):
     W_k,j ← W_k,j + η × a_k × Δ_j                 (eq. 2)
   where Δ_j = g'(in_j) Σ_i W_j,i Δ_i   is the error at j

Equations 1 and 2 are similar in nature.
Error Computation (chain rule)

  ∂E/∂W_k,j = -(y_i - a_i) ∂a_i/∂W_k,j
            = -(y_i - a_i) ∂g(in_i)/∂W_k,j
            = -(y_i - a_i) g'(in_i) ∂in_i/∂W_k,j
            = -Δ_i ∂in_i/∂W_k,j
            = -Δ_i ∂(Σ_j W_j,i a_j)/∂W_k,j
            = -Δ_i W_j,i ∂a_j/∂W_k,j
            = -Δ_i W_j,i g'(in_j) ∂in_j/∂W_k,j
            = -Δ_i W_j,i g'(in_j) ∂(Σ_k W_k,j a_k)/∂W_k,j
            = -Δ_i W_j,i g'(in_j) a_k
            = -a_k Δ_j

(summing this contribution over all output units i gives Δ_j = g'(in_j) Σ_i W_j,i Δ_i, as in eq. 2)

Change in weight W_k,j, as per equation 2:
  W_k,j ← W_k,j + η × a_k × Δ_j
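A minimal sketch of this backward pass for a single hidden layer with sigmoid g, so that g'(in) = a(1 - a) (the array shapes and names are illustrative assumptions):

import numpy as np

def backward(x, a_j, a_i, target, W_ji, eta):
    """Compute the deltas and the weight increments of eqs. 1 and 2,
    given the activations a_j, a_i from the forward pass."""
    err_i = target - a_i                              # Err_i = (y_i - a_i)
    delta_i = err_i * a_i * (1.0 - a_i)               # Δ_i = Err_i * g'(in_i)
    delta_j = (W_ji.T @ delta_i) * a_j * (1.0 - a_j)  # Δ_j = g'(in_j) Σ_i W_j,i Δ_i

    dW_ji = eta * np.outer(delta_i, a_j)  # eq. 1 increment: η × a_j × Δ_i
    dW_kj = eta * np.outer(delta_j, x)    # eq. 2 increment: η × a_k × Δ_j
    return delta_i, delta_j, dW_ji, dW_kj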
Back-propagation network (BPN)
Training algorithm
• Step 1: Initialize the network synaptic weights to small random values.
• Step 2: From the set of training input/output pairs, present an input pattern and calculate the network response.
• Step 3: The desired network response is compared with the actual output of the network, and all the local errors are computed.
• Step 4: Update the weights of the network.
• Step 5: Repeat steps 2 through 4 until the network reaches a predetermined level of accuracy in producing the adequate response for all the training patterns.
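A minimal sketch of steps 1-5 for one hidden layer (the network size, data set, learning rate, and stopping threshold below are illustrative assumptions, not prescribed by the slides):

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def train(X, T, n_hidden=2, eta=0.5, tol=0.01, max_epochs=20000, seed=0):
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    # Step 1: initialize weights and biases to small random values
    W_kj = rng.uniform(-0.5, 0.5, (n_hidden, n_in))
    b_j = rng.uniform(-0.5, 0.5, n_hidden)
    W_ji = rng.uniform(-0.5, 0.5, (n_out, n_hidden))
    b_i = rng.uniform(-0.5, 0.5, n_out)
    for _ in range(max_epochs):
        sq_err = 0.0
        for x, t in zip(X, T):
            # Step 2: present an input pattern and compute the network response
            a_j = sigmoid(W_kj @ x + b_j)
            a_i = sigmoid(W_ji @ a_j + b_i)
            # Step 3: compare desired and actual outputs, compute the local errors
            delta_i = (t - a_i) * a_i * (1 - a_i)
            delta_j = (W_ji.T @ delta_i) * a_j * (1 - a_j)
            sq_err += float(np.sum((t - a_i) ** 2))
            # Step 4: update the weights
            W_ji += eta * np.outer(delta_i, a_j)
            b_i += eta * delta_i
            W_kj += eta * np.outer(delta_j, x)
            b_j += eta * delta_j
        # Step 5: stop once the error over all training patterns is small enough
        if sq_err < tol:
            break
    return W_kj, b_j, W_ji, b_i

# Usage sketch: learn XOR (an illustrative data set)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
train(X, T)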
Question

Find the new weights when the network is presented with input {0, 1} and the target output is 1.
• Bias inputs are 1
• Learning rate is 0.05
• Activation: y = σ(a) = 1 / (1 + e^(-a))

[Figure: a 2-2-1 network with inputs X1 = 0, X2 = 1 and the following weights:
  X1 → Z1: 0.6,   X2 → Z1: -0.1,  bias B1 → Z1: 0.3
  X1 → Z2: -0.3,  X2 → Z2: 0.4,   bias B2 → Z2: 0.5
  Z1 → O1: 0.4,   Z2 → O1: 0.1,   bias B3 → O1: -0.2]
Steps to solve the problem
• Feed-Forward Phase
– Calculate the net input at Z1 and Z2
– Calculate the net input at O1
– Compute the error at O1
• Back-Prop Phase
– Compute the weight changes between the hidden and output layer
– Compute the errors at Z1 and Z2
– Compute the weight changes between the input and hidden layer
– Compute the final weights of the network
Feed-Forward Computation
• Net input at Z1
  in_Z1 = 0 × 0.6 + 1 × (-0.1) + 1 × 0.3 = 0.2
  a_Z1 = f(0.2) = 0.5498
• Net input at Z2
  in_Z2 = 0 × (-0.3) + 1 × 0.4 + 1 × 0.5 = 0.9
  a_Z2 = f(0.9) = 0.7109
• Net input at O1 (including the bias B3 = -0.2)
  in_O1 = 0.5498 × 0.4 + 0.7109 × 0.1 + 1 × (-0.2) = 0.091
  a_O1 = f(0.091) = 0.5227
• Error at O1:  δ_O1 = o_k (1 - o_k)(t_k - o_k)
  Derivative of the sigmoid at O1: a_O1 (1 - a_O1) = 0.5227 × (1 - 0.5227) = 0.2495
  δ_O1 = (1 - 0.5227) × 0.2495 = 0.1191
Back-propagation Computation
• Weight changes between the output and hidden layer (ΔW = η × δ_O1 × a):
  ΔW(Z1→O1) = 0.05 × 0.1191 × 0.5498 ≈ 0.0033
  ΔW(Z2→O1) = 0.05 × 0.1191 × 0.7109 ≈ 0.0042
  ΔB3 = 0.05 × 0.1191 × 1 ≈ 0.0060
• Error propagated back to Z1 and Z2 (the error at the output is δ_O1 = 0.1191):
  δ_in_Z1 = 0.4 × 0.1191 = 0.0476
  δ_in_Z2 = 0.1 × 0.1191 = 0.0119
• Portion at Z1 (sigmoid derivative): a_Z1 (1 - a_Z1) = 0.5498 × (1 - 0.5498) = 0.2475
• Portion at Z2 (sigmoid derivative): a_Z2 (1 - a_Z2) = 0.7109 × (1 - 0.7109) = 0.2055
• δ_Z1 = 0.0476 × 0.2475 ≈ 0.0118
• δ_Z2 = 0.0119 × 0.2055 ≈ 0.0024
Weight change
• Weight change ΔW = learning rate × error (δ) × input activation
• New weight = old weight + ΔW
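The whole worked example can be checked with a few lines of Python; a sketch using the slide's weights, input {0, 1}, target 1, and η = 0.05:

import math

def sig(a):
    return 1.0 / (1.0 + math.exp(-a))

x1, x2, target, eta = 0.0, 1.0, 1.0, 0.05

# Feed-forward phase
a_z1 = sig(x1 * 0.6 + x2 * (-0.1) + 1 * 0.3)      # 0.5498
a_z2 = sig(x1 * (-0.3) + x2 * 0.4 + 1 * 0.5)      # 0.7109
a_o1 = sig(a_z1 * 0.4 + a_z2 * 0.1 + 1 * (-0.2))  # 0.5227

# Deltas (local errors)
d_o1 = (target - a_o1) * a_o1 * (1 - a_o1)  # 0.1191
d_z1 = (0.4 * d_o1) * a_z1 * (1 - a_z1)     # 0.0118
d_z2 = (0.1 * d_o1) * a_z2 * (1 - a_z2)     # 0.0024

# Weight increments (new weight = old weight + increment)
dw_z1_o1 = eta * d_o1 * a_z1  # 0.0033
dw_z2_o1 = eta * d_o1 * a_z2  # 0.0042
db3 = eta * d_o1 * 1          # 0.0060
dw_x2_z1 = eta * d_z1 * x2    # x1 = 0, so only the X2 and bias weights change
dw_x2_z2 = eta * d_z2 * x2

print(round(a_o1, 4), round(d_o1, 4), round(dw_z1_o1, 4))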
