ANN 2 A
• Major difference: Error at output layer is clear, error in hidden layers is unclear
• So, at the output layer the weight update is identical to Gradient Descent
[Network diagram: inputs X1 and X2 feed hidden neurons N4 and N5 (weights w14, w24, w25, ...), which feed output neurons N6 and N7 (weights w46, w47, ...).]
Learning with Back-propagation
1. Calculate the errors of the output neurons
   E6 = out6(1 - out6)(Target6 - out6)
   E7 = out7(1 - out7)(Target7 - out7)
2. Change the output layer weights
   W36new = W36old + E6 × out3
   W37new = W37old + E7 × out3
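The two steps above can be sketched directly in code. The activations, targets, and current weights below are made-up example values, not numbers from the slides; only the formulas (delta = out(1 - out)(target - out), weight += delta × input, learning rate 1) come from the text.

```python
# Sketch of the output-layer update (learning rate = 1).
# All numeric values are illustrative assumptions.

def sigmoid_error(out, target):
    """Delta for a sigmoid output neuron: out * (1 - out) * (target - out)."""
    return out * (1 - out) * (target - out)

out3 = 0.6                    # activation of hidden neuron N3 (assumed)
out6, out7 = 0.8, 0.3         # output activations (assumed)
target6, target7 = 1.0, 0.0   # desired outputs (assumed)
w36, w37 = 0.4, -0.2          # current output-layer weights (assumed)

# Step 1: errors of the output neurons
E6 = sigmoid_error(out6, target6)
E7 = sigmoid_error(out7, target7)

# Step 2: change the output-layer weights (eta = 1)
w36_new = w36 + E6 * out3
w37_new = w37 + E7 * out3

print(E6, E7, w36_new, w37_new)
```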
Assume that the neurons have a Sigmoid Activation Function (learning rate = 1) and
(i) Perform a forward pass on the network
(ii) Perform a reverse pass (training) once (target = 0.5)
(iii) Perform a further forward pass and comment on the results
Assume that the neurons have a Sigmoid Activation Function (learning rate = 1) and
(i) Perform a forward pass on the network
(ii) Perform a reverse pass (training) once (target = 1)
(iii) Perform a further forward pass and comment on the results
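The three exercise steps (forward pass, one reverse pass, second forward pass) can be sketched on a small 2-2-1 sigmoid network. The inputs, initial weights, and target below are assumptions chosen for illustration; the point is that after one training pass the error on the second forward pass is smaller in magnitude.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Assumed toy network: inputs x1, x2 -> hidden N3, N4 -> output N5.
x1, x2 = 0.35, 0.9
w13, w23 = 0.1, 0.8
w14, w24 = 0.4, 0.6
w35, w45 = 0.3, 0.9
target = 0.5
eta = 1.0

def forward():
    out3 = sigmoid(x1 * w13 + x2 * w23)
    out4 = sigmoid(x1 * w14 + x2 * w24)
    out5 = sigmoid(out3 * w35 + out4 * w45)
    return out3, out4, out5

# (i) forward pass
out3, out4, out5 = forward()
err_before = target - out5

# (ii) reverse pass: output delta, then hidden deltas through the OLD weights
d5 = out5 * (1 - out5) * (target - out5)
d3 = out3 * (1 - out3) * d5 * w35
d4 = out4 * (1 - out4) * d5 * w45

w35 += eta * d5 * out3
w45 += eta * d5 * out4
w13 += eta * d3 * x1; w23 += eta * d3 * x2
w14 += eta * d4 * x1; w24 += eta * d4 * x2

# (iii) second forward pass: the error magnitude has decreased
_, _, out5_after = forward()
err_after = target - out5_after
print(err_before, err_after)
```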
Learning with Back-propagation
Targets
0 1
1 0
Problems with Back-propagation
This occurs because the algorithm always changes the weights in such a way as
to cause the error to fall. But the error might briefly have to rise as part of a
more general fall, as shown in the figure. If this is the case, the algorithm will "get
stuck" (because it cannot go uphill) and the error will not decrease further
Comparison of RBF and MLP (XOR)
Unsupervised Learning
The aim is to group the input data. The network explores the underlying structure of the data, or correlations between patterns in the data, and organizes patterns into categories based on these correlations. The network learns to categorize (cluster) the inputs. This is often referred to as self-organization or adaptation
• Example architectures : Kohonen, ART
• Example Algorithms: Hebbian Learning
• Idea: group typical input data according to resemblance criteria that are unknown a priori
• No teacher is needed; the network finds the correlations between the data by itself
• The system learns by itself, discovering and adapting to the structural features in the input patterns
Learning Algorithms
Idea: When neuron A repeatedly participates in firing neuron B, the strength of the action
of A onto B increases
Unsupervised Learning – Hebbian learning
Hebbian learning can be mathematically characterized as the correlation
weight adjustment described by the following equation:
wij(new) = wij(old) + η xi yj
where xi is the value of the ith FX processing element of the network; yj is the value of the jth FY processing element; and wij is the connection weight between them
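The correlation rule above can be sketched in a few lines. The layer sizes, learning rate, and activation patterns below are illustrative assumptions; the update itself is the stated rule, w_ij += eta * x_i * y_j, applied to all connections at once via an outer product.

```python
# Minimal sketch of the Hebbian correlation rule: w_ij(new) = w_ij(old) + eta * x_i * y_j.
# Sizes, eta, and the x/y patterns are assumptions for illustration.
import numpy as np

eta = 0.1
x = np.array([1.0, 0.0, 1.0])   # FX (input) processing element values
y = np.array([0.0, 1.0])        # FY (output) processing element values
W = np.zeros((3, 2))            # W[i, j]: connection from x_i to y_j

# Hebbian update: strengthen each w_ij in proportion to the correlation x_i * y_j.
# Only connections where BOTH x_i and y_j are active get strengthened.
W += eta * np.outer(x, y)

print(W)
```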
• Hybrid Learning
o Combines supervised and unsupervised learning
o Some of the weights are determined through supervised learning and the others are
obtained through unsupervised learning
• Reinforcement Learning
o Network is only provided with a grade, or score, which indicates network performance
o A teacher is present but does not provide the expected or desired output, only an
indication of whether the computed output is correct or incorrect
o The information provided helps the network in its learning process
o A reward is given for a correct answer and a penalty for a wrong answer
Learning Algorithms
• In supervised learning an input data set and a full set of desired outputs is presented
• In reinforcement learning the feedback is less elaborate. The desired output is not described
explicitly; the learning network only gets feedback on whether its output was a success or not
• Learning with a critic (rather than learning with a teacher)
• Main objective is to maximize the (expected) reward or reinforcement signal
• Neural network reinforcement learning usually requires a multi-layer architecture
• An external evaluator is needed to decide whether network has scored a success or not
• Every node in the network receives a scalar reinforcement signal r
• r represents the quality of output
• r is between 0 and 1, 0 meaning maximum error, 1 meaning optimal
• Compared to back-propagation (where output nodes receive an error signal, which is
propagated backward), here every node receives the same signal
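The bullets above can be sketched as a tiny loop in which the only feedback is a single scalar r in [0, 1] from an external evaluator (0 = maximum error, 1 = optimal). The perturb-and-keep scheme, the evaluator, and the target weights below are all assumptions for illustration, not a specific algorithm from the text; they just show learning from a scalar score rather than per-node errors.

```python
# Learning with a critic: the learner never sees the desired output,
# only a scalar reinforcement r for each attempt.
import random

random.seed(0)
w = [0.0, 0.0]   # weights of a single linear unit (toy example)

def reward(weights):
    # Hypothetical external evaluator (assumed): the closer the weights are
    # to the target [1.0, -1.0] (unknown to the learner), the closer r is to 1.
    err = abs(weights[0] - 1.0) + abs(weights[1] + 1.0)
    return max(0.0, 1.0 - err / 2.0)

r_best = reward(w)
for _ in range(200):
    trial = [wi + random.gauss(0, 0.1) for wi in w]   # random perturbation
    r = reward(trial)        # the same scalar r is ALL the feedback there is
    if r > r_best:           # keep the change only if the score improved
        w, r_best = trial, r

print(w, r_best)
```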
Practical Issues
• What kind of architecture is best for modeling the underlying problem?
• Which learning algorithm can achieve the best generalization?
• What size network gives the best generalization?
• How many training instances are required for good generalization?
• Generalization vs. Memorization
o How to choose the network size (free parameters)
o How many training examples
o When to stop training
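One standard answer to "when to stop training" is early stopping: hold out a validation set, track its error each epoch, and stop once that error has not improved for a few epochs. The validation-error sequence and patience value below are made up to show the typical fall-then-rise (overfitting) shape.

```python
# Early stopping sketch: stop when validation error has not improved
# for `patience` consecutive epochs. The error values are assumed.

val_errors = [0.90, 0.60, 0.45, 0.38, 0.35, 0.34, 0.36, 0.39, 0.43]
patience = 2

best_err, best_epoch, waited = float("inf"), -1, 0
for epoch, err in enumerate(val_errors):
    if err < best_err:
        best_err, best_epoch, waited = err, epoch, 0   # new best: reset patience
    else:
        waited += 1
        if waited >= patience:   # validation error rising: stop, keep best weights
            break

print(best_epoch, best_err)
```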
Summary
• ANNs are inspired by the learning processes that take place in biological systems
• Artificial neural networks try to imitate the working mechanisms of their biological
counterparts
• Learning can be perceived as an optimisation process.
• Biological neural learning happens by the modification of the synaptic strength. Artificial
neural networks learn in the same way.
• Learning tasks of artificial neural networks can be reformulated as function
approximation tasks.
• The synapse strength modification rules for artificial neural networks can be derived by
applying mathematical optimisation methods.
• The optimisation is done with respect to the approximation error measure.
• In general it is enough to have a single hidden layer neural network (MLP, RBF or other)
to learn the approximation of a nonlinear function. In such cases general optimisation can
be applied to find the change rules for the synaptic weights.
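The last point can be sketched end to end: a single-hidden-layer network trained by gradient descent to approximate a nonlinear function. The choice of sin as the target, the tanh hidden units, and all sizes and hyperparameters below are illustrative assumptions; the point is only that the approximation error falls as the weight-change rules derived from the error measure are applied.

```python
# A single-hidden-layer network learning a nonlinear function (here sin).
# Architecture, learning rate, and epoch count are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-np.pi, np.pi, 40).reshape(-1, 1)
T = np.sin(X)                                 # target function values

H = 10                                        # hidden layer size (assumed)
W1 = rng.normal(0, 0.5, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, 1)); b2 = np.zeros(1)
eta = 0.05

def forward(X):
    hidden = np.tanh(X @ W1 + b1)
    return hidden, hidden @ W2 + b2

_, Y0 = forward(X)
mse_before = np.mean((Y0 - T) ** 2)

for _ in range(5000):
    hidden, Y = forward(X)
    dY = 2 * (Y - T) / len(X)                 # gradient of the MSE w.r.t. output
    dW2 = hidden.T @ dY
    dhidden = (dY @ W2.T) * (1 - hidden ** 2) # back-propagate through tanh
    dW1 = X.T @ dhidden
    W2 -= eta * dW2; b2 -= eta * dY.sum(0)
    W1 -= eta * dW1; b1 -= eta * dhidden.sum(0)

_, Y1 = forward(X)
mse_after = np.mean((Y1 - T) ** 2)
print(mse_before, mse_after)   # approximation error falls during training
```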
Applications of ANN
• Prediction/Forecasting
• Optimization
• Pattern Classification
• Content-Addressable Memory
• Clustering
• Function Approximation
• Control