INDIAN INSTITUTE OF TECHNOLOGY ROORKEE
Artificial Neural Networks
ANN
Neural Networks
1. Neural Networks (NNs) are networks of neurons, for example, as found in
real (i.e. biological) brains.
2. Artificial Neurons are crude approximations of the neurons found in
brains. They may be physical devices, or purely mathematical constructs.
3. Artificial Neural Networks (ANNs) are networks of Artificial Neurons,
and hence constitute crude approximations to parts of real brains. They
may be physical devices, or simulated on conventional computers.
4. From a practical point of view, an ANN is just a parallel computational
system consisting of many simple processing elements connected together
in a specific way in order to perform a particular task.
5. One should never lose sight of how crude the approximations are, and
how over-simplified our ANNs are compared to real brains.
Brains versus Computers
1. There are approximately 10 billion neurons in the human cortex, compared with
tens of thousands of processors in the most powerful parallel computers.
2. Each biological neuron is connected to several thousands of other neurons,
similar to the connectivity in powerful parallel computers.
3. Lack of processing units can be compensated by speed. The typical operating
speed of biological neurons is measured in milliseconds (10⁻³ s), while a silicon
chip can operate in nanoseconds (10⁻⁹ s).
4. The human brain is extremely energy efficient, using approximately 10⁻¹⁶ joules
per operation per second, whereas the best computers today use around 10⁻⁶
joules per operation per second.
5. Brains have been evolving for tens of millions of years; computers have been
evolving for tens of decades.
6. The brain is capable of adaptation by changing its connectivity, but computers
are hard to make adaptive.
The brain uses massively parallel computation:
– ≈10¹¹ neurons in the brain
– ≈10⁴ connections per neuron
Brains versus Computers
Biological neuron:
• receives input signals generated by other neurons through its dendrites,
• integrates these signals in its body,
• then generates its own signal (a series of electric pulses) that travels along the axon, which in turn makes contact with the dendrites of other neurons.
• The points of contact between neurons are called synapses.
• Incoming impulses can be excitatory if they cause firing, or inhibitory if they hinder the firing of the response.
• After carrying a pulse, an axon is in a state of non-excitability for a certain time, called the refractory period.
Brains versus Computers
Biological neuron
Brains versus Computers
Brain Computation
Biological Neural Networks (BNN)
Artificial Neural Networks (ANN)
A neural network consists of four main parts:
1. Processing units
2. Weighted interconnections between the various processing units.
3. An activation rule which acts on the set of input signals at a unit to
produce a new output signal, or activation.
4. Optionally, a learning rule that specifies how to adjust the weights
for a given input/output pair.
Definitions
Haykin:
A neural network is a massively parallel distributed processor that has a natural
propensity for storing experiential knowledge and making it available for use. It
resembles the brain in two respects:
– Knowledge is acquired by the network through a learning process.
– Interneuron connection strengths known as synaptic weights are used to store the
knowledge.
Zurada:
Artificial neural systems, or neural networks, are physical cellular systems
which can acquire, store, and utilize experiential knowledge.
Definitions
Mohamad H. Hassoun:
Neural networks are neural in the sense that they may have been inspired by
neuroscience, but not necessarily because they are faithful models of biological
neural or cognitive phenomena.
J.A. Anderson:
It is not absolutely necessary to believe that neural network models have
anything to do with the nervous system, but it helps. Because, we are able to use
a large body of ideas and facts from …
Importance of ANN
• They are extremely powerful computational devices
• Massive parallelism makes them very efficient.
• They can learn and generalize from training data – so there is no
need for enormous feats of programming.
• They are particularly fault tolerant – this is equivalent to the
graceful degradation* found in biological systems ('you could
shoot every tenth neuron in the brain and not even notice it').
• They are very noise tolerant – so they can cope with situations
where normal systems would have difficulty.
* The property that enables a system to continue operating properly in the event
of the failure of some of its components.
Artificial Neural Net
The figure shows a simple artificial neural net with two input
neurons (X1, X2) and one output neuron (Y). The interconnecting
weights are given by W1 and W2.
Artificial Neural Net
Mathematical Model of Artificial Neuron
Artificial Neural Net
The neuron is the basic information processing unit of a NN. It
consists of:
1. A set of links, describing the neuron inputs, with weights W1, W2,
…, Wm.
2. An adder function (linear combiner) for computing the weighted
sum of the (real-valued) inputs:
u = ∑_{j=1}^{m} Wj Xj
3. An activation function φ for limiting the amplitude of the neuron output:
y = φ(u + b), where b is a bias term (a code sketch of this unit follows below).
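A minimal sketch of such a unit in Python, assuming a binary sigmoid activation (the activation choice and the example numbers are illustrative, not taken from the slides):

import math

def neuron_output(inputs, weights, bias):
    """Single artificial neuron: linear combiner plus bias, then a sigmoid activation."""
    u = sum(w * x for w, x in zip(weights, inputs))   # adder: weighted sum of the inputs
    return 1.0 / (1.0 + math.exp(-(u + bias)))        # activation limits the output amplitude

# Example with two inputs X1, X2 and weights W1, W2 (arbitrary values)
print(neuron_output([1.0, 0.5], [0.4, -0.7], bias=0.1))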
Artificial Neural Net
The bias value is added to the weighted sum ∑wixi so that the
decision boundary can be shifted away from the origin:
Yin = ∑wixi + b, where b is the bias.
(Figure: parallel decision lines x1 − x2 = −1, x1 − x2 = 0 and x1 − x2 = 1 in the (x1, x2) plane, illustrating how the bias shifts the boundary.)
Operation of a Neural Network
(Figure: input vector x = (x0, x1, …, xn) is combined with weight vector w = (w0j, w1j, …, wnj) to form a weighted sum, which is passed through activation function f to produce the output y.)
McCulloch-Pitts neuron model (1943)
Activation function: hard threshold (step function).
McCulloch-Pitts neuron model (1943)
Networks of McCulloch-Pitts Neurons
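A McCulloch-Pitts neuron produces a binary output by comparing the weighted sum of its binary inputs with a fixed threshold; excitatory inputs carry positive weights and inhibitory inputs negative ones. A small sketch in Python, with weights and thresholds chosen here for illustration (simple logic gates, not a circuit from the slides):

def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: fixed weights, hard threshold, binary output."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

# Logic gates built from single M-P neurons (binary inputs)
AND = lambda x1, x2: mp_neuron([x1, x2], [1, 1], threshold=2)
OR  = lambda x1, x2: mp_neuron([x1, x2], [1, 1], threshold=1)
NOT = lambda x:      mp_neuron([x],      [-1],   threshold=0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b))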
Building Blocks of Artificial Neural Net
• Network Architecture (Connection between Neurons)
• Setting the Weights (Training)
• Activation Function
Network Architecture
Input Layer: Each input unit corresponds to an attribute value of the
instance being presented.
Hidden Layer: Not directly observable; provides the nonlinearities
for the network.
Output Layer: Encodes the possible output values.
Training Process
Supervised Training - The network is presented with a series of
sample inputs and its outputs are compared with the expected
responses.
Unsupervised Training - No target outputs are given; similar input
vectors are assigned to the same output unit.
Reinforcement Training - The right answer is not provided, but an
indication of whether the output is 'right' or 'wrong' is given.
Activation Function
ACTIVATION LEVEL – DISCRETE OR CONTINUOUS
HARD LIMIT FUNCTION (DISCRETE)
• Binary Activation function
• Bipolar activation function
• Identity function
SIGMOIDAL ACTIVATION FUNCTION (CONTINUOUS)
• Binary Sigmoidal activation function
• Bipolar Sigmoidal activation function
Activation Function
Activation functions:
(A) Identity
(B) Binary step
(C) Bipolar step
(D) Binary sigmoidal
(E) Bipolar sigmoidal
(F) Ramp
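The activation functions listed above can be written in a few lines; the exact parameterizations below (steepness lam, the unit-width ramp) are common textbook conventions and are assumptions rather than the slide's own definitions:

import math

def identity(x):                return x
def binary_step(x, theta=0.0):  return 1 if x >= theta else 0          # output in {0, 1}
def bipolar_step(x, theta=0.0): return 1 if x >= theta else -1         # output in {-1, 1}
def binary_sigmoid(x, lam=1.0): return 1.0 / (1.0 + math.exp(-lam * x))        # output in (0, 1)
def bipolar_sigmoid(x, lam=1.0):                                               # output in (-1, 1)
    return (1.0 - math.exp(-lam * x)) / (1.0 + math.exp(-lam * x))
def ramp(x):                    return 0.0 if x < 0 else (x if x <= 1 else 1.0)

print([f(0.5) for f in (identity, binary_step, bipolar_step,
                        binary_sigmoid, bipolar_sigmoid, ramp)])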
Decision Boundaries/Linear Separability
Linear separability is the concept wherein the separation of the input space into
regions is based on whether the network response is positive or negative.
The decision boundary is the surface at which the output of the unit is precisely
equal to the threshold.
(Figure: for a single neuron with inputs x1 and x2, the decision boundary w1·x1 + w2·x2 = θ is a straight line in the (x1, x2) plane with slope −w1/w2; the example uses w1 = 1, w2 = 2, θ = 2.)
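As a quick numerical check of the boundary described above, using the figure's values w1 = 1, w2 = 2, θ = 2 (the test points are arbitrary):

w1, w2, theta = 1.0, 2.0, 2.0   # values from the figure

def response(x1, x2):
    """Positive region if the net input reaches the threshold, negative otherwise."""
    return 1 if w1 * x1 + w2 * x2 >= theta else -1

# Points on either side of the line x2 = -(w1/w2)*x1 + theta/w2
print(response(0.0, 0.0))   # -1 (below the boundary)
print(response(2.0, 1.0))   #  1 (above the boundary)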
Learning and Generalization
Learning The network must learn decision surfaces from a set of
training patterns so that these training patterns are classified
correctly.
Generalization After training, the network must also be able to
generalize, i.e. correctly classify test patterns it has never seen
before.
Usually we want our neural networks to learn well, and also to
generalize well.
Perceptron
An arrangement of one input layer of neurons feeding forward to
one output layer of neurons is known as a Perceptron.
Perceptron Network
The Perceptron network consists of three units, namely the sensory unit
(input unit), the associative unit (hidden unit), and the response unit
(output unit).
Perceptron Network
Epoch: Presentation of the entire training set to the neural
network. In the case of the AND function, an epoch consists of
four sets of inputs being presented to the network (i.e. [0,0], [0,1],
[1,0], [1,1]).
Error: The error value is the amount by which the value output by
the network differs from the target value. For example, if we
required the network to output 0 and it outputs 1, then Error = -1.
Target Value, T: When we are training a network we not only
present it with the input but also with a value that we require the
network to produce. For example, if we present the network with
[1,1] for the AND function, the training value will be 1.
Perceptron Network
Output, O: The output value from the neuron.
Ij: Inputs being presented to the neuron.
Wj: Weight from input neuron (Ij) to the output neuron.
LR: The learning rate. This dictates how quickly the network
converges. It is set by experimentation and is typically 0.1.
Perceptron Learning
wi ← wi + Δwi, where Δwi = η (t − o) xi
where
t = c(x) is the target value, o is the perceptron output, and η is a small
constant (e.g., 0.1) called the learning rate.
If the output is correct (t = o), the weights wi are not changed.
If the output is incorrect (t ≠ o), the weights wi are changed such that the
output of the perceptron for the new weights is closer to t.
The algorithm (illustrated in the code sketch below) converges to the correct classification
• if the training data is linearly separable, and
• η is sufficiently small.
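A minimal sketch of this learning rule in Python, training a single perceptron on the AND function (initial weights, learning rate eta, and epoch count are arbitrary choices):

def step(x):
    return 1 if x >= 0 else 0

# AND training set; each input includes a constant 1 so the first weight acts as a bias
data = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
w = [0.0, 0.0, 0.0]
eta = 0.1                     # learning rate

for epoch in range(20):       # AND is linearly separable, so a few epochs suffice
    for x, t in data:
        o = step(sum(wi * xi for wi, xi in zip(w, x)))          # perceptron output
        w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]   # weight update

print(w, [step(sum(wi * xi for wi, xi in zip(w, x))) for x, _ in data])  # expect [0, 0, 0, 1]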
Perceptron Architecture
Learning Rules
Multiple Neuron Perceptron
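For a layer of several perceptrons the learning rule is commonly written in matrix form; the version below is a standard textbook form, given as an assumption about what the omitted figure equations contain (e is the error vector, p the input vector, t the target, a the output):
\[
\mathbf{W}^{\text{new}} = \mathbf{W}^{\text{old}} + \mathbf{e}\,\mathbf{p}^{T}, \qquad
\mathbf{b}^{\text{new}} = \mathbf{b}^{\text{old}} + \mathbf{e}, \qquad
\mathbf{e} = \mathbf{t} - \mathbf{a}.
\]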
Learning Rules
Consider the four-class decision problem: train a perceptron
network to solve it using the perceptron learning rule.
Supervised Hebbian Learning
Linear Associator
Hebb's learning law can be used in combination with a variety of
neural network architectures. The network we will use here is the
linear associator.
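The linear associator computes a = W p, and supervised Hebbian learning builds W from the training pairs {p_q, t_q}. The standard equations are sketched below (assumed textbook forms; the slide's own notation may differ):
\[
\mathbf{a} = \mathbf{W}\mathbf{p}, \qquad
w_{ij}^{\text{new}} = w_{ij}^{\text{old}} + t_{qi}\,p_{qj}
\quad\Longrightarrow\quad
\mathbf{W} = \sum_{q=1}^{Q} \mathbf{t}_q \mathbf{p}_q^{T} = \mathbf{T}\mathbf{P}^{T}.
\]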
Supervised Hebbian Learning
Supervised Hebbian Learning
Steepest Descent Method
Trajectory with α = 0.01; Trajectory with α = 0.035
Steepest Descent Method
Trajectory with α = 0.039; Trajectory with α = 0.041
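The trajectories above illustrate how the learning rate α determines whether steepest descent converges slowly, quickly, or not at all. A sketch on an illustrative quadratic (this function and these rates are placeholders, not the ones behind the plotted trajectories):

import numpy as np

# Illustrative quadratic F(x) = 0.5 * x^T A x, whose gradient is A x
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

def steepest_descent(alpha, x0=(0.8, -0.25), steps=50):
    x = np.array(x0)
    for _ in range(steps):
        g = A @ x             # gradient at the current point
        x = x - alpha * g     # steepest descent step
    return x

# Stable here only for alpha < 2 / lambda_max = 2/3; the last rate overshoots and diverges
for alpha in (0.01, 0.2, 0.7):
    print(alpha, steepest_descent(alpha))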
Steepest Descent Method
Steepest Descent with Minimization Along a Line; Trajectory for Newton's Method
Conjugate Gradient
1. Select the first search direction to be the negative of the gradient.
2. Select the learning rate α_k to minimize the function along the search direction.
3. Select the next search direction according to the conjugate-direction update (see below).
4. If the algorithm has not converged, return to step 2.
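The search-direction update referred to in step 3 is normally written as follows; the choice of β_k distinguishes the common conjugate gradient variants (these are standard forms, not necessarily the one used on the slide):
\[
\mathbf{p}_0 = -\mathbf{g}_0, \qquad
\mathbf{p}_k = -\mathbf{g}_k + \beta_k\,\mathbf{p}_{k-1},
\]
with, for example,
\[
\beta_k = \frac{\mathbf{g}_k^{T}\mathbf{g}_k}{\mathbf{g}_{k-1}^{T}\mathbf{g}_{k-1}} \;(\text{Fletcher--Reeves})
\qquad\text{or}\qquad
\beta_k = \frac{\mathbf{g}_k^{T}(\mathbf{g}_k - \mathbf{g}_{k-1})}{\mathbf{g}_{k-1}^{T}\mathbf{g}_{k-1}} \;(\text{Polak--Ribi\`ere}).
\]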
Widrow-Hoff Learning
ADALINE Network
Widrow-Hoff Learning
Mean Square Error
LMS Algorithm
Convergence Point
Stable Learning Rate
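For reference, the quantities listed above usually take the following standard forms for the ADALINE (given as an assumption about the omitted slide equations, with α the learning rate, p(k) the input vector, and e(k) = t(k) − a(k) the error):
\[
F(\mathbf{x}) = E\!\left[e^{2}(k)\right], \qquad e(k) = t(k) - a(k),
\]
\[
\mathbf{W}(k+1) = \mathbf{W}(k) + 2\alpha\,e(k)\,\mathbf{p}^{T}(k), \qquad
b(k+1) = b(k) + 2\alpha\,e(k),
\]
\[
0 < \alpha < \frac{1}{\lambda_{\max}}, \quad
\lambda_{\max} = \text{largest eigenvalue of } \mathbf{R} = E\!\left[\mathbf{p}\,\mathbf{p}^{T}\right].
\]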
Widrow-Hoff Learning
Adaptive Filter ADALINE
Tapped Delay Line
Multilayer Perceptrons
Three-Layer Network
Pattern Classification
Two-Layer XOR Network
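XOR is not linearly separable, so no single-layer perceptron can realize it, but a two-layer network of threshold units can. One hand-picked weight assignment is sketched below (hidden units computing OR and AND, output computing "OR and not AND"); these weights are illustrative and not necessarily those in the figure:

def step(x):
    return 1 if x >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)       # hidden unit 1: OR
    h2 = step(x1 + x2 - 1.5)       # hidden unit 2: AND
    return step(h1 - h2 - 0.5)     # output: OR and not AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b)) # prints the XOR truth table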
Function Approximation
Function Approximation Network
Function Approximation
Nominal Response of Network
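A common demonstration of this kind uses a 1-2-1 network: two log-sigmoid hidden units feeding a linear output unit. A sketch with illustrative parameter values (assumed, not read off the figure):

import math

def logsig(x):
    return 1.0 / (1.0 + math.exp(-x))

def net_1_2_1(p, w1=(10.0, 10.0), b1=(-10.0, 10.0), w2=(1.0, 1.0), b2=0.0):
    """1-2-1 network: log-sigmoid hidden layer, linear output layer."""
    a1 = [logsig(w * p + b) for w, b in zip(w1, b1)]      # hidden-layer outputs
    return w2[0] * a1[0] + w2[1] * a1[1] + b2             # linear output

# Sample the network response over an input range
print([round(net_1_2_1(p), 3) for p in (-2.0, -1.0, 0.0, 1.0, 2.0)])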
Function Approximation
Effect of Parameter Changes on Network Response