Artificial Neural Network
- The brain consists of a large number of highly connected elements called neurons.
- The biological neuron consists of four main components (see Fig. 1):
• Dendrite
• Cell body
• Axon
• Synapse
- The axon of a neuron connects to the dendrite of another neuron by means of connectors called
synapses.
- Depending on the type of neuron, the number of synaptic connections from other neurons ranges from a
few hundred to about 10^3
- Because of the electrical properties of the neuronal membranes, the signals that reach the dendrite
quickly decay in strength and over distance and hence lose the ability to stimulate the neuron unless they
are strengthened by other signals occurring at almost the same time and/or at nearby locations.
- The cell body (soma) sums the incoming signals from the dendrites
- When sufficient i/ps are received to stimulate the neuron to its threshold, the neuron generates an action
potential (i.e., fires) and transmits it along its axon to other neurons or to target cells outside the nervous
system, such as a muscle
- But if i/ps do not reach the threshold level, the i/ps will quickly decay and do not generate an action
potential.
Fig. 1 The biological neuron: dendrite, cell body, axon and synapse
e.g., complex pattern recognition, speech recognition, natural language processing, and common-sense
reasoning
- The input to the synapses is a vector signal x with individual vector components x1, x2, …, xn
- Each component of x is an i/p to a synapse and is connected to the neuron through a synaptic weight
w, i.e., each component of x is multiplied by its synaptic weight w
• Summing Device
- Summing device acts to add all the signals broadcast into the adder, i.e., each input is multiplied
by the associated synaptic weight and then summed
- All the operations up to and including the o/p of the adder constitute a linear combiner
• Bias, bq
- The bias, b, is usually externally applied and is used to set the threshold of the neuron.
- Activation function, f(.), serves to limit the amplitude of the neuron’s output (yq).
Linear:              a = y
Log-Sigmoid:         a = 1 / (1 + e^(−y))
Hyperbolic Tangent:  a = (e^y − e^(−y)) / (e^y + e^(−y))
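The three activation functions above can be sketched in Python as follows (an illustrative sketch, not part of the original notes; numpy is assumed to be available):

import numpy as np

def linear(y):
    # Linear: the o/p equals the net i/p
    return y

def logsig(y):
    # Log-sigmoid: squashes the net i/p into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-y))

def tanh_act(y):
    # Hyperbolic tangent: squashes the net i/p into the range (-1, 1)
    return (np.exp(y) - np.exp(-y)) / (np.exp(y) + np.exp(-y))

print(linear(1.6), logsig(1.6), tanh_act(1.6))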
(Figure) Model of an artificial neuron: i/p vector components x1, …, xn are applied through synapses with
synaptic weights w1, …, wn and, together with the threshold (or bias) b, are combined at the summing
junction to give yq; the activation function f(.) of the cell body (soma) then produces the axon (o/p) signal aq
- Learning algorithm is the rule that governs how changes are made to weights and thresholds (or bias) of
an ANN.
● Topology
• Input layer
• Hidden layer(s)
• Output layer
(Figure) NN topology: an input layer, zero, one or several hidden layers, and an output layer
• Single-Layer NN
- A single layer NN has one i/p layer and one o/p layer but no hidden layer
- Fig. 4 shows a single layer neural network of S neurons
- Each neuron receives a weighted i/p xj with weight wji
Fig. 4 A single layer neural network of S neurons: R i/ps x1, …, xR connected through weights w1,1, …, wS,R,
with biases b1, …, bS and o/ps a1, …, aS
- Note that each of the R inputs is connected to each of the neurons and that the weight matrix now has S
rows.
- In order to simplify the complex network of arrows, Fig. 4 can be replaced by a simplified form shown in
Fig. 5 and the weight matrix is given as
w = [ w1,1  w1,2  …  w1,R
      w2,1  w2,2  …  w2,R
      …
      wS,1  wS,2  …  wS,R ]
- Fig. 5 comprises a layer of S neurons; the layer includes the weight matrix, the summers, the bias
vector b, the input x and the output a
Fig. 5 A single layer of S neurons in simplified form (the bias vector b is S×1)
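As a quick illustrative sketch (not from the notes; numpy and a log-sigmoid activation are assumed), the whole layer in Fig. 5 can be computed at once as a = f(wx + b):

import numpy as np

def logsig(y):
    # Log-sigmoid activation applied element-wise
    return 1.0 / (1.0 + np.exp(-y))

S, R = 3, 2                           # assumed sizes: S = 3 neurons, R = 2 i/ps
rng = np.random.default_rng(1)
w = rng.normal(size=(S, R))           # weight matrix with S rows and R columns
b = np.zeros((S, 1))                  # bias vector (S x 1)
x = np.array([[1.0], [-0.5]])         # i/p vector (R x 1)

a = logsig(w @ x + b)                 # o/p of the whole layer
print(a.shape)                        # (3, 1)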
• Multi-Layer NN
- As shown in Fig. 6, a multi-layer NN has one i/p layer, one or more hidden layer(s), and one o/p layer.
- The layer of i/p units is connected to a layer of hidden units which in turn is connected to a layer of o/p
units.
- A hidden layer’s output is not directly accessible externally
- A multi-layer NN with nonlinear activation functions provides more computational capability than a single
layer system
Fig. 6 A multi-layer NN in simplified form (with bias vectors of sizes S1×1 and S2×1 for the two layers)
- The output of the hidden layer is computed as
a^1 = f^1(y^1) = f^1(w^1 x + b^1)
- While the output of the output layer is computed as:
a^2 = f^2(y^2) = f^2(w^2 a^1 + b^2) = f^2(w^2 f^1(w^1 x + b^1) + b^2)
- If there is a 3rd layer, its output will be computed as
a^3 = f^3(y^3) = f^3(w^3 f^2(w^2 f^1(w^1 x + b^1) + b^2) + b^3)
- The superscript in the above expressions does not denote a power; it is just an indication of the layer
number
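The layer-by-layer expressions above translate directly into code. The following Python sketch (illustrative only; the layer sizes, random weights and log-sigmoid activations are assumptions, not values from the notes) computes a^2 for a two-layer network:

import numpy as np

def logsig(y):
    # Log-sigmoid activation applied element-wise
    return 1.0 / (1.0 + np.exp(-y))

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))   # hidden layer: 4 neurons, 3 i/ps
w2, b2 = rng.normal(size=(2, 4)), np.zeros((2, 1))   # o/p layer: 2 neurons

x = np.array([[0.5], [-1.0], [2.0]])                 # i/p vector (3 x 1)

a1 = logsig(w1 @ x + b1)            # a^1 = f^1(w^1 x + b^1)
a2 = logsig(w2 @ a1 + b2)           # a^2 = f^2(w^2 a^1 + b^2)
print(a2)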
Inter-Layer Connections
(i) Fully connected
- Each neuron on a previous layer is connected to every neuron on the next layer
(ii) Partially connected
- Neurons on a previous layer do not have to be connected to all neurons on the next layer
(iii) Feedforward
- Previous layer neurons have their outputs fed into inputs of next layer
(iv) Feedback
- Previous layer neurons have their outputs fed into inputs of next layer and vice versa (i.e., it has
bidirectional connections)
(v) Hierarchical
- Neurons of a lower layer may only communicate with neurons on the next level of the hierarchy
(vi) Resonance
- Messages are communicated across connections and repeated until certain conditions are reached.
McCulloch-Pitts Neuron
(Figure) The McCulloch-Pitts neuron: i/ps x1, …, xn connect through weights w and i/ps xn+1, …, xn+m
connect through weights −c to a summing node y, whose o/p a is either 1 or 0
- xn+1, …, xn+m are inhibitory i/ps because the synaptic weights, −c, are negative
- The state of the neuron at discrete time k is determined by the states of the i/ps x1 to xn+m at time k−1
a = 1, if y ≥ b
a = 0, if y < b
e.g. 1: AND logic function
x1 x2 a
1  1  1
1  0  0
0  1  0
0  0  0
e.g. 2: OR logic function
(Figure) An M-P neuron for the OR function: i/ps x1 and x2 each connect through weight w = 2 to the
summer y, and the threshold is b = 2
x1 x2 a
1  1  1
1  0  1
0  1  1
0  0  0
e.g. 3: XOR logic function
x1 x2 a
1  1  0
1  0  1
0  1  1
0  0  0
(Figure) A two-layer M-P network for the XOR function: x1 connects to middle neuron x3 with weight 2
and to middle neuron x4 with weight −1; x2 connects to x3 with weight −1 and to x4 with weight 2; both
middle neurons have b = 2. The o/ps of x3 and x4 connect to the o/p neuron a, each with weight 2, and the
o/p neuron also has b = 2.
- Layer 1: x1, x2
- Layer 2: x3, x4
- Output: a
- Supposing x1 = x2 = 1, then in the middle layer of the network x3 = x4 = 0 because y1 < b and y2 < b;
hence y3 = 0 < b and the o/p is a = 0, as required for XOR
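The behaviour of this network can be checked with a short Python sketch (the weights and thresholds below are those read off the figure above):

def mp_neuron(inputs, weights, b):
    # McCulloch-Pitts neuron: fires (o/p 1) only when the weighted sum reaches the threshold b
    y = sum(w * x for w, x in zip(weights, inputs))
    return 1 if y >= b else 0

def xor_net(x1, x2):
    x3 = mp_neuron([x1, x2], [2, -1], b=2)    # middle neuron x3
    x4 = mp_neuron([x1, x2], [-1, 2], b=2)    # middle neuron x4
    return mp_neuron([x3, x4], [2, 2], b=2)   # o/p neuron a

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))        # reproduces the XOR truth table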
The Perceptron
- The name perceptron is now used as a synonym for single-layer, feed-forward networks.
Fig. 7(a) A multineuron perceptron: R i/ps x1, …, xR connected through weights w1,1, …, wS,R to S neurons
with biases b1, …, bS and o/ps a1, …, aS.  Fig. 7(b) A single neuron perceptron: i/ps x1, …, xR, a single bias b
and o/p a.
- It is observable in Fig. 7(a) that a single weight only affects one of the outputs.
- This means that the study of perceptrons is made easier by only considering networks with a single output
(i.e. similar to the network shown in Fig. 7(b)).
What perceptrons can represent
- Perceptrons can represent the AND, OR and NOT logic functions.
- But it does not follow that a perceptron (a single-layer, feed-forward network) can represent any Boolean
function.
- To see why, consider the following truth tables of the AND and XOR functions
AND              XOR
x1 x2 a          x1 x2 a
1  1  1          1  1  0
1  0  0          1  0  1
0  1  0          0  1  1
0  0  0          0  0  0
Fig. 8 (a) AND logic function and (b) XOR logic function
(where a filled circle represents an output of one and a hollow circle represents an output of zero).
- Looking at the AND graph in Fig. 8(a), it is observable that the ones can be separated from the zeros with a
line. This is not possible with XOR in Fig. 8(b).
- Although only being able to learn linearly separable functions is a major disadvantage of the perceptron, it
is still worth studying as it is relatively simple and can help provide a framework for other architectures.
- It should, however, be realised that perceptrons are not limited to two inputs (e.g. the AND function). We
can have n inputs, which gives us an n-dimensional problem.
- Therefore, we would like our neural network to “learn” so that it can come up with its own set of weights.
- We will consider this aspect of neural networks for a simple function (i.e. one with two inputs). We could
obviously scale up the problem to accommodate more complex problems, providing the problems are
linearly separable.
Epoch : An epoch is the presentation of the entire training set to the neural network. In the
case of the AND function an epoch consists of four sets of inputs being presented
to the network (i.e. [0,0], [0,1], [1,0], [1,1]).
Target Value, t : When we are training a network we not only present it with the input but also with
a value that we require the network to produce. For example, if we present the
network with [1,1] for the AND function the training value will be 1.
Error, e : The error value is the amount by which the value output by the network differs
from the training value. For example, if we required the network to output 0 and it
output a 1, then e = 0-1 = -1.
Bias, b : The bias value used to set the activation threshold value of the neuron
Output from Neuron, a : The output value from the neuron
xi : Inputs being presented to the neuron
wi : Weight from input neuron ( x i ) to the output neuron
LR : The learning rate. This dictates how quickly the network converges. Its value is found by
experimentation. It is typically 0.1.
While an epoch still produces errors
    compute a and the error e = t − a
    If e ≠ 0 Then
        winew = wiold + LR × e × xi
        binew = biold + e
    End If
End While
Note : This is often called the delta learning rule.
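A minimal Python sketch of this training loop is given below, assuming the hardlimit activation and the AND training set described above (the function and variable names are illustrative):

def hardlim(y):
    # Hardlimit activation: 1 if the net i/p is at least 0, otherwise 0
    return 1 if y >= 0 else 0

# AND training set: ([x1, x2], target t)
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b, LR = [0.0, 0.0], 0.0, 0.1

converged = False
while not converged:                       # each pass over data is one epoch
    converged = True
    for x, t in data:
        a = hardlim(w[0] * x[0] + w[1] * x[1] + b)
        e = t - a                          # error
        if e != 0:
            w = [wi + LR * e * xi for wi, xi in zip(w, x)]   # delta rule
            b = b + e
            converged = False

print(w, b)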
- The perceptron learning rule is guaranteed to converge to a solution in a finite number of steps, so long as
a solution exists.
- A single-layer perceptron is able to learn only linearly separable functions (e.g., Fig. 8(a)).
- Number of network inputs = number of problem inputs (e.g., if features such as fruit shape, fruit skin
smoothness, and fruit number of seeds are used to classify fruits into lemon and apple, 3 inputs are
required)
- Number of neurons in output layer = number of problem outputs (e.g., to classify animals into cat and
dog classes, one neuron is required, where output 1 could represent the cat class and output 0 could
represent the dog class)
- k neurons can categorise 2^k classes; for example, to categorise 2 classes, 1 neuron can be used and only
one decision boundary is required; similarly, to categorise 4 classes, 2 neurons can be used and only 2
decision boundaries are required, etc.
- Consider the graphs in Fig. 9, each of which has only 2 classes; it is impossible to draw a single line
(one decision boundary) to separate the classes into 2 regions. These graphs therefore represent linearly
inseparable problems.
- For problems where the function is linearly inseparable (e.g., Fig. 9), multilayer perceptrons combined
with the backpropagation algorithm can be used for such classification.
Fig. 9 (a), (b) and (c): examples of linearly inseparable classification problems
Example
For the single-input neuron in the figure below, the input is 2.0, its weight is 2.3 and its
bias is -3.
(i.) What is the net input to the activation function?
(ii.) What is the neuron output?
(Figure) A single-input neuron: x = 2.0, w = 2.3, b = −3, net i/p y, and o/p a = f(y)
Solution
(i) The net input to the activation function is:
y = wx + b = (2.3)(2.0) + (−3) = 1.6
(ii) The output cannot be computed because the activation function is not given
Example
What is the output of the neuron of the previous example if it has the following activation functions:
(i) Hardlimit
(ii) Linear
(iii) Log-sigmoid
Solution
(i) Hard limit activation function
a = f(y) = 0 if y < 0; 1 if y ≥ 0
a=1
(ii) Linear activation function
a = f (y) = y
a = 1.6
(iii) Log-sigmoid activation function
a = f(y) = 1 / (1 + e^(−y)) = 1 / (1 + e^(−1.6)) = 0.8320
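The three cases can be checked with a few lines of Python (an illustrative sketch using the standard math module):

import math

y = 2.3 * 2.0 + (-3)                  # net i/p: wx + b = 1.6

hardlimit = 1 if y >= 0 else 0        # (i)   hardlimit    -> 1
linear = y                            # (ii)  linear       -> 1.6
logsig = 1 / (1 + math.exp(-y))       # (iii) log-sigmoid  -> 0.8320
print(hardlimit, linear, round(logsig, 4))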
Example
Consider the 2-input single neuron shown below. The input vector is [−5 6]^T and the weight vector is [3 2],
while the bias is 1.2. Calculate the neuron output for the following activation functions:
(i) Symmetric hard limit
(ii) Saturating linear
(Figure) A two-input neuron: x1 = −5 with weight w1 = 3, x2 = 6 with weight w2 = 2, bias b = 1.2, net i/p y,
and o/p a = f(y)
Solution
The net i/p is y = w1x1 + w2x2 + b = (3)(−5) + (2)(6) + 1.2 = −1.8
(i) Symmetric hard limit activation function
a = f(y) = −1 if y < 0; +1 if y ≥ 0
a = f(−1.8) = −1
(ii) Saturating linear activation function
a = f(y) = 0 if y < 0; y if 0 ≤ y ≤ 1; 1 if y > 1
a = f(−1.8) = 0
Example
In a classification problem that employs a perceptron, x1, x2, x3, and x4 are input vectors while t1, t2, t3, and
t4 are the corresponding outputs; the values of the input-output pairs are provided below. Starting with a
random guess value of weight vector (w) given as [0 0], a bias value given as 0, and using a learning rate of
1.0 and employing the hardlimit activation function, determine
(i) the weight and bias values for the perceptron.
(ii) represent the classification problem graphically while showing the input vector, the decision boundary
and the weight vector.
x1 = [2, 2]^T, t1 = 0;   x2 = [1, −2]^T, t2 = 1;   x3 = [−2, 2]^T, t3 = 0;   x4 = [−1, 1]^T, t4 = 1
Solution
From the given information, there are exactly two classes (i.e., class 0 and class 1), therefore only one
output neuron is required. It is a 2-dimensional problem with one decision boundary since there are 2
inputs to the system at any time. A sketch of the perceptron network with the first set of inputs and the
guess value of weight vector and bias is provided below.
(Figure) The perceptron with the first i/p vector [2, 2]^T applied, initial weights w = [0 0], bias b = 0, and
o/p a = f(y)
We start by calculating the perceptron’s output a for the first input vector x1, using the initial weights and
bias.
a = f(w(0)x1 + b(0)) = f([0 0][2 2]^T + 0) = f(0) = 1 (using the hardlimit activation function)
The output a does not equal the target value t1, so we use the perceptron rule to find new weights and
biases based on the error.
e = t1 − a = 0 − 1 = −1
w(1) = w(0) + e·x1^T = [0 0] + (−1)[2 2] = [−2 −2]
b(1) = b(0) + e = 0 + (−1) = −1
We now apply the second input vector x2, using the updated weights and bias.
a = f(w(1)x2 + b(1)) = f([−2 −2][1 −2]^T + (−1)) = f(1) = 1
This time the output a is equal to the target t2. Application of the perceptron rule will not result in any
changes.
w(2) = w(1) = [−2 −2],   b(2) = b(1) = −1
We now apply the third input vector.
a = f(w(2)x3 + b(2)) = f([−2 −2][−2 2]^T + (−1)) = f(−1) = 0
The output in response to input vector x3 is equal to the target t3, so there will be no changes.
w(3) = w(2) = [−2 −2],   b(3) = b(2) = −1
We now move on to the last input vector x4.
a = f(w(3)x4 + b(3)) = f([−2 −2][−1 1]^T + (−1)) = f(−1) = 0
This time the output a does not equal the appropriate target t4. The perceptron rule will result in a new set
of values for w and b.
e = t4 − a = 1 − 0 = 1
w(4) = w(3) + e·x4^T = [−2 −2] + (1)[−1 1] = [−3 −1]
b(4) = b(3) + e = −1 + 1 = 0
We now make a second pass through the training set, starting again with x1.
a = f(w(4)x1 + b(4)) = f([−3 −1][2 2]^T + 0) = f(−8) = 0
Therefore there are no changes.
w(5) = w(4) = [−3 −1],   b(5) = b(4) = 0
The second presentation of x2 results in an error and therefore a new set of weight and bias values.
a = f(w(5)x2 + b(5)) = f([−3 −1][1 −2]^T + 0) = f(−1) = 0
Here are those new values:
e = t2 − a = 1 − 0 = 1
w(6) = w(5) + e·x2^T = [−3 −1] + (1)[1 −2] = [−2 −3]
b(6) = b(5) + e = 0 + 1 = 1
Cycling through each input vector once more results in no errors.
a = f(w(6)x1 + b(6)) = f([−2 −3][2 2]^T + 1) = f(−9) = 0 = t1
a = f(w(6)x2 + b(6)) = f([−2 −3][1 −2]^T + 1) = f(5) = 1 = t2
a = f(w(6)x3 + b(6)) = f([−2 −3][−2 2]^T + 1) = f(−1) = 0 = t3
a = f(w(6)x4 + b(6)) = f([−2 −3][−1 1]^T + 1) = f(0) = 1 = t4
Therefore the final weight vector is w = [−2 −3] and the bias is b = 1.
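The iterations above can be reproduced with a short Python sketch of the perceptron rule (hardlimit activation, learning rate 1, numpy assumed available):

import numpy as np

def hardlim(y):
    # Hardlimit activation: 1 if the net i/p is at least 0, otherwise 0
    return 1 if y >= 0 else 0

X = [np.array([2, 2]), np.array([1, -2]), np.array([-2, 2]), np.array([-1, 1])]
T = [0, 1, 0, 1]

w, b = np.zeros(2), 0.0
for epoch in range(10):                 # a few epochs are enough for this problem
    errors = 0
    for x, t in zip(X, T):
        e = t - hardlim(w @ x + b)      # perceptron error
        if e != 0:
            w, b, errors = w + e * x, b + e, errors + 1
    if errors == 0:
        break

print(w, b)                             # converges to w = [-2 -3], b = 1 as above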
(ii) We can graph the training data and the decision boundary of the solution. The decision boundary is
given by
y = wx + b = w1,1x1 + w1,2x2 + b = −2x1 − 3x2 + 1 = 0
To find the x2 intercept of the decision boundary, set x1 = 0:
x2 = −b / w1,2 = −1 / (−3) = 1/3   (if x1 = 0)
Similarly, setting x2 = 0 gives the x1 intercept: x1 = −b / w1,1 = −1 / (−2) = 1/2
(Figure) The training points, the decision boundary −2x1 − 3x2 + 1 = 0, and the weight vector. The weight
vector must be orthogonal to the decision boundary and point in the direction of the points to be classified
as 1 (the dark points); the weight vector can have any length we like.
Example
Consider the following graph representation of classification problems and determine
(i) which of (a), (b), and (c) can be learnt by a single layer perceptron.
(ii) the weight vectors and biases of problems that can be learnt by single layer perceptron and confirm
their correctness by testing them with the input vectors
(Figure) Three classification problems (a), (b) and (c), each plotted on x1–x2 axes with shaded (class 1) and
unshaded (class 0) points
Solution
(i) Since there are only 2 classes (shaded and unshaded circles), only 1 decision boundary is required in
order to classify the problems with a single-neuron perceptron. Hence, only (a) and (c) can be learnt by a
single layer perceptron. This is because (a) and (c) are linearly separable and can have a single line dividing
the "1" outputs from the "0" outputs as shown below.
(Figure) Problems (a) and (c) redrawn with a single decision boundary through each and the corresponding
weight vector w orthogonal to it; problem (b) is shown for comparison
(ii) The next step is to find the weights and biases. The weight vectors must be orthogonal to the decision
boundaries, and pointing in the direction of points to be classified as 1 (the dark points). The weight
vectors can have any length we like.
For (a):
The slope of the decision boundary line is computed as:
m = (y − y1) / (x − x1) = (2 − 0) / (1 − 0) = 2
Since the weight vector must be orthogonal to the decision boundary, its line is normal to the decision
boundary line, with a slope of −1/m = −1/2
Choose any value for the x component of the weight vector and use this to compute the corresponding
value of y as follows:
Assuming x = −2 and the weight vector line is chosen to pass through the origin, then the y component
is computed as:
y = mx + c = (−1/2)(−2) + 0 = 1
(This is the equation of a straight line, hence x and y in this context are different from the input and output
vectors of the neuron)
Hence the weight vector is [−2 1]
Now we find the bias values for each perceptron by picking a point on the decision boundary and
satisfying
y = wx + b = 0
Picking the point [0 0]^T (the origin, which lies on the decision boundary), we have:
b = −wx = −[−2 1][0 0]^T = 0
We can now check our solution against the original points. Here we test the first network on the input
vector
• For input [−2 2]^T:
a = f([−2 1][−2 2]^T + 0) = f(6) = 1 (the result matches the actual output)
• For input [−2 0]^T:
a = f([−2 1][−2 0]^T + 0) = f(4) = 1 (the result matches the actual output)
• For input [-2 -2]:
a = f([−2 1][−2 −2]^T + 0) = f(2) = 1 (the result matches the actual output)
• For input [0 -2]:
a = f([−2 1][0 −2]^T + 0) = f(−2) = 0 (the result matches the actual output)
The test continues until all the input points have been tested
For (c):
The slope of the decision boundary line is computed as:
m = (y − y1) / (x − x1) = (2 − 1) / (−1 − (−2)) = 1
Since the weight vector must be orthogonal to the decision boundary, its line is normal to the decision
boundary line, with a slope of −1/m = −1
Choose any value for the x component of the weight vector and use this to compute the corresponding
value of y as follows:
Assuming x = 2 and the weight vector line is chosen to pass through the origin, then, the y component
is computed as:
y = mx + c = (−1)(2) + 0 = −2
Hence the weight vector is [2 -2]
Now we find the bias values for each perceptron by picking a point on the decision boundary and
satisfying
y = wx + b = 0
Picking point [-1 2], we have:
b = −wx = −[2 −2][−1 2]^T = −(−2 − 4) = 6
We can now check our solution against the original points. Here we test the network on the input
vectors:
• For input [0 2]:
a = f([2 −2][0 2]^T + 6) = f(2) = 1 (the result matches the actual output)
• For input [-2 2]:
a = f([2 −2][−2 2]^T + 6) = f(−2) = 0 (the result matches the actual output)
• For input [−2 0]^T:
a = f([2 −2][−2 0]^T + 6) = f(2) = 1 (the result matches the actual output)
The test continues until all the input points have been tested