LECTURE NOTES ON
APPLICATION OF SOFT COMPUTING (KCS056)
B.Tech. (CSE/IT), 4th Year, 7th Semester
(AKTU)
UNIT – 1
(NEURAL NETWORKS-I)
CONTENTS
1. Introduction
1.1. What is Soft Computing?
1.2. Soft computing vs. hard computing
1.3. Definitions of Soft Computing (SC)
1.4. Goals of Soft Computing
1.5. Importance of Soft Computing
2. Neural Network
2.1. Structure of Biological neuron
2.2. Function of a Biological neuron
3. Artificial Neuron and its model
3.1. Activation functions
4. Neural network architecture
4.1 Single layer and multilayer feed forward networks
4.2 Recurrent networks.
5. Various learning techniques
5.1 Perceptron and convergence rule
5.2 Auto-associative memory
5.3 Hetero-associative memory
Introduction:
What is Soft Computing ?
The idea of soft computing was initiated in 1981, when Lotfi A. Zadeh published his first paper on soft data analysis [Zadeh, "What is Soft Computing?", Soft Computing, Springer-Verlag, 1997].
Zadeh defined Soft Computing as one multidisciplinary system: the fusion of the fields of Fuzzy Logic, Neuro-Computing, Evolutionary and Genetic Computing, and Probabilistic Computing.
Soft Computing is the fusion of methodologies designed to model and enable solutions to real-world problems which cannot be modeled, or are too difficult to model, mathematically.
The aim of Soft Computing is to exploit the tolerance for imprecision, uncertainty, approximate reasoning, and partial truth in order to achieve close resemblance to human-like decision making.
The Soft Computing development history:

SC = EC + NN + FL
(Soft Computing = Evolutionary Computing + Neural Networks + Fuzzy Logic)
Soft Computing: Zadeh, 1981. Evolutionary Computing: Rechenberg, 1960. Neural Networks: McCulloch, 1943. Fuzzy Logic: Zadeh, 1965.

EC = GP + ES + EP + GA
(Evolutionary Computing = Genetic Programming + Evolution Strategies + Evolutionary Programming + Genetic Algorithms)
Genetic Programming: Koza, 1992. Evolution Strategies: Rechenberg, 1965. Evolutionary Programming: Fogel, 1962. Genetic Algorithms: Holland, 1970.
Soft Computing is tolerant of imprecision, uncertainty, partial truth and approximation, whereas Hard Computing requires a precisely stated analytic model.
Soft Computing is based on fuzzy logic, neural nets and probabilistic reasoning, whereas Hard Computing is based on binary logic, crisp systems, numerical analysis and crisp software.
Soft computing has the characteristics of approximation and dispositionality, whereas hard computing has the characteristics of precision and categoricity.
Soft computing can evolve its own programs, whereas hard computing requires programs to be written.
Soft computing can use multivalued or fuzzy logic, whereas hard computing uses two-valued logic.
Lotfi A. Zadeh, 1992: "Soft computing is an emerging approach to computing which parallels the remarkable ability of the human mind to reason and learn in an environment of uncertainty and imprecision."
Soft computing is not a concoction, mixture, or combination; rather, it is a partnership in which each of the partners contributes a distinct methodology for addressing problems in its domain. In principle, the constituent methodologies in Soft computing are complementary rather than competitive.
Soft computing may be viewed as a foundation component for the emerging field of Conceptual
Intelligence.
Soft Computing consists of several computing paradigms, mainly:
Fuzzy sets: for knowledge representation via fuzzy If-Then rules.
Neural Networks: for learning and adaptation.
Genetic Algorithms: for evolutionary computation.
Soft computing differs from hard (conventional) computing. Unlike hard computing, soft computing is tolerant of imprecision, uncertainty, partial truth, and approximation. The guiding principle of soft computing is to exploit this tolerance to achieve tractability, robustness and low solution cost. In effect, the role model for soft computing is the human mind.
Neural Network
Neural networks, which are simplified models of the biological neuron system, are massively parallel distributed processing systems made up of highly interconnected neural computing elements that have the ability to learn and thereby acquire knowledge and make it available for use.
Neural Networks (NNs) are also known as Artificial Neural Networks (ANNs), Connectionist Models, and
Parallel Distributed Processing (PDP) Models. Artificial Neural Networks are massively parallel
interconnected networks of simple (usually adaptive) elements and their hierarchical organizations which
are intended to interact with the objects of the real world in the same way as biological nervous systems
do.
A neural net is an artificial representation of the human brain that tries to simulate its learning process.
The term "artificial" means that neural nets are implemented in computer programs that are able to handle
the large number of necessary calculations during the learning process.
The brain contains about 10^10 basic units called neurons. Each neuron, in turn, is connected to about 10^4 other neurons. A neuron is a small cell that receives electro-chemical signals from its various sources and in turn responds by transmitting electrical impulses to other neurons.
An average brain weighs about 1.5 kg, and an average neuron weighs about 1.5 × 10^-9 g. While some of the neurons perform input and output operations, the remaining form a part of an interconnected network of neurons which are responsible for signal transformation and storage of information. However, despite their different activities, all neurons share common characteristics.
A neuron is composed of a cell body, known as the soma, which contains the nucleus. Attached to the soma are long, irregularly shaped filaments called dendrites. The dendrites behave as input channels, i.e. all inputs from other neurons arrive through the dendrites; they look like the branches of a tree in winter. Another type of link attached to the soma is the axon. Unlike the dendritic links, the axon is electrically active and serves as an output channel. If the cumulative inputs received by the soma raise the internal electric potential of the cell, known as the membrane potential, beyond a threshold, the neuron fires and an output signal is transmitted along the axon.
The axon terminates in a specialized contact called the synapse, or synaptic junction, that connects the axon with the dendritic links of another neuron. The synaptic junction, which is a very minute gap at the end of the dendritic link, contains a neuro-transmitter fluid. It is this fluid which is responsible for accelerating or retarding the electric charges to the soma. Each dendritic link can have many synapses acting on it, thus bringing about massive interconnectivity.
In general, a single neuron can have many synaptic inputs and synaptic outputs.
There are many different types of neuron cells found in the nervous system. The differences are due to
their location and function.
The input/output behaviour and the propagation of information through a neuron are illustrated in the accompanying figure.
A neuron basically performs the following function: all the inputs to the cell, which may vary in the strength of the connection or the frequency of the incoming signal, are summed up. The input sum is processed by a threshold function, which produces the output signal.
The brain works in both a parallel and a serial way. The parallel and serial nature of the brain is readily apparent from the physical anatomy of the nervous system. That there is serial and parallel processing involved can also be seen from the time needed to perform tasks. For example, a human can recognize the picture of another person in about 100 ms. Given the processing time of 1 ms for an individual neuron, this implies that a certain number of neurons, but fewer than 100, are involved in series; whereas the complexity of the task is evidence for parallel processing, because a difficult recognition task cannot be performed by such a small number of neurons alone. This phenomenon is known as the 100-step rule.
Biological neural systems usually have a very high fault tolerance. Experiments with people with brain
injuries have shown that damage of neurons up to a certain level does not necessarily influence the
performance of the system, though tasks such as writing or speaking may have to be learned again. This
can be regarded as re-training the network.
In the following work no particular brain part or function will be modeled. Rather the fundamental brain
characteristics of parallelism and fault tolerance will be applied.
Here, x1, x2, x3, ..., xn are the n inputs to the artificial neuron, and w1, w2, ..., wn are the weights attached to the input links.
Recollect that a biological neuron receives all inputs through the dendrites, sums them, and produces an output. The input signals are passed on to the cell body through the synapses, which may accelerate or retard an arriving signal. An effective synapse, which transmits a stronger signal, will have a correspondingly larger weight, while a weak synapse will have a smaller weight. Thus, the weights here are multiplicative factors of the inputs that account for the strength of the synapses. Hence, the total input I received by the soma of the artificial neuron is

I = x1 w1 + x2 w2 + ... + xn wn = Σ (i = 1 to n) xi wi
To generate the final output y, the sum is passed on to a non-linear filter ϕ, called the activation function, transfer function, or squash function, which releases the output:

y = ϕ(I)
A very commonly used activation function is the thresholding function. In this, the sum I is compared with a threshold value θ. If I is greater than θ, then the output is 1, else it is 0, i.e.

y = ϕ(I - θ)

where ϕ is the step function known as the Heaviside function, such that ϕ(v) = 1 if v > 0 and ϕ(v) = 0 otherwise.
Figure 2.4 illustrates the thresholding function. This is convenient in the sense that the output signal is either 1 or 0, resulting in the neuron being on or off.
For neurons in the same layer, the same activation function is used. There may be linear as well as non-linear activation functions; many types are in use, and a few are given here.
Identity function: The linear neuron or linear network is also described by the identity function. In this case,
ϕ(I) = I
Example:
Let x1 = 0.25, w1 = 0.10; x2 = 0.50, w2 = 0.40; x3 = 0.82, w3 = 0.90.
Now, I = Σ xi wi = x1 w1 + x2 w2 + x3 w3
= 0.25 × 0.10 + 0.50 × 0.40 + 0.82 × 0.90
= 0.025 + 0.20 + 0.738 = 0.963
As ϕ(I) = I,
ϕ(0.963) = 0.963.
Signum function: The signum function outputs +1 if the input I exceeds the threshold θ, and -1 otherwise:
ϕ(I) = +1, if I > θ
     = -1, if I ≤ θ
Example:
Now, I = Σ xi wi = x1 w1 + x2 w2 + x3 w3
= 0.25 × 0.10 + 0.50 × 0.40 + 0.82 × 0.90
= 0.025 + 0.20 + 0.738 = 0.963
Assume θ = 0.5. Since I = 0.963 > θ = 0.5, ϕ(I) = +1.
Figure 2.5 illustrates the Signum function.
Sigmoidal function
This function is a continuous function that varies gradually between the asymptotic values 0 and 1, or -1 and +1. Sigmoidal functions are differentiable, which is an important feature of NN theory. It is given by

ϕ(I) = 1 / (1 + e^(-αI))

where α is the slope parameter, which adjusts the abruptness of the function as it changes between the two asymptotic values.
Example:
Let x1 = 0.25, w1 = 0.10; x2 = 0.50, w2 = 0.40; x3 = 0.82, w3 = 0.90.
Now, I = Σ xi wi = x1 w1 + x2 w2 + x3 w3
= 0.25 × 0.10 + 0.50 × 0.40 + 0.82 × 0.90
= 0.025 + 0.20 + 0.738 = 0.963
Assume the value of α to be 1. Then
ϕ(I) = 1 / (1 + e^(-0.963)) = 0.723
i.e. ϕ(0.963) = 0.723.
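As a quick check of the three activation functions above, here is a minimal Python sketch (an illustration added to these notes, not part of the original text) that reproduces the worked example with x = (0.25, 0.50, 0.82) and w = (0.10, 0.40, 0.90):

import math

x = [0.25, 0.50, 0.82]   # inputs from the worked example
w = [0.10, 0.40, 0.90]   # weights from the worked example

I = sum(xi * wi for xi, wi in zip(x, w))   # total input I = 0.963

def identity(v):
    return v

def signum(v, theta=0.5):          # bipolar threshold at theta
    return 1 if v > theta else -1

def sigmoid(v, alpha=1.0):         # 1 / (1 + e^(-alpha * v))
    return 1.0 / (1.0 + math.exp(-alpha * v))

print(identity(I))    # 0.963
print(signum(I))      # +1, since 0.963 > 0.5
print(sigmoid(I))     # 0.7237..., i.e. the 0.723 of the worked example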
The first formal definition of a synthetic neuron model, based on highly simplified considerations of the biological model, was formulated by Warren McCulloch and Walter Pitts in 1943.
The McCulloch-Pitts neuron allows binary states 0 or 1 only, so it is binary activated. The neurons are connected by directed weighted paths. A connection path can be excitatory or inhibitory: excitatory connections have positive weights and inhibitory connections have negative weights. In the McCulloch-Pitts neuron model the output is found as:
Y(output) = 1, if Σ (i = 1 to n) Wi xi ≥ ɸ (threshold)
          = 0, if Σ (i = 1 to n) Wi xi < ɸ (threshold)

where Y is the output produced by the McCulloch-Pitts neuron model, Wi is the weight of an edge, xi is an input (excitatory or inhibitory) given to the neuron, and ɸ is the threshold value.
The McCulloch-Pitts model rests on the following assumptions:
1. The activity of the neuron is all-or-none: at each step the neuron either fires (output 1) or does not fire (output 0).
2. A connection path is excitatory if the weight is positive, otherwise it is inhibitory. All excitatory connections into a particular neuron have the same weight.
3. Each neuron has a fixed threshold such that if the net input to the neuron is greater than or equal to the threshold, the neuron fires.
4. The threshold is set so that inhibition is absolute. That is, any non-zero inhibitory input will prevent the neuron from firing.
AND Gate: The McCulloch-Pitts neuron can realize the AND gate, which returns a high (1) only when both inputs are high. Truth table for AND gate:

Input           Output
x1      x2      Y
0       0       0
0       1       0
1       0       0
1       1       1
If we apply the concept of the AND gate as a function of the McCulloch-Pitts model, we get the output

Y(output) = 1, if Σ (i = 1 to n) Wi xi ≥ ɸ (threshold)
          = 0, if Σ (i = 1 to n) Wi xi < ɸ (threshold)
where ɸ is the threshold value for the McCulloch-Pitts neuron model. Let us take both weights as 1, with threshold ɸ = 2. Then, when x1 = 0 and x2 = 1:
W1 x1 + W2 x2 = 1 × 0 + 1 × 1 = 0 + 1 = 1
Since W1 x1 + W2 x2 < ɸ, the output is zero.
(iv) When x1 = 0 and x2 = 0:
W1 x1 + W2 x2 = 1 × 0 + 1 × 0 = 0 + 0 = 0
Since W1 x1 + W2 x2 < ɸ, the output is zero.
Note: Actually, we can take any integer value for the weights, but both weights should be the same, since the weights represent synaptic links and we want equal priority for both inputs.
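The gate computations above can be captured in a minimal Python sketch (an added illustration; the weights of 1 and the threshold ɸ = 2 follow the AND example above):

# A McCulloch-Pitts neuron: fires (1) when the weighted sum of its
# inputs reaches the threshold, otherwise stays silent (0).
def mp_neuron(inputs, weights, threshold):
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else 0

# AND gate: weights (1, 1), threshold 2 -- fires only when both inputs are 1
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_neuron((x1, x2), (1, 1), 2))
# prints the AND truth table: 0 0 0 / 0 1 0 / 1 0 0 / 1 1 1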
OR Gate: Here the McCulloch-Pitts neuron is treated as an OR gate. As per the properties of the OR gate, it returns a high (1) if any input is high, and a low (0) if none of the inputs is high. Truth table for OR gate:

Input           Output
x1      x2      Y
0       0       0
0       1       1
1       0       1
1       1       1
As per Fig. 1.10, the threshold value for the neuron is 2. The output is calculated as

Output = 1, if Σ (i = 1 to n) Wi xi ≥ 2
       = 0, if Σ (i = 1 to n) Wi xi < 2
NOT Gate: Here the McCulloch-Pitts neuron in Fig. 1.11 is treated as a NOT gate. As per the properties of the NOT gate, it returns true (1) if the input is false (0), and false (0) if the input is true (1). Truth table for NOT gate:

Input (x):   1   0
Output (Y):  0   1
Y(output) = 1, if Σ (i = 1 to n) Wi xi ≥ ɸ (threshold)
          = 0, if Σ (i = 1 to n) Wi xi < ɸ (threshold)
XOR Gate (Implementation): First we draw the truth table for XOR, as follows:

Input           Output
x1      x2      Y
0       0       0
0       1       1
1       0       1
1       1       0

No single choice of weights and threshold realizes this table, because XOR is not linearly separable (see the discussion of linear separability below); implementing it requires a network with more than one layer of neurons.
There are several classes of NNs, classified according to their learning mechanisms. However, we identify three fundamentally different classes of networks. All three classes employ the digraph structure for their representation.

Single Layer Feedforward Network
This type of network comprises two layers, namely the input layer and the output layer. The input layer neurons receive the input signals and the output layer neurons receive the output signals. The synaptic links carrying the weights connect every input neuron to every output neuron, but not vice versa. Such a network is said to be feedforward in type, or acyclic in nature. Despite the two layers, the network is termed single layer, since it is the output layer alone which performs computation. The input layer merely transmits the signals to the output layer; hence the name single layer feedforward network. Figure 2.8 illustrates an example network.
Multilayer Feedforward Network
This network architecture is made up of multiple layers. Besides the input and output layers, it has one or more intermediate layers called hidden layers, whose computational units are correspondingly called hidden neurons. The weights on the links between the input layer and a hidden layer are referred to as input-hidden layer weights, and the corresponding weights on the links between a hidden layer and the output layer are referred to as hidden-output layer weights. A multilayer feedforward network with l input neurons, m1 neurons in the first hidden layer, m2 neurons in the second hidden layer, and n output neurons in the output layer is written as l - m1 - m2 - n. Figure 2.9 illustrates a multilayer feedforward network with a configuration l - m - n.
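To make the layered structure concrete, the following is a minimal sketch (an added illustration with arbitrary random weights, not an example from the text) of a forward pass through an l - m - n feedforward network; W_ih and W_ho play the roles of the input-hidden and hidden-output layer weights:

import math, random

def sigmoid(v, alpha=1.0):
    return 1.0 / (1.0 + math.exp(-alpha * v))

def forward(x, W_ih, W_ho):
    # hidden layer: each of the m hidden neurons sums the l input signals
    h = [sigmoid(sum(xi * w for xi, w in zip(x, row))) for row in W_ih]
    # output layer: each of the n output neurons sums the m hidden signals
    return [sigmoid(sum(hi * w for hi, w in zip(h, row))) for row in W_ho]

l, m, n = 3, 4, 2                      # an l - m - n configuration
W_ih = [[random.uniform(-1, 1) for _ in range(l)] for _ in range(m)]
W_ho = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
print(forward([0.25, 0.50, 0.82], W_ih, W_ho))   # n output activations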
Recurrent Networks
These networks differ from feed-forward network architectures in the sense that there is at least one
feedback loop. Thus, in these networks, for example, there could exist one layer with feedback
connections as shown in Fig. 2. 10. There could also be neurons with self-feedback Jinks, i.e. the output of
a neuron is fed back into itself as input.
Characteristics of neural networks include the following:
(i) The NNs exhibit mapping capabilities; that is, they can map input patterns to their associated output patterns.
(ii) The NNs learn by examples. Thus, NN architectures can be 'trained' with known examples of a problem before being tested for their 'inference' capability on unknown instances of that problem.
(iii) The NNs possess the capability to generalize. Thus, they can predict new outcomes from past trends.
(iv) The NNs are robust systems and are fault tolerant. They can, therefore, recall full patterns from incomplete, partial or noisy patterns.
(v) The NNs can process information in parallel, at high speed, and in a distributed manner.
LEARNING METHODS/TECHNIQUES
Learning methods in Neural Networks can be broadly classified into three basic types: supervised,
unsupervised, and reinforced.
Supervised learning
In this, every input pattern that is used to train the network is associated with an output pattern, which is
the target or the desired pattern. A teacher is assumed to be present during the learning process, when a
comparison is made between the network's computed output and the correct expected output, to determine
the error. The error can then be used to change the network parameters, resulting in an improvement in performance.
Unsupervised learning
In this learning method, the target output is not presented to the network. It is as if there were no teacher to present the desired patterns; hence, the system learns on its own by discovering and adapting to structural features in the input patterns.
Reinforced learning
In this method, a teacher, though available, does not present the expected answer, but only indicates whether the computed output is correct or incorrect. The information provided helps the network in its learning process. A reward is given for a correct answer and a penalty for a wrong answer. Reinforced learning is, however, not one of the popular forms of learning.
Supervised and unsupervised learning methods, which are the most popular forms of learning, have found expression through various rules. Some of the widely used rules are presented below:
Hebbian learning
This rule was proposed by Hebb (1949) and is based on correlative weight adjustment. It is the oldest learning mechanism inspired by biology. In this, the input-output pattern pairs (Xi, Yi) are associated by the weight matrix W, known as the correlation matrix, computed as

W = Σ (i = 1 to n) Xi^T Yi

Gradient descent learning
This is based on the minimization of the error E, defined in terms of the weights and the activation function of the network. The weights are adjusted so as to move along the negative gradient of the error:

ΔWij = -η (∂E/∂Wij)

where η is the learning rate parameter and ∂E/∂Wij is the error gradient with reference to the weight Wij.
The Widrow-Hoff Delta rule and the Back-propagation learning rule are examples of this type of learning mechanism.
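To make the two rules concrete, here is a small illustrative sketch (the function names and the learning rate value are assumptions of this sketch) of a Hebbian update and a gradient-descent update for a single linear neuron with squared error E = (t - y)^2 / 2:

eta = 0.1   # learning rate, an assumed value

def hebbian_update(w, x, y):
    # correlative adjustment: strengthen wi in proportion to xi * y
    return [wi + eta * xi * y for wi, xi in zip(w, x)]

def gradient_update(w, x, t):
    # linear neuron y = w . x; for E = (t - y)^2 / 2 we have
    # dE/dwi = -(t - y) * xi, so the negative-gradient step is
    # delta wi = eta * (t - y) * xi (the Widrow-Hoff Delta rule form)
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + eta * (t - y) * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0]
print(hebbian_update(w, [1, -1], 1))    # [0.1, -0.1]
print(gradient_update(w, [1, -1], 1))   # [0.1, -0.1], since t - y = 1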
Competitive learning
In this method, those neurons which respond strongly to input stimuli have their weights updated.
When an input pattern is presented, all neurons in the layer compete and the winning neuron
undergoes weight adjustment. Hence, it is a "winner-takes-all" strategy.
Stochastic learning
In this method, weights are adjusted in a probabilistic fashion. An example is evident in simulated annealing, the learning mechanism employed by Boltzmann and Cauchy machines, which are a kind of NN system.
A perceptron is a neural network unit (an artificial neuron) that does certain computations to detect
features or business intelligence in the input data.
A Perceptron is an algorithm for supervised learning of binary classifiers. This algorithm enables neurons to learn and to process elements in the training set one at a time.
The Perceptron was introduced by Frank Rosenblatt in 1957. He proposed a Perceptron learning rule based on the original MCP (McCulloch-Pitts) neuron.
Rosenblatt's Perceptron
The perceptron is a computational model of the retina of the eye and hence is named the 'perceptron'. The network comprises three units: the Sensory unit S, the Association unit A, and the Response unit R (refer Fig. 2.12).
The S unit, comprising 400 photodetectors, receives input images and provides a 0/1 electric signal as output. If the input signals exceed a threshold, the photodetector outputs 1, else 0.
The photodetectors are randomly connected to the Association unit A. The A unit comprises 'feature demons', or predicates. The predicates examine the output of the S unit for specific features of the image.
The third unit R comprises pattern recognizers or perceptrons, which receive the results of the predicates, also in binary form. While the weights of the S and A units are fixed, those of R are adjustable.
The output of the R unit could be such that if the weighted sum of its inputs is less than or equal to 0, then the output is 0; otherwise it is the weighted sum itself. It could also be determined by a step function with binary (0/1) or bipolar (-1/1) values. Thus, in the case of a step function yielding 0/1 output values, it is defined as

yj = ϕ(netj) = 1, if netj > 0
             = 0, otherwise

where netj = Σ (i = 1 to n) wij xi

Here, xi is the input, wij is the weight on the connection leading to the output units (the R unit), and yj is the output.
The training algorithm of the perceptron is a supervised learning algorithm in which weights are adjusted to minimize error whenever the computed output does not match the target output.
There are two types of Perceptrons: single layer and multilayer.
An arrangement of one input layer of neurons feeding forward to one output layer of neurons is known as a Single Layer Perceptron.
Multilayer perceptron
A multilayer perceptron (MLP) is a perceptron that teams up with additional perceptrons, stacked in
several layers, to solve complex problems. The diagram below shows an MLP with three layers. Each
perceptron in the first layer on the left (the input layer), sends outputs to all the perceptrons in the
second layer (the hidden layer), and all perceptrons in the second layer send outputs to the final layer
on the right (the output layer).
Each perceptron sends multiple signals, one signal going to each perceptron in the next layer. For each signal, the perceptron uses a different weight. In the diagram above, every line going from a perceptron in one layer to the next layer represents a different output. Each layer can have a large number of perceptrons, and there can be multiple layers, so the multilayer perceptron can quickly become a very complex system. The multilayer perceptron has another, more common name: a neural network. A three-layer MLP, like the diagram above, is called a non-deep or shallow neural network. An MLP with four or more layers is called a Deep Neural Network. One difference between an MLP and a neural network is that in the classic perceptron, the decision function is a step function and the output is binary. In neural networks that evolved from MLPs, other activation functions can be used, which result in outputs of real values, usually between 0 and 1 or between -1 and 1. This allows for probability-based predictions or classification of items into multiple labels.
Step 1: Create a perceptron with (n + 1) input neurons x0, x1, ..., xn, where x0 = 1 is the bias input. Let O be the output neuron.
Step 2: Initialize w = (w0, w1, ..., wn) to random weights.
Step 3: Iterate through the input patterns xj of the training set using the weight set, i.e. compute the weighted sum of inputs netj = Σ (i = 1 to n) xi wij for each input pattern j.
- Definition: Sets of points in two-dimensional space are linearly separable if the sets can be separated by a straight line.
- Generalizing, sets of points in n-dimensional space are linearly separable if there is a hyperplane of (n - 1) dimensions that separates the sets.
Example: (Figure: two point sets S1 and S2 shown in two arrangements, one in which a straight line separates them and one in which no straight line can.)
The perceptron cannot find weights for classification problems that are not linearly separable. An example is the XOR (eXclusive OR) problem.
Step 1: Initialization. Set w(0) = 0, then perform the following computations for time steps n = 1, 2, ...
Step 2: Activation. At time step n, activate the perceptron by applying the continuous-valued input vector x(n) and desired response d(n).
Step 3: Computation of actual response. Compute the actual response of the perceptron:
y(n) = sgn[w^T(n) x(n)]
where sgn(·) is the signum function.
Step 4: Adaptation of the weight vector. Update the weight vector using the desired response
d(n) = +1, if x(n) belongs to class C1
     = -1, if x(n) belongs to class C2
Thus, the adaptation of the weight vector w(n) is summed up nicely in the form of the error-correction
learning rule :
w(n + 1) = w(n) + α [d(n)-y(n)] x(n)
where α is the learning-rate parameter and the difference d(n) - y(n) plays the role of an error signal. The learning-rate parameter is a positive constant limited to the range 0 < α ≤ 1.
Example: A single-layer two-input perceptron is shown in the figure. The error is
e(p) = Yd(p) - Y(p), where p = 1, 2, 3, ... (e denotes the error, Yd(p) the desired output and Y(p) the current output)
and the weight update is
w(n + 1) = w(n) + α [d(n) - y(n)] x(n), where e = d(n) - y(n) is the error.
Example of perceptron learning: the logical operation AND (a code sketch follows).
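A minimal Python sketch of this error-correction rule trained on AND (the learning rate, epoch count and zero initialization are arbitrary illustrative choices):

alpha = 0.1                      # learning rate
w = [0.0, 0.0, 0.0]              # [bias weight w0, w1, w2]
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(w, x):
    net = w[0] + w[1] * x[0] + w[2] * x[1]   # bias input x0 = 1
    return 1 if net > 0 else 0

for epoch in range(20):          # AND is linearly separable, so the
    for x, d in data:            # perceptron converges in a few epochs
        err = d - predict(w, x)  # error e = d - y
        w[0] += alpha * err
        w[1] += alpha * err * x[0]
        w[2] += alpha * err * x[1]

print([predict(w, x) for x, _ in data])   # [0, 0, 0, 1]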
ADALINE Network
The Adaptive Linear Neural Element (ADALINE) network, framed by Bernard Widrow of Stanford University, makes use of supervised learning. Figure 2.19 illustrates a simple ADALINE network. Here, there is only one output neuron and the output values are bipolar (-1 or +1). However, the inputs xi could be binary, bipolar or real valued. The bias weight is w0, with an input link of x0 = +1. If the weighted sum of the inputs is greater than or equal to 0, then the output is 1, otherwise it is -1.
The supervised learning algorithm adopted by the network is similar to the perceptron learning algorithm. Devised by Widrow and Hoff (1960), the learning algorithm is also known as the Least Mean Square (LMS) or Delta rule. The rule is given by

Wi(new) = Wi(old) + α (t - y) xi

where α is the learning coefficient, t is the target output, y is the computed output, and xi is the input.
The ADALINE network has had the most successful applications, because it is used in virtually all high-speed modems and telephone switching systems to cancel the echo in long-distance communication circuits.
MADALINE Network
A MADALINE (Many ADALINE) network with two units exhibits the capability to solve the XOR problem (refer Fig. 2.21). In this, each ADALINE unit receives the input bits x1, x2 and the bias input x0 = 1 as its inputs. The weighted sum of the inputs is calculated and passed on to the bipolar threshold units. The logical 'AND'-ing (bipolar) of the two threshold outputs is computed to obtain the final output. Here, if the threshold outputs are both +1 or both -1, then the final output is +1; if the threshold outputs differ, i.e. (+1, -1), then the final output is -1. Inputs which are of even parity produce positive outputs, and inputs of odd parity produce negative outputs.
Figure 2.22 shows the decision boundaries for the XOR problem while trying to classify the even parity inputs (positive outputs) from the odd parity inputs (negative outputs).
(Fig. 2.21 shows the two ADALINE units: Part_1 with weights w0 = -2, w1 = 1, w2 = 3, and Part_2 with weights w0 = -1, w1 = 3, w2 = -1; their bipolar threshold outputs z' and z'' feed the 'AND' logic unit.)
Case 1: When x1 = 0, x2 = 0 for Part_1, the value of y' = x0 w0 + x1 w1 + x2 w2.
So, y' = 1 × (-2) + 0 × (1) + 0 × (3) = -2.
When this value passes through the threshold unit, it outputs -1, because -2 < 0; so z' = -1.
When x1 = 0, x2 = 0 for Part_2, the value of y'' = x0 w0 + x1 w1 + x2 w2.
So, y'' = 1 × (-1) + 0 × (3) + 0 × (-1) = -1.
Because -1 < 0, the threshold unit outputs -1; so z'' = -1.
When the values of z' and z'' pass to the 'AND' logic unit, it gives +1 as output, because both threshold results are the same. So the value of Y = +1 (positive output, even parity).
Case 2: When x1 = 0, x2 = 1 for Part_1:
y' = 1 × (-2) + 0 × (1) + 1 × (3) = -2 + 3 = +1
Since +1 > 0, the threshold unit outputs +1; so z' = +1.
Now, x1 = 0, x2 = 1 for Part_2:
y'' = 1 × (-1) + 0 × (3) + 1 × (-1) = -2
Since -2 < 0, the threshold unit outputs -1; so z'' = -1.
Because z' and z'' have different values, the 'AND' logic unit generates -1 as output.
So the value of Y = -1 (classified as odd parity).
Case 3: When x1 = 1, x2 = 0 for Part_1:
y' = 1 × (-2) + 1 × (1) + 0 × (3) = -1
Since -1 < 0, the threshold unit outputs -1; so z' = -1.
When x1 = 1, x2 = 0 for Part_2:
y'' = 1 × (-1) + 1 × (3) + 0 × (-1) = 2
Since 2 > 0, the threshold unit outputs +1; so z'' = +1.
Since z' and z'' have different values, the 'AND' logic unit generates -1 as output.
So the value of Y = -1 (classified as odd parity).
Case 4: When x1 = 1, x2 = 1 for Part_1:
y' = 1 × (-2) + 1 × (1) + 1 × (3) = 2
Since 2 > 0, the threshold unit outputs +1; so z' = +1.
Now, when x1 = 1, x2 = 1 for Part_2:
y'' = 1 × (-1) + 1 × (3) + 1 × (-1) = 1
Since +1 > 0, the threshold unit outputs +1; so z'' = +1.
Since z' and z'' have the same value, the 'AND' logic unit generates +1 as output.
So the value of Y = +1 (classified as even parity).
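The four cases can be verified with a short sketch (an added illustration; the unit weights are those read off from the computations above, and the output unit returns +1 exactly when the two threshold outputs agree, as stated for Fig. 2.21):

def bipolar_threshold(v):
    return 1 if v >= 0 else -1

def madaline_xor(x1, x2):
    z1 = bipolar_threshold(-2 + 1 * x1 + 3 * x2)    # Part_1 unit
    z2 = bipolar_threshold(-1 + 3 * x1 - 1 * x2)    # Part_2 unit
    return 1 if z1 == z2 else -1                    # +1 iff z' and z'' agree

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, madaline_xor(x1, x2))
# prints: (0,0) -> 1, (0,1) -> -1, (1,0) -> -1, (1,1) -> 1 (even parity -> +1)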
(From a comparison of conventional computers and neural networks.) Control mechanism: in a conventional computer there is a control unit for controlling computing activities, whereas in a neural network there is no specific control mechanism external to the computing task.
Associative Memories:
Associative memories, one of the major classes of neural networks, are faint imitations of the human brain's ability to associate patterns. An Associative Memory (AM) belongs to the class of single-layer feedforward or recurrent network architectures, depending on its association capability.
An associative memory is a storehouse of associated patterns which are encoded in some form. When the storehouse is triggered or incited with a pattern, the associated pattern pair is recalled or output. The input pattern could be an exact replica of a stored pattern, or a distorted or partial representation of one. Figure 4.1 illustrates the working of an associative memory.
In the figure, (∆, ɾ), (7, 4), and (+, +) are associated pattern pairs. The associations, represented using the '↔' symbol, are stored in the memory. When the memory is triggered, for instance with a ∆, the associated pattern ɾ is retrieved automatically.
Hetero-Associative Memory: If the associated pattern pairs (x, y) are different, and the model recalls a y given an x or vice versa, then it is termed a hetero-associative memory. Hetero-associative memories are useful for the association of patterns, and hetero-associative correlation memories are known as hetero-correlators.
Auto-Associative Memory: On the other hand, if x and y refer to the same pattern, then the model is termed an auto-associative memory. Auto-associative memories are useful for image refinement, i.e. if a distorted or partial pattern is given as input, then the whole pattern stored in its perfect form can be recalled. Auto-associative correlation memories are known as auto-correlators.
Associative memory models may be further classified into static and dynamic networks, based on the principle of recall.
(a) Static model: Static networks recall an output, given an input, in one feedforward pass. Static networks are therefore called non-recurrent.
(b) Dynamic model: Dynamic networks recall through an input/output feedback mechanism, which takes some time. Dynamic networks are termed recurrent.
The training vector pair is denoted as s:t and the testing vector as x, where s denotes the training input pattern and t the associated target.
The algorithm is as follows:
1. Initialize the weights for the different neurons (i = 1, 2, ..., n and j = 1, 2, ..., m), i.e. Wij = 0 (so W11 = W12 = W13 = ... = 0).
2. For each training input vector, set the target output vector. E.g., if the training set = (1, 0, 0, 1), then set the target = (0, 1).
3. Set the activations of the input units to the training input data: xi = si.
4. Set the activations of the output units to the desired output: yj = tj.
The weight matrix is obtained as the outer product of the training pair:

W = s^T t, i.e. the (n × m) product of the column vector (s1, ..., sn)^T with the row vector (t1, ..., tm)

This matrix product produces the weights, and the stored weights represent the association.
To find the weight matrix Wij for a set of associations
s(P) : t(P), P = 1, ..., P
where s(P) = (s1(P), s2(P), ..., sn(P)) and t(P) = (t1(P), t2(P), ..., tm(P)),
the weight matrix Wij can be given as

Wij = Σ (P = 1 to P) si(P) tj(P)
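A small illustrative sketch of this Hebb-rule storage (the function name is mine), using the training pair s = (1, 0, 0, 1), t = (0, 1) from step 2 of the algorithm:

def hebb_weights(pairs, n, m):
    # W[i][j] = sum over all pairs of s_i * t_j (sum of outer products)
    W = [[0] * m for _ in range(n)]
    for s, t in pairs:
        for i in range(n):
            for j in range(m):
                W[i][j] += s[i] * t[j]
    return W

W = hebb_weights([((1, 0, 0, 1), (0, 1))], n=4, m=2)
print(W)   # [[0, 1], [0, 0], [0, 0], [0, 1]]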
Delta Rule for Pattern Association: It is defined thus: "the adjustment made to a synaptic weight of a neuron is proportional to the product of the error signal and the input signal of the synapse." The aim of the delta rule is to minimize the error over all training patterns.
Delta Rule for Single Output Unit: The delta rule changes the weights of the connections so as to minimize the difference between the net input to the output unit, y_in, and the target value t. The delta rule is

ΔWi = α (t - y_in) xi
Delta Rule for Several Output Units: In this case, the weights are changed to reduce the difference between the computed output and the target. The weight correction for the connection from the i-th input unit to the j-th output unit is

ΔWij = α (tj - y_in,j) xi
AUTOCORRELATORS
Auto-associative memories, also known as autocorrelators and most readily recognized under the title of Hopfield Associative Memory (HAM), were introduced as a theoretical notion by Donald Hebb (1949). Other researchers who studied their dynamics include Little (1974), Little and Shaw (1978), and Hopfield (1982).
First-order autocorrelators obtain their connection matrix (indicative of the association of the pattern with itself) by multiplying each element of a pattern with every element of the same pattern, i.e. by taking the outer product of the pattern with itself. A first-order autocorrelator stores m bipolar patterns A1, A2, ..., Am by summing together m outer products as

T = Σ (i = 1 to m) [Ai^T][Ai]
Here, T = [tij] is a (p × p) connection matrix and Ai ∈ {-1, 1}^p. The autocorrelator's recall equation is a vector-matrix multiplication followed by a pointwise nonlinear threshold operation. The recall equation is given by

aj(new) = f(Σ (i = 1 to p) ai tij, aj(old)), for every j = 1, 2, ..., p

where Ai = (a1, a2, ..., ap) and the two-parameter bipolar threshold function is

f(α, β) = 1, if α > 0
        = β, if α = 0
        = -1, if α < 0
Example: Consider the bipolar patterns A1 = (-1, 1, -1, 1), A2 = (1, 1, 1, -1) and A3 = (-1, -1, -1, 1). Their connection matrix is

T = Σ (i = 1 to 3) [Ai^T](4×1) [Ai](1×4) =

[  3   1   3  -3 ]
[  1   3   1  -1 ]
[  3   1   3  -3 ]
[ -3  -1  -3   3 ]

Retrieval of a stored vector returns the vector itself. For instance, in the retrieval of A3 = (-1, -1, -1, 1),
(a1 new, a2 new, a3 new, a4 new) = (-1, -1, -1, 1)
yielding the same vector.
Now consider the vector A' = (1, 1, 1, 1). It is evident that A' is closer to A2 and therefore resembles it; in other words, it is a noisy version of A2.
Now, the computations
(a1 new, a2 new, a3 new, a4 new) = (f(4, 1), f(4, 1), f(4, 1), f(-4, 1))
= (1, 1, 1, -1)
= A2
Hence, in the case of partial or noisy vectors, an autocorrelator results in the refinement of the pattern, or removal of noise, to retrieve the closest matching stored pattern.
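The storage and recall equations can be sketched in a few lines of Python (an added illustration; helper names are mine). Storing the three patterns that generate the connection matrix T above, the noisy probe (1, 1, 1, 1) is refined back to A2:

def store(patterns):
    # T = sum of outer products [Ai^T][Ai] over the stored patterns
    p = len(patterns[0])
    T = [[0] * p for _ in range(p)]
    for A in patterns:
        for i in range(p):
            for j in range(p):
                T[i][j] += A[i] * A[j]
    return T

def f(alpha, beta):
    # two-parameter bipolar threshold function from above
    return 1 if alpha > 0 else (beta if alpha == 0 else -1)

def recall(a, T):
    p = len(a)
    return [f(sum(a[i] * T[i][j] for i in range(p)), a[j]) for j in range(p)]

A1, A2, A3 = (-1, 1, -1, 1), (1, 1, 1, -1), (-1, -1, -1, 1)
T = store([A1, A2, A3])
print(recall([1, 1, 1, 1], T))   # [1, 1, 1, -1], i.e. A2 is recovered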
Example 2: Use the Hebb rule to store the vector (1, 1, -1, -1) in an auto-associative neural net.
(a) Find the weight/connection matrix.
(b) Test the net with the input vector (1, 1, -1, -1).
(c) Test the net with one mistake in the input vector.
(d) Test the net with one missing entry in the input vector.
Solution: (a) To initialize the weights using the outer product Hebb rule,
T = Σ [Ai^T](4×1) [Ai](1×4)
Here Ai = [1, 1, -1, -1]; hence

T/W = [  1   1  -1  -1 ]
      [  1   1  -1  -1 ]
      [ -1  -1   1   1 ]
      [ -1  -1   1   1 ]
HETEROCORRELATORS: KOSKO'S DISCRETE BAM
The bidirectional associative memory (BAM), proposed by Kosko, stores heteroassociative pattern pairs (Ai, Bi). In the binary mode, ON = 1 and OFF = 0, and in the bipolar mode, ON = 1 and OFF = -1. Working with the bipolar versions (Xi, Yi) of the pattern pairs, we frame the correlation matrix

M = Σ (i = 1 to N) Xi^T Yi

To retrieve the nearest (Ai, Bi) pair given any pair (α, β), the recall equations are

β' = ϕ(α M),   α' = ϕ(β' M^T)

where ϕ is the bipolar threshold function (+1 if the net input is positive, -1 if it is negative, unchanged if it is zero). Starting with (α, β) as the initial condition, we determine a finite sequence (α', β'), (α'', β''), ... until an equilibrium point (αF, βF) is reached. The equilibrium point corresponds to a local minimum of the energy function.
Kosko proved that each cycle of decoding lowers the energy E if the energy function for any point (α, β) is given by

E = -α M β^T
However, if the energy E evaluated using the coordinates of the pair (Ai, Bi), i.e. E = -Ai M Bi^T, does not constitute a local minimum, then the pair cannot be recalled, even though one starts with α = Ai. In this respect, Kosko's encoding method does not ensure that the stored pairs are each at a local minimum.
Example: Suppose we have to train a network to store the following pattern pairs (N = 3):
A1 = (1 0 0 0 0 1)    B1 = (1 1 0 0 0)
A2 = (0 1 1 0 0 0)    B2 = (1 0 1 0 0)
A3 = (0 0 1 0 1 1)    B3 = (0 1 1 1 0)
Solution: Converting these to bipolar form:
X1 = ( 1 -1 -1 -1 -1  1)    Y1 = ( 1  1 -1 -1 -1)
X2 = (-1  1  1 -1 -1 -1)    Y2 = ( 1 -1  1 -1 -1)
X3 = (-1 -1  1 -1  1  1)    Y3 = (-1  1  1  1 -1)
The matrix M = Σ Xi^T Yi is calculated as

M = [  1   1  -3  -1   1 ]
    [  1  -3   1  -1   1 ]
    [ -1  -1   3   1  -1 ]
    [ -1  -1  -1   1   3 ]
    [ -3   1   1   3   1 ]
    [ -1   3  -1   1  -1 ]
Let us suppose we start with α = X3, hoping to retrieve the associated pair Y3:
β' = ϕ(α M) = ϕ(-6, 6, 6, 6, -6) = (-1, 1, 1, 1, -1)
β' M^T = (-5, -5, 5, -3, 7, 5)
ϕ(β' M^T) = (-1, -1, 1, -1, 1, 1) = α'
α' M = (-1, -1, 1, -1, 1, 1) M = (-6, 6, 6, 6, -6), so ϕ(α' M) = (-1, 1, 1, 1, -1) = β'
Here β' is the same as Y3. Hence, (αF, βF) = (X3, Y3) is the desired result.
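Finally, a hedged sketch of the BAM encode/recall cycle for this example (helper names are mine; on a zero net input the previous state is kept, matching the threshold rule above):

def outer_sum(X, Y):
    # M = sum of outer products Xi^T Yi over the bipolar pairs
    n, m = len(X[0]), len(Y[0])
    M = [[0] * m for _ in range(n)]
    for x, y in zip(X, Y):
        for i in range(n):
            for j in range(m):
                M[i][j] += x[i] * y[j]
    return M

def phi(v, prev):
    # bipolar threshold; ties keep the previous component
    return [1 if vi > 0 else (-1 if vi < 0 else pi) for vi, pi in zip(v, prev)]

def recall(alpha, M, steps=5):
    n, m = len(M), len(M[0])
    beta = [1] * m
    for _ in range(steps):   # alternate alpha -> beta -> alpha until stable
        beta = phi([sum(alpha[i] * M[i][j] for i in range(n)) for j in range(m)], beta)
        alpha = phi([sum(M[i][j] * beta[j] for j in range(m)) for i in range(n)], alpha)
    return alpha, beta

X = [(1, -1, -1, -1, -1, 1), (-1, 1, 1, -1, -1, -1), (-1, -1, 1, -1, 1, 1)]
Y = [(1, 1, -1, -1, -1), (1, -1, 1, -1, -1), (-1, 1, 1, 1, -1)]
M = outer_sum(X, Y)
print(recall(list(X[2]), M)[1])   # [-1, 1, 1, 1, -1], i.e. Y3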