

SCUnit 1 - Application of Soft Computing KCS056

Soft Computing (Dr. A.P.J. Abdul Kalam Technical University)


LECTURE NOTES ON
(KCS056) (APPLICATION OF SOFT COMPUTING)
(B.TECH.) (CSE/IT) (IVTH-YEAR) (VIITH-SEMESTER)

(AKTU)

(MR. MAN SINGH)


(ASSISTANT PROFESSOR)

(DEPARTMENT OF COMPUTER SCIENCE


ENGINEERING & INFORMATION TECHNOLOGY)
(UNITED INSTITUTE OF TECHNOLOGY,
PRAYAGRAJ)


UNIT – 1
(NEURAL NETWORKS-I)

CONTENTS
1. Introduction
1.1. What is Soft Computing?
1.2. Soft computing vs. hard computing
1.3. Definitions of Soft Computing (SC)
1.4. Goals of Soft Computing
1.5. Importance of Soft Computing
2. Neural Network
2.1. Structure of Biological neuron
2.2. Function of a Biological neuron
3. Artificial Neuron and its model
3.1. Activation functions
4. Neural network architecture
4.1 Single layer and multilayer feed forward networks
4.2 Recurrent networks.
5. Various learning techniques
5.1 Perceptron and convergence rule
5.2 Auto-associative memory
5.3 Hetero-associative memory


Introduction:
What is Soft Computing ?
 The idea of soft computing was initiated in 1981, when Lotfi A. Zadeh published his first paper on soft data analysis; he later revisited the idea in "What is Soft Computing?" (Soft Computing, Springer-Verlag, 1997).
 Zadeh defined Soft Computing as one multidisciplinary system: the fusion of the fields of Fuzzy Logic, Neuro-Computing, Evolutionary and Genetic Computing, and Probabilistic Computing.
 Soft Computing is the fusion of methodologies designed to model and enable solutions to real
world problems, which are not modeled or too difficult to model mathematically.
 The aim of Soft Computing is to exploit the tolerance for imprecision, uncertainty, approximate
reasoning, and partial truth in order to achieve close resemblance with human like decision
making.
 The Soft Computing development history:

SC = EC + NN + FL
  SC: Soft Computing (Zadeh, 1981)
  EC: Evolutionary Computing (Rechenberg, 1960)
  NN: Neural Network (McCulloch, 1943)
  FL: Fuzzy Logic (Zadeh, 1965)

EC = GP + ES + EP + GA
  GP: Genetic Programming (Koza, 1992)
  ES: Evolution Strategies (Rechenberg, 1965)
  EP: Evolutionary Programming (Fogel, 1962)
  GA: Genetic Algorithms (Holland, 1970)

Soft computing vs. hard computing:

Following points clearly differentiate the both:

 Soft Computing is tolerant of imprecision, uncertainty, partial truth and approximation, whereas Hard Computing requires a precisely stated analytic model.
 Soft Computing is based on fuzzy logic, neural nets, and probabilistic reasoning, whereas Hard Computing is based on binary logic, crisp systems, numerical analysis and crisp software.
 Soft computing has the characteristics of approximation and dispositionality, whereas hard computing has the characteristics of precision and categoricity.
 Soft computing can evolve its own programs, whereas hard computing requires programs to be written.


 Soft computing can use multi-valued or fuzzy logic, whereas hard computing uses two-valued logic.

Definitions of Soft Computing (SC)

 Lotfi A. Zadeh, 1992: "Soft computing is an emerging approach to computing which parallels the remarkable ability of the human mind to reason and learn in an environment of uncertainty and imprecision."
 Soft computing is not a concoction, mixture, or combination; rather, Soft computing is a partnership in which each of the partners contributes a distinct methodology for addressing problems in its domain. In principle, the constituent methodologies in Soft computing are complementary rather than competitive.

Soft computing may be viewed as a foundation component for the emerging field of Computational Intelligence.
Soft Computing consists of several computing paradigms, mainly:
Fuzzy sets: for knowledge representation via fuzzy If-Then rules.
Neural Networks: for learning and adaptation.
Genetic Algorithms: for evolutionary computation.

Goals of Soft Computing


Soft Computing is a new multidisciplinary field whose aim is to construct a new generation of Artificial Intelligence, known as Computational Intelligence.
 The main goal of Soft Computing is to develop intelligent machines to provide solutions to real
world problems, which are not modeled or too difficult to model mathematically.
 Its aim is to exploit the tolerance for Approximation, Uncertainty, Imprecision, and Partial Truth in
order to achieve close resemblance with human like decision making.
Approximation: the model features are similar to the real ones, but not the same.
Uncertainty: we are not sure that the features of the model are the same as those of the entity (belief).
Imprecision: the model features (quantities) are not the same as the real ones, but close to them.

Importance of Soft Computing


Soft computing differs from hard (conventional) computing. Unlike hard computing, soft computing is tolerant of imprecision, uncertainty, partial truth, and approximation. The guiding principle of soft computing is to exploit this tolerance to achieve tractability, robustness and low solution cost. In effect, the role model for soft computing is the human mind.

Neural Network
Neural Networks, which are simplified models of the biological neuron system, are massively parallel distributed processing systems made up of highly interconnected neural computing elements that have the ability to learn and thereby acquire knowledge and make it available for use.
Neural Networks (NNs) are also known as Artificial Neural Networks (ANNs), Connectionist Models, and
Parallel Distributed Processing (PDP) Models. Artificial Neural Networks are massively parallel
interconnected networks of simple (usually adaptive) elements and their hierarchical organizations which
are intended to interact with the objects of the real world in the same way as biological nervous systems
do.
A neural net is an artificial representation of the human brain that tries to simulate its learning process.
The term "artificial" means that neural nets are implemented in computer programs that are able to handle
the large number of necessary calculations during the learning process.

Structure of Biological neuron (Human brain):


The human brain is one of the most complicated things and, on the whole, is still poorly understood. However, the concept of neurons as the fundamental constituents of the brain has made the study of its functioning comparatively easier. The figure illustrates the physical structure of the human brain.

Fig. Physical structure of the human brain-cross-sectional view.


The brain contains about 10^10 basic units called neurons. Each neuron, in turn, is connected to about 10^4 other neurons. A neuron is a small cell that receives electro-chemical signals from its various sources and in turn responds by transmitting electrical impulses to other neurons.
An average brain weighs about 1.5 kg and an average neuron weighs about 1.5 × 10^-9 g. While some of the neurons perform input and output operations, the remaining form part of an interconnected network of neurons responsible for signal transformation and storage of information. However, despite their different activities, all neurons share common characteristics.
A neuron is composed of a nucleus and a cell body known as the soma. Attached to the soma are long, irregularly shaped filaments called dendrites. The dendrites behave as input channels, i.e. all inputs from other neurons arrive through the dendrites; they look like the branches of a tree in winter. Another type of link attached to the soma is the axon. Unlike the dendritic links, the axon is electrically active and serves as an output channel. The axon fires if the cumulative inputs received by the soma raise the internal electric potential of the cell, known as the membrane potential, above a threshold.
The axon terminates in a specialized contact called a synapse or synaptic junction that connects the axon with the dendritic links of another neuron. The synaptic junction, a very minute gap at the end of the dendritic link, contains a neuro-transmitter fluid. It is this fluid which is responsible for accelerating or retarding the electric charges to the soma. Each dendritic link can have many synapses acting on it, thus bringing about massive interconnectivity.
In general, a single neuron can have many synaptic inputs and synaptic outputs.

Fig: Biological Neuron Structure

There are many different types of neuron cells found in the nervous system. The differences are due to
their location and function.

Information flow in a Neural Cell

The input /output and the propagation of information are shown below.


Fig. Structure of a neural cell in the human brain

 Dendrites receive activation from other neurons.


 Soma processes the incoming activations and converts them into output activations.
 Axons act as transmission lines to send activation to other neurons.
 Synapses are the junctions that allow signal transmission between the axons and dendrites.
 The process of transmission is by diffusion of chemicals called neuro-transmitters.

Function of a Biological neuron:

The neurons perform basically the following function: all the inputs to the cell, which may vary by the
strength of the connection or the frequency of the incoming signal, are summed up. The input sum is
processed by a threshold function and produces an output signal.

The brain works in both a parallel and a serial way, as is readily apparent from the physical anatomy of the nervous system. That both serial and parallel processing are involved can be seen from the time needed to perform tasks. For example, a human can recognize the picture of another person in about 100 ms. Given the processing time of about 1 ms for an individual neuron, this implies that only a certain number of neurons, fewer than 100, are involved in series, whereas the complexity of the task is evidence for parallel processing, because a difficult recognition task cannot be performed by such a small number of neurons. This phenomenon is known as the 100-step rule.

Biological neural systems usually have a very high fault tolerance. Experiments with people with brain
injuries have shown that damage of neurons up to a certain level does not necessarily influence the


performance of the system, though tasks such as writing or speaking may have to be learned again. This
can be regarded as re-training the network.

In the following work no particular brain part or function will be modeled. Rather the fundamental brain
characteristics of parallelism and fault tolerance will be applied.

Artificial Neuron and its model:


As mentioned earlier, the human brain is no doubt a highly complex structure, viewed as a massive, highly interconnected network of simple processing elements called neurons. However, the behavior of a neuron can be captured by the simple model shown in Fig. 2.3. Every component of the model bears a direct analogy to the actual constituents of a biological neuron and hence it is termed an artificial neuron. It is this model which forms the basis of Artificial Neural Networks.

Fig. 2.3 Simple model of an artificial neuron.

Here, x1, x2, x3, ..., xn are the n inputs to the artificial neuron, and w1, w2, ..., wn are the weights attached to the input links.
Recollect that a biological neuron receives all inputs through the dendrites, sums them and produces an output. The input signals are passed on to the cell body through the synapses, which may accelerate or retard an arriving signal. An effective synapse which transmits a stronger signal will have a correspondingly larger weight, while a weak synapse will have a smaller weight. Thus, weights here are multiplicative factors of the inputs to account for the strength of the synapse. Hence, the total input I received by the soma of the artificial neuron is

I = x1w1 + x2w2 + ... + xnwn = Ʃ xiwi (summed over i = 1 to n)
To generate the final output y, the sum is passed on to a non-linear filter ϕ called Activation function, or
Transfer function, or Squash function which releases the output.


i.e. y = ϕ(I)
A very commonly used activation function is the thresholding function. In this, the sum I is compared with a threshold value θ. If I is greater than θ, then the output is 1, else it is 0, i.e.

y = ϕ(I - θ)

where ϕ is the step function known as the Heaviside function, such that

ϕ(x) = 1 if x > 0, and ϕ(x) = 0 otherwise

Figure 2.4 illustrates the thresholding function. This is convenient in the sense that the output signal is either 1 or 0, resulting in the neuron being on or off.

Fig. 2.4 Thresholding function.

Neurons in the same layer use the same activation function. Activation functions may be linear or non-linear, and many types are in use. A few are given here.

TYPES OF ACTIVATION FUNCTION


Heaviside function: It is also known as the step function. In this case

ϕ(I) = 1 if I > 0, and ϕ(I) = 0 otherwise

Example (using Fig. 2.3): to get the output, first calculate the net input.
Let x1 = 0.25, w1 = 0.10; x2 = 0.50, w2 = 0.40; x3 = 0.82, w3 = 0.90.
Now, I = Ʃ xiwi = x1w1 + x2w2 + x3w3
= 0.25 × 0.10 + 0.50 × 0.40 + 0.82 × 0.90
= 0.025 + 0.20 + 0.738
= 0.963
As I = 0.963 > 0, ϕ(I) = 1.
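The worked example above can be reproduced with a short Python sketch (illustrative code; the function names are my own):

```python
def step(I, theta=0.0):
    """Heaviside (step) activation: fires 1 when the net input exceeds the threshold."""
    return 1 if I > theta else 0

def neuron(inputs, weights, activation=step):
    """Artificial neuron: weighted sum of the inputs followed by an activation."""
    I = sum(x * w for x, w in zip(inputs, weights))
    return I, activation(I)

# Values from the example above.
I, y = neuron([0.25, 0.50, 0.82], [0.10, 0.40, 0.90])
print(round(I, 3), y)  # 0.963 1
```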

Identity function: The linear neuron or linear network is also called the identity function. In this case,
ϕ(I) = I
Example: with the same inputs and weights as above, I = 0.963, and since ϕ(I) = I, ϕ(0.963) = 0.963.


Signum function
Also known as the quantizer function, the function ϕ is defined as

ϕ(I) = +1 if I > θ, and ϕ(I) = -1 if I ≤ θ

Example: with the same inputs and weights as above, I = 0.963. Assume θ = 0.5.
As I = 0.963 > θ = 0.5, we have ϕ(I) = +1.
Figure 2.5 illustrates the Signum function.

Fig. 2.5 Signum function

Sigmoidal function
This function is a continuous function that varies gradually between the asymptotic values 0 and 1 (or -1 and +1). Sigmoidal functions are differentiable, which is an important feature of NN theory. It is given by

ϕ(I) = 1 / (1 + e^(-αI))

where α is the slope parameter, which adjusts the abruptness of the function as it changes between the two asymptotic values.
Example: with the same inputs and weights as above, I = 0.963. Assume the value of α to be 1.
Now, ϕ(I) = 1 / (1 + e^(-0.963)) ≈ 0.723
So ϕ(0.963) ≈ 0.723.

Figure 2.6 illustrates the sigmoidal function.

Fig. 2.6 Sigmoidal function.
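As a quick check of the example, the sigmoid can be evaluated in Python (a sketch; note the 0.723 above is this value truncated rather than rounded):

```python
import math

def sigmoid(I, alpha=1.0):
    """Sigmoidal activation: varies smoothly between 0 and 1; alpha sets the slope."""
    return 1.0 / (1.0 + math.exp(-alpha * I))

print(round(sigmoid(0.963), 4))  # 0.7237
```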

Mcculloch-Pitts Neuron Model:

The first formal definition of a synthetic neuron model, based on highly simplified considerations of the biological model, was formulated by Warren McCulloch and Walter Pitts in 1943.
The McCulloch-Pitts neuron allows binary states (0 or 1) only, so it is binary activated. The neurons are connected by directed weighted paths. A connection path can be excitatory or inhibitory: excitatory connections have positive weights and inhibitory connections have negative weights. In the McCulloch-Pitts neuron model the output is found as:

Y = 1 if Ʃ wixi ≥ ɸ (threshold), with the sum taken over i = 1 to n
Y = 0 if Ʃ wixi < ɸ (threshold)

where Y is the output produced by the McCulloch-Pitts neuron model, wi is the weight of an edge, xi is an input (excitatory or inhibitory) given to the neuron, and ɸ is the threshold value.

Fig. 1.8. McCulloch-pitts model.

The features of the McCulloch-Pitts neuron model are:


1. The activation of a McCulloch-Pitts neuron is binary.
2. A connection path is excitatory if its weight is positive, otherwise it is inhibitory. All excitatory connections into a particular neuron have the same weight.
3. Each neuron has a fixed threshold such that if the net input to the neuron is greater than or equal to the threshold, the neuron fires (outputs 1); otherwise it does not fire (outputs 0).
4. The threshold is set so that inhibition is absolute. That is, any non-zero inhibitory input will prevent the neuron from firing.

LOGIC GATE REALIZATION BY MCCULLOCH-PITTS NEURON MODEL
AND GATE: As per the truth table for the AND gate function, the output is 1 if both inputs are 1; otherwise it is 0.


Input output
X1 X2 Y
0 0 0
0 1 0
1 0 0
1 1 1

If we apply the concept of the AND gate as a function of the McCulloch-Pitts model, we get the output

Y = 1 if Ʃ wixi ≥ ɸ (threshold), with the sum taken over i = 1 to n
Y = 0 if Ʃ wixi < ɸ (threshold)

where ɸ is the threshold value for the McCulloch-Pitts neuron model. Let's take both weights as 1 here:

Fig. 1.9. McCulloch-Pitts model to perform AND function.

When both inputs are 1, the net input is 1 + 1 = 2, so the threshold is set to ɸ = 2.


Now calculate:
(i) W1x1 + W2x2 (when x1 = 1 and x2 = 1)
= 1 × 1 + 1 × 1
= 1 + 1
= 2
∴ W1x1 + W2x2 = ɸ = 2
So, the output is 1.
(ii) When x1 = 1 and x2 = 0
Then, W1x1 + W2x2
= 1 × 1 + 1 × 0
= 1 + 0
= 1
Since W1x1 + W2x2 < ɸ,
the output is zero.
(iii) When x1 = 0 and x2 = 1


Then, W1x1 + W2x2
= 1 × 0 + 1 × 1
= 0 + 1
= 1
Since W1x1 + W2x2 < ɸ,
the output is zero.
(iv) When x1 = 0 and x2 = 0
Then, W1x1 + W2x2
= 1 × 0 + 1 × 0
= 0 + 0
= 0
Since W1x1 + W2x2 < ɸ,
the output is zero.

Note: Actually we can take any integer value for the weights, but both weights should be the same, since the weights represent synaptic links and we want equal priority for both inputs.
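The case analysis above can be checked with a small Python sketch of a McCulloch-Pitts neuron (the function names are my own):

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts neuron: outputs 1 iff the weighted sum reaches the threshold."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= threshold else 0

def and_gate(x1, x2):
    # Both weights 1 and threshold 2, as chosen above.
    return mp_neuron((x1, x2), (1, 1), threshold=2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", and_gate(x1, x2))  # only (1, 1) fires
```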

OR Gate: Here the Pitts neuron is treated as an OR gate. As per the properties of the OR gate, it returns high (1) if any input is high and low (0) if no input is high. Truth table for the OR gate:

Input output
X1 X2 Y
0 0 0
0 1 1
1 0 1
1 1 1

Fig. 1.10. McCulloch-Pitts model to perform OR function.

As per Fig. 1.10, the threshold value for the Pitts neuron is 2. Now the output is calculated as

Output = 1 if Ʃ wixi ≥ 2, with the sum taken over i = 1 to n
Output = 0 if Ʃ wixi < 2

Now for inputs:


(i) x1 = 1, x2 = 1: net input = W1x1 + W2x2 = 2 × 1 + 2 × 1 = 4
Since 4 ≥ 2 (threshold), output = 1.
(ii) x1 = 1, x2 = 0: net input = 2 × 1 + 2 × 0 = 2
Since 2 = 2 (threshold), output = 1.
(iii) x1 = 0, x2 = 1: net input = 2 × 0 + 2 × 1 = 2
Since 2 = 2 (threshold), output = 1.
(iv) x1 = 0, x2 = 0: net input = 2 × 0 + 2 × 0 = 0
Since 0 < 2 (threshold), output = 0.

NOT Gate: Here the Pitts neuron in Fig. 1.11 is treated as a NOT gate. As per the NOT gate's properties, it returns true (1) if the input is false (0) and returns false (0) if the input is true (1). Truth table for the NOT gate:

Input(X) 1 0
Output(Y) 0 1

Fig. 1.11. Pitts Model for 'NOT'.

Y = 1 if Ʃ wixi ≥ ɸ (threshold), with the sum taken over i = 1 to n
Y = 0 if Ʃ wixi < ɸ (threshold)

Now at input X = 0, we have
(Weight) × (Input) = -1 × 0 = 0 ≥ 0 (threshold)
So, output Y = 1 (from the above definition of Y);
and for input X = 1,
(Weight) × (Input) = -1 × 1 = -1 < 0 (threshold)
So, output Y = 0,
which is correct as per the truth table of the NOT gate.

XOR Gate (Implementation): First we draw the truth table for XOR, as follows:


Input output
X1 X2 Y
0 0 0
0 1 1
1 0 1
1 1 0

Now we can write
X1 XOR X2 = (X1 AND (NOT X2)) OR (X2 AND (NOT X1))
Thus, we require implementations of the AND, OR and NOT gates to implement the XOR gate.

Fig. 1.12. Implementation of 'XOR' gate.

In the above diagram,
Neuron Z1 computes X1 AND (NOT X2),
Neuron Z2 computes X2 AND (NOT X1), and
Z computes Z1 OR Z2, the final output Y.
Now let's check this model for all possible inputs:
Case I: x1 = 0 and x2 = 0
Output of neuron Z1 = 0, since 2 × X1 + (-1) × X2 = 0 and 0 < ɸ (threshold), which is 2.
Output of neuron Z2 = 0, since both inputs are zero and 2 × X2 + (-1) × X1 = 0 and 0 < ɸ (threshold).
So the input to neuron Z = ƩWixi = 2 × 0 + 2 × 0 = 0,
and the output of neuron Z = 0, since ƩWixi < ɸ (threshold). So, output Y = 0.
Case II: X1 = 0 and X2 = 1
Input of Z1 = -1, thus the output of Z1 is 0, as -1 < 2 (ɸ);
input of Z2 = 2, thus the output of Z2 is 1, as 2 ≥ 2 (ɸ).
The input of Z is then 2, thus the output Y is 1.
Case III: X1 = 1 and X2 = 0 (same as Case II, due to the symmetry of the problem).
Case IV: X1 = 1 and X2 = 1
Input of Z1 is 2 × 1 + (-1) × 1 = 1,
thus the output of Z1 is 0, as 1 < 2 (threshold).
Input of Z2 is 2 × 1 + (-1) × 1 = 1,


Thus, the output of Z2 is also 0, as 1 < 2 (ɸ).
Now, the input of Z is 0 × 2 + 0 × 2 = 0; thus the output Y = 0.
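The four cases can likewise be verified in Python by wiring three McCulloch-Pitts neurons as in Fig. 1.12 (a sketch; the weights and thresholds follow the figure):

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts neuron: outputs 1 iff the weighted sum reaches the threshold."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= threshold else 0

def xor_gate(x1, x2):
    z1 = mp_neuron((x1, x2), (2, -1), threshold=2)   # X1 AND (NOT X2)
    z2 = mp_neuron((x1, x2), (-1, 2), threshold=2)   # X2 AND (NOT X1)
    return mp_neuron((z1, z2), (2, 2), threshold=2)  # Z1 OR Z2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_gate(x1, x2))  # matches the XOR truth table
```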

BENEFITS OF NEURAL NETWORKS


1. Non-linearity.
2. Input-output mapping.
3. Adaptivity.
4. Evidential response.
5. Contextual information.
6. Fault tolerance
7. VLSI implementability.
8. Uniformity of analysis and design.
9. Neurological analogy.

USES OF NEURAL NETWORKS


1. Signal processing
2. Control.
3. Pattern recognition
4. Medicine
5. Speech recognition.
6. Speech production.
7. Business.

NEURAL NETWORK ARCHITECTURES


An Artificial Neural Network is defined as a data processing system consisting of a large number of simple, highly interconnected processing elements (artificial neurons) in an architecture inspired by the structure of the cerebral cortex of the brain. Generally, an ANN structure can be represented using a directed graph. A graph G is an ordered 2-tuple (V, E) consisting of a set V of vertices and a set E of edges. When each edge is assigned an orientation, the graph is directed and is called a directed graph or a digraph. Figure 2.7 illustrates a digraph.
Digraphs assume significance in Neural Network theory since signals in NN systems are restricted to flow in specific directions.
The vertices of the graph may represent neurons (input/output) and the edges, the synaptic links. The edges are labelled by the weights attached to the synaptic links.


Fig. 2.7 An example digraph.

There are several classes of NN, classified according to their learning mechanisms. However, we identify three
fundamentally different classes of Networks. All the three classes employ the digraph structure for their
representation.

Single Layer Feed-forward Network

This type of network comprises two layers, namely the input layer and the output layer. The input layer neurons receive the input signals and the output layer neurons produce the output signals. The synaptic links carrying the weights connect every input neuron to every output neuron, but not vice versa. Such a network is said to be feedforward in type, or acyclic in nature. Despite having two layers, the network is termed single layer since it is the output layer alone which performs computation; the input layer merely transmits the signals to the output layer. Hence the name single layer feedforward network. Figure 2.8 illustrates an example network.

Fig. 2.8 Single layer feed-forward network.

Multilayer Feed-forward Network


This network, as its name indicates is made up of multiple layers. Thus, architectures of this class besides
possessing an input and an output layer also have one or more intermediary layers called hidden layers.
The computational units of the hidden layer are known as the hidden neurons or hidden units. The hidden
layer aids in performing useful intermediary computations before directing the input to the output layer.
The input layer neurons are linked to the hidden layer neurons and the weights on these links are referred
to as input-hidden layer weights. Again, the hidden layer neurons are linked to the output layer neurons


and the corresponding weights are referred to as hidden-output layer weights. A multilayer feed-forward network with l input neurons, m1 neurons in the first hidden layer, m2 neurons in the second hidden layer and n output neurons in the output layer is written as l - m1 - m2 - n. Figure 2.9 illustrates a multilayer feed-forward network with a configuration l - m - n.

Fig. 2.9 A multilayer feed-forward network (l - m - n configuration).

Recurrent Networks

These networks differ from feed-forward network architectures in the sense that there is at least one feedback loop. Thus, in these networks there could exist, for example, one layer with feedback connections, as shown in Fig. 2.10. There could also be neurons with self-feedback links, i.e. the output of a neuron is fed back into itself as input.

Fig. 2.10 A recurrent neural network.

CHARACTERISTICS OF NEURAL NETWORKS


(i) The NNs exhibit mapping capabilities, that is, they can map input patterns to their associated output
patterns.
(ii) The NNs learn by example. Thus, NN architectures can be 'trained' with known examples of a problem before they are tested for their 'inference' capability on unknown instances of the problem. They can, therefore, identify objects on which they were not previously trained.


(iii) The NNs possess the capability to generalize. Thus, they can predict new outcomes from past trends.
(iv) The NNs are robust systems and are fault tolerant. They can, therefore, recall full patterns from incomplete, partial or noisy patterns.
(v) The NNs can process information in parallel, at high speed, and in a distributed manner.

LEARNING METHODS/TECHNIQUES

Learning methods in Neural Networks can be broadly classified into three basic types: supervised,
unsupervised, and reinforced.

Supervised learning

In this, every input pattern used to train the network is associated with an output pattern, which is the target or desired pattern. A teacher is assumed to be present during the learning process; a comparison is made between the network's computed output and the correct expected output to determine the error. The error can then be used to change network parameters, which results in an improvement in performance.

Unsupervised learning

In this learning method, the target output is not presented to the network. It is as if there is no teacher to present the desired patterns; hence, the system learns on its own by discovering and adapting to structural features in the input patterns.

Reinforced learning

In this method, a teacher though available, does not present the expected answer but only indicates if the
computed output is correct or incorrect. The information provided helps the network in its learning


process. A reward is given for a correct answer and a penalty for a wrong answer. However, reinforced learning is not one of the popular forms of learning.
Supervised and unsupervised learning methods, which are the most popular forms of learning, have found expression through various rules. Some of the widely used rules are presented below:

Hebbian learning

This rule was proposed by Hebb (1949) and is based on correlative weight adjustment. It is the oldest learning mechanism inspired by biology.
In this, the input-output pattern pairs (Xi, Yi) are associated by the weight matrix W, known as the correlation matrix. It is computed as

W = Ʃ Xi YiT (summed over the pattern pairs i)

Here, YiT is the transpose of the associated output vector Yi.
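This correlative update can be sketched with NumPy; the pattern pairs below are made up purely for illustration:

```python
import numpy as np

# Bipolar input/output pattern pairs (Xi, Yi); illustrative values only.
patterns = [
    (np.array([1, -1, 1]), np.array([1, -1])),
    (np.array([-1, 1, 1]), np.array([-1, 1])),
]

# W = sum of Xi Yi^T over all pattern pairs (the correlation matrix).
W = np.zeros((3, 2))
for x, y in patterns:
    W += np.outer(x, y)

print(W)
```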

Gradient descent learning


This is based on the minimization of the error E, defined in terms of the weights and the activation function of the network. It is also required that the activation function employed by the network be differentiable, as the weight update is dependent on the gradient of the error E.
Thus, if ∆Wij is the weight update of the link connecting the ith and jth neurons of the two neighbouring layers, then ∆Wij is defined as

∆Wij = -η (∂E/∂Wij)

where η is the learning rate parameter and ∂E/∂Wij is the error gradient with reference to the weight Wij.
The Widrow and Hoff Delta rule and the Back-propagation learning rule are examples of this type of learning mechanism.
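For a single linear neuron with squared error E = ½(t - y)², the gradient gives the update ∆wi = η(t - y)xi. A minimal sketch of this delta rule (the learning rate, epoch count and toy data are assumptions for illustration):

```python
def delta_rule_step(w, x, t, eta=0.1):
    """One Widrow-Hoff (delta rule) update for a linear neuron y = sum(w*x)."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    err = t - y                                   # (target - output)
    return [wi + eta * err * xi for wi, xi in zip(w, x)]

# Toy target function t = 2*x1 - x2 (made-up data).
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
w = [0.0, 0.0]
for _ in range(200):                              # repeated presentation of the set
    for x, t in data:
        w = delta_rule_step(w, x, t)
print([round(wi, 2) for wi in w])  # approaches [2.0, -1.0]
```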

Competitive learning

In this method, those neurons which respond strongly to input stimuli have their weights updated.
When an input pattern is presented, all neurons in the layer compete and the winning neuron
undergoes weight adjustment. Hence, it is a "winner-takes-all" strategy.
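A winner-takes-all step can be sketched as follows (the distance-based matching rule, learning rate and input data are assumptions for illustration):

```python
import numpy as np

def competitive_step(W, x, eta=0.5):
    """Update only the winning neuron: the one whose weight vector is closest to x."""
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    W[winner] += eta * (x - W[winner])   # move the winner toward the input
    return winner

rng = np.random.default_rng(0)
W = rng.random((2, 2))                   # two competing neurons with 2-D weights
inputs = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
for _ in range(20):
    for x in inputs:
        competitive_step(W, x)
print(np.round(W, 3))  # each weight vector settles on one of the two inputs
```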

Stochastic learning


In this method, weights are adjusted in a probabilistic fashion. An example is evident in simulated annealing, the learning mechanism employed by Boltzmann and Cauchy machines, which are a kind of NN system.

Perceptron and convergence rule

A perceptron is a neural network unit (an artificial neuron) that does certain computations to detect features or business intelligence in the input data.
A perceptron is an algorithm for supervised learning of binary classifiers. This algorithm enables neurons to learn and process elements in the training set one at a time.
The perceptron was introduced by Frank Rosenblatt in 1957. He proposed a perceptron learning rule based on the original MCP (McCulloch-Pitts) neuron.

Rosenblatt's Perceptron

The perceptron is a computational model of the retina of the eye and hence is named 'perceptron'. The network comprises three units: the Sensory unit S, the Association unit A, and the Response unit R (refer to Fig. 2.12).

Fig. 2.12 Rosenblatt's original perceptron model.


The S unit, comprising 400 photodetectors, receives input images and provides a 0/1 electric signal as output. If the input signals exceed a threshold, the photodetector outputs 1, else 0.
The photodetectors are randomly connected to the Association unit A. The A unit comprises feature demons, or predicates. The predicates examine the output of the S unit for specific features of the image.
The third unit R comprises pattern recognizers, or perceptrons, which receive the results of the predicates, also in binary form. While the weights of the S and A units are fixed, those of R are adjustable.
The output of the R unit could be such that if the weighted sum of its inputs is less than or equal to 0, then the output is 0, otherwise it is the weighted sum itself. It could also be determined by a step function with binary (0/1) or bipolar (-1/1) values. Thus, in the case of a step function yielding 0/1 output values, it is defined as
yj = ɸ(netj) = 1, if netj > 0
             = 0, otherwise
where netj = Ʃ wij xi, summed over i = 1 to n.
Here, xi is the input, wij is the weight on the connection leading to the output units (R unit), and yj is the
output.
The training algorithm of the perceptron is a supervised learning algorithm where weights are adjusted to
minimize error whenever the computed output does not match the target output.
There are two types of Perceptrons: Single layer and Multilayer.

Single layer Perceptron

An arrangement of one input layer of neurons feed forward to one output layer of neurons is known as
Single Layer Perceptron.

Fig. 2.13 A simple perceptron model.


yj = φ(netj) = 1, if netj ≥ 0
             = 0, if netj < 0
where netj = ∑(i=1..n) xi wij
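As a quick sketch of the step-function computation above, a single-layer perceptron output can be written directly in Python (the example weights in the usage note are illustrative, not taken from the text):

```python
def perceptron_output(x, w):
    """Step-function output of a single-layer perceptron unit.

    x and w are equal-length sequences; net = sum(x_i * w_ij).
    Returns 1 if net >= 0, else 0, matching the definition above.
    """
    net = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if net >= 0 else 0
```

For example, with x = (1, 0) and w = (0.5, -0.5) the net input is 0.5, so the output is 1.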


Multilayer perceptron
A multilayer perceptron (MLP) is a perceptron that teams up with additional perceptrons, stacked in
several layers, to solve complex problems. The diagram below shows an MLP with three layers. Each
perceptron in the first layer on the left (the input layer), sends outputs to all the perceptrons in the
second layer (the hidden layer), and all perceptrons in the second layer send outputs to the final layer
on the right (the output layer).


Fig. 2.14 A multilayer feedforward perceptron model.

Each perceptron sends multiple signals, one signal going to each perceptron in the next layer. For each
signal, the perceptron uses different weights. In the diagram above, every line going from a perceptron
in one layer to the next layer represents a different output. Each layer can have a large number of
perceptrons, and there can be multiple layers, so the multilayer perceptron can quickly become a very
complex system. The multilayer perceptron has another, more common name: a neural network. A
three-layer MLP, like the diagram above, is called a Non-Deep or Shallow Neural Network. An MLP
with four or more layers is called a Deep Neural Network. One difference between an MLP and a neural
network is that in the classic perceptron, the decision function is a step function and the output is binary.
In neural networks that evolved from MLPs, other activation functions can be used which result in
outputs of real values, usually between 0 and 1 or between -1 and 1. This allows for probability-based
predictions or the classification of items into multiple labels.
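A minimal sketch of one forward pass through such a shallow (three-layer) MLP follows; the layer sizes, weights, and the choice of a sigmoid activation are illustrative assumptions, not values from the text:

```python
import math

def sigmoid(z):
    """Real-valued activation in (0, 1), as used in networks that evolved from the MLP."""
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, w_hidden, w_output):
    """One forward pass: input layer -> hidden layer -> output layer.

    w_hidden and w_output are lists of weight vectors, one per neuron
    in the hidden and output layers respectively.
    """
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for w in w_hidden]
    return [sigmoid(sum(wi * hi for wi, hi in zip(w, hidden))) for w in w_output]

# Illustrative 2-2-1 network: two inputs, two hidden neurons, one output
y = mlp_forward([1.0, 0.0], [[0.5, -0.5], [0.3, 0.8]], [[1.0, -1.0]])
```

Because of the sigmoid, the output is a real value between 0 and 1 rather than a binary step output.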

Learning Algorithm : Training Perceptron


The training of Perceptron is a supervised learning algorithm where weights are adjusted to
minimize error whenever the output does not match the desired output.

Downloaded by Deepansh Sharma ([email protected])


lOMoARcPSD|50004020

A basic learning algorithm for training the perceptron is as follows:


• If the output is correct, then no adjustment of weights is done:
Wij (k + 1) = Wij (k)
• If the output is 1 but should have been 0, then the weights are decreased on the active input links:
Wij (k + 1) = Wij (k) - α.xi
• If the output is 0 but should have been 1, then the weights are increased on the active input links:
Wij (k + 1) = Wij (k) + α.xi
Here, Wij (k + 1) is the new adjusted weight, Wij (k) is the old weight, xi is the input and α is the learning rate
parameter. A small α leads to slow learning and a large α to fast learning. However, a large α also runs
the risk of allowing the weights to oscillate about the values which would give the correct outputs. For a
constant α, the learning algorithm is termed the fixed increment algorithm.

Perceptron Learning Algorithm


The fixed increment perceptron learning algorithm for a classification problem with n input features (x1, x2, ...,
xn) and two output classes (0/1) is illustrated step by step below.

Step 1: Create a perceptron with (n + 1) input neurons x0, x1, ..., xn, where x0 = 1 is the bias input.
Let O be the output neuron.
Step 2: Initialize w = (w0, w1, ..., wn) to random weights.
Step 3: Iterate through the input patterns xj of the training set using the weight set, i.e. compute the
weighted sum of inputs netj = ∑(i=1..n) xi wij for each input pattern j.
Step 4: Compute the output yj using the step function
yj = φ(netj) = 1, if netj > 0
             = 0, otherwise
Step 5: Compare the computed output yj with the target output tj for each input pattern j. If all the input
patterns have been classified correctly, output the weights and exit.
Step 6: Otherwise, update the weights as given below:
If the output is 1 but should have been 0, then the weights are decreased on the active input links:
Wij (k + 1) = Wij (k) - α.xi
If the output is 0 but should have been 1, then the weights are increased on the active input links:
Wij (k + 1) = Wij (k) + α.xi
(Here, Wij (k + 1) is the new adjusted weight, Wij (k) is the old weight, xi is the input and α is the learning
rate parameter.)
Step 7: Go to Step 3.
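The seven steps above can be sketched as a short Python routine. For determinism the weights here start at zero rather than at random values, and the learning rate defaults to α = 0.2; both choices are illustrative:

```python
def train_perceptron(samples, alpha=0.2, max_epochs=100):
    """Fixed increment perceptron learning (Steps 1-7 above).

    samples: list of (inputs, target) pairs with 0/1 targets.
    A bias input x0 = 1 is prepended to every pattern.
    """
    n = len(samples[0][0])
    w = [0.0] * (n + 1)                       # Step 2 (zero, not random, here)
    for _ in range(max_epochs):
        errors = 0
        for x, t in samples:
            xb = [1] + list(x)                # bias input x0 = 1
            net = sum(wi * xi for wi, xi in zip(w, xb))   # Step 3
            y = 1 if net > 0 else 0                       # Step 4
            if y != t:                                    # Step 6
                errors += 1
                delta = alpha if t == 1 else -alpha
                w = [wi + delta * xi for wi, xi in zip(w, xb)]
        if errors == 0:                       # Step 5: all patterns correct
            break
    return w

# Train on the logical AND function
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND)
```

After a few epochs the returned weights classify all four AND patterns correctly.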


Perceptron and Linearly Separable Task

The perceptron cannot handle tasks which are not linearly separable.

- Definition: Sets of points in 2-D space are linearly separable if the sets can be separated by a
straight line.
- Generalizing, sets of points in n-dimensional space are linearly separable if there is a hyperplane of
(n - 1) dimensions that separates the sets.
Example

(a) Linearly separable patterns (sets S1, S2) (b) Not linearly separable patterns

The perceptron cannot find weights for classification problems that are not linearly separable. An
example is the XOR (eXclusive OR) problem.

Summary of the Perceptron Convergence Algorithm:

Step 1: Initialization -- Set w(0) = 0. Then perform the following computations for time steps n = 1, 2, ...

Step 2: Activation - At time step n, activate the perceptron by applying the continuous-valued input vector
x(n) and desired response d(n).

Step 3: Computation of actual response - Compute the actual response of the perceptron:

y(n) = sgn[wᵀ(n) x(n)]

where sgn(.) is the signum function.


Step 4: Adaptation of weight vector - Update the weight vector of the perceptron:

w(n + 1) = w(n) + α [d(n) - y(n)] x(n)

where d(n) = +1 if x(n) belongs to class C1
           = -1 if x(n) belongs to class C2

Step 5: Continuation - Increment time step n by one and go back to Step 2.



Variables and Parameters :


x(n) = (m + 1)-by-1 input vector = [+1, x1(n), x2(n), ..., xm(n)]ᵀ
w(n) = (m + 1)-by-1 weight vector = [b(n), w1(n), w2(n), ..., wm(n)]ᵀ
b(n) = bias
y(n) = actual response (quantized)

d(n) = desired response

α = learning-rate parameter, a positive constant less than unity


The quantized desired response d(n) is given by

d(n) = +1 if x(n) belongs to class C1
     = -1 if x(n) belongs to class C2

Thus, the adaptation of the weight vector w(n) is summed up nicely in the form of the error-correction
learning rule :
w(n + 1) = w(n) + α [d(n)-y(n)] x(n)

where α is the learning-rate parameter and the difference d(n)-y(n) plays the role of an error signal. The
learning-rate parameter is a positive constant limited to the range
0 < α ≤ 1.
Example: A single-layer two-input perceptron is shown in the figure, with threshold θ = 0.1 and learning
rate α = 0.2.

Solution: The sequence of four input patterns represents an epoch. The four input patterns (i.e., training
examples) are (x1, x2) = (0, 0), (x1, x2) = (0, 1), (x1, x2) = (1, 0), and (x1, x2) = (1, 1).


According to the signum activation function, the error is
e(p) = Yd(p) - Y(p), where p = 1, 2, 3, ... (e denotes the error, Yd(p) the desired output and Y(p) the current output)
and the weights are updated by the error-correction rule
w(n + 1) = w(n) + α [d(n) - y(n)] x(n) (where e = d(n) - y(n))
Example of perceptron learning: the logical operation AND


ADALINE Network

The Adaptive Linear Neural Element (ADALINE) network, framed by Bernard Widrow of Stanford University,
makes use of supervised learning. Figure 2.19 illustrates a simple ADALINE network. Here, there
is only one output neuron and the output values are bipolar (-1 or +1). However, the inputs xi
could be binary, bipolar or real valued. The bias weight is w0 with an input link of x0 = +1. If the
weighted sum of the inputs is greater than or equal to 0, then the output is 1, otherwise it is -1.
The supervised learning algorithm adopted by the network is similar to the perceptron learning
algorithm. Devised by Widrow and Hoff (1960), the learning algorithm is also known as the Least
Mean Square (LMS) or Delta rule. The rule is given by
Wi(new) = Wi(old) + α (t - y) xi
where α is the learning coefficient, t is the target output, y is the computed output, and xi is the
input.
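A sketch of one Widrow-Hoff update follows. Note that, as in the Delta rule, the sketch takes y as the linear net input (before the bipolar threshold is applied), which is the usual LMS formulation; the input vector and learning rate in the usage below are illustrative:

```python
def adaline_update(w, x, target, alpha=0.1):
    """One LMS / Delta rule update: Wi(new) = Wi(old) + alpha * (t - y) * xi.

    y is taken as the linear weighted sum of the inputs, i.e. the
    output before the bipolar threshold is applied.
    """
    y = sum(wi * xi for wi, xi in zip(w, x))    # linear output
    return [wi + alpha * (target - y) * xi for wi, xi in zip(w, x)]

# Repeated updates drive the linear output toward the target
w = [0.0, 0.0]
for _ in range(50):
    w = adaline_update(w, [1.0, 1.0], 1.0)
```

Each update reduces the error (t - y) geometrically for a suitably small α.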

Fig. 2.19 A simple ADALINE network.

The ADALINE network has had the most successful applications because it is used in virtually
all high-speed modems and telephone switching systems to cancel the echo in long distance
communication circuits.

MADALINE Network

A MADALINE (Many ADALINEs) network is created by combining a number of ADALINEs. The
network of ADALINEs can span many layers. Figure 2.20 illustrates a simple MADALINE network. The
use of multiple ADALINEs helps counter the problem of non-linear separability. For example, the


MADALINE network with two units exhibits the capability to solve the XOR problem (refer Fig. 2.21). In
this, each ADALINE unit receives the input bits x1, x2 and the bias input x0 = 1 as its inputs. The
weighted sum of the inputs is calculated and passed on to the bipolar threshold units. The logical 'AND'ing
(bipolar) of the two threshold outputs is computed to obtain the final output. Here, if the threshold
outputs are both +1 or both -1, then the final output is +1. If the threshold outputs are different, i.e. (+1, -1),
then the final output is -1. Inputs of even parity produce positive outputs and inputs of odd parity
produce negative outputs.

Fig. 2.20 MADALINE network

Figure 2.22 shows the decision boundaries for the XOR problem while trying to classify the even
parity inputs (positive outputs) from the odd parity inputs (negative outputs).



Fig. 2.21 A MADALINE network to solve the XOR problem.

The learning rule adopted by the MADALINE network is termed the 'MADALINE Adaptation Rule'
(MR) and is a form of supervised learning. In this method, the objective is to adjust the
weights such that the error is minimum for the current training pattern, but with as little
damage as possible to the learning acquired through previous training patterns.

Example: The inputs for Part_1 are taken as x0 = 1, x1 and x2, and its weights as w0 = -2, w1 = 1 and
w2 = 3, while for Part_2 the weights are taken as w0 = -1, w1 = 3 and w2 = -1.
Here the values of x1 and x2 for Part_1 and Part_2 can be taken as:
(i) x1 = 0, x2 = 0
(ii) x1 = 0, x2 = 1
(iii) x1 = 1, x2 = 0
(iv) x1 = 1, x2 = 1

Case 1:
When (i) x1 = 0, x2 = 0 for Part_1, the value of y' = x0w0 + x1w1 + x2w2.
So, y' = 1 × (-2) + 0 × (1) + 0 × (3) = -2.
When this passes through the threshold unit, it outputs -1 because -2 < 0; so the value of z' = -1.
Now, when (i) x1 = 0, x2 = 0 for Part_2, the value of y'' = x0w0 + x1w1 + x2w2,
so y'' = 1 × (-1) + 0 × (3) + 0 × (-1) = -1.
Because -1 < 0, the threshold unit passes the value -1 as output; so the value of z'' = -1.
When the values of z' and z'' are passed to the 'AND' logic unit, it gives the value +1 as output because
both threshold results are the same. So, the value of Y = +1 (positive output, even parity).
Case 2:
When (ii) x1 = 0, x2 = 1 for Part_1,
y' = 1 × (-2) + 0 × (1) + 1 × (3) = -2 + 3 = +1.
Since +1 > 0, the threshold unit passes +1; so z' = +1.
Now, for x1 = 0 and x2 = 1 in Part_2,
y'' = 1 × (-1) + 0 × (3) + 1 × (-1) = -2.
Since -2 < 0, the threshold unit passes -1; so z'' = -1.


Because z' and z'' have different values, the 'AND' logic unit generates -1 as output.
So, the value of Y = -1 (classified as odd parity).

Case 3:
When (iii) x1 = 1, x2 = 0 for Part_1,
y' = 1 × (-2) + 1 × (1) + 0 × (3) = -1.
Since -1 < 0, the threshold unit gives -1 as output; so z' = -1.
When x1 = 1 and x2 = 0 for Part_2,
y'' = 1 × (-1) + 1 × (3) + 0 × (-1) = 2.
Since 2 > 0, the threshold unit gives +1 as output; so z'' = +1.
Since z' and z'' have different values, the 'AND' logic unit generates -1 as output.
So, the value of Y = -1 (classified as odd parity).

Case 4:
When (iv) x1 = 1, x2 = 1 for Part_1, then y' = 1 × (-2) + 1 × (1) + 1 × (3) = 2.
Since 2 > 0, the threshold unit gives +1 as output; so z' = +1.
Now, when x1 = 1 and x2 = 1 for Part_2, then y'' = 1 × (-1) + 1 × (3) + 1 × (-1) = 1.
Since +1 > 0, the threshold unit gives +1 as output; so z'' = +1.
Since z' and z'' have the same value, the 'AND' logic unit generates +1 as output. So, the
value of Y = +1 (classified as even parity).
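The four cases above can be checked mechanically. The sketch below hard-codes the example's fixed weights (Part_1: w = (-2, 1, 3); Part_2: w = (-1, 3, -1)) and reproduces the even/odd parity outputs:

```python
def madaline_parity(x1, x2):
    """Forward pass of the two-unit MADALINE from the example above."""
    def bipolar(net):
        # bipolar threshold unit: +1 for net >= 0, else -1
        return 1 if net >= 0 else -1
    z1 = bipolar(1 * (-2) + x1 * 1 + x2 * 3)     # Part_1: w0=-2, w1=1, w2=3
    z2 = bipolar(1 * (-1) + x1 * 3 + x2 * (-1))  # Part_2: w0=-1, w1=3, w2=-1
    return 1 if z1 == z2 else -1                 # bipolar 'AND'ing of z1, z2
```

Even-parity inputs (0, 0) and (1, 1) yield +1, and odd-parity inputs yield -1, i.e. the complement of XOR.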

Difference between ANN and human brain:


The following points differentiate an ANN from the biological (real) neural network:

• Speed: An ANN is faster in processing information; its response time is in nanoseconds. A biological
network is slower; its response time is in milliseconds.
• Processing: An ANN uses serial processing; the brain uses massively parallel processing.
• Size & Complexity: An ANN has less size and complexity and does not perform complex pattern
recognition tasks. The brain is a highly complex and dense network of interconnected neurons, of the
order of 10^11 neurons with 10^15 interconnections.
• Storage: In an ANN, information storage is replaceable: new data can be added by deleting old data. In
the brain, information storage is adaptable: new information is added by adjusting the interconnection
strengths without destroying old information.
• Fault tolerance: An ANN is fault intolerant: information once corrupted cannot be retrieved in case of
failure of the system. The brain is fault tolerant: its performance degrades gracefully under partial
damage to the network.


• Control Mechanism: An ANN has a control unit for controlling computing activities. In the brain, there is
no specific control mechanism external to the computing task.

Associative Memories:

Associative Memories, one of the major classes of neural networks, are faint imitations of the
human brain's ability to associate patterns. An Associative Memory (AM) belongs to the class of
single-layer feed-forward or recurrent network architectures, depending on its association
capability.
An associative memory is a storehouse of associated patterns which are encoded in some form.
When the storehouse is triggered or incited with a pattern, the associated pattern pair is recalled
or output. The input pattern could be an exact replica of the stored pattern or a distorted or partial
representation of a stored pattern. Figure 4.1 illustrates the working of an associative memory.

Fig. 4.1 The working of an associative memory.

In the figure, (∆, ɾ), (7, 4), and (+, +) are associated pattern pairs. The associations, represented
using the '↔' symbol, are stored in the memory. When the memory is triggered, for instance with a ∆,
the associated pattern ɾ is retrieved automatically.
Hetero-Associative Memory: If the associated pattern pairs (x, y) are different and the model
recalls a y given an x, or vice versa, then it is termed a hetero-associative memory. Hetero-associative
memories are useful for the association of patterns, and hetero-associative correlation
memories are known as hetero-correlators.

Auto-Associative Memory: On the other hand, if x and y refer to the same pattern, then the
model is termed an auto-associative memory. Auto-associative memories are useful for image
refinement, i.e. if a distorted or a partial pattern is given as input then the whole pattern stored in
its perfect form can be recalled. Auto-associative correlation memories are known as auto-correlators.

(a) Heteroassociative memory (b) Autoassociative memory

Associative memory models may be further classified into static and dynamic network based on
the principle of recall.
(a) Static model: Static networks recall an output given an input in one feed-forward pass. Static
networks are therefore called non-recurrent.
(b) Dynamic model: Dynamic networks recall through an input/output feedback mechanism, which
takes some time. Dynamic networks are termed recurrent.

Fig. Static model


Fig. Dynamic model

ALGORITHM FOR PATTERN ASSOCIATION


The associative net involves the two common methods of training used for a single-layer network:
(i) Hebb Rule

(ii) Delta Rule

Hebb Rule for Pattern Association


This is the simplest and most frequently used method for determining the weights for an associative memory
neural net. It can be used for both binary (0, 1) as well as bipolar (-1, 1) patterns. The training and
testing vectors are different.

The training pair is denoted as s : t and the testing vector as x, where s denotes the training pattern.
The algorithm is as follows:
1. Initialize the weights for the different neurons (i = 1, 2, 3, ..., n and j = 1, 2, 3, ..., m),
i.e. wij = 0 (w11 = w12 = w13 = ... = 0).
2. For each training input vector, set the target output vector.
e.g. if the training set = (1, 0, 0, 1), then set the target = (0, 1).
3. Set the activation function for the input units to present the training input data:
xi = si


4. Set the activation function for the output units so that we get the desired output:
yj = tj
5. If the desired output is not obtained, then adjust the weights:
Wij(new) = Wij(old) + xi yj

Hebb Outer Product Rule


The outer product of two vectors can be used to find weights based on Hebb rule.

Let S = [s1, s2, ..., si, ..., sn] be a 1 x n training pattern, and let the target vector
t = (t1, t2, ..., tj, ..., tm) be a 1 x m matrix. The weight matrix is the matrix (outer) product of Sᵀ and t:

            s1
W = Sᵀ t =  ..   [t1 ... tj ... tm]
            ..
            sn

This matrix product produces the weights, and the stored weights represent the association.
To find the weight matrix W = [wij] for a set of associations
s(P) : t(P), P = 1, ..., P
where s(P) = (s1(P), s2(P), ..., sn(P)) and t(P) = (t1(P), t2(P), ..., tm(P)),
the weight matrix is given by

W = [wij] = ∑(P=1..P) sᵀ(P) t(P)
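A short sketch of this outer product computation in Python; the example pair below auto-associates a bipolar vector with itself for easy hand-checking:

```python
def hebb_outer_product(s, t):
    """Weight matrix W = S^T t for one training pair.

    s is a 1 x n training pattern and t a 1 x m target vector;
    the result is the n x m matrix with w_ij = s_i * t_j.
    Summing such matrices over all pairs gives the stored weights.
    """
    return [[si * tj for tj in t] for si in s]

# Auto-association of the bipolar vector (1, 1, -1, -1) with itself
W = hebb_outer_product([1, 1, -1, -1], [1, 1, -1, -1])
```

Each row of W is the target vector scaled by the corresponding pattern element.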

Delta Rule for Pattern Association: It is defined as "the adjustment made to a synaptic weight of
a neuron is proportional to the product of the error signal and the input signal of the synapse." The aim of
the delta rule is to minimize the error over all training patterns.
Delta Rule for Single Output Unit: The delta rule changes the weights of the connections to
minimize the difference between the net input to the output unit, y_in, and the target value t. The delta
rule is
∆Wi = α (t - y_in) xi

Delta Rule for Several Output Units: In this case, the weights are changed to reduce the difference
between the computed output and the target. The weight correction for the connection from the ith input
unit to the jth output unit is
∆Wij = α (tj - y_in,j) xi

AUTOCORRELATORS

Auto-associative memories, also known as autocorrelators and easily recognized under the title of Hopfield
Associative Memory (HAM), were introduced as a theoretical notion by Donald Hebb (1949).


Other researchers who studied their dynamics include Little (1974), Little and Shaw (1978), and Hopfield
(1982).
First order autocorrelators obtain their connection matrix (indicative of the association of a pattern with
itself) by multiplying each element of a pattern with every element of the same pattern.
A first order autocorrelator stores m bipolar patterns A1, A2, ..., Am by summing together m outer
products as

T = ∑(i=1..m) [Aiᵀ][Ai]

Here, T = [tij] is a (p x p) connection matrix and Ai ∈ {-1, 1}ᵖ. The autocorrelator's recall equation is a
vector-matrix multiplication followed by a pointwise nonlinear threshold operation. The recall equation is
given by
ajnew = f(∑(i) ai tij, ajold), for every j = 1, 2, ..., p
where Ai = (a1, a2, ..., ap) and the two-parameter bipolar threshold function is
f(α, β) = 1 if α > 0
        = β if α = 0
        = -1 if α < 0

Example 4.1 (Working of an autocorrelator) Consider the following patterns


A1 = (-1, 1, -1, 1)
A2 = (1, 1, 1, - 1)
A3 = (-1, - 1, -1, 1)
which are to be stored as an autocorrelator

Solution: The connection matrix is

                                     3  1  3 -3
T = ∑(i=1..3) [Aiᵀ]4x1 [Ai]1x4 =     1  3  1 -1
                                     3  1  3 -3
                                    -3 -1 -3  3

Recognition of stored patterns

The autocorrelator is presented with the stored pattern A2 = (1, 1, 1, -1). Now,

a1new = f(3 + 1 + 3 + 3, 1) = 1
a2new = f(6, 1) = 1
a3new = f(10, 1) = 1
a4new = f(-10, -1) = -1


This is indeed the vector itself. Also, the retrieval of A3 = (-1, -1, -1, 1) gives
(a1new, a2new, a3new, a4new) = (-1 -1 -1 1)
yielding the same vector.

Recognition of noisy patterns


Consider a vector A' = (1, 1, 1, 1), which is a distorted presentation of one among the stored patterns.
We proceed to find the proximity of the noisy vector to the stored patterns using the Hamming distance
measure. The Hamming distance (HD) of a vector X from Y, given X = (x1, x2, ..., xn) and
Y = (y1, y2, ..., yn), is given by

HD(X, Y) = ∑(i=1..n) |xi - yi|
Thus, the HD of A' from each of the patterns in the stored set is as follows:
HD (A', A1) = 4
HD (A', A2)= 2
HD (A', A3) = 6

It is evident that the vector A' is closest to A2 and therefore resembles it; in other words, it is a noisy
version of A2.
Now, the computations
(a1new, a2new, a3new, a4new) = (f(4, 1), f(4, 1), f(4, 1), f(-4, 1))
= (1, 1, 1, -1)
= A2
Hence, in the case of partial vectors, an autocorrelator results in the refinement of the pattern or removal
of noise to retrieve the closest matching stored pattern.
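The storage and recall equations of the autocorrelator can be sketched directly; the patterns below are the three from Example 4.1:

```python
def store(patterns):
    """Connection matrix T = sum of A_i^T A_i over bipolar patterns."""
    p = len(patterns[0])
    return [[sum(a[i] * a[j] for a in patterns) for j in range(p)]
            for i in range(p)]

def recall(T, a):
    """One recall step: a_j(new) = f(sum_i a_i t_ij, a_j(old))."""
    def f(alpha, beta):
        # two-parameter bipolar threshold: keep the old value when net is 0
        return 1 if alpha > 0 else (-1 if alpha < 0 else beta)
    return [f(sum(a[i] * T[i][j] for i in range(len(a))), a[j])
            for j in range(len(a))]

# The three stored patterns of Example 4.1
T = store([(-1, 1, -1, 1), (1, 1, 1, -1), (-1, -1, -1, 1)])
```

Here recall(T, (1, 1, 1, -1)) returns the stored A2 unchanged, and recall(T, (1, 1, 1, 1)) cleans the noisy vector back to A2.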
Example 2: Use the Hebb rule to store the vector (1 1 -1 -1) in an auto-associative neural net.
(a) Find the weight/connection matrix.
(b) Test the net with the input vector (1 1 -1 -1).
(c) Test the net with one mistake in the input vector.
(d) Test the net with one missing component in the input vector.
Solution: (a) To initialize the weights using the outer product Hebb rule,

T = ∑(i=1..n) [Aiᵀ]4x1 [Ai]1x4

here Ai = [1 1 -1 -1]


hence

         1  1 -1 -1
T = W =  1  1 -1 -1
        -1 -1  1  1
        -1 -1  1  1

(b) For testing the input vector

A = [1 1 -1 -1]
a1new = f(1 + 1 + 1 + 1, 1) = 1
a2new = f(4, 1) = 1
a3new = f(-4, -1) = -1
a4new = f(-4, -1) = -1
So A' = [1 1 -1 -1].
Since the response vector A' is the same as the stored vector A, we can say that the input vector is
recognized as a stored/known vector.
(c) An auto-associative net with one mistake in the input vector:
(1) [-1 1 -1 -1] × W
f([2 2 -2 -2]) = [1 1 -1 -1]
(2) [1 -1 -1 -1] × W
f([2 2 -2 -2]) = [1 1 -1 -1]
Similarly,
(3) [1 1 1 -1] × W
f([2 2 -2 -2]) = [1 1 -1 -1]
(4) [1 1 -1 1] × W
f([2 2 -2 -2]) = [1 1 -1 -1]
Note that in each of these cases the input vector is recognized as "known".
(d) Consider an auto-associative network with one component missing:
(1) [0 1 -1 -1] × W
f([3 3 -3 -3]) = [1 1 -1 -1]
(2) [1 0 -1 -1] × W
f([3 3 -3 -3]) = [1 1 -1 -1]
(3) [1 1 0 -1] × W
f([3 3 -3 -3]) = [1 1 -1 -1]
(4) [1 1 -1 0] × W
f([3 3 -3 -3]) = [1 1 -1 -1]
Thus the net recognizes the vectors formed when one component is missing.

HETEROCORRELATORS: KOSKO'S DISCRETE BAM


As Kosko and others (1987a, 1987b) and Cruz Jr. and Stubberud (1987) have noted, the bidirectional
associative memory (BAM) is a two-level nonlinear neural network based on earlier studies and models of
associative memory.
Kosko extended the unidirectional autoassociators to bidirectional processes. One important performance
attribute of the discrete BAM is its ability to recall stored pairs, particularly in the presence of noise.
The Bidirectional Associative Memory (BAM) introduced by Kosko (1987b) has the following operations:
1. There are N training pairs {(A1, B1), (A2, B2), ..., (Ai, Bi), ..., (AN, BN)} where
Ai = (ai1, ai2, ..., ain)
Bi = (bi1, bi2, ..., bip)
Here, aij or bij is either in the ON or OFF state.
2. In the binary mode, ON = 1 and OFF = 0; in the bipolar mode, ON = 1 and OFF = -1.
We frame the correlation matrix

M = ∑(i=1..N) XiᵀYi

where Xi and Yi are the bipolar forms of Ai and Bi.
3. To retrieve the nearest (Ai, Bi) pair given any pair (α, β), the recall equations are as follows:
starting with (α, β) as the initial condition, we determine a finite sequence (α', β'), (α'', β''), ... until an
equilibrium point (αF, βF) is reached. Here,
β' = φ(αM), α' = φ(β'Mᵀ), β'' = φ(α'M), ...
where φ is the bipolar threshold function.


Addition and Deletion of Pattern Pairs


Given a set of pattern pairs (Xi, Yi), for i = 1, 2, ..., n, and their correlation matrix M, a new pair (X', Y')
can be added, or an existing pair (Xj, Yj) can be erased or deleted from the memory model.
In the case of addition, the new correlation matrix M is
M = X1ᵀY1 + X2ᵀY2 + ... + XnᵀYn + X'ᵀY'
In the case of deletion, we subtract the matrix corresponding to (Xj, Yj) from the matrix M, i.e.
(new) M = M - (XjᵀYj)
The addition and deletion of information contribute to the functioning of the system as a typical human
memory exhibiting learning and forgetfulness.
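Both operations are simple matrix additions, sketched below:

```python
def add_pair(M, x, y):
    """Store a new pair: M(new) = M + X'^T Y'."""
    return [[M[i][j] + x[i] * y[j] for j in range(len(y))]
            for i in range(len(x))]

def delete_pair(M, x, y):
    """Erase a stored pair: M(new) = M - Xj^T Yj."""
    return [[M[i][j] - x[i] * y[j] for j in range(len(y))]
            for i in range(len(x))]
```

Adding a pair and then deleting it restores the original matrix, mirroring learning followed by forgetfulness.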

Energy Function for BAM


A pair (A, B) defines the state of a BAM. To store a pattern, the value of the energy function for
that particular pattern has to occupy a minimum point in the energy landscape. Also, adding new
patterns ought not to destroy the previously stored patterns.
The stability of a BAM can be proved by identifying an energy function E with each state (A, B). In the
autoassociative case, Hopfield identified an appropriate E (actually, Hopfield defined half this quantity) as
E(A) = -AMAᵀ
However, Kosko proposed an energy function
E(A, B) = -AMBᵀ
for the bidirectional case, and for the particular case A = B this corresponds to Hopfield's autoassociative
energy function.
Also, when a paired pattern (A, B) is presented to the BAM, the neurons change states until a bidirectionally
stable state (AF, BF) is reached. Kosko proved that such a stable state is reached for


any matrix M which corresponds to a local minimum of the energy function.
Kosko proved that each cycle of decoding lowers the energy E if the energy function for any point (α, β)
is given by
E = -αMβᵀ
However, if the energy E evaluated using the coordinates of the pair (Ai, Bi), i.e.
E = -AiMBiᵀ
does not constitute a local minimum, then the point cannot be recalled, even though one starts with α = Ai.
In this respect, Kosko's encoding method does not ensure that the stored pairs are at local minima.
Example: Suppose that we have to train our network to store the following pattern pairs; let N = 3:
A1 = (100001) B1 = (11000)
A2 = (011000) B2 = (10100)
A3 = (001011) B3 = (01110)
Solution: Converting these to bipolar forms:
X1 = (1 -1 -1 -1 -1 1)
X2 = (-1 1 1 -1 -1 -1)
X3 = (-1 -1 1 -1 1 1)
Y1 = (1 1 -1 -1 -1)
Y2 = (1 -1 1 -1 -1)
Y3 = (-1 1 1 1 -1)
The matrix M is calculated as

                                1  1 -3 -1  1
                                1 -3  1 -1  1
M = X1ᵀY1 + X2ᵀY2 + X3ᵀY3 =    -1 -1  3  1 -1
                               -1 -1 -1  1  3
                               -3  1  1  3  1
                               -1  3 -1  1 -1

Let us suppose we start with α = X3, hoping to retrieve the associated pair Y3.

αM = (-1 -1 1 -1 1 1)(M) = (-6 6 6 6 -6)


β‘ = ϕ (α M) = (-1 1 1 1 -1) .
T
β‘ M = (-5 -5 5 -3 7 5)
T)
ϕ(β‘ M = (-1 -1 1 -1 1 1) = α'
α'M = (-1, -1, 1 -1, 1 1) M = (-6 6 6 6 -6)

ϕ (α'M )= β" = (-l 1 1 1 -1)


= β'

Here, β' is same as Y3• Hence, (αF, βF) = (X3, Y3) is the desired result.
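The worked recall above can be verified with a short sketch; the threshold φ below maps positive nets to +1 and negative nets to -1 (a net of 0 does not occur in this example):

```python
def bam_matrix(xs, ys):
    """Correlation matrix M = sum of X_i^T Y_i over bipolar pattern pairs."""
    return [[sum(x[i] * y[j] for x, y in zip(xs, ys))
             for j in range(len(ys[0]))] for i in range(len(xs[0]))]

def phi(v):
    """Bipolar threshold, applied componentwise."""
    return [1 if vi > 0 else -1 for vi in v]

def bam_recall(M, alpha, steps=10):
    """Iterate beta = phi(alpha M), alpha = phi(M beta^T) to equilibrium."""
    beta = None
    for _ in range(steps):
        beta = phi([sum(a * M[i][j] for i, a in enumerate(alpha))
                    for j in range(len(M[0]))])
        alpha = phi([sum(M[i][j] * b for j, b in enumerate(beta))
                     for i in range(len(M))])
    return alpha, beta

# The three bipolar pattern pairs from the example above
Xs = [(1, -1, -1, -1, -1, 1), (-1, 1, 1, -1, -1, -1), (-1, -1, 1, -1, 1, 1)]
Ys = [(1, 1, -1, -1, -1), (1, -1, 1, -1, -1), (-1, 1, 1, 1, -1)]
M = bam_matrix(Xs, Ys)
```

Starting from α = X3, the recall settles at (X3, Y3), matching the computation above.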

Limitations of Associative Memory


The main limitation of associative memory is the efficiency of access and retrieval of the patterns stored in it.
In the case of a badly damaged pattern, it is unable to restore it.

Applications of Associative Memory


Following are the application areas of associative memory:
1. Pattern Recognition (e.g. faces, signatures, etc.)
2. Content Addressable Storage (CAS)
3. Clustering
4. Encoding and Decoding of Data

***End of Unit – 1***
