
CCS355 - NEURAL NETWORK AND DEEP LEARNING

UNIT - I

The processing of an ANN depends upon the following three building blocks −

 Network Topology
 Adjustments of Weights or Learning
 Activation Functions

In this chapter, we will discuss these three building blocks of ANN in detail.

Network Topology

A network topology is the arrangement of a network along with its nodes and
connecting lines. According to the topology, ANN can be classified as the following
kinds −

Feedforward Network

It is a non-recurrent network having processing units/nodes arranged in layers,
and all the nodes in a layer are connected with the nodes of the previous layer.
The connections carry different weights. There is no feedback loop, which means
the signal can flow in only one direction, from input to output. It may be
divided into the following two types −

 Single layer feedforward network − The concept is of a feedforward ANN
having only one weighted layer. In other words, we can say the input layer is
fully connected to the output layer.

Prepared by: Dr.B.GOPINATH, PROF/ECE Page 1 of 33



 Multilayer feedforward network − The concept is of a feedforward ANN
having more than one weighted layer. As this network has one or more layers
between the input and the output layer, these are called hidden layers.
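A minimal sketch of such a layered forward pass, in Python with NumPy; the layer sizes, the tanh activation, and the random weight values are illustrative assumptions, not values from the text.

```python
import numpy as np

# Sketch of a multilayer feedforward pass: the signal moves strictly from
# input to output through two weighted layers, with no feedback loop.
rng = np.random.default_rng(0)

x = np.array([0.5, -1.0, 0.25])       # input vector (3 units)
W_hidden = rng.normal(size=(3, 4))    # first weighted layer: input -> hidden
W_output = rng.normal(size=(4, 2))    # second weighted layer: hidden -> output

hidden = np.tanh(x @ W_hidden)        # hidden-layer activations
output = np.tanh(hidden @ W_output)   # output-layer activations

print(output.shape)                   # (2,)
```

Note that nothing flows backwards here: each layer's output depends only on the layer before it.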

Feedback Network

As the name suggests, a feedback network has feedback paths, which means
the signal can flow in both directions using loops. This makes it a non-linear
dynamic system, which changes continuously until it reaches a state of
equilibrium. It may be divided into the following types −

 Recurrent networks − They are feedback networks with closed loops.
Following are the two types of recurrent networks.
 Fully recurrent network − It is the simplest neural network architecture
because all nodes are connected to all other nodes and each node works as
both input and output.


 Jordan network − It is a closed-loop network in which the output goes back
to the input again as feedback, as shown in the following diagram.

Adjustments of Weights or Learning

Learning, in an artificial neural network, is the method of modifying the
weights of the connections between the neurons of a specified network. Learning
in ANN can be classified into three categories, namely supervised learning,
unsupervised learning, and reinforcement learning.

Supervised Learning

As the name suggests, this type of learning is done under the supervision of
a teacher. This learning process is dependent.

During the training of ANN under supervised learning, the input vector is
presented to the network, which will give an output vector. This output vector is


compared with the desired output vector. An error signal is generated, if there is a
difference between the actual output and the desired output vector. On the basis
of this error signal, the weights are adjusted until the actual output is matched
with the desired output.

Unsupervised Learning

As the name suggests, this type of learning is done without the supervision
of a teacher. This learning process is independent.

During the training of ANN under unsupervised learning, the input vectors
of similar type are combined to form clusters. When a new input pattern is applied,
then the neural network gives an output response indicating the class to which the
input pattern belongs.

There is no feedback from the environment as to what the desired output should
be and whether it is correct or incorrect. Hence, in this type of learning, the
network itself must discover the patterns and features from the input data, and
the relation of the input data to the output.

Reinforcement Learning

As the name suggests, this type of learning is used to reinforce or strengthen
the network over some critic information. This learning process is similar to
supervised learning; however, we might have much less information.

During the training of the network under reinforcement learning, the network
receives some feedback from the environment. This makes it somewhat similar to
supervised learning. However, the feedback obtained here is evaluative, not
instructive, which means there is no teacher as in supervised learning. After
receiving the feedback, the network adjusts its weights to get better critic
information in the future.


Activation Functions

An activation function may be defined as the extra force or effort applied over
the input to obtain an exact output. In ANN, we can apply activation functions
over the input to get the exact output. Following are some activation functions
of interest −

Linear Activation Function

It is also called the identity function, as it performs no editing of the
input. It can be defined as −

F(x) = x

Sigmoid Activation Function

It is of two type as follows −

 Binary sigmoidal function − This activation function performs input editing
between 0 and 1. It is positive in nature. It is always bounded, which means
its output cannot be less than 0 or more than 1. It is also strictly
increasing in nature, which means the higher the input, the higher the
output. It can be defined as

F(x) = 1 / (1 + e^(−x))


 Bipolar sigmoidal function − This activation function performs input
editing between -1 and 1. It can be positive or negative in nature. It is
always bounded, which means its output cannot be less than -1 or more than 1.
It is also strictly increasing in nature, like the binary sigmoid function.
It can be defined as

F(x) = (1 − e^(−x)) / (1 + e^(−x))
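The three activation functions above can be sketched directly; the sample input values are illustrative.

```python
import numpy as np

def linear(x):
    # Identity function: performs no editing of the input.
    return x

def binary_sigmoid(x):
    # Bounded in (0, 1) and strictly increasing.
    return 1.0 / (1.0 + np.exp(-x))

def bipolar_sigmoid(x):
    # Bounded in (-1, 1) and strictly increasing.
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(linear(x))           # [-2.  0.  2.]
print(binary_sigmoid(x))   # ≈ [0.119 0.5   0.881]
print(bipolar_sigmoid(x))  # ≈ [-0.762  0.     0.762]
```

Note that both sigmoids pass through the midpoint of their range at x = 0, and that the bipolar sigmoid is equivalent to tanh(x/2).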

Backpropagation is a popular method for training artificial neural networks,
especially deep neural networks.


Backpropagation is needed to calculate the gradient, which we need to adapt
the weights of the weight matrices. The weights of the neurons (i.e. nodes) of
the neural network are adjusted by calculating the gradient of the loss
function. For this purpose a gradient descent optimization algorithm is used.
Backpropagation is also called backward propagation of errors.

A metaphor might help: picture yourself being put on a mountain, not
necessarily at the top, by a helicopter at night and/or under fog. Let's also
imagine that this mountain is on an island and you want to reach sea level.

 You have to go down, but you hardly see anything, maybe just a few meters.
Your task is to find your way down, but you cannot see the path. You can use
the method of gradient descent. This means that you are examining the
steepness at your current position. You will proceed in the direction with the
steepest descent.

 You take only a few steps and then you stop again to reorientate yourself.
This means you are applying the previously described procedure again, i.e. you
are looking for the steepest descent.

Keeping on like this will enable you to arrive at a position where there is no
further descent (i.e. each direction leads upwards). You may have reached the
deepest level (the global minimum), but you could equally be stuck in a basin.
If you start at the position on the right side of our image, everything works
out fine, but from the left side you will be stuck in a local minimum. In
summary, if you are dropped many times at random places on this theoretical
island, you will find ways down to sea level. This is what we actually do when
we train a neural network.
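The hill-descent metaphor can be reduced to a toy one-dimensional example; the landscape f(x) = (x − 3)², the starting point, and the step size are all illustrative assumptions.

```python
# Gradient descent on a one-dimensional "landscape" f(x) = (x - 3)^2,
# whose lowest point (sea level) is at x = 3.
def f_grad(x):
    return 2.0 * (x - 3.0)    # derivative of (x - 3)^2

x = -4.0                      # dropped at an arbitrary starting position
step = 0.1                    # how far we walk before reorienting

for _ in range(100):
    x -= step * f_grad(x)     # always move in the direction of steepest descent

print(round(x, 4))            # 3.0
```

Because this landscape has a single valley, the walk always reaches the bottom; on a landscape with several basins, the same procedure can get stuck in a local minimum, exactly as the metaphor warns.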

The actual backpropagation procedure

Assuming we start with a simple (linear) neural network, with the following
example values associated with the weights:


We have labels, i.e. target or desired values t for each output value o.
The error is the difference between the target and the actual output:

e = t − o

We will later use a squared error function, because it has better
characteristics for the algorithm.

We will have a look at the output value o1, which depends on the
values w11, w21, w31 and w41. Let's assume the calculated value (o1) is 0.92
and the desired value (t1) is 1. In this case the error is

e1 = t1 − o1

This means in our example:

e1 = 1 − 0.92 = 0.08

The total error in our weight matrix between the hidden and the output layer looks
like this:

The denominator in the left matrix is always the same (scaling factor).


We can drop it, since it only scales every error term by the same amount.
This example has demonstrated backpropagation for a basic scenario of a linear
neural network.
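The worked numbers above can be checked directly; the ½ factor below stands in for the scaling-factor denominator mentioned in the text and is an assumption about the exact error function used.

```python
# Reproducing the worked numbers: the network produces o1 = 0.92
# while the target value is t1 = 1.
t1, o1 = 1.0, 0.92

error = t1 - o1                    # plain error: t1 - o1
squared_error = 0.5 * error ** 2   # squared-error form used by the algorithm

print(round(error, 2))             # 0.08
print(round(squared_error, 4))     # 0.0032
```

The squared form is preferred because it is differentiable everywhere and penalises large errors more strongly than small ones.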

Flowchart of backpropagation neural network algorithm.


The flowchart of the Error Back Propagation Artificial Neural Network training
architecture



What is an Artificial Neural Network?

An Artificial Neural Network (ANN) is an efficient computing system whose
central theme is borrowed from the analogy of biological neural networks. ANNs
are also named "artificial neural systems," "parallel distributed processing
systems," or "connectionist systems." An ANN acquires a large collection of
units that are interconnected in some pattern to allow communication between
the units. These units, also referred to as nodes or neurons, are simple
processors which operate in parallel.

Every neuron is connected to other neurons through connection links. Each
connection link is associated with a weight that has information about the
input signal. This is the most useful information for neurons to solve a
particular problem, because the weight usually excites or inhibits the signal
that is being communicated. Each neuron has an internal state, which is called
an activation signal. Output signals, which are produced after combining the
input signals and the activation rule, may be sent to other units.
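The behaviour described above reduces to a weighted sum followed by an activation rule; the numeric inputs, weights, and threshold rule below are illustrative assumptions.

```python
import numpy as np

# A single artificial neuron: weighted input signals are combined,
# then an activation rule produces the output signal.
inputs = np.array([0.2, 0.6, -0.4])
weights = np.array([0.5, -0.3, 0.8])   # each connection link carries a weight
bias = 0.1

net = np.dot(inputs, weights) + bias   # combine the weighted input signals
output = 1.0 if net >= 0 else 0.0      # a simple threshold activation rule

print(round(net, 2), output)           # -0.3 0.0
```

A negative weight here inhibits its signal while a positive weight excites it, which is exactly the role the text assigns to connection weights.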

A Brief History of ANN

The history of ANN can be divided into the following three eras −

ANN during 1940s to 1960s

Some key developments of this era are as follows −

 1943 − The concept of neural networks is generally considered to have
started with the work of physiologist Warren McCulloch and mathematician
Walter Pitts, who in 1943 modeled a simple neural network using electrical
circuits in order to describe how neurons in the brain might work.
 1949 − Donald Hebb’s book, The Organization of Behavior, put forth the fact
that repeated activation of one neuron by another increases its strength
each time they are used.
 1956 − An associative memory network was introduced by Taylor.
 1958 − A learning method for McCulloch and Pitts neuron model named
Perceptron was invented by Rosenblatt.


 1960 − Bernard Widrow and Marcian Hoff developed models called "ADALINE"
and "MADALINE."

ANN during 1960s to 1980s

Some key developments of this era are as follows −

 1961 − Rosenblatt made an unsuccessful attempt but proposed the
"backpropagation" scheme for multilayer networks.
 1964 − Taylor constructed a winner-take-all circuit with inhibitions among
output units.
 1969 − Multilayer perceptron (MLP) was invented by Minsky and Papert.
 1971 − Kohonen developed Associative memories.
 1976 − Stephen Grossberg and Gail Carpenter developed Adaptive
resonance theory.

ANN from 1980s till Present

Some key developments of this era are as follows −

 1982 − The major development was Hopfield's Energy approach.
 1985 − Boltzmann machine was developed by Ackley, Hinton, and
Sejnowski.
 1986 − Rumelhart, Hinton, and Williams introduced Generalised Delta Rule.
 1988 − Kosko developed Binary Associative Memory (BAM) and also gave the
concept of Fuzzy Logic in ANN.

The historical review shows that significant progress has been made in this field.
Neural network based chips are emerging and applications to complex problems
are being developed. Surely, today is a period of transition for neural network
technology.

Biological Neuron

A nerve cell (neuron) is a special biological cell that processes information.
According to an estimation, there are a huge number of neurons, approximately
10^11, with numerous interconnections, approximately 10^15.

Schematic Diagram

Working

As shown in the above diagram, a typical neuron consists of the following four
parts with the help of which we can explain its working −

 Dendrites − They are tree-like branches, responsible for receiving
information from the other neurons they are connected to. In another sense, we
can say that they are like the ears of the neuron.
 Soma − It is the cell body of the neuron and is responsible for processing
the information received from the dendrites.
 Axon − It is just like a cable through which the neuron sends information.
 Synapses − They are the connections between the axon and the dendrites of
other neurons.

ANN versus BNN

Before taking a look at the differences between an Artificial Neural
Network (ANN) and a Biological Neural Network (BNN), let us take a look at the
similarities, based on terminology, between the two.


Biological Neural Network (BNN)      Artificial Neural Network (ANN)

Soma                                 Node
Dendrites                            Input
Synapse                              Weights or Interconnections
Axon                                 Output

The following table shows the comparison between ANN and BNN based on some
criteria mentioned.

Criteria           BNN                                  ANN

Processing         Massively parallel, slow but         Massively parallel, fast but
                   superior to ANN                      inferior to BNN

Size               10^11 neurons and                    10^2 to 10^4 nodes (mainly depends
                   10^15 interconnections               on the type of application and
                                                        the network designer)

Learning           They can tolerate ambiguity          Very precise, structured and
                                                        formatted data is required to
                                                        tolerate ambiguity

Fault tolerance    Performance degrades with even       It is capable of robust performance,
                   partial damage                       hence has the potential to be
                                                        fault tolerant

Storage capacity   Stores the information in the        Stores the information in continuous
                   synapse                              memory locations

Model of Artificial Neural Network

The following diagram represents the general model of ANN followed by its
processing.


Supervised Learning

As the name suggests, supervised learning takes place under the supervision of
a teacher. This learning process is dependent. During the training of ANN
under supervised learning, the input vector is presented to the network, which
will produce an output vector. This output vector is compared with the
desired/target output vector. An error signal is generated if there is a
difference between the actual output and the desired/target output vector. On
the basis of this error signal, the weights are adjusted until the actual
output matches the desired output.

Perceptron

Developed by Frank Rosenblatt using the McCulloch and Pitts model, the
perceptron is the basic operational unit of artificial neural networks. It
employs a supervised learning rule and is able to classify data into two
classes.

Operational characteristics of the perceptron: it consists of a single neuron
with an arbitrary number of inputs along with adjustable weights, but the
output of the neuron is 1 or 0 depending upon the threshold. It also consists
of a bias whose weight is always 1. The following figure gives a schematic
representation of the perceptron.

Perceptron thus has the following three basic elements −

 Links − It has a set of connection links, each of which carries a weight,
including a bias whose weight is always 1.
 Adder − It adds the inputs after they are multiplied by their respective
weights.
 Activation function − It limits the output of the neuron. The most basic
activation function is a Heaviside step function, which has two possible
outputs. This function returns 1 if the input is positive, and 0 for any
negative input.

Training Algorithm

The perceptron network can be trained for a single output unit as well as for
multiple output units.

Training Algorithm for Single Output Unit

Step 1 − Initialize the following to start the training −

 Weights
 Bias
 Learning rate α

For easy calculation and simplicity, weights and bias must be set equal to 0 and
the learning rate must be set equal to 1.

Step 2 − Continue step 3-8 when the stopping condition is not true.

Step 3 − Continue step 4-6 for every training vector x.

Step 4 − Activate each input unit as follows −


Case 1 − if y ≠ t then,

wi(new)=wi(old)+αtxi

b(new)=b(old)+αt

Case 2 − if y = t then,

wi(new)=wi(old)

b(new)=b(old)

Here 'y' is the actual output and 't' is the desired/target output.

Step 8 − Test for the stopping condition, which would happen when there is no
change in weight.
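The steps above can be sketched as a small training loop. The bipolar AND data and the bipolar step output (returning -1 instead of 0) are illustrative assumptions, chosen so that the update rule w(new) = w(old) + αtx, as stated in the text, converges.

```python
# Single-output perceptron training, following Steps 1-8 above.
samples = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]  # bipolar AND

w = [0.0, 0.0]   # Step 1: weights initialised to 0
b = 0.0          # bias initialised to 0
alpha = 1.0      # learning rate set to 1

changed = True
while changed:                      # Step 2: repeat while weights keep changing
    changed = False
    for x, t in samples:            # Step 3: for every training vector
        net = sum(wi * xi for wi, xi in zip(w, x)) + b
        y = 1 if net > 0 else -1    # bipolar step activation
        if y != t:                  # Case 1: output disagrees with target
            w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
            b = b + alpha * t
            changed = True          # Case 2 (y == t) leaves weights unchanged

print(w, b)  # [1.0, 1.0] -1.0
```

The loop stops (Step 8) as soon as a full pass over the training vectors produces no weight change, i.e. when every sample is classified correctly.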

Training Algorithm for Multiple Output Units

The following diagram is the architecture of perceptron for multiple output classes.


Step 1 − Initialize the following to start the training −

 Weights
 Bias
 Learning rate α

For easy calculation and simplicity, weights and bias must be set equal to 0 and
the learning rate must be set equal to 1.

Step 2 − Continue step 3-8 when the stopping condition is not true.

Step 3 − Continue step 4-6 for every training vector x.

Step 4 − Activate each input unit as follows −

Case 1 − if yj ≠ tj then,

wij(new)=wij(old)+αtjxi

bj(new)=bj(old)+αtj

Case 2 − if yj = tj then,

wij(new)=wij(old)

bj(new)=bj(old)

Here ‘y’ is the actual output and ‘t’ is the desired/target output.


Step 8 − Test for the stopping condition, which will happen when there is no
change in weight.

Adaptive Linear Neuron (Adaline)


Adaline, which stands for Adaptive Linear Neuron, is a network having a
single linear unit. It was developed by Widrow and Hoff in 1960. Some
important points about Adaline are as follows −

 It uses a bipolar activation function.
 It uses the delta rule for training to minimize the Mean Squared Error
(MSE) between the actual output and the desired/target output.
 The weights and the bias are adjustable.

Architecture

The basic structure of Adaline is similar to the perceptron, with an extra
feedback loop with the help of which the actual output is compared with the
desired/target output. After comparison, on the basis of the training
algorithm, the weights and bias are updated.

Training Algorithm

Step 1 − Initialize the following to start the training −

 Weights
 Bias
 Learning rate α


For easy calculation and simplicity, weights and bias must be set equal to 0 and
the learning rate must be set equal to 1.

Step 2 − Continue step 3-8 when the stopping condition is not true.

Step 3 − Continue step 4-6 for every bipolar training pair s:t.

Step 4 − Activate each input unit as follows −

(t−yin) is the computed error.


Step 8 − Test for the stopping condition, which will happen when there is no
change in weight, or when the highest weight change that occurred during
training is smaller than the specified tolerance.
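The Adaline loop above can be sketched as follows: the delta rule nudges each weight in proportion to the computed error (t − y_in). The bipolar training pairs here are chosen so the target is exactly linear in the inputs (t equals the first input), and the learning rate and tolerance are assumed values, not ones from the text.

```python
# Adaline training with the delta rule, following Steps 1-8 above.
samples = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), 1), ((1, 1), 1)]  # t = first input

w = [0.0, 0.0]   # Step 1: weights and bias start at 0
b = 0.0
alpha = 0.1
tolerance = 1e-5

for _ in range(10_000):                  # epoch cap as a safety net
    max_change = 0.0
    for x, t in samples:                 # Steps 3-4: each bipolar training pair s:t
        y_in = sum(wi * xi for wi, xi in zip(w, x)) + b   # net input of the linear unit
        err = t - y_in                                    # (t − y_in), the computed error
        for i, xi in enumerate(x):
            dw = alpha * err * xi                         # delta rule weight update
            w[i] += dw
            max_change = max(max_change, abs(dw))
        b += alpha * err
        max_change = max(max_change, abs(alpha * err))
    if max_change < tolerance:           # Step 8: largest change below tolerance
        break

print(w, b)
```

Because these targets are exactly representable by a linear unit, the error can be driven essentially to zero and the Step 8 stopping condition is eventually met, with the weights approaching [1, 0] and the bias approaching 0.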

Multiple Adaptive Linear Neuron (Madaline)

Madaline, which stands for Multiple Adaptive Linear Neuron, is a network which
consists of many Adalines in parallel. It has a single output unit. Some
important points about Madaline are as follows −

 It is just like a multilayer perceptron, where the Adalines act as hidden
units between the input and the Madaline layer.
 The weights and the bias between the input and Adaline layers, as we see in
the Adaline architecture, are adjustable.
 The Adaline and Madaline layers have fixed weights and a bias of 1.
 Training can be done with the help of the delta rule.

Architecture

The architecture of Madaline consists of "n" neurons in the input
layer, "m" neurons in the Adaline layer, and 1 neuron in the Madaline layer.
The Adaline layer can be considered the hidden layer, as it lies between the
input layer and the output layer, i.e. the Madaline layer.

Training Algorithm

By now we know that only the weights and bias between the input and the Adaline
layer are to be adjusted, and the weights and bias between the Adaline and the
Madaline layer are fixed.
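The layered structure just described can be sketched as a forward pass; every numeric weight value below is an illustrative assumption, and only the input-to-Adaline weights would be adjusted during training.

```python
import numpy as np

# Madaline forward pass: n inputs feed m Adaline (hidden) units with
# adjustable weights; the single Madaline output unit has its incoming
# weights and bias fixed at 1, as described in the text.
def bipolar_step(x):
    return np.where(x >= 0, 1, -1)

x = np.array([1, -1, 1])               # n = 3 bipolar inputs

W = np.array([[0.5, -0.2],             # adjustable weights: input -> Adaline layer
              [0.3,  0.4],             # (m = 2 Adaline units)
              [-0.6, 0.1]])
b_hidden = np.array([0.1, -0.3])       # adjustable Adaline biases

z = bipolar_step(x @ W + b_hidden)     # outputs of the m Adaline units
y = bipolar_step(z.sum() + 1.0)        # fixed weights of 1 and a bias of 1

print(z, y)
```

With fixed unit weights into the output, the Madaline unit effectively takes a vote over the Adaline outputs, which is why only the first layer needs training.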

Step 1 − Initialize the following to start the training −

 Weights

 Bias

 Learning rate α

For easy calculation and simplicity, weights and bias must be set equal to 0 and
the learning rate must be set equal to 1.

Step 2 − Continue step 3-8 when the stopping condition is not true.

Step 3 − Continue step 4-7 for every bipolar training pair s:t.

Step 4 − Activate each input unit as follows −



Back Propagation Neural Networks

A Back Propagation Network (BPN) is a multilayer neural network consisting of
an input layer, at least one hidden layer and an output layer. As its name
suggests, back-propagation takes place in this network. The error, which is
calculated at the output layer by comparing the target output and the actual
output, is propagated back towards the input layer.

Architecture

As shown in the diagram, the architecture of BPN has three interconnected
layers having weights on them. The hidden layer as well as the output layer
also has a bias, whose weight is always 1, on them. As is clear from the
diagram, the working of BPN is in two phases. One phase sends the signal from
the input layer to the output layer, and the other phase back-propagates the
error from the output layer to the input layer.

Training Algorithm

For training, BPN will use binary sigmoid activation function. The training of BPN
will have the following three phases.

 Phase 1 − Feed Forward Phase


 Phase 2 − Back Propagation of error
 Phase 3 − Updating of weights

All these steps are summarised in the algorithm as follows −

Step 1 − Initialize the following to start the training −

 Weights
 Learning rate α

For easy calculation and simplicity, take some small random values.

Step 2 − Continue step 3-11 when the stopping condition is not true.

Step 3 − Continue step 4-10 for every training pair.

Phase 1

Step 4 − Each input unit receives an input signal xi and sends it to the
hidden units, for all i = 1 to n.

Step 5 − Calculate the net input at the hidden unit using the following
relation −

Phase 2
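The three phases can be shown end-to-end in a compact numerical sketch, on a tiny network with 2 inputs, 2 hidden units and 1 output using the binary sigmoid activation. The AND training data, layer sizes, learning rate and epoch count are illustrative assumptions, not values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    # Binary sigmoid, as used by the BPN training algorithm.
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [0], [0], [1]], dtype=float)   # AND targets

V = rng.normal(scale=0.5, size=(2, 2))   # Step 1: small random weights, input -> hidden
bv = np.zeros(2)                         # hidden-layer biases
W = rng.normal(scale=0.5, size=(2, 1))   # hidden -> output weights
bw = np.zeros(1)                         # output bias
alpha = 0.5

for _ in range(10_000):
    # Phase 1 - feed forward: signals travel from input to output
    Z = sigmoid(X @ V + bv)
    Y = sigmoid(Z @ W + bw)
    # Phase 2 - back propagation of error: compare with targets, push error back
    d_out = (T - Y) * Y * (1 - Y)          # error term at the output layer
    d_hid = (d_out @ W.T) * Z * (1 - Z)    # error term propagated to the hidden layer
    # Phase 3 - updating of weights
    W += alpha * Z.T @ d_out
    bw += alpha * d_out.sum(axis=0)
    V += alpha * X.T @ d_hid
    bv += alpha * d_hid.sum(axis=0)

print(np.round(Y.ravel()))
```

The error terms use the binary sigmoid's derivative, f'(x) = f(x)(1 − f(x)), and the updates follow gradient descent, so after training the rounded outputs reproduce the AND targets.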



