Unit-1 (1)

Uploaded by

dr.deepika.nith

UNIT-1, 21CS 3031

Neural Networks
Computer Science and
Engineering 6th semester

Dr. Seema Kharb


Unit-1 : ARCHITECTURE

• Biological Neural Network
• Artificial Neuron Model
• Operations of Artificial Neuron
• Types of Neuron
• Activation Function
• ANN Architectures
• Classification Taxonomy of ANN: Connectivity, Learning Strategy (Supervised, Unsupervised, Reinforcement)
• Learning Rules
Dataset for Application

• IRIS dataset (numeric)
• MNIST (handwritten digits)
• ImageNet
Reasons to study neural computation
• To understand how the brain actually works.
  • It's very big and very complicated and made of stuff that dies when you poke it around, so we need to use computer simulations.
• To understand a style of parallel computation inspired by neurons and their adaptive connections.
  • Very different style from sequential computation.
  • Should be good for things that brains are good at (e.g. vision).
  • Should be bad for things that brains are bad at (e.g. 23 x 71).
• To solve practical problems by using novel learning algorithms inspired by the brain.
  • Learning algorithms can be very useful even if they are not how the brain actually works.
Biological nervous system

• The biological nervous system is the most important part of many living things, in particular human beings.
• The brain sits at the center of the human nervous system.
• Any biological nervous system consists of a large number of interconnected processing units called neurons.
• Each neuron is approximately 10 µm long, and neurons can operate in parallel.
• Typically, a human brain consists of approximately $10^{11}$
neurons communicating with each other with the help of
Biological Neuron

[Figure: biological neuron]
Working of Biological Neuron
• Gross physical structure:
• There is one axon that branches
• There is a dendritic tree that collects input from other
neurons.

• Axons typically contact dendritic trees at synapses


• A spike of activity in the axon causes charge to be
injected into the post-synaptic neuron.

• Spike generation:
• There is an axon hillock that generates outgoing spikes
whenever enough charge has flowed in at synapses to
depolarize the cell membrane.
Synapse

• When a spike of activity travels along an axon and arrives at a synapse, it causes vesicles of transmitter chemical to be released.
  • There are several kinds of transmitter.
• The transmitter molecules diffuse across the synaptic cleft and bind to receptor molecules in the membrane of the post-synaptic neuron, thus changing their shape.
  • This opens up holes that allow specific ions in or out.
Adaptation of Synapse

• The effectiveness of the synapse can be changed:
  • vary the number of vesicles of transmitter.
  • vary the number of receptor molecules.
• Synapses are slow, but they have advantages over RAM:
  • They are very small and very low-power.
  • They adapt using locally available signals.
  • But what rules do they use to decide how to change?
How the brain works on one slide!
• Each neuron receives inputs from other neurons.
  - A few neurons also connect to receptors.
  - Cortical neurons use spikes to communicate.
• The effect of each input line on the neuron is controlled by a synaptic weight.
  - The weights can be positive or negative.
• The synaptic weights adapt so that the whole network learns to perform useful computations.
  - Recognizing objects, understanding language, making plans, controlling the body.
• You have about $10^{11}$ neurons, each with about $10^{4}$ weights.
  - A huge number of weights can affect the computation in a very short time. Much better bandwidth than a workstation.
Modularity and the brain
• Different bits of the cortex do different things.
• Local damage to the brain has specific effects.
• Specific tasks increase the blood flow to specific regions.
• But cortex looks pretty much the same all over.
• Early brain damage makes functions relocate.
• Cortex is made of general purpose stuff that has the ability to turn
into special purpose hardware in response to experience.
• This gives rapid parallel computation plus flexibility.
• Conventional computers get flexibility by having stored
sequential programs, but this requires very fast central
processors to perform long sequential computations.
Idealized neurons
• To model things we have to idealize them (e.g. atoms).
  • Idealization removes complicated details that are not essential for understanding the main principles.
  • It allows us to apply mathematics and to make analogies to other, familiar systems.
  • Once we understand the basic principles, it's easy to add complexity to make the model more faithful.
• It is often worth understanding models that are known to be wrong (but
we must not forget that they are wrong!)
• E.g. neurons that communicate real values rather than discrete spikes
of activity.
Artificial Neurons
ANN is an information processing system that has
certain performance characteristics in common
with biological nets.
Several key features of the processing elements of ANN are
suggested by the properties of biological neurons:
1. The processing element receives many signals.
2. Signals may be modified by a weight at the receiving
synapse.
3. The processing element sums the weighted inputs.
4. Under appropriate circumstances (sufficient input), the neuron
transmits a single output.
5. The output from a particular neuron may go to many other neurons.
Artificial Neural Network

• The human brain is a highly complex structure that can be viewed as a massive, highly interconnected network of simple processing elements called neurons.
• Artificial neural networks (ANNs), or simply neural networks (NNs), are simplified models (i.e. imitations) of the biological nervous system and are therefore motivated by the kind of computing performed by the human brain.
• The behavior of a biological neural network can be captured by a simple model called an artificial neural network.
Analogy between BNN and ANN
Artificial Neuron

[Figure: the four basic components of a human biological neuron (left) and the components of a basic artificial neuron (right)]

• Note that a biological neuron receives all inputs through the dendrites, sums them and produces an output if the sum is greater than a threshold value.
• The input signals are passed on to the cell body through the synapse, which may accelerate or retard an arriving signal.
• It is this acceleration or retardation of the input signals that is modeled by the weights.
• An effective synapse, which transmits a stronger signal, will have a correspondingly larger weight, while a weak synapse will have a smaller weight.
• Thus, weights here are multiplicative factors of the inputs to account for the strength of the synapse.
Artificial Neurons

ANNs have been developed as generalizations of mathematical models of neural biology, based on the assumptions that:

1. Information processing occurs at many simple elements called neurons.
2. Signals are passed between neurons over connection links.
3. Each connection link has an associated weight, which, in a typical neural net, multiplies the signal transmitted.
4. Each neuron applies an activation function to its net input to determine its output signal.
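Model Of A Neuron

[Figure: inputs X1, X2 and X3 are scaled by the connection weights Wa, Wb and Wc, added by a summing function Σ, and passed through f() to give the output Y. Labels: input units (dendrite), connection weights (synapse), summing function (soma), computation (axon).]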
• A neural net consists of a large number of simple processing elements called neurons, units, cells or nodes.
• Each neuron is connected to other neurons by means of directed communication links, each with an associated weight.
• The weights represent information being used by the net to solve a problem.
• Each neuron has an internal state, called its activation or activity level, which is a function of the inputs it has received. Typically, a neuron sends its activation as a signal to several other neurons.
• It is important to note that a neuron can send only one signal at a time, although that signal is broadcast to several other neurons.
• Neural networks are configured for a specific application, such as pattern recognition or data classification, through a learning process.
• In a biological system, learning involves adjustments to the synaptic connections between neurons.
  → The same holds for artificial neural networks (ANNs).
Artificial Neural Network

[Figure: inputs x1 and x2 arrive at the dendrites, are scaled by the synaptic weights w1 and w2, are summed (Σ) in the nucleus, and the output y leaves along the axon.]

Net input: $y_{in} = x_1 w_1 + x_2 w_2$
Activation function: $f(y_{in}) = 1$ if $y_{in} \geq \theta$, and $f(y_{in}) = 0$ otherwise.

- A neuron receives input, determines the strength or the weight of each input, calculates the total weighted input, and compares the total weighted input with a value (the threshold θ).

- The value is in the range of 0 to 1.

- If the total weighted input is greater than or equal to the threshold value, the neuron produces an output; if the total weighted input is less than the threshold value, no output is produced.
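
The thresholding rule described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `threshold_neuron` and the weights and threshold chosen for the AND gate are assumptions made for the example, not values from the slides.

```python
def threshold_neuron(inputs, weights, theta):
    """Return 1 if the total weighted input reaches the threshold theta, else 0."""
    # Net input: y_in = x1*w1 + x2*w2 + ...
    y_in = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y_in >= theta else 0


# Hypothetical weights and threshold that make the neuron behave like a 2-input AND gate.
AND_WEIGHTS = [1.0, 1.0]
AND_THETA = 2.0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, threshold_neuron([x1, x2], AND_WEIGHTS, AND_THETA))
```

With these assumed values the neuron fires only when both inputs are 1; other weight and threshold choices give other gates.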
Thresholding Unit and Bias

• A very commonly used transfer function is the thresholding function.
• In this thresholding function, the sum (i.e. I) is compared with a threshold value θ.
• If the value of I is greater than θ, then the output is 1, else it is 0 (this is just like a simple linear filter).
• In other words, $y = 1$ if $I > \theta$ and $y = 0$ if $I \leq \theta$.
Advantages of ANN

• ANNs exhibit mapping capabilities, that is, they can map input patterns to their associated output patterns.
• ANNs learn by example. Thus, an ANN architecture can be trained with known examples of a problem before it is tested for its inference capabilities on unknown instances of the problem. In other words, it can identify new objects it was not previously trained on.
• ANNs possess the capability to generalize. This makes them useful in applications where an exact mathematical model of the problem is not possible.
• ANNs are robust and fault-tolerant systems. They can therefore recall full patterns from incomplete, partial or noisy patterns.
• ANNs can process information in parallel, at high speed and in a distributed manner. Thus a massively parallel distributed processing system, made up of highly interconnected (artificial) neural computing elements with the ability to learn and acquire knowledge, is possible.
Advantages Of NN

NON-LINEARITY
It can model non-linear systems

INPUT-OUTPUT MAPPING
It can derive a relationship between a set of input & output responses

ADAPTIVITY
The ability to learn allows the network to adapt to changes in the
surrounding environment

EVIDENTIAL RESPONSE
It can provide a confidence level to a given solution

CONTEXTUAL INFORMATION
Knowledge is presented by the structure of the network. Every neuron in
the network is potentially affected by the global activity of all other
neurons in the network. Consequently, contextual information is dealt with
naturally in the network.

FAULT TOLERANCE
Distributed nature of the NN gives it fault tolerant capabilities

NEUROBIOLOGY ANALOGY
Models the architecture of the brain
Comparison of ANN with conventional AI methods
Lecture-1 (Questions)

1. Discuss in detail the structure and working of a biological neuron.
2. Explain the analogy between ANN and BNN.
3. What are the advantages of ANN?
4. Design a basic NN for the AND, OR and NOT gates.
5. Can we design a basic NN for NAND, NOR, XOR and XNOR? Justify your answer.
Lecture-2

• Activation Function
• ANN Architectures
History

The discovery of the neural net resulted in the implementation of optical neural nets, Boltzmann machine, spatiotemporal nets, pulsed neural networks and support vector machines.
Model of ANN

• The models of ANN are specified by three basic entities, namely:
1. The model's synaptic interconnections (architecture)
   1. Feed-forward NN
      1. Single-layer FF NN
      2. Multilayer FF NN
   2. Feedback NN / Recurrent NN
      1. Single node with its own feedback
      2. Single-layer recurrent network
      3. Multilayer recurrent network
2. The training or learning rules adopted for updating and adjusting the connection weights
   1. Parameter learning
   2. Structure learning
3. Their activation functions
   1. Linear AF
   2. Binary step function
   3. Non-linear AF
Activation Function

• Identity function / Linear: it is a linear function and can be defined as
f(x) = x, for all x.
The output here remains the same as the input. The input layer uses the identity activation function.
• A linear activation function has two major problems:
  • Backpropagation cannot learn anything useful from it, because the derivative of the function is a constant and has no relation to the input x.
  • All layers of the neural network collapse into one if a linear activation function is used. No matter how many layers the network has, the last layer is still a linear function of the first layer. So, essentially, a linear activation function turns the neural network into just one layer.
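
The collapse of stacked linear layers can be checked numerically. The sketch below assumes two small layers with made-up weight matrices `W1` and `W2` (biases are left out for brevity) and the identity activation; the helper names are illustrative only.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both given as lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


# Hypothetical weights for two layers that both use the identity activation f(x) = x.
W1 = [[1.0, 2.0], [0.5, -1.0]]
W2 = [[3.0, 0.0], [1.0, 1.0]]
x = [2.0, -1.0]

two_layers = matvec(W2, matvec(W1, x))  # layer 2 applied to the output of layer 1
one_layer = matvec(matmul(W2, W1), x)   # a single layer whose weights are W2 * W1
print(two_layers, one_layer)
```

Both computations give the same vector, which is why extra linear layers add no representational power.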
Activation Function

• Binary step function: This function can be defined as

$$f(x) = \begin{cases} 1 & \text{if } x \geq \theta \\ 0 & \text{if } x < \theta \end{cases}$$

where θ represents the threshold value. This function is most widely used in single-layer nets to convert the net input to an output that is binary (1 or 0).
• Bipolar step function: This function can be defined as

$$f(x) = \begin{cases} +1 & \text{if } x \geq \theta \\ -1 & \text{if } x < \theta \end{cases}$$

where θ represents the threshold value. This function is also used in single-layer nets to convert the net input to an output that is bipolar (+1 or –1).
• Sigmoidal functions: The sigmoidal functions are widely used in back-propagation nets because of the relationship between the value of the function at a point and the value of its derivative at that point, which reduces the computational burden during training.
Non-Linear AF

• Non-linear activation functions solve the following limitations of linear activation functions:
  • They allow backpropagation because now the derivative function would be related to the input, and it’s possible to go back and understand which weights in the input neurons can provide a better prediction.
  • They allow the stacking of multiple layers of neurons as the output would now be a non-linear combination of input passed through multiple layers. Any output can be represented as a functional computation in a neural network.
Non-Linear AF

• Binary sigmoid function: It is also termed the logistic sigmoid function or unipolar sigmoid function. It can be defined as

$$f(x) = \frac{1}{1 + e^{-\lambda x}}$$

where λ is the steepness parameter. The derivative of this function is

$$f'(x) = \lambda f(x)\,[1 - f(x)]$$
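
As a quick check of the binary sigmoid and of the fact that its derivative can be written in terms of the function value itself, here is a minimal sketch. It assumes the standard logistic form f(x) = 1 / (1 + e^(-λx)) with derivative λ f(x) (1 - f(x)); the function names are illustrative.

```python
import math


def binary_sigmoid(x, lam=1.0):
    """Logistic sigmoid f(x) = 1 / (1 + exp(-lam * x)); the output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lam * x))


def binary_sigmoid_derivative(x, lam=1.0):
    """Derivative written through the function value: lam * f(x) * (1 - f(x))."""
    fx = binary_sigmoid(x, lam)
    return lam * fx * (1.0 - fx)


# Compare the closed form with a central finite difference at a sample point.
x, lam, h = 0.7, 2.0, 1e-6
numeric = (binary_sigmoid(x + h, lam) - binary_sigmoid(x - h, lam)) / (2 * h)
print(binary_sigmoid_derivative(x, lam), numeric)
```

Because the derivative reuses f(x), a network that has already computed its activations in the forward pass gets the derivative almost for free during training.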
• Why the sigmoid/logistic activation function is one of the most widely used functions:
  • It is commonly used for models where we have to predict the probability as an output. Since probability of anything exists only between the range of 0 and 1, sigmoid is the right choice because of its range.
  • The function is differentiable and provides a smooth gradient, i.e., preventing jumps in output values. This is represented by an S-shape of the sigmoid activation function.
Limitation of Logistic AF

• The derivative of the function is
  • f'(x) = sigmoid(x)*(1-sigmoid(x)).
• As we can see from the above Figure, the gradient values are only significant for range -3 to 3, and the graph gets much flatter in other regions.
• It implies that for values greater than 3 or less than -3, the function will have very small gradients. As the gradient value approaches zero, the network ceases to learn and suffers from the Vanishing gradient problem.
• The output of the logistic function is not symmetric around zero. So the output of all the neurons will be of the same sign. This makes the training of the neural network more difficult and unstable.
• Bipolar sigmoid function: This function is defined as

$$f(x) = \frac{2}{1 + e^{-\lambda x}} - 1 = \frac{1 - e^{-\lambda x}}{1 + e^{-\lambda x}}$$

where λ is the steepness parameter and the sigmoid function range is between –1 and +1. The derivative of this function can be written as

$$f'(x) = \frac{\lambda}{2}\,[1 + f(x)]\,[1 - f(x)]$$

• The bipolar sigmoidal function is closely related to the hyperbolic tangent function, which is written as

$$h(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

The derivative of the hyperbolic tangent function is

$$h'(x) = [1 + h(x)]\,[1 - h(x)]$$

If the network uses binary data, it is better to convert it to bipolar form and use the bipolar sigmoidal activation function or the hyperbolic tangent function.
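
The close relation between the bipolar sigmoid and the hyperbolic tangent can be verified numerically. The sketch below assumes the usual definition f(x) = 2 / (1 + e^(-λx)) - 1, for which f(x) = tanh(λx / 2); the function names are illustrative.

```python
import math


def bipolar_sigmoid(x, lam=1.0):
    """Bipolar sigmoid 2 / (1 + exp(-lam * x)) - 1; the output lies in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-lam * x)) - 1.0


def bipolar_sigmoid_derivative(x, lam=1.0):
    """Derivative written through the function value: (lam / 2) * (1 + f) * (1 - f)."""
    fx = bipolar_sigmoid(x, lam)
    return (lam / 2.0) * (1.0 + fx) * (1.0 - fx)


# With lam = 1 the bipolar sigmoid coincides with tanh(x / 2).
for x in (-2.0, 0.0, 0.5, 3.0):
    print(x, bipolar_sigmoid(x), math.tanh(x / 2.0))
```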
• Advantages of using this activation function are:
  • The output of the tanh activation function is zero-centered; hence we can easily map the output values as strongly negative, neutral, or strongly positive.
  • It is usually used in hidden layers of a neural network, as its values lie between -1 and 1; therefore, the mean for the hidden layer comes out to be 0 or very close to it. This helps in centering the data and makes learning for the next layer much easier.
• It also faces the problem of vanishing gradients similar to the sigmoid activation function. Plus the gradient of the tanh function is much steeper as compared to the sigmoid function.
• Although both sigmoid and tanh face vanishing gradient issue, tanh is zero centered, and the gradients are not restricted to move in a certain direction. Therefore, in practice, tanh nonlinearity is always preferred to sigmoid nonlinearity.
• Both tanh and logistic sigmoid activation functions are used in feed-forward nets.
ReLU : Rectified Linear Unit

• Although it gives an impression of a linear function, ReLU has a derivative function and allows for backpropagation while simultaneously making it computationally efficient.
• The main catch here is that the ReLU function does not activate all the neurons at the same time.
• The neurons will only be deactivated if the output of the linear transformation is less than 0.
• Mathematically, f(x) = max(0,x)
• The advantages of using ReLU as an activation function are as follows:
  • Since only a certain number of neurons are activated, the ReLU function is far more computationally efficient when compared to the sigmoid and tanh functions.
  • ReLU accelerates the convergence of gradient descent towards the global minimum of the loss function due to its linear, non-saturating property.
• The limitations faced by this function are:
  • The Dying ReLU problem:
    • The negative side of the graph makes the gradient value zero. Due to this reason, during the backpropagation process, the weights and biases for some neurons are not updated. This can create dead neurons which never get activated.
    • All the negative input values become zero immediately, which decreases the model’s ability to fit or train from the data properly.
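
The dying ReLU effect can be seen in a one-weight example. This is a minimal sketch under stated assumptions: a single neuron with pre-activation z = w * x, an upstream gradient fixed at 1.0, and made-up values for the weight, input and learning rate.

```python
def relu(x):
    """ReLU: f(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU; the value at x = 0 is taken to be 0 here by convention."""
    return 1.0 if x > 0 else 0.0


# One gradient-descent step for a single weight w, with pre-activation z = w * x.
# Hypothetical numbers; the upstream gradient dL/dy is assumed to be 1.0.
w, x, lr = -0.5, 2.0, 0.1
z = w * x                        # negative, so the neuron is inactive
grad_w = 1.0 * relu_grad(z) * x  # chain rule: dL/dy * dy/dz * dz/dw
w_new = w - lr * grad_w          # unchanged, because the gradient is zero
print(relu(z), grad_w, w_new)
```

As long as the pre-activation stays negative for every training input, the weight receives no update, which is exactly the dead-neuron situation described above.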
Leaky ReLU
• Leaky ReLU is an improved version of ReLU function to solve the Dying ReLU problem as it has a small positive slope in the negative area.
• Mathematically, f(x) = max(0.01x,x)
• The advantages of Leaky ReLU are same as that of ReLU, in addition to the fact that it does enable backpropagation, even for negative input values.
• By making this minor modification for negative input values, the gradient of the left side of the graph comes out to be a non-zero value. Therefore, we would no longer encounter dead neurons in that region.
• The limitations that this function faces include:
  • The predictions may not be consistent for negative input values.
  • The gradient for negative values is a small value that makes the learning of model parameters time-consuming.
Parametric / Randomized ReLU
• Parametric ReLU is another variant of ReLU that aims to solve the problem of the gradient becoming zero for the left half of the axis.
• This function provides the slope of the negative part of the function as an argument a. By performing backpropagation, the most appropriate value of a is learnt.
• Mathematically, f(x) = max(ax, x)
• Where "a" is the slope parameter for negative values.
• The parameterized ReLU function is used when the leaky ReLU function still fails at solving the problem of dead neurons, and the relevant information is not successfully passed to the next layer.
• This function’s limitation is that it may perform differently for different problems depending upon the value of the slope parameter a.
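
Leaky ReLU and Parametric ReLU share one formula and differ only in whether the slope is fixed or learnt. The sketch below assumes 0 <= a <= 1 (so that max(ax, x) picks the right branch) and uses illustrative function names; the gradient with respect to a is what backpropagation would use to learn the slope.

```python
def prelu(x, a):
    """Parametric ReLU f(x) = max(a*x, x), assuming 0 <= a <= 1."""
    return max(a * x, x)


def prelu_grad_x(x, a):
    """Derivative with respect to the input x: 1 for positive inputs, a otherwise."""
    return 1.0 if x > 0 else a


def prelu_grad_a(x):
    """Derivative with respect to the slope a: x for negative inputs, 0 otherwise."""
    return x if x < 0 else 0.0


print(prelu(-3.0, 0.01))   # Leaky ReLU is the special case a = 0.01
print(prelu(-3.0, 0.25))   # a larger, hypothetical learnt slope
print(prelu_grad_x(-3.0, 0.25), prelu_grad_a(-3.0))
```

The non-zero gradient for negative inputs is what keeps the neuron trainable where plain ReLU would stall.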
Exponential Linear Units (ELUs) Function

• ELU is also a variant of ReLU that modifies the slope of the negative part of the function.
• ELU uses an exponential curve to define the negative values, unlike the Leaky ReLU and Parametric ReLU functions, which use a straight line.
• ELU is a strong alternative to ReLU because of the following advantages:
  • ELU saturates smoothly towards -α for large negative inputs, whereas ReLU has a sharp corner at zero.
  • It avoids the dead ReLU problem by introducing an exponential curve for negative values of input. This helps the network nudge weights and biases in the right direction.
• The limitations of the ELU function are as follows:
  • It increases the computational time because of the exponential operation included.
  • No learning of the α value takes place.
  • Exploding gradient problem.
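
The slides give no formula for ELU, so the sketch below assumes the commonly used definition: f(x) = x for x >= 0 and f(x) = α(e^x - 1) for x < 0, with α fixed by hand rather than learnt. The function names are illustrative.

```python
import math


def elu(x, alpha=1.0):
    """ELU: x for x >= 0, alpha * (exp(x) - 1) for x < 0 (saturates towards -alpha)."""
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)


def elu_grad(x, alpha=1.0):
    """Derivative of ELU: 1 for x >= 0, alpha * exp(x) for x < 0 (never exactly zero)."""
    return 1.0 if x >= 0 else alpha * math.exp(x)


for x in (-10.0, -1.0, 0.0, 2.0):
    print(x, elu(x), elu_grad(x))
```

The gradient for negative inputs is small but positive, which is how ELU sidesteps dead neurons at the cost of an exponential per activation.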
Softmax Function

• The Softmax function is described as a combination of multiple sigmoids.
• It calculates the relative probabilities. Similar to the sigmoid/logistic activation function, the SoftMax function returns the probability of each class.
• It is most commonly used as an activation function for the last layer of the neural network in the case of multi-class classification.
Need of Softmax
• The output of the sigmoid function was in the range of 0 to 1, which can be thought of as probability.
• But the sigmoid function faces certain problems.
• Let’s suppose we have five output values of 0.8, 0.9, 0.7, 0.8, and 0.6, respectively. How can we move forward with it?
• The answer is: We can’t.
• The above values don’t make sense as the sum of all the classes/output probabilities should be equal to 1.
Example (Multi-class Classification)

• Assume that you have three classes, meaning that there would be three neurons in the output layer. Now, suppose that the output from the neurons is [1.8, 0.9, 0.68].
• Applying the softmax function over these values to give a probabilistic view results in the following outcome: [0.58, 0.23, 0.19].
• Taking the argmax of these probabilities then returns 1 for the index with the largest probability and 0 for the other two indexes, giving full weight to index 0 and no weight to index 1 and index 2. So the output is the class corresponding to the 1st neuron (index 0) out of three.
• You can see now how the softmax activation function makes things easy for multi-class classification problems.
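
The numbers in this example can be reproduced with a few lines of Python. The sketch assumes the standard softmax definition, exp(z_i) divided by the sum of exp(z_j); subtracting the maximum before exponentiating is a common numerical-stability choice and does not change the result.

```python
import math


def softmax(logits):
    """Softmax: exp(z_i) / sum_j exp(z_j), computed with the max subtracted for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.8, 0.9, 0.68])
predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax over the classes
print([round(p, 2) for p in probs], predicted)
```

The probabilities sum to 1, and the argmax picks index 0, matching the slide.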
Swish

• It is a self-gated activation function developed by researchers at Google.
• Swish consistently matches or outperforms the ReLU activation function on deep networks applied to various challenging domains such as image classification, machine translation etc.
• This function is bounded below but unbounded above, i.e. Y approaches a constant value as X approaches negative infinity, but Y approaches infinity as X approaches infinity.
• Mathematically it can be represented as: $f(x) = x \cdot \mathrm{sigmoid}(x) = \dfrac{x}{1 + e^{-x}}$
• Here are a few advantages of the Swish activation function over ReLU:
  • Swish is a smooth function, which means that it does not abruptly change direction like ReLU does near x = 0. Rather, it smoothly bends from 0 towards values < 0 and then upwards again.
  • Small negative values were zeroed out in the ReLU activation function. However, those negative values may still be relevant for capturing patterns underlying the data. Large negative values are zeroed out for reasons of sparsity, making it a win-win situation.
  • The Swish function, being non-monotonic, enhances the expression of input data and weights to be learnt.
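
A short sketch of Swish, assuming the basic form f(x) = x * sigmoid(x) without the optional trainable scaling parameter. The sample points are chosen only to show the dip below zero and the return towards zero for large negative inputs.

```python
import math


def swish(x):
    """Swish: f(x) = x * sigmoid(x) = x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))


# The function dips below zero for small negative inputs and returns towards 0
# for large negative inputs, so it is not monotonic.
for x in (-20.0, -5.0, -1.0, -0.1, 0.0, 1.0, 5.0):
    print(x, swish(x))
```

Note that this simple form calls `math.exp(-x)`, which overflows for very large negative x; a production implementation would guard against that.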
Gaussian Error Linear Unit (GELU)

• The Gaussian Error Linear Unit (GELU) activation function is used in BERT, RoBERTa, ALBERT, and other top NLP models. This activation function is motivated by combining properties from dropout, zoneout, and ReLUs.
• ReLU and dropout both determine a neuron’s output by multiplying its input. ReLU does it deterministically by multiplying the input by zero or one (depending on whether the input is negative or positive), and dropout does it by stochastically multiplying by zero.
• An RNN regularizer called zoneout stochastically multiplies inputs by one.
• GELU merges this functionality by multiplying the input by either zero or one, where the choice is stochastic and depends on the input. The neuron input x is multiplied by m ∼ Bernoulli(Φ(x)), where Φ(x) = P(X ≤ x), X ∼ N(0, 1), is the cumulative distribution function of the standard normal distribution.
• This distribution is chosen since neuron inputs tend to follow a normal distribution.
• The GELU nonlinearity is reported to perform better than ReLU and ELU activations, with performance improvements across tasks in computer vision, natural language processing, and speech recognition.
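
Since m ∼ Bernoulli(Φ(x)), the expected value of m · x is x · Φ(x), and that deterministic expression is what is normally computed as GELU. The sketch below assumes this exact form and evaluates Φ with the error function from Python's `math` module; the function names are illustrative.

```python
import math


def standard_normal_cdf(x):
    """Phi(x) = P(X <= x) for X ~ N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu(x):
    """GELU as the expected value of the stochastic gate: x * Phi(x)."""
    return x * standard_normal_cdf(x)


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, gelu(x))
```

Large positive inputs pass through almost unchanged, large negative inputs are pushed towards zero, and inputs near zero are scaled smoothly in between.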
Scaled Exponential Linear Unit (SELU)
• SELU was defined in self-normalizing networks and takes care of internal normalization, which means each layer preserves the mean and variance from the previous layers. SELU enables this normalization by adjusting the mean and variance.
• SELU has both positive and negative values to shift the mean, which was impossible for the ReLU activation function as it cannot output negative values.
• Gradients can be used to adjust the variance. The activation function needs a region with a gradient larger than one to increase it.
• SELU has predefined values of alpha (α) and lambda (λ).
• Here’s the main advantage of SELU over ReLU:
  • Internal normalization is faster than external normalization, which means the network converges faster.
• SELU is a relatively new activation function and needs more papers on architectures such as CNNs and RNNs, where it is comparatively less explored.
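
The slides do not state the SELU formula or its constants. The sketch below assumes the definition from the self-normalizing networks paper: λx for x > 0 and λα(e^x - 1) for x <= 0, with the fixed constants α ≈ 1.6733 and λ ≈ 1.0507.

```python
import math

# Predefined constants (assumed here from the self-normalizing networks paper).
SELU_ALPHA = 1.6732632423543772
SELU_LAMBDA = 1.0507009873554805


def selu(x):
    """SELU: lambda * x for x > 0, lambda * alpha * (exp(x) - 1) for x <= 0."""
    if x > 0:
        return SELU_LAMBDA * x
    return SELU_LAMBDA * SELU_ALPHA * (math.exp(x) - 1.0)


for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(x, selu(x))
```

The slope λ > 1 on the positive side is the "gradient larger than one" region mentioned above, and the negative branch lets activations shift the mean below zero.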
• Ramp function
How to choose the right Activation Function?
• You need to match your activation function for your output layer based on the type of
prediction problem that you are solving—specifically, the type of predicted variable.

• Here’s what you should keep in mind.

• As a rule of thumb, you can begin with using the ReLU activation function and then
move over to other activation functions if ReLU doesn’t provide optimum results.
• And here are a few other guidelines to help you out.
• ReLU activation function should only be used in the hidden layers.
• Sigmoid/Logistic and Tanh functions should not be used in hidden layers as they
make the model more susceptible to problems during training (due to vanishing
gradients).
• Swish function is used in neural networks having a depth greater than 40 layers.
• A few rules for choosing the activation function for your output layer based on the type of prediction problem that you are solving:
  • Regression - Linear Activation Function
  • Binary Classification - Sigmoid/Logistic Activation Function
  • Multiclass Classification - Softmax
  • Multilabel Classification - Sigmoid
• The activation function used in hidden layers is typically chosen based on the type of neural network architecture.
  • Convolutional Neural Network (CNN): ReLU activation function.
  • Recurrent Neural Network: Tanh and/or Sigmoid activation function.
Summary

• Activation Functions are used to introduce non-linearity in the network.

• A neural network will almost always have the same activation function in all hidden layers.
This activation function should be differentiable so that the parameters of the network are
learned in backpropagation.

• ReLU is the most commonly used activation function for hidden layers.

• While selecting an activation function, you must consider the problems it might face: vanishing
and exploding gradients.

• Regarding the output layer, we must always consider the expected value range of the
predictions. If it can be any numeric value (as in case of the regression problem) you can use
the linear activation function or ReLU.
Neural Network Architectures
Single layer feed forward neural network
• A layer of n neurons constitutes a single layer feed forward neural network.
• It is so called because it contains a single layer of artificial neurons.
• Note that the input layer and output layer, which receive input signals and transmit output signals, are called layers, but they are actually the boundary of the architecture and hence not truly layers.
• The only layer in the architecture is formed by the synaptic links carrying the weights, which connect every input to the output neurons.
Multilayer feed forward neural networks

• This network, as its name indicates, is made up of multiple layers.
• Thus, architectures of this class, besides possessing an input and an output layer, also have one or more intermediary layers called hidden layers.
• The hidden layer(s) aid in performing useful intermediary computation before directing the input to the output layer.
• A multilayer feed forward network with l input neurons (the number of neurons in the first layer), $m_i$ neurons in the i-th hidden layer ($i = 1, 2, \ldots, p$) and n neurons in the last layer (the output neurons) is written as $l - m_1 - m_2 - \cdots - m_p - n$.
MLFFNN
• In an l - m - n MLFFNN, the first (input) layer contains l neurons, the only hidden layer contains m neurons and the last (output) layer contains n neurons.
• The inputs $x_1, x_2, \ldots, x_p$ are fed to the first layer, and the weight matrices between the input and the first layer, the first layer and the hidden layer, and the hidden and the last (output) layer are denoted as $W^1$, $W^2$ and $W^3$, respectively.
• Further, consider that $f^1$, $f^2$ and $f^3$ are the transfer functions of the neurons lying on the first, hidden and last layers, respectively. Likewise, the threshold value of any i-th neuron in the j-th layer is denoted by $\theta_i^j$.
• Moreover, the output of the i-th, j-th, and k-th neuron in any l-th layer is represented by $O_i^l = f_i^l\left(\sum X_i W^l + \theta_i^l\right)$, where $X^l$ is the input vector to the l-th layer.
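
The layer-by-layer computation can be sketched as a forward pass in plain Python. This is a simplified illustration, not the exact network on the slide: it assumes a hypothetical 2-3-1 network, applies weights only between the input and hidden layers and between the hidden and output layers, uses a sigmoid transfer function everywhere, and all weights and thresholds are made up.

```python
import math


def sigmoid(v):
    """Logistic transfer function."""
    return 1.0 / (1.0 + math.exp(-v))


def layer_output(W, x, theta, f):
    """O_i = f(sum_j W[i][j] * x[j] + theta[i]) for every neuron i in the layer."""
    return [f(sum(w * xj for w, xj in zip(row, x)) + t) for row, t in zip(W, theta)]


# Hypothetical 2-3-1 network: 2 inputs, 3 hidden neurons, 1 output neuron.
W_hidden = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]]
theta_hidden = [0.0, 0.1, -0.1]
W_out = [[0.6, -0.2, 0.9]]
theta_out = [0.05]

x = [1.0, 0.5]
hidden = layer_output(W_hidden, x, theta_hidden, sigmoid)  # input to the output layer
output = layer_output(W_out, hidden, theta_out, sigmoid)
print(hidden, output)
```

Each layer's output vector becomes the input vector of the next layer, which is all that "feed forward" means here.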
Recurrent neural network architecture

• These networks differ from feedforward network architectures in the sense that there is at least one "feedback loop".
• Thus, in these networks, there could exist one layer with a feedback connection. There could also be neurons with self-feedback links, that is, the output of a neuron is fed back into itself as input.
• Depending on the type of feedback loops, several recurrent neural networks are known, such as the Hopfield network, the Boltzmann machine network, etc.
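
A self-feedback link can be illustrated with a single neuron whose previous output is fed back as an extra input. The sketch assumes the update y(t) = tanh(w_in * x(t) + w_fb * y(t-1)); the function name, the tanh activation and the weights are illustrative choices, not taken from the slides.

```python
import math


def run_self_feedback_neuron(inputs, w_in, w_fb, y0=0.0):
    """Apply y(t) = tanh(w_in * x(t) + w_fb * y(t-1)) over a sequence of inputs."""
    y, outputs = y0, []
    for x in inputs:
        y = math.tanh(w_in * x + w_fb * y)  # the previous output re-enters as input
        outputs.append(y)
    return outputs


# A single pulse followed by zeros: with feedback the neuron keeps responding.
print(run_self_feedback_neuron([1.0, 0.0, 0.0], w_in=1.0, w_fb=0.5))
print(run_self_feedback_neuron([1.0, 0.0, 0.0], w_in=1.0, w_fb=0.0))
```

With `w_fb = 0` the neuron behaves like a feed-forward unit and forgets the pulse immediately; with a non-zero feedback weight the pulse decays over several steps.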
Single Layer feedback NN
Multi-Layer Feedback NN
Modular Neural Network

• A modular neural network has a number of different networks that function independently and perform sub-tasks. The different networks do not really interact with or signal each other during the computation process. They work independently towards achieving the output.
• As a result, a large and complex computational process can be done significantly faster by breaking it down into independent components. The computation speed increases because the networks are not interacting with or even connected to each other. Here’s a visual representation of a Modular Neural Network.
Learning Paradigm

• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Supervised Learning 83
Consider yourself as a student sitting in a classroom wherein your teacher is supervising you, “how you can solve the
problem” or “whether you are doing correctly or not”. Likewise, in Supervised Learning input is provided as a labelled dataset, a
model can learn from it to provide the result of the problem easily.
Types of Problems
Supervised Learning deals with two types of problem- classification problems and regression problems.

Classification Problems
• These algorithms predict a discrete value: the input is treated as a member of a particular class or group. For instance, in a fruit photo dataset each photo is labelled as a mango, an apple, etc., and the algorithm has to classify new images into one of these categories.
• Examples: Naive Bayes Classifier, Support Vector Machines, Logistic Regression

Regression Problems
• These problems involve predicting a continuous value. For example, predicting the price of a piece of land in a city, given the area, location, number of rooms, etc. The input is sent to the machine, which calculates the price of the land according to previous examples.
• Examples: Linear Regression, Nonlinear Regression, Bayesian Linear Regression
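The two problem types can be illustrated with a small sketch on made-up labelled data. The regression part fits a straight line by ordinary least squares (a simple form of linear regression). The classification part uses a nearest-neighbour rule, which is not one of the classifiers listed above and is substituted here only because it fits in a few lines. All numbers, feature meanings and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to labelled examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def nearest_neighbour(train, query):
    """Return the label of the training example closest to the query."""
    closest = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)))
    return closest[1]

# Regression: continuous target (made-up land area -> price).
areas = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(areas, prices)
predicted_price = slope * 5.0 + intercept

# Classification: discrete target (made-up features -> fruit label).
fruits = [
    ((150.0, 0.9), "mango"),
    ((170.0, 0.8), "mango"),
    ((120.0, 0.3), "apple"),
    ((110.0, 0.2), "apple"),
]
predicted_label = nearest_neighbour(fruits, (112.0, 0.25))
print(predicted_price, predicted_label)
```

In both cases the model only sees inputs paired with known answers, and is then asked about an input it has not seen before.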
Unsupervised Learning 84

• Unsupervised Learning is the opposite of Supervised Learning: there is no complete and clean labelled dataset. It is self-organized learning whose main aim is to explore the underlying patterns in the data and predict an output from them. We provide the machine with data and ask it to look for hidden features and cluster the data in a way that makes sense. Examples:
• K-Means clustering
• Neural Networks
• Principal Component Analysis
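As a rough sketch of clustering without labels, the following K-Means loop alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its cluster. The data points, the choice of two clusters, the initial centroids and the fixed iteration count are all assumptions made for illustration.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update for a fixed number of iterations."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [mean(c) if c else centroids[k] for k, c in enumerate(clusters)]
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
initial = [(1.0, 1.0), (8.0, 8.0)]   # assumed starting centroids
centroids, clusters = kmeans(points, initial)
print(centroids)
print(clusters)
```

No labels are supplied anywhere; the two groups emerge purely from the distances between the points.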
Reinforcement Learning 85

• It is based on neither supervised learning nor unsupervised learning. Here the algorithms learn to react to an environment on their own. The field is growing rapidly and producing a variety of learning algorithms, which are useful in fields such as Robotics and Gaming.
• For a learning agent, there is always a start state and an end state, but there may be several different paths to reach the end state. In a Reinforcement Learning problem, an agent tries to manipulate the environment, travelling from one state to another. The agent gets a reward (appreciation) on success but receives no reward on failure. In this way, the agent learns from the environment.
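The start state, end state and reward described here can be sketched with a tabular Q-learning style update on a tiny chain of states. To keep the result deterministic, this sketch sweeps over every state-action pair instead of letting the agent sample episodes, which is a simplification. The environment, the learning rate, the discount factor and the number of sweeps are all hypothetical.

```python
N_STATES = 4          # states 0..3; state 0 is the start, state 3 is the end state
ACTIONS = (-1, 1)     # move left or move right
GAMMA = 0.9           # discount factor (assumed)
ALPHA = 0.5           # learning rate (assumed)

def step(state, action):
    """Move along the chain; reward 1 only when the end state is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

# Action-value table for every non-terminal state.
q = {(s, a): 0.0 for s in range(N_STATES - 1) for a in ACTIONS}

for _ in range(200):
    for s in range(N_STATES - 1):
        for a in ACTIONS:
            nxt, r = step(s, a)
            # No future value once the end state has been reached.
            future = 0.0 if nxt == N_STATES - 1 else max(q[(nxt, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * future - q[(s, a)])

# Greedy policy: in each state pick the action with the highest learned value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The learned values are highest for actions that lead towards the rewarded end state, so the greedy policy moves right from every state.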

(i) supervised learning, which requires the availability of a target or desired response for the realization of a specific input–output mapping by minimizing a cost function of interest;

(ii) unsupervised learning, the implementation of which relies on the provision of a task-independent measure of the quality of representation that the network is required to learn in a self-organized manner;

(iii) reinforcement learning, in which input–output mapping is performed through the continued interaction of a learning system with its environment so as to minimize a scalar index of performance.

• Semisupervised learning employs a training sample that consists of labeled as well as unlabeled examples. The challenge in semisupervised learning is to design a learning system that scales reasonably well, so that its implementation is practically feasible when dealing with large-scale pattern classification problems.
• Reinforcement learning lies between supervised learning and
unsupervised learning. It operates through continuing
interactions between a learning system (agent) and the
environment. The learning system performs an action and learns
from the response of the environment to that action. In effect,
the role of the teacher in supervised learning is replaced by a
critic, for example, that is integrated into the learning machinery.
Key Differences 89

• Supervised Learning deals with two main tasks: Regression and Classification. Unsupervised Learning deals with clustering and association rule mining problems. Reinforcement Learning, in contrast, deals with the exploration-exploitation trade-off, Markov decision processes, Policy Learning, Deep Learning and Value Learning.
• Supervised Learning works with labelled data, so the output data patterns are known to the system. Unsupervised Learning deals with unlabelled data, where the output is based on the collection of perceptions. In Reinforcement Learning, formulated as a Markov decision process, the agent interacts with the environment in discrete steps.

• As the name says, Supervised Learning is highly supervised and Unsupervised Learning is not supervised. In contrast, Reinforcement Learning is less supervised and depends on the agent to determine the output.
• The input data in Supervised Learning is labelled data, whereas in Unsupervised Learning the data is unlabelled. The data is not predefined in Reinforcement Learning.
• Supervised Learning predicts an output such as a class type. Unsupervised Learning discovers underlying patterns. In Reinforcement Learning, the learning agent works as a reward and action system.
• Supervised Learning maps labelled data to known outputs, whereas Unsupervised Learning explores patterns and predicts the output. Reinforcement Learning follows a trial-and-error method.
• To sum up, in Supervised Learning the goal is to generate a formula based on input and output values. In Unsupervised Learning, we find associations between input values and group them. In Reinforcement Learning, an agent learns through delayed feedback by interacting with the environment.