ASC-unit 1 Notes

The document provides details about the syllabus for a course on neural networks including topics like neuron structure, artificial neurons, activation functions, neural network architecture, learning techniques, and memory networks. It describes key concepts such as the biological structure of neurons, the components of artificial neurons, different types of neural networks and their applications.


Unit-1

Syllabus
Neural Networks-I (Introduction & Architecture): Neuron, nerve structure and synapse,
artificial neuron and its model, activation functions, neural network architecture: single-layer and
multilayer feedforward networks, recurrent networks. Various learning techniques; perceptron
and convergence rule, auto-associative and hetero-associative memory.
Detailed Topics
1. Neuron
2. Nerve Structure and synapse
3. Artificial Neuron and its Model
4. Activation functions
4.1 Linear activation function
4.2 Binary activation function
4.3 Non-Linear activation function
4.3.1 Sigmoid activation function
4.3.2 Tanh activation function
4.3.3 ReLU activation function
4.3.4 Leaky ReLU activation function
4.3.5 ELU activation function
4.3.6 Softmax activation function
5. Neural Network Architecture:
a. Single Layer Feed Forward
b. Multi-Layer Feed forward
c. Recurrent Networks
6. Various Learning Techniques
6.1 Unsupervised Learning
6.1.1 Hebb Learning Rule
6.1.2 Competitive (Winner-take-all) Learning Rule
6.2 Supervised Learning
6.2.1 Stochastic Learning Rule
6.2.2 Gradient descent Learning Rule
6.2.2.1 LMS (Least Mean Square Error)
6.2.2.2 Backpropagation
6.3 Reinforcement Learning
7. Perceptron and Convergence Rule
8. Auto associative Memory and Hetero Associative memory
8.1 Hopfield Network

NEURON, NERVE STRUCTURE


Neurons are the information carriers that use electrical impulses and chemical signals to transmit
information. Neurons transmit information between:

1. different parts of the brain

2. the brain and the rest of the nervous system
Thus, whatever we think, feel, and do is all due to the working of neurons. The
following figure shows a typical biological neuron:
A neuron has the following three basic parts:

1. Cell body
2. Axon (a long cell extension)
3. Dendrites (branching cell extensions)

The following figure shows the architecture of a biological neuron:

The nucleus in the cell body controls the cell's functioning. The axon (a long, tail-like extension)
transmits messages away from the cell. The dendrites (tree-branch-like extensions) receive messages for the cell.
So, in a nutshell, biological neurons communicate with each other by sending
chemicals, called neurotransmitters, across a tiny space, called a synapse, between the axons and
dendrites of adjacent neurons.

ARTIFICIAL NEURON AND ITS MODEL

An artificial neuron or neural node is a mathematical model. In most cases, it computes the weighted
sum of its inputs and then adds a bias to it. It then passes this resultant term through an
activation function. The activation function is a nonlinear function, such as the sigmoid function, that
accepts a linear input and gives a nonlinear output.
The following figure shows a typical artificial neuron:
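As a rough illustration of this computation, here is a minimal Python sketch of a single artificial neuron (the sigmoid activation and the example weights and bias are illustrative choices, not values from the notes):

import math

def sigmoid(x):
    # Nonlinear activation: squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def artificial_neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias term
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Pass the linear combination through the activation function
    return sigmoid(z)

# Example: two inputs with illustrative weights and bias
print(artificial_neuron([0.5, 0.8], weights=[0.4, 0.6], bias=-0.3))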

A typical neural network consists of layers of neurons called neural nodes. These layers are of the
following three types:

1. input layer (single)


2. hidden layer (one or more than one)
3. output layer (single)

Each neural node is connected to others and is characterized by its weights and a threshold. It
receives an input, applies some transformation to it, and then sends an output. If the output of any
individual node is above the specified threshold value, that node gets activated and sends data to
the next layer of the network. Otherwise, it remains dormant and doesn't transmit any data to the
next layer of the network.
The following figure marks all three types of layers:
Input Layer

This is the first layer in a typical neural network. Input layer neurons receive the input information,
process it through a mathematical function (activation function), and transmit output to the next layer’s
neurons based on comparison with a preset threshold value. We usually have only one input layer in
the network.
We pre-process text, image, audio, video, and other types of data to derive their numeric representation.
We then pass this numeric representation as input to each input layer neuron. Each neuron
applies a predefined nonlinear mathematical function to calculate its output.
As a final step, the output is scaled by the preset weights on the edges connecting the outgoing layer's
neurons to the respective neurons of the incoming layer.

Hidden Layer

Then comes the hidden layer. There can be one or more hidden layers in a neural network. Neurons in
a hidden layer receive their inputs either from the neurons of the input layer or from the neurons of the
previous hidden layer. Each neuron then passes its input through a nonlinear activation function and
sends the output to the neurons of the next layer.
Here also, the data is multiplied by the edge weights as it is transmitted to the next layer.

Output Layer

At last, comes the output layer. We have only one output layer in the network that marks the logical
ending of the neural network.
Similar to the previously described layers, neurons in the output layer receive inputs from the previous layer.
They then process them through their activation functions and output the expected results. Depending
on the type of Artificial Neural Network, we can either use this output as the final result or feed it as an
input to the same neural network (loopback) or to another neural network (cascade).
Now, let's talk about the output layer's result. The final result can be a simple binary
classification denoting one of two classes, a multi-class classification, or a predicted value.

TYPES OF NEURAL NETWORKS

Here, we’ll study different types of neural networks based on the direction of information flow.
1. Single layer neural network

Here, X1 and X2 are the inputs to the artificial neuron, f(X) represents the processing done on the
inputs, and y represents the output of the neuron.
2. Multi Layer Feedforward Neural Network
Here, the signal or the information travels in one direction only. In a feedforward neural network,
information flows from the input layer to the hidden layer(s) and finally to the output layer. There
is no feedback or loop in it.
In other words, the output of a layer does not affect that layer or any earlier layer in this type of
network; it only affects the output of the layers ahead of it. Feedforward neural networks
are simple, straightforward networks with a one-to-one mapping between inputs and outputs.
The following figure shows a typical feedforward neural network:

We mostly use them in pattern generation, pattern recognition, and classification.
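As a minimal sketch (assuming one hidden layer, sigmoid activations, and made-up example weights), a forward pass through such a network could look like this in Python:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_pass(x, W1, b1, W2, b2):
    # Input -> hidden layer: weighted sum plus bias, then activation
    hidden = sigmoid(W1 @ x + b1)
    # Hidden -> output layer: information flows in one direction only
    output = sigmoid(W2 @ hidden + b2)
    return output

# Example with 2 inputs, 3 hidden neurons, 1 output (illustrative weights)
x  = np.array([1.0, 0.5])
W1 = np.array([[0.2, 0.8], [0.5, -0.1], [0.3, 0.7]])
b1 = np.zeros(3)
W2 = np.array([[0.6, -0.4, 0.9]])
b2 = np.zeros(1)
print(forward_pass(x, W1, b1, W2, b2))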

3.Feedback Neural Network

After going through the feedforward neural network, let’s move to the feedback neural network.
In this type of network, the signal or the information flows in both directions, i.e., forward and
backward. This makes them more powerful and more complex than the feed-forward neural networks.
Feedback neural networks are dynamic because the network state keeps changing until it reaches an
equilibrium point. They remain at the equilibrium point as long as the input stays the same. Once the
input changes, the process repeats until a new equilibrium is found.
We also call these networks interactive or recurrent networks due to their dynamic
architecture, and they contain feedback loops.
The following figure shows a typical feedback neural network:

We mostly use them in speech recognition, text prediction, and image recognition.

Understanding Neural Networks – Bias, Threshold, Benefits of Neural Networks

In this section, let’s enumerate the benefits of neural networks.


Neural networks possess the unique ability to derive quantifiable meaning from complicated or
imprecise data. We can employ a well-structured neural network to extract patterns and detect trends
that are too complex for us to discover and understand using other computer techniques.
Once we design a network for a specific problem and train it with a well-curated dataset, we can use it
to analyze complex information. This aids in decision-making at the highest level by providing
projections for all plausible situations.
The single most significant benefit of neural networks is their adaptive learning capacity: the ability
to learn how to perform tasks based on the data given for training and then to
constantly improve performance as more data becomes available.

What is a Firing(activating) of a neuron?


In real life, we have all heard the phrase "fire up those neurons" in one form or another. The same
applies to artificial neurons as well. Every neuron has a tendency to fire, but only under certain conditions.
For example, if we represent f(X) by simple addition, then one neuron may fire when the sum is greater
than, say, 100, while another neuron may fire when the sum is greater than 10.

These conditions, which differ from neuron to neuron, are called the Threshold. For example, suppose the input
X1 into the first neuron is 30 and X2 is 0.

This neuron will not fire, since the sum 30 + 0 = 30 is not greater than its threshold of 100. Whereas, if
the inputs had been the same for the other neuron, that neuron would have fired, since the sum
of 30 is greater than its threshold of 10.

Now, the negative of the threshold is called the Bias of a neuron. Let us represent this a bit mathematically.
We can represent the firing and non-firing condition of a neuron with this pair of conditions:

fire if X1 + X2 + ... + Xn > threshold
do not fire if X1 + X2 + ... + Xn <= threshold

If the sum of the inputs is greater than the threshold, then the neuron will fire; otherwise, the neuron
will not fire. Let's simplify this a bit and bring the threshold to the left side of the inequality:

X1 + X2 + ... + Xn + (-threshold) > 0

This negative threshold is called the Bias (b), so the firing condition becomes X1 + X2 + ... + Xn + b > 0.

One thing to note is that, in an artificial neural network, the bias input (X0 = 1) is shared by all the neurons in a layer.
Now that we have a good understanding of bias and how it represents the condition for a neuron to fire,
let's move to another aspect of an artificial neuron called Weights.

So far, in our calculations, we have assigned equal importance to all the inputs. For example,

here X1 has a weight of 1, X2 has a weight of 1, and the bias has a weight of 1. But what if we want
to attach different weights to different inputs?

Let’s have a look at an example to understand this better. Suppose today is a college party and you have
to decide whether you should go to the party or not based on some input conditions such as Is the
weather good? Is the venue near? Is your crush coming?

So, if the weather is good then it will be presented with a value of 1, otherwise 0. Similarly, if the venue
is near it will be represented by 1, otherwise 0. And similarly for whether your crush is coming to the
party or not.

Now suppose, being a college teenager, you absolutely adore your crush and would go to any lengths
to see him or her. You will definitely go to the party no matter what the weather is or how far the
venue is. In that case, you will want to assign more weight to X3, which represents the crush, than to
the other two inputs.

Such a situation can be represented by assigning weights to the inputs like this:

We can assign a weight of 3 to the weather, a weight of 2 to the venue, and a weight of 6 to the crush.
Now if the weighted sum of these three factors (weather, venue, and crush) is greater than a threshold of
5, then you decide to go to the party; otherwise not.

Note: X0 is the bias value

So, for example, we initially take the condition where the crush is more important than the weather
or the venue itself.

Say the weather (X1) is bad, represented by 0, and the venue (X2) is far off, represented by 0, but
your crush (X3) is coming to the party, represented by 1. When we multiply the values of the Xs by
their respective weights, we get 0 for weather (X1), 0 for venue (X2), and 6 for crush (X3), for a
total of 6. Since 6 is greater than the threshold of 5, you decide to go to the party. Hence the
output (y) is 1.

Let's imagine a different scenario now. Imagine you're sick today and, no matter what, you will not
attend the party. This situation can be represented by assigning an equal weight of 1 to weather, venue,
and crush, with a threshold of 4.
In this case we have changed the threshold to 4, so even if the weather is good, the venue is near,
and your crush is coming, you won't be going to the party, since the sum, i.e., 1 + 1 + 1 = 3, is less
than the threshold value of 4.

These values w0, w1, w2, and w3 are called the weights of the neuron and are different for different neurons.
These weights are what a neural network has to learn in order to make good decisions.
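The party example above can be sketched in Python as follows (the function name is illustrative; the weights and thresholds simply restate the values used in the text):

def will_go_to_party(weather, venue_near, crush_coming, weights, threshold):
    # Each input is 1 (yes) or 0 (no); multiply by its weight and sum
    total = (weather * weights[0]
             + venue_near * weights[1]
             + crush_coming * weights[2])
    # Fire (go to the party) only if the weighted sum exceeds the threshold
    return 1 if total > threshold else 0

# Scenario 1: bad weather, far venue, but crush is coming (weights 3, 2, 6; threshold 5)
print(will_go_to_party(0, 0, 1, weights=[3, 2, 6], threshold=5))  # 1 -> go

# Scenario 2: sick today -- equal weights of 1 and threshold 4, so you never go
print(will_go_to_party(1, 1, 1, weights=[1, 1, 1], threshold=4))  # 0 -> stay home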

ACTIVATION FUNCTIONS IN A NEURAL NETWORK

Now that we know how a neural network combines different inputs using weights, let's move to the
last aspect of a neuron, the Activation functions. So far, what we have been doing is simply
adding some weighted inputs and calculating an output, and this output can range from minus infinity
to infinity.

But this can be problematic in many circumstances. Assume we first want to estimate the age of a person
from his height, weight, and cholesterol level, and then classify the person as old or not, based on whether
the age is greater than 60.
Now, with this neuron as it stands, an age of -20 is possible, because the output of the neuron can
range anywhere from -∞ to ∞. Given this absurd range for age, we can still apply our condition to
decide whether a person is old or not: if our criterion is that a person is old only if the age is greater
than 60, then even an output of -20 lets us classify the person as not old.

But it would have been much better had the age made more sense, for example if the output of the
neuron representing the age had been in a range of, say, 0 to 120. So, how can we solve this
problem when the output of a neuron is not confined to a particular range?

One method to clip the age on the negative side would be to use a function such as max(0, X).

Let's first note the original behaviour, before applying any function: for positive X we had a positive
Y, and for negative X we had a negative Y (the x-axis represents the actual values and the y-axis the
transformed values).

But now, if we want to get rid of the negative values, we can use a function like max(0, X).
With this function, anything on the negative side of the x-axis gets clipped to 0.
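A tiny Python sketch of this clipping idea, reusing the age example (the values are illustrative):

def clip_negative(x):
    # Anything on the negative side of the x-axis gets clipped to 0
    return max(0, x)

print(clip_negative(-20))  # 0  -- an "age" of -20 is clipped away
print(clip_negative(72))   # 72 -- positive values pass through unchanged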

Linear Activation Function

The linear activation function, often called the identity activation function, is proportional to the input.
The range of the linear activation function will be (-∞ to ∞). The linear activation function simply adds
up the weighted total of the inputs and returns the result.
Linear Activation Function — Graph

Mathematically, it can be represented as:

Linear Activation Function — Equation: f(x) = x


Pros and Cons
 It is not a binary activation because the linear activation function only delivers a range of
activations. We can surely connect a few neurons together, and if there are multiple
activations, we can calculate the max (or soft max) based on that.
 The derivative of this activation function is a constant. That is to say, the gradient is
unrelated to the x (input).
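A one-line Python sketch of the linear (identity) activation (the optional constant a is an illustrative generalization):

def linear(x, a=1.0):
    # Identity / linear activation: output is proportional to the input
    return a * x

print(linear(-3.5), linear(0.0), linear(4.2))  # values pass through unchanged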

Binary Step Activation Function

A threshold value determines whether a neuron should be activated or not activated in a binary step
activation function.
The activation function compares the input value to a threshold value. If the input value is greater than
the threshold value, the neuron is activated. It’s disabled if the input value is less than the threshold
value, which means its output isn’t sent on to the next or hidden layer.
Binary Step Function — Graph
Mathematically, the binary activation function can be represented as:

Binary Step Activation Function — Equation: f(x) = 1 if x >= threshold, otherwise f(x) = 0


Pros and Cons
 It cannot provide multi-value outputs — for example, it cannot be used for multi-class
classification problems.
 The step function's gradient is zero, which makes the backpropagation procedure difficult.
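A minimal Python sketch of the binary step function (the default threshold of 0 is an assumption for illustration):

def binary_step(x, threshold=0.0):
    # Neuron fires (outputs 1) only if the input reaches the threshold
    return 1 if x >= threshold else 0

print(binary_step(-2.0), binary_step(0.5))  # 0 1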

Non-linear Activation Functions

The non-linear activation functions are the most widely used activation functions. They make it easy
for an artificial neural network model to adapt to a variety of data and to differentiate between the
outputs.
Non-linear activation functions allow the stacking of multiple layers of neurons, as the output is
then a non-linear combination of the input passed through multiple layers. Any output can be
represented as a functional computation in the neural network.
These activation functions are mainly classified on the basis of their ranges and curves. The remainder
of this section outlines the major non-linear activation functions used in neural networks.

1. Sigmoid

Sigmoid accepts a number as input and returns a number between 0 and 1. It’s simple to use and has all
the desirable qualities of activation functions: nonlinearity, continuous differentiation, monotonicity,
and a set output range.
This is mainly used in binary classification problems. The sigmoid function gives the probability of
the existence of a particular class.
Sigmoid Activation Function — Graph
Mathematically, it can be represented as:

Sigmoid Activation Function — Equation: f(x) = 1 / (1 + e^(-x))

Pros and Cons


 It is non-linear in nature. Combinations of this function are also non-linear, and it gives
an analogue activation, unlike the binary step activation function. It has a smooth gradient too,
and it's good for classifier-type problems.
 The output of the activation function is always in the range (0, 1), compared to the range
(-∞, ∞) of the linear activation function. As a result, we've defined a range for our activations.
 The sigmoid function gives rise to the problem of "vanishing gradients": sigmoids saturate
and kill gradients.
 Its output isn't zero-centred, which makes the gradient updates go too far in different
directions and makes optimization harder.
 The network either refuses to learn more or is extremely slow.
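A minimal Python sketch of the sigmoid function:

import math

def sigmoid(x):
    # Squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(-4.0), sigmoid(0.0), sigmoid(4.0))  # ~0.018, 0.5, ~0.982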

2.TanH (Hyperbolic Tangent)

TanH compresses a real-valued number to the range [-1, 1]. It's non-linear like Sigmoid, but unlike
Sigmoid its output is zero-centered. The main advantage is that negative inputs are mapped strongly
negative and zero inputs are mapped near zero on the TanH graph.
TanH Activation Function — Graph
Mathematically, TanH function can be represented as:

TanH Activation Function — Equation: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Pros and Cons


 TanH also has the vanishing gradient problem, but the gradient is stronger for TanH than
sigmoid (derivatives are steeper).
 TanH is zero-centered, and gradients do not have to move in a specific direction.
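A minimal Python sketch of the TanH function, written out from its definition:

import math

def tanh(x):
    # Zero-centered: output lies in the range (-1, 1)
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

print(tanh(-2.0), tanh(0.0), tanh(2.0))  # ~-0.964, 0.0, ~0.964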

3.ReLU (Rectified Linear Unit)

ReLU stands for Rectified Linear Unit and is one of the most commonly used activation functions in
practice. It alleviates the vanishing gradient problem because the maximum value of the gradient of
the ReLU function is one. It also avoids the saturating-neuron problem, since its slope never saturates
for positive inputs. The range of ReLU is [0, ∞).
ReLU Activation Function — Graph
Mathematically, it can be represented as:

ReLU Activation Function — Equation: f(x) = max(0, x)


Pros and Cons
 Since only a fraction of the neurons are activated at any time, the ReLU function is far more
computationally efficient than the sigmoid and TanH functions.
 ReLU accelerates the convergence of gradient descent towards the global minimum of the
loss function due to its linear, non-saturating property.
 One of its limitations is that it should only be used within the hidden layers of an artificial
neural network model.
 Some gradients can be fragile during training.
 In other words, for activations in the region x < 0 of ReLU, the gradient is 0, because of
which the weights do not get adjusted during gradient descent. That means neurons that
go into that state stop responding to variations in the input (simply because the gradient is
0, nothing changes). This is called the dying ReLU problem.
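A minimal Python sketch of ReLU, together with its gradient, to make the dying-ReLU behaviour visible:

def relu(x):
    # Passes positive inputs through unchanged; clips negatives to 0
    return max(0.0, x)

def relu_derivative(x):
    # Gradient is 1 for x > 0 and 0 for x < 0 (the source of "dying ReLU")
    return 1.0 if x > 0 else 0.0

print(relu(-3.0), relu(5.0))                        # 0.0 5.0
print(relu_derivative(-3.0), relu_derivative(5.0))  # 0.0 1.0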

4.Leaky ReLU

Leaky ReLU is an upgraded version of the ReLU activation function that addresses the dying ReLU
problem, as it has a small positive slope in the negative region. However, the consistency of this
benefit across tasks is presently unclear.
Leaky ReLU Activation Function — Graph
Mathematically, it can be represented as,

Leaky ReLU Activation Function — Equation: f(x) = x if x > 0, otherwise f(x) = 0.01x (a small, fixed slope)


Pros and Cons
 The advantages of Leaky ReLU are the same as those of ReLU, with the addition that it
enables backpropagation even for negative input values.
 Because of this minor modification for negative input values, the gradient on the left side of
the graph becomes a real (non-zero) value. As a result, there are no more dead neurons in
that region.
 The predictions may not be consistent for negative input values.
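A minimal Python sketch of Leaky ReLU (the slope of 0.01 for negative inputs is a commonly used illustrative value):

def leaky_relu(x, alpha=0.01):
    # Small positive slope (alpha) in the negative region keeps gradients alive
    return x if x > 0 else alpha * x

print(leaky_relu(-10.0), leaky_relu(3.0))  # -0.1 3.0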

5.ELU (Exponential Linear Units)

ELU is another variation of ReLU that also solves the dead ReLU problem. Just like leaky ReLU,
ELU considers negative values, but it does so by introducing an alpha parameter and multiplying it
with the exponential term (e^x - 1).
ELU is slightly more computationally expensive than leaky ReLU, and it's very similar to ReLU except
for negative inputs; both have the identity-function shape for positive inputs.
ELU Activation Function — Graph
Mathematically, it can be represented as:

ELU Activation Function — Equation: f(x) = x if x > 0, otherwise f(x) = a(e^x - 1)


Pros and Cons
 ELU is a strong alternative to ReLU. Unlike ReLU, ELU can produce negative
outputs.
 ELU involves exponential operations, so it increases computation time.
 The alpha value is not learned, and ELU can still suffer from the exploding gradient problem.
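A minimal Python sketch of ELU (alpha = 1.0 is an illustrative default):

import math

def elu(x, alpha=1.0):
    # Identity for positive inputs; smooth exponential curve for negatives
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

print(elu(-2.0), elu(3.0))  # ~-0.865 3.0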

6. Softmax

The Softmax function can be thought of as a combination of many sigmoids. It determines relative
probability. Similar to the sigmoid activation function, the Softmax function returns the probability of
each class/label. In multi-class classification, the softmax activation function is most commonly used
for the last layer of the neural network.
The softmax function gives the probability of the current class relative to the others; that is,
it also takes the possibility of the other classes into account.
Softmax Activation Function — Graph
Mathematically, it can be represented as:

Softmax Activation Function — Equation: softmax(x_i) = e^(x_i) / Σ_j e^(x_j)


Pros and Cons
 It mimics the one-hot encoded label better than absolute values do.
 We would lose information if we used absolute (modulus) values, but the exponential takes
care of this on its own.
 The softmax function is suited to multi-class classification; it should not be used for multi-label
classification or for regression tasks.
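A minimal Python sketch of the softmax function (the max-subtraction step is a standard numerical-stability trick, not something discussed in the notes):

import math

def softmax(values):
    # Subtract the max for numerical stability, exponentiate, then normalize
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Example: raw scores (logits) for three classes
print(softmax([2.0, 1.0, 0.1]))  # ~[0.659, 0.242, 0.099]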

VARIOUS LEARNING RULES/TECHNIQUES


A learning rule enhances an Artificial Neural Network's performance by updating the weights and
bias levels of the network when certain conditions are met during the training process. It is a crucial
part of the development of a Neural Network.

Types Of Learning Rules in ANN


1. Hebbian Learning Rule
The Hebbian Learning Rule, also known as the Hebb Learning Rule, was proposed by Donald O. Hebb. It is
one of the earliest and simplest learning rules for neural networks. It is used for pattern classification.
It is applied to a single-layer neural network, i.e. one with an input layer and an output layer. The input
layer can have many units, say n, while the output layer has only one unit. The Hebbian rule works by
updating the weights between neurons in the neural network for each training sample.
Hebbian Learning Rule Algorithm :
1. Set all weights to zero, wi = 0 for i = 1 to n, and set the bias to zero.
2. For each input vector s (with paired target output t), repeat steps 3-5.
3. Set the activations of the input units to the input vector: xi = si for i = 1 to n.
4. Set the output neuron's value to the target output, i.e. y = t.
5. Update the weights and bias by applying the Hebb rule for all i = 1 to n:
   wi(new) = wi(old) + xi * y, and b(new) = b(old) + y.
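A minimal Python sketch of the algorithm above (assuming bipolar +1/-1 inputs and targets, a common convention for Hebbian learning; the AND example is illustrative):

def hebb_train(samples):
    """samples: list of (input_vector, target) pairs with bipolar (+1/-1) values."""
    n = len(samples[0][0])
    weights = [0.0] * n   # Step 1: initialize weights and bias to zero
    bias = 0.0
    for x, t in samples:  # Step 2: repeat for each training pair
        y = t             # Steps 3-4: activations = inputs, output = target
        for i in range(n):            # Step 5: Hebb update
            weights[i] += x[i] * y    # wi(new) = wi(old) + xi * y
        bias += y                     # b(new) = b(old) + y
    return weights, bias

# Example: learn the logical AND function with bipolar values
and_samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
print(hebb_train(and_samples))  # weights [2, 2], bias -2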
2. Competitive Learning Rule (Winner-Take-All)
3. Perceptron and Convergence Rule
AUTO ASSOCIATIVE MEMORY AND HETERO ASSOCIATIVE
MEMORY
Hopfield Network:
