ASC-unit 1 Notes
Syllabus
Neural Networks-I (Introduction & Architecture): Neuron, Nerve structure and synapse,
Artificial Neuron and its model, activation functions, Neural network architecture: single layer and
multilayer feed forward networks, recurrent networks. Various learning techniques; perceptron
and convergence rule, auto-associative and hetero-associative memory.
Detailed Topics
1. Neuron
2. Nerve Structure and synapse
3. Artificial Neuron and its Model
4. Activation functions
4.1 Linear activation function
4.2 Binary activation function
4.3 Non-Linear activation function
4.3.1 Sigmoid activation function
4.3.2 Tanh activation function
4.3.3 ReLU activation function
4.3.4 Leaky ReLU activation function
4.3.5 ELU activation function
4.3.6 Softmax activation function
5. Neural Network Architecture:
a. Single Layer Feed Forward
b. Multi-Layer Feed forward
c. Recurrent Networks
6. Various Learning Techniques
6.1 Unsupervised Learning
6.1.1 Hebb Learning Rule
6.1.2 Competitive (Winner-take-all) Learning Rule
6.2 Supervised Learning
6.2.1 Stochastic Learning Rule
6.2.2 Gradient descent Learning Rule
6.2.2.1 LMS (Least Mean Square Error)
6.2.2.2 Backpropagation
6.3 Reinforcement Learning
7. Perceptron and Convergence Rule
8. Auto-associative Memory and Hetero-associative Memory
8.1 Hopfield Network
A biological neuron consists of three main parts:
1. Cell body
2. Axon (a long cell extension)
3. Dendrites (branching cell extensions)
The nucleus in the cell body controls the cell's functioning. The axon (the long tail-like extension)
transmits messages from the cell, while the dendrites (tree-branch-like extensions) receive messages for the cell.
In a nutshell, biological neurons communicate with each other by sending chemicals, called
neurotransmitters, across a tiny space, called a synapse, between the axons and dendrites of adjacent neurons.
An artificial neuron or neural node is a mathematical model. In most cases, it computes a weighted
sum of its inputs and adds a bias to it. It then passes this resultant term through an
activation function. This activation function is typically a nonlinear function, such as the sigmoid function,
that accepts a linear input and gives a nonlinear output.
The following figure shows a typical artificial neuron:
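As a minimal sketch, assuming a sigmoid activation and arbitrary example values for the inputs, weights, and bias, this computation can be written in Python as:

import math

def sigmoid(z):
    # Squash the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def artificial_neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias term
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Pass the result through the nonlinear activation function
    return sigmoid(z)

# Example: two inputs with arbitrary weights and a bias
print(artificial_neuron([0.5, 0.8], [0.4, 0.6], bias=-0.3))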
A typical neural network consists of layers of neurons called neural nodes. These layers are of three
types: the input layer, one or more hidden layers, and the output layer.
Each neural node is connected to other nodes and is characterized by its weights and a threshold. It
receives an input, applies a transformation to it, and then sends an output. If the output of an
individual node is above the specified threshold value, that node gets activated and sends data to
the next layer of the network. Otherwise, it remains dormant and doesn't transmit any data to the
next layer of the network.
The following figure marks all three types of layers:
Input Layer
This is the first layer in a typical neural network. Input layer neurons receive the input information,
process it through a mathematical function (activation function), and transmit output to the next layer’s
neurons based on comparison with a preset threshold value. We usually have only one input layer in
the network.
We pre-process text, image, audio, video, and other types of data to derive a numeric representation.
We then pass this numeric representation as information to each input layer neuron. Each neuron then
applies a predefined nonlinear mathematical function to calculate its output.
As a final step, the output is scaled by the preset weights attached to the edges between the outgoing
layer's neurons and the corresponding neurons of the next layer.
Hidden Layer
Then comes the hidden layer. There can be one or more hidden layers in a neural network. Neurons in
a hidden layer receive their inputs either from the neurons of the input layer or from the neurons of the
previous hidden layer. Each neuron passes its input through another nonlinear activation function and
then sends the output to the neurons of the next layer.
Here too, the data is multiplied by the edge weights as it is transmitted to the next layer.
Output Layer
At last, comes the output layer. We have only one output layer in the network that marks the logical
ending of the neural network.
Similar to previously described layers, neurons in the output layer receive inputs from previous layers.
Then, they process them through new activation functions and output the expected results. Depending
on the type of Artificial Neural Network, we can either use this output as a final result or feed it as an
input to the same neural network (loopback) or another neural network (cascade).
Now, let's talk about the output layer's result. The final result can be a simple binary classification
denoting one of two classes, a multi-class classification, or a predicted (regression) value. As noted above,
depending on the type of Artificial Neural Network, this output is either used as the final result or fed
back as input to the same neural network (loopback) or to another neural network (cascade).
Here, we’ll study different types of neural networks based on the direction of information flow.
1. Single Layer Feedforward Neural Network
In a single-layer feedforward network, the inputs are connected directly to a single layer of computing
neurons. Here, X1 and X2 are the inputs to the artificial neuron, f(X) represents the processing done on
the inputs, and y represents the output of the neuron.
2. Multi Layer Feedforward Neural Network
Here, the signal or information travels in one direction only. In a feedforward neural network,
information flows from the input layer to the hidden layer(s) and finally to the output layer. There are
no feedback connections or loops.
In other words, the output of a layer does not affect that layer itself or any earlier layer in this type of
network; it only affects the output of the layers ahead of it. Feedforward neural networks
are simple, straightforward networks with a one-to-one mapping between inputs and outputs.
The following figure shows a typical feedforward neural network:
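As a minimal Python sketch of such a forward pass (the weights, biases, and inputs below are arbitrary example values), information flows from the inputs through one hidden layer to a single output neuron, with no loops:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    # One feedforward layer: a weighted sum per neuron, followed by the activation
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

# 2 inputs -> 3 hidden neurons -> 1 output neuron (example values only)
x = [0.7, 0.2]
hidden = layer(x, weights=[[0.1, 0.4], [0.5, -0.2], [-0.3, 0.8]], biases=[0.0, 0.1, -0.1])
output = layer(hidden, weights=[[0.3, -0.6, 0.9]], biases=[0.05])
print(output)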
After going through the feedforward neural network, let’s move to the feedback neural network.
In this type of network, the signal or the information flows in both directions, i.e., forward and
backward. This makes them more powerful and more complex than the feed-forward neural networks.
Feedback neural networks are dynamic: the network state keeps changing until it reaches an
equilibrium point. The network stays at that equilibrium as long as the input remains the same; once
the input changes, the state evolves again until a new equilibrium is found.
We also call these networks interactive or recurrent networks due to their dynamic
architecture. Moreover, we find feedback loops in these networks.
The following figure shows a typical feedback (recurrent) neural network:
We mostly use them in speech recognition, text prediction, and image recognition.
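One common way to write this feedback mathematically (a standard formulation, given here as an illustration rather than taken from the notes) is the state-update equation h(t) = f(Wx · x(t) + Wh · h(t-1) + b), where the previous state h(t-1) is fed back and combined with the current input x(t) at every step.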
The condition that a neuron's input must satisfy before the neuron fires, which differs from neuron to
neuron, is called its threshold. For example, suppose two neurons have thresholds of 100 and 10
respectively, and the inputs are X1 = 30 and X2 = 0:
The first neuron will not fire, since the sum 30 + 0 = 30 is not greater than its threshold of 100. Whereas,
if the inputs had remained the same for the other neuron, it would have fired, since the sum of 30 is
greater than its threshold of 10.
Now, the negative of the threshold is called the bias of a neuron. Let us represent this a bit mathematically.
We can represent the firing and non-firing conditions of a neuron using this pair of equations:
fire if X1 + X2 + ... + Xn > threshold
do not fire if X1 + X2 + ... + Xn <= threshold
If the sum of the inputs is greater than the threshold, the neuron fires; otherwise, it does not. Let's
simplify this a bit and bring the threshold to the left side of the inequality:
X1 + X2 + ... + Xn - threshold > 0
Now, this negative threshold is called the bias, b = -threshold, so the firing condition becomes
X1 + X2 + ... + Xn + b > 0.
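For instance, reusing the earlier neuron whose threshold is 10: with inputs X1 = 30 and X2 = 0, the bias is b = -10, and 30 + 0 + (-10) = 20 > 0, so the neuron fires. For the neuron with threshold 100, the bias is -100, and 30 + 0 + (-100) = -70, which is not greater than 0, so it does not fire.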
One thing to note is that in an artificial neural network, each neuron has its own bias, and the biases, like the weights, are parameters of the network.
Now that we have a good understanding of bias and how it represents the condition for a neuron to fire,
let’s move to another aspect of an artificial neuron called Weights.
So far, in our calculation, we have assigned equal importance to all the inputs. For example, in the
condition X1 + X2 + b > 0, X1 has a weight of 1, X2 has a weight of 1, and the bias has a coefficient of
1. But what if we want to attach different weights to different inputs?
Let’s have a look at an example to understand this better. Suppose today is a college party and you have
to decide whether you should go to the party or not based on some input conditions such as Is the
weather good? Is the venue near? Is your crush coming?
So, if the weather is good, it will be represented by a value of 1, otherwise 0. Similarly, if the venue
is near, it will be represented by 1, otherwise 0. And the same applies to whether your crush is coming
to the party or not.
Now suppose, being a college teenager, you absolutely adore your crush and will go to any lengths
to see him or her. You will definitely go to the party no matter what the weather is or how far the
venue is. In that case, you will want to assign more weight to X3, which represents the crush, than to
the other two inputs.
We can assign a weight of 3 to the weather, a weight of 2 to the venue, and a weight of 6 to the crush.
Now, if the weighted sum of these three factors (weather, venue, and crush) is greater than a threshold of
5, you decide to go to the party; otherwise, you don't.
For example, take the initial condition where the crush is more important than the weather or the venue
itself. Say the weather (X1) is bad, represented by 0, and the venue (X2) is far off, represented by 0, but
your crush (X3) is coming to the party, represented by 1. Multiplying each X by its respective weight
gives 0 for the weather, 0 for the venue, and 6 for the crush. Since the sum 6 is greater than the
threshold of 5, you decide to go to the party. Hence the output y is 1.
Let's imagine a different scenario now. Imagine you're sick today and, no matter what, you will not
attend the party. This situation can be represented by assigning an equal weight of 1 to the weather, the
venue, and the crush, with a threshold of 4.
In this case we change the threshold and set it to 4, so even if the weather is good, the venue is near,
and your crush is coming, you won't be going to the party, since the sum 1 + 1 + 1 = 3 is less than the
threshold value of 4.
These w0, w1, w2, and w3 are called the weights of the neuron and are different for different neurons.
These weights are the ones that a neural network has to learn in order to make good decisions.
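As a small Python sketch of this decision neuron (using the weights 3, 2, 6 and the thresholds 5 and 4 from the example above):

def will_go_to_party(weather, venue_near, crush_coming, weights, threshold):
    # Each input is 1 if the condition holds, otherwise 0
    total = (weights[0] * weather +
             weights[1] * venue_near +
             weights[2] * crush_coming)
    # The neuron fires (output 1) only if the weighted sum exceeds the threshold
    return 1 if total > threshold else 0

# Crush matters most: bad weather (0), far venue (0), crush coming (1) -> 0 + 0 + 6 = 6 > 5, so go
print(will_go_to_party(0, 0, 1, weights=[3, 2, 6], threshold=5))   # prints 1
# Sick today: equal weights of 1 and a threshold of 4 -> 1 + 1 + 1 = 3 < 4, so stay home
print(will_go_to_party(1, 1, 1, weights=[1, 1, 1], threshold=4))   # prints 0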
Now that we know how a neural network combines different inputs using weights, let's move to the
last aspect of a neuron: the activation function. So far, we have simply been adding some weighted
inputs and calculating an output, and this output can range from minus infinity to infinity.
But this can be a problem in many circumstances. Suppose we first want to estimate the age of a person
from his height, weight, and cholesterol level, and then classify the person as old or not, based on
whether the estimated age is greater than 60.
With the current structure of this neuron, the output representing age can range from -∞ to ∞, so even
an age of -20 is possible. Given this absurd range, we can still apply our criterion to decide whether a
person is old or not: if we say a person is old only if the age is greater than 60, then even if the age
comes out to be -20, the person is classified as not old.
But it would be much better if the age made more sense, for example, if the output of this neuron, which
represents the age, lay in a range of, say, 0 to 120. So, how can we solve this problem when the output
of a neuron is not in a particular range?
One method to clip the age on the negative side is to use a function such as max(0, X).
Let's first note the original behaviour, before applying any function: for positive X we had a positive Y,
and for negative X we had a negative Y (here the x-axis represents the actual values and the y-axis the
transformed values). If we want to get rid of the negative values, we can use a function like max(0, X);
anything on the negative side of the x-axis then gets clipped to 0.
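For example, a raw output of -20 becomes max(0, -20) = 0, while a raw output of 35 stays max(0, 35) = 35, so negative ages are clipped to 0.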
Linear Activation Function
The linear activation function, often called the identity activation function, is proportional to its input.
The range of the linear activation function is (-∞, ∞). The linear activation function simply takes the
weighted sum of the inputs and returns it unchanged.
Linear Activation Function — Graph
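Mathematically, it can be represented as: f(x) = x (the output is simply the input, or more generally a constant multiple of it, f(x) = a·x).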
Binary Step Activation Function
In a binary step activation function, a threshold value determines whether a neuron should be activated
or not.
The activation function compares the input value to a threshold value. If the input value is greater than
the threshold value, the neuron is activated. It’s disabled if the input value is less than the threshold
value, which means its output isn’t sent on to the next or hidden layer.
Binary Step Function — Graph
Mathematically, the binary activation function can be represented as:
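f(x) = 1 if x >= threshold, and f(x) = 0 if x < threshold (the threshold is commonly taken as 0).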
Non-Linear Activation Functions
The non-linear activation functions are the most widely used activation functions. They make it easy
for an artificial neural network model to adapt to a variety of data and to differentiate between the
outputs.
Non-linear activation functions allow multiple layers of neurons to be stacked, because the output is
then a non-linear combination of the inputs passed through those layers; without the non-linearity, a
stack of layers would collapse into a single linear transformation. Any output can then be represented
as a functional computation in the neural network.
These activation functions are mainly divided on the basis of their range and curves. The remainder of
this section outlines the major non-linear activation functions used in neural networks.
1. Sigmoid
Sigmoid accepts a number as input and returns a number between 0 and 1. It’s simple to use and has all
the desirable qualities of activation functions: nonlinearity, continuous differentiation, monotonicity,
and a set output range.
This is mainly used in binary classification problems. The sigmoid function gives the probability of
the existence of a particular class.
Sigmoid Activation Function — Graph
Mathematically, it can be represented as:
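f(x) = 1 / (1 + e^(-x))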
2. TanH
TanH compresses a real-valued number into the range [-1, 1]. It is non-linear, but unlike the sigmoid,
its output is zero-centered. The main advantage of this is that strongly negative inputs are mapped to
strongly negative outputs, and inputs near zero are mapped to values near zero in the graph of TanH.
TanH Activation Function — Graph
Mathematically, TanH function can be represented as:
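tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))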
3. ReLU
ReLU stands for Rectified Linear Unit and is one of the most commonly used activation functions in
applications. It helps solve the vanishing gradient problem because the maximum value of the gradient
of the ReLU function is one. It also avoids the saturating-neuron problem, since for positive inputs the
slope is constant and never approaches zero. The range of ReLU is [0, ∞).
ReLU Activation Function — Graph
Mathematically, it can be represented as:
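f(x) = max(0, x)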
4. Leaky ReLU
Leaky ReLU is an upgraded version of the ReLU activation function designed to solve the dying ReLU
problem, as it has a small positive slope in the negative region. However, the consistency of this benefit
across tasks is presently unclear.
Leaky ReLU Activation Function — Graph
Mathematically, it can be represented as,
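f(x) = x for x > 0, and f(x) = a·x for x <= 0, where a is a small positive constant such as 0.01.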
5. ELU
ELU is another variation of ReLU that also solves the dead ReLU problem. Like leaky ReLU, ELU
handles negative values, but it does so by introducing an alpha parameter that multiplies an exponential
term for negative inputs.
ELU is slightly more computationally expensive than leaky ReLU, and it is very similar to ReLU except
for negative inputs: both follow the identity function for positive inputs.
ELU Activation Function-Graph
Mathematically, it can be represented as:
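f(x) = x for x > 0, and f(x) = α(e^x - 1) for x <= 0, where α is a positive constant.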
6. Softmax