Unit-1
Neural Networks
Computer Science and Engineering, 6th Semester
Biological Neural Networks
Biological neuron
Working of Biological Neuron
• Gross physical structure:
• There is one axon that branches
• There is a dendritic tree that collects input from other
neurons.
• Spike generation:
• There is an axon hillock that generates outgoing spikes
whenever enough charge has flowed in at synapses to
depolarize the cell membrane.
Synapse
•Note that a biological neuron receives all inputs through the dendrites, sums them, and produces an output if the sum is greater than a threshold value.
•The input signals are passed on to the cell body through the synapse, which may accelerate or retard an arriving signal.
•It is this acceleration or retardation of the input signals that is modeled by the weights.
•An effective synapse, which transmits a stronger signal, will have a correspondingly larger weight, while a weak synapse will have a smaller weight.
•Thus, weights here are multiplicative factors of the inputs that account for the strength of the synapse.
Artificial Neurons
[Figure: artificial neuron. Inputs X2, X3 arrive on dendrites with weights Wb, Wc; the cell body applies f() to the weighted sum and emits output Y.]
- A neuron receives inputs, determines the strength (weight) of each input, calculates the total weighted input, and compares this total with a threshold value.
- If the total weighted input is greater than or equal to the threshold, the neuron produces an output; if it is less than the threshold, no output is produced (see the sketch below).
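A minimal sketch of this threshold rule in Python; the inputs, weights, and threshold below are illustrative values, not taken from the slides:

```python
# Minimal threshold neuron: compare the weighted input sum against a threshold.
def threshold_neuron(inputs, weights, threshold):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0  # fires only at or above the threshold

# Illustrative inputs and weights with a threshold of 1.0.
print(threshold_neuron([1, 0, 1], [0.6, 0.4, 0.5], 1.0))  # -> 1 (0.6 + 0.5 = 1.1 >= 1.0)
print(threshold_neuron([0, 1, 0], [0.6, 0.4, 0.5], 1.0))  # -> 0 (0.4 < 1.0)
```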
Thresholding Unit and Bias
• ANNs are robust and fault-tolerant systems. They can therefore recall full patterns from incomplete, partial, or noisy patterns.
• ANNs can process information in parallel, at high speed, and in a distributed manner. Thus, a massively parallel distributed processing system, made up of highly interconnected (artificial) neural computing elements with the ability to learn and acquire knowledge, is possible.
Advantages of NN
NON-LINEARITY
It can model non-linear systems.
INPUT-OUTPUT MAPPING
It can derive a relationship between a set of input and output responses.
ADAPTIVITY
The ability to learn allows the network to adapt to changes in the surrounding environment.
EVIDENTIAL RESPONSE
It can provide a confidence level for a given solution.
Advantages of NN (contd.)
CONTEXTUAL INFORMATION
Knowledge is represented by the structure of the network. Every neuron in the network is potentially affected by the global activity of all other neurons, so contextual information is dealt with naturally in the network.
FAULT TOLERANCE
The distributed nature of the NN gives it fault-tolerant capabilities.
NEUROBIOLOGY ANALOGY
NNs model the architecture of the brain.
Comparison of ANN with Conventional AI Methods
Lecture-1 (Questions)
• Activation Function
• ANN Architectures
History
The discovery of the neural net led to the implementation of optical neural nets, the Boltzmann machine, spatiotemporal nets, pulsed neural networks, and support vector machines.
Model of ANN
• The models of ANN are specified by three basic entities, namely:
1. The model's synaptic interconnections (architecture)
   1. Feed-forward NN
      1. Single-layer feed-forward NN
      2. Multilayer feed-forward NN
   2. Feedback NN / Recurrent NN
      1. Single node with its own feedback
      2. Single-layer recurrent network
      3. Multilayer recurrent network
2. The training or learning rules adopted for updating and adjusting the connection weights
   1. Parameter learning
   2. Structure learning
3. Their activation functions
   1. Linear AF
   2. Binary step function
   3. Non-linear AF
Activation Function
• The parameterized ReLU (PReLU) function takes the slope of the negative part of the function as an argument a. Through backpropagation, the most appropriate value of a is learned.
• Mathematically, f(x) = max(ax, x), where a is the slope parameter for negative values.
• The parameterized ReLU function is used when the leaky ReLU function still fails to solve the problem of dead neurons and relevant information is not successfully passed to the next layer.
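A short sketch of PReLU, assuming a fixed slope a = 0.1 for illustration (in a real network, a is a learned parameter):

```python
import numpy as np

def prelu(x, a):
    """Parameterized ReLU: f(x) = max(a*x, x), with a the negative-side slope."""
    return np.maximum(a * x, x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(prelu(x, 0.1))  # -> [-0.2  -0.05  0.    1.5 ]
```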
• Assume that you have three classes, meaning there would be three neurons in the output layer. Now suppose the outputs of the neurons are [1.8, 0.9, 0.68].
• Applying the softmax function over these values to obtain a probabilistic view results in the following output: [0.58, 0.23, 0.19].
• Taking the class with the largest probability then gives full weight to index 0 and no weight to indexes 1 and 2, so the output is the class corresponding to the 1st neuron (index 0) out of three.
• You can see now how the softmax activation function makes things easy for multi-class classification problems.
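A minimal sketch reproducing the numbers above with NumPy:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([1.8, 0.9, 0.68])
probs = softmax(logits)
print(np.round(probs, 2))  # -> [0.58 0.23 0.19]
print(np.argmax(probs))    # -> 0, the predicted class (1st neuron)
```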
Swish
GELU
• The Gaussian Error Linear Unit (GELU) activation function is compatible with BERT, RoBERTa, ALBERT, and other top NLP models. This activation function is motivated by combining properties from dropout, zoneout, and ReLUs.
• ReLU and dropout together yield a neuron's output: ReLU does it deterministically, multiplying the input by zero or one depending on whether the input value is positive or negative, while dropout stochastically multiplies the input by zero.
• GELU merges this functionality by multiplying the input by either zero or one, where the choice is made stochastically and depends on the input: the neuron input x is multiplied by m ∼ Bernoulli(Φ(x)), where Φ(x) = P(X ≤ x), X ∼ N(0, 1), is the cumulative distribution function of the standard normal distribution.
• This distribution is chosen since neuron inputs tend to follow a normal distribution.
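A minimal sketch of GELU(x) = x · Φ(x) using SciPy's standard normal CDF, together with the tanh approximation commonly used in practice (the test inputs are illustrative):

```python
import numpy as np
from scipy.stats import norm

def gelu(x):
    """GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return x * norm.cdf(x)

def gelu_approx(x):
    """Common tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.array([-1.0, 0.0, 1.0])
print(gelu(x))         # -> approx. [-0.1587  0.  0.8413]
print(gelu_approx(x))  # close to the exact values
```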
SELU
• SELU has both positive and negative values to shift the mean, which is impossible for the ReLU activation function since it cannot output negative values.
• Gradients can be used to adjust the variance; the activation function needs a region with a gradient larger than one to increase it.
• SELU has predefined values of alpha (α) and lambda (λ).
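A minimal sketch using the predefined constants from the SELU paper (Klambauer et al., 2017):

```python
import numpy as np

# Predefined constants alpha and lambda from the SELU paper.
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    """SELU: lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise."""
    return LAMBDA * np.where(x > 0, x, ALPHA * np.expm1(x))

x = np.array([-2.0, 0.0, 2.0])
print(selu(x))  # negative inputs saturate toward -LAMBDA * ALPHA ≈ -1.758
```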
• Ramp function
How to Choose the Right Activation Function?
• You need to match the activation function of your output layer to the type of prediction problem that you are solving, specifically the type of predicted variable.
• As a rule of thumb, you can begin with the ReLU activation function and then move to other activation functions if ReLU does not provide optimum results.
• Here are a few other guidelines to help you out:
• The ReLU activation function should be used only in the hidden layers.
• Sigmoid/logistic and tanh functions should not be used in hidden layers, as they make the model more susceptible to problems during training (due to vanishing gradients).
• The Swish function is used in neural networks with a depth greater than 40 layers.
• A neural network will almost always have the same activation function in all hidden layers. This activation function should be differentiable so that the parameters of the network can be learned through backpropagation.
• ReLU is the most commonly used activation function for hidden layers.
• While selecting an activation function, you must consider the problems it might face: vanishing and exploding gradients.
• Regarding the output layer, always consider the expected value range of the predictions. If it can be any numeric value (as in the case of a regression problem), you can use the linear activation function or ReLU (see the sketch after this list).
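A hedged sketch of these guidelines using the Keras API; the layer sizes, 10-feature input, and three-class output are illustrative assumptions, not from the slides:

```python
import tensorflow as tf

# Hidden layers use ReLU, as recommended above; the output activation
# depends on the task.
classifier = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # multi-class output
])

regressor = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="linear"),   # unbounded numeric output
])
```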
Neural Network Architectures
Single-layer feed-forward neural network
• A layer of n neurons constitutes a single-layer feed-forward neural network.
• It is so called because it contains a single layer of artificial neurons.
• Note that although the input layer and output layer, which receive input signals and transmit output signals, are called layers, they are actually boundaries of the architecture and hence not truly layers.
• The only layer in the architecture consists of the synaptic links carrying the weights, which connect every input to the output neurons (a minimal forward pass is sketched below).
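A minimal sketch of the forward pass of such a network in NumPy, assuming illustrative sizes of three inputs and two output neurons:

```python
import numpy as np

# Single-layer feed-forward network: m inputs fully connected to n output
# neurons (illustrative sizes m=3, n=2).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))  # one weight per input-to-output link
b = np.zeros(2)              # biases of the output neurons

def forward(x, f=np.tanh):
    return f(x @ W + b)      # weighted sums passed through activation f

print(forward(np.array([0.5, -1.0, 0.2])))
```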
Multilayer feed-forward neural networks
Feedback NN / Recurrent NN
• Depending on the type of feedback loops, several recurrent neural networks are known, such as the Hopfield network, the Boltzmann machine network, etc.
Single-Layer Feedback NN
Multi-Layer Feedback NN
Modular Neural Network
Types of Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Supervised Learning
Consider yourself a student sitting in a classroom where your teacher supervises you on how to solve a problem and on whether you are doing it correctly. Likewise, in supervised learning the input is provided as a labelled dataset, and a model can learn from it to produce the solution to the problem easily.
Types of Problems
Supervised learning deals with two types of problems: classification problems and regression problems (a minimal illustration of both follows below).
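A minimal illustration of the two problem types with scikit-learn; the toy data below is made up for the sketch:

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: predict a discrete label from labelled examples.
X = [[1.0], [2.0], [3.0], [4.0]]
y_class = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y_class)
print(clf.predict([[3.5]]))  # -> [1]

# Regression: predict a continuous value from labelled examples.
y_reg = [1.1, 2.0, 2.9, 4.2]
reg = LinearRegression().fit(X, y_reg)
print(reg.predict([[3.5]]))  # close to 3.6 for this near-linear data
```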