DEEP LEARNING
LECTURE NOTES
Unit 1 (2024-2025)
Introduction
Artificial Neural Networks (ANN) are algorithms based on brain function and are used to model
complicated patterns and make predictions. The Artificial Neural Network is a deep learning
method that arose from the concept of the human brain's biological neural networks; its
development was the result of an attempt to replicate the workings of the human brain.
The workings of an ANN are very similar to those of biological neural networks, although they
are not identical. The basic ANN algorithm accepts only numeric and structured data.
Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) are used for
unstructured and non-numeric data forms such as images, text, and speech. This unit focuses
solely on Artificial Neural Networks.
An Artificial Neural Network (ANN) is a mathematical model that tries to simulate the structure
and functionality of biological neural networks.
The basic building block of every artificial neural network is the artificial neuron, that is, a simple
mathematical model (function).
A neural network consists of a large number of such artificial neurons, termed units, arranged in a
sequence of layers.
Input Layer:
• As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
• The hidden layer lies between the input and output layers. It performs all the calculations needed to
find hidden features and patterns.
Output Layer:
• The input goes through a series of transformations in the hidden layers, which finally results in the
output that is conveyed by this layer.
• The artificial neural network takes the inputs, computes the weighted sum of the inputs, and adds a
bias. This computation is represented in the form of a transfer function.
• The weighted total is then passed as an input to an activation function to produce the output.
Activation functions decide whether a node should fire or not.
• There are distinctive activation functions available that can be applied depending on the sort of task
we are performing.
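As a small illustration of the weighted sum, bias, and activation described above (a minimal sketch; the numeric inputs, weights, and bias below are made-up values, and the sigmoid is just one possible activation function):

```python
import numpy as np

def artificial_neuron(x, w, b):
    """Output of one artificial neuron: weighted sum of inputs plus bias,
    passed through an activation function (a sigmoid here)."""
    z = np.dot(w, x) + b                # transfer function: weighted sum + bias
    return 1.0 / (1.0 + np.exp(-z))     # activation function

x = np.array([0.5, 0.3, 0.2])           # inputs (illustrative)
w = np.array([0.4, -0.6, 0.9])          # weights (illustrative)
b = 0.1                                 # bias (illustrative)
print(artificial_neuron(x, w, b))
```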
It is possible to think of the hidden layer as a “distillation layer,” which extracts some of
the most relevant patterns from the inputs and sends them on to the next layer for
further analysis.
The activation function is important for two reasons: first, it captures the presence of non-linear
relationships between the inputs; second, it contributes to the conversion of the input into a more
useful output.
Finding the optimal values of the weights W that minimize the prediction error is critical to building a
successful model. The backpropagation algorithm does this, turning the ANN into a learning algorithm
that learns from its mistakes.
The processing of an ANN depends upon the following three building blocks:
1. Network Topology
2. Adjustments of Weights or Learning
3. Activation Functions
1. Network Topology: The topology of an ANN describes the arrangement of its nodes and connecting
links. Based on topology, ANNs can be feedforward or feedback networks.
A. Feedforward Network: In this network, the signal flows in only one direction, from input to output.
Single layer feedforward network: a feedforward ANN having only one weighted layer. In other
words, the input layer is fully connected to the output layer.
Multilayer feedforward network: a feedforward ANN having more than one weighted layer. As this
network has one or more layers between the input and the output layer, these are called hidden
layers.
B. Feedback Network: As the name suggests, a feedback network has feedback paths,
which means the signal can flow in both directions using loops. This makes it a non-linear
dynamic system, which changes continuously until it reaches a state of equilibrium.
Recurrent networks: They are feedback networks with closed loops. The following are the two types of
recurrent networks.
Fully recurrent network: It is the simplest neural network architecture because all nodes are
connected to all other nodes and each node works as both input and output.
Jordan network: It is a closed-loop network in which the output goes back to the input again as
feedback.
2. Adjustments of Weights or Learning: Learning in an ANN is the method of modifying the weights of
the connections between its neurons. It can be classified as supervised, unsupervised, or
reinforcement learning.
Supervised Learning:
As the name suggests, this type of learning is done under the supervision of a teacher, i.e., the desired
outputs are known.
During the training of an ANN under supervised learning, the input vector is presented to the network,
which produces an output vector.
An error signal is generated if there is a difference between the actual output vector and the desired
output vector.
On the basis of this error signal, the weights are adjusted until the actual output matches the desired
output.
Unsupervised Learning:
As the name suggests, this type of learning is done without the supervision of a teacher.
This learning process is independent.
During the training of ANN under unsupervised learning, the input vectors of similar type
are combined to form clusters. When a new input pattern is applied, then the neural
network gives an output response indicating the class to which the input pattern belongs.
There is no feedback from the environment as to what should be the desired output and if
it is correct or incorrect.
Hence, in this type of learning, the network itself must discover the patterns and features in the input
data, and the relation between the input data and the output.
Reinforcement Learning:
As the name suggests, this type of learning is used to reinforce or strengthen the network on the basis
of critic information.
This learning process is similar to supervised learning; however, we might have much less information.
During the training of the network under reinforcement learning, the network receives some feedback
from the environment.
This makes it somewhat similar to supervised learning. However, the feedback obtained here is
evaluative, not instructive, which means there is no teacher as in supervised learning.
After receiving the feedback, the network adjusts its weights so as to get better critic information in
the future.
3. Activation Functions: An activation function is a mathematical equation that determines the output
of each element (perceptron or neuron) in the neural network. It takes in the input of each neuron
and transforms it into an output, usually in the range 0 to 1 or -1 to 1. It may be thought of as the
extra force or effort applied over the net input to obtain an exact output. In an ANN, we apply such
activation functions over the net input to obtain the output.
i) Linear Activation Function: It is also called the identity function, as it performs no input editing. It
can be defined as
F(x) = x
ii) Sigmoid Activation Function: This is of two types, as follows.
Binary sigmoidal function: This activation function performs input editing between 0 and 1. It is
positive in nature and always bounded, which means its output cannot be less than 0 or more than
1. It is also strictly increasing, which means the higher the input, the higher the output. It can be
defined as
F(x) = 1 / (1 + e^(-x))
Bipolar sigmoidal function: This activation function performs input editing between -1 and 1. It can
be positive or negative in nature. It is always bounded, which means its output cannot be less than
-1 or more than 1. Like the binary sigmoid, it is strictly increasing. It can be defined as
F(x) = (1 - e^(-x)) / (1 + e^(-x))
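These three activation functions can be written out directly; the following is a minimal sketch (the steepness parameter is taken as 1):

```python
import numpy as np

def linear(x):
    # Identity / linear activation: F(x) = x
    return x

def binary_sigmoid(x):
    # Binary sigmoidal function: output bounded between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

def bipolar_sigmoid(x):
    # Bipolar sigmoidal function: output bounded between -1 and 1
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 5)
print(linear(x), binary_sigmoid(x), bipolar_sigmoid(x))
```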
Here is a glossary of basic terms you should be familiar with before learning the details of neural networks.
Inputs: Source data fed into the neural network, with the goal of making a decision or prediction about the
data. Inputs to a neural network are typically a set of real values; each value is fed into one of the
neurons in the input layer.
Training Set: A set of inputs for which the correct outputs are known, used to train the neural network.
Outputs: Neural networks generate their predictions in the form of a set of real values or boolean
decisions. Each output value is generated by one of the neurons in the output layer.
Neuron/perceptron: The basic unit of the neural network. Accepts an input and generates a prediction.
Each neuron accepts part of the input and passes it through the activation function. Common activation
functions are sigmoid, tanh, and ReLU. Activation functions help generate output values within an
acceptable range, and their non-linear form is crucial for training the network.
Weight Space: Each neuron is given a numeric weight. The weights, together with the activation function,
define each neuron’s output. Neural networks are trained by fine-tuning weights, to discover the optimal
set of weights that generates the most accurate prediction.
Forward Pass: The forward pass takes the inputs, passes them through the network and allows each
neuron to react to a fraction of the input. Neurons generate their outputs and pass them on to the next
layer, until eventually the network generates an output.
Error Function: Defines how far the actual output of the current model is from the correct output. When
training the model, the objective is to minimize the error function and bring output as close as possible to
the correct value.
Backpropagation: In order to discover the optimal weights for the neurons, we perform a backward pass,
moving back from the network’s prediction to the neurons that generated that prediction. This is called
backpropagation. Backpropagation tracks the derivatives of the activation functions in each successive
neuron, to find weights that bring the loss function to a minimum, which will generate the best prediction.
This is a mathematical process called gradient descent.
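As a minimal sketch of gradient descent (the one-weight model, data point, and learning rate below are made up purely for illustration):

```python
# Gradient descent on a one-weight model pred = w * x with squared error.
x, target = 2.0, 4.0        # a single training example (illustrative)
w, lr = 0.0, 0.1            # initial weight and learning rate (illustrative)

for step in range(20):
    pred = w * x                 # forward pass
    error = pred - target        # distance from the correct output
    grad = 2 * error * x         # derivative of (pred - target)**2 w.r.t. w
    w -= lr * grad               # move the weight against the gradient
print(w)                         # approaches 2.0, since 2.0 * 2.0 = 4.0
```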
Bias and Variance: When training neural networks, as in other machine learning techniques, we try to
balance between bias and variance. Bias measures how well the model fits the training set, i.e., how
well it predicts the known outputs of the training examples. Variance measures how well the model
works with unknown inputs that were not available during training. Another meaning of bias is the
"bias neuron": a constant input of one which, multiplied by its own weight, lets the activation function
be shifted up, down, left, and right (see the perceptron description below).
Hyperparameters: A hyperparameter is a setting that affects the structure or operation of the neural
network. In real deep learning projects, tuning hyperparameters is the primary way to build a network
that provides accurate predictions for a certain problem. Common hyperparameters include the
number of hidden layers, the activation function, and how many times (epochs) training should be
repeated.
LEARNING RULES
1. Hebbian learning rule – It identifies how to modify the weights of the nodes of a network.
2. Perceptron learning rule – The network starts its learning by assigning a random value to each weight.
3. Delta learning rule – The modification in the synaptic weight of a node is equal to the product of the
error and the input.
4. Correlation learning rule – It is based on a principle similar to the Hebbian rule, but uses the desired
response for the weight change.
5. Outstar learning rule – We can use it when we assume that the nodes or neurons in a network are
arranged in a layer.
1. Hebbian Learning Rule: The Hebbian rule was the first learning rule. Donald Hebb developed it in
1949 as a learning algorithm for unsupervised neural networks.
We can use it to identify how to improve the weights of the nodes of a network.
The Hebb learning rule assumes that if two neighbouring neurons are activated and deactivated at the
same time, then the weight connecting these neurons should increase. At the start, the values of all
weights are set to zero. This learning rule can be used for both soft- and hard-activation functions.
Since the desired responses of the neurons are not used in the learning procedure, this is an
unsupervised learning rule.
The absolute values of the weights are usually proportional to the learning time, which is
undesirable.
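A minimal sketch of the Hebbian update (the learning rate and the activation vectors are illustrative; the rule itself is delta_w[i][j] = lr * x[i] * y[j]):

```python
import numpy as np

def hebbian_update(w, x, y, lr=0.1):
    """Hebb rule: increase the weight between units that are active together."""
    return w + lr * np.outer(x, y)      # delta_w[i][j] = lr * x[i] * y[j]

x = np.array([1.0, -1.0, 1.0])          # input-unit activations (illustrative)
y = np.array([1.0, -1.0])               # output-unit activations (illustrative)
w = np.zeros((3, 2))                    # weights start at zero, as stated above
print(hebbian_update(w, x, y))
```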
2. Perceptron Learning Rule: This rule is an example of supervised learning. The network starts its
learning by assigning a random value to each weight. It then calculates the output value on the
basis of a set of records for which the expected output value is known; this set is called the
learning sample.
The network then compares the calculated output value with the expected value and computes an
error function ∈, which can be the sum of squares of the errors occurring for each individual in the
learning sample:
∈ = Σi Σj (Eij − Oij)²
The first summation runs over the individuals of the learning set and the second over the output units;
Eij and Oij are the expected and obtained values of the jth unit for the ith individual.
The network then adjusts the weights of the different units, checking each time whether the error
function has increased or decreased. As in a conventional regression, this is a matter of solving a
least-squares problem. Since the expected outputs are supplied by the user, it is an example of
supervised learning.
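A minimal sketch of this procedure on a linearly separable problem (the logical AND function; the learning rate, number of epochs, and initial weight range are illustrative choices):

```python
import numpy as np

# Training set: the logical AND function, a linearly separable problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
T = np.array([0, 0, 0, 1])               # expected outputs

rng = np.random.default_rng(0)
w = rng.uniform(-0.5, 0.5, size=2)       # learning starts from random weights
b = rng.uniform(-0.5, 0.5)               # bias weight
lr = 0.1                                 # learning rate (illustrative)

for epoch in range(100):                 # ample for this small separable problem
    for x, t in zip(X, T):
        o = 1 if np.dot(w, x) + b > 0 else 0   # threshold (step) output
        w += lr * (t - o) * x                  # adjust weights by the error
        b += lr * (t - o)

print([1 if np.dot(w, x) + b > 0 else 0 for x in X])   # expected: [0, 0, 0, 1]
```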
3. Delta Learning Rule: Developed by Widrow and Hoff, the delta rule is one of the most common
learning rules. It depends on supervised learning. This rule states that the modification in the synaptic
weight of a node is equal to the product of the error and the input.
For a given input vector, the output vector is compared with the correct answer. If the difference is
zero, no learning takes place; otherwise, the network adjusts its weights to reduce this difference.
In mathematical form, the delta rule is as follows:
dwij = r * ai * ej
where r is the learning rate, ai represents the activation of unit ui, and ej is the difference between the
expected output and the actual output of unit uj.
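A minimal sketch of the delta rule for a single linear unit (the input vector, target, and learning rate are illustrative):

```python
import numpy as np

def delta_update(w, a, target, lr=0.05):
    """Delta rule: dw[i] = lr * a[i] * e, where e = target - actual output."""
    e = target - np.dot(w, a)           # error of the linear unit
    return w + lr * a * e               # no change when e == 0

a = np.array([1.0, 0.5, -1.0])          # input activations (illustrative)
w = np.zeros(3)
for _ in range(100):
    w = delta_update(w, a, target=2.0)
print(np.dot(w, a))                     # output approaches the target 2.0
```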
4. Correlation Learning Rule: The correlation learning rule is based on a principle similar to the Hebbian
learning rule.
It assumes that weights between neurons that respond in the same way should be more positive, and
weights between neurons with opposite reactions should be more negative.
Contrary to the Hebbian rule, the correlation rule is supervised learning: instead of the actual response
oj, the desired response dj is used for the weight-change calculation,
dwij = r * xi * dj
where dj is the desired value of the output signal. This training algorithm usually starts with the
initialization of the weights to zero.
Since the desired response is supplied by the user, the correlation learning rule is an example of
supervised learning.
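A minimal sketch of the correlation update, which uses the desired response dj instead of the actual output (all values illustrative):

```python
import numpy as np

def correlation_update(w, x, d, lr=0.1):
    """Correlation rule: delta_w[i][j] = lr * x[i] * d[j], with d the desired output."""
    return w + lr * np.outer(x, d)

x = np.array([1.0, -1.0])       # input vector (illustrative)
d = np.array([1.0])             # desired output of the single output unit
w = np.zeros((2, 1))            # training usually starts with zero weights
print(correlation_update(w, x, d))
```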
5. Outstar Learning Rule: We use the outstar learning rule when we assume that the nodes or neurons
in a network are arranged in a layer.
Here the weights connected to a certain node should be equal to the desired outputs for the neurons
connected through those weights.
The outstar rule produces the desired response t for the layer of n nodes. This type of learning is
applied to all nodes in a particular layer.
1. Data Mining: Discovery of meaningful patterns (knowledge) from large volumes of data.
2. Expert Systems: A computer program for decision making that simulates thought process of a
human expert.
3. Fuzzy Logic: Theory of approximate reasoning.
4. Artificial Life: Evolutionary Computation, Swarm Intelligence.
5. Artificial Immune System: A computer program based on the biological immune system.
6. Medical: At the moment, the research is mostly on modelling parts of the human body and
recognizing diseases from various scans (e.g. cardiograms, CAT scans, ultrasonic scans, etc.). Neural
networks are ideal for recognizing diseases using scans since there is no need to provide a specific
algorithm on how to identify the disease. Neural networks learn by example, so the details of how to
recognize the disease are not needed. What is needed is a set of examples that are representative of
all the variations of the disease. The quantity of examples is not as important as their quality; the
examples need to be selected very carefully if the system is to perform reliably and efficiently.
7. Computer Science: Researchers in quest of artificial intelligence have created spin offs like
dynamic programming, object oriented programming, symbolic programming, intelligent storage
management systems and many more such tools. The primary goal of creating an artificial
intelligence still remains a distant dream but people are getting an idea of the ultimate path, which
could lead to it.
8. Aviation: Airlines use expert systems in planes to monitor atmospheric conditions and system
status. The plane can be put on autopilot once a course is set for the destination.
9. Weather Forecast: Neural networks are used for predicting weather conditions. Previous data is
fed to a neural network, which learns the pattern and uses that knowledge to predict weather
patterns.
10. Neural Networks in Business: Business is a diverse field with several general areas of
specialization such as accounting or financial analysis. Almost any neural network application
would fit into at least one business area or financial analysis.
ADVANTAGES OF ANN
1. Adaptive learning: An ability to learn how to do tasks based on the data given for training or
initial experience.
2. Self-Organisation: An ANN can create its own organisation or representation of the
information it receives during learning time.
3. Real Time Operation: ANN computations may be carried out in parallel, and special hardware devices
are being designed and manufactured which take advantage of this capability.
4. Pattern recognition: It is a powerful technique for harnessing the information in the data and
generalizing about it. Neural nets learn to recognize the patterns which exist in the data set. The
system is developed through learning rather than programming. Neural nets teach themselves the
patterns in the data, freeing the analyst for more interesting work.
5. Flexibility: Neural networks are flexible in a changing environment. Although they may take some
time to learn a sudden drastic change, they are excellent at adapting to constantly changing
information.
Perceptron Networks:
WHAT IS A PERCEPTRON?
A perceptron is a binary classification algorithm modeled after the functioning of the human brain; it
was intended to emulate the neuron. The perceptron, while it has a simple structure, has the ability to
learn and solve very complex problems. A perceptron works as follows:
1. Takes the inputs which are fed into the perceptrons in the input layer, multiplies them by
their weights, and computes the sum.
2. Adds the number one, multiplied by a “bias weight”. This is a technical step that makes it
possible to move the output function of each perceptron (the activation function) up, down, left
and right on the number graph.
3. Feeds the sum through the activation function—in a simple perceptron system, the
activation function is a step function.
4. The result of the step function is the output.
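The four steps can be sketched as follows (the inputs, weights, and bias weight are made-up values):

```python
import numpy as np

def perceptron(x, w, bias_weight):
    # Steps 1 and 2: weighted sum of the inputs plus 1 * bias_weight
    z = np.dot(w, x) + 1 * bias_weight
    # Steps 3 and 4: step activation function gives the output
    return 1 if z > 0 else 0

x = np.array([0.7, -0.2])       # inputs (illustrative)
w = np.array([0.5, 0.8])        # weights (illustrative)
print(perceptron(x, w, bias_weight=-0.1))
```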
Types of Perceptron
Single-Layer Perceptron: This type of perceptron is limited to learning linearly separable patterns. It is
effective for tasks where the data can be divided into distinct categories by a straight line.
Multilayer Perceptron: Multilayer perceptrons possess enhanced processing capabilities as they consist
of two or more layers, adept at handling more complex patterns and relationships within the data.
Single Layer Perceptron (SLP) is a feed-forward network based on a threshold transfer function. The SLP
is the simplest type of artificial neural network and can only classify linearly separable cases with a
binary target (1, 0).
Algorithm
The single layer perceptron does not have a priori knowledge, so the initial weights are assigned
randomly. SLP sums all the weighted inputs and if the sum is above the threshold (some
predetermined value), SLP is said to be activated (output=1).
Because the SLP is a linear classifier, if the cases are not linearly separable the learning process will
never reach a point where all the cases are classified properly. The most famous example of the
perceptron's inability to solve problems with linearly non-separable cases is the XOR problem.
However, a multi-layer perceptron using the backpropagation algorithm can successfully classify the
XOR data.
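The XOR limitation can be made concrete with a small sketch. The weights below are hand-picked (not learned by backpropagation) just to show that a two-layer perceptron can represent XOR, which no single-layer perceptron can:

```python
import numpy as np

def step(z):
    return (np.asarray(z) > 0).astype(int)

# Hand-picked weights for a 2-2-1 multilayer perceptron that computes XOR:
# hidden unit 1 acts as OR, hidden unit 2 acts as NAND, the output unit is AND.
W_hidden = np.array([[1.0, 1.0],      # OR unit
                     [-1.0, -1.0]])   # NAND unit
b_hidden = np.array([-0.5, 1.5])
W_out = np.array([1.0, 1.0])          # AND of the two hidden units
b_out = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W_hidden @ np.array(x) + b_hidden)   # hidden layer
    y = step(W_out @ h + b_out)                   # output layer
    print(x, int(y))                              # prints 0, 1, 1, 0 (XOR)
```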
Activation functions and other hyperparameters: a full neural network uses a variety of
activation functions which output real values, not boolean values like in the classic perceptron.
It is more flexible in terms of other details of the learning process, such as the number of
training iterations (iterations and epochs), weight initialization schemes, regularization, and so
on. All these can be tuned as hyperparameters.
Advanced architectures: full neural networks can have a variety of architectures that can
help solve specific problems. A few examples are Recurrent Neural Networks (RNN),
Convolutional Neural Networks (CNN), and Generative Adversarial Networks (GAN).
In the multilayer perceptron, there can be more than one linear layer (combination of neurons). If we
take the simple example of a three-layer network, the first layer will be the input layer, the last will be
the output layer, and the middle layer will be called the hidden layer. We feed our input data into the
input layer and take the output from the output layer. We can increase the number of hidden layers as
much as we want, to make the model more complex according to our task.
The feed forward network is the most typical neural network model. Its goal is to approximate some
function f(). Given, for example, a classifier y = f*(x) that maps an input x to an output class y, the MLP
finds the best approximation to that classifier by defining a mapping y = f(x; θ) and learning the best
parameters θ for it. MLP networks are composed of many functions that are chained together; a
network with three layers, for example, computes f(x) = f3(f2(f1(x))).
Training the Model of MLP: There are basically three steps in the training of the model.
1. Forward pass
2. Calculate error or loss
3. Backward pass
1. Forward pass: In this step of training the model, we just pass the input to the model, multiply it with
the weights and add the bias at every layer, and find the calculated output of the model.
2. Calculate error / loss: When we pass a data instance (one example), we get some output from the
model called the predicted output (pred_out), and we have the label that comes with the data, the
real or expected output (expect_out). Based on these two we calculate the loss that we have to
backpropagate (using the backpropagation algorithm). There are various loss functions that we use
depending on our output and requirement.
3. Backward pass: After calculating the loss, we backpropagate it and update the weights of the model
using the gradient. This is the main step in the training of the model. In this step, the weights are
adjusted according to the gradient, moving them in the direction that reduces the loss.
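The three steps can be sketched for a tiny one-hidden-layer network trained on the XOR data with squared-error loss (layer sizes, learning rate, number of epochs, and the random seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # toy inputs (XOR)
T = np.array([[0.], [1.], [1.], [0.]])                   # expected outputs

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)           # input -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)           # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for epoch in range(5000):
    # 1. Forward pass: weighted sums plus bias at every layer
    H = sigmoid(X @ W1 + b1)
    pred_out = sigmoid(H @ W2 + b2)
    # 2. Calculate error / loss between predicted and expected output
    loss = np.mean((pred_out - T) ** 2)
    # 3. Backward pass: propagate the error and update weights via the gradient
    d_out = (pred_out - T) * pred_out * (1 - pred_out)
    d_hid = (d_out @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid;  b1 -= lr * d_hid.sum(axis=0)

# Typically close to [[0], [1], [1], [0]]; another seed may need more epochs.
print(round(float(loss), 4), np.round(pred_out, 2))
```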
Applications of MLP:
1. MLPs are useful in research for their ability to solve problems stochastically, which often
allows approximate solutions for extremely complex problems like fitness approximation.
2. MLPs are universal function approximators and they can be used to create mathematical
models by regression analysis.
3. MLPs are a popular machine learning solution in diverse fields such as speech recognition, image
recognition, and machine translation software.