
N.B.K.R.

INSTITUTE OF SCIENCE AND TECHNOLOGY::VIDYANAGAR

(AUTONOMOUS)

*****

Department of Computer Science and Engineering

B.TECH IV YEAR – I SEM (R20)

(2024-2025)

LECTURE NOTES

On

DEEP LEARNING



UNIT-I
Artificial Neural Networks: Introduction, Define Artificial Neural Networks, Basic
Building Blocks of Artificial Neural Networks, Artificial Neural Network Terminologies,
Learning Rules, Applications of Artificial Neural Networks.

Perceptron Networks: Single Layer Perceptron, Multi-Layer Perceptron.

Introduction

Artificial Neural Networks (ANNs) are algorithms inspired by the functioning of the brain and are used to model complicated patterns and to make predictions. The Artificial Neural Network (ANN) is a deep learning method that arose from the concept of the biological neural networks of the human brain. The development of the ANN was the result of an attempt to replicate the workings of the human brain. The workings of an ANN are extremely similar to those of biological neural networks, although they are not identical. The basic ANN algorithm accepts only numeric, structured data.

Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are used to handle unstructured and non-numeric data such as images, text, and speech. This unit focuses solely on Artificial Neural Networks.

Define Artificial Neural Network

An Artificial Neural Network (ANN) is a mathematical model that tries to simulate the structure
and functionalities of biological neural networks.

The basic building block of every artificial neural network is the artificial neuron, that is, a simple mathematical model (function).



Such a model has three simple sets of rules: multiplication, summation and activation. At the entrance of the artificial neuron, the inputs are weighted, which means that every input value is multiplied by an individual weight. In the middle section of the artificial neuron is a sum function that adds up all the weighted inputs and a bias. At the exit of the artificial neuron, the sum of the weighted inputs and bias passes through an activation function, which is also called a transfer function.
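As a minimal sketch of this multiply-sum-activate model, the following Python function computes the weighted sum of the inputs plus a bias and passes it through a sigmoid transfer function; the function name, example values and the choice of sigmoid are illustrative, not taken from these notes.

```python
import math

def artificial_neuron(inputs, weights, bias):
    # Multiplication and summation: weighted sum of all inputs plus the bias
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation (transfer) function: a sigmoid squashes the sum into (0, 1)
    return 1.0 / (1.0 + math.exp(-weighted_sum))

# Example: a neuron with three inputs, three weights and a bias
print(artificial_neuron([0.5, 0.3, 0.2], [0.4, 0.7, -0.2], bias=0.1))
```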



Artificial Neural Networks Architecture

A neural network consists of a large number of artificial neurons, which are termed units, arranged in a sequence of layers.

Let us look at the various types of layers available in an artificial neural network.

Input Layer:

• As the name suggests, it accepts inputs in several different formats provided by the programmer.

Hidden Layer:

• The hidden layer sits between the input and output layers. It performs all the calculations needed to find hidden features and patterns.

Output Layer:

• The input goes through a series of transformations using the hidden layer, which finally results in output
that is conveyed using this layer.

• The artificial neural network takes input and computes the weighted sum of the inputs and includes a
bias. This computation is represented in the form of a transfer function.

• The weighted total is passed as an input to an activation function to produce the output. Activation functions decide whether a node should fire or not.

• Only the nodes that fire contribute to the output layer.

• There are different activation functions available that can be applied depending on the sort of task we are performing.



There are three kinds of layers in the network architecture: the input layer, the hidden layer (of which there may be more than one), and the output layer. Because of these multiple layers, such a network is sometimes referred to as an MLP (Multi-Layer Perceptron).

 It is possible to think of the hidden layer as a “distillation layer,” which extracts some of
the most relevant patterns from the inputs and sends them on to the next layer for
further analysis.

 The activation function is important for two reasons:

 First, it captures the presence of non-linear relationships between the inputs.
 Second, it contributes to the conversion of the input into a more usable output.

Finding the optimal values of the weights W that minimize prediction error is critical to building a successful model. The “backpropagation algorithm” achieves this by turning the ANN into a learning algorithm that learns from its mistakes.



The optimization approach uses a “gradient descent” technique to reduce the prediction error. To find the optimum values for W, small adjustments to W are tried and their impact on the prediction error is examined. Finally, those values of W are chosen as ideal, since further changes to W do not reduce the error.
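A minimal sketch of this idea for a single weight, assuming a squared-error loss on one training pair; the variable names, data values and learning rate are illustrative.

```python
# Gradient descent on a single weight w for a squared-error loss.
# Target: fit y = w * x on one example (x = 2.0, y = 4.0), so the ideal w is 2.0.
x, y = 2.0, 4.0
w = 0.0                  # initial guess
learning_rate = 0.1

for step in range(20):
    prediction = w * x
    error = prediction - y            # prediction error
    gradient = 2 * error * x          # derivative of error**2 with respect to w
    w -= learning_rate * gradient     # small adjustment that reduces the error

print(round(w, 4))  # approaches 2.0, where further changes no longer reduce the error
```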

STRUCTURE AND FUNCTIONS OF ARTIFICIAL NEURON

An artificial neuron is a mathematical function conceived as a model of biological neurons in a neural network. Artificial neurons are the elementary units of an artificial neural network. The
artificial neuron receives one or more inputs (representing excitatory postsynaptic potentials
and inhibitory postsynaptic potentials at neural dendrites) and sums them to produce an
output (or activation, representing a neuron's action potential which is transmitted along its
axon). Usually each input is separately weighted, and the sum is passed through a non-linear
function known as an activation function or transfer function. The transfer functions usually
have a sigmoid shape, but they may also take the form of other non-linear functions, piecewise
linear functions, or step functions. They are also often monotonically increasing, continuous,
differentiable and bounded. The thresholding function has inspired the building of logic gates referred to as threshold logic, which is applicable to building logic circuits resembling brain processing.

BASIC BUILDING BLOCKS OF ARTIFICIAL NEURAL NETWORKS

Processing of ANN depends upon the following three building blocks:

1. Network Topology
2. Adjustments of Weights or Learning
3. Activation Functions



1. Network Topology: A network topology is the arrangement of a network along with its nodes and connecting lines. According to the topology, an ANN can be classified into the following kinds:

A. Feed forward Network: It is a non-recurrent network having processing units/nodes arranged in layers, and all the nodes in a layer are connected to the nodes of the previous layer. The connections carry different weights. There is no feedback loop, which means the signal can only flow in one direction, from input to output.

It may be divided into the following two types:

 Single layer feed forward network: The concept is of feed forward ANN having
only one weighted layer. In other words, we can say the input layer is fully
connected to the output layer.

 Multilayer feed forward network: The concept is of a feed forward ANN having more than one weighted layer. The layers between the input and the output layer are called hidden layers.

B. Feedback Network: As the name suggests, a feedback network has feedback paths,
which means the signal can flow in both directions using loops. This makes it a non-linear
dynamic system, which changes continuously until it reaches a state of equilibrium.

It may be divided into the following types:

 Recurrent networks: They are feedback networks with closed loops. Following are the two types of recurrent networks.

 Fully recurrent network: It is the simplest neural network architecture because all
nodes are connected to all other nodes and each node works as both input and output.

 Jordan network: It is a closed-loop network in which the output is fed back to the input as feedback.

2. Adjustments of Weights or Learning: Learning, in an artificial neural network, is the method of modifying the weights of the connections between the neurons of a specified network. Learning in an ANN can be classified into three categories, namely supervised learning, unsupervised learning, and reinforcement learning.



Supervised Learning:
 As the name suggests, this type of learning is done under the supervision of a teacher. This learning process is dependent on the teacher.

 During the training of ANN under supervised learning, the input vector is presented to the
network, which will give an output vector.

 This output vector is compared with the desired output vector.

 An error signal is generated, if there is a difference between the actual output and the
desired output vector.

 On the basis of this error signal, the weights are adjusted until the actual output is matched
with the desired output.

Unsupervised Learning:

 As the name suggests, this type of learning is done without the supervision of a teacher. This learning process is independent of a teacher.

 During the training of ANN under unsupervised learning, the input vectors of similar type
are combined to form clusters. When a new input pattern is applied, then the neural
network gives an output response indicating the class to which the input pattern belongs.

 There is no feedback from the environment as to what should be the desired output and if
it is correct or incorrect.

 Hence, in this type of learning, the network itself must discover the patterns and features from the input data, and the relationship between the input data and the output.

Reinforcement Learning:

As the name suggests, this type of learning is used to reinforce or strengthen the network based on some critic information.

This learning process is similar to supervised learning; however, we might have much less information. During the training of the network under reinforcement learning, the network receives some feedback from the environment.

This makes it somewhat similar to supervised learning. However, the feedback obtained here is evaluative, not instructive, which means there is no teacher as in supervised learning.

After receiving the feedback, the network performs adjustments of the weights to get better critic
information in future.

3. Activation Functions: An activation function is a mathematical equation that determines the output of each element (perceptron or neuron) in the neural network. It takes in the input from each neuron and transforms it into an output, usually between 0 and 1 or between -1 and 1. It may be defined as the extra force or effort applied over the input to obtain an exact output. In an ANN, we can also apply activation functions over the input to get the exact output.

The following are some activation functions of interest:

i) Linear Activation Function: It is also called the identity function as it performs no input
editing.

It can be defined as: F(x) = x

ii) Sigmoid Activation Function: It is of two types, as follows:

Binary sigmoidal function: This activation function performs input editing between 0 and 1. It is positive in nature. It is always bounded, which means its output cannot be less than 0 or more than 1. It is also strictly increasing in nature, which means the larger the input, the higher the output. It can be defined as

F(x) = sigm(x) = 1 / (1 + exp(−x))

Bipolar sigmoidal function: This activation function performs input editing between -1 and 1. It can be positive or negative in nature. It is always bounded, which means its output cannot be less than -1 or more than 1. It is also strictly increasing in nature, like the binary sigmoid function. It can be defined as



F(x) = sigm(x) = 2 / (1 + exp(−x)) − 1 = (1 − exp(−x)) / (1 + exp(−x))
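The three activation functions described above can be sketched in Python as follows; the function names are chosen here for illustration.

```python
import math

def linear(x):
    # Identity function: performs no editing of the input
    return x

def binary_sigmoid(x):
    # Bounded between 0 and 1, strictly increasing
    return 1.0 / (1.0 + math.exp(-x))

def bipolar_sigmoid(x):
    # Bounded between -1 and 1, strictly increasing
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

for x in (-2.0, 0.0, 2.0):
    print(x, linear(x), round(binary_sigmoid(x), 4), round(bipolar_sigmoid(x), 4))
```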

ARTIFICIAL NEURAL NETWORK CONCEPTS/TERMINOLOGY

Here is a glossary of basic terms you should be familiar with before learning the details of neural networks.

Inputs: Source data fed into the neural network, with the goal of making a decision or prediction about the
data. Inputs to a neural network are typically a set of real values, each value is fed into one of the neurons in
the input layer.

Training Set: A set of inputs for which the correct outputs are known, used to train the neural network.

Outputs : Neural networks generate their predictions in the form of a set of real values or boolean
decisions. Each output value is generated by one of the neurons in the output layer.

Neuron/perceptron: The basic unit of the neural network. Accepts an input and generates a prediction.

Each neuron accepts part of the input and passes it through the activation function. Common activation
functions are sigmoid, TanH and ReLu. Activation functions help generate output values within an
acceptable range, and their non-linear form is crucial for training the network.

Weight Space: Each neuron is given a numeric weight. The weights, together with the activation function,
define each neuron’s output. Neural networks are trained by fine-tuning weights, to discover the optimal
set of weights that generates the most accurate prediction.

Forward Pass: The forward pass takes the inputs, passes them through the network and allows each
neuron to react to a fraction of the input. Neurons generate their outputs and pass them on to the next
layer, until eventually the network generates an output.

Error Function: Defines how far the actual output of the current model is from the correct output. When
training the model, the objective is to minimize the error function and bring output as close as possible to
the correct value.

Backpropagation: In order to discover the optimal weights for the neurons, we perform a backward pass,
moving back from the network’s prediction to the neurons that generated that prediction. This is called
backpropagation. Backpropagation tracks the derivatives of the activation functions in each successive
neuron, to find weights that bring the loss function to a minimum, which will generate the best prediction.
This is a mathematical process called gradient descent.

Bias and Variance: When training neural networks, like in other machine learning techniques, we try to
balance between bias and variance. Bias measures how well the model fits the training set—able to
correctly predict the known outputs of the training examples. Variance measures how well the model
works with unknown inputs that were not available during training. Another meaning of bias is a “bias neuron” which is used in every layer of the neural network. The bias neuron holds the number 1, and makes it possible to move the activation function up, down, left and right on the number graph.

Hyperparameters: A hyperparameter is a setting that affects the structure or operation of the neural network. In real deep learning projects, tuning hyperparameters is the primary way to build a network that provides accurate predictions for a certain problem. Common hyperparameters include the number of hidden layers, the activation function, and how many times (epochs) training should be repeated.

LEARNING RULES

WHAT ARE THE LEARNING RULES IN ANN?


A learning rule is a method or a mathematical logic that helps a neural network learn from existing conditions and improve its performance. Learning rules update the weights and bias levels of a network as the network is simulated in a specific data environment. Applying a learning rule is an iterative process.

The different learning rules in the Neural network are:

1. Hebbian learning rule – It identifies how to modify the weights of the nodes of a network.

2. Perceptron learning rule – Network starts its learning by assigning a random value to each weight.

3. Delta learning rule – The modification in the synaptic weight of a node is equal to the multiplication of the error and the input.

4. Correlation learning rule – The correlation rule is a supervised learning rule.

5. Outstar learning rule – We can use it when we assume that the nodes or neurons in a network are arranged in a layer.

1. Hebbian Learning Rule: The Hebbian rule was the first learning rule. In 1949, Donald Hebb developed it as a learning algorithm for unsupervised neural networks.
We can use it to identify how to improve the weights of the nodes of a network.
 The Hebb learning rule assumes that if two neighboring neurons are activated and deactivated at the same time, then the weight connecting these neurons should increase. At the start, the values of all weights are set to zero. This learning rule can be used for both soft- and hard-activation functions. Since the desired responses of neurons are not used in the learning procedure, this is an unsupervised learning rule.
 The absolute values of the weights are usually proportional to the learning time, which is
undesired.

Eq - Mathematical Formula of Hebb Learning Rule
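A commonly cited form of the Hebb weight update is given below; the notation (learning rate η, input x_i, neuron output y_j) is chosen here for illustration, since the original equation is not reproduced in these notes.

$$\Delta w_{ij} = \eta \, x_i \, y_j$$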



Perceptron Learning Rule: Each connection in a neural network has an associated weight, which changes in
the course of learning.

 According to this rule, an example of supervised learning, the network starts its learning by assigning a random value to each weight. The output value is then calculated on the basis of a set of records for which the expected output value is known. Because it indicates the entire definition of the task, this set is called the learning sample.

 The network then compares the calculated output value with the expected value, and then calculates an error function ε, which can be the sum of the squares of the errors occurring for each individual in the learning sample. It can be computed as:

Eq - Mathematical Formula of Perceptron Learning Rule
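Consistent with the description below, where E_ij and O_ij are the expected and obtained values of the j-th output unit for the i-th individual, the error function can be written as:

$$\epsilon = \sum_{i}\sum_{j} \left(E_{ij} - O_{ij}\right)^{2}$$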

Perform the first summation on the individuals of the learning set, and perform the second summation
on the output units. Eij and Oij are the expected and obtained values of the jth unit for the ith individual.
The network then adjusts the weights of the different units, checking each time to see if the error
function has increased or decreased. As in a conventional regression, this is a matter of solving a
problem of least squares. Since the weight adjustments are driven by the expected outputs supplied by the user, this is an example of supervised learning.

Delta Learning Rule: Developed by Widrow and Hoff, the delta rule is one of the most common learning rules. It depends on supervised learning. This rule states that the modification in the synaptic weight of a node is equal to the multiplication of the error and the input. In mathematical form, the delta rule is as follows:

Eq - Mathematical Formula of Delta Learning Rule
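Using the notation introduced below (r is the learning rate, a_i the activation of unit u_i, and e_j the difference between the expected and actual output of unit u_j), the rule can be written as:

$$\Delta w_{ij} = r \, a_i \, e_j$$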

For a given input vector, the output vector is compared with the correct answer.
If the difference is zero, no learning takes place; otherwise, the network adjusts its weights to reduce this difference.
The change in weight from ui to uj is: Δwij = r · ai · ej, where r is the learning rate, ai represents the activation of ui, and ej is the difference between the expected output and the actual output of uj. If the set of input patterns forms a linearly independent set, then arbitrary associations can be learned using the delta rule.
It has been seen that, for networks with linear activation functions and with no hidden units, the graph of the squared error versus the weights is a paraboloid in n-space. Since such a paraboloid is concave upward, it has a minimum value. The vertex of this paraboloid represents the point of minimum error, and the weight vector corresponding to this point is the ideal weight vector.
We can use the delta learning rule with both a single output unit and several output units. When applying the delta rule, it is assumed that the error can be directly measured. The aim of applying the delta rule is to reduce the difference between the actual and expected output, that is, the error.

Correlation Learning Rule: The correlation learning rule is based on a similar principle to the Hebbian learning rule.
 It assumes that weights between responding neurons should be more positive, and weights
between neurons with opposite reaction should be more negative.

 Contrary to the Hebbian rule, the correlation rule is supervised learning: instead of the actual response oj, the desired response dj is used for the weight-change calculation.

In Mathematical form the correlation learning rule is as follows:

Eq - Mathematical Formula of Correlation Learning Rule
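A commonly given form of the correlation rule is shown below; the learning rate η and input x_i are notation chosen here for illustration, while d_j is the desired response described in the surrounding text.

$$\Delta w_{ij} = \eta \, x_i \, d_j$$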

Where dj is the desired value of output signal. This training algorithm usually starts with the initialization
of weights to zero.

Since the desired response is assigned by the user, the correlation learning rule is an example of supervised learning.

Out Star Learning Rule: We use the Out Star Learning Rule when we assume that the nodes or neurons in a network are arranged in a layer.

Here the weights connected to a certain node should be equal to the desired outputs for the neurons
connected through those weights.

The outstar rule produces the desired response t for the layer of n nodes. This type of learning is applied to all nodes in a particular layer.



The weights for the nodes are updated as in Kohonen neural networks. In mathematical form, the outstar learning rule can be expressed as follows:

Eq - Mathematical Formula of Out Star Learning Rule
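A commonly given form of the outstar weight update is sketched below, under the assumption that d_k is the desired output of the k-th node in the layer and η is the learning rate (notation chosen here, since the original equation is not reproduced).

$$\Delta w_{jk} = \eta \, (d_k - w_{jk})$$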


This is a supervised training procedure because desired outputs must be known.

Applications of Artificial Neural Networks

1. Data Mining: Discovery of meaningful patterns (knowledge) from large volumes of data.
2. Expert Systems: A computer program for decision making that simulates thought process of a
human expert.
3. Fuzzy Logic: Theory of approximate reasoning.
4. Artificial Life: Evolutionary Computation, Swarm Intelligence.
5. Artificial Immune System: A computer program based on the biological immune system.
6. Medical: At the moment, the research is mostly on modelling parts of the human body and recognizing diseases from various scans (e.g. cardiograms, CAT scans, ultrasonic scans, etc.). Neural networks are ideal for recognizing diseases using scans, since there is no need to provide a specific algorithm on how to identify the disease. Neural networks learn by example, so the details of how to recognize the disease are not needed. What is needed is a set of examples that are representative of all the variations of the disease. The quantity of examples is not as important as the quality. The examples need to be selected very carefully if the system is to perform reliably and efficiently.

7. Computer Science: Researchers in quest of artificial intelligence have created spin offs like
dynamic programming, object oriented programming, symbolic programming, intelligent storage
management systems and many more such tools. The primary goal of creating an artificial
intelligence still remains a distant dream but people are getting an idea of the ultimate path, which
could lead to it.
8. Aviation: Airlines use expert systems in planes to monitor atmospheric conditions and system
status. The plane can be put on autopilot once a course is set for the destination.
9. Weather Forecast: Neural networks are used for predicting weather conditions. Previous data is
fed to a neural network, which learns the pattern and uses that knowledge to predict weather
patterns.
10. Neural Networks in business: Business is a diversified field with several general areas of specialization, such as accounting or financial analysis. Almost any neural network application would fit into at least one business area, such as financial analysis.



o There is some potential for using neural networks for business purposes, including
resource allocation and scheduling.
11. There is also a strong potential for using neural networks for database mining, that is, searching for patterns implicit within the explicitly stored information in databases. Most of the funded work in this area is classified as proprietary, so it is not possible to report on the full extent of the work going on. Most work applies neural networks, such as the Hopfield-Tank network, for optimization and scheduling.
12. Marketing: There is a marketing application which has been integrated with a neural network
system. The Airline Marketing Tactician (a trademark abbreviated as AMT) is a computer system
made of various intelligent technologies including expert systems. A feed forward neural network is
integrated with the AMT and was trained using back-propagation to assist the marketing control of
airline seat allocations. The adaptive neural approach was amenable to rule expression.
Additionally, the application's environment changed rapidly and constantly, which required a
continuously adaptive solution.
13. Credit Evaluation: The HNC company, founded by Robert Hecht-Nielsen, has developed several neural network applications. One of them is the Credit Scoring system, which increases the profitability of the existing model by up to 27%. The HNC neural systems were also applied to mortgage screening. A neural network automated mortgage insurance underwriting system was developed by the Nestor Company. This system was trained with 5048 applications, of which 2597 were certified. The data related to property and borrower qualifications. In a conservative mode, the system agreed with the underwriters on 97% of the cases. In the liberal mode, the system agreed on 84% of the cases. The system ran on an Apollo DN3000 and used 250K of memory while processing a case file in approximately 1 second.

ADVANTAGES OF ANN

1. Adaptive learning: An ability to learn how to do tasks based on the data given for training or
initial experience.
2. Self-Organisation: An ANN can create its own organisation or representation of the
information it receives during learning time.
3. Real Time Operation: ANN computations may be carried out in parallel, and special hardware devices
are being designed and manufactured which take advantage of this capability.
4. Pattern recognition: It is a powerful technique for harnessing the information in the data and generalizing about it. Neural nets learn to recognize the patterns which exist in the data set. The system is developed through learning rather than programming. Neural nets teach themselves the patterns in the data, freeing the analyst for more interesting work.

5. Neural networks are flexible in a changing environment. Although neural networks may take some time to learn a sudden drastic change, they are excellent at adapting to constantly changing information.
6. Neural networks can build informative models whenever conventional approaches fail. Because
neural networks can handle very complex interactions they can easily model data which is too
difficult to model with traditional approaches such as inferential statistics or programming logic.
7. Performance of neural networks is at least as good as classical statistical modelling, and better on
most problems. The neural networks build models that are more reflective of the structure of the
data in significantly less time.

Perceptron Networks:

WHAT IS A PERCEPTRON?

A perceptron is a binary classification algorithm modeled after the functioning of the human brain—it was intended to emulate the neuron. The perceptron, while it has a simple structure, has the ability to learn and solve very complex problems.

The perceptron learns as follows:

1. Takes the inputs which are fed into the perceptrons in the input layer, multiplies them by
their weights, and computes the sum.
2. Adds the number one, multiplied by a “bias weight”. This is a technical step that makes it
possible to move the output function of each perceptron (the activation function) up, down, left
and right on the number graph.
3. Feeds the sum through the activation function—in a simple perceptron system, the
activation function is a step function.
4. The result of the step function is the output.

(or)
WHAT IS PERCEPTRON?

 Perceptron is one of the simplest Artificial neural network architectures.


 It was introduced by Frank Rosenblatt in 1957.
 It is the simplest type of feedforward neural network, consisting of a single layer of input nodes
that are fully connected to a layer of output nodes.
 It can learn the linearly separable patterns.
 It uses a slightly different type of artificial neuron known as the threshold logic unit (TLU), which was first introduced by McCulloch and Walter Pitts in the 1940s.

Types of Perceptron

 Single-Layer Perceptron: This type of perceptron is limited to learning linearly separable patterns. It is effective for tasks where the data can be divided into distinct categories by a straight line.

 Multilayer Perceptron: Multilayer perceptrons possess enhanced processing capabilities as they consist
of two or more layers, adept at handling more complex patterns and relationships within the data.

The Single Layer Perceptron (SLP) is a feed-forward network based on a threshold transfer function. The SLP is the simplest type of artificial neural network and can only classify linearly separable cases with a binary target (1, 0).

Algorithm
The single layer perceptron does not have a priori knowledge, so the initial weights are assigned randomly. The SLP sums all the weighted inputs and, if the sum is above the threshold (some predetermined value), the SLP is said to be activated (output = 1).



The input values are presented to the perceptron, and if the predicted output is the same as the desired
output, then the performance is considered satisfactory and no changes to the weights are made.
However, if the output does not match the desired output, then the weights need to be changed to reduce
the error.
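A minimal sketch of this training loop for the linearly separable AND problem; the initial weight range, the threshold (folded into the bias), the learning rate and the number of epochs are illustrative choices, not prescribed by the notes.

```python
import random

random.seed(0)
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]   # AND truth table
weights = [random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)]
bias = 0.0
learning_rate = 0.1

def predict(x):
    # Weighted sum plus bias, passed through a threshold transfer function
    s = sum(xi * wi for xi, wi in zip(x, weights)) + bias
    return 1 if s > 0 else 0

for epoch in range(100):
    for x, desired in data:
        error = desired - predict(x)
        if error != 0:   # adjust the weights only when the prediction is wrong
            for i in range(len(weights)):
                weights[i] += learning_rate * error * x[i]
            bias += learning_rate * error

print([predict(x) for x, _ in data])   # should settle at [0, 0, 0, 1]
```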

Because the SLP is a linear classifier, if the cases are not linearly separable, the learning process will never reach a point where all the cases are classified properly. The most famous example of the perceptron's inability to solve problems with linearly non-separable cases is the XOR problem.

However, a multi-layer perceptron using the backpropagation algorithm can successfully classify the
XOR data.

What is Multilayer Perceptron?


A multilayer perceptron (MLP) is a group of perceptrons, organized in multiple layers, that can accurately
answer complex questions. Each perceptron in the first layer (on the left) sends signals to all the perceptrons
in the second layer, and so on. An MLP contains an input layer, at least one hidden layer, and an output layer.



A multilayer perceptron is quite similar to a modern neural network. By adding a few ingredients, the
perceptron architecture becomes a full-fledged deep learning system:

 Activation functions and other hyperparameters: a full neural network uses a variety of
activation functions which output real values, not boolean values like in the classic perceptron.
It is more flexible in terms of other details of the learning process, such as the number of
training iterations (iterations and epochs), weight initialization schemes, regularization, and so
on. All these can be tuned as hyperparameters.

 Backpropagation: a full neural network uses the backpropagation algorithm to perform iterative backward passes, which try to find the optimal values of the perceptron weights in order to generate the most accurate prediction.

 Advanced architectures: full neural networks can have a variety of architectures that can
help solve specific problems. A few examples are Recurrent Neural Networks (RNN),
Convolutional Neural Networks (CNN), and Generative Adversarial Networks (GAN).

BRIEFLY EXPLAIN MULTI LAYER PERCEPTRON MODEL

In the multilayer perceptron, there can be more than one linear layer (combination of neurons). If we take the simple example of a three-layer network, the first layer will be the input layer, the last will be the output layer, and the middle layer will be called the hidden layer. We feed our input data into the input layer and take the output from the output layer. We can increase the number of hidden layers as much as we want, to make the model more complex according to our task.

A feed forward network is the most typical neural network model. Its goal is to approximate some function f(). Given, for example, a classifier y = f*(x) that maps an input x to an output class y, the MLP finds the best approximation to that classifier by defining a mapping y = f(x; θ) and learning the best parameters θ for it. MLP networks are composed of many functions that are chained together. A network with three functions or layers would form f(x) = f(3)(f(2)(f(1)(x))). Each of these layers is composed of units that perform a transformation of a linear sum of inputs. Each layer is represented as y = f(WxT + b), where f is the activation function, W is the set of parameters, or weights, in the layer, x is the input vector, which can also be the output of the previous layer, b is the bias vector, and T denotes the transpose. The layers of an MLP consist of fully connected layers, because each unit in a layer is connected to all the units in the previous layer. In a fully connected layer, the parameters of each unit are independent of the rest of the units in the layer, which means each unit possesses a unique set of weights.

Training the Model of MLP: There are basically three steps in the training of the model.

1. Forward pass
2. Calculate error or loss
3. Backward pass

1. Forward pass: In this step of training the model, we just pass the input to the model, multiply it by the weights and add the bias at every layer, and find the calculated output of the model.

2. Calculate error / loss: When we pass a data instance (one example), we get some output from the model, called the predicted output (pred_out), and we have the label that comes with the data, which is the real or expected output (expect_out). Based on both of these, we calculate the loss that we have to backpropagate (using the backpropagation algorithm). There are various loss functions that we use based on our output and requirements.
3. Backward Pass: After calculating the loss, we backpropagate the loss and update the weights of the model using the gradients. This is the main step in the training of the model. In this step, the weights are adjusted in the direction indicated by the gradients.
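The three steps above can be sketched as follows for a small MLP trained on the XOR problem with NumPy; the layer sizes, sigmoid activation, mean-squared-error loss, learning rate and number of epochs are illustrative choices rather than the only possible ones.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)            # XOR labels

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros((1, 4))        # input -> hidden (4 units)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros((1, 1))        # hidden -> output
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(10000):
    # 1. Forward pass: weighted sums plus bias at every layer
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # 2. Calculate error / loss: mean squared error between predicted and expected output
    loss = np.mean((out - y) ** 2)

    # 3. Backward pass: propagate the error and adjust the weights along the gradients
    d_out = (out - y) * out * (1 - out)      # error signal at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)       # error signal at the hidden layer
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out).ravel())   # typically converges to [0, 1, 1, 0]
```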

Applications of MLP:

1. MLPs are useful in research for their ability to solve problems stochastically, which often
allows approximate solutions for extremely complex problems like fitness approximation.
2. MLPs are universal function approximators and they can be used to create mathematical
models by regression analysis.
3. MLPs are a popular machine learning solution in diverse fields such as speech recognition, image
recognition, and machine translation software.

