FFNN, GD, Backpropagation

Backpropagation is a crucial algorithm for training deep neural networks, allowing efficient computation of gradients for optimization through gradient descent. It emerged in the 1970s and 1980s, addressing the limitations of earlier methods by utilizing the chain rule of calculus to propagate errors backward through the network. This innovation has significantly advanced the field of deep learning, enabling the development of complex models and modern techniques such as CNNs and RNNs.


Backpropagation: The Backbone of Neural

Network Training
 Backpropagation, short for “backward
propagation of errors,” is a fundamental
algorithm in the training of deep neural
networks.
 It efficiently computes the gradients of the
loss function with respect to the network’s
parameters, enabling the use of gradient
descent methods to optimize these
parameters.
1. The Evolution of the Multilayer Perceptron
or Feedforward Neural Networks
The development of backpropagation is
deeply intertwined with the history of neural
networks. In the 1960s, researchers
encountered the XOR problem, a classic
example of a non-linearly separable dataset,
which highlighted the limitations of single-
layer perceptrons. This challenge spurred the
development of the Multilayer Perceptron
(MLP), a class of feedforward neural networks
capable of modeling complex, non-linear
relationships.
The Structure of an MLP
 An MLP consists of an input layer, one or
more hidden layers, and an output layer.
 Each layer contains nodes (neurons) that
are connected to nodes in adjacent layers
through weighted connections.
A 3-layer Multilayer Perceptron (MLP) Model.
 A 3-layer MLP, for instance, can be represented mathematically as a composite function:
ŷ = a⁽³⁾ = g⁽³⁾(W⁽³⁾ g⁽²⁾(W⁽²⁾ g⁽¹⁾(W⁽¹⁾ x + b⁽¹⁾) + b⁽²⁾) + b⁽³⁾)
Where:
 W⁽ˡ⁾ and b⁽ˡ⁾ are the weight matrix and bias
vector for layer l.
 a⁽ˡ⁾ is the activation output of all neurons
at layer l.
 g⁽ˡ⁾ is the activation function for
layer l (e.g., sigmoid, tanh, ReLU, Softmax).
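To make the composite function concrete, here is a minimal NumPy sketch of the 3-layer forward computation. The layer widths, the ReLU/sigmoid choices for g⁽¹⁾, g⁽²⁾, g⁽³⁾, and the random initialization are assumptions made for illustration, not details taken from the figure.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 1]                                                   # illustrative layer widths
W = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(3)]  # W(1), W(2), W(3)
b = [np.zeros(sizes[l + 1]) for l in range(3)]                         # b(1), b(2), b(3)

x = rng.standard_normal(sizes[0])      # input vector
a1 = relu(W[0] @ x + b[0])             # a(1) = g(1)(W(1) x + b(1))
a2 = relu(W[1] @ a1 + b[1])            # a(2) = g(2)(W(2) a(1) + b(2))
y_hat = sigmoid(W[2] @ a2 + b[2])      # a(3) = g(3)(W(3) a(2) + b(3))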

2. Gradient Descent
 The objective of training a neural network is to find the optimal set of parameters θ* that minimizes a cost function ℒ(θ) over the training dataset.
 The cost function, also known as the loss
function, measures the discrepancy
between the network’s predictions and
the actual targets.
 This optimization problem can be formalized as:
θ* = argmin_θ ℒ(θ)
where θ = {W⁽¹⁾, b⁽¹⁾, W⁽²⁾, b⁽²⁾, …, W⁽ᴸ⁾, b⁽ᴸ⁾} represents all the network parameters (weights and biases).
 Initially, researchers used gradient descent
to train MLPs.
 Gradient descent is an optimization
algorithm that adjusts the weights of the
network to minimize the error between
the predicted output and the actual
output.
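As a sketch of the rule behind gradient descent, each parameter takes a small step against its gradient, θ ← θ − η ∇ℒ(θ). The quadratic toy loss and the learning rate below are illustrative assumptions.

import numpy as np

def gradient_descent_step(theta, grad, lr=0.1):
    # theta <- theta - lr * dL/dtheta
    return theta - lr * grad

# Toy example: minimize L(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([3.0, -2.0])
for _ in range(100):
    theta = gradient_descent_step(theta, 2.0 * theta, lr=0.1)
print(theta)   # approaches the minimizer [0, 0]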
However, gradient descent alone was not practical for training deep neural networks: it required computing the gradient of every parameter, which was costly to do by hand or by naive numerical perturbation, and optimization was prone to getting stuck in local minima.
3. The Need for Backpropagation
A major challenge in training deep neural
networks is the efficient computation of
gradients. A naive approach would involve
perturbing each parameter individually and observing the change in the cost function ℒ(θ). This method is computationally prohibitive, especially for networks with millions of parameters.

The breakthrough came with the development of backpropagation in the 1970s
and 1980s. Backpropagation leverages the
chain rule of calculus and the layered
structure of neural networks to compute
gradients efficiently. By computing the gradients of all parameters in a single backward pass, rather than with one costly perturbation pass per parameter, backpropagation made the training of deep networks feasible.
Key contributors to the development and
popularization of backpropagation include
Seppo Linnainmaa, Paul Werbos, David
Rumelhart, Geoffrey Hinton, and Ronald
Williams. Their work marked a turning point
in neural network research, enabling the
training of deep models and laying the
foundation for modern deep learning.
4. The Chain Rule in Backpropagation
The chain rule is a fundamental concept in
calculus that allows us to compute the
derivative of a composite function. In the
context of backpropagation, the chain rule is
used to propagate the error backward
through the network.
For a composite function y = g(f(x)), where y = g(z) and z = f(x), the chain rule states:
dy/dx = (dy/dz) · (dz/dx)
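As a small worked example (the functions are chosen here purely for illustration): let z = f(x) = 3x + 1 and y = g(z) = z². Then dy/dz = 2z and dz/dx = 3, so dy/dx = 6(3x + 1). The snippet below checks this analytic derivative against a finite-difference estimate.

def f(x):
    return 3.0 * x + 1.0          # z = f(x)

def g(z):
    return z ** 2                 # y = g(z)

x = 2.0
analytic = 2.0 * f(x) * 3.0       # (dy/dz) * (dz/dx) = 2z * 3 = 6(3x + 1) = 42 at x = 2
h = 1e-6
numeric = (g(f(x + h)) - g(f(x - h))) / (2 * h)   # central difference approximation
print(analytic, numeric)          # both are approximately 42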

Backpropagation applies the chain rule layer by layer, starting from the output layer and
moving backward to the input layer. This
systematic approach enables the efficient
computation of gradients, making it possible
to train deep neural networks with many
layers and parameters.
5. The Backpropagation Algorithm
The backpropagation algorithm consists of
two main phases: the forward pass and the
backward pass. These phases work together
to compute gradients efficiently, enabling the
optimization of network parameters.
This efficiency arises because the gradients of
the weights and biases can be determined
using the chain rule, and these partial
derivatives can be effectively pre-computed
through forward and backward passes.

5.1 Forward Pass:


During the forward pass, activations aⱼ⁽ˡ⁾ are
computed layer by layer, starting from the
input layer and moving to the output layer.
These activations represent the outputs of the
neurons in each layer and are used to
compute the network’s final prediction.
Mathematically, the activation aᵢ⁽ˡ⁾ of neuron i in layer l is computed from its net input zᵢ⁽ˡ⁾ as:
zᵢ⁽ˡ⁾ = Σⱼ wᵢ,ⱼ⁽ˡ⁾ aⱼ⁽ˡ⁻¹⁾ + bᵢ⁽ˡ⁾,   aᵢ⁽ˡ⁾ = g⁽ˡ⁾(zᵢ⁽ˡ⁾)
An important observation is that the activations aⱼ⁽ˡ⁻¹⁾ from the previous layer l−1 are also equal to the partial derivatives of the net input zᵢ⁽ˡ⁾ with respect to the weights wᵢ,ⱼ⁽ˡ⁾. Specifically:
∂zᵢ⁽ˡ⁾/∂wᵢ,ⱼ⁽ˡ⁾ = aⱼ⁽ˡ⁻¹⁾
This relationship is crucial because it allows
the activations computed during the forward
pass to be reused in the backward pass for
efficient gradient computation.
By storing these activations, the
backpropagation algorithm avoids redundant
calculations, significantly improving
computational efficiency.
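A minimal sketch of a forward pass that stores each layer's net input z⁽ˡ⁾ and activation a⁽ˡ⁾ for reuse in the backward pass; using the sigmoid activation for every layer is an assumption made to keep the example short.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Return the network output together with cached (z, a) pairs per layer."""
    a = x
    cache = [(None, x)]                  # layer 0 holds the input itself
    for W, b in zip(weights, biases):
        z = W @ a + b                    # z(l) = W(l) a(l-1) + b(l)
        a = sigmoid(z)                   # a(l) = g(l)(z(l))
        cache.append((z, a))
    return a, cache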
5.2 Backward Pass:
The backward pass begins at the output layer and moves backward through the network. It computes delta terms δᵢ⁽ˡ⁾, which represent the partial derivatives of the cost function ℒ with respect to the net inputs zᵢ⁽ˡ⁾. These delta terms are propagated through the network in reverse order, allowing the algorithm to compute gradients for all layers.
For hidden layers l:
δᵢ⁽ˡ⁾ = g⁽ˡ⁾′(zᵢ⁽ˡ⁾) Σₖ wₖ,ᵢ⁽ˡ⁺¹⁾ δₖ⁽ˡ⁺¹⁾
where g⁽ˡ⁾′() is the derivative of the activation function for layer l.
For the output layer L:
δᵢ⁽ᴸ⁾ = ∂ℒ/∂aᵢ⁽ᴸ⁾ · g⁽ᴸ⁾′(zᵢ⁽ᴸ⁾)
Therefore, the backpropagation process
begins by calculating the error terms δᵢ⁽ᴸ⁾ for
the output layer L. It then systematically
computes the error terms δₖ⁽ˡ⁾ for each
preceding layer l, using the error terms δₖ⁽ˡ⁺¹⁾
from the layer immediately above it.
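The two delta formulas above can be written directly in code. The sigmoid activation and the mean-squared-error loss ℒ = ½‖a⁽ᴸ⁾ − y‖² used for the output-layer delta are assumptions made for this sketch.

import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)              # derivative of the sigmoid activation

def delta_output(a_L, y, z_L):
    # delta(L) = dL/da(L) * g'(z(L)) = (a(L) - y) * g'(z(L)) for the MSE loss
    return (a_L - y) * sigmoid_prime(z_L)

def delta_hidden(W_next, delta_next, z_l):
    # delta(l) = (W(l+1)^T delta(l+1)) * g'(z(l))
    return (W_next.T @ delta_next) * sigmoid_prime(z_l)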

5.3 Gradient Computation:


Using the activations from the forward pass and the delta terms from the backward pass, the gradients of the cost function with respect to the weights and biases are computed as:
∂ℒ/∂wᵢ,ⱼ⁽ˡ⁾ = δᵢ⁽ˡ⁾ · aⱼ⁽ˡ⁻¹⁾   and   ∂ℒ/∂bᵢ⁽ˡ⁾ = δᵢ⁽ˡ⁾
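In code, these two formulas reduce to an outer product and a copy of the delta vector, using the same index convention as the sketches above.

import numpy as np

def layer_gradients(delta_l, a_prev):
    # dL/dW(l)[i, j] = delta(l)[i] * a(l-1)[j]   and   dL/db(l)[i] = delta(l)[i]
    dW = np.outer(delta_l, a_prev)
    db = delta_l.copy()
    return dW, db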
6. Training Neural Networks with
Backpropagation
The training process using gradient descent with backpropagation involves the following steps:
1. Initialize the weights and biases (typically with small random values).
2. Forward pass: compute the activations of every layer and the network's prediction.
3. Compute the loss ℒ(θ) between the prediction and the target.
4. Backward pass: propagate the delta terms from the output layer back toward the input layer.
5. Compute the gradients of ℒ with respect to every weight and bias.
6. Update the parameters with a gradient-descent step, and repeat from step 2 until convergence.
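A minimal end-to-end sketch of these steps for a single-hidden-layer network trained on one example. The sigmoid activations, the mean-squared-error loss, the layer sizes, and the learning rate are all assumptions made for the illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x, y = rng.standard_normal(4), np.array([1.0])        # one training example
W1, b1 = 0.5 * rng.standard_normal((3, 4)), np.zeros(3)
W2, b2 = 0.5 * rng.standard_normal((1, 3)), np.zeros(1)
lr = 0.5

for step in range(200):
    # steps 2-3: forward pass and loss
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2
    a2 = sigmoid(z2)
    loss = 0.5 * np.sum((a2 - y) ** 2)

    # step 4: backward pass (delta terms)
    d2 = (a2 - y) * a2 * (1.0 - a2)                   # delta(2)
    d1 = (W2.T @ d2) * a1 * (1.0 - a1)                # delta(1)

    # steps 5-6: gradients and gradient-descent update
    W2 -= lr * np.outer(d2, a1)
    b2 -= lr * d2
    W1 -= lr * np.outer(d1, x)
    b1 -= lr * d1

print(loss)    # the loss shrinks as the prediction a2 approaches the target y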
The efficiency of backpropagation stems from
its ability to reuse computed values and avoid
redundant calculations through the clever
application of the chain rule. This makes it
significantly faster than naive gradient
computation methods.
7. Impact of Backpropagation
Backpropagation has revolutionized the field
of artificial neural networks. It has enabled
the training of complex models that can learn
and generalize from large datasets, leading to
breakthroughs in computer vision, natural
language processing, speech recognition, and
more. The widespread adoption of
backpropagation has also spurred the
development of advanced techniques such as
convolutional neural networks (CNNs),
recurrent neural networks (RNNs), and
Transformers.
Today, gradients are automatically computed
using PyTorch’s automatic differentiation
(autograd) feature. Autograd simplifies the
process of implementing backpropagation by
automatically calculating the gradients of
neural network parameters with respect to
the loss function. This allows researchers and
developers to focus more on designing and
experimenting with neural network
architectures rather than manually computing
gradients, thereby accelerating innovation
and improving the efficiency of model
training.
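For illustration, a minimal PyTorch sketch of this workflow; the tiny model, random data, loss, and learning rate are assumptions, but loss.backward() is where autograd performs backpropagation automatically.

import torch

model = torch.nn.Sequential(          # illustrative two-layer network
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 1),
)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 4)                # a small random batch of inputs
y = torch.randn(16, 1)                # matching random targets

for step in range(100):
    optimizer.zero_grad()             # clear gradients from the previous step
    loss = loss_fn(model(x), y)       # forward pass and loss
    loss.backward()                   # autograd runs backpropagation
    optimizer.step()                  # gradient-descent update of all parameters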

8. Conclusion
Backpropagation is the backbone of modern
deep learning, providing a computationally
efficient method for training complex neural
networks. By leveraging the chain rule and the
layered structure of networks, it enables the
application of gradient-based optimization
techniques to find optimal parameters.
Understanding backpropagation is essential
for anyone working in deep learning, as it
underpins many of the field’s most powerful
techniques. Its historical significance,
mathematical elegance, and practical utility
make it an indispensable tool in the deep
learning toolkit.
