Lecture 4
The document outlines the training process of Feedforward Neural Networks (FFNN), focusing on forward pass, loss calculation, and backpropagation. It discusses various optimization techniques, including gradient descent and its variants (batch, mini-batch, and stochastic). Additionally, it highlights challenges in mini-batch gradient descent and introduces adaptive learning rates and alternative optimizers like ADAM.


FFNN training

Prepared by: Dr. Doaa Gamal

Assistant Professor, Faculty of Engineering, Suez Canal University
([email protected])
Lecture outline
 Last time we talked about:
   Multi-layer Perceptron (MLP, or FFNN)
   FFNN training (gradient descent)
 Today we are going to talk about:
   FFNN training
   Backpropagation
   Different loss optimizers
TRAINING FFNN
Regression Example
• The following data is a large regression dataset for predicting the savings of workers given their income and the minimum wage in their countries.
• It is required to design an FFNN to fit this problem.
Regression Example (forward direction)
For the first sample:
  Predicted: ŷ(1) = f(x(1); W)
  Actual: y(1)
Regression Example (forward direction)
ŷ(i) = f(x(i); W) is the predicted output for the i-th input sample.
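A minimal NumPy sketch of this forward computation, assuming a hypothetical 2-3-1 architecture (inputs: income and minimum wage; ReLU hidden units; a linear output for the predicted savings) and made-up weights and sample values:

```python
import numpy as np

# Hypothetical architecture: 2 inputs (income, minimum wage), 3 hidden ReLU units,
# 1 linear output (predicted savings). The real network in the slides may differ.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # hidden-layer weights and biases
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # output-layer weights and biases

def forward(x):
    """Forward pass: y_hat = f(x; W) for one sample x = [income, min_wage]."""
    h = np.maximum(0.0, W1 @ x + b1)   # hidden layer with ReLU activation
    return W2 @ h + b2                 # linear output for regression

x1 = np.array([2500.0, 400.0])         # a made-up first sample x(1)
print("predicted y_hat(1):", forward(x1))
```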
Loss optimization

The gradient of a scalar function f of several variables is the vector ∇f whose value at a point p gives the direction and rate of fastest increase.
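As an illustration (with a made-up function, not one from the lecture), the gradient can be estimated numerically by central differences and compared with the analytic result:

```python
import numpy as np

def f(p):
    """A made-up scalar function of two variables: f(x, y) = x**2 + y**2."""
    x, y = p
    return x**2 + y**2

def numerical_gradient(f, p, eps=1e-6):
    """Central-difference estimate of the gradient of f at point p."""
    grad = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = eps
        grad[i] = (f(p + step) - f(p - step)) / (2 * eps)
    return grad

p = np.array([1.0, 2.0])
print(numerical_gradient(f, p))   # ~[2., 4.], matching the analytic gradient (2x, 2y)
# A small move along +grad increases f fastest; a move along -grad decreases it fastest.
```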
Gradient Descent Optimizer

 However, how do we calculate the gradients ∂J(W)/∂W?
BACKPROPAGATION
Computing gradients (Backpropagation)
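As a sketch of how backpropagation applies the chain rule here (assuming, purely for illustration, a squared-error loss J = ½(ŷ − y)² and an output unit ŷ = f(z) with z = Σⱼ wⱼ hⱼ + b, where the hⱼ are the hidden-layer activations):

∂J/∂wⱼ = (∂J/∂ŷ) · (∂ŷ/∂z) · (∂z/∂wⱼ) = (ŷ − y) · f′(z) · hⱼ

For a hidden-layer weight, the same rule is applied one step further back, multiplying in addition by the path from the hidden unit to its own inputs (its activation derivative and the corresponding input value).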
Training FFNN

 Forward Pass: every data point (or a batch of data points) in the training dataset is passed through the model, resulting in a prediction.
 Calculation of Loss: the predictions from the forward pass are compared to the actual targets using a loss function.
 Backward Pass (Backpropagation): the calculated error is then propagated backward through the model. During this phase, gradients (derivatives) of the loss with respect to the model parameters are computed.
 Optimization: an optimizer then adjusts the model parameters, based on the calculated gradients, in a direction that minimizes the loss.
 Repeat: the previous steps are repeated for the entire dataset. Once the dataset is entirely processed, one epoch is completed. This process is then repeated for the desired number of epochs (see the sketch below).
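A minimal, runnable sketch of this loop, using a toy linear model and mean-squared-error loss so that the gradient fits in one line; the FFNN case replaces step 3 with backpropagation, as in the XOR example below:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 100 samples, 2 features, a noisy linear target (made up for illustration).
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -2.0]) + 0.1 * rng.normal(size=100)

W = np.zeros(2)             # model parameters
lr, n_epochs = 0.1, 50      # learning rate and number of epochs

for epoch in range(n_epochs):
    y_pred = X @ W                          # 1) forward pass
    loss = np.mean((y_pred - y) ** 2)       # 2) loss calculation (MSE)
    grad = 2 * X.T @ (y_pred - y) / len(X)  # 3) backward pass: dJ/dW
    W = W - lr * grad                       # 4) optimizer step (gradient descent)
    # 5) repeat: each pass over the full dataset is one epoch

print(W)   # close to [3, -2]
```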
TRAINING EXAMPLE
Backpropagation example (XOR)
𝒙𝟏   𝒙𝟐   target
0    0    0
0    1    1
1    0    1
1    1    0
Backpropagation example (random initialization)
Backpropagation example (Feedforward)

Backpropagation example (loss calculation)

n: number of samples; in this example n = 1.
Backpropagation example (backward)

Gradient descent:
W_new = W_old − η ΔW, where ΔW = ∂J(W)/∂W
Backpropagation example (weight update)
Gradient descent:
W_new = W_old − η ΔW, where ΔW = ∂J(W)/∂W
Backpropagation example
Iteration 2:
• Repeat the above steps using the second sample (0, 1), with desired output = 1 (see the sketch below).
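A runnable sketch of the whole example, assuming a 2-2-1 network with sigmoid activations, a squared-error loss, random initial weights, and a learning rate of 0.5; the specific initial values and learning rate are assumptions, not the values used on the slides:

```python
import numpy as np

# XOR dataset from the table above.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 0], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1, b1 = rng.uniform(-1, 1, size=(2, 2)), np.zeros(2)   # hidden layer (2 units)
W2, b2 = rng.uniform(-1, 1, size=2), 0.0                # output layer (1 unit)
eta = 0.5                                               # learning rate (assumed)

for epoch in range(10000):
    for x, t in zip(X, T):                   # one sample per iteration, as in the slides
        # forward pass
        h = sigmoid(W1 @ x + b1)
        y = sigmoid(W2 @ h + b2)
        # loss J = 1/2 (y - t)^2; backward pass via the chain rule
        delta_out = (y - t) * y * (1 - y)            # dJ/dz at the output unit
        delta_hid = delta_out * W2 * h * (1 - h)     # dJ/dz at the hidden units
        # gradient-descent updates: W_new = W_old - eta * dJ/dW
        W2 -= eta * delta_out * h;  b2 -= eta * delta_out
        W1 -= eta * np.outer(delta_hid, x);  b1 -= eta * delta_hid

preds = [sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2) for x in X]
print(np.round(preds, 2))   # usually ~[0, 1, 1, 0]; XOR can occasionally stall in a local minimum
```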
Backpropagation example (final result)
OPTIMIZATION: GRADIENT DESCENT ALTERNATIVES
Gradient descent
Gradient Descent (GD) is the classic optimization algorithm. It aims to minimize the loss function by updating each parameter in the direction of the negative gradient.
Gradient descent variants
 There are three variants of gradient descent, which differ in how much data we use to compute the gradient of the objective function:
  i. Batch gradient descent
  ii. Mini-batch gradient descent
  iii. Stochastic gradient descent
Gradient descent variants
 A training dataset can be divided into one or more batches; the batch size alone distinguishes the three variants (see the sketch below):
  Stochastic Gradient Descent: batch size = 1
  Batch Gradient Descent: batch size = size of the training set
  Mini-Batch Gradient Descent: 1 < batch size < size of the training set
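A small sketch of this distinction; the helper iterate_batches and the random data are illustrative, not part of the lecture:

```python
import numpy as np

def iterate_batches(X, y, batch_size, shuffle=True, seed=0):
    """Yield (X_batch, y_batch) pairs; the batch size alone selects the GD variant."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]

X = np.random.default_rng(0).normal(size=(1000, 5))
y = np.random.default_rng(1).normal(size=1000)

sgd_batches  = iterate_batches(X, y, batch_size=1)        # stochastic GD: one sample per update
full_batch   = iterate_batches(X, y, batch_size=len(X))   # batch GD: the whole set per update
mini_batches = iterate_batches(X, y, batch_size=64)       # mini-batch GD: something in between

xb, yb = next(mini_batches)
print(xb.shape, yb.shape)   # (64, 5) (64,)
```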
Stochastic gradient descent
 Stochastic gradient descent (SGD), in contrast, performs a parameter update for each training example x(i) and label y(i), i.e. n = 1.
 It is therefore usually much faster and can also be used to learn online.
 SGD performs frequent updates with a high variance that causes the objective function to fluctuate heavily.
Batch gradient descent
 Batch gradient descent computes the gradient of the cost function w.r.t. the parameters θ (weights and biases) over the whole training dataset to perform just one update, i.e. n = dataset size.
 Batch gradient descent is therefore very slow and intractable for datasets that don't fit in memory. It also doesn't allow us to update the model online, i.e. with new examples on-the-fly.
Mini-batch gradient descent
 Mini-batch gradient descent finally takes the best of both worlds and performs an update for every mini-batch of n training examples, e.g. n = 100.
Mini-batch gradient descent
Mini-batch gradient descent:
a) reduces the variance of the parameter updates, which can lead to more stable convergence;
b) can make use of highly optimized matrix operations;
c) commonly uses mini-batch sizes between 50 and 256, though these can vary for different applications.
Mini-batch gradient descent is typically the algorithm of choice when training a neural network, and the term SGD is usually also employed when mini-batches are used.
Challenges of mini-batch gradient descent

 Choosing a proper learning rate can be difficult.
 Learning rate schedules try to adjust the learning rate during training by reducing it according to a pre-defined schedule or when the change in objective between epochs falls below a threshold (a sketch of both is given below).
 Additionally, the same learning rate applies to all parameter updates. If our data is sparse and our features have very different frequencies, we might not want to update all of them to the same extent, but perform a larger update for rarely occurring features.
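Two illustrative sketches of these two ideas, a pre-defined step decay and a reduce-on-plateau rule; the function names and default values are assumptions rather than any particular library's API:

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Pre-defined schedule: halve the learning rate every `epochs_per_drop` epochs."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))

def reduce_on_plateau(lr, prev_loss, curr_loss, threshold=1e-4, factor=0.5):
    """Reduce the learning rate when the improvement in the objective falls below a threshold."""
    if prev_loss - curr_loss < threshold:
        return lr * factor
    return lr

for epoch in [0, 5, 10, 25]:
    print(epoch, step_decay(0.1, epoch))   # 0.1, 0.1, 0.05, 0.025
```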
Adaptive learning rate
Gradient descent: alternative optimizers
Gradient descent optimization algorithms
 ADAM optimizer
 SGD
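A minimal sketch of one ADAM update, following the standard published update rule; the quadratic test objective is only a quick sanity check:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update: adapt the step size per parameter from running moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # bias corrections for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Quick sanity check on a toy objective J(theta) = theta1**2 + theta2**2.
theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * theta                            # dJ/dtheta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
print(theta)                                    # both components end up close to 0
```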
Thank You
