Ann2018 L5

The document describes the gradient descent algorithm and backpropagation for training neural networks. It provides equations for calculating the error gradient and updating the weights based on this gradient. The gradient is calculated using a chain-rule decomposition involving derivatives of the error with respect to outputs, outputs with respect to inputs, and inputs with respect to weights. Examples demonstrate calculating the gradient and updating weights for single- and multi-layer neural networks.

E = \frac{1}{2}(d_i - o_i)^2 = \frac{1}{2}\bigl(d_i - f(\mathbf{w}_i^t \mathbf{x})\bigr)^2

\nabla E = -(d_i - o_i)\, f'(\mathbf{w}_i^t \mathbf{x})\, \mathbf{x}

The components of the gradient vector are

\frac{\partial E}{\partial w_{ij}} = -(d_i - o_i)\, f'(\mathbf{w}_i^t \mathbf{x})\, x_j \qquad \text{for } j = 1, 2, \ldots, n

\Delta \mathbf{w}_i = -\eta\, \nabla E

\Delta \mathbf{w}_i = \eta\, [d_i - o_i]\, f'(net_i)\, \mathbf{x}

\Delta \mathbf{w} = \eta\, r\, \mathbf{x}, \qquad r = [d_i - f(\mathbf{w}_i^t \mathbf{x})]\, f'(\mathbf{w}_i^t \mathbf{x})

\mathbf{w}^2 = \mathbf{w}^1 + \eta\, [d_i - o_i]\, f'(net_i)\, \mathbf{x} \qquad (d = t, \text{ the target})
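A minimal sketch of this update rule in code (Python/NumPy is assumed; the helper names `f`, `f_prime`, and `delta_rule_step` are illustrative, not from the slides). The activation is the bipolar continuous function used in Example 1 below.

```python
# A minimal sketch of the delta-rule update above for a single continuous neuron.
import numpy as np

def f(net):
    """Bipolar continuous activation: f(net) = 2 / (1 + e^(-net)) - 1."""
    return 2.0 / (1.0 + np.exp(-net)) - 1.0

def f_prime(o):
    """Its derivative written in terms of the output: f'(net) = (1/2)(1 - o^2)."""
    return 0.5 * (1.0 - o ** 2)

def delta_rule_step(w, x, d, eta):
    """One update  w <- w + eta * r * x  with  r = (d - o) f'(net)."""
    net = w @ x                # net = w^t x
    o = f(net)                 # o = f(net)
    r = (d - o) * f_prime(o)   # learning signal r
    return w + eta * r * x
```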


Example 1

\mathbf{x}_1 = [1\ \ {-2}\ \ 0\ \ {-1}]^t, \quad \mathbf{x}_2 = [0\ \ 1.5\ \ {-0.5}\ \ {-1}]^t, \quad \mathbf{x}_3 = [-1\ \ 1\ \ 0.5\ \ {-1}]^t

d_1 = -1, \quad d_2 = -1, \quad d_3 = 1

\mathbf{w}^1 = [1\ \ {-1}\ \ 0\ \ 0.5], \qquad \eta = 0.1

(Figure: a single neuron with four inputs and weights w_1, \ldots, w_4.)

The transfer function is bipolar continuous:

f(net) = \frac{2}{1 + e^{-net}} - 1, \qquad f'(net) = \frac{2e^{-net}}{[1 + e^{-net}]^2} = \frac{1}{2}(1 - o^2)

Step 1:

net^1 = \mathbf{w}^1 \mathbf{x}_1 = [1\ \ {-1}\ \ 0\ \ 0.5][1\ \ {-2}\ \ 0\ \ {-1}]^t = 2.5, \qquad o^1 = f(net^1) = 0.848, \qquad f'(net^1) = 0.140

\mathbf{w}^2 = \eta\, [d_1 - o^1]\, f'(net^1)\, \mathbf{x}_1 + \mathbf{w}^1 = [0.974\ \ {-0.948}\ \ 0\ \ 0.526]^t

Step 2:

net^2 = \mathbf{w}^2 \mathbf{x}_2 = -1.948

Complete this problem for one epoch.
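The remaining steps of the epoch can be carried out mechanically; a possible run in code (a sketch, assuming \eta = 0.1, which reproduces the \mathbf{w}^2 shown above):

```python
# One epoch of Example 1 with the bipolar continuous neuron.
import numpy as np

def f(net):                       # f(net) = 2 / (1 + e^(-net)) - 1
    return 2.0 / (1.0 + np.exp(-net)) - 1.0

X = [np.array([1.0, -2.0, 0.0, -1.0]),
     np.array([0.0, 1.5, -0.5, -1.0]),
     np.array([-1.0, 1.0, 0.5, -1.0])]
D = [-1.0, -1.0, 1.0]
eta = 0.1                         # assumed learning rate (matches the numbers shown)

w = np.array([1.0, -1.0, 0.0, 0.5])          # w^1
for x, d in zip(X, D):                       # one epoch = one pass over x_1, x_2, x_3
    o = f(w @ x)
    r = (d - o) * 0.5 * (1.0 - o**2)         # r = (d - o) f'(net), f'(net) = (1/2)(1 - o^2)
    w = w + eta * r * x
    print(w)                                 # first step: [0.974, -0.948, 0, 0.526]
```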
Example 2

A single neuron receives inputs x_1 = 0.982 and x_2 = 0.5 plus a bias input of 1, with initial weights 2, 4, and -3.93. The target is d = t = 1 and the learning rate is \eta = 0.1.

The transfer function is unipolar continuous (logsig):

o = \frac{1}{1 + e^{-net}}, \qquad f'(net) = (1 - o)\,o

\Delta \mathbf{w} = \eta\,(d - o)(1 - o)\,o\,\mathbf{x} = \eta\,\delta\,\mathbf{x}

Forward pass:

net = 2(0.982) + 4(0.5) - 3.93(1) = 0.034

o = \frac{1}{1 + e^{-0.034}} \approx 0.51, \qquad \text{Error} = 1 - 0.51 = 0.49

Weight update:

\delta = (1 - 0.51)(1 - 0.51)(0.51) = 0.1225

\Delta w_1 = \eta\,\delta\,x_1 = 0.1 \times 0.1225 \times 0.982 = 0.012, \qquad w_1^{new} = 2 + 0.012 = 2.012

w_2^{new} = 4 + 0.1 \times 0.1225 \times 0.5 = 4.0061

w_3^{new} = -3.93 + 0.1 \times 0.1225 \times 1 = -3.9178

After the update:

net = 2.012(0.982) + 4.0061(0.5) - 3.9178(1) = 0.061

o = \frac{1}{1 + e^{-0.061}} = 0.5152, \qquad \text{Error} = 1 - 0.5152 = 0.4848
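These numbers can be checked with a few lines of code (a sketch; `logsig` and the variable names are illustrative):

```python
# Verifying Example 2: one delta-rule step for a logsig neuron with eta = 0.1.
import numpy as np

def logsig(net):
    return 1.0 / (1.0 + np.exp(-net))

x = np.array([0.982, 0.5, 1.0])     # two inputs plus the bias input of 1
w = np.array([2.0, 4.0, -3.93])
d, eta = 1.0, 0.1

o = logsig(w @ x)                   # net = 0.034, o ~ 0.51
delta = (d - o) * (1.0 - o) * o     # ~ 0.1225
w = w + eta * delta * x             # -> [2.012, 4.0061, -3.9178]
o_new = logsig(w @ x)               # net ~ 0.061, o ~ 0.5152
print(w, 1.0 - o_new)               # error drops from 0.49 to ~ 0.4848
```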
By the chain rule:

\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_i}\,\frac{\partial o_i}{\partial x_i}\,\frac{\partial x_i}{\partial w_{ij}}

With E = \frac{1}{2}\sum_i (d_i - o_i)^2 and d_i = t_i:

\frac{\partial E}{\partial o_i} = \frac{1}{2}\cdot 2\,(d_i - o_i)(-1) = (o_i - t_i)

\frac{\partial o_i}{\partial x_i} = \frac{\partial}{\partial x_i}\left[\frac{1}{1 + e^{-x_i}}\right] = -\frac{1}{(1 + e^{-x_i})^2}\,(-e^{-x_i}) = \frac{e^{-x_i}}{(1 + e^{-x_i})^2}

= \frac{(1 + e^{-x_i}) - 1}{1 + e^{-x_i}} \cdot \frac{1}{1 + e^{-x_i}} = \left[1 - \frac{1}{1 + e^{-x_i}}\right]\left[\frac{1}{1 + e^{-x_i}}\right] = (1 - o_i)\,o_i

With x_i = \sum_j w_{ij} a_j:

\frac{\partial x_i}{\partial w_{ij}} = a_j

Putting the three factors together:

\frac{\partial E}{\partial w_{ij}} = \underbrace{(o_i - t_i)}_{\text{raw error term}}\;\underbrace{(1 - o_i)\,o_i}_{\text{due to sigmoid}}\;\underbrace{a_j}_{\text{due to incoming (pre-synaptic) activation}}

\Delta w_{ij} = -\eta\,\frac{\partial E}{\partial w_{ij}} \qquad (\text{where } \eta \text{ is an arbitrary learning rate})

w_{ij}^{t+1} = w_{ij}^{t} + \eta\,(t_i - o_i)(1 - o_i)\,o_i\,a_j
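The factored gradient maps directly to code; a minimal sketch for one layer of sigmoid units (the array names `W`, `a`, `t` and the helper `layer_gradient` are assumptions):

```python
# Gradient of E = 1/2 * sum_i (t_i - o_i)^2 with respect to a weight matrix W,
# where o = sigmoid(W @ a). Implements dE/dw_ij = (o_i - t_i)(1 - o_i) o_i a_j.
import numpy as np

def layer_gradient(W, a, t):
    x = W @ a                        # x_i = sum_j w_ij a_j
    o = 1.0 / (1.0 + np.exp(-x))     # o_i = 1 / (1 + e^(-x_i))
    raw_error = o - t                # (o_i - t_i)
    sigmoid_term = (1.0 - o) * o     # (1 - o_i) o_i
    return np.outer(raw_error * sigmoid_term, a)

# Gradient-descent step on the whole matrix:  W <- W - eta * dE/dW
# W -= eta * layer_gradient(W, a, t)
```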


A two-layer network

\mathbf{x} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad d = 1, \qquad \eta = 0.1

The transfer function is unipolar continuous:

o = \frac{1}{1 + e^{-net}}, \qquad f'(net) = (1 - o)\,o

Forward pass:

net_3 = u_3 = 3(1) + 4(0) + 1(1) = 4, \qquad o_3 = \frac{1}{1 + e^{-4}} = 0.982

net_4 = u_4 = 6(1) + 5(0) + (-6)(1) = 0, \qquad o_4 = \frac{1}{1 + e^{0}} = 0.5

net_5 = u_5 = 2(0.982) + 4(0.5) - 3.93(1) = 0.034, \qquad o_5 = \frac{1}{1 + e^{-0.034}} = 0.51

Output-layer update, \Delta \mathbf{w} = \eta\,(d - o)(1 - o)\,o\,\mathbf{x} = \eta\,\delta\,\mathbf{x}:

\delta_5 = (1 - 0.51)(1 - 0.51)(0.51) = 0.1225

\Delta w_{53} = \eta\,\delta_5\,o_3 = 0.1 \times 0.1225 \times 0.982 = 0.012

w_{53}^{new} = w_{53} + 0.012 = 2 + 0.012 = 2.012
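A sketch of the forward pass and the output-node update in code; the grouping of the listed weights by node (3, 4, 5) follows the net computations above, since the original network figure is not reproduced here:

```python
# Two-layer example: forward pass and update of w_53.
import numpy as np

def logsig(net):
    return 1.0 / (1.0 + np.exp(-net))

x = np.array([1.0, 0.0, 1.0])        # [x1, x2, bias]
w3 = np.array([3.0, 4.0, 1.0])       # weights into hidden node 3 (assumed grouping)
w4 = np.array([6.0, 5.0, -6.0])      # weights into hidden node 4
w5 = np.array([2.0, 4.0, -3.93])     # weights into output node 5 (from o3, o4, bias)
d, eta = 1.0, 0.1

o3 = logsig(w3 @ x)                  # 0.982
o4 = logsig(w4 @ x)                  # 0.5
h = np.array([o3, o4, 1.0])
o5 = logsig(w5 @ h)                  # ~ 0.51

delta5 = (d - o5) * (1.0 - o5) * o5  # ~ 0.1225
w5 = w5 + eta * delta5 * h           # w_53: 2 + 0.1 * 0.1225 * 0.982 ~ 2.012
print(w5)
```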
Performance Optimization

Taylor Series Expansion

F(x) = F(x^*) + \left.\frac{dF(x)}{dx}\right|_{x=x^*}(x - x^*) + \frac{1}{2}\left.\frac{d^2F(x)}{dx^2}\right|_{x=x^*}(x - x^*)^2 + \cdots + \frac{1}{n!}\left.\frac{d^nF(x)}{dx^n}\right|_{x=x^*}(x - x^*)^n + \cdots

(MATLAB demo: nnd8ts)
Example

F(x) = e^{-x}

Taylor series of F(x) about x^* = 0:

F(x) = e^{-x} = e^{-0} - e^{-0}(x - 0) + \frac{1}{2}e^{-0}(x - 0)^2 - \frac{1}{6}e^{-0}(x - 0)^3 + \cdots

F(x) = 1 - x + \frac{1}{2}x^2 - \frac{1}{6}x^3 + \cdots

Taylor series approximations:

F(x) \approx F_0(x) = 1

F(x) \approx F_1(x) = 1 - x

F(x) \approx F_2(x) = 1 - x + \frac{1}{2}x^2
Plot of Approximations

(Figure: F(x) = e^{-x} plotted on [-2, 2] together with the approximations F_0(x), F_1(x), and F_2(x).)
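A short numerical check of the three approximations (a sketch; the maximum-error comparison is an addition, not taken from the slides):

```python
# Comparing F(x) = e^(-x) with its Taylor approximations about x* = 0.
import numpy as np

F  = lambda x: np.exp(-x)
F0 = lambda x: np.ones_like(x)            # zeroth order
F1 = lambda x: 1.0 - x                    # first order
F2 = lambda x: 1.0 - x + 0.5 * x ** 2     # second order

xs = np.linspace(-2.0, 2.0, 401)
for name, approx in (("F0", F0), ("F1", F1), ("F2", F2)):
    print(name, "max error on [-2, 2]:", np.max(np.abs(F(xs) - approx(xs))))
```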
Vector Case

F(\mathbf{x}) = F(x_1, x_2, \ldots, x_n)

F(\mathbf{x}) = F(\mathbf{x}^*) + \left.\frac{\partial F(\mathbf{x})}{\partial x_1}\right|_{\mathbf{x}=\mathbf{x}^*}(x_1 - x_1^*) + \left.\frac{\partial F(\mathbf{x})}{\partial x_2}\right|_{\mathbf{x}=\mathbf{x}^*}(x_2 - x_2^*) + \cdots + \left.\frac{\partial F(\mathbf{x})}{\partial x_n}\right|_{\mathbf{x}=\mathbf{x}^*}(x_n - x_n^*)

\qquad + \frac{1}{2}\left.\frac{\partial^2 F(\mathbf{x})}{\partial x_1^2}\right|_{\mathbf{x}=\mathbf{x}^*}(x_1 - x_1^*)^2 + \frac{1}{2}\left.\frac{\partial^2 F(\mathbf{x})}{\partial x_1 \partial x_2}\right|_{\mathbf{x}=\mathbf{x}^*}(x_1 - x_1^*)(x_2 - x_2^*) + \cdots
Matrix Form

F(\mathbf{x}) = F(\mathbf{x}^*) + \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*}(\mathbf{x} - \mathbf{x}^*) + \frac{1}{2}(\mathbf{x} - \mathbf{x}^*)^T\,\nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*}(\mathbf{x} - \mathbf{x}^*) + \cdots

Gradient:

\nabla F(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial F(\mathbf{x})}{\partial x_1} \\ \dfrac{\partial F(\mathbf{x})}{\partial x_2} \\ \vdots \\ \dfrac{\partial F(\mathbf{x})}{\partial x_n} \end{bmatrix}

Hessian:

\nabla^2 F(\mathbf{x}) = \begin{bmatrix}
\dfrac{\partial^2 F(\mathbf{x})}{\partial x_1^2} & \dfrac{\partial^2 F(\mathbf{x})}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 F(\mathbf{x})}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 F(\mathbf{x})}{\partial x_2 \partial x_1} & \dfrac{\partial^2 F(\mathbf{x})}{\partial x_2^2} & \cdots & \dfrac{\partial^2 F(\mathbf{x})}{\partial x_2 \partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial^2 F(\mathbf{x})}{\partial x_n \partial x_1} & \dfrac{\partial^2 F(\mathbf{x})}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 F(\mathbf{x})}{\partial x_n^2}
\end{bmatrix}
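As a sketch of these definitions, the gradient and Hessian can be approximated by central differences (the helper names are illustrative); the test function below is the one used in the directional-derivative example that follows:

```python
# Central-difference approximations of the gradient and Hessian defined above.
import numpy as np

def numerical_gradient(F, x, h=1e-5):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (F(x + e) - F(x - e)) / (2.0 * h)   # dF/dx_i
    return g

def numerical_hessian(F, x, h=1e-4):
    H = np.zeros((x.size, x.size))
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        # row i: derivative of the gradient along x_i
        H[i] = (numerical_gradient(F, x + e) - numerical_gradient(F, x - e)) / (2.0 * h)
    return H

F = lambda x: x[0]**2 + 2.0*x[0]*x[1] + 2.0*x[1]**2
x0 = np.array([0.5, 0.0])
print(numerical_gradient(F, x0))   # ~ [1, 1]
print(numerical_hessian(F, x0))    # ~ [[2, 2], [2, 4]]
```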
Directional Derivatives

First derivative (slope) of F(\mathbf{x}) along the x_i axis: \partial F(\mathbf{x}) / \partial x_i (the i-th element of the gradient).

Second derivative (curvature) of F(\mathbf{x}) along the x_i axis: \partial^2 F(\mathbf{x}) / \partial x_i^2 (the (i, i) element of the Hessian).

First derivative (slope) of F(\mathbf{x}) along a vector \mathbf{p}: \dfrac{\mathbf{p}^T\,\nabla F(\mathbf{x})}{\|\mathbf{p}\|}

Second derivative (curvature) of F(\mathbf{x}) along a vector \mathbf{p}: \dfrac{\mathbf{p}^T\,\nabla^2 F(\mathbf{x})\,\mathbf{p}}{\|\mathbf{p}\|^2}
Example

F(\mathbf{x}) = x_1^2 + 2 x_1 x_2 + 2 x_2^2

\mathbf{x}^* = \begin{bmatrix} 0.5 \\ 0 \end{bmatrix}, \qquad \mathbf{p} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}

\nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*} = \begin{bmatrix} \partial F(\mathbf{x})/\partial x_1 \\ \partial F(\mathbf{x})/\partial x_2 \end{bmatrix}_{\mathbf{x}=\mathbf{x}^*} = \begin{bmatrix} 2x_1 + 2x_2 \\ 2x_1 + 4x_2 \end{bmatrix}_{\mathbf{x}=\mathbf{x}^*} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}

\dfrac{\mathbf{p}^T\,\nabla F(\mathbf{x})}{\|\mathbf{p}\|} = \dfrac{\begin{bmatrix} 1 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}}{\left\|\begin{bmatrix} 1 \\ -1 \end{bmatrix}\right\|} = \dfrac{0}{\sqrt{2}} = 0
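The same slope, plus the corresponding curvature along p (the curvature value is an added check, not computed on the slide), in code:

```python
# First and second directional derivatives of F(x) = x1^2 + 2 x1 x2 + 2 x2^2 along p.
import numpy as np

grad = lambda x: np.array([2*x[0] + 2*x[1], 2*x[0] + 4*x[1]])   # gradient
H = np.array([[2.0, 2.0], [2.0, 4.0]])                          # Hessian

x_star = np.array([0.5, 0.0])
p = np.array([1.0, -1.0])

slope = p @ grad(x_star) / np.linalg.norm(p)      # p^T grad F / ||p||   -> 0.0
curvature = p @ H @ p / (p @ p)                   # p^T H p / ||p||^2    -> 1.0
print(slope, curvature)
```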
Plots

(Figure: surface and contour plots of F(x) = x_1^2 + 2 x_1 x_2 + 2 x_2^2 with directional derivatives along p. MATLAB demo: nnd8dd)
Minima

Strong Minimum

The point x^* is a strong minimum of F(x) if a scalar \delta > 0 exists such that F(x^*) < F(x^* + \Delta x) for all \Delta x such that \delta > \|\Delta x\| > 0.

Global Minimum

The point x^* is a unique global minimum of F(x) if F(x^*) < F(x^* + \Delta x) for all \Delta x \neq 0.

Weak Minimum

The point x^* is a weak minimum of F(x) if it is not a strong minimum, and a scalar \delta > 0 exists such that F(x^*) \le F(x^* + \Delta x) for all \Delta x such that \delta > \|\Delta x\| > 0.
Scalar Example

F(x) = 3x^4 - 7x^2 - \frac{1}{2}x + 6

(Figure: plot of F(x) on [-2, 2], marking a strong maximum, a strong minimum, and the global minimum.)
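A sketch that locates the stationary points shown in the plot by solving F'(x) = 0 and checking the sign of F''(x):

```python
# Stationary points of F(x) = 3x^4 - 7x^2 - x/2 + 6.
import numpy as np

F = lambda x: 3*x**4 - 7*x**2 - 0.5*x + 6
roots = np.roots([12.0, 0.0, -14.0, -0.5])        # roots of F'(x) = 12x^3 - 14x - 0.5
for r in sorted(roots.real):
    curvature = 36.0 * r**2 - 14.0                # F''(x) = 36x^2 - 14
    kind = "strong minimum" if curvature > 0 else "strong maximum"
    print(f"x = {r:+.3f}  F(x) = {F(r):.3f}  ({kind})")
# The minimum with the smaller F value is the global minimum.
```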
Quadratic Functions

F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T A \mathbf{x} + \mathbf{d}^T \mathbf{x} + c \qquad (\text{symmetric } A)

Gradient of the quadratic function:

\nabla F(\mathbf{x}) = A\mathbf{x} + \mathbf{d}

Hessian of the quadratic function:

\nabla^2 F(\mathbf{x}) = A
• If the eigenvalues of the Hessian matrix are all positive,
the function will have a single strong minimum.
• If the eigenvalues are all negative, the function will
have a single strong maximum.
• If some eigenvalues are positive and other eigenvalues
are negative, the function will have a single saddle
point.
• If the eigenvalues are all nonnegative, but some
eigenvalues are zero, then the function will either have
a weak minimum or will have no stationary point.
• If the eigenvalues are all nonpositive, but some
eigenvalues are zero, then the function will either have
a weak maximum or will have no stationary point.
Stationary point nature summary

x^T A x          \lambda_i        Definiteness of H        Nature of x^*
> 0              > 0              Positive definite        Minimum
>= 0             >= 0             Positive semidefinite    Valley
indefinite       mixed signs      Indefinite               Saddle point
<= 0             <= 0             Negative semidefinite    Ridge
< 0              < 0              Negative definite        Maximum
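A sketch that applies this table in code: classify a symmetric Hessian by its eigenvalues (the helper name `classify_hessian` is illustrative):

```python
# Classifying the stationary point of a quadratic from the eigenvalues of A.
import numpy as np

def classify_hessian(A, tol=1e-10):
    lam = np.linalg.eigvalsh(A)                   # eigenvalues of the symmetric matrix A
    if np.all(lam > tol):
        return "positive definite -> minimum"
    if np.all(lam < -tol):
        return "negative definite -> maximum"
    if np.any(lam > tol) and np.any(lam < -tol):
        return "indefinite -> saddle point"
    if np.all(lam > -tol):
        return "positive semidefinite -> valley (weak minimum or no stationary point)"
    return "negative semidefinite -> ridge (weak maximum or no stationary point)"

print(classify_hessian(np.array([[2.0, 2.0], [2.0, 4.0]])))   # minimum
```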
Steepest Descent

F(\mathbf{x}) = x_1^2 + 2 x_1 x_2 + 2 x_2^2 + x_1

\mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}, \qquad \alpha = 0.1

\nabla F(\mathbf{x}) = \begin{bmatrix} \partial F(\mathbf{x})/\partial x_1 \\ \partial F(\mathbf{x})/\partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad \mathbf{g}_0 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}

\mathbf{x}_1 = \mathbf{x}_0 - \alpha\,\mathbf{g}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.1\begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix}

\mathbf{x}_2 = \mathbf{x}_1 - \alpha\,\mathbf{g}_1 = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix} - 0.1\begin{bmatrix} 1.8 \\ 1.2 \end{bmatrix} = \begin{bmatrix} 0.02 \\ 0.08 \end{bmatrix}

(Figure: contour plot of F(x) on [-2, 2] x [-2, 2] showing the descent trajectory.)
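The iteration is easy to reproduce in code (a sketch):

```python
# Two steepest-descent steps on F(x) = x1^2 + 2 x1 x2 + 2 x2^2 + x1.
import numpy as np

grad = lambda x: np.array([2*x[0] + 2*x[1] + 1.0, 2*x[0] + 4*x[1]])

x = np.array([0.5, 0.5])
alpha = 0.1
for k in range(2):
    x = x - alpha * grad(x)       # x_{k+1} = x_k - alpha * g_k
    print(x)                      # [0.2, 0.2] then [0.02, 0.08]
```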

If A is a symmetric matrix with eigenvalues \lambda_i, then the eigenvalues of I - \alpha A are 1 - \alpha\lambda_i.
Stable Learning Rates (Quadratic)

F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T A \mathbf{x} + \mathbf{d}^T \mathbf{x} + c

\nabla F(\mathbf{x}) = A\mathbf{x} + \mathbf{d}

\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha\,\mathbf{g}_k = \mathbf{x}_k - \alpha\,(A\mathbf{x}_k + \mathbf{d}) \quad\Rightarrow\quad \mathbf{x}_{k+1} = [I - \alpha A]\,\mathbf{x}_k - \alpha\,\mathbf{d}

Stability is determined by the eigenvalues of the matrix [I - \alpha A]:

[I - \alpha A]\,\mathbf{z}_i = \mathbf{z}_i - \alpha\,A\mathbf{z}_i = \mathbf{z}_i - \alpha\lambda_i\,\mathbf{z}_i = (1 - \alpha\lambda_i)\,\mathbf{z}_i

(\lambda_i is an eigenvalue of A and \mathbf{z}_i the corresponding eigenvector, so the eigenvalues of [I - \alpha A] are 1 - \alpha\lambda_i.)

Stability requirement:

|1 - \alpha\lambda_i| < 1 \quad\Rightarrow\quad \alpha < \frac{2}{\lambda_i} \quad\Rightarrow\quad \alpha < \frac{2}{\lambda_{max}}
Example

A = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}, \qquad \lambda_1 = 0.764,\ \mathbf{z}_1 = \begin{bmatrix} 0.851 \\ -0.526 \end{bmatrix}, \qquad \lambda_2 = 5.24,\ \mathbf{z}_2 = \begin{bmatrix} 0.526 \\ 0.851 \end{bmatrix}

\alpha < \frac{2}{\lambda_{max}} = \frac{2}{5.24} = 0.38

(Figure: steepest-descent trajectories for \alpha = 0.37, which converges, and \alpha = 0.39, which diverges.)
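A sketch that checks the bound numerically; the test function F(x) = (1/2) x^T A x (with d = 0, c = 0) and the starting point are assumptions, since the slide only gives A:

```python
# Checking alpha < 2 / lambda_max for A = [[2, 2], [2, 4]] by iterating
# x_{k+1} = x_k - alpha * A x_k (steepest descent on F(x) = 1/2 x^T A x).
import numpy as np

A = np.array([[2.0, 2.0], [2.0, 4.0]])
lam_max = np.linalg.eigvalsh(A).max()
print("bound:", 2.0 / lam_max)                 # ~ 0.38

for alpha in (0.37, 0.39):
    x = np.array([1.0, 1.0])                   # arbitrary starting point (assumed)
    for _ in range(100):
        x = x - alpha * (A @ x)
    print(alpha, np.linalg.norm(x))            # 0.37 shrinks toward 0, 0.39 blows up
```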
