Lecture Notes in Mathematics
Edited by A. Dold and B. Eckmann
630
Numerical Analysis
Proceedings of the Biennial Conference Held at Dundee, June 28 - July 1, 1977
Springer-Verlag
Editor
G. A. Watson
University of Dundee
Department of Mathematics
Dundee, DD1 4HN/Scotland
© by Springer-Verlag Berlin Heidelberg 1978
Preface
For the four days June 28 - July 1, 1977, over 220 people attended the 7th
Dundee Biennial Conference on Numerical Analysis at the University of Dundee,
Scotland. The technical program consisted of 16 invited papers, and 63 short
submitted papers, the contributed talks being given in 3 parallel sessions. This
volume contains, in complete form, the papers given by the invited speakers, and
a list of all other papers presented.
I would like to take this opportunity of thanking the speakers, including the
after dinner speaker at the conference dinner, Professor D. S. Jones, all chairmen
and participants for their contributions. I would also like to thank the many
people in the Mathematics Department of this University who assisted in various
ways with the preparation for, and running of, this conference. In particular, the
considerable task of typing the various documents associated with the conference,
and some of the typing in this volume has been done by Miss R Dudgeon; this work
is gratefully acknowledged.
G A Watson
H. J. STETTER: Global error estimation in ODE-solvers ................... 179
E. L. WACHSPRESS: Isojacobic crosswind differencing ..................... 190
INVITED SPEAKERS
Per Grove Thomsen and Zahari Zlatev: Institute for Numerical Analysis, Technical
University of Denmark.
The use of Backward Differentiation methods in the solution of non-stationary heat
conduction problems.
THE LEVENBERG-MARQUARDT ALGORITHM: IMPLEMENTATION AND THEORY*

Jorge J. Moré

*Work performed under the auspices of the U.S. Energy Research and Development Administration.

1. Introduction

Notation. In all cases ||·|| refers to the ℓ_2 vector norm or to the induced operator
norm. The Jacobian matrix of F evaluated at x is denoted by F'(x), but if we have a
sequence of vectors {x_k}, then J_k and f_k are used instead of F'(x_k) and F(x_k),
respectively.
2. Derivation

Given the nonlinear least squares problem of minimizing ||F(x)|| for F : R^n → R^m,
linearizing F about the current point x leads to the approximation

$\psi(p) = \|F(x) + F'(x)p\|$

to ||F(x+p)||. Of course, this linearization is not valid for all values of p, and thus we consider the constrained linear least squares problem

(2.1)   $\min \{\, \|F(x) + F'(x)p\| : \|Dp\| \le \Delta \,\}$ ,

where D is a diagonal scaling matrix and Δ > 0 is a bound on the length of the scaled step.
The constraint in (2.1) restricts p to the hyperellipsoid

(2.2)   $E = \{\, p : \|Dp\| \le \Delta \,\}$ ;

if D is diagonal, then E has axes along the coordinate directions and the length of the i-th semi-axis is Δ/d_i.
We now consider the solution of (2.1) in some generality. Its solution is characterized by the parametric family

(2.4)   $p(\lambda) = -\,(J^T J + \lambda D^T D)^{-1} J^T f , \qquad \lambda \ge 0$ ,

where either λ = 0 and ||Dp(0)|| ≤ Δ, or λ > 0 is chosen so that ||Dp(λ)|| = Δ.
(2.5) Algorithm

(a) Given Δ_k > 0, determine λ_k ≥ 0 and p_k = p(λ_k) such that either λ_k = 0 and ||D_k p_k|| ≤ Δ_k, or λ_k > 0 and ||D_k p_k|| = Δ_k.

(b) If ||F(x_k + p_k)|| < ||F(x_k)||, set x_{k+1} = x_k + p_k and evaluate J_{k+1}; otherwise set x_{k+1} = x_k and J_{k+1} = J_k.

(c) Choose Δ_{k+1} and D_{k+1}.
In the next four sections we elaborate on how (2.5) leads to a very robust and
efficient implementation of the Levenberg-Marquardt algorithm.
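
To fix ideas, the following Python sketch implements the outer iteration (2.5) under stated simplifications: the subproblem is solved by naive bisection on λ (a stand-in for the method of Section 5 below), the scaling matrix is held fixed, and the Δ update is a crude version of the rules of Section 4. All names are ours, not those of the original implementation.

    import numpy as np

    def solve_subproblem(J, f, D, delta, tol=1e-10):
        # Solve min ||f + J p|| subject to ||D p|| <= delta; return (p, lam).
        # Plain bisection on lam -- a stand-in for the iteration of Section 5.
        def step(lam):
            A = np.vstack([J, np.sqrt(lam) * D])
            b = np.concatenate([f, np.zeros(D.shape[0])])
            return -np.linalg.lstsq(A, b, rcond=None)[0]

        p = step(0.0)
        if np.linalg.norm(D @ p) <= delta:
            return p, 0.0                       # Gauss-Newton step is feasible
        lo, hi = 0.0, 1.0
        while np.linalg.norm(D @ step(hi)) > delta:
            hi *= 10.0                          # bracket lam from above
        while hi - lo > tol * (1.0 + hi):
            mid = 0.5 * (lo + hi)
            if np.linalg.norm(D @ step(mid)) > delta:
                lo = mid
            else:
                hi = mid
        return step(hi), hi

    def levenberg_marquardt(F, jac, x0, delta0=1.0, max_iter=100, ftol=1e-10):
        # Outer iteration (2.5): trial step, acceptance test (b), updates (c).
        x = np.asarray(x0, dtype=float)
        delta, D = delta0, np.eye(len(x0))      # fixed scaling; see Section 6
        f, J = F(x), jac(x)
        for _ in range(max_iter):
            p, lam = solve_subproblem(J, f, D, delta)
            f_new = F(x + p)
            pred = np.linalg.norm(f)**2 - np.linalg.norm(f + J @ p)**2
            rho = (np.linalg.norm(f)**2 - np.linalg.norm(f_new)**2) / pred if pred > 0 else -1.0
            if np.linalg.norm(f_new) < np.linalg.norm(f):   # step (b) of (2.5)
                x, f = x + p, f_new
                J = jac(x)
            if rho <= 0.25:                     # crude version of Section 4
                delta *= 0.5
            elif rho >= 0.75:
                delta *= 2.0
            if np.linalg.norm(f) <= ftol:
                break
        return x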
Another method is to recognize that

(3.1)   $(J^T J + \lambda D^T D)\, p = -\,J^T f$

are the normal equations for the least squares problem

(3.2)   $\min_p \left\| \begin{bmatrix} J \\ \lambda^{1/2} D \end{bmatrix} p + \begin{bmatrix} f \\ 0 \end{bmatrix} \right\|$ ,

and to solve this structured least squares problem using a QR decomposition with
column pivoting.
The least squares solution of (3.2) proceeds in two stages. These stages are
the same as those suggested by Golub (Osborne [1972]), but modified to take into
account the pivoting.
In the first stage, compute the QR decomposition of J with column pivoting,

(3.3)   $J\pi = Q \begin{bmatrix} R \\ 0 \end{bmatrix}$ ,

so that (3.2) reduces to

(3.4)   $\min \left\| \begin{bmatrix} R \\ \lambda^{1/2} D_\pi \end{bmatrix} (\pi^T p) + \begin{bmatrix} Q^T f \\ 0 \end{bmatrix} \right\|$ ,

where $D_\pi = \pi^T D \pi$ is still a diagonal matrix and R is a (possibly singular) upper triangular matrix.
In the second stage, compute the QR decomposition of the coefficient matrix in (3.4). This can be done with a sequence of n(n+1)/2 Givens rotations. The result is an orthogonal matrix W such that

(3.5)   $W^T \begin{bmatrix} R \\ \lambda^{1/2} D_\pi \end{bmatrix} = \begin{bmatrix} R_\lambda \\ 0 \end{bmatrix}$ ,

where $R_\lambda$ is upper triangular and nonsingular for λ > 0. The solution of (3.1) is then

$p = -\,\pi R_\lambda^{-1} u$ ,

where u consists of the first n elements of $W^T (Q^T f, 0)^T$.
It is important to note that if λ is changed, then only the second stage must be
redone.
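
The two stages might be transcribed as follows, assuming SciPy's pivoted QR and m ≥ n; a dense QR stands in for the n(n+1)/2 Givens rotations, and all names are ours.

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def lm_step_two_stage(J, f, D, lam):
        # Solve (J^T J + lam D^T D) p = -J^T f by the two stages above (m >= n).
        n = J.shape[1]
        # Stage 1 (independent of lam): QR of J with column pivoting,
        # J pi = Q [R; 0].
        Q, R, piv = qr(J, mode='economic', pivoting=True)
        qtf = Q.T @ f
        d_pi = np.diag(D)[piv]              # D_pi = pi^T D pi is still diagonal
        # Stage 2 (repeated for each lam): QR of the stacked matrix in (3.4);
        # a dense QR here, instead of the n(n+1)/2 Givens rotations.
        A = np.vstack([R, np.sqrt(lam) * np.diag(d_pi)])
        W, R_lam = qr(A, mode='economic')
        u = W.T @ np.concatenate([qtf, np.zeros(n)])
        y = solve_triangular(R_lam, u)      # y = R_lam^{-1} u
        p = np.zeros(n)
        p[piv] = -y                         # p = -pi R_lam^{-1} u
        return p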
The choice of Δ depends on the ratio between the actual reduction and the predicted reduction obtained by the correction. In our case, this ratio is given by

(4.1)   $\rho(p) = \frac{\|F(x)\|^2 - \|F(x+p)\|^2}{\|F(x)\|^2 - \|F(x) + F'(x)p\|^2}$ .
Thus (4.1) measures the agreement between the linear model and the (nonlinear) function. For example, if F is linear then ρ(p) = 1 for all p, and if F'(x)^T F(x) ≠ 0, then ρ(p) → 1 as ||p|| → 0. Moreover, if ||F(x+p)|| ≥ ||F(x)|| then ρ(p) ≤ 0.
The scheme for updating Δ has the objective of keeping the value of (4.1) at a reasonable level. Thus, if ρ(p) is close to unity (i.e. ρ(p) ≥ 3/4), we may want to increase Δ, but if ρ(p) is not close to unity (i.e. ρ(p) ≤ 1/4), then Δ must be decreased. Before giving more specific rules for updating Δ, we discuss the computation of (4.1). For this, write

(4.2)   $\rho = \frac{\|f\|^2 - \|f_+\|^2}{\|f\|^2 - \|f + Jp\|^2}$ ,

where $f_+ = F(x+p)$. Since p satisfies (3.1),

(4.3)   $\|f\|^2 - \|f + Jp\|^2 = \|Jp\|^2 + 2\lambda \|Dp\|^2$ ,

and hence

(4.4)   $\rho = \frac{1 - (\|f_+\|/\|f\|)^2}{(\|Jp\|/\|f\|)^2 + 2\,(\lambda^{1/2}\|Dp\|/\|f\|)^2}$ .

Since (4.3) shows that the two quotients in the denominator are at most unity,
the computation of the denominator will not generate any overflows, and moreover, the denominator will be non-negative regardless of roundoff errors. Note that this is not the case with (4.2). The numerator of (4.4) may generate overflows if ||f_+|| is much larger than ||f||, but since we are only interested in positive values of ρ, if ||f_+|| > ||f|| we can just set ρ = 0 and avoid (4.4).
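
In code, the safe form (4.4) could read as follows (a sketch; the argument names are ours, and each argument is a precomputed norm):

    import numpy as np

    def rho_ratio(f_norm, fplus_norm, Jp_norm, Dp_norm, lam):
        # Ratio (4.4); f_norm = ||f||, fplus_norm = ||f+||, and so on.
        if fplus_norm > f_norm:
            return 0.0                            # only positive rho matters
        t1 = Jp_norm / f_norm                     # at most one, by (4.3)
        t2 = np.sqrt(lam) * Dp_norm / f_norm      # at most one, by (4.3)
        numer = 1.0 - (fplus_norm / f_norm) ** 2
        denom = t1 ** 2 + 2.0 * t2 ** 2           # non-negative despite roundoff
        return numer / denom if denom > 0.0 else 0.0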
To decrease Δ when ρ(p) ≤ 1/4, consider

$\theta(\alpha) = \tfrac{1}{2} \|F(x + \alpha p)\|^2$ ,

and let μ be the minimizer of the quadratic which interpolates θ(0), θ'(0), and θ(1); the bound Δ is then multiplied by μ. Since $\theta'(0) = -(\|Jp\|^2 + \lambda\|Dp\|^2)$, setting $\gamma = \theta'(0)/\|f\|^2$ yields

(4.5)   $\mu = \frac{\tfrac{1}{2}\gamma}{\gamma + \tfrac{1}{2}\,[\,1 - (\|f_+\|/\|f\|)^2\,]}$ .

If ||f_+|| ≤ ||f|| we set μ = 1/2. Also note that we only compute μ by (4.5) if, say, ||f_+|| < 10||f||, for otherwise μ < 1/10.
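
A corresponding sketch of the computation of μ, with the special cases noted above and a clip to [1/10, 1/2] (the interval the text implies; the names are ours):

    def mu_factor(f_norm, fplus_norm, Jp_norm, Dp_norm, lam):
        # Reduction factor for Delta from (4.5), restricted to [1/10, 1/2].
        if fplus_norm <= f_norm:
            return 0.5                            # special case in the text
        if fplus_norm >= 10.0 * f_norm:
            return 0.1                            # (4.5) would give mu < 1/10
        # gamma = theta'(0)/||f||^2 = -[(||Jp||/||f||)^2 + lam (||Dp||/||f||)^2]
        gamma = -((Jp_norm / f_norm) ** 2 + lam * (Dp_norm / f_norm) ** 2)
        mu = 0.5 * gamma / (gamma + 0.5 * (1.0 - (fplus_norm / f_norm) ** 2))
        return min(max(mu, 0.1), 0.5)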
The Levenberg-Marquardt parameter λ is obtained from the function

(5.1)   $\phi(\alpha) = \|D\,p(\alpha)\| - \Delta$ ,

where p(·) is defined by (2.4). If $JD^{-1}$ has the singular value decomposition $U \Sigma V^T$, then

(5.2)   $\phi(\alpha) = \left( \sum_{i=1}^{n} \left[ \frac{\sigma_i z_i}{\sigma_i^2 + \alpha} \right]^2 \right)^{1/2} - \; \Delta$ ,

where $z = U^T f$ and $\sigma_1, \ldots, \sigma_n$ are the singular values of $JD^{-1}$. Hence, it is very natural to assume that

$\tilde\phi(\alpha) = \frac{a}{b + \alpha} - \Delta$

and to choose a and b so that $\tilde\phi(\alpha_k) = \phi(\alpha_k)$ and $\tilde\phi'(\alpha_k) = \phi'(\alpha_k)$. Then $\tilde\phi(\alpha_{k+1}) = 0$ if

(5.3)   $\alpha_{k+1} = \alpha_k - \left[ \frac{\phi(\alpha_k) + \Delta}{\Delta} \right] \left[ \frac{\phi(\alpha_k)}{\phi'(\alpha_k)} \right]$ .

To safeguard this iteration we also use lower and upper bounds on the root of φ. The quantity

$u_0 = \|(JD^{-1})^T f\| \,/\, \Delta$
is a suitable upper bound. If J is not rank deficient, then φ'(0) is defined and the convexity of φ implies that

$l_0 = -\,\phi(0) / \phi'(0)$

is a suitable lower bound.
(5.5) Algorithm

(a) If $\alpha_k \notin (l_k, u_k)$, set $\alpha_k = \max\{\, 0.001\, u_k , \, (l_k u_k)^{1/2} \,\}$.

(b) Evaluate φ(α_k) and φ'(α_k). Update u_k by letting $u_{k+1} = \alpha_k$ if φ(α_k) < 0, and $u_{k+1} = u_k$ otherwise. Update l_k by

$l_{k+1} = \max\left\{ l_k , \; \alpha_k - \frac{\phi(\alpha_k)}{\phi'(\alpha_k)} \right\}$ .

(c) Obtain $\alpha_{k+1}$ from (5.3).
To evaluate φ', note that

$\phi'(\alpha) = -\,\frac{(D^T q(\alpha))^T (J^T J + \alpha D^T D)^{-1} (D^T q(\alpha))}{\|q(\alpha)\|}$ ,

where $q(\alpha) = D\,p(\alpha)$ and p(·) is defined by (2.4). From (3.4) and (3.5) we have

$\pi^T (J^T J + \alpha D^T D)\, \pi = R_\lambda^T R_\lambda$ ,

and hence

$\phi'(\alpha) = -\,\frac{\| R_\lambda^{-T}\, \pi^T D^T q(\alpha) \|^2}{\|q(\alpha)\|}$ ,

so that φ'(α) is available from the factorization of Section 3 at the cost of a single triangular solve.
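
The pieces (5.1), (5.3), and (5.5) can be assembled into the following dense sketch, which forms JᵀJ + αDᵀD explicitly instead of reusing the factorization of Section 3 and assumes the Gauss-Newton step has already been found infeasible; σ plays the role of a relative accuracy on ||Dp(α)||, and all names are ours.

    import numpy as np

    def find_alpha(J, f, D, delta, alpha0=1.0, sigma=0.1, max_iter=10):
        # Iteration (5.5) for phi(alpha) = ||D p(alpha)|| - delta, assuming
        # the Gauss-Newton step violates ||D p|| <= delta (positive root).
        def p_of(alpha):
            return np.linalg.solve(J.T @ J + alpha * (D.T @ D), -(J.T @ f))

        def phi_and_deriv(alpha):
            p = p_of(alpha)
            q = D @ p
            qn = np.linalg.norm(q)
            # phi'(a) = -(D^T q)^T (J^T J + a D^T D)^{-1} (D^T q) / ||q||
            w = np.linalg.solve(J.T @ J + alpha * (D.T @ D), D.T @ q)
            return qn - delta, -((D.T @ q) @ w) / qn

        upper = np.linalg.norm(np.linalg.solve(D.T, J.T @ f)) / delta   # u_0
        lower, alpha = 0.0, alpha0
        for _ in range(max_iter):
            if not (lower < alpha < upper):
                alpha = max(0.001 * upper, np.sqrt(lower * upper))  # step (a)
            phi, dphi = phi_and_deriv(alpha)
            if abs(phi) <= sigma * delta:
                break                               # |phi| is small enough
            if phi < 0.0:
                upper = alpha                       # step (b): shrink u_k
            lower = max(lower, alpha - phi / dphi)  # step (b): raise l_k
            alpha = alpha - ((phi + delta) / delta) * (phi / dphi)  # (5.3)
        return alpha, p_of(alpha)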
6. Scaling

In algorithm (2.5) the scaling matrix is diagonal,

(6.1)   $D_k = \mathrm{diag}\,( d_1^{(k)}, \ldots, d_n^{(k)} )$ ,

and the simplest choice is to fix the scaling at the starting point,

(6.2)   $d_i^{(k)} = \|\partial_i F(x_0)\|$ .
This choice is usually adequate as long as ||∂_i F(x_k)|| does not increase with k. However, if ||∂_i F(x_k)|| increases, this requires a decrease in the length (= Δ/d_i) of the i-th semi-axis of the hyperellipsoid (2.2), since F is now changing faster along the i-th variable, and therefore, steps which have a large i-th component tend to be unreliable. This argument leads to the choice

(6.3)   $d_i^{(k)} = \max\left\{ d_i^{(k-1)} , \, \|\partial_i F(x_k)\| \right\}$ .
Note that a decrease in ||∂_i F(x_k)|| only implies that F is not changing as fast along the i-th variable, and hence does not require a decrease in d_i. In fact, the choice

(6.4)   $d_i^{(k)} = \|\partial_i F(x_k)\|$

is computationally inferior to both (6.2) and (6.3). Moreover, our theoretical results support choice (6.3) over (6.4), and to a lesser extent, (6.2).
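
In code, the three choices reduce to one-line updates of the vector of column norms of the Jacobian (a sketch; the mode names follow the table of Section 8):

    import numpy as np

    def update_scaling(J, d_prev=None, mode="adaptive"):
        # Candidates d_i are the column norms ||partial_i F(x_k)|| of J.
        col_norms = np.linalg.norm(J, axis=0)
        if d_prev is None:
            return col_norms                 # first iteration: all three agree
        if mode == "initial":
            return d_prev                    # (6.2): keep ||partial_i F(x_0)||
        if mode == "adaptive":
            return np.maximum(d_prev, col_norms)   # (6.3): running maximum
        return col_norms                     # "continuous": (6.4)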
It is interesting to note that (6.2), (6.3), and (6.4) make the Levenberg-Marquardt algorithm scale invariant. In other words, for all of the above choices, if D is a diagonal matrix with positive diagonal elements, then algorithm (2.5) generates the same iterates if either it is applied to F and started at x_0, or if it is applied to $\hat F(x) = F(D^{-1} x)$ and started at $\hat x_0 = D x_0$. For this result it is assumed that the decision to change Δ is only based on (4.1), and thus is also scale invariant.
7. Theoretical Results

(7.1) Algorithm

This is algorithm (2.5) with the updating rules for Δ_k of Section 4 and the scaling choices of Section 6 made precise. If F is continuously differentiable, then the iterates of (7.1) satisfy

(7.2)   $\liminf_{k \to +\infty} \| D_k^{-1} J_k^T f_k \| = 0$ .

The proof of our convergence result is somewhat long and will therefore be presented elsewhere. This result guarantees that eventually a scaled gradient will be small enough. Of course, if {J_k} is bounded then (7.2) implies the more standard result that

(7.3)   $\liminf_{k \to +\infty} \| J_k^T f_k \| = 0$ ,

although not necessarily the stronger result that

(7.4)   $\lim_{k \to +\infty} J_k^T f_k = 0$ .
Powell [1975] and Osborne [1975] have also obtained global convergence results for their versions of the Levenberg-Marquardt algorithm. Powell presented a general algorithm for unconstrained minimization which as a special case contains (7.1) with σ = 0 and {D_k} constant. For this case Powell obtains (7.3) under the assumption that {J_k} is bounded. Osborne's algorithm directly controls {λ_k} instead of {Δ_k}, and allows {D_k} to be chosen by (6.1) and (6.3). For this case he proves (7.4) under the assumptions that {J_k} and {λ_k} are bounded.
8. Numerical Results
In our numerical results we would like to illustrate the behavior of our algo-
rithm with the three choices of scaling mentioned in Section 6. For this purpose,
we have chosen four functions.
The first function is

$f_1(x) = 10\,[\, x_3 - 10\,\theta(x_1, x_2) \,]$ ,
$f_2(x) = 10\,[\, (x_1^2 + x_2^2)^{1/2} - 1 \,]$ ,
$f_3(x) = x_3$ ,

where

$\theta(x_1, x_2) = \frac{1}{2\pi}\arctan(x_2/x_1)$ for $x_1 > 0$, and $\theta(x_1, x_2) = \frac{1}{2\pi}\arctan(x_2/x_1) + \frac{1}{2}$ for $x_1 < 0$,

with starting point

$x_0 = (-1, 0, 0)^T$ .
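
For reference, a direct Python transcription of this first test problem (a sketch; the branch rule of θ is implemented exactly as stated, and x_1 = 0, which the text does not cover, is left undefined):

    import numpy as np

    def theta(x1, x2):
        # Branch rule as stated above; x1 = 0 is not covered by the text.
        t = np.arctan(x2 / x1) / (2.0 * np.pi)
        return t if x1 > 0.0 else t + 0.5

    def F1(x):
        x1, x2, x3 = x
        return np.array([10.0 * (x3 - 10.0 * theta(x1, x2)),
                         10.0 * (np.sqrt(x1 ** 2 + x2 ** 2) - 1.0),
                         x3])

    x0 = np.array([-1.0, 0.0, 0.0])
    # F1(x0) = [-50., 0., 0.]; at the solution, ||F(x*)|| = 0.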
The second function is that of Kowalik and Osborne,

$f_i(x) = y_i - \frac{x_1 (u_i^2 + u_i x_2)}{u_i^2 + u_i x_3 + x_4}$ ,

$x_0 = (0.25, 0.39, 0.415, 0.39)^T$ ,

where y_i and u_i are given data. The third function is

$f_i(x) = y_i - \left( x_1 + \frac{u_i}{x_2 v_i + x_3 w_i} \right)$ ,

$x_0 = (1, 1, 1)^T$ ,

where y_i, u_i, v_i, and w_i are given data. The fourth function is

$f_i(x) = ( x_1 + t_i x_2 - \exp(t_i) )^2 + ( x_3 + x_4 \sin t_i - \cos t_i )^2$ ,

where $t_i = (0.2)\,i$, with

$x_0 = (25, 5, -5, 1)^T$ .
The values of ||F|| at the solutions are

1. $\|F(x^*)\| = 0.0$
2. $\|F(x^*)\| = 0.0175358$
3. $\|F(x^*)\| = 0.0906359$
4. $\|F(x^*)\| = 292.9542$
Problems 2 and 3 have other solutions. To see this, note that for Kowalik and Osborne's function there is a limiting form (8.1), obtained by letting some of the parameters grow without bound, and similarly a limiting form (8.2) for the third function. These are now linear least squares problems, and as such, the parameter x_2 in (8.1) and x_1 in (8.2) are completely determined. However, the remaining parameters only need to be sufficiently large.
In presenting numerical results one must be very careful about the convergence criteria used. This is particularly true of the Levenberg-Marquardt method since, unless F(x*) = 0, the algorithm converges linearly. In our implementation, an approximation x to x* is acceptable if either x is close to x* or ||F(x)|| is close to ||F(x*)||. This leads to the two convergence tests

(8.3)   $\Delta \le \mathrm{XTOL} \cdot \|Dx\|$

and

(8.4)   $\left( \frac{\|Jp\|}{\|f\|} \right)^2 + 2 \left( \frac{\lambda^{1/2} \|Dp\|}{\|f\|} \right)^2 \le \mathrm{FTOL}$ .

An important aspect of these tests is that they are scale invariant in the sense of Section 6. Also note that the work of Section 4 shows that (8.4) is just the relative error between $\|f + Jp\|^2$ and $\|f\|^2$.
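
A direct transcription of the two tests (a sketch; the argument names are ours, and (8.3) is used in the form reconstructed above):

    import numpy as np

    def converged(delta, D, x, f_norm, Jp_norm, Dp_norm, lam,
                  xtol=1e-8, ftol=1e-8):
        # (8.3): the step bound is small relative to the scaled iterate.
        x_test = delta <= xtol * np.linalg.norm(D @ x)
        # (8.4): relative error between ||f + J p||^2 and ||f||^2 (Section 4).
        f_test = ((Jp_norm / f_norm) ** 2
                  + 2.0 * lam * (Dp_norm / f_norm) ** 2) <= ftol
        return x_test or f_test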
The problems were run on the IBM 370/195 of Argonne National Laboratory in double precision (14 hexadecimal digits) and under the FORTRAN H (OPT=2) compiler. The tolerances in (8.3) and (8.4) were set at FTOL = 10^{-8} and XTOL = 10^{-8}. Each problem is run with three starting vectors. We have already given the starting vector x_0 which is closest to the solution; the other two points are 10x_0 and 100x_0. For each starting vector, we have tried our algorithm with the three choices of {D_k}. In the table below, choices (6.2), (6.3), and (6.4) are referred to as initial, adaptive, and continuous scaling, respectively. Moreover, NF and NJ stand for the number of function and Jacobian evaluations required for convergence.
                             x_0          10 x_0        100 x_0
    PROBLEM   SCALING      NF   NJ       NF   NJ       NF   NJ
       1      Initial      12    9       34   29       FC   FC
              Adaptive     11    8       20   15       19   16
              Continuous   12    9       14   12      176  141
       3      Initial       8    7       37   36       14   13
              Adaptive      8    7       37   36       14   13
              Continuous    8    7       FC   FC       FC   FC

(FC indicates a failure to converge.)
It is clear from the table that the adaptive strategy is best in these four examples. We have run other problems, but in all other cases the difference is not as dramatic as in these cases. However, we believe that the above examples adequately justify our choice of scaling matrix.
Acknowledgments. This work benefited from interaction with several people. Beverly
Arnoldy provided the numerical results for several versions of the Levenberg-
Marquardt algorithm, Brian Smith showed how to use pivoting in the two-stage process
of Section 3, and Danny Sorensen made many valuable comments on an earlier draft of
this paper. Finally, I would like to thank Judy Beumer for her swift and beautiful
typing of the paper.
References

10. Osborne, M. R. [1975]. Nonlinear least squares - the Levenberg algorithm revisited, to appear in Series B of the Journal of the Australian Mathematical Society.