COMPACT NUMERICAL
METHODS
FOR COMPUTERS
linear algebra and
function minimisation
Second Edition
J C NASH
0-85274-318-1
0-85274-319-X (pbk)
0-7503-0036-1 (5¼" IBM disc)
0-7503-0043-4 (3½" IBM disc)
CONTENTS
14. DIRECT SEARCH METHODS
14.1. The Nelder-Mead simplex search for the minimum of a
function of several parameters
14.2. Possible modifications of the Nelder-Mead algorithm
14.3. An axial search procedure
14.4. Other direct search methods
18. LEFT-OVERS
18.1. Introduction
18.2. Numerical approximation of derivatives
18.3. Constrained optimisation
18.4. A comparison of function minimisation and nonlinear least-squares methods
APPENDICES
1. Nine test matrices
2. List of algorithms
3. List of examples
4. Files on the software diskette
BIBLIOGRAPHY
INDEX
(ii) in the United States, the members of the Applied Mathematics Division of the
Argonne National Laboratory who have taken such an interest in the algorithms,
and Stephen Nash who has pointed out a number of errors and faults; and
(iii) in Canada, the members of the Economics Branch of Agriculture Canada for
presenting me with such interesting problems to solve, Kevin Price for careful and
detailed criticism, Bob Henderson for trying out most of the algorithms, Richard
Wang for pointing out several errors in chapter 8, John Johns for trying (and
finding errors in) eigenvalue algorithms, and not least Mary Nash for a host of
corrections and improvements to the book as a whole.
It is a pleasure to acknowledge the very important roles of Neville Goodman
and Geoff Amor of Adam Hilger Ltd in the realisation of this book.
J. C. Nash
Ottawa, 22 December 1977
Chapter 1
A STARTING POINT
1.1. PURPOSE AND SCOPE
This monograph is written for the person who has to solve problems with (small)
computers. It is a handbook to help him or her obtain reliable answers to specific
questions, posed in a mathematical way, using limited computational resources.
To this end the solution methods proposed are presented not only as formulae but
also as algorithms, those recipes for solving problems which are more than merely
a list of the mathematical ingredients.
There has been an attempt throughout to give examples of each type of
calculation and in particular to give examples of cases which are prone to upset
the execution of algorithms. No doubt there are many gaps in the treatment
where the experience which is condensed into these pages has not been adequate
to guard against all the pitfalls that confront the problem solver. The process of
learning is continuous, as much for the teacher as the taught. Therefore, the user
of this work is advised to think for him/herself and to use his/her own knowledge of and
familiarity with particular problems as much as possible. There is, after all, barely a
working career of experience with automatic computation and it should not seem
surprising that satisfactory methods do not exist as yet for many problems. Throughout the sections which follow, this underlying novelty of the art of solving numerical
problems by automatic algorithms finds expression in a conservative design policy.
Reliability is given priority over speed and, from the title of the work, space
requirements for both the programs and the data are kept low.
Despite this policy, it must be mentioned immediately and with some
emphasis that the algorithms may prove to be surprisingly efficient from a
cost-of-running point of view. In two separate cases where explicit comparisons
were made, programs using the algorithms presented in this book cost less to
run than their large-machine counterparts. Other tests of execution times for
algebraic eigenvalue problems, roots of a function of one variable and function
minimisation showed that the eigenvalue algorithms were by and large slower
than those recommended for use on large machines, while the other test problems
were solved with notable efficiency by the compact algorithms. That small
programs may be more frugal than larger, supposedly more efficient, ones based
on different algorithms to do the same job has at least some foundation in the way
today's computers work.
Since the first edition of this work appeared, a large number and variety of
inexpensive computing machines have appeared. Often termed the microcomputer
revolution, the widespread availability of computing power in forms ranging from
programmable calculators to desktop workstations has increased the need for
suitable software of all types, including numerical methods. The present work is
directed at the user who needs, for whatever reason, to program a numerical method
to solve a problem. While software packages and libraries exist to provide for the
solution of numerical problems, financial, administrative or other obstacles may
render their use impossible or inconvenient. For example, the programming tools
available on the chosen computer may not permit the packaged software to be used.
Firstly, most machines are run under operating systems which control (and
sometimes charge for) the usage of memory, storage, and other machine resources. In
both compilation (translation of the program into machine code) and execution, a
smaller program usually will make smaller demands on resources than a larger one.
On top of this, the time of compilation is usually related to the size of the source
code.
Secondly, once the program begins to execute, there are housekeeping operations
which must be taken care of:
(i) to keep programs and data belonging to one task or user separate from those
belonging to others in a time-sharing environment, and
(ii) to access the various parts of the program and data within the set of
resources allocated to a single user.
Studies conducted some years ago by Dr Maurice Cox of the UK National
Physical Laboratory showed that (ii) requires about 90% of the time a computer
spends with a typical scientific computation. Only about 10% of the effort goes to
actual arithmetic. This mix of activity will vary greatly with the machine and problem
under consideration. However, it is not unreasonable that a small program can use
simpler structures, such as address maps and decision tables, than a larger routine. It
is tempting to suggest that the computer may be able to perform useful work with a
small program while deciding what to do with a larger one. Gathering specific
evidence to support such conjectures requires the fairly tedious work of benchmarking. Moreover, the results of the exercise are only valid as long as the machine,
operating system, programming language translators and programs remain
unchanged. Where performance is critical, as in the case of real-time computations,
for example in air traffic control, then benchmarking will be worthwhile. In other
situations, it will suffice that programs operate correctly and sufficiently quickly that
the user is not inconvenienced.
This book is one of the very few to consider algorithms which have very low
storage requirements. The first edition appeared just as programmable calculators
and the first microcomputers were beginning to make their presence felt. These
brought to the user's desk a quantum improvement in computational power.
Comparisons with the earliest digital computers showed that even a modest microcomputer was more powerful. It should be noted, however, that the programmer did
not have to handle all the details of arithmetic and data storage, as on the early
computers, thanks to the quick release of programming language translators. There
is unfortunately still a need to be vigilant for errors in the floating-point arithmetic
and the special function routines. Some aids to such checking are mentioned later in
1.2.
Besides the motivation of cost savings or the desire to use an available and
possibly under-utilised small computer, this work is directed to those who share
my philosophy that human beings are better able to comprehend and deal with
small programs and systems than large ones. That is to say, it is anticipated that
the costs involved in implementing, modifying and correcting a small program will
be lower for small algorithms than for large ones, though this comparison will
depend greatly on the structure of the algorithms. By way of illustration, I
implemented and tested the eigenvalue/vector algorithm (algorithm 13) in under
half an hour from a 10 character/second terminal in Aberystwyth using a Xerox
Sigma 9 computer in Birmingham. The elapsed time includes my instruction in the
use of the system which was of a type I had not previously encountered. I am
grateful to Mrs Lucy Tedd for showing me this system. Dr John Johns of the
Herzberg Institute of Astrophysics was able to obtain useful eigensolutions from
the same algorithm within two hours of delivery of a Hewlett-Packard 9825
programmable calculator. He later discovered a small error in the prototype of
the algorithm.
The topics covered in this work are numerical linear algebra and function
minimisation. Why not differential equations? Quite simply because I have had
very little experience with the numerical solution of differential equations except
by techniques using linear algebra or function minimisation. Within the two broad
areas, several subjects are given prominence. Linear equations are treated in
considerable detail with separate methods given for a number of special situations.
The algorithms given here are quite similar to those used on larger machines. The
algebraic eigenvalue problem is also treated quite extensively, and in this edition, a
method for complex matrices is included. Computing the eigensolutions of a general
square matrix is a problem with many inherent difficulties, but we shall not dwell on
these at length.
Constrained optimisation is still a topic where I would be happier to offer more
material, but general methods of sufficient simplicity to include in a handbook of this
sort have eluded my research efforts. In particular, the mathematical programming
problem is not treated here.
Since the aim has been to give a problem-solving person some tools with which
to work, the mathematical detail in the pages that follow has been mostly confined
to that required for explanatory purposes. No claim is made to rigour in any
proof, though a sincere effort has been made to ensure that the statement of
theorems is correct and precise.
1.2. MACHINE CHARACTERISTICS
In the first edition, a small computer was taken to have about 6000 characters of
main memory to hold both programs and data. This logical machine, which might be
a part of a larger physical computer, reflected the reality facing a quite large group of
users in the mid- to late-1970s.
A more up-to-date definition of small computer could be stated, but it is not
really necessary. Users of this book are likely to be those scientists, engineers, and
statisticians who must, for reasons of financial or administrative necessity or
convenience, carry out their computations in environments where the programs
cannot be acquired simply and must, therefore, be written in-house. There are also a
number of portable computers now available. This text is being entered on a Tandy
Radio Shack TRS-80 Model 100, which is only the size of a large book and is
powered by four penlight batteries.
Users of the various types of machines mentioned above often do not have much
choice as to the programming tools available. On borrowed computers, one has to
put up with the compilers and interpreters provided by the user who has paid for the
resource. On portables, the choices may be limited by the decisions of the manufacturer. In practice, I have, until recently, mostly programmed in BASIC, despite its
limitations, since it has at least been workable on most of the machines available to
me.
Another group of users of the material in this book is likely to be software
developers. Some scenarios which have occurred are:
software is being developed in a particular computing environment (e.g. LISP
for artificial intelligence) and a calculation is required for which suitable off-the-shelf
routines are not available;
standard routines exist but when linked into the package cause the executable
code to be too large for the intended disk storage or memory;
standard routines exist, but the coding is incompatible with the compiler or
interpreter at hand.
It is worth mentioning that programming language standards have undergone
considerable development in the last decade. Nevertheless, the goal of portable
source codes of numerical methods is still only won by careful and conservative
programming practices.
Because of the emphasis on the needs of the user to program the methods, there is
considerable concern in this book to keep the length and complexity of the
algorithms to a minimum. Ideally, I like program codes for the algorithms to be no
longer than a page of typed material, and at worst, less than three pages. This makes
it feasible to implement the algorithms in a single work session. However, even this
level of effort is likely to be considered tedious and it is unnecessary if the code can be
provided in a suitable form. Here we provide source code in Turbo Pascal for the
algorithms in this book and for the driver and support routines to run them (under
Turbo Pascal version 3.01a).
The philosophy behind this book remains one of working with available tools
rather than complaining that better ones exist, albeit not easily accessible. This
should not lead to complacency in dealing with the machine but rather to an active
wariness of any and every feature of the system. A number of these can and should be
checked by using programming devices which force the system to reveal itself in spite
of the declarations in the manual(s). Others will have to be determined by exploring
every error possibility when a program fails to produce expected results. In most
cases programmer error is to blame, but I have encountered at least one system error
in each of the systems I have used seriously. For instance, trigonometric functions are
usually computed by power series approximation. However, these approximations
have validity over specified domains, usually [0, π/2] or [-π/2, π/2] (see Abramowitz and
Stegun 1965, p 76). Thus the argument of the function must first be transformed to lie
in the appropriate range, for example by means of the identities
sin(π - φ) = sin(φ)    (1.1)
sin(π/2 - φ) = cos(φ).    (1.2)
Unless this range reduction is done very carefully the results may be quite
unexpected. On one system, hosted by a Data General NOVA, I have observed
that the sine of an angle near π and the cosine of an angle near π/2 were both
computed as unity instead of a small value, due to this type of error. Similarly, on
some early models of Hewlett-Packard pocket calculators, the rectangular-to-polar
coordinate transformation may give a vector 180° from the correct direction. (This
appears to have been corrected now.)
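A simple check of this behaviour can be programmed directly. The short Pascal sketch below is offered only as an illustration (it is not one of the algorithms of this book): it compares the library sine of an angle just below π with the value obtained from the identity sin(π - φ) = sin(φ), which needs no range reduction; a large discrepancy would point to a poorly implemented sine routine.

program sintest;
{ Compare sin(x) for x just below pi with sin(pi - x); the two values should
  agree, but the first relies on the library's range reduction. }
var
  piv, eps, x, direct, reduced: real;
begin
  piv := 4.0 * arctan(1.0);       { pi as computed by the machine itself }
  eps := 1.0E-5;
  x := piv - eps;                 { an angle just below pi }
  direct := sin(x);               { relies on the library's range reduction }
  reduced := sin(piv - x);        { sin(pi - phi) = sin(phi) }
  writeln('direct   sin(x) = ', direct);
  writeln('reduced  sin(x) = ', reduced);
  writeln('difference      = ', direct - reduced)
end.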
Testing the quality of the floating-point arithmetic and special functions is
technically difficult and tedious. However, some developments which aid the user
have been made by public-spirited scientists. Of these, I consider the most worthy
example to be PARANOIA, a program to examine the floating-point arithmetic
provided by a programming language translator. Devised originally by Professor W
Kahan of the University of California, Berkeley, it has been developed and distributed in a number of programming languages (Karpinski 1985). Its output is
didactic, so that one does not have to be a numerical analyst to interpret the results. I
have used the BASIC, FORTRAN, Pascal and C versions of PARANOIA, and have seen
reports of Modula-2 and ADA versions.
In the area of special functions, Cody and Waite (1980) have devised software to
both calculate and test a number of the commonly used transcendental functions
(sin, cos, tan, log, exp, sqrt, xy). The ELEFUNT testing software is available in their
book, written in FORTRAN. A considerable effort would be needed to translate it into
other programming languages.
An example from our own work is the program DUBLTEST, which is designed to
determine the precision to which the standard special functions in BASIC are
computed (Nash and Walker-Smith 1987). On the IBM PC, many versions of
Microsoft BASIC (GWBASIC, BASICA) would only compute such functions in single
precision, even if the variables involved as arguments or results were double
precision. For some nonlinear parameter estimation problems, the resulting low
precision results caused unsatisfactory operation of our software.
Since most algorithms are in some sense iterative, it is necessary that one has
some criterion for deciding when sufficient progress has been made that the
execution of a program can be halted. While, in general, I avoid tests which
require knowledge of the machine, preferring to use the criterion that no progress
has been made in an iteration, it is sometimes convenient or even necessary to
employ tests involving tolerances related to the structure of the computing device
at hand.
The most useful property of a system which can be determined systematically is
the machine precision. This is the smallest number, eps, such that
1 + eps > 1    (1.3)
within the arithmetic of the system. (ADA is a registered name of the US Department
of Defense.) Two programs in FORTRAN for determining the
machine precision, the radix or base of the arithmetic, and machine rounding or
truncating properties have been given by Malcolm (1972). The reader is cautioned
that, since these programs make use of tests of conditions like (1.3), they may be
frustrated by optimising compilers which are able to note that (1.3) in exact
arithmetic is equivalent to
eps>0.
(1.4)
Condition (1.4) is not meaningful in the present context. The Univac compilers
have acquired some notoriety in this regard, but they are by no means the only
offenders.
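A minimal Pascal sketch of such a determination of eps, based on the test (1.3), is given below. It is illustrative only and, as just noted, an optimising compiler or an extended-precision accumulator may defeat the simple comparisons it uses.

program epsfind;
{ Halve eps until (1 + eps) is stored as 1; the last successful value
  approximates the machine precision defined by (1.3). }
var
  eps, test: real;
begin
  eps := 1.0;
  test := 1.0 + eps;
  while test > 1.0 do
  begin
    eps := 0.5 * eps;
    test := 1.0 + eps
  end;
  eps := 2.0 * eps;               { last value for which 1 + eps > 1 held }
  writeln('machine precision is approximately ', eps)
end.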
To find the machine precision and radix by using arithmetic of the computer
itself, it is first necessary to find a number q such that (1 + q ) and q are
represented identically, that is, the representation of 1 having the same exponent
as q has a digit in the (t + 1)th radix position where t is the number of radix digits
in the floating-point mantissa. As an example, consider a four decimal digit
machine. If q = 10 000 or greater, then q is represented as (say)
0.1000 * 1E5
while 1 is represented as
0.00001 * 1E5.
The action of storing the five-digit sum
0.10001 * 1E5
in a four-digit word causes the last digit to be dropped. In the example,
q = 10 000 is the smallest number which causes (1 + q) and q to be represented
identically, but any number
q > 9999
will have the same property. If the machine under consideration has radix R, then
any
q ≥ R^t    (1.5)
will have the desired property. If, moreover, q and R are represented so that
q < R^(t+1)    (1.6)
then
q + R > q.    (1.7)
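A sketch of the procedure just described, in the spirit of the programs of Malcolm (1972), is given below in Pascal. It is illustrative only, and the same caution about optimising compilers applies.

program radixfind;
{ First grow q until adding 1 no longer changes its representation, so that
  q lies beyond the precision of the mantissa; then add successively larger
  numbers b until the sum registers.  The difference (q + b) - q is then the
  radix R of the arithmetic. }
var
  q, b, radix: real;
begin
  q := 1.0;
  while ((q + 1.0) - q) = 1.0 do
    q := 2.0 * q;
  b := 1.0;
  while ((q + b) - q) = 0.0 do
    b := 2.0 * b;
  radix := (q + b) - q;
  writeln('radix appears to be ', radix:8:1)
end.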
(1.8)

(1.9)

                 Exact      Rounded      Truncated
a                5008       5008         5008
b                5007       5007         5007
a + b            10015      0.1002*1E5   0.1001*1E5
(a + b)/2        5007.5     5010         5005
That this can and does occur should be kept in mind whenever averages are
computed. For instance, the calculations are quite stable if performed as
(a + b)/2 = 5000 + [(a - 5000) + (b - 5000)]/2.
Taking account of every eventuality of this sort is nevertheless extremely tedious.
Another annoying characteristic of small machines is the frequent absence of
extended precision, also referred to as double precision, in which extra radix digits
are made available for calculations. This permits the user to carry out arithmetic
operations such as accumulation, especially of inner products of vectors, and
averaging with less likelihood of catastrophic errors. On equipment which functions with number representations similar to the IBM/360 systems, that is, six
hexadecimal (R = 16) digits in the mantissa of each number, many programmers
use the so-called double precision routinely. Actually t = 14, which is not double
six. In most of the calculations that I have been asked to perform, I have not
found such a sledgehammer policy necessary, though the use of this feature in
appropriate situations is extremely valuable. The fact that it does not exist on
most small computers has therefore coloured much of the development which
follows.
Finally, since the manufacturers' basic software has been called into question above,
the user may also wonder about their various application programs and packages.
While there are undoubtedly some good programs, my own experience is that the
quality of such software is very mixed. Badly written and poorly documented
programs may take longer to learn and understand than a satisfactory homegrown
product takes to code and debug. A major fault with many software products is that
they lack references to the literature and documentation of their pedigree and
authorship. Discussion of performance and reliability tests may be restricted to very
simple examples. Without such information, it may be next to impossible to
determine the methods used or the level of understanding of the programmer of the
task to be carried out, so that the user is unable to judge the quality of the product.
Some developers of mathematical and statistical software are beginning to recognise
the importance of background documentation, and their efforts will hopefully be
rewarded with sales.
1.3. SOURCES OF PROGRAMS
When the first edition of this book was prepared, there were relatively few sources of
mathematical software in general, and in essence none (apart from a few manufacturers' offerings) for users of small computers. This situation has changed remarkably, with some thousands of suppliers. Source codes of numerical methods,
however, are less widely available, yet many readers of this book may wish to
conduct a search for a suitable program already coded in the programming language
to be used before undertaking to use one of the algorithms given later.
How should such a search be conducted? I would proceed as follows.
First, if FORTRAN is the programming language, I would look to the major
collections of mathematical software in the Collected Algorithms of the Association for
Computing Machinery (ACM). This collection, abbreviated as CALGO, is comprised
of all the programs published in the Communications of the ACM (up to 1975) and
the ACM Transactions on Mathematical Software (since 1975). Other important
collections are EISPACK, LINPACK, FUNPACK and MINPACK, which concern algebraic
eigenproblems, linear equations, special functions and nonlinear least-squares minimisation problems. These and other packages are, at time of writing, available from
the Mathematics and Computer Sciences Division of the Argonne National Laboratory of the US Department of Energy. For those users fortunate enough to have
access to academic and governmental electronic mail networks, an index of software
available can be obtained by sending the message
SEND INDEX
to the pseudo-user NETLIB at node ANL-MCS on the ARPA network (Dongarra and
Grosse 1987). The software itself may be obtained by a similar mechanism.
Suppliers such as the Numerical Algorithms Group (NAG), International Mathematical and Statistical Libraries (IMSL), C Abaci, and others, have packages
designed for users of various computers and compilers, but provide linkable object
code rather than the FORTRAN source. C Abaci, in particular, allows users of the
Scientific Desk to also operate the software within what is termed a problem solving
environment which avoids the need for programming.
For languages other than FORTRAN, less software is available. Several collections of
programs and procedures have been published as books, some with accompanying
diskettes, but once again, the background and true authorship may be lacking. The
number of truly awful examples of badly chosen, badly coded algorithms is alarming,
and my own list of these too long to include here.
Several sources I consider worth looking at are the following.
Maindonald (1984)
A fairly comprehensive collection of programs in BASIC (for a Digital Equipment Corporation VAX computer) is presented covering linear estimation,
statistical distributions and pseudo-random numbers.
Nash and Walker-Smith (1987)
Source codes in BASIC are given for six nonlinear minimisation methods and a
large selection of examples. The algorithms correspond, by and large, to those
presented later in this book.
LEQBO5 (Nash 1984b, 1985)
This single program module (actually there are three starting points for
execution) was conceived as a joke to show how small a linear algebra package
could be made. In just over 300 lines of BASIC is the capability to solve linear
equations, linear least squares, matrix inverse and generalised inverse, symmetric matrix eigenproblem and nonlinear least squares problems. The joke
back-fired in that the capability of this program, which ran on the Sinclair ZX81
computer among other machines, is quite respectable.
Kahaner, Moler and Nash (1989)
This numerical analysis textbook includes FORTRAN codes which illustrate the
material presented. The authors have taken pains to choose this software for
quality. The user must, however, learn how to invoke the programs, as there is
no user interface to assist in problem specification and input.
Press et al (1986) Numerical Recipes
This is an ambitious collection of methods with wide capability. Codes are
offered in FORTRAN, Pascal, and C. However, it appears to have been only
superficially tested and the examples presented are quite simple. It has been
heavily advertised.
Many other products exist and more are appearing every month. Finding out
about them requires effort, the waste of which can sometimes be avoided by using
modern online search tools. Sadly, more effort is required to determine the quality of
the software, often after money has been spent.
Finally on sources of software, readers should be aware of the Association for
Computing Machinery (ACM) Transactions on Mathematical Software which publishes research papers and reports algorithms. The algorithms themselves are available after a delay of approximately 1 year on NETLIB and are published in full in the
Collected Algorithms of the ACM. Unfortunately, many are now quite large programs, and the Transactions on Mathematical Software (TOMS) usually only
publishes a summary of the codes, which is insufficient to produce a working
program. Moreover, the programs are generally in FORTRAN.
Other journals which publish algorithms in some form or other are Applied
Statistics (Journal of the Royal Statistical Society, Part C), the Society for Industrial
and Applied Mathematics (SIAM) journals on Numerical Analysis and on Scientific
and Statistical Computing, the Computer Journal (of the British Computer Society),
as well as some of the specialist journals in computational statistics, physics,
chemistry and engineering. Occasionally magazines, such as Byte or PC Magazine,
include articles with interesting programs for scientific or mathematical problems.
These may be of very variable quality depending on the authorship, but some
exceptionally good material has appeared in magazines, which sometimes offer the
codes in machine-readable form, such as the Byte Information Exchange (BIX) and
disk ordering service. The reader has, however, to be assiduous in verifying the
quality of the programs.
1.4. PROGRAMMING LANGUAGES USED AND STRUCTURED
PROGRAMMING
The algorithms presented in this book are designed to be coded quickly and easily for
operation on a diverse collection of possible target machines in a variety of
programming languages. Originally, in preparing the first edition of the book, I
considered presenting programs in BASIC, but found at the time that the various
dialects of this language were not uniform in syntax. Since then, International
Standard Minimal BASIC (ISO 6373/1984) has been published, and most commonly
available BASICs will run Minimal BASIC without difficulty. The obstacle for the user is
that Minimal BASIC is too limited for most serious computing tasks, in that it lacks
string and file handling capabilities. Nevertheless, it is capable of demonstrating all
the algorithms in this book.
It should also be noted that I have taken pains to make it easy to save a copy of
the screen output to a file by duplicating all the output statements, that is the write
and writeln commands, so that output is copied to a file which the user may name.
(These statements are on the disk files, but deleted from the listings to reduce space
and improve readability.) Input is allowed from an input file to allow examples to be
presented without the user needing to type the appropriate response other than the
name of the relevant example file.
Furthermore, I have taken advantage of features within the MS-DOS operating
system, and supported by compiler directives in Turbo Pascal, which allow for
pipelining of input and output. This has allowed me to use batch files to automate the
running of tests.
In the driver programs I have tried to include tests of the results of calculations, for
example, the residuals in eigenvalue computations. In practice, I believe it is
worthwhile to perform these calculations. When memory is at a premium, they can
be performed off-line in most cases. That is, the results can be saved to disk
(backing storage) and the tests computed as a separate task, with data brought in
from the disk only as needed.
These extra features use many extra bytes of code, but are, of course, easily
deleted. Indeed, for specific problems, 75% or more of the code can be removed.
1.5. CHOICE OF ALGORITHMS
The algorithms in this book have been chosen for their utility in solving a variety of
important problems in computational mathematics and their ease of implementation
to short programs using relatively little working storage. Many topics are left out,
despite their importance, because I feel that there has been insufficient development in
directions relevant to compact numerical methods to allow for a suitable algorithm
to be included. For example, over the last 15 years I have been interested in methods
for the mathematical programming problem which do not require a tableau to be
developed either explicitly or implicitly, since these techniques are generally quite
memory and code intensive. The appearance of the interior point methods associated
with the name of Karmarkar (1984) holds some hope for the future, but currently the
programs are quite complicated.
In the solution of linear equations, my exposition has been confined to Gauss
elimination and the Choleski decomposition. The literature on this subject is,
however, vast and other algorithms exist. These can and should be used if special
circumstances arise to make them more appropriate. For instance, Zambardino
(1974) presents a form of Gauss elimination which uses less space than the one
presented here. This procedure, in ALGOL, is called QUARTERSOLVE because only
n²/4 elements are stored, though an integer vector is needed to store pivot
information and the program as given by Zambardino is quite complicated.
Many special methods can also be devised for matrices having special structures
such as diagonal bands. Wilkinson and Reinsch (1971) give several such algorithms for both linear equations and the eigenvalue problem. The programmer
with many problems of a special structure should consider these. However, I have
found that most users want a reliable general-purpose method for linear equations
because their day-to-day problems vary a great deal. I have deliberately avoided
including a great many algorithms in this volume because most users will likely be
their own implementors and not have a great deal of time to spend choosing,
coding, testing and, most of all, maintaining programs.
Another choice which has been made is that of only including algorithms which
are relatively small in the sense that they fit into the machine all at once. For
instance, in the solution of the algebraic eigenvalue problem on large computers,
conventionally one reduces the matrix to a special form such as a tridiagonal or a
Hessenberg matrix, solves the eigenproblem of the simpler system then backtransforms the solutions. Each of the three phases of this procedure could be
fitted into a small machine. Again, for the practitioner with a lot of matrices to
solve or a special requirement for only partial solution, such methods should be
employed. For the one-at-a-time users, however, there is three times as much
program code to worry about.
The lack of double-precision arithmetic on the machines I used to develop the
algorithms which are included has no doubt modified my opinions concerning
algorithms. Certainly, any algorithm requiring inner products of vectors, that is
xT y = x1 y1 + x2 y2 + . . . + xn yn    (1.10)
cannot be executed as accurately without extended-precision arithmetic (Wilkinson 1963). This has led to my abandonment of algorithms which seek to find the
minimum of a function along a line by use of gradient information. Such
algorithms require the derivative along the line and employ an inner product to
compute this derivative. While these methods are distinctly unreliable on a
machine having only a single, low-precision arithmetic, they can no doubt be used
very effectively on other machines.
From the above discussion it will be evident that the principles guiding
algorithm selection have been:
(i) shortness of program which results from implementation and low storage
requirement, and
(ii) general utility of the method and importance of the problem which it solves.
To these points should be added:
(iii) proven reliability in a number of tests
(iv) the ease and speed with which a user can obtain useful results from the
algorithms.
The third point is very important. No program should be considered acceptable until
it has been tested fairly extensively. If possible, any method which gives solutions
that can be checked by computing diagnostics should compute such information
routinely. For instance, I have had users of my eigenvalue/eigenvector programs call
me to say, 'Your program doesn't work!' In all cases to date they have been
premature in their judgement, since the residuals computed as a routine adjunct to
the eigensolution formation have shown the output to be reasonable even though it
might be very different from what the user expected. Furthermore, I have saved
myself the effort of having to duplicate their calculation to prove the correctness of
the results. Therefore, if at all possible, such checks are always built into my
programs.
The fourth point is important if users are to be able to try out the ideas presented
in this book. As a user myself, I do not wish to spend many hours mastering the
details of a code. The programs are to be treated as tools, not an end in themselves.
These principles lead to the choice of the Givens reduction in algorithm 4 as a
method for the solution of least-squares problems where the amount of data is too
great to allow all of it to be stored in the memory at once. Similarly, algorithms 24
and 25 require the user to provide a rule for the calculation of the product of a
matrix and a vector as a step in the solution of linear equations or the algebraic
eigenproblem. However, the matrix itself need not be stored explicitly. This
avoids the problem of designing a special method to take advantage of one type of
matrix, though it requires rather more initiative from the user as it preserves this
measure of generality.
In designing the particular forms of the algorithms which appear, a conscious
effort has been made to avoid the requirement for many tolerances. Some
machine-dependent quantities are unfortunately needed (they can in some cases
be calculated by the program but this does lengthen the code), but as far as
possible, and especially in determining when iterative algorithms have converged,
devices are used which attempt to ensure that as many digits are determined as
the machine is able to store. This may lead to programs continuing to execute long
after acceptable answers have been obtained. However, I prefer to sin on the side
of excess rather than leave results wanting in digits. Typically, the convergence
test requires that the last and present iterates be identical to the precision of the
machine by means of a test such as
if x + delta + offset = x + offset then halt;
where offset is some modest number such as 10. On machines which have an
accumulator with extra digits, this type of test may never be satisfied, and must be
replaced by
y := x + delta + offset;
z := x + offset;
if y = z then halt;
The tolerance in this form of test is provided by the offset: within the computer the
representations of y and z must be equal to halt execution. The simplicity of this type
of test usually saves code though, as mentioned, at the possible expense of execution
time.
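As an illustration of this form of test, the sketch below applies it to a trivial iteration (a Newton square-root finder, which is not an algorithm of this book); the names delta and offset follow the text.

program convtest;
{ Newton iteration for the square root of 2, halted by the offset form of
  convergence test described above. }
var
  x, delta, offset, y, z: real;
begin
  offset := 10.0;                    { a modest offset, as suggested }
  x := 1.0;                          { starting guess }
  repeat
    delta := 0.5 * (2.0 / x - x);    { Newton correction for x*x = 2 }
    y := x + delta + offset;
    z := x + offset;
    x := x + delta
  until y = z;                       { halt when the correction no longer registers }
  writeln('square root of 2 is approximately ', x)
end.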
1.6. A METHOD FOR EXPRESSING ALGORITHMS
In the first edition of this work, algorithms were expressed in step-and-description
form. This allowed users to program them in a variety of programming languages.
Indeed, one of the strengths of the first edition was the variety of implementations.
At the time it appeared, a number of computers had their own languages or dialects,
and specialisation to one programming language would have inhibited users of these
special machines. Now, however, computer users are unlikely to be willing to type in
code if a machine-readable form of an algorithm exists. Even if the programming
language must be translated, having a workable form is useful as a starting point.
The original codes for the first edition were in BASIC for a Data General NOVA.
Later these codes were made to run on a North Star Horizon. Some were made to
work on a Hewlett-Packard 9830A. Present BASIC versions run on various common
microcomputers under the Microsoft BASIC dialect; however, since I have used very
conservative coding practices, apart from occasional syntactic deviations, they
conform to ISO Minimal BASIC (ISO 6373-1984).
Rather than proof-reading the algorithms for the first edition, I re-coded them in
FORTRAN. These codes exist as NASHLIB, and were and are commercially available
from the author. I have not, however, been particularly satisfied that the FORTRAN
implementation shows the methods to advantage, since the structure of the algorithms seems to get lost in the detail of FORTRAN code. Also, the working parts of the
codes are overshadowed by the variable definitions and subroutine calls. Compact
methods are often best placed in-line rather than called as standard subprograms as I
have already indicated.
In the current edition, I want to stay close to the original step-and-description
form of the algorithms, but nevertheless wish to offer working codes which could be
distributed in machine-readable form. I have chosen to present the algorithms in
Borland Turbo Pascal. This has the following justification.
(i) Pascal allows comments to be placed anywhere in the code, so that the
original style for the algorithms, except for the typographic conventions, could be
kept.
(ii) Turbo Pascal is available for many machines and is relatively inexpensive. It
is used as a teaching tool at many universities and colleges, including the University
of Ottawa. Version 3.01a of the Turbo Pascal system was used in developing the
codes which appear here. I intend to prepare versions of the codes to run under later
versions of this programming environment.
(iii) The differences between Turbo and Standard Pascal are unlikely to be
important for the methods, so that users of other Pascal compilers can also use these
codes.
(iv) Pascal is close enough to many other programming languages to allow for
straightforward translation of the codes.
A particular disadvantage of Turbo Pascal for my development work is that I have
yet to find a convenient mechanism allowing automatic compilation and execution of
codes, which would permit me to check a complete set of code via batch execution.
From the perspective of the physical length of the listings, the present algorithms are
also longer than I would like because Pascal requires program headings and
declarations. In the procedural parts of the codes, begin and end statements also
add to the line count to some extent.
From the user perspective, the requirement that matrix sizes be explicitly specified
can be a nuisance. For problems with varying matrix sizes it may be necessary to
compile separate versions of programs.
Section 1.8 notes some other details of algorithm expression which relate to the
ease of use of the codes.
The use of an include file which is not a complete procedure or function is not
permitted by Turbo Pascal 5.0.
2. The same program code (startup.pas) allows an output file to be specified so
that all output which appears on the console screen is copied to a file. The name
for this file is stored in the global variable confname, and the file is referred to in
programs by the global text variable confile. Output is saved by the crude but
effective means of duplicating every write(. . .) and writeln(. . .) statement with
equivalent write(confile, . . .) and writeln(confile, . . .) statements (a sketch of this
pattern is given after these notes).
3. To make the algorithms less cluttered, these write and writeln statements to
confile do not appear in the listings. They are present in the files on diskette.
4. To discourage unauthorised copying of the diskette files, all commentary and
documentation of the algorithm codes has been deleted.
5. To allow for execution of the programs from operating system commands (at
least in MS-DOS), compiler directives have been included at the start of all
driver programs. Thus, if a compiled form of code dr0102.pas exists as
dr0102.com, and a file dr0102x contains text equivalent to the keyboard input
needed to correctly run this program, the command
dr0102 < dr0102x
will execute the program for the given data.
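The duplication pattern mentioned in notes 2 and 3 can be illustrated by the following fragment. The file name and output line are invented; the variables confile and confname are those used in the driver codes, but the startup code that opens the file is only hinted at here.

program dupdemo;
{ Each console write/writeln is duplicated by a write/writeln to the text
  file confile, so that a record of the session is kept. }
var
  confile: text;
  confname: string[64];
begin
  confname := 'console.log';          { in practice the user supplies this name }
  assign(confile, confname);
  rewrite(confile);
  writeln('residual norm = ', 1.2345E-6);
  writeln(confile, 'residual norm = ', 1.2345E-6);   { duplicate to the file }
  close(confile)
end.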
Chapter 2
FORMAL PROBLEMS IN LINEAR ALGEBRA
If there exists a relationship
Ai1 x1 + Ai2 x2 + . . . + Ain xn = bi        i = 1, 2, . . . , n    (2.1)
between the n quantities xj with the coefficients Aij and right-hand side elements
bi, i = 1, 2, . . . , n, then (2.1) is a set of n simultaneous linear equations in n
unknowns xj, j = 1, 2, . . . , n. It is simpler to write this problem in matrix form
Ax = b
(2.2)
where the coefficients have been collected into the matrix A, the right-hand side is
now the vector b and the unknowns have been collected as the vector x. A further
way to write the problem is to collect each column of A (say the jth) into a column
vector (i.e. aj). Then we obtain
x1 a1 + x2 a2 + . . . + xn an = b.    (2.3)
Numerous textbooks on linear algebra, for instance Mostow and Sampson
(1969) or Finkbeiner (1966), will provide suitable reading for anyone wishing to
learn theorems and proofs concerning the existence of solutions to this problem.
For the purposes of this monograph, it will suffice to outline a few basic properties
of matrices as and when required.
Consider a set of n vectors of length n, that is
a1, a2, . . . , an.
(2.4)
These vectors are linearly independent if there exists no set of parameters
xj, j = 1, 2, . . . , n (not all zero), such that
x1 a1 + x2 a2 + . . . + xn an = 0    (2.5)
where 0 is the null vector having all components zero. If the vectors aj are now
assembled to make the matrix A and are linearly independent, then it is always
possible to find an x such that (2.2) is satisfied. Other ways of stating the
condition that the columns of A are linearly independent are that A has full rank
or
rank(A) = n
(2.6)
or that A is non-singular.
If only k < n of the vectors are linearly independent, then
rank(A) = k
(2.7)
(2.8)
for x = 0
for x = 1.
(2.9)
(2.10)
A x ≈ b    (2.14)
r = b - A x    (2.15)
(2.16)
|| r || > 0    for r ≠ 0, and || 0 || = 0    (2.17)
|| c r || = | c | || r ||    (2.18)
|| r || = (rT r)^(1/2)    (2.20)
will be used. The superscript T denotes transposition, so the norm is a scalar. The
square of the Euclidean norm of r,
rT r = Σi ri² ,    (2.21)
is appropriately called the sum of squares. The least-squares solution x of (2.14) is
that set of parameters which minimises this sum of squares. In cases where
rank(A) < n this solution is not unique. However, further conditions may be
imposed upon the solution to ensure uniqueness. For instance, it may be required
that in the non-unique case, x shall be that member of the set of vectors which
minimises rT r which has x T x a minimum also. In this case x is the unique
minimum-length least-squares solution.
If the minimisation of r T r with respect to x is attempted directly, then using
(2.15) and elementary calculus gives
AT Ax = A T b
(2.22)
as the set of conditions which x must satisfy. These are simply n simultaneous
linear equations in n unknowns x and are called the normal equations. Solution of
the least-squares problem via the normal equations is the most common method
by which such problems are solved. Unfortunately, there are several objections to
such an approach if it is not carefully executed, since the special structure of ATA
and the numerical instabilities which attend its formation are ignored at the peril
of meaningless computed values for the parameters x.
Firstly, any matrix B such that
xT B x > 0    for all x    (2.23)
is called positive definite. If only the weaker condition
xT B x ≥ 0    (2.24)
for all x, B is non-negative definite or positive semidefinite. The last two terms are
synonymous. The matrix AT A gives the quadratic form
Q = xT AT A x    (2.25)
(2.26)
(2.27)
(2.14)
is obtained. The data and solutions for this problem are given as table 3.1 and
example 3.2.
bij = hi - hj + rij    (2.29)
where rij is the error made in taking the measurement. If m height differences are
measured, the best set of heights h is obtained as the solution to the least-squares
problem
minimise rT r
(2.30)
where
r = b - A h
and each row of A has only two non-zero elements, 1 and -1, corresponding to
the indices of the two points involved in the height-difference measurement. Sometimes the error is defined as the weighted residual
rij = [bij - (hi - hj)]/dij
where dij is the horizontal distance between the two points (that is, the measurement error increases with distance).
A special feature of this particular problem is that the solution is only
determined to within a constant, h0, because no origin for the height scale has
been specified. In many instances, only relative heights are important, as in a
study of subsidence of land. Nevertheless, the matrix A is rank-deficient, so any
method chosen to solve the problem as it has been presented should be capable of
finding a least-squares solution despite the singularity (see example 19.2).
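The structure of this problem is easily set up in a program. The sketch below is illustrative only (the data are invented, and it is not an algorithm of this book): it builds the matrix A from lists of point indices and evaluates the residuals r = b - A h for a trial set of heights.

program heights;
{ Each measurement k involves points ip[k] and jp[k]; row k of A has +1 in
  column ip[k] and -1 in column jp[k]. }
const
  m = 3;                             { number of measurements }
  n = 3;                             { number of points }
var
  A: array[1..m, 1..n] of real;
  b, r: array[1..m] of real;
  h: array[1..n] of real;
  ip, jp: array[1..m] of integer;
  j, k: integer;
begin
  ip[1] := 1; jp[1] := 2; b[1] := 2.1;   { invented observations }
  ip[2] := 2; jp[2] := 3; b[2] := 1.0;
  ip[3] := 1; jp[3] := 3; b[3] := 3.0;
  for k := 1 to m do
    for j := 1 to n do A[k, j] := 0.0;
  for k := 1 to m do
  begin
    A[k, ip[k]] := 1.0;
    A[k, jp[k]] := -1.0
  end;
  h[1] := 0.0; h[2] := -2.1; h[3] := -3.05;  { trial heights; origin arbitrary }
  for k := 1 to m do
  begin
    r[k] := b[k];
    for j := 1 to n do r[k] := r[k] - A[k, j] * h[j];
    writeln('residual ', k:2, ' = ', r[k])
  end
end.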
2.4. THE INVERSE AND GENERALISED INVERSE OF A MATRIX
An important concept is that of the inverse of a square matrix A. It is defined as
that square matrix, labelled A^(-1), such that
A^(-1) A = A A^(-1) = 1n    (2.31)
where 1n is the unit matrix of order n. The inverse exists only if A has full rank.
Algorithms exist which compute the inverse of a matrix explicitly, but these are of
value only if the matrix inverse itself is useful. These algorithms should not be
used, for instance, to solve equation (2.2) by means of the formal expression
x = A^(-1) b
(2.32)
(2.34)
A+ A =
(2.35)
but in this case x is not defined uniquely since it can contain arbitrary components
from the orthogonal complement of the space spanned by the columns of A. That
is, we have
(2.36)
x = A+ b + (1n - A+ A) g
where g is any vector of order n.
The normal equations (2.22) must still be satisfied. Thus in the full-rank case, it
is straightforward to identify
A+ = (AT A)^(-1) AT .
(2.37)
(2.39)
A A+ A = A    (2.40)
and
(A A+)T = A A+ ;    (2.41)
this can indeed be made to happen. The proposed solution (2.36) is therefore a
least-squares solution under the conditions (2.40) and (2.41) on A+. In order that
(2.36) gives the minimum-length least-squares solution, it is necessary that xT x be
minimal also. But from equation (2.36) we find
xT x = bT (A+)T A+ b + gT (1 - A+ A)T (1 - A+ A) g + 2 gT (1 - A+ A)T A+ b    (2.42)
which can be seen to have a minimum at
g =0
if
(1 - A+ A)T
(2.43)
is the annihilator of A+ b, thus ensuring that the two contributions (that is, from b
and g) to x T x are orthogonal. This requirement imposes on A + the further
conditions
A+ A A+ = A+
(2.44)
(A+ A)T = A+ A.
(2.45)
The four conditions (2.40), (2.41), (2.44) and (2.45) were proposed by Penrose
(1955). The conditions are not, however, the route by which A + is computed.
Here attention has been focused on one generalised inverse, called the Moore-Penrose inverse. It is possible to relax some of the four conditions and arrive at
other types of generalised inverse. However, these will require other conditions to
be applied if they are to be specified uniquely. For instance, it is possible to
consider any matrix which satisfies (2.40) and (2.41) as a generalised inverse of A
since it provides, via (2.33), a least-squares solution to equation (2.14). However,
in the rank-deficient case, (2.36) allows arbitrary components from the null space
of A to be added to this least-squares solution, so that the two-condition generalised inverse is specified incompletely.
Over the years a number of methods have been suggested to calculate generalised
inverses. Having encountered some examples of dubious design, coding or applications of such methods, I strongly recommend testing computed generalised inverse
matrices to ascertain the extent to which conditions (2.40), (2.41), (2.44) and (2.45)
are satisfied (Nash and Wang 1986).
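Such a test is straightforward to program. The sketch below is illustrative only: the 2 by 2 matrices are invented, and the trial A+ here is simply the inverse of a full-rank A, so all the errors reported should be zero. The same checks applied to a computed generalised inverse reveal how nearly conditions (2.40), (2.41), (2.44) and (2.45) hold.

program penrose;
{ Check the four Penrose conditions for a trial generalised inverse Ap of A. }
const
  n = 2;
type
  mat = array[1..n, 1..n] of real;
var
  A, Ap, AAp, ApA, T1, T2: mat;
  i, j: integer;
  err: real;

procedure mult(var X, Y, Z: mat);          { Z := X * Y }
var
  i, j, k: integer;
begin
  for i := 1 to n do
    for j := 1 to n do
    begin
      Z[i, j] := 0.0;
      for k := 1 to n do Z[i, j] := Z[i, j] + X[i, k] * Y[k, j]
    end
end;

function maxdiff(var X, Y: mat): real;     { largest element of |X - Y| }
var
  i, j: integer;
  d, big: real;
begin
  big := 0.0;
  for i := 1 to n do
    for j := 1 to n do
    begin
      d := abs(X[i, j] - Y[i, j]);
      if d > big then big := d
    end;
  maxdiff := big
end;

begin
  A[1, 1] := 2.0; A[1, 2] := 1.0; A[2, 1] := 1.0; A[2, 2] := 1.0;
  Ap[1, 1] := 1.0; Ap[1, 2] := -1.0; Ap[2, 1] := -1.0; Ap[2, 2] := 2.0;
  mult(A, Ap, AAp);
  mult(Ap, A, ApA);
  mult(AAp, A, T1);                        { A A+ A, condition (2.40) }
  mult(ApA, Ap, T2);                       { A+ A A+, condition (2.44) }
  writeln('error in (2.40) A A+ A = A   : ', maxdiff(T1, A));
  writeln('error in (2.44) A+ A A+ = A+ : ', maxdiff(T2, Ap));
  err := 0.0;                              { (2.41), (2.45): symmetry of AA+ and A+A }
  for i := 1 to n do
    for j := 1 to n do
    begin
      if abs(AAp[i, j] - AAp[j, i]) > err then err := abs(AAp[i, j] - AAp[j, i]);
      if abs(ApA[i, j] - ApA[j, i]) > err then err := abs(ApA[i, j] - ApA[j, i])
    end;
  writeln('error in (2.41)/(2.45) symmetry : ', err)
end.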
2.5. DECOMPOSITIONS OF A MATRIX
In order to carry out computations with matrices, it is common to decompose
them in some way to simplify and speed up the calculations. For a real m by n
matrix A, the QR decomposition is particularly useful. This equates the matrix A
with the product of an orthogonal matrix Q and a right- or upper-triangular
matrix R, that is
A = QR
(2.46)
where Q is m by m and
QT Q = QQT = 1 m
(2.47)
and R is m by n with all elements
Rij = 0
for i > j.
(2.48)
(2.50)
VT V = V VT = 1n .    (2.51)
Sij = 0    for i ≠ j
Note that the zeros below the diagonal in both R and S imply that, apart from
orthogonality conditions imposed by (2.47), the elements of columns (n + 1),
(n + 2), . . . , m of Q are arbitrary. In fact, they are not needed in most calculations, so will be dropped, leaving the m by n matrix U, where
UT U = 1n .
(2.52)
so that the decomposition becomes
A = U S VT .    (2.53)
Another decomposition of importance is that into triangular factors,
A = L R .    (2.54)
In the literature this is also known as the LU decomposition from the use of U for
upper triangular. Here another mnemonic, U for unitary has been employed.
If the matrix A is symmetric and positive definite, the decomposition
A = LLT
(2.55)
is possible and is referred to as the Choleski decomposition.
A scaled form of this decomposition with unit diagonal elements for L can be written
A = LDL T
where D is a diagonal matrix.
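By way of illustration only (the book develops its own algorithm for this decomposition elsewhere), a bare-bones Choleski factorisation of a small symmetric positive definite matrix can be coded as follows. The test matrix is invented and no safeguards against indefinite or singular matrices are included.

program choldemo;
{ Compute L such that A = L * L-transpose for a small positive definite A. }
const
  n = 3;
var
  A, L: array[1..n, 1..n] of real;
  i, j, k: integer;
  sum: real;
begin
  A[1, 1] := 4.0; A[1, 2] := 2.0; A[1, 3] := 2.0;   { invented test matrix }
  A[2, 1] := 2.0; A[2, 2] := 5.0; A[2, 3] := 3.0;
  A[3, 1] := 2.0; A[3, 2] := 3.0; A[3, 3] := 6.0;
  for i := 1 to n do
    for j := 1 to n do L[i, j] := 0.0;
  for j := 1 to n do
  begin
    sum := A[j, j];
    for k := 1 to j - 1 do sum := sum - L[j, k] * L[j, k];
    L[j, j] := sqrt(sum);                  { fails if A is not positive definite }
    for i := j + 1 to n do
    begin
      sum := A[i, j];
      for k := 1 to j - 1 do sum := sum - L[i, k] * L[j, k];
      L[i, j] := sum / L[j, j]
    end
  end;
  for i := 1 to n do
  begin
    for j := 1 to n do write(L[i, j]:10:5);
    writeln
  end
end.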
To underline the importance of decompositions, it can be shown by direct
substitution that if
A = USVT
(2.53)
then the matrix
A+ = V S+ UT    (2.56)
where
S+ii = 1/Sii    for Sii ≠ 0
S+ii = 0        for Sii = 0    (2.57)
satisfies the four conditions (2.40), (2.41), (2.44) and (2.45), that is
AA + A = USVT VS+ UT USVT
= USS+ SVT
= USVT = A
(2.58)
(2.59)
(2.60)
and
(A+ A)T = (V S+ UT U S VT)T = (V S+ S VT)T
= V S+ S VT = A+ A.    (2.61)
2.6. THE MATRIX EIGENVALUE PROBLEM
The matrix eigenvalue problem is that of finding, for an n by n matrix A, an eigenvalue e and corresponding eigenvector x such that
A x = e x.    (2.62)
There will be up to n eigensolutions (e, x) for any matrix (Wilkinson 1965) and
finding them for various types of matrices has given rise to a rich literature. In
many cases, solutions to the generalised eigenproblem
Ax = eBx
(2.63)
are wanted, where B is another n by n matrix. For matrices which are of a size
that the computer can accommodate, it is usual to transform (2.63) into type
(2.62) if this is possible. For large matrices, an attempt is usually made to solve
(2.63) itself for one or more eigensolutions. In all the cases where the author has
encountered equation (2.63) with large matrices, A and B have fortunately been
symmetric, which provides several convenient simplifications, both theoretical and
computational.
Example 2.5. Illustration of the matrix eigenvalue problem
In quantum mechanics, the use of the variation method to determine approximate
energy states of physical systems gives rise to matrix eigenvalue problems if the
trial functions used are linear combinations of some basis functions (see, for
instance, Pauling and Wilson 1935, p 180ff).
If the trial function is F, and the energy of the physical system in question is
described by the Hamiltonian operator H, then the variation principle seeks
stationary values of the energy functional
C = (F, HF)/(F, F)    (2.64)
subject to the normalisation condition
(F, F) = 1
(2.65)
Expanding the trial function F in terms of the basis functions fi leads to an
eigenproblem of the form (2.63)
with
Aij = (fi, H fj)    (2.67)
and
Bij = (fi, fj).    (2.68)
It is obvious that if B is a unit matrix, that is, if
(fi, fj) = δij = 1    for i = j
               = 0    for i ≠ j    (2.69)
Chapter 3
THE SINGULAR-VALUE DECOMPOSITION AND ITS USE TO SOLVE
LEAST-SQUARES PROBLEMS
3.1. INTRODUCTION
This chapter presents an algorithm for accomplishing the powerful and versatile
singular-value decomposition. This allows the solution of a number of problems to
be realised in a way which permits instabilities to be identified at the same time.
This is a general strategy I like to incorporate into my programs as much as
possible since I find succinct diagnostic information invaluable when users raise
questions about computed answers; users do not in general raise too many idle
questions! They may, however, expect the computer and my programs to produce
reliable results from very suspect data, and the information these programs
generate together with a solution can often give an idea of how trustworthy are
the results. This is why the singular values are useful. In particular, the appearance of singular values differing greatly in magnitude implies that our data are
nearly collinear. Collinearity introduces numerical problems simply because small
changes in the data give large changes in the results. For example, consider the
following two-dimensional vectors:
A = (1, 0)T
B = (1, 0.1)T
C = (0.95, 0.1)T.
Vector C is very close to vector B, and both form an angle of approximately 6°
with vector A. However, while the angle between the vector sums (A + B) and
(A + C) is only about 0.07°, the angle between (B - A) and (C - A) is greater
than 26°. On the other hand, the set of vectors
A = (1, 0)T
D = (0, 1)T
E = (0, 0.95)T
gives angles between (A + D) and (A + E) and between (D - A) and (E - A) of
approximately 1.5°. In summary, the sum of collinear vectors is well determined,
the difference is not. Both the sum and difference of vectors which are not
collinear are well determined.
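The figures quoted above are easily verified. The following sketch computes the two angles for the vectors A, B and C using only standard Pascal functions (the arccosine is formed from arctan); it is an illustration, not part of the book's algorithms.

program collin;
{ Angles (in degrees) between A+B and A+C, and between B-A and C-A,
  for A = (1, 0), B = (1, 0.1), C = (0.95, 0.1). }
var
  piv: real;

function angle(x1, y1, x2, y2: real): real;
var
  c: real;
begin
  c := (x1 * x2 + y1 * y2) /
       (sqrt(x1 * x1 + y1 * y1) * sqrt(x2 * x2 + y2 * y2));
  angle := arctan(sqrt(1.0 - c * c) / c) * 180.0 / piv   { arccos via arctan }
end;

begin
  piv := 4.0 * arctan(1.0);
  writeln('angle between (A+B) and (A+C): ', angle(2.0, 0.1, 1.95, 0.1):8:3);
  writeln('angle between (B-A) and (C-A): ', angle(0.0, 0.1, -0.05, 0.1):8:3)
end.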
B = A V    (3.1)
where
biT bj = Si² δij    (3.2)
and
V VT = VT V = 1n .    (3.3)
Here δij is the Kronecker delta
δij = 1    for i = j
    = 0    for i ≠ j.    (3.4)
The quantities Si may, as yet, be either positive or negative, since only their
square is defined by equation (3.2). They will henceforth be taken arbitrarily as
positive and will be called singular values of the matrix A. The vectors
uj = bj/Sj
(3.5)
which can be computed when none of the Sj is zero, are unit orthogonal vectors.
Collecting these vectors into a real m by n matrix, and the singular values into a
diagonal n by n matrix, it is possible to write
B = US
(3.6)
UT U = 1 n
(3.7)
where
1n is a unit matrix of order n.
In the case that some of the Sj are zero, equations (3.1) and (3.2) are still valid,
but the columns of U corresponding to zero singular values must now be
constructed such that they are orthogonal to the columns of U computed via
equation (3.5) and to each other. Thus equations (3.6) and (3.7) are also satisfied.
An alternative approach is to set the columns of U corresponding to zero singular
values to null vectors. By choosing the first k of the singular values to be the
non-zero ones, which is always possible by simple permutations within the matrix
V, this causes the matrix UT U to be a unit matrix of order k augmented to order n
with zeros. This will be written
(3.8)
While not part of the commonly used definition of the svd, it is useful to require
the singular values to be sorted, so that
S11 > S22 > S33 > . . . > Skk > . . . > Snn.
This allows (2.53) to be recast as a summation
A = Σj uj Sjj vjT .    (2.53a)
Partial sums of this series give a sequence of approximations
A1, A2, . . . , An
where, obviously, the last member of the sequence
An = A
since it corresponds to a complete reconstruction of the svd. The rank-one matrices
uj Sjj vjT
can be referred to as singular planes, and the partial sums (in order of decreasing
singular values) are partial svds (Nash and Shlien 1987).
A combination of (3.1) and (3.6) gives
AV = US
(3.9)
(2.53)
X = x cos φ + y sin φ
Y = -x sin φ + y cos φ .
(3.13)
(3.14)
There is a variety of choices for the angle φ, or more correctly for the sine and
cosine of this angle, which satisfy (3.14). Some of these are mentioned by
Hestenes (1958), Chartres (1962) and Nash (1975). However, it is convenient if
the rotation can order the columns of the orthogonalised matrix B by length, so
that the singular values are in decreasing order of size and those which are zero
(or infinitesimal) are found in the lower right-hand corner of the matrix S as in
equation (3.8). Therefore, a further condition on the rotation is that
XT X xT x > 0.
(3.15)
Because only two columns are involved in the kth rotation, we have
Z(k) = Z(k-1) + (X^T Y)^2 - (x^T y)^2.                        (3.18)
But the rotation is chosen so that X^T Y = 0, hence
Z(k) = Z(k-1) - (x^T y)^2                                     (3.19)
and the non-orthogonality measure cannot increase. The quantities needed are
p = x^T y                                                     (3.20)
q = x^T x - y^T y                                             (3.21)
and
v = (4p^2 + q^2)^(1/2).                                       (3.22)
They are
cos φ = [(v + q)/(2v)]^(1/2)                                  (3.23)
sin φ = p/(v cos φ)                                           (3.24)
for q >= 0, and
sin φ = sgn(p) [(v - q)/(2v)]^(1/2)                           (3.25)
cos φ = p/(v sin φ)                                           (3.26)
for q < 0, where
sgn(p) = {  1   for p >= 0
           -1   for p < 0.                                    (3.27)
Note that having two forms for the calculation of the functions of the angle of
rotation permits the subtraction of nearly equal numbers to be avoided. As the
matrix nears orthogonality p will become small, so that q and v are bound to have
nearly equal magnitudes.
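The two-branch calculation can be made concrete by the following fragment, a sketch only (the variable names are my own and this is not the text of algorithm 1): it computes the sine and cosine of the rotation from p and q as in (3.20)-(3.27), taking the branch according to the sign of q so that no difference of nearly equal quantities is formed.

procedure rotpar(p, q : real; var c0, s0 : real);
{sketch: sine and cosine of the plane rotation from equations (3.22)-(3.27).
 p = x^T y, q = x^T x - y^T y.  The branch on the sign of q avoids forming
 (v - q) when q > 0 and (v + q) when q < 0, which would lose figures
 when p is small.}
var
  vt : real;
begin
  vt := sqrt(4.0*p*p + q*q);           {equation (3.22)}
  if vt = 0.0 then
  begin                                {null columns: no rotation needed}
    c0 := 1.0; s0 := 0.0;
  end
  else if q >= 0.0 then
  begin
    c0 := sqrt((vt + q)/(2.0*vt));     {equation (3.23)}
    s0 := p/(vt*c0);                   {equation (3.24)}
  end
  else
  begin
    s0 := sqrt((vt - q)/(2.0*vt));     {equation (3.25)}
    if p < 0.0 then s0 := -s0;         {sgn(p), equation (3.27)}
    c0 := p/(vt*s0);                   {equation (3.26)}
  end;
end;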
In the first edition of this book, I chose to perform the computed rotation only
when q > r, and to use
sin φ = 1
cos φ = 0                                                     (3.28)
when q < r. This effects an interchange of the columns of the current matrix A.
However, I now believe that it is more efficient to perform the rotations as defined in
the code presented. The rotations (3.28) were used to force nearly null columns of the
final working matrix to the right-hand side of the storage array. This will occur when
the original matrix A suffers from linear dependencies between the columns (that is,
is rank deficient). In such cases, the rightmost columns of the working matrix
eventually reflect the lack of information in the data in directions corresponding to
the null space of the matrix A. The current methods cannot do much about this lack
of information, and it is not sensible to continue computations on these columns. In
the current implementation of the method (Nash and Shlien 1987), we prefer to
ignore columns at the right of the working matrix which become smaller than a
specified tolerance. This has a side effect of speeding the calculations significantly
when rank deficient matrices are encountered.
3.4. A FINE POINT
Equations (3.15) and (3.19) cause the algorithm just described obviously to
proceed towards both an orthogonalisation and an ordering of the columns of the
resulting matrix A(z). However the rotations must be arranged in some sequence
to carry this task to completion. Furthermore, it remains to be shown that some
sequences of rotations will not place the columns in disorder again. For suppose
a1 is orthogonal to all other columns and larger than any of them individually. A
sequential arrangement of the rotations to operate first on columns (1, 2), then
(1, 3), (1, 4), . . . , (1, n), followed by (2, 3), . . . , (2, n), (3, 4), . . . , ((n 1), n) will
be called a cycle or sweep. Such a sweep applied to the matrix described can easily
yield a new a2 for which
(3.29)
if, for instance, the original matrix has a2 = a3 and the norm of these vectors is
greater than 2- times the norm of a1. Another sweep of rotations will put
things right in this case by exchanging a1 and a2. However, once two columns
have achieved a separation related in a certain way to the non-orthogonality
measure (3.17), it can be shown that no subsequent rotation can exchange them.
Suppose that the algorithm has proceeded so far that the non-orthogonality
measure Z satisfies the inequality
Z < t2
(3.30)
where t is some positive tolerance. Then, for any subsequent rotation the
parameter p, equation (3.21), must obey
p2 < t2.
(3.31)
(3.33)
STEP 5. While the inner product used to compute p = x^T y must still be performed,
it is possible to use equation (3.33) and the corresponding result for Y, that is
Y^T Y - y^T y = -(v - q)/2                                    (3.35)
to compute the updated column norms after each rotation. There is a danger that
nearly equal magnitudes may be subtracted, with the resultant column norm having a
large relative error. However, if the application requires information from the largest
singular values and vectors, this approach offers some saving of effort. The changes
needed are:
(1) an initial loop to compute the Z[i], that is, the sum of squares of the elements
of each column of the original matrix A;
(2) the addition of two statements to the end of the main svd loop on k, which,
if a rotation has been performed, update the column norms Z[j] and Z[k] via
formulae (3.34) and (3.35). Note that in the present algorithm the quantities
needed for these calculations have not been preserved. Alternatively, add at
the end of STEP 8 (after the rotation) the statements
Z[j] := Z[j] + 0.5*q*(vt - r);
Z[k] := Z[k] - 0.5*q*(vt - r);
if Z[k] < 0.0 then Z[k] := 0.0;
and at the end of STEP 9 the statements
Z[j] := Z[j] + 0.5*r*(vt - q);
Z[k] := Z[k] - 0.5*r*(vt - q);
if Z[k] < 0.0 then Z[k] := 0.0;
(3) the deletion of the assignments
Z[j] := q;
Z[k] := r;
Singular values
3.3658407311E+00 1.0812763036E+00 6.7431328720E-01 5.3627598567E-01
3.3658407311E+00 1.0812763036E+00 6.7431328701E-01 5.3627598503E-01
Hilbert segment:
Column orthogonality of U
Largest inner product is 5, 5 = -1.44016460160157E-006
Largest inner product is 3, 3 = 5.27355936696949E-016
Singular values
1.27515004411E+000 4.97081651063E-001 1.30419686491E-001 2.55816892287E-002
1.27515004411E+000 4.97081651063E-001 1.30419686491E-001 2.55816892259E-002
3.60194233367E-003
3.60194103682E-003
3.6. USING THE SINGULAR-VALUE DECOMPOSITION TO SOLVE
LEAST-SQUARES PROBLEMS
By combining equations (2.33) and (2.56), the singular-value decomposition can
be used to solve least-squares problems (2.14) via
x = VS+ UT b.
(3.36)
(3.40)
w = S^(-1) U^T b.
(3.41)
(3.42)
(3.43)
(3.46)
is the first singular value less than or equal to the tolerance, then
wi = 0
for i > k.
(3.47)
The components corresponding to small singular values are thus dropped from the
solution. But it is these components which are the least accurately determined
since they arise as differences. Furthermore, from (3.6) and (3.45)
r^T r = b^T b - b^T U S S^+ U^T b
(3.48)
where the limit of the sum in (3.48) is k, the number of principal components
which are included. Thus inclusion of another component cannot increase the
residual sum of squares. However, if a component with a very small singular value
is introduced, it will contribute a very large amount to the corresponding element
of w, and x will acquire large elements also. From (3.48), however, it is the
interaction between the normalised component uj and b which determines how
much a given component reduces the sum of squares. A least-squares problem
will therefore be ill conditioned if b is best approximated by a column of U which
is associated with a small singular value and thus may be computed inaccurately.
On the other hand, if the components corresponding to large singular values
are the ones which are responsible for reducing the sum of squares, then the
problem has a solution which can be safely computed by leaving out the
components which make the elements of w and x large without appreciably
reducing the sum of squares. Unless the unwanted components have no part in
reducing the sum of squares, that is unless
uiT b = 0
for i > k
(3.49)
under the same condition (3.46) for k, then solutions which omit these components
are not properly termed least-squares solutions but principal-components solutions.
In many least-squares problems, poorly determined components will not arise,
all singular values being of approximately the same magnitude. As a rule of
thumb for my clients, I suggest they look very carefully at their data, and in
particular the matrix A, if the ratio of the largest singular value to the smallest
exceeds 1000. Such a distribution of singular values suggests that the columns of A
are not truly independent and, regardless of the conditioning of the problem as
discussed above, one may wish to redefine the problem by leaving out certain
variables (columns of A) from the set used to approximate b.
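The computation just described is compact enough to sketch directly. The procedure below is an illustration of my own and is not the text of algorithm 2 (whose working-array conventions differ); the array types rvector and rmatrix and the relative tolerance q are assumptions. Given U, the singular values S and V from A = U S V^T, it forms x = V S^+ U^T b as in (3.36), dropping any component whose singular value is not greater than q times the largest, as in (3.47).

procedure svdsol(m, n : integer;   {rows and columns of A}
                 var U : rmatrix;  {m by n left singular vectors}
                 var S : rvector;  {singular values, assumed non-negative}
                 var V : rmatrix;  {n by n right singular vectors}
                 var b : rvector;  {right hand side, m elements}
                 q : real;         {relative tolerance for a zero singular value}
                 var x : rvector); {solution, n elements}
{sketch only: principal-components solution x = V S+ U^T b, equation (3.36).
 Assumes rvector = array[1..nmax] of real and
 rmatrix = array[1..nmax,1..nmax] of real for a suitable nmax.}
var
  i, j : integer;
  w, smax, tol : real;
begin
  smax := 0.0;
  for j := 1 to n do
    if S[j] > smax then smax := S[j];
  tol := q*smax;
  for i := 1 to n do x[i] := 0.0;
  for j := 1 to n do
    if S[j] > tol then
    begin
      w := 0.0;                     {w = u(j)^T b / S(j), an element of S+ U^T b}
      for i := 1 to m do w := w + U[i,j]*b[i];
      w := w/S[j];
      for i := 1 to n do x[i] := x[i] + V[i,j]*w;
    end;   {components with small singular values are simply omitted, as in (3.47)}
end;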
Algorithm 2. Least-squares solution via singular-value decomposition
procedure svdlss(nRow, nCol: integer; {order of problem}
W : wmatrix; {working array with decomposition}
Y : rvector; {right hand side vector}
Z : rvector; {squares of singular values}
A : rmatrix; {coefficient matrix (for residuals)}
var Bvec: rvector); {solution vector}
{alg02.pas ==
least squares solution via singular value decomposition.
On entry, W must have the working matrix resulting from the operation of
NashSVD on a real matrix A in alg1.pas. Z will have the squares of the
singular values. Y will have the vector to be approximated. Bvec will be
the vector of parameters (estimates) returned. Note that A could be
omitted if residuals were not wanted. However, the user would then lose
the ability to interact with the problem by changing the tolerance q.
Because this uses a slightly different decomposition from that in the
first edition of Compact Numerical Methods, the step numbers are not
given.
Copyright 1988 J. C. Nash
}
var
i, j, k : integer;
q, s : real;
Example 3.1. The generalised inverse of a rectangular matrix via the singular-value
decomposition
Given the matrices U, V and S of the singular-value decomposition (2.53), then by
the product
(2.56)
A+ = VS+ UT
the generalised (Moore-Penrose) inverse can be computed directly. Consider the
matrix
and
The generalised inverse using the definition (2.57) of S+ is then (to six figures)
in place of
In the above solutions and products, all figures printed by the HP 9830 have been
given rather than the six-figure approximations used earlier in the example.
Example 3.2. Illustration of the use of algorithm 2
The estimation of the coefficients xi, i = 1, 2, 3, 4, 5, in example 2.3 (p. 23),
provides an excellent illustration of the worth of the singular-value decomposition
for solving least-squares problems when the data are nearly collinear. The data for
the problem are given in table 3.1.
To evaluate the various solutions, the statistic
R^2 = 1 - r^T r / [ (b_1 - b̄)^2 + (b_2 - b̄)^2 + . . . + (b_m - b̄)^2 ]       (3.50)
will be used, where
r = b - Ax                                                    (2.15)
is the residual vector and b̄ is the mean of the elements of b, the dependent
variable. The denominator in the second term of (3.50) is often called the total
sum of squares since it is the value of the residual sum of squares for the model
y = constant = b̄.                                            (3.51)
The statistic R^2 can be corrected for the number of degrees of freedom in the
least-squares problem. Thus if there are m observations and k fitted parameters,
the corrected value is 1 - (1 - R^2)(m - 1)/(m - k); the values in parentheses in
table 3.2 are corrected in this way.
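A direct rendering of (3.50) and the degrees-of-freedom correction is sketched below. It is not one of the numbered algorithms of this book, and the array type rvector is again assumed.

procedure rsquare(m, k : integer;     {observations and fitted parameters}
                  var b, r : rvector; {dependent variable and residuals}
                  var R2, R2adj : real);
{sketch: R squared of equation (3.50) and its degrees-of-freedom corrected
 value.  Assumes rvector = array[1..nmax] of real.}
var
  i : integer;
  bbar, rss, tss : real;
begin
  bbar := 0.0;
  for i := 1 to m do bbar := bbar + b[i];
  bbar := bbar/m;
  rss := 0.0; tss := 0.0;
  for i := 1 to m do
  begin
    rss := rss + sqr(r[i]);
    tss := tss + sqr(b[i] - bbar);
  end;
  R2 := 1.0 - rss/tss;
  R2adj := 1.0 - (1.0 - R2)*(m - 1)/(m - k);
end;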
TABLE 3.1. Index numbers for farm income and for agricultural use of nitrogen,
phosphate, potash and petroleum (the data for example 3.2).

Income   Nitrogen   Phosphate   Potash   Petroleum
 305       563         262        461       221
 342       658         291        473       222
 331       676         294        513       221
 339       749         302        516       218
 354       834         320        540       217
 369       973         350        596       218
 378      1079         386        650       218
 368      1151         401        676       225
 405      1324         446        769       228
 438      1499         492        870       230
 438      1690         510        907       237
 451      1735         534        932       235
 485      1778         559        956       236
TABLE 3.2. Solutions for various principal-component regressions using the data in table 3.1.

R^2 (value corrected for degrees of freedom in parentheses):
(a) 0.972586 (0.958879)    (b) 0.969348 (0.959131)
(c) 0.959506 (0.951407)    (d) 0.93839  (0.932789)
Tolerance for zero: (c) 22, (d) 40.

Coefficients x nitrogen, x constant, x phosphate, x potash and x petroleum:
0.290373     0.046191     1.0194       0.15983      4.33368E-3
5.85314E-2   1.1757       0.252296     0.699621     5.14267E-3
4.34851E-2   0.392026     6.93389E-2   1.0115       2.07782
2.54597E-3   0.15299      0.300127     0.469294     0.528881

(f) R^2 = 0.968104 (0.965204); coefficients 0.273448, 1.79375, 2.24851E-3,
0.518966, 0.945525.
(3.53)
for which the singular values are computed as 147119 and 087188, again quite
collinear. The solutions are (e) and (f) in table 3.2 and the values of R2 speak for
themselves.
A sample driver program DR0102.PAS is included on the program diskette.
Appendix 4 describes the sample driver programs and supporting procedures and
functions.
Chapter 4
c = cos
s = sin
(4.2)
(4.3)
(4.4)
This is a simpler angle calculation than that of §3.3 for the orthogonalisation
process, since it involves only one square root per rotation instead of two. That is,
if
p = (z1^2 + y1^2)^(1/2)                                       (4.5)
then we have
c = z1/p                                                      (4.6)
and
s = y1/p.                                                     (4.7)
It is possible, in fact, to perform such transformations with no square roots at
all (Gentleman 1973, Hammarling 1974, Golub and Van Loan 1983) but no way has
so far come to light for incorporating similar ideas into the orthogonalising rotation
of 3.3. Also, it now appears that the extra overhead required in avoiding the square
root offsets the expected gain in efficiency, and early reports of gains in speed now
appear to be due principally to better coding practices in the square-root-free
programs compared to their conventional counterparts.
The Givens transformations are assembled in algorithm 3 to triangularise a real
m by n matrix A. Note that the ordering of the rotations is crucial, since an
element set to zero by one rotation must not be made non-zero by another.
Several orderings are possible; algorithm 3 acts column by column, so that
rotations placing zeros in column k act on zeros in columns 1, 2, . . . , (k - 1) and
leave these elements unchanged. Algorithm 3 leaves the matrix A triangular, that
is
A [i,j] = 0
for i > j
(4.8)
which will be denoted R. The matrix Q contains the transformations, so that the
original m by n matrix is
A = QR.
(4.9)
In words, this procedure simply zeros the last (m - 1) elements of column 1,
then the last (m - 2) elements of column 2, . . . , and finally the last (m - n)
elements of column n.
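A bare-bones rendering of this column-by-column reduction might look as follows. It is a sketch of my own, not the text of algorithm 3: no accumulation of Q or treatment of a right-hand side is shown, and the type rmatrix is an assumption.

procedure givred(m, n : integer; var A : rmatrix);
{sketch of the column-by-column Givens triangularisation.
 Element A[k,j] is annihilated by a rotation of rows j and k,
 using c and s from equations (4.5)-(4.7).
 Assumes rmatrix = array[1..nmax,1..nmax] of real.}
var
  i, j, k : integer;
  c, s, p, t : real;
begin
  for j := 1 to n do
    for k := j+1 to m do
      if A[k,j] <> 0.0 then
      begin
        p := sqrt(sqr(A[j,j]) + sqr(A[k,j]));  {equation (4.5)}
        c := A[j,j]/p;                         {equation (4.6)}
        s := A[k,j]/p;                         {equation (4.7)}
        for i := j to n do
        begin                                  {rotate the two rows; columns
                                                before j are already zero}
          t := c*A[j,i] + s*A[k,i];
          A[k,i] := -s*A[j,i] + c*A[k,i];
          A[j,i] := t;
        end;
      end;
end;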
Since the objective in considering the Givens reduction was to avoid storing a
large matrix, it may seem like a step backwards to discuss an algorithm which
introduces an m by m matrix Q. However, this matrix is not needed for the
solution of least-squares problems except in its product Q T b with the right-hand
side vector b. Furthermore, the ordering of the rotations can be altered so that
they act on one row at a time, requiring only storage for this one row and for the
resulting triangular n by n matrix which will again be denoted R, that is
(4.10)
In the context of this decomposition, the normal equations (2.22) become
(4.11)
(4.13)
After the Givens procedure is complete, the array A contains the triangular factor R in rows 1
to min(nRow, nCol). If nRow is less than nCol, then the right-hand (nCol - nRow) columns of array A contain the
transformed columns of the original matrix so that the product Q*R = A, in which R is now
trapezoidal. The decomposition can be used together with a back-substitution algorithm such as
algorithm 6 to solve systems of linear equations.
The order in which non-zero elements of the working array are transformed to zero is not
unique. In particular, it may be important in some applications to zero elements row by row
instead of column by column. The file alg03a.pas on the software disk presents such a row-wise
variant. Appendix 4 documents driver programs DR03.PAS and DR03A.PAS which illustrate
how the two Givens reduction procedures may be used.
the factors Q and R gives back the original matrix apart from very small errors
which are of the order of the machine precision multiplied by the magnitude of
the elements in question.
*RUN
TEST GIVENS - GIFT - ALG 3 DEC: 12 77
SIZE -- M= ? 3 N= ? 4
MTIN - INPUT M BY N MATRIX
ROW 1 : ? 1 ? 2 ? 3 ? 4
ROW 2 : ? 5 ? 6 ? 7 ? 8
ROW 3 : ? 9 ? 10 ? 11 ? 12
ORIGINAL A MATRIX
ROW 1 :  1             2             3             4
ROW 2 :  5             6             7             8
ROW 3 :  9             10            11            12
J= 1  K= 2  A[J,J]= 1  A[K,J]= 5
A MATRIX
ROW 1 :  5.09902       6.27572       7.45242       8.62912
ROW 2 : -1.19209E-07  -.784466      -1.56893      -2.3534
ROW 3 :  9             10            11            12
Q MATRIX
ROW 1 :  .196116      -.980581       0
ROW 2 :  .980581       .196116       0
ROW 3 :  0             0             1
J= 1  K= 3  A[J,J]= 5.09902  A[K,J]= 9
A MATRIX
ROW 1 :  10.3441       11.7942       13.2443       14.6944
ROW 2 : -1.19209E-07  -.784466      -1.56893      -2.3534
ROW 3 :  0            -.530862      -1.06172      -1.59258
Q MATRIX
ROW 1 :  9.66738E-02  -.980581      -.170634
ROW 2 :  .483369       .196116      -.853168
ROW 3 :  .870063       0             .492941
J= 2  K= 3  A[J,J]=-.784466  A[K,J]=-.530862
FINAL A MATRIX
ROW 1 :  10.3441       11.7942       13.2443       14.6944
ROW 2 :  9.87278E-08   .947208       1.89441       2.84162
ROW 3 :  0            -6.68109E-08  -9.53674E-07  -1.90735E-06
FINAL Q MATRIX
ROW 1 :  9.66738E-02   .907738      -.40825
ROW 2 :  .483369       .315737       .816498
ROW 3 :  .870063      -.276269      -.408249
RECOMBINATION
ROW 1 :  1             2.00001       3.00001       4.00001
ROW 2 :  5.00001       6.00002       7.00002       8.00002
ROW 3 :  9.00001       10            11            12
(4.15)
(4.16)
with
S V^T V S = S^2.                                              (4.17)
(4.18)
or
(4.19)
which is a singular-value decomposition.
As in the case of the columnwise orthogonalisation, small singular values (i.e.
rows of P TR having small norm) will cause V to possess some unnormalised rows
having essentially zero elements. In this case (4.17) will not be correct, since
(4.20)
where k is the number of singular values larger than some pre-assigned tolerance
for zero. Since in the solution of least-squares problems these rows always act
only in products with S or S+, this presents no great difficulty to programming an
algorithm using the above Givens reduction/row orthogonalisation method.
4.4. SOME LABOUR-SAVING DEVICES
The above method is not nearly so complicated to implement as it may appear.
Firstly, all the plane rotations are row-wise for both the Givens reduction and the
orthogonalisation. Moreover, one or more (say g) vectors b can be concatenated
with the matrix A so that rotations do not have to be applied separately to these,
but appear to act on a single matrix.
The second observation which reduces the programming effort is that the rows
of this matrix (A, b) are needed only one at a time. Consider a working array
(n + 1) by (n + g) which initially has all elements in the first n rows equal to zero.
Each of the m observations or rows of (A, b) can be loaded in succession into the
(n + 1)th row of the working array. The Givens rotations will suitably fill up the
workspace and create the triangular matrix R. The same workspace suffices for the
orthogonalisation. Note that the elements of the vectors d2 are automatically left
in the last g elements of row (n + 1) of the working array when the first n have
been reduced to zero. Since there are only (m n) components in each d2, but m
rows to process, at least n of the values left in these positions will be zero. These
will not necessarily be the first n values.
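The mechanics of working one row at a time can be sketched as follows. This is an illustration of my own, not the text of algorithm 4 (which organises the same arithmetic rather differently and also builds in the orthogonalisation): each incoming row of (A, b) is assumed to have been copied into row (n + 1) of the working array, and is then rotated into the triangle held in rows 1 to n.

procedure rowacc(n, g : integer; var W : rmatrix);
{sketch: rotate the data in row n+1 of the working array W
 ((n+1) rows by (n+g) columns) into the triangle stored in rows 1..n.
 Assumes rmatrix = array[1..nmax,1..nmax] of real and that the caller
 has placed one observation row of (A, b) in row n+1.}
var
  i, j : integer;
  c, s, p, t : real;
begin
  for j := 1 to n do
    if W[n+1,j] <> 0.0 then
    begin
      p := sqrt(sqr(W[j,j]) + sqr(W[n+1,j]));
      c := W[j,j]/p; s := W[n+1,j]/p;     {as in equations (4.5)-(4.7)}
      for i := j to n+g do
      begin
        t := c*W[j,i] + s*W[n+1,i];
        W[n+1,i] := -s*W[j,i] + c*W[n+1,i];
        W[j,i] := t;
      end;
    end;
  {on exit, positions n+1..n+g of row n+1 hold this row's contribution to d2}
end;

Calling such a procedure once per observation, after initialising rows 1 to n of W to zero, builds the triangle and accumulates the d2 elements with only the (n + 1) by (n + g) array in store.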
A further feature of this method is that the residual sum of squares, r^T r, is
equal to the sum of the squared terms which make up the vectors d2. This can be
shown quite easily since
r^T r = (b - Ax)^T (b - Ax)
      = b^T b - b^T Ax - x^T A^T b + x^T A^T A x.
(4.21)
By using the normal equations (2.22) the last two terms of this expression cancel
leaving
r^T r = b^T b - b^T Ax.                                       (4.22)
If least-squares problems with large numbers of observations are being solved
via the normal equations, expression (4.22) is commonly used to compute the
residual sum of squares by accumulating bT b, AT A and AT b with a single pass
through the data. In this case, however, (4.22) almost always involves the
subtraction of nearly equal numbers. For instance, when it is possible to approximate b very closely with Ax, then nearly all the digits in b T b will be cancelled by
those in b TAx, leaving a value for r T r with very few correct digits.
For the method using rotations, on the other hand, we have
(4.23)
and
(4.24)
by equation (4.12). Hence, by substitution of (4.23) and (4.24) into (4.22) we
obtain
(4.25)
The cancellation is now accomplished theoretically with the residual sum of
squares computed as a sum of positive terms, avoiding the digit cancellation.
The result (4.25) is derived on the assumption that (4.14) holds. In the
rank-deficient case, as shown by k zero or small singular values, the vector f in
equation (4.15) can be decomposed so that
(4.26)
where f1 is of order (n - k) and f2 of order k. Now equation (4.24) will have the
form
(4.27)
by application of equation (4.16) and the condition that S_(k+1), S_(k+2), . . . , S_n are all
zero. Thus, using
(4.28)
and (4.22) with (4.27) and (4.23), the residual sum of squares in the rank-deficient
case is
(4.29)
From a practical point of view (4.29) is very convenient, since the computation of
the residual sum of squares is now clearly linked to those singular values which
are chosen to be effectively zero by the user of the method. The calculation is
once again as a sum of squared terms, so there are no difficulties of digit
cancellation.
The vector
(4.30)
in the context of statistical calculations is referred to as a set of uncorrelated
residuals (Golub and Styan 1973).
Nash and Lefkovitch (1976) report other experience with algorithm 4. In
particular, the largest problem I have solved using it involved 25 independent
variables (including a constant) and two dependent variables, for which there were
196 observations. This problem was initially run on a Hewlett-Packard 9830
calculator, where approximately four hours elapsed in the computation. Later the
same data were presented to a FORTRAN version of the algorithm on both Univac
1108 and IBM 370/168 equipment, which each required about 12 seconds of
processor time. Despite the order-of-magnitude differences in the timings between the computers and the calculator, they are in fact roughly proportional to
the cycle times of the machines. Moreover, as the problem has a singularity of
order 2, conventional least-squares regression programs were unable to solve it,
and when it was first brought to me the program on the HP 9830 was the only one
on hand which could handle it.
Algorithm 4. Givens reductions, singular-value decomposition and least-squares
solution
procedure GivSVD( n : integer; {order of problem}
nRHS: integer; {number of right hand sides}
var B: rmatrix; {matrix of solution vectors}
var rss: rvector; {residual sums of squares}
var svs: rvector; {singular values}
var W: rmatrix; {returns V-transpose}
var nobs : integer); {number of observations}
{alg04.pas ==
Givens reduction, singular value decomposition and least squares
solution.
In this program, which is designed to use a very small working array yet
solve least squares problems with large numbers of observations, we do not
explicitly calculate the U matrix of the singular value decomposition.
The method suggested is mainly useful for adding single observations, and other
approaches are better if more than a very few observations are to be included. For
instance, one could update the triangular form which results from the Givens
reduction if this had been saved, then proceed to the singular-value decomposition
as in algorithm 4.
No methods will be discussed for removing observations, since while methods
exist to accomplish this (see Lawson and Hanson 1974, pp 225-31), the operation is potentially unstable. See also Bunch and Nielsen (1978).
For instance, suppose we have a Givens QR decomposition of a matrix A (or
any other QR decomposition with Q orthogonal and R upper-triangular), then add
and delete a row (observation) denoted y T. Then after the addition of this row,
the (1, 1) elements of the matrices are related by
(4.31)
where the tilde is used to indicate matrices which have been updated. Deletion of
y T now requires the subtraction
(4.32)
to be performed in some way or another, an operation which will involve digit
cancellation if y1 and
are close in magnitude. The same difficulty may of
course occur in other columns; the first is simply easier to illustrate. Such cases
imply that an element of y^T dominates the column in which it occurs and as such
should arouse suspicions about the data. Chambers' (1971) subroutine to delete
rows from a QR decomposition contains a check designed to catch such occurrences.
Of interest to those users performing regression calculations are the estimates of
standard errors of the regression coefficients (the least-squares solution elements).
The traditional standard error formula is
SE(b_i) = ( σ^2 [(A^T A)^(-1)]_ii )^(1/2)                     (4.33)
where σ^2 is an estimate of the variance of data about the fitted model calculated by
dividing the sum of squared residuals by the number of degrees of freedom (nRow -
nCol) = (nRow - n). The sum of squared residuals has already been computed in
algorithm 4, and has been adjusted for rank deficiency within the solution phase of
the code.
The diagonal elements of the inverse of the sum of squares and cross-products
matrix may seem to pose a bigger task. However, the singular-value decomposition
leads easily to the expression
(A^T A)^(-1) = V S^+ S^+ V^T.                                 (4.34)
In particular, diagonal elements of the inverse of the sum of squares and cross-
Thus, the relevant information for the standard errors is obtained by quite simple
row sums over the V matrix from a singular-value decomposition. When the original
A matrix is rank deficient, and we decide (via the tolerance for zero used to select
non-zero singular values) that the rank is r, the summation above reduces to
(4.36)
However, the meaning of a standard error in the rank-deficient case requires careful
consideration, since the standard error will increase very sharply as small singular
values are included in the summation given in (4.36). I usually refer to the dispersion
measures computed via equations (4.33) through (4.36) for rank r < n cases as
standard errors under the condition that the rank is 5 (or whatever value r currently
has). More discussion of these issues is presented in Searle (1971) under the topic
estimable functions, and in various sections of Belsley, Kuh and Welsch (1980).
Chapter 5
(5.1)
are required, since they are central to the calculation of variances for parameters
estimated by least-squares regression. The cross-products matrix ATA from the
singular-value decomposition (2.53) is given by
A^T A = V S U^T U S V^T = V S^2 V^T.
(5.2)
(5.3)
If the cross-products matrix is of full rank, the generalised inverse is identical to
the inverse (5.1) and, further,
S^+ = S^(-1).
(5.4)
Thus we have
(A^T A)^(-1) = V S^(-2) V^T.
(5.5)
The diagonal elements of this inverse are therefore computed as simple row
norms of the matrix
V S^(-1).
(5.6)
In the above manner the singular-value decomposition can be used to compute
the required elements of the inverse of the cross-products matrix. This means that
the explicit computation of the cross-products matrix is unnecessary.
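In code this is simply a set of row sums over V with each column weighted by 1/S_j. The sketch below (not a numbered algorithm of this book; the array types are assumed) treats the full-rank case of (5.5) and (5.6); for the rank-deficient case the inner loop would be restricted to the singular values kept as non-zero, as in (4.36).

procedure xtxinv(n : integer;
                 var V : rmatrix;   {right singular vectors, n by n}
                 var S : rvector;   {singular values, assumed all non-zero here}
                 var d : rvector);  {returns diagonal of the inverse of A^T A}
{sketch: diagonal elements of (A^T A) inverse as row norms of V * S inverse,
 equations (5.5) and (5.6).}
var
  i, j : integer;
  sum : real;
begin
  for i := 1 to n do
  begin
    sum := 0.0;
    for j := 1 to n do
      sum := sum + sqr(V[i,j]/S[j]);
    d[i] := sum;
  end;
end;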
Indeed there are two basic problems with computation of AT A. One is induced
by sloppy programming practice, the other is inherent in the formation of AT A.
The former of these occurs in any problem where one of the columns of A is
constant and the mean of each column is not subtracted from its elements. For
instance, let one of the columns of A (let it be the last) have all its elements equal
to 1. The normal equations (2.22) then yield a cross-products matrix with last row
(and column), say the nth,
(5.7)
But
(5.8)
where is the mean of the jth column of the m by n matrix A. Furthermore, the
right-hand side of the nth normal equation is
(5.9)
This permits xn to be eliminated by using the nth normal equation
(5.10)
or
(5.11)
When this expression is substituted into the normal equations, the kth equation
(note carefully the bars above the symbols) becomes
(5.12)
But since
(5.13)
and
(5.14)
equation (5.12) becomes
(5.15)
which defines a set of normal equations of order (n - 1)
(A' ) TA' x' = (A') Tb'
(5.16)
(5.17)
ā = 1003.5 so that
(5.18)
(5.22)
              Exact          Truncated        Rounded
(a1)^2        1000000        100000 * 10      100000 * 10
(a2)^2        1004004        100400 * 10      100400 * 10
(a3)^2        1008016        100801 * 10      100802 * 10
(a4)^2        1016064        101606 * 10      101606 * 10
sum           4028084        402807 * 10      402808 * 10
sum/4         1007021        100701 * 10      100702 * 10
- ā^2        -1007012.25    -100701 * 10     -100701 * 10
var(a)        8.75           0                1 * 10 = 10
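The working of this table can be imitated by the short program below, an illustration of my own rather than part of the book's algorithm set. It computes var(a) both by the one-pass formula (mean of squares less squared mean), which suffers the cancellation shown above, and from deviations about the mean, which does not.

program vardemo;
{sketch: two ways of computing the variance of a = (1000, 1002, 1004, 1008).
 The one-pass sum-of-squares formula cancels badly in short arithmetic;
 working with deviations from the mean does not.}
const
  m = 4;
var
  a : array[1..m] of real;
  i : integer;
  s, ss, abar, v1, v2 : real;
begin
  a[1] := 1000.0; a[2] := 1002.0; a[3] := 1004.0; a[4] := 1008.0;
  s := 0.0; ss := 0.0;
  for i := 1 to m do
  begin
    s := s + a[i]; ss := ss + sqr(a[i]);
  end;
  abar := s/m;
  v1 := ss/m - sqr(abar);             {one-pass formula: subject to cancellation}
  v2 := 0.0;
  for i := 1 to m do v2 := v2 + sqr(a[i] - abar);
  v2 := v2/m;                         {deviations from the mean}
  writeln('one-pass variance        ', v1:12:6);   {8.75 in exact arithmetic}
  writeln('deviation-based variance ', v2:12:6)
end.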
which is singular since the first two columns or rows are identical. If we use
deviations from means (and drop the constant column) a singular matrix still
results. For instance, on a Data General NOVA minicomputer using a 23-bit
binary mantissa (between six and seven decimal digits), the A matrix using
deviation from mean data printed by the machine is
which is singular.
However, by means of the singular-value decomposition given by algorithm 1,
the same machine computes the singular values of A (not A') as
2.17533, 1.12603 and 1E-5.
Since the ratio of the smallest to the largest of the singular values is only slightly
larger than the machine precision (2^(-22) ≈ 2.38419E-7), it is reasonable to presume that the tolerance q in the equation (3.37) should be set to some value
between 1E-5 and 1.12603. This leads to a computed least-squares solution
with
r^T r = 1.68956E4.
(In exact arithmetic it is not possible for the sum of squares with q= 0 to exceed
that for a larger tolerance.)
When using the singular-value decomposition one could choose to work with
deviations from means or to scale the data in some way, perhaps using columns
which are deviations from means scaled to have unit variance. This will then
prevent large data from swamping small data. Scaling of equations has proved a
difficult and somewhat subjective issue in the literature (see, for instance, Dahlquist and Björck 1974, p 181ff).
Despite these cautions, I have found the solutions to least-squares problems
obtained by the singular-value decomposition approach to be remarkably resilient
to the omission of scaling and the subtraction of means.
As a final example of the importance of using decomposition methods for
least-squares problems, consider the data (Nash and Lefkovitch 1976)
This is a regression through the origin and can be shown to have the exact solution
with a zero residual sum of squares. If we wish to use a method which only scans
the data once, that is, explicit residuals are not computed, then solution of the
normal equations allows the residual sum of squares to be computed via
r^T r = b^T b - b^T Ax.                                       (5.23)
r Tr = 422E4
and
r T r = 0046709.
(h) for = 64
rT r = 0
and
rT r = 0.
Since the first edition of this book appeared, several authors have considered
problems associated with the formation of the sum of squares and cross-products
matrix, in particular the question of collinearity. See, for example, Nash (1979b) and
Stewart (1987).
Chapter 6
(6.4)
(6.5)
Furthermore, the sum of any two equations of (6.5) is also an equation of the set
(6.5). Multiplying the first equation (i.e. that for i = 1) by
mi1 = Ai1/A11
(6.6)
(6.7)
where
A'_ik = A_ik - m_i1 A_1k
(6.8)
and
(6.9)
But
(6.10)
so that we have eliminated all but the first element of column 1 of A . This process
can now be repeated with new equations 2, 3, . . . , n to eliminate all but the first
two elements of column 2. The element A12 is unchanged because equation 1 is
not a participant in this set of eliminations. By performing (n - 1) such sets of
eliminations we arrive at an upper-triangular matrix R. This procedure can be
thought of as an ordered sequence of multiplications by elementary matrices. The
elementary matrix which eliminates A_ij will be denoted M_ij and is defined by
M_ij = 1_n - m_ij E_ij                                        (6.11)
where
m_ij = A_ij/A_jj                                              (6.12)
(the elements in A are all current, not original, values) and where E_ij is the matrix
having 1 in the position ij and zeros elsewhere, that is
(E_ij)_rk = δ_ri δ_jk                                         (6.13)
which uses the Kronecker delta, δ_ir = 1 for i = r and δ_ir = 0 otherwise. The effect
of M_ij when pre-multiplying a matrix A is to replace the ith row with the
difference between the ith row and m_ij times the jth row, that is, if
A' = M i jA
(6.14)
then
A'_rk = A_rk          for r ≠ i                               (6.15)
and
A'_ik = A_ik - m_ij A_jk                                      (6.16)
with k = 1, 2, . . . , n. Since Ajk = 0 for k < j, for computational purposes one need
only use k = j, ( j+ 1), . . . , n. Thus
(6.17)
= L^(-1) A
(6.18)
L_ij = { 1      for j = i
         m_ij   for j < i
         0      for j > i                                     (6.20)
with
L^(-1) L = 1_n                                                (6.21)
so that
A = LR                                                        (6.22)
which is a triangular decomposition of the matrix A. It permits us to rewrite the
original equations
A x = LRx = b
(6.23)
as
Rx = L^(-1) A x = L^(-1) b = f.
(6.24)
Because we can retain the triangular structure by writing the unit matrix
1 = D D^(-1)
(6.25)
(6.26)
the Gauss elimination is not the only means to obtain a triangular decomposition
of A.
In fact, the Gauss elimination procedure as it has been explained so far is
unsatisfactory for computation in finite arithmetic because the mij are computed
from current elements of the matrix, that is, those elements which have been
computed as a result of eliminating earlier elements in the scheme. Recall that
(6.12)
using the prime to denote current values of the matrix elements, that is, those
values which have resulted from eliminating elements in columns 1, 2, . . . , (j - 1).
If A'_jj is zero, we cannot proceed, and small values of A'_jj are quite likely to occur during
subtractions involving digit cancellations, so that multipliers m_ij that are large and
inaccurate are possible. However, we can ensure that multipliers m_ij are all less
than one in magnitude by permuting the rows of A' (and hence A) so that the
largest in magnitude of the elements A'_ij, for i = j, (j + 1), . . . , n, is in the diagonal or pivot position. This
modified procedure, called Gauss elimination with partial pivoting, has a large
literature (see Wilkinson (1965) for a discussion with error analysis). Since the
rows of A have been exchanged, this procedure gives the triangular decomposition
of a transformed matrix
PA = LR
(6.27)
Example 6.1. The use of linear equations and linear least-squares problems
Organisations which publish statistics frequently use indices to summarise the
change in some set of measurable quantities. Already in example 3.2 we have
used indices of the use of various chemicals in agriculture and an index for farm
income. The consumer price index, and the Dow Jones and Financial Times
indices provide other examples. Such indices are computed by dividing the
average value of the quantity for period t by the average for some base period
t = 0 which is usually given the index value 100. Thus, if the quantity is called P,
then
(6.28)
where
(6.29)
given n classes or types of quantity P, of which the jth has value Ptj in period t
and is assigned weight Wj in the average. Note that it is assumed that the
weighting Wj is independent of the period, that is, of time. However, the
weightings or shopping basket may in fact change from time to time to reflect
changing patterns of product composition, industrial processes causing pollution,
stocks or securities in a portfolio, or consumer spending.
Substitution of (6.29) into (6.28) gives
Finally, letting
(6.30)
gives
(6.31)
Thus, if n periods of data It, Ptj, j = 1, . . . , n, are available, we can compute the
weightings KWj. Hence, by assuming
(6.32)
that is, that the weights are fractional contributions of each component, we can
find the value of K and each of the Wj. This involves no more nor less than the
solution of a set of linear equations. The work of solving these is, of course,
unnecessary if the person who computes the index publishes his set of weights, as
indeed is the case for several indices published in the Monthly Digest of Statistics.
Unfortunately, many workers do not deem this a useful or courteous practice
towards their colleagues, and I have on two occasions had to attempt to discover
the weightings. In both cases it was not possible to find a consistent set of weights
over more than n periods, indicating that these were being adjusted over time.
This created some difficulties for my colleagues who brought me the problems,
since they were being asked to use current price data to generate a provisional
estimate of a price index considerably in advance of the publication of the indices
by the agency which normally performed the task. Without the weights, or even
approximate values from the latest period for which they were available, it was
not possible to construct such estimates. In one case the calculation was to have
TABLE 6.1. Prices and index values for example 6.1.

Period     P1      P2      P3       P4      I1        I2
  1        1       0.5     1.3      3.6     100       100
  2        1.1     0.5     1.36     3.6     103.718   103.718
  3        1.1     0.5     1.4      3.6     104.487   104.487
  4        1.25    0.6     1.41     3.6     109.167   109.167
  5        1.3     0.6     1.412    3.95    114.974   114.974
  6        1.28    0.6     1.52     3.9     115.897    98.4615
  7        1.31    0.6     1.6      3.95    118.846   101.506
used various proposed oil price levels to ascertain an index of agricultural costs.
When it proved impossible to construct a set of consistent weights, it was
necessary to try to track down the author of the earlier index values.
As an example of such calculations, consider the set of prices shown in table 6.1
and two indices I1 and I2 calculated from them. I1 is computed using proportions
0.4, 0.1, 0.3 and 0.2 respectively of P1, P2, P3 and P4. I2 uses the same weights
except for the last two periods where the values 0.35, 0.15, 0.4 and 0.1 are used.
Suppose now that these weights are unknown. Then the data for the first four
periods give a set of four equations (6.31) which can be solved to give
KW =
using Gauss elimination (Data General NOVA, 23-bit binary mantissa). Applying
the normalisation (6.32) gives
W =
If these weights are used to generate index numbers for the last three periods, the
values I1 will be essentially reproduced, and we would detect a change in the
weighting pattern if the values I2 were expected.
An alternative method is to use a least-squares formulation, since if the set of
weights is consistent, the residual sum of squares will be zero. Note that there is
no constant term (column of ones) in the equations. Again on the NOVA in
23-bit arithmetic, I1 gives
with a residual sum of squares (using KW) over the seven periods of 4.15777E-7.
The same calculation with I2 gives a residual sum of squares of 241.112, showing
that there is not a consistent set of weights. It is, of course, possible to find a
consistent set of weights even though index numbers have been computed using a
varying set; for instance, if our price data had two elements identical in one
period, any pair of weights for these prices whose sum was fixed would generate
the same index number.
6.3. VARIATIONS ON THE THEME OF GAUSS ELIMINATION
Gauss elimination really presents only one type of difficulty to the programmer:
which of the many possible variations to implement. We have already touched
upon the existence of two of the better known ones, those of Crout and Doolittle
(see Dahlquist and Björck 1974, pp 157-8). While these methods are useful and
important, they require double-precision arithmetic to be used to full advantage,
so cannot be used effectively if the computing system at hand lacks this capability.
Bowdler et al (1966) present ALGOL versions of the Crout algorithm which
implicitly scale the rows of the coefficient matrix. These algorithms are complicated by ALGOL's lack of double-length arithmetic, necessitating calls to machine
code procedures. (This comment applies to ALGOL-60, not ALGOL-68.)
By and large I have avoided scaling within my programs because of the great
difficulty of making any reliable general recommendations. Indeed, given any two
non-singular diagonal matrices D and E, the system of equations
D A E E^(-1) x = D b                                          (6.33)
has the same solution x as the original equations (2.2).
In scaling the equations by row multiplication we are adjusting D, which adjusts
the pivot selection. It is often recommended that the scaling factors in D be
chosen to equilibrate the matrix A, that is, so that
max_j |(DA)_ij| = 1        for i = 1, 2, . . . , n            (6.34)
where for the moment E is assumed to be a unit matrix. This is simply a dose of
common sense which attempts to avoid arithmetic involving numbers widely
different in magnitude. However, as Dahlquist and Björck (1974, pp 181-3)
point out, the scaling E-1 of the solution x can frustrate our efforts to stabilise the
computation. Furthermore, optimal scaling depends on knowledge of the matrix
A-1, which is not known. They therefore suggest E be chosen to reflect the
importance of the unknowns. This statement is suitably amorphous to cover
whatever situations arise, so I shall venture the opinion that the magnitudes of the
solution elements
y = E^(-1) x
(6.35)
should be roughly equivalent. That is to say, the variables in the problem at hand
should be measured in units which give the expected solution elements
approximately the same size. Is this worth the bother? I can only add that I rarely
scale sets of equations unless there is some very obvious and natural way to do it.
Similar comments apply to iterative improvement of a computed solution
(Dahlquist and Björck 1974, pp 183-5, Bowdler et al 1966). Given a computed
solution
(6.36)
if
(6.37)
then a triangular decomposition of A permits solution of
Ac = LRc = r
(6.38)
for i = 1, 2, . . . , n
(6.40)
(6.41)
(6.42)
or row-wise, so that
These translations offer some simplifications of the elimination and backsubstitution algorithms. In fact, the row-wise form (6.41) is more useful for
elimination where the index of an element is simply incremented to proceed
across a row of the coefficient matrix. For back-substitution, we need to form
matrix-vector products which oblige us to access array elements by marching
simultaneously across rows and down columns. Implicit pivoting is also possible
with a one-dimensional storage scheme. This adds just one more item to those
from which a method must be selected.
It is probably clear to my readers that I have already decided that simplest is
best and intend to stick with algorithms 5 and 6. My reasons are as follows.
(i) Despite the elegance of implicit pivoting, the extra index vector and the
program code needed to make it work are counter to the spirit of a compact
algorithm.
(ii) The implicit interchange only gains in efficiency relative to the direct method
if an interchange is needed; this is without counting the overhead which array
access via q implies. But in many instances very few interchanges are required and
the whole discussion then boils down to an argument over the likely number of
interchanges in the problem set to be solved.
(iii) In coding Gauss elimination with back-substitution and the Gauss-Jordan
reduction with various of the above choices, S G Nash and I (unpublished
work) found that the implicit pivoting methods were surprisingly prone to bugs
which were difficult to discover. This applied particularly to the one-dimensional
storage forms. Most of these errors were simple typographical errors in entry of
the code. Since it is hoped the algorithms in this book will prove straightforward
to implement, only a direct method has been included.
6.4. COMPLEX SYSTEMS OF EQUATIONS
Consider the system of equations (where i = √(-1))
(Y + iZ)(u + iv) = g + ih.                                    (6.43)
Separating these into real and imaginary components gives the real equations
Yu - Zv = g                                                   (6.44)
Yv + Zu = h                                                   (6.45)
which is a set of linear equations (2.2) with coefficient matrix
( Y  -Z )
( Z   Y )                                                     (6.46)
unknowns
( u )
( v )                                                         (6.47)
and right-hand side
( g )
( h ).                                                        (6.48)
This is how complex systems of linear equations can be solved using real
arithmetic only. Unfortunately the repetition of the matrices Y and Z in (6.46)
means that for a set of equations of order n, 2n2 storage locations are used
unnecessarily. However, the alternative is to recode algorithms 5 and 6 to take
account of the complex arithmetic in (6.43). Bowdler et al (1966) give ALGOL
procedures to perform the Crout variant of the elimination for such systems of
equations, unfortunately again requiring double-length accumulation.
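The rearrangement into real arithmetic is mechanical. The sketch below (names and types are assumptions of my own) builds the order-2n real system of (6.46)-(6.48) from Y, Z, g and h so that it can be handed to any real linear-equation solver such as algorithms 5 and 6; the first n unknowns returned are then u and the last n are v.

procedure cplxsys(n : integer;
                  var Y, Z : rmatrix;   {real and imaginary parts of the matrix}
                  var g, h : rvector;   {real and imaginary parts of the rhs}
                  var A : rmatrix;      {returns the 2n by 2n real matrix}
                  var b : rvector);     {returns the 2n element real rhs}
{sketch: build the real equivalent (6.46)-(6.48) of the complex system (6.43)}
var
  i, j : integer;
begin
  for i := 1 to n do
  begin
    for j := 1 to n do
    begin
      A[i,j] := Y[i,j];      A[i,j+n] := -Z[i,j];
      A[i+n,j] := Z[i,j];    A[i+n,j+n] := Y[i,j];
    end;
    b[i] := g[i]; b[i+n] := h[i];
  end;
end;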
6.5. METHODS FOR SPECIAL MATRICES
The literature contains a number of methods for solving special systems of
equations. For instance, several contributions in Wilkinson and Reinsch (1971)
deal with band matrices, that is, those for which
A_ij = 0        if |i - j| > k                                (6.49)
for some k. Thus if k = 1, the matrix is tridiagonal. While these methods are
undoubtedly useful and save memory, I have not included them in this monograph because I feel any serious user with enough special problems to warrant a
method tailored to the task is likely to find and implement one. Others may only
find too many special cases tedious or bewildering. Thus no discussion of banded
or other special forms is given, though the user should be alert to triangular forms
since it is very wasteful of effort to apply Gauss elimination to a lower-triangular
matrix when simple forward-substitution will suffice. Likewise, no treatment is
included of the various iteration methods for the systems of equations arising
from partial differential equations (see Varga 1962). It should be pointed out,
however, that the Givens reduction can often be organised to take advantage of
patterns of zeros in matrices. Even as it stands, algorithm 3 is quite efficient for
such problems, since very little work is done when zero elements are encountered
and no pivot interchanges are made.
The only special form which will be considered is a symmetric positive definite
matrix. Chapter 7 deals with a decomposition of such a matrix useful for solving
special sets of linear equations. Chapter 8 discusses a very compact algorithm for
inverting such a matrix in situ, that is, on top of itself.
Chapter 7
L_11 = A_11^(1/2).                                            (7.5)
Furthermore
A_i1 = L_i1 L_11                                              (7.6)
so that we obtain
L_i1 = A_i1/L_11.                                             (7.7)
Consider now the mth column of L which is defined for i ≥ m by
L_im L_mm = A_im - ( L_i1 L_m1 + L_i2 L_m2 + . . . + L_i(m-1) L_m(m-1) )     (7.8)
with the diagonal element determined first by setting i = m. It is straightforward to
see that every element in the right-hand side of equation (7.8) comes from
columns 1, 2, . . . , (m 1) of L or from column m of A. Since (7.5) and (7.7)
define the first column of L, we have a stepwise procedure for computing its
remaining columns, and furthermore this can be arranged so that L overwrites A
within the computer. It remains to be shown that the procedure is stable and that
for i = m the right-hand side of (7.8) is positive, so no square roots of negative
numbers are required.
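Written out, the columnwise procedure of equations (7.5)-(7.8) is very short. The following sketch of my own (it is not the text of algorithm 7, which stores only a triangle of the matrix in a vector) overwrites the lower triangle of A with L.

procedure chodec(n : integer; var A : rmatrix);
{sketch: Choleski decomposition by columns, equations (7.5)-(7.8).
 The lower triangle of the symmetric positive definite matrix A is
 overwritten by L; the strict upper triangle is not referenced.
 Assumes rmatrix = array[1..nmax,1..nmax] of real.}
var
  i, j, m : integer;
  s : real;
begin
  for m := 1 to n do
  begin
    s := A[m,m];
    for j := 1 to m-1 do s := s - sqr(A[m,j]);
    A[m,m] := sqrt(s);                   {diagonal element, i = m in (7.8)}
    for i := m+1 to n do
    begin
      s := A[i,m];
      for j := 1 to m-1 do s := s - A[i,j]*A[m,j];
      A[i,m] := s/A[m,m];                {equation (7.8) for i > m}
    end;
  end;
end;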
Firstly, A is positive definite if
x^T A x > 0        for all x ≠ 0.                             (7.9)
An equivalent statement is that all the eigenvalues of A are positive. From (7.9) it
follows by setting x to any column of the unit matrix that no diagonal element of
A can be non-positive. Likewise, by taking only x_i and x_j non-zero, (7.9) gives
A_ii x_i^2 + 2 A_ij x_i x_j + A_jj x_j^2 > 0                  (7.10)
which requires that the quadratic equation
z^2 A_ii + 2 z A_ij + A_jj = 0                                (7.11)
(7.14)
or
(7.15)
where Li-1 is assumed non-singular. In fact, it is positive definite providing the
positive square root is chosen in the computation of each of its diagonal elements
via (7.8). Consider now the choice in (7.9) of an x such that the first (i - 1)
elements of x are given by
xi = -1, and xj = 0 for j > i. This choice, using
(7.13) gives
(7.16)
which reduces to
Aii - cTc > 0.
(7.17)
But a comparison of this with (7.8) shows that it implies the square of each
diagonal element of L is positive, so that all the elements of L are real providing A
is positive definite. Furthermore, an analysis similar to that used in (7.10), (7.11)
and (7.12) demands that
(7.18)
(Again, the diagonal elements must be chosen to be positive in the decomposition.) Equations (7.17) and (7.18) give bounds to the size of the subdiagonal
elements of L, which suggests the algorithm is stable. A much more complete
analysis which confirms this conjecture is given by Wilkinson (1961) who shows
the matrix LLT as computed is always close in some norm to A.
Once the Choleski decomposition has been performed, linear equations
A x = L L^T x = b                                             (7.19)
can be solved by the forward-substitution
L v = b                                                       (7.20)
followed by the back-substitution
R x = L^T x = v.                                              (7.21)
(7.23)
Likewise, the solution elements x_j of (7.21) are obtained in the backward order n,
(n - 1), . . . , 1 from
(7.24)
x_n = v_n / L_nn                                              (7.25)
(7.26)
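The two substitution sweeps can be sketched as follows, using the factor stored in the lower triangle of A by the decomposition sketch above; since the bodies of (7.23) and (7.26) are the usual forward and backward recurrences, the formulas below are the standard ones and should be read as an illustration rather than the book's own code.

procedure chosol(n : integer; var A : rmatrix;    {lower triangle holds L}
                 var b, x : rvector);
{sketch: solve L L^T x = b given the Choleski factor L in the lower
 triangle of A.  Forward-substitution for v (7.20) is followed by
 back-substitution for x (7.21); v is held temporarily in x.}
var
  i, j : integer;
  s : real;
begin
  for i := 1 to n do           {L v = b, solved forwards}
  begin
    s := b[i];
    for j := 1 to i-1 do s := s - A[i,j]*x[j];
    x[i] := s/A[i,i];
  end;
  for i := n downto 1 do       {L^T x = v, solved backwards}
  begin
    s := x[i];
    for j := i+1 to n do s := s - A[j,i]*x[j];
    x[i] := s/A[i,i];
  end;
end;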
7.2. EXTENSION OF THE CHOLESKI DECOMPOSITION TO
NON-NEGATIVE DEFINITE MATRICES
When A is non-negative definite, the inequality (7.9) becomes
x^T A x ≥ 0        for all x                                  (7.27)
and inequalities (7.10), (7.12), (7.17) and (7.18) must be amended similarly.
There is no difficulty in performing the decomposition unless a zero diagonal
element appears in L before the decomposition is complete. For L mm = 0, equations (7.3) and (7.8) are satisfied by any values of Lim for i > m. However, if we
desire L to be non-negative definite and wish to satisfy the amended form of
(7.18), that is
(7.28)
we should set Lim = 0 for i > m. This is a relatively trivial modification to the
decomposition algorithm. Lmm is, of course, found as the square root of a quantity
which arises by subtraction, and in practice this quantity may be slightly negative
due to rounding error. One policy, adopted here, is to proceed with the decomposition, setting all Lim = 0 for i > m even if this quantity is negative, thus
assuming the matrix A is non-negative definite. Since there may be very good
reasons for presupposing A to be non-negative definite, for instance if it has been
formed as a sum-of-squares and cross-products matrix in a least-squares regression calculation, this is not as dangerous as it appears. Furthermore the decision
to continue the decomposition in 7.1 when the computed square of Lmm is
positive, rather than greater than some tolerance for zero, has been made after
the following considerations.
(i) The decomposition is valid to the precision available in the arithmetic being
used. When zero diagonal elements arise in L they reflect linear dependencies in
the set of equations for which a solution is to be found, and any of the infinity of
solutions which exist is taken to be acceptable. However, in recognition of the
possibility that there may only be a near-linear dependence, it does not seem wise
to presume a small number is zero, since the Choleski decomposition, unlike the
singular-value decomposition (algorithm 1), does not allow the user to decide at
the time solutions are computed which small numbers are to be assumed zero.
(ii) The size of the computed square of Lmm is dependent on the scale of the
matrix A. Unless the tolerance for zero is proportional to a norm of A, its
application has an effect which is not consistent from one problem to another.
If the computed square of Lmm is non-positive, the mth column of L is
therefore set to zero
Lim = 0
for i = m, (m + 1), . . . , n.
(7.29)
Solutions to the linear equations (2.2) by the substitutions just described
are unfortunately not now possible since division by zero occurs. If, however, the
equations are consistent, that is, if b belongs to the column space of A, at least one
solution x exists (see, for example, Finkbeiner 1966, p 98).
Consider the forward-substitution to solve
Lv = b
(7.30)
(7.31)
are solutions to (2.2) for arbitrary values of those xk for which Lkk = 0. This is, of
course, only possible if
(7.32)
which is another way of stating the requirement of consistency for the equations.
For a specific example, consider that L_mm = 0 as above. Thus, in the forward-substitution v_m is always multiplied by zero and could have arbitrary value, except
that in the back-substitution the mth row of LT is null. Denoting this by the vector
(7.33)
it is easily seen that
(7.34)
so that vm must be zero or the equation (7.34) is not satisfied. From (7.30) and
(7.31) one has
L v = L L^T x = b = A x.
(7.35)
Since xm is arbitrary by virtue of (7.34), the value chosen will influence the
values of all xi, i < m, so some standard choice here is useful when, for instance,
an implementation of the algorithm is being tested. I have always chosen to set
xm = 0 (as below in step 14 of algorithm 8).
The importance of the above ideas is that the solution of linear least-squares
problems by the normal equations
BT Bx = BT y
(7.36)
finding seemed to be reversed, probably because these latter machines are run as
interpreters, that is, they evaluate the code as they encounter it in a form of
simultaneous translation into machine instructions whereas a compiler makes the
translation first then discards the source code. The Healy and Price routines most
likely are slowed down by the manner in which their looping is organised,
requiring a backward jump in the program. On a system which interprets, this is
likely to require a scan of the program until the appropriate label is found in the
code and it is my opinion that this is the reason for the more rapid execution of
algorithm 7 in interpreter environments. Such examples as these serve as a
warning that programs do show very large changes in their relative, as well as
absolute, performance depending on the hardware and software environment in
which they are run.
Algorithm 7 computes the triangular factor L column by column. Row-by-row
development is also possible, as are a variety of sequential schemes which perform
the central subtraction in STEP 4 in piecemeal fashion. If double-precision arithmetic is possible, the forms of the Choleski decomposition which compute the
right-hand side of (7.8) via a single accumulation as in algorithm 7 have the
advantage of incurring a much smaller rounding error.
for i
for j > i
for i = j
-1
for j < i.
 1
-1   2
-1   0   3
-1   0   1   4
-1   0   1   2   5
which gives the Choleski factor
 1
-1   1
-1  -1   1
-1  -1  -1   1
-1  -1  -1  -1   1.
0051383414
x =
   -0.046192435
    1.019386565
   -0.159822924
   -0.290376225
    2.077826146
This is to be compared with solution (a) of table 3.2 or the first solution of
example 4.2 (which is on pp 62 and 63), which shows that the various methods all
give essentially the same solution under the assumption that none of the singular
values is zero. This is despite the fact that precautions such as subtracting means
have been ignored. This is one of the most annoying aspects of numerical
computation: the foolhardy often get the right answer! To underline the point, let us use
the above data (that is BT B and BTy ) in the Gauss elimination method, algorithms
5 and 6. If a Data General NOVA operating in 23-bit binary arithmetic is used,
the largest integer which can be represented exactly is
2^23 - 1 = 8388607
so that the original matrix of coefficients cannot be represented exactly. However,
the solution found by this method, which ignores the symmetry of B T B, is
   -4.62306E-2
    1.01966
   -0.159942
   -0.288716
    2.07426
While this is not as close as solution (a) of table 3.2 to the solutions computed in
comparatively double-length arithmetic on the Hewlett-Packard 9830, it retains
the character of these solutions and would probably be adequate for many
practitioners. The real advantage of caution in computation is not, in my opinion,
that one gets better answers but that the answers obtained are known not to be
unnecessarily in error.
Chapter 8
To solve the linear equations (2.2), provided A_11 ≠ 0, we might use
x_1 = ( b_1 - A_12 x_2 - A_13 x_3 - . . . - A_1n x_n ) / A_11.       (8.1)
The substitution of this into the other equations gives a new set of (n - 1)
equations in (n - 1) unknowns which we shall write
A'x' = b'
(8.2)
in which the indices will run from 2 to n. In fact x' will consist of the last (n - 1)
elements of x. By means of (8.1) it is simple to show that
(8.3)
and
(8.4)
for k, j = 2, . . . , n. Notice that if b is included as the (n + 1)th column of A, (8.4) is
the only formula needed, though j must now run to (n + 1).
We now have a set of equations of order (n - 1), and can continue the process
until only a set of equations of order 1 remains. Then, using the trivial solution of
this, a set of substitutions gives the desired solution to the original equations. This
is entirely equivalent to Gauss elimination (without pivoting) and back-substitution,
and all the arithmetic is the same.
Consider, however, that the second substitution is made not only of x2 into the
remaining (n - 2) equations, but also into the formula (8.1) for x1. Then the final
order-1 equations yield all the xj at once. From the viewpoint of elimination it
corresponds to eliminating upper-triangular elements in the matrix R in the
system
Rx = f
(6.4)
1x = f '
that is, the solution to the set of equations.
Yet another way to look at this procedure is as a series of elementary row
operations (see Wilkinson 1965) designed to replace the pth column of an n by n
matrix A with the pth column of the unit matrix of order n, that is, e_p. To
accomplish this, the pth row of A must be divided by A_pp, and A_ip times the
resulting pth row subtracted from every row i for i ≠ p. For this to be
possible, of course, A_pp cannot be zero.
A combination of n such steps can be used to solve sets of linear equations. To
avoid build-up of rounding errors, some form of pivoting must be used. This will
involve one of a variety of schemes for recording pivot positions or else will use
explicit row interchanges. There are naturally some trade-offs here between
simplicity of the algorithm and possible efficiency of execution, particularly if the
set of equations is presented so that many row exchanges are needed.
By using the Gauss-Jordan elimination we avoid the programming needed to
perform the back-substitution required in the Gauss elimination method. The
price we pay for this is that the amount of work rises from roughly n^3/3
operations for a single equation to n^3/2 as n becomes large (see Ralston 1965
p 401). For small n the Gauss-Jordan algorithm may be more efficient depending
on factors such as implementation and machine architecture. In particular, it is
possible to arrange to overwrite the ith column of the working matrix with the
corresponding column of the inverse. That is, the substitution operations of
equations (8.1) and (8.4) with 1 replaced by i give elimination formulae
Ã_ij = A_ij/A_ii                                              (8.1a)
Ã_kj = A_kj - A_ki (A_ij/A_ii)                                (8.4a)
for j = 1, 2, . . . , n, k = 1, 2, . . . , n, but k ≠ i, with the tilde representing the
transformed matrix. However, these operations replace column i with e_i, the ith
column of the unit matrix 1_n, information which need not be stored. The
right-hand side b is transformed according to
(8.1b)
(8.3a)
for k = 1, 2, . . . , n with k i. To determine the inverse of a matrix, we could solve
the linear-equation problems for the successive columns of 1n. But now all
columns ej for j > i will be unaltered by (8.1b) and (8.3a). At the ith stage of the
reduction, ei can be substituted on column i of the matrix by storing the pivot Aii,
substituting the value of 1 in this diagonal position, then performing the division
implied in (8.1a). Row i of the working matrix now contains the multipliers
ij = (Aij/Aii). By performing (8.4a) row-wise, each value Aki can be saved, a
zero substituted from ei, and the elements of Akj, j = 1, 2, . . . , n, computed.
96
This process yields a very compact algorithm for inverting a matrix in the
working storage needed to store only a single matrix. Alas, the lack of pivoting
may be disastrous. The algorithm will not work, for instance, on the matrix
0 1
1 0
which is its own inverse. Pivoting causes complications. In this example, interchanging rows to obtain a non-zero pivot implies that the columns of the resulting
inverse are also interchanged.
The extra work involved in column interchanges which result from partial
pivoting is avoided if the matrix is symmetric and positive definite-this special
case is treated in detail in the next section. In addition, in this case complete
pivoting becomes diagonal pivoting, which does not disorder the inverse. Therefore algorithms such as that discussed above are widely used to perform stepwise
regression, where the pivots are chosen according to various criteria other than
error growth. Typically, since the pivot represents a new independent variable
entering the regression, we may choose the variable which most reduces the
residual sum of squares at the current stage. The particular combination of (say) m
out of n independent variables chosen by such a forward selection rule is not
necessarily the combination of m variables of the n available which gives the
smallest residual sum of squares. Furthermore, the use of a sum-of-squares and
cross-products matrix is subject to all the criticisms of such approaches to
least-squares problems outlined in chapter 5.
As an illustration, consider the problem given in example 7.2. A Data General
ECLIPSE operating in six hexadecimal digit arithmetic gave a solution
x=
-464529E-2
102137
-0160467
-0285955
206734
when the pivots were chosen according to the residual sum-of-squares reduction
criterion. The average relative difference of these solution elements from those of
solution ( ) of table 3.2 is 079%. Complete (diagonal) pivoting for the largest
element in the remaining submatrix of the Gauss-Jordan working array gave a
solution with an average relative difference (root mean square) of 041%. There
are, of course, 120 pivot permutations, and the differences measured for each
solution ranged from 010% to 079%. Thus pivot ordering does not appear to be
a serious difficulty in this example.
The operations of the Gauss-Jordan algorithm are also of utility in the solution
of linear and quadratic programming problems as well as in methods derived from
such techniques (for example, minimum absolute deviation fitting). Unfortunately,
these topics, while extremely interesting, will not be discussed further in this
monograph.
97
Bauer and Reinsch (in Wilkinson and Reinsch 1971, p 45) present a very compact
algorithm for inverting a positive definite symmetric matrix in situ, that is,
overwriting itself. The principal advantages of this algorithm are as follows.
(i) No pivoting is required. This is a consequence of positive definiteness and
symmetry. Peters and Wilkinson (1975) state that this is well known, but I
believe the full analysis is as yet unpublished.
(ii) Only a triangular portion of the matrix need be stored due to symmetry,
though a working vector of length n, where n is the order of the matrix, is needed.
The algorithm is simply the substitution procedure outlined above. The
modifications which are possible due to symmetry and positive definiteness,
however, cause the computational steps to look completely different.
Consider an intermediate situation in which the first k of the elements x and b
have been exchanged in solving
Ax = b
(8.6)
by the Gauss-Jordan algorithm. At this stage the matrix of coefficients will have
the form
W X
Y Z
(8.7)
(8.9)
98
k)
(8.10)
(8.11)
(8.12)
(8.13)
(8.16)
where we use the identity
(8.17)
since these elements belong to a submatrix Z which is symmetric in accord with
the earlier discussion.
It remains to establish that
for j = (k+1), . . . , n
(8.18)
but this follows immediately from equations (8.11) and (8.12) and the symmetry of
the submatrix Z. This completes the induction.
There is one more trick needed to make the Bauer-Reinsch algorithm extremely compact. This is a sequential cyclic re-ordering of the rows and columns
of A so that the arithmetic is always performed with k = 1. This re-numeration
relabels (j + 1) as j for j = 1, 2, . . . , (n - 1) and relabels 1 as n. Letting
(8.19)
this gives a new Gauss-Jordan step
(8.20)
(8.21)
(8.22)
(8.23)
for i, j = 2, . . . , n.
99
100
NEW
LOAD ENHBRT
LOAD ENHMT5
RUN
ENHBRG AUG 19 75
BAUER REINSCH
ORDER? 5
MOLER MATRIX
1
-1 2
-1 0 3
-1 0 1 4
-1 0 1 2 5
INVERSE
INVERSE
ROW 1
2
ROW 2
-1 2
ROW 3
0 -1 2
ROW 4
-1 2
ROW 5
0 0 0 -1
ROW
86
ROW
43
ROW
22
ROW
12
ROW
8
1
2
22
3
11 6
4
6 3 2
5
4 2 1 1
101
OF INVERSE
2 3
2 3 4
4 5
INVERSE OF INVERSE
ROW 1
.999994
ROW 2
-1 2
ROW 3
-.999987
ROW 4
-.999989
ROW 5
-.999999
0 2.99997
0
.999976
3.99998
.999978
1.99998
4.99998
Previous Home
Chapter 9
(9.1a)
(9.1b)
from some starting vector x1 until successive x i are identical. The eigenvector is
then given by x and the magnitude of the eigenvalue by ||y || where || || represents
any vector norm. The simplicity of this algorithm and its requirement for only a
matrix-vector product makes it especially suitable for small computers, though
convergence may at times be very slow.
The method works as follows. Consider the expansion of x 1 in terms of the
eigenvectors j, j = 1, 2, . . . n, which span the space. This may not be possible for
non-symmetric matrices. However, the algorithm is generally less useful for such
matrices and will not be considered further in this context. Wilkinson (1965, chap
9) has a good discussion of the difficulties. Therefore, returning to the expansion
102
Next
103
of x 1, we have
(9.2)
The first iteration of the power method then gives
(9.3)
where ej is the eigenvalue of A corresponding to j. Denoting the reciprocal of
the norm of yj by Nj, that is
Nj = || yj||-1
(9.4)
(9.7)
unless j = 1 (the case of degenerate eigenvalues is treated below), the coefficients
of j, j 1, eventually become very small. The ultimate rate of convergence is
given by
r = | e 2 /e1 |
(9.8)
where e 2 is the eigenvalue having second largest magnitude. By working with the
matrix
A ' = A kl
(9.9)
this rate of convergence can be improved if some estimates of e 1 and e2 are
known. Even if such information is not at hand, ad hoc shifts may be observed to
improve convergence and can be used to advantage. Furthermore, shifts permit
(i) the selection of the most positive eigenvalue or the most negative eigenvalue
and, in particular,
(ii) evasion of difficulties when these two eigenvalues are equal in magnitude.
Degenerate eigenvalues present no difficulty to the power method except that it
now converges to a vector in the subspace spanned by all eigenvectors corresponding to e1 . Specific symmetry or other requirements on the eigenvector must
be imposed separately.
In the above discussion the possibility that a 1 = 0 in the expansion of x 1 has
been conveniently ignored, that is, some component of x 1 in the direction of l i s
assumed to exist. The usual advice to users is, Dont worry, rounding errors will
104
(9.10a)
xi+l = yi/||yi||.
(9.10b)
Note that the solution of a set of simultaneous linear equations must be found at
each iteration.
While the power method is only applicable to the matrix eigenproblem (2.62),
inverse iteration is useful for solving the generalised eigenproblem (2.63) using
A' = A kB
(9.11)
(9.12a)
xi+1 = yi/||yi||.
(9.12b)
Once again, the purpose of the normalisation of y in (9.1b), (9.10b) and (9.12b) is
simply to prevent overflow in subsequent calculations (9.1a), (9.10a) or (9.12a).
The end use of the eigenvector must determine the way in which it is standardised. In particular, for the generalised eigenproblem (2.63), it is likely that x
should be normalised so that
x T Bx = 1.
(9.13)
Such a calculation is quite tedious at each iteration and should not be performed
until convergence has been obtained, since a much simpler norm will suffice, for
instance the infinity norm
(9.14)
where yj is the jth element of y. On convergence of the algorithm, the eigenvalue
is
(9.14)
e = k + xj/y j
(where the absolute value is not used).
Inverse iteration works by the following mechanism. Once again expand x 1 as
105
in (9.2); then
(9.16)
or
(9.17)
Therefore
(9.18)
and the eigenvector(s) corresponding to the eigenvalue closest to k very quickly
dominate(s) the expansion. Indeed, if k is an eigenvalue, A' is singular, and after
solution of the linear equations (9.12a) (this can be forced to override the
singularity) the coefficient of the eigenvector corresponding to k should be of
the order of 1/eps, where eps is the machine precision. Peters and Wilkinson
(1971, pp 418-20) show this full growth to be the only reliable criterion for
convergence in the case of non-symmetric matrices. The process then converges
in one step and obtaining full growth implies the component of the eigenvector in
the expansion (9.2) of x 1 is not too small. Wilkinson proposes choosing different
vectors x 1 until one gives full growth. The program code to accomplish this is
quite involved, and for symmetric matrices repetition of the iterative step is
simpler and, because of the nature of the symmetric matrix eigenproblem, can
also be shown to be safe. The caution concerning the choice of starting vector for
matrices which are exactly representable should still be heeded, however. In the
case where k is not an eigenvalue, inverse iteration cannot be expected to
converge in one step. The algorithm given below therefore iterates until the
vector x has converged.
The form of equation (9.12a) is amenable to transformation to simplify the
iteration. That is, pre-multiplication by a (non-singular) matrix Q gives
QAyi = QBxi.
(9.19)
106
107
108
(9.20)
(9.21)
since a normalisation is performed at each stage. Alas, too many matrices have
109
(9.22)
by virtue of (6.27), where P is the permutation matrix defined by the interchanges resulting from pivoting. These can be stored, as discussed in 6.3 in a
single integer vector of indices q. Then to perform inverse iteration, it is only
necessary to store this vector plus a working array n by n instead of the n by 2n
array used in algorithm 10. Two vectors x and y are still needed. The elements of
the lower-triangular matrix L are
1
for i = j
(9.23)
Lij = 0
for j > i
for j < i.
mij
110
The subdiagonal elements mij are left in place after the Gauss elimination step.
and the upper-triangular matrix R forms the upper triangle of the working array.
Then the inverse iteration step (9.10a) involves the forward-substitution
L v = Px i
(9.23)
Ryi = v.
(9.25)
and back-substitution
The latter substitution step has been treated before. but the former is simplified by
the ones on the diagonal of L so that
(9.26)
(9.27)
The calculation can be arranged so that u is not needed. that is so x and y are the
only working vectors needed.
(9.28)
111
(9.29)
(9.30)
112
113
114
115
116
117
AT SEPT 3 74
);REAL=?
);REAL=?
);REAL=?
);REAL=?
);REAL=?
);REAL=?
);REAL=?
);REAL=?
);REAL=?
1 IMAGINARY? 2
3 IMAGINARY? 4
21 IMAGINARY? 22
43 IMAGINARY? 44
13 IMAGINARY? 14
15 IMAGINARY? 16
5 IMAGINARY? 6
7 IMAGINARY? 8
25 IMAGINARY? 26
118
Previous
Chapter 10
(10.2)
(10.3)
(10.4)
(10.5)
-1
(10.6)
-l
(10.7)
but also positive definite (see 7.1, p 71), then from the singular-value decomposition
A = USVT
(2.53)
119
Home Next
120
(10.11)
(10.12)
2
Since (10.11) and (10.12) are both eigenvalue equations for A , S and E 2 are
identical to within ordering, and since all ei are positive, the orderings (10.9) and
(10.10) imply
(10.13)
S = E.
Now it is necessary to show that
(10.14)
AV = VS.
From (10.1), letting Q = X T V, we obtain
T
AV = XEX V = XEQ = XSQ.
(10.15)
(10.16)
121
Explicit analysis of the elements of equation (10.16) shows that (a) if Sii Sjj, then
Qij = 0, and (b) the commutation
QS = SQ
(10.17)
(10.18)
(10.19)
for i = 1, 2, . . . , n
(10.20)
122
for E >
for E <
(10.23a)
(10.23b)
to ensure a positive definite matrix A' results from the shift (10.19). The machine
precision is used simply to take care of those situations, such as a matrix with a
null row (and column), where the lower bound E is in fact a small eigenvalue.
Unfortunately, the accuracy of eigensolutions computed via this procedure is
sensitive to the shift. For instance, the largest residual element R, that is, the
element of largest magnitude in the matrix
AX XE
(10.24)
and the largest inner product P, that is, the off-diagonal element of largest
magnitude in the matrix
(10.25)
XT X 1 n
for the order-10 Ding Dong matrix (see appendix 1) are: for h = 3.57509,
R = 536442E6 and P = 124425E6 while for h = 107238, R = 149012E5
and P = 216812E6. These figures were computed on a Data General NOVA
(23-bit binary arithmetic) using single-length arithmetic throughout as no extended precision was available. The latter shift was obtained using
for E > 0
for E < 0.
(10.26a)
(10.26 b)
In general, in a test employing all nine test matrices from appendix 1 of order 4
and order 10, the shift defined by formulae (10.23) gave smaller residuals and
inner products than the shift (10.26). The eigenvalues used in the above examples
were computed via the Rayleigh quotient
(10.27)
rather than the singular value, that is, equation (10.20). In the tests mentioned
above, eigenvalues computed via the Rayleigh quotient gave smaller residuals
than those found merely by adding on the shift. This is hardly surprising if the
nature of the problem is considered. Suppose that the true eigenvectors are i ,
i = 1, 2, . . . , n. Let us add a component c w to j , where w is some normalised
combination of the i , i j, and c measures the size of the component (error); the
normalised approximation to the eigenvector is then
xj = (l + c 2 ) - ( j + cw).
(10.28)
The norm of the deviation (xj j ) is found, using the binomial expansion and
ignoring terms in c4 and higher powers relative to those in c2, to be approximately
equal to c. The Rayleigh quotient corresponding to the vector given by (10.28) is
2 T
2
Qj = ( Ejj + c w Aw)/(1+ c )
(10.29)
since
is zero by virtue of the orthogonality of the eigenvectors. The
deviation of Qj from the eigenvalue is
123
TABLE 10.1. Maximum absolute residual element R and maximum absolute inner product P between
normalised eigenvectors for eigensolutions of order n = 10 real symmetric matrices. All programs in
-22
BASIC on a Data General NOVA. Machine precision = 2 .
Algorithm 13 type
Algorithm 14 type
Rutishauser
Rutishauser with Nash
Jacobi
formulae
Matrix
Jacobi
which
orders
Jacobi
using
symmetry
with
equation
(10.27)
with
equation
(10.20)
Hilbert
R
P
726E-7
0
576E-6
864E-7
482E-6
113E-6
629E-6
110E-6
668E-6
232E-6
715E-6
232E-6
Ding Dong
R
P
232E-6
0
286E-6
536E-7
886E-6
143E-6
108E-5
119E-6
536E-6
124E-6
154E-5
124E-6
Moler
R
P
174E-5
194E-7
362E-5
864E-7
634E-5
805E-7
101E-4
894E-7
391E-5
221E-6
946E-5
221E-6
Frank
R
P
229E-5
209E-7
553E-5
685E-7
896E-5
107E-6
125E-4
857E-7
572E-5
166E-6
972E-5
166E-6
Bordered
R
P
179E-6
534E-9
191E-6
596E-7
620E-6
998E-7
205E-5
140E-6
143E-6
554E-7
191E-6
554E-7
Diagonal
R
P
0
0
0
0
0
0
0
0
0
0
0
0
W+
R
P
232E-6
179E-6
459E-6
126E-6
245E-5
188E-6
201E-5
191E-6
916E-6
175E-6
143E-5
175E-6
W-
R
P
194E-6
477E-7
858E-6
626E-7
163E-5
797E-7
286E-5
541E-7
135E-5
210E-6
200E-5
210E-6
Ones
R
P
465E-6
0
106E-6
365E-7
106E-5
992E-7
505E-5
104E-3
243E-5
989E-7
ll9E-5
989E-7
(10.30)
Thus the error has been squared in the sense that the deviation of xj from j is of
order c, while that of Qj from Ejj is of order c2. Since c is less than unity, this
implies that the Rayleigh quotient is in some way closer to the eigenvalue than
the vector is to an eigenvector.
Unfortunately, to take advantage of the Rayleigh quotient (and residual calculation) it is necessary to keep a copy of the original matrix in the memory or
perform some backing store manipulations. A comparison of results for algorithm
13 using formulae (10.20) and (10.27) are given in table 10.1.
Algorithm 13. Eigensolutions of a real symmetric matrix via the singular-value
decomposition
Procedure evsvd(n: integer; {order of matrix eigenproblem}
var A,V : matrix; {matrix and eigenvectors}
initev: boolean; {switch -- if TRUE eigenvectors
are initialized to a unit matrix of order n}
124
{alg13.pas ==
eigensolutions of a real symmetric matrix via the singular value
decomposition by shifting eigenvalues to form a positive definite
matrix.
This algorithm replaces Algorithm 13 in the first edition of Compact
Numerical Methods.
Copyright 1988 J.C.Nash
}
var
count, i, j, k, limit, skipped : integer;
c, p, q, s, shift, t : real ; {rotation angle quantities}
oki, okj, rotn : boolean;
ch : char;
begin
writeln(alg13.pas-- symmetric matrix eigensolutions via svd);
{Use Gerschgorin disks to approximate the shift. This version
calculates only a positive shift.}
shift:=0.0;
for i:=1 to n do
begin
t:=A[i,i];
for j:=1 to n do
if i<>j then t:=t-abs(A[ij]);
if t<shift then shift:=t; {looking for lowest bound to eigenvalue}
end; {loop over rows}
shift:=-shift; {change sign, since current value < 0 if useful}
if shift0.0 then shift:=0.0;
writeln(Adding a shift of ,shift, to diagonal of matrix.);
for i:=1 to n do
begin
for j:=1 to n do
begin
W[i,j]:=A[i,j]; {copy matrix to working array}
if i=j then W[i,i]:=A[i,i]+shift; {adding shift in process}
if initev then {initialize eigenvector matrix}
begin
if i=j then W[i+n,i]:=0.0
else
begin
W[i+n,j]:=0.0;
end,
end; {eigenvector initialization}
end; {loop on j}
end; {loop on i}
NashSVD(n, n, W, Z); {call alg01 to do the work}
for i:=1 to n do
begin
Z[i]:=sqrt(Z[i])-shift; {to adjust eigenvalues}
for j:=1 to n do
V[i,j]:=W[n+i,j]; {extract eivenvectors}
end; {loop on i}
end; {alg13.pas == evsvd}
125
(The maximum absolute residual was 38147E6, the maximum inner product
44226E7.) The last two principal moments of inertia are the same or
degenerate. Thus any linear combination of v2 and v 3 will give a new vector
which is orthogonal to
I 1 = 2ma2/12
I2 =11ma2/12
I 3 =11ma 2/12
126
(10.31)
(using V in place of X). The fact that a real symmetric matrix can be diagonalised
by its eigenvectors gives rise to a number of approaches to the algebraic
eigenvalue problem for such matrices. One of the earliest of these was suggested
by Jacobi (1846). This proposes the formation of the sequence of matrices
A( 0 ) = A
A (k+1) =(V(k) ) T A ( k )V ( k )
(10.32)
where the V(k) are the plane rotations introduced in 3.3. The limit of the
sequence is a diagonal matrix under some conditions on the angles of rotation.
Each rotation is chosen to set one off-diagonal element of the matrix A (k) to zero.
In general an element made zero by one rotation will be made non-zero by
another so that a series of sweeps through the off-diagonal elements are needed to
reduce the matrix to diagonal form. Note that the rotations in equation (10.32)
preserve symmetry, so that there are n(n -1)/2 rotations in one sweep if A is of
order n.
Consider now the effect of a single rotation, equation (3.11), in the ij plane.
Then for m i, j
(10.33)
(10.34)
while
(10.35)
(10.36)
(10.37)
By allowing
(10.38)
and
(10.39)
the angle calculation defined by equations (3.22)-(3.27) will cause
zero. By letting
to be
(10.40)
127
(10.41)
(k+1)
(k)
(10.42)
128
129
Algorithm 14. A Jacobi algorithm for eigensolutions of a real symmetric matrix (cont.)
}
{STEP 0 -- via the calling sequence of the procedure, we supply the matrix
and its dimensions to the program.}
var
count, i, j, k, limit, skipped : integer;
c, p, q, s, t : real;
ch : char;
oki, okj, rotn : boolean;
begin
writeln(alg14.pas -- eigensolutions of a real symmetric);
writeln(matrix via a Jacobi method);
if initev then {Do we initialize the eigenvectors to columns of
the identity?}
begin
for i := l to n do
begin
for j := 1 to n do V[i,j] := 0.0;
V[i,i] := 1.0; {to set V to a unit matrix -- rotated to become
the eigenvectors}
end; {loop on i;}
end; {initialize eigenvectors}
count := 0;
limit := 30; {an arbitrary choice following lead of Eberlein}
skipped := 0; {so far no rotations have been skipped. We need to set
skipped here because the while loop below tests this variable.}
{main loop}
while (count<=limit) and (skipped<((n*(n-1)) div 2) ) do
{This is a safety check to avoid indefinite execution of the algorithm.
The figure used for limit here is arbitrary. If the program terminates
by exceeding the sweep limit, the eigenvalues and eigenvectors computed
may still be good to the limitations of the machine in use, though
residuals should be calculated and other tests made. Such tests are
always useful, though they are time- and space-consuming.}
begin
count := count+1; {to count sweeps -- STEP 1}
write(sweep,count, );
skipped := 0; {to count rotations skipped during the sweep.}
for i := 1 to (n-1) do {STEP 2}
begin {STEP 3}
for j := (i+1) to n do {STEP 4}
begin
rotn := true; {to indicate that we carry out a rotation unless
calculations show it unnecessary}
p := 0.5*(A[i,j]+A[j,i]); {An average of the off-diagonal elements
is used because the right and left multiplications by the
rotation matrices are non-commutative in most cases due to
differences between floating-point and exact arithmetic.}
q := A[i,i]-A[j,j]; {Note: this may suffer from digit cancellation
when nearly equal eigenvalues exist. This cancellation is not
normally a problem, but may cause the algorithm to perform more
work than necessary when the off-diagonal elements are very
small.}
130
Algorithm 14. A Jacobi algorithm for eigensolutions of a real symmetric matrix (cont.)
t := sqrt(4.0*p*p+q*q);
if t=0.0 then {STEP 5}
begin {STEP 11 -- If t is zero, no rotation is needed.}
rotn := false; {to indicate no rotation is to be performed.}
end
else
begin {t>0.0}
if q>=0.0 then {STEP 6}
begin {rotation for eigenvalue approximations already in order}
{STEP 7 -- test for small rotation}
oki := (abs(A[i,i])=abs(A[i,i])+l00.0*abs(p));
okj := (abs(A[j,j])=abs(A[j,j])+l00.0*abs(p));
if oki and okj then rotn := false
else rotn := true;
{This test for a small rotation uses an arbitrary factor of
100 for scaling the off-diagonal elements. It is chosen to
ensure small but not very small rotations are performed.}
if rotn then
begin {STEP 8}
c := sqrt((t+q)/(2.0*t)); s := p/(t*c);
end;
end {if q>=0.0}
else
begin {q<0.0 -- always rotate to bring eigenvalues into order}
rotn := true; {STEP 9}
s := sqrt((t-q)/(2,0*t));
if p<0.0 then s := -s;
c := p/(t*s);
end; {STEP 10}
if 1.0=(1.0+abs(s)) then rotn := false; {test for small angle}
end; {if t=0.0}
if rotn then {STEP 11 -- rotate if necessary}
begin {STEP 12}
for k := 1 to n do
begin
q := A[i,k]; A[i,k] := c*q+s*A[j,k]; A[j,k] := -s*q+c*A[j,k];
end; {left multiplication of matrix A}
{STEP 13}
for k := l to n do
begin {right multiplication of A and V}
q := A[k,i]; A[k,i] := c*q+s*A[k,j]; A[k,j] := -s*q+c*A[k,j];
{STEP 14 -- can be omitted if eigenvectors not needed}
q := V[k,i]; V[k,i] := c*q+s*V[k,j]; V[k,j] := -s*q+c*V[k,j];
end; {loop on k for right multiplication of matrices A and V}
end {rotation carried out}
else
{STEP 11 -- count skipped rotations}
skipped := skipped+1; {to count the skipped rotations}
end; {loop on j} {STEP 15}
end; {loop on i. This is also the end of the sweep. -- STEP 16}
writeln( ,skipped, / ,n*(n-1) div 2, rotations skipped);
end; {while -- main loop}
end; {alg14.pas = evJacobi -- STEP 17}
131
132
S=A(
s=A(
S=A(
S=A(
S=A(
S=A(
S=A(
S=A(
S=A(
S=A(
S=A(
S=A(
S=A(
s=A(
S=A(
S=A(
S=A(
S=A(
S=A(
)=
)=
)=
)=
)=
)=
)=
)=
)=
)=
)=
)=
)=
)=
)=
)=
)=
)=
)=
,4
, 5
,6
,7
, 3
, 4
,5
, 6
,7
,4
, 5
, 6
, 7
,5
, 6
,7
, 6
,7
, 7
7.41318E-03
1.82147E-03
7.52768E-05
1.64317E-06
4.960549737
0.125876293
0.022728042
9.21734E-04
2.08461E-05
0.566085721
0.060712798
2.40536E-03
5.40200E-05
0.077233589
2.86862E-03
6.29383E-05
7.568018813
0.105281178
0.440491969
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
LOG10(S)+10=
7.870004752
7.260421332
5.876661279
4.215681882
10.6955298
9.099943945
8.356562015
6.964605555
5.319024874
9.7528822
8.78328025
7.381180598
5.73255455
8.887806215
7.457672605
5.798915030
10.8789822
9.022350736
9.643937995
MATRIX
ROW 1
:
-5.509882 0.733711324 0.144098438 7.41318E-03 1.82147E-03
1.64317E-06
7.52768E-05
:
ROW 2
0.733711324 -11.811654 4.960549737 0.125876293 0.022728042
9.21734E-04 2.08461E-05
:
ROW 3
0.144098438 4.960549737 -12.970687 0.566085721 0.060712798
2.40536E-03 5.40200E-05
:
ROW 4
7.41318E-03 0.125876293 9.566085721 -17.596207
0.077233589
2.86362E-03 6.29383E-05
:
ROW 5
1.82147E-03
0.022728042 0.060712798
0.077233589
-7.489041
7.568018813 0.105281178
:
ROW 6
7.52768E-05 9.21734E-04 2.40536E-03
2.86862E-03
7.568018813
-18.58541
0.440491969
ROW 7
:
1.64317E-06 2.08461E-05 5.40200E-05
6.29383E-05
0.105281178
0.440491969 -2.325935
NASH JACOBI ALG.
0
ROTATIONS
ROTATIONS
0
0
ROTATIONS
5
ROTATIONS
19
ROTATIONS
21
ROTATIONS
CONVERGED
EIGENVALUE 1
4.90537E-04
0.107934624
EIGENVALUE 2
6.13203E-03
0.438925002
EIGENVALUE 3
-0.954835405
5.12620E-03
EIGENVALUE 4
0.295304747
7.55197E-03
EIGENVALUE 5
-0.027700000
6.88542E-03
14 DEC 13/77
SKIPPED
SKIPPED
SKIPPED
SKIPPED
SKIPPED
SKIPPED
VECTOR:
=-2.258417596
1.72184E-03
1.37576E-03
0.978469440
VECTOR:
=-3.713643588
0.011861238
0.010411191
-0.205670808
VECTOR:
=-5.298872615
-0.240328992
-0.174066634
-1.07833E-03
=-7.574719192
VECTOR:
-0.704148469
-0.644023478
-8.51399E-04
VECTOR:
=-17.15255764
-0.584183372
0.560202581
-2.12488E-04
9.85037E-04
0.175902256
5.53618E-03
0.874486723
-0.010989746
9.16066E-03
-0.044915626
0.011341429
-0.586587905
1.65695E-03
=-17.86329687
-0.363977437
-1.36341E-04
=-22.42730849
-3.42339E-04
-0.017179128
133
VECTOR:
0.462076406
-0.808519448
6.627116-04
VECTOR:
2.46894E-03
6.41285E-03
-0.451791590
134
eigenvalues in addition to two n by n arrays to store the original matrix and the
eigenvectors. Thus 2n2+n elements appear to be needed. Rutishausers program
requires 2n2+2n elements (that is, an extra vector of length n) as well as a
number of individual variables to handle the extra operations used to guarantee
the high precision of the computed eigensolutions. Furthermore, the program
does not order the eigenvalues and in a great many applications this is a necessary
requirement before the eigensolutions can be used in further calculations. The
ordering the author has most frequently required is from most positive eigenvalue
to most negative one (or vice versa).
The extra code required to order the eigenvalues and their associated eigenvectors can be avoided by using different formulae to calculate intermediate results.
That is to say, a slightly different organisation of the program permits an extra
function to be incorporated without increasing the code length. This is illustrated
by the second column of table 10.11. It should be emphasised that the program
responsible for these results was a simple combination of Rutishausers algorithm
and some of the ideas that have been presented in 3.3 with little attention to how
well these meshed to preserve the high-precision qualities Rutishauser has carefully built into his routine. The results of the mongrel program are nonetheless
quite precise.
If the precision required is reduced a little further, both the extra vector of
length n and about a third to a half of the code can be removed. Here the
uncertainty in measuring the reduction in the code is due to various choices such
as which DIMENSION-ing, input-output or constant setting operations are included
in the count.
It may also seem attractive to save space by using the symmetry of the matrix A
as well as that of the intermediate matrices A(k). This reduces the workspace by
n (n 1)/2 elements. Unfortunately, the modification introduces sufficient extra
code that it is only useful when the order, n, of A is greater than approximately
10. However, 10 is roughly the order at which the Jacobi methods become
non-competitive with approaches such as those mentioned earlier in this section.
Still worse, on a single-precision machine, the changes appear to reduce the
precision of the results, though the program used to produce column 4 of table
10.1 was not analysed extensively to discover and rectify sources of precision loss.
Note that at order 10, the memory capacity of a small computer may already be
used up, especially if the eigenprohlem is part of a larger computation.
If the storage requirement is critical, then the methods of Hestenes (1958),
Chartres (1962) and Kaiser (1972) as modified by Nash (1975) should be
considered. This latter method is outlined in 10.1 and 10.2, and is one which
transforms the original matrix A into another, B, whose columns are the eigenvectors of A each multiplied by its corresponding eigenvalue, that is
(10.43)
2
where E is the diagonal matrix of eigenvalues. Thus only n storage locations are
required and the code is, moreover, very short. Column 5 of table 10.1 shows the
precision that one may expect to obtain, which is comparable to that found using
simpler forms of the traditional Jacobi method. Note that the residual and
inner-product computations for table 10.1 were all computed in single precision.
Chapter 11
(2.63)
(11.3)
B-1/2 = ZD - 1ZT
(11.4)
(11.5)
Then
and
(11.5a)
which is simply a matrix form which collects together all solutions to (2.63).
Equation (11.5) can be solved as a conventional symmetric eigenproblem
A 1V = VE
where
A 1= B- 1 / 2AB- 1 / 2
(11.6)
(11.7a)
and
V = B1 / 2 X.
135
(11.76)
(11.8)
so that
(D- 1ZTAZD -l )(DZT X)=(DZT X)E
(11.9)
or
A2Y = YE
(11.10)
Y= DZT X
(11.11 a)
(11.11 b)
where
and
Another approach is to apply the Choleski decomposition (algorithm 7) to B so
that
AX = LLT XE
(11.12)
where L is lower-triangular. Thus we have
(L- 1AL- T)(LT X)=(LT X) E
(11.13)
A3 W = WE.
(11.14)
or
Note that A3 can be formed by solving the sets of equations
LG = A
(11.15)
A3 LT = G
(11.16)
and
or
(11.17)
so that only the forward-substitutions are needed from the Choleski back-solution
algorithm 8. Also. the eigenvector transformation can be accomplished by solving
LT X = W
(11.18)
requiring only back-substitutions from this same algorithm.
While the variants of the Choleski decomposition method are probably the
most efficient way to solve the generalised eigenproblem (2.63) in terms of the
number of arithmetic operations required, any program based on such a method
must contain two different types of algorithm, one for the decomposition and one
to solve the eigenproblem (11.13). The eigenvalue decomposition (11.2), on the
other hand, requires only a matrix eigenvalue algorithm such as the Jacobi
algorithm 14.
Here the one-sided rotation method of algorithm 13 is less useful since there is
no simple analogue of the Gerschgorin bound enabling a shifted matrix
A' = A + k B
(11.19)
137
(11.20)
138
(11.22)
139
(11.25)
j+3
].
(11.26)
The minimisation of the Rayleigh quotient with respect to the coefficients cj gives
the eigenproblem
(11.27)
Ac = eBc
where
(11.28)
and
(11.29)
These integrals can be decomposed to give expressions which involve only the
integrals
for m odd
for m even
(11.30)
for m = 0.
The normalising constant N 2 has been chosen to cancel some awkard constants in
the integrals (see, for instance, Pierce and Foster 1956, p 68).
Because of the properties of the integrals (11.30) the eigenvalue problem
(11.27) reduces to two smaller ones for the even and the odd functions. If we set a
parity indicator w equal to zero for the even case and one for the odd case,
140
we can substitute
j - 1 = 2 (q -1) + w
(11.31 a)
(11.31b)
i- 1 = 2 (p-1) + w
where p and q will be the new indices for the matrices A and B running from 1 to
n'= n /2 (assuming n even). Thus the matrix elements are
p q=-(j 1)(j 2)Is + 2 a(2j 1)I s+2 + (k 2 4a 2 )I
s +4
+ k4 I s + 6
(11.32)
and
(11.33)
where
s = i + j 4 =2(p + q 3+ w)
and j is given by (11.31a). The tilde is used to indicate the re-numeration of A
and B.
The integrals (11.30) are easily computed recursively.
STEP
0
1
2
3
DESCRIPTION
Enter s, . Note s is even.
Let v = 1.
If s<0, stop. Is is in v. For s<0 this is always multiplied by 0.
For k = 1 to s/2.
Let v = v * (2 * k- 1) * 025/ .
End loop on k.
4
End integral. Is is returned in v.
As an example, consider the exactly solvable problem using n' = 2, and = 05
for w = 0 (even parity). Then the eigenproblem has
with solutions
e =1
c = (1, 0)T
e=5
c = 2-(-1, 2)T.
and
The same oscillator (a = 05) with w = 1 and n' = 10 should also have exact
solutions. However, the matrix elements range from 05 to 32E+17 and the
solutions are almost all poor approximations when found by algorithm 15.
Likewise, while the problem defined by n' = 5, w = 0, a = 2, k2= 0, k4 = 1 is
solved quite easily to give the smallest eigenvalue e1= 106051 with eigenvector
c = (0747087, 107358, 0866449, 0086206, 0195257)T
the similar problem with n' = 10 proves to have a B matrix which is computationally singular (step 4 of algorithm 15). Inverse iteration saves the day giving, for
n ' = 5, e = 10651 and for n' = 10, e = 106027 with eigenvectors having small
residuals. These results were found using an inverse iteration program based on
141
Previous Home
Chapter 12
f i(b 1 ,b2, . . . ,b n)
(12.1)
Next
143
(12.4)
(12.6)
with respect to b. For M greater than n, solutions can be sought in the leastsquares sense; from this viewpoint the problems are then indistinguishable. The
minimum in (12.6) should be found with S = 0 if the system of equations has a
solution. Conversely, the derivatives
j = 1, 2, . . . , n
(12.7)
144
where F is a scale factor (to avoid confusion the notation has been changed from
that of Perry and Soland). There are a number of assumptions that have not been
stated in this summary, but from the information given it is fairly easy to see that
each draw will generate a revenue
R = Np (v + w + K2 + K2N + K3t).
Thus the revenue per unit time is
R/t = S = [(p K2 )N (v + w + K1)]/ t K3.
Therefore, maximum revenue per unit time is found by minimising S( b) where
(12.9)
(12.10)
145
Yil
1
2
3
4
5
6
7
8
9
10
11
12
5308
724
9638
12866
17069
23192
31443
38558
50156
62948
75995
91972
where Z and are constants. In this simple example, the equations reduce to
or
so that
However, in general, the system will involve more than one commodity and will
not offer a simple analytic solution.
Example 12.4. Root-finding
In the economic analysis of capital projects, a measure of return on investment
that is commonly used is the internal rate of return r. This is the rate of interest
applicable over the life of the project which causes the net present value of the
project at the time of the first investment to be zero. Let yli be the net revenue of
the project, that is, revenue or income minus loss or investment, in the ith time
period. This has a present value at the first time period of
y l i /(1 + 001r)i -
where r is the interest rate in per cent per period. Thus the total present value at
the beginning of the first time period is
where K is the number of time periods in the life of the project. By setting
b = 1/(1 + 001r)
this problem is identified as a polynomial root-finding problem (12.8).
146
/[(1 + R ) N 1].
147
pointed out, however, starting points can be close to the desired solution without
guaranteeing convergence to that solution. They found that certain problems in
combination with certain methods have what they termed magnetic zeros to which
the method in use converged almost regardless of the starting parameters employed. However, I did not discover this magnetism when attempting to solve the
cubic-parabola problem of Brown and Gearhart using a version of algorithm 23.
In cases where one root appears to be magnetic, the only course of action once
several deflation methods have been tried is to reformulate the problem so the
desired solution dominates. This may be asking the impossible!
Another approach to global minimisation is to use a pseudo-random-number
generator to generate points in the domain of the function (see Bremmerman
(1970) for discussion of such a procedure including a FORTRAN program). Such
methods are primarily heuristic and are designed to sample the surface defined by
the function. They are probably more efficient than an n-dimensional grid search,
especially if used to generate starting points for more sophisticated minimisation
algorithms. However, they cannot be presumed to be reliable, and there is a lack
of elegance in the need for the shot-gun quality of the pseudo-random-number
generator. It is my opinion that wherever possible the properties of the function
should be examined to gain insight into the nature of a global minimum, and
whatever information is available about the problem should be used to increase
the chance that the desired solution is found. Good starting values can greatly
reduce the cost of finding a solution and greatly enhance the likelihood that the
desired solution will be found.
Chapter 13
ONE-DIMENSIONAL PROBLEMS
13.1. INTRODUCTION
One-dimensional problems are important less in their own right than as a part of
larger problems. Minimisation along a line is a part of both the conjugate
gradients and variable metric methods for solution of general function minimisation problems, though in this book the search for a minimum will only proceed
until a satisfactory new point has been found. Alternatively a linear search is
useful when only one parameter is varied in a complicated function, for instance
when trying to discover the behaviour of some model of a system to changes in
one of the controls. Roots of functions of one variable are less commonly needed
as a part of larger algorithms. They arise in attempts to minimise functions by
setting derivatives to zero. This means maxima and saddle points are also found,
so I do not recommend this approach in normal circumstances. Roots of polynomials are another problem which I normally avoid, as some clients have a nasty
habit of trying to solve eigenproblems by means of the characteristic equation.
The polynomial root-finding problem is very often inherently unstable in that very
small changes in the polynomial coefficients give rise to large changes in the roots.
Furthermore, this situation is easily worsened by ill chosen solution methods. The
only genuine polynomial root-finding problem I have encountered in practice is
the internal rate of return (example 12.4). However, accountants and economists
have very good ideas about where they would like the root to be found, so I have
not tried to develop general methods for finding all the roots of a polynomial, for
instance by methods such as those discussed by Jenkins and Traub (1975). Some
experiments I have performed with S G Nash (unpublished) on the use of matrix
eigenvalue algorithms applied to the companion matrices of polynomials were
not very encouraging as to accuracy or speed, even though we had expected such
methods to be slow.
13.2. THE LINEAR SEARCH PROBLEM
The linear search problem can be stated:
minimise S(b) with respect to b.
(13.1)
(13.2)
One-dimensional problems
149
Since local maxima also zero the derivative of the function S(b), such solutions
will have to be checked either by ensuring the second derivative S(b) is positive
or equivalently by examining the values of the function at points near the
supposed minimum.
When the derivative of S is not available or is expensive to compute, a method
for minimisation along a line is needed which depends only on function values.
Obviously, any method which evaluates the function at a finite number of points is
not guaranteed to detect deep, narrow wells in the function. Therefore, some
assumption must be made about the function on the interval [u,v ]. Here it will be
assumed that the function S(b) is unimodal in [u, v], that is, that there is only one
stationary value (either a maximum or a minimum) in the interval. In the case
where the stationary value is a maximum, the minimum of the function will be
either at u or at v.
Given that S(b) is unimodal in [u, v], a grid or equal-interval search could be
used to decrease the size of the interval. For instance, the function values could be
computed for each of the points
b j = u+ jh
j = 0, 1, 2, . . . , n
(13.3)
where
h = (v - u)/n.
(13.4)
If the smallest of these values is S(bk), then the minimum lies in the interval
[bk-l ,bk+1]. Equation (13.3) can be used to compute the endpoints of this
interval. If S(b0) or S(bn) is the smallest value, then the interval is [u , b 1] or
[bn- 1, u], though for simplicity this could be left out of a program. The search can
now be repeated, noting, of course, that the function values at the endpoints of
the interval have already been computed.
Algorithm 16. Grid search along a line
procedure gridsrch( var 1bound, ubound : real; {the lower and
upper bounds to the interval to be tested}
nint : integer; {the number of grid intervals}
var finin: real; {the lowest function value found}
var minarg: integer; {the grid point in the set
0,1,...,nint at which the minimum function value
was found}
var changarg: integer {the grid point in the set
1,2,...,nint which is nearest the upper bound
ubound such that there has been a sign change
in the function values f(lbound+(changarg-1)*h)
and f(lbound+changarg*h) where h is the step
== (ubound - lbound)/nint} );
{alg16.pas == one-dimensional grid search over function values
This version halts execution if the function is not computable at all
the grid points. Note that it is not equivalent to the version in the
first edition of Compact Numerical Methods.
Copyright 1988 J.C.Nash
}
150
One-dimensional problems
151
(13.5)
which provides three linear equations for the three unknowns A, B and C. Once
these are found, it is simple to set the derivative of the interpolant to zero
dI( b) / db = 2Cb + B = 0
(13.6)
to find the value of b which minimises the interpolating polynomial I(b). This
presumes that the parabola has a minimum at the stationary point, and upon this
presumption hang many ifs, ands and buts, and no doubt miles of computer listing
of useless results. Indeed, the largest part of many procedures for minimising a
function along a line by minimising an interpolant I(b) is concerned with ensuring
that the interpolant has the proper shape to guarantee a minimum. Furthermore
I(b) cannot be used to extrapolate very far from the region in which it has beer
152
(13.7)
S1 <S0
(13.8)
S 1 <S 2 .
(13.9)
for brevity,
and
Excluding the exceptional cases that the function is flat or otherwise perverse, so
that at least one of the conditions (13.8) or (13.9) is an inequality, the interpolating parabola will have its minimum between b 0 and b 2. Note now that we can
measure all distances from b1, so that equations (13.5) can be rewritten
(13.10)
where
xj = bj - b1
for j = 0, 1, 2.
(13.11)
One-dimensional problems
153
(Note that the denominators differ only in their signs.) Hence the minimum of the
parabola is found at
(13.14)
The success-failure algorithm always leaves the step length equal to x2. The
length x0 can be recovered if the steps from some initial point to the previous two
evaluation points are saved. One of these points will be b 1; the other is taken as
b0. The expression on the right-hand side of equation (13.14) can be evaluated in
a number of ways. In the algorithm below, both numerator and denominator have
been multiplied by -1.
To find the minimum of a function of one parameter, several cycles of
success-failure and parabolic inverse interpolation are usually needed. Note that
algorithm 17 recognises that some functions are not computable at certain points
b. (This feature has been left out of the program FMIN given by Forsythe et al
(1977), and caused some failures of that program to minimise fairly simple
functions in tests run by B Henderson and the author, though this comment
reflects differences in design philosophy rather than weaknesses in FMIN.) Algorithm 17 continues to try to reduce the value of the computed function until
(b+h) is not different from b in the machine arithmetic. This avoids the
requirement for machine-dependent tolerances, but may cause the algorithm to
execute indefinitely in environments where arithmetic is performed in extendedprecision accumulators if a storage of (b+h) is not forced to shorten the number
of digits carried.
In tests which I have run with B Henderson, algorithm 17 has always been
more efficient in terms of both time and number of function evaluations than a
linear search procedure based on that in algorithm 22. The reasons for retaining
the simpler approach in algorithm 22 were as follows.
(i) A true minimisation along the line requires repeated cycles of successfailure/inverse interpolation. In algorithm 22 only one such cycle is used as part of
a larger conjugate gradients minimisation of a function of several parameters.
Therefore, it is important that the inverse interpolation not be performed until at
least some progress has been made in reducing the function value, and the
procedure used insists that at least one success be observed before interpolation
is attempted.
(ii) While one-dimensional trials and preliminary tests of algorithm 17-like cycles
in conjugate gradients minimisation of a function of several parameters showed
some efficiency gains were possible with this method, it was not possible to carry
out the extensive set of comparisons presented in chapter 18 for the function
minimisation algorithms due to the demise of the Data General NOVA; the
replacement ECLIPSE uses a different arithmetic and operating system. In view
of the reasonable performance of algorithm 22, I decided to keep it in the
collection of algorithms. On the basis of our experiences with the problem of
minimising a function of one parameter, however, algorithm 17 has been chosen
for linear search problems. A FORTRAN version of this algorithm performed
154
competitively with the program FMIN due to Brent as given in Forsythe et al (1977)
when several tests were timed on an IBM 370/168.
The choice of the step adjustment factors Al and A2 to enlarge the step length
or to reduce it and change its sign can be important in that poor choices will
obviously cause the success-failure process to be inefficient. Systematic optimisation of these two parameters over the class of one-dimensional functions which
may have to be minimised is not feasible, and one is left with the rather
unsatisfactory situation of having to make a judgement from experience. Dixon
(1972) mentions the choices (2,-025) and (3,-05). In the present application,
however, where the success-failure steps are followed by inverse interpolation, I
have found the set (15,-025) to be slightly more efficient, though this may
merely reflect problems I have been required to solve.
Algorithm 17. Minimisation of a function of one variable
procedure min1d(var bb : real; {initial value of argument of function
to be minimised, and resulting minimum position}
var st: real; {initial and final step-length}
var ifn : integer; {function evaluation counter}
var fnminval : real {minimum function value on return});
{alg17.pas ==
One-dimensional minimisation of a function using success-failure
search and parabolic inverse interpolation
Copyright 1988 J.C.Nash
}
{No check is made that abs(st)>0.0. Algorithm will still converge.}
var
a1, a2, fii, s0, s1, s2, tt0, tt1, tt2, x0, x1, x2, xii : real;
notcomp, tripleok: boolean;
begin
writeln(alg17.pas -- One dimensional function minimisation);
{STEP 0 -- partly in procedure call}
ifn := 0; {to initialize function evaluation count}
a1 := 1.5; {to set the growth parameter of the success-failure search}
a2 := -0.25; {to set the shrink parameter of the success-failure search}
x1 := bb; {to start, we must have a current best argument}
notcomp := false; {is set TRUE when function cannot be computed. We set it
here otherwise minimum or root of fn1d may be displayed}
s0 := fn1d(x1,notcomp); ifn := ifn+l; {Compute the function at xl.}
if notcomp then
begin
writeln(***FAILURE *** Function cannot be computed at initial point);
halt;
end;
repeat {Main minimisation loop}
x0 := xl; {to save value of argument}
bb := x0;
x1 := x0+st; {to set second value of argument}
s1 := fn1d(x1,notcomp); if notcomp then s1 := big; ifn := ifn+1;
{Note mechanism for handling non-computability of the function.}
tripleok := false; {As yet, we do not have a triple of points in a V.}
One-dimensional problems
Algorithm 17. Minimisation of a function of one variable (cont.)
if s1<s0 then
begin {Here we can proceed to try to find s2 now in same direction.}
repeat {success-failure search loop}
st := st*a1; {increase the stepsize after a success}
x2 := x1+st; {get next point in the series}
s2 := fn1d(x2,notcomp); if notcomp then s2 := big; ifn := ifn+1;
if s2<s1 then
begin {another success}
s0 := s1; s1 := s2; {In order to continue search,}
x0 := x1; x1 := x2; {we copy over the points.}
write(Success1);
end
else {failure after success ==> V-shaped triple of points}
begin
tripleok := true; (to record existence of the triple}
write(Failure1);
end;
until tripleok, (End of the success-failure search for}
end {s1<s0 on first pair of evaluations}
else
begin {s1>=s0 on first pair of evaluations in this major cycle, so we
must look for a third point in the reverse search direction.}
st := a2*st; {to reverse direction of search and reduce the step size}
tt2 := s0; s0 := s1; s1 := tt2; {to swap function values}
tt2 := x0; x0 := x1; x1 := tt2; {to swap arguments}
repeat
x2 := x1+st; {get the potential third point}
s2 := fn1d(x2,notcomp); if notcomp then s2 := big; ifn := ifn+1;
if s2<s1 then
begin {success in reducing function -- keep going}
s0 := s1; s1 := s2, x0 := x1; x1 := x2; {reorder points}
st := st*a1; {increase the stepsize maintaining direction}
write(Success2);
end
else
begin {two failures in a row ensures a triple in a V}
tripleok := true; write(Failure2);
end;
until tripleok; {End of success-failure search}
end, {if s1<s0 for first pair of test points}
{Now have a V of points (x0,s0), (x1,s1), (x2,s2).}
writeln; writeln(Triple (,x0,,,s0,*));
writeln((,x1,,,s1,)); writeln((,x2,,,s2,));
tt0 := x0-x1; {to get deviation from best point found. Note that st
holds value of x2-x1.}
tt1 := (s0-s1)*st; tt2 := (s2-s1)*tt0; {temporary accumulators}
if tt1<>tt2 then {avoid zero divide in parabolic inverse interpolation}
begin
st := 0.5*(tt2*tt0-tt1*st)/(tt2-ttl1; {calculate the step}
Xii := x1+st;
writeln(Paramin step and argument :,st, ,xii);
if (reltest+xii)<>(reltest+x1) then
155
156
ECLIPSE
F(0) = 22500
F(10) = 222285
F (20) = 221777
F (30) = 223693
F (40) = 228278
F (50) = 235792
For both sets, the endpoints are u=0, v =50 and number of points
is n = 5.
One-dimensional problems
157
Simple grid search was applied to this function on Data General NOVA and
ECLIPSE computers operating in 23-bit binary and six-digit hexadecimal arithmetic,
respectively. The table at the bottom of the previous page gives the results of this
exercise. Note the difference in the computed function values!
An extended grid search (on the ECLIPSE) uses 26 function evaluations to
localise the minimum to within a tolerance of 01.
NEW
*ENTER#SGRID#
*RUN
SGRID NOV 23 77
16 44 31
3 5 l978
ENTER SEARCH INTERVAL ENDPOINTS
AND TOLERANCE OF ANSWERS PRECISION
? 10 ? 30 ? 1
ENTER THE NUMBER OF GRID DIVISIONS
F( 14 )= 22180.5
F( 18 )= 22169.2
F( 22 )=: 22195.9
F( 76 )= 22262.1
THE MINIMUM LIES IN THE INTERVAL.
F( 15.6 )= 22171.6
F( 17.2 )= 22168.6
F( 18.8 )= 22171.6
F( 20.4 )=: 22180.7
THE MINIMUM LIES IN THE INTERVAL
F( 16.24 )=: 22169.7
F( 16.88 )= 22168.7
F( 17.52 )= 22168.7
F( 18.16 )= 22169.7
THE MINIMUM LIES IN THE INTERVAL
F( 16.496 )= 22169.2
F( 36.752 )= 22168.8
F( 17.008 ):= 22168.7
F( 17.264 )= 22168.6
THE MINIMUM LIES IN THE INTERVAL
18 FUNCTION EVALUATIONS
NEW TOLERANCE ? .1
F( 17.1104 )= 22168.6
F( 17.2128 )= 22168.6
F( 17.3152 )= 22168.6
F( 17.4176 )= 22168.6
THE MINIMUM LIES IN THE INTERVAL [
F( 17.1513 )= 22168.6
F( l7.1923 )= 22168.6
F( 17.2332 )= 22168.6
F( 17.2742 )= 22168.6
THE: MINIMUM LIES IN THE INTERVAL [
26 FUNCTION EVALUATIONS
NEW TOLERANCE ? -1
[ 14 , 22 ]
[ 15.6 , 18.8 ]
[ 16.24 , 17.52 ]
[ 17.008 , 17.52 ]
17.1104 , 17.3152 ]
17.1923 , 17.2742 ]
STOP AT 0420
*
Algorithm 17 requires a starting point and a step length. The ECLIPSE gives
*
*RUN
NEWMIN JULY 7 77
STARTING VALUE= ? 10 STEP ? 5
158
The effect of step length choice is possibly important. Therefore, consider the
following applications of algorithm 17 using a starting value of t = 10.
Step length
Minimum at
Function evaluations
1
5
10
20
172264
172067
172314
171774
13
12
10
11
The differences in the minima are due to the flatness of this particular function,
which may cause difficulties in deciding when the minimum has been located. By
way of comparison, a linear search based on the success-failure/inverse interpolation sequence in algorithm 22 found the following minima starting from t = 10.
Step length
Minimum at
Function evaluations
1
5
10
20
172063
172207
172388
172531
23
23
21
24
One-dimensional problems
159
A cubic inverse interpolation algorithm requiring both the function and derivative to be computed, but which employs a convergence test based solely on the
change in the parameter, used considerably more effort to locate a minimum from
t = 10.
Step length
Minimum at
1
5
10
20
172083
172082
172082
172081
38+38
23+23
36+36
38+38
Most of the work in this algorithm is done near the minimum, since the region of
the minimum is located very quickly.
If we can be confident of the accuracy of the derivative calculation, then a
root-finder is a very effective way to locate a minimum of this function. However,
we should check that the point found is not a maximum or saddle. Algorithm 18
gives
*
*RUN200
ROOTFINDER
U= ? 10 V= ? 30
BISECTION EVERY ? 5
TOLERANCE ? 0
F( 10 )=-16.4537 F( 30 )= 32.0994
FP
ITN 1 U= 10
V= 30 F( 16.7776 )=-1.01735
FP ITN 2 U= 16.7776 V= 30 F( 17.1838 )=-0.582123
FP ITN 3 U= 17.1838 V= 30 F( 17.207 )=-.00361633
FP ITN 4 U= 17.2307 V= 30 F( 117.2084 )=-2.28882E-04
FP ITN 5 U= 17.2084 V= 30 F( 17.2085 )=-3.05176E-05
BI ITN 6 U= 17.2085 V= 30 F( 23.6042 )= 15.5647
FP CONVERGED
ROOT: F( 17.2085 )=-3.05176E-05
STOP AT 0340
*
Unless otherwise stated all of the above results were obtained using a Data
General ECLIPSE operating in six hexadecimal digit arithmetic.
It may appear that this treatment is biased against using derivative information.
For instance, the cubic inverse interpolation uses a convergence test which does
not take it into account at all. The reason for this is that in a single-precision
environment (with a short word length) it is difficult to evaluate the projection of
a gradient along a line since inner-product calculations cannot be made in
extended precision. However, if the linear search is part of a larger algorithm to
minimise a function for several parameters, derivative information must usually
be computed in this way. The function values may still be well determined, but
160
(13.15)
j = 0, 1, 2, . . . ,(n + 1)
(13.16)
can be computed. If
f (u + jh) *f(u +(j + 1)h ) < 0
(13.17)
then the interval [u + jh, u +(j +1)h] contains at least one root and the search can
be repeated on this smaller interval if necessary. Roots which occur with
f(b) = 0
f'(b) = 0
(13.18)
simultaneously are more difficult to detect, since the function may not cross the b
axis and a sign change will not be observed. Grid searches are notoriously
expensive in computer time if used indiscriminately. As in the previous section,
they are a useful tool for discovering some of the properties of a function by
taking a look, particularly in conjunction with some plotting device. As such, the
grid parameter, n, should not be large; n < 10 is probably a reasonable bound.
Unfortunately, it is necessary to caution that if the inequality (13.17) is not
satisfied, there may still be an even number of roots in [u + jh,u +(j + 1)h].
Suppose now that a single root is sought in the interval [ u, v] which has been
specified so that
(13.19)
f(u) * f(u) < 0.
Thus the function has at least one root in the interval if the function is continuous.
One possible way to find the root is to bisect the interval, evaluating the function
at
b = (u + v)/2.
(13.20)
If
f (u) * f(b) < 0
(13.21)
One-dimensional problems
161
then the root lies in [u, b]; otherwise in [b, v]. This bisection can be repeated as many times as desired. After t bisections the interval length will be
   2^(-t) (v - u)                                                    (13.22)
so that the root can always be located by this method in a fixed number of steps.
Since the function values are only examined for their signs, however, an unnecessarily large number of function evaluations may be required. By reversing the
process of interpolation-that is, estimating function values between tabulated
values by means of an interpolating function fitted to the tabular points-one may
expect to find the root with fewer evaluations of the function. Many interpolants
exist, but the simplest, a straight line, is perhaps the easiest and most reliable to
use. This inverse linear interpolation seeks the zero of the line
   y - f(u) = [f(v) - f(u)] (b - u)/(v - u)                          (13.23)
that is, the point
   b = u - f(u) (v - u)/[f(v) - f(u)].                               (13.24)
Once f(b) has been evaluated, the interval can be reduced by testing for condition
(13.21) as in the bisection process.
All the above discussion is very much a matter of common sense. However, in
computer arithmetic the formulae may not give the expected results. For instance,
consider the function depicted in figure 13.1(a). Suppose that the magnitude of f(v)
is very much greater than that of f(u). Then formula (13.24) will return a value b
very close to u. In fact, to the precision used, it may happen that b and u are
identical and an iteration based on (13.24) may never converge. For this reason it
is suggested that this procedure, known as the method of False Position, be
combined with bisection to ensure convergence. In other words, after every few
iterations using the False Position formula (13.24), a step is made using the
bisection formula (13.20). In all of this discussion the practical problems of evaluating the function correctly have been discreetly ignored. The algorithms will, of course, only work correctly if the function is accurately computed. Acton
(1970, chap 1) discusses some of the difficulties involved in computing functions.
A final possibility which can be troublesome in practice is that either of the
formulae (13.20) or (13.24) may result in a value for b outside the interval [u, v]
when applied in finite-precision arithmetic, a clearly impossible situation when the
calculations are exact. This has already been discussed in 1.2 (p 6). Appropriate tests must be included to detect this, which can only occur when u and v are
close to each other, that is, when the iteration is nearing convergence. The author
has had several unfortunate experiences with algorithms which, lacking the
appropriate tests, continued to iterate for hundreds of steps.
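The essential interleaving of False Position and bisection steps, together with the guard against a point falling outside [u, v], can be sketched in a few lines of Pascal. The fragment below is only a sketch under the assumption that the user supplies a bracketing interval; it is not algorithm 18, which adds the convergence tests, the evaluation count and the checks for non-computable function values discussed in the text.

program fpbis;
{ Sketch only: False Position interleaved with bisection.  This is not
  algorithm 18; the convergence tests, evaluation counts and checks for
  non-computable function values of the full algorithm are omitted. }
const
  nbis = 5;                       { force a bisection every nbis iterations }
  tol  = 1.0e-10;
var
  u, v, b, fu, fv, fb: real;
  itn: integer;
function fn1d(t: real): real;
begin
  fn1d := t*t*t - 2.0*t - 5.0;    { example function, root near 2.0946 }
end;
begin
  u := 1.0; v := 3.0;             { assumed to bracket a root }
  fu := fn1d(u); fv := fn1d(v);
  itn := 0;
  repeat
    itn := itn + 1;
    if (itn mod nbis) = 0 then
      b := 0.5*(u + v)                              { bisection, (13.20) }
    else
      b := u - fu*(v - u)/(fv - fu);                { False Position, (13.24) }
    if (b <= u) or (b >= v) then b := 0.5*(u + v);  { keep b inside [u, v] }
    fb := fn1d(b);
    if fu*fb < 0.0 then
      begin v := b; fv := fb; end                   { root now in [u, b] }
    else
      begin u := b; fu := fb; end;                  { root now in [b, v] }
  until (v - u) < tol;
  writeln('root approximately ', 0.5*(u + v):12:8);
end.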
Some readers may wonder where the famous Newton's algorithm has disappeared to in this discussion. In a manner similar to False Position, Newton's method seeks a root at the zero of the line
   y - f(u) = f'(u) (b - u)                                          (13.25)
where the point (u, f(u)) has been used with f'(u), the derivative at that point, to define the line. The value
   b = u - f(u)/f'(u)                                                (13.26)
which gives the zero of the line, is suggested as the next point approximating the
root of f and defines an iteration if b replaces u. The iteration converges very
rapidly except in a variety of delightful cases which have occupied many authors
(see, for instance, Acton 1970, chap 2, Henrici 1964, chap 4) and countless
careless programmers for many hours. The difficulty occurs principally when f'(u) becomes small. The bisection/False Position combination requires no derivatives
and is thoroughly reliable if carefully implemented. The only situation which may
upset it is that in which there is a discontinuity in the function in the interval
[u, v], when the algorithm may converge to the discontinuity if f(b-) f(b+) < 0, where b- and b+ refer to values of b as approached from below and above.
It should be noted that the False Position formula (13.24) is an approximation to Newton's formula (13.26) obtained by approximating
   f'(u) = [f(v) - f(u)]/(v - u).                                    (13.27)
The root-finding algorithm based on (13.24) with any two points u, v instead of a
pair which straddle at least one root is called the secant algorithm.
Algorithm 18. Root-finding by bisection and False Position
procedure root1d(var lbound, ubound: real; {range in which
root is to be found -- refined by procedure}
var ifn: integer; {to count function evaluations}
tol : real; {the width of the final interval
[lbound, ubound] within which a root is to be
located. Zero is an acceptable value.}
var noroot: boolean {to indicate that interval
on entry may not contain a root since both
function values have the same sign});
{alg18.pas == a root of a function of one variable
Copyright 1988 J.C.Nash
}
var
nbis: integer;
b, fb, flow, fup : real;
notcomp: boolean;
begin
writeln('alg18.pas -- root of a function of one variable');
{STEP 0 -- partly in the procedure call}
notcomp := false; {to set flag or else the known root will be displayed
by the function routine}
ifn := 2; {to initialize the function evaluation count}
nbis := 5; {ensure a bisection every 5 function evaluations}
fup := fn1d(ubound,notcomp);
if notcomp then halt;
flow := fn1d(lbound,notcomp);
if notcomp then halt; {safety check}
writeln('f(',lbound:8:5,')=',flow,'   f(',ubound:8:5,')=',fup);
finding, we prefer to require the user to provide an interval in which at least one root exists and upon which the function is defined. The driver program DR1618.PAS on the software diskette is intended to allow users to approximately localise roots of functions using grid search, followed by a call to algorithm 18 to refine the position of a suspected root.
(13.28)
where
   y = s (b - t)                                                     (13.29)
(13.30)

      b        f(b)              b        f(b)
     0.0    -1.00001            0.47   -0.505471
     0.1    -1.00001            0.48    2.5972
     0.2    -1.00001            0.49   22.8404
     0.3    -1.00001            0.5    98.9994
     0.4    -1.00001            0.6   199
     0.41   -1.00001            0.7   199
     0.42   -0.999987           0.8   199
     0.43   -0.999844           0.9   199
     0.44   -0.998783           1.0   199
     0.45   -0.990939
     0.46   -0.932944
(13.32)
may require a very much higher premium to be satisfied if all the arrests occur early in the simulation period than if they occur at the end. Therefore, it is likely that any sensible simulation will use root-finding to solve (13.32) for p for a variety of sets of arrest figures n. In particular, a pseudo-random-number
generator can be used to provide such sets of numbers chosen from some
distribution or other. The function is then computed via one of the two recurrence
relations
   f(i+1)(p) = fi(p)(1 + re) + m p (1 + 0.5 re) - ni b        for fi(p) > 0    (13.33)
or
   f(i+1)(p) = fi(p)(1 + rb) + m p (1 + 0.5 re) - ni b        for fi(p) < 0    (13.34)
where re is the rate earned on a positive fund and rb the rate paid when the fund must borrow.
Note that our shrewd criminals invest their premium money to increase the fund. The rate 0.5 re is used to take account of the continuous collection of premium payments over a period.
To give a specific example consider the following parameters: benefit b = 1200, membership m = 2000, interest rates re = 0.08 and rb = 0.15, initial fund f0 = 0 and after 10 periods f10 = 0 (a non-profit scheme!). The root-finding algorithm is then applied using u = 0, v = 2000. Three sets of arrest figures were used to simulate the operation of the scheme. The results are given in table 13.2. The arrests are drawn from a uniform distribution on (0, 400).
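The recurrences (13.33) and (13.34) are easily programmed. The sketch below evaluates the fund after T = 10 periods for a trial premium, using the parameter values just given and the first set of arrest figures of table 13.2; a root-finder such as algorithm 18 can then be applied to f10(p) to determine the premium. The program and variable names are illustrative and this is not the driver used to produce the table.

program fundsim;
{ Sketch only: fund recurrences (13.33) and (13.34) for a trial premium. }
const
  T  = 10;                        { number of periods }
  b  = 1200.0;                    { benefit per arrest }
  m  = 2000.0;                    { membership }
  re = 0.08;                      { rate earned on a positive fund }
  rb = 0.15;                      { rate paid when the fund borrows }
var
  n: array[1..T] of real;         { arrests in each period }
  i: integer;
  p, f: real;
begin
  { first set of arrest figures from table 13.2 }
  n[1] := 0;   n[2] := 2;   n[3] := 279; n[4] := 124; n[5] := 374;
  n[6] := 356; n[7] := 101; n[8] := 281; n[9] := 23;  n[10] := 117;
  p := 92.90;                     { trial premium, close to the root here }
  f := 0.0;                       { initial fund f0 = 0 }
  for i := 1 to T do
  begin
    if f >= 0.0 then
      f := f*(1.0 + re) + m*p*(1.0 + 0.5*re) - n[i]*b     { (13.33) }
    else
      f := f*(1.0 + rb) + m*p*(1.0 + 0.5*re) - n[i]*b;    { (13.34) }
    writeln('period ', i:2, '   fund = ', f:14:2);
  end;
  { the premium sought makes the final fund value zero }
end.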
TABLE 13.2. Simulated operation of an income insurance program.

                            Premium = 92.90        Premium = 86.07        Premium = 109.92
   Period                   nj      fi(P)          nj      fi(P)          nj      fi(P)
      1                      0    193237.50        17    158630.94       188      3029.50
      2                      2    399533.94       232     71952.31       315   -146098.69
      3                    279    289934.12       317   -123660.62       194   -172183.94
      4                    124    357566.31        67    -43578.75       313   -344982.00
      5                    374    130609.06        74     40115.38        35   -210099.75
      6                    356    -92904.75        55    156355.50         7    -21385.19
      7                    101    -34802.94       152    165494.81       127     51636.50
      8                    281   -183985.87       304     -7034.69       387   -180003.12
      9                     23    -45946.25       113     35341.00        55    -44374.06
     10                    117        -0.69       181        -0.81       148        -0.69
   Total                  1657                   1512                   1769
   Function evaluations
   to find root                     10                     14                     11
   (Total benefits)/(number
   of premiums paid)                99.42                  90.72                 106.14

The last entry in each column is an approximation based on no interest paid or earned in the fund management. Thus
   approximate premium = total arrests * b/(m T) = total arrests * 0.06.
These examples were run in FORTRAN on an IBM 370/168.
Chapter 14
(14.1)
The operation of reflection then reflects bH through bC using a reflection factor α, that is
   bR = bC + α(bC - bH)
      = (1 + α) bC - α bH.                                           (14.3)
If S(bR) is less than S(bL) a new lowest point has been found, and the simplex can be expanded by extending the line (bR - bC) to give the point
   bE = bC + γ(bR - bC)
      = γ bR + (1 - γ) bC                                            (14.4)
where γ, the expansion factor, is greater than unity or else (14.4) represents a contraction. If S(bE) < S(bR) then bH is replaced by bE and the procedure repeated by finding a new highest point and a new centroid bC of n points. Otherwise bR is the new lowest point and it replaces bH.
In the case where bR is not a new lowest point, but is less than bN, the next-to-highest point, that is
   S(bL) < S(bR) < S(bN)                                             (14.5)
(14.6)
for all i ≠ L. In exact arithmetic, (14.10) is acceptable for all points, and the author has in some implementations omitted the test for i = L. Some caution is warranted, however, since some machines can form a mean of two numbers which is not between those two numbers. Hence, the point bL may be altered in the operations of formula (14.10).
Different contraction factors β and β' may be used in (14.8) and (14.10). In practice these, as well as α and γ, can be chosen to try to improve the rate of convergence of this procedure either for a specific class or for a wide range of problems. Following Nelder and Mead (1965), I have found the strategy
   α = 1          γ = 2          β = β' = 0.5                        (14.11)
Besides choices for α, β, β' and γ other than (14.11) there are many minor variations on the basic theme of Nelder and Mead. The author has examined several of these, mainly using the Rosenbrock (1960) test function of two parameters
   S(b1, b2) = 100 (b2 - b1^2)^2 + (1 - b1)^2                        (14.16)
starting at the point (-1.2, 1).
(i) The function value S(b C ) can be computed at each iteration. If S(b C )<S(b L ),
b L is replaced by b C . The rest of the procedure is unaffected by this change, which
is effectively a contraction of the simplex. If there are more than two parameters,
the computation of bC can be repeated. In cases where the minimum lies within
the current simplex, this modification is likely to permit rapid progress towards
the minimum. Since, however, the simplex moves by means of reflection and
expansion, the extra function evaluation is often unnecessary, and in tests run by
the author the cost of this evaluation outweighed the benefit.
(ii) In the case that S(bR ) <S( b L) the simplex is normally expanded by extension
along the line (b R -b C ). If b R is replaced by b E, the formulae contained in the first
two lines of equation (14.4) permit the expansion to be repeated. This modification suffers the same disadvantages as the previous one; the advantages of the
repeated extension are not great enough-in fact do not occur often enough-to
offset the cost of additional function evaluations.
(iii) Instead of movement of the simplex by reflection of b H through bC , one could
consider extensions along the line (b L -b C ), that is, from the low vertex of the
simplex. Simple drawings of the two-dimensional case show that this tends to
stretch the simplex so that the points become coplanar, forcing restarts. Indeed,
a test of this idea produced precisely this behaviour.
(iv) For some sets of parameters b, the function may not be computable, or a
constraint may be violated (if constraints are included in the problem). In such
cases, a very large value may be returned for the function to prevent motion in
the direction of forbidden points. Box (1965) has enlarged on this idea in his
Complex Method which uses more than (n+1) points in an attempt to prevent all
the points collapsing onto the constraint.
(v) The portion of the algorithm for which modifications remain to be suggested
is the starting (and restarting) of the procedure. Until now, little mention has been
made of the manner in which the original simplex should be generated. Nelder
and Mead (1965) performed a variety of tests using initial simplexes generated by
equal step lengths along the parameter axes and various arrangements of the
initial simplex. The exact meaning of this is not specified. They found the rate of
convergence to be influenced by the step length chosen to generate an initial
simplex. O'Neill (1971) in his FORTRAN implementation permits the step along
each parameter axis to be specified separately, which permits differences in the
scale of the parameters to be accommodated by the program. On restarting, these
steps are reduced by a factor of 1000. General rules on how step lengths should
be chosen are unfortunately difficult to state. Quite obviously any starting step
should appreciably alter the function. In many ways this is an (n+1)-fold
repetition of the necessity of good initial estimates for the parameters as in 12.2.
More recently other workers have tried to improve upon the Nelder-Mead
strategies, for example Craig et al (1980). A parallel computer version reported by
Virginia Torczon seems to hold promise for the solution of problems in relatively
large numbers of parameters. Here we have been content to stay close to the original
Nelder-Mead procedure, though we have simplified the method for ranking the
vertices of the polytope, in particular the selection of the point b N .
Algorithm 19. A Nelder-Mead minimisation procedure
procedure nmmin(n: integer; {the number of parameters in the
function to be minimised}
var Bvec,X: rvector; {the parameter values on
input (Bvec) and output (X) from minmeth}
var Fmin: real; {minimum function value}
Workdata: probdata; {user defined data area}
var fail: boolean; {true if method has failed}
var intol: real); {user-initialized convergence
tolerance; zero on entry if it is not set yet.}
{alg19.pas == Nelder Mead minimisation of a function of n parameters.
Original method due to J. Nelder and R. Mead, Computer Journal,
vol 7, 1965 pp. 308-313.
Modification as per Nash J and Walker-Smith M, Nonlinear Parameter
Estimation: an Integrated System in BASIC, Marcel Dekker: New York,
1987.
Modifications are principally
- in the computation of the next to highest vertex of the current
polytope,
- in the verification that the shrink operation truly reduces the size
of the polytope, and
- in form of calculation of some of the search points.
We further recommend an axial search to verify convergence. This can
be called outside the present code. If placed in-line, the code can
be restarted at STEP3.
If space is at a premium, vector X is not needed except to return
final values of parameters.
Copyright 1988 J.C.Nash
}
const
Pcol= 27; {Maxparm + 2 == maximum number of columns in polytope}
Prow = 26; {Maxparm + 1 == maximum number of rows in polytope}
alpha = 1.0; {reflection factor}
beta = 0.5; {contraction and reduction factor}
gamma = 2.0; {extension factor}
var
action : string[15]; {description of action attempted on polytope. The
                      program does not inform the user of the success of
                      the attempt. However, the modifications to do
                      this are straightforward.}
(14.17)
where e is some small number such as the square root of the machine precision. Section 18.2 gives a discussion of this choice. Its principal advantage is that the increment always adjusts the parameter. Alternatively, I have employed the assignment
   bi := bi * (1.0001)                                               (14.18)
unless these fail to change the parameter, in which case I use
   bi := 0.001.                                                      (14.19)
The latter axial search was used in the set of tests used to compare algorithms for function minimisation which were included in the first edition and reported in 18.4 of both editions. What follows below reflects our current usage, including some measures of the curvature and symmetry of the functional surface near the presumed minimum.
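A bare-bones axial search along these lines is sketched below. It is not algorithm 20: it merely steps along each parameter axis from the presumed minimum, reports the function values on either side, forms a second-difference estimate of the curvature and warns if a lower point is found. The test function, the step choices and the output layout are illustrative assumptions of the sketch.

program axial;
{ Sketch only: an axial search about a presumed minimum.  Not algorithm 20. }
const
  n = 2;
type
  rvector = array[1..n] of real;
var
  b: rvector;
  i: integer;
  h, s0, splus, sminus, curv: real;
function fminfn(var x: rvector): real;
begin
  fminfn := 100.0*sqr(x[2] - sqr(x[1])) + sqr(1.0 - x[1]);  { Rosenbrock (14.16) }
end;
begin
  b[1] := 1.0; b[2] := 1.0;                    { presumed minimum }
  s0 := fminfn(b);
  for i := 1 to n do
  begin
    h := 0.001;                                { fallback step, cf. (14.19) }
    if b[i] <> 0.0 then h := 0.0001*abs(b[i]); { relative step, cf. (14.18) }
    b[i] := b[i] + h;       splus := fminfn(b);
    b[i] := b[i] - 2.0*h;   sminus := fminfn(b);
    b[i] := b[i] + h;                          { restore the parameter }
    curv := (splus + sminus - 2.0*s0)/(h*h);   { second-difference curvature }
    writeln('axis ', i:2, '  S+ = ', splus:12, '  S- = ', sminus:12,
            '  curvature = ', curv:12);
    if (splus < s0) or (sminus < s0) then
      writeln('   lower point found along axis ', i:2, ' -- not converged');
  end;
end.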
t = 1, 2, . . . , m.
   Pt = Qt - b1 Q(t-1) - b2 Q(t-2)            t = 1, 2, . . . , m
will have properties different from those of the original series. In particular, the autocorrelation coefficient of order k is defined (Kendall 1973) as
The following output was produced with driver program DR1920, but has been edited for brevity. The final parameters are slightly different from those given in the first edition, where algorithm 19 was run in much lower precision. Use of the final parameters from the first edition (1.1104, -0.387185) as starting parameters for the present code gives an apparent minimum.
Minimum function value found = 2.5734305415E-24
At parameters
B[1]= 1.1060491080E+00
B[2]= -3.7996531780E-01
dr1920.pas -- driver for Nelder-Mead minimisation
15: 21: 37
1989/01/25
File for input of control data ([cr] for keyboard) ex14-1
File for console image ([cr] = nul) d:testl4-1.
Function: Jaffrelot Minimisation of First Order ACF
25.02000 25.13000 25.16000 23.70000 22.09000 23.39000
26.96000
28.75000 26.79000 38.37000 30.26000 34.19000 25.63000 41.97000 53.00000
BUILD            3   4.6980937735E-14   4.1402114150E-02
LO-REDUCTION     5   4.6980937735E-14   1.2559406706E-02
HI-REDUCTION     7   4.6980937735E-14   3.4988133663E-03
HI-REDUCTION     9   4.6980937735E-14   7.8255935023E-04
...
SHRINK          59   1.0373099578E-16   3.1448995130E-14
SHRINK          63   1.0373099578E-16   2.4400978639E-14
HI-REDUCTION    65   1.0373099578E-16   1.7010223449E-14
...
HI-REDUCTION   117   3.9407472806E-25   6.0920713485E-24
Exiting from ALG19.pas Nelder Mead polytope minimiser
119 function evaluations used
Minimum function value found = 1.7118624554E-25
At parameters
B[1]= 1.1213869326E+00
B[2]= -4.0522273834E-01
alg20.pas -- axial search
Axis   Stepsize        function +      function -      rad. of curv.   tilt
  1    1.512415E-06    9.226758E-12    9.226726E-12    2.479099E-01    6.159003E-10
  2    5.465253E-07    4.546206E-13    4.546053E-13    6.570202E-01    8.031723E-10
nevertheless may need to minimise functions. I have used the Hooke and Jeeves method in a number of forecasting courses, and published it as a step-and-description version with a BASIC code in Nash (1982). A more advanced version appears in Nash and Walker-Smith (1987). I caution that it may prove very slow to find a minimum, and that it is possible to devise quite simple functions (Nash and Walker-Smith 1989) which will defeat its heuristic search. Algorithm 27 below presents a Pascal implementation.
Algorithm 27. Hooke and Jeeves minimiser
procedure hjmin(n: integer; {the number of parameters in the
function to be minimised}
var B,X: rvector; {the parameter values on
input (B) and output (X) from minmeth}
var Fmin: real; {minimum function value}
Workdata: probdata; {user defined data area}
var fail: boolean; {true if method has failed}
intol: real); {user-initialized convergence
tolerance}
{alg27.pas == Hooke and Jeeves pattern search function minimisation
From Interface Age, March 1982 page 34ff.
Copyright 1988 J.C.Nash
}
var
i, j: integer; {loop counters}
stepsize: real; {current step size}
fold: real; {function value at old base point}
fval: real; {current function value}
notcomp: boolean; {set true if function not computable}
temp: real; {temporary storage value}
samepoint: boolean; {true if two points are identical}
ifn: integer; {to count the number of function evaluations}
begin
if intol<0.0 then intol := calceps; {set convergence tolerance if necessary}
ifn := 1; {to initialize the count of function evaluations}
fail := false; {Algorithm has not yet failed.}
{STEP HJ1: n already entered, but we need an initial stepsize. Note the
use of the stepredn constant, though possibly 0.1 is faster for
convergence. Following mechanism used to set stepsize initial value.}
stepsize := 0.0;
for i := 1 to n do
if stepsize < stepredn*abs(B[i]) then stepsize := stepredn*abs(B[i]);
if stepsize=0.0 then stepsize := stepredn; {for safety}
{STEP HJ2: Copy parameters into vector X}
for i := 1 to n do X[i] := B[i];
{STEP HJ3 not needed. In original code, parameters are entered into X
and copied to B}
fval := fminfn(n, B,Workdata,notcomp); {STEP HJ4}
if notcomp then
begin
writeln('***FAILURE*** Function not computable at initial point');
fail := true;
Chapter 15
(15.1)
evaluated at the point b. Descent methods all use the basic iterative step
   b' = b - k B g                                                    (15.2)
(15.3)
(15.4)
The principal difficulty with steepest descents is its tendency to hemstitch, that is,
to criss-cross a valley on the function S(b) instead of following the floor of the
valley to a minimum. Kowalik and Osborne (1968, pp 34-9) discuss some of the
reasons for this weakness, which is primarily that the search directions generated
are not linearly independent. Thus a number of methods have been developed
which aim to transform the gradient g so that the search directions generated in
(15.2) are linearly independent or, equivalently, are conjugate to each other with
respect to some positive definite matrix A. In other words, if xi and xj are search directions, xi and xj are conjugate with respect to the positive definite matrix A if
   xi^T A xj = 0        for i ≠ j.                                   (15.5)
The conjugate gradients algorithm 22 generates such a set of search directions
implicitly, avoiding the storage requirements of either a transformation matrix B
or the previous search directions. The variable metric algorithm 21 uses a
transformation matrix which is adjusted at each step to generate appropriate
search directions. There is, however, another way to think of this process.
Consider the set of nonlinear equations formed by the gradient at a minimum
g (b')=0.
(15.6)
(15.9)
(15.2)
(15.12)
then
   B H = 1n        or        B = H^(-1).                             (15.13)
(15.16)
(15.17)
is given by
   yj = g(j+1) - gj = H(b(j+1) - bj) = kj H tj                       (15.18)
since the elements of H are constant in this case. From this it follows that (15.15) becomes
   B^(m) yj = kj tj.                                                 (15.19)
Assuming (15.15) is correct for j < m, a new step
   tm = B^(m) gm                                                     (15.20)
(15.21)
(15.25)
   C ym = km tm - B^(m) ym                                           (15.26)
and
   B^(m+1) yj = kj tj        for j < m                               (15.27)
or
   C yj = kj tj - B^(m) yj = kj tj - kj tj = 0.                      (15.28)
In establishing (15.26) and (15.28), equation (15.19) has been applied for B ( m ) .
There are thus m conditions on the order-n matrix C. This degree of choice in C
has in part been responsible for the large literature on variable metric methods.
The essence of the variable metric methods, i.e. that information regarding the Hessian has been drawn from first derivative computations only, is somewhat
hidden in the above development. Of course, differences could have been used to
generate an approximate Hessian from (n+1) vectors of first derivatives, but a
Newton method based on such a matrix would still require the solution of a linear
system of equations at each step. The variable metric methods generate an
approximate inverse Hessian as they proceed, requiring only one evaluation of the
first derivatives per step and, moreover, reduce the function value at each of these
steps.
15.3. A CHOICE OF STRATEGIES
In order to specify a particular variable metric algorithm it is now necessary to
choose a matrix-updating formula and a linear search procedure. One set of
choices resulting in a compact algorithm is that of Fletcher (1970). Here this will
be simplified somewhat and some specific details made clear. First, Fletcher
attempts to avoid an inverse interpolation in performing the linear search. He
suggests an acceptable point search procedure which takes the first point in some
generated sequence which satisfies some acceptance criterion. Suppose that the
step taken is
t = b ' -b = -k B g
(15.29)
with k=1 initially. The decrease in the function
(15.30)
will be approximated for small steps t by the first term in the Taylor series for S
along t from b, that is, by t Tg. This is negative when t is a downhill direction. It
is not desirable that S be very different in magnitude from t T g since this would
imply that the local approximation of the function by a quadratic form is grossly
in error. By choosing k=1, w, w2, . . . , for 0<w<1 successively, it is always
possible to produce a t such that
0<tolerance< S/tTg
(15.31)
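A single step of such an acceptable point search can be sketched as follows. The test used here is the common Armijo-type variant in which the tolerance is applied to ΔS/(k t^T g); the example function, its gradient and the reduction factor w = 0.2 (which happens to match the stepredn constant of algorithm 21 below) are assumptions of the sketch rather than part of Fletcher's prescription.

program acceptpt;
{ Sketch only: one acceptable point search along a descent direction.
  The test is an Armijo-type variant of (15.31); the example function,
  gradient and constants are assumptions of the sketch. }
const
  n = 2;
  w = 0.2;                        { step reduction factor, cf. stepredn }
  tolerance = 0.0001;
type
  rvector = array[1..n] of real;
var
  b, bnew, g, t: rvector;
  i: integer;
  k, s0, s1, gproj: real;
  accepted: boolean;
function S(var x: rvector): real;
begin
  S := 100.0*sqr(x[2] - sqr(x[1])) + sqr(1.0 - x[1]);       { Rosenbrock }
end;
procedure grad(var x: rvector; var gr: rvector);
begin
  gr[1] := -400.0*x[1]*(x[2] - sqr(x[1])) - 2.0*(1.0 - x[1]);
  gr[2] := 200.0*(x[2] - sqr(x[1]));
end;
begin
  b[1] := -1.2; b[2] := 1.0;
  s0 := S(b); grad(b, g);
  for i := 1 to n do t[i] := -g[i];           { with B = 1n, t = -g }
  gproj := 0.0;
  for i := 1 to n do gproj := gproj + t[i]*g[i];   { t'g, negative downhill }
  k := 1.0; accepted := false;
  repeat
    for i := 1 to n do bnew[i] := b[i] + k*t[i];
    s1 := S(bnew);
    if (s1 - s0) < tolerance*k*gproj then accepted := true
    else k := w*k;                            { k = 1, w, w*w, ... }
  until accepted or (k < 1.0e-10);
  writeln('step k = ', k:12, '   S = ', s1:12);
end.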
(15.32)
(15.33)
(15.34)
   d1 = t^T y                                                        (15.35)
and
   d2 = (1 + y^T B y / d1) d1.                                       (15.36)
There are several ways in which the update can be computed and added into B. In
practice these may give significantly different convergence patterns due to the
manner in which digit cancellation may occur. However, the author has not been
able to make any definite conclusion as to which of the few ways he has tried is
superior overall. The detailed description in the next section uses a simple
form for convenience of implementation. The properties of the Broyden-Fletcher-Shanno update will not be discussed further in this work.
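One simple way of forming the update, consistent with the quantities d1 = t^T y and d2 of (15.35) and (15.36), is sketched here for a small illustrative case. The loop arrangement, the test on d1 and the data are choices made for the sketch and are not necessarily the precise form used in algorithm 21.

program vmupdate;
{ Sketch only: one form of the Broyden-Fletcher-Shanno update of the
  approximate inverse Hessian B, using d1 = t'y (15.35) and
  d2 = (1 + y'By/d1)d1 = d1 + y'By (15.36).  The data are illustrative. }
const
  n = 2;
type
  rvector = array[1..n] of real;
  rmatrix = array[1..n, 1..n] of real;
var
  B: rmatrix;
  t, y, By: rvector;
  d1, d2: real;
  i, j: integer;
begin
  for i := 1 to n do                          { start from B = 1n }
    for j := 1 to n do
      if i = j then B[i, j] := 1.0 else B[i, j] := 0.0;
  t[1] := 0.5;  t[2] := -0.25;                { step taken }
  y[1] := 1.0;  y[2] := 0.75;                 { gradient change }
  d1 := 0.0;
  for i := 1 to n do d1 := d1 + t[i]*y[i];    { d1 = t'y }
  if d1 > 0.0 then                            { otherwise B would be reset }
  begin
    for i := 1 to n do
    begin
      By[i] := 0.0;
      for j := 1 to n do By[i] := By[i] + B[i, j]*y[j];
    end;
    d2 := d1;
    for i := 1 to n do d2 := d2 + y[i]*By[i]; { d2 = d1 + y'By }
    for i := 1 to n do
      for j := 1 to n do
        B[i, j] := B[i, j] + (d2*t[i]*t[j]/d1 - By[i]*t[j] - t[i]*By[j])/d1;
  end;
  for i := 1 to n do
  begin
    for j := 1 to n do write(B[i, j]:12:6);
    writeln;
  end;
end.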
In order to start the algorithm, some initial matrix B must be supplied. If it were easy to compute the Hessian H for the starting parameters b, H^(-1) would be the matrix of choice. For general application, however,
   B = 1n                                                            (15.37)
is a simpler choice and has the advantage that it generates the steepest descent direction in equation (15.29). I have found it useful on a machine having short mantissa arithmetic to apply the final convergence test on the steepest descent
direction to ensure that rounding errors in updating B and forming t via (15.29)
have not accidentally given a direction in which the function S cannot be
reduced. Therefore, a restart is suggested in any of the following cases:
(i) t T g>0, that is, the direction of search is uphill.
(ii) b'=b, that is, no change is made in the parameters by the linear search
along t.
If either case (i) or case (ii) occurs during the first step after B has been set to
the unit matrix, the algorithm is taken to have converged.
(iii) Since the method reduces S along t, it is expected that
t T g(b')
will be greater (less negative) than
t T g(b).
Figure 15.1 illustrates this idea. Therefore, tT y =d 1 should be positive. If it is not
there exists the danger that B may no longer be positive definite. Thus if t T y < 0 ,
the matrix B is reset to unity.
This completes the general description of a variable metric algorithm suitable
for small computers. I am indebted to Dr R Fletcher of Dundee University for
suggesting the basic choices of the Broyden-Fletcher-Shanno updating formula
and the acceptable point search. Note particularly that this search does not
require the gradient to be evaluated at each trial point.
Algorithm 21. Variable metric minimiser
The algorithm needs an order-n square matrix B and five order-n vectors b, x, c, g and t. Care
should be taken when coding to distinguish between B and b.
procedure vmmin(n: integer; {the number of parameters in the
function to be minimised}
var Bvec, X: rvector; {the parameter values on
input (Bvec) and output (X) from minmeth}
var Fmin: real; {minimum function value}
Workdata: probdata; {user defined data area}
var fail: boolean; {true if method has failed}
var intol: real); {user-initialized convergence
tolerance}
{alg21.pas == modified Fletcher variable metric method.
Original method due to R. Fletcher, Computer Journal, vol 13,
pp. 317-322, 1970
Unlike Fletcher-Reeves, we do not use a quadratic interpolation,
since the search is often approximately a Newton step
Copyright 1988 J.C.Nash
}
const
Maxparm = 25; {maximum allowed number of parameters in the
present code. May be changed by the user,
along with dependent constants below.}
stepredn = 0.2; {factor to reduce stepsize in line search}
ITNS= 1 # EVALNS= 1 FUNCTION= 0.24199860E+02
ITNS= 2 # EVALNS= 6 FUNCTION= 0.20226822E+02
ITNS= 3 # EVALNS= 9 FUNCTION= 0.86069937E+01
ITNS= 4 # EVALNS= 14 FUNCTION= 0.31230078E+01
ITNS= 5 # EVALNS= 16 FUNCTION= 0.28306570E+01
ITNS= 6 # EVALNS= 21 FUNCTION= 0.26346817E+01
ITNS= 7 # EVALNS= 23 FUNCTION= 0.20069408E+01
ITNS= 8 # EVALNS= 24 FUNCTION= 0.18900719E+01
ITNS= 9 # EVALNS= 25 FUNCTION= 0.15198193E+01
ITNS= 10 # EVALNS= 26 FUNCTION= 0.13677282E+01
ITNS= 11 # EVALNS= 27 FUNCTION= 0.10138159E+01
ITNS= 12 # EVALNS= 28 FUNCTION= 0.85555243E+00
ITNS= 13 # EVALNS= 29 FUNCTION= 0.72980821E+00
ITNS= 14 # EVALNS= 30 FUNCTION= 0.56827205E+00
ITNS= 15 # EVALNS= 32 FUNCTION= 0.51492560E+00
ITNS= 16 # EVALNS= 33 FUNCTION= 0.44735157E+00
ITNS= 17 # EVALNS= 34 FUNCTION= 0.32320732E+00
ITNS= 18 # EVALNS= 35 FUNCTION= 0.25737345E+00
ITNS= 19 # EVALNS= 37 FUNCTION= 0.20997590E+00
ITNS= 20 # EVALNS= 38 FUNCTION= 0.17693651E+00
ITNS= 21 # EVALNS= 39 FUNCTION= 0.12203962E+00
ITNS= 22 # EVALNS= 40 FUNCTION= 0.74170172E-01
ITNS= 23 # EVALNS= 41 FUNCTION= 0.39149582E-01
ITNS= 24 # EVALNS= 43 FUNCTION= 0.31218585E-01
ITNS= 25 # EVALNS= 44 FUNCTION= 0.25947951E-01
ITNS= 26 # EVALNS= 45 FUNCTION= 0.12625925E-01
ITNS= 27 # EVALNS= 46 FUNCTION= 0.78500621E-02
ITNS= 28 # EVALNS= 47 FUNCTION= 0.45955069E-02
ITNS= 29 # EVALNS= 48 FUNCTION= 0.15429037E-02
ITNS= 30 # EVALNS= 49 FUNCTION= 0.62955730E-03
ITNS= 31 # EVALNS= 50 FUNCTION= 0.82553088E-04
ITNS= 32 # EVALNS= 51 FUNCTION= 0.54429529E-05
ITNS= 33 # EVALNS= 52 FUNCTION= 0.57958061E-07
ITNS= 34 # EVALNS= 53 FUNCTION= 0.44057202E-10
ITNS= 35 # EVALNS= 54 FUNCTION= 0.0
ITNS= 35 # EVALNS= 54 FUNCTION= 0.0
B( 1)= 0.10000000E+01
B( 2)= 0.10000000E+01
# ITNS= 35 # EVALNS= 54 FUNCTION= 0.0
Chapter 16
(15.11)
(15.17)
(15.18)
(16.3)
q i = -g i
is substituted from (15.18), then we have
(16.5)
(16.6)
Moreover, if accurate line searches have been performed at each of the ( i -1)
previous steps, then the function S (still the quadratic form (15.11)) has been
minimised on a hyperplane spanned by the search directions tj , j=1, 2, . . . ,
(i-1), and gi is orthogonal to each of these directions. Therefore, we have
   z(i,j) = 0        for j < (i - 1)                                 (16.7)
(16.8)
Alternatively, using
(16.9)
which is a linear combination of gj , j=1, 2, . . . , (i-1), we obtain
(16.10)
(16.11)
by virtue of the orthogonality mentioned above.
As in the case of variable metric algorithms, the formulae obtained for
quadratic forms are applied in somewhat cavalier fashion to the minimisation of
general nonlinear functions. The formulae (16.8), (16.10) and (16.11) are now no
longer equivalent. For reference, these will be associated with the names: Beale
(1972) and Sorenson (1969) for (16.8); Polak and Ribiere (1969) for (16.10); and
Fletcher and Reeves (1964) for (16.11). All of these retain the space-saving
two-term recurrence which makes the conjugate gradients algorithms so frugal of
storage.
In summary, the conjugate gradients algorithm proceeds by setting
   t1 = -g(b1)                                                       (16.12)
and
   ti = z(i,i-1) t(i-1) - gi(bi)                                     (16.13)
with
   b(j+1) = bj + kj tj                                               (16.14)
where kj is determined by a linear search for a minimum of S(bj + kj tj) with respect to kj.
Since the conjugate gradients methods are derived on the presumption that
they minimise a quadratic form in n steps, it is also necessary to suggest a method
for continuing the iterations after n steps. Some authors, for instance Polak and
Ribiere (1969), continue iterating with the same recurrence formula. However,
while the iteration matrix B in the variable metric algorithms can in a number of
situations be shown to tend towards the inverse Hessian H-1 in some sense, there
do not seem to be any similar theorems for conjugate gradients algorithms.
Fletcher and Reeves (1964) restart their method every (n+1) steps with
t 1 = -g 1
(16.15)
while Fletcher (1972) does this every n iterations. Powell (1975a, b) has some
much more sophisticated procedures for restarting his conjugate gradients
method. I have chosen to restart every n steps or whenever the linear search can
make no progress along the search direction. If no progress can be made in the
first conjugate gradient direction, that of steepest descent, then the algorithm is taken to have converged.
The linear search used in the first edition of this book was that of 13.2. However,
this proved to work well in some computing environments but poorly in others. The
present code uses a simpler search which first finds an acceptable point by stepsize
reduction, using the same ideas as discussed in 15.3. Once an acceptable point has
been found, we have sufficient information to fit a parabola to the projection of the
function on the search direction. The parabola requires three pieces of information.
These are the function value at the end of the last iteration (or the initial point), the
projection of the gradient at this point onto the search direction, and the new (lower)
function value at the acceptable point. The step length resulting from the quadratic
inverse interpolation is used to generate a new trial point for the function. If this
proves to give a point lower than the latest acceptable point, it becomes the starting
point for the next iteration. Otherwise we use the latest acceptable point, which is the
lowest point so far.
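The interpolation just described can be written down directly. If S0 is the function value at the start of the search, gproj the projection of the gradient on the search direction, and S1 the value at the acceptable point reached with step k1, then fitting the parabola phi(k) = S0 + gproj k + c k^2 gives the trial step shown in the sketch below. The numerical values used are illustrative only, and the fragment is not algorithm 22.

program parafit;
{ Sketch only: the quadratic inverse interpolation used in the line search.
  Model phi(k) = S0 + gproj*k + c*k*k fitted to S0, gproj and S1 = phi(k1). }
var
  S0, S1, gproj, k1, c, kmin: real;
begin
  S0 := 10.0; gproj := -4.0;      { value and gradient projection at start }
  k1 := 1.0;  S1 := 9.0;          { value at the acceptable point }
  c := (S1 - S0 - gproj*k1)/(k1*k1);
  if c > 0.0 then
  begin
    kmin := -gproj/(2.0*c);       { minimum of the fitted parabola }
    writeln('interpolated step length = ', kmin:10:6);
  end
  else
    writeln('no interior minimum -- keep the acceptable point');
end.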
A starting step length is needed for the search. In the Newton and variable metric
(or quasi-Newton) methods, we can use a unit step length which is ideal for the
minimisation of a quadratic function. However, for conjugate gradients, we do not
have this theoretical support. The strategy used here is to multiply the best step
length found in the line search by some factor to increase the step. Our own usual
choice is a factor 1.7. At the start of the conjugate gradients major cycle we set the
step length to 1. If the step length exceeds 1 after being increased at the end of each
iteration, it is reset to 1.
If the choice of linear search is troublesome, that of a recurrence formula is
even more difficult. In some tests by the author on the 23-bit binary NOVA, the
Beale-Sorenson formula (16.8) in conjunction with the linear search chosen above
required more function and derivative evaluations than either formula (16.10) or
formula (16.11). A more complete comparison of the Polak-Ribiere formula
(16.10) with that of Fletcher-Reeves (16.11) favoured the former. However, it is
worth recording Fletchers (1972) comment: I know of no systematic evidence
which indicates how the choice should be resolved in a general-purpose algorithm. In the current algorithm, the user is given a choice of which approach
should be used.
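For reference, the three recurrence coefficients may be computed from the current gradient g, the previous gradient gold and the previous search direction t as in the sketch below. The variable names are illustrative, and the forms shown are those usually associated with the names cited for (16.8), (16.10) and (16.11).

program cgbeta;
{ Sketch only: the three usual recurrence coefficients for (16.13). }
const
  n = 2;
type
  rvector = array[1..n] of real;
var
  g, gold, t: rvector;
  i: integer;
  gg, goldgold, gdiffg, tdiff, zBS, zPR, zFR: real;
begin
  gold[1] := 3.0; gold[2] := -1.0;            { previous gradient }
  g[1]    := 1.0; g[2]    := 0.5;             { current gradient }
  t[1]    := -3.0; t[2]   := 1.0;             { previous search direction }
  gg := 0.0; goldgold := 0.0; gdiffg := 0.0; tdiff := 0.0;
  for i := 1 to n do
  begin
    gg       := gg + g[i]*g[i];
    goldgold := goldgold + gold[i]*gold[i];
    gdiffg   := gdiffg + g[i]*(g[i] - gold[i]);
    tdiff    := tdiff + t[i]*(g[i] - gold[i]);
  end;
  zBS := gdiffg/tdiff;            { Beale-Sorenson form, cf. (16.8) }
  zPR := gdiffg/goldgold;         { Polak-Ribiere form, cf. (16.10) }
  zFR := gg/goldgold;             { Fletcher-Reeves form, cf. (16.11) }
  writeln('Beale-Sorenson  ', zBS:12:6);
  writeln('Polak-Ribiere   ', zPR:12:6);
  writeln('Fletcher-Reeves ', zFR:12:6);
end.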
where the bj are the parameters which should give T=0 when summed as shown
with the weights w. Given wj, j=1, 2, . . . , n, T can easily be made zero since
there are (n-1) degrees of freedom in b. However, some degree of confidence
must be placed in the published figures, which we shall call pj , j=1, 2, . . . , n.
Thus, we wish to limit each bj so that
   |bj - pj| < dj        for j = 1, 2, . . . , n
The factor 100 is arbitrary. Note that this is in fact a linear least-squares problem,
subject to the constraints above. However, the conjugate gradients method is
quite well suited to the particular problem in 23 parameters which was presented,
since it can easily incorporate the tolerances dj by declaring the function to be not
computable if the constraints are violated. (In this example they do not in fact
appear to come into play.) The output below was produced on a Data General
ECLIPSE operating in six hexadecimal digit arithmetic. Variable 1 is used to hold
the values p, variable 2 to hold the tolerances d and variable 3 to hold the weights
w. The number of data points is reported to be 24 and a zero has been appended
# EVALS= 50
STOP AT 0911
*SIZE
USED: 3626 BYTES
LEFT: 5760 BYTES
*
# ITNS= 3
# EVALS= 29
In the above output, the quantities printed are the number of iterations
(gradient evaluations), the number of function evaluations and the lowest function
value found so far. The sensitivity of the gradient and the convergence pattern to
relatively small changes in arithmetic is, in my experience, quite common for
algorithms of this type.
Chapter 17
(17.6)
(17.7)
for 0<c<1. Even for non-convex functions which are bounded from below, the
steepest descents method will find a local minimum or saddle point. All the
preceding results are, of course, subject to the provision that the function and
gradient are computed exactly (an almost impossible requirement). In practice,
however, convergence is so slow as to disqualify the method of steepest descents
on its own as a candidate for minimising functions.
Often the cause of this slowness of convergence is the tendency of the method
to take pairs of steps which are virtually opposites, and which are both essentially
perpendicular to the direction in which the minimum is to be found. In a
two-parameter example we may think of a narrow valley with the minimum
somewhere along its length. Suppose our starting point is somewhere on the side
of the valley but not near this minimum. The gradient will be such that the
direction of steepest descent is towards the floor of the valley. However, the step
taken can easily traverse the valley. The situation is then similar to the original
one, and it is possible to step back across the valley almost to our starting point
with only a very slight motion along the valley toward the solution point. One can
picture the process as following a path similar to that which would be followed by
a marble or ball-bearing rolling over the valley-shaped surface.
To illustrate the slow convergence, a modified version of steepest descents was programmed in BASIC on a Data General NOVA minicomputer having machine precision 2^(-22). The modification consisted of step doubling if a step is successful. The step length is divided by 4 if a step is unsuccessful. This reduction in step size is repeated until either a smaller sum of squares is found or the step is so small that none of the parameters change. As a test problem, consider the Rosenbrock banana-shaped valley
   S(b1, b2) = 100 (b2 - b1^2)^2 + (1 - b1)^2
starting with
   S(-1.2, 1) = 24.1999
(as evaluated). The steepest descents program above required 232 computations of the derivative and 2248 evaluations of S(x) to find
   S(1.00144, 1.0029) = 2.1E-6.
The program was restarted with this point and stopped manually after 468 derivative and 4027 sum-of-squares computations, where
   S(1.00084, 1.00168) = 7.1E-7.
By comparison, the Marquardt method to be described below requires 24 derivative and 32 sum-of-squares evaluations to reach
   S(1, 1) = 1.4E-14.
(There are some rounding errors in the display of x1, x2 or in the computation of S(x), since S(1, 1) = 0 is the solution to the Rosenbrock problem.)
The Gauss-Newton method
At the minimum the gradient v(x) must be null. The functions vj(x), j = 1, 2, . . . , n, provide a set of n nonlinear functions in n unknowns x such that
   v(x) = 0                                                          (17.8)
the solution of which is a stationary point of the function S(x), that is, a local maximum or minimum or a saddle point. The particular form (17.1) or (17.2) of S(x) gives gradient components
(17.9)
which reduces to
(17.10)
or
   v = J^T f                                                         (17.11)
(17.17)
   (J^T J + e 1) q = -J^T f                                          (17.20)
where e is some parameter. Then as e becomes very large relative to the norm of J^T J, q tends towards the steepest descents direction, while when e is very small compared to this norm, the Gauss-Newton solution is obtained. Furthermore, the scaling of the parameters
   x' = D x                                                          (17.21)
where D is a diagonal matrix having positive diagonal elements, implies a transformed Jacobian such that
   J' = J D^(-1)                                                     (17.22)
and equations (17.20) become
   [(J')^T J' + e 1] q' = -(J')^T f                                  (17.23a)
   (D^(-1) J^T J D^(-1) + e 1) D q = -D^(-1) J^T f                   (17.23b)
or
   (J^T J + e D^2) q = -J^T f                                        (17.24)
large enough can, moreover, be made computationally positive definite so that the
simplest forms of the Choleski decomposition and back-solution can be employed.
That is to say, the Choleski decomposition is not completed for non-positive
definite matrices. Marquardt (1963) suggests starting the iteration with e = 0.1, reducing it by a factor of 10 before each step if the preceding solution q has given
   S(x + q) < S(x)
and x has been replaced by (x+q). If
S(x+q)> S (x)
then e is increased by a factor of 10 and the solution of equations (17.24)
repeated. (In the algorithm below, e is called lambda.)
17.5. CRITIQUE AND EVALUATION
By and large, this procedure is reliable and efficient. Because the bias e is reduced
after each successful step, however, there is a tendency to see the following
scenario enacted by a computer at each iteration, that is, at each evaluation of J:
(i) reduce e, find S (x+q)>S(x);
(ii) increase e, find S(x+q)<S(x), so replace x by (x+q ) and proceed to (i).
This procedure would be more efficient if e were not altered. In other examples
one hopes to take advantage of the rapid convergence of the Gauss-Newton part
of the Marquardt equations by reducing e, so a compromise is called for. I retain
10 as the factor for increasing e, but use 0.4 to effect the reduction. A further
safeguard which should be included is a check to ensure that e does not approach
zero computationally. Before this modification was introduced into my program,
it proceeded to solve a difficult problem by a long series of approximate
Gauss-Newton iterations and then encountered a region on the sum-of-squares
surface where steepest descents steps were needed. During the early iterations e
underflowed to zero, and since JT J was singular, the program made many futile
attempts to increase e before it was stopped manually.
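To make the strategy concrete, the sketch below applies the e (lambda) adjustment just described to the Rosenbrock problem written as two residuals f1 = 10(b2 - b1^2) and f2 = 1 - b1. It is a sketch only: D^2 is taken as the diagonal of J^T J (one common choice), the 2 by 2 system (17.24) is solved directly, and the various safeguards of algorithm 23 are omitted.

program marqdemo;
{ Sketch only: the lambda (e) adjustment strategy applied to the Rosenbrock
  problem written as residuals f1 = 10(b2 - b1^2), f2 = 1 - b1.  Here D*D
  is taken as the diagonal of J'J (one common choice) and the 2 by 2 system
  (17.24) is solved directly; algorithm 23 uses the Choleski decomposition
  and further safeguards. }
var
  b1, b2, q1, q2, lambda, S, Snew, det: real;
  f1, f2, J11, J12, J21, J22: real;
  A11, A12, A22, v1, v2: real;
  itn: integer;
  done: boolean;
function ssq(x1, x2: real): real;
begin
  ssq := sqr(10.0*(x2 - sqr(x1))) + sqr(1.0 - x1);
end;
begin
  b1 := -1.2; b2 := 1.0; lambda := 0.1;        { start with e = 0.1 }
  S := ssq(b1, b2); itn := 0; done := false;
  while (itn < 50) and (not done) do
  begin
    itn := itn + 1;
    f1 := 10.0*(b2 - sqr(b1)); f2 := 1.0 - b1; { residuals }
    J11 := -20.0*b1; J12 := 10.0;              { Jacobian }
    J21 := -1.0;     J22 := 0.0;
    v1 := -(J11*f1 + J21*f2); v2 := -(J12*f1 + J22*f2);   { -J'f }
    repeat
      A11 := (J11*J11 + J21*J21)*(1.0 + lambda);          { J'J + e D*D }
      A12 := J11*J12 + J21*J22;
      A22 := (J12*J12 + J22*J22)*(1.0 + lambda);
      det := A11*A22 - A12*A12;
      q1 := (v1*A22 - v2*A12)/det; q2 := (v2*A11 - v1*A12)/det;
      Snew := ssq(b1 + q1, b2 + q2);
      if Snew >= S then lambda := 10.0*lambda; { increase e and try again }
    until (Snew < S) or (lambda > 1.0e15);
    if Snew < S then
    begin
      b1 := b1 + q1; b2 := b2 + q2; S := Snew;
      lambda := 0.4*lambda;                    { reduce e after a success }
      if lambda < 1.0e-30 then lambda := 1.0e-30;   { keep e away from zero }
    end
    else done := true;                         { no further progress }
  end;
  writeln('b = (', b1:10:6, ', ', b2:10:6, ')   S = ', S:12);
end.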
The practitioner interested in implementing the Marquardt algorithm will find
these modifications included in the description which follows.
Algorithm 23. Modified Marquardt method for minimising a nonlinear sum-of-squares
function
procedure modmrt( n : integer; {number of residuals and number of parameters}
var Bvec : rvector; {parameter vector}
var X : rvector; {derivatives and best parameters}
var Fmin : real; {minimum value of function}
Workdata:probdata);
{alg23.pas == modified Nash Marquardt nonlinear least squares minimisation
method.
Copyright 1988 J.C.Nash
}
var
S(b (0))=243495
Chapter 18
LEFT-OVERS
18.1. INTRODUCTION
This chapter is entitled left-overs because each of the topics-approximation
of derivatives, constrained optimisation and comparison of minimisation
algorithms-has not so far been covered, though none is quite large enough in
the current treatment to stand as a chapter on its own. Certainly a lot more could
be said on each, and I am acutely aware that my knowledge (and particularly my
experience) is insufficient to allow me to say it. As far as I am aware, very little
work has been done on the development of compact methods for the mathematical programming problem, that is, constrained minimisation with many constraints. This is a line of research which surely has benefits for large machines, but
it is also one of the most difficult to pursue due to the nature of the problem. The
results of my own work comparing minimisation algorithms are to my knowledge
the only study of such methods which has been made on a small computer. With
the cautions I have given about results derived from experiments with a single
system, the conclusions made in 18.4 are undeniably frail, though they are for
the most part very similar to those of other workers who have used larger
computers.
18.2. NUMERICAL APPROXIMATION OF DERIVATIVES
In many minimisation problems, the analytic computation of partial derivatives is
impossible or extremely tedious. Furthermore, the program code to compute
(18.1)
in a general unconstrained minimisation problem or
(18.2)
in a nonlinear least-squares problem may be so long as to use up a significant
proportion of the working space of a small computer. Moreover, in my experience
9 cases out of 10 of failure of a minimisation program are due to errors in the
code used to compute derivatives. The availability of numerical derivatives
facilitates a check of such possibilities as well as allowing programs which require
derivatives to be applied to problems for which analytic expressions for derivatives are not practical to employ.
In the literature, a great many expressions exist for numerical differentiation of
functions by means of interpolation formulae (see, for instance, Ralston 1965).
However, in view of the large number of derivative calculations which must be
made during the minimisation of a function, these are not useful in the present
instance. Recall that
   ∂S/∂bj = lim (h -> 0) [S(b + h ej) - S(b)]/h                      (18.3)
where ej is the jth column of the unit matrix of order n (b is presumed to have n elements). For explanatory purposes, the case n = 1 will be used. In place of the limit (18.3), it is possible to use the forward difference
   D = [S(b + h) - S(b)]/h.                                          (18.4)
(i) For h small, the discrete nature of the representation of numbers in the
computer causes severe inaccuracies in the calculation of D. The function S is
continuous; its representation is not. In fact it will be a series of steps. Therefore,
h cannot be allowed to be small. Another way to think of this is that since most of
the digits of b are the same as those of (b+h), any function S which is not varying
rapidly will have similar values at b and (b+h), so that the expression (18.4)
implies a degree of digit cancellation causing D to be determined inaccurately.
(ii) For h large, the line joining the points (b , S(b)) and (b+h, S(b+h)) is no
longer tangential to the curve at the former. Thus expression (18.4) is in error due
to the nonlinearity of the function. Even worse, for some functions there may be
a discontinuity between b and (b+h). Checks for such situations are expensive of
both human and computer time. The penalty for ignoring them is unfortunately
more serious.
As a compromise between these extremes, I suggest letting
   hj = eps^(1/2) (|bj| + eps^(1/2))                                 (18.5)
where eps is the machine precision. The parameter h has once more been given a subscript to show that the step taken along each parameter axis will in general be different. The value for h given by (18.5) has the pleasing property that it cannot become smaller than the machine precision even if bj is zero. Neither can it fail to change at least the right-most half of the digits in bj since it is scaled by the magnitude of the parameter. Table 18.1 shows some typical results for this step-length choice compared with some other values.
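The choice (18.5) is easily embodied in a small routine which approximates the gradient by forward differences. The sketch below, with an illustrative test function and a simple loop to estimate the machine precision, is not the approximation code used for the tests reported later in this chapter.

program fwddiff;
{ Sketch only: forward-difference gradient using the step choice (18.5). }
const
  n = 2;
type
  rvector = array[1..n] of real;
var
  b, g: rvector;
  i: integer;
  eps, h, sb, temp: real;
function S(var x: rvector): real;
begin
  S := 100.0*sqr(x[2] - sqr(x[1])) + sqr(1.0 - x[1]);      { example function }
end;
begin
  eps := 1.0;                                 { crude machine precision estimate }
  while 1.0 + eps > 1.0 do eps := eps/2.0;
  eps := 2.0*eps;
  b[1] := -1.2; b[2] := 1.0;
  sb := S(b);
  for i := 1 to n do
  begin
    h := sqrt(eps)*(abs(b[i]) + sqrt(eps));   { step choice (18.5) }
    temp := b[i];
    b[i] := b[i] + h;
    g[i] := (S(b) - sb)/h;                    { forward difference (18.4) }
    b[i] := temp;                             { restore the parameter }
  end;
  for i := 1 to n do writeln('g[', i:1, '] = ', g[i]:14);
end.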
Some points to note in table 18.1 are:
(i) The effect of the discontinuity in the tangent function in the computations for b = 1 and b = 1.57 (near π/2). The less severely affected calculations for b = -1.57 suggest that in some cases the backward difference
   D = [S(b) - S(b - h)]/h                                           (18.6)
may be preferred.
(ii) In approximating the derivative of exp(0.001) using h = 1.93024E-6 as in equation (18.5), the system used for the calculations printed identical values for exp(b) and exp(b + h) even though the internal representations were different.
TABLE 18.1. Derivative approximations computed by formula (18.4) on a Data General NOVA (extended BASIC), for test functions including exp(b), log(b) and tan(b) at several values of b, using the fixed step lengths h = 1, 0.0625, 3.90625E-3, 2.44141E-4, 1.52588E-5 and 9.53674E-7 as well as the choice (18.5), compared with the analytic derivatives. [Table body not reproduced here.]
   cj(b) = 0        j = 1, 2, . . . , m                              (18.7)
and
   hk(b) < 0        k = 1, 2, . . . , q.                             (18.8)
In general, if the constraints c are independent, m must be less than n , since via
solution of each constraint for one of the parameters b in terms of the others, the
dimensionality of the problem may be reduced. The inequality restrictions h, on
the other hand, reduce the size of the domain in which the solution can be found
without necessarily reducing the dimensionality. Thus there is no formal bound to
the number, q, of such constraints. Note, however, that the two inequalities
   h(b) < 0                                                          (18.9)
and
   h(b) > ε                                                          (18.10)
for ε > 0 shows that the inequality constraints may be such that they can never be
satisfied simultaneously. Problems which have such sets of constraints are termed
infeasible. While mutually contradicting constraints such as (18.10) are quite
obvious, in general it is not trivial to detect infeasible problems so that their
detection can be regarded as one of the tasks which any solution method should
be able to accomplish.
There are a number of effective techniques for dealing with constraints in
minimisation problems (see, for instance, Gill and Murray 1974). The problem is
by and large a difficult one, and the programs to solve it generally long and
complicated. For this reason, several of the more mathematically sophisticated
methods, such as Lagrange multiplier or gradient projection techniques, will not
be considered here. In fact, all the procedures proposed are quite simple and all
involve modifying the objective function S(b) so that the resulting function has its
unconstrained minimum at or near the constrained minimum of the original
function. Thus no new algorithms will be introduced in this section.
Elimination or substitution
The equality constraints (18.7) provide m relationships between the n parameters
b. Therefore, it may be possible to solve for as many as m of the b s in terms of
the other (n-m). This yields a new problem involving fewer parameters which
automatically satisfy the constraints.
Simple inequality constraints may frequently be removed by substitutions which
satisfy them. In particular, if the constraint is simply
bk > 0
then a substitution of
(18.11)
(18.12)
(18.16)
subject to
(18.18b)
This can be solved by elimination. However, in order to perform the elimination it
is necessary to decide whether
(18.19)
or
(18.20)
The first choice leads to the constrained minimum at b 1= b 2=2-. The second
leads to the constrained maximum at b1 =b 2=-2-. This problem is quite easily
solved approximately by means of the penalty (18.14) and any of the unconstrained minimisation methods.
A somewhat different question concerning elimination arises in the following
problem due to Dr Z Hassan who wished to estimate demand equations for
commodities of which stocks are kept: minimise
(18.21)
subject to
b3 b6= b4 b5 .
(18.22)
The data for this problem are given in table 18.2. The decision that must now
be made is which variable is to be eliminated via (18.22); for instance, b 6 can be
found as
   b6 = b4 b5 / b3.                                                  (18.23)
The behaviour of the Marquardt algorithm 23 on each of the four unconstrained
minimisation problems which can result from elimination in this fashion is shown
in table 18.3. Numerical approximation of the Jacobian elements was used to save
some effort in running these comparisons. Note that in two cases the algorithm
has failed to reach the minimum. The starting point for the iterations was bj =1,
j=l, 2, . . . , 6, in every case, and these failures are most likely due to the large
differences in scale between the variables. Certainly, this poor scaling is responsible for the failure of the variable metric and conjugate gradients algorithms when
the problem is solved by eliminating b 6. (Analytic derivatives were used in these
cases.)
The penalty function approach avoids the necessity of choosing which parameter is eliminated. The lower half of table 18.3 presents the results of computations
with the Marquardt-like algorithm 23. Similar results were obtained using the
Nelder-Mead and variable metric algorithms, but the conjugate gradients method
failed to converge to the true minimum. Note that as the penalty weighting w is
increased the minimum function value increases. This will always be the case if a
constraint is active, since enforcement of the constraint pushes the solution up
the hill.
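The general shape of the penalty approach can be indicated briefly: the objective is augmented by terms which charge violations quadratically, with weights playing the roles of the w and W referred to in this section. The function, constraints and weights in the sketch below are purely illustrative assumptions.

program penalty;
{ Sketch only: a quadratic penalty for one equality and one inequality
  constraint.  The objective, constraints and weights are illustrative;
  weq and wineq play the roles of the weights w and W of the text. }
const
  weq   = 100.0;                  { weight on the equality constraint }
  wineq = 1000.0;                 { weight on the inequality constraint }
var
  b1, b2, c, h, S, P: real;
begin
  b1 := 0.7; b2 := 0.7;                       { trial parameter values }
  S := sqr(b1 - 1.0) + sqr(b2 - 2.0);         { illustrative objective }
  c := sqr(b1) + sqr(b2) - 1.0;               { equality constraint c(b) = 0 }
  h := b1 - b2;                               { inequality constraint h(b) < 0 }
  P := S + weq*sqr(c);                        { charge any violation of c }
  if h > 0.0 then P := P + wineq*sqr(h);      { charge only violations of h }
  writeln('S = ', S:12:6, '   penalised P = ', P:12:6);
end.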
Usually the penalty method will involve more computational effort than the
elimination approach (a) because there are more parameters in the resulting
TABLE 18.2. Data for the problem of Z Hassan specified by (18.21) and (18.22). Column j below gives the observations yij, for rows i = 1, 2, . . . , m, for m = 26. [The six columns of data are not reproduced here.]
unconstrained problem, in our example six instead of five, and (b) because the
unconstrained problem must be solved for each increase of the weighting w.
Furthermore, the ultimate de-scaling of the problem as w is made large may cause
slow convergence of the iterative algorithms.
In order to see how the penalty function approach works for inequality
constraints where there is no corresponding elimination, consider the following
problem (Dixon 1972, p 92): minimise
(18.24)
subject to
   3 b1 + 4 b2 < 6                                                   (18.25)
and
   -b1 + 4 b2 < 2.                                                   (18.26)
The constraints were weighted equally in (18.15) and added to (18.24). The resulting function was minimised using the Nelder-Mead algorithm starting from b1 = b2 = 0 with a step of 0.01 to generate the initial simplex. The results are presented in table 18.4 together with the alternative method of assigning the
TABLE 18.3. Solutions found for Z Hassan problem via Marquardt-type algorithm using numerical approximation of the Jacobian and elimination of one parameter via equation (18.14). The values in italics are for the eliminated parameter. All calculations run in BASIC on a Data General NOVA in 23-bit binary arithmetic. Figures in brackets below each sum of squares denote total number of equivalent function evaluations (= (n+1)*(number of Jacobian calculations) + (number of function calculations)) to convergence. [Table body not reproduced here.]
function a very large value whenever one or more of the constraints is violated. In
this last approach it has been reported that the simplex may tend to collapse or
flatten against the constraint. Swann discusses some of the devices used to
counteract this tendency in the book edited by Gill and Murray (1974). Dixon
(1972, chap 6) gives a discussion of various methods for constrained optimisation
with a particular mention of some of the convergence properties of the penalty
function techniques with respect to the weighting factors w and W.
TABLE 18.4. Results of solution of problem (18.24)-(18.26) by the Nelder-Mead algorithm.

   Weighting                  Function      Number of evaluations      b1          b2
                              value         to converge
   10                         -5.35397          113                    1.46161     0.40779
   1E4                        -5.35136          167                    1.45933     0.40558
   Set function very large    -5.35135          121                    1.45924     0.405569
It is quite difficult to compare the four basic algorithms presented in the preceding
chapters in order to say that one of them is best. The reason for this is that one
algorithm may solve some problems very easily but run out of space on another.
An algorithm which requires gradients may be harder to use than another which
needs only function values, hence requiring more human time and effort even
though it saves computer time. Alternatively, it may only be necessary to improve
an approximation to the minimum, so that a method which will continue until it
has exhausted all the means by which it may reduce the function value may very
well be unnecessarily persistent.
Despite all these types of question, I have run an extensive comparison of the
methods which have been presented. In this, each method was applied to 79 test
problems, all of which were specified in a sum-of-squares form
   S = f^T f.                                                        (18.27)
(18.28)
Nash (1976) presents 77 of these problems, some of which have been used as
examples in this text. The convergence criteria in each algorithm were as
stringent as possible to allow execution to continue as long as the function was
being reduced. In some cases, this meant that the programs had to be stopped
manually when convergence was extremely slow. Some judgement was then used
to decide if a satisfactory solution had been found. In programs using numerical
approximation to the derivative, the same kind of rather subjective judgement
was used to decide whether a solution was acceptable based on the computed
gradient. This means that the results given below for minimisation methods
employing the approximation detailed in 18.2 have used a more liberal definition
of success than that which was applied to programs using analytic or no derivatives. The question this raises is: do we judge a program by what it is intended to
do? Or do we test to see if it finds some globally correct answer? I have chosen
the former position, but the latter is equally valid. Such differences of viewpoint
unfortunately give rise to many disputes and controversies in this field.
Having now described what has been done, a measure of success is needed. For
reliability, we can compare the number of problems for which acceptable (rather
than correct) solutions have been found to the total number of problems run.
Note that this total is not necessarily 79 because some problems have an initial set
of parameters corresponding to a local maximum and the methods which use
gradient information will not generally be able to proceed from such points. Other
problems may be too large to fit in the machine used for the tests, a partition of a Data General NOVA which uses 23-bit binary arithmetic. Also some of the
algorithms have intermediate steps which cause overflow or underflow to occur
because of the particular scale of the problems at hand.
The two remaining functions are given by Osborne (1972).
227
Left-overs
efe = (n + 1) * ig + ifn                (18.29)
equivalent function evaluations, where ig is the number of gradient (or Jacobian) evaluations and ifn the number of function evaluations. I use the factor (n+1) rather than n because of the particular structure of my programs. The use of equivalent function evaluations, and in particular the choice of multiplier for ig, biases my appraisal against
methods using derivatives, since by and large the derivatives are not n times more
work to compute than the function. Hillstrom (1976) presents a more comprehensive approach to comparing algorithms for function minimisation, though in the
end he is still forced to make some subjective judgements.
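In code the accounting just described is a single expression; the function below is an illustrative sketch (the figures in the comment are those quoted later in this chapter for the Marquardt run of example 18.2).

program efedemo;
{ Sketch: equivalent-function-evaluation accounting,
     efe = (n + 1) * (number of gradient or Jacobian evaluations)
           + (number of function evaluations). }
function efe(n, ig, ifn : integer) : integer;
begin
  efe := (n + 1) * ig + ifn;
end;
begin
  { n = 2 parameters, 5 Jacobian evaluations and 11 function evaluations
    give 3*5 + 11 = 26 equivalent function evaluations }
  writeln('efe = ', efe(2, 5, 11));
end.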
Having decided to use equivalent function evaluations as the measure of
efficiency, we still have a problem in determining how to use them, since the
omission of some problems for each of the methods means that a method which
has successfully solved a problem involving many parameters (the maximum in
any of the tests was 20, but most were of order five or less) may appear less
efficient than another method which was unable to tackle this problem. To take
account of this, we could determine the average number of equivalent function
evaluations to convergence per parameter, either by averaging this quantity over
the successful applications of a method or by dividing the total number of efes for
all successful cases by the total number of parameters involved. In practice, it was
decided to keep both measures, since their ratio gives a crude measure of the
performance of algorithms as problem order increases.
To understand this, consider that the work required to minimise the function in any algorithm is proportional to some power, a, of the order n, thus
w = p n^a.                (18.30)
The expected value of the total work over all problems divided by the total number of parameters is then approximated by
w1 = (total work)/(total parameters) ≈ ∫ p n^a dn / ∫ n dn = 2 p N^(a-1)/(a + 1)                (18.31)
where N is the largest problem order used. The average of w/n over all problems, on the other hand, is approximately
w0 ≈ (1/N) ∫ p n^(a-1) dn = p N^(a-1)/a.                (18.32)
The ratio of these two measures,
r = w0/w1 ≈ (a + 1)/(2a),                (18.33)
can then be solved to give
a = 1/(2r - 1)                (18.34)
as an estimate of the degree of the relationship between work and problem order, n, of a given method. The limited extent of the tests and the approximation of sums by integrals, however, mean that the results of such an analysis are no more than a guide to the behaviour of algorithms. The results of the tests are presented
in table 18.5.
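The measures (d)-(g) of table 18.5 can be computed as in the following sketch; the arrays of efes and parameter counts are invented illustrative data, not the values used in the table.

program workorder;
{ Sketch: compute w1 = (total efes)/(total parameters),
  w0 = average of efes per parameter over problems, r = w0/w1 and the
  work-order estimate a = 1/(2r - 1) of equation (18.34). }
const
  nprob = 4;                              { illustrative number of problems }
  efes : array[1..4] of real = (120.0, 340.0, 900.0, 2500.0);
  npar : array[1..4] of real = (2.0, 5.0, 10.0, 20.0);
var
  i : integer;
  sumefe, sumpar, sumratio, w1, w0, r, a : real;
begin
  sumefe := 0.0; sumpar := 0.0; sumratio := 0.0;
  for i := 1 to nprob do
  begin
    sumefe := sumefe + efes[i];
    sumpar := sumpar + npar[i];
    sumratio := sumratio + efes[i] / npar[i];
  end;
  w1 := sumefe / sumpar;                  { measure (d) }
  w0 := sumratio / nprob;                 { measure (e) }
  r := w0 / w1;                           { measure (f) }
  a := 1.0 / (2.0 * r - 1.0);             { measure (g), equation (18.34) }
  writeln('w1 = ', w1:8:3, '  w0 = ', w0:8:3, '  r = ', r:6:3, '  a = ', a:6:3);
end.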
The conclusions which may be drawn from the table are loosely as follows.
(i) The Marquardt algorithm 23 is generally the most reliable and efficient.
Particularly if problems having large residuals, which cause the Gauss-Newton
approximation (17.16) to be invalid, are solved by other methods or by increasing
the parameter phi in algorithm 23, it is extremely efficient, as might be expected
since it takes advantage of the sum-of-squares form.
(ii) The Marquardt algorithm using a numerical approximation (18.4) for the
Jacobian is even more efficient than its analytic-derivative counterpart on those
problems it can solve. It is less reliable, of course, than algorithms using analytic
derivatives. Note, however, that in terms of the number of parameters determined
successfully, only the variable metric algorithm and the Marquardt algorithm are
more effective.
(iii) The Nelder-Mead algorithm is the most reliable of the derivative-free
methods in terms of number of problems solved successfully. However, it is also
one of the least efficient, depending on the choice of measure w1 or w0, though in
some ways this is due to the very strict convergence criterion and the use of the
axial search procedure. Unfortunately, without the axial search, the number of
problems giving rise to failures of the algorithm is 11 instead of four, so I cannot
recommend a loosening of the convergence criteria except when the properties of
the function to be minimised are extremely well known.
(iv) The conjugate gradients algorithm, because of its short code length and low
working-space requirements, should be considered whenever the number of
parameters to be minimised is large, especially if the derivatives are inexpensive
to compute. The reliability and efficiency of conjugate gradients are lower than
those measured for variable metric and Marquardt methods. However, this study,
by using equivalent function evaluations and by ignoring the overhead imposed by
each of the methods, is biased quite heavily against conjugate gradients, and I
would echo Fletcher's comment (in Murray 1972, p 82) that the algorithm is extremely reliable and well worth trying.
As a further aid to the comparison of the algorithms, this chapter is concluded
with three examples to illustrate their behaviour.
Example 18.1. Optimal operation of a public lottery
In example 12.1 a function minimisation problem has been described which arises in an attempt to operate a lottery in a way which maximises revenue per unit time.
TABLE 18.5. Comparison of algorithm performance as measured by equivalent function evaluations (efes).
Algorithm
19+20
21
Type
Nelder
Mead
Variable
metric
1242
n2 + 4n + 2
1068
n2 + 5n
68
33394
230
14519
68
18292
261
7008
51
10797
184
5868
66
29158
236
12355
52
16065
196
8196
14197
098
105
6
7672
109
084
3
8825
150
050
20
15478
125
066
5
92
96
72
Code length
Array elements
(a) number of
successful runs
(b) total efes
(c) total parameters
(d) w1 = (b)/(c)
(e) w0 = average efes
per parameter
(f) r = w0/w1
(g) a = 1/(2r - 1)
(h) number of failures
(i) number of problems
not run
(j) successes as
percentage of
problems run
21
With
numerically
approximated
gradient
Omitting
problem
34
76
26419
322
8205
61
8021
255
3145
75
1 4399
318
4528
7896
096
108
16
10601
129
063
0
3657
116
075
8
6736
149
051
0
11
10
93
76
100
88
100
Conjugate
gradients
22
With
numerically
approximated
gradient
23
23
With
numerically
approximated
Jacobian
22
1059
5n
23
Marquardt
1231
n2 + 5n
Problem 34 of the set of 79 has been designed to have residuals f so that the second derivatives of these residuals cannot be dropped from equation (17.15) to make the Gauss-Newton approximation. The failure of the approximation in this case is reflected in the very slow (12000 efes) convergence of algorithm 23.
On Data General NOVA (23-bit mantissa).
Perry and Soland (1975) derive analytic solutions to this problem, but both
to check their results and determine how difficult the problem might be if such a
solution were not possible, the following data were run through the Nelder-Mead
simplex algorithm 19 on a Data General NOVA operating in 23-bit binary
arithmetic.
K1 = 3.82821        a = 0.23047
K2 = 0.416            = 0.12
K3 = 5.24263          = 0.648
F  = 8.78602          = 1.116.
        S(b)        Final point b*                            S(b*)       efes
(a)     -7.71569    6.99509, 4.00185, 300, 1621               -7.71574    46
(b)     7.07155     5.91676, 5.63079, 501.378, 1821.55        0.703075    157
(c)     5.93981     6.99695, 3.99902, 318.797, 1605.5         -7.70697    252
(The efes are equivalent function evaluations; see §18.4 for an explanation.) In case (b), the price per ticket (second parameter) is clearly exorbitant and the duration of the draw (first parameter) over a year and a half. The first prize (third parameter, measured in units 1000 times as large as the price per ticket) is relatively small. Worse, the revenue (-S) per unit time is negative! Yet the derivatives with respect to each parameter at this solution are small. An additional fact to be noted is that the algorithm was not able to function normally; that is, at each step algorithm 21 attempts to update an iteration matrix. However, under certain conditions described at the end of §15.3, it is inadvisable to do this and the method reverts to steepest descent. In case (b) above, this occurred in 23
of the 25 attempts to perform the update, indicating that the problem is very far
from being well approximated by a quadratic form. This is hardly surprising. The
matrix of second partial derivatives of S is certain to depend very much on the
parameters due to the fractional powers (a, , , ) which appear. Thus it is
unlikely to be approximately constant in most regions of the parameter space as
required of the Hessian in 15.2. This behaviour is repeated in the early iterations
of case (c) above.
In conclusion, then, this problem presents several of the main difficulties which
may arise in function minimisation:
(i) it is highly nonlinear;
(ii) there are alternative optima; and
(iii) there is a possible scaling instability in that parameters 3 and 4 (v and w) take values in the range 200-2000, whereas parameters 1 and 2 (t and p) are in the range 1-10.
These are problems which affect the choice and outcome of minimisation procedures. The discussion leaves unanswered all questions concerning the reliability of
the model or the difficulty of incorporating other parameters, for instance to take
account of advertising or competition, which will undoubtedly cause the function
to be more difficult to minimise.
Example 18.2. Market equilibrium and the nonlinear equations that result
In example 12.3 the reconciliation of the market equations for supply
q = K p^a
and demand
has given rise to a pair of nonlinear equations. It has been my experience that
such systems are less common than minimisation problems, unless the latter are
solved by zeroing the partial derivatives simultaneously, a practice which generally makes work and sometimes trouble. One's clients have to be trained to
present a problem in its crude form. Therefore, I have not given any special
method in this monograph for simultaneous nonlinear equations, which can be
written
f(b) = 0                (12.5)
preferring to solve them via the minimisation of
f^T f = S(b).                (12.4)
The second residual is the likely form in which the demand function would be estimated. To obtain a concrete and familiar form, substitute
q = b1        p = b2        K = 1        Z = exp(2)
with supply exponent a = 1.5 and demand exponent 1.2, so that
f2 = ln(b1) - 2 + 1.2 ln(b2).
Now minimising the sum of squares
S = f1^2 + f2^2
should give the desired solution.
The Marquardt algorithm 23 with numerical approximation of the Jacobian as in §18.2 gives
p = b2 = 2.09647
q = b1 = 3.03773
with S = 5.28328E-6 after five evaluations of the Jacobian and 11 evaluations of S. This is effectively 26 function evaluations. The conjugate gradients algorithm 22 with numerical approximation of the gradient gave
p = 2.09739
q = 3.03747
S = 2.33526E-8
after 67 sum-of-squares evaluations. For both these runs, the starting point chosen was b1 = b2 = 1. All the calculations were run with a Data General NOVA in 23-bit binary arithmetic.
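The residuals of this example, together with the kind of forward-difference Jacobian approximation referred to above, can be sketched as follows. The supply residual f1 = b1 - b2^1.5 is inferred from the supply equation and the substitutions given earlier, and the difference step h is an illustrative choice rather than the exact one used in §18.2.

program mkteq;
{ Sketch: residuals for the market-equilibrium example and a simple
  forward-difference approximation to the 2 by 2 Jacobian.
  f1 = b1 - b2^1.5 is inferred from the supply equation with K = 1 and
  a = 1.5; f2 = ln(b1) - 2 + 1.2 ln(b2) is as given in the text. }
const
  h = 1.0E-4;                             { illustrative difference step }
type
  rvec = array[1..2] of real;
var
  b, f0, f1 : rvec;
  J : array[1..2, 1..2] of real;
  i, j : integer;
  save : real;

procedure resids(b : rvec; var f : rvec);
begin
  f[1] := b[1] - exp(1.5 * ln(b[2]));     { b1 - b2^1.5, requires b2 > 0 }
  f[2] := ln(b[1]) - 2.0 + 1.2 * ln(b[2]);
end;

begin
  b[1] := 1.0; b[2] := 1.0;               { starting point used in the text }
  resids(b, f0);
  for j := 1 to 2 do
  begin
    save := b[j];
    b[j] := b[j] + h;
    resids(b, f1);
    for i := 1 to 2 do
      J[i, j] := (f1[i] - f0[i]) / h;     { forward-difference column j }
    b[j] := save;
  end;
  writeln('f at (1,1) = (', f0[1]:10:6, ', ', f0[2]:10:6, ')');
  for i := 1 to 2 do
    writeln('J row ', i, ': ', J[i, 1]:12:6, ' ', J[i, 2]:12:6);
end.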
Example 18.3. Magnetic roots
Brown and Gearhart (1971) raise the possibility that certain nonlinear-equation systems have magnetic roots to which algorithms will converge even if starting points are given close to other roots. One such system they call the cubic-parabola:
To solve this by means of algorithms 19, 21, 22 and 23, the residuals
b2 = 0.5625.
Chapter 19
In some tests run by S G Nash and myself, the inexact line searches led to very
slow convergence in a number of cases, even though early progress may have
been rapid (Nash and Nash 1977).
19.2. SOLUTION OF LINEAR EQUATIONS AND
LEAST-SQUARES PROBLEMS BY CONJUGATE GRADIENTS
The conjugate gradients algorithm 22 can be modified to solve systems of linear
equations and linear least-squares problems. Indeed, the conjugate gradients
methods have been derived by considering their application to the quadratic form
S(b) = ½ b^T H b - c^T b + (any scalar)                (15.11)
where H is symmetric and positive definite. The minimum at b' has the zero gradient
g = H b' - c = 0                (15.17)
so that b' solves the linear equations
H b' = c.                (19.1)
In particular, a linear least-squares problem can be cast in this form with
H = A^T A                (19.3a)
c = A^T f.                (19.3b)
(19.5)
If the linear search along the search direction t_j yields a step-length factor k_j, then from (15.18)
g_(j+1) = g_j + k_j H t_j.                (19.6)
This with (19.5) defines a recurrence scheme which avoids the necessity of multiplication of b by H to compute the gradient, though H t_j must be formed. However, this is needed in any case in the linear search to which attention will now be directed.
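A minimal sketch of the conjugate gradients iteration for Hb = c follows. It uses the classical recurrences (in the book's notation the gradient g = Hb - c is the negative of the residual r used below) and is an illustration, not a transcription of algorithm 24; the 3 by 3 system is invented test data.

program cglin;
{ Sketch: conjugate gradients for H b = c, H symmetric and positive
  definite.  A small invented 3 by 3 system is solved for illustration. }
const
  n = 3;
type
  rvec = array[1..3] of real;
var
  H : array[1..3, 1..3] of real;
  b, c, r, t, v : rvec;
  i, j, iter : integer;
  k, rr, rrnew, tvt : real;
begin
  for i := 1 to n do                      { H = 3I + (matrix of ones): positive definite }
    for j := 1 to n do
      if i = j then H[i, j] := 4.0 else H[i, j] := 1.0;
  for i := 1 to n do c[i] := i;

  for i := 1 to n do
  begin
    b[i] := 0.0; r[i] := c[i]; t[i] := r[i];   { b = 0, r = c - H b, t = r }
  end;
  rr := 0.0;
  for i := 1 to n do rr := rr + sqr(r[i]);
  for iter := 1 to n do
  begin
    for i := 1 to n do                    { v = H t }
    begin
      v[i] := 0.0;
      for j := 1 to n do v[i] := v[i] + H[i, j] * t[j];
    end;
    tvt := 0.0;
    for i := 1 to n do tvt := tvt + t[i] * v[i];
    k := rr / tvt;                        { step length along t }
    for i := 1 to n do
    begin
      b[i] := b[i] + k * t[i];
      r[i] := r[i] - k * v[i];            { recurrence avoids forming H b afresh }
    end;
    rrnew := 0.0;
    for i := 1 to n do rrnew := rrnew + sqr(r[i]);
    for i := 1 to n do t[i] := r[i] + (rrnew / rr) * t[i];
    rr := rrnew;
  end;
  for i := 1 to n do writeln('b[', i, '] = ', b[i]:12:6);
end.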
Substitution of
b_(j+1) = b_j + k_j t_j                (19.7)
into (15.11) gives an equation for k_j, permitting the optimal step length to be computed. For convenience in explaining this, the subscript j will be omitted. Thus from substituting, we obtain
S(b + k t) = ½ (b + k t)^T H (b + k t) - c^T (b + k t) + (any scalar).                (19.8)
(19.9)
(19.12)
(19.13)
provide a non-negative definite matrix A^T A. Finally, the problem can be approached in a completely different way. Equation (2.13) can be rewritten
y_(j+1) = 7 j h^3 + [2 - h^2/(1 + j^2 h^2)] y_j - y_(j-1).                (19.14)
Table 19.1. Deviation from the true solution for n = 4, 10 and 50.

                                      n = 4       n = 10      n = 50
Algorithm 24 and equation (19.12)     2.62E-6     8.34E-6     2.03E-4
Algorithm 24 and equation (19.13)     2.88E-5     1.02E-2     0.930
Shooting method                       1.97E-6     1.14E-5     2.03E-4
0.028     0.003
Thus, since y_0 is fixed as zero by (2.9), if the value y_1 is known, all the points of the solution can be computed. But (2.10) requires
y_(n+1) = 2                (19.15)
thus we can consider the difference
f(y_1) = y_(n+1) - 2                (19.16)
to generate a root-finding problem. This process is called a shooting method since we aim at the value of y_(n+1) desired by choosing y_1. Table 19.1 compares the three methods suggested for n = 4, 10 and 50. The main comparison is between the values found for the deviation from the true solution
y(x) = x + x^3                (19.17)
or
y(x_j) = j h (1 + j^2 h^2).                (19.18)
It is to be noted that the use of the conjugate gradients method with the normal equations (19.13) is unsuccessful, since we have unfortunately increased the ill-conditioning of the equations in the manner discussed in chapter 5. The other two methods offer comparable accuracy, but the shooting method, applying algorithm 18 to find the root of equation (19.16) starting with the interval [0, 0.5], is somewhat simpler and faster. In fact, it could probably be solved using a trial-and-error method to find the root on a pocket calculator.
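A sketch of the shooting calculation follows. It assumes, from the true solution y(x) = x + x^3 and the condition (19.15), that the right-hand boundary lies at x = 1 so that h = 1/(n+1); plain bisection on [0, 0.5] stands in for the root-finder of algorithm 18.

program shoot;
{ Sketch: shooting method for the boundary value problem using the
  recurrence (19.14).  Assumes h = 1/(n+1) so that y[n+1] approximates
  y(1) = 2; bisection on [0, 0.5] stands in for algorithm 18. }
const
  n = 10;
var
  h, lo, hi, mid, flo, fmid : real;
  iter : integer;

function fshoot(y1 : real) : real;
{ march y[j+1] = 7 j h^3 + (2 - h^2/(1 + j^2 h^2)) y[j] - y[j-1]
  from y[0] = 0 and the trial y[1], returning y[n+1] - 2 }
var
  j : integer;
  yprev, ycur, ynext : real;
begin
  yprev := 0.0;
  ycur := y1;
  for j := 1 to n do
  begin
    ynext := 7.0 * j * h * h * h
             + (2.0 - h * h / (1.0 + j * j * h * h)) * ycur - yprev;
    yprev := ycur;
    ycur := ynext;
  end;
  fshoot := ycur - 2.0;
end;

begin
  h := 1.0 / (n + 1);
  lo := 0.0; hi := 0.5;
  flo := fshoot(lo);
  for iter := 1 to 40 do                  { bisection }
  begin
    mid := 0.5 * (lo + hi);
    fmid := fshoot(mid);
    if flo * fmid <= 0.0 then
      hi := mid
    else
    begin
      lo := mid; flo := fmid;
    end;
  end;
  mid := 0.5 * (lo + hi);
  writeln('y1 = ', mid:12:8, '  (compare h + h^3 = ', (h + h * h * h):12:8, ')');
end.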
-99.99
-21.03
24.98
-121.02
3.99
The software diskette contains the data file EX24LS1.CNM which, used with the driver DR24LS.PAS, will execute this example.
As a test of the method of conjugate gradients (algorithm 24) in solving
least-squares problems of the above type, a number of examples were generated
using all possible combinations of n heights. These heights were produced using a
pseudo-random-number generator which produces numbers in the interval (0,1).
All m=n*(n-1)/2 height differences were then computed and perturbed by
pseudo-random values formed from the output of the above-mentioned generator
minus 0.5 scaled to some range externally specified. Therefore if S1 is the scale factor for the heights and S2 the scale factor for the perturbation and the function RND(X) gives a number in (0,1), heights are computed using
S1*RND(X)
and perturbations on the height differences using
S2*[RND(X) - 0.5].
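A sketch of this data generation follows, with the standard Pascal random function standing in for RND(X) and illustrative values of n, S1 and S2.

program heightgen;
{ Sketch: generate n pseudo-random heights scaled by S1 and then the
  m = n(n-1)/2 height differences, each perturbed by S2*(random - 0.5),
  as described in the text.  random stands in for RND(X). }
const
  nmax = 20;
var
  n, i, j, m : integer;
  S1, S2, d : real;
  height : array[1..nmax] of real;
begin
  n := 4; S1 := 100.0; S2 := 0.01;        { illustrative values }
  for i := 1 to n do
    height[i] := S1 * random;             { heights in (0, S1) }
  m := 0;
  for i := 1 to n - 1 do
    for j := i + 1 to n do
    begin
      m := m + 1;
      d := height[i] - height[j] + S2 * (random - 0.5);   { perturbed difference }
      writeln('difference ', i, '-', j, ' = ', d:12:5);
    end;
  writeln(m, ' height differences generated');
end.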
Table 19.2 gives a summary of these calculations. It is important that the
n     m = n(n-1)/2   Matrix     Height     Perturbation   Perturbation   Variance of        Variance
                     products   scale S1   scale S2       variance       computed height    reduction
                                                                         differences        factor
4     6              3          100        0.01           9.15E-6        1.11E-6            0.12
4     6              2          100        1              9.15E-2        1.11E-2            0.12
4     6              2          10000      1              9.15E-2        1.11E-2            0.12
10    45             3          1000       0.1            9.25E-4        2.02E-4            0.22
20    190            3          1000       0.1            8.37E-4        6.96E-5            0.08
20    190            3          1000       100            836.6          69.43              0.08
20    190            5          1000       2000           334631         26668.7            0.08
A' = A - s B                (19.19)
where s is some shift, is not positive definite. (Only the generalised problem will be treated.) In fact, it is bound to be indefinite if an intermediate eigenvalue is sought. Superficially, this can be rectified immediately by solving the least-squares
problem
(A')^T (A') y_i = (A')^T B x_i                (19.20)
in place of (9.12a). However, as with the Fröberg problem of example 19.1, this is done at the peril of worsening the condition of the problem. Since Hestenes (1975) has pointed out that the conjugate gradients method may work for indefinite systems (it is simply no longer supported by a convergence theorem), we may be tempted to proceed with inverse iteration via conjugate gradients for any real symmetric problem.
Ruhe and Wiberg (1972) warn against allowing too large an increase in the
norm of y in a single step of algorithm 24, and present techniques for coping with
the situation. Of these, the one recommended amounts only to a modification of
the shift. However, since Ruhe and Wiberg were interested in refining eigenvectors already quite close to exact, I feel that an ad hoc shift may do just as well if a
sudden increase in the size of the vector y, that is, a large step length k, is
observed.
Thus my suggestion for solution of the generalised symmetric matrix eigenvalue
problem by inverse iteration using the conjugate gradients algorithm 24 is as
follows.
(i) Before each iteration, the norm (any norm will do) of the residual vector
r = (A - e B) x                (19.21)
should be computed and this norm compared to some user-defined tolerance as a
convergence criterion. While this is less stringent than the test made at STEPs 14
and 15 of algorithm 10, it provides a constant running check of the closeness of the
current trial solution (e,x) to an eigensolution. Note that a similar calculation
could be performed in algorithm 10 but would involve keeping copies of the
matrices A and B in the computer memory. It is relatively easy to incorporate a
restart procedure into inverse iteration so that tighter tolerances can be entered
without discarding the current approximate solution. Furthermore, by using b=0
as the starting vector in algorithm 24 at each iteration and only permitting n
conjugate gradient steps or less (by deleting STEP 12 of the algorithm), the
matrix-vector multiplication of STEP 1 of algorithm 24 can be made implicit in the
computation of residuals (19.21) since
c = Bx.
(19.22)
(19.23)
(ii) To avoid too large an increase in the size of the elements of b, STEP 7 of
algorithm 24 should include a test of the size of the step-length parameter k. I use
the test
If ABS(k) > 1/SQR(eps), then . . .
where eps is the machine precision, to permit the shift s to be altered by the user. I
remain unconvinced that satisfactory simple automatic methods yet exist to
calculate the adjustment to the shift without risking convergence to an eigensolution other than that desired. The same argument applies against using the
Rayleigh quotient to provide a value for the shift s. However, since the Rayleigh
quotient is a good estimate of the eigenvalue (see 10.2, p 100), it is a good idea to
compute it.
(iii) In order to permit the solution b of
H b = (A - s B) b = Bx = c
(19.24)
x = b/b_m                (19.25)
where b_m is the largest element in magnitude in b (but keeps its sign!!), I use the
infinity norm (9.14) in computing the next iterate x from b. To obtain an
eigenvector normalised so that
x^T B x = 1                (19.26)
x^T c = x^T B x.                (19.27)
The Rayleigh quotient
R = x^T A x / x^T B x                (19.28)
then takes on its stationary values (that is, the values at which the partial derivatives with respect to the components of x are zero) at the eigensolutions of
A x = e B x.                (2.63)
In particular, the maximum and minimum values of R are the extreme eigenvalues of the problem (2.63). This is easily seen by expanding
(19.29)
where ~j is the jth eigenvector corresponding to the eigenvalue ej. Then we have
(19.31)
If
e 1 >e 2 > . . .>e n
(19.31)
(19.32)
(19.33)
(19.34)
(19.35)
(19.36)
(19.37)
(19.38)
where
(19.39)
(19.40)
x^T A t = t^T A x                (19.41)
x^T B t = t^T B x.                (19.42)
(x^T A x), (x^T A t) and (t^T A t); (x^T B x), (x^T B t) and (t^T B t).
The quadratic equation (19.37) has two roots, of which only one will correspond to a minimum. Since
y(k) = 0.5 D^2 (dR/dk) = u k^2 + v k + w                (19.43)
we get
(19.44)
At either of the roots of (19.37), this reduces to
0.5 D^2 (d^2 R/dk^2) = 2 u k + v                (19.45)
(19.46)
Substitution of both roots from the quadratic equation formula shows that the desired root is
k = [-v + (v^2 - 4 u w)^(1/2)] / (2 u).                (19.47)
If v is negative, (19.47) can be used to evaluate the root. However, to avoid digit cancellation when v is positive,
k = -2 w / [v + (v^2 - 4 u w)^(1/2)]                (19.48)
should be used. The linear search subproblem has therefore been resolved in a straightforward manner in this particular case.
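In code the choice between (19.47) and (19.48) is a single test on the sign of v, as in this sketch; the coefficients in the test call are arbitrary and the discriminant is assumed non-negative.

program rqroot;
{ Sketch: select the cancellation-free form of the desired root of
  u k^2 + v k + w = 0, using (19.47) when v is negative and (19.48)
  otherwise.  The two expressions are algebraically the same root. }
function rqstep(u, v, w : real) : real;
var
  disc : real;
begin
  disc := sqrt(v * v - 4.0 * u * w);      { assumed non-negative here }
  if v < 0.0 then
    rqstep := (-v + disc) / (2.0 * u)     { equation (19.47) }
  else
    rqstep := -2.0 * w / (v + disc);      { equation (19.48) }
end;
begin
  writeln('k = ', rqstep(1.0, -3.0, 2.0):10:6);   { arbitrary coefficients }
end.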
The second aspect of particularising the conjugate gradients algorithm to the
minimisation of the Rayleigh quotient is the generation of the next conjugate
direction. Note that the gradient of the Rayleigh quotient at x is given by
g = 2 (A x - R B x) / (x^T B x)                (19.49)
(19.50)
Substituting
q = -g
and the Hessian (19.50) into the expression
z = g^T H t / t^T H t                (16.5)
(16.4)
for the parameter in the two-term conjugate gradient recurrence, and noting that
g^T t = 0                (19.51)
(19.52)
The work of Geradin (1971) implies that z should be evaluated by computing the
inner products within the square brackets and subtracting. I prefer to perform the
subtraction within each element of the inner product to reduce the effect of digit
cancellation.
Finally, condition (19.51) permits the calculation of the new value of the Rayleigh quotient to be simplified. Instead of the expression which results from expanding (19.33), we have from (19.49) evaluated at (x + k t), with (19.51), the expression
R = (t^T A x + k t^T A t) / (t^T B x + k t^T B t).                (19.53)
This expression is not used in the algorithm. Fried (1972) has suggested several other formulae for the recurrence parameter z of equation (19.52). At the time of writing, too little comparative testing has been carried out to suggest that one such formula is superior to any other.
Algorithm 25. Rayleigh quotient minimisation by conjugate gradients
procedure rqmcg( n : integer; {order of matrices}
A, B : rmatrix; {matrices defining eigenproblem}
var X : rvector; {eigenvector approximation, on both
input and output to this procedure}
var ipr : integer; {on input, a limit to the number of
matrix products allowed, on output, the number of
matrix products used}
var rq : real); {Rayleigh quotient = eigenvalue approx.}
{alg25.pas == Rayleigh quotient minimization by conjugate gradients
Minimize Rayleigh quotient
X-transpose A X / X-transpose B X
thereby solving generalized symmetric matrix eigenproblem
AX=rqBX
for minimal eigenvalue rq and its associated eigenvector.
A and B are assumed symmetric, with B positive definite.
While we supply explicit matrices here, only matrix products
are needed of the form v = A u, w = B u.
Copyright 1988 J.C.Nash
}
var
  count, i, itn, itlimit : integer;
Example 19.3. Conjugate gradients for inverse iteration and Rayleigh quotient
minimisation
Table 19.3 presents approximations to the minimal and maximal eigensolutions
of the order-10 matrix eigenproblem (2.63) having as A the Moler matrix and as
B the Frank matrix (appendix 1). The following notes apply to the table.
(i) The maximal (largest eigenvalue) eigensolution is computed using (- A) instead
of A in algorithm 25.
(ii) Algorithm 15 computes all eigensolutions for the problem. The maximum
absolute residual quoted is computed in my program over all these solutions, not
simply for the eigenvalue and eigenvector given.
(iii) It was necessary to halt algorithm 10 manually for the case involving a shift
of 88. This is discussed briefly in 9.3 (p 109).
(iv) The three iterative algorithms were started with an initial vector of ones.
TABLE 19.3. (a) Minimal and (b) maximal eigensolutions of Ax = eBx for A = Moler matrix, B = Frank
matrix (order 10).
Algorithm 10
Algorithm 15
Section 19.3
Algorithm 25
Shift
Eigenvalue
Iterations or sweeps
Matrix products
Rayleigh quotient
0
21458E-6
4
--
253754E-6
7
_-
0
214552E-6
3
23
21455E-6
26
214512E-6
Eigenvector:
0433017
021651
0108258
541338E-2
270768E-2
135582E-2
681877E-3
348868E-3
190292E-3
126861E-3
0433017
021651
0108257
541337E-2
270768E-2
135583E-2
681877E-3
348866E-3
19029E-3
126858E-3
0433017
021651
0108257
541337E-2
270768E-2
135582E-2
681879E-3
348869E-3
190291E-3
126859E-3
Maximum residual
Error sum of squares r T r
Gradient norm2 gT g
-0433015
-0216509
-0108257
-541331E-2
-270767E-2
-135583E-2
-681877E-3
-348891E-3
-190299E-3
-126864E-3
462709E-11
962214E-15
Maximum residual
Error sum of squares r T r
Gradient norm2 g T g
88
881652
(see notes)
0217765
-0459921
0659884
-0799308
0865401
-0852101
0760628
-0599375
0383132
-0131739
881644
7
-
88
88165
16
166
-0217764
0459918
-0659877
0799302
-0865396
08521
-0760632
0599376
-0383132
0131739
0219309
-0462607
0662815
-0802111
0867203
-085142
0757186
-0594834
0379815
-0130648
96
881651
0219343
-0462759
0663062
-0801759
0866363
-0851188
0757946
-0595627
0379727
-0130327
573166E-3
582802E-9
(v) Different measures of convergence and different tolerances have been used in
the computations, which were all performed on a Data General NOVA in
23-bit binary arithmetic. That these measures are different is due to the various
operating characteristics of the programs involved.
Order    Rayleigh quotient of (-A)
4        0.350144        1.80074E-13
10       7.44406E-2      2.08522E-10
50       3.48733E-3      1.9187E-10
100      8.89398E-4      7.23679E-9
Appendix 1
Moler matrix
A_ij = i                   for i = j
A_ij = min(i,j) - 2        for i ≠ j.
Professor Cleve Moler devised this simple matrix. It has the very simple Choleski decomposition given in example 7.1, so is positive definite. Nevertheless, it has one small eigenvalue and often upsets elimination methods for solving linear-equation systems.
Frank matrix
A_ij = min(i,j).
A reasonably well behaved matrix.
Bordered matrix
A_ii = 1
A_in = A_ni = 2^(1-i)      for i ≠ n
A_ij = 0                   otherwise.
The matrix has (n-2) eigenvalues at 1. Wilkinson (1965, pp 94-7) gives some
discussion of this property. The high degree of degeneracy and the form of the
for i
j.
Wilkinson W+ matrix
A_ii = |[n/2] + 1 - i|            for i = 1, 2, . . . , n
A_i,i+1 = A_(i+1),i = 1           for i = 1, 2, . . . , (n-1)
A_ij = 0                          for |j - i| > 1
where [b] is the largest integer less than or equal to b. The W+ matrix (Wilkinson 1965, p 308) is normally given odd order. This tridiagonal matrix then has several pairs of close eigenvalues despite the fact that no superdiagonal element is small. Wilkinson points out that the separation between the two largest eigenvalues is of the order of (n!)^(-2) so that the power method will be unable to separate them unless n is very small.
Wilkinson W- matrix
A_ii = [n/2] + 1 - i              for i = 1, 2, . . . , n
A_i,i+1 = A_(i+1),i = 1           for i = 1, 2, . . . , (n-1)
A_ij = 0                          for |j - i| > 1
where [b] is the largest integer less than or equal to b. For odd order, this matrix
has eigenvalues which are pairs of equal magnitude but opposite sign. The
magnitudes of these are very close to some of those of the corresponding W+
matrix.
Ones
Ai j =1
This matrix is singular. It has only rank one, that is, (n-1) zero eigenvalues.
The matrices described here may all be generated by the Pascal procedure
MATRIXIN.PAS, which is on the software diskette. This procedure also allows for
keyboard entry of matrices.
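As an illustration (and not the MATRIXIN.PAS procedure itself), the sketch below generates two of these matrices, using the Moler and Frank definitions as given above.

program testmat;
{ Sketch: generate the Moler and Frank test matrices of order n,
  following the definitions given above.  Illustration only. }
const
  n = 5;
var
  A : array[1..5, 1..5] of real;
  i, j : integer;

procedure show(title : string);
var
  i, j : integer;
begin
  writeln(title);
  for i := 1 to n do
  begin
    for j := 1 to n do write(A[i, j]:8:2);
    writeln;
  end;
end;

begin
  for i := 1 to n do                      { Moler: A[i,i] = i, else min(i,j) - 2 }
    for j := 1 to n do
      if i = j then A[i, j] := i
      else if i < j then A[i, j] := i - 2
      else A[i, j] := j - 2;
  show('Moler matrix');
  for i := 1 to n do                      { Frank: A[i,j] = min(i,j) }
    for j := 1 to n do
      if i < j then A[i, j] := i else A[i, j] := j;
  show('Frank matrix');
end.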
Appendix 2
LIST OF ALGORITHMS
Algorithm 1. Singular-value decomposition   36
Algorithm 2. Least-squares solution via singular-value decomposition   42
Algorithm 3. Givens reduction of a real rectangular matrix   51
Algorithm 4. Givens reduction, singular-value decomposition and least-squares solution   56
Algorithm 5. Gauss elimination with partial pivoting   75
Algorithm 6. Gauss elimination back-substitution   77
Algorithm 7. Choleski decomposition in compact storage   88
Algorithm 8. Choleski back-substitution   89
Algorithm 9. Bauer-Reinsch inversion of a positive definite symmetric matrix   99
Algorithm 10. Inverse iteration via Gauss elimination   106
Algorithm 11. Standardisation of a complex vector   111
Algorithm 12. Residuals of a complex eigensolution   112
Algorithm 26. Eigensolutions of a complex matrix by Eberlein's method   113
Algorithm 13. Eigensolutions of a real symmetric matrix via the singular-value decomposition   123
Algorithm 14. A Jacobi algorithm for eigensolutions of a real symmetric matrix   128
Algorithm 15. Solution of a generalised matrix eigenvalue problem by two applications of the Jacobi algorithm   137
Algorithm 16. Grid search along a line   149
Algorithm 17. Minimisation of a function of one variable   154
Algorithm 18. Root-finding by bisection and False Position   162
Algorithm 19. A Nelder-Mead minimisation procedure   173
Algorithm 20. Axial search   179
Algorithm 27. Hooke and Jeeves minimiser   183
Algorithm 21. Variable metric minimiser   192
Algorithm 22. Function minimisation by conjugate gradients   200
Algorithm 23. Modified Marquardt method for minimising a nonlinear sum-of-squares function   212
Algorithm 24. Solution of a consistent set of linear equations by conjugate gradients   236
Algorithm 25. Rayleigh quotient minimisation by conjugate gradients   246
Appendix 3
LIST OF EXAMPLES
Example 2.1. Mass-spectrograph calibration   20
Example 2.2. Ordinary differential equations: a two-point boundary value problem   20
Example 2.3. Least squares   23
Example 2.4. Surveying-data fitting   24
Example 2.5. Illustration of the matrix eigenvalue problem   28
Example 3.1. The generalised inverse of a rectangular matrix via the singular-value decomposition   44
Example 3.2. Illustration of the use of algorithm 2   45
Example 4.1. The operation of Givens reduction   52
Example 4.2. The use of algorithm 4   62
Example 6.1. The use of linear equations and linear least-squares problems   77
Example 7.1. The Choleski decomposition of the Moler matrix   91
Example 7.2. Solving least-squares problems via the normal equations   92
Example 8.1. The behaviour of the Bauer-Reinsch Gauss-Jordan inversion   100
Example 9.1. Inverse iteration   108
Example 9.2. Eigensolutions of a complex matrix   117
Example 10.1. Principal axes of a cube   125
Example 10.2. Application of the Jacobi algorithm in celestial mechanics   131
Example 11.1. The generalised symmetric eigenproblem: the anharmonic oscillator   138
Example 12.1. Function minimisation-optimal operation of a public lottery   144
Example 12.2. Nonlinear least squares   144
Example 12.3. An illustration of a system of simultaneous nonlinear equations   144
Example 12.4. Root-finding   145
Example 12.5. Minimum of a function of one variable   146
Example 13.1. Grid and linear search   156
Example 13.2. A test of root-finding algorithms   164
Example 13.3. Actuarial calculations   165
Example 14.1. Using the Nelder-Mead simplex procedure (algorithm 19)   180
Example 15.1. Illustration of the variable metric algorithm 21   196
Example 16.1. Conjugate gradients minimisation   204
Example 17.1. Marquardt's minimisation of a nonlinear sum of squares   216
Example 18.1. Optimal operation of a public lottery   228
Example 18.2. Market equilibrium and the nonlinear equations that result   231
Example 18.3. Magnetic roots   232
Example 19.1.   238
Example 19.2.   240
Example 19.3. Conjugate gradients for inverse iteration and Rayleigh quotient minimisation   249
Example 19.4.   251
Appendix 4
ALG23.PAS
ALG24.PAS
ALG25.PAS
ALG26.PAS
ALG27.PAS
The following files are driver programs to run examples of use of the algorithms.
DR0102.PAS
DR03.PAS
DR03A.PAS
DR04.PAS
DR0506.PAS
DR0708.PAS
DR09.PAS
DR10.PAS
DR13.PAS
DR14.PAS
DR15.PAS
DR1617.PAS
DR1618.PAS
DR1920.PAS
DR21.PAS
DR22.PAS
DR23.PAS
DR24II.PAS
DR24LE.PAS
DR24LS.PAS
DR25.PAS
DR26.PAS
DR27.PAS
The following support codes are needed to execute the driver programs:
CALCEPS.PAS --- to compute the machine precision for the Turbo Pascal computing environment in which the program is compiled
CONSTYPE.DEF --- a set of constant and type specifications common to the codes
CUBEFN.PAS --- a cubic test function of one variable with minimum at 0.81650
FNMIN.PAS --- a main program to run function minimisation procedures
GENEVRES.PAS --- residuals of a generalised eigenvalue problem
GETOBSN.PAS --- a procedure to read a single observation for several variables (one row of a data matrix)
HTANFN.PAS --- the hyperbolic tangent, example 13.2
JJACF.PAS --- Jaffrelot's autocorrelation problem, example 14.1
MATCOPY.PAS --- to copy a matrix
MATMUL.PAS --- to multiply two matrices
MATRIXIN.PAS --- to create or read in matrices
PSVDRES.PAS --- to print singular-value decomposition results
QUADFN.PAS --- real valued test function of x for [1D] minimisation and root-finding
RAYQUO.PAS --- to compute the Rayleigh quotient for a generalised eigenvalue problem
RESIDS.PAS --- to compute residuals for linear equations and least-squares problems
ROSEN.PAS --- to set up and compute function and derivative information for the Rosenbrock banana-shaped valley test problem
SPENDFN.PAS --- the expenditure example, illustrated in example 12.5 and example 13.1
STARTUP.PAS --- code to read the names of and open console image and/or console control files for driver programs. This common code segment is not a complete procedure, so cannot be included in Turbo Pascal 5.0 programs.
SVDTST.PAS --- to compute various tests of a singular-value decomposition
TDSTAMP.PAS --- to provide a time and date stamp for output (files). This code makes calls to the operating system and is useful only for MS-DOS computing environments. In Turbo Pascal 5.0, there are utility functions which avoid the DOS call.
VECTORIN.PAS --- to create or read in a vector
The following files provide control information and data to the driver programs.
Their names can be provided in response to the question
File for input of control data ([cr] for keyboard)?
Be sure to include the filename extension (.CNM). The nomenclature follows that for
the DR*.PAS files. In some cases additional examples have been provided. For these
files a brief description is provided in the following list of control files.
EX0102.CNM
EX03.CNM
EX03A.CNM
EX04.CNM
EX0506.CNM
EX0506S.CNM --- a set of equations with a singular coefficient matrix
EX0708.CNM
EX09.CNM
EX10.CNM
EX13.CNM
EX14.CNM
EX15.CNM
EX1617.CNM
EX1618.CNM
EX19.CNM
EX1920.CNM
EX1920J.CNM --- data for the Jaffrelot problem (JJACF.PAS), example 14.1
EX21.CNM
EX22.CNM
EX23.CNM
EX24II.CNM
EX24LE.CNM
EX24LS.CNM
EX24LS1.CNM --- data for example 19.2
EX25.CNM
EX26.CNM
EX26A.CNM
EX27J.CNM --- data for the Jaffrelot problem (JJACF.PAS), example 14.1.
EX27R.CNM --- console control file for the regular test problem, the Rosenbrock test function (ROSEN.PAS)
If the driver programs have been loaded and compiled to saved executable (.COM)
files, then we can execute these programs by typing their names, e.g. DR0102. The
user must then enter command information from the keyboard. This is not difficult,
but it is sometimes useful to be able to issue such commands from a file. Such a
BATch command file (.BAT extension) is commonly used in MS-DOS systems. In the
driver programs we have included compiler directives to make this even easier to use
by allowing command input to come from a file. A batch file EXAMPLE.BAT which
could run drivers for algorithms 1 through 6 would have the form
rem EXAMPLE.BAT
rem runs Nash Algorithms 1 through 6 automatically
DR0102<DR0102X.
DR03A<DR03AX.
DR03<DR03X.
DR04<DR04X.
DR0506<DR0506X.
The files which end in an X. contain information to control the drivers, in fact, they
contain the names of the EX*.CNM control files. This facility is provided to allow
for very rapid testing of all the codes at once (the technical term for this is regression
testing). Note that console image files having names of the form OUT0102 are
created, which correspond in form to the driver names, i.e. DR0102.PAS. The
command line files present on the disk are:
DR0102X.
DR09X.
DR1618X.
DR24LEX.
DR03AX.
DR10X.
DR19X.
DR24LSX.
DR03X.
DR13X.
DR21X.
DR25X.
DR04X.
DR14X.
DR22X.
DR26X.
DR0506X.
DR15X.
DR23X.
DR27X.
DR0708X.
DR1617X.
DR24IIX.
Users may wish to note that there are a number of deficiencies with version 3.01a of
Turbo Pascal. I have experienced some difficulty in halting programs with the
Control-C or Control-Break keystrokes, in particular when the program is waiting
for input. In some instances, attempts to halt the program seem to interfere with the
files on disk, and the working algorithm file has been over-written! On some
occasions, the leftmost characters entered from the keyboard are erased by READ
instructions. From the point of view of a software developer, the absence of a facility
to compile under command of a BATch command file is a nuisance. Despite these
faults, the system is relatively easy to use. Many of the faults of Turbo Pascal 3.01a
have been addressed in later versions of the product. We anticipate that a diskette of
the present codes adapted for version 5.0 of Turbo Pascal will be available about the
time the book is published. Turbo Pascal 5.0 is, however, a much larger system in
terms of memory requirements.
BIBLIOGRAPHY
ABRAMOWITZ M and STEGUN I A 1965 Handbook of Mathematical Functions with Formulas, Graphs and
Mathematical Tables (New York: Dover)
ACTON F S 1970 Numerical Methods that Work (New York: Harper and Row)
BARD Y 1967 Nonlinear Parameter Estimation and Programming (New York: IBM New York Scientific
Center)
---1970 Comparison of gradient methods for the solution of nonlinear parameter estimation problems
SIAM J. Numer. Anal. 7 157-86
--1974 Nonlinear Parameter Estimation (New York/London: Academic)
BATES D M and WATTS D G 1980 Relative curvature measures of nonlinearity J. R. Stat. Soc. B 42 1-25
---1981a A relative offset orthogonality convergence criterion for nonlinear least squares Technometrics
23 179-83
---1988 Nonlinear Least Squares (New York: Wiley)
BAUER F L and REINSCH C 1971 Inversion of positive definite matrices by the Gauss-Jordan method in
linear algebra Handbook for Automatic Computation vol 2, eds J H Wilkinson and C Reinsch (Berlin:
Springer) contribution l/3 (1971)
BEALE E M L 1972 A derivation of conjugate gradients Numerical Methods for Nonlinear Optimization ed.
F A Lootsma (London: Academic)
BELSLEY D A, KUH E and WELSCH R E 1980 Regression Diagnostics: Identifying Influential Data and
Sources of Collinearity (New York/Toronto: Wiley)
BIGGS M C 1975 Some recent matrix updating methods for minimising sums of squared terms Hatfield
Polytechnic, Numerical Optimization Centre, Technical Report 67
BOOKER T H 1985 Singular value decomposition using a Jacobi algorithm with an unbounded angle of
rotation PhD Thesis (Washington, DC: The American University)
BOWDLER H J, MARTIN R S, PETERS G and WILKINSON J H 1966 Solution of real and complex systems of
linear equations Numer. Math. 8 217-34; also in Linear Algebra, Handbook for Automatic Computation
vol 2, eds J H Wilkinson and C Reinsch (Berlin: Springer) contribution l/7 (1971)
BOX G E P 1957 Evolutionary operation: a method for increasing industrial productivity Appl. Stat. 6
81-101
BOX M J 1965 A new method of constrained optimization and a comparison with other methods Comput.
J. 8 42-52
BOX M J, DAVIES D and SWANN W H 1971 Techniques d'optimisation non linéaire, Monographie No 5 (Paris:
Entreprise Moderne D Edition) Original English edition (London: Oliver and Boyd)
BRADBURY W W and FLETCHER R 1966 New iterative methods for solution of the eigenproblem Numer.
Math. 9 259-67
BREMMERMANN H 1970 A method of unconstrained global optimization Math. Biosci. 9 1-15
BRENT R P 1973 Algorithms for Minimization Without Derivatives (Englewood Cliffs, NJ: Prentice-Hall)
BROWN K M and GEARHART W B 1971 Deflation techniques for the calculation of further solutions of
nonlinear systems Numer. Math. 16 334-42
BROYDEN C G 1970a The convergence of a class of double-rank minimization algorithms, pt 1 J. Inst.
Maths Applies 6 76-90
---1970b The convergence of a class of double-rank minimization algorithms, pt 2 J. Inst. Maths Applies
6 222-31
---1972 Quasi-Newton methods Numerical methods for Unconstrained Optimization ed. W Murray
(London: Academic) pp 87-106
BUNCH J R and NEILSEN C P 1978 Updating the singular value decomposition Numerische Mathematik 31
111-28
BUNCH J R and ROSE D J (eds) 1976 Sparse Matrix Computation (New York: Academic)
BUSINGER P A 1970 Updating a singular value decomposition (ALGOL programming contribution, No 26)
BIT 10 376-85
CACEI M S and CACHERIS W P 1984 Fitting curves to data (the Simplex algorithm is the answer) Byte 9
340-62
CAUCHY A 1848 Méthode générale pour la résolution des systèmes d'équations simultanées C. R. Acad.
Sci., Paris 27 536-8
CHAMBERS J M 1969 A computer system for fitting models to data Appl. Stat. 18 249-63
---1971 Regression updating J. Am. Stat. Assoc. 66 744-8
---1973 Fitting nonlinear models: numerical techniques Biometrika 60 1-13
CHARTRES B A 1962 Adaptation of the Jacobi methods for a computer with magnetic tape backing store
Comput. J. 5 51-60
CODY W J and WAITE W 1980 Software Manual for the Elementary Functions (Englewood Cliffs. NJ:
Prentice Hall)
CONN A R 1985 Nonlinear programming. exact penalty functions and projection techniques for nonsmooth functions Boggs, Byrd and Schnabel pp 3-25
COONEN J T 1984 Contributions to a proposed standard for binary floating-point arithmetic PhD
Dissertation University of California, Berkeley
CRAIG R J and EVANS J W c. 1980A comparison of Nelder-Mead type simplex search procedures Technical
Report No 146 (Lexington, KY: Dept of Statistics, Univ. of Kentucky)
CRAIG R J, EVANS J W and ALLEN D M 1980 The simplex-search in non-linear estimation Technical Report
No 155 (Lexington, KY: Dept of Statistics. Univ. of Kentucky)
CURRY H B 1944 The method of steepest descent for non-linear minimization problems Q. Appl. Math. 2
258-61
DAHLQUIST G and BJÖRCK Å 1974 Numerical Methods (translated by N Anderson) (Englewood Cliffs, NJ:
Prentice-Hall)
DANTZIG G B 1979 Comments on Khachian's algorithm for linear programming Technical Report No
SOL 79-22 (Stanford, CA: Systems Optimization Laboratory, Stanford Univ.)
DAVIDON W C 1959 Variable metric method for minimization Physics and Mathematics, AEC Research
and Development Report No ANL-5990 (Lemont, IL: Argonne National Laboratory)
---1976 New least-square algorithms J. Optim. Theory Applic. 18 187-97
---1977 Fast least squares algorithms Am. J. Phys. 45 260-2
DEMBO R S, EISENSTAT S C and STEIHAUG T 1982 Inexact Newton methods SIAM J. Numer. Anal. 19
400-8
DEMBO R S and STEIHAUG T 1983 Truncated-Newton algorithms for large-scale unconstrained optimization Math. Prog. 26 190-212
DENNIS J E Jr, GAY D M and WELSCH R E 1981 An adaptive nonlinear least-squares algorithm ACM
Trans. Math. Softw. 7 348-68
DENNIS J E Jr and SCHNABEL R 1983 Numerical Methods far Unconstrained Optimization and Nonlinear
Equations (Englewood Cliffs, NJ: Prentice-Hall)
DIXON L C W 1972 Nonlinear Optimisation (London: The English Universities Press)
DIXON L C W and SZEGÖ G P (eds) 1975 Toward Global Optimization (Amsterdam/Oxford: North-Holland and New York: American Elsevier)
---(eds) 1978 Toward Global Optimization 2 (Amsterdam/Oxford: North-Holland and New York:
American Elsevier)
DONALDSON J R and SCHNABEL R B 1987 Computational experience with confidence regions and
confidence intervals for nonlinear least squares Technometrics 29 67-82
DONGARRA and GROSSE 1987 Distribution of software by electronic mail Commun. ACM 30 403-7
DRAPER N R and SMITH H 1981 Applied Regression Analysis 2nd edn (New York/Toronto: Wiley)
EASON E D and F ENTON R G 1972 Testing and evaluation of numerical methods for design
optimization Report No lJTME-TP7204 (Toronto, Ont.: Dept of Mechanical Engineering, Univ. of
Toronto)
---1973 A comparison of numerical optimization methods for engineering design Trans. ASME J. Eng.
Ind. paper 73-DET-17, pp l-5
LOOTSMA F A (ed.) 1972 Numerical Methods for Non-Linear Optimization (London/New York: Academic)
MAINDONALD J H 1984 Statistical Computation (New York: Wiley)
MALCOLM M A 1972 Algorithms to reveal properties of floating-point arithmetic Commun. ACM 15
949-51
MARQUARDT D W 1963 An algorithm for least-squares estimation of nonlinear parameters J. SIAM 11
431-41
---1970 Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation
Technometrics 12 59l-612
MCKEOWN J J 1973 A comparison of methods for solving nonlinear parameter estimation problems
Identification & System Parameter Estimation, Proc. 3rd IFAC Symp. ed. P Eykhoff (The Hague: Delft)
pp 12-15
-- 1974 Specialised versus general purpose algorithms for minimising functions that are sums of squared
terms Hatfield Polytechnic, Numerical Optimization Centre Technical Report No 50, Issue 2
MEYER R R and ROTH P M 1972 Modified damped least squares: an algorithm for non-linear estimation J.
Inst. Math. Applic. 9 218-33
MOLER C M and VAN LOAN C F 1978 Nineteen dubious ways to compute the exponential of a matrix
SIAM Rev. 20 801-36
MORÉ J J, GARBOW B S and HILLSTROM K E 1981 Testing unconstrained optimization software ACM
Trans. Math. Softw. 7 17-41
MOSTOW G D and SAMPSON J H 1969 Linear Algebra (New York: McGraw-Hill)
MURRAY W (ed.) 1972 Numerical Methods for Unconstrained Optimization (London: Academic)
NASH J C 1974 The Hermitian matrix eigenproblem HX=eSx using compact array storage Comput.
Phys. Commun. 8 85-94
---1975 A one-sided transformation method for the singular value decomposition and algebraic
eigenproblem Comput. J. 18 74-6
---1976 An Annotated Bibliography on Methods for Nonlinear Least Squares Problems Including Test
Problems (microfiche) (Ottawa: Nash Information Services)
---1977 Minimizing a nonlinear sum of squares function on a small computer J. Inst. Maths Applics 19
231-7
---1979a Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation
(Bristol: Hilger and New York: Halsted)
---1979b Accuracy of least squares computer programs: another reminder: comment Am. J. Ag. Econ.
61 703-9
---1980 Problmes mathmatiques soulevs par les modles conomiques Can. J. Ag. Econ. 28 51-7
---1981 Nonlinear estimation using a microcomputer Computer Science and Statistics: Proceedings of
the 13th Symposium on the Interface ed. W F Eddy (New York: Springer) pp 363-6
---1984a Effective Scientific Problem Solving with Small Computers (Reston, VA: Reston Publishing) (all
rights now held by J C Nash)
---1984b LEQB05: User Guide - A Very Small Linear Algorithm Package (Ottawa, Ont.: Nash
Information Services Inc.)
---1985 Design and implementation of a very small linear algebra program package Commun. ACM 28
89-94
---1986a Review: IMSL MATH/PC-LIBRARY Am. Stat. 40 301-3
---1986b Review: IMSL STAT/PC-LIBRARY Am. Stat. 40 303-6
---1986c Microcomputers, standards, and engineering calculations Proc. 5th Canadian Conf. Engineering Education, Univ. of Western Ontario, May 12-13, 1986 pp 302-16
NASH J C and LEFKOVITCH L P 1976 Principal components and regression by singular value decomposition
on a small computer Appl. Stat. 25 210-16
---1977 Programs for Sequentially Updated Principal Components and Regression by Singular Value
Decomposition (Ottawa: Nash Information Services)
NASH J C and NASH S G 1977 Conjugate gradient methods for solving algebraic eigenproblems Proc.
Symp. Minicomputers and Large Scale Computation, Montreal ed. P Lykos (New York: American
Chemical Society) pp 24-32
---1988 Compact algorithms for function minimisation Asia-Pacific J. Op. Res. 5 173-92
NASH J C and SHLIEN S 1987 Simple algorithms for the partial singular value decomposition Comput. J. 30
268-75
NASH J C and TEETER N J 1975 Building models: an example from the Canadian dairy industry Can. Farm.
Econ. 10 17-24
NASH J C and WALKER-SMITH M 1986 Using compact and portable function minimization codes in
forecasting applications INFOR 24 158-68
-- 1987 Nonlinear Parameter Estimation, an Integrated System in Basic (New York: Marcel Dekker)
NASH J C and WANG R L C 1986 Algorithm 645 Subroutines for testing programs that compute the
generalized inverse of a matrix ACM Trans. Math. Softw. 12 274-7
N ASH S G 1982 Truncated-Newton methods Report No STAN-CS-82-906 (Stanford, CA: Dept of
Computer Science, Stanford Univ.)
---1983 Truncated-Newton methods for large-scale function minimization Applications of Nonlinear
Programming to Optimization and Control ed. H E Rauch (Oxford: Pergamon) pp 91-100
---1984 Newton-type minimization via the Lanczos method SIAM J. Numer. Anal. 21 770-88
---1985a Preconditioning of truncated-Newton methods SIAM J. Sci. Stat. Comp. 6 599-616
---1985b Solving nonlinear programming problems using truncated-Newton techniques Boggs, Byrd
and Schnabel pp 119-36
NASH S G and RUST B 1986 Regression problems with bounded residuals Technical Report No 478
(Baltimore, MD: Dept of Mathematical Sciences, The Johns Hopkins University)
NELDER J A and MEAD R 1965 A simplex method for function minimization Comput. J. 7 308-13
NEWING R A and CUNNINGHAM J 1967 Quantum Mechanics (Edinburgh: Oliver and Boyd)
OLIVER F R 1964 Methods of estimating the logistic growth function Appl. Stat. 13 57-66
---1966 Aspects of maximum likelihood estimation of the logistic growth function JASA 61 697-705
OLSSON D M and NELSON L S 1975 The Nelder-Mead simplex procedure for function minimization
Technometrics 17 45-51; Letters to the Editor 3934
ONEILL R 1971 Algorithm AS 47: function minimization using a simplex procedure Appl. Stat. 20 338-45
OSBORNE M R 1972 Some aspects of nonlinear least squares calculations Numerical Methods for Nonlinear
Optimization ed. F A Lootsma (London: Academic) pp 171-89
PAIGE C C and SAUNDERS M A 1975 Solution of sparse indefinite systems of linear equations SIAM J.
Numer. Anal. 12 617-29
PAULING L and WILSON E B 1935 Introduction to Quantum Mechanics with Applications to Chemistry (New
York: McGraw-Hill)
PENROSE R 1955 A generalized inverse for matrices Proc. Camb. Phil. Soc. 51 406-13
PERRY A and SOLAND R M 1975 Optimal operation of a public lottery Mgmt. Sci. 22 461-9
PETERS G and WILKINSON J H 1971 The calculation of specified eigenvectors by inverse iteration Linear
Algebra, Handbook for Automatic Computation vol 2, eds J H Wilkinson and C Reinsch (Berlm:
Springer) pp 418-39
---1975 On the stability of Gauss-Jordan elimination with pivoting Commun. ACM 18 20-4
PIERCE B O and FOSTER R M 1956 A Short Table of Integrals 4th edn (New York: Blaisdell)
POLAK E and RIBIERE G 1969 Note sur la convergence de mthodes de directions conjuges Rev. Fr. Inf.
Rech. Oper. 3 35-43
POWELL M J D 1962 An iterative method for stationary values of a function of several variables Comput. J.
5 147 51
---1964 An efficient method for finding the minimum of a function of several variables without
calculating derivatives Comput. J. 7 155-62
---1975a Some convergence properties of the conjugate gradient method CSS Report No 23 (Harwell,
UK: Computer Science and Systems Division, Atomic Energy Research Establishment)
----1975b Restart procedures for the conjugate gradient method CSS Report No 24 (Harwell, UK:
Computer Science and Systems Division, Atomic Energy Research Establishment)
---1981 Nonlinear Optimization (London: Academic)
PRESS W H, FLANNERY B P, TEUKOLSKY S A and VETTERLING W T (1986/88) Numerical Recipes (in
Fortran/Pascal/C), the Art of Scientific Computing (Cambridge, UK: Cambridge University Press)
RALSTON A 1965 A First Course in Numerical Analysis (New York: McGraw-Hill)
RATKOWSKY D A 1983 Nonlinear Regression Modelling (New York: Marcel-Dekker)
REID J K 1971 Large Sparse Sets of Linear Equations (London: Academic)
RHEINBOLDT W C 1974 Methods for Solving Systems of Nonlinear Equations (Philadelphia: SIAM)
RICE J 1983 Numerical Methods Software and Analysis (New York: McGraw-Hill)
RILEY D D 1988 Structured programming: sixteen years later J. Pascal, Ada and Modula-2 7 42-8
ROSENBROCK H H 1960 An automatic method for finding the greatest or least value of a function Comput.
J. 3 175-84
ROSS G J S 1971 The efficient use of function minimization in non-linear maximum-likelihood estimation
Appl. Stat. 19 205-21
---1975 Simple non-linear modelling for the general user Warsaw: 40th Session of the International
Statistical Institute 1-9 September 1975, ISI/BS Invited Paper 81 pp 1-8
RUHE A and WEDIN P-A 1980 Algorithms for separable nonlinear least squares problems SIAM Rev. 22
318-36
RUHE A and WIBERG T 1972 The method of conjugate gradients used in inverse iteration BIT 12 543-54
RUTISHAUSER H 1966 The Jacobi method for real symmetric matrices Numer. Math. 9 1-10; also in Linear
Algebra, Handbook for Automatic Computation vol 2, eds J H Wilkinson and C Reinsch (Berlin:
Springer) pp 202-11 (1971)
SARGENT R W H and SEBASTIAN D J 1972 Numerical experience with algorithms for unconstrained
minimisation Numerical Methods for Nonlinear Optimization ed. F A Lootsma (London: Academic) pp
445-68
SCHNABEL R B, KOONTZ J E and WEISS B E 1985 A modular system of algorithms for unconstrained
minimization ACM Trans. Math. Softw. 11 419-40
SCHWARZ H R, R UTISHAUSER H and S TIEFEL E 1973 Numerical Analysis of Symmetric Matrices
(Englewood Cliffs, NJ: Prentice- Hall)
SEARLE S R 1971 Linear Models (New York: Wiley)
SHANNO D F 1970 Conditioning of quasi-Newton methods for function minimization Math. Comput. 24
647-56
SHEARER J M and WOLFE M A 1985 Alglib, a simple symbol-manipulation package Commun. ACM 28
820-5
SMITH F R Jr and SHANNO D F 1971 An improved Marquardt procedure for nonlinear regressions
Technometrics 13 63-74
SORENSON H W 1969 Comparison of some conjugate direction procedures for function minimization J.
Franklin Inst. 288 421-41
SPANG H A 1962 A review of minimization techniques for nonlinear functions SIAM Rev. 4 343-65
SPENDLEY W 1969 Nonlinear least squares fitting using a modified Simplex minimization method Fletcher
pp 259-70
SPENDLEY W, HEXT G R and HIMSWORTH F R 1962 Sequential application of simplex designs in
optimization and evolutionary operation Technometric. 4 441-61
STEWART G W 1973 Introduction to Matrix Computations (New York: Academic)
---1976 A bibliographical tour of the large, sparse generalized eigenvalue problem Sparse Matrix
Computations eds J R Bunch and D J Rose (New York: Academic) pp 113-30
---1987 Collinearity and least squares regression Stat. Sci. 2 68-100
STRANG G 1976 Linear Algebra and its Applications (New York: Academic)
SWANN W H 1974 Direct search methods Numerical Methods for Unconstrained Optimization ed. W
Murray (London/New York: Academic)
SYNGE J L and GRIFFITH B A 1959 Principles of Mechanics 3rd edn (New York: McGraw-Hill)
TOINT PH L 1987 On large scale nonlinear least squares calculations SIAM J. Sci. Stat. Comput. 8 416-35
VARGA R S 1962 Matrix Iterative Analysis (Englewood Cliffs, NJ: Prentice-Hall)
WILKINSON J H 1961 Error analysis of direct methods of matrix inversion J. ACM 8 281-330
---1963 Rounding Errors in Algebraic Processes (London: HMSO)
---1965 The Algebraic Eigenvalue Problem (Oxford: Clarendon)
WILKINSON J H and REINSCH C (eds) 1971 Linear Algebra, Handbook for Automatic Computation vol 2
(Berlin: Springer)
WOLFE M A 1978 Numerical Methods for Unconstrained Optimization, an Introduction (Wokingham, MA:
Van Nostrand-Reinhold)
YOURDON E 1975 Techniques of Program Structure and Design (Englewood Cliffs, NJ: Prentice-Hall)
ZAMBARDINO R A 1974 Solutions of systems of linear equations with partial pivoting and reduced storage
requirements Comput. J. 17 377-8
INDEX
Abramowitz, M., 4
Absolute value, 17
Acton, F. S., 104, 146, 162
Actuarial calculations, 165
Addition of observations in least-squares, 64
Algebraic eigenvalue problem. 234
ALGOL,13,80,83
ALGOL-60,80
ALGOL-68, 80
Algorithms,
informal definition of. 1
choice of, 13
expression of, 15
list of, 255
Alternative implementation of singular-value
decomposition. 38
Alternative optima, 230
Analytic expression for derivatives, 218. 223
Anharmonic oscillator. 138
Annihilator of vector. 26
APL, 12
Argonne National Laboratory, 10
Arithmetic.
machine, 6
operations, 5
Autocorrelation, 180
Axial search, 171, 178
Bisection, 161
for matrix eigenvalues, 133
Björck, Å., 70, 75, 80, 81, 197
Bordered matrix. 253
Boundary-value problem, 20
Bowdler, H. J.. 80
Bradbury, W. W., 244
Bremmerman. H.. 147
Brent. R. P., 154, 185
Brown, K. M.. 146.232
Broyden. C. G.. 190
Businger. P. A.. 63
c (programming language), 11
Campey. 182
Cancellation of digits. 55
Cauchy. A., 186,208
Celestial mechanics, 13 1
Centroid of points. 168
function value at. 172
Chambers, J. M., 63
Chartres, B. A.. 33, 134
Choice.
in extended Choleski decomposition, 88
of algorithms, 13
of algorithms or programs, 14
Choleski back-solution, 212
Choleski decomposition, 13, 27, 84, 136, 212, 253
extension of. 86
Chopping (truncation), 7
Cobb-Douglas production function, 144
Coefficient matrix, 19, 72
Collinearity, 30, 45
Column permutations, 75
comeig (ALGOL procedure), 110
Compactness of programs, 12
Comparison of function minimisation
algorithms, 218, 226
Compiler for a computer programming language,
91
Complete matrix eigenvalue problem, 119, 135
Complex arithmetic, 83
Complex matrix,
eigensolutions of, 110
Complex systems of linear equations, 82
Components,
principal, 40, 46
Computability of a function, 153
Computations,
statistical, 66
Computer,
small, 3
Conjugacy of search directions, 186, 188, 197,
244, 245
Conjugate gradients, 153, 186, 197, 223, 228, 232,
233
in linear algebra, 234
Constrained optimisation, 3, 218, 221
Constraints, 143
equality, 221
independent, 221
inequality, 221
Contraction of simplex, 168, 170
Convergence,
criteria for, 5, 15
of inverse iteration, 105
of Nelder-Mead search, 180
of power method, 103
Convergence test, 159, 171, 180, 242
for inverse iteration, 108
Convex function, 208
Corrected R2 statistic, 45
Cost of computations, 1, 3
Cox, M., 133
Cross-products matrix, 49, 66
Crout method, 75, 80
for complex equations, 83
Cubic interpolation, 151
Cubic inverse interpolation, 159
Cubic-parabola problem, 232
Cunningham, J., 138, 141
Cycle or sweep, 35, 49
Cyclic Jacobi algorithm, 127
Cyclic re-ordering, 98
Dahlquist, G., 70, 75, 80, 81, 197
Data General computers, see NOVA or
ECLIPSE
Data points, 142
Davies, 182
Davies, Swann and Campey method, 182
Decomposition,
Choleski, 27
of a matrix, 26, 49
Definiteness of a matrix, 22
Degenerate eigenvalues, 120, 125
Degrees of freedom, 46
Equations,
linear, 19, 20, 51
Equilibration of matrix, 80
Equivalent function evaluations (efes), 227
Euclidean norm, 22
Examples,
list of, 256
Execution time, 227
Expenditure minimisation, 156
Exponents of decimal numbers, 17
Expression of algorithms, 15
Extended precision, 14
Extension of simplex, 168, 169, 172
Extrapolation, 151
False Position, 161
Fenton, R. G., 182
Financial Times index, 77
Finkbeiner, D. T., 87
Fletcher, R., 190, 192, 198, 199, 215, 228, 244
Fletcher-Reeves formula, 199
FMIN linear search program, 153
Ford, B., 135
Formulae,
Gauss-Jordan, 98
Forsythe, G. E., 127, 153
FORTRAN, 10, 56, 63
Forward difference, 219
Forward-substitution, 86, 136
Foster, R. M., 139
Frank matrix, 250, 253
Fried, I., 246
Fröberg, C., 21, 127, 238, 251
Full-rank case, 23, 66
Function evaluation count, 157, 164, 209, 217,
227, 232
Function minimisation, 142, 207
Functions,
penalty, 222
Galle, 131
Gauss elimination, 72, 79, 82, 93
for inverse iteration, 105, 109
variations, 80
with partial pivoting, 75
Gauss-Jordan reduction, 82, 93
Gauss-Newton method, 209, 211, 228
Gearhart, W. B., 146, 232
Generalised eigenvalue problem, 135, 234, 242
Generalised inverse, 44, 66
2 and 4 condition, 26
of a matrix, 24
Generalised matrix eigenvalue problem, 28, 104
Gentleman, W. M., 50
Geradin, M., 244, 246
Gerschgorin bound, 136
Gerschgorin's theorem, 121
Gill, P. E., 221, 225
Givens reduction, 15, 49, 51, 63, 83
and singular-value decomposition,
implementation, 54
for inverse iteration, 105, 109
of a real rectangular matrix, 51
operation of, 52
singular-value decomposition and least-squares
solution, 56
Givens tridiagonalisation, 133
Global minimum, 146
Golub, G. H., 56
GOTO instructions, 12
Gradient, 186, 188, 197, 208, 226
computed, 226
of nonlinear sum of squares, 209
of Rayleigh quotient, 245
Gradient calculation in conjugate gradients for
linear equations, 235
Gradient components,
large computed values of, 206
Gram-Schmidt orthogonalisation, 197
Gregory, R. T., 117
Grid search, 149, 156, 160
Griffith, B. A., 125
Guard digits, 7
Hall, G., 135
Hamiltonian operator, 28, 138
Hammarling, S., 50
Hanson, R. J., 64
Hartley, H. O., 210, 211
Harwell subroutine library, 215
Hassan, Z., 223
Healy, M. J. R., 88, 90
Heaviside function, 222
Hemstitching of function minimisation method,
186, 208
Henderson, B., 153
Henrici, P., 127, 162
Hermitian matrix, 137
Hessian, 189, 197, 231
for Rayleigh quotient, 244
matrix, 187
Hestenes, M. R., 33, 134, 235, 241
Heuristic method, 168, 171
Hewlett-Packard,
computers, see HP9830
pocket calculators, 5
Hilbert segment, 108, 253
Hillstrom, K. E., 227
Loss of information in least-squares
computations, 23, 67
Lottery,
optimal operation of, 144, 228
LU decomposition, 74
Machine arithmetic, 6
Machine precision, 6, 46, 70, 105, 219
Magnetic roots, 232
Magnetic zeros, 147
Malcolm, M. A., 6
Mantissa, 6
Market equilibrium,
nonlinear equations, 231
Marquardt, D. W., 211, 212
Marquardt algorithm, 209, 223, 228, 232, 233
Mass-spectrograph calibration, 20
Mathematical programming, 3, 13
Mathematical software, 11
Matrix, 19
coefficient, 20, 23
complex, 110
cross-products, 66
dense, 20, 23
diagonal, 26, 31
elementary, 73
Frank, 100
generalised inverse of, 24
Hermitian, 110
inverse, 24, 95
Moler, 100
non-negative definite, 22, 86
non-symmetric, 110
null, 52
orthogonal, 26, 31, 50
positive definite, 22
rank of, 20
real symmetric, 31, 119
rectangular, 24, 44
semidefinite, 22
singular, 20
sparse, 20, 21, 23
special, 83
symmetric, 23, 28
symmetric positive definite, 83, 84, 93
triangular, 26, 50, 52, 72, 74
unit, 29, 32
unitary, 27
Matrix decomposition,
triangular, 74
Matrix eigenvalue problem, 28, 135
generalised, 104, 148
Matrix eigenvalues for polynomial roots, 148
Matrix form of linear equations, 19
Matrix inverse for linear equations, 24
Plauger, P. J., 12
Plot or graph of function, 151
Polak, E., 198, 199
Polak-Ribiere formula, 199
Polynomial roots, 143, 145
Positive definite iteration matrix, 192
Positive definite matrix, 22, 120, 188, 197, 211,
235, 241, 243
Positive definite symmetric matrix, 83
inverse of, 24
Powell, M. J. D., 185, 199
Power method for dominant matrix
eigensolution, 102
Precision,
double, 9, 14
extended, 9, 14
machine, 5, 46, 70
Price, K., 90
Principal axes of a cube, 125
Principal components, 41, 46
Principal moments of inertia, 125
Product of triangular matrices, 74
Program,
choice, 14
coding, 14
compactness, 12
maintenance, 14
readability, 12
reliability, 14
testing, 14
Programming,
mathematical, 13
structured, 12
Programming language, 11, 15
Programs,
manufacturers, 9
sources of, 9
Pseudo-random numbers, 147, 166, 240
QR algorithm, 133
QR decomposition, 26, 49, 50, 64
Quadratic equation, 85, 244
Quadratic form, 22, 89, 190, 198, 235
Quadratic or parabolic approximation, 151
Quadratic termination, 188, 199, 236
Quantum mechanics, 28
Quasi-Newton methods, 187
R2 statistic, 45, 63
Radix, 7
Ralston, A., 95, 104, 121, 127, 218
Rank, 20
Rank-deficient case, 24, 25, 55
Rayleigh quotient, 122, 123, 138, 200, 234, 242,
244
minimisation, 250
minimisation by conjugate gradients, 243
Rayleigh-Ritz method, 138
Readability of programs, 12
Real symmetric matrix, 119
Reconciliation of published statistics, 204
Recurrence relation, 166, 198, 235, 246
Reduction,
of simplex, 168, 170
to tridiagonal form, 133
Reeves, C. M., 198, 199
References, 263
Reflection of simplex, 168, 169, 172
Regression, 92
stepwise, 96
Reid, J. K., 234
Reinsch, C., 13, 83, 97, 102, 110, 133, 137, 251
Reliability, 14
Re-numeration, 98
Re-ordering, 99
Residual, 21, 45, 250
uncorrelated, 56, 70
weighted, 24
Residuals, 142, 144, 207
for complex eigensolutions, 117
for eigensolutions, 125, 128
sign of, 142, 207
Residual sum of squares, 55, 79
computation of, 43
Residual vector, 242
Restart,
of conjugate gradients for linear equations, 236
of conjugate gradients minimisation, 199
of Nelder-Mead search, 171
Ribiere, G., 198, 199
Ris, F. N., 253
Root-finding, 143, 145, 148, 159, 160, 239
Roots,
of equations, 142
of quadratic equation, 245
Rosenbrock, H. H., 151, 182, 196, 208, 209
Rounding, 7
Row,
orthogonalisation, 49, 54
permutations, 75
Ruhe, A., 234, 242
Rutishauser, H., 127, 134
Truncation, 7
Two-point boundary value problem, 238
Unconstrained minimisation, 142
Uncorrelated residuals, 56, 70
Uniform distribution, 167
Unimodal function, 149
Unit matrix, 29
Univac 1108, 56, 120
Updating,
formula, 190
of approximate Hessian, 189, 192
V-shaped triple of points, 152
Values,
singular, see Singular values
Varga, R. S., 83
Variable metric,
algorithms, 198
methods, 186, 187, 223, 228, 233
Variables, 142
Variance computation in floating-point
arithmetic, 67
Variance of results from known values, 241
Variation method, 28
Vector, 19, 30
null, 20, 32
residual, 21
Weighting,
for nonlinear least-squares, 207
of constraints, 222
in index numbers, 77
Wiberg, T., 242
Wilkinson, J. H., 13, 28, 75, 83, 86, 97, 102, 105,
110, 119, 127, 133, 137, 251, 253, 254
W+ matrix, 254
W- matrix, 108, 254
Wilson, E. B., 28
Yourdon, E., 12
Zambardino, R. A., 13