A First Course in Computational Physics

Paul L. DeVries
Miami University
Oxford, Ohio

JOHN WILEY & SONS, INC.
NEW YORK . CHICHESTER . BRISBANE . TORONTO . SINGAPORE
ACQUISITIONS EDITOR: Cliff Mills
MARKETING MANAGER: Catherine Faduska
PRODUCTION EDITOR: Sandra Russell
MANUFACTURING MANAGER: Inez Pettis
This book was set in Century Schoolbook by the author and printed
and bound by Hamilton Printing Company. The cover was printed
by New England Book Components, Inc.
Recognizing the importance of preserving what has been written, it is a
policy of John Wiley & Sons, Inc. to have books of enduring value published
in the United States printed on acid-free paper, and we exert our best
efforts to that end.
Copyright © 1994 by John Wiley & Sons, Inc.
All rights reserved. Published simultaneously in Canada.
Reproduction or translation of any part of
this work beyond that permitted by Sections
107 and 108 of the 1976 United States Copyright
Act without the permission of the copyright
owner is unlawful. Requests for permission
or further information should be addressed to
the Permissions Department, John Wiley & Sons, Inc.
Library of Congress Cataloging in Publication Data:
DeVries, Paul L., 1948-
A first course in computational physics / Paul L. DeVries.
p. cm.
Includes index.
ISBN 0-471-54869-3
1. Physics - Data processing. 2. FORTRAN (Computer
program language). 3. Mathematical Physics. I. Title
QC52.D48 1993
530'.0285'5133 - dc20 93-15694
CIP
Printed in the United States of America
10 9 8 7 6 5 4 3 2
To Judi, Brandon, and Janna.
Preface
Computers have changed the way physics is done, but those changes are only
slowly making their way into the typical physics curriculum. This textbook is
designed to help speed that transition.
Computational physics is now widely accepted as a third, equally valid
complement to the traditional experimental and theoretical approaches to
physics. It clearly relies upon areas that lie some distance from the traditional
physics curriculum, however. In this text, I attempt to provide a reasonably
thorough, numerically sound foundation for its development. However, I have
not attempted to be rigorous; this is not meant to be a text on numerical anal-
ysis. Likewise, this is not a programming manual: I assume that the student
is already familiar with the elements of the computer language, and is ready
to apply it to the task of scientific computing.
The FORTRAN language is used throughout this text. It is widely
available, continually updated, and remains the most commonly used pro-
gramming language in science. Recent FORTRAN compilers written by Mi-
crosoft provide access to many graphics routines, enabling students to gen-
erate simple figures from within their FORTRAN programs running on PC-
compatible microcomputers. Such graphics capabilities greatly enhance the
value of the computer experience.
The various chapters of the text discuss different types of computa-
tional problems, with exercises developed around problems of physical inter-
est. Topics such as root finding, Newton-Cotes integration, and ordinary dif-
ferential equations are included and presented in the context of physics prob-
lems. These are supplemented by discussions of topics such as orthogonal
polynomials and Monte Carlo integration, and a chapter on partial differen-
tial equations. A few topics rarely seen at this level, such as computerized to-
mography, are also included. Within each chapter, the student is led from rela-
tively elementary problems and simple numerical approaches through deriva-
tions of more complex and sophisticated methods, often culminating in the
solution to problems of significant difficulty. The goal is to demonstrate how
numerical methods are used to solve the problems that physicists face. The
text introduces the student to the process of approaching problems from a
computational point of view: understanding the physics and describing it in
mathematical terms, manipulating the mathematics to the point where a nu-
merical method can be applied, obtaining a numerical solution, and under-
standing the physical problem in terms of the numerical solution that's been
generated.
The material is intended for the student who has successfully com-
pleted a typical year-long course in university physics. Many of the topics
covered would normally be presented at a later stage, but the computational
approach enables the student to apply more than his or her own analytic tools
to the problem. It is unlikely, however, that any but the most serious of stu-
dents will complete the text in one semester. There is an abundance of mate-
rial, so that the instructor can choose topics that best meet the needs of the
students.
A First Course in Computational Physics is the result of several years
of teaching. Initially, handwritten notes to accompany my lectures were dis-
tributed, and as more material was compiled and old notes rewritten, an out-
line of the text developed. The text has been thoroughly tested, and, needless
to say, I want to thank all my students for their involvement in the project. In
particular, I am grateful to Jane Scipione, Chris Sweeney, Heather Frase, Don
Crandall, Jeff Kleinfeld, and Augusto Catalan for their numerous contribu-
tions. I also want to thank several of my colleagues, including Larry Downes,
Comer Duncan, Ian Gatland, Donald Kelly, Philip Macklin, and Donald Shirer,
for reading and commenting on the manuscript. Any errors that remain are,
of course, my own doing. The author invites any and all comments, correc-
tions, additions, and suggestions for improving the text.
Paul L. DeVries
Oxford, Ohio
January 1993
Contents

Chapter 1: Introduction
FORTRAN - the Computer Language of Science
Getting Started
Running the Program
A Numerical Example
Code Fragments
A Brief Guide to Good Programming
Debugging and Testing
A Cautionary Note
Elementary Computer Graphics
And in Living Color!
Classic Curves
Monster Curves
The Mandelbrot Set
References

Chapter 2: Functions and Roots
Finding the Roots of a Function
Mr. Taylor's Series
The Newton-Raphson Method
Fools Rush In ...
Rates of Convergence
Exhaustive Searching
Look, Ma, No Derivatives!
Accelerating the Rate of Convergence
A Little Quantum Mechanics Problem
Computing Strategy
References

Chapter 3: Interpolation and Approximation
Lagrange Interpolation
The Airy Pattern
Hermite Interpolation
Cubic Splines
Tridiagonal Linear Systems
Cubic Spline Interpolation
Approximation of Derivatives
Richardson Extrapolation
Curve Fitting by Least Squares
Gaussian Elimination
General Least Squares Fitting
Least Squares and Orthogonal Polynomials
Nonlinear Least Squares
References

Chapter 4: Numerical Integration
Anaxagoras of Clazomenae
Primitive Integration Formulas
Composite Formulas
Errors... and Corrections
Romberg Integration
Diffraction at a Knife's Edge
A Change of Variables
The "Simple" Pendulum
Improper Integrals
The Mathematical Magic of Gauss
Orthogonal Polynomials
Gaussian Integration
Composite Rules
Gauss-Laguerre Quadrature
Multidimensional Numerical Integration
Other Integration Domains
A Little Physics Problem
More on Orthogonal Polynomials
Monte Carlo Integration
Monte Carlo Simulations
References

Chapter 5: Ordinary Differential Equations
Euler Methods
Constants of the Motion
Runge-Kutta Methods
Adaptive Step Sizes
Runge-Kutta-Fehlberg
Second Order Differential Equations
The Van der Pol Oscillator
Phase Space
The Finite Amplitude Pendulum
The Animated Pendulum
Another Little Quantum Mechanics Problem
Several Dependent Variables
Shoot the Moon
Finite Differences
SOR
Discretization Error
A Vibrating String
Eigenvalues via Finite Differences
The Power Method
Eigenvectors
Finite Elements
An Eigenvalue Problem
References

Chapter 6: Fourier Analysis
The Fourier Series
The Fourier Transform
Properties of the Fourier Transform
Convolution and Correlation
Ranging
The Discrete Fourier Transform
The Fast Fourier Transform
Life in the Fast Lane
Spectrum Analysis
The Duffing Oscillator
Computerized Tomography
References

Chapter 7: Partial Differential Equations
Classes of Partial Differential Equations
The Vibrating String... Again!
Finite Difference Equations
The Steady-State Heat Equation
Isotherms
Irregular Physical Boundaries
Neumann Boundary Conditions
A Magnetic Problem
Boundary Conditions
The Finite Difference Equations
Another Comment on Strategy
Are We There Yet?
Spectral Methods
The Pseudo-Spectral Method
A Sample Problem
The Potential Step
The Well
The Barrier
And There's More...
References

Appendix A: Software Installation
Installing the Software
The FL Command
AUTOEXEC.BAT
README.DOC

Appendix B: Using FCCP.lib
Library User's Guide

Appendix C: Library Internals
Library Technical Reference

Index
Chapter 1:
Introduction
This is a book about physics - or at least, about how to do physics. The sim-
ple truth is that the computer now permeates our society and has changed the
way we think about many things, including science in general and physics
in particular. It used to be that there was theoretical physics, which dealt
with developing and applying theories, often with an emphasis on mathemat-
ics and "rigor." There was experimental physics, which was also concerned
with theories, and testing their validity in the laboratory, but was primarily
concerned with making observations and quantitative measurements. Now
there is also computational physics, in which numerical experiments are per-
formed in the computer laboratory - an interesting marriage of the tradi-
tional approaches to physics. However, just as the traditional theoretician
needs a working knowledge of analytic mathematics, and the traditional ex-
perimentalist needs a working knowledge of electronics, vacuum pumps, and
data acquisition, the computational physicist needs a working knowledge of
numerical analysis and computer programming. Beyond mastering these ba-
sic tools, a physicist must know how to use them to achieve the ultimate goal:
to understand the physical universe.
In this text, we'll discuss the tools of the computational physicist,
from integrating functions and solving differential equations to using the
Fast Fourier Transform to follow the time evolution of a quantum mechani-
cal wavepacket. Our goal is not to turn you into a computational physicist,
but to make you aware of what is involved in computational physics. Even
if you, personally, never do any serious computing, it's necessary for you to
have some idea of what is reasonably possible. Most of you, though, will find
yourselves doing some computing, and you are likely to use some aspects of
what's presented here - canned programs rarely exist for novel, interesting
physics, and so we have to write them ourselves! We hope to provide you with
some useful tools, enough experience to foster intuition, and a bit of insight
into how physics in your generation will be conducted.
In this chapter we'll introduce you to programming and to a philoso-
phy of programming that we've found to be useful. We'll also discuss some of
the details of editing files, and compiling and running FORTRAN programs.
And most importantly, we'll present a brief description of some of the ele-
ments of good programming. This discussion is not intended to transform you
into a computer wizard - simply to help you write clear, reliable programs
in a timely manner. And finally, we will use simple computer graphics to help
solve physics problems. The capability of visualizing a numerical solution as
it's being generated is a tremendous tool in understanding both the solution
and the numerical methods. Although by no means sophisticated, our graph-
ics tools enable the user to produce simple line plots on the computer screen
easily.
FORTRAN - the Computer Language of Science
There's a large variety of computer languages out there that could be (and
are!) used in computational physics: BASIC, C, FORTRAN, and PASCAL all
have their ardent supporters. These languages have supporters because each
has its particular merits, which should not be lightly dismissed. Indeed, many
(if not most) computing professionals know and use more than one program-
ming language, choosing the best language for the job at hand. Generally
speaking, physicists are not "computing professionals." We're not in-
terested in the mechanics of computing except when it directly relates to a
problem we're trying to solve. With this in mind, there's a lot to be said for
using only one language and learning it well rather than using several lan-
guages without being really fluent in any of them.
So that's the argument for a single language. If a need later develops
and you find it necessary to learn a second language, then so be it. But why
should the first language be FORTRAN? If you take a formal programming
course, you will almost certainly be told that FORTRAN is not a good language
to use. However, it's also the language in which the majority of scientific and
engineering calculations are performed. After all, FORTRAN was developed
for FORmula TRANslation, and it's very good at that. In the physical sci-
ences, it's been estimated that 90% of all computer time is spent executing
programs that are written in FORTRAN. So, why FORTRAN? Because it's
the language of science!
While we don't agree with all the criticisms of FORTRAN, we don't
ignore them either - we simply temper them with our own goals and expec-
tations. As an example, we're told that structure in programming leads to
inherently more reliable programs that are easier to read, understand, and
modify. True enough. The value of top-down design, the virtue of strong typ-
ing, and the potential horrors of GOTO's have also been impressed upon us.
We attempt to follow the tenets of good programming style, for the simple
reason that it does indeed help us produce better programs. And while we're
aware that the language is lacking in some areas, such as data structures,
we're also aware of unique advantages of the language, including the ability
to pass function names in a parameter list. On balance, FORTRAN is not
such a bad choice.
Neither is FORTRAN set in stone. In response to advances in hard-
ware and developments in other languages, the American National Standards
Institute (ANSI) established a committee, X3J3, to consider a new FORTRAN
standard. This new standard, now known as FORTRAN 90, incorporates
many of the innovations introduced in other languages, and extends its data
handling and procedural capabilities. While it might be argued that other
languages presently have advantages over FORTRAN, many of these features
will be incorporated into the new standard and, hence, the perceived advan-
tages will evaporate. Since the adoption of the new standard, there has been
discussion of the development of yet another FORTRAN standard, one that
would take advantage of the fundamental changes taking place in computer
hardware with regard to massively parallel computers. In all likelihood, FOR-
TRAN will be around for a long time to come.
This text assumes that you are FORTRAN literate - that you are
aware of the basic statements and constructs of the language. It does not
assume that you are fluent in the language or knowledgeable in scientific pro-
gramming. As far as computer programming is concerned, this text is primar-
ily concerned with how to perform scientific calculations, not how to code a
particular statement.
Getting Started
Before we're through, you'll learn many things about computing, and a few
things about computers. While most of what we have to say will be pertinent
to any computer or computing system, our specific comments and examples
in this text are directed toward microcomputers. Unfortunately, you need
to know quite a bit to even begin. There are operating systems, prompts,
commands, macros, batch files, and other technical details ad nauseam. We
assume that you already know a little FORTRAN, are reasonably intelligent
and motivated, and have access to an IBM-style personal computer equipped
with a Microsoft FORTRAN compiler. We will then introduce you to the tech-
nicalities a little at a time, as they are needed. A consequence of this approach
is that you may never learn all the details, which is just fine for most of you;
we adopt the attitude that you should be required to know as few of these
technical details as possible, but no fewer! If you want to know more, the ref-
erence manuals contain all the dry facts, and your local computer wizard (and
there is always a local wizard) can fill you in on the rest. It has been said that
a journey of a thousand miles begins with a single step, and so it is that we
must start, someplace. Traditionally, the first program that anyone writes is
one that says "hello":
      write(*,*) 'Hello, world'
      end
Now, admittedly this program doesn't do much - but at least it doesn't take
long to do it! There are many different ways of writing output, but the WRITE
statement used here is probably the simplest. The first * informs the com-
puter that it should write to the standard input/output device, which is just
the screen. (In a READ statement, an * would inform the computer to read
from the keyboard.) It is also possible to write to the printer, or to another
file, by substituting a unit number for the *. The second * tells the computer
to write in its own way, which is OK for right now. Later, we will specify a
particular FORMAT in which the information is to be displayed. And what we
want displayed is just the message contained between the two single quote
marks. Numerical data can also be displayed using the *, * construction, by
simply giving the name of the variable to be printed.
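For example, if a double precision variable named energy (a name invented
purely for illustration) holds a value we want to see, the single line

      write(*,*) 'The computed energy is ', energy, ' eV'

displays the label, the value in the computer's default format, and the units,
all on one line.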
To execute this program, several steps must be accomplished, begin-
ning with entering this FORTRAN source code into the computer. Actually,
we enter it into a file, which we choose to call HELLO.FOR. For our own benefit,
file names should be chosen to be as descriptive as possible, but they are lim-
ited to 8 characters. The 3-character extension after the period informs the
computer of the nature of the file, and must be .FOR for a FORTRAN source
file.
There are several ways to create the file; the most generally useful
one is with a text editor of some kind. If you are already familiar with an
editor that produces standard ASCII output, by all means use it. (Most word
processors embed special characters in the output file, which would prevent
the file from being used by the FORTRAN compiler. Fortunately, most of
them can be configured to produce standard ASCII output.) If you don't have
a favorite editor, I strongly suggest that you choose a full-screen editor such as
the one Microsoft distributes with its FORTRAN compiler. Whichever editor
you choose, you need to become familiar with its usage so that program entry
is not an obstacle. Let's assume that your editor of choice is named E. (E is
just a fictitious name for an editor. You'll need to substitute the real name of
your particular editor in all that follows.) To invoke this editor, type:
E HELLO.FOR
This instructs the computer to run the program named E, which is your text
editor, and it tells the program E that you want to work with a file named
HELLO.FOR. Had the file already existed, E would provide you with a copy of it
to modify. Since it doesn't yet exist, E will create it for you.
The editor creates the file, and then displays a mostly blank screen:
the contents of your file. You can now insert text by typing it in, and you can
move around the screen using the arrow keys. Once you have more than a
full screen of text, you can move up or down a page at a time with the PgUp
and PgDn keys, and Home will (usually) take you to the top of the file. Beyond
these rudimentary commands, you need to learn the specific commands for
your particular editor. The very first command you should learn is the specific
one for your editor that allows you to quit the editor and save your work.
Since FORTRAN code must begin in column 7 or greater, you can
use the space bar to position the cursor in the proper column. (With many
editors, the tab key can also be used to move the cursor.) Microcomputers
and modern program editors are pretty smart, so that the editor probably
knows that computer code is often indented - FORTRAN requires the 6-
space indentation, and you will also want to indent portions of the code to aid
in its readability. We'll assume that E knows about indentation. So space over
to column 7, type write(*,*) 'Hello, world', and press the return key. Note
that E moves you down a line and positions the cursor in column 7. You're
ready to type the next line.
Many (if not most) modern compilers, including Microsoft FORTRAN,
are insensitive to case. That is, the compiler makes no distinction between a
lowercase write and an uppercase WRITE. That's a nice feature, since we can
then use lowercase for most of the FORTRAN statements and reserve the up-
percase to denote IMPORTANT or CRUCIAL statements, thus adding to the
readability of the computer code. By the way, the operating system is also case
insensitive, so that we could have typed e hello.for with exactly the same
results. (Note: the "old" FORTRAN standard required all uppercase charac-
ters. The "new" standard allows uppercase and lowercase, but is insensitive
to it.)
Running the Program
There are several steps that must be performed before the FORTRAN source
code you've typed into HELLO.FOR is ready to execute. The first thing is to
convert your code into the ones and zeros that the computer understands;
this is called compiling your program, and is done by (of all things) the com-
piler. The result of the compilation is an object file with the name HELLO.OBJ.
Your code will undoubtedly need other information as well, and so it must be
linked with certain system files that contain that information. For example,
sines and cosines are provided as separate pieces of code that must be com-
bined - linked - with yours if you require those functions. The output of
the linker is an executable file named HELLO.EXE. For our example, this process
is not particularly complicated. However, it will become more complicated as
we look at the exercises in this text that need additional information specified
to both the compiler and the linker. These steps must be performed every
time a change is made in the FORTRAN source code and are virtually iden-
tical for all the programs you will want to run. To enter all the information
manually, however, would be extremely tedious and susceptible to error. For-
tunately, there is an easy way for us to specify all the necessary information,
using environment variables. This is described in detail in Appendix A. Af-
ter having followed the instructions outlined there, you can compile and link
your FORTRAN programs by simply typing, for example,
fl hello.for
This is the command to the operating system to compile (and link) your pro-
gram. (A fuller description of this command is given in Appendix A.) A mes-
sage will appear on the screen, identifying the specific version of the FOR-
TRAN compiler being used, and the name of the file being compiled, e.g.,
hello.for. Finally, some information about the LINKER will be displayed.
Note that it is necessary to include the .for extension. As written, the fl com-
mand compiles and links the program, producing an executable file. Should
you accidentally omit the .FOR extension, only the linking step would be per-
formed.
To run the program (that is, to execute it), you type the filename,
hello, followed by the Enter key. (Note that you do not type ".for" after the
filename to execute the program.) Assuming there are no errors, the program
will run and the "hello, world" message will be written to your screen. It's
done! (I hope. If the computer didn't perform as I've indicated, make sure
that you've followed all the instructions precisely - computers can be
quite picky about every detail. If the program still doesn't run, see the local
wizard.)
We are primarily concerned with the process of solving problems, us-
ing the computer. However, we need to say a little about the computer code
itself. Over time "the code," e.g., a collection of specific FORTRAN state-
ments, tends to take on a life of its own. Certainly, it's to no one's advantage
to "reinvent the wheel." Aparticular program (or even a portion ofa program)
that is of proven value tends to be used over and over again, often modified or
"enhanced" to cover a slightly di erent circumstance. Often these enhance-
ments are written by di erent people over an extended period of time. In
order to control the growth of the code through orderly channels, so that we
can understand the code after we've been away from it for a while or to under-
stand someone else's code, it is absolutely essential that we know and exercise
good programming practices. The first step in that direction is to take respon-
sibility for your work. Add "comment statements" to the top of the program,
with your name and date, and perhaps how you can be reached (phone num-
ber, INTERNET address, etc.) And be sure to add a line or two about what
the program does - or is supposed to do! Your modified code might then look
something like the following:
*----------------------------------------------------------
*   Paul L. DeVries, Department of Physics, Miami University
*
*   HELLO.FOR is a little program that gets us started.
*
*           originally written:   9/1/87   pld
*           last revised:         1/1/93   pld
*
      write(*,*) 'hello, world
      end
Since I modified existing code to get the current version, I included a history of
the revisions, and initialed the entries. This is a little silly for this program,
since it's strictly an example program. But the idea is important - if we
develop good habits now, we'll have them when we really need them!
The first lines of every computer code should contain the pro-
grammer's name, the date, and the purpose of the code.
Save your work, quit the editor, and fl your program. If your screen
looked exactly like my screen, then it contained an error that the compiler has
found. There's a message on the screen that looks something like:
hello.for(9) error F2031: closing quote missing
This informs you that there is an error in line 9 of the file hello.for, and what
the error is. (Unfortunately, not all the error messages are as clear as this
one.) The error number, F2031 in this case, usually doesn't help much, but the
message clearly tells me that I forgot the second quote. (All the error messages
are listed in the Microsoft FORTRAN Reference manual.) Now, I knew it was
supposed to be there, honest! It was even there previously - look at the
listing! Apparently, in making modifications to the code I accidentally deleted
the quote mark. I always try to be careful, but even experienced programmers
make silly mistakes. I'll have to invoke my editor, correct my mistake, and fl
the program again.
(With many editors, you can invoke the FORTRAN compiler without
leaving the editor itself. Simply hit the appropriate key - perhaps F5 - and
the editor will save the file and instruct the operating system to compile and
link the program. One of the advantages of "compiling within the editor" is
that the editor will place the cursor on the line of computer code that produced
the error, and the FORTRAN error message will be displayed. You can then
edit the line appropriately, to correct your error. If you have several errors,
you can move through the file, correcting them one at a time. Proceeding
in this way, editors provide a valuable tool for catching simple coding errors.
Unfortunately, they don't detect errors of logic!)
After successfully compiling the program, type:
DIR
This commands the computer to provide you with a list of all the files in the
directory. The following files should be present, although not necessarily in
this order:
HELLO.BAK
HELLO.OBJ
HELLO.EXE
HELLO.FOR
A new file, HELLO.BAK, has appeared. It is a copy of HELLO.FOR as it appeared before
the last editing session; that is, it is a backup copy of the program. HELLO.FOR is
the current version containing the changes made in the last editing session,
and HELLO.OBJ and HELLO.EXE are the newly created object and executable
files - the old files were overwritten by the compiler and linker. Now when
you run the program, your efforts should be rewarded by a pleasant greeting
from HELLO!
A Numerical Example
While HELLO has given us a good start, most of the work we want to do is of a
numerical nature, so let's write a simple program that computes something.
Let's see, how about computing the average between two numbers that you
type into the keyboard? Sounds good. You'll read in one number, then an-
other, and then compute and display the average. The FORTRAN code might
look something like this:
      Program AVERAGE
*----------------------------------------------------------
*   Paul L. DeVries, Department of Physics, Miami University
*
*   Program AVERAGE computes the average of two numbers
*   typed in at the keyboard, and writes the answer.
*
*           originally written:   9/2/87   pld
*
      read(*,*)first
      read(*,*)second
      average = (first+second)/2
      write(*,*) average
      end
This code should compile without errors.
Let's briefly discuss how computers work internally. For each vari-
able such as first, a small amount of memory is set aside. The value of the
variable, to about 8 significant digits, is stored in this space. When two such
numbers are multiplied, the result has 16 significant digits. But there's no
room to store them - there's only room for 8! As calculations are performed
the results are rounded so as to fit into the space allotted, introducing an error.
After many calculations, the accuracy of the result can be severely degraded,
as will be demonstrated in Exercise 1.2. This is called round-off error, and
can be diminished by allocating more space for each variable, using a DOUBLE
PRECISION statement.
Real variables should always be double precision.
Although "standard" FORTRAN allows default types for variables,
such as single precision for any variable name not starting with i, j, k, l, m,
or n, we have found this to lead to a lot of trouble. For example, plane waves
are described by the function e^{ikx}. If we have occasion to write a computer
code involving plane waves, it would only be natural for us to use the variable
k. It would also be good programming, in that the computer program would
then closely match the mathematical equations. But unless otherwise stated,
the variable k would be an integer! That is, the statement k=1.7 would not
cause k to take on the value of 1.7 - rather, it would be the integer 1! Such
errors happen to everyone, and are tough to find since the code will compile
and run - it just won't give the correct answers! Rather than trusting our
luck to catch these errors, we will avoid their occurrence in the first place by
adhering to the following rule:
Declare all variables.
In the computer science/software engineering world, this is termed "strong
typing," and is an integral part of some programming languages, although
not FORTRAN. With the options specified in Appendix A, fl will help you
abide by this edict by issuing a WARNING whenever an undeclared variable is
encountered. It is left to you, however, to respond to the warning. (As pro-
grams are compiled, various ERRORs and WARNINGs might be issued. Corrective
action to remove all errors and warnings should be taken before any attempt
is made to execute the program.)
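Many compilers offer a second line of defense: the IMPLICIT NONE statement,
which is standard in FORTRAN 90 and accepted as an extension by many
FORTRAN 77 compilers. If your compiler accepts it, placing it at the top of a
program unit turns every undeclared variable into an outright error rather
than a warning. A minimal sketch - the program and its variables are invented
for illustration - looks like this:

      program waves
      implicit none
      double precision k, x
*   With IMPLICIT NONE in effect, k and x must be declared,
*   so k cannot silently become an integer.
      k = 1.7d0
      x = 2.0d0
      write(*,*) 'k times x is ', k*x
      end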
With these thoughts in mind, our computer code now looks like
      Program AVERAGE
***********************************************************
*   Paul L. DeVries, Department of Physics, Miami University
*
*   Program AVERAGE computes the average of two numbers
*   typed in at the keyboard, and writes the answer.
*
*           originally written:   9/2/87   pld
*           last revised:         1/1/93   pld
*
*   Declarations
*
      DOUBLE PRECISION first, second, average

      read(*,*)first
      read(*,*)second
      average = (first+second)/2.d0
      write(*,*) average
      end

To ensure that the highest precision is maintained, constants should be
specified in the "D" format.
Use your editor to create a file named average.for containing these lines of
code, and fl to create an executable file. If you now execute the program, by
typing average, you will receive a mild surprise: nothing happens! Well, not
exactly nothing - the computer is waiting for you to type in the first number!
Herein lies a lesson:
Always write a prompting message to the screen before at-
tempting to read from the keyboard.
It doesn't have to be much, and it doesn't have to be fancy, but it should be
there. A simple
write(*,*) 'Hey bub, type in a number!'
is enough, but you need something. Likewise, the output should contain some
identifying information so that you know what it is. It's common for programs
to begin as little exercises, only to grow in size and complexity as time goes
by. Today, you know exactly what the output means; tomorrow, you think you
know; and three weeks from next Tuesday you will have forgotten that you
wrote the program. (But, of course, your authorship will be established when
you look at the comments at the head of the file.)
Output should always be self-descriptive.
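Taken together, these two rules lead to input and output that look something
like the following sketch; the quantity involved, a gas temperature, is invented
purely for illustration.

      double precision temperature
      write(*,*) 'Enter the gas temperature, in kelvin:'
      read(*,*) temperature
      write(*,*) 'The temperature entered was ', temperature, ' K'
      end

Anyone running this program knows what to type, and knows what the number
on the screen means - even three weeks from next Tuesday.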
EXERCISE 1.1
Modify average.for by adding prompts before the input statements
and rewrite output statements so that they are self-descriptive. Then
change the program so that the number of addends, NUMADD, is arbi-
trary and is obtained when you run the program.
Code Fragments
Throughout this text are listed examples of code and fragments of programs,
which are intended to help you develop your computational physics skills. To
save you the drudgery of entering these fragments, and to avoid the probable
errors associated with their entry, many of these fragments are included on
the distribution diskette found in the back of the book. Rarely are these frag-
ments complete or ready-to-run - they're only guides, but they will (hope-
fully) propel you in a constructive direction. Exercises having a code fragment
on the disk are indicated by a symbol in the margin. The next exercise, for example,
has such a symbol, indicating that a fragment pertaining to that exercise
exists on the diskette. Being associated with Exercise 1.2, this fragment is
stored in the file 1.2. Most of the exercises do not have code fragments, sim-
ply because they can be successfully completed by modifying the programs
written for previous exercises. Exercise 1.2 is something of a rarity, in that
the entire code is included. All the software distributed with this textbook,
including these fragments, is discussed in Appendix A.
Let's return to the issue of double precision variables, and demon-
strate the degradation due to round-off error that can occur if they are not
used. Consider the simple program
      Program TWO
*----------------------------------------------------------
*   Paul L. DeVries, Department of Physics, Miami University
*
*   Program TWO demonstrates the value of DOUBLE PRECISION
*   variables.
*
*           last revised:   1/1/93   pld
*
*   Declarations
*
      real x
      double precision y
      integer i
*
*   If we start with one, and add one-millionth a million
*   times, what do we get?
*
      x = 1.
      y = 1.d0
      do i = 1, 1000000
         x = x + 0.000001
         y = y + 0.000001d0
      end do
      write(*,*)' Which is closer to two, ',x,' or ',y,'?'
      end
Its operation is obvious, but its result is rather startling.
EXERCISE 1.2
Run Program TWO. Are you convinced that DOUBLE PRECISION vari-
ables are important?
We should emphasize that we have not actually eliminated the error by using
double precision variables, we have simply made it smaller by several orders
of magnitude. There are other sources of error, due to the approximations
being made and the numerical methods being used, that have nothing to do
with round-off error, and are not affected by the type declaration. For exam-
ple, if we were to use x as an approximation for sin x, we would introduce a
substantial truncation error into the solution, whether the variable x is de-
clared double precision or not! Errors can also be introduced (or existing
errors magnified) by the particular algorithm being used, so that the stability
of the algorithm is crucial. The consistent use of double precision variables
effectively reduces the round-off error to a level that can usually be ignored,
allowing us to focus our attention on other sources of error in the problem.
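To convince yourself that truncation error really is a separate issue from
round-off, you might try a little program along the following lines, which
compares the small-angle approximation sin x ≈ x with the intrinsic sine
function. The differences it prints are truncation error, and no amount of
additional precision will make them go away.

      program trunc
*   Compare the approximation sin(x) ~ x with the intrinsic
*   sine function, for a few modest values of x.
      double precision x
      integer i
      do i = 1, 5
         x = 0.1d0 * i
         write(*,*) 'x =', x, '  sin(x) =', sin(x),
     +              '  difference =', x - sin(x)
      end do
      end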
A Brief Guide to Good Programming
When asked what are the most important characteristics of a good computer
program, many people - particularly novices - will say speed and efficiency,
or maybe reliability or accuracy. Certainly these are desirable characteristics
- reliability, in particular, is an extremely desirable virtue. A program that
sometimes works, and sometimes doesn't, isn't of much value. But a pro-
gram that is not as fast or as efficient or as accurate can still be of use, if we
understand the limits of the program. Don't be concerned with efficiency -
it's infinitely preferable to have a slow, fat code that works than to have a
fast, lean code that doesn't! And as for speed - the only "time" that really
matters is how long it takes you to solve the problem, not how long it takes
the program to execute. The reason the computer program was written in
the first place was to solve a particular problem, and if it solves that problem
- within known and understood limitations - then the program must be
deemed a success. How best can we achieve that success?
I have come to the conclusion that the single most important charac-
teristic of any computer program in computational physics is clarity. If it is
not clear what the program is attempting to do, then it probably won't do it. If
it's not clear what methods and algorithms are being used, then it's virtually
impossible to know if it's working correctly. If it's not clear what the variables
represent, we can't determine that the assignments and operations are valid.
If the program is not clear, its value is essentially zero.
On the other hand, if a program is written to be as clear as possible,
then we are more likely to understand what it's intended to do. If (and when)
errors of logic are made, those errors are more easily recognized because we
have a firm understanding of what the code was designed to do. Simple er-
rors, such as mistaking one variable for another, entry errors, and so on, be-
come much less likely to be made and much easier to detect. Modifying the
code or substituting a new subroutine for an old is made simpler if we clearly
understand how each subroutine works and how it relates to the rest of the
program. Our goal is to "solve" the given problem, but the path to that goal
is made much easier when we strive to write our programs with clarity.
Begin by considering the process of writing a program as a whole. In
the early days of computing there was generally a rush to enter the code and
compile it, often before the problem was totally understood. Not surprisingly,
this resulted in much wasted effort - as the problem became better under-
stood, the program had to be modified or even totally rewritten. Today, the
entire discipline of "software engineering" has arisen to provide mechanisms
by which programs can be reliably written in an efficient manner. The pro-
grams we will develop are generally not so long or involved that all the formal
rules of software development need be imposed - but that doesn't mean we
should proceed haphazardly, either.
Think about the problem, not the program.
A standard approach to any new and difficult problem is to break the original
problem into more manageably sized pieces, or "chunks." This gives us a
better understanding of the total problem, since we now see it as a sequence
of smaller steps, and also provides a clue as to how to proceed - "chunk" it
again! Ultimately, the problems become small enough that we can solve them
individually, and ultimately build a solution to the original problem.
When applied to computational problems, this problem-solving strat-
egy is called top-down design and structured programming. This strategy en-
hances our understanding of the problem while simultaneously clarifying the
steps necessary for a computational solution. In practice, top-down design and
structured programming are accomplished by creating a hierarchy of steps
leading to the solution, as depicted in Figure 1.1. At the highest level, the
problem is broken into a few logical pieces. Then each of these pieces is bro-
ken into its logical pieces, and so on. At each level of the hierarchy, the steps
become smaller, more clearly defined, and more detailed. At some point, the
individual steps are the appropriate size for a SUBROUTINE or FUNCTION. There
are no hard-and-fast rules on how small these units should be - the logical
connectedness of the unit is far more important than the physical length -
but subroutines exceeding a hundred lines or so of code are probably rare.
Functions, being somewhat more limited in scope, are usually even smaller.
Top-down design doesn't stop at this level, however. Subroutines and func-
tions themselves can be logically broken into separate steps, so that top-down
design extends all the way down to logical steps within a program unit.
FIGURE 1.1 An example of the hierarchy resulting from top-down
program design. (The box at the top of the hierarchy is labeled "The
Original Problem.")
Clearly, a lot of effort is expended in this design phase. The payoff,
at least partially, is that the programming is now very easy - the individual
steps at the lowest, most detailed level of the design are implemented in com-
puter code, and we can work our way up through the design hierarchy. Since
the function of each piece is clearly defined, the actual programming should
be very straightforward. With the top-down programming design, a hierar-
chy was developed with each level of refinement at a lower, more detailed level
than the previous one. As we actually write the code, the process is reversed
- that is, each discrete unit of code is written separately, and then combined
with other units as the next higher level is written.
Alternatively, one can begin the programming in conjunction with the
top-down design phase. That is, as the top levels of the program are being
designed, the associated code can be written, calling as-yet unwritten subrou-
tines (and functions) from the lower levels. Since the specific requirements
of the lower-level subroutines will not be known initially, this code will nor-
mally be incomplete - these stubs, however, clearly reflect the design of the
program and the specific areas where further development is required. The
most important factor in either case is that the coding is secondary to the
design of the program itself, which is accomplished by approaching the total
problem one step at a time, and successively refining those steps.
Design down.
As we write the computer code for these discrete steps, we are natu-
rally led to structured programming. Subroutines at the lower levels of the
hierarchy, for example, contain logical blocks of code corresponding to the
lowest steps of the design hierarchy. Subroutines at the higher levels con-
sist of calls to subroutines and functions that lie below them in the hierarchy.
We can further enhance the structure of the program by insisting on linear
program flow, so that control within a subroutine (or within a block of code)
proceeds from the top to the bottom in a clear and predictable fashion. GOTO
statements and statement labels should be used sparingly, and there should
never be jumps into or out of a block of code. The IF-THEN-ELSE construct
is a major asset in establishing such control. To assist in readability, capital-
ization is used to help identify control structures such as IF-THEN-ELSE and
DO-loops. We also use indentation to provide a visual clue to the "level" of the
code and to separate different blocks of code.
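As a small illustration, a block of code written in this style might look like
the following; the physical situation and the variable vy, a vertical velocity,
are invented for the example.

*   Decide whether the projectile is rising, falling, or
*   momentarily at the top of its trajectory.
      IF( vy .gt. 0.d0 ) THEN
         write(*,*) 'The projectile is rising.'
      ELSE IF( vy .lt. 0.d0 ) THEN
         write(*,*) 'The projectile is falling.'
      ELSE
         write(*,*) 'The projectile is at the top of its flight.'
      ENDIF

Control enters at the top, exactly one branch is executed, and control leaves
at the bottom - no labels, and no jumps.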
As you develop the actual code, constantly be aware of the need for
clarity. There's rarely a single line or block of code that works - usually,
there are many different ways a specific task can be implemented. When con-
fronted with a choice, choose the one that leads to the clearest, most readable
code. Avoid cleverness in your coding - what looks clever today might look
incomprehensible tomorrow. And don't worry about efficiency - modern com-
pilers are very good at producing efficient code. Your attempts to optimize the
code might actually thwart the compiler's efforts. The primary technique for
improving efficiency is changing algorithms, not rewriting the code. (Unless
you know what needs to be rewritten, and how to improve it, your efforts will
likely be wasted, anyway.)
Overall program clarity is further promoted by appropriate documen-
tation, beginning with a clear statement of what the program, subroutine, or
function is supposed to be doing. It may happen that the code doesn't actually
perform as intended, but if the intention isn't clearly stated, this discrepancy
is difficult to recognize. The in-line documentation should then describe the
method or algorithm being used to carry out that intention, at the level of
detail appropriate for that subprogram in the design hierarchy. That is, a
subroutine at the highest level should describe the sequence of major steps
that it accomplishes; a description of how each of these steps is accomplished
is contained within the subroutines that accomplish them. This description
need not be - and in fact, should not be - a highly detailed narrative. Rather,
the requirement is that the description be clear and precise. If the Newton-
Raphson method is being used, then the documentation should say so! And if
a particular reference was used to develop the implementation ofthe Newton-
Raphson method, then the reference should be cited!
We've already argued that each program, subroutine, and function
should contain information describing the intended function of the subpro-
gram, the author, and a revision history. We've now added to that list a state-
ment of the method used and references to the method. Furthermore, the
input and output of the program unit should be described, and the meaning
of all variables. All this information should reside at the beginning of the
program unit. Then within the unit, corresponding to each block of struc-
tured code, there should be a brief statement explaining the function of that
block of code. Always, of course, the purpose of this documentation is clarity
- obvious remarks don't contribute to that goal. As an example, there are
many instances in which we might want to calculate the value of a function
at different points in space. A fragment of code to do that might look like
*   Loop over "i" from 1 to 10
      DO i = 1, 10
      END DO
The comment statement in this fragment does not contribute to clarity. To
the extent that it interferes with reading the code, it actually detracts from
clarity. A better comment might be
* Loop over all points on the grid
But it still doesn't tell us what is being done! A truly useful comment should
add something to our understanding of the program, and not simply be a
rewording of the code. An appropriate comment for this case might be
* Calculate the electrostatic potential at
* all points on the spatial grid.
This comment succinctly describes what the block of code will be doing, and
significantly contributes to the clarity of the program. Good documentation
is not glamorous, but it's not difficult, either.
Another way to enhance the clarity of a program is through the choice
of variable names. Imagine trying to read and understand a computer code
calculating thermodynamic quantities in which the variable x34 appears. The
name x34 simply does not convey much information, except perhaps that it's
the 34th variable. Far better to call it pressure, if that's what it is. (Within
standard FORTRAN 77, this variable name is too long. However, as noted,
Microsoft (and many other) compilers will accept the longer name. In FOR-
TRAN 90, the limit is 31 alphanumeric characters.) Using well-named vari-
ables not only helps with keeping track of the variables themselves, it helps
make clear the relations between them. For example, the line
x19=x37*y12
doesn't say much. Even a comment might not really help:
* Calculation of force from Newton's second law
x19=x37*y12
But when well-chosen names are used, the meaning of the variables and the
relation between them become much clearer:
Force = Mass * Acceleration
Because the meaning of the FORTRAN statement is now obvious, we don't
need the comment statement at all! This is an example of self-documenting
code, so clear in what is being done that no further documentation is needed.
It's also an example of the extreme clarity that we should strive for in all our
programming efforts. When good names are combined with good documenta-
tion, the results are programs that are easy to read and to understand, and to
test and debug - programs that work better, are written in a shorter amount
of total time, and provide the solution to your problems in a timely manner.
A result of top-down design and structured programming is that the
subroutines can easily be made self-contained. Such modularity is a definite
advantage when testing and debugging the program, and makes it easier to
maintain and modify at a later time. During the process of refinement, the
purpose and function of each subroutine has been well defined. Comments
should be included in the subroutine to record this purpose and function, to
specify clearly the required input and output, and to describe precisely how
the routine performs its function. Note that this has the effect of hiding much
detailed information from the rest of the program. That is, this subroutine
was designed to perform some particular task. The rest of the program doesn't
need to know how it's done, only the subroutine's required input and expected
output. If, for some reason, we want to know, then the information is there,
and the program as a whole is not overburdened by a lot of unnecessary detail.
Furthermore, if at some later time we want to replace this subroutine, we will
then have all the information to know exactly what must be replaced.
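As a concrete, if hypothetical, example, a self-contained and documented
subroutine might look like the sketch below. The particular task - finding the
real roots of a quadratic equation - was chosen only to illustrate the layout;
the name, the argument list, and the placeholder dates are all ours.

      SUBROUTINE quad( a, b, c, nroots, root1, root2 )
*-----------------------------------------------------------
*   Your Name, Department of Physics, Your University
*
*   QUAD finds the real roots of  a*x**2 + b*x + c = 0
*   using the standard quadratic formula.
*
*   Input:   a, b, c      - the coefficients; a must be nonzero
*   Output:  nroots       - the number of real roots (0, 1, or 2)
*            root1, root2 - the roots, when they exist
*
*           originally written:   (date)   (initials)
*
      double precision a, b, c, root1, root2, disc
      integer nroots
*
*   Check the input...
      if( a .eq. 0.d0 )then
         write(*,*) 'QUAD: the coefficient a must not be zero.'
         stop
      endif
*
*   ...then branch on the sign of the discriminant.
      disc = b*b - 4.d0*a*c
      IF( disc .gt. 0.d0 ) THEN
         nroots = 2
         root1  = ( -b + sqrt(disc) ) / ( 2.d0*a )
         root2  = ( -b - sqrt(disc) ) / ( 2.d0*a )
      ELSE IF( disc .eq. 0.d0 ) THEN
         nroots = 1
         root1  = -b / ( 2.d0*a )
         root2  = root1
      ELSE
         nroots = 0
      ENDIF
      end

The rest of the program needs to know only what appears in the header; how
the roots are actually computed is hidden inside.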
Debugging and Testing
When the programming guidelines we've discussed are utilized, the result-
ing programs are often nearly error-free. Still, producing a totally error-free
program on the first attempt is relatively rare. The process of finding and re-
moving errors is called debugging. To some, debugging is nearly an art form
- perhaps even a black art - yet it's a task that's amenable to a systematic
approach.
The entire debugging process is greatly facilitated by our top-down,
structured programming style, which produces discrete, well-defined units of
code. The first step is to compile these units individually. Common errors,
such as misspelling variable names and having unmatched parentheses, are
easily detected and corrected. With the specified options to fl, undeclared
variables will also be detected and can be corrected, and variables that have
been declared but are unused will be reported. All errors and warnings gen-
erated by the compiler should be removed before moving on to the next phase
of the process.
After a subroutine (or function) has been successfully compiled, it
should be tested before being integrated into the larger project. These tests
will typically include several specific examples, and comparing the results to
those obtained by hand or from some other source. It's tempting to keep these
tests as simple as possible, but that rarely exercises the code sufficiently. For
example, a code that works perfectly with an input parameter of x = 1.0, 2.0,
or 3.0 might fail for x = 1.23. Remember, you are trying to see if it will fail,
not if it will succeed. Another item that should always be tested is the behav-
ior of the code at the ends - the first time through a loop, or the last, are
often where errors occur.
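A simple way to organize such tests is a short driver program that does
nothing but call the routine with several arguments - including awkward ones -
and print the results alongside values worked out by hand. In the sketch below,
avg2 is a hypothetical function version of the earlier averaging example,
included so that the driver is complete.

      program tstavg
*   Exercise the function avg2 at several points, and compare
*   with results computed by hand.
      double precision avg2
      write(*,*) avg2( 1.d0, 3.d0 ),     '  (expect 2.0)'
      write(*,*) avg2( 1.23d0, 4.56d0 ), '  (expect 2.895)'
      write(*,*) avg2( -7.d0, 7.d0 ),    '  (expect 0.0)'
      end

      double precision function avg2( a, b )
      double precision a, b
      avg2 = ( a + b ) / 2.d0
      end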
Another common problem that is easily addressed at this level is one
of data validity. Perhaps you've written a FUNCTION to evaluate one of the
special functions that often arise in physics, such as the Legendre polynomial,
a function defined on the interval -1 ≤ x ≤ 1. What does the code do if the
argument x = 1.7 is passed to it? Obviously, there is an error somewhere
for this to have happened. But that error is compounded if this FUNCTION
doesn't recognize the error. All functions and subroutines should check that
the input variables are reasonable, and if they're not, they should write an
"error message" stating the nature of the problem and then terminate the
execution of the program.
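For the Legendre example just mentioned, a function with such a check built
in might look like the following sketch; P2(x) = (3x^2 - 1)/2 is the second
Legendre polynomial, chosen here simply because it is short.

      double precision function p2( x )
*   The Legendre polynomial P2(x) = (3*x**2 - 1)/2, defined
*   only on the interval -1 <= x <= 1.  The argument is
*   checked before any arithmetic is done.
      double precision x
      if( x .lt. -1.d0  .or.  x .gt. 1.d0 )then
         write(*,*) 'P2: argument out of range, x = ', x
         stop
      endif
      p2 = ( 3.d0*x*x - 1.d0 ) / 2.d0
      end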
This is a good point at which to rethink what's been done. As the pro-
gram was designed, an understanding of the problem, and of its solution, was
developed. Now, a step in that solution has been successfully implemented,
and your understanding is even deeper. This is the time to ask, Is there a better
way of doing the same thing? With your fuller understanding of the problem,
perhaps a clearer, more concise approach will present itself. In particular,
complex, nested, and convoluted control structures should be reexamined -
Is it really doing what you want? Is there a more direct way of doing it?
You should also think about generalizing the subroutine (or function).
Yes, it was designed to perform a specific task, and you've determined that it
does it. But could it be made more general, so that it could be used in other
situations? If you needed to do a task once, you'll probably need to do it again!
Invest the extra effort now, and be rewarded later when you don't have to
duplicate your work. A little additional effort here, while the workings of
the subroutine are fresh in your mind, can make subsequent programming
projects much easier.
Once we have ascertained that the code produces correct results for
typical input, thoroughly documented the intended purpose of the code and
the methods used, included input verification, reexamined its workings and
perhaps generalized it, the code can be marked "provisionally acceptable" and
we can move on to the next subprogram. After all the subroutines and func-
tions within one logical phase of the project are complete, we can test that
phase. We note that the acceptance of a subroutine or program is never more
than provisional. The more we use a particular piece of code, the more confi-
dent we become of it. However, there is always the chance that within it there
lurks a bug, just waiting for an inopportune time to crawl out.
A Cautionary Note
From time to time, we all make near catastrophic mistakes. For example, it's
entirely possible - even easy - to tell the computer to delete all your work.
Clearly, you wouldn't do this intentionally, but such accidents happen more
often than you might think. Some editors can help - they maintain copies of
your work so that you can UnDo your mistakes or Restore your work in case
of accidents. Find out if your editor does this, and if so, learn how to use it!
There are also "programmer's tools" that allow you to recover a "deleted" file.
As a final resort, make frequent backups of your work so that you never lose
more than the work since the last backup. For example, when entering a lot of
program code, or doing a lot of debugging, make a backup every half-hour or
so. Then, if the unspeakable happens, you've only lost a few minutes of work.
Plan now to use one or more of these "safety nets." A little time invested now
can save you an immense amount of time and frustration later on.
Elementary Computer Graphics
It's often the case that a sketch of a problem helps to solve and understand
the problem. In fact, visualization is an important topic in computational
physics today - computers can perform calculations much faster than we
can sift through paper printout to understand them, and so we are asking
the computer to present those results in a form more appropriate for human
consumption, e.g., graphically. Our present interest is very simple - instead
of using graph paper and pencil to produce our sketches by hand, let's have
the computer do it for us.
The FORTRAN language doesn't contain graphics commands itself.
However, graphics are so important that libraries containing graphical sub-
routines have been written, so that graphics can be done from within FOR-
TRAN programs. In particular, Microsoft distributes such a library with its
FORTRAN compiler, Version 5.0 or later. But creating high-quality graphics
is quite difficult, even with all the tools. Our emphasis will be in producing
relatively simple graphs, for our own use, rather than in creating plots to be
included in textbooks, for example. With this in mind, we have chosen to use
only a few of the graphics subroutines available to us. The results will be sim-
plistic, in comparison to what some scientific graphics packages can produce,
but entirely adequate for our purposes. We have even simplified the access to
the library, so that simple plots can be produced with virtually no knowledge
of the Microsoft graphics library. (See Appendix B for more information about
graphics.)
There are two subroutines that must be called for every plot:
gINIT ... initializes the graphics package and prepares the computer and the
graphical output device for producing a drawing. It must be the first graphics
subroutine called.
gEND ... releases the graphics package, and returns the computer to its
standard configuration. This should be the last call made, after you have
finished viewing the graph you created - the last action taken is to clear the
screen.
After gINIT has been called, the entire screen is available for draw-
ing. Where your graph will be displayed is referred to as the viewport: as this
discussion might suggest, it's possible to place the viewport anywhere on the
physical display device - at this time, the viewport covers the entire screen.
To change the location (or size) of the viewport, use
VIEWPORT( X1, Y1, X2, Y2 ) ... locates the viewport on the physical display
device. Scaled coordinates are used: the lower left side of the display is the
point (0,0), and the upper right side is (1,1).
Before the data can be displayed, you must inform the graphics li-
brary of the range of your data. That is, you must map the lower and upper
limits of your data onto the physical screen, e.g., the viewport. This is called
a window on your data. After we work through the jargon, the viewport and
window concepts make it very easy for us to design our graphs. Remember,
the coordinates of the viewport relate to the physical display device while the
coordinates of the window relate to the particular data being displayed. So,
we have the command WINDOW,
WINDOW( X1, Y1, X2, Y2 ) ... maps data in the range X1 ≤ x ≤ X2 and
Y1 ≤ y ≤ Y2 onto the viewport.
We're now ready to consider the commands that actually produce the
plot. After the viewport and the window have been set, we're ready to draw a
line.
LINE( X1, Y1, X2, Y2 ) ... draws a line from (X1, Y1) to (X2, Y2).
The parameters in the call to LINE are simply the x and y values of
your data in their natural units, as declared to the graphics package by WINDOW.
Both endpoints of the line are specified.
This might all seem a little confusing - let's look at a simple program to
demonstrate these routines:
      Program DEMO
*----------------------------------------------------------
*     Paul L. DeVries, Department of Physics, Miami University
*
*     This little program demonstrates the use of some of the
*     elementary graphics commands available.
*
*                                        January 1, 1993
*
      double precision x1, y1, x2, y2
*
*     Start the graphics package, and initialize WINDOW.
*
      call gINIT
*
*     The data range is
*           -10 < x < 10,
*             0 < y < 1.
*
      x1 = -10.d0
      x2 = +10.d0
      y1 =   0.d0
      y2 =   1.d0
      call WINDOW( x1, y1, x2, y2 )
*
      call LINE( -10.d0, .5d0, 10.d0, .5d0 )
      call LINE(   0.d0, 0.d0,  0.d0, 1.d0 )
*
      call gEND
      END
This program should produce a "+" on your computer screen.
These routines must always be called with double precision ar-
guments. The preferred approach is to use variables as argu-
ments, as WINDOW was called. If called with constants, the ar-
guments MUST be specified in a D format, as LINE was called.
Failure to provide double precision arguments will cause un-
predictable results.
EXERCISE 1.3
Verify that DEMO works as advertised.
Let's work through something a little more complicated: plotting a
sine curve, on the domain 0 ≤ x ≤ 10. To approximate the continuous curve,
we'll draw several straight-line segments - as the number of line segments
increases, the graph will look more and more like a continuous curve. Let's
use 50 line segments, just to see what it looks like. It would also be nice if the
axes were drawn, and perhaps some tickmarks included in the plot. Finally,
let's use a viewport smaller than the entire screen, and locate it in the upper
middle of the physical screen. The appropriate code might then look like the
following:
Program SINE
*----------------------------------------------------------
* Paul L. DeVries, Department of Physics, Miami University
*
* A quick example: drawing a sine curve.
* January 1, 1993
*
      double precision x1, y1, x2, y2
      integer i
*
*     Start the graphics package, and initialize WINDOW.
*
      call gINIT
*
*     Put the viewport in the upper middle of the screen.
*
      x1 = 0.25d0
      x2 = 0.75d0
      y1 = 0.40d0
      y2 = 0.90d0
      call VIEWPORT( x1, y1, x2, y2 )
*
*     The data range is
*            0 < x < 10,
*           -1 < y < 1.
*
      x1 =   0.d0
      x2 = +10.d0
      y1 =  -1.d0
      y2 =   1.d0
      call WINDOW( x1, y1, x2, y2 )
*
*     Draw x-axis
*
      call LINE( x1, 0.d0, x2, 0.d0 )
*
*     Put tickmarks on x-axis
*     ( -0.04 is about 1/50 of the y range )
*
      y1 = -0.04d0
      y2 =  0.00d0
      DO i = 1, 10
         x1 = dble(i)
         call LINE( x1, y1, x1, y2 )
      END DO
*
*     Draw y-axis
*
      call LINE( 0.d0, -1.d0, 0.d0, 1.d0 )
*
*     Draw sine curve
*
      DO i = 1, 50
         x1 = dble(i-1)*0.2d0
         y1 = sin(x1)
         x2 = dble( i )*0.2d0
         y2 = sin(x2)
         call LINE( x1, y1, x2, y2 )
      END DO
      call gEND
      END
This code should produce the line drawing in Figure 1.2. For us, graphics are
an aid to our understanding, so that simple drawings such as this are entirely
adequate. We should, however, add some labeling to the figure. Text can be
written with a WRITE statement, but we need to position the label appropri-
ately. To position the cursor, we use the command
CURSOR( row, column) ... which moves the cursor to the designated row
and column. (The row numbering is from top to bottom.) row and column are
integer variables.
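For example (the row and column values here are arbitrary choices, and the best
placement will depend on your display), a label might be written with something
like

      call CURSOR( 2, 30 )
      write(*,*) 'sin(x),  0 < x < 10'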
FIGURE 1.2 Plot of a sine curve, 0 < x < 10.
EXERCISE 1.4
Reproduce Figure 1.2, with the appropriate legend added.
Of course, there's no restriction to plotting only one curve on a figure!
EXERCISE 1.5
Plot both the sine and cosine curves in a single figure.
And in Living Color!
The addition of color greatly enhances the quality of graphic presentations. If
your computer is equipped with a color display, you can determine the number
of colors available to you with
NOC( NUMBER ) ... which returns the number of colors available in the
integer variable NUMBER.
To change the color being used, we can use the subroutine
COLOR( INDEX) ... sets the current color to the one indexed by INDEX. All
subsequent lines will be drawn in this color until changed by another call to
COLOR. Under normal circumstances, INDEX = 0 is black, while the highest
indexed color is white.
In general, we'll find color very useful. For example, we can distin-
guish between two curves drawn on the same graph by drawing them with
different colors. If the indices of the two desired colors are "1" and "2," then
we might have
*
*     draw first curve
*
      call color(1)
      DO i = ...
         ...
      END DO
*
*     draw second curve
*
      call color(2)
      DO i = ...
         ...
      END DO
EXERCISE 1.6
Repeat the previous exercise, drawing the sine curve in one color and
the cosine curve in another.
Color can also be used to draw attention to the plot, and to add an
aesthetic element to it. An added background color can be visually pleasing,
while emphasizing the presence of the plot itself. In combination with CURSOR
to label the graph and provide textual information, the result can be impres-
sive. A rectangular box can be filled with a color using
FILL( X1, Y1, X2, Y2 ) ... fills the box defined by the corners (x1, y1) and
(x2, y2) with the current color.
For example, a background color can easily be added to your sine plot:
      call WINDOW( x1, y1, x2, y2 )
      call NOC( NUMBER )
*
*     Fill the background with the color with INDEX 2
*
      call COLOR( 2 )
      call FILL( x1, y1, x2, y2 )
*
*     Restore color index to white
*
      call COLOR( NUMBER-1 )
EXERCISE 1.7
Add a background color to your sine/cosine drawing program. You
will want to experiment with different indices to find the color most
appealing to you.
Classic Curves
We feel compelled to comment that while graphics are extraordinarily useful,
they're also a lot of fun. And that's great - learning should be an enjoyable
process, else it's unlikely to be successful! The ease with which we can gener-
ate curves and figures also inspires us to explore the various possibilities. Of
course, we're not the first to tread this path - generations of mathematicians
have investigated various functions and curves. Let's apply the modern capa-
bilities of computers, and graphing, to some of the classic analytic work, for
the sole purpose of enjoying and appreciating its elegance. Table 1.1 presents
some of the possibilities.
TABLE 1.1 Classic Curves in the Plane

Bifolium                      (x^2 + y^2)^2 = a x^2 y;   r = a sin θ cos^2 θ
Cissoid of Diocles            y^2 (a - x) = x^3;   r = a sin θ tan θ
Cochleoid                     (x^2 + y^2) tan^{-1}(y/x) = a y;   r θ = a sin θ
Conchoid of Nicomedes         (y - a)^2 (x^2 + y^2) = b^2 y^2;   r = a csc θ ± b
Deltoid                       x = 2a cos φ + a cos 2φ,   y = 2a sin φ - a sin 2φ
Evolute of ellipse            (ax)^{2/3} + (by)^{2/3} = (a^2 - b^2)^{2/3};
                              x = ((a^2 - b^2)/a) cos^3 φ,   y = ((a^2 - b^2)/b) sin^3 φ
Folium of Descartes           x^3 + y^3 - 3axy = 0;   r = 3a sin θ cos θ / (cos^3 θ + sin^3 θ)
Hypocycloid with four cusps   x^{2/3} + y^{2/3} = a^{2/3};   x = a cos^3 φ,   y = a sin^3 φ
Involute of a circle          x = a cos φ + a φ sin φ,   y = a sin φ - a φ cos φ
Lemniscate of Bernoulli       (x^2 + y^2)^2 = a^2 (x^2 - y^2);   r^2 = a^2 cos 2θ
Limacon of Pascal             (x^2 + y^2 - ax)^2 = b^2 (x^2 + y^2);   r = b + a cos θ
Nephroid                      x = a(3 cos φ - cos 3φ),   y = a(3 sin φ - sin 3φ)
Ovals of Cassini              (x^2 + y^2 + b^2)^2 - 4b^2 x^2 = k^4;
                              r^4 + b^4 - 2 r^2 b^2 cos 2θ = k^4
Logarithmic Spiral            r = e^{aθ}
Parabolic Spiral              (r - a)^2 = 4akθ
Spiral of Archimedes          r = aθ
Spiral of Galileo             r = aθ^2
Strophoid                     y^2 = x^2 (a - x)/(a + x);   r = a cos 2θ sec θ
Three-leaved rose             r = a sin 3θ
Tractrix                      x = a sech^{-1}(y/a) - sqrt(a^2 - y^2)
Witch of Agnesi               y = 8a^3/(x^2 + 4a^2);   x = 2a cot φ,   y = a(1 - cos 2φ)
Note that some of these curves are more easily described in one coordinate system
than another. Other curves are most easily expressed parametrically: both x and y
are given in terms of the parameter φ.
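As a sketch of how one of these curves might be traced with the routines
introduced earlier (the program name, the choice of curve, the 360 segments, and
the window limits are simply illustrative choices), the three-leaved rose
r = a sin 3θ could be drawn parametrically:

      Program ROSE
*
*     A sketch only: trace the three-leaved rose r = a sin(3*theta)
*     with 360 short line segments.  The window limits and the
*     number of segments are arbitrary choices.
*
      double precision a, pi, t1, t2, x1, y1, x2, y2
      integer i
      parameter( a = 1.d0 )
*
      pi = 4.d0 * atan(1.d0)
      call gINIT
      call WINDOW( -1.2d0, -1.2d0, 1.2d0, 1.2d0 )
      DO i = 1, 360
         t1 = dble(i-1) * 2.d0*pi/360.d0
         t2 = dble( i ) * 2.d0*pi/360.d0
         x1 = a*sin(3.d0*t1)*cos(t1)
         y1 = a*sin(3.d0*t1)*sin(t1)
         x2 = a*sin(3.d0*t2)*cos(t2)
         y2 = a*sin(3.d0*t2)*sin(t2)
         call LINE( x1, y1, x2, y2 )
      END DO
      call gEND
      END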
EXERCISE 1.8
Plot one (or more) of these classic figures. Many of these curves are
of physical and/or historic interest - do a little reference work, and
see what you can find about the curve you chose.
The availability of computer graphics has encouraged the exploration
of functions which otherwise would never have been considered. For example,
Professor Fey of the University of Southern Mississippi suggests the function

r = e^{cos θ} - 2 cos 4θ + sin^5(θ/12),                                     (1.1)

an eminently humane method of butterfly collecting.

EXERCISE 1.9
Plot Fey's function on the domain 0 ≤ θ ≤ 24π. The artistic aspect of
the curve is enhanced if it's rotated 90° by using the transformation

x = r cos(θ + π/2),
y = r sin(θ + π/2).
Monster Curves
The figures we've been discussing are familiar to all those with a background
in real analysis, and other "ordinary" types of mathematics. But mathemati-
cians are a funny group - around the turn of the century, they began ex-
ploring some very unusual "curves." Imagine, if you will, a curve that is ev-
erywhere continuous, but nowhere differentiable! A real monster, wouldn't
you say? Yet the likes of Hilbert, Peano, and Sierpinski investigated these
monsters - not a lightweight group, to say the least.
One particularly curious figure was due to Helge von Koch in 1904.
As with many of these figures, the easiest way to specify the curve is to de-
scribe how to construct it. And perhaps the easiest way to describe it is with
production rules. (This is the "modern" way of doing things - Helge didn't
know about these.)
Let's imagine that you're instructing an incredibly stupid machine to
draw the figure, so we want to keep the instructions as simple as possible. To
describe the von Koch curve, we need only four instructions: F, to go forward
one step; +, to turn to the right by 60°; -, to turn to the left by 60°; and T,
to reduce the size of the step by a factor of one-third. To construct the von
Koch curve, we begin with an equilateral triangle with sides of unit length.
The instructions to draw this triangle are simply
F++F++F++.                                                                  (1.2)
To produce a new figure, we follow two rules: first, add a T to the beginning
of the instruction list; and second, replace F by a new set of instructions:
F → F-F++F-F.                                                               (1.3)
That is, every time an F appears in the original set of instructions, it is re-
placed by F-F++F-F. If we follow these rules, beginning with the original
description of the figure, we produce a new list of instructions:
TF-F++F-F++F-F++F-F++F-F++F-F++.                                            (1.4)
The first such figures are presented in Figure 1.3. To obtain the ultimate
figure, as depicted in Figure 1.4, simply repeat this process an infinite number
of times!
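To make the rewriting concrete, here is a small sketch of how the production
rule might be applied by a program (the program name, the string length of 2000
characters, and the three iterations are arbitrary choices; this only builds the
instruction list and reports its length, it does not draw the figure):

      Program KOCH
*
*     A sketch only: apply the production rule F -> F-F++F-F
*     three times, starting from the triangle F++F++F++.
*
      character*2000 old, new
      integer i, lold, lnew, iter
*
      old  = 'F++F++F++'
      lold = 9
      DO iter = 1, 3
*        first, add a T to the beginning of the new list
         new  = 'T'
         lnew = 1
*        then copy the old list, replacing every F
         DO i = 1, lold
            IF( old(i:i) .eq. 'F' ) THEN
               new(lnew+1:lnew+8) = 'F-F++F-F'
               lnew = lnew + 8
            ELSE
               new(lnew+1:lnew+1) = old(i:i)
               lnew = lnew + 1
            ENDIF
         END DO
         old  = new
         lold = lnew
         write(*,*) ' after iteration ', iter,
     +              ' the list has ', lold, ' characters'
      END DO
      end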
FIGURE 1.3 The first three steps of the von Koch construction.
FIGURE 1.4 The von Koch curve, a.k.a. the Snowflake.
How long is the curve obtained? A "classical" shape, like a circle,
has a length that can be approached by taking smaller and smaller chords,
larger and larger n-polygon approximations to the circle. It's an "ordinary,"
finite length line on a two-dimensional surface. But the von Koch curve is
not so ordinary. The length of the original curve was three units, but then
each side was replaced by four line segments of one-third the original length,
so that its length was increased by four-thirds. In fact, at every iteration, the
length of the curve is increased by four-thirds. So, the length of the curve after
an infinite number of iterations is infinite! Actually, the length of the curve
between any two points on it is infinite, as well. And yet it's all contained
within a circle drawn around the original equilateral triangle! Not a typical
figure.
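Stated as a formula, the growth just described means that after k applications of
the construction the total length is L_k = 3 (4/3)^k, which grows without bound
as k increases, even though the curve never leaves that circle.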
We can invent other curves as well, with the addition of some new
instructions. Let's add R, to turn right 90°, L, to turn left 90°, and Q, to reduce
the step length by a factor of a quarter. A square can then be drawn with the
instructions
FRFRFRFR. (1.5)
An interesting figure can then be produced by first appending Q to the front
of the list, and making the replacement

F → FLFRFRFFLFLFRF.                                                         (1.6)

EXERCISE 1.10
Draw the figure resulting from three iterations of these production
rules, which we might call a variegated square. Note that while the
length of the perimeter continues to increase, the area bounded by
the perimeter is a constant.
Should we care about such monsters? Do they describe anything in
Nature? The definitive answer to that question is not yet available, but some
interesting observations can be made about them, and their relation to the
physical world.
For example, exactly how long is the coastline of Lake Erie? If you
use a meter stick, you can measure a certain length, but you clearly have not
measured the entire length because there are features smaller than a meter
that were overlooked. So you measure again, this time with a half-meter stick,
and get a new "length." This second length is larger than the first, and so you
might try again with a shorter stick, and so on. Now the question is: Do the
lengths converge to some number? The answer is: Not always - the von Koch
curve, for example.
Benoit Mandelbrot has spent the better part of the last thirty years
looking at these questions. He has coined the word fractal to describe such
geometries, because the curve is something more than a line, yet less than an
area - its dimension is fractional.
Consider a line segment. If we divide it into N identical pieces, then
each piece is described by a scale factor r = 1/N. Now consider a square: if it
is divided into N identical pieces, each piece has linear dimensions r = 1/N^{1/2}.
And for a cube, r = 1/N^{1/3}. So, apparently, we have

r = \frac{1}{N^{1/D}},                                                      (1.7)

or

D = \frac{\log N}{\log(1/r)},                                               (1.8)
which we'll take as the definition of the fractal dimension. Back to the von
Koch curve: in the construction, each line segment is divided into 4 pieces
(N = 4), scaled down by a factor of 3 (r = 1/3). Thus
D = \frac{\log 4}{\log 3} \approx 1.2618\ldots                              (1.9)
This says that somehow the curve is more than a simple line, but less than an
area.
EXERCISE 1.11
What is the fractal dimension of the variegated square?
The Mandelbrot Set
Mandelbrot has done more than coin a word, of course. Scarcely a person
this side of Katmandu has not heard of and seen an image of the Mandelbrot
set, such as Figure 1.5. It has become the signature of an entirely new area
of scientific investigation: chaotic dynamics. And yet it's "created" by an
extraordinarily simple procedure - for some complex number c, we start with
z = 0 and evaluate subsequent z's by the iteration
z = z^2 + c.                                                                (1.10)
If z remains finite, even after an infinite number of iterations, then the point
c is a member of the Mandelbrot set. (It was a simple matter to construct the
von Koch curve, too!)
FIGURE 1.5 The Mandelbrot set.
We want to use our newfound graphics tools to help us visualize the
Mandelbrot set. We will use the most basic, simpleminded, and incredibly slow
method known to calculate the set: straightforward application of Equation
(1.10). And even then, we'll only approximate the set - an infinite number
of iterations can take a long time.
It turns out that if |z| ever gets larger than 2, it will eventually become
infinite. So we'll only iterate the equation a few times, say 30; if |z| is still
less than 2, then there's a good chance that c is a member of the set. The
computing involved is rather obvious, and shouldn't be much trouble. The
graphics, on the other hand, are a little trickier.
In our desire to achieve device-independence, we have "agreed" to
forego knowledge about the actual graphics device. But in this instance, we
need that knowledge. In particular, we want to fill the screen with relevant
information, which means that we need to know the location of the individual
pixels that make up the image. We can determine the maximum number of
pixels of the screen by calling
MAXVIEW( NX, NY) ... returns the size of the physical display, in pixels. NX
and NY are integer variables.
Thus (0,0) is one corner of the screen - actually, the upper left corner -
and (NX,NY) is the opposite corner. To construct the image of Figure 1.5, we
let c = x + iy and considered the domain -1.7 ≤ x ≤ 0.8 and -1.0 ≤ y ≤ 1.0.
That is, we mapped the domain of c onto the physical pixels of the screen. To
illustrate, let's be somewhat conservative and consider only a limited portion
of the screen, say, an 80 × 100 pixel region. Then the (x, y) coordinate of the
(i,j)-th pixel is

x = -1.7 + i \frac{2.5}{100},                                               (1.11)

and

y = -1.0 + j \frac{2.0}{80}.                                                (1.12)

As we cycle through the 80 × 100 array of pixels, we are considering different
specific values of c. And for each c, we test to determine if it's a member of
the Mandelbrot set by iterating Equation (1.10). The appropriate computer
code to display the Mandelbrot set might then look like the following:
Program Mandelbrot
*----------------------------------------------------------
* Paul L. DeVries, Department of Physics, Miami University
*
* This program computes and plots the Mandelbrot set,
* using a simpleminded and incredibly slow procedure.
*
*
*                                        January 1, 1993
*
      double precision x1, y1, x2, y2, x, y
      complex*16 c, z, im
      integer i, j, NX, NY, counter, ix0, iy0
     +      , Max_Iteration
      parameter( im = (0.d0,1.d0), Max_Iteration = 30 )
*
*     Start the graphics package
*
      call gINIT
*
*     Find the maximum allowed viewport size
*
      call MAXVIEW( nx, ny )
*
*     When I know the code will work, I'll use more pixels.
*     For right now, be conservative and use 100 x-pixels
*     and 80 y-pixels, centered.
*
      ix0 = nx/2 - 50
      iy0 = ny/2 - 40
*
*     Image to be computed in the region -1.7 < x < 0.8,
*                                        -1.0 < y < 1.0.
*
      x1 = -1.7d0
      x2 =  0.8d0
      y1 = -1.d0
      y2 =  1.d0
*
*     Cycle over all the pixels on the screen to be included
*     in the plot.  Each pixel represents a particular complex
*     number, c.  Then determine, for this particular c, if
*     z -> infinity as the iteration proceeds.
*
      DO i = 0, 100
         x = x1 + i * (x2-x1)/100.d0
         DO j = 0, 80
            y = y1 + j * (y2-y1)/80.d0
            c = x + im*y
*
*           Initialize z, and begin the iteration
*
            z = 0.d0
            counter = 0
  100       z = z*z + c
            IF( abs(z) .lt. 2.d0 ) THEN
               counter = counter + 1
               if( counter .lt. Max_Iteration ) GOTO 100
*
*              If z is still < 2, and if counter = Max_Iteration,
*              call the point "c" a member of the Mandelbrot set:
*
               call pixel( ix0+i, iy0+j )
            ENDIF
         END DO
      END DO
      call gEND
      END
The variables c, z, and im have been declared COMPLEX*16, so that both their
real and imaginary parts are double precision. If a pixel is found to be a mem-
ber of the set, turning it on is accomplished by a call to the subroutine pixel,
PIXEL( I, J ) ... turns on the i-th horizontal and j-th vertical pixel. This
subroutine is device dependent, and should be used with some care.
As noted, with MAXVIEW we could have determined the maximum size
of the computer screen and used all of the display, but we've chosen not to do
so. That would cause the code to loop over every pixel on the screen, which
can take quite a while to execute. Until we're sure of what we have, we'll keep
the drawing region small.
EXERCISE 1.12
Produce a nice picture of the Mandelbrot set, following the sugges-
tions presented here.
One of the really interesting things about fractals is their self-similarity -
that is, as you look at the image closer and closer, you see similar shapes
emerge. With the von Koch snowflake, this self-similarity was virtually exact.
With the Mandelbrot set, the images are not exact duplicates of one another,
but are certainly familiar.
EXERCISE 1.13
Pick a small section of the complex plane, where something is happen-
ing, and generate an image with enlarged magnification. For example,
you might investigate the region -0.7 ≤ x ≤ -0.4, 0.5 ≤ y ≤ 0.7. By
mapping these coordinates onto the same region of the screen used
previously, you are effectively magnifying the image.
As you investigate the boundary of the Mandelbrot set at a finer scale, you
should also increase the number of iterations used to determine membership
in the set. Determining membership is an example of a fundamental limita-
tion in computation - since we cannot in practice iterate an infinite number
of times, we are always including a few points in our rendering that do not
really belong there.
What sets fractals apart from other unusual mathematical objects
is their visual presentation, as seen in numerous books and even television
shows. There is even some debate as to whether "fractals" belong to mathe-
matics, or to computer graphics. In either case, the images can certainly be
striking. Their aesthetic appeal is particularly evident when they are ren-
dered in color. In the current instance, we'll use color to indicate the number
of iterations a particular point c survived before it exceeded 2. That is, rather
than denoting points in the Mandelbrot set, we'll denote points outside the
Mandelbrot set, with the color of the point indicating how far outside the set
the point is. NOC can be used to determine the number of colors available, and
the "membership" loop will be replaced by
      call NOC( number )
*
*     Initialize z, and begin the iteration
*
      z = 0.d0
      counter = 0
  100 z = z*z + c
      IF( abs(z) .lt. 2.d0 ) THEN
         counter = counter + 1
         if( counter .lt. Max_Iteration ) GOTO 100
      ELSE
*
*        If z > 2, denote how far OUTSIDE the set
*        the point "c" is:
*
         call COLOR( mod( counter, number ) )
         call PIXEL( ix0+i, iy0+j )
      ENDIF
Setting the color index in this way has the effect of cycling through all
the colors available on your particular computer. Since even on monochrome
systems there are two "colors" - black and white - the images this produces
can be quite interesting.
EXERCISE 1.14
Try this code to explore the Mandelbrot set, and see if you can find
some particularly pleasing venues. Remember that as you "magnify"
the image, you should also increase the iteration maximum beyond
the thirty specified in the listing of the code.
These images can be almost addictive. Certainly, the images are very
interesting. As you attempt greater magnification, and hence increase the
iteration maximum, you will find that the calculation can be painfully slow.
The problem is in fact in the iteration, particularly as it involves complex
arithmetic. In general, we're not particularly interested in efficiency. But in
this case we can clearly identify where the code is inefficient, so that effort
spent here to enhance its speed is well spent. Unfortunately, there is little
that can be done within FORTRAN for this particular problem.
It is possible to gain a considerable improvement in speed if your com-
puter is equipped with an 80x87 coprocessor, however. By writing in machine
language, we can take advantage of the capabilities of the hardware. Such a
subroutine has been written and is included in the software library distributed
with this text. It can be used by replacing the previous code fragment with
the following:
*
*     *******************************************************
*     *                                                     *
*     *     Use ZIP_87 to perform the iteration.            *
*     *                                                     *
*     *     Valid ONLY if your computer is equipped         *
*     *     with an 80x87 coprocessor!!!!!!                 *
*     *                                                     *
*     *******************************************************
*
      call ZIP_87( c, Max_Iteration, counter )
      IF( counter .lt. Max_Iteration ) THEN
*
*        Magnitude of z IS greater than 2, "c" IS NOT a
*        member of the Mandelbrot set.  Denote how far
*        OUTSIDE the set the point "c" is:
*
         call COLOR( mod( counter, number ) )
         call PIXEL( ix0+i, iy0+j )
      ENDIF
You should see a substantial increase in the speed of your Mandelbrot pro-
grams using ZIP_87, provided that your computer is equipped with the nec-
essary hardware.
References
The ultimate reference for the FORTRAN language is the reference manual
supplied by the manufacturer for the specific compiler that you're using. This
manual will provide you with all the appropriate commands and their syntax.
Such manuals are rarely useful in actually learning how to program, however.
There are numerous books available to teach the elements of FORTRAN, but
there are few that go beyond that. Three that we can recommend are
D.M. Etter, Structured FORTRAN 77 For Engineers and Scientists,
Benjamin/Cummings, Menlo Park, 1987.
Michael Metcalf, Effective FORTRAN 77, Oxford University Press,
Oxford, 1988.
Tim Ward and Eddie Bromhead, FORTRAN and the Art of PC Pro-
gramming, John Wiley & Sons, New York, 1989.
Also of interest is the optimistic
Michael Metcalf and John Reid, FORTRAN 8x Explained, Oxford
University Press, Oxford, 1987,
written before the delays that pushed the acceptance of the "new" standard
into 1990.
The Mandelbrot set and the topic of fractals have captured the imagination
of many of us. For the serious enthusiast, there's
Benoit B. Mandelbrot, The Fractal Geometry of Nature, W. H. Free-
man, New York, 1983.
Two of the finest books, both including many marvelous color photographs,
are
H.-O. Peitgen and P. H. Richter, The Beauty of Fractals, Springer-
Verlag, Berlin, 1986.
The Science of Fractal Images, edited by Heinz-Otto Peitgen and Di-
etmar Saupe, Springer-Verlag, Berlin, 1988.
Chapter 2:
Functions and Roots
A natural place for us to begin our discussion of computational physics is with
a discussion of functions. After all, the formal theory of functions underlies
virtually all of scientific theory, and their use is fundamental to any practical
method of solving problems. We'll discuss some general properties, but always
with an eye toward what is computationally applicable.
In particular, we'll discuss the problem of finding the roots of a func-
tion in one dimension. This is a relatively simple problem that arises quite fre-
quently. Important in its own right, the problem provides us an opportunity
to explore and illustrate the interplay among formal mathematics, numeri-
cal analysis, and computational physics. And we'll apply it to an interesting
problem: the determination of quantum energy levels in a simple system.
Finding the Roots of a Function
We'll begin our exploration of computational physics by discussing one of the
oldest of numerical problems: finding the x value for which a given function
f(x) is zero. This problem often appears as an intermediate step in the study
of a larger problem, but is sometimes the problem of interest, as we'll find later
in this chapter when we investigate a certain problem of quantum mechanics.
For low order polynomials, finding the zero of a function is a trivial
problem: if the function is f(x) = x - 3, for example, the equation x - 3 = 0 is
simply rearranged to read x = 3, which is the solution. Closed form solutions
for the roots exist for quadratic, cubic, and quartic equations as well, although
they become rather cumbersome to use. But no general solution exists for
polynomials of fifth-order and higher! Many equations involving functions
other than polynomials have no analytic solution at all.
So what we're really seeking is a method for solving for the root of a
nonlinear equation. When expressed in this way, the problem seems anything
but trivial. To help focus our attention on the problem, let's consider a specific
example: let's try to find the value of x for which x = cosx. This problem is
cast into a "zero of a function" problem simply by defining the function of
interest to be
f(x) = cosx - x. (2.1)
Such transcendental equations are not (generally) solvable analytically.
The first thing we might try is to draw a figure, such as Figure 2.1, in
which cos x and x are plotted. The root is simply the horizontal coordinate at
which the two curves cross. The eye has no trouble finding this intersection,
and the graph can easily be read to determine that the root lies near x = 0.75.
But greater accuracy than this is hard to achieve by graphical methods. Fur-
thermore, if there are a large number of roots to find, or if the function is
not an easy one to plot, the effectiveness of this graphical method rapidly de-
creases - we have no choice but to attempt a solution by numerical means.
What we would like to have is a reliable numerical method that will provide
accurate results with a minimum of human intervention.
FIGURE 2.1 The functions x and cos x.
From the figure, it's clear that a root lies between zero and π/2; that
is, the root is bracketed between these two limits. We can improve our brackets
by dividing the interval in half, and retaining the interval that contains the
root. We can then check to see how small the bracket has become: if it is still
too large, we halve the interval again! By doing this repeatedly, the upper
and lower limits of the interval approach the root. Sounds like this just might
work! Since the interval is halved at each step, the method is called bisection.
The general construction of the computer code might be something like
< Main Program identification>
< declare variables needed here>
< initialize limits of bracket>
< call Root Finding subroutine >
< print result, and end>
*----------------------------------------------------------
< Subroutine identification>
< declaration of variables, etc.>
< prepare for looping: set initial values, etc.>
< TOP of the loop>
< Body of the loop: divide the interval in half,
determine which half contains the root,
redefine the limits of the bracket>
IF (bracket still too big) go to the TOP of the loop
< loop finished:
put on finishing touches (if needed), and end>
These lines are referred to as pseudocode since they convey the intended con-
tent of the program but don't contain the actual executable statements. Let's
first determine what the program should do, then worry about how to do it.
                         Think first, then code.
As indicated in the pseudocode, it's extremely important to use sub-
routines and functions to break the code into manageable pieces. This mod-
ularity aids immensely in the clarity and readability of the code - the main
program simply becomes a list of calls to the various subroutines, so that the
overall logic of the program is nearly transparent. Modularity also isolates
one aspect of a problem from all the others: all the details of finding the root
will be contained in the root-finding subroutine, in effect hidden from the rest
of the program! This structure generally helps in developing and maintaining
a program as well, in that a different approach to one aspect of the problem
can be investigated by swapping subroutines, rather than totally rewriting
the program.
After having a general outline of the entire program in pseudocode, we
can now go back and expand upon its various components. In a large, compli-
cated project, this refinement step will be performed several times, each time
resulting in a more detailed description of the functioning of the code than the
previous one. In this process, the program practically "writes itself." For this
project, we might begin the refinement process by determining what needs to
be passed to the subroutine, and naming those variables appropriately. For
example, the subroutine should be given the limits of the bracketed interval:
let's name these variables Left and Right, for example, in the subroutine.
(Note, however, that different names, such as x_initial and x_final, might
be more appropriate in the main program.) In one sense, providing reasonable
names for variables is simple stuff, easy to implement, and not very impor-
tant. Not important, that is, until you try to remember (6 months from now)
what a poorly named variable, like x34b, was supposed to mean!
Give your variables meaningful names, and declare them ap-
propriately.
We also need to provide the subroutine with the name of the function.
The function, or at least its name, could be defined within the subroutine,
but then the subroutine would need to be changed in order to investigate a
different function. One of the goals of writing modular code is to write it once,
and only once! So we'll declare the function as EXTERNAL in the main program,
and pass its name to the subroutine. Oh, and we want the root passed back
to the calling program! Our first refinement of the code might then look like
< Main Program identification>
*
* Type Declarations
*
      DOUBLE PRECISION x_initial, x_final, Root, FofX
*
*     The function f(x) will be provided in a separate
*     subprogram unit:
*
      External FofX
*
*     Initialize variables
*     (1.57 is a close approximation to pi/2.)
*
      x_initial = 0.d0
      x_final   = 1.57d0
*
*     call the root-finding subroutine
*
      call Bisect( x_initial, x_final, Root, FofX )
*
      < print result, and end >
*----------------------------------------------------------
Double Precision function FofX(x)
*
* This is an example of a nonlinear function whose
* root cannot be found analytically.
*
Double Precision x
FofX = cos (x) - x
end
*----------------------------------------------------------
Subroutine Bisect( Left, Right, middle, F )
*
<prepare for looping: set initial values, etc.>
<TOP of the loop>
<Body of the loop: divide the interval in half,
determine which half contains the root,
redefine the limits of the bracket>
IF (bracket still too big) go to the TOP of the loop
< loop finished:
put on finishing touches (if needed), and end>
The main program is almost complete, although we've yet to begin the root-
finding subroutine itself! This is a general characteristic of the "top-down"
programming we've described - first, write an outline for the overall design
of the program, and then refine it successively until all the details have been
worked out. In passing, we note that the generic cosine function is used in
the function definition rather than the double precision function dcos - the
generic functions, which include abs, sin, and exp, will always match the data
type of the argument.
Now, let's concentrate on the root-finding subroutine. The beginning
of the code might look something like the following:
*----------------------------------------------------------
Subroutine Bisect( Left, Right, Middle, F )
*
* Paul L. DeVries, Department of Physics, Miami University
*
* Finding the root of f(x), known to be bracketed between
* Left and Right, by the Bisection method. The root is
* returned in the variable Middle.
*
*                                    start date: 1/1/93
*
*     Type Declarations
*
      DOUBLE PRECISION f, x
      DOUBLE PRECISION Left, Right, Middle
      DOUBLE PRECISION fLeft, fRight, fMiddle
*
*     Initialization of variables
*
      fLeft   = f(Left)
      fRight  = f(Right)
      Middle  = (Left+Right)/2.d0
      fMiddle = f(Middle)
In addition to using reasonable names for variables, we've also tried to use
selective capitalization to aid in the readability of the code. The single most
important characteristic of your computer programs should be their clarity,
and the readability of the code contributes significantly to that goal. The
FORTRAN compiler does not distinguish between uppercase and lowercase,
but we humans do!
How do we determine which subinterval contains the root? As in most
puzzles, there are many ways we can find the answer. And the idea that occurs
first is not necessarily the best. It's important to try various ideas, knowing
that some will work and some won't. This particular puzzle has a well-known
solution, however. If the root is in the left side, then fLeft and fMiddle will
be of opposite sign and their product will be either zero or negative. So, if
the expression fLeft * fMiddle .le. 0 is true, the root is in the left side;
if the expression is false, the root must be in the right side. Voila! Having
determined which subinterval contains the root, we then redefine Middle to
be Right if the root is in the left subinterval, or Middle to be Left if the root
is in the right subinterval. This part of the code then looks like
*
* Determine which half of the interval contains the root
*
      IF( fLeft * fMiddle .le. 0 ) THEN
*
*        The root is in left subinterval:
*
         Right  = Middle
         fRight = fMiddle
      ELSE
*
*        The root is in right subinterval:
*
         Left  = Middle
         fLeft = fMiddle
      ENDIF
*
*     The root is now bracketed between Left and Right!
*
The IF-THEN-ELSE construct is ideally suited to the task at hand. The IF state-
ment provides a condition to be met, in this case that the root lie in the left
subinterval. If the condition is true, any statements following THEN are exe-
cuted. If the condition is not true, any statements following ELSE are executed.
The ENDIF completes the construct, and must be present.
Our pseudocode is quickly being replaced by actual code, but a critical
part yet remains: how to terminate the process. Exactly what is the appro-
priate criterion for having found a root? Or to put it another way, what is
the acceptable error in finding the root, and how is that expressed as an error
condition?
Basically, there are two ways to quantify an error: in absolute terms,
or in relative terms. The absolute error is simply the magnitude of the differ-
ence between the true value and the approximate value,
Absolute error = |true value - approximate value|.                          (2.2)
Unfortunately, this measure of the error isn't as useful as you might think.
Imagine two situations: in the first, the approximation is 1178.3 while the
true value is 1178.4, and in the second situation the approximation is 0.15
while the true value is 0.25. In both cases the absolute error is 0.1, but clearly
the approximation in the first case is better than in the second case. To gauge
the accuracy of the approximation, we need to know more than merely the
absolute error.
A better sense of the accuracy of an approximation is (usually) con-
veyed using a statement of relative error, comparing the difference between
the approximation and the true value to the true value. That is,
Relative Error = |True Value - Approximate Value| / |True Value|.           (2.3)

The relative error in the first situation is |0.1/1178.4| = 0.00008, while in the sec-
ond case it is |0.1/0.25| = 0.4. The higher accuracy of the first case is clearly as-
sociated with the much smaller relative error. Note, of course, that relative
error is not defined if the true value is zero. And as a practical matter, the only
quantity we can actually compute is an approximation to the relative error,

Approximate Relative Error
    = |Best Approximation - Previous Approximation| / |Best Approximation|. (2.4)
Thus, while there may be times when absolute accuracy is appropri-
ate, most of the time we will want relative accuracy. For example, wanting to
know x to within 1% is usually a more reasonable goal than simply wanting
to know x to within 1, although this is not always true. (For example, in plan-
ning a trip to the Moon you might well want to know the distance to within
1 meter, not to within 1%!) Let's assume that in the present case our goal is
to obtain results with relative error less than 5 × 10^-8. In this context, the
"error" is simply our uncertainty in locating the root, which in the bisection
method is just the width of the interval. (By the way, this accuracy in a trip
to the Moon gives an absolute error of about 20 meters. Imagine being 20
meters above the surface, the descent rate of your lunar lander brought to
zero, and out of gas. How hard would you hit the surface?) After declaring
these additional variables, adding write statements, and cleaning up a few
loose ends, the total code might look something like
*----------------------------------------------------------
Subroutine Bisect( Left, Right, Middle, F )
*
* Paul L. DeVries, Department of Physics, Miami University
*
* Finding the root of the function "F" by BISECTION. The
* root is known(?) to be bracketed between LEFT and RIGHT.
*
*                                    start date: 1/1/93
*
*     Type Declarations
*
      DOUBLE PRECISION Left, Right, Middle
      DOUBLE PRECISION f, fLeft, fRight, fMiddle
      DOUBLE PRECISION TOL, Error
*
*     Parameter declaration
*
      PARAMETER( TOL = 5.d-03 )
*
*     Initialization of variables
*
      fLeft  = f(Left)
      fRight = f(Right)
*
*     Top of the Bisection loop
*
  100 Middle  = (Left+Right)/2.d0
      fMiddle = f(Middle)
*
*     Determine which half of the interval contains the root
*
      IF( fLeft * fMiddle .le. 0 ) THEN
*        The root is in left subinterval:
         Right  = Middle
         fRight = fMiddle
      ELSE
*        The root is in right subinterval:
         Left  = Middle
         fLeft = fMiddle
      ENDIF
*
*     The root is now bracketed between (new) Left and Right
*
*     Check for the relative error condition: If too big,
*     bisect again; if small enough, print result and end.
*
      Error = ABS( (Right-Left)/Middle )
      IF( Error .gt. TOL ) GOTO 100
*
*     The following write is for debugging; remove it after
*     the routine has been debugged:
*
      write(*,*) ' Root found at ', Middle
      end
We've introduced, and declared as DOUBLE PRECISION, the parameter TOL to
describe the desired tolerance - you might recall that the parameter state-
ment allows us to treat TOL as a named variable in FORTRAN statements but
prohibits us from accidentally changing its value. In the initial stages of writ-
ing and testing the program, TOL can be made relatively large and any lurking
errors found quickly. Only after the code is known to be functioning properly
should the tolerance be decreased to achieve the desired level of accuracy. Al-
ternatively, we could pass the tolerance to the subroutine as a parameter. We
have also coded a write statement in the subroutine. This is appropriate dur-
ing the initial development phase, but should be removed - or "commented
out" by inserting * in column 1 - after we're satisfied that the subroutine is
working properly. The purpose of this routine is to find the root - if writing
it is desired, it should be done in a routine which calls Bisect, not in Bisect
itself.
At the bottom of the loop, the relative error is computed. If the error
is greater than the declared tolerance, another bisection step is performed; if
not, the result is printed. To get to the top of the loop, a GOTO statement is
used. Generally speaking, GOTOs should be avoided, as they encourage us to be
sloppy in structuring programs. This can lead to programs that are virtually
unreadable, making it easy for bugs to creep into them and more difficult for
us to be rid of them. But in this instance - returning program control to the
top of a loop - there is no suitable alternative. The program certainly looks
like it's ready to run. In fact, it's foolproof!
Well, maybe not foolproof. It might happen that six months from now
you need to find a root, and so you adopt this code. But what if you mis-
judge, or simply are incorrect, in your bracketing? As developed, the code
assumes that there is a root between the initial Left and Right. In practice,
finding such brackets might be rather difficult, and certainly calls for a differ-
ent strategy than the one implemented here. Modify the code to include an
explicit check that verifies that the root is bracketed. This verification should
be performed after all the initialization has been accomplished but prior to
the beginning of the main loop. In large, complex computer codes such data
validation can become of preeminent importance to the overall success of the
computation.
EXERCISE 2.1
Using the code we've developed, with your data validation, find the
root of the equation f(x) = cos x - x = 0 by the method of bisection.
How many iterates are necessary to determine the root to 8 significant
figures?
Well, bisection works, and it may be foolproof, but it certainly can be slow!
It's easy to determine how fast this method is. (Or slow, as the case may be.)
Defining the error as the difference between the upper and lower bounds on
the root, at every iteration the error is halved. If ε_i is the error at the i-th step,
at the next step we have ε_{i+1} = ε_i/2, and the rate of convergence is said to be
linear. Given initial brackets, we could even determine how many iterations
are necessary to obtain a specific level of accuracy.
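To make that explicit: if the initial bracket has width (b - a) and we want the
final bracket to be no wider than ε, then n bisections suffice when
(b - a)/2^n ≤ ε, that is, when n ≥ log_2[(b - a)/ε].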
It would seem that there would be a better way to find the root, at least
for a nice, smoothly varying function such as ours. And indeed, there is such a
method, due to Newton. But we need to use information about the function,
and how it changes - that is, derivative information. That information is
most succinctly provided in a Taylor series expansion of the function. Since
this expansion is so central to much of what we are doing now, and will be
doing in later chapters, let's take a moment to review the essentials of the
series.
Mr. Taylor's Series
In 1715 Brook Taylor, secretary of the Royal Society, published Methodus In-
crementorum Directa et Inversa in which appears one of the most useful
expressions in much of mathematics and certainly numerical analysis. Al-
though previously known to the Scottish mathematician James Gregory, and
probably to Jean Bernoulli as well, we know it today as the Taylor series.
Since it will play a central role in many of our discussions, it's appropriate
that we take a little time to discuss it in some detail. Let's assume that the
function f(x) has a continuous nth derivative in the interval a ≤ x ≤ b. We
can then integrate this derivative to obtain

\int_a^{x_1} f^{[n]}(x_0)\, dx_0 = f^{[n-1]}(x_1) - f^{[n-1]}(a).           (2.5)

Integrating again, we have

\int_a^{x_2}\int_a^{x_1} f^{[n]}(x_0)\, dx_0\, dx_1
   = \int_a^{x_2} \left( f^{[n-1]}(x_1) - f^{[n-1]}(a) \right) dx_1
   = f^{[n-2]}(x_2) - f^{[n-2]}(a) - (x_2 - a) f^{[n-1]}(a).                (2.6)

Continuing in this way we find, after n integrations,

\int_a^{x_n}\cdots\int_a^{x_1} f^{[n]}(x_0)\, dx_0 \cdots dx_{n-1}
   = f(x_n) - f(a) - (x_n - a) f'(a) - \frac{(x_n - a)^2}{2!} f''(a)
     - \frac{(x_n - a)^3}{3!} f'''(a) - \cdots
     - \frac{(x_n - a)^{n-1}}{(n-1)!} f^{[n-1]}(a).                         (2.7)
To simplify the appearance of the expression, we now substitute x for x_n, and
solve for f(x) to obtain the Taylor series:

f(x) = f(a) + (x - a) f'(a) + \frac{(x-a)^2}{2!} f''(a) + \frac{(x-a)^3}{3!} f'''(a) + \cdots
       + \frac{(x-a)^{n-1}}{(n-1)!} f^{[n-1]}(a) + R_n(x),                  (2.8)

where

R_n(x) = \int_a^{x}\cdots\int_a^{x_1} f^{[n]}(x_0)\, dx_0 \cdots dx_{n-1}.  (2.9)
This remainder term is often written in a different way. Using the mean value
theorem of integral calculus,

\int_a^{x} q(y)\, dy = (x - a)\, q(\xi),   for a ≤ ξ ≤ x,                   (2.10)

and integrating n - 1 more times, the remainder term can be written as

R_n(x) = \frac{(x - a)^n}{n!} f^{[n]}(\xi),                                 (2.11)

a form originally due to Lagrange. If the function is such that

\lim_{n \to \infty} R_n = 0,                                                (2.12)
then the finite series can be extended to an infinite number of terms and we
arrive at the Taylor series expression for f (x).
To illustrate, consider the Taylor series expansion of sin x about the
point x = 0:

f(x) = sin x,          f(0) = 0,
f'(x) = cos x,         f'(0) = 1,
f''(x) = -sin x,       f''(0) = 0,
f'''(x) = -cos x,      f'''(0) = -1,                                        (2.13)

with remainder

R_n(x) = \begin{cases} (-1)^{n/2} \frac{x^n}{n!} \sin\xi, & n even, \\
                       (-1)^{(n-1)/2} \frac{x^n}{n!} \cos\xi, & n odd. \end{cases}   (2.14)
Since the magnitudes of the sine and cosine are bounded by 1, the magnitude
of the remainder satisfies the inequality

|R_n(x)| \le \frac{|x|^n}{n!}.                                              (2.15)

For any given x, the factorial will eventually exceed the numerator and the
remainder will tend toward zero. Thus we can expand f(x) = sin x as an
infinite series,

f(x) = f(0) + (x - 0) f'(0) + \frac{(x - 0)^2}{2!} f''(0) + \frac{(x - 0)^3}{3!} f'''(0) + \cdots
     = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots.                        (2.16)
This is, of course, simply the well-known approximation for the sine function.
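As a quick illustration (this little program is only a sketch to accompany the
expansion, not part of the text's software; the sample points are arbitrary), one
can compare the three-term series with the intrinsic sine:

      Program SINSER
*
*     Sketch: compare the truncated Taylor series
*     x - x**3/3! + x**5/5! with the intrinsic sin(x).
*
      double precision x, series
      integer i
*
      DO i = 1, 8
         x = dble(i) * 0.2d0
         series = x - x**3/6.d0 + x**5/120.d0
         write(*,*) x, series, sin(x), abs(series - sin(x))
      END DO
      end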
FIGURE 2.2 A decidedly unpleasant function!
Now, it might seem to be a small step to say that all functions that
possess an infinite number of derivatives can be expressed as a Taylor series;
but of course, falling off a cliff only takes a small step, too. Consider the
function

f(x) = \begin{cases} e^{-1/x^2}, & for x ≠ 0, \\ 0, & for x = 0, \end{cases}

which is plotted in Figure 2.2. This is not a "nasty" function - it is well-
behaved, and goes smoothly to zero as x → 0. In fact, it goes to zero so strongly
that all its derivatives go to zero there as well. If we then try to use Taylor's
series about a = 0, we find that f(x) = 0 + 0 + ... = 0, everywhere!
The Newton-Raphson Method
To appreciate some of the power of Taylor's series, we'll use it to develop
Newton's method to find the zeros of a function. Assume that we have a good
"guess," so that (x - a) is a small number. Then keeping just the first two
terms of the Taylor series, we have
f(x) ≈ f(a) + (x - a) f'(a).                                                (2.17)
We want to find that value of x for which f(x) =0; setting f(x) equal to zero
and solving for x, we quickly find
x = a - \frac{f(a)}{f'(a)}.                                                 (2.18)

FIGURE 2.3 The Newton-Raphson method in action. Beginning with x_0,
the successive iterates move closer to the zero of the function. The
location of x_4 and the actual crossing are indistinguishable on this
scale.
To see how this works, take a look at Figure 2.3. At x = a the function
and its derivative, which is tangent to the function, are known. Assuming that
the function doesn't differ too much from a straight line, a good approximation
to where the function crosses zero is where the tangent line crosses zero. This
point, being the solution to a linear equation, is easily found - it's given by
Equation (2.18)! Then this point can be taken as a new guess for the root, the
function and its derivative evaluated, and so on. The idea of using one value to
generate a better value is called iteration, and it is a very practical technique
which we will use often. Changing the notation a little, we can calculate the
(i + 1)-th value x_{i+1} from the i-th value by the iterative expression

x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}.                                     (2.19)
For the function f(x) = cos x - x, we have that f'(x) = -sin x - 1.
We know that cos x = 1 at x = 0, and that cos x = 0 at x = π/2, so we might
guess that x = cos x somewhere around x_0 = π/4. We then calculate the zero
to be at

x_1 = \frac{\pi}{4} - \frac{\cos(\pi/4) - \pi/4}{-\sin(\pi/4) - 1} = 0.739536134.   (2.20)
This is a pretty good result, much closer to the correct answer of 0.739085133
than was the initial guess of π/4 = 0.785398163. And as noted, this result can
be used as a new guess to calculate another approximation to the location of
the zero. Thus

x_2 = 0.739536134 - \frac{\cos(0.739536134) - 0.739536134}{-\sin(0.739536134) - 1}
    = 0.739085178,                                                          (2.21)

a result accurate to 7 significant digits.
Beginning with an initial guess xo, the expression is iterated to gen-
erate Xl, X2, , until the result is deemed sufficiently accurate. Typically we
want a result that is accurate to about eight significant digits, e.g., a relative
error of 5 × 10^-8. That means that after each evaluation of x_{i+1} it should be
compared to x_i; if the desired accuracy has been obtained, we should quit.
On the other hand, if the accuracy has not been obtained, we should iterate
again. Since we don't know how many iterations will be needed, a DO loop
is not appropriate for this task. Rather, we need to implement a loop with a
GOTO statement, being sure to include an exit out of the loop after the error
condition has been met.
      Program ROOTS
      Double Precision x, FofX, DERofF
      External FofX, DERofF
      x = 0.8d0
      call Newton( x, FofX, DERofF )
      write(*,*) ' Root found at x = ', x
      end
*----------------------------------------------------------
Double Precision Function FofX(x)
*
* This is an example of a nonlinear function whose
* root cannot be found analytically.
*
Double Precision x
FofX = cos (x) - x
end
*----------------------------------------------------------
Double Precision Function DERofF(x)
*
* This function is the derivative of "F of X."
*
Double Precision x
DERofF = -sin(x) - 1.dO
end
*----------------------------------------------------------
      Subroutine Newton( x, F, Fprime )
*
*     Paul L. DeVries, Department of Physics, Miami University
*
*     Preliminary code for root finding with Newton-Raphson
*
*                                    start date: 1/1/93
*
*     Type declarations
*
      DOUBLE PRECISION x, F, Fprime, delta, error, TOL
*
*     Parameter declarations
*
      PARAMETER( TOL = 5.d-03 )
*
*     Top of the loop
*
  100 delta = -F(x) / Fprime(x)
      x = x + delta
*
*     Check for the relative error condition: If too big,
*     loop again; if small enough, end.
*
      Error = ABS( delta / x )
      IF( Error .gt. TOL ) GOTO 100
      end
EXERCISE 2.2
Verify that this code is functioning properly by finding (again) where
x = cos x. Compare the effort required to find the root with the
Newton-Raphson and the bisection methods.
As we noted earlier, finding the roots of equations often occurs in a larger
context. For example, in Chapter 4 we will find that the zeros of Legendre
functions play a special role in certain integration schemes. So, let's consider
the Legendre polynomial
P_8(x) = \frac{6435 x^8 - 12012 x^6 + 6930 x^4 - 1260 x^2 + 35}{128},
                                              -1 ≤ x ≤ 1,                   (2.22)
and try to find its roots. Where to start? What would be a good initial guess?
Since only the first derivative term was retained in developing the Newton-
Raphson method, we suspect that we need to be close to a root before using
it. For |x| < 1, we have x^8 < x^6 < x^4 < x^2. Let's (temporarily) ignore all the
terms in the polynomial except the last two, and set this truncated function
equal to zero. We thus have

P_8(x_0) \approx \frac{-1260 x_0^2 + 35}{128} = 0,                          (2.23)

and hence

x_0 = \sqrt{\frac{35}{1260}} = \sqrt{\frac{1}{36}} = \frac{1}{6}.           (2.24)
Thus 0.167 should be an excellent guess to begin the iteration for the smallest
non-negative root.
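To use the Newton-Raphson code developed above on this problem, the function
subprograms FofX and DERofF of Program ROOTS might be replaced along the
following lines (a sketch only; the derivative is obtained by differentiating
Equation (2.22) term by term, and the initial guess x = 0.167d0 would be set in
the main program):

*----------------------------------------------------------
      Double Precision Function FofX( x )
*
*     The Legendre polynomial P8(x) of Equation (2.22).
*
      Double Precision x
      FofX = ( 6435.d0*x**8 - 12012.d0*x**6 + 6930.d0*x**4
     +         - 1260.d0*x**2 + 35.d0 ) / 128.d0
      end
*----------------------------------------------------------
      Double Precision Function DERofF( x )
*
*     The derivative of P8(x), differentiated term by term.
*
      Double Precision x
      DERofF = ( 51480.d0*x**7 - 72072.d0*x**5 + 27720.d0*x**3
     +           - 2520.d0*x ) / 128.d0
      end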
EXERCISE 2.3
Use the Newton-Raphson method to find the smallest non-negative
root of Ps(x).
Root-finding in general, and the Newton-Raphson method in partic-
ular, arise in various rather unexpected places. For example, how could we
find the two-thirds power of 13, with only a 4-function hand calculator? That
is, we want to find x such that
$$x = 13^{2/3}. \tag{2.25}$$
Cubing both sides, we have
$$x^3 = 13^2 = 169, \tag{2.26}$$
or
$$x^3 - 169 = 0. \tag{2.27}$$
Interesting.

EXERCISE 2.4
Use the Newton-Raphson method to solve Equation (2.27) for the
two-thirds root of 13, to 8 significant figures.
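Notice that the Newton-Raphson update for Equation (2.27) needs nothing beyond multiplication and division, x_{i+1} = x_i − (x_i³ − 169)/(3x_i²), which is exactly what a 4-function calculator supplies. A minimal sketch of the iteration follows; the starting guess of 5 and the loop limit are arbitrary choices made here, not taken from the text:

      Program TwoThirds
*
*   Newton-Raphson for x**3 - 169 = 0, i.e., x = 13**(2/3).
*   Only multiplication and division appear in the update.
*
      Double Precision x, delta
      Integer i
      x = 5.d0
      DO i = 1, 20
         delta = -( x**3 - 169.d0 ) / ( 3.d0*x**2 )
         x     = x + delta
         IF( abs(delta/x) .lt. 5.d-9 ) GOTO 10
      END DO
   10 write(*,*)' 13**(2/3) = ', x
      end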
Fools Rush In ...
Well, our Newton-Raphson method seems to be working so well, let's try it
to find a root of the equation f(x) = x² + 1 = 0. No problem. Starting the
iteration with x₀ = 1/√3, we find the first few iterates to be

   x₀ =  0.577350269
   x₁ = -0.577350269
   x₂ =  0.577350269
   x₃ = -0.577350269
   x₄ =  0.577350269
   x₅ = -0.577350269

Oops.
Clearly, there's a problem here. Not only are the iterates not con-
verging, they're just flopping back and forth between ±1/√3. This example
forces us to realize that we haven't yet thought enough about how to approach
computing. Perhaps the first thing we need to realize is that the world is not
always nice; sometimes things just don't go our way. While we always try to
do things right, errors will always be made, and we can't expect the computer
to protect us from ourselves. In particular, the computer can't do something
we haven't instructed it to do - it's not smarter than we are. So, just because
the computer is there and is eager to help us if we only type in the code, we
must first decide if the problem is suitable for computation. Perhaps it needs
to be transformed into an equivalent problem, stated differently; perhaps an-
other algorithm should be used; or perhaps we should have realized that the
problem at hand has no real roots.
Computing is not a substitute for thinking.
As we write programs to solve problems of interest, we'll try to an-
ticipate various situations. This will probably require extra programming for
special cases. In the problem above, we might have tried to determine if a real
root existed before we tried to find it. Certainly we can't expect the program
to know what we don't. What we need is a strategy toward computing, a game
plan, so to speak. The Newton-Raphson method is a robust algorithm that
often works very well - it's like a high-risk offense, capable of aggressively
finding a root. What we've missed is what every sports fan knows: while
offense wins games, defense wins championships. We should strive to be in-
telligent in our computing, anticipating various possibilities, and trying not
to make errors. But we should be prepared to fall short of this goal - it's
simply not possible to foresee all eventualities. Reasonable people know that
mistakes will occasionally be made, and take appropriate steps before they oc-
cur. If we can't instruct the computer how always to act correctly, let's at least
instruct it how not to act incorrectly! A major key to successful programming
is to
Compute defensively.
But how do we instruct the computer not to act incorrectly? In this
case, it's pretty easy. When the iteration worked, it was very fast. But when
it failed, it failed miserably. Thus we can require that the algorithm either
finds a root in just a few steps, or quits. Of course, if it quits it should inform
us that it is quitting because it hasn't found a root. (There's nothing quite
as frustrating as a computer program that stops with no explanation. Or
one that prints ERROR, without any indication of what kind of error or where
it occurred.) The maximum number of iterates could be passed to the code
fragment, but it is easier to just set a maximum that seems much greater
than you would ever need, say 30 iterations. The idea is to prevent infinite
loops and to provide a graceful exit for the program.
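A sketch of the Newton routine with such a counter added is shown below. The limit of 30 iterations follows the discussion above; the tolerance of 5.d-8 (the relative error of 5 × 10⁻⁸ mentioned earlier) and the wording of the message are illustrative choices:

      Subroutine Newton ( x, F, Fprime )
*
*   Newton-Raphson with an iteration counter, so that the
*   loop cannot run forever.  (A sketch only.)
*
      DOUBLE PRECISION x, F, Fprime, delta, error, TOL
      INTEGER count
      PARAMETER( TOL = 5.d-8 )
      count = 0
*
*   Top of the loop
*
  100 count = count + 1
      delta = -F(x)/Fprime(x)
      x     = x + delta
      error = ABS( delta / x )
      IF( error .le. TOL ) RETURN
      IF( count .lt. 30 ) GOTO 100
*
*   Arriving here means 30 iterations were not enough.
*
      write(*,*)' No root found in 30 iterations, x = ', x
      end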
EXERCISE 2.5
Modify the displayed code to keep track of the number of iterations,
and to exit the loop and end after writing an appropriate message if
a root is not found in 30 iterations. Test your modifications on the
example above, f(x) = x² + 1, with an initial x of 0.5. You might want
to print out the iterates, just to see how the search proceeds.
Rates of Convergence
We previously saw that the bisection method was linear in its convergence.
Our experience with the Newton-Raphson scheme is that if it converges, it
converges very rapidly. To quantify this statement, let's return to the Taylor
series
$$f(x) = f(x_i) + (x - x_i)\, f'(x_i) + \frac{(x - x_i)^2}{2!}\, f''(x_i) + \cdots. \tag{2.28}$$
We'll assume that we're in a region where the function is well behaved and the
first derivative is not small. Keeping the first two terms and setting f(x) = 0,
we're led to the iteration
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}, \tag{2.29}$$
just as before. Recall that both x_i and x_{i+1} are approximations to the location
of the root - if x is where the root actually is, then ε_i = x − x_i is a measure
of the error in x_i. Subtracting x from both sides of Equation (2.29), we find
$$\epsilon_{i+1} = \epsilon_i + \frac{f(x_i)}{f'(x_i)}, \tag{2.30}$$
an expression relating the error at one iteration to the next. Let's return to
the Taylor series, this time keeping the first three terms in the series. Setting
f(x) = 0 and solving for f(x_i), we find
$$f(x_i) = -\epsilon_i\, f'(x_i) - \frac{\epsilon_i^2}{2}\, f''(x_i). \tag{2.31}$$
Substituting this expression into the previous one, we find that
$$\epsilon_{i+1} = \epsilon_i + \frac{-\epsilon_i\, f'(x_i) - \frac{\epsilon_i^2}{2} f''(x_i)}{f'(x_i)} = -\frac{\epsilon_i^2\, f''(x_i)}{2 f'(x_i)}. \tag{2.32}$$
If f''/f' is approximately constant near the root, then at each step the error
is proportional to the square of the error at the previous step - if the error
is initially small, it gets smaller very quickly. Such convergence is termed
quadratic. In the Newton-Raphson iteration, each iteration essentially dou-
bles the number of significant figures that are accurate. (It is possible, how-
ever, for the method not to converge: this might happen if the initial error
is too large, for example. It's also possible that f' is near zero in the vicinity
of the root, which happens if there are two roots of f (x) close to one another.
Could you modify the method presented here to treat that situation?)
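To get a feeling for what quadratic convergence means in practice, suppose the factor f''/2f' in Equation (2.32) happens to be of order unity (an assumption made purely for illustration); then an initial error of 10⁻² is reduced roughly as
$$\epsilon_0 = 10^{-2}, \quad \epsilon_1 \approx \epsilon_0^2 = 10^{-4}, \quad \epsilon_2 \approx \epsilon_1^2 = 10^{-8}, \quad \epsilon_3 \approx \epsilon_2^2 = 10^{-16},$$
so the number of correct digits roughly doubles with each iteration, while bisection gains only a single bit - a factor of 2 in the bounding interval - per step.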
It would be desirable to combine the slow-but-sure quality of the bi-
section method with the fast-but-iffy characteristic of Newton's method to
come up with a guaranteed winner. With a little thought, we can do just that.
Let's start with the fact (verified?) that the root is bounded between x = a
and x = c, and that the best current guess for the root is x = b. (Note that ini-
tially we may choose b to be either a or c.) We'll include the iteration counter,
so that the code doesn't loop forever. But the critical step is to decide when
to take a bisection step, and when to take a Newton-Raphson step. Clearly,
we want to take a Newton-Raphson wherever possible. Since we know that
the root is bracketed, it's clearly in our interest to only take steps that lie
within those limits. Thus the crucial step is to determine if the next Newton-
Raphson guess will be outside the bounding interval or not. If the next guess
is within the bounds, then
$$a \le b - \frac{f(b)}{f'(b)} \le c. \tag{2.33}$$
Subtracting b from all terms, multiplying by −f'(b), and subtracting f(b), this
inequality can be rewritten as
$$(b - a)\, f'(b) - f(b) \ge 0 \ge (b - c)\, f'(b) - f(b). \tag{2.34}$$
This inequality will be satisfied if the next iterate lies between a and c. An
easy way of determining if this is so is to compare the product of the first term
and the last term to zero. If the product is less than or equal to zero, then
the next Newton-Raphson guess will fall within the known limits, and so a
Newton-Raphson step should certainly be taken; if this condition is not met, a
bisection step should be taken instead. Now, to attempt a judicious merger of
the bisection and Newton-Raphson methods using this logic to decide which
method to utilize ...
      Program MoreROOTS
      Double Precision x_initial, x_final, x_root, FofX,
     +                 DERofF
      External FofX, DERofF
      x_initial = ...
      x_final   = ...
      call Hybrid( x_initial, x_final, x_root, FofX, DERofF )
      write(*,*)' root at x =', x_root
      end
*----------------------------------------------------------
      Double Precision Function FofX(x)
      Double Precision x
*
*        < Define the function >
*
      end
*----------------------------------------------------------
      Double Precision Function DERofF(x)
      Double Precision x
*
*        < Define the derivative of the function. >
*
      end
*----------------------------------------------------------
      Subroutine Hybrid ( Left, Right, Best, F, Fprime )
*
*  Paul L. DeVries, Department of Physics, Miami University
*
*     A hybrid BISECTION/NEWTON-RAPHSON method for
*     finding the root of a function F, with derivative
*     Fprime. The root is known to be between Left and
*     Right, and the result is returned in 'best'. If the
*     next NR guess is within the known bounds, the step
*     is accepted; otherwise, a bisection step is taken.
*
*                                      start date: 1/1/93
*
*  The root is initially bracketed between Left and Right:
*
*        x :     Left        Best        Right
*     f(x):     fLeft       fBest       fRight
*    f'(x):                 DerfBest
*
*  Type Declarations
*
      Double Precision f,fprime,Left,fLeft,Right,fRight,
     +                 Best,fBest,DerfBest,delta,TOL
      integer count
*
*  Initialize parameters
*
      Parameter (TOL = 5.d-3)
*
*  Initialization of variables
*
      fLeft  = f(Left)
      fRight = f(Right)
*
*  Verify that root is bracketed:
*
      IF( fLeft * fRight .gt. 0 ) STOP 'root NOT bracketed'
*
*  Just to get started, let BEST = ...
*
      IF( abs(fLeft) .le. abs(fRight) ) THEN
         Best  = Left
         fBest = fLeft
      ELSE
         Best  = Right
         fBest = fRight
      ENDIF
      DerfBest = fprime(Best)
*
*  COUNT is the number of times through the loop
*
      count = 0
  100 count = count + 1
*
*  Determine Newton-Raphson or Bisection step:
*
      IF( ( DerfBest * (Best-Left ) - fBest ) *
     +    ( DerfBest * (Best-Right) - fBest ) .le. 0 )
     +  THEN
*
*  O.K. to take a Newton-Raphson step
*
         delta = -fBest/DerfBest
         Best  = Best + delta
      ELSE
*
*  take a bisection step instead
*
         delta = (Right-Left)/2.d0
         Best  = (Left+Right)/2.d0
      ENDIF
*
*  Compare the relative error to the TOLerance
*
      IF( abs(delta/Best) .le. TOL ) THEN
*
*  Error is BELOW tolerance, the root has been found!
*
*        write(*,*)'root found at ',Best
      ELSE
*
*  The relative error is too big, prepare to loop again
*
         fBest    = f(Best)
         DerfBest = fprime(Best)
*
*  Adjust brackets
*
         IF( fLeft * fBest .le. 0 ) THEN
*
*  The root is in left subinterval:
*
            Right  = Best
            fRight = fBest
         ELSE
*
*  The root is in right subinterval:
*
            Left  = Best
            fLeft = fBest
         ENDIF
*
*  Test for iteration count:
*
         IF( count .lt. 30 ) goto 100
*
*  Can only get to this point if the ERROR is TOO BIG
*  and if COUNT is greater than 30. Time to QUIT !!!
*
         STOP 'Root not converged after 30 iterations.'
      ENDIF
      end
This looks like it just might work ...
EXERCISE 2.6
Use the hybrid method discussed above to find the root of the equation
$$x^2 - 2x - 2 = 0,$$
given that there is a root between 0 and 3. You might find it interest-
ing to add write statements indicating whether a Newton-Raphson
or bisection step is being taken. Such statements can be a great help
in writing and debugging programs, and can then be commented out
when no longer needed. (And easily reinstated if a problem in the
code later develops!)
We should note that in our implementation, STOP statements were en-
countered if i) the root was not initially bracketed or if ii) the result hadn't
converged after 30 iterations. These statements will cause the execution of
the program to terminate. In a large, long-running program it might be
preferable for the program to continue to execute, but perhaps to take some
other action if these conditions have been met. To achieve that, we can intro-
duce an error flag. Typically an integer, the flag would be passed to Hybrid
in the argument list, initialized to zero, and the STOP statements replaced by
statements assigning a value to the flag. For example, if the root were not
bracketed, the flag could be set to 1; if the result didn't converge, the flag
could be set to 2. If either of these conditions were met, the value of the flag
would change but the program would continue to execute. Control statements
would then be added to ROOTS, after the call to Hybrid, to test the value of the
flag: the flag being zero would indicate that the subroutine had executed as
expected, but a nonzero flag would indicate an error. Appropriate action, re-
defining the initial interval, for example, could then be taken, depending upon
the error that had been encountered.
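A sketch of how this might look is given below. The flag name iflag and the wording of the messages are illustrative choices made here; the values 1 and 2 follow the discussion above. The point is simply that Hybrid returns quietly and the calling program decides what to do:

*   Inside Hybrid, add an integer iflag to the argument list
*   and replace the two STOP statements by
*
*         iflag = 1          (root not bracketed)
*         RETURN
*   and
*         iflag = 2          (no convergence in 30 steps)
*         RETURN
*
*   The calling program then tests the flag after the call:
*
      iflag = 0
      call Hybrid( x_initial, x_final, x_root, FofX, DERofF, iflag )
      IF( iflag .eq. 0 ) THEN
         write(*,*)' root at x =', x_root
      ELSEIF( iflag .eq. 1 ) THEN
         write(*,*)' root NOT bracketed - redefine the interval'
      ELSE
         write(*,*)' no convergence after 30 iterations'
      ENDIF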
Exhaustive Searching
We now have a good method for finding a root, if you first know that the root
is bounded. But how do you find those bounds? Unfortunately, the answer
is that there is no good way of finding them. The reason, of course, is that
finding the bounds is a global problem - the root might be found anywhere
- but finding the root, after the bounds are known, is a local problem. Almost
by definition, local problems are always easier to solve than global ones.
So, what to do? One possibility is to graph the curve, and let your
eye find the bounds. This is highly recommended, and for simple functions
can be done with pencil and paper; for more complicated ones, the computer
can be used to graph the function for you. But we can also investigate the
function analytically. When we were finding an initial guess for the root of
the Legendre function P₈(x), we knew that all the roots were less than 1. As a
result, x⁸ < x⁶ < x⁴ < x², so we ignored the leading terms in the polynomial.
Keeping only the two largest terms, we could exactly solve for the root of the
truncated function. In a similar fashion, we can obtain a good guess for the
largest root of a polynomial, if it's greater than 1. To illustrate, consider the
quadratic equation
$$f(x) = x^2 - 11x + 10. \tag{2.35}$$
This can of course be solved exactly, and roots found at x = 1 and 10. But
keeping just the leading two terms, we have the truncated function
$$\tilde{f}(x) = x^2 - 11x, \tag{2.36}$$
which has a root at
$$x = 11, \tag{2.37}$$
a very reasonable approximation to the root of the original function, obtained
without taking a square root. Of course, the value of the approximation is
more impressive if the problem isn't quite so obvious. Consider, for example,
the function
$$f(x) = x^3 - 7x^2 - 10x + 16. \tag{2.38}$$
Our approximation to the largest root is 7 - but what is the exact value?
While there are shortcuts for polynomials - many more than we've
mentioned - there are no similar tricks for general functions. The only
method that is generally applicable is brute force, exhaustive searching. The
most common strategy is simply to step along, evaluating the function at each
step and determining if the function has changed sign. If it has, then a root is
located within that step; if not, the search is continued. The problem with this
procedure, of course, is that the step might be so large that the function has
changed sign twice, e.g., there are two roots in the interval, in which case this
method will not detect their presence. (If derivatives are easily obtained, that
information can be incorporated in the search by determining if the derivative
has changed sign. If it has, that suggests that there might be two roots in the
interval - at least, there is a minimum.)
Of course, determining an appropriate-sized step is as much educated
guesswork as anything else. Use any and all information you have about the
function, and be conservative. For example, we know that there are 4 roots of
P₈(x) between 0 and 1. If they were equally distributed, they might be 0.333
apart. Any search should certainly use a step no larger than half this, 0.167, if
not smaller.
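A bare-bones version of such a search is sketched below: it simply steps from a to b and reports every subinterval in which the supplied function changes sign. The routine name Scan and its argument list are choices made for this sketch; the caller supplies the function and picks the step size:

      Subroutine Scan ( F, a, b, step )
*
*   Exhaustive search: step from a to b in increments of
*   step, reporting each subinterval in which F changes
*   sign.  (A sketch only; it assumes step > 0 and does not
*   try to catch a pair of roots inside a single step.)
*
      Double Precision F, a, b, step, x, fold, fnew
      External F
      x    = a
      fold = F(x)
  100 x    = x + step
      fnew = F(x)
      IF( fold * fnew .le. 0.d0 ) THEN
         write(*,*)' sign change between ', x-step, ' and ', x
      ENDIF
      fold = fnew
      IF( x .lt. b ) GOTO 100
      end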
EXERCISE 2.7
Find all the non-negative roots of P₈(x). Use an exhaustive search to
isolate the roots, and then use the hybrid algorithm you've developed
to locate the roots themselves to a relative accuracy of 5 × 10⁻⁸.
Look, Ma, No Derivatives!
With the safeguards we've added, the hybrid Bisection/Newton-Raphson al-
gorithm you've developed is a good one. Unfortunately, it often happens that
the required derivatives are not directly available to us. (Or perhaps the func-
tion is available but is so complicated that obtaining its derivative is difficult;
or having the derivative, getting it coded without errors is unlikely.) For those
situations, we need a method that requires only evaluations of the function,
and not its derivatives, such as the bisection method.
Well, bisection works, but it certainly can be slow! The reason, of
course, is that we have not been very resourceful in using the information
available to us. After all, we know (assume?) that the root is bounded, a ≤
x ≤ c, and we know the value of the function at the limits of the interval. Let's
use this information as best we can, and approximate f(x) by the straight
line passing through the known values of the function at the endpoints of the
interval, [a, f(a)] and [c, f(c)]. This leads to the method of false position. You
can easily verify that the line is given by the equation
$$p(x) = \frac{x - c}{a - c}\, f(a) + \frac{x - a}{c - a}\, f(c) \tag{2.39}$$
and can be thought to be an approximation to the function f(x). Being a
simpler equation, its root can be easily determined: setting p(x) = 0, we find
$$\bar{x} = \frac{a\, f(c) - c\, f(a)}{f(c) - f(a)}. \tag{2.40}$$
This approximation is depicted graphically in Figure 2.4. Since f(a) and f(c)
are opposite in sign, there's no worry that the denominator might vanish. It
should be fairly obvious that in many common circumstances this is a consid-
erable improvement over the bisection method. (An interesting analogy can
be found in finding someone's phone number in the telephone book. To find
Arnold Aaron's telephone number, the "bisection" method would first look
under M, halfway in the book. It would then divide the interval in half, and
look under G, about a quarter of the way through the book, and so on. In con-
trast, the method of false position would expect to find Arnold's name early
in the phone book, and would start looking only a few pages into it.)
FIGURE 2.4 The function f(x) = cos x − x and its linear approxi-
mation, p(x).
After the guess is made, the iteration proceeds by determining which
subinterval contains the root and adjusting the bounds a and c appropriately.
Since only one of the endpoints is changed, it is entirely possible, even likely,
that one of these points will be fixed. (Is this statement obviously true? If not,
look at the figure once again.) For example, the method might converge to the
root from above, so that each iteration leaves c closer and closer to the root,
but never changes a. What are important are the successive guesses, and the
difference in them (à la Newton-Raphson) rather than the difference between
the bounds. Thus, you must keep track of x_i and x_{i+1} as well as a and c.
Beginning with your existing bisection code, only a few changes are
necessary to transform it into a false position code. The first, of course, is to
modify the expression for the root: instead of the variable Middle, we'll use
NewGuess, and define it according to Equation (2.40). Also, we need to keep
tabs on the successive approximations: we'll use OldGuess as the previous
approximation. An outline of the code might look like
      Subroutine False ( Left, Right, NewGuess, F )
*
*       < prepare for looping: set initial values, etc.  To
*         get started, must assign some value to OldGuess. >
*
      OldGuess = Right
*       < TOP of loop >
      NewGuess = ( Left * fRight - Right * fLeft )
     +           / ( fRight - fLeft )
      fGuess   = f(NewGuess)
      Error    = ABS( (NewGuess - OldGuess)/NewGuess )
      IF( Error .gt. TOL ) THEN
*          < determine which subinterval contains
*            the root, and redefine Left and Right
*            accordingly >
         OldGuess = NewGuess
*          < go to TOP of loop >
      ELSE
*          < NewGuess is a good approximation to
*            the root. >
      ENDIF
      end

EXERCISE 2.8
Modify your old bisection code to use the method of false position. Use
the new code to find the root of f(x) = x − cos x = 0, and compare the
effort required, i.e., the number of iterations, to that of the bisection
method.
The method of false position appears to be an obvious improvement
over the bisection method, but there are some interesting situations in which
that's not so. Consider the example in Figure 2.5. Although the root is brack-
eted, the bounds aren't close enough to the root to justify a linear approxima-
tion for the function. In this case, the method of false position might actually
require more function evaluations than bisection.
FIGURE 2.5 An example illustrating that the convergence of the
method of false position can be lethargic.
We can rewrite Equation (2.40) as
$$\bar{x} = a - f(a)\, \frac{c - a}{f(c) - f(a)}, \tag{2.41}$$
which looks suspiciously like Newton-Raphson with the derivative approxi-
mated by
$$f'(a) \approx \frac{f(c) - f(a)}{c - a}. \tag{2.42}$$
Recall, however, that one of the endpoints will be fixed as we approach the
root. That is, c might be fixed as a approaches the root. Then c- a approaches
a constant, but not zero. Although Equation (2.42) looks something like a
derivative, this expression does not approach the actual derivative in the limit.
However, the successive iterative approximations do approach one an-
other, so that they can be used to approximate the derivative. Using the pre-
vious two iterations for the root to approximate the derivative appearing in
Newton's method gives us the Secant method. That is, if x_i and x_{i−1} are the
previous two approximations to the root, then the next approximation, x_{i+1},
is given as
$$x_{i+1} = x_i - f(x_i)\, \frac{x_i - x_{i-1}}{f(x_i) - f(x_{i-1})}. \tag{2.43}$$
As with the Newton-Raphson method, the Secant method works very well
when it works, but it's not guaranteed to converge. Clearly, a hybrid combi-
nation of the Bisection and Secant methods will provide the superior method
for finding roots when explicit derivative information is not available.
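The heart of the change is small: the derivative DerfBest in Hybrid is replaced by the finite-difference slope built from the two most recent iterates. As a freestanding illustration of the update (2.43) - not the full safeguarded routine - here is a sketch applied to the familiar f(x) = cos x − x; the starting guesses and the iteration limit are arbitrary choices:

      Program SecantDemo
*
*   A bare-bones Secant iteration, Equation (2.43), applied
*   to f(x) = cos(x) - x.  No bracketing or safeguards are
*   included - this only illustrates the update itself.
*
      Double Precision xOld, xNew, xNext, fOld, fNew
      Integer i
      xOld = 0.5d0
      xNew = 1.0d0
      fOld = cos(xOld) - xOld
      DO i = 1, 20
         fNew  = cos(xNew) - xNew
         xNext = xNew - fNew * ( xNew - xOld )/( fNew - fOld )
         xOld  = xNew
         fOld  = fNew
         xNew  = xNext
         IF( abs( (xNew-xOld)/xNew ) .lt. 5.d-9 ) GOTO 10
      END DO
   10 write(*,*)' root found at x = ', xNew
      end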
EXERCISE 2.9
Replace Newton-Raphson by Secant in the Hybrid code, so that only
function evaluations - no derivatives - are used to find the root of
a function. Use this method to find a root of f(x) = sin x − x/2 = 0
between π/2 and π.
Accelerating the Rate of Convergence
For the exercises you've seen so far, and for many similar problems, the speed
with which the computer finds a root is not really an issue: the time required
to find a solution has all been in the development, writing, and debugging
of the computer program, not in the actual calculation. But effort spent in
developing good, reliable methods is spent only once - you're now learning
how to find roots, and that effort shouldn't be duplicated in the future. In
fact, you should regard your current effort as an investment for the future, to
be paid off when you desperately need a solution to a particular problem and
realize you already have all the tools needed to obtain it.
In the real world we usually find that the problems that interest us
become more complicated, and the functions get harder to evaluate. As a re-
sult, the time required to calculate a solution gets longer. Thus the efficiency
of a root finder should be discussed in terms of the number of function eval-
uations required, not in the complexity of the root-finding algorithm. For
example, perhaps the "function" whose zero you're trying to find is actually
a multidimensional integral that requires an hour of supercomputer time to
evaluate. Not having a supercomputer at your disposal, that's fifty hours on
your local mainframe, or ten days on your microcomputer. Very quickly, you
realize that life will pass you by if you can't find that root Real Soon Now!
And a few extra multiplications to find that root, at microseconds each, don't
add much to the total time involved.
Thus, we want a root, and we want it with as few function, and/or
derivative, evaluations as possible. The key is to realize that we've been all
too eager to discard expensive, hard-to-obtain information -let's see if we
can't use some of those function evaluations that we've been throwing away.
For concreteness, let's look at the Secant method a little more closely.
The harder information is to obtain, the more reluctant we
should be to discard it.
In the Hybrid Bisection/Secant method we've developed, a linear ap-
proximation is used to obtain the next approximation to the root, and then
the endpoints of the interval are adjusted to keep the root bounded. By using
three points, which we have, a quadratic could be fitted to the function. We
would then have a quadratic approximation for the root rather than a linear
one, and with no additional function evaluations. Sounds good.
Consider the points x_0, x_1, and x_2, and the function evaluated at these
points. We can think of these as three successive approximations to the root
of the function f(x). The quadratic
$$p(x) = a\,(x - x_2)^2 + b\,(x - x_2) + c \tag{2.44}$$
will pass through these points if
$$c = f(x_2),$$
$$b = \frac{(x_0 - x_2)^2\,[f(x_1) - f(x_2)] - (x_1 - x_2)^2\,[f(x_0) - f(x_2)]}{(x_0 - x_1)(x_0 - x_2)(x_1 - x_2)},$$
$$a = \frac{(x_1 - x_2)\,[f(x_0) - f(x_2)] - (x_0 - x_2)\,[f(x_1) - f(x_2)]}{(x_0 - x_1)(x_0 - x_2)(x_1 - x_2)}. \tag{2.45}$$
The next approximation to the root, x_3, is then found by setting p(x_3) = 0 and
solving the quadratic equation to find
$$x_3 - x_2 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{2.46}$$
We expect (or, at least, we hope) that we are converging to the root, so that
x_3 − x_2 is small in magnitude. But according to Equation (2.46), this will only
happen if −b and √(b² − 4ac) are nearly equal in magnitude and opposite in
sign. That is, the small magnitude will be achieved by taking the difference
between two nearly equal numbers, not a very sound practice. Rather than
evaluating the root by Equation (2.46), let's develop an alternate form that
has better numerical characteristics.
Let's assume for the moment that b ≥ 0 - then the root will be as-
sociated with the plus sign in the expression, and we can rewrite Equation
(2.46) as
$$x_3 - x_2 = \left( \frac{-b + \sqrt{b^2 - 4ac}}{2a} \right) \left( \frac{-b - \sqrt{b^2 - 4ac}}{-b - \sqrt{b^2 - 4ac}} \right)
           = \frac{b^2 - (b^2 - 4ac)}{2a\,(-b - \sqrt{b^2 - 4ac})}
           = \frac{-2c}{b + \sqrt{b^2 - 4ac}}, \qquad b \ge 0. \tag{2.47}$$
For b ≤ 0, the same reasoning leads to
$$x_3 - x_2 = \left( \frac{-b - \sqrt{b^2 - 4ac}}{2a} \right) \left( \frac{-b + \sqrt{b^2 - 4ac}}{-b + \sqrt{b^2 - 4ac}} \right)
           = \frac{b^2 - (b^2 - 4ac)}{2a\,(-b + \sqrt{b^2 - 4ac})}
           = \frac{2c}{-b + \sqrt{b^2 - 4ac}}, \qquad b \le 0. \tag{2.48}$$
It should be fairly clear that the Hybrid Bisection/Secant method can
easily be modified to incorporate this quadratic approximation to find the root.
The quadratic approximation itself is due to Müller; coupling it with the Bisec-
tion method is due to Brent. The result is a robust, virtually failsafe method
of finding roots using only function evaluations.
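The quadratic step itself, taken directly from Equations (2.45), (2.47), and (2.48), might be coded as in the fragment below. Here x0, x1, x2 hold the three most recent approximations and f0, f1, f2 the corresponding function values; the names are illustrative, and a real routine would also fall back to bisection whenever b² − 4ac is negative or the step leaves the bounding interval:

*   Quadratic (Muller-type) step from three points, using
*   the numerically stable forms (2.47) and (2.48).
*
      Double Precision x0,x1,x2,x3,f0,f1,f2,a,b,c,disc
*
      c = f2
      b = ( (x0-x2)**2*(f1-f2) - (x1-x2)**2*(f0-f2) )
     +    / ( (x0-x1)*(x0-x2)*(x1-x2) )
      a = ( (x1-x2)*(f0-f2) - (x0-x2)*(f1-f2) )
     +    / ( (x0-x1)*(x0-x2)*(x1-x2) )
      disc = sqrt( b*b - 4.d0*a*c )
      IF( b .ge. 0.d0 ) THEN
         x3 = x2 - 2.d0*c / (  b + disc )
      ELSE
         x3 = x2 + 2.d0*c / ( -b + disc )
      ENDIF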
EXERCISE 2.10
Modify your code to include this accelerated method. Verify that it's
functioning correctly by finding x such that
cos x = x sin x. (2.49)
Before leaving this section, let's see if we can't extract a little more
information from our expression for the root. Differentiating (2.44), we find
that
$$p'(x) = 2a\,(x - x_2) + b, \tag{2.50}$$
so that
$$p'(x_2) = b. \tag{2.51}$$
Near the root, we'll assume that the derivative is nonzero, while the function
itself is small. That is, b = p'(x_2) ≠ 0 while c = p(x_2) ≈ 0. We can then write
$$x_3 - x_2 = \frac{-2c}{b + \sqrt{b^2 - 4ac}} = \frac{-2c}{b + b\sqrt{1 - 4ac/b^2}}, \qquad b \ge 0, \tag{2.52}$$
where 4ac/b² is a small term. Making small argument expansions, we find
$$x_3 - x_2 \approx \frac{-2c}{b + b\,(1 - 2ac/b^2 + \cdots)} \approx -\frac{2c}{2b}\,\frac{1}{1 - ac/b^2}
\approx -\frac{c}{b}\left(1 + \frac{ac}{b^2} + \cdots\right) \approx -\frac{c}{b} - \frac{ac^2}{b^3}
= -\frac{f(x_2)}{f'(x_2)} - \frac{f''(x_2)\, f^2(x_2)}{2\,[f'(x_2)]^3}. \tag{2.53}$$
The first term we recognize as just the Newton-Raphson expression - the
second is a correction term, partially accounting for the nonlinearity of the
actual function. (That is, a quadratic contribution.) This expression has an
interesting geometrical interpretation. Rather than simply putting a straight
line through [x_2, f(x_2)], the line is drawn through a point that is moved up
or down to compensate for the shape of the function, as seen in Figure 2.6.
The line that's drawn is still linear, but the curvature of the function has been
taken into account.
FIGURE 2.6 Accelerating the rate of convergence.
A Little Quantum Mechanics Problem
Our interest is not in numerical methods per se, but in investigating physical
processes and solving physical problems. The only reason we've been looking
at the roots of equations is because there are problems of interest to us that
present themselves in this way. One such problem arises in quantum mechan-
ics and is routinely used for illustration in elementary textbooks: finding the
eigenvalues of the finite square well.
Newton's equations of motion play a fundamental role in classical me-
chanics; in quantum mechanics, that role is played by the Schrödinger equa-
tion,
$$-\frac{\hbar^2}{2m}\,\frac{d^2\psi}{dx^2} + V(x)\,\psi(x) = E\,\psi(x). \tag{2.54}$$
This is the equation that determines "all there is to know" about the one-
dimensional problem of a particle of mass m moving in the potential V(x).
In this equation, ℏ is a fundamental constant due to Planck, E is the total
energy, and ψ is the wavefunction of the system. ψ is the unknown quantity
that you're solving for, the "solution" to the Schrödinger equation. Once ψ is
determined, all the observable quantities pertaining to the system can be cal-
culated. In particular, ψ*ψ dx is the probability of finding the particle between
x and x + dx.
Now comes the hard part: it can happen that E is unknown as well!
It would seem that there are too many unknowns in the Schrödinger equation
for us to solve the problem, and that would be true except for certain physical
constraints that we impose. Let's consider a specific example, the infinite
square well.
FIGURE 2.7 The infinite square well.
As illustrated in Figure 2.7, the potential for the infinite square well
is zero between −a and a, but infinite outside that region. The Schrödinger
equation can be solved in the interior of the well with the general result that
$$\psi(x) = A \sin kx + B \cos kx, \tag{2.55}$$
where
$$k = \sqrt{\frac{2mE}{\hbar^2}}. \tag{2.56}$$
Outside the interval −a ≤ x ≤ a, the potential is infinite and so we shouldn't
expect to find the particle there. If the particle can't be found there, then the
wavefunction is zero in that region. Since we also expect that the probability
of finding a particle is a continuous function, we require that the wavefunc-
tion vanishes at ±a. That is, the physics of the problem requires that the
wavefunction vanish at these points. We have thus established boundary con-
ditions that must be satisfied by any mathematical solution in order to be
physically correct.
Now we need to impose these boundary conditions upon the general
solution. At x = −a, we require that
$$-A \sin ka + B \cos ka = 0, \tag{2.57}$$
while at x = +a, we require that
$$A \sin ka + B \cos ka = 0. \tag{2.58}$$
We now add and subtract these two equations, to obtain
$$B \cos ka = 0 \tag{2.59}$$
and
$$A \sin ka = 0. \tag{2.60}$$
Consider the second equation, A sin ka = 0. This can be accomplished in one
of two ways: either A or sin ka is identically zero. Let's take A = 0. Then,
if the wavefunction is to be at all interesting, B ≠ 0. But B cos ka = 0, and
if B ≠ 0, then we must have cos ka = 0. The only way the cosine can vanish
is if the argument is equal to π/2, 3π/2, .... That is, we've found that the
boundary condition can be met if
$$ka = (n + 1/2)\,\pi, \qquad n = 0, 1, \ldots \tag{2.61}$$
or
$$E_n = \frac{(n + 1/2)^2\, \pi^2 \hbar^2}{2 m a^2}, \qquad n = 0, 1, \ldots \tag{2.62}$$
We have thus found an entire set of discrete energies for which the boundary
condition is met, corresponding to the case with A = 0, B i= 0, in which all
the solutions are even functions of x.
We can also take A ≠ 0. We then find that sin ka = 0, so that ka =
0, π, 2π, .... This leads us to the odd solutions for which A ≠ 0, B = 0, and
$$E_n = \frac{n^2 \pi^2 \hbar^2}{2 m a^2}, \qquad n = 1, 2, \ldots \tag{2.63}$$
Again, we find an entire set of solutions corresponding to a particular parity.
A general consequence of boundary conditions is this restriction to a set of
solutions - not every value of the energy is permitted! Only certain values
of the energy lead to solutions that satisfy both the differential equation and
the boundary conditions: the eigenvalues.
But what if the potentials are not infinitely large? In the finite square
well problem, we identify three different regions, as indicated in Figure 2.8:
FIGURE 2.8 The finite square well.
We will only concern ourselves with states of the system that are lo-
calized, having energy less than V₀. In region I, the Schrödinger equation is
$$-\frac{\hbar^2}{2m}\,\frac{d^2\psi_I}{dx^2} + V_0\,\psi_I(x) = E\,\psi_I(x), \tag{2.64}$$
which has as a general solution
$$\psi_I(x) = C e^{\beta x} + D e^{-\beta x}, \qquad \beta = \sqrt{2m(V_0 - E)/\hbar^2}. \tag{2.65}$$
We might be tempted to require both C and D to be zero so that the wavefunc-
tion would be zero and there would be no possibility of finding the particle in
region I. But this is inconsistent with the experiment! Sometimes, it's not easy
being a physicist. One of the real surprises of quantum mechanics is that the
wavefunction for a particle can be nonzero in places that classical mechan-
ics would not allow the particle to be: region I is such a classically forbidden
region. What we do find, however, is that the farther into the classically for-
bidden region we look, the less likely it is to find the particle. That is, the
wavefunction must decrease as it goes into the barrier. The correct boundary
condition is then that D must identically vanish, else the probability would
increase as the forbidden region was penetrated, contrary to the above discus-
sion.
In region II the general solution is
$$\psi_{II}(x) = A \sin \alpha x + B \cos \alpha x, \qquad \alpha = \sqrt{2mE/\hbar^2}, \tag{2.66}$$
while in region III it must be
$$\psi_{III}(x) = F e^{-\beta x} \tag{2.67}$$
to satisfy the boundary condition on forbidden regions. Furthermore, we are
going to require that both ψ(x) and ψ'(x) be continuous - we don't expect
to find a sudden change in where the particle is located. (Even in the infinite
square well, the wavefunction was continuous across the boundaries. The
discontinuity of the derivative was due to the infinite nature of the potential
at that point.) At x = −a, this requires that
$$-A \sin \alpha a + B \cos \alpha a = C e^{-\beta a} \tag{2.68}$$
and
$$\alpha A \cos \alpha a + \alpha B \sin \alpha a = \beta C e^{-\beta a}, \tag{2.69}$$
while at x = a we find
$$A \sin \alpha a + B \cos \alpha a = F e^{-\beta a} \tag{2.70}$$
and
$$\alpha A \cos \alpha a - \alpha B \sin \alpha a = -\beta F e^{-\beta a}. \tag{2.71}$$
After some algebra, we again find two cases, according to the parity of the
solution:
Even States: A = 0, B ≠ 0, C = F, α tan αa = β. (2.72)
Odd States: A ≠ 0, B = 0, C = −F, α cot αa = −β. (2.73)
This is the result most often displayed in textbooks. We see that the original
problem of finding the energies and wavefunctions of the finite square well
has evolved into the problem of finding the roots of a transcendental equation.
And that's a problem we know how to solve!
Computing Strategy
When we're first presented with a substantial problem, such as this one, it
is easy to become overwhelmed by its complexity. In the present case, the
goal is to find the energies and wavefunctions of the finite square well, but
we're not going to get there in one step. While we always want to remember
where we're going, we need to break the original problem into several smaller
chunks of a more manageable size, and solve them one at a time. This is the
"modularization" we spoke of earlier, and we see a strong correlation between
the process of solving the physical problem, and the writing of computer code
that addresses a particular piece of the problem.
Our "main program" will initialize some variables, and then call one
of the root-finding subroutines actually to find the root. It's in this main
program that any issues of a global nature, issues that will pertain to the
program as a whole, should be addressed. And in this project, we have such
an issue: units.
In physics, we are accustomed to quantities such as mass having two
attributes: their magnitude and their unit. Saying that a particle has mass
2 doesn't tell me much - is it 2 kilograms, or 2 metric tons? The com-
puter, however, only knows magnitudes - it is up to the computor to keep
the units straight. Sometimes this is easy, and sometimes not. In the current
problem, we have attributes such as mass, distance, and energy to be con-
sidered. For macroscopic objects, expressing these attributes in kilograms,
meters, and joules is natural, but these are extremely large units in which to
express quantum mechanical entities. For example, the mass of the electron
is about 9.11 × 10⁻³¹ kilograms - a perfectly good number, but rather small.
It's best to use units that are natural to the problem at hand: for the square
well problem, electron masses, Angstroms, and eV's are an appropriate set to
use. Using ℏ = 6.5821220 × 10⁻¹⁶ eV·sec, we can then write
$$\hbar^2 = 7.6199682\; m_e\,\text{eV}\,\text{Å}^2. \tag{2.74}$$
It's no accident that the numerical factor that appears here is on the order
of one - in fact, that's the reason for this choice of units. These are not the
conventional units of ℏ, but they work very nicely in this problem.
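As a quick check on that value - and on the constant 7.6199682 appearing in the code below - ℏ²/mₑ can be evaluated in these units from the familiar combinations ℏc ≈ 1973.27 eV·Å and mₑc² ≈ 0.511 × 10⁶ eV:
$$\frac{\hbar^2}{m_e} = \frac{(\hbar c)^2}{m_e c^2} \approx \frac{(1973.27\ \text{eV·Å})^2}{0.511 \times 10^6\ \text{eV}} \approx 7.62\ \text{eV·Å}^2.$$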
Let's imagine that we are to find the energy of the lowest state having
even parity. Then we would rewrite Equation (2.72) as
$$f(E) = \alpha \tan \alpha a - \beta = 0, \tag{2.75}$$
where
$$\alpha = \sqrt{2mE/\hbar^2} \quad \text{and} \quad \beta = \sqrt{2m(V_0 - E)/\hbar^2}. \tag{2.76}$$
Now, the root-finders we've developed expect - actually, they demand! - that
the root be bracketed. At this point, we have no idea where to start looking
for the root, except that the root must lie between zero and the top of the well.
An exhaustive search, simply calculating the function at several energies, say,
every 0.1 eV up to 10 eV, could be conducted, but we would like to find the root
with as few function evaluations as possible. A general strategy is to look for
"special points" where the function has known forms. At E = 0, the function
is easily evaluated as
$$f(0) = -\sqrt{2mV_0/\hbar^2}. \tag{2.77}$$
We also recognize that it goes to infinity as the argument of the tangent goes
to π/2. In fact, the repetitive nature of the tangent function suggests to us
that there might be several roots, one during each cycle of the function.
Use analytic results to establish limiting cases.
FIGURE 2.9 A plot of the functions tan αa and β/α versus α.

In Figure 2.9, we've plotted tan αa and β/α, for the case of V₀ = 10 eV,
a = 3 Å, and m = 1 mₑ, in which α ranges from zero to about 1.62 Å⁻¹. While
tan αa contains about 1.5 cycles in this range, β/α decreases from infinity at
α = 0 to zero at
$$\alpha = \sqrt{2mV_0/\hbar^2}. \tag{2.78}$$
The figure clearly indicates a root lying in the range
$$0 < E < \frac{\pi^2 \hbar^2}{8 m a^2} \approx 1.045\ \text{eV}. \tag{2.79}$$
(The one-dimensional square well always has a root, and hence at least one
bound state always exists. This is not the case for three dimensions.)
Simple plots can help us visualize what's going on.
We now have rigorously bracketed the root during each cycle of the
tangent function, but we're not quite ready to start. The difficulty is that
we never want the computer actually to evaluate an infinity - which is ex-
actly what the computer will try to do if instructed to evaluate the tangent at
aa = 7r12. We could set the upper bound at some slightly smaller value, say,
0.999999 times the upper limit of the energy. But if this value is too small,
the root would not be bounded. Instead of trying to "patch the code" to make
it work, let's see if we can find the true origin of the difficulty.
When we imposed boundary conditions, we found that for even states
$$B \cos \alpha a = C e^{-\beta a} \tag{2.80}$$
and
$$\alpha B \sin \alpha a = \beta C e^{-\beta a}. \tag{2.81}$$
We then wrote this requirement as
$$\alpha \tan \alpha a = \beta, \tag{2.82}$$
but it could just as easily have been written as
$$\beta \cos \alpha a = \alpha \sin \alpha a. \tag{2.83}$$
In fact, Equation (2.83) is just a statement of the matching condition, with
common terms eliminated. To obtain Equation (2.82) from (2.83) we had to
divide by cos αa - our infinity problem originates here, when αa = π/2 and
we're dividing by zero! We can totally avoid the infinity problem, and simulta-
neously improve the correspondence between computer code and the mathe-
matical boundary conditions, by replacing the transcendental equations (2.72)
and (2.73) by

Even States: A = 0, B ≠ 0, C = F, β cos αa = α sin αa. (2.84)
Odd States: A ≠ 0, B = 0, C = −F, α cos αa = −β sin αa. (2.85)
In analogy to Figure 2.9, we could now plot β cos αa and α sin αa ver-
sus α. The curves would cross at exactly the same points as do β/α and tan αa,
but would be preferable in the sense that they have no singularities in them.
However, having the capability of the computer to plot for us creates many
options. For instance, there's no longer any reason to be using α as the inde-
pendent variable - while α is a convenient variable for humans, the computer
would just as soon use the energy directly! This facilitates a more straight-
forward approach to the solution of the problem before us.
All we have left to do is to code the FUNCTION itself. It's no accident
that this is the last step in the process of developing a computer program to
solve our physical problem - although clearly important to the overall goal,
it's at the end of the logical chain, not the beginning. For the even parity
solutions, the function might be coded something like this:
      Double Precision FUNCTION EVEN(E)
*----------------------------------------------------------
*  Paul L. DeVries, Department of Physics, Miami University
*
*     This FUNCTION evaluates the EVEN condition for the
*     finite square well,
*
*       F(E) = beta * cosine(alpha*a) - alpha * sine(alpha*a)
*
*     for a well with DEPTH VO and WIDTH 2A.
*
*       Mass is expressed in electron masses.
*       Energy is expressed in eV.
*       Distance is expressed in Angstroms.
*
*                                      start date: 1/1/93
*
*  Declare variables:
*
      Double Precision E, a, VO, Mass, h_bar_SQ, alpha, beta
*
*  Specify constants:
*
      PARAMETER( a = ?, VO = ? )
      PARAMETER( Mass = ?, h_bar_SQ = 7.6199682d0 )
*
*  Evaluate the function and return.
*
      alpha = sqrt( 2*Mass*E / h_bar_SQ )
      beta  = sqrt( 2*Mass*(VO-E) / h_bar_SQ )
      EVEN  = beta * cos(alpha*a) - alpha * sin(alpha*a)
      end
We've used the PARAMETER statement to specify the constants so that
they can't be accidentally changed within the program.
EXERCISE 2.11
Plot the function
f(E) = β cos αa − α sin αa
as a function of energy.
All that's left is to solve the original problem!
EXERCISE 2.12
Find the lowest even and lowest odd solutions to the square well prob-
lem, with a = 3 Å, m = 1 mₑ, and V₀ = 10 eV. Plot the potential and
the wavefunctions associated with these eigenvalues. (The conven-
tion is to align the zero baseline of the wavefunction with the energy
eigenvalue.)
It's quite common in physics that the solution to one problem just
leads to more questions. You've now found the lowest energies of the square
well for a particular width parameter a, but doesn't that make you wonder
how the energies would change with a? Or with V
o
? To answer these ques-
tions, you need to transform the program you now have into a subroutine,
remove V
o
and a from the parameter list, and call your new subroutine from
the main program with different values of V
o
and a.
EXERCISE 2.13
Investigate the dependence of the lowest two eigenvalues (energies)
of the square well upon V₀ (with a fixed at 3 Å) and a (with V₀ fixed
at 10 eV), and plot your results. Since you will want V₀ to extend to
infinity, you will probably want to plot the energies versus V₀⁻¹, and
versus a. What is the smallest width that will support a bound state?
And what would happen if you had not one square well, but two, as in
Figure 2.10? Of course, you would need to develop different solutions than the
ones used here, with five regions of interest instead of three, but the process
is the same. And the double square well exhibits some interesting behavior,
not seen in the single square well problem. How do the lowest energies of
even and odd parity change as the distance between the wells changes? What
do the eigenfunctions look like? From physical reasoning, how must the low-
est eigenvalues and their wavefunctions behave in the limit that the distance
between the wells vanishes?
FIGURE 2.10 The double square well.

EXERCISE 2.14
Consider the double square well, with a single well described with the
parameters a = 3 Å, m = 1 mₑ, and V₀ = 10 eV. If the two wells are
far apart, there are two energies very near the energy of a single well.
Why? Investigate the dependence of these energies as the wells are
brought nearer to one another.
References
Root finding is a standard topic of numerical analysis and is discussed in many
such texts, including
Anthony Ralston, A First Course in Numerical Analysis, McGraw-
Hill, New York, 1965.
Richard L. Burden and J. Douglas Faires, Numerical Analysis, Prindle,
Weber & Schmidt, Boston, 1985.
Curtis F. Gerald and Patrick O. Wheatley, Applied Numerical Analy-
sis, Addison-Wesley, Reading, Massachusetts, 1989.
Although somewhat dated, the following text is particularly commendable
with regard to the author's philosophy of computation:
Forman S. Acton, Numerical Methods That Work, Harper & Row, New
York, 1970.
The modifications of Brent and Muller are discussed in
J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-
Verlag, New York, 1980.
Chapter 3:
Interpolation and
Approximation
In the last chapter, we noted that an approximation to a function was useful
in finding its root, even though we had the exact function at our disposal.
Perhaps a more common circumstance is that we don't know the exact func-
tion, but build our knowledge of it as we acquire more information about it,
one point at a time. In either case, it's important for us to incorporate the
information we have into an approximation that is useful to us. Presumably,
as we gather more information, our approximation becomes better.
In this chapter, several ways to approximate a function and its deriva-
tives are investigated. With interpolation, an approximating polynomial is
found that exactly describes the function being approximated at a set of spec-
ified points. Lagrange and Hermite interpolation are discussed, and the use
of cubic splines is developed. Application of Taylor's series methods to the
evaluation of derivatives is also discussed, as is the important technique of
Richardson extrapolation. Curve fitting, which approximates a function in a
general sense, without being constrained to agree with the function at every
point, is discussed - the method of least squares is developed in this regard.
Along the way, a need to solve sets of linear equations is encountered and so
a method to solve them is developed. In passing, we note that functions can
be expressed in terms of other functions, and that sets of orthogonal func-
tions are particularly convenient in this regard. Finally, the method of least
squares fitting using non-polynomial functions is developed, leading us to con-
sider minimization methods in multiple dimensions.
Lagrange Interpolation
While Taylor's series can be used to approximate a function at x if the function
and its derivatives are known at some point, a method due to Lagrange can
be used to approximate a function if only the function is known, although
it must be known at several points. We can derive Lagrange's interpolating
polynomial p(x) from a Taylor's series by expressing the function at x_1 and x_2
in terms of the function and its derivatives at x,
$$f(x_1) = f(x) + (x_1 - x)\, f'(x) + \cdots,$$
$$f(x_2) = f(x) + (x_2 - x)\, f'(x) + \cdots. \tag{3.1}$$
We would like to truncate the series and retain only the first two terms. But
in so doing, the equality would be compromised. However, we can introduce
approximations to the function and its derivative such that an equality is re-
tained. That is, we introduce the new function p(x) and its derivative such
that
$$f(x_1) = p(x) + (x_1 - x)\, p'(x)$$
and
$$f(x_2) = p(x) + (x_2 - x)\, p'(x). \tag{3.2}$$
Clearly, p(x) is equal to f(x) at x_1 and x_2, and is perhaps a reasonable approx-
imation in their vicinity. We then have two equations in the two unknowns
p(x) and p'(x); solving for p(x), we find
$$p(x) = \frac{x - x_2}{x_1 - x_2}\, f(x_1) + \frac{x - x_1}{x_2 - x_1}\, f(x_2), \tag{3.3}$$
a linear function in x. This is nothing more than the equation of the line pass-
ing through the points [x_1, f(x_1)] and [x_2, f(x_2)], and could have been found
by other means. In fact, we used this equation in Chapter 2 in developing the
method of false position.
But the form of Equation (3.3) is very convenient, and interesting.
For example, it says that the contribution of f(x_2) to the approximation is
weighted by a given factor, (x − x_1)/(x_2 − x_1), which depends upon the dis-
tance x is away from x_1. As x varies between x_1 and x_2, this factor increases
(linearly) from 0 to 1, so that the importance of f(x_2) to the approximation
varies from "irrelevant" to "sole contributor." At the same time, the factor
multiplying f(x_1) behaves in a complementary way, decreasing linearly as x
varies from x_1 to x_2.
A higher order approximation can easily be obtained by retaining
more terms in the series expansion. Of course, if another term is retained,
the function must be known at an additional point as well. For example, the
three equations
$$f(x_1) = f(x) + (x_1 - x)\, f'(x) + \frac{(x_1 - x)^2}{2!}\, f''(x) + \cdots,$$
$$f(x_2) = f(x) + (x_2 - x)\, f'(x) + \frac{(x_2 - x)^2}{2!}\, f''(x) + \cdots,$$
$$f(x_3) = f(x) + (x_3 - x)\, f'(x) + \frac{(x_3 - x)^2}{2!}\, f''(x) + \cdots, \tag{3.4}$$
can be truncated, the functions replaced by their approximations, and the
equations solved to yield the quadratic interpolating polynomial
$$p(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)}\, f(x_1)
      + \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)}\, f(x_2)
      + \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}\, f(x_3). \tag{3.5}$$
Again, we see that p(x_j) = f(x_j). Earlier, we obtained p(x) by writing
p(x) = ax² + bx + c and determining a, b, and c by requiring that the approx-
imation be equal to the function at three points. Since there is one and only
one quadratic function passing through any three points, the interpolating
polynomial of Equation (3.5) must be identical to the one we found earlier,
although it certainly doesn't look the same.
Equations (3.3) and (3.5) suggest that a general interpolating polyno-
mial of order (n − 1) might be written as
$$p(x) = \sum_{j=1}^{n} l_{j,n}(x)\, f(x_j), \tag{3.6}$$
where the function f(x) is known at the n points x_j and
$$l_{j,n}(x) = \prod_{\substack{k=1 \\ k \ne j}}^{n} \frac{x - x_k}{x_j - x_k} \tag{3.7}$$
is the coefficient of f(x_j). Note that
$$l_{j,n}(x_i) = \begin{cases} 1, & i = j, \\ 0, & i \ne j, \end{cases}$$
a relation that is compactly expressed in terms of the Kronecker delta,
$$l_{j,n}(x_i) = \delta_{ij}. \tag{3.8}$$
This approximation to f(x) is known as the Lagrange interpolating polyno-
mial. It can be extended to more points - later we will argue that this is not
a good practice, however.
Once upon a time - before the widespread availability of computers
- interpolation was an extremely important topic. Values of special functions
of interest to mathematical physics, such as Bessel functions, were laboriously
calculated and entered into tables. When a particular value of the function
was desired, interpolation was used on the tabulated values. Sophisticated,
specialized methods were developed to perform interpolation of tabular data
in an accurate and efficient manner.
Today, it is more likely to have a function evaluated as needed, rather
than interpolated, so that the importance of interpolation with respect to tab-
ular data is somewhat diminished. But the topic of interpolation is still im-
portant, because sometimes the data are simply not available at the points of
interest. Certainly, this often happens with experimental data. (Depending
upon the circumstances, there might be more appropriate ways to analyze ex-
perimental data, however.) Interpolation also provides a theoretical basis for
the discussion of other topics, such as differentiation and integration.
The Airy Pattern
When light enters a telescope, only that which falls upon the lens (or mir-
ror) is available to the astronomer. The situation is equivalent to an infinitely
large screen, with a circular hole the size of the lens (or mirror) cut in it.
And we all know that when light passes through an aperture like that, it suf-
fers diffraction. For a circular aperture, the resulting pattern of light and
dark rings is known as the Airy pattern, named after Sir George Airy, the As-
tronomer Royal, who first described it in 1835. This diffraction distorts the
image of even a point source of light, although this distortion is usually small
compared to other sources of distortion. Only in the best of telescopes have
these other sources been eliminated to the point that the quality of the image
is diffraction limited. In this instance, the intensity of the light is described
by the function
$$I(p) = I_0 \left[ \frac{2\, J_1(p)}{p} \right]^2, \tag{3.9}$$
where I₀ is the intensity of the incident light and J₁(p) is the Bessel function of or-
der 1. (The center of the image is at p = 0.) The Bessel function has many in-
teresting characteristics, and is often studied in mathematical physics courses
and discussed at length in textbooks. A few values of the Bessel function are
teresting characteristics, and is often studied in mathematical physics courses
and discussed at length in textbooks. A few values of the Bessel function are
listed in Table 3.1.

TABLE 3.1 Bessel Functions

   p        J0(p)           J1(p)           J2(p)
  0.0    1.0000000000    0.0000000000    0.0000000000
  1.0    0.7651976866    0.4400505857    0.1149034849
  2.0    0.2238907791    0.5767248078    0.3528340286
  3.0   -0.2600519549    0.3390589585    0.4860912606
  4.0   -0.3971498099   -0.0660433280    0.3641281459
  5.0   -0.1775967713   -0.3275791376    0.0465651163
  6.0    0.1506452573   -0.2766838581   -0.2428732100
  7.0    0.3000792705   -0.0046828235   -0.3014172201
  8.0    0.1716508071    0.2346363469   -0.1129917204
  9.0   -0.0903336112    0.2453117866    0.1448473415
 10.0   -0.2459357645    0.0434727462    0.2546303137
EXERCISE 3.1
Using pencil and paper, estimate J₁(5.5) by linear, quadratic, and cu-
bic interpolation.
You probably found the computations in the Exercise to be tedious -
certainly, they are for high order approximations. One of the advantages of
the Lagrange approach is that its coding is very straightforward. The crucial
fragment of the code, corresponding to Equations (3.3) and (3.5), might look
like
*
*   Code fragment for Lagrange interpolation using N points.
*   The approximation P is required at XX, using the
*   tabulated values X(j) and F(j).
*
      P = 0.d0
      DO j = 1, N
*
*   Evaluate the j-th coefficient
*
         Lj = 1.d0
         DO k = 1, N
            IF( j .ne. k ) THEN
               Lj = Lj * ( xx-x(k) )/( x(j)-x(k) )
            ENDIF
         END DO
*
*   Add contribution of j-th term to the polynomial
*
         P = P + Lj * F(j)
      END DO
We should point out that products need to be initialized to 1, just as
sums need to be initialized to O. This fragment has one major flaw - what
happens if any two of the Xi are the same? While this doesn't happen in the
present example, it's a possibility that should be addressed in a general inter-
polation program.
In the above exercise, which p values did you use in your linear inter-
polation? Nothing we've said so far would prevent you from having used p = 0
and 1, but you probably used p = 5 and 6, didn't you? You know that at p = 5
the approximating function is exact. As you move away from 5, you would
expect there to be a difference to develop between the exact function and the
approximation, but this difference (i.e., error) must become zero at p = 6. So
if you use tabulated values at points that surround the point at which you are
interested, the error will be kept to a minimum.
FIGURE 3.1 The Bessel function J1 (p) and the cubic Lagrange poly-
nomial approximation to it, using the tabular data at p = 4, 5, 6, and
7.
The converse is particularly illuminating. Using the interpolating
polynomial to extrapolate to the region exterior to the points sampled can
lead to disastrous results, as seen in Figure 3.1. Here the cubic interpola-
tion formula derived from sampling p at 4, 5, 6, and 7 is plotted versus the
true J1(p). Between 5 and 6, and even between 4 and 7, the approximation
is very good. But there's only so much flexibility in a cubic function, and so
the interpolating polynomial must eventually fail as we use it outside of that
region. An extreme example is in fitting a periodic function, such as a cosine,
with a polynomial. An n-th degree polynomial has at most (n − 1) extrema, and
so cannot possibly follow the infinite oscillations of a periodic function.
One would think that by increasing the order of the polynomial, we
could improve the fit. And to some extent, we can. Over some specific region,
say, 5 ::; p ::; 6, a cubic fit will almost surely be better than a linear one. But as
we consider higher order approximations, we see less of an improvement. The
reason, of course, is that the higher order polynomials require more points in
their determination. These points are farther from the region of interest and
so do little to enhance the quality of the approximation.
FIGURE 3.2 An example of the failure of a high order interpolating
polynomial.
And there is another factor, particularly for evenly spaced data - a
high order interpolating polynomial necessarily has "a lot of wiggle in it,"
even if it doesn't belong! For example, consider the function
$$f(x) = \frac{1 + \tanh 2\alpha x}{2}. \tag{3.10}$$
This function has some interesting characteristics. As α → ∞, f(x) becomes
zero for x < 0 and unity for x > 0. That is, it tends toward the Heaviside step
function! For α = 10, f(x) is plotted in Figure 3.2 on the interval −1 ≤ x ≤ 1.
The curve is smooth and well-behaved, although it does possess a "kink" at
x = 0 - one that tends towards a discontinuity as α → ∞. We've also plotted
the Lagrange interpolating polynomial obtained from using 11 evenly spaced
points in this interval. Clearly, the approximation is failing, particularly at
the ends of the interval.
EXERCISE 3.2
Use a 21-point interpolating polynomial to approximate the function
in Equation (3.10). Plot the approximation and the function, as in
Figure 3.2. Does the higher order approximation help?
Experience suggests that the high order approximations aren't very
useful. Generally speaking, a cubic approximation, with the points chosen so
as to surround the desired point of evaluation, works pretty well. In
developing an appropriate computer routine, you need an "automatic"
way of surrounding the point, and a special provision for the two
ends of the table. For ease of use, the code should be packaged as a FUNCTION.
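One possible realization is sketched below. It is only an illustration of the ideas just described, not the only way such a routine could be organized; the name PolInt, the use of four surrounding points, and the simple search for the starting index are choices made here for concreteness.

      Double Precision Function PolInt( xx, x, f, n )
*
*     Cubic Lagrange interpolation on the table x(1..n), f(1..n),
*     evaluated at xx.  The four tabulated points surrounding xx
*     are used; near the ends of the table the first (or last)
*     four points are used instead.
*
      Integer n, j, k, m
      Double Precision xx, x(n), f(n), Lj, p
*
*     Find the interval containing xx, then center four points on it.
*
      j = 1
  100 IF( xx .gt. x(j+1) .and. j .lt. n-1 ) THEN
         j = j + 1
         GOTO 100
      ENDIF
      j = j - 1
      IF( j .lt. 1   ) j = 1
      IF( j .gt. n-3 ) j = n-3
*
*     Cubic Lagrange interpolation using points j, j+1, j+2, j+3.
*
      p = 0.d0
      DO k = j, j+3
         Lj = 1.d0
         DO m = j, j+3
            IF( m .ne. k ) Lj = Lj * ( xx-x(m) )/( x(k)-x(m) )
         END DO
         p = p + Lj * f(k)
      END DO
      PolInt = p
      end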
EXERCISE 3.3
Using the FUNCTION as described above to interpolate on the table of
Bessel functions, write a computer program to evaluate the relative
intensity I/I₀ as a function of p across the Airy diffraction pattern,
and plot it. To generate a "pleasing" display, you should evaluate the
intensity at 0.1 increments in p.
Hermite Interpolation
Earlier we commented that having distant information about a function was
not of great help in approximating it on a specific region. But if we could have
more information about the function near the region of evaluation, surely that
must be of help! It sometimes happens that information about the derivative
of the function, as well as the function itself, is available to us. If it is available,
then we should certainly use it!
In the present example, we have a table of Bessel functions of various
orders at certain tabulated points. But the Bessel functions have some special
properties. (Again, the interested reader is encouraged to peruse a suitable
reference.) For example,
(3.11)
and
(3.12)
So we do have information about the function and its derivative! For this
general case, we propose the interpolation polynomial
p(x) = a x³ + b x² + c x + d
and determine a, b, c, and d by requiring that
p(x_j) = f(x_j),   p(x_{j+1}) = f(x_{j+1}),          (3.13)

p′(x_j) = f′(x_j),   p′(x_{j+1}) = f′(x_{j+1}).          (3.14)
This interpolating polynomial will be continuous, as was the Lagrange in-
terpolating polynomial, and its first derivative will also be continuous! With
some effort, we can determine the appropriate coefficients and find
(3.15)
In this example, we've only considered Hermite cubic interpolation, but it
should be clear that we could devise other polynomials, depending upon the
information we have available. For example, if we knew the function at n
points and the derivative at r points, we could construct a polynomial of order
n + r − 1 satisfying the n + r conditions. The more frequent circumstance is that
we know the function and its derivative at n points. Using all the information
at our disposal, we write the general Hermite interpolating polynomial as
p(x) = Σ_{j=1}^{n} h_{j,n}(x) f(x_j) + Σ_{j=1}^{n} h̄_{j,n}(x) f′(x_j),          (3.16)
where h_{j,n} and h̄_{j,n} are (as yet) undetermined polynomials. With some effort, we
can find that

h_{j,n}(x) = [ 1 − 2 l′_{j,n}(x_j) (x − x_j) ] l²_{j,n}(x)          (3.17)

and

h̄_{j,n}(x) = (x − x_j) l²_{j,n}(x),          (3.18)
where the lj,n(x) were defined in association with the Lagrange polynomial.
The Hermite polynomials are termed osculating, by the way, since they are
constructed so as to just "kiss" at the points xj.
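For the two-point cubic case, the interpolant is often written in terms of the standard Hermite basis functions rather than the coefficients a, b, c, and d above. The fragment below is a sketch in that standard form - not necessarily the same arrangement as Equation (3.15) - and the names Hermite, fp, and t are simply choices made here.

      Double Precision Function Hermite( xx, x, f, fp, j )
*
*     Cubic Hermite interpolation at xx, using the function F and
*     its derivative FP tabulated at x(j) and x(j+1).  Written in
*     terms of the standard Hermite basis functions of the local
*     variable t = (xx-x(j))/h.
*
      Integer j
      Double Precision xx, x(*), f(*), fp(*)
      Double Precision h, t, h00, h10, h01, h11
      h = x(j+1) - x(j)
      t = ( xx - x(j) ) / h
      h00 = ( 1.d0 + 2.d0*t ) * (1.d0-t)**2
      h10 = t * (1.d0-t)**2
      h01 = t**2 * ( 3.d0 - 2.d0*t )
      h11 = t**2 * ( t - 1.d0 )
      Hermite = h00*f(j) + h*h10*fp(j) + h01*f(j+1) + h*h11*fp(j+1)
      end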
EXERCISE 3.4
Using Hermite interpolation, evaluate the relative intensity I/I₀, and
compare to the results of the previous exercise. In conjunction with a
root-finder, determine the p-value for which the intensity is zero, i.e.,
find the location of the first fringe in the diffraction pattern.
It is also possible to construct approximations using higher deriva-
tives. The Bessel functions, for example, satisfy the differential equation
(3.19)
so that if we have a table of J_n's, we can find both J′_n and J″_n. A 5th-order
approximating polynomial could then be determined, for example, passing
through two points. Since this information is "local" to the region being ap-
proximated, these high order approximations do not suffer the unstable be-
havior of the high order Lagrange polynomials.
Cubic Splines
If derivatives are available, then Hermite interpolation can - and probably
should - be used. That will guarantee that the function and its first deriva-
tive will be continuous. More often than not, however, the derivatives are not
available. While the Lagrange interpolating polynomial can certainly be used,
it has one very undesirable characteristic: its derivative is not continuous.
Let's imagine that we're using the points x₁, x₂, x₃, and x₄ and interpolat-
ing in the region x₂ ≤ x ≤ x₃. As x varies from x₂ to x₃, the function and
its derivatives vary smoothly. But as x increases beyond x₃, we change the
points used in the interpolation in order to keep x "surrounded." This shift-
ing of interpolation points is done for a good reason, as discussed earlier, but
look at the consequence: since we now have a different set of interpolation
points, we have a different approximating polynomial. Although the function
will be continuous, the derivatives will suffer a discontinuous change. Not a
particularly attractive feature.
It would be desirable to have an approximating function that had con-
tinuous derivatives. And in fact, we can construct such a function. Let's de-
fine p(x) as the cubic interpolating function used in the region Xj ~ X ~ Xj+l,
which can be written as
p(x) = a_j (x − x_j)³ + b_j (x − x_j)² + c_j (x − x_j) + d_j.          (3.20)
Requiring this approximation to be exact at x = x_j gives us

p_j = d_j.          (3.21)

This approximation should also be exact at x = x_{j+1}, so that

p_{j+1} = a_j h_j³ + b_j h_j² + c_j h_j + p_j,          (3.22)

where we've introduced the notation

p_j ≡ f(x_j)   and   h_j ≡ x_{j+1} − x_j.          (3.23)
The derivatives of our cubic approximation are

p′(x) = 3 a_j (x − x_j)² + 2 b_j (x − x_j) + c_j          (3.24)

and

p″(x) = 6 a_j (x − x_j) + 2 b_j.          (3.25)

For the second derivative at x = x_j we have

p″_j = 2 b_j,          (3.26)

so that

b_j = p″_j / 2,          (3.27)

while at x = x_{j+1} we have

p″_{j+1} = 6 a_j h_j + 2 b_j,          (3.28)

and hence

a_j = ( p″_{j+1} − p″_j ) / ( 6 h_j ).          (3.29)

From Equation (3.22), we then find that

c_j = ( p_{j+1} − p_j )/h_j − h_j p″_{j+1}/6 − h_j p″_j/3.          (3.30)
With the coefficients of the polynomial known, at least in terms of the p″_j, we
can write

p(x) = p_j + [ ( p_{j+1} − p_j )/h_j − h_j p″_{j+1}/6 − h_j p″_j/3 ] (x − x_j)
           + ( p″_j / 2 ) (x − x_j)² + [ ( p″_{j+1} − p″_j )/( 6 h_j ) ] (x − x_j)³,          (3.31)
and

p′(x) = ( p_{j+1} − p_j )/h_j − h_j p″_{j+1}/6 − h_j p″_j/3 + p″_j (x − x_j)
            + [ ( p″_{j+1} − p″_j )/( 2 h_j ) ] (x − x_j)².          (3.32)
These expressions tell us that the function and its derivative can be
approximated from the p_j and the p″_j. Of course, we already know the p_j. To
determine the p″_j, we consider the derivative in the previous interval. Replac-
ing j by j − 1 in Equation (3.32), we have

p′(x) = ( p_j − p_{j−1} )/h_{j−1} − h_{j−1} p″_j/6 − h_{j−1} p″_{j−1}/3 + p″_{j−1} (x − x_{j−1})
            + [ ( p″_j − p″_{j−1} )/( 2 h_{j−1} ) ] (x − x_{j−1})²,          x_{j−1} ≤ x ≤ x_j.          (3.33)
We now require that Equations (3.33) and (3.32) yield exactly the same value
of the derivative at x = x_j, so that

( p_j − p_{j−1} )/h_{j−1} − h_{j−1} p″_j/6 − h_{j−1} p″_{j−1}/3 + p″_{j−1} h_{j−1} + [ ( p″_j − p″_{j−1} )/( 2 h_{j−1} ) ] h_{j−1}²
        = ( p_{j+1} − p_j )/h_j − h_j p″_{j+1}/6 − h_j p″_j/3.          (3.34)

Moving the (unknown) p″_j to the left side of the equation and the (known) p_j
to the right, we find

h_{j−1} p″_{j−1} + 2( h_{j−1} + h_j ) p″_j + h_j p″_{j+1} = 6 [ ( p_{j+1} − p_j )/h_j − ( p_j − p_{j−1} )/h_{j−1} ],          j = 2, ..., n − 1.          (3.35)
This gives us (n − 2) equations for the n unknown p″_j - we need two more
equations to determine a unique solution. These additional equations may
come from specifying the derivative at the endpoints, i.e., at x = x₁ and x = x_n.
From Equation (3.32), we find

2 h₁ p″₁ + h₁ p″₂ = 6 ( p₂ − p₁ )/h₁ − 6 p′₁,          (3.36)
while from Equation (3.33) we find

h_{n−1} p″_{n−1} + 2 h_{n−1} p″_n = −6 ( p_n − p_{n−1} )/h_{n−1} + 6 p′_n.          (3.37)
All these equations can be written in matrix form as

    [ 2h₁       h₁                                      ] [ p″₁      ]   [ 6(p₂−p₁)/h₁ − 6p′₁                                  ]
    [ h₁    2(h₁+h₂)    h₂                              ] [ p″₂      ]   [ 6(p₃−p₂)/h₂ − 6(p₂−p₁)/h₁                           ]
    [         ...          ...          ...             ] [  ...     ] = [                    ...                              ]
    [      h_{n−2}   2(h_{n−2}+h_{n−1})   h_{n−1}       ] [ p″_{n−1} ]   [ 6(p_n−p_{n−1})/h_{n−1} − 6(p_{n−1}−p_{n−2})/h_{n−2} ]
    [                    h_{n−1}          2h_{n−1}      ] [ p″_n     ]   [ −6(p_n−p_{n−1})/h_{n−1} + 6p′_n                     ]
                                                                                                                         (3.38)

Is it obvious why the matrix is termed tridiagonal?
The derivatives at the endpoints are not always known, however. The
most common recourse in this instance is to use the so-called natural spline,
obtained by setting the second derivatives to zero at x = x₁ and x = x_n. This
forces the approximating function to be linear at the limits of the interpolating
region. In this case the equations to be solved are

    [   1                                               ] [ p″₁      ]   [ 0                                                   ]
    [ h₁    2(h₁+h₂)    h₂                              ] [ p″₂      ]   [ 6(p₃−p₂)/h₂ − 6(p₂−p₁)/h₁                           ]
    [         ...          ...          ...             ] [  ...     ] = [                    ...                              ]
    [      h_{n−2}   2(h_{n−2}+h_{n−1})   h_{n−1}       ] [ p″_{n−1} ]   [ 6(p_n−p_{n−1})/h_{n−1} − 6(p_{n−1}−p_{n−2})/h_{n−2} ]
    [                                          1        ] [ p″_n     ]   [ 0                                                   ]
                                                                                                                         (3.39)
Again, the equations are of a tridiagonal form. This is the simplest form
that allows for a second derivative. Since many of the important equations
of physics are second-order differential equations, tridiagonal systems like
these arise frequently in computational physics. Before proceeding with our
problem of determining cubic splines, let's investigate the general solution to
systems of equations of this form.
Tridiagonal Linear Systems
Linear algebraic systems are quite common in applied problems. In fact, since
they are relatively easy to solve, one way of attacking a difficult problem is to
find some way to write it as a linear one. In the general case, where a solution
to an arbitrary set of simultaneous linear equations is being sought, Gaussian
elimination with partial pivoting is a common method of solution and will be
discussed later in this chapter. For tridiagonal systems, Gaussian elimination
takes on a particularly simple form, which we discuss here.
Let's write a general tridiagonal system as

    [ b₁   c₁                                 ] [ x₁      ]   [ r₁      ]
    [ a₂   b₂   c₂                            ] [ x₂      ]   [ r₂      ]
    [      a₃   b₃   c₃                       ] [ x₃      ] = [ r₃      ]
    [           ...  ...   ...                ] [  ...    ]   [  ...    ]
    [        a_{n−1}  b_{n−1}  c_{n−1}        ] [ x_{n−1} ]   [ r_{n−1} ]
    [                 a_n      b_n            ] [ x_n     ]   [ r_n     ]
                                                                    (3.40)

The b_j, j = 1, ..., n, lie on the main diagonal. On the subdiagonal lie the
a_j, for which j ranges from 2 to n, while the c_j lie above the main diagonal,
with 1 ≤ j ≤ n − 1. Alternatively, we can write Equation (3.40) as a set of
simultaneous equations,
b₁ x₁ + c₁ x₂                                     = r₁
a₂ x₁ + b₂ x₂ + c₂ x₃                             = r₂
                      ...
a_{n−1} x_{n−2} + b_{n−1} x_{n−1} + c_{n−1} x_n   = r_{n−1}
a_n x_{n−1} + b_n x_n                             = r_n.          (3.41)
The general way to solve such sets of equations is to combine the equa-
tions in such a way as to eliminate some of the variables. Let's see if we can
eliminate x₁ from the first two equations. Multiply the first equation by a₂/b₁,
and subtract it from the second, and use this new equation in place of the orig-
inal second equation to obtain the set

b₁ x₁ + c₁ x₂                                     = r₁
       ( b₂ − (a₂/b₁) c₁ ) x₂ + c₂ x₃             = r₂ − (a₂/b₁) r₁
       a₃ x₂ + b₃ x₃ + c₃ x₄                      = r₃
                      ...
a_{n−1} x_{n−2} + b_{n−1} x_{n−1} + c_{n−1} x_n   = r_{n−1}
a_n x_{n−1} + b_n x_n                             = r_n.
To simplify the notation, define

β₁ = b₁,   ρ₁ = r₁,          (3.42)

and

β₂ = b₂ − (a₂/β₁) c₁,          (3.43)

ρ₂ = r₂ − (a₂/β₁) ρ₁,          (3.44)

thus obtaining

β₁ x₁ + c₁ x₂                                     = ρ₁
        β₂ x₂ + c₂ x₃                             = ρ₂
        a₃ x₂ + b₃ x₃ + c₃ x₄                     = r₃
                      ...
a_{n−1} x_{n−2} + b_{n−1} x_{n−1} + c_{n−1} x_n   = r_{n−1}
a_n x_{n−1} + b_n x_n                             = r_n.          (3.45)
We can now proceed to eliminate x₂ - multiplying the second equation by
a₃/β₂ and subtracting it from the third yields

β₁ x₁ + c₁ x₂                                     = ρ₁
        β₂ x₂ + c₂ x₃                             = ρ₂
                β₃ x₃ + c₃ x₄                     = ρ₃
                      ...
a_{n−1} x_{n−2} + b_{n−1} x_{n−1} + c_{n−1} x_n   = r_{n−1}
a_n x_{n−1} + b_n x_n                             = r_n,

where we've defined

β₃ = b₃ − (a₃/β₂) c₂          (3.46)

and

ρ₃ = r₃ − (a₃/β₂) ρ₂.          (3.47)
Clearly, there's a pattern developing here. After n − 1 such steps, we arrive at
the set of equations

β₁ x₁ + c₁ x₂                          = ρ₁
        β₂ x₂ + c₂ x₃                  = ρ₂
                β₃ x₃ + c₃ x₄          = ρ₃
                      ...
        β_{n−1} x_{n−1} + c_{n−1} x_n  = ρ_{n−1}
                          β_n x_n      = ρ_n,          (3.48)

where we've defined

β_j = b_j − (a_j/β_{j−1}) c_{j−1}   and   ρ_j = r_j − (a_j/β_{j−1}) ρ_{j−1},          j = 2, ..., n.          (3.49)
This set of equations can now be solved by "back substitution." From the last
of the equations, we have

x_n = ρ_n / β_n,          (3.50)

which can then be substituted into the previous equation to yield

x_{n−1} = ( ρ_{n−1} − c_{n−1} x_n ) / β_{n−1},          (3.51)

and so on. Since all the previous equations have the same form, we are led
to the general result

x_{n−j} = ( ρ_{n−j} − c_{n−j} x_{n−j+1} ) / β_{n−j},          j = 1, ..., n − 1,          (3.52)

the solution to our original problem!
A subroutine to implement the solution we've developed can easily be
written. In fact, all that needs to be done is to determine the β_j's and ρ_j's from
Equations (3.43) and (3.49), and then back substitute according to Equations
(3.50) and (3.52) to determine the x_j's. A suitable code might look like
Subroutine TriSolve(A, B, C, X, R, n)
* This subroutine solves the tridiagonal set of equations
*
*   / b(1) c(1)                        \  / x(1)  \     / r(1)  \
*   | a(2) b(2) c(2)                   |  | x(2)  |     | r(2)  |
*   |      a(3) b(3) c(3)              |  | x(3)  |  =  | r(3)  |
*   |           ...  ...  ...          |  | ....  |     | ....  |
*   |      a(n-1) b(n-1) c(n-1)        |  |x(n-1) |     |r(n-1) |
*   \               a(n)   b(n)        /  \ x(n)  /     \ r(n)  /
*
* The diagonals A, B, and C, and the right-hand sides
* of the equations R, are provided as input to the
* subroutine, and the solution X is returned.
*
Integer n
Double Precision A(n), B(n), C(n), X(n), R(n)
Double Precision BETA(100), RHO(100)
If(n .gt. 100) STOP ' Arrays too large for TRISOLVE'
If (b(1) .eq. 0)
+ STOP 'Zero diagonal element in TRISOLVE'
beta(1) = b(1)
rho(1) = r(1)
DO j=2,n
beta(j) = b(j) - a(j) * c (j-l) / beta(j-l)
rho(j) = r(j) - a(j)* rho(j-l) / beta(j-l)
if(beta(j) .eq. 0)
+ STOP 'Zero diagonal element in TRISOLVE'
END DO
*
* Now, for the back substitution...
*
x(n) = rho(n) / beta(n)
DO j = 1, n-l
x(n-j) = ( rho(n-j)-c(n-j)*x(n-j+l) )/beta(n-j)
END DO
end
Note that the arrays A, B, C, X, and R are dimensioned n in the calling
routine. BETA and RHO are used only in this subroutine, and not in the calling
routine, and so must be explicitly dimensioned here and a check performed
to verify that they are sufficiently large. Since BETA will eventually be used in
the denominator of an expression, it should be verified that it is nonzero.
There is one further modification that we should make before we use
this code. While "efficiency" is not of overwhelming importance to us, nei-
ther should we be carelessly inefficient. Do you see where we have erred? As
it stands, an entire array is being wasted - there is no need for both X and
RHO! In the elimination phase, the X array is never used, while in the back
substitution phase, it is equated to the elements of RHO. Defining two distinct
variables was necessary when we were developing the method of solution, and
it was entirely appropriate that we used them. At that point, we didn't know
what the relationship between Pj and Xj was going to be. But as a result of
our analysis we know, and we should use our understanding of that relation-
ship as best we can. With respect to the computer code, one of the arrays is
unnecessary - you should remove all references to RHO in the subroutine, in
favor of the array X.
EXERCISE 3.5
After suitably modifying TriSolve, use it to solve the set of equations
(3.53)
You can check your result by verifying that your solution satisfies the
original equations.
Cubic Spline Interpolation
Now that we see what's involved, the solution to the cubic spline problem is
fairly clear, at least in principle. From the given table of x_j's and f(x_j)'s, the
tridiagonal set of equations of Equation (3.38) (or Equation (3.39), for the
natural spline) is determined. Those equations are then solved, by a subrou-
tine such as TriSolve, for the p″_j. Knowing these, all other quantities can be
determined. In pseudocode, the subroutine might look something like this:
Subroutine Splinelnit(x,f,fpl,fpN,second,n)
*
* This subroutine is called with the first derivatives
* at x=x(l), FP1, and at x=x(n), FPN, specified, as
* well as the X's and the F's. (And n.)
*
* It returns the second derivatives in SECOND.
*
* The arrays X, F, and SECOND are dimensioned in the
* calling routine.
*
* < Initialize A, B, C --- the subdiagonal, main diagonal,
* and super-diagonal.>
* < Initialize R --- the right-hand side.>
*
* < Call TRISOLVE to solve for SECOND, the second
* derivatives. This requires the additional array BETA.>
*
end
This subroutine will certainly work. But... Are all those arrays really
necessary? If we leave the solution of the tridiagonal system to a separate
subroutine, they probably are. However, it would be just as useful to have a
specialized version of TRISOLVE within the spline code. This would make the
spline subroutine self-contained, which can be a valuable characteristic in it-
self. As we think about this, we realize that TRISOLVE will be the major portion
of the spline routine - what we really need to do is to start with TRISOLVE,
and add the particular features of spline interpolation to it. This new point
of view is considerably different from what we began with, and not obvious
from the beginning. The process exemplifies an important component ofgood
programming and successful computational physics - don't get "locked in"
to one way of doing things; always look to see if there is a better way. Some-
times there isn't. And sometimes, a new point of view can lead to dramatic
progress.
So, let's start with TRISOLVE. To begin, we should immediately change
the name of the array X to SECOND, to avoid confusion with the coordinates
that we'll want to use. Then we should realize that we don't need both A and
C, since the particular tridiagonal matrix used in the cubic spline problem is
symmetric. In fact, we recognize that it's not necessary to have arrays for
the diagonals of the matrix at all - we can evaluate them as we go! That is,
instead of having a DO loop to evaluate the components of A, for example, we
can evaluate the specific element we need within the Gauss elimination loop.
The arrays A, B, C, and R are not needed! The revised code looks like this:
Subroutine SplineInit(x,f,fpl,fpn,second,n)
*
* This subroutine performs the initialization of second
* derivatives necessary for CUBIC SPLINE interpolation.
* The arrays X and F contain N values of function and the
* position at which it was evaluated, and FPl and FPn
* are the first derivatives at x = x(l) and x = x(n).
*
* The subroutine returns the second derivatives in SECOND.
*
* The arrays X, F, and SECOND are dimensioned in the
* calling routine --- BETA is dimensioned locally.
*
Integer n, j
Double Precision X(n), F(n), Second(n)
Double Precision FP1, FPn
Double Precision BETA(100)
Double Precision b1, r1, bj, rj, aj, c
*
* In a cubic spline, the approximation to the function and
* its first two derivatives are required to be continuous.
* The primary function of this subroutine is to solve the
* set of tridiagonal linear equations resulting from this
* requirement for the (unknown) second derivatives and
* knowledge of the first derivatives at the ends of the
* interpolating region. The equations are solved by
* Gaussian elimination, restricted to tridiagonal systems,
* with A, B, and C being sub, main, and super diagonals,
* and R the right hand side of the equations.
*
If(n .gt. 100)
+ STOP 'Array too large for SPLINE INIT'
b1 = 2.d0*( x(2)-x(1) )
beta(1) = b1
If (beta(1) .eq. 0)
+    STOP 'Zero diagonal element in SPLINE INIT'
r1 = 6.d0*( (f(2)-f(1))/(x(2)-x(1)) - FP1 )
second(1) = r1
DO j=2,n
   IF (j.eq.n) THEN
      bj = 2.d0 * ( x(n)-x(n-1) )
      rj = -6.d0*( (f(n)-f(n-1))/(x(n)-x(n-1)) - FPn )
*
*
For j=2, ... ,n-l, do the following
ELSE
      bj = 2.d0*( x(j+1) - x(j-1) )
      rj = 6.d0*( (f(j+1)-f( j ))/(x(j+1)-x( j ))
+               - (f( j )-f(j-1))/(x( j )-x(j-1)) )
ENDIF
*
* Evaluate the off-diagonal elements. Since the
* matrix is symmetric, A and C are equivalent.
*
aj = x( j ) - x(j-1)
c = aj
beta(j) = bj - aj * c / beta(j-1)
second(j) = rj - aj* second(j-1) / beta(j-1)
IF(beta(j) .eq. 0)
+ STOP 'Zero diagonal element in SPLINE INIT'
END DO
*
* Now, for the back substitution...
*
second(n) = second(n) / beta(n)
DO j = 1, n-1
c = x(n-j+1)-x(n-j)
second(n-j) =
+ ( second(n-j) - c * second(n-j+1) )/beta(n-j)
END DO
end
This code could be made more "compact" - that is, we don't really need to
use intermediate variables like aj and c, and they could be eliminated - but
the clarity might be diminished in the process. It's far better to have a clear,
reliable code than one that is marginally more "efficient" but is difficult to
comprehend.
Of course, we haven't interpolated anything yet! SplineInit yields
the second derivatives, which we need for interpolation, but doesn't do the
interpolation itself. SplineInit needs to be called, once, before any interpo-
lation is done.
The interpolation itself, the embodiment of Equation (3.31), is done
in the function Spline:
Double Precision Function Spline(xvalue,x,f,second,n)
*
* This subroutine performs the CUBIC SPLINE interpolation,
* after SECOND has been initialized by SPLINE INIT.
* The arrays F, SECOND, and X contain N values of function,
* its second derivative, and the positions at which they
* were evaluated.
*
* The subroutine returns the cubic spline approximation
* to the function at XVALUE.
*
* The arrays X, F, and SECOND are dimensioned in the
* calling routine.
*
Integer n
Double Precision Xvalue, X(n), F(n), Second(n)
*
* Verify that XVALUE is between x(l) and x(n).
*
IF (xvalue .lt. x(1))
+    Stop 'XVALUE is less than X(1) in SPLINE'
IF (xvalue .gt. x(n))
+    Stop 'XVALUE is greater than X(n) in SPLINE'
*
* Determine the interval containing XVALUE.
*
j = 0
100   j = j + 1
IF ( xvalue .gt. x(j+1) ) goto 100
*
* Xvalue is between x(j) and x(j+l).
*
* Now, for the interpolation...
*
spline
end
Some of the code has been left for you to fill in.
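As a usage sketch only - once the missing interpolation line of Spline has been supplied - the two routines would be combined along the following lines. The array size N = 21, the sin(x) table, and the variable names are placeholders chosen here, not values taken from the text; any table, such as the Bessel-function table, could be used instead.

      Integer N, i
      Parameter ( N = 21 )
      Double Precision X(N), F(N), SECOND(N), FP1, FPN, xx, Spline
*
*     For illustration only, tabulate f(x) = sin(x) on [0,2].
*
      DO i = 1, N
         X(i) = 2.d0 * dble(i-1) / dble(N-1)
         F(i) = sin( X(i) )
      END DO
      FP1 = cos( X(1) )
      FPN = cos( X(N) )
*
*     Initialize the second derivatives once, then interpolate.
*
      Call SplineInit( X, F, FP1, FPN, SECOND, N )
      DO i = 0, 100
         xx = X(1) + ( X(N)-X(1) ) * dble(i) / 100.d0
         write(*,*) xx, Spline( xx, X, F, SECOND, N )
      END DO
      end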
EXERCISE 3.6
Use the cubic spline approximation to the Bessel function to find the
location of its first zero.
In this spline approximation, we need to know the derivatives of the
function at the endpoints. But how much do they matter? If we had simply
"guessed" at the derivatives, how much would it affect the approximation?
One of the advantages of a computational approach is that we answer these
questions almost as easily as we can ask them.
EXERCISE 3.7
Investigate the influence of the derivatives on the approximation to
J₁, by entering false values for FP1. Instead of the correct derivative,
use J₁′(0) = −3.0, −2.5, −2.0, ..., 2.5, and 3.0, and plot your bogus ap-
proximations against the valid one. How much of a difference does the
derivative make?
Earlier, we noted that in addition to the "clamped" spline approxi-
mation in which the derivatives at the endpoints are specified, there is the
"natural" spline for which derivative information is not required.
EXERCISE 3.8
Change SplineInit into NaturalSplineInit, a subroutine which ini-
tializes the natural spline. Plot the "clamped" and natural spline ap-
proximations to J₁, and compare them. (Are changes needed in the
function Spline?)
Occasionally, we find the derivative capability of the spline approxi-
mation to be particularly valuable. Based upon Equation (3.32), it's easy to
develop a function analogous to Spline that evaluates the derivative.
EXERCISE 3.9
Develop the function SplinePrime, which evaluates the derivative of
the function, and consider the intensity of light passing through a
circular aperture. We previously investigated the location of the dark
fringe - past that fringe, the intensity builds towards a maximum
- the location of the first light fringe. Where is it, and how bright is
it? (Note that you need to approximate I/I₀, not J₁. Extrema of the
intensity correspond to zeros of the derivative of the intensity.)
Approximation of Derivatives
If a function, and all its derivatives, are known at some point, then Taylor's
series would seem to be an excellent way to approximate the function. As a
practical matter, however, the method leaves quite a lot to be desired - how
often are a function and all its derivatives known? This situation is some-
what complementary to that of Lagrange interpolation - instead of know-
ing the function at many points, and having "global" information about the
function, the Taylor series approximation relies upon having unlimited infor-
mation about the function "local" to the point of interest. In practice, such
information is usually not available. However, Taylor's series plays a crucial
role in the numerical calculation of derivatives.
There are several ways to discuss numerical differentiation. For ex-
ample, we could simply take the definition of the derivative from differential
calculus,
f′(x) = df(x)/dx = lim_{h→0} [ f(x + h) − f(x) ] / h,          (3.54)

and suggest that an adequate approximation is

f′_h(x) = [ f(x + h) − f(x) ] / h.          (3.55)
Certainly, f′_h(x) approaches f′(x) in the limit as h → 0, but how does it com-
pare if h is other than zero? What we need is a method that will yield good
approximations of known quality using finite differences; that is, the approx-
imation should not require that the limit to zero be taken, and an estimate
of the accuracy of the approximation should be provided. It's in situations
like these that the power and usefulness of the Taylor series is most obvious.
Making a Taylor series expansion of f(x + h) about the point x,

f(x + h) = f(x) + h f′(x) + (h²/2!) f″(x) + ···,          (3.56)

we can solve for f′(x) to find

f′(x) = (1/h) [ f(x + h) − f(x) − (h²/2!) f″(x) − ··· ].          (3.57)
This is not an approximation - it is an expression for f'(x) that has exactly
the same validity as the original Taylor series from which it was derived. And
the terms that appear in the expression involve the actual function of interest,
f(x), not some approximation to it (quadratic or otherwise). On the other
hand, Equation (3.57) certainly suggests the approximation,
f′(x) ≈ [ f(x + h) − f(x) ] / h.          (3.58)

By referring to Equation (3.57) we see that the error incurred in using this
approximation is

E(h) = (h/2) f″(x) + ···,          (3.59)
where we've written only the leading term of the error expression - the next
term will contain h² and for small h will be correspondingly smaller than the
term exhibited. Another way of indicating the specific approximation being
made is to write

f′(x) = [ f(x + h) − f(x) ] / h + O(h),          (3.60)

where O(h) indicates the order of the approximation being made by neglecting
all other terms in the total expansion.
Equation (3.60) is a forward difference approximation to the deriva-
tive. We can derive a different expression for f′(x), starting with a Taylor
series expansion of f(x − h). This leads to an expression analogous to Equa-
tion (3.57),

f′(x) = (1/h) [ f(x) − f(x − h) + (h²/2!) f″(x) − ··· ],          (3.61)

and the backward difference approximation,

f′(x) = [ f(x) − f(x − h) ] / h + O(h).          (3.62)
Since the forward and backward difference formulas are obtained in similar
ways, it's not surprising that their error terms are of the same order. Thus
neither has a particular advantage over the other, in general. (An exception
occurs at the limits of an interval, however. If the function is only known on
the interval a < x < b, then the backward difference formula can not be used
at x = a. Likewise, the forward difference formula can't be used at x = b.)
However, the two formulas have an interesting relation to one another. In
particular, the leading term in the error in one formula is of opposite sign
compared to the other. If we add the two expressions, the leading term will
cancel! Let's return to the Taylor series expansions, and include some higher
order terms,
f(x + h) = f(x) + h f′(x) + (h²/2!) f″(x) + (h³/3!) f‴(x) + (h⁴/4!) f^(iv)(x) + ···          (3.63)
and

f(x − h) = f(x) − h f′(x) + (h²/2!) f″(x) − (h³/3!) f‴(x) + (h⁴/4!) f^(iv)(x) − ···.          (3.64)
Wanting an expression for f′(x), we subtract these expressions to find the
central difference formula for the derivative,

f′(x) = [ f(x + h) − f(x − h) ] / (2h) + O(h²),          (3.65)
in which the error term is

E(h) = − (h²/3!) f‴(x) − (h⁴/5!) f^(v)(x) − ···.          (3.66)

Not only has the error been reduced to O(h²), but all the terms involving odd
powers of h have been eliminated.
As a general rule, symmetric expressions are more accurate
than nonsymmetric ones.
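The three formulas are easy to compare numerically. The fragment below is only an illustration, built around the test function f(x) = xe^x used in the exercises that follow; the statement-function name f, the dummy argument t, and the sequence of step sizes are choices made here.

*     Compare forward, backward, and central difference
*     approximations to f'(2) for f(x) = x*exp(x); the exact
*     derivative is (1+x)*exp(x).
      Double Precision f, t, x, h, exact, fwd, bck, ctr
      Integer i
      f(t) = t * exp(t)
      x = 2.d0
      exact = ( 1.d0 + x ) * exp(x)
      h = 0.5d0
      DO i = 1, 10
         fwd = ( f(x+h) - f(x)   ) / h
         bck = ( f(x)   - f(x-h) ) / h
         ctr = ( f(x+h) - f(x-h) ) / ( 2.d0*h )
         write(*,*) h, fwd-exact, bck-exact, ctr-exact
         h = h - 0.05d0
      END DO
      end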
EXERCISE 3.10
Consider the function f(x) = xe^x. Obtain approximations to f′(2),
with h = 0.5, 0.45, ... , 0.05, using the forward, backward, and central
difference formulas. To visualize how the approximation improves
with decreasing step size, plot the error as a function of h.
In the present instance, we know the correct result and so can also
plot the error, as the difference between the calculated and the true value of
the derivative. For any function, such as this error, that is written as

E(h) ≈ C hⁿ,          (3.67)

the natural logarithm is simply

ln E(h) ≈ n ln h.          (3.68)

Thus, if ln E(h) is plotted versus ln h, the value of n is just the slope of the
line!
EXERCISE 3.11
Verify the order of the error in the backward, forward, and central
difference formulas by plotting the natural log of the error versus the
natural log of h in your approximation of the derivative of f(x) = xe^x
at x = 2.
These methods can be used to evaluate higher derivatives. For ex-
ample, Equations (3.63) and (3.64) can be added to eliminate f′(x) from the
expression, yielding

f″(x) = [ f(x + h) − 2 f(x) + f(x − h) ] / h² + O(h²),          (3.69)

with an error of

E(h) = − (h²/12) f^(iv)(x) − (h⁴/360) f^(vi)(x) − ···.          (3.70)
Again, we note that terms with odd powers of h are absent, a consequence of
the symmetry of the expression. To develop a forward difference formula, we
need to know that

f(x + 2h) = f(x) + 2h f′(x) + ((2h)²/2!) f″(x) + ((2h)³/3!) f‴(x) + ···.          (3.71)

Combining this with Equation (3.63) yields

f″(x) = [ f(x) − 2 f(x + h) + f(x + 2h) ] / h² + O(h).          (3.72)

While this expression uses three function evaluations, just as the central dif-
ference formula, Equation (3.69), it's not nearly as accurate - in this approx-
imation, the error is

E(h) = − h f‴(x) − (7h²/12) f^(iv)(x) + ···.          (3.73)
The reason, of course, is that with the central difference formula the function
is evaluated at points surrounding the point of interest, while with the for-
ward difference they're all off to one side. This is entirely consistent with our
previous observation concerning the interpolation of functions.
EXERCISE 3.12
Consider the function f(x) = xe^x. Obtain approximations to f″(2),
with h = 0.5, 0.45, ..., 0.05, using the forward and central difference
formulas. Verify the order of the error by plotting the log error versus
log h.
Richardson Extrapolation
We have seen how two different expressions can be combined to eliminate the
leading error term and thus yield a more accurate expression. It is also possi-
ble to use a single expression to achieve the same goal. This general technique
is due to L. F. Richardson, a meteorologist who pioneered numerical weather
prediction in the 1920s. (His Weather Prediction by Numerical Process, writ-
ten in 1922, is a classic in the field.) Although we will demonstrate the method
with regards to numerical differentiation, the general technique is applicable,
and very useful, whenever the order of the error is known.
Let's start with the central difference formula, and imagine that we
have obtained the usual approximation for the derivative,
f′(x) = [ f(x + h) − f(x − h) ] / (2h) − (h²/6) f‴(x) + ···.          (3.74)

Using a different step size we can obtain a second approximation to the deriva-
tive. Then using these two expressions, we can eliminate the leading term of
the error. In practice, the second expression is usually obtained by using an h
twice as large as the first, so that

f′(x) = [ f(x + 2h) − f(x − 2h) ] / (4h) − (4h²/6) f‴(x) + ···.          (3.75)
While the step size is twice as large, the error is four times as large. Dividing
this expression by 4 and subtracting it from the previous one eliminates the
error! Well, actually only the leading term of the error is eliminated, but it
still sounds great! Solving for f', we obtain
f′(x) = [ f(x − 2h) − 8 f(x − h) + 8 f(x + h) − f(x + 2h) ] / (12h) + O(h⁴),          (3.76)

a 5-point central difference formula with error given by

E(h) = (h⁴/30) f^(v)(x) + ···.          (3.77)
(Yes, we know there are only 4 points used in this expression. The coefficient
of the fifth point, f(x), happens to be zero.)
Of course, we can do the same thing with other derivatives: using
the 3-point expression of Equation (3.69), we can easily derive the 5-point
expression

f″(x) = [ −f(x − 2h) + 16 f(x − h) − 30 f(x) + 16 f(x + h) − f(x + 2h) ] / (12h²) + O(h⁴),          (3.78)

with

E(h) = (h⁴/90) f^(vi)(x) + ···.          (3.79)
Now, there are two different ways that Richardson extrapolation can
be used. The first is to obtain "new" expressions, as we've just done, and to use
these expressions directly. Be forewarned, however, that these expressions
can become rather cumbersome.
Richardson extrapolation can also be used in an indirect computa-
tional scheme. We'll carry out exactly the same steps as before, but we'll per-
form them numerically rather than symbolically. Let D₁(h) be the approxi-
mation to the derivative obtained from the 3-point central difference formula
with step size h, and imagine that both D₁(2h) and D₁(h) have been calcu-
lated. Clearly, the second approximation will be better than the first. But
since the order of the error in this approximation is known, we can do even
better. Since the error goes as the square of the step size, D₁(2h) must contain
four times the error contained in D₁(h)! The difference between these two ap-
proximations is then three times the error of the second. But the difference
is something we can easily calculate, so in fact we can calculate the error! (Or
at least, its leading term.) The "correct" answer, D₂(h), is then obtained by
simply subtracting this calculated error from the second approximation,

D₂(h) = D₁(h) − [ D₁(2h) − D₁(h) ] / 3 = [ 4 D₁(h) − D₁(2h) ] / 3.          (3.80)
Of course, D₂(h) is not the exact answer, since we've only accounted
for the leading term in the error. Since the central difference formulas have
error terms involving only even powers of h, the error in D₂(h) must be O(h⁴).
Thus D₂(2h) contains 2⁴ times as much error as D₂(h), and so this error can
be removed to yield an even better estimate of the derivative,

D₃(h) = D₂(h) − [ D₂(2h) − D₂(h) ] / 15 = [ 16 D₂(h) − D₂(2h) ] / 15.          (3.81)
This process can be continued indefinitely, with each improved estimate
given by

D_{i+1}(h) = D_i(h) − [ D_i(2h) − D_i(h) ] / (2^{2i} − 1) = [ 2^{2i} D_i(h) − D_i(2h) ] / (2^{2i} − 1).          (3.82)
To see how this works, consider the function f(x) = xe^x and calcu-
late D₁(h) at x = 2 for different values of the step size h. It's convenient to
arrange these in a column, with each succeeding step size half the previous
one. The entries in this column are then used to evaluate D₂(h), which can
be listed in an adjoining column. In turn, these entries are combined to yield
D₃, and so on. We thus build a table of extrapolations, with the most accurate
approximation being the bottom rightmost entry:

                                       f′(2)
      h        D1                D2                D3                D4
     0.4    23.1634642931
     0.2    22.4141606570    22.1643927783
     0.1    22.2287868803    22.1669956214    22.1671691443
     0.05   22.1825648578    22.1671575170    22.1671683100    22.1671682968
While we've suggested that this table was created a column at a time, that's
not the way it would be done in practice, of course. Rather, it would be built a
row at a time, as the necessary constituents of the various approximations be-
came available. For example, the first entry in the table is D₁(0.4) and the sec-
ond entry is D₁(0.2). This is enough information to evaluate D₂(0.2), which
fills out that row. (Note that this entry is already more accurate than the last
3-point central difference approximation in the table, D₁(0.05).) If we wanted
to obtain the derivative to a specific accuracy, we would compare this last ap-
proximation, D₂(0.2), to the best previous approximation, D₁(0.2). In this
case, we only have 2-digit accuracy, so we would halve h and evaluate D₁(0.1).
That would enable D₂(0.1) to be evaluated, which would then be used to eval-
uate D₃(0.1), completing that row. We now have 4-significant-digit agreement
between D₂(0.1) and D₃(0.1). Halving h one more time, the fourth row of the
table can be generated. The last entries, D₃(0.05) and D₄(0.05), agree to 9
significant digits; D₄(0.05) is actually accurate to all 12 digits displayed.
The coding of this extrapolation scheme can initially be a little intim-
idating, although (as we shall see) it isn't really all that difficult. As often
happens, we will start with a general outline and build upon it - there's no
reason to believe that anyone ever writes a complicated computer code off the
top of their head. In fact, quite the opposite is true: the more complicated
the code, the more methodical its development should be. From reflection
upon the extrapolation table we created, we realize that the "next" row is gen-
erated with one newly evaluated derivative and the entries of the "previous"
row. So, we'll have an "old" row, and a "new" one, each expressed as an array.
An outline of the code might look something like this:
*----------------------------------------------------------
* Paul L. DeVries, Department of Physics, Miami University
*
* This code fragment illustrates the RICHARDSON
* EXTRAPOLATION scheme for the calculation of a derivative.
*
* 5 columns will be stored in each of the OLD and NEW
* rows of the extrapolation table. H is the step size,
* which is halved with each new row.
*
Double Precision OLD(5),NEW(5),x,h,DF
Integer iteration, i
X = < point where derivative is to be evaluated>
h = < initial step size >
iteration = 0 <keep track of number of iterations>
100 iteration = iteration + 1
DO i = 1, 5
old(i) = new(i)
END DO
* <evaluate the derivative DF using the 3-pt formula>
new(1) = DF
*
* The following IF-statement structure calculates the
* appropriate extrapolations.
*
      IF (iteration .ge. 2) new(2) = (  4*new(1) - old(1) )/  3
      IF (iteration .ge. 3) new(3) = ( 16*new(2) - old(2) )/ 15
      IF (iteration .ge. 4) new(4) = ( 64*new(3) - old(3) )/ 63
      IF (iteration .ge. 5) new(5) = (256*new(4) - old(4) )/255
*
h = h/2.DO
GOTO 100
This is a start. We've dimensioned the arrays to be 5, although we may need
to change this later. We've established the basic loop in which the derivative
will be evaluated and extrapolations are made, and we've redefined the step
size at the end of the loop. However, we haven't provided any way for the
program to STOP! The first thing to do to this code, then, is to provide a
way to terminate execution once a specified tolerance for the accuracy of the
derivative has been met. You will also want to provide a graceful exit if the
tolerance isn't met, after a "reasonable" number of iterations. You might
think that more entries should be allowed in the rows - however, experience
suggests that the extrapolations don't improve indefinitely, so that the fourth
or fifth column is about as far as one really wants to extrapolate. By the way,
this general extrapolation procedure can be used any time that the order of
the error is known - later, we will use Richardson extrapolation to calculate
integrals. In the general case, however, odd powers of h will be present in the
error terms and so the coefficients used here would not be appropriate.
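One way to finish the fragment is sketched below: a comparison of the two most recent entries in the current row, placed after the extrapolation IF-statements and before the step size is halved. This is only one reasonable choice; the tolerance TOL, the iteration limit MAXIT, and the index k are assumptions made here and would be declared and set with the other variables.

*     After the extrapolations for this row have been computed,
*     compare the two most recent entries and quit when they
*     agree to within the requested tolerance.
      k = min( iteration, 5 )
      IF ( k .ge. 2 ) THEN
         IF ( abs( new(k) - new(k-1) ) .lt. TOL ) THEN
            write(*,*) ' derivative = ', new(k)
            STOP
         ENDIF
      ENDIF
      IF ( iteration .ge. MAXIT )
     +   STOP ' tolerance not met in a reasonable number of iterations'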
EXERCISE 3.13
Modify the code fragment to calculate the second derivative of f(x) =
xe^x, using the 3-point expression of Equation (3.69) and Richardson
, using the 3-point expression of Equation (3.69) and Richardson
extrapolation, and generate a table of extrapolates in the same way
that the first derivative was evaluated earlier.
Curve Fitting by Least Squares
To this point, we have discussed interpolation in which the approximating
function passes through the given points. But what if there is some "scatter"
to the points, such as will be the case with actual experimental data? A stan-
dard problem is to find the "best" fit to the data, without requiring that the
approximation actually coincide with any of the specific data points. As an
example, consider an experiment in which a ball is dropped, and its velocity
is recorded as a function of time. From this data, we might be able to obtain
the acceleration due to gravity. (See Figure 3.3.)
Of course, we have to define what we mean by "best fit." We'll take a
very simple definition, requiring a certain estimate of the error to be a mini-
mum. Let's make a guess at the functional form ofthe data - in our example,
the guess will be
v(t) = a +bt (3.83)
- and evaluate the difference between this guess, evaluated at ti, and the
velocity measured at t = ti, Vi- Because the difference can be positive or nega-
tive, we'll square this difference to obtain a non-negative number. Then, we'll
add this to the estimate from all other points. We thus use
S = Σ_{i=1}^{N} ( v(t_i) − v_i )²          (3.84)
as an estimate of the error. Our approximation will be obtained by making
this error as small as possible - hence the terminology "least squares fit."
     time    velocity
     0.00    -0.10290
     0.10     0.37364
     0.20     2.43748
     0.30     3.93836
     0.40     3.31230
     0.50     5.49472
     0.60     5.43325
     0.70     6.39321
     0.80     9.06048
     0.90     9.36416
     1.00     9.52066

FIGURE 3.3 The measured velocity of a particle at various times.
In general, v(t) will be written in terms of unknown parameters, such
as the a and b in Equation (3.83). Let's think of varying a, for example, so
as to minimize S - a minimum will occur whenever ∂S/∂a is zero and the
second derivative is positive. That is, the error will be an extremum when

∂S/∂a = Σ_{i=1}^{N} 2 ( v(t_i) − v_i ) = 0.          (3.85)

And since

∂²S/∂a² = Σ_{i=1}^{N} 2 = 2N > 0,          (3.86)
we will have found a minimum. Of course, we'll have similar equations for b,

∂S/∂b = Σ_{i=1}^{N} 2 t_i ( v(t_i) − v_i ) = 0          (3.87)

and

∂²S/∂b² = Σ_{i=1}^{N} 2 t_i² > 0,          (3.88)

so that the error is minimized with respect to b as well. We thus arrive at the
equations

a N + b Σ_{i=1}^{N} t_i = Σ_{i=1}^{N} v_i          (3.89)

and

a Σ_{i=1}^{N} t_i + b Σ_{i=1}^{N} t_i² = Σ_{i=1}^{N} v_i t_i.          (3.90)
If a and b can be found such that these equations hold, then we will have found
a minimum in the error.
EXERCISE 3.14
Solve for a and b. Plot the velocity data and your "best fit" to that
data.
In a more complicated situation, the equations might be more difficult
to solve. That is, most of us have little difficulty with solving two equations
for two unknowns. But 5 equations, or 25 equations, is a different matter.
Before we go on, we really need to discuss the solution to an arbitrary linear
set of simultaneous equations, a standard topic of linear algebra.
Gaussian Elimination
Consider the set of equations
a_{11} x₁ + a_{12} x₂ + ··· + a_{1n} x_n = b₁
a_{21} x₁ + a_{22} x₂ + ··· + a_{2n} x_n = b₂
                      ...
a_{n1} x₁ + a_{n2} x₂ + ··· + a_{nn} x_n = b_n.          (3.91)
We'll solve this set via Gauss elimination. In discussing tridiagonal systems,
we hit upon the main idea of the process: combine two of the equations in
such a way as to eliminate one of the variables, and replace one of the original
equations by the reduced one. Although the system of equations confronting
us now is not as simple as the tridiagonal one, the process is exactly the same.
We'll start by eliminating x₁ in all but the first equation. Consider
the i-th equation,

a_{i1} x₁ + a_{i2} x₂ + ··· + a_{in} x_n = b_i.          (3.92)

Multiplying the first equation by a_{i1}/a_{11} and subtracting it from the i-th equa-
tion gives us

( a_{i2} − (a_{i1}/a_{11}) a_{12} ) x₂ + ··· + ( a_{in} − (a_{i1}/a_{11}) a_{1n} ) x_n = b_i − (a_{i1}/a_{11}) b₁.          (3.93)

After x₁ has been eliminated in equations 2 through N, we repeat the process
to eliminate x₂ from equations 3 through N. Eventually, we arrive at an upper
triangular set of equations that is easily solved by back substitution.
Now, let's repeat some of what we just said, but in matrix notation.
We start with the set of Equations (3.91), written as a matrix:
    [ a_{11}  a_{12}  a_{13}  ...  a_{1n} ] [ x₁  ]   [ b₁  ]
    [ a_{21}  a_{22}  a_{23}  ...  a_{2n} ] [ x₂  ]   [ b₂  ]
    [ a_{31}  a_{32}  a_{33}  ...  a_{3n} ] [ x₃  ] = [ b₃  ]
    [  ...     ...     ...         ...    ] [ ... ]   [ ... ]
    [ a_{n1}  a_{n2}  a_{n3}  ...  a_{nn} ] [ x_n ]   [ b_n ]
                                                           (3.94)
To eliminate x₁, we combined equations in a particular way. That process can
be expressed as a matrix multiplication - starting with

A x = b,          (3.95)

we multiplied by the matrix M₁, where

          [        1         0   0   ...   0 ]
          [ −a_{21}/a_{11}   1   0   ...   0 ]
    M₁ =  [ −a_{31}/a_{11}   0   1   ...   0 ]
          [       ...                        ]
          [ −a_{n1}/a_{11}   0   0   ...   1 ]
                                                  (3.96)

yielding the equation M₁ A x = M₁ b, where
           [ a_{11}     a_{12}                           a_{13}                          ...   a_{1n}                          ]
           [   0     a_{22} − (a_{21}/a_{11})a_{12}   a_{23} − (a_{21}/a_{11})a_{13}   ...   a_{2n} − (a_{21}/a_{11})a_{1n} ]
    M₁ A = [   0     a_{32} − (a_{31}/a_{11})a_{12}   a_{33} − (a_{31}/a_{11})a_{13}   ...   a_{3n} − (a_{31}/a_{11})a_{1n} ]
           [  ...                                                                                                            ]
           [   0     a_{n2} − (a_{n1}/a_{11})a_{12}   a_{n3} − (a_{n1}/a_{11})a_{13}   ...   a_{nn} − (a_{n1}/a_{11})a_{1n} ]
                                                                                                                      (3.97)

and

    M₁ b = [ b₁,  b₂ − (a_{21}/a_{11}) b₁,  b₃ − (a_{31}/a_{11}) b₁,  ...,  b_n − (a_{n1}/a_{11}) b₁ ]ᵀ,          (3.98)
which is just what we would expect from Equation (3.93). The next step, to
eliminate x₂ from equations 3 through N, is accomplished by multiplying by
M₂, and so on.
A particularly useful variation of this method is due to Crout. In it, we
write A as a product of a lower triangular matrix L and an upper triangular
matrix U,

A = L U,          (3.99)

or

    [ a_{11}  a_{12}  a_{13}  ...  a_{1n} ]   [ l_{11}    0      0    ...    0    ] [ 1  u_{12}  u_{13}  ...  u_{1n} ]
    [ a_{21}  a_{22}  a_{23}  ...  a_{2n} ]   [ l_{21}  l_{22}    0    ...    0    ] [ 0    1     u_{23}  ...  u_{2n} ]
    [ a_{31}  a_{32}  a_{33}  ...  a_{3n} ] = [ l_{31}  l_{32}  l_{33}  ...    0    ] [ 0    0       1     ...  u_{3n} ]
    [  ...     ...     ...         ...    ]   [  ...                               ] [ ...                            ]
    [ a_{n1}  a_{n2}  a_{n3}  ...  a_{nn} ]   [ l_{n1}  l_{n2}  l_{n3}  ...  l_{nn} ] [ 0    0       0     ...    1   ]
                                                                                                               (3.100)
Written out like this, the meaning of "lower" and "upper" triangular matrix
is pretty clear. It might seem that finding L and U would be difficult, but
actually it's not. Consider an element in the first column of A, and compare
it to the corresponding element of the matrix product on the right side. We
find that a_{11} = l_{11}, a_{21} = l_{21}, a_{31} = l_{31}, and so on, giving us the first column of
L. From the first row of A we find

a_{12} = l_{11} u_{12},   a_{13} = l_{11} u_{13},   ...          (3.101)

But we already know l_{11}, and so we can solve these equations for the first row
of U,

u_{1j} = a_{1j} / l_{11},          j = 2, ..., N.          (3.102)

We now move to the second column of A, which gives us

a_{22} = l_{21} u_{12} + l_{22},   a_{32} = l_{31} u_{12} + l_{32},   ...          (3.103)

Again, we know u_{12} and all the l_{i1} appearing in these equations, so we can
solve them for the second column of L,

l_{i2} = a_{i2} − l_{i1} u_{12},          i = 2, ..., N.          (3.104)
We then consider the second row of A, and so on. In general, from the k-th
column of A we find that

l_{ik} = a_{ik} − Σ_{j=1}^{k−1} l_{ij} u_{jk},          i = k, k+1, ..., n,          (3.105)

while from the k-th row of A we find that

u_{kj} = ( a_{kj} − Σ_{i=1}^{k−1} l_{ki} u_{ij} ) / l_{kk},          j = k+1, k+2, ..., n.          (3.106)
By alternating between the columns and rows, we can solve for all the ele-
ments of L and U.
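The pattern of Equations (3.105) and (3.106) is easy to express directly in code. The subroutine below is only a bare-bones sketch, without the pivoting and scaling that are added below; the name Crout is a choice made here, and A is overwritten by L (on and below the diagonal) and by U (above the diagonal, with its unit diagonal understood).

      Subroutine Crout( A, ndim, n )
*
*     A minimal Crout decomposition, Eqs. (3.105) and (3.106),
*     with no pivoting.
*
      Integer ndim, n, i, j, k
      Double Precision A(ndim,ndim), sum
      DO k = 1, n
*        Column k of L, Eq. (3.105)
         DO i = k, n
            sum = a(i,k)
            DO j = 1, k-1
               sum = sum - a(i,j)*a(j,k)
            END DO
            a(i,k) = sum
         END DO
*        Row k of U, Eq. (3.106)
         DO j = k+1, n
            sum = a(k,j)
            DO i = 1, k-1
               sum = sum - a(k,i)*a(i,j)
            END DO
            a(k,j) = sum / a(k,k)
         END DO
      END DO
      end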
Recall that the original problem is to find x in the matrix equation

A x = b,          (3.107)

or, equivalently, in

L U x = b,          (3.108)

where we've replaced A by LU. Defining z = Ux, this is the same as

L z = b,          (3.109)

an equation for the unknown z. But this is "easy" to solve! In terms of the
component equations, we have

l_{11} z₁                                              = b₁
l_{21} z₁ + l_{22} z₂                                  = b₂
l_{31} z₁ + l_{32} z₂ + l_{33} z₃                      = b₃
                      ...
l_{n1} z₁ + l_{n2} z₂ + l_{n3} z₃ + ··· + l_{nn} z_n   = b_n.          (3.110)

From the first of these we find z₁ = b₁/l_{11}, and from subsequent equations we
find

z_i = ( b_i − Σ_{k=1}^{i−1} l_{ik} z_k ) / l_{ii},          i = 2, ..., N,          (3.111)
by forward substitution. Having found z, we then solve
U x = z          (3.112)

for x. Since U is upper triangular, this equation is also easily solved - it's
already in the form required for backward substitution. We have

x₁ + u_{12} x₂ + u_{13} x₃ + ··· + u_{1n} x_n  = z₁
       x₂ + u_{23} x₃ + ··· + u_{2n} x_n       = z₂
              x₃ + ··· + u_{3n} x_n            = z₃
                      ...
                                   x_n         = z_n,          (3.113)
from which we find

x_{n−i} = z_{n−i} − Σ_{k=n−i+1}^{n} u_{n−i,k} x_k,          i = 1, ..., n − 1.          (3.114)
In some problems we might be required to solve Ax = b for many different
b's. In that situation we would perform the decomposition only once, storing
L and U and using them in the substitution step to find x.
Superficially, this process looks much different from Gaussian elimi-
nation. However, as you move through it, you find that you're performing ex-
actly the same arithmetic steps as with Gaussian elimination - actually, the
Crout decomposition is nothing more than an efficient bookkeeping scheme.
It has the additional advantage ofnot requiring b until the substitution phase.
Thus the decomposition is independent of b, and once the decomposition has
been accomplished, only the substitution phases need to be performed to solve
for x with a different b. The LU decomposition is also efficient in terms of stor-
age, in that the original array A can be used to store its LU equivalent. That
is, the array A will initially contain the elements of A, but can be overwritten
so as to store L and U in the scheme
       [ L(1,1)  U(1,2)  U(1,3)  ...  U(1,N) ]
       [ L(2,1)  L(2,2)  U(2,3)  ...  U(2,N) ]
[A] =  [ L(3,1)  L(3,2)  L(3,3)  ...  U(3,N) ]
       [  ...                                ]
       [ L(N,1)  L(N,2)  L(N,3)  ...  L(N,N) ]
                                                  (3.115)
There is a problem with the algorithm, however, as it's been pre-
sented. If any l_{kk} is zero, the computer will attempt to divide by zero. This
could happen, for example, in trying to solve the set of equations

0 · x₁ + x₂ = 2
    x₁ + x₂ = 1.          (3.116)

These equations clearly have a trivial solution, but in determining the first
row of U from Equation (3.102) the algorithm will attempt to evaluate u_{12}
as 1/0. The remedy is simply to interchange the order of the two equations.
In general, we'll search for the largest element of the k-th column of L, and
interchange the rows to move that element onto the diagonal. If the largest el-
ement is zero, then the system of equations has no unique solution! In some
cases it can also happen that the coefficients vary by orders of magnitude.
This makes determining the "largest" difficult. To make the comparison of
the coefficients meaningful, the equations should be scaled so that the largest
coefficient is made to be unity. (We note that this scaling is done for compar-
ison purposes only, and need not be included as a step in the actual compu-
tation.) The swapping of rows and scaling makes the coding a little involved,
and so we'll provide a working subroutine, LUSolve, that incorporates these
complications.
Subroutine LUsolve( A, x, b, det, ndim, n )
*----------------------------------------------------------
* Paul L. DeVries, Department of Physics, Miami University
*
* This subroutine solves the linear set of equations
*
*                       A x = b
*
* by the method of L U decomposition.
*
*   INPUT:   ndim   the size of the arrays, as dimensioned
*                   in the calling routine
*            n      the actual size of the arrays for
*                   this problem
*            A      an n by n array of coefficients,
*                   altered on output
*            b      a vector of length n
*
*   OUTPUT:  x      the 'solution' vector
*            det    the determinant of A.  If the
*                   determinant is zero, A is SINGULAR.
*
*   1/1/93
*
integer ndim,n,order(100),i,j,k, imax,itemp
double precision a(ndim,ndim),x(ndim),b(ndim),det
double precision scale(100), max, sum, temporary
if(n.gt.100) stop ' n too large in LUsolve'
det = 1.d0
*
* First, determine a scaling factor for each row. (We
* could "normalize" the equation by multiplying by this
* factor. However, since we only want it for comparison
* purposes, we don't need to actually perform the
* multiplication.)
*
DO i = 1, n
   order(i) = i
   max = 0.d0
   DO j = 1, n
      if( abs(a(i,j)) .gt. max ) max = abs(a(i,j))
   END DO
   scale(i) = 1.d0/max
END DO
*
* Start the LU decomposition. The original matrix A
* will be overwritten by the elements of L and U as
* they are determined. The first row and column
* are specially treated, as is L(n,n).
*
DO 1000 k = 1, n-1
*
*    Do a column of L
*
     IF( k .eq. 1 ) THEN
*       No work is necessary.
     ELSE
*       Compute elements of L from Eq. (3.105).
*       Put L(i,k) into A.
        DO i = k, n
           sum = a(i,k)
           DO j = 1, k-1
              sum = sum - a(i,j)*a(j,k)
           END DO
           a(i,k) = sum
        END DO
     ENDIF
*
* Do we need to interchange rows? We want the largest
* (scaled) element of the recently computed column of L
* moved to the diagonal (k,k) location.
*
     max = 0.d0
     DO i = k, n
        IF( scale(i)*abs(a(i,k)) .ge. max ) THEN
           max  = scale(i)*abs(a(i,k))
           imax = i
        ENDIF
     END DO
*
* Largest element is L(imax,k). If imax=k, the largest
* (scaled) element is already on the diagonal.
*
     IF(imax .eq. k)THEN
*       No need to exchange rows.
     ELSE
*       Exchange rows ...
        det = -det
        DO j = 1, n
           temporary = a(imax,j)
           a(imax,j) = a(k,j)
           a(k,j)    = temporary
        END DO
*
*       scale factors ...
*
        temporary   = scale(imax)
        scale(imax) = scale(k)
        scale(k)    = temporary
*
*       and record changes in the ordering
*
        itemp       = order(imax)
        order(imax) = order(k)
        order(k)    = itemp
     ENDIF
     det = det * a(k,k)
*
* Now compute a row of U.
*
     IF( k .eq. 1 ) THEN
*       The first row is treated special, see Eq. (3.102).
*
        DO j = 2, n
           a(1,j) = a(1,j) / a(1,1)
        END DO
ELSE
* Compute U(k,j) from Eq. (3.106).
*
DO j = k+1, n
sum = a(k,j)
DO i = 1, k-1
sum = sum - a(k,i)*a(i,j)
END DO
* Put the element U(k,j) into A.
*
a(k,j) = sum / a(k,k)
END DO
ENDIF
1000 CONTINUE
*
* Now, for the last element of L
*
sum = a(n,n)
DO j = 1, n-1
   sum = sum - a(n,j)*a(j,n)
END DO
a(n,n) = sum
det = det * a(n,n)
*
* LU decomposition is now complete.
*
* We now start the solution phase. Since the equations
* have been interchanged, we interchange the elements of
* B the same way, putting the result into X.
*
DO i = 1, n
   x(i) = b( order(i) )
END DO
*
* Forward substitution...
*
x(1) = x(1) / a(1,1)
DO i = 2, n
   sum = x(i)
   DO k = 1, i-1
      sum = sum - a(i,k)*x(k)
   END DO
   x(i) = sum / a(i,i)
END DO
*
* and backward substitution...
*
DO i = 1, n-1
   sum = x(n-i)
   DO k = n-i+1, n
      sum = sum - a(n-i,k)*x(k)
   END DO
   x(n-i) = sum
END DO
*
* and we're done!
*
end
A calculation of the determinant of the matrix has been included. The
determinant of a product of matrices is just the product of the determinants
of the matrices. For triangular matrices, the determinant is just the product
of the diagonal elements, as you'll find if you try to expand it in minors.
So after the matrix A has been written as LU, the evaluation
of the determinant is straightforward. Except, of course, that permuting the
rows introduces an overall sign change, which must be accounted for. If the
determinant is zero, no unique, nontrivial solution exists; small determinants
are an indication of an ill-conditioned set of equations whose solution might
be very difficult.
EXERCISE 3.15
You might want to test the subroutine by solving the equations

    [  2  -1   0   0   0 ]
    [ -1   2  -1   0   0 ]
    [  0  -1   2  -1   0 ]  x  =  b,          (3.117)
    [  0   0  -1   2  -1 ]
    [  0   0   0  -1   2 ]

which you've seen before.

EXERCISE 3.16
As an example of a system that might be difficult to solve, consider

    [  1   1/2  1/3  1/4  1/5 ] [ x₁ ]   [ 1 ]
    [ 1/2  1/3  1/4  1/5  1/6 ] [ x₂ ]   [ 2 ]
    [ 1/3  1/4  1/5  1/6  1/7 ] [ x₃ ] = [ 3 ]          (3.118)
    [ 1/4  1/5  1/6  1/7  1/8 ] [ x₄ ]   [ 4 ]
    [ 1/5  1/6  1/7  1/8  1/9 ] [ x₅ ]   [ 5 ]

The matrix with elements H_{ij} = 1/(i + j − 1) is called the Hilbert
matrix, and is a classic example of an ill-conditioned matrix. With
double precision arithmetic, you are equipped to solve this particular
problem. However, as the dimension of the array increases it becomes
futile to attempt a solution. Even for this 5 x 5 matrix, the determi-
nant is surprisingly small - be sure to print the determinant, as well
as to find the solutions.
Before returning to the problem of curve fitting by least squares, we
need to emphasize the universality of systems of linear equations, and hence
the importance of Gaussian elimination in their solution. You've probably
seen this problem arise many times already, and you will surely see it many
more times. For example, consider a typical problem appearing in university
physics courses: electrical networks, as seen in Figure 3.4.
FIGURE 3.4 An electrical network requiring the solution of a set of
simultaneous linear equations.
Let's arbitrarily define the potential at reference point i to be zero;
the potential at a is then V. Given the resistances and the applied voltage,
the problem is to determine the potential at each of the other points b, c,
..., h. Current enters the resistance grid at a, passes through the resistors,
and returns to the battery through point i. Since there is no accumulation of
charge at any point, any current flowing into a node must also flow out - this
is simply a statement of the conservation of electrical charge. As an example,
the current flowing through R₁ to the point b must equal the current flowing
through resistors R₃ and R₄ away from the point b. And from Ohm's law we
know that the voltage drop across a resistor is the product of the resistance
and the current, or that the current is simply the voltage drop divided by the
resistance. For point b we can then write

( V − V_b )/R₁ = ( V_b − V_d )/R₃ + ( V_b − V_e )/R₄,          (3.119)

and similar equations for all the other points in the network. Consider a
specific network in which R₁ = R₃ = R₄ = R₉ = R₁₀ = R₁₂ = 1 ohm,
R₂ = R₅ = R₆ = R₇ = R₈ = R₁₁ = 2 ohms, and V = 10 volts. We easily find
the following set of 7 equations describing the system,
3V_b − V_d − V_e                    = 10
3V_c − V_e − V_f                    = 10
2V_b − 3V_d + V_g                   = 0
2V_b + V_c − 6V_e + V_g + 2V_h      = 0          (3.120)
V_c − 3V_f + 2V_h                   = 0
V_d + V_e − 3V_g                    = 0
V_e + V_f − 3V_h                    = 0
EXERCISE 3.17
Use LUsolve to find the voltages in the described electrical network.
Much more complicated networks can also be considered. Any net-
work involving simply batteries and resistors will lead to a set of simultane-
ous linear equations such as these. Incorporation of capacitors and inductors,
which depend upon the time rate of change of the voltage, give rise to linear
differential equations. Including transistors and diodes further complicates
the problem by adding nonlinearity to the network.
General Least Squares Fitting
We can now turn our attention to least squares fitting with an arbitrary poly-
nomial of degree m. We'll write the polynomial as

p(x) = c₀ + c₁ x + c₂ x² + ··· + c_m x^m.          (3.121)

The sum of the square of the errors is then

S = Σ_{i=1}^{N} ( p(x_i) − y_i )²,          (3.122)

where p(x) is our assumed functional form for the data known at the N points
(x_i, y_i). This error is to be minimized by an appropriate choice of the coeffi-
cients c_j; in particular, we require

∂S/∂c_j = 0,          j = 0, 1, ..., m.          (3.123)

This gives us m + 1 equations for the m + 1 unknown coefficients c_j,

c₀ N          + c₁ Σ x_i         + c₂ Σ x_i²         + ··· + c_m Σ x_i^m       − Σ y_i       = 0,
c₀ Σ x_i      + c₁ Σ x_i²        + c₂ Σ x_i³         + ··· + c_m Σ x_i^{m+1}   − Σ x_i y_i   = 0,
                                        ...
c₀ Σ x_i^m    + c₁ Σ x_i^{m+1}   + c₂ Σ x_i^{m+2}    + ··· + c_m Σ x_i^{2m}    − Σ x_i^m y_i = 0.          (3.124)
These simultaneous linear equations, also known as normal equations, can be
solved by LUsolve for the coefficients.
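A short fragment makes the bookkeeping concrete. The sketch below accumulates the sums appearing in Equation (3.124) into a matrix and right-hand side and then calls LUsolve for the coefficients; the array names, the degree m = 2, and the synthetic y = 1 + 2x + 3x² data are simply illustrative choices made here - any (x,y) data could be used instead.

      Integer mdim, ndata, m, N, i, j, k
      Parameter ( mdim = 10, ndata = 100 )
      Double Precision AA(mdim,mdim), rhs(mdim), coef(mdim), det
      Double Precision x(ndata), y(ndata)
*
*     For illustration, fit a quadratic (m = 2) to samples of
*     y = 1 + 2x + 3x**2.
*
      m = 2
      N = 11
      DO i = 1, N
         x(i) = 0.1d0 * dble(i-1)
         y(i) = 1.d0 + 2.d0*x(i) + 3.d0*x(i)**2
      END DO
*
*     Accumulate the sums of Eq. (3.124) ...
*
      DO j = 1, m+1
         DO k = 1, m+1
            AA(j,k) = 0.d0
            DO i = 1, N
               AA(j,k) = AA(j,k) + x(i)**(j+k-2)
            END DO
         END DO
         rhs(j) = 0.d0
         DO i = 1, N
            rhs(j) = rhs(j) + y(i) * x(i)**(j-1)
         END DO
      END DO
*
*     ... and solve them.  coef(j) is the coefficient of x**(j-1).
*
      Call LUsolve( AA, coef, rhs, det, mdim, m+1 )
      write(*,*) ( coef(j), j = 1, m+1 )
      end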
     time     height
     0.00    1.67203
     0.10    1.79792
     0.20    2.37791
     0.30    2.66408
     0.40    2.11245
     0.50    2.43969
     0.60    1.88843
     0.70    1.59447
     0.80    1.79634
     0.90    1.07810
     1.00    0.21066

FIGURE 3.5 The measured height of a particle at various times.
General Least Squares Fitting 133
In another experiment, an object has been hurled into the air and its
position measured as a function of time. The data obtained in this hypothet-
ical experiment are contained in Figure 3.5. From our studies of mechanics
we know that the height should vary as the square of the time, so instead of
a linear fit to the data we should use a quadratic one.
EXERCISE 3.18
Fit a quadratic to the position data, using least squares, and deter-
mine g, the acceleration due to gravity.
FIGURE 3.6 The residual error from the linear least squares fit to
the data of Figure 3.5.
For some problems, particularly if the functional dependence of the
data isn't known, there's a temptation to use a high order least squares fit.
This approach didn't work earlier when we were interpolating data, and for
the same reasons it won't work here either. Simply using a curve with more
parameters doesn't guarantee a better fit. To illustrate, let's return to the
position data and fit a linear, a quadratic, and a cubic polynomial to the data.
For the linear fit, the least squares error is computed to be 2.84, which seems
rather large given the magnitudes of our data. Even more revealing is the
residual error, y_i - p(x_i), which is plotted in Figure 3.6. The magnitude of
the error tells us the fit is not very good. Note that the residual error is almost
systematic - at first it's negative, then it's positive, and then it's negative
again. If we had a good fit, we would expect that the sign of the error would
be virtually random. In particular, we would expect that the sign of the error
at one point would be independent of the sign at the previous point, so that
we would expect the sign of the error to change about half the time. But in
these residuals, the error only changes sign twice. This strongly suggests a
functional dependence in the data that we have not included in our polynomial
approximation.
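A convenient way to quantify this observation is simply to count the sign changes
in the residuals; the short function below is an illustration written for this
discussion, not code from the text.

      Integer Function NSign( n, r )
*
*     Count the sign changes in the residuals r(1..n).  Roughly
*     n/2 changes suggests random scatter; very few suggests a
*     systematic trend has been missed.  (An illustration only.)
*
      integer n, i, count
      double precision r(n)
      count = 0
      do i = 2, n
         if ( r(i)*r(i-1) .lt. 0.d0 ) count = count + 1
      end do
      NSign = count
      end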
But you have already determined the best quadratic fit, and found
that the least squares error was about 0.52. The quadratic fit is thus a con-
siderable improvement over the linear one. Moreover, the residual error is
smaller in magnitude and more scattered than for the linear fit. In fact, there
are 7 sign changes in the error, much more consistent with the picture of the
approximation "fitting" through the scatter in the data than was the linear
fit.
So we have found a good fit to the data, having both a small magnitude
in the error and a healthy scatter in the signs of the residual errors. The
prudent course of action is to quit! And what if we were to consider a higher
degree polynomial? If there were no scatter in the data, then trying to fit
a cubic would yield a C3 coefficient that was zero. Of course, there's always
some scatter in the data, so this rarely happens. But the fit is no better than
we already have - we find that S = 0.52 again. The unmistakable conclusion
is that there is no advantage in going to higher order approximations.
To summarize: the primary factor in determining a good least squares
fit is the validity of the functional form to which you're fitting. Certainly, the-
oretical or analytic information about the physical problem should be incorpo-
rated whenever it's available. The residual error, the difference between the
determined "best fit" and the actual data, is a good indicator of the quality of
the fit, and can suggest instances when a systematic functional dependence
has been overlooked.
Least Squares and Orthogonal Polynomials
Although fitting data is a primary application of the method of least squares,
the method can also be applied to the approximation of functions. Consider,
for example, a known function f(x) which we want to approximate by the
polynomial p(x). Instead of a Taylor series expansion, we'll try to find the best
polynomial approximation, in the least squares sense, to the function f (x). To
this end we replace the sum over data points by an integral, and define the
error as being
    S = ∫_a^b [ f(x) - p(x) ]² dx.                                 (3.125)
We'll obtain normal equations just as before, by minimizing S with respect to
the coefficients in the approximating polynomial.
      Consider a specific example: what quadratic polynomial best approx-
imates sin πx between 0 and 1? We define the least squares error as

    S = ∫_0^1 [ sin πx - (c_0 + c_1 x + c_2 x²) ]² dx,             (3.126)

and take the partial derivatives of S with respect to the coefficients to deter-
mine the normal equations

    ∂S/∂c_0 = -2 ∫_0^1 [ sin πx - (c_0 + c_1 x + c_2 x²) ] dx = 0,      (3.127)

    ∂S/∂c_1 = -2 ∫_0^1 x [ sin πx - (c_0 + c_1 x + c_2 x²) ] dx = 0,    (3.128)

and

    ∂S/∂c_2 = -2 ∫_0^1 x² [ sin πx - (c_0 + c_1 x + c_2 x²) ] dx = 0.   (3.129)

Performing the integrals, we find

    c_0   + c_1/2 + c_2/3 = 2/π,
    c_0/2 + c_1/3 + c_2/4 = 1/π,                                   (3.130)
    c_0/3 + c_1/4 + c_2/5 = (π² - 4)/π³.

We can solve this set of equations for the coefficients, and thus determine the
"best" quadratic approximation to be

    p(x) = -0.0505 + 4.1225 x - 4.1225 x²,    0 < x < 1.           (3.131)
This expression certainly would not have been found by a Taylor series ex-
pansion. It is, in fact, fundamentally different. Whereas the Taylor series is
an expansion about a point, the least squares fit results from consideration of
the function over a specific region.
In principle, we could find higher order polynomials giving an even
better approximation to the function. But Equation (3.130) seems familiar -
writing it in matrix form we have

    [ 1    1/2  1/3 ] [ c_0 ]   [ ∫_0^1 f(x) dx    ]
    [ 1/2  1/3  1/4 ] [ c_1 ] = [ ∫_0^1 x f(x) dx  ]              (3.132)
    [ 1/3  1/4  1/5 ] [ c_2 ]   [ ∫_0^1 x² f(x) dx ]
and we recognize the square array as a Hilbert matrix. If we were to consider
higher order approximations, the dimension of the Hilbert matrix would in-
crease and its determinant would become exceedingly small, making it very
difficult to solve this problem. We should stress that any polynomial approx-
imation by least squares leads to this form, not just the particular example
we've treated. (Actually, the function being approximated only appears on
the right-hand side, and so doesn't influence the array at all. Only the limits
of the integration affect the numerical entries in the array.)
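As a check of Equation (3.131), one might solve the 3 x 3 system of Equation
(3.130) numerically; the sketch below (not from the text, and again assuming a
LUsolve calling sequence) should reproduce the quoted coefficients.

      Program BestQuad
*
*     Solve the normal equations (3.130) for the quadratic that
*     best approximates sin(pi x) on [0,1] in the least squares
*     sense.  (A sketch only; LUsolve's interface is assumed.)
*
      double precision a(3,3), c(3), pi
      integer i, j
      pi = 4.d0 * atan(1.d0)
*     The Hilbert matrix of Equation (3.132):
      do i = 1, 3
         do j = 1, 3
            a(i,j) = 1.d0 / dble(i+j-1)
         end do
      end do
*     Right-hand side of Equation (3.130):
      c(1) = 2.d0 / pi
      c(2) = 1.d0 / pi
      c(3) = ( pi**2 - 4.d0 ) / pi**3
      call LUsolve( a, 3, c )
*     Should print approximately -0.0505, 4.1225, -4.1225:
      write(*,*) ( c(i), i = 1, 3 )
      end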
      Instead of expanding the function as

    f(x) ≈ c_0 + c_1 x + c_2 x² + ... + c_m x^m,                   (3.133)

let's expand it in a more general fashion as

    f(x) ≈ a_0 p_0(x) + a_1 p_1(x) + ... + a_m p_m(x),             (3.134)

where a_i are coefficients and the p_i(x) are i-th order polynomials. Let's see
what conditions we might put on these polynomials to make the normal equa-
tions easier to solve. We'll consider the general problem of fitting over the
region [-1, 1], and write the error as

    S = ∫_{-1}^{1} [ f(x) - Σ_{i=0}^{m} a_i p_i(x) ]² dx;          (3.135)
partial differentiation with respect to the a_i leads to the normal equations

    a_0 ∫_{-1}^{1} p_0²(x) dx + a_1 ∫_{-1}^{1} p_0(x) p_1(x) dx + a_2 ∫_{-1}^{1} p_0(x) p_2(x) dx
        = ∫_{-1}^{1} p_0(x) f(x) dx,                               (3.136)

    a_0 ∫_{-1}^{1} p_0(x) p_1(x) dx + a_1 ∫_{-1}^{1} p_1²(x) dx + a_2 ∫_{-1}^{1} p_1(x) p_2(x) dx
        = ∫_{-1}^{1} p_1(x) f(x) dx,                               (3.137)

and

    a_0 ∫_{-1}^{1} p_0(x) p_2(x) dx + a_1 ∫_{-1}^{1} p_1(x) p_2(x) dx + a_2 ∫_{-1}^{1} p_2²(x) dx
        = ∫_{-1}^{1} p_2(x) f(x) dx.                               (3.138)
Now, these equations for the a_i would be easier to solve if we could choose the
p_i(x) to make some of these integrals vanish. That is, if

    ∫_{-1}^{1} p_i(x) p_j(x) dx = 0,    i ≠ j,                     (3.139)

then we could immediately solve the normal equations and find that the "best
fit" in the least squares sense is obtained when the expansion coefficients are
simply

    a_i = ∫_{-1}^{1} p_i(x) f(x) dx / ∫_{-1}^{1} p_i²(x) dx.       (3.140)

We should also note that the determination of one coefficient is independent
of all the others - once a_1 has been found, it doesn't change as the expan-
sion is extended to include higher order polynomials. These are remarkable
properties, well worth having.
Functions which satisfy Equation (3.139) are called orthogonal poly-
nomials, and play several important roles in computational physics. We might
define them in a more general fashion, as satisfying

    ∫_a^b w(x) p_i(x) p_j(x) dx = 0,    i ≠ j,                     (3.141)
where [a, b] is the region of interest and w(x) is a weighting function. Sev-
eral well-known functions are actually orthogonal functions, some of which
are listed in Table 3.2. Of these, the Legendre, Laguerre, and Hermite poly-
nomials all are of direct physical significance, and so we often find instances
in which the physical solution is expanded in terms of these functions. The
Chebyshev polynomials have no direct physical significance, but are particu-
larly convenient for approximating functions.
TABLE 3.2 Orthogonal Polynomials

    Name           Symbol     [a, b]        w(x)
    Legendre       Pn(x)      [-1, 1]       1
    Chebyshev I    Tn(x)      [-1, 1]       (1 - x²)^(-1/2)
    Chebyshev II   Un(x)      [-1, 1]       (1 - x²)^(+1/2)
    Laguerre       Ln(x)      [0, ∞)        e^(-x)
    Hermite        Hn(x)      (-∞, ∞)       e^(-x²)
Nonlinear Least Squares
Not all the problems that confront us are linear. Imagine that you have some
experimental data that you believe should fit a particular theoretical model.
For example, atoms in a gas emit light in a range of wavelengths described by
the Lorentzian lineshape function
1
1(>-') = 1
0
1 + 4(>-' _ >-'0)2 jr2'
(3.142)
where >-. is the wavelength of the light emitted, >-'0 is the resonant wavelength,
r is the width ofthe curve (full width at half maximum), and 1
0
is the intensity
of the light at >-. = >-'0. From measurements of 1(>-'), which necessarily contain
experimental noise, we're to determine >-'0, r, and 1
0
(Actually, the intensity
is often measured in arbitrary units, and so is unimportant. The position
and width of the curve, however, are both significant quantities. The position
depends upon the atom emitting the light and hence serves to identify the
atom, while the width conveys information concerning atom's environment,
e.g., the pressure of the gas. )
Sample data of ambient room light, obtained by an optical multichan-
nel analyzer, are presented in Figure 3.7. A diffraction grating is used to
spread the light, which then falls upon a linear array of very sensitive electro-
optical elements, so that each element is exposed to some (small) range of
wavelengths. Each element then responds according to the intensity of the
light falling upon it, yielding an entire spectrum at one time. Note that the
data is not falling to zero away from the peak as would be suggested by our
theoretical lineshape. This is typical of experimental data - a baseline must
also be established. We thus define our error as

    S = Σ_{j=1}^{N} [ I_j - B - I(λ_j) ]²,                         (3.143)
where I_j are the measured intensities. To obtain the normal equations, we
minimize S with respect to B, the baseline, and the lineshape parameters λ_0,
Γ, and I_0, as before. However, I(λ) is not a linear function of λ_0 and Γ, so
that the normal equations we derive will be nonlinear equations, much more
difficult than the linear ones we had earlier.
FIGURE 3.7 A spectrum of ambient room light, in arbitrary units
of intensity. (Data courtesy L. W. Downes, Miami University.)
Rather than pursue this approach, let's look at the problem from a
different perspective. First, let's write the least squares error as

    S(a_1, a_2, ..., a_m) = Σ_{k=1}^{N} [ y_k - Y(x_k; a_1, a_2, ..., a_m) ]²,    (3.144)

where y_k are the data at x_k and Y(x_k; a_1, a_2, ..., a_m) is the proposed function
evaluated at x_k and parameterized by the coefficients a_1, a_2, ..., a_m. Our
goal is to minimize S - why don't we simply vary the parameters a_i until
we find a minimum? If we already have initial guesses a_1^0, a_2^0, ..., a_m^0 that lie
near a minimum, then S will be approximately quadratic. Let's evaluate S at
a_1^0 + h_1, a_1^0, and a_1^0 - h_1, holding all the other parameters fixed. The minimum
of a quadratic interpolating polynomial through these points then gives us a
better approximation to the minimum of S,

    a_1 = a_1^0 - (h_1/2) [ S(a_1^0+h_1) - S(a_1^0-h_1) ] / [ S(a_1^0+h_1) - 2 S(a_1^0) + S(a_1^0-h_1) ].    (3.145)
The process is then repeated for a_2, a_3, ..., a_m. Admittedly, this is a crude
procedure, but it works! (Sort of.)
We want to keep the programming as general as possible, so that we
don't have to rewrite the routine to solve a different problem. In particular,
the routine doesn't need to know much about S - only how many parameters
are used, and the value of S at some specific locations. Crucial portions of the
code might look something like the following:
      Subroutine Minimize
      double precision a(6),h(6)
      integer M
*
*     Get initial guesses for the parameters.
*
      call init(M,a,h)
      call crude(M,a,h)
      end
*
*---------------------------------------------------
*
      Subroutine Init(m,a,h)
      double precision a(6),h(6)
      integer m,max
      parameter ( max = 6 )
*
*     We need four parameters to determine the lineshape.
*
      m = 4
*
*     Check if properly dimensioned. If not, will need to
*     modify the following array bounds:
*
*        In routine:     modify arrays:
*          minimize        a, h
*          crude           a, h, a_plus, a_minus
*          sum             a
*
      if(m.gt.max) stop 'dimensional error in init'
*
*     Initial guesses for the parameters:
*
      a(1) = 4358.4d0          ! resonant wavelength
      a(2) = 0.3d0             ! linewidth
      a(3) = 120.d0            ! intensity
      a(4) = 40.d0             ! baseline
*
*     Initial values for the step sizes
*
      h(1) = 0.1d0
      h(2) = 0.05d0
      h(3) = 4.d0
      h(4) = 2.d0
      end
*
*--------------------------------------------------------
*
      Subroutine CRUDE (m,a,h)
      double precision a(6),h(6),a_plus(6),a_minus(6)
      double precision sp,s0,sm,sum
      external sum
      integer i,k,m
*
*     Cycle through each parameter
*
      DO i = 1, m
*
*        The SUM will be evaluated with the parameters
*        A_PLUS, A, and A_MINUS:
*
         DO k = 1, m
            IF (k .eq. i) THEN
               a_plus(i)  = a(i) + h(i)
               a_minus(i) = a(i) - h(i)
            ELSE
               a_plus(k)  = a(k)
               a_minus(k) = a(k)
            ENDIF
         END DO
*
*        Evaluate the sum at the three parameter sets:
*
         sp = sum(a_plus)
         s0 = sum( a )
         sm = sum(a_minus)
*
*        Move to the minimum of the interpolating quadratic:
*
         a(i) = a(i) - 0.5d0*h(i)*(sp-sm)/(sp-2.d0*s0+sm)
*
*        As we move towards a minimum, we should decrease the
*        step size used in calculating the derivative.
*
         h(i) = 0.5d0 * h(i)
      END DO
      end
*
*----------------------------------------------------------
*
      double precision Function SUM(a)
      double precision a(6), x(21), y(21)
      double precision TC, lambda, lambda_0, baseline,
     +                 intensity, gamma
      integer i
*
*     All the information about the least squares error,
*     including the proposed functional form of the data
*     --- and the data itself --- is located in this
*     function, and NOWHERE ELSE.
*
      data (x(i),i=1,21) / 4357.84d0, 4357.89d0, 4357.94d0,
     +     4357.99d0, 4358.04d0, 4358.09d0, 4358.14d0,
     +     4358.19d0, 4358.24d0, 4358.29d0, 4358.34d0,
     +     4358.39d0, 4358.44d0, 4358.49d0, 4358.54d0,
     +     4358.59d0, 4358.64d0, 4358.69d0, 4358.74d0,
     +     4358.79d0, 4358.84d0 /
      data (y(i),i=1,21) /  40.d0,  44.d0,  41.d0,
     +      46.d0,  47.d0,  54.d0,  66.d0,
     +      97.d0, 129.d0, 153.d0, 165.d0,
     +     168.d0, 143.d0, 111.d0,  79.d0,
     +      64.d0,  52.d0,  51.d0,  44.d0,
     +      46.d0,  41.d0 /
*
*     The theoretical curve TC is given as
*
*                                       intensity
*     TC(lambda) = baseline + --------------------------------
*                             1+4(lambda-lambda_0)**2/gamma**2
*
*     where
*
      lambda_0  = a(1)         ! the resonance wavelength
      gamma     = a(2)         ! the linewidth
      intensity = a(3)
      baseline  = a(4)
*
      SUM = 0.d0
      do i = 1, 21
         lambda = x(i)
*
*        evaluate the theoretical curve:
*
         TC = baseline + intensity /
     +        (1.d0 + 4.d0*(lambda-lambda_0)**2/gamma**2)
*
*        add the square of the difference to the sum:
*
         sum = sum + ( y(i) - TC )**2
      end do
      end
As presented, the code will vary each of the parameters once. But, of
course, this doesn't assure us of finding a true minimum. In general, we'll
need to repeat the process many times, adjusting the parameters each time
so that the sum of the errors is decreased. S can be reduced to zero only if
the proposed curve fits every data point. Since we assume there is random
scatter in the experimental data, this cannot occur. Rather, the error will be
reduced to some minimum, determined by the validity of the proposed curve
and the "noise" in the data. Modify the code to make repeated calls to CRUDE.
As with the root-finder, exercise caution - terminate the process if the error
hasn't decreased by half from the previous iteration, and provide a graceful
exit after some maximum number - say, ten - iterations.
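One possible shape for that modified driver - a sketch only, not the text's
solution to the exercise - is:

      Subroutine Minimize
*
*     Call CRUDE repeatedly, quitting when the error is no longer
*     falling by half, or after ten iterations.  (A sketch, not
*     the text's solution.)
*
      double precision a(6),h(6),sold,snew,sum
      external sum
      integer M,iter
      call init(M,a,h)
      sold = sum(a)
      do iter = 1, 10
         call crude(M,a,h)
         snew = sum(a)
         if ( snew .gt. 0.5d0*sold ) goto 10
         sold = snew
      end do
   10 continue
      write(*,*) 'parameters:', (a(iter),iter=1,M), '   S =', snew
      end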
EXERCISE 3.19
With these modifications, perform a least squares fit to the lineshape
data. By the way, can you identify the atom?
Although this crude method works, it can be rather slow to converge.
In essence, it's possible for the routine to "walk around" a minimum, getting
closer at each step but not moving directly toward it. This happens because
the method does not use any information about the (multidimensional) shape
of S to decide in which direction to move - it simply cycles through each
parameter in turn. Let's reexamine our approach and see if we can develop a
method that is more robust in moving toward the minimum.
Our goal is to determine the point at which

    ∂S(a_1, a_2, ..., a_m)/∂a_i = 0,                               (3.146)

for i = 1, ..., m. (These are just the normal equations we had before.) Let's
expand ∂S/∂a_i about a_1^0, a_2^0, ..., a_m^0, keeping only the linear terms,

    ∂S(a_1, a_2, ..., a_m)/∂a_i ≈ ∂S/∂a_i |_0 + Σ_j (∂²S/∂a_i ∂a_j)|_0 (a_j - a_j^0).    (3.147)

Defining δ_i^0 = a_i - a_i^0,

    S_i = ∂S(a_1^0, a_2^0, ..., a_m^0)/∂a_i,                       (3.148)

and

    S_ij = ∂²S(a_1^0, a_2^0, ..., a_m^0)/∂a_i ∂a_j,                (3.149)
we arrive at the set of linear equations

    S_11 δ_1^0 + S_12 δ_2^0 + ... + S_1m δ_m^0 = -S_1
    S_21 δ_1^0 + S_22 δ_2^0 + ... + S_2m δ_m^0 = -S_2
      .                                                            (3.150)
      .
    S_m1 δ_1^0 + S_m2 δ_2^0 + ... + S_mm δ_m^0 = -S_m.

These equations are linear, of course, because we kept only linear terms in
the expansion of the partial derivative, Equation (3.147). We can write them
in matrix form as

    [ S_11  S_12  ...  S_1m ] [ δ_1^0 ]     [ S_1 ]
    [ S_21  S_22  ...  S_2m ] [ δ_2^0 ] = - [ S_2 ]                (3.151)
    [  :     :          :   ] [   :   ]     [  :  ]
    [ S_m1  S_m2  ...  S_mm ] [ δ_m^0 ]     [ S_m ]
The square array whose elements are the second partial derivatives of S is
known as the Hessian of S.
      Being a linear matrix equation, we can use the subroutine LUsolve to
solve for the δ^0's in Equation (3.151). A better approximation to the location
of the minimum will then be

    a_i = a_i^0 + δ_i^0,    i = 1, ..., m.                         (3.152)

This is virtually the same process used in finding the roots of equations, except
that we now have a set of simultaneous equations to solve. As with root-
finding, this process is executed repeatedly, so that for the k-th iteration we
would write

    a_j^k = a_j^{k-1} + δ_j^{k-1}.                                 (3.153)

The process continues until the δ's are all sufficiently small, or until the iter-
ations fail to significantly reduce S.
This method is nothing more than Newton's method, extended to sev-
eral functions in several independent variables. When close to a minimum, it
converges quite rapidly. But if it's started far from the minimum, Newton's
method can be slow to converge, and can in fact move to a totally different
region of parameter space. In one dimension we placed restrictions on the
method so that would not happen, but in multiple dimensions such bounds
are not easily placed. (In one dimension, bounds are simply two points on the
coordinate, but to bound a region in two dimensions requires a curve, which
is much harder to specify.) To increase the likelihood that Newton's method
will be successful - unfortunately, we can't guarantee it! - we need a good
initial guess to the minimum, which is most easily obtained by using one or
two cycles of the CRUDE method before we begin applying Newton's method
to the problem. Fundamentally, we're trying to solve a nonlinear problem,
a notoriously difficult task full of hidden and disguised subtleties. Nonlin-
ear equations should be approached with a considerable degree of respect and
solved with caution.
In Newton's method we need to evaluate the Hessian, the array of
second derivatives. This can be done analytically by explicit differentiation of
S. However, it's often more convenient - and less prone to error - to use
approximations to these derivatives. From our previous discussion of central
difference equations, we can easily find that
    S_ii = ∂²S(..., a_i^0, ...)/∂a_i²
         ≈ [ S(..., a_i^0 + h_i, ...) - 2 S(..., a_i^0, ...) + S(..., a_i^0 - h_i, ...) ] / h_i²    (3.154)

and

    S_ij = ∂²S(..., a_i^0, ..., a_j^0, ...)/∂a_i ∂a_j
         ≈ [ { S(..., a_i^0+h_i, ..., a_j^0+h_j, ...) - S(..., a_i^0+h_i, ..., a_j^0-h_j, ...) }
           - { S(..., a_i^0-h_i, ..., a_j^0+h_j, ...) - S(..., a_i^0-h_i, ..., a_j^0-h_j, ...) } ]
           / [ (2h_i)(2h_j) ].                                     (3.155)
In coding these derivatives, one might consider writing separate expressions
for each of them. But this runs counter to our programming philosophy of writ-
ing general code, rather than specific examples. Having once written the sub-
routine, we shouldn't have to change it just because some new problem has
6 parameters instead of 4. With a little thought, we can write a very general
routine. The code to calculate the Hessian might look like this:
      DOUBLE PRECISION Hessian(6,6), Ap(6), Am(6),
     +                 App(6), Apm(6), Amp(6), Amm(6)
*
*     (The arrays A and H, the integer M, and the function SUM
*      are assumed to be declared elsewhere in the routine.)
*
*     The arrays have the following contents:
*
*       A    a(1), ..., a(i),      ..., a(m)
*       Ap   a(1), ..., a(i)+h(i), ..., a(m)
*       Am   a(1), ..., a(i)-h(i), ..., a(m)
*
*       App  a(1), ..., a(i)+h(i), ..., a(j)+h(j), ..., a(m)
*       Apm  a(1), ..., a(i)+h(i), ..., a(j)-h(j), ..., a(m)
*       Amp  a(1), ..., a(i)-h(i), ..., a(j)+h(j), ..., a(m)
*       Amm  a(1), ..., a(i)-h(i), ..., a(j)-h(j), ..., a(m)
*
*     Compute the Hessian:
*
      DO i = 1, M
         DO j = i, M
            IF ( i .eq. j ) THEN
               DO k = 1, M
                  Ap(k) = A(k)
                  Am(k) = A(k)
               END DO
               Ap(i) = A(i) + H(i)
               Am(i) = A(i) - H(i)
               Hessian(i,i) = ( sum(Ap) - 2.d0*sum(A)
     +                        + sum(Am) ) / ( H(i) * H(i) )
            ELSE
               DO k = 1, M
                  App(k) = A(k)
                  Apm(k) = A(k)
                  Amp(k) = A(k)
                  Amm(k) = A(k)
               END DO
               App(i) = A(i) + H(i)
               App(j) = A(j) + H(j)
               Apm(i) = A(i) + H(i)
               Apm(j) = A(j) - H(j)
               Amp(i) = A(i) - H(i)
               Amp(j) = A(j) + H(j)
               Amm(i) = A(i) - H(i)
               Amm(j) = A(j) - H(j)
*
               Hessian(i,j) =
     +            ( (sum(App)-sum(Apm))/(2.d0*H(j))
     +             -(sum(Amp)-sum(Amm))/(2.d0*H(j)) )
     +            / ( 2.d0 * H(i) )
               Hessian(j,i) = Hessian(i,j)
            ENDIF
         END DO
      END DO
We've used the array Apm, for example, to store the parameters a_1, ..., a_i +
h_i, ..., a_j - h_j, ..., a_m. The introduction of these auxiliary arrays eases the
evaluation of the second derivatives of the quantity SUM having any number
of parameters.
EXERCISE 3.20
Complete the development of the subroutine NEWTON, and apply it to
the least squares problem.
References
Interpolation is a standard topic of many numerical analysis texts, and is dis-
cussed in several of the works listed at the conclusion of Chapter 2. Gaussian
elimination and Crout decomposition are topics from linear algebra, and also
are discussed in those texts.
There are many "special functions" of physics, such as gamma functions,
Bessel functions, and Legendre, Laguerre, Chebyshev, and Hermite polyno-
mials. These are discussed in many mathematical physics textbooks. I have
found the following few to be particularly helpful.
George Arfken, Mathematical Methods for Physicists, Academic Press,
New York, 1985.
Mary L. Boas, Mathematical Methods in the Physical Sciences, John
Wiley & Sons, New York, 1983.
Sadri Hassani, Foundations of Mathematical Physics, Allyn and Ba-
con, Boston, 1991.
Chapter 4:
Numerical Integration
Perhaps the most common elementary task of computational physics is the
evaluation of integrals. You undoubtedly know many techniques for integrat-
ing, and numerical integration should be looked upon as merely another use-
ful method. Certainly, a numerical integration should never be performed if
the integral has an analytic solution, except perhaps as a check. You should
note, however, that such a check might be used to verify an analytic result,
rather than checking the accuracy of the numerical algorithm. It is thus im-
portant that we have some brute-force numerical methods to evaluate inte-
grals. Initially, we will consider well-behaved integrals over a finite range,
such as
    I = ∫_a^b f(x) dx.                                             (4.1)
Eventually we will also treat infinite ranges of integration, and integrands
that are not particularly well-behaved. But first, a little Greek history ...
Anaxagoras of Clazomenae
The Age of Pericles, the fifth century B.C., was the zenith of the Athenian era,
marked by substantial accomplishments in literature and the arts, and the
city of Athens attracted scholars from all of Greece and beyond. From Ionia
came Anaxagoras, a proponent of rational inquiry, and a teacher of Pericles.
Anaxagoras asserted that the Sun was not a deity, but simply a red-hot glowing
stone. Despite the enlightenment of the era, such heresy could not be left
unpunished, and he was thrown in prison. (Pericles was eventually able to
gain his release.) Plutarch tells us that while Anaxagoras was imprisoned, he
occupied himself by attempting to square the circle. This is the first record
of this problem, one of the three "classic problems of antiquity": to square
the circle, to double a cube, and to trisect an angle. As now understood, the
problem is to construct a square, exactly equal in area to the circle, using only
compass and straightedge. This problem was not finally "solved" until 1882,
when it was proved to be impossible!
The fundamental goal, of course, is to determine the area ofthe circle.
For over two thousand years this problem occupied the minds of the great
scholars of the world. In partial payment of those efforts, we refer to the
numerical methods of integration, and hence finding the area bounded by
curves, as integration by quadrature. That is, integration can be thought of
as finding the edge length of the square, i.e., four-sided regular polygon ergo
quad, with an area equal to that bounded by the original curve.
Primitive Integration Formulas
It is instructive, and not very difficult, to derive many of the commonly used
integration formulas. The basic idea is to evaluate the function at specific
locations, and to use those function evaluations to approximate the integral.
We thus seek a quadrature formula of the form
    I ≈ Σ_{i=0}^{N} w_i f_i,                                       (4.2)
where x_i are the evaluation points, f_i = f(x_i), w_i is the weight given the i-th
point, and N + 1 is the number of points evaluated. To keep things simple,
we will use equally spaced evaluation points separated by a distance h. Let's
begin by deriving a closed formula, one that uses function evaluations at the
endpoints of the integration region. The simplest formula is thus

    I ≈ w_0 f_0 + w_1 f_1,                                         (4.3)

where x_0 = a and x_1 = b, the limits of the integration. We want this approx-
imation to be useful for a variety of integrals, and so we'll require that it be
exact for the simplest integrands, f(x) = 1 and f(x) = x. Since these are the
first two terms in a Taylor series expansion of any function, this approxima-
tion will converge to the exact result as the integration region is made smaller,
for any f(x) that has a Taylor series expansion. We thus require that

    ∫_{x_0}^{x_1} 1 dx = x_1 - x_0 = w_0 + w_1,

and

    ∫_{x_0}^{x_1} x dx = (x_1² - x_0²)/2 = w_0 x_0 + w_1 x_1.      (4.4)
This is simply a set of two simultaneous equations in two unknowns, w_0 and
w_1, easily solved by the methods of linear algebra to give

    w_0 = w_1 = (x_1 - x_0)/2.                                     (4.5)

Writing h = b - a, the approximation is then

    ∫_{x_0}^{x_1} f(x) dx ≈ (h/2)(f_0 + f_1),                      (4.6)

the so-called trapezoid rule. For the sake of completeness we'll exhibit these
formulas as equalities, using Lagrange's expression for the remainder term
in the Taylor series expansion, Equation (2.7), so that we have

    ∫_{x_0}^{x_1} f(x) dx = (h/2)(f_0 + f_1) - (h³/12) f''(ξ),     (4.7)

where ξ is some point within the region of integration. Of course, we're not
limited to N = 1; for three points, we find the equations

    ∫_{x_0}^{x_2} 1 dx = x_2 - x_0 = w_0 + w_1 + w_2,

    ∫_{x_0}^{x_2} x dx = (x_2² - x_0²)/2 = w_0 x_0 + w_1 x_1 + w_2 x_2,        (4.8)

and

    ∫_{x_0}^{x_2} x² dx = (x_2³ - x_0³)/3 = w_0 x_0² + w_1 x_1² + w_2 x_2².

Solving for the weights, we are led to Simpson's rule,

    ∫_{x_0}^{x_2} f(x) dx = (h/3)(f_0 + 4f_1 + f_2) - (h⁵/90) f^[4](ξ).        (4.9)
Continuing in this fashion, with 4 points we are led to Simpson's three-eighths
rule,

    ∫_{x_0}^{x_3} f(x) dx = (3h/8)(f_0 + 3f_1 + 3f_2 + f_3) - (3h⁵/80) f^[4](ξ),    (4.10)

while with five points we find Boole's rule

    ∫_{x_0}^{x_4} f(x) dx = (2h/45)(7f_0 + 32f_1 + 12f_2 + 32f_3 + 7f_4) - (8h⁷/945) f^[6](ξ).    (4.11)
These integration formulas all require that f(x) be expressed as a
polynomial, and could have been derived by fitting f(x) to an approximat-
ing polynomial and integrating that function exactly. As more points are in-
corporated in the integration formula, the quadrature is exact for higher de-
gree polynomials. This process could be continued indefinitely, although the
quadratures become increasingly complicated.
Composite Formulas
As an alternative to higher order approximations, we can divide the total in-
tegration region into many segments, and use a low order, relatively simple
quadrature over each segment. Probably the most common approach is to use
the trapezoid rule, one of the simplest quadratures available. Dividing the re-
gion from Xo to x N into N segments of width h, we can apply the trapezoid
rule to each segment to obtain the composite trapezoid rule
    ∫_{x_0}^{x_N} f(x) dx ≈ (h/2)(f_0 + f_1) + (h/2)(f_1 + f_2) + ... + (h/2)(f_{N-1} + f_N)

                          = h ( f_0/2 + f_1 + f_2 + ... + f_{N-1} + f_N/2 ).    (4.12)

FIGURE 4.1 Constructing a composite formula from the trapezoid
rule.
This is equivalent to approximating the actual integrand by a series
of straight line segments, as in Figure 4.1, and integrating; such a fit is said
to be piecewise linear. A similar procedure yields a composite Simpson's rule,

    ∫_{x_0}^{x_N} f(x) dx ≈ (h/3)( f_0 + 4f_1 + 2f_2 + 4f_3 + ... + 2f_{N-2} + 4f_{N-1} + f_N ),    (4.13)
where h is again the distance between successive function evaluations. This is
equivalent to a piecewise quadratic approximation to the function, and hence
is more accurate than the composite trapezoid rule using the same number
of function evaluations. Piecewise cubic and quartic approximations are ob-
viously available by making composite versions of the primitive integration
formulas in Equations (4.10) and (4.11).
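For reference, the composite trapezoid rule of Equation (4.12) can be coded in
just a few lines; the function below is a sketch written for this discussion, not
code from the text.

      double precision Function Trap( f, a, b, N )
*
*     Composite trapezoid rule, Equation (4.12): integrate the
*     externally supplied function f from a to b using N equal
*     intervals.  (A sketch only, not code from the text.)
*
      double precision f, a, b, h, sum
      integer N, i
      external f
      h = ( b - a ) / dble(N)
      sum = 0.5d0 * ( f(a) + f(b) )
      do i = 1, N-1
         sum = sum + f( a + dble(i)*h )
      end do
      Trap = h * sum
      end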
EXERCISE 4.1
     Evaluate the integral ∫_0^π sin x dx using approximations to the inte-
grand that are piecewise linear, quadratic, and quartic. With N in-
tervals, and hence N + 1 points, evaluate the integral for N = 4, 8,
16, ... , 1024, and compare the accuracy of the methods.
Errors ... and Corrections
Let's rederive the trapezoid rule. Earlier, we noted that we can derive quadra-
ture formulas by expanding the integrand in a Taylor series and integrating.
Let's try that: consider the integral of Equation (4.1), and expand the func-
tion f(x) in a Taylor series about x = a:
    ∫_a^b f(x) dx = ∫_a^b [ f(a) + (x-a) f'(a) + (x-a)²/2! f''(a)
                            + (x-a)³/3! f'''(a) + (x-a)⁴/4! f^[4](a) + ... ] dx

                  = h f(a) + h²/2! f'(a) + h³/3! f''(a) + h⁴/4! f'''(a) + h⁵/5! f^[4](a) + ....    (4.14)
But we could just as easily have expanded about x = b, and would have found
    ∫_a^b f(x) dx = ∫_a^b [ f(b) + (x-b) f'(b) + (x-b)²/2! f''(b)
                            + (x-b)³/3! f'''(b) + (x-b)⁴/4! f^[4](b) + ... ] dx

                  = h f(b) - h²/2! f'(b) + h³/3! f''(b) - h⁴/4! f'''(b) + h⁵/5! f^[4](b) - ....    (4.15)
The symmetry between these two expressions is quite evident. A new expres-
sion for the integral can be obtained by simply adding these two equations,

    ∫_a^b f(x) dx = (h/2)[f(a) + f(b)] + (h²/4)[f'(a) - f'(b)] + (h³/12)[f''(a) + f''(b)]
                    + (h⁴/48)[f'''(a) - f'''(b)] + (h⁵/240)[f^[4](a) + f^[4](b)] + ....    (4.16)
As it stands, this expression is no better or worse than the previous two, but
it's leading us to an interesting and useful result. We see that the odd- and
even-order derivatives enter this expression somewhat differently, as either a
difference or sum of derivatives at the endpoints. Let's see if we can eliminate
the even-order ones - the motivation will be clear in just a bit. Making a
Taylor series expansion of f'(x) about the point x = a, we have

    f'(x) = f'(a) + (x-a) f''(a) + (x-a)²/2 f'''(a) + (x-a)³/6 f^[4](a) + ....    (4.17)
In particular, at x = b,
    f'(b) = f'(a) + h f''(a) + (h²/2) f'''(a) + (h³/6) f^[4](a) + ....    (4.18)
And of course, we could have expanded about x = b, and have found
    f'(a) = f'(b) - h f''(b) + (h²/2) f'''(b) - (h³/6) f^[4](b) + ....    (4.19)
These expressions can be combined to yield

    f''(a) + f''(b) = (2/h)[f'(b) - f'(a)] - (h/2)[f'''(a) - f'''(b)]
                      - (h²/6)[f^[4](a) + f^[4](b)] + ....          (4.20)
We could also expand f'''(x) about x = a and x = b to find

    f^[4](a) + f^[4](b) = (2/h)[f'''(b) - f'''(a)] + ....           (4.21)
Using Equations (4.20) and (4.21), the even-order derivatives can now be elim-
inated from Equation (4.16), so that we have

    ∫_a^b f(x) dx = (h/2)[f(a) + f(b)] + (h²/12)[f'(a) - f'(b)] - (h⁴/720)[f'''(a) - f'''(b)] + ....    (4.22)
The importance of Equation (4.22) is not that the even derivatives are
missing, but that the derivatives that are present appear in differences. A new
composite formula can now be developed by dividing the integration region
from a to b into many smaller segments, and applying Equation (4.22) to each
of these segments. Except at the points a and b, the odd-order derivatives
contribute to two adjacent segments, once as the left endpoint and once as
the right. But they contribute with different signs, so that their contributions
to the total integral cancel! We thus find the Euler-McClaurin integration
rule,

    ∫_{x_0}^{x_N} f(x) dx = h ( f_0/2 + f_1 + f_2 + ... + f_{N-1} + f_N/2 )
                            + (h²/12) [ f'_0 - f'_N ] - (h⁴/720) [ f'''_0 - f'''_N ] + ....    (4.23)
This is simply the trapezoid rule, Equation (4.12), with correction terms, and
so gives us the error incurred when using the standard composite trapezoid
rule. When the integrand is easily differentiated, the Euler-McClaurin in-
tegration formula is far superior to other numerical integration schemes. A
particularly attractive situation arises when the derivatives vanish at the lim-
its of the integration, so that the "simple" trapezoid rule can yield surprisingly
accurate results.
EXERCISE 4.2
Reevaluate the integral ∫_0^π sin x dx using the trapezoid rule with first
one and then two correction terms, and compare to the previous calcu-
lations for N = 4, 8, ... , 1024 points. Since the integrand is relatively
simple, analytic expressions for the correction terms should be used.
Romberg Integration
Although the Euler-McClaurin formula is very powerful when it can be used,
the more common situation is that the derivatives are not so pleasant to evalu-
ate. However, from Euler-McClaurin we know how the error behaves, and so
we can apply Richardson extrapolation to obtain an improved approximation
to the integral.
Let's denote the result obtained by the trapezoid rule with n = 2^m
intervals as T_{m,0}, which is known to have an error proportional to h². Then,
T_{m+1,0}, obtained with an interval half as large, will have one-fourth the error
of the previous approximation. We can thus combine these two approxima-
tions to eliminate this leading error term, and obtain
    T_{m+1,1} = ( 4 T_{m+1,0} - T_{m,0} ) / 3.                     (4.24)
(Though not at all obvious, you might want to convince yourself that this
is simply Simpson's rule.) Of course, we only eliminated the leading term -
T_{m+1,1} is still in error, but proportional to h⁴. If we halve the interval
size again, the next approximation will have one-sixteenth the error of this
approximation, which can then be eliminated to yield

    T_{m+2,2} = ( 16 T_{m+2,1} - T_{m+1,1} ) / 15.                 (4.25)

(Again not obvious, this is Boole's rule.) In this way, a triangular array of
increasingly accurate results can be obtained, with the general entry in the
array given by

    T_{m+k,k} = ( 4^k T_{m+k,k-1} - T_{m+k-1,k-1} ) / ( 4^k - 1 ).    (4.26)
For moderately smooth functions, this Romberg integration scheme yields
very good results. As we found with Richardson extrapolation of derivatives,
there are decreasing benefits associated with higher and higher extrapola-
tions, so that k should probably be kept less than 4 or 5.
The code fragment developed earlier for Richardson extrapolation is
directly applicable to the present problem, using the trapezoid rule approx-
imation to the integral as the function being extrapolated. The composite
trapezoid rule can itself be simplified, since all (interior) function evaluations
enter with the same weight and all the points used in calculating Tm,O were
already used in calculating T_{m-1,0}. In fact, it's easy to show that

    ∫_a^b f(x) dx ≈ T_{m,0} = T_{m-1,0}/2 + h Σ_{i=1,3,5,...}^{2^m - 1} f(a + ih),    (4.27)

where h = (b - a)/2^m and the sum over i is over the points not included in T_{m-1,0}.

EXERCISE 4.3
Write a computer code to perform Romberg integration, obtaining 8-
figure accuracy in the calculation of the integral, as determined by a
A Change ofVariahles 157
relative error check. Your code should include a "graceful exit" if con-
vergence hasn't been obtained after a specified number of halvings,
say, if m ≥ 15. The dimensions used in the previous code fragment
should be adjusted appropriately. Test your code on the integral

    ∫_0^{π/2} dθ / (1 + cos θ).
Diffraction at a Knife's Edge
In optics, we learn that light "bends around" objects, i.e., it exhibits diffrac-
tion. Perhaps the simplest case to study is the bending of light around a
straightedge. In this case, we find that the intensity of the light varies as
we move away from the edge according to

    I = 0.5 I_0 { [C(v) + 0.5]² + [S(v) + 0.5]² },                 (4.28)

where I_0 is the intensity of the incident light, v is proportional to the distance
moved, and C(v) and S(v) are the Fresnel integrals

    C(v) = ∫_0^v cos(πw²/2) dw                                     (4.29)

and

    S(v) = ∫_0^v sin(πw²/2) dw.                                    (4.30)

EXERCISE 4.4
     Numerically integrate the Fresnel integrals, and thus evaluate I/I_0
     as a function of v. Plot your results.
A Change of Variables
Well, now we have a super-duper method of integrating ... or do we? Consider
the integral
    I = ∫_{-1}^{1} √(1 - x²) dx,                                   (4.31)
displayed in Figure 4.2.
FIGURE 4.2 A simple-looking integral, f(x) = √(1 - x²).
The integral doesn't look like it should give us much trouble, yet when we try
to do Romberg integration, we generate the following table:
    m    T_{m,0}       T_{m,1}       T_{m,2}       T_{m,3}
    0    0.00000000
    1    1.00000000    1.33333333
    2    1.36602540    1.48803387    1.49834724
    3    1.49785453    1.54179758    1.54538182    1.54612841
    4    1.54490957    1.56059458    1.56184772    1.56210908
    5    1.56162652    1.56719883    1.56763912    1.56773104
    6    1.56755121    1.56952611    1.56968126    1.56971368
    7    1.56964846    1.57034754    1.57040230    1.57041375
    8    1.57039040    1.57063771    1.57065705    1.57066110
    9    1.57065279    1.57074026    1.57074709    1.57074852
   10    1.57074558    1.57077650    1.57077892    1.57077943
It's clear that none of the approximations are converging very rapidly - what
went wrong?
Well, let's see ... The Euler-McClaurin formula gives us an idea of
what the error should be, in terms of the derivatives of the integrand at the
endpoints. And for this integrand, these derivatives are infinite! No wonder
Romberg integration failed!
Before trying to find a "cure," let's see if we can find a diagnostic
that will indicate when the integration is failing. Romberg integration works,
when it works, because the error in the trapezoid rule is quartered when the
step size is halved - that seems simple enough to check. Using the difference
between Tm+1,o and Tm,o as an indication of the error "at this step," we can
evaluate the ratio of errors at consecutive steps. In order for the method to
work, this ratio should be about 4:
    R_m = ( T_{m-1,0} - T_{m,0} ) / ( T_{m,0} - T_{m+1,0} ) ≈ 4.    (4.32)
Using the trapezoid approximations given in the table, we compute the ratios
to be
    m     R_m
    1     2.73205
    2     2.77651
    3     2.80159
    4     2.81481
    5     2.82157
These ratios are obviously not close to 4, and the integration fails. The reason,
of course, is that the Taylor series expansion, upon which all this is based,
has broken down - at the endpoints, the derivative of the function is infi-
nite! Therefore, the approximation for the integral in the first and last inter-
vals is terrible! The total integral converges, slowly, only because smaller and
smaller steps are being taken, so that the contributions from the first and last
intervals are correspondingly reduced. In essence, you are required to take
a step size sufficiently small that the error incurred in the end intervals has
been reduced to a tolerable level. All those function evaluations in the interior
of the integration region have done nothing for you! What a waste.
This problem is crying out to you, use a different spacing! Put a lot
of evaluations at the ends, if you need them there, and not so many in the
middle. Mathematically, this is accomplished by a change of variables. This
is, of course, a standard technique of analytic integration and a valuable tool
in numerical work as well. The form of the integrand suggests that we try a
substitution of the form
    x = cos θ,                                                     (4.33)

so that the integral becomes

    I = ∫_{-1}^{1} √(1 - x²) dx = ∫_0^π sin²θ dθ.                  (4.34)
Using the Romberg integrator to evaluate this integral, we generate
the following results:
    m    T_{m,0}       T_{m,1}       T_{m,2}       T_{m,3}
    0    0.00000000
    1    1.57079633    2.09439510
    2    1.57079633    1.57079633    1.53588974
    3    1.57079633    1.57079633    1.57079633    1.57135040
    4    1.57079633    1.57079633    1.57079633    1.57079633
At first blush, these results are more surprising than the first ones - only
one nonzero function evaluation was required to yield the exact result! Well,
one thing at a time. First, it's only coincidence that the result is exact - in
general, you have to do some work to get a good result. However, we shouldn't
have to work too hard - look at the derivatives at the endpoints! For this
integrand, all those derivatives are zero, so that Euler-McClaurin tells us
that the trapezoid rule is good enough, once h is sufficiently small.
As a practical matter, we need to do two things to the Romberg com-
puter code. First, the ratios Rm should be evaluated, to verify that the method
is working correctly. The ratio will not be exactly 4, but it should be close, say,
within 10%. If the ratio doesn't meet this requirement, the program should
exit and print the relevant information. (Of course, if the requirement is
met, there's no reason to print any of this information. However, a message
indicating that the test was performed and the requirement met should be
printed.) The second thing relates to the last table we generated. All the re-
sults were exact, except the first ones in each column. In general, we should
provide the integrator with a reasonable approximation before we begin the
extrapolation procedure. Delaying the extrapolation a few iterations, say, un-
til m = 2 or 3, simply avoids the spurious approximations generated early in
the process and has little effect on the ultimate accuracy of the integration.
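The ratio test itself is easily isolated in a small function; the sketch below is
written for this discussion (not code from the text), and treats "close to 4" as
meaning within 10 percent.

      logical Function RatioOK( tm1, tm, tp1 )
*
*     Diagnostic for Romberg integration, Equation (4.32): given
*     three successive trapezoid results T(m-1), T(m), T(m+1),
*     the error ratio should be close to 4.  (A sketch only.)
*
      double precision tm1, tm, tp1, r
      r = ( tm1 - tm ) / ( tm - tp1 )
      RatioOK = ( abs( r - 4.d0 ) .le. 0.4d0 )
      end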
EXERCISE 4.5
Modify your Romberg integrator, and use it to evaluate the elliptic
integral
    I = ∫_{-1}^{1} √[ (1 - x²)(2 - x) ] dx.
First, test your diagnostic by trying to evaluate the integral in its
present form. If the diagnostic indicates that Romberg integration
is failing, perform a change of variables and integrate again.
The "Simple" Pendulum
A standard problem of elementary physics is the motion of the simple pendu-
lum, as shown in Figure 4.3. From Newton's laws, or from either Lagrange's
or Hamilton's formulations, the motion is found to be described by the differ-
ential equation

    m l d²θ/dt² = -m g sin θ.                                      (4.35)

The mass doesn't enter the problem, and the equation can be written as

    θ̈ = -(g/l) sin θ,                                             (4.36)

where we've written the second time derivative as θ̈. For small amplitudes,
sin θ ≈ θ, so that the motion is described approximately by the equation

    θ̈ = -(g/l) θ,                                                 (4.37)

which has as its solution

    θ(t) = θ_0 cos( √(g/l) t ),                                    (4.38)

for the case of the motion beginning at t = 0 with the pendulum motionless
at θ = θ_0.

FIGURE 4.3 The simple pendulum.
But what about the real motion? For large amplitude (how large?), we
should expect this linearized description to fail. In particular, this description
tells us that the period of the oscillation is independent of the amplitude, or
isochronous, which seems unlikely. Let's see if we can do better.
Now, we could attempt an improved description by applying perturba-
tion theory - that is, by starting with the linearized description and finding
162 Chapter 4: Numerical Integration
corrections to it. In the days before computers, this was not only the approach
of choice, it was virtually the only approach available! Today, of course, we
have computers, and ready access to them. We could simply solve the dif-
ferential equation describing the exact motion, directly. And in Chapter 5,
that's exactly what we'll do! But to always jump to the numerical solution of
the differential equation is to be just as narrow-minded and shortsighted as
insisting on applying perturbation theory to every problem that comes along.
There are many ways to approach a problem, and we often learn different
things as we explore different approaches. This lesson, learned when analytic
methods were all we had, is just as valuable now that we have new tools to
use. The computational approach to physics is most successful when numer-
ical methods are used to complement analytic ones, not simply replace them.
      Starting with Equation (4.36), multiply both sides by θ̇ and integrate
to obtain

    (1/2) θ̇² = (g/l) cos θ + C,                                   (4.39)

where C is a constant of integration, to be determined from the initial condi-
tions: for θ̇ = 0 at t = 0, we have C = -(g/l) cos θ_0. Solving for θ̇, we find

    θ̇ = dθ/dt = √(2g/l) √(cos θ - cos θ_0),                       (4.40)

or

    dt = √(l/2g) dθ / √(cos θ - cos θ_0).                          (4.41)

Now the total period, T, is just four times the time it takes the pendulum to
travel from θ = 0 to θ = θ_0, so that

    T = 4 √(l/2g) ∫_0^{θ_0} dθ / √(cos θ - cos θ_0).               (4.42)

This is an elliptic integral of the first kind, and is an exact result for the period
of the pendulum. But of course, it remains to evaluate the integral. In partic-
ular, the integrand is a little unpleasant at the upper limit of integration.
      Let's convert this integral into the "standard" form, originated by
Legendre, using the identity

    cos θ - cos θ_0 = 2 [ sin²(θ_0/2) - sin²(θ/2) ]                (4.43)

and the substitution

    sin ξ = sin(θ/2) / sin(θ_0/2).                                 (4.44)

We then find that

    T = 4 √(l/g) K( sin(θ_0/2) ),                                  (4.45)

where

    K(k) = ∫_0^{π/2} dξ / √(1 - k² sin²ξ)                          (4.46)

is the complete elliptic integral of the first kind. Clearly, this integrand is much
nicer than the previous one. The most general elliptic integral of the first kind
is parametrically dependent upon the upper limit of the integration,

    F(k, φ) = ∫_0^φ dξ / √(1 - k² sin²ξ).                          (4.47)

From these definitions, we clearly have that K(k) = F(k, π/2).
The real value of a "standard form" is that you might find values for
it tabulated somewhere. You could then check your integration routine by
computing the standard integral for the values tabulated, before you solved
the particular problem of your interest. For example, the following table is
similar to one found in several reference books of mathematical functions.
    sin⁻¹k (degrees)     K(k)
     0                   1.5707963270
    10                   1.5828428043
    20                   1.6200258991
    30                   1.6857503548
    40                   1.7867691349
    50                   1.9355810960
    60                   2.1565156475
    70                   2.5045500790
    80                   3.1533852519
    90                   ∞
You might want to verify these values before returning your attention to the
motion of the simple pendulum.
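One way to check a few of these entries is to evaluate K(k) of Equation (4.46)
directly; the sketch below is not code from the text, and the 1000 panels chosen
are arbitrary. The plain composite trapezoid rule works well here because the
integrand's first derivative vanishes at both endpoints.

      double precision Function EllipK( k )
*
*     Complete elliptic integral of the first kind, Equation
*     (4.46), by the composite trapezoid rule.  (A sketch only;
*     1000 panels is an arbitrary but ample choice for k < 1.)
*
      double precision k, xi, h, pi, sum, f
      integer i, n
      pi = 4.d0 * atan(1.d0)
      n  = 1000
      h  = ( pi / 2.d0 ) / dble(n)
      sum = 0.d0
      do i = 0, n
         xi = dble(i) * h
         f  = 1.d0 / sqrt( 1.d0 - (k*sin(xi))**2 )
         if ( i .eq. 0 .or. i .eq. n ) f = 0.5d0 * f
         sum = sum + f
      end do
      EllipK = h * sum
      end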
EXERCISE 4.6
Calculate the period of the simple pendulum, using whatever method
you feel appropriate to evaluate the integral. Be prepared to justify
your choice. Produce a table of results, displaying the calculated pe-
riod versus the amplitude. For what values of the amplitude does the
period differ from 2π√(l/g) by more than 1%? Is there a problem at
θ_0 = 180°? Why?
Legendre was thoroughly absorbed by these integrals, and developed
a second and third standard form. Of some interest to physics is the elliptic
integral of the second kind,
    E(k, φ) = ∫_0^φ √(1 - k² sin²ξ) dξ,                            (4.48)
which arises in determining the length of an elliptic arc.
Improper Integrals
Integrals of the form
    I = ∫_0^∞ f(x) dx                                              (4.49)
are certainly not uncommon in physics. It is possible that this integral di-
verges; that is, that I = ∞. If this is the case, don't ask your numerical
integrator for the answer! But often these integrals do exist, and we need to
be able to handle them. Many techniques are available to us; however, none
of them actually tries to integrate numerically all the way to infinity. That is,
if the integral is
    I = ∫_0^∞ x² e^{-x} dx,                                        (4.50)
it might be tempting to integrate up to some large number A, and then go a
little farther to A', and stop if the integrals aren't much different. But "how
much farther" is enough? - infinity is a long way off! It's better to use a
different approach, one which in some way accounts for the infinite extent of
the integration domain.
One such approach is to split the integration region into two seg-
ments,
    ∫_0^∞ f(x) dx = ∫_0^a f(x) dx + ∫_a^∞ f(x) dx.                 (4.51)
The first integral is over a finite region and can be performed by a standard
numerical method. The appropriate value for a will depend upon the partic-
ular integrand, and how the second integral is to be handled. For example, it
can be mapped into a finite region by a change of variables such as
    x → 1/y,

so that the integral becomes

    ∫_a^∞ f(x) dx = ∫_0^{1/a} f(1/y) / y² dy.                      (4.52)
If the change of variables hasn't introduced an additional singularity into the
integrand, this integral can often be evaluated by one of the standard numer-
ical methods.
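For instance (a worked illustration, not an example from the text), applying
Equation (4.52) to the integrand of Equation (4.50), with the tail starting at
x = a, gives

    ∫_a^∞ x² e^{-x} dx = ∫_0^{1/a} e^{-1/y} / y⁴ dy,

and the exponential suppresses the integrand as y → 0, so no new singularity
has been introduced.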
EXERCISE 4.7
Evaluate the integral
    I = ∫_0^∞ dx / (1 + x²)
to 8 significant digits. (Let a = 1, and break the interval into two
domains.)
Note that x → 1/y is not the only substitution that might be used - for
a particular f(x) it might be better to choose x → e^{-y}, for example. The
substitution

    x → (1 + y)/(1 - y)                                            (4.53)

is interesting in this regard as it transforms the interval [0, ∞) into [-1, 1],
which is particularly convenient for Gauss-Legendre integration, to be dis-
cussed later. The goal is to use a transform that maps the semi-infinite region
into a finite one while yielding an integrand that we can evaluate.
EXERCISE 4.8
Using the transformation

    x → y/(1 - y),                                                 (4.54)

transform the integral

    I = ∫_0^∞ x dx / (1 + x)⁴                                      (4.55)

into one over a finite domain, and evaluate.
There's another approach we can take to the evaluation of an integral:
expand all or part of it in an infinite series. Let's imagine that we need to
evaluate the integral

    I = ∫_0^∞ dx / [ (1 + x) √x ],                                 (4.56)

and have broken it into two domains. We first consider the integral

    I_2 = ∫_a^∞ dx / [ (1 + x) √x ],                               (4.57)

where we have chosen a to be much larger than 1. Then we can use the bino-
mial theorem to write

    1/(1 + x) = (1/x) · 1/(1 + 1/x) = (1/x) ( 1 - 1/x + 1/x² - ... ).    (4.58)

Substituting this into the integral, we find

    I_2 = 2 a^{-1/2} - (2/3) a^{-3/2} + (2/5) a^{-5/2} - ....       (4.59)
The series will converge, by virtue of the convergence of the binomial expan-
sion, for any a > 1, although it clearly converges faster for larger a. Series
expansions can be extremely useful in situations such as these, but we must
always be certain that we are using the series properly. In particular, the
series will only converge if it's being used within its domain of convergence.
So, all we need to do now is to perform the integral from zero to a, but
there seems to be a small problem ...
    I_1 = ∫_0^a dx / [ (1 + x) √x ]                                (4.60)
- the integrand is infinite at one endpoint. Before proceeding, we should
probably convince ourselves that the integral is finite. This can be done by
noting that
    1/(1 + x) ≤ 1,    x ≥ 0,                                       (4.61)
so that
    ∫_0^a dx / [ (1 + x) √x ] ≤ ∫_0^a dx / √x = 2√a.
We thus have a finite upper bound on the integral, so the integral itself must
be finite. To evaluate the integral, we might try a power series expansion such
as
    1/(1 + x) = 1 - x + x² - x³ + ....                             (4.62)
However, we must note that this series converges if and only if x < 1. Since
we've already specified that a > 1, this series cannot be used over the entire
integration region. However, we could break the interval again, into one piece
near the origin and the other piece containing everything else:
    ∫_0^a dx/[(1+x)√x] = ∫_0^ε dx/[(1+x)√x] + ∫_ε^a dx/[(1+x)√x].    (4.63)
This is a very reasonable approach to take - the idea of isolating the difficult
part into one term upon which to concentrate is a very good tactic, one that
we often use. In some cases, it might happen that you don't even need to do the
difficult part. For example, in this case you might evaluate the second integral
for increasingly smaller ε and extrapolate to the limit of ε → 0.
Let's try something totally different from the power series method-
just subtract the singularity away! We've already discovered that near zero
the integrand behaves as 1/√x, which we were able to integrate to yield an
upper bound. So, let's simply write

    ∫_0^a dx/[(1+x)√x] = ∫_0^a [ 1/√x + 1/((1+x)√x) - 1/√x ] dx

                       = ∫_0^a dx/√x + ∫_0^a (-x) dx/[(1+x)√x]

                       = 2√a - ∫_0^a √x dx/(1+x).                  (4.64)
The desired integral is thus expressed as an integrated term containing the
difficult part, plus a difference expression that can be integrated numerically.
EXERCISE 4.9
Evaluate the integral
    I = ∫_0^2 dx / [ (1 + x) √x ]
by subtracting the singularity away.
If we can add and subtract terms in an integrand, we can also multiply
and divide them. In general, we might write

    ∫ f(x) dx = ∫ [ f(x)/g(x) ] g(x) dx.                           (4.65)

This hasn't gained us much, unless g(x) is chosen so that g(x) dx = dy. If this
is so, then we can make the substitution x → y to arrive at the integral

    ∫ [ f(y)/g(y) ] dy.                                            (4.66)
Since we are free to choose g, we will choose it so that the integrand in this
expression is "nicer" than just I (x). This, of course, is nothing more than a
rigorous mathematical description of a change of variables, with the function
9 being the Jacobian of the transformation. In our example, we might choose
g(x) = l/Vi, so that
    ∫_0^a dx/[(1+x)√x] = ∫_0^a [ √x/((1+x)√x) ] ( dx/√x ).          (4.67)
We now have dy = dx/√x, or y = 2√x. Note that we are choosing g to make
the integrand nice, and from that deriving what the new variable should be -
virtually the reverse of the usual change of variable procedure. Making the
substitution, we find

    ∫_0^a dx/[(1+x)√x] = ∫_0^{2√a} dy/(1 + y²/4).                  (4.68)
Once again, we find that the transformed integrand will be easy to integrate,
which of course was the motivation for transforming in the first place.
EXERCISE 4.10
Evaluate the integral
    I = ∫_0^2 dx / [ (1 + x) √x ]
by changing the variable of integration.
From time to time, you'll have to evaluate a function that superficially
appears divergent. An example of such a situation is an integral of sin x / x
with a lower limit of x = 0.
The integral exists, and in fact the function is finite at every point. However,
if you simply ask the computer to evaluate the integrand at x = 0, you'll have
problems with "zero divided by zero." Clearly, either a series expansion of
sin x or the use of L'Hospital's rule will remove the difficulty - but that's
your job, not the computer's!
EXERCISE 4.11
Using the integration tools we've discussed, evaluate the integral
    ∫_0^1 cos x / √x dx.
The Mathematical Magic of Gauss
Earlier we performed numerical integration by repeated use of the trapezoid
rule, using the composite formula
    ∫_{x_0}^{x_N} f(x) dx ≈ h [ f_0/2 + f_1 + f_2 + ... + f_{N-1} + f_N/2 ],    (4.69)

where h = (x_N - x_0)/N and the x_m, where the function is being evaluated,
are evenly spaced on the interval [a, b]. That is, the total integration region
has been divided into N equal intervals of width h, and the function is being
evaluated N +1 times. But there's really no reason to require that the inter-
vals be of equal size. We chose this equidistant spacing while developing the
primitive integration formulas so as to keep the derived expressions simple
- in fact, we can obtain much more accurate formulas if we relax this re-
quirement. We begin as we did before, trying to approximate the integral by
a quadrature of the form
    ∫_a^b f(x) dx ≈ Σ_{m=1}^{N} w_m f(x_m).                        (4.70)
Note that we are starting the sum at m = 1, so that N refers to the number of
function evaluations being made. As with the primitive integration quadra-
tures, w_m is an unknown; now, however, the x_m are also unknown! This gives
us 2N unknowns, so that we need 2N equations in order to determine them.
And, as before, we obtain these equations by requiring that the quadrature be
exact for f(x) being the lowest order polynomials, f(x) = 1, x, x², ..., x^{2N-1}.
Unfortunately, the equations thus obtained are nonlinear and extremely diffi-
cult to solve (by standard algebraic methods). For example, we can take N =
2 and require the integration formula to be exact for f(x) = 1, x, x², and x³,
yielding the four equations
    b - a = w_1 + w_2,                                             (4.71a)

    (b² - a²)/2 = w_1 x_1 + w_2 x_2,                               (4.71b)

    (b³ - a³)/3 = w_1 x_1² + w_2 x_2²,                             (4.71c)

    (b⁴ - a⁴)/4 = w_1 x_1³ + w_2 x_2³.                             (4.71d)
Although they're nonlinear, we can solve this relatively simple set of equations
analytically. The first thing is to note that any finite interval [a, b] can be
mapped onto the interval [-1, 1] by a simple change of variable:

    y = -1 + 2 (x - a)/(b - a).                                    (4.72)

Thus it is sufficient for us to consider only this normalized integration region,
in terms of which the nonlinear equations can be expressed as

    2 = w_1 + w_2,                                                 (4.73a)

    0 = w_1 x_1 + w_2 x_2,                                         (4.73b)

    2/3 = w_1 x_1² + w_2 x_2²,                                     (4.73c)

    0 = w_1 x_1³ + w_2 x_2³.                                       (4.73d)

Equations (4.73b) and (4.73d) can be combined to yield x_1² = x_2²; since the
points must be distinct (else we wouldn't have 4 independent variables), we
have that x_1 = -x_2. Then from Equation (4.73b) we have w_1 = w_2, and from
Equation (4.73a) we have w_1 = 1. Equation (4.73c) then gives us

    2/3 = 2 x_1²,
or that x_1 = 1/√3. To summarize, we find that the function should be eval-
uated at x_m = ±1/√3, and that the evaluations have an equal weighting of
1. If you like, you may try to find the weights and abscissas for N = 3, but
be forewarned: it gets harder. (Much harder!) This is where Professor Gauss
steps in to save the day. But first, we need to know about ...
Orthogonal Polynomials
Orthogonal polynomials are one of those "advanced" topics that many of us
never quite get to, although they really aren't that difficult to understand.
The basic idea is that there exists a set of polynomials $\phi_m(x)$ such that
$$\int_a^b \phi_m(x)\,\phi_n(x)\,w(x)\,dx = C_m\,\delta_{mn}. \tag{4.74}$$
(Actually, we can construct such a set, if we don't already have one!) This or-
thogonality condition that orthogonal polynomials obey is simply a mathemat-
ical statement that the functions are fundamentally different. Equation (4.74)
gives us the way to measure the "sameness" of two functions - if $m \neq n$, then
the functions are not the same, and the integral is zero. Let's see what this
means in terms of a particular example.
The polynomials we're most comfortable with are the ones $x^0$, $x^1$, $x^2$,
and so on. Start with these, and call them $u_m = x^m$. For the moment, let's
forget the weighting function, setting $w(x) = 1$, and let a = -1 and b = 1. The
very first integral to think about is
$$\int_{-1}^{1} \phi_0(x)\,\phi_0(x)\,dx = C_0. \tag{4.75}$$
We could simply choose $\phi_0(x) = u_0(x) = 1$, and satisfy this equation with $C_0 =
2$. Although not necessary, it's often convenient to normalize these functions
as well, requiring all the $C_m = 1$, so that we actually choose $\phi_0(x) = 1/\sqrt{2}$.
The integral of Equation (4.74) will then yield 1, expressing the fact that the
functions are identical.
The next integral to consider is
$$\int_{-1}^{1} \phi_0(x)\,\phi_1(x)\,dx = 0. \tag{4.76}$$
Since the subscripts differ, this integral is required to be zero. In this case,
we can choose $\phi_1(x) = u_1$ and find that Equation (4.76) is satisfied. But in
general we can't count on being so lucky. What we need is a universal method
for choosing $\phi_m$, independent of w(x), a, and b, that can be applied to whatever
particular case is at hand.
Let's take $\phi_1$ to be a linear combination of what we have, $u_1$ and $\phi_0$,
and make it satisfy Equation (4.76). That is, we choose
$$\phi_1(x) = u_1(x) + \alpha_{10}\,\phi_0(x), \tag{4.77}$$
and require $\alpha_{10}$ to take on whatever value necessary to force the integral to
vanish. With this expression for $\phi_1$, we have
$$\int_{-1}^{1} \phi_0(x)\,\phi_1(x)\,dx = \int_{-1}^{1} \phi_0(x)\bigl[u_1 + \alpha_{10}\,\phi_0(x)\bigr]\,dx
 = \int_{-1}^{1} \frac{1}{\sqrt{2}}\left[x + \frac{\alpha_{10}}{\sqrt{2}}\right] dx = 0 + \alpha_{10}. \tag{4.78}$$
Since the integral is zero, we have $\alpha_{10} = 0$ and hence $\phi_1 = x$. Again, we'll
normalize this result by considering the integral
$$\int_{-1}^{1} \phi_1(x)\,\phi_1(x)\,dx = \int_{-1}^{1} x^2\,dx = \frac{2}{3}, \tag{4.79}$$
so that the normalized polynomial is
$$\phi_1(x) = \sqrt{\tfrac{3}{2}}\,x. \tag{4.80}$$
The next orthogonal polynomial is found by choosing $\phi_2$ to be a linear
combination of $u_2$, $\phi_0$, and $\phi_1$:
$$\phi_2(x) = u_2 + \alpha_{21}\,\phi_1(x) + \alpha_{20}\,\phi_0(x) = x^2 + \alpha_{21}\sqrt{\tfrac{3}{2}}\,x + \frac{\alpha_{20}}{\sqrt{2}}, \tag{4.81}$$
and requiring both
$$\int_{-1}^{1} \phi_0(x)\,\phi_2(x)\,dx = 0 \tag{4.82}$$
and
$$\int_{-1}^{1} \phi_1(x)\,\phi_2(x)\,dx = 0. \tag{4.83}$$
From the first we find $\alpha_{20} = -\sqrt{2}/3$, and from the second $\alpha_{21} = 0$. After
normalization, we find
$$\phi_2(x) = \sqrt{\tfrac{5}{2}}\;\frac{3x^2 - 1}{2}. \tag{4.84}$$
The astute reader will recognize these as the first three Legendre polynomials,
although the unnormalized versions are more popular than these normalized
ones. This process, known as Gram-Schmidt orthogonalization, can be used
to generate various sets of orthogonal polynomials, depending upon w(x), a,
and b. (See Table 3.2.)
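Stated compactly (a summary added here for reference, using the same $\alpha$ notation as above and assuming the $\phi_k$ have already been normalized so that $C_k = 1$), each step of the process is
$$\phi_m(x) = u_m(x) + \sum_{k=0}^{m-1} \alpha_{mk}\,\phi_k(x), \qquad \alpha_{mk} = -\int_a^b u_m(x)\,\phi_k(x)\,w(x)\,dx,$$
followed by a normalization of $\phi_m$ itself. With $w(x) = 1$, $a = -1$, and $b = 1$ this reproduces the values of $\alpha_{10}$, $\alpha_{20}$, and $\alpha_{21}$ found above.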
EXERCISE 4.12
Using this Gram-Schmidt orthogonalization process, find the next
Legendre polynomial, $\phi_3(x)$.
Gaussian Integration
We now return to the evaluation of integrals. We'll consider a slightly larger
class of integrals than previously indicated, by considering integrals of the
form
$$\int_a^b f(x)\,w(x)\,dx = \sum_{m=1}^{N} w_m\,f(x_m), \tag{4.85}$$
where w(x) is a positive definite (i.e., never negative) weighting function.
(And yes, it's the same weighting function we just discussed.) Since both the
weights and abscissas are treated as unknowns, we have 2N coefficients to
be determined. Our plan will be to require that this quadrature be exact for
polynomials of order 2N - 1 and less, and use this requirement to determine
the weights and abscissas of the quadrature!
Let f(x) be a polynomial of degree 2N - 1 (or less), and $\phi_N(x)$ be
a specific orthogonal function of order N. In particular, we let $\phi_N$ be the
orthogonal polynomial appropriate for the particular weighting function w(x)
and limits of integration a and b, so that
$$\int_a^b \phi_m(x)\,\phi_n(x)\,w(x)\,dx = C_m\,\delta_{mn}. \tag{4.74}$$
That is, the particular set of orthogonal polynomials to be used is dictated by
the integral to be evaluated. (That's not so surprising, is it?)
Now, consider what happens if f(x) is divided by $\phi_N(x)$: the leading
term in the quotient will be of order N - 1, and the leading term in the re-
mainder will also be of order N - 1. (This is not necessarily obvious, so do the
division on an example of your own choosing. The division will probably not
come out evenly - the remainder is what you need to subtract from f(x) to
make it come out even.) In terms of the quotient and the remainder, we can
write
$$f(x) = q_{N-1}(x)\,\phi_N(x) + r_{N-1}(x), \tag{4.86}$$
where both $q_{N-1}(x)$ and $r_{N-1}(x)$ are polynomials of order N - 1.
With this expression for f (x), the integral of Equation (4.85) becomes
$$\int_a^b f(x)\,w(x)\,dx = \int_a^b q_{N-1}(x)\,\phi_N(x)\,w(x)\,dx + \int_a^b r_{N-1}(x)\,w(x)\,dx. \tag{4.87}$$
Since the functions $\{\phi_m\}$ are a complete set, we can expand the function
$q_{N-1}(x)$ as
$$q_{N-1}(x) = \sum_{i=0}^{N-1} q_i\,\phi_i(x), \tag{4.88}$$
where the $q_i$ are constants. Note that the summation ranges up to N - 1, since
$q_{N-1}(x)$ is an (N-1)-th order polynomial. The first integral on the right side
of Equation (4.87) is then
$$\int_a^b q_{N-1}(x)\,\phi_N(x)\,w(x)\,dx = \sum_{i=0}^{N-1} q_i \int_a^b \phi_i(x)\,\phi_N(x)\,w(x)\,dx
 = \sum_{i=0}^{N-1} q_i\,\delta_{iN}\,C_N = 0. \tag{4.89}$$
Note that the integrals on the right side are merely the orthogonality integrals
of Equation (4.74) and that the total integral is zero since the summation only
ranges up to N - 1.
We began by requiring that the integration formula be exact for f(x),
an arbitrary function of order 2N - 1. The product $q_{N-1}(x)\,\phi_N(x)$ is also a
polynomial of order 2N - 1, so it must be true that
$$\int_a^b q_{N-1}(x)\,\phi_N(x)\,w(x)\,dx = \sum_{m=1}^{N} w_m\,q_{N-1}(x_m)\,\phi_N(x_m). \tag{4.90}$$
But from Equation (4.89) we know this integral to be zero. Since f(x) is arbitrary,
the derived $q_{N-1}(x)$ must also be arbitrary, and so the sum doesn't
vanish because of any unique characteristics of the function $q_{N-1}$. The only
way to guarantee that this sum will be zero is to require that all the $\phi_N(x_m)$
be zero! Now, that's not as difficult as it might seem - the $x_m$ are chosen
such that the orthogonal polynomial $\phi_N(x)$ is zero at these points; since we
need N of these $x_m$, we're fortunate that $\phi_N(x)$ just happens to be an N-th
order polynomial and so possesses N roots. Do you believe in coincidence? (In
a general sense, these roots might be complex. In cases of practical interest
they are always real.) We've thus determined the abscissas $x_m$.
The integration formula is to be exact for polynomials of order 2N -1,
so surely it must be exact for a function of lesser order as well. In particular,
it must be true for the (N-1)-th order polynomial $\ell_{i,N}(x)$, defined as
$$\ell_{i,N}(x) = \prod_{\substack{j=1\\ j\neq i}}^{N} \frac{x - x_j}{x_i - x_j}. \tag{4.91}$$
This function occurs in connection with Lagrange's interpolating polynomial,
and has the interesting property, easily verified, that
$$\ell_{i,N}(x_j) = \begin{cases} 0, & j \neq i,\\ 1, & j = i.\end{cases} \tag{4.92}$$
We thus have the exact result
$$\int_a^b \ell_{i,N}(x)\,w(x)\,dx = \sum_{m=1}^{N} w_m\,\ell_{i,N}(x_m) = w_i, \tag{4.93}$$
so that the weights $w_i$ can be obtained by analytically performing the indi-
cated integration.
Let's consider an example of how this works. In particular, let's de-
velop an integration rule for the integral
$$\int_{-1}^{1} f(x)\,dx \tag{4.94}$$
using two function evaluations. Since the limits of the integration are a = -1,
b = 1, and the weighting function is w(x) = 1, Legendre functions are the
appropriate orthogonal polynomials to use. We already know that
$$\phi_2(x) = \sqrt{\tfrac{5}{2}}\;\frac{3x^2 - 1}{2}. \tag{4.95}$$
The abscissas for the Gauss-Legendre integration are the zeros of this func-
tion,
$$x_1 = -\sqrt{\tfrac{1}{3}} \qquad \text{and} \qquad x_2 = +\sqrt{\tfrac{1}{3}}. \tag{4.96}$$
The weights are then evaluated by performing the integral of Equation (4.93).
In this case,
$$w_1 = \int_{-1}^{1} \frac{x - x_2}{x_1 - x_2}\,dx = 1 \tag{4.97}$$
and
$$w_2 = \int_{-1}^{1} \frac{x - x_1}{x_2 - x_1}\,dx = 1. \tag{4.98}$$
This agrees with our previous result, but was much easier to obtain than solv-
ing a set of nonlinear equations.
EXERCISE 4.13
Determine the weights and abscissas for the Gauss-Legendre integra-
tion rule for N = 3.
Fortunately, we don't have to actually do all this work every time
we need to perform numerical integrations by Gaussian quadrature - since
they're used so frequently, weights and abscissas for various weighting func-
tions and limits of integration have already been tabulated. For example,
weights and abscissas for the Gauss-Legendre quadrature are listed in Ta-
ble 4.1. Since double precision corresponds to roughly 15 decimal digits, the
weights and abscissas are given to this precision. Clearly, for highly accurate
work even more precision is required - one of the standard references for
this work, Gaussian Quadrature Formulas by Stroud and Secrest, actually
presents the data to 30 digits.
TABLE 4.1 Gauss-Legendre Quadrature: Weights and Abscissas
$$\int_{-1}^{1} f(x)\,dx = \sum_{m=1}^{N} w_m\,f(x_m)$$
        x_m                                  w_m
N =2
0.57735 02691 89626 1.00000 00000 00000
N =3
0.77459 66692 41483 0.55555 55555 55556
0.00000 00000 00000 0.88888 88888 88889
N =4
0.86113 63115 94053 0.347854845137454
0.33998 1043584856 0.652145154862546
N =5
0.90617 98459 38664 0.236926885056189
0.53846 93101 05683 0.478628670499367
0.00000 00000 00000 0.56888 88888 88889
N =6
0.93246 95142 03152 0.17132 44923 79170
0.66120 93864 66265 0.36076 1573048139
0.23861 91860 83197 0.467913934572691
N = 7
0.94910 79123 42759 0.129484966168870
0.74153 1185599394 0.279705391489277
0.40584 5151377397 0.381830050505119
0.00000 00000 00000 0.417959183673469
N =8
0.96028 98564 97536 0.101228536290376
0.79666 6477413627 0.22238 1034453375
0.52553 2409916329 0.31370 66458 77887
0.18343 46424 95650 0.36268 37833 78362
N =9
0.96816 02395 07626 0.081274388361574
0.83603 1107326636 0.180648160694857
0.61337 1432700590 0.26061 06964 02936
0.32425 34234 03809 0.31234 70770 40003
0.00000 00000 00000 0.330239355001260
N = 10
0.97390 65285 17172    0.06667 13443 08688
0.86506 33666 88985    0.14945 13491 50581
0.67940 95682 99024    0.21908 63625 15982
0.43339 53941 29247    0.26926 67193 09996
0.14887 43389 81631    0.29552 42247 14753
N = 11
0.97822 86581 46057    0.05566 85671 16174
0.88706 25997 68095    0.12558 03694 64905
0.73015 20055 74049    0.18629 02109 27734
0.51909 61292 06812    0.23319 37645 91991
0.26954 31559 52345    0.26280 45445 10247
0.00000 00000 00000    0.27292 50867 77901
N = 12
0.98156 06342 46719    0.04717 53363 86512
0.90411 72563 70475    0.10693 93259 95318
0.76990 26741 94305    0.16007 83285 43346
0.58731 79542 86617    0.20316 74267 23066
0.36783 14989 98180    0.23349 25365 38355
0.12523 34085 11469    0.24914 70458 13403
This need for precision in expressing the weights and abscissas of the
quadrature introduces an additional complicating factor - how to insure that
the data are entered in the program correctly, in the first place. The data can
be entered in a DATA statement, or preferably, in a PARAMETER statement -
something like
* A first attempt at entering the WEIGHTS and ABSCISSAS
* for 5-point Gauss-Legendre quadrature:
*
      Double Precision X(5),W(5)
      PARAMETER(
     + x(1)=-0.9061798459386640d0, w(1)=0.2369268550561891d0,
     + x(2)=-0.5384963101056831d0, w(2)=0.4786286704993665d0,
     + x(3)= 0.0d0               , w(3)=0.5688888888888889d0,
     + x(4)= 0.5384693101056831d0, w(4)=0.4786286704993665d0,
     + x(5)= 0.9061798459386640d0, w(5)=0.2369268550561891d0)
*
This coding is not necessarily the most economical use of space, but that's not
the idea! By using a spacing like this the numbers are relatively easy to read,
and many "typo" errors will be immediately caught because the columns will
be out of alignment. But there still might be an error - a digit can simply be
mistyped. But there's an easy way to check: this integration formula should
be exact, i.e., accurate to about 15 digits, for polynomials through $x^9$, so do
the integral! (You were going to write an integration routine anyway, weren't
you?)
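Such a routine might be organized as the following sketch; the name GAUSS5, the argument list, and the hard-wired 5-point data (taken from Table 4.1, not from the deliberately flawed fragment above) are illustrative choices, not a prescribed implementation.

*  A minimal sketch of 5-point Gauss-Legendre quadrature on [a,b].
*  F is the (external) integrand; the data are the N = 5 entries
*  of Table 4.1.
      Double Precision Function GAUSS5( F, a, b )
      Double Precision F, a, b, x(5), w(5), mid, half, sum
      Integer m
      External F
      Data x / -0.9061798459386640d0, -0.5384693101056831d0,
     +          0.0d0,                 0.5384693101056831d0,
     +          0.9061798459386640d0 /
      Data w /  0.2369268850561891d0,  0.4786286704993665d0,
     +          0.5688888888888889d0,  0.4786286704993665d0,
     +          0.2369268850561891d0 /
*  Map [a,b] onto [-1,1]:  x = mid + half*y.
      mid  = 0.5d0 * ( b + a )
      half = 0.5d0 * ( b - a )
      sum  = 0.d0
      DO m = 1, 5
         sum = sum + w(m) * F( mid + half*x(m) )
      END DO
      GAUSS5 = half * sum
      end

A call such as GAUSS5( F, -1.d0, 1.d0 ), with F returning $x^m$, is then exactly the check suggested in the next exercise.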
EXERCISE 4.14
Evaluate the integral $\int_{-1}^{1} x^m\,dx$, $m = 0, 1, \ldots, 9$. If the weights and
abscissas have been entered correctly, the results should be accurate
to about 15 digits. By the way, there are errors in the PARAMETER state-
ment in the listed code fragment.
This sort of verification is not particularly exciting, but it saves an immense
amount of wasted effort in the long run. Assuming that you now have a valid
table entered into the program, let's check the statements made concerning
the accuracy of the integration.
EXERCISE 4.15
Evaluate the integral
$$\int_0^1 x^7\,dx$$
using quadratures with N = 2, 3, 4, and 5. Don't forget to map the
given integration interval into the region [-1,1]. You should observe
that the error decreases as more points are used in the integration,
and that the integral is obtained exactly if four or more points are
used.
But polynomials are easy to evaluate!
EXERCISE 4.16
Using quadratures with 2, 3, 4, and 5 points, evaluate the integral
Since the integrand is not a polynomial, the numerical evaluation is
not exact. As the number of points in the quadrature is increased,
however, the result becomes more and more accurate.
Composite Rules
It should be clear from the preceding discussion and from these numerical
experiments that Gaussian quadrature will always be better than Romberg
integration. However, going to larger quadratures is not necessarily the best
thing to do. As was discussed with Romberg integration, the use of higher
order polynomial approximations to the integrand is not a guarantee of ob-
taining a more accurate result. The alternative is to use a composite rule,
in which the total integration region is divided into segments, and to use a
relatively simple integration rule on each segment.
An additional advantage of a composite rule, of course, is that the
error, as determined by difference in succeeding approximations, can be mon-
itored and the integration continued until a predetermined level of accuracy
has been achieved.
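One way such a composite driver might be organized is sketched below; the name COMPGL, the relative-tolerance test, and the use of the 5-point routine GAUSS5 sketched earlier are all illustrative assumptions, not a prescribed implementation.

*  Sketch of a composite rule: apply a fixed low-order Gauss rule on
*  each of NSEG equal segments, doubling NSEG until two successive
*  estimates agree to a requested relative tolerance.
      Double Precision Function COMPGL( F, a, b, tol )
      Double Precision F, a, b, tol, h, old, total
      Double Precision GAUSS5
      Integer nseg, i
      External F
      nseg = 1
      old  = GAUSS5( F, a, b )
   10 nseg  = 2 * nseg
      h     = ( b - a ) / nseg
      total = 0.d0
      DO i = 1, nseg
         total = total + GAUSS5( F, a+(i-1)*h, a+i*h )
      END DO
      IF ( abs( total - old ) .gt. tol * abs(total) ) THEN
         old = total
         GO TO 10
      END IF
      COMPGL = total
      end

Doubling the number of segments until two successive estimates agree is the simplest form of the error monitoring just described.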
EXERCISE 4.17
Evaluate the integral
$$\int_0^1 e^{-x^2}\,dx,$$
by a composite rule, using 4-point Gauss-Legendre integration within
each segment. For comparison, repeat the evaluation using the trape-
zoid rule and Romberg integration. Compare the effort needed to ob-
tain 8-significant-figure accuracy with these different methods.
Gauss-Laguerre Quadrature
To use Gauss-Legendre quadrature the limits of the integration must be fi-
nite. But it can happen, and often does, that the integral of interest extends
to infinity. That is, a physically significant quantity might be expressed by the
integral
$$I = \int_0^{\infty} g(x)\,dx. \tag{4.99}$$
For this integral to exist, g(x) must go to zero asymptotically, faster than $1/x$.
It might even happen that the particular integral of interest is of the form
$$I = \int_0^{\infty} e^{-x} f(x)\,dx. \tag{4.100}$$
In this case, f(x) can be a polynomial, for example, since the factor $e^{-x}$ will
cause the integrand to vanish asymptotically.
In order to develop a Gauss-style integration formula, we need a set of
functions that are orthogonal over the region $[0, \infty)$ with the weighting
function $w(x) = e^{-x}$. Proceeding as before, we can construct a set of polynomials
that has precisely this characteristic! Beginning (again) with the set $u_m = x^m$,
we first consider the function $\phi_0 = \alpha_{00}\,u_0$, and the integral
$$\int_0^{\infty} w(x)\,\phi_0(x)\,\phi_0(x)\,dx = \alpha_{00}^2 \int_0^{\infty} e^{-x}\,dx = \alpha_{00}^2 = C_0. \tag{4.101}$$
With $C_0$ set to unity, we find $\phi_0(x) = 1$. We then consider the next polynomial,
$$\phi_1(x) = u_1(x) + \alpha_{10}\,\phi_0(x), \tag{4.102}$$
and require that
$$\int_0^{\infty} e^{-x}\,\phi_0(x)\,\phi_1(x)\,dx = 0, \tag{4.103}$$
and so on. This process constructs the Laguerre polynomials; the zeros of
these functions can be found, the appropriate weights for the integration de-
termined. These can then be tabulated, as in Table 4.2. We have thus found
the sought-after Gauss-Laguerre integration formulas,
$$\int_0^{\infty} e^{-x} f(x)\,dx \approx \sum_{m=1}^{N} w_m\,f(x_m). \tag{4.104}$$
TABLE 4.2 Gauss-Laguerre Quadrature: Weights and Abscissas
$$\int_0^{\infty} e^{-x} f(x)\,dx = \sum_{m=1}^{N} w_m\,f(x_m)$$
        x_m                                  w_m
N = 2
5.857864376269050(-1) 8.535533905932738(-1)
3.41421 35623 73095 1.464466094067262(-1)
N = 4
3.225476896193923(-1) 6.03154 10434 16336(-1)
1.74576 11011 58347 3.574186924377997(-1)
4.53662 02969 21128 3.888790851500538(-2)
9.39507 09123 01133 5.392947055613275(-4)
N = 6
2.228466041792607(-1) 4.589646739499636(-1)
1.1889321016 72623 4.1700083077 21210(-1)
2.992736326059314 1.133733820740450(-1)
5.775143569104511 1.0399197453 14907(-2)
9.83746 74183 82590 2.610172028149321(-4)
1.59828 73980 60170(1) 8.985479064296212(-7)
N =8
1.702796323051010(-1) 3.691885893416375(-1)
9.037017767993799(-1) 4.187867808143430(-1)
2.251086629866131 1.757949866371718(-1)
4.266700170287659 3.334349226121565(-2)
7.045905402393466 2.794536235225673(-3)
1.07585 16010 18100(1) 9.076508773358213(-5)
1.574067864127800(1) 8.485746716272532(-7)
2.286313173688926(1) 1.048001174871510(-9)
N = 10
1.377934705404924(-1) 3.084411157650201(-1)
7.294545495031705(-1) 4.011199291552736(-1)
1.808342901740316 2.180682876118094(-1)
3.40143 36978 54900 6.208745609867775(-2)
5.5524961400 63804 9.501506975181101(-3)
8.3301527467 64497 7.530083885875388(-4)
1.1843785837 90007(1) 2.825923349599566(-5)
1.62792 57831 37810(1) 4.249313984962686(-7)
2.1996585811 98076(1) 1.048001174871510(-9)
2.992069701227389(1) 9.91182 7219609009(-12)
N = 12
1.157221173580207(-1) 2.647313710554432(-1)
6.117574845151307(-1) 3.777592758731380(-1)
1.51261 02697 76419 2.440820113198776(-1)
2.83375 13377 43507 9.0449222211 68093(-2)
4.59922 76394 18348 2.010128115463410(-2)
6.844525453115177 2.663973541865216(-3)
9.62131 68424 56867 2.032315926629994(-4)
1.300605499330635(1) 8.365055856819799(-6)
1. 711685518746226(1) 1.668493876540910(-7)
2.215109037939701(1) 1.3423910305 15004(-9)
2.848796725098400(1) 3.06160 1635035021(-12)
3.70991 2104446692(1) 8.148077467426242(-16)
N = 18
7.816916666970547(-2) 1.855886031469188(-1)
4.124900852591293(-1) 3.10181 7663702253(-1)
1.01652 01796 23540 2.678665671485364(-1)
1.89488 85099 69761 1.529797474680749(-1)
3.054353113202660 6.143491786096165(-2)
4.50420 55388 89893 1.768721308077293(-2)
6.256725073949111 3.6601797677 59918(-3)
8.327825156605630 5.40622 78700 77353(-4)
1.073799004775761(1) 5.616965051214231(-5)
1.351365620755509(1) 4.015307883701158(-6)
1.668930628193011(1) 1.914669856675675(-7)
2.031076762626774(1) 5.836095268631594(-9)
2.444068135928370(1)    1.071711266955390(-10)
2.916820866257962(1)    1.089098713888834(-12)
3.462792706566017(1)    5.386664748378309(-15)
4.104181677280876(1)    1.049865978035703(-17)
4.883392271608652(1)    5.405398451631054(-21)
5.909054643590125(1)    2.691653269201029(-25)
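As a quick check of Equation (4.104) (a check constructed here, not taken from the text), note that the N = 2 entries of the table are just the decimal values of $2 \mp \sqrt{2}$ and $(2 \pm \sqrt{2})/4$, and that the two-point rule already integrates $f(x) = x^3$ exactly, since $2N - 1 = 3$:
$$\sum_{m=1}^{2} w_m\,x_m^3 = \frac{2+\sqrt{2}}{4}\,(2-\sqrt{2})^3 + \frac{2-\sqrt{2}}{4}\,(2+\sqrt{2})^3 = (3 - 2\sqrt{2}) + (3 + 2\sqrt{2}) = 6 = \int_0^{\infty} e^{-x}\,x^3\,dx.$$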
Since the upper limit of the integration is infinite, we cannot develop compos-
ite formulas with Gauss-Laguerre integration. Of course, we can use Gauss-
Legendre integration, using composite formulas if needed, up to some (arbi-
trary) finite limit, and then use Gauss-Laguerre integration from that finite
limit up to infinity (with an appropriate change of variable, of course).
EXERCISE 4.18
The integral
$$\int_0^{\infty} \frac{x^3}{e^x - 1}\,dx$$
appears in Planck's treatment of black body radiation. Perform the
integral numerically, with Gauss-Laguerre integration using 2, 4, 6,
and 8 points.
Multidimensional Numerical Integration
Now that we can integrate in one dimension, it might seem that the jump to
two or more dimensions would not be particularly difficult. But actually it is!
First is the issue of the number of function evaluations: if a typical integra-
tion requires, say, 100 points in one dimension, then a typical integration in
two dimensions should take 10,000 points, and in three dimensions a million
points are required. That's a lot of function evaluations! A second difficulty
is that instead of simply specifying the limits of integration, as is done in one
dimension, in two dimensions a region of integration is specified, perhaps as
bounded by a particular curve. And in three dimensions it's a volume, as spec-
ified by a boundary surface - so simply specifying the region of integration
can become difficult. In general, multidimensional integration is a lot harder
than integration in a single variable.
Let's imagine that we need to evaluate the integral
$$I = \int_a^b\!\int_c^d f(x, y)\,dx\,dy. \tag{4.105}$$
If a, b, c, and d are constants, then the integration region is simply a rectangle
in the xy-plane, and the integral is nothing more than
$$I = \int_a^b F(y)\,dy, \tag{4.106}$$
where
$$F(y) = \int_c^d f(x, y)\,dx. \tag{4.107}$$
FIGURE 4.4 Two-dimensional integration, doing the x-integral "first."
Figure 4.4 suggests what's happening - the rectangular integration
region is broken into strips running in the x-direction: the area under the
function f(x, y) along each strip is obtained by integrating in x, and the total
area is obtained by adding the contributions from all the strips. Of course, we
could equally well have chosen to perform the y integral first, so that
$$I = \int_c^d G(x)\,dx, \tag{4.108}$$
where
$$G(x) = \int_a^b f(x, y)\,dy. \tag{4.109}$$
As indicated in Figure 4.5, this corresponds to running the strips along the
y-axis.
The computer code to perform such two-dimensional integrals is eas-
ily obtained from the one-dimensional code. Although a single subroutine
with nested loops would certainly do the job, it might be easier simply to have
two subroutines, one that does the x-integral and one that does the y-integral.
FIGURE 4.5 Two-dimensional integration, doing the y-integral "first."
Of course, what goes into each will depend upon the order in which you choose
to perform the integrations. Let's say that you've decided to do the x-integral
"first," so that the integral is to be evaluated according to Equations (4.106)
and (4.107), and as depicted in Figure 4.4. Then the code might look some-
thing like this:
*  The next code does two-dimensional numerical integration,
*  using the function FofY to perform the integration in
*  the x-dimension.
*
*  Do all necessary setup...
*  The DO loop does the y-integral, Equation (4.106).
*
      total = 0.d0
      DO i = 1, n
         y = ...
         total = total + W(i) * FofY( c, d, y )
      END DO
      end
*
*---------------------------------------------------------
*
      Double Precision Function FofY( c, d, y )
*
*  This function evaluates the integral of f(x,y) dx
*  between the limits of x=c and x=d, i.e., F(y) of
*  Equation (4.107). With regard to this x-integration,
*  y is a constant.
*
      double precision ...
*
*  This DO loop does the x-integral.
*
      FofY = 0.d0
      DO i = 1, nn
         x = ...
         FofY = FofY + ww(i) * f(x,y)
      END DO
      end
This is only an outline of the appropriate computer code - the method of integration
(trapezoid, Simpson, Gaussian, etc.) hasn't even been specified. And,
of course, whatever method is used should be put together so as to guarantee
that either convergence has been obtained or an appropriate message will be
displayed. But these are problems that you've already encountered.
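As one way of filling in that outline (a sketch only, with hard-wired 5-point Gauss-Legendre data from Table 4.1 and no convergence monitoring; every name here is an illustrative choice), the two loops can simply be nested in a single program:

*  Sketch: evaluate the double integral of exp(-x*y) over
*  0 <= x <= 2, 0 <= y <= 1 with a fixed 5-point Gauss-Legendre
*  rule in each direction.
      Program TWODIM
      Double Precision xg(5), wg(5), x, y, fy, total
      Integer i, j
      Data xg / -0.9061798459386640d0, -0.5384693101056831d0,
     +           0.0d0,                 0.5384693101056831d0,
     +           0.9061798459386640d0 /
      Data wg /  0.2369268850561891d0,  0.4786286704993665d0,
     +           0.5688888888888889d0,  0.4786286704993665d0,
     +           0.2369268850561891d0 /
      total = 0.d0
      DO i = 1, 5
*  Map [-1,1] onto 0 <= y <= 1.
         y  = 0.5d0 + 0.5d0 * xg(i)
         fy = 0.d0
         DO j = 1, 5
*  Map [-1,1] onto 0 <= x <= 2; accumulate the x-integral, F(y).
            x  = 1.d0 + xg(j)
            fy = fy + wg(j) * exp( -x * y )
         END DO
*  The x-interval has half-width 1, the y-interval half-width 0.5.
         total = total + wg(i) * ( 1.d0 * fy ) * 0.5d0
      END DO
      write(*,*) ' estimate of the double integral = ', total
      end

A production version would, as noted above, refine the rule in each direction until the estimate settles down.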
EXERCISE 4.19
Numerically integrate
$$\int\!\!\int e^{-xy}\,dx\,dy$$
on the rectangular domain defined by $0 \le x \le 2$ and $0 \le y \le 1$. Use
your personal preference for method of integration, but be sure that
your result is accurate to 8 significant figures.
Other Integration Domains
Integration on a rectangular domain is a rather straightforward task, but we
often need to perform the integration over different regions of interest. For example,
perhaps we need the integral evaluated in the first quadrant, bounded
by a circle of radius 1, as indicated in Figure 4.6. The y-variable still varies
from 0 to 1, but the limits of the x-integration vary from 0 to $\sqrt{1-y^2}$. In
general, the limits of the integration are a function of all variables yet to be
integrated. The computer code just discussed is still appropriate for this problem
except that d, the upper limit of the x-integration, is not a constant and is
different for each evaluation of the function FofY, i.e., for each x-integration.
The function FofY itself would remain unchanged.
FIGURE 4.6 A two-dimensional integration region divided into Cartesian strips.
EXERCISE 4.20
Modify your program so that the calling routine adjusts the limits
of the integration. (No changes are needed in the called function,
however.) Then evaluate the integral
$$I = \int_0^1\!\left(\int_0^{\sqrt{1-y^2}} e^{-xy}\,dx\right) dy$$
over the quarter circle of unit radius lying in the first quadrant.
It should be noted that a change of variables can sometimes be used to simplify
the specification of the bounding region. For example, if the integration region
is bounded by a circle, then it might be advantageous to change from Cartesian
to polar coordinates. This would yield integration domains as indicated in
Figure 4.7.
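For the quarter circle, for instance, writing $x = r\cos\theta$ and $y = r\sin\theta$, so that $dx\,dy = r\,dr\,d\theta$, gives constant limits (a worked step added here, anticipating the exercise that follows):
$$\int_0^1\!\int_0^{\sqrt{1-y^2}} e^{-xy}\,dx\,dy \;=\; \int_0^{\pi/2}\!\int_0^{1} e^{-r^2\cos\theta\sin\theta}\;r\,dr\,d\theta.$$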
EXERCISE 4.21
Evaluate the integral
$$I = \int_0^1\!\left(\int_0^{\sqrt{1-y^2}} e^{-xy}\,dx\right) dy$$
over the quarter circle of radius 1 lying in the first quadrant, by first
changing to polar coordinates. Note that in these coordinates, the
limits of integration are constant.
FIGURE 4.7 A two-dimensional integration region divided into po-
lar strips.
A Little Physics Problem
Consider a square region in the xy-plane, such that $-1 \le x \le 1$ and $-1 \le y \le
1$, containing a uniform charge distribution $\rho$, as depicted in Figure 4.8. The
electrostatic potential at the point $(x_p, y_p)$ due to this charge distribution is
obtained by integrating over the charged region,
$$\Phi(x_p, y_p) = \frac{\rho}{4\pi\epsilon_0}\int_{-1}^{1}\int_{-1}^{1} \frac{dx\,dy}{\sqrt{(x_p - x)^2 + (y_p - y)^2}}. \tag{4.110}$$
For simplicity, take $\rho/4\pi\epsilon_0$ to be 1.
EXERCISE 4.22
Use your two-dimensional integration routine to evaluate $\Phi(x_p, y_p)$,
and create a table of values for $x_p, y_p = 2, 4, \ldots, 20$. Use a suffi-
cient number of points in your integration scheme to guarantee 8-
significant-digit accuracy in your final results.
FIGURE 4.8 A uniformly charged square region in Cartesian coordinates.
More on Orthogonal Polynomials
One reason that orthogonal functions in general, and Legendre functions in
particular, are important is that they allow us to write a complicated thing,
such as the electrostatic potential of a charged object, in terms of just a few
coefficients. That is, the potential might be written as
$$\Phi(r, \theta) = \sum_{i=0}^{\infty} a_i(r)\,P_i(\cos\theta). \tag{4.111}$$
A relatively few coefficients are then sufficient to describe the potential to
high accuracy, rather than requiring a table of values. Note that the coeffi-
cients $a_i(r)$ are functions of r, but are constants with respect to $\theta$. Factoring
the potential in this way divides the problem into two portions: the angular
portion, which is often geometric in nature and easily solved, and the radial
portion, which is where the real difficulty of the problem often resides.
The Legendre functions used here are the usual, unnormalized ones,
the first few of which are
$$\begin{aligned}
P_0(x) &= 1\\
P_1(x) &= x\\
P_2(x) &= (3x^2 - 1)/2\\
P_3(x) &= (5x^3 - 3x)/2\\
P_4(x) &= (35x^4 - 30x^2 + 3)/8\\
P_5(x) &= (63x^5 - 70x^3 + 15x)/8
\end{aligned} \tag{4.112}$$
The orthogonality condition for these unnormalized functions is
$$\int_0^{\pi} P_m(\cos\theta)\,P_n(\cos\theta)\,\sin\theta\,d\theta = \frac{2}{2m+1}\,\delta_{mn}. \tag{4.113}$$
Equation (4.111) can then be multiplied by $P_j(\cos\theta)$ and integrated to yield
$$a_j(r) = \frac{2j+1}{2}\int_0^{\pi} \Phi(r, \theta)\,P_j(\cos\theta)\,\sin\theta\,d\theta. \tag{4.114}$$
Due to the nature of the integrand, Gaussian quadrature is well suited to
evaluate this integral, although a change of variables is necessary since the
specified integration is on the interval $[0, \pi]$.
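One convenient choice (an observation added here, consistent with the orthogonality relation above) is the substitution $u = \cos\theta$, which maps $[0, \pi]$ onto $[-1, 1]$ and absorbs the $\sin\theta$ factor:
$$a_j(r) = \frac{2j+1}{2}\int_{-1}^{1} \Phi\bigl(r, \cos^{-1}u\bigr)\,P_j(u)\,du,$$
which is precisely the form for which the Gauss-Legendre weights and abscissas of Table 4.1 were constructed.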
Let's reconsider the problem of the uniformly charged square. If you
have not already done so, you should write a subroutine (or function) that performs
the two-dimensional integral of Equation (4.110), returning the value
of the potential at the point $(x_p, y_p)$. This function evaluation will be needed
to evaluate the integral for $a_i(r_p)$, as given in Equation (4.114). Of course,
the points at which the function is to be evaluated are determined by $r_p$ and
the abscissas at which the $\theta_p$ integral is to be performed. That is, measuring
$\theta_p$ counterclockwise from the positive x-axis, we have
$$x_p = r_p\cos\theta_p \qquad \text{and} \qquad y_p = r_p\sin\theta_p, \tag{4.115}$$
as illustrated in Figure 4.9.
FIGURE 4.9 A uniformly charged square region in polar coordi-
nates.
EXERCISE 4.23
Expand the potential in a series of Legendre functions through $P_5$.
That is, evaluate the $r_p$-dependent coefficients $a_i(r_p)$ for $i = 0, \ldots, 5$
at $r_p = 2, 4, \ldots, 20$. Perform the required integrations by Gaussian
quadrature, using an appropriate number of points N in the integra-
tion. Justify your choice of N.
Monte Carlo Integration
The integration methods discussed so far all are based upon making a polyno-
mial approximation to the integrand. But there are other ways of calculating
an integral, some of which are very different. One class of these methods re-
lies upon random numbers; these methods have come to be known under the
general rubric Monte Carlo, after the famous casino.
FIGURE 4.10 The integral of f(x) between a and b.
Consider a function to be integrated, as in Figure 4.10 - the integral
is just the area under the curve. If we knew the area, we could divide by the
width of the interval, (b - a), and define the average value of the function, $\langle f\rangle$.
Conversely, the width times the average value of the function is the integral,
$$\int_a^b f(x)\,dx = (b-a)\,\langle f\rangle. \tag{4.116}$$
So if we just had some way of calculating the average ...
And that's where the need for random numbers arises. Let's imagine
that we have a "list" of random numbers, the $x_i$, uniformly distributed be-
tween a and b. To calculate the function average, we simply evaluate f(x) at
each of the randomly selected points, and divide by the number of points:
$$\langle f\rangle_N = \frac{1}{N}\sum_{i=1}^{N} f(x_i). \tag{4.117}$$
As the number of points used in calculating the average increases, $\langle f\rangle_N$ tends
towards the "real" average, $\langle f\rangle$, and so we write the Monte Carlo estimate of
the integral as
$$\int_a^b f(x)\,dx \approx (b-a)\,\frac{1}{N}\sum_{i=1}^{N} f(x_i). \tag{4.118}$$
Finding a "list" of random numbers could be a real problem, of course.
Fortunately for us, it's a problem that has already been tackled by others,
and as a result random number generators are fairly common. Unfortunately,
there is a wide range in the quality of these generators, with some of them
being quite unacceptable. For a number of years a rather poor random num-
ber generator was widely distributed in the scientific community, and more
recently algorithms now described as "mediocre" were circulated. For our
purposes, which are not particularly demanding, the subroutine RANDOM, sup-
plied with the FORTRAN compiler, will suffice. If our needs became more
stringent, however, verifying the quality of the generator, and replacing it
if warranted, would take a high priority. In any event, we should note that
these numbers are generated by a computer algorithm, and hence are not
truly random - they are in fact pseudo-random - but they'll serve our pur-
pose. Unfortunately, the argument of RANDOM must be a REAL variable; we also
need the subroutine SEED, which initializes the random number generator. A
suitable code for estimating the integral then would be
      Program MONTE_CARLO
*
*   Paul L. DeVries, Department of Physics, Miami University
*
*   This program computes a Monte Carlo style estimate of
*   the integral of exp(x) between 0 and 1. (= e-1)
*
      double precision sum, e, x, error, monte
      real xxx
      integer N, i, j
      integer*2 value
      parameter ( e = 2.718281828459045d0 )
*
*   Initialize the "seed" used in the Random Number
*   Generator, and set the accumulated SUM to zero.
*
      value = 1
      call seed( value )
      sum = 0.d0
*
*   Calculate the function a total of 1,000 times, printing
*   an estimate of the integral after every 10 evaluations.
*
      DO i = 1, 100
*
*   Evaluate the function another 10 times.  SUM is the
*   accumulated total.
*
         DO j = 1, 10
            call random( xxx )
            x = xxx
            sum = sum + exp(x)
         END DO
*
*   The function has now been evaluated a total of
*   (10 * i) times.
*
         N     = i * 10
         MONTE = sum / N
*
*   Calculate the relative error, from the known value
*   of the integral.
*
         error = abs( monte - (e-1.d0) )/( e - 1.d0 )
         write(*,*) N, MONTE, error
      END DO
      end
Before using RANDOM, SEED is called to initialize the generation of a se-
quence of random numbers. The random number generator will provide the
same sequence of random numbers every time it's called, after it's been ini-
tialized with a particular value of value. This is essential to obtaining repro-
ducible results and in debugging complicated programs. To obtain a different
sequence of random numbers, simply call SEED with a different initial value.
This code prints the estimate of the integral after every 10 function
evaluations; typical results (which is to say, the ones I found) are presented in
Figure 4.11. With 1,000 function evaluations the Monte Carlo estimate of the
integral is 1.7530885, compared to $e - 1 \approx 1.7182818$, for an accuracy of about
two significant digits. The figure suggests that the estimate of the integral
has stabilized, if not converged, to this value. This is an illusion, however; as
the number of accumulated points grows, the influence of the last few points
diminishes, so that the variation from one estimate to the next is necessarily
reduced.
FIGURE 4.11 Monte Carlo estimates of the integral $\int_0^1 e^x\,dx$ using
various numbers of sampling points. The correct result is indicated
by the dashed line.
The accuracy of the Monte Carlo method can be enhanced by using
information about the function. For example, if g(x) ~ f(x), and if we can
integrate g, then we can write
$$\int_a^b f(x)\,dx = \int_a^b \frac{f(x)}{g(x)}\,g(x)\,dx = \int_{y(a)}^{y(b)} \frac{f(x)}{g(x)}\,dy, \tag{4.119}$$
where
$$y(x) = \int^x g(t)\,dt. \tag{4.120}$$
Instead of uniformly sampling x to integrate f(x), we uniformly sample y and
integrate f(x)/g(x)! To the extent that g is a good approximation to f, the
integrand will be unity, and easy to evaluate. This technique, known as im-
portance sampling, has the effect of placing a larger number of sample points
where the function is large, thus yielding a better estimate of the integral.
EXERCISE 4.24
Consider the integral
$$I = \int_0^1 e^x\,dx. \tag{4.121}$$
Since $e^x \approx 1 + x$, the integral can be rewritten as
$$I = \int_0^1 \frac{e^x}{1+x}\,(1+x)\,dx = \int_0^{3/2} \frac{e^{\sqrt{1+2y}-1}}{\sqrt{1+2y}}\,dy, \tag{4.122}$$
where
$$y = \int_0^x (1+t)\,dt = x + \frac{x^2}{2} \tag{4.123}$$
and
$$x = -1 + \sqrt{1+2y}. \tag{4.124}$$
This change of variables modifies the limits of integration and the
form of the integrand, of course. To evaluate the integral in its new
form, y is to be uniformly sampled on the interval [0,3/2]. Modify the
previous Monte Carlo program to evaluate this integral.
You probably found a better result than I had obtained, but not much
better. Particularly when you consider that with 1025 function evaluations -
1024 intervals - the composite trapezoid rule yields 1.7182824, accurate to 7
significant digits. (Simpson's rule gives this same level of accuracy with only
129 function evaluations!) So why do we care about Monte Carlo methods?
Actually, if the integral can be done by other means, then Monte Carlo
is not a good choice. Monte Carlo methods come into their own in situations
where the integral is difficult, or even impossible, to evaluate in any other
way. And in order to explain the advantages of Monte Carlo methods in these
cases, we need first to consider some of the ideas from probability theory.
Let's say that we have a Monte Carlo estimate of an integral, obtained
with the first 100 random numbers we generated. And then we make another
estimate of the integral, using the next 100 random numbers. Would these
estimates be the same? Of course not! (Unless the integrand is a constant -
a rather uninteresting case.) A different set of random numbers would in all
likelihood yield a different estimate, although perhaps not too different. And
as a larger and larger number of estimates are considered, we would expect a
smooth distribution of estimates to be observed - most of them near the true
value of the integral, with the number of estimates decreasing as we moved
away from that true value.
FIGURE 4.12 Distributions of 10,000 Monte Carlo estimates of the
integral $\int_0^1 e^x\,dx$. On the left, each integral was evaluated with N =
100 points; on the right, with N = 400 points.
And that's exactly what we see! In Figure 4.12, two distributions of
10,000 estimates of the integral are displayed. The distribution on the left
was obtained by using 100 points in the estimate of the integral. The plot
should look (at least vaguely) familiar to you - something like a bell shape.
In fact, the Central Limit Theorem of probability theory states that, if N is
sufficiently large, the distribution of sums will be a normal distribution, e.g.,
described by a Gaussian function.
Now, what would happen if we used more points N to estimate the
integral? It would seem reasonable to expect to get a "better" answer, in the
sense that if a large number of estimates of the integral were made, the dis-
tribution of estimates would be narrower about the true answer. And indeed,
this is what's seen on the right side of Figure 4.12, where another 10,000 es-
timates of the integral are plotted. This time, each estimate of the integral
was obtained using N = 400 points.
(To generate these histograms, we kept track of how many Monte
Carlo estimates fell within each narrow "bin." The plot is simply the number
of estimates in each bin. For N = 100, the bin was 0.004 wide; for the N =
400 example, it was reduced to 0.002.)
Comparing the two distributions in the figure, we immediately see
that the second distribution is much narrower than the first. We can take a
ruler and measure the width of the distributions, say, at half their maximum
values, and find that the second distribution is very nearly one-half the width
of the first. This is another fundamental result from probability theory -
that the width of the distribution of estimates of an integral is proportional
to $1/\sqrt{N}$, so that when we quadrupled the number of points, we halved the
width of the distribution.
Probability theory also tells us how to estimate the standard deviation
of the mean, a measure of the width of the distribution of estimates. Since
68.3% of all estimates lie within one standard deviation of the mean, we can
also say that there is a 68.3% probability that our particular estimate $\langle f\rangle_N$ lies
within one standard deviation of the exact average $\langle f\rangle$! (There's also a 95.4%
probability of being within two standard deviations, a 99.7% probability of
being within three standard deviations, and so on.) The standard deviation
can be estimated from the points sampled in evaluating the integral:
$$\sigma_N = \sqrt{\frac{1}{N}\left[\frac{1}{N}\sum_{i=1}^{N} f^2(x_i) - \left(\frac{1}{N}\sum_{i=1}^{N} f(x_i)\right)^{\!2}\,\right]}. \tag{4.125}$$
It's important to note that $\sigma_N$ is accumulated as more points are sampled.
That is, the two sums appearing in Equation (4.125) are updated with every
additional point, and $\sigma_N$ can be evaluated whenever it's needed or desired.
Thus
$$\int_a^b f(x)\,dx = (b-a)\bigl[\langle f\rangle_N \pm \sigma_N\bigr] \tag{4.126}$$
with 68.3% confidence. Implicit in this is one of the strengths of the Monte
Carlo method - if a more accurate estimate of the integral is desired, you
need only to sample the integrand at more randomly selected points. As N
increases, $\sigma_N$ decreases, and the probability of your result lying within some
specified vicinity of the correct result increases. But also contained is its weak-
ness - the improvement goes only as the square root of N.
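In code, the bookkeeping amounts to carrying a second accumulator alongside the first; the following sketch (illustrative names only, assuming the form of Equation (4.125) as written above, and using RANDOM and SEED as in the earlier program) shows the idea for the integrand $e^x$ on [0,1]:

*  Sketch: accumulate the sum of f and of f**2 so that the running
*  average and the standard deviation of the mean can be formed
*  at any time.
      double precision sum, sum2, fx, avg, sigma
      integer N, i
      integer*2 value
      real xxx
      value = 1
      call seed( value )
      sum  = 0.d0
      sum2 = 0.d0
      N    = 1000
      DO i = 1, N
         call random( xxx )
         fx   = exp( dble(xxx) )
         sum  = sum  + fx
         sum2 = sum2 + fx*fx
      END DO
      avg   = sum / N
      sigma = sqrt( ( sum2/N - avg*avg ) / N )
      write(*,*) ' <f> = ', avg, '   sigma = ', sigma
      end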
EXERCISE 4.25
Modify the Monte Carlo code to include the calculation of the standard
deviation. Use your code to estimate the integral
$$I = \int_0^{\pi} \sin x\,dx \tag{4.127}$$
and its standard deviation. In general, the addition of importance
sampling will greatly increase the accuracy of the integration. Noting
that
$$\sin x \approx \frac{4}{\pi^2}\,x\,(\pi - x), \qquad 0 \le x \le \pi, \tag{4.128}$$
reevaluate the integral using importance sampling, and compare the
estimated standard deviations.
The "error" in a Monte Carlo calculation is fundamentally different
from that in the other methods of integration that we've discussed. With the
trapezoid rule, for example, the error represents the inadequacy of the linear
approximation in fitting the actual integrand being integrated - by making
the step size smaller, the fit is made better, and the error decreases. Since
the error $\sim h^2$, the error can be halved if h is decreased by $\sqrt{2}$, which is to
say that N would be increased by $\sqrt{2}$. In a two-dimensional integral, the step
sizes in both dimensions would have to be decreased, so that N would need
to be increased by a factor of 2. In a general, multidimensional integral of
dimension d, N must be increased by a factor of $2^{d/2}$ in order to decrease the
error by a factor of 2.
In a Monte Carlo calculation, the "error" is of a probabilistic nature
- 68.3% of the time, the estimate is within one standard deviation of the
"correct" answer. Ai? more points are included, the average gets better, in
that sense. To perform a multidimensional integration, the random num-
ber generator will be called d times to get each of d coordinate values. Then
the function is evaluated, and the sums updated. Although the Monte Carlo
method converges slowly - only the square root of N - this convergence is
dependent upon the probabilistic nature of the averaging process, and not the
dimensionality of the integral! That is, to reduce the error by 2, N must be
increased by 4, independent of the dimensionality of the integral! Thus the
convergence of the Monte Carlo method is comparable to the trapezoid rule
in four dimensions, and faster if d > 4. For integrals of sufficiently high di-
mensionality, Monte Carlo methods actually converge faster than any of the
other methods we've discussed!
In some areas of physics, such multidimensional integrals occur fre-
quently. In statistical mechanics, for example, a standard problem goes like
this: Given a microscopic variable u which is defined at every point in phase
space, the equilibrium thermodynamic value $\bar{u}$ is given as
$$\bar{u} = \frac{\int u\,\exp\left[-E/kT\right]\,dx^{3N}\,dv^{3N}}{\int \exp\left[-E/kT\right]\,dx^{3N}\,dv^{3N}}, \tag{4.129}$$
where the notation $dx^{3N}\,dv^{3N}$ means to integrate over the three components
of position and velocity for each of N particles. N doesn't need to be very
large for the integral to become intractable to all but Monte Carlo methods of
attack.
And even in situations where the standard methods are applicable,
Monte Carlo might still be preferred, purely on the number of function eval-
uations required. For example, to evaluate a 10-dimensional integral, using
only 10 points per coordinate, requires 10 billion function evaluations. That
might take a while. It's not unusual for symmetry considerations to reduce
the complexity of the problem substantially, but still - we're not going to
evaluate an integral like that by direct methods! With Monte Carlo methods
we can at least obtain some estimate, and a reasonable idea of the error, with
however many points we have. And to improve the result, we need only to
add more function evaluations to the approximation. Clearly, this process is
not likely to yield a highly precise result - but it can give valid estimates of
1- or 2-significant-digit accuracy where other methods totally fail.
EXERCISE 4.26
Evaluate the 9-dimensional integral
$$I = \int_0^1 \cdots \int_0^1 \frac{da_x\,da_y\,da_z\,db_x\,db_y\,db_z\,dc_x\,dc_y\,dc_z}{(\vec{a}+\vec{b}\,)^2}$$
Use a sufficient number of points in the integration so that the esti-
mated standard deviation is less than 10% of the estimated integral.
Can you find some other way to evaluate the integral? An additional
feature of Monte Carlo integration is that singularities don't bother it too
much - this can't be said of other integration schemes. Integrals similar
to the one in the exercise appear in the study of electron plasmas, but they
are more complicated in that the integration extends over all of space. By an
appropriate change of variable and the use of importance sampling, Monte
Carlo methods can be used to give (crude) estimates of these integrals, where
no other methods can even be applied.
Monte Carlo Simulations
In addition to evaluating multidimensional integrals, Monte Carlo methods
are widely used to mimic, or simulate, "random" processes. In fact, one of the
first uses of Monte Carlo methods was in designing nuclear reactors - in par-
ticular, to determine how much shielding is necessary to stop neutrons. One
could imagine following the path of a neutron as it moved through the shield-
ing material, encountering various nuclei and being scattered. But after a few
such collisions, any "memory" the neutron had of its initial conditions would
be lost. That is, the specific result of a particular collision would have no cor-
relation with the initial conditions, i.e., it would be "random." Such processes
are said to be stochastic. Such stochastic problems, or physical problems that
are treatable by stochastic methods, abound in physics. And in many cases,
the use of simulations is the only practical tool for their investigation.
As a rather trivial example, consider the drunken sailor problem. A
young seaman, after many months at sea, is given shore leave in a foreign
port. He and his buddies explore the town, find an interesting little bistro
and proceed to partake of the local brew. In excess, unfortunately. When it's
time to return to the ship, the sailor can hardly stand. As he leaves the bistro,
a physicist sitting at the end of the bar observes that the sailor is equally likely
to step in any direction. In his condition, how far will he travel after taking
N steps?
FIGURE 4.13 A "typical" random walk on a two-dimensional square
grid.
We can simulate this walk by using random numbers. Let's imagine
a square grid, with an equal probability of moving on this grid in any of four
directions. A random number generator will be called upon to generate the
direction: if between 0 and 0.25, move north; between 0.25 and 0.50, move
east; and so on. A typical path is shown in Figure 4.13. Of course, we don't
learn much from a single path - we need to take a large number of paths,
and build up a distribution. Then we can state, in a probabilistic sense, how
far the sailor is likely to travel.
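A single walk of this kind might be generated as in the following sketch (names are illustrative; RANDOM and SEED are used as in the earlier Monte Carlo program):

*  Sketch: a single random walk of NSTEP steps on a square grid,
*  using the four equal-probability bins described in the text.
      Program WALK
      integer ix, iy, nstep, i
      integer*2 value
      real xxx
      value = 1
      call seed( value )
      ix = 0
      iy = 0
      nstep = 1000
      DO i = 1, nstep
         call random( xxx )
         IF ( xxx .lt. 0.25 ) THEN
            iy = iy + 1
         ELSE IF ( xxx .lt. 0.50 ) THEN
            ix = ix + 1
         ELSE IF ( xxx .lt. 0.75 ) THEN
            ix = ix - 1
         ELSE
            iy = iy - 1
         END IF
      END DO
      write(*,*) ' distance = ', sqrt( dble(ix*ix + iy*iy) )
      end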
EXERCISE 4.27
Write a computer code to investigate the random walk problem. Plot
the mean distance traveled versus the number of steps taken. For
large N, can you make a statement concerning the functional rela-
tionship between these quantities?
It might seem that this exercise is pure fancy, having little to do with
physics. But what if we were investigating the mobility of an atom attached to
the surface of a crystal? With the atom playing the role of the sailor, and the
grid points corresponding to lattice points of the crystal, we have a description
of a real physical situation. And we can use the simulation to ask real phys-
ical questions: How far will the atom travel in a given period of time? What
percentage of the surface needs to be covered to insure that two atoms will be
at adjoining sites 50% of the time? As the percentage of coverage increases,
will patterns of atoms develop on the surface?
In the example, only steps taken in the cardinal directions were per-
mitted. This restriction can be relaxed, of course, and we can also consider
motion in three dimensions. For example, consider the problem of diffusion.
At room temperature, molecules in the air are traveling at hundreds of me-
ters per second. Yet, when a bottle of perfume is opened at the far end of the
classroom, it takes several minutes for the aroma to be perceived at the other
end. Why?
The explanation is that while the velocity of the molecules is great,
there are a large number of collisions with other molecules. Each such colli-
sion changes the direction of the aromatic molecule, so that it wanders about,
much like our drunken sailor, making many collisions while achieving only
modest displacement from its origin. Let's model this process with our Monte
Carlo approach.
We begin with a single molecule, and allow it to travel in a random
direction. The first problem, then, is determining the direction. We want a
uniform distribution of directions, but an element of solid angle is
$$d\Omega = \sin\theta\,d\theta\,d\phi. \tag{4.130}$$
If we were simply to take a uniform distribution in $\theta$ and in $\phi$, there would be
a "bunching" of chosen directions about the poles. What we really want is a
$\sin\theta$ distribution so that the directions are uniformly distributed throughout
space, i.e., uniformly distributed on the unit sphere. (This is very similar to
what we encountered with importance sampling - changing the variables in
such a way as to put points where we want them. This similarity is not mere
coincidence: ultimately we will take a large number of events and average over
them, and hence are performing an integration.) In this case, let's introduce
the variable $g$ such that
$$d\Omega = dg\,d\phi, \tag{4.131}$$
so that $g$ and $\phi$ should be uniformly sampled. But that means that
$$dg = \sin\theta\,d\theta, \tag{4.132}$$
or that
$$g(\theta) = \cos\theta. \tag{4.133}$$
To obtain our uniform distribution of solid angles, we select $\phi$ from a uniform
distribution of random variables on the interval $[0, 2\pi]$. We then select $g$ from
a uniform distribution on $[-1, 1]$, and obtain $\theta$ from the relation
$$\theta = \cos^{-1} g. \tag{4.134}$$
After choosing a direction, we allow the molecule to move some given
distance before colliding with a molecule of air. Realistically, this distance is
another variable of the problem, but we'll assume it to be a constant, taken
to be the mean free path. As a result of the collision, the molecule is scattered
in a random direction, travels another mean free path distance, and so on.
Slowed by all these collisions, how far will the molecule travel in a given time?
Of course, the path of a single molecule isn't very significant - we need to
repeat the simulation for many molecules, and average the results.
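The direction-choosing recipe of Equations (4.130)-(4.134) might be coded as in the following sketch (a single molecular path only, with a step length of one mean free path; all names are illustrative, and RANDOM and SEED are used as before):

*  Sketch: take NSTEP unit steps (one mean free path each) in
*  directions distributed uniformly on the sphere, then report
*  the net displacement in mean-free-path units.
      Program WANDER
      double precision pi, phi, g, sintheta, x, y, z
      integer nstep, i
      integer*2 value
      real xxx
      parameter ( pi = 3.141592653589793d0 )
      value = 1
      call seed( value )
      x = 0.d0
      y = 0.d0
      z = 0.d0
      nstep = 500
      DO i = 1, nstep
         call random( xxx )
         phi = 2.d0 * pi * xxx
         call random( xxx )
         g   = -1.d0 + 2.d0 * xxx
         sintheta = sqrt( 1.d0 - g*g )
*  A unit step: (sin(theta)cos(phi), sin(theta)sin(phi), cos(theta)).
         x = x + sintheta * cos(phi)
         y = y + sintheta * sin(phi)
         z = z + g
      END DO
      write(*,*) ' net displacement = ', sqrt( x*x + y*y + z*z )
      end

Averaging the printed displacement over many such paths, and converting from mean-free-path units, gives the quantity asked for in the next exercise.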
EXERCISE 4.28
Consider the diffusion of an aromatic molecule in air, having a velocity
of 500 meters per second and a mean free path $\lambda$ of 1 meter. Calculate
the distance $\langle d\rangle$ a molecule moves in one second, averaging over 100
different molecules.
You probably found that the net displacement is much less than the 500 me-
ters a free molecule would have traveled. We argued that this would be the
case, as a consequence of the many collisions, but we have now successfully
modeled that phenomenon on the computer. The importance of Monte Carlo
simulation is not the precision of the result, but the fact that it can yield qual-
itatively valid results in situations where any result is difficult to obtain.
Just how valid are the results? Certainly, the average displacement
can be monitored as more molecular paths are considered, and some feel for
the convergence of the results can be acquired, but perhaps a better approach
is to monitor the distribution of displacements obtained. Physically, this dis-
tribution is akin to the density of aromatic molecules in the air at different
distances from the perfume bottle, a distribution we clearly expect to be con-
tinuous. In our simulation, we have started all the molecules at the same
instant, as if the perfume bottle were opened, many molecules allowed to es-
cape, and then the bottle closed. After 1 second, and 500 collisions, we would
expect very few molecules to still be in the vicinity of the origin. We would
also expect few molecules to be found at large displacements since insufficient
time has elapsed for them to travel very far. The distribution is thus expected
to be small (or even zero) at the origin, to increase smoothly to a maximum
and then to decrease smoothly to zero at large displacements. If the Monte
Carlo sampling is sufficiently large, then the distribution we obtain should
mimic this physically expected one. Conversely, if the distribution is not
realistic, then a larger sampling is needed.
EXERCISE 4.29
Investigate the distribution of net displacements as described. Plot
a histogram indicating the number of paths yielding displacements
between 0 and 1 m, between 1 and 2 m, and so on, using 100 different
paths. Repeat the exercise, plotting histograms with 200, 300, 400,
and 500 different paths, and compare the histograms obtained with
what is expected on physical grounds.
Once we're satisfied that the method is working properly and that the
results are statistically valid, we must ask ourselves if the results are physi-
cally valid. (Actually, this is a question we ask ourselves at every opportunity!)
That the displacement is considerably less than the free displacement would
have been is certainly expected, but what about the magnitude of the result
itself? If you found, as I did, that after 1 second the net displacement is be-
tween 10 and 20 meters, then the aroma of the perfume reaches the front
of a 30-foot room in less than a second! That seems much too rapid. My own
recollection of similar events is that it takes several minutes for the aroma to
travel that far.
We're forced to conclude that the actual diffusion is slower than what
we've found, which in turn means that the mean free path we've adopted is
too large. We could repeat our simulation with a different value for the mean
free path, but there might be a better way. Consider: does the value of the
mean free path really enter the calculation? Or have we actually evaluated
something more universal than we thought: the net displacement, in units
of the mean free path length, after 500 collisions? This is an example of a
scaling relation, and is very important. If we can determine a fundamental
relationship between the magnitude ofthe displacement, in units of the mean
free path, and the number of collisions, then we can apply that relation to
many different physical situations.
EXERCISE 4.30
Reexamine the simulation. Instead of monitoring the displacement
after 1 second, monitor it after every collision, and average over a suf-
ficiently large number of molecular paths to yield valid results. Plot
the average net displacement as a function of the number of collisions.
FIGURE 4.14 The average displacement $\langle d\rangle$ (in units of the mean
free path $\lambda$) is plotted versus the number of steps taken. These data
were obtained by averaging over 1000 different paths.
Averaging over 1000 molecular paths produced results presented in Figure
4.14. (This calculation takes several minutes but needs to be done only once.)
The curve is remarkably smooth, and appears vaguely familiar, which sug-
gests that further investigation might be worthwhile. Recall that if the displacement
is proportional to a power of the number of collisions, N,
$$\frac{\langle d\rangle}{\lambda} \propto N^q, \tag{4.135}$$
then
$$\ln\frac{\langle d\rangle}{\lambda} \approx q\,\ln N. \tag{4.136}$$
That is, a log-log plot of the data will be linear, and the slope of the line will
be the power q. Such a plot is presented in Figure 4.15. Clearly, the plot is a
linear one. With the data at N = 10 and N = 1000, the slope was determined
to be approximately 0.496.
FIGURE 4.15 This is the same data plotted in Figure 4.14, but on a
log-log scale. The apparent linearity of the curve suggests a simple
power-law dependence.
There are certain numbers of which we should always be mindful: $\pi$,
e, integers, and their reciprocals and powers. The slope we've determined
is very close to 1/2 - close enough to suspect that it's not accidental. We
seem to have stumbled upon a fundamental truism, that the displacement is
proportional to the square root of N,
$$\frac{\langle d\rangle}{\lambda} \propto \sqrt{N}. \tag{4.137}$$
While we should always use analytic results to guide our computations, we
should also be open to the possibility that our computations can lead us to
new analytic results. Our results do not prove this relationship, of course,
but strongly suggest that the relationship exists. At this juncture, we should
set aside the computer and our computations, and research the availability of
analytic derivations ofthis result. As Hamming, a pioneer in modern comput-
ing, has said, "The purpose of computing is insight, not numbers." It appears
that we have gained significant insight from this simulation.
EXERCISE 4.31
Plot your data on a log-log scale, and verify the power dependence.
References
Since numerical integration is fundamental to many applications, accurate
weights and abscissas for Gaussian integration were developed at nearly the
same time as large-scale computers were becoming available to the scientific
community. To avoid duplication of this effort, Stroud and Secrest published
their results for all to use.
A. H. Stroud and Don Secrest, Gaussian Quadrature Formulas. Pren-
tice-Hall, Englewood Cliffs, 1966.
An indispensable source of information regarding mathematical functions is
the reference
Handbook of Mathematical Functions, edited by Milton Abramowitz
and Irene A. Stegun, Dover Publications, New York, 1965.
Monte Carlo methods are becoming more widely used every day. A good in-
troduction is provided by
Malvin H. Kalos and Paula A. Whitlock, Monte Carlo Methods, John
Wiley & Sons, New York, 1986.
Some insight into the quality of random number generators can be discerned
from the article
William H. Press and Saul A. Teukolsky, "Portable Random Number
Generators," Computers in Physics 6, 522 (1992).
Obtaining a Gaussian distribution of random numbers, rather than a uniform
distribution, is discussed in
G. E. P. Box and M. E. Muller, "A note on the generation of random
normal deviates," Ann. Math. Statist. 29, 610 (1958).
Chapter 5:
Ordinary Differential
Equations
To a large extent, the study of physics is the study of differential equations,
so it's no surprise that the numerical solution of differential equations is a
central issue in computational physics. The surprise is that so few traditional
courses in physics, and even mathematics courses in differential equations,
provide tools that are of practical use in solving real problems. The number of
physically relevant "linear second-order homogeneous differential equations
with constant coefficients" is not large; experience has led us to the conclusion
that all the interesting equations are either trivial, or impossibly difficult to
solve analytically. Traditional analysis solves the trivial cases, and can yield
invaluable insight to the solution of the difficult ones. But the bottom line is
that the difficult cases must be treated numerically.
You are probably familiar with the initial value problem, in which
you have a differential equation such as y''(x) = f(x, y, y') and the initial
conditions y(0) = a, y'(0) = b. Equations of this form are quite common, and
we'll develop several methods suitable for their solution. There is another
type of problem that is also of considerable importance to physics, which we'll
also address in this chapter: the boundary value problem. Here, rather than
having information about the derivative, you are told about the function at
various points on the boundary of the integration region.
Unfortunately, there is no "best" method to solve all differential equa-
tions. Each equation has a character all its own, and a method that works
well on one may work poorly, or even fail, on another. What we will do is
to develop some general ideas concerning the numerical solution of differen-
tial equations, and implement these ideas in a code that will work reasonably
well for a wide variety of problems. Keep in mind, however, that if you are
faced with solving a difficult problem, say, one involving large sets of differen-
tial equations to be solved over a wide range of the independent variable, you
might be better off investigating a solution specifically tailored to the problem
you're confronting.
Euler Methods
The most prolific mathematician of the eighteenth century, or perhaps of any
century, was the Swiss-born Leonhard Euler (1707-1783). It has been said
that Euler could calculate with no apparent effort, just as men breathe and
eagles fly, and would compose his memoirs while playing with his 13 children.
He lost the sight in his right eye when he was 28, and in the left when 59,
but with no effect on his mathematical production. Aided by a phenomenal
memory, and having practiced writing on a chalkboard before becoming blind,
he continued to publish his mathematical discoveries by dictating to his chil-
dren. During his life, he published over 500 books and papers; the complete
bibliography of Euler's work, including posthumous items, has 886 entries.
Euler made contributions to virtually every field of eighteenth-century
mathematics, particularly the theory of numbers, and wrote textbooks on al-
gebra and calculus. The prestige of his books established his notation as the
standard; the modern usage of the symbols e, π, and i (for √−1) are directly at-
tributable to Euler. He also made significant contributions in areas of applied
mathematics: Euler wrote books on ship construction and artillery, on optics
and music, and ventured into the areas of physics and astronomy. But our
current interest, which represents only a minute fraction of Euler's output,
is in methods due to him in solving differential equations.
Consider the differential equation
$$ y'(x) = f(x, y). \qquad (5.1) $$
(If f is a function of x alone, we can immediately "solve" for y:
$$ y(x) = \int^{x} f(x)\,dx. \qquad (5.2) $$
Since this is an "uninteresting" situation, we'll assume that f is a function of
x and y.) Now, one might try to solve Equation (5.1) by Taylor's series; that
is, if we knew all the derivatives, we could construct the solution from the
expansion
$$ y(x) = y(x_0) + (x - x_0)\,y'(x_0) + \frac{(x - x_0)^2}{2!}\,y''(x_0) + \cdots. \qquad (5.3) $$
Since y'(x) is known, we can obtain the higher derivatives - but it takes a
little work. The second derivative, for example, is
y"(x) = :Xf(x,y) +
8 8
= 8xf(x, y) + f(x, y) 8yf(x, y)
(5.4)
Clearly, this is leading to some complicated expressions, and the situation de-
generates as we move to higher derivatives. As a practical matter, the Taylor
series solution is not very helpful. However, it provides a standard against
which other methods can be measured. To that end, we write
$$ y(x) = y_0 + (x - x_0)\,f(x_0, y_0) + \frac{(x - x_0)^2}{2!}\left[\frac{\partial f}{\partial x}(x_0, y_0) + f(x_0, y_0)\,\frac{\partial f}{\partial y}(x_0, y_0)\right] + \frac{(x - x_0)^3}{3!}\,y'''(\xi), \qquad (5.5) $$
where y₀ = y(x₀).
FIGURE 5.1 The simple Euler method.
The original differential equation gives us the derivative y' at any
point; if we're given the value of y at some point Xo, then we could approxi-
mate the function by a Taylor series, truncated to two terms:
$$ y(x) \approx y(x_0) + (x - x_0)\,y'(x_0). \qquad (5.6) $$
As simpleminded as it is, this method actually works! (But not well!) Denoting
the size of the step x − x₀ by h, we can write Equation (5.6) as an equality
$$ y(x_0 + h) = y(x_0) + h\,f(x_0, y(x_0)) = y_0 + h f_0, \qquad (5.7) $$
where we've defined y₀ = y(x₀) and f₀ = f(x₀, y₀). This is known as the
simple Euler method, and it allows us to move the solution along, one step at
a time, as indicated in Figure 5.1. A typical implementation is to divide the
total integration region into steps of size h, and to move the solution along
one step at a time in the obvious way. As a check on accuracy, the calculation
can be repeated for a different step size, and the results compared.
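A minimal sketch, not taken from the text, of how Equation (5.7) might be coded for the equation of Exercise 5.1 below, y' = y² + 1 with y(0) = 0 (whose exact solution is tan x); the step size and the printed comparison against the exact solution are illustrative choices.

      Program Euler1
*     A sketch of the simple Euler method, Eq. (5.7), applied to
*     y' = y**2 + 1, y(0) = 0.  The exact solution is tan(x).
      double precision x, y, h
      integer i, n
      h = 0.05d0
      n = nint( 1.d0/h )
      x = 0.d0
      y = 0.d0
      do 10 i = 1, n
         y = y + h*( y*y + 1.d0 )
         x = x + h
         write(*,*) x, y, tan(x)
   10 continue
      end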
EXERCISE 5.1
Write a computer code to solve the differential equation
y'(x) = y² + 1
on the region 0 < x < 1 using Euler's method, with y(0) = 0. Plot or
graph your results for h = 0.05, 0.10, 0.15, and 0.20, along with the
exact result.
The problem with the simple Euler method is that the derivative at
the beginning of the interval is assumed constant over the entire step; the
derivative at the end of the interval is not used (in this interval). But we've
already seen that such asymmetric treatments always lead to low levels of
accuracy in the solution. Wouldn't it be better to use some median value of
the derivative, say, the value halfway through the step? Of course - but how
is the derivative at the midpoint evaluated, when the derivative is itself a
function of y? Good question.
Let's use Euler's method to give us a guess at what the solution should
be at the midpoint, x_mid = x₀ + h/2. That is,
$$ y(x_{\rm mid}) = y_0 + \tfrac{h}{2}\,y_0' = y_0 + \tfrac{h}{2}\,f_0, \qquad (5.8) $$
where we've again associated the derivative of y with the function f - that is,
we've used the differential equation we're trying to solve. With this expres-
sion for y(x_mid), we can evaluate the derivative at the midpoint, f(x_mid, y_mid),
and using that as our approximation to the derivative over the entire interval
we find
$$ y(x_0 + h) = y(x_0) + h\,f(x_{\rm mid}, y_{\rm mid}). \qquad (5.9) $$
This is the modified Euler's method, and has an interesting geometrical in-
terpretation. (See Figure 5.2.) While Euler's method corresponds to drawing
a straight line with derivative I(xo, Yo) through the point (xo, Yo), the modi-
fied Euler's method puts a line through (xo, Yo), but with (approximately) the
derivative at the midpoint of the interval. Another way of thinking of this
method is to consider a simple approximation to the derivative at the mid-
point,
$$ f(x_{\rm mid}, y_{\rm mid}) = y'(x_{\rm mid}) \approx \frac{y(x_0 + h) - y(x_0)}{h}. \qquad (5.10) $$
Using Euler's method to approximate y_mid, the modified Euler method quickly
follows.
FIGURE 5.2 The modified Euler method.
Yet another variation of Euler's method is possible if we attempt a so-
lution using a mean value of the derivative. (See Figure 5.3.) That is, we use
Euler's equation to guess at y(xo + h), which we use to evaluate the deriva-
tive at the end of the interval. This derivative is averaged with the "known"
derivative at the start of the interval, and this mean derivative is used to ad-
vance the solution. The improved Euler method is thus given as
$$ y(x_0 + h) = y(x_0) + h\,\frac{f_0 + f(x_0 + h,\, y_0 + h f_0)}{2}. \qquad (5.11) $$
FIGURE 5.3 The improved Euler method.
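As a point of reference, a single step of each method might be coded as in the following sketch; the derivative function f(x, y) = y² + 1 is only a stand-in, as are the starting point and step size.

      Program OneStep
*     A sketch of one step of the modified Euler method, Eq. (5.9),
*     and of the improved Euler method, Eq. (5.11), for y' = f(x,y).
      double precision x0, y0, h, f0, ymid, ymod, yimp, f
      x0 = 0.d0
      y0 = 0.d0
      h  = 0.1d0
      f0 = f( x0, y0 )
*     Modified Euler: derivative evaluated at the midpoint
      ymid = y0 + 0.5d0*h*f0
      ymod = y0 + h*f( x0 + 0.5d0*h, ymid )
*     Improved Euler: average of the derivatives at the two ends
      yimp = y0 + h*( f0 + f( x0 + h, y0 + h*f0 ) )/2.d0
      write(*,*) ymod, yimp
      end

      double precision function f( x, y )
*     Stand-in derivative function.
      double precision x, y
      f = y*y + 1.d0
      end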
EXERCISE 5.2
Modify your computer code to solve differential equations by the mod-
ified Euler and improved Euler methods, and solve the equation
y'(x) = y² + 1,   y(0) = 0
using a step size h = 0.10. Prepare a table of solutions comparing the
three methods, on the interval 0 < x < 1.
Constants of the Motion
In Exercise 5.1, you probably found that the solution improves as the step
size is made smaller, but that the approximations always lag behind the exact
solution. Using the modified or improved methods gives better results, based
on a comparison among approximations or against the exact result. But before
we explore even better approximations, we should note that there are other
ways to judge the quality of a solution: on physical grounds, it might happen
that a particular quantity is conserved. In that situation, the degree to which
that quantity is calculated to be constant is indicative of the quality of the
solution. For example, consider a mass on a spring - the velocity of the mass
is determined from the equation
$$ \frac{dv}{dt} = a = \frac{F}{m} = \frac{-kx}{m}. \qquad (5.12) $$
For simplicity, take the mass and the force constant to equal 1, so that we have
$$ \frac{dv}{dt} = -x. \qquad (5.13) $$
From the definition of velocity, we also have
$$ \frac{dx}{dt} = v. \qquad (5.14) $$
For a time step δ, the simple Euler approximation to the solution to this set
of equations is simply
$$ v(t_0 + \delta) = v(t_0) + \delta\,\frac{dv}{dt}\Big|_{t_0} = v(t_0) - \delta\,x(t_0), \qquad (5.15) $$
$$ x(t_0 + \delta) = x(t_0) + \delta\,\frac{dx}{dt}\Big|_{t_0} = x(t_0) + \delta\,v(t_0). \qquad (5.16) $$
But the spring - at least, an ideal one - provides a conservative force so
that the energy of the system, E = mv²/2 + kx²/2, is a constant of the mo-
tion. Imagine that the system is set in motion at t = 0, x = 0 with v = 1. Using
time steps of δ = 0.1, the equations can be solved, and the "solutions" thus
determined. At each step, we can derive the energy from the calculated posi-
tions and velocities, and exhibit these quantities as in Figure 5.4. Needless to
say, something is not working here.
FIGURE 5.4 Position, velocity, and energy as functions of time, for
the simple harmonic oscillator, as calculated by the simple Euler
method.
But we already knew that the simple Euler method had its problems;
do the modified or improved methods do a better job? Applying the modified
Euler method to Equations (5.13) and (5.14), we first use the simple Euler's
method to approximate v and x halfway through the time increment,
8 8 dv I 8
v(to + 2) = v(to) + 2 dt = v(to) - 2
x
(to),
to
(5.17)
(5.18)
8 8 dxl 8
x(to + 2) = x(to) + 2 dt to = x(to) + 2
v
(to).
These values are then used over the entire time increment to determine v and
x at t = t₀ + δ:
$$ v(t_0 + \delta) = v(t_0) + \delta\,\frac{dv}{dt}\Big|_{t_0 + \delta/2} = v(t_0) - \delta\,x(t_0 + \tfrac{\delta}{2}), \qquad (5.19) $$
$$ x(t_0 + \delta) = x(t_0) + \delta\,\frac{dx}{dt}\Big|_{t_0 + \delta/2} = x(t_0) + \delta\,v(t_0 + \tfrac{\delta}{2}). \qquad (5.20) $$
The code to implement the modified Euler method is only a slight
modification of that used for the simple Euler method - simply add the eval-
uation of xmid and vmid to the code, and use these values to evaluate xnew and
vnew from the previous xold and vold values.
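That loop, following Equations (5.17) through (5.20) with m = k = 1, x(0) = 0, v(0) = 1, and δ = 0.1, might be sketched as below; the variable names are illustrative, not the text's.

      Program Spring
*     A sketch of the modified Euler method, Eqs. (5.17)-(5.20),
*     for the mass on a spring, with the energy printed at each step.
      double precision x, v, xmid, vmid, xnew, vnew, t, delta, e
      integer i
      delta = 0.1d0
      t = 0.d0
      x = 0.d0
      v = 1.d0
      do 10 i = 1, 60
         xmid = x + 0.5d0*delta*v
         vmid = v - 0.5d0*delta*x
         xnew = x + delta*vmid
         vnew = v - delta*xmid
         x = xnew
         v = vnew
         t = t + delta
         e = 0.5d0*( v*v + x*x )
         write(*,*) t, x, v, e
   10 continue
      end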
EXERCISE 5.3
Use the modified Euler method with a time step of 0.1 to solve the
"mass on a spring" problem, and present your results in a plot similar
to Figure 5.4. How well is the energy conserved?
It might seem that a numerical method that preserves constants of
the motion is inherently "better" than one that does not. Certainly, the im-
proved and modified Euler methods are to be preferred over the simple Euler
method. But this preference is derived from improvements made in the algo-
rithm, as verified by the computation of constants of the motion, not because
the constants were guaranteed to be preserved.
It is possible to construct an algorithm that preserves constants of the
motion. For example, consider the mass-on-a-spring problem. We might use
the simple Euler expression to determine position,
$$ x(t_0 + \delta) = x(t_0) + \delta\,\frac{dx}{dt}\Big|_{t_0} = x(t_0) + \delta\,v(t_0), \qquad (5.21) $$
and determine the velocity by requiring that
$$ E = \frac{mv^2}{2} + \frac{kx^2}{2}. \qquad (5.22) $$
This will give us the magnitude of the velocity, and we could obtain its sign
by requiring it to be the same as that obtained from the expression
$$ v(t_0 + \delta) = v(t_0) + \delta\,\frac{dv}{dt}\Big|_{t_0} = v(t_0) - \delta\,x(t_0). \qquad (5.23) $$
This algorithm is absolutely guaranteed to conserve energy, within the com-
puter's ability to add and subtract numbers. But how good is it otherwise?
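A sketch of how this might be coded - the names are illustrative - with the position advanced by Equation (5.21), the magnitude of the velocity fixed by Equation (5.22), and the sign of the velocity borrowed from the Euler estimate of Equation (5.23):

      Program EConsv
*     A sketch of the "guaranteed energy-conserving" algorithm for
*     the mass on a spring: m = k = 1, x(0) = 0, v(0) = 1, delta = 0.1.
      double precision x, v, xnew, veuler, vsq, t, delta, e
      integer i
      delta = 0.1d0
      t = 0.d0
      x = 0.d0
      v = 1.d0
      e = 0.5d0*( v*v + x*x )
      do 10 i = 1, 60
         xnew   = x + delta*v
         veuler = v - delta*x
         vsq    = 2.d0*e - xnew*xnew
         if( vsq .lt. 0.d0 ) vsq = 0.d0
         v = sign( sqrt(vsq), veuler )
         x = xnew
         t = t + delta
         write(*,*) t, x, v, 0.5d0*( v*v + x*x )
   10 continue
      end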
EXERCISE 5.4
Use this "guaranteed energy-conserving" algorithmto solve the mass-
on-a-spring problem, and plot the results. How does it do?
Runge-Kutta Methods
The Euler methods are examples of a general class of approximations known
as Runge-Kutta methods, characterized by expressing the solution in terms
of the derivative f(x, y) evaluated with different arguments. This is in con-
trast to the Taylor's series solution which requires many different derivatives,
all evaluated with the same arguments. Runge-Kutta methods are extremely
popular, in part due to the ease with which they can be implemented on com-
puters.
We notice that all the Euler methods can be written in the form
$$ y(x_0 + h) = y(x_0) + h\,[\,\alpha f(x_0, y_0) + \beta f(x_0 + \gamma h,\; y_0 + \delta h f_0)\,]. \qquad (5.24) $$
Let's see how well this expression agrees with Taylor's series. A function
f(x, y) of two variables can be expanded as
$$ f(x, y) = f(x_0, y_0) + (x - x_0)\,\frac{\partial f}{\partial x}(x_0, y_0) + (y - y_0)\,\frac{\partial f}{\partial y}(x_0, y_0) + \frac{(x - x_0)^2}{2}\,\frac{\partial^2 f}{\partial x^2}(\xi, \eta) + (x - x_0)(y - y_0)\,\frac{\partial^2 f}{\partial x\,\partial y}(\xi, \eta) + \frac{(y - y_0)^2}{2}\,\frac{\partial^2 f}{\partial y^2}(\xi, \eta) + \cdots, \qquad (5.25) $$
where x₀ ≤ ξ ≤ x and y₀ ≤ η ≤ y. Using this expression to expand f(x₀ + γh, y₀ + δh f₀) of Equation (5.24), we find that
$$ y(x) = y_0 + h\alpha f(x_0, y_0) + h\beta\left[ f(x_0, y_0) + h\gamma\,\frac{\partial f}{\partial x}(x_0, y_0) + h\delta\,f(x_0, y_0)\,\frac{\partial f}{\partial y}(x_0, y_0) + O(h^2) \right] $$
$$ = y_0 + h(\alpha + \beta)\,f(x_0, y_0) + h^2\beta\left[ \gamma\,\frac{\partial f}{\partial x}(x_0, y_0) + \delta\,f(x_0, y_0)\,\frac{\partial f}{\partial y}(x_0, y_0) \right] + O(h^3). \qquad (5.26) $$
This expression agrees with the Taylor series expression of Equation (5.5)
through terms involving h², if we require that
$$ \alpha + \beta = 1, \qquad \beta\gamma = 1/2, \qquad {\rm and} \qquad \beta\delta = 1/2. \qquad (5.27) $$
Thus the improved and modified Euler methods both agree with the Taylor se-
ries through terms involving h², and are said to be second-order Runge-Kutta
methods. Although these equations require that γ = δ, otherwise there is con-
siderable flexibility in choosing the parameters; the optimum second-order
method, in the sense that the coefficient multiplying the h³ term is minimized,
has α = 1/3, β = 2/3, and γ = δ = 3/4.
While the Euler method jumps right in to find a solution, the improved
and modified methods are more conservative, testing the water (so to speak)
before taking the plunge. These methods can actually be derived in terms of
an integral: since y' = f(x, y), then clearly
$$ y(x_0 + h) = y(x_0) + \int_{x_0}^{x_0 + h} f(\tau, y)\,d\tau. \qquad (5.28) $$
The only problem, of course, is that the sought-after solution y appears under
the integral on the right side of the equation, as well as on the left side of the
equation. Approximating the integral by the midpoint rule, we have
$$ y(x_0 + h) = y(x_0) + h\,f(x_0 + \tfrac{h}{2},\; y_{\rm mid}). \qquad (5.29) $$
y_mid is then approximated by a Taylor series expansion,
$$ y_{\rm mid} \approx y(x_0) + \tfrac{h}{2}\,f(x_0, y_0). \qquad (5.30) $$
Since the integral is already in error O(h²), there is no point in using a more
accurate series expansion. With these approximations, Equation (5.28) then
reads
$$ y(x_0 + h) = y(x_0) + h\,f(x_0 + \tfrac{h}{2},\; y_0 + \tfrac{h}{2} f_0), \qquad (5.31) $$
which we recognize as the modified Euler method. In a similar fashion, the
improved Euler method can be derived by approximatingthe integral in Equa-
tion (5.28) by the trapezoid rule.
The methods we've outlined can be used to derive higher order Runge-
Kutta methods. Perhaps the most popular integration method ever devised,
the fourth-order Runge-Kutta method, is written in terms of intermediate
quantities defined as
$$ f_0 = f(x_0, y_0), $$
$$ f_1 = f(x_0 + \tfrac{h}{2},\; y_0 + \tfrac{h}{2} f_0), $$
$$ f_2 = f(x_0 + \tfrac{h}{2},\; y_0 + \tfrac{h}{2} f_1), $$
$$ f_3 = f(x_0 + h,\; y_0 + h f_2). \qquad (5.32) $$
The solution is then expressed as
$$ y(x_0 + h) = y(x_0) + \frac{h}{6}\,( f_0 + 2 f_1 + 2 f_2 + f_3 ). \qquad (5.33) $$
This is the standard, classic result often referred to as simply the Runge-
Kutta method, and is a mainstay in the arsenal of numerical analysts. For the
special case that f = f (x), this result is obtained by evaluating the integral of
Equation (5.28) by Simpson's rule.
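A single step of Equations (5.32) and (5.33) translates almost directly into code, as in the sketch below; the derivative function f, the starting point, and the step size are stand-ins, not part of the text.

      Program RK4Step
*     A sketch of one classic fourth-order Runge-Kutta step,
*     Eqs. (5.32) and (5.33), for y' = f(x,y).
      double precision x0, y0, h, f0, f1, f2, f3, y, f
      x0 = 0.d0
      y0 = 0.d0
      h  = 0.1d0
      f0 = f( x0,           y0              )
      f1 = f( x0 + 0.5d0*h, y0 + 0.5d0*h*f0 )
      f2 = f( x0 + 0.5d0*h, y0 + 0.5d0*h*f1 )
      f3 = f( x0 + h,       y0 + h*f2       )
      y  = y0 + h*( f0 + 2.d0*f1 + 2.d0*f2 + f3 )/6.d0
      write(*,*) x0 + h, y
      end

      double precision function f( x, y )
*     Stand-in derivative function.
      double precision x, y
      f = y*y + 1.d0
      end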
EXERCISE 5.5
One of the standard problems of first-year physics is one-dimensional
projectile motion - but contrary to standard practice, let's include
air resistance to see how large an effect it is. The time rate of change
of the momentum is
$$ \frac{dp}{dt} = mg - kv^2, $$
where m is the mass of the object, g = 9.8 m/s² is the acceleration
due to gravity, and k is a drag coefficient. For a particular sphere
of mass 10⁻² kg the drag coefficient was determined to be k = 10⁻⁴
kg/m. Letting p = mv, use the fourth-order Runge-Kutta method to
find the velocity of the sphere released from rest as a function of time
for 0 < t < 10 seconds. Choose a step size to ensure 4-significant-digit
accuracy. Compare your calculation to the zero-th order approxima-
tion, e.g., the analytic solution obtained by ignoring air resistance.
Adaptive Step Sizes
In the previous exercise, a typical problem involving ordinary differential
equations was presented. Part of the specification of that exercise was the
stipulation that a given level of accuracy in the solution be achieved. How did
you achieve that accuracy, and - before reading further - are you convinced
that your solution is really that accurate?
Probably the most common way to ascertain the accuracy of a solution
is to calculate it twice, with two different step sizes, and compare the results.
This comparison could be made after many propagation steps, i.e., after inte-
grating the solution for some distance. But the nature of the solution might
change considerably from one region to another - a smaller step size might
be necessary here and not there - so that the calculated results should be
compared often, allowing for a decrease (or increase!) of the step size where
appropriate.
By far the easiest way to accomplish this comparison is to use step
sizes h and h/2, and compare immediately - if the difference is small, then
the error is assumed small. In fact, this estimate of the error is used to adjust
the size of the step. If the error is larger than tolerated, then the step size
is halved. Likewise, if the error is less than some predetermined value, the
steps are too small and too much work is being performed; the step size is
increased. Such an adaptive step size modification to the classic Runge-Kutta
method greatly enhances its utility, so much so that methods without some
form of adaptation simply should not be used.
An additional benefit of having two approximations to the result is
that Richardson's extrapolation can be used to obtain a "better" estimate of
the solution. Since the Runge-Kutta method is accurate through h⁴, the two
solutions can be combined to eliminate the first term of the error (∼ h⁵): if
Y(h) and Y(h/2) are the two solutions, the extrapolated value is
$$ Y_{\rm extrapolated} = \frac{16\,Y(h/2) - Y(h)}{15}. \qquad (5.34) $$
EXERCISE 5.6
Modify your Runge-Kutta program to take adaptive step sizes, and to
improve upon the results at each step via Richardson's extrapolation.
Use this modified code to solve the projectile problem of the previous
exercise.
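The extrapolation of Equation (5.34) is a one-line combination; a minimal sketch (the function name is illustrative, with Y(h) and Y(h/2) supplied by your own Runge-Kutta passes) is

      double precision function Richrd( yfull, yhalf )
*     Richardson extrapolation, Eq. (5.34), combining fourth-order
*     Runge-Kutta results obtained with steps h and h/2.
      double precision yfull, yhalf
      Richrd = ( 16.d0*yhalf - yfull )/15.d0
      end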
Runge-Kutta-Fehlberg
Rather than interval halving/doubling, there's another, even more interest-
ing, way that we can utilize our knowledge of the error to our benefit. Let's
see if we can devise a scheme by which we can maintain a given accuracy for
the derivative at each step in the solution of the differential equation. We'll
denote the tolerated error in the derivative by ε, so that the acceptable error
in the function is hε. For an n-th order Runge-Kutta solution we have
$$ y(x_0 + h) = y_{\rm exact} + k\,h^{n+1}, \qquad (5.35) $$
where y_exact is the exact solution, while for an (n + 1)-th order solution we
have
$$ \hat y(x_0 + h) = y_{\rm exact} + k\,h^{n+2}. \qquad (5.36) $$
The difference between these two solutions is simply
$$ y(x_0 + h) - \hat y(x_0 + h) = k\,h^{n+1} - k\,h^{n+2} \approx k\,h^{n+1}, \qquad (5.37) $$
where the approximation is valid for small h. We can then solve for k,
$$ k = \frac{y(x_0 + h) - \hat y(x_0 + h)}{h^{n+1}}. \qquad (5.38) $$
But the difference between the two solutions is also a measure of the error,
which is to be maintained at the level hε. Let h_new be a new step size, for
which these two expressions for the error agree. That is, we'll require that
$$ h_{\rm new}\,\epsilon = k\,h_{\rm new}^{\,n+1} = \frac{h_{\rm new}^{\,n+1}}{h^{n+1}}\,|\,y - \hat y\,|. \qquad (5.39) $$
Equation (5.39) is easily solved for h_new, with the result
$$ h_{\rm new} = h\;\sqrt[n]{\frac{h\,\epsilon}{|\,y(x_0 + h) - \hat y(x_0 + h)\,|}}. \qquad (5.40) $$
Now, we need to interpret this result just a bit. With the step size
h, both y and ŷ can be calculated, and so a direct numerical estimate of the
error |y − ŷ| can be calculated. But we know how this error depends on h,
and so can calculate the coefficient k from Equation (5.38). And knowing k
allows us to evaluate an h_new that would have given an error in the derivative
of only ε. Voilà! If the calculated error is greater than the acceptable error,
then too large a step will have been taken and h will be greater than h_new.
Since the error has been determined to be larger than acceptable, we'll repeat
the step using a smaller step size. On the other hand, it can happen that the
calculated error is smaller than we've specified as acceptable, h is less than
h_new, so that we could have taken a larger step. Of course, it would be wasteful
actually to repeat the step - but we can use hnew as a guess for the next step
to be taken! This leads to a very efficient algorithm in which the step size is
continually adjusted so that the actual error is near, but always less than, the
prescribed tolerance. Beginning with an initial value of y(xo) and an initial h,
the algorithm consists of the following steps:
1. Calculate y(x₀ + h) and ŷ(x₀ + h) from y(x₀).
2. Calculate h_new. If h_new is less than h, reject the propagation to x₀ + h,
redefine h, and repeat step 1. If h_new is greater than h, accept the
propagation step, replace x₀ by x₀ + h, redefine h, and go to step 1 to
continue propagating the solution.
Since the cost of repeating a step is relatively high, we'll intentionally be con-
servative and use a step size somewhat smaller than that predicted, say, only
90% of the value, so that corrective action is only rarely required.
So far so good, but it's not obvious that much effort, if any, has been
saved. Fehlberg provided the real key to this method by developing Runge-
Kutta methods of different order that use exactly the same intermediate func-
tion evaluations. Once the first approximation to the solution has been found,
it's trivial to calculate the second. The method we will use is the fourth-
order/fifth-order method, defined in terms of the following intermediate func-
tion evaluations:
$$ f_0 = f(x_0, y_0), \qquad (5.41) $$
$$ f_1 = f(x_0 + \tfrac{h}{4},\; y_0 + \tfrac{h}{4} f_0), \qquad (5.42) $$
$$ f_2 = f(x_0 + \tfrac{3h}{8},\; y_0 + \tfrac{3h}{32} f_0 + \tfrac{9h}{32} f_1), \qquad (5.43) $$
$$ f_3 = f(x_0 + \tfrac{12h}{13},\; y_0 + \tfrac{1932h}{2197} f_0 - \tfrac{7200h}{2197} f_1 + \tfrac{7296h}{2197} f_2), \qquad (5.44) $$
$$ f_4 = f(x_0 + h,\; y_0 + \tfrac{439h}{216} f_0 - 8h f_1 + \tfrac{3680h}{513} f_2 - \tfrac{845h}{4104} f_3), \qquad (5.45) $$
$$ f_5 = f(x_0 + \tfrac{h}{2},\; y_0 - \tfrac{8h}{27} f_0 + 2h f_1 - \tfrac{3544h}{2565} f_2 + \tfrac{1859h}{4104} f_3 - \tfrac{11h}{40} f_4). \qquad (5.46) $$
With these definitions, the fourth-order approximation is given as
$$ y(x_0 + h) = y_0 + h\left( \tfrac{25}{216} f_0 + \tfrac{1408}{2565} f_2 + \tfrac{2197}{4104} f_3 - \tfrac{1}{5} f_4 \right), \qquad (5.47) $$
and the fifth-order one as
$$ \hat y(x_0 + h) = y_0 + h\left( \tfrac{16}{135} f_0 + \tfrac{6656}{12825} f_2 + \tfrac{28561}{56430} f_3 - \tfrac{9}{50} f_4 + \tfrac{2}{55} f_5 \right). \qquad (5.48) $$
The error can be evaluated directly from these expressions,
$$ {\rm Err} = \hat y - y = h\left( \tfrac{1}{360} f_0 - \tfrac{128}{4275} f_2 - \tfrac{2197}{75240} f_3 + \tfrac{1}{50} f_4 + \tfrac{2}{55} f_5 \right), \qquad (5.49) $$
so that y need never be explicitly calculated. Since we're using a fourth-order
method, an appropriate (conservative) expression for the step size is
$$ h_{\rm new} = 0.9\,h\;\sqrt[4]{\frac{h\,\epsilon}{|\,y(x_0 + h) - \hat y(x_0 + h)\,|}}. \qquad (5.50) $$
The computer coding of these expressions is actually quite straight-
forward, although a little tedious. In particular, the coefficients in the expres-
sions must be entered very carefully. In the following computer code, these
coefficients are specified separately from their actual use, and most are spec-
ified by expressions involving operations as well as numerical values. Twelve
divided by thirteen is not easily expressed as a decimal, and should be ex-
pressed to the full precision of the computer if it were; it's easier, and more
accurate, to let the computer do the division. By using symbolic names in the
actual expressions of the algorithm, the clarity of the code is enhanced. And
by placing the evaluation of the coefficients within a PARAMETER statement, we
are assured that they won't be "accidentally" changed. Some of the computer
code might look like the following:
Subroutine RKF
*----------------------------------------------------------
* Paul L. DeVries, Department of Physics, Miami University
*
* This program solves differential equations by the
* Runge-Kutta-Fehlberg adaptive step algorithm.
* 11/8/86
*
double precision h, x, y, xO, yO, ...
*
* Specify error tolerance.
*
double precision Epsilon
Parameter ( Epsilon=1.d-5 )
*
* Specify coefficients used in the R-K-F algorithm:
*
* The coefficients Am are used to determine the 'x' at
* which the derivative is evaluated.
*
double precision al,a2,a3,a4,a5
parameter (al=0.25D0, a2=0.375D0, a3=1.2D1/1.3D1, ...
*
* The coefficients Bmn are used to determine the 'y' at
* which the derivative is evaluated.
*
double precision bl0, b20,b21, b30,b31,b32, ...
parameter (bl0=0.25D0, b20=3.D0/3.2D1, ...
*
* The Cn are used to evaluate the solution YHAT.
*
double precision cO,c2,c3,c4,c5
parameter( cO=1.6D1/1.35D2, c2=6.656D3/1.2825D4, ...
*
* The Dn are used to evaluate the error.
*
double precision dO,d2,d3,d4,d5
parameter( dO = 1.d0/3.6D2, d2 = -1.28D2/4.275D3, ...
*
* Initialize the integration:
*
h = ...
xO = ...
yO = ...
*
* The current point is specified as (XO,YO) and the
* step size to be used is H. The function DER evaluates
* the derivative function at (X,Y). The solution is
* moved forward one step by:
*
100 fO= der(xO,yO)
200 x = xO + al*h
y = yO + bl0*h*fO
f1 = der(x,y)
x = xO + a2*h
y = yO + b20*h*fO + b21*h*fl
f2 = der(x,y)
x = xO + a3*h
y = yO + h*(b30*fO + b31*fl + b32*f2)
f3 = der(x,y)
x = xO + a4*h
y = yO + h*(b40*fO + b41*fl + b42*f2 + b43*f3)
f4 = der(x,y)
x = xO + a5*h
y = yO + h*(b50*fO + b51*fl + b52*f2 + b53*f3
+ + b54*f4)
f5 = der(x,y)
yhat = yO + h * ( cO*fO + c2*f2 + c3*f3
+ + c4*f4 + c5*f5 )
err = h * DABS( dO*fO + d2*f2 + d3*f3
+ + d4*f4 + d5*f5 )
MaxErr = h*epsilon
hnew = 0.9d0 * h * sqrt( sqrt(MaxErr/Err) )
*
* If the error is too large, repeat the propagation step
* using HNEW.
*
IF( Err .gt. MaxErr) THEN
h = hnew
goto 200
ENDIF
*
* The error is small enough, so this step is acceptable.
* Redefine XO to move the solution along; let the more
* accurate approximation YHAT become the initial value
* for the next step.
*
xO = xO + h
yO = yhat
h = hnew
*
* Have we gone far enough? Do we stop here? What gives?
*
if 'we take another step' goto 100
end
* ---------------------------------------------------------
double precision function DER(x,y)
double precision x, y
der =
end
For the moment, you should just loop through this code until x is greater
than the maximum desired, at which time the program should terminate. As
it stands, there is no printout, which you'll probably want to change.
EXERCISE 5.7
Repeat the problem of projectile motion with air resistance, with all
the parameters the same, but using the Fehlberg algorithm. You can
start the integration with almost any h - if it's too large, the program
will redefine it and try again, and if it's too small, well, there's no
harm done and the second step will be larger. Use a maximum error
tolerance of 10⁻⁵, and print out the accepted x and y values at each
step.
Before we can go much farther in our investigation of the solutions of
differential equations, we need to make some modifications to our Fehlberg
integrator. By this time, you've probably found that the integrator works
very well, and you might suspect that further refinement isn't necessary. And
you're probably right! But we want to develop a code that is going to do its job,
without a lot of intervention from us, for a large variety of problems. That is,
we want it to be smart enough to deal with most situations, and even more
important, smart enough to tell us when it can't!
One of the things that can happen, occasionally, is that the very na-
ture of the solution can change; in such a case, it's possible for the integra-
tor to "get confused." To avoid this problem, we will impose an additional
constraint upon the step size predictions: the step size shouldn't change too
much from one step to the next. That is, if the prediction is to take a step
100 times greater than the current one, then it's a good bet that something
"funny" is going on. So, we won't allow the prediction to be larger than a
factor of 4 times the current size. Likewise, we shouldn't let h decrease too
rapidly either; we'll only accept a factor of 10 decrease in h. (While it might
appear that overriding the step size prediction would automatically introduce
an error, such is not the case since the accuracy of the step is verified after its
completion and repeated if the error isn't within the specified limit. It does
insure that the integrator isn't tricked into taking too small of a step.) And
while we're at this, let's impose maximum and minimum step sizes: if the
predicted h exceeds h_max, h will simply be redefined; but if h falls less than
h_min, it's probably an indication that something is seriously amiss, and the
integration should be stopped and an appropriate message displayed. These
modifications should be made to your code, so that the result looks something
like the following:
like the following:
Hmax = ...
H = Hmax
Hmin = ...
Hnew = 0.9d0 * h * sqrt( sqrt(MaxErr/Err) )
*
* Check for increase/decrease in H, as well as H-limits.
*
IF( Hnew .gt. 4.0d0*H ) Hnew = 4.d0*H
IF( Hnew .lt. 0.1d0*H ) Hnew = .1d0*H
IF( Hnew .gt. Hmax ) Hnew = Hmax
IF( Hnew .lt. Hmin ) THEN
write(*,*) ' H is TOO SMALL'
write(*,*) XO, YO, Yhat, H, Hnew
STOP 'Possible Problem in RKF Integrator'
ENDIF
*
* If the error is too large, REPEAT the current
* propagation step with HNEW.
*
IF( Err .gt. MaxErr ) THEN
Note that we can conveniently initialize the integrator with h = h_max. These
modifications to Hnew do not change the conditions under which the "current"
step is accepted or rejected; they simply restrict the range of acceptable step
sizes. After making these modifications, check them on the following project.
EXERCISE 5.8
Attempt to find the solution of
$$ y' = \frac{1}{x^2}, \qquad y(-1) = 1 $$
on the interval −1 ≤ x ≤ 1. Take h_max = 0.1, h_min = 0.001, and
Epsilon = 5 × 10⁻⁵.
Second-Order Differential Equations
So far we've been quite successful in finding numerical solutions to our dif-
ferential equations. However, we've mostly been concerned with first-order
equations of the form
y' = f(x,y). (5.51)
In physics, second-order equations are much more important to us in that
they describe a much larger number of real physical systems. We thus need
to work with equations of the form
y" = f(x, y, y'). (5.52)
Now, it might appear that this is a totally new and different problem. There
are, in fact, numerical approaches specifically developed to treat second-order
equations. However, it is also possible to reduce this problem to a more famil-
iar form.
In fact, we've already done this when we looked at the "mass-on-the-
spring" problem. You'll recall that we were able to treat the problem by con-
sidering two quantities, the position and the velocity, each of which was gov-
erned by a first order differential equation but involving both quantities. Al-
though our notation was clumsy, we were able to solve the problem.
To facilitate the general solution to Equation (5.52), let's introduce
some new variables; in particular, let's let Yl = y, and Y2 = y'. Aside from the
computational advantages which we're seeking, this just makes a lot of sense:
y' is a different thing than is y - just as position and velocity are different -
and so why shouldn't it have its own name and be treated on an equal basis
with y? We then find that the original second-order differential equation can
be written as a set of two, first-order equations,
$$ y_1' = y_2, \qquad (5.53) $$
$$ y_2' = f(x, y_1, y_2). \qquad (5.54) $$
At this point, we could go back and rederive all our methods - Euler, Runge-
Kutta, and Fehlberg - but we don't have to do that. Instead, we note that
these look like equations involving vector components, an observation further
enhanced if we define
$$ f_1 = y_2 \qquad (5.55) $$
and
$$ f_2 = f(x, y_1, y_2). \qquad (5.56) $$
Our equations can then be written in vector form as
$$ \frac{d}{dx}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix}, \qquad (5.57) $$
or simply
$$ {\bf y}' = {\bf f}, \qquad (5.58) $$
where y and f denote vectors. We thus see that this problem is not funda-
mentally different from the first-order problem we've been working with -
it just has more components! Instead of stepping our one solution along, we
need to modify the existing code so that it steps our two components along.
We thus see that the modifications in our existing code are rather trivial, and
consist primarily of replacing scalar quantities by arrays. The independent
variable X is still a scalar, but the dependent variables Y and YHAT, and the
intermediate quantities FO, F1, ..., all become dimensioned arrays. A sample
of the changes to be made are illustrated below:
*----------------------------------------------------------
* Paul L. DeVries, Department of Physics, Miami University
*
* This program solves SECOND-ORDER differential equations
* by the Runge-Kutta-Fehlberg adaptive step algorithm.
* Note that the solution and the intermediate function
* evaluations are now stored in ARRAYS, and are treated
* as components of VECTORS.
*
*
*
* 11/9/87
double precision h, x, y(2), xO, yO(2) , ...
*
* Specify coefficients used in the R-K-F algorithm:
*
* The coefficients Am are used to determine the 'x' at
* which the derivative is evaluated.
*
double precision al,a2,a3,a4,a5
parameter (al=0.25D0, a2=0.375D0, a3=1.2D1/1.3D1, ...
*
* Initialize the integration:
*
h = ...
xO = ...
yO(1) = ...
yO(2) = ...
*
*
**********************************************************
* Remember: Y, YO, YHAT, and all the intermediate *
* quantities FO, Fl, etc., are dimensioned arrays! *
**********************************************************
*
* The current point is specified as (xO,YO) and the
* step size to be used is H. The subroutine DERS evaluates
* all the derivatives, e.g., all the components of the
* derivative vector, at (X,Y). The solution is moved
* forward one step by:
*
100 call DERS( xO, yO, fO)
200 x = xO + al*h
DO i = 1, 2
y(i) = yO(i) + bl0*h*fO(i)
END DO
call DERS( x, y, fl)
x = xO + a2*h
DO i = 1, 2
y(i) = yO(i) + b20*h*fO(i) + b21*h*fl(i)
END DO
call DERS( x, y, f2)
BigErr = 0.d0
DO i = 1, 2
Yhat(i) = YO(i) + H*( CO*FO(i) + C2*F2(i) + C3*F3(i)
+ + C4*F4(i) + C5*F5(i) )
Err = H * DABS( DO*FO(i) + D2*F2(i) + D3*F3(i)
+ + D4*F4(i) + D5*F5(i) )
IF(Err .gt. BigErr)BigErr = Err
END DO
MaxErr = h*epsilon
hnew = 0.9d0 * h * sqrt( sqrt(MaxErr/BigErr) )
*
* Check for increase/decrease in H, as well as H-limits.
*
if( Hnew .gt. 4.d0*H ) Hnew = 4.d0*H
*
* If the error is too large, repeat the propagation step
* using HNEW.
*
IF( BigErr .gt. MaxErr ) THEN
h = hnew
goto 200
ENDIF
*
* The error is small enough, so this step is acceptable.
* Redefine XO to move the solution along; let the more
* accurate approximation become the initial value for
* the next step.
*
xO = xO + h
DO i = 1, 2
yO(i) = yhat(i)
END DO
h = hnew
*
* Have we gone far enough? Do we stop here? What gives?
*
if 'we should take another step' goto 100
end
*---------------------------------------------------------
Subroutine DERS (x, y, f)
*
* This subroutine evaluates all the components of the
* derivative vector, putting the results in the array 'f'.
*
double precision x, y(2), f(2)
f(1) = y(2)
f(2) = < the second derivative function
of Equation (5.56) >
end
The biggest change, of course, is in the dimensioning of the relevant quanti-
ties. Note that we must also change the error criteria somewhat - we are
interested in the largest error in any of the components, and so introduce the
variable BigErr to keep track of it. We've also put almost all the informa-
tion about the particular differential equation to be solved into the subrou-
tine DERS, which calculates all the components of the derivative vector. At
this juncture, it would not be a major modification to write the Runge-Kutta-
Fehlberg algorithm as an independent subroutine, not tied to a specific prob-
lem being solved.
EXERCISE 5.9
Make the necessary modifications to your code, and test it on the dif-
ferential equation
y" = -4y, y(O) = 1, y'(O) = 0,
on the interval 0 ≤ x ≤ 2π. Compare to the analytic result. Use the
computer to plot these results as well - the visualization of the result
is much more useful than a pile of numbers in understanding what's
going on.
The Van der Pol Oscillator
In the 1920s, Balthasar Van der Pol experimented with some novel electrical
circuits involving triodes. One of those circuits is now known as the Van der
Pol oscillator, and is described by the differential equation
$$ \ddot x = -x - \epsilon\,( x^2 - 1 )\,\dot x. \qquad (5.59) $$
Interestingly, this equation pops up frequently, and is seen in the theory of
the laser. If ε = 0, then this is a simple harmonic oscillator. But for ε ≠ 0, this
many of the analytic methods normally applied to differential equations are
not applicable to this problem!
If ε is small, then this problem can be treated by a form of pertur-
bation theory, in which the effect of the nonlinear term is averaged over one
cycle of the oscillation. But this is valid only if ε is small. Fortunately, our
numerical methods do not rely on the linearity of the differential equation,
and are perfectly happy to solve this equation for us.
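In the form expected by the vectorized integrator of the previous section, with y(1) = x and y(2) = ẋ, the derivative subroutine is short. The sketch below wires in ε = 1 (the value used in the exercise that follows); the independent variable passed as x is, of course, the time.

      Subroutine DERS( x, y, f )
*     Derivative vector for the Van der Pol oscillator, Eq. (5.59),
*     with y(1) = x, y(2) = xdot, and epsilon = 1 wired in.
      double precision x, y(2), f(2), epsilon
      parameter ( epsilon = 1.d0 )
      f(1) = y(2)
      f(2) = -y(1) - epsilon*( y(1)*y(1) - 1.d0 )*y(2)
      end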
EXERCISE 5.10
Use our friendly integrator to solve Equation (5.59), for ε = 1 (a value
much greater than zero). As initial conditions, let x(0) = 0.5 and
ẋ(0) = 0. Follow the solution for several oscillations, say, on the in-
terval 0 ≤ t ≤ 8π, and plot the results.
Phase Space
Although following the solution as a function of time is interesting, an alter-
nate form of viewing the dynamics of a problem has also been developed. In
a phase space description, the position and velocity (momentum) of a system
are the quantities of interest. As time evolves, the point in phase space spec-
ifying the current state of the system changes, thereby tracing out a phase
trajectory. For example, consider a simple harmonic oscillator, with
$$ x(t) = \sin t \qquad (5.60) $$
and
$$ v(t) = \cos t. \qquad (5.61) $$
The position and velocity are plotted as functions of time in Figure 5.5. This is
certainly a useful picture, but it's not the only picture. While the figure illus-
trates how position and velocity are (individually) related to time, it doesn't
do a very good job of relating position and velocity to one another.
FIGURE 5.5 Position and velocity as functions of time for the simple
harmonic oscillator.
But certainly position and velocity are related; we know that the total
energy of the system is conserved. We can remove the explicit time depen-
dence, thereby obtaining a direct relationship between them. Such a curve,
directly relating position and velocity, is shown in Figure 5.6. Of course, the
time at which the system passes through a particular point can be noted, if
desired.
FIGURE 5.6 The trajectory in phase space of the simple harmonic
oscillator.
By adopting a phase space picture, many results of advanced mechan-
ics are immediately applicable. For example, the area enclosed by a phase tra-
jectory is a conserved quantity for conservative systems, an oxymoron if ever
there was one. For the simple harmonic oscillator, these phase trajectories
are simple ellipses and can be denoted by the total energy, which is (of course)
conserved.
Now consider a more realistic system, one with friction, for example,
so that energy is dissipated. For the harmonic oscillator, the phase trajectory
must spiral down and eventually come to rest at the origin, reflecting the loss
of energy. This point is said to be an attractor.
We can also drive a system. That is, we can initially have a system at
rest with zero energy, and add energy to it. A simple time-reversal argument
should convince you that the phase trajectory will spiral outwards, evidencing
an increase in the energy of the system.
But what if you have both dissipation and driving forces acting on a
system? This is the case of the Van der Pol oscillator.
EXERCISE 5.11
Investigate the trajectory of the Van der Pol oscillator in phase space,
for 0 ≤ t ≤ 8π. Try different initial conditions, say with ẋ(0) = 0 but
x(0) ranging from 0.5 up to 3.0, or with x(0) fixed and ẋ(0) varying.
Plot some of your results, and write a paragraph or two describing
them. From your investigations, can you infer the meaning of "limit
cycle"?
The Finite Amplitude Pendulum
Back when we were talking about integrals in Chapter 4, we discussed the
"simple" pendulum, whose motion is described by the differential equation
$$ \ddot\theta = -\frac{g}{l}\,\sin\theta. \qquad (5.62) $$
We found that the period of this pendulum was given in terms of an elliptic
integral. We now want to solve for θ as a function of time. In the usual intro-
ductory treatments of the pendulum, sin θ is approximated by θ and Equation
(5.62) becomes the equation for a simple harmonic oscillator. In more ad-
vanced presentations, more terms in the expansion of the sine function might
be retained and a series solution to Equation (5.62) be developed; but even
then, the solution is valid only to the extent that the expansion for the sine
is valid. But a numerical solution is not constrained in this way. With our
handy-dandy second-order integrator up and running, it's nearly trivial to
generate accurate numerical solutions. In fact, we can generate several dif-
ferent solutions, corresponding to different initial conditions, to produce a
family of trajectories in phase space, a phase portrait of the system. Such a
portrait contains a lot of information. For example, at small amplitudes sin θ
is approximately θ, so that the path in phase space should look very similar
to those for a simple harmonic oscillator. With larger amplitudes, the trajec-
tories should look somewhat different.
Let's recall some facts about the simple pendulum. The total energy,
E, is the sum of kinetic and potential energies,
$$ E = T + V, \qquad (5.63) $$
where
$$ T = \tfrac{1}{2}\,m l^2 \dot\theta^2 \qquad (5.64) $$
and
$$ V = mgl\,( 1 - \cos\theta ). \qquad (5.65) $$
For θ₀ = 0 we then have
$$ E = T = \tfrac{1}{2}\,m l^2 \dot\theta_0^2 \qquad (5.66) $$
or
$$ \dot\theta_0 = \sqrt{\frac{2E}{m l^2}}. \qquad (5.67) $$
With these initial conditions, Equation (5.62) can be solved to find θ and θ̇ as
a function of time, for various total energies. Is there a physical significance
to energies greater than 2mgl?
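For the phase portrait the only new ingredient is the derivative subroutine; a sketch, with y(1) = θ, y(2) = θ̇, and units in which g = l = 1 (as suggested in the exercise below), might read as follows, with the initial values y(1) = 0 and y(2) = θ̇₀ taken from Equation (5.67).

      Subroutine DERS( x, y, f )
*     Derivative vector for the finite amplitude pendulum,
*     Eq. (5.62), with y(1) = theta, y(2) = thetadot, g = l = 1.
      double precision x, y(2), f(2)
      f(1) = y(2)
      f(2) = -sin( y(1) )
      end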
EXERCISE 5.12
Solve the "simple" pendulum for arbitrary amplitudes. In particu-
lar, create a phase portrait of the physical system. For simplicity,
choose units such that 9 = I = 1 and m = 0.5, and plot (} in the range
-311" < (} < 311". Identify trajectories in phase space by their energy; in-
clude trajectories for E = 0.25, 0.5, 0.75, 1, 1.25, and 1.5. (Why should
you take particular care with the E = 1 trajectory? This particular
trajectory is called the separatrix - I wonder why?)
The Animated Pendulum
As we noted, the phase space approach is achieved by removing the explicit
time dependence. This yields valuable information, and presents it in a useful
manner - the significance of an attractor is immediately obvious, for example.
The presentation is static, however - quite acceptable for textbook publica-
tion, but not really in the "modern" spirit of things. With a microcomputer,
you can generate the solution as a function of time - why don't you display
the solution as time evolves? In particular, why don't you animate your dis-
play? Actually, it's not that difficult!
Animation is achieved by displaying different images on the screen in
quick succession. On television and at the cinema, the images are displayed
at a rate of about 30 images per second. About 15 images a second is the
lower limit; less than this, and the mind/eye sees "jitter" rather than "motion."
Affordable full-color, full-screen displays at 30 images per second are coming,
but at the present time that is an expensive proposition; fortunately, what we
need is not that challenging.
Imagine a "picture" of our pendulum, simply a line drawn from the
point of suspension to the location of the pendulum bob. The next image is
simply another line, drawn to the new location of the bob. For such simple line
plots, we can achieve animation by simply erasing the old line and drawing
the new one! Actually, we don't erase the old one, we redraw it in the back-
ground color. Clearly, there are differences in the details of how this works
on a monochrome display, versus a color one. But we don't need to know the
details; we just need to change the color. And that's done by the subroutine
color, which we've used previously. The logic for the computer code might be
something like the following:
integer background, white, number
* Initialize values
x_fixed = ...
y_fixed = ...
time = ...
theta = ...
x_bob = ...
y_bob = ...
background = a BLACK background
call noc( number)
white = number - 1
call color( white )
call line(x_fixed,y_fixed,x_bob,y_bob)
100 < Start of time step>
* < Propagate the solution one time step, >
* < getting THETA at new time >
* < At this point, we have the new theta and >
* < are ready to update the display. >
* < Do we need to wait here? >
*
* Erase old line
*
call color( background )
call line(x_fixed,y_fixed,x_bob,y_bob)
*
* Get coordinates of bob from new THETA
*
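*     (A sketch of this step, not from the text: theta is measured
*      from the downward vertical, RLEN is an assumed pendulum
*      length in screen units, and the sign of the cosine term
*      depends on whether the screen's y coordinate increases
*      upward or downward.)
      x_bob = x_fixed + RLEN*sin( theta )
      y_bob = y_fixed - RLEN*cos( theta )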
*
* Draw new line
*
call color( white )
call line(x_fixed,y_fixed,x_bob,y_bob)
* < Do another time step? or else? >
Note that a line is displayed on the screen while a new theta is being
calculated. Then, in quick succession, the old line is redrawn with the back-
ground color (hence erasing it) and the new line is drawn with the color set to
white. For the animation to simulate the motion of the pendulum accurately,
we would need to ensure that the lines are redrawn/drawn at a constant rate,
perhaps by adding a timing loop to the code. But for purposes of illustration,
it's sufficient to modify the code to use a fixed step size (in time). This time
step should be small enough to maintain the accuracy of the calculation. If
you find that the animation proceeds too rapidly, you can further decrease the
step size and update the display only after several steps have been taken.
EXERCISE 5.13
Animate your pendulum.
Another Little Quantum Mechanics Problem
Way back in Chapter 2 we discussed the one-dimensional Schrodinger equa-
tion,
$$ -\frac{\hbar^2}{2m}\,\frac{d^2\psi}{dx^2} + V(x)\,\psi(x) = E\,\psi(x), \qquad (5.68) $$
and found solutions for the problem of an electron in a finite square well.
Those solutions were facilitated by knowing the analytic solutions for the dif-
ferent regions of the potential. Well, that was then, and this is now!
Let's imagine a different physical problem, that of an electron bound
in the anharmonic potential
$$ V(x) = \alpha x^2 + \beta x^4, \qquad (5.69) $$
where α and β are constants. In a qualitative sense, the solutions to this
problem must be similar to those for the finite square well; that is, an oscil-
latory solution between the classical turning points, and a decaying solution
as you move into the forbidden region. The sought-for solution should look
something like Figure 5.7.
FIGURE 5.7 The anharmonic potential V(x) = 0.5x² + 0.25x⁴, and
its least energetic eigenfunction.
The computer code you presently have should work just fine, except
for one small problem: what are the initial conditions? In this problem, you
don't have them! Rather, you have boundary conditions. That is, instead of
knowing ψ and dψ/dx at some specific point, you know how ψ should behave
as x → ∞. That's a little different, and takes some getting used to.
There are various ways this difficulty can be overcome, or at least
circumvented. For the immediate problem, we can simply guess initial con-
ditions and observe their consequences. Deep in Region I, we know that '1/7
should be small but increasing as x increases; let's guess that
'I/7(xo) = 0,
and
'I/7'(xo) = 'l/7b, (5.70)
where '1/7
0
is a positive number. We can then make an initial guess for the en-
ergy E, and "solve" the differential equation using the Runge-Kutta-Fehlberg
integrator. Plotting the solution, we can see how good our guess for the energy
was.
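In the same vectorized form as before, with y(1) = ψ and y(2) = ψ′, the derivative subroutine might look like the sketch below. It uses the anharmonic potential and the unit system given in the text that follows, and passes the trial energy E through a COMMON block - one convenient choice, not a prescription.

      Subroutine DERS( x, y, f )
*     Derivative vector for the Schrodinger equation (5.68),
*     rewritten as  psi'' = (2m/hbar**2)*( V(x) - E )*psi,
*     with y(1) = psi, y(2) = psi', m = 1, and hbar**2 = 7.6199682
*     in the electron-volt/Angstrom units of Eq. (5.71).
      double precision x, y(2), f(2), E, V, hbarsq, alpha, beta
      parameter ( hbarsq = 7.6199682d0 )
      parameter ( alpha = 0.5d0, beta = 0.25d0 )
      common / energy / E
      V    = alpha*x*x + beta*x**4
      f(1) = y(2)
      f(2) = ( 2.d0/hbarsq )*( V - E )*y(1)
      end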
Before we can do the calculation, however, we need specific values for
the constants. As noted earlier, the computer deals with pure numbers only,
and not units, so that we need to exercise some care. Using an appropriate
system of units is extremely advantageous. For example, the mass of the elec-
tron is 9.11 × 10⁻³¹ kilograms, but it's far more convenient, and less prone to
error, to use a system of units in which the mass of the electron is defined as
being 1. Using electron volts for energy and Angstroms for distance, we have
$$ \hbar^2 = 7.6199682\; m_e\,{\rm eV\,\AA^2}. \qquad (5.71) $$
We'll take α = 0.5 eV Å⁻² and β = 0.25 eV Å⁻⁴. And to start the integration,
we'll take x₀ = −5 Å, ψ₀′ = 1 × 10⁻⁵, and let EPSILON = 1 × 10⁻⁵. Figure 5.8
contains the results of such calculations, for trial energies of 1.5, 1.75, 2.0,
and 2.25 electron volts. We need to discuss these results in some detail.
FIGURE 5.8 Numerical solutions of the anharmonic potential for
trial energies of 1.5, 1.75, 2.0, and 2.25 electron volts. The numer-
ical solution must satisfy the boundary conditions to be physically
acceptable.
Consider the first trial energy attempted, 1.5 electron volts. In Region
I, where the integration begins, the solution has the correct general shape,
decreasing "exponentially" as it penetrates the barrier. In the classically al-
lowed region the solution has a cosine-like behavior, as we would expect. But
the behavior to the right of the well is clearly incorrect, with the solution be-
coming very large instead of very small. Recall that in this region there are
two mathematically permitted solutions: one that is increasing, and one that
is decreasing; it is on the basis of our physical understanding that only the
decreasing one is permitted. But of course, our trial energy is not the correct
energy, it's only a first guess, and a rather poor one at that, as evidenced by
the poor behavior of the solution. Recall that we want to find that energy
which leads to a strictly decaying solution in this region.
Let's try another energy, say, 1.75 electron volts. The behavior of
the numerical solution is much as before, including a rapid increase to the
right of the potential well. However, the onset of this rapid increase has been
postponed until farther into the barrier, so that it is a better solution than we
had previously. Let's try again: at 2.0 electron volts, the onset is delayed even
further. We're coming closer and closer to the correct energy, finding solutions
with smaller amounts of the increasing solution in them. So we try again, but
as shown in the figure, we get a quite different behavior of the solution for
2.25 electron volts: instead of becoming very large and positive, it becomes
very large and negative!
What has happened, of course, is that our trial energy has gotten too
large. The correct energy, yielding a wavefunction that tends toward zero at
large x, is between 2 and 2.25 electron volts, and we see a drastic change in
behavior as we go from below to above the correct energy. But can we ever
expect to find that correct energy by this method? That is, can we ever find a
solution that strictly decays in Region I I I? And why aren't we seeing any of
this "explosion of the solution" to the left of the potential well?
Recall that there are two linearly independent solutions to a second-
order differential equation; in the classically forbidden region, one of the so-
lutions increases, and one decreases. Any solution can be written as a linear
combination of these two: what we want is that specific solution which has
the coefficient of the increasing solution exactly equal to zero. As the trial
energy comes closer to the correct energy, that coefficient decreases - but we
saw that our numerical solution always "blew up" at some point. Why? Al-
though the coefficient is small, it's multiplying a function that is increasing
as x increases - no matter how small the coefficient, as long as it is not ex-
actly zero, there will come a point where this coefficient times the increasing
function will be larger than the desired function, which is decreasing, and the
numerical solution will "explode"! Why wasn't this behavior seen in Region
I? In that region, the sought-after solution was increasing, not decreasing
- the contribution from the nonphysical function was decreasing as the in-
tegration proceeded out of the classically forbidden region. In Region I, we
integrated in the direction such that the unwanted contribution vanished,
while in Region III, we integrated in the direction such that the unwanted
solution overwhelmed our desired one. We integrated in the wrong direction!
Always propagate the numerical solution in the same direc-
tion as the physically meaningful solution increases.
The cure for our disease is obvious: we need always to begin the in-
tegration in a classically forbidden region, and integrate toward a classically
allowed region. In the case of the potential well, we'll have two solutions,
integrated from the left and right, and require that they match up properly
in the middle. For symmetric potentials, such as the anharmonic potential
we've been discussing, the situation is particularly simple since the solutions
must be either even or odd: the even solutions have a zero derivative at x = 0,
and the odd solutions must have the wavefunction zero at x = 0. It looks like
we're almost ready to solve the problem.
But there are a couple of loose ends remaining. What about the choice
of ψ₀′? Doesn't that make any difference at all? Well it does, but not much.
With a different numerical value of ψ₀′ we would be led to a different nu-
merical solution, but one with exactly the same validity as the first, and only
differing from it by an overall multiplicative factor. For example, if ψ(x) is
a solution, so is 5ψ(x). This ambiguity can be removed by normalizing the
wavefunction, e.g., requiring that
$$ \int_{-\infty}^{\infty} \psi^*(x)\,\psi(x)\,dx = 1. \qquad (5.72) $$
(We note that even then there remains an uncertainty in the overall phase of
the wavefunction. That is, if ψ is a normalized solution, so is iψ. For the bound
state problem discussed here, however, the wavefunction can be taken to be
real, and the complex conjugate indicated in the integral is unnecessary.) For
our eigenvalue search, there's an even simpler approach. Since the ratio of ψ'
to ψ eliminates any multiplicative factors, instead of searching for the zero of
ψ'(0) we can search for the zero of the logarithmic derivative, and require
$$\left.\frac{\psi'(x,E)}{\psi(x,E)}\right|_{x=0} = 0 \qquad (5.73)$$
for the eigenvalue E. (For the odd states, we'd want to find the zero of the
inverse of the logarithmic derivative, of course.)
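A minimal sketch of such a search function for the even states follows; the
function name FEVEN, the fixed-step fourth-order Runge-Kutta integration, the
starting point x_0 = -5, and the units (in which the equation is taken to read
ψ'' = 2[V(x) - E]ψ) are illustrative assumptions, not specifications from the
text.

*
* Even-state search function: for a trial energy E, integrate from
* deep in the left-hand forbidden region to x = 0 and return the
* logarithmic derivative of Equation (5.73). The name FEVEN, the
* fixed-step Runge-Kutta integration, and the units (psi'' =
* 2[V(x)-E]psi) are illustrative choices, not the text's.
*
      Double Precision FUNCTION FEVEN( E )
      Double Precision E, x, x0, h, xx, V
      Double Precision y(2), k1(2), k2(2), k3(2), k4(2)
      Integer i, n
*                  the anharmonic potential of the example
      V(xx) = 0.5d0*xx**2 + 0.25d0*xx**4
*                  x0 must be deep enough that V(x0) > E
      x0 = -5.d0
      n  = 2000
      h  = ( 0.d0 - x0 ) / n
*                  start small, and growing toward the well
      y(1) = 1.d-6
      y(2) = y(1) * sqrt( 2.d0*(V(x0)-E) )
      x = x0
      DO i = 1, n
*                  one fourth-order Runge-Kutta step for
*                  y(1) = psi, y(2) = psi'
         k1(1) = h * y(2)
         k1(2) = h * 2.d0*(V(x)-E)*y(1)
         k2(1) = h * ( y(2) + 0.5d0*k1(2) )
         k2(2) = h * 2.d0*(V(x+0.5d0*h)-E)*( y(1) + 0.5d0*k1(1) )
         k3(1) = h * ( y(2) + 0.5d0*k2(2) )
         k3(2) = h * 2.d0*(V(x+0.5d0*h)-E)*( y(1) + 0.5d0*k2(1) )
         k4(1) = h * ( y(2) + k3(2) )
         k4(2) = h * 2.d0*(V(x+h)-E)*( y(1) + k3(1) )
         y(1) = y(1) + (k1(1) + 2.d0*k2(1) + 2.d0*k3(1) + k4(1))/6.d0
         y(2) = y(2) + (k1(2) + 2.d0*k2(2) + 2.d0*k3(2) + k4(2))/6.d0
         x = x + h
      END DO
      FEVEN = y(2) / y(1)
      END

An even eigenvalue is then a zero of FEVEN(E), which can be located with a
standard root-finding method such as bisection; for the odd states one would
instead return ψ(0)/ψ'(0).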
Now, we're almost done; all that remains is to ensure that the calcu-
lation is accurate. If a (poor) choice of the tolerance yields 5 significant digits
in the Runge-Kutta-Fehlberg integration, for example, then it's meaningless
to try to find the root of the logarithmic derivative to 8 significant places. The
overall accuracy can never be greater than the least accurate step in the cal-
culation. We also need to verify that x_0 is "deep enough" in the forbidden
region that the eigenvalue doesn't depend upon its value.
EXERCISE 5.14
Find the lowest three eigenvalues, two even and one odd, of the an-
harmonic potential V(x) = 0.5x² + 0.25x⁴, and plot the potential and
the eigenfunctions. Discuss the measures you've taken to ensure 8-
significant-digit accuracy in the eigenvalues.
We should not leave you with the impression that all the problems
of quantum mechanics involve symmetric potentials - quite the contrary is
true. Symmetry is a terrifically useful characteristic, and should be exploited
whenever present. But the more usual situation is the one in which the poten-
tial is not symmetric. For example, let's consider the force between two atoms.
When they are far apart, the electrons of one atom interact with the electrons
and nucleus of the other atom, giving rise to an attractive force, the van der
Waals attraction. But as the distance between the atoms becomes very small,
the force becomes repulsive as the nuclei (or the ionic core in many-electron
atoms) interact with one another. Thus the general shape of the potential
must be repulsive at small distances and attractive at large ones, necessitating
an energy minimum somewhere in the middle. As an example, the potential
energy curve for the hydrogen molecule is presented in Figure 5.9.
FIGURE 5.9 The ground state potential for molecular hydrogen.
Energies are expressed in Hartree (≈ 27.2 eV) and distances in
Bohr (≈ 0.529 Å). (Data taken from W. Kolos and L. Wolniewicz,
"Potential-Energy Curves for the X¹Σ_g⁺, b³Σ_u⁺, and C¹Π_u States of
the Hydrogen Molecule," Journal of Chemical Physics 43, 2429,
1965.)
Clearly, the potential is not symmetric. And since it is not symmetric,
the eigenfunctions are not purely even or odd functions. Still, the method of
solution is essentially the same as before: choose a matching point, say, near
the minimum of the potential well. Then, beginning far to the left, integrate
to the matching point; beginning far to the right, integrate to the matching
point; and compare the logarithmic derivatives at the matching point.
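In symbols, if ψ_L denotes the solution integrated in from the left and ψ_R the
solution integrated in from the right (our labels, not the text's), the requirement
at the matching point x_m is
$$\frac{\psi_L'(x_m)}{\psi_L(x_m)} = \frac{\psi_R'(x_m)}{\psi_R(x_m)},$$
and the trial energy E is adjusted until this condition is met.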
Several Dependent Variables
We've seen that the solution of a second-order differential equation can be
transformed into the solution of two, first-order differential equations. Re-
garding these equations as components of a vector, we were able to develop
a computer code to solve the problem. This code can easily be extended in
another direction, to problems having several dependent variables.
For example, to describe the motion of a particle in space we need the
particle's x, y, and z coordinates, which are all functions of time. The motion
is described by Newton's second law, a second-order differential equation, so
that we can introduce the variables v_x, v_y, and v_z to obtain a set of six depen-
dent variables that fully describe the particle's motion. These variables can
be regarded as the components of a six-dimensional vector - and we already
know how to solve such vector problems! All that is required is to modify the
computer code, changing the vectors and the indices on the loops from two to
six.
As we've just seen, six variables are required for each particle - clear-
ly, the number of dependent variables in a problem is not simply the dimen-
sionality of the space. But the algorithm we've developed, suitably modified to
reflect the number of variables, can be applied to all such problems involving
a single independent variable.
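For instance, a minimal sketch of such a derivative evaluation for a particle
moving under gravity alone might look like this; the routine name DERIV6 and
the storage order y = (x, y, z, vx, vy, vz) are illustrative choices, not taken
from the text.

*
* A sketch of a six-component derivative evaluation, here for a
* particle moving under gravity alone; the routine name DERIV6 and
* the storage order y = (x, y, z, vx, vy, vz) are illustrative
* choices, not taken from the text.
*
      SUBROUTINE DERIV6( t, y, dydt )
      Double Precision t, y(6), dydt(6), g
      Parameter ( g = 9.8d0 )
*
* The time derivatives of the coordinates are the velocities ...
*
      dydt(1) = y(4)
      dydt(2) = y(5)
      dydt(3) = y(6)
*
* ... and the time derivatives of the velocities are the
* accelerations, here just the (downward) acceleration of gravity.
*
      dydt(4) = 0.d0
      dydt(5) = 0.d0
      dydt(6) = -g
      END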
EXERCISE 5.15
Consider the problem of the falling sphere with air resistance, intro-
duced in Exercise 5.5. In three dimensions, its motion is described by
the vector equation
$$\frac{d\mathbf{p}}{dt} = m\mathbf{g} - k v^2\,\frac{\mathbf{p}}{p}.$$
Note that p/p is a unit vector pointing in the direction of the momen-
tum. If the particle is initially at the origin traveling at 50 meters per
second in the direction of the positive x-axis, what are the position
and velocity of the sphere as a function of time, 0 < t < 10 seconds?
(How many dependent variables are required in this problem?)
Shoot the Moon
Newton tells us that the gravitational force of attraction between two bodies
is
$$\mathbf{F} = -G\,\frac{mM}{r^2}\,\hat{\mathbf{e}}_r. \qquad (5.74)$$
We already know that F = ma, so I suppose we know all there is to know
about the motion of bodies under the influence of gravity.
In principle, maybe; in practice, there's a little more to it. Let's start
with a little review of a problem from freshman mechanics. Imagine that we
have two bodies, say, the Earth and the Moon, that are orbiting one another.
Or to put it another way, each body is orbiting about a common center-of-
mass. This makes it convenient to use a coordinate system to describe the
system that is fixed in space and has its origin located at the center-of-mass.
In general, the orbital trajectories are ellipses, with one focus at the origin,
but for the Earth-Moon system the eccentricity of the orbit is so small, 0.055,
that we'll approximate them as circles. If d is the (mean) center-to-center
distance between the Earth and the Moon, then
$$r_E = \frac{m_M}{m_M + m_E}\,d \qquad (5.75)$$
and
$$r_M = \frac{m_E}{m_M + m_E}\,d, \qquad (5.76)$$
where m_E is the mass of the Earth and m_M the mass of the Moon.
FIGURE 5.10 Center-of-mass coordinates for the Earth-Moon sys-
tem. The view is from Polaris, looking down on the plane of rotation,
with the counterclockwise motion of the Earth and Moon indicated.
The relative sizes of the Earth and Moon, and the location of the
center-of-mass, are drawn to scale.
The Earth-Moon system revolves about its common center of mass
with a sidereal orbital period T, or with an angular frequency ω = 2π/T. If
the Moon lies on the positive x-axis at t = 0, then
$$\theta_M = \omega t \qquad (5.77)$$
and
$$\theta_E = \omega t + \pi, \qquad (5.78)$$
where θ_M is the angular location of the Moon and θ_E is the angular location
of the Earth. The relationship between T, d, and the masses is not accidental,
of course. To remain in uniform circular motion a body must be constantly
accelerated, the acceleration being generated by the gravitational force of at-
traction. Thus we have Kepler's third law of planetary motion, first published
in Harmonice Mundi in 1619, that the square of the orbital period is propor-
tional to the cube of the mean separation.
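In the notation used here, equating the gravitational attraction to the
centripetal acceleration of either body about the center-of-mass gives the
explicit form
$$\omega^2 d^3 = \left(\frac{2\pi}{T}\right)^2 d^3 = G\,(m_E + m_M).$$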
We now have a reasonable description of the Earth-Moon system.
Let's add to the equation mankind's innate desire to explore the unknown,
and ask: How do we journey from the Earth to the Moon?
Let's imagine that a spacecraft has been placed in a "parking orbit"
400 km above the Earth, so that the radius of the orbit is 6800 km. At time
t = 0 the spacecraft reaches the angular position α in its circular orbit about
the Earth, the rockets fire, and the spacecraft attains the velocity v_0 in a
direction tangent to its parking orbit. Since the rockets fire for only a brief
time compared to the duration of the voyage, we'll assume it to be instanta-
neous. But of course, as the spacecraft moves towards the Moon, the Moon
continues its revolution about the Earth-Moon center-of-mass. All the while,
the motion of the spacecraft is dictated by Newton's laws,
$$\ddot{\mathbf{r}} = -G\,m_E\,\frac{\mathbf{r} - \mathbf{r}_E}{d_E^3} - G\,m_M\,\frac{\mathbf{r} - \mathbf{r}_M}{d_M^3}. \qquad (5.79)$$
As we had expected, the mass of the spacecraft doesn't enter into these equa-
tions. They can be written in component form as
$$\ddot{x} = -G\,m_E\,\frac{x - x_E}{d_E^3} - G\,m_M\,\frac{x - x_M}{d_M^3} \qquad (5.80)$$
and
$$\ddot{y} = -G\,m_E\,\frac{y - y_E}{d_E^3} - G\,m_M\,\frac{y - y_M}{d_M^3}, \qquad (5.81)$$
where
$$d_E = \sqrt{(x - x_E)^2 + (y - y_E)^2} \qquad (5.82)$$
is the distance from the spacecraft to the center of the Earth and
$$d_M = \sqrt{(x - x_M)^2 + (y - y_M)^2} \qquad (5.83)$$
is the distance from the spacecraft to the center of the Moon. With knowledge
of a few physical constants, such as
d = 3.844 × 10⁵ km,
m_E = 5.9742 × 10²⁴ kg,
m_M = 7.35 × 10²² kg,
T = 27.322 days,
and
G = 6.6726 × 10⁻¹¹ N m²/kg²,
we're ready to explore!
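Before turning to the exercises, here is a minimal sketch of the derivative
evaluation such a program needs, in the spirit of the code developed earlier
in the chapter; the routine name MOONEQ, the storage order y = (x, y, vx, vy),
and the use of SI units are illustrative choices, not taken from the text. It
implements Equations (5.80) and (5.81), recomputing the Earth and Moon
positions from the time t at each call.

*
* Derivatives for the spacecraft, Equations (5.80) and (5.81), with
* the Earth and Moon positions obtained from Equations (5.75)-(5.78).
* All quantities are in SI units; the routine name and the ordering
* y = (x, y, vx, vy) are our own choices for illustration.
*
      SUBROUTINE MOONEQ( t, y, dydt )
      Double Precision t, y(4), dydt(4)
      Double Precision G, mE, mM, d, period, pi, omega
      Double Precision rE, rM, xE, yE, xM, yM, dE3, dM3
      Parameter ( pi = 3.14159265358979d0 )
      Parameter ( G  = 6.6726d-11 )
      Parameter ( mE = 5.9742d24, mM = 7.35d22 )
      Parameter ( d  = 3.844d8 )
      Parameter ( period = 27.322d0 * 86400.d0 )
*
* Current positions of the Earth and the Moon:
*
      omega = 2.d0 * pi / period
      rE = d * mM / ( mM + mE )
      rM = d * mE / ( mM + mE )
      xE = -rE * cos( omega*t )
      yE = -rE * sin( omega*t )
      xM =  rM * cos( omega*t )
      yM =  rM * sin( omega*t )
*
* Cubes of the spacecraft-Earth and spacecraft-Moon distances,
* Equations (5.82) and (5.83):
*
      dE3 = sqrt( (y(1)-xE)**2 + (y(2)-yE)**2 )**3
      dM3 = sqrt( (y(1)-xM)**2 + (y(2)-yM)**2 )**3
*
* The derivatives of the positions and the velocities:
*
      dydt(1) = y(3)
      dydt(2) = y(4)
      dydt(3) = -G*mE*(y(1)-xE)/dE3 - G*mM*(y(1)-xM)/dM3
      dydt(4) = -G*mE*(y(2)-yE)/dE3 - G*mM*(y(2)-yM)/dM3
      END

The calling program would set the initial position and velocity from the
parking-orbit radius, the angle α, and the speed v_0, and then advance these
four variables with the Runge-Kutta-Fehlberg integrator discussed earlier in
the chapter.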
EXERCISE 5.16
As a preliminary exercise, and to check that all the parameters have
been correctly entered, set the mass of the Moon to zero and see if
you can verify the freshman calculation for the velocity needed for the
spacecraft to have a circular orbit about the Earth at this altitude.
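For reference, the introductory result being checked is the circular-orbit
condition
$$\frac{v_0^2}{r} = \frac{G\,m_E}{r^2}, \qquad \text{that is,} \qquad v_0 = \sqrt{\frac{G\,m_E}{r}},$$
with r = 6800 km.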
Now, for the "real thing."
EXERCISE 5.17
Find a set of initial conditions, α and v_0, that will send the spacecraft
from the parking orbit to the Moon. For starters, just try to hit the
Moon. (Score a hit if you come within 3500 km of the center of the
Moon.)
EXERCISE 5.18
For more intrepid voyagers, find initial conditions that will cause the
spacecraft to loop around the Moon and return to Earth. This orbit
is of real significance: see "Apollo 13" under the heading NEAR DIS-
ASTERS in your history text.
Finite Differences
There is another approach to boundary value problems that's considerably
different from the method we've been discussing. In it, we replace the given
differential equation by the equivalent difference equation. Since we have
approximations for derivatives in terms of finite differences, this shouldn't be
too difficult. For example, consider the differential equation
$$y'' - 5y' + 10y = 10x \qquad (5.84)$$
subject to the boundary conditions
$$y(0) = 0, \qquad y(1) = 100. \qquad (5.85)$$
The first step is to impose a grid on this problem and to derive the appropriate
finite difference equation. We will then seek the solution of the difference
equation at the grid points. Since we are replacing the continuous differential
equation by the discrete finite difference equation, we can only ask for the
solution on the finite grid. We would hope, of course, that the solution to this
problem is (at least approximately) a solution to the original one. For our
example, we'll choose the grid to be x = 0.0, 0.1, 0.2, ... , 1.0, and solve for the
function on the interior points - the solutions at x = 0 and x = 1 are fixed
by the boundary condition, and are not subject to change! To derive the finite
difference equation, consider some arbitrary grid point, x_i; at this point, we
have (approximately)
$$y_i' \approx \frac{y_{i+1} - y_{i-1}}{2h} \qquad (5.86)$$
and
$$y_i'' \approx \frac{y_{i+1} - 2y_i + y_{i-1}}{h^2}, \qquad (5.87)$$
where h = x_{i+1} - x_i and we've introduced the notation y_i = y(x_i). (Note
that the expressions for the derivatives we're using are of the same order
of accuracy, having error O(h²) - there would be no advantage to using an
expression for one term in the differential equation that is more accurate than
the expressions for any other term.) These approximations are substituted
into the original differential equation to obtain the finite difference equation
$$\frac{y_{i+1} - 2y_i + y_{i-1}}{h^2} - 5\,\frac{y_{i+1} - y_{i-1}}{2h} + 10\,y_i = 10\,x_i. \qquad (5.88)$$
This equation has the same structure as some of the equations we investigated
in Chapter 2, and can be solved in the same manner. That is, we can write the
equation in matrix form, and use Gaussian elimination to obtain a solution.
However, there are other ways of solving such equations, which we should
investigate. Rather than solving the problem in a direct manner, we'll develop
an indirect one. Multiplying Equation (5.88) by h² and solving for y_i, we find
$$y_i = \frac{1}{2 - 10h^2}\left[\left(1 - \frac{5h}{2}\right) y_{i+1} + \left(1 + \frac{5h}{2}\right) y_{i-1} - 10\,h^2 x_i\right]. \qquad (5.89)$$
We will find an approximate solution to our differential equation by finding
y's that satisfy the derived difference equation, and we'll do that by iteration.
Iterative methods are usually not superior to direct methods when
applied to ordinary differential equations. However, many physical situations
are actually posed in several dimensions, and require the solution to partial
differential equations rather than ordinary ones. In these instances, the rela-
tive merits are frequently reversed, with iterative methods being immensely
superior to direct ones. We should also note that with direct methods the er-
ror tends to accumulate as the solution is generated. In Gaussian elimination,
for example, the back substitution propagates any error that may be present
in one component to all subsequent components. Iterative methods, on the
other hand, tend to treat all the components equally and distribute the error
uniformly. An iterative solution can almost always be "improved" by iterating
again - with a direct solution, what you get is all you got. In Chapter 7 we
will explicitly discuss partial differential equations and appropriate methods
to investigate them. But those methods are most easily introduced in the con-
text of a single coordinate, e.g., ordinary differential equations, and hence we
include their discussion at this point.
Let's begin our development of an iterative method by simply guessing
values for all the y's - for example, we can guess that y is a linear function
and evaluate the y_i's accordingly. These become our old y's as we use Equa-
tion (5.89) to obtain new y's according to the iterative expression
$$y_i^{(j)} = \frac{1}{2 - 10h^2}\left[\left(1 - \frac{5h}{2}\right) y_{i+1}^{(j-1)} + \left(1 + \frac{5h}{2}\right) y_{i-1}^{(j-1)} - 10\,h^2 x_i\right]. \qquad (5.90)$$
The initial guesses are denoted y_i^(0), and might be stored in an array Yold. The
first iteration at x_i is obtained from y_{i+1}^(0) and y_{i-1}^(0) by application of Equation
(5.90), and stored in the array Ynew. One iteration consists of moving entirely
through the array Yold, evaluating the elements of Ynew at all the interior
points - remember, the first and last entries are fixed! After all the Ynew
entries have been evaluated, Ynew can be copied into Yold, and one cycle of
the iteration is complete; the process is then repeated. This is known as the
Jacobi iteration scheme, and will converge to the correct result. However, we
can speed it up a bit with no additional effort.
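As a concrete illustration, one Jacobi sweep over the interior points of the
example might look like the following sketch (not the text's code); the
coefficients c1 through c4 and the spacing h are those defined in the
Gauss-Seidel fragment given a little later in this section, and Yold and Ynew
are Double Precision arrays of length 11.

*
* One Jacobi sweep, Equation (5.90): the new values are computed
* entirely from the old ones, so two arrays are needed.
*
      DO i = 2, 10
         x = h * (i-1)
         Ynew(i) = ( c1 * Yold(i+1) + c2 * Yold(i-1) + c3 * x ) / c4
      END DO
*
* Copy the new values back before the next sweep:
*
      DO i = 2, 10
         Yold(i) = Ynew(i)
      END DO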
In the Jacobi scheme, the old values of y are used to evaluate the new
ones. But think about that: you've just determined y_i^(j), and are ready to
evaluate y_{i+1}^(j). Jacobi would have you use y_i^(j-1) in this evaluation, although
you've just calculated a better value! Let's use the better value: in moving
through the y-array from left to right, replace Equation (5.90) by the Gauss-
Seidel iteration scheme
$$y_i^{(j)} = \frac{1}{2 - 10h^2}\left[\left(1 - \frac{5h}{2}\right) y_{i+1}^{(j-1)} + \left(1 + \frac{5h}{2}\right) y_{i-1}^{(j)} - 10\,h^2 x_i\right]. \qquad (5.91)$$
Of course, if moving through the array from right to left, you would use
$$y_i^{(j)} = \frac{1}{2 - 10h^2}\left[\left(1 - \frac{5h}{2}\right) y_{i+1}^{(j)} + \left(1 + \frac{5h}{2}\right) y_{i-1}^{(j-1)} - 10\,h^2 x_i\right]. \qquad (5.92)$$
Not only is this a more rapidly convergent scheme, it eliminates the need for
two different arrays of data. The iteration is continued until a specified level
of accuracy is obtained, for all points.
Previously we've discussed how the accuracy of a single quantity is
determined; in the present case, we would require that the successive iterates
y_i^(j-1) and y_i^(j) be the same to some specified number of significant digits. But
here we need to require that this accuracy be met at all the grid points. We
find it very convenient to use logical variables to do this, as suggested in the
example computer code:
*
* This code fragment implements GAUSS-SEIDEL iteration to
* solve the finite difference Equation (5.91).
*
      Double Precision y(11), h, c1, c2, c3, c4, x, yy
     +                 , tolerance
      Integer i, iteration
      LOGICAL DONE
      Parameter (tolerance = 5.d-4)
*
* Grid spacing for x = 0.0, 0.1, ..., 1.0:
*
      h  = 0.1d0
      c1 = 1.d0 - 2.5d0 * h
      c2 = 1.d0 + 2.5d0 * h
      c3 = -10.d0 * h * h
      c4 = 2.d0 + c3
*
* Initialize y with the linear guess between the boundary values:
*
      DO i = 1, 11
         y(i) = 10.d0 * (i-1)
      END DO
      iteration = 0
*
* Iterate until done ... or have iterated too many times.
*
  100 DONE = .TRUE.
      iteration = iteration + 1
      IF (iteration .ge. 100) stop 'Too many iterations!'
*
* Evaluate the function at all the interior points:
*
      DO i = 2, 10
         x = h * (i-1)
         yy = ( c1 * y(i+1) + c2 * y(i-1) + c3 * x ) / c4
         IF ( abs( (yy-y(i))/yy ) .gt. tolerance )
     +      DONE = .FALSE.
         y(i) = yy
      END DO
      IF (.NOT. DONE) goto 100
The variable DONE is declared to be a logical variable, and is set to TRUE at the
start of every iteration. As the iteration proceeds, the accuracy of each point
is tested. Any time the error exceeds the specified accuracy, DONE is set to
FALSE - of course, it only takes one such instance for DONE to become FALSE.
The final accuracy check is then very simple, "if not done, iterate again."
By choosing the appropriate type of variable, and a suitable name for it, the
convergence checking has been made very clear.
The accuracy we've specified is not great, for several reasons. First,
it's always wise to be conservative when starting a problem - try to develop
some feel for the problem and the method of solution before you turn the
computer loose on it. And secondly, we shouldn't lose sight of the fact that
this is a derived problem, not the original differential equation. Is an exact
solution to an approximate problem any better than an approximate solution
to the exact problem? To solve the difference equation to higher accuracy is
unwarranted, since the difference equation is only a finite representation of
the differential equation given. As a final comment on the computer code, note
that, as should be done with any iterative method, the number of iterations
is counted and a graceful exit is provided if convergence is not obtained.
The maximum iteration count is set rather large; the method is guaranteed
to converge, but it can be slow. The code fragment was used as a base to
develop a computer program to solve the finite difference equation, with the
following results:
iter y(0.0) y(0.1) y(0.2) y(0.3) y(0.4) y(0.5) y(0.6) y(0.7) y(0.8) y(0.9) y(1.0)
0 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 80.00 90.00 100.00
1 0.00 7.89 17.02 26.97 37.46 48.30 59.38 70.61 81.94 93.33 100.00
2 0.00 6.71 15.05 24.68 35.28 46.62 58.51 70.80 83.38 94.28 100.00
3 0.00 5.94 13.64 22.88 33.44 45.07 57.57 70.75 83.72 94.50 100.00
4 0.00 5.38 12.56 21.45 31.88 43.67 56.63 70.26 83.49 94.35 100.00
5 0.00 4.95 11.71 20.27 30.55 42.43 55.62 69.51 82.93 93.99 100.00
86 0.00 1.98 5.04 9.48 15.65 23.90 34.52 47.68 63.34 81.10 100.00
Although the calculation has converged, note that 86 iterations were neces-
sary to obtain the specified accuracy.
EXERCISE 5.19
Write a computer program to reproduce these results.
SOR
This whole method is referred to as relaxation - the finite difference equa-
tions are set up, and the solution relaxes to its correct value. As with any
iterative procedure, the better the initial guess, the faster the method will
converge. Once a "sufficiently accurate" approximation is available, either
through good initialization or after cycling through a few iterations, the re-
laxation is monotonic - that is, each iteration takes the approximate solution
a step closer to its converged limit, each step being a little less than the pre-
vious one. (In the table above, we see that the relaxation becomes monotonic
after the third iteration.) This suggests an improved method, in which the
change in the function from one iteration to the next is used to generate a
better approximation: let's guess that the converged result is equal to the
most recent iterate plus some multiplicative factor times the difference be-
tween the two most recent iterations. That is, we guess the solution y^(j) as
being
$$y_i^{(j)} = y_i^{(j)} + \alpha \left[ y_i^{(j)} - y_i^{(j-1)} \right], \qquad (5.93)$$
where α lies between 0 and 1. (The optimal value for α depends on the ex-
act structure of the difference equation.) This is called over-relaxation; since
it's repeated at each iteration, the method is referred to as Successive Over-
Relaxation (SOR).
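One way to build this into the earlier Gauss-Seidel fragment is to over-relax
inside the interior-point loop, as in the following sketch; the names alpha and
ynew (both Double Precision), and the idea of choosing alpha by experiment,
are ours, not the text's.

*
* Over-relaxed (SOR) version of the interior-point loop; c1, c2, c3,
* c4, y, h, tolerance, and DONE are as in the earlier code fragment,
* and alpha lies somewhere between 0 and 1.
*
      DO i = 2, 10
         x = h * (i-1)
         ynew = ( c1 * y(i+1) + c2 * y(i-1) + c3 * x ) / c4
*        push the new value further in the same direction, Eq. (5.93)
         ynew = ynew + alpha * ( ynew - y(i) )
         IF ( abs( (ynew-y(i))/ynew ) .gt. tolerance )
     +      DONE = .FALSE.
         y(i) = ynew
      END DO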
EXERCISE 5.20
Modify your program to use the SOR method. Experiment with a few
different values of α to find the one requiring the fewest iterations.
For that α, you should see a dramatic improvement in the rate of con-
vergence over the Gauss-Seidel method.
Discretisation Error
We noted earlier that demanding excessive accuracy in the solution of the fi-
nite difference equations is not appropriate. The reason, of course, is that the
finite difference equation is only an approximation to the differential equation
that we want to solve. Recall that the finite difference equation was obtained
by approximating the derivative on a grid - no amount of effort spent on
solving the finite difference equation itself will overcome the error incurred
in making that approximation. Of course, we could try a different grid ....
To make our point, we've solved the finite difference equation for
three different grids, with h = 0.2, 0.1, and 0.05. We pushed the calculation
to 8 significant digits, far more than is actually appropriate, to demonstrate
that the error is associated with the finite difference equation itself, and not
the method of solution. That is, our results are essentially exact, for each par-
ticular grid used. Any differences in our results for different grids are due to
the intrinsic error associated with the discretisation of the grid! These results
are presented in Table 5.1.
TABLE 5.1 Discretisation Error
y(0.2) y(O.4) y(0.6) y(0.8)
Finite difference
h =0.20 4.3019 13.9258 31.9767 61.0280
h =0.10 5.0037 15.5634 34.3741 63.2003
h =0.05 5.1586 15.9131 34.8676 63.6292
Richardson extrapolation
0.20/0.10 5.2376 16.0193 35.1732 63.9244
0.10/0.05 5.2103 16.0296 35.0321 63.7721
0.20/0.10/0.05 5.2085 16.0243 35.0227 63.7619
Analytic result 5.2088 16.0253 35.0247 63.7644
As expected, the results become more accurate as h is decreased. But the
magnitude of the discretisation error is surprising: at x = 0.2, for example,
the calculated value changes by 16% as the grid decreases from h = 0.2 to 0.1,
and another 3% when h is further reduced to 0.05. Again, these changes are
due to the discretisation of the grid itself, not the accuracy of the solution of
the finite difference equations. Clearly, it is a serious mistake to ask for many-
significant-digit accuracy with a coarse grid - the results are not relevant to
the actual problem you're trying to solve!
Since the error is due to the approximation of the derivatives, which
we know to be O(h²), we can use Richardson's extrapolation to better our re-
sults. Using just the h = 0.2 and 0.1 data yields extrapolated results superior
to the h =0.05 ones. The extrapolations can themselves be extrapolated, with
remarkable success.
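For an O(h²) approximation, the extrapolation combines the results obtained
with steps h and h/2 as
$$y \approx \frac{4\,y_{h/2} - y_h}{3};$$
at x = 0.2, for example, combining the h = 0.2 and h = 0.1 entries of Table 5.1
gives (4 × 5.0037 - 4.3019)/3 = 5.2376, the value listed in the table.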
Another approach to generating more accurate results would be to use
a more accurate approximation for the derivatives in the first place. By using
a higher order approximation, the truncation error incurred by discarding the
higher terms in the derivative expressions would be reduced. For the interior
points, we can use the approximations developed in Chapter 3,
$$f'(x) = \frac{f(x-2h) - 8f(x-h) + 8f(x+h) - f(x+2h)}{12h} + O(h^4) \qquad (5.94)$$
and
$$f''(x) = \frac{-f(x-2h) + 16f(x-h) - 30f(x) + 16f(x+h) - f(x+2h)}{12h^2} + O(h^4). \qquad (5.95)$$
These expressions have error O(h⁴), but are not applicable for the points im-
mediately adjoining the endpoints, since either f(x - 2h) or f(x + 2h) will lie
outside the range being considered. For these points, we must devise alternate
expressions for the derivatives.
In Chapter 3, we used Taylor series expansions for f(x + h), f(x + 2h),
etc., to develop approximations for the derivatives. Let's try that again - the
relevant expansions are
$$f(x-h) = f_{-1} = f_0 - h f_0' + \frac{h^2}{2!} f_0'' - \frac{h^3}{3!} f_0''' + \frac{h^4}{4!} f_0^{iv} - \frac{h^5}{5!} f_0^{v} + O(h^6),$$
$$f(x) = f_0,$$
$$f(x+h) = f_1 = f_0 + h f_0' + \frac{h^2}{2!} f_0'' + \frac{h^3}{3!} f_0''' + \frac{h^4}{4!} f_0^{iv} + \frac{h^5}{5!} f_0^{v} + O(h^6),$$
$$f(x+2h) = f_2 = f_0 + 2h f_0' + \frac{4h^2}{2!} f_0'' + \frac{8h^3}{3!} f_0''' + \frac{16h^4}{4!} f_0^{iv} + \frac{32h^5}{5!} f_0^{v} + O(h^6),$$
$$f(x+3h) = f_3 = f_0 + 3h f_0' + \frac{9h^2}{2!} f_0'' + \frac{27h^3}{3!} f_0''' + \frac{81h^4}{4!} f_0^{iv} + \frac{243h^5}{5!} f_0^{v} + O(h^6),$$
$$f(x+4h) = f_4 = f_0 + 4h f_0' + \frac{16h^2}{2!} f_0'' + \frac{64h^3}{3!} f_0''' + \frac{256h^4}{4!} f_0^{iv} + \frac{1024h^5}{5!} f_0^{v} + O(h^6). \qquad (5.96)$$
Since we are seeking a more accurate approximation, it's necessary that we
retain more terms in the expansion than before. And since the derivative
is desired at a nonsymmetric location, we'll need to evaluate the function at
more points as well. Taking a linear combination of these expressions, we find
$$a_{-1} f_{-1} + a_0 f_0 + a_1 f_1 + a_2 f_2 + a_3 f_3 + a_4 f_4$$
$$= f_0 \left[ a_{-1} + a_0 + a_1 + a_2 + a_3 + a_4 \right]$$
$$+ h f_0' \left[ -a_{-1} + a_1 + 2a_2 + 3a_3 + 4a_4 \right]$$
$$+ \frac{h^2}{2!} f_0'' \left[ a_{-1} + a_1 + 4a_2 + 9a_3 + 16a_4 \right]$$
$$+ \frac{h^3}{3!} f_0''' \left[ -a_{-1} + a_1 + 8a_2 + 27a_3 + 64a_4 \right]$$
$$+ \frac{h^4}{4!} f_0^{iv} \left[ a_{-1} + a_1 + 16a_2 + 81a_3 + 256a_4 \right]$$
$$+ \frac{h^5}{5!} f_0^{v} \left[ -a_{-1} + a_1 + 32a_2 + 243a_3 + 1024a_4 \right]$$
$$+ O(h^6). \qquad (5.97)$$
To obtain an expression for f_0', we require that the coefficients of the higher
order derivatives vanish. To have an error of O(h⁴), it's sufficient to set a_4 =
0 and to choose the remaining coefficients such that
$$a_{-1} + a_1 + 4a_2 + 9a_3 = 0,$$
$$-a_{-1} + a_1 + 8a_2 + 27a_3 = 0, \qquad (5.98)$$
$$a_{-1} + a_1 + 16a_2 + 81a_3 = 0.$$
Solving these (together with the requirements that the coefficients of f_0 and
f_0' in Equation (5.97) be 0 and 1, respectively) yields
$$f_0' = \frac{-3 f_{-1} - 10 f_0 + 18 f_1 - 6 f_2 + f_3}{12h} + O(h^4). \qquad (5.99)$$
Similarly, we can - with some effort - find that
$$f_0'' = \frac{10 f_{-1} - 15 f_0 - 4 f_1 + 14 f_2 - 6 f_3 + f_4}{12h^2} + O(h^4). \qquad (5.100)$$
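As a quick check, the numerator coefficients of Equation (5.99), namely -3,
-10, 18, -6, and 1 (each divided by 12h), do satisfy the three conditions of
Equation (5.98):
$$-3 + 18 - 24 + 9 = 0, \qquad 3 + 18 - 48 + 27 = 0, \qquad -3 + 18 - 96 + 81 = 0,$$
while the corresponding coefficients of f_0 and f_0' in Equation (5.97) work out
to 0 and 1, as required.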
As you can see, this approach leads to a rather more complicated
scheme. And we have yet to derive the finite difference equations themselves.
With these expressions for the derivatives, however, the difference equations
are within sight. Note that there will be three distinct cases to be treated: 1)
an endpoint, which is held constant; 2) a point adjacent to an endpoint, for
which Equations (5.99) and (5.100) are used to derive the difference equation;
and 3) points that are at least one point removed from an endpoint, for which
the central difference approximations can be used to derive the appropriate
difference equation.
EXERCISE 5.21
Modify your existing program to use these higher order approxima-
tions. Note that the optimal acceleration parameter determined pre-
viously will not be optimal for this new situation, although it should
be a reasonable value to use.
Every physical situation is unique, but the general consensus is that
the additional effort required for the higher order approximations is usually
not justified. On a coarse grid, they perform no better than the three-point
rule, and for fine grids the extra level of accuracy can be more easily obtained
by using Richardson extrapolation to "polish" the coarse results.
A Vibrating String
You probably recall that the vibration of a string, fixed at both ends and under
uniform tension, is described by the differential equation
$$\frac{\partial^2 u(x,t)}{\partial t^2} = \frac{T}{\mu(x)}\,\frac{\partial^2 u(x,t)}{\partial x^2}, \qquad (5.101)$$
where T is the tension in the string and μ(x) is the mass of the string per unit
length. A standard way to solve this problem is to guess a solution of the form
$$u(x,t) = y(x)\,T(t) \qquad (5.102)$$
and see if it works! After making the substitution and rearranging, we find
that
$$\frac{1}{y(x)}\,\frac{T}{\mu(x)}\,\frac{d^2 y}{dx^2} = \frac{1}{T(t)}\,\frac{d^2 T}{dt^2}. \qquad (5.103)$$
Now, the left side of this equation is a function of x alone, and the right side
a function of t alone. The only way this relation can be valid for all x and t is
for both sides of the equation to equal a constant, which we take to be -ω².
      DO i = 1, imax
         DO j = 1, imax
            flag(i,j) = 1
            phi(i,j)  = 1.d0
         END DO
      END DO
*
* Initialize all boundaries, points, etc.
*
      DO j = 1, jstar-1
         flag(istar,j) = 4
      END DO
*
* Initialize other variables:
*
      h = ...
      Nx = ...
      Ny = ...
      alpha = ...
      CALL CLEAR
      count = 0