Methods of Conjugate Gradients For Solving Linear Systems
(4:3a)
(4:3b)
(3:8) (4:3c)
The scalar at can be given by the formula
(4:4)
This minimum value differs from/(x) by the quantity where n(z) is the Rayleigh quotient
2
(v r)
J\X) j\X\~ap)==za \jp,^x.pj — ~/ ~i r* (^4 . o) (4:12)
KP>Ap)
Comparing (4:7) with (4:1a), we obtain the first two sentences of the following result:

Theorem 4:3. The point $x_i$ minimizes $f(x)$ on the line $x = x_{i-1} + \alpha p_{i-1}$. At the $i$-th step the error $f(x_{i-1})$ is relaxed by the amount

$$f(x_{i-1}) - f(x_i) = \frac{(p_{i-1}, r_0)^2}{(p_{i-1}, Ap_{i-1})}. \qquad (4:9)$$

In fact, the point $x_i$ minimizes $f(x)$ on the $i$-dimensional plane $P_i$ of points

$$x = x_0 + \alpha_0 p_0 + \cdots + \alpha_{i-1} p_{i-1}, \qquad (4:10)$$

where $\alpha_0, \ldots, \alpha_{i-1}$ are parameters. This plane contains the points $x_0, x_1, \ldots, x_i$.

In view of this result the cd-method is a method of relaxation of the error function $f(x)$. An iteration of the routine may accordingly be referred to as a relaxation.

In order to prove the third sentence of the theorem observe that at the point (4:10)

$$f(x) = f(x_0) - \sum_{j=0}^{i-1} \left[ 2\alpha_j (p_j, r_0) - \alpha_j^2 (p_j, Ap_j) \right].$$

At the minimum point we have

$$\alpha_j = \frac{(p_j, r_0)}{(p_j, Ap_j)},$$

and hence $\alpha_j = a_j$ by (4:4). The minimum point is accordingly the point $x_i$, as was to be proved.

The Rayleigh quotient of the error vector $y$ does not exceed that of the residual $r$, that is,

$$\mu(y) \le \mu(r). \qquad (4:13)$$

Moreover,

(4:14)

The proof of this result is based on the Schwarzian quotients

$$\frac{(z, Az)}{(z, z)} \le \frac{(Az, Az)}{(z, Az)} \le \frac{(Az, A^2 z)}{(Az, Az)}. \qquad (4:15)$$

The first of these follows from the inequality of Schwarz

$$(p, q)^2 \le (p, p)(q, q) \qquad (4:16)$$

by choosing $p = z$, $q = Az$. The second is obtained by selecting $p = Bz$, $q = B^3 z$, where $B^2 = A$.

In order to prove theorem 4:4 recall that if we set $y = h - x$, then

$$r = k - Ax = A(h - x) = Ay, \qquad f(x) = (y, Ay)$$

by (4:5). Using the inequalities (4:15) with $z = y$, we see that

$$\mu(y) = \frac{(y, Ay)}{(y, y)} \le \frac{(Ay, Ay)}{(y, Ay)} = \frac{|r|^2}{f(x)} \le \frac{(Ay, A^2 y)}{(Ay, Ay)} = \frac{(r, Ar)}{(r, r)} = \mu(r).$$
This yields (4:11) and (4:13). Using (4:16) with $p = y$ and $q = r$ we find that

$$f(x) = (y, r) \le |y|\,|r|.$$

Hence the second inequality in (4:14) holds. The first inequality is obtained from the relations

As is to be expected, any cd-method has within its routine a determination of the inverse $A^{-1}$ of $A$. We have, in fact, the following:

Theorem 4:5. Let $p_0, \ldots, p_{n-1}$ be $n$ mutually conjugate nonzero vectors and let $p_{ij}$ be the $j$-th component of $p_i$. The element in the $j$-th row and $k$-th column of $A^{-1}$ is given by the sum

$$\sum_{i=0}^{n-1} \frac{p_{ij}\, p_{ik}}{(p_i, Ap_i)}.$$

This result follows from the formula

$$h = \sum_{i=0}^{n-1} \frac{(p_i, k)}{(p_i, Ap_i)}\, p_i$$

for the solution $h$ of $Ax = k$, obtained by selecting $x_0 = 0$.

We conclude this section with the following:

Theorem 4:6. Let $\pi_i$ be the $(n-i)$-dimensional plane through $x_i$ conjugate to the vectors $p_0, p_1, \ldots, p_{i-1}$. The plane $\pi_i$ contains the points $x_i, x_{i+1}, \ldots$ and intersects the $(n-1)$-dimensional ellipsoid $f(x) = f(x_i)$ in an ellipsoid $E_i$ of dimension $(n-i-1)$. The center of $E_i$ is the solution $h$ of $Ax = k$. The point $x_{i+1}$ is the midpoint of the chord $C_i$ of $E_i$ through $x_i$, which is parallel to $p_i$. In the cg-method the chord $C_i$ is normal to $E_i$ at $x_i$ and hence is in the direction of the gradient of $f(x)$ at $x_i$ in $\pi_i$.

The last statement will be established at the end of section 6. The equations of the plane $\pi_i$ are given by the system

In view of theorem 4:6, it is seen that at each step of the cd-routine the dimensionality of the space $\pi_i$ in which we seek the solution $h$ is reduced by unity. Beginning with $x_0$, we select an arbitrary chord $C_0$ of $f(x) = f(x_0)$ through $x_0$ and find its center. The plane $\pi_1$ through $x_1$ conjugate to $C_0$ contains the centers of all chords parallel to $C_0$. In the next step we restrict ourselves to $\pi_1$ and select an arbitrary chord $C_1$ of $f(x) = f(x_1)$ through $x_1$ and find its midpoint $x_2$ and the plane $\pi_2$ in $\pi_1$ conjugate to $C_1$ (and hence to $C_0$). This process when repeated will yield the answer in at most $n$ steps. In the cg-method the chord $C_i$ of $f(x) = f(x_i)$ is chosen to be the normal at $x_i$.

5. Basic Relations in the cg-Method

Recall that in the cg-method the following formulas are used

$$p_0 = r_0 = k - Ax_0 \qquad (5:1a)$$

$$a_i = \frac{|r_i|^2}{(p_i, Ap_i)} \qquad (5:1b)$$

$$x_{i+1} = x_i + a_i p_i \qquad (5:1c)$$

$$r_{i+1} = r_i - a_i A p_i \qquad (5:1d)$$

$$b_i = \frac{|r_{i+1}|^2}{|r_i|^2} \qquad (5:1e)$$

$$p_{i+1} = r_{i+1} + b_i p_i \qquad (5:1f)$$

One should verify that eq. (5:1e) and (5:1f) hold for $i = 0, 1, 2, \ldots$ if, and only if,

(5:2)

The present section is devoted to consequences of these formulas. As a first result, we have

Theorem 5:1. The residuals $r_0, r_1, \ldots$ and the direction vectors $p_0, p_1, \ldots$ generated by (5:1) satisfy the relations
(iS J) (5:6a)
\ri\2
(6:3)
j=o
(6.9)
and
(6.10)
In view of (6:2) and (5:1b) this becomes
In order to establish (6:9) and 6:10) we use the
( I)= (6:6) formula
^-*<- MS)-
Setting x=xi_l and j=m in (6:4), we obtain (6:5)
by the use of (6:6) and (6:1). which holds for all values of a in view of the fact that
This result establishes the cg-method as a method Xt minimizes f(x) on Pt. Setting «=/(#*)/|ri | 2
of successive approximations and justifies the pro- we have
cedure of stopping the algorithm before the final-
step is reached. If this is done, the estimate ob-
tained can be improved by using the results given
in the next two theorems.
Theorem 6:4. Let x$1: • • •, x£ be the projec- An algebraic reduction yields (6:9). Inasmuch as
tions of the points xi+1, • • • , xm=h in the i-
dimensional plane Pt passing through the points xQ,
# ! , • • • , xt. The points xt_u xu xQi, • • •, x^ lie |^ — %i\2— \h — X^\2= r—
on a straight 0line in the order given by their enumeration.
The point as* (k^>i) is given by the formulas
_i) — f(Xi) |r<_i
X fc — X i __ i " (6:7a)
we obtain (6:10) from (6:9) and (5:1b).
As a further result we have
x (i) _ _ x .
(6:7b) Theorem 6:6. Let x[, . . ., x'm_xbe the projections
of the points xx, . . ., x'm^x on the line joining the initial
point x0 to the solution xm=h. The points x0, x[, . . .,
In order to prove this result, it is sufficient to x'm-i, xm=h lie in the order of enumeration.
establish (6:7). To this end observe first that the Thus, it is seen that we proceed towards the solu-
vector tion without oscillation. To prove this fact we need
only observe that
\PhPj)— I
\ri-l\
12 \PhPi-V = by (5: 6a). A similar result holds for the line joining
The projection of the point Let -KI be the (n—i)-dimensional plane through xt
conjugate to p0, Pi, . . -, Pi-i- It consists of the set
of points x satisfying the equation
in P^ is accordingly
0"=0,l,
~«) ^ . a*_ 1 |r ? :_ 1 | 2 +.
Pi-i- This plane contains the points xi+i, .,xm and hence
the solution h.
Using (6:2), we obtain (6:7). The points lie in the Theorem 6:7. The gradient of the function fix) at xt
designated order, since f(xk) y>f(xk+1). in the plane Tr* is a scalar multiple of the vector pt.
Since f(xm) = 0, we have the first part of The gradient of/(as) at xt is the vector — rt. The
gradient qt oif(x) at xt in TTI is the orthogonal projec-
Theorem 6:5. T/i6 point tion of — Ti in the plane ?rz. Hence qt is of the form
(6:8) qi=—ri+aoAp{)+ . . . + ai_lApi-u
where a0, . . .,a*_i are chosen so that qt is orthogonal x
12 i
(7:3c)
Inserting (7:1c) in the first term,

$$(Ap_i, p_{i+1}) = -\frac{1}{a_i}|r_{i+1}|^2 + \frac{1}{a_i}(r_i, r_{i+1}) + b_i (Ap_i, p_i). \qquad (8:6)$$

But in view of (8:1b) and (8:1d)

$$|r_{i+1}|^2 = a_i b_i (Ap_i, p_i). \qquad (8:7)$$

Therefore,

$$(Ap_i, p_{i+1}) = \frac{1}{a_i}(r_i, r_{i+1}). \qquad (8:8)$$

This is our second propagation formula. Putting (8:5) and (8:8) together yields the third and fourth propagation formulas

(8:9a)

The system $Ax = k$ may be replaced by this linear system for $a_0', \ldots, a_{n-1}'$. Therefore, because of rounding-off errors we have certainly not solved the given system exactly, but we have reached a more modest goal, namely, we have transformed the given system into the system (8:12), which has a dominating main diagonal if rounding-off errors have not accumulated too fast. The cg-algorithm gives an approximate solution

$$h' = x_0 + a_0 p_0 + \cdots + a_{n-1} p_{n-1}. \qquad (8:13)$$

A comparison of (8:11) and (8:13) shows that the number $a_k$ computed during the cg-process is an approximate value of $a_k'$.

In order to have a dominating main diagonal in the matrix of the system (8:12) the quotients

$(Ap_i, p_k)$ (8:14)
of $A$, and we take the corresponding normalized eigenvectors as a coordinate system. Let $\alpha_0, \alpha_1, \ldots, \alpha_{n-1}$ be real numbers not equal to zero and $\epsilon$ a small quantity. Then we start with a residual vector

$$r_0 = (\alpha_0, \alpha_1\epsilon, \alpha_2\epsilon^2, \ldots, \alpha_{n-1}\epsilon^{n-1}). \qquad (8:14a)$$

Expanding everything in a power series, one finds that

(8:14b)

Hence

if $\epsilon$ is small enough.

As a by-product of such a choice of $r_0$ we get by (8:14b) approximations of the eigenvalues of $A$. Moreover, it turns out that in this case the successive residual-vectors $r_0, r_1, \ldots, r_{n-1}$ are approximations of the eigenvectors.

These results suggest the following rule:

The cg-process should start with a smooth residual distribution, that is, one for which $\mu(r_0)$ is close to $\lambda_{\min}$. If needed, the first estimate can be smoothed by some relaxation process.

Of course, we may use for this preparing relaxation the cg-process itself, computing the estimates $\bar{x}_i$ given in section 7. A simpler method is to modify the cg-process by setting $b_i = 0$ so that $p_i = r_i$ and selecting $a_i$ of the form $a_i = \alpha/\mu(r_i)$, where $\alpha$ is a small constant in the range $0 < \alpha < 1$.

In order to discuss this result, suppose that the numbers $a_i$ have been computed according to the formula

$$a_i = \frac{(p_i, r_i)}{(p_i, Ap_i)} \qquad (8:19)$$

(theorem 5:5). From (8:1c) it follows that $(r_{i+1}, p_i) = 0$, and therefore this term drops out in (8:18). In this case the correction $\Delta a_i$ depends only on the products $(Ap_i, p_k)$ with $i < k$. That is to say, this correction is influenced only by the rounding-off errors after the $i$-th step. If, for instance, the rounding-off errors in the last 10 steps of a cg-process are small enough to be neglected, the last 10 values $a_i$ need not be corrected. Hence, generally, the $\Delta a_i$ decrease rather rapidly.

From (8:18) we learn that in order to have a good rounding-off behavior, it is not only necessary to keep the products $(p_k, Ap_i)$ $(i \ne k)$ small, but also to satisfy $(r_{i+1}, p_i) = 0$ as well as possible. Therefore, it may be better to compute the $a_i$ from the formulas (8:19) rather than from (8:1b). We see this immediately, if we compare (8:19) with (8:1b); by (8:19) and (8:1e) we have

$$a_i = \frac{|r_i|^2}{(p_i, Ap_i)} + b_{i-1}\frac{(r_i, p_{i-1})}{(p_i, Ap_i)}.$$

For ill-conditioned matrices, where $a_i$ and $b_i$ may become considerably larger than 1, the omitting of the second summand may cause additional errors. For the same reason, it is at least as important in
these cases to use formula (3:2b) rather than (8:Id) Introducing the correction factor
for determining bh since by (3:2b) and (8:1c)
1 (8:21)
{k
and taking into account the old value (8:1b) of aiy
Here the second summand is not directly made this can be written in the form
zero by any of the two sets of formulas for at and
hi. The only orthogonality relations, which are
directly fulfilled in the scope of exactitude of the 5<=gr (8:22)
numerical computation by the choice of at and bu
are the following: Continuing in the general routine of the (i+l)st step
we replace bt by a number 6* in such a way that
(Api,pt+i)=0. We use (8:6), which now must be
written in the form
Therefore, we have to represent (ri+i,r%) in terms
of these scalar products:
— |r f + 1 | 2 + (r
) = (ri+hpt) — b^
The term (r*,rf+i) vanishes by virtue of our choice
From this expression we see that for large 6?: and a* of at. Using (8:7), we see that aibi=aibi and from
the second and third terms may cause considerable (8:22)
rounding-off errors, which affect also the relation bi=bidi. (8:23)
(Pi+i,Api)=0, if we use formula (8:Id) for 6t. This
is confirmed by our numerical experiments (sec- Since rounding-off errors occur again, this sub-
tion 19). routine can be used in the same way to improve the
From a practical point of view, the following results in the (i+l)th step.
formula is more advantageous because it avoids the The corrections just described can be incorporated
computation of all the products (Apt,pk). From automatically in the general routine by replacing the
(8:1c) follows formulas (3:1) by the following refinement:
rn=ri+i— Po=ro=k—Ax0, do=l
(8:20) (8:24)
Pi
Pi+i= (9:1)
(9:5)
Apt
di=i
h \n+i\2dt
n+i
to>t,APt) Pt+i=Pi+
The connections between ai} biy dt are given by the
equation This result is obtained from (9:1) by choosing
(9:6)
This yields (9:2) in case i > 0 . The formula, when Pi+l==
t-i+1
i = 0 , follows similarly.
In the formulas (9:1) the scalar factor dt is an
arbitrary positive number determining the length of
p t The case df=l is discussed in sections 3 and 5.
T he following cases are of interest.
These relations are obtained from (9:1) and (9:2) by setting $d_i = 1$.

IV. The vector $p_i$ can be chosen so that $a_i$ is the reciprocal of the Rayleigh quotient of $r_i$. The formulas for $a_i$, $b_i$ and $d_i$ in (9:1) then become

This is sufficient to indicate the variety of choices that can be made for the scalar factor $d_0, d_1, \ldots$. For purposes of computation the choice $d_i = 1$ appears to be the simplest, all things considered.

10. Extensions of the cg-Method

In the preceding pages we have assumed that the matrix $A$ is a positive definite symmetric matrix. The algorithm (3:1) still holds when $A$ is nonnegative and symmetric. The routine will terminate when one of the following situations is met:

(1) The residual $r_m$ is zero. In this event $x_m$ is a solution of $Ax = k$, and the problem is solved.

(2) The residual $r_m$ is different from zero but $(Ap_m, p_m) = 0$, and hence $Ap_m = 0$. Since $p_i = c_i \bar{r}_i$, it follows that $A\bar{r}_m = 0$, where $\bar{r}_m$ is the residual of the vector $\bar{x}_m$ defined in section 7. The point $\bar{x}_m$ is accordingly a point at which $|k - Ax|^2$ attains its minimum. In other words, $\bar{x}_m$ is a least-square solution. One should observe that $p_m \ne 0$ (and hence $\bar{r}_m \ne 0$). Otherwise, we would have $r_m = -b_{m-1} p_{m-1}$, contrary to the fact that $r_m$ is orthogonal to $p_{m-1}$. The point $x_m$ fails to minimize the function

$$g(x) = (x, Ax) - 2(k, x),$$

for in this event

$$g(x_m + t p_m) = g(x_m) - 2t|r_m|^2.$$

In fact, $g(x)$ fails to have a minimum value.

It remains to consider the case when $A$ is a general nonsingular matrix. In this event we observe that the matrix $A^*A$ is symmetric and that the system $Ax = k$ is equivalent to the system

$$A^*Ax = A^*k. \qquad (10:1)$$

Applying the eq. (3:1) to this last system, we obtain the following iteration,

$$r_0 = k - Ax_0, \qquad p_0 = A^* r_0,$$

$$a_i = \frac{|A^* r_i|^2}{|Ap_i|^2}, \qquad x_{i+1} = x_i + a_i p_i, \qquad r_{i+1} = r_i - a_i A p_i, \qquad (10:2)$$

$$b_i = \frac{|A^* r_{i+1}|^2}{|A^* r_i|^2}, \qquad p_{i+1} = A^* r_{i+1} + b_i p_i.$$

If one does not wish to use any properties of the cg-method in the computation of $a_i$ and $b_i$ besides the defining relations, since they may be disturbed by rounding-off errors, one should use the formulas

$$a_i = \frac{(p_i, A^* r_i)}{|Ap_i|^2}, \qquad b_i = -\frac{(Ap_i, AA^* r_{i+1})}{|Ap_i|^2}.$$

In this case the error function $f(x)$ is the function $f(x) = |k - Ax|^2$, and hence is the squared residual. It is a simple matter to interpret the results given above for this new system.

It should be emphasized that, even though the use of the system (10:2) is equivalent from a theoretical point of view to applying the cg-algorithm to the system (10:1), the two methods are not equivalent from a numerical point of view. This follows because rounding-off errors in the two methods are not the same. The system (10:2) is the better of the two, because at all times one uses the original matrix $A$ instead of the computed matrix $A^*A$, which will contain rounding-off errors.

There is a slight generalization of the system (10:2) that is worthy of note. This generalization consists of selecting a matrix $B$ such that $BA$ is positive definite and symmetric. The matrix $B$ is necessarily of the form $A^*H$, where $H$ is positive definite and symmetric. We can apply the cg-algorithm to the system

$$BAx = Bk. \qquad (10:3)$$

In place of (10:2) one obtains the algorithm

$$r_0 = k - Ax_0, \qquad p_0 = Br_0,$$

$$a_i = \frac{|Br_i|^2}{(p_i, BAp_i)}, \qquad x_{i+1} = x_i + a_i p_i, \qquad r_{i+1} = r_i - a_i A p_i, \qquad (10:4)$$

$$b_i = \frac{|Br_{i+1}|^2}{|Br_i|^2}, \qquad p_{i+1} = Br_{i+1} + b_i p_i.$$

Again the formulas for $a_i$ and $b_i$, which are given directly by the defining relations, are

$$a_i = \frac{(p_i, Br_i)}{(p_i, BAp_i)}, \qquad b_i = -\frac{(Br_{i+1}, BAp_i)}{(p_i, BAp_i)}.$$
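The iteration (10:2) can be tried out on a small nonsymmetric system. The following is a minimal sketch in exact rational arithmetic, not a definitive implementation; the function name `cg_normal_equations`, the helper names, and the 2-by-2 test matrix are hypothetical choices made only for illustration. Note that the product A*A is never formed: only products with A and with its transpose are used, as the text recommends.

```python
from fractions import Fraction as F

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def cg_normal_equations(A, k, x0, steps):
    """Sketch of iteration (10:2): cg applied to A*Ax = A*k, using A and A* only."""
    At = transpose(A)
    x = list(x0)
    r = [ki - yi for ki, yi in zip(k, matvec(A, x))]   # r_0 = k - A x_0
    s = matvec(At, r)                                  # A* r_0
    p = list(s)                                        # p_0 = A* r_0
    for _ in range(steps):
        ss = dot(s, s)
        if ss == 0:                                    # A* r_i = 0, so x solves the system
            break
        Ap = matvec(A, p)
        a = ss / dot(Ap, Ap)                           # a_i = |A* r_i|^2 / |A p_i|^2
        x = [xi + a * pi for xi, pi in zip(x, p)]      # x_{i+1} = x_i + a_i p_i
        r = [ri - a * qi for ri, qi in zip(r, Ap)]     # r_{i+1} = r_i - a_i A p_i
        s_new = matvec(At, r)
        b = dot(s_new, s_new) / ss                     # b_i = |A* r_{i+1}|^2 / |A* r_i|^2
        p = [si + b * pi for si, pi in zip(s_new, p)]  # p_{i+1} = A* r_{i+1} + b_i p_i
        s = s_new
    return x, r

# Hypothetical nonsymmetric, nonsingular example with solution h = (1, 2).
A = [[F(2), F(1)], [F(0), F(3)]]
k = [F(4), F(6)]
x2, r2 = cg_normal_equations(A, k, [F(0), F(0)], 2)
```

Because the arithmetic is exact, the estimate after n = 2 steps is the solution itself; in floating-point arithmetic one would expect only an approximation.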
a1=
(Apt,et)
Hence,
These formulas generate mutually conjugate vectors
Pu • • •> Vn and corresponding estimates Xi, . . ., xn
of the solution of Ax=k. In particular xn is the
desired solution. The advantage of this method is the next estimate of the solution. Moreover,
lies in the ease with which the inner products appear-
12
4§
cf. Fox, Huskey, and Wilkinson, loc. cit. #22
Next multiply the 2nd row of (12:3) by ai2 and sub vectors are pu p2, • • •, pnj then the matrix (12:4)
tract the result from the iih row (i—2>y—,n). We ob is the matrix
tain \\P*A P * P*Jfc||.
#13 Pn Vm The matrices P*A and P are triangular matrices
(2) n(2)
with zeros below the diagonal. The matrix D=P*AP
0 a #23 P21 Vin is the diagonal matrix whose diagonal elements are
0 0 n (3)
#33
n(3)
#3n Vzn kf #11, a™, . . ., a<nnn- The determinant of Pis unity and
4,(3)
the determinant of A is the product
0 0
n n(2) <n)
#11#22.
As was seen in section 4, if we let
0 0 ,(3) (3)
u.'nn Jfe*. f(x) = (h—x,A(h—x)),
( ( }
The vector (0,0,& l\. . .,& n ) is the residual of x2. the sequence
The elements u}?,. . .,u$ form the vector uf (i = 3)
and Vz—U (3} • We have
a^) = (-4u?), e*) (i=3,. . .,n) decreases monotonically. No general statement
can be made for the sequence
where

(14:6)

We then have

$$A^k r_0 = \alpha_1 \lambda_1^k e_1 + \cdots + \alpha_n \lambda_n^k e_n \qquad (14:7)$$

for $k = 0, 1, \ldots, n-1$. The vectors $r_0, Ar_0, \ldots, A^{n-1}r_0$ are linearly independent and will be used as a coordinate system. Indeed their determinant is up to the factor $\alpha_1\alpha_2 \cdots \alpha_n$ Van der Monde's determinant of $\lambda_1, \ldots, \lambda_n$. By the correspondence

$$\lambda^k \to A^k r_0 \qquad (k = 0, 1, \ldots, n-1) \qquad (14:8)$$

every polynomial of maximal degree $n-1$ is mapped onto a vector of the $n$-dimensional space and a one-one correspondence between these polynomials and vectors is established. The correspondence has the following properties:

Theorem 14:1. Let the space of polynomials $R(\lambda)$ of degree $\le n-1$ be metrized by the norm

$$\int_0^l R(\lambda)^2\, dm(\lambda).$$

Then the correspondence described above is isometric, that is,

$$\int_0^l R(\lambda) R'(\lambda)\, dm(\lambda) = (r, r'),$$

where $R(\lambda)$, $R'(\lambda)$ are the polynomials corresponding to $r$ and $r'$.

It is sufficient to prove this for the powers $1, \lambda, \lambda^2, \ldots, \lambda^{n-1}$. Let $\lambda^j$, $\lambda^k$ be two of these powers. From Gauss' formula (14:4) follows

$$\int_0^l \lambda^j \lambda^k\, dm(\lambda) = \int_0^l \lambda^{j+k}\, dm(\lambda)$$

Again we may restrict ourselves to the powers $1, \lambda, \ldots, \lambda^{n-1}$. That is, we must show that

$$\int_0^l \lambda^{j+1} \lambda^k\, dm(\lambda) = (A^{j+1} r_0, A^k r_0). \qquad (14:10)$$

If $j < n-1$, this has already been verified. The remaining case

$$\int_0^l \lambda^n \lambda^k\, dm(\lambda) = (A^n r_0, A^k r_0) \qquad (k \le n-1) \qquad (14:11)$$

follows in the same manner from Gauss' integration formula, since $n + k \le 2n - 1$.

Theorem 14:3. Let $A$ be a positive definite symmetric matrix with distinct eigenvalues and let $r_0$ be a vector that is not perpendicular to an eigenvector of $A$. There is a mass distribution $m(\lambda)$ related to $A$ as described above.

In order to prove this result let $e_1, \ldots, e_n$ be the normalized eigenvectors of $A$ and let $\lambda_1, \ldots, \lambda_n$ be the corresponding (positive) eigenvalues. The vector $r_0$ is expressible in the form (14:5). According to our assumption no $\alpha_k$ vanishes. The desired mass distribution can be constructed as a step function that is constant on each of the intervals $0 < \lambda_1 < \lambda_2 < \cdots < \lambda_n < l$, and having a jump at $\lambda_k$ of the amount $m_k = \alpha_k^2 > 0$, the number $l$ being any number greater than $\lambda_n$.

We want to emphasize the following property of our correspondence. If $A$ and $r_0$ are given, we are able to establish the correspondence without computing eigenvalues of $A$. This follows immediately from the basic relation (14:8). Moreover, we are able to compute integrals of the type

$$\int_0^l R(\lambda) R'(\lambda)\, dm(\lambda), \qquad \int_0^l R(\lambda) R'(\lambda)\, \lambda\, dm(\lambda) \qquad (14:12)$$
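The isometry between polynomials and vectors can be checked numerically on a small case. The sketch below is only an illustration under stated assumptions: the eigenvalues (1 and 6), the orthonormal eigenvectors, and the coefficients (2 and 3) are hypothetical values chosen so that everything stays rational. With masses m_k equal to the squared coefficients placed at the eigenvalues, the integral of a power of lambda becomes a finite sum, and it should agree with the corresponding inner product of the vectors A^j r_0 and A^k r_0.

```python
from fractions import Fraction as F

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

# Hypothetical data: eigenvalues, orthonormal eigenvectors, coefficients of r_0.
lams = [F(1), F(6)]
eigvecs = [[F(3, 5), F(4, 5)], [F(-4, 5), F(3, 5)]]
alphas = [F(2), F(3)]

# A = sum_k lambda_k e_k e_k^T, and r_0 = sum_k alpha_k e_k as in (14:5).
n = 2
A = [[sum(l * e[i] * e[j] for l, e in zip(lams, eigvecs)) for j in range(n)]
     for i in range(n)]
r0 = [sum(a * e[i] for a, e in zip(alphas, eigvecs)) for i in range(n)]

masses = [a * a for a in alphas]          # jumps m_k = alpha_k^2 at lambda_k

def moment(s):
    """Integral of lambda^s against the step-function mass distribution."""
    return sum(m * l ** s for m, l in zip(masses, lams))

def apply_power(v, j):
    """Vector A^j v, the image of lambda^j under the correspondence (14:8)."""
    for _ in range(j):
        v = matvec(A, v)
    return v

# Compare inner products of images of powers with the matching moments.
plain = {(j, k): dot(apply_power(r0, j), apply_power(r0, k)) == moment(j + k)
         for j in range(n) for k in range(n)}
weighted = {(j, k): dot(apply_power(r0, j + 1), apply_power(r0, k)) == moment(j + k + 1)
            for j in range(n) for k in range(n)}
```

Every entry of `plain` and `weighted` should be `True`; the second dictionary corresponds to the integrals carrying the extra factor lambda.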
r
For k<i% this is zero, because the second factor is of
Using the isometric properties described in theorems
14:1 and 14:2, we find that
&«-,=
\r*\'
degree k^i
Using (15:5a) and (15:6), we obtain The vectors rt are orthogonal, and the pt are con-
jugate; the latter result follows from (15:6). Hence
the basic formulas and properties of the cg-method
f Ri(X)Pi(X)\dm(\)= fZ listed in sections 3 and 5 are established. It remains
J° Jo to prove that the method gives the exact solution
after n steps. If we set xi+1=Xi+aiPi, the corre-
Combining this result with the orthogonality of sponding
Ri+i and Rt, we see, by (15:5b), that residual is ri+i as follows by induction:
Rt(\)*dm(\)
____ Jo (15:7) For the last residual rw we have (i=0, 1, . . . ,,n—l)
at=
Jo (rn,rt) = (rn-iVi) —dn-^Apn-ifi)
0= f = (RnRidm = 0.
Jo Jo
Our basic method reestablishes also the methods of C. Lanczos for computing the characteristic polynomial of a given matrix $A$. Indeed the polynomials $R_i$, computed by the recurrence relation (15:5), lead finally to the polynomial $R_n(\lambda)$, which, by the basic definition of the correspondence in section 14, is the characteristic polynomial of $A$, provided that $r_0$ satisfies the conditions given in theorem 14:3. It may be remembered that orthogonal polynomials build a Sturmian sequence. Therefore, the polynomials $R_0, R_1, \ldots, R_n$ build a Sturmian sequence for the eigenvalues of the given matrix $A$.

Our correspondence allows us to translate every method or result in the vector-space into an analogous method or result for polynomials and vice versa. Let us take as an example the smoothing process in section 7. It is easy to show that the vector $\bar{r}_i$ introduced in that section corresponds to a polynomial $\bar{R}_i(\lambda)$ characterized by the following property: $\bar{R}_i(\lambda)$ is the polynomial of degree $i$ with $\bar{R}_i(0) = 1$ having the least-square integral on $(0, l)$. In other words, if $r_0$ is given by (14:5), then

This gives the following result: let $A$ be a symmetric matrix having the roots of the Legendre polynomial $R_n(\lambda)$ as eigenvalues, and let

$$r_0 = \alpha_1 e_1 + \cdots + \alpha_n e_n,$$

where $e_1, \ldots, e_n$ are the normalized eigenvectors of $A$, and $m_1 = \alpha_1^2, m_2 = \alpha_2^2, \ldots, m_n = \alpha_n^2$ are the weight-coefficients for the Gauss' mechanical quadrature with respect to $R_n$. The cg-algorithm applied to $A$, $r_0$ yields the numbers $a_i$, $b_i$ given by (17:1). Moreover,

Hence the residuals decrease during the algorithm. It may be worth noting that the Rayleigh quotient of $r_i$ is

$$\mu(r_i) = \frac{(r_i, Ar_i)}{|r_i|^2} = \frac{1}{a_i} + \frac{b_{i-1}}{a_{i-1}}.$$
mn (18:1)
t+1 X-Xi
-1
a 2— ctoX —
(18:2)
The denominators of the convergents are given by the recursion formulas

(18:3)

This coincides with (15:1). However, in order to satisfy (14:3), the expansion must be carried out so that $d_i = c_i + 1$, by virtue of (15:2). The numbers $b_i$ are then given by (15:4). It is clear that

$$F(\lambda) = \frac{Q_{n-1}(\lambda)}{R_n(\lambda)}$$

where

Let us translate these results into the $n$-dimensional space given by our correspondence. As before, we construct a positive definite symmetric matrix $A$ with eigenvalues $\lambda_1, \ldots, \lambda_n$. Let $e_1, \ldots, e_n$ be corresponding eigenvectors of unit length and choose, as before,

$$r_0 = \alpha_1 e_1 + \cdots + \alpha_n e_n.$$

The eigenvalues are the reciprocals of the squares of the semiaxes of the $(n-1)$-dimensional ellipsoid $(x, Ax) = 1$. The hyperplane $(r_0, x) = 0$ cuts this ellipsoid in an $(n-2)$-dimensional ellipsoid, $E_{n-2}$, the squares of whose semiaxes are given by the reciprocals of the zeros of the numerator $Q_{n-1}(\lambda)$ of $F(\lambda)$. This follows from the fact that if $\lambda_0$ is a number such that there is a vector $x_0 \ne 0$ orthogonal to $r_0$ having the property that $(Ax_0, x) = \lambda_0 (x_0, x)$ whenever $(r_0, x) = 0$, then $\lambda_0$ is the square of the reciprocal of the semiaxis of $E_{n-2}$ whose direction is given by $x_0$. If the coordinate system is chosen so that the axes are given by $e_1, \ldots, e_n$, respectively, then $\lambda = \lambda_0$ satisfies the equation

$$\begin{vmatrix} \lambda_1 - \lambda & 0 & \cdots & 0 & \alpha_1 \\ 0 & \lambda_2 - \lambda & \cdots & 0 & \alpha_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & \lambda_n - \lambda & \alpha_n \\ \alpha_1 & \alpha_2 & \cdots & \alpha_n & 0 \end{vmatrix} = 0,$$

as was to be proved.

Let us call the zeros of $Q_{n-1}(\lambda)$ the eigenvalues of $A$ with respect to $r_0$ and the polynomial $Q_{n-1}(\lambda)$ the characteristic polynomial of $A$ with respect to $r_0$. The rational function $F(\lambda)$ is accordingly the quotient of this polynomial and the characteristic polynomial of $A$. Hence we have,

Theorem 18:1. The numbers $a_i$, $b_i$ connected with the cg-process of a matrix $A$ and a vector $x_0$ can be computed by expanding into a continued fraction the quotient built by the characteristic polynomial of $A$ with respect to $r_0$ and the ordinary characteristic polynomial of $A$.

This is the simplest form of the relation between a matrix $A$, a vector $r_0$ and the numbers $a_i$, $b_i$ of the corresponding cg-process. The theorem may be used to investigate the behavior of the $a_i$, $b_i$ if the eigenvalues of $A$ and those with respect to $r_0$ are given.

The following special case is worth recording. If $m_1 = m_2 = \cdots = m_n = 1$, the rational function is the logarithmic derivative of the characteristic polynomial. From theorem 18:1 follows

Theorem 18:2. If the vector $r_0$ of a cg-process is the sum of the normalized eigenvectors of $A$, the numbers $a_i$, $b_i$ may be computed by expanding the logarithmic derivative of the characteristic polynomial of $A$ into a continued fraction.

Finally, we are able to prove

Theorem 18:3. There is no restriction whatever on the positive constants $a_i$, $b_i$ in the cg-process, that is, given two sequences of positive numbers $a_0, a_1, \ldots, a_{n-1}$ and $b_0, b_1, \ldots, b_{n-1}$, there is a symmetric positive definite matrix $A$ and a vector $r_0$ such that the cg-algorithm applied to $A$, $r_0$ yields the given numbers.

The demonstration goes along the following lines: From (15:2) and (15:4), we compute the numbers $c_i$, $d_i$, the $d_i$ being again positive. Then we use the continued fraction (18:2) to compute $F(\lambda)$, which we decompose into partial fractions to obtain (18:1). We show next that the numbers $\lambda_i$, $m_i$ appearing in (18:1) are positive. After this has been established, our correspondence finishes the proof.

In order to prove that $\lambda_i > 0$, $m_i > 0$ we observe that the ratio $R_{i+1}/R_i$ is a decreasing function of $\lambda$, as can be seen from (18:3) by induction. Using this result, it is not too difficult to show that the polynomials $R_0(\lambda), R_1(\lambda), \ldots, R_n(\lambda)$ build a Sturmian sequence in the following sense. The number of zeros of $R_n(\lambda)$ in any interval $a < \lambda < b$ is equal to the increase of the number of variations in sign in going from $a$ to $b$. At the point $\lambda = 0$ there are no variations in sign since $R_i(0) = 1$ for every $i$. At $\lambda = +\infty$, there are exactly $n$ variations because the coefficient of the highest power of $\lambda$ in $R_i(\lambda)$ is $(-1)^i a_0 a_1 \cdots a_{i-1}$. Therefore, the roots $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $R_n(\lambda)$ are real and positive.

That the function $F(\lambda)$ is itself a decreasing function of $\lambda$ follows directly from (18:2). Therefore, its residues $m_1, m_2, \ldots, m_n$ are positive.

In view of theorem 18:3 the numbers $a_i$ in a cg-process can increase as fast as desired. This result was used in section 8.2. Furthermore, the formula

$$b_i = \frac{|r_{i+1}|^2}{|r_i|^2}$$

shows that there is no restriction at all on the behavior of the length of the residual vector during the cg-process. Hence, there are certainly examples where the residual vectors increase in length during the computation, as was stated earlier. This holds in spite of the fact that the error vector $h - x$ decreases in length at each step.
19. Numerical Illustrations

A number of numerical experiments have been made with the processes described in the preceding sections. A preliminary report on these experiments will be given in this section. In carrying out these experiments, no attempt was made to select those which favored the method. Normally, we selected those which might lead to difficulties.

In carrying out these experiments three sets of formulas for $a_i$, $b_i$ were used in the symmetric case, namely, (19:1), (19:2), and (19:3). In the nonsymmetric case, we have used only the formulas (19:4).

Our experience thus far indicates that the best results are obtained by the use of (19:1). Formulas (19:2) were about as good as (19:1) except for very ill conditioned matrices. Most of our experiments were carried out with the use of (19:2) because they are somewhat simpler than (19:1). Formulas (19:3) were designed to improve the relations

$$(p_i, Ap_{i+1}) = 0, \qquad (19:5)$$

$$(p_i, r_{i+1}) = 0. \qquad (19:6)$$

A reflection on the geometrical interpretation of the method will convince one that one should strive to satisfy the relations (19:6) rather than (19:5). It is for this reason that (19:1) appears to be considerably superior to (19:3). In place of (19:2), one can use the formulas

to correct rounding off errors. A preliminary experiment indicates that this choice is better than (19:2) and is perhaps as good as (19:1).

A sufficient number of experiments have not been carried out as yet so as to determine the "best" formulas to be used. Our experiments do indicate that floating operations should be used whenever possible. We have also observed that the results in the $(n+1)$st and $(n+2)$nd iterations are normally far superior to those obtained in the $n$th iteration.

Example 1. This example was selected to illustrate the method of conjugate gradients in case there are no rounding off errors. The matrix $A$ was chosen to be the matrix

     1    2   -1    1
     2    5    0    2
    -1    0    6    0
     1    2    0    3

If we select $k$ to be the vector $(0, 2, -1, 1)$, the computation is simple. The results at each step are given in table 1.

Normally, the computation is not as simple as indicated in the preceding case. For example, if one selects the solution $h$ to be the vector $(1, 1, 1, 1)$, then $k$ is the vector $(3, 9, 5, 6)$. The results with $(0, 0, 0, 0)$ as the initial estimate are given in table 2.

TABLE 1.

                  Components of the vector
    Step  Vector     1     2     3     4     a_i    b_{i-1}
     0    x_0        1     0     0     0
          r_0       -1     0     0     0
          p_0       -1     0     0     0
          Ap_0      -1    -2     1    -1     1
     1    r_1        0     2    -1     1            6
          p_1       -6     2    -1     1
          Ap_1       0     0     0     1     6
     2    x_2      -36    12    -6     6
          r_2        0     2    -1    -5            5
          p_2      -30    12    -6     0
          Ap_2       0     0    -6    -6     5/6
     3    x_3      -61    22   -11     6
          r_3        0     2     4     0            2/3
          p_3      -20    10     0     0
          Ap_3       0    10    20     0     1/5
     4    x_4      -65    24   -11     6
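The computation of Example 1 can be replayed in exact rational arithmetic, so that no rounding-off errors enter. The following is a minimal sketch, assuming the cg formulas a_i = |r_i|^2/(p_i, Ap_i) and b_i = |r_{i+1}|^2/|r_i|^2 and the starting point x_0 = (1, 0, 0, 0) listed in table 1; the function and variable names are hypothetical.

```python
from fractions import Fraction as F

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def conjugate_gradients(A, k, x0, steps):
    """cg-routine; returns the final estimate and the scalars a_i, b_i."""
    x = list(x0)
    r = [ki - yi for ki, yi in zip(k, matvec(A, x))]   # r_0 = k - A x_0
    p = list(r)                                        # p_0 = r_0
    a_hist, b_hist = [], []
    for _ in range(steps):
        rr = dot(r, r)
        if rr == 0:                                    # residual zero: solved
            break
        Ap = matvec(A, p)
        a = rr / dot(p, Ap)                            # a_i
        x = [xi + a * pi for xi, pi in zip(x, p)]      # x_{i+1}
        r = [ri - a * qi for ri, qi in zip(r, Ap)]     # r_{i+1}
        b = dot(r, r) / rr                             # b_i
        p = [ri + b * pi for ri, pi in zip(r, p)]      # p_{i+1}
        a_hist.append(a)
        b_hist.append(b)
    return x, a_hist, b_hist

rows = [[1, 2, -1, 1], [2, 5, 0, 2], [-1, 0, 6, 0], [1, 2, 0, 3]]
A = [[F(v) for v in row] for row in rows]
k = [F(0), F(2), F(-1), F(1)]
x0 = [F(1), F(0), F(0), F(0)]
x4, a_hist, b_hist = conjugate_gradients(A, k, x0, 4)
```

The run reproduces the entries of table 1: the scalars a_i come out as 1, 6, 5/6, 1/5, the scalars b_i as 6, 5, 2/3 (followed by 0 once the residual vanishes), and the final estimate is (-65, 24, -11, 6).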
TABLE 2. The system just described is particularly well
suited for elimination. In case k is the vector
a. times components of vector (3, 9, 5, 6) the procedure described in section 12 yields
Step Vector a the results given in table 3. In this table, we start
1 2 3 4 with the matrices A and / . These matrices are
transformed into the matrices P*A and P* given at
Xo 0 0 0 0 1 the bottom of the table.
0
ro 3 9 5 6 1 It is of interest to compare the error vectors
po 3 9 5 6 1 yi=h—Xi obtained by the two methods just described
Apo 22 63 27 39 1
with Jc=(3, 9, 5, 6). The error \yt\ is given in the
following table.
zi 453 1359 755 906 ft
ri -316 -495 933 123 ft \Vi\ cg-method Elimination method
1
Pi -1935 -2799 6461 1140 ft7l
Api -12854 -15585 40701 -4113 ft7l M 2.0 2.00
toil 0. 7 2.65
X2 131702 419553 298277' 304149 ft
1689 -34360 -27345 73483 ft M .67 4.69
2
Pi -116022 -1684085 -381080 3066641 ft72 \v*\ .65 6.48
Ap2 -66471 -2579187 -2140458 5685731 ft72 M .0 0.00
Starting vector k = (3.371, 1.2996, 3.4851, 3.7244, 3.0387, 2.412) Xo= (1,0,0)
1 2 3
Step Xi r» Pi at, bi Case. Formula (19:2) F o r m u l a (19:3)
1 with 10 digits
Formula (19:1)
0 3.37100 3.37100 5 5 5 5
0 1.29960 1. 29960 ao=3.092387 po 11 11 11 11
-14 -14 -14 -14
0 3.48510 3.48510
. 999998 -.00000252
- . 10484 -.05079 . 99424 .9999886893
. 999991 -.00000084
6 1.000013 -.00002271 x% -1.46997 —1.34651 -2.99518 -3.000023179
1.000004 .00000645
- 1 . 21653 -1.38837 -1.99328 -1.999968135
. 999992 .00001636
. 999991 . 00000825
-.057616 -.058572 -.086092 -.0009108898
r% .23615 .23643 - . 19036 -.0020300857
-.18543 -.18733 .25063 .0026663150
keeping five significant figures at all times. For . 093471 . 094422 .10646 . 000012060204
comparison, the computation was carried out also
with 10 digits, using (19:2). The results are given 62 3.0287 3.0306 4.5804 . 000518791676
in table 5. In the third iteration, formula (19:1) -.36924 -.37336 -.50602 -.0009585010
gave the better result. In the fourth iteration, Vi .51134 .51126 .39181 -.0019642287
formulas (19:1) and (19:2) were equally good, and . 26185 .26035 .54853 .0027001920
superior to (19:3). The solution was also carried out as 2.9923 2.9762 . 011854 .0118007358
by the elimination method using only five significant
figures. The results are 1.00004 1.06040 1.00024 1.0000000003
Xi -3.00005 - 2 . 86812 -2.99982 -2.9999999997
-2.00006 —2.16322 -1.99978 -1.9999999993
X . 99424 1. 00603
y -2.99518 -3.00506
z — 1.99328 -2.00180
In this case the results by the cg-method and the elimination method appear to be equally effective. The cg-method has the advantage that an improvement can be made by taking one additional step.

This example is also a good illustration of the fact that the size of the residuals is not a reliable criterion for how close one is to the solution. In step 3 the residuals in case 1 are smaller than those of case 3, although the estimate in case 1 is very far from the right solution, whereas in case 3 we are close to it.

Further examples. The largest system that has been solved by the cg-method is a linear, symmetric system of 106 difference equations. The computation was done on the Zuse relay-computer at the Institute for Applied Mathematics in Zurich. The estimate obtained in the 90th step was of sufficient accuracy to be acceptable.^15

Several symmetric systems, some involving as many as twelve unknowns, have been solved on the IBM card-programmed calculator. In one case, where the ratio of the largest to the smallest eigenvalue was 4.9, a satisfactory solution was obtained as early as the third step; in another case, where this ratio was 100, fifteen steps had to be carried out in order to get an estimate with six correct digits. In these computations floating-point operations were not used. At all times an attempt was made to keep six or seven significant figures.

The cg-method has also been applied to the solution of small nonsymmetric systems on the SWAC. The results indicate that the method is very suitable for high speed machines.

A report on these experiments is being prepared at the National Bureau of Standards, Los Angeles.
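The dependence of the number of steps on the ratio of the largest to the smallest eigenvalue can be explored with a small experiment. The sketch below is only an illustration under stated assumptions: the ratios 4.9 and 100 and the twelve unknowns come from the text, but the diagonal matrices, the equally spaced eigenvalues, the solution h = (1, ..., 1), the starting point x_0 = 0, the tolerance of 1e-3 on the largest error component, and the function names spectrum and cg_steps are all hypothetical. It also runs in ordinary double precision rather than with six or seven fixed-point figures, so the step counts are not expected to reproduce those reported above.

```python
def spectrum(ratio, n=12):
    """n equally spaced eigenvalues from 1 to `ratio` (an assumed distribution)."""
    return [1.0 + (ratio - 1.0) * j / (n - 1) for j in range(n)]


def cg_steps(diag, h, tol=1e-3, max_steps=50):
    """Apply the cg-routine to A x = k, with A = diag(diag) and k = A h.

    Returns (steps, x), where `steps` is the first step at which every
    component of x differs from the known solution h by less than tol.
    """
    k = [d * hj for d, hj in zip(diag, h)]
    x = [0.0] * len(diag)          # x_0 = 0
    r = k[:]                       # r_0 = k - A x_0
    p = r[:]                       # p_0 = r_0
    rr = sum(v * v for v in r)     # |r_i|^2
    for step in range(1, max_steps + 1):
        ap = [d * v for d, v in zip(diag, p)]
        a = rr / sum(u * v for u, v in zip(p, ap))    # a_i = |r_i|^2 / (p_i, A p_i)
        x = [xj + a * pj for xj, pj in zip(x, p)]     # x_{i+1} = x_i + a_i p_i
        r = [rj - a * apj for rj, apj in zip(r, ap)]  # r_{i+1} = r_i - a_i A p_i
        if max(abs(xj - hj) for xj, hj in zip(x, h)) < tol:
            return step, x
        rr_new = sum(v * v for v in r)
        b = rr_new / rr                               # b_i = |r_{i+1}|^2 / |r_i|^2
        p = [rj + b * pj for rj, pj in zip(r, p)]     # p_{i+1} = r_{i+1} + b_i p_i
        rr = rr_new
    raise RuntimeError("tolerance not reached within max_steps")


H = [1.0] * 12
STEPS_LOW, X_LOW = cg_steps(spectrum(4.9), H)
STEPS_HIGH, X_HIGH = cg_steps(spectrum(100.0), H)
print("eigenvalue ratio 4.9:", STEPS_LOW, "steps")
print("eigenvalue ratio 100:", STEPS_HIGH, "steps")
```

The stopping test uses the known solution h, which is available only in a constructed experiment such as this one. It was chosen over a test on the residual because, as noted above, the size of the residual is not a reliable measure of closeness to the solution.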
^15 See U. Hochstrasser, "Die Anwendung der Methode der konjugierten Gradienten und ihrer Modifikationen auf die Lösung linearer Randwertprobleme," Thesis E. T. H., Zurich, Switzerland, in manuscript.

Los Angeles, May 8, 1952.