Numerical Analysis Lecture Notes: 7. Iterative Methods For Linear Systems
Peter J. Olver
u^(k+1) = T u^(k),      u^(0) = a.      (7.1)

The coefficient matrix T has size n × n. We will consider both real and complex systems, and so the iterates u^(k) are vectors either in R^n (which assumes that the coefficient matrix T is also real) or in C^n. For k = 1, 2, 3, ..., the solution u^(k) is uniquely determined by the initial conditions u^(0) = a.
Powers of Matrices
The solution to the general linear iterative system (7.1) is, at least at first glance, immediate. Clearly,

u^(1) = T u^(0) = T a,      u^(2) = T u^(1) = T^2 a,      u^(3) = T u^(2) = T^3 a,

Warning: The superscripts on u^(k) refer to the iterate number, and should not be mistaken for derivatives.
5/18/08      103      © 2008 Peter J. Olver
and, in general,

u^(k) = T^k a.      (7.2)
Thus, the iterates are simply determined by multiplying the initial vector a by the successive powers of the coefficient matrix T . And so, unlike differential equations, proving the
existence and uniqueness of solutions to an iterative system is completely trivial.
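The iterate formula (7.2) is easy to check on a computer: repeatedly applying T to a reproduces T^k a. A minimal sketch follows; the function names `mat_vec` and `iterate` and the sample matrix are ours, not from the notes.

```python
# Sketch: iterates of u^(k+1) = T u^(k) are the matrix powers T^k applied to a.
def mat_vec(T, v):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]

def iterate(T, a, k):
    """Apply u -> T u a total of k times, starting from u^(0) = a."""
    u = list(a)
    for _ in range(k):
        u = mat_vec(T, u)
    return u

T = [[0.6, 0.2], [0.2, 0.6]]   # a sample 2 x 2 coefficient matrix
a = [1.0, 0.0]                 # initial vector
print(iterate(T, a, 3))        # the third iterate u^(3) = T^3 a
```
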
However, unlike real or complex scalars, the general formulae and qualitative behavior
of the powers of a square matrix are not nearly so immediately apparent. (Before continuing, the reader is urged to experiment with simple 2 × 2 matrices, trying to detect
patterns.) To make progress, recall how we managed to solve linear systems of differential
equations by suitably adapting the known exponential solution from the scalar version.
In the iterative case, the scalar solution formula (2.8) is written in terms of powers, not
exponentials. This motivates us to try the power ansatz
u^(k) = λ^k v,      (7.3)
in which λ is a scalar and v is a fixed vector, as a possible solution to the system. We find

u^(k+1) = λ^(k+1) v,   while   T u^(k) = T (λ^k v) = λ^k T v.

These two expressions will be equal if and only if T v = λ v, that is, if and only if v is an eigenvector of T with eigenvalue λ. When the coefficient matrix is complete, these basic solutions combine to give the general solution.

Theorem 7.2. If the coefficient matrix T is complete, then the general solution to the linear iterative system (7.1) is

u^(k) = c1 λ1^k v1 + c2 λ2^k v2 + ⋯ + cn λn^k vn,      (7.4)

where v1, ..., vn are the linearly independent eigenvectors and λ1, ..., λn the corresponding eigenvalues of T. The coefficients c1, ..., cn are arbitrary scalars and are uniquely prescribed by the initial conditions u^(0) = a.
Proof : Since we already know that (7.4) is a solution to the system for arbitrary
c1 , . . . , cn , it suffices to show that we can match any prescribed initial conditions. To this
end, we need to solve the linear system
u^(0) = c1 v1 + ⋯ + cn vn = a.      (7.5)
Completeness of T implies that its eigenvectors form a basis of C^n, and hence (7.5) always admits a solution. In matrix form, we can rewrite (7.5) as
S c = a,   so that   c = S⁻¹ a,
x^(k+1) = (3/5) x^(k) + (1/5) y^(k),      x^(0) = a,
y^(k+1) = (1/5) x^(k) + (3/5) y^(k),      y^(0) = b.      (7.6)

The coefficient matrix has eigenvalues λ1 = .8 and λ2 = .4, with corresponding eigenvectors v1 = ( 1, 1 )^T and v2 = ( −1, 1 )^T, so the two basic solutions are

u1^(k) = (.8)^k ( 1, 1 )^T,      u2^(k) = (.4)^k ( −1, 1 )^T.

Theorem 7.2 tells us that the general solution is given as a linear combination,

u^(k) = c1 u1^(k) + c2 u2^(k) = c1 (.8)^k ( 1, 1 )^T + c2 (.4)^k ( −1, 1 )^T = ( c1 (.8)^k − c2 (.4)^k,  c1 (.8)^k + c2 (.4)^k )^T,

where c1, c2 are determined by the initial conditions:

u^(0) = ( c1 − c2,  c1 + c2 )^T = ( a, b )^T,   and hence   c1 = (a + b)/2,   c2 = (b − a)/2.

Therefore, the explicit formula for the solution to the initial value problem (7.6) is

x^(k) = (.8)^k (a + b)/2 + (.4)^k (a − b)/2,      y^(k) = (.8)^k (a + b)/2 + (.4)^k (b − a)/2.
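The closed-form solution can be checked against direct iteration of the scheme; the sketch below is our own code (with a = 1, b = 0 chosen arbitrarily) and confirms that the two agree.

```python
# Compare the explicit formula for x^(k), y^(k) with direct iteration of (7.6).
def closed_form(a, b, k):
    s, d = (a + b) / 2, (a - b) / 2          # s = c1 = (a+b)/2, d = (a-b)/2
    return (0.8**k * s + 0.4**k * d, 0.8**k * s - 0.4**k * d)

def iterate(a, b, k):
    x, y = a, b
    for _ in range(k):
        x, y = 0.6 * x + 0.2 * y, 0.2 * x + 0.6 * y
    return (x, y)

for k in range(8):
    xc, yc = closed_form(1.0, 0.0, k)
    xi, yi = iterate(1.0, 0.0, k)
    assert abs(xc - xi) < 1e-12 and abs(yc - yi) < 1e-12
print("closed form matches direct iteration")
```
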
Figure 7.1.
Example 7.4. The Fibonacci numbers are defined by the second order iterative
scheme
u(k+2) = u(k+1) + u(k) ,
(7.8)
with initial conditions
u(0) = a,
u(1) = b.
(7.9)
In short, to obtain the next Fibonacci number, add the previous two. The classical Fibonacci integers start with a = 0, b = 1; the next few are
u(0) = 0, u(1) = 1, u(2) = 1, u(3) = 2, u(4) = 3, u(5) = 5, u(6) = 8, u(7) = 13, . . . .
The Fibonacci integers occur in a surprising variety of natural objects, including leaves,
flowers, and fruit, [52]. They were originally introduced by the medieval Italian mathematician Fibonacci (Leonardo of Pisa) as a crude model of the growth of a population of rabbits. In Fibonacci's model, the k-th Fibonacci number u^(k) measures the total number
of pairs of rabbits at year k. We start the process with a single juvenile pair at year 0.
Once a year, each pair of rabbits produces a new pair of offspring, but it takes a full year
for a rabbit pair to mature enough to produce offspring of their own.
Just as every higher order ordinary differential equation can be replaced by an equivalent first order system, so every higher order iterative equation can be replaced by a first order iterative system†. In the Fibonacci case, we introduce the vector u^(k) = ( u^(k), u^(k+1) )^T, whereupon the second order scheme (7.8) assumes the first order form u^(k+1) = T u^(k), where

T = ( 0  1
      1  1 ).

† In general, an iterative system u^(k+j) = T1 u^(k+j−1) + ⋯ + Tj u^(k) in which the new iterate depends upon the preceding j values is said to have order j.
To find the explicit formula for the Fibonacci numbers, we must determine the eigenvalues and eigenvectors of the coefficient matrix T. A straightforward computation produces

λ1 = (1 + √5)/2 ≈ 1.618034,      λ2 = (1 − √5)/2 ≈ −.618034,

v1 = ( (−1 + √5)/2, 1 )^T,      v2 = ( (−1 − √5)/2, 1 )^T.

Therefore, the general solution is

u^(k) = c1 ((1 + √5)/2)^k v1 + c2 ((1 − √5)/2)^k v2.      (7.10)

The initial data

u^(0) = c1 v1 + c2 v2 = ( a, b )^T

yields

c1 = (2 a + (1 + √5) b) / (2 √5),      c2 = −(2 a + (1 − √5) b) / (2 √5).
The first entry of the solution vector (7.10) produces the explicit formula

u^(k) = ((−1 + √5) a + 2 b)/(2 √5) · ((1 + √5)/2)^k + ((1 + √5) a − 2 b)/(2 √5) · ((1 − √5)/2)^k      (7.11)

for the k-th Fibonacci number. For the particular initial conditions a = 0, b = 1, (7.11) reduces to the classical Binet formula

u^(k) = (1/√5) [ ((1 + √5)/2)^k − ((1 − √5)/2)^k ]      (7.12)

for the k-th Fibonacci integer. It is a remarkable fact that, for every value of k, all the √5's cancel out, and the Binet formula does indeed produce the Fibonacci integers listed above.
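This cancellation is easy to confirm numerically; in the sketch below (our own code, not part of the notes) the floating-point value of (7.12) is rounded to the nearest integer to absorb round-off error.

```python
# Binet's formula (7.12) reproduces the integer Fibonacci sequence.
from math import sqrt

def binet(k):
    """kth Fibonacci number via (7.12); rounding absorbs floating-point error."""
    phi = (1 + sqrt(5)) / 2
    psi = (1 - sqrt(5)) / 2
    return round((phi**k - psi**k) / sqrt(5))

fib = [0, 1]                       # classical initial conditions a = 0, b = 1
while len(fib) < 12:
    fib.append(fib[-1] + fib[-2])

assert [binet(k) for k in range(12)] == fib
print(fib)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```
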
Another useful observation is that, since

0 < |λ2| = (√5 − 1)/2 < 1 < λ1 = (1 + √5)/2,
Figure 7.2.
Fibonacci Iteration.
the terms involving λ1^k go to ∞ (and so the zero solution to this iterative system is unstable) while the terms involving λ2^k go to zero. Therefore, even for k moderately large, the first term in (7.11) is an excellent approximation (and one that gets more and more accurate with increasing k) to the k-th Fibonacci number. A plot of the first 4 iterates, starting with the initial data consisting of equally spaced points on the unit circle, can be seen in Figure 7.2. As in the previous example, the circle is mapped to a sequence of progressively more eccentric ellipses; however, their major semi-axes become more and more stretched out, and almost all points end up going off to ∞.
The dominant eigenvalue λ1 = (1 + √5)/2 = 1.618034... is known as the golden ratio and plays an important role in spiral growth in nature, as well as in art, architecture and design, [52]. It describes the overall growth rate of the Fibonacci integers and, in fact, of any sequence of Fibonacci numbers with initial conditions b ≠ (1 − √5) a/2.
Example 7.5. Let

T = ( −3   1   6
       1  −1  −2
      −1  −1   0 )

be the coefficient matrix for a three-dimensional iterative system u^(k+1) = T u^(k). Its eigenvalues and corresponding eigenvectors are

λ1 = −2,                λ2 = −1 + i,                 λ3 = −1 − i,
v1 = ( 4, −2, 1 )^T,    v2 = ( 2 − i, −1, 1 )^T,     v3 = ( 2 + i, −1, 1 )^T.

Therefore, according to (7.4), the general complex solution to the iterative system is

u^(k) = b1 (−2)^k ( 4, −2, 1 )^T + b2 (−1 + i)^k ( 2 − i, −1, 1 )^T + b3 (−1 − i)^k ( 2 + i, −1, 1 )^T,

where b1, b2, b3 are arbitrary complex scalars. To extract real solutions, note that −1 + i = √2 e^{3πi/4}, and hence

(−1 + i)^k ( 2 − i, −1, 1 )^T = 2^(k/2) ( 2 cos(3kπ/4) + sin(3kπ/4), −cos(3kπ/4), cos(3kπ/4) )^T
                              + i 2^(k/2) ( 2 sin(3kπ/4) − cos(3kπ/4), −sin(3kπ/4), sin(3kπ/4) )^T.

Taking real and imaginary parts produces two independent real solutions, and so the general real solution is

u^(k) = c1 (−2)^k ( 4, −2, 1 )^T
      + c2 2^(k/2) ( 2 cos(3kπ/4) + sin(3kπ/4), −cos(3kπ/4), cos(3kπ/4) )^T
      + c3 2^(k/2) ( 2 sin(3kπ/4) − cos(3kπ/4), −sin(3kπ/4), sin(3kπ/4) )^T,      (7.13)
where c1 , c2 , c3 are arbitrary real scalars, uniquely prescribed by the initial conditions.
7.2. Stability.
With the solution formula (7.4) in hand, we are now in a position to understand
the qualitative behavior of solutions to (complete) linear iterative systems. The most
important case for applications is when all the iterates converge to 0.
Definition 7.6. The equilibrium solution u⋆ = 0 to a linear iterative system (7.1) is called asymptotically stable if and only if all solutions u^(k) → 0 as k → ∞.
Asymptotic stability relies on the following property of the coefficient matrix.
Definition 7.7. A matrix T is called convergent if its powers converge to the zero matrix, T^k → O, meaning that the individual entries of T^k all go to 0 as k → ∞.
The equivalence of the convergence condition and stability of the iterative system
follows immediately from the solution formula (7.2).
Proposition 7.8. The linear iterative system u^(k+1) = T u^(k) has an asymptotically stable zero solution if and only if T is a convergent matrix.
(7.14)

prescribed by the vector's maximal entry (in modulus) is usually much easier to work with. Convergence of the iterates is equivalent to convergence of their norms:

u^(k) → 0   if and only if   ‖u^(k)‖ → 0   as   k → ∞.
The fundamental stability criterion for linear iterative systems relies on the size of the
eigenvalues of the coefficient matrix.
Theorem 7.9. A linear iterative system (7.1) is asymptotically stable if and only if all its (complex) eigenvalues have modulus strictly less than one: |λj| < 1.
Proof : Let us prove this result assuming that the coefficient matrix T is complete.
(The proof in the incomplete case relies on the Jordan canonical form, and is outlined in
the exercises.) If λj is an eigenvalue such that |λj| < 1, then the corresponding basis solution u_j^(k) = λj^k vj satisfies

‖u_j^(k)‖ = ‖λj^k vj‖ = |λj|^k ‖vj‖ → 0,   since   |λj| < 1.

Therefore, if all eigenvalues are less than 1 in modulus, all terms in the solution formula (7.4) tend to zero, which proves asymptotic stability: u^(k) → 0. Conversely, if any eigenvalue satisfies |λj| ≥ 1, then the solution u^(k) = λj^k vj does not tend to 0 as k → ∞, and hence 0 is not asymptotically stable.   Q.E.D.
Consequently, the necessary and sufficient condition for asymptotic stability of a linear iterative system is that all the eigenvalues of the coefficient matrix lie strictly inside the unit circle in the complex plane: |λj| < 1.

Definition 7.10. The spectral radius of a matrix T is defined as the maximal modulus of all of its real and complex eigenvalues: ρ(T) = max { |λ1|, ..., |λk| }.
We can then restate the Stability Theorem 7.9 as follows:
Theorem 7.11. The matrix T is convergent if and only if its spectral radius is strictly less than one: ρ(T) < 1.
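Theorem 7.11 yields a practical numerical test for convergence: compute the eigenvalues and compare the largest modulus to 1. A sketch using NumPy (our choice of tool, not part of the notes):

```python
# Convergence test via the spectral radius (Theorem 7.11).
import numpy as np

def spectral_radius(T):
    """Maximal modulus of the (possibly complex) eigenvalues of T."""
    return max(abs(np.linalg.eigvals(T)))

T = np.array([[0.6, 0.2], [0.2, 0.6]])     # eigenvalues .8 and .4
rho = spectral_radius(T)
assert rho < 1                             # hence T is a convergent matrix
assert np.allclose(np.linalg.matrix_power(T, 80), 0, atol=1e-6)   # T^k -> O
```
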
If T is complete, then applying the triangle inequality to the solution formula (7.4) gives the bound

‖u^(k)‖ ≤ |λ1|^k ‖c1 v1‖ + ⋯ + |λn|^k ‖cn vn‖
        ≤ ρ(T)^k ( |c1| ‖v1‖ + ⋯ + |cn| ‖vn‖ ) = C ρ(T)^k,      (7.15)

for some constant C > 0 that depends only upon the initial conditions. In particular, if ρ(T) < 1, then

‖u^(k)‖ ≤ C ρ(T)^k → 0   as   k → ∞,      (7.16)

in accordance with Theorem 7.11. Thus, the spectral radius prescribes the rate of convergence of the solutions to equilibrium. The smaller the spectral radius, the faster the solutions go to 0.
If T has only one largest (simple) eigenvalue, so |λ1| > |λj| for all j > 1, then the first term in the solution formula (7.4) will eventually dominate all the others: ‖λ1^k v1‖ ≫ ‖λj^k vj‖ for j > 1 and k ≫ 0. Therefore, provided that c1 ≠ 0, the solution (7.4) has the asymptotic formula

u^(k) ≈ c1 λ1^k v1,      (7.17)

and so most solutions end up parallel to v1. In particular, if |λ1| = ρ(T) < 1, such a solution approaches 0 along the direction of the dominant eigenvector v1 at a rate governed by the modulus of the dominant eigenvalue. The exceptional solutions, with c1 = 0, tend to 0 at a faster rate, along one of the other eigendirections. In practical computations, one rarely observes the exceptional solutions. Indeed, even if the initial condition does not involve the dominant eigenvector, round-off error during the iteration will almost inevitably introduce a small component in the direction of v1, which will, if you wait long enough, eventually dominate the computation.
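This alignment with the dominant eigenvector is the idea behind the power method for computing eigenvalues. A small sketch (our own example, not from the notes):

```python
# Iterates of a convergent matrix align with the dominant eigenvector, as in (7.17).
import numpy as np

T = np.array([[0.6, 0.2], [0.2, 0.6]])   # dominant eigenvalue .8, eigenvector (1, 1)
u = np.array([1.0, 0.0])                 # generic start, so c1 != 0
for _ in range(50):
    u = T @ u
direction = u / np.linalg.norm(u)        # normalize away the |lambda_1|^k decay
assert np.allclose(direction, [1 / np.sqrt(2), 1 / np.sqrt(2)], atol=1e-9)
```
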
Warning: The inequality (7.15) only applies to complete matrices. In the general case, one can prove that the solution satisfies the slightly weaker inequality

‖u^(k)‖ ≤ C σ^k   for all   k ≥ 0,   where   σ > ρ(T)      (7.18)

is any number larger than the spectral radius, while C > 0 is a positive constant (whose value may depend on how close σ is to ρ).
Example 7.12. According to Example 7.5, the matrix

T = ( −3   1   6
       1  −1  −2
      −1  −1   0 )

has eigenvalues λ1 = −2, λ2 = −1 + i, λ3 = −1 − i. Since |λ1| = 2 while |λ2| = |λ3| = √2, the spectral radius is ρ(T) = 2, and hence T is not a convergent matrix. On the other hand, the scaled matrix

T̃ = (1/3) T = ( −1     1/3    2
                 1/3  −1/3  −2/3
                −1/3  −1/3    0  ),

with eigenvalues

λ1 = −2/3,      λ2 = −1/3 + (1/3) i,      λ3 = −1/3 − (1/3) i,
has spectral radius ρ(T̃) = 2/3, and hence is a convergent matrix. According to (7.17), if we write the initial data u^(0) = c1 v1 + c2 v2 + c3 v3 as a linear combination of the eigenvectors, then, provided c1 ≠ 0, the iterates have the asymptotic form u^(k) ≈ c1 (−2/3)^k v1, where v1 = ( 4, −2, 1 )^T is the eigenvector corresponding to the dominant eigenvalue λ1 = −2/3. Thus, for most initial vectors, the iterates end up decreasing in length by a factor of almost exactly 2/3, eventually becoming parallel to the dominant eigenvector v1. This is borne out by a sample computation: starting with u^(0) = ( 1, 1, 1 )^T, the later iterates have entries whose magnitudes are

( .0936, .0462, .0231 )^T,   ( .0627, .0312, .0158 )^T,   ( .0416, .0208, .0105 )^T,
( .0275, .0138, .0069 )^T,   ( .0182, .0091, .0046 )^T,   ( .0121, .0061, .0030 )^T,
( .0081, .0040, .0020 )^T,   ( .0054, .0027, .0013 )^T,   ( .0036, .0018, .0009 )^T,
( .0024, .0012, .0006 )^T,

the overall sign alternating from one iterate to the next since λ1 = −2/3 is negative. Each iterate is shorter than its predecessor by a factor of roughly 2/3, and the entries stand in the ratio 4 : 2 : 1, confirming the alignment with the dominant eigenvector v1.
(7.19)
(7.20)
Therefore, A v = 0 for every v ∈ R^n, which implies A = O is the zero matrix. This serves to prove the positivity property. As for homogeneity, if c ∈ R is any scalar,

‖c A‖ = max { ‖c A u‖ } = max { |c| ‖A u‖ } = |c| max { ‖A u‖ } = |c| ‖A‖.

Finally, to prove the triangle inequality, we use the fact that the maximum of the sum of quantities is bounded by the sum of their individual maxima. Therefore, since the norm on R^n satisfies the triangle inequality,

‖A + B‖ = max { ‖A u + B u‖ } ≤ max { ‖A u‖ + ‖B u‖ }
        ≤ max { ‖A u‖ } + max { ‖B u‖ } = ‖A‖ + ‖B‖.      Q.E.D.
The property that distinguishes a matrix norm from a generic norm on the space of
matrices is the fact that it also obeys a very useful product inequality.
Theorem 7.14. A natural matrix norm satisfies

‖A v‖ ≤ ‖A‖ ‖v‖,   for all   A ∈ M_{n×n},  v ∈ R^n.      (7.21)

Furthermore,

‖A B‖ ≤ ‖A‖ ‖B‖,   for all   A, B ∈ M_{n×n}.      (7.22)

Indeed, applying (7.21) twice, ‖A B u‖ ≤ ‖A‖ ‖B u‖, and hence

‖A B‖ = max { ‖A B u‖ } ≤ max { ‖A‖ ‖B u‖ } = ‖A‖ max { ‖B u‖ } = ‖A‖ ‖B‖.      Q.E.D.
The multiplicative inequality (7.22) implies, in particular, that ‖A²‖ ≤ ‖A‖²; equality is not necessarily valid. More generally:

Proposition 7.15. If A is a square matrix, then ‖A^k‖ ≤ ‖A‖^k. In particular, if ‖A‖ < 1, then ‖A^k‖ → 0 as k → ∞, and hence A is a convergent matrix: A^k → O.

The converse is not quite true; a convergent matrix does not necessarily have matrix norm less than 1, or even ≤ 1; see Example 7.20 below. An alternative proof of Proposition 7.15 can be based on the following useful estimate:

Theorem 7.16. The spectral radius of a matrix is bounded by its matrix norm:

ρ(A) ≤ ‖A‖.      (7.23)
Proof: If λ is a real eigenvalue of A, and u a corresponding real unit eigenvector, so that A u = λ u with ‖u‖ = 1, then

‖A u‖ = ‖λ u‖ = |λ| ‖u‖ = |λ|.      (7.24)

Since ‖A‖ is the maximum of ‖A u‖ over all possible unit vectors, this implies that

|λ| ≤ ‖A‖.      (7.25)

If all the eigenvalues of A are real, then the spectral radius is the maximum of their absolute values, and so it too is bounded by ‖A‖, proving (7.23).
If A has complex eigenvalues, then we need to work a little harder to establish (7.25). (This is because the matrix norm is defined by the effect of A on real vectors, and so we cannot directly use the complex eigenvectors to establish the required bound.) Let λ = r e^{iθ} be a complex eigenvalue with complex eigenvector z = x + i y. Define

m = min { ‖Re(e^{iφ} z)‖ = ‖(cos φ) x − (sin φ) y‖ : 0 ≤ φ ≤ 2π }.      (7.26)

Since the indicated subset is a closed curve (in fact, an ellipse) that does not go through the origin, m > 0. Let φ0 denote the value of the angle that produces the minimum, so

m = ‖(cos φ0) x − (sin φ0) y‖ = ‖Re(e^{iφ0} z)‖.

Define the real vector

u = Re(e^{iφ0} z) / m,   so that   ‖u‖ = 1.

Then, since A z = λ z,

A u = (1/m) Re(e^{iφ0} A z) = (1/m) Re(e^{iφ0} r e^{iθ} z) = (r/m) Re(e^{i(φ0 + θ)} z).      (7.27)

Therefore, by the definition of m,

‖A u‖ = (r/m) ‖Re(e^{i(φ0 + θ)} z)‖ ≥ (r/m) m = r = |λ|,

and hence ‖A‖ ≥ |λ|.      Q.E.D.
Explicit Formulae

Let us now determine the explicit formulae for the matrix norms induced by our most important vector norms on R^n. The simplest to handle is the ∞ norm

‖v‖∞ = max { |v1|, ..., |vn| }.

Definition 7.17. The ith absolute row sum of a matrix A is the sum of the absolute values of the entries in the ith row:

s_i = |a_{i1}| + ⋯ + |a_{in}| = Σ_{j=1}^n |a_{ij}|.      (7.28)

Proposition 7.18. The ∞ matrix norm of a matrix A equals its maximal absolute row sum:

‖A‖∞ = max { s_1, ..., s_n } = max { Σ_{j=1}^n |a_{ij}| : 1 ≤ i ≤ n }.      (7.29)
Proof: Let s = max { s_1, ..., s_n } denote the right hand side of (7.29). Given any v ∈ R^n, we compute

‖A v‖∞ = max { | Σ_{j=1}^n a_{ij} v_j | } ≤ max { Σ_{j=1}^n |a_{ij} v_j| }
       ≤ max { Σ_{j=1}^n |a_{ij}| } max { |v_j| } = s ‖v‖∞,

and hence ‖A‖∞ ≤ s. Next, suppose the maximal absolute row sum occurs at row i, so that

s_i = Σ_{j=1}^n |a_{ij}| = s.      (7.30)

Let u ∈ R^n be the specific vector that has the following entries: u_j = +1 if a_{ij} > 0, while u_j = −1 if a_{ij} < 0. Then ‖u‖∞ = 1. Moreover, since a_{ij} u_j = |a_{ij}|, the ith entry of A u is equal to the ith absolute row sum (7.30). This implies that

‖A‖∞ ≥ ‖A u‖∞ ≥ s.      Q.E.D.
Combining Propositions 7.15 and 7.18, we have established the following convergence criterion.

Corollary 7.19. If all the absolute row sums of A are strictly less than 1, then ‖A‖∞ < 1, and hence A is a convergent matrix.
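The absolute row sums give a one-line convergence test. A minimal sketch (our own code, with a sample matrix):

```python
# Infinity matrix norm = maximal absolute row sum (Proposition 7.18).
def norm_inf(A):
    return max(sum(abs(entry) for entry in row) for row in A)

A = [[0.5, 1/3], [1/3, 0.25]]     # row sums 5/6 and 7/12, both < 1
assert norm_inf(A) < 1            # so A is convergent, by Corollary 7.19
print(norm_inf(A))                # 5/6 = 0.8333...
```
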
Example 7.20. Consider the matrix

A = ( 1/2  1/3
      1/3  1/4 ).

Its two absolute row sums are 1/2 + 1/3 = 5/6 and 1/3 + 1/4 = 7/12, so

‖A‖∞ = max { 5/6, 7/12 } = 5/6 ≈ .83333.

Since the norm is less than 1, A is a convergent matrix. Indeed, its eigenvalues are

λ1 = (9 + √73)/24 ≈ .7310,      λ2 = (9 − √73)/24 ≈ .0190,

and hence the spectral radius is

ρ(A) = (9 + √73)/24 ≈ .7310,

which is slightly smaller than its norm.
The row sum test for convergence is not always conclusive. For example, the matrix

A = ( 1/2  3/5
      3/5  1/4 )

has matrix norm

‖A‖∞ = 11/10 > 1.      (7.31)

On the other hand, its eigenvalues are (15 ± √601)/40, and hence its spectral radius is

ρ(A) = (15 + √601)/40 ≈ .98788,

which implies that A is (just barely) convergent, even though its maximal row sum is larger than 1.
The matrix norm associated with the Euclidean norm ‖v‖2 = √(v1² + ⋯ + vn²) is given by the largest singular value.

Proposition 7.21. The matrix norm corresponding to the Euclidean norm equals the maximal singular value:

‖A‖2 = σ1 = max { σ1, ..., σr },   r = rank A > 0,   while   ‖O‖2 = 0.      (7.32)
Unfortunately, as we discovered in Example 7.20, matrix norms are not a foolproof test of convergence. There exist convergent matrices, with ρ(A) < 1, that nevertheless have matrix norm ‖A‖ ≥ 1. In such cases, the matrix norm is not able to predict convergence of the iterative system, although one should expect the convergence to be quite slow. Although such pathology might show up in the chosen matrix norm, it turns out that one can always rig up some matrix norm for which ‖A‖ < 1. This follows from a more general result, whose proof can be found in [44].

Theorem 7.22. Let A have spectral radius ρ(A). If ε > 0 is any positive number, then there exists a matrix norm ‖·‖ such that

ρ(A) ≤ ‖A‖ < ρ(A) + ε.      (7.33)

Corollary 7.23. If A is a convergent matrix, then there exists a matrix norm such that ‖A‖ < 1.
Proof: By definition, A is convergent if and only if ρ(A) < 1. Choose ε > 0 such that ρ(A) + ε < 1. Any norm that satisfies (7.33) has the desired property.      Q.E.D.

Remark: Based on the accumulated evidence, one might be tempted to speculate that the spectral radius itself defines a matrix norm. Unfortunately, this is not the case. For example, the nonzero matrix

A = ( 0  1
      0  0 )

has zero spectral radius, ρ(A) = 0, in violation of a basic norm axiom.
u^(k+1) = T u^(k) + c,      u^(0) = u0,      (7.35)
3 x + y − z = 3,      x − 4 y + 2 z = −1,      −2 x − y + 5 z = 2,      (7.37)

has the matrix form A u = b, with

A = (  3   1  −1
       1  −4   2
      −2  −1   5 ),      u = ( x, y, z )^T,      b = ( 3, −1, 2 )^T.

One easy way to convert a linear system into a fixed-point form is to rewrite it as

u = u − A u + b = (I − A) u + b = T u + c,   where   T = I − A,   c = b.

In the present case,

T = I − A = ( −2  −1   1
              −1   5  −2
               2   1  −4 ),      c = b = ( 3, −1, 2 )^T.

The resulting iterative system u^(k+1) = T u^(k) + c has the explicit form

x^(k+1) = −2 x^(k) − y^(k) + z^(k) + 3,
y^(k+1) = −x^(k) + 5 y^(k) − 2 z^(k) − 1,      (7.38)
z^(k+1) = 2 x^(k) + y^(k) − 4 z^(k) + 2.
Another possibility is to solve the first equation in (7.37) for x, the second for y, and the third for z, so that

x = −(1/3) y + (1/3) z + 1,      y = (1/4) x + (1/2) z + 1/4,      z = (2/5) x + (1/5) y + 2/5.

The resulting fixed-point form is u = T̂ u + ĉ, in which

T̂ = (  0   −1/3   1/3
       1/4    0    1/2
       2/5   1/5    0  ),      ĉ = ( 1, 1/4, 2/5 )^T,

and the corresponding iterative scheme is

x^(k+1) = −(1/3) y^(k) + (1/3) z^(k) + 1,
y^(k+1) = (1/4) x^(k) + (1/2) z^(k) + 1/4,      (7.39)
z^(k+1) = (2/5) x^(k) + (1/5) y^(k) + 2/5.
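Running both schemes side by side shows the dramatic difference in behavior; the sketch below (our own code) iterates (7.38) and (7.39) from the zero vector.

```python
# The two fixed-point forms of (7.37): (7.38) diverges, (7.39) converges to (1, 1, 1).
def step_bad(x, y, z):                    # scheme (7.38), with T = I - A
    return (-2*x - y + z + 3, -x + 5*y - 2*z - 1, 2*x + y - 4*z + 2)

def step_good(x, y, z):                   # scheme (7.39), the Jacobi-style rearrangement
    return (-y/3 + z/3 + 1, x/4 + z/2 + 0.25, 2*x/5 + y/5 + 0.4)

u = (0.0, 0.0, 0.0)
for _ in range(11):
    u = step_bad(*u)
assert max(abs(c) for c in u) > 1e6       # the iterates blow up

v = (0.0, 0.0, 0.0)
for _ in range(30):
    v = step_good(*v)
assert max(abs(c - 1) for c in v) < 1e-6  # converges to the solution x = y = z = 1
```
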
Do the successive iterates converge to the solution x = y = z = 1? Starting both schemes at u^(0) = ( 0, 0, 0 )^T, the results are tabulated below.

u^(k+1) = T u^(k) + b:

  k        x^(k)           y^(k)           z^(k)
  0            0               0               0
  1            3              −1               2
  2            0             −13              −1
  3           15             −64              −7
  4           30            −322              −4
  5          261           −1633            −244
  6          870           −7939            −133
  7         6069          −40300           −5665
  8        22500         −196240           −5500
  9       145743         −992701         −129238
 10       571980        −4850773         −184261
 11      3522555       −24457324        −2969767

u^(k+1) = T̂ u^(k) + ĉ:

  k     x^(k)       y^(k)       z^(k)
  0     0           0           0
  1     1           .25         .4
  2     1.05        .7          .85
  3     1.05        .9375       .96
  4     1.0075      .9925       1.0075
  5     1.005       1.00562     1.0015
  6     .9986       1.002       1.0031
  7     1.0004      1.0012      .9999
  8     .9995       1.0000      1.0004
  9     1.0001      1.0001      .9998
 10     .9999       .9999       1.0001
 11     1.0000      1.0000      1.0000
For the first scheme, the answer is clearly no: the iterates become wilder and wilder. Indeed, this occurs no matter how close the initial guess u^(0) is to the actual solution, unless u^(0) = u⋆ happens to be exactly equal. In the second case, the iterates do converge to the solution, and it does not take too long, even starting from a poor initial guess, to obtain a reasonably accurate approximation. Of course, in such a simple example, it would be silly to use iteration, when Gaussian Elimination can be done by hand and produces the solution almost immediately. However, we use the small examples for illustrative purposes, bringing the full power of iterative schemes to bear on the large linear systems arising in applications.
The convergence of solutions to (7.35) to the fixed point u⋆ is based on the behavior of the error vectors

e^(k) = u^(k) − u⋆,      (7.40)

which measure how close the iterates are to the true solution. Let us find out how the successive error vectors are related. We compute

e^(k+1) = u^(k+1) − u⋆ = (T u^(k) + c) − (T u⋆ + c) = T (u^(k) − u⋆) = T e^(k),      (7.41)

so the error vectors satisfy a linear iterative system with the same coefficient matrix T. Therefore, they are given by the explicit formula

e^(k) = T^k e^(0).

Now, the solutions to (7.35) converge to the fixed point, u^(k) → u⋆, if and only if the error vectors converge to zero: e^(k) → 0 as k → ∞. Our analysis of linear iterative systems, as summarized in Proposition 7.8, establishes the following basic convergence result.
Proposition 7.25. The affine iterative system (7.35) will converge to the solution to the fixed point equation (7.36) if and only if T is a convergent matrix: ρ(T) < 1.

The spectral radius ρ(T) of the coefficient matrix will govern the speed of convergence. Therefore, our main goal is to construct an iterative scheme whose coefficient matrix has as small a spectral radius as possible. At the very least, the spectral radius must be less than 1. For the two iterative schemes presented in Example 7.24, the spectral radii of the coefficient matrices are found to be

ρ(T) ≈ 4.9675,      ρ(T̂) = .5.

Therefore, T is not a convergent matrix, which explains the wild behavior of its iterates, whereas T̂ is convergent, and one expects the error to roughly decrease by a factor of 1/2 at each step.
The Jacobi Method
The first general iterative scheme for solving linear systems is based on the same
simple idea used in our illustrative Example 7.24. Namely, we solve the ith equation in the
system A u = b, which is
Σ_{j=1}^n a_{ij} u_j = b_i,

for the ith variable u_i. To do this, we need to assume that all the diagonal entries of A are nonzero: a_{ii} ≠ 0. The result is

u_i = −(1/a_{ii}) Σ_{j≠i} a_{ij} u_j + b_i / a_{ii} = Σ_{j=1}^n t_{ij} u_j + c_i,      (7.42)

where

t_{ij} = −a_{ij}/a_{ii}   for   i ≠ j,      t_{ii} = 0,      and      c_i = b_i / a_{ii}.      (7.43)
The result has the form of a fixed-point system u = T u + c, and forms the basis of the Jacobi method

u^(k+1) = T u^(k) + c,      u^(0) = u0,      (7.44)

named after the influential nineteenth century German analyst Carl Jacobi. The explicit form of the Jacobi iterative scheme is

u_i^(k+1) = −(1/a_{ii}) Σ_{j≠i} a_{ij} u_j^(k) + b_i / a_{ii}.      (7.45)
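The update (7.45) translates directly into code. A minimal sketch (our own implementation, tested on the system (7.37)):

```python
# A minimal Jacobi iteration implementing (7.45); assumes a_ii != 0.
def jacobi(A, b, u0, iters):
    n = len(b)
    u = list(u0)
    for _ in range(iters):
        # every new entry is computed from the OLD iterate u
        u = [(b[i] - sum(A[i][j] * u[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return u

A = [[3, 1, -1], [1, -4, 2], [-2, -1, 5]]    # the system (7.37)
b = [3, -1, 2]
u = jacobi(A, b, [0.0, 0.0, 0.0], 40)
assert max(abs(c - 1) for c in u) < 1e-8     # exact solution is (1, 1, 1)
```
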
In matrix language, the Jacobi method relies on decomposing the coefficient matrix

A = L + D + U      (7.46)

into the sum of a strictly lower triangular matrix L, a diagonal matrix D, and a strictly upper triangular matrix U, each of which is uniquely specified. For example, when
A = (  3   1  −1
       1  −4   2
      −2  −1   5 ),      (7.47)

the decomposition (7.46) yields

L = (  0   0  0          D = ( 3   0  0          U = ( 0  1  −1
       1   0  0 ),             0  −4  0 ),             0  0   2 ).
      −2  −1  0 )              0   0  5 )              0  0   0 )
In matrix form, the system A u = b becomes (L + D + U) u = b, or, equivalently,

D u = −(L + U) u + b,

which is the fixed-point equation u = T u + c with

T = −D⁻¹(L + U),      c = D⁻¹ b.      (7.48)

For the matrix (7.47),

T = −D⁻¹(L + U) = (  0   −1/3   1/3
                     1/4    0    1/2
                     2/5   1/5    0  ),

which is precisely the Jacobi matrix T̂ constructed in Example 7.24.
Deciding in advance whether or not the Jacobi method will converge is not easy.
However, it can be shown that Jacobi iteration is guaranteed to converge when the original
coefficient matrix has large diagonal entries, in accordance with Definition 6.25.
Theorem 7.26. If A is strictly diagonally dominant, then the associated Jacobi
iteration scheme converges.
Proof: We shall prove that ‖T‖∞ < 1, and so Corollary 7.19 implies that T is a convergent matrix. The absolute row sums of the Jacobi matrix T = −D⁻¹(L + U) are, according to (7.43),

s_i = Σ_{j=1}^n |t_{ij}| = (1/|a_{ii}|) Σ_{j≠i} |a_{ij}| < 1,      (7.49)

because A is strictly diagonally dominant: |a_{ii}| > Σ_{j≠i} |a_{ij}|.      Q.E.D.
Example 7.27. Consider the linear system

4 x + y + w = 1,
x + 4 y + z + v = 2,
y + 4 z + w = −1,
x + z + 4 w + v = 2,
y + w + 4 v = 1.

The Jacobi method solves the respective equations for x, y, z, w, v, leading to the iterative scheme

x^(k+1) = −(1/4) y^(k) − (1/4) w^(k) + 1/4,
y^(k+1) = −(1/4) x^(k) − (1/4) z^(k) − (1/4) v^(k) + 1/2,
z^(k+1) = −(1/4) y^(k) − (1/4) w^(k) − 1/4,
w^(k+1) = −(1/4) x^(k) − (1/4) z^(k) − (1/4) v^(k) + 1/2,
v^(k+1) = −(1/4) y^(k) − (1/4) w^(k) + 1/4.

The coefficient matrix

A = ( 4  1  0  1  0
      1  4  1  0  1
      0  1  4  1  0
      1  0  1  4  1
      0  1  0  1  4 )
is diagonally dominant, and so we are guaranteed that the Jacobi iterations will eventually converge to the solution. Indeed, the Jacobi scheme takes the iterative form (7.48), with

T = (   0   −1/4    0   −1/4    0
      −1/4    0   −1/4    0   −1/4
        0   −1/4    0   −1/4    0
      −1/4    0   −1/4    0   −1/4
        0   −1/4    0   −1/4    0  ),      c = ( 1/4, 1/2, −1/4, 1/2, 1/4 )^T.
Note that ‖T‖∞ = 3/4 < 1, validating convergence of the scheme. Thus, to obtain, say, four decimal place accuracy in the solution, we estimate that it would take less than log(.5 × 10⁻⁴)/log .75 ≈ 34 iterates, assuming a moderate initial error. But the matrix norm always underestimates the true rate of convergence, as prescribed by the spectral radius ρ(T) = .6124, which would imply about log(.5 × 10⁻⁴)/log .6124 ≈ 20 iterations to obtain the desired accuracy. Indeed, starting with the initial guess x^(0) = y^(0) = z^(0) = w^(0) = v^(0) = 0, the Jacobi iterates converge to the exact solution

x = −.1,      y = .7,      z = −.6,      w = .7,      v = −.1.
To motivate the Gauss–Seidel method, let us write out the Jacobi iteration (7.44) in full detail:

u_1^(k+1) = t_{12} u_2^(k) + t_{13} u_3^(k) + ⋯ + t_{1n} u_n^(k) + c_1,
u_2^(k+1) = t_{21} u_1^(k) + t_{23} u_3^(k) + ⋯ + t_{2n} u_n^(k) + c_2,
u_3^(k+1) = t_{31} u_1^(k) + t_{32} u_2^(k) + t_{34} u_4^(k) + ⋯ + t_{3n} u_n^(k) + c_3,
   ⋮
u_n^(k+1) = t_{n1} u_1^(k) + t_{n2} u_2^(k) + t_{n3} u_3^(k) + ⋯ + t_{n,n−1} u_{n−1}^(k) + c_n,      (7.50)
where we are explicitly noting the fact that the diagonal entries of T vanish. Observe that we are using only the entries of u^(k) to compute all of the updated values u^(k+1). Presumably, if the iterates u^(k) are converging to the solution u⋆, then their individual entries are also converging, and so each u_j^(k+1) should be a better approximation to u_j⋆ than u_j^(k) is. Therefore, if we begin the iteration by computing u_1^(k+1) using the first equation, then we are tempted to use this new and improved value to replace u_1^(k) in each of the subsequent equations. In particular, we employ the modified equation

u_2^(k+1) = t_{21} u_1^(k+1) + t_{23} u_3^(k) + ⋯ + t_{2n} u_n^(k) + c_2

to update the second component of our iterate. This more accurate value should then be used to update u_3^(k+1), and so on.
The upshot of these considerations is the Gauss–Seidel method

u_i^(k+1) = t_{i1} u_1^(k+1) + ⋯ + t_{i,i−1} u_{i−1}^(k+1) + t_{i,i+1} u_{i+1}^(k) + ⋯ + t_{in} u_n^(k) + c_i,   i = 1, ..., n.      (7.51)
For the linear system

3 x + y − z = 3,      x − 4 y + 2 z = −1,      −2 x − y + 5 z = 2,

the Jacobi iteration method was given in (7.39). To construct the corresponding Gauss–Seidel scheme we use updated values of x, y and z as they become available. Explicitly,

x^(k+1) = −(1/3) y^(k) + (1/3) z^(k) + 1,
y^(k+1) = (1/4) x^(k+1) + (1/2) z^(k) + 1/4,      (7.52)
z^(k+1) = (2/5) x^(k+1) + (1/5) y^(k+1) + 2/5.

Starting with the initial guess u^(0) = ( 0, 0, 0 )^T, the successive iterates are
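The only change from Jacobi is that each component is overwritten in place as soon as it is computed. A minimal sketch (our own implementation):

```python
# Gauss-Seidel (7.51): identical to Jacobi except new entries are used immediately.
def gauss_seidel(A, b, u0, iters):
    n = len(b)
    u = list(u0)
    for _ in range(iters):
        for i in range(n):               # u[j] for j < i already holds the new value
            u[i] = (b[i] - sum(A[i][j] * u[j] for j in range(n) if j != i)) / A[i][i]
    return u

A = [[3, 1, -1], [1, -4, 2], [-2, -1, 5]]    # system (7.37), exact solution (1, 1, 1)
b = [3, -1, 2]
u = gauss_seidel(A, b, [0.0, 0.0, 0.0], 8)
assert max(abs(c - 1) for c in u) < 1e-3     # four-figure accuracy in 8 sweeps
```
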
u^(1) = ( 1.0000, .5000, .9000 )^T,      u^(2) = ( 1.1333, .9833, 1.0500 )^T,
u^(3) = ( 1.0222, 1.0306, 1.0150 )^T,    u^(4) = ( .9948, 1.0062, .9992 )^T,
u^(5) = ( .9977, .9990, .9989 )^T,       u^(6) = ( 1.0000, .9994, .9999 )^T,
u^(7) = ( 1.0001, 1.0000, 1.0001 )^T,    u^(8) = ( 1.0000, 1.0000, 1.0000 )^T,
and have converged to the solution, to 4 decimal place accuracy, after only 8 iterations, as opposed to the 11 iterations required by the Jacobi method.

The Gauss–Seidel iteration scheme is particularly suited to implementation on a serial computer, since one can immediately replace each component u_i^(k) by its updated value u_i^(k+1), thereby also saving on storage in the computer's memory. In contrast, the Jacobi scheme requires us to retain all the old values u^(k) until the new approximation u^(k+1) has been computed. Moreover, Gauss–Seidel typically (although not always) converges faster than Jacobi, making it the iterative algorithm of choice for serial processors. On the other hand, with the advent of parallel processing machines, variants of the parallelizable Jacobi scheme have recently been making a comeback.
What is Gauss–Seidel really up to? Let us rewrite the basic iterative equation (7.51) by multiplying by a_{ii} and moving the terms involving u^(k+1) to the left hand side. In view of the formula (7.43) for the entries of T, the resulting equation is

a_{i1} u_1^(k+1) + ⋯ + a_{i,i−1} u_{i−1}^(k+1) + a_{ii} u_i^(k+1) = − a_{i,i+1} u_{i+1}^(k) − ⋯ − a_{in} u_n^(k) + b_i,      (7.53)
and so can be viewed as a linear system of equations for u(k+1) with lower triangular
coefficient matrix L + D. Note that the fixed point of (7.53), namely the solution to
(L + D) u = − U u + b,

coincides with the solution to the original system

A u = (L + D + U) u = b.

In other words, the Gauss–Seidel procedure is merely implementing Forward Substitution to solve the lower triangular system (7.53) for the next iterate:

u^(k+1) = −(L + D)⁻¹ U u^(k) + (L + D)⁻¹ b.

The latter is in our more usual iterative form

u^(k+1) = T̃ u^(k) + c̃,   where   T̃ = −(L + D)⁻¹ U,      c̃ = (L + D)⁻¹ b.      (7.54)
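The matrix form can be implemented directly with a triangular solve at each sweep. A NumPy sketch (our own code, applied to the system (7.37)):

```python
# Matrix form (7.54): one Gauss-Seidel sweep solves (L + D) u_next = -U u + b.
import numpy as np

A = np.array([[3., 1., -1.], [1., -4., 2.], [-2., -1., 5.]])
b = np.array([3., -1., 2.])
LD = np.tril(A)                          # L + D: lower triangle plus diagonal
U = np.triu(A, 1)                        # strictly upper triangular part

u = np.zeros(3)
for _ in range(20):
    u = np.linalg.solve(LD, -U @ u + b)  # the forward-substitution step
assert np.allclose(u, [1., 1., 1.], atol=1e-8)

T_gs = -np.linalg.solve(LD, U)           # the Gauss-Seidel matrix -(L+D)^(-1) U
rho = max(abs(np.linalg.eigvals(T_gs)))
assert rho < 1                           # convergent, as expected
```
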
For the coefficient matrix (7.47), we have

L + D = (  3   0  0          U = ( 0  1  −1
           1  −4  0 ),             0  0   2 ),
          −2  −1  5 )              0  0   0 )

and hence the Gauss–Seidel matrix is

T̃ = −(L + D)⁻¹ U = ( 0  −.3333   .3333
                      0  −.0833   .5833
                      0  −.1500   .2500 ).

Its eigenvalues are 0 and −.0833 ± .2444 i, and hence its spectral radius is ρ(T̃) ≈ .2582. This is roughly the square of the Jacobi spectral radius of .5, which tells us that the Gauss–Seidel iterations will converge about twice as fast to the solution. This can be verified by more extensive computations. Although examples can be constructed where the Jacobi method converges faster, in many practical situations Gauss–Seidel tends to converge roughly twice as fast as Jacobi.
Completely general conditions guaranteeing convergence of the Gauss–Seidel method are hard to establish. But, like the Jacobi scheme, it is guaranteed to converge when the original coefficient matrix is strictly diagonally dominant.

Theorem 7.29. If A is strictly diagonally dominant, then the Gauss–Seidel iteration scheme for solving A u = b converges.
Proof: Let e^(k) = u^(k) − u⋆ denote the kth Gauss–Seidel error vector. As in (7.41), the error vectors satisfy the linear iterative system e^(k+1) = T̃ e^(k), but a direct estimate of ‖T̃‖∞ is not so easy. Instead, let us write out the linear iterative system in components:

e_i^(k+1) = t_{i1} e_1^(k+1) + ⋯ + t_{i,i−1} e_{i−1}^(k+1) + t_{i,i+1} e_{i+1}^(k) + ⋯ + t_{in} e_n^(k).      (7.55)

Let

m^(k) = ‖e^(k)‖∞ = max { |e_1^(k)|, ..., |e_n^(k)| }      (7.56)

denote the ∞ norm of the kth error vector. To prove convergence, e^(k) → 0, it suffices to show that m^(k) → 0 as k → ∞. We claim that diagonal dominance of A implies that

m^(k+1) ≤ s m^(k),   where   s = ‖T‖∞ < 1      (7.57)

denotes the matrix norm of the Jacobi matrix (not the Gauss–Seidel matrix), which, by (7.49), is less than 1. We infer that m^(k) ≤ s^k m^(0) → 0 as k → ∞, demonstrating the theorem.

To prove (7.57), we use induction on i = 1, ..., n. Our induction hypothesis is

|e_j^(k+1)| ≤ s m^(k) ≤ m^(k)   for   j = 1, ..., i − 1,

while, by (7.56), |e_j^(k)| ≤ m^(k) for all j = 1, ..., n. We use these two bounds to estimate |e_i^(k+1)| from (7.55):

|e_i^(k+1)| ≤ |t_{i1}| |e_1^(k+1)| + ⋯ + |t_{i,i−1}| |e_{i−1}^(k+1)| + |t_{i,i+1}| |e_{i+1}^(k)| + ⋯ + |t_{in}| |e_n^(k)|
           ≤ ( |t_{i1}| + ⋯ + |t_{in}| ) m^(k) ≤ s m^(k),

which completes the induction step. Letting i range over 1, ..., n establishes m^(k+1) ≤ s m^(k), proving (7.57).      Q.E.D.
Example 7.30. For the linear system considered in Example 7.27, the Gauss–Seidel iterations take the form

x^(k+1) = −(1/4) y^(k) − (1/4) w^(k) + 1/4,
y^(k+1) = −(1/4) x^(k+1) − (1/4) z^(k) − (1/4) v^(k) + 1/2,
z^(k+1) = −(1/4) y^(k+1) − (1/4) w^(k) − 1/4,
w^(k+1) = −(1/4) x^(k+1) − (1/4) z^(k+1) − (1/4) v^(k) + 1/2,
v^(k+1) = −(1/4) y^(k+1) − (1/4) w^(k+1) + 1/4.

Starting with x^(0) = y^(0) = z^(0) = w^(0) = v^(0) = 0, the Gauss–Seidel iterates converge to the solution x = −.1, y = .7, z = −.6, w = .7, v = −.1, to four decimal places in 11 iterations, again roughly twice as fast as the Jacobi scheme. Indeed, the convergence rate is governed by the corresponding Gauss–Seidel matrix

T̃ = −(L + D)⁻¹ U = ( 0  −.2500    0     −.2500    0
                      0   .0625  −.2500   .0625  −.2500
                      0  −.0156   .0625  −.2656   .0625
                      0   .0664  −.0156   .1289  −.2656
                      0  −.0322   .0664  −.0479   .1289 ).

Its spectral radius is ρ(T̃) = .3936, which is, as in the previous example, approximately the square of the spectral radius of the Jacobi coefficient matrix, which explains the speed up in convergence.
Successive OverRelaxation (SOR)
As we know, the smaller the spectral radius (or matrix norm) of the coefficient matrix,
the faster the convergence of the iterative scheme. One of the goals of researchers in
numerical linear algebra is to design new methods for accelerating the convergence. In his
1950 thesis, the American mathematician David Young discovered a simple modification of
the Jacobi and Gauss-Seidel methods that can, in favorable situations, lead to a dramatic
speed up in the rate of convergence. The method, known as successive over-relaxation,
and often abbreviated as SOR, has become the iterative method of choice in many modern
applications, [13, 53]. In this subsection, we provide a brief overview.
In practice, finding the optimal iterative algorithm to solve a given linear system is
as hard as solving the system itself. Therefore, researchers have relied on a few tried
and true techniques for designing iterative schemes that can be used in the more common
applications. Consider a linear algebraic system
A u = b.
Every decomposition

    A = M − N    (7.58)

of the coefficient matrix into the difference of two matrices leads to an equivalent system of the form

    M u = N u + b.    (7.59)

Provided that M is nonsingular, we can rewrite the system in the fixed point form

    u = M^{−1} N u + M^{−1} b = T u + c,    where    T = M^{−1} N,    c = M^{−1} b.
Now, we are free to choose any such M, which then specifies N = M − A uniquely. However, for the resulting iterative scheme u^(k+1) = T u^(k) + c to be practical, we must arrange that
(a) T = M^{−1} N is a convergent matrix, and
(b) M can be easily inverted.
The second requirement ensures that the iterative equations
M u(k+1) = N u(k) + b
(7.60)
can be solved for u(k+1) with minimal computational effort. Typically, this requires that
M be either a diagonal matrix, in which case the inversion is immediate, or upper or lower
triangular, in which case one employs Back or Forward Substitution to solve for u(k+1) .
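Requirements (a) and (b) translate into very little code. Here is a minimal NumPy sketch of the general splitting iteration (7.60), using the Jacobi choice M = D on a small illustrative system (the function name and the test matrix are assumptions for the example):

```python
import numpy as np

def splitting_iteration(M, N, b, u0, steps):
    """Iterate M u^(k+1) = N u^(k) + b for a splitting A = M - N."""
    u = u0.copy()
    for _ in range(steps):
        # cheap to solve when M is diagonal or triangular
        u = np.linalg.solve(M, N @ u + b)
    return u

# Jacobi choice: M = D (diagonal part of A), N = M - A.
A = np.array([[4.0, 1.0],
              [2.0, 5.0]])
b = np.array([1.0, 2.0])
M = np.diag(np.diag(A))
N = M - A
u = splitting_iteration(M, N, b, np.zeros(2), 50)
```

Swapping in M = L + D and N = −U turns the very same loop into the Gauss-Seidel scheme.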
With this in mind, we now introduce the SOR method. It relies on a slight generalization of the Gauss-Seidel decomposition (7.53) of the matrix into lower triangular plus diagonal plus upper triangular parts. The starting point is to write

    A = L + D + U = ( L + α D ) − ( (α − 1) D − U ),    (7.61)

where α ≠ 0 is an adjustable scalar parameter. We decompose the system A u = b as

    ( L + α D ) u = ( (α − 1) D − U ) u + b.    (7.62)

It turns out to be slightly more convenient to divide (7.62) through by α and write the resulting iterative system in the form

    ( ω L + D ) u^(k+1) = ( (1 − ω) D − ω U ) u^(k) + ω b,    (7.63)

where ω = 1/α is called the relaxation parameter. Assuming, as usual, that all diagonal entries of A are nonzero, the matrix ω L + D is an invertible lower triangular matrix, and so we can use Forward Substitution to solve the iterative system (7.63) to recover u^(k+1). The explicit formula for its ith entry is

    u_i^(k+1) = ω t_{i1} u_1^(k+1) + · · · + ω t_{i,i−1} u_{i−1}^(k+1) + (1 − ω) u_i^(k) + ω t_{i,i+1} u_{i+1}^(k) + · · · + ω t_{in} u_n^(k) + ω c_i,    (7.64)
where t_{ij} and c_i denote the original Jacobi values (7.43). As in the Gauss-Seidel approach, we update the entries u_i^(k+1) in numerical order i = 1, . . . , n. Thus, to obtain the SOR scheme (7.64), we merely multiply the right hand side of the Gauss-Seidel scheme (7.51) by the adjustable relaxation parameter ω and append the diagonal term (1 − ω) u_i^(k). In particular, if we set ω = 1, then the SOR method reduces to the Gauss-Seidel method. Choosing ω < 1 leads to an under-relaxed method, while ω > 1, known as over-relaxation, is the choice that works in most practical instances.
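In code, one SOR sweep per (7.64) amounts to a Gauss-Seidel update blended with the old entry. A minimal NumPy sketch (the function name and test system are illustrative choices, not taken from the text):

```python
import numpy as np

def sor_sweep(A, b, u, omega):
    """One SOR sweep (7.64), updating the entries of u in order i = 1, ..., n."""
    n = len(b)
    for i in range(n):
        # plain Gauss-Seidel value for entry i, using already-updated u[:i]
        gs = (b[i] - A[i, :i] @ u[:i] - A[i, i + 1:] @ u[i + 1:]) / A[i, i]
        # blend with the old entry; omega = 1 recovers Gauss-Seidel exactly
        u[i] = (1 - omega) * u[i] + omega * gs
    return u

A = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])
b = np.array([1.0, 1.0])
u = np.zeros(2)
for _ in range(20):
    sor_sweep(A, b, u, omega=1.07)
```

Setting omega = 1 in the call reproduces the Gauss-Seidel iterates, so the same routine serves both schemes.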
To analyze the SOR scheme in detail, we rewrite (7.63) in the fixed point form

    u^(k+1) = T_ω u^(k) + c_ω,    (7.65)

where

    T_ω = ( ω L + D )^{−1} ( (1 − ω) D − ω U ),    c_ω = ( ω L + D )^{−1} ω b.    (7.66)
The rate of convergence of the SOR scheme is governed by the spectral radius of its coefficient matrix T_ω, and so the goal is to choose the relaxation parameter ω so as to make that spectral radius as small as possible. As a simple illustration, consider the 2 × 2 system with coefficient matrix

    A = [  2  −1 ]
        [ −1   2 ]

for which the SOR iteration (7.63) reads

    [  2   0 ] u^(k+1) = [ 2 (1 − ω)      ω      ] u^(k) + ω b,
    [ −ω   2 ]           [     0      2 (1 − ω)  ]

where Gauss-Seidel is the particular case ω = 1. The SOR coefficient matrix is

    T_ω = [  2   0 ]^{−1} [ 2 (1 − ω)      ω      ] = [     1 − ω          ω/2       ]
          [ −ω   2 ]      [     0      2 (1 − ω)  ]   [ ω (1 − ω)/2    (2 − ω)²/4   ]

To compute the eigenvalues of T_ω, we form its characteristic equation

    0 = det ( T_ω − λ I ) = λ² − ( 2 − 2 ω + ω²/4 ) λ + (1 − ω)² = ( λ + ω − 1 )² − λ ω²/4.    (7.67)

Our goal is to choose ω so that
(a) both eigenvalues are less than 1 in modulus, so | λ_1 |, | λ_2 | < 1. This is the minimal requirement for convergence of the method.
(b) the largest eigenvalue (in modulus) is as small as possible. This will give the smallest spectral radius for T_ω and hence the fastest convergence rate.
A short computation shows that the two roots of (7.67) are real for ω ≤ ω⋆ and complex conjugates of common modulus ω − 1 for ω > ω⋆, where the critical value ω⋆ is found by setting the discriminant of (7.67) to zero, i.e., ω² − 16 ω + 16 = 0. The spectral radius is therefore minimized at

    ω = ω⋆ = 8 − 4 √3 ≈ 1.07,

at which point

    λ_1 = λ_2 = ω⋆ − 1 = 7 − 4 √3 ≈ .07 = ρ( T_{ω⋆} ),

which is the convergence rate of the optimal SOR scheme. Each iteration produces slightly more than one new decimal place in the solution, which represents a significant improvement over the Gauss-Seidel convergence rate. It takes about twice as many Gauss-Seidel iterations (and four times as many Jacobi iterations) to produce the same accuracy as this optimal SOR method.
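The optimal parameter can also be located numerically by scanning the spectral radius of T_ω from (7.66) over a grid of ω values for the 2 × 2 example above (a NumPy sketch; the grid range and resolution are arbitrary choices):

```python
import numpy as np

A = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])
D = np.diag(np.diag(A))
L = np.tril(A, -1)
U = np.triu(A, 1)

def rho_sor(omega):
    """Spectral radius of the SOR matrix T_omega from (7.66)."""
    T = np.linalg.solve(omega * L + D, (1 - omega) * D - omega * U)
    return max(abs(np.linalg.eigvals(T)))

# scan the relaxation parameter over [1, 1.5]
omegas = np.linspace(1.0, 1.5, 501)
w_best = omegas[int(np.argmin([rho_sor(w) for w in omegas]))]
```

The minimizer lands near ω⋆ = 8 − 4√3 ≈ 1.0718, in agreement with the hand computation, while rho_sor(1.0) reproduces the Gauss-Seidel value .25.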
Of course, in such a simple 2 × 2 example, it is not so surprising that we can construct the best value of the relaxation parameter by hand. Young was able to find the optimal value of the relaxation parameter for a broad class of matrices that includes most of those arising in the finite difference and finite element numerical solutions to ordinary and partial differential equations. For the matrices in Young's class, the Jacobi eigenvalues occur in signed pairs. If ± μ are a pair of eigenvalues for the Jacobi method, then the corresponding eigenvalues λ of the SOR iteration matrix satisfy the quadratic equation

    ( λ + ω − 1 )² = λ ω² μ².    (7.68)
Therefore, if ρ_J = max | μ | denotes the spectral radius of the Jacobi method, then the Gauss-Seidel scheme has spectral radius ρ_GS = ρ_J², while the SOR method with optimal relaxation parameter

    ω⋆ = 2 / ( 1 + √(1 − ρ_J²) )    has spectral radius    ρ⋆ = ω⋆ − 1.    (7.69)
For example, if ρ_J = .99, which is rather slow convergence (but common for iterative numerical solution schemes for partial differential equations), then ρ_GS = .9801, which is twice as fast, but still quite slow, while SOR with ω⋆ = 1.7527 has ρ⋆ = .7527, which is dramatically faster. Indeed, since (ρ_GS)^14 ≈ (ρ_J)^28 ≈ ρ⋆, it takes about 14 Gauss-Seidel (and 28 Jacobi) iterations to produce the same accuracy as one SOR step. It is amazing that such a simple idea can have such a dramatic effect.
More precisely, since the optimal SOR matrix is not diagonalizable, the overall convergence rate is slightly slower than the spectral radius would indicate. However, this technical detail does not affect the overall conclusion.
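The figures quoted for ρ_J = .99 follow directly from (7.69), as a few lines of Python confirm:

```python
import math

rho_J = 0.99                                   # Jacobi spectral radius
rho_GS = rho_J ** 2                            # Gauss-Seidel: the square of rho_J
w_star = 2 / (1 + math.sqrt(1 - rho_J ** 2))   # optimal relaxation parameter (7.69)
rho_star = w_star - 1                          # optimal SOR spectral radius
print(round(rho_GS, 4), round(w_star, 4), round(rho_star, 4))
```

Rounded to four places, the script reproduces the values quoted in the text.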