
Chapter 1

Introduction

Remark 1.1 Contents of the lecture notes. These lecture notes consider solvers
for linear systems of equations

Ax = b,  A ∈ R^{n×n},  x, b ∈ R^n,   (1.1)

with non-singular matrix A. The solution of such systems is the core of many
algorithms.
In particular, systems with the following features will be considered in these
notes:
• the dimension n of the systems is very large,
• the system matrix A is sparse, i.e. the number of non-zero entries in A is only
a small percentage, usually O(n), of the total number of entries, which is n².
Systems with these features arise, e.g., in the discretization of partial differential
equations.
Throughout the lecture notes, vectors are denoted by small bold-faced letters,
components of vectors by small letters, matrices by capital letters, scalars by Greek
letters, and indices by the letters i, j, l, m. The iteration index in iterative schemes
is denoted by k.
Main parts of these lecture notes follow Starke (2001). 2

Chapter 2

Some Basics on Vectors and Matrices

Remark 2.1 Contents. This chapter gives an overview of vector and matrix properties which will be used in these lecture notes. □
Remark 2.2 Norms of vectors. Let x = (x_1, . . . , x_n)^T ∈ R^n be a vector. The l_p-norm is defined by

‖x‖_p := ( Σ_{i=1}^{n} |x_i|^p )^{1/p},  p ∈ [1, ∞),
‖x‖_∞ := max_{i=1,...,n} |x_i|.

If p = 1, the norm is called sum norm, in the case p = 2 one speaks of the Euclidean norm, and for p = ∞ of the maximum norm. □
Remark 2.3 Norms of matrices. Let

A = (a_ij)_{i,j=1,...,n} =
  [ a_11 · · · a_1n
    ...         ...
    a_n1 · · · a_nn ]  ∈ R^{n×n}.

The induced matrix p-norm is defined by

‖A‖_p := max_{x∈R^n, x≠0} ‖Ax‖_p/‖x‖_p = max_{x∈R^n, ‖x‖_p≤1} ‖Ax‖_p = max_{x∈R^n, ‖x‖_p=1} ‖Ax‖_p.   (2.1)

Special cases are

‖A‖_1 = max_{j=1,...,n} Σ_{i=1}^{n} |a_ij|,   maximum absolute column sum norm,
‖A‖_2 = (λ_max(A^T A))^{1/2},   Euclidean norm, spectral norm,
‖A‖_∞ = max_{i=1,...,n} Σ_{j=1}^{n} |a_ij|,   maximum absolute row sum norm.

Another norm is the Frobenius1 norm given by

‖A‖_F = ( Σ_{i,j=1}^{n} |a_ij|² )^{1/2}.   □

1 Ferdinand Georg Frobenius (1849 – 1917)
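The norms above are easy to check numerically. The following is a minimal NumPy sketch (not part of the original notes; all names are illustrative) that evaluates the l_p-norms of a vector and verifies that the spectral norm equals (λ_max(A^T A))^{1/2}:

import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# vector norms ||x||_1, ||x||_2, ||x||_inf
for p in (1, 2, np.inf):
    print(p, np.linalg.norm(x, p))

# induced matrix norms: column sum, row sum, spectral norm, and the Frobenius norm
norm1 = np.abs(A).sum(axis=0).max()                  # maximum absolute column sum
norm_inf = np.abs(A).sum(axis=1).max()               # maximum absolute row sum
norm2 = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())   # (lambda_max(A^T A))^{1/2}
normF = np.sqrt((np.abs(A) ** 2).sum())

print(np.isclose(norm1, np.linalg.norm(A, 1)))
print(np.isclose(norm_inf, np.linalg.norm(A, np.inf)))
print(np.isclose(norm2, np.linalg.norm(A, 2)))
print(np.isclose(normF, np.linalg.norm(A, 'fro')))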

Remark 2.4 Properties of matrix norms. From (2.1) it follows immediately for all x ∈ R^n that

‖A‖_p ≥ ‖Ax‖_p/‖x‖_p  ⇐⇒  ‖Ax‖_p ≤ ‖A‖_p ‖x‖_p.

It holds also ‖Ax‖_2 ≤ ‖A‖_F ‖x‖_2. It follows for B ∈ R^{n×n} that

‖ABx‖_p ≤ ‖A‖_p ‖Bx‖_p ≤ ‖A‖_p ‖B‖_p ‖x‖_p  ⇐⇒  ‖AB‖_p = max_{x∈R^n,x≠0} ‖ABx‖_p/‖x‖_p ≤ ‖A‖_p ‖B‖_p.

□

Lemma 2.5 Properties of non-singular quadratic matrices. Let A ∈ Rn×n .


The following properties are equivalent:
• A is non-singular.
• The inverse A−1 of A exists.
• The linear system (1.1) possesses for each right hand side b a unique solution.
• The determinant of A does not vanish: det(A) ≠ 0.
• All eigenvalues of A are different from zero.

Proof: This lemma was proved in the course on basic linear algebra.

Definition 2.6 Eigenvalues, eigenvectors, spectral radius. A complex num-


ber λ ∈ C is called an eigenvalue of A ∈ C^{n×n} if there is a vector x ∈ C^n, x ≠ 0, such
that
Ax = λx.
The vector x is called eigenvector. Note that all real (complex) eigenvalues will be
associated to real (complex) eigenvectors.
The spectral radius of a matrix A is defined by

ρ (A) = max{|λ| : λ is eigenvalue of A}.

Remark 2.7 On eigenvalues. For every eigenvalue λj ∈ C of A it holds |λj | ≤ kAk


for any matrix norm which is given in Remark 2.3. It follows that ρ (A) ≤ kAk. 2

Lemma 2.8 Existence of a matrix norm that is arbitrarily close to the spectral radius. Let A ∈ R^{n×n} and ε > 0 be given. Then, there is a vector norm ‖·‖_* such that for the induced matrix norm it holds

ρ(A) ≤ ‖A‖_* ≤ ρ(A) + ε.

Proof: The proof uses Schur's2 triangulation theorem: Every matrix A ∈ R^{n×n} can be factored in the form A = U*TU, where U ∈ C^{n×n} is a unitary matrix, U* = U^{−1} (the adjoint matrix is the inverse matrix), and T an upper triangular matrix of the form

T = [ λ_1 t_12 · · · t_1n
      0   λ_2 · · · t_2n
      ...           ...
      0   0   · · · λ_n ]

with the eigenvalues λ_1, . . . , λ_n of A, e.g. see (Marcus and Minc, 1992, p. 67). The vector norm ‖·‖_* is defined with the diagonal matrix D_δ = diag(1, δ, . . . , δ^{n−1}), δ > 0:

‖x‖_* := ‖D_δ^{−1} U x‖_∞.

For the induced matrix norm it follows, using the Schur triangulation of A, that

‖A‖_* := max_{x∈R^n,x≠0} ‖D_δ^{−1} U A x‖_∞ / ‖D_δ^{−1} U x‖_∞ = max_{x∈R^n,x≠0} ‖D_δ^{−1} U U*T U x‖_∞ / ‖D_δ^{−1} U x‖_∞.

Setting y = D_δ^{−1} U x, it follows that x = U* D_δ y since the matrices U and D_δ are non-singular and D_δ^{−1} U is a bijection from R^n to R^n. Inserting this expression gives

‖A‖_* = max_{y∈R^n,y≠0} ‖D_δ^{−1} U U*T U U* D_δ y‖_∞ / ‖y‖_∞ = ‖D_δ^{−1} T D_δ‖_∞.

The diagonal matrix D_δ^{−1} scales just the rows of T and the matrix D_δ just the columns of T. Thus, the product is again an upper triangular matrix and a straightforward calculation shows that

D_δ^{−1} T D_δ = [ λ_1 δt_12 · · · δ^{n−1} t_1n
                   0   λ_2   · · · δ^{n−2} t_2n
                   ...                ...
                   0   0     · · ·   δt_{n−1,n}
                   0   0     · · ·   λ_n ]

and hence ‖D_δ^{−1} T D_δ‖_∞ ≤ ρ(A) + ε if δ is chosen sufficiently small.

2 Issai Schur (1875 – 1941)

Definition 2.9 Spectral condition number. The spectral condition number κ_2(A) of a non-singular matrix A ∈ R^{n×n} is defined by

κ_2(A) = ‖A‖_2 ‖A^{−1}‖_2.   □

Definition 2.10 Definiteness. The matrix A ∈ R^{n×n} is called positive definite if

x^T Ax > 0  ∀ x ∈ R^n \ {0}.   (2.2)

If the equal sign can occur, A is called positive semi-definite. □

Remark 2.11 On definiteness. Applying the standard basis vectors

e^(i) = (0, . . . , 0, 1, 0, . . . , 0)^T,  i = 1, . . . , n,

with the 1 in the i-th position, in (2.2) shows that if A is positive (semi-)definite then also the diagonal matrix diag(a_ii) is positive (semi-)definite. □

Remark 2.12 On symmetric matrices. A matrix A ∈ R^{n×n} is called symmetric if A = A^T. It is called skew-symmetric if A^T = −A.
One of the most important properties of symmetric matrices is that all eigenvalues are real numbers. It holds, see e.g. Saad (2003),

λ_max(A) = max_{x∈R^n,x≠0} (x^T Ax)/(x^T x),   λ_min(A) = min_{x∈R^n,x≠0} (x^T Ax)/(x^T x).   (2.3)

The quotient on the right-hand side is called the Rayleigh3 quotient. A symmetric matrix is positive definite (s.p.d.) if and only if all of its eigenvalues are positive. It is positive semi-definite if and only if all of its eigenvalues are non-negative.

3 John William Strutt (Lord Rayleigh) (1842 – 1919)

In the case of A ∈ R^{n×n} being symmetric and positive definite, one obtains for the spectral condition number of A

‖A‖_2 = (λ_max(A^T A))^{1/2} = (λ_max(A²))^{1/2} = ((λ_max(A))²)^{1/2} = λ_max(A).

Since λ_max(A^{−1}) = (λ_min(A))^{−1}, one has ‖A^{−1}‖_2 = (λ_min(A))^{−1} and

κ_2(A) = λ_max(A)/λ_min(A) ≥ 1.   □
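A quick numerical illustration of κ_2(A) = λ_max(A)/λ_min(A) for an s.p.d. matrix; a small sketch with illustrative names, assuming NumPy:

import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = B @ B.T + 6 * np.eye(6)           # s.p.d. by construction

eigs = np.linalg.eigvalsh(A)          # real eigenvalues, ascending order
kappa_from_eigs = eigs[-1] / eigs[0]  # lambda_max / lambda_min
kappa_2 = np.linalg.cond(A, 2)        # ||A||_2 ||A^{-1}||_2

print(np.isclose(kappa_from_eigs, kappa_2))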

Definition 2.13 Diagonal dominance. A matrix A is called diagonally dominant if

|a_ii| ≥ Σ_{j=1,j≠i}^{n} |a_ij|   for all i = 1, . . . , n.

If for all i the strict inequality holds, then A is called strongly diagonally dominant. □
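A short sketch (illustrative, not from the notes) that tests strong diagonal dominance row by row:

import numpy as np

def is_strongly_diagonally_dominant(A):
    """Check |a_ii| > sum_{j != i} |a_ij| for every row i."""
    A = np.asarray(A, dtype=float)
    diag = np.abs(np.diag(A))
    off_diag_sums = np.abs(A).sum(axis=1) - diag
    return bool(np.all(diag > off_diag_sums))

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
print(is_strongly_diagonally_dominant(A))   # True: 4 > 2 in every row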

Definition 2.14 Normal matrix. The matrix A ∈ Rn×n is called normal, if


AT A = AAT . 2

Remark 2.15 On normal matrices. It is known that A is normal if and only if it is unitarily similar to a diagonal matrix, i.e., there is a unitary matrix Q ∈ C^{n×n} such that

A = Q*ΛQ,  Λ = diag(λ_1, . . . , λ_n),

where λ_1, . . . , λ_n are the eigenvalues of A. Obviously, symmetric matrices are normal. □

Chapter 3

Classical Iterative Schemes

3.1 General Theory


Remark 3.1 Basic idea, transform to a fixed point equation. The construction of
a classical iterative scheme for solving (1.1) starts with the decomposition

A = M − N, M, N ∈ Rn×n , M is non-singular, (3.1)

of the system matrix A. Using this decomposition, (1.1) can be transformed into the fixed point equation

Mx = b + Nx  ⇐⇒  x = M^{−1}(b + Nx).   (3.2)

Given an initial iterate x(0) ∈ R^n, one can try to solve (3.2) with the fixed point iteration

x(k+1) = M^{−1}(b + Nx(k)),  k = 0, 1, 2, . . . .   (3.3)

Banach’s1 fixed point theorem gives information on the convergence of this it-
eration. 2

Theorem 3.2 Banach’s fixed point theorem. Let (X , d) be a complete metric


space and let f : X → X be a contraction (f is Lipschitz2 continuous with the
Lipschitz constant L < 1). Then, the equation x = f (x) possesses a unique solution
x̄ ∈ X (a fixed point). The iterative scheme

x(k+1) = f(x(k)),  k = 0, 1, 2, . . .

converges to x̄ for any initial iterate x(0) ∈ X.

Proof: Basic course on calculus.

Theorem 3.3 Condition on the iteration matrix of (3.3) for convergence.


The iterative scheme (3.3) converges to the solution x of (1.1) for any initial iterate
x(0) if and only if the spectral radius of the iteration matrix G = M −1 N is smaller
than one: ρ (G) < 1.

Proof: i) The iteration (3.3) is a fixed point iteration with


f : Rn → Rn , x 7→ M −1 N x + M −1 b.
1 Stefan Banach (1892 - 1945)
2 Rudolf Lipschitz (1832 - 1903)

The operator G = M^{−1}N is linear and bounded since ‖G‖_* is finite, where ‖·‖_* is any matrix norm. Hence, G is continuous and even Lipschitz continuous. Since f is continuously differentiable, the Lipschitz constant is given by

L_* = sup_{x∈R^n} ‖J(f(x))‖_* = sup_{x∈R^n} ‖G‖_* = ‖G‖_*,

where J(f(x)) is the (constant) Jacobian of f(x).


ii) Let ρ(G) < 1. Then, it is possible to find a matrix norm ‖·‖_* such that, according to Lemma 2.8, ‖G‖_* ≤ ρ(G) + ε < 1 for sufficiently small ε > 0. Hence L_* < 1 and f(x) is a contraction.
iii) Let ρ (G) ≥ 1. An initial guess will be constructed for which the fixed point
iteration does not converge. Without loss of generality, consider the case b = 0 such that
the solution of (1.1) is x = 0.
Since ρ(G) ≥ 1, there is an eigenvalue λ ∈ C of G with |λ| ≥ 1. The eigenvalue can be written in the form λ = |λ|(cos(ϕ) + i sin(ϕ)), where ϕ is the argument of λ. Let z ∈ C^n, z ≠ 0, be a corresponding eigenvector: Gz = λz. Taking the complex conjugate of this equation gives Gz̄ = λ̄z̄, since G is a real matrix.
Choose the initial iterate x(0) = z + z̄ ∈ R^n. One has to exclude that x(0) = 0. If x(0) = 0, then z = iv with v ∈ R^n. One obtains from the eigenvalue equation that iGv = iλv, which is equivalent to Gv = λv. On the left-hand side of this equation there is a real vector. Since v is a real vector, it follows that λ must be real, too. But in this case, the corresponding eigenvector is also real and it cannot be of the form z = iv. Hence, an eigenvector of the form z = iv cannot appear and x(0) ≠ 0.
Now, it follows that

G^k x(0) = G^k z + G^k z̄ = λ^k z + λ̄^k z̄ = 2 Re(λ^k z),  k = 0, 1, . . . .

The iteration converges if

0 = lim_{k→∞} 2 Re(λ^k z) = lim_{k→∞} 2|λ|^k Re((cos(kϕ) + i sin(kϕ)) z)
  = lim_{k→∞} 2|λ|^k (cos(kϕ) Re(z) − sin(kϕ) Im(z)).

The factor |λ|^k does not converge to zero since |λ| ≥ 1. Note that the second factor is a vector. It converges to zero if and only if each of its components converges to zero. There is at least one component z_l with z_l ≠ 0 since z is an eigenvector. Let ζ be the argument of z_l. It is

cos(kϕ) Re(z_l) − sin(kϕ) Im(z_l) = |z_l| (cos(kϕ) cos(ζ) − sin(kϕ) sin(ζ)) = |z_l| cos(kϕ + ζ).   (3.4)
If λ ∈ R, i.e. ϕ = ±π, then the eigenvector z is real, too, such that ζ = ±π. In this case,
(3.4) takes the values |zl | or − |zl | and it does not tend to zero as k → ∞. If λ 6∈ R, then
the period ϕ in (3.4) is not an integer multiple of π such that the argument of the cosine
cannot tend to π/2 plus an integer multiple of π. Hence, also in this case, the second
factor in (3.4) does not tend to zero.
In summary, the iteration (3.3) does not converge for the initial iterate x(0) = z + z̄. That means, if the iteration (3.3) converges for all initial iterates, then ρ(G) ≥ 1 cannot hold.
Note: the last part of the proof simplifies much if one considers complex-valued systems of linear equations. Then, one can take the initial iterate x(0) = z ≠ 0, finds that x(k) = λ^k z, and concludes that ‖x(k)‖_2 = |λ|^k ‖z‖_2 ↛ 0, since ‖z‖_2 ≠ 0 and |λ|^k ≥ 1.
2

3.2 Examples for classical iterative schemes


Remark 3.4 Decomposition of the system matrix. One uses for the definition of
classical iterative schemes a decomposition of the matrix A into the form
A = D + L + U,

where D is the diagonal of A, L is its strict lower part and U its strict upper part.
2

Example 3.5 Jacobi method. The Jacobi method is derived by setting

M = D, N = − (L + U ) .

A straightforward calculation reveals that the fixed point equation (3.2) has the form

x = D^{−1}(b − (L + U)x) = x + D^{−1}(b − Ax).

This gives the following iterative scheme, called the Jacobi3 method,

x(k+1) = x(k) + D^{−1}(b − Ax(k)),  k = 0, 1, 2, . . . .

The iteration matrix is G_Jac = −D^{−1}(L + U). □
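The Jacobi iteration is short to implement. The sketch below (illustrative names, assuming NumPy) performs x(k+1) = x(k) + D^{−1}(b − Ax(k)) and also reports the spectral radius of G_Jac, which by Theorem 3.3 decides convergence:

import numpy as np

def jacobi(A, b, x0, tol=1e-10, maxit=1000):
    """Jacobi method: x^{k+1} = x^k + D^{-1}(b - A x^k)."""
    d = np.diag(A)                       # diagonal of A
    x = x0.copy()
    for k in range(maxit):
        r = b - A @ x                    # residual
        if np.linalg.norm(r) < tol:
            return x, k
        x = x + r / d
    return x, maxit

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

D = np.diag(np.diag(A))
G_jac = -np.linalg.solve(D, A - D)       # G_Jac = -D^{-1}(L + U)
print("rho(G_Jac) =", np.abs(np.linalg.eigvals(G_jac)).max())   # < 1 here

x, its = jacobi(A, b, np.zeros(3))
print(its, np.linalg.norm(b - A @ x))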

Example 3.6 Damped Jacobi method. Let ω ∈ R, ω > 0. The matrices which
define the fixed point equation for the damped Jacobi method are given by

M = ω −1 D, N = ω −1 D − A.

The damped Jacobi method has the form

x(k+1) = x(k) + ωD^{−1}(b − Ax(k)),  k = 0, 1, 2, . . .

and the iteration matrix is G_dJac = I − ωD^{−1}A. □

Example 3.7 Gauss–Seidel method. In the Gauss4 –Seidel5 method, the invertible
matrix M is a triangular matrix

M = D + L,  N = −U.

It follows that

x(k+1) = (D + L)^{−1}(b − Ux(k)),  k = 0, 1, 2, . . . ,

such that the iteration matrix has the form G_GS = −(D + L)^{−1}U. Multiplying the equation for the Gauss–Seidel method by (D + L) and rearranging terms gives the more familiar form of this iteration

x(k+1) = D^{−1}(b − Lx(k+1) − Ux(k))
       = x(k) + D^{−1}(b − Lx(k+1) − (D + U)x(k)),  k = 0, 1, 2, . . . .

Writing this iteration for the components of the vector shows that the right-hand side can be evaluated even if the new iterate appears there, since only already computed components of the new iterate are needed for this evaluation:

x_i(k+1) = x_i(k) + (1/a_ii) ( b_i − Σ_{j=1}^{i−1} a_ij x_j(k+1) − Σ_{j=i}^{n} a_ij x_j(k) ).

□
3 Carl Gustav Jacob Jacobi (1804 - 1851)
4 Johann Carl Friedrich Gauss (1777 - 1855)
5 Philipp Ludwig von Seidel (1821 - 1896)

Example 3.8 SOR method. The matrices which define the (forward) successive
over relaxation (SOR) method are given by

M = ω^{−1}D + L,  N = ω^{−1}D − (D + U),

where ω ∈ R, ω > 0. This method can be written in the form

x(k+1) = x(k) + ωD^{−1}(b − Lx(k+1) − (D + U)x(k)),  k = 0, 1, 2, . . . .

For ω = 1, the Gauss–Seidel method is recovered. One obtains for the iteration matrix

G_SOR(ω) = (ω^{−1}D + L)^{−1} (ω^{−1}D − (D + U))
         = ω(D + ωL)^{−1} (ω^{−1}D − (D + U))
         = (D + ωL)^{−1} ((1 − ω)D − ωU).

Example 3.9 SSOR method. In the SOR method, one can change the roles of L and U to obtain the backward SOR method

x(k+1) = x(k) + ωD^{−1}(b − Ux(k+1) − (D + L)x(k)),  k = 0, 1, 2, . . . .

This method updates the unknowns in reverse order. The forward and backward SOR behave in general differently. There are cases in which one of them works much more efficiently than the other one. However, in general one does not know a priori which is the better variant. The SSOR (symmetric SOR) method combines both methods. One step of SSOR consists of two substeps, one forward SOR step and one backward SOR step:

x(k+1/2) = x(k) + ωD^{−1}(b − Lx(k+1/2) − (D + U)x(k)),
x(k+1) = x(k+1/2) + ωD^{−1}(b − Ux(k+1) − (D + L)x(k+1/2)),  k = 0, 1, 2, . . . .
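The forward SOR sweep can be written componentwise, exactly as in the Gauss–Seidel formula above but with the update relaxed by ω; for ω = 1 the code reduces to Gauss–Seidel. A minimal sketch (illustrative names, assuming NumPy):

import numpy as np

def sor(A, b, x0, omega=1.0, tol=1e-10, maxit=10_000):
    """Forward SOR sweep; omega = 1 gives the Gauss-Seidel method."""
    n = len(b)
    x = x0.copy()
    for k in range(maxit):
        for i in range(n):
            # x[j] for j < i already holds the new iterate (in-place sweep)
            sigma = A[i, :] @ x - A[i, i] * x[i]
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(b - A @ x) < tol:
            return x, k + 1
    return x, maxit

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
for omega in (1.0, 1.2):
    x, its = sor(A, b, np.zeros(3), omega)
    print(omega, its)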

3.3 Some Convergence Results


Theorem 3.10 Strongly diagonally dominant matrices. Let A ∈ Rn×n be
strongly diagonally dominant. Then, the Jacobi method and the Gauss–Seidel method
converge for every initial iterate x(0) ∈ Rn .
Proof: Following Theorem 3.3, one has to show that the spectral radius of the itera-
tion matrices is smaller than 1.
Jacobi method. Let z ∈ R^n, z ≠ 0. Then, the triangle inequality gives

|(G_Jac z)_i| = |(−D^{−1}(L + U)z)_i| = |(1/a_ii) Σ_{j=1,j≠i}^{n} a_ij z_j| ≤ (1/|a_ii|) Σ_{j=1,j≠i}^{n} |a_ij| |z_j|
             ≤ (1/|a_ii|) Σ_{j=1,j≠i}^{n} |a_ij| ‖z‖_∞ < ‖z‖_∞,

since Σ_{j≠i} |a_ij| < |a_ii| by the strong diagonal dominance. It follows that

ρ(G_Jac) ≤ ‖G_Jac‖_∞ = max_{z∈R^n,z≠0} ‖G_Jac z‖_∞/‖z‖_∞ < 1.

Gauss–Seidel method. A direct calculation shows (exercise)

G_GS = −D^{−1}(L G_GS + U).

This relation gives for the first component and z ∈ R^n, z ≠ 0,

|(G_GS z)_1| ≤ (1/|a_11|) Σ_{j=2}^{n} |a_1j| |z_j| ≤ (1/|a_11|) Σ_{j=2}^{n} |a_1j| ‖z‖_∞ < ‖z‖_∞,

where the term with the factor L G_GS vanishes since the first row of L consists only of zeros. Using now the induction hypothesis |(G_GS z)_j| < ‖z‖_∞, j < i, yields

|(G_GS z)_i| ≤ (1/|a_ii|) ( Σ_{j=1}^{i−1} |a_ij| |(G_GS z)_j| + Σ_{j=i+1}^{n} |a_ij| |z_j| )
            ≤ (1/|a_ii|) Σ_{j=1,j≠i}^{n} |a_ij| ‖z‖_∞ < ‖z‖_∞,  i = 2, . . . , n.

The remainder of the proof is the same as for the Jacobi method.

Lemma 3.11 Eigenvalues of the iteration matrix of the damped Jacobi


method. Let ω > 0, then λ ∈ C is an eigenvalue of GJac if and only if µ = 1−ω+ωλ
is an eigenvalue of GdJac .
Proof: It is, with A = D + L + U,

G_dJac = I − ωD^{−1}A = I − ωD^{−1}D − ωD^{−1}(L + U) = (1 − ω)I + ωG_Jac,

since −D^{−1}(L + U) = G_Jac. The statement of the lemma follows now from well-known properties of eigenvalues.

Example 3.12 Convergence of the damped Jacobi method where the Jacobi method
fails. If ω is chosen appropriately, there is the possibility that the damped Jacobi
method converges for every initial guess whereas the Jacobi method does not.
Assume that GJac has only real eigenvalues. Denote by λmin the smallest one
and by λmax the largest one. If

λmin < −1 < λmax < 1,

then there are initial iterates for which the Jacobi method does not converge, The-
orem 3.3. From Lemma 3.11 one has

µmin = (1 − ω) + ωλmin , µmax = (1 − ω) + ωλmax .

It follows that

−1 < µ_min  if  ω < 2/(1 − λ_min) (< 1),   and   µ_max < 1  if  ω > 0.

The choice of ω ∈ (0, 2/(1 − λ_min)) ensures the convergence of the damped Jacobi method for each initial iterate.
Consider the case λmax > 1. Then

µmax = (1 − ω) + ωλmax = 1 + ω(λmax − 1) > 1.

In this case, there are initial iterates for which the damped Jacobi method does not
converge as well. 2

Lemma 3.13 Parameter in the case that the SOR method converges, Lemma of Kahan6. If the SOR method converges for every initial iterate x(0) ∈ R^n, then ω ∈ (0, 2).

Proof: Let λ_1, . . . , λ_n ∈ C be the eigenvalues of G_SOR(ω). It is

Π_{i=1}^{n} λ_i = det(G_SOR(ω)) = det((D + ωL)^{−1} ((1 − ω)D − ωU))
              = det((D + ωL)^{−1}) det((1 − ω)D − ωU)
              = det(D)^{−1} (1 − ω)^n det(D) = (1 − ω)^n,

where it is used that D + ωL is a lower triangular matrix and (1 − ω)D − ωU an upper triangular matrix, so that their determinants are the products of their diagonal entries. Hence

Π_{i=1}^{n} |λ_i| = |1 − ω|^n.

There is at least one eigenvalue λ_i with |λ_i| ≥ |1 − ω| and it follows that ρ(G_SOR(ω)) ≥ |1 − ω|. The application of Theorem 3.3 shows now that SOR cannot converge for all initial iterates if ω ∉ (0, 2), because if ω ∉ (0, 2) then ρ(G_SOR(ω)) ≥ |1 − ω| ≥ 1.

Theorem 3.14 Convergence of SOR for s.p.d. matrices. Let A ∈ Rn×n be


s.p.d. Then the SOR method converges for all initial iterates x(0) ∈ Rn if ω ∈ (0, 2).

Proof: Let λ ∈ C be an eigenvalue of G_SOR(ω) and let z ∈ C^n be a corresponding eigenvector, i.e.

(D + ωL)^{−1} ((1 − ω)D − ωU) z = λz.

Following Theorem 3.3, one has to find a condition such that |λ| < 1. The following identities can be easily verified:

D + ωL = (1 − ω/2)D + (ω/2)A + (ω/2)(L − U),
(1 − ω)D − ωU = (1 − ω/2)D − (ω/2)A + (ω/2)(L − U).

Inserting these identities into the eigenvalue equation and multiplying this equation from the left-hand side with the adjoint vector z*, one obtains

λ = [ (1 − ω/2) z*Dz − (ω/2) z*Az + (ω/2) z*(L − U)z ] / [ (1 − ω/2) z*Dz + (ω/2) z*Az + (ω/2) z*(L − U)z ].

Now, the terms in this expression will be considered individually. The matrix L − U is skew-symmetric since A is symmetric. It follows for all x ∈ R^n that

x^T(L − U)x = (x^T(L − U)x)^T = x^T(L − U)^T x = −x^T(L − U)x ∈ R,

consequently x^T(L − U)x = 0 for all x ∈ R^n, and

Re(z*(L − U)z) = Re(z)^T(L − U)Re(z) + Im(z)^T(L − U)Im(z) = 0.

Hence z*(L − U)z = ia with a ∈ R. Since A is positive definite, its diagonal D is positive definite, too. The products z*Dz and z*Az are positive real numbers since, for z = u + iv, z* = u^T − iv^T, u, v ∈ R^n, z ≠ 0 because it is an eigenvector, one obtains with the symmetry of A

z*Az = u^TAu − iv^TAu + iu^TAv − i²v^TAv = u^TAu − iv^TAu + iv^TAu + v^TAv > 0.

It follows that λ has the form

λ = (b + ia)/(c + ia),  a, b, c ∈ R,

with

b = (1 − ω/2) z*Dz − (ω/2) z*Az,   c = (1 − ω/2) z*Dz + (ω/2) z*Az.

Consequently

|λ|² = (b² + a²)/(c² + a²) = [ ((1 − ω/2) z*Dz − (ω/2) z*Az)² + a² ] / [ ((1 − ω/2) z*Dz + (ω/2) z*Az)² + a² ].

Thus |λ| < 1 if and only if the numerator is smaller than the denominator. This is equivalent to

−ω (1 − ω/2) z*Dz z*Az < ω (1 − ω/2) z*Dz z*Az  ⇐⇒  −(1 − ω/2) < (1 − ω/2)  ⇐⇒  ω < 2,

where ω > 0, z*Dz > 0, and z*Az > 0 have been used. Hence, the SOR method converges for all initial iterates if ω ∈ (0, 2).

6 William M. Kahan, born 1933

Remark 3.15 Difficulty of choosing ω in practice. For choosing ω such that the SOR method converges as fast as possible, one needs information about the eigenvalues of A. However, the computation of this information is very costly, see Numerical Mathematics I. □

Remark 3.16 Number of iterations in practice. If classical iterative schemes are used for the solution of linear systems of equations which arise in discretizing partial differential equations, one finds that the number of iterations needed to fulfill a certain stopping criterion rapidly increases under mesh refinement. One can show that the number of iterations depends on the condition number of the matrices and it scales linearly with the condition number. As an example, the standard finite element discretization of the Laplace equation on an equidistant grid of size h leads to matrices with a condition number of O(h^{−2}), see homework problems. It follows that the number of iterations for the solution of the linear system increases approximately by the factor 4 with each refinement h → h/2. For this reason, the classical iterative schemes are not useful as solvers for such systems. They are important as preconditioners or as smoothers in multigrid methods, see Chapter 9. □

Chapter 4

The Richardson Iteration

Definition 4.1 Richardson iteration. Let x(0) ∈ Rn be a given initial iterate.


The Richardson1 iteration for computing a sequence of vectors x(k) ∈ Rn , k =
0, 1, 2, . . ., has the form

r(k) = b − Ax(k) , x(k+1) = x(k) + αk r(k) (4.1)

with appropriately chosen numbers αk ∈ R. 2

Definition 4.2 Co-domain of a matrix. The set

R(A) = { y*Ay / (y*y) : y ∈ C^n, y ≠ 0 } ⊂ C

is called the co-domain (also: field of values, numerical range) of A. □

Remark 4.3 On the co-domain of a matrix. The co-domain of A is the set of values y*Ay taken on the unit sphere of C^n, since

y*Ay / (y*y) = (y/‖y‖_2)* A (y/‖y‖_2)

and y/‖y‖_2 has Euclidean norm one. The unit sphere is a compact set (bounded and closed) and the mapping y ↦ y*Ay/(y*y) is continuous. It follows that R(A) is also a compact set. □

Lemma 4.4 Co-domain of the inverse matrix. Let A ∈ R^{n×n} with R(A) ⊂ {λ ∈ C : Re(λ) > 0}, i.e., the co-domain of A is a subset of the right half of the complex plane. Then

R(A^{−1}) ⊂ {λ ∈ C : Re(λ) > 0}.

Proof: From the assumption it follows that A is non-singular. Otherwise, there would be a vector z ∈ ker(A), z ≠ 0, with Az = 0 and

Re( z*Az / (z*z) ) = Re(0) = 0.

This contradicts the assumption on R(A).

1 Lewis Fry Richardson (1881 – 1953)
Let y ∈ C^n, y ≠ 0, be arbitrary and z = A^{−1}y ≠ 0. Hence, z is also an arbitrary vector. One has

Re( y*A^{−1}y / (y*y) ) = (1/‖y‖_2²) Re(y*A^{−1}y) = (1/‖Az‖_2²) Re((Az)* A^{−1}A z)
  = (1/‖Az‖_2²) Re(z*A*z) = (1/‖Az‖_2²) Re(z*Az)
  = (‖z‖_2²/‖Az‖_2²) Re( z*Az / (z*z) ) ≥ (1/‖A‖_2²) Re( z*Az / (z*z) ) > 0,

where ‖Az‖_2 ≤ ‖A‖_2 ‖z‖_2 has been used.

Theorem 4.5 Convergence of the Richardson iteration. Let A ∈ R^{n×n} with R(A) ⊂ {λ ∈ C : Re(λ) > 0}. Then the Richardson iteration (4.1) converges to the solution of the linear system Ax = b for every initial iterate if α_k = α, k = 0, 1, 2, . . ., with

0 < α < min{ Re(λ) : λ ∈ R(A^{−1}) }.

Proof: Note that R(A^{−1}) is a compact set such that the minimum exists. Let x be the solution of (1.1). It will be shown that the error ‖x − x(k)‖_2 decreases strictly monotonically and that the rate of decrease is strictly lower than one. Using b = Ax and (4.1), one has the recursion

x − x(k+1) = x − x(k) − αr(k) = x − x(k) − α(b − Ax(k)) = x − x(k) − αA(x − x(k)).

Hence,

‖x − x(k+1)‖_2² = ( x − x(k) − αA(x − x(k)), x − x(k) − αA(x − x(k)) )
               = ‖x − x(k)‖_2² − 2α (x − x(k))^T A (x − x(k)) + α² ‖A(x − x(k))‖_2².   (4.2)

Denoting y = A(x − x(k)), one obtains

(x − x(k))^T A (x − x(k)) / ‖A(x − x(k))‖_2² = (x − x(k))^T A^T A^{−T} A (x − x(k)) / ‖A(x − x(k))‖_2²
  = y^T A^{−T} y / (y^T y) = y^T A^{−1} y / (y^T y) ≥ min{ Re(λ) : λ ∈ R(A^{−1}) } > α,

which is equivalent to

α² ‖A(x − x(k))‖_2² < α (x − x(k))^T A (x − x(k)).

Applying this estimate to the last term of (4.2) gives

‖x − x(k+1)‖_2² ≤ ‖x − x(k)‖_2² − α (x − x(k))^T A (x − x(k))
               = ‖x − x(k)‖_2² ( 1 − α (x − x(k))^T A (x − x(k)) / ‖x − x(k)‖_2² ).   (4.3)

Since R(A) is compact, there is an ε > 0 such that Re(λ) ≥ ε for all λ ∈ R(A) (there is no sequence in R(A) that converges to the imaginary axis). Hence

(x − x(k))^T A (x − x(k)) / ‖x − x(k)‖_2² ≥ ε.

Choose ε such that αε ≤ 1, then it follows from (4.3) that

‖x − x(k+1)‖_2² ≤ ‖x − x(k)‖_2² (1 − αε) =: q² ‖x − x(k)‖_2²

with 0 < q < 1 independent of k. One obtains by induction

‖x − x(k)‖_2 ≤ q^k ‖x − x(0)‖_2,

such that x(k) → x as k → ∞.

Remark 4.6 Choice of α for s.p.d. matrices. Let A be symmetric and positive definite. Using the Rayleigh quotient (2.3), applied to A^{−1}, one gets for y ∈ C^n, y ≠ 0,

Re(y*A^{−1}y) / ‖y‖_2² = ( Re(y)^T A^{−1} Re(y) + Im(y)^T A^{−1} Im(y) ) / ‖y‖_2²
  ≥ ( ‖Re(y)‖_2² λ_min(A^{−1}) + ‖Im(y)‖_2² λ_min(A^{−1}) ) / ‖y‖_2²
  = λ_min(A^{−1}) = 1/λ_max(A) = 1/ρ(A).

That means, the choice α < 1/ρ(A) guarantees the convergence of the Richardson method. □

Remark 4.7 Residual minimization for choosing α_k. One possibility to choose α_k in practice consists in the minimization of the Euclidean norm of the residual:

‖r(k+1)‖_2² = ‖b − Ax(k+1)‖_2² = ‖b − Ax(k) − α_k Ar(k)‖_2² = ‖r(k) − α_k Ar(k)‖_2²
            = ‖r(k)‖_2² − 2α_k (r(k))^T Ar(k) + α_k² ‖Ar(k)‖_2².

The necessary condition for a minimum,

d/dα_k ‖r(k+1)‖_2² = 0,

gives

α_k = (r(k))^T Ar(k) / ‖Ar(k)‖_2².   (4.4)

Since

d²/dα_k² ‖r(k+1)‖_2² = 2 ‖Ar(k)‖_2² > 0

if r(k) ≠ 0, one obtains in fact a minimum. □
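A compact sketch of the Richardson iteration (4.1) with the residual-minimizing step length (4.4); all names are illustrative and NumPy is assumed:

import numpy as np

def richardson(A, b, x0, tol=1e-10, maxit=10_000):
    """Richardson iteration with alpha_k = (r^T A r)/||A r||_2^2, see (4.4)."""
    x = x0.copy()
    for k in range(maxit):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            return x, k
        Ar = A @ r
        alpha = (r @ Ar) / (Ar @ Ar)     # minimizes ||r^{(k+1)}||_2 along the line
        x = x + alpha * r
    return x, maxit

rng = np.random.default_rng(2)
B = rng.standard_normal((20, 20))
A = B @ B.T + 20 * np.eye(20)            # s.p.d., so R(A) lies in the right half plane
b = rng.standard_normal(20)
x, its = richardson(A, b, np.zeros(20))
print(its, np.linalg.norm(b - A @ x))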

Remark 4.8 Spaces spanned by the iterates. It is by (4.1)

x(1) ∈ x(0) + span{r(0)},
x(2) ∈ x(1) + span{r(1)} ⊂ x(0) + span{r(0), r(1)}.

It holds

r(1) = b − Ax(1) = b − Ax(0) − α_0 Ar(0) = r(0) − α_0 Ar(0)

and consequently

x(2) ∈ x(0) + span{r(0), Ar(0)}.

One obtains by induction

x(k) ∈ x(0) + span{r(0), Ar(0), . . . , A^{k−1}r(0)}.

Definition 4.9 Krylov subspace. Let q ∈ R^n and A ∈ R^{n×n}. Then, the space

K_m(q, A) := span{q, Aq, . . . , A^{m−1}q}

is called the Krylov2 subspace of order m which is spanned by q and A. □



Remark 4.10 Next goal. It holds x(k) ∈ x(0) + K_k(r(0), A). In the following, Richardson's method will be generalized by constructing the iterates x(k) in this manifold with respect to certain optimality criteria. □

2 Aleksei Nikolaevich Krylov (1863 – 1945)

Chapter 5

Krylov Subspace Methods that Are Based on the Minimization of the Residual

Remark 5.1 Goal. The goal of these methods consists in determining

x(k) ∈ x(0) + K_k(r(0), A)

such that the corresponding Euclidean norm of the residual,

‖r(k)‖_2 = ‖b − Ax(k)‖_2,

becomes minimal over the manifold x(0) + K_k(r(0), A). Note, in the Richardson iteration with the special choice (4.4), the norm of the residual is minimized on the line x(k) + τr(k). However, the minimum on this line is in general not the global minimum in x(0) + K_k(r(0), A). □

5.1 General Matrices


Remark 5.2 Construction of an orthonormal basis of the Krylov subspace. To
perform the minimization of the Euclidean norm of the residual efficiently, an or-
thonormal basis of Km (q1 , A) is needed. There are several ways to transform an ar-
bitrary basis into an orthonormal one, e.g. the modified Gram1 –Schmidt2 method3
or the Householder4 algorithm. In the context of Krylov subspace methods, the
computation of an orthonormal basis of Km (q1 , A) is called Arnoldi’s5 method. 2
1 Jorgen Pedersen Gram (1850 – 1916)
2 Erhard Schmidt (1876 – 1959)
3 Given a set of orthonormal vectors {u_1, . . . , u_{m−1}} and a vector v_m that should be orthonormalized with respect to this set. In the original Gram–Schmidt method, one computes all projections of v_m with respect to u_i, i = 1, . . . , m − 1, and subtracts these projections. In the modified Gram–Schmidt method, one computes the projection of v_m onto one of the vectors, say u_1, and subtracts this projection. The result v_m^(1) is orthogonal to u_1. Now the projection of v_m^(1) with respect to a second vector, say u_2, is computed and subtracted to give v_m^(2). This vector is orthogonal to u_1 and u_2. Continuing this procedure leads to the orthonormalization of v_m with respect to {u_1, . . . , u_{m−1}}. In exact arithmetic, both versions are identical. From the numerical point of view, the original Gram–Schmidt method might be unstable whereas the modified Gram–Schmidt method is stable.
4 Alston Scott Householder (1904 – 1993)
5 Walter Edwin Arnoldi (1917 – 1995)

Algorithm 5.3 Arnoldi method, modified Gram–Schmidt method. Given
A ∈ Rn×n and q1 ∈ Rn with kq1 k2 = 1.
1. for j = 1 : m
2. wj = Aqj
3. for i = 1 : j
4. hij = (wj , qi )
5. wj = wj − hij qi % subtract projection
6. endfor
7. hj+1,j = kwj k2
8. if hj+1,j == 0
9. stop
10. endif
11. qj+1 = wj /hj+1,j
12. endfor
2
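For readers who prefer running code over pseudocode, the following is a direct NumPy transcription of Algorithm 5.3 (illustrative names, not part of the original notes). It returns the orthonormal basis vectors and the rectangular Hessenberg matrix that appears in relation (5.1) below:

import numpy as np

def arnoldi(A, q1, m):
    """Arnoldi / modified Gram-Schmidt: orthonormal basis of K_m(q1, A).

    Returns Q (n x (m+1)) with orthonormal columns and H ((m+1) x m) upper
    Hessenberg with A @ Q[:, :m] = Q @ H, see (5.1).
    """
    n = len(q1)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(m):
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = w @ Q[:, i]        # projection coefficient
            w = w - H[i, j] * Q[:, i]    # subtract projection (modified Gram-Schmidt)
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:           # early breakdown, cf. Theorem 5.8 i)
            return Q[:, :j + 1], H[:j + 1, :j + 1]
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 8))
q1 = rng.standard_normal(8)
Q, H = arnoldi(A, q1, 4)
print(np.allclose(A @ Q[:, :4], Q @ H))          # checks relation (5.1)
print(np.allclose(Q.T @ Q, np.eye(Q.shape[1])))  # orthonormal columns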

Lemma 5.4 Computation of an orthonormal basis by Arnoldi’s method.


If dim Km (q1 , A) = m, then Arnoldi’s method computes an orthonormal basis
{q1 , . . . , qm } of Km (q1 , A).

Proof: The vectors q1 , . . . , qm are orthonormal by construction: orthogonal by line 3


– 6 and normalized by line 7 and 11. One has to show that they belong all to Km (q1 , A).
This statement will follow from the fact that each vector qj is of the form pj−1 (A) q1 ,
where p_{j−1} is a polynomial of degree j − 1. The proof is done by induction. For j = 1, one has q_1 = p_0(A)q_1 with p_0(t) = 1. Assume the statement is true for all j ≤ k. One has, by using first line 11, then lines 3 – 6, and finally the assumption of the induction,

h_{k+1,k} q_{k+1} = w_k = Aq_k − Σ_{i=1}^{k} h_ik q_i = A p_{k−1}(A)q_1 − Σ_{i=1}^{k} h_ik p_{i−1}(A)q_1 = p_k(A)q_1.

Remark 5.5 Factorization of the system matrix. Denote

Q_m = (q_1, q_2, . . . , q_m) ∈ R^{n×m}

and let H_m ∈ R^{(m+1)×m} collect the coefficients h_ij from Arnoldi's method,

H_m = [ h_11  h_12  · · ·  h_{1,m−1}  h_1m
        h_21  h_22  · · ·  h_{2,m−1}  h_2m
        0     h_32  · · ·  h_{3,m−1}  h_3m
        ...                            ...
        0     0     · · ·  h_{m,m−1}  h_mm
        0     0     · · ·  0          h_{m+1,m} ].

A matrix of this form, i.e., with h_ij = 0 for i > j + 1, is called an (upper) Hessenberg6 matrix. It follows readily from Arnoldi's method that

A Q_m = Q_{m+1} H_m.   (5.1)

Remark 5.6 Initial vector in Krylov subspace methods. In the Krylov subspace methods, r(0)/‖r(0)‖_2 plays the role of q_1 in Arnoldi's method. □

6 Karl Hessenberg (1904 – 1959)

Remark 5.7 Principal approach for minimizing the residual. The goal of the methods presented in this section is to minimize the Euclidean norm of the residual. One has

r(k) = b − Ax(k) = r(0) + Ax(0) − Ax(k) = r(0) − A(x(k) − x(0))

with x(k) − x(0) ∈ K_k(r(0), A), see Remark 4.8. Since the vectors {q_1, . . . , q_k} computed with Arnoldi's method form a generating system of K_k(r(0), A), it is

x(k) − x(0) = Σ_{i=1}^{k} z_i q_i = Q_k z

with z = (z_1, . . . , z_k)^T, Q_k = (q_1, . . . , q_k). Using (5.1), Q_{k+1}e_1 = q_1 = r(0)/‖r(0)‖_2, and the fact that the Euclidean norm is invariant under orthonormal transformations, one obtains

‖r(k)‖_2² = ‖r(0) − AQ_k z‖_2² = ‖r(0) − Q_{k+1}H_k z‖_2²
          = ‖ ‖r(0)‖_2 Q_{k+1}e_1 − Q_{k+1}H_k z ‖_2² = ‖ Q_{k+1}( ‖r(0)‖_2 e_1 − H_k z ) ‖_2²
          = ‖ ‖r(0)‖_2 e_1 − H_k z ‖_2².

The minimizer of the residual is obtained by solving the least squares problem

min_{z∈R^k} ‖ ‖r(0)‖_2 e_1 − H_k z ‖_2².   (5.2)

This problem possesses k unknowns and the vector that has to be minimized has k + 1 components. It can be solved, e.g., with a QR decomposition, see the lecture notes of Numerical Mathematics I. Let z(k) be a solution of this problem, then the next iterate of the Krylov subspace method is

x(k) = x(0) + Q_k z(k).   (5.3)

This algorithm is called GMRES (generalized minimal residual). It was proposed for the first time in Saad and Schultz (1986). □
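Schematically, one GMRES(m) cycle is an Arnoldi run followed by the least squares problem (5.2) and the update (5.3). The sketch below is illustrative and not an optimized implementation; in particular it solves (5.2) with a dense least squares solver rather than the Givens rotations used in practice:

import numpy as np

def gmres_cycle(A, b, x0, m):
    """One GMRES(m) cycle: minimize ||r||_2 over x0 + K_m(r0, A), see (5.2)/(5.3)."""
    n = len(b)
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    if beta == 0.0:
        return x0
    # Arnoldi with modified Gram-Schmidt, as in Algorithm 5.3
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = r0 / beta
    k = m
    for j in range(m):
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = w @ Q[:, i]
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:           # early breakdown: exact solution reachable
            k = j + 1
            break
        Q[:, j + 1] = w / H[j + 1, j]
    # least squares problem (5.2); a dense solver replaces the Givens rotations
    e1 = np.zeros(k + 1); e1[0] = beta
    z, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)
    return x0 + Q[:, :k] @ z             # iterate (5.3)

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 30)) + 10 * np.eye(30)
b = rng.standard_normal(30)
x = np.zeros(30)
for cycle in range(5):                   # restarted GMRES(10)
    x = gmres_cycle(A, b, x, 10)
    print(cycle, np.linalg.norm(b - A @ x))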
Theorem 5.8 Properties of GMRES.
i) In the case that Arnoldi's method has an early breakdown, i.e., h_{l+1,l} = 0, then dim K_k(r(0), A) = l < k and r(l) = 0. Hence Ax(l) = b.
ii) The iterate x(k) = x(0) + Q_k z(k) is uniquely determined.
iii) It holds

‖r(k)‖_2 ≤ ‖r(k−1)‖_2,  k = 1, 2, 3, . . . .

Proof: i) The breakdown of Arnoldi's method after l steps, h_{l+1,l} = 0, is equivalent to w_l = 0. It follows from Arnoldi's algorithm, lines 3 – 6, that

Aq_l = Σ_{i=1}^{l} h_il q_i,

where q_1 = r(0)/‖r(0)‖_2 in the case of GMRES. Hence, one has dim K_{l+1}(r(0), A) = dim K_l(r(0), A). One obtains by induction

dim K_k(r(0), A) = dim K_l(r(0), A)  for k ≥ l.

Using matrix notation, Arnoldi's method gives in this case

A Q_l = Q_l H̃_l,

where H̃_l ∈ R^{l×l} is the upper Hessenberg matrix consisting of the first l rows of H_l and Q_l ∈ R^{n×l}. Since A is non-singular and rank(Q_l) = l, one has rank(AQ_l) = l. Consequently, it is rank(Q_l H̃_l) = l and rank(H̃_l) = l, and H̃_l is invertible. In the same way as above, one obtains

min ‖r(l)‖_2² = min_{z∈R^l} ‖ ‖r(0)‖_2 e_1 − H̃_l z ‖_2².

The minimizer is given by z(l) = ‖r(0)‖_2 H̃_l^{−1} e_1, which gives r(l) = 0.
ii) If rank(H_k) = k, the minimizer of (5.2) is unique (theory of least squares problems, see Numerical Mathematics I). If rank(H_k) < k, then x(k) = x, see i).
iii) The set in which the minimizer is computed becomes larger since the inclusion K_k(r(0), A) ⊇ K_{k−1}(r(0), A) holds.

Remark 5.9 Implementational issues.


• The GMRES process consists in principle of two steps:

1. computing the orthonormal basis of Kk r(0) , A ,
2. solving the least squares problem (5.2) to find the minimizer of the residual
with a standard method.
In the practical use of GMRES, Step 2 is performed only at the end of the
iteration. Thus, the iterate x(k) is not directly available. It is computed in
a post-processing step. However, there is an elegant and inexpensive way to compute ‖r(k)‖_2 without having access to x(k), see Saad (2003). With ‖r(k)‖_2 one can control the iterative process.
For concrete ways to implement GMRES, the reader is referred to Saad and Schultz (1986); Saad (2003).
• Each step of GMRES requires one matrix-vector multiplication, line 2 of Arnoldi's method.
• In exact arithmetic, GMRES terminates with the solution in at most n steps. This property is, however, of no practical use for large n.
• From the practical point of view, the greatest problem of GMRES is that one has to store the basis {q_1, . . . , q_k} of K_k(r(0), A), see lines 3 – 6 of Arnoldi's method. Thus, with every new iteration, one has to store an additional vector. This situation is called a long recurrence. In practice, one prescribes a maximal order m of the Krylov subspace. After m iterations, GMRES is stopped with the iterate x(m). If x(m) is not yet sufficiently close to the solution, GMRES is started from the beginning with x(0) = x(m). This approach is called GMRES(m) (with restart). An optimal choice of m is in general an unresolved problem. Often m ∈ [5, 20] is used. GMRES(m) might also fail to converge, see Saad and Schultz (1986) for the simple example

A = ( 0  1 ; −1  0 ),   f = (1, 1)^T,   x(0) = (0, 0)^T.

GMRES converges in two steps whereas GMRES(1) computes the stationary sequence x(1) = x(0), x(2) = x(1) = x(0), and so on. Despite the possibility of failure, GMRES(m) is one of the most popular and best performing iterative methods for solving linear systems of equations with a non-symmetric matrix.
□

5.2 Symmetric Matrices

Remark 5.10 Goal. Arnoldi's method and the minimization of the residual in K_k(r(0), A) will be studied in the special case that A is symmetric. The most important result will be that in this case, it is not necessary to store the basis of K_k(r(0), A). It suffices to store a fixed small number of basis vectors. Thus, the memory requirements do not increase in the course of the iteration and the most important problem of using GMRES vanishes. □

Remark 5.11 Arnoldi's method revisited. First, Arnoldi's method is revisited. From the general relation (5.1) it follows by the orthonormality of the columns of Q_k and Q_{k+1} that

Q_k^T A Q_k = Q_k^T Q_{k+1} H_k = ( I_k  0 ) H_k =: H̃_k ∈ R^{k×k}.   (5.4)

Thus, H̃_k contains just the first k rows of H_k. Since

(Q_k^T A Q_k)^T = Q_k^T A^T Q_k = Q_k^T A Q_k

is a symmetric matrix, H̃_k is symmetric, too. As in the case of a general matrix, H̃_k is an upper Hessenberg matrix. From its symmetry, it follows that H̃_k is even a tridiagonal matrix. Hence, H_k is a tridiagonal matrix, too:

H_k = [ α_1  β_2  0    · · ·  0        0
        β_2  α_2  β_3  · · ·  0        0
        ...                             ...
        0    0    0    · · ·  α_{k−1}  β_k
        0    0    0    · · ·  β_k      α_k
        0    0    0    · · ·  0        β_{k+1} ]  ∈ R^{(k+1)×k}.

Arnoldi's method simplifies. Using (5.1) and the special form of H_k, one obtains the relation

Aq_k = β_k q_{k−1} + α_k q_k + β_{k+1} q_{k+1}.

From this relation, q_{k+1} can be computed. The corresponding algorithm is called the Lanczos7 algorithm. □

Algorithm 5.12 Lanczos algorithm – modified Gram–Schmidt variant.


Given a symmetric matrix A ∈ Rn×n and q1 ∈ Rn with kq1 k2 = 1.
1. β1 = 0
2. q0 = 0
3. for j = 1 : m
4. s = Aqj − βj qj−1
5. αj = (s, qj )
6. s = s − αj qj
7. βj+1 = ksk2
8. if βj+1 == 0
9. stop
10. endif
11. qj+1 = s/βj+1
12. endfor
□

7 Cornelius Lanczos (1893 – 1974)
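A NumPy transcription of Algorithm 5.12 (illustrative, not part of the original notes), returning the orthonormal vectors and the tridiagonal coefficients α_j, β_j:

import numpy as np

def lanczos(A, q1, m):
    """Lanczos three-term recurrence for symmetric A (Algorithm 5.12)."""
    n = len(q1)
    Q = np.zeros((n, m + 1))
    alpha = np.zeros(m)
    beta = np.zeros(m + 1)                 # beta[0] corresponds to beta_1 = 0
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(m):
        s = A @ Q[:, j]
        if j > 0:
            s = s - beta[j] * Q[:, j - 1]
        alpha[j] = s @ Q[:, j]
        s = s - alpha[j] * Q[:, j]
        beta[j + 1] = np.linalg.norm(s)
        if beta[j + 1] == 0.0:             # early breakdown
            break
        Q[:, j + 1] = s / beta[j + 1]
    return Q, alpha, beta

rng = np.random.default_rng(5)
B = rng.standard_normal((12, 12))
A = B + B.T                                # symmetric test matrix
Q, alpha, beta = lanczos(A, rng.standard_normal(12), 6)
T = np.diag(alpha) + np.diag(beta[1:6], 1) + np.diag(beta[1:6], -1)
print(np.allclose(Q[:, :6].T @ A @ Q[:, :6], T))   # H~_k is tridiagonal, see (5.4)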

Remark 5.13 Short recurrence. The computation of qj+1 requires only qj−1 and
qj , see lines 4, 6, and 11. This situation is called short recurrence. 2

Lemma 5.14 Non-singularity of the matrix generated by the Lanczos


method. Let A be symmetric and positive definite. Then, the matrix H̃k = QTk AQk
which is generated in the Lanczos method is non-singular.

Proof: One has for all y ∈ R^k, y ≠ 0,

y^T H̃_k y = y^T Q_k^T A Q_k y = (Q_k y)^T A (Q_k y) > 0,

since Qk has full rank and A is positive definite. Hence, H̃k is also positive definite (and
symmetric). It follows that H̃k is non-singular.

Remark 5.15 The minimization of the residual revisited. In the second step, one has to find a way to minimize the residual in K_k(r(0), A) without having to store the complete basis of K_k(r(0), A). Only if this is possible does the short recurrence of the Lanczos algorithm become useful.
The least squares problem which has to be solved has the form (5.2). To solve this problem, a QR decomposition of H_k is used:

H_k = Q̄ R̄_k,  Q̄ ∈ R^{(k+1)×(k+1)},  R̄_k ∈ R^{(k+1)×k}.

Here, Q̄ is a unitary matrix and R̄_k an upper triangular matrix.
The unitary matrix Q̄ describes, geometrically, rotations and reflections. It can be decomposed as a product of simple rotations or reflections, so-called Givens8 rotations or Givens reflections,

Q̄ = G_1 G_2 · · · G_{k−1} G_k,

where, in the case of a Givens rotation, G_j ∈ R^{(k+1)×(k+1)} equals the identity matrix except for the 2×2 block

( c_j   s_j
 −s_j   c_j ),   c_j² + s_j² = 1,

see the lecture notes of Numerical Mathematics I. For a Givens reflection, this block has the form

( c_j   s_j
  s_j  −c_j ),   c_j² + s_j² = 1.

The off-diagonal entries are in the rows and columns j and j + 1. It is

R̄_k = Q̄^T H_k = G_k^T G_{k−1}^T · · · G_2^T G_1^T H_k.

8 James Wallace Givens, Jr. (1910 – 1993)

Since H_k is tridiagonal, one obtains

R̄_k = [ r_11  r_12  r_13  0     · · ·  0
        0     r_22  r_23  r_24  · · ·  0
        ...                             ...
        0     0     0     · · ·  r_{k−2,k}
        0     0     0     · · ·  r_{k−1,k}
        0     0     0     · · ·  r_{k,k}
        0     0     0     0     · · ·  0 ],

i.e. r_ij = 0 if j > i + 2. A Givens rotation changes only the two rows that are involved, i.e. here the two neighbouring rows j and j + 1. Since the non-zero entries in column j of H_k are at rows (j − 1), j, and j + 1, where the last one will be transformed to become zero, a new non-zero entry in column (j + 1) can occur only at row (j − 1).
Consider the only interesting case r(k) ≠ 0, in which the matrix H_k has full rank. Let R_k be the matrix which consists of the first k rows of R̄_k. The matrix R_k is non-singular since H_k has full rank. Setting

P_k = (p_1 p_2 . . . p_k) := Q_k R_k^{−1} ∈ R^{n×k},

one has from P_k R_k = Q_k and due to the special form of R_k the recursion

p_1 = q_1/r_11,
p_2 = (q_2 − r_12 p_1)/r_22   (⇐ r_22 p_2 + r_12 p_1 = q_2),
...
p_j = (q_j − p_{j−1} r_{j−1,j} − p_{j−2} r_{j−2,j})/r_jj,  j = 3, . . . , k.   (5.5)

The least squares problem (5.2) can now be rewritten in the form

min_{z∈R^k} ‖ ‖r(0)‖_2 e_1 − G_1 G_2 · · · G_k R̄_k z ‖_2² = min_{z∈R^k} ‖ ‖r(0)‖_2 G_k^T · · · G_2^T G_1^T e_1 − R̄_k z ‖_2²,

because the Euclidean norm is invariant under a multiplication with a unitary matrix. Since the last row of R̄_k vanishes, its Moore9–Penrose10 inverse (pseudo-inverse), see Numerical Mathematics I, is given by

R̄_k^+ = ( R_k^{−1}  0 ) ∈ R^{k×(k+1)}

and the solution of the least squares problem is given by

z(k) = ‖r(0)‖_2 R̄_k^+ G_k^T · · · G_2^T G_1^T e_1 = ‖r(0)‖_2 R_k^{−1} ( G_k^T · · · G_1^T e_1 )_{1≤i≤k},

where the last index symbolizes that only the first k components of the vector are taken. Consequently, the iterate with the minimal residual has the form, see (5.3),

x(k) = x(0) + Q_k z(k) = x(0) + ‖r(0)‖_2 Q_k R_k^{−1} ( G_k^T · · · G_1^T e_1 )_{1≤i≤k}
     = x(0) + ‖r(0)‖_2 P_k ( G_k^T · · · G_1^T e_1 )_{1≤i≤k}.

9 Eliakim Hastings Moore (1862 – 1932)
10 Roger Penrose, born 1931
Since the Givens rotation or reflection G_j^T influences only the rows j and j + 1 of the vector to which it is applied, the first (j − 1) of its components stay unchanged:

( G_j^T · · · G_1^T e_1 )_{1≤i≤j−1} = ( G_{j−1}^T · · · G_1^T e_1 )_{1≤i≤j−1}.

It follows that

x(k) = x(0) + ‖r(0)‖_2 P_{k−1} ( G_k^T · · · G_1^T e_1 )_{1≤i≤k−1} + ‖r(0)‖_2 p_k ( G_k^T · · · G_1^T e_1 )_{i=k}
     = x(0) + ‖r(0)‖_2 P_{k−1} ( G_{k−1}^T · · · G_1^T e_1 )_{1≤i≤k−1} + ‖r(0)‖_2 p_k ( G_k^T · · · G_1^T e_1 )_{i=k}
     = x(k−1) + ‖r(0)‖_2 ( G_k^T · · · G_1^T e_1 )_{i=k} p_k.

For computing p_k, one needs, see (5.5), q_k, p_{k−1}, and p_{k−2}. The result p_k can be stored in place of p_{k−2} since p_{k−2} is not needed any longer. Together with the short recurrence of the Lanczos algorithm, it is shown that the storage of the basis of K_k(r(0), A) is not necessary.
The resulting method which computes iterates with minimal residual for symmetric matrices A is called MINRES. MINRES requires the storage of six arrays: q_k, q_{k+1}, s, p_k, p_{k−1}, and x(k). In comparison with GMRES, the current iterate x(k) is known and not only the residual of the current iterate. □

Remark 5.16 S.p.d. matrices, conjugate residual method. In practice, A is often not only symmetric but also positive definite. In this case, MINRES is seldom used, since for s.p.d. matrices there is a more efficient method called the conjugate residual method. □

Definition 5.17 Conjugate vectors. Let A ∈ Rn×n be symmetric and positive


definite. The vectors x, y ∈ Rn are called (A)-orthogonal or (A)-conjugate if

xT Ay = (Ax, y) = 0.

If there is no ambiguity, the vectors are called just conjugate. 2

Remark 5.18 Comparison of MINRES and conjugate residuals, conjugate gradient


method. The conjugate residual method needs to store only five arrays. It requires in each iteration one matrix-vector product. The memory requirements are one array more than for the conjugate gradient method, see Section 6.2. In addition, one has to compute one more vector update (2n flops) per iteration with the conjugate residual method in comparison with the conjugate gradient method. Since both methods need in general a similar number of iterations, the conjugate gradient method is preferred in practice. For this reason, the reader is referred to the literature for more details concerning the conjugate residual method. □

Remark 5.19 S.p.d. matrices vs. other matrices. As can already be seen in the case of Krylov subspace methods that minimize the residual, one has to distinguish the cases that A is s.p.d. or that A is a general matrix. These two cases represent two worlds in the context of iterative methods for solving linear systems of equations. Methods that can be used for general matrices, and that are considered to work usually well in this case, are generally not the best methods for s.p.d. matrices. The solution of systems with s.p.d. matrices is much simpler. In engineering practice, it is a common approach to try to reduce the solution of a complicated problem to the successive solution of linear systems with s.p.d. matrices. □

Chapter 6

Krylov Subspace Methods that Are Based on a Projection of the Residual

Remark 6.1 Idea. The methods presented in this section determine the iterate x(k) in the manifold x(0) + K_k(r(0), A) such that the corresponding residual r(k) is orthogonal to K_k(r(0), A). That means, r(k) is projected into the orthogonal complement K_k(r(0), A)^⊥ of K_k(r(0), A). □

6.1 General Matrices

Remark 6.2 Full orthogonalization method. Let {q_1, . . . , q_k} be an orthonormal basis of K_k(r(0), A) computed with Arnoldi's method and Q_k = (q_1, . . . , q_k). The identities (5.1) and (5.4) are valid. It is q_1 = r(0)/‖r(0)‖_2. Set β = ‖r(0)‖_2. By the orthogonality of the columns of Q_k it follows that

Q_k^T r(0) = β Q_k^T q_1 = β e_1.   (6.1)

Consequently, the iterate x(k) is given by

x(k) = x(0) + Q_k y_k  with  y_k = H̃_k^{−1}(β e_1),   (6.2)

since the orthogonal projection of r(k) into K_k(r(0), A)^⊥ is unique and the iterate (6.2) fulfills r(k) ⊥ K_k(r(0), A):

Q_k^T r(k) = Q_k^T (b − Ax(0) − β A Q_k H̃_k^{−1} e_1) = Q_k^T r(0) − β (Q_k^T A Q_k) H̃_k^{−1} e_1
           = Q_k^T r(0) − β e_1 = β e_1 − β e_1 = 0,

where (5.4) and (6.1) have been used.
The algorithm which is based on this approach is called the full orthogonalization method (FOM). Since it is of little relevance in practice, it will not be presented here in detail. Similar to GMRES, an early breakdown of the Arnoldi process is equivalent to already having computed the solution. FOM possesses the same great problem as GMRES, since the whole basis of K_k(r(0), A) has to be stored. In contrast to GMRES, the iterate of FOM is undefined if H̃_k is singular. This situation can happen, e.g., if A is a symmetric indefinite matrix. □

6.2 Symmetric Matrices

Remark 6.3 SYMMLQ for symmetric matrices. If A is a symmetric matrix, there is a way to perform FOM with a short recurrence, i.e., without having to store the whole basis {q_1, . . . , q_k} of K_k(r(0), A). The resulting method is called SYMMLQ. This method will not be presented here. Instead, the case that A is symmetric and positive definite will be studied in detail. Then, SYMMLQ can be simplified, leading to the famous conjugate gradient (CG) method. □
Remark 6.4 Lanczos algorithm for an s.p.d. matrix. CG will be derived from the Lanczos algorithm 5.12. The starting point is the Cholesky1 decomposition of H̃_k,

H̃_k = L_k D_k L_k^T,   (6.3)

where L_k is a unit lower bidiagonal matrix with subdiagonal entries l_1, . . . , l_{k−1} and D_k = diag(d_1, . . . , d_k).
Define P̂_k = Q_k L_k^{−T} = (p̂_1, . . . , p̂_k). The columns of P̂_k are linear combinations of the columns of Q_k such that span{p̂_1, . . . , p̂_k} ⊂ K_k(r(0), A). Since L_k is a non-singular matrix and since the columns of Q_k form a basis of K_k(r(0), A), the columns of P̂_k form a basis of K_k(r(0), A), too. It is for the iterate (6.2) of FOM

x(k) = x(0) + β Q_k L_k^{−T} D_k^{−1} L_k^{−1} e_1 = x(0) + P̂_k y_k   (6.4)

with y_k = β D_k^{−1} L_k^{−1} e_1. □

1 André Louis Cholesky (1875 – 1918)

Lemma 6.5 Columns of P̂_k are A-conjugate. The columns {p̂_1, . . . , p̂_k} are mutually A-conjugate, i.e., P̂_k^T A P̂_k is a diagonal matrix.

Proof: Using (5.4) and (6.3) gives

P̂_k^T A P̂_k = L_k^{−1} Q_k^T A Q_k L_k^{−T} = L_k^{−1} H̃_k L_k^{−T} = L_k^{−1} L_k D_k L_k^T L_k^{−T} = D_k.

Remark 6.6 First version of a method. The last column of P̂_k is given by

p̂_k = q_k − l_{k−1} p̂_{k−1},   (6.5)

which follows immediately from Q_k = P̂_k L_k^T. The update y_k in (6.4) has the form y_k = (y_{k−1}, η_k)^T with η_k ∈ R, since D_k^{−1} L_k^{−1} is block lower triangular with leading (k−1)×(k−1) block D_{k−1}^{−1} L_{k−1}^{−1}, so that

y_k = β D_k^{−1} L_k^{−1} e_{1,k} = ( β D_{k−1}^{−1} L_{k−1}^{−1} e_{1,k−1} , η_k )^T = ( y_{k−1} , η_k )^T,

where e_{1,k} is the first Cartesian unit vector with k components and e_{1,k−1} the first Cartesian unit vector of length (k − 1). This means, the first (k − 1) components of y_k are the components of y_{k−1}. Now, one needs to find a formula for η_k.
From the definition of y_k it follows that L_k D_k y_k = β e_1. The last row of this system reads l_{k−1} y_{k,k−1} d_{k−1} + η_k d_k = 0, and hence

η_k = − l_{k−1} y_{k,k−1} d_{k−1} / d_k,  if k ≥ 2.   (6.6)

The first component, η_1, is given by

η_1 = β D_1^{−1} L_1^{−1} e_1 = β/d_1.

Inserting all terms into (6.4) gives

x(k) = x(0) + P̂_k y_k = x(0) + P̂_{k−1} y_{k−1} + η_k p̂_k = x(k−1) + η_k p̂_k.   (6.7)

Thus, the new iterate can be computed with (6.5) and (6.6). This approach shows that a short recurrence is possible. However, it is not yet optimal and it can be simplified. □
Remark 6.7 Optimal version of the method. It holds for the residual, using (6.4), that

r(k) = b − Ax(k) = b − Ax(0) − A P̂_k y_k = r(0) − Az

with some vector z ∈ K_k(r(0), A) = span{q_1, . . . , q_k}, since the columns of P̂_k form a basis of K_k(r(0), A), see Remark 6.4. This representation shows first that

r(k) ∈ K_{k+1}(r(0), A) = span{q_1, . . . , q_k, q_{k+1}}.

By construction, see Remark 6.2, it is also r(k) ⊥ K_k(r(0), A). These two properties imply that r(k) ∈ span{q_{k+1}}, such that

r(k) = ± ‖r(k)‖_2 q_{k+1}.

Using (6.7), the residual vector r(k) can be computed recursively by

r(k) = b − Ax(k) = b − A(x(k−1) + η_k p̂_k) = r(k−1) − η_k A p̂_k.   (6.8)

Setting

q_{k+1} = r(k)/‖r(k)‖_2   (6.9)

and denoting p_k = ‖r(k−1)‖_2 p̂_k, one obtains with (6.7) and (6.8)

r(k) = r(k−1) − (η_k/‖r(k−1)‖_2) A p_k = r(k−1) − ν_k A p_k,   (6.10)
x(k) = x(k−1) + (η_k/‖r(k−1)‖_2) p_k = x(k−1) + ν_k p_k.   (6.11)

With (6.5) and (6.9), one gets

p_k = ‖r(k−1)‖_2 p̂_k = ‖r(k−1)‖_2 ( q_k − l_{k−1} p̂_{k−1} )
    = ‖r(k−1)‖_2 ( r(k−1)/‖r(k−1)‖_2 − l_{k−1} p_{k−1}/‖r(k−2)‖_2 )
    = r(k−1) − ( ‖r(k−1)‖_2 l_{k−1} / ‖r(k−2)‖_2 ) p_{k−1} = r(k−1) + µ_k p_{k−1}.   (6.12)

Now, formulas for ν_k and µ_k are needed. Multiplying (6.12) from the left-hand side with p_k^T A gives

p_k^T A p_k = p_k^T A r(k−1) + µ_k p_k^T A p_{k−1} = p_k^T A r(k−1),   (6.13)

since p_k^T A p_{k−1} is a multiple of p̂_k^T A p̂_{k−1} = 0 by Lemma 6.5. Multiplying (6.10) from the left with (r(k−1))^T and using r(j) = c q_{j+1}, j = 1, . . . , k, and the orthonormality of the vectors q_j leads to

(r(k−1))^T r(k) = (r(k−1))^T r(k−1) − ν_k (r(k−1))^T A p_k,

where the left-hand side vanishes, which gives together with (6.13)

ν_k = (r(k−1))^T r(k−1) / ( (r(k−1))^T A p_k ) = (r(k−1))^T r(k−1) / ( p_k^T A p_k ).   (6.14)

Now, multiplying (6.12) from the left with p_{k−1}^T A leads to, using Lemma 6.5,

p_{k−1}^T A p_k = p_{k−1}^T A r(k−1) + µ_k p_{k−1}^T A p_{k−1},

where the left-hand side vanishes, which gives

µ_k = − p_{k−1}^T A r(k−1) / ( p_{k−1}^T A p_{k−1} ).

To simplify this expression, multiply (6.10) from the left with (r(k))^T such that one obtains, using r(k) ⊥ r(k−1),

(r(k))^T r(k) = 0 − ν_k (r(k))^T A p_k   =⇒   ν_k = − (r(k))^T r(k) / ( (r(k))^T A p_k ).

This expression gives together with (6.14)

− (r(k))^T A p_k / ( p_k^T A p_k ) = (r(k))^T r(k) / ( (r(k−1))^T r(k−1) ),

such that

µ_k = (r(k−1))^T r(k−1) / ( (r(k−2))^T r(k−2) ).   (6.15)

The evaluation of this expression requires only two inner products but no matrix-vector product. These considerations lead to Algorithm 6.8. □

Algorithm 6.8 Conjugate Gradient (CG). Given a symmetric positive definite matrix A ∈ R^{n×n}, a right-hand side b ∈ R^n, an initial iterate x(0) ∈ R^n, and a tolerance ε > 0.
1. r(0) = b − Ax(0)
2. p1 = r(0)
3. k = 0
4. while ‖r(k)‖_2 > ε
5.   k = k + 1
6.   s = Apk
7.   νk = ((r(k−1))^T r(k−1)) / (pk^T s)   % (6.14)
8.   x(k) = x(k−1) + νk pk   % (6.11)
9.   r(k) = r(k−1) − νk s   % (6.10)
10.  µk+1 = ((r(k))^T r(k)) / ((r(k−1))^T r(k−1))   % (6.15)
11.  pk+1 = r(k) + µk+1 pk   % (6.12)
12. endwhile
□
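Algorithm 6.8 translates almost line by line into NumPy; a sketch with illustrative names:

import numpy as np

def conjugate_gradient(A, b, x0, eps=1e-10, maxit=None):
    """Conjugate gradient method for s.p.d. A, following Algorithm 6.8."""
    x = x0.copy()
    r = b - A @ x                        # line 1
    p = r.copy()                         # line 2
    rho = r @ r
    k = 0
    maxit = maxit if maxit is not None else len(b)
    while np.sqrt(rho) > eps and k < maxit:
        k += 1
        s = A @ p                        # line 6: the only matrix-vector product
        nu = rho / (p @ s)               # line 7, formula (6.14)
        x = x + nu * p                   # line 8, formula (6.11)
        r = r - nu * s                   # line 9, formula (6.10)
        rho_new = r @ r
        mu = rho_new / rho               # line 10, formula (6.15)
        p = r + mu * p                   # line 11, formula (6.12)
        rho = rho_new
    return x, k

rng = np.random.default_rng(6)
B = rng.standard_normal((50, 50))
A = B @ B.T + 50 * np.eye(50)            # s.p.d. test matrix
b = rng.standard_normal(50)
x, its = conjugate_gradient(A, b, np.zeros(50))
print(its, np.linalg.norm(b - A @ x))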

Remark 6.9 First publication of CG. The CG method was published for the first time by Hestenes2 and Stiefel3 in Hestenes and Stiefel (1952). □

2 Magnus Rudolph Hestenes (1906 – 1991)
3 Eduard L. Stiefel (1909 – 1978)

Remark 6.10 Costs of CG. The costs of one CG iteration are:
• one matrix-vector multiplication, line 6,
• three additions of vectors in R^n, lines 8, 9, 11,
• three multiplications of vectors with a scalar, lines 8, 9, 11,
• two inner products of vectors, lines 7, 10. The inner product (r(k−1))^T r(k−1) is already known from the previous iteration.
One has to store four vectors: x(k), r(k), pk, s. In comparison with the conjugate residual method, CG needs one vector update (2n flops) less and one has to store one vector less. Since both schemes exhibit in general a similar convergence, CG is generally preferred.
Altogether, CG is in general the best performing iterative scheme without a multigrid component for solving linear systems of equations with a symmetric positive definite matrix. □

Definition 6.11 Energy norm. Let A ∈ R^{n×n} be s.p.d., then A induces a vector norm by

‖x‖_A = (x, Ax)^{1/2}   ∀ x ∈ R^n,

the so-called energy norm. □
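A small sketch (illustrative, assuming NumPy) that evaluates the energy norm and checks ‖x‖_A = ‖A^{1/2}x‖_2, an identity that is also used later in Remark 7.3:

import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)              # s.p.d.
x = rng.standard_normal(5)

energy_norm = np.sqrt(x @ A @ x)         # ||x||_A = (x, Ax)^{1/2}

# A^{1/2} via the eigendecomposition of the symmetric matrix A
lam, V = np.linalg.eigh(A)
A_half = V @ np.diag(np.sqrt(lam)) @ V.T

print(np.isclose(energy_norm, np.linalg.norm(A_half @ x)))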

Theorem 6.12 Minimization of the error in the energy norm. Let A ∈ R^{n×n} be symmetric and positive definite. The iterate

x(k) = x(0) + ‖r(0)‖_2 Q_k H̃_k^{−1} e_1

is well defined and it is the solution of

min_{y ∈ x(0) + K_k(r(0), A)} ‖x − y‖_A,

where x is the solution of (1.1). The corresponding residual r(k) is orthogonal to K_k(r(0), A), i.e., Q_k^T r(k) = 0.

Proof: The non-singularity of the matrix H̃_k, Lemma 5.14, implies that the iterate x(k) is well defined. The orthogonality of r(k) and K_k(r(0), A) follows by the construction of the method, see Remark 6.2.
Let y ∈ x(0) + K_k(r(0), A), y ≠ x(k), and denote z = y − x(k) ∈ K_k(r(0), A), z ≠ 0. Using the symmetry of A, the orthogonality of r(k) to z ∈ K_k(r(0), A), and the positive definiteness of A gives

‖x − y‖_A² = (x − y)^T A (x − y) = (x − x(k) − z)^T A (x − x(k) − z)
  = (x − x(k))^T A (x − x(k)) − z^T A (x − x(k)) − (x − x(k))^T A z + z^T A z
  = (x − x(k))^T A (x − x(k)) − 2 z^T A (x − x(k)) + z^T A z
  = (x − x(k))^T A (x − x(k)) − 2 z^T r(k) + z^T A z
  > (x − x(k))^T A (x − x(k)) = ‖x − x(k)‖_A²,

since z^T r(k) = 0 and z^T A z > 0, where A(x − x(k)) = b − Ax(k) = r(k) has been used.

Remark 6.13 On the energy norm. To minimize the error in the energy norm is more natural than to minimize the Euclidean norm of the residual, since

‖r(k)‖_2 = ‖A(x − x(k))‖_2 = ‖x − x(k)‖_{A²}.

The energy norm is the natural measure for the error.
In the literature, one can find the derivation of the CG method also with the starting point of trying to minimize the error in the energy norm. One finds that the most simple approach, the steepest descent method, converges very slowly. Considerations on improving the iterative scheme lead finally to the CG method. □

Chapter 7

Convergence of Krylov Subspace Methods

Remark 7.1 Motivation. The Krylov subspace methods compute the solution of
(1.1) in at most n iterations (in exact arithmetic) by construction. However, this
property is useless if n is large. The question arises if one can get information about
the iterate x(k) for k < n. 2

Remark 7.2 Starting point of the convergence analysis. The basis of the convergence analysis for Krylov subspace methods is the following observation: z ∈ K_k(r(0), A) is equivalent to z = q_{k−1}(A)r(0), where q_{k−1} ∈ P_{k−1} is a polynomial of degree k − 1. It follows for the residual of the k-th iterate that

r(k) = b − Ax(k) = b − A(x(0) + z) = r(0) − Az = r(0) − A q_{k−1}(A)r(0) = p_k(A)r(0),   (7.1)

where p_k(x) = 1 − x q_{k−1}(x) ∈ P_k with p_k(0) = 1.
Considering the methods which are based on the minimization of the residual, see Chapter 5, one has now

‖r(k)‖_2 = min_{p_k ∈ P_k, p_k(0)=1} ‖p_k(A)r(0)‖_2,

such that with ‖p_k(A)r(0)‖_2 ≤ ‖p_k(A)‖_2 ‖r(0)‖_2 it follows that

‖r(k)‖_2 / ‖r(0)‖_2 ≤ min_{p_k ∈ P_k, p_k(0)=1} ‖p_k(A)‖_2.   (7.2)

For all Krylov subspace methods, in particular for those methods which are based on projecting the residual, see Chapter 6, it holds with (7.1), writing p_k(A) = Σ_{i=0}^{k} α_i A^i, that

x − x(k) = A^{−1}b − (A^{−1}b − A^{−1}r(k)) = A^{−1}r(k) = A^{−1}p_k(A)r(0)
         = Σ_{i=0}^{k} α_i A^{i−1} r(0) = Σ_{i=0}^{k} α_i A^i (A^{−1}r(0)) = p_k(A) A^{−1}r(0) = p_k(A)(x − x(0)).   (7.3)
Remark 7.3 S.p.d. matrices and the CG method. Consider the case that A is symmetric and positive definite. Then, one gets from (7.3)

‖x − x(k)‖_A = ‖p_k(A)(x − x(0))‖_A.

The iterate x(k) of the conjugate gradient method is the minimizer of ‖x − x(k)‖_A in x(0) + K_k(r(0), A), see Theorem 6.12. Hence

‖x − x(k)‖_A = min_{p_k ∈ P_k, p_k(0)=1} ‖p_k(A)(x − x(0))‖_A,

since p_k(A) is the only parameter in the expression on the right-hand side. From

‖p_k(A)(x − x(0))‖_A = ( (p_k(A)(x − x(0)))^T A (p_k(A)(x − x(0))) )^{1/2}
  = ( (A^{1/2}p_k(A)(x − x(0)))^T (A^{1/2}p_k(A)(x − x(0))) )^{1/2}
  = ( (p_k(A)A^{1/2}(x − x(0)))^T (p_k(A)A^{1/2}(x − x(0))) )^{1/2}
  = ‖p_k(A) A^{1/2}(x − x(0))‖_2
  ≤ ‖p_k(A)‖_2 ‖A^{1/2}(x − x(0))‖_2 = ‖p_k(A)‖_2 ‖x − x(0)‖_A

it follows that

‖x − x(k)‖_A / ‖x − x(0)‖_A ≤ min_{p_k ∈ P_k, p_k(0)=1} ‖p_k(A)‖_2.   (7.4)

The right-hand side of (7.4) is the same as the right-hand side of (7.2). □

Lemma 7.4 Characterization of ‖p_k(A)‖_2 for normal matrices. If A ∈ R^{n×n} is a normal matrix, see Definition 2.14, then

min_{p_k ∈ P_k, p_k(0)=1} ‖p_k(A)‖_2 = min_{p_k ∈ P_k, p_k(0)=1} max_{λ is eigenvalue of A} |p_k(λ)|.

Proof: Let p_k ∈ P_k be an arbitrary polynomial with p_k(0) = 1, p_k(x) = Σ_{i=0}^{k} α_i x^i, and let A = Q*ΛQ be the factorization from Remark 2.15. Then

‖p_k(A)‖_2 = ‖p_k(Q*ΛQ)‖_2 = ‖ Σ_{i=0}^{k} α_i (Q*ΛQ)^i ‖_2 = ‖ Q*( Σ_{i=0}^{k} α_i Λ^i )Q ‖_2 = ‖Q*p_k(Λ)Q‖_2 = ‖p_k(Λ)‖_2,

since

(Q*ΛQ)^i = Q*Λ(QQ*)Λ Q · · · (QQ*)ΛQ = Q*Λ^i Q

and the ‖·‖_2-norm is invariant with respect to the multiplication with unitary matrices. The matrix p_k(Λ) is diagonal with the entries p_k(λ_i). Hence

‖p_k(A)‖_2 = max_{1≤i≤n} |p_k(λ_i)|

by the definition of the spectral norm.
Remark 7.5 Chebyshev polynomials. For proving the convergence theorem, Chebyshev1 polynomials of first kind will be used, see also the lecture notes of Numerical Mathematics I,

T_k(x) = cos(k arccos(x)) = x^k − binom(k,2) x^{k−2}(1 − x²) + binom(k,4) x^{k−4}(1 − x²)² − binom(k,6) x^{k−6}(1 − x²)³ ± · · · ,  x ∈ [−1, 1].

In particular, it is T_k(x) ∈ [−1, 1] for x ∈ [−1, 1], and

T_0(x) = 1,  T_1(x) = x,  T_2(x) = 2x² − 1.

The domain of definition of T_k(x) can be extended to |x| > 1. It is

arccos(x) = (1/i) ln( x + √(x² − 1) ),  x ∈ R,

such that

T_k(x) = cos( (k/i) ln( x + √(x² − 1) ) ).

For x > 1, one has

ln( x + √(x² − 1) ) = arcosh(x),

and from

cos(z/i) = cos(−iz) = cos(iz) = cosh(z) = (e^z + e^{−z})/2,  z ∈ C,

it follows that

T_k(x) = cosh(k arcosh(x))  for x > 1.

For symmetry reasons, one obtains for x < −1

T_k(x) = (−1)^k cosh(k arcosh(−x)).   (7.5)

1 Pafnuty Lvovich Chebyshev (1821 – 1894)

Theorem 7.6 Estimate of the rate of convergence for s.p.d. matrices. Let A be symmetric and positive definite. Then

min_{p_k ∈ P_k, p_k(0)=1} ‖p_k(A)‖_2 ≤ 2 ( (√κ_2(A) − 1) / (√κ_2(A) + 1) )^k.

Proof: The idea of the proof consists in constructing a special polynomial which gives the estimate, since

min_{p_k ∈ P_k, p_k(0)=1} ‖p_k(A)‖_2 ≤ ‖p_{k,special}(A)‖_2.

Let λ_min be the smallest and λ_max be the largest eigenvalue of A. Consider the linear function

λ : R → R,  t ↦ (λ_min + λ_max)/2 + ((λ_max − λ_min)/2) t.

In particular, the restriction t ∈ [−1, 1] gives λ ∈ [λ_min, λ_max]. The root of λ(t) is denoted by t_0. It is

t_0 = −(λ_min + λ_max)/(λ_max − λ_min) = −(κ_2(A) + 1)/(κ_2(A) − 1) < −1,

where one uses that for symmetric positive definite matrices κ_2(A) = λ_max/λ_min. Denoting by t(λ) the inverse function, one defines the special polynomial

p_k(λ) = T_k(t(λ))/T_k(t(0)) =: T_k(t)/T_k(t_0) ∈ P_k.

Then p_k(0) = T_k(t_0)/T_k(t_0) = 1. It is by Lemma 7.4 and since λ ∈ [λ_min, λ_max] for all eigenvalues of A (the maximum does not decrease if it is taken over a larger set)

‖p_k(A)‖_2 = max_{λ is eigenvalue of A} |p_k(λ)| ≤ max_{λ∈[λ_min,λ_max]} |p_k(λ)| = max_{t∈[−1,1]} |T_k(t)| / |T_k(t_0)| ≤ 1/|T_k(t_0)|,   (7.6)

since max_{t∈[−1,1]} |T_k(t)| ≤ 1. For estimating this term, consider (7.5) since t_0 < −1:

|T_k(t_0)| = |(−1)^k cosh(k arcosh(−t_0))| = |cosh(kω_0)| = (e^{kω_0} + e^{−kω_0})/2,  ω_0 := arcosh(−t_0).

One has to estimate this term from below. Since −t_0 > 1, one has

(e^{ω_0} + e^{−ω_0})/2 = cosh(ω_0) = cosh(arcosh(−t_0)) = −t_0,

from which e^{ω_0} + e^{−ω_0} = −2t_0 follows. This is a quadratic equation in e^{ω_0} with the solutions

e^{ω_0} = −t_0 ± √(t_0² − 1).

For estimating |T_k(t_0)|, one obtains a sharper estimate if the larger of these two values is considered, i.e.,

e^{ω_0} = −t_0 + √(t_0² − 1) = (κ_2(A) + 1)/(κ_2(A) − 1) + √( ((κ_2(A) + 1)² − (κ_2(A) − 1)²) / (κ_2(A) − 1)² )
        = (κ_2(A) + 2√κ_2(A) + 1)/(κ_2(A) − 1) = (√κ_2(A) + 1)² / ((√κ_2(A) + 1)(√κ_2(A) − 1)) = (√κ_2(A) + 1)/(√κ_2(A) − 1).

Now, |T_k(t_0)| is estimated from below:

|T_k(t_0)| = (e^{kω_0} + e^{−kω_0})/2 ≥ e^{kω_0}/2 = (e^{ω_0})^k/2 = (1/2) ( (√κ_2(A) + 1)/(√κ_2(A) − 1) )^k.

Inserting this estimate into (7.6) finishes the proof.

Remark 7.7 Connection of the number of iterations and the spectral condition number. To guarantee the reduction of the error by a factor η < 1, using the estimate from Theorem 7.6,

2 ( (√κ_2(A) − 1)/(√κ_2(A) + 1) )^k ≤ η

must be satisfied. The number of iterations to achieve this condition is

k ≥ |ln(η/2)| / ln( (√κ_2(A) + 1)/(√κ_2(A) − 1) ) = −ln(η/2) / ln( (√κ_2(A) + 1)/(√κ_2(A) − 1) ).

If κ_2(A) is large, then a power series expansion gives

ln( (√κ_2(A) + 1)/(√κ_2(A) − 1) ) = ln( (1 + 1/√κ_2(A)) / (1 − 1/√κ_2(A)) )
  = ln(1 + 1/√κ_2(A)) − ln(1 − 1/√κ_2(A)) ≈ 2/√κ_2(A).

That means, the expected number of iterations to reduce the error by the factor η increases like

k ≈ (−ln(η/2)/2) √κ_2(A) = O(√κ_2(A)).

This behavior can be observed in fact in many situations, e.g., for linear systems of equations arising in discretizations of partial differential equations. □
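The estimate k ≈ (−ln(η/2)/2)√κ_2(A) is easy to tabulate. A small sketch (illustrative names, assuming NumPy) for the case κ_2 = O(h^{−2}) mentioned in Remark 3.16; since k grows like √κ_2 = O(h^{−1}), halving h here roughly doubles the predicted iteration count, instead of quadrupling it as for the classical schemes:

import numpy as np

def expected_cg_iterations(kappa, eta=1e-6):
    """Iteration count predicted by Remark 7.7: k ~ -ln(eta/2)/2 * sqrt(kappa)."""
    return -np.log(eta / 2.0) / 2.0 * np.sqrt(kappa)

# kappa_2 = O(h^{-2}) as for finite element discretizations of the Laplacian
for h in (1 / 16, 1 / 32, 1 / 64, 1 / 128):
    kappa = h ** -2
    print(f"h = {h:.5f}  kappa_2 ~ {kappa:8.0f}  k ~ {expected_cg_iterations(kappa):7.1f}")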
