2 - Numerical Methods For Solving Linear Systems of Equations
Introduction
Remark 1.1 Contents of the lecture notes. These lecture notes consider solvers for a linear system of equations
$$Ax = b, \quad A \in \mathbb{R}^{n\times n}, \ x, b \in \mathbb{R}^n, \qquad (1.1)$$
with a non-singular matrix A. The solution of such systems is the core of many algorithms.
In particular, systems with the following features will be considered in these
notes:
• the dimension n of the systems is very large,
• the system matrix A is sparse, i.e., the number of non-zero entries in A is only a small percentage, usually O(n), of the total number of entries, which is n².
Systems with these features arise, e.g., in the discretization of partial differential
equations.
Throughout the lecture notes, vectors are denoted by small bold-faced letters,
components of vectors by small letters, matrices by capital letters, scalars by Greek
letters, and indices by the letters i, j, l, m. The iteration index in iterative schemes is denoted by k.
Main parts of these lecture notes follow Starke (2001). 2
Chapter 2
Remark 2.1 Contents. This chapter gives an overview of vector and matrix properties which will be used in these lecture notes. 2
Remark 2.2 Norms of vectors. Let $x = (x_1, \dots, x_n)^T \in \mathbb{R}^n$ be a vector. The $l_p$-norm is defined by
$$\|x\|_p := \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}, \quad p \in [1, \infty), \qquad \|x\|_\infty := \max_{i=1,\dots,n} |x_i|.$$
If p = 1, the norm is called the sum norm; in the case p = 2 one speaks of the Euclidean norm, and for p = ∞ of the maximum norm. 2
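As an illustration (not part of the original notes), these norms can be evaluated with NumPy; the vector x below is an arbitrary example:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])           # example vector, arbitrary choice

norm_1 = np.sum(np.abs(x))               # sum norm (p = 1)
norm_2 = np.sqrt(np.sum(np.abs(x)**2))   # Euclidean norm (p = 2)
norm_inf = np.max(np.abs(x))             # maximum norm (p = infinity)

# NumPy provides the same quantities via np.linalg.norm
assert np.isclose(norm_1, np.linalg.norm(x, 1))
assert np.isclose(norm_2, np.linalg.norm(x, 2))
assert np.isclose(norm_inf, np.linalg.norm(x, np.inf))
```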
Remark 2.3 Norms of matrices. Let
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \in \mathbb{R}^{n\times n}.$$
The induced matrix p-norm is defined by
$$\|A\|_p := \max_{x\in\mathbb{R}^n,\,x\neq 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{x\in\mathbb{R}^n,\,\|x\|_p \le 1} \|Ax\|_p = \max_{x\in\mathbb{R}^n,\,\|x\|_p = 1} \|Ax\|_p. \qquad (2.1)$$
2
Remark 2.4 Properties of matrix norms. From (2.1) it follows immediately for all $x \in \mathbb{R}^n$ that
$$\|A\|_p \ge \frac{\|Ax\|_p}{\|x\|_p} \iff \|Ax\|_p \le \|A\|_p \|x\|_p.$$
It holds also $\|Ax\|_2 \le \|A\|_F \|x\|_2$. By induction it follows for $B \in \mathbb{R}^{n\times n}$ that
Proof: This lemma was proved in the course on basic linear algebra.
Proof: The proof uses Schur's2 triangulation theorem: Every matrix $A \in \mathbb{R}^{n\times n}$ can be factored in the form $A = U^* T U$, where $U \in \mathbb{C}^{n\times n}$ is a unitary matrix, $U^* = U^{-1}$ (the adjoint matrix is the inverse matrix), and T an upper triangular matrix of the form
$$T = \begin{pmatrix} \lambda_1 & t_{12} & \cdots & t_{1n} \\ 0 & \lambda_2 & \cdots & t_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix},$$
2 Issai Schur (1875 - 1941)
with the eigenvalues $\lambda_1, \dots, \lambda_n$ of A, e.g., see (Marcus and Minc, 1992, p. 67). The vector norm $\|\cdot\|_*$ is defined with the diagonal matrix $D_\delta = \mathrm{diag}(1, \delta, \dots, \delta^{n-1})$, $\delta > 0$:
$$\|x\|_* := \|D_\delta^{-1} U x\|_\infty.$$
For the induced matrix norm it follows, using the Schur triangulation of A, that
$$\|A\|_* := \max_{x\in\mathbb{R}^n,\,x\neq 0} \frac{\|D_\delta^{-1} U A x\|_\infty}{\|D_\delta^{-1} U x\|_\infty} = \max_{x\in\mathbb{R}^n,\,x\neq 0} \frac{\|D_\delta^{-1} U U^* T U x\|_\infty}{\|D_\delta^{-1} U x\|_\infty}.$$
Setting $y = D_\delta^{-1} U x$ it follows that $x = U^* D_\delta y$ since the matrices U and $D_\delta$ are non-singular and $D_\delta^{-1} U$ is a bijection from $\mathbb{R}^n$ to $\mathbb{R}^n$. Inserting this expression gives
$$\|A\|_* = \max_{y,\,y\neq 0} \frac{\|D_\delta^{-1} U U^* T U U^* D_\delta y\|_\infty}{\|y\|_\infty} = \|D_\delta^{-1} T D_\delta\|_\infty.$$
The diagonal matrix $D_\delta^{-1}$ scales just the rows of T and the matrix $D_\delta$ just the columns of T. Thus, the product is again an upper triangular matrix and a straightforward calculation shows that
$$D_\delta^{-1} T D_\delta = \begin{pmatrix} \lambda_1 & \delta t_{12} & \cdots & \cdots & \delta^{n-1} t_{1n} \\ 0 & \lambda_2 & \cdots & \cdots & \delta^{n-2} t_{2n} \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & \lambda_{n-1} & \delta t_{n-1,n} \\ 0 & 0 & \cdots & 0 & \lambda_n \end{pmatrix}$$
and hence $\|D_\delta^{-1} T D_\delta\|_\infty \le \rho(A) + \varepsilon$ if δ is chosen sufficiently small.
in (2.2) shows that if A is positive (semi-)definite then also the diagonal matrix
diag (aii ) is positive (semi-)definite. 2
$$\lambda_{\max}(A) = \max_{x\in\mathbb{R}^n,\,x\neq 0} \frac{x^T A x}{x^T x}, \qquad \lambda_{\min}(A) = \min_{x\in\mathbb{R}^n,\,x\neq 0} \frac{x^T A x}{x^T x}. \qquad (2.3)$$
The quotient on the right hand side is called Rayleigh3 quotient. A symmetric
matrix is positive definite (s.p.d.) if and only if all of its eigenvalues are positive.
It is positive semi-definite if and only if all of its eigenvalues are non-negative.
3 John William Strutt (Lord Rayleigh) (1842 - 1919)
In the case of A ∈ Rn×n being symmetric and positive definite, one obtains for
the spectral condition number of A
$$\|A\|_2 = \big(\lambda_{\max}(A^T A)\big)^{1/2} = \big(\lambda_{\max}(A^2)\big)^{1/2} = \big(\lambda_{\max}(A)^2\big)^{1/2} = \lambda_{\max}(A).$$
Since $\lambda_{\max}(A^{-1}) = (\lambda_{\min}(A))^{-1}$, one has $\|A^{-1}\|_2 = (\lambda_{\min}(A))^{-1}$ and
$$\kappa_2(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)} \ge 1.$$
2
If for all i the strict inequality holds, then A is called strongly diagonally dominant. 2
Chapter 3
of the system matrix A. Using this decomposition, (1.1) can be transformed into the fixed point equation
$$Mx = b + Nx \iff x = M^{-1}(b + Nx). \qquad (3.2)$$
Given an initial iterate $x^{(0)} \in \mathbb{R}^n$, one can try to solve (3.2) with the fixed point iteration
$$x^{(k+1)} = M^{-1}\big(b + N x^{(k)}\big), \quad k = 0, 1, 2, \dots. \qquad (3.3)$$
Banach’s1 fixed point theorem gives information on the convergence of this it-
eration. 2
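A minimal NumPy sketch of the fixed point iteration (3.3) might look as follows; the function name, stopping criterion, and tolerance are assumptions made here for illustration. Instead of forming M⁻¹ explicitly, a system with M is solved in each step:

```python
import numpy as np

def fixed_point_iteration(M, N, b, x0, tol=1e-10, maxit=1000):
    """Iterate x^(k+1) = M^{-1}(b + N x^(k)), cf. (3.3)."""
    x = x0.copy()
    for k in range(maxit):
        # solve M x_new = b + N x instead of forming M^{-1}
        x_new = np.linalg.solve(M, b + N @ x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new, k + 1
        x = x_new
    return x, maxit
```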
The operator $G = M^{-1} N$ is linear and bounded since $\|G\|_*$ is finite, where $\|\cdot\|_*$ is any matrix norm. Hence, G is continuous and even Lipschitz continuous. Since f is continuously differentiable, the Lipschitz constant is given by
$$L_* = \sup_{x\in\mathbb{R}^n} \|J(f(x))\|_* = \sup_{x\in\mathbb{R}^n} \|G\|_* = \|G\|_*,$$
The factor $|\lambda|^k$ does not converge to zero since $|\lambda| \ge 1$. Note that the second factor is a vector. It converges to zero if and only if each of its components converges to zero. There is at least one component $z_l$ with $z_l \neq 0$ since z is an eigenvector. Let ζ be the argument of $z_l$. It is
$$\cos(k\varphi)\,\mathrm{Re}(z_l) - \sin(k\varphi)\,\mathrm{Im}(z_l) = |z_l|\big(\cos(k\varphi)\cos(\zeta) - \sin(k\varphi)\sin(\zeta)\big) = |z_l|\cos(k\varphi + \zeta). \qquad (3.4)$$
If λ ∈ ℝ, i.e. ϕ = ±π, then the eigenvector z is real, too, such that ζ = ±π. In this case, (3.4) takes the values $|z_l|$ or $-|z_l|$ and it does not tend to zero as k → ∞. If λ ∉ ℝ, then the period ϕ in (3.4) is not an integer multiple of π such that the argument of the cosine cannot tend to π/2 plus an integer multiple of π. Hence, also in this case, the second factor in (3.4) does not tend to zero.
In summary, the iteration (3.3) does not converge for the initial iterate $x^{(0)} = z + \bar z$. That means, if the iteration (3.3) converges for all initial iterates, then $\rho(G) \ge 1$ cannot hold.
Note: the last part of the proof simplifies much if one considers complex-valued systems of linear equations. Then, one can take the initial iterate $x^{(0)} = z \neq 0$, finds that $x^{(k)} = \lambda^k z$, and concludes that $\|x^{(k)}\|_2 = |\lambda|^k \|z\|_2 \not\to 0$, since $\|z\|_2 \neq 0$ and $|\lambda|^k \ge 1$ for all k.
2
where D is the diagonal of A, L is its strict lower part and U its strict upper part. 2
M = D, N = −(L + U).
A straightforward calculation reveals that the fixed point equation (3.2) has the form
Example 3.6 Damped Jacobi method. Let ω ∈ R, ω > 0. The matrices which
define the fixed point equation for the damped Jacobi method are given by
$$M = \omega^{-1} D, \qquad N = \omega^{-1} D - A.$$
Example 3.7 Gauss–Seidel method. In the Gauss4 –Seidel5 method, the invertible
matrix M is a triangular matrix
M = D + L, N = −U.
It follows that
$$x^{(k+1)} = (D + L)^{-1}\big(b - U x^{(k)}\big), \quad k = 0, 1, 2, \dots,$$
such that the iteration matrix has the form $G_{GS} = -(D + L)^{-1} U$. Multiplying the equation for the Gauss–Seidel method by (D + L) and rearranging terms gives the more familiar form of this iteration
$$x^{(k+1)} = D^{-1}\big(b - L x^{(k+1)} - U x^{(k)}\big) = x^{(k)} + D^{-1}\big(b - L x^{(k+1)} - (D + U) x^{(k)}\big), \quad k = 0, 1, 2, \dots.$$
Writing this iteration for the components of the vector shows that the right hand
side can be evaluated even if the new iterate appears there, since only already
computed components of the new iterate are needed for this evaluation
$$x_i^{(k+1)} = x_i^{(k)} + \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i}^{n} a_{ij} x_j^{(k)}\Big), \quad i = 1, \dots, n.$$
2
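A direct transcription of this componentwise formula into NumPy could look as follows; the residual-based stopping criterion and the tolerance are illustrative assumptions, not part of the original notes:

```python
import numpy as np

def gauss_seidel(A, b, x0, tol=1e-10, maxit=1000):
    """Componentwise Gauss-Seidel iteration: new components are used
    as soon as they have been computed (overwriting x in place)."""
    n = len(b)
    x = x0.copy()
    for k in range(maxit):
        for i in range(n):
            # already updated components (j < i) and old components (j > i)
            sigma = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - sigma) / A[i, i]
        if np.linalg.norm(b - A @ x) < tol:
            return x, k + 1
    return x, maxit
```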
3 Carl Gustav Jacob Jacobi (1804 - 1851)
4 Johann Carl Friedrich Gauss (1777 - 1855)
5 Philipp Ludwig von Seidel (1821 - 1896)
Example 3.8 SOR method. The matrices which define the (forward) successive
over relaxation (SOR) method are given by
$$M = \omega^{-1} D + L, \qquad N = \omega^{-1} D - (D + U).$$
For ω = 1, the Gauss–Seidel method is recovered. One obtains for the iteration matrix
$$G_{SOR}(\omega) = \big(\omega^{-1} D + L\big)^{-1}\big(\omega^{-1} D - (D + U)\big) = \omega\,(D + \omega L)^{-1}\big(\omega^{-1} D - (D + U)\big) = (D + \omega L)^{-1}\big((1 - \omega) D - \omega U\big).$$
Example 3.9 SSOR method. In the SOR method, one can change the roles of L and U to obtain the backward SOR method
$$x^{(k+1)} = x^{(k)} + \omega D^{-1}\big(b - U x^{(k+1)} - (D + L) x^{(k)}\big), \quad k = 0, 1, 2, \dots.$$
This method updates the unknowns in reverse order. The forward and backward SOR behave in general differently. There are cases in which one of them works much more efficiently than the other one. However, in general one does not know a priori which is the better variant. The SSOR (symmetric SOR) method combines both methods. One step of SSOR consists of two substeps, one forward SOR step and one backward SOR step:
$$x^{(k+1/2)} = x^{(k)} + \omega D^{-1}\big(b - L x^{(k+1/2)} - (D + U) x^{(k)}\big),$$
$$x^{(k+1)} = x^{(k+1/2)} + \omega D^{-1}\big(b - U x^{(k+1)} - (D + L) x^{(k+1/2)}\big), \quad k = 0, 1, 2, \dots.$$
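The following sketch of one forward SOR sweep is an illustration added here, not taken from the original notes; ω = 1 recovers Gauss–Seidel, and running the same loop in reverse order gives the backward sweep used in SSOR:

```python
import numpy as np

def sor_sweep(A, b, x, omega):
    """One forward SOR sweep, updating x in place.
    The backward sweep of SSOR runs the same loop for i = n-1, ..., 0."""
    n = len(b)
    for i in range(n):
        # residual of equation i, using already updated x[j] for j < i
        residual_i = b[i] - A[i, :] @ x
        x[i] += omega * residual_i / A[i, i]
    return x
```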
It follows that
$$\rho(G_{Jac}) \le \|G_{Jac}\|_\infty = \max_{z\in\mathbb{R}^n,\,z\neq 0} \frac{\|G_{Jac} z\|_\infty}{\|z\|_\infty} < 1.$$
Gauss–Seidel method. A direct calculation shows (exercise)
where the term with the factor $L G_{GS}$ vanishes since the first row of L consists only of zeros. Using now the induction hypothesis $|(G_{GS} z)_j| < \|z\|_\infty$, $j < i$, yields
$$|(G_{GS} z)_i| \le \frac{1}{|a_{ii}|}\Big(\sum_{j=1}^{i-1} |a_{ij}|\,|(G_{GS} z)_j| + \sum_{j=i+1}^{n} |a_{ij}|\,|z_j|\Big) \le \frac{1}{|a_{ii}|}\underbrace{\sum_{j=1,\,j\neq i}^{n} |a_{ij}|}_{<\,|a_{ii}|}\,\|z\|_\infty < \|z\|_\infty, \quad i = 2, \dots, n.$$
The statement of the lemma follows now from well known properties of eigenvalues.
Example 3.12 Convergence of the damped Jacobi method where the Jacobi method
fails. If ω is chosen appropriately, there is the possibility that the damped Jacobi
method converges for every initial guess whereas the Jacobi method does not.
Assume that GJac has only real eigenvalues. Denote by λmin the smallest one
and by λmax the largest one. If
then there are initial iterates for which the Jacobi method does not converge, The-
orem 3.3. From Lemma 3.11 one has
It follows that
$$-1 < \mu_{\min} \ \text{ if } \ \omega < \frac{2}{1 - \lambda_{\min}} < 1, \qquad \mu_{\max} < 1 \ \text{ if } \ \omega > 0.$$
The choice of ω ∈ (0, 2/ (1 − λmin )) ensures the convergence of the damped Jacobi
method for each initial iterate.
Consider the case λmax > 1. Then
In this case, there are initial iterates for which the damped Jacobi method does not
converge as well. 2
Lemma 3.13 Parameter in the case that the SOR method converges, Lemma of Kahan6. If the SOR method converges for every initial iterate $x^{(0)} \in \mathbb{R}^n$, then ω ∈ (0, 2).
Hence
$$\prod_{i=1}^{n} |\lambda_i| = |1 - \omega|^n.$$
There is at least one eigenvalue $\lambda_i$ with $|\lambda_i| \ge |1 - \omega|$ and it follows that $\rho(G_{SOR}(\omega)) \ge |1 - \omega|$. The application of Theorem 3.3 shows now that SOR cannot converge for all initial iterates if ω ∉ (0, 2), because if ω ∉ (0, 2) then $\rho(G_{SOR}(\omega)) \ge |1 - \omega| \ge 1$.
$$z^* A z = u^T A u - i\, v^T A u + i\, u^T A v - i^2\, v^T A v = u^T A u - i\, v^T A u + i\, v^T A u + v^T A v > 0.$$
6 William M. Kahan, born 1933
It follows that λ has the form
$$\lambda = \frac{b + ia}{c + ia}, \quad a, b, c \in \mathbb{R},$$
with
$$b = \Big(1 - \frac{\omega}{2}\Big) z^* D z - \frac{\omega}{2}\, z^* A z, \qquad c = \Big(1 - \frac{\omega}{2}\Big) z^* D z + \frac{\omega}{2}\, z^* A z.$$
Consequently,
$$|\lambda|^2 = \frac{b^2 + a^2}{c^2 + a^2} = \frac{\big[\big(1 - \frac{\omega}{2}\big) z^* D z - \frac{\omega}{2}\, z^* A z\big]^2 + a^2}{\big[\big(1 - \frac{\omega}{2}\big) z^* D z + \frac{\omega}{2}\, z^* A z\big]^2 + a^2}.$$
Thus |λ| < 1 if and only if the numerator is smaller than the denominator. This is equivalent to
$$-\underbrace{\omega}_{>0}\Big(1 - \frac{\omega}{2}\Big)\underbrace{z^* D z}_{>0}\,\underbrace{z^* A z}_{>0} < \omega\Big(1 - \frac{\omega}{2}\Big) z^* D z\, z^* A z \iff -\Big(1 - \frac{\omega}{2}\Big) < 1 - \frac{\omega}{2} \iff \omega < 2.$$
Hence, the SOR method converges for all initial iterates if ω ∈ (0, 2).
Remark 3.15 Difficulty of choosing ω in practice. For choosing ω such that the SOR method converges as fast as possible, one needs information about the eigenvalues of A. However, the computation of this information is very costly, see Numerical Mathematics I. 2
Chapter 4
is called co-domain of A. 2
The unit sphere is a compact set (bounded and closed) and the mapping $y \mapsto y^* A y / y^* y$ is continuous. It follows that R(A) is also a compact set. 2
Lemma 4.4 Co-domain of the inverse matrix. Let $A \in \mathbb{R}^{n\times n}$ with $R(A) \subset \{\lambda \in \mathbb{C} : \mathrm{Re}(\lambda) > 0\}$, i.e., the co-domain of A is a subset of the right half of the complex plane. Then
$$R(A^{-1}) \subset \{\lambda \in \mathbb{C} : \mathrm{Re}(\lambda) > 0\}.$$
Proof: From the assumption it follows that A is non-singular. Otherwise, there would be a vector z ∈ ker(A), z ≠ 0, and
$$\mathrm{Re}\left(\frac{z^* \overbrace{Az}^{=0}}{z^* z}\right) = \mathrm{Re}(0) = 0,$$
in contradiction to the assumption on R(A).
Let $y \in \mathbb{C}^n$, $y \neq 0$, be arbitrary and $z = A^{-1} y \neq 0$. Hence, z is also an arbitrary vector. One has
$$\mathrm{Re}\left(\frac{y^* A^{-1} y}{y^* y}\right) = \underbrace{\frac{1}{\|y\|_2^2}}_{\in\mathbb{R}} \mathrm{Re}\big(y^* A^{-1} y\big) = \frac{1}{\|Az\|_2^2}\, \mathrm{Re}\big((Az)^* \underbrace{A^{-1} A}_{=I}\, z\big) = \frac{1}{\|Az\|_2^2}\, \mathrm{Re}\big(z^* A^* z\big) = \frac{1}{\|Az\|_2^2}\, \mathrm{Re}\big(z^* A z\big)$$
$$= \frac{\|z\|_2^2}{\|Az\|_2^2}\, \mathrm{Re}\left(\frac{z^* A z}{z^* z}\right) \ge \frac{1}{\|A\|_2^2}\, \mathrm{Re}\left(\frac{z^* A z}{z^* z}\right) > 0,$$
Hence,
$$\|x - x^{(k+1)}\|_2^2 = \big(x - x^{(k)} - \alpha A(x - x^{(k)}),\, x - x^{(k)} - \alpha A(x - x^{(k)})\big) \qquad (4.2)$$
$$= \|x - x^{(k)}\|_2^2 - 2\alpha\,(x - x^{(k)})^T A (x - x^{(k)}) + \alpha^2 \|A(x - x^{(k)})\|_2^2.$$
Denoting $y = A(x - x^{(k)})$, one obtains
$$\frac{(x - x^{(k)})^T A (x - x^{(k)})}{\|A(x - x^{(k)})\|_2^2} = \frac{(x - x^{(k)})^T A^T A^{-T} A (x - x^{(k)})}{\|A(x - x^{(k)})\|_2^2} = \overbrace{\frac{y^T A^{-T} y}{y^T y}}^{\in\mathbb{R}} = \frac{y^T A^{-1} y}{y^T y} \ge \min\big\{\mathrm{Re}(\lambda) : \lambda \in R(A^{-1})\big\} > \alpha,$$
which is equivalent to
$$\alpha^2 \|A(x - x^{(k)})\|_2^2 < \alpha\,(x - x^{(k)})^T A (x - x^{(k)}).$$
Since R(A) is compact, there is an ε > 0 such that Re(λ) ≥ ε for all λ ∈ R(A) (there is no sequence in R(A) that converges to the imaginary axis). Hence
$$\frac{(x - x^{(k)})^T A (x - x^{(k)})}{\|x - x^{(k)}\|_2^2} \ge \varepsilon.$$
Choose ε such that αε ≤ 1, then it follows from (4.3) that
$$\|x - x^{(k+1)}\|_2^2 \le \|x - x^{(k)}\|_2^2\, (1 - \alpha\varepsilon) =: q^2\, \|x - x^{(k)}\|_2^2,$$
and by induction
$$\|x - x^{(k)}\|_2 \le q^k\, \|x - x^{(0)}\|_2,$$
such that $x^{(k)} \to x$ as $k \to \infty$.
Remark 4.6 Choice of α for s.p.d. matrices. Let A be symmetric and positive definite. Using the Rayleigh quotient (2.3), one gets
$$\frac{\mathrm{Re}\big(y^* A^{-1} y\big)}{\|y\|_2^2} = \frac{1}{\|y\|_2^2}\Big((\mathrm{Re}\,y)^T A^{-1}(\mathrm{Re}\,y) + (\mathrm{Im}\,y)^T A^{-1}(\mathrm{Im}\,y)\Big)$$
$$= \frac{1}{\|y\|_2^2}\Big(\|\mathrm{Re}\,y\|_2^2\, \frac{(\mathrm{Re}\,y)^T A^{-1}(\mathrm{Re}\,y)}{\|\mathrm{Re}\,y\|_2^2} + \|\mathrm{Im}\,y\|_2^2\, \frac{(\mathrm{Im}\,y)^T A^{-1}(\mathrm{Im}\,y)}{\|\mathrm{Im}\,y\|_2^2}\Big)$$
$$\ge \frac{1}{\|y\|_2^2}\Big(\|\mathrm{Re}\,y\|_2^2\, \lambda_{\min}(A^{-1}) + \|\mathrm{Im}\,y\|_2^2\, \lambda_{\min}(A^{-1})\Big) = \lambda_{\min}(A^{-1}) = \frac{1}{\lambda_{\max}(A)} = \frac{1}{\rho(A)}.$$
That means, the choice $\alpha < 1/\rho(A)$ guarantees the convergence of the Richardson method. 2
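As a sketch of the Richardson iteration discussed above (the loop structure, tolerance, and iteration limit are assumptions added for illustration):

```python
import numpy as np

def richardson(A, b, x0, alpha, tol=1e-10, maxit=10000):
    """Richardson iteration x^(k+1) = x^(k) + alpha * (b - A x^(k))."""
    x = x0.copy()
    for k in range(maxit):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            return x, k
        x = x + alpha * r
    return x, maxit

# For a s.p.d. matrix A, a safe (if pessimistic) step size following the remark:
# alpha = 0.9 / np.max(np.linalg.eigvalsh(A))    # alpha < 1/rho(A)
```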
gives
$$\alpha_k = \frac{\big(r^{(k)}\big)^T A\, r^{(k)}}{\|A r^{(k)}\|_2^2}. \qquad (4.4)$$
Since
$$\frac{d^2}{d\alpha_k^2}\, \|r^{(k+1)}\|_2^2 = 2\,\|A r^{(k)}\|_2^2 > 0,$$
It holds
$$r^{(1)} = b - Ax^{(1)} = b - Ax^{(0)} - \alpha_0 A r^{(0)} = r^{(0)} - \alpha_0 A r^{(0)}$$
and consequently
$$x^{(2)} \in x^{(0)} + \mathrm{span}\big\{r^{(0)}, A r^{(0)}\big\}.$$
Definition 4.9 Krylov subspace. Let $q \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n\times n}$. Then, the space
$$K_m(q, A) := \mathrm{span}\big\{q, Aq, \dots, A^{m-1} q\big\}$$
2 Aleksei Nikolaevich Krylov (1863 – 1945)
Chapter 5
$$\|r^{(k)}\|_2 = \|b - Ax^{(k)}\|_2$$
becomes minimal in the space $K_k(r^{(0)}, A)$. Note, in the Richardson iteration with the special choice (4.4), the norm of the residual will be minimized on the line $x^{(k)} + \tau r^{(k)}$. However, the minimum on this line is in general not the global minimum in $x^{(0)} + K_k(r^{(0)}, A)$. 2
Algorithm 5.3 Arnoldi method, modified Gram–Schmidt method. Given
A ∈ Rn×n and q1 ∈ Rn with kq1 k2 = 1.
1. for j = 1 : m
2. wj = Aqj
3. for i = 1 : j
4. hij = (wj , qi )
5. wj = wj − hij qi % subtract projection
6. endfor
7. hj+1,j = kwj k2
8. if hj+1,j == 0
9. stop
10. endif
11. qj+1 = wj /hj+1,j
12. endfor
2
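A NumPy transcription of Algorithm 5.3 could look as follows; the handling of an early breakdown and the returned array shapes are choices made here, not specified in the notes:

```python
import numpy as np

def arnoldi(A, q1, m):
    """Arnoldi method with modified Gram-Schmidt, cf. Algorithm 5.3.
    Returns Q = (q_1, ..., q_{m+1}) and the (m+1) x m Hessenberg matrix H."""
    n = len(q1)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = q1 / np.linalg.norm(q1)        # ensure ||q_1||_2 = 1
    for j in range(m):
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = w @ Q[:, i]
            w = w - H[i, j] * Q[:, i]        # subtract projection
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:               # early breakdown: stop
            return Q[:, :j + 1], H[:j + 2, :j + 1]
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H
```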
$$Q_m = (q_1, q_2, \dots, q_m) \in \mathbb{R}^{n\times m},$$
$$H_m = \begin{pmatrix} h_{11} & h_{12} & \cdots & h_{1,m-1} & h_{1m} \\ h_{21} & h_{22} & \cdots & h_{2,m-1} & h_{2m} \\ 0 & h_{32} & \cdots & h_{3,m-1} & h_{3m} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & h_{m,m-1} & h_{mm} \\ 0 & 0 & \cdots & 0 & h_{m+1,m} \end{pmatrix} \in \mathbb{R}^{(m+1)\times m}.$$
A matrix of this form, i.e., $h_{ij} = 0$ for $i > j + 1$, is called (upper) Hessenberg6 matrix. It follows readily from Arnoldi's method that
Remark 5.6 Initial vector in Krylov subspace methods. In the Krylov subspace methods, $r^{(0)}/\|r^{(0)}\|_2$ plays the role of $q_1$ in Arnoldi's method. 2
6 Karl Hessenberg (1904 – 1959)
Remark 5.7 Principle approach for minimizing the residual. The goal of the meth-
ods presented in this section is to minimize the Euclidean norm of the residual. One
has
$$\|r^{(k)}\|_2 = \|b - Ax^{(k)}\|_2 = \|r^{(0)} + Ax^{(0)} - Ax^{(k)}\|_2 = \|r^{(0)} - A\big(x^{(k)} - x^{(0)}\big)\|_2$$
with $x^{(k)} - x^{(0)} \in K_k(r^{(0)}, A)$, see Remark 4.8. Since the vectors $\{q_1, \dots, q_k\}$ computed with Arnoldi's method form a generating system of $K_k(r^{(0)}, A)$, it is
$$x^{(k)} - x^{(0)} = \sum_{i=1}^{k} z_i q_i = Q_k z$$
with $z = (z_1, \dots, z_k)^T$, $Q_k = (q_1, \dots, q_k)$. Using (5.1), $Q_{k+1} e_1 = q_1 = r^{(0)}/\|r^{(0)}\|_2$, and the fact that the Euclidean norm is invariant under orthonormal transformations, one obtains
$$\|r^{(k)}\|_2 = \|r^{(0)} - A Q_k z\|_2 = \|r^{(0)} - Q_{k+1} H_k z\|_2 = \big\|\|r^{(0)}\|_2\, Q_{k+1} e_1 - Q_{k+1} H_k z\big\|_2 = \big\|Q_{k+1}\big(\|r^{(0)}\|_2\, e_1 - H_k z\big)\big\|_2 = \big\|\|r^{(0)}\|_2\, e_1 - H_k z\big\|_2.$$
The minimizer of the residual is obtained by solving the least squares problem
$$\min_{z\in\mathbb{R}^k} \big\|\|r^{(0)}\|_2\, e_1 - H_k z\big\|_2. \qquad (5.2)$$
This problem possesses k unknowns and the vector that has to be minimized has
k + 1 components. It can be solved, e.g., with the QR algorithm, see lecture notes
Numerical Mathematics I. Let z(k) be a solution of this problem, then the next
iterate of the Krylov subspace method is
x(k) = x(0) + Qk z(k) . (5.3)
This algorithm is called GMRES (generalized minimal residual). It was proposed for the first time in Saad and Schultz (1986). 2
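A compact sketch of GMRES along these lines, reusing the Arnoldi sketch from above and solving the small least squares problem (5.2) with numpy.linalg.lstsq instead of the QR algorithm mentioned in the notes (the function name and the restart-free structure are assumptions made here):

```python
import numpy as np

def gmres_simple(A, b, x0, k):
    """k steps of GMRES: build K_k(r0, A) with Arnoldi and solve (5.2)."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    Q, H = arnoldi(A, r0 / beta, k)          # Arnoldi sketch from above
    k_eff = H.shape[1]                       # may be smaller after a breakdown
    e1 = np.zeros(H.shape[0])
    e1[0] = beta                             # ||r0||_2 * e_1
    z, *_ = np.linalg.lstsq(H, e1, rcond=None)
    return x0 + Q[:, :k_eff] @ z             # iterate (5.3)
```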
Theorem 5.8 Properties of GMRES.
i) In the case that Arnoldi's method has an early breakdown, i.e., $h_{l+1,l} = 0$, then $\dim K_k(r^{(0)}, A) = l < k$ and $r^{(l)} = 0$. Hence $Ax^{(l)} = b$.
ii) The iterate $x^{(k)} = x^{(0)} + Q_k z^{(k)}$ is uniquely determined.
iii) It holds
$$\|r^{(k)}\|_2 \le \|r^{(k-1)}\|_2, \quad k = 1, 2, 3, \dots$$
Using matrix notation, Arnoldi's method gives in this case
Since A is non-singular and $\mathrm{rank}(Q_l) = l$, one has $\mathrm{rank}(AQ_l) = l$. Consequently, it is $\mathrm{rank}(Q_l \tilde H_l) = l$ and $\mathrm{rank}(\tilde H_l) = l$, and $\tilde H_l$ is invertible. In the same way as above, one obtains
$$\min \|r^{(l)}\|_2 = \min_{z\in\mathbb{R}^l} \big\|\|r^{(0)}\|_2\, e_1 - \tilde H_l z\big\|_2.$$
The minimizer is given by $z^{(l)} = \tilde H_l^{-1} \|r^{(0)}\|_2\, e_1$, which gives $\|r^{(l)}\|_2 = 0$.
ii) If $\mathrm{rank}(H_k) = k$, the minimizer of (5.2) is unique (theory of least squares problems, see Numerical Mathematics I). If $\mathrm{rank}(H_k) < k$, then $x^{(k)} = x$, see i).
iii) The set in which the minimizer is computed becomes larger since the inclusion $K_k(r^{(0)}, A) \supseteq K_{k-1}(r^{(0)}, A)$ holds.
5.2 Symmetric Matrices
Remark 5.10 Goal. Arnoldi's method and the minimization of the residual in $K_k(r^{(0)}, A)$ will be studied in the special case that A is symmetric. The most important result will be that in this case, it is not necessary to store the basis of $K_k(r^{(0)}, A)$. It suffices to store a fixed, small number of basis vectors. Thus, the memory requirements do not increase in the course of the iteration and the most important problem of using GMRES vanishes. 2
Arnoldi's method simplifies. Using (5.1) and the special form of $H_k$, one obtains the relation
$$A q_k = \beta_k q_{k-1} + \alpha_k q_k + \beta_{k+1} q_{k+1}.$$
From this relation, $q_{k+1}$ can be computed. The corresponding algorithm is called the Lanczos7 algorithm.
2
10. endif
11. qj+1 = s/βj+1
12. endfor
2
Remark 5.13 Short recurrence. The computation of qj+1 requires only qj−1 and
qj , see lines 4, 6, and 11. This situation is called short recurrence. 2
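A minimal NumPy sketch of the Lanczos short recurrence, based on the relation $A q_k = \beta_k q_{k-1} + \alpha_k q_k + \beta_{k+1} q_{k+1}$ and the variable names s and β used in Algorithm 5.12; the initialization and the handling of a breakdown are assumptions made here:

```python
import numpy as np

def lanczos(A, q1, m):
    """Lanczos three-term recurrence for a symmetric matrix A."""
    n = len(q1)
    alpha = np.zeros(m)
    beta = np.zeros(m + 1)
    Q = np.zeros((n, m + 1))
    Q[:, 0] = q1 / np.linalg.norm(q1)
    q_old = np.zeros(n)
    for j in range(m):
        s = A @ Q[:, j] - beta[j] * q_old     # remove component along q_{j-1}
        alpha[j] = s @ Q[:, j]
        s = s - alpha[j] * Q[:, j]            # remove component along q_j
        beta[j + 1] = np.linalg.norm(s)
        if beta[j + 1] == 0.0:                # breakdown: invariant subspace found
            return Q[:, :j + 1], alpha[:j + 1], beta[1:j + 1]
        q_old = Q[:, j].copy()
        Q[:, j + 1] = s / beta[j + 1]
    return Q, alpha, beta[1:]
```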
since Qk has full rank and A is positive definite. Hence, H̃k is also positive definite (and
symmetric). It follows that H̃k is non-singular.
Remark 5.15 The minimization of the residual revisited. In the second step, one has to find a way to minimize the residual in $K_k(r^{(0)}, A)$ without having to store the complete basis of $K_k(r^{(0)}, A)$. Only if this is possible does the short recurrence of the Lanczos algorithm become useful.
The least squares problem which has to be solved has the form (5.2). To solve this problem, a QR decomposition of $H_k$ is used,
$$\bar Q = G_1 G_2 \cdots G_{k-1} G_k,$$
see lecture notes Numerical Mathematics I. For a Givens reflection, the non-diagonal block has the form
$$\begin{pmatrix} c_j & s_j \\ s_j & -c_j \end{pmatrix}, \quad c_j^2 + s_j^2 = 1.$$
The off-diagonal entries are in the rows j and j + 1. It is
Since $H_k$ is tridiagonal, one obtains
$$\bar R_k = \begin{pmatrix} r_{11} & r_{12} & r_{13} & 0 & \cdots & 0 \\ 0 & r_{22} & r_{23} & r_{24} & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \ddots & \vdots \\ 0 & & & & & r_{k-2,k} \\ 0 & & & & & r_{k-1,k} \\ 0 & & & & & r_{k,k} \\ 0 & 0 & 0 & 0 & \cdots & 0 \end{pmatrix},$$
i.e., $r_{ij} = 0$ if $j > i + 2$. A Givens rotation changes only the two columns that are
involved, i.e. here the two neighbouring columns j and j + 1. Since the non-zero
entries at column j of Hk are at rows (j − 1), j, and j + 1, where the last will be
transformed to become zero, a new non-zero entry at column (j + 1) can occur only
at row (j − 1).
Consider the only interesting case $r^{(k)} \neq 0$, in which the matrix $H_k$ has full rank. Let $R_k$ be the matrix which consists of the first k rows of $\bar R_k$. The matrix $R_k$ is non-singular since $H_k$ has full rank. Setting
one has from $P_k R_k = Q_k$ and due to the special form of $R_k$ the recursion
$$p_1 = \frac{q_1}{r_{11}}, \qquad p_2 = \frac{1}{r_{22}}\big(q_2 - r_{12} p_1\big) \quad (\Longleftarrow\ r_{22} p_2 + r_{12} p_1 = q_2), \qquad \dots, \qquad p_j = \frac{1}{r_{jj}}\big(q_j - p_{j-1} r_{j-1,j} - p_{j-2} r_{j-2,j}\big), \quad j = 3, \dots, k. \qquad (5.5)$$
The least squares problem (5.2) can now be rewritten in the form
$$\min_{z\in\mathbb{R}^k} \big\|\|r^{(0)}\|_2\, e_1 - G_1 G_2 \cdots G_k \bar R_k z\big\|_2 = \min_{z\in\mathbb{R}^k} \big\|\|r^{(0)}\|_2\, G_k^T \cdots G_2^T G_1^T e_1 - \bar R_k z\big\|_2,$$
because the Euclidean norm is invariant under a multiplication with a unitary matrix. Since the last row of $\bar R_k$ vanishes, its Moore9–Penrose10 inverse (pseudo-inverse), see Numerical Mathematics I, is given by
$$\bar R_k^+ = \big(R_k^{-1} \ \ 0\big) \in \mathbb{R}^{k\times(k+1)},$$
where the last index symbolizes that only the first k components of the vectors are taken. Consequently, the iterate with the minimal residual has the form, see (5.3),
$$x^{(k)} = x^{(0)} + Q_k z^{(k)} = x^{(0)} + \|r^{(0)}\|_2\, Q_k R_k^{-1} \big(\underbrace{G_k^T \cdots G_1^T e_1}_{\in\mathbb{R}^{k+1}}\big)_{1\le i\le k} = x^{(0)} + \|r^{(0)}\|_2\, P_k \big(G_k^T \cdots G_1^T e_1\big)_{1\le i\le k}.$$
9 Eliakim Hastings Moore (1862 – 1932)
10 Roger Penrose, born 1931
Since the Givens rotation or reflection $G_j^T$ influences only the rows j and j + 1 of the vector to which it is applied, the first (j − 1) of its components stay unchanged:
$$\big(G_j^T \cdots G_1^T e_1\big)_{1\le i\le j-1} = \big(G_{j-1}^T \cdots G_1^T e_1\big)_{1\le i\le j-1}.$$
It follows that
$$x^{(k)} = x^{(0)} + \|r^{(0)}\|_2\, P_{k-1}\big(G_k^T \cdots G_1^T e_1\big)_{1\le i\le k-1} + \|r^{(0)}\|_2\, p_k \big(G_k^T \cdots G_1^T e_1\big)_{i=k}$$
$$= \underbrace{x^{(0)} + \|r^{(0)}\|_2\, P_{k-1}\big(G_{k-1}^T \cdots G_1^T e_1\big)_{1\le i\le k-1}}_{=x^{(k-1)}} + \|r^{(0)}\|_2\, p_k \big(G_k^T \cdots G_1^T e_1\big)_{i=k}$$
$$= x^{(k-1)} + \|r^{(0)}\|_2\, \big(G_k^T \cdots G_1^T e_1\big)_{i=k}\, p_k.$$
For computing $p_k$, one needs, see (5.5), $q_k$, $p_{k-1}$, and $p_{k-2}$. The result $p_k$ can be stored in place of $p_{k-2}$ since $p_{k-2}$ is not needed any longer. Together with the short recurrence of the Lanczos algorithm, it is shown that the storage of the basis of $K_k(r^{(0)}, A)$ is not necessary.
The resulting method which computes iterates with minimal residual for symmetric matrices A is called MINRES. MINRES requires storing six arrays: $q_k$, $q_{k+1}$, s, $p_k$, $p_{k-1}$, and $x^{(k)}$. In contrast to GMRES, the current iterate $x^{(k)}$ is known in each iteration, and not only the residual of the current iterate. 2
xT Ay = (Ax, y) = 0.
Remark 5.19 S.p.d. matrices vs. other matrices. As can already be seen in the case of Krylov subspace methods that minimize the residual, one has to distinguish the cases that A is s.p.d. or that A is another matrix. These two cases represent two worlds in the context of iterative methods for solving linear systems of equations. Methods that can be used for general matrices, and that are considered to work usually well in this case, are generally not the best methods for s.p.d. matrices. The solution of systems with s.p.d. matrices is much simpler. In engineering practice, it is a common approach to try to reduce the solution of a complicated problem to the successive solution of linear systems with s.p.d. matrices. 2
Chapter 6
Remark 6.1 Idea. The methods presented in this section determine the iterate $x^{(k)}$ in the manifold $x^{(0)} + K_k(r^{(0)}, A)$ such that the corresponding residual $r^{(k)}$ is orthogonal to $K_k(r^{(0)}, A)$. That means, $r^{(k)}$ is projected into the orthogonal complement $K_k(r^{(0)}, A)^\perp$ of $K_k(r^{(0)}, A)$. 2
6.2 Symmetric Matrices
Remark 6.3 SYMMLQ for symmetric matrices. If A is a symmetric matrix, there is a way to perform FOM with a short recurrence, i.e., without having to store the whole basis $\{q_1, \dots, q_k\}$ of $K_k(r^{(0)}, A)$. The resulting method is called SYMMLQ. This method will not be presented here. Instead, the case that A is symmetric and positive definite will be studied in detail. Then, SYMMLQ can be simplified, leading to the famous conjugate gradient (CG) method. 2
Remark 6.4 Lanczos algorithm for a s.p.d. matrix. CG will be derived from the Lanczos algorithm 5.12. Starting point is the Cholesky1 decomposition of $\tilde H_k$,
$$\tilde H_k = L_k D_k L_k^T \qquad (6.3)$$
$$= \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ l_1 & 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & l_{k-1} & 1 \end{pmatrix} \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d_k \end{pmatrix} \begin{pmatrix} 1 & l_1 & \cdots & 0 & 0 \\ \vdots & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & l_{k-1} \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}.$$
Lemma 6.5 Columns of $\hat P_k$ are A-conjugate. The columns $\{\hat p_1, \dots, \hat p_k\}$ are mutually A-conjugate, i.e., $\hat P_k^T A \hat P_k$ is a diagonal matrix.
Proof: Using (5.4) and (6.3) gives
$$\hat P_k^T A \hat P_k = L_k^{-1} Q_k^T A Q_k L_k^{-T} = L_k^{-1} \tilde H_k L_k^{-T} = D_k.$$
Remark 6.6 First version of a method. The last column of $\hat P_k$ is given by
$$\hat p_k = q_k - l_{k-1} \hat p_{k-1}, \qquad (6.5)$$
which follows immediately from $Q_k = \hat P_k L_k^T$. The update $y_k$ in (6.4) has the form $y_k = (y_{k-1}, \eta_k)^T$ with $\eta_k \in \mathbb{R}$, since
$$y_k = \beta D_k^{-1} L_k^{-1} e_{1,k} = \beta \begin{pmatrix} D_{k-1}^{-1} L_{k-1}^{-1} & 0 \\ * & d_k^{-1} \end{pmatrix} e_{1,k} = \begin{pmatrix} \beta D_{k-1}^{-1} L_{k-1}^{-1} e_{1,k-1} \\ * \end{pmatrix} = \begin{pmatrix} y_{k-1} \\ \eta_k \end{pmatrix},$$
1 André Louis Cholesky (1875 – 1918)
where e1,k is the first Cartesian unit vector with k components and e1,k−1 the first
Cartesian unit vector of length (k − 1). This means, the first (k − 1) components
of yk are the components of yk−1 . Now, one needs to find a formula for ηk .
From the definition of $y_k$ it follows that $L_k D_k y_k = \beta e_1$, i.e.,
$$L_k \begin{pmatrix} y_{k,1} d_1 \\ y_{k,2} d_2 \\ \vdots \\ \eta_k d_k \end{pmatrix} = \begin{pmatrix} \beta \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Hence, $l_{k-1} y_{k,k-1} d_{k-1} + \eta_k d_k = 0$ and
$$\eta_k = -\frac{l_{k-1} y_{k,k-1} d_{k-1}}{d_k}, \quad \text{if } k \ge 2. \qquad (6.6)$$
The first component, $\eta_1$, is given by
$$\eta_1 = \beta D_1^{-1} L_1^{-1} e_1 = \frac{\beta}{d_1}.$$
Inserting all terms into (6.4) gives
$$x^{(k)} = x^{(0)} + \hat P_k y_k = x^{(0)} + \big(\hat P_{k-1} \ \ \hat p_k\big)\begin{pmatrix} y_{k-1} \\ \eta_k \end{pmatrix} = x^{(0)} + \hat P_{k-1} y_{k-1} + \eta_k \hat p_k = x^{(k-1)} + \eta_k \hat p_k. \qquad (6.7)$$
Thus, the new iterate can be computed with (6.5) and (6.6). This approach shows
that a short recurrence is possible. However, it is not yet optimal and it can be
simplified. 2
Remark 6.7 Optimal version of the method. It holds for the residual, using (6.4), that
$$r^{(k)} = b - Ax^{(k)} = b - Ax^{(0)} - A\hat P_k y_k = r^{(0)} - Az$$
with some vector $z \in K_k(r^{(0)}, A) = \mathrm{span}\{q_1, \dots, q_k\}$ since the columns of $\hat P_k$ form a basis of $K_k(r^{(0)}, A)$, see Remark 6.4. This representation shows first that
$$r^{(k)} \in K_{k+1}(r^{(0)}, A) = \mathrm{span}\{q_1, \dots, q_k, q_{k+1}\}.$$
By construction, see Remark 6.2, it is also $r^{(k)} \perp K_k(r^{(0)}, A)$. These two properties imply that $r^{(k)} \in \mathrm{span}\{q_{k+1}\}$ such that
Setting
$$q_{k+1} = \frac{r^{(k)}}{\|r^{(k)}\|_2} \qquad (6.9)$$
With (6.5) and (6.9), one gets
$$p_k = \|r^{(k-1)}\|_2\, \hat p_k = \|r^{(k-1)}\|_2 \Big(q_k - l_{k-1} \frac{p_{k-1}}{\|r^{(k-2)}\|_2}\Big) = r^{(k-1)} - \frac{\|r^{(k-1)}\|_2\, l_{k-1}}{\|r^{(k-2)}\|_2}\, p_{k-1} = r^{(k-1)} + \mu_k p_{k-1}. \qquad (6.12)$$
Now, formulas for $\nu_k$ and $\mu_k$ are needed. Multiplying (6.12) from the left hand side with $p_k^T A$ gives
$$p_k^T A p_k = p_k^T A r^{(k-1)} + \mu_k \underbrace{p_k^T A p_{k-1}}_{=0,\ \text{Lemma 6.5}} = p_k^T A r^{(k-1)}. \qquad (6.13)$$
Multiplying (6.10) from the left with $(r^{(k-1)})^T$ and using $r^{(j)} = c\, q_j$, $j = 1, \dots, k$, and the orthonormality of the vectors $q_j$ leads to
$$\underbrace{(r^{(k-1)})^T r^{(k)}}_{=0} = (r^{(k-1)})^T r^{(k-1)} - \nu_k\, (r^{(k-1)})^T A p_k,$$
Now, multiplying (6.12) from the left with $p_{k-1}^T A$ leads to, using Lemma 6.5,
which gives
$$\mu_k = -\frac{p_{k-1}^T A r^{(k-1)}}{p_{k-1}^T A p_{k-1}}.$$
To simplify this expression, multiply (6.10) from the left with $(r^{(k)})^T$ such that one obtains, using $r^{(k)} \perp r^{(k-1)}$,
$$(r^{(k)})^T r^{(k)} = 0 - \nu_k\, (r^{(k)})^T A p_k \implies \nu_k = -\frac{(r^{(k)})^T r^{(k)}}{(r^{(k)})^T A p_k},$$
such that
$$\mu_k = \frac{(r^{(k-1)})^T r^{(k-1)}}{(r^{(k-2)})^T r^{(k-2)}}. \qquad (6.15)$$
The evaluation of this expression requires only two inner products but not any matrix-vector product. These considerations lead to Algorithm 6.8. 2
Algorithm 6.8 Conjugate Gradient (CG). Given a symmetric positive definite matrix $A \in \mathbb{R}^{n\times n}$, a right hand side $b \in \mathbb{R}^n$, an initial iterate $x^{(0)} \in \mathbb{R}^n$ and a tolerance ε > 0.
1. r^{(0)} = b − Ax^{(0)}
2. p_1 = r^{(0)}
3. k = 0
4. while ‖r^{(k)}‖_2 > ε
5. k = k + 1
6. s = A p_k
7. ν_k = ((r^{(k−1)})^T r^{(k−1)}) / (p_k^T s) % (6.14)
8. x^{(k)} = x^{(k−1)} + ν_k p_k % (6.11)
9. r^{(k)} = r^{(k−1)} − ν_k s % (6.10)
10. µ_{k+1} = ((r^{(k)})^T r^{(k)}) / ((r^{(k−1)})^T r^{(k−1)}) % (6.15)
11. p_{k+1} = r^{(k)} + µ_{k+1} p_k % (6.12)
12. endwhile
2
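A direct transcription of Algorithm 6.8 into NumPy might look as follows (the default iteration limit n is an assumption added here):

```python
import numpy as np

def conjugate_gradient(A, b, x0, eps=1e-10, maxit=None):
    """Conjugate gradient method, following Algorithm 6.8."""
    if maxit is None:
        maxit = len(b)
    x = x0.copy()
    r = b - A @ x                    # step 1
    p = r.copy()                     # step 2
    rho = r @ r
    for k in range(maxit):           # steps 4-12
        if np.sqrt(rho) <= eps:
            break
        s = A @ p                    # step 6
        nu = rho / (p @ s)           # step 7, (6.14)
        x = x + nu * p               # step 8, (6.11)
        r = r - nu * s               # step 9, (6.10)
        rho_new = r @ r
        mu = rho_new / rho           # step 10, (6.15)
        p = r + mu * p               # step 11, (6.12)
        rho = rho_new
    return x
```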
Remark 6.9 First publication of CG. The CG method was published for the first time by Hestenes2 and Stiefel3 in Hestenes and Stiefel (1952). 2
Definition 6.11 Energy norm. Let $A \in \mathbb{R}^{n\times n}$ be s.p.d., then A induces a vector norm by
$$\|x\|_A = (x, Ax)^{1/2} \quad \forall\, x \in \mathbb{R}^n,$$
the so-called energy norm. 2
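For illustration (not from the original notes), the energy norm can be evaluated as:

```python
import numpy as np

def energy_norm(A, x):
    """Energy norm ||x||_A = (x, Ax)^{1/2} for a s.p.d. matrix A."""
    return np.sqrt(x @ (A @ x))
```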
$$\min_{y\in x^{(0)} + K_k(r^{(0)}, A)} \|x - y\|_A,$$
where x is the solution of (1.1). The corresponding residual r(k) is orthogonal to
Kk r(0) , A , i.e., QTk r(k) = 0.
$$\|x - y\|_A^2 = (x - y)^T A (x - y) = \big(x - x^{(k)} - z\big)^T A \big(x - x^{(k)} - z\big)$$
$$= (x - x^{(k)})^T A (x - x^{(k)}) - z^T A (x - x^{(k)}) - \underbrace{(x - x^{(k)})^T A z}_{=\,z^T A (x - x^{(k)})\,\in\,\mathbb{R}} + z^T A z$$
$$= (x - x^{(k)})^T A (x - x^{(k)}) - 2 z^T A (x - x^{(k)}) + z^T A z = (x - x^{(k)})^T A (x - x^{(k)}) - 2 \underbrace{z^T r^{(k)}}_{=0} + \underbrace{z^T A z}_{>0}$$
$$> (x - x^{(k)})^T A (x - x^{(k)}) = \|x - x^{(k)}\|_A^2.$$
Remark 6.13 On the energy norm. To minimize the error in the energy norm is more natural than to minimize the Euclidean norm of the residual since
$$\|r^{(k)}\|_2 = \|A(x - x^{(k)})\|_2 = \|x - x^{(k)}\|_{A^2}.$$
Chapter 7
Convergence of Krylov
Subspace Methods
Remark 7.1 Motivation. The Krylov subspace methods compute the solution of
(1.1) in at most n iterations (in exact arithmetic) by construction. However, this
property is useless if n is large. The question arises if one can get information about
the iterate x(k) for k < n. 2
Remark 7.2 Starting point of the convergence analysis. The basis of the convergence analysis for Krylov subspace methods is the following observation: $z \in K_k(r^{(0)}, A)$ is equivalent to $z = q_{k-1}(A)\, r^{(0)}$, where $q_{k-1} \in P_{k-1}$ is a polynomial
$$\frac{\|r^{(k)}\|_2}{\|r^{(0)}\|_2} \le \min_{p_k\in P_k,\, p_k(0)=1} \|p_k(A)\|_2. \qquad (7.2)$$
For all Krylov subspace methods, in particular for those methods which are based on projecting the residual, see Chapter 6, it holds with (7.1) that
$$x - x^{(k)} = A^{-1} b - A^{-1}\big(b - r^{(k)}\big) = A^{-1} r^{(k)} = A^{-1} p_k(A)\, r^{(0)} = A^{-1} \Big(\sum_{i=0}^{k} \alpha_i A^i\Big) r^{(0)} = \Big(\sum_{i=0}^{k} \alpha_i A^{i-1}\Big) r^{(0)} = \Big(\sum_{i=0}^{k} \alpha_i A^i\Big) A^{-1} r^{(0)} = p_k(A)\, A^{-1} r^{(0)} = p_k(A)\big(x - x^{(0)}\big). \qquad (7.3)$$
Remark 7.3 S.p.d. matrices and the CG method. Consider the case that A is symmetric and positive definite. Then, one gets from (7.3)
$$\|x - x^{(k)}\|_A = \|p_k(A)\big(x - x^{(0)}\big)\|_A.$$
The iterate $x^{(k)}$ of the conjugate gradient method is the minimizer of $\|x - x^{(k)}\|_A$ in $x^{(0)} + K_k(r^{(0)}, A)$, see Theorem 6.12. Hence
$$\|x - x^{(k)}\|_A = \min_{p_k\in P_k,\, p_k(0)=1} \|p_k(A)\big(x - x^{(0)}\big)\|_A,$$
since $p_k(A)$ is the only parameter in the expression on the right hand side. From
$$\|p_k(A)(x - x^{(0)})\|_A = \Big(\big(p_k(A)(x - x^{(0)})\big)^T A\, p_k(A)(x - x^{(0)})\Big)^{1/2} = \Big(\big(A^{1/2} p_k(A)(x - x^{(0)})\big)^T \big(A^{1/2} p_k(A)(x - x^{(0)})\big)\Big)^{1/2}$$
$$= \Big(\big(p_k(A) A^{1/2}(x - x^{(0)})\big)^T \big(p_k(A) A^{1/2}(x - x^{(0)})\big)\Big)^{1/2} = \|p_k(A) A^{1/2}(x - x^{(0)})\|_2 \le \|p_k(A)\|_2\, \|A^{1/2}(x - x^{(0)})\|_2 = \|p_k(A)\|_2\, \|x - x^{(0)}\|_A,$$
it follows that
$$\frac{\|x - x^{(k)}\|_A}{\|x - x^{(0)}\|_A} \le \min_{p_k\in P_k,\, p_k(0)=1} \|p_k(A)\|_2. \qquad (7.4)$$
The right hand side of (7.4) is the same as the right hand side of (7.2). 2
since
$$\big(Q^T \Lambda Q\big)^i = Q^T \Lambda \underbrace{Q Q^T}_{=I} \Lambda \cdots \underbrace{Q Q^T}_{=I} \Lambda Q = Q^T \Lambda^i Q$$
and the $\|\cdot\|_2$-norm is invariant with respect to the multiplication with unitary matrices. The matrix $p_k(\Lambda)$ is diagonal with the entries $p_k(\lambda_i)$. Hence
Remark 7.5 Chebyshev polynomials. For proving the convergence theorem, Chebyshev1 polynomials of first kind will be used, see also the lecture notes of Numerical Mathematics I.
Theorem 7.6 Estimate of the rate of convergence for s.p.d. matrices. Let A be symmetric and positive definite. Then
$$\min_{p_k\in P_k,\, p_k(0)=1} \|p_k(A)\|_2 \le 2\left(\frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1}\right)^k.$$
Proof: The idea of the proof consists in constructing a special polynomial which gives the estimate since
$$\min_{p_k\in P_k,\, p_k(0)=1} \|p_k(A)\|_2 \le \|p_{k,\mathrm{special}}(A)\|_2.$$
Let $\lambda_{\min}$ be the smallest and $\lambda_{\max}$ be the largest eigenvalue of A. Consider the linear function
$$\lambda : \mathbb{R} \to \mathbb{R}, \quad t \mapsto \frac{\lambda_{\min} + \lambda_{\max}}{2} + \frac{\lambda_{\max} - \lambda_{\min}}{2}\, t.$$
In particular, the restriction $t \in [-1, 1]$ gives $\lambda \in [\lambda_{\min}, \lambda_{\max}]$. The root of λ(t) is denoted by $t_0$. It is
$$t_0 = -\frac{\lambda_{\min} + \lambda_{\max}}{\lambda_{\max} - \lambda_{\min}} = -\frac{\kappa_2(A) + 1}{\kappa_2(A) - 1} < -1,$$
1 Pafnuty Lvovich Chebyshev (1821 – 1894)
where one uses that for symmetric positive definite matrices $\kappa_2(A) = \lambda_{\max}/\lambda_{\min}$. Denoting by t(λ) the inverse function, one defines the special polynomial
$$p_k(\lambda) = \frac{T_k(t(\lambda))}{T_k(t(0))} =: \frac{T_k(t)}{T_k(t_0)} \in P_k.$$
Then $p_k(0) = T_k(t_0)/T_k(t_0) = 1$. It is, by Lemma 7.4 and since $\lambda \in [\lambda_{\min}, \lambda_{\max}]$ for all eigenvalues of A (the maximum does not decrease if it is searched in a larger set),
$$\|p_k(A)\|_2 = \max_{\lambda \text{ eigenvalue of } A} |p_k(\lambda)| \le \max_{\lambda\in[\lambda_{\min},\lambda_{\max}]} |p_k(\lambda)| = \max_{t\in[-1,1]} \frac{|T_k(t)|}{|T_k(t_0)|} = \frac{1}{|T_k(t_0)|} \underbrace{\max_{t\in[-1,1]} |T_k(t)|}_{\le 1} \le \frac{1}{|T_k(t_0)|}. \qquad (7.6)$$
$$|T_k(t_0)| = \Big|(-1)^k \cosh\big(k \underbrace{\operatorname{arcosh}(-t_0)}_{\omega_0}\big)\Big| = |\cosh(k\omega_0)| = \frac{e^{k\omega_0} + e^{-k\omega_0}}{2}.$$
One has to estimate this term from below. Since $-t_0 > 1$, one has
$$\frac{e^{\omega_0} + e^{-\omega_0}}{2} = \cosh(\omega_0) = \cosh\big(\operatorname{arcosh}(-t_0)\big) = -t_0,$$
from which $e^{\omega_0} + e^{-\omega_0} = -2t_0$ follows. This is a quadratic equation in $e^{\omega_0}$ with the solutions
$$e^{\omega_0} = -t_0 \pm \sqrt{t_0^2 - 1}.$$
For estimating $|T_k(t_0)|$, one obtains a sharper estimate if the larger one of these two values is considered, i.e.,
$$e^{\omega_0} = -t_0 + \sqrt{t_0^2 - 1} = \frac{\kappa_2(A) + 1}{\kappa_2(A) - 1} + \sqrt{\frac{(\kappa_2(A) + 1)^2 - (\kappa_2(A) - 1)^2}{(\kappa_2(A) - 1)^2}} = \frac{\kappa_2(A) + 2\sqrt{\kappa_2(A)} + 1}{\kappa_2(A) - 1} = \frac{\big(\sqrt{\kappa_2(A)} + 1\big)^2}{\big(\sqrt{\kappa_2(A)} + 1\big)\big(\sqrt{\kappa_2(A)} - 1\big)} = \frac{\sqrt{\kappa_2(A)} + 1}{\sqrt{\kappa_2(A)} - 1}.$$
Remark 7.7 Connection of the number of iterations and the spectral condition number. To guarantee the reduction of the error by a factor η < 1, using the estimate from Theorem 7.6, one requires
$$2\left(\frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1}\right)^k \le \eta.$$
If $\kappa_2(A)$ is large, then a power series expansion gives
$$\ln\left(\frac{\sqrt{\kappa_2(A)} + 1}{\sqrt{\kappa_2(A)} - 1}\right) = \ln\left(\frac{1 + \frac{1}{\sqrt{\kappa_2(A)}}}{1 - \frac{1}{\sqrt{\kappa_2(A)}}}\right) = \ln\left(1 + \frac{1}{\sqrt{\kappa_2(A)}}\right) - \ln\left(1 - \frac{1}{\sqrt{\kappa_2(A)}}\right) \approx \frac{2}{\sqrt{\kappa_2(A)}}.$$
That means, the expected number of iterations to reduce the error by the factor η increases like
$$k \approx \frac{-\ln(\eta/2)}{2}\, \sqrt{\kappa_2(A)} = O\big(\sqrt{\kappa_2(A)}\big).$$
This behavior can be observed in fact in many situations, e.g., for linear systems of equations arising in discretizations of partial differential equations. 2
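As a small illustration of this estimate (the values of κ₂(A) and η below are arbitrary examples, not taken from the notes):

```python
import numpy as np

def cg_iteration_estimate(kappa2, eta):
    """Estimated iteration count k ~ -ln(eta/2)/2 * sqrt(kappa2) to reduce
    the error in the energy norm by the factor eta."""
    return -np.log(eta / 2.0) / 2.0 * np.sqrt(kappa2)

# e.g. kappa2 = 1e4 and eta = 1e-6 give roughly 7.3e2 iterations
print(cg_iteration_estimate(1e4, 1e-6))
```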