Numerical Analysis
Lent 2016
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Polynomial approximation
Interpolation by polynomials. Divided differences of functions and relations to deriva-
tives. Orthogonal polynomials and their recurrence relations. Least squares approx-
imation by polynomials. Gaussian quadrature formulae. Peano kernel theorem and
applications. [6]
Contents
0 Introduction
1 Polynomial interpolation
  1.1 The interpolation problem
  1.2 The Lagrange formula
  1.3 The Newton formula
  1.4 A useful property of divided differences
  1.5 Error bounds for polynomial interpolation
2 Orthogonal polynomials
  2.1 Scalar product
  2.2 Orthogonal polynomials
  2.3 Three-term recurrence relation
  2.4 Examples
  2.5 Least-squares polynomial approximation
6 Stiff equations
  6.1 Introduction
  6.2 Linear stability
0 Introduction
Numerical analysis is the study of algorithms. There are many problems we
would like algorithms to solve. In this course, we will tackle the problems of
polynomial approximation, solving ODEs and solving linear equations. These
are all problems that frequently arise when we do (applied) maths.
In general, there are two things we are concerned with — accuracy and speed.
Accuracy is particularly important in the cases of polynomial approximation and
solving ODEs, since we are trying to approximate things. We would like to make
good approximations of the solution with relatively little work. On the other
hand, in the case of solving linear equations, we are more concerned with speed
— our solutions will be exact (up to numerical errors due to finite precision of
calculations), but we would like to solve it quickly. We might have to deal with
huge systems, and we don’t want the computation time to grow too quickly.
In the past, this was an important subject, since they had no computers.
The algorithms had to be implemented by hand. It was thus very important to
find some practical and efficient methods of computing things, or else it would
take them forever to calculate what they wanted. So we wanted quick algorithms
that give reasonably accurate results.
Nowadays, this is still an important subject. While we have computers that
are much faster at computation, we still want our programs to be fast. We would
also want to get really accurate results, since we might be using them to, say,
send our rocket to the Moon. Moreover, with more computational power, we
might sacrifice efficiency for some other desirable properties. For example, if we
are solving for the trajectory of a particle, we might want the solution to satisfy
the conservation of energy. This would require some much more complicated and
slower algorithms that no one would have considered in the past. Nowadays, with
computers, these algorithms become more feasible, and are becoming increasingly
more popular.
1 Polynomial interpolation
Polynomials are nice. Writing down a polynomial of degree n involves only n + 1
numbers. They are easy to evaluate, integrate and differentiate. So it would be
nice if we can approximate things with polynomials. For simplicity, we will only
deal with real polynomials.
Notation. We write Pn [x] for the real linear vector space of polynomials (with
real coefficients) having degree n or less.
It is easy to show that dim(Pn [x]) = n + 1.
The interpolation problem is the following: given n + 1 distinct points x0, · · · , xn ∈ [a, b] and data f0, · · · , fn ∈ R, find p ∈ Pn[x] such that
p(xi) = fi for i = 0, · · · , n.
The Lagrange cardinal polynomials with respect to the interpolation points x0, · · · , xn are
ℓk(x) = ∏_{i=0, i≠k}^{n} (x − xi)/(xk − xi),  k = 0, · · · , n.
Note that these polynomials have degree exactly n. The significance of these polynomials is that we have ℓk(xi) = 0 for i ≠ k, and ℓk(xk) = 1. In other words, we have
ℓk(xj) = δjk.
This is obvious from definition.
With these cardinal polynomials, we can immediately write down a solution
to the interpolation problem.
Theorem. The interpolation problem has exactly one solution.
Proof. We define p ∈ Pn[x] by
p(x) = ∑_{k=0}^n fk ℓk(x).
Evaluating at xj gives
p(xj) = ∑_{k=0}^n fk ℓk(xj) = ∑_{k=0}^n fk δjk = fj.
So we get existence.
For uniqueness, suppose p, q ∈ Pn [x] are solutions. Then the difference
r = p − q ∈ Pn [x] satisfies r(xj ) = 0 for all j, i.e. it has n + 1 roots. However, a
non-zero polynomial of degree n can have at most n roots. So in fact p − q is
zero, i.e. p = q.
While this works, it is not ideal. If we one day decide we should add one more
interpolation point, we would have to recompute all the cardinal polynomials,
and that is not fun. Ideally, we would like some way to reuse our previous
computations when we have new interpolation points.
For each k = 0, · · · , n, we let pk ∈ Pk[x] be the polynomial such that
pk(xi) = fi for i = 0, · · · , k.
This is the unique degree-k polynomial that satisfies the first k + 1 conditions,
whose existence (and uniqueness) is guaranteed by the previous section. Then
we can write
p(x) = pn (x) = p0 (x) + (p1 (x) − p0 (x)) + · · · + (pn (x) − pn−1 (x)).
Hence we are done if we have an efficient way of finding the differences pk − pk−1 .
Since pk − pk−1 ∈ Pk[x] vanishes at x0, · · · , xk−1, it must be of the form
pk(x) − pk−1(x) = Ak ∏_{i=0}^{k−1} (x − xi)
for some constant Ak. Substituting into the telescoping sum above gives the Newton formula
p(x) = pn(x) = ∑_{k=0}^n Ak ∏_{i=0}^{k−1} (x − xi),
where the empty product (for k = 0) is taken to be 1.
This formula has the advantage that it is built up gradually from the interpo-
lation points one-by-one. If we stop the sum at any point, we have obtained
the polynomial that interpolates the data for the first k points (for some k).
Conversely, if we have a new data point, we just need to add a new term, instead
of re-computing everything.
All that remains is to find the coefficients Ak . For k = 0, we know A0 is the
unique constant polynomial that interpolates the point at x0 , i.e. A0 = f0 .
For the others, we note that in the formula for pk − pk−1 above, Ak is the coefficient of x^k. But pk−1(x) has no degree-k term. So Ak must
be the leading coefficient of pk . So we have reduced our problem to finding the
leading coefficients of pk .
The solution to this is known as the Newton divided differences. We first
invent a new notation:
Ak = f [x0 , · · · , xk ].
Note that these coefficients depend only on the first k interpolation points.
Moreover, since the labelling of the points x0 , · · · , xk is arbitrary, we don’t have
to start with x0 . In general, the coefficient
f [xj , · · · , xk ]
is the leading coefficient of the unique q ∈ Pk−j [x] such that q(xi ) = fi for
i = j, · · · , k.
While we do not have an explicit formula for what these coefficients are, we
can come up with a recurrence relation for these coefficients.
Theorem (Recurrence relation for Newton divided differences). For 0 ≤ j <
k ≤ n, we have
f[xj, · · · , xk] = (f[xj+1, · · · , xk] − f[xj, · · · , xk−1]) / (xk − xj).
Proof. The key to proving this is to relate the interpolating polynomials. Let
q0 , q1 ∈ Pk−j−1 [x] and q2 ∈ Pk−j satisfy
q0 (xi ) = fi i = j, · · · , k − 1
q1 (xi ) = fi i = j + 1, · · · , k
q2 (xi ) = fi i = j, · · · , k
We now claim that
q2(x) = ((x − xj)/(xk − xj)) q1(x) + ((xk − x)/(xk − xj)) q0(x).
We can check directly that the expression on the right correctly interpolates
the points xi for i = j, · · · , k. By uniqueness, the two expressions agree. Since
f [xj , · · · , xk ], f [xj+1 , · · · , xk ] and f [xj , · · · , xk−1 ] are the leading coefficients of
q2 , q1 , q0 respectively, the result follows.
Thus the famous Newton divided difference table can be constructed:

xi     fi        f[∗, ∗]      f[∗, ∗, ∗]        · · ·    f[∗, · · · , ∗]
x0     f[x0]
                 f[x0, x1]
x1     f[x1]                  f[x0, x1, x2]
                 f[x1, x2]
x2     f[x2]                  f[x1, x2, x3]     · · ·    f[x0, x1, · · · , xn]
                 f[x2, x3]
x3     f[x3]
 ⋮       ⋮          ⋮
xn     f[xn]
From the first n columns, we can find the n + 1th column using the recurrence
relation above. The values of Ak can then be found at the top diagonal, and
this is all we really need. However, to compute this diagonal, we will need to
compute everything in the table.
In practice, we often need not find the actual interpolating polynomial. If we
just want to evaluate p(x̂) at some new point x̂ using the divided table, we can
use Horner’s scheme, given by
S <- f[x0,..., xn]
for k = n-1,..., 0
    S <- (x̂ - xk) S + f[x0,..., xk]
end
This only takes O(n) operations.
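As an illustration, here is a minimal Python sketch (my own, not from the lectures; the function names divided_differences and newton_eval are assumptions) of how the top diagonal of the table and the evaluation scheme above might be implemented.

import numpy as np

def divided_differences(xs, fs):
    # Returns the top diagonal f[x0], f[x0,x1], ..., f[x0,...,xn] of the table,
    # computed column by column using the recurrence relation above.
    xs = np.asarray(xs, dtype=float)
    coeffs = np.array(fs, dtype=float)
    n = len(xs)
    for j in range(1, n):
        for k in range(n - 1, j - 1, -1):
            coeffs[k] = (coeffs[k] - coeffs[k - 1]) / (xs[k] - xs[k - j])
    return coeffs

def newton_eval(xs, coeffs, x_hat):
    # Evaluate the interpolating polynomial at x_hat by the scheme above.
    S = coeffs[-1]
    for k in range(len(xs) - 2, -1, -1):
        S = (x_hat - xs[k]) * S + coeffs[k]
    return S

# Example: interpolating f(x) = x^3 at four points reproduces it exactly.
xs = [0.0, 1.0, 2.0, 4.0]
A = divided_differences(xs, [x**3 for x in xs])
print(newton_eval(xs, A, 3.0))   # ~27.0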
If an extra data point {xn+1 , fn+1 } is added, then we only have to compute
an extra diagonal f [xk , · · · , xn+1 ] for k = n, · · · , 0 in the divided difference table
to obtain the new coefficient, and the old results can be reused. This requires
O(n) operations. This is less straightforward for Lagrange’s method.
where
ω(x) = ∏_{i=0}^n (x − xi).
for all x ∈ R. In particular, putting x = x̄, we have pn+1 (x̄) = f (x̄), and we get
the result.
Combining the two results, we find
Theorem. If in addition f ∈ C n+1 [a, b], then for each x ∈ [a, b], we can find
ξx ∈ (a, b) such that
en(x) = (1/(n + 1)!) f^(n+1)(ξx) ω(x).
Proof. The statement is trivial if x is an interpolation point — pick arbitrary
ξx , and both sides are zero. Otherwise, this follows directly from the last two
theorems.
This is an exact result, which is not too useful, since there is no easy
constructive way of finding what ξx should be. Instead, we usually go for a
bound. We introduce the max norm
‖g‖∞ = max_{x∈[a,b]} |g(x)|.
Assuming our function f is fixed, this error bound depends only on ω(x),
which depends on our choice of interpolation points. So can we minimize ω(x)
in some sense by picking some clever interpolation points ∆ = {xi }ni=0 ? Here
we will have n fixed. So instead, we put ∆ as the subscript. We can write our
bound as
‖f − p∆‖∞ ≤ (1/(n + 1)!) ‖f^(n+1)‖∞ ‖ω∆‖∞.
So the objective is to find a ∆ that minimizes ‖ω∆‖∞.
For the moment, we focus on the special case where the interval is [−1, 1].
The general solution can be obtained by an easy change of variable.
For some magical reasons that hopefully will become clear soon, the optimal
choice of ∆ comes from the Chebyshev polynomials.
Definition (Chebyshev polynomial). The Chebyshev polynomial of degree n on
[−1, 1] is defined by
Tn (x) = cos(nθ),
where x = cos θ with θ ∈ [0, π].
So given an x, we find the unique θ that satisfies x = cos θ, and then find
cos(nθ). This is in fact a polynomial in disguise, since from trigonometric
identities, we know cos(nθ) can be expanded as a polynomial in cos θ up to
degree n.
Two key properties of Tn on [−1, 1] are
(i) The maximum absolute value is obtained at
Xk = cos(πk/n)
for k = 0, · · · , n, with
Tn(Xk) = (−1)^k.
(ii) The zeros of Tn are
xk = cos((2k − 1)π/(2n)),  k = 1, · · · , n.
[Figure: plot of T4(x) on [−1, 1], oscillating between −1 and 1.]
All that really matters about the Chebyshev polynomials is that the maximum
is obtained at n + 1 distinct points with alternating sign. The exact form of the
polynomial is not really important.
Notice there is an intentional clash between the use of xk as the zeros and
xk as the interpolation points — we will show these are indeed the optimal
interpolation points.
We first prove a convenient recurrence relation for the Chebyshev polynomials:
Lemma (3-term recurrence relation). The Chebyshev polynomials satisfy the
recurrence relations
Tn+1 (x) = 2xTn (x) − Tn−1 (x)
with initial conditions
T0 (x) = 1, T1 (x) = x.
Proof.
cos((n + 1)θ) + cos((n − 1)θ) = 2 cos θ cos(nθ).
This recurrence relation can be useful for many things, but for our purposes,
we only use it to show that the leading coefficient of Tn is 2n−1 (for n ≥ 1).
Theorem (Minimal property for n ≥ 1). On [−1, 1], among all polynomials p ∈ Pn[x] with leading coefficient 1, the choice p = (1/2^{n−1}) Tn minimizes ‖p‖∞. Thus, the minimum value is 1/2^{n−1}.
Proof. We proceed by contradiction. Suppose there is a polynomial qn ∈ Pn[x] with leading coefficient 1 such that ‖qn‖∞ < 1/2^{n−1}. Define a new polynomial
r = (1/2^{n−1}) Tn − qn.
This is, by assumption, non-zero.
Since both polynomials have leading coefficient 1, the difference must have degree at most n − 1, i.e. r ∈ Pn−1[x]. Since (1/2^{n−1}) Tn(Xk) = ±1/2^{n−1}, and |qn(Xk)| < 1/2^{n−1} by assumption, r alternates in sign between these n + 1 points. But then by the intermediate value theorem, r has at least n zeros. This is a contradiction, since r is non-zero but has degree at most n − 1.
Corollary. Consider
ω∆(x) = ∏_{i=0}^n (x − xi) ∈ Pn+1[x]
for distinct points ∆ = {xi}_{i=0}^n ⊆ [−1, 1]. Then
min_∆ ‖ω∆‖∞ = 1/2^n.
This minimum is achieved by picking the interpolation points to be the zeros of Tn+1, namely
xk = cos((2k + 1)π/(2n + 2)),  k = 0, · · · , n.
Theorem. For f ∈ C n+1 [−1, 1], the Chebyshev choice of interpolation points
gives
‖f − pn‖∞ ≤ (1/2^n) (1/(n + 1)!) ‖f^(n+1)‖∞.
Suppose f has as many continuous derivatives as we want. Then as we
increase n, what happens to the error bounds? The coefficients involve dividing
by an exponential and a factorial. Hence as long as the higher derivatives of f
don’t blow up too badly, in general, the error will tend to zero as n → ∞, which
makes sense.
The last two results can be easily generalized to arbitrary intervals [a, b], and
this is left as an exercise for the reader.
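As a quick numerical illustration of this (my own sketch, not part of the notes), we can compare interpolation at equally spaced points with interpolation at the zeros of Tn+1 for Runge's function f(x) = 1/(1 + 25x²), whose higher derivatives grow rapidly; the helper lagrange_eval below is an assumption of mine.

import numpy as np

def lagrange_eval(nodes, values, xs):
    # Evaluate the interpolating polynomial at the points xs via the Lagrange formula.
    result = np.zeros_like(xs)
    for k, (xk, fk) in enumerate(zip(nodes, values)):
        others = np.delete(nodes, k)
        ell = np.prod([(xs - xi) / (xk - xi) for xi in others], axis=0)
        result += fk * ell
    return result

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)   # Runge's function
n = 20
xs = np.linspace(-1.0, 1.0, 2000)
equi = np.linspace(-1.0, 1.0, n + 1)
cheb = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * n + 2))   # zeros of T_{n+1}
for nodes in (equi, cheb):
    p = lagrange_eval(nodes, f(nodes), xs)
    print(np.max(np.abs(f(xs) - p)))
# The equally spaced nodes give a large maximum error; the Chebyshev nodes give a small one.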
2 Orthogonal polynomials
It turns out the Chebyshev polynomials are just an example of a more general
class of polynomials, known as orthogonal polynomials. As in linear algebra, we
can define a scalar product on the space of polynomials, and then find a basis
of orthogonal polynomials of the vector space under this scalar product. We
shall show that each set of orthogonal polynomials has a three-term recurrence
relation, just like the Chebyshev polynomials.
(ii) We can allow [a, b] to be infinite, e.g. [0, ∞) or even (−∞, ∞), but we have
to be more careful. We first define
⟨f, g⟩ = ∫_a^b w(x)f(x)g(x) dx
as before, but we now need more conditions. We require that ∫_a^b w(x) x^n dx exists for all n ≥ 0, since we want to allow polynomials in our vector space. For example, w(x) = e^{−x} on [0, ∞) works, or w(x) = e^{−x²} on (−∞, ∞). These are scalar products for Pn[x] for n ≥ 0, but we cannot
extend this definition to all smooth functions since they might blow up too
fast at infinity. We will not go into the technical details, since we are only
interested in polynomials, and knowing it works for polynomials suffices.
(iii) We can also have a discrete inner product, defined by
⟨f, g⟩ = ∑_{j=1}^m wj f(ξj) g(ξj)
with {ξj}_{j=1}^m distinct points and wj > 0 for each j. Now we have to restrict ourselves a lot. This is a scalar product for V = Pm−1[x], but not for higher degrees, since a scalar product should satisfy ⟨f, f⟩ > 0 for f ≠ 0.
In particular, we cannot extend this to all smooth functions.
With an inner product, we can define orthogonality.
Definition (Orthogonality). Given a vector space V and an inner product ⟨·, ·⟩, two vectors f, g ∈ V are orthogonal if ⟨f, g⟩ = 0.
⟨pn+1, pm⟩ = 0
To obtain uniqueness, assume pn+1, p̂n+1 ∈ Pn+1[x] are both monic orthogonal polynomials. Then r = pn+1 − p̂n+1 ∈ Pn[x]. Since pn+1 and p̂n+1 are both orthogonal to everything in Pn[x], we get
⟨r, r⟩ = ⟨r, pn+1⟩ − ⟨r, p̂n+1⟩ = 0.
So r = 0, i.e. pn+1 = p̂n+1.
Finally, we have to show that p0 , · · · , pn+1 form a basis for Pn+1 [x]. Now
note that every p ∈ Pn+1 [x] can be written uniquely as
p = cpn+1 + q,
where q ∈ Pn [x]. But {pk }nk=0 is a basis for Pn [x]. So q can be uniquely
decomposed as a linear combination of p0 , · · · , pn .
Alternatively, this follows from the fact that any set of orthogonal vectors
must be linearly independent, and since there are n + 2 of these vectors and
Pn+1 [x] has dimension n + 2, they must be a basis.
In practice, following the proof naively is not the best way of producing the
new pn+1 . Instead, we can reduce a lot of our work by making a clever choice of
qn+1 .
Theorem (Three-term recurrence relation). Monic orthogonal polynomials satisfy
pk+1(x) = (x − αk) pk(x) − βk pk−1(x)
for k ≥ 1, with the initial conditions
p0 = 1,  p1(x) = (x − α0) p0,
where
αk = ⟨xpk, pk⟩ / ⟨pk, pk⟩,  βk = ⟨pk, pk⟩ / ⟨pk−1, pk−1⟩.
Proof. By inspection, the p1 given is monic and satisfies
⟨p1, p0⟩ = 0.
We notice that ⟨pn, xpk⟩ vanishes whenever xpk has degree less than n. So we are left with
pn+1 = xpn − (⟨xpn, pn⟩ / ⟨pn, pn⟩) pn − (⟨pn, xpn−1⟩ / ⟨pn−1, pn−1⟩) pn−1
     = (x − αn) pn − (⟨pn, xpn−1⟩ / ⟨pn−1, pn−1⟩) pn−1.
Finally, since xpn−1 − pn ∈ Pn−1[x], we have ⟨pn, xpn−1⟩ = ⟨pn, pn⟩, so the last coefficient is exactly βn.
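For concreteness, here is a small Python sketch of the recurrence (my own, taking w ≡ 1 on [−1, 1], i.e. the Legendre case; the helper names are assumptions). It reproduces the monic Legendre polynomials 1, x, x² − 1/3, x³ − (3/5)x.

import numpy as np
from numpy.polynomial import Polynomial as P

def inner(p, q, a=-1.0, b=1.0):
    # <p, q> = integral of p(x) q(x) over [a, b] (weight w = 1), computed exactly.
    antideriv = (p * q).integ()
    return antideriv(b) - antideriv(a)

def monic_orthogonal(n):
    # First n+1 monic orthogonal polynomials via the three-term recurrence.
    x = P([0.0, 1.0])
    polys = [P([1.0])]
    if n >= 1:
        alpha0 = inner(x * polys[0], polys[0]) / inner(polys[0], polys[0])
        polys.append(x - alpha0)
    for k in range(1, n):
        pk, pk1 = polys[k], polys[k - 1]
        alpha = inner(x * pk, pk) / inner(pk, pk)
        beta = inner(pk, pk) / inner(pk1, pk1)
        polys.append((x - alpha) * pk - beta * pk1)
    return polys

for p in monic_orthogonal(3):
    print(p)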
2.4 Examples
The four famous examples are the Legendre polynomials, Chebyshev polynomials,
Laguerre polynomials and Hermite polynomials. We first look at how the
Chebyshev polynomials fit into this framework.
Chebyshev is based on the scalar product defined by
⟨f, g⟩ = ∫_{−1}^{1} (1/√(1 − x²)) f(x) g(x) dx.
Note that the weight function blows up mildly at the end, but this is fine since
it is still integrable.
This links up with
Tn (x) = cos(nθ)
for x = cos θ via the usual trigonometric substitution. We have
⟨Tn, Tm⟩ = ∫_0^π (1/√(1 − cos²θ)) cos(nθ) cos(mθ) sin θ dθ
         = ∫_0^π cos(nθ) cos(mθ) dθ
         = 0 if m ≠ n.
The other orthogonal polynomials come from scalar products of the form
⟨f, g⟩ = ∫_a^b w(x)f(x)g(x) dx,
2.5 Least-squares polynomial approximation
Given f, we seek the polynomial p ∈ Pn[x] minimizing
‖f − p‖² = ⟨f − p, f − p⟩.
Writing p = ∑_{k=0}^n ck pk in terms of the orthogonal polynomials pk, the solution turns out to be given by
ck = ⟨f, pk⟩ / ‖pk‖²,
Note that the solution decouples, in the sense that ck depends only on f and
pk . If we want to take one more term, we just need to compute an extra term,
and not redo our previous work.
Also, we notice that the formula for the error is a positive term kf k2 sub-
tracting a lot of squares. As we increase n, we subtract more squares, and the
error decreases. If we are lucky, the error tends to 0 as we take n → ∞. Even
though we might not know how many terms we need in order to get the error to
be sufficiently small, we can just keep adding terms until the computed error is small enough (which is something we have to do anyway even if we knew what
n to take).
Proof. We consider a general polynomial
p = ∑_{k=0}^n ck pk.
Substituting in and expanding, we get
⟨f − p, f − p⟩ = ⟨f, f⟩ − 2 ∑_{k=0}^n ck ⟨f, pk⟩ + ∑_{k=0}^n ck² ‖pk‖².
Note that there are no cross terms between the different coefficients. We minimize
this quadratic by setting the partial derivatives to zero:
0 = ∂/∂ck ⟨f − p, f − p⟩ = −2⟨f, pk⟩ + 2ck ‖pk‖².
To check this is indeed a minimum, note that the Hessian matrix is simply 2I,
which is positive definite. So this is really a minimum. So we get the formula for
the ck ’s as claimed, and putting the formula for ck gives the error formula.
Note that our constructed p ∈ Pn [x] has a nice property: for k ≤ n, we have
⟨f − p, pk⟩ = ⟨f, pk⟩ − ⟨p, pk⟩ = ⟨f, pk⟩ − (⟨f, pk⟩/‖pk‖²) ⟨pk, pk⟩ = 0.
Thus for all q ∈ Pn[x], we have
⟨f − p, q⟩ = 0.
In particular, this is true when q = p, and tells us ⟨f, p⟩ = ⟨p, p⟩. Using this to expand ⟨f − p, f − p⟩ gives
‖f − p‖² + ‖p‖² = ‖f‖².
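A minimal numerical sketch of the decoupled formula ck = ⟨f, pk⟩/‖pk‖² (my own illustration, not from the notes): here the inner product is the Legendre one on [−1, 1], the integrals are approximated by Gauss quadrature, and the first few monic Legendre polynomials are hard-coded.

import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(30)

def inner(u, v):
    # Approximate <u, v> = integral of u(x) v(x) over [-1, 1] by Gauss quadrature.
    return np.sum(weights * u(nodes) * v(nodes))

# Monic Legendre polynomials p0, ..., p3 (coefficients in increasing degree).
ps = [np.polynomial.Polynomial(c) for c in
      ([1.0], [0.0, 1.0], [-1.0/3.0, 0.0, 1.0], [0.0, -3.0/5.0, 0.0, 1.0])]

f = np.exp
cs = [inner(f, p) / inner(p, p) for p in ps]
approx = sum(c * p for c, p in zip(cs, ps))

err = lambda x: f(x) - approx(x)
print(np.sqrt(inner(err, err)))   # the least-squares error ||f - p||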
3 Approximation of linear functionals
Example. Commonly used linear functionals include the following.
(i) Evaluation at a fixed point ξ:
L(f) = f(ξ).
(ii) Evaluation of the derivative at a fixed point η:
L(f) = f′(η).
In this case, we need to pick a vector space in which this makes sense, e.g.
the space of continuously differentiable functions.
(iii) We can define
L(f) = ∫_a^b f(x) dx.
The set of continuous (or even just integrable) functions defined on [a, b]
will be a sensible domain for this linear functional.
(iv) Any linear combination of these linear functionals is also a linear functional. For example, we can pick some fixed α, β ∈ R, and define
L(f) = f(β) − f(α) − ((β − α)/2)[f′(β) + f′(α)].
We want to approximate L(f) by a sum of the form ∑_{i=0}^N ai f(xi). How can we choose the coefficients ai and the points xi so that our approximation is "good"?
We notice that most of our functionals can be easily evaluated exactly when
f is a polynomial. So we might approximate our function f by a polynomial,
and then do it exactly for polynomials.
More precisely, we let {xi}_{i=0}^N ⊆ [a, b] be arbitrary points. Then using the Lagrange cardinal polynomials ℓi, we have
f(x) ≈ ∑_{i=0}^N f(xi) ℓi(x).
So we can pick
ai = L(ℓi).
Similar to polynomial interpolation, this formula is exact for f ∈ PN[x]. But we could do better. If we can freely choose {ai}_{i=0}^N and {xi}_{i=0}^N, then since we now have 2N + 2 free parameters, we might expect to find an approximation that is exact for f ∈ P2N+1[x]. This is not always possible, but there are cases when we can. The most famous example is Gaussian quadrature.
Let
⟨f, g⟩ = ∫_a^b w(x)f(x)g(x) dx
be a scalar product for Pν[x]. We will show that we can find weights, written {bk}_{k=1}^ν, and nodes, written {ck}_{k=1}^ν ⊆ [a, b], such that the approximation
∫_a^b w(x)f(x) dx ≈ ∑_{k=1}^ν bk f(ck)
is exact for f ∈ P2ν−1[x]. The nodes {ck}_{k=1}^ν will turn out to be the zeros of the orthogonal polynomial pν with respect to the scalar product. The aim of this section is to work this out.
We start by showing that this is the best we can achieve.
Proposition. There is no choice of ν weights and nodes such that the approximation of ∫_a^b w(x)f(x) dx is exact for all f ∈ P2ν[x].
Proof. Define
q(x) = ∏_{k=1}^ν (x − ck) ∈ Pν[x].
Then we know
∫_a^b w(x) q²(x) dx > 0,
since q² ≥ 0 and is not identically zero. But q vanishes at every node ck, so ∑_{k=1}^ν bk q²(ck) = 0. Hence no such rule can be exact for q² ∈ P2ν[x].
For arbitrary nodes {ck} and weights {bk}, we call the approximation
L(f) = ∫_a^b w(x)f(x) dx ≈ ∑_{k=1}^ν bk f(ck)
an ordinary quadrature.
So there is at least one sign change in (a, b). We have already got the result we
need for ν = 1, since we only need one zero in (a, b).
Now for ν > 1, suppose {ξj}_{j=1}^m are the places where the sign of pν changes in (a, b) (which is a subset of the roots of pν). We define
q(x) = ∏_{j=1}^m (x − ξj) ∈ Pm[x].
Since this changes sign at the same place as pν , we know qpν maintains the same
sign in (a, b). Now if we had m < ν, then orthogonality gives
⟨q, pν⟩ = ∫_a^b w(x)q(x)pν(x) dx = 0,
which is impossible, since qpν does not change sign. Hence we must have
m = ν.
Theorem. In the ordinary quadrature, if we pick {ck}_{k=1}^ν to be the roots of pν(x), then we get exactness for f ∈ P2ν−1[x]. In addition, the weights {bk}_{k=1}^ν are all positive.
Proof. Let f ∈ P2ν−1[x]. Then by polynomial division, we get
f = qpν + r,
where q, r ∈ Pν−1[x]. By orthogonality, ∫_a^b w(x)q(x)pν(x) dx = 0, and since the ck are the roots of pν, we have f(ck) = r(ck) for each k. But r has degree at most ν − 1, and this formula is exact for polynomials in Pν−1[x]. Hence we know
∫_a^b w(x)f(x) dx = ∫_a^b w(x)r(x) dx = ∑_{k=1}^ν bk r(ck) = ∑_{k=1}^ν bk f(ck).
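As a quick sanity check (my own, not from the notes): NumPy provides the Gauss–Legendre nodes and weights, which correspond to the case w ≡ 1 on [−1, 1]; with ν nodes the rule should integrate polynomials of degree up to 2ν − 1 exactly.

import numpy as np

nu = 3
nodes, weights = np.polynomial.legendre.leggauss(nu)   # the c_k and b_k for w = 1

f = lambda x: x**5 + 2 * x**2            # degree 5 = 2*nu - 1
approx = np.sum(weights * f(nodes))
exact = 4.0 / 3.0                        # integral of x^5 + 2x^2 over [-1, 1]
print(approx, exact)                     # agree up to rounding error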
4 Expressing errors in terms of derivatives
Suppose we approximate a linear functional L by
L(f) ≈ ∑_{i=0}^n ai Li(f),
where Li are some simpler linear functionals, and suppose this approximation is exact for f ∈ Pk[x] for some k ≥ 0.
Hence we know the error
eL(f) = L(f) − ∑_{i=0}^n ai Li(f) = 0
whenever f ∈ Pk[x]. We say the error annihilates polynomials of degree at most k.
How can we use this property to generate formulae for the error and error
bounds? We first start with a rather simple example.
Example. Let L(f) = f(β), which we approximate by
L(f) ≈ f(α) + ((β − α)/2)(f′(β) + f′(α)),
where α ≠ β. This is clearly much easier to evaluate. The error is given by
eL(f) = f(β) − f(α) − ((β − α)/2)(f′(β) + f′(α)),
and this vanishes for f ∈ P2 [x].
How can we get a more useful error formula? We can’t just use the fact that
it annihilates polynomials of degree k. We need to introduce something beyond
this — the k + 1th derivative. We now assume f ∈ C k+1 [a, b].
Note that so far, everything we’ve done works if the interval is infinite, as long
as the weight function vanishes sufficiently quickly as we go far away. However,
for this little bit, we will need to require [a, b] to be finite, since we want to make
sure we can take the supremum of our functions.
We now seek an exact error formula in terms of f^(k+1), and bounds of the form
|eL(f)| ≤ cL ‖f^(k+1)‖∞
for some constant cL . Moreover, we want to make cL as small as possible. We
don’t want to give a constant of 10 million, while in reality we can just use 2.
Definition (Sharp error bound). The constant cL is said to be sharp if for any ε > 0, there is some fε ∈ C^{k+1}[a, b] such that
|eL(fε)| ≥ (cL − ε) ‖fε^(k+1)‖∞.
Hence we can find the constant cL for different choices of the norm. When computing cL, don't forget the factor of 1/k!.
By fiddling with functions a bit, we can show these bounds are indeed sharp.
Example. Consider our previous example where
eL(f) = f(β) − f(α) − ((β − α)/2)(f′(β) + f′(α)),
with exactness up to polynomials of degree 2. We wlog assume α < β. Computing the Peano kernel (here k = 2), we get
K(θ) = 0 for a ≤ θ ≤ α,
K(θ) = (α − θ)(β − θ) for α ≤ θ ≤ β,
K(θ) = 0 for β ≤ θ ≤ b.
Hence we know
eL(f) = (1/2) ∫_α^β (α − θ)(β − θ) f′′′(θ) dθ
can be achieved by x^{k+1}, since this has constant (k + 1)th derivative. Also, when K(θ) does not change sign, we can use the integral mean value theorem to get
eL(f) = (1/k!) ( ∫_a^b K(θ) dθ ) f^(k+1)(ξ)
for some ξ ∈ (a, b).
Finally, note that Peano's kernel theorem says if eL(f) = 0 for all f ∈ Pk[x], then we have
eL(f) = (1/k!) ∫_a^b K(θ) f^(k+1)(θ) dθ
for all f ∈ C^{k+1}[a, b].
But for any other fixed j = 0, · · · , k − 1, we also have eL(f) = 0 for all f ∈ Pj[x]. So we also know
eL(f) = (1/j!) ∫_a^b Kj(θ) f^(j+1)(θ) dθ
for all f ∈ C^{j+1}[a, b]. Note that we have a different kernel.
In general, this might not be a good idea, since we are throwing information
away. Yet, this can be helpful if we get some less smooth functions that don’t
have k + 1 derivatives.
5 Ordinary differential equations
Our goal is to solve the initial value problem
y′(t) = f(t, y(t)),  y(0) = y0,
for 0 ≤ t ≤ T.
The data we are provided is the function f : R×RN → RN , the ending time T > 0,
and the initial condition y0 ∈ R^N. What we seek is the function y : [0, T] → R^N.
When solving the differential equation numerically, our goal would be to
make our numerical solution as close to the true solution as possible. This makes
sense only if a “true” solution actually exists, and is unique. From IB Analysis
II, we know a unique solution to the ODE exists if f is Lipschitz.
Definition (Lipschitz function). A function f : R × R^N → R^N is Lipschitz with Lipschitz constant λ ≥ 0 if
‖f(t, x) − f(t, x̂)‖ ≤ λ‖x − x̂‖ for all t ∈ [0, T] and x, x̂ ∈ R^N.
It doesn't really matter which norm we pick; it will just change the value of λ. What is important is the existence of such a λ.
A special case is when λ = 0, i.e. f does not depend on x. In this case, this
is just an integration problem, and is usually easy. This is a convenient test case
— if our numerical approximation does not even work for these easy problems,
then it’s pretty useless.
Being Lipschitz is sufficient for existence and uniqueness of a solution to
the differential equation, and hence we can ask if our solution converges to
this unique solution. An extra assumption we will often make is that f can
be expanded in a Taylor series to as many degrees as we want, since this is
convenient for our analysis.
What exactly does a numerical solution to the ODE consist of? We first
choose a small time step h > 0, and then construct approximations
yn ≈ y(tn), n = 1, 2, · · · , where tn = nh,
A general one-step method computes yn+1 from yn alone via some rule
yn+1 = φh(tn, yn).
The simplest one-step method is Euler's method,
yn+1 = yn + hf(tn, yn).
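For concreteness, here is a minimal Python sketch of Euler's method (my own; the function name and test problem are arbitrary choices).

import numpy as np

def euler(f, y0, T, h):
    # Explicit Euler: y_{n+1} = y_n + h f(t_n, y_n) on [0, T].
    t, y = 0.0, np.array(y0, dtype=float)
    ts, ys = [t], [y.copy()]
    while t < T - 1e-12:
        y = y + h * f(t, y)
        t += h
        ts.append(t)
        ys.append(y.copy())
    return np.array(ts), np.array(ys)

# Example: y' = -y, y(0) = 1, whose exact solution is e^{-t}.
ts, ys = euler(lambda t, y: -y, [1.0], T=1.0, h=0.01)
print(ys[-1, 0], np.exp(-1.0))   # close for small h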
We want to show that this method “converges”. First of all, we need to make
precise the notion of “convergence”. The Lipschitz condition means there is a
unique solution to the differential equation. So we would want the numerical
solution to be able to approximate the actual solution to arbitrary accuracy as
long as we take a small enough h.
Definition (Convergence of numerical method). For each h > 0, we can produce
a sequence of discrete values yn for n = 0, · · · , [T /h], where [T /h] is the integer
part of T /h. A method converges if, as h → 0 and nh → t (hence n → ∞), we
get
yn → y(t),
where y is the true solution to the differential equation. Moreover, we require
the convergence to be uniform in t.
We now prove that Euler’s method converges. We will only do this properly for
Euler’s method, since the algebra quickly becomes tedious and incomprehensible.
However, the proof strategy is sufficiently general that it can be adapted to most
other methods.
Theorem (Convergence of Euler’s method).
(i) For all t ∈ [0, T], we have
lim_{h→0, nh→t} ‖yn − y(t)‖ = 0.
(ii) Let λ be the Lipschitz constant of f. Then there exists a c ≥ 0 such that
‖en‖ ≤ ch (e^{λT} − 1)/λ
for all 0 ≤ n ≤ [T/h], where en = yn − y(tn).
Note that the bound in the second part is uniform. So this immediately gives
the first part of the theorem.
Proof. There are two parts to proving this. We first look at the local truncation
error. This is the error we would get at each step assuming we got the previous
steps right. More precisely, we write
y(tn+1) = y(tn) + hf(tn, y(tn)) + Rn,
and Rn is the local truncation error. For Euler's method, it is easy to get Rn, since f(tn, y(tn)) = y′(tn), by definition. So this is just the Taylor series expansion of y. We can write Rn as the integral remainder of the Taylor series,
Rn = ∫_{tn}^{tn+1} (tn+1 − θ) y″(θ) dθ.
Finally, we have
(1 + hλ) ≤ e^{λh},
since 1 + λh gives the first two terms of the Taylor series of e^{λh}, and the other terms are positive. So
(1 + hλ)^n ≤ e^{λhn} ≤ e^{λT}.
So we obtain the bound
‖en‖ ≤ ch (e^{λT} − 1)/λ.
Then this tends to 0 as we take h → 0. So the method converges.
The θ-method is the one-parameter family of methods
yn+1 = yn + h[θ f(tn, yn) + (1 − θ) f(tn+1, yn+1)],
for fixed θ ∈ [0, 1]. If we put θ = 1, then we just get Euler's method. The other two most common choices of θ are θ = 0 (backward Euler) and θ = 1/2 (trapezoidal rule).
Note that for θ ≠ 1, we get an implicit method. This is since yn+1 doesn't
just appear simply on the left hand side of the equality. Our formula for yn+1
involves yn+1 itself! This means, in general, unlike the Euler method, we can’t
just write down the value of yn+1 given the value of yn . Instead, we have to
treat the formula as N (in general) non-linear equations, and solve them to find
yn+1 !
In the past, people did not like to use this, because they didn’t have computers,
or computers were too slow. It is tedious to have to solve these equations in
every step of the method. Nowadays, these are becoming more and more popular
because it is getting easier to solve equations, and θ-methods have some huge
theoretical advantages (which we do not have time to get into).
We now look at the error of the θ-method. We have
η = y(tn+1) − y(tn) − h[θ y′(tn) + (1 − θ) y′(tn+1)]
In general, an s-step (multi-step) method takes the form
∑_{ℓ=0}^s ρℓ yn+ℓ = h ∑_{ℓ=0}^s σℓ f(tn+ℓ, yn+ℓ)
for some constants ρℓ, σℓ. This formula is used to find the value of yn+s given the others.
One point to note is that we get the same method if we multiply all the
constants ρ` , σ` by a non-zero constant. By convention, we normalize this by
setting ρs = 1. Then we can alternatively write this as
yn+s = h ∑_{ℓ=0}^s σℓ f(tn+ℓ, yn+ℓ) − ∑_{ℓ=0}^{s−1} ρℓ yn+ℓ.
One can show that the method has order p ≥ 1 if and only if
∑_{ℓ=0}^s ρℓ = 0
and
∑_{ℓ=0}^s ρℓ ℓ^k = k ∑_{ℓ=0}^s σℓ ℓ^{k−1}
for k = 1, · · · , p, where 0⁰ is taken to be 1.
We now expand the e^{ℓx} in Taylor series about x = 0. This comes out as
∑_{ℓ=0}^s ρℓ + ∑_{k=1}^∞ (1/k!) ( ∑_{ℓ=0}^s ρℓ ℓ^k − k ∑_{ℓ=0}^s σℓ ℓ^{k−1} ) x^k.
We’ve sorted out the order of multi-step methods. The next thing to check
is convergence. This is where the difference between one-step and multi-step
methods come in. For one-step methods, we only needed the order to understand
convergence. It is a fact that a one step method converges whenever it has an
order p ≥ 1. For multi-step methods, we need an extra condition.
Definition (Root condition). We say ρ(w) satisfies the root condition if all its
zeros are bounded by 1 in size, i.e. all roots w satisfy |w| ≤ 1. Moreover any
zero with |w| = 1 must be simple.
We can imagine this as saying large roots are bad — they cannot get past 1,
and we cannot have too many with modulus 1.
We saw any sensible multi-step method must have ρ(1) = 0. So in particular,
1 must be a simple zero.
Theorem (Dahlquist equivalence theorem). A multi-step method is convergent
if and only if
(i) The order p is at least 1; and
(ii) The root condition holds.
The proof is too difficult to include in this course, or even the Part II version.
This is only done in Part III.
Example (AB2). Again consider the two-step Adams-Bashforth method. We have seen it has order p = 2 ≥ 1. So we need to check the root condition. Here ρ(w) = w² − w = w(w − 1), whose roots are 0 and 1, with 1 a simple root. So it satisfies the root condition.
Let’s now come up with a sensible strategy for constructing convergent s-step
methods:
(i) Choose a ρ so that ρ(1) = 0 and the root condition holds.
(ii) Choose σ to maximize the order, i.e.
σ(w) = ρ(w)/log w + O(|w − 1|^{s+1}) if implicit,
σ(w) = ρ(w)/log w + O(|w − 1|^s) if explicit.
We have the two different conditions since for implicit methods, we have
one more coefficient to fiddle with, so we can get a higher order.
Where does the 1/log w come from? We try to substitute w = e^x (noting that e^x − 1 ∼ x). Then the formula says
σ(e^x) = (1/x) ρ(e^x) + O(x^{s+1}) if implicit, or (1/x) ρ(e^x) + O(x^s) if explicit.
Rearranging gives
ρ(e^x) − xσ(e^x) = O(x^{s+2}) if implicit, and O(x^{s+1}) if explicit,
which is our order condition. So given any ρ, there is only one sensible way to
pick σ. So the key is in picking a good enough ρ.
The root condition is "best" satisfied if ρ(w) = w^{s−1}(w − 1), i.e. all but one of the roots are 0. Then we have
yn+s − yn+s−1 = h ∑_{ℓ=0}^s σℓ f(tn+ℓ, yn+ℓ),
Multiplying by σs w^s gives the desired result.
For this method to be convergent, we need to make sure it does satisfy the
root condition. It turns out the root condition is satisfied only for s ≤ 6. This is
not obvious by first sight, but we can certainly verify this manually.
The general ν-stage Runge-Kutta method takes the form
yn+1 = yn + h ∑_{ℓ=1}^ν bℓ kℓ,
where
kℓ = f( tn + cℓ h, yn + h ∑_{j=1}^ν aℓj kj )
for ℓ = 1, · · · , ν.
There are a lot of parameters we have to choose: the weights {bℓ}_{ℓ=1}^ν, the nodes {cℓ}_{ℓ=1}^ν, and the matrix A = (aℓj) ∈ R^{ν×ν}.
Note that in general, {k` }ν`=1 have to be solved for, since they are defined in
terms of one another. However, for certain choices of parameters, we can make
this an explicit method. This makes it easier to compute, but we would have
lost some accuracy and flexibility.
Unlike all the other methods we’ve seen so far, the parameters appear inside
f . They appear non-linearly inside the functions. This makes the method much
more complicated and difficult to analyse using Taylor series. Yet, once we
manage to do this properly, these have lots of nice properties. Unfortunately, we
will not have time to go into what these properties actually are.
Notice this is a one-step method. So once we get order p ≥ 1, we will have
convergence. So what conditions do we need for a decent order?
This is in general very complicated. However, we can quickly obtain some
necessary conditions. We can consider the case where f is a constant. Then kℓ is always that constant. So we must have
∑_{ℓ=1}^ν bℓ = 1.
While these are necessary conditions, they are not sufficient. We need other
conditions as well, which we shall get to later. It is a fact that the best possible
order of a ν-stage Runge-Kutta method is 2ν.
To describe a Runge-Kutta method, a standard notation is to put the
coefficients in the Butcher table:
c1 | a11 · · · a1ν
 ⋮ |  ⋮   ⋱   ⋮
cν | aν1 · · · aνν
---+-------------
   | b1  · · · bν
We sometimes more concisely write it as
c | A
--+----
  | b^T
This table allows for a general implicit method. Initially, explicit methods came
out first, since they are much easier to compute. In this case, the matrix A is
strictly lower triangular, i.e. aℓj = 0 whenever ℓ ≤ j.
Example. The most famous explicit Runge-Kutta method is the 4-stage 4th
order one, often called the classical Runge-Kutta method. The formula can be
given explicitly by
yn+1 = yn + (h/6)(k1 + 2k2 + 2k3 + k4),
where
k1 = f (xn , yn )
k2 = f(xn + h/2, yn + (h/2) k1)
k3 = f(xn + h/2, yn + (h/2) k2)
k4 = f (xn + h, yn + hk3 ) .
We see that this is an explicit method. We don’t need to solve any equations.
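The formulas above translate directly into code; here is a short sketch (the function name is mine).

import numpy as np

def rk4_step(f, t, y, h):
    # One step of the classical 4th-order Runge-Kutta method above.
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Example: one step of size h = 0.1 for y' = -y, y(0) = 1.
print(rk4_step(lambda t, y: -y, 0.0, np.array([1.0]), 0.1), np.exp(-0.1))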
Choosing the parameters for the Runge-Kutta method to maximize order
is hard. Consider the simplest case, the 2-stage explicit method. The general
formula is
yn+1 = yn + h(b1 k1 + b2 k2 ),
where
k1 = f (xn , yn )
k2 = f (xn + c2 h, yn + r2 hk1 ).
To analyse this, we insert the true solution into the method. First, we need to
insert the true solution of the ODE into the k’s. We get
k1 = y′(tn)
k2 = f(tn + c2h, y(tn) + c2h y′(tn))
   = y′(tn) + c2h [ ∂f/∂t(tn, y(tn)) + ∇f(tn, y(tn)) y′(tn) ] + O(h²)
Fortunately, we notice that the thing inside the huge brackets is just y″(tn). So this is
k2 = y′(tn) + c2h y″(tn) + O(h²).
Hence, the local truncation error of the Runge-Kutta method is
6 Stiff equations
6.1 Introduction
Initially, when people were developing numerical methods, they focused mostly on quantitative properties like order and accuracy, and developed many different methods, such as multi-step methods and Runge-Kutta methods.
More recently, people started to look at structural properties. Often, equations
come with some special properties. For example, a differential equation describing
the motion of a particle would most probably conserve energy. When we
approximate it numerically, we would like the numerical approximation to satisfy
conservation of energy as well. This is what recent developments are looking at —
we want to look at whether numerical methods preserve certain nice properties.
We are not going to look at conservation of energy — this is too complicated
for a first course. Instead, we look at the following problem. Suppose we have a
system of ODEs for 0 ≤ t ≤ T :
y′(t) = f(t, y(t))
y(0) = y0 .
Suppose T > 0 is arbitrary, and
lim y(t) = 0.
t→∞
[Figure: linear stability domains sketched in the complex plane, with the real axis marked at −1 and 1.]
Example (Trapezoidal rule). Again consider y′(t) = λy, with the trapezoidal rule. Then we can find
yn = ( (1 + hλ/2) / (1 − hλ/2) )^n y0.
So the linear stability domain is
D = { z ∈ C : |(2 + z)/(2 − z)| < 1 }.
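A quick numerical illustration (my own, not from the notes; it assumes the standard fact that forward Euler is only stable for real hλ in (−2, 0)): on a stiff linear problem the trapezoidal iterates decay, while the Euler iterates blow up for the same step size.

import numpy as np

lam, h, N = -50.0, 0.1, 40        # h*lam = -5 lies outside Euler's interval (-2, 0)
y_euler, y_trap = 1.0, 1.0
for _ in range(N):
    y_euler = (1 + h * lam) * y_euler                        # forward Euler
    y_trap = (1 + h * lam / 2) / (1 - h * lam / 2) * y_trap  # trapezoidal rule
print(y_euler)   # huge: |1 + h*lam| = 4 > 1, so the iterates grow
print(y_trap)    # tiny: |(2 + z)/(2 - z)| < 1 for Re z < 0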
Since |g| needs to have a maximum in the closure of Ω, the maximum must
occur on the boundary. So to show |g| ≤ 1 on the region Ω, we only need to
show the inequality holds on the boundary ∂Ω.
We try Ω = C− . The trick is to first check that g is analytic in Ω, and
then check what happens at the boundary. This technique is made clear in the
following example:
Example. Consider
r(z) = (6 − 2z)/(6 − 4z + z²).
This is still pretty simple, but can illustrate how we can use the maximum
principle.
We first check if it is analytic. This certainly has some poles, but they are 2 ± √2 i, and are in the right-half plane. So this is analytic in C−.
Next, what happens at the boundary of the left-half plane? Firstly, as
|z| → ∞, we find r(z) → 0, since we have a z 2 at the denominator. The next
part is checking when z is on the imaginary axis, say z = it with t ∈ R. Then
we can check by some messy algebra that
|r(it)| ≤ 1
for t ∈ R. Therefore, by the maximum principle, we must have |r(z)| ≤ 1 for all
z ∈ C− .
7 Implementation of ODE methods
Consider, for example, the backward Euler step yn+1 = yn + hf(tn+1, yn+1). There are two ways to solve this equation for yn+1. The simplest method is
functional iteration. As the name suggests, this method is iterative. So we use
superscripts to denote the iterates. In this case, we use the formula
yn+1^(k+1) = yn + hf(tn+1, yn+1^(k)).
To do this, we need to find a yn+1^(0). Usually, we start with yn+1^(0) = yn. Even better, we can use some simpler explicit method to obtain our first guess of yn+1^(0).
The question, of course, is whether this converges. Fortunately, this converges
to a locally unique solution if λh is sufficiently small, where λ is the Lipschitz
constant of f . For the backward Euler, we will require λh < 1. This relies on
the contraction mapping theorem, which you may have met in IB Analysis II.
Does this matter? Sometimes it does. Usually, we pick an h using accuracy
considerations, picking the largest possible h that still gives us the desired
accuracy. However, if we use this method, we might need to pick a much smaller
h in order for this to work. This will require us to compute much more iterations,
and can take a lot of time.
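A minimal sketch of one backward Euler step solved by functional iteration (my own; the function name, iteration cap and tolerance are arbitrary choices):

import numpy as np

def backward_euler_step(f, t_next, y_n, h, n_iter=50, tol=1e-12):
    # Solve y = y_n + h f(t_{n+1}, y) by functional iteration,
    # starting from the explicit Euler guess.
    y = y_n + h * f(t_next - h, y_n)
    for _ in range(n_iter):
        y_new = y_n + h * f(t_next, y)
        if np.max(np.abs(y_new - y)) < tol:
            return y_new
        y = y_new
    return y

# Example: y' = -5y, one step of size h = 0.1 from y(0) = 1.
# The exact backward Euler value is 1/(1 + 5h) = 2/3; here lambda*h = 0.5 < 1.
print(backward_euler_step(lambda t, y: -5.0 * y, 0.1, np.array([1.0]), 0.1))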
An alternative is Newton's method. This is given by the formula
(I − hJ^(k)) z^(k) = yn+1^(k) − (yn + hf(tn+1, yn+1^(k))),
yn+1^(k+1) = yn+1^(k) − z^(k),
where J^(k) is the Jacobian of f with respect to y, evaluated at (tn+1, yn+1^(k)).
This requires us to solve for z in the first equation, but this is a linear system,
which we have some efficient methods for solving.
There are several variants to Newton’s method. This is the full Newton’s
method, where we re-compute the Jacobian in every iteration. It is also possible
to just use the same Jacobian over and over again. There are some speed gains
in solving the equation, but then we will need more iterations before we can get
our yn+1 .
8 Numerical linear algebra
[Diagram: the shapes of a lower triangular matrix L and an upper triangular matrix U.]
Why are triangular matrices nice? First of all, it is easy to find the determinants
of triangular matrices. We have
det(L) = ∏_{i=1}^n Lii,  det(U) = ∏_{i=1}^n Uii.
There is nothing to solve in the first line. We can immediately write down the
value of x1 . Substituting this into the second line, we can then solve for x2 . In
general, we have
xi = (1/Lii) ( bi − ∑_{j=1}^{i−1} Lij xj ).
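In code, forward substitution is a short loop (a sketch of mine, assuming L is lower triangular and non-singular):

import numpy as np

def forward_substitution(L, b):
    # Solve L x = b for lower-triangular, non-singular L in O(n^2) operations.
    n = len(b)
    x = np.zeros(n)
    for i in range(n):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 1.0, 5.0]])
b = np.array([2.0, 5.0, 18.0])
x = forward_substitution(L, b)
print(np.allclose(L @ x, b))   # True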
8.2 LU factorization
In general, we don't always have triangular matrices. The idea is, for every matrix A, to find a lower triangular matrix L and an upper triangular matrix U such that A = LU.
Clearly, these rows and columns cannot be arbitrary, since L and U are triangular.
In particular, li , ui must be zero in the first i − 1 entries.
Suppose this is an LU factorization of A. Then we can write
A = LU = ∑_{i=1}^n li ui^T.
What do these matrices look like? For each i, we know li and ui have the first
i − 1 entries zero. So the first i − 1 rows and columns of li uTi are zero. In
particular, the first row and columns only have contributions from l1 uT1 , the
second row/column only has contributions from l2 uT2 and l1 uT1 etc.
The plan is as follows:
(i) Obtain l1 and u1 from the first row and column of A. Since the first entry of l1 is 1, u1^T is exactly the first row of A. We can then obtain l1 by taking the first column of A and dividing by U11 = A11.
(ii) Obtain l2 and u2 from the second row and column of A − l1 u1^T similarly.
(iii) · · ·
(iv) Obtain ln and un^T from the nth row and column of A − ∑_{i=1}^{n−1} li ui^T.
We can turn this into an algorithm. We define the intermediate matrices, starting
with
A(0) = A.
For k = 1, · · · , n, we let
Ukj = A^(k−1)_{kj},  j = k, · · · , n,
Lik = A^(k−1)_{ik} / A^(k−1)_{kk},  i = k, · · · , n,
A^(k)_{ij} = A^(k−1)_{ij} − Lik Ukj,  i, j ≥ k.
When k = n, we end up with a zero matrix, and then U and L are completely
filled.
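A direct transcription of this algorithm into Python might look as follows (my own sketch, with no pivoting, so it assumes the factorization exists):

import numpy as np

def lu_no_pivot(A):
    # Follows the algorithm above: at step k read off the k-th row of U and
    # column of L, then subtract the rank-one matrix l_k u_k^T.
    A = A.astype(float)
    n = A.shape[0]
    L, U = np.eye(n), np.zeros((n, n))
    for k in range(n):
        U[k, k:] = A[k, k:]
        L[k:, k] = A[k:, k] / A[k, k]
        A[k:, k:] -= np.outer(L[k:, k], U[k, k:])
    return L, U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
L, U = lu_no_pivot(A)
print(np.allclose(L @ U, A))   # True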
We can now see when this will break down. A sufficient condition for A = LU to exist is that A^(k−1)_{kk} ≠ 0 for all k. Since A^(k−1)_{kk} = Ukk, this sufficient condition ensures U, and hence A, is non-singular. Conversely, if A is non-singular and an LU factorization exists, then this would always work, since we must have A^(k−1)_{kk} = Ukk ≠ 0. Moreover, the LU factorization must be given by this algorithm. So we get uniqueness.
The problem with this sufficient condition is that most of these coefficients
do not appear in the matrix A. They are constructed during the algorithm. We
don’t know easily what they are in terms of the coefficients of A. We will later
come up with an equivalent condition on our original A that is easier to check.
Note that as long as this method does not break down, we need O(n3 )
operations to perform this factorization. Recall we only needed O(n2 ) operations
to solve the equation after factorization. So the bulk of the work in solving
Ax = b is in doing the LU factorization.
As before, this allows us to find the inverse of A if it is non-singular. In
particular, solving Axj = ej gives the jth column of A−1 . Note that we are
solving the system for the same A for each j. So we only have to perform the
LU factorization once, and then solve n different equations. So in total we need
O(n3 ) operations.
However, we still have the problem that factorization is not always possible.
Requiring that we must factorize A as LU is too restrictive. The idea is to factor
something closely related, but not exactly A. Instead, we want a factorization
P A = LU,
Again, if the kth column of A(k−1) is completely zero, we set lk = ek and uTk to
be the kth row of A(k−1) . But again this implies A and U will be singular.
However, as we do this, the permutation matrices appear all over the place
inside the algorithm. It is not immediately clear that we do get a factorization
of the form P A = LU . Fortunately, keeping track of the interchanges, we do
have an LU factorization
P A = L̃U,
where U is what we got from the algorithm,
P = Pn−1 · · · P2 P1 ,
while L̃ is given by
L̃ = (l̃1 · · · l̃n),  l̃k = Pn−1 · · · Pk+1 lk.
One problem we have not considered is the problem of inexact arithmetic. While
these formula are correct mathematically, when we actually implement things,
we do them on computers with finite precision. As we go through the algorithm,
errors will accumulate, and the error might be amplified to a significant amount when we reach the end. We want an algorithm that is insensitive to
errors. In order to work safely in inexact arithmetic, we will put the element of
largest modulus in the (k, k)th position, not just an arbitrary non-zero one, as
this minimizes the error when dividing.
Theorem. A sufficient condition for both the existence and uniqueness of the factorization A = LU is that det(Ak) ≠ 0 for k = 1, · · · , n − 1, where Ak denotes the leading k × k submatrix of A.
Note that we don't need A to be non-singular. This is equivalent to the restriction A^(k−1)_{kk} ≠ 0 for k = 1, · · · , n − 1. Also, this is a sufficient condition, not a necessary one.
Proof. Straightforward induction.
We extend this result a bit:
Theorem. If det(Ak) ≠ 0 for all k = 1, · · · , n, then A ∈ R^{n×n} has a unique factorization of the form
A = LDÛ,
where D is a non-singular diagonal matrix, and both L and Û are unit (lower, respectively upper) triangular.
Proof. From the previous theorem, A = LU exists. Since A is non-singular, U is non-singular. So we can write it as
U = DÛ,
where D is the diagonal part of U and Û = D^{−1}U is unit upper triangular, giving A = LDÛ. Uniqueness follows from the uniqueness of A = LU.
If in addition A is symmetric, the factorization takes the form
A = LDL^T,
since transposing gives
A = A^T = Û^T D L^T,
and uniqueness forces Û = L^T.
Definition (Positive definite matrix). A matrix A ∈ R^{n×n} is positive definite if
x^T Ax > 0
for all x ≠ 0 ∈ R^n.
Theorem. Let A ∈ R^{n×n} be a positive-definite matrix. Then det(Ak) ≠ 0 for all k = 1, · · · , n.
if y ≠ 0.
Now if A is positive definite, it has an LU factorization, and since A is
symmetric, we can write it as
A = LDLT ,
where L is unit lower triangular and D is diagonal. Now we have to show
Dkk > 0. We define yk such that LT yk = ek , which exist, since L is invertible.
Then clearly yk 6= 0. Then we have
Dkk = eTk Dek = ykT LDLT yk = ykT Ayk > 0.
So done.
This is a practical check for symmetric A being positive definite. We can
perform this LU factorization, and then check whether the diagonal has positive
entries.
Definition (Cholesky factorization). The Cholesky factorization of a symmetric
positive-definite matrix A is a factorization of the form
A = LDLT ,
with L unit lower triangular and D a positive-definite diagonal matrix.
There is another way of doing this. We let D1/2 be the “square root” of D,
by taking the positive square root of the diagonal entries of D. Then we have
A = LDLT = LD1/2 D1/2 LT = (LD1/2 )(LD1/2 )T = GGT ,
where G is lower triangular with Gkk > 0. This is another way of presenting this
result.
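For illustration (my own sketch, not from the notes), NumPy's np.linalg.cholesky returns the lower-triangular factor G directly, and the A = LDL^T form can be read off from it:

import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])          # symmetric positive definite
G = np.linalg.cholesky(A)                # lower triangular with positive diagonal
print(np.allclose(G @ G.T, A))           # True
print(np.all(np.diag(G) > 0))            # True

# Recover A = L D L^T: scale out the diagonal of G.
d = np.diag(G)
L, D = G / d, np.diag(d**2)
print(np.allclose(L @ D @ L.T, A))       # True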
Finally, we look at the LU factorization of band matrices.
Definition (Band matrix). A band matrix of band width r is a matrix A with Aij = 0 whenever |i − j| > r.
The result is
Proposition. If a band matrix A has band width r and an LU factorization
A = LU , then L and U are both band matrices of width r.
9 Linear least squares
We now consider the system
Ax = b,
where A ∈ R^{m×n} and b ∈ R^m with m ≥ n, so that in general there is no exact solution. The least-squares approach is to choose x∗ ∈ R^n to minimize ‖Ax − b‖², and such an x∗ is characterized by
A^T(Ax∗ − b) = 0.
To see this, expand
f(x) = ⟨Ax − b, Ax − b⟩ = x^T A^T Ax − 2x^T A^T b + b^T b.
Setting the gradient to zero, a necessary condition for a minimum is
A^T(Ax − b) = 0.
Now suppose our x∗ satisfies AT (Ax∗ − b) = 0. Then for all x ∈ Rn , we write
x = x∗ + y, and then we have
Hence x∗ can be computed by solving the linear system
(A^T A)x = A^T b.
Recall that an orthogonal matrix Q preserves the Euclidean norm, i.e.
‖Qx‖ = ‖x‖
for all x ∈ R^n. This is helpful since what we are trying to do in the least-squares
problem involves minimizing the Euclidean norm of something.
The idea is that for any orthogonal matrix Q, minimizing ‖Ax − b‖ is the same as minimizing ‖QAx − Qb‖. So which Q should we pick? Recall that a simple kind of matrices is triangular matrices. So we might want to get something, say, upper
triangular.
Definition (QR factorization). A QR factorization of an m × n matrix A is a
factorization of the form
A = QR,
where Q ∈ R^{m×m} is an orthogonal matrix, and R ∈ R^{m×n} is an "upper triangular" matrix, where "upper triangular" means
    [ R11 R12 · · · R1n ]
    [  0  R22 · · · R2n ]
    [  ⋮   ⋮   ⋱    ⋮  ]
R = [  0   0  · · · Rnn ]
    [  0   0  · · ·  0  ]
    [  ⋮   ⋮         ⋮  ]
    [  0   0  · · ·  0  ]
Gram-Schmidt factorization
This targets the skinny version of the factorization, in which Q keeps only n (orthonormal) columns and R is square. So we stop writing the tildes, and just write Q ∈ R^{m×n} and R ∈ R^{n×n}.
As we all know, the Gram-Schmidt process orthogonalizes vectors. What we
are going to orthogonalize are the columns of A. We write
A = (a1 · · · an),  Q = (q1 · · · qn).
By definition of the QR factorization, we need
aj = ∑_{i=1}^j Rij qi.
At step k we form
dk = ak − ∑_{i=1}^{k−1} Rik qi,  with Rik = qi^T ak.
If dk ≠ 0, then we set
qk = dk / ‖dk‖
and
Rkk = ‖dk‖.
In the case where dk = 0, we set Rkk = 0, and pick qk to be anything orthonormal to q1, · · · , qk−1.
In practice, a slightly different algorithm (modified Gram-Schmidt process)
is used, which is (much) superior with inexact arithmetic. The modified Gram-
Schmidt process is in fact the same algorithm, but performed in a different order
in order to minimize errors.
However, this is often not an ideal algorithm for large matrices, since there
are many divisions and normalizations involved in computing the qi , and the
accumulation of errors will cause the resulting matrix Q to lose orthogonality.
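A small sketch of the modified Gram-Schmidt process (my own; it assumes A has full column rank so that no dk vanishes):

import numpy as np

def modified_gram_schmidt(A):
    # Skinny QR of an m x n matrix A (m >= n, full column rank).
    A = A.astype(float)
    m, n = A.shape
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(A[:, k])
        Q[:, k] = A[:, k] / R[k, k]
        # Remove the q_k component from all remaining columns straight away;
        # mathematically the same as classical Gram-Schmidt, but the interleaved
        # subtractions behave better in floating point.
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ A[:, j]
            A[:, j] -= R[k, j] * Q[:, k]
    return Q, R

A = np.random.default_rng(0).normal(size=(6, 3))
Q, R = modified_gram_schmidt(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))   # True True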
Givens rotations
This works with the full QR factorization.
Recall that in R², a clockwise rotation of θ ∈ [−π, π] is performed by
[  cos θ  sin θ ] [α]   [  α cos θ + β sin θ ]
[ −sin θ  cos θ ] [β] = [ −α sin θ + β cos θ ].
By choosing θ such that
cos θ = α / √(α² + β²),  sin θ = β / √(α² + β²),
this then becomes
[ √(α² + β²) ]
[      0     ].
Of course, by choosing a slightly different θ, we can make the result zero in the first component and √(α² + β²) in the second.
Definition (Givens rotation). In R^m, where m > 2, we define the Givens rotation on three parameters 1 ≤ p < q ≤ m, θ ∈ [−π, π] to be the matrix Ω_θ^[p,q] ∈ R^{m×m} that agrees with the identity matrix except in the pth and qth rows and columns, where
(Ω_θ^[p,q])_{pp} = cos θ,   (Ω_θ^[p,q])_{pq} = sin θ,
(Ω_θ^[p,q])_{qp} = −sin θ,  (Ω_θ^[p,q])_{qq} = cos θ,
i.e. the sin and cos appear at the p, qth rows and columns.
Note that for y ∈ R^m, applying Ω_θ^[p,q] only alters the p and q components of y. In general, for B ∈ R^{m×n}, Ω_θ^[p,q] B only alters the p and q rows of B. Moreover, just like the R² case, given a particular z ∈ R^m, we can choose θ such that the qth component (Ω_θ^[p,q] z)_q = 0.
Hence, A ∈ R^{m×n} can be transformed into an "upper triangular" form by applying s = mn − n(n + 1)/2 Givens rotations, since we need to introduce s many zeros. Then
Qs · · · Q1 A = R.
How exactly do we do this? Instead of writing down a general algorithm, we
illustrate it with a matrix A ∈ R4×3 . The resultant R would look like
    [ R11 R12 R13 ]
R = [  0  R22 R23 ]
    [  0   0  R33 ]
    [  0   0   0  ]
Schematically, once the first column has been cleared, applying Ω_θ4^[2,3], then Ω_θ5^[3,4], then Ω_θ6^[3,4] introduces the remaining zeros below the diagonal one entry at a time.
Note that when applying, say, Ω_θ4^[2,3], the zeros of the first column get preserved, since Ω_θ4^[2,3] only mixes together things in rows 2 and 3, both of which are zero in the first column. So we are safe.
Note that this gives us something of the form
Qs · · · Q1 A = R.
Taking transposes, we can write
A = Q1^T · · · Qs^T R.
However, we don't really need to do this if we just want to use this to solve the least squares problem, since to do so, we need to multiply by Q^T, not Q, and Q^T is exactly Qs · · · Q1.
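Here is a rough Python sketch of the whole procedure (my own; for clarity it forms each rotation as a full matrix, which one would not do in practice):

import numpy as np

def givens_qr(A):
    # Reduce A to upper triangular form R by Givens rotations,
    # returning QT = Q_s ... Q_1 and R with QT @ A = R.
    A = A.astype(float)
    m, n = A.shape
    QT = np.eye(m)
    for q in range(n):                   # column to clear
        for p in range(m - 1, q, -1):    # zero the (p, q) entry against row q
            a, b = A[q, q], A[p, q]
            r = np.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            G = np.eye(m)
            G[q, q] = G[p, p] = c
            G[q, p], G[p, q] = s, -s
            A = G @ A
            QT = G @ QT
    return QT, A

A = np.random.default_rng(1).normal(size=(4, 3))
QT, R = givens_qr(A)
print(np.allclose(np.tril(R, -1), 0.0), np.allclose(QT.T @ R, A))   # True True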
Householder reflections
Definition (Householder reflection). For u ≠ 0 ∈ R^m, we define the Householder reflection by
Hu = I − 2 uu^T / (u^T u) ∈ R^{m×m}.
Note that this is symmetric, and we can see Hu² = I. So this is indeed
orthogonal.
To show this is a reflection, suppose we resolve x as the perpendicular and
parallel parts as x = αu + w ∈ Rm , where
α = u^T x / (u^T u),  u^T w = 0.
Then we have
Hu x = −αu + w.
So this is a reflection in the (m − 1)-dimensional hyperplane u^T y = 0.
What is the cost of computing Hu z? This is evaluated as
Hu z = z − 2 (u^T z / u^T u) u.
This only requires O(m) operations, which is nice.
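For example (a sketch of mine, anticipating the choice of u used below to clear a column):

import numpy as np

def apply_householder(u, z):
    # Compute H_u z = z - 2 (u^T z / u^T u) u without ever forming H_u.
    return z - 2.0 * (u @ z) / (u @ u) * u

a = np.array([3.0, 4.0, 0.0])
u = a - np.linalg.norm(a) * np.array([1.0, 0.0, 0.0])   # u = a - ||a|| e_1
print(apply_householder(u, a))    # [5, 0, 0] up to rounding: a is mapped onto ||a|| e_1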
Proposition. A matrix A ∈ Rm×n can be transformed into upper-triangular
form by applying n Householder reflections, namely
Hn · · · H1 A = R,
where each Hk introduces zeros into the kth column and leaves the previously created zeros alone. In general, this needs far fewer operations than the corresponding sequence of Givens rotations.
To show this, we first prove some lemmas.
Lemma. Let a, b ∈ R^m, with a ≠ b, but ‖a‖ = ‖b‖. Then if we pick u = a − b, then
Hu a = b.
This is obvious if we draw some pictures in low dimensions.
Proof. We just do it:
Hu a = a − (2(‖a‖² − a^T b) / (‖a‖² − 2a^T b + ‖b‖²)) (a − b) = a − (a − b) = b,
where we used the fact that ‖a‖ = ‖b‖.
We will keep applying the lemma. Since we want to get an upper triangular
matrix, it makes sense to pick b to have many zeroes, and the best option would
be to pick b to be a unit vector.
So we begin our algorithm: we want to clear the first column of A. We let a
be the first column of A, and assume a ∈ Rm is not already in the correct form,
i.e. a is not a multiple of e1 . Then we define
u = a ∓ ‖a‖ e1,  so that
H1 a = Hu a = ±‖a‖ e1.
Now we have
        [× × ×]
H1 A =  [0 × ×]
        [0 × ×]
        [0 × ×].
To do the next step, we need to be careful, since we don’t want to destroy the
previously created zeroes.
Lemma. If the first k − 1 components of u are zero, then
(i) For every x ∈ Rm , Hu x does not alter the first k − 1 components of x.
(ii) If the last (m − k + 1) components of y ∈ Rm are zero, then Hu y = y.
These are all obvious from definition. All these say is that reflections don’t
affect components perpendicular to u, and in particular fixes all vectors perpen-
dicular to u.
Lemma. Let a, b ∈ R^m, with
(ak, · · · , am)^T ≠ (bk, · · · , bm)^T,
but
∑_{j=k}^m aj² = ∑_{j=k}^m bj².
Suppose we pick
u = (0, 0, · · · , 0, ak − bk , · · · , am − bm )T .
Then we have
Hu a = (a1, · · · , ak−1, bk, · · · , bm)^T.
This is a generalization of what we’ve had before for k = 1. Again, the proof
is just straightforward verification.
Now we can mess with the second column of H1 A. We let a be the second
column of H1 A, and assume a3 , · · · , am are not all zero, i.e. (0, a2 , · · · , am )T is
not a multiple of e2 . We choose
u = (0, a2 ∓ γ, a3, · · · , am)^T,
where
γ = √( ∑_{j=2}^m aj² ).
Then
Hu a = (a1 , ±γ, 0, · · · ).
Also, by the previous lemma, this does not affect anything in the first column
and the first row. Now we have
            [× × ×]
H2 H1 A =   [0 × ×]
            [0 0 ×]
            [0 0 ×]
Suppose we have reached Hk−1 · · · H1 A, where the first k − 1 rows are of the correct form, i.e. the first k − 1 columns of Hk−1 · · · H1 A are zero below the diagonal.
We consider the kth column of Hk−1 · · · H1 A, say a. We assume (0, · · · , 0, ak, · · · , am)^T is not a multiple of ek. Choosing
u = (0, · · · , 0, ak ∓ γ, ak+1, · · · , am)^T,  γ = √( ∑_{j=k}^m aj² ),
we find that
Hu a = (a1, · · · , ak−1, ±γ, 0, · · · , 0)^T.
Now Hk · · · H1 A has its first k columns in the correct (upper triangular) form, and Hk did not alter the first k − 1 rows and columns of Hk−1 · · · H1 A.
There is one thing we have to decide on — which sign to pick. As mentioned,
these do not matter in pure mathematics, but with inexact arithmetic, we
should pick the sign in ak ∓ γ such that ak ∓ γ has maximum magnitude, i.e.
ak ∓ γ = ak + sgn(ak )γ. It takes some analysis to justify why this is the right
choice, but it is not too surprising that there is some choice that is better.
So how do Householder and Givens compare? We notice that the Givens
method generates zeros one entry at a time, while Householder does it column
by column. So in general, the Householder method is superior.
However, for certain matrices with special structures, we might need the
extra delicacy of introducing a zero one at a time. For example, if A is a band
matrix, then it might be beneficial to just remove the few non-zero entries one
at a time.