 Mathematics and Computing: Taught MSc M832 CN1

M832
APPROXIMATION THEORY AND METHODS
(PART 1)

Course Notes
(Chapters 1–8 and 10)

Prepared by

P. J. Rippon

Second edition

Copyright © 2007 The Open University SUP 01168 7
Contents
Introduction
Reading List
Chapter 1  The approximation problem and existence of best approximations
  Study Session 1: Approximation in a metric space
  Study Session 2: Approximation in a normed linear space
Chapter 2  The uniqueness of best approximation
  Study Session 1: Convexity
  Study Session 2: Best approximation operators
Chapter 3  Approximation operators and some approximating functions
  Study Session 1: ‘Good’ versus ‘best’ approximation
  Study Session 2: Types of approximating functions
Chapter 4  Polynomial interpolation
  Study Session 1: Polynomial interpolation
  Study Session 2: Chebyshev interpolation
Chapter 5  Divided differences
  Study Session 1: Basic properties of divided differences
  Study Session 2: Numerical considerations and Hermite interpolation
Chapter 6  The uniform convergence of polynomial approximations
  Study Session 1: Monotone operators
  Study Session 2: The Bernstein operator
Chapter 7  The theory of minimax approximation
  Study Session 1: The extreme values of the error function
  Study Session 2: Characterising best minimax approximations
Chapter 8  The exchange algorithm
  Study Session 1: Using the exchange algorithm
  Study Session 2: Matters relating to the exchange algorithm
Chapter 10  Rational approximation by the exchange algorithm
  Study Session 1: The exchange algorithm for rational approximation
  Study Session 2: Some convergence properties of the exchange algorithm
Introduction
The subject of Approximation Theory lies at the frontier between Applied
Mathematics and Pure Mathematics. Practical problems, such as the computer
calculation of special functions like $e^x$, lead naturally to theoretical problems,
such as ‘how well can we approximate by a given method?’ or ‘how fast does a
given algorithm converge?’.
Powell’s book Approximation Theory and Methods (hereafter referred to as
‘Powell’) provides an excellent introduction to these theoretical problems, covering
the basic theory of a wide range of approximation methods. Professor Powell is an
expert on both pure and applied approximation theory, and the book contains a
very detailed list of references to and discussion of the research literature.
This course is based on a treatment of fifteen chapters of Powell. Do not be
misled by the statement that this is an undergraduate textbook. Much of the
material can be taught at that level, but when looked at in detail many parts of it
are quite demanding. These course notes will guide you through the book telling
you which sections to read, explaining difficult parts, correcting errors (mercifully
few!) and setting SAQs and Problems to test your understanding of the material.
You should attempt all the SAQs and as many Problems as you have time for:
full solutions are given at the end of the notes for each chapter.
You will find the exercises in Powell quite varied. Many are routine, but others
are rather hard and some are very hard (particularly those which contain the
word ‘investigate’). I have resisted the temptation to attach ‘stars’ to harder
exercises and instead tried to provide ‘hints’, where appropriate. In general I feel
that, at this level, it is a good idea for you to try and make your own judgement
about the difficulty of a given problem.
Many of the exercises require the use of a good scientific calculator (one with
special functions, including hyperbolics, and a memory). Some require the
solution of non-linear equations of the form $f(x) = 0$ by using, for example:
the bisection method (finding an interval $[a,b]$ such that $f(a)$, $f(b)$ have opposite signs, testing $f(c)$, where $c = \frac12(a+b)$, and then repeating the process with either $[a,c]$ or $[c,b]$);
Newton's method (making a good initial guess $x_0$ at a solution and then calculating the sequence $x_n$ given by $x_{n+1} = x_n - f(x_n)/f'(x_n)$, $n = 0, 1, 2, \dots$).
Such methods can be implemented on a basic scientific calculator (especially if
only a rough answer is needed), but it will obviously save time if you have access
to a computer. You will not be expected to determine accurate solutions by such
methods in the examination. On the matter of accuracy, I have tended to present
calculations as they appeared on my own calculator, and have sometimes given
final answers correct to only three significant digits.
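If you do have access to a computer, both root-finding methods take only a few lines. Here is a minimal sketch (in Python, an arbitrary choice, since the course does not prescribe any particular language); the sample equation $\cosh x = 2x$ and the tolerance are placeholders for whatever an exercise produces.

    import math

    def bisect(f, a, b, tol=1e-10):
        # Assumes f(a) and f(b) have opposite signs.
        while b - a > tol:
            c = 0.5 * (a + b)
            if f(a) * f(c) <= 0:
                b = c          # root lies in [a, c]
            else:
                a = c          # root lies in [c, b]
        return 0.5 * (a + b)

    def newton(f, fprime, x0, steps=20):
        x = x0
        for _ in range(steps):
            x = x - f(x) / fprime(x)   # x_{n+1} = x_n - f(x_n)/f'(x_n)
        return x

    # Example: solve cosh(x) = 2x, i.e. f(x) = cosh(x) - 2x = 0.
    f = lambda x: math.cosh(x) - 2 * x
    fp = lambda x: math.sinh(x) - 2
    print(bisect(f, 0, 1), newton(f, fp, 0.5))

Both calls should agree to many digits (the root is near 0.589); in practice Newton's method needs far fewer iterations when the initial guess is good.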
In order to pace you through the course, there are four Tutor-Marked
Assignments (TMAs). These are compulsory in that you cannot pass the course
without obtaining a reasonable average grade on them. Your three best TMAs
carry 50% of the total marks for the course, the remaining 50% coming from the
three-hour examination at the end of the course. Please note that TMAs cannot
be accepted after their cut-off dates, other than in exceptional circumstances.
Although you should have plenty to do reading Powell and these course notes, I
have added a reading list after this introduction. This splits into books covering
the background material which is assumed in Powell (Linear Algebra, Metric
Spaces, etc.) and other books on Approximation Theory.

I should be grateful to receive any comments you may have on the course notes
and on the set book. The course notes have already benefited greatly from close
reading by Mick Bromilow here at the OU and Martin Stynes of University
College, Cork. Their help has been invaluable. Finally, I should like to thank all
those who have helped prepare these Course Notes, including members of the
Desktop Publishing Unit, Alison Cadle who edited them, and the many M832
students who have supplied corrections to earlier versions.

Phil Rippon
Milton Keynes, August 2006

Reading List

Background
W. Rudin, Principles of Mathematical Analysis, McGraw–Hill, 1976.
(A concise introduction to real analysis, including metric spaces,
integration and functions of several variables, as well as basic linear
algebra — available in a paperback International Student Edition.)
V. Bryant, Metric Spaces, Cambridge University Press, 1985.
(An introduction to metric spaces, emphasising the importance of
iteration. Plenty of explanation.)
D. Kreider, R. Kuller, D. Ostberg and F. Perkins, An Introduction to Linear
Analysis, Addison–Wesley, 1966.
(An introduction to the use of vector spaces of functions in solving
linear differential equations — lots of worked exercises.)
M203 Introduction to Pure Mathematics
MST204 Mathematical Models and Methods
M386 Metric and Topological Spaces (now part of M435)

Approximation Theory
T. J. Rivlin, An Introduction to the Approximation of Functions, Dover, 1981.
(Cheap and covers very similar material to Powell, with less on
splines and more on rational approximation.)
P. J. Davis, Interpolation and Approximation, Dover, 1976.
(Cheap, but a classic text which overlaps Powell considerably,
though with a much greater emphasis on complex approximation.)
D. Braess, Nonlinear Approximation Theory, Springer, 1986.
(Recent and sophisticated, this book examines the more difficult
non-linear theory which Powell largely avoids.)

Chapter 1 The approximation problem
and existence of best approximations
The book begins with a discussion of the types of problems which are to be solved
and several fundamental results. Powell assumes that the reader is quite familiar
with metric spaces and so the commentary below includes a short refresher course
on these, in case you are rusty on this subject.
This chapter splits into TWO study sessions:
Study session 1: Sections 1.1–1.2.
Study session 2: Sections 1.3–1.5.

Study Session 1: Approximation in a metric space

Read Sections 1.1 and 1.2

Commentary

1. Section 1.1 describes the three ingredients of an approximation problem:
(a) a function $f$ to be approximated, lying in some underlying (background) set $B$;
(b) a set of functions $A \subseteq B$ from which we wish to choose an approximation $g$ to $f$;
(c) a means of measuring how close together $g$ and $f$ are.
[Diagram: $f$ in $B$, with an approximation $g$ to be chosen from the subset $A$.]
In a continuous approximation problem $f$ is typically a real function, such as $f(x) = e^x$, and the set $A$ is a finite-dimensional vector space of real functions, such as the set $P_n$ of polynomials of degree at most $n$. One measure of how closely a function $g$ approximates to $f$ on an interval $[a,b]$ is
$$\max_{a\le x\le b}|g(x) - f(x)|.$$
In a discrete approximation problem $f$ is typically a vector of function values $(f(x_1), \dots, f(x_n))$, where $g$ belongs to some set of approximating functions. Note that $f$ and $g$ are used here to represent both a function and the corresponding vector of function values. One measure of how closely $g$ approximates to $f$ is
$$\max_{1\le i\le n}|g(x_i) - f(x_i)| \quad\text{or}\quad \max_{1\le i\le n}|g(x_i) - f_i|.$$

2. A metric space (B, d) is a set B and a metric (or distance function) d(a, b),
a, b ∈ B, such that for all a, b, c ∈ B:
(M1) d(a, b) ≥ 0, with equality if and only if a = b;
(M2) d(a, b) = d(b, a);
(M3) d(a, c) ≤ d(a, b) + d(b, c).
The most familiar metric spaces are $\mathbb{R}$ with the metric $d(a,b) = |a-b|$ and $\mathbb{R}^2$ with the metric
$$d(a,b) = \left((a_1-b_1)^2 + (a_2-b_2)^2\right)^{1/2}, \quad a = (a_1,a_2),\ b = (b_1,b_2).$$
[Diagram: the points $a = (a_1,a_2)$ and $b = (b_1,b_2)$ in the plane, with the distance $d(a,b)$ between them.]
This example explains why (M3) is called the triangle inequality.


More generally, $\mathbb{R}^n$ is a metric space with
$$d(a,b) = \left(\sum_{i=1}^{n}(a_i-b_i)^2\right)^{1/2}, \quad a = (a_1,\dots,a_n),\ b = (b_1,\dots,b_n). \tag{1}$$

For general n it is not quite so obvious that (M3) holds. The proof is given
later when we introduce a large family of metric spaces. Before that we recall
a number of definitions and results for future reference. No proofs are given
as these results are quite standard.
Convergence A sequence $a_n$, $n = 1, 2, \dots$, in $B$ is convergent with limit $a^*$ if $d(a_n, a^*) \to 0$ as $n \to \infty$.
Closed set A subset $F$ of $B$ is closed if every convergent sequence $a_n$, $n = 1, 2, \dots$, in $F$ has its limit in $F$. For example, the closed ball
$$\{b \in B : d(a,b) \le r\}, \quad r > 0,$$
is a closed set.
Open set A subset $E$ of $B$ is open if $B\setminus E$ is closed. For example, the open ball
$$\{b \in B : d(a,b) < r\}, \quad r > 0,$$
is an open set.
Compact set A subset $K$ of $B$ is compact if every sequence $a_n$, $n = 1, 2, \dots$, in $K$ has a convergent subsequence $a_{n_k}$, $k = 1, 2, \dots$, whose limit $a$ is in $K$. For example, every finite set is compact. In $\mathbb{R}^n$ with the metric $d$ given by equation (1), every closed set which is also bounded (i.e. lies inside some fixed closed ball) is compact. Note that every compact set is closed.

Continuous function A function $\phi : (B,d) \to (B',d')$ is continuous at $a \in B$ if for each $\varepsilon > 0$ there is a $\delta > 0$ such that
$$d(a,b) < \delta \implies d'(\phi(a), \phi(b)) < \varepsilon$$
(equivalently: for each sequence $a_n \to a$ in $B$, we have $\phi(a_n) \to \phi(a)$). We say that $\phi : (B,d) \to (B',d')$ is continuous if $\phi$ is continuous at each $a \in B$.
Uniformly continuous function A function $\phi : (B,d) \to (B',d')$ is uniformly continuous on $B$ if for each $\varepsilon > 0$ there is a $\delta > 0$ such that, for all $a, b \in B$,
$$d(a,b) < \delta \implies d'(\phi(a), \phi(b)) < \varepsilon.$$
Extreme Value Theorem If $\phi : (B,d) \to (\mathbb{R},d')$ is continuous (where $d'(a,b) = |a-b|$), then $\phi$ attains a maximum value and a minimum value on any compact subset $K$ of $B$.
Uniform Continuity Theorem If $\phi : (B,d) \to (B',d')$ is continuous, then $\phi$ is uniformly continuous on any compact subset $K$ of $B$.

3. Theorem 1.1. The proof can be shortened. You can omit the second sentence and the word ‘Otherwise’ from the third sentence, and then use the notation $a^*$ in place of $a^+$. Note that Powell uses ‘limitpoint’ to mean the limit of a convergent subsequence.
The following picture may be helpful.
[Diagram: a sequence $a_1, a_2, a_3, a_4, \dots$ in $A$ clustering at $a^*$, the point of $A$ closest to $f$ in $B$.]

4. The set $A$ discussed after Theorem 1.1 is not compact; for example, the sequence $\left(1 - \frac1n, 0\right)$ lies in $A$ but has no subsequence which converges to a limit in $A$. If $f = (2,0)$, say, then for each $a \in A$ we can find an $a' \in A$ which is closer to $f$ than $a$.

Self-assessment questions
S1 Consider the problem of fitting the data in Figure 1.2 by a straight line. Show that the set $A$ of vectors $(p(x_1), \dots, p(x_5))$, arising from functions $p(x) = c_0 + c_1 x$, forms a 2-dimensional subspace of $\mathbb{R}^5$.

S2 Prove the following generalisation of Theorem 1.1. If $A_1$ and $A_2$ are compact subsets of $B$, then there exist $a_1^*$ in $A_1$ and $a_2^*$ in $A_2$ such that
$$d(a_1^*, a_2^*) = \inf\{d(a_1, a_2) : a_1 \in A_1,\ a_2 \in A_2\}.$$

Study Session 2: Approximation in a normed linear
space

Read Sections 1.3, 1.4 and 1.5

Commentary
1. Almost every metric space in Powell arises as a normed linear space (n.l.s.). This is a linear space $B$ (also called a vector space) with an associated norm $\|a\|$, $a \in B$, such that, for all $a, b \in B$ and $\lambda \in \mathbb{R}$:
(N1) $\|a\| \ge 0$, with equality if and only if $a = 0$;
(N2) $\|\lambda a\| = |\lambda|\,\|a\|$;
(N3) $\|a + b\| \le \|a\| + \|b\|$.
Roughly speaking, the norm measures how large the element $a$ is, that is, how far $a$ lies from the zero element of the space.
By defining
$$d(a,b) = \|a - b\|,$$
we find that $(B,d)$ is a metric space. Properties (M1) and (M2) are immediate, as is (M3), since, by linearity and (N3),
$$\|a - c\| = \|(a-b) + (b-c)\| \le \|a-b\| + \|b-c\|.$$
For this reason, (N3) is also called the triangle inequality.
Powell gives some important examples of norms in Section 1.4. Two of these have useful geometric interpretations:
$$\|f\|_\infty = \max_{a\le x\le b}|f(x)| \quad\text{(the maximum vertical separation of the graph from the $x$-axis)},$$
$$\|f\|_1 = \int_a^b |f(x)|\,dx \quad\text{(the total area between the graph and the $x$-axis)}.$$
The corresponding metrics have similar geometric interpretations:
$$\|f-g\|_\infty = \max_{a\le x\le b}|f(x)-g(x)| \quad\text{(maximum vertical separation between the graphs)},$$
$$\|f-g\|_1 = \int_a^b |f(x)-g(x)|\,dx \quad\text{(total shaded area between the graphs)}.$$
The 2-norm
$$\|f\|_2 = \left(\int_a^b f(x)^2\,dx\right)^{1/2}$$
has no convenient geometric interpretation.


For each of these norms, properties (N1) and (N2) are evident, but (N3) requires some work. We give here the argument for the continuous 2-norm above. The proof is in two stages.
(I) Cauchy–Schwarz Inequality
$$\left|\int_a^b f(x)g(x)\,dx\right| \le \|f\|_2\,\|g\|_2, \quad\text{where } f, g \in C[a,b].$$
Proof If $\|f\|_2 = 0$ or $\|g\|_2 = 0$, then the result is clear. Otherwise, we use the inequality
$$\sqrt{AB} \le \frac{A+B}{2}, \quad\text{for } A, B \ge 0, \tag{2}$$
with $A = f(x)^2/\|f\|_2^2$ and $B = g(x)^2/\|g\|_2^2$. Integration gives
$$\frac{1}{\|f\|_2\,\|g\|_2}\int_a^b |f(x)g(x)|\,dx \le \frac{1}{2\|f\|_2^2}\int_a^b f(x)^2\,dx + \frac{1}{2\|g\|_2^2}\int_a^b g(x)^2\,dx = 1.$$
The desired inequality now follows from
$$\left|\int_a^b f(x)g(x)\,dx\right| \le \int_a^b |f(x)g(x)|\,dx.$$
Remark Another proof of the Cauchy–Schwarz inequality appears in the notes for Chapter 2.
(II) Minkowski's Inequality
$$\|f+g\|_2 \le \|f\|_2 + \|g\|_2, \quad\text{where } f, g \in C[a,b].$$
Proof If $\|f+g\|_2 = 0$, then the result is clear. Otherwise, note that
$$\|f+g\|_2^2 = \int_a^b (f(x)+g(x))^2\,dx \le \int_a^b |f(x)+g(x)|\,|f(x)|\,dx + \int_a^b |f(x)+g(x)|\,|g(x)|\,dx \le \|f+g\|_2\,\|f\|_2 + \|f+g\|_2\,\|g\|_2,$$
by the Cauchy–Schwarz inequality. The desired inequality now follows on dividing by $\|f+g\|_2$.
As you should have realised, Minkowski's inequality is just (N3) for the 2-norm $\|f\|_2$. The proof of (N3) for the discrete 2-norm is similar, proceeding via the discrete version of the Cauchy–Schwarz inequality:
$$\left|\sum_{i=1}^{m} a_i b_i\right| \le \left(\sum_{i=1}^{m} a_i^2\right)^{1/2}\left(\sum_{i=1}^{m} b_i^2\right)^{1/2}.$$
The metric on $\mathbb{R}^n$ which arises from the discrete 2-norm is precisely that defined in equation (1) of the commentary for Study Session 1.


The argument to prove property (N3) for
$$\|f\|_p = \left(\int_a^b |f(x)|^p\,dx\right)^{1/p},$$
if $1 < p < \infty$, is similar to the case $p = 2$. Instead of the Cauchy–Schwarz inequality, we need the more general Hölder inequality
$$\left|\int_a^b f(x)g(x)\,dx\right| \le \|f\|_p\,\|g\|_q,$$
where $p^{-1} + q^{-1} = 1$, whose proof is based on the inequality
$$A^{1/p}B^{1/q} \le \frac{A}{p} + \frac{B}{q}, \quad A, B \ge 0.$$
Since Powell uses only the 1-norm, 2-norm and ∞-norm, we omit the details.

2. Theorem 1.2 is of fundamental importance, and the proof looks straightforward (note the use of the ‘backwards’ form of the triangle inequality $\|a - f\| \ge \|a\| - \|f\|$ in (1.15), which is equivalent to $\|a\| \le \|a-f\| + \|f\|$). However, the second sentence is not quite so transparent as it may appear. A closed ball in $\mathbb{R}^n$ is certainly compact, but this does not immediately imply that a closed ball in a finite-dimensional subspace of an n.l.s. is compact. The proof of this fact is a little tricky but it illustrates how ‘analysis’ and ‘linear algebra’ interact in this subject.
We prove that if $A$ is a finite-dimensional subspace of an n.l.s. and $M \ge 0$, then
$$A_M = \{a \in A : \|a\| \le M\}$$
is compact, by showing that any sequence $a_m$, $m = 1, 2, \dots$, in $A_M$ has a convergent subsequence, whose limit must be in $A_M$ (since $A_M$ is closed).
[Diagram: $A_M$ is the intersection of the subspace $A$ with the ball $\{a \in B : \|a\| \le M\}$.]
Let $b_1, \dots, b_n$ be a basis for $A$ and write each $a_m$ in the form
$$a_m = \lambda_{1m}b_1 + \dots + \lambda_{nm}b_n, \quad \lambda_{1m}, \dots, \lambda_{nm} \in \mathbb{R}.$$
The result follows if we can show that the sequence $\lambda_m = (\lambda_{1m}, \dots, \lambda_{nm})$ of coefficient vectors has a convergent subsequence in $\mathbb{R}^n$, since the function $\phi : \mathbb{R}^n \to A$ given by
$$\phi(\lambda) = \lambda_1 b_1 + \dots + \lambda_n b_n, \quad (\lambda_1, \dots, \lambda_n) \in \mathbb{R}^n,$$
is continuous.
The normalised coefficient vectors $\mu_m = \lambda_m/\|\lambda_m\|_2$ lie on the unit sphere $\{\mu \in \mathbb{R}^n : \|\mu\|_2 = 1\}$ and so have a convergent subsequence $\mu_{m_k}$ with limit $\mu^*$, say, where $\|\mu^*\|_2 = 1$. By the continuity of $\phi$, once again,
$$\mu_1^* b_1 + \dots + \mu_n^* b_n = \lim_{k\to\infty}\left(\mu_{1m_k}b_1 + \dots + \mu_{nm_k}b_n\right) = \lim_{k\to\infty} a_{m_k}/\|\lambda_{m_k}\|_2.$$
Hence $\|\lambda_{m_k}\|_2 \not\to \infty$ as $k \to \infty$ (for otherwise, since $\|a_{m_k}\| \le M$, the above limit is 0, which contradicts the linear independence of $b_1, \dots, b_n$). We may assume, therefore (by taking a further subsequence), that $\|\lambda_{m_k}\|_2$ is convergent, and deduce that $\lambda_{m_k} = \|\lambda_{m_k}\|_2\,\mu_{m_k}$ is convergent, as required.

3. In the proof of Theorem 1.3, the Cauchy–Schwarz inequality is applied with $f = |e|$ and $g = 1$.

Self-assessment questions
S3 Prove that the function $\phi$ in the above proof is continuous.

S4 Prove that $\|f\|_1$, $f \in C[a,b]$, satisfies (N3).

S5 Prove that $\|f\|_\infty$, $f \in C[a,b]$, satisfies (N3).

S6 Give an alternative proof of Theorem 1.2 by considering
$$A_0 = \{a \in A : \|a - f\| \le \|f\|\}.$$

S7 Verify equations (1.24), (1.25) and (1.26).

S8 Let $A$ and $f$ be as in Figure 1.4. Determine $\inf_{a\in A}\|f - a\|$ for the 1-norm, the 2-norm and the ∞-norm.

Problems for Chapter 1


P1 Use first principles to find:
(a) the best approximations to f (x) = ex on [0, 1] by a constant, in L1 , L2
and L∞ ;
(b) the best approximation to f (x) = x2 on [0, 1] by a linear function
p(x) = ax, in L∞ .

P2 Powell Exercise 1.5

P3 Powell Exercise 1.6

P4 Powell Exercise 1.7

P5 Powell Exercise 1.1 (Hint: choose a suitable compact subset of A1 and apply
SAQ S2.)

Solutions to SAQs in Chapter 1


S1 Since
$$p(x_i) = c_0 + c_1 x_i, \quad i = 1, 2, \dots, 5,$$
we have
$$(p(x_1), p(x_2), p(x_3), p(x_4), p(x_5)) = c_0(1,1,1,1,1) + c_1(x_1, x_2, x_3, x_4, x_5).$$
Now $(1,1,1,1,1)$ and $(x_1, x_2, x_3, x_4, x_5)$ are fixed vectors in $\mathbb{R}^5$, which are not linearly dependent, and $c_0$, $c_1$ can take any real values. Hence the set of vectors $(p(x_1), p(x_2), p(x_3), p(x_4), p(x_5))$ forms a 2-dimensional subspace of $\mathbb{R}^5$.

S2 Let $d^* = \inf\{d(a_1, a_2) : a_1 \in A_1,\ a_2 \in A_2\}$ and choose sequences $a_{1n} \in A_1$, $a_{2n} \in A_2$, $n = 1, 2, \dots$, such that
$$\lim_{n\to\infty} d(a_{1n}, a_{2n}) = d^*.$$
By the compactness of $A_1$ and $A_2$, we can choose common subsequences $a_{1n_k}$, $a_{2n_k}$, $k = 1, 2, \dots$, such that
$$\lim_{k\to\infty} a_{1n_k} = a_1^* \quad\text{and}\quad \lim_{k\to\infty} a_{2n_k} = a_2^*,$$
with $a_1^* \in A_1$ and $a_2^* \in A_2$ (first choose a convergent subsequence $a_{1n_k}$ of $a_{1n}$ and then, if necessary, a subsequence of $a_{2n_k}$).
Now, by the triangle inequality,
$$d(a_1^*, a_2^*) \le d(a_1^*, a_{1n_k}) + d(a_{1n_k}, a_{2n_k}) + d(a_{2n_k}, a_2^*).$$
Letting $k \to \infty$, we deduce that $d(a_1^*, a_2^*) \le d^*$, so that $d(a_1^*, a_2^*) = d^*$, as required.

S3 If $\lambda = (\lambda_1, \dots, \lambda_n)$, $\mu = (\mu_1, \dots, \mu_n) \in \mathbb{R}^n$, then
$$\phi(\lambda) - \phi(\mu) = (\lambda_1 - \mu_1)b_1 + \dots + (\lambda_n - \mu_n)b_n.$$
Hence, since $|\lambda_i - \mu_i| \le \|\lambda - \mu\|_2$ for each $i$,
$$\|\phi(\lambda) - \phi(\mu)\| \le |\lambda_1 - \mu_1|\,\|b_1\| + \dots + |\lambda_n - \mu_n|\,\|b_n\| \le \|\lambda - \mu\|_2\, n\max_{1\le i\le n}\|b_i\| = K\|\lambda - \mu\|_2,$$
say. Hence
$$\|\lambda - \mu\|_2 < \varepsilon/K \implies \|\phi(\lambda) - \phi(\mu)\| < \varepsilon,$$
which proves that $\phi$ is (uniformly) continuous on $\mathbb{R}^n$.

S4 Let $f, g \in C[a,b]$. Since
$$|f(x) + g(x)| \le |f(x)| + |g(x)|,$$
for all $x \in [a,b]$, we deduce that
$$\int_a^b |f(x)+g(x)|\,dx \le \int_a^b |f(x)|\,dx + \int_a^b |g(x)|\,dx,$$
that is,
$$\|f+g\|_1 \le \|f\|_1 + \|g\|_1,$$
as required.

S5 Let $f, g \in C[a,b]$ and suppose that
$$\max_{a\le x\le b}|f(x)+g(x)| = |f(c)+g(c)|,$$
where $c \in [a,b]$ (the maximum is attained because the function $x \mapsto |f(x)+g(x)|$ is continuous on $[a,b]$). Then
$$|f(c)+g(c)| \le |f(c)| + |g(c)| \le \max_{a\le x\le b}|f(x)| + \max_{a\le x\le b}|g(x)|,$$
that is,
$$\|f+g\|_\infty \le \|f\|_\infty + \|g\|_\infty,$$
as required.

S6 The set $A_0$ is compact, being the intersection of a closed ball in $B$ with $A$ and hence a closed subset of a compact set. Thus we can, by Theorem 1.1, choose $a^* \in A_0$ such that
$$\|a - f\| \ge \|a^* - f\|, \quad a \in A_0.$$
To see that
$$\|a - f\| \ge \|a^* - f\|, \quad a \in A,$$
note that if $a \in A\setminus A_0$, then
$$\|a - f\| > \|0 - f\| \ge \|a^* - f\|,$$
since $0 \in A_0$.
S7 Since $\lambda > 0$, $1 - x^\lambda \ge 0$ for $0 \le x \le 1$, so that
$$\|e\|_1 = \int_0^1 (1 - x^\lambda)\,dx = \left[x - \frac{x^{\lambda+1}}{\lambda+1}\right]_0^1 = \frac{\lambda}{\lambda+1},$$
$$\|e\|_2^2 = \int_0^1 (1-x^\lambda)^2\,dx = \int_0^1 \left(1 - 2x^\lambda + x^{2\lambda}\right)dx = \left[x - \frac{2x^{\lambda+1}}{\lambda+1} + \frac{x^{2\lambda+1}}{2\lambda+1}\right]_0^1 = \frac{2\lambda^2}{(\lambda+1)(2\lambda+1)},$$
$$\|e\|_\infty = \max_{0\le x\le 1}|1 - x^\lambda| = 1.$$

S8 $\inf_{a\in A}\|f-a\|_1 = 1$; $\inf_{a\in A}\|f-a\|_2 = 1/\sqrt{2}$; $\inf_{a\in A}\|f-a\|_\infty = \frac12$.

Solutions to Problems in Chapter 1


P1 (a) Let $p(x) = c$ be a constant approximation to $f(x) = e^x$. It is clear that the minimum of $\|f-p\|_1$ occurs when $1 \le c \le e$. Since $e^x - c = 0$ when $x = \log c$, with $e^x - c < 0$ for $x < \log c$, we have
$$\|f-p\|_1 = \int_0^1 |e^x - c|\,dx = \int_0^{\log c}(c - e^x)\,dx + \int_{\log c}^1 (e^x - c)\,dx = 2c\log c - 3c + e + 1.$$
The minimum of this expression occurs when
$$2(\log c + 1) - 3 = 0 \implies c = \sqrt{e}.$$
Hence $p(x) = \sqrt{e}$ is the best $L_1$ approximation, with $\|f-p\|_1 = (\sqrt{e} - 1)^2$.
To find the best approximation in $L_2$ we minimise
$$\|f-p\|_2^2 = \int_0^1 (e^x - c)^2\,dx = c^2 - 2c(e-1) + \frac{e^2-1}{2} = (c - (e-1))^2 + \frac{e^2-1}{2} - (e-1)^2.$$
Evidently the minimum occurs for $c = e-1$, and so $p(x) = e-1$ is the best $L_2$ approximation, with $\|f-p\|_2 = \sqrt{\tfrac12(3-e)(e-1)}$.
The best $L_\infty$ approximation again occurs when $1 \le c \le e$, and the maximum error occurs at the ends of the interval, so that
$$\max_{x\in[0,1]}|e^x - c| = c - 1 = e - c \implies c = \tfrac12(e+1).$$
Hence $p(x) = \tfrac12(e+1)$ is the best $L_\infty$ approximation, with $\|f-p\|_\infty = \tfrac12(e-1)$.
(b) The best $L_\infty$ approximation to $f(x) = x^2$ on $[0,1]$ by $p(x) = ax$ occurs when $0 < a < 1$. For a given value of $a$, $0 < a < 1$, there are two candidates for the point $x$ which maximises $|x^2 - ax|$, namely the point $x = 1$ and the point where $x^2 - ax$ is at a minimum, which satisfies
$$2x - a = 0 \implies x = a/2.$$
Now if $x = 1$, then
$$|x^2 - ax| = |1 - a| = 1 - a,$$
and if $x = \tfrac12 a$, then
$$|x^2 - ax| = \left|\tfrac12 a\left(\tfrac12 a - a\right)\right| = \tfrac14 a^2.$$
Since $1 - a$ is decreasing and $\tfrac14 a^2$ is increasing, we deduce that the best $L_\infty$ approximation occurs when
$$1 - a = \frac{a^2}{4} \implies a = 2(\sqrt{2} - 1) \approx 0.8284.$$
Hence the best $L_\infty$ approximation is $p(x) = 2(\sqrt{2} - 1)x$, with $\|f-p\|_\infty = 3 - 2\sqrt{2}$.
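The three best constants in part (a) are easy to check numerically. The sketch below (Python, using a crude grid search and Riemann sums; all grid sizes are arbitrary choices, not part of the course) recovers each claimed optimum to about three decimal places.

    import math

    xs = [i / 1000 for i in range(1001)]          # grid on [0, 1]
    fvals = [math.exp(x) for x in xs]

    def err(c, p):
        # Discretised L_p error of the constant c (p = 1, 2 or 'inf').
        d = [abs(fx - c) for fx in fvals]
        if p == 'inf':
            return max(d)
        return (sum(v ** p for v in d) / 1000) ** (1 / p)

    for p, claimed in [(1, math.sqrt(math.e)),
                       (2, math.e - 1),
                       ('inf', (math.e + 1) / 2)]:
        best = min((err(c / 1000, p), c / 1000) for c in range(1000, 2719))
        print(p, claimed, best[1])   # grid optimum should be close to the claim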

P2 Here $A$ is the set of continuous piecewise-linear functions on $[a,b]$. Given $f \in C[a,b]$ and $\varepsilon > 0$ we must construct such a piecewise-linear function $g$, with
$$\|f - g\|_\infty < \varepsilon.$$
(Note that the letter $a$ has two meanings in the question.)
Since $f$ must be uniformly continuous on $[a,b]$, there exists $\delta > 0$ such that
$$|x - y| < \delta \implies |f(x) - f(y)| < \varepsilon. \tag{3}$$
Now choose $n > (b-a)/\delta$ and define $x_k = a + (b-a)(k/n)$, for $k = 0, 1, \dots, n$. Notice that $x_{k+1} - x_k = (b-a)/n < \delta$.
Next define $g(x_k) = f(x_k)$, for $k = 0, 1, \dots, n$, and extend the function $g$ linearly to each interval $[x_k, x_{k+1}]$ by the formula
$$g(x) = \frac{(x_{k+1} - x)f(x_k) + (x - x_k)f(x_{k+1})}{x_{k+1} - x_k}.$$
[Diagram: the graph $y = f(x)$ and its piecewise-linear interpolant $y = g(x)$ on $a = x_0 < x_1 < \dots < x_n = b$.]
We claim that $\|f - g\|_\infty < \varepsilon$, that is,
$$|f(x) - g(x)| < \varepsilon, \quad x_k \le x \le x_{k+1},\ k = 0, 1, \dots, n-1.$$
This holds because
$$|f(x) - f(x_k)| < \varepsilon \quad\text{and}\quad |f(x) - f(x_{k+1})| < \varepsilon, \quad x_k \le x \le x_{k+1},$$
by condition (3), and the values of $g(x)$, $x_k \le x \le x_{k+1}$, lie between $f(x_k)$ and $f(x_{k+1})$. Thus if $f \in C[a,b]$ but $f \notin A$, then there is no best approximation to $f$ from $A$ because for any $\varepsilon > 0$ there exists $g \in A$ such that $\|f - g\|_\infty < \varepsilon$. This example shows that Theorem 1.2 is false if we drop the hypothesis that $A$ be finite-dimensional.
P3 Let us take $[a,b] = [0,1]$ for simplicity. The example can always be adapted to $[a,b]$ by a translation.
Consider first the example $e(x) = 1 - x^\lambda$, $0 \le x \le 1$. Equations (1.24) and (1.25) give
$$\frac{\|e\|_2}{\|e\|_1} = \sqrt{\frac{2\lambda+2}{2\lambda+1}},$$
which shows that
$$1 \le \frac{\|e\|_2}{\|e\|_1} \le \sqrt{2}, \quad 0 \le \lambda < \infty.$$
However, if we allow negative values of $\lambda$ then $\|e\|_2/\|e\|_1$ becomes unbounded as $\lambda$ tends to $-\frac12$ from above. Of course, $x^\lambda$ is not continuous on $[0,1]$ for negative values of $\lambda$, but this observation suggests a possible ‘shape’ for our example.
Consider instead the continuous function
$$f_\varepsilon(x) = \begin{cases} 1 - x/\varepsilon, & 0 \le x \le \varepsilon,\\ 0, & \varepsilon < x \le 1,\end{cases}$$
where $0 < \varepsilon < 1$. We have
$$\|f_\varepsilon\|_1 = \int_0^\varepsilon (1 - x/\varepsilon)\,dx = \left[x - \frac{x^2}{2\varepsilon}\right]_0^\varepsilon = \varepsilon/2,$$
$$\|f_\varepsilon\|_2^2 = \int_0^\varepsilon (1 - x/\varepsilon)^2\,dx = \left[x - \frac{x^2}{\varepsilon} + \frac{x^3}{3\varepsilon^2}\right]_0^\varepsilon = \varepsilon/3.$$
Hence $\|f_\varepsilon\|_2/\|f_\varepsilon\|_1 = 2/\sqrt{3\varepsilon} \to \infty$ as $\varepsilon \to 0$.

P4 (i) The unit ball for the 1-norm in $\mathbb{R}^3$ is a regular octahedron centred at the origin. The part of its boundary in the first octant has equation $x + y + z = 1$. Thus, as $r$ increases, $\{a : \|a\|_1 \le r\}$ first meets $3x + 2y + z = 6$ at the point $(2, 0, 0)$, which is the closest point to the origin with respect to the 1-norm.
(ii) The unit ball for the 2-norm in $\mathbb{R}^3$ is the ordinary ball centred at the origin. As $r$ increases, $\{a : \|a\|_2 \le r\}$ first meets $3x + 2y + z = 6$ at a point $(x, y, z)$ whose normal (to the plane) passes through the origin. Since the line $\{(3k, 2k, k) : k \in \mathbb{R}\}$ is normal to the plane, we solve
$$3(3k) + 2(2k) + k = 6 \implies k = 3/7.$$
Thus $(9/7, 6/7, 3/7)$ is the closest point to the origin with respect to the 2-norm.
(iii) The unit ball for the ∞-norm in $\mathbb{R}^3$ is the cube with vertices $(\pm1, \pm1, \pm1)$. As $r$ increases, $\{a : \|a\|_\infty \le r\}$ first meets $3x + 2y + z = 6$ at a point of the form $(k, k, k)$, $k > 0$. Thus $(1, 1, 1)$ is the closest point to the origin with respect to the ∞-norm.
P5 The idea, as in Theorem 1.2, is to choose a compact subset $A_2$ of $A_1$ which must contain the point of $A_1$ which is closest to $A_0$. For example, we can choose
$$A_2 = A_1 \cap \{a : \|a\| \le 2M\},$$
where $M$ is so large that
$$A_0 \subseteq \{a : \|a\| \le M\}.$$
Then choose $a_0^* \in A_0$ and $a_1^* \in A_2$ (see SAQ S2) such that
$$\|a_0^* - a_1^*\| \le \|a_0 - a_1\|, \quad a_0 \in A_0,\ a_1 \in A_2.$$
To prove that
$$\|a_0^* - a_1^*\| \le \|a_0 - a_1\|, \quad a_0 \in A_0,\ a_1 \in A_1,$$
note that if $a_0 \in A_0$ and $a_1 \in A_1\setminus A_2$, then
$$\|a_0 - a_1\| \ge \|a_1\| - \|a_0\| > 2M - M \ge \|a_0^*\| = \|a_0^* - 0\| \ge \|a_0^* - a_1^*\|,$$
as required.
Chapter 2 The uniqueness of best
approximation
Ideally the method used to choose an approximation from a set A to a given
function f should give a unique answer. This chapter is devoted to the study of
those conditions under which a best approximation from A to f is unique.
Important new concepts introduced include ‘convexity’ and ‘scalar product’.
This chapter splits into TWO study sessions:
Study session 1: Sections 2.1 and 2.2.
Study session 2: Sections 2.3 and 2.4.

Study Session 1: Convexity

Read Sections 2.1 and 2.2

Commentary
1. The following diagrams illustrate the notion of a convex set and a strictly convex set.
[Diagrams: a set that is not convex, a set that is convex (but not strictly convex), and a strictly convex set.]
A point $a$ is interior to a set $A$ in a metric space if the open ball $\{b : d(a,b) < r\}$ is contained in $A$, for some $r > 0$.

2. In the proof of Theorem 2.1, there is no need for modulus signs around $\theta$ and $1 - \theta$, since both quantities are positive.

3. The following diagram illustrates the proof of Theorem 2.3.
[Diagram: points $s_0$, $s_1$ in $A$, their midpoint $\frac12(s_0+s_1)$, and the point $s = \frac12(s_0+s_1) + \lambda\left(f - \frac12(s_0+s_1)\right)$.]

Note that the number $\lambda \ge 0$, which appears in (2.6), does not need to be maximal. All that is required is $\lambda > 0$ and $s \in A$.
4. The following diagram illustrates the proof of Theorem 2.4.
[Diagram: the neighbourhood $N(f, h^*)$ of $f$ in $B$ meeting the set $A$, with the points $s_0$, $s_1$ and their midpoint $\frac12(s_0+s_1)$.]

Note that any finite-dimensional subspace $A$ is convex, but not compact (consider the sequence $na$, $n = 1, 2, \dots$, where $a \ne 0$).

Self-assessment questions
S1 Which of the unit balls in Figure 1.5 (page 10) are strictly convex?

S2 Prove that
(a) the intersection of two convex sets is convex;
(b) the intersection of two strictly convex sets is strictly convex.

S3 Show that the norms (a) $\|\cdot\|_1$ and (b) $\|\cdot\|_\infty$ are not strictly convex on $C[0,1]$.

Study Session 2: Best approximation operators

Read Sections 2.3 and 2.4

Commentary

1. The following diagram illustrates the definition of the best approximation operator $X$.
[Diagram: $f$ in $B$ mapped to its best approximation $a^* = X(f)$ in $A$.]

A projection operator is one for which X(X(f )) = X(f ), that is, the best
approximation from A to a point a ∈ A is a itself.

2. The final comment in Section 2.3 relates to the earlier comment on the
importance of the continuity of the best approximation operator to computer
calculations.

3. A scalar product (or inner product) on a linear space B is a real-valued
function (a, b), a, b ∈ B, such that for all a, b, c ∈ B and λ, µ ∈ R:
(S1) (a, a) ≥ 0, with equality if and only if a = 0;
(S2) (a, b) = (b, a);
(S3) (a, λb + µc) = λ(a, b) + µ(a, c).
Two important scalar products are given in Section 2.4.
In any linear space with a scalar product we can define a norm by the equation
$$\|a\| = \sqrt{(a,a)}.$$
As usual, only the proof of the triangle inequality requires any work; it follows from a version of the Cauchy–Schwarz inequality (see SAQ S6):
$$|(a,b)| \le \|a\|\,\|b\|, \quad a, b \in B,$$
together with the identity
$$\|a+b\|^2 = \|a\|^2 + 2(a,b) + \|b\|^2,$$
which you can easily verify.
We shall meet other examples of scalar products in Chapter 11.
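A quick numerical illustration of SAQs S5 and S6 below may be helpful. The sketch (Python; the weight $w$, the functions $f$, $g$ and the midpoint quadrature rule are all arbitrary choices made for this example) approximates the weighted scalar product of S5 and confirms the Cauchy–Schwarz inequality for one pair of functions.

    import math

    def scalar(f, g, w, a=0.0, b=1.0, n=1000):
        # Approximate (f, g) = integral of w(x) f(x) g(x) over [a, b]
        # by the midpoint rule with n subintervals.
        h = (b - a) / n
        return sum(w(a + (i + 0.5) * h) * f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h)
                   for i in range(n)) * h

    w = lambda x: 1 + x            # a positive weight on [0, 1]
    f = lambda x: math.sin(x)
    g = lambda x: math.exp(x)

    lhs = abs(scalar(f, g, w))
    rhs = math.sqrt(scalar(f, f, w)) * math.sqrt(scalar(g, g, w))
    print(lhs <= rhs, lhs, rhs)    # Cauchy-Schwarz: |(f,g)| <= ||f|| ||g||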

4. The proof of Theorem 2.7 is perhaps more clearly written as follows.


If $f \ne g$ and $\|f\|_2 = \|g\|_2 = 1$, then we have
$$\|f - g\|_2^2 = 1 - 2(f,g) + 1 > 0 \implies (f,g) < 1,$$
and hence
$$\|\theta f + (1-\theta)g\|_2^2 = \theta^2 + 2\theta(1-\theta)(f,g) + (1-\theta)^2 < \theta^2 + 2\theta(1-\theta) + (1-\theta)^2 = 1.$$

5. The norms
$$\|f\|_p = \left(\int_a^b |f(x)|^p\,dx\right)^{1/p}, \quad 1 < p < \infty,$$
are all strictly convex. The proof (for $p \ne 2$) depends on a careful study of the possibility of equality in Hölder's inequality.

Self-assessment questions
S4 Prove Theorem 2.6. (Hint: consider $A_0 = \{a \in A : \|a\| \le 4\|f\|\}$.)

S5 Let $w$ be a positive function in $C[a,b]$. Prove that
$$(f,g) = \int_a^b w(x)f(x)g(x)\,dx$$
is a scalar product on $C[a,b]$.

S6 Prove the Cauchy–Schwarz inequality for scalar products by considering the


discriminant of the quadratic expression
(a + λb, a + λb), λ ∈ R.

S7 Draw figures to illustrate the non-uniqueness of best approximation in the


four examples on pages 18–19.

Problems for Chapter 2
P1 Powell Exercise 2.4

P2 Powell Exercise 2.5

P3 Powell Exercise 2.6

P4 Powell Exercise 2.8

P5 Powell Exercise 2.1

Solutions to SAQs in Chapter 2


S1 Only the unit ball in the 2-norm is strictly convex.

S2 (a) Let S and T be convex sets. If s0 , s1 ∈ S ∩ T , then the points


s = θs0 + (1 − θ)s1 , 0 < θ < 1,
also lie in both S and T (since S and T are convex) and hence in S ∩ T .
Thus S ∩ T is convex.
(b) The proof is as above, but in addition we observe that if s is an interior
point of both S and T then s is an interior point of S ∩ T .

S3 (a) Consider $f(x) = 2x$ and $g(x) = 2(1-x)$ on $[0,1]$. Clearly $\|f\|_1 = \|g\|_1 = 1$, but
$$\left(\tfrac12(f+g)\right)(x) = 1, \quad 0 \le x \le 1,$$
so that $\left\|\tfrac12(f+g)\right\|_1 = 1$. Hence $\|\cdot\|_1$ is not strictly convex.
(b) Consider $f(x) = 1$ and $g(x) = x$ on $[0,1]$. Clearly $\|f\|_\infty = 1$ and $\|g\|_\infty = 1$, but
$$\left(\tfrac12(f+g)\right)(x) = \tfrac12(1+x), \quad 0 \le x \le 1,$$
so that $\left\|\tfrac12(f+g)\right\|_\infty = 1$. Hence $\|\cdot\|_\infty$ is not strictly convex.

S4 The idea, as in Theorem 1.2, is to consider a compact subset of $A$ which is large enough to contain best approximations to all points in a neighbourhood of $f$, and then apply Theorem 2.5.
For example, if we take
$$A_0 = \{a \in A : \|a\| \le 4\|f\|\},$$
then $A_0$ contains the best approximations (from $A$) to all points $g$ such that
$$\|g\| \le 2\|f\|$$
(because $A_0 \supseteq \{a \in A : \|a\| \le 2\|g\|\}$, for all such $g$).
Thus if $\|g\| \le 2\|f\|$, then the best approximation operator $X_0$ with respect to $A_0$ coincides with the best approximation operator $X$ with respect to $A$. Applying Theorem 2.5, we find that $X_0$ is continuous at $f$, and so therefore is $X$.

S5 (S1) $(f,f) = \int_a^b w(x)f(x)^2\,dx \ge 0$, since $w(x)f(x)^2 \ge 0$, $a \le x \le b$. Equality can occur only if $f(x)^2 = 0$, $a \le x \le b$, since $w(x) > 0$, $a \le x \le b$.
(S2) Obvious, by definition.
(S3)
$$(f, \lambda g + \mu h) = \int_a^b w(x)f(x)(\lambda g(x) + \mu h(x))\,dx = \lambda\int_a^b w(x)f(x)g(x)\,dx + \mu\int_a^b w(x)f(x)h(x)\,dx = \lambda(f,g) + \mu(f,h).$$

S6 Since, for all $\lambda \in \mathbb{R}$,
$$0 \le (a + \lambda b, a + \lambda b) = \|a\|^2 + 2\lambda(a,b) + \lambda^2\|b\|^2,$$
we must have $B^2 - 4AC \le 0$, where
$$A = \|b\|^2, \quad B = 2(a,b), \quad C = \|a\|^2.$$
Thus
$$4(a,b)^2 \le 4\|a\|^2\|b\|^2 \implies |(a,b)| \le \|a\|\,\|b\|,$$
as required.

S7 The figures are as follows.
[Diagrams for the four examples on pages 18–19: a continuous example with $f(x) = 1$ and $a(x) = \lambda x$, where $\|f-a\|_1 = 2$ for $|\lambda| \le 1$; a discrete example with $f = (1,1,1,1,1)$ and $a = \left(-\lambda, -\frac{\lambda}{2}, 0, \frac{\lambda}{2}, \lambda\right)$, where $\|f-a\|_1 = 5$ for $|\lambda| \le 1$; a continuous example with $a(x) = \lambda(1+x)$, where $\|f-a\|_\infty = 1$ for $0 \le \lambda \le 1$; and a discrete example with $a = \left(0, \frac{\lambda}{2}, \lambda, \frac{3\lambda}{2}, 2\lambda\right)$, where $\|f-a\|_\infty = 1$ for $0 \le \lambda \le 1$.]
Solutions to Problems in Chapter 2
P1 Since the unit ball in the ∞-norm is a square with sides parallel to the axes, the best approximation in $A$ to a point $f \in \mathbb{R}^2\setminus A$ is found as follows.
(a) If $f$ lies in one of the shaded sets, then $X(f)$ lies on the circle $\{a : \|a\|_2 = 1\}$ and on a (projection) line through $f$ at $45^\circ$ to the axes.
(b) If $f$ does not lie in one of the shaded sets, then $X(f)$ is the nearest of the four points $(\pm1, 0)$, $(0, \pm1)$.
[Diagram: the unit disc $A$, the four shaded diagonal regions, and the projections $X(f)$ and $X(g)$ of two points $f$, $g$.]
To prove directly (that is, without the help of Theorem 2.6) that $X$ is continuous, suppose first that $f_1$, $f_2$ lie in the shaded set in the first quadrant. Then
$$\|f_1 - f_2\|_\infty \ge d/\sqrt{2},$$
where $d$ is the (ordinary) distance between the $45^\circ$ projection lines through $f_1$ and $f_2$.
[Diagram: the projection lines through $f_1$ and $f_2$, the neighbourhoods $N(f_2, d/\sqrt{2})$ and $N(f_2, \|f_2 - f_1\|_\infty)$, and the images $X(f_1)$, $X(f_2)$ on the circle.]
Furthermore,
$$\|X(f_1) - X(f_2)\|_\infty \le \sqrt{2}\,d,$$
since the line segment joining $X(f_1)$ to $X(f_2)$ makes an angle of more than $45^\circ$ with the projection lines from $f_1$, $f_2$. Hence
$$\|X(f_1) - X(f_2)\|_\infty \le 2\|f_1 - f_2\|_\infty.$$
It follows that $X$ is continuous on the shaded sets. Since $X$ is constant on the four unshaded sets in $\mathbb{R}^2\setminus A$ (and these constant values agree with the values of $X$ on the boundaries between the shaded and unshaded sets in $\mathbb{R}^2\setminus A$) and $X$ is the identity on $A$ itself, we deduce that $X$ is continuous on the whole of $\mathbb{R}^2$.
P2 To prove that $X(f) = f/\|f\|$, if $\|f\| > 1$, we have to show that
$$\|f - g\| \ge \left\|f - f/\|f\|\right\|, \quad g \in A.$$
But, by the ‘backwards’ form of the triangle inequality,
$$\|f - g\| \ge \|f\| - \|g\| \ge \|f\| - 1 \quad (\text{since } g \in A)$$
$$= \|f\|\left(1 - 1/\|f\|\right) = \left\|f - f/\|f\|\right\|,$$
as required. (Where did we use the fact that $\|f\| > 1$?)
To prove that
$$\|X(f_1) - X(f_2)\| \le 2\|f_1 - f_2\|, \quad f_1, f_2 \in B,$$
it is sufficient to consider three cases.
Case 1 $\|f_1\| \le 1$, $\|f_2\| \le 1$. In this case $X(f_1) = f_1$ and $X(f_2) = f_2$, so that
$$\|X(f_1) - X(f_2)\| = \|f_1 - f_2\|.$$
Case 2 $\|f_1\| \le 1$, $\|f_2\| > 1$. In this case $X(f_1) = f_1$ and $X(f_2) = f_2/\|f_2\|$, so that
$$\begin{aligned}
\|X(f_1) - X(f_2)\| &= \left\|f_1 - \frac{f_2}{\|f_2\|}\right\| = \left\|f_1 - f_2 + f_2\left(1 - \frac{1}{\|f_2\|}\right)\right\|\\
&\le \|f_1 - f_2\| + \|f_2\|\left(1 - \frac{1}{\|f_2\|}\right) \quad (\text{since } \|f_2\| > 1)\\
&= \|f_1 - f_2\| + \|f_2\| - 1\\
&\le \|f_1 - f_2\| + \|f_2\| - \|f_1\| \quad (\text{since } \|f_1\| \le 1)\\
&\le 2\|f_1 - f_2\|.
\end{aligned}$$
Case 3 $\|f_1\| > 1$, $\|f_2\| \ge \|f_1\|$. In this case $X(f_1) = f_1/\|f_1\|$ and $X(f_2) = f_2/\|f_2\|$, so that
$$\begin{aligned}
\|X(f_1) - X(f_2)\| &= \left\|\frac{f_1}{\|f_1\|} - \frac{f_2}{\|f_2\|}\right\| = \frac{1}{\|f_1\|}\left\|f_1 - f_2 + f_2\left(1 - \frac{\|f_1\|}{\|f_2\|}\right)\right\|\\
&\le \|f_1 - f_2\| + \|f_2\| - \|f_1\| \quad (\text{since } \|f_1\| > 1)\\
&\le 2\|f_1 - f_2\|.
\end{aligned}$$

P3 First we remark that the sum of two norms on a linear space is also a norm on that space.
To prove that
$$\|f\| = \|f\|_1 + \|f\|_\infty, \quad f \in C[-\pi, \pi],$$
is not strictly convex, let $A$ be the 1-dimensional subspace of functions of the form
$$g(x) = \lambda\sin^2 x, \quad -\pi \le x \le \pi,$$
where $\lambda \in \mathbb{R}$, and let $f(x) = x$, $-\pi \le x \le \pi$.
For $|\lambda| \le 1$, the graph $y = \lambda\sin^2 x$ meets $y = x$ only at the origin, since
$$|\lambda\sin^2 x| \le |\sin x| < |x|, \quad\text{for } x \ne 0.$$
[Graph: $y = x$ and $y = \lambda\sin^2 x$ on $[-\pi, \pi]$, meeting only at the origin.]
Hence, by the evenness of $\sin^2 x$, $\|f - g\|_1 = \pi^2$, for $|\lambda| \le 1$.
Also, for $|\lambda| < 1$, the function
$$e(x) = x - \lambda\sin^2 x$$
has no local maximum or minimum on $\mathbb{R}$, since
$$e'(x) = 1 - \lambda\sin 2x > 0, \quad x \in \mathbb{R}.$$
Hence
$$\|f - g\|_\infty = \max_{-\pi\le x\le\pi}|e(x)| = \max\{e(\pi), -e(-\pi)\} = \pi.$$
Thus
$$\|f - g\| = \pi^2 + \pi,$$
for $g(x) = \lambda\sin^2 x$, $|\lambda| \le 1$.
Since it is also clear that
$$\|f - g\| \ge \pi^2 + \pi, \quad \lambda \in \mathbb{R},$$
we deduce that $f$ does not have a unique best approximation in $A$. Hence, by Theorem 2.4, this norm is not strictly convex.

P4 Recall that the unit ball of $\mathbb{R}^3$ in the 1-norm is the regular octahedron whose face in the first octant lies on $x + y + z = 1$, and the unit ball of $\mathbb{R}^3$ in the ∞-norm is the cube with vertices $(\pm1, \pm1, \pm1)$.
Thus the plane $x + y = 1$ meets the boundary of the unit ball in the 1-norm in a line segment, and also meets the boundary of the ball of radius $\frac12$ in the ∞-norm in a line segment.
[Diagrams: the plane $x + y = 1$ cutting the octahedral unit ball of the 1-norm, and cutting the cube of radius $\frac12$ in the ∞-norm, each in a line segment.]
P5 This question shows that any closed, bounded, convex subset $A$ of a linear space $B$, with the property that $f \in A \Rightarrow -f \in A$, is the unit ball of some norm, namely that given by
$$\|f\| = \begin{cases} 0, & \text{if } f = 0,\\ \min\{r \in (0,\infty) : f/r \in A\}, & \text{if } f \ne 0.\end{cases}$$
First note that the minimum in this definition is attained. Indeed, if we first define
$$\|f\| = \inf\{r \in (0,\infty) : f/r \in A\}, \quad f \ne 0,$$
then $\|f\| > 0$, otherwise $A$ is unbounded. Also, there is a sequence $r_n \to \|f\|$ such that $f/r_n \in A$, and since $f/r_n \to f/\|f\|$ we deduce that $f/\|f\| \in A$, as required.
(N1) Certainly $\|f\| \ge 0$, by definition, and we have just seen that $\|f\| > 0$ for $f \ne 0$.
(N2) $\|\lambda f\| = |\lambda|\,\|f\|$, for $\lambda \in \mathbb{R}$. It is clear that this holds if $f = 0$ or $\lambda = 0$. If $f \ne 0$ and $\lambda \ne 0$, then
$$\begin{aligned}
\|\lambda f\| &= \min\{r \in (0,\infty) : \lambda f/r \in A\}\\
&= \min\{r \in (0,\infty) : |\lambda| f/r \in A\} \quad (\text{since } f \in A \Leftrightarrow -f \in A)\\
&= \min\{|\lambda| r \in (0,\infty) : |\lambda| f/(r|\lambda|) \in A\}\\
&= |\lambda|\min\{r \in (0,\infty) : f/r \in A\}\\
&= |\lambda|\,\|f\|,
\end{aligned}$$
as required.
(N3) $\|f + g\| \le \|f\| + \|g\|$.
At first sight this looks difficult. However, by the definition of $\|f+g\|$, it is sufficient to prove that
$$\frac{f+g}{\|f\| + \|g\|} \in A. \tag{1}$$
We know that $f/\|f\| \in A$ and $g/\|g\| \in A$ so that, by the convexity of $A$,
$$\theta\,\frac{f}{\|f\|} + (1-\theta)\,\frac{g}{\|g\|} \in A, \quad 0 < \theta < 1.$$
If we now choose
$$\theta = \frac{\|f\|}{\|f\| + \|g\|} \quad\text{so that}\quad 1 - \theta = \frac{\|g\|}{\|f\| + \|g\|},$$
then we obtain (1).
The above argument breaks down if $\|f\| = \|g\| = 0$, but in this case $f = g = 0$, and so $\|f + g\| = 0$ also.
Hence $\|f\|$, $f \in B$, is indeed a norm on $B$.
Chapter 3 Approximation operators and
some approximating functions
Calculating a best approximation from a subspace $A$ of $B$ to $f$, with respect to some norm on $B$, may not be as easy as calculating an approximation using some other operator, such as an interpolation operator. To judge how good an approximation is obtained by such an operator $X$, we use the ‘norm’ $\|X\|$ of the operator, which is exploited in Sections 3.1 and 3.2. The other two sections of the chapter contain a discussion of the problems involved in approximating by polynomials and an introduction to piecewise polynomial approximation.
This chapter splits into TWO study sessions:
Study session 1: Sections 3.1 and 3.2.
Study session 2: Sections 3.3 and 3.4.

Study Session 1: ‘Good’ versus ‘best’ approximation

Read Sections 3.1 and 3.2

Commentary
1. Powell defines $\|X\|$ to be the smallest real number such that
$$\|X(f)\| \le \|X\|\,\|f\|, \quad f \in B.$$
Otherwise stated,
$$\|X\| = \sup\{\|X(f)\|/\|f\| : f \in B,\ f \ne 0\}.$$
Thus to determine the value of $\|X\|$, we must find a number $M$ such that
$$\|X(f)\| \le M\|f\|, \quad f \in B,$$
and such that, whenever $M' < M$, there exists $f \in B$ with
$$\|X(f)\| > M'\|f\|.$$
If $\|X(f)\|/\|f\|$ is unbounded on $B$, then we say that the operator $X$ is unbounded.
Notice that if $X$ is a linear operator, then
$$\frac{\|X(\lambda f)\|}{\|\lambda f\|} = \frac{\|X(f)\|}{\|f\|}, \quad f \ne 0,\ \lambda \ne 0,$$
so that
$$\|X\| = \sup\{\|X(f)\| : \|f\| = 1\}.$$
In general, the supremum may not be attained because the unit sphere $\{f \in B : \|f\| = 1\}$ need not be compact (see, for example, Problem P1).
2. The following diagram may help you with Theorem 3.1.
[Diagram: $f$ in $B$, its best approximation $p^*$ in $A$ at distance $d^*$, and the approximation $X(f)$ in $A$.]
[Note the word ‘a’ in the first line of the proof of Theorem 3.1.]

3. Page 25, line 13. The reason why $p^*(x) = x - \frac18$ is the best $L_\infty$ approximation by a linear polynomial to $f(x) = x^2$ on $[0,1]$ will become clear in Chapter 7.

4. The point of the final paragraph of Section 3.2 is that algorithms for calculating best $L_\infty$ approximations from $P_n$ are more involved than those for applying linear (projection) operators $X : B \to A$, such as interpolation. If we determine $Xf$ and compute $f(x) - Xf(x)$ at various points, finding an $x$ for which
$$|f(x) - Xf(x)| > (1 + \|X\|)\varepsilon,$$
then, by Theorem 3.1, the best approximation $p^*$ will satisfy $\|f - p^*\| > \varepsilon$. Thus a larger value of $n$ may be required.

Self-assessment questions
S1 Show that the interpolation operator X defined at the bottom of page 23 is
unbounded with respect to (a) the 1-norm, (b) the 2-norm.

S2 Explain why the application of this operator X to f (x) = x2 (see


equation (3.15)) shows that we can have equality in (3.11).

Study Session 2: Types of approximating functions

Read Sections 3.3 and 3.4

Commentary
1. Page 26, line 9. The promised technique appears in equation (3.23).

2. The space $C^{(k)}[a,b]$. An example of a function $f$ which belongs to $C^{(k)}[a,b]$, but not to $C^{(k+1)}[a,b]$, is
$$f(x) = \begin{cases} x^{k+1}, & x \ge 0,\\ -x^{k+1}, & x < 0.\end{cases}$$
For this function,
$$f^{(k)}(x) = (k+1)!\,|x|, \quad x \in \mathbb{R},$$
which is continuous but not differentiable at 0.
3. The identity (3.23) holds because $q \in P_n$, so that, as $p$ varies over the whole space $P_n$, so $q + p$ varies over the whole of $P_n$.

4. Table 3.1. For $k = 1$, the terms $d_n^*(f)$ scale by a factor of approximately $\frac12$ as $n$ doubles, whereas, for $k = 3$, they scale by approximately $\frac18$. This gives
$$d_{2^n}^*(f) \approx \frac{C_1}{2^n}, \quad k = 1, \qquad d_{2^n}^*(f) \approx \frac{C_3}{8^n} = \frac{C_3}{(2^n)^3}, \quad k = 3,$$
which suggests that $d_n^*(f) \approx C_k/n^k$ in both cases.
Notice that in (3.20), for a fixed value of $k$,
$$\frac{(n-k)!}{n!} = \frac{1}{n(n-1)\cdots(n-k+1)} \sim \frac{1}{n^k} \quad\text{as } n \to \infty.$$
(We say that $a_n \sim b_n$ as $n \to \infty$ if $\lim_{n\to\infty} a_n/b_n = 1$.)

5. Page 29, line 8. An analytic function is one which has a power series
expansion about each point of its domain of definition. Such functions are
completely determined by their values on any given open interval, no matter
how short.

6. Page 29, line 6−. The spline function $s$ is a piecewise polynomial on $[a,b]$, such that
$$s(x) = p_j(x), \quad \xi_{j-1} \le x \le \xi_j,\ j = 1, \dots, n,$$
where each $p_j \in P_k$, and
$$p_j^{(i)}(\xi_j) = p_{j+1}^{(i)}(\xi_j), \quad i = 0, 1, \dots, k-1,\ j = 1, \dots, n-1.$$
[Diagram: the pieces $p_1, \dots, p_n$ on the knot intervals $a = \xi_0 < \xi_1 < \dots < \xi_n = b$.]
Formula (3.31) can be explained as follows. First note that
$$s(x) = p_1(x), \quad \xi_0 \le x \le \xi_1.$$
Next put
$$q_1(x) = p_2(x) - p_1(x), \quad x \in \mathbb{R},$$
so that
$$q_1(\xi_1) = q_1'(\xi_1) = \dots = q_1^{(k-1)}(\xi_1) = 0.$$
Since $q_1$ is a polynomial of degree $k$,
$$q_1(x) = \frac{d_1}{k!}(x - \xi_1)^k, \quad\text{where } d_1 = q_1^{(k)}(\xi_1).$$
Hence
$$s(x) = p_2(x) = p_1(x) + q_1(x), \quad \xi_1 \le x \le \xi_2,$$
and so
$$s(x) = p_1(x) + \frac{d_1}{k!}(x - \xi_1)_+^k, \quad \xi_0 \le x \le \xi_2.$$
Here
$$(x - \xi_1)_+ = \begin{cases} 0, & x < \xi_1,\\ x - \xi_1, & x \ge \xi_1.\end{cases}$$
Now put
$$q_2(x) = p_3(x) - p_2(x), \quad x \in \mathbb{R},$$
and continue in this manner to obtain
$$s(x) = p_1(x) + \frac{1}{k!}\sum_{j=1}^{n-1} d_j(x - \xi_j)_+^k, \quad a \le x \le b,$$
where $d_j = q_j^{(k)}(\xi_j)$ and $q_j = p_{j+1} - p_j$. Thus $d_j$ is the jump in $s^{(k)}$ at $\xi_j$.
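The truncated power form (3.31) is easy to evaluate directly. As a concrete check, the following sketch (Python; the particular spline and its jump coefficients are taken from SAQ S4 below and its solution) compares the truncated power form with the original piecewise definition at a few sample points.

    def tpow(x, xi, k):
        # Truncated power (x - xi)_+^k
        return (x - xi) ** k if x >= xi else 0.0

    def s_truncated(x):
        # s(x) = p1(x) + (1/k!) * sum_j d_j (x - xi_j)_+^k with k = 2,
        # p1(x) = -x, knots 0 and 1, jumps d_1 = 2, d_2 = 4 in s'' (SAQ S4).
        return -x + 0.5 * (2 * tpow(x, 0, 2) + 4 * tpow(x, 1, 2))

    def s_piecewise(x):
        if x <= 0:
            return -x
        if x <= 1:
            return x * x - x
        return 3 * x * x - 5 * x + 2

    for x in [-1, -0.3, 0.2, 0.9, 1.5, 2]:
        assert abs(s_truncated(x) - s_piecewise(x)) < 1e-12
    print("truncated power form agrees with the piecewise definition")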

7. Page 30, line 9−. The ‘big oh’ notation used here needs some explanation. We say that
$$f(x) = O(g(x)), \quad x \in S,$$
for some subset $S$ of $\mathbb{R}$, if
$$|f(x)| \le M|g(x)|, \quad x \in S,$$
where the constant $M$ does not depend on $x$. For example,
$$x^2 + x = O(x^2),\ x \ge 1, \quad\text{whereas}\quad x^2 + x = O(x),\ 0 \le x \le 1.$$

Self-assessment questions
S3 The best $L_\infty$ approximation from $P_2$ to $f(x) = |x|$ on $[-1,1]$ is $p^*(x) = x^2 + \frac18$ (this will become clear in Chapter 7). Verify that $\|f - p^*\|_\infty = \frac18$, thus confirming one of the entries in Table 3.1.

S4 Express the quadratic spline
$$s(x) = \begin{cases} -x, & -1 \le x \le 0,\\ x^2 - x, & 0 \le x \le 1,\\ 3x^2 - 5x + 2, & 1 \le x \le 2,\end{cases}$$
in the form of equation (3.31).

Problems for Chapter 3


P1 Powell Exercise 3.2 (The last part is rather hard and can safely be ignored!)

P2 Powell Exercise 3.3

P3 Powell Exercise 3.4

P4 Powell Exercise 3.6 (Use Theorem 3.1 and be content to get the lower
estimate 0.048.)

P5 Powell Exercise 3.8

Solutions to SAQs in Chapter 3
S1 Consider
$$f_\varepsilon(x) = \begin{cases} 1 - x/\varepsilon, & 0 \le x \le \varepsilon,\\ 0, & \varepsilon < x < 1-\varepsilon,\\ 1 + (x-1)/\varepsilon, & 1-\varepsilon \le x \le 1.\end{cases}$$
(Remember Problem P3, Chapter 1.)
Since $f_\varepsilon(0) = f_\varepsilon(1) = 1$, the interpolating function $p = X(f_\varepsilon)$ is simply $p(x) = 1$, $0 \le x \le 1$, and so $\|p\|_1 = \|p\|_2 = 1$. However,
$$\|f_\varepsilon\|_1 = \int_0^\varepsilon (1 - x/\varepsilon)\,dx + \int_{1-\varepsilon}^1 (1 + (x-1)/\varepsilon)\,dx = \varepsilon,$$
so that
(a) $\dfrac{\|X(f_\varepsilon)\|_1}{\|f_\varepsilon\|_1} = \dfrac{1}{\varepsilon}$ is unbounded.
Also
$$\|f_\varepsilon\|_2^2 = \int_0^\varepsilon (1 - x/\varepsilon)^2\,dx + \int_{1-\varepsilon}^1 (1 + (x-1)/\varepsilon)^2\,dx = 2\varepsilon/3,$$
so that
(b) $\dfrac{\|X(f_\varepsilon)\|_2}{\|f_\varepsilon\|_2} = \sqrt{\dfrac{3}{2\varepsilon}}$ is unbounded.

S2 If $f(x) = x^2$, $0 \le x \le 1$, and $p^*(x) = x - \frac18$, $0 \le x \le 1$, then there are 3 candidates for the point $x \in [0,1]$ such that $\|f - p^*\|_\infty = |f(x) - p^*(x)|$.
[Graph: $y = x^2$ and the best linear approximation $y = x - \frac18$ on $[0,1]$.]
These are 0, 1, and the point $x$ where
$$e(x) = f(x) - p^*(x) = x^2 - \left(x - \tfrac18\right)$$
is a minimum, which satisfies
$$e'(x) = 2x - 1 = 0 \implies x = \tfrac12.$$
Thus
$$\|f - p^*\|_\infty = \max\left\{|e(0)|, \left|e\left(\tfrac12\right)\right|, |e(1)|\right\} = \tfrac18.$$
(The fact that $f(x) - p^*(x)$ has the same absolute value, but alternating signs, at each of these 3 points is characteristic of best $L_\infty$ approximation from certain subspaces; see Chapter 7.)
The corresponding interpolating polynomial is $p(x) = x$, $0 \le x \le 1$, and, in this case,
$$\|f - p\|_\infty = \max\{|x^2 - x| : 0 \le x \le 1\} = \tfrac14,$$
since the maximum is taken at $x = \frac12$. Hence $\|f - p\|_\infty = 2\|f - p^*\|_\infty$ and since $\|X\|_\infty = 1$ we do have equality in (3.11).
S3 If $f(x) = |x|$, $-1 \le x \le 1$, and $p^*(x) = x^2 + \frac18$, $-1 \le x \le 1$, then there are 5 candidates for the point $x \in [-1,1]$ such that $\|f - p^*\|_\infty = |f(x) - p^*(x)|$. These are $0$, $\pm1$, and the points $\pm x$ where
$$e(x) = f(x) - p^*(x) = |x| - \left(x^2 + \tfrac18\right)$$
is a maximum. As in SAQ S2, we find that $x = \pm\frac12$, so that
$$\|f - p^*\|_\infty = \tfrac18 = 0.125,$$
which confirms the first entry in the $k = 1$ column of Table 3.1.

S4 Following the proof of (3.31) given in the commentary, put $p_1(x) = -x$, $p_2(x) = x^2 - x$, $p_3(x) = 3x^2 - 5x + 2$. Then
$$q_1(x) = p_2(x) - p_1(x) = x^2 - x - (-x) = x^2,$$
$$q_2(x) = p_3(x) - p_2(x) = 3x^2 - 5x + 2 - (x^2 - x) = 2x^2 - 4x + 2 = 2(x-1)^2.$$
Hence $s(x) = -x + (x)_+^2 + 2(x-1)_+^2$, $-1 \le x \le 2$.

Solutions to Problems in Chapter 3


P1 First we seek an expression $M$, depending on $K$ but not on $x$, such that
$$|Xf(x)| \le M\|f\|_\infty, \quad a \le x \le b.$$
For example,
$$|Xf(x)| \le \int_a^b |K(x,y)|\,|f(y)|\,dy \le \left(\int_a^b |K(x,y)|\,dy\right)\|f\|_\infty,$$
so that
$$\|Xf\|_\infty \le \left(\max_{a\le x\le b}\int_a^b |K(x,y)|\,dy\right)\|f\|_\infty.$$
Hence
$$\|X\|_\infty \le \max_{a\le x\le b}\int_a^b |K(x,y)|\,dy = \int_a^b |K(x_0,y)|\,dy,$$
say. To prove that $\|X\|_\infty$ can be no less than this, we should like to find a function $f \in C[a,b]$ such that $\|f\|_\infty = 1$ and
$$\|Xf\|_\infty = \int_a^b |K(x_0,y)|\,dy.$$
The ideal function would be
$$f(y) = \operatorname{sgn}(K(x_0,y)) = \begin{cases} 1, & \text{if } K(x_0,y) > 0,\\ -1, & \text{if } K(x_0,y) < 0,\end{cases}$$
so that
$$(Xf)(x_0) = \int_a^b K(x_0,y)f(y)\,dy = \int_a^b |K(x_0,y)|\,dy,$$
which implies that $\|X\|_\infty \ge \int_a^b |K(x_0,y)|\,dy$. Unfortunately, however, this function $f$ is not continuous (unless $K(x_0,y)$ never vanishes).
Instead we take a continuous approximation $f_\varepsilon$, $\varepsilon > 0$, which differs from $\operatorname{sgn}(K(x_0,y))$ only on a set of length less than $\varepsilon$.
[Diagram: $f_\varepsilon$ equal to $\pm1$ except near the sign changes of $K(x_0,y)$, on a set of total length less than $\varepsilon$.]
It follows that
$$\left|Xf_\varepsilon(x_0) - \int_a^b |K(x_0,y)|\,dy\right| = \left|\int_a^b K(x_0,y)\bigl(f_\varepsilon(y) - \operatorname{sgn}(K(x_0,y))\bigr)\,dy\right| \le K\int_a^b |f_\varepsilon(y) - \operatorname{sgn}(K(x_0,y))|\,dy \le K\varepsilon,$$
where $K = \max_{a\le y\le b}|K(x_0,y)|$. Hence
$$\|Xf_\varepsilon\|_\infty \ge \int_a^b |K(x_0,y)|\,dy - K\varepsilon.$$
Since $\|f_\varepsilon\|_\infty = 1$ and $K\varepsilon$ can be taken arbitrarily small, we deduce that
$$\|X\|_\infty = \int_a^b |K(x_0,y)|\,dy.$$
Remark If $\|X\|_\infty = 1$ and $Xf = f$, then $f$ need not be a constant. For example, if
$$K(x,y) = \begin{cases} 4(1-2y), & 0 \le x \le \frac12,\ 0 \le y \le \frac12,\\ 8(1-x)(1-2y), & \frac12 \le x \le 1,\ 0 \le y \le \frac12,\\ 0, & \frac12 \le y \le 1,\end{cases}$$
and
$$f(x) = \int_0^1 K(x,y)\,dy = \begin{cases} 1, & 0 \le x \le \frac12,\\ 2(1-x), & \frac12 \le x \le 1,\end{cases}$$
then $\|X\|_\infty = 1$ and
$$f(x) = \int_0^1 K(x,y)f(y)\,dy, \quad 0 \le x \le 1.$$
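The claims in this Remark can be checked numerically. A sketch (Python, midpoint quadrature; the grid size and sample points are arbitrary choices) confirms that $\int_0^1 |K(x,y)|\,dy \le 1$ and that $Xf = f$ for the given $f$, up to quadrature error.

    def K(x, y):
        if y > 0.5:
            return 0.0
        if x <= 0.5:
            return 4 * (1 - 2 * y)
        return 8 * (1 - x) * (1 - 2 * y)

    def f(x):
        return 1.0 if x <= 0.5 else 2 * (1 - x)

    n = 2000
    h = 1.0 / n
    ys = [(i + 0.5) * h for i in range(n)]

    for x in [0.1, 0.3, 0.5, 0.7, 0.9]:
        row = sum(abs(K(x, y)) for y in ys) * h        # approximates integral of |K(x, .)|
        Xf  = sum(K(x, y) * f(y) for y in ys) * h      # approximates (Xf)(x)
        print(x, round(row, 4), round(Xf - f(x), 6))   # row <= 1, and Xf - f is about 0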

P2 To prove that $X$ is a projection (it is obviously linear) we have to show that $Xf(x) = f(x)$ when $f(x) = a + bx$. But
$$Xf(x) = \int_{-1}^1 \tfrac12(1 + 3xy)f(y)\,dy = \tfrac12\int_{-1}^1 f(y)\,dy + \frac{3x}{2}\int_{-1}^1 yf(y)\,dy = \tfrac12\int_{-1}^1 (a + by)\,dy + \frac{3x}{2}\int_{-1}^1 (ay + by^2)\,dy = \tfrac12(2a) + \frac{3x}{2}\cdot\frac{2b}{3} = a + bx,$$
as required.
Now, by Problem P1,
$$\|X\|_\infty = \max_{-1\le x\le 1}\frac12\int_{-1}^1 |1 + 3xy|\,dy = \frac12\int_{-1}^1 |1 + 3y|\,dy \ \left(= \frac12\int_{-1}^1 |1 - 3y|\,dy\right).$$
To see this, consider the change in the graph of $1 + 3xy$, $-1 \le y \le 1$, as $x$ increases from 0 to 1.
[Diagrams: the graphs of $1 + 3xy$ against $y$ for $x = 0$, $x = \frac12$ and $x = 1$, the last running from $-2$ at $y = -1$ up to $4$ at $y = 1$.]
Hence
$$\|X\|_\infty = \frac12\left(\frac12\cdot\frac43\cdot 4 + \frac12\cdot\frac23\cdot 2\right) = \frac53.$$

P3 First note that $X$ is a projection (it is clearly linear), since if $f(x) = a + bx$, then
$$Xf(x) = 2\int_0^{1/2}(a + bt)\,dt + \left(x - \tfrac14\right)\bigl((a + b) - a\bigr) = a + \tfrac{b}{4} + b\left(x - \tfrac14\right) = a + bx,$$
as required.
Now, for $f \in C[0,1]$,
$$|Xf(x)| \le \left|2\int_0^{1/2} f(t)\,dt\right| + \left|x - \tfrac14\right||f(1) - f(0)| \le 2\int_0^{1/2}|f(t)|\,dt + \tfrac34\bigl(|f(1)| + |f(0)|\bigr) \le \|f\|_\infty + \tfrac34\cdot 2\|f\|_\infty = \tfrac52\|f\|_\infty.$$
Hence
$$\|Xf\|_\infty \le \tfrac52\|f\|_\infty \implies \|X\|_\infty \le \tfrac52.$$
Thus, by Theorem 3.1,
$$\|f - Xf\|_\infty \le \left(1 + \tfrac52\right)\|f - p^*\|_\infty,$$
where $p^*$ is the best $L_\infty$ approximation to $f$ from $P_1$, and so
$$\|f - Xf\|_\infty \le \tfrac72\|f - p\|_\infty, \quad\text{for } p \in P_1.$$
Remark In fact $\|X\|_\infty = \frac52$ in this problem, as you can see by considering, for $0 < \varepsilon < 1$,
$$f_\varepsilon(x) = \begin{cases} -1 + 2x/\varepsilon, & 0 \le x \le \varepsilon,\\ 1, & \varepsilon < x \le 1.\end{cases}$$
P4 There is rather more to this question than meets the eye! First, if $p(x) = a + bx + cx^2$, then
$$p(0) = a, \quad p(1) = a + b + c, \quad p(3) = a + 3b + 9c, \quad p(4) = a + 4b + 16c,$$
and it is true that
$$a + 3b + 9c = -\tfrac12 a + (a + b + c) + \tfrac12(a + 4b + 16c).$$
Now
$$\min_{p\in P_2}\max_{0\le x\le 4}|f(x) - p(x)| = \|f - p^*\|_\infty,$$
where $p^*$ is the best $L_\infty$ approximation to $f$ from $P_2$. Thus, by Theorem 3.1,
$$\|f - p^*\|_\infty \ge \frac{\|f - Xf\|_\infty}{1 + \|X\|_\infty},$$
where $X$ is any linear projection from $C[0,4]$ to $P_2$.
Given the first part of the question, it seems a good idea to let $X$ be the interpolation operator $X(f) = p$, defined by $p(0) = f(0)$, $p(1) = f(1)$, $p(4) = f(4)$. This is certainly a linear projection operator. Since
$$f(3) = -\tfrac12 f(0) + f(1) + \tfrac12 f(4) \pm 0.15,$$
we have
$$\|f - Xf\|_\infty \ge |f(3) - p(3)| = \left|\pm 0.15 + \left(-\tfrac12 f(0) + f(1) + \tfrac12 f(4)\right) - \left(-\tfrac12 p(0) + p(1) + \tfrac12 p(4)\right)\right| = 0.15.$$
Thus
$$\|f - p^*\|_\infty \ge \frac{0.15}{1 + \|X\|_\infty}.$$
To obtain the desired lower estimate for $\|f - p^*\|_\infty$, we need to show, therefore, that $\|X\|_\infty \le 2$. Unfortunately, it turns out that $\|X\|_\infty = \frac{17}{8}$. Nevertheless, we indicate the argument.
Recall that, since $X$ is linear,
$$\|X\|_\infty = \max\{\|Xf\|_\infty : \|f\|_\infty = 1\}.$$
Thus we seek to maximise the $L_\infty$ norm of $X(f) = p$, subject to the constraints $|p(0)| \le 1$, $|p(1)| \le 1$, $|p(4)| \le 1$. It seems likely that $\|p\|_\infty$ will be largest if we choose $p(0)$, $p(1)$ and $p(4)$ to be $\pm1$. Taking this for granted, there remains only an analysis of the (non-trivial) cases.
[Diagrams: the three non-trivial sign patterns (a), (b), (c) for the values $\pm1$ taken at $x = 0$, $1$, $4$.]
As you can easily check, case (c) gives the largest value of $\|p\|_\infty$. In this case
$$p(x) = \tfrac{17}{8} - \tfrac12\left(x - \tfrac52\right)^2 \implies \|p\|_\infty = p\left(\tfrac52\right) = \tfrac{17}{8}.$$
Hence $\|X\|_\infty = \frac{17}{8}$, so that
$$\|f - p^*\|_\infty \ge \frac{0.15}{1 + \frac{17}{8}} = 0.048.$$
Two questions remain.
(I) How do we justify taking $p(0)$, $p(1)$, $p(4)$ to be $\pm1$?
(II) Can we in fact obtain the better estimate 0.05?
There are various ways to answer Question (I). For example, we could argue from basic principles, examining the effects of taking $p(0) = 1$, $|p(1)| < 1$, $|p(4)| < 1$, and so on. This would be tedious, and would not generalise to other problems.
More generally, we can use a linear programming argument. This sounds very grand, but it is really quite a simple idea. We want to maximise
$$|p(x)| = |a + bx + cx^2|, \quad 0 \le x \le 4,$$
subject to the constraints
$$|p(x_i)| = |a + bx_i + cx_i^2| \le 1,$$
where $x_1 = 0$, $x_2 = 1$, $x_3 = 4$. Now any equation of the form $Xa + Yb + Zc = k$, where $X$, $Y$, $Z$, $k$ are constant, defines a plane in $\mathbb{R}^3$. Hence the above 3 constraints define a parallelepiped $P$, centred at the origin, of possible values of $(a, b, c)$ in $\mathbb{R}^3$.
The required maximum $M$ of $|p(x)|$ occurs for some $(a, b, c) \in P$ and $x_0 \in [0,4]$, so that
$$M = \max_{(a,b,c)\in P}\left|a + bx_0 + cx_0^2\right|.$$
Since $x_0$ is now fixed, we can find $M$ by moving the plane $a + bx_0 + cx_0^2 = k$ as far as possible from the origin, while still meeting $P$; at this point $M = |k|$. Now, however, the plane must pass through at least one vertex of $P$, so that $|p(x_i)| = 1$, for $i = 1, 2, 3$, as required.
We shall see another approach in Chapter 4 which contains a formula for $\|X\|_\infty$, where $X$ is an interpolation operator from $C[a,b]$ to $P_n$.
To answer Question (II) we look again at the proof of Theorem 3.1. Using equation (3.12), we have
$$0.15 = |f(3) - (X(f))(3)| = |(f - p^*)(3) - (X(f - p^*))(3)| \le \|f - p^*\|_\infty + |(X(f - p^*))(3)|.$$
Now consider the problem of maximising $|(Xg)(3)|$, for $g \in C[0,4]$, while keeping $\|g\|_\infty$ constant. Once again this is a linear programming problem, so the maximum occurs for $g(0) = \pm\|g\|_\infty$, $g(1) = \pm\|g\|_\infty$, $g(4) = \pm\|g\|_\infty$. Examining cases (a), (b), (c) given earlier, we find that
$$|(Xg)(3)| \le 2\|g\|_\infty, \quad g \in C[0,4]$$
((c) is again the extreme case). Hence, with $g = f - p^*$,
$$0.15 \le \|f - p^*\|_\infty + 2\|f - p^*\|_\infty = 3\|f - p^*\|_\infty,$$
and so $\|f - p^*\|_\infty \ge 0.05$, as required.
P5 This one is a little easier! First, since every quadratic spline is differentiable at points of $(-1,1)$, we cannot have $\|f - s\|_\infty = 0$, that is $s(x) = f(x)$, $-1 \le x \le 1$, because $f$ is not differentiable at 0.
However, we can make $\|f - s\|_\infty < \varepsilon$ by defining
$$s(x) = \begin{cases} -x, & -1 \le x \le -\varepsilon,\\ p(x) = a + bx^2, & -\varepsilon < x < \varepsilon,\\ x, & \varepsilon \le x \le 1.\end{cases}$$
To guarantee that $s$ is a quadratic spline, we require
$$p(\pm\varepsilon) = \varepsilon\ \ (\text{that is, } a + b\varepsilon^2 = \varepsilon) \quad\text{and}\quad p'(\pm\varepsilon) = \pm1\ \ (\text{that is, } 2b\varepsilon = 1) \implies b = \frac{1}{2\varepsilon},\ a = \frac{\varepsilon}{2}.$$
[Graph: $y = s(x)$, equal to $|x|$ outside $(-\varepsilon, \varepsilon)$ and parabolic inside.]
Since the worst error occurs at $x = 0$, we have
$$\|f - s\|_\infty = |f(0) - s(0)| = \frac{\varepsilon}{2} < \varepsilon,$$
as required.
Chapter 4 Polynomial interpolation
This chapter begins a detailed investigation of the interpolation of continuous
functions by polynomials. It turns out that the choice of interpolation points
makes a considerable difference to the accuracy of the interpolating
approximation; for example, we see in this chapter that equally-spaced
interpolation points make a rather poor choice. This investigation of interpolation
continues in Chapter 5.
This chapter splits into TWO study sessions:
Study session 1: Sections 4.1 and 4.2.
Study session 2: Sections 4.3 and 4.4.

Study Session 1: Polynomial interpolation

Read Sections 4.1 and 4.2

Commentary
1. Equation (4.2) represents $n+1$ linear equations (one for each interpolation point) with $n+1$ unknowns (the coefficients of the required polynomial). Theorem 4.1 shows that the corresponding $(n+1)\times(n+1)$ matrix
$$\begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n\\ 1 & x_1 & x_1^2 & \cdots & x_1^n\\ \vdots & & & & \vdots\\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{pmatrix}$$
must be non-singular. In fact, this Vandermonde matrix, as it is called, has determinant
$$\prod_{0\le i<j\le n}(x_j - x_i),$$
which is clearly non-zero if and only if the $x_i$ are distinct points.
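For small $n$ the Vandermonde system can be solved directly. A minimal sketch follows (Python, with Gaussian elimination written out so that no library is assumed; the interpolation data are an arbitrary example, chosen so that the exact answer is known).

    def interp_coeffs(xs, ys):
        # Solve the Vandermonde system V c = y by Gaussian elimination,
        # giving the coefficients of p(x) = c0 + c1 x + ... + cn x^n.
        n = len(xs)
        A = [[x ** j for j in range(n)] + [y] for x, y in zip(xs, ys)]
        for col in range(n):
            piv = max(range(col, n), key=lambda r: abs(A[r][col]))
            A[col], A[piv] = A[piv], A[col]          # partial pivoting
            for r in range(col + 1, n):
                m = A[r][col] / A[col][col]
                for j in range(col, n + 1):
                    A[r][j] -= m * A[col][j]
        c = [0.0] * n
        for r in range(n - 1, -1, -1):               # back substitution
            c[r] = (A[r][n] - sum(A[r][j] * c[j] for j in range(r + 1, n))) / A[r][r]
        return c

    # Interpolate f(x) = x^3 at 4 points: coefficients should be (0, 0, 0, 1).
    print(interp_coeffs([0, 1, 2, 3], [0, 1, 8, 27]))

In practice the Vandermonde matrix becomes badly conditioned as $n$ grows, which is one reason the Lagrange and divided-difference forms of later sections are preferred.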

2. Below is the graph of a typical Lagrange function
$$\ell_k(x) = \prod_{\substack{j=0\\ j\ne k}}^{n}\frac{x - x_j}{x_k - x_j},$$
with $n = 10$ and $k = 6$.
[Graph: $y = \ell_6(x)$, equal to 1 at $x_6$ and 0 at the other interpolation points $x_0, \dots, x_{10}$.]
Note that if all the $x_i$ are kept fixed except for $x_k$ and $x_{k+1}$, which both tend to a number $c$, then $\ell_k(x) \to \infty$ for any $x \ne x_0, x_1, \dots, x_{k-1}, c, x_{k+2}, \dots, x_n$.

3. The useful symbol δki in equation (4.11) is called the Kronecker delta.

4. The remarks before the statement of Theorem 4.2 provide a way of
remembering that it is the (n + 1)th derivative of f which appears in the
error formula (4.13).

5. Equation (4.15) can be written as
$$g^{(n+1)}(\xi) = f^{(n+1)}(\xi) - e(x)\,(n+1)!\prod_{i=0}^{n}\frac{1}{x - x_i} = 0.$$

Self-assessment questions
S1 Powell Exercise 4.1

S2 Verify equation (4.11). (This identity is required in Chapter 10 and in


Chapter 19.)

S3 Powell Exercise 4.8

Study Session 2: Chebyshev interpolation

Read Sections 4.3 and 4.4

Commentary
1. Table 4.1. Here is the graph of the Runge example together with its Lagrange interpolating polynomial $p$ of degree 10.
[Graph: $y = 1/(1+x^2)$ on $[-5,5]$ and the degree-10 interpolant $y = p(x)$ at equally-spaced points, oscillating wildly near the ends of the interval.]
Notice that $p(4.5) \approx 1.6$, as indicated by the 5th entry in the middle column of Table 4.1. Here is the graph of the corresponding function
$$\mathrm{prod}(x) = \prod_{j=0}^{10}(x - x_j).$$
[Graph: $y = \mathrm{prod}(x)$, with $|\mathrm{prod}(x)|$ reaching about $10^5$ near the ends of $[-5,5]$.]
As you can see, there is a close relationship between $\mathrm{prod}(x)$ and the size of the error function in the above interpolation.

2. Chebyshev polynomials (pronounced Cheby‘shov’ in Russian).
We have
$$\cos\theta = \cos\theta \implies T_1(x) = x,$$
$$\cos 2\theta = 2\cos^2\theta - 1 \implies T_2(x) = 2x^2 - 1,$$
$$\cos 3\theta = 4\cos^3\theta - 3\cos\theta \implies T_3(x) = 4x^3 - 3x.$$
The graphs of these Chebyshev polynomials appear below.
[Graph: $y = x$, $y = 2x^2 - 1$ and $y = 4x^3 - 3x$ on $[-1,1]$, each oscillating between $-1$ and $1$.]
Formula (4.25) shows that $T_n(x)$ is a polynomial of degree $n$ in which the coefficient of $x^n$ is $2^{n-1}$. It has zeros at the points $\cos\left(\frac{(2i-1)\pi}{2n}\right)$, $i = 1, \dots, n$, since
$$T_n\left(\cos\left(\frac{(2i-1)\pi}{2n}\right)\right) = \cos\left(n\cos^{-1}\left(\cos\left(\frac{(2i-1)\pi}{2n}\right)\right)\right) = \cos\left((2i-1)\frac{\pi}{2}\right) = 0$$
(note that $0 < (2i-1)\pi/(2n) < \pi$, for $i = 1, \dots, n$).
Thus the function $T_{n+1}$ has $n+1$ (simple) zeros which are, in increasing order,
$$x_i = \cos\left(\frac{[2(n-i)+1]\pi}{2(n+1)}\right), \quad i = 0, 1, \dots, n,$$
and we deduce that
$$T_{n+1}(x) = 2^n\prod_{i=0}^{n}(x - x_i) \implies \max_{-1\le x\le 1}|\mathrm{prod}(x)| = \frac{1}{2^n}.$$
The graphs below show the points $x_i$ in the case $n = 10$ and the graph $y = T_{11}(x)$.
[Diagrams: the zeros $x_0 < \dots < x_{10}$ of $T_{11}$, obtained by projecting equally-spaced angles under $y = \cos^{-1}x$, and the graph of $y = T_{11}(x)$ oscillating between $\pm1$ on $[-1,1]$.]
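These Chebyshev points, and the bound $\max_{-1\le x\le 1}|\mathrm{prod}(x)| = 2^{-n}$, are easy to check numerically. A sketch (Python; the grid size is an arbitrary choice):

    import math

    n = 10
    # Zeros of T_{n+1} in increasing order, as in (4.27).
    xi = [math.cos((2 * (n - i) + 1) * math.pi / (2 * (n + 1))) for i in range(n + 1)]

    def prod(x):
        p = 1.0
        for t in xi:
            p *= (x - t)
        return p

    grid = [-1 + 2 * k / 100000 for k in range(100001)]
    print(max(abs(prod(x)) for x in grid), 1 / 2 ** n)   # both about 0.000977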
If $x_i$ are the Chebyshev points for the interval $[a,b]$, defined by (4.28) and (4.30), and $t_i$ are the Chebyshev points for $[-1,1]$, defined by (4.27), then, for $a \le x \le b$ with $x = \lambda + \mu t$,
$$\mathrm{prod}(x) = \prod_{i=0}^{n}(x - x_i) = \prod_{i=0}^{n}\bigl((\lambda + \mu t) - (\lambda + \mu t_i)\bigr) = \mu^{n+1}\prod_{i=0}^{n}(t - t_i) = \mu^{n+1}\,\mathrm{prod}(t),$$
where the latter product is defined with respect to the $t_i$. Thus
$$\max_{a\le x\le b}|\mathrm{prod}(x)| = \mu^{n+1}\max_{-1\le t\le 1}|\mathrm{prod}(t)| = \left(\frac{b-a}{2}\right)^{n+1}\cdot\frac{1}{2^n} = 2\left(\frac{b-a}{4}\right)^{n+1}.$$
For example, with n = 10 and [a, b] = [−5, 5], this maximum is 47 683.7,
which is considerably smaller than the corresponding maximum for
equally-spaced points. Finally, we plot the interpolating polynomial of degree
10 to the Runge example using these Chebyshev interpolation points.

[Figure: the Runge function and its degree-10 interpolating polynomial at the Chebyshev points, plotted on $[-5, 5]$.]

This graph confirms the fifth entry in the third column of Table 4.4, which
gives the maximum error in the above interpolation as approximately 0.1.
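As a quick check of these numbers, here is a minimal numerical sketch (ours, not Powell's) comparing $\max|\mathrm{prod}(x)|$ on $[-5, 5]$ for equally-spaced and Chebyshev interpolation points with n = 10:

```python
import numpy as np

def max_abs_prod(nodes, a=-5.0, b=5.0, samples=20001):
    """Estimate max over [a, b] of |prod(x)| = |(x - x_0)...(x - x_n)|."""
    x = np.linspace(a, b, samples)
    return np.max(np.abs(np.prod(x[:, None] - nodes[None, :], axis=1)))

n = 10
equal = np.linspace(-5.0, 5.0, n + 1)
i = np.arange(n + 1)
cheb = 5.0 * np.cos((2 * (n - i) + 1) * np.pi / (2 * (n + 1)))  # (4.27), scaled

print(max_abs_prod(equal))   # about 4.2e5 for equally-spaced points
print(max_abs_prod(cheb))    # about 4.77e4 = 2 * (10/4)**11
```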

3. Theorem 4.3 provides an alternative method of calculating $\|X\|_\infty$ in the
solution of Problem P4, Chapter 3. Also, the formula for $\|X\|_\infty$ should remind
you of Problem P1, Chapter 3. In the proof, some comments are required
about the last two equalities of (4.32).
First, it is legitimate to change the order of the sup and the max because,
quite generally, we have
$$\sup_{x \in X} \sup_{y \in Y} \phi(x, y) = \sup_{y \in Y} \sup_{x \in X} \phi(x, y),$$
for any real-valued function $\phi(x, y)$, $x \in X$, $y \in Y$. Indeed, for any fixed
$x \in X$, $y \in Y$,
$$\phi(x, y) \le \sup_{\xi \in X} \phi(\xi, y),$$
so that
$$\sup_{y \in Y} \phi(x, y) \le \sup_{y \in Y} \sup_{\xi \in X} \phi(\xi, y).$$

Thus
$$\sup_{x \in X} \sup_{y \in Y} \phi(x, y) \le \sup_{y \in Y} \sup_{x \in X} \phi(x, y),$$
and the reverse inequality is proved similarly.
Next, note that in the final step of (4.32) it is clear that
$$\max_{a \le x \le b}\; \sup_{\|f\| \le 1} \left| \sum_{k=0}^{n} f(x_k) \ell_k(x) \right| \le \max_{a \le x \le b} \sum_{k=0}^{n} |\ell_k(x)|,$$
and equality is seen to hold by taking $f(x_k) = \mathrm{sgn}(\ell_k(x^*))$, where
$$\sum_{k=0}^{n} |\ell_k(x^*)| = \max_{a \le x \le b} \sum_{k=0}^{n} |\ell_k(x)|.$$
Note that all the norms in Theorem 4.3 are the ∞-norm $\|\cdot\|_\infty$.

4. Table 4.5. It is natural to ask at what rate the norms in the right-hand
column are increasing. It can be shown that these grow like $\tfrac{2}{\pi}\log_e n$.

5. The ‘optimal nodes problem’, to find the interpolating points which minimise
$\|X\|$ (see Powell Exercise 4.10), was solved only comparatively recently; see
the remarks in Appendix B. Note that the two independent papers [28] and
[89], referred to in Appendix B, appeared ‘back-to-back’ in the Journal of
Approximation Theory in 1978.

Self-assessment questions
S4 By calculating the interpolating polynomial from P2 to the Runge example,
with suitable interpolation points x0 , x1 , x2 , confirm the first entry in each
column of Table 4.1.

S5 (a) Use formula (4.25) to calculate $T_4(x)$ and $T_5(x)$.
(b) Prove that $T_{2n}(x) = 2T_n(x)^2 - 1$.

S6 Check the first entry in both columns of Table 4.5.

Problems for Chapter 4


P1 Powell Exercise 4.2

P2 Powell Exercise 4.3 (Hint: find the maxima of $|(x-a)(x-b)|$ and
$|(x-a)(x-\tfrac{1}{2}(a+b))(x-b)|$ on [a, b].)

P3 Powell Exercise 4.4 (Hint: try to find a substitute for the function g of
(4.14).)

P4 Powell Exercise 4.5 (Hint: decide first where the maximum and minimum
gaps occur.)

P5 Powell Exercise 4.6

Solutions to SAQs in Chapter 4
S1 Using equation (4.7),
$$p(x) = f(0)\frac{(x-1)(x-2)(x-3)}{(0-1)(0-2)(0-3)} + f(1)\frac{(x-0)(x-2)(x-3)}{(1-0)(1-2)(1-3)} + f(2)\frac{(x-0)(x-1)(x-3)}{(2-0)(2-1)(2-3)} + f(3)\frac{(x-0)(x-1)(x-2)}{(3-0)(3-1)(3-2)}$$
$$= -\tfrac{1}{6} f(0)(x-1)(x-2)(x-3) + \tfrac{1}{2} f(1)x(x-2)(x-3) - \tfrac{1}{2} f(2)x(x-1)(x-3) + \tfrac{1}{6} f(3)x(x-1)(x-2).$$
Hence
$$p(6) = -10f(0) + 36f(1) - 45f(2) + 20f(3).$$
If $f(x) = (x-3)^3$, then f(0) = -27, f(1) = -8, f(2) = -1, f(3) = 0, and so
$$p(6) = -10(-27) + 36(-8) - 45(-1) + 20(0) = 27.$$
This is correct since f(6) = 27 and f is a cubic, so the interpolation is exact.
The uncertainty of p(6), if that of each function value is ±ε, is
$$\pm \sum_{k=0}^{3} \varepsilon\,|\ell_k(6)| = \pm(10\varepsilon + 36\varepsilon + 45\varepsilon + 20\varepsilon) = \pm 111\varepsilon.$$

S2 Substituting (4.3) in (4.9) gives
$$\sum_{k=0}^{n} x_k^i \prod_{\substack{j=0 \\ j \neq k}}^{n} \frac{x - x_j}{x_k - x_j} = x^i, \quad i = 0, 1, \dots, n,$$
since the interpolation is exact for polynomials of degree $i \le n$. The coefficient
of $x^n$ on the right is 1 if i = n and 0 otherwise, whereas on the left it is
$$\sum_{k=0}^{n} x_k^i \prod_{\substack{j=0 \\ j \neq k}}^{n} \frac{1}{x_k - x_j}.$$
Hence (4.11) follows.

S3 The Lagrange interpolation formula can be written as follows:
$$\sum_{k=0}^{n} f(x_k)\,\ell_k(x) = \sum_{k=0}^{n} f(x_k) \prod_{\substack{j=0 \\ j \neq k}}^{n} \frac{x - x_j}{x_k - x_j}
= \prod_{j=0}^{n} (x - x_j) \sum_{k=0}^{n} \left( \frac{f(x_k)}{\prod_{\substack{j=0 \\ j \neq k}}^{n} (x_k - x_j)} \right) \frac{1}{x - x_k}.$$
Each of the quantities
$$\mu_k = \frac{f(x_k)}{\prod_{\substack{j=0 \\ j \neq k}}^{n} (x_k - x_j)}$$
depends only on the data points and the function values, and so can be
calculated beforehand. Hence
$$p(x) = \prod_{j=0}^{n} (x - x_j) \sum_{k=0}^{n} \frac{\mu_k}{x - x_k}$$
can be evaluated using n + 1 multiplications, n additions, n + 1 divisions and
n + 1 subtractions. Altogether this is 4n + 3 ≤ 5n arithmetic operations
(n ≥ 3).
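As a computational aside (our own sketch, using the data of SAQ S1 above), the precomputation of the $\mu_k$ and the subsequent cheap evaluation might look as follows:

```python
import numpy as np

def precompute_mu(nodes, fvals):
    """mu_k = f(x_k) / prod_{j != k} (x_k - x_j), computed once from the data."""
    mu = np.empty(len(nodes))
    for k, xk in enumerate(nodes):
        mu[k] = fvals[k] / np.prod([xk - xj for j, xj in enumerate(nodes) if j != k])
    return mu

def eval_p(x, nodes, mu):
    """p(x) = prod_j (x - x_j) * sum_k mu_k / (x - x_k), for x not a node."""
    return np.prod(x - nodes) * np.sum(mu / (x - nodes))

nodes = np.array([0.0, 1.0, 2.0, 3.0])
fvals = (nodes - 3.0)**3                  # f(x) = (x - 3)^3, as in SAQ S1
mu = precompute_mu(nodes, fvals)
print(eval_p(6.0, nodes, mu))             # 27.0, agreeing with solution S1
```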

S4 First calculate
$$f(-5) = \tfrac{1}{26}, \quad f(0) = 1, \quad f(5) = \tfrac{1}{26}.$$
The unique quadratic p(x) taking these values is of the form $p(x) = 1 - ax^2$,
a > 0. To find a we use
$$\tfrac{1}{26} = 1 - a \cdot 5^2 \;\Rightarrow\; a = \tfrac{1}{26}.$$
Now, with n = 2 we have $x_{3/2} = \tfrac{5}{2}$ and
$$f\left(\tfrac{5}{2}\right) = \frac{1}{1 + \tfrac{25}{4}} = \frac{4}{29} = 0.137\,931\,034,$$
$$p\left(\tfrac{5}{2}\right) = 1 - \tfrac{1}{26} \cdot \tfrac{25}{4} = \tfrac{79}{104} = 0.759\,615\,384.$$
Thus
$$f\left(\tfrac{5}{2}\right) - p\left(\tfrac{5}{2}\right) = -0.621\,684\,35,$$
and the verification is complete.

S5 (a) $T_4(x) = 2xT_3(x) - T_2(x) = 2x(4x^3 - 3x) - (2x^2 - 1) = 8x^4 - 8x^2 + 1$,
$T_5(x) = 2x(8x^4 - 8x^2 + 1) - (4x^3 - 3x) = 16x^5 - 20x^3 + 5x$.
(b) $T_{2n}(x) = \cos(2n\cos^{-1}x) = 2\cos^2(n\cos^{-1}x) - 1 = 2T_n(x)^2 - 1$.

S6 Let X be the interpolation operator for the equally-spaced points $x_0 = -5$,
$x_1 = 0$, $x_2 = 5$. Then
$$\|X\|_\infty = \max\{\|Xf\|_\infty : \|f\|_\infty = 1\} = \max\{\|p\|_\infty : p \in P_2,\ |p(x_i)| \le 1,\ i = 0, 1, 2\}.$$
As in Problem P4, Chapter 3, the maximum is attained when $p(x_0)$, $p(x_1)$,
$p(x_2) = \pm 1$, and so we need consider only these cases. It is easy to see that
the maximum must occur for $p(x_0) = -1$, $p(x_1) = 1$, $p(x_2) = 1$, with the
polynomial
$$p(x) = \tfrac{5}{4} - \tfrac{1}{25}\left(x - \tfrac{5}{2}\right)^2 \;\Rightarrow\; \|p\|_\infty = p\left(\tfrac{5}{2}\right) = 1.25.$$
This confirms the first entry in the left-hand column of Table 4.5.
In the right-hand column, Powell uses the Chebyshev interpolation points
given by (4.27), scaled so that the initial and final points map to -5 and 5,
respectively (see the discussion before Table 4.5). In the first entry the points
given by (4.27) are $-\sqrt{3}/2$, 0, $\sqrt{3}/2$, so $\|X\|_\infty$ is calculated using $x_0 = -5$,
$x_1 = 0$, $x_2 = 5$ again. Hence $\|X\|_\infty = 1.25$ again, as required.
Note that if $\|X\|_\infty$ is calculated on [-1, 1] using the points in (4.27), then the
values obtained are different from those in the right-hand column. This is
because the points in (4.27) do not include ±1. A formula for $\|X\|_\infty$,
calculated in the latter way, is given in Powell Exercise 12.6.

Alternatively, Theorem 4.3 can be used. Here is the calculation for
equally-spaced points $x_0 = -5$, $x_1 = 0$, $x_2 = 5$. First
$$\ell_0(x) = \frac{(x-0)(x-5)}{(-5-0)(-5-5)} = \tfrac{1}{50}x(x-5),$$
$$\ell_1(x) = \frac{(x-(-5))(x-5)}{(0-(-5))(0-5)} = -\tfrac{1}{25}(x^2 - 25),$$
$$\ell_2(x) = \frac{(x-(-5))(x-0)}{(5-(-5))(5-0)} = \tfrac{1}{50}x(x+5).$$
Now, for $0 \le x \le 5$,
$$\sum_{k=0}^{2} |\ell_k(x)| = \tfrac{1}{50}x(5-x) + \tfrac{1}{25}(25 - x^2) + \tfrac{1}{50}x(x+5) = \tfrac{1}{25}(25 + 5x - x^2),$$
and the maximum of this expression occurs when $x = \tfrac{5}{2}$. Hence
$$\max_{0 \le x \le 5} \sum_{k=0}^{2} |\ell_k(x)| = \tfrac{1}{25}\left(25 + 5 \cdot \tfrac{5}{2} - \left(\tfrac{5}{2}\right)^2\right) = \tfrac{5}{4}.$$
By symmetry, the maximum of this sum will be the same for $-5 \le x \le 0$ and
so $\|X\|_\infty = 5/4$, as required.

Solutions to Problems in Chapter 4


P1 By Theorem 4.2, applied on [0, 1] to the interpolation points {0, 0.7} and
{0.7, 1}, there exist $\xi_0, \xi_1 \in [0, 1]$ such that
$$e_0(x) = f(x) - p_0(x) = \tfrac{1}{2}x(x - 0.7)f^{(2)}(\xi_0), \quad x \in [0, 1],$$
$$e_1(x) = f(x) - p_1(x) = \tfrac{1}{2}(x - 0.7)(x - 1)f^{(2)}(\xi_1), \quad x \in [0, 1],$$
where $p_0, p_1 \in P_1$ with
$$p_0(0) = 0, \quad p_0(0.7) = p_1(0.7) = 0.7, \quad p_1(1) = 0.1.$$
Now note that
$$|x(x - 0.7)| \le |(x - 0.7)(x - 1)| \;\Leftrightarrow\; |x| \le |x - 1| \;\Leftrightarrow\; x \le \tfrac{1}{2}.$$
Therefore (assuming that we have no information about $f^{(2)}$) it is best to use
$p_0(x)$ for $0 < x < \tfrac{1}{2}$ and $p_1(x)$ for $\tfrac{1}{2} < x < 1$.
Applying these error estimates at x = 0.5 itself, where $p_0(0.5) = 0.5$ and
$p_1(0.5) = 1.1$, we obtain
$$f(0.5) - 0.5 = -0.05 f^{(2)}(\xi_0) \le 0.05\,\|f^{(2)}\|_\infty,$$
$$f(0.5) - 1.1 = 0.05 f^{(2)}(\xi_1) \ge -0.05\,\|f^{(2)}\|_\infty.$$
Hence
$$1.1 - 0.05\,\|f^{(2)}\|_\infty \le f(0.5) \le 0.5 + 0.05\,\|f^{(2)}\|_\infty,$$
as required. Thus
$$\|f^{(2)}\|_\infty \ge \frac{1.1 - 0.5}{0.05 + 0.05} = 6.$$

P2 According to Theorem 4.2, the error in interpolating $f(x) = \cos x$ over
$[k\pi/n_1, (k+1)\pi/n_1]$ by $p_1 \in P_1$, such that $p_1(k\pi/n_1) = f(k\pi/n_1)$ and
$p_1((k+1)\pi/n_1) = f((k+1)\pi/n_1)$, is at most
$$\max_{\frac{k\pi}{n_1} \le x \le \frac{(k+1)\pi}{n_1}} \left| \tfrac{1}{2}\left(x - \frac{k\pi}{n_1}\right)\left(x - \frac{(k+1)\pi}{n_1}\right) \right|\,\|f^{(2)}\|_\infty.$$
Since $\|f^{(2)}\|_\infty \le 1$ and the maximum of $|(x-a)(x-b)|$ on [a, b] is
$((b-a)/2)^2$, we deduce that
$$\|f - p_1\|_\infty \le \frac{1}{2}\left(\frac{\pi}{2n_1}\right)^2 = \frac{\pi^2}{8n_1^2}.$$
To guarantee that this error is less than $10^{-6}$ it is, therefore, sufficient for $n_1$
to satisfy
$$n_1 > \frac{10^3 \pi}{\sqrt{8}} = 1110.7\ldots.$$
Again by Theorem 4.2, the error in interpolating $f(x) = \cos x$ over
$[k\pi/n_2, (k+2)\pi/n_2]$ by $p_2 \in P_2$, such that $p_2(k\pi/n_2) = f(k\pi/n_2)$,
$p_2((k+1)\pi/n_2) = f((k+1)\pi/n_2)$, $p_2((k+2)\pi/n_2) = f((k+2)\pi/n_2)$, is at
most
$$\max_{\frac{k\pi}{n_2} \le x \le \frac{(k+2)\pi}{n_2}} \left| \tfrac{1}{6}\left(x - \frac{k\pi}{n_2}\right)\left(x - \frac{(k+1)\pi}{n_2}\right)\left(x - \frac{(k+2)\pi}{n_2}\right) \right|\,\|f^{(3)}\|_\infty.$$
Since $\|f^{(3)}\|_\infty \le 1$ and the maximum of $|(x-a)(x-\tfrac{1}{2}(a+b))(x-b)|$ on
[a, b] is $(2\sqrt{3}/9)((b-a)/2)^3$, we deduce that
$$\|f - p_2\|_\infty \le \frac{1}{6} \cdot \frac{2\sqrt{3}}{9}\left(\frac{\pi}{n_2}\right)^3 = \frac{\pi^3}{3^{5/2}\,n_2^3}.$$
To guarantee that this error is less than $10^{-6}$ it is, therefore, sufficient for $n_2$
to satisfy
$$n_2 > \frac{10^2 \pi}{3^{5/6}} = 125.7\ldots.$$

P3 Clearly this exercise is related to Theorem 4.2. To solve it we must find a
substitute for the function g of (4.14). It is natural to consider the function
$$g(t) = f(t) - p(t) - e(x)\,\frac{t^n(t-1)^n}{x^n(x-1)^n}, \quad 0 \le t \le 1,$$
and to try and show that $g^{(2n)}(\xi) = 0$, for some $\xi \in [0, 1]$, since this would give
$$0 = f^{(2n)}(\xi) - e(x)\,\frac{(2n)!}{x^n(x-1)^n},$$
as required.
To prove that $g^{(2n)}(\xi) = 0$ for some ξ it is sufficient to prove that $g^{(n)} = 0$ at
n + 1 distinct points in [0, 1], since we can then apply Rolle's theorem n
times, as in the proof of Theorem 4.2.
By the definition of g we have g(x) = 0, and
$$g^{(k)}(0) = g^{(k)}(1) = 0, \quad k = 0, 1, \dots, n-1.$$
If we apply Rolle's theorem to g on [0, x] and on [x, 1], then we deduce that
there are two distinct points $x_0 \in (0, x)$ and $x_1 \in (x, 1)$ such that
$$g'(x_0) = g'(x_1) = 0.$$
Now apply Rolle's theorem to g' on $[0, x_0]$, $[x_0, x_1]$, $[x_1, 1]$, to deduce that
there are three distinct points in (0, 1) at which $g^{(2)} = 0$. Continuing in this
way, we apply Rolle's theorem n times to deduce that there are indeed n + 1
distinct points in (0, 1) at which $g^{(n)} = 0$, as required.

P4 Since
$$x_i = \cos\left(\frac{[2(n-i)+1]\pi}{2(n+1)}\right), \quad i = 0, 1, \dots, n,$$
and the function $f(x) = \cos x$ is concave on $[0, \pi/2]$ and convex on $[\pi/2, \pi]$,
the maximum gap occurs in the middle of the range and the minimum gap
occurs at the ends.
If n is even, then the maximum gap ($i = \tfrac{1}{2}n$, $i + 1 = \tfrac{1}{2}n + 1$) is
$$\cos\left(\frac{(n-1)\pi}{2(n+1)}\right) - \cos\left(\frac{(n+1)\pi}{2(n+1)}\right) = 2\sin\left(\frac{n\pi}{2(n+1)}\right)\sin\left(\frac{\pi}{2(n+1)}\right) < 2\sin\left(\frac{\pi}{2(n+1)}\right) < \frac{\pi}{n+1},$$
since $\sin x < x$, for $x > 0$.
If n is odd, then the maximum gap ($i = \tfrac{1}{2}(n-1)$, $i + 1 = \tfrac{1}{2}(n+1)$) is
$$\cos\left(\frac{n\pi}{2(n+1)}\right) - \cos\left(\frac{(n+2)\pi}{2(n+1)}\right) = 2\sin\left(\frac{(n+1)\pi}{2(n+1)}\right)\sin\left(\frac{\pi}{2(n+1)}\right) = 2\sin\left(\frac{\pi}{2(n+1)}\right) < \frac{\pi}{n+1}.$$
Since the gap for n + 1 equally-spaced points is 2/n, the desired factor is
indeed less than π/2 in both cases.
The minimum gap ($i = n - 1$, $i + 1 = n$) is
$$\cos\left(\frac{\pi}{2(n+1)}\right) - \cos\left(\frac{3\pi}{2(n+1)}\right) = 2\sin\left(\frac{\pi}{2(n+1)}\right)\sin\left(\frac{\pi}{n+1}\right).$$
Thus the ratio of the maximum to the minimum gap is
$$\begin{cases} \sin\left(\dfrac{n\pi}{2(n+1)}\right) \Big/ \sin\left(\dfrac{\pi}{n+1}\right), & n \text{ even}, \\[2ex] 1 \Big/ \sin\left(\dfrac{\pi}{n+1}\right), & n \text{ odd}. \end{cases}$$
It is evident that
$$\frac{1}{\sin\left(\frac{\pi}{n+1}\right)} > \frac{1}{\frac{\pi}{n+1}} = \frac{n+1}{\pi},$$
so the required lower estimate clearly holds for n odd. For n even, there is a
little more work to do, since $\sin(n\pi/2(n+1)) < 1$. However,
$$\sin\left(\frac{n\pi}{2(n+1)}\right) = \sin\left(\frac{\pi}{2} - \frac{\pi}{2(n+1)}\right) = \cos\left(\frac{\pi}{2(n+1)}\right)$$
and
$$\sin\left(\frac{\pi}{n+1}\right) = 2\sin\left(\frac{\pi}{2(n+1)}\right)\cos\left(\frac{\pi}{2(n+1)}\right),$$
so that
$$\frac{\sin\left(\frac{n\pi}{2(n+1)}\right)}{\sin\left(\frac{\pi}{n+1}\right)} = \frac{1}{2\sin\left(\frac{\pi}{2(n+1)}\right)} > \frac{1}{2 \cdot \frac{\pi}{2(n+1)}} = \frac{n+1}{\pi},$$
which completes the solution.

P5 We give a solution along the lines of that used to find $\|X\|_\infty$ in Problem P4,
Chapter 3. Note that Theorem 4.3 cannot be used because we are not
interpolating by a general element of $P_3$. To find $\|X\|_\infty$, we must find the
maximum on [0, 3] of $|p(x)| = |c_0 + c_1 x + c_3 x^3|$, where $|p(0)| \le 1$, $|p(2)| \le 1$
and $|p(3)| \le 1$; once again, by the linear programming argument, we need
consider only the cases $p(0), p(2), p(3) = \pm 1$. The critical cases are sketched
below.

[Figure: sketches of the three critical cases (a), (b), (c) on [0, 3], each taking the values ±1 at the points 0, 2, 3.]

In fact, case (c) gives the greatest value of |p(x)| in [0, 3]. In this case we have
$p(0) = 1$, $p(2) = 1$ and $p(3) = -1$, so that
$$p(x) = 1 + \tfrac{8}{15}x - \tfrac{2}{15}x^3 \;\Rightarrow\; \|p\|_\infty = p(2/\sqrt{3}) = 1 + \tfrac{32}{45\sqrt{3}},$$
as required.

Chapter 5 Divided differences
In Chapter 4 we found that interpolation can provide a good method of
determining a polynomial approximation to a given function. This chapter is
devoted to a good method of calculating such an interpolating polynomial using a
formula due to Newton which involves divided differences.
This chapter splits into TWO study sessions:
Study session 1: Sections 5.1, 5.2 and 5.3.
Study session 2: Sections 5.4 and 5.5.

Study Session 1: Basic properties of divided differences

Read Sections 5.1, 5.2 and 5.3

Commentary
1. The definition of the divided difference given in Section 5.1 makes it clear
that f [x0 , x1 , . . . , xn ] is independent of the order in which the points
x0 , x1 , . . . , xn appear. For example
f [x0 , x1 , x2 , x3 ] = f [x1 , x3 , x0 , x2 ].

2. The remarks at the bottom of page 47 will make more sense after you have
read how to calculate divided differences in Section 5.3.

3. The key features of the Newton formula (5.12) are that, for
k = 0, 1, . . . , n − 1,
(a) the first k + 1 terms comprise the polynomial pk ∈ Pk which interpolates
f at x0 , x1 , . . . , xk ;
(b) the (k + 2)th term is an estimate for the error in the approximation of f
by pk .
If a large number of function values are available, therefore, Newton’s
formula should give better and better approximations to f by choosing more
and more interpolation points. By checking the size of each additional term
calculated, one can decide when further interpolation points are of no help.

4. Some special cases of (5.14) are
$$f[x_j, x_{j+1}] = \frac{f[x_{j+1}] - f[x_j]}{x_{j+1} - x_j}, \qquad
f[x_j, x_{j+1}, x_{j+2}] = \frac{f[x_{j+1}, x_{j+2}] - f[x_j, x_{j+1}]}{x_{j+2} - x_j}.$$

The following diagram may help to interpret (5.14) in general.

    x_j        f(x_j)
    x_{j+1}    f(x_{j+1})
      ...                   f[x_j, ..., x_{j+k}]
                                                   f[x_j, ..., x_{j+k+1}]
      ...                   f[x_{j+1}, ..., x_{j+k+1}]
    x_{j+k}    f(x_{j+k})
    x_{j+k+1}  f(x_{j+k+1})

The (k + 1)th divided difference is found using the two adjacent terms in the
previous column and the corresponding x values at the ends of the diagonals.

5. The method of calculating divided differences given in Theorem 5.3 explains
the remarks at the bottom of page 47. For example, if the data
$f(x_0), f(x_1), \dots, f(x_n)$ is given and $h_i = x_i - x_{i-1}$, $i = 1, 2, \dots, n$, then
adding ε to $f(x_2)$ has the following effect on the table (note how the errors
grow rapidly if the h values are small).

    x_0    0
                  0
    x_1    0                +ε/(h_1 h_2 + h_2^2)
                  +ε/h_2
    x_2    +ε               -ε/(h_2 h_3)             ...
                  -ε/h_3
    x_3    0                +ε/(h_3^2 + h_3 h_4)
                  0
    x_4    0

The pattern which emerges in the case of equally-spaced data is investigated
in Powell Exercise 5.3 and exploited in Powell Exercise 5.8.

6. When evaluating (5.12) it is sometimes convenient to use the nested form
$$p_n(x) = f(x_0) + (x - x_0)\bigl(f[x_0, x_1] + (x - x_1)\bigl(f[x_0, x_1, x_2] + \cdots + (x - x_{n-1})f[x_0, x_1, \dots, x_n]\bigr) \cdots \bigr).$$
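As an illustration (our own sketch, not Powell's code), the recurrence (5.14) and this nested form can be combined into a few lines; the data below are those of SAQ S1.

```python
import numpy as np

def newton_coeffs(x, f):
    """Return f[x_0], f[x_0,x_1], ..., f[x_0,...,x_n] via the recurrence (5.14)."""
    c = np.array(f, dtype=float)
    for k in range(1, len(x)):
        # After pass k, c[i] holds f[x_{i-k}, ..., x_i] for i >= k; the entries
        # c[0..k-1] are already the leading divided differences f[x_0,...,x_i].
        c[k:] = (c[k:] - c[k-1:-1]) / (x[k:] - x[:-k])
    return c

def newton_eval(t, x, c):
    """Evaluate (5.12) by nested multiplication, innermost coefficient first."""
    p = c[-1]
    for xk, ck in zip(x[-2::-1], c[-2::-1]):
        p = ck + (t - xk) * p
    return p

x = np.array([-2.0, -1.0, 2.0, 3.0, 4.0])        # data of SAQ S1
f = np.array([3.28, 17.36, 14.96, 19.28, 36.16])
c = newton_coeffs(x, f)                          # [3.28, 14.08, -3.72, 1.0, 0.0]
print(newton_eval(4.0, x, c))                    # 36.16, as expected
```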

Self-assessment questions
S1 Powell Exercise 5.1

S2 Powell Exercise 5.2

Study Session 2: Numerical considerations and
Hermite interpolation

Read Sections 5.4 and 5.5

Commentary
1. The method of interpolation by calculating the coefficients $c_i$, $i = 0, 1, \dots, n$,
in $p(x) = \sum_{i=0}^{n} c_i x^i$ is, of course, convenient for interpolating by very low
degree polynomials, where the coefficients can be found exactly. For higher
degree polynomials, however, it is difficult to calculate the coefficients with
sufficient accuracy because the corresponding matrix equation may be
ill-conditioned.
2. The assertion at the bottom of page 54 that p is a multiple of $\prod_{i=0}^{m} (x - x_i)^{l_i + 1}$
is true because
$$p^{(j)}(x_i) = f^{(j)}(x_i) = 0, \quad j = 0, 1, \dots, l_i, \quad i = 0, 1, \dots, m.$$
This implies that the Taylor expansion of p about each $x_i$ begins
$$p(x) = \frac{p^{(l_i + 1)}(x_i)}{(l_i + 1)!}(x - x_i)^{l_i + 1} + \cdots,$$
so that $(x - x_i)^{l_i + 1}$ is a factor of p(x), for each i.

3. The word ‘suitable’ at the top of page 56 can be interpreted to mean ‘valid’.

4. The proof of Theorem 5.5 shows that Hermite interpolation is the limiting
case of Newton’s formula (5.12), which is obtained when various adjacent
interpolation points merge together.

Self-assessment questions
S3 Verify equation (5.19).

S4 Calculate the value p(1.8) given by (5.20) and confirm the value p(1.8) given
by (5.21). (Evaluate these polynomials by nested multiplication:
$a_0 + a_1x + a_2x^2 + \cdots + a_nx^n = a_0 + x(a_1 + \cdots + x(a_{n-1} + a_nx) \cdots)$.)

S5 Verify that the polynomial (5.29) satisfies the last two interpolation
conditions in (5.28).

Problems for Chapter 5


P1 Powell Exercise 5.3

P2 Powell Exercise 5.4

P3 Powell Exercise 5.5

P4 Powell Exercise 5.7

P5 Powell Exercise 5.8

Solutions to SAQs in Chapter 5
S1 The table is as follows.

    x_i   f(x_i)   Order 1   Order 2   Order 3   Order 4
    -2     3.28
                    14.08
    -1    17.36              -3.72
                    -0.8                 1.0
     2    14.96               1.28                   0
                     4.32                1.0
     3    19.28               6.28
                    16.88
     4    36.16

Thus
$$p_4(x) = 3.28 + 14.08(x + 2) - 3.72(x + 2)(x + 1) + (x + 2)(x + 1)(x - 2)$$
and so
$$p_4(4) = 3.28 + 14.08 \times 6 - 3.72 \times 6 \times 5 + 6 \times 5 \times 2 = 36.16,$$
as expected. Note that $p_4(x) = p_3(x)$ in this example.

S2 The required formula for $p_n'(x_0)$ follows from (5.12) by noting that, for
$k = 1, 2, \dots, n-1$,
$$\frac{d}{dx}(x - x_0)\cdots(x - x_k) = (x - x_1)\cdots(x - x_k) + (x - x_0)\frac{d}{dx}(x - x_1)\cdots(x - x_k)$$
and so
$$\frac{d}{dx}(x - x_0)\cdots(x - x_k)\Big|_{x = x_0} = (x_0 - x_1)\cdots(x_0 - x_k).$$
Thus
$$p'(2) = f[2, 3] + (2 - 3)f[2, 3, 4] + (2 - 3)(2 - 4)f[2, 3, 4, -1] + (2 - 3)(2 - 4)(2 + 1)f[2, 3, 4, -1, -2].$$
By Comment 1 on page 47 and the above table,
$$f[2, 3] = f[3, 2] = 4.32,$$
$$f[2, 3, 4] = f[4, 3, 2] = 6.28,$$
$$f[2, 3, 4, -1] = f[4, 3, 2, -1] = 1,$$
$$f[2, 3, 4, -1, -2] = f[4, 3, 2, -1, -2] = 0.$$
Thus
$$p'(2) = 4.32 + (2 - 3) \times 6.28 + (2 - 3)(2 - 4) = 0.04.$$
The ordering $x_0 = 2$, $x_1 = -1$, $x_2 = 3$, $x_3 = -2$, $x_4 = 4$ also allows us to
obtain the divided differences
$$f[2, -1], \quad f[2, -1, 3], \quad f[2, -1, 3, -2], \quad f[2, -1, 3, -2, 4]$$
without compiling a fresh table. With this ordering,
$$p'(2) = f[2, -1] + (2 + 1)f[2, -1, 3] + (2 + 1)(2 - 3)f[2, -1, 3, -2] + (2 + 1)(2 - 3)(2 + 2)f[2, -1, 3, -2, 4]$$
$$= -0.8 + 3 \times 1.28 + 3 \times (-1) \times 1 + 0 = 0.04,$$
once again.

S3 In exact arithmetic
p(1.8) = 0.0823 − 0.2 × 0.236 33 + 0.2 × 0.17 × 0.329
− 0.2 × 0.17 × 0.1 × 0.328 87 + 0.2 × 0.17 × 0.1 × 0.04 × 0.5008
= 0.0823 − 0.047 266 + 0.011 186 − 0.001 118 158 + 0.000 068 108 8
= 0.045 169 950 8.
(Note that using nested multiplication here may lose you a couple of digits at
the end.)

S4 Using (5.20),
p(1.8) = 6.700 98 + 1.8(−13.360 21 + 1.8(10.3856 + 1.8(−3.692 41 + 1.8 × 0.502 72)))
= 0.045 164 35,
which agrees with the data to 4 places of decimals. Using (5.21),
p(1.8) = 6.701 + 1.8(−13.36 + 1.8(10.386 + 1.8(−3.6924 + 1.8 × 0.502 72)))
= 0.046 916 672,
which agrees with the data to only 2 places of decimals.

S5 Since 1.8 − 1.6 = 0.2, 1.8 − 1.7 = 0.1 and 1.8 − 1.8 = 0,
$$p(1.8) = 0.082\,297 + 0.2\bigl(-0.246\,892 + 0.2(0.335\,92 + 0.1 \times (-0.297\,35))\bigr) = 0.045\,166,$$
as required. Now
$$\frac{d}{dx}(x - 1.6)^2 = 2(x - 1.6),$$
$$\frac{d}{dx}(x - 1.6)^2(x - 1.7) = 2(x - 1.6)(x - 1.7) + (x - 1.6)^2,$$
$$\frac{d}{dx}(x - 1.6)^2(x - 1.7)(x - 1.8) = 2(x - 1.6)(x - 1.7)(x - 1.8) + (x - 1.6)^2(2x - 3.5).$$
Hence
$$p'(1.8) = -0.246\,892 + 0.335\,92 \times 2 \times 0.2 - 0.297\,35(2 \times 0.2 \times 0.1 + 0.2 \times 0.2) + 0.203\,75(0.2 \times 0.2 \times 0.1) = -0.135\,497,$$
as required.
Remark In more complicated examples, one might use a more systematic
approach to calculate p'(x), where
$$p(x) = a_0 + (x - x_0)(a_1 + (x - x_1)(a_2 + \cdots + (x - x_{n-1})(a_n + a_{n+1}(x - x_n)) \cdots)).$$
Put
$$p(x) = q_0(x) = a_0 + (x - x_0)q_1(x) = a_0 + (x - x_0)(a_1 + (x - x_1)q_2(x)) = \cdots,$$
so that
$$q_n(x) = a_n + a_{n+1}(x - x_n) \quad \text{and} \quad q_{n+1}(x) = a_{n+1}.$$
Then
$$q_k(x) = a_k + (x - x_k)q_{k+1}(x)$$
and so
$$q_k'(x) = q_{k+1}(x) + (x - x_k)q_{k+1}'(x).$$
Hence, by induction,
$$p'(x) = q_1(x) + (x - x_0)\bigl(q_2(x) + \cdots + (x - x_{n-2})\bigl(q_n(x) + (x - x_{n-1})q_{n+1}(x)\bigr) \cdots \bigr).$$
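In code this systematic approach is only a few lines; below is a minimal sketch (ours), checked against the value $p'(2) = 0.04$ found in solution S2 of this chapter.

```python
def newton_eval_with_derivative(t, x, a):
    """Evaluate p(t) and p'(t) together, using the recurrences
    q_k = a_k + (t - x_k) q_{k+1}  and  q_k' = q_{k+1} + (t - x_k) q_{k+1}'."""
    p, dp = a[-1], 0.0
    for k in range(len(a) - 2, -1, -1):
        dp = p + (t - x[k]) * dp    # uses the old p, which equals q_{k+1}(t)
        p = a[k] + (t - x[k]) * p
    return p, dp

# Data of SAQ S2, with the nodes ordered 2, -1, 3, -2, 4.
x = [2.0, -1.0, 3.0, -2.0, 4.0]
a = [14.96, -0.8, 1.28, 1.0, 0.0]   # divided differences in this ordering
print(newton_eval_with_derivative(2.0, x, a))   # (14.96, 0.04)
```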

Solutions to Problems in Chapter 5
P1 First note that with the given values of $x_i$, we have
$$\prod_{\substack{j=0 \\ j \neq k}}^{n} (x_k - x_j) = k!h^k \cdot (n-k)!(-h)^{n-k} = (-1)^{n-k}h^n\,k!(n-k)!,$$
so that
$$f[x_0, x_1, \dots, x_n] = h^{-n} \sum_{k=0}^{n} (-1)^{n-k}\,\frac{f(x_k)}{k!(n-k)!}.$$
To verify that this formula is consistent with Theorem 5.3 we note that
$$f[x_j, \dots, x_{j+k+1}] = h^{-k-1} \sum_{i=0}^{k+1} (-1)^{k+1-i}\,\frac{f(x_{i+j})}{i!(k+1-i)!},$$
$$f[x_j, \dots, x_{j+k}] = h^{-k} \sum_{i=0}^{k} (-1)^{k-i}\,\frac{f(x_{i+j})}{i!(k-i)!},$$
$$f[x_{j+1}, \dots, x_{j+k+1}] = h^{-k} \sum_{i=0}^{k} (-1)^{k-i}\,\frac{f(x_{i+j+1})}{i!(k-i)!}$$
and
$$x_{j+k+1} - x_j = (k+1)h.$$
Now, the coefficient of $f(x_{i+j})$ in
$$\frac{f[x_{j+1}, \dots, x_{j+k+1}] - f[x_j, \dots, x_{j+k}]}{x_{j+k+1} - x_j}$$
is
$$\frac{1}{(k+1)h^{k+1}} \left( \frac{(-1)^{k-(i-1)}}{(i-1)!(k-(i-1))!} - \frac{(-1)^{k-i}}{i!(k-i)!} \right)
= \frac{(-1)^{k+1-i}}{(k+1)h^{k+1}} \left( \frac{1}{(i-1)!(k+1-i)!} + \frac{1}{i!(k-i)!} \right)
= \frac{(-1)^{k+1-i}}{h^{k+1}\,i!(k+1-i)!},$$
which is the coefficient of $f(x_{i+j})$ in $f[x_j, \dots, x_{j+k+1}]$. Hence the recurrence
relation (5.14) does indeed hold.

P2 The table can be reconstructed from the first entry in each column using
(5.14) in the form
f [xj+1 , . . . , xj+k+1 ] = f [xj , . . . , xj+k ] + (xj+k+1 − xj )f [xj , . . . , xj+k+1 ].
For example,
f [1.8, 1.76, 1.7, 1.63] = f [1.76, 1.7, 1.63, 1.6] + (1.8 − 1.6)f [1.8, 1.76, 1.7, 1.63, 1.6]
= −0.328 87 + 0.2 × 0.500 80
= −0.228 71,

f [1.76, 1.7, 1.63] = f [1.7, 1.63, 1.6] + (1.76 − 1.6)f [1.76, 1.7, 1.63, 1.6]
= 0.329 + 0.16 × (−0.328 87)
= 0.276 380 8,

f [1.8, 1.76, 1.7] = f [1.76, 1.7, 1.63] + (1.8 − 1.63)f [1.8, 1.76, 1.7, 1.63]
= 0.276 380 8 + 0.17 × (−0.228 71)
= 0.237 500 1.

Continuing in this manner, we find
f [1.7, 1.63] = −0.236 33 + (1.7 − 1.6) × 0.329 = −0.203 43,
f [1.76, 1.7] = −0.203 43 + (1.76 − 1.63) × 0.276 380 8 = −0.167 500 49,
f [1.8, 1.76] = −0.167 500 49 + (1.8 − 1.7) × 0.237 500 1 = −0.143 750 48,
f [1.63] = 0.082 30 + (1.63 − 1.6) × (−0.236 33) = 0.075 210 1,
f [1.7] = 0.075 210 1 + (1.7 − 1.63) × (−0.203 43) = 0.060 97,
f [1.76] = 0.060 97 + (1.76 − 1.7) × (−0.167 500 49) = 0.050 919 97,
f [1.8] = 0.050 919 97 + (1.8 − 1.76) × (−0.143 750 48) = 0.045 169 951.

P3 The required table is as follows.

    0   f(0)
               f'(0)
    0   f(0)                (1/2)f''(0)
               f'(0)                            f[1,0] - f'(0) - (1/2)f''(0)
    0   f(0)                f[1,0] - f'(0)                                      f'(1) - 3f[1,0] + 2f'(0) + (1/2)f''(0)
               f[1,0]                           f'(1) - 2f[1,0] + f'(0)
    1   f(1)                f'(1) - f[1,0]
               f'(1)
    1   f(1)

Hence
$$p(x) = f(0) + xf'(0) + \tfrac{1}{2}x^2 f''(0) + x^3\bigl(f[1,0] - f'(0) - \tfrac{1}{2}f''(0)\bigr) + x^3(x-1)\bigl(f'(1) - 3f[1,0] + 2f'(0) + \tfrac{1}{2}f''(0)\bigr).$$
If $f(x) = (x+1)^4$, then f(0) = 1, f'(0) = 4, f''(0) = 12, f(1) = 16,
f'(1) = 32, so that
$$p(x) = 1 + 4x + 6x^2 + x^3(15 - 4 - 6) + x^3(x-1)(32 - 45 + 8 + 6)$$
$$= 1 + 4x + 6x^2 + 5x^3 + x^3(x-1) = 1 + 4x + 6x^2 + 4x^3 + x^4 = (1+x)^4,$$
as required.

P4 It is easy to prove that if $f^{(k)}$ is strictly increasing, then the kth-order
differences are increasing. Indeed, by Theorem 5.1 and Theorem 5.3,
$$f[x_{j+1}, \dots, x_{j+k+1}] - f[x_j, \dots, x_{j+k}] = (x_{j+k+1} - x_j)\,f[x_j, \dots, x_{j+k+1}]
= (x_{j+k+1} - x_j)\,\frac{f^{(k+1)}(\xi)}{(k+1)!},$$
for some ξ in $[x_j, x_{j+k+1}]$. Since $f^{(k)}$ is strictly increasing, we have
$f^{(k+1)}(\xi) \ge 0$. Thus
$$f[x_{j+1}, \dots, x_{j+k+1}] \ge f[x_j, \dots, x_{j+k}].$$
Some extra work is required to prove that this inequality must be strict. If it
is not, then $f[x_j, \dots, x_{j+k+1}] = 0$, and so the polynomial p in $P_{k+1}$ which
interpolates f at $x_j, \dots, x_{j+k+1}$ actually lies in $P_k$. Therefore, $e = f - p$ has
(at least) k + 2 zeros and so, by Rolle's Theorem, $e^{(k)}$ has (at least) 2 zeros.
Hence $f^{(k)} = p^{(k)}$ at (at least) 2 points. But $p^{(k)}$ is a constant and $f^{(k)}$ is
strictly increasing — a contradiction. Hence the kth-order differences are
strictly increasing.

P5 The difference table is as follows.
xi f (xi ) Order 1 Order 2 Order 3
0.0 0.0
0.119 778
0.1 0.119 778 0.009 57
0.129 348 0.000 018
0.2 0.249 126 0.009 588
0.138 936 −0.002 982
0.3 0.388 062 0.006 606
0.145 542 0.009 015
0.4 0.533 604 0.015 621
0.161 163 −0.008 982
0.5 0.694 767 0.006 639
0.167 802 0.003 013
0.6 0.862 569 0.009 652
0.177 454 0.000 005
0.7 1.040 023 0.009 657
0.187 111 0.000 041
0.8 1.227 134 0.009 698
0.196 809 −0.000 015
0.9 1.423 943 0.009 683
0.206 492
1.0 1.630 435
The second-order differences are irregular. Most noticeably, the numbers
0.006 606, 0.015 621, 0.006 639 are substantially different from the other
entries, and this can be traced to an error in the value of f (0.4). Indeed, if the
above value of f (0.4) is increased by ε, then the increases in the second-order
differences are ε, −2ε, ε respectively, so that their average remains constant at
(0.006 606 + 0.015 621 + 0.006 639)/3 = 0.009 622.
For the middle one of these three differences to equal 0.009 622, we require
2ε = 0.015 621 − 0.009 622, that is, ε = 0.002 999 5. It appears likely,
therefore, that f (0.4) should actually be 0.536 604. (Alternatively: note that
the increases in the corresponding third-order differences are ε, −3ε, 3ε, −ε,
so that ε ≈ 0.003.)
Once this error is corrected, notice that the second-order differences are
increasing, apart from the last one, and that the increase from 0.009 657 to
0.009 698 is abnormally large. This can be traced to an error in the value of
f (0.8). Once again an increase of ε in the above value of f (0.8) leads to
increases in the last three second-order differences of ε, −2ε, ε respectively.
The average of these differences is 0.009 679 3̇, which suggests that
2ε = 0.009 698 − 0.009 679 3̇, that is, ε = 0.000 009 3̇. It appears likely,
therefore, that f (0.8) should actually be 1.227 143.
Once these corrections are made, the third-order differences are (giving only
the significant digits)
18, 18, 15, 18, 13, 14, 14, 12,
which is quite regular, since the final digit of the data is probably rounded.

Chapter 6 The uniform convergence of
polynomial approximations
Chapter 6 begins the detailed study of uniform approximation, that is,
approximation of functions in the ∞-norm. The chapter is almost entirely
devoted to the proof of Weierstrass’ approximation theorem, which states that if f
is continuous on [a, b] and ε > 0 is given, then there is a polynomial p such that
$\|p - f\|_\infty \le \varepsilon$. The degree of p is not fixed in advance and will in general depend
on ε and f.
In this chapter we make frequent use of the Binomial Theorem
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k},$$
especially the cases x + y = 1 and x = y = 1. We also require Stirling's
asymptotic formula, $n! \sim \sqrt{2\pi n}\,(n/e)^n$, in the problems.
This chapter splits into TWO study sessions:
Study session 1: Sections 6.1 and 6.2.
Study session 2: Sections 6.3 and 6.4.

Study Session 1: Monotone operators

Read Sections 6.1 and 6.2

Commentary
1. Weierstrass’ Theorem (proved in 1885) is probably the best known result in
approximation theory. Nevertheless, it is still a little surprising, since a
function f can be very badly behaved and yet be continuous. For example,
there are many functions f which are continuous but nowhere differentiable.
One such function (called the ‘blancmange function’ on account of the shape
of its graph) is given by
$$f(x) = \sum_{n=0}^{\infty} \frac{1}{2^n}\,\phi(2^n x), \quad 0 \le x \le 1,$$
where $\phi(x) = |x|$, for $|x| \le \tfrac{1}{2}$, and $\phi(x + n) = \phi(x)$, for $n = 0, \pm 1, \pm 2, \dots$.
The first example of such a continuous, nowhere differentiable function is due
to Weierstrass himself (1872).
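For interest, here is a minimal sketch (ours) which evaluates a truncation of this series; note that φ(x) is just the distance from x to the nearest integer, and the neglected tail of the series after N terms is at most $2^{-N}$.

```python
import numpy as np

def phi(x):
    """Distance from x to the nearest integer (period 1, equal to |x| near 0)."""
    return np.abs(x - np.round(x))

def blancmange(x, terms=50):
    """Partial sum of sum_{n >= 0} phi(2^n x) / 2^n."""
    return sum(phi(2.0**n * x) / 2.0**n for n in range(terms))

print(blancmange(np.linspace(0.0, 1.0, 5)))   # [0.  0.5 0.5 0.5 0. ]
```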

2. Powell proves Weierstrass’ Theorem using Theorem 6.2, which is due to


Korovkin (1953) and Bohman (1952).

3. Note that the assertion in (6.9) uses the fact that a continuous function on a
closed interval [a, b] is uniformly continuous (see commentary on Chapter 1).

Self-assessment questions
S1 Verify the simpler form of the definition of a monotone linear operator L,
given in Section 6.2 (first paragraph).

S2 Verify that (6.11) holds for all x ∈ [a, b].

S3 Powell Exercise 6.1

Study Session 2: The Bernstein operator

Read Sections 6.3 and 6.4

Commentary
 
1. The expression $n!/(k!(n-k)!)$ is commonly referred to as $\binom{n}{k}$ or ${}^n C_k$.

2. The function $B_n f$ can be thought of as a weighted average of the function
values $f(k/n)$, $k = 0, 1, \dots, n$. The weights $\binom{n}{k} x^k (1-x)^{n-k}$ are smooth
functions of x which are relatively large for x near k/n and relatively small
elsewhere (see Powell Exercise 6.6).
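A minimal sketch (ours) of the operator, used here to reproduce the slow convergence examined in Powell Exercise 6.5 (cf. Problem P3 below) for $f(x) = |x - \tfrac{1}{2}|$ at $x = \tfrac{1}{2}$:

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """(B_n f)(x) = sum_k f(k/n) C(n,k) x^k (1 - x)^(n - k)."""
    k = np.arange(n + 1)
    weights = np.array([comb(n, j) for j in k]) * x**k * (1.0 - x)**(n - k)
    return np.sum(weights * f(k / n))

f = lambda t: np.abs(t - 0.5)
for n in (10, 50, 200):
    err = bernstein(f, n, 0.5) - f(0.5)
    print(n, err, 1.0 / np.sqrt(2 * np.pi * n))   # err is close to 1/sqrt(2 pi n)
```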

3. A more careful analysis of the Bernstein operator shows that
$$\|B_n f - f\|_\infty \le \tfrac{3}{2}\,\omega(1/\sqrt{n}),$$
where $\omega(\delta) = \sup\{|f(x) - f(y)| : |x - y| \le \delta\}$ is the modulus of continuity
of f. Powell Exercise 6.5 shows that we cannot hope for a significantly
smaller estimate of the error in approximating a continuous function by the
Bernstein operator (compare Powell Exercise 6.8, however, for an
improvement if $f \in C^{(2)}[0, 1]$).

Self-assessment questions
S4 Verify the identity (6.27).

S5 Determine the modulus of continuity ω of the functions
(a) $f(x) = x^2$ $(0 \le x \le 1)$;
(b) $f(x) = |x - \tfrac{1}{2}|$ $(0 \le x \le 1)$.

S6 Prove that if f is continuous on [a, b], then its modulus of continuity ω satisfies
(a) ω is increasing;
(b) lim ω(δ) = 0;
δ→0
(c) ω(δ1 + δ2 ) ≤ ω(δ1 ) + ω(δ2 ).

Problems for Chapter 6


P1 Powell Exercise 6.2

P2 Powell Exercise 6.3

P3 Powell Exercise 6.5 (Hint: you will need to use Stirling's formula
$n! \sim \sqrt{2\pi n}\,(n/e)^n$.)

P4 Powell Exercise 6.6

P5 Powell Exercise 6.8

Solutions to SAQs in Chapter 6
S1 If L is linear and satisfies
(Lf )(x) ≥ 0, a ≤ x ≤ b,
whenever
f (x) ≥ 0, a ≤ x ≤ b,
then
(Lf )(x) − (Lg)(x) = (L(f − g))(x) ≥ 0, a ≤ x ≤ b,
whenever
f (x) ≥ g(x), a ≤ x ≤ b,
as required.

S2 Case 1 If $|x - \xi| \le \delta$, then
$$q_u(x) \ge f(\xi) + \varepsilon \ge f(x),$$
by (6.9).
Case 2 If $|x - \xi| > \delta$, then
$$q_u(x) \ge f(\xi) + \varepsilon + 2\|f\|_\infty \ge \varepsilon + \|f\|_\infty \quad (\text{since } \|f\|_\infty \ge -f(\xi))$$
$$\ge f(x) \quad (\text{since } \|f\|_\infty \ge f(x)).$$
Thus (6.11) holds in either case.

S3 We know that X is a linear operator, so we can use the simpler definition of
monotone operator discussed in SAQ S1.
Suppose first that $x_0 = a$, $x_1 = b$. If $f(x) \ge 0$, $a \le x \le b$, then $f(a), f(b) \ge 0$,
so that
$$(Xf)(x) = \frac{(b - x)f(a) + (x - a)f(b)}{b - a} \ge 0, \quad a \le x \le b,$$
and hence X is monotone in this case.
Now suppose that x0 > a. We show that X is not monotone by choosing a
non-negative function f on [a, b] such that Xf fails to be non-negative on
[a, b]. A simple example is f (x) = |x − x0 |, since f (x0 ) = 0, f (x1 ) > 0 implies
that the linear interpolating function Xf (x) is negative for a < x < x0 .
A similar counterexample can be given if x1 < b.
S4
$$\sum_{k=0}^{n} \binom{n}{k} \left(\frac{k}{n}\right)^2 x^k (1-x)^{n-k}
= \sum_{k=1}^{n} \binom{n-1}{k-1} \frac{k}{n}\, x^k (1-x)^{n-k}$$
$$= \frac{1}{n} \sum_{k=1}^{n} \binom{n-1}{k-1} x^k (1-x)^{n-k} \bigl((k-1) + 1\bigr)$$
$$= \left(\frac{n-1}{n}\right) x^2 \sum_{k=2}^{n} \binom{n-2}{k-2} x^{k-2} (1-x)^{n-k}
+ \frac{1}{n}\, x \sum_{k=1}^{n} \binom{n-1}{k-1} x^{k-1} (1-x)^{n-k}$$
$$= \left(\frac{n-1}{n}\right) x^2 + \frac{1}{n}\, x,$$
by the Binomial Theorem.

 
S5 (a) By inspection, the largest value of $|x^2 - y^2|$ when $|x - y| \le \delta$, $x, y \in [0, 1]$,
occurs for x = 1, y = 1 - δ. Thus
$$\omega(\delta) = 1 - (1 - \delta)^2 = 2\delta - \delta^2.$$
(b) By inspection, the largest value of $\bigl|\,|x - \tfrac{1}{2}| - |y - \tfrac{1}{2}|\,\bigr|$ when
$|x - y| \le \delta \le \tfrac{1}{2}$, $x, y \in [0, 1]$, occurs for $x = \tfrac{1}{2}$, $y = \tfrac{1}{2} + \delta$. When
$\tfrac{1}{2} < \delta \le 1$, the largest value occurs for $x = \tfrac{1}{2}$, $y = 1$. Thus
$$\omega(\delta) = \begin{cases} \delta, & 0 < \delta \le \tfrac{1}{2}, \\ \tfrac{1}{2}, & \tfrac{1}{2} < \delta \le 1. \end{cases}$$

S6 (a) If 0 < δ1 < δ2 , then


ω(δ1 ) = sup{|f (x) − f (y)| : |x − y| ≤ δ1 }
≤ sup{|f (x) − f (y)| : |x − y| ≤ δ2 }
= ω(δ2 ),
since the second supremum is taken over a larger set.
(b) For each ε > 0, there exists δ > 0 such that
|x − y| ≤ δ ⇒ |f (x) − f (y)| ≤ ε,
and so
0 ≤ ω(δ) ≤ ε.
Hence
lim ω(δ) = 0.
δ→0

(c) If x, y satisfy |x − y| ≤ δ1 + δ2 , then we can choose z such that


|x − z| ≤ δ1 , |z − y| ≤ δ2 . Since
|f (x) − f (y)| ≤ |f (x) − f (z)| + |f (z) − f (y)|
≤ ω(δ1 ) + ω(δ2 ),
we deduce that
ω(δ1 + δ2 ) ≤ ω(δ1 ) + ω(δ2 ),
as required.

Solutions to Problems in Chapter 6
P1 The proof is a generalisation of that of (6.27):
$$\sum_{k=0}^{n} \binom{n}{k} \left(\frac{k}{n}\right)^3 x^k (1-x)^{n-k}
= \sum_{k=1}^{n} \binom{n-1}{k-1} \left(\frac{k}{n}\right)^2 x^k (1-x)^{n-k}$$
$$= \frac{1}{n^2} \sum_{k=1}^{n} \binom{n-1}{k-1} x^k (1-x)^{n-k} \bigl((k-1)(k-2) + 3(k-1) + 1\bigr)$$
$$= \frac{(n-1)(n-2)}{n^2}\, x^3 \sum_{k=3}^{n} \binom{n-3}{k-3} x^{k-3} (1-x)^{n-k}
+ \frac{3(n-1)}{n^2}\, x^2 \sum_{k=2}^{n} \binom{n-2}{k-2} x^{k-2} (1-x)^{n-k}
+ \frac{1}{n^2}\, x \sum_{k=1}^{n} \binom{n-1}{k-1} x^{k-1} (1-x)^{n-k}$$
$$= \frac{(n-1)(n-2)}{n^2}\, x^3 + \frac{3(n-1)}{n^2}\, x^2 + \frac{1}{n^2}\, x,$$
by the Binomial Theorem.
The method generalises to $x^r$, for n > r, because we can always write $k^{r-1}$ as
a linear combination of the expressions
$$(k-1)\cdots(k-r+1),\ (k-1)\cdots(k-r+2),\ \dots,\ (k-1)(k-2),\ (k-1),\ 1.$$
Note that if n = r then $B_n f$ is automatically in $P_r = P_n$.

P2 By (6.23),
$$p(j/6) = \sum_{k=0}^{6} \binom{6}{k} (j/6)^k (1 - j/6)^{6-k} f_k,$$
where $f_k = f(k/6)$, $k = 0, 1, \dots, 6$.
Thus, with the given values of $p(j/6)$, $j = 0, 1, \dots, 6$, we have $0 = p(0) = f_0$
and $0 = p(1) = f_6$, so that
$$0 = p(1/6) = \tfrac{1}{6^6}\bigl(6 \cdot 5^5 f_1 + 15 \cdot 5^4 f_2 + 20 \cdot 5^3 f_3 + 15 \cdot 5^2 f_4 + 6 \cdot 5 f_5\bigr),$$
$$0 = p(1/3) = \tfrac{1}{6^6}\bigl(6 \cdot 4^5 \cdot 2 f_1 + 15 \cdot 4^4 \cdot 2^2 f_2 + 20 \cdot 4^3 \cdot 2^3 f_3 + 15 \cdot 4^2 \cdot 2^4 f_4 + 6 \cdot 4 \cdot 2^5 f_5\bigr),$$
$$1 = p(1/2) = \tfrac{1}{2^6}\bigl(6 f_1 + 15 f_2 + 20 f_3 + 15 f_4 + 6 f_5\bigr),$$
$$0 = p(2/3) = \tfrac{1}{6^6}\bigl(6 \cdot 2^5 \cdot 4 f_1 + 15 \cdot 2^4 \cdot 4^2 f_2 + 20 \cdot 2^3 \cdot 4^3 f_3 + 15 \cdot 2^2 \cdot 4^4 f_4 + 6 \cdot 2 \cdot 4^5 f_5\bigr),$$
$$0 = p(5/6) = \tfrac{1}{6^6}\bigl(6 \cdot 5 f_1 + 15 \cdot 5^2 f_2 + 20 \cdot 5^3 f_3 + 15 \cdot 5^4 f_4 + 6 \cdot 5^5 f_5\bigr),$$
which reduce to
$$3750 f_1 + 1875 f_2 + 500 f_3 + 75 f_4 + 6 f_5 = 0, \qquad (1)$$
$$48 f_1 + 60 f_2 + 40 f_3 + 15 f_4 + 3 f_5 = 0, \qquad (2)$$
$$6 f_1 + 15 f_2 + 20 f_3 + 15 f_4 + 6 f_5 = 64, \qquad (3)$$
$$3 f_1 + 15 f_2 + 40 f_3 + 60 f_4 + 48 f_5 = 0, \qquad (4)$$
$$6 f_1 + 75 f_2 + 500 f_3 + 1875 f_4 + 3750 f_5 = 0. \qquad (5)$$
By considering (1)-(5) and (2)-(4), we find that $f_1 - f_5 = 0$ and $f_2 - f_4 = 0$.
Thus equations (1) to (5) further reduce to
$$3756 f_1 + 1950 f_2 + 500 f_3 = 0, \qquad (6)$$
$$12 f_1 + 30 f_2 + 20 f_3 = 64, \qquad (7)$$
$$51 f_1 + 75 f_2 + 40 f_3 = 0. \qquad (8)$$
Eliminating $f_3$ first from (6), (7) and then from (7), (8) gives
$$3456 f_1 + 1200 f_2 = -1600,$$
$$27 f_1 + 15 f_2 = -128,$$
and so
$$1296 f_1 = 8640 \;\Rightarrow\; f_1 = f_5 = 20/3.$$
Substituting back, we obtain $f_2 = f_4 = -308/15$ and $f_3 = 30$, as required.
P3 Since $f(\tfrac{1}{2}) = 0$, the error in question is
$$(B_n f)(\tfrac{1}{2}) - f(\tfrac{1}{2}) = \frac{1}{2^n} \sum_{k=0}^{n} \binom{n}{k} \left|\frac{k}{n} - \frac{1}{2}\right|.$$
Now, if n is even, then
$$\sum_{k=0}^{n} \binom{n}{k} \left|\frac{k}{n} - \frac{1}{2}\right| = 2 \sum_{k=0}^{n/2} \binom{n}{k} \left(\frac{1}{2} - \frac{k}{n}\right)
= \sum_{k=0}^{n/2} \binom{n}{k} - 2\sum_{k=1}^{n/2} \binom{n-1}{k-1}
= \sum_{k=0}^{n/2} \binom{n}{k} - 2\sum_{k=0}^{n/2-1} \binom{n-1}{k}.$$
Since n is even, n - 1 is odd and so
$$\sum_{k=0}^{n/2} \binom{n}{k} = \frac{1}{2}\left(\sum_{k=0}^{n} \binom{n}{k} + \binom{n}{n/2}\right) = 2^{n-1} + \frac{1}{2}\binom{n}{n/2}$$
and
$$2\sum_{k=0}^{n/2-1} \binom{n-1}{k} = \sum_{k=0}^{n-1} \binom{n-1}{k} = 2^{n-1},$$
by the Binomial Theorem. Hence
$$(B_n f)(\tfrac{1}{2}) - f(\tfrac{1}{2}) = \frac{1}{2^{n+1}} \binom{n}{n/2}.$$
To proceed further we need to use Stirling's formula,
$$n! \sim \sqrt{2\pi n}\,(n/e)^n \quad \text{as } n \to \infty,$$
which gives
$$\binom{n}{n/2} \sim \frac{\sqrt{2\pi n}\,(n/e)^n}{\bigl(\sqrt{\pi n}\,(n/2e)^{n/2}\bigr)^2} = \frac{2^{n+1}}{\sqrt{2\pi n}} \quad \text{as } n \to \infty.$$
Hence
$$(B_n f)(\tfrac{1}{2}) - f(\tfrac{1}{2}) \sim \frac{1}{\sqrt{2\pi n}} \quad \text{as } n \to \infty,$$
as required.
If n is odd, then a similar argument gives
$$(B_n f)(\tfrac{1}{2}) - f(\tfrac{1}{2}) = \frac{1}{2^n} \binom{n-1}{\tfrac{1}{2}(n-1)} \sim \frac{1}{\sqrt{2\pi(n-1)}},$$
again using Stirling's formula.

P4 Each of the functions
$$\phi_{nk}(x) = \binom{n}{k} x^k (1-x)^{n-k}, \quad 0 \le x \le 1,$$
is positive for 0 < x < 1 and vanishes for x = 0, 1 (unless k = 0 or k = n,
respectively). Also, for 0 < k < n,
$$\phi_{nk}'(x) = \binom{n}{k}\bigl(kx^{k-1}(1-x)^{n-k} - x^k(n-k)(1-x)^{n-k-1}\bigr)
= \binom{n}{k} x^{k-1}(1-x)^{n-k-1}\bigl(k(1-x) - (n-k)x\bigr)
= \binom{n}{k} x^{k-1}(1-x)^{n-k-1}(k - nx),$$
so that $\phi_{nk}$ has a unique turning point in (0, 1), namely, a maximum for
$k - nx = 0$, that is, $x = k/n$. This maximum value is
$$\phi_{nk}(k/n) = \binom{n}{k}\left(\frac{k}{n}\right)^k \left(1 - \frac{k}{n}\right)^{n-k}.$$
If we keep k fixed while $n \to \infty$, then by Stirling's formula,
$$\phi_{nk}(k/n) \sim \frac{\sqrt{2\pi n}\,(n/e)^n}{k!\,\sqrt{2\pi(n-k)}\,((n-k)/e)^{n-k}}\left(\frac{k}{n}\right)^k \left(\frac{n-k}{n}\right)^{n-k}
= \sqrt{\frac{n}{n-k}}\;\frac{k^k}{k!\,e^k} \sim \frac{k^k}{k!\,e^k} \quad \text{as } n \to \infty.$$
Note, however, that $k/n \to 0$ as $n \to \infty$, so in this case the peak of height
$e^{-k}k^k/k!$ moves towards the y-axis.
On the other hand, if $\xi = k/n$ remains fixed while n (and hence k) tends to
infinity, then the width of the peak becomes narrower. Indeed, if $\eta \neq \xi$, then
$$\frac{\phi_{nk}(\eta)}{\phi_{nk}(\xi)} = \frac{\eta^k (1-\eta)^{n-k}}{\xi^k (1-\xi)^{n-k}} = \alpha,$$
say, where $0 < \alpha < 1$, because ξ is the maximum of $\phi_{nk}$. Now consider the
sequences $n_p = pn$ and $k_p = pk$, where p is a positive integer. Then
$$\frac{\phi_{n_p k_p}(\eta)}{\phi_{n_p k_p}(\xi)} = \frac{\eta^{pk} (1-\eta)^{pn-pk}}{\xi^{pk} (1-\xi)^{pn-pk}} = \alpha^p.$$
Thus
$$\lim_{p \to \infty} \frac{\phi_{n_p k_p}(\eta)}{\phi_{n_p k_p}(\xi)} = 0,$$
and so the width of the peak at ξ = k/n must tend to 0 as $p \to \infty$. The
height of this peak is in fact
$$\phi_{n_p k_p}(\xi) = \binom{pn}{pk}\,\xi^{pk}(1-\xi)^{pn-pk} \sim \frac{1}{\sqrt{2\pi p k (1-\xi)}} \quad \text{as } p \to \infty,$$
by a further application of Stirling's formula.
These properties of the graphs $y = \phi_{nk}(x)$ are illustrated below in various
special cases.

[Figure: the graphs of $\phi_{31}$, $\phi_{61}$, $\phi_{91}$, $\phi_{62}$ and $\phi_{93}$ on [0, 1], with peaks at $x = 1/9, 1/6, 1/3$ and heights up to about 0.5.]

P5 In the proof of Theorem 6.3, we found that if $q(x) = a + bx + cx^2$, then
$$B_n q - q = B_n p - p,$$
where $p(x) = cx^2$, and that
$$\|B_n q - q\|_\infty = \|B_n p - p\|_\infty = \frac{c}{4n} = \frac{1}{8n}\|q''\|_\infty.$$
The technique used in Theorem 6.2 was to approximate f from above and
below at a point ξ by quadratic functions, and then use:
(a) the convergence of $L_n q$ to q, if q is quadratic;
(b) the fact that the operators $L_n$ are monotone.
In this problem we replace (a) by the explicit estimate for $\|B_n q - q\|_\infty$, given
above. We wish, therefore, to approximate a given function $f \in C^{(2)}[0, 1]$
above and below by quadratics, the approximation being particularly good
near ξ. By Taylor's Theorem,
$$f(x) = f(\xi) + f'(\xi)(x - \xi) + \tfrac{1}{2}f''(\eta)(x - \xi)^2,$$
where η lies between x and ξ. Thus, if
$$q_u(x) = f(\xi) + f'(\xi)(x - \xi) + \tfrac{1}{2}\|f''\|_\infty(x - \xi)^2,$$
$$q_\ell(x) = f(\xi) + f'(\xi)(x - \xi) - \tfrac{1}{2}\|f''\|_\infty(x - \xi)^2,$$
then
$$q_\ell(x) \le f(x) \le q_u(x), \quad x \in [0, 1],$$
so that
$$(B_n q_\ell)(x) \le (B_n f)(x) \le (B_n q_u)(x), \quad x \in [0, 1],$$
since $B_n$ is a monotone operator.
Now $q_u$ and $q_\ell$ are quadratic functions, so that
$$|(B_n q_u)(\xi) - q_u(\xi)| \le \frac{1}{8n}\|q_u''\|_\infty
\quad \text{and} \quad
|(B_n q_\ell)(\xi) - q_\ell(\xi)| \le \frac{1}{8n}\|q_\ell''\|_\infty.$$
Furthermore, by the definitions of $q_\ell$ and $q_u$, $q_\ell(\xi) = f(\xi) = q_u(\xi)$ and
$\|q_\ell''\|_\infty = \|f''\|_\infty = \|q_u''\|_\infty$, so that
$$|(B_n f)(\xi) - f(\xi)| \le \frac{1}{8n}\|f''\|_\infty.$$
Since the expression on the right of this inequality is independent of ξ, we
deduce that
$$\|B_n f - f\|_\infty \le \frac{1}{8n}\|f''\|_\infty,$$
as required.

Chapter 7 The theory of minimax
approximation
In Chapter 7 we consider the problem of approximating a given function
f ∈ C[a, b] by polynomials of fixed degree n in the ∞-norm. The polynomial
which best approximates f in this respect can be characterised rather elegantly
and is in fact unique. The theory can be extended to other linear spaces of
approximating functions which satisfy a criterion known as the ‘Haar condition’.
For conciseness we shall use the abbreviation b.m.a. for ‘best minimax
approximation’.
This chapter splits into TWO study sessions:
Study session 1: Sections 7.1 and 7.2.
Study session 2: Sections 7.3 and 7.4.

Study Session 1: The extreme values of the error function

Read Sections 7.1 and 7.2

Commentary
1. The parameter θ in (7.2) may appear superfluous at first sight, but its rôle
becomes clear in Section 7.2.

2. The following diagram may clarify the final paragraph of Section 7.1.

[Figure: the graphs of f + g and the line $p_3^*$.]

The function $p_3^*$ is the b.m.a. from $P_1$ to both f and to f + g, but the b.m.a.
from $P_1$ to g is not the zero function.

3. The letter used in Section 7.2 to denote the set where the extreme values of
the error function occur is a script Z (Z) with subscript M . Here one must
interpret ‘extreme values of e∗ ’ to mean ‘maximum values of |e∗ |’.

4. The result in the first paragraph of Section 7.2 can be summarised as follows:
if $p^*$ is not a b.m.a. from A to f, then there exists p in A such that
$$\mathrm{sgn}(e^*(x)) = \mathrm{sgn}(p(x)), \quad x \in Z_M,$$
that is,
$$e^*(x)\,p(x) > 0, \quad x \in Z_M.$$
The converse result is:
if $p^*$ is in A, $e^* = f - p^*$ and there exists p in A such that
$$e^*(x)\,p(x) > 0, \quad x \in Z_M,$$
then there exists θ > 0 such that
$$\|f - (p^* + \theta p)\|_\infty < \|f - p^*\|_\infty,$$
so that $p^*$ is not a b.m.a. from A to f.
This converse result is the special case of Theorem 7.1 in which Z = [a, b].

5. The proof of Theorem 7.1 is quite subtle. At a first reading you would do
well to assume that Z = [a, b]. The following diagram (based on the third
part of Figure 7.1) may clarify the rôles played by p, $Z_M$, $Z_0$ and d.

[Figure: the graphs of f, $p^*$ and p, together with the error $e^*$, the sets $Z_M$ and $Z_0$, and the level d.]

Self-assessment questions
S1 Sketch a diagram like the above, which corresponds to the second part of
Figure 7.1.
S2 Can the constant $\tfrac{1}{2}$ in (7.13) be replaced by 1?

Study Session 2: Characterising best minimax approximations

Read Sections 7.3 and 7.4

Commentary
1. The relationship between conditions (1), (2), (3) and (4) is examined in
detail in Appendix A, which will not be assessed. Note, however, that the
equivalence of (1) and (4) is straightforward. Indeed, if $\{\phi_i : i = 0, 1, \dots, n\}$
is a basis of A, then:
(a) the function $f = \sum_{i=0}^{n} \lambda_i \phi_i$ in A is identically zero if and only if
$\lambda = (\lambda_0, \lambda_1, \dots, \lambda_n) = (0, \dots, 0)$;
(b) the function $f = \sum_{i=0}^{n} \lambda_i \phi_i$ in A has zeros at $\xi_j$, $j = 0, 1, \dots, n$, if and only if
$$\sum_{i=0}^{n} \lambda_i \phi_i(\xi_j) = 0, \quad j = 0, 1, \dots, n,$$
that is,
$$P\lambda = 0, \qquad (*)$$
where P is the matrix with entries $\phi_i(\xi_j)$;
(c) equation (∗) has the unique solution λ = 0 if and only if P is
non-singular.
To verify that (4) holds for a given space A we need to check that the matrix
P is non-singular for every set $\{\xi_j : j = 0, 1, \dots, n\}$ of distinct points in [a, b],
where $\{\phi_i : i = 0, 1, \dots, n\}$ is some basis of A. (See SAQ S4 and Powell
Exercise 7.8.)

2. We remark that Haar condition (2) is in fact equivalent to Haar conditions


(1), (3) and (4), contrary to the assertion at the end of Powell Exercise 7.4.
This result came to light during the preparation of these notes, after
Professor M. Stynes (University College, Cork) had pointed out that the
space A in Powell Exercise 7.4 does not in fact provide a counterexample to
this equivalence.
 
3. Points $\{\xi_0^*, \dots, \xi_{n+1}^*\}$ which satisfy (7.17), (7.18) and (7.19) are often called
an alternating set (of length n + 2) for the error function $f - p^*$.
4. The key observation in Theorem 7.3 is that the function
$p^*(x) = x^n - T_n(x)/2^{n-1}$ has the following properties:
(a) $p^* \in P_{n-1}$;
(b) if $f(x) = x^n$, then $f(x) - p^*(x) = T_n(x)/2^{n-1}$ has an alternating set of
length n + 1 in [-1, 1].
Thus $p^*$ is the b.m.a. from $P_{n-1}$ to f on [-1, 1].
Similarly, if $f(x) = c_0 + c_1 x + \cdots + c_n x^n$, then $p^*(x) = f(x) - c_n T_n(x)/2^{n-1}$
is the b.m.a. from $P_{n-1}$ to f on [-1, 1]. Furthermore, the b.m.a. from $P_{n-1}$
to f on an arbitrary closed interval [a, b] can be found by applying the above
technique to the polynomial $f(\phi(x))$, where φ is the linear map from [-1, 1]
onto [a, b] (see Powell Exercise 7.7).
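This construction is easy to carry out numerically; here is a minimal sketch (ours) using NumPy's polynomial classes to recover the b.m.a. from $P_2$ to $x^3$ on [-1, 1]:

```python
import numpy as np
from numpy.polynomial import Chebyshev, Polynomial

n = 3
# T_n has leading coefficient 2^(n-1), so x^n - T_n(x)/2^(n-1) lies in P_(n-1),
# and the error T_n(x)/2^(n-1) equioscillates at n + 1 points of [-1, 1].
Tn = Chebyshev.basis(n).convert(kind=Polynomial)
p_star = Polynomial.basis(n) - Tn / 2**(n - 1)
print(p_star.coef)   # [0.   0.75] -> p*(x) = 3x/4; the minimax error is 1/4
```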

5. In the sentence following the proof of Theorem 7.3, ‘C[a, b]’ should be ‘[a, b]’.

6. Theorem 7.4, and the discussion following it, indicate how to find the b.m.a.
from Pn to the discrete data {(ξi , f (ξi )) : i = 0, 1, . . . , n}. The equations
(7.27) are fundamental to the exchange algorithm, which is discussed in
Chapter 8. The matrix of the system (7.27) is non-singular because if a linear
mapping from Rn+1 to Rn+1 is onto, then it is also one–one.

7. Theorem 7.6 can be proved without the rather awkward Theorem 7.5 and
condition (3), using the more direct method of Powell Exercise 7.6. However,
Theorem 7.5 is needed for Theorem 7.7 (see also page 113).

8. Theorem 7.7 is quite hard to grasp fully at a first reading, but it is an


essential part of the exchange algorithm. The best way to understand the
inequalities (7.32) is to attempt Powell Exercise 7.7.

Self-assessment questions
S3 Explain why Pn satisfies Haar conditions (1) and (2).

S4 Determine whether the following linear spaces satisfy the Haar condition:
(a) the space A spanned by φ0 (x) = 1, φ1 (x) = cos x on [0, π];
(b) the space A spanned by φ0 (x) = 1, φ1 (x) = cos x on [π/2, 3π/2].

S5 Verify that p∗ (x) = x2 + 1/8 is the b.m.a. from P3 to f (x) = |x| on [−1, 1]
(cf. Chapter 3, SAQ S3).

S6 Determine the b.m.a. from P1 to f (x) = sin(πx/2) on [0, 1].

S7 Powell Exercise 7.5

Problems for Chapter 7


P1 Powell Exercise 7.2

P2 Powell Exercise 7.3

P3 Powell Exercise 7.6

P4 Powell Exercise 7.7

P5 Powell Exercise 7.8

Solutions to SAQs in Chapter 7


S1 The diagram is as follows.

[Figure: the graphs of $p^*$ and the error $e^*$, with the sets $Z_M$ and $Z_0$ marked, corresponding to the second part of Figure 7.1.]

S2 No, because if θ = max{|e∗ (x)| : x ∈ Z} − d, then we must have


d + θ = max{|e∗ (x)| : x ∈ Z} and (7.15) would not yield the required strict
inequality in (7.8).

S3 $P_n$ satisfies condition (1) because a polynomial of degree n can have at most
n zeros. $P_n$ satisfies condition (2) because the function
$$p(x) = \prod_{j=1}^{k} (x - \zeta_j)$$
has degree k ≤ n and changes sign precisely at the points $\zeta_j$. Moreover, the
function p(x) = 1 lies in $P_n$ and has no zeros in [a, b].

S4 (a) To verify Haar condition (4) we need to show that, for distinct $\xi_1, \xi_2$ in
[0, π], the matrix
$$\begin{pmatrix} 1 & \cos\xi_1 \\ 1 & \cos\xi_2 \end{pmatrix}$$
is non-singular, that is, $\cos\xi_1 \neq \cos\xi_2$. But this is evident, since cos is
one-one on [0, π], because it is strictly decreasing on [0, π].
(b) Condition (4) fails in this case because the above matrix is singular if, for
example, $\xi_0 = \pi/2$, $\xi_1 = 3\pi/2$, so $\cos\xi_0 = 0 = \cos\xi_1$. Also, consideration
of $\phi_1(x) = \cos x$ at $\xi_0$, $\xi_1$ shows that condition (1) is false.
 
S5 By Theorem 7.2, it is sufficient to note that $\{-1, -\tfrac{1}{2}, 0, \tfrac{1}{2}, 1\}$ is an alternating
set of length 5 (= 3 + 2) for $f(x) - p^*(x)$ with h = -1/8. Note that $p^*$ is also
a b.m.a. from $P_2$ to f (suitable alternating sets are either $\{-1, -\tfrac{1}{2}, 0, \tfrac{1}{2}\}$ or
$\{-\tfrac{1}{2}, 0, \tfrac{1}{2}, 1\}$).

S6 By Theorem 7.2, the error function $e = f - p^*$ must have an alternating set of
length 3. Since f is concave, $p^*(x) = a + bx$ must look as follows.

[Figure: the graphs of $f(x) = \sin(\pi x/2)$ and the line $p^*$ on [0, 1], with the interior extreme point α marked.]

Thus we have
$$f(0) - p^*(0) = \sin 0 - a = h, \qquad (1)$$
$$f(\alpha) - p^*(\alpha) = \sin(\pi\alpha/2) - a - b\alpha = -h, \qquad (2)$$
$$f(1) - p^*(1) = \sin(\pi/2) - a - b = h, \qquad (3)$$
where α is a solution of
$$e'(x) = \frac{\pi}{2}\cos\frac{\pi x}{2} - b = 0.$$
Since (1) and (3) imply that b = 1, we deduce that
$$\alpha = \frac{2}{\pi}\cos^{-1}\frac{2}{\pi} = 0.560\,664\,18.$$
Thus, from (1) and (2),
$$2h = \alpha - \sin(\pi\alpha/2) = -0.210\,513\,662 \;\Rightarrow\; h = -0.105\,256\,831.$$
Since a = -h, we deduce that $p^*(x) = 0.105 + x$.

S7 By Theorem 7.2, the error function $e = f - p^*$ must have an alternating set of
length 4. The required quadratic $p^*$ must surely, therefore, be of the following
form.

[Figure: the graphs of f and the quadratic $p^*$ on [-1, 1], with the extreme points $-1, -\tfrac{1}{2}, \alpha, 1$ marked.]

If $p^*(x) = a + bx + cx^2$, then for $\{-1, -\tfrac{1}{2}, \alpha, 1\}$ to be an alternating set, we
want
$$f(-1) - p^*(-1) = \tfrac{1}{2} - (a - b + c) = h, \qquad (4)$$
$$f(-\tfrac{1}{2}) - p^*(-\tfrac{1}{2}) = 0 - (a - \tfrac{1}{2}b + \tfrac{1}{4}c) = -h, \qquad (5)$$
$$f(\alpha) - p^*(\alpha) = \alpha + \tfrac{1}{2} - (a + \alpha b + \alpha^2 c) = h, \qquad (6)$$
$$f(1) - p^*(1) = \tfrac{3}{2} - (a + b + c) = -h. \qquad (7)$$
Here α is a solution of
$$e'(x) = 1 - b - 2cx = 0 \;\Rightarrow\; \alpha = (1 - b)/2c.$$
Equations (5) and (7) imply that $3/2 - 3b/2 - 3c/4 = 0$, and hence that
α = 1/4. Equations (4) and (7) imply that a + c = 1 and also that
2b = 1 + 2h. Substituting in (5), we find that a = 2h, and then in (6), that
$$h = 9/50, \quad a = 9/25, \quad b = 17/25, \quad c = 16/25.$$
Hence
$$p^*(x) = \tfrac{1}{25}\bigl(9 + 17x + 16x^2\bigr).$$

Solutions to Problems in Chapter 7


P1 According to Theorem 7.1, for $p^*$ to be a b.m.a. from A to f there must be no
p in A such that
$$\bigl(f(\xi_j) - p^*(\xi_j)\bigr)\,p(\xi_j) > 0, \quad j = 1, 2, \dots, r.$$
This implies, for instance, that there is no p in A such that
$$p(\xi_j) = f(\xi_j) - p^*(\xi_j), \quad j = 1, 2, \dots, r,$$
and hence that the linear equations
$$\sum_{i=0}^{n} \lambda_i \phi_i(\xi_j) = f(\xi_j) - p^*(\xi_j), \quad j = 1, 2, \dots, r,$$
have no solutions $\lambda_0, \lambda_1, \dots, \lambda_n$. But this means that the matrix H with
entries $\phi_i(\xi_j)$ fails to have full rank r.

P2 The following special case may help to illuminate this rather slippery problem.
Suppose that all functions in A vanish at a particular point ξ1 . If it turns out
that ξ1 belongs to the set ZM for some approximation p∗ to f , then p∗ is a
b.m.a. to f , since we can obtain no better approximation at the point ξ1 .
In general, the condition
$$\sum_{j=1}^{r} \sigma_j \phi(\xi_j) = 0, \quad \phi \in A, \qquad (8)$$
gives a linear dependence among the values taken at the $\xi_j$ by any member φ
of A. The condition
$$\sigma_j e^*(\xi_j) \ge 0, \quad j = 1, 2, \dots, r, \qquad (9)$$
where $e^* = f - p^*$, implies that $\sigma_j$ and $e^*(\xi_j)$, which are both non-zero, have
the same sign. Thus if φ is any member of A, then
$$e^*(\xi_j)\,\phi(\xi_j) \le 0,$$
for at least one of the j, since otherwise (8) and (9) lead to a contradiction.
Since $\xi_j \in Z_M$ we deduce, by Theorem 7.1, that $p^*$ is a b.m.a. from A to f.

P3 If $q^*$ and $r^*$ are b.m.a.s from A to f, then so is $p^* = \tfrac{1}{2}(q^* + r^*)$, by Theorem
2.2. Thus there is an alternating set $\{\xi_i : i = 0, 1, \dots, n+1\}$ for $f - p^*$. Since,
for each $i = 0, 1, \dots, n+1$,
$$\|f - p^*\|_\infty = |f(\xi_i) - p^*(\xi_i)|
= \bigl|\tfrac{1}{2}(f(\xi_i) - q^*(\xi_i)) + \tfrac{1}{2}(f(\xi_i) - r^*(\xi_i))\bigr|
\le \tfrac{1}{2}\bigl(\|f - q^*\|_\infty + \|f - r^*\|_\infty\bigr)
= \|f - p^*\|_\infty,$$
we deduce that
$$f(\xi_i) - q^*(\xi_i) = f(\xi_i) - r^*(\xi_i), \quad i = 0, 1, \dots, n+1.$$
Thus $(q^* - r^*)(\xi_i) = 0$, for $i = 0, 1, \dots, n+1$, and so $q^* = r^*$ by Haar
condition (1).

P4 According to Theorem 7.4, we need to solve the linear equations (7.27). If
$f(x) = x^3$ and $p(x) = a + bx + cx^2$, these equations are
$$f(0) - p^*(0) = -a = h, \qquad (10)$$
$$f(0.3) - p^*(0.3) = 0.027 - a - 0.3b - 0.09c = -h, \qquad (11)$$
$$f(0.8) - p^*(0.8) = 0.512 - a - 0.8b - 0.64c = h, \qquad (12)$$
$$f(1) - p^*(1) = 1 - a - b - c = -h. \qquad (13)$$
Considering (12) - (10) and (13) - (11) gives
$$0.512 - 0.8b - 0.64c = 0,$$
$$0.973 - 0.7b - 0.91c = 0,$$
and we can now eliminate b to give
$$4.2 - 2.8c = 0 \;\Rightarrow\; c = 1.5.$$
It quickly follows that b = -0.56, a = 0.03 and h = -0.03. In particular, the
first line of (7.32) is equal to 0.03.
To determine the final line of (7.32), we need to find the extreme values of
$$f(x) - p^*(x) = x^3 - (0.03 - 0.56x + 1.5x^2),$$
which occur when
$$3x^2 - 3x + 0.56 = 0 \;\Rightarrow\; x = 0.248\,34,\ 0.751\,67.$$
Evaluating $f - p^*$ at these points gives $\|f - p^*\|_\infty = 0.031\,877$. Thus, we
deduce that
$$0.03 \le \min_{p \in A} \|f - p\|_\infty \le 0.031\,877.$$

In fact, we can determine $\min_{p \in A} \|f - p\|_\infty$ by considering $f(\phi(x))$, where
$\phi(x) = \tfrac{1}{2}(1 + x)$ maps [-1, 1] onto [0, 1]. The b.m.a. from $P_2$ to
$$f(\phi(x)) = \tfrac{1}{8} + \tfrac{3}{8}x + \tfrac{3}{8}x^2 + \tfrac{1}{8}x^3$$
on [-1, 1] is (see commentary on Theorem 7.3) given by
$$f(\phi(x)) - \tfrac{1}{8} \cdot \tfrac{1}{4}T_3(x) = \tfrac{1}{8} + \tfrac{3}{8}x + \tfrac{3}{8}x^2 + \tfrac{1}{8}x^3 - \tfrac{1}{32}\bigl(4x^3 - 3x\bigr)
= \tfrac{1}{8} + \tfrac{15}{32}x + \tfrac{3}{8}x^2.$$
Hence the b.m.a. from $P_2$ to f on [0, 1] is
$$p(x) = \tfrac{1}{8} + \tfrac{15}{32}\phi^{-1}(x) + \tfrac{3}{8}\phi^{-1}(x)^2
= \tfrac{1}{8} + \tfrac{15}{32}(2x - 1) + \tfrac{3}{8}(2x - 1)^2
= \tfrac{1}{32} - \tfrac{9}{16}x + \tfrac{3}{2}x^2.$$
Since the extreme values of $f - p$ occur at the points of [0, 1] which
correspond to the extreme values of $4x^3 - 3x$ on [-1, 1], namely at 0, 0.25,
0.75 and 1, we find that
$$\min_{p \in A} \|f - p\|_\infty = \tfrac{1}{32} = 0.031\,25.$$
Note that 0.031 25 does indeed lie in (0.03, 0.031 877).

P5 The space A that is spanned by
$$\phi_0(x) = 1, \quad \phi_1(x) = \cos 2x, \quad \phi_2(x) = \sin 3x,$$
on $[-\pi/6, \pi/2]$ satisfies Haar condition (4) if the matrix
$$\begin{pmatrix} 1 & \cos 2\xi_0 & \sin 3\xi_0 \\ 1 & \cos 2\xi_1 & \sin 3\xi_1 \\ 1 & \cos 2\xi_2 & \sin 3\xi_2 \end{pmatrix} \qquad (14)$$
is non-singular for all distinct $\xi_0, \xi_1, \xi_2$ in $[-\pi/6, \pi/2]$. To prove this, we must
either show that the determinant
$$\begin{vmatrix} 1 & \cos 2\xi_0 & \sin 3\xi_0 \\ 1 & \cos 2\xi_1 & \sin 3\xi_1 \\ 1 & \cos 2\xi_2 & \sin 3\xi_2 \end{vmatrix}$$
is non-zero for such $\xi_0, \xi_1, \xi_2$, or show that if α, β, γ are not all zero, then the
equation
$$f(x) = \alpha + \beta\cos 2x + \gamma\sin 3x = 0 \qquad (15)$$
cannot have 3 distinct solutions in $[-\pi/6, \pi/2]$ (since this shows that the
columns of the above matrix are linearly independent).
To prove this fact about the above determinant, we would need to express the
determinant as a product with fairly simple factors, each of which can be
shown to be non-zero. This seems rather difficult in this case.
To prove that an equation such as (15) has at most 2 distinct solutions in
$[-\pi/6, \pi/2]$, it is sufficient (by Rolle's theorem) to prove that the equation
$f'(x) = 0$ has at most one solution in $[-\pi/6, \pi/2]$. This approach is often
effective, but it does not work in this case, since
$$f'(x) = -2\beta\sin 2x + 3\gamma\cos 3x = 0$$
can have 3 solutions (take β = 0, γ = 1, $x = -\pi/6, \pi/6, \pi/2$).
We can convert this into a problem about polynomials by using the identities
$$\cos 2x = 1 - 2\sin^2 x, \qquad \sin 3x = 3\sin x - 4\sin^3 x.$$
Since the sine function maps $[-\pi/6, \pi/2]$ one-one onto $[-\tfrac{1}{2}, 1]$, it is equivalent
to prove that the matrix
$$\begin{pmatrix} 1 & 1 - 2t_0^2 & 3t_0 - 4t_0^3 \\ 1 & 1 - 2t_1^2 & 3t_1 - 4t_1^3 \\ 1 & 1 - 2t_2^2 & 3t_2 - 4t_2^3 \end{pmatrix} \qquad (16)$$
is non-singular for all distinct $t_0, t_1, t_2$ in $[-\tfrac{1}{2}, 1]$. Here $t_i = \sin\xi_i$, i = 0, 1, 2.
In this form, we can evaluate the corresponding determinant and (with some
effort) find a fairly simple factorisation:
$$\begin{vmatrix} 1 & 1 - 2t_0^2 & 3t_0 - 4t_0^3 \\ 1 & 1 - 2t_1^2 & 3t_1 - 4t_1^3 \\ 1 & 1 - 2t_2^2 & 3t_2 - 4t_2^3 \end{vmatrix}
= (t_0 - t_1)(t_1 - t_2)(t_2 - t_0)\bigl(3 + 4(t_0 t_1 + t_1 t_2 + t_2 t_0)\bigr).$$
The first three factors of this product are non-zero (since $t_0, t_1, t_2$ are
distinct) but it remains to show that
$$3 + 4(t_0 t_1 + t_1 t_2 + t_2 t_0) \neq 0,$$
for distinct $t_0, t_1, t_2$ in $[-\tfrac{1}{2}, 1]$. This can be done, but it is rather tricky, and
we omit the details.
Instead, we show that if α, β, γ are not all zero, then the cubic equation
$$p(t) = \alpha + \beta(1 - 2t^2) + \gamma(3t - 4t^3) = 0$$
cannot have 3 distinct roots in $[-\tfrac{1}{2}, 1]$, which shows that the matrix (16) is
non-singular and hence so is the matrix (14). Once again, an approach via
Rolle's theorem turns out to be unsuccessful, so we try the following approach
which uses knowledge about the overall shape of a cubic graph.
The equation can certainly have at most 2 roots if γ = 0. We can then
assume, for example, that γ < 0. Since the coefficient of $t^3$ is then positive (so
$p(t) \to \pm\infty$ as $t \to \pm\infty$), there can be 3 distinct roots in [-1/2, 1] only if
$$p(1) = \alpha - \beta - \gamma \ge 0, \qquad (17)$$
$$p'(1) = -4\beta + 3\gamma - 12\gamma > 0, \qquad (18)$$
$$p(-\tfrac{1}{2}) = \alpha + \tfrac{1}{2}\beta - \gamma \le 0, \qquad (19)$$
$$p'(-\tfrac{1}{2}) = 2\beta + 3\gamma - 3\gamma > 0. \qquad (20)$$
According to (20), we have β > 0 and so (17) and (19) yield the contradictory
inequalities α > γ and α < γ, respectively. Hence the above matrices are
indeed non-singular and A does satisfy Haar condition (4).
To prove the final part note that if
$$\phi(x) = \alpha + \beta\cos 2x + \gamma\sin 3x$$
vanishes at $x = -\pi/6$, then
$$\alpha + \tfrac{1}{2}\beta - \gamma = 0. \qquad (21)$$
Now
$$\phi'(-\pi/6) = -2\beta\sin(-\pi/3) + 3\gamma\cos(-\pi/2) = \sqrt{3}\,\beta,$$
and, by equation (21),
$$\phi(\pi/2) = \alpha - \beta - \gamma = -3\beta/2.$$
If β = 0, then $\phi(\pi/2) = \phi'(-\pi/6) = 0$. On the other hand, if β > 0, then
$\phi'(-\pi/6) > 0$ and $\phi(\pi/2) < 0$, so φ has a zero in $[-\pi/6, \pi/2]$. A similar
argument applies if β < 0 and so each function φ in A which vanishes at $-\pi/6$
also vanishes at some other point of $[-\pi/6, \pi/2]$.
Remark The final part could also have been proved by considering the
cubic function p defined earlier.

Chapter 8 The exchange algorithm
This chapter contains a detailed account of the exchange algorithm, which is an
iteration process for determining the b.m.a. from a finite-dimensional subspace A
of C[a, b] to a function f ∈ C[a, b]. The space A must satisfy the Haar condition,
since the algorithm is based on the theory developed in Chapter 7.
The exchange algorithm is analysed in Chapter 9, which will not be assessed. Two
proofs are given there that the algorithm converges. The first, in Sections 9.1 and
9.2, is fairly straightforward, but does not give an estimate for the rate of
convergence of the algorithm. The second proof, in Sections 9.3 and 9.4, is very
involved, but it serves to show that the algorithm converges remarkably quickly.
This chapter splits into TWO study sessions:
Study session 1: Sections 8.1, 8.2 and 8.3.
Study session 2: Sections 8.4 and 8.5.

Study Session 1: Using the exchange algorithm

Read Sections 8.1, 8.2 and 8.3

Commentary
1. Although Powell does not mention it in the text, the version of the exchange
algorithm in which all points of the reference are changed at each iteration is
often called the Remes algorithm (see page 338).

2. The choice of the point $\xi_q$ to be replaced (see page 88) is easy if
$\xi_0 < \eta < \xi_{n+1}$. If $\eta < \xi_0$, however, there are two possibilities, illustrated
below.

[Figure: two sketches of the error function on a reference $\xi_0, \dots, \xi_4$ with $\eta < \xi_0$, and the levels ±|h| marked.]

On the left, e(η) has the same sign as $e(\xi_0)$, so $\xi_0$ leaves the reference; on the
right, e(η) has the opposite sign to $e(\xi_0)$, so $\xi_4$ leaves the reference.

3. The following summary of the one-point exchange algorithm may prove
helpful; a short computational sketch follows the steps below. Recall that we
are given a function f in C[a, b] and an (n + 1)-dimensional subspace A of
C[a, b], which satisfies the Haar condition, and that we are trying to find the
function $p^*$ in A such that
$$\|f - p^*\|_\infty = \min_{p \in A} \|f - p\|_\infty.$$

Step 1 Choose an initial reference: $a \le \xi_0 < \xi_1 < \dots < \xi_{n+1} \le b$.
Step 2 Determine $p \in A$ and $h \in \mathbb{R}$, such that
$$f(\xi_i) - p(\xi_i) = (-1)^i h, \quad i = 0, 1, \dots, n+1.$$
Thus, by Theorem 7.4,
$$|h| = \min_{p \in A}\; \max_{i=0,1,\dots,n+1} |f(\xi_i) - p(\xi_i)|.$$
Step 3 Determine $\eta \in [a, b]$, such that
$$|f(\eta) - p(\eta)| = \|f - p\|_\infty.$$
Step 4 By Theorem 7.7,
$$|h| \le \|f - p^*\|_\infty \le |f(\eta) - p(\eta)|,$$
so stop if
$$\delta = |f(\eta) - p(\eta)| - |h|$$
is small enough. Otherwise, continue to Step 5.
Step 5 Choose a new reference: $a \le \xi_0^+ < \xi_1^+ < \dots < \xi_{n+1}^+ \le b$, replacing
one $\xi_q$ by η, in such a way that the numbers
$$f(\xi_i^+) - p(\xi_i^+), \quad i = 0, 1, \dots, n+1,$$
have alternating signs. Then return to Step 2.
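The following minimal sketch (our own illustration for the polynomial case $A = P_n$, not Powell's code) implements Steps 1-5, locating η on a fine grid as Powell suggests; its first iteration reproduces the numbers of the worked example below.

```python
import numpy as np

def one_point_exchange(f, n, a, b, tol=1e-10, grid=4097):
    # Step 1: initial reference -- extreme points of a Chebyshev polynomial
    # mapped to [a, b], as recommended by Theorem 8.1 (Section 8.4).
    i = np.arange(n + 2)
    xi = 0.5 * (a + b) + 0.5 * (b - a) * np.cos(np.pi * i / (n + 1))[::-1]
    xs = np.linspace(a, b, grid)
    while True:
        # Step 2: solve for p in P_n and h with f(xi_i) - p(xi_i) = (-1)^i h.
        A = np.hstack([np.vander(xi, n + 1, increasing=True),
                       ((-1.0) ** i)[:, None]])
        sol = np.linalg.solve(A, f(xi))
        c, h = sol[:-1], sol[-1]
        # Step 3: locate eta where |f - p| is largest (on a fine grid).
        e = f(xs) - np.polyval(c[::-1], xs)
        q = np.argmax(np.abs(e))
        eta, e_eta = xs[q], e[q]
        # Step 4: stop when the levelled reference error |h| matches ||f - p||.
        if abs(e_eta) - abs(h) <= tol:
            return c, abs(h)
        # Step 5: exchange one reference point for eta, keeping the signs
        # of the reference errors (-1)^i h alternating.
        e_ref = f(xi) - np.polyval(c[::-1], xi)
        j = np.searchsorted(xi, eta)
        if 0 < j < n + 2:
            # eta lies between two reference points; replace the neighbour
            # whose error has the same sign as e(eta).
            xi[j - 1 if e_ref[j - 1] * e_eta > 0 else j] = eta
        elif j == 0:
            if e_ref[0] * e_eta > 0:
                xi[0] = eta
            else:
                xi = np.concatenate([[eta], xi[:-1]])
        else:
            if e_ref[-1] * e_eta > 0:
                xi[-1] = eta
            else:
                xi = np.concatenate([xi[1:], [eta]])

c, h = one_point_exchange(np.exp, 2, -1.0, 1.0)
print(c, h)   # coefficients near 0.989, 1.130, 0.554; minimax error about 0.045
```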

4. Here is an example of the one-point exchange algorithm in action, being used


to find the best minimax approximation from P2 to f (x) = ex on [−1, 1]. We
perform only the first iteration of the algorithm.
 
Step 1 Choose the initial reference to be −1, − 21 , 12 , 1 . The reason for
this choice will be explained in Theorem 8.1.
Step 2 Determine p(x) = a + bx + cx2 ∈ P2 such that
f (−1) − p(−1) = e−1 − (a − b + c) = h, (1)
     
f − 12 − p − 21 = e−1/2 − a − 12 b + 14 c = −h, (2)
     
f 12 − p 12 = e1/2 − a + 12 b + 14 c = h, (3)
f (1) − p(1) = e − (a + b + c) = −h. (4)
Considering (1) + (4) and (2) + (3), we find that
a + c = cosh 1   and   a + c/4 = cosh(1/2),
so that c = (4/3)(cosh 1 − cosh(1/2)) = 0.553 93 and a = cosh 1 − c = 0.989 14.
Considering (4) − (1) and (3) − (2), we find that
b − h = sinh 1   and   b/2 + h = sinh(1/2),
so that b = (2/3)(sinh 1 + sinh(1/2)) = 1.130 86 and h = b − sinh 1 = −0.044 337.
Thus the first approximation to p∗ is
p(x) = 0.989 14 + 1.130 86x + 0.553 93x²,
and the levelled reference error is |h| = 0.044 337.
Step 3 To determine ‖f − p‖∞, we need to identify the extreme points of
e = f − p on [−1, 1], which occur either at ±1 or at solutions of
e′(x) = e^x − b − 2cx = 0.
The graphs y = e^x and y = b + 2cx (with b = 1.130 86, c = 0.553 93)
indicate that this non-linear equation has two solutions η1 , η2 at
approximately ±0.5. One can use the bisection method or Newton’s
method to obtain the accurate values
η1 = −0.438 62 and η2 = 0.560 94.
Since e(η1) = 0.045 233 and e(η2) = −0.045 468, we have
‖f − p‖∞ = |e(η2)| = 0.045 468.
Note that |e(−1)| = |e(1)| = |h| < |e(η2 )|.
Remark Recall Powell’s comment on page 86 that ‖f − p‖∞
would in practice be obtained by computing many values of f − p
on [a, b] and approximating this function locally by quadratics.
Step 4 Since
δ = |f (η2 ) − p(η2 )| − |h| = 0.045 468 − 0.044 337 = 0.001 131,
the polynomial p is already fairly close to the best minimax
approximation from P2 to f (x) = ex on [−1, 1].
Step 5 The error function e = f − p has the following form.
[Figure: graph of e = f − p on [−1, 1], with levels ±|h| attained near −1, −1/2, 1/2, 1 and extreme points η1 ≈ −0.44, η2 ≈ 0.56.]
The point of the initial reference to be replaced by η2 is clearly 1/2,
and so the new reference is {−1, −1/2, η2, 1}. The linear equations to
be solved in Step 2 will not work out quite so simply with this
reference, owing to the lack of symmetry.
Remark Notice how well the calculated polynomial p(x)
approximates f(x) = e^x on [−1, 1] by comparison with the Taylor
polynomial q(x) = 1 + x + x²/2 of f. Indeed,
‖f − q‖∞ = f(1) − q(1) = e − 2.5 = 0.218 28,
so that ‖f − q‖∞ ≈ 4.8 ‖f − p‖∞.
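This whole first iteration is easy to reproduce numerically. The short script below is our own check (it uses scipy’s brentq root finder where Newton’s method or bisection was used above):

    import numpy as np
    from scipy.optimize import brentq

    xi = np.array([-1.0, -0.5, 0.5, 1.0])          # initial reference
    signs = np.array([1.0, -1.0, 1.0, -1.0])       # (-1)^i
    # Step 2: solve a + b*xi + c*xi**2 + (-1)^i h = exp(xi) for (a, b, c, h).
    M = np.column_stack([np.ones(4), xi, xi**2, signs])
    a, b, c, h = np.linalg.solve(M, np.exp(xi))
    print(a, b, c, h)            # 0.98914, 1.13086, 0.55393, -0.044337

    e = lambda x: np.exp(x) - (a + b*x + c*x**2)   # error function
    de = lambda x: np.exp(x) - b - 2*c*x           # e'(x)
    eta1, eta2 = brentq(de, -1, 0), brentq(de, 0, 1)   # -0.43862, 0.56094
    print(abs(e(eta2)) - abs(h))                   # delta = 0.001131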
Self-assessment questions
S1 Justify the statement in the second paragraph of page 86 that the error
function e has at least n turning points.

S2 Justify the statement in the second paragraph of page 88 that the case when
|h| = 0 can occur only on the first iteration, and then any value of q gives the
increase (8.11).
S3 Powell Exercise 8.2
Study Session 2: Matters relating to the exchange
algorithm
Read Sections 8.4 and 8.5
Commentary
1. Theorem 8.1 explains the choice of reference that we made when finding a
minimax approximation from P2 to f(x) = e^x on [−1, 1]. The Chebyshev
polynomial T3(x) = 4x³ − 3x takes its extreme values at {−1, −1/2, 1/2, 1}.
Notice also the bearing that Theorem 8.1 has on Powell Exercise 7.7. If we
map the above reference from [−1, 1] to [0, 1], then it becomes {0, 1/4, 3/4, 1},
and Theorem 8.1 implies that if this reference had been used in Powell
Exercise 7.7, then the calculated polynomial p(x) would have been the b.m.a.
from P2 to f (x) = x3 on [0, 1]. Since the given reference {0, 0.3, 0.8, 1} was
close to this ideal reference, the resulting polynomial was close to the b.m.a.
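The references of Theorem 8.1 are easy to generate; here is a small helper of our own (the name chebyshev_reference is not Powell’s):

    import numpy as np

    def chebyshev_reference(n, a=-1.0, b=1.0):
        # Extreme points of T_{n+1}, increasing, mapped from [-1, 1] to [a, b].
        i = np.arange(n + 2)
        t = np.cos((n + 1 - i) * np.pi / (n + 1))
        return (a + b) / 2 + (b - a) / 2 * t

    print(chebyshev_reference(2))         # [-1. -0.5  0.5  1. ]
    print(chebyshev_reference(2, 0, 1))   # [ 0.  0.25  0.75  1. ]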
2. The linear operator from C[−1, 1] to Pn, described after Theorem 8.1, is
investigated in a special case in Powell Exercise 8.5. The asymptotic estimate
ln n for the norm of this operator is not obvious.
3. The discussion of ‘telescoping’ on page 92 is closely related to our
commentary on Theorem 7.3 (see Problem P4).
4. In the linear programming problem described on page 94 the aim is to
minimise θ subject to the 2m linear constraints on (θ, λ0, λ1, …, λn) given by
(8.34).
Self-assessment questions
S4 Verify that the points ξi in (8.17) satisfy (8.18).
S5 Calculate the b.m.a. from P2 to f(x) = x³ on [0, 1] by using Theorem 8.1.
Compare your answer with the one obtained for Powell Exercise 7.7.
Problems for Chapter 8
P1 Powell Exercise 8.1 (Hint: consider q ∈ Pn+1 which interpolates f at ξi .)
P2 Powell Exercise 8.3
P3 Powell Exercise 8.5
P4 Powell Exercise 8.6 (Hint: express the remainder for the Taylor
approximation as an integral.)
P5 Powell Exercise 8.7
Solutions to SAQs in Chapter 8
S1 Let ξk−1 , ξk , ξk+1 be consecutive points of the reference. If e(ξk ) = |h| > 0,
then the error function e has a local maximum inside [ξk−1 , ξk+1 ], with value
at least |h|. On the other hand, if e(ξk ) = −|h| < 0, then the function e has a
local minimum inside [ξk−1, ξk+1], with value at most −|h|. These local
extreme points are clearly distinct from one another and so, since there are n intervals of
the form [ξk−1, ξk+1], the error function e has at least n turning points.
S2 The value h = 0 occurs if and only if the numbers {f(ξi) : i = 0, 1, …, n + 1}
happen to be the values taken at ξi, i = 0, 1, …, n + 1, by some p in A. If f
itself is in A, then we also have δ = 0 on the first iteration, so the algorithm
terminates. If f is not in A, then h may be zero on the first iteration because
f (ξi ) = p(ξi ), i = 0, 1, . . . , n + 1, for some p in A. But then
‖f − p‖∞ = |f(η) − p(η)| > 0 and on replacing any ξq by η we obtain a
reference {ξi⁺ : i = 0, 1, …, n + 1} such that no p in A interpolates f at ξi⁺,
i = 0, 1, . . . , n + 1 (because such a p is determined by its values at n + 1
points). Hence the new levelled reference error is positive.
S3 The initial reference is of the form {ξ0, ξ1, ξ2}, where ξ0 = a, ξ2 = b, and the
error function e = f − p on the first iteration is convex.
[Figure: graph of the convex function y = e(x) on [a, b], with e(a) = e(b) = |h|, e(ξ1) = −|h|, and minimum at η.]
Since e = f − p is convex, it has a unique extreme point in (a, b), namely a
minimum at η. If η = ξ1, then the algorithm terminates. Otherwise,
the new reference is {a, η, b} and the new approximation is
p̃ = p − (1/2)(|e(η)| − |h|), since this gives an error function
ẽ = e + (1/2)(|e(η)| − |h|) with
ẽ(a) = −ẽ(η) = ẽ(b) = (1/2)(|h| + |e(η)|) = |h̃|,
say. The algorithm now terminates since ‖ẽ‖∞ = |h̃|.
S4 Since
Tn+1(x) = cos((n + 1) cos⁻¹ x), −1 ≤ x ≤ 1
(see (4.23)), we have
Tn+1(ξi) = cos((n + 1)(n + 1 − i)π/(n + 1))
= cos((n + 1 − i)π)
= (−1)^{n+1−i},
because cos(kπ) = (−1)^k for any integer k.
S5 According to Theorem 8.1 and the discussion at the bottom of page 91, we
should use the reference {0, 0.25, 0.75, 1}. With f(x) = x³ and
p(x) = a + bx + cx², this gives
f (0) − p(0) = −a = h, (5)
f (1/4) − p(1/4) = 1/64 − (a + b/4 + c/16) = −h, (6)
f (3/4) − p(3/4) = 27/64 − (a + 3b/4 + 9c/16) = h, (7)
f (1) − p(1) = 1 − (a + b + c) = −h. (8)
Considering (7) − (5) and (8) − (6) gives
27/64 − 3b/4 − 9c/16 = 0   and   63/64 − 3b/4 − 15c/16 = 0,
so that c = 3/2 and b = −9/16.
Adding (5) and (8) gives
1 − 2a − b − c = 0 ⇒ a = 1/32, h = −1/32.
By Theorem 8.1, the b.m.a. from P2 to f(x) = x³ on [0, 1] is
p∗(x) = 1/32 − (9/16)x + (3/2)x².
This agrees with the answer to Powell Exercise 7.7.
Solutions to Problems in Chapter 8
P1 The levelled reference error |h| is found by solving
f(ξi) − p(ξi) = (−1)^i h, i = 0, 1, …, n + 1,
where p ∈ Pn. Now let q ∈ Pn+1 interpolate f at {ξi : i = 0, 1, …, n + 1}, so
that
q(ξi) − p(ξi) = (−1)^i h, i = 0, 1, …, n + 1.
The coefficient of x^{n+1} in q, which is identical to the coefficient of x^{n+1} in
q − p, is equal to f[ξ0, ξ1, …, ξn+1] (see Section 5.1) and so
f[ξ0, ξ1, …, ξn+1]/h is the coefficient of x^{n+1} in that function r ∈ Pn+1 which
satisfies
r(ξi) = (−1)^i, i = 0, 1, …, n + 1.
Hence
r[ξ0, ξ1, …, ξn+1] = f[ξ0, ξ1, …, ξn+1]/h,
so that
h = f[ξ0, ξ1, …, ξn+1]/r[ξ0, ξ1, …, ξn+1],
where r[ξ0, ξ1, …, ξn+1] is independent of f.
For n = 1, using equation (5.14),
r[ξ0, ξ1, ξ2] = (2/(ξ2 − ξ1) + 2/(ξ1 − ξ0))/(ξ2 − ξ0) = 2/((ξ2 − ξ1)(ξ1 − ξ0)),
and so
|h| = (1/2)(ξ2 − ξ1)(ξ1 − ξ0)|f[ξ0, ξ1, ξ2]|,
as required.
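This formula is easy to test numerically; the script below is our own check for n = 1, comparing |h| from the levelled equations with the divided-difference expression:

    import numpy as np

    def divided_difference(xs, ys):
        # Highest-order divided difference f[x_0, ..., x_k], by the recursion.
        d = np.array(ys, dtype=float)
        for k in range(1, len(xs)):
            d[k:] = (d[k:] - d[k-1:-1]) / (xs[k:] - xs[:-k])
        return d[-1]

    f = np.exp
    xs = np.array([0.0, 0.4, 1.0])           # any reference xi_0 < xi_1 < xi_2
    # Levelled equations: a + b*xi_i + (-1)^i h = f(xi_i).
    M = np.column_stack([np.ones(3), xs, [1.0, -1.0, 1.0]])
    a, b, h = np.linalg.solve(M, f(xs))
    print(abs(h))
    print(0.5 * (xs[2] - xs[1]) * (xs[1] - xs[0])
          * abs(divided_difference(xs, f(xs))))   # the same value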
P2 The extreme values of the error function
e∗(x) = f(x) − p∗(x) = 144/(x + 2) − (69 − 20x + 2x²)
occur when x = 0, 6, or x satisfies
e∗′(x) = −144/(x + 2)² + 20 − 4x = 0.
This equation reduces to
(x − 1)(x² − 16) = 0,
which has solutions x = 1, ±4. Since
e∗ (0) = 3, e∗ (1) = −3, e∗ (4) = 3, e∗ (6) = −3,
we deduce that p∗ is indeed the b.m.a. from P2 to f on [0, 6].
Next we determine the function p(x) = a + bx + cx² which satisfies (8.4) with
the reference {0, 1 + α, 4 + β, 6}:
f(0) − p(0) = 72 − a = h, (9)
f(1 + α) − p(1 + α) = 144/(3 + α) − (a + b(1 + α) + c(1 + α)²) = −h, (10)
f(4 + β) − p(4 + β) = 144/(6 + β) − (a + b(4 + β) + c(4 + β)²) = h, (11)
f(6) − p(6) = 18 − (a + 6b + 36c) = −h. (12)
Now, the binomial expansion for (1 + x)⁻¹ gives
1/(3 + α) = 1/(3(1 + α/3)) = (1/3)(1 − α/3) + O(α²)
and
1/(6 + β) = 1/(6(1 + β/6)) = (1/6)(1 − β/6) + O(β²),
so that, if α, β are small enough for α², β² to be neglected, then (10), (11)
reduce to
48 − 16α − (a + b(1 + α) + c(1 + 2α)) = −h, (13)
24 − 4β − (a + b(4 + β) + c(16 + 8β)) = h. (14)
Now we observe that the values a = 69, b = −20, c = 2 and h = 3 must satisfy
(9), (13), (14), (12) in the case α = β = 0, because p∗ is the b.m.a. from P2 to
f , with the alternating set being {0, 1, 4, 6}. Thus, these values for a, b, c and
h will satisfy (9), (13), (14), (12) when α, β ≠ 0 also, provided that
−16α − bα − 2cα = (−16 − b − 2c)α = 0, (15)
−4β − bβ − 8cβ = (−4 − b − 8c)β = 0. (16)
Since both these equations do hold with b = −20 and c = 2, we deduce that
a, b, c, h do satisfy (9), (13), (14), (12), and so the function given by (8.4) in
this case is p∗ again.
Remark It is not, of course, pure chance that equations (15) and (16) hold.
In fact, if ξ is a point at which
f(ξ) − p(ξ) = h and f′(ξ) − p′(ξ) = 0,
then for small ε (small enough for ε² to be neglected) we have the Taylor
approximation
f(ξ + ε) − p(ξ + ε) = f(ξ) − p(ξ) + ε(f′(ξ) − p′(ξ))
= f(ξ) − p(ξ)
= h.
In practice this means that if we choose a reference which is fairly close to an
alternating set for f − p∗ , then the polynomial given by (8.4) is very close
to p∗ .
P3 Since the reference ξi, i = 0, 1, …, n + 1, given by (8.17) is fixed, the
coefficients λj of p (cf. equation (7.27)) and h are linear functions of f(ξi),
i = 0, 1, . . . , n + 1. Hence the operator
X : f → p = Σ_{j=0}^{n} λj φj,
where {φj } is the standard basis for Pn , is linear.
For n = 2, the reference (8.17) is {−1, −1/2, 1/2, 1} and we have to find the
quadratic function p(x) of largest ∞-norm on [−1, 1] subject to the four
constraints |p(±1)| ≤ 1, |p(±1/2)| ≤ 1. By the linear programming argument
from Chapter 3, it is sufficient to consider the cases p(±1) = ±1,
p(±1/2) = ±1, and it turns out that the largest ∞-norm is achieved for
p(x) = −5/3 + (8/3)x². In this case ‖p‖∞ = 5/3, so that ‖X‖∞ = 5/3, as
required.
P4 The nth Taylor polynomial of f(x) = ln(1 + x/2) at 0 is
pn(x) = x/2 − (1/2)(x/2)² + ⋯ + ((−1)^{n+1}/n)(x/2)^n.
The usual form of the remainder in Taylor’s Theorem is rather poor for the
function ln. It is better to consider the derivatives
f′(x) − pn′(x) = 1/(2 + x) − (1/2)(1 − x/2 + ⋯ + (−1)^{n+1}(x/2)^{n−1})
= 1/(2 + x) − (1/2)·(1 − (−x/2)^n)/(1 + x/2)
= (−x/2)^n/(2 + x),
and then integrate to give
|f(x) − pn(x)| = |∫0^x (f′(t) − pn′(t)) dt|   (since f(0) = pn(0) = 0)
= |∫0^x (−t/2)^n/(2 + t) dt|
≤ ∫0^{|x|} (t/2)^n/(2 − t) dt
≤ (1/2^n) ∫0^1 t^n dt
= 1/((n + 1)2^n),
for |x| ≤ 1.
To obtain a Taylor polynomial pn with ‖f − pn‖∞ < 0.01 on [−1, 1] we must
therefore take n = 5:
‖f − p5‖∞ ≤ 1/(6 × 2⁵) = 1/192 = 0.005 208 3̇.
Now we apply telescoping to
p5(x) = x/2 − x²/8 + x³/24 − x⁴/64 + x⁵/160
on [−1, 1]. The b.m.a. from P4 to p5 on [−1, 1] is
p̃4(x) = p5(x) − (1/160)·(1/2⁴)·T5(x),
and the ∞-norm error made by this approximation is 1/(160 × 2⁴) = 1/2560.
Since T5 has no term in x⁴, the coefficient of x⁴ in p̃4(x) is −1/64 and so the
b.m.a. from P3 to p̃4 on [−1, 1] is
p̃3(x) = p̃4(x) + (1/64)·(1/2³)·T4(x),
and the ∞-norm error made by this second approximation is
1/(64 × 2³) = 1/512. Thus
‖f − p̃3‖∞ ≤ ‖f − p5‖∞ + ‖p5 − p̃4‖∞ + ‖p̃4 − p̃3‖∞
≤ 1/192 + 1/2560 + 1/512
= 0.007 55.
Since p̃3 is a cubic function and ‖f − p̃3‖∞ < 0.01, we are done.
For the record:
p̃4(x) = (255/512)x − (1/8)x² + (19/384)x³ − (1/64)x⁴
and
p̃3(x) = 1/512 + (255/512)x − (9/64)x² + (19/384)x³.

Note, however, that the error in the next telescoping step is


1
4 (19/384) = 0.012 . . ., so no further telescoping is possible.
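The telescoping steps can be automated with numpy’s polynomial classes; the sketch below is our own (the helper name telescope is an assumption) and reproduces the coefficients and error bounds found above:

    import numpy as np
    from numpy.polynomial import Polynomial, Chebyshev

    p5 = Polynomial([0, 1/2, -1/8, 1/24, -1/64, 1/160])    # Taylor polynomial

    def telescope(p):
        # Remove the leading term of p using T_n; the sup-norm error
        # incurred on [-1, 1] is |leading coefficient of p| / 2^(n-1).
        n = p.degree()
        Tn = Chebyshev.basis(n).convert(kind=Polynomial)   # T_n in power form
        lead = p.coef[n] / Tn.coef[n]                      # Tn.coef[n] = 2^(n-1)
        return (p - lead * Tn).trim(), abs(lead)

    p4t, e1 = telescope(p5)      # e1 = 1/2560
    p3t, e2 = telescope(p4t)     # e2 = 1/512
    print(p4t.coef)              # [0, 255/512, -1/8, 19/384, -1/64]
    print(p3t.coef)              # [1/512, 255/512, -9/64, 19/384]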
P5 We denote a typical element of P1 by p(x) = a + bx.
Iteration 1 {0, 3, 6}
f(0) − p(0) = 0.3 − a = h,
f(3) − p(3) = 3.4 − a − 3b = −h,
f(6) − p(6) = 5.7 − a − 6b = h,
so that 3.1 − 3b = −2h and 5.4 − 6b = 0, giving b = 0.9, h = −0.2, a = 0.5.
Thus p1 (x) = 0.5 + 0.9x, with |h| = 0.2. The maximum error of p1 is
|f (1) − p1 (1)| = 2.8 > |h|, with f (1) − p1 (1) > 0, and so 1 replaces 3 in the
reference.
Iteration 2 {0, 1, 6}
f(0) − p(0) = 0.3 − a = h,
f(1) − p(1) = 4.2 − a − b = −h,
f(6) − p(6) = 5.7 − a − 6b = h,
so that 3.9 − b = −2h and 5.4 − 6b = 0, giving b = 0.9, h = −1.5, a = 1.8.
Thus p2 (x) = 1.8 + 0.9x, with |h| = 1.5. The maximum error of p2 is
|f (2) − p2 (2)| = 3.5 > |h|, with f (2) − p2 (2) < 0, and so 2 replaces 6 in the
reference.
Iteration 3 {0, 1, 2}
f(0) − p(0) = 0.3 − a = h,
f(1) − p(1) = 4.2 − a − b = −h,
f(2) − p(2) = 0.1 − a − 2b = h,
so that 3.9 − b = −2h and −0.2 − 2b = 0, giving b = −0.1, h = −2, a = 2.3.
Thus p3 (x) = 2.3 − 0.1x, with |h| = 2. The maximum error of p3 is
|f (6) − p3 (6)| = 4 > |h|, with f (6) − p3 (6) > 0, and so 6 replaces 0 in the
reference. (Care needed here to ensure that f − p3 alternates in sign on the
new reference!)
Iteration 4 {1, 2, 6}
f(1) − p(1) = 4.2 − a − b = h,
f(2) − p(2) = 0.1 − a − 2b = −h,
f(6) − p(6) = 5.7 − a − 6b = h,
so that 4.3 − 2a − 3b = 0 and 5.8 − 2a − 8b = 0, giving a = 1.7, b = 0.3, h = 2.2.
Thus p4 (x) = 1.7 + 0.3x, with |h| = 2.2. The maximum error of p4 is
|f (4) − p4 (4)| = 2.8 > |h|, with f (4) − p4 (4) > 0, and so 4 replaces 6 in the
reference.
Iteration 5 {1, 2, 4}
f(1) − p(1) = 4.2 − a − b = h,
f(2) − p(2) = 0.1 − a − 2b = −h,
f(4) − p(4) = 5.7 − a − 4b = h,
so that 4.3 − 2a − 3b = 0 and 5.8 − 2a − 6b = 0, giving a = 1.4, b = 0.5, h = 2.3.
Thus p5 (x) = 1.4 + 0.5x, with |h| = 2.3. The maximum error of p5 is also 2.3
and so the algorithm ends.
Hence the b.m.a. from P1 to f on {0, 1, 2, 3, 4, 5, 6} is p∗ (x) = 1.4 + 0.5x. The
following figure illustrates the 5 approximations calculated by the algorithm.
[Figure: the data points and the five linear approximations p1, …, p5 on [0, 6].]
Remark Notice that the maximum errors in the above process (2.8, 3.5, 4, 2.8, 2.3)
do not decrease monotonically, and so this example serves to answer Powell Exercise 8.4.
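As a quick check of the final answer (our own script; the data values below are read off from the working above, and f(5), which is never queried by the algorithm here, is omitted):

    import numpy as np

    x = np.array([0, 1, 2, 3, 4, 6], dtype=float)
    y = np.array([0.3, 4.2, 0.1, 3.4, 5.7, 5.7])   # values used above
    err = y - (1.4 + 0.5 * x)                      # error of p5 = p*
    print(err)                   # [-1.1, 2.3, -2.3, 0.5, 2.3, 1.3]
    print(np.max(np.abs(err)))   # 2.3 = |h|, attained with alternating
                                 # signs at x = 1, 2, 4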
Chapter 10 Rational approximation by
the exchange algorithm
The approximation of continuous functions by rational functions, that is, ratios of
polynomials, is of great practical importance. For example, it is common for
computers to use rational functions to approximate special functions such as e^x,
sin x etc. As an example, we mention the function
r(x) = 1 + 40x/(x² − 20x + 138 − 4116/(x² + 42)),
which approximates f(x) = e^x to within 1.11 × 10⁻⁷ on [−1, 1]. The theory of
minimax rational approximation is rather like that for minimax polynomial
approximation, but is more difficult because rational functions do not depend
linearly on their coefficients.
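A quick computation confirms the error quoted for this approximation; the script below is our own check on a fine grid:

    import numpy as np

    def r(x):
        return 1 + 40*x / (x**2 - 20*x + 138 - 4116 / (x**2 + 42))

    xs = np.linspace(-1, 1, 100001)
    print(np.max(np.abs(np.exp(xs) - r(xs))))   # about 1.11e-7, as stated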
The first three sections of Chapter 10 are devoted to a version of the exchange
algorithm for rational functions, including a discussion of its possible failure. The
existence and uniqueness of best minimax rational approximations is not proved
and the characterisation of best minimax rational approximation in terms of
alternating sets is left to a series of (rather hard) exercises (10.1, 10.2 and 10.6).
Section 10.4 gives a brief description of an alternative algorithm for calculating
best minimax rational approximations, called the ‘differential correction
algorithm’ (see the book by Braess for more details). This section will not be
assessed in any way.
This chapter splits into TWO study sessions:
Study session 1: Sections 10.1 and 10.2.
Study session 2: Section 10.3.
Study Session 1: The exchange algorithm for rational approximation
Read Sections 10.1 and 10.2
Commentary
1. Notice that the evaluation of a polynomial in Pm+n requires m + n additions
and m + n multiplications (using nested multiplication), whereas the
evaluation of a function in Amn requires m + n additions, m + n − 1
multiplications and one division (we may assume that the coefficient of x^n in
the denominator is 1). Thus a function in Amn takes hardly any longer to
evaluate than one in Pm+n.
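For m = n = 2, say, the operation counts can be read off from nested evaluation; a small illustration of our own (the denominator is taken monic, as above):

    def eval_poly(c, x):
        # Nested multiplication for degree m + n = 4:
        # m + n = 4 multiplications and 4 additions.
        y = c[-1]
        for ck in reversed(c[:-1]):
            y = y * x + ck
        return y

    def eval_rational(a, b, x):
        # Numerator in P_2: 2 multiplications, 2 additions.
        num = (a[2] * x + a[1]) * x + a[0]
        # Monic denominator x^2 + b1*x + b0: 1 multiplication, 2 additions.
        den = (x + b[1]) * x + b[0]
        return num / den               # plus one division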
2. Powell uses the letter k in (10.5) to denote the kth approximation to the
b.m.a. from Amn to f . Unfortunately, this makes some of the formulas, such
as (10.12), look even more involved than they are already. This commentary
will suppress the letter k.
3. A proof that each f in C[a, b] has a unique b.m.a. from Amn can be found in
Achieser [2] or Rivlin [138]. Note that Theorem 1.2 does not apply because
Amn is not a linear space.
4. The process described in Section 10.2 for solving the non-linear equations
(10.10) to find aj , j = 0, 1, . . . , m, bj , j = 0, 1, . . . , n, and h, is very ingenious,
but rather hard to follow in the abstract. We illustrate the process here with
the concrete example f(x) = e^x on [−1, 1], r(x) = (a0 + a1x)/(b0 + b1x) in
A11, and initial reference {ξ0, ξ1, ξ2, ξ3} = {−1, −1/2, 1/2, 1}. The calculation
should be compared with that given in the commentary for Chapter 8 to find
the b.m.a. from P2 to f on [−1, 1].
In this case equations (10.10) are
a0 + a1ξ0 = (e^{ξ0} − h)(b0 + b1ξ0), (1)
a0 + a1ξ1 = (e^{ξ1} + h)(b0 + b1ξ1), (2)
a0 + a1ξ2 = (e^{ξ2} − h)(b0 + b1ξ2), (3)
a0 + a1ξ3 = (e^{ξ3} + h)(b0 + b1ξ3). (4)
These are 4 non-linear equations for the 5 unknowns a0 , a1 , b0 , b1 , h, but
remember that we are free to scale a0 , a1 , b0 , b1 . Even so, it does not look
easy to solve (1), (2), (3), (4).
The method of solution in Section 10.2 involves (10.11), which are special
cases of (4.11). It is helpful here to use the notation
Πi = ∏_{j=0, j≠i}^{m+n+1} 1/(ξj − ξi),   i = 0, 1, …, m + n + 1, (5)
so that (10.11) becomes
Σ_{i=0}^{m+n+1} ξi^ℓ Πi = 0,   ℓ = 0, 1, …, m + n. (6)
In our example, where ξ0 = −1, ξ1 = −1/2, ξ2 = 1/2, ξ3 = 1, we have
Π0 = 1/((1/2)(3/2)(2)) = 2/3,   Π1 = 1/((−1/2)(1)(3/2)) = −4/3,
Π2 = 1/((−3/2)(−1)(1/2)) = 4/3,   Π3 = 1/((−2)(−3/2)(−1/2)) = −2/3.
Thus equation (6) states that
(−1)^ℓ (2/3) + (−1/2)^ℓ (−4/3) + (1/2)^ℓ (4/3) + (1)^ℓ (−2/3) = 0,   ℓ = 0, 1, 2,
which you can readily check.
Thus, if we multiply equations (1), (2), (3), (4) by Π0, Π1, Π2, Π3,
respectively, then the left-hand sides sum to zero, giving
Σ_{i=0}^{3} (e^{ξi} − (−1)^i h)(b0 + b1ξi)Πi = 0. (7)
Similarly, if we multiply equations (1), (2), (3), (4) by ξ0Π0, ξ1Π1, ξ2Π2,
ξ3Π3, respectively, and sum, then we obtain
Σ_{i=0}^{3} (e^{ξi} − (−1)^i h)(b0ξi + b1ξi²)Πi = 0. (8)
Equations (7) and (8) can be written together as
0 = Σ_{i=0}^{3} (e^{ξi} − (−1)^i h)(Σ_{j=0}^{1} bj ξi^{j+ℓ})Πi
= Σ_{j=0}^{1} (Σ_{i=0}^{3} (e^{ξi} − (−1)^i h) ξi^{j+ℓ} Πi) bj,   ℓ = 0, 1,
that is, (A − hB)b = 0, where b = (b0, b1)ᵀ,
Aℓj = Σ_{i=0}^{3} e^{ξi} ξi^{ℓ+j} Πi,   ℓ, j = 0, 1, (9)
Bℓj = Σ_{i=0}^{3} (−1)^i ξi^{ℓ+j} Πi,   ℓ, j = 0, 1. (10)
In our case
A00 = (2/3)e⁻¹ − (4/3)e^{−1/2} + (4/3)e^{1/2} − (2/3)e = (4/3)(2 sinh(1/2) − sinh 1),
A10 = −(2/3)e⁻¹ + (2/3)e^{−1/2} + (2/3)e^{1/2} − (2/3)e = (4/3)(cosh(1/2) − cosh 1),
A11 = (2/3)e⁻¹ − (1/3)e^{−1/2} + (1/3)e^{1/2} − (2/3)e = (2/3)(sinh(1/2) − 2 sinh 1),
and
B00 = 2/3 + 4/3 + 4/3 + 2/3 = 4,
B10 = −2/3 − 2/3 + 2/3 + 2/3 = 0,
B11 = 2/3 + 1/3 + 1/3 + 2/3 = 2.
Thus
A = [ −0.177 347 44  −0.553 939 55 ]      B = [ 4  0 ]
    [ −0.553 939 55  −1.219 538 05 ],         [ 0  2 ].
Now we wish to determine those h such that det(A − hB) = 0, that is,
(det B)h² − (A00B11 − 2A01B01 + A11B00)h + det A = 0, (11)
in view of the symmetry of A and B. This quadratic is
8h² + 5.232 847 08h − 0.090 567 07 = 0,
which has solutions
h1 = 0.016 872 21 and h2 = −0.670 978 095.
Recalling that |h| denotes the levelled reference error, we try h1 first (since it
is smaller in modulus) and seek a solution b = (b0, b1)ᵀ to
(A − h1B)b = [ −0.244 836 28  −0.553 939 55 ] [ b0 ]  =  [ 0 ]
             [ −0.553 939 55  −1.253 282 47 ] [ b1 ]     [ 0 ].
Choosing b1 = 1, we obtain b0 = −2.262 489 65 from the first equation, and
we note that, with these values, q(x) = b0 + b1 x has no zeros in [−1, 1]. From
equations (1) and (4), say, we obtain a0 = −2.299 130 562,
a1 = −1.153 973 103, so that
r(x) = (2.299 130 562 + 1.153 973 103x)/(2.262 489 65 − x),
and the levelled reference error is |h1 | = 0.016 872 21. The argument on
page 115 of Powell shows that the value h2 leads to a rational function r with
a singularity in [−1, 1], as you can easily check.
To continue the exchange algorithm, we seek a number η in [−1, 1] such that
(10.5) holds. The extreme values occur either at ±1 or at solutions of
0 = (d/dx)(e^x − (a0 + a1x)/(b0 + b1x)) = e^x − (a1b0 − a0b1)/(b0 + b1x)².
With the calculated values of a0, a1, b0, b1, we thus have to solve
e^x − 4.909 982 764/(2.262 489 65 − x)² = 0,
and use of Newton’s method or the bisection method gives
η1 = −0.256 234 678, η2 = 0.704 578 163.
Evaluating f(x) − r(x) at these points we find that
‖f − r‖∞ = 0.025 321 995 4. Thus we deduce from (10.8) that, to 3 significant
figures,
0.0169 ≤ ‖f − r∗‖∞ ≤ 0.0253,
where r∗ is the b.m.a. from A11 to f(x) = e^x on [−1, 1]. It would appear,
therefore, that the least maximum error in approximation from A11 to f is
about half that obtained in approximating from P2 to f (see the commentary
on Section 8.2).
The error function e = f − r has the form shown below.
[Figure: graph of e = f − r on [−1, 1], with levels ±|h| near −1, −1/2, 1/2, 1 and extreme points η1 ≈ −0.26, η2 ≈ 0.70.]
Hence the new reference for the second iteration of the one-point exchange
algorithm is {−1, −1/2, η2, 1}.
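The whole of this calculation is just a 2 × 2 generalized eigenvalue problem, and can be checked with scipy; the script below is our own (all names are ours):

    import numpy as np
    from scipy.linalg import eig

    xi = np.array([-1.0, -0.5, 0.5, 1.0])
    Pi = np.array([1 / np.prod(np.delete(xi, i) - xi[i]) for i in range(4)])
    sgn = np.array([1.0, -1.0, 1.0, -1.0])                  # (-1)^i
    A = np.array([[np.sum(np.exp(xi) * xi**(l + j) * Pi)    # (9)
                   for j in range(2)] for l in range(2)])
    B = np.array([[np.sum(sgn * xi**(l + j) * Pi)           # (10)
                   for j in range(2)] for l in range(2)])
    h_vals, b_vecs = eig(A, B)                              # det(A - hB) = 0
    h_vals = h_vals.real
    k = np.argmin(np.abs(h_vals))        # try the h of smaller modulus first
    h, b = h_vals[k], b_vecs[:, k].real
    b = b / b[1]                         # scale so that b1 = 1
    print(h, b)                          # 0.0168722, [-2.2624897, 1]
    # Recover a0, a1 from equations (1) and (4):
    rhs = [(np.exp(-1) - h) * (b[0] - 1), (np.exp(1) + h) * (b[0] + 1)]
    a0, a1 = np.linalg.solve([[1.0, -1.0], [1.0, 1.0]], rhs)
    print(a0, a1)                        # -2.2991306, -1.1539731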
5. Theorem 10.2 and the discussion which follows it show that all values of h
which satisfy (10.16) are real. Note that if B is symmetric and positive
definite then B is conjugate to a diagonal matrix D with the (positive)
eigenvalues of B lying on the main diagonal. In fact, P⁻¹BP = D, where P
is the transition matrix from an eigenvector basis to the standard basis
(obtained by writing these eigenvectors as column vectors). Thus
B = PDP⁻¹ ⇒ B^{1/2} = PD^{1/2}P⁻¹,
since (B^{1/2})² = PD^{1/2}P⁻¹PD^{1/2}P⁻¹ = PDP⁻¹ = B, and so
B^{−1/2} = (PD^{1/2}P⁻¹)⁻¹ = PD^{−1/2}P⁻¹.
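In floating point the same construction reads as follows (our own illustration; numpy’s eigh gives the orthogonal diagonalisation of a symmetric matrix, so P⁻¹ = Pᵀ):

    import numpy as np

    def inv_sqrt(B):
        # B must be symmetric positive definite: B = P diag(w) P^T, w > 0.
        w, P = np.linalg.eigh(B)
        return P @ np.diag(w ** -0.5) @ P.T

    B = np.array([[3.0, -2.0], [-2.0, 3.0]])   # the matrix of SAQ S2 below
    print(inv_sqrt(B))     # entries (1 ± 1/sqrt 5)/2, as found in the solution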
Self-assessment questions
S1 Justify the fact, used in equation (10.19), that
(−1)^i ∏_{s=0, s≠i}^{m+n+1} 1/(ξs − ξi) = ∏_{s=0, s≠i}^{m+n+1} 1/|ξs − ξi|,   i = 0, 1, …, m + n + 1.

S2 Let
B = [ 3  −2 ]
    [ −2  3 ].
Prove that B is positive definite (i.e. B has strictly positive eigenvalues), and determine B^{−1/2}.
Study Session 2: Some convergence properties of
the exchange algorithm
Read Section 10.3
Commentary
1. Theorem 10.3 shows that in the discrete case the exchange algorithm
converges. As pointed out in this section, the exchange algorithm may fail,
but if it does converge, then the rate of convergence is very rapid — hence
the algorithm’s importance.
2. Equation (10.29). More precisely, the difficulty occurs when every alternating
set which satisfies (10.29) has length less than m + n + 2. In the example at
the bottom of page 117, where m = n = 1, the longest alternating set for
f − r has length 3, whereas m + n + 2 = 4 in this case (cf. the solution to
Problem P5).
Self-assessment questions
S3 Explain how the method of proof of Theorem 10.1 implies that r∗ − r is the
ratio of two cubic functions with four zeros (page 117, bottom).
S4 Confirm that the functions in A11 which satisfy (10.6) with the data
f (−4) = 0, f (−1) = 1, f (1) = 1, f (4) = 0, are given by (1.6 − 0.2x)/(2 − x)
and (1.6 + 0.2x)/(2 + x).
Problems for Chapter 10
P1 Powell Exercise 10.1 (Hint: if b, d > 0, then a/b < c/d ⇔ (a + c)/(b + d) < c/d.)
P2 Powell Exercise 10.2 (Hint: try to mimic the proof of Theorem 7.1.)
P3 Powell Exercise 10.3
P4 Powell Exercise 10.4 (Note that you need not find the b.m.a. from A12 , in
the second part.)
P5 Powell Exercise 10.6 (Hint: try to use the dimension theorem for linear
mappings.)
Solutions to SAQs in Chapter 10
S1 If i is even (odd), then there are an even (odd) number of points ξ0 , . . . , ξi−1
lying to the left of ξi and hence an even (odd) number of the factors (ξs − ξi )
are negative. Thus (−1)i and the product are either both positive or both
negative. The result follows.
S2 The characteristic equation of B is λ2 − 6λ + 5 = 0, so that the eigenvalues
are λ = 1, 5. Since these are both positive, B is positive definite.
Corresponding eigenvectors are (1, 1) for λ = 1 and (1, −1) for λ = 5, so that
the transition matrix from the basis {(1, 1), (1, −1)} to the standard basis is
P = [ 1   1 ]   ⇒   P⁻¹BP = [ 1  0 ] = D.
    [ 1  −1 ]               [ 0  5 ]
Now
D^{−1/2} = [ 1  0    ],
           [ 0  1/√5 ]
so
B^{−1/2} = PD^{−1/2}P⁻¹ = [ (1 + 1/√5)/2   (1 − 1/√5)/2 ]
                          [ (1 − 1/√5)/2   (1 + 1/√5)/2 ].
S3 We know that ξi, i = 0, 1, 2, 3, 4, is an alternating set for f − r∗, that is,
f(ξi) − r∗(ξi) = (−1)^i h, i = 0, 1, 2, 3, 4,
for some h, with ‖f − r∗‖∞ = |h|. If r ∈ A22 and ‖f − r‖∞ < ‖f − r∗‖∞, then
|f(ξi) − r(ξi)| < |f(ξi) − r∗(ξi)|, i = 0, 1, 2, 3, 4,
and so each of the numbers
r(ξi) − r∗(ξi) = (f(ξi) − r∗(ξi)) − (f(ξi) − r(ξi)), i = 0, 1, 2, 3, 4,
has the same sign as (−1)^i h. Hence r − r∗ has at least 4 zeros (the full

strength of Theorem 7.5 is not needed in this case), and since r ∈ A22 ,
r∗ ∈ A11 , the function r − r∗ is indeed the ratio of two cubic functions, whose
denominator has no zeros. Hence r − r∗ = 0, as required (see solution to
Problem P5 for a more general result).
S4 To solve the equations (10.6) we need to consider equation (11), where the
coefficients of A and B are given by (9) and (10) (adapted to the present
case) and Πi , i = 0, 1, 2, 3, are given by (5).
First,
Π0 = 1/(3·5·8) = 1/120,   Π1 = 1/((−3)·2·5) = −1/30,
Π2 = 1/((−5)·(−2)·3) = 1/30,   Π3 = 1/((−8)·(−5)·(−3)) = −1/120,
so that
A00 = 0 − 1/30 + 1/30 + 0 = 0,
A10 = 0 + 1/30 + 1/30 + 0 = 1/15,
A11 = 0 − 1/30 + 1/30 + 0 = 0,
B00 = 1/120 + 1/30 + 1/30 + 1/120 = 1/12,
B10 = (−4)·(1/120) − (−1)·(−1/30) + 1·(1/30) − 4·(−1/120) = 0,
B11 = 16·(1/120) − 1·(−1/30) + 1·(1/30) − 16·(−1/120) = 1/3.
Hence
A = [ 0     1/15 ]   and   B = [ 1/12  0   ],
    [ 1/15  0    ]             [ 0     1/3 ]
so that equation (11) is h²/36 − 1/225 = 0 ⇒ h = ±0.4. It follows that the
equations (A − hB)b = 0 reduce to −hb0/12 + b1/15 = 0, that is,
b1 = 5hb0/4. Thus, we find from equations (1) and (4) that a0 = 5h²b0,
a1 = hb0/4, so that the corresponding rational functions are
(1.6 − 0.2x)/(2 − x)   (h = −0.4)   and   (1.6 + 0.2x)/(2 + x)   (h = 0.4).
Solutions to Problems in Chapter 10
P1 The key observation here is that, for x ∈ [a, b] and θ > 0, the function
rθ(x) = (p∗(x) + θp(x))/(q∗(x) + θq(x)) lies between r∗(x) and r(x).
Indeed, by the hint, since q∗, q > 0 on [a, b],
p∗(x)/q∗(x) < p(x)/q(x) ⇒ p∗(x)/q∗(x) < (p∗(x) + θp(x))/(q∗(x) + θq(x)) < p(x)/q(x),
and similarly
p∗(x)/q∗(x) > p(x)/q(x) ⇒ p∗(x)/q∗(x) > (p∗(x) + θp(x))/(q∗(x) + θq(x)) > p(x)/q(x).
Also, if p∗(x)/q∗(x) = p(x)/q(x), then rθ(x) = p∗(x)/q∗(x) = p(x)/q(x).
Now suppose that η ∈ [a, b] satisfies ‖f − rθ‖∞ = |f(η) − rθ(η)|. If rθ(η) lies
strictly between r∗(η) and r(η), then it is clear that
|f(η) − rθ(η)| < max{|f(η) − r∗(η)|, |f(η) − r(η)|} ≤ ‖f − r∗‖∞,
and so ‖f − rθ‖∞ < ‖f − r∗‖∞, as required.
Otherwise, rθ(η) = r∗(η) = r(η), so that
‖f − rθ‖∞ = |f(η) − rθ(η)| = |f(η) − r(η)| < ‖f − r∗‖∞,
once again.
P2 The aim here is to prove that if
(f(x) − r∗(x))(r(x) − r∗(x)) > 0, x ∈ ZM,
for some r ∈ Amn, where
ZM = {x ∈ [a, b] : |f(x) − r∗(x)| = ‖f − r∗‖∞},
then r∗ is not a b.m.a. from Amn to f, because ‖f − rθ‖∞ < ‖f − r∗‖∞ for
some θ > 0, where
rθ = (p∗ + θp)/(q∗ + θq).
Following the proof of Theorem 7.1, we put e∗ = f − r∗ and define
Z0 = {x ∈ [a, b] : (r(x) − r∗ (x)) e∗ (x) ≤ 0},
so that d = max_{x∈Z0} |e∗(x)| < ‖e∗‖∞, because Z0 and ZM are disjoint.
Next observe that ‖r∗ − rθ‖∞ can be made arbitrarily small by choosing θ
small, since
r∗ − rθ = p∗/q∗ − (p∗ + θp)/(q∗ + θq) = −θ(pq∗ − p∗q)/(q∗(q∗ + θq)).
In particular, we can choose θ so small that
‖r∗ − rθ‖∞ < ‖e∗‖∞ − d.
Also recall from Problem P1 that rθ (x) lies between r∗ (x) and r(x), so that
r∗ (x) − rθ (x) and r∗ (x) − r(x) have the same sign (unless both are zero).
Finally, choose ξ ∈ [a, b] such that |f(ξ) − rθ(ξ)| = ‖f − rθ‖∞. If ξ ∈ Z0, then
f(ξ) − r∗(ξ) and r∗(ξ) − rθ(ξ) have the same sign, so that
‖f − rθ‖∞ = |f(ξ) − rθ(ξ)|
= |f(ξ) − r∗(ξ) + r∗(ξ) − rθ(ξ)|
= |f(ξ) − r∗(ξ)| + |r∗(ξ) − rθ(ξ)|
< d + (‖e∗‖∞ − d)
= ‖e∗‖∞.
On the other hand, if ξ ∈ [a, b]\Z0, then f(ξ) − r∗(ξ) and r∗(ξ) − rθ(ξ) have
opposite signs (neither is zero!), so that
‖f − rθ‖∞ = |f(ξ) − rθ(ξ)|
= |f(ξ) − r∗(ξ) + r∗(ξ) − rθ(ξ)|
< max{|f(ξ) − r∗(ξ)|, |r∗(ξ) − rθ(ξ)|}
≤ ‖e∗‖∞.
In either case, the desired reduction holds.
We have thus proved the ‘only if’ part of the following analogue of
Theorem 7.1.
Theorem Let f ∈ C[a, b]. Then r∗ is a b.m.a. from Amn to f if and only if
there is no function r in Amn such that
(f (x) − r∗ (x))(r(x) − r∗ (x)) > 0, x ∈ ZM ,
where
ZM = {x ∈ [a, b] : |f(x) − r∗(x)| = ‖f − r∗‖∞}.
The proof of the ‘if’ part goes exactly as the beginning of Section 7.2.
P3 As in the solution to SAQ S4 we could obtain equation (11) by determining
Πi, i = 0, 1, 2, 3, from (5) and the coefficients of A and B from (9) and (10).
However, the fact that ξ0 = 0 and f (0) = 0 makes it fairly easy to solve
equations (10.6) directly:
a0 + 0a1 = (0 − h)(b0 + 0b1 ), (12)
a0 + 2a1 = (1 + h)(b0 + 2b1 ), (13)
a0 + 5a1 = (1.6 − h)(b0 + 5b1 ), (14)
a0 + 6a1 = (2 + h)(b0 + 6b1 ). (15)
Substituting for a0 from (12) into (13), (14), (15) and choosing b1 = 1 gives
2a1 = b0 + 2 + 2hb0 + 2h, (16)
5a1 = 1.6b0 + 8 − 5h, (17)
6a1 = 2b0 + 12 + 2hb0 + 6h. (18)
Using (16) to eliminate a1 from (17) and (18) gives
0 = 0.9b0 − 3 + 5hb0 + 10h,
0 = b0 − 6 + 4hb0 ,
and, after eliminating b0 ,
0 = 2.4 + 28h + 40h2 = (h + 0.1)(40h + 24),
so that h = −0.1, −0.6.
The solution h = −0.1 gives b0 = 10, a1 = 4.9, a0 = 1, that is,
r1(x) = (1 + 4.9x)/(10 + x).
The solution h = −0.6 gives b0 = −30/7, a1 = 29/35, a0 = −18/7, that is,
r2(x) = (−18/7 + (29/35)x)/(−30/7 + x) = (−90 + 29x)/(−150 + 35x).
The two functions have the following graphs.
[Figure: graphs of y = (1 + 4.9x)/(10 + x) and y = (−90 + 29x)/(−150 + 35x) against the data points on [0, 6].]
Note that only the first approximation r1 is in A11 (r2 has a pole at x = 30/7 in [0, 6]).
P4 It is a straightforward matter to check that ‖f − r∗‖∞ = 1/4 and that
{−1, −1/2, 1/2, 1} is an alternating set of length 4 for f − r∗. Since
m + n + 2 = 5 in this case, we can apply the argument at the bottom of page
117 (see also SAQ S3) to show that r∗ is the b.m.a. to f from A21.
To prove that r∗ is not the b.m.a. from A12 to f, we shall find a function of
the form
r(x) = x/(a + bx²), −1 ≤ x ≤ 1,
such that ‖f − r‖∞ < 1/4. The above form for r is appropriate because the
function f itself is odd (i.e. f(−x) = −f(x)). There are many reasonable
choices for a and b, but a = 2, b = −1 seems a good idea since then we have
r′(0) = 1/2 and r(1) = f(1) (and also r′(1) = f′(1)).
[Figure: graphs of y = x/(2 − x²) and y = x³ on [−1, 1].]
The maximum value of |f(x) − r(x)| in [−1, 1] occurs at a zero α of
f′(x) − r′(x) = 3x² − (2 + x²)/(2 − x²)²
= (3x²(4 − 4x² + x⁴) − (2 + x²))/(2 − x²)²
= (3x⁶ − 12x⁴ + 11x² − 2)/(2 − x²)²
= (x² − 1)(3x⁴ − 9x² + 2)/(2 − x²)².
Thus
α² = (9 − √(81 − 24))/6   ⇒   α = ±0.491 624 105.
Hence ‖f − r‖∞ = |f(α) − r(α)| = 0.160 778 31 < 0.25, as required.
P5 It is useful in this problem to denote the degree of a polynomial p by ∂p.
Thus d = min{m − ∂p∗, n − ∂q∗}.
If r = p/q lies in Amn, then
r − r∗ = p/q − p∗/q∗ = (pq∗ − p∗q)/(qq∗).
Thus we need to find polynomials p ∈ Pm, q ∈ Pn such that
pq∗ − p∗q = ∏_{i=1}^{k} (x − ξi), (19)
where 0 ≤ k ≤ m + n − d. Also, we require q > 0 on [a, b], but we note that
this can be arranged (once a solution to (19) is available) by considering
(p∗ + εp)/(q∗ + εq), which also satisfies (19) up to a positive constant
multiple, for a suitably small positive constant ε.
Notice first that the degree of pq∗ − p∗q is at most
max{m + ∂q∗, n + ∂p∗} = m + n − d,
and that the polynomials p, q have altogether m + n + 2 unknown coefficients
a0, …, am, b0, …, bn, say. Equating coefficients of x^j, j = 0, 1, …, m + n − d,
in (19) gives m + n + 1 − d linear equations for the unknown coefficients,
which we regard as a linear mapping
t : (a0, …, am, b0, …, bn) → (c0, c1, …, cm+n−d),
from R^{m+n+2} to R^{m+n+1−d}.
In order that we can solve for a0 , . . . , bn given any c0 , . . . , cm+n−d , we require
the mapping t to be onto, that is, dim Im(t) = m + n + 1 − d. By the
dimension theorem,
dim Ker(t) + dim Im(t) = m + n + 2,
and so we require dim Ker(t) = d + 1.
Suppose, then, that p/q ∈ Amn and that the coefficient vector of (p, q) lies in
Ker(t). Then
pq∗ − p∗q = 0, so p/q = p∗/q∗, (20)
and since p∗ , q ∗ have no non-constant common factors, we deduce that
p = λp∗ , q = λq ∗ , where λ is a polynomial such that
∂λ = ∂p − ∂p∗ = ∂q − ∂q ∗ ≤ d. It is also clear that if λ is any polynomial
with ∂λ ≤ d, then p/q = λp∗ /λq ∗ lies in Amn and satisfies (20). Hence Ker(t)
consists of arbitrary linear combinations of the d + 1 coefficient vectors of
(p∗(x), q∗(x)), (xp∗(x), xq∗(x)), …, (x^d p∗(x), x^d q∗(x)).
These coefficient vectors are clearly linearly independent, and so
dim Ker(t) = d + 1, as required.
The result of this problem should be compared to Haar condition (2), Powell,
page 77. Using this result, the theorem stated in the solution to Problem P2,
and the fact that a polynomial of degree m + n − d has at most m + n − d
sign changes, we deduce by the argument at the beginning of Section 7.3 the
following analogue of Theorem 7.2.
Theorem Let f ∈ C[a, b]. Then r∗ = p∗ /q ∗ is a b.m.a. from Amn to f on
[a, b] if and only if f − r∗ has an alternating set of length m + n + 2 − d,
where d = min{m − ∂p∗ , n − ∂q ∗ }.
Thus a b.m.a. r∗ from Amn to f is defective if and only if f − r∗ does not have
an alternating set of full length m + n + 2. A function f ∈ C[a, b] is called
hypernormal if its b.m.a. from each Amn is not defective. See Rivlin [138]
for a discussion of this concept and a proof that f (x) = ex is hypernormal.