
Analysis II

Lecture Notes

Joris Roos
UW Madison, Fall 2019
Last update: December 14, 2019
Contents

Chapter 1. Review 5
1. Metric spaces 5
2. Uniform convergence 6
3. Power series 7
4. Further exercises 11
Chapter 2. Compactness in metric spaces 13
1. Compactness and continuity 14
2. Sequential compactness and total boundedness 17
3. Equicontinuity and the Arzelà-Ascoli theorem 22
4. Further exercises 26
Chapter 3. Approximation theory 29
1. Polynomial approximation 29
2. Orthonormal systems 32
3. The Haar system 38
4. Trigonometric polynomials 42
5. The Stone-Weierstrass Theorem 51
6. Further exercises 53
Chapter 4. Linear operators and derivatives 59
1. Equivalence of norms 62
2. Dual spaces* 64
3. Sequential ℓp spaces* 65
4. Derivatives 68
5. Further exercises 73
Chapter 5. Differential calculus in Rn 75
1. The contraction principle 79
2. Inverse function theorem and implicit function theorem 80
3. Ordinary differential equations 86
4. Higher order derivatives and Taylor’s theorem 96
5. Local extrema 101
6. Optimization and convexity* 102
7. Further exercises 109
Chapter 6. The Baire category theorem* 115
1. Nowhere differentiable continuous functions* 118
2. Sets of continuity* 119
3. The uniform boundedness principle* 121
4. Kakeya sets* 126
5. Further exercises 130


Disclaimer:
• This content is based on various sources, mainly Principles of Mathematical
Analysis by Walter Rudin, various individual lecture notes by Andreas Seeger,
and my own notes. For my own convenience, I will not reference sources
individually throughout these notes.
• These notes are likely to contain typos, errors and imprecisions of all kinds.
Possibly lots. Some might be deliberate, some less so. Don’t ever take anything
that you read in a mathematical text for granted. Think hard about what you
are reading and try to make sense of it independently. If that fails, then it’s
time to ask somebody a question. That usually helps. If you do notice a
mistake or an inaccuracy, feel free to let me know.
• Thanks to the students of Math 522 for many useful questions and remarks
that have improved these lecture notes.
Some recommended literature for further reading:
There are many books on mathematical analysis each of which likely has a large
intersection with this course. Here are two very good ones:
• W. Rudin, Principles of Mathematical Analysis
• T. Apostol, Mathematical analysis: A modern approach to advanced calculus
For further reading on Fourier series and trigonometric polynomials, see:
• E. M. Stein, R. Shakarchi, Fourier Analysis (modern and very accessible for
beginners)
• Y. Katznelson, Introduction to Harmonic Analysis (slightly more advanced)
• A. Zygmund, Trigonometric Series (a classic that continues to be relevant
today)
We will sometimes dip into concepts from functional analysis. For instance, exposi-
tions of the Baire category theorem and its consequences are also contained in
• W. Rudin, Real and Complex Analysis (Chapter 5)
• E. M. Stein, R. Shakarchi, Functional Analysis (Chapter 4)
We roughly assume knowledge of the content of Rudin’s book up to Chapter 7 up
to (excluding) equicontinuity, but some of the material in previous chapters will also
be repeated (everything related to compactness for instance).
CHAPTER 1

Review

Lecture 1 (Wednesday, Sep 4)

1. Metric spaces
Definition 1.0 (Metric space). A non-empty set X equipped with a map d :
X × X → [0, ∞) is called a metric space if for all x, y, z ∈ X,
(1) d(x, y) = d(y, x)
(2) d(x, z) ≤ d(x, y) + d(y, z)
(3) d(x, y) = 0 if and only if x = y
d is called a metric.
We will use the following notations for (closed) balls in X:
(1.1) B(x0, r) = {x ∈ X : d(x, x0) < r},  B̄(x0, r) = {x ∈ X : d(x, x0) ≤ r}.
We write \overline{B(x0, r)} for the closure of the open ball B(x0, r). Note that B(x0, r) ⊂ \overline{B(x0, r)} ⊂ B̄(x0, r), but each of these inclusions may be proper.
Should multiple metric spaces be involved we use subscripts on the metric and balls
to indicate which metric space we mean, i.e. dX refers to the metric of X and BX (x0 , r)
is a ball in the metric space X.
The most important examples of metric spaces for the purpose of this lecture are
R, C, Rn , subsets thereof and Cb (X), the space of bounded continuous functions on a
metric space X which will be introduced later.
Definition 1.1 (Convergence). Let X be a metric space, (xn )n ⊂ X a sequence
and x ∈ X. We say that (xn )n converges to x if for all ε > 0 there exists N ∈ N such
that for all n ≥ N it holds that d(xn , x) < ε.
Definition 1.2 (Continuity). Let X, Y be metric spaces. A map f : X → Y is
called continuous at x ∈ X if for all ε > 0 there exists δ > 0 such that if dX (x, y) < δ,
then dY (f (x), f (y)) < ε. f is called continuous if it is continuous at every x ∈ X. We
also write f ∈ C(X, Y ).
We assume familiarity with basic concepts of metric space topology except for com-
pactness: open sets, closed sets, limit points, closure, completeness, dense sets, con-
nected sets, etc. We will discuss compactness in metric spaces in detail in Section 2.

In this course we will mostly study real- or complex-valued functions on metric


spaces, i.e. f : X → R or f : X → C. Whether functions are real- or complex-valued
is often of little consequence to the heart of the matter. For definiteness we make the
convention that functions are always complex-valued, unless specified otherwise. The
space of continuous functions will be denoted by C(X), while the space of bounded
continuous functions is denoted Cb (X).

2. Uniform convergence
Definition 1.3. A sequence (fn )n of functions on a metric space is called uniformly
convergent to a function f if for all ε > 0 there exists N ∈ N such that for all n ≥ N
and all x ∈ X,
(1.2) |fn (x) − f (x)| < ε.
Compare this to pointwise convergence. To see the difference between the two it
helps to write down the two definitions using the symbolism of predicate logic:

(1.3) ∀ε > 0 ∃N ∈ N ∀x ∈ X ∀n ≥ N : |fn (x) − f (x)| < ε.

(1.4) ∀ε > 0 ∀x ∈ X ∃N ∈ N ∀n ≥ N : |fn (x) − f (x)| < ε.


Formally, the difference is an interchange in the order of universal and existential
quantifiers. The first is uniform convergence, where N needs to be chosen independently
of x (uniformly in x) and the second is pointwise convergence, where N is allowed to
depend on x.
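To see the difference concretely, here is a minimal numerical sketch (Python with NumPy; the family fn(x) = xⁿ on [0, 1) is an illustrative choice, not taken from the text above): at each fixed x the values fn(x) tend to 0, but the supremum over x stays near 1, so no single N works uniformly in x.

import numpy as np

# f_n(x) = x**n on [0, 1): converges to 0 pointwise, but not uniformly.
x = np.linspace(0.0, 0.999, 1000)

for n in [1, 10, 100, 1000]:
    fn = x ** n
    pointwise = 0.5 ** n            # value at the fixed point x = 0.5 -> 0
    sup_dist = np.max(np.abs(fn))   # sup over the grid stays close to 1
    print(f"n={n:5d}  f_n(0.5)={pointwise:.3e}  sup|f_n - 0| ~ {sup_dist:.3f}")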

In the following we collect some important facts surrounding uniform convergence


that will be used in this lecture. In case you are feeling a bit rusty on these concepts,
all of these are good exercises to try and prove directly from first principles.
Fact 1.4. A sequence (fn )n of functions on a metric space X converges uniformly
if and only if it is uniformly Cauchy, i.e. for every ε > 0 there exists N ∈ N such that
for all n, m ≥ N and all x ∈ X, |fn (x) − fm (x)| < ε.
Fact 1.5. If (fn )n converges uniformly to f and each fn is bounded, then f is
bounded.
(Recall that a function f : X → C is called bounded if there exists C > 0 such that
|f (x)| ≤ C for all x ∈ X.)
Fact 1.6. If (fn )n converges uniformly to f and each fn is continuous, then f is
continuous.

Lecture 2 (Friday, Sep 6)

Fact 1.7. Let X be a metric space. The space of bounded continuous functions
Cb (X) is a complete metric space with the supremum metric
(1.5) d∞(f, g) = sup_{x∈X} |f(x) − g(x)|.

(Recall that a metric space is complete if every Cauchy sequence converges.)


Fact 1.8. Let (fn )n ⊂ Cb (X) be a sequence. Then (fn )n converges in Cb (X) (with
respect to d∞ ) if and only if it converges uniformly to f for some f ∈ Cb (X).
Fact 1.9 (Weierstrass M -test). Let (fn )n be a sequence of functions on a metric
space X such that there exists a sequence of non-negative real numbers (Mn )n with
(1.6) |fn(x)| ≤ Mn
for all n = 1, 2, . . . and all x ∈ X. Assume that ∑_{n=1}^∞ Mn converges. Then the series ∑_{n=1}^∞ fn converges uniformly (that is, the sequence of partial sums (∑_{n=1}^m fn)_m converges uniformly).
Fact 1.10. Suppose (fn )n is a sequence of Riemann integrable functions on the
interval [a, b] which uniformly converges to some limit f on [a, b]. Then f is Riemann
integrable and
(1.7) lim_{n→∞} ∫_a^b fn = ∫_a^b f.

Exercise 1.11. Remind yourself how to prove all these facts.


Careful: Recall that if fn → f uniformly and fn is differentiable on [a, b], then this
does not imply that f is differentiable.
Exercise 1.12. Find an example for this. (Hint: Try trigonometric functions.)

3. Power series
A power series is a function of the form
(1.8) f(x) = ∑_{n=0}^∞ cn xⁿ

where cn ∈ C are some complex coefficients.


To a power series we can associate a number R ∈ [0, ∞] called its radius of conver-
gence such that
• ∑_{n=0}^∞ cn xⁿ converges for every |x| < R,
• ∑_{n=0}^∞ cn xⁿ diverges for every |x| > R.
On the convergence boundary |x| = R, the series may converge or diverge. The number
R can be computed by the Cauchy-Hadamard formula:
(1.9) R = (lim sup_{n→∞} |cn|^{1/n})^{−1}
(with the convention that if lim sup_{n→∞} |cn|^{1/n} = 0, then R = ∞).
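As a quick numerical sketch of the Cauchy-Hadamard formula (plain Python; the coefficient sequences cn = 2ⁿ with R = 1/2 and cn = 1/n! with R = ∞ are illustrative choices, and a single large n is used as a crude stand-in for the lim sup):

import math

# Crude estimate of R = (limsup |c_n|^(1/n))^(-1) using one large index n.
def radius_estimate(c, n):
    return abs(c(n)) ** (-1.0 / n)

for n in [10, 100, 1000]:
    print(n, radius_estimate(lambda k: 2.0 ** k, n))   # c_n = 2^n: estimates -> 0.5

for n in [10, 100, 170]:
    # c_n = 1/n!: |c_n|^(-1/n) = (n!)^(1/n) grows without bound, so R = infinity.
    print(n, math.factorial(n) ** (1.0 / n))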

Figure 1. Radius of convergence.

Fact 1.13. A power series with radius of convergence R converges uniformly on


[−R + ε, R − ε] for every 0 < ε < R. Consequently, power series are continuous on
(−R, R).
Exercise 1.14. Prove this. Why does uniform convergence not hold on (−R, R) ?
(Give an example.)
Fact 1.15. If f(x) = ∑_{n=0}^∞ cn xⁿ has radius of convergence R, then f is differentiable on (−R, R) and
(1.10) f′(x) = ∑_{n=1}^∞ n cn x^{n−1}
for |x| < R.


Example 1.16. The exponential function is a power series defined by

(1.11) exp(x) = ∑_{n=0}^∞ xⁿ/n!.
The radius of convergence is R = ∞.
Fact 1.17. The exponential function is differentiable and exp′(x) = exp(x) for all x ∈ R.
Fact 1.18. For all x, y ∈ R we have the functional equation
(1.12) exp(x + y) = exp(x) exp(y).
It also makes sense to speak of exp(z) for z ∈ C since the series converges absolutely.
We also write e^x instead of exp(x).
Example 1.19. The trigonometric functions can also be defined by power series:
(1.13) cos(x) = ∑_{n=0}^∞ (−1)ⁿ x^{2n}/(2n)!
(1.14) sin(x) = ∑_{n=0}^∞ (−1)ⁿ x^{2n+1}/(2n+1)!
Fact 1.20. The functions sin and cos are differentiable and
(1.15) sin′(x) = cos(x), cos′(x) = − sin(x).

The trigonometric functions are related to the exponential function via complex
numbers.
Fact 1.21 (Euler’s identity). For all x ∈ R,
(1.16) e^{ix} = cos(x) + i sin(x),
(1.17) cos(x) = (e^{ix} + e^{−ix})/2,
(1.18) sin(x) = (e^{ix} − e^{−ix})/(2i).
Fact 1.22 (Pythagorean theorem). For all x ∈ R,
(1.19) cos(x)² + sin(x)² = 1.
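A small numerical sanity check (plain Python; the test point x = 0.7 and the truncation at 40 terms are arbitrary choices): evaluating the exponential series at ix reproduces cos(x) + i sin(x), and the Pythagorean identity holds up to rounding.

import cmath

def exp_series(z, terms=40):
    # Partial sum of sum_{n>=0} z^n / n!
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)
    return total

x = 0.7
print(abs(exp_series(1j * x) - (cmath.cos(x) + 1j * cmath.sin(x))))  # ~1e-16, cf. (1.16)
print(cmath.cos(x) ** 2 + cmath.sin(x) ** 2)                         # ~1, cf. (1.19)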
Let us also recall basic properties of complex numbers at this point: For every
complex number z ∈ C there exist a, b ∈ R, r ≥ 0 and φ ∈ [0, 2π) such that
(1.20) z = a + ib = re^{iφ}.
The complex conjugate of z is defined by
(1.21) z̄ = a − ib = re^{−iφ}.
The absolute value of z is defined by
(1.22) |z| = √(a² + b²) = r.
We have
(1.23) |z|² = z z̄.

Figure 2. Polar and Cartesian coordinates in the complex plane.



Lecture 3 (Monday, Sep 9)

We finish the review section with a simple, but powerful theorem on the continuity
of power series on the convergence boundary.
Theorem 1.23 (Abel). Let f(x) = ∑_{n=0}^∞ cn xⁿ be a power series with radius of convergence R = 1. Assume that ∑_{n=0}^∞ cn converges. Then
(1.24) lim_{x→1−} f(x) = ∑_{n=0}^∞ cn.
(In particular, the limit exists.)


The key idea for the proof is Abel summation, also referred to as summation by
parts. The precise formula can be derived simply by reordering terms (we say that
a−1 = 0):
(1.25) ∑_{n=0}^N (an − a_{n−1}) bn = a0 b0 + a1 b1 − a0 b1 + a2 b2 − a1 b2 + ⋯ + aN bN − a_{N−1} bN
(1.26) = a0(b0 − b1) + a1(b1 − b2) + ⋯ + a_{N−1}(b_{N−1} − bN) + aN bN = aN bN + ∑_{n=0}^{N−1} an (bn − b_{n+1}).
Proof. To apply summation by parts we set sn = ∑_{k=0}^n ck, s_{−1} = 0. Then
(1.27) ∑_{n=0}^N cn xⁿ = ∑_{n=0}^N (sn − s_{n−1}) xⁿ = sN x^N + (1 − x) ∑_{n=0}^{N−1} sn xⁿ.

Let 0 < x < 1. Then
(1.28) f(x) = (1 − x) ∑_{n=0}^∞ sn xⁿ.
Let s = ∑_{n=0}^∞ cn. By assumption, sn → s. Let ε > 0 and choose N ∈ N such that
(1.29) |sn − s| < ε
for all n > N. Then,
(1.30) |f(x) − s| = |(1 − x) ∑_{n=0}^∞ (sn − s) xⁿ|,
because (1 − x) ∑_{n=0}^∞ xⁿ = 1. Now we use the triangle inequality and split the sum at n = N:

(1.31) |f(x) − s| ≤ (1 − x) ∑_{n=0}^N |sn − s| xⁿ + (1 − x) ∑_{n=N+1}^∞ |sn − s| xⁿ
(1.32) ≤ (1 − x) ∑_{n=0}^N |sn − s| xⁿ + ε,
where in the second sum we used |sn − s| ≤ ε for n > N.

By making x sufficiently close to 1 we can achieve that


(1.33) (1 − x) ∑_{n=0}^N |sn − s| xⁿ ≤ ε.
This concludes the proof. □


Abel’s theorem provides a tool to evaluate convergent series.
Example 1.24. Consider the power series
(1.34) f(x) = ∑_{n=0}^∞ (−1)ⁿ x^{2n+1}/(2n + 1).
The radius of convergence is R = 1. This is the Taylor series at x = 0 of the function arctan.
Exercise 1.25. (a) Prove that f (x) really is the Taylor series at x = 0 of arctan.
(b) Prove using Taylor’s theorem that arctan(x) is represented by its Taylor series at
x = 0 for every |x| < 1, i.e. that f (x) = arctan(x) for |x| < 1.
It follows from the alternating series test that ∑_{n=0}^∞ (−1)ⁿ/(2n+1) converges. Thus, Abel's theorem implies that
(1.35) ∑_{n=0}^∞ (−1)ⁿ/(2n+1) = lim_{x→1−} arctan(x) = arctan(1) = π/4.
This is also known as Leibniz' formula.
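Both sides of (1.35) are easy to observe numerically. The sketch below (plain Python; the cut-offs of 10000 terms and the sample point x = 0.999 are arbitrary) compares the partial sums of the alternating series and the value of the power series near x = 1 with π/4.

import math

leibniz = sum((-1) ** n / (2 * n + 1) for n in range(10000))   # partial sum in (1.35)

def f(x, terms=5000):
    # Partial sum of the arctan series (1.34).
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(terms))

print(math.pi / 4)      # 0.785398...
print(leibniz)          # close to pi/4 (error ~ 1/(2N))
print(f(0.999))         # close to pi/4 as x -> 1-, as Abel's theorem predicts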

4. Further exercises
Exercise 1.26. Prove or disprove convergence for each of the following series (a
and b are real parameters and convergence may depend on their values).
∞ ∞ ∞
X 1 X log n X
(log n)a log log n e1/n − n+1

(i) (ii) (iii) n
n=2
n (log(n))b
a
n=3 n=1

∞ ∞  n 2 ∞
X
−1
X 1 X 1
(iv) cos(πn) sin(πn ) (v) 1+ −e (vi)
n=1 n=2
n n=1
n(n1/n )100

∞ ∞ X
10n k ∞
−(log(n))a kn 1
X X X
(vii) 2 (viii) (−1) (ix)
n=2 n=1 k=0
k! n=1
n2 (1 − cos(n))

Exercise 1.27. Prove or disprove convergence for each of the following sequences
and in case of
p convergence, determine the limit:
4 2 2
(i) an = n + cos(n
2 1
√)−n
(ii) an = n + 2 n − n4 + n3
P 2
(iii) an = nk=n k1
(iv) an = n ∞ 1
P
k=0 n2 +k2
(v) a0 = 1, an+1 = a2n + a1n
2
(vi) an = nk=2 k k−1
Q
2

Exercise 1.28. For which x ∈ R do the following series converge? On which sets
do these series converge uniformly?

(1.36) (i) ∑_{n=1}^∞ n² xⁿ  (ii) ∑_{n=1}^∞ (3^{1/n} − 1)ⁿ xⁿ  (iii) ∑_{n=1}^∞ tan(n^{−2}) e^{nx}
(1.37) (iv) ∑_{n=1}^∞ xⁿ/nⁿ  (v) ∑_{n=1}^∞ sin(nx)/n²  (vi) ∑_{n=1}^∞ 2^{−n} tan(⌊x⌋ + 1/n)

Exercise 1.29. (i) Give an example of a sequence (fn)n of continuously differentiable functions defined on R, uniformly convergent on R such that the limit lim_{n→∞} fn′(x) does not exist for any value of x ∈ R.
(ii) Give an example of a sequence (fn )n of continuously differentiable functions defined
on R, uniformly convergent on R to some function f such that f is not differentiable.
(iii) Give an example of a sequence (fn )n of continuous bounded functions on R that
converges pointwise to some function f such that f is unbounded and not continuous.
Exercise 1.30. Determine the value of the series ∑_{n=1}^∞ (−1)ⁿ/(n(n+1)).

Exercise 1.31. For a positive real number x define
(1.38) f(x) = ∑_{n=0}^∞ 1/(n(n + 1) + x).
(i) Show that f : (0, ∞) → (0, ∞) is a well-defined and continuous function.
(ii) Prove that there exists a unique x0 ∈ (0, ∞) such that f (x0 ) = 2π.
(iii) Determine the value of x0 .

Exercise 1.32. Let f : R → R be a smooth function (i.e. derivatives of all orders exist). Assume that there exist A > 0, R > 0 such that
(1.39) |f^{(n)}(x)| ≤ Aⁿ n!
for |x| < R. Show that there exists r > 0 such that for every |x| < r we have that
(1.40) f(x) = ∑_{n=0}^∞ (f^{(n)}(0)/n!) xⁿ.
(That is, prove that the series on the right hand side converges and that the limit is
f (x).)
CHAPTER 2

Compactness in metric spaces

The goal in this section is to study the general theory of compactness in metric
spaces. From Analysis I, you might already be familiar with compactness in R. By
the Heine-Borel theorem, a subset of Rn is compact if and only if it is bounded and
closed. We will see that this no longer holds in general metric spaces. We will also
study in detail compact subsets of the space of continuous functions C(K) where K is
a compact metric space (Arzelà-Ascoli theorem). Let (X, d) be a metric space. We first
review some basic definitions.
Definition 2.1. A collection (Gi)_{i∈I} (I is an arbitrary index set) of open sets Gi ⊂ X is called an open cover of X if X ⊂ ⋃_{i∈I} Gi.
(Clarification of notation: A ⊂ B means for us that A is a subset of B, not necessarily a proper subset. That is, we also allow A = B. We will write A ⊊ B to refer to proper subsets.)
Definition 2.2. X is compact if every open cover of X contains a finite subcover. That is, if for every open cover (Gi)_{i∈I} there exist m ∈ N and i1, . . . , im ∈ I such that X ⊂ ⋃_{j=1}^m G_{ij}. This is also called the Heine-Borel property.

Definition 2.3. A subset A ⊂ X is called compact if (A, d|A×A ) is a compact


metric space. Here d|A×A denotes the restriction of d to A × A.


Lecture 4 (Wednesday, Sep 11)


Review of relative topology. If we have a metric space X and a subset A ⊂ X,
then A as a metric space with the metric d|A×A comes with its own open sets: a set
U ⊂ A is open in A if and only if for every x ∈ U there exists ε > 0 such that
BA (x, ε) = {y ∈ A : d(x, y) < ε} ⊂ U . A set U ⊂ A that is open in A is not
necessarily open in X. However, the open sets in A can be characterized by the open
sets in X: a set U ⊂ A is open in A if and only if there exists V ⊂ X open (in X) such
that U = V ∩ A (see Chapter 2 in Rudin’s book).
Example 2.4. Let X = R, A = [0, 1]. Then U = [0, 1/2) ⊂ A ⊂ X is open in A, but not open in R. However, there exists V ⊂ R open such that U = V ∩ A: for example, V = (−1, 1/2).
Theorem 2.5 (Heine-Borel). A subset A ⊂ R is compact if and only if A is closed
and bounded.
This theorem also holds for subsets of Rn but not for subsets of general metric
spaces. We will later identify this as a special case of a more general theorem.
Definition 2.6. A subset A ⊂ X is called relatively compact or precompact if the
closure A ⊂ X is compact.
Examples 2.7. • If X is finite, then it is compact.
• [a, b] ⊂ R is compact. [a, b), (a, b) ⊂ R are relatively compact.
• {x ∈ Rⁿ : ∑_{i=1}^n |xi|² = 1} ⊂ Rⁿ is compact.
• The set of orthogonal n × n matrices with real entries O(n, R) is compact as a subset of R^{n²}.
• For general X, the closed ball
(2.1) B̄(x0, r) = {x ∈ X : d(x, x0) ≤ r} ⊂ X
is not necessarily compact (examples later).
As a warm-up in dealing with the definition of compactness let us prove the follow-
ing.
Fact 2.8. A closed subset of a compact metric space is compact.
Proof. Let (Gi )i∈I be an open cover of a closed subset A ⊂ X. That is, Gi ⊂ A
is open with respect to A. Then Gi = Ui ∩ A for some open Ui ⊂ X (see Theorem 2.30
in Rudin’s book). Note that X\A is open. Thus,
(2.2) {Ui : i ∈ I} ∪ {X\A}
is an open cover of X, which by compactness has a finite subcover {Uik : k =
1, . . . , M } ∪ {X\A}. Then {Gik : k = 1, . . . , M } is an open cover of A. 
Exercise 2.9. Let X be a compact metric space. Prove that there exists a countable, dense set E ⊂ X (recall that E ⊂ X is called dense if \overline{E} = X).
1. Compactness and continuity
We will now prove three key theorems that relate compactness to continuity. In
Analysis I you might have seen versions of these on R or Rn . The proofs are not very
interesting, but can serve as instructive examples of how to prove statements involving
the Heine-Borel property.

Theorem 2.10. Let X, Y be metric spaces and assume that X is compact. If f :


X → Y is continuous, then it is uniformly continuous.
Proof. Let ε > 0. We need to demonstrate the existence of a number δ > 0
such that for all x, y ∈ X we have that dX (x, y) ≤ δ implies dY (f (x), f (y)) ≤ ε. By
continuity, for every x ∈ X there exists a number δx > 0 such that for all y ∈ X,
dX (x, y) ≤ δx implies dY (f (x), f (y)) ≤ ε/2. Let
(2.3) Bx = B(x, δx /2) = {y ∈ X : dX (x, y) < δx /2}.
Then (Bx )x∈X is an open cover of X. By compactness, there exists a finite subcover by
Bx1 , . . . , Bxm . Now we set
(2.4) δ = (1/2) min(δx1 , . . . , δxm ).
We claim that this δ does the job. Indeed, let x, y ∈ X satisfy dX (x, y) ≤ δ. There exists i ∈ {1, . . . , m} such that x ∈ Bxi . Then
(2.5) dX (xi , y) ≤ dX (xi , x) + dX (x, y) ≤ δxi /2 + δ ≤ δxi .

Figure 1. The balls Bxi , B(xi , δxi ), B(x, δ).

Thus, by definition of δxi ,


(2.6) dY (f (x), f (y)) ≤ dY (f (x), f (xi )) + dY (f (xi ), f (y)) ≤ ε/2 + ε/2 = ε.

Theorem 2.11. Let X, Y be metric spaces and assume that X is compact. If f :
X → Y is continuous, then f (X) ⊂ Y is compact.
Note that for A ⊂ X we have A ⊂ f −1 (f (A)) and for B ⊂ Y we have f (f −1 (B)) ⊂
B, but equality need not hold in either case.
Proof. Let (Vi)_{i∈I} be an open cover of f(X). Since f is continuous, the sets Ui = f^{−1}(Vi) ⊂ X are open. We have f(X) ⊂ ⋃_{i∈I} Vi. So,
(2.7) X ⊂ f^{−1}(f(X)) ⊂ ⋃_{i∈I} f^{−1}(Vi) = ⋃_{i∈I} Ui.

Thus (Ui )i∈I is an open cover of X and by compactness there exists a finite subcover
{Ui1 , . . . , Uim }. That is,
(2.8) X ⊂ ⋃_{k=1}^m Uik .

Consequently,
(2.9) f(X) ⊂ ⋃_{k=1}^m f(Uik ) ⊂ ⋃_{k=1}^m Vik .
Thus {Vi1 , . . . , Vim } is an open cover of f(X). □



Lecture 5 (Friday, Sep 13)

Theorem 2.12. Let X be a compact metric space and f : X → R a continuous


function. Then there exists x0 ∈ X such that f (x0 ) = supx∈X f (x).
By passing from f to −f we see that the theorem also holds with sup replaced by
inf.
Proof. By Theorem 2.11, f (X) ⊂ R is compact. By the Heine-Borel Theorem 2.5,
it is therefore closed and bounded. By completeness of the real numbers, f (X) has a
finite supremum sup f (X) and since f (X) is closed we have sup f (X) ∈ f (X), so there
exists x0 ∈ X such that f (x0 ) = sup f (X) = supx∈X f (x). 
Corollary 2.13. Let X be a compact metric space. Then every continuous func-
tion on X is bounded: C(X) = Cb (X).
For a converse of this statement, see Exercise 2.43 below.
Proof. Let f ∈ C(X). Then |f | : X → [0, ∞) is also continuous. By Theorem
2.12 there exists x0 ∈ X such that |f (x0 )| = supx∈X |f (x)|. Set C = |f (x0 )|. Then
|f (x)| ≤ C for all x ∈ X, so f is bounded. 
2. Sequential compactness and total boundedness
Definition 2.14. A metric space X is sequentially compact if every sequence in X
has a convergent subsequence. This is also called the Bolzano-Weierstrass property.
Let us recall the Bolzano-Weierstrass theorem which you might have seen in Analysis
I.
Theorem 2.15 (Bolzano-Weierstrass). Every bounded sequence in R has a conver-
gent subsequence.
Definition 2.16. A metric space X is bounded if it is contained in a single fixed
ball, i.e. if there exist x0 ∈ X and r > 0 such that X ⊂ B(x0 , r).
Definition 2.17. A metric space X is totally bounded if for every ε > 0 there exist
finitely many balls of radius ε that cover X.
Similarly, we define these terms for subsets A ⊂ X by considering (A, d|A×A ) as its
own metric space.
Note that
(2.10) X totally bounded =⇒ X bounded.
The converse is generally false. However, for A ⊂ Rn we have that A is totally bounded
if and only if A is bounded.
Theorem 2.18. Let X be a metric space. The following are equivalent:
(1) X is compact
(2) X is sequentially compact
(3) X is totally bounded and complete
Corollary 2.19. (1) (Heine-Borel Theorem) A subset A ⊂ Rn is compact if
and only if it is bounded and closed.
(2) (Bolzano-Weierstrass Theorem) A subset A ⊂ Rn is sequentially compact if
and only if it is bounded and closed.

Proof of Corollary 2.19. A subset A ⊂ Rn is closed if and only if A is com-


plete as a metric space (this is because Rn is complete). Also, A ⊂ Rn is bounded if
and only if it is totally bounded. Therefore, both claims follow from Theorem 2.18. 
Example 2.20. Let ℓ∞ be the space of bounded sequences (an)n ⊂ C with d(a, b) = sup_{n∈N} |an − bn| (that is, ℓ∞ = Cb(N)). We claim that the closed unit ball around 0 = (0, 0, . . . ),
(2.11) B̄(0, 1) = {a ∈ ℓ∞ : |an| ≤ 1 ∀n ∈ N}
is bounded and closed, but not compact. Indeed, let e^{(k)} ∈ ℓ∞ be the sequence with
(2.12) e^{(k)}_n = 0 if k ≠ n, and e^{(k)}_n = 1 if k = n.
Then e^{(k)} ∈ B̄(0, 1) for all k = 1, 2, . . . but (e^{(k)})_k ⊂ B̄(0, 1) does not have a convergent subsequence, because d(e^{(k)}, e^{(j)}) = 1 for all k ≠ j and therefore no subsequence can be Cauchy. Thus B̄(0, 1) is not sequentially compact and by Theorem 2.18 it is not compact.
Example 2.21. Let ℓ¹ be the space of complex sequences (an)n ⊂ C such that ∑_n |an| < ∞. We define a metric on ℓ¹ by
(2.13) d(a, b) = ∑_n |an − bn|.
Exercise 2.22. Show that the closed and bounded set B̄(0, 1) ⊂ ℓ¹ is not compact.

Lecture 6 (Monday, Sep 16)

Proof of Theorem 2.18. X compact ⇒ X sequentially compact: Suppose that


X is compact, but not sequentially compact. Then there exists a sequence (xn )n ⊂ X
without a convergent subsequence. Let A = {xn : n ∈ N} ⊂ X. Note that A must
be an infinite set (otherwise (xn )n has a constant subsequence). Since A has no limit
points, we have that for every xn there is an open ball Bn such that Bn ∩ A = {xn }.
Also, A is a closed set, so X\A is open. Thus, {Bn : n ∈ N} ∪ {X\A} is an open cover
of X. By compactness of X, it has a finite subcover, but that is a contradiction since
A is an infinite set.

Figure 2.

X sequentially compact ⇒ X totally bounded and complete: Suppose X is sequen-


tially compact. Then it is complete, because every Cauchy sequence that has a conver-
gent subsequence must converge (prove this!). Suppose that X is not totally bounded.
Then there exists ε > 0 such that X cannot be covered by finitely many ε-balls.
Claim: There exists a sequence p1, p2, . . . in X such that d(pi, pj) ≥ ε for all i ≠ j.
Proof of claim. Pick p1 arbitrarily and then proceed inductively: say that we have constructed p1, . . . , pn already. Then there exists p_{n+1} such that d(pi, p_{n+1}) ≥ ε for all i = 1, . . . , n, since otherwise we would have ⋃_{i=1}^n B(pi, ε) ⊃ X. □
Now it remains to observe that the sequence (pn )n has no convergent subsequence (no
subsequence can be Cauchy). Contradiction! Thus, X is totally bounded.

Lecture 7 (Wednesday, Sep 18)

X totally bounded and complete ⇒ X sequentially compact: Assume that X is totally bounded and complete. Let (xn)n ⊂ X be a sequence. We will construct a convergent subsequence. First we cover X by finitely many 1-balls. At least one of them, call it B0, must contain infinitely many of the xn (that is, xn ∈ B0 for infinitely many n), so there is a subsequence (x_n^{(0)})_n ⊂ B0. Next, cover X by finitely many 1/2-balls. There is at least one, B1, that contains infinitely many of the x_n^{(0)}. Thus there is a subsequence (x_n^{(1)})_n ⊂ B1. Inductively, we obtain subsequences (x_n^{(0)})_n ⊃ (x_n^{(1)})_n ⊃ . . . of (xn)n such that (x_n^{(k)})_n is contained in a ball of radius 2^{−k}. Now let an = x_n^{(n)}. Then (an)n is a subsequence of (xn)n.
Claim: (an)n is a Cauchy sequence.
Proof of claim. Let ε > 0 and N large enough so that 2^{−N+1} < ε. Then for m > n ≥ N we have
(2.14) d(am, an) ≤ 2 · 2^{−n} ≤ 2^{−N+1} < ε,
because an, am ∈ Bn and Bn is a ball of radius 2^{−n}. □
Since X is complete, the Cauchy sequence (an)n converges.

X sequentially compact ⇒ X compact: Assume that X is sequentially compact.


Let (Gi )i∈I be an open cover of X.
Claim: There exists ε > 0 such that every ball of radius ε is contained in one of the
Gi .
Proof of claim. Suppose not. Then for every n ∈ N there is a ball Bn of radius 1/n that is not contained in any of the Gi . Let pn be the center of Bn . By sequential
compactness, the sequence (pn )n has a convergent subsequence (pnk )k with some limit
p ∈ X. Let i0 ∈ I be such that p ∈ Gi0 . Since Gi0 is open there exists δ > 0 such that
B(p, δ) ⊂ Gi0 . Let k be large enough such that d(pnk , p) < δ/2 and 1/nk < δ/2. Then
Bnk ⊂ B(p, δ) because if x ∈ Bnk , then
(2.15) d(p, x) ≤ d(p, pnk ) + d(pnk , x) < δ/2 + δ/2 = δ.
Thus, Bnk ⊂ B(p, δ) ⊂ Gi0 .

Figure 3.

This is a contradiction, because we assumed that the Bn are not contained in any
of the Gi . 
Now let ε > 0 be such that every ε-ball is contained in one of the Gi . We have already
proven earlier that X is totally bounded if it is sequentially compact. Thus there exist

p1 , . . . , pM such that the balls B(pj , ε) cover X. But each B(pj , ε) is contained in a Gi ,
say in Gij , so we have found a finite subcover:
(2.16) X ⊂ ⋃_{j=1}^M B(pj, ε) ⊂ ⋃_{j=1}^M Gij . □
Corollary 2.23. Compact subsets of metric spaces are bounded and closed.
Corollary 2.24. Let X be a complete metric space and A ⊂ X. Then A is totally
bounded if and only if it is relatively compact.
Exercise 2.25. Prove this.

Lecture 8 (Friday, Sep 20)

3. Equicontinuity and the Arzelà-Ascoli theorem


Let (K, d) be a compact metric space. By Corollary 2.13, continuous functions on
K are automatically bounded. Thus, C(K) = Cb (K) is a complete metric space with
the supremum metric
(2.17) d∞(f, g) = sup_{x∈K} |f(x) − g(x)|

(see Fact 1.7). Convergence with respect to d∞ is uniform convergence (see Fact 1.8).
In this section we ask ourselves when a subset F ⊂ C(K) is compact.
Example 2.26. Let F = {fn : n ∈ N} ⊂ C([0, 1]), where
(2.18) fn(x) = xⁿ, x ∈ [0, 1].
F is not compact, because no subsequence of (fn)n converges. This is because the pointwise limit
(2.19) f(x) = 0 for x ∈ [0, 1), f(1) = 1,
is not continuous, i.e. not in C([0, 1]).
The key concept that characterizes compactness in C(K) is equicontinuity.
Definition 2.27 (Equicontinuity). A subset F ⊂ C(K) is called equicontinuous if
for every ε > 0 there exists δ > 0 such that |f (x) − f (y)| < ε for all f ∈ F, x, y ∈ K
with d(x, y) < δ.
Definition 2.28. F ⊂ C(K) is called uniformly bounded if there exists C > 0 such
that |f (x)| ≤ C for all x ∈ K and f ∈ F.
F ⊂ C(K) is called pointwise bounded if for all x ∈ K there exists C = C(x) > 0 such
that |f (x)| ≤ C for all f ∈ F.
Note that F ⊂ C(K) is uniformly bounded if and only if it is bounded (as a metric
space, see Definition 2.16). We have
(2.20) F uniformly bounded ⇒ F pointwise bounded.
The converse is false in general.
Fact 2.29. If (fn )n ⊂ C(K) is uniformly convergent (on K), then {fn : n ∈ N} is
equicontinuous.
Proof. Let ε > 0. By uniform convergence there exists N ∈ N such that
(2.21) sup |fn (x) − fN (x)| ≤ ε/3
x∈K

for n ≥ N . By uniform continuity (using Theorem 2.10) there exists δ > 0 such that
(2.22) |fk (x) − fk (y)| ≤ ε/3
for all x, y ∈ K with d(x, y) < δ and all k = 1, . . . , N . Thus, for n ≥ N and x, y ∈ K
with d(x, y) < δ we have
(2.23) |fn (x)−fn (y)| ≤ |fn (x)−fN (x)|+|fN (x)−fN (y)|+|fN (y)−fn (y)| ≤ 3·ε/3 = ε.


Fact 2.30. If F ⊂ C(K) is pointwise bounded and equicontinuous, then it is uni-


formly bounded.
Proof. Choose δ > 0 such that
(2.24) |f (x) − f (y)| ≤ 1
for all d(x, y) < δ, f ∈ F. Since K is totally bounded (by Theorem 2.18) there exist
p1 , . . . , pm ∈ K such that the balls B(pj , δ) cover K. By pointwise boundedness, for
every x ∈ K there exists C(x) such that |f (x)| ≤ C(x) for all f ∈ F. Set
(2.25) C := max{C(p1 ), . . . , C(pm )}.
Then for f ∈ F and x ∈ K,
(2.26) |f (x)| ≤ |f (pj )| + |f (x) − f (pj )| ≤ C + 1,
where j is chosen such that x ∈ B(pj , δ). 
Theorem 2.31 (Arzelà-Ascoli). Let K be a compact metric space. Then F ⊂ C(K)
is relatively compact if and only if it is pointwise bounded and equicontinuous.
Corollary 2.32. Let F ⊂ C([a, b]) be such that
(i) F is bounded (i.e. uniformly bounded),
(ii) every f ∈ F is continuously differentiable and
(2.27) F′ = {f′ : f ∈ F}
is bounded.
Then F is relatively compact.

Lecture 9 (Monday, Sep 23)


Example 2.33. Let F = {x ↦ ∑_{n=0}^∞ cn xⁿ : |cn| ≤ 1} ⊂ C([−1/2, 1/2]). The set F is bounded, because
(2.28) |∑_{n=0}^∞ cn xⁿ| ≤ ∑_{n=0}^∞ 2^{−n} = 2
for all sequences (cn)n with |cn| ≤ 1 and for all x ∈ [−1/2, 1/2]. Similarly,
(2.29) F′ = {∑_{n=1}^∞ n cn x^{n−1} : |cn| ≤ 1}
is also bounded. Thus, F ⊂ C([−1/2, 1/2]) is relatively compact. However, note that F interpreted as a subset of C([0, 1]) (with the understanding that convergence at x = 1 is also assumed) is not relatively compact (it contains the set in Example 2.26).
Example 2.34. The set
(2.30) F = {sin(πnx) : n ∈ Z} ⊂ C([0, 1])
is bounded, but not relatively compact. Indeed, suppose it is. Then by Arzelà-Ascoli
it is equicontinuous, so there exists δ > 0 such that for all n ∈ N and for all x, y ∈ [0, 1]
with |x − y| < δ we have | sin(πnx) − sin(πny)| < 1/2. Set x = 0 and y = 1/(2n) for
n > δ −1 /2. Then | sin(πnx) − sin(πny)| = 1. Contradiction!
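The failure of equicontinuity in Example 2.34 is easy to observe numerically. A minimal Python sketch (the sample indices are arbitrary): the points x = 0 and y = 1/(2n) get arbitrarily close while |sin(πnx) − sin(πny)| stays equal to 1.

import math

for n in [10, 100, 1000, 10000]:
    x, y = 0.0, 1.0 / (2 * n)
    gap = abs(math.sin(math.pi * n * x) - math.sin(math.pi * n * y))
    print(f"n={n:6d}  |x - y| = {y:.1e}  |f_n(x) - f_n(y)| = {gap:.6f}")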
Proof of Theorem 2.31. ⇐=: Without loss of generality assume that F is
closed. Assume that F ⊂ C(K) is pointwise bounded and equicontinuous. By Fact
2.30 it follows that F is uniformly bounded. By Exercise 2.9 there exists a countable,
dense set E ⊂ K. Let (fn )n ⊂ F be a sequence.
Claim: There exists a subsequence (fnk )k such that (fnk (p))k converges for all p ∈ E.
Proof of claim. This is again a diagonal subsequence argument. Let E = {p1, p2, . . . }. By the Bolzano-Weierstrass theorem (see Corollary 2.19 or Theorem 2.15) we have the following inductive claim: for all j ∈ N there is a subsequence (f_{n_k^{(j)}})_k such that (f_{n_k^{(j)}}(p_ℓ))_k converges for all 1 ≤ ℓ ≤ j. Indeed, for j = 1 this is a direct consequence of Bolzano-Weierstrass because (fn(p1))n is just a bounded sequence of complex numbers. Assume we proved the claim up to some j ≥ 1. Then by one more application of Bolzano-Weierstrass, there is a subsequence (f_{n_k^{(j+1)}})_k of (f_{n_k^{(j)}})_k (and therefore a subsequence of (fn)n) such that (f_{n_k^{(j+1)}}(p_{j+1}))_k converges. Now we set nk = n_k^{(k)} and observe that (f_{nk}(pj))_k converges for all j. □
Let us set gk = fnk to simplify notation. We show that (gk )k converges uniformly on
K. Let ε > 0. By equicontinuity there exists δ > 0 such that
(2.31) |gk (x) − gk (y)| < ε/3
for all x, y ∈ K with d(x, y) < δ and all k ∈ N. Since K is compact and E is dense,
there exist p1 , . . . , pm ∈ E with
(2.32) K ⊂ B(p1 , δ) ∪ · · · ∪ B(pm , δ).
Since (gk (pj ))k converges, there exists Nj ∈ N such that
(2.33) |gk (pj ) − g` (pj )| < ε/3

for all k, ` ≥ Nj . Set


(2.34) N = max{N1 , . . . , Nm }.
Then,
(2.35) |gk (pj ) − g` (pj )| < ε/3
for all k, ` ≥ N and all j = 1, . . . , m. Let x ∈ K. By (2.32) there exists j ∈ {1, . . . , m}
such that x ∈ B(pj , δ). Then from (2.31) and (2.33) we have
(2.36) |gk (x) − g` (x)| ≤ |gk (x) − gk (pj )| + |gk (pj ) − g` (pj )| + |g` (pj ) − g` (x)| ≤ ε.
Thus (gk )k converges uniformly and therefore converges to a limit in F. By Theo-
rem 2.18, this shows that F is relatively compact.

=⇒: Assume that F is relatively compact. Then it is bounded. Say that F is


not equicontinuous. Then there exists ε > 0 and a sequence (fn )n ⊂ F and points
(xn )n , (yn )n ⊂ K with d(xn , yn ) < 1/n such that
(2.37) |fn (xn ) − fn (yn )| ≥ ε.
By (relative) compactness, (fn )n has a uniformly convergent subsequence which we will
also call (fn )n for simplicity. Then fn converges to some limit f ∈ F ⊂ C(K). By
uniform convergence, there exists N ∈ N such that
(2.38) |fn (x) − f (x)| < ε/3
for all n ≥ N and x ∈ K. By uniform continuity of f (using Theorem 2.10) there exists
δ > 0 such that
(2.39) |f (x) − f (y)| < ε/3
for all x, y ∈ K with d(x, y) < δ. Let n ≥ max{N, δ −1 }. Then by (2.38) and (2.39) we
have
(2.40) |fn (xn ) − fn (yn )| ≤ |fn (xn ) − f (xn )| + |f (xn ) − f (yn )| + |f (yn ) − fn (yn )| < ε.
This contradicts (2.37). 

Lecture 10 (Wednesday, Sep 25)

Proof of Corollary 2.32. Using the mean value theorem we see that for all
x, y ∈ [a, b] there exists ξ ∈ [a, b] such that
(2.41) f (x) − f (y) = f 0 (ξ)(x − y).
But since F 0 is bounded there exists C > 0 such that
(2.42) |f 0 (ξ)| ≤ C
for all f ∈ F, ξ ∈ [a, b]. Thus,
(2.43) |f (x) − f (y)| ≤ C|x − y|
for all x, y ∈ [a, b] and all f ∈ F. This implies equicontinuity: for ε > 0 we set δ = C −1 ε.
Then for x, y ∈ [a, b] with |x − y| < δ we have
(2.44) |f (x) − f (y)| ≤ C|x − y| < Cδ = ε.
Therefore the claim follows from Theorem 2.31. 

Example 2.35. Condition (i) from Corollary 2.32 is necessary, because relatively compact sets are bounded. Condition (ii) however is not necessary. Consider for example F = {fn : n = 1, 2, . . . } ⊂ C([0, 1]) with fn(x) = sin(nx)/√n. The set F is bounded, but F′ is unbounded. But the sequence (fn)n is uniformly convergent, so by Fact 2.29, F is equicontinuous and hence relatively compact.

4. Further exercises
Exercise 2.36. Let (X, d) be a metric space and A ⊂ X a subset.
(i) Show that A is totally bounded if and only if its closure \overline{A} is totally bounded.
(ii) Assume that X is complete. Show that A is totally bounded if and only if A is
relatively compact. Which direction is still always true if X is not complete?
Exercise 2.37. Let ℓ¹ denote the space of all sequences (an)n of complex numbers such that ∑_{n=1}^∞ |an| < ∞, equipped with the metric d(a, b) = ∑_{n=1}^∞ |an − bn|.
(i) Prove that
(2.45) A = {a ∈ ℓ¹ : ∑_{n=1}^∞ |an| ≤ 1}
is bounded and closed, but not compact.
(ii) Let b ∈ ℓ¹ with bn ≥ 0 for all n ∈ N. Show that
(2.46) B = {a ∈ ℓ¹ : |an| ≤ bn ∀ n ∈ N}
is compact.
Exercise 2.38. Recall that ℓ∞ is the metric space of bounded sequences of complex numbers equipped with the supremum metric d(a, b) = sup_{n∈N} |an − bn|. Let s ∈ ℓ∞ be a sequence of non-negative real numbers that converges to zero. Let
(2.47) A = {a ∈ ℓ∞ : |an| ≤ sn for all n}.
Prove that A ⊂ ℓ∞ is compact.
Exercise 2.39. For each of the following subsets of C([0, 1]) prove or disprove
compactness:
(i) A1 = {f ∈ C([0, 1]) : maxx∈[0,1] |f (x)| ≤ 1},
(ii) A2 = A1 ∩ {p : p polynomial of degree ≤ d} (where d ∈ N is given)
(iii) A3 = A1 ∩ {f : f is a power series with infinite radius of convergence}
Exercise 2.40. Let F ⊂ C([a, b]) be a bounded set. Assume that there exists a
function ω : [0, ∞) → [0, ∞) such that
(2.48) lim_{t→0+} ω(t) = ω(0) = 0.

and for all x, y ∈ [a, b], f ∈ F,


(2.49) |f (x) − f (y)| ≤ ω(|x − y|).
Show that F ⊂ C([a, b]) is relatively compact.
Exercise 2.41. For 1 ≤ p < ∞ we denote by ℓᵖ the space of sequences (an)n of complex numbers such that ∑_{n=1}^∞ |an|ᵖ < ∞. Define a metric on ℓᵖ by
(2.50) d(a, b) = (∑_{n∈N} |an − bn|ᵖ)^{1/p}.

The purpose of this exercise is to prove a theorem of Fréchet that characterizes compactness in ℓᵖ. Let F ⊂ ℓᵖ.
(i) Assume that F is bounded and equisummable in the following sense: for all ε > 0 there exists N ∈ N such that
(2.51) ∑_{n=N}^∞ |an|ᵖ < ε for all a ∈ F.
Then show that F is totally bounded.
(ii) Conversely, assume that F is totally bounded. Then show that it is equisummable
in the above sense.
Hint: Mimic the proof of Arzelà-Ascoli.
Exercise 2.42. Let C k ([a, b]) denote the space of k-times continuously differentiable
functions on [a, b] endowed with the metric
(2.52) d(f, g) = ∑_{j=0}^k sup_{x∈[a,b]} |f^{(j)}(x) − g^{(j)}(x)|.
Let 0 ≤ ℓ < k be integers and consider the canonical embedding map
(2.53) ι : C^k([a, b]) → C^ℓ([a, b]) with ι(f) = f.
Prove that if B ⊂ C^k([a, b]) is bounded, then the image ι(B) = {ι(f) : f ∈ B} ⊂ C^ℓ([a, b]) is relatively compact. Hint: Use the Arzelà-Ascoli theorem.

Exercise 2.43. Let X be a metric space. Assume that for every continuous function
f : X → C there exists a constant Cf > 0 such that |f (x)| ≤ Cf for all x ∈ X. Show
that X is compact. Hint: Assume that X is not sequentially compact and construct
an unbounded continuous function on X.
Exercise 2.44. Consider F = {fN : N ∈ N} ⊂ C([0, 1]) with
(2.54) fN(x) = ∑_{n=0}^N b^{−nα} sin(bⁿx),
where 0 < α < 1 and b > 1 are fixed.
(a) Show that F is relatively compact in C([0, 1]).
(b) Show that F 0 is not a bounded subset of C([0, 1]).
(c) Show that there exists c > 0 such that for all x, y ∈ R and N ∈ N,
(2.55) |fN (x) − fN (y)| ≤ c|x − y|α .
Exercise 2.45. Suppose (X, d) is a metric space with a countable dense subset, i.e. a set A = {x1, x2, . . . } ⊂ X with \overline{A} = X. Let ℓ∞ denote the metric space of bounded sequences a = (an)n of real numbers with metric d∞(a, b) = sup_{n∈N} |an − bn|. Show that there exists a map ι : X → ℓ∞ with d∞(ι(x), ι(y)) = d(x, y) for every x, y ∈ X (in other words, X can be isometrically embedded into ℓ∞).
CHAPTER 3

Approximation theory

In this section we want to study different ways to approximate continuous functions.


Let X be a normed vector space of functions (say, continuous functions on [0, 1])
and A ⊂ X some subspace of it (say, polynomials). Let f ∈ X be arbitrary. Our
goal is to ‘approximate‘ the function f by functions g in A. We measure the quality of
approximation by the error in norm, i.e. kf − gk.
The most basic question in this context is:
Can we make kf − gk arbitrarily small?
More precisely, we are asking if A is dense in X. Recall that A ⊂ X is called dense if
A = X. That is, if for every f ∈ X and every ε > 0 there exists a g ∈ A such that
kf − gk ≤ ε.

1. Polynomial approximation
Theorem 3.1 (Weierstrass). For every continuous function f on [a, b] there exists
a sequence of polynomials that converges uniformly to f .
In other words, the theorem says that the set A = {p : p polynomial} is dense in
C([a, b]).

There are many proofs of this theorem in the literature. We present a proof using
Bernstein polynomials. Without loss of generality we consider only the interval [a, b] =
[0, 1] (why are we allowed to do that?).


Lecture 11 (Friday, Sep 27)


Let f be continuous on [0, 1]. Define for n = 1, 2, . . . :
(3.1) Bn f(t) = ∑_{k=0}^n f(k/n) \binom{n}{k} t^k (1 − t)^{n−k}.
Bn f is a polynomial of degree n. We will show that Bn f → f uniformly on [0, 1]. By
the binomial theorem,
(3.2) 1 = (t + 1 − t)ⁿ = ∑_{k=0}^n \binom{n}{k} t^k (1 − t)^{n−k}.
Thus,
(3.3) Bn f(t) − f(t) = ∑_{k=0}^n (f(k/n) − f(t)) \binom{n}{k} t^k (1 − t)^{n−k}.
Let ε > 0. By uniform continuity of f we choose δ > 0 such that |f(t) − f(s)| ≤ ε/2 for all t, s ∈ [0, 1] with |t − s| ≤ δ. Now we write the sum on the right hand side of (3.3) as I + II, where
(3.4) I = ∑_{k : |k/n − t| < δ} (f(k/n) − f(t)) \binom{n}{k} t^k (1 − t)^{n−k},
(3.5) II = ∑_{k : |k/n − t| ≥ δ} (f(k/n) − f(t)) \binom{n}{k} t^k (1 − t)^{n−k}.
We estimate I and II separately. For I we have from uniform continuity that
(3.6) |I| ≤ (ε/2) ∑_{k=0}^n \binom{n}{k} t^k (1 − t)^{n−k} = ε/2.

Lecture 12 (Monday, Sep 30)


To estimate II we first compute the Bernstein polynomials for the monomials 1, t, t².
Lemma 3.2. Let gm(t) = tᵐ. Then
(3.7) Bn g0(t) = 1,
(3.8) Bn g1(t) = t,
(3.9) Bn g2(t) = t² + (t − t²)/n for n ≥ 2.
Proof. We have
(3.10) Bn g0(t) = ∑_{k=0}^n \binom{n}{k} t^k (1 − t)^{n−k} = (t + (1 − t))ⁿ = 1
by the binomial theorem. Next,
(3.11) Bn g1(t) = ∑_{k=0}^n (k/n) \binom{n}{k} t^k (1 − t)^{n−k} = ∑_{k=1}^n \binom{n−1}{k−1} t^k (1 − t)^{n−k}
= t ∑_{k=0}^{n−1} \binom{n−1}{k} t^k (1 − t)^{(n−1)−k} = t (t + (1 − t))^{n−1} = t.
To compute Bn g2 we use that
(3.12) (k²/n²) \binom{n}{k} = (k/n) \binom{n−1}{k−1} = ((n−1)/n) ((k−1)/(n−1)) \binom{n−1}{k−1} + (1/n) \binom{n−1}{k−1}
= ((n−1)/n) \binom{n−2}{k−2} + (1/n) \binom{n−1}{k−1}.
Thus,
(3.13) Bn g2(t) = ((n−1)/n) ∑_{k=2}^n \binom{n−2}{k−2} t^k (1 − t)^{n−k} + (1/n) ∑_{k=1}^n \binom{n−1}{k−1} t^k (1 − t)^{n−k}
= ((n−1)/n) t² + (1/n) t = t² + (t − t²)/n. □
As a consequence, we obtain the following:
Lemma 3.3. For all t ∈ [0, 1],
(3.14) ∑_{k=0}^n (k/n − t)² \binom{n}{k} t^k (1 − t)^{n−k} ≤ 1/n.
Proof. From the previous lemma,
(3.15) ∑_{k=0}^n (k/n − t)² \binom{n}{k} t^k (1 − t)^{n−k} = Bn g2(t) − 2t Bn g1(t) + t² Bn g0(t)
= t² + (t − t²)/n − 2t² + t² = (t − t²)/n.
Since t ∈ [0, 1] we have 0 ≤ t − t² = t(1 − t) ≤ 1. □

Now we are ready to estimate II. First note that f is bounded, so there exists c > 0 such that |f(x)| ≤ c for all x ∈ [0, 1]. Choose N ∈ N such that 2cδ^{−2}N^{−1} ≤ ε/2. Then for all n ≥ N,
(3.16) |II| ≤ 2c ∑_{k : |k/n − t| ≥ δ} \binom{n}{k} t^k (1 − t)^{n−k} ≤ 2cδ^{−2} ∑_{k=0}^n (k/n − t)² \binom{n}{k} t^k (1 − t)^{n−k} ≤ 2cδ^{−2} N^{−1} ≤ ε/2.
In the second inequality we have used that 1 ≤ δ^{−2}|k/n − t|² for the terms with |k/n − t| ≥ δ, and in the third we used Lemma 3.3 together with n ≥ N. Thus if n ≥ N and t ∈ [0, 1], then
(3.17) |Bn f(t) − f(t)| ≤ |I| + |II| ≤ ε/2 + ε/2 = ε.
This concludes the proof of Weierstrass' theorem.
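Since the proof is constructive, the convergence can be watched numerically. The following Python sketch (using math.comb; the target function f(x) = |x − 0.3| and the grid are illustrative choices) evaluates the Bernstein polynomials (3.1) and prints the sup error on a grid, which decreases slowly as n grows.

import math

def bernstein(f, n, t):
    # B_n f(t) = sum_{k=0}^n f(k/n) * C(n,k) * t^k * (1-t)^(n-k), as in (3.1).
    return sum(f(k / n) * math.comb(n, k) * t ** k * (1 - t) ** (n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.3)                 # continuous on [0, 1]
grid = [i / 200 for i in range(201)]

for n in [5, 20, 80, 320]:
    err = max(abs(bernstein(f, n, t) - f(t)) for t in grid)
    print(f"n = {n:4d}   sup error on grid ~ {err:.4f}")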

2. Orthonormal systems
In the previous section we studied approximation of continuous functions in the
supremum norm, kf k∞ = supx∈[a,b] |f (x)|. In this section we turn our attention to
another important norm, the L2 norm.
Definition 3.4. For two piecewise continuous functions f, g on an interval [a, b] we define their inner product by
(3.18) ⟨f, g⟩ = ∫_a^b f(x) \overline{g(x)} dx.
If ⟨f, g⟩ = 0 we say that f and g are orthogonal. We define the L²-norm of f by
(3.19) ‖f‖₂ = (∫_a^b |f(x)|² dx)^{1/2}.
If ‖f‖₂ = 1 then we say that f is L²-normalized.
Note: Some comments are in order regarding the term ’piecewise continuous’. For
our purposes we call a function f , defined on an interval [a, b], piecewise continuous if
limx→x0 f (x) exists at every point x0 and is different from f (x0 ) at at most finitely many
points. We denote this class of functions by pc([a, b]). Piecewise continuous functions
are Riemann integrable.
The inner product has the following properties (for functions f, g, h and λ ∈ C):
• Sesquilinearity:
(3.20) ⟨f + λg, h⟩ = ⟨f, h⟩ + λ⟨g, h⟩,
(3.21) ⟨h, f + λg⟩ = ⟨h, f⟩ + \overline{λ}⟨h, g⟩.
• Antisymmetry: ⟨f, g⟩ = \overline{⟨g, f⟩}
• Positivity: ⟨f, f⟩ ≥ 0 (and > 0 unless f is zero except at possibly finitely many points)
Theorem 3.5 (Cauchy-Schwarz inequality). For two piecewise continuous functions f, g we have
(3.22) |⟨f, g⟩| ≤ ‖f‖₂ ‖g‖₂.

Proof. For nonnegative real numbers x and y we have the elementary inequality
(3.23) xy ≤ x²/2 + y²/2.
Thus we have
(3.24) |⟨f, g⟩| ≤ ∫_a^b |f(x)g(x)| dx ≤ ½ ∫_a^b |f(x)|² dx + ½ ∫_a^b |g(x)|² dx = ½⟨f, f⟩ + ½⟨g, g⟩.
Now we note that for every λ > 0, replacing f by λf and g by λ^{−1}g does not change the left hand side of this inequality. Thus we have for every λ > 0 that
(3.25) |⟨f, g⟩| ≤ (λ²/2)⟨f, f⟩ + (1/(2λ²))⟨g, g⟩.
Now we choose λ so that this inequality is as strong as possible: λ² = √(⟨g, g⟩/⟨f, f⟩) (we may assume that ⟨f, f⟩ ≠ 0 because otherwise there is nothing to show). Then
(3.26) |⟨f, g⟩| ≤ √⟨f, f⟩ √⟨g, g⟩.
Note that one can arrive at this definition of λ in a systematic way: treat the right hand side of (3.25) as a function of λ and minimize it using calculus. □
Corollary 3.6 (Minkowski's inequality). For two functions f, g ∈ pc([a, b]),
(3.27) ‖f + g‖₂ ≤ ‖f‖₂ + ‖g‖₂.
Proof. We may assume ‖f + g‖₂ ≠ 0 because otherwise there is nothing to prove. Then
(3.28) ‖f + g‖₂² = ∫_a^b |f + g|² ≤ ∫_a^b |f + g||f| + ∫_a^b |f + g||g|
(3.29) ≤ ‖f + g‖₂‖f‖₂ + ‖f + g‖₂‖g‖₂ = ‖f + g‖₂(‖f‖₂ + ‖g‖₂),
using the Cauchy-Schwarz inequality in the second step. Dividing by ‖f + g‖₂ we obtain ‖f + g‖₂ ≤ ‖f‖₂ + ‖g‖₂. □
This is the triangle inequality for k · k2 . This makes d(f, g) = kf − gk2 a metric on
say, the set of continuous functions. Unfortunately, the resulting metric space is not
complete. (Its completion is a space called L2 ([a, b]), see Exercise 3.70.)
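A quick numerical illustration of Definition 3.4 and Theorem 3.5 (Python with NumPy; a left-endpoint Riemann sum on [0, 1] stands in for the integral, and the two sample functions are arbitrary choices):

import numpy as np

M = 10000
x = np.arange(M) / M          # left endpoints of a uniform partition of [0, 1]
dx = 1.0 / M

def inner(f, g):
    # <f, g> = integral over [0,1] of f * conjugate(g), via a Riemann sum.
    return np.sum(f(x) * np.conj(g(x))) * dx

def norm2(f):
    return np.sqrt(inner(f, f).real)

f = lambda t: np.sin(2 * np.pi * t) + t
g = lambda t: np.exp(1j * 2 * np.pi * t) * t

lhs, rhs = abs(inner(f, g)), norm2(f) * norm2(g)
print(lhs, rhs, lhs <= rhs)    # Cauchy-Schwarz: |<f,g>| <= ||f||_2 ||g||_2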

Lecture 13 (Wednesday, October 2)

Definition 3.7. A sequence (φn)n of piecewise continuous functions on [a, b] is called an orthonormal system on [a, b] if
(3.30) ⟨φn, φm⟩ = ∫_a^b φn(x) \overline{φm(x)} dx = 0 if n ≠ m, and 1 if n = m.
(The index n may run over the natural numbers, or the integers, a finite set of integers, or more generally any countable set. We will write ∑_n to denote a sum over all the indices. In proofs we will always adopt the interpretation that the index n runs over 1, 2, 3, . . . . This is no loss of generality.)

Notation: For a set A we denote by 1A the characteristic function of A. This is the


function such that 1A (x) = 1 when x ∈ A and 1A (x) = 0 when x ∉ A.
Example 3.8 (Disjoint support). Let φn (x) = 1[n,n+1) and N ∈ N. Then (φn )n=0,...,N −1
is an orthonormal system on [0, N ].
Examples 3.9 (Trigonometric functions). The following are orthonormal systems
on [0, 1]:
1. φn(x) = e^{2πinx}
2. φn(x) = √2 cos(2πnx)
3. φn(x) = √2 sin(2πnx)
Exercise 3.10 (Rademacher functions). For n = 0, 1, . . . and x ∈ [0, 1] we define
rn (x) = sgn(sin(2n πx)). Show that (rn )n is an orthonormal system on [0, 1].
Let (φn)n be an orthonormal system and let f be a finite linear combination of the functions (φn)n. Say,
(3.31) f(x) = ∑_{n=1}^N cn φn(x).
Then there is an easy way to compute the coefficients cn:
(3.32) cn = ⟨f, φn⟩ = ∫_a^b f(x) \overline{φn(x)} dx.
To prove this we multiply (3.31) by \overline{φm(x)} and integrate over x:
(3.33) ∫_a^b f(x) \overline{φm(x)} dx = ∑_{n=1}^N cn ∫_a^b φn(x) \overline{φm(x)} dx = ∑_{n=1}^N cn ⟨φn, φm⟩ = cm.
Notice that the formula cn = ⟨f, φn⟩ still makes sense if f is not of the form (3.31).
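A short numerical sketch of the coefficient formula (3.32) (Python with NumPy; Riemann sums on a uniform grid replace the integrals, and the orthonormal system √2 cos(2πnx) from Examples 3.9 together with the sample coefficients are illustrative choices): starting from a known finite linear combination, the computed inner products return the original coefficients.

import numpy as np

M = 20000
x = np.arange(M) / M
dx = 1.0 / M

def phi(n):
    return np.sqrt(2.0) * np.cos(2 * np.pi * n * x)   # orthonormal system on [0, 1]

coeffs = {1: 0.5, 3: -2.0, 7: 1.25}                   # c_n of a finite linear combination
f = sum(c * phi(n) for n, c in coeffs.items())

for n in range(1, 9):
    cn = np.sum(f * phi(n)) * dx                      # c_n = <f, phi_n>  (real system)
    print(n, round(float(cn), 6))                     # 0.5, -2.0, 1.25 at n = 1, 3, 7, else ~0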
Theorem 3.11. Let (φn)n be an orthonormal system on [a, b]. Let f be a piecewise continuous function. Consider
(3.34) sN(x) = ∑_{n=1}^N ⟨f, φn⟩ φn(x).
Denote the linear span of the functions (φn)_{n=1,...,N} by XN. Then
(3.35) ‖f − sN‖₂ ≤ ‖f − g‖₂
holds for all g ∈ XN with equality if and only if g = sN.

In other words, the theorem says that among all functions of the form ∑_{n=1}^N cn φn(x), the function sN defined by the coefficients cn = ⟨f, φn⟩ is the best "L²-approximation" to f in the sense that (3.35) holds.
This can be interpreted geometrically: the function sN is the orthogonal projection
of f onto the subspace XN . As in Euclidean space, the orthogonal projection is char-
acterized by being the point in XN that is closest to f and it is uniquely determined
by this property (see Figure 1).

Figure 1. sN is the orthogonal projection of f onto XN .



Lecture 14 (Friday, October 4)

Theorem 3.12 (Bessel's inequality). If (φn)n is an orthonormal system on [a, b] and f a piecewise continuous function on [a, b] then
(3.36) ∑_n |⟨f, φn⟩|² ≤ ‖f‖₂².

Corollary 3.13 (Riemann-Lebesgue lemma). Let (φn)_{n=1,2,...} be an orthonormal system and f a piecewise continuous function. Then
(3.37) lim_{n→∞} ⟨f, φn⟩ = 0.
This follows because the series ∑_{n=1}^∞ |⟨f, φn⟩|² converges as a consequence of Bessel's inequality.

Definition 3.14. An orthonormal system (φn)n is called complete if
(3.38) ∑_n |⟨f, φn⟩|² = ‖f‖₂²
for all f.
Theorem 3.15. Let (φn)n be an orthonormal system on [a, b]. Let (sN)N be as in Theorem 3.11. Then (φn)n is complete if and only if (sN)N converges to f in the L²-norm (that is, lim_{N→∞} ‖f − sN‖₂ = 0) for every piecewise continuous f on [a, b].
Proof of Theorem 3.11. Let g ∈ XN and write
(3.39) g(x) = ∑_{n=1}^N bn φn(x).
Let us also write
(3.40) cn = ⟨f, φn⟩.
We have
(3.41) ⟨f, g⟩ = ∑_{n=1}^N \overline{bn} ⟨f, φn⟩ = ∑_{n=1}^N cn \overline{bn}.
Using that (φn)n is orthonormal we get
(3.42) ⟨g, g⟩ = ⟨∑_{n=1}^N bn φn, ∑_{m=1}^N bm φm⟩ = ∑_{n=1}^N ∑_{m=1}^N bn \overline{bm} ⟨φn, φm⟩ = ∑_{n=1}^N |bn|².
Thus,
(3.43) ⟨f − g, f − g⟩ = ⟨f, f⟩ − ⟨f, g⟩ − ⟨g, f⟩ + ⟨g, g⟩
(3.44) = ⟨f, f⟩ − ∑_{n=1}^N cn \overline{bn} − ∑_{n=1}^N \overline{cn} bn + ∑_{n=1}^N |bn|²
(3.45) = ⟨f, f⟩ − ∑_{n=1}^N |cn|² + ∑_{n=1}^N |bn − cn|².
We have
(3.46) ⟨f − sN, f − sN⟩ = ⟨f, f⟩ − ⟨f, sN⟩ − ⟨sN, f⟩ + ⟨sN, sN⟩ = ⟨f, f⟩ − 2∑_{n=1}^N |cn|² + ∑_{n=1}^N |cn|² = ⟨f, f⟩ − ∑_{n=1}^N |cn|².
Thus we have shown
(3.47) ⟨f − g, f − g⟩ = ⟨f − sN, f − sN⟩ + ∑_{n=1}^N |bn − cn|²
which implies the claim since ∑_{n=1}^N |bn − cn|² ≥ 0 with equality if and only if bn = cn for all n = 1, . . . , N. □
Proof of Theorem 3.12. From the calculation in (3.46),
(3.48) ⟨f, f⟩ − ∑_{n=1}^N |cn|² = ⟨f − sN, f − sN⟩ ≥ 0,
so ∑_{n=1}^N |cn|² ≤ ‖f‖₂² for all N. Letting N → ∞ this proves the claim (in particular, the series ∑_{n=1}^∞ |cn|² converges). □
Proof of Theorem 3.15. From (3.46),
(3.49) ‖f − sN‖₂² = ⟨f, f⟩ − ∑_{n=1}^N |⟨f, φn⟩|².
This converges to 0 as N → ∞ if and only if (φn)n is complete. □



Lecture 15 (Monday, October 7)

3. The Haar system


In this section we discuss an important example of an orthonormal system on [0, 1].
Definition 3.16 (Dyadic intervals). For non-negative integers j, k with 0 ≤ j < 2^k we define
(3.50) I_{k,j} = [2^{−k}j, 2^{−k}(j + 1)) ⊂ [0, 1].
The interval I_{k,j} is called a dyadic interval and k is called its generation. We denote by Dk the set of all dyadic intervals of generation k and by D = ⋃_{k≥0} Dk the set of all dyadic intervals on [0, 1].
Definition 3.17. Each dyadic interval I ∈ D with |I| = 2^{−k} can be split in the middle into its left child and right child, which are again dyadic intervals that we denote by Iℓ and Ir, respectively.
Example 3.18. The interval I = [1/2, 1/2 + 1/4) is a dyadic interval and its left and right children are given by Iℓ = [1/2, 1/2 + 1/8) and Ir = [1/2 + 1/8, 1/2 + 1/4).

Figure 2. Dyadic intervals.

Lemma 3.19. (1) Two dyadic intervals are either disjoint or contained in each
other. That is, for every I, J ∈ D at least one of the following is true: I ∩J = ∅
or I ⊂ J or J ⊂ I.

(2) For every k ≥ 0 the dyadic intervals of generation k are a partition of [0, 1). That is,
(3.51) [0, 1) = ⋃_{I∈Dk} I.

Exercise 3.20. Prove this lemma.


Exercise 3.21. Let J ⊂ [0, 1] be any interval. Show that there exists I ∈ D such
that |I| ≤ |J| and 3I ⊃ J. (Here 3I denotes the interval with three times the length of
I and the same center as I.)
Definition 3.22. For each I ∈ D we define the Haar function associated with it by
(3.52) ψI = |I|^{−1/2} (1_{Iℓ} − 1_{Ir}).
The countable set of functions given by
(3.53) H = {1_{[0,1]}} ∪ {ψI : I ∈ D}
is called the Haar system on [0, 1].
Example 3.23. The Haar function associated with the dyadic interval I = [0, 1/2) is given by
(3.54) ψ_{[0,1/2)} = √2 · (1_{[0,1/4)} − 1_{[1/4,1/2)}).

Figure 3. A Haar function ψI .
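A minimal Python sketch (NumPy; the grid of 2¹⁰ midpoints is chosen so that integrals of these step functions are computed exactly) of the Haar functions in (3.52), together with a numerical check of the orthonormality relations for the first few generations:

import numpy as np

M = 2 ** 10
x = (np.arange(M) + 0.5) / M            # midpoints of a dyadic grid on [0, 1)
dx = 1.0 / M

def psi(k, j):
    # psi_I for I = [2^-k j, 2^-k (j+1)): |I|^(-1/2) * (1_{left half} - 1_{right half})
    a, m, b = j * 2.0 ** (-k), (j + 0.5) * 2.0 ** (-k), (j + 1) * 2.0 ** (-k)
    return 2.0 ** (k / 2) * (((x >= a) & (x < m)).astype(float)
                             - ((x >= m) & (x < b)).astype(float))

system = [np.ones(M)] + [psi(k, j) for k in range(4) for j in range(2 ** k)]
gram = np.array([[np.sum(p * q) * dx for q in system] for p in system])
print(np.allclose(gram, np.eye(len(system))))        # True: the system is orthonormal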

Lemma 3.24. The Haar system on [0, 1] is an orthonormal system.
Proof. Let f ∈ H. If f = 1_{[0,1]} then ‖f‖₂ = (∫_0^1 1²)^{1/2} = 1. Otherwise, f = ψI for some I ∈ D. Then by (3.52) and since Iℓ and Ir are disjoint,
(3.55) ‖f‖₂² = ∫_0^1 |ψI|² = |I|^{−1} ∫_0^1 (1_{Iℓ} + 1_{Ir}) = 1.
Next let f, g ∈ H with f ≠ g. Suppose that one of f, g equals 1_{[0,1]}, say f = 1_{[0,1]}. Then g = ψJ for some J ∈ D and thus
(3.56) ⟨f, g⟩ = ∫_0^1 ψJ = 0.
It remains to treat the case that f = ψI and g = ψJ for I, J ∈ D with I ≠ J. By Lemma 3.19 (i), I and J are either disjoint or contained in each other. If I and J are disjoint, then ⟨ψI, ψJ⟩ = 0. Otherwise they are contained in each other, say I ⊊ J. Then ψJ is constant on the set where ψI is different from zero. Thus,
(3.57) ⟨ψI, ψJ⟩ = ∫ ψI · ψJ = ±|I|^{−1/2}|J|^{−1/2} ∫_0^1 (1_{Iℓ} − 1_{Ir}) = 0. □
Let us write
(3.58) D_{<n} = ⋃_{0≤k<n} Dk
to denote the set of dyadic intervals of generation less than n. We want to study how continuous functions can be approximated by linear combinations of Haar functions. Let f ∈ C([0, 1]). Motivated by Theorem 3.11, we define for every positive integer n, the orthogonal projection
(3.59) En f = ∑_{I∈D_{<n}} ⟨f, ψI⟩ ψI.

Definition 3.25. For a function f on [0, 1] and an interval I ⊂ [0, 1] we write ⟨f⟩_I = |I|^{−1} ∫_I f to denote the average or the mean of f on I.
Theorem 3.26. Let ∫_0^1 f = 0. Then, for every I ∈ Dn,
(3.60) En f(x) = ⟨f⟩_I if x ∈ I.
In other words,
(3.61) En f = ∑_{I∈Dn} ⟨f⟩_I 1_I.
Theorem 3.27. Suppose that ∫_0^1 f = 0 and f ∈ C([0, 1]). Then
(3.62) En f → f uniformly on [0, 1] as n → ∞.
Remark. If f ∈ C([0, 1]) does not have mean zero then En f converges to f − ⟨f⟩_{[0,1]}.
Corollary 3.28. The Haar system is complete in the sense of Definition 3.14. For every f ∈ C([0, 1]) we have
(3.63) ‖f‖₂² = |⟨f⟩_{[0,1]}|² + ∑_{I∈D} |⟨f, ψI⟩|².

Exercise 3.29. By using Theorem 3.27, prove Corollary 3.28.


Proof of Theorem 3.26. Fix n ≥ 0 and write g = En f . We prove something
seemingly stronger.
Claim. For every dyadic interval I ∈ Dn , we have hf iI = hgiI .
This implies the statement in the theorem because En f is constant on dyadic intervals
of generation n.
To prove the claim we perform an induction on the generation of I. To begin with, the claim holds for I = [0, 1) because ∫₀¹ f = 0 and also ∫₀¹ g = 0 (every ψ_J has mean zero). Now suppose that it is true for some interval I ∈ D_{<n}. It suffices to show that it also holds for I_ℓ and I_r, i.e. that
(3.64) ⟨f⟩_{I_ℓ} = ⟨g⟩_{I_ℓ} and ⟨f⟩_{I_r} = ⟨g⟩_{I_r}.

Since the Haar system is orthonormal and I ∈ D_{<n},
(3.65) ⟨g, ψ_I⟩ = Σ_{J∈D_{<n}} ⟨f, ψ_J⟩⟨ψ_J, ψ_I⟩ = ⟨f, ψ_I⟩.
Compute
(3.66) ∫_{I_ℓ} f − ∫_{I_r} f = |I|^{1/2} ∫₀¹ f · ψ_I = |I|^{1/2} ⟨f, ψ_I⟩
and by the same reasoning,
(3.67) ∫_{I_ℓ} g − ∫_{I_r} g = |I|^{1/2} ⟨g, ψ_I⟩.
Combining the last three displays we get
(3.68) ∫_{I_ℓ} f − ∫_{I_r} f = ∫_{I_ℓ} g − ∫_{I_r} g.
By the inductive hypothesis we know that ⟨f⟩_I = ⟨g⟩_I, so
(3.69) ∫_{I_ℓ} f + ∫_{I_r} f = ∫_{I_ℓ} g + ∫_{I_r} g.
Adding the previous two displays gives ⟨f⟩_{I_ℓ} = ⟨g⟩_{I_ℓ} and subtracting them gives ⟨f⟩_{I_r} = ⟨g⟩_{I_r}. This concludes the proof. □
Proof of Theorem 3.27. Let ε > 0. By uniform continuity of f on [0, 1] (which follows from Theorem 2.10) we may choose δ > 0 such that |f(t) − f(s)| < ε whenever t, s ∈ [0, 1] are such that |t − s| < δ. Let N ∈ N be large enough so that 2^{−N} < δ and let n ≥ N. Let t ∈ [0, 1] and I ∈ D_n such that t ∈ I. Then by Theorem 3.26,
(3.70) |E_n f(t) − f(t)| = |⟨f⟩_I − f(t)| ≤ |I|^{−1} ∫_I |f(s) − f(t)| ds < ε. □

Remark. This result goes back to A. Haar’s 1910 article Zur Theorie der orthogo-
nalen Funktionensysteme in Math. Ann. 69 (1910), no. 3, p. 331–371. The functions
(En f )n are also called dyadic martingale averages of f and have wide applications in
modern analysis and probability theory.
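Remark (numerical illustration). By (3.61), for a mean-zero f the projection E_n f is simply the function that averages f over each dyadic interval of generation n, so uniform convergence is easy to watch numerically. The following sketch is an illustration added to these notes (it assumes numpy; the names are ad hoc):

```python
import numpy as np

M = 2**12                                   # sample f at 2^12 equispaced points of [0,1)
x = np.arange(M) / M
f = np.cos(2 * np.pi * x) + 0.3 * np.sin(6 * np.pi * x)   # continuous, mean zero

def E_n(f_vals, n):
    """Dyadic martingale average: replace f by its mean on each interval of D_n.
    For mean-zero f this equals the orthogonal projection E_n f by (3.61)."""
    blocks = f_vals.reshape(2**n, -1)                      # one row per interval in D_n
    means = blocks.mean(axis=1, keepdims=True)
    return np.broadcast_to(means, blocks.shape).reshape(-1)

for n in [1, 2, 4, 6, 8]:
    print(n, np.max(np.abs(E_n(f, n) - f)))                # sup-norm error, decreasing in n
```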
Exercise 3.30. Recall the functions r_n(x) = sgn(sin(2^n πx)) from Exercise 3.10.
(i) Show that every r_n for n ≥ 1 can be written as a finite linear combination of Haar functions and determine the coefficients of this linear combination.
(ii) Show that the orthonormal system on [0, 1] given by (r_n)_n is not complete.
Exercise 3.31. Define
(3.71) ∆_n f = E_{n+1} f − E_n f,   Sf = (Σ_{n≥1} |∆_n f|²)^{1/2}.
(i) Assume that ∫₀¹ f = 0. Prove that ‖Sf‖₂ = ‖f‖₂.
(ii) Show that for every m ∈ N there exists a finite linear combination of Haar functions f_m such that sup_{x∈[0,1]} |f_m(x)| ≤ 1 and sup_{x∈[0,1]} |Sf_m(x)| ≥ m.

Lecture 16 (Wednesday, October 9)

4. Trigonometric polynomials
In the following we will only be concerned with the trigonometric system on [0, 1]:
(3.72) φ_n(x) = e^{2πinx}  (n ∈ Z).
Definition 3.32. A trigonometric polynomial is a function of the form
(3.73) f(x) = Σ_{n=−N}^{N} c_n e^{2πinx}  (x ∈ R),
where N ∈ N and c_n ∈ C. If c_N or c_{−N} is non-zero, then N is called the degree of f.
From Euler's identity (see Fact 1.21) we see that every trigonometric polynomial can also be written in the alternate form
(3.74) f(x) = a_0 + Σ_{n=1}^{N} (a_n cos(2πnx) + b_n sin(2πnx)).
Exercise 3.33. Work out how the coefficients a_n, b_n in (3.74) are related to the c_n in (3.73).
Every trigonometric polynomial is 1-periodic:
(3.75) f(x) = f(x + 1)
for all x ∈ R.
Fact 3.34. (e^{2πinx})_{n∈Z} forms an orthonormal system on [0, 1]. In particular,
(i) for all n ∈ Z,
(3.76) ∫₀¹ e^{2πinx} dx = 1 if n = 0, and = 0 if n ≠ 0;
(ii) if f(x) = Σ_{n=−N}^{N} c_n e^{2πinx} is a trigonometric polynomial, then
(3.77) c_n = ∫₀¹ f(t) e^{−2πint} dt.

One goal in this section is to show that this orthonormal system is in fact complete.
We denote by pc the space of piecewise continuous, 1-periodic functions f : R →
C (let us call a 1-periodic function piecewise continuous, if its restriction to [0, 1] is
piecewise continuous in the sense defined in the beginning of this section).
Definition 3.35. For a 1-periodic function f ∈ pc and n ∈ Z we define the nth Fourier coefficient by
(3.78) f̂(n) = ∫₀¹ f(t) e^{−2πint} dt.
The series
(3.79) Σ_{n=−∞}^{∞} f̂(n) e^{2πinx}
is called the Fourier series of f.



The question of when the Fourier series of a function f converges and in what sense
it represents the function f is a very subtle issue and we will only scratch the surface
in this lecture.
Definition 3.36. For a 1-periodic function f ∈ pc we define the partial sums
(3.80) S_N f(x) = Σ_{n=−N}^{N} f̂(n) e^{2πinx}.

Remark. Note that since (φn )n is an orthonormal system, SN f is exactly the or-
thogonal projection of f onto the space of trigonometric polynomials of degree ≤ N .
In particular, Theorem 3.11 tells us that
(3.81) kf − SN f k2 ≤ kf − gk2
holds for all trigonometric polynomials g of degree ≤ N . That is, SN f is the best
approximation to f in the L2 -norm among all trigonometric polynomials of degree
≤ N.
Definition 3.37 (Convolution). For two 1-periodic functions f, g ∈ pc we define
their convolution by
(3.82) (f ∗ g)(x) = ∫₀¹ f(t) g(x − t) dt.
Note that if f, g ∈ pc then f ∗ g ∈ pc.


Example 3.38. Suppose f is a given 1-periodic function and g is a 1-periodic function that is non-negative with ∫₀¹ g = 1. Then (f ∗ g)(x) can be viewed as a weighted average of f around x with weight profile g. For instance, if g = (N/2) 1_{[−1/N,1/N]}, then (f ∗ g)(x) is the average value of f in the interval [x − 1/N, x + 1/N].
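Remark (numerical illustration). The convolution (3.82) can be approximated by a Riemann sum on a grid; with the box kernel of Example 3.38 this visibly smooths a rough function. The following sketch is an illustration added here (assumes numpy; names are ad hoc):

```python
import numpy as np

M = 2048
x = np.arange(M) / M

def periodic_conv(f_vals, g_vals):
    """Riemann-sum approximation of (f*g)(x) = int_0^1 f(t) g(x-t) dt on the grid."""
    out = np.empty(M)
    for i in range(M):
        out[i] = np.mean(f_vals * g_vals[(i - np.arange(M)) % M])
    return out

f = np.sign(np.sin(2 * np.pi * x))                       # discontinuous, 1-periodic
Nbox = 16
g = np.where((x < 1.0 / Nbox) | (x > 1 - 1.0 / Nbox), Nbox / 2.0, 0.0)  # box kernel, integral 1
smoothed = periodic_conv(f, g)    # local averages of f over windows of width 2/Nbox
```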
Fact 3.39. For 1-periodic functions f, g ∈ pc,
(3.83) f ∗ g = g ∗ f.
Proof. For x ∈ [0, 1],
(3.84) (f ∗ g)(x) = ∫₀¹ f(t) g(x−t) dt = ∫_{x−1}^{x} f(x−t) g(t) dt = ∫_{x−1}^{0} f(x−t) g(t) dt + ∫_{0}^{x} f(x−t) g(t) dt
(3.85) = ∫_{x}^{1} f(x−(t−1)) g(t−1) dt + ∫_{0}^{x} f(x−t) g(t) dt = (g ∗ f)(x),
where in the last step we used that f(x−(t−1)) = f(x−t) and g(t−1) = g(t) by periodicity. □
It turns out that the partial sum S_N f can be written in terms of a convolution:
(3.86) S_N f(x) = Σ_{n=−N}^{N} ∫₀¹ f(t) e^{−2πint} dt · e^{2πinx} = ∫₀¹ f(t) Σ_{n=−N}^{N} e^{2πin(x−t)} dt = (f ∗ D_N)(x),
where
(3.87) D_N(x) = Σ_{n=−N}^{N} e^{2πinx}.

The sequence of functions (DN )N is called Dirichlet kernel. The Dirichlet kernel can
be written more explicitly.
Fact 3.40. We have
(3.88) D_N(x) = sin(2π(N + 1/2)x) / sin(πx).
Proof.
(3.89) D_N(x) = Σ_{n=−N}^{N} e^{2πinx} = e^{−2πiNx} Σ_{n=0}^{2N} e^{2πinx} = e^{−2πiNx} (e^{2πi(2N+1)x} − 1)/(e^{2πix} − 1)
(3.90) = (e^{2πi(N+1/2)x} − e^{−2πi(N+1/2)x})/(e^{πix} − e^{−πix}) = sin(2π(N + 1/2)x)/sin(πx). □
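Remark (numerical illustration). The closed form (3.88) is easy to sanity-check against the defining sum (3.87). A short sketch, added here (assumes numpy; names are ad hoc):

```python
import numpy as np

def dirichlet_sum(N, x):
    n = np.arange(-N, N + 1).reshape(-1, 1)
    return np.real(np.sum(np.exp(2j * np.pi * n * x), axis=0))   # definition (3.87)

def dirichlet_closed(N, x):
    return np.sin(2 * np.pi * (N + 0.5) * x) / np.sin(np.pi * x)  # formula (3.88)

x = np.linspace(0.01, 0.99, 500)         # avoid x = 0, where the quotient is 0/0
for N in [1, 3, 10]:
    assert np.allclose(dirichlet_sum(N, x), dirichlet_closed(N, x))
```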


Lecture 17 (Friday, October 11)


We would like to approximate continuous functions by trigonometric polynomials.
If f is only continuous it may happen that SN f (x) does not converge. However, instead
of SN f we may also consider their arithmetic means. We define the Fejér kernel by
(3.91) K_N(x) = (1/(N+1)) Σ_{n=0}^{N} D_n(x).
Fact 3.41. We have
(3.92) K_N(x) = (1 − cos(2π(N+1)x)) / (2(N+1) sin(πx)²) = (1/(N+1)) (sin(π(N+1)x)/sin(πx))².
Proof. Using that 2 sin(x) sin(y) = cos(x − y) − cos(x + y),
(3.93) D_N(x) = sin(2π(N+1/2)x)/sin(πx) = 2 sin(πx) sin(2π(N+1/2)x)/(2 sin(πx)²) = (cos(2πNx) − cos(2π(N+1)x))/(2 sin(πx)²).
Thus,
(3.94) Σ_{n=0}^{N} D_n(x) = (1/(2 sin(πx)²)) Σ_{n=0}^{N} (cos(2πnx) − cos(2π(n+1)x)) = (1 − cos(2π(N+1)x))/(2 sin(πx)²).
The claim now follows from the formula 1 − cos(2x) = 2 sin(x)². □
As a consequence of this explicit formula we see that KN (x) ≥ 0 for all x ∈ R which
is not at all obvious from the initial definition. We define
(3.95) σN f (x) = f ∗ KN (x).
Theorem 3.42 (Fejér). For every 1-periodic continuous function f ,
(3.96) σN f → f
uniformly on R as N → ∞.
Corollary 3.43. Every 1-periodic continuous function can be uniformly approxi-
mated by trigonometric polynomials.
Remark. There is nothing special about the period 1 here. By considering the orthonormal system (L^{−1/2} e^{2πinx/L})_{n∈Z} we obtain a similar result for L-periodic functions.
This follows from Fejér's theorem because σ_N f is a trigonometric polynomial:
(3.97) σ_N f(x) = ∫₀¹ f(t) (1/(N+1)) Σ_{n=0}^{N} Σ_{k=−n}^{n} e^{2πik(x−t)} dt = (1/(N+1)) Σ_{n=0}^{N} Σ_{k=−n}^{n} ∫₀¹ f(t) e^{−2πikt} dt · e^{2πikx}
(3.98) = (1/(N+1)) Σ_{n=0}^{N} Σ_{k=−n}^{n} f̂(k) e^{2πikx} = (1/(N+1)) Σ_{k=−N}^{N} Σ_{n=|k|}^{N} f̂(k) e^{2πikx} = Σ_{k=−N}^{N} (1 − |k|/(N+1)) f̂(k) e^{2πikx}.
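Remark (numerical illustration). The last display gives a convenient way to compute σ_N f in practice: once the coefficients f̂(k) are known (or approximated by Riemann sums), σ_N f = Σ_k (1 − |k|/(N+1)) f̂(k) e^{2πikx}. The following sketch is an addition to these notes (assumes numpy; names are ad hoc) and shows the sup-norm error decreasing for a continuous f:

```python
import numpy as np

M = 4096
x = np.arange(M) / M
f = np.abs(x - 0.5)                          # continuous and 1-periodic (value 1/2 at both endpoints)

# Riemann-sum approximations of the Fourier coefficients
fhat = {k: np.mean(f * np.exp(-2j * np.pi * k * x)) for k in range(-64, 65)}

def fejer_mean(N):
    s = np.zeros(M, dtype=complex)
    for k in range(-N, N + 1):
        s += (1 - abs(k) / (N + 1)) * fhat[k] * np.exp(2j * np.pi * k * x)
    return s.real

for N in [2, 8, 32, 64]:
    print(N, np.max(np.abs(fejer_mean(N) - f)))   # sup-norm error on the grid, decreasing in N
```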

We will now derive Fejér’s Theorem as a consequence of a more general principle.



Definition 3.44 (Approximation of unity). A sequence of 1-periodic continuous


functions (kn )n is called approximation of unity if for all 1-periodic continuous functions
f we have that f ∗ kn converges uniformly to f on R. That is,
(3.99) sup_{x∈R} |(f ∗ k_n)(x) − f(x)| → 0
as n → ∞.
Remark. There is no unity for the convolution of functions. More precisely, there
exists no continuous function k such that k ∗ f = f for all continuous, 1-periodic f (this
is the content of Exercise 3.62). An approximation of unity is a sequence (kn )n that
approximates unity:
(3.100) lim kn ∗ f = f
n→∞

for every continuous, 1-periodic f .


Theorem 3.45. Let (kn )n be a sequence of 1-periodic continuous functions such
that
(1) k_n(x) ≥ 0 for all x ∈ R,
(2) ∫_{−1/2}^{1/2} k_n(t) dt = 1,
(3) for all 1/2 ≥ δ > 0 we have
(3.101) ∫_{−δ}^{δ} k_n(t) dt → 1
as n → ∞.
Then (k_n)_n is an approximation of unity.

Figure 4. Approximation of unity

Assumption (3) is a precise way to express the idea that the “mass” of kn con-
centrates near the origin. Keeping in mind Assumption (2), Assumption (3) can be
rewritten equivalently as:
(3.102) ∫_{δ ≤ |t| ≤ 1/2} k_n(t) dt → 0.

Proof. Let f be 1-periodic and continuous. By continuity, f is bounded and uniformly continuous on [−1/2, 1/2]. By periodicity, f is also bounded and uniformly continuous on all of R. Let ε > 0. By uniform continuity there exists δ with 0 < δ < 1/2 such that
(3.103) |f(x − t) − f(x)| ≤ ε/2
for all |t| < δ, x ∈ R. Using Assumption (2),
(3.104) (f ∗ k_n)(x) − f(x) = ∫_{−1/2}^{1/2} (f(x − t) − f(x)) k_n(t) dt = A + B,
where
(3.105) A = ∫_{|t|≤δ} (f(x − t) − f(x)) k_n(t) dt,   B = ∫_{δ ≤ |t| ≤ 1/2} (f(x − t) − f(x)) k_n(t) dt.
By (3.103) and Assumption (2),
(3.106) |A| ≤ (ε/2) ∫_{|t|≤δ} k_n(t) dt ≤ ε/2.
Since f is bounded there exists C > 0 such that |f(x)| ≤ C for all x ∈ R. Let N be large enough so that for all n ≥ N,
(3.107) ∫_{δ ≤ |t| ≤ 1/2} k_n(t) dt ≤ ε/(4C).
Thus, if n ≥ N,
(3.108) |B| ≤ 2C ∫_{δ ≤ |t| ≤ 1/2} k_n(t) dt ≤ ε/2.
This implies
(3.109) |(f ∗ k_n)(x) − f(x)| ≤ ε/2 + ε/2 ≤ ε
for n ≥ N and x ∈ R. □
Corollary 3.46. The Fejér kernel (KN )N is an approximation of unity.
Proof. We verify the assumptions of Theorem 3.45. From (3.92) we see that K_N ≥ 0. Also,
(3.110) ∫_{−1/2}^{1/2} K_N(t) dt = (1/(N+1)) Σ_{n=0}^{N} Σ_{k=−n}^{n} ∫_{−1/2}^{1/2} e^{2πikt} dt = (1/(N+1)) Σ_{n=0}^{N} 1 = 1.
Now we verify the last property. Let 1/2 > δ > 0 and |x| ≥ δ. By (3.92),
(3.111) K_N(x) ≤ (1/(N+1)) · 1/sin(πδ)².
Thus,
(3.112) ∫_{δ ≤ |t| ≤ 1/2} K_N(t) dt ≤ (1/(N+1)) · 1/sin(πδ)²,
which converges to 0 as N → ∞. □
Therefore we have proven Fejér’s theorem. Note that although the Dirichlet kernel
also satisfies Assumptions (2) and (3), it is not an approximation of unity. In other
words, if f is continuous then it is not necessarily true that SN f → f uniformly.
However, we can use Fejér’s theorem to show that SN f → f in the L2 -norm.

Theorem 3.47. Let f be a 1-periodic and continuous function. Then
(3.113) lim_{N→∞} ‖S_N f − f‖₂ = 0.
Proof. Let ε > 0. By Fejér's theorem there exists a trigonometric polynomial p such that |f(x) − p(x)| ≤ ε/2 for all x ∈ R. Then
(3.114) ‖f − p‖₂ = (∫₀¹ |f(x) − p(x)|² dx)^{1/2} ≤ ε/2.
Let N₀ be the degree of p and let N ≥ N₀. Then S_N p = p by Fact 3.34. Thus,
(3.115) S_N f − f = S_N f − S_N p + S_N p − f = S_N(f − p) + p − f.
By Minkowski's inequality,
(3.116) ‖S_N f − f‖₂ ≤ ‖S_N(f − p)‖₂ + ‖p − f‖₂.
Bessel's inequality (Theorem 3.12) says that ‖S_N(f − p)‖₂ ≤ ‖f − p‖₂. Therefore, for all N ≥ N₀,
(3.117) ‖S_N f − f‖₂ ≤ 2‖f − p‖₂ ≤ ε.

In view of Theorem 3.15 this means that the trigonometric system is complete.

Lecture 18 (Monday, October 14)

Corollary 3.48 (Parseval's theorem). If f, g are 1-periodic, continuous functions, then
(3.118) ⟨f, g⟩ = Σ_{n=−∞}^{∞} f̂(n) \overline{ĝ(n)}.
In particular,
(3.119) ‖f‖₂² = Σ_{n=−∞}^{∞} |f̂(n)|².
Proof. We have
(3.120) ⟨S_N f, g⟩ = Σ_{n=−N}^{N} f̂(n) ⟨e^{2πin(·)}, g⟩ = Σ_{n=−N}^{N} f̂(n) \overline{ĝ(n)}.
But ⟨S_N f, g⟩ → ⟨f, g⟩ as N → ∞ because
(3.121) |⟨S_N f, g⟩ − ⟨f, g⟩| = |⟨S_N f − f, g⟩| ≤ ‖S_N f − f‖₂ ‖g‖₂ → 0
as N → ∞. Here we have used the Cauchy-Schwarz inequality and the previous theorem. Equation (3.119) follows from putting f = g. □
Remark. Theorem 3.47 and Corollary 3.48 also hold for piecewise continuous, 1-periodic functions.
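Remark (numerical illustration). Identity (3.119) can be checked numerically by comparing a Riemann sum for ‖f‖₂² against a partial sum of Σ|f̂(n)|². The following sketch is an addition to these notes (assumes numpy; names are ad hoc):

```python
import numpy as np

M = 8192
t = np.arange(M) / M
f = np.exp(np.sin(2 * np.pi * t))                      # smooth and 1-periodic

lhs = np.mean(np.abs(f) ** 2)                          # Riemann sum for ||f||_2^2
coeffs = [np.mean(f * np.exp(-2j * np.pi * n * t)) for n in range(-40, 41)]
rhs = sum(abs(c) ** 2 for c in coeffs)                 # partial sum of sum |hat f(n)|^2
print(lhs, rhs)       # nearly equal; since f is smooth, the tail of the series is tiny
```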
Exercise 3.49. (i) Let f be the 1-periodic function such that f(x) = x for x ∈ [0, 1). Compute the Fourier coefficient f̂(n) for every n ∈ Z and use Parseval's theorem to derive the formula
(3.122) Σ_{n=1}^{∞} 1/n² = π²/6.
(ii) Using Parseval's theorem for a suitable 1-periodic function, determine the value of Σ_{n=1}^{∞} 1/n⁴.

While the Fourier series of a continuous function does not necessarily converge point-
wise, we can obtain pointwise convergence easily if we impose additional conditions.
Theorem 3.50. Let f be a 1-periodic continuous function and let x ∈ R. Assume
that f is differentiable at x. Then SN f (x) → f (x) as N → ∞.
Proof. By definition,
(3.123) S_N f(x) = ∫₀¹ f(x − t) D_N(t) dt.
Also,
(3.124) ∫₀¹ D_N(t) dt = Σ_{n=−N}^{N} ∫₀¹ e^{2πint} dt = 1.
Thus,
(3.125) S_N f(x) − f(x) = ∫₀¹ (f(x − t) − f(x)) D_N(t) dt
and, by Fact 3.40,
(3.126) = ∫₀¹ g(t) sin(2π(N + 1/2)t) dt,
where
(3.127) g(t) = (f(x − t) − f(x)) / sin(πt).
Differentiability of f at x implies that g is continuous at 0. Indeed,
(3.128) (f(x − t) − f(x))/sin(πt) = [(f(x − t) − f(x))/t] · [t/sin(πt)] → −f′(x) · (1/π)
as t → 0.
Exercise 3.51. Show that φ_n(x) = √2 sin(2π(n + 1/2)x) with n = 1, 2, . . . defines an orthonormal system on [0, 1].
With this exercise, the claim follows from (3.126) and the Riemann-Lebesgue lemma (Corollary 3.13). □
Exercise 3.52. Show that there exists a constant c > 0 such that
(3.129) ∫₀¹ |D_N(x)| dx ≥ c log(2 + N)
holds for all N = 0, 1, . . . .
Exercise 3.53. (i) Let (a_k)_k be a sequence of complex numbers with limit L. Prove that
lim_{n→∞} (a_1 + ⋯ + a_n)/n = L.
Given the sequence (a_k)_k, form the partial sums s_n = Σ_{k=1}^{n} a_k and let
σ_N = (s_1 + ⋯ + s_N)/N.
σ_N is called the Nth Cesàro mean of the sequence (s_k)_k or the Nth Cesàro sum of the series Σ_{k=1}^{∞} a_k. If σ_N converges to a limit S we say that the series Σ_{k=1}^{∞} a_k is Cesàro summable to S.
(ii) Prove that if Σ_{k=1}^{∞} a_k is summable to S (i.e. by definition converges with sum S) then Σ_{k=1}^{∞} a_k is Cesàro summable to S.
(iii) Prove that the sum Σ_{k=1}^{∞} (−1)^{k−1} does not converge but is Cesàro summable to some limit S and determine S.

Lecture 19 (Wednesday, October 16)

5. The Stone-Weierstrass Theorem


We have seen two different classes of continuous functions that are rich enough
to enable uniform approximation of arbitrary continuous functions: polynomials and
trigonometric polynomials. In other words, we have shown that polynomials are dense
in C([a, b]) and trigonometric polynomials are dense in C(R/Z) (space of continuous
and 1-periodic functions). The Stone-Weierstrass theorem gives a sufficient criterion
for a subset of C(K) to be dense (where K is a compact metric space). Both Fejér's and Weierstrass' theorems are consequences of this more general theorem.
Theorem 3.54 (Stone-Weierstrass). Let K be a compact metric space and A ⊂ C(K). Assume that A satisfies the following conditions:
(1) A is a self-adjoint algebra: for f, g ∈ A, c ∈ C,
(3.130) f + g ∈ A,  f · g ∈ A,  c · f ∈ A,  f̄ ∈ A.
(2) A separates points: for all x, y ∈ K with x 6= y there exists f ∈ A such that
f (x) 6= f (y).
(3) A vanishes nowhere: for all x ∈ K there exists f ∈ A such that f (x) 6= 0.
Then A is dense in C(K) (that is, A = C(K)).
Exercise 3.55. Let K be a compact metric space. Show that if a subset A ⊂ C(K)
does not separate points or does not vanish nowhere, then A is not dense.

PnExercise 3.56. Let A ⊂ C([1, 2]) be the set of all polynomials of the form p(x) =
2k+1
k=0 ck x where ck ∈ C and n a non-negative integer. Show that A is dense, but
not an algebra.
Before we begin the proof of the Stone-Weierstrass theorem we first need some
preliminary lemmas.
Lemma 3.57. For every a > 0 there exists a sequence of polynomials (pn )n with real
coefficients such that pn (0) = 0 for all n and supx∈[−a,a] |pn (x) − |x|| → 0 as n → ∞.
Proof. From Weierstrass’ theorem we get that there exists a sequence of poly-
nomials qn that converges uniformly to f (x) = |x| on [−a, a]. Now set pn (x) =
qn (x) − qn (0). 
Exercise 3.58. Work out an explicit sequence of polynomials (pn )n that converges
uniformly to x 7→ |x| on [−1, 1].
Let A ⊂ C(K) satisfy conditions (1), (2), (3). Observe that then the closure Ā also satisfies (1), (2), (3).
We may assume without loss of generality that we are dealing with real-valued
functions (otherwise split functions into real and imaginary parts f = g + ih and go
through the proof for both parts).
Lemma 3.59. If f ∈ Ā, then |f| ∈ Ā.
Proof. Let ε > 0 and a = max_{x∈K} |f(x)|. By Lemma 3.57 there exist c_1, . . . , c_n ∈ R such that
(3.131) |Σ_{i=1}^{n} c_i y^i − |y|| ≤ ε
for all y ∈ [−a, a]. By Condition (1) (which Ā inherits from A) we have that
(3.132) g = Σ_{i=1}^{n} c_i f^i ∈ Ā.
Then |g(x) − |f(x)|| ≤ ε for all x ∈ K. Thus, |f| can be uniformly approximated by functions in Ā. But Ā is closed, so |f| ∈ Ā. □
Lemma 3.60. If f_1, . . . , f_m ∈ Ā, then min(f_1, . . . , f_m) ∈ Ā and max(f_1, . . . , f_m) ∈ Ā.
Proof. It suffices to show the claim for m = 2 (the general case then follows by induction). Let f, g ∈ Ā. We have
(3.133) min(f, g) = (f + g)/2 − |f − g|/2,   max(f, g) = (f + g)/2 + |f − g|/2.
Thus, Condition (1) and Lemma 3.59 imply that min(f, g), max(f, g) ∈ Ā. □
Lemma 3.61. For every x0 , x1 ∈ K, x0 6= x1 and c0 , c1 ∈ R there exists f ∈ A such
that f (xi ) = ci for i = 0, 1.
In other words, any two points in K × R that could lie on the graph of a function
in A do lie on the graph of a function in A.
Proof. By Conditions (2) and (3) there exist g, h0 , h1 ∈ A such that g(x0 ) 6= g(x1 )
and hi (xi ) 6= 0 for i = 0, 1. Set
(3.134) ui (x) = g(x)hi (x) − g(x1−i )hi (x).
Then u_i(x_{1−i}) = 0 and u_i(x_i) ≠ 0 for i = 0, 1. Set
(3.135) f(x) = c_0 u_0(x)/u_0(x_0) + c_1 u_1(x)/u_1(x_1).
Then f(x_0) = c_0 and f(x_1) = c_1 and f ∈ A by Condition (1). □
This lemma can be seen as a baby version of the full theorem: the statement ex-
tends to finitely many points. So we can use it to find a function in A that matches
a given function f in any given collection of finitely many points (see Exercise 3.81).
Thus, if K was finite, we would already be done. If K is not finite, we need to exploit
compactness. Let us now get to the details.

Lecture 20 (Friday, October 18)

Fix f ∈ C(K) and let ε > 0.

Claim: For every x ∈ K there exists g_x ∈ Ā such that g_x(x) = f(x) and g_x(t) > f(t) − ε for all t ∈ K.
Proof of Claim. Let y ∈ K. By Lemma 3.61 there exists hy ∈ A such that
hy (x) = f (x) and hy (y) = f (y). By continuity of hy there exists an open ball By
around y such that |hy (t) − f (t)| < ε for all t ∈ By . In particular,
(3.136) hy (t) > f (t) − ε.
Observe that (By )y∈K is an open cover of K. Since K is compact, we can find a finite
subcover by By1 , . . . , Bym . Set
(3.137) gx = max(hy1 , . . . , hym ).
By Lemma 3.60, g_x ∈ Ā. □
By continuity of gx there exists an open ball Ux such that
(3.138) |gx (t) − f (t)| < ε
for t ∈ Ux . In particular,
(3.139) gx (t) < f (t) + ε.
(Ux )x∈K is an open cover of K which has a finite subcover by Ux1 , . . . , Uxn . Then let
(3.140) h = min(g_{x_1}, . . . , g_{x_n}).
By Lemma 3.60 we have h ∈ Ā. Also,
(3.141) f (t) − ε < h(t) < f (t) + ε
for all t ∈ K. That is,
(3.142) |f (t) − h(t)| < ε
for all t ∈ K. This proves that f ∈ Ā, i.e. A is dense in C(K). □

6. Further exercises
Exercise 3.62. Show that there exists no continuous 1-periodic function g such
that f ∗ g = f holds for all continuous 1-periodic functions f .
Hint: Use the Riemann-Lebesgue lemma.
Exercise 3.63. Give an alternative proof of Weierstrass’ theorem by using Fejér’s
theorem and then approximating the resulting trigonometric polynomials by truncated
Taylor expansions.
Exercise 3.64. Find a sequence of continuous functions (fn )n on [0, 1] and a con-
tinuous function f on [0, 1] such that kfn − f k2 → 0, but fn (x) does not converge to
f (x) for any x ∈ [0, 1].
Exercise 3.65 (Weighted L2 norms). Fix a function w ∈ C([a, b]) that is non-
negative and does not vanish identically. Let us define another inner product by
(3.143) ⟨f, g⟩_{L²(w)} = ∫_a^b f(x) g(x) w(x) dx
and a corresponding norm ‖f‖_{L²(w)} = ⟨f, f⟩_{L²(w)}^{1/2}. Similarly, we say that (φ_n)_n is an orthonormal system by asking that ⟨φ_n, φ_m⟩_{L²(w)} is 1 if n = m and 0 otherwise. Verify that all theorems in Section 2 continue to hold when ⟨·, ·⟩, ‖·‖₂ are replaced by ⟨·, ·⟩_{L²(w)}, ‖·‖_{L²(w)}, respectively.
Exercise 3.66. Let w ∈ C([0, 1]) be such that w(x) ≥ 0 for all x ∈ [0, 1] and w ≢ 0. Prove that there exists a sequence of real-valued polynomials (p_n)_n such that p_n is of degree n and
(3.144) ∫₀¹ p_n(x) p_m(x) w(x) dx = 1 if n = m, and = 0 if n ≠ m,
for all non-negative integers n, m.
Exercise 3.67 (Chebyshev polynomials). Define a sequence of polynomials (Tn )n
by T0 (x) = 1, T1 (x) = x and the recurrence relation Tn (x) = 2xTn−1 (x) − Tn−2 (x) for
n ≥ 2.
(i) Show that Tn (x) = cos(nt) if x = cos(t).
Hint: Use that 2 cos(a) cos(b) = cos(a + b) + cos(a − b) for all a, b ∈ C.
(ii) Compute
(3.145) ∫_{−1}^{1} T_n(x) T_m(x) dx/√(1 − x²)
for all non-negative integers n, m.
(iii) Prove that |Tn (x)| ≤ 1 for x ∈ [−1, 1] and determine when there is equality.
Exercise 3.68. Let d be a positive integer and f ∈ C([a, b]). Denote by Pd the set
of polynomials with real coefficients of degree ≤ d. Prove that there exists a polynomial
p∗ ∈ Pd such that kf − p∗ k∞ = inf p∈Pd kf − pk∞ .
Hint: Find a way to apply Theorem 2.12.
Exercise 3.69. Let f be smooth on [0, 1] (that is, arbitrarily often differentiable).
(i) Let p be a polynomial such that |f 0 (x) − p(x)| ≤ ε for all x ∈ [0, 1]. Construct a
polynomial q such that |f (x) − q(x)| ≤ ε for all x ∈ [0, 1].
(ii) Prove that there exists a sequence of polynomials (p_n)_n such that (p_n^{(k)})_n converges uniformly on [0, 1] to f^{(k)} for all k = 0, 1, 2, . . . .
Exercise 3.70 (The space L²). Let (X, d) be a metric space. Recall that the completion X̄ of X is defined as follows: for two Cauchy sequences (a_n)_n, (b_n)_n in X we say that (a_n)_n ∼ (b_n)_n if lim_{n→∞} d(a_n, b_n) = 0. Then ∼ is an equivalence relation on the space of Cauchy sequences and we define X̄ as the set of equivalence classes. We identify X with a subset of X̄ by identifying x ∈ X with the equivalence class of the constant sequence (x, x, . . . ). We make X̄ a metric space by defining
(3.146) d(a, b) = lim_{n→∞} d(a_n, b_n),
where (a_n)_n, (b_n)_n are representatives of a, b ∈ X̄, respectively. Then X̄ is a complete metric space. Let us denote by L²_c(a, b) the metric space of continuous functions on [a, b] equipped with the metric d(f, g) = ‖f − g‖₂, where ‖f‖₂ = (∫_a^b |f|²)^{1/2}. Define
(3.147) L²(a, b) := the completion of L²_c(a, b).

(i) Define an inner product on L²(a, b) by
(3.148) ⟨f, g⟩ = lim_{n→∞} ∫_a^b f_n(x) \overline{g_n(x)} dx,
for f, g ∈ L²(a, b) with (f_n)_n, (g_n)_n being representatives of f, g, respectively. Show that this is well-defined: that is, show that the limit on the right hand side exists and is independent of the representatives (f_n)_n, (g_n)_n and that ⟨·, ·⟩ is an inner product.
Hint: Use the Cauchy-Schwarz inequality on L²_c(a, b).
For f ∈ L²(a, b) we define ‖f‖₂ = ⟨f, f⟩^{1/2}. Let (φ_n)_{n=1,2,...} be an orthonormal system in L²(a, b) (that is, ⟨φ_n, φ_m⟩ = 0 if n ≠ m and = 1 if n = m).
(ii) Prove Bessel's inequality: for every f ∈ L²(a, b) it holds that
(3.149) Σ_{n=1}^{∞} |⟨f, φ_n⟩|² ≤ ‖f‖₂².
Hint: Use the same proof as seen for L²_c(a, b) in the lecture!
(iii) Let (c_n)_n ⊂ C be a sequence of complex numbers and let
(3.150) f_N = Σ_{n=1}^{N} c_n φ_n ∈ L²(a, b).
Show that (f_N)_N converges in L²(a, b) if and only if
(3.151) Σ_{n=1}^{∞} |c_n|² < ∞.

Exercise 3.71. Let f be the 1-periodic function such that f (x) = |x| for x ∈
[−1/2, 1/2]. Determine explicitly a sequence of trigonometric polynomials (pN )N such
that pN → f uniformly as N → ∞.
Exercise 3.72. Let f, g be continuous, 1-periodic functions.
(i) Show that (f ∗ g)^(n) = f̂(n) ĝ(n).
(ii) Show that (f · g)^(n) = Σ_{m∈Z} f̂(n − m) ĝ(m).
(iii) If f is continuously differentiable, prove that (f′)^(n) = 2πin f̂(n).
(iv) Let y ∈ R and set f_y(x) = f(x + y). Show that (f_y)^(n) = e^{2πiny} f̂(n).
(v) Let m ∈ Z, m ≠ 0 and set f_m(x) = f(mx). Show that (f_m)^(n) equals f̂(n/m) if m divides n and zero otherwise.
Exercise 3.73 (Legendre polynomials). Define p_n(x) = (d^n/dx^n)[(1 − x²)^n] for n = 0, 1, . . . and
(3.152) φ_n(x) = p_n(x) · (∫_{−1}^{1} p_n(t)² dt)^{−1/2}.
Show that (φ_n)_{n=0,1,...} is a complete orthonormal system on [−1, 1].
Exercise 3.74. Let f be 1-periodic and k times continuously differentiable. Prove
that there exists a constant c > 0 such that
(3.153) |fb(n)| ≤ c|n|−k for all n ∈ Z.

Hint: What can you say about the Fourier coefficients of f (k) ?
Exercise 3.75. Let f be 1-periodic and continuous.
(i) Suppose that f̂(n) = −f̂(−n) ≥ 0 holds for all n ≥ 0. Prove that
(3.154) Σ_{n=1}^{∞} f̂(n)/n < ∞.
(ii) Show that there does not exist a 1-periodic continuous function f such that
(3.155) f̂(n) = sgn(n)/log|n| for all |n| ≥ 2.
Here sgn(n) = 1 if n > 0 and sgn(n) = −1 if n < 0.
Exercise 3.76. Suppose that f is a 1-periodic function such that there exists c > 0
and α ∈ (0, 1] such that
(3.156) |f (x) − f (y)| ≤ c|x − y|α
holds for all x, y ∈ R. Show that the sequence of partial sums S_N f(x) = Σ_{n=−N}^{N} f̂(n) e^{2πinx} converges uniformly to f as N → ∞.
Exercise 3.77. Let f ∈ C([0, 1]) and A ⊂ C([0, 1]) dense. Suppose that
Z 1
(3.157) f (x)a(x)dx = 0
0
for all a ∈ A. Show that f = 0.
Hint: Show that ∫₀¹ |f(x)|² dx = 0.
Exercise 3.78. Let f ∈ C([−1, 1]) and a ∈ [−1, 1]. Show that for every ε > 0 there
exists a polynomial p such that p(a) = f (a) and |f (x) − p(x)| < ε for all x ∈ [−1, 1].
Exercise 3.79. Prove that
(3.158) −1/2 = Σ_{n=1}^{∞} (−1)^n sin(n)/n.
Exercise 3.80. Suppose f ∈ C([1, ∞)) and limx→+∞ f (x) = a. Show that f can
be uniformly approximated on [1, ∞) by functions of the form g(x) = p(1/x), where p
is a polynomial.
Exercise 3.81 (Stone-Weierstrass for finite sets). Let K be a finite set and A
a family of functions on K that is an algebra (i.e. closed under taking finite linear
combinations and products), separates points and vanishes nowhere. Give a purely
algebraic proof that A must then already contain every function on K. (That means
your proof is not allowed to use the concept of an inequality. In particular, you are not
allowed to use any facts about metric spaces such as the Stone-Weierstrass theorem.)
Hint: Take a close look at the proof of Stone-Weierstrass.
Exercise 3.82 (Uniform approximation by neural networks). Let σ(t) = e^t for t ∈ R. Fix n ∈ N and let K ⊂ R^n be a compact set. As usual, let C(K) denote the space of real-valued continuous functions on K. Define a class of functions N ⊂ C(K) by saying that μ ∈ N iff there exist m ∈ N, W ∈ R^{m×n}, v, b ∈ R^m such that
(3.159) μ(x) = Σ_{i=1}^{m} σ((Wx)_i + b_i) v_i for all x ∈ K.

Prove that N is dense in C(K).


Remark. This is a special case of a well-known result of G. Cybenko, Approximation by
Superpositions of a Sigmoidal Function in Math. Control Signals Systems (1989). As a
real-world motivation for this problem, note that a function µ ∈ N can be interpreted as
a neural network with a single hidden layer, see Figure 5. Consequently, in this problem
you are asked to show that every continuous function can be uniformly approximated
by neural networks of this form.

Figure 5. Visualization of µ when n = 3 and m = 6 (inputs x_1, x_2, x_3, one hidden layer, output µ(x)).

Exercise 3.83. Let f be a continuous function on [0, 1] and N a positive integer. Define x_k = k/N for k = 0, . . . , N. Define
(3.160) L_N(x) = Σ_{k=0}^{N} f(x_k) Π_{j=0, j≠k}^{N} (x − x_j)/(x_k − x_j).
(i) Show that f(x_k) = L_N(x_k) for all k = 0, . . . , N and that L_N is the unique polynomial of degree ≤ N with this property.
(ii) Suppose f ∈ C^{N+1}([0, 1]). Show that for every x ∈ [0, 1] there exists ξ ∈ [0, 1] such that
(3.161) f(x) − L_N(x) = (f^{(N+1)}(ξ)/(N+1)!) Π_{k=0}^{N} (x − x_k).
(iii) Show that LN does not necessarily converge to f uniformly on [0, 1]. (Find a
counterexample.)
(iv) Suppose f is given by a power series with infinite convergence radius. Does LN
necessarily converge to f uniformly on [0, 1] ?
Remark. The polynomials LN are also known as Lagrange interpolation polynomials.
CHAPTER 4

Linear operators and derivatives

Lecture 21 (Monday, October 21)

Let K denote either one of the fields R or C. Let X be a vector space over K.
Definition 4.1. A map k · k : X → [0, ∞) is called a norm if for all x, y ∈ X and
λ ∈ K,
(4.1) kλxk = |λ| · kxk, kx + yk ≤ kxk + kyk, kxk = 0 ⇔ x = 0.
A K-vector space equipped with a norm is called a normed vector space. On every
normed vector space we have a natural metric space structure defined by
(4.2) d(x, y) = kx − yk.
A complete normed vector space is called Banach space.
Examples 4.2. • Rn with the Euclidean norm is a Banach space.
• Rn with the norm kxk = supi=1,...,n |xi | is also a Banach space.
• If K is a compact metric space, then C(K) is a Banach space with the supre-
mum norm kf k∞ = supx∈K |f (x)|.
• The space of continuous functions on [0, 1] equipped with the L²-norm ‖f‖₂ = (∫₀¹ |f(x)|² dx)^{1/2} is a normed vector space, but not a Banach space (why?).
Example 4.3. The set of bounded sequences (a_n)_{n∈N} of complex numbers equipped with the ℓ^∞-norm,
(4.3) ‖a‖_∞ = sup_{n=1,2,...} |a_n|,
is a Banach space. As a metric space, ℓ^∞ coincides with C_b(N).


Exercise 4.4. Define ℓ¹ = {(a_n)_{n∈N} ⊂ C : Σ_{n=1}^{∞} |a_n| < ∞}. We equip ℓ¹ with the norm defined by
(4.4) ‖a‖₁ = Σ_{n=1}^{∞} |a_n|.
Prove that this defines a Banach space.
Exercise 4.5. Define ℓ² = {(a_n)_{n∈N} ⊂ C : Σ_{n=1}^{∞} |a_n|² < ∞}. We equip ℓ² with the norm defined by
(4.5) ‖a‖₂ = (Σ_{n=1}^{∞} |a_n|²)^{1/2}.
Prove that this is really a norm and that ℓ² is complete.



Let X, Y be normed vector spaces. Recall that a map T : X → Y is called linear if


(4.6) T (x + λy) = T x + λT y
for every x, y ∈ X, λ ∈ K. We adopt the convention that whenever T is a linear map we
write T x instead of T (x) (unless brackets are necessary because of operator precedence).
Definition 4.6. A linear map T : X → Y is called bounded if there exists C > 0
such that kT xkY ≤ CkxkX for all x ∈ X.
Linear maps between normed vector spaces are also referred to as linear operators.
Lemma 4.7. Let T : X → Y be a linear map. The following are equivalent:
(i) T is bounded
(ii) T is continuous
(iii) T is continuous at 0
(iv) supkxkX =1 kT xkY < ∞
Proof. (i) ⇒ (ii): By assumption and linearity, for x, y ∈ X,
(4.7) kT x − T ykY = kT (x − y)kY ≤ Ckx − ykX .
This implies continuity.
(ii) ⇒ (iii): There is nothing to prove.
(iii) ⇒ (iv): By continuity at 0 there exists δ > 0 such that for x ∈ X with kxkX ≤ δ
we have kT xkY ≤ 1. Let x ∈ X with kxkX = 1. Then kδxkX = δ, so
(4.8) kT (δx)kY ≤ 1
By linearity of T , kT xkY ≤ δ −1 . Thus, supkxkX =1 kT xkY ≤ δ −1 < ∞.
(iv) ⇒ (i): Let x ∈ X with x ≠ 0. Let C = sup_{‖x‖_X=1} ‖Tx‖_Y < ∞. Then
(4.9) ‖ x/‖x‖_X ‖_X = 1.
Thus,
(4.10) ‖ T(x/‖x‖_X) ‖_Y ≤ C.
By linearity of T this implies
(4.11) ‖Tx‖_Y ≤ C‖x‖_X. □
Definition 4.8. By L(X, Y) we denote the space of bounded linear maps T : X → Y. For every T ∈ L(X, Y) we define its operator norm by
(4.12) ‖T‖_op = sup_{x≠0} ‖Tx‖_Y / ‖x‖_X.
We also denote ‖T‖_op by ‖T‖_{X→Y}.
One should think of ‖T‖_op as the best (i.e. smallest) constant C > 0 for which
(4.13) ‖Tx‖_Y ≤ C‖x‖_X
holds. We have by definition that
(4.14) ‖Tx‖_Y ≤ ‖T‖_op ‖x‖_X.
Observe that by linearity of T and homogeneity of the norm,
(4.15) ‖T‖_op = sup_{‖x‖_X=1} ‖Tx‖_Y = sup_{‖x‖_X≤1} ‖Tx‖_Y.

Exercise 4.9. Show that L(X, Y ) endowed with the operator norm forms a normed
vector space (i.e. show that k · kop is a norm).
Example 4.10. Let A ∈ R^{n×m} be a real n × m matrix. We view A as a linear map R^m → R^n: for x ∈ R^m, A(x) = A · x ∈ R^n. Let us equip R^n and R^m with the corresponding ‖·‖_∞ norms. Consider the operator norm ‖A‖_{∞→∞} = sup_{‖x‖_∞=1} ‖Ax‖_∞ with respect to these normed spaces:
(4.16) ‖Ax‖_∞ = max_{i=1,...,n} |Σ_{j=1}^{m} A_{ij} x_j| ≤ (max_{i=1,...,n} Σ_{j=1}^{m} |A_{ij}|) ‖x‖_∞.
This implies ‖A‖_{∞→∞} ≤ max_{i=1,...,n} Σ_{j=1}^{m} |A_{ij}|. On the other hand, for given i = 1, . . . , n we choose x ∈ R^m with x_j = |A_{ij}|/A_{ij} if A_{ij} ≠ 0 and x_j = 0 if A_{ij} = 0. Then ‖x‖_∞ ≤ 1 and
(4.17) ‖A‖_{∞→∞} ≥ ‖Ax‖_∞ ≥ Σ_{j=1}^{m} |A_{ij}|.
Since i was arbitrary, we get ‖A‖_{∞→∞} ≥ max_{i=1,...,n} Σ_{j=1}^{m} |A_{ij}|. Altogether we proved
(4.18) ‖A‖_{∞→∞} = max_{i=1,...,n} Σ_{j=1}^{m} |A_{ij}|.
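Remark (numerical illustration). Formula (4.18) can be checked experimentally by comparing the maximal row sum with ‖Ax‖_∞ over many sign vectors x with ‖x‖_∞ = 1; the supremum is attained at the sign pattern of the maximizing row. A small sketch, added here (assumes numpy; names are ad hoc):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 6))

row_sum_norm = np.max(np.sum(np.abs(A), axis=1))      # claimed value of ||A||_{inf->inf}

best = 0.0
for _ in range(2000):
    x = rng.choice([-1.0, 1.0], size=6)               # ||x||_inf = 1
    best = max(best, np.max(np.abs(A @ x)))
print(row_sum_norm, best)   # agree (assuming the random search hits the optimal sign pattern)
```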

Exercise 4.11. Let A ∈ R^{n×m}. For x ∈ R^n we define ‖x‖₁ = Σ_{i=1}^{n} |x_i|.
(i) Determine the value of ‖A‖_{1→1} = sup_{‖x‖₁=1} ‖Ax‖₁ (that is, find a formula for ‖A‖_{1→1} involving only finitely many computations in terms of the entries of A).
(ii) Do the same for ‖A‖_{1→∞} = sup_{‖x‖₁=1} ‖Ax‖_∞ and ‖A‖_{∞→1} = sup_{‖x‖_∞=1} ‖Ax‖₁.
Exercise 4.12. Let A ∈ R^{n×n}. Define ‖x‖₂ = (Σ_{i=1}^{n} |x_i|²)^{1/2} (Euclidean norm) and ‖A‖_{2→2} = sup_{‖x‖₂=1} ‖Ax‖₂. Observe that AAᵀ is a symmetric n × n matrix and hence has only non-negative eigenvalues. Denote the largest eigenvalue of AAᵀ by ρ. Prove that ‖A‖_{2→2} = √ρ.
Hint: First consider the case that A is symmetric. Use that symmetric matrices are orthogonally diagonalizable.

Lecture 22 (Wednesday, October 23)

1. Equivalence of norms
Definition 4.13. Two norms k·ka and k·kb on a vector space X are called equivalent
if there exist constants c, C > 0 such that
(4.19) ckxka ≤ kxkb ≤ Ckxka
for all x ∈ X.
Exercise 4.14. Prove that equivalent norms generate the same topologies: if k · ka
and k · kb are equivalent then a set U ⊂ X is open with respect to k · ka if and only if
it is open with respect to k · kb .
Exercise 4.15. Show that equivalence of norms forms an equivalence relation on
the space of norms. That is, if we write n1 ∼ n2 to denote that two norms n1 , n2 are
equivalent, then prove that n1 ∼ n1 (reflexivity), n1 ∼ n2 ⇒ n2 ∼ n1 (symmetry) and
n1 ∼ n2 , n2 ∼ n3 ⇒ n1 ∼ n3 (transitivity).
Theorem 4.16. Let X be a finite-dimensional K-vector space. Then all norms on
X are equivalent.

PnProof. Let {b1 , . . . , bn } be a basis. Then for every x ∈ X we can write x =


i=1 xi bi with uniquely determined coefficients xi ∈ K. Then kxk∗ = maxi |xi | defines
a norm on X. Let k · k be any norm on X. Since equivalence of norms is an equivalence
relation, it suffices to show that k · k∗ and k · k are equivalent. We have
n
X n
X
(4.20) kxk ≤ |xi |kbi k ≤ ( max |xj |) kbi k = Ckxk∗ ,
j=1,...,n
i=1 i=1
Pn
where C = i=1 kbi k ∈ (0, ∞). Now define
(4.21) S = {x ∈ X : kxk∗ = 1}.
We claim that this is a compact set with respect to k · k∗ . Indeed, define the canonical
isomorphism φ : Kn → X, (x1 , . . . , xn ) 7→ ni=1 xi bi . This is a continuous map (where
P
we equip Kn with the Euclidean metric, say) and S = φ(K), where K = {x ∈ Kn :
maxi |xi | = 1} is compact by the Heine-Borel Theorem (see Corollary 2.19). Thus S is
compact by Theorem 2.11.
Next note that the function x 7→ kxk is continuous with respect to the k · k∗ norm. This
is because by the triangle inequality and (4.20),
(4.22) |kxk − kyk| ≤ kx − yk ≤ Ckx − yk∗ .
Thus by Theorem 2.12, x 7→ kxk attains its infimum on the compact set S and therefore
there exists c > 0 such that
(4.23) kyk ≥ c
x
for all y ∈ S. For x ∈ X, x 6= 0 we have kxk∗
∈ S and thus by homogeneity of norms,
x
using (4.23) with y = kxk∗
gives
(4.24) kxk ≥ ckxk∗ .
Thus we proved that k · k and k · k∗ are equivalent norms. 

In contrast, two given norms on an infinite-dimensional vector space are generally


not equivalent. For example, the supremum norm and the L2 -norm on C([0, 1]) are not
equivalent (as a consequence of Exercise 3.64).
Corollary 4.17. If X is finite-dimensional then every linear map T : X → Y is
bounded.
Proof. Let {x_1, . . . , x_n} ⊂ X be a basis. Then for x = Σ_{i=1}^{n} c_i x_i with c_i ∈ K,
(4.25) ‖Tx‖_Y ≤ Σ_{i=1}^{n} |c_i| ‖Tx_i‖_Y ≤ C max_{i=1,...,n} |c_i|,
where C = Σ_{i=1}^{n} ‖Tx_i‖_Y. By equivalence of norms we may assume that max_i |c_i| is the norm on X. □
This is not true if X is infinite-dimensional.
Example 4.18. Let X be the set of sequences of complex numbers (a_n)_{n∈N} such that sup_{n∈N} n|a_n| < ∞ and let Y be the space of bounded complex sequences. Then X ⊂ Y. Equip both spaces with the norm ‖a‖ = sup_{n∈N} |a_n|. The map T : X → Y, (Ta)_n = n a_n, is not bounded: let e^{(k)}_n = 1 if k = n and e^{(k)}_n = 0 if k ≠ n. Then e^{(k)} ∈ X, Te^{(k)} = k e^{(k)} and ‖e^{(k)}‖ = 1. So
(4.26) ‖Te^{(k)}‖ = k
for every k ∈ N and therefore sup_{‖x‖=1} ‖Tx‖ = ∞.
Exercise 4.19. Let X be the set of continuously differentiable functions on [0, 1]
and let Y = C([0, 1]). We consider X and Y as normed vector spaces with the norm
kf k = supx∈[0,1] |f (x)|. Define a linear map T : X → Y by T f = f 0 . Show that T is
not bounded.

Optional topic (not relevant for exams)

2. Dual spaces*
Theorem 4.20. Let X be a normed vector space and Y a Banach space. Then
L(X, Y ) is a Banach space (with the operator norm).
Proof. Let (Tn )n ⊂ L(X, Y ) be a Cauchy sequence. Then for every x ∈ X,
(Tn x)n ⊂ Y is Cauchy and by completeness of Y it therefore converges to some limit
which we call T x. This defines a linear operator T : X → Y . We claim that T is
bounded. Since (Tn )n is a Cauchy sequence, it is a bounded sequence. Thus there
exists M > 0 such that kTn kop ≤ M for all n ∈ N. We have for x ∈ X,
(4.27) kT xkY ≤ kT x − Tn xkY + kTn xkY ≤ kT x − Tn xkY + M kxkX .
Letting n → ∞ we get kT xkY ≤ M kxkX . So T is bounded with kT kop ≤ M . It
remains to show that Tn → T in L(X, Y ). That is, for all ε > 0 we need to find N ∈ N
such that
(4.28) kTn x − T xkY ≤ εkxkX
for all n ≥ N and x ∈ X. Since (Tn )n is a Cauchy sequence, there exists N ∈ N such
that
(4.29) ‖T_n x − T_m x‖_Y ≤ (ε/2)‖x‖_X
for all n, m ≥ N and x ∈ X. Fix x ∈ X. Then there exists m_x ≥ N such that
(4.30) ‖T_{m_x} x − Tx‖_Y ≤ (ε/2)‖x‖_X.
Then if n ≥ N and x ∈ X,
(4.31) kTn x − T xkY ≤ kTn x − Tmx xkY + kTmx x − T xkY ≤ εkxkX .

Definition 4.21. Let X be a normed vector space. Elements of L(X, K) are called
bounded linear functionals. L(X, K) is called the dual space of X and denoted X 0 .
Corollary 4.22. Dual spaces of normed vector spaces are Banach spaces.
Proof. This follows from Theorem 4.20 because K (which is R or C) is complete.

Theorem 4.23. If X is finite-dimensional, then X 0 is isomorphic to X.
Proof. Let {x1 , . . . , xn } ⊂ X be a basis. Then we can define a corresponding dual
basis of X 0 as follows: let fi ∈ X 0 , i ∈ {1, . . . , n} be the linear map given by fi (xi ) = 1
and f_i(x_j) = 0 for j ≠ i. Then we claim that {f_1, . . . , f_n} is a basis of X′. Indeed, let f ∈ X′. For x ∈ X we can write x = Σ_{i=1}^{n} c_i x_i with uniquely determined c_i ∈ K. Then by linearity,
(4.32) f(x) = Σ_{i=1}^{n} c_i f(x_i) = Σ_{i=1}^{n} f(x_i) f_i(x),
because f_i(x) = c_i. Thus, the linear span of {f_1, . . . , f_n} is X′. On the other hand, suppose
(4.33) Σ_{i=1}^{n} b_i f_i = 0

for some coefficients (b_i)_{i=1,...,n} ⊂ K. Then for every j ∈ {1, . . . , n}, b_j = Σ_{i=1}^{n} b_i f_i(x_j) = 0. Thus, {f_1, . . . , f_n} is linearly independent. Thus, X′ and X are isomorphic since they have the same dimension. We can define an isomorphism φ : X → X′ by x_i ↦ f_i for i = 1, . . . , n. □
Optional topic (not relevant for exams)

3. Sequential `p spaces*
Definition 4.24. Let 1 ≤ p < ∞. Then we define ℓ^p as the set of all sequences (x_n)_{n=1,2,...} ⊂ C such that Σ_{n=1}^{∞} |x_n|^p < ∞. The ℓ^p-norm is defined as
(4.34) ‖x‖_p = (Σ_{n=1}^{∞} |x_n|^p)^{1/p}.
If p ∈ [1, ∞] then the number p′ ∈ [1, ∞] such that 1/p + 1/p′ = 1 is called the Hölder dual exponent of p.
Our first goal is to show that ‖·‖_p really is a norm. To do that we need the following generalization of the Cauchy-Schwarz inequality.
Theorem 4.25 (Hölder's inequality). Let p ∈ [1, ∞] and x ∈ ℓ^p, y ∈ ℓ^{p′}. Then
(4.35) Σ_{n=1}^{∞} |x_n y_n| ≤ ‖x‖_p ‖y‖_{p′}.
Lemma 4.26 (Young's inequality). For a, b ≥ 0 and p ∈ (1, ∞) we have the elementary inequality
(4.36) ab ≤ a^p/p + b^{p′}/p′.
Proof. Recall that log is a concave function. Thus, for u, v ≥ 0 and t ∈ [0, 1],
(4.37) t log(u) + (1 − t) log(v) ≤ log(tu + (1 − t)v).
The left hand side equals log(u^t v^{1−t}). Now let u = a^p, v = b^{p′}, t = 1/p. Then the claim follows from applying the exponential function on both sides of the inequality. □
Proof of Hölder's inequality. If p ∈ {1, ∞}, the inequality is trivial. So we assume p ∈ (1, ∞). By Young's inequality,
(4.38) Σ_{n=1}^{∞} |x_n y_n| ≤ (1/p) Σ_{n=1}^{∞} |x_n|^p + (1/p′) Σ_{n=1}^{∞} |y_n|^{p′}.
Let λ > 0. Replacing x_n by λx_n and y_n by λ^{−1}y_n we obtain
(4.39) Σ_{n=1}^{∞} |x_n y_n| ≤ (λ^p/p) Σ_{n=1}^{∞} |x_n|^p + (λ^{−p′}/p′) Σ_{n=1}^{∞} |y_n|^{p′} = λ^p A + λ^{−p′} B,
where A = (1/p)‖x‖_p^p and B = (1/p′)‖y‖_{p′}^{p′}. Without loss of generality we may assume that A ≠ 0. We choose λ such that this inequality is strongest. This turns out to be when λ = (p′B/(pA))^{1/(p+p′)}. Plugging this into (4.39) implies the claim. □
Theorem 4.27 (Minkowski’s inequality). Let p ∈ [1, ∞]. For x, y ∈ `p ,
(4.40) kx + ykp ≤ kxkp + kykp .

Proof. If p ∈ {1, ∞} the inequality is trivial. Thus we assume p ∈ (1, ∞). If ‖x + y‖_p = 0, the inequality is also trivial, so we can assume ‖x + y‖_p > 0. Now we write
(4.41) ‖x + y‖_p^p ≤ Σ_{n=1}^{∞} |x_n| |x_n + y_n|^{p−1} + Σ_{n=1}^{∞} |y_n| |x_n + y_n|^{p−1}.
Using Hölder's inequality on both sums we obtain that this is
(4.42) ≤ ‖x‖_p ‖x + y‖_{p′(p−1)}^{p−1} + ‖y‖_p ‖x + y‖_{p′(p−1)}^{p−1}.
We have p′(p − 1) = (p/(p−1))(p − 1) = p, so we have proved that
(4.43) ‖x + y‖_p^p ≤ (‖x‖_p + ‖y‖_p) ‖x + y‖_p^{p−1}.
Dividing by ‖x + y‖_p^{p−1} gives the claim. □
We conclude that ‖·‖_p is a norm and ℓ^p a normed vector space.
Theorem 4.28. Let p ∈ (1, ∞). The dual space (ℓ^p)′ is isometrically isomorphic to ℓ^{p′}.
Proof. By e_k we denote the sequence which is 1 at position k and 0 everywhere else.
Then we define a map φ : (ℓ^p)′ → ℓ^{p′} by φ(v) = (v(e_k))_k. Clearly, this is a linear map. First we need to show that φ(v) ∈ ℓ^{p′}. Let v ∈ (ℓ^p)′. For each n we define x^{(n)} ∈ ℓ^p by
(4.44) x^{(n)}_k = |v(e_k)|^{p′}/v(e_k) if k ≤ n and v(e_k) ≠ 0, and x^{(n)}_k = 0 otherwise.
We have on the one hand
(4.45) v(x^{(n)}) = Σ_{k=1}^{n} |v(e_k)|^{p′}.
And on the other hand
(4.46) |v(x^{(n)})| ≤ ‖v‖_op ‖x^{(n)}‖_p = ‖v‖_op (Σ_{k=1}^{n} |v(e_k)|^{p′})^{1/p}.
Here we have used that p(p′ − 1) = p(p/(p−1) − 1) = p/(p−1) = p′. Combining these two we get
(4.47) (Σ_{k=1}^{n} |v(e_k)|^{p′})^{1/p′} ≤ ‖v‖_op.
Letting n → ∞ this implies that
(4.48) ‖φ(v)‖_{p′} = (Σ_{n=1}^{∞} |v(e_n)|^{p′})^{1/p′} ≤ ‖v‖_op,
so φ(v) ∈ ℓ^{p′}. The calculation also shows that φ is bounded. It is easy to check that φ is injective. We show that it is surjective: let x ∈ ℓ^{p′}. Then define v ∈ (ℓ^p)′ by v(y) = Σ_{n=1}^{∞} x_n y_n. By Hölder's inequality, v is well-defined. We have v(e_k) = x_k, so φ(v) = x. Thus φ is an isomorphism. It remains to show that φ is an isometry. We have already seen that
(4.49) ‖φ(v)‖_{p′} ≤ ‖v‖_op.

We leave it to the reader to verify the other inequality. 


Remark. It can be shown similarly that (`1 )0 = `∞ . However, the dual of `∞ is not `1 .
Corollary 4.29. `p is a Banach space for all p ∈ (1, ∞).
Remark. `1 and `∞ are also Banach spaces as we saw in Example 4.3 and Exercise 4.4.
Exercise 4.30. Show that ℓ^p ⊊ ℓ^q if 1 ≤ p < q ≤ ∞.

Lecture 23 (Friday, October 25)

4. Derivatives
Recall that a function f on an interval (a, b) is called differentiable at x ∈ (a, b) if lim_{h→0} (f(x + h) − f(x))/h exists. In other words, if there exists a number T ∈ R such that
(4.50) lim_{h→0} |f(x + h) − f(x) − Th| / |h| = 0.
In that case we denote that real number T by f′(x). A real number can be understood as a linear map R → R:
h→0 |h|
In that case we denote that real number T by f 0 (x). A real number can be understood
as a linear map R → R:
(4.51) R −→ L(R, R), T 7−→ (x 7→ T · x)
That is, the linear map associated with a real number T is given by multiplication with
T . Interpreting the derivative at a given point as a linear map, we can formulate the
definition in the general setting of normed vector spaces.
Definition 4.31. Let X, Y be normed vector spaces and U ⊂ X open. A map
F : U → Y is called Fréchet differentiable (we also say differentiable) at x ∈ U if there
exists T ∈ L(X, Y) such that
(4.52) lim_{h→0} ‖F(x + h) − F(x) − Th‖_Y / ‖h‖_X = 0.
In that case we call T the (Fréchet) derivative of F at x and write T = DF (x) or
T = DF |x . F is called (Fréchet) differentiable if it is differentiable at every point
x ∈ U . When X = Rn we also use the following terminology: F is totally differentiable
and DF (x) is the total derivative of F at x.
Before we move on we need to verify that DF(x) is well-defined. That is, that T is uniquely determined by F and x. Suppose T, T̃ ∈ L(X, Y) both satisfy (4.52). Then
(4.53) ‖Th − T̃h‖_Y ≤ ‖F(x + h) − F(x) − Th‖_Y + ‖F(x + h) − F(x) − T̃h‖_Y.
Thus, by (4.52),
(4.54) ‖Th − T̃h‖_Y / ‖h‖_X → 0 as h → 0.
In other words, for all ε > 0 there exists δ > 0 such that
(4.55) ‖Th − T̃h‖_Y ≤ ε‖h‖_X
if ‖h‖_X ≤ δ. By homogeneity of norms we argue that the inequality (4.55) must hold for all h ∈ X: let h ∈ X, h ≠ 0 be arbitrary. Then let h₀ = δ h/‖h‖_X. By homogeneity of norms we have ‖h₀‖_X = δ. Thus,
(4.56) ‖Th₀ − T̃h₀‖_Y ≤ ε‖h₀‖_X = εδ.
Multiplying both sides by δ^{−1}‖h‖_X and using homogeneity of norms and linearity of T and T̃, we obtain
(4.57) ‖Th − T̃h‖_Y ≤ ε‖h‖_X
for all h ∈ X (it is trivial for h = 0). Since ε > 0 was arbitrary (and is independent of h), this implies ‖Th − T̃h‖_Y = 0, so Th = T̃h for all h. Thus T = T̃.

Reminder: Big-O and little-o notation. Let f, g be maps between normed


vector spaces X, Y, Z: f : U → Y, g : U → Z, U ⊂ X open neighborhood of 0.
• Big-O: We write
(4.58) f (h) = O(g(h)) as h → 0
to mean
(4.59) limsup_{h→0} ‖f(h)‖ / ‖g(h)‖ < ∞.
This is equivalent to saying that there exists a C > 0 and δ > 0 such that
(4.60) ‖f(h)‖ ≤ C‖g(h)‖
for all h with 0 < ‖h‖ < δ.
• Little-o: We write
(4.61) f(h) = o(g(h)) as h → 0
to mean
(4.62) lim_{h→0} ‖f(h)‖ / ‖g(h)‖ = 0.

Comments.
• O and o are not functions and (4.58), (4.61) are not equations!
• This is an abuse of the inequality sign: it would be more accurate to define
O(g) as the class of functions that satisfy (4.60), say to write f ∈ O(g).
• One can think of (say) O(g) as a placeholder for a function which may change
at every occurrence of the symbol O(g) but always satisfies the respective
condition that it is dominated by a constant times kg(h)k if khk is small.
• For brevity, we may sometimes not write out the phrase ”as h → 0”.
• There is nothing special about letting h tend to 0 in this definition. We can also
define o(g), O(g) with respect to another limit, for instance, say, as khk → ∞.
• If f (h) = o(g(h)), then f (h) = O(g(h)), but generally not vice versa.
• If f (h) = O(khkk ), then f (h) = o(khkk−ε ) for every ε > 0.

• f (h) = o(1) is equivalent to saying that f (h) → 0 as h → 0.


We can use little-o notation to restate the definition of derivatives in an equivalent
way: F is Fréchet differentiable at x if and only if there exists T ∈ L(X, Y ) such that
(4.63) F (x + h) = F (x) + T h + o(khk) (as h → 0).
The derivative map T = DF |x provides a linear approximation to F (x + h) when
khk is small. Thus, in the same way as in the one-dimensional setting, the derivative
is a way to describe how the values of F change around a fixed point x.

Lecture 24 (Monday, October 28)

Example 4.32. Let F : R2 → R be given by F (x1 , x2 ) = x1 cos(x2 ). We claim


that F is totally differentiable at every x = (x1 , x2 ) ∈ R2 . Indeed, let x ∈ R2 and
h = (h1 , h2 ) ∈ R2 \{0}. Then
(4.64) F (x + h) = (x1 + h1 ) cos(x2 + h2 ) = x1 cos(x2 + h2 ) + h1 cos(x2 + h2 )
From Taylor’s theorem we have that
(4.65) cos(t + ε) = cos(t) − sin(t)ε + O(ε2 ) as ε → 0
Thus,
(4.66)
F (x + h) = x1 cos(x2 ) − x1 sin(x2 )h2 + O(khk2 ) + h1 cos(x2 ) − h1 sin(x2 )h2 + O(khk2 )

(4.67) F (x + h) − F (x) = h1 cos(x2 ) − x1 sin(x2 )h2 + O(khk2 )


This implies
(4.68) F (x + h) = F (x) + T h + o(khk),
where we have set T h = h1 cos(x2 ) − x1 sin(x2 )h2 (this is a linear map R2 → R). So we
have proven that F is differentiable at x and
(4.69) DF |x h = h1 cos(x2 ) − x1 sin(x2 )h2 .
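Remark (numerical illustration). One can sanity-check (4.69) numerically: by (4.68) the error |F(x+h) − F(x) − DF|_x h| should be O(‖h‖²), hence o(‖h‖). A minimal sketch, added here for illustration (assumes numpy; names are ad hoc):

```python
import numpy as np

def F(x):
    return x[0] * np.cos(x[1])

def DF(x, h):                       # the linear map from (4.69)
    return h[0] * np.cos(x[1]) - x[0] * np.sin(x[1]) * h[1]

x = np.array([1.3, 0.7])
direction = np.array([0.6, -0.8])
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * direction
    err = abs(F(x + h) - F(x) - DF(x, h))
    # err / ||h||^2 stays bounded, i.e. err = O(||h||^2) = o(||h||)
    print(t, err, err / np.linalg.norm(h) ** 2)
```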
Example 4.33. Let F : C([0, 1]) → C([0, 1]) be given by F(f)(x) = ∫₀ˣ f(t)² dt. Then F is Fréchet differentiable at every f ∈ C([0, 1]). Indeed, we compute
(4.70) F(f + h)(x) − F(f)(x) = ∫₀ˣ (f(t) + h(t))² dt − ∫₀ˣ f(t)² dt = 2∫₀ˣ f(t)h(t) dt + ∫₀ˣ h(t)² dt.
Set T(h)(x) = 2∫₀ˣ f(t)h(t) dt. This is a bounded linear map:
(4.71) ‖T(h)‖_∞ ≤ 2∫₀¹ |f(t)h(t)| dt ≤ C‖h‖_∞,
where C = 2∫₀¹ |f(t)| dt. We have
(4.72) F(f + h)(x) − F(f)(x) − T(h)(x) = ∫₀ˣ h(t)² dt.
Thus
(4.73) ‖F(f + h) − F(f) − Th‖_∞ ≤ sup_{x∈[0,1]} ∫₀ˣ h(t)² dt
(4.74) ≤ ∫₀¹ |h(t)|² dt ≤ sup_{x∈[0,1]} |h(x)|² = ‖h‖_∞².
This implies
(4.75) (1/‖h‖_∞) ‖F(f + h) − F(f) − Th‖_∞ ≤ ‖h‖_∞ → 0
as h → 0. Thus F is Fréchet differentiable at f and DF|_f(h)(x) = 2∫₀ˣ f(t)h(t) dt.

We go on to discuss some of the familiar properties of derivatives. It follows directly


from the definition that DF |x is linear in F . That is, if F : U → Y, G : U → Y are
differentiable at x ∈ U and λ ∈ R, then the function F + λG : U → Y defined by
(F + λG)(x) = F (x) + λG(x) is differentiable at x and D(F + λG)|x = DF |x + λDG|x .
Theorem 4.34 (Chain rule). Let X1 , X2 , X3 be normed vector spaces and U1 ⊂
X1 , U2 ⊂ X2 open. Let x ∈ U1 and g : U1 → X2 , f : U2 → X3 such that g is Fréchet
differentiable at x, g(U1 ) ⊂ U2 and f is Fréchet differentiable at g(x). Then the function
f ◦ g : U1 → X3 defined by (f ◦ g)(x) = f (g(x)) is Fréchet differentiable at x and
(4.76) D(f ◦ g)|x h = Df |g(x) Dg|x h
for all h ∈ X1 .
Proof. Let x, x + h ∈ U1 . We write
(4.77) f (g(x + h)) − f (g(x)) − Df |g(x) Dg|x h
(4.78) = f (g(x) + k) − f (g(x)) − Df |g(x) k + Df |g(x) (g(x + h) − g(x) − Dg|x h),
where k = g(x + h) − g(x). Using the triangle inequality and that Df |g(x) is a bounded
linear map we obtain
(4.79) kf (g(x + h)) − f (g(x)) − Df |g(x) Dg|x hkX3
(4.80) ≤ kf (g(x)+k)−f (g(x))−Df |g(x) kkX3 +kDf |g(x) kop kg(x+h)−g(x)−Dg|x hkX2
We have
(4.81) kkkX2 = kg(x + h) − g(x)kX2 ≤ kDg|x kop khkX1 + o(khkX1 ).
Dividing by ‖h‖_{X₁} on both sides, (4.80) implies
(4.82) (1/‖h‖_{X₁}) ‖f(g(x+h)) − f(g(x)) − Df|_{g(x)} Dg|_x h‖_{X₃} ≤ (‖k‖_{X₂}/‖h‖_{X₁}) · ‖f(g(x)+k) − f(g(x)) − Df|_{g(x)} k‖_{X₃}/‖k‖_{X₂} + o(1)
(as h → 0). By (4.81),
(4.83) ‖k‖_{X₂}/‖h‖_{X₁} ≤ ‖Dg|_x‖_op + 1
if ‖h‖_{X₁} is small enough. In particular, k → 0 as h → 0. Since f is differentiable at g(x) we have that
(4.84) ‖f(g(x)+k) − f(g(x)) − Df|_{g(x)} k‖_{X₃} / ‖k‖_{X₂}
converges to 0 as h → 0 (since then k → 0). □
Theorem 4.35 (Product rule). Let X be a normed vector space, U ⊂ X open and
assume that F, G : U → R are differentiable at x ∈ U . Then the function F ·G : U → R,
(F · G)(x) = F (x)G(x) is also differentiable at x and
(4.85) D(F · G) |x = F (x) · DG |x + G(x) · DF |x .
Exercise 4.36. Prove this.

Lecture 25 (Friday, November 1)

Definition 4.37. Let X, Y be normed vector spaces, U ⊂ X open, F : U → Y .


Let v ∈ X with v ≠ 0. If the limit
(4.86) lim_{h→0} (F(x + hv) − F(x))/h ∈ Y   (h ∈ K \ {0})
exists, then it is called the directional derivative (or Gâteaux derivative) of F at x in
direction v and denoted Dv F |x .
Theorem 4.38. Let X, Y be normed vector spaces, U ⊂ X open and F : U → Y
Fréchet differentiable at x ∈ U . Then for every v ∈ X, v 6= 0, the directional derivative
Dv F |x exists and
(4.87) Dv F |x = DF |x v.
Proof. By definition
(4.88) F (x + hv) − F (x) − DF |x (hv) = o(h) as h → 0.
Therefore,
(4.89) (F(x + hv) − F(x))/h = DF|_x v + o(1) as h → 0.
In other words, the limit as h → 0 exists and equals DF |x v. 
Example 4.39. Consider F : R2 → R, F (x) = x21 + x22 (where x = (x1 , x2 ) ∈ R2 ).
Let e1 = (1, 0), e2 = (0, 1). Then the directional derivatives De1 F |x and De2 F |x exist
at every point x ∈ R2 and
(4.90) De1 F |x = 2x1 , De2 F |x = 2x2 .
Also, DF |x exists at every x and we can compute it using De1 F |x and De2 F |x : let
v ∈ R2 and write v = v1 e1 + v2 e2 where v1 , v2 ∈ R. Then
(4.91) DF |x v = v1 DF |x e1 + v2 DF |x e2
By Theorem 4.38 this equals
(4.92) v1 De1 F |x + v2 De2 F |x = 2x1 v1 + 2x2 v2 .
Remark. The converse of Theorem 4.38 is not true!
Example 4.40. Let F : R² → R be defined by F(x) = x₁³/(x₁² + x₂²) if x ≠ 0 and F(0) = 0. Then all directional derivatives D_v F|₀ for v ≠ 0 exist: for v = (v₁, v₂),
(4.93) F(hv) − F(0) = h · v₁³/(v₁² + v₂²),
so D_v F|₀ = v₁³/(v₁² + v₂²). But F is not totally differentiable at 0, otherwise we would have by linearity of the total derivative,
(4.94) D_v F|₀ = DF|₀ v = v₁ D_{e₁}F|₀ + v₂ D_{e₂}F|₀ = v₁,
which is false.

5. Further exercises
Exercise 4.41. Let x ∈ R^n. Define ‖x‖_p = (Σ_{i=1}^{n} |x_i|^p)^{1/p} for 0 < p < ∞ and ‖x‖_∞ = max_{i=1,...,n} |x_i|.
(i) Show that lim_{p→∞} ‖x‖_p = ‖x‖_∞.
(ii) Show that lim_{p→0} ‖x‖_p exists and determine its value (we also allow ∞ as a limit).
(ii) Show that limp→0 kxkp exists and determine its value (we also allow ∞ as a limit).
t
Exercise 4.42. Let C(R) be the set of continuous functions on R. Let w(t) = 1+t
for t ≥ 0. Define

X
(4.95) d(f, g) = 2−k w( sup |f (x) − g(x)|).
k=0 x∈[−k,k]

(i) Show that d is a well-defined metric.


(ii) Show that C(R) is complete with this metric.
(iii) Show that there exists no norm k · k on C(R) such that d(f, g) = kf − gk.
Exercise 4.43. Consider the space ℓ¹ of absolutely summable sequences of complex numbers. Let p, q ∈ [1, ∞] with p ≠ q. Then ‖·‖_p and ‖·‖_q are norms on ℓ¹ (recall that ‖a‖_p = (Σ_{n=1}^{∞} |a_n|^p)^{1/p} for p ∈ [1, ∞) and ‖a‖_∞ = sup_{n∈N} |a_n|). Show that ‖·‖_p and ‖·‖_q are not equivalent.
Exercise 4.44. Let X be the space of continuous functions on [0, 1] equipped with the norm ‖f‖ = ∫₀¹ |f(t)| dt. Define a linear map T : X → X by
(4.96) Tf(x) = ∫₀ˣ f(t) dt.
Show that T is well-defined and bounded and determine the value of ‖T‖_op.
Exercise 4.45. Let X, Y be normed vector spaces and F : X → Y a map.
(i) Show that F is continuous if it is Fréchet differentiable.
(ii) Prove that F is Fréchet differentiable if it is linear and bounded.
Exercise 4.46. Let X = C([0, 1]) be the Banach space of continuous functions on
[0, 1] (with the supremum norm) and define a map F : X → X by
(4.97) F(f)(s) = ∫₀ˢ cos(f(t)²) dt,   s ∈ [0, 1].
(i) Show that F is Fréchet differentiable and compute the Fréchet derivative DF |f for
each f ∈ X.
(ii) Show that F X = {F (f ) : f ∈ X} ⊂ X is relatively compact.

Exercise 4.47. Let Rn×n denote the space of real n × n matrices equipped with
the matrix norm kAk = supkxk=1 kAxk. Define
(4.98) F : Rn×n −→ Rn×n , A 7−→ A2 .
Show that F is totally differentiable and compute DF |A .
CHAPTER 5

Differential calculus in Rn

Lecture 26 (Monday, November 4)

In this section we study the differential calculus of maps f : U → Rm , U ⊂ Rn open.


In this setting we refer to the Fréchet derivative as total derivative. Whenever we
speak of functions in this section, we mean real-valued functions.
Definition 5.1. By ek we denote the kth unit vector in Rn . Then the directional
derivative in the direction ek is called kth partial derivative and denoted by ∂k f (x) or
∂xk f (x) (if it exists).
If f is totally differentiable at a point x = (x1 , . . . , xn ) ∈ U , then we can compute
its total derivative in terms of the partial derivatives by using (4.87):
(5.1) Df|_x h = Σ_{j=1}^{n} h_j Df|_x e_j = Σ_{j=1}^{n} h_j ∂_j f(x) ∈ R^m.

By definition, Df |x is a linear map Rn → Rm . It is therefore given by multiplication


with a real m × n matrix. We will denote this matrix also by Df |x and call it the
Jacobian matrix of f at x. From (4.87) we conclude that the jth column vector of this
matrix is given by ∂_j f(x) ∈ R^m. Therefore the Jacobian matrix is given by
(5.2) Df|_x = (∂_j f_i(x))_{i,j} = [ ∂₁f₁(x) ⋯ ∂_n f₁(x) ; ⋮ ⋱ ⋮ ; ∂₁f_m(x) ⋯ ∂_n f_m(x) ] ∈ R^{m×n},
where f(x) = (f₁(x), . . . , f_m(x)) ∈ R^m. If m = 1, then the gradient of f at x is defined as¹
(5.3) ∇f(x) = Df|_x^T = (∂₁f(x), . . . , ∂_n f(x))^T ∈ R^n.
(Note that n × 1 matrices are identified with vectors in Rn : Rn×1 = Rn .)
Example 5.2. Let F : R³ → R² be defined by F(x) = (x₁x₂ sin(x₃), x₂² − e^{x₁}). Then F is totally differentiable and the Jacobian is given by
(5.4) DF|_x = [ x₂ sin(x₃)  x₁ sin(x₃)  x₁x₂ cos(x₃) ; −e^{x₁}  2x₂  0 ].
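Remark (numerical illustration). The entries of (5.4) can be checked against central difference quotients for the partial derivatives. A small sketch, added here (assumes numpy; names are ad hoc):

```python
import numpy as np

def F(x):
    x1, x2, x3 = x
    return np.array([x1 * x2 * np.sin(x3), x2**2 - np.exp(x1)])

def jacobian_formula(x):                       # the matrix from (5.4)
    x1, x2, x3 = x
    return np.array([[x2 * np.sin(x3), x1 * np.sin(x3), x1 * x2 * np.cos(x3)],
                     [-np.exp(x1),     2 * x2,          0.0]])

def jacobian_numeric(x, eps=1e-6):
    J = np.zeros((2, 3))
    for j in range(3):
        e = np.zeros(3); e[j] = eps
        J[:, j] = (F(x + e) - F(x - e)) / (2 * eps)   # central difference, column j
    return J

x = np.array([0.4, -1.1, 2.0])
print(np.max(np.abs(jacobian_formula(x) - jacobian_numeric(x))))   # tiny, the two agree
```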
Recall that a set A ⊂ Rn is called convex if tx + (1 − t)y ∈ A for every x, y ∈ A,
t ∈ [0, 1].

1Here M T ∈ Rn×m denotes the transpose of the matrix M ∈ Rm×n .



Theorem 5.3 (Mean value theorem). Let U ⊂ Rn be open and convex. Suppose
that f : U → R is totally differentiable on U . Then, for every x, y ∈ U , there exists
ξ ∈ U such that
(5.5) f (x) − f (y) = Df |ξ (x − y)
and there exists t ∈ [0, 1] such that ξ = tx + (1 − t)y.
The idea of the proof is to apply the one-dimensional mean value theorem to the
function restricted to the line passing through x and y.
Proof. If x = y there is nothing to show. Let x 6= y. Define g : [0, 1] → R by
g(t) = f (tx + (1 − t)y). The function g is continuous on [0, 1] and differentiable on
(0, 1). By the one-dimensional mean value theorem there exists t0 ∈ (0, 1) such that g(1) − g(0) = g′(t0). By the chain rule,
(5.6) $g'(t_0) = Df|_{t_0 x + (1-t_0)y}(x - y).$
Since g(1) − g(0) = f(x) − f(y), the claim follows with ξ = t0 x + (1 − t0)y.

Corollary 5.4. Under the assumptions of the previous theorem: if Df |x = 0 for
all x ∈ U , then f is constant.
Exercise 5.5. Show that the conclusion of the corollary also holds under the weaker
assumption that U is open and connected (rather than convex). Hint: Consider over-
lapping open balls along a continuous path connecting two given points in U .
Definition 5.6. A map f : U → Rm , U ⊂ Rn open, is called continuously dif-
ferentiable (on U ) if it is totally differentiable on U and the map U → L(Rn , Rm ),
x 7→ Df |x is continuous. We denote the collection of continuously differentiable maps
by C 1 (U, Rm ). If m = 1 we also write C 1 (U, R) = C 1 (U ).
Remark. For f : U → R, continuity of the map U → Rn , x 7→ ∇f (x) is equivalent to
continuity of the map U → L(Rn , R), x 7→ Df |x .
Theorem 5.7. Let U ⊂ Rn be open. Let f : U → R. Then f ∈ C 1 (U ) if and
only if ∂j f (x) exists for every j ∈ {1, . . . , n} and x 7→ ∂j f (x) is continuous on U for
j ∈ {1, . . . , n}.
Remark. Without additional assumptions (such as continuity of x 7→ ∂j f (x)), existence
of partial derivatives does not imply total differentiability.
Exercise 5.8. Let F : R² → R be defined by $F(x) = \frac{x_1 x_2}{x_1^2 + x_2^2}$ if x ≠ 0 and F(0) = 0.
(i) Show that the partial derivatives ∂1 F (x), ∂2 F (x) exist for every x ∈ R2 .
(ii) Show that F is not continuous at (0, 0).
(iii) Determine at which points F is totally differentiable.
Proof. Let f ∈ C 1 (U ). Then ∂j f (x) exists by Theorem 4.38 and x 7→ ∂j f (x)
is continuous because it can be written as the composition of the continuous maps
x 7→ ∇f (x) and πj : Rn → R, x 7→ xj : ∂j f (x) = (πj ◦ ∇f )(x).
Conversely, assume that ∂j f (x) exists for every x ∈ U , j ∈ {1, . . . , n} and x 7→ ∂j f (x) is
continuous. Let x ∈ U. Write $h = \sum_{j=1}^{n} h_j e_j$ and define $v_k = \sum_{j=1}^{k} h_j e_j$ for 1 ≤ k ≤ n and v₀ = 0. If ‖h‖ is small enough so that x + h ∈ U, then


(5.7) $f(x+h) - f(x) = (f(x+v_n) - f(x+v_{n-1})) + (f(x+v_{n-1}) - f(x+v_{n-2})) + \cdots + (f(x+v_1) - f(x+v_0))$
(5.8) $= \sum_{j=1}^{n} (f(x+v_j) - f(x+v_{j-1})).$

By the one-dimensional mean value theorem there exists tj ∈ [0, 1] such that
(5.9) f (x+vj )−f (x+vj−1 ) = f (x+vj−1 +hj ej )−f (x+vj−1 ) = ∂j f (x+vj−1 +tj hj ej )hj .
By continuity of ∂j f , for every ε > 0 there exists δ > 0 such that
(5.10) |∂j f (y) − ∂j f (x)| ≤ ε/n for all j = 1, . . . , n,
whenever y ∈ U is such that kx − yk ≤ δ. We may choose δ small enough so that
x + h ∈ U whenever khk ≤ δ. Then, if khk ≤ δ (then also kvj k ≤ δ, kvj−1 + tj hj ej k ≤ δ)
we get
(5.11) $\Big| f(x+h) - f(x) - \sum_{j=1}^{n} h_j \partial_j f(x) \Big| \le \sum_{j=1}^{n} \big| f(x+v_j) - f(x+v_{j-1}) - h_j \partial_j f(x) \big|$
(5.12) $= \sum_{j=1}^{n} |h_j|\, |\partial_j f(x+v_{j-1}+t_j h_j e_j) - \partial_j f(x)| \le \sum_{j=1}^{n} |h_j|\, \frac{\varepsilon}{n} \le \varepsilon \|h\|.$

Therefore, f(x + h) − f(x) − Df|_x h = o(‖h‖), where
(5.13) $Df|_x h = \sum_{j=1}^{n} h_j\, \partial_j f(x),$

so f is differentiable at x. Also, x 7→ ∇f (x) is continuous, because the ∂j f are contin-


uous. 
To conclude this introductory section, we discuss some variants of the mean value
theorem that will be useful later.
Theorem 5.9 (Mean value theorem, integral version). Let U ⊂ Rn be open and
convex and f ∈ C 1 (U ). Then for every x, y ∈ U ,
(5.14) $f(x) - f(y) = \int_0^1 Df|_{tx+(1-t)y}(x-y)\,dt.$

Proof. Let g(t) = f (tx + (1 − t)y). By the fundamental theorem of calculus and
the chain rule,
(5.15) $f(x) - f(y) = g(1) - g(0) = \int_0^1 g'(s)\,ds = \int_0^1 Df|_{sx+(1-s)y}(x-y)\,ds.$

Theorem 5.10 (Mean value theorem, vector-valued case). Let U ⊂ Rn be open and
convex and F ∈ C 1 (U, Rm ). Then for every x, y ∈ U there exists θ ∈ [0, 1] such that
(5.16) kF (x) − F (y)k ≤ kDF |ξ kop kx − yk,
where ξ = θx + (1 − θ)y.
Proof. Write F = (F1 , . . . , Fm ). Then by Theorem 5.9
(5.17) $F_i(x) - F_i(y) = \int_0^1 DF_i|_{tx+(1-t)y}(x-y)\,dt.$

This implies
(5.18) $F(x) - F(y) = \int_0^1 DF|_{tx+(1-t)y}(x-y)\,dt.$
By the triangle inequality, we have
(5.19) $\|F(x) - F(y)\| \le \Big( \int_0^1 \|DF|_{tx+(1-t)y}\|_{op}\,dt \Big)\, \|x-y\|.$
The map [0, 1] → R, t 7→ kDF |tx+(1−t)y kop is continuous (because F is C 1 ) and therefore
assumes its supremum at some point θ ∈ [0, 1]. Define ξ = θx + (1 − θ)y. Then
(5.20) kF (x) − F (y)k ≤ kDF |ξ kop kx − yk.

Remark. If m ≥ 2 and F : U → Rm is C 1 and x, y ∈ U , then it is not necessarily true
that there exists ξ ∈ U such that
(5.21) F (x) − F (y) = DF |ξ (x − y).

Exercise 5.11. Find a C 1 map F : R → R2 and points x, y ∈ R such that there


does not exist ξ ∈ R such that F (x) − F (y) = DF |ξ (x − y).

Lecture 27 (Wednesday, November 6)

1. The contraction principle


The contraction principle is a powerful tool in analysis. It is most naturally described
in the setting of metric spaces.
Definition 5.12. Let (X, d) be a metric space. A map ϕ : X → X is called a
contraction (of X) if there exists a constant c ∈ (0, 1) such that
(5.22) d(ϕ(x), ϕ(y)) ≤ c · d(x, y)
holds for all x, y ∈ X.
Remark. Contractions are continuous.
Example 5.13. Let U ⊂ Rn be open and convex and F : U → U a differentiable
map. If there exists c ∈ (0, 1) such that kDF |x kop ≤ c for all x ∈ U , then F is
a contraction of U . Indeed, by the mean value theorem (Theorem 5.10), for every
x, y ∈ U there exists ξ ∈ U such that
(5.23) kF (x) − F (y)k ≤ kDF |ξ kop kx − yk ≤ ckx − yk.
Theorem 5.14 (Banach fixed point theorem). Let X be a complete metric space
and ϕ : X → X a contraction. Then there exists a unique x∗ ∈ X such that ϕ(x∗ ) = x∗ .
Remark. A point x ∈ X such that ϕ(x) = x is called a fixed point of ϕ.
Proof. Uniqueness: Suppose x0 , x1 ∈ X are fixed points of ϕ. Then
(5.24) 0 ≤ d(x0 , x1 ) = d(ϕ(x0 ), ϕ(x1 )) ≤ c · d(x0 , x1 ),
which implies d(x0 , x1 ) = 0, since c ∈ (0, 1). Thus x0 = x1 .
Existence: Pick x0 ∈ X arbitrarily and define a sequence (xn )n≥0 recursively by
(5.25) xn+1 = ϕ(xn ).
We claim that (xn )n is a Cauchy sequence. Indeed, by induction we see that
(5.26) d(xn+1 , xn ) ≤ cd(xn , xn−1 ) ≤ c2 d(xn−1 , xn−2 ) ≤ · · · ≤ cn d(x1 , x0 ).
Thus, for n < m we can use the triangle inequality to obtain
(5.27) d(xm , xn ) ≤ d(xm , xm−1 ) + d(xm−1 , xm−2 ) + · · · + d(xn+1 , xn )

(5.28) $= \sum_{i=n}^{m-1} d(x_{i+1}, x_i) \le \sum_{i=n}^{m-1} c^i d(x_1, x_0) \le d(x_1, x_0) \sum_{i=n}^{\infty} c^i = c^n\, \frac{d(x_1, x_0)}{1-c}.$

Thus, d(xm , xn ) converges to 0 as m > n → ∞. This shows that (xn )n is a Cauchy


sequence. By completeness of X, it must converge to a limit which we call x∗ ∈ X. By
continuity of ϕ,
(5.29) ϕ(x∗ ) = ϕ( lim xn ) = lim ϕ(xn ) = lim xn+1 = x∗ .
n→∞ n→∞ n→∞



Remarks. 1. The theorem is false if we drop the assumption that X is complete: the
map f : (0, 1) → (0, 1) defined by f (x) = x/2 is a contraction, but has no fixed point.
2. The proof not only demonstrates the existence of the fixed point x∗ , but also gives
an algorithm to compute it via successive applications of the map ϕ. We can say
something about how quickly the algorithm converges: the sequence (xn )n defined in
the proof satisfies the inequality
(5.30) $d(x_n, x^*) \le \frac{c^n}{1-c}\, d(x_0, x_1),$
so speed of convergence depends only on the parameter c ∈ (0, 1) and the quality of
the initial guess x0 ∈ X.
3. The contraction principle can be used to solve equations. For example, say we want
to solve F (x) = 0 (F is some function). Then we can set G(x) = F (x) + x. Then
F (x) = 0 if and only if x is a fixed point of G.

Example 5.15. Let A ∈ Rn×n be an invertible n × n matrix and b ∈ Rn . Say we


want to solve the linear system
(5.31) Ax = b
for x. Of course, x = A−1 b. However, A−1 is expensive to compute if n is large, so
other methods are desirable for solving linear equations. Let
(5.32) F (x) = λ(Ax − b) + x
for some constant λ 6= 0 that we may choose freely. Then Ax = b if and only if x is a
fixed point of F . Moreover,
(5.33) kF (x) − F (y)k = kλA(x − y) + x − yk = k(λA + I)(x − y)k ≤ kλA + Ikop kx − yk.
Suppose that λ happens to be such that kλA + Ikop < 1. Then F : Rn → Rn is a
contraction, so we can compute the solution to the equation by the iteration xn+1 =
F (xn ).
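A minimal Python sketch of this iteration (the particular matrix A, right-hand side b and the choice of λ below are ad hoc choices for the illustration; for this symmetric positive definite A, taking λ = −1/‖A‖_op indeed gives ‖λA + I‖_op < 1):

import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
lam = -1.0 / np.linalg.norm(A, 2)        # then ||lam*A + I||_op < 1 for this A

x = np.zeros(2)
for n in range(200):
    x = lam * (A @ x - b) + x            # x_{n+1} = F(x_n)

print(x, np.linalg.solve(A, b))          # the two results agree

The point of such iterative methods is that each step only requires a matrix-vector multiplication, which is much cheaper than computing A⁻¹ when n is large.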
2. Inverse function theorem and implicit function theorem
In this section we will see how the contraction principle can be applied to find (local)
inverses of maps between open sets in Rn , in other words to solve equations of the form
f (x) = y.
Definition 5.16. Let E ⊂ Rn be open. We say that a map f : E → Rn is locally
invertible at a ∈ E if there exist open sets U, V ⊂ Rn such that U ⊂ E, a ∈ U , f (a) ∈ V
and a function g : V → U such that g(f (x)) = x for all x ∈ U and f (g(y)) = y for all
y ∈ V . In that case we call g a local inverse of f (at a) and denote it by f |−1
U (this is
consistent with usual notation of inverse functions, because the restriction f |U of f to
U is an invertible map U → V ).
Theorem 5.17 (Inverse function theorem). Let E ⊂ Rn be open, f ∈ C 1 (E, Rn )
and a ∈ E. Assume that Df |a is invertible. Then f is locally invertible at a in some
open neighborhood U ⊂ E of a and $f|_U^{-1} \in C^1(f(U), U)$ with
(5.34) $D(f|_U^{-1})|_{f(a)} = (Df|_a)^{-1}.$

Lecture 28 (Friday, November 8)

Proof. We want to apply the contraction principle. For fixed y ∈ Rn , consider the
map
(5.35) $\varphi_y(x) = x + Df|_a^{-1}(y - f(x)) \qquad (x \in E).$

Then f (x) = y if and only if x is a fixed point of ϕy . Calculate


(5.36) $D\varphi_y|_x = I - Df|_a^{-1} Df|_x = Df|_a^{-1}(Df|_a - Df|_x).$

Let λ = kDf |−1a kop . By continuity of Df at a, there exists an open ball U ⊂ E such
that
(5.37) $\|Df|_a - Df|_x\|_{op} \le \frac{1}{2\lambda} \qquad \text{for } x \in U.$
Then for x, x0 ∈ U
(5.38) $\|\varphi_y(x) - \varphi_y(x')\| \le \|D\varphi_y|_\xi\|_{op}\, \|x - x'\|$
(5.39) $\le \|Df|_a^{-1}\|_{op}\, \|Df|_a - Df|_\xi\|_{op}\, \|x - x'\| \le \tfrac{1}{2}\|x - x'\|.$
Note that this doesn’t show that ϕy is a contraction, because ϕy (U ) may not be con-
tained in U . However, it does show that ϕy has at most one fixed point (by the same
argument used to show uniqueness in the Banach fixed point theorem). This already
implies that f is injective on U : for every y ∈ Rn we have f (x) = y for at most one
x ∈ U . Let V = f (U ). Then f |U : U → V is a bijection and has an inverse g : V → U .
Claim. V is open.
Proof of claim. Let y0 ∈ V . We need to show that there exists an open ball
around y0 that is contained in V . Since V = f (U ) there exists x0 ∈ U such that
f (x0 ) = y0 . Let r > 0 be small enough so that Br (x0 ) ⊂ U (possible because U is
open). Let ε > 0 and y ∈ Bε (y0 ). We will demonstrate that if ε > 0 is small enough,
then ϕy maps Br (x0 ) into itself. First note
(5.40) $\|\varphi_y(x_0) - x_0\| = \|Df|_a^{-1}(y - y_0)\| \le \lambda\varepsilon.$
Hence, choosing ε ≤ r/(2λ), we get for x ∈ B_r(x_0) that
(5.41) kϕy (x) − x0 k ≤ kϕy (x) − ϕy (x0 )k + kϕy (x0 ) − x0 k

(5.42) $\le \tfrac{1}{2}\|x - x_0\| + \tfrac{r}{2} \le \tfrac{r}{2} + \tfrac{r}{2} = r.$
Thus ϕy (x) ∈ Br (x0 ). This proves ϕy (Br (x0 )) ⊂ Br (x0 ), so ϕy is a contraction of Br (x0 ).
By the Banach fixed point theorem, ϕy must have a unique fixed point x ∈ Br (x0 ). So
by definition of ϕy we have f (x) = y, so y ∈ f (U ) = V . Therefore we have shown that
Bε (y0 ) ⊂ V , so V is open. 
It remains to show that g ∈ C¹(V, U) and $Dg|_{f(a)} = Df|_a^{-1}$. We use the following
lemma.
Lemma 5.18. Let A, B ∈ Rn×n such that A is invertible and
(5.43) kB − Ak · kA−1 k < 1.
Then B is invertible. (Here k · k denotes the matrix norm, which is just the operator
norm: kAk = supkxk=1 kAxk.)

In other words, if a matrix A is invertible and B is a “small” perturbation of A


(“small” in the sense that (5.43) holds), then B is also invertible.
Proof. It suffices to show that B is injective. Let x 6= 0. Then we need to show
Bx 6= 0. Indeed,
(5.44) kxk = kA−1 Axk ≤ kA−1 k · kAxk ≤ kA−1 k(k(A − B)xk + kBxk)
(5.45) ≤ kA−1 k · kB − Ak · kxk + kA−1 kkBxk,
which implies kA−1 kkBxk ≥ (1 − kA−1 k · kB − Ak)kxk > 0, so Bx 6= 0. 

Lecture 29 (Monday, November 11)

Let y ∈ V . We show that g is totally differentiable at y. There exists x ∈ U such


that f (x) = y and from the above,
(5.46) $\|Df|_a^{-1}\|\, \|Df|_x - Df|_a\| \le \tfrac{1}{2}.$

By the lemma, Df |x is invertible. Let k be such that y + k ∈ V . Then there exists h


such that y + k = f (x + h). We have
(5.47) $\|h\| \le \|h - Df|_a^{-1}k\| + \|Df|_a^{-1}k\| \quad\text{and}$
(5.48) $h - Df|_a^{-1}k = h + Df|_a^{-1}(f(x) - f(x+h)) = \varphi_y(x+h) - \varphi_y(x),$
so $\|h - Df|_a^{-1}k\| \le \tfrac12 \|h\|$. Therefore, $\|h\| \le 2\lambda\|k\| \to 0$ as $\|k\| \to 0$. Now we compute

(5.49) $g(y+k) - g(y) - Df|_x^{-1}k = x + h - x - Df|_x^{-1}k$
(5.50) $= h - Df|_x^{-1}(f(x+h) - f(x)) = -Df|_x^{-1}(f(x+h) - f(x) - Df|_x h), \quad\text{and so}$
(5.51) $\frac{1}{\|k\|}\,\|g(y+k) - g(y) - Df|_x^{-1}k\| \le \|Df|_x^{-1}\|\, \frac{\|f(x+h) - f(x) - Df|_x h\|}{\|h\|}\, \frac{\|h\|}{\|k\|}$
(5.52) $\le \|Df|_x^{-1}\|\, \frac{\|f(x+h) - f(x) - Df|_x h\|}{\|h\|}\, 2\lambda \longrightarrow 0 \quad\text{as } k \to 0.$
Therefore g is differentiable at y with $Dg|_y = Df|_x^{-1}$.
It remains to show that Dg is continuous. To show this we need another lemma.
Lemma 5.19. Let GL(n) denote the space of real invertible n × n matrices (equipped
with some norm). The map GL(n) → GL(n) defined by A 7→ A−1 is continuous.
This lemma follows because the entries of A−1 are rational functions with non-
vanishing denominator in terms of the entries of A (by Cramer’s rule).
Since $Dg|_y = Df|_x^{-1}$ and compositions of continuous maps are continuous (Df is
continuous by assumption), we have that Dg must be continuous, so g ∈ C 1 (V, U ). 
Exercise 5.20. Let f ∈ C 1 (E, Rn ) and assume that Df |x is invertible for all x ∈ E.
Prove that f (U ) is open for every open set U ⊂ E.
Remark. If f is locally invertible at every point, it is not necessarily (globally) invertible
(that is, bijective).
Example 5.21. Let f : R² → R² be given by $f(x) = (e^{x_2}\sin(x_1),\ e^{x_2}\cos(x_1))$. Then
(5.53) $Df|_x = \begin{pmatrix} e^{x_2}\cos(x_1) & e^{x_2}\sin(x_1) \\ -e^{x_2}\sin(x_1) & e^{x_2}\cos(x_1) \end{pmatrix}.$
Thus det Df |x = e2x2 (cos(x1 )2 + sin(x1 )2 ) = e2x2 6= 0, so by Theorem 5.17, f is locally
invertible at every point x ∈ R2 . f is not bijective: it is not injective because, for
instance, f (0, 0) = f (2π, 0).
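The iteration x ↦ φ_y(x) = x + Df|_a^{-1}(y − f(x)) from the proof of Theorem 5.17 can also be used to compute local inverses numerically. A Python sketch for the map of Example 5.21 near a = (0, 0), where Df|_a = I (the target point y and the number of iterations are ad hoc choices for the illustration):

import numpy as np

def f(x):
    return np.array([np.exp(x[1]) * np.sin(x[0]), np.exp(x[1]) * np.cos(x[0])])

a = np.array([0.0, 0.0])
Dfa_inv = np.linalg.inv(np.array([[1.0, 0.0], [0.0, 1.0]]))   # Df|_a is the identity here

y = np.array([0.1, 1.05])          # a point close to f(a) = (0, 1)
x = a.copy()
for k in range(50):
    x = x + Dfa_inv @ (y - f(x))   # x_{k+1} = phi_y(x_k)

print(x, f(x))                     # f(x) is (approximately) y, so x = g(y)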

Lecture 30 (Wednesday, November 13)

We will now use the inverse function theorem to prove a significant generalization
concerning equations of the form f (x, y) = 0, where y is given and we want to solve for
x. Let E ⊂ Rn × Rm open, f : E → Rn differentiable at p = (a, b) ∈ Rn × Rm . Then
Df |p is a n × (n + m) matrix:
(5.54) $Df|_p = \begin{pmatrix} \partial_1 f_1|_p & \cdots & \partial_n f_1|_p & \partial_{n+1} f_1|_p & \cdots & \partial_{n+m} f_1|_p \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ \partial_1 f_n|_p & \cdots & \partial_n f_n|_p & \partial_{n+1} f_n|_p & \cdots & \partial_{n+m} f_n|_p \end{pmatrix} \in \mathbb{R}^{n\times(n+m)}.$
We denote the left n × n submatrix by Dx f |p and the right n × m submatrix by Dy f |p .
Note that Dx f |(a,b) is the Jacobian matrix of the differentiable map x 7→ f (x, b) at
x = a (b is fixed).
Theorem 5.22 (Implicit function theorem). Let f ∈ C 1 (E, Rn ), (a, b) ∈ E ⊂
Rn × Rm with f (a, b) = 0. Assume that Dx f |(a,b) ∈ Rn×n is invertible. Then there exist
open sets U ⊂ E and W ⊂ Rm with (a, b) ∈ U , b ∈ W such that for every y ∈ W there
exists a unique x such that (x, y) ∈ U and f (x, y) = 0. Write x = g(y). Then W can be
chosen such that g ∈ C 1 (W, Rn ), g(b) = a, (g(y), y) ∈ U and f (g(y), y) = 0 for y ∈ W .
Moreover,
(5.55) $Dg|_b = -D_x f|_{(a,b)}^{-1}\, D_y f|_{(a,b)}.$
(Note that this equation makes sense, because $Dg|_b \in \mathbb{R}^{n\times m}$, $D_x f|_{(a,b)}^{-1} \in \mathbb{R}^{n\times n}$, $D_y f|_{(a,b)} \in \mathbb{R}^{n\times m}$.)
Remark. The equation (5.55) can be obtained from differentiating the equation
(5.56) f (g(y), y) = 0
with respect to y using the chain rule (this is called implicit differentiation).
Proof. Define F (x, y) = (f (x, y), y) for (x, y) ∈ E. Then F ∈ C 1 (E, Rn × Rm ).
We would like to apply the inverse function theorem to F . For h ∈ Rn , k ∈ Rm with
(a + h, b + k) ∈ E,
(5.57)
F (a + h, b + k) − F (a, b) = (f (a + h, b + k) − f (a, b), k) = (Df |(a,b) (h, k) + o(k(h, k)k), k)

(5.58) = (Df |a,b (h, k), k) + o(k(h, k)k)


and thus DF |(a,b) (h, k) = (Df |(a,b) (h, k), k). We claim that DF |(a,b) ∈ R(n+m)×(n+m) is
invertible. It suffices to show that DF |(a,b) is injective. Let DF |(a,b) (h, k) = 0. Then
k = 0 and Df |(a,b) (h, 0) = 0. Thus, Dx f |(a,b) h = 0, so h = 0 because Dx f |(a,b) is
injective.
By the inverse function theorem there exist open sets U, V ⊂ Rn+m with (a, b) ∈ U ,
(0, b) ∈ V such that F |U : U → V is bijective. Let W = {y ∈ Rm : (0, y) ∈ V }. W
is open because V is open. Then, if y ∈ W , there exists a unique (x, y) ∈ U such that
F (x, y) = (0, y), so f (x, y) = 0. Define g(y) = x. We need to show that g ∈ C 1 (W, Rn ).
We have F(g(y), y) = (0, y). Let $G = F|_U^{-1} : V \to U$. Then G(0, y) = (g(y), y), so g is C¹ because G is C¹. Set φ(y) = (g(y), y). Then Dφ|_y = (Dg|_y , I). Also, f(φ(y)) = 0.
By the chain rule,
(5.59) 0 = Df |φ(y) Dφ|y = Dx f |φ(y) Dg|y + Dy f |φ(y) ,

so $Dg|_y = -D_x f|_{\varphi(y)}^{-1}\, D_y f|_{\varphi(y)}$. Setting y = b we obtain
(5.60) $Dg|_b = -D_x f|_{(a,b)}^{-1}\, D_y f|_{(a,b)}.$

Example 5.23. While we used the inverse function theorem in the proof of the
implicit function theorem, the inverse function theorem is also a consequence of the
implicit function theorem. Say E ⊂ Rn , f ∈ C 1 (E, Rn ) and a ∈ E such that Df |a is
invertible.

Define F : E × Rn → Rn by F (x, y) = f (x) − y and set b = f (a). Then F (a, b) = 0.


Also, Dx F |(a,b) = Df |a is invertible, so by the implicit function theorem there exist open
sets Ω ⊂ E × Rn , W ⊂ Rn , f (a) ∈ W and g ∈ C 1 (W, Rn ) with g(b) = a, (g(y), y) ∈ Ω
for y ∈ W and

(5.61) $F(g(y), y) = 0 \quad\text{and}\quad Dg|_{f(a)} = Df|_a^{-1}.$

But F (g(y), y) = 0 is equivalent to f (g(y)) = y. Define U = g(W ) = {g(y) : y ∈ W }.


Then a ∈ U since g(f (a)) = g(b) = a and b ∈ W .
Also U ⊂ E since if x ∈ U , then x = g(y) for y ∈ W , so (x, y) ∈ Ω ⊂ E × Rn .
Similarly, we see that U is open because Ω is open. Let x ∈ U . Then there exists y ∈ W
such that x = g(y). Also, f (x) = f (g(y)) = y and therefore g(f (x)) = g(y) = x. Thus
$f|_U$ is invertible and $f|_U^{-1} = g$.

Example 5.24. Let f : R² × R³ → R² be given by
(5.62) $f(x, y) = \begin{pmatrix} x_1^2 y_1 + x_2\cos(y_2) - y_3 \\ e^{-x_2} + \sin(y_1) + x_1 y_2 y_3 - 1 - \sin(1) \end{pmatrix} \qquad (x \in \mathbb{R}^2,\ y \in \mathbb{R}^3).$

Then
(5.63) $Df|_{(x,y)} = \begin{pmatrix} 2x_1 y_1 & \cos(y_2) & x_1^2 & -x_2\sin(y_2) & -1 \\ y_2 y_3 & -e^{-x_2} & \cos(y_1) & x_1 y_3 & x_1 y_2 \end{pmatrix}.$

Set a = (1, 0), b = (1, 0, 1). Then f(a, b) = 0 and
(5.64) $D_x f|_{(a,b)} = \begin{pmatrix} 2 & 1 \\ 0 & -1 \end{pmatrix},$

then det $D_x f|_{(a,b)} = -2 \neq 0$, so $D_x f|_{(a,b)}$ is invertible. We have
(5.65) $D_y f|_{(a,b)} = \begin{pmatrix} 1 & 0 & -1 \\ \cos(1) & 1 & 0 \end{pmatrix}.$

Thus there exists U ⊂ R2 × R3 open, (a, b) ∈ U , W ⊂ R3 , b ∈ W , g ∈ C 1 (W, R2 ) such


that

(5.66) $f(g(y), y) = 0 \quad\text{for } y \in W, \quad\text{and}$
(5.67) $Dg|_b = -D_x f|_{(a,b)}^{-1}\, D_y f|_{(a,b)} = \frac{1}{2}\begin{pmatrix} -1-\cos(1) & -1 & 1 \\ 2\cos(1) & 2 & 0 \end{pmatrix}.$
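The computation in (5.64)–(5.67) can be double-checked numerically. The following Python sketch simply evaluates D_x f and D_y f at (a, b) (restating the entries of (5.63)) and solves for −D_x f|_{(a,b)}^{-1} D_y f|_{(a,b)}:

import numpy as np

a = np.array([1.0, 0.0])
b = np.array([1.0, 0.0, 1.0])

def Dxf(x, y):
    return np.array([[2*x[0]*y[0], np.cos(y[1])],
                     [y[1]*y[2], -np.exp(-x[1])]])

def Dyf(x, y):
    return np.array([[x[0]**2, -x[1]*np.sin(y[1]), -1.0],
                     [np.cos(y[0]), x[0]*y[2], x[0]*y[1]]])

Dg_b = -np.linalg.solve(Dxf(a, b), Dyf(a, b))
print(Dg_b)
# expected: 0.5 * [[-1 - cos(1), -1, 1], [2*cos(1), 2, 0]]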

3. Ordinary differential equations


In this section we study initial value problems of the form
(5.68) $\begin{cases} y'(t) = F(t, y(t)) \\ y(t_0) = y_0, \end{cases}$
where E ⊂ R × R is open, (t₀, y₀) ∈ E and F ∈ C(E) are given. We say that a differentiable function y : I → R, defined on some open interval I ⊂ R that includes the point t₀ ∈ I, is a solution to the initial value problem if (t, y(t)) ∈ E for all t ∈ I, y(t₀) = y₀ and y′(t) = F(t, y(t)) for all t ∈ I. The equation y′(t) = F(t, y(t)) is
a first order ordinary differential equation. We also write this differential equation in
short form as
(5.69) y 0 = F (t, y).
Geometric interpretation. At each point (t, y) ∈ E imagine a small line segment
with slope F (t, y). We are looking for a function such that its graph has the slope
F (t, y) at each point (t, y) on the graph of the function.

Figure 1. Visualization of F (t, y).

Example 5.25. Consider the equation y′ = y/t. The solutions of this equation are of the form y(t) = ct for c ∈ R.
Example 5.26. Sometimes we can solve initial value problems by computing an
explicit expression for y. Recall for instance that solving differential equations of the
form y 0 = f (t)g(y) is easy (by separation of variables). Consider for instance
(5.70) $\begin{cases} y'(t) = \dfrac{t}{y(t)} \\ y(t_0) = y_0 \end{cases}$
for (t₀, y₀) ∈ (0, ∞) × (0, ∞). Then $y(t) = \sqrt{t^2 + y_0^2 - t_0^2}$. Note that if $y_0^2 - t_0^2 \ge 0$, then y is defined on I = (0, ∞). But if $y_0^2 - t_0^2 < 0$, then y is only defined on $I = (\sqrt{t_0^2 - y_0^2}, \infty) \ni t_0$.

In general, however it is not easy to find a solution. It may also happen that the
solution is not expressible in terms of elementary functions. Try for instance, to solve
the initial value problem
(5.71) $\begin{cases} y'(t) = e^{y(t)^2}\, t \sin(t + y(t)^2), \\ y(1) = 5. \end{cases}$

Lecture 31 (Friday, November 15)

Theorem 5.27 (Picard-Lindelöf). Let E ⊂ R × R be open, (t0 , y0 ) ∈ E, F ∈ C(E).


Let a > 0 and b > 0 be small enough such that
(5.72) R = {(t, y) ∈ R2 : |t − t0 | ≤ a, |y − y0 | ≤ b} ⊂ E.
Let M = sup(t,y)∈R |F (t, y)| < ∞. Assume that there exists c ∈ (0, ∞) such that
(5.73) |F (t, y) − F (t, u)| ≤ c|y − u|
for all (t, y), (t, u) ∈ R. Define a∗ = min(a, b/M ) and let I = [t0 − a∗ , t0 + a∗ ]. Then
there exists a unique solution y : I → R to the initial value problem
(5.74) $\begin{cases} y'(t) = F(t, y(t)), \\ y(t_0) = y_0. \end{cases}$

Figure 2. Visualization of F(t, y).

Remarks. 1. If F satisfies condition (5.73), we also say that F is Lipschitz continuous


in the second variable.
2. The condition (5.73) follows if F is differentiable in the second variable and |∂y F (t, y)| ≤
c for every (t, y) ∈ R (by the mean value theorem).
3. By the fundamental theorem of calculus, the initial value problem (5.74) is equivalent
to the integral equation
(5.75) $y(t) = y_0 + \int_{t_0}^{t} F(s, y(s))\,ds.$

Corollary 5.28. Let E ⊂ R × R open, (t0 , y0 ) ∈ E, F ∈ C 1 (E). Then there exists


an interval I ⊂ R and a unique differentiable function y : I → R such that (t, y(t)) ∈ E
for all t ∈ I and y solves (5.74).
This is true because (5.73) follows from the mean value theorem and the continuity of the partial derivative ∂_y F, which is bounded on the compact rectangle R.

Proof of Theorem 5.27. Let J = [y0 − b, y0 + b]. It suffices to show that there
exists a unique continuous function y : I → J such that
(5.76) $y(t) = y_0 + \int_{t_0}^{t} F(s, y(s))\,ds$

(that is, y is a solution of the integral equation). Let


(5.77) Y = {y : I → J : y continuous on I}.
For every y ∈ Y, t 7→ F (t, y(t)) is a well-defined continuous function on I. Define
(5.78) $Ty(t) = y_0 + \int_{t_0}^{t} F(s, y(s))\,ds.$
Claim. T Y ⊂ Y.
Proof of claim. Let y ∈ Y. Then T y is a continuous function on I. It remains
to show that T y(t) ∈ J for all t ∈ I. Recalling that |F (t, y)| ≤ M for all (t, y) ∈ R we
obtain:
(5.79) $|Ty(t) - y_0| \le \Big| \int_{t_0}^{t} |F(s, y(s))|\,ds \Big| \le |t - t_0|\, M \le M a^* \le b,$

where we used that a∗ = min(a, b/M ) ≤ b/M . 


To apply the contraction principle we need to equip Y with a metric such that
T : Y → Y is a contraction and Y is complete. We could be tempted to try the
usual supremum metric d∞ (g1 , g2 ) = supt∈I |g1 (t) − g2 (t)|. Then Y ⊂ C(I) is closed, so
(Y, d∞ ) is a complete metric space. However, T will not necessarily be a contraction2
with respect to d∞ . Instead, we define the metric
(5.80) $d_*(g_1, g_2) = \sup_{t\in I} e^{-2c|t-t_0|}\, |g_1(t) - g_2(t)|.$
Then $d_*(g_1, g_2) \le d_\infty(g_1, g_2) \le e^{2ca^*} d_*(g_1, g_2)$. In other words, $d_*$ and $d_\infty$ are equivalent
metrics. This implies that (Y, d∗ ) is still complete.
Claim. T : Y → Y is a contraction with respect to d∗ .
Proof of claim. For g1 , g2 ∈ Y, t ∈ I we have by (5.73),
(5.81) $|Tg_1(t) - Tg_2(t)| = \Big| \int_{t_0}^{t} (F(s, g_1(s)) - F(s, g_2(s)))\,ds \Big|$
(5.82) $\le c\, \Big| \int_{t_0}^{t} |g_1(s) - g_2(s)|\,ds \Big|.$

Let us assume that t ∈ [t0 , t0 + a∗ ]. Then


(5.83) $|Tg_1(t) - Tg_2(t)| \le c \int_{t_0}^{t} |g_1(s) - g_2(s)|\,ds \le c \int_{t_0}^{t} e^{2c(s-t_0)}\, d_*(g_1, g_2)\,ds$
(5.84) $= c\, d_*(g_1, g_2)\, \frac{1}{2c}\big(e^{2c(t-t_0)} - 1\big) \le \tfrac{1}{2}\, d_*(g_1, g_2)\, e^{2c|t-t_0|}.$
Similarly, for t ∈ [t0 − a∗ , t0 ] we also have
(5.85) |T g1 (t) − T g2 (t)| ≤ 21 d∗ (g1 , g2 )e2c|t−t0 | .
2For the supremum metric to give rise to a contraction we would need to make the interval I
smaller.

Thus,
(5.86) e−2c|t−t0 | |T g1 (t) − T g2 (t)| ≤ 12 d∗ (g1 , g2 )
holds for all t ∈ I, so d∗ (T g1 , T g2 ) ≤ 12 d∗ (g1 , g2 ). 
By the Banach fixed point theorem, there exists a unique y ∈ Y such that T y = y,
i.e. a unique solution to the initial value problem (5.74). 
Remarks. 1. The proof is constructive. That is, it tells us how to compute the solution.
This is because the proof of the Banach fixed point theorem is constructive. Indeed,
construct a sequence (yn )n≥0 ⊂ Y by y0 (t) = y0 and
(5.87) $y_n(t) = y_0 + \int_{t_0}^{t} F(s, y_{n-1}(s))\,ds \qquad\text{for } n = 1, 2, \ldots$

Then (yn )n≥0 converges uniformly on I to the solution y. This method is called Picard
iteration.
2. Note that the length of the existence interval I does not depend on the size of the
constant c in (5.73).
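A crude numerical version of Picard iteration: store each iterate on a grid of points in I and replace the integral in (5.87) by a cumulative trapezoidal sum. The Python sketch below is only meant to exhibit the structure of the iteration; the test equation y′ = y, y(0) = 1, the grid size and the number of iterations are ad hoc choices (for that equation the exact solution is e^t).

import numpy as np

def picard(F, t0, y0, a_star, n_grid=200, n_iter=30):
    # grid on I = [t0, t0 + a_star]; y holds the current iterate on that grid
    t = np.linspace(t0, t0 + a_star, n_grid)
    y = np.full_like(t, y0)
    for _ in range(n_iter):
        integrand = F(t, y)
        # cumulative trapezoidal approximation of the integral in (5.87)
        integral = np.concatenate(([0.0], np.cumsum((integrand[1:] + integrand[:-1]) / 2 * np.diff(t))))
        y = y0 + integral
    return t, y

# test case: y' = y, y(0) = 1 on [0, 1]
t, y = picard(lambda t, y: y, 0.0, 1.0, 1.0)
print(y[-1], np.exp(1.0))   # close to e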
Example 5.29. Consider the initial value problem
(5.88) $\begin{cases} y'(t) = \dfrac{e^t \sin(t + y(t))}{t\,y(t) - 1}, \\ y(1) = 5. \end{cases}$
Let $F(t, y) = \frac{e^t \sin(t+y)}{ty - 1}$. We need to choose a rectangle R around the point (1, 5) where
we have control over |F (t, y)| and |∂y F (t, y)|. Thus we need to stay away from the set
of (t, y) such that ty − 1 = 0. Say,
(5.89) R = {(t, y) : |t − 1| ≤ 21 , |y − 5| ≤ 1}.
Then for (t, y) ∈ R:
(5.90) |ty − 1| ≥ (1 − 21 )(5 − 1) − 1 = 1.
Also, |et sin(t + y)| ≤ e3/2 . Setting M = e3/2 , we obtain
(5.91) |F (t, y)| ≤ M for all (t, y) ∈ R.
Compute
(5.92) $\partial_y F(t, y) = \frac{e^t \cos(t+y)}{ty - 1} - t\, \frac{e^t \sin(t+y)}{(ty-1)^2}.$
For (t, y) ∈ R we estimate
(5.93) $|\partial_y F(t, y)| \le \Big| \frac{e^t \cos(t+y)}{ty-1} \Big| + |t|\, \Big| \frac{e^t \sin(t+y)}{(ty-1)^2} \Big| \le c,$
where we have set $c = e^{3/2} + \tfrac{3}{2} e^{3/2}$. Then the number a∗ from Theorem 5.27 is $a^* = \min(a, b/M) = \min(\tfrac{1}{2}, 1/e^{3/2}) = e^{-3/2}$. So the theorem yields the existence and uniqueness of a solution to the initial value problem (5.88) in the interval $I = [1 - e^{-3/2}, 1 + e^{-3/2}]$. We can also compute that solution by Picard iteration: let y₀(t) = 5 and
and
(5.94) $y_n(t) = 5 + \int_1^t \frac{e^s \sin(s + y_{n-1}(s))}{s\, y_{n-1}(s) - 1}\,ds.$
The sequence (yn )n converges uniformly on I to the solution y.

Example 5.30. Sometimes one can extend solutions beyond the interval obtained
from the Picard-Lindelöf theorem. Consider the initial value problem
(5.95) $\begin{cases} y'(t) = \cos(y(t)^2 - 2t^3) \\ y(0) = 1. \end{cases}$
We claim that there exists a unique solution y : R → R. To prove this it suffices to
demonstrate the existence of a unique solution on the interval [−L, L] for every L > 0.
To do this we invoke the Picard-Lindelöf theorem. Set
(5.96) R = {(t, y) ∈ R2 : |t| ≤ L, |y − 1| ≤ L}.
Let F (t, y) = cos(y 2 − 2t3 ). Then
(5.97) |F (t, y)| ≤ 1 for all (t, y) ∈ R2 .
We have ∂y F (t, y) = −2y sin(y 2 − 2t3 ), so |∂y F (t, y)| ≤ 2|y| ≤ 2(L + 1) for all (t, y) ∈ R.
Then by Theorem 5.27, there exists a unique solution to (5.95) on I = [−L, L].
Example 5.31. If the Lipschitz condition (5.73) fails, then the initial value problem
may have more than one solution. Consider
(5.98) $\begin{cases} y'(t) = |y(t)|^{1/2}, \\ y(0) = 0. \end{cases}$
The function y 7→ |y|1/2 is not Lipschitz continuous in any neighborhood of 0: for
y > 0 its derivative 21 y −1/2 is unbounded as y → 0. The function y1 (t) = 0 solves the
initial value problem (5.98). The function
(5.99) $y_2(t) = \begin{cases} t^2/4, & \text{if } t > 0, \\ 0, & \text{if } t \le 0 \end{cases}$
also does.

Lecture 32 (Monday, November 18)

Existence of a solution still holds without the assumption (5.73). We will prove this
as a consequence of the Arzelà-Ascoli theorem.
Theorem 5.32 (Peano existence theorem). Let E ⊂ R × R open, (t0 , y0 ) ∈ E,
F ∈ C(E),
(5.100) R = {(t, y) : |t − t0 | ≤ a, |y − y0 | ≤ b} ⊂ E.
Let M = sup(t,y)∈R |F (t, y)| < ∞. Define a∗ = min(a, b/M ) and let I = [t0 − a∗ , t0 + a∗ ].
Then there exists a solution y : I → R to the initial value problem
(5.101) $\begin{cases} y'(t) = F(t, y(t)), \\ y(t_0) = y_0. \end{cases}$
Corollary 5.33. Let E ⊂ R × R open, (t0 , y0 ) ∈ E, F ∈ C(E). Then there exists
an interval I ⊂ R and a differentiable function y : I → R such that (t, y(t)) ∈ E for all
t ∈ I and y solves (5.74).
Proof. It suffices to produce a solution to the integral equation
(5.102) $y(t) = y_0 + \int_{t_0}^{t} F(s, y(s))\,ds.$

To avoid some technicalities we will only present the proof under the additional as-
sumption that
(5.103) |F (t, y)| ≤ M
holds for |t − t0 | ≤ a and all y ∈ R. Then we may choose b arbitrarily large and thus
a∗ = a. We also restrict our attention to the interval [t0 , t0 + a], which we denote by I.
The construction is similar on the other half, [t0 −a, t0 ]. Let P be a partition of [t0 , t0 +a]:
P = {t0 < t1 < · · · < tN = t0 + a} of [t0 , t0 + a]. We let ∆P = max0≤k≤N −1 (tk+1 − tk )
denote the fineness of P. We try to build an approximate solution given as a piecewise
linear function. The function yP : [t0 , t0 + a] → R shall be defined as follows: let
yP (t0 ) = y0 and for t ∈ (tk , tk+1 ] we define yP (t) recursively by
(5.104) yP (t) = yP (tk ) + F (tk , yP (tk ))(t − tk ).
Claim 1. For t, t′ ∈ [t₀, t₀ + a],
(5.105) $|y_{\mathcal{P}}(t) - y_{\mathcal{P}}(t')| \le M|t - t'|.$
Proof of claim. In this proof we will write y_P as y for brevity. Say t′ ∈ [t_k, t_{k+1}], t ∈ [t_ℓ, t_{ℓ+1}], k ≤ ℓ. If k = ℓ, then by (5.103),
(5.106) $|y(t) - y(t')| = |F(t_k, y(t_k))(t - t')| \le M|t - t'|.$
If k < ℓ, then
(5.107) $|y(t) - y(t')| = \Big| y(t) - y(t_\ell) + \sum_{j=k+1}^{\ell-1} (y(t_{j+1}) - y(t_j)) + y(t_{k+1}) - y(t') \Big|$
(5.108) $\le |y(t) - y(t_\ell)| + \sum_{j=k+1}^{\ell-1} |y(t_{j+1}) - y(t_j)| + |y(t_{k+1}) - y(t')|$
(5.109) $\le M(t - t_\ell) + \sum_{j=k+1}^{\ell-1} M(t_{j+1} - t_j) + M(t_{k+1} - t') = M(t - t').$


Define gP (t) = F (tk , yP (tk )) for t ∈ (tk , tk+1 ]. Then gP is a step function and
yP0 (t) = gP (t) for t ∈ (tk , tk+1 ).

Let ε > 0. F is uniformly continuous on R, because R is compact (Theorem 2.10).


Thus there exists δ = δ(ε) > 0 such that
(5.110) |F (t, y) − F (t0 , y 0 )| ≤ ε
for all (t, y), (t0 , y 0 ) ∈ R with k(t, y) − (t0 , y 0 )k ≤ 100δ.

Claim 2. Suppose that ∆P ≤ δ(ε) min(1, M −1 ). Then we have for all t ∈ [t0 , t0 + a]
that
(5.111) $y_{\mathcal{P}}(t) = y_0 + \int_{t_0}^{t} g_{\mathcal{P}}(s)\,ds \quad\text{and}\quad |g_{\mathcal{P}}(s) - F(s, y_{\mathcal{P}}(s))| \le \varepsilon \ \text{ if } s \in (t_{k-1}, t_k).$

Proof of claim. We will write y instead of yP and g instead of gP in this proof.


First we have for t = tk :
(5.112) $y(t_k) - y_0 = y(t_k) - y(t_0) = \sum_{j=1}^{k} \big( y(t_j) - y(t_{j-1}) \big)$
(5.113) $= \sum_{j=1}^{k} F(t_{j-1}, y(t_{j-1}))(t_j - t_{j-1}) = \sum_{j=1}^{k} \int_{t_{j-1}}^{t_j} g(s)\,ds = \int_{t_0}^{t_k} g(s)\,ds.$

Similarly, for t ∈ (tk , tk+1 ):


(5.114) $y(t) - y(t_k) = F(t_k, y(t_k))(t - t_k) = \int_{t_k}^{t} g(s)\,ds.$

Thus,
(5.115) $y(t) = y(t_k) + \int_{t_k}^{t} g(s)\,ds = y_0 + \int_{t_0}^{t_k} g(s)\,ds + \int_{t_k}^{t} g(s)\,ds = y_0 + \int_{t_0}^{t} g(s)\,ds.$

Let s ∈ (tk−1 , tk ). Then


(5.116) |g(s) − F (s, y(s))| = |F (tk−1 , y(tk−1 )) − F (s, y(s))|.
We have
(5.117) |y(tk−1 ) − y(s)| ≤ M |tk−1 − s| ≤ M (tk − tk−1 ) ≤ M · ∆P ≤ δ.
Also, |tk−1 − s| ≤ tk − tk−1 ≤ ∆P ≤ δ. Thus,
(5.118) k(tk−1 , y(tk−1 )) − (s, y(s))k ≤ 100δ.
By (5.110),
(5.119) |g(s) − F (s, y(s))| = |F (tk−1 , y(tk−1 )) − F (s, y(s))| ≤ ε.


Claim 3. Suppose that ∆P ≤ δ(ε) min(1, M −1 ). Then it holds for all t ∈ [t0 , t0 + a]
that
(5.120) $\Big| y_{\mathcal{P}}(t) - \Big( y_0 + \int_{t_0}^{t} F(s, y_{\mathcal{P}}(s))\,ds \Big) \Big| \le \varepsilon a.$

Proof of claim. By Claim 2, the left hand side equals


(5.121) $\Big| \int_{t_0}^{t} \big( g_{\mathcal{P}}(s) - F(s, y_{\mathcal{P}}(s)) \big)\,ds \Big| \le \int_{t_0}^{t} |g_{\mathcal{P}}(s) - F(s, y_{\mathcal{P}}(s))|\,ds.$

Claim 2 implies that this is no larger than ε(t − t0 ) ≤ εa. 


Claim 3 says that yP is almost a solution if the partition P is sufficiently fine. In
the final step we use a compactness argument to obtain an honest solution.

Claim 4. The set F = {yP : P partition} ⊂ C([t0 , t0 + a]) is relatively compact.


Proof of claim. By Claim 1,
(5.122) |yP (t) − yP (t0 )| ≤ M |t − t0 |.
This implies that F is equicontinuous. It is also bounded:
(5.123) |yP (t)| ≤ |yP (t0 )| + |yP (t) − yP (t0 )| ≤ |y0 | + M |t − t0 | ≤ |y0 | + M a.
Thus the claim follows from the Arzelà-Ascoli theorem (Theorem 2.31). 
For n ∈ N, choose a partition Pₙ with ∆Pₙ ≤ δ(1/n) min(1, M⁻¹). By the relative compactness of F established in Claim 4, the sequence (y_{Pₙ})ₙ ⊂ F has a subsequence that converges to some limit y ∈ C([t₀, t₀ + a]). It remains to show that y is a solution to the integral equation (5.102). Let us denote that subsequence by (yₙ)ₙ. By (uniform) continuity of
F , we have that F (s, yn (s)) → F (s, y(s)) uniformly in s ∈ [t0 , t] as n → ∞. Thus,
(5.124) $\int_{t_0}^{t} F(s, y_n(s))\,ds \longrightarrow \int_{t_0}^{t} F(s, y(s))\,ds \qquad\text{as } n \to \infty.$

On the other hand, by Claim 3 we get


(5.125) $\Big| y_n(t) - \Big( y_0 + \int_{t_0}^{t} F(s, y_n(s))\,ds \Big) \Big| \le \frac{a}{n} \longrightarrow 0 \qquad\text{as } n \to \infty.$
Therefore, y solves the integral equation (5.102). 
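The piecewise linear approximations y_P constructed in the proof are the Euler polygons familiar from numerical analysis. A minimal Python sketch of the construction (the equation from Example 5.30 and the particular equidistant partitions are used only for illustration):

import numpy as np

def euler_polygon(F, t0, y0, a, N):
    # equidistant partition t0 = t_0 < ... < t_N = t0 + a
    t = np.linspace(t0, t0 + a, N + 1)
    y = np.zeros(N + 1)
    y[0] = y0
    for k in range(N):
        # y_P(t) = y_P(t_k) + F(t_k, y_P(t_k)) (t - t_k) on (t_k, t_{k+1}]
        y[k + 1] = y[k] + F(t[k], y[k]) * (t[k + 1] - t[k])
    return t, y

# y' = cos(y^2 - 2 t^3), y(0) = 1, as in Example 5.30
for N in (10, 100, 1000):
    t, y = euler_polygon(lambda t, y: np.cos(y**2 - 2*t**3), 0.0, 1.0, 1.0, N)
    print(N, y[-1])   # the values stabilize as the partition gets finer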
The theory for ordinary differential equations that we have developed turns out to
be far more general.
Systems of first-order ordinary differential equations. The proofs of the Picard-Lindelöf
theorem and the Peano existence theorem can easily be extended to apply to systems
of differential equations:
(5.126) $\begin{cases} y'(t) = F(t, y(t)), \\ y(t_0) = y_0 \end{cases}$
for F : E → Rm , E ⊂ R × Rm open, (t0 , y0 ) ∈ E.
Higher-order differential equations. Let d ≥ 1 and consider the d-th order ordinary
differential equation given by
(5.127) y (d) (t) = F (t, y(t), y 0 (t), . . . , y (d−1) (t))

for some F : E → R, E ⊂ R × R^d open. We can transform this equation into a system of d first-order equations: if Y = (Y₁, . . . , Y_d) solves the system
(5.128) $\begin{cases} Y_1'(t) = Y_2(t) \\ Y_2'(t) = Y_3(t) \\ \quad\vdots \\ Y_{d-1}'(t) = Y_d(t) \\ Y_d'(t) = F(t, Y(t)), \end{cases}$
then Y₁ is a solution to (5.127).

Lecture 33 (Wednesday, November 20)

4. Higher order derivatives and Taylor’s theorem


Definition 5.34. Let U ⊂ Rn be open and f : U → R. We define the partial
derivatives of second order as
(5.129) ∂ij f = ∂xi xj f = ∂xi (∂xj f ) for i, j ∈ {1, . . . , n}
(if ∂xj f , ∂xi (∂xj f ) exist). If ∂i f and ∂ij f exist and are continuous for all i, j ∈ {1, . . . , n},
then we say that f ∈ C 2 (U ).
Theorem 5.35 (Schwarz). Let U ⊂ Rn open, f : U → R such that ∂xi f, ∂xj f ,
∂xi xj f exist at every point in U and ∂xi xj f is continuous at some point x0 ∈ U . Then
∂xj xi f (x0 ) exists and
(5.130) ∂xj xi f (x0 ) = ∂xi xj f (x0 ).
Proof. Without loss of generality assume that n = 2, i = 1, j = 2. Let f be as
in the theorem and x0 = (a, b) ∈ U and (h, k) ∈ R2 \{0} such that (a + h, b + k) are
contained in an open ball around x0 that is contained in U . We want to show that ∂21 f
exists, so we need to study the expression
(5.131) ∂1 f (a, b + k) − ∂1 f (a, b).
This leads us to consider the quantity
(5.132) ∆(a, b, h, k) = (f (a + h, b + k) − f (a, b + k)) − (f (a + h, b) − f (a, b)).
Define g(y) = f (a + h, y) − f (a, y). Since ∂2 f exists at every point in U , the mean
value theorem implies that there exists η = ηh,k contained in the closed interval with
endpoints b and b + k such that
(5.133) ∆(a, b, h, k) = g(b + k) − g(b) = g 0 (η)k = k(∂2 f (a + h, η) − ∂2 f (a, η))
Since ∂12 f exists at every point in U , another application of the mean value theorem
yields
(5.134) ∆(a, b, h, k) = hk∂12 f (ξ, η),
where ξ = ξh is in the closed interval with endpoints a and a + h.
Let ε > 0. Since ∂12 f is continuous at (a, b),
(5.135) |∂12 f (a, b) − ∂12 f (x, y)| ≤ ε
whenever k(a, b) − (x, y)k is small enough. Thus, for small enough h and k we have
(5.136) $\Big| \partial_{12} f(a, b) - \frac{\Delta(a, b, h, k)}{hk} \Big| \le \varepsilon.$
Letting h → 0 and using that ∂₁f exists at every point, this inequality implies
(5.137) $\Big| \partial_{12} f(a, b) - \frac{\partial_1 f(a, b+k) - \partial_1 f(a, b)}{k} \Big| \le \varepsilon.$
In other words, ∂21 f (a, b) exists and equals ∂12 f (a, b). 
This is not true without the assumption that ∂xi xj f is continuous at x.

Exercise 5.36. Define f : R² → R by
(5.138) $f(x, y) = \begin{cases} xy\, \dfrac{x^2 - y^2}{x^2 + y^2}, & \text{if } (x, y) \neq (0, 0), \\ 0, & \text{if } (x, y) = (0, 0). \end{cases}$
Show that ∂x ∂y f and ∂y ∂x f exist at every point in R2 , but that ∂x ∂y f (0, 0) 6= ∂y ∂x f (0, 0).
Corollary 5.37. If f ∈ C 2 (U ), then ∂xi xj f = ∂xj xi f for every i, j ∈ {1, . . . , n}.
Definition 5.38. Let U ⊂ Rn be an open set and f : U → R. Let k ∈ N. If all
partial derivatives of f up to order k exist, i.e. for all j ∈ {1, . . . , k} and i1 , . . . , ij ∈
{1, . . . , n}, the partial derivatives $\partial_{i_1} \cdots \partial_{i_j} f$ exist and are continuous, then we write f ∈ C^k(U) and say
that f is k times continuously differentiable.
Corollary 5.39. If f ∈ C k (U ) and π : {1, . . . , k} → {1, . . . , k} is a bijection, then
(5.139) ∂i1 · · · ∂ik f = ∂iπ(1) · · · ∂iπ(k) f
for all i1 , . . . , ik ∈ {1, . . . , n}.

Lecture 34 (Friday, November 22)

Multiindex notation. In order to make formulas involving higher order derivatives shorter and more readable, we introduce multiindex notation. A multiindex of order k is a vector $\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbb{N}_0^n = \{0, 1, 2, \ldots\}^n$ such that $\sum_{i=1}^{n} \alpha_i = k$. We write $|\alpha| = \sum_{i=1}^{n} \alpha_i$. For every multiindex α we introduce the notation
(5.140) $\partial^\alpha f = \partial_{x_1}^{\alpha_1} \cdots \partial_{x_n}^{\alpha_n} f,$
where $\partial_{x_i}^{\alpha_i}$ is short for $\partial_{x_i} \cdots \partial_{x_i}$ ($\alpha_i$ times). For x = (x₁, . . . , xₙ) ∈ Rⁿ we also write
(5.141) $x^\alpha = x_1^{\alpha_1} \cdots x_n^{\alpha_n}$
and α! = α1 ! · · · αn !. Moreover, for α, β ∈ Nn0 , α ≤ β means that αi ≤ βi for every
i ∈ {1, . . . , n}.
With this notation, we can state Taylor’s theorem in Rn quite succinctly.
Theorem 5.40 (Taylor). Let U ⊂ Rn be open and convex, f ∈ C k+1 (U ) and x, x +
y ∈ U . Then there exists ξ ∈ U such that
(5.142) $f(x + y) = \sum_{|\alpha| \le k} \frac{\partial^\alpha f(x)}{\alpha!}\, y^\alpha + \sum_{|\alpha| = k+1} \frac{\partial^\alpha f(\xi)}{\alpha!}\, y^\alpha.$

Moreover, ξ takes the form ξ = x + θy for some θ ∈ [0, 1].


Remark. Without multiindex notation the statement of this theorem would look much
more messy:
(5.143) $\sum_{|\alpha| \le k} \frac{\partial^\alpha f(x)}{\alpha!}\, y^\alpha = \sum_{\substack{\alpha_1, \ldots, \alpha_n \ge 0, \\ \alpha_1 + \cdots + \alpha_n \le k}} \frac{\partial_{x_1}^{\alpha_1} \cdots \partial_{x_n}^{\alpha_n} f(x)}{\alpha_1! \cdots \alpha_n!}\, y_1^{\alpha_1} \cdots y_n^{\alpha_n}.$

Proof of Theorem 5.40. The idea is to apply Taylor’s theorem in one dimension
to the function g : [0, 1] → R given by g(t) = f (x + ty). Let us compute the derivatives
of g.
Claim. For m = 1, . . . , k + 1,
(5.144) $g^{(m)}(t) = \sum_{|\alpha| = m} \frac{m!}{\alpha!}\, \partial^\alpha f(x + ty)\, y^\alpha.$

Proof of claim. We first show by induction on m that


(5.145) $g^{(m)}(t) = \sum_{i_1, \ldots, i_m = 1}^{n} \partial_{i_1} \cdots \partial_{i_m} f(x + ty)\, y_{i_1} \cdots y_{i_m}.$

Indeed, for m = 1, by the chain rule,


(5.146) $g'(t) = \sum_{i=1}^{n} \partial_i f(x + ty)\, y_i.$

Suppose we have shown it for m. Then


(5.147) $g^{(m+1)}(t) = \frac{d}{dt}\, g^{(m)}(t) = \frac{d}{dt} \sum_{i_1, \ldots, i_m = 1}^{n} \partial_{i_1} \cdots \partial_{i_m} f(x + ty)\, y_{i_1} \cdots y_{i_m}.$

By the chain rule this equals


(5.148) $= \sum_{i_1, \ldots, i_m = 1}^{n} \sum_{i=1}^{n} \partial_{i_1} \cdots \partial_{i_m} \partial_i f(x+ty)\, y_{i_1} \cdots y_{i_m} y_i = \sum_{i_1, \ldots, i_{m+1} = 1}^{n} \partial_{i_1} \cdots \partial_{i_{m+1}} f(x+ty)\, y_{i_1} \cdots y_{i_{m+1}}.$

It remains to show that


(5.149) $\sum_{i_1, \ldots, i_m = 1}^{n} \partial_{i_1} \cdots \partial_{i_m} f(x + ty)\, y_{i_1} \cdots y_{i_m} = \sum_{|\alpha| = m} \frac{m!}{\alpha!}\, \partial^\alpha f(x + ty)\, y^\alpha.$

This follows because for a given α = (α₁, . . . , αₙ) with |α| = m there are
(5.150) $\frac{m!}{\alpha!} = \frac{m!}{\alpha_1! \cdots \alpha_n!} = \binom{m}{\alpha_1}\binom{m-\alpha_1}{\alpha_2} \cdots \binom{m-\alpha_1-\cdots-\alpha_{n-1}}{\alpha_n}$
many tuples (i1 , . . . , im ) ∈ {1, . . . , n}m such that i appears exactly αi times among the
ij s. In other words, this is the number of ways to sort m pairwise different marbles into
n numbered bins such that bin number i contains exactly αi marbles. 
By the one-dimensional Taylor theorem, there exists a θ ∈ [0, 1] such that
(5.151) $g(t) = \sum_{m=0}^{k} \frac{g^{(m)}(0)}{m!}\, t^m + \frac{g^{(k+1)}(\theta)}{(k+1)!}\, t^{k+1}.$
From the claim we see that this equals
(5.152) $\sum_{m=0}^{k} \frac{1}{m!} \sum_{|\alpha| = m} \frac{m!}{\alpha!}\, \partial^\alpha f(x)\, y^\alpha t^m + \frac{1}{(k+1)!} \sum_{|\alpha| = k+1} \frac{(k+1)!}{\alpha!}\, \partial^\alpha f(x + \theta y)\, y^\alpha t^{k+1}$
(5.153) $= \sum_{|\alpha| \le k} \frac{\partial^\alpha f(x)}{\alpha!}\, (ty)^\alpha + \sum_{|\alpha| = k+1} \frac{\partial^\alpha f(\xi)}{\alpha!}\, (ty)^\alpha,$

where we have set ξ = x + θy. Letting t = 1 we obtain the claim. 


Corollary 5.41. If E ⊂ Rn is open and f ∈ C k (E), then for every x ∈ E,
(5.154) $f(x + y) = \sum_{|\alpha| \le k} \frac{\partial^\alpha f(x)}{\alpha!}\, y^\alpha + o(\|y\|^k) \qquad\text{as } y \to 0.$

Proof. Let x ∈ E and δ > 0 be small enough so that U = Bδ (x) ⊂ E. By Taylor’s


theorem we have for every y with x + y ∈ U that
(5.155) $f(x+y) = \sum_{|\alpha| \le k-1} \frac{\partial^\alpha f(x)}{\alpha!}\, y^\alpha + \sum_{|\alpha| = k} \frac{\partial^\alpha f(x + \theta y)}{\alpha!}\, y^\alpha = \sum_{|\alpha| \le k} \frac{\partial^\alpha f(x)}{\alpha!}\, y^\alpha + \sum_{|\alpha| = k} \frac{\partial^\alpha f(x + \theta y) - \partial^\alpha f(x)}{\alpha!}\, y^\alpha$

for some θ ∈ [0, 1]. Since ∂ α f is continuous for every |α| = k, it holds that
(5.156) |∂ α f (x + θy) − ∂ α f (x)| → 0 as y → 0.
Also |y α | = |y1 |α1 · · · |yn |αn ≤ kykα1 +···+αn = kyk|α| , so
(5.157) $\sum_{|\alpha| = k} \frac{\partial^\alpha f(x + \theta y) - \partial^\alpha f(x)}{\alpha!}\, y^\alpha = o(\|y\|^k).$



Definition 5.42. Let E ⊂ Rn be open and f ∈ C 2 (E). We define the Hessian


matrix of f at x ∈ E by
(5.158) $D^2 f|_x = (\partial_i \partial_j f(x))_{i,j=1,\ldots,n} = \begin{pmatrix} \partial_1^2 f(x) & \cdots & \partial_1 \partial_n f(x) \\ \vdots & \ddots & \vdots \\ \partial_n \partial_1 f(x) & \cdots & \partial_n^2 f(x) \end{pmatrix} \in \mathbb{R}^{n\times n}.$
We call det D2 f |x the Hessian determinant of f at x ∈ E.
Sometimes the term Hessian is used for both, the matrix and its determinant. By
Theorem 5.35 the Hessian matrix is symmetric.
Corollary 5.43. Let E ⊂ Rn be open, f ∈ C 2 (E) and x ∈ E. Then
(5.159) f (x + y) = f (x) + h∇f (x), yi + 12 hy, D2 f |x yi + o(kyk2 ) as y → 0.
(Here $\langle x, y\rangle = \sum_{i=1}^{n} x_i y_i$ denotes the inner product of two vectors x, y ∈ Rⁿ.)

Proof. By Corollary 5.41,


(5.160) $f(x + y) = f(x) + \sum_{|\alpha| = 1} \frac{\partial^\alpha f(x)}{\alpha!}\, y^\alpha + \sum_{|\alpha| = 2} \frac{\partial^\alpha f(x)}{\alpha!}\, y^\alpha + o(\|y\|^2) \qquad\text{as } y \to 0.$

We have
(5.161) $\sum_{|\alpha| = 1} \frac{\partial^\alpha f(x)}{\alpha!}\, y^\alpha = \sum_{i=1}^{n} \partial_i f(x)\, y_i = \langle \nabla f(x), y\rangle.$

If |α| = 2 then either α = 2ei for some i ∈ {1, . . . , n} or α = ei + ej for some


1 ≤ i < j ≤ n. Thus,
(5.162) $\sum_{|\alpha| = 2} \frac{\partial^\alpha f(x)}{\alpha!}\, y^\alpha = \tfrac{1}{2}\sum_{i=1}^{n} \partial_i^2 f(x)\, y_i^2 + \sum_{1\le i<j\le n} \partial_i \partial_j f(x)\, y_i y_j = \tfrac{1}{2} \sum_{i,j=1}^{n} \partial_i \partial_j f(x)\, y_i y_j$
(5.163) $= \tfrac{1}{2} \sum_{i=1}^{n} y_i\, (D^2 f|_x y)_i = \tfrac{1}{2} \langle y, D^2 f|_x y\rangle.$
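Corollary 5.43 can be tested numerically: for small y the quadratic polynomial f(x) + ⟨∇f(x), y⟩ + ½⟨y, D²f|_x y⟩ should approximate f(x + y) with an error that is o(‖y‖²). A small Python check for an ad hoc choice of f, x and direction v (all chosen only for this illustration; for this smooth f the error even decays like ‖y‖³):

import numpy as np

def f(x):
    return np.sin(x[0]) * np.exp(x[1])

def grad_f(x):
    return np.array([np.cos(x[0]) * np.exp(x[1]), np.sin(x[0]) * np.exp(x[1])])

def hess_f(x):
    return np.array([[-np.sin(x[0]) * np.exp(x[1]), np.cos(x[0]) * np.exp(x[1])],
                     [ np.cos(x[0]) * np.exp(x[1]), np.sin(x[0]) * np.exp(x[1])]])

x = np.array([0.3, -0.2])
v = np.array([1.0, 2.0])
for t in (1e-1, 1e-2, 1e-3):
    y = t * v
    quad = f(x) + grad_f(x) @ y + 0.5 * y @ hess_f(x) @ y
    print(t, abs(f(x + y) - quad))   # the error is o(||y||^2)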


Lecture 35 (Monday, November 25)

5. Local extrema
Let E ⊂ Rn be an open set and f : E → R a function.
Definition 5.44. A point a ∈ E is called a local maximum if there exists an open
set U ⊂ E with a ∈ U such that f (a) ≥ f (x) for all x ∈ U . It is called a strict local
maximum if f (a) > f (x) for all x ∈ U , x 6= a. We define the terms local minimum,
strict local minimum accordingly. A point is called a (strict) local extremum if it is a
(strict) local maximum or a (strict) local minimum.
Theorem 5.45. Suppose the partial derivative ∂i f exists on E. Then, if f has a
local extremum at a ∈ E, then ∂i f (a) = 0.
Proof. Let δ > 0 be such that a + tei ∈ E for all |t| ≤ δ. Define g : (−δ, δ) → R
by g(t) = f (a + tei ). By the chain rule, g is differentiable and g 0 (t) = ∂i f (a + tei ). Also,
0 is a local extremum of g so by Analysis I, 0 = g 0 (0) = ∂i f (a). 
Corollary 5.46. If f is differentiable at a and a is a local extremum, then ∇f (a) =
0.
Remark. ∇f (a) = 0 is not a sufficient condition for a to be a local extremum. Think
of saddle points.
Definition 5.47. If a ∈ E is such that ∇f (a) = 0, then we call a a critical point
of f .
Recall from linear algebra: A matrix A ∈ Rn×n is called positive definite if hx, Axi >
0 for all x ∈ Rn \{0} and positive semidefinite if hx, Axi ≥ 0 for all x ∈ Rn . We also
write A > 0 to express that A is positive definite and A ≥ 0 to express that A is positive
semidefinite. The terms negative definite, negative semidefinite are defined accordingly.
A is indefinite if it is not positive semidefinite and not negative semidefinite. Every real
symmetric matrix has real eigenvalues and there is an orthonormal basis of eigenvectors
(spectral theorem). A real symmetric matrix is positive definite if and only if all
eigenvalues are positive.
Theorem 5.48. Let f ∈ C 2 (E) and a ∈ E with ∇f (a) = 0. Then
(1) if D2 f |a > 0, then a is a strict local minimum of f ,
(2) if D2 f |a < 0, then a is a strict local maximum of f ,
(3) if D2 f |a is indefinite, then a is not a local extremum of f .
Remark. If D2 f |x is only positive semidefinite or negative semidefinite, then we need
more information to be able to decide whether or not a is a local extremum.
Proof. We write A = D2 f |a . Let ε > 0. By Corollary 5.43 there exists δ > 0 such
that for all y with kyk ≤ δ we have
(5.164) f (a + y) = f (a) + 21 hy, Ayi + r(y)
with |r(y)| ≤ εkyk2 .
(1): Let A be positive definite. Let S = {y ∈ Rn : kyk = 1}. S is compact, so the
continuous map y 7→ hy, Ayi attains its minimum on S. That is, there exists y0 ∈ S
such that
(5.165) hy0 , Ay0 i ≤ hy, Ayi

for all y ∈ S. Define α = hy0 , Ay0 i. Since y0 6= 0 and A is positive definite, α > 0. Let
y ∈ Rⁿ, y ≠ 0. Then $\frac{y}{\|y\|} \in S$, so
(5.166) $\alpha \le \Big\langle \frac{y}{\|y\|},\, A\frac{y}{\|y\|} \Big\rangle = \frac{1}{\|y\|^2}\, \langle y, Ay\rangle.$
Thus, hy, Ayi ≥ αkyk2 for all y ∈ Rn . Now we set ε = α4 . Then
(5.167)
f (a + y) ≥ f (a) + 12 hy, Ayi − α4 kyk2 ≥ f (a) + α2 kyk2 − α4 kyk2 = f (a) + α4 kyk2 > f (a)
if y 6= 0, kyk ≤ δ. Therefore a is a local minimum.
(2): Follows from (1) by replacing f by −f .
(3): Let A be indefinite. We need to show that in every open neighborhood of a there
exist points y 0 , y 00 such that
(5.168) f (y 00 ) < f (a) < f (y 0 ).
Since A is not negative semidefinite there exists ξ ∈ Rn such that α = hξ, Aξi > 0.
Then, for t ∈ R small enough such that |tξ| ≤ δ we have
(5.169) f (a + tξ) = f (a) + 12 htξ, Atξi + r(tξ) = f (a) + 21 αt2 + r(tξ).
Let ε > 0 be such that |r(tξ)| ≤ α4 t2 for all |tξ| ≤ δ (recall that δ depends on ε). Then
f (a + tξ) ≥ f (a) + 14 αt2 > f (a). Similarly, since A is also not positive semidefinite,
there exists η ∈ Rn such that hη, Aηi < 0 and for small enough t, f (a + tη) < f (a). 
Examples 5.49. (1) Let f(x, y) = c + x² + y² for c ∈ R. Then
(5.170) $D^2 f|_0 = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} > 0$
and 0 is a strict local minimum of f (even a global minimum).
(2) Let f(x, y) = c + x² − y² for c ∈ R. Then
(5.171) $D^2 f|_0 = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}$
is indefinite and 0 is not a local extremum of f .
(3) Let f₁(x, y) = x² + y⁴, f₂(x, y) = x², f₃(x, y) = x² + y³. Then
(5.172) $D^2 f_i|_0 = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \ge 0,$
but f1 has a strict local minimum at 0, f2 has a (non-strict) local minimum at
0 and f3 has no local extremum at 0.
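In concrete examples the definiteness of D²f|_a is usually decided by computing its eigenvalues: a real symmetric matrix is positive definite if and only if all eigenvalues are positive, negative definite if and only if all are negative, and indefinite if eigenvalues of both signs occur. A Python sketch applying this to the Hessians from Examples 5.49 (the matrices are simply entered by hand):

import numpy as np

hessians = {
    "x^2 + y^2": np.array([[2.0, 0.0], [0.0, 2.0]]),
    "x^2 - y^2": np.array([[2.0, 0.0], [0.0, -2.0]]),
    "x^2 + y^4": np.array([[2.0, 0.0], [0.0, 0.0]]),
}

for name, H in hessians.items():
    ev = np.linalg.eigvalsh(H)          # eigenvalues of a symmetric matrix
    if np.all(ev > 0):
        verdict = "strict local minimum"
    elif np.all(ev < 0):
        verdict = "strict local maximum"
    elif np.any(ev > 0) and np.any(ev < 0):
        verdict = "no local extremum (indefinite)"
    else:
        verdict = "inconclusive (semidefinite)"
    print(name, ev, verdict)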

Optional topic (not relevant for exams)

6. Optimization and convexity*


In applications it is often desirable to minimize a given function f : E → R, i.e. to
find x∗ ∈ E such that f (x∗ ) ≤ f (x) for all x ∈ E. We call such a point x∗ a global
minimum of f . We say that x∗ is a strict global minimum if f (x∗ ) < f (x) for all x 6= x∗ .

Example 5.50 (Linear regression). Say we are given finitely many points
(5.173) (x1 , y1 ), . . . , (xN , yN ) ∈ Rn × R.
Suppose for instance that these represent measurements or observations of some physical
system. For example, xi could represent a point in space and yi the corresponding air
pressure measurement. We are looking to discover a “hidden relation” between the x
and y coordinates. That is, we are looking for a function F : Rn → R such that F (xi )
is (at least roughly) yi . One way this is done is linear regression. Here we search only
among F that take the form
(5.174) Fa,b (x) = hx, ai + b
with some parameters a ∈ Rn , b ∈ R. That is, we are trying to “model” the hidden
relation by an affine linear function. The task is now to find the parameters a, b such
that Fa,b “fits best” to the given data set. To make this precise we introduce the error
function
(5.175) $E(a, b) = \sum_{i=1}^{N} (F_{a,b}(x_i) - y_i)^2.$

The problem of linear regression is to find the parameters (a, b) such that E(a, b) is
minimal.
One approach to minimizing a function f : E → R is to solve the equation ∇f (x) =
0, i.e. to find all critical points. By Corollary 5.46 we know that every minimum must
be a critical point. However, it is often difficult to solve that equation, so more practical
methods are needed.
Gradient descent. Choose x0 ∈ Rn arbitrary and let
(5.176) xn+1 = xn − αn ∇f (xn )
where αn > 0 is a small enough number to be determined later. The idea of this
iteration is to keep moving into the direction where f decreases the fastest. Sometimes
this simple process successfully converges to a minimum and sometimes it doesn’t,
depending on f , x0 and αn . What we can say from the definition is that, if f ∈ C 1 (E)
and (xn )n converges, then the limit is a critical point of f . The following lemma gives
some more hope.
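A direct Python transcription of the iteration (5.176) with a constant step size α_n = α; the quadratic objective, the starting point and the step size below are ad hoc choices for the illustration (for this convex example the iterates do converge to the unique global minimum):

import numpy as np

def grad_descent(grad_f, x0, alpha, n_steps):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - alpha * grad_f(x)     # x_{n+1} = x_n - alpha * grad f(x_n)
    return x

# f(x) = (x_1 - 1)^2 + 2 (x_2 + 3)^2, so grad f(x) = (2(x_1 - 1), 4(x_2 + 3))
grad_f = lambda x: np.array([2*(x[0] - 1), 4*(x[1] + 3)])
print(grad_descent(grad_f, [0.0, 0.0], alpha=0.1, n_steps=500))   # approximately (1, -3)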
Lemma 5.51. Let f ∈ C 1 (E). Then, for every x ∈ E and small enough α > 0,
(5.177) f (x − α∇f (x)) ≤ f (x).
Proof. By the definition of total derivatives,
(5.178) f (x−α∇f (x)) = f (x)+h∇f (x), −α∇f (x)i+o(α) = f (x)−αk∇f (x)k2 +o(α)
which is ≤ f (x) provided that α > 0 is small enough. 
Remark. Note that the smallness of α in this lemma depends on the point x. Also, this
result is not enough to prove anything about the convergence of gradient descent.
We will see that gradient descent works well if f is a convex function.
Definition 5.52. Let E ⊂ Rn be convex. A function f : E → R is called convex if
(5.179) f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y)
for all x, y ∈ E, t ∈ [0, 1]. f is called strictly convex if
(5.180) f (tx + (1 − t)y) < tf (x) + (1 − t)f (y)

for all x 6= y ∈ E and t ∈ (0, 1).


Theorem 5.53. Let E ⊂ Rn be open and convex and f ∈ C 1 (E). Then f is convex
if and only if
(5.181) f (u + v) ≥ f (u) + h∇f (u), vi
for all u, u + v ∈ E.
Proof. ⇒: Fix u, u + v ∈ E. By convexity, for t ∈ [0, 1],
(5.182) f (u + tv) = f ((1 − t)u + t(u + v)) ≤ (1 − t)f (u) + tf (u + v).
By definition of the derivative,
(5.183) f (u + tv) = f (u) + t∇f (u)T v + r(t),
where $\lim_{t\to 0} \frac{r(t)}{t} = 0$. Thus,
(5.184) f (u) + th∇f (u), vi + r(t) ≤ (1 − t)f (u) + tf (u + v)
which implies
(5.185) $f(u) + \langle \nabla f(u), v\rangle - f(u+v) \le \frac{-r(t)}{t} \to 0 \qquad\text{as } t \to 0.$
Therefore f (u) + h∇f (u), vi ≤ f (u + v).
⇐: Let x, y ∈ E, t ∈ [0, 1]. Let u = tx + (1 − t)y and v = x − u. Then the assumption
implies
(5.186) f (x) ≥ f (u) + h∇f (u), x − ui.
On the other hand, letting v = y − u, the assumption implies
(5.187) f (y) ≥ f (u) + h∇f (u), y − ui.
Therefore
(5.188) tf (x) + (1 − t)f (y) ≥ t(f (u) + h∇f (u), x − ui) + (1 − t)(f (u) + h∇f (u), y − ui)

(5.189) = f (u) + h∇f (u), t(x − u) + (1 − t)(y − u)i = f (u) + h∇f (u), tx + (1 − t)y − ui.
Recalling that u = tx + (1 − t)y, we get
(5.190) tf (x) + (1 − t)f (y) ≥ f (u) = f (tx + (1 − t)y).

Theorem 5.54. Let E ⊂ Rn be open and convex and f ∈ C 2 (E). Then
(1) f is convex if and only if D2 f |x ≥ 0 for all x ∈ E,
(2) f is strictly convex if D2 f |x > 0 for all x ∈ E.
Proof. We only prove (1). The proof of (2) is very similar. Let f be convex. By
Taylor’s theorem, for u, u + tv ∈ E,
(5.191) f (u + tv) = f (u) + th∇f (u), vi + 21 t2 hD2 f |u v, vi + o(t2 )
and by Theorem 5.53,
(5.192) f (u + tv) ≥ f (u) + th∇f (u), vi.
Combining these two pieces of information we obtain
(5.193) $\tfrac{1}{2}\, t^2\, \langle D^2 f|_u v, v\rangle + o(t^2) \ge 0$

which implies hD2 f |u v, vi ≥ 0 for all v ∈ Rn .


Conversely, assume that D2 f |u ≥ 0 for all u ∈ E. By Taylor’s theorem, for all u, u+v ∈
E exists ξ ∈ E such that
(5.194) f (u + v) = f (u) + h∇f (u), vi + 12 hD2 f |ξ v, vi ≥ f (u) + h∇f (u), vi.
Therefore f is convex by Theorem 5.53. 
Remark. If f is strictly convex, then it does not follow that D2 f |x > 0 for all x.
Example 5.55. Let f : R → R, f (x) = x4 . Then D2 f |x = f 00 (x) = 12x2 which is 0
at x = 0, but f is strictly convex.
Theorem 5.56. Let E ⊂ Rn be open and convex and f ∈ C 2 (E). Then
(1) If f is convex, then every critical point of f is a global minimum.
(2) If f is strictly convex, then f has at most one critical point.
Remarks. 1. Convex functions may have more than one critical point. For instance,
the constant function f ≡ 0 is convex.
2. Conclusion (1) implies that if f is convex and gradient descent converges, then it
converges to a global minimum.
Proof. (1): Let ∇f (x∗ ) = 0. Then by Taylor’s theorem, for every x ∈ E there
exists ξ ∈ E such that
(5.195) f (x) = f (x∗ ) + h∇f (x∗ ), x − x∗ i + 12 hD2 f |ξ (x − x∗ ), x − x∗ i ≥ f (x∗ ).
| {z } | {z }
=0 ≥0

(2): Let x1 , x2 ∈ E be critical points of f . By (1), they are global minima. This implies
f (x1 ) = f (x2 ). If x1 6= x2 , then by strict convexity,
(5.196) $f(x_1) = \frac{f(x_1) + f(x_2)}{2} > f\Big( \frac{x_1 + x_2}{2} \Big).$
This is a contradiction to x1 being a global minimum. Therefore x1 = x2 . 
Example 5.57. If k · k is a norm on Rn , then the function x 7→ kxk is convex:
(5.197) ktx + (1 − t)yk ≤ tkxk + (1 − t)kyk
by the triangle inequality. Also, this function has a unique global minimum at x = 0.
Lemma 5.58. Let I ⊂ R, E ⊂ Rn be convex and suppose that
(1) f : E → I is convex, and
(2) g : I → R is convex and nondecreasing.
Then the function h : E → R given by h = g ◦ f is convex.
Proof. By convexity of f and since g is nondecreasing,
(5.198) h(tx + (1 − t)y) = g(f (tx + (1 − t)y)) ≤ g(tf (x) + (1 − t)f (y)).
Since g is convex this is
(5.199) ≤ tg(f (x)) + (1 − t)g(f (y)) = th(x) + (1 − t)h(y).

Corollary 5.59. If k · k is a norm on Rn , then the function x 7→ kxk2 is convex.

Example 5.60. Recall the error function from linear regression (Example 5.50):
(5.200) $E(a, b) = \sum_{i=1}^{N} (\langle a, x_i\rangle + b - y_i)^2.$
We claim that E : R^{n+1} → R is a convex function. We first rewrite E(a, b) into a
different form. Define a N × (n + 1) matrix M and a vector v ∈ Rn+1 by
(5.201) $M = \begin{pmatrix} x_{11} & \cdots & x_{1n} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ x_{N1} & \cdots & x_{Nn} & 1 \end{pmatrix} \in \mathbb{R}^{N\times(n+1)}, \qquad v = \begin{pmatrix} a_1 \\ \vdots \\ a_n \\ b \end{pmatrix} \in \mathbb{R}^{n+1},$
where xi = (xi1 , . . . , xin ) ∈ Rn for i = 1, . . . , N and a = (a1 , . . . , an ) ∈ Rn . Then
(5.202) $E(a, b) = E(v) = \sum_{i=1}^{N} ((Mv)_i - y_i)^2 = \|Mv - y\|^2,$
where $\|c\| = \big( \sum_{i=1}^{N} |c_i|^2 \big)^{1/2}$.

Let us rename variables and consider


(5.203) E(x) = kM x − yk2
for x ∈ Rn , M ∈ RN ×n , y ∈ RN . Let F : RN → R be defined by F (y) = kyk2 and
G : Rn → RN , G(x) = M x − y. We have
(5.204) ∂i F (y) = 2yi , so DF |y = 2y T ∈ R1×N .
and DG|x = M ∈ RN ×n . Therefore, by the chain rule we obtain
(5.205) DE|x = 2(M x − y)T M = 2(M x)T M − 2y T M = 2xT M T M − 2y T M ∈ R1×n .
Therefore,
(5.206) D2 E|x = (∂i DE|x )i=1,...,n = (2(M T M )i )i=1,...,n = 2M T M.
Notice that M T M is positive semidefinite because
(5.207) hM T M x, xi = hM x, M xi = kM xk2 ≥ 0.
Therefore E is convex by Theorem 5.54.
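To make this concrete: the convex function E(v) = ‖Mv − y‖² can be minimized either by solving the normal equations MᵀMv = Mᵀy (i.e. setting DE|_v = 0) or by gradient descent with ∇E(v) = 2Mᵀ(Mv − y). A Python sketch with synthetic data (all the data and parameters below are invented for the illustration):

import numpy as np

rng = np.random.default_rng(0)
N, n = 50, 2
X = rng.normal(size=(N, n))
y = X @ np.array([1.5, -2.0]) + 0.5 + 0.1 * rng.normal(size=N)   # "hidden" affine relation plus noise

M = np.hstack([X, np.ones((N, 1))])          # the matrix from (5.201)

# solution via the normal equations M^T M v = M^T y
v_exact = np.linalg.solve(M.T @ M, M.T @ y)

# gradient descent on E(v) = ||M v - y||^2 with gradient 2 M^T (M v - y)
v = np.zeros(n + 1)
alpha = 1.0 / (2 * np.linalg.norm(M, 2)**2)  # small enough constant step size
for _ in range(20000):
    v = v - alpha * 2 * M.T @ (M @ v - y)

print(v_exact)
print(v)   # both approximately (1.5, -2.0, 0.5)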
Example 5.61. Convex functions do not necessarily have a critical point. For
instance the function f : R → R, f (x) = x is convex, because D2 f |x = f 00 (x) = 0 for
all x ∈ R. But ∇f (x) = f 0 (x) = 1 6= 0 for all x ∈ R.
It is also not enough to assume strict convexity. For instance, the function f : R → R,
f (x) = ex is strictly convex, because f 00 (x) = ex > 0. But f 0 (x) = ex > 0 for all x ∈ R.
This motivates us to consider a stronger notion of convexity.
Definition 5.62. Let E ⊂ Rn be convex and open. Let f ∈ C 2 (E). We say that
f is strongly convex if there exists β > 0 such that
(5.208) hD2 f |x y, yi ≥ βkyk2
for all x ∈ E, y ∈ Rn .

Remarks. 1. f is strongly convex if and only if there exists β > 0 such that D2 f |x −βI ≥
0 for all x ∈ E. This follows directly from the definition using that βkyk2 = hβIy, yi.
The condition D2 f |x − βI ≥ 0 is equivalent to the smallest eigenvalue of D2 f |x be-
ing ≥ β. Yet another equivalent way of stating this is saying that the function
g(x) = f (x) − β2 kxk2 is convex. This is because D2 g|x = D2 f |x − βI.
2. If f is strongly convex, then f is strictly convex (by Theorem 5.54).
3. If f is strictly convex, then f is not necessarily strongly convex. For example con-
sider f : R → R, f (x) = ex . For every β > 0 there exists x ∈ R such that ex < β
because ex → 0 as x → −∞.

The following exercise shows that the assumption of strong convexity is not as
restrictive as it may seem at first sight: strictly convex functions are strongly convex
when restricted to compact sets.
Exercise 5.63. Suppose that f ∈ C 2 (Rn ) is strictly convex. Let K ⊂ Rn be
compact and convex. Show that there exist β− , β+ > 0 such that
(5.209) β− kyk2 ≤ hD2 f |x y, yi ≤ β+ kyk2
for all x ∈ K and y ∈ Rn . (In particular, f is strongly convex on K.)
Hint: Consider the minimal eigenvalue of D2 f |x as a function of x.
Theorem 5.64. Let E ⊂ Rn be open and convex. Let f ∈ C 2 (E). Then f is
strongly convex if and only if there exists γ > 0 such that
(5.210) f (u + v) ≥ f (u) + h∇f (u), vi + γkvk2
for every u, u + v ∈ E.
Proof. ⇒: Let β > 0 be such that g(x) = f (x) − β2 kxk2 is convex. Then by
Theorem 5.53,
(5.211) g(u + v) ≥ g(u) + h∇g(u), vi = f (u) − β2 kuk2 + h∇f (u) − βu, vi
On the other hand,
(5.212) g(u + v) = f (u + v) − β2 ku + vk2
Thus,
(5.213)
f (u + v) ≥ f (u) + h∇f (u), vi + β2 (ku + vk2 − kuk2 − 2hu, vi) = f (u) + h∇f (u), vi + β2 kvk2 .
⇐: This follows in the same way from the converse direction of Theorem 5.53. 
Theorem 5.65. Let f ∈ C 2 (Rn ) be strongly convex. Then for every c ∈ R, the
sublevel set
(5.214) B = {x ∈ Rn : f (x) ≤ c}
is bounded.
Proof. By Theorem 5.64 we have
(5.215) f (x) ≥ f (0) + h∇f (0), xi + γkxk2 .
Therefore, limkxk→∞ f (x) = ∞. Suppose that B is unbounded. Then there would exist
a sequence (xn )n≥1 ⊂ B such that limn→∞ kxn k = ∞. But f (xn ) ≤ c, so f (xn ) 6→ ∞
as n → ∞. Contradiction! 

Theorem 5.66. Let f ∈ C 2 (Rn ) be strongly convex. Then there exists a unique
global minimum of f .
Proof. By the previous theorem, the set B = {x ∈ Rn : f (x) ≤ f (0)} is bounded.
Thus, there exists R > 0 such that B ⊂ BR = {x ∈ Rn : kxk ≤ R}. BR is compact,
so f attains its minimum on BR (0) at some point x∗ ∈ BR . Then f (x∗ ) ≤ f (x) for all
x ∈ BR . It remains to show f (x∗ ) ≤ f (x) for all x 6∈ BR . If x 6∈ BR , then x 6∈ B, so
f (x) > f (0). Also, 0 ∈ BR , so f (x∗ ) ≤ f (0) < f (x). 
We conclude this discussion by proving that gradient descent converges for strongly
convex functions.
Theorem 5.67. Let f ∈ C 2 (Rn ) be strongly convex and x0 ∈ Rn . Define
(5.216) xn+1 = xn − α∇f (xn ) for n ≥ 0.
If α is small enough, then (xn )n converges to the global minimum x∗ of f .
Remark. The restriction to f defined on Rn is only for convenience (the same is true
for Theorems 5.65 and 5.66).
Lemma 5.68. Let A ∈ Rn×n be a symmetric and positive definite matrix. Then the matrix norm kAkop = sup_{x6=0} kAxk/kxk is equal to the largest eigenvalue of A.
(Here kxk = (Σ_{i=1}^n |xi |2 )1/2 is the Euclidean norm.)
Proof. Let {v1 , . . . , vn } be an orthonormal basis of eigenvectors corresponding to eigenvalues λ1 , . . . , λn , respectively. Then
(5.217) kAxk = kΣ_{i=1}^n xi Avi k = kΣ_{i=1}^n xi λi vi k,
which by orthogonality is equal to (Σ_{i=1}^n |xi |2 λ2i )1/2 (use that kxk = (hx, xi)1/2 ). Thus
(5.218) kAxk = (Σ_{i=1}^n |xi |2 λ2i )1/2 ≤ max_{i=1,...,n} λi (Σ_{i=1}^n |xi |2 )1/2 = max_{i=1,...,n} λi kxk.
Let maxi=1,...,n λi = λi0 . We have shown that kAk ≤ λi0 . On the other hand,
(5.219) kAvi0 k = λi0 kvi0 k = λi0 ,
so kAk = supkxk=1 kAxk ≥ kAvi0 k = λi0 . 
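Lemma 5.68 is easy to test numerically. A minimal Python sketch (NumPy assumed; the matrix below is a made-up symmetric positive definite example) compares the operator norm with the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = B.T @ B + np.eye(4)                      # symmetric and positive definite

operator_norm = np.linalg.norm(A, 2)         # spectral norm: sup_{x != 0} ||Ax|| / ||x||
largest_eigenvalue = np.linalg.eigvalsh(A).max()

print(operator_norm, largest_eigenvalue)     # the two numbers agree up to rounding
```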
Proof of Theorem 5.67. Let α > 0. Define T (x) = x − α∇f (x). Then xn+1 =
T (xn ). We want T to be a contraction. For R > 0 define BR = {x ∈ Rn : kx − x∗ k ≤
R}. Let R > 0 be large enough such that x0 ∈ BR .
Claim. If α is small enough, then T is a contraction of BR .
Proof of claim. x∗ is a global minimum of f , so ∇f (x∗ ) = 0. Thus, T (x∗ ) = x∗ .
We have
(5.220) DT |x = I − αD2 f |x .
The largest eigenvalue of D2 f |x is a continuous function of x which is bounded on the
compact set BR . Therefore there exists γ > 0 such that
(5.221) hD2 f |x y, yi ≤ γkyk2

for all y ∈ Rn and x ∈ BR . By strong convexity,
(5.222) βkyk2 ≤ hD2 f |x y, yi ≤ γkyk2 .
In other words, the eigenvalues of D2 f |x are contained in the interval [β, γ] for all x ∈ BR . Let α = 1/(2γ). Then the eigenvalues of I − αD2 f |x are contained in
(5.223) [1 − γ/(2γ), 1 − β/(2γ)] = [1/2, 1 − β/(2γ)] ⊂ (0, 1).
Set c = 1 − β/(2γ).
(5.224) kI − αD2 f |x k ≤ c < 1.
Therefore, kT (x) − T (y)k ≤ ckx − yk for all x, y ∈ BR . It remains to show that
T (BR ) ⊂ BR . Let x ∈ BR . Then since T (x∗ ) = x∗ ,
(5.225) kT (x) − x∗ k = kT (x) − T (x∗ )k ≤ ckx − x∗ k ≤ cR ≤ R.

The claim now follows from the contraction principle (more precisely, from the same
argument used to prove the Banach fixed point theorem). 
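To illustrate Theorem 5.67, here is a minimal Python sketch of the iteration (5.216) for a strongly convex quadratic (NumPy assumed; the matrix A, the vector b and the step size rule are made-up illustrative choices; compare Exercises 5.92 and 5.93). For f (x) = (1/2)hAx, xi − hb, xi with A symmetric positive definite, the gradient is Ax − b, so the global minimum is the solution of Ax = b.

```python
import numpy as np

# Hypothetical strongly convex quadratic f(x) = 0.5*<Ax, x> - <b, x>.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric, positive definite
b = np.array([1.0, -1.0])

def grad_f(x):
    return A @ x - b

# Step size alpha = 1/(2*gamma), where gamma bounds the largest eigenvalue of A = D^2 f.
gamma = np.linalg.eigvalsh(A).max()
alpha = 1.0 / (2.0 * gamma)

x = np.zeros(2)                  # starting point x0
for _ in range(200):
    x = x - alpha * grad_f(x)    # x_{n+1} = x_n - alpha * grad f(x_n)

print(x, np.linalg.solve(A, b))  # the iterates approach the unique minimizer A^{-1} b
```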

7. Further exercises
Exercise 5.69. Show that there exists a unique (x, y) ∈ R2 such that cos(sin(x)) =
y and sin(cos(y)) = x.
Exercise 5.70. Let U ⊆ Rn be open and convex and f : U → R differentiable such
that ∂1 f (x) = 0 for all x ∈ U .
(i) Show that the value of f (x) for x = (x1 , . . . , xn ) ∈ U does not depend on x1 .
(ii) Does (i) still hold if we assume that U is connected instead of convex? Give a proof
or counterexample.
Exercise 5.71. A function f : Rn → R is called homogeneous of degree α ∈ R if
f (λx) = λα f (x) for all λ > 0 and x ∈ Rn . Suppose that f is differentiable. Then show
that f is homogeneous of degree α if and only if
(5.226) Σ_{i=1}^n xi ∂i f (x) = αf (x)
for all x ∈ Rn . Hint: Consider the function g(λ) = f (λx) − λα f (x).
Exercise 5.72. Define F : R2 → R2 by


(5.227) F (x, y) = (x4 − y 4 , exy − e−xy ).
(i) Compute the Jacobian of F .
(ii) Let p0 ∈ R2 and p0 6= (0, 0). Show that there exist open neighborhoods U, V ⊂ R2
of p0 and F (p0 ), respectively and a function G : V → U such that G(F (p)) = p for all
p ∈ U and F (G(p)) = p for all p ∈ V .
(iii) Compute DG|F (p0 ) .
(iv) Is F a bijective map?
Exercise 5.73. Let a ∈ R, a 6= 0 and E = {(x, y, z) ∈ R3 : a + x + y + z 6= 0} and
f : E → R3 defined by
 
(5.228) f (x, y, z) = ( x/(a + x + y + z), y/(a + x + y + z), z/(a + x + y + z) ).

(i) Compute the Jacobian determinant of f (that is, the determinant of the Jacobian
matrix).
(ii) Show that f is one-to-one and compute its inverse f −1 .
Exercise 5.74. Prove that there exists δ > 0 such that for all square matrices
A ∈ Rn×n with kA−Ik < δ (where I denotes the identity matrix) there exists B ∈ Rn×n
such that B 2 = A.
Exercise 5.75. Look at each of the following as an equation to be solved for x ∈ R
in terms of parameter y, z ∈ R. Notice that (x, y, z) = (0, 0, 0) is a solution for each of
these equations. For each one, prove that it can be solved for x as a C 1 -function of y, z
in a neighborhood of (0, 0, 0).
(a) cos(x)^2 − e^{sin(xy)^3} + x = z^2
(b) (x2 + y 3 + z 4 )2 = sin(x − y + z)
(c) x7 + yez x3 − x2 + x = log(1 + y 2 + z 2 )

Exercise 5.76. Let (t0 , y0 ) ∈ R2 , c ∈ R and define Y0 (t) = y0 ,


(5.229) Yn (t) = y0 + c ∫_{t0}^t s Yn−1 (s) ds.

Compute Yn (t) and Y (t) = limn→∞ Yn (t). Which initial value problem does Y solve?
Exercise 5.77. Consider the initial value problem
(5.230) y′(t) = e^{y(t)^2} − t y(t)^{−1} , y(1) = 1.
Find an interval I = (1 − h, 1 + h) such that this problem has a unique solution y in I.
Give an explicit estimate for h (it does not need to be best possible).
Exercise 5.78. Consider the initial value problem
(5.231) y′(t) = t + sin(y(t)), y(2) = 1.
Find the largest interval I ⊆ R containing t0 = 2 such that the problem has a unique
solution y in I.
Exercise 5.79. Let F be a smooth function on R2 (i.e. partial derivatives of all
orders exist everywhere and are continuous) and suppose that the initial value problem
y 0 = F (t, y), y(t0 ) = y0 has a unique solution y on the interval I = [t0 , t0 + a] with
y smooth on I. Let h > 0 be sufficiently small and define tk = t0 + kh for integers
0 ≤ k ≤ a/h.
Define a function yh recursively by setting yh (t0 ) = y0 and
(5.232) yh (t) = yh (tk ) + (t − tk )F (tk , yh (tk ))
for t ∈ (tk , tk+1 ] for integers 0 ≤ k ≤ a/h.
(i) From the proof of Peano’s theorem (Theorem 5.32) it follows that yh → y uniformly
on I as h → 0. Prove the following stronger statement: there exists a constant C > 0
such that for all t ∈ I and h > 0 sufficiently small,
(5.233) |y(t) − yh (t)| ≤ Ch.
Hint: The left hand side is zero if t = t0 . Use Taylor expansion to study how the error
changes as t increases from tk to tk+1 .

(ii) Let F (t, y) = λy with λ ∈ R a parameter. Explicitly determine y, yh and a value for C in (i).
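For orientation, here is a minimal Python sketch of the Euler approximation yh from this exercise in the special case F (t, y) = λy of part (ii) (the parameter values are made-up illustrative choices, not the ones the exercise asks for). The exact solution is y(t) = y0 eλ(t−t0 ), and halving h roughly halves the maximal error over the grid points, in line with the bound (5.233).

```python
import math

def euler(F, t0, y0, a, h):
    """Euler approximation y_h evaluated at the grid points t_k = t0 + k*h."""
    ts, ys = [t0], [y0]
    t, y = t0, y0
    while t + h <= t0 + a + 1e-12:
        y = y + h * F(t, y)      # y_h(t_{k+1}) = y_h(t_k) + h * F(t_k, y_h(t_k))
        t = t + h
        ts.append(t)
        ys.append(y)
    return ts, ys

lam, t0, y0, a = 1.0, 0.0, 1.0, 1.0           # hypothetical choices
F = lambda t, y: lam * y
exact = lambda t: y0 * math.exp(lam * (t - t0))

for h in (0.1, 0.05, 0.025):
    ts, ys = euler(F, t0, y0, a, h)
    err = max(abs(exact(t) - y) for t, y in zip(ts, ys))
    print(h, err)                              # the error decreases roughly linearly in h
```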

Exercise 5.80. Let us improve the approximation from Exercise 5.79. In the
context of that exercise, define a piecewise linear function yh∗ recursively by setting
yh∗ (t0 ) = y0 and
(5.234) yh∗ (t) = yh∗ (tk ) + (t − tk )G(tk , yh∗ (tk ), h),
for t ∈ (tk , tk+1 ] for integers 0 ≤ k ≤ a/h, where
(5.235) G(t, y, h) = (1/2)(F (t, y) + F (t + h, y + hF (t, y))).
Prove that there exists a constant C > 0 such that for all t ∈ I and h > 0 sufficiently
small,
(5.236) |y(t) − yh∗ (t)| ≤ Ch2 .
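A sketch of the improved scheme (same hedges as in the previous sketch): replacing F by the averaged slope G from (5.235) improves the observed error from roughly proportional to h to roughly proportional to h2.

```python
import math

def heun_max_error(F, t0, y0, a, h, exact):
    # y_h*(t_{k+1}) = y_h*(t_k) + h * G(t_k, y_h*(t_k), h) with G as in (5.235)
    t, y = t0, y0
    errs = []
    while t + h <= t0 + a + 1e-12:
        G = 0.5 * (F(t, y) + F(t + h, y + h * F(t, y)))
        y = y + h * G
        t = t + h
        errs.append(abs(exact(t) - y))
    return max(errs)

# Hypothetical test equation y' = y, y(0) = 1 on [0, 1], with exact solution e^t.
F = lambda t, y: y
exact = lambda t: math.exp(t)
for h in (0.1, 0.05, 0.025):
    print(h, heun_max_error(F, 0.0, 1.0, 1.0, h, exact))   # halving h divides the error by about 4
```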
Exercise 5.81. For a function f : [a, b] → R define
(5.237) I (f ) = ∫_a^b (1 + f ′ (t)2 )1/2 dt.
Let A = {f ∈ C 2 ([a, b]) : f (a) = c, f (b) = d}. Determine f∗ ∈ A such that
(5.238) I (f∗ ) = inf_{f ∈A} I (f ).

What is the geometric meaning of I (f ) and inf f ∈A I (f )?


Exercise 5.82. Let f, g : Rn → R be smooth functions (that is, all partial deriva-
tives exist to arbitrary orders and are continuous). Show that for all multiindices
α ∈ Nn0 ,
(5.239) ∂α (f · g)(x) = Σ_{β∈Nn0 : β≤α} \binom{α}{β} ∂β f (x) ∂α−β g(x)
for all x ∈ Rn , where \binom{α}{β} = α!/(β! (α − β)!) = (α1 ! · · · αn !)/(β1 ! · · · βn ! (α1 − β1 )! · · · (αn − βn )!).

Exercise 5.83. Let f : R2 → R be such that ∂1 ∂2 f exists everywhere. Does it


follow that ∂1 f exists? Give a proof or counterexample.
Exercise 5.84. Determine the Taylor expansion of the function
(5.240) f : (0, ∞) × (0, ∞) → R, f (x, y) = (x − y)/(x + y)
at the point (x, y) = (1, 1) up to order 2.
Exercise 5.85. Show that every continuous function f : [a, b] → [a, b] has a fixed
point.
Exercise 5.86. Let X be a real Banach space. Let B = {x ∈ X : kxk ≤ 1} and
∂B = {x ∈ X : kxk = 1}. Show that the following are equivalent:
(i) every continuous map f : B → B has a fixed point
(ii) there exists no continuous map r : B → ∂B such that r(b) = b for all b ∈ ∂B.
Exercise 5.87. Determine the local minima and maxima of the function
(5.241) f : R2 → R, f (x, y) = (4x^2 + y^2 ) e^{−x^2 − 4y^2} .

Exercise 5.88. Let E ⊂ Rn be open, f : E → R and x ∈ E. Assume that for y in


a neighborhood of 0 we have
(5.242) f (x + y) = Σ_{|α|≤k} cα y α + o(kykk )
as y → 0 and
(5.243) f (x + y) = Σ_{|α|≤k} c̃α y α + o(kykk )
as y → 0. Show that cα = c̃α for all |α| ≤ k.
Exercise 5.89. Let D = {(x, y) ∈ R2 : x2 + y 2 ≤ 1}. Determine the maximum
and minimum values of the function f : D → R, f (x, y) = 4x2 − 3xy.
Exercise 5.90. Let f ∈ C 2 (Rn ) and suppose that the Hessian of f is positive
definite at every point. Show that ∇f : Rn → Rn is an injective map.
Exercise 5.91. Let f ∈ C 2 (Rn ) be strongly convex. Show that ∇f : Rn → Rn is
a diffeomorphism (that is, show that it is differentiable, bijective and that its inverse is
differentiable).
Exercise 5.92. Let f (x) = (1/2)hAx, xi − hb, xi + c with A ∈ Rn×n and b ∈ Rn , c ∈ R.
Assume that A is symmetric and positive definite. Show that f has a unique global
minimum at some point x∗ and determine f (x∗ ) in terms of A, b, c.
Exercise 5.93. Prove that the point x∗ from Exercise 5.92 can be computed using
gradient descent: that is, if x0 ∈ Rn arbitrary and
(5.244) xn+1 = xn − α∇f (xn )
for n = 0, 1, 2, . . . , then the sequence (xn )n converges to x∗ for all starting points
x0 ∈ Rn , provided that α is chosen sufficiently small.
Exercise 5.94. Let D ⊂ R2 be a finite set. Define a function E : R3 → R by
(5.245) E(a, b, c) = Σ_{x∈D} (a x1^2 + b x1 + c − x2 )^2 .

(1) Show that E is convex.


(2) Does there exist a set D such that E is strongly convex? Proof or counterex-
ample.

Exercise 5.95. (a) Find a convex function that is not bounded from below.
(b) Find a strictly convex function that is not bounded from below.
(c) If a function is strictly convex and bounded from below, does it necessarily have a
critical point? (Proof or counterexample.)
Exercise 5.96. (a) Give an example of a convex function that is not continuous.
(b) Let f : (a, b) → R. Show that if f is convex, then f is continuous.
Exercise 5.97. Construct a strictly convex function f : R → R such that f is not
differentiable at x for every x ∈ Q.

Exercise 5.98. Let f ∈ C 2 (Rn ). Recall that we defined f to be strongly convex if


there exists β > 0 such that hD2 f |x y, yi ≥ βkyk2 for every x, y ∈ Rn . Show that f is
strongly convex if and only if there exists γ > 0 such that
(5.246) f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y) − γt(1 − t)kx − yk2
for all x, y ∈ Rn , t ∈ [0, 1].
(Consequently, that condition can serve as an alternative definition of strong convexity,
which is also valid if f is not C 2 .)
Exercise 5.99. (Recall Exercise 3.82 as motivation for this exercise.) Fix a function
σ ∈ C 1 (R) and define for x ∈ Rn , W ∈ Rm×n , v ∈ Rm ,
(5.247) µ(x, W, v) = Σ_{i=1}^m σ((W x)i ) vi .
Given a finite set of points D = {(x1 , y1 ), . . . , (xN , yN )} ⊂ Rn × R define
(5.248) E(W, v) = Σ_{i=1}^N (µ(xi , W, v) − yi )2 .
Is E necessarily convex? (Proof or counterexample.)


CHAPTER 6

The Baire category theorem*

Lecture 36 (Wednesday, November 27)


Let (X, d) be a metric space. Recall that the interior Ao of a set A ⊂ X is the set
of interior points of A, i.e. the set of all x ∈ A such that there exists ε > 0 such that
Bε (x) ⊂ A. A set A ⊂ X is dense if Ā = X. Note that A is dense if and only if for all
non-empty open sets U ⊂ X we have A ∩ U 6= ∅.
Definition 6.1. A set A ⊂ X is called nowhere dense if its closure has empty interior. In other words, if (Ā)o = ∅. Equivalently, A is nowhere dense if and only if Ā contains no non-empty open set.
Remarks. 1. A closed set A ⊂ X has empty interior if and only if Ac = X \ A is open
and dense. (This is because A is closed if and only if Ac is open and A has empty
interior if and only if Ac is dense.)
2. A is nowhere dense if and only if Ac contains an open dense set.
3. A is nowhere dense if and only if A is contained in a closed set with empty interior.
Example 6.2. The Cantor set
(6.1) C = [0, 1] \ ⋃_{`=0}^∞ ⋃_{k=0}^{3^` −1} ( (3k + 1)/3^{`+1} , (3k + 2)/3^{`+1} )

is a closed subset of [0, 1] and has empty interior. Therefore, it is nowhere dense.
Lemma 6.3. Suppose A1 , . . . , An ⊂ X are nowhere dense sets. Then ⋃_{k=1}^n Ak is nowhere dense.
Proof. Without loss of generality let n = 2. We need to show that Ā1 ∪ Ā2 has empty interior. Equivalently, setting Uk = (Āk )c for k = 1, 2, we show that U1 ∩ U2 is
dense. Let U ⊂ X be a non-empty open set. Then V1 = U ∩ U1 is open and non-empty,
because U1 is dense. Since U2 is also dense, V1 ∩ U2 = U ∩ (U1 ∩ U2 ) is non-empty, so
U1 ∩ U2 is dense. 
Also, a subset of a nowhere dense set is nowhere dense and the closure of a nowhere
dense set is nowhere dense.

However, countable unions of nowhere dense sets are not necessarily nowhere dense
sets.
Example 6.4. Enumerate the rationals as Q = {q1 , q2 , . . . }. For every k = 1, 2, . . . , the set Ak = {qk } is nowhere dense in R. But Q = ⋃_{k=1}^∞ Ak ⊂ R is not nowhere dense (it is dense!).
Definition 6.5. A set A ⊂ X is called meager (or of first category) in X if it is the
countable union of nowhere dense sets. A is called comeager (or residual or of second
category) if Ac is meager.

The above example shows that Q ⊂ R is meager. In fact, every countable subset of
R is meager (because single points are nowhere dense in R).
By definition, countable unions of meager sets are meager. The choice of the word
“meager” suggests that meager sets are somehow “small” or “negligible”. But how
“large” can meager sets be? For example, can X be meager? That is, can we write
the entire metric space X as a countable union of nowhere dense subsets? The Baire
category theorem will show that the answer is no, if X is complete.
Theorem 6.6 (Baire category theorem). In a complete metric space, meager sets
have empty interior. Equivalently, countable intersections of open dense sets are dense.
Corollary 6.7. Let X be a complete metric space and A ⊂ X a meager set. Then
A 6= X. In other words, X is not a meager subset of itself.
Example 6.8. The conclusion of the Baire category theorem fails if we drop the
assumption that X is complete: let X = Q with the metric inherited from R (so
d(p, q) = |p − q|). Then X is a meager subset of itself because it is countable and single
points are nowhere dense in X (X has no isolated points). But the interior of X is
non-empty, because X is open in X.
Example 6.9. Not every set with empty interior is meager: consider the irrational
numbers A = R \ Q. A has empty interior, because Ac = Q is dense. It is not meager,
because otherwise R = A ∪ Ac would be meager, which contradicts the Baire category
theorem.
Exercise 6.10. Another notion of “smallness” is the following:
Definition. A set A ⊂ R is called a Lebesgue null set if for every ε > 0 there exist
intervals I1 , I2 , . . . such that

[ ∞
X
(6.2) A⊂ Ij and |Ij | ≤ ε.
j=1 j=1

(Here |I| denotes the length of the interval I.)


Give an example of a comeager Lebesgue null set. (Recall that a set is called comeager
if its complement is meager.)
(This implies in particular that Lebesgue null sets are not necessarily meager and meager
sets are not necessarily Lebesgue null sets.)
For the proof of Theorem 6.6 we will need the following lemma.
Lemma 6.11. Let X be complete and A1 ⊃ A2 ⊃ · · · a decreasing sequence of
non-empty closed sets in X such that
(6.3) diam An = sup_{x,y∈An} d(x, y) −→ 0
as n → ∞. Then ⋂_{n=1}^∞ An is non-empty.
Proof of Lemma 6.11. For every n ≥ 1 we choose xn ∈ An . Then (xn )n is a
Cauchy sequence, because for all n ≥ m we have d(xn , xm ) ≤ diamAm → 0 as m → ∞.
Since X is complete, there exists x ∈ X such that limn→∞ xn = x. Let N ∈ N. Then
AN contains the sequence (xn )n≥N and since AN is closed, it must also contain the limit
of this sequence, so x ∈ AN . This proves that x ∈ ⋂_{N=1}^∞ AN . 

Proof of Theorem 6.6. Let (Un )n be open dense sets. We need to show that ⋂_{n=1}^∞ Un is dense. Let U ⊂ X be open and non-empty. It suffices to show that U ∩ ⋂_{n=1}^∞ Un is non-empty. Since U1 is open and dense, U ∩ U1 is open and non-empty. Choose a closed ball B(x1 , r1 ) ⊂ U ∩ U1 with r1 ∈ (0, 1). Then B(x1 , r1 ) ∩ U2 is open and non-empty (because U2 is dense), so we can choose a closed ball B(x2 , r2 ) ⊂ B(x1 , r1 ) ∩ U2 with r2 ∈ (0, 1/2). Iterating this process, we obtain a sequence of closed balls (B(xn , rn ))n such that B(xn , rn ) ⊂ B(xn−1 , rn−1 ) ∩ Un and rn ∈ (0, 1/n). By Lemma 6.11 there exists a point x contained in ⋂_{n=1}^∞ B(xn , rn ). Since B(xn , rn ) ⊂ U ∩ Un for all n ≥ 1, we have x ∈ U ∩ ⋂_{n=1}^∞ Un . 
The Baire category theorem has a number of interesting consequences.

Lecture 37 (Monday, December 2)

1. Nowhere differentiable continuous functions*


Theorem 6.12. Let A ⊂ C([0, 1]) be the set of all functions that are differentiable
at at least one point in [0, 1]. Then A is meager.
Proof. For n ∈ N we define An to be the set of all f ∈ C([0, 1]) such that there
exists t ∈ [0, 1] such that
(6.4) |(f (t + h) − f (t))/h| ≤ n
holds for all h ∈ R with t + h ∈ [0, 1]. Then
(6.5) A ⊂ ⋃_{n=1}^∞ An .
It suffices to show that each An is nowhere dense. We first prove that An is closed. Let
(fk )k ⊂ An be a sequence that converges to some f ∈ C([0, 1]). We show that f ∈ An .
Indeed, by assumption, there exists (tk )k ⊂ [0, 1] such that
(6.6) |(fk (tk + h) − fk (tk ))/h| ≤ n
holds for all k ≥ 1 if tk + h ∈ [0, 1]. By the Bolzano-Weierstrass theorem, we may
assume without loss of generality that (tk )k converges to some t ∈ [0, 1] (by passing to
a subsequence). Then, by continuity of f ,
(6.7) |(f (t + h) − f (t))/h| = lim_{k→∞} |(fk (tk + h) − fk (tk ))/h| ≤ n.
Therefore, f ∈ An and An is closed. Also, An has empty interior. Indeed, one can see
that C([0, 1]) \ An is dense because every f ∈ C([0, 1]) can be uniformly approximated
by a function that has arbitrarily large slope (think of “sawtooth” functions).
Exercise 6.13. Provide the details of this argument: show that An has empty
interior.

The Baire category theorem implies that A has empty interior. In other words, the
set of nowhere differentiable functions C([0, 1])\A is dense. In this sense, it is “generic”
behavior for continuous functions to be nowhere differentiable. In particular, we can
conclude that there exists f ∈ C([0, 1]) \ A (so f is nowhere differentiable) without
actually constructing such a function. On the other hand, one can also give explicit
examples of nowhere differentiable functions.
Example 6.14 (Weierstrass’ function). Consider the function f ∈ C([0, 1]) defined
as

(6.8) f (x) = Σ_{n=0}^∞ b^{−nα} sin(bn x),
where 0 < α < 1 and b > 1 are fixed. The function f is indeed continuous because
the series is uniformly convergent. In fact, f is the uniform limit of the sequence of
functions (fN )N considered in Exercise 2.44.
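To get a feeling for f, one can evaluate partial sums of (6.8) numerically. A small Python sketch (NumPy assumed; the particular values of α, b and N are made-up illustrative choices within the stated ranges):

```python
import numpy as np

def weierstrass_partial_sum(x, alpha=0.5, b=2.0, N=30):
    # f_N(x) = sum_{n=0}^{N} b^(-n*alpha) * sin(b^n * x)
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for n in range(N + 1):
        total += b ** (-n * alpha) * np.sin(b ** n * x)
    return total

xs = np.linspace(0.0, 1.0, 5)
print(weierstrass_partial_sum(xs))   # values of a uniform approximation to f at a few points
```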
Exercise 6.15. Show that f is nowhere differentiable.

2. Sets of continuity*
Definition 6.16. Let X, Y be metric spaces and f : X → Y a map. The set
(6.9) Cf = {x ∈ X : f is continuous at x} ⊂ X
is called the set of continuity of f . Similarly, X \ Cf is called the set of discontinuity
of f .
Example 6.17. Let f : R → R be defined by f (x) = 1 if x is rational and f (x) = 0
if x is irrational. Then Cf = ∅.
Example 6.18. Let f : R → R be defined by f (x) = x if x is rational and f (x) = 0
if x is irrational. Then Cf = {0}.
Example 6.19. Consider the function f : R → R defined as follows: we set f (0) = 1
and if x ∈ Q \ {0}, then we let f (x) = 1/q, where x = p/q with p ∈ Z, q ∈ N and the greatest common divisor of p and q equal to one. If x 6∈ Q, then we let f (x) = 0. We claim that Cf = R \ Q. Indeed, say x ∈ R \ Q and let pn /qn → x be a sequence of rationals in lowest terms. Then qn → ∞ (otherwise, a subsequence would have bounded denominators; the rationals with denominator bounded by a fixed constant form a closed set, so x would be rational). Since f (pn /qn ) = 1/qn → 0 and f vanishes at irrational points, this implies that f is continuous at x. On the other hand, say x ∈ Q. Set xn = x + √2/n. Then xn 6∈ Q because √2 6∈ Q, so f (xn ) = 0 for all n, so limn→∞ f (xn ) = 0, but f (x) 6= 0. Hence f
is not continuous at x.
It is natural to ask which subsets of X arise as the set of continuity of some function
on X. For instance, does there exist a function f : R → R such that Cf = Q ?
Definition 6.20. A set A ⊂ X is called an Fσ -set if it is a countable union of
closed sets. A set G ⊂ X is called a Gδ -set if it is a countable intersection of open sets.
These names are motivated historically. The F in Fσ is for fermé which is French
for closed. On the other hand, the G in Gδ is for Gebiet which is German for region.
Examples 6.21. 1. Every open set is a Gδ -set and every closed set is an Fσ -set.
2. Let x ∈ X. Then {x} is a Gδ -set: it is the intersection of the open balls B(x, 1/n).
3. Q ⊂ R is an Fσ -set, because Q = ⋃_{q∈Q} {q} (a countable union of closed sets).
Theorem 6.22. Let X and Y be metric spaces and f : X → Y a map. Then
Cf ⊂ X is a Gδ -set and X \ Cf is an Fσ -set.
Proof. Let f : X → Y be given. It suffices to show that Cf is a Gδ -set. For every
S ⊂ X we define the oscillation of f on S by
(6.10) ωf (S) = sup_{x,x′ ∈S} dY (f (x), f (x′ )) = diam f (S).

For a point x ∈ X we define the oscillation of f at x by


(6.11) ωf (x) = inf_{ε>0} ωf (B(x, ε)).

Then we have
(6.12) x ∈ Cf ⇐⇒ ωf (x) = 0
and we can write the set of continuity of f as

(6.13) Cf = ⋂_{n=1}^∞ {x ∈ X : ωf (x) < 1/n}.

We are done if we can show that Un = {x ∈ X : ωf (x) < 1/n} is open for every n ∈ N. Let x0 ∈ Un . Then ωf (x0 ) < 1/n. Therefore, there exists ε > 0 such that ωf (B(x0 , ε)) < 1/n. Let x ∈ B(x0 , ε/2). Then by the triangle inequality, B(x, ε/2) ⊂ B(x0 , ε). Therefore,
(6.14) ωf (x) ≤ ωf (B(x, ε/2)) ≤ ωf (B(x0 , ε)) < 1/n.
Thus, B(x0 , ε/2) ⊂ Un and so Un is open. 
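To make the oscillation ωf concrete, here is a tiny Python sketch (an ad hoc numerical approximation on a sample grid, not part of the argument above) that approximates ωf (B(x, ε)) for a step function; in accordance with (6.12), the infimum over ε stays positive exactly at the point of discontinuity.

```python
def oscillation(f, x, eps, samples=1000):
    # Approximates omega_f(B(x, eps)) = sup_{s, t in B(x, eps)} |f(s) - f(t)|
    pts = [x - eps + 2 * eps * i / (samples - 1) for i in range(samples)]
    vals = [f(p) for p in pts]
    return max(vals) - min(vals)

step = lambda t: 1.0 if t >= 0 else 0.0        # discontinuous at 0, continuous elsewhere

for eps in (1.0, 0.1, 0.01, 0.001):
    print(eps, oscillation(step, 0.0, eps), oscillation(step, 0.5, eps))
    # at x = 0 the oscillation stays 1, at x = 0.5 it tends to 0 as eps shrinks
```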
As a sample application of the Baire category theorem we now answer one of our
previous questions negatively:
Lemma 6.23. Q ⊂ R is not a Gδ -set. Consequently, there exists no function f :
R → R such that Cf = Q.
Proof. Suppose Q is a Gδ -set. Then R \ Q is an Fσ -set and therefore can be
written as a countable union of closed sets A1 , A2 , . . . . Since R \ Q has empty interior
(its complement Q is dense), An ⊂ R \ Q also has empty interior for every n. Thus An
is nowhere dense, so R \ Q is meager. But then R = Q ∪ (R \ Q) must be meager, which
contradicts the Baire category theorem. 
Remark. Observe that an Fσ -set is either meager or has non-empty interior: suppose
A ⊂ X is an Fσ -set with empty interior. Then it is a countable union of closed sets with
empty interior and therefore meager. Similarly, a Gδ -set is either comeager or not dense.

Remark. It is natural to ask if the converse of Theorem 6.22 is true in the following
sense: given a Gδ -set G ⊂ X, can we find a function f : X → R such that Cf = G ?
This cannot hold in general: suppose X contains an isolated point, that is X contains an
open set of the form {x}. Then necessarily x ∈ Cf , but x is not necessarily contained in
every possible Gδ -set. However, this turns out to be the only obstruction: if X contains
no isolated points, then for every Gδ -set G ⊂ X one can find f : X → R such that
Cf = G. For a very short proof of this, see S. S. Kim: A Characterization of the Set
of Points of Continuity of a Real Function. Amer. Math. Monthly 106 (1999), no. 3,
258—259.

Lecture 38 (Wednesday, December 4)

3. The uniform boundedness principle*


The following theorem is one of the cornerstones of functional analysis and is a
direct application of the Baire category theorem.
Theorem 6.24 (Banach-Steinhaus). Let X be a Banach space and Y a normed
vector space. Let F ⊂ L(X, Y ) be a family of bounded linear operators. Then
(6.15) sup_{T ∈F} kT xkY < ∞ for all x ∈ X ⇐⇒ sup_{T ∈F} kT kop < ∞.

In other words, a family of bounded linear operators is uniformly bounded if and only
if it is pointwise bounded.
This theorem is also called the uniform boundedness principle.
Proof. In the ’⇐’ direction there is nothing to show. Let us prove ’⇒’. Suppose
that supT ∈F kT xkY < ∞ for all x ∈ X. Define
(6.16) An = {x ∈ X : sup_{T ∈F} kT xkY ≤ n} ⊂ X.

An is a closed set: if (xk )k ⊂ An is a sequence with xk → x ∈ X, then since T


is continuous, kT xkY = limk→∞ kT xk kY ≤ n for all T ∈ F, so x ∈ An . Also, the
assumption supT ∈F kT xkY < ∞ for all x ∈ X implies that

(6.17) X = ⋃_{n=1}^∞ An .

By the Baire category theorem, X is not meager. Thus, there exists n0 ∈ N such that
An0 has non-empty interior. This means that there exists x0 ∈ An0 and ε > 0 such that
(6.18) B(x0 , ε) ⊂ An0 .
Let x ∈ X be such that kxkX ≤ ε. Then for all T ∈ F,
(6.19) kT xkY = kT (x0 − x) − T x0 kY ≤ kT (x0 − x)kY + kT x0 kY ≤ 2n0 .
Now we use the usual scaling trick: let x ∈ X satisfy kxkX = 1. Then
(6.20) kT xkY = ε−1 kT (εx)kY ≤ 2ε−1 n0 .
This implies
(6.21) sup_{T ∈F} kT kop = sup_{T ∈F} sup_{kxkX =1} kT xkY ≤ 2ε−1 n0 < ∞.


Example 6.25. If X is not complete, then the conclusion of the theorem may fail.
For instance, let X be the space of all sequences (xn )n ⊂ R such that at most finitely
many of the xn are non-zero. Equip X with the norm kxk∞ = supn∈N |xn |. Define
`n : X → R by `n (x) = nxn . `n is a bounded linear map because
(6.22) |`n (x)| = |nxn | ≤ nkxk∞ .
For every x ∈ X there exists Nx ∈ N such that xn = 0 for all n > Nx . This implies that
(6.23) sup_{n∈N} |`n (x)| = max{|`n (x)| : n = 1, . . . , Nx } < ∞.

But k`n kop ≥ n because |`n (en )| = n (where en denotes the sequence such that en (m) =
0 for every m 6= n and en (n) = 1). Thus,
(6.24) sup_{n∈N} k`n kop = ∞.

Remark. In the proof we only needed that X is not meager. This is true if X is
complete, but it may also be true for an incomplete space.
As a first application of the uniform boundedness principle we prove that the point-
wise limit of a sequence of bounded linear operators on a Banach space must be a
bounded linear operator.
Corollary 6.26. Let X be a Banach space and Y a normed vector space. Suppose
(Tn )n ⊂ L(X, Y ) is such that (Tn x)n converges to some T x for every x ∈ X. Then
T ∈ L(X, Y ).
Proof. Linearity of T follows from linearity of limits. It remains to show that T is
bounded. Let x ∈ X. Since (Tn x)n converges, we have supn kTn xkY < ∞ (convergent
sequences are bounded). By the Banach-Steinhaus theorem, there exists C ∈ (0, ∞)
such that kTn kop ≤ C for every n. Let x ∈ X. Then
(6.25) kT xkY = lim_{n→∞} kTn xkY ≤ CkxkX .

Remark. Note that in the context of Corollary 6.26 it does not follow that Tn → T in
L(X, Y ). For instance, let Tn : `1 → `1 and Tn (x) = xn en . Then Tn (x) → 0 as n → ∞
for every x ∈ `1 , but kTn kop = 1 for every n ∈ N, so Tn does not converge to 0 in
L(X, Y ).

Lecture 39 (Friday, December 6)

3.1. An application to Fourier series. Recall that for a 1-periodic continuous


function f : R → C we defined the partial sums of its Fourier series by
(6.26) SN f (x) = Σ_{n=−N}^N cn e2πinx = f ∗ DN (x),
where cn = ∫_0^1 f (t)e−2πint dt and DN (x) = Σ_{n=−N}^N e2πinx = sin(2π(N + 1/2)x)/ sin(πx) is the Dirichlet kernel (see Section 4).
The uniform boundedness principle directly implies the following:
Corollary 6.27. Let x0 ∈ R. There exists a 1-periodic continuous function f such
that the sequence (SN f (x0 ))N ⊂ C does not converge. That is, the Fourier series of f
does not converge at x0 .
In particular, this means that the Dirichlet kernels do not form an approximation
of unity. To see why this is a consequence of the uniform boundedness principle, we
first need to take another close look at the partial sums.
Lemma 6.28. There exists a constant c ∈ (0, ∞) such that for every N ∈ N,
(6.27) ∫_0^1 |DN (x)| dx ≥ c log(N ).

Proof. Since | sin(x)| ≤ |x|,
(6.28) ∫_0^1 |DN (x)| dx = ∫_0^1 |sin(2π(N + 1/2)x)| / |sin(πx)| dx ≥ π−1 ∫_0^1 |sin(2π(N + 1/2)x)| / x dx.
Changing variables 2π(N + 1/2)x 7→ x we see that the right hand side of this display equals
(6.29) π−1 ∫_0^{π(2N+1)} |sin(x)| / x dx = π−1 Σ_{k=0}^{2N} ∫_{πk}^{π(k+1)} |sin(x)| / x dx.
We have that
(6.30) Σ_{k=0}^{2N} ∫_{πk}^{π(k+1)} |sin(x)| / x dx ≥ Σ_{k=0}^{2N} ∫_{πk+π/2−π/100}^{πk+π/2+π/100} |sin(x)| / x dx ≥ c Σ_{k=0}^{2N} ∫_{πk+π/2−π/100}^{πk+π/2+π/100} dx / x.
Here we have used that | sin(x)| ≥ c for some positive number c whenever |x| is at most π/100 away from πk + π/2 for some integer k ∈ Z (indeed, | sin(x)| ≥ sin(π/2 − π/100) > 0 for such x). Since x 7→ 1/x is a decreasing function,
(6.31) ∫_{πk+π/2−π/100}^{πk+π/2+π/100} dx/x ≥ (π/50) · 1/(πk + π/2 + π/100) ≥ (1/50) · 1/(k + 1).
Thus,
(6.32) Σ_{k=0}^{2N} ∫_{πk+π/2−π/100}^{πk+π/2+π/100} dx/x ≥ (1/50) Σ_{k=0}^{2N} 1/(k + 1) ≥ (1/50) Σ_{k=0}^{2N} ∫_{k+1}^{k+2} dx/x = (1/50) ∫_1^{2N+2} dx/x = (1/50) log(2N + 2),
which implies the claim. 
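The logarithmic growth in Lemma 6.28 can be observed numerically. A minimal Python sketch (NumPy assumed; the Riemann-sum discretization and the chosen values of N are ad hoc) approximates kDN k1 using the closed formula DN (x) = sin(2π(N + 1/2)x)/ sin(πx) and compares it with log(N):

```python
import numpy as np

def dirichlet_L1_norm(N, samples=200001):
    # Riemann-sum approximation of the integral of |D_N(x)| over [0, 1],
    # where D_N(x) = sin(2*pi*(N + 1/2)*x) / sin(pi*x).
    x = (np.arange(samples) + 0.5) / samples        # midpoints, avoids x = 0 and x = 1
    DN = np.sin(2 * np.pi * (N + 0.5) * x) / np.sin(np.pi * x)
    return np.mean(np.abs(DN))

for N in (4, 16, 64, 256):
    print(N, dirichlet_L1_norm(N), np.log(N))       # ||D_N||_1 grows like a multiple of log N
```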

Let us denote the space of 1-periodic continuous functions f : R → C by C(T) (here


T = R/Z = S 1 is the unit circle, which is a compact metric space1). Then C(T) is a
Banach space. Fix x0 ∈ R. We can define a linear map TN : C(T) → C by
(6.33) TN f = SN f (x0 ).
Lemma 6.29. For every N ∈ N, TN : C(T) → C is a bounded linear map and
(6.34) kTN kop = kDN k1 .
(Here kDN k1 = ∫_0^1 |DN (x)| dx.)
Proof. For every f ∈ C(T) we have
(6.35) |TN f | = |f ∗ DN (x0 )| ≤ ∫_0^1 |f (x0 − t)DN (t)| dt ≤ kf k∞ ∫_0^1 |DN (t)| dt = kf k∞ kDN k1 .

Therefore, TN is bounded and kTN kop ≤ kDN k1 . To prove the lower bound we let
(6.36) f (x) = sgn(DN (x0 − x)).
While f is not a continuous function, it can be approximated by continuous functions
as the following exercise shows.
Exercise 6.30. Show that for every ε > 0 there exists g ∈ C(T) such that |g(t)| ≤ 1
for all t ∈ R and
(6.37) ∫_0^1 |f (t) − g(t)| dt ≤ ε/(2N + 1).
Hint: Modify the function f in a small enough neighborhood of each discontinuity; g
can be chosen to be a piecewise linear function.
So let ε > 0 and choose g ∈ C(T) as in the exercise. We have
(6.38) |TN f | = |f ∗ DN (x0 )| = ∫_0^1 sgn(DN (t))DN (t) dt = ∫_0^1 |DN (t)| dt = kDN k1 .
Moreover,
(6.39) |TN g| ≥ |TN f | − |TN (f − g)|.
The error term |TN (f − g)| can be estimated as follows:
(6.40) |TN (f − g)| ≤ ∫_0^1 |DN (x0 − t)||f (t) − g(t)| dt ≤ kDN k∞ ∫_0^1 |f (t) − g(t)| dt ≤ (2N + 1) · ε/(2N + 1) = ε,
so
(6.41) kTN kop ≥ |TN g| ≥ kDN k1 − ε.
Since ε > 0 was arbitrary, this implies kTN kop ≥ kDN k1 . 
Armed with this knowledge, we can now reveal Corollary 6.27 as a direct conse-
quence of Theorem 6.24. Indeed, we have that
(6.42) kTN kop = kDN k1 ≥ c log(N )
1The metric being the quotient metric inherited from R or the subspace metric induced by the
inclusion S 1 ⊂ R2 . These metrics are equivalent.

and therefore
(6.43) sup_{N ∈N} kTN kop = ∞.

So by Theorem 6.24 there must exist an f ∈ C(T) such that


(6.44) sup_{N ∈N} |TN f | = ∞.

In other words, (SN f (x0 ))N does not converge.

Remark. Continuous functions with divergent Fourier series can also be constructed
explicitly. The conclusion of Corollary 6.27 can be strengthened significantly: for every
Lebesgue null set A ⊂ T 2 there exists a continuous function whose Fourier series
diverges on A (see J.-P. Kahane, Y. Katznelson: Sur les ensembles de divergence des
séries trigonométriques, Studia Math. 26 (1966), 305–306.).
On the other hand, L. Carleson proved in 1966 that the Fourier series of a continuous
function must always converge almost everywhere (that is, everywhere except possibly
on a Lebesgue null set). This is a very deep result in Fourier analysis which is difficult
to prove (see M. Lacey, C. Thiele: A proof of boundedness of the Carleson operator,
Math. Res. Lett. 7 (2000), no. 4, 361—370 for a very elegant proof).

2See Exercise 6.10 for a definition on R; Lebesgue null sets of T are precisely the images of Lebesgue
null sets on R under the canonical quotient map R → R/Z = T.

Lecture 40 (Monday, December 9)

4. Kakeya sets*
Definition 6.31. We call a compact set A ⊂ Rn a Kakeya set if A contains a unit
line segment in every direction. That is, if for every v ∈ Rn with kvk = 1 there exists
x ∈ A such that x + tv ∈ A for all t ∈ [0, 1].
(Note that this is only an interesting concept if n ≥ 2.)
Example 6.32. Consider the disk A = {x ∈ R2 : kxk ≤ 1/2} of diameter 1. For every v ∈ R2 with kvk = 1 the diameter {−v/2 + tv : t ∈ [0, 1]} is contained in A, so A is a Kakeya set in R2 . The area of this disk is π/4.
Example 6.33. Let A be the compact set the boundary of which is the deltoid curve defined by γ(t) = ((1/2) cos(t) + (1/4) cos(2t), (1/2) sin(t) − (1/4) sin(2t)) for t ∈ R. It can be seen that A is a Kakeya set and has area π/8 (draw a picture).
Do there exist Kakeya sets in R2 with even smaller area? What is the smallest
possible “area” or “volume” of a Kakeya set in Rn ?
While we are not going to attempt a rigorous definition of the notion of “volume” for
an arbitrary subset of Rn at this point (this leads to a subject of its own, called measure
theory), we can easily make rigorous what we mean by a subset of “zero volume”.
Definition 6.34. A set in A ⊂ Rn is called a Lebesgue null set (or of Lebesgue
measure zero) if for every ε > 0 there exist (x1 , r1 ), (x2 , r2 ), . . . with xi ∈ Rn and ri > 0
such that
[∞ X∞
(6.45) A⊂ B(xi , ri ) and rin ≤ ε.
i=1 i=1
In other words, A is a Lebesgue null set if it can be covered by countably many balls
the combined volume of which can be made arbitrarily small. Intuitively, Lebesgue null
sets are sets of “volume zero”.
The surprising answer to our question on the smallest possible volume of Kakeya
sets is that Kakeya sets may have volume zero.
Theorem 6.35. Let n ≥ 2. There exists a compact set K ⊂ Rn such that K is a
Kakeya set and a Lebesgue null set.
Remark. Many explicit constructions of such sets have been described in the literature.
The first example (for the case n = 2) was given by Besicovitch in 1926. Therefore such
sets are also called Besicovitch sets.

We will give a non-constructive proof using the Baire category theorem. This proof
first appeared in T. W. Körner: Besicovitch via Baire, Stud. Math. 158 (2003), no.
1, 65–78.

To simplify the exposition we only consider the case n = 2, but the method can be
extended to any n ≥ 3. To apply the Baire category theorem, we need to work in a
complete metric space. Let K denote the set of all non-empty compact subsets of R2 .
We need to define a metric on K. For a point x ∈ Rn and a set A ∈ K we define
(6.46) d(x, A) = inf_{a∈A} ka − xk.

For A, B ∈ K we define
(6.47) d(A, B) = max( sup_{a∈A} d(a, B), sup_{b∈B} d(A, b) ).
This is called the Hausdorff metric.
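For finite sets the two suprema in (6.47) are maxima over finitely many points, so the Hausdorff distance can be computed directly. A small Python sketch (the two point sets are made-up examples):

```python
def hausdorff_distance(A, B):
    """Hausdorff distance between two finite non-empty sets of points in the plane."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    def d_point_to_set(p, S):
        return min(dist(p, q) for q in S)            # d(p, S) = inf_{q in S} |p - q|
    return max(max(d_point_to_set(a, B) for a in A),
               max(d_point_to_set(b, A) for b in B))

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.1), (1.0, 0.1), (0.5, 0.5)]
print(hausdorff_distance(A, B))   # here the distance is sqrt(0.5), roughly 0.707
```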
Exercise 6.36. Show that d is a metric on K and that (K, d) is a complete metric
space.
Consider the set of all P ∈ K with P ⊂ [−1, 1] × [0, 1] which are of the form
(6.48) P = ⋃_{i∈I} `i ,

where I is some index set and for each i ∈ I, there exist x1 , x2 ∈ [−1, 1] such that `i is
the line segment connecting the point (x1 , 0) to the point (x2 , 1). We define P ⊂ K to be
the set of all such P such that additionally for every |v| ≤ 1/2 there exist x1 , x2 ∈ [−1, 1]
such that x2 −x1 = v and the line segment connecting (x1 , 0) to (x2 , 1) is contained in P .

This definition ensures that sets P ∈ P are “almost” Kakeya sets in the sense that
while they do not contain a line segment in every direction, they do contain a line
segment pointing in every direction that makes a sufficiently small angle with the y-
axis. We can always produce a true Kakeya set from such a P by taking a finite union
of some rotated copies of P .
Exercise 6.37. Show that P is a closed subset of K (with respect to the Hausdorff
metric d).
This implies in particular that (P, d|P×P ) is a complete metric space. Thus, we are
done if we can show that there exists a set P ∈ P that has Lebesgue measure zero. We
will actually show the following stronger result.
Theorem 6.38 (Körner). The set
(6.49) B = {P ∈ P : P is a Lebesgue null set} ⊂ P
is comeager.
The Baire category theorem says that comeager subsets of complete metric spaces
are dense and in particular, non-empty.
To prove Theorem 6.38 it suffices to show that B contains a countable intersection
of open dense sets.
Let v ∈ [0, 1] and ε > 0. Then we define P(v, ε) ⊂ P to be the set of all P ∈ P such
that there exist finitely many intervals I1 , . . . , IN such that if y ∈ [0, 1] ∩ [v − ε, v + ε],
then
(6.50) {x : (x, y) ∈ P } ⊂ ⋃_{j=1}^N Ij and Σ_{j=1}^N |Ij | < 100ε.

Lemma 6.39. P(v, ε) ⊂ P is open and dense.


This is the main ingredient of the argument. Please refer to Lemma 2.4 in Körner’s
paper for the proof.
Now we show how this allows us to complete the proof of Theorem 6.38. Suppose
that
(6.51) P ∈ ⋂_{n∈N} ⋂_{r=0}^n P(r/n, 1/n).

Then by definition of Lebesgue null sets and (6.50),


(6.52) {x : (x, y) ∈ P } ⊂ [−1, 1]
is a Lebesgue null set for every y ∈ [0, 1]. This implies that P is a Lebesgue null set3.
Therefore,
(6.53) ⋂_{n∈N} ⋂_{r=0}^n P(r/n, 1/n) ⊂ B,

so B contains a countable intersection of open dense sets and is therefore comeager.


4.1. Box counting dimension. Consider a compact subset A ⊂ Rn . Let δ ∈
(0, 1) and define Nδ (A) to be the minimum number of balls of radius δ required to
cover the set A (it is clear that Nδ (A) is finite because A ⊂ Rn is totally bounded). We
are interested in the rate of growth of the number Nδ (A) as δ tends to zero.
Example 6.40. Let k ≤ n and let A denote a k-dimensional box in Rn :
(6.54)
A = [0, 1]k × {0}n−k = {x ∈ Rn : xj ∈ [0, 1] for 1 ≤ j ≤ k, xj = 0 for k < j ≤ n}.
Then there exist constants c, c0 > 0 such that
(6.55) c0 δ −k ≤ Nδ (A) ≤ cδ −k
for all δ ∈ (0, 1).
Exercise 6.41. Let A ⊂ Rn be a compact set. Show that there exists a constant
c ∈ (0, ∞) such that
(6.56) Nδ (A) ≤ c · δ −n
holds for all δ > 0.
Definition 6.42. Let A ⊂ Rn be a compact set. The upper box counting dimension
of A is defined as
(6.57) \overline{dim}(A) = lim sup_{δ→0} log(Nδ (A)) / log(1/δ).
Similarly, the lower box counting dimension of A is defined as
(6.58) \underline{dim}(A) = lim inf_{δ→0} log(Nδ (A)) / log(1/δ).
We always have
(6.59) 0 ≤ \underline{dim}(A) ≤ \overline{dim}(A) ≤ n.
The first two of these inequalities follow directly from the definitions and the third
inequality follows from Exercise 6.41. Each of these inequalities may be strict.
If \underline{dim}(A) = \overline{dim}(A) = d, then we say that d is the box counting dimension (or
Minkowski dimension) of A and write
(6.60) dim(A) = d.
3Thisis not obvious directly from the definitions. For the purpose of this discussion, we will take
this implication for granted. It follows directly from properties of the Lebesgue integral, more precisely,
Fubini’s theorem. Intuitively, if every “horizontal slice” of a subset of the plane has zero length in R,
then that subset of the plane has zero area.

The numbers \underline{dim}(A), \overline{dim}(A) do not depend on the norm on Rn used to form the
balls that appear in the definition of Nδ (A) (because the number Nδ (A) only changes
by a multiplicative constant when swapping out norms). The balls in the maximum
norm on Rn defined by kxk∞ = maxi=1,...,n |xi | look like boxes. This motivates the term
“box counting dimension”.

This notion of dimension coincides with our intuition about dimension. For in-
stance, the set A from Example 6.40 which we referred to as a “k-dimensional box”
actually has box counting dimension k. Note that there is no reason why the box
counting dimension of some given set A should always be an integer. In fact, there are
lots of compact sets with a non-integer box counting dimension. We refer to such sets
as fractals (because they have fractional dimension).
Example 6.43. Maybe the simplest example of a fractal is the Cantor set C ⊂ [0, 1]
(see (6.1)). In the iterative construction of the Cantor set, at the kth step we arrive at
a disjoint union of 2k closed intervals each of which has length 3−k . Thus
(6.61) N_{(1/2)·3^{−k}} (C) = 2k .
Similarly, Nδ (C) ≈ 2k where δ ∈ (0, 1) and k is such that 3−k−1 < δ ≤ 3−k . This shows that
(6.62) 0 < dim(C) = log(2)/ log(3) < 1.
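The value log(2)/ log(3) can be reproduced experimentally. The following Python sketch (an ad hoc numerical experiment, not part of the text) samples points of C via ternary expansions with digits 0 and 2 and counts how many grid intervals of side δ = 3−k are hit:

```python
import math
import random

def cantor_points(num_points=20000, depth=25):
    # Random points of the Cantor set: ternary expansions using only digits 0 and 2.
    pts = []
    for _ in range(num_points):
        x = 0.0
        for j in range(1, depth + 1):
            x += random.choice((0, 2)) * 3.0 ** (-j)
        pts.append(x)
    return pts

points = cantor_points()
for k in (2, 4, 6, 8):
    delta = 3.0 ** (-k)
    boxes = {int(x / delta) for x in points}      # grid intervals of side delta that are hit
    estimate = math.log(len(boxes)) / math.log(1.0 / delta)
    print(k, len(boxes), estimate)                # estimates approach log(2)/log(3), about 0.63
```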
Dimension is related to the notion of Lebesgue null sets in the following way.

Lemma 6.44. If A ⊂ Rn is a compact set such that dim(A) < n, then A is a


Lebesgue null set.
Proof. Let ν = (1/2)(n − dim(A)) > 0. By assumption, there exists a sequence (δm )m such that δm → 0 and for each m there exist open balls (B(xm,j , δm ))j=1,...,Nm covering A, where Nm ≤ δm^{−(n−ν)} . Then
(6.63) Σ_{j=1}^{Nm} δm^n = Nm δm^n ≤ δm^ν −→ 0 as m → ∞.


Example 6.45. It is not true that a Lebesgue null set in Rn necessarily has box
counting dimension n: take the set A = Q ∩ [0, 1] ⊂ R (A is not compact, but it
still makes sense to speak of its box counting dimension). It is not hard to show that
dim(A) = 1.
In view of this fact and the existence of Besicovitch sets, it is a natural instance of
our original question about the smallest possible “size” of a Kakeya set to ask whether
there exist Besicovitch sets in Rn that have a box counting dimension strictly smaller
than n. It is conjectured that the answer is ’no’ for all n.

Kakeya conjecture. Let K ⊂ Rn be a Kakeya set. Then dim(K) = n.

This is known to hold if n = 2 (and trivial if n = 1), but still widely open if n ≥ 3.
See Exercise 6.53 below for a walkthrough to a simple proof that dim(K) ≥ (n + 1)/2. Wolff (1995) proved that dim(K) ≥ (n + 2)/2. The currently best known results are as follows:


• n = 2: dim(K) = 2 (Davies 1971)
• n = 3: dim(K) ≥ 5/2 + 10^{−10} (Katz-Laba-Tao 1999)
• n = 4: dim(K) ≥ 3 + 10^{−10} (Laba-Tao 2000)
• 4 < n < 24: dim(K) ≥ (2 − √2)(n − 4) + 3 (Katz-Tao 2001)
• n ≥ 24: dim(K) ≥ n/α + (α − 1)/α, where α ∈ (1, 2) is such that α^3 − 4α + 2 = 0 (Katz-Tao 2001)
The Kakeya conjecture has many surprising connections to other open problems in
mathematics, in particular Fourier analysis.

5. Further exercises
Exercise 6.46. We define the subset A ⊂ R as follows: x ∈ A if and only if there
exists c > 0 such that
(6.64) |x − j2−k | ≥ c2−k
holds for all j ∈ Z and integers k ≥ 0. Show that A is meager and dense.
Exercise 6.47. Show that the set A from Exercise 6.46 is a Lebesgue null set.
Exercise 6.48. Let (X, d) be a complete metric space without isolated points.
Prove that X cannot be countable.
Exercise 6.49. (i) Show that if X is a normed vector space and U ⊂ X a proper
subspace, then U has empty interior.
(ii) Let
(6.65) X = {P : R → R | P is a polynomial}.
Use the Baire category theorem to prove that there exists no norm k · k on X such that
(X, k · k) is a Banach space.
(iii) Let X be an infinite dimensional Banach space. Prove that X cannot have a
countable (linear-algebraic) basis.
Exercise 6.50. Consider X = C([−1, 1]) with the usual norm kf k∞ = supt∈[−1,1] |f (t)|.
Let
(6.66) A+ = {f ∈ X : f (t) = f (−t) ∀t ∈ [−1, 1]},

(6.67) A− = {f ∈ X : f (t) = −f (−t) ∀t ∈ [−1, 1]}.


(i) Show that A+ and A− are meager.
(ii) Is A+ + A− = {f + g : f ∈ A+ , g ∈ A− } meager?
Exercise 6.51. Construct a function f : R → R such that f is continuous at every
x ∈ Z and discontinuous at every x 6∈ Z.
Exercise 6.52. For every interval (open, half-open or closed) I ⊂ R give an example
of a function f : R → R such that f is continuous on I and discontinuous on R \ I.
Exercise 6.53. Let 0 < δ ≪ 1 (say δ < 1/10) and n ≥ 2. A δ-tube is a rectangular box in Rn of dimensions 1 × δ × · · · × δ. We call a collection of δ-tubes δ-separated if
every two distinct tubes make an angle of at least δ. Let K ⊂ Rn be a Kakeya set and
denote by K(δ) its δ-neighborhood:
(6.68) K(δ) = {x ∈ Rn : dist(x, K) ≤ δ}.

Then K(δ) must contain a δ-tube in every direction. Let Tδ denote a maximal δ-
separated collection of δ-tubes contained in K(δ) (then Tδ contains roughly δ 1−n many
δ-tubes). If A ⊂ Rn is a finite union of δ-tubes, then we denote by vol(A) the volume
of A.
(i) Prove that there must exist a point x ∈ K(δ) such that the number of tubes
T ∈ Tδ such that x ∈ T is at least c/vol(∪Tδ ), where c > 0 is a constant
depending only on the dimension, n.
(ii) Conclude from (i) that there exists c > 0 such that for every δ ∈ (0, 1/10):
(6.69) vol(∪Tδ ) ≥ c · δ^{(n−1)/2}

(iii) Conclude from (ii) that
(6.70) dim(K) ≥ (n + 1)/2.
(iv) Suppose that for every ε > 0 there exists cε > 0 such that for every δ ∈ (0, 1/10)
we have
(6.71) vol(∪Tδ ) ≥ cε δ ε .
Show that this would imply the Kakeya conjecture: dim(K) = n.
(Here ∪Tδ = ⋃_{T ∈Tδ} T .)
