Lecture Notes
Joris Roos
UW Madison, Fall 2019
Last update: December 14, 2019
Contents
Chapter 1. Review
1. Metric spaces
2. Uniform convergence
3. Power series
4. Further exercises
Chapter 2. Compactness in metric spaces
1. Compactness and continuity
2. Sequential compactness and total boundedness
3. Equicontinuity and the Arzelà-Ascoli theorem
4. Further exercises
Chapter 3. Approximation theory
1. Polynomial approximation
2. Orthonormal systems
3. The Haar system
4. Trigonometric polynomials
5. The Stone-Weierstrass Theorem
6. Further exercises
Chapter 4. Linear operators and derivatives
1. Equivalence of norms
2. Dual spaces*
3. Sequential $\ell^p$ spaces*
4. Derivatives
5. Further exercises
Chapter 5. Differential calculus in $\mathbb{R}^n$
1. The contraction principle
2. Inverse function theorem and implicit function theorem
3. Ordinary differential equations
4. Higher order derivatives and Taylor’s theorem
5. Local extrema
6. Optimization and convexity*
7. Further exercises
Chapter 6. The Baire category theorem*
1. Nowhere differentiable continuous functions*
2. Sets of continuity*
3. The uniform boundedness principle*
4. Kakeya sets*
5. Further exercises
Disclaimer:
• This content is based on various sources, mainly Principles of Mathematical
Analysis by Walter Rudin, various individual lecture notes by Andreas Seeger,
and my own notes. For my own convenience, I will not reference sources
individually throughout these notes.
• These notes are likely to contain typos, errors and imprecisions of all kinds.
Possibly lots. Some might be deliberate, some less so. Don’t ever take anything
that you read in a mathematical text for granted. Think hard about what you
are reading and try to make sense of it independently. If that fails, then it’s
time to ask somebody a question. That usually helps. If you do notice a
mistake or an inaccuracy, feel free to let me know.
• Thanks to the students of Math 522 for many useful questions and remarks
that have improved these lecture notes.
Some recommended literature for further reading:
There are many books on mathematical analysis each of which likely has a large
intersection with this course. Here are two very good ones:
• W. Rudin, Principles of Mathematical Analysis
• T. Apostol, Mathematical analysis: A modern approach to advanced calculus
For further reading on Fourier series and trigonometric polynomials, see:
• E. M. Stein, R. Shakarchi, Fourier Analysis (modern and very accessible for
beginners)
• Y. Katznelson, Introduction to Harmonic Analysis (slightly more advanced)
• A. Zygmund, Trigonometric Series (a classic that continues to be relevant
today)
We will sometimes dip into concepts from functional analysis. For instance, expositions
of the Baire category theorem and its consequences are also contained in
• W. Rudin, Real and Complex Analysis (Chapter 5)
• E. M. Stein, R. Shakarchi, Functional Analysis (Chapter 4)
We roughly assume knowledge of the content of Rudin’s book up to the part of
Chapter 7 that precedes equicontinuity, but some of the material in previous chapters
will also be repeated (everything related to compactness for instance).
CHAPTER 1
Review
1. Metric spaces
Definition 1.0 (Metric space). A non-empty set X equipped with a map d :
X × X → [0, ∞) is called a metric space if for all x, y, z ∈ X,
(1) d(x, y) = d(y, x)
(2) d(x, z) ≤ d(x, y) + d(y, z)
(3) d(x, y) = 0 if and only if x = y
d is called a metric.
We will use the following notations for open and closed balls in X:
(1.1) $B(x_0, r) = \{x \in X : d(x, x_0) < r\}$, $\overline{B}(x_0, r) = \{x \in X : d(x, x_0) \le r\}$.
We write $\overline{B(x_0, r)}$ for the closure of $B(x_0, r)$. Note that $B(x_0, r) \subset \overline{B(x_0, r)} \subset \overline{B}(x_0, r)$,
but each of these inclusions may be proper.
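To see concretely that the inclusion of the closure of the open ball in the closed ball can be proper, here is a minimal sketch using the discrete metric on a three-point set. The set `X` and the helper names are illustrative choices, not notation from these notes.

```python
# Discrete metric on a finite set: d(x, y) = 0 if x == y, else 1.
X = {"a", "b", "c"}

def d(x, y):
    return 0 if x == y else 1

def open_ball(x0, r):
    return {x for x in X if d(x, x0) < r}

def closed_ball(x0, r):
    return {x for x in X if d(x, x0) <= r}

def closure(A):
    # x lies in the closure of a non-empty set A iff dist(x, A) = 0.
    # X is finite, so the infimum is a minimum.
    return {x for x in X if min(d(x, a) for a in A) == 0}

ball = open_ball("a", 1)          # only the centre
ball_closure = closure(ball)      # still only the centre
big_ball = closed_ball("a", 1)    # the whole space
```

Here the open ball of radius 1 is already closed, while the closed ball of radius 1 is all of X. The first inclusion is proper for any open interval in R, which is not closed.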
Should multiple metric spaces be involved we use subscripts on the metric and balls
to indicate which metric space we mean, i.e. dX refers to the metric of X and BX (x0 , r)
is a ball in the metric space X.
The most important examples of metric spaces for the purpose of this lecture are
R, C, Rn , subsets thereof and Cb (X), the space of bounded continuous functions on a
metric space X which will be introduced later.
Definition 1.1 (Convergence). Let X be a metric space, (xn )n ⊂ X a sequence
and x ∈ X. We say that (xn )n converges to x if for all ε > 0 there exists N ∈ N such
that for all n ≥ N it holds that d(xn , x) < ε.
Definition 1.2 (Continuity). Let X, Y be metric spaces. A map f : X → Y is
called continuous at x ∈ X if for all ε > 0 there exists δ > 0 such that if dX (x, y) < δ,
then dY (f (x), f (y)) < ε. f is called continuous if it is continuous at every x ∈ X. We
also write f ∈ C(X, Y ).
We assume familiarity with basic concepts of metric space topology except for compactness:
open sets, closed sets, limit points, closure, completeness, dense sets, connected sets, etc.
We will discuss compactness in metric spaces in detail in Section 2.
2. Uniform convergence
Definition 1.3. A sequence $(f_n)_n$ of functions on a metric space X is called uniformly
convergent to a function f if for all ε > 0 there exists N ∈ N such that for all n ≥ N
and all x ∈ X,
(1.2) $|f_n(x) - f(x)| < \varepsilon$.
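Compare this to pointwise convergence. To see the difference between the two it
helps to write down the two definitions using the symbolism of predicate logic:
Pointwise convergence: $\forall \varepsilon > 0 \;\, \forall x \in X \;\, \exists N \in \mathbb{N} \;\, \forall n \ge N : |f_n(x) - f(x)| < \varepsilon$.
Uniform convergence: $\forall \varepsilon > 0 \;\, \exists N \in \mathbb{N} \;\, \forall x \in X \;\, \forall n \ge N : |f_n(x) - f(x)| < \varepsilon$.
The only difference is the order of the quantifiers: in the uniform case N may not depend on x.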
Fact 1.7. Let X be a metric space. The space of bounded continuous functions
Cb (X) is a complete metric space with the supremum metric
(1.5) $d_\infty(f, g) = \sup_{x \in X} |f(x) - g(x)|$.
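The gap between pointwise and uniform convergence can be checked numerically. The sketch below uses the illustrative example $f_n(x) = x^n$ on X = [0, 1), chosen here and not taken from this passage. At every fixed x the values tend to 0, but at the moving point $x_n = 1 - 1/n$ the value stays above 1/4, so $d_\infty(f_n, 0)$ does not tend to 0.

```python
def f(n, x):
    return x ** n

# Pointwise: at a fixed x in [0, 1) the values f_n(x) tend to 0.
pointwise_values = [f(n, 0.5) for n in (1, 10, 20)]

# Not uniform: the witness point x_n = 1 - 1/n keeps |f_n(x_n) - 0|
# bounded away from 0, so sup_x |f_n(x)| cannot tend to 0.
def witness(n):
    return f(n, 1 - 1 / n)

witness_values = [witness(n) for n in (2, 10, 100, 1000)]
```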
3. Power series
A power series is a function of the form
(1.8) $f(x) = \sum_{n=0}^\infty c_n x^n$
(with the convention that if $\limsup_{n\to\infty} |c_n|^{1/n} = 0$, then R = ∞.)
(1.14) $\sin(x) = \sum_{n=0}^\infty \frac{(-1)^n}{(2n+1)!} x^{2n+1}$
Fact 1.20. The functions sin and cos are differentiable and
(1.15) $\sin'(x) = \cos(x)$, $\cos'(x) = -\sin(x)$
The trigonometric functions are related to the exponential function via complex
numbers.
Fact 1.21 (Euler’s identity). For all x ∈ R,
(1.16) $e^{ix} = \cos(x) + i \sin(x)$,
(1.17) $\cos(x) = \frac{e^{ix} + e^{-ix}}{2}$,
(1.18) $\sin(x) = \frac{e^{ix} - e^{-ix}}{2i}$.
Fact 1.22 (Pythagorean theorem). For all x ∈ R,
(1.19) $\cos(x)^2 + \sin(x)^2 = 1$.
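As a quick sanity check of (1.14), (1.16) and (1.19), the following sketch compares a truncated sine series and the complex exponential against the standard library functions. The test point `x = 0.7` and the number of terms are arbitrary choices made for illustration.

```python
import cmath
import math

def sin_partial(x, terms=20):
    # Partial sum of the power series (1.14).
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))

x = 0.7
series_gap = abs(sin_partial(x) - math.sin(x))
euler_gap = abs(cmath.exp(1j * x) - complex(math.cos(x), math.sin(x)))
pythagoras_gap = abs(math.cos(x) ** 2 + math.sin(x) ** 2 - 1)
```

All three gaps are at the level of floating point rounding error.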
Let us also recall basic properties of complex numbers at this point: For every
complex number z ∈ C there exist a, b ∈ R, r ≥ 0 and φ ∈ [0, 2π) such that
(1.20) $z = a + ib = re^{i\varphi}$.
The complex conjugate of z is defined by
(1.21) $\overline{z} = a - ib = re^{-i\varphi}$
The absolute value of z is defined by
(1.22) $|z| = \sqrt{a^2 + b^2} = r$
We have
(1.23) $|z|^2 = z\overline{z}$.
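[Figure: the point $z = a + ib = re^{i\varphi}$ in the complex plane C, with real part a, imaginary part b, modulus r and angle φ.]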
We finish the review section with a simple, but powerful theorem on the continuity
of power series on the convergence boundary.
Theorem 1.23 (Abel). Let $f(x) = \sum_{n=0}^\infty c_n x^n$ be a power series with radius of
convergence R = 1. Assume that $\sum_{n=0}^\infty c_n$ converges. Then
(1.24) $\lim_{x \to 1^-} f(x) = \sum_{n=0}^\infty c_n$.
(1.26) $= a_0(b_0 - b_1) + a_1(b_1 - b_2) + \cdots + a_{N-1}(b_{N-1} - b_N) + a_N b_N = a_N b_N + \sum_{n=0}^{N-1} a_n (b_n - b_{n+1})$
Proof. To apply summation by parts we set $s_n = \sum_{k=0}^n c_k$, $s_{-1} = 0$. Then
(1.27) $\sum_{n=0}^N c_n x^n = \sum_{n=0}^N (s_n - s_{n-1}) x^n = s_N x^N + (1 - x) \sum_{n=0}^{N-1} s_n x^n$.
(1.31) $\le (1 - x) \sum_{n=0}^N |s_n - s| x^n + (1 - x) \sum_{n=N+1}^\infty \overbrace{|s_n - s|}^{\le \varepsilon} x^n$
(1.32) $\le (1 - x) \sum_{n=0}^N |s_n - s| x^n + \varepsilon$.
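Abel's theorem can be illustrated numerically. The sketch below uses the example $c_n = (-1)^n/(n+1)$, chosen here for illustration: the series $\sum c_n$ converges to log 2 and the power series has radius of convergence 1. The truncation length `terms` is an assumption that is harmless for the sample points used.

```python
import math

def f(x, terms=50_000):
    # Truncated power series sum_{n < terms} (-1)^n x^n / (n + 1).
    total, power = 0.0, 1.0
    for n in range(terms):
        total += (-1) ** n * power / (n + 1)
        power *= x
    return total

# Distance between f(x) and the sum of the series at x = 1, as x -> 1-.
gaps = [abs(f(x) - math.log(2)) for x in (0.9, 0.99, 0.999)]
```

The gaps shrink as x approaches 1 from the left, as (1.24) predicts.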
4. Further exercises
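Exercise 1.26. Prove or disprove convergence for each of the following series (a
and b are real parameters and convergence may depend on their values).
(i) $\sum_{n=2}^\infty \frac{1}{n^a (\log(n))^b}$
(ii) $\sum_{n=3}^\infty (\log n)^{a \frac{\log n}{\log \log n}}$
(iii) $\sum_{n=1}^\infty \left( e^{1/n} - \frac{n+1}{n} \right)$
(iv) $\sum_{n=1}^\infty \cos(\pi n) \sin(\pi n^{-1})$
(v) $\sum_{n=2}^\infty \left( \left(1 + \frac{1}{n}\right)^n - e \right)^2$
(vi) $\sum_{n=1}^\infty \frac{1}{n (n^{1/n})^{100}}$
(vii) $\sum_{n=2}^\infty 2^{-(\log(n))^a}$
(viii) $\sum_{n=1}^\infty \sum_{k=0}^{10n} (-1)^k \frac{n^k}{k!}$
(ix) $\sum_{n=1}^\infty \frac{1}{n^2 (1 - \cos(n))}$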
Exercise 1.27. Prove or disprove convergence for each of the following sequences
and in case of convergence, determine the limit:
(i) $a_n = \sqrt{n^4 + \cos(n^2)} - n^2$
(ii) $a_n = n^2 + \frac{1}{2} n - \sqrt{n^4 + n^3}$
(iii) $a_n = \sum_{k=n}^{n^2} \frac{1}{k}$
(iv) $a_n = n \sum_{k=0}^\infty \frac{1}{n^2 + k^2}$
(v) $a_0 = 1$, $a_{n+1} = \frac{a_n}{2} + \frac{1}{a_n}$
(vi) $a_n = \prod_{k=2}^n \frac{k^2 - 1}{k^2}$
Exercise 1.28. For which x ∈ R do the following series converge? On which sets
do these series converge uniformly?
(1.36) (i) $\sum_{n=1}^\infty n^2 x^n$ (ii) $\sum_{n=1}^\infty (3^{1/n} - 1)^n x^n$ (iii) $\sum_{n=1}^\infty \tan(n^{-2}) e^{nx}$
(1.37) (iv) $\sum_{n=1}^\infty \frac{x^n}{n^n}$ (v) $\sum_{n=1}^\infty \frac{\sin(nx)}{n^2}$ (vi) $\sum_{n=1}^\infty 2^{-n} \tan(\lfloor x \rfloor + 1/n)$
CHAPTER 2
Compactness in metric spaces
The goal in this section is to study the general theory of compactness in metric
spaces. From Analysis I, you might already be familiar with compactness in R. By
the Heine-Borel theorem, a subset of $\mathbb{R}^n$ is compact if and only if it is bounded and
closed. We will see that this no longer holds in general metric spaces. We will also
study in detail compact subsets of the space of continuous functions C(K) where K is
a compact metric space (Arzelà-Ascoli theorem). Let (X, d) be a metric space. We first
review some basic definitions.
Definition 2.1. A collection $(G_i)_{i \in I}$ (I is an arbitrary index set) of open sets
$G_i \subset X$ is called an open cover of X if $X \subset \bigcup_{i \in I} G_i$.
(Clarification of notation: A ⊂ B means for us that A is a subset of B, not necessarily
a proper subset. That is, we also allow A = B. We will write A ⊊ B to refer to
proper subsets.)
Definition 2.2. X is compact if every open cover of X contains a finite subcover.
That is, if for every open cover $(G_i)_{i \in I}$ there exist m ∈ N and $i_1, \dots, i_m \in I$ such that
$X \subset \bigcup_{j=1}^m G_{i_j}$. This is also called the Heine-Borel property.
Thus $(U_i)_{i \in I}$ is an open cover of X and by compactness there exists a finite subcover
$\{U_{i_1}, \dots, U_{i_m}\}$. That is,
(2.8) $X \subset \bigcup_{k=1}^m U_{i_k}$
Consequently,
(2.9) $f(X) \subset \bigcup_{k=1}^m f(U_{i_k}) \subset \bigcup_{k=1}^m V_{i_k}$.
Exercise 2.22. Show that the closed and bounded set $\overline{B}(0, 1) \subset \ell^1$ is not compact.
[Figure 2: the balls $B_i$, $B_j$, $B_k$, $B_\ell$.]
[Figure 3: the points $p_1, p_2, \dots, p_{n_k}, \dots$ and p.]
This is a contradiction, because we assumed that the Bn are not contained in any
of the Gi .
Now let ε > 0 be such that every ε-ball is contained in one of the $G_i$. We have already
proven earlier that X is totally bounded if it is sequentially compact. Thus there exist
$p_1, \dots, p_M$ such that the balls $B(p_j, \varepsilon)$ cover X. But each $B(p_j, \varepsilon)$ is contained in a $G_i$,
say in $G_{i_j}$, so we have found a finite subcover:
(2.16) $X \subset \bigcup_{j=1}^M B(p_j, \varepsilon) \subset \bigcup_{j=1}^M G_{i_j}$.
Corollary 2.23. Compact subsets of metric spaces are bounded and closed.
Corollary 2.24. Let X be a complete metric space and A ⊂ X. Then A is totally
bounded if and only if it is relatively compact.
Exercise 2.25. Prove this.
(see Fact 1.7). Convergence with respect to d∞ is uniform convergence (see Fact 1.8).
In this section we ask ourselves when a subset F ⊂ C(K) is compact.
Example 2.26. Let $F = \{f_n : n \in \mathbb{N}\} \subset C([0, 1])$, where
(2.18) $f_n(x) = x^n$, x ∈ [0, 1].
F is not compact, because no subsequence of $(f_n)_n$ converges. This is because the
pointwise limit
(2.19) $f(x) = \begin{cases} 0, & x \in [0, 1), \\ 1, & x = 1 \end{cases}$
is not continuous, i.e. not in C([0, 1]).
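The failure of compactness in Example 2.26 can also be seen quantitatively. For m ≥ 2n and x ∈ [0, 1] we have $x^n - x^m \ge x^n - x^{2n}$, and the right hand side equals 1/4 at the point where $x^n = 1/2$. Hence $d_\infty(f_n, f_m) \ge 1/4$ whenever m ≥ 2n, so no subsequence is Cauchy. The sketch below evaluates this witness point; the function name `gap` is an illustrative choice.

```python
def gap(n):
    # Choose x with x**n == 1/2; then x**n - x**(2n) = 1/2 - 1/4 = 1/4.
    x = 0.5 ** (1.0 / n)
    return abs(x ** n - x ** (2 * n))

gaps = [gap(n) for n in (1, 5, 50, 500)]
```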
The key concept that characterizes compactness in C(K) is equicontinuity.
Definition 2.27 (Equicontinuity). A subset F ⊂ C(K) is called equicontinuous if
for every ε > 0 there exists δ > 0 such that |f (x) − f (y)| < ε for all f ∈ F, x, y ∈ K
with d(x, y) < δ.
Definition 2.28. F ⊂ C(K) is called uniformly bounded if there exists C > 0 such
that |f (x)| ≤ C for all x ∈ K and f ∈ F.
F ⊂ C(K) is called pointwise bounded if for all x ∈ K there exists C = C(x) > 0 such
that |f (x)| ≤ C for all f ∈ F.
Note that F ⊂ C(K) is uniformly bounded if and only if it is bounded (as a metric
space, see Definition 2.16). We have
(2.20) F uniformly bounded ⇒ F pointwise bounded.
The converse is false in general.
Fact 2.29. If (fn )n ⊂ C(K) is uniformly convergent (on K), then {fn : n ∈ N} is
equicontinuous.
Proof. Let ε > 0. By uniform convergence there exists N ∈ N such that
(2.21) $\sup_{x \in K} |f_n(x) - f_N(x)| \le \varepsilon/3$
for n ≥ N . By uniform continuity (using Theorem 2.10) there exists δ > 0 such that
(2.22) |fk (x) − fk (y)| ≤ ε/3
for all x, y ∈ K with d(x, y) < δ and all k = 1, . . . , N . Thus, for n ≥ N and x, y ∈ K
with d(x, y) < δ we have
(2.23) |fn (x)−fn (y)| ≤ |fn (x)−fN (x)|+|fN (x)−fN (y)|+|fN (y)−fn (y)| ≤ 3·ε/3 = ε.
for all sequences (cn )n with |cn | ≤ 1 and for all x ∈ [−1/2, 1/2]. Similarly,
(2.29) $F' = \Big\{ \sum_{n=1}^\infty n c_n x^{n-1} : |c_n| \le 1 \Big\}$
Proof of Corollary 2.32. Using the mean value theorem we see that for all
x, y ∈ [a, b] there exists ξ ∈ [a, b] such that
(2.41) $f(x) - f(y) = f'(\xi)(x - y)$.
But since F' is bounded there exists C > 0 such that
(2.42) $|f'(\xi)| \le C$
for all f ∈ F, ξ ∈ [a, b]. Thus,
(2.43) $|f(x) - f(y)| \le C|x - y|$
for all x, y ∈ [a, b] and all f ∈ F. This implies equicontinuity: for ε > 0 we set $\delta = C^{-1}\varepsilon$.
Then for x, y ∈ [a, b] with |x − y| < δ we have
(2.44) $|f(x) - f(y)| \le C|x - y| < C\delta = \varepsilon$.
Therefore the claim follows from Theorem 2.31.
Example 2.35. Condition (i) from Corollary 2.32 is necessary, because relatively
compact sets are bounded. Condition (ii) however is not necessary. Consider for example
$F = \{f_n : n = 1, 2, \dots\} \subset C([0, 1])$ with $f_n(x) = \sin(nx)/\sqrt{n}$. The set F is
bounded, but F' is unbounded. But the sequence $(f_n)_n$ is uniformly convergent, so by
Fact 2.29, F is equicontinuous and hence relatively compact.
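The two claims in Example 2.35 are easy to check numerically. The sketch below evaluates $f_n$ and $f_n'(x) = \sqrt{n}\cos(nx)$ on a grid in [0, 1]. The grid and the sample values of n are illustrative assumptions.

```python
import math

def f(n, x):
    return math.sin(n * x) / math.sqrt(n)

def f_prime(n, x):
    return math.sqrt(n) * math.cos(n * x)

grid = [i / 1000 for i in range(1001)]   # sample points in [0, 1]
ns = (1, 100, 10000)

# Grid suprema of |f_n| and |f_n'|.
sup_f = {n: max(abs(f(n, x)) for x in grid) for n in ns}
sup_f_prime = {n: max(abs(f_prime(n, x)) for x in grid) for n in ns}
```

The values of `sup_f` are bounded by $1/\sqrt{n}$ and tend to 0, while `sup_f_prime` is at least $\sqrt{n}$ (attained at x = 0) and is unbounded.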
4. Further exercises
Exercise 2.36. Let (X, d) be a metric space and A ⊂ X a subset.
(i) Show that A is totally bounded if and only if $\overline{A}$ is totally bounded.
(ii) Assume that X is complete. Show that A is totally bounded if and only if A is
relatively compact. Which direction is still always true if X is not complete?
Exercise 2.37. Let $\ell^1$ denote the space of all sequences $(a_n)_n$ of complex numbers such
that $\sum_{n=1}^\infty |a_n| < \infty$, equipped with the metric $d(a, b) = \sum_{n=1}^\infty |a_n - b_n|$.
(i) Prove that
(2.45) $A = \{a \in \ell^1 : \sum_{n=1}^\infty |a_n| \le 1\}$
is bounded and closed, but not compact.
(ii) Let $b \in \ell^1$ with $b_n \ge 0$ for all n ∈ N. Show that
(2.46) $B = \{a \in \ell^1 : |a_n| \le b_n \ \forall\, n \in \mathbb{N}\}$
is compact.
Exercise 2.38. Recall that $\ell^\infty$ is the metric space of bounded sequences of complex
numbers equipped with the supremum metric $d(a, b) = \sup_{n \in \mathbb{N}} |a_n - b_n|$. Let $s \in \ell^\infty$ be
a sequence of non-negative real numbers that converges to zero. Let
(2.47) $A = \{a \in \ell^\infty : |a_n| \le s_n \text{ for all } n\}$.
Prove that $A \subset \ell^\infty$ is compact.
Exercise 2.39. For each of the following subsets of C([0, 1]) prove or disprove
compactness:
(i) A1 = {f ∈ C([0, 1]) : maxx∈[0,1] |f (x)| ≤ 1},
(ii) A2 = A1 ∩ {p : p polynomial of degree ≤ d} (where d ∈ N is given)
(iii) A3 = A1 ∩ {f : f is a power series with infinite radius of convergence}
Exercise 2.40. Let F ⊂ C([a, b]) be a bounded set. Assume that there exists a
function ω : [0, ∞) → [0, ∞) such that
(2.48) $\lim_{t \to 0+} \omega(t) = \omega(0) = 0$.
The purpose of this exercise is to prove a theorem of Fréchet that characterizes
compactness in $\ell^p$. Let $F \subset \ell^p$.
(i) Assume that F is bounded and equisummable in the following sense: for all ε > 0
there exists N ∈ N such that
(2.51) $\sum_{n=N}^\infty |a_n|^p < \varepsilon$ for all a ∈ F.
Then show that F is totally bounded.
(ii) Conversely, assume that F is totally bounded. Then show that it is equisummable
in the above sense.
Hint: Mimic the proof of Arzelà-Ascoli.
Exercise 2.42. Let $C^k([a, b])$ denote the space of k-times continuously differentiable
functions on [a, b] endowed with the metric
(2.52) $d(f, g) = \sum_{j=0}^k \sup_{x \in [a,b]} |f^{(j)}(x) - g^{(j)}(x)|$.
Exercise 2.43. Let X be a metric space. Assume that for every continuous function
f : X → C there exists a constant Cf > 0 such that |f (x)| ≤ Cf for all x ∈ X. Show
that X is compact. Hint: Assume that X is not sequentially compact and construct
an unbounded continuous function on X.
Exercise 2.44. Consider $F = \{f_N : N \in \mathbb{N}\} \subset C([0, 1])$ with
(2.54) $f_N(x) = \sum_{n=0}^N b^{-n\alpha} \sin(b^n x)$,
where 0 < α < 1 and b > 1 are fixed.
(a) Show that F is relatively compact in C([0, 1]).
(b) Show that F' is not a bounded subset of C([0, 1]).
(c) Show that there exists c > 0 such that for all x, y ∈ R and N ∈ N,
(2.55) $|f_N(x) - f_N(y)| \le c|x - y|^\alpha$.
Exercise 2.45. Suppose (X, d) is a metric space with a countable dense subset, i.e.
a set $A = \{x_1, x_2, \dots\} \subset X$ with $\overline{A} = X$. Let $\ell^\infty$ denote the metric space of bounded
sequences $a = (a_n)_n$ of real numbers with metric $d_\infty(a, b) = \sup_{n \in \mathbb{N}} |a_n - b_n|$. Show
that there exists a map $\iota : X \to \ell^\infty$ with $d_\infty(\iota(x), \iota(y)) = d(x, y)$ for every x, y ∈ X (in
other words, X can be isometrically embedded into $\ell^\infty$).
CHAPTER 3
Approximation theory
1. Polynomial approximation
Theorem 3.1 (Weierstrass). For every continuous function f on [a, b] there exists
a sequence of polynomials that converges uniformly to f .
In other words, the theorem says that the set A = {p : p polynomial} is dense in
C([a, b]).
There are many proofs of this theorem in the literature. We present a proof using
Bernstein polynomials. Without loss of generality we consider only the interval [a, b] =
[0, 1] (why are we allowed to do that?).
(3.5) $II = \sum_{\substack{0 \le k \le n \\ |\frac{k}{n} - t| \ge \delta}} (f(k/n) - f(t)) \binom{n}{k} t^k (1 - t)^{n-k}$.
We estimate I and II separately. For I we have from uniform continuity that
(3.6) $|I| \le \varepsilon/2 \sum_{k=0}^n \binom{n}{k} t^k (1 - t)^{n-k} = \varepsilon/2$.
(3.8) $B_n g_1(t) = t$
(3.9) $B_n g_2(t) = t^2 + \frac{t - t^2}{n}$ for n ≥ 2
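Proof. We have
(3.10) $B_n g_0(t) = \sum_{k=0}^n \binom{n}{k} t^k (1 - t)^{n-k} = (t + (1 - t))^n = 1$
by the binomial theorem. Next,
(3.11) $B_n g_1(t) = \sum_{k=0}^n \frac{k}{n} \binom{n}{k} t^k (1 - t)^{n-k} = \sum_{k=1}^n \binom{n-1}{k-1} t^k (1 - t)^{n-k}$
$= t \sum_{k=0}^{n-1} \binom{n-1}{k} t^k (1 - t)^{(n-1)-k} = t (t + (1 - t))^{n-1} = t$.
To compute $B_n g_2$ we use that
(3.12) $\frac{k^2}{n^2} \binom{n}{k} = \frac{k}{n} \binom{n-1}{k-1} = \frac{n-1}{n} \frac{k-1}{n-1} \binom{n-1}{k-1} + \frac{1}{n} \binom{n-1}{k-1}$
$= \frac{n-1}{n} \binom{n-2}{k-2} + \frac{1}{n} \binom{n-1}{k-1}$.
Thus,
(3.13) $B_n g_2(t) = \frac{n-1}{n} \sum_{k=2}^n \binom{n-2}{k-2} t^k (1 - t)^{n-k} + \frac{1}{n} \sum_{k=1}^n \binom{n-1}{k-1} t^k (1 - t)^{n-k}$
$= \frac{n-1}{n} t^2 + \frac{1}{n} t = t^2 + \frac{t - t^2}{n}$.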
As a consequence, we obtain the following:
Lemma 3.3. For all t ∈ [0, 1],
(3.14) $\sum_{k=0}^n (\tfrac{k}{n} - t)^2 \binom{n}{k} t^k (1 - t)^{n-k} \le \frac{1}{n}$.
Proof. From the previous lemma,
(3.15) $\sum_{k=0}^n (\tfrac{k}{n} - t)^2 \binom{n}{k} t^k (1 - t)^{n-k} = B_n g_2(t) - 2t B_n g_1(t) + t^2 B_n g_0(t)$
$= t^2 + \frac{t - t^2}{n} - 2t^2 + t^2 = \frac{t - t^2}{n}$.
Since t ∈ [0, 1] we have $0 \le t - t^2 = t(1 - t) \le 1$.
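Now we are ready to estimate II. First note that f is bounded, so there exists c > 0
such that |f(x)| ≤ c for all x ∈ [0, 1]. Choose N ∈ N such that $2c\delta^{-2} N^{-1} \le \varepsilon/2$. Then
for all n ≥ N,
(3.16) $|II| \le 2c \sum_{\substack{0 \le k \le n \\ |\frac{k}{n} - t| \ge \delta}} \binom{n}{k} t^k (1 - t)^{n-k} \le 2c\delta^{-2} \sum_{k=0}^n (\tfrac{k}{n} - t)^2 \binom{n}{k} t^k (1 - t)^{n-k}$
$\le 2c\delta^{-2} N^{-1} \le \varepsilon/2$.
In the second inequality we have used that $\delta^{-2} |\tfrac{k}{n} - t|^2 \ge 1$ whenever $|\tfrac{k}{n} - t| \ge \delta$; the
third inequality follows from Lemma 3.3 and n ≥ N. Thus if n ≥ N and t ∈ [0, 1],
then
(3.17) $|B_n f(t) - f(t)| \le |I| + |II| \le \varepsilon/2 + \varepsilon/2 = \varepsilon$.
This concludes the proof of Weierstrass’ theorem.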
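The convergence $B_n f \to f$ can be observed numerically. The sketch below assumes the Bernstein polynomial $B_n f(t) = \sum_{k=0}^n f(k/n) \binom{n}{k} t^k (1-t)^{n-k}$, which is the form used in (3.10) and (3.11). The test function $f(t) = |t - 1/2|$ and the evaluation grid are illustrative choices.

```python
from math import comb

def bernstein(f, n, t):
    # B_n f(t) = sum_k f(k/n) * C(n, k) * t^k * (1 - t)^(n - k)
    return sum(f(k / n) * comb(n, k) * t ** k * (1 - t) ** (n - k)
               for k in range(n + 1))

def f(t):
    return abs(t - 0.5)

grid = [i / 200 for i in range(201)]   # sample points in [0, 1]

def sup_error(n):
    return max(abs(bernstein(f, n, t) - f(t)) for t in grid)

errors = [sup_error(n) for n in (4, 16, 64, 256)]

# Check formula (3.9) for g_2(t) = t^2 with n = 10.
g2_gap = max(abs(bernstein(lambda s: s * s, 10, t) - (t * t + (t - t * t) / 10))
             for t in grid)
```

The errors on the grid decrease as n grows, though slowly, which matches the $1/n$ bound in Lemma 3.3 entering only through the tail estimate.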
2. Orthonormal systems
In the previous section we studied approximation of continuous functions in the
supremum norm, $\|f\|_\infty = \sup_{x \in [a,b]} |f(x)|$. In this section we turn our attention to
another important norm, the $L^2$ norm.
Definition 3.4. For two piecewise continuous functions f, g on an interval [a, b] we
define their inner product by
(3.18) $\langle f, g \rangle = \int_a^b f(x) \overline{g(x)}\, dx$.
Proof. For nonnegative real numbers x and y we have the elementary inequality
(3.23) $xy \le \frac{x^2}{2} + \frac{y^2}{2}$.
Thus we have
(3.24) $|\langle f, g \rangle| \le \int_a^b |f(x) g(x)|\, dx \le \frac{1}{2} \int_a^b |f(x)|^2 dx + \frac{1}{2} \int_a^b |g(x)|^2 dx = \frac{1}{2} \langle f, f \rangle + \frac{1}{2} \langle g, g \rangle$.
Now we note that for every λ > 0, replacing f by λf and g by $\lambda^{-1} g$ does not change
the left hand side of this inequality. Thus we have for every λ > 0 that
(3.25) $|\langle f, g \rangle| \le \frac{\lambda^2}{2} \langle f, f \rangle + \frac{1}{2\lambda^2} \langle g, g \rangle$.
Now we choose λ so that this inequality is as strong as possible: $\lambda^2 = \sqrt{\frac{\langle g, g \rangle}{\langle f, f \rangle}}$ (we may
assume that $\langle f, f \rangle \ne 0$ because otherwise there is nothing to show). Then
(3.26) $|\langle f, g \rangle| \le \sqrt{\langle f, f \rangle} \sqrt{\langle g, g \rangle}$.
Note that one can arrive at this definition of λ in a systematic way: treat the right
hand side of (3.25) as a function of λ and minimize it using calculus.
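The minimization can be written out as follows. This is a sketch using the shorthand $A = \langle f, f \rangle$ and $B = \langle g, g \rangle$, which is introduced here for brevity, and assuming A > 0 and B > 0.

```latex
\begin{align*}
h(\lambda) &= \frac{\lambda^2}{2} A + \frac{1}{2\lambda^2} B, \qquad \lambda > 0, \\
h'(\lambda) &= \lambda A - \frac{B}{\lambda^3} = 0
  \iff \lambda^4 = \frac{B}{A}
  \iff \lambda^2 = \sqrt{\frac{B}{A}}, \\
h(\lambda)\Big|_{\lambda^2 = \sqrt{B/A}}
  &= \frac{1}{2}\sqrt{\frac{B}{A}}\, A + \frac{1}{2}\sqrt{\frac{A}{B}}\, B
   = \sqrt{A}\,\sqrt{B}.
\end{align*}
```

Since $h(\lambda) \to \infty$ as λ → 0+ and as λ → ∞, this critical point is the global minimum, and its value is exactly the right hand side of (3.26).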
Corollary 3.6 (Minkowski’s inequality). For two functions f, g ∈ pc([a, b]),
(3.27) $\|f + g\|_2 \le \|f\|_2 + \|g\|_2$.
Proof. We may assume $\|f + g\|_2 \ne 0$ because otherwise there is nothing to prove.
Then
(3.28) $\|f + g\|_2^2 = \int_a^b |f + g|^2 \le \int_a^b |f + g||f| + \int_a^b |f + g||g|$
Notice that the formula $c_n = \langle f, \phi_n \rangle$ still makes sense if f is not of the form (3.31).
Theorem 3.11. Let $(\phi_n)_n$ be an orthonormal system on [a, b]. Let f be a piecewise
continuous function. Consider
(3.34) $s_N(x) = \sum_{n=1}^N \langle f, \phi_n \rangle \phi_n(x)$.
In other words, the theorem says that among all functions of the form $\sum_{n=1}^N c_n \phi_n(x)$,
the function $s_N$ defined by the coefficients $c_n = \langle f, \phi_n \rangle$ is the best “$L^2$-approximation”
to f in the sense that (3.35) holds.
This can be interpreted geometrically: the function $s_N$ is the orthogonal projection
of f onto the subspace $X_N$. As in Euclidean space, the orthogonal projection is characterized
by being the point in $X_N$ that is closest to f and it is uniquely determined
by this property (see Figure 1).
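[Figure 1: f, its orthogonal projection $s_N$ onto $X_N$, another point $g \in X_N$, the distances $\|f - g\|_2$ and $\|f - s_N\|_2$, and the orthogonal complement $X_N^\perp$.]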
for all f .
Theorem 3.15. Let $(\phi_n)_n$ be an orthonormal system on [a, b]. Let $(s_N)_N$ be as
in Theorem 3.11. Then $(\phi_n)_n$ is complete if and only if $(s_N)_N$ converges to f in the
$L^2$-norm (that is, $\lim_{N \to \infty} \|f - s_N\|_2 = 0$) for every piecewise continuous f on [a, b].
We will later see that the orthonormal system $\phi_n(x) = e^{2\pi i n x}$ (n ∈ Z) on [0, 1] is
complete.
Proof of Theorem 3.11. Let $g \in X_N$ and write
(3.39) $g(x) = \sum_{n=1}^N b_n \phi_n(x)$.
Thus,
(3.43) $\langle f - g, f - g \rangle = \langle f, f \rangle - \langle f, g \rangle - \langle g, f \rangle + \langle g, g \rangle$
(3.44) $= \langle f, f \rangle - \sum_{n=1}^N c_n \overline{b_n} - \sum_{n=1}^N \overline{c_n} b_n + \sum_{n=1}^N |b_n|^2$
(3.45) $= \langle f, f \rangle - \sum_{n=1}^N |c_n|^2 + \sum_{n=1}^N |b_n - c_n|^2$
We have
(3.46) $\langle f - s_N, f - s_N \rangle = \langle f, f \rangle - \langle f, s_N \rangle - \langle s_N, f \rangle + \langle s_N, s_N \rangle$
$= \langle f, f \rangle - 2 \sum_{n=1}^N |c_n|^2 + \sum_{n=1}^N |c_n|^2 = \langle f, f \rangle - \sum_{n=1}^N |c_n|^2$.
Thus we have shown
(3.47) $\langle f - g, f - g \rangle = \langle f - s_N, f - s_N \rangle + \sum_{n=1}^N |b_n - c_n|^2$
which implies the claim since $\sum_{n=1}^N |b_n - c_n|^2 \ge 0$ with equality if and only if $b_n = c_n$
for all n = 1, . . . , N.
Proof of Theorem 3.12. From the calculation in (3.46),
(3.48) $\langle f, f \rangle - \sum_{n=1}^N |c_n|^2 = \langle f - s_N, f - s_N \rangle \ge 0$,
so $\sum_{n=1}^N |c_n|^2 \le \|f\|_2^2$ for all N. Letting N → ∞ this proves the claim (in particular,
the series $\sum_{n=1}^\infty |c_n|^2$ converges).
Proof of Theorem 3.15. From (3.46),
(3.49) $\|f - s_N\|_2^2 = \langle f, f \rangle - \sum_{n=1}^N |\langle f, \phi_n \rangle|^2$
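[Figure: the dyadic intervals of generations $D_0, D_1, D_2, \dots$ in [0, 1), with $D = \bigcup_{k \ge 0} D_k$.]
[Figure: a dyadic interval in [0, 1) with its left and right halves $I_\ell$ and $I_r$.]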
Lemma 3.19. (1) Two dyadic intervals are either disjoint or contained in each
other. That is, for every I, J ∈ D at least one of the following is true: I ∩ J = ∅
or I ⊂ J or J ⊂ I.
(2) For every k ≥ 0 the dyadic intervals of generation k are a partition of [0, 1).
That is,
(3.51) $[0, 1) = \bigcup_{I \in D_k} I$.
[Figure: graph of a Haar function $\psi_I$; axis labels $|I|^{-1/2}$, $I_\ell$, $I_r$.]
to denote the set of dyadic intervals of generation less than n. We want to study how
continuous functions can be approximated by linear combinations of Haar functions.
Let f ∈ C([0, 1]). Motivated by Theorem 3.11, we define for every positive integer n,
the orthogonal projection
(3.59) $E_n f = \sum_{I \in D_{<n}} \langle f, \psi_I \rangle \psi_I$.
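Compute
(3.66) $\int_{I_\ell} f - \int_{I_r} f = |I|^{1/2} \int f \cdot \psi_I = |I|^{1/2} \langle f, \psi_I \rangle$
and by the same reasoning,
(3.67) $\int_{I_\ell} g - \int_{I_r} g = |I|^{1/2} \langle g, \psi_I \rangle$.
Combining the last three displays we get
(3.68) $\int_{I_\ell} f - \int_{I_r} f = \int_{I_\ell} g - \int_{I_r} g$.
Adding the previous two displays gives $\langle f \rangle_{I_\ell} = \langle g \rangle_{I_\ell}$ and subtracting them gives
$\langle f \rangle_{I_r} = \langle g \rangle_{I_r}$. This concludes the proof.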
Proof of Theorem 3.27. Let ε > 0. By uniform continuity of f on [0, 1] (which
follows from Theorem 2.10) we may choose δ > 0 such that |f(t) − f(s)| < ε whenever
t, s ∈ [0, 1] are such that |t − s| < δ. Let N ∈ N be large enough so that $2^{-N} < \delta$ and
n ≥ N. Let t ∈ [0, 1] and $I \in D_n$ such that t ∈ I. Then by Theorem 3.26,
(3.70) $|E_n f(t) - f(t)| = |\langle f \rangle_I - f(t)| \le |I|^{-1} \int_I |f(s) - f(t)|\, ds < \varepsilon$.
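The estimate (3.70) can be tried out numerically. The sketch below relies on the identity $E_n f(t) = \langle f \rangle_I$ for the interval $I \in D_n$ containing t, as used in (3.70), and takes the illustrative example $f(x) = x^2$ with antiderivative $x^3/3$, so the averages are exact. Since $|f'| \le 2$ on [0, 1], the error is at most $2 \cdot 2^{-n}$. The helper names are hypothetical.

```python
def dyadic_average(F, n, t):
    # F is an antiderivative of f. I = [j 2^-n, (j+1) 2^-n) is the
    # dyadic interval of generation n that contains t in [0, 1).
    j = int(t * 2 ** n)
    a, b = j / 2 ** n, (j + 1) / 2 ** n
    return (F(b) - F(a)) / (b - a)

f = lambda x: x * x
F = lambda x: x ** 3 / 3

grid = [i / 1000 for i in range(1000)]   # sample points in [0, 1)

def sup_error(n):
    return max(abs(dyadic_average(F, n, t) - f(t)) for t in grid)

errors = {n: sup_error(n) for n in (1, 4, 8)}
```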
Remark. This result goes back to A. Haar’s 1910 article Zur Theorie der orthogonalen
Funktionensysteme in Math. Ann. 69 (1910), no. 3, p. 331–371. The functions
$(E_n f)_n$ are also called dyadic martingale averages of f and have wide applications in
modern analysis and probability theory.
Exercise 3.30. Recall the functions $r_n(x) = \mathrm{sgn}(\sin(2^n \pi x))$ from Exercise 3.10.
(i) Show that every rn for n ≥ 1 can be written as a finite linear combination of Haar
functions and determine the coefficients of this linear combination.
(ii) Show that the orthonormal system on [0, 1] given by (rn )n is not complete.
Exercise 3.31. Define
(3.71) $\Delta_n f = E_{n+1} f - E_n f$, $Sf = \Big( \sum_{n \ge 1} |\Delta_n f|^2 \Big)^{1/2}$.
(i) Assume that $\int_0^1 f = 0$. Prove that $\|Sf\|_2 = \|f\|_2$.
(ii) Show that for every m ∈ N there exists a finite linear combination of Haar functions
$f_m$ such that $\sup_{x \in [0,1]} |f_m(x)| \le 1$ and $\sup_{x \in [0,1]} |Sf_m(x)| \ge m$.
4. Trigonometric polynomials
In the following we will only be concerned with the trigonometric system on [0, 1]:
(3.72) $\phi_n(x) = e^{2\pi i n x}$ (n ∈ Z)
Definition 3.32. A trigonometric polynomial is a function of the form
(3.73) $f(x) = \sum_{n=-N}^N c_n e^{2\pi i n x}$ (x ∈ R),
(3.74) $f(x) = a_0 + \sum_{n=1}^N (a_n \cos(2\pi n x) + b_n \sin(2\pi n x))$.
Exercise 3.33. Work out how the coefficients an , bn in (3.74) are related to the cn
in (3.73).
Every trigonometric polynomial is 1-periodic:
(3.75) f (x) = f (x + 1)
for all x ∈ R.
Fact 3.34. $(e^{2\pi i n x})_{n \in \mathbb{Z}}$ forms an orthonormal system on [0, 1]. In particular,
(i) for all n ∈ Z,
(3.76) $\int_0^1 e^{2\pi i n x} dx = \begin{cases} 0, & \text{if } n \ne 0, \\ 1, & \text{if } n = 0. \end{cases}$
(ii) if $f(x) = \sum_{n=-N}^N c_n e^{2\pi i n x}$ is a trigonometric polynomial, then
(3.77) $c_n = \int_0^1 f(t) e^{-2\pi i n t} dt$.
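The orthonormality relations in Fact 3.34 can be checked with a Riemann sum. For $e^{2\pi i (n - m) x}$ sampled at M equally spaced nodes, the sum is a finite geometric series, so it vanishes exactly whenever n − m is not a multiple of M. The node count `M = 64` and the index range are assumptions made for illustration.

```python
import cmath

def inner(n, m, M=64):
    # Riemann sum with M equally spaced nodes for
    # int_0^1 e^{2 pi i n x} * conj(e^{2 pi i m x}) dx.
    return sum(cmath.exp(2j * cmath.pi * (n - m) * j / M) for j in range(M)) / M

# Gram matrix of the functions e^{2 pi i n x} for n = -3, ..., 3.
gram = {(n, m): inner(n, m) for n in range(-3, 4) for m in range(-3, 4)}
```

Up to rounding, `gram` is the identity matrix.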
One goal in this section is to show that this orthonormal system is in fact complete.
We denote by pc the space of piecewise continuous, 1-periodic functions f : R →
C (let us call a 1-periodic function piecewise continuous, if its restriction to [0, 1] is
piecewise continuous in the sense defined in the beginning of this section).
Definition 3.35. For a 1-periodic function f ∈ pc and n ∈ Z we define the nth
Fourier coefficient by
(3.78) $\hat{f}(n) = \int_0^1 f(t) e^{-2\pi i n t} dt$.
The series
(3.79) $\sum_{n=-\infty}^\infty \hat{f}(n) e^{2\pi i n x}$
is called the Fourier series of f.
The question of when the Fourier series of a function f converges and in what sense
it represents the function f is a very subtle issue and we will only scratch the surface
in this lecture.
Definition 3.36. For a 1-periodic function f ∈ pc we define the partial sums
(3.80) $S_N f(x) = \sum_{n=-N}^N \hat{f}(n) e^{2\pi i n x}$.
Remark. Note that since $(\phi_n)_n$ is an orthonormal system, $S_N f$ is exactly the orthogonal
projection of f onto the space of trigonometric polynomials of degree ≤ N.
In particular, Theorem 3.11 tells us that
(3.81) $\|f - S_N f\|_2 \le \|f - g\|_2$
holds for all trigonometric polynomials g of degree ≤ N. That is, $S_N f$ is the best
approximation to f in the $L^2$-norm among all trigonometric polynomials of degree
≤ N.
Definition 3.37 (Convolution). For two 1-periodic functions f, g ∈ pc we define
their convolution by
(3.82) $f * g(x) = \int_0^1 f(t) g(x - t)\, dt$
where
(3.87) $D_N(x) = \sum_{n=-N}^N e^{2\pi i n x}$.
The sequence of functions $(D_N)_N$ is called the Dirichlet kernel. The Dirichlet kernel can
be written more explicitly.
Fact 3.40. We have
(3.88) $D_N(x) = \frac{\sin(2\pi(N + \frac{1}{2})x)}{\sin(\pi x)}$
Proof.
(3.89) $D_N(x) = \sum_{n=-N}^N e^{2\pi i n x} = e^{-2\pi i N x} \sum_{n=0}^{2N} e^{2\pi i n x} = e^{-2\pi i N x} \frac{e^{2\pi i (2N+1) x} - 1}{e^{2\pi i x} - 1}$
(3.90) $= \frac{e^{2\pi i (N + \frac{1}{2}) x} - e^{-2\pi i (N + \frac{1}{2}) x}}{e^{\pi i x} - e^{-\pi i x}} = \frac{\sin(2\pi(N + \frac{1}{2})x)}{\sin(\pi x)}$.
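The closed form (3.88) can be compared against the defining sum (3.87) numerically. In the sketch below the sample points avoid the integers, where the quotient in (3.88) has to be read as a limit. The function names are illustrative.

```python
import cmath
import math

def dirichlet_sum(N, x):
    # Defining sum (3.87); the imaginary parts cancel in pairs.
    return sum(cmath.exp(2j * math.pi * n * x) for n in range(-N, N + 1)).real

def dirichlet_closed(N, x):
    # Closed form (3.88), valid when x is not an integer.
    return math.sin(2 * math.pi * (N + 0.5) * x) / math.sin(math.pi * x)

points = [0.013, 0.21, 0.37, 0.49]
max_gap = max(abs(dirichlet_sum(N, x) - dirichlet_closed(N, x))
              for N in (0, 1, 5, 20) for x in points)
```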
(3.93) $D_N(x) = \frac{\sin(2\pi(N + \frac{1}{2})x)}{\sin(\pi x)} = \frac{2 \sin(\pi x) \sin(2\pi(N + \frac{1}{2})x)}{2 \sin(\pi x)^2} = \frac{\cos(2\pi N x) - \cos(2\pi(N + 1)x)}{2 \sin(\pi x)^2}$.
Thus,
(3.94) $\sum_{n=0}^N D_n(x) = \frac{1}{2 \sin(\pi x)^2} \sum_{n=0}^N \big( \cos(2\pi n x) - \cos(2\pi(n + 1)x) \big) = \frac{1 - \cos(2\pi(N + 1)x)}{2 \sin(\pi x)^2}$
The claim now follows from the formula $1 - \cos(2x) = 2 \sin(x)^2$.
As a consequence of this explicit formula we see that KN (x) ≥ 0 for all x ∈ R which
is not at all obvious from the initial definition. We define
(3.95) σN f (x) = f ∗ KN (x).
Theorem 3.42 (Fejér). For every 1-periodic continuous function f ,
(3.96) σN f → f
uniformly on R as N → ∞.
Corollary 3.43. Every 1-periodic continuous function can be uniformly approxi-
mated by trigonometric polynomials.
Remark. There is nothing special about the period 1 here. By considering the orthonormal
system $(L^{-1/2} e^{\frac{2\pi}{L} i n x})_{n \in \mathbb{Z}}$ we obtain a similar result for L-periodic functions.
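(3.97) $\sigma_N f(x) = \int_0^1 f(t) \frac{1}{N+1} \sum_{n=0}^N \sum_{k=-n}^n e^{2\pi i k (x - t)} dt = \frac{1}{N+1} \sum_{n=0}^N \sum_{k=-n}^n \int_0^1 f(t) e^{-2\pi i k t} dt\, e^{2\pi i k x}$
(3.98) $= \frac{1}{N+1} \sum_{n=0}^N \sum_{k=-n}^n \hat{f}(k) e^{2\pi i k x} = \frac{1}{N+1} \sum_{k=-N}^N \sum_{n=|k|}^N \hat{f}(k) e^{2\pi i k x} = \sum_{k=-N}^N \Big( 1 - \frac{|k|}{N+1} \Big) \hat{f}(k) e^{2\pi i k x}$.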
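The reordering of the double sum in (3.98) can be checked at the level of the kernel. The sketch below assumes, as in (3.97), that $K_N = \frac{1}{N+1} \sum_{n=0}^N D_n$, and compares three expressions: the average of Dirichlet kernels, the weighted sum with coefficients $1 - |k|/(N+1)$, and the closed form $\frac{1}{N+1} \big( \frac{\sin(\pi(N+1)x)}{\sin(\pi x)} \big)^2$ obtained from (3.94) and $1 - \cos(2x) = 2\sin(x)^2$. The function names and sample points are illustrative.

```python
import math

def dirichlet(n, x):
    # sum_{k=-n}^{n} e^{2 pi i k x}; the sine parts cancel, leaving cosines.
    return sum(math.cos(2 * math.pi * k * x) for k in range(-n, n + 1))

def fejer_average(N, x):
    return sum(dirichlet(n, x) for n in range(N + 1)) / (N + 1)

def fejer_weighted(N, x):
    return sum((1 - abs(k) / (N + 1)) * math.cos(2 * math.pi * k * x)
               for k in range(-N, N + 1))

def fejer_closed(N, x):
    # Valid when x is not an integer.
    return (math.sin(math.pi * (N + 1) * x) / math.sin(math.pi * x)) ** 2 / (N + 1)

points = [i / 97 for i in range(1, 97)]   # avoids the integers
cases = [(N, x) for N in (1, 4, 10) for x in points]
gap_weighted = max(abs(fejer_average(N, x) - fejer_weighted(N, x)) for N, x in cases)
gap_closed = max(abs(fejer_average(N, x) - fejer_closed(N, x)) for N, x in cases)
min_value = min(fejer_average(N, x) for N, x in cases)
```

The closed form makes the nonnegativity of the kernel visible, and `min_value` confirms it on the sample points up to rounding.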
Assumption (3) is a precise way to express the idea that the “mass” of $k_n$ concentrates
near the origin. Keeping in mind Assumption (2), Assumption (3) can be
rewritten equivalently as:
(3.102) $\int_{\frac{1}{2} \ge |t| \ge \delta} k_n(t)\, dt \to 0$
where
(3.105) $A = \int_{|t| \le \delta} (f(x - t) - f(x)) k_n(t)\, dt$, $B = \int_{\frac{1}{2} \ge |t| \ge \delta} (f(x - t) - f(x)) k_n(t)\, dt$.
By (3.103) and Assumption (2),
(3.106) $|A| \le \frac{\varepsilon}{2} \int_{|t| \le \delta} k_n(t)\, dt \le \frac{\varepsilon}{2}$.
Since f is bounded there exists C > 0 such that |f(x)| ≤ C for all x ∈ R. By
Assumption (3), we may let N be large enough so that for all n ≥ N,
(3.107) $\int_{\frac{1}{2} \ge |t| \ge \delta} k_n(t)\, dt \le \frac{\varepsilon}{4C}$.
Thus, if n ≥ N,
(3.108) $|B| \le 2C \int_{\frac{1}{2} \ge |t| \ge \delta} k_n(t)\, dt \le \frac{\varepsilon}{2}$.
This implies
(3.109) $|f * k_n(x) - f(x)| \le \varepsilon/2 + \varepsilon/2 = \varepsilon$
for n ≥ N and x ∈ R.
Corollary 3.46. The Fejér kernel $(K_N)_N$ is an approximation of unity.
Proof. We verify the assumptions of Theorem 3.45. From (3.92) we see that
$K_N \ge 0$. Also,
(3.110) $\int_{-1/2}^{1/2} K_N(t)\, dt = \frac{1}{N+1} \sum_{n=0}^N \sum_{k=-n}^n \int_{-1/2}^{1/2} e^{2\pi i k t} dt = \frac{1}{N+1} \sum_{n=0}^N 1 = 1$.
Now we verify the last property. Let $\frac{1}{2} > \delta > 0$ and $\frac{1}{2} \ge |x| \ge \delta$. By (3.92),
(3.111) $K_N(x) \le \frac{1}{N+1} \frac{1}{\sin(\pi \delta)^2}$
Thus,
(3.112) $\int_{\frac{1}{2} \ge |t| \ge \delta} K_N(t)\, dt \le \frac{1}{N+1} \frac{1}{\sin(\pi \delta)^2}$
which converges to 0 as N → ∞.
Therefore we have proven Fejér’s theorem. Note that although the Dirichlet kernel
also satisfies Assumptions (2) and (3), it is not an approximation of unity. In other
words, if f is continuous then it is not necessarily true that SN f → f uniformly.
However, we can use Fejér’s theorem to show that $S_N f \to f$ in the $L^2$-norm.
In particular,
(3.119) $\|f\|_2^2 = \sum_{n=-\infty}^\infty |\hat{f}(n)|^2$.
Proof. We have
(3.120) $\langle S_N f, g \rangle = \sum_{n=-N}^N \hat{f}(n) \langle e^{2\pi i n x}, g \rangle = \sum_{n=-N}^N \hat{f}(n) \overline{\hat{g}(n)}$.
While the Fourier series of a continuous function does not necessarily converge pointwise,
we can obtain pointwise convergence easily if we impose additional conditions.
Theorem 3.50. Let f be a 1-periodic continuous function and let x ∈ R. Assume
that f is differentiable at x. Then $S_N f(x) \to f(x)$ as N → ∞.
Proof. By definition,
(3.123) $S_N f(x) = \int_0^1 f(x - t) D_N(t)\, dt$.
Also,
(3.124) $\int_0^1 D_N(t)\, dt = \sum_{n=-N}^N \int_0^1 e^{2\pi i n t} dt = 1$.
$\sigma_N = \frac{s_1 + \cdots + s_N}{N}$.
$\sigma_N$ is called the Nth Cesàro mean of the sequence $s_k$ or the Nth Cesàro sum of the
series $\sum_{k=1}^\infty a_k$. If $\sigma_N$ converges to a limit S we say that the series $\sum_{k=1}^\infty a_k$ is Cesàro
summable to S.
(ii) Prove that if $\sum_{k=1}^\infty a_k$ is summable to S (i.e. by definition converges with sum S)
then $\sum_{k=1}^\infty a_k$ is Cesàro summable to S.
(iii) Prove that the sum $\sum_{k=1}^\infty (-1)^{k-1}$ does not converge but is Cesàro summable to
some limit S and determine S.
Exercise 3.56. Let A ⊂ C([1, 2]) be the set of all polynomials of the form
$p(x) = \sum_{k=0}^n c_k x^{2k+1}$ where $c_k \in \mathbb{C}$ and n a non-negative integer. Show that A is dense, but
not an algebra.
Before we begin the proof of the Stone-Weierstrass theorem we first need some
preliminary lemmas.
Lemma 3.57. For every a > 0 there exists a sequence of polynomials (pn )n with real
coefficients such that pn (0) = 0 for all n and supx∈[−a,a] |pn (x) − |x|| → 0 as n → ∞.
Proof. From Weierstrass’ theorem we get that there exists a sequence of polynomials
$q_n$ that converges uniformly to f(x) = |x| on [−a, a]. Now set $p_n(x) = q_n(x) - q_n(0)$.
Exercise 3.58. Work out an explicit sequence of polynomials $(p_n)_n$ that converges
uniformly to x ↦ |x| on [−1, 1].
Let A ⊂ C(K) satisfy conditions (1),(2),(3). Observe that then also $\overline{A}$ satisfies (1),
(2), (3).
We may assume without loss of generality that we are dealing with real-valued
functions (otherwise split functions into real and imaginary parts f = g + ih and go
through the proof for both parts).
Lemma 3.59. If f ∈ A, then |f | ∈ A.
Proof. Let ε > 0 and $a = \max_{x\in K} |f(x)|$. By Lemma 3.57 there exist $c_1, \dots, c_n \in \mathbb{R}$ such that
(3.131) $\Big|\sum_{i=1}^{n} c_i y^i - |y|\Big| \le \varepsilon.$
Claim: For every x ∈ K there exists gx ∈ A such that gx (x) = f (x) and
gx (t) > f (t) − ε for t ∈ K.
Proof of Claim. Let y ∈ K. By Lemma 3.61 there exists hy ∈ A such that
hy (x) = f (x) and hy (y) = f (y). By continuity of hy there exists an open ball By
around y such that |hy (t) − f (t)| < ε for all t ∈ By . In particular,
(3.136) hy (t) > f (t) − ε.
Observe that (By )y∈K is an open cover of K. Since K is compact, we can find a finite
subcover by By1 , . . . , Bym . Set
(3.137) gx = max(hy1 , . . . , hym ).
By Lemma 3.60, gx ∈ A.
By continuity of gx there exists an open ball Ux such that
(3.138) |gx (t) − f (t)| < ε
for t ∈ Ux . In particular,
(3.139) gx (t) < f (t) + ε.
(Ux )x∈K is an open cover of K which has a finite subcover by Ux1 , . . . , Uxn . Then let
(3.140) h = min(gx1 , . . . , gxn ).
By Lemma 3.60 we have h ∈ A. Also,
(3.141) f (t) − ε < h(t) < f (t) + ε
for all t ∈ K. That is,
(3.142) |f (t) − h(t)| < ε
for all t ∈ K. This proves that f ∈ A.
6. Further exercises
Exercise 3.62. Show that there exists no continuous 1-periodic function g such
that f ∗ g = f holds for all continuous 1-periodic functions f .
Hint: Use the Riemann-Lebesgue lemma.
Exercise 3.63. Give an alternative proof of Weierstrass’ theorem by using Fejér’s
theorem and then approximating the resulting trigonometric polynomials by truncated
Taylor expansions.
Exercise 3.64. Find a sequence of continuous functions (fn )n on [0, 1] and a con-
tinuous function f on [0, 1] such that kfn − f k2 → 0, but fn (x) does not converge to
f (x) for any x ∈ [0, 1].
Exercise 3.65 (Weighted L2 norms). Fix a function w ∈ C([a, b]) that is non-negative and does not vanish identically. Let us define another inner product by
(3.143) $\langle f, g\rangle_{L^2(w)} = \int_a^b f(x)\,g(x)\,w(x)\,dx$
and a corresponding norm $\|f\|_{L^2(w)} = \langle f, f\rangle_{L^2(w)}^{1/2}$. Similarly, we say that $(\phi_n)_n$ is an orthonormal system by asking that $\langle \phi_n, \phi_m\rangle_{L^2(w)}$ is 1 if n = m and 0 otherwise. Verify that all theorems in Section 2 continue to hold when $\langle\cdot,\cdot\rangle$, $\|\cdot\|_2$ are replaced by $\langle\cdot,\cdot\rangle_{L^2(w)}$, $\|\cdot\|_{L^2(w)}$, respectively.
Exercise 3.66. Let w ∈ C([0, 1]) be such that w(x) ≥ 0 for all x ∈ [0, 1] and $w \not\equiv 0$. Prove that there exists a sequence of real-valued polynomials $(p_n)_n$ such that $p_n$ is of degree n and
(3.144) $\int_0^1 p_n(x)\,p_m(x)\,w(x)\,dx = \begin{cases} 1, & \text{if } n = m,\\ 0, & \text{if } n \ne m \end{cases}$
for all non-negative integers n, m.
Exercise 3.67 (Chebyshev polynomials). Define a sequence of polynomials (Tn )n
by T0 (x) = 1, T1 (x) = x and the recurrence relation Tn (x) = 2xTn−1 (x) − Tn−2 (x) for
n ≥ 2.
(i) Show that Tn (x) = cos(nt) if x = cos(t).
Hint: Use that 2 cos(a) cos(b) = cos(a + b) + cos(a − b) for all a, b ∈ C.
(ii) Compute
(3.145) $\int_{-1}^{1} T_n(x)\,T_m(x)\,\frac{dx}{\sqrt{1-x^2}}$
for all non-negative integers n, m.
(iii) Prove that |Tn (x)| ≤ 1 for x ∈ [−1, 1] and determine when there is equality.
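As a sanity check for part (i), the recurrence can be evaluated numerically and compared with cos(nt). This is only a sketch; the function name `chebyshev` and the sample points are chosen here for illustration.

```python
import math

def chebyshev(n, x):
    """Evaluate T_n(x) via T_0 = 1, T_1 = x, T_n = 2x T_{n-1} - T_{n-2}."""
    if n == 0:
        return 1.0
    t_prev, t_curr = 1.0, x
    for _ in range(2, n + 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

# Largest deviation between T_n(cos t) and cos(n t) over a few degrees and sample points.
max_err = max(
    abs(chebyshev(n, math.cos(t)) - math.cos(n * t))
    for n in range(0, 11)
    for t in [0.0, 0.3, 1.0, 2.0, math.pi]
)
print(max_err)
```

The deviation is at the level of floating point rounding, which is consistent with (i) but of course proves nothing.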
Exercise 3.68. Let d be a positive integer and f ∈ C([a, b]). Denote by Pd the set
of polynomials with real coefficients of degree ≤ d. Prove that there exists a polynomial
p∗ ∈ Pd such that kf − p∗ k∞ = inf p∈Pd kf − pk∞ .
Hint: Find a way to apply Theorem 2.12.
Exercise 3.69. Let f be smooth on [0, 1] (that is, arbitrarily often differentiable).
(i) Let p be a polynomial such that |f'(x) − p(x)| ≤ ε for all x ∈ [0, 1]. Construct a polynomial q such that |f(x) − q(x)| ≤ ε for all x ∈ [0, 1].
(ii) Prove that there exists a sequence of polynomials $(p_n)_n$ such that $(p_n^{(k)})_n$ converges uniformly on [0, 1] to $f^{(k)}$ for all k = 0, 1, 2, . . . .
Exercise 3.70 (The space L2). Let (X, d) be a metric space. Recall that the completion $\overline{X}$ of X is defined as follows: for two Cauchy sequences $(a_n)_n$, $(b_n)_n$ in X we say that $(a_n)_n \sim (b_n)_n$ if $\lim_{n\to\infty} d(a_n, b_n) = 0$. Then ∼ is an equivalence relation on the space of Cauchy sequences and we define $\overline{X}$ as the set of equivalence classes. We identify X with a subset of $\overline{X}$ by identifying x ∈ X with the equivalence class of the constant sequence (x, x, . . . ). We make $\overline{X}$ a metric space by defining
(3.146) $d(a, b) = \lim_{n\to\infty} d(a_n, b_n),$
Hint: Use the same proof as seen for L2c (a, b) in the lecture!
Exercise 3.71. Let f be the 1-periodic function such that f (x) = |x| for x ∈
[−1/2, 1/2]. Determine explicitly a sequence of trigonometric polynomials (pN )N such
that pN → f uniformly as N → ∞.
Exercise 3.72. Let f, g be continuous, 1-periodic functions.
(i) Show that $\widehat{f * g}(n) = \hat f(n)\,\hat g(n)$.
(ii) Show that $\widehat{f \cdot g}(n) = \sum_{m\in\mathbb{Z}} \hat f(n-m)\,\hat g(m)$.
(iii) If f is continuously differentiable, prove that $\widehat{f'}(n) = 2\pi i n\,\hat f(n)$.
(iv) Let y ∈ R and set $f_y(x) = f(x + y)$. Show that $\widehat{f_y}(n) = e^{2\pi i n y}\,\hat f(n)$.
(v) Let m ∈ Z, m ≠ 0 and set $f_m(x) = f(mx)$. Show that $\widehat{f_m}(n)$ equals $\hat f(\tfrac{n}{m})$ if m divides n and zero otherwise.
Exercise 3.73 (Legendre polynomials). Define $p_n(x) = \frac{d^n}{dx^n}\big[(1 - x^2)^n\big]$ for n = 0, 1, . . . and
(3.152) $\phi_n(x) = p_n(x) \cdot \Big(\int_{-1}^{1} p_n(t)^2\,dt\Big)^{-1/2}.$
Show that (φn )n=0,1,... is a complete orthonormal system on [−1, 1].
Exercise 3.74. Let f be 1-periodic and k times continuously differentiable. Prove
that there exists a constant c > 0 such that
(3.153) $|\hat f(n)| \le c\,|n|^{-k}$ for all n ∈ Z.
Hint: What can you say about the Fourier coefficients of $f^{(k)}$?
Exercise 3.75. Let f be 1-periodic and continuous.
(i) Suppose that $\hat f(n) = -\hat f(-n) \ge 0$ holds for all n ≥ 0. Prove that
(3.154) $\sum_{n=1}^{\infty} \frac{\hat f(n)}{n} < \infty.$
(ii) Show that there does not exist a 1-periodic continuous function f such that
(3.155) $\hat f(n) = \frac{\operatorname{sgn}(n)}{\log |n|}$ for all |n| ≥ 2.
Here sgn(n) = 1 if n > 0 and sgn(n) = −1 if n < 0.
Exercise 3.76. Suppose that f is a 1-periodic function such that there exist c > 0 and α ∈ (0, 1] such that
(3.156) $|f(x) - f(y)| \le c\,|x - y|^{\alpha}$
holds for all x, y ∈ R. Show that the sequence of partial sums $S_N f(x) = \sum_{n=-N}^{N} \hat f(n)\,e^{2\pi i n x}$ converges uniformly to f as N → ∞.
Exercise 3.77. Let f ∈ C([0, 1]) and A ⊂ C([0, 1]) dense. Suppose that
(3.157) $\int_0^1 f(x)\,a(x)\,dx = 0$
for all a ∈ A. Show that f = 0.
Hint: Show that $\int_0^1 |f(x)|^2\,dx = 0$.
Exercise 3.78. Let f ∈ C([−1, 1]) and a ∈ [−1, 1]. Show that for every ε > 0 there
exists a polynomial p such that p(a) = f (a) and |f (x) − p(x)| < ε for all x ∈ [−1, 1].
Exercise 3.79. Prove that
(3.158) $-\frac{1}{2} = \sum_{n=1}^{\infty} (-1)^n\,\frac{\sin(n)}{n}.$
Exercise 3.80. Suppose f ∈ C([1, ∞)) and limx→+∞ f (x) = a. Show that f can
be uniformly approximated on [1, ∞) by functions of the form g(x) = p(1/x), where p
is a polynomial.
Exercise 3.81 (Stone-Weierstrass for finite sets). Let K be a finite set and A
a family of functions on K that is an algebra (i.e. closed under taking finite linear
combinations and products), separates points and vanishes nowhere. Give a purely
algebraic proof that A must then already contain every function on K. (That means
your proof is not allowed to use the concept of an inequality. In particular, you are not
allowed to use any facts about metric spaces such as the Stone-Weierstrass theorem.)
Hint: Take a close look at the proof of Stone-Weierstrass.
Exercise 3.82 (Uniform approximation by neural networks). Let $\sigma(t) = e^t$ for t ∈ R. Fix n ∈ N and let $K \subset \mathbb{R}^n$ be a compact set. As usual, let C(K) denote the space of real-valued continuous functions on K. Define a class of functions $\mathcal{N} \subset C(K)$ by saying that $\mu \in \mathcal{N}$ iff there exist m ∈ N, $W \in \mathbb{R}^{m\times n}$, $v, b \in \mathbb{R}^m$ such that
(3.159) $\mu(x) = \sum_{i=1}^{m} \sigma((Wx)_i + b_i)\,v_i$ for all x ∈ K.
[Figure: network diagram with inputs x1, x2, x3 and output µ(x).]
(i) Show that f (xk ) = LN (xk ) for all k = 0, . . . , N and that LN is the unique polynomial
of degree ≤ N with this property.
(ii) Suppose $f \in C^{N+1}([0, 1])$. Show that for every x ∈ [0, 1] there exists ξ ∈ [0, 1] such that
(3.161) $f(x) - L_N(x) = \frac{f^{(N+1)}(\xi)}{(N+1)!} \prod_{k=0}^{N} (x - x_k).$
(iii) Show that LN does not necessarily converge to f uniformly on [0, 1]. (Find a
counterexample.)
(iv) Suppose f is given by a power series with infinite convergence radius. Does LN
necessarily converge to f uniformly on [0, 1] ?
Remark. The polynomials LN are also known as Lagrange interpolation polynomials.
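The definition of $L_N$ is cut off above, so the following sketch assumes the standard Lagrange form $L_N(x) = \sum_k f(x_k) \prod_{j\ne k} \frac{x - x_j}{x_k - x_j}$ with equally spaced nodes $x_k = k/N$; both the node choice and the sample function f = exp are assumptions made only for illustration. It checks the interpolation property from (i) and looks at the error on a grid.

```python
import math

def lagrange(nodes, values, x):
    """Evaluate the polynomial of degree <= N through the points (nodes[k], values[k]) at x."""
    total = 0.0
    for k, xk in enumerate(nodes):
        weight = 1.0
        for j, xj in enumerate(nodes):
            if j != k:
                weight *= (x - xj) / (xk - xj)
        total += values[k] * weight
    return total

N = 8
nodes = [k / N for k in range(N + 1)]    # assumed: equally spaced nodes in [0, 1]
values = [math.exp(t) for t in nodes]    # assumed sample function f = exp

# Interpolation property L_N(x_k) = f(x_k).
node_err = max(abs(lagrange(nodes, values, t) - v) for t, v in zip(nodes, values))
# Error away from the nodes, on a fine grid.
grid_err = max(abs(lagrange(nodes, values, i / 200) - math.exp(i / 200)) for i in range(201))
print(node_err, grid_err)
```

For this particular smooth f the grid error is tiny, in line with (3.161); part (iii) asks for a function where this fails.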
CHAPTER 4
Linear operators and derivatives
Let K denote either one of the fields R or C. Let X be a vector space over K.
Definition 4.1. A map $\|\cdot\| : X \to [0, \infty)$ is called a norm if for all x, y ∈ X and λ ∈ K,
(4.1) $\|\lambda x\| = |\lambda| \cdot \|x\|, \quad \|x + y\| \le \|x\| + \|y\|, \quad \|x\| = 0 \Leftrightarrow x = 0.$
A K-vector space equipped with a norm is called a normed vector space. On every
normed vector space we have a natural metric space structure defined by
(4.2) d(x, y) = kx − yk.
A complete normed vector space is called Banach space.
Examples 4.2. • Rn with the Euclidean norm is a Banach space.
• Rn with the norm kxk = supi=1,...,n |xi | is also a Banach space.
• If K is a compact metric space, then C(K) is a Banach space with the supre-
mum norm kf k∞ = supx∈K |f (x)|.
• The space of continuous functions on [0, 1] equipped with the $L^2$-norm $\|f\|_2 = \big(\int_0^1 |f(x)|^2\,dx\big)^{1/2}$ is a normed vector space, but not a Banach space (why?).
Example 4.3. The set of bounded sequences $(a_n)_{n\in\mathbb{N}}$ of complex numbers equipped with the $\ell^\infty$-norm,
(4.3) $\|a\|_\infty = \sup_{n=1,2,\dots} |a_n|$
Thus,
(4.10) $\Big\| T\Big(\frac{x}{\|x\|_X}\Big) \Big\|_Y \le C.$
Exercise 4.9. Show that L(X, Y ) endowed with the operator norm forms a normed
vector space (i.e. show that k · kop is a norm).
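Example 4.10. Let $A \in \mathbb{R}^{n\times m}$ be a real n × m matrix. We view A as a linear map $\mathbb{R}^m \to \mathbb{R}^n$: for $x \in \mathbb{R}^m$, $A(x) = A \cdot x \in \mathbb{R}^n$. Let us equip $\mathbb{R}^n$ and $\mathbb{R}^m$ with the corresponding $\|\cdot\|_\infty$ norms. Consider the operator norm $\|A\|_{\infty\to\infty} = \sup_{\|x\|_\infty = 1} \|Ax\|_\infty$ with respect to these normed spaces:
(4.16) $\|Ax\|_\infty = \max_{i=1,\dots,n} \Big|\sum_{j=1}^{m} A_{ij} x_j\Big| \le \max_{i=1,\dots,n} \sum_{j=1}^{m} |A_{ij}|\,\|x\|_\infty.$
This implies $\|A\|_{\infty\to\infty} \le \max_{i=1,\dots,n} \sum_{j=1}^{m} |A_{ij}|$. On the other hand, for given i = 1, . . . , n we choose $x \in \mathbb{R}^m$ with $x_j = |A_{ij}|/A_{ij}$ if $A_{ij} \ne 0$ and $x_j = 0$ if $A_{ij} = 0$. Then $\|x\|_\infty \le 1$ and, looking at the i-th component of Ax,
(4.17) $\|A\|_{\infty\to\infty} \ge \|Ax\|_\infty \ge \sum_{j=1}^{m} |A_{ij}|.$
Since i was arbitrary, we get $\|A\|_{\infty\to\infty} \ge \max_{i=1,\dots,n} \sum_{j=1}^{m} |A_{ij}|$. Altogether we proved
(4.18) $\|A\|_{\infty\to\infty} = \max_{i=1,\dots,n} \sum_{j=1}^{m} |A_{ij}|.$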
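The argument in Example 4.10 is constructive, so it can be replayed numerically. The sketch below computes the maximal absolute row sum and builds the sign vector from the proof for the row where the maximum occurs; the helper names and the sample matrix are made up for illustration.

```python
def inf_operator_norm(A):
    """Maximal absolute row sum, i.e. the right-hand side of (4.18)."""
    return max(sum(abs(a) for a in row) for row in A)

def maximizing_vector(A):
    """Sign vector x from the proof, built from the row with the largest absolute row sum."""
    row = max(A, key=lambda r: sum(abs(a) for a in r))
    return [0.0 if a == 0 else abs(a) / a for a in row]

def matvec(A, x):
    """Plain matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[1.0, -2.0, 0.0],
     [3.0, 0.5, -4.0]]
x = maximizing_vector(A)
norm = inf_operator_norm(A)
attained = max(abs(v) for v in matvec(A, x))   # sup-norm of A x
print(x, norm, attained)
```

Here the sup-norm of Ax for the sign vector x coincides with the maximal row sum, which illustrates why the upper bound in (4.16) is sharp.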
Exercise 4.12. Let $A \in \mathbb{R}^{n\times n}$. Define $\|x\|_2 = \big(\sum_{i=1}^{n} |x_i|^2\big)^{1/2}$ (Euclidean norm) and $\|A\|_{2\to 2} = \sup_{\|x\|_2 = 1} \|Ax\|_2$. Observe that $AA^T$ is a symmetric positive semidefinite n × n matrix and hence has only non-negative eigenvalues. Denote the largest eigenvalue of $AA^T$ by ρ. Prove that $\|A\|_{2\to 2} = \sqrt{\rho}$. Hint: First consider the case that A is symmetric. Use that symmetric matrices are orthogonally diagonalizable.
62 4. LINEAR OPERATORS AND DERIVATIVES
1. Equivalence of norms
Definition 4.13. Two norms $\|\cdot\|_a$ and $\|\cdot\|_b$ on a vector space X are called equivalent if there exist constants c, C > 0 such that
(4.19) $c\,\|x\|_a \le \|x\|_b \le C\,\|x\|_a$
for all x ∈ X.
Exercise 4.14. Prove that equivalent norms generate the same topologies: if k · ka
and k · kb are equivalent then a set U ⊂ X is open with respect to k · ka if and only if
it is open with respect to k · kb .
Exercise 4.15. Show that equivalence of norms forms an equivalence relation on
the space of norms. That is, if we write n1 ∼ n2 to denote that two norms n1 , n2 are
equivalent, then prove that n1 ∼ n1 (reflexivity), n1 ∼ n2 ⇒ n2 ∼ n1 (symmetry) and
n1 ∼ n2 , n2 ∼ n3 ⇒ n1 ∼ n3 (transitivity).
Theorem 4.16. Let X be a finite-dimensional K-vector space. Then all norms on
X are equivalent.
(4.25) $\|Tx\|_Y \le \sum_{i=1}^{n} |c_i|\,\|Tx_i\|_Y \le C \max_{i=1,\dots,n} |c_i|,$
where $C = \sum_{i=1}^{n} \|Tx_i\|_Y$. By equivalence of norms we may assume that $\max_i |c_i|$ is the norm on X.
This is not true if X is infinite-dimensional.
Example 4.18. Let X be the set of sequences of complex numbers (an )n∈N such
that supn∈N n|an | < ∞ and let Y be the space of bounded complex sequences. Then
X ⊂ Y. Equip both spaces with the norm $\|a\| = \sup_{n\in\mathbb{N}} |a_n|$. The map T : X → Y, $(Ta)_n = n a_n$ is not bounded: let $e^{(k)}_n = 1$ if k = n and $e^{(k)}_n = 0$ if k ≠ n. Then $e^{(k)} \in X$ and $Te^{(k)} = k e^{(k)}$ and $\|e^{(k)}\| = 1$. So
(4.26) $\|Te^{(k)}\| = k$
for every k ∈ N and therefore $\sup_{\|x\|=1} \|Tx\| = \infty$.
Exercise 4.19. Let X be the set of continuously differentiable functions on [0, 1] and let Y = C([0, 1]). We consider X and Y as normed vector spaces with the norm $\|f\| = \sup_{x\in[0,1]} |f(x)|$. Define a linear map T : X → Y by Tf = f'. Show that T is not bounded.
2. Dual spaces*
Theorem 4.20. Let X be a normed vector space and Y a Banach space. Then
L(X, Y ) is a Banach space (with the operator norm).
Proof. Let $(T_n)_n \subset L(X, Y)$ be a Cauchy sequence. Then for every x ∈ X, $(T_n x)_n \subset Y$ is Cauchy and by completeness of Y it therefore converges to some limit which we call Tx. This defines a linear operator T : X → Y. We claim that T is bounded. Since $(T_n)_n$ is a Cauchy sequence, it is a bounded sequence. Thus there exists M > 0 such that $\|T_n\|_{op} \le M$ for all n ∈ N. We have for x ∈ X,
(4.27) $\|Tx\|_Y \le \|Tx - T_n x\|_Y + \|T_n x\|_Y \le \|Tx - T_n x\|_Y + M\|x\|_X.$
Letting n → ∞ we get $\|Tx\|_Y \le M\|x\|_X$. So T is bounded with $\|T\|_{op} \le M$. It remains to show that $T_n \to T$ in L(X, Y). That is, for all ε > 0 we need to find N ∈ N such that
(4.28) $\|T_n x - Tx\|_Y \le \varepsilon\|x\|_X$
for all n ≥ N and x ∈ X. Since $(T_n)_n$ is a Cauchy sequence, there exists N ∈ N such that
(4.29) $\|T_n x - T_m x\|_Y \le \frac{\varepsilon}{2}\|x\|_X$
for all n, m ≥ N and x ∈ X. Fix x ∈ X. Then there exists $m_x \ge N$ such that
(4.30) $\|T_{m_x} x - Tx\|_Y \le \frac{\varepsilon}{2}\|x\|_X.$
Then if n ≥ N and x ∈ X,
(4.31) $\|T_n x - Tx\|_Y \le \|T_n x - T_{m_x} x\|_Y + \|T_{m_x} x - Tx\|_Y \le \varepsilon\|x\|_X.$
Definition 4.21. Let X be a normed vector space. Elements of L(X, K) are called bounded linear functionals. L(X, K) is called the dual space of X and denoted X'.
Corollary 4.22. Dual spaces of normed vector spaces are Banach spaces.
Proof. This follows from Theorem 4.20 because K (which is R or C) is complete.
Theorem 4.23. If X is finite-dimensional, then X' is isomorphic to X.
Proof. Let $\{x_1, \dots, x_n\} \subset X$ be a basis. Then we can define a corresponding dual basis of X' as follows: let $f_i \in X'$, i ∈ {1, . . . , n} be the linear map given by $f_i(x_i) = 1$ and $f_i(x_j) = 0$ for j ≠ i. Then we claim that $\{f_1, \dots, f_n\}$ is a basis of X'. Indeed, let f ∈ X'. For x ∈ X we can write $x = \sum_{i=1}^{n} c_i x_i$ with uniquely determined $c_i \in K$. Then by linearity,
(4.32) $f(x) = \sum_{i=1}^{n} c_i f(x_i) = \sum_{i=1}^{n} f(x_i) f_i(x),$
because $f_i(x) = c_i$. Thus, the linear span of $\{f_1, \dots, f_n\}$ is X'. On the other hand, suppose
(4.33) $\sum_{i=1}^{n} b_i f_i = 0$
for some coefficients $(b_i)_{i=1,\dots,n} \subset K$. Then for every j ∈ {1, . . . , n}, $b_j = \sum_{i=1}^{n} b_i f_i(x_j) = 0$. Thus, $\{f_1, \dots, f_n\}$ is linearly independent. Thus, X' and X are isomorphic since they have the same dimension. We can define an isomorphism φ : X → X' by $x_i \mapsto f_i$ for i = 1, . . . , n.
Optional topic (not relevant for exams)
3. Sequential $\ell^p$ spaces*
Definition 4.24. Let 1 ≤ p < ∞. Then we define $\ell^p$ as the set of all sequences $(x_n)_{n=1,2,\dots} \subset \mathbb{C}$ such that $\sum_{n=1}^{\infty} |x_n|^p < \infty$. The $\ell^p$-norm is defined as
(4.34) $\|x\|_p = \Big(\sum_{n=1}^{\infty} |x_n|^p\Big)^{1/p}.$
Lemma 4.26 (Young’s inequality). For a, b ≥ 0 and p ∈ (1, ∞) we have the elementary inequality
(4.36) $ab \le \frac{a^p}{p} + \frac{b^{p'}}{p'}.$
Proof. Recall that log is a concave function. Thus, for u, v ≥ 0 and t ∈ [0, 1],
(4.37) $t \log(u) + (1 - t) \log(v) \le \log(tu + (1 - t)v).$
The left hand side equals $\log(u^t v^{1-t})$. Now let $u = a^p$, $v = b^{p'}$, $t = \frac{1}{p}$. Then the claim follows from applying the exponential function on both sides of the inequality.
Proof of Hölder’s inequality. If p ∈ {1, ∞}, the inequality is trivial. So we assume p ∈ (1, ∞). By Young’s inequality,
(4.38) $\sum_{n=1}^{\infty} |x_n y_n| \le \frac{1}{p} \sum_{n=1}^{\infty} |x_n|^p + \frac{1}{p'} \sum_{n=1}^{\infty} |y_n|^{p'}.$
Let λ > 0. Replacing $x_n$ by $\lambda x_n$ and $y_n$ by $\lambda^{-1} y_n$ we obtain
(4.39) $\sum_{n=1}^{\infty} |x_n y_n| \le \frac{\lambda^p}{p} \sum_{n=1}^{\infty} |x_n|^p + \frac{\lambda^{-p'}}{p'} \sum_{n=1}^{\infty} |y_n|^{p'} = \lambda^p A + \lambda^{-p'} B,$
where $A = \frac{1}{p}\|x\|_p^p$ and $B = \frac{1}{p'}\|y\|_{p'}^{p'}$. Without loss of generality we may assume that A ≠ 0. We choose λ such that this inequality is strongest. This turns out to be when $\lambda = \big(\frac{p'B}{pA}\big)^{\frac{1}{p+p'}}$. Plugging this into (4.39) implies the claim.
Theorem 4.27 (Minkowski’s inequality). Let p ∈ [1, ∞]. For $x, y \in \ell^p$,
(4.40) $\|x + y\|_p \le \|x\|_p + \|y\|_p.$
Dividing by $\|x + y\|_p^{p-1}$ gives the claim.
We conclude that $\|\cdot\|_p$ is a norm and $\ell^p$ a normed vector space.
Theorem 4.28. Let p ∈ (1, ∞). The dual space $(\ell^p)'$ is isometrically isomorphic to $\ell^{p'}$.
Proof. By $e_k$ we denote the sequence which is 1 at position k and 0 everywhere else.
Then we define a map $\phi : (\ell^p)' \to \ell^{p'}$ by $\phi(v) = (v(e_k))_k$. Clearly, this is a linear map. First we need to show that $\phi(v) \in \ell^{p'}$. Let $v \in (\ell^p)'$. For each n we define $x^{(n)} \in \ell^p$ by
(4.44) $x^{(n)}_k = \begin{cases} \dfrac{|v(e_k)|^{p'}}{v(e_k)} & \text{if } k \le n,\ v(e_k) \ne 0,\\ 0 & \text{otherwise.} \end{cases}$
We have on the one hand
(4.45) $v(x^{(n)}) = \sum_{k=1}^{n} |v(e_k)|^{p'}.$
And on the other hand
(4.46) $|v(x^{(n)})| \le \|v\|_{op}\,\|x^{(n)}\|_p = \|v\|_{op} \Big(\sum_{k=1}^{n} |v(e_k)|^{p'}\Big)^{1/p}.$
Here we have used that $p(p' - 1) = p\big(\frac{p}{p-1} - 1\big) = \frac{p}{p-1} = p'$. Combining these two we get
(4.47) $\Big(\sum_{k=1}^{n} |v(e_k)|^{p'}\Big)^{1/p'} \le \|v\|_{op}.$
Letting n → ∞ this implies that
(4.48) $\|\phi(v)\|_{p'} = \Big(\sum_{n=1}^{\infty} |v(e_n)|^{p'}\Big)^{1/p'} \le \|v\|_{op},$
so $\phi(v) \in \ell^{p'}$. The calculation also shows that φ is bounded. It is easy to check that φ is injective. We show that it is surjective: let $x \in \ell^{p'}$. Then define $v \in (\ell^p)'$ by $v(y) = \sum_{n=1}^{\infty} x_n y_n$. By Hölder’s inequality, v is well-defined. We have $v(e_k) = x_k$, so φ(v) = x. Thus φ is an isomorphism. It remains to show that φ is an isometry. We have already seen that
(4.49) $\|\phi(v)\|_{p'} \le \|v\|_{op}$
4. Derivatives
Recall that a function f on an interval (a, b) is called differentiable at x ∈ (a, b) if $\lim_{h\to 0} \frac{f(x+h) - f(x)}{h}$ exists. In other words, if there exists a number T ∈ R such that
(4.50) $\lim_{h\to 0} \frac{|f(x + h) - f(x) - Th|}{|h|} = 0.$
In that case we denote that real number T by f'(x). A real number can be understood as a linear map R → R:
(4.51) $\mathbb{R} \to L(\mathbb{R}, \mathbb{R}), \quad T \mapsto (x \mapsto T \cdot x)$
That is, the linear map associated with a real number T is given by multiplication with
T . Interpreting the derivative at a given point as a linear map, we can formulate the
definition in the general setting of normed vector spaces.
Definition 4.31. Let X, Y be normed vector spaces and U ⊂ X open. A map
F : U → Y is called Fréchet differentiable (we also say differentiable) at x ∈ U if there
exists T ∈ L(X, Y ) such that
(4.52) $\lim_{h\to 0} \frac{\|F(x + h) - F(x) - Th\|_Y}{\|h\|_X} = 0.$
In that case we call T the (Fréchet) derivative of F at x and write T = DF (x) or
T = DF |x . F is called (Fréchet) differentiable if it is differentiable at every point
x ∈ U . When X = Rn we also use the following terminology: F is totally differentiable
and DF (x) is the total derivative of F at x.
Before we move on we need to verify that DF(x) is well-defined. That is, that T is uniquely determined by F and x. Suppose $T, \tilde T \in L(X, Y)$ both satisfy (4.52). Then
(4.53) $\|Th - \tilde T h\|_Y \le \|F(x + h) - F(x) - Th\|_Y + \|F(x + h) - F(x) - \tilde T h\|_Y.$
Thus, by (4.52),
(4.54) $\frac{\|Th - \tilde T h\|_Y}{\|h\|_X} \to 0$ as h → 0.
In other words, for all ε > 0 there exists δ > 0 such that
(4.55) $\|Th - \tilde T h\|_Y \le \varepsilon\|h\|_X$
if $\|h\|_X \le \delta$. By homogeneity of norms we argue that the inequality (4.55) must hold for all h ∈ X: let h ∈ X, h ≠ 0 be arbitrary. Then let $h_0 = \delta \frac{h}{\|h\|_X}$. By homogeneity of norms we have $\|h_0\|_X = \delta$. Thus,
(4.56) $\|Th_0 - \tilde T h_0\|_Y \le \varepsilon\|h_0\|_X = \varepsilon\delta.$
Multiplying both sides by $\delta^{-1}\|h\|_X$ and using homogeneity of norms and linearity of T and $\tilde T$, we obtain
(4.57) $\|Th - \tilde T h\|_Y \le \varepsilon\|h\|_X$
for all h ∈ X (it is trivial for h = 0). Since ε > 0 was arbitrary (and is independent of h), this implies $\|Th - \tilde T h\|_Y = 0$, so $Th = \tilde T h$ for all h. Thus $T = \tilde T$.
Comments.
• O and o are not functions and (4.58), (4.61) are not equations!
• This is an abuse of the equality sign: it would be more accurate to define O(g) as the class of functions that satisfy (4.60), say, and to write f ∈ O(g).
• One can think of (say) O(g) as a placeholder for a function which may change at every occurrence of the symbol O(g) but always satisfies the respective condition that it is dominated by a constant times $\|g(h)\|$ if $\|h\|$ is small.
• For brevity, we may sometimes not write out the phrase "as h → 0".
• There is nothing special about letting h tend to 0 in this definition. We can also define o(g), O(g) with respect to another limit, for instance, say, as $\|h\| \to \infty$.
• If f(h) = o(g(h)), then f(h) = O(g(h)), but generally not vice versa.
• If $f(h) = O(\|h\|^k)$, then $f(h) = o(\|h\|^{k-\varepsilon})$ for every ε > 0.
Thus
(4.73) $\|F(f + h) - F(f) - Th\|_\infty \le \sup_{x\in[0,1]} \int_0^x h(t)^2\,dt$
(4.74) $\le \int_0^1 |h(t)|^2\,dt \le \sup_{x\in[0,1]} |h(x)|^2 = \|h\|_\infty^2.$
This implies
(4.75) $\frac{1}{\|h\|_\infty} \|F(f + h) - F(f) - Th\|_\infty \le \|h\|_\infty \to 0$
as h → 0. Thus F is Fréchet differentiable at f and $DF|_f(h) = 2\int_0^x f(t)h(t)\,dt$.
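The estimate (4.75) can be observed numerically. The definition of F is cut off above; the sketch assumes $F(f)(x) = \int_0^x f(t)^2\,dt$ on C([0, 1]), which is consistent with (4.73) and the formula for $DF|_f$. The grid size, the trapezoid rule and the sample functions f, h are choices made here for illustration.

```python
import math

M = 1000
ts = [i / M for i in range(M + 1)]   # uniform grid on [0, 1]

def cumulative_integral(values):
    """Trapezoid approximation of x -> int_0^x values(t) dt at the grid points."""
    out = [0.0]
    for i in range(1, len(values)):
        out.append(out[-1] + 0.5 * (values[i - 1] + values[i]) / M)
    return out

def F(f_vals):
    """Assumed map F(f)(x) = int_0^x f(t)^2 dt."""
    return cumulative_integral([v * v for v in f_vals])

def DF(f_vals, h_vals):
    """Candidate derivative DF|_f(h)(x) = 2 int_0^x f(t) h(t) dt."""
    return cumulative_integral([2 * a * b for a, b in zip(f_vals, h_vals)])

f_vals = [math.sin(3 * t) for t in ts]
scales = [1e-1, 1e-2, 1e-3]
ratios = []
for scale in scales:
    h_vals = [scale * math.cos(5 * t) for t in ts]
    shifted = F([a + b for a, b in zip(f_vals, h_vals)])
    base, lin = F(f_vals), DF(f_vals, h_vals)
    remainder = max(abs(s - b0 - l) for s, b0, l in zip(shifted, base, lin))
    sup_h = max(abs(v) for v in h_vals)
    ratios.append(remainder / sup_h)   # left-hand side of (4.75)
print(ratios)
```

The printed ratios shrink proportionally to the sup-norm of h, as (4.75) predicts.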
5. Further exercises
Exercise 4.41. Let $x \in \mathbb{R}^n$. Define $\|x\|_p = \big(\sum_{i=1}^{n} |x_i|^p\big)^{1/p}$ for 0 < p < ∞ and $\|x\|_\infty = \max_{i=1,\dots,n} |x_i|$.
(i) Show that $\lim_{p\to\infty} \|x\|_p = \|x\|_\infty$.
(ii) Show that $\lim_{p\to 0} \|x\|_p$ exists and determine its value (we also allow ∞ as a limit).
Exercise 4.42. Let C(R) be the set of continuous functions on R. Let $w(t) = \frac{t}{1+t}$ for t ≥ 0. Define
(4.95) $d(f, g) = \sum_{k=0}^{\infty} 2^{-k}\, w\Big(\sup_{x\in[-k,k]} |f(x) - g(x)|\Big).$
Exercise 4.47. Let $\mathbb{R}^{n\times n}$ denote the space of real n × n matrices equipped with the matrix norm $\|A\| = \sup_{\|x\|=1} \|Ax\|$. Define
(4.98) $F : \mathbb{R}^{n\times n} \to \mathbb{R}^{n\times n}, \quad A \mapsto A^2.$
Show that F is totally differentiable and compute $DF|_A$.
CHAPTER 5
Differential calculus in Rn
Theorem 5.3 (Mean value theorem). Let U ⊂ Rn be open and convex. Suppose
that f : U → R is totally differentiable on U . Then, for every x, y ∈ U , there exists
ξ ∈ U such that
(5.5) f (x) − f (y) = Df |ξ (x − y)
and there exists t ∈ [0, 1] such that ξ = tx + (1 − t)y.
The idea of the proof is to apply the one-dimensional mean value theorem to the
function restricted to the line passing through x and y.
Proof. If x = y there is nothing to show. Let x ≠ y. Define g : [0, 1] → R by g(t) = f(tx + (1 − t)y). The function g is continuous on [0, 1] and differentiable on (0, 1). By the one-dimensional mean value theorem there exists $t_0 \in [0, 1]$ such that $g(1) - g(0) = g'(t_0)$. By the chain rule,
(5.6) $g'(t_0) = Df|_{t_0 x + (1 - t_0) y}(x - y).$
Corollary 5.4. Under the assumptions of the previous theorem: if Df |x = 0 for
all x ∈ U , then f is constant.
Exercise 5.5. Show that the conclusion of the corollary also holds under the weaker
assumption that U is open and connected (rather than convex). Hint: Consider over-
lapping open balls along a continuous path connecting two given points in U .
Definition 5.6. A map f : U → Rm , U ⊂ Rn open, is called continuously dif-
ferentiable (on U ) if it is totally differentiable on U and the map U → L(Rn , Rm ),
x 7→ Df |x is continuous. We denote the collection of continuously differentiable maps
by C 1 (U, Rm ). If m = 1 we also write C 1 (U, R) = C 1 (U ).
Remark. For f : U → R, continuity of the map U → Rn , x 7→ ∇f (x) is equivalent to
continuity of the map U → L(Rn , R), x 7→ Df |x .
Theorem 5.7. Let U ⊂ Rn be open. Let f : U → R. Then f ∈ C 1 (U ) if and
only if ∂j f (x) exists for every j ∈ {1, . . . , n} and x 7→ ∂j f (x) is continuous on U for
j ∈ {1, . . . , n}.
Remark. Without additional assumptions (such as continuity of x 7→ ∂j f (x)), existence
of partial derivatives does not imply total differentiability.
Exercise 5.8. Let $F : \mathbb{R}^2 \to \mathbb{R}$ be defined by $F(x) = \frac{x_1 x_2}{x_1^2 + x_2^2}$ if x ≠ 0 and F(0) = 0.
(i) Show that the partial derivatives ∂1 F (x), ∂2 F (x) exist for every x ∈ R2 .
(ii) Show that F is not continuous at (0, 0).
(iii) Determine at which points F is totally differentiable.
Proof. Let $f \in C^1(U)$. Then $\partial_j f(x)$ exists by Theorem 4.38 and $x \mapsto \partial_j f(x)$ is continuous because it can be written as the composition of the continuous maps $x \mapsto \nabla f(x)$ and $\pi_j : \mathbb{R}^n \to \mathbb{R}$, $x \mapsto x_j$: $\partial_j f(x) = (\pi_j \circ \nabla f)(x)$.
Conversely, assume that $\partial_j f(x)$ exists for every x ∈ U, j ∈ {1, . . . , n} and $x \mapsto \partial_j f(x)$ is continuous. Let x ∈ U. Write $h = \sum_{j=1}^{n} h_j e_j$ and define $v_k = \sum_{j=1}^{k} h_j e_j$ for 1 ≤ k ≤ n and $v_0 = 0$. Then the difference telescopes:
(5.7) $f(x + h) - f(x) = f(x + v_n) - f(x + v_0)$
(5.8) $= \sum_{j=1}^{n} \big(f(x + v_j) - f(x + v_{j-1})\big).$
By the one-dimensional mean value theorem there exists $t_j \in [0, 1]$ such that
(5.9) $f(x + v_j) - f(x + v_{j-1}) = f(x + v_{j-1} + h_j e_j) - f(x + v_{j-1}) = \partial_j f(x + v_{j-1} + t_j h_j e_j)\,h_j.$
By continuity of $\partial_j f$, for every ε > 0 there exists δ > 0 such that
(5.10) $|\partial_j f(y) - \partial_j f(x)| \le \varepsilon/n$ for all j = 1, . . . , n,
whenever y ∈ U is such that $\|x - y\| \le \delta$. We may choose δ small enough so that x + h ∈ U whenever $\|h\| \le \delta$. Then, if $\|h\| \le \delta$ (then also $\|v_j\| \le \delta$, $\|v_{j-1} + t_j h_j e_j\| \le \delta$) we get
(5.11) $\Big| f(x + h) - f(x) - \sum_{j=1}^{n} h_j \partial_j f(x) \Big| \le \sum_{j=1}^{n} \big| f(x + v_j) - f(x + v_{j-1}) - h_j \partial_j f(x) \big|$
(5.12) $= \sum_{j=1}^{n} |h_j|\,|\partial_j f(x + v_{j-1} + t_j h_j e_j) - \partial_j f(x)| \le \frac{\varepsilon}{n} \sum_{j=1}^{n} |h_j| \le \varepsilon\|h\|.$
Proof. Let g(t) = f(tx + (1 − t)y). By the fundamental theorem of calculus and the chain rule,
(5.15) $f(x) - f(y) = g(1) - g(0) = \int_0^1 g'(s)\,ds = \int_0^1 Df|_{tx+(1-t)y}(x - y)\,dt.$
Theorem 5.10 (Mean value theorem, vector-valued case). Let $U \subset \mathbb{R}^n$ be open and convex and $F \in C^1(U, \mathbb{R}^m)$. Then for every x, y ∈ U there exists θ ∈ [0, 1] such that
(5.16) $\|F(x) - F(y)\| \le \|DF|_\xi\|_{op}\,\|x - y\|,$
where ξ = θx + (1 − θ)y.
Proof. Write $F = (F_1, \dots, F_m)$. Then by Theorem 5.9
(5.17) $F_i(x) - F_i(y) = \int_0^1 DF_i|_{tx+(1-t)y}(x - y)\,dt.$
This implies
(5.18) $F(x) - F(y) = \int_0^1 DF|_{tx+(1-t)y}(x - y)\,dt.$
By the triangle inequality, we have
(5.19) $\|F(x) - F(y)\| \le \int_0^1 \|DF|_{tx+(1-t)y}\|_{op}\,dt\,\|x - y\|.$
The map [0, 1] → R, $t \mapsto \|DF|_{tx+(1-t)y}\|_{op}$ is continuous (because F is $C^1$) and therefore assumes its supremum at some point θ ∈ [0, 1]. Define ξ = θx + (1 − θ)y. Then
(5.20) $\|F(x) - F(y)\| \le \|DF|_\xi\|_{op}\,\|x - y\|.$
Remark. If m ≥ 2 and F : U → Rm is C 1 and x, y ∈ U , then it is not necessarily true
that there exists ξ ∈ U such that
(5.21) F (x) − F (y) = DF |ξ (x − y).
(5.28) $= \sum_{i=n}^{m-1} d(x_{i+1}, x_i) \le \sum_{i=n}^{m-1} c^i\,d(x_1, x_0) \le d(x_1, x_0) \sum_{i=n}^{\infty} c^i = c^n\,\frac{d(x_1, x_0)}{1 - c}.$
Remarks. 1. The theorem is false if we drop the assumption that X is complete: the
map f : (0, 1) → (0, 1) defined by f (x) = x/2 is a contraction, but has no fixed point.
2. The proof not only demonstrates the existence of the fixed point $x^*$, but also gives an algorithm to compute it via successive applications of the map ϕ. We can say something about how quickly the algorithm converges: the sequence $(x_n)_n$ defined in the proof satisfies the inequality
(5.30) $d(x_n, x^*) \le \frac{c^n}{1 - c}\,d(x_0, x_1),$
so speed of convergence depends only on the parameter c ∈ (0, 1) and the quality of the initial guess $x_0 \in X$.
3. The contraction principle can be used to solve equations. For example, say we want
to solve F (x) = 0 (F is some function). Then we can set G(x) = F (x) + x. Then
F (x) = 0 if and only if x is a fixed point of G.
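Remark 2 can be tried out on a concrete contraction. The sketch below uses ϕ = cos on X = [0, 1], which maps [0, 1] into itself and has |ϕ'| ≤ sin(1) < 1 there, so c = sin(1) is an admissible contraction constant; this example and the helper name `iterate` are chosen here for illustration. It compares the actual error with the a priori bound (5.30).

```python
import math

def iterate(phi, x0, steps):
    """Return [x_0, x_1, ..., x_steps] with x_{n+1} = phi(x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(phi(xs[-1]))
    return xs

c = math.sin(1.0)                  # contraction constant of cos on [0, 1]
xs = iterate(math.cos, 1.0, 200)
x_star = xs[-1]                    # numerical stand-in for the fixed point

d01 = abs(xs[0] - xs[1])
bound_ok = all(
    abs(xs[n] - x_star) <= c ** n / (1 - c) * d01 + 1e-12
    for n in range(50)
)
print(x_star, bound_ok)
```

The bound (5.30) is usually pessimistic: near the fixed point the iteration contracts with factor about |ϕ'(x*)|, which is smaller than c.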
Proof. We want to apply the contraction principle. For fixed $y \in \mathbb{R}^n$, consider the map
(5.35) $\varphi_y(x) = x + Df|_a^{-1}(y - f(x))$ (x ∈ E)
Let $\lambda = \|Df|_a^{-1}\|_{op}$. By continuity of Df at a, there exists an open ball U ⊂ E such that
(5.37) $\|Df|_a - Df|_x\|_{op} \le \frac{1}{2\lambda}$ for x ∈ U.
Then for x, x' ∈ U
(5.38) $\|\varphi_y(x) - \varphi_y(x')\| \le \|D\varphi_y|_\xi\|_{op}\,\|x - x'\|$
(5.39) $\le \|Df|_a^{-1}\|_{op}\,\|Df|_a - Df|_\xi\|_{op}\,\|x - x'\| \le \frac{1}{2}\|x - x'\|.$
Note that this doesn’t show that ϕy is a contraction, because ϕy (U ) may not be con-
tained in U . However, it does show that ϕy has at most one fixed point (by the same
argument used to show uniqueness in the Banach fixed point theorem). This already
implies that f is injective on U : for every y ∈ Rn we have f (x) = y for at most one
x ∈ U . Let V = f (U ). Then f |U : U → V is a bijection and has an inverse g : V → U .
Claim. V is open.
Proof of claim. Let y0 ∈ V . We need to show that there exists an open ball
around y0 that is contained in V . Since V = f (U ) there exists x0 ∈ U such that
f (x0 ) = y0 . Let r > 0 be small enough so that Br (x0 ) ⊂ U (possible because U is
open). Let ε > 0 and y ∈ Bε (y0 ). We will demonstrate that if ε > 0 is small enough,
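then $\varphi_y$ maps $B_r(x_0)$ into itself. First note
(5.40) $\|\varphi_y(x_0) - x_0\| = \|Df|_a^{-1}(y - y_0)\| \le \lambda\varepsilon.$
Hence, choosing $\varepsilon \le \frac{r}{2\lambda}$, we get for $x \in B_r(x_0)$ that
(5.41) $\|\varphi_y(x) - x_0\| \le \|\varphi_y(x) - \varphi_y(x_0)\| + \|\varphi_y(x_0) - x_0\|$
(5.42) $\le \frac{1}{2}\|x - x_0\| + \frac{r}{2} \le \frac{r}{2} + \frac{r}{2} = r.$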
Thus ϕy (x) ∈ Br (x0 ). This proves ϕy (Br (x0 )) ⊂ Br (x0 ), so ϕy is a contraction of Br (x0 ).
By the Banach fixed point theorem, ϕy must have a unique fixed point x ∈ Br (x0 ). So
by definition of ϕy we have f (x) = y, so y ∈ f (U ) = V . Therefore we have shown that
Bε (y0 ) ⊂ V , so V is open.
It remains to show that $g \in C^1(V, U)$ and $Dg|_{f(a)} = Df|_a^{-1}$. We use the following lemma.
Lemma 5.18. Let $A, B \in \mathbb{R}^{n\times n}$ such that A is invertible and
(5.43) $\|B - A\| \cdot \|A^{-1}\| < 1.$
Then B is invertible. (Here $\|\cdot\|$ denotes the matrix norm, which is just the operator norm: $\|A\| = \sup_{\|x\|=1} \|Ax\|$.)
(5.48) $h - Df|_a^{-1} k = h + Df|_a^{-1}(f(x) - f(x + h)) = \varphi_y(x + h) - \varphi_y(x),$
so $\|h - Df|_a^{-1} k\| \le \frac{1}{2}\|h\|$. Therefore, $\|h\| \le 2\lambda\|k\| \to 0$ as $\|k\| \to 0$. Now we compute
(5.49) $g(y + k) - g(y) - Df|_x^{-1} k$
(5.50) $= h - Df|_x^{-1}(f(x + h) - f(x)) = -Df|_x^{-1}(f(x + h) - f(x) - Df|_x h)$
and so
(5.51) $\frac{1}{\|k\|} \|g(y + k) - g(y) - Df|_x^{-1} k\| \le \|Df|_x^{-1}\|\,\frac{\|f(x + h) - f(x) - Df|_x h\|}{\|h\|}\,\frac{\|h\|}{\|k\|}$
(5.52) $\le \|Df|_x^{-1}\|\,2\lambda\,\frac{\|f(x + h) - f(x) - Df|_x h\|}{\|h\|} \to 0$ as k → 0.
Therefore g is differentiable at y with $Dg|_y = Df|_x^{-1}$.
It remains to show that Dg is continuous. To show this we need another lemma.
Lemma 5.19. Let GL(n) denote the space of real invertible n × n matrices (equipped with some norm). The map GL(n) → GL(n) defined by $A \mapsto A^{-1}$ is continuous.
This lemma follows because the entries of $A^{-1}$ are rational functions with non-vanishing denominator in terms of the entries of A (by Cramer’s rule).
Since $Dg|_y = Df|_x^{-1}$ and compositions of continuous maps are continuous (Df is continuous by assumption), we have that Dg must be continuous, so $g \in C^1(V, U)$.
Exercise 5.20. Let f ∈ C 1 (E, Rn ) and assume that Df |x is invertible for all x ∈ E.
Prove that f (U ) is open for every open set U ⊂ E.
Remark. If f is locally invertible at every point, it is not necessarily (globally) invertible
(that is, bijective).
Example 5.21. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be given by $f(x) = (e^{x_2} \sin(x_1), e^{x_2} \cos(x_1))$. Then
(5.53) $Df|_x = \begin{pmatrix} e^{x_2} \cos(x_1) & e^{x_2} \sin(x_1) \\ -e^{x_2} \sin(x_1) & e^{x_2} \cos(x_1) \end{pmatrix}.$
Thus $\det Df|_x = e^{2x_2}(\cos(x_1)^2 + \sin(x_1)^2) = e^{2x_2} \ne 0$, so by Theorem 5.17, f is locally invertible at every point $x \in \mathbb{R}^2$. However, f is not bijective: it is not injective because, for instance, f(0, 0) = f(2π, 0).
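Both claims of Example 5.21 are easy to confirm numerically. The sketch below evaluates the determinant of the Jacobian (5.53) entry by entry and compares f at the two points named in the example; the function names and the sample point are chosen here for illustration.

```python
import math

def f(x1, x2):
    """The map from Example 5.21."""
    return (math.exp(x2) * math.sin(x1), math.exp(x2) * math.cos(x1))

def jacobian_det(x1, x2):
    """Determinant of the Jacobian (5.53), computed from its four entries."""
    e = math.exp(x2)
    a, b = e * math.cos(x1), e * math.sin(x1)
    c, d = -e * math.sin(x1), e * math.cos(x1)
    return a * d - b * c

det_err = abs(jacobian_det(0.7, -1.2) - math.exp(2 * (-1.2)))
p, q = f(0.0, 0.0), f(2 * math.pi, 0.0)
collision = max(abs(p[0] - q[0]), abs(p[1] - q[1]))
print(det_err, collision)
```

Both printed numbers are zero up to rounding: the determinant never vanishes, yet two different points have the same image.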
We will now use the inverse function theorem to prove a significant generalization
concerning equations of the form f (x, y) = 0, where y is given and we want to solve for
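x. Let $E \subset \mathbb{R}^n \times \mathbb{R}^m$ be open and $f : E \to \mathbb{R}^n$ differentiable at $p = (a, b) \in \mathbb{R}^n \times \mathbb{R}^m$. Then $Df|_p$ is an n × (n + m) matrix:
(5.54) $Df|_p = \begin{pmatrix} \partial_1 f_1|_p & \cdots & \partial_n f_1|_p & \partial_{n+1} f_1|_p & \cdots & \partial_{n+m} f_1|_p \\ \vdots & & \vdots & \vdots & & \vdots \\ \partial_1 f_n|_p & \cdots & \partial_n f_n|_p & \partial_{n+1} f_n|_p & \cdots & \partial_{n+m} f_n|_p \end{pmatrix} \in \mathbb{R}^{n\times(n+m)}$
We denote the left n × n submatrix by $D_x f|_p$ and the right n × m submatrix by $D_y f|_p$.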
Note that Dx f |(a,b) is the Jacobian matrix of the differentiable map x 7→ f (x, b) at
x = a (b is fixed).
Theorem 5.22 (Implicit function theorem). Let f ∈ C 1 (E, Rn ), (a, b) ∈ E ⊂
Rn × Rm with f (a, b) = 0. Assume that Dx f |(a,b) ∈ Rn×n is invertible. Then there exist
open sets U ⊂ E and W ⊂ Rm with (a, b) ∈ U , b ∈ W such that for every y ∈ W there
exists a unique x such that (x, y) ∈ U and f (x, y) = 0. Write x = g(y). Then W can be
chosen such that g ∈ C 1 (W, Rn ), g(b) = a, (g(y), y) ∈ U and f (g(y), y) = 0 for y ∈ W .
Moreover,
(5.55) $Dg|_b = -D_x f|_{(a,b)}^{-1}\,D_y f|_{(a,b)}.$
(Note that this equation makes sense, because $Dg|_b \in \mathbb{R}^{n\times m}$, $D_x f|_{(a,b)}^{-1} \in \mathbb{R}^{n\times n}$, $D_y f|_{(a,b)} \in \mathbb{R}^{n\times m}$.)
Remark. The equation (5.55) can be obtained from differentiating the equation
(5.56) f (g(y), y) = 0
with respect to y using the chain rule (this is called implicit differentiation).
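Formula (5.55) can be tested on a scalar example (n = m = 1). The sketch below uses the made-up function $f(x, y) = x^3 + x - y$ with (a, b) = (1, 2), solves f(x, y) = 0 for x = g(y) by Newton's method, and compares a finite-difference derivative of g with the right-hand side of (5.55). The function, the starting value and the step size are assumptions chosen for illustration.

```python
def f(x, y):
    """Hypothetical example: f(x, y) = x^3 + x - y, so f(1, 2) = 0."""
    return x ** 3 + x - y

def dfdx(x, y):
    """Partial derivative D_x f, which is never zero for this example."""
    return 3 * x ** 2 + 1

def dfdy(x, y):
    """Partial derivative D_y f."""
    return -1.0

def g(y, x_start=1.0, steps=50):
    """Solve f(x, y) = 0 for x by Newton's method, starting near a = 1."""
    x = x_start
    for _ in range(steps):
        x -= f(x, y) / dfdx(x, y)
    return x

a, b = 1.0, 2.0
formula = -dfdy(a, b) / dfdx(a, b)                 # right-hand side of (5.55)
eps = 1e-6
finite_diff = (g(b + eps) - g(b - eps)) / (2 * eps)  # central difference for g'(b)
print(formula, finite_diff)
```

The two numbers agree up to the discretization error of the central difference.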
Proof. Define F (x, y) = (f (x, y), y) for (x, y) ∈ E. Then F ∈ C 1 (E, Rn × Rm ).
We would like to apply the inverse function theorem to F . For h ∈ Rn , k ∈ Rm with
(a + h, b + k) ∈ E,
\[ F(a + h, b + k) - F(a, b) = (f(a + h, b + k) - f(a, b), k) = (Df|_{(a,b)}(h, k) + o(\|(h, k)\|), k) \tag{5.57} \]
Example 5.23. While we used the inverse function theorem in the proof of the
implicit function theorem, the inverse function theorem is also a consequence of the
implicit function theorem. Say E ⊂ Rn , f ∈ C 1 (E, Rn ) and a ∈ E such that Df |a is
invertible.
Then
2x1 y1 cos(y2 ) x21 −x2 sin(y2 ) −1
(5.63) Df (x, y) = −x2
y2 y3 −e cos(y1 ) x1 y 3 x1 y 2
−1 − cos(1) −1 1
(5.67) Dg|b = −Dx f |−1
(a,b) Dy f |(a,b) = 1
.
2 2 cos(1) 2 0
Example 5.25. Consider the equation $y' = y/t$. The solutions of this equation are of the form y(t) = ct for c ∈ R.
Example 5.26. Sometimes we can solve initial value problems by computing an explicit expression for y. Recall for instance that solving differential equations of the form $y' = f(t)g(y)$ is easy (by separation of variables). Consider for instance
\[ \begin{cases} y'(t) = \dfrac{t}{y(t)}, \\ y(t_0) = y_0 \end{cases} \tag{5.70} \]
for $(t_0, y_0) \in (0, \infty) \times (0, \infty)$. Then $y(t) = \sqrt{t^2 + y_0^2 - t_0^2}$. Note that if $y_0^2 - t_0^2 \ge 0$, then y is defined on I = (0, ∞). But if $y_0^2 - t_0^2 < 0$, then y is only defined on $I = (\sqrt{t_0^2 - y_0^2}, \infty) \ni t_0$.
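For completeness, here is how separation of variables produces the formula for y in Example 5.26. It is a sketch of the standard computation, using only the data of (5.70).

```latex
\begin{align*}
y(t)\, y'(t) &= t
  && \text{(multiply the equation in (5.70) by } y(t)\text{)} \\
\tfrac12 y(t)^2 - \tfrac12 y_0^2 &= \tfrac12 t^2 - \tfrac12 t_0^2
  && \text{(integrate from } t_0 \text{ to } t\text{)} \\
y(t) &= \sqrt{t^2 + y_0^2 - t_0^2}
  && \text{(take the positive root, since } y_0 > 0\text{)}
\end{align*}
```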
3. ORDINARY DIFFERENTIAL EQUATIONS 87
In general, however, it is not easy to find a solution. It may also happen that the solution is not expressible in terms of elementary functions. Try, for instance, to solve
the initial value problem
2 2
0
y (t) = ey(t) t sin(t + y(t)),
(5.71)
y(1) = 5.
Proof of Theorem 5.27. Let J = [y0 − b, y0 + b]. It suffices to show that there
exists a unique continuous function y : I → J such that
\[ y(t) = y_0 + \int_{t_0}^{t} F(s, y(s))\,ds \tag{5.76} \]
\[ = c\, d_*(g_1, g_2)\, \frac{1}{2c}\big(e^{2c(t - t_0)} - 1\big) \le \tfrac12 d_*(g_1, g_2)\, e^{2c|t - t_0|}. \tag{5.84} \]
Similarly, for $t \in [t_0 - a_*, t_0]$ we also have
\[ |Tg_1(t) - Tg_2(t)| \le \tfrac12 d_*(g_1, g_2)\, e^{2c|t - t_0|}. \tag{5.85} \]
Thus,
\[ e^{-2c|t - t_0|}\, |Tg_1(t) - Tg_2(t)| \le \tfrac12 d_*(g_1, g_2) \tag{5.86} \]
holds for all t ∈ I, so $d_*(Tg_1, Tg_2) \le \tfrac12 d_*(g_1, g_2)$.
(Footnote 2: For the supremum metric to give rise to a contraction we would need to make the interval I smaller.)
By the Banach fixed point theorem, there exists a unique y ∈ Y such that T y = y,
i.e. a unique solution to the initial value problem (5.74).
Remarks. 1. The proof is constructive. That is, it tells us how to compute the solution.
This is because the proof of the Banach fixed point theorem is constructive. Indeed,
construct a sequence $(y_n)_{n \ge 0} \subset Y$ by $y_0(t) = y_0$ and
\[ y_n(t) = y_0 + \int_{t_0}^{t} F(s, y_{n-1}(s))\,ds \quad \text{for } n = 1, 2, \dots \tag{5.87} \]
Then (yn )n≥0 converges uniformly on I to the solution y. This method is called Picard
iteration.
2. Note that the length of the existence interval I does not depend on the size of the
constant c in (5.73).
Example 5.29. Consider the initial value problem
\[ \begin{cases} y'(t) = \dfrac{e^t \sin(t + y(t))}{t\,y(t) - 1}, \\ y(1) = 5. \end{cases} \tag{5.88} \]
Let $F(t, y) = \frac{e^t \sin(t + y)}{ty - 1}$. We need to choose a rectangle R around the point (1, 5) where we have control over $|F(t, y)|$ and $|\partial_y F(t, y)|$. Thus we need to stay away from the set of (t, y) such that ty − 1 = 0. Say,
\[ R = \{(t, y) : |t - 1| \le \tfrac12,\ |y - 5| \le 1\}. \tag{5.89} \]
Then for (t, y) ∈ R:
\[ |ty - 1| \ge (1 - \tfrac12)(5 - 1) - 1 = 1. \tag{5.90} \]
Also, $|e^t \sin(t + y)| \le e^{3/2}$. Setting $M = e^{3/2}$, we obtain
\[ |F(t, y)| \le M \quad \text{for all } (t, y) \in R. \tag{5.91} \]
Compute
\[ \partial_y F(t, y) = \frac{e^t \cos(t + y)}{ty - 1} - t\,\frac{e^t \sin(t + y)}{(ty - 1)^2}. \tag{5.92} \]
For (t, y) ∈ R we estimate
\[ |\partial_y F(t, y)| \le \left|\frac{e^t \cos(t + y)}{ty - 1}\right| + \left|t\,\frac{e^t \sin(t + y)}{(ty - 1)^2}\right| \le c, \tag{5.93} \]
where we have set $c = e^{3/2} + \tfrac32 e^{3/2}$. Then the number $a_*$ from Theorem 5.27 is $a_* = \min(a, b/M) = \min(\tfrac12, e^{-3/2}) = e^{-3/2}$. So the theorem yields the existence and uniqueness of a solution to the initial value problem (5.88) in the interval $I = [1 - e^{-3/2}, 1 + e^{-3/2}]$. We can also compute that solution by Picard iteration: let $y_0(t) = 5$ and
\[ y_n(t) = 5 + \int_1^t \frac{e^s \sin(s + y_{n-1}(s))}{s\,y_{n-1}(s) - 1}\,ds. \tag{5.94} \]
The sequence $(y_n)_n$ converges uniformly on I to the solution y.
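The Picard iteration (5.94) can be approximated numerically. The following is a minimal sketch and not part of the original notes: it assumes a uniform grid on the right half $[1, 1 + e^{-3/2}]$ of I and replaces the integral by the trapezoidal rule. The names `F` and `picard` and the parameters `num_iter` and `n` are illustrative choices.

```python
import math

def F(t, y):
    # Right-hand side of the initial value problem (5.88).
    return math.exp(t) * math.sin(t + y) / (t * y - 1.0)

def picard(num_iter, n=400):
    """Approximate the Picard iterate y_{num_iter} on a grid of [1, 1 + a_*]."""
    a_star = math.exp(-1.5)
    ts = [1.0 + a_star * i / n for i in range(n + 1)]
    ys = [5.0] * (n + 1)  # y_0(t) = 5
    for _ in range(num_iter):
        vals = [F(t, y) for t, y in zip(ts, ys)]
        new = [5.0]
        for i in range(1, n + 1):
            h = ts[i] - ts[i - 1]
            # Trapezoidal rule for the integral in (5.94).
            new.append(new[-1] + 0.5 * h * (vals[i - 1] + vals[i]))
        ys = new
    return ts, ys

ts, y8 = picard(8)
_, y9 = picard(9)
print(max(abs(u - v) for u, v in zip(y8, y9)))  # successive iterates are very close
```

On this grid the iterates stay inside [4, 6], in line with the choice b = 1 in (5.89).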
Example 5.30. Sometimes one can extend solutions beyond the interval obtained
from the Picard-Lindelöf theorem. Consider the initial value problem
\[ \begin{cases} y'(t) = \cos(y(t)^2 - 2t^3), \\ y(0) = 1. \end{cases} \tag{5.95} \]
We claim that there exists a unique solution y : R → R. To prove this it suffices to
demonstrate the existence of a unique solution on the interval [−L, L] for every L > 0.
To do this we invoke the Picard-Lindelöf theorem. Set
(5.96) R = {(t, y) ∈ R2 : |t| ≤ L, |y − 1| ≤ L}.
Let $F(t, y) = \cos(y^2 - 2t^3)$. Then
\[ |F(t, y)| \le 1 \quad \text{for all } (t, y) \in \mathbb{R}^2. \tag{5.97} \]
We have $\partial_y F(t, y) = -2y \sin(y^2 - 2t^3)$, so $|\partial_y F(t, y)| \le 2|y| \le 2(L + 1)$ for all (t, y) ∈ R.
Then by Theorem 5.27 (with a = b = L and M = 1, so that $a_* = \min(a, b/M) = L$), there exists a unique solution to (5.95) on I = [−L, L].
Example 5.31. If the Lipschitz condition (5.73) fails, then the initial value problem
may have more than one solution. Consider
\[ \begin{cases} y'(t) = |y(t)|^{1/2}, \\ y(0) = 0. \end{cases} \tag{5.98} \]
The function $y \mapsto |y|^{1/2}$ is not Lipschitz continuous in any neighborhood of 0: for y > 0 its derivative $\tfrac12 y^{-1/2}$ is unbounded as y → 0. The function $y_1(t) = 0$ solves the initial value problem (5.98). The function
\[ y_2(t) = \begin{cases} t^2/4, & \text{if } t > 0, \\ 0, & \text{if } t \le 0 \end{cases} \tag{5.99} \]
also does.
Existence of a solution still holds without the assumption (5.73). We will prove this
as a consequence of the Arzelà-Ascoli theorem.
Theorem 5.32 (Peano existence theorem). Let E ⊂ R × R open, (t0 , y0 ) ∈ E,
F ∈ C(E),
(5.100) R = {(t, y) : |t − t0 | ≤ a, |y − y0 | ≤ b} ⊂ E.
Let M = sup(t,y)∈R |F (t, y)| < ∞. Define a∗ = min(a, b/M ) and let I = [t0 − a∗ , t0 + a∗ ].
Then there exists a solution y : I → R to the initial value problem
\[ \begin{cases} y'(t) = F(t, y(t)), \\ y(t_0) = y_0. \end{cases} \tag{5.101} \]
Corollary 5.33. Let E ⊂ R × R open, (t0 , y0 ) ∈ E, F ∈ C(E). Then there exists
an interval I ⊂ R and a differentiable function y : I → R such that (t, y(t)) ∈ E for all
t ∈ I and y solves (5.74).
Proof. It suffices to produce a solution to the integral equation
\[ y(t) = y_0 + \int_{t_0}^{t} F(s, y(s))\,ds. \tag{5.102} \]
To avoid some technicalities we will only present the proof under the additional assumption that
\[ |F(t, y)| \le M \tag{5.103} \]
holds for $|t - t_0| \le a$ and all y ∈ R. Then we may choose b arbitrarily large and thus $a_* = a$. We also restrict our attention to the interval $[t_0, t_0 + a]$, which we denote by I. The construction is similar on the other half, $[t_0 - a, t_0]$. Let $\mathcal{P} = \{t_0 < t_1 < \cdots < t_N = t_0 + a\}$ be a partition of $[t_0, t_0 + a]$. We let $\Delta\mathcal{P} = \max_{0 \le k \le N-1}(t_{k+1} - t_k)$ denote the fineness of $\mathcal{P}$. We try to build an approximate solution given as a piecewise linear function. The function $y_{\mathcal{P}} : [t_0, t_0 + a] \to \mathbb{R}$ is defined as follows: let $y_{\mathcal{P}}(t_0) = y_0$ and for $t \in (t_k, t_{k+1}]$ define $y_{\mathcal{P}}(t)$ recursively by
\[ y_{\mathcal{P}}(t) = y_{\mathcal{P}}(t_k) + F(t_k, y_{\mathcal{P}}(t_k))(t - t_k). \tag{5.104} \]
Claim 1. For $t, t' \in [t_0, t_0 + a]$,
\[ |y_{\mathcal{P}}(t) - y_{\mathcal{P}}(t')| \le M|t - t'|. \tag{5.105} \]
Proof of claim. In this proof we will write $y_{\mathcal{P}}$ as y for brevity. Say $t' \in [t_k, t_{k+1}]$, $t \in [t_\ell, t_{\ell+1}]$, $k \le \ell$. If $k = \ell$, then by (5.103),
\[ |y(t) - y(t')| = |F(t_k, y(t_k))(t - t')| \le M|t - t'|. \tag{5.106} \]
If $k < \ell$, then
\[ |y(t) - y(t')| = \Big|y(t) - y(t_\ell) + \sum_{j=k+1}^{\ell-1} (y(t_{j+1}) - y(t_j)) + y(t_{k+1}) - y(t')\Big| \tag{5.107} \]
\[ \le |y(t) - y(t_\ell)| + \sum_{j=k+1}^{\ell-1} |y(t_{j+1}) - y(t_j)| + |y(t_{k+1}) - y(t')| \tag{5.108} \]
\[ \le M(t - t_\ell) + \sum_{j=k+1}^{\ell-1} M(t_{j+1} - t_j) + M(t_{k+1} - t') = M(t - t'). \tag{5.109} \]
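The polygon construction (5.104) and the Lipschitz bound of Claim 1 can be tried out numerically. The sketch below is an illustration only: it uses the right-hand side of Example 5.30, which satisfies (5.103) with M = 1, on a uniform partition, and it checks the bound only at the partition nodes. The function names are hypothetical.

```python
import math

def F(t, y):
    # Right-hand side of Example 5.30; it satisfies |F| <= M = 1 everywhere.
    return math.cos(y * y - 2.0 * t ** 3)

def euler_polygon_nodes(F, t0, y0, a, N):
    """Return the nodes t_k and the values y_P(t_k) for a uniform partition."""
    ts = [t0 + a * k / N for k in range(N + 1)]
    ys = [y0]
    for k in range(N):
        # Formula (5.104) evaluated at t = t_{k+1}.
        ys.append(ys[k] + F(ts[k], ys[k]) * (ts[k + 1] - ts[k]))
    return ts, ys

def lipschitz_bound_holds(ts, ys, M, tol=1e-12):
    """Check |y_P(t) - y_P(t')| <= M |t - t'| for all pairs of nodes."""
    return all(
        abs(ys[i] - ys[j]) <= M * abs(ts[i] - ts[j]) + tol
        for i in range(len(ts))
        for j in range(i)
    )

ts, ys = euler_polygon_nodes(F, 0.0, 1.0, 2.0, 50)
print(lipschitz_bound_holds(ts, ys, M=1.0))
```

The small tolerance `tol` only absorbs floating point rounding; it plays no role in the mathematical statement.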
Define $g_{\mathcal{P}}(t) = F(t_k, y_{\mathcal{P}}(t_k))$ for $t \in (t_k, t_{k+1}]$. Then $g_{\mathcal{P}}$ is a step function and $y_{\mathcal{P}}'(t) = g_{\mathcal{P}}(t)$ for $t \in (t_k, t_{k+1})$.
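Claim 2. Suppose that $\Delta\mathcal{P} \le \delta(\varepsilon) \min(1, M^{-1})$. Then we have for all $t \in [t_0, t_0 + a]$ that
\[ y_{\mathcal{P}}(t) = y_0 + \int_{t_0}^{t} g_{\mathcal{P}}(s)\,ds \quad \text{and} \quad |g_{\mathcal{P}}(s) - F(s, y_{\mathcal{P}}(s))| \le \varepsilon \ \text{ if } s \in (t_{k-1}, t_k). \tag{5.111} \]
\[ = \sum_{j=1}^{k} F(t_{j-1}, y(t_{j-1}))(t_j - t_{j-1}) = \sum_{j=1}^{k} \int_{t_{j-1}}^{t_j} g(s)\,ds = \int_{t_0}^{t_k} g(s)\,ds. \tag{5.113} \]
Thus,
\[ y(t) = y(t_k) + \int_{t_k}^{t} g(s)\,ds = y_0 + \int_{t_0}^{t_k} g(s)\,ds + \int_{t_k}^{t} g(s)\,ds = y_0 + \int_{t_0}^{t} g(s)\,ds. \tag{5.115} \]
Claim 3. Suppose that $\Delta\mathcal{P} \le \delta(\varepsilon) \min(1, M^{-1})$. Then it holds for all $t \in [t_0, t_0 + a]$ that
\[ \Big|y_{\mathcal{P}}(t) - \Big(y_0 + \int_{t_0}^{t} F(s, y_{\mathcal{P}}(s))\,ds\Big)\Big| \le \varepsilon a. \tag{5.120} \]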
Proof of Theorem 5.40. The idea is to apply Taylor’s theorem in one dimension
to the function g : [0, 1] → R given by g(t) = f (x + ty). Let us compute the derivatives
of g.
Claim. For m = 1, . . . , k + 1,
\[ g^{(m)}(t) = \sum_{|\alpha| = m} \frac{m!}{\alpha!}\, \partial^\alpha f(x + ty)\, y^\alpha. \tag{5.144} \]
This follows because for a given $\alpha = (\alpha_1, \dots, \alpha_n)$ with $|\alpha| = m$ there are
\[ \frac{m!}{\alpha!} = \frac{m!}{\alpha_1! \cdots \alpha_n!} = \binom{m}{\alpha_1} \binom{m - \alpha_1}{\alpha_2} \cdots \binom{m - \alpha_1 - \cdots - \alpha_{n-1}}{\alpha_n} \tag{5.150} \]
many tuples $(i_1, \dots, i_m) \in \{1, \dots, n\}^m$ such that i appears exactly $\alpha_i$ times among the $i_j$'s. In other words, this is the number of ways to sort m pairwise different marbles into n numbered bins such that bin number i contains exactly $\alpha_i$ marbles.
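The counting statement behind (5.150) can be confirmed by brute force for small multi-indices. This is an illustrative sketch only; the function names and the sample multi-index α = (2, 1, 1) are choices made here, not part of the notes.

```python
from itertools import product
from math import factorial

def count_tuples(alpha):
    """Count tuples (i_1, ..., i_m) in {1, ..., n}^m in which i appears alpha_i times."""
    n, m = len(alpha), sum(alpha)
    return sum(
        1
        for tup in product(range(1, n + 1), repeat=m)
        if all(tup.count(i + 1) == alpha[i] for i in range(n))
    )

def multinomial(alpha):
    """Compute m! / alpha! with m = |alpha|."""
    result = factorial(sum(alpha))
    for a in alpha:
        result //= factorial(a)
    return result

alpha = (2, 1, 1)
print(count_tuples(alpha), multinomial(alpha))
```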
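By the one-dimensional Taylor theorem, there exists a θ ∈ [0, 1] such that
\[ g(t) = \sum_{m=0}^{k} \frac{g^{(m)}(0)}{m!}\, t^m + \frac{g^{(k+1)}(\theta)}{(k+1)!}\, t^{k+1}. \tag{5.151} \]
From the claim we see that this equals
\[ \sum_{m=0}^{k} \frac{1}{m!} \sum_{|\alpha| = m} \frac{m!}{\alpha!}\, \partial^\alpha f(x)\, y^\alpha t^m + \frac{1}{(k+1)!} \sum_{|\alpha| = k+1} \frac{(k+1)!}{\alpha!}\, \partial^\alpha f(x + \theta y)\, y^\alpha t^{k+1} \tag{5.152} \]
\[ = \sum_{|\alpha| \le k} \frac{\partial^\alpha f(x)}{\alpha!}\, (ty)^\alpha + \sum_{|\alpha| = k+1} \frac{\partial^\alpha f(\xi)}{\alpha!}\, (ty)^\alpha, \tag{5.153} \]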
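for some θ ∈ [0, 1]. Since $\partial^\alpha f$ is continuous for every $|\alpha| = k$, it holds that
\[ |\partial^\alpha f(x + \theta y) - \partial^\alpha f(x)| \to 0 \quad \text{as } y \to 0. \tag{5.156} \]
Also $|y^\alpha| = |y_1|^{\alpha_1} \cdots |y_n|^{\alpha_n} \le \|y\|^{\alpha_1 + \cdots + \alpha_n} = \|y\|^{|\alpha|}$, so
\[ \sum_{|\alpha| = k} \frac{\partial^\alpha f(x + \theta y) - \partial^\alpha f(x)}{\alpha!}\, y^\alpha = o(\|y\|^k). \tag{5.157} \]
We have
\[ \sum_{|\alpha| = 1} \frac{\partial^\alpha f(x)}{\alpha!}\, y^\alpha = \sum_{i=1}^{n} \partial_i f(x)\, y_i = \langle \nabla f(x), y \rangle. \tag{5.161} \]
\[ = \tfrac12 \sum_{i=1}^{n} y_i (D^2 f|_x\, y)_i = \tfrac12 \langle y, D^2 f|_x\, y \rangle. \tag{5.163} \]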
5. Local extrema
Let E ⊂ Rn be an open set and f : E → R a function.
Definition 5.44. A point a ∈ E is called a local maximum if there exists an open set U ⊂ E with a ∈ U such that f(a) ≥ f(x) for all x ∈ U. It is called a strict local maximum if f(a) > f(x) for all x ∈ U, x ≠ a. We define the terms local minimum, strict local minimum accordingly. A point is called a (strict) local extremum if it is a (strict) local maximum or a (strict) local minimum.
Theorem 5.45. Suppose the partial derivative $\partial_i f$ exists on E. If f has a local extremum at a ∈ E, then $\partial_i f(a) = 0$.
Proof. Let δ > 0 be such that $a + te_i \in E$ for all |t| ≤ δ. Define g : (−δ, δ) → R by $g(t) = f(a + te_i)$. By the chain rule, g is differentiable and $g'(t) = \partial_i f(a + te_i)$. Also, 0 is a local extremum of g, so by Analysis I, $0 = g'(0) = \partial_i f(a)$.
Corollary 5.46. If f is differentiable at a and a is a local extremum, then ∇f (a) =
0.
Remark. ∇f (a) = 0 is not a sufficient condition for a to be a local extremum. Think
of saddle points.
Definition 5.47. If a ∈ E is such that ∇f (a) = 0, then we call a a critical point
of f .
Recall from linear algebra: A matrix $A \in \mathbb{R}^{n \times n}$ is called positive definite if ⟨x, Ax⟩ > 0 for all $x \in \mathbb{R}^n \setminus \{0\}$ and positive semidefinite if ⟨x, Ax⟩ ≥ 0 for all $x \in \mathbb{R}^n$. We also write A > 0 to express that A is positive definite and A ≥ 0 to express that A is positive semidefinite. The terms negative definite, negative semidefinite are defined accordingly.
A is indefinite if it is not positive semidefinite and not negative semidefinite. Every real
symmetric matrix has real eigenvalues and there is an orthonormal basis of eigenvectors
(spectral theorem). A real symmetric matrix is positive definite if and only if all
eigenvalues are positive.
Theorem 5.48. Let $f \in C^2(E)$ and a ∈ E with ∇f(a) = 0. Then
(1) if $D^2 f|_a > 0$, then a is a strict local minimum of f,
(2) if $D^2 f|_a < 0$, then a is a strict local maximum of f,
(3) if $D^2 f|_a$ is indefinite, then a is not a local extremum of f.
Remark. If $D^2 f|_a$ is only positive semidefinite or negative semidefinite, then we need more information to be able to decide whether or not a is a local extremum.
Proof. We write $A = D^2 f|_a$. Let ε > 0. By Corollary 5.43 there exists δ > 0 such that for all y with ‖y‖ ≤ δ we have
\[ f(a + y) = f(a) + \tfrac12 \langle y, Ay \rangle + r(y) \tag{5.164} \]
with $|r(y)| \le \varepsilon \|y\|^2$.
(1): Let A be positive definite. Let $S = \{y \in \mathbb{R}^n : \|y\| = 1\}$. S is compact, so the continuous map $y \mapsto \langle y, Ay \rangle$ attains its minimum on S. That is, there exists $y_0 \in S$ such that
\[ \langle y_0, Ay_0 \rangle \le \langle y, Ay \rangle \tag{5.165} \]
for all y ∈ S. Define $\alpha = \langle y_0, Ay_0 \rangle$. Since $y_0 \ne 0$ and A is positive definite, α > 0. Let $y \in \mathbb{R}^n$, y ≠ 0. Then $\frac{y}{\|y\|} \in S$, so
\[ \alpha \le \Big\langle \frac{y}{\|y\|}, A \frac{y}{\|y\|} \Big\rangle = \frac{1}{\|y\|^2} \langle y, Ay \rangle. \tag{5.166} \]
Thus, $\langle y, Ay \rangle \ge \alpha \|y\|^2$ for all $y \in \mathbb{R}^n$. Now we set $\varepsilon = \frac{\alpha}{4}$. Then
\[ f(a + y) \ge f(a) + \tfrac12 \langle y, Ay \rangle - \tfrac{\alpha}{4} \|y\|^2 \ge f(a) + \tfrac{\alpha}{2} \|y\|^2 - \tfrac{\alpha}{4} \|y\|^2 = f(a) + \tfrac{\alpha}{4} \|y\|^2 > f(a) \tag{5.167} \]
if y ≠ 0, ‖y‖ ≤ δ. Therefore a is a strict local minimum.
(2): Follows from (1) by replacing f by −f.
(3): Let A be indefinite. We need to show that in every open neighborhood of a there exist points $y', y''$ such that
\[ f(y'') < f(a) < f(y'). \tag{5.168} \]
Since A is not negative semidefinite there exists $\xi \in \mathbb{R}^n$ such that $\alpha = \langle \xi, A\xi \rangle > 0$. Then, for t ∈ R small enough such that ‖tξ‖ ≤ δ we have
\[ f(a + t\xi) = f(a) + \tfrac12 \langle t\xi, At\xi \rangle + r(t\xi) = f(a) + \tfrac12 \alpha t^2 + r(t\xi). \tag{5.169} \]
Let ε > 0 be such that $|r(t\xi)| \le \frac{\alpha}{4} t^2$ for all ‖tξ‖ ≤ δ (recall that δ depends on ε). Then $f(a + t\xi) \ge f(a) + \tfrac14 \alpha t^2 > f(a)$ for t ≠ 0. Similarly, since A is also not positive semidefinite, there exists $\eta \in \mathbb{R}^n$ such that ⟨η, Aη⟩ < 0 and for small enough t ≠ 0, f(a + tη) < f(a).
Examples 5.49. (1) Let $f(x, y) = c + x^2 + y^2$ for c ∈ R. Then
\[ D^2 f|_0 = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} > 0 \tag{5.170} \]
and 0 is a strict local minimum of f (even a global minimum).
(2) Let $f(x, y) = c + x^2 - y^2$ for c ∈ R. Then
\[ D^2 f|_0 = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix} \tag{5.171} \]
is indefinite and 0 is not a local extremum of f.
(3) Let $f_1(x, y) = x^2 + y^4$, $f_2(x, y) = x^2$, $f_3(x, y) = x^2 + y^3$. Then
\[ D^2 f_i|_0 = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \ge 0, \tag{5.172} \]
but $f_1$ has a strict local minimum at 0, $f_2$ has a (non-strict) local minimum at 0 and $f_3$ has no local extremum at 0.
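Theorem 5.48 reduces the classification of a critical point to the signs of the eigenvalues of the Hessian. The following sketch does this for symmetric 2×2 matrices only, using the closed formula for their eigenvalues; the function names and the tolerance `tol` are illustrative assumptions, not part of the notes.

```python
import math

def eigenvalues_sym2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius

def classify(a, b, d, tol=1e-12):
    """Apply Theorem 5.48 to a critical point with Hessian [[a, b], [b, d]]."""
    lo, hi = eigenvalues_sym2(a, b, d)
    if lo > tol:
        return "strict local minimum"      # positive definite
    if hi < -tol:
        return "strict local maximum"      # negative definite
    if lo < -tol and hi > tol:
        return "no local extremum"         # indefinite
    return "inconclusive"                  # only semidefinite

print(classify(2, 0, 2))   # Hessian from (5.170)
print(classify(2, 0, -2))  # Hessian from (5.171)
print(classify(2, 0, 0))   # Hessian from (5.172)
```

The last case mirrors Example (3): the Hessian alone cannot distinguish between $f_1$, $f_2$ and $f_3$.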
Example 5.50 (Linear regression). Say we are given finitely many points
(5.173) (x1 , y1 ), . . . , (xN , yN ) ∈ Rn × R.
Suppose for instance that these represent measurements or observations of some physical
system. For example, xi could represent a point in space and yi the corresponding air
pressure measurement. We are looking to discover a “hidden relation” between the x
and y coordinates. That is, we are looking for a function F : Rn → R such that F (xi )
is (at least roughly) yi . One way this is done is linear regression. Here we search only
among F that take the form
\[ F_{a,b}(x) = \langle x, a \rangle + b \tag{5.174} \]
with some parameters a ∈ Rn , b ∈ R. That is, we are trying to “model” the hidden
relation by an affine linear function. The task is now to find the parameters a, b such
that $F_{a,b}$ “fits best” to the given data set. To make this precise we introduce the error function
\[ E(a, b) = \sum_{i=1}^{N} (F_{a,b}(x_i) - y_i)^2. \tag{5.175} \]
The problem of linear regression is to find the parameters (a, b) such that E(a, b) is
minimal.
One approach to minimizing a function f : E → R is to solve the equation ∇f(x) = 0, i.e. to find all critical points. By Corollary 5.46 we know that every minimum must be a critical point. However, it is often difficult to solve that equation, so more practical methods are needed.
Gradient descent. Choose x0 ∈ Rn arbitrary and let
(5.176) xn+1 = xn − αn ∇f (xn )
where αn > 0 is a small enough number to be determined later. The idea of this
iteration is to keep moving in the direction in which f decreases the fastest. Sometimes
this simple process successfully converges to a minimum and sometimes it doesn’t,
depending on f , x0 and αn . What we can say from the definition is that, if f ∈ C 1 (E)
and (xn )n converges, then the limit is a critical point of f . The following lemma gives
some more hope.
Lemma 5.51. Let f ∈ C 1 (E). Then, for every x ∈ E and small enough α > 0,
(5.177) f (x − α∇f (x)) ≤ f (x).
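Proof. By the definition of total derivatives,
\[ f(x - \alpha \nabla f(x)) = f(x) + \langle \nabla f(x), -\alpha \nabla f(x) \rangle + o(\alpha) = f(x) - \alpha \|\nabla f(x)\|^2 + o(\alpha), \tag{5.178} \]
which is ≤ f(x) provided that α > 0 is small enough: if ∇f(x) = 0, then both sides of (5.177) are equal, and otherwise the term $-\alpha \|\nabla f(x)\|^2$ dominates o(α) as α → 0.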
Remark. Note that the smallness of α in this lemma depends on the point x. Also, this
result is not enough to prove anything about the convergence of gradient descent.
We will see that gradient descent works well if f is a convex function.
Definition 5.52. Let E ⊂ Rn be convex. A function f : E → R is called convex if
(5.179) f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y)
for all x, y ∈ E, t ∈ [0, 1]. f is called strictly convex if
(5.180) f (tx + (1 − t)y) < tf (x) + (1 − t)f (y)
\[ = f(u) + \langle \nabla f(u), t(x - u) + (1 - t)(y - u) \rangle = f(u) + \langle \nabla f(u), tx + (1 - t)y - u \rangle. \tag{5.189} \]
Recalling that u = tx + (1 − t)y, we get
\[ tf(x) + (1 - t)f(y) \ge f(u) = f(tx + (1 - t)y). \tag{5.190} \]
Theorem 5.54. Let $E \subset \mathbb{R}^n$ be open and convex and $f \in C^2(E)$. Then
(1) f is convex if and only if $D^2 f|_x \ge 0$ for all x ∈ E,
(2) f is strictly convex if $D^2 f|_x > 0$ for all x ∈ E.
Proof. We only prove (1). The proof of (2) is very similar. Let f be convex. By Taylor’s theorem, for u, u + tv ∈ E,
\[ f(u + tv) = f(u) + t \langle \nabla f(u), v \rangle + \tfrac12 t^2 \langle D^2 f|_u\, v, v \rangle + o(t^2) \tag{5.191} \]
and by Theorem 5.53,
\[ f(u + tv) \ge f(u) + t \langle \nabla f(u), v \rangle. \tag{5.192} \]
Combining these two pieces of information we obtain
\[ \tfrac12 t^2 \langle D^2 f|_u\, v, v \rangle + o(t^2) \ge 0 \tag{5.193} \]
6. OPTIMIZATION AND CONVEXITY* 105
(2): Let $x_1, x_2 \in E$ be critical points of f. By (1), they are global minima. This implies $f(x_1) = f(x_2)$. If $x_1 \ne x_2$, then by strict convexity,
\[ f(x_1) = \frac{f(x_1) + f(x_2)}{2} > f\Big(\frac{x_1 + x_2}{2}\Big). \tag{5.196} \]
This is a contradiction to $x_1$ being a global minimum. Therefore $x_1 = x_2$.
Example 5.57. If ‖ · ‖ is a norm on $\mathbb{R}^n$, then the function x ↦ ‖x‖ is convex:
\[ \|tx + (1 - t)y\| \le t\|x\| + (1 - t)\|y\| \tag{5.197} \]
by the triangle inequality and the homogeneity of the norm. Also, this function has a unique global minimum at x = 0.
Lemma 5.58. Let I ⊂ R, E ⊂ Rn be convex and suppose that
(1) f : E → I is convex, and
(2) g : I → R is convex and nondecreasing.
Then the function h : E → R given by h = g ◦ f is convex.
Proof. By convexity of f and since g is nondecreasing,
(5.198) h(tx + (1 − t)y) = g(f (tx + (1 − t)y)) ≤ g(tf (x) + (1 − t)f (y)).
Since g is convex this is
(5.199) ≤ tg(f (x)) + (1 − t)g(f (y)) = th(x) + (1 − t)h(y).
Corollary 5.59. If ‖ · ‖ is a norm on $\mathbb{R}^n$, then the function $x \mapsto \|x\|^2$ is convex.
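Example 5.60. Recall the error function from linear regression (Example 5.50):
\[ E(a, b) = \sum_{i=1}^{N} (\langle a, x_i \rangle + b - y_i)^2. \tag{5.200} \]
We claim that $E : \mathbb{R}^{n+1} \to \mathbb{R}$ is a convex function. We first rewrite E(a, b) into a different form. Define an $N \times (n + 1)$ matrix M and a vector $v \in \mathbb{R}^{n+1}$ by
\[ M = \begin{pmatrix} x_{11} & \cdots & x_{1n} & 1 \\ \vdots & & \vdots & \vdots \\ x_{N1} & \cdots & x_{Nn} & 1 \end{pmatrix} \in \mathbb{R}^{N \times (n+1)}, \qquad v = \begin{pmatrix} a_1 \\ \vdots \\ a_n \\ b \end{pmatrix} \in \mathbb{R}^{n+1}, \tag{5.201} \]
where $x_i = (x_{i1}, \dots, x_{in}) \in \mathbb{R}^n$ for i = 1, . . . , N and $a = (a_1, \dots, a_n) \in \mathbb{R}^n$. Then
\[ E(a, b) = E(v) = \sum_{i=1}^{N} ((Mv)_i - y_i)^2 = \|Mv - y\|^2, \tag{5.202} \]
where $\|c\| = \big(\sum_{i=1}^{N} |c_i|^2\big)^{1/2}$.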
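The rewriting (5.202) can be sanity-checked on a small data set. The sketch below assumes n = 1 and uses four made-up data points; the data and the function names `error_sum` and `error_matrix` are hypothetical and serve only as an illustration.

```python
import math

# Hypothetical data with n = 1: four points (x_i, y_i) in R x R.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

def error_sum(a, b):
    """E(a, b) written as the sum in (5.200)."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

def error_matrix(a, b):
    """E(v) = ||Mv - y||^2 as in (5.202), with rows (x_i, 1) of M and v = (a, b)."""
    M = [[x, 1.0] for x in xs]
    v = [a, b]
    residual = [sum(m * c for m, c in zip(row, v)) - y for row, y in zip(M, ys)]
    return sum(r * r for r in residual)

print(math.isclose(error_sum(2.0, 1.0), error_matrix(2.0, 1.0)))
```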
Remarks. 1. f is strongly convex if and only if there exists β > 0 such that $D^2 f|_x - \beta I \ge 0$ for all x ∈ E. This follows directly from the definition using that $\beta \|y\|^2 = \langle \beta I y, y \rangle$. The condition $D^2 f|_x - \beta I \ge 0$ is equivalent to the smallest eigenvalue of $D^2 f|_x$ being ≥ β. Yet another equivalent way of stating this is saying that the function $g(x) = f(x) - \frac{\beta}{2} \|x\|^2$ is convex. This is because $D^2 g|_x = D^2 f|_x - \beta I$.
2. If f is strongly convex, then f is strictly convex (by Theorem 5.54).
3. If f is strictly convex, then f is not necessarily strongly convex. For example consider f : R → R, $f(x) = e^x$. Here $f''(x) = e^x$, and for every β > 0 there exists x ∈ R such that $e^x < \beta$ because $e^x \to 0$ as x → −∞.
The following exercise shows that the assumption of strong convexity is not as
restrictive as it may seem at first sight: strictly convex functions are strongly convex
when restricted to compact sets.
Exercise 5.63. Suppose that f ∈ C 2 (Rn ) is strictly convex. Let K ⊂ Rn be
compact and convex. Show that there exist $\beta_-, \beta_+ > 0$ such that
\[ \beta_- \|y\|^2 \le \langle D^2 f|_x\, y, y \rangle \le \beta_+ \|y\|^2 \tag{5.209} \]
for all x ∈ K and $y \in \mathbb{R}^n$. (In particular, f is strongly convex on K.)
Hint: Consider the minimal eigenvalue of D2 f |x as a function of x.
Theorem 5.64. Let E ⊂ Rn be open and convex. Let f ∈ C 2 (E). Then f is
strongly convex if and only if there exists γ > 0 such that
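\[ f(u + v) \ge f(u) + \langle \nabla f(u), v \rangle + \gamma \|v\|^2 \tag{5.210} \]
for every u, u + v ∈ E.
Proof. ⇒: Let β > 0 be such that $g(x) = f(x) - \frac{\beta}{2} \|x\|^2$ is convex. Then by Theorem 5.53,
\[ g(u + v) \ge g(u) + \langle \nabla g(u), v \rangle = f(u) - \tfrac{\beta}{2} \|u\|^2 + \langle \nabla f(u) - \beta u, v \rangle. \tag{5.211} \]
On the other hand,
\[ g(u + v) = f(u + v) - \tfrac{\beta}{2} \|u + v\|^2. \tag{5.212} \]
Thus,
\[ f(u + v) \ge f(u) + \langle \nabla f(u), v \rangle + \tfrac{\beta}{2} \big(\|u + v\|^2 - \|u\|^2 - 2 \langle u, v \rangle\big) = f(u) + \langle \nabla f(u), v \rangle + \tfrac{\beta}{2} \|v\|^2. \tag{5.213} \]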
⇐: This follows in the same way from the converse direction of Theorem 5.53.
Theorem 5.65. Let f ∈ C 2 (Rn ) be strongly convex. Then for every c ∈ R, the
sublevel set
(5.214) B = {x ∈ Rn : f (x) ≤ c}
is bounded.
Proof. By Theorem 5.64 we have
\[ f(x) \ge f(0) + \langle \nabla f(0), x \rangle + \gamma \|x\|^2. \tag{5.215} \]
Therefore, $\lim_{\|x\| \to \infty} f(x) = \infty$. Suppose that B is unbounded. Then there would exist a sequence $(x_n)_{n \ge 1} \subset B$ such that $\lim_{n \to \infty} \|x_n\| = \infty$. But $f(x_n) \le c$, so $f(x_n) \not\to \infty$ as n → ∞. Contradiction!
Theorem 5.66. Let f ∈ C 2 (Rn ) be strongly convex. Then there exists a unique
global minimum of f .
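Proof. By the previous theorem, the set $B = \{x \in \mathbb{R}^n : f(x) \le f(0)\}$ is bounded. Thus, there exists R > 0 such that $B \subset B_R = \{x \in \mathbb{R}^n : \|x\| \le R\}$. $B_R$ is compact, so f attains its minimum on $B_R$ at some point $x_* \in B_R$. Then $f(x_*) \le f(x)$ for all $x \in B_R$. It remains to show $f(x_*) \le f(x)$ for all $x \notin B_R$. If $x \notin B_R$, then $x \notin B$, so f(x) > f(0). Also, $0 \in B_R$, so $f(x_*) \le f(0) < f(x)$. Uniqueness follows because f is strictly convex: two distinct global minima $x_1 \ne x_2$ would give $f\big(\frac{x_1 + x_2}{2}\big) < f(x_1)$, as in (5.196).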
We conclude this discussion by proving that gradient descent converges for strongly
convex functions.
Theorem 5.67. Let f ∈ C 2 (Rn ) be strongly convex and x0 ∈ Rn . Define
(5.216) xn+1 = xn − α∇f (xn ) for n ≥ 0.
If α is small enough, then (xn )n converges to the global minimum x∗ of f .
Remark. The restriction to f defined on Rn is only for convenience (the same is true
for Theorems 5.65 and 5.66).
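As an illustration of Theorem 5.67, the iteration (5.216) can be run on the regression error E from Example 5.60. The sketch below makes several assumptions that are not in the notes: n = 1, four made-up data points lying exactly on the line y = 2x + 1 (so the minimum is at a = 2, b = 1), the step size α = 0.02 and the starting point (0, 0). For this data the Hessian of E is the constant positive definite matrix $2M^T M$, so E is strongly convex.

```python
def grad_E(a, b, xs, ys):
    """Gradient of E(a, b) = sum_i (a*x_i + b - y_i)^2 for scalar x_i."""
    ga = sum(2.0 * (a * x + b - y) * x for x, y in zip(xs, ys))
    gb = sum(2.0 * (a * x + b - y) for x, y in zip(xs, ys))
    return ga, gb

def gradient_descent(xs, ys, alpha=0.02, steps=2000):
    a, b = 0.0, 0.0  # arbitrary starting point x_0
    for _ in range(steps):
        ga, gb = grad_E(a, b, xs, ys)
        a, b = a - alpha * ga, b - alpha * gb  # one step of (5.216)
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # points on the line y = 2x + 1
a, b = gradient_descent(xs, ys)
print(a, b)
```

With a much larger step size the same iteration diverges, which is why the theorem only claims convergence for small enough α.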
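Lemma 5.68. Let $A \in \mathbb{R}^{n \times n}$ be a symmetric and positive definite matrix. Then the matrix norm $\|A\|_{op} = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|}$ is equal to the largest eigenvalue of A.
(Here $\|x\| = (\sum_{i=1}^{n} |x_i|^2)^{1/2}$ is the Euclidean norm.)
Proof. Let $\{v_1, \dots, v_n\}$ be an orthonormal basis of eigenvectors corresponding to eigenvalues $\lambda_1, \dots, \lambda_n$, respectively. Write $x = \sum_{i=1}^{n} x_i v_i$, where the $x_i$ now denote the coordinates of x in this basis. Then
\[ \|Ax\| = \Big\| \sum_{i=1}^{n} x_i A v_i \Big\| = \Big\| \sum_{i=1}^{n} x_i \lambda_i v_i \Big\| \tag{5.217} \]
which by orthogonality is equal to $\big(\sum_{i=1}^{n} |x_i|^2 \lambda_i^2\big)^{1/2}$ (use that $\|x\| = \langle x, x \rangle^{1/2}$). Thus
\[ \|Ax\| = \Big(\sum_{i=1}^{n} |x_i|^2 \lambda_i^2\Big)^{1/2} \le \max_{i=1,\dots,n} \lambda_i \Big(\sum_{i=1}^{n} |x_i|^2\Big)^{1/2} = \max_{i=1,\dots,n} \lambda_i\, \|x\|. \tag{5.218} \]
Let $\max_{i=1,\dots,n} \lambda_i = \lambda_{i_0}$. We have shown that $\|A\|_{op} \le \lambda_{i_0}$. On the other hand,
\[ \|Av_{i_0}\| = \lambda_{i_0} \|v_{i_0}\| = \lambda_{i_0}, \tag{5.219} \]
so $\|A\|_{op} = \sup_{\|x\| = 1} \|Ax\| \ge \|Av_{i_0}\| = \lambda_{i_0}$.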
Proof of Theorem 5.67. Let α > 0. Define T (x) = x − α∇f (x). Then xn+1 =
$T(x_n)$. We want T to be a contraction. For R > 0 define $B_R = \{x \in \mathbb{R}^n : \|x - x_*\| \le R\}$. Let R > 0 be large enough such that $x_0 \in B_R$.
Claim. If α is small enough, then T is a contraction of $B_R$.
Proof of claim. $x_*$ is a global minimum of f, so $\nabla f(x_*) = 0$. Thus, $T(x_*) = x_*$. We have
\[ DT|_x = I - \alpha D^2 f|_x. \tag{5.220} \]
The largest eigenvalue of $D^2 f|_x$ is a continuous function of x which is bounded on the compact set $B_R$. Therefore there exists γ > 0 such that
\[ \langle D^2 f|_x\, y, y \rangle \le \gamma \|y\|^2 \tag{5.221} \]
7. Further exercises
Exercise 5.69. Show that there exists a unique (x, y) ∈ R2 such that cos(sin(x)) =
y and sin(cos(y)) = x.
Exercise 5.70. Let U ⊆ Rn be open and convex and f : U → R differentiable such
that ∂1 f (x) = 0 for all x ∈ U .
(i) Show that the value of f (x) for x = (x1 , . . . , xn ) ∈ U does not depend on x1 .
(ii) Does (i) still hold if we assume that U is connected instead of convex? Give a proof
or counterexample.
Exercise 5.71. A function f : Rn → R is called homogeneous of degree α ∈ R if
$f(\lambda x) = \lambda^\alpha f(x)$ for all λ > 0 and $x \in \mathbb{R}^n$. Suppose that f is differentiable. Then show that f is homogeneous of degree α if and only if
\[ \sum_{i=1}^{n} x_i\, \partial_i f(x) = \alpha f(x) \tag{5.226} \]
(i) Compute the Jacobian determinant of f (that is, the determinant of the Jacobian
matrix).
(ii) Show that f is one-to-one and compute its inverse f −1 .
Exercise 5.74. Prove that there exists δ > 0 such that for all square matrices $A \in \mathbb{R}^{n \times n}$ with ‖A − I‖ < δ (where I denotes the identity matrix) there exists $B \in \mathbb{R}^{n \times n}$ such that $B^2 = A$.
Exercise 5.75. Look at each of the following as an equation to be solved for x ∈ R in terms of parameters y, z ∈ R. Notice that (x, y, z) = (0, 0, 0) is a solution for each of these equations. For each one, prove that it can be solved for x as a $C^1$-function of y, z in a neighborhood of (0, 0, 0).
3
(a) cos(x)2 − esin(xy) +x = z 2
(b) $(x^2 + y^3 + z^4)^2 = \sin(x - y + z)$
(c) $x^7 + y e^z x^3 - x^2 + x = \log(1 + y^2 + z^2)$
Compute Yn (t) and Y (t) = limn→∞ Yn (t). Which initial value problem does Y solve?
Exercise 5.77. Consider the initial value problem
\[ \begin{cases} y'(t) = e^{y(t)^2} - \dfrac{1}{t\,y(t)}, \\ y(1) = 1. \end{cases} \tag{5.230} \]
Find an interval I = (1 − h, 1 + h) such that this problem has a unique solution y in I.
Give an explicit estimate for h (it does not need to be best possible).
Exercise 5.78. Consider the initial value problem
\[ \begin{cases} y'(t) = t + \sin(y(t)), \\ y(2) = 1. \end{cases} \tag{5.231} \]
Find the largest interval I ⊆ R containing $t_0 = 2$ such that the problem has a unique solution y in I.
Exercise 5.79. Let F be a smooth function on R2 (i.e. partial derivatives of all
orders exist everywhere and are continuous) and suppose that the initial value problem
$y' = F(t, y)$, $y(t_0) = y_0$ has a unique solution y on the interval $I = [t_0, t_0 + a]$ with
y smooth on I. Let h > 0 be sufficiently small and define tk = t0 + kh for integers
0 ≤ k ≤ a/h.
Define a function yh recursively by setting yh (t0 ) = y0 and
(5.232) yh (t) = yh (tk ) + (t − tk )F (tk , yh (tk ))
for t ∈ (tk , tk+1 ] for integers 0 ≤ k ≤ a/h.
(i) From the proof of Peano’s theorem (Theorem 5.32) it follows that yh → y uniformly
on I as h → 0. Prove the following stronger statement: there exists a constant C > 0
such that for all t ∈ I and h > 0 sufficiently small,
(5.233) |y(t) − yh (t)| ≤ Ch.
Hint: The left hand side is zero if t = t0 . Use Taylor expansion to study how the error
changes as t increases from tk to tk+1 .
Exercise 5.80. Let us improve the approximation from Exercise 5.79. In the
context of that exercise, define a piecewise linear function yh∗ recursively by setting
yh∗ (t0 ) = y0 and
(5.234) yh∗ (t) = yh∗ (tk ) + (t − tk )G(tk , yh∗ (tk ), h),
for t ∈ (tk , tk+1 ] for integers 0 ≤ k ≤ a/h, where
\[ G(t, y, h) = \tfrac12 \big(F(t, y) + F(t + h, y + hF(t, y))\big). \tag{5.235} \]
Prove that there exists a constant C > 0 such that for all t ∈ I and h > 0 sufficiently
small,
(5.236) |y(t) − yh∗ (t)| ≤ Ch2 .
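The different error orders in (5.233) and (5.236) can be observed numerically before proving them. The sketch below is an experiment, not a solution of the exercises: it assumes the test problem y' = y, y(0) = 1 on [0, 1] with exact solution $e^t$, evaluates both schemes only at the grid points $t_k$, and uses illustrative function names.

```python
import math

def F(t, y):
    return y  # test problem y' = y, y(0) = 1, with exact solution e^t

def euler(F, t0, y0, a, h):
    """Value of y_h at t0 + a, following (5.232) at the grid points."""
    t, y = t0, y0
    for _ in range(round(a / h)):
        y = y + h * F(t, y)
        t = t + h
    return y

def improved_euler(F, t0, y0, a, h):
    """Value of y_h^* at t0 + a, following (5.234) and (5.235) at the grid points."""
    t, y = t0, y0
    for _ in range(round(a / h)):
        G = 0.5 * (F(t, y) + F(t + h, y + h * F(t, y)))
        y = y + h * G
        t = t + h
    return y

for h in (0.1, 0.05, 0.025):
    e1 = abs(math.e - euler(F, 0.0, 1.0, 1.0, h))
    e2 = abs(math.e - improved_euler(F, 0.0, 1.0, 1.0, h))
    print(h, e1, e2)
```

Halving h roughly halves the first error and roughly quarters the second, which is consistent with bounds of the form Ch and $Ch^2$.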
Exercise 5.81. For a function f : [a, b] → R define
\[ \mathcal{I}(f) = \int_a^b (1 + f'(t)^2)^{1/2}\,dt. \tag{5.237} \]
Let $A = \{f \in C^2([a, b]) : f(a) = c, f(b) = d\}$. Determine $f_* \in A$ such that
\[ \mathcal{I}(f_*) = \inf_{f \in A} \mathcal{I}(f). \tag{5.238} \]
as y → 0 and
\[ f(x + y) = \sum_{|\alpha| \le k} \tilde{c}_\alpha\, y^\alpha + o(\|y\|^k) \tag{5.243} \]
as y → 0. Show that $c_\alpha = \tilde{c}_\alpha$ for all $|\alpha| \le k$.
Exercise 5.89. Let D = {(x, y) ∈ R2 : x2 + y 2 ≤ 1}. Determine the maximum
and minimum values of the function f : D → R, f (x, y) = 4x2 − 3xy.
Exercise 5.90. Let f ∈ C 2 (Rn ) and suppose that the Hessian of f is positive
definite at every point. Show that ∇f : Rn → Rn is an injective map.
Exercise 5.91. Let f ∈ C 2 (Rn ) be strongly convex. Show that ∇f : Rn → Rn is
a diffeomorphism (that is, show that it is differentiable, bijective and that its inverse is
differentiable).
Exercise 5.92. Let $f(x) = \tfrac12 \langle Ax, x \rangle - \langle b, x \rangle + c$ with $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$, c ∈ R.
Assume that A is symmetric and positive definite. Show that f has a unique global
minimum at some point x∗ and determine f (x∗ ) in terms of A, b, c.
Exercise 5.93. Prove that the point x∗ from Exercise 5.92 can be computed using
gradient descent: that is, if $x_0 \in \mathbb{R}^n$ is arbitrary and
(5.244) xn+1 = xn − α∇f (xn )
for n = 0, 1, 2, . . . , then the sequence (xn )n converges to x∗ for all starting points
x0 ∈ Rn , provided that α is chosen sufficiently small.
Exercise 5.94. Let D ⊂ R2 be a finite set. Define a function E : R3 → R by
\[ E(a, b, c) = \sum_{x \in D} (a x_1^2 + b x_1 + c - x_2)^2. \tag{5.245} \]
Exercise 5.95. (a) Find a convex function that is not bounded from below.
(b) Find a strictly convex function that is not bounded from below.
(c) If a function is strictly convex and bounded from below, does it necessarily have a
critical point? (Proof or counterexample.)
Exercise 5.96. (a) Give an example of a convex function that is not continuous.
(b) Let f : (a, b) → R. Show that if f is convex, then f is continuous.
Exercise 5.97. Construct a strictly convex function f : R → R such that f is not
differentiable at x for every x ∈ Q.
is a closed subset of [0, 1] and has empty interior. Therefore, it is nowhere dense.
Lemma 6.3. Suppose $A_1, \dots, A_n \subset X$ are nowhere dense sets. Then $\bigcup_{k=1}^{n} A_k$ is nowhere dense.
Proof. Without loss of generality let n = 2. We need to show that $\overline{A_1 \cup A_2}$ has empty interior. Equivalently, setting $U_k = \overline{A_k}^c$ for k = 1, 2, we show that $U_1 \cap U_2$ is dense. Let U ⊂ X be a non-empty open set. Then $V_1 = U \cap U_1$ is open and non-empty, because $U_1$ is dense. Since $U_2$ is also dense, $V_1 \cap U_2 = U \cap (U_1 \cap U_2)$ is non-empty, so $U_1 \cap U_2$ is dense.
Also, a subset of a nowhere dense set is nowhere dense and the closure of a nowhere
dense set is nowhere dense.
However, countable unions of nowhere dense sets are not necessarily nowhere dense
sets.
Example 6.4. Enumerate the rationals as $\mathbb{Q} = \{q_1, q_2, \dots\}$. For every k = 1, 2, . . . , the set $A_k = \{q_k\}$ is nowhere dense in R. But $\mathbb{Q} = \bigcup_{k=1}^{\infty} A_k \subset \mathbb{R}$ is not nowhere dense (it is dense!).
Definition 6.5. A set A ⊂ X is called meager (or of first category) in X if it is the
countable union of nowhere dense sets. A is called comeager (or residual or of second
category) if Ac is meager.
116 6. THE BAIRE CATEGORY THEOREM*
The above example shows that Q ⊂ R is meager. In fact, every countable subset of
R is meager (because single points are nowhere dense in R).
By definition, countable unions of meager sets are meager. The choice of the word
“meager” suggests that meager sets are somehow “small” or “negligible”. But how
“large” can meager sets be? For example, can X be meager? That is, can we write
the entire metric space X as a countable union of nowhere dense subsets? The Baire
category theorem will show that the answer is no, if X is complete.
Theorem 6.6 (Baire category theorem). In a complete metric space, meager sets
have empty interior. Equivalently, countable intersections of open dense sets are dense.
Corollary 6.7. Let X be a complete metric space and A ⊂ X a meager set. Then A ≠ X. In other words, X is not a meager subset of itself.
Example 6.8. The conclusion of the Baire category theorem fails if we drop the
assumption that X is complete: let X = Q with the metric inherited from R (so
d(p, q) = |p − q|). Then X is a meager subset of itself because it is countable and single
points are nowhere dense in X (X has no isolated points). But the interior of X is
non-empty, because X is open in X.
Example 6.9. Not every set with empty interior is meager: consider the irrational
numbers A = R \ Q. A has empty interior, because $A^c = \mathbb{Q}$ is dense. It is not meager,
because otherwise $\mathbb{R} = A \cup A^c$ would be meager, which contradicts the Baire category
theorem.
Exercise 6.10. Another notion of “smallness” is the following:
Definition. A set A ⊂ R is called a Lebesgue null set if for every ε > 0 there exist
intervals I1 , I2 , . . . such that
\[ A \subset \bigcup_{j=1}^{\infty} I_j \quad \text{and} \quad \sum_{j=1}^{\infty} |I_j| \le \varepsilon. \tag{6.2} \]
Proof of Theorem 6.6. Let (U_n)_n be open dense sets. We need to show that
⋂_{n=1}^∞ U_n is dense. Let U ⊂ X be open and non-empty. It suffices to show that
U ∩ ⋂_{n=1}^∞ U_n is non-empty. Since U_1 is open and dense, U ∩ U_1 is open and non-empty.
Choose a closed ball B̄(x_1, r_1) ⊂ U ∩ U_1 with r_1 ∈ (0, 1). Then the open ball B(x_1, r_1) ∩ U_2 is
open and non-empty (because U_2 is dense), so we can choose a closed ball B̄(x_2, r_2) ⊂
B(x_1, r_1) ∩ U_2 with r_2 ∈ (0, 1/2). Iterating this process, we obtain a sequence of closed
balls (B̄(x_n, r_n))_n such that B̄(x_n, r_n) ⊂ B(x_{n−1}, r_{n−1}) ∩ U_n and r_n ∈ (0, 1/n). By Lemma
6.11 there exists a point x contained in ⋂_{n=1}^∞ B̄(x_n, r_n). Since B̄(x_n, r_n) ⊂ U ∩ U_n for
all n ≥ 1, we have x ∈ U ∩ ⋂_{n=1}^∞ U_n.
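To make the iterative construction in the proof concrete, the following Python sketch runs it in X = R for the special case U_n = R minus {q_n}, where q_1, ..., q_N is a finite list of rationals in [0, 1], and U = (0, 1). The helper names `rationals_in_unit_interval` and `nested_balls`, as well as the specific rule for choosing each sub-ball, are illustrative assumptions and not part of these notes. Exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction


def rationals_in_unit_interval(max_den):
    """List the rationals p/q in [0, 1] with denominator q <= max_den, without repeats."""
    seen, out = set(), []
    for q in range(1, max_den + 1):
        for p in range(0, q + 1):
            x = Fraction(p, q)
            if x not in seen:
                seen.add(x)
                out.append(x)
    return out


def nested_balls(points_to_avoid, center=Fraction(1, 2), radius=Fraction(1, 2)):
    """Build closed balls [c_n - r_n, c_n + r_n], each inside the previous open ball
    and missing the n-th point to avoid (i.e. contained in U_n = R minus {q_n}).

    The starting open ball (center - radius, center + radius) plays the role of U.
    """
    balls = []
    for q in points_to_avoid:
        lo = center - radius
        width = 2 * radius
        # Two disjoint closed candidates inside the open ball (lo, lo + width):
        # [lo + w/8, lo + 3w/8] and [lo + 5w/8, lo + 7w/8].
        left = lo + width / 4
        right = lo + 3 * width / 4
        new_radius = width / 8
        # q lies in at most one candidate; pick one that does not contain q.
        center = left if abs(q - left) > new_radius else right
        radius = new_radius
        balls.append((center, radius))
    return balls
```

Every point of the last ball differs from all of q_1, ..., q_N. In the proof, completeness (via Lemma 6.11) is what allows passing from finitely many to countably many sets U_n.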
The Baire category theorem has a number of interesting consequences.
2. Sets of continuity*
Definition 6.16. Let X, Y be metric spaces and f : X → Y a map. The set
(6.9) Cf = {x ∈ X : f is continuous at x} ⊂ X
is called the set of continuity of f . Similarly, X \ Cf is called the set of discontinuity
of f .
Example 6.17. Let f : R → R be defined by f (x) = 1 if x is rational and f (x) = 0
if x is irrational. Then Cf = ∅.
Example 6.18. Let f : R → R be defined by f (x) = x if x is rational and f (x) = 0
if x is irrational. Then Cf = {0}.
Example 6.19. Consider the function f : R → R defined as follows: we set f(0) = 1
and if x ∈ Q \ {0}, then we let f(x) = 1/q, where x = p/q with p ∈ Z, q ∈ N and the
greatest common divisor of p and q is one. If x ∉ Q, then we let f(x) = 0. We claim
that Cf = R \ Q. Indeed, say x ∈ R \ Q and let p_n/q_n → x be a rational approximation
(with each fraction in lowest terms). Then q_n → ∞: otherwise some subsequence of (q_n)
is bounded and hence, after passing to a further subsequence, constant equal to some q.
Along that subsequence the integers p_n converge to qx, so they are eventually constant
and x would be rational. Hence f(p_n/q_n) = 1/q_n → 0 = f(x). Since f vanishes at all
irrational points, this implies that f is continuous at x. On the other hand, say x ∈ Q.
Set x_n = x + √2/n. Then x_n ∉ Q because √2 ∉ Q, so f(x_n) = 0 for all n, so
lim_{n→∞} f(x_n) = 0, but f(x) ≠ 0. Hence f is not continuous at x.
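The key step in Example 6.19, that denominators of rational approximations of an irrational number must tend to infinity, can be observed numerically. The sketch below evaluates f at the continued fraction convergents of √2; the function names are hypothetical, and the choice of √2 and of continued fraction convergents is only one convenient way to produce a rational approximation.

```python
from fractions import Fraction


def f_rational(x: Fraction) -> Fraction:
    """Value of f from Example 6.19 at a rational point x (with f(0) = 1)."""
    if x == 0:
        return Fraction(1)
    # Fraction stores p/q in lowest terms with q > 0, so this is 1/q.
    return Fraction(1, x.denominator)


def sqrt2_convergents(count):
    """First `count` continued fraction convergents p/q of sqrt(2) = [1; 2, 2, 2, ...]."""
    p_prev, q_prev = 1, 0
    p, q = 1, 1
    out = []
    for _ in range(count):
        out.append(Fraction(p, q))
        p, p_prev = 2 * p + p_prev, p
        q, q_prev = 2 * q + q_prev, q
    return out


if __name__ == "__main__":
    for c in sqrt2_convergents(8):
        print(c, float(c), f_rational(c))
```

The printed values of f decrease towards 0 = f(√2), in line with continuity of f at the irrational point √2.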
It is natural to ask which subsets of X arise as the set of continuity of some function
on X. For instance, does there exist a function f : R → R such that Cf = Q ?
Definition 6.20. A set A ⊂ X is called an Fσ -set if it is a countable union of
closed sets. A set G ⊂ X is called a Gδ -set if it is a countable intersection of open sets.
These names are motivated historically. The F in Fσ is for fermé which is French
for closed. On the other hand, the G in Gδ is for Gebiet which is German for region.
Examples 6.21. 1. Every open set is a Gδ -set and every closed set is an Fσ -set.
2. Let x ∈ X. Then {x} is a Gδ-set: it is the intersection of the open balls B(x, 1/n).
3. Q ⊂ R is an Fσ-set, because Q = ⋃_{q∈Q} {q} (a countable union of closed sets).
Theorem 6.22. Let X and Y be metric spaces and f : X → Y a map. Then
Cf ⊂ X is a Gδ -set and X \ Cf is an Fσ -set.
Proof. Let f : X → Y be given. It suffices to show that Cf is a Gδ-set. For every
S ⊂ X we define the oscillation of f on S by
(6.10) ωf(S) = sup_{x,x′∈S} d_Y(f(x), f(x′)) = diam f(S),
and for x ∈ X we define the oscillation of f at x by ωf(x) = inf_{ε>0} ωf(B(x, ε)).
Then we have
(6.12) x ∈ Cf ⇐⇒ ωf(x) = 0
and we can write the set of continuity of f as
(6.13) Cf = ⋂_{n=1}^∞ {x ∈ X : ωf(x) < 1/n}.
We are done if we can show that U_n = {x ∈ X : ωf(x) < 1/n} is open for every n ∈ N. Let
x_0 ∈ U_n. Then ωf(x_0) < 1/n. Therefore, there exists ε > 0 such that ωf(B(x_0, ε)) < 1/n.
Let x ∈ B(x_0, ε/2). Then by the triangle inequality, B(x, ε/2) ⊂ B(x_0, ε). Therefore,
(6.14) ωf(x) ≤ ωf(B(x, ε/2)) ≤ ωf(B(x_0, ε)) < 1/n.
Thus, B(x_0, ε/2) ⊂ U_n and so U_n is open.
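The oscillation used in this proof can be explored numerically for functions on R. The sketch below is only an approximation under stated assumptions: it samples finitely many points of each ball (so it can underestimate the true supremum) and replaces the infimum over all radii by a minimum over a few shrinking radii. The function names and the sample counts are hypothetical choices.

```python
def oscillation_on_ball(f, x, eps, samples=1001):
    """Approximate the oscillation of f on the open interval (x - eps, x + eps)
    by the spread of f over finitely many sample points inside it."""
    values = [f(x - eps + 2 * eps * (k + 0.5) / samples) for k in range(samples)]
    return max(values) - min(values)


def oscillation_at_point(f, x, radii=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Approximate the oscillation of f at x using a few shrinking radii."""
    return min(oscillation_on_ball(f, x, eps) for eps in radii)


def step(t):
    """A function with a single jump discontinuity at 0."""
    return 1.0 if t >= 0 else 0.0


if __name__ == "__main__":
    print(oscillation_at_point(step, 0.0))   # jump at 0
    print(oscillation_at_point(step, 0.5))   # continuity point
```

For the step function the estimate is positive exactly at the jump, which matches the characterization x ∈ Cf ⇐⇒ ωf(x) = 0.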
As a sample application of the Baire category theorem we now answer one of our
previous questions negatively:
Lemma 6.23. Q ⊂ R is not a Gδ -set. Consequently, there exists no function f :
R → R such that Cf = Q.
Proof. Suppose Q is a Gδ -set. Then R \ Q is an Fσ -set and therefore can be
written as a countable union of closed sets A1 , A2 , . . . . Since R \ Q has empty interior
(its complement Q is dense), An ⊂ R \ Q also has empty interior for every n. Thus An
is nowhere dense, so R \ Q is meager. But then R = Q ∪ (R \ Q) must be meager, which
contradicts the Baire category theorem.
Remark. Observe that an Fσ -set is either meager or has non-empty interior: suppose
A ⊂ X is an Fσ -set with empty interior. Then it is a countable union of closed sets with
empty interior and therefore meager. Similarly, a Gδ -set is either comeager or not dense.
Remark. It is natural to ask if the converse of Theorem 6.22 is true in the following
sense: given a Gδ -set G ⊂ X, can we find a function f : X → R such that Cf = G ?
This cannot hold in general: suppose X contains an isolated point, that is X contains an
open set of the form {x}. Then necessarily x ∈ Cf , but x is not necessarily contained in
every possible Gδ -set. However, this turns out to be the only obstruction: if X contains
no isolated points, then for every Gδ -set G ⊂ X one can find f : X → R such that
Cf = G. For a very short proof of this, see S. S. Kim: A Characterization of the Set
of Points of Continuity of a Real Function. Amer. Math. Monthly 106 (1999), no. 3,
258–259.
3. The uniform boundedness principle*
In other words, a family of bounded linear operators is uniformly bounded if and only
if it is pointwise bounded.
This theorem is also called the uniform boundedness principle.
Proof. In the ’⇐’ direction there is nothing to show. Let us prove ’⇒’. Suppose
that sup_{T∈F} ‖Tx‖_Y < ∞ for all x ∈ X. Define
(6.16) A_n = {x ∈ X : sup_{T∈F} ‖Tx‖_Y ≤ n} ⊂ X.
Each A_n is closed, because it is the intersection over T ∈ F of the closed sets
{x ∈ X : ‖Tx‖_Y ≤ n}, and by assumption X = ⋃_{n=1}^∞ A_n.
By the Baire category theorem, X is not meager. Thus, there exists n_0 ∈ N such that
A_{n_0} has non-empty interior. This means that there exist x_0 ∈ A_{n_0} and ε > 0 such that
(6.18) B(x_0, ε) ⊂ A_{n_0}.
Let x ∈ X be such that ‖x‖_X ≤ ε. Then for all T ∈ F,
(6.19) ‖Tx‖_Y = ‖T(x_0 − x) − Tx_0‖_Y ≤ ‖T(x_0 − x)‖_Y + ‖Tx_0‖_Y ≤ 2n_0.
Now we use the usual scaling trick: let x ∈ X satisfy ‖x‖_X = 1. Then
(6.20) ‖Tx‖_Y = ε^{−1} ‖T(εx)‖_Y ≤ 2ε^{−1} n_0.
This implies
(6.21) sup_{T∈F} ‖T‖_op = sup_{T∈F} sup_{‖x‖_X=1} ‖Tx‖_Y ≤ 2ε^{−1} n_0 < ∞.
Example 6.25. If X is not complete, then the conclusion of the theorem may fail.
For instance, let X be the space of all sequences (x_n)_n ⊂ R such that at most finitely
many of the x_n are non-zero. Equip X with the norm ‖x‖_∞ = sup_{n∈N} |x_n|. Define
ℓ_n : X → R by ℓ_n(x) = n x_n. ℓ_n is a bounded linear map because
(6.22) |ℓ_n(x)| = |n x_n| ≤ n‖x‖_∞.
For every x ∈ X there exists N_x ∈ N such that x_n = 0 for all n > N_x. This implies that
(6.23) sup_{n∈N} |ℓ_n(x)| = max{|ℓ_n(x)| : n = 1, . . . , N_x} < ∞.
But ‖ℓ_n‖_op ≥ n because |ℓ_n(e_n)| = n (where e_n denotes the sequence such that e_n(m) =
0 for every m ≠ n and e_n(n) = 1). Thus,
(6.24) sup_{n∈N} ‖ℓ_n‖_op = ∞.
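Example 6.25 is easy to experiment with, since finitely supported sequences can be stored as finite lists. The following sketch is only an illustration: the list representation (index n − 1 holds x_n, all later entries are zero) and the function names are assumptions made here.

```python
def ell(n, x):
    """The functional l_n(x) = n * x_n for a finitely supported sequence x,
    stored as the list [x_1, ..., x_N] (all further entries are zero)."""
    return n * x[n - 1] if 1 <= n <= len(x) else 0


def pointwise_sup(x):
    """sup over n of |l_n(x)|; a finite maximum because x_n = 0 for n > len(x)."""
    return max((abs(ell(n, x)) for n in range(1, len(x) + 1)), default=0)


def sup_norm(x):
    """The norm of x in X, i.e. the largest absolute value of an entry."""
    return max((abs(t) for t in x), default=0)


def unit_vector(n):
    """The sequence e_n, stored as a list of length n."""
    return [0] * (n - 1) + [1]


if __name__ == "__main__":
    x = [3, -1, 0, 2]
    print(pointwise_sup(x))                      # finite for each fixed x
    for n in (1, 10, 1000):
        e = unit_vector(n)
        print(n, sup_norm(e), abs(ell(n, e)))    # norm 1, but |l_n(e_n)| = n
```

For each fixed x the supremum is a finite maximum, while the unit vectors e_n witness that the operator norms are unbounded.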
Remark. In the proof we only needed that X is not meager. This is true if X is
complete, but it may also be true for an incomplete space.
As a first application of the uniform boundedness principle we prove that the point-
wise limit of a sequence of bounded linear operators on a Banach space must be a
bounded linear operator.
Corollary 6.26. Let X be a Banach space and Y a normed vector space. Suppose
(Tn )n ⊂ L(X, Y ) is such that (Tn x)n converges to some T x for every x ∈ X. Then
T ∈ L(X, Y ).
Proof. Linearity of T follows from linearity of limits. It remains to show that T is
bounded. Let x ∈ X. Since (T_n x)_n converges, we have sup_n ‖T_n x‖_Y < ∞ (convergent
sequences are bounded). By the Banach-Steinhaus theorem, there exists C ∈ (0, ∞)
such that ‖T_n‖_op ≤ C for every n. Hence, for every x ∈ X,
(6.25) ‖Tx‖_Y = lim_{n→∞} ‖T_n x‖_Y ≤ C‖x‖_X.
Remark. Note that in the context of Corollary 6.26 it does not follow that T_n → T in
L(X, Y). For instance, let T_n : ℓ^1 → ℓ^1 and T_n(x) = x_n e_n. Then T_n(x) → 0 as n → ∞
for every x ∈ ℓ^1, but ‖T_n‖_op = 1 for every n ∈ N, so T_n does not converge to 0 in
L(X, Y).
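The difference between pointwise convergence and convergence in operator norm in this remark can be seen on truncated sequences. In the sketch below, elements of ℓ^1 are approximated by finite lists (an assumption made for illustration only), and the names `norm1` and `T` are hypothetical.

```python
def norm1(x):
    """The l^1 norm of a finitely supported sequence stored as a list."""
    return sum(abs(t) for t in x)


def T(n, x):
    """T_n(x) = x_n e_n for x = [x_1, ..., x_N] (entries beyond N are zero)."""
    out = [0.0] * max(n, len(x))
    if n <= len(x):
        out[n - 1] = x[n - 1]
    return out


if __name__ == "__main__":
    x = [2.0 ** -k for k in range(1, 31)]        # x_k = 2^(-k)
    for n in (1, 5, 10, 20):
        e_n = [0.0] * (n - 1) + [1.0]
        # First column tends to 0 (pointwise convergence),
        # second column stays 1 (no convergence in operator norm).
        print(n, norm1(T(n, x)), norm1(T(n, e_n)) / norm1(e_n))
```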
Therefore, T_N is bounded and ‖T_N‖_op ≤ ‖D_N‖_1. To prove the lower bound we let
(6.36) f(x) = sgn(D_N(x_0 − x)).
While f is not a continuous function, it can be approximated by continuous functions
as the following exercise shows.
Exercise 6.30. Show that for every ε > 0 there exists g ∈ C(T) such that |g(t)| ≤ 1
for all t ∈ R and
(6.37) ∫_0^1 |f(t) − g(t)| dt ≤ ε/(2N + 1).
Hint: Modify the function f in a small enough neighborhood of each discontinuity; g
can be chosen to be a piecewise linear function.
So let ε > 0 and choose g ∈ C(T) as in the exercise. We have
(6.38) |T_N f| = |f ∗ D_N(x_0)| = ∫_0^1 sgn(D_N(t)) D_N(t) dt = ∫_0^1 |D_N(t)| dt = ‖D_N‖_1.
Moreover,
(6.39) |T_N g| ≥ |T_N f| − |T_N(f − g)|.
The error term |T_N(f − g)| can be estimated as follows:
(6.40) |T_N(f − g)| ≤ ∫_0^1 |D_N(x_0 − t)| |f(t) − g(t)| dt ≤ ‖D_N‖_∞ ∫_0^1 |f(t) − g(t)| dt ≤ (2N + 1) · ε/(2N + 1) = ε,
so
(6.41) ‖T_N‖_op ≥ |T_N g| ≥ ‖D_N‖_1 − ε.
Since ε > 0 was arbitrary, this implies ‖T_N‖_op ≥ ‖D_N‖_1.
Armed with this knowledge, we can now reveal Corollary 6.27 as a direct consequence
of Theorem 6.24. Indeed, we have that
(6.42) ‖T_N‖_op = ‖D_N‖_1 ≥ c log(N)
and therefore
(6.43) sup_{N∈N} ‖T_N‖_op = ∞.
1 The metric being the quotient metric inherited from R or the subspace metric induced by the
inclusion S^1 ⊂ R^2. These metrics are equivalent.
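As a numerical sanity check of the logarithmic growth in (6.42), one can approximate ‖D_N‖_1 by a Riemann sum. The sketch below assumes the normalization D_N(t) = ∑_{n=−N}^{N} e^{2πint} = sin((2N+1)πt)/sin(πt) on [0, 1], which is consistent with the bound ‖D_N‖_∞ = 2N + 1 used in (6.40); the midpoint rule, the number of sample points and the function names are choices made here for illustration.

```python
import math


def dirichlet_kernel(N, t):
    """D_N(t) = sin((2N+1) pi t) / sin(pi t), valid for non-integer t."""
    return math.sin((2 * N + 1) * math.pi * t) / math.sin(math.pi * t)


def dirichlet_l1_norm(N, points=20000):
    """Midpoint rule approximation of the integral of |D_N(t)| over [0, 1].
    Midpoints are never integers, so the kernel formula is always defined."""
    h = 1.0 / points
    return h * sum(abs(dirichlet_kernel(N, (k + 0.5) * h)) for k in range(points))


if __name__ == "__main__":
    for N in (1, 10, 100):
        print(N, dirichlet_l1_norm(N), math.log(N))
```

The approximate norms keep growing with N, roughly in proportion to log(N), while remaining far below the trivial bound 2N + 1.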
Remark. Continuous functions with divergent Fourier series can also be constructed
explicitly. The conclusion of Corollary 6.27 can be strengthened significantly: for every
Lebesgue null set A ⊂ T (see footnote 2) there exists a continuous function whose Fourier series
diverges on A (see J.-P. Kahane, Y. Katznelson: Sur les ensembles de divergence des
séries trigonométriques, Studia Math. 26 (1966), 305–306.).
On the other hand, L. Carleson proved in 1966 that the Fourier series of a continuous
function must always converge almost everywhere (that is, everywhere except possibly
on a Lebesgue null set). This is a very deep result in Fourier analysis which is difficult
to prove (see M. Lacey, C. Thiele: A proof of boundedness of the Carleson operator,
Math. Res. Lett. 7 (2000), no. 4, 361–370 for a very elegant proof).
2 See Exercise 6.10 for a definition on R; Lebesgue null sets of T are precisely the images of Lebesgue
null sets on R under the canonical quotient map R → R/Z = T.
4. Kakeya sets*
Definition 6.31. We call a compact set A ⊂ R^n a Kakeya set if A contains a unit
line segment in every direction. That is, if for every v ∈ R^n with ‖v‖ = 1 there exists
x ∈ A such that x + tv ∈ A for all t ∈ [0, 1].
(Note that this is only an interesting concept if n ≥ 2.)
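Example 6.32. Consider the unit disk A = {x ∈ R^2 : ‖x‖ ≤ 1}. Clearly 0 + tv ∈ A
for every t ∈ [0, 1] and v ∈ R^2 with ‖v‖ = 1, so A is a Kakeya set in R^2. The area of
the unit disk is π. The smaller disk {x ∈ R^2 : ‖x‖ ≤ 1/2} is also a Kakeya set (it contains
the segment from −v/2 to v/2 for every unit vector v) and has area π/4.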
Example 6.33. Let A be the compact set the boundary of which is the deltoid
curve defined by γ(t) = ((1/2) cos(t) + (1/4) cos(2t), (1/2) sin(t) − (1/4) sin(2t)) for t ∈ R. It can be
seen that A is a Kakeya set and has area π/8 (draw a picture).
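The area claim in Example 6.33 can be checked numerically by approximating the deltoid with an inscribed polygon and applying the shoelace formula. This is a sketch under stated assumptions: the function names and the number of sample points are arbitrary choices, and the polygon area only approximates the true area.

```python
import math


def deltoid_point(t):
    """The point gamma(t) on the deltoid curve from Example 6.33."""
    x = 0.5 * math.cos(t) + 0.25 * math.cos(2 * t)
    y = 0.5 * math.sin(t) - 0.25 * math.sin(2 * t)
    return x, y


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


def deltoid_area(samples=20000):
    """Area of the polygon through `samples` equally spaced parameter values."""
    points = [deltoid_point(2 * math.pi * k / samples) for k in range(samples)]
    return polygon_area(points)


if __name__ == "__main__":
    print(deltoid_area(), math.pi / 8)
```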
Do there exist Kakeya sets in R2 with even smaller area? What is the smallest
possible “area” or “volume” of a Kakeya set in Rn ?
While we are not going to attempt a rigorous definition of the notion of “volume” for
an arbitrary subset of Rn at this point (this leads to a subject of its own, called measure
theory), we can easily make rigorous what we mean by a subset of “zero volume”.
Definition 6.34. A set A ⊂ R^n is called a Lebesgue null set (or of Lebesgue
measure zero) if for every ε > 0 there exist (x_1, r_1), (x_2, r_2), . . . with x_i ∈ R^n and r_i > 0
such that
(6.45) A ⊂ ⋃_{i=1}^∞ B(x_i, r_i) and ∑_{i=1}^∞ r_i^n ≤ ε.
In other words, A is a Lebesgue null set if it can be covered by countably many balls
the combined volume of which can be made arbitrarily small. Intuitively, Lebesgue null
sets are sets of “volume zero”.
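A standard illustration of Definition 6.34 is that every countable set {a_1, a_2, . . .} ⊂ R^n is a Lebesgue null set: center the i-th ball at a_i and choose its radius so that r_i^n = ε/2^i. The sketch below checks the resulting sum for finitely many balls using exact arithmetic; the function names are hypothetical and the geometric choice of radii is just one possibility.

```python
from fractions import Fraction


def radius_powers(num_points, eps):
    """Exact values r_i^n = eps / 2^i for i = 1, ..., num_points."""
    return [Fraction(eps) / 2 ** i for i in range(1, num_points + 1)]


def radii(num_points, eps, n):
    """Floating point radii r_i = (eps / 2^i)^(1/n) for balls in R^n."""
    return [float(p) ** (1.0 / n) for p in radius_powers(num_points, eps)]


if __name__ == "__main__":
    eps = Fraction(1, 100)
    total = sum(radius_powers(50, eps))
    print(total < eps)          # the partial sums stay below eps
    print(radii(5, eps, 2))     # the first few radii in R^2
```

Since the partial sums equal ε(1 − 2^{−N}), the full series sums to exactly ε, as required in (6.45).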
The surprising answer to our question on the smallest possible volume of Kakeya
sets is that Kakeya sets may have volume zero.
Theorem 6.35. Let n ≥ 2. There exists a compact set K ⊂ Rn such that K is a
Kakeya set and a Lebesgue null set.
Remark. Many explicit constructions of such sets have been described in the literature.
The first example (for the case n = 2) was given by Besicovitch in 1926. Therefore such
sets are also called Besicovitch sets.
We will give a non-constructive proof using the Baire category theorem. This proof
first appeared in T. W. Körner: Besicovitch via Baire, Stud. Math. 158 (2003), no.
1, 65–78.
To simplify the exposition we only consider the case n = 2, but the method can be
extended to any n ≥ 3. To apply the Baire category theorem, we need to work in a
complete metric space. Let 𝒦 denote the set of all non-empty compact subsets of R^2.
We need to define a metric on 𝒦. For a point x ∈ R^2 and a set A ∈ 𝒦 we define
(6.46) d(x, A) = inf_{a∈A} ‖a − x‖.
For A, B ∈ 𝒦 we define
(6.47) d(A, B) = max(sup_{a∈A} d(a, B), sup_{b∈B} d(b, A)).
This is called the Hausdorff metric.
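For finite point sets, which are non-empty compact subsets of R^2, the infimum and suprema in (6.46) and (6.47) become minima and maxima, so the Hausdorff distance can be computed directly. The sketch below is an illustration with hypothetical function names; it uses `math.dist` from the Python standard library (available from Python 3.8).

```python
import math


def dist_point_set(x, A):
    """d(x, A) = min over a in A of the Euclidean distance from x to a (A finite, non-empty)."""
    return min(math.dist(x, a) for a in A)


def hausdorff(A, B):
    """Hausdorff distance between two finite, non-empty point sets."""
    return max(
        max(dist_point_set(a, B) for a in A),
        max(dist_point_set(b, A) for b in B),
    )


if __name__ == "__main__":
    A = [(0.0, 0.0)]
    B = [(0.0, 0.0), (3.0, 4.0)]
    # sup over A of d(a, B) is 0, but the point (3, 4) of B is far from A.
    print(hausdorff(A, B))
```

The example shows why both suprema are needed: A ⊂ B makes the first one vanish, while the second one detects the point of B that is far from A.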
Exercise 6.36. Show that d is a metric on 𝒦 and that (𝒦, d) is a complete metric
space.
Consider the set of all P ∈ 𝒦 with P ⊂ [−1, 1] × [0, 1] which are of the form
(6.48) P = ⋃_{i∈I} ℓ_i,
where I is some index set and for each i ∈ I, there exist x_1, x_2 ∈ [−1, 1] such that ℓ_i is
the line segment connecting the point (x_1, 0) to the point (x_2, 1). We define 𝒫 ⊂ 𝒦 to be
the set of all such P such that additionally for every v with |v| ≤ 1/2 there exist x_1, x_2 ∈ [−1, 1]
such that x_2 − x_1 = v and the line segment connecting (x_1, 0) to (x_2, 1) is contained in P.
This definition ensures that sets P ∈ 𝒫 are “almost” Kakeya sets in the sense that
while they do not contain a line segment in every direction, they do contain a line
segment pointing in every direction that makes a sufficiently small angle with the y-axis.
We can always produce a true Kakeya set from such a P by taking a finite union
of some rotated copies of P.
Exercise 6.37. Show that 𝒫 is a closed subset of 𝒦 (with respect to the Hausdorff
metric d).
This implies in particular that (𝒫, d|_{𝒫×𝒫}) is a complete metric space. Thus, we are
done if we can show that there exists a set P ∈ 𝒫 that has Lebesgue measure zero. We
will actually show the following stronger result.
Theorem 6.38 (Körner). The set
(6.49) B = {P ∈ 𝒫 : P is a Lebesgue null set} ⊂ 𝒫
is comeager.
The Baire category theorem says that comeager subsets of complete metric spaces
are dense and in particular, non-empty.
To prove Theorem 6.38 it suffices to show that B contains a countable intersection
of open dense sets.
Let v ∈ [0, 1] and ε > 0. Then we define 𝒫(v, ε) ⊂ 𝒫 to be the set of all P ∈ 𝒫 such
that there exist finitely many intervals I_1, . . . , I_N such that if y ∈ [0, 1] ∩ [v − ε, v + ε],
then
(6.50) {x : (x, y) ∈ P} ⊂ ⋃_{j=1}^N I_j and ∑_{j=1}^N |I_j| < 100ε.
The numbers \overline{dim}(A), \underline{dim}(A) do not depend on the norm on R^n used to form the
balls that appear in the definition of N_δ(A) (because the number N_δ(A) only changes
by a multiplicative constant when swapping out norms). The balls in the maximum
norm on R^n defined by ‖x‖_∞ = max_{i=1,...,n} |x_i| look like boxes. This motivates the term
“box counting dimension”.
This notion of dimension coincides with our intuition about dimension. For instance,
the set A from Example 6.40 which we referred to as a “k-dimensional box”
actually has box counting dimension k. Note that there is no reason why the box
counting dimension of some given set A should always be an integer. In fact, there are
lots of compact sets with a non-integer box counting dimension. We refer to such sets
as fractals (because they have fractional dimension).
Example 6.43. Maybe the simplest example of a fractal is the Cantor set C ⊂ [0, 1]
(see (6.1)). In the iterative construction of the Cantor set, at the kth step we arrive at
a disjoint union of 2^k closed intervals each of which has length 3^{−k}. Thus
(6.61) N_{(1/2)·3^{−k}}(C) = 2^k.
Similarly, N_δ(C) ≈ 2^k where δ ∈ (0, 1) and k is such that 3^{−k−1} < δ ≤ 3^{−k}. This shows
that
(6.62) 0 < dim(C) = log(2)/log(3) < 1.
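The count in (6.61) can be reproduced with a small grid computation. The sketch below works with exact integers: left endpoints of the step-m intervals are written in units of 3^{−m}, and we count the grid cells of side 3^{−k} that contain at least one of them. Counting grid cells instead of covering balls is an assumption made here for convenience (it is a common variant of box counting), and the function names are hypothetical.

```python
import math
from itertools import product


def cantor_left_endpoints(m):
    """Left endpoints of the 2^m intervals of step m, as integers in units of 3^(-m).
    These are exactly the m-digit base-3 numbers with digits in {0, 2}."""
    return [
        sum(d * 3 ** (m - 1 - i) for i, d in enumerate(digits))
        for digits in product((0, 2), repeat=m)
    ]


def box_count(k, m):
    """Number of grid cells [j 3^-k, (j+1) 3^-k) containing a step-m left endpoint (m >= k)."""
    cell = 3 ** (m - k)
    return len({a // cell for a in cantor_left_endpoints(m)})


def dimension_estimate(k, m):
    """log(number of cells) / log(1 / cell size) at scale 3^(-k)."""
    return math.log(box_count(k, m)) / math.log(3 ** k)


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, box_count(k, k + 3), dimension_estimate(k, k + 3))
    print(math.log(2) / math.log(3))
```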
Dimension is related to the notion of Lebesgue null sets in the following way.
Example 6.45. It is not true that a Lebesgue null set in R^n necessarily has box
counting dimension strictly smaller than n: take the set A = Q ∩ [0, 1] ⊂ R (A is not compact, but it
still makes sense to speak of its box counting dimension). It is not hard to show that
dim(A) = 1.
In view of this fact and the existence of Besicovitch sets, it is a natural instance of
our original question about the smallest possible “size” of a Kakeya set to ask whether
there exist Besicovitch sets in Rn that have a box counting dimension strictly smaller
than n. It is conjectured that the answer is ’no’ for all n.
This is known to hold if n = 2 (and trivial if n = 1), but still widely open if n ≥ 3.
See Exercise 6.53 below for a walkthrough to a simple proof that dim(K) ≥ (n+1)/2. Wolff
(1995) proved that dim(K) ≥ (n+2)/2. The currently best known results are as follows:
• n = 2: dim(K) = 2 (Davies 1971)
• n = 3: dim(K) ≥ 5/2 + 10^{−10} (Katz-Laba-Tao 1999)
• n = 4: dim(K) ≥ 3 + 10^{−10} (Laba-Tao 2000)
• 4 < n < 24: dim(K) ≥ (2 − √2)(n − 4) + 3 (Katz-Tao 2001)
• n ≥ 24: dim(K) ≥ n/α + (α − 1)/α, where α ∈ (1, 2) is such that α^3 − 4α + 2 = 0
(Katz-Tao 2001)
The Kakeya conjecture has many surprising connections to other open problems in
mathematics, in particular Fourier analysis.
5. Further exercises
Exercise 6.46. We define the subset A ⊂ R as follows: x ∈ A if and only if there
exists c > 0 such that
(6.64) |x − j2^{−k}| ≥ c2^{−k}
holds for all j ∈ Z and integers k ≥ 0. Show that A is meager and dense.
Exercise 6.47. Show that the set A from Exercise 6.46 is a Lebesgue null set.
Exercise 6.48. Let (X, d) be a complete metric space without isolated points.
Prove that X cannot be countable.
Exercise 6.49. (i) Show that if X is a normed vector space and U ⊂ X a proper
subspace, then U has empty interior.
(ii) Let
(6.65) X = {P : R → R | P is a polynomial}.
Use the Baire category theorem to prove that there exists no norm ‖·‖ on X such that
(X, ‖·‖) is a Banach space.
(iii) Let X be an infinite dimensional Banach space. Prove that X cannot have a
countable (linear-algebraic) basis.
Exercise 6.50. Consider X = C([−1, 1]) with the usual norm ‖f‖_∞ = sup_{t∈[−1,1]} |f(t)|.
Let
(6.66) A+ = {f ∈ X : f (t) = f (−t) ∀t ∈ [−1, 1]},
Then K(δ) must contain a δ-tube in every direction. Let T_δ denote a maximal δ-separated
collection of δ-tubes contained in K(δ) (then T_δ contains roughly δ^{1−n} many
δ-tubes). If A ⊂ R^n is a finite union of δ-tubes, then we denote by vol(A) the volume
of A.
(i) Prove that there must exist a point x ∈ K(δ) such that the number of tubes
T ∈ Tδ such that x ∈ T is at least c/vol(∪Tδ ), where c > 0 is a constant
depending only on the dimension, n.
(ii) Conclude from (i) that there exists c > 0 such that for every δ ∈ (0, 1/10):
(6.69) vol(∪T_δ) ≥ c · δ^{(n−1)/2}