Math522-notes
Lecture Notes
Work in progress, last updated: May 12, 2023
Note to students
Chapter 1. Metric spaces
1. Topology
2. The contraction principle
3. Compactness
4. Covering numbers and Minkowski dimension*
5. Oscillation as a quantification of discontinuity*
6. Further exercises
Chapter 2. Linear operators and derivatives
1. Bounded linear operators
2. Equivalence of norms
3. Dual spaces*
4. Sequential ℓp spaces*
5. Derivatives
6. Further exercises
Chapter 3. Differential calculus in Rn
1. Inverse function theorem
2. Implicit function theorem
3. Ordinary differential equations
4. Higher order derivatives and Taylor’s theorem
5. Local extrema
6. Local extrema on surfaces
7. Optimization and convexity*
8. Further exercises
Chapter 4. Approximation of functions
1. Polynomial approximation
2. Orthonormal systems
3. The Haar system
4. Trigonometric polynomials
5. The Stone-Weierstrass Theorem
6. Further exercises
Chapter 5. From Riemann to Lebesgue*
1. Lebesgue null sets
2. Lebesgue’s Characterization of the Riemann integral
Chapter 6. The Baire category theorem*
1. Nowhere differentiable continuous functions*
2. Sets of continuity*
These are lecture notes for a second undergraduate course in analysis, taught as
Math 522 at UW Madison. J.R. prepared a full set of lecture notes for the class in the
fall semesters of 2018 and 2019; they were preceded by individual notes on some of the
topics, written by A.S. for previous classes. The current version is by no means a final
one; all chapters are still undergoing revisions and some will be further expanded. We
are grateful to the students of several Math 522 classes for useful questions and remarks
on previous versions of the notes.
There is more content in these notes than we can cover in Math 522, and you may
receive updates about the precise lecture contents throughout the course.
The notes in the present form are likely to still contain typos, errors and imprecisions
of all kinds. Do not ever take anything that you read in a mathematical text for granted.
Think hard about what you are reading and try to make sense of it independently. If
that fails, then it’s time to ask somebody a question and that usually helps. In the spring
semester of 2023 the course will be taught by A.S. He will welcome all comments about
the contents of these notes - please let him know about any misprints or inaccuracies
that you may find.
There are many books on mathematical analysis, each of which will likely have a
large intersection with this course. Here are three very good ones:
• W. Rudin, Principles of mathematical analysis
• T. Apostol, Mathematical analysis: A modern approach to advanced calculus
• T. Körner, A Companion to Analysis: A Second First and First Second Course
in Analysis
For further self study in analysis we recommend the Princeton Lectures in Analysis I-
IV, by Stein and Shakarchi. Throughout the course A.S. will make concrete suggestions
for further reading related to the content of these lecture notes.
• E. M. Stein, R. Shakarchi, Fourier Analysis, an introduction.
• E. M. Stein, R. Shakarchi, Complex Analysis
• E. M. Stein, R. Shakarchi, Real Analysis: measure theory, integration, and Hilbert spaces
• E. M. Stein, R. Shakarchi, Functional Analysis
We mention two excellent books used in first year analysis graduate courses at UW
Madison.
• W. Rudin, Real and Complex Analysis
• G. Folland, Real Analysis, modern techniques and their applications.
Finally, a concise and more general treatment of differential calculus in normed spaces
can be found in chapter 1 of
• L. Hörmander, The analysis of linear partial differential operators, vol. I.
CHAPTER 1
Metric spaces
1. Topology
The notion of a metric space serves as a convenient abstract setting that underlies
all topics discussed in this course. A metric space can be thought of as a collection
of distinct objects that come with a distance between them. This provides a structure
that makes it meaningful to speak of notions such as convergence and continuity. It
will allow us to use the same terminology for potentially very different kinds of objects.
Definition 1.1 (Metric space). A set X equipped with a map d : X × X → [0, ∞)
is called a metric space if X is not the empty set and for all x, y, z ∈ X,
(1) d(x, y) = d(y, x),
(2) d(x, z) ≤ d(x, y) + d(y, z),
(3) d(x, y) = 0 if and only if x = y.
The map d is called a metric. Property (2) is called the triangle inequality.
One may imagine the d to stand for ‘distance’. If multiple metric spaces are relevant
at the same time, then we may also write dX for the metric d on the metric space X.
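The three axioms of Definition 1.1 can be spot-checked numerically on a finite sample of points. The following is a minimal sketch in Python; the helper name `first_violation`, the tolerance and the sample points are assumptions made for illustration. Passing such a check on finitely many points is of course no proof that d is a metric, but a failure is a genuine counterexample.

```python
import itertools
import math

def first_violation(d, points, tol=1e-12):
    """Return the name of the first metric axiom that fails on `points`, or None."""
    for x, y in itertools.product(points, repeat=2):
        if abs(d(x, y) - d(y, x)) > tol:
            return "symmetry"
        # d(x, y) = 0 must hold exactly when x = y.
        if (d(x, y) <= tol) != (x == y):
            return "definiteness"
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return "triangle inequality"
    return None

def euclidean(x, y):
    """Euclidean metric on tuples of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_difference(x, y):
    """(x - y)^2 on the real line; hypothetical non-example."""
    return (x - y) ** 2

sample_plane = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 4.0), (-2.0, 0.5)]
sample_line = [0.0, 1.0, 2.0]

print(first_violation(euclidean, sample_plane))
print(first_violation(squared_difference, sample_line))
```

The second candidate fails because d(0, 2) = 4 exceeds d(0, 1) + d(1, 2) = 2.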
Examples 1.2. Some fundamental examples of metric spaces that will be important
in this course are
• the real numbers R with d(x, y) = |x − y|,
• closed and open intervals of real numbers (with the same metric),
• the complex numbers C with d(z, w) = |z − w|,
• n-dimensional Euclidean space Rn consisting of vectors x = (x1 , . . . , xn ) with
the Euclidean metric
\[ d(x, y) = \Big( \sum_{i=1}^{n} |x_i - y_i|^2 \Big)^{1/2}, \]
• the space c0 of sequences (an )n∈N of complex numbers with limn→∞ an = 0, with
the same metric as for ℓ∞ .
Exercise 1.3. Verify that each of the preceding examples is really a metric space.
In the following let X be a metric space with metric d.
1.1. Open and closed sets. For every x0 ∈ X and r > 0 define the open ball
B(x0 , r) = {x ∈ X : d(x, x0 ) < r},
and the closed ball
\[ \overline{B}(x_0, r) = \{x \in X : d(x, x_0) \le r\}. \]
Example 1.4. If X = R (always with the usual metric), then the open balls are
open intervals and the closed balls are closed intervals.
Should multiple metric spaces be involved we use subscripts on the metric and balls
to indicate which metric space we mean, i.e. BX (x0 , r) is a ball in the metric space X.
Definition 1.5 (Open set). Let X be a metric space and U ⊂ X. A point x ∈ U
is called interior in U if there exists r > 0 such that B(x, r) ⊂ U .
The set U ⊂ X is called open if every point x ∈ U is interior.
Clarification of notation: A ⊂ B means for us that A is a subset of B, not
necessarily a proper subset. That is, we also allow A = B. We will write A ⊊ B to
refer to proper subsets.
Note that a union of open sets is open. The family of all open sets U ⊂ X is also
called the topology of X. Note from the definition that the topology of a metric space
X is determined by the open balls B(x, r) with x ∈ X and r > 0. The notion of open sets can be
generalized and leads to the concept of topological spaces, which we will not need in
this course.
Definition 1.6 (Closed set). A set A ⊂ X is called closed if its complement
$A^{\complement} = X \setminus A$ is open. If A ⊂ X is an arbitrary set, then $\overline{A}$ denotes the intersection of
all closed sets containing A.
Since an intersection of closed sets is closed, $\overline{A}$ is closed by definition and called
the closure of A. It is the ‘smallest’ closed set containing A in the sense that if A′ is a
closed set with A ⊂ A′, then $\overline{A} \subset A'$. As a consequence, a set A is closed if and only if
$A = \overline{A}$.
Exercise 1.7. Verify that open balls are open and closed balls are closed.
Note that $\overline{B(x_0, r)}$, the closure of the open ball B(x0 , r), often coincides with the
closed ball $\overline{B}(x_0, r)$. While this is the case in most of the metric spaces encountered in
this course, it is generally only true that
\[ B(x_0, r) \subset \overline{B(x_0, r)} \subset \overline{B}(x_0, r). \]
Example 1.8. Let X be a non-empty set. For x, y ∈ X let
\[ d(x, y) = \begin{cases} 0, & \text{if } x = y, \\ 1, & \text{if } x \neq y. \end{cases} \]
This defines a metric on X (called the trivial metric). The topology on X is very
boring: every set is open (hence also every set is closed). Then, for every x ∈ X,
\[ B(x, 1) = \overline{B(x, 1)} = \{x\} \subset \overline{B}(x, 1) = X. \]
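On a finite set the balls of the trivial metric can simply be enumerated. The sketch below is an illustration only; the four-point set and the helper names `open_ball` and `closed_ball` are assumptions, not notation from these notes.

```python
def discrete_metric(x, y):
    """Trivial metric: 0 on the diagonal, 1 elsewhere."""
    return 0 if x == y else 1

def open_ball(X, d, center, r):
    """All points of X at distance strictly less than r from center."""
    return {y for y in X if d(center, y) < r}

def closed_ball(X, d, center, r):
    """All points of X at distance at most r from center."""
    return {y for y in X if d(center, y) <= r}

X = {"a", "b", "c", "d"}   # hypothetical sample set
x = "a"

print(open_ball(X, discrete_metric, x, 1))
print(closed_ball(X, discrete_metric, x, 1) == X)
```

The open ball of radius 1 is the singleton {x}, which is already closed because every subset of X is closed, while the closed ball of radius 1 is all of X.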
Definition 1.9 (Accumulation point). Let A ⊂ X. A point x ∈ X is called an
accumulation point of A if for every r > 0, there exists y ∈ B(x, r) ∩ A with y ≠ x.
Lemma 1.10. Let X be a metric space and A ⊂ X. Then $\overline{A}$ is equal to the union
of A and the set of accumulation points of A.
Proof. For one direction we take an arbitrary closed set C containing A and we
have to show that every accumulation point x belongs to C. If x were in X \ C (an
open set not intersecting A) then there would be an ε > 0 and a ball B(x, ε) such that
B(x, ε) ⊂ X \ C and hence B(x, ε) ∩ A ⊂ B(x, ε) ∩ C = ∅, in contradiction to x being
an accumulation point. Since C was an arbitrary closed set containing A we find that
A and the set of accumulation points are both subsets of $\overline{A}$.
To show the converse let $x \in \overline{A} \setminus A$; we have to show that x is an accumulation
point. Again argue by contradiction and suppose that x is not an accumulation point.
Then there would exist a ball B(x, ε) containing no points in A (other than x itself,
but that is excluded by assumption). Hence $C = (X \setminus B(x, \varepsilon)) \cap \overline{A}$ would be a closed
set containing A, with $C \subsetneq \overline{A}$, a contradiction to the definition of the closure of A.
1.2. Relative topology. If we have a metric space X with metric d and a non-
empty subset A ⊂ X, then A can be made a metric space by restricting the metric: we
define the metric dA : A × A → [0, ∞) by setting
dA (x, y) = d(x, y) for all x, y ∈ A.
In other words, dA is the restriction of d to the set A × A ⊂ X × X, also denoted
by dA = d|A×A . As a metric space, A comes with its own open sets: unpacking the
definition, a set U ⊂ A is open in A if and only if for every x ∈ U there exists r > 0
such that
BA (x, r) = {y ∈ A : d(x, y) < r} ⊂ U.
Observe that the open balls in A are not necessarily open balls in X. As a consequence,
a set U ⊂ A that is open in A is not necessarily open in X. However, the open sets in
A can be characterized by the open sets in X.
Lemma 1.11. Let A ⊂ X. A set U ⊂ A is open in A if and only if there exists an
open set V ⊂ X such that U = V ∩ A.
Proof. Suppose that U = V ∩ A with V open. We have to show that U is open in
A. Let x ∈ U ⊂ V . Then there is rx > 0 such that the ball B(x, rx ) = {y ∈ X : d(x, y) < rx } is contained in V .
Then BA (x, rx ) = B(x, rx ) ∩ A ⊂ V ∩ A = U , so x is an interior point of U (with respect to the metric on A).
Vice versa, let U be open in A. Then for every x ∈ U there is rx > 0 such that
x ∈ BA (x, rx ) ⊂ U , and thus U = ∪x∈U BA (x, rx ). Define V = ∪x∈U B(x, rx ). Then V is
open in X and V ∩ A = U , because B(x, rx ) ∩ A = BA (x, rx ) for every x ∈ U .
Example 1.12. Let X = R, A = [0, 1]. Then U = [0, 1/2) ⊂ A ⊂ X is open in A, but
not open in R. However, there exists V ⊂ R open such that U = V ∩ A: for example,
V = (−1, 1/2).
1.3. Convergence.
Definition 1.13 (Convergence). Let X be a metric space, (xn )n∈N ⊂ X a sequence
and x ∈ X. We say that (xn )n∈N converges to x if for all ε > 0 there exists N ∈ N
such that for all n ≥ N it holds that d(xn , x) < ε.
If (xn )n∈N converges to x we also call x the limit of the sequence and write x =
limn→∞ xn ; alternatively we may also write that xn → x in X.
Definition 1.14 (Cauchy sequence). Let X be a metric space. A sequence (xn )n∈N
in X is called a Cauchy sequence if for every ε > 0 there exists N ∈ N such that for all
n, m ≥ N we have
d(xn , xm ) < ε.
Lemma 1.15. Every convergent sequence is a Cauchy sequence.
Proof. Let (xn )n∈N converge to x and let ε > 0. Since xn → x there is N so that for all k ≥ N we have
d(xk , x) < ε/2. If m ≥ N , n ≥ N we get by the triangle inequality
d(xn , xm ) ≤ d(xm , x) + d(x, xn ) < ε/2 + ε/2 = ε
and since ε was arbitrary the result is proved.
Definition 1.16 (Completeness). A metric space X is called complete if every
Cauchy sequence (xn )n∈N ⊂ X converges.
Example 1.17. The metric space of rational numbers, Q (with the usual metric),
is not complete: the sequence of rational numbers
\[ \big( 10^{-n} \lfloor 10^n \sqrt{2} \rfloor \big)_{n \in \mathbb{N}} = (1.4,\ 1.41,\ 1.414,\ \dots) \]
is a Cauchy sequence, but it does not converge in Q. This is because it converges as a
sequence of real numbers to the irrational number √2 ∉ Q.
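The terms of this sequence can be computed exactly with rational arithmetic. The sketch below is an illustration; the function name `truncated_sqrt2` is an assumption, and it relies on the identity ⌊10^n √2⌋ = ⌊√(2 · 10^{2n})⌋, which lets Python's integer square root do the work without floating point error.

```python
from fractions import Fraction
from math import isqrt

def truncated_sqrt2(n):
    """Return 10^(-n) * floor(10^n * sqrt(2)) as an exact rational number."""
    # floor(10^n * sqrt(2)) is the integer square root of 2 * 10^(2n).
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

terms = [truncated_sqrt2(n) for n in range(1, 8)]
for n, q in enumerate(terms, start=1):
    # q is rational, q^2 < 2, and the defect 2 - q^2 shrinks with n.
    print(n, q, float(2 - q * q))
```

For m ≥ n the terms satisfy 0 ≤ x_m − x_n < 10^{−n}, which is the Cauchy property, while no term squares to 2.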
The real numbers form an example of a complete metric space (in fact, they are
usually defined via completion of the rational numbers).
Lemma 1.18. If X is complete and A ⊂ X is closed, then A is a complete metric
space.
Proof. Let (xn )n∈N be a Cauchy sequence in A. Since dA = dX |A×A , (xn ) is also a
Cauchy sequence in X, so by assumption it has a limit x ∈ X. By Lemma 1.10, $x \in \overline{A}$,
and since A is closed, $\overline{A} = A$. Hence xn converges to x in A.
Note that this is not true if X is not complete: for example, every metric space is
a closed subset of itself, but not every metric space is complete.
1.4. Continuity.
Definition 1.19 (Continuity). Let X, Y be metric spaces.
(i) A map f : X → Y is called continuous at x ∈ X if for every ε > 0 there exists
δ > 0 such that if dX (x, y) < δ, then dY (f (x), f (y)) < ε.
(ii) f is called continuous if it is continuous at every x ∈ X. We also write
f ∈ C(X, Y ).
Lemma 1.20. Let f : X → Y and x ∈ X. The following are equivalent.
(i) f is continuous at x.
(ii) For every sequence (xn )n∈N ⊂ X convergent to x, the sequence (f (xn ))n∈N
converges to f (x).
Lemma 1.24. Let (fn )n∈N be a sequence of functions on a set X. Then (fn )n∈N
converges uniformly to f if and only if limn→∞ supx∈X |fn (x) − f (x)| = 0.
To illustrate the difference between the notions of pointwise convergence and uniform
convergence we consider
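\[ f_n(x) = \begin{cases} 1 - nx & \text{if } 0 \le x \le n^{-1}, \\ 0 & \text{if } n^{-1} < x \le 1 \end{cases} \]
as a sequence of functions on the metric space X = [0, 1]. For every x ∈ [0, 1] the
numerical sequence {fn (x)}n∈N converges and we have
\[ \lim_{n \to \infty} f_n(x) = \begin{cases} 0 & \text{if } 0 < x \le 1, \\ 1 & \text{if } x = 0. \end{cases} \]
Thus fn converges pointwise to the function f defined by this limit. However, for every n ∈ N
\[ \sup_{x \in [0,1]} |f_n(x) - f(x)| = 1, \]
so by Lemma 1.24 the convergence is not uniform.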
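The two behaviours can be observed numerically. The following sketch is illustrative only; the function names and the sample points x = 2^{−k}/n are assumptions chosen so that the points lie in (0, 1/n], where the gap between fn and f is largest.

```python
def f_n(n, x):
    """The tent-like function f_n from the example above."""
    return 1 - n * x if x <= 1 / n else 0.0

def f_limit(x):
    """Pointwise limit of f_n on [0, 1]."""
    return 1.0 if x == 0 else 0.0

def gap_near_zero(n, k):
    """|f_n(x) - f(x)| at the sample point x = 2^(-k) / n."""
    x = 2.0 ** (-k) / n
    return abs(f_n(n, x) - f_limit(x))

# Pointwise convergence at the fixed point x = 0.01: f_n(x) = 0 once n > 1/x.
print([f_n(n, 0.01) for n in (10, 50, 101, 1000)])

# No uniform convergence: for every n the gap comes arbitrarily close to 1.
for n in (1, 10, 1000):
    print(n, [gap_near_zero(n, k) for k in (1, 5, 20)])
```

The supremum 1 is not attained at any single point, which is why the sample points have to move towards 0 as k grows.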
(fn (x))n∈N is Cauchy. Since all Cauchy sequences in R and C converge this numerical
sequence has a limit, call it f (x). Letting m → ∞ we see that limm→∞ |fn (x)−fm (x)| =
|fn (x) − f (x)| and it follows that for n ≥ N we get |fn (x) − f (x)| ≤ ε/2. Since ε is
arbitrary this means that (fn ) converges to f uniformly.
Lemma 1.26. If (fn )n∈N converges uniformly to f and each fn is bounded, then f
is bounded.
(Recall that a function f : X → C is called bounded if there exists C > 0 such that
|f (x)| ≤ C for all x ∈ X.)
Proof. By the assumed uniform convergence there is N such that |fn (x)−f (x)| <
1 for all n ≥ N , x ∈ X. Since fN is bounded there is M > 0 such that |fN (x)| ≤ M for
all x ∈ X. Now use
|f (x)| = |f (x) − fN (x) + fN (x)| ≤ |f (x) − fN (x)| + |fN (x)| < 1 + M.
We shall now assume that X is a metric space, with metric d.
Lemma 1.27. Let X be a metric space and a ∈ X. If (fn )n∈N converges uniformly
to f and each fn is continuous at a, then f is continuous at a.
Proof. We have to show that given ε > 0 there is δ > 0 such that |f (x)−f (a)| < ε
provided that d(x, a) < δ.
Since fn converges uniformly to f there is an N ∈ N such that |fn (x) − f (x)| < ε/3
for n ≥ N , and all x ∈ X. Consider the function fN , which is continuous at a. There is δ > 0 such
that |fN (x) − fN (a)| < ε/3 provided that d(x, a) < δ. For such x we get
|f (x) − f (a)| = |f (x) − fN (x) + fN (x) − fN (a) + fN (a) − f (a)|
≤ |f (x) − fN (x)| + |fN (x) − fN (a)| + |fN (a) − f (a)| < ε/3 + ε/3 + ε/3 = ε.
Lemma 1.28. Let X be a metric space. The space of bounded continuous functions
Cb (X) is a complete metric space with the supremum metric
\[ d_\infty(f, g) = \sup_{x \in X} |f(x) - g(x)|. \]
Remarks. 1. The proof not only demonstrates the existence of the fixed point x∗ ,
but also gives an algorithm to compute it via successive applications of the map ϕ. We
can say something about how quickly the algorithm converges: the sequence (xn )n∈N
defined in the proof satisfies the inequality
\[ d(x_n, x^*) \le \frac{c^n}{1 - c}\, d(x_0, x_1), \]
so speed of convergence depends only on the parameter c ∈ (0, 1) and the quality of
the initial guess x0 ∈ X.
2. The contraction principle can be used to solve equations. For example, say we want
to solve F (x) = 0 (F is some function). Then we can set G(x) = F (x) + x. Then
F (x) = 0 if and only if x is a fixed point of G.
3. The conclusion does not necessarily hold if we drop the assumption that X is
complete: the map f : (0, 1) → (0, 1) defined by f (x) = x/2 is a contraction (in which
metric space?) but has no fixed point.
4. If we replace the contraction assumption (1.2) by the weaker condition
(1.3) |ϕ(x) − ϕ(y)| < |x − y|
for all x, y with x ≠ y then ϕ may not have a fixed point in X. Consider ϕ(x) = x + e^{−x}
on the complete metric space X = [0, ∞). One verifies that ϕ′(x) = 1 − e^{−x} ∈ [0, 1) for
x ≥ 0, with ϕ′(x) > 0 for x > 0, so by the mean value theorem (1.3) is satisfied if x ≠ y. But clearly ϕ(x) − x = e^{−x} > 0 for x ≥ 0, so ϕ does not
have a fixed point.
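Remarks 1 and 4 can be illustrated numerically. The sketch below is not part of the proof; the choice ϕ = cos on X = [0, 1] is an assumed example (cos maps [0, 1] into itself and, by the mean value theorem, is a contraction there with constant c = sin(1) < 1), and the helper name `iterate` is hypothetical.

```python
import math

def iterate(phi, x0, steps):
    """Return [x_0, x_1, ..., x_steps] with x_{n+1} = phi(x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(phi(xs[-1]))
    return xs

# Remark 1: compare the actual error with the a priori bound c^n / (1 - c) * d(x0, x1).
c = math.sin(1.0)
xs = iterate(math.cos, 0.0, 200)
x_star = xs[-1]  # numerical stand-in for the fixed point of cos

for n in (0, 5, 10, 20, 40):
    error = abs(xs[n] - x_star)
    bound = c ** n / (1 - c) * abs(xs[0] - xs[1])
    print(n, error, bound)

# Remark 4: phi(x) = x + exp(-x) only satisfies the weaker condition (1.3);
# its iterates keep increasing and never settle at a fixed point.
ys = iterate(lambda x: x + math.exp(-x), 0.0, 1000)
print(ys[10], ys[100], ys[1000])
```

In the first experiment the observed error is considerably smaller than the bound, which is consistent with the bound being a worst-case estimate over the whole interval.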
Exercise 1.37. We are given h ∈ C([0, 1]) and K ∈ C([0, 1]²) such that |K(x, t)| ≤
3/4 for (x, t) ∈ [0, 1]². Consider the integral equation
\[ f(x) = \int_0^1 K(x, t) f(t)\, dt + h(x), \qquad x \in [0, 1]. \tag{1.4} \]
Show that there exists a unique function f continuous on [0, 1] such that (1.4) holds.
Proceed in the following steps.
(i) Define for f ∈ C([0, 1])
\[ T[f](x) = \int_0^1 K(x, t) f(t)\, dt + h(x). \]
(ii) Show that T maps C([0, 1]) to C([0, 1]).
(iii) Show that $\sup_{x \in [0,1]} |T[f](x) - T[g](x)| \le \tfrac34 \sup_{t \in [0,1]} |f(t) - g(t)|$ and conclude.
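The iteration f ↦ T[f] suggested by the contraction principle can be tried out on a grid. This is only a numerical sketch and does not replace the exercise: the kernel K(x, t) = (3/4) cos(xt), the function h(x) = e^x, the grid size and the trapezoidal rule are all assumptions made for illustration.

```python
import math

def apply_T(f_vals, kernel, h, grid, weights):
    """One application of T on grid values, with the integral replaced by a quadrature sum."""
    return [
        sum(w * kernel(x, t) * ft for w, t, ft in zip(weights, grid, f_vals)) + h(x)
        for x in grid
    ]

def sup_distance(u, v):
    """Discrete analogue of the supremum metric."""
    return max(abs(a - b) for a, b in zip(u, v))

# Assumed sample data: |K| <= 3/4 on [0, 1]^2 and h continuous.
kernel = lambda x, t: 0.75 * math.cos(x * t)
h = lambda x: math.exp(x)

m = 100
grid = [j / m for j in range(m + 1)]
# Trapezoidal weights; they are positive and sum to 1.
weights = [1 / (2 * m) if j in (0, m) else 1 / m for j in range(m + 1)]

f = [0.0] * (m + 1)
steps = []
for _ in range(40):
    f_next = apply_T(f, kernel, h, grid, weights)
    steps.append(sup_distance(f_next, f))
    f = f_next

print(steps[:5], steps[-1])
```

Because the quadrature weights are positive and sum to 1, the discretized map inherits the contraction constant 3/4, so successive differences shrink at least geometrically.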
Exercise 1.38. Let A > 0. We are given h ∈ C([0, A]) and K ∈ C([0, A]²) such
that |K(x, t)| ≤ B for all (x, t) ∈ [0, A]².
Consider the Volterra integral equation
\[ f(x) = \int_0^x K(x, t) f(t)\, dt + h(x), \qquad x \in [0, A]. \tag{1.5} \]
Show that there exists a unique function f continuous on [0, A] such that (1.5) holds. Fill
in the details for the following steps.
(i) Define for f ∈ C([0, A])
\[ V[f](x) = \int_0^x K(x, t) f(t)\, dt + h(x). \]
(ii) Show that V maps C([0, A]) to C([0, A]).
(iii) Given a positive number M define a metric on C([0, A]) by
\[ d_M(f, g) = \sup_{x \in [0, A]} |f(x) - g(x)|\, e^{-Mx}. \]
Show that C([0, A]) with this metric is a complete metric space. Show that a sequence
(fn )n∈N of functions in C([0, A]) converges uniformly if and only if it converges with
respect to dM .
(iv) Show
\[ d_M(V[f], V[g]) \le \frac{B}{M}\, d_M(f, g), \]
so that with the choice M > B the map V becomes a contraction on (C([0, A]), dM ).
Remark. The preceding example shows that a smart choice of the metric or metric
space can be crucial in solving such equations. This can often present highly nontrivial
problems in applications.
Exercise 1.39. Show that the system of equations
\[ \begin{aligned} x_1 + \tfrac{1}{10} \cos(\sin(2x_2 + x_1)) &= 6, \\ x_2 + \tfrac{1}{12} e^{-x_1^2} + \tfrac{1}{10} \cos(x_1 + x_2) &= 7 \end{aligned} \]
has a unique solution (x1 , x2 ) ∈ R².
Exercise 1.40. (i) Show there is exactly one u ∈ C([−1, 1]) that satisfies the
integral equation
\[ u(x) = x \int_0^x t^2 \cos u(t)\, dt, \qquad x \in [-1, 1]. \]
Hint: Use the contraction principle in the space C([−1, 1]).
(ii) Show that u is differentiable. Is u′ differentiable?
Exercise 1.41. Let f : R → R be a C¹-function, such that |f′(x)| ≤ a < 1 for all
x ∈ R. Define a C¹-function g : R² → R² by g(x, y) = (x + f (y), y + f (x)). Show that
the range of g is all of R².
Exercise 1.42. Show that there exists a unique (x, y) ∈ R² such that cos(sin(x)) =
y and sin(cos(y)) = x.
3. Compactness
The goal in this section is to study the general theory of compactness in metric
spaces. From Analysis I, you might already be familiar with compactness in R. By
the Heine-Borel theorem, a subset of Rn is compact if and only if it is bounded and
closed. We will see that this no longer holds in general metric spaces. We will also
study in detail compact subsets of the space of continuous functions C(K) where K is
a compact metric space (Arzelà-Ascoli theorem). Let (X, d) be a metric space. We first
review some basic definitions.
Definition 1.43. A collection (Gi )i∈I (I is an arbitrary index set) of open sets
Gi ⊂ X is called an open cover of X if $X \subset \bigcup_{i \in I} G_i$.
Definition 1.44. X is compact if every open cover of X contains a finite subcover.
That is, if for every open cover (Gi )i∈I there exist m ∈ N and i1 , . . . , im ∈ I such that
$X \subset \bigcup_{j=1}^{m} G_{i_j}$. This is also called the Heine-Borel property.
Exercise 1.51. Let X be a compact metric space. Prove that there exists a countable,
dense set E ⊂ X (recall that E ⊂ X is called dense if $\overline{E} = X$).
Exercise 1.52. Construct a compact subset of real numbers whose accumulation
points form a countable set.
3.1. Compactness and continuity. We will now prove three key theorems that
relate compactness to continuity. In Analysis I you might have seen versions of these
on R or Rn . The proofs are not very interesting, but can serve as instructive examples
of how to prove statements involving the Heine-Borel property.
Theorem 1.53. Let X, Y be metric spaces and assume that X is compact. If a map
f : X → Y is continuous, then it is uniformly continuous.
Proof. Let ε > 0. We need to demonstrate the existence of a number δ > 0
such that for all x, y ∈ X we have that dX (x, y) ≤ δ implies dY (f (x), f (y)) ≤ ε. By
continuity, for every x ∈ X there exists a number δx > 0 such that for all y ∈ X,
dX (x, y) ≤ δx implies dY (f (x), f (y)) ≤ ε/2. Let
Bx = B(x, δx /2) = {y ∈ X : dX (x, y) < δx /2}.
Then (Bx )x∈X is an open cover of X. By compactness, there exists a finite subcover by
Bx1 , . . . , Bxm . Now we set
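\[ \delta = \tfrac12 \min(\delta_{x_1}, \dots, \delta_{x_m}). \]
We claim that this δ does the job. Indeed, let x, y ∈ X satisfy dX (x, y) ≤ δ. There
exists i ∈ {1, . . . , m} such that x ∈ Bxi . Then
\[ d_X(x_i, y) \le d_X(x_i, x) + d_X(x, y) \le \tfrac12 \delta_{x_i} + \delta \le \delta_{x_i}. \]
Since also dX (xi , x) ≤ δxi , the choice of δxi gives
\[ d_Y(f(x), f(y)) \le d_Y(f(x), f(x_i)) + d_Y(f(x_i), f(y)) \le \varepsilon/2 + \varepsilon/2 = \varepsilon. \]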
Theorem 1.54. Let X, Y be metric spaces and assume that X is compact. If a map
f : X → Y is continuous, then f (X) ⊂ Y is compact.
Note that for A ⊂ X we have A ⊂ f⁻¹(f (A)) and for B ⊂ Y we have f (f⁻¹(B)) ⊂
B, but equality need not hold in either case.
Proof. Let (Vi )i∈I be an open cover of f (X). Since f is continuous, the sets
Ui = f⁻¹(Vi ) ⊂ X are open. We have $f(X) \subset \bigcup_{i \in I} V_i$. So,
\[ X \subset f^{-1}(f(X)) \subset \bigcup_{i \in I} f^{-1}(V_i) = \bigcup_{i \in I} U_i. \]
Thus (Ui )i∈I is an open cover of X and by compactness there exists a finite subcover
{Ui1 , . . . , Uim }. That is,
\[ X \subset \bigcup_{k=1}^{m} U_{i_k}. \]
Consequently,
\[ f(X) \subset \bigcup_{k=1}^{m} f(U_{i_k}) \subset \bigcup_{k=1}^{m} V_{i_k}. \]
Thus {Vi1 , . . . , Vim } is a finite subcover of f (X), so f (X) is compact.
Theorem 1.55. Let X be a compact metric space and f : X → R a continuous
function. Then there exists x0 ∈ X such that f (x0 ) = supx∈X f (x).
By passing from f to −f we see that the theorem also holds with sup replaced by
inf.
Proof. By Theorem 1.54, f (X) ⊂ R is compact. By the Heine-Borel Theorem
1.46, it is therefore closed and bounded. By completeness of the real numbers, f (X)
has a finite supremum sup f (X) and since f (X) is closed we have sup f (X) ∈ f (X), so
there exists x0 ∈ X such that f (x0 ) = sup f (X) = supx∈X f (x).
Corollary 1.56. Let X be a compact metric space. Then every continuous func-
tion on X is bounded: C(X) = Cb (X).
For a converse of this statement, see Exercise 1.104 below.
Proof. Let f ∈ C(X). Then |f | : X → [0, ∞) is also continuous. By Theorem
1.55 there exists x0 ∈ X such that |f (x0 )| = supx∈X |f (x)|. Set C = |f (x0 )|. Then
|f (x)| ≤ C for all x ∈ X, so f is bounded.
3.2. Sequential compactness and total boundedness.
Definition 1.57. A metric space X is sequentially compact if every sequence in X
has a convergent subsequence. This is also called the Bolzano-Weierstrass property .
Let us recall the Bolzano-Weierstrass theorem which you should have seen in Anal-
ysis I.
Theorem 1.58 (Bolzano-Weierstrass). Every bounded sequence in R has a conver-
gent subsequence.
Definition 1.59. A metric space X is bounded if it is contained in a single fixed
ball, i.e. if there exist x0 ∈ X and r > 0 such that X ⊂ B(x0 , r).
Definition 1.60. A metric space X is totally bounded if for every ε > 0 there exist
finitely many balls of radius ε that cover X.
Similarly, we define these terms for subsets A ⊂ X by considering (A, d|A×A ) as its
own metric space.
Note that
X totally bounded =⇒ X bounded.
The converse is generally false. However, for A ⊂ Rn we have that A is totally bounded
if and only if A is bounded.
Theorem 1.61. Let X be a metric space. The following are equivalent:
(1) X is compact
(2) X is sequentially compact
(3) X is totally bounded and complete
Corollary 1.62.
(1) (Heine-Borel Theorem) A subset A ⊂ Rn is compact if and only if it is bounded
and closed.
(2) (Bolzano-Weierstrass Theorem) A subset A ⊂ Rn is sequentially compact if
and only if it is bounded and closed.
Proof of Corollary 1.62. A subset A ⊂ Rn is closed if and only if A is com-
plete as a metric space (this is because Rn is complete). Also, A ⊂ Rn is bounded if
and only if it is totally bounded. Therefore, both claims follow from Theorem 1.61.
Example 1.63. Let ℓ∞ be the space of bounded sequences (an )n∈N ⊂ C with
d(a, b) = supn∈N |an − bn | (that is, ℓ∞ = Cb (N)). We claim that the closed unit ball
around 0 = (0, 0, . . . ),
\[ \overline{B}(0, 1) = \{a \in \ell^\infty : |a_n| \le 1 \ \forall n \in \mathbb{N}\} \]
is bounded and closed, but not compact. Indeed, let $e^{(k)} \in \ell^\infty$ be the sequence with
\[ e^{(k)}_n = \begin{cases} 0, & k \neq n, \\ 1, & k = n. \end{cases} \]
Then, $e^{(k)} \in \overline{B}(0, 1)$ for all k = 1, 2, . . . but $(e^{(k)})_k \subset \overline{B}(0, 1)$ does not have a convergent
subsequence, because $d(e^{(k)}, e^{(j)}) = 1$ for all k ≠ j and therefore no subsequence can
be Cauchy. Thus $\overline{B}(0, 1)$ is not sequentially compact and by Theorem 1.61 it is not
compact.
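The distance computation in this example can be checked on truncations. The sketch below is illustrative; the helper names and the truncation length are assumptions. Truncating loses nothing here, because e^(k) with k at most the truncation length vanishes beyond that position.

```python
def unit_sequence(k, length):
    """First `length` terms of e^(k): 1 in position k, 0 elsewhere (positions start at 1)."""
    return [1 if n == k else 0 for n in range(1, length + 1)]

def sup_distance(a, b):
    """Supremum metric on sequences of equal finite length."""
    return max(abs(x - y) for x, y in zip(a, b))

length = 8
vectors = [unit_sequence(k, length) for k in range(1, length + 1)]

# Collect the distances between all pairs of distinct unit sequences.
distances = {
    sup_distance(vectors[i], vectors[j])
    for i in range(length)
    for j in range(length)
    if i != j
}
print(distances)
```

Every pair of distinct unit sequences is at distance exactly 1, so no subsequence can be Cauchy.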
Example 1.64. Let ℓ¹ be the space of complex sequences (an )n∈N ⊂ C such that
$\sum_n |a_n| < \infty$. We define a metric on ℓ¹ by
\[ d(a, b) = \sum_n |a_n - b_n|. \]
Exercise 1.65. Show that the closed and bounded set $\overline{B}(0, 1) \subset \ell^1$ is not compact.
Proof of Theorem 1.61. X compact ⇒ X sequentially compact: Suppose that
X is compact, but not sequentially compact. Then there exists a sequence (xn )n∈N ⊂ X
without a convergent subsequence. Let A = {xn : n ∈ N} ⊂ X. Note that A must
be an infinite set (otherwise (xn )n∈N has a constant subsequence). Since A has no
accumulation points, we have that for every xn there is an open ball Bn such that
Bn ∩ A = {xn }. Also, A is a closed set, so X\A is open. Thus, {Bn : n ∈ N} ∪ {X\A}
Figure 3.
Figure 4.
This is a contradiction, because we assumed that the Bn are not contained in any
of the Gi .
Now let ε > 0 be such that every ε-ball is contained in one of the Gi . We have already
proven earlier that X is totally bounded if it is sequentially compact. Thus there exist
p1 , . . . , pM such that the balls B(pj , ε) cover X. But each B(pj , ε) is contained in a Gi ,
say in Gij , so we have found a finite subcover:
\[ X \subset \bigcup_{j=1}^{M} B(p_j, \varepsilon) \subset \bigcup_{j=1}^{M} G_{i_j}. \]
Corollary 1.66. Compact subsets of metric spaces are bounded and closed.
Corollary 1.67. Let X be a complete metric space and A ⊂ X. Then A is totally
bounded if and only if it is relatively compact.
Exercise 1.68. Prove this.
3.3. Equicontinuity and the Arzelà-Ascoli theorem. Let (K, d) be a compact
metric space. By Corollary 1.56, continuous functions on K are automatically bounded.
Thus, C(K) = Cb (K) is a complete metric space with the supremum metric
\[ d_\infty(f, g) = \sup_{x \in K} |f(x) - g(x)| \]
(see Lemma 1.28). Convergence with respect to d∞ is uniform convergence (see Fact 1.29).
In this section we ask ourselves when a subset F ⊂ C(K) is compact.
Example 1.69. Let F = {fn : n ∈ N} ⊂ C([0, 1]), where
\[ f_n(x) = x^n, \qquad x \in [0, 1]. \]
F is not compact, because no subsequence of (fn )n∈N converges. This is because the
pointwise limit
\[ f(x) = \begin{cases} 0, & x \in [0, 1), \\ 1, & x = 1 \end{cases} \]
is not continuous, i.e. not in C([0, 1]).
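The failure of compactness can also be read off from the supremum metric directly. The following computation is a sketch added for illustration; the only ingredient is the evaluation of fn − fm at the point x = 2^{−1/n}, where fn takes the value 1/2.

```latex
\[
d_\infty(f_n, f_m)
  = \sup_{x \in [0,1]} \bigl( x^n - x^m \bigr)
  \;\ge\; \tfrac12 - 2^{-m/n}
  \;\ge\; \tfrac12 - \tfrac14 = \tfrac14
  \qquad \text{whenever } m \ge 2n .
\]
```

In any subsequence one finds, arbitrarily far out, two indices n and m with m ≥ 2n, so no subsequence is Cauchy with respect to d∞.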
The key concept that characterizes compactness in C(K) is equicontinuity.
Definition 1.70 (Equicontinuity). A subset F ⊂ C(K) is called equicontinuous if
for every ε > 0 there exists δ > 0 such that |f (x) − f (y)| < ε for all f ∈ F, x, y ∈ K
with d(x, y) < δ.
for n ≥ N . By uniform continuity (using Theorem 1.53) there exists δ > 0 such that
|fk (x) − fk (y)| ≤ ε/3
for all x, y ∈ K with d(x, y) < δ and all k = 1, . . . , N . Thus, for n ≥ N and x, y ∈ K
with d(x, y) < δ we have
|fn (x) − fn (y)| ≤ |fn (x) − fN (x)| + |fN (x) − fN (y)| + |fN (y) − fn (y)| ≤ 3 · ε/3 = ε.
Lemma 1.73. If F ⊂ C(K) is pointwise bounded and equicontinuous, then it is
uniformly bounded.
Proof. Choose δ > 0 such that
|f (x) − f (y)| ≤ 1
for all d(x, y) < δ, f ∈ F. Since K is totally bounded (by Theorem 1.61) there exist
p1 , . . . , pm ∈ K such that the balls B(pj , δ) cover K. By pointwise boundedness, for
every x ∈ K there exists C(x) such that |f (x)| ≤ C(x) for all f ∈ F. Set
C = max{C(p1 ), . . . , C(pm )}.
Then for f ∈ F and x ∈ K,
|f (x)| ≤ |f (pj )| + |f (x) − f (pj )| ≤ C + 1,
where j is chosen such that x ∈ B(pj , δ).
Theorem 1.74 (Arzelà-Ascoli). A subset F of C(K) is totally bounded if and only
if it is pointwise bounded and equicontinuous.
Proof. Recall that the space C(K) is complete. Since we now assume that F
is closed in C(K) the metric space F is complete. Thus by the characterization of
compactness (F compact ⇐⇒ F totally bounded and complete) the corollary follows
from the theorem.
Corollary 1.76. An equicontinuous and bounded sequence {fn } of functions in
C(K) has a uniformly convergent subsequence.
Proof. The closure of F = {fn : n ∈ N} is bounded, complete, and equicontinuous,
thus compact. By a part of the theorem on the characterization of compactness it is
also sequentially compact, therefore fn has a convergent subsequence.
We now discuss a special case of the Arzelà-Ascoli theorem.
Corollary 1.77. Let F ⊂ C([a, b]) be such that
(i) F is bounded (i.e. uniformly bounded),
(ii) every f ∈ F is continuously differentiable and
F′ = {f′ : f ∈ F}
is bounded.
Then F is totally bounded.
Proof of Corollary 1.77. Using the mean value theorem we see that for all f ∈ F and all
x, y ∈ [a, b] there exists ξ ∈ [a, b] such that
f (x) − f (y) = f′(ξ)(x − y).
But since F′ is bounded there exists C > 0 such that
|f′(ξ)| ≤ C
for all f ∈ F, ξ ∈ [a, b]. Thus,
|f (x) − f (y)| ≤ C|x − y|
for all x, y ∈ [a, b] and all f ∈ F. This implies equicontinuity: for ε > 0 we set δ = C⁻¹ε.
Then for x, y ∈ [a, b] with |x − y| < δ we have
|f (x) − f (y)| ≤ C|x − y| < Cδ = ε.
Therefore the claim follows from Theorem 1.74.
Example 1.78. Let $F = \{x \mapsto \sum_{n=0}^{\infty} c_n x^n : |c_n| \le 1\} \subset C([-\tfrac12, \tfrac12])$. The set F is
bounded, because
\[ \Big| \sum_{n=0}^{\infty} c_n x^n \Big| \le \sum_{n=0}^{\infty} 2^{-n} = 2 \]
for all sequences (cn )n∈N with |cn | ≤ 1 and for all x ∈ [−1/2, 1/2]. Similarly,
\[ F' = \Big\{ \sum_{n=1}^{\infty} n c_n x^{n-1} : |c_n| \le 1 \Big\} \]
is also bounded. Thus, F ⊂ C([−1/2, 1/2]) is relatively compact. However, note that F,
interpreted as a subset of C([0, 1]) (with the understanding that convergence at x = 1
is also assumed), is not relatively compact (it contains the set in Example 1.69).
Example 1.80. Condition (i) from Corollary 1.77 is necessary, because relatively
compact sets are bounded. Condition (ii) however is not necessary. Consider for example
F = {fn : n = 1, 2, . . . } ⊂ C([0, 1]) with fn (x) = sin(nx)/√n. The set F is
bounded, but F′ is unbounded. But the sequence (fn )n∈N is uniformly convergent, so
by Fact 1.72, F is equicontinuous and hence relatively compact.
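The two claims about this family follow from a one-line estimate each. The display below is a short worked check added for illustration, using only |sin| ≤ 1 and the chain rule.

```latex
\[
\sup_{x \in [0,1]} |f_n(x)| \le \frac{1}{\sqrt{n}} \xrightarrow[n \to \infty]{} 0,
\qquad
f_n'(x) = \sqrt{n}\, \cos(nx), \quad f_n'(0) = \sqrt{n} \xrightarrow[n \to \infty]{} \infty .
\]
```

So (fn) converges uniformly to the zero function, while the derivatives are unbounded already at the single point x = 0.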
Definition 1.81. Let E be a totally bounded subset of a metric space X, i.e. for
every δ > 0 it is contained in a finite collection of δ-balls.
For δ > 0 let N (E, δ) be the minimal number of δ-balls needed to cover E (the
centers of these balls are not required to belong to E). This number is called the δ-
covering number of E; note that it depends not only on E but also on the underlying
metric space X and the given metric d. The function δ ↦ log N (E, δ) is called the
metric entropy function of E.
The definition of N (E, δ) is extended to sets that are not totally bounded if we allow
the value ∞. If E is not totally bounded then there exists a δ0 such that N (E, δ) = ∞
for δ < δ0 .
One is interested in the behavior of N (E, δ) for small δ. For compact E this serves
as a quantitative measure of compactness.
Definition 1.82. Let E be totally bounded. The number
\[ \overline{\dim}_M(E) = \limsup_{\delta \to 0+} \frac{\log N(E, \delta)}{\log(1/\delta)} \]
is called the upper Minkowski dimension of E (also known as box counting dimension or
upper metric dimension of E). The expression
\[ \underline{\dim}_M(E) = \liminf_{\delta \to 0+} \frac{\log N(E, \delta)}{\log(1/\delta)} \]
is called the lower Minkowski dimension (or lower box counting dimension, or lower metric dimension)
of E. If $\underline{\dim}_M(E) = \overline{\dim}_M(E) = \alpha$ we say that E has Minkowski dimension α.
Example 1.83. Let k ≤ n and let E denote a k-dimensional box in Rn :
\[ E = [0, 1]^k \times \{0\}^{n-k} = \{x \in \mathbb{R}^n : x_j \in [0, 1] \text{ for } 1 \le j \le k,\ x_j = 0 \text{ for } k < j \le n\}. \]
Then there exist constants c, c′ > 0 such that
\[ c'\, \delta^{-k} \le N(E, \delta) \le c\, \delta^{-k} \tag{1.6} \]
for all δ ∈ (0, 1). Hence E has Minkowski dimension k.
Exercise 1.84. Let $E \subset \mathbb{R}^n$ be a compact set. Show that there exists a constant
$c \in (0, \infty)$ such that
\[ N(E,\delta) \le c\,\delta^{-n} \]
holds for all $\delta \in (0,1)$. Hence $\overline{\dim}_M E \le n$.
Exercise 1.85. (i) Show that if we replace the natural log in the above definitions
by another $\log_b$ with base $b > 1$ then the definitions of the dimensions do not change.
(ii) Let $\alpha > 0$. Suppose that for every $\varepsilon > 0$ there is a $\delta(\varepsilon) > 0$ and a
constant $C_\varepsilon \ge 1$ such that $C_\varepsilon^{-1}\delta^{-\alpha+\varepsilon} \le N(E,\delta) \le C_\varepsilon\delta^{-\alpha-\varepsilon}$ for $0 < \delta < \delta(\varepsilon)$. Show
that E has Minkowski dimension $\alpha$.
(iii) Let $E \subset X$ be totally bounded and let $\overline{E}$ be the closure of E. Show: $\overline{E}$ is
totally bounded and we have
\[ N(E,\delta) \le N(\overline{E},\delta) \le N(E,\delta') \quad \text{if } 0 < \delta' < \delta. \]
(iv) Define $N^{\mathrm{cent}}(E,\delta)$ to be the minimal number of $\delta$-balls with center in E needed
to cover E. Show that
\[ N(E,\delta) \le N^{\mathrm{cent}}(E,\delta) \le N(E,\delta/2). \]
(v) Let $B_1, \dots, B_M$ be balls of radius $\delta$ in X, so that each ball has nonempty
intersection with the set E. For each $i = 1, \dots, M$ denote by $B_i^*$ the ball with the same
center as $B_i$ and radius $3\delta$. Assume that the balls $B_1^*, \dots, B_M^*$ are disjoint. Prove that
\[ M \le N(E,\delta). \]
Remark: This can be an effective tool to prove lower bounds for the covering num-
bers.
Exercise 1.86. Consider the following metrics in $\mathbb{R}^n$.
• $d_1(x,y) = \sum_{i=1}^n |x_i - y_i|$,
• $d_2(x,y) = \big(\sum_{i=1}^n |x_i - y_i|^2\big)^{1/2}$,
• $d_\infty(x,y) = \max_{i=1,\dots,n} |x_i - y_i|$.
(i) Let $E \subset \mathbb{R}^n$ and let $N_1(E,\delta)$, $N_2(E,\delta)$, $N_\infty(E,\delta)$ be the covering numbers
of E associated with the metrics $d_1, d_2, d_\infty$, respectively. Show that
\[ N_\infty(E,\delta) \le N_2(E,\delta) \le N_1(E,\delta) \le N_2(E,\delta/\sqrt{n}) \le N_\infty(E,\delta/n). \]
(ii) Let $Q = [0,1]^n$ be the unit cube in $\mathbb{R}^n$. Show that Q has Minkowski dimension
n (with respect to any of the metrics $d_1, d_2, d_\infty$).
(iii) Let f be a differentiable function on [0, 1] with bounded derivative. Let E be
the set of all $x = (x_1, x_2) \in \mathbb{R}^2$ for which $0 \le x_1 \le 1$ and $x_2 = f(x_1)$. What is the
Minkowski dimension of E?
(iv) Let E be the set of all $x = (x_1, x_2) \in \mathbb{R}^2$ for which $0 \le x_1 \le 1$ and $x_2 = \sqrt{x_1}$.
What is the Minkowski dimension of E?
Exercise 1.87. Let $\beta > 0$. Consider the subset E of $\mathbb{R}$ consisting of the numbers
$n^{-\beta}$, for n = 1, 2, . . . . Show that E has a Minkowski dimension and determine it.
Hint: It might help to try this first for the sequence 1/n which, perhaps counterintuitively,
turns out to have Minkowski dimension 1/2.
Example 1.88. The Cantor middle third set C is given as the subset of [0, 1]
consisting of numbers of the form
\[ \sum_{k=1}^{\infty} a_k 3^{-k} \quad \text{where } a_k \in \{0, 2\}. \]
It can be written as
\[ C = [0,1] \setminus \bigcup_{\ell=0}^{\infty} \bigcup_{k=0}^{3^\ell - 1} \Big( \frac{3k+1}{3^{\ell+1}}, \frac{3k+2}{3^{\ell+1}} \Big). \tag{1.7} \]
C is a compact subset of [0, 1], with the property that for each N there are $2^N$ disjoint
closed intervals of length $3^{-N}$ which cover C.
Exercise 1.89. Show that C has Minkowski dimension $\frac{\log 2}{\log 3}$.
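The covering numbers of the Cantor set can be explored numerically. The following is only a rough sketch under simplifying assumptions: it covers a finite sample of C (the endpoints of the stage-8 intervals of the construction) by closed intervals of radius δ instead of open δ-balls, and the function names `cantor_endpoints` and `covering_number` are hypothetical helpers, not part of the notes. On the real line a greedy left-to-right sweep gives the minimal number of such intervals for a finite point set.

```python
from fractions import Fraction
from math import log


def cantor_endpoints(level):
    """Endpoints of the 2**level closed intervals at stage `level` of the construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(level):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))   # keep the left third
            refined.append((b - third, b))   # keep the right third
        intervals = refined
    return sorted({p for interval in intervals for p in interval})


def covering_number(points, delta):
    """Minimal number of closed intervals of radius delta covering a finite subset of R."""
    pts = sorted(points)
    count, i = 0, 0
    while i < len(pts):
        reach = pts[i] + 2 * delta   # the interval [pts[i], pts[i] + 2*delta]
        count += 1
        while i < len(pts) and pts[i] <= reach:
            i += 1
    return count


points = cantor_endpoints(8)
for N in range(1, 7):
    delta = Fraction(1, 2 * 3**N)
    n_cover = covering_number(points, delta)
    # ratio log N(E, delta) / log(1/delta) from Definition 1.82, for this finite sample
    ratio = log(n_cover) / log(float(1 / delta))
    print(N, n_cover, round(ratio, 4))
```

Exact rational arithmetic is used so that the comparisons at interval endpoints are not affected by rounding. The printed ratios only suggest the value of the dimension; they do not replace the argument asked for in Exercise 1.89.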
Exercise 1.90. Let A be the space of functions $f : \mathbb{N} \to \mathbb{R}$ (aka sequences) so that
$|f(n)| \le 2^{-n}$ for all $n \in \mathbb{N}$. It is a subset of the space of bounded sequences with norm
$\|f\|_\infty = \sup_{n\in\mathbb{N}} |f(n)|$ and associated metric $d_\infty$. Show that for $\delta < 1/2$ the covering
numbers $N(A, \delta)$ satisfy the bounds
\[ N(A,\delta) \le \Big(\frac{1}{\delta}\Big)^{C + \frac12 \log_2 \frac1\delta} \]
where C is independent of $\delta$. Hint: It helps to work with $\delta = 2^{-M}$ where $M \in \mathbb{N}$.
Also provide a lower bound which shows that A does not have finite lower Minkowski
dimension.
the oscillation of f at x.
The number oscf (x) can be used to quantify discontinuities:
Lemma 1.93. Let f : X → R be a bounded function. Then f is continuous at x if
and only if oscf (x) = 0.
Proof. This is a consequence of the definition of continuity.
Lemma 1.94. Let f : X → R be a bounded function. Then for every γ ≥ 0 the set
{x : oscf (x) ≥ γ} is closed.
Proof. The conclusion is shown by proving that the complement
Ωγ = {x : oscf (x) < γ}
is open. Let x ∈ Ωγ and choose ε such that 0 < ε < γ − oscf (x). By the definition of
oscf (x) we can pick δ > 0 such that Mf,δ (x)−mf,δ (x) < oscf (x)+ε. If d(y, x) < δ/2 and
d(z, y) < δ/2 then d(z, x) < δ and thus Mf,δ/2 (y) ≤ Mf,δ (x) and mf,δ/2 (y) ≥ mf,δ (x).
Hence
oscf (y) ≤ Mf,δ/2 (y) − mf,δ/2 (y) ≤ Mf,δ (x) − mf,δ (x) < oscf (x) + ε < γ
so that B(x, δ/2) ⊂ Ωγ . Hence x is an interior point of Ωγ and since x was chosen
arbitrarily in Ωγ this set is open.
Exercise 1.95. Define f : [−10, 10] → R by f (x) = −4x for x ≤ 0, f (x) = sin(π/x)
for 0 < x < 3/2, f (x) = cos(π/x) for x ≥ 3/2. Determine oscf (x) for all x ∈ [−10, 10].
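Exercise 1.96. Consider Thomae's function $f : [0,1] \to \mathbb{R}$, defined by
\[ f(x) = \begin{cases} 0 & \text{if } x \in [0,1] \setminus \mathbb{Q}, \\ \frac{1}{n} & \text{if } x = \frac{m}{n} \text{ with } \gcd(m,n) = 1. \end{cases} \]
Find $\mathrm{osc}_f(x)$ for all $x \in [0,1]$.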
6. Further exercises
Exercise 1.97. Let (X, d) be a metric space and A ⊂ X a subset.
(i) Show that A is totally bounded if and only if its closure $\overline{A}$ is totally bounded.
(ii) Assume that X is complete. Show that A is totally bounded if and only if A is
relatively compact. Which direction is still always true if X is not complete?
Exercise 1.98. Let $\ell^1$ denote the space of all sequences $(a_n)_{n\in\mathbb{N}}$ of complex numbers
such that $\sum_{n=1}^\infty |a_n| < \infty$, equipped with the metric $d(a,b) = \sum_{n=1}^\infty |a_n - b_n|$.
(i) Prove that
\[ A = \Big\{ a \in \ell^1 : \sum_{n=1}^\infty |a_n| \le 1 \Big\} \]
is bounded and closed, but not compact.
(ii) Let $b \in \ell^1$ with $b_n \ge 0$ for all $n \in \mathbb{N}$. Show that
\[ B = \{ a \in \ell^1 : |a_n| \le b_n \ \forall\, n \in \mathbb{N} \} \]
is compact.
Exercise 1.99. Recall that $\ell^\infty$ is the metric space of bounded sequences of complex
numbers equipped with the supremum metric $d(a,b) = \sup_{n\in\mathbb{N}} |a_n - b_n|$. Let $s \in \ell^\infty$ be
a sequence of non-negative real numbers that converges to zero. Let
\[ A = \{ a \in \ell^\infty : |a_n| \le s_n \text{ for all } n \}. \]
Prove that $A \subset \ell^\infty$ is compact.
Exercise 1.100. For each of the following subsets of C([0, 1]) prove or disprove
compactness:
(i) A1 = {f ∈ C([0, 1]) : maxx∈[0,1] |f (x)| ≤ 1}.
(ii) A2 = A1 ∩ {p : p polynomial of degree ≤ d} (where d ∈ N is given) .
(iii) A3 = A1 ∩ {f : f is a power series with infinite radius of convergence}.
Exercise 1.101. Let F ⊂ C([a, b]) be a bounded set. Assume that there exists a
function ω : [0, ∞) → [0, ∞) such that
\[ \lim_{t \to 0+} \omega(t) = \omega(0) = 0. \]
The purpose of this exercise is to prove a theorem of Fréchet that characterizes
compactness in $\ell^p$. Let $F \subset \ell^p$.
(i) Assume that F is bounded and equisummable in the following sense: for all $\varepsilon > 0$
there exists $N \in \mathbb{N}$ such that
\[ \sum_{n=N}^{\infty} |a_n|^p < \varepsilon \quad \text{for all } a \in F. \]
Exercise 1.104. Let X be a metric space. Assume that for every continuous
function f : X → C there exists a constant Cf > 0 such that |f (x)| ≤ Cf for all
x ∈ X. Show that X is compact. Hint: Assume that X is not sequentially compact
and construct an unbounded continuous function on X.
Exercise 1.105. Consider $F = \{f_N : N \in \mathbb{N}\} \subset C([0,1])$ with
\[ f_N(x) = \sum_{n=0}^{N} b^{-n\alpha} \sin(b^n x), \]
Exercise 1.106. Suppose (X, d) is a metric space with a countable dense subset, i.e.
a set $A = \{x_1, x_2, \dots\} \subset X$ with $\overline{A} = X$. Let $\ell^\infty$ denote the metric space of bounded
sequences $a = (a_n)_{n\in\mathbb{N}}$ of real numbers with metric $d_\infty(a,b) = \sup_{n\in\mathbb{N}} |a_n - b_n|$. Show
that there exists a map $\iota : X \to \ell^\infty$ with $d_\infty(\iota(x), \iota(y)) = d(x,y)$ for every $x, y \in X$ (in
other words, X can be isometrically embedded into $\ell^\infty$).
CHAPTER 2
Exercise 2.9. Show that L(X, Y ) endowed with the operator norm forms a normed
vector space (i.e. show that k · kop is a norm).
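Example 2.10. Let $A \in \mathbb{R}^{n\times m}$ be a real $n \times m$ matrix. We view A as a linear
map $\mathbb{R}^m \to \mathbb{R}^n$: for $x \in \mathbb{R}^m$, $A(x) = A \cdot x \in \mathbb{R}^n$. Let us equip $\mathbb{R}^n$ and $\mathbb{R}^m$ with the
corresponding $\|\cdot\|_\infty$ norms. Consider the operator norm $\|A\|_{\infty\to\infty} = \sup_{\|x\|_\infty = 1} \|Ax\|_\infty$
with respect to these normed spaces:
\[ \|Ax\|_\infty = \max_{i=1,\dots,n} \Big| \sum_{j=1}^m A_{ij} x_j \Big| \le \max_{i=1,\dots,n} \sum_{j=1}^m |A_{ij}| \, \|x\|_\infty. \]
This implies $\|A\|_{\infty\to\infty} \le \max_{i=1,\dots,n} \sum_{j=1}^m |A_{ij}|$. On the other hand, for given $i = 1, \dots, n$
we choose $x \in \mathbb{R}^m$ with $x_j = |A_{ij}|/A_{ij}$ if $A_{ij} \ne 0$ and $x_j = 0$ if $A_{ij} = 0$. Then
$\|x\|_\infty \le 1$ and, looking only at the i-th component of $Ax$,
\[ \|A\|_{\infty\to\infty} \ge \|Ax\|_\infty \ge \sum_{j=1}^m |A_{ij}|. \]
Since i was arbitrary, we get $\|A\|_{\infty\to\infty} \ge \max_{i=1,\dots,n} \sum_{j=1}^m |A_{ij}|$. Altogether we proved
\[ \|A\|_{\infty\to\infty} = \max_{i=1,\dots,n} \sum_{j=1}^m |A_{ij}|. \]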
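The row-sum formula can be checked on a small matrix. The sketch below is only an illustration: the matrix and the helper names are hypothetical, and the brute-force search relies on the fact that the convex function $x \mapsto \|Ax\|_\infty$ attains its maximum over the cube $\|x\|_\infty \le 1$ at a vertex, i.e. at a vector with entries ±1.

```python
from itertools import product


def max_row_sum(A):
    """Right-hand side of the formula: the largest absolute row sum of A."""
    return max(sum(abs(entry) for entry in row) for row in A)


def sup_norm_of_image(A, x):
    """The sup norm of the vector A x."""
    return max(abs(sum(a * xj for a, xj in zip(row, x))) for row in A)


def brute_force_norm(A):
    """Maximum of ||A x||_inf over all sign vectors x with entries +1 or -1."""
    m = len(A[0])
    return max(sup_norm_of_image(A, signs) for signs in product((-1, 1), repeat=m))


A = [[1, -2, 3], [0, 4, -1]]   # hypothetical 2 x 3 example
print(max_row_sum(A), brute_force_norm(A))
```

The maximizing sign vector found by the search is the vector $x_j = |A_{ij}|/A_{ij}$ used in the proof, taken for a row i with the largest absolute row sum.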
Exercise 2.12. Let $A \in \mathbb{R}^{n\times n}$. Define $\|x\|_2 = \big(\sum_{i=1}^n |x_i|^2\big)^{1/2}$ (Euclidean norm)
and $\|A\|_{2\to2} = \sup_{\|x\|_2 = 1} \|Ax\|_2$. Observe that $AA^T$ is a symmetric positive semidefinite $n \times n$ matrix and
hence has only non-negative eigenvalues. Denote the largest eigenvalue of $AA^T$ by $\rho$.
Prove that $\|A\|_{2\to2} = \sqrt{\rho}$. Hint: First consider the case that A is symmetric. Use that
symmetric matrices are orthogonally diagonalizable.
Exercise 2.13. Let $A \in \mathbb{R}^{n\times n}$ and define
\[ \|A\|_{HS} = \Big( \sum_{i=1}^n \sum_{j=1}^n |A_{ij}|^2 \Big)^{1/2}. \]
for some constant $\lambda \ne 0$ that we may choose freely. Then $Ax = b$ if and only if x is a
fixed point of F. Moreover,
\[ \|F(x) - F(y)\| = \|\lambda A(x-y) + x - y\| = \|(\lambda A + I)(x-y)\| \le \|\lambda A + I\|_{op} \|x - y\|. \]
Suppose that $\lambda$ happens to be such that $\|\lambda A + I\|_{op} < 1$. Then $F : \mathbb{R}^n \to \mathbb{R}^n$ is a
contraction, so we can compute the solution to the equation by the iteration $x_{n+1} = F(x_n)$.
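The iteration can be sketched in a few lines. This sketch assumes that the map has the form $F(x) = x + \lambda(Ax - b)$, which is consistent with the identity for $F(x) - F(y)$ above; the matrix, the right-hand side, the value of $\lambda$ and the function name are hypothetical choices for illustration. For the chosen data, $\lambda A + I$ has absolute row sums 0.25 and 0.5, so it is a contraction with respect to the sup norm.

```python
def fixed_point_solve(A, b, lam, steps=200):
    """Iterate x_{k+1} = F(x_k) with F(x) = x + lam * (A x - b), starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    for _ in range(steps):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        x = [x[i] + lam * residual[i] for i in range(n)]
    return x


A = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical example system
b = [1.0, 2.0]
x = fixed_point_solve(A, b, lam=-0.25)
print(x)
```

In numerical linear algebra this scheme is usually called Richardson iteration. How fast it converges depends on how far $\|\lambda A + I\|_{op}$ is below 1, so the choice of $\lambda$ matters in practice.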
2. Equivalence of norms
Definition 2.15. Two norms $\|\cdot\|_a$ and $\|\cdot\|_b$ on a vector space X are called equivalent
if there exist constants $c, C > 0$ such that
\[ c\|x\|_a \le \|x\|_b \le C\|x\|_a \]
for all $x \in X$.
Exercise 2.16. Prove that equivalent norms generate the same topologies: if $\|\cdot\|_a$
and $\|\cdot\|_b$ are equivalent then a set $U \subset X$ is open with respect to $\|\cdot\|_a$ if and only if
it is open with respect to $\|\cdot\|_b$.
Exercise 2.17. Show that equivalence of norms forms an equivalence relation on
the space of norms. That is, if we write n1 ∼ n2 to denote that two norms n1 , n2 are
equivalent, then prove that n1 ∼ n1 (reflexivity), n1 ∼ n2 ⇒ n2 ∼ n1 (symmetry) and
n1 ∼ n2 , n2 ∼ n3 ⇒ n1 ∼ n3 (transitivity).
Theorem 2.18. Let X be a finite-dimensional K-vector space. Then all norms on
X are equivalent.
for all $y \in S$. For $x \in X$, $x \ne 0$ we have $\frac{x}{\|x\|_*} \in S$ and thus by homogeneity of norms,
using (2.2) with $y = \frac{x}{\|x\|_*}$ gives
\[ \|x\| \ge c\|x\|_*. \]
Thus we proved that $\|\cdot\|$ and $\|\cdot\|_*$ are equivalent norms.
In contrast, two given norms on an infinite-dimensional vector space are generally
not equivalent. For example, the supremum norm and the L2 -norm on C([0, 1]) are not
equivalent (as a consequence of Exercise 4.64).
Corollary 2.19. If X is finite-dimensional then every linear map T : X → Y is
bounded.
Proof. Let $\{x_1, \dots, x_n\} \subset X$ be a basis. Then for $x = \sum_{i=1}^n c_i x_i$ with $c_i \in K$,
\[ \|Tx\|_Y \le \sum_{i=1}^n |c_i| \|Tx_i\|_Y \le C \max_{i=1,\dots,n} |c_i|, \]
where $C = \sum_{i=1}^n \|Tx_i\|_Y$. By equivalence of norms we may assume that $\max_i |c_i|$ is the
norm on X.
This is not true if X is infinite-dimensional.
Example 2.20. Let X be the set of sequences of complex numbers $(a_n)_{n\in\mathbb{N}}$ such
that $\sup_{n\in\mathbb{N}} n|a_n| < \infty$ and let Y be the space of bounded complex sequences. Then
$X \subset Y$. Equip both spaces with the norm $\|a\| = \sup_{n\in\mathbb{N}} |a_n|$. The map $T : X \to Y$,
$(Ta)_n = n a_n$, is not bounded: let $e^{(k)}_n = 1$ if $k = n$ and $e^{(k)}_n = 0$ if $k \ne n$. Then
$e^{(k)} \in X$ and $Te^{(k)} = k e^{(k)}$ and $\|e^{(k)}\| = 1$. So
\[ \|Te^{(k)}\| = k \]
for every $k \in \mathbb{N}$ and therefore $\sup_{\|x\|=1} \|Tx\| = \infty$.
Exercise 2.21. Let X be the set of continuously differentiable functions on [0, 1]
and let $Y = C([0,1])$. We consider X and Y as normed vector spaces with the norm
$\|f\| = \sup_{x\in[0,1]} |f(x)|$. Define a linear map $T : X \to Y$ by $Tf = f'$. Show that T is
not bounded.
3. Dual spaces*
Theorem 2.22. Let X be a normed vector space and Y a Banach space. Then
L(X, Y ) is a Banach space (with the operator norm).
Proof. Let $(T_n)_{n\in\mathbb{N}} \subset L(X,Y)$ be a Cauchy sequence. Then for every $x \in X$,
$(T_n x)_{n\in\mathbb{N}} \subset Y$ is Cauchy and by completeness of Y it therefore converges to some limit
which we call $Tx$. This defines a linear operator $T : X \to Y$. We claim that T is
bounded. Since $(T_n)_{n\in\mathbb{N}}$ is a Cauchy sequence, it is a bounded sequence. Thus there
exists $M > 0$ such that $\|T_n\|_{op} \le M$ for all $n \in \mathbb{N}$. We have for $x \in X$,
\[ \|Tx\|_Y \le \|Tx - T_n x\|_Y + \|T_n x\|_Y \le \|Tx - T_n x\|_Y + M\|x\|_X. \]
Letting $n \to \infty$ we get $\|Tx\|_Y \le M\|x\|_X$. So T is bounded with $\|T\|_{op} \le M$. It
remains to show that $T_n \to T$ in $L(X,Y)$. That is, for all $\varepsilon > 0$ we need to find $N \in \mathbb{N}$
such that
\[ \|T_n x - Tx\|_Y \le \varepsilon \|x\|_X \]
for all $n \ge N$ and $x \in X$. Since $(T_n)_{n\in\mathbb{N}}$ is a Cauchy sequence, there exists $N \in \mathbb{N}$ such
that
\[ \|T_n x - T_m x\|_Y \le \tfrac{\varepsilon}{2} \|x\|_X \]
for all $n, m \ge N$ and $x \in X$. Fix $x \in X$. Then there exists $m_x \ge N$ such that
\[ \|T_{m_x} x - Tx\|_Y \le \tfrac{\varepsilon}{2} \|x\|_X. \]
Then if $n \ge N$ and $x \in X$,
\[ \|T_n x - Tx\|_Y \le \|T_n x - T_{m_x} x\|_Y + \|T_{m_x} x - Tx\|_Y \le \varepsilon \|x\|_X. \]
Definition 2.23. Let X be a normed vector space. Elements of $L(X, K)$ are called
bounded linear functionals. $L(X, K)$ is called the dual space of X and denoted $X'$.
Corollary 2.24. Dual spaces of normed vector spaces are Banach spaces.
Proof. This follows from Theorem 2.22 because K (which is R or C) is complete.
Theorem 2.25. If X is finite-dimensional, then $X'$ is isomorphic to X.
Proof. Let $\{x_1, \dots, x_n\} \subset X$ be a basis. Then we can define a corresponding dual
basis of $X'$ as follows: let $f_i \in X'$, $i \in \{1, \dots, n\}$ be the linear map given by $f_i(x_i) = 1$
and $f_i(x_j) = 0$ for $j \ne i$. Then we claim that $\{f_1, \dots, f_n\}$ is a basis of $X'$. Indeed, let
$f \in X'$. For $x \in X$ we can write $x = \sum_{i=1}^n c_i x_i$ with uniquely determined $c_i \in K$. Then
by linearity,
\[ f(x) = \sum_{i=1}^n c_i f(x_i) = \sum_{i=1}^n f(x_i) f_i(x), \]
because $f_i(x) = c_i$. Thus, the linear span of $\{f_1, \dots, f_n\}$ is $X'$. On the other hand,
suppose
\[ \sum_{i=1}^n b_i f_i = 0 \]
for some coefficients $(b_i)_{i=1,\dots,n} \subset K$. Then for every $j \in \{1, \dots, n\}$, $b_j = \sum_{i=1}^n b_i f_i(x_j) = 0$.
Thus, $\{f_1, \dots, f_n\}$ is linearly independent. Thus, $X'$ and X are isomorphic since
they have the same dimension. We can define an isomorphism $\varphi : X \to X'$ by $x_i \mapsto f_i$
for $i = 1, \dots, n$.
4. Sequential $\ell^p$ spaces*
Definition 2.26. Let $1 \le p < \infty$. Then we define $\ell^p$ as the set of all sequences
$(x_n)_{n=1,2,\dots} \subset \mathbb{C}$ such that $\sum_{n=1}^\infty |x_n|^p < \infty$. The $\ell^p$-norm is defined as
\[ \|x\|_p = \Big( \sum_{n=1}^\infty |x_n|^p \Big)^{1/p}. \]
We need an auxiliary lemma which generalizes the usual inequality for two non-negative
numbers a, b
\[ \sqrt{ab} \le \frac{a+b}{2} \]
comparing the geometric mean of a, b (i.e. the side length of the square whose area
equals the area of the rectangle with sides a and b) with the arithmetic mean (the
number half way between a and b).
Lemma 2.28. Let $a, b \ge 0$.
(i) Let $0 < \vartheta < 1$. Then
\[ a^{1-\vartheta} b^{\vartheta} \le (1-\vartheta) a + \vartheta b. \]
(ii) (Young's inequality) Let $p \in (1, \infty)$. Then
\[ ab \le \frac{a^p}{p} + \frac{b^{p'}}{p'}. \]
Proof. Clearly the inequality holds if one of a, b is 0. Also check that if the
inequality is true for some a, b then it is also true for ta, tb where $t > 0$.
Assume now $0 < b \le a$ and let $s = b/a$. Then the stated inequality is equivalent
with $s^\vartheta \le (1 - \vartheta) + \vartheta s$ for $0 \le s \le 1$. Set $f(s) = 1 - \vartheta + \vartheta s - s^\vartheta$. Then $f(1) = 0$ and
$f'(s) = \vartheta(1 - s^{\vartheta - 1}) < 0$ for $0 < s < 1$, thus $f(s) \ge 0$ for $0 \le s \le 1$ which implies the desired inequality.
The case $0 < a \le b$ is shown in the same way (in fact follows from the previous case by
interchanging a, b and replacing $\vartheta$ by $1 - \vartheta$). This proves part (i).
For part (ii) set $x = a^p$, $y = b^{p'}$, $\vartheta = 1 - 1/p$ and observe that the inequality is then
equivalent with $x^{1-\vartheta} y^{\vartheta} \le (1 - \vartheta)x + \vartheta y$ which holds by part (i).
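Young's inequality is easy to test numerically. The sketch below is only a sanity check on randomly sampled inputs, not a proof; the sampling ranges, the seed and the helper names are arbitrary choices made here. It also evaluates two cases with $a^p = b^{p'}$, where the two sides agree.

```python
import random


def conjugate_exponent(p):
    """The exponent p' defined by 1/p + 1/p' = 1, for p > 1."""
    return p / (p - 1)


def young_gap(a, b, p):
    """Right-hand side minus left-hand side of Young's inequality."""
    q = conjugate_exponent(p)
    return a**p / p + b**q / q - a * b


random.seed(0)
worst = min(
    young_gap(random.uniform(0, 5), random.uniform(0, 5), random.uniform(1.1, 6))
    for _ in range(10000)
)
print(worst)                       # smallest gap over the sample
print(young_gap(2.0, 2.0, 2.0))    # case a^p = b^{p'} with p = 2
print(young_gap(2.0, 4.0, 3.0))    # case a^p = b^{p'} = 8 with p = 3
```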
Proof of Hölder's inequality. Observe that the inequality is true if either x
or y are 0. Check that if the inequality is true for some choice of x and y then it is also
true for sx, ty with $s > 0$, $t > 0$. Finally, if $p \in \{1, \infty\}$, the inequality is trivial. So we
assume $p \in (1, \infty)$.
By Young's inequality,
\[ \sum_{n=1}^\infty |x_n y_n| \le \frac{1}{p} \sum_{n=1}^\infty |x_n|^p + \frac{1}{p'} \sum_{n=1}^\infty |y_n|^{p'}. \]
Observe that this yields the asserted inequality when $\|x\|_p = 1$ and $\|y\|_{p'} = 1$,
since then the right-hand side equals $\frac1p + \frac1{p'} = 1$.
Also we have $\big\| \frac{x}{\|x\|_p} \big\|_p = 1$, $\big\| \frac{y}{\|y\|_{p'}} \big\|_{p'} = 1$, and since the assertion holds for $x/\|x\|_p$ and
$y/\|y\|_{p'}$ it holds also for x and y.
Theorem 2.29 (Minkowski's inequality). Let $p \in [1, \infty]$. For $x, y \in \ell^p$,
\[ \|x + y\|_p \le \|x\|_p + \|y\|_p. \]
Proof. If $p \in \{1, \infty\}$ the inequality is trivial. Thus we assume $p \in (1, \infty)$. If
$\|x + y\|_p = 0$, the inequality is also trivial, so we can assume $\|x + y\|_p > 0$. Now we
write
\[ \|x + y\|_p^p \le \sum_{n=1}^\infty |x_n| |x_n + y_n|^{p-1} + \sum_{n=1}^\infty |y_n| |x_n + y_n|^{p-1}. \]
Using Hölder's inequality on both sums we obtain that this is
\[ \le \|x\|_p \|x + y\|_{p'(p-1)}^{p-1} + \|y\|_p \|x + y\|_{p'(p-1)}^{p-1}. \]
We have $p'(p-1) = \frac{p}{p-1}(p-1) = p$, so we have proved that
\[ \|x + y\|_p^p \le (\|x\|_p + \|y\|_p) \|x + y\|_p^{p-1}. \]
Dividing by $\|x + y\|_p^{p-1}$ gives the claim.
We conclude that $\|\cdot\|_p$ is a norm and $\ell^p$ a normed vector space.
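As a quick plausibility check, both Hölder's and Minkowski's inequality can be evaluated on finitely supported sequences. The vectors, the exponent and the helper name `p_norm` below are hypothetical choices for illustration; finite lists stand in for sequences that vanish after finitely many terms.

```python
def p_norm(x, p):
    """The l^p norm of a finitely supported sequence given as a list."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


x = [3.0, -1.0, 2.0, 0.5]
y = [-2.0, 4.0, 1.0, 1.5]
p = 3.0
q = p / (p - 1)   # conjugate exponent p'

# Hoelder: sum |x_n y_n| <= ||x||_p ||y||_{p'}
holder_lhs = sum(abs(s * t) for s, t in zip(x, y))
holder_rhs = p_norm(x, p) * p_norm(y, q)

# Minkowski: ||x + y||_p <= ||x||_p + ||y||_p
minkowski_lhs = p_norm([s + t for s, t in zip(x, y)], p)
minkowski_rhs = p_norm(x, p) + p_norm(y, p)

print(holder_lhs, holder_rhs)
print(minkowski_lhs, minkowski_rhs)
```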
Theorem 2.30. Let $p \in (1, \infty)$. The dual space $(\ell^p)'$ is isometrically isomorphic to $\ell^{p'}$.
Proof. By $e_k$ we denote the sequence which is 1 at position k and 0 everywhere
else.
Then we define a map $\varphi : (\ell^p)' \to \ell^{p'}$ by $\varphi(v) = (v(e_k))_k$. Clearly, this is a linear
map. First we need to show that $\varphi(v) \in \ell^{p'}$. Let $v \in (\ell^p)'$. For each n we define
$x^{(n)} \in \ell^p$ by
\[ x^{(n)}_k = \begin{cases} \dfrac{|v(e_k)|^{p'}}{v(e_k)} & \text{if } k \le n,\ v(e_k) \ne 0, \\ 0 & \text{otherwise.} \end{cases} \]
We have on the one hand
\[ v(x^{(n)}) = \sum_{k=1}^n |v(e_k)|^{p'}. \]
And on the other hand
\[ |v(x^{(n)})| \le \|v\|_{op} \|x^{(n)}\|_p = \|v\|_{op} \Big( \sum_{k=1}^n |v(e_k)|^{p'} \Big)^{1/p}. \]
Here we have used that $p(p'-1) = p\big(\frac{p}{p-1} - 1\big) = \frac{p}{p-1} = p'$. Combining these two we get
\[ \Big( \sum_{k=1}^n |v(e_k)|^{p'} \Big)^{1/p'} \le \|v\|_{op}. \]
5. Derivatives
Recall that a function f on an interval (a, b) is called differentiable at $x \in (a, b)$ if
$\lim_{h\to0} \frac{f(x+h) - f(x)}{h}$ exists. In other words, if there exists a number $T \in \mathbb{R}$ such that
\[ \lim_{h\to0} \frac{|f(x+h) - f(x) - Th|}{|h|} = 0. \]
In that case we denote that real number T by $f'(x)$. A real number can be understood
as a linear map $\mathbb{R} \to \mathbb{R}$:
\[ \mathbb{R} \longrightarrow L(\mathbb{R}, \mathbb{R}), \quad T \longmapsto (x \mapsto T \cdot x). \]
That is, the linear map associated with a real number T is given by multiplication with
T. Interpreting the derivative at a given point as a linear map, we can formulate the
definition in the general setting of normed vector spaces.
Definition 2.33. Let X, Y be normed vector spaces and $U \subset X$ open. A map
$F : U \to Y$ is called Fréchet differentiable (we also say differentiable) at $x \in U$ if there
exists $T \in L(X, Y)$ such that
\[ \lim_{h\to0} \frac{\|F(x+h) - F(x) - Th\|_Y}{\|h\|_X} = 0. \tag{2.3} \]
In that case we call T the (Fréchet) derivative of F at x and write $T = DF(x)$ or
$T = DF|_x$. F is called (Fréchet) differentiable if it is differentiable at every point
$x \in U$. When $X = \mathbb{R}^n$ we also use the following terminology: F is totally differentiable
and $DF(x)$ is the total derivative of F at x.
Before we move on we need to verify that $DF(x)$ is well-defined. That is, that T is
uniquely determined by F and x. Suppose $T, \widetilde{T} \in L(X, Y)$ both satisfy (2.3). Then
\[ \|Th - \widetilde{T}h\|_Y \le \|F(x+h) - F(x) - Th\|_Y + \|F(x+h) - F(x) - \widetilde{T}h\|_Y. \]
Thus, by (2.3),
\[ \frac{\|Th - \widetilde{T}h\|_Y}{\|h\|_X} \longrightarrow 0 \quad \text{as } h \to 0. \]
In other words, for all $\varepsilon > 0$ there exists $\delta > 0$ such that
\[ \|Th - \widetilde{T}h\|_Y \le \varepsilon \|h\|_X \tag{2.4} \]
if $\|h\|_X \le \delta$. By homogeneity of norms we argue that the inequality (2.4) must hold
for all $h \in X$: let $h \in X$, $h \ne 0$ be arbitrary. Then let $h_0 = \delta \frac{h}{\|h\|_X}$. By homogeneity of
norms we have $\|h_0\|_X = \delta$. Thus,
\[ \|Th_0 - \widetilde{T}h_0\|_Y \le \varepsilon \|h_0\|_X = \varepsilon\delta. \]
Multiplying both sides by $\delta^{-1}\|h\|_X$ and using homogeneity of norms and linearity of T and $\widetilde{T}$,
we obtain
\[ \|Th - \widetilde{T}h\|_Y \le \varepsilon \|h\|_X \]
for all $h \in X$ (it is trivial for $h = 0$). Since $\varepsilon > 0$ was arbitrary (and is independent of
h), this implies $\|Th - \widetilde{T}h\|_Y = 0$, so $Th = \widetilde{T}h$ for all h. Thus $T = \widetilde{T}$.
Comments.
• O and o are not functions and (2.5), (2.7) are not equations!
• This is an abuse of the equality sign: it would be more accurate to define
O(g) as the class of functions that satisfy (2.6), and to write f ∈ O(g).
• One can think of (say) O(g) as a placeholder for a function which may change
at every occurrence of the symbol O(g) but always satisfies the respective
condition that it is dominated by a constant times kg(h)k if khk is small.
• For brevity, we may sometimes not write out the phrase "as h → 0".
• There is nothing special about letting h tend to 0 in this definition. We can also
define o(g), O(g) with respect to another limit, for instance, say, as $\|h\| \to \infty$.
• If $f(h) = o(g(h))$, then $f(h) = O(g(h))$, but generally not vice versa.
• If $f(h) = O(\|h\|^k)$, then $f(h) = o(\|h\|^{k-\varepsilon})$ for every $\varepsilon > 0$.
This implies
\[ \frac{1}{\|h\|_\infty} \|F(f+h) - F(f) - Th\|_\infty \le \|h\|_\infty \to 0 \]
as $h \to 0$. Thus F is Fréchet differentiable at f and $DF|_f(h) = 2\int_0^x f(t)h(t)\,dt$.
We go on to discuss some of the familiar properties of derivatives. It follows directly
from the definition that DF |x is linear in F . That is, if F : U → Y, G : U → Y are
differentiable at x ∈ U and λ ∈ R, then the function F + λG : U → Y defined by
(F + λG)(x) = F (x) + λG(x) is differentiable at x and D(F + λG)|x = DF |x + λDG|x .
Theorem 2.36 (Chain rule). Let $X_1, X_2, X_3$ be normed vector spaces and $U_1 \subset X_1$,
$U_2 \subset X_2$ open. Let $x \in U_1$ and $g : U_1 \to X_2$, $f : U_2 \to X_3$ such that g is Fréchet
differentiable at x, $g(U_1) \subset U_2$ and f is Fréchet differentiable at $g(x)$. Then the function
$f \circ g : U_1 \to X_3$ defined by $(f \circ g)(x) = f(g(x))$ is Fréchet differentiable at x and
\[ D(f \circ g)|_x h = Df|_{g(x)} Dg|_x h \]
for all $h \in X_1$.
Proof. Let $x, x + h \in U_1$. We write
\[ \begin{aligned} & f(g(x+h)) - f(g(x)) - Df|_{g(x)} Dg|_x h \\ &\quad = f(g(x) + k) - f(g(x)) - Df|_{g(x)} k + Df|_{g(x)}\big(g(x+h) - g(x) - Dg|_x h\big), \end{aligned} \]
where $k = g(x+h) - g(x)$. Using the triangle inequality and that $Df|_{g(x)}$ is a bounded
linear map we obtain
\[ \begin{aligned} & \|f(g(x+h)) - f(g(x)) - Df|_{g(x)} Dg|_x h\|_{X_3} \\ &\quad \le \|f(g(x) + k) - f(g(x)) - Df|_{g(x)} k\|_{X_3} + \|Df|_{g(x)}\|_{op} \|g(x+h) - g(x) - Dg|_x h\|_{X_2}. \end{aligned} \tag{2.8} \]
We have
\[ \|k\|_{X_2} = \|g(x+h) - g(x)\|_{X_2} \le \|Dg|_x\|_{op} \|h\|_{X_1} + o(\|h\|_{X_1}). \tag{2.9} \]
Dividing by $\|h\|_{X_1}$ on both sides, (2.8) implies
\[ \begin{aligned} & \frac{1}{\|h\|_{X_1}} \|f(g(x+h)) - f(g(x)) - Df|_{g(x)} Dg|_x h\|_{X_3} \\ &\quad \le \frac{\|k\|_{X_2}}{\|h\|_{X_1}} \cdot \frac{\|f(g(x) + k) - f(g(x)) - Df|_{g(x)} k\|_{X_3}}{\|k\|_{X_2}} + o(1), \quad \text{as } h \to 0. \end{aligned} \]
By (2.9),
\[ \frac{\|k\|_{X_2}}{\|h\|_{X_1}} \le \|Dg|_x\|_{op} + 1 \]
if $\|h\|_{X_1}$ is small enough. In particular, $k \to 0$ as $h \to 0$. Since f is differentiable at
$g(x)$ we have that
\[ \frac{\|f(g(x) + k) - f(g(x)) - Df|_{g(x)} k\|_{X_3}}{\|k\|_{X_2}} \]
converges to 0 as $h \to 0$ (since then $k \to 0$).
Theorem 2.37 (Product rule). Let X be a normed vector space, U ⊂ X open and
assume that F, G : U → R are differentiable at x ∈ U . Then the function F ·G : U → R,
(F · G)(x) = F (x)G(x) is also differentiable at x and
D(F · G) |x = F (x) · DG |x + G(x) · DF |x .
6. Further exercises
Exercise 2.43. For $x, y \in \mathbb{R}^n$ define
\[ \rho_p(x, y) = \sum_{i=1}^n |x_i - y_i|^p. \]
Show that T is well-defined and bounded and determine the value of kT kop .
Exercise 2.49. Let X, Y be normed vector spaces and F : X → Y a map.
(i) Show that F is continuous if it is Fréchet differentiable.
(ii) Prove that F is Fréchet differentiable if it is linear and bounded.
Exercise 2.50. Let V , W be normed vector spaces and let T : V → W be a
bounded linear transformation. Show that T is differentiable everywhere and compute
the derivative $DT|_v$ for all $v \in V$.
Exercise 2.51. Let $X = C([0,1])$ be the Banach space of continuous functions on
[0, 1] (with the supremum norm) and define a map $F : X \to X$ by
\[ F(f)(s) = \int_0^s \cos(f(t)^2)\,dt, \quad s \in [0,1]. \]
(i) Show that F is Fréchet differentiable and compute the Fréchet derivative $DF|_f$ for
each $f \in X$.
(ii) Show that $F(X) = \{F(f) : f \in X\} \subset X$ is relatively compact.
Exercise 2.52. Let $\mathbb{R}^{n\times n}$ denote the space of real $n \times n$ matrices equipped with
the matrix norm $\|A\| = \sup_{\|x\|=1} \|Ax\|$. Define
\[ F : \mathbb{R}^{n\times n} \longrightarrow \mathbb{R}^{n\times n}, \quad F(A) = A^2. \]
Show that F is totally differentiable and compute $DF|_A$.
Exercise 2.53. (i) Is there a constant C such that for all continuous functions f
on [0, 2] the inequality
\[ \int_0^2 |f(t)|\,dt \le C \max_{0 \le x \le 2} |f(x)| \]
holds? Is there a constant C such that for all continuous functions f on [0, 2] the reverse
inequality
\[ \max_{0 \le x \le 2} |f(x)| \le C \int_0^2 |f(t)|\,dt \]
holds? The expressions on both sides of the above inequalities define norms on
C([0, 2]). Are these equivalent norms?
(ii) True or false: There is a constant $C_n$ such that for all polynomials P of degree
$\le n$ we have
\[ \max_{0 \le x \le 2} |P(x)| \le C_n \int_0^2 |P(t)|\,dt. \]
Differential calculus in Rn
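\[ \nabla f(x) = Df|_x^T = \begin{pmatrix} \partial_1 f(x) \\ \vdots \\ \partial_n f(x) \end{pmatrix} \in \mathbb{R}^n. \]
(Note that $n \times 1$ matrices are identified with vectors in $\mathbb{R}^n$: $\mathbb{R}^{n\times1} = \mathbb{R}^n$.)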
Example 3.2. Let $F : \mathbb{R}^3 \to \mathbb{R}^2$ be defined by $F(x) = (x_1 x_2 \sin(x_3),\ x_2^2 - e^{x_1})$. Then
F is totally differentiable and the Jacobian is given by
\[ DF|_x = \begin{pmatrix} x_2 \sin(x_3) & x_1 \sin(x_3) & x_1 x_2 \cos(x_3) \\ -e^{x_1} & 2x_2 & 0 \end{pmatrix}. \]
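A Jacobian computed by hand can be cross-checked against difference quotients. The sketch below compares the matrix from Example 3.2 with central finite differences at one sample point; the point, the step size and the function names are arbitrary choices made for this illustration.

```python
import math


def F(x):
    """The map from Example 3.2."""
    x1, x2, x3 = x
    return [x1 * x2 * math.sin(x3), x2**2 - math.exp(x1)]


def jacobian_exact(x):
    """The Jacobian matrix stated in Example 3.2."""
    x1, x2, x3 = x
    return [
        [x2 * math.sin(x3), x1 * math.sin(x3), x1 * x2 * math.cos(x3)],
        [-math.exp(x1), 2 * x2, 0.0],
    ]


def jacobian_numeric(func, x, h=1e-6):
    """Approximate each partial derivative by a central difference quotient."""
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[j] += h
        backward[j] -= h
        f_plus, f_minus = func(forward), func(backward)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


point = [0.5, -1.2, 0.7]
max_error = max(
    abs(exact - approx)
    for row_exact, row_approx in zip(jacobian_exact(point), jacobian_numeric(F, point))
    for exact, approx in zip(row_exact, row_approx)
)
print(max_error)
```

Agreement at one point does not prove the formula, but a sign error or a swapped column would show up as a large value of `max_error`.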
Recall that a set A ⊂ Rn is called convex if tx + (1 − t)y ∈ A for every x, y ∈ A,
t ∈ [0, 1].
Theorem 3.3 (Mean value theorem). Let U ⊂ Rn be open and convex. Suppose
that f : U → R is totally differentiable on U . Then, for every x, y ∈ U , there exists
ξ ∈ U such that
f (x) − f (y) = Df |ξ (x − y)
and there exists t ∈ [0, 1] such that ξ = tx + (1 − t)y.
The idea of the proof is to apply the one-dimensional mean value theorem to the
function restricted to the line passing through x and y.
Exercise 3.5. Show that the conclusion of the corollary also holds under the weaker
assumption that U is open and connected (rather than convex). Hint: Consider
overlapping open balls along a continuous path connecting two given points in U .
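By the one-dimensional mean value theorem there exists $t_j \in [0,1]$ such that
\[ f(x + v_j) - f(x + v_{j-1}) = f(x + v_{j-1} + h_j e_j) - f(x + v_{j-1}) = \partial_j f(x + v_{j-1} + t_j h_j e_j) h_j. \]
By continuity of $\partial_j f$, for every $\varepsilon > 0$ there exists $\delta > 0$ such that
\[ |\partial_j f(y) - \partial_j f(x)| \le \varepsilon/n \quad \text{for all } j = 1, \dots, n, \]
whenever $y \in U$ is such that $\|x - y\| \le \delta$. We may choose $\delta$ small enough so that
$x + h \in U$ whenever $\|h\| \le \delta$. Then, if $\|h\| \le \delta$ (then also $\|v_j\| \le \delta$, $\|v_{j-1} + t_j h_j e_j\| \le \delta$)
we get
\[ \begin{aligned} \Big| f(x+h) - f(x) - \sum_{j=1}^n h_j \partial_j f(x) \Big| &\le \sum_{j=1}^n \big| f(x + v_j) - f(x + v_{j-1}) - h_j \partial_j f(x) \big| \\ &= \sum_{j=1}^n |h_j| \, |\partial_j f(x + v_{j-1} + t_j h_j e_j) - \partial_j f(x)| \le \frac{\varepsilon}{n} \sum_{j=1}^n |h_j| \le \varepsilon \|h\|. \end{aligned} \]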
Proof. Let $g(t) = f(tx + (1-t)y)$. By the fundamental theorem of calculus and
the chain rule,
\[ f(x) - f(y) = g(1) - g(0) = \int_0^1 g'(t)\,dt = \int_0^1 Df|_{tx + (1-t)y}(x - y)\,dt. \]
Theorem 3.10 (Mean value theorem, vector-valued case). Let $U \subset \mathbb{R}^n$ be open and
convex and $F \in C^1(U, \mathbb{R}^m)$. Then for every $x, y \in U$ there exists $\theta \in [0,1]$ such that
\[ \|F(x) - F(y)\| \le \|DF|_\xi\|_{op} \|x - y\|, \]
where $\xi = \theta x + (1 - \theta) y$.
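Let $\lambda = \|Df|_a^{-1}\|_{op}$. By continuity of Df at a, there exists an open ball $U \subset E$ such
that
\[ \|Df|_a - Df|_x\|_{op} \le \frac{1}{2\lambda} \quad \text{for } x \in U. \]
Then for $x, x' \in U$
\[ \begin{aligned} \|\varphi_y(x) - \varphi_y(x')\| &\le \|D\varphi_y|_\xi\|_{op} \|x - x'\| \\ &\le \|Df|_a^{-1}\|_{op} \|Df|_a - Df|_\xi\|_{op} \|x - x'\| \le \tfrac12 \|x - x'\|. \end{aligned} \]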
Note that this doesn’t show that ϕy is a contraction, because ϕy (U ) may not be con-
tained in U . However, it does show that ϕy has at most one fixed point (by the same
argument used to show uniqueness in the Banach fixed point theorem). This already
implies that f is injective on U : for every y ∈ Rn we have f (x) = y for at most one
x ∈ U . Let V = f (U ). Then f |U : U → V is a bijection and has an inverse g : V → U .
Claim. V is open.
Proof of claim. Let $y_0 \in V$. We need to show that there exists an open ball
around $y_0$ that is contained in V. Since $V = f(U)$ there exists $x_0 \in U$ such that
$f(x_0) = y_0$. Let $r > 0$ be small enough so that $B_r(x_0) \subset U$ (possible because U is
open). Let $\varepsilon > 0$ and $y \in B_\varepsilon(y_0)$. We will demonstrate that if $\varepsilon > 0$ is small enough,
then $\varphi_y$ maps $B_r(x_0)$ into itself. First note
\[ \|\varphi_y(x_0) - x_0\| = \|Df|_a^{-1}(y - y_0)\| \le \lambda\varepsilon. \]
Hence, choosing $\varepsilon \le \frac{r}{2\lambda}$, we get for $x \in B_r(x_0)$ that
\[ \begin{aligned} \|\varphi_y(x) - x_0\| &\le \|\varphi_y(x) - \varphi_y(x_0)\| + \|\varphi_y(x_0) - x_0\| \\ &\le \tfrac12 \|x - x_0\| + \tfrac{r}{2} \le \tfrac{r}{2} + \tfrac{r}{2} = r. \end{aligned} \]
Thus $\varphi_y(x) \in B_r(x_0)$. This proves $\varphi_y(B_r(x_0)) \subset B_r(x_0)$, so $\varphi_y$ is a contraction of $B_r(x_0)$.
By the Banach fixed point theorem, $\varphi_y$ must have a unique fixed point $x \in B_r(x_0)$. So
by definition of $\varphi_y$ we have $f(x) = y$, so $y \in f(U) = V$. Therefore we have shown that
$B_\varepsilon(y_0) \subset V$, so V is open.
It remains to show that g is differentiable on V and $Dg|_{f(a)} = Df|_a^{-1}$. We use the
following lemma.
Lemma 3.15. Let $A, B \in \mathbb{R}^{n\times n}$ such that A is invertible and
\[ \|B - A\| \cdot \|A^{-1}\| < 1. \tag{3.2} \]
Then B is invertible. (Here $\|\cdot\|$ denotes the matrix norm, which is just the operator
norm: $\|A\| = \sup_{\|x\|=1} \|Ax\|$.)
In other words, if a matrix A is invertible and B is a "small" perturbation of A
("small" in the sense that (3.2) holds), then B is also invertible.
Proof. It suffices to show that B is injective. Let $x \ne 0$. Then we need to show
$Bx \ne 0$. Indeed,
\[ \begin{aligned} \|x\| = \|A^{-1}Ax\| &\le \|A^{-1}\| \cdot \|Ax\| \le \|A^{-1}\| \big( \|(A - B)x\| + \|Bx\| \big) \\ &\le \|A^{-1}\| \cdot \|B - A\| \cdot \|x\| + \|A^{-1}\| \|Bx\|, \end{aligned} \]
which implies $\|A^{-1}\| \|Bx\| \ge (1 - \|A^{-1}\| \cdot \|B - A\|) \|x\| > 0$, so $Bx \ne 0$.
and
\[ h - Df|_a^{-1} k = h + Df|_a^{-1}(f(x) - f(x+h)) = \varphi_y(x+h) - \varphi_y(x), \]
so $\|h - Df|_a^{-1} k\| \le \tfrac12 \|h\|$. Therefore, $\|h\| \le 2\lambda \|k\| \to 0$ as $\|k\| \to 0$. Now we compute
\[ g(y+k) - g(y) - Df|_x^{-1} k = h - Df|_x^{-1}(f(x+h) - f(x)) = -Df|_x^{-1}\big(f(x+h) - f(x) - Df|_x h\big) \]
and so
\[ \begin{aligned} \frac{1}{\|k\|} \|g(y+k) - g(y) - Df|_x^{-1} k\| &\le \|Df|_x^{-1}\| \, \frac{\|f(x+h) - f(x) - Df|_x h\|}{\|h\|} \, \frac{\|h\|}{\|k\|} \\ &\le \|Df|_x^{-1}\| \, 2\lambda \, \frac{\|f(x+h) - f(x) - Df|_x h\|}{\|h\|} \longrightarrow 0 \quad \text{as } k \to 0. \end{aligned} \]
Therefore g is differentiable at y with $Dg|_y = Df|_x^{-1}$. This finishes the proof of part (i)
of Theorem 3.14.
To prove part (ii) we assume f is of class $C^1$, and it remains to show that Dg is
continuous. To show this we need another lemma.
Lemma 3.16. Let GL(n) denote the space of real invertible $n \times n$ matrices (equipped
with some norm). The map $GL(n) \to GL(n)$ defined by $A \mapsto A^{-1}$ is continuous.
This lemma follows because the entries of $A^{-1}$ are rational functions with non-vanishing
denominator in terms of the entries of A (by Cramer's rule).
Since $Dg|_y = Df|_x^{-1}$ with $x = g(y)$ and compositions of continuous maps are continuous (Df is
continuous by assumption), we have that Dg must be continuous, so $g \in C^1(V, U)$.
Remark. If f is locally invertible at every point, it is not necessarily (globally) invertible
(that is, bijective).
Example 3.17. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be given by $f(x) = (e^{x_2} \sin(x_1),\ e^{x_2} \cos(x_1))$. Then
\[ Df|_x = \begin{pmatrix} e^{x_2} \cos(x_1) & e^{x_2} \sin(x_1) \\ -e^{x_2} \sin(x_1) & e^{x_2} \cos(x_1) \end{pmatrix}. \]
Thus $\det Df|_x = e^{2x_2}(\cos(x_1)^2 + \sin(x_1)^2) = e^{2x_2} \ne 0$, so by Theorem 3.14, f is locally
invertible at every point $x \in \mathbb{R}^2$. f is not bijective: it is not injective because, for
instance, $f(0, 0) = f(2\pi, 0)$.
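The fixed point argument from the proof of Theorem 3.14 also suggests a way to approximate a local inverse numerically. The sketch below assumes that the map from the proof has the form $\varphi_y(x) = x + Df|_a^{-1}(y - f(x))$, which is inferred here from the identities used in the proof, and applies it to the function of Example 3.17 at $a = (0, 0)$, where $Df|_a$ is the identity matrix. The target point y, the number of steps and the function names are hypothetical choices.

```python
import math


def f(x):
    """The map from Example 3.17."""
    x1, x2 = x
    return (math.exp(x2) * math.sin(x1), math.exp(x2) * math.cos(x1))


def local_inverse(y, steps=100):
    """Iterate phi_y(x) = x + (y - f(x)), starting at a = (0, 0).

    Df|_a is the identity matrix at a = (0, 0), so no matrix inverse is needed here.
    """
    x = (0.0, 0.0)
    for _ in range(steps):
        fx = f(x)
        x = (x[0] + y[0] - fx[0], x[1] + y[1] - fx[1])
    return x


y = (0.1, 1.05)          # a point close to f(a) = (0, 1)
x = local_inverse(y)
print(x, f(x))
```

The iteration only finds the preimage near a; the other preimages of y, which exist because f is not injective, are not reached from this starting point.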
It is natural to ask when Z is locally the graph of a function. The implicit function
theorem gives a satisfactory answer. More precisely, given a point (x0 , y0 ) ∈ Z we ask
whether there exists an open neighborhood of (x0 , y0 ) so that Z intersected with that
neighborhood is given as the graph of a C 1 function in the sense that there exists g so
that f (x, g(x)) = 0 for x close to x0 . Another way to think of this is that we would like
to solve the system of equations given by f (x, y) = 0 for y, when x is given (this seems
reasonable since there are m equations and m unknowns).
Theorem 3.18 (Implicit function theorem). Let $E \subset \mathbb{R}^n \times \mathbb{R}^m$ be open, $f \in C^1(E, \mathbb{R}^m)$
and $(x_0, y_0) \in Z$ such that the matrix $D_y f|_{(x_0,y_0)} \in \mathbb{R}^{m\times m}$ is invertible.
Then there exist open neighborhoods U, V of $x_0$, $y_0$, respectively and a $C^1$ function
$g : U \to V$ so that
\[ Z \cap (U \times V) = \{(x, g(x)) : x \in U\}. \]
In other words, $U \times V \subset E$ and $f(x, g(x)) = 0$ for all $x \in U$. Moreover,
\[ Dg|_{x_0} = -\big(D_y f|_{(x_0,y_0)}\big)^{-1} D_x f|_{(x_0,y_0)}. \tag{3.3} \]
Here, $D_x f|_{(x_0,y_0)} \in \mathbb{R}^{m\times n}$ denotes the Jacobian matrix of the function $x \mapsto f(x, y_0)$
at $x_0$, and $D_y f|_{(x_0,y_0)} \in \mathbb{R}^{m\times m}$ the Jacobian matrix of the function $y \mapsto f(x_0, y)$ at $y_0$.
It is instructive to observe that the relation (3.3) follows from an application of the
chain rule when taking derivatives on both sides of the identity
f (x, g(x)) = 0
with respect to x. This is also known as implicit differentiation.
The formula (3.3) is especially useful in cases when it is difficult or impossible to
determine the implicit function g algebraically.
Proof. The proof is an application of the inverse function theorem, Theorem 3.14.
Define a map $F : E \to \mathbb{R}^n \times \mathbb{R}^m$ by
\[ F(x, y) = (x, f(x, y)). \]
Then F is $C^1$ and $DF|_{(x_0,y_0)}$ is given by the $(n+m) \times (n+m)$ block matrix
\[ \begin{pmatrix} I_n & 0 \\ D_x f|_{(x_0,y_0)} & D_y f|_{(x_0,y_0)} \end{pmatrix}, \]
where $I_n$ denotes the $n \times n$ identity matrix. Thus $\det DF|_{(x_0,y_0)} = \det D_y f|_{(x_0,y_0)} \ne 0$,
so $DF|_{(x_0,y_0)}$ is invertible. By Theorem 3.14, F is therefore locally invertible at $(x_0, y_0)$.
As a consequence, there exist an open neighborhood $U'$ of $x_0$ and an open neighborhood
V of $y_0$ so that $U' \times V \subset E$, $F(U' \times V) \subset \mathbb{R}^n \times \mathbb{R}^m$ is open and $F|_{U' \times V}$ is invertible
with a $C^1$ inverse
\[ G : F(U' \times V) \to U' \times V. \]
Let $U = \{x \in U' : (x, 0) \in F(U' \times V)\} \subset \mathbb{R}^n$. Then U is open, because $F(U' \times V)$ is
open. Also, $x_0 \in U$. For $x \in U$ we can now define $g(x)$ by $G(x, 0) = (x, g(x))$. Then
$g(x) \in V$ and
\[ (x, f(x, g(x))) = F(x, g(x)) = F(G(x, 0)) = (x, 0), \]
so $f(x, g(x)) = 0$ for all $x \in U$. Moreover, g is $C^1$ and satisfies (3.3).
Example 3.19. Let n = m = 1 and f(x, y) = x² + y² − 1. Then Z is the unit circle
around the origin, which is locally a graph at every point with y0 ≠ 0. Coincidentally,
Dy f|(x,y) = 2y ≠ 0 if and only if y ≠ 0.
In this case an implicit function g can be determined explicitly: if say (x0, y0) = (0, 1),
then g : (−1, 1) → R with
g(x) = √(1 − x²)
is C¹ and satisfies f(x, g(x)) = 0.
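Formula (3.3) can be checked numerically on the circle of Example 3.19. The following is a minimal sketch, not part of the notes: the helper implicit_g (Newton's method in y), the base point (0.6, 0.8) and the step size h are assumptions made for illustration.

```python
import math

def f(x, y):
    return x**2 + y**2 - 1.0

def dfdx(x, y):
    return 2.0 * x

def dfdy(x, y):
    return 2.0 * y

def implicit_g(x, y_guess, tol=1e-14, max_iter=50):
    """Solve f(x, y) = 0 for y near y_guess by Newton's method in y."""
    y = y_guess
    for _ in range(max_iter):
        step = f(x, y) / dfdy(x, y)
        y -= step
        if abs(step) < tol:
            break
    return y

x0, y0 = 0.6, 0.8      # a point of Z with y0 != 0 (assumed for illustration)
h = 1e-6
# Central difference quotient of the numerically computed implicit function.
slope_fd = (implicit_g(x0 + h, y0) - implicit_g(x0 - h, y0)) / (2 * h)
# Formula (3.3) for n = m = 1: g'(x0) = -(D_y f)^{-1} D_x f.
slope_formula = -dfdx(x0, y0) / dfdy(x0, y0)
print(slope_fd, slope_formula)   # both approximately -0.75
```

The point of the sketch is that slope_formula needs only the partial derivatives of f at (x0, y0), while slope_fd needs the implicit function itself.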
Example 3.20. Let n = m = 1 and f(x, y) = x² − y³. Then Z is a cubic curve with
a cusp singularity at the origin. In this case, Z is (globally) the graph of the function
g : R → R with g(x) = |x|^{2/3}. However,
Dy f|(x,y) = −3y²
vanishes at y = 0, so the implicit function theorem does not apply at the cusp (x0, y0) = (0, 0) ∈ Z. This
is consistent with the fact that g is not C¹ at zero.
Example 3.21. Let n = m = 1 and f (x, y) = (y − x)(y + x). Then Z is locally the
graph of a function at every point except for the origin, where it has a self-intersection.
3. Ordinary differential equations
In this section we study initial value problems of the form
\[
\begin{cases} y'(t) = F(t, y(t)), \\ y(t_0) = y_0, \end{cases}
\]
where E ⊂ R × R is open, (t0, y0) ∈ E and F ∈ C(E) are given. We say that a
differentiable function y : I → R defined on some open interval I ⊂ R that includes the
point t0 is a solution to the initial value problem if (t, y(t)) ∈ E for all t ∈ I,
y(t0) = y0 and y′(t) = F(t, y(t)) for all t ∈ I. The equation y′(t) = F(t, y(t)) is a first
order ordinary differential equation. We also write this differential equation in short
form as
y′ = F(t, y).
Geometric interpretation. At each point (t, y) ∈ E imagine a small line segment
with slope F (t, y). We are looking for a function such that its graph has the slope
F (t, y) at each point (t, y) on the graph of the function.
Example 3.22. Consider the equation y′ = y/t. The solutions of this equation are
of the form y(t) = ct for c ∈ R.
Example 3.23. Sometimes we can solve initial value problems by computing an
explicit expression for y. Recall for instance that solving differential equations of the
form y 0 = f (t)g(y) is easy (by separation of variables ). Consider for instance
\[
\begin{cases} y'(t) = \dfrac{t}{y(t)}, \\ y(t_0) = y_0 \end{cases}
\]
for (t0, y0) ∈ (0, ∞) × (0, ∞). Then y(t) = √(t² + y0² − t0²). Note that if y0² − t0² ≥ 0,
then y is defined on I = (0, ∞). But if y0² − t0² < 0, then y is only defined on
I = (√(t0² − y0²), ∞) ∋ t0.
In general, however, it is not easy to find a solution. It may also happen that the
solution is not expressible in terms of elementary functions. Try, for instance, to solve
the initial value problem
\[
\begin{cases} y'(t) = e^{y(t)^2 t^2} \sin(t + y(t)), \\ y(1) = 5. \end{cases}
\]
Theorem 3.24 (Picard-Lindelöf). Let E ⊂ R × R be open, (t0 , y0 ) ∈ E, F ∈ C(E).
Let a > 0 and b > 0 be small enough such that
R = {(t, y) ∈ R2 : |t − t0 | ≤ a, |y − y0 | ≤ b} ⊂ E.
Let M = sup(t,y)∈R |F (t, y)| < ∞. Assume that there exists c ∈ (0, ∞) such that
(3.4) |F (t, y) − F (t, u)| ≤ c|y − u|
for all (t, y), (t, u) ∈ R. Define a∗ = min(a, b/M ) and let I = [t0 − a∗ , t0 + a∗ ]. Then
there exists a unique solution y : I → R to the initial value problem
\[
\begin{cases} y'(t) = F(t, y(t)), \\ y(t_0) = y_0. \end{cases} \tag{3.5}
\]
Claim. T Y ⊂ Y.
Proof of claim. Let y ∈ Y. Then T y is a continuous function on I. It remains
to show that T y(t) ∈ J for all t ∈ I. Recalling that |F(t, y)| ≤ M for all (t, y) ∈ R we
obtain:
\[
|Ty(t) - y_0| \le \left| \int_{t_0}^{t} |F(s, y(s))|\,ds \right| \le |t_0 - t|\,M \le M a_* \le b,
\]
where we used that a∗ = min(a, b/M) ≤ b/M.
To apply the contraction principle we need to equip Y with a metric such that
T : Y → Y is a contraction and Y is complete. We could be tempted to try the
usual supremum metric d∞(g1, g2) = sup_{t∈I} |g1(t) − g2(t)|. Then Y ⊂ C(I) is closed, so
(Y, d∞) is a complete metric space. However, T will not necessarily be a contraction
with respect to d∞. (For the supremum metric to give rise to a contraction we would
need to make the interval I smaller.) Instead, we define the metric
\[
d_*(g_1, g_2) = \sup_{t \in I} e^{-2c|t - t_0|}\, |g_1(t) - g_2(t)|.
\]
Then d∗(g1, g2) ≤ d∞(g1, g2) ≤ e^{2ca∗} d∗(g1, g2). In other words, d∗ and d∞ are equivalent
metrics. This implies that (Y, d∗) is still complete.
Claim. T : Y → Y is a contraction with respect to d∗ .
Proof of claim. For g1, g2 ∈ Y, t ∈ I we have by (3.4),
\[
|Tg_1(t) - Tg_2(t)| = \left| \int_{t_0}^{t} \big(F(s, g_1(s)) - F(s, g_2(s))\big)\,ds \right|
\le c \left| \int_{t_0}^{t} |g_1(s) - g_2(s)|\,ds \right|.
\]
Then (yn)n≥0 converges uniformly on I to the solution y. This method is called Picard
iteration.
2. Note that the length of the existence interval I does not depend on the size of the
constant c in (3.4).
Example 3.26. Consider the initial value problem
\[
\begin{cases} y'(t) = \dfrac{e^{t} \sin(t + y(t))}{t\,y(t) - 1}, \\ y(1) = 5. \end{cases} \tag{3.6}
\]
Let F(t, y) = e^t sin(t + y)/(ty − 1). We need to choose a rectangle R around the point (1, 5) where
we have control over |F(t, y)| and |∂y F(t, y)|. Thus we need to stay away from the set
of (t, y) such that ty − 1 = 0. Say,
R = {(t, y) : |t − 1| ≤ 1/2, |y − 5| ≤ 1}.
Then for (t, y) ∈ R:
|ty − 1| ≥ (1 − 1/2)(5 − 1) − 1 = 1.
Also, |e^t sin(t + y)| ≤ e^{3/2}. Setting M = e^{3/2}, we obtain
|F(t, y)| ≤ M for all (t, y) ∈ R.
Compute
\[
\partial_y F(t, y) = \frac{e^{t} \cos(t + y)}{ty - 1} - t\,\frac{e^{t} \sin(t + y)}{(ty - 1)^2}.
\]
For (t, y) ∈ R we estimate
\[
|\partial_y F(t, y)| \le \left| \frac{e^{t} \cos(t + y)}{ty - 1} \right| + \left| t\,\frac{e^{t} \sin(t + y)}{(ty - 1)^2} \right| \le c,
\]
where we have set c = e^{3/2} + (3/2)e^{3/2}. By the mean value theorem, F then satisfies
the Lipschitz condition (3.4) on R with this constant c. The number a∗ from Theorem 3.24 is
a∗ = min(a, b/M) = min(1/2, 1/e^{3/2}) = e^{−3/2}. So the theorem yields the existence
and uniqueness of a solution to the initial value problem (3.6) in the interval I =
[1 − e^{−3/2}, 1 + e^{−3/2}]. We can also compute that solution by Picard iteration: let
y0(t) = 5 and
\[
y_n(t) = 5 + \int_{1}^{t} \frac{e^{s} \sin(s + y_{n-1}(s))}{s\,y_{n-1}(s) - 1}\,ds.
\]
The sequence (yn)n∈N converges uniformly on I to the solution y.
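The Picard iteration of Example 3.26 can be sketched numerically as follows. This is an illustration under assumptions, not part of the notes: the name picard_iterates, the grid size and the number of iterations are hypothetical choices, and the integral is replaced by the trapezoidal rule on a uniform grid, so the computed iterates only approximate the functions y_n.

```python
import numpy as np

def F(t, y):
    # Right-hand side of (3.6).
    return np.exp(t) * np.sin(t + y) / (t * y - 1.0)

def picard_iterates(F, t0, y0, a_star, n_iter=8, n_grid=2001):
    """Approximate Picard iterates on [t0 - a_star, t0 + a_star].

    n_grid should be odd so that the middle grid point is t0.
    """
    t = np.linspace(t0 - a_star, t0 + a_star, n_grid)
    mid = n_grid // 2                       # index of the grid point t0
    y = np.full_like(t, y0)                 # y_0(t) = y0
    iterates = [y]
    for _ in range(n_iter):
        f = F(t, y)
        # Cumulative trapezoidal integral of s -> F(s, y(s)) from t[0] to t[i].
        steps = 0.5 * (f[1:] + f[:-1]) * np.diff(t)
        cum = np.concatenate(([0.0], np.cumsum(steps)))
        # Shift so that the integral starts at t0 instead of t[0].
        y = y0 + (cum - cum[mid])
        iterates.append(y)
    return t, iterates

t, iterates = picard_iterates(F, 1.0, 5.0, np.exp(-1.5))
for n in range(1, len(iterates)):
    # Sup distance between successive iterates on the grid.
    print(n, np.max(np.abs(iterates[n] - iterates[n - 1])))
```

The printed distances between successive iterates should shrink rapidly, in line with the uniform convergence asserted above.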
Example 3.27. Sometimes one can extend solutions beyond the interval obtained
from the Picard-Lindelöf theorem. Consider the initial value problem
\[
\begin{cases} y'(t) = \cos(y(t)^2 - 2t^3), \\ y(0) = 1. \end{cases} \tag{3.7}
\]
We claim that there exists a unique solution y : R → R. To prove this it suffices to
demonstrate the existence of a unique solution on the interval [−L, L] for every L > 0.
To do this we invoke the Picard-Lindelöf theorem. Set
R = {(t, y) ∈ R² : |t| ≤ L, |y − 1| ≤ L}.
Let F(t, y) = cos(y² − 2t³). Then
|F(t, y)| ≤ 1 for all (t, y) ∈ R².
We have ∂y F(t, y) = −2y sin(y² − 2t³), so |∂y F(t, y)| ≤ 2|y| ≤ 2(L + 1) for all (t, y) ∈ R.
Hence (3.4) holds on R with c = 2(L + 1), and since a = b = L and M ≤ 1 we get a∗ = min(a, b/M) = L.
Then by Theorem 3.24, there exists a unique solution to (3.7) on I = [−L, L].
Example 3.28. If the Lipschitz condition (3.4) fails, then the initial value problem
may have more than one solution. Consider
\[
\begin{cases} y'(t) = |y(t)|^{1/2}, \\ y(0) = 0. \end{cases} \tag{3.8}
\]
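A quick numerical check of the non-uniqueness in (3.8): for t ≥ 0, both y(t) = 0 and y(t) = t²/4 satisfy the equation with y(0) = 0. The sketch below only verifies this on a grid; the grid and the dictionary of candidates are assumptions made for illustration.

```python
import numpy as np

def F(t, y):
    # Right-hand side of (3.8).
    return np.sqrt(np.abs(y))

t = np.linspace(0.0, 2.0, 401)
# Each candidate is given as a pair (y, y') evaluated on the grid.
candidates = {
    "y = 0": (np.zeros_like(t), np.zeros_like(t)),
    "y = t^2/4": (t**2 / 4.0, t / 2.0),
}
for name, (y, dy) in candidates.items():
    # Residual of the differential equation y' = F(t, y) on the grid.
    print(name, np.max(np.abs(dy - F(t, y))))
```

Both residuals vanish up to rounding, and both candidates start at y(0) = 0.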
on the interval [t0 − a∗, t0 + a∗]. We restrict our attention to the interval [t0, t0 + a∗], which
we denote by I. The construction is similar on the other half, [t0 − a∗, t0].
Let P be a partition P = {t0 < t1 < · · · < tN = t0 + a∗} of [t0, t0 + a∗]. We let
∆P = max_{0≤k≤N−1} (tk+1 − tk) denote the maximal width of P. We try to build an
approximate solution given as a continuous piecewise linear function. The function
yP : [t0, t0 + a∗] → R shall be defined so that yP(t0) = y0 and so that on every
partition interval [tk, tk+1] the function is given by yP(t) − yP(tk) = mk(t − tk) with
mk = F(tk, yP(tk)). To see that this is possible we establish the following
Define gP(t) = F(tk, yP(tk)) for t ∈ (tk, tk+1]. Then gP is a step function and
yP′(t) = gP(t) for t ∈ (tk, tk+1).
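The piecewise linear function yP described above can be sketched in code. This is a minimal illustration with hypothetical names (euler_polygon, evaluate); the test equation y′ = y, y(0) = 1 is an assumption chosen because its exact solution e^t is known.

```python
import numpy as np

def euler_polygon(F, partition, y0):
    """Values y_P(t_k) of the piecewise linear function at the partition points."""
    t = np.asarray(partition, dtype=float)
    y = np.empty_like(t)
    y[0] = y0
    for k in range(len(t) - 1):
        m_k = F(t[k], y[k])                       # slope m_k = F(t_k, y_P(t_k))
        y[k + 1] = y[k] + m_k * (t[k + 1] - t[k])
    return t, y

def evaluate(t_nodes, y_nodes, s):
    """Evaluate the piecewise linear function y_P at s by linear interpolation."""
    return np.interp(s, t_nodes, y_nodes)

# Illustration with F(t, y) = y, y(0) = 1 on [0, 1]; the exact solution is e^t.
for N in (10, 100, 1000):
    t, y = euler_polygon(lambda t, y: y, np.linspace(0.0, 1.0, N + 1), 1.0)
    print(N, abs(y[-1] - np.e))
```

Refining the partition decreases the error at t = 1, which is the behavior the approximation argument relies on.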
Claim 2. Suppose that ∆P ≤ δ(ε) min(1, M^{−1}). Then we have for all t ∈ [t0, t0 + a∗]
that
\[
y_P(t) = y_0 + \int_{t_0}^{t} g_P(s)\,ds \quad\text{and}\quad |g_P(s) - F(s, y_P(s))| \le \varepsilon \ \text{ if } s \in (t_{k-1}, t_k).
\]
We have
|y(tk−1) − y(s)| ≤ M|tk−1 − s| ≤ M(tk − tk−1) ≤ M · ∆P ≤ δ.
Also, |tk−1 − s| ≤ tk − tk−1 ≤ ∆P ≤ δ. Thus,
‖(tk−1, y(tk−1)) − (s, y(s))‖ ≤ 100δ.
By (3.11),
|g(s) − F(s, y(s))| = |F(tk−1, y(tk−1)) − F(s, y(s))| ≤ ε.
Claim 3. Suppose that ∆P ≤ δ(ε) min(1, M^{−1}). Then it holds for all t ∈ [t0, t0 + a∗]
that
\[
\left| y_P(t) - \Big( y_0 + \int_{t_0}^{t} F(s, y_P(s))\,ds \Big) \right| \le \varepsilon a_*.
\]
Higher-order differential equations. Let d ≥ 1 and consider the d-th order ordinary
differential equation given by
(3.12) y^{(d)}(t) = F(t, y(t), y′(t), …, y^{(d−1)}(t))
for some F : E → R, E ⊂ R × Rd open. We can transform this equation into a system
of d first-order equations: if Y = (Y1, …, Yd) solves the system
\[
\begin{cases}
Y_1'(t) = Y_2(t), \\
Y_2'(t) = Y_3(t), \\
\quad\vdots \\
Y_{d-1}'(t) = Y_d(t), \\
Y_d'(t) = F(t, Y(t)),
\end{cases}
\]
then Y1 is a solution to (3.12), with Yj = Y1^{(j−1)} for j = 1, …, d.
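The reduction to a first-order system can be sketched as follows. The names as_first_order_system and euler_system are hypothetical, and the explicit Euler stepping is only a crude solver chosen for brevity; the example y″ = −y with y(0) = 0, y′(0) = 1 (solution sin t) is an assumption made for illustration.

```python
import numpy as np

def as_first_order_system(F):
    """Right-hand side G of Y' = G(t, Y) for y^(d) = F(t, y, y', ..., y^(d-1))."""
    def G(t, Y):
        Y = np.asarray(Y, dtype=float)
        dY = np.empty_like(Y)
        dY[:-1] = Y[1:]        # Y_1' = Y_2, ..., Y_{d-1}' = Y_d
        dY[-1] = F(t, Y)       # Y_d' = F(t, Y)
        return dY
    return G

def euler_system(G, t0, Y0, t_end, n_steps):
    """Explicit Euler steps for the system, used here only for illustration."""
    h = (t_end - t0) / n_steps
    t, Y = t0, np.asarray(Y0, dtype=float)
    for _ in range(n_steps):
        Y = Y + h * G(t, Y)
        t += h
    return Y

# y'' = -y, y(0) = 0, y'(0) = 1; the first component Y_1 approximates sin t.
G = as_first_order_system(lambda t, Y: -Y[0])
Y_end = euler_system(G, 0.0, [0.0, 1.0], 1.0, 20000)
print(Y_end[0], np.sin(1.0))
```

The first component of Y carries the solution and the second its derivative, matching the roles of Y1 and Y2 in the system.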
Proof of Theorem 3.37. The idea is to apply Taylor’s theorem in one dimension
to the function g : [0, 1] → R given by g(t) = f (x + ty). Let us compute the derivatives
of g.
Claim. For m = 1, …, k + 1,
\[
g^{(m)}(t) = \sum_{|\alpha| = m} \frac{m!}{\alpha!}\, \partial^{\alpha} f(x + ty)\, y^{\alpha}.
\]
This follows because for a given α = (α1, …, αn) with |α| = m there are
\[
\frac{m!}{\alpha!} = \frac{m!}{\alpha_1! \cdots \alpha_n!}
= \binom{m}{\alpha_1} \binom{m - \alpha_1}{\alpha_2} \cdots \binom{m - \alpha_1 - \cdots - \alpha_{n-1}}{\alpha_n}
\]
many tuples (i1, …, im) ∈ {1, …, n}^m such that i appears exactly αi times among the
ij's. In other words, this is the number of ways to sort m pairwise different marbles into
n numbered bins such that bin number i contains exactly αi marbles.
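The counting identity above can be checked by brute force for small multi-indices. The function names below are hypothetical, and indices run from 0 to n − 1 rather than 1 to n.

```python
from itertools import product
from math import comb, factorial

def multinomial(alpha):
    """m! / (alpha_1! ... alpha_n!) with m = |alpha|."""
    result = factorial(sum(alpha))
    for a in alpha:
        result //= factorial(a)
    return result

def product_of_binomials(alpha):
    """C(m, a_1) C(m - a_1, a_2) ... as in the displayed identity."""
    remaining, result = sum(alpha), 1
    for a in alpha:
        result *= comb(remaining, a)
        remaining -= a
    return result

def count_tuples(alpha):
    """Number of tuples in {0, ..., n-1}^m in which i appears exactly alpha[i] times."""
    n, m = len(alpha), sum(alpha)
    return sum(
        1
        for tup in product(range(n), repeat=m)
        if all(tup.count(i) == alpha[i] for i in range(n))
    )

alpha = (2, 1, 1)
print(multinomial(alpha), product_of_binomials(alpha), count_tuples(alpha))  # 12 12 12
```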
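By the one-dimensional Taylor theorem, there exists a θ ∈ [0, 1] such that
\[
g(t) = \sum_{m=0}^{k} \frac{g^{(m)}(0)}{m!}\, t^m + \frac{g^{(k+1)}(\theta)}{(k+1)!}\, t^{k+1}.
\]
From the claim we see that this equals
\[
\sum_{m=0}^{k} \frac{1}{m!} \sum_{|\alpha| = m} \frac{m!}{\alpha!}\, \partial^{\alpha} f(x)\, y^{\alpha} t^m
+ \frac{1}{(k+1)!} \sum_{|\alpha| = k+1} \frac{(k+1)!}{\alpha!}\, \partial^{\alpha} f(x + \theta y)\, y^{\alpha} t^{k+1}
\]
\[
= \sum_{|\alpha| \le k} \frac{\partial^{\alpha} f(x)}{\alpha!}\, (ty)^{\alpha} + \sum_{|\alpha| = k+1} \frac{\partial^{\alpha} f(\xi)}{\alpha!}\, (ty)^{\alpha},
\]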
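Definition 3.39. Let E ⊂ Rn be open and f ∈ C²(E). We define the Hessian
matrix of f at x ∈ E by
\[
D^2 f|_x = \begin{pmatrix}
\partial_1^2 f(x) & \cdots & \partial_1 \partial_n f(x) \\
\vdots & \ddots & \vdots \\
\partial_n \partial_1 f(x) & \cdots & \partial_n^2 f(x)
\end{pmatrix}.
\]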
We have
\[
\sum_{|\alpha| = 1} \frac{\partial^{\alpha} f(x)}{\alpha!}\, y^{\alpha} = \sum_{i=1}^{n} \partial_i f(x)\, y_i = \langle \nabla f(x), y \rangle.
\]
(This is the standard calculus definition of a line integral of the vector field F along
the line parametrized by γ(t) = a + t(x − a) with γ′(t) = x − a).
We claim that F = ∇g. Compute (differentiating under the integral sign)
\[
\begin{aligned}
\partial_i g(x) &= \partial_i \int_0^1 \sum_{j=1}^{n} (x_j - a_j) F_j(a + t(x - a))\,dt \\
&= \int_0^1 \Big[ F_i(a + t(x - a)) + \sum_{j=1}^{n} (x_j - a_j)\, \partial_i F_j(a + t(x - a))\, t \Big]\,dt \\
&= \int_0^1 \Big[ F_i(a + t(x - a)) + \sum_{j=1}^{n} (x_j - a_j)\, \partial_j F_i(a + t(x - a))\, t \Big]\,dt
\end{aligned}
\]
by the compatibility condition ∂i Fj = ∂j Fi. Observe using the chain rule that we have
\[
\frac{d}{dt} F_i(a + t(x - a)) = \sum_{j=1}^{n} (x_j - a_j)\, \partial_j F_i(a + t(x - a)),
\]
so that the last displayed expression is equal to
\[
\int_0^1 \Big[ F_i(a + t(x - a)) + t\, \frac{d}{dt} F_i(a + t(x - a)) \Big]\,dt
= \int_0^1 \frac{d}{dt} \Big[ t F_i(a + t(x - a)) \Big]\,dt = 1 \cdot F_i(x) - 0 \cdot F_i(a) = F_i(x).
\]
Here we have used, in the last displayed line, the product rule and then the fundamental
theorem of calculus. We have shown ∂i g = Fi for all i and hence ∇g = F.
5. Local extrema
Let E ⊂ Rn be an open set and f : E → R a function.
Definition 3.42. A point a ∈ E is called a local maximum if there exists an open
set U ⊂ E with a ∈ U such that f(a) ≥ f(x) for all x ∈ U. It is called a strict local
maximum if f(a) > f(x) for all x ∈ U, x ≠ a. We define the terms local minimum,
strict local minimum accordingly. A point is called a (strict) local extremum if it is a
(strict) local maximum or a (strict) local minimum.
We also write A > 0 to express that A is positive definite and A ≥ 0 to express that A
is positive semidefinite. The terms negative definite, negative semidefinite are defined
accordingly. A is indefinite if it is not positive semidefinite and not negative semidef-
inite. Every real symmetric matrix has real eigenvalues and there is an orthonormal
basis of eigenvectors (spectral theorem). A real symmetric matrix is positive definite if
and only if all eigenvalues are positive.
Theorem 3.46. Let f ∈ C 2 (E) and a ∈ E with ∇f (a) = 0. Then
(1) if D2 f |a > 0, then a is a strict local minimum of f ,
(2) if D2 f |a < 0, then a is a strict local maximum of f ,
(3) if D2 f |a is indefinite, then a is not a local extremum of f .
Remark. If D²f|a is only positive semidefinite or negative semidefinite, then we need
more information to be able to decide whether or not a is a local extremum.
Proof. We write A = D²f|a. Let ε > 0. By Corollary 3.40 there exists δ > 0 such
that for all y with ‖y‖ ≤ δ we have
f(a + y) = f(a) + ½⟨y, Ay⟩ + r(y)
with |r(y)| ≤ ε‖y‖².
(1): Let A be positive definite. Let S = {y ∈ Rn : ‖y‖ = 1}. S is compact, so the
continuous map y ↦ ⟨y, Ay⟩ attains its minimum on S. That is, there exists y0 ∈ S
such that
⟨y0, Ay0⟩ ≤ ⟨y, Ay⟩
for all y ∈ S. Define α = ⟨y0, Ay0⟩. Since y0 ≠ 0 and A is positive definite, α > 0. Let
y ∈ Rn, y ≠ 0. Then y/‖y‖ ∈ S, so
\[
\alpha \le \Big\langle \frac{y}{\|y\|}, A \frac{y}{\|y\|} \Big\rangle = \frac{1}{\|y\|^2} \langle y, Ay \rangle.
\]
Thus, ⟨y, Ay⟩ ≥ α‖y‖² for all y ∈ Rn. Now we set ε = α/4. Then
f(a + y) ≥ f(a) + ½⟨y, Ay⟩ − (α/4)‖y‖² ≥ f(a) + (α/2)‖y‖² − (α/4)‖y‖² = f(a) + (α/4)‖y‖² > f(a)
if y ≠ 0, ‖y‖ ≤ δ. Therefore a is a strict local minimum.
(2): Follows from (1) by replacing f by −f .
(3): Let A be indefinite. We need to show that in every open neighborhood of a there
exist points y′, y″ such that
f(y″) < f(a) < f(y′).
Since A is not negative semidefinite there exists ξ ∈ Rn such that α = ⟨ξ, Aξ⟩ > 0.
Then, for t ∈ R small enough such that ‖tξ‖ ≤ δ we have
f(a + tξ) = f(a) + ½⟨tξ, Atξ⟩ + r(tξ) = f(a) + ½αt² + r(tξ).
Let ε > 0 be such that |r(tξ)| ≤ (α/4)t² for all ‖tξ‖ ≤ δ (recall that δ depends on ε). Then
f(a + tξ) ≥ f(a) + ¼αt² > f(a) for t ≠ 0. Similarly, since A is also not positive semidefinite,
there exists η ∈ Rn such that ⟨η, Aη⟩ < 0 and for small enough t ≠ 0, f(a + tη) < f(a).
Examples 3.47. (1) Let f(x, y) = c + x² + y² for c ∈ R. Then
\[
D^2 f|_0 = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} > 0
\]
and 0 is a strict local minimum of f (even a global minimum).
(2) Let f(x, y) = c + x² − y² for c ∈ R. Then
\[
D^2 f|_0 = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}
\]
is indefinite and 0 is not a local extremum of f.
(3) Let f1(x, y) = x² + y⁴, f2(x, y) = x², f3(x, y) = x² + y³. Then
\[
D^2 f_i|_0 = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \ge 0,
\]
but f1 has a strict local minimum at 0, f2 has a (non-strict) local minimum at
0 and f3 has no local extremum at 0.
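The case distinction of Theorem 3.46 can be sketched in code by looking at the eigenvalues of the Hessian at a critical point. The function name classify_critical_point and the tolerance tol are assumptions; the input is taken to be the (symmetric) Hessian at a point where the gradient vanishes.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-12):
    """Apply the second derivative test to the Hessian at a critical point."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    if np.all(eigenvalues > tol):
        return "strict local minimum"          # positive definite
    if np.all(eigenvalues < -tol):
        return "strict local maximum"          # negative definite
    if eigenvalues.min() < -tol and eigenvalues.max() > tol:
        return "not a local extremum"          # indefinite
    return "inconclusive (semidefinite)"       # the test gives no information

print(classify_critical_point([[2, 0], [0, 2]]))    # strict local minimum
print(classify_critical_point([[2, 0], [0, -2]]))   # not a local extremum
print(classify_critical_point([[2, 0], [0, 0]]))    # inconclusive (semidefinite)
```

The third call corresponds to Examples 3.47 (3), where the same Hessian belongs to three functions with different behavior at 0.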
Proof of Theorem 3.48. Without loss of generality we may assume (after possibly
relabeling the variables) that the m × m matrix with entries ∂i gj(a) with i, j =
1, …, m has maximal rank m. By the implicit function theorem the equation g(x) = 0
can be solved near a by expressing the variables xi, i = 1, …, m as functions ui of
x″ = (xm+1, …, xn). Thus there is a δ > 0 such that for
Bδ = {x = (x′, x″) ∈ Rn : |xi − ai| < δ, i = 1, …, n}
the equation g(x) = 0 for x ∈ Bδ is satisfied if and only if x′ = u(x″) for x″ =
(xm+1, …, xn). Hence
(3.15) g(u(x″), x″) = 0
for x″ close to a″, and u(a″) = a′. By assumption the function x″ ↦ F(u(x″), x″)
has a local extremum at a″ and therefore, by Corollary 3.44 we have ∂ν F(a″) = 0 for
ν = m + 1, …, n. By the chain rule
\[
\partial_\nu F(a) = \partial_\nu f(a) + \sum_{i=1}^{m} \partial_i f(a)\, \partial_\nu u_i(a'') = 0, \qquad \nu = m + 1, \dots, n. \tag{3.16}
\]
Remark. Note that the smallness of α in this lemma depends on the point x. Also,
this result is not enough to prove anything about the convergence of gradient descent.
We will see that gradient descent works well if f is a convex function.
Definition 3.52. Let E ⊂ Rn be convex. A function f : E → R is called convex
if
f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y)
for all x, y ∈ E, t ∈ [0, 1]. f is called strictly convex if
f (tx + (1 − t)y) < tf (x) + (1 − t)f (y)
for all x ≠ y ∈ E and t ∈ (0, 1).
Theorem 3.53. Let E ⊂ Rn be open and convex and f ∈ C¹(E). Then f is convex
if and only if
f(u + v) ≥ f(u) + ⟨∇f(u), v⟩
for all u, u + v ∈ E.
Proof. ⇒: Fix u, u + v ∈ E. By convexity, for t ∈ [0, 1],
f(u + tv) = f((1 − t)u + t(u + v)) ≤ (1 − t)f(u) + tf(u + v).
By definition of the derivative,
f(u + tv) = f(u) + t⟨∇f(u), v⟩ + r(t),
where lim_{t→0} r(t)/t = 0. Thus,
f(u) + t⟨∇f(u), v⟩ + r(t) ≤ (1 − t)f(u) + tf(u + v)
which implies, after subtracting f(u) and dividing by t ∈ (0, 1],
f(u) + ⟨∇f(u), v⟩ − f(u + v) ≤ −r(t)/t → 0 as t → 0.
Therefore f(u) + ⟨∇f(u), v⟩ ≤ f(u + v).
⇐: Let x, y ∈ E, t ∈ [0, 1]. Let u = tx + (1 − t)y and v = x − u. Then the assumption
implies
f(x) ≥ f(u) + ⟨∇f(u), x − u⟩.
On the other hand, letting v = y − u, the assumption implies
f(y) ≥ f(u) + ⟨∇f(u), y − u⟩.
Therefore
tf(x) + (1 − t)f(y) ≥ t(f(u) + ⟨∇f(u), x − u⟩) + (1 − t)(f(u) + ⟨∇f(u), y − u⟩)
= f(u) + ⟨∇f(u), t(x − u) + (1 − t)(y − u)⟩ = f(u) + ⟨∇f(u), tx + (1 − t)y − u⟩.
Recalling that u = tx + (1 − t)y, we get
tf(x) + (1 − t)f(y) ≥ f(u) = f(tx + (1 − t)y).
Theorem 3.54. Let E ⊂ Rn be open and convex and f ∈ C 2 (E). Then
(1) f is convex if and only if D2 f |x ≥ 0 for all x ∈ E,
(2) f is strictly convex if D2 f |x > 0 for all x ∈ E.
Proof. We only prove (1). The proof of (2) is very similar. Let f be convex. By
Taylor’s theorem, for u, u + tv ∈ E,
f(u + tv) = f(u) + t⟨∇f(u), v⟩ + ½t²⟨D²f|u v, v⟩ + o(t²)
and by Theorem 3.53,
f(u + tv) ≥ f(u) + t⟨∇f(u), v⟩.
Combining these two pieces of information we obtain
½t²⟨D²f|u v, v⟩ + o(t²) ≥ 0
which, after dividing by t² and letting t → 0, implies ⟨D²f|u v, v⟩ ≥ 0 for all v ∈ Rn.
Conversely, assume that D²f|u ≥ 0 for all u ∈ E. By Taylor’s theorem, for all u, u + v ∈
E there exists ξ ∈ E such that
f(u + v) = f(u) + ⟨∇f(u), v⟩ + ½⟨D²f|ξ v, v⟩ ≥ f(u) + ⟨∇f(u), v⟩.
Therefore f is convex by Theorem 3.53.
Remark. If f is strictly convex, then it does not follow that D2 f |x > 0 for all x.
Example 3.55. Let f : R → R, f(x) = x⁴. Then D²f|x = f″(x) = 12x² which is 0
at x = 0, but f is strictly convex.
Theorem 3.56. Let E ⊂ Rn be open and convex and f ∈ C 2 (E). Then
(1) If f is convex, then every critical point of f is a global minimum.
(2) If f is strictly convex, then f has at most one critical point.
Remarks. 1. Convex functions may have more than one critical point. For instance,
the constant function f ≡ 0 is convex.
2. Conclusion (1) implies that if f is convex and gradient descent converges, then it
converges to a global minimum.
Proof. (1): Let ∇f(x∗) = 0. Then by Taylor’s theorem, for every x ∈ E there
exists ξ ∈ E such that
\[
f(x) = f(x_*) + \underbrace{\langle \nabla f(x_*), x - x_* \rangle}_{=0} + \underbrace{\tfrac12 \langle D^2 f|_\xi (x - x_*), x - x_* \rangle}_{\ge 0} \ge f(x_*).
\]
(2): Let x1, x2 ∈ E be critical points of f. By (1), they are global minima. This implies
f(x1) = f(x2). If x1 ≠ x2, then by strict convexity,
\[
f(x_1) = \frac{f(x_1) + f(x_2)}{2} > f\Big( \frac{x_1 + x_2}{2} \Big).
\]
This is a contradiction to x1 being a global minimum. Therefore x1 = x2.
Example 3.57. If ‖ · ‖ is a norm on Rn, then the function x ↦ ‖x‖ is convex:
‖tx + (1 − t)y‖ ≤ t‖x‖ + (1 − t)‖y‖
by the triangle inequality. Also, this function has a unique global minimum at x = 0.
Lemma 3.58. Let I ⊂ R, E ⊂ Rn be convex and suppose that
(1) f : E → I is convex, and
(2) g : I → R is convex and nondecreasing.
Then the function h : E → R given by h = g ◦ f is convex.
Definition 3.62. Let E ⊂ Rn be convex and open. Let f ∈ C²(E). We say that
f is strongly convex if there exists β > 0 such that
⟨D²f|x y, y⟩ ≥ β‖y‖²
for all x ∈ E, y ∈ Rn.
Remarks. 1. f is strongly convex if and only if there exists β > 0 such that
D²f|x − βI ≥ 0 for all x ∈ E. This follows directly from the definition using that
β‖y‖² = ⟨βIy, y⟩. The condition D²f|x − βI ≥ 0 is equivalent to the smallest eigenvalue
of D²f|x being ≥ β. Yet another equivalent way of stating this is saying that the
function g(x) = f(x) − (β/2)‖x‖² is convex. This is because D²g|x = D²f|x − βI.
2. If f is strongly convex, then f is strictly convex (by Theorem 3.54).
3. If f is strictly convex, then f is not necessarily strongly convex. For example consider
f : R → R, f(x) = e^x. For every β > 0 there exists x ∈ R such that f″(x) = e^x < β
because e^x → 0 as x → −∞.
The following exercise shows that the assumption of strong convexity is not as
restrictive as it may seem at first sight: strictly convex functions are strongly convex
when restricted to compact sets.
Exercise 3.63. Suppose that f ∈ C²(Rn) is strictly convex. Let K ⊂ Rn be
compact and convex. Show that there exist β−, β+ > 0 such that
β−‖y‖² ≤ ⟨D²f|x y, y⟩ ≤ β+‖y‖²
for all x ∈ K and y ∈ Rn. (In particular, f is strongly convex on K.)
Hint: Consider the minimal eigenvalue of D²f|x as a function of x.
Theorem 3.64. Let E ⊂ Rn be open and convex. Let f ∈ C²(E). Then f is
strongly convex if and only if there exists γ > 0 such that
f(u + v) ≥ f(u) + ⟨∇f(u), v⟩ + γ‖v‖²
for every u, u + v ∈ E.
Proof. ⇒: Let β > 0 be such that g(x) = f(x) − (β/2)‖x‖² is convex. Then by
Theorem 3.53,
g(u + v) ≥ g(u) + ⟨∇g(u), v⟩ = f(u) − (β/2)‖u‖² + ⟨∇f(u) − βu, v⟩.
On the other hand,
g(u + v) = f(u + v) − (β/2)‖u + v‖².
Thus,
f(u + v) ≥ f(u) + ⟨∇f(u), v⟩ + (β/2)(‖u + v‖² − ‖u‖² − 2⟨u, v⟩) = f(u) + ⟨∇f(u), v⟩ + (β/2)‖v‖².
⇐: This follows in the same way from the converse direction of Theorem 3.53.
Theorem 3.65. Let f ∈ C 2 (Rn ) be strongly convex. Then for every c ∈ R, the
sublevel set
B = {x ∈ Rn : f (x) ≤ c}
is bounded.
\[
\|Ax\| = \Big( \sum_{i=1}^{n} |x_i|^2 \lambda_i^2 \Big)^{1/2} \le \max_{i=1,\dots,n} \lambda_i \Big( \sum_{i=1}^{n} |x_i|^2 \Big)^{1/2} = \max_{i=1,\dots,n} \lambda_i\, \|x\|.
\]
Let max_{i=1,…,n} λi = λi0. We have shown that ‖A‖ ≤ λi0. On the other hand,
‖Avi0‖ = λi0‖vi0‖ = λi0,
so ‖A‖ = sup_{‖x‖=1} ‖Ax‖ ≥ ‖Avi0‖ = λi0.
Proof of Theorem 3.67. Let α > 0. Define T(x) = x − α∇f(x). Then xn+1 =
T(xn). We want T to be a contraction. For R > 0 define BR = {x ∈ Rn : ‖x − x∗‖ ≤
R}. Let R > 0 be large enough such that x0 ∈ BR.
Claim. If α is small enough, then T is a contraction of BR .
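The iteration xn+1 = T(xn) can be tried out on a simple strongly convex function. The sketch below is an illustration under assumptions: the quadratic f(x) = ½⟨Ax, x⟩ − ⟨b, x⟩, the matrix A, the vector b, the step size α = 0.2 and the function name gradient_descent are all chosen here and are not taken from the notes.

```python
import numpy as np

def gradient_descent(grad, x0, alpha, n_steps):
    """Iterate x_{n+1} = T(x_n) with T(x) = x - alpha * grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - alpha * grad(x)
    return x

# Assumed example: f(x) = 1/2 <Ax, x> - <b, x> with A symmetric positive definite,
# so that grad f(x) = Ax - b and the unique minimum x_* solves A x_* = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A @ x - b
x_star = np.linalg.solve(A, b)

x = gradient_descent(grad_f, [10.0, -7.0], alpha=0.2, n_steps=200)
print(np.linalg.norm(x - x_star))
```

For this quadratic, T(x) − T(y) = (I − αA)(x − y), so T is a contraction whenever all eigenvalues of I − αA lie in (−1, 1), which holds for α = 0.2 here.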
8. Further exercises
Exercise 3.69. Let U ⊆ Rn be open and convex and f : U → R differentiable such
that ∂1 f (x) = 0 for all x ∈ U .
(i) Show that the value of f (x) for x = (x1 , . . . , xn ) ∈ U does not depend on x1 .
(ii) Does (i) still hold if we assume that U is connected instead of convex? Give a proof
or counterexample.
Exercise 3.70. A function f : Rn \ {0} → R is called homogeneous of degree α ∈ R
if f(λx) = λ^α f(x) for all λ > 0 and x ∈ Rn \ {0}. Suppose that f is differentiable in
Rn \ {0}. Then show that f is homogeneous of degree α if and only if
\[
\sum_{i=1}^{n} x_i\, \partial_i f(x) = \alpha f(x)
\]
for all x ∈ Rn \ {0}. Hint: Consider the function g(λ) = f(λx) − λ^α f(x).
Find the largest interval I ⊆ R containing t0 = 2 such that the problem has a unique
solution y in I.
Exercise 3.79. Let F be a smooth function on R2 (i.e. partial derivatives of all
orders exist everywhere and are continuous) and suppose that the initial value problem
y′ = F(t, y), y(t0) = y0 has a unique solution y on the interval I = [t0, t0 + a] with
y smooth on I. Let h > 0 be sufficiently small and define tk = t0 + kh for integers
0 ≤ k ≤ a/h.
Define a function yh recursively by setting yh (t0 ) = y0 and
yh (t) = yh (tk ) + (t − tk )F (tk , yh (tk ))
for t ∈ (tk , tk+1 ] for integers 0 ≤ k ≤ a/h.
(i) From the proof of Peano’s theorem (Theorem 3.29) it follows that yh → y uniformly
on I as h → 0. Prove the following stronger statement: there exists a constant C > 0
such that for all t ∈ I and h > 0 sufficiently small,
|y(t) − yh (t)| ≤ Ch.
Hint: The left hand side is zero if t = t0 . Use Taylor expansion to study how the error
changes as t increases from tk to tk+1 .
(ii) Let F (t, y) = λy with λ ∈ R a parameter. Explicitly determine y, yh and a value
for C in (i).
Exercise 3.80. Let us improve the approximation from Exercise 3.79. In the
context of that exercise, define a piecewise linear function y∗h recursively by setting
y∗h(t0) = y0 and
y∗h(t) = y∗h(tk) + (t − tk)G(tk, y∗h(tk), h),
for t ∈ (tk, tk+1] for integers 0 ≤ k ≤ a/h, where
G(t, y, h) = ½(F(t, y) + F(t + h, y + hF(t, y))).
Prove that there exists a constant C > 0 such that for all t ∈ I and h > 0 sufficiently
small,
|y(t) − y∗h(t)| ≤ Ch².
Exercise 3.81. For a function f : [a, b] → R define
\[
I(f) = \int_a^b (1 + f'(t)^2)^{1/2}\,dt.
\]
Let A = {f ∈ C²([a, b]) : f(a) = c, f(b) = d}. Determine f∗ ∈ A such that
\[
I(f_*) = \inf_{f \in A} I(f).
\]
as y → 0 and
\[
f(x + y) = \sum_{|\alpha| \le k} \tilde{c}_\alpha\, y^{\alpha} + o(\|y\|^k)
\]
as y → 0. Show that cα = c̃α for all |α| ≤ k.
Exercise 3.89. Let D = {(x, y) ∈ R² : x² + y² ≤ 1}. Determine the maximum
and minimum values of the function f : D → R, f(x, y) = 4x² − 3xy.
Exercise 3.90. Let f ∈ C 2 (Rn ) and suppose that the Hessian of f is positive
definite at every point. Show that ∇f : Rn → Rn is an injective map.
Exercise 3.91. For f(x, y, z) = x + y + z determine all maxima and minima under
the constraints x² + y² = 2 and x + z − 1 = 0. Use the method of Lagrange multipliers.
Exercise 3.92. Let f ∈ C 2 (Rn ) be strongly convex. Show that ∇f : Rn → Rn is
a diffeomorphism (that is, show that it is differentiable, bijective and that its inverse is
differentiable).
Exercise 3.93. Let f(x) = ½⟨Ax, x⟩ − ⟨b, x⟩ + c with A ∈ Rn×n and b ∈ Rn, c ∈ R.
Assume that A is symmetric and positive definite. Show that f has a unique global
minimum at some point x∗ and determine f (x∗ ) in terms of A, b, c.
Exercise 3.94. Prove that the point x∗ from Exercise 3.93 can be computed using
gradient descent: that is, if x0 ∈ Rn arbitrary and
xn+1 = xn − α∇f (xn )
for n = 0, 1, 2, . . . , then the sequence (xn )n∈N converges to x∗ for all starting points
x0 ∈ Rn , provided that α is chosen sufficiently small.
Exercise 3.96. (a) Find a convex function that is not bounded from below.
(b) Find a strictly convex function that is not bounded from below.
(c) If a function is strictly convex and bounded from below, does it necessarily have a
critical point? (Proof or counterexample.)
Exercise 3.97. (a) Give an example of a convex function that is not continuous.
(b) Let f : (a, b) → R. Show that if f is convex, then f is continuous.
Exercise 3.98. Construct a strictly convex function f : R → R such that f is not
differentiable at x for every x ∈ Q.
Exercise 3.99. Let f ∈ C²(Rn). Recall that we defined f to be strongly convex if
there exists β > 0 such that ⟨D²f|x y, y⟩ ≥ β‖y‖² for every x, y ∈ Rn. Show that f is
strongly convex if and only if there exists γ > 0 such that
f(tx + (1 − t)y) ≤ tf(x) + (1 − t)f(y) − γt(1 − t)‖x − y‖²
for all x, y ∈ Rn, t ∈ [0, 1].
(Consequently, that condition can serve as an alternative definition of strong convexity,
which is also valid if f is not C 2 .)
Exercise 3.100. (See also Exercise 4.82 as motivation for this exercise.) Fix a
function σ ∈ C¹(R) and define for x ∈ Rn, W ∈ Rm×n, v ∈ Rm,
\[
\mu(x, W, v) = \sum_{i=1}^{m} \sigma((Wx)_i)\, v_i
\]
Approximation of functions
1. Polynomial approximation
Theorem 4.1 (Weierstrass). For every continuous function f on [a, b] there exists
a sequence of polynomials that converges uniformly to f .
In other words, the theorem says that the set A = {p : p polynomial} is dense in
C([a, b]).
There are many proofs of this theorem in the literature. We present a proof using
Bernstein polynomials . Without loss of generality we consider only the interval [a, b] =
[0, 1] (why are we allowed to do that?).
Let f be continuous on [0, 1]. Define for n = 1, 2, … :
\[
B_n f(t) = \sum_{k=0}^{n} f\Big( \frac{k}{n} \Big) \binom{n}{k} t^k (1 - t)^{n-k}.
\]
Thus,
\[
B_n f(t) - f(t) = \sum_{k=0}^{n} \big( f(k/n) - f(t) \big) \binom{n}{k} t^k (1 - t)^{n-k}. \tag{4.1}
\]
Let ε > 0. By uniform continuity of f we may choose δ > 0 such that |f(t) − f(s)| ≤ ε/2
for all t, s ∈ [0, 1] with |t − s| ≤ δ. Now we write the sum on the right hand side of
\[
= \frac{n-1}{n}\, t^2 + \frac{1}{n}\, t = t^2 + \frac{t - t^2}{n}.
\]
As a consequence, we obtain the following:
\[
= t^2 + \frac{t - t^2}{n} - 2t^2 + t^2 = \frac{t - t^2}{n}.
\]
Since t ∈ [0, 1] we have 0 ≤ t − t² = t(1 − t) ≤ 1.
Now we are ready to estimate II. First note that f is bounded, so there exists C ≥ 0
such that |f(x)| ≤ C for all x ∈ [0, 1]. Choose N ∈ N such that 2Cδ^{−2}N^{−1} ≤ ε/2. Then
for all n ≥ N,
\[
|II| \le 2C \sum_{\substack{0 \le k \le n \\ |k/n - t| \ge \delta}} \binom{n}{k} t^k (1 - t)^{n-k}
\le 2C\delta^{-2} \sum_{k=0}^{n} \Big( \frac{k}{n} - t \Big)^2 \binom{n}{k} t^k (1 - t)^{n-k}
\le 2C\delta^{-2} N^{-1} \le \varepsilon/2.
\]
In the second inequality we have used that δ^{−2}|k/n − t|² ≥ 1 whenever |k/n − t| ≥ δ. Thus if n ≥ N and t ∈ [0, 1],
then
|Bn f(t) − f(t)| ≤ |I| + |II| ≤ ε/2 + ε/2 = ε.
This concludes the proof of Weierstrass’ theorem.
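The Bernstein polynomials from the proof can be evaluated numerically to watch the uniform error shrink. This is a minimal sketch: the function name bernstein, the test function |s − 1/2| and the evaluation grid are assumptions made for illustration.

```python
import numpy as np
from math import comb

def bernstein(f, n, t):
    """Evaluate B_n f at the points t (array-like with values in [0, 1])."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    k = np.arange(n + 1)
    coeffs = np.array([comb(n, j) for j in k], dtype=float)
    # basis[i, j] = C(n, j) * t_i^j * (1 - t_i)^(n - j)
    basis = coeffs * t[:, None] ** k * (1.0 - t[:, None]) ** (n - k)
    return basis @ f(k / n)

t = np.linspace(0.0, 1.0, 201)
f = lambda s: np.abs(s - 0.5)        # continuous, but not differentiable at 1/2
for n in (10, 40, 160):
    # Maximal error on the grid, a stand-in for the supremum norm.
    print(n, np.max(np.abs(bernstein(f, n, t) - f(t))))
```

For f(s) = s² the routine reproduces the identity B_n f(t) = t² + (t − t²)/n used in the proof, which gives a convenient sanity check.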
2. Orthonormal systems
In the previous section we studied approximation of continuous functions in the
supremum norm, ‖f‖∞ = sup_{x∈[a,b]} |f(x)|. In this section we turn our attention to
another important norm, the L² norm.
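Definition 4.4. For two piecewise continuous functions f, g on an interval [a, b] we
define their inner product by
\[
\langle f, g \rangle = \int_a^b f(x)\, \overline{g(x)}\,dx. \tag{4.2}
\]
• Sesquilinearity:
\[
\langle f + \lambda g, h \rangle = \langle f, h \rangle + \lambda \langle g, h \rangle, \qquad
\langle h, f + \lambda g \rangle = \langle h, f \rangle + \overline{\lambda}\, \langle h, g \rangle.
\]
• Antisymmetry: $\langle f, g \rangle = \overline{\langle g, f \rangle}$
• Positivity: ⟨f, f⟩ ≥ 0 (and > 0 unless f is zero except at possibly finitely many
points)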
Theorem 4.5 (Cauchy-Schwarz inequality). For two piecewise continuous functions
f, g we have
|⟨f, g⟩| ≤ ‖f‖₂ ‖g‖₂.
Proof. For nonnegative real numbers x and y we have the elementary inequality
\[
xy \le \frac{x^2}{2} + \frac{y^2}{2}.
\]
Thus we have
\[
|\langle f, g \rangle| \le \int_a^b |f(x) g(x)|\,dx \le \frac12 \int_a^b |f(x)|^2\,dx + \frac12 \int_a^b |g(x)|^2\,dx = \frac12 \langle f, f \rangle + \frac12 \langle g, g \rangle.
\]
Now we note that for every λ > 0, replacing f by λf and g by λ^{−1}g does not change
the left hand side of this inequality. Thus we have for every λ > 0 that
\[
|\langle f, g \rangle| \le \frac{\lambda^2}{2} \langle f, f \rangle + \frac{1}{2\lambda^2} \langle g, g \rangle. \tag{4.3}
\]
Now we choose λ so that this inequality is as strong as possible: λ² = √(⟨g, g⟩/⟨f, f⟩) (we may
assume that ⟨f, f⟩ ≠ 0 because otherwise there is nothing to show). Then
\[
|\langle f, g \rangle| \le \sqrt{\langle f, f \rangle}\, \sqrt{\langle g, g \rangle}.
\]
Note that one can arrive at this definition of λ in a systematic way: treat the right
hand side of (4.3) as a function of λ and minimize it using calculus.
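The minimization mentioned above can be carried out as follows. This is a sketch with shorthand introduced only here: A = ⟨f, f⟩ > 0, B = ⟨g, g⟩ > 0 and φ(λ) denotes the right hand side of (4.3).

```latex
\begin{align*}
\varphi(\lambda) &= \frac{\lambda^2}{2} A + \frac{1}{2\lambda^2} B, \qquad \lambda > 0,\\
\varphi'(\lambda) &= \lambda A - \frac{B}{\lambda^3} = 0
  \iff \lambda^4 = \frac{B}{A}
  \iff \lambda^2 = \sqrt{B/A},\\
\varphi\big((B/A)^{1/4}\big) &= \frac12 \sqrt{B/A}\, A + \frac12 \sqrt{A/B}\, B = \sqrt{A}\,\sqrt{B}.
\end{align*}
```

Since φ(λ) → ∞ as λ → 0 and as λ → ∞, this critical point is the global minimum on (0, ∞). If B = 0, the right hand side of (4.3) tends to 0 as λ → 0, which gives ⟨f, g⟩ = 0 directly.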
Corollary 4.6 (Minkowski’s inequality). For two functions f, g ∈ pc([a, b]),
‖f + g‖₂ ≤ ‖f‖₂ + ‖g‖₂.
Proof. We may assume ‖f + g‖₂ ≠ 0 because otherwise there is nothing to prove.
Then
\[
\|f + g\|_2^2 = \int_a^b |f + g|^2 \le \int_a^b |f + g|\,|f| + \int_a^b |f + g|\,|g|
\]
(The index n may run over the natural numbers, or the integers, a finite set of
integers, or more generally any countable set. We will write Σ_n to denote a sum over
all the indices. In proofs we will always adopt the interpretation that the index n runs
over 1, 2, 3, … . This is no loss of generality.)
Notice that the formula cn = hf, φn i still makes sense if f is not of the form (4.4).
Theorem 4.11. Let (φn)n be an orthonormal system on [a, b]. Let f be a piecewise
continuous function. Consider
\[
s_N(x) = \sum_{n=1}^{N} \langle f, \varphi_n \rangle\, \varphi_n(x).
\]
[Figure: f, the subspace X_N containing g and s_N, its orthogonal complement X_N^⊥, and the distances ‖f − g‖₂ and ‖f − s_N‖₂.]
hf − g, f − gi = hf, f i − hf, gi − hg, f i + hg, gi
N
X N
X N
X
= hf, f i − c n bn − cn b n + |bn |2
n=1 n=1 n=1
N
X N
X
= hf, f i − |cn |2 + |bn − cn |2
n=1 n=1
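We have
\[
\begin{aligned}
\langle f - s_N, f - s_N \rangle &= \langle f, f \rangle - \langle f, s_N \rangle - \langle s_N, f \rangle + \langle s_N, s_N \rangle \\
&= \langle f, f \rangle - 2 \sum_{n=1}^{N} |c_n|^2 + \sum_{n=1}^{N} |c_n|^2 = \langle f, f \rangle - \sum_{n=1}^{N} |c_n|^2.
\end{aligned} \tag{4.7}
\]
Thus we have shown
\[
\langle f - g, f - g \rangle = \langle f - s_N, f - s_N \rangle + \sum_{n=1}^{N} |b_n - c_n|^2
\]
which implies the claim since Σ_{n=1}^{N} |bn − cn|² ≥ 0 with equality if and only if bn = cn
for all n = 1, …, N.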
Proof of Theorem 4.12. From the calculation in (4.7),
\[
\langle f, f \rangle - \sum_{n=1}^{N} |c_n|^2 = \langle f - s_N, f - s_N \rangle \ge 0,
\]
so Σ_{n=1}^{N} |cn|² ≤ ‖f‖₂² for all N. Letting N → ∞ this proves the claim (in particular,
the series Σ_{n=1}^{∞} |cn|² converges).
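The identity (4.7) and the resulting inequality can be checked numerically. The sketch below makes several assumptions that are not in the notes: it uses the trigonometric system φn(x) = e^{2πinx} on [0, 1] with indices −N, …, N, the function f(x) = x, and it replaces the integral in the inner product by a midpoint rule with M nodes.

```python
import numpy as np

M = 4096                                   # number of quadrature nodes (assumption)
x = (np.arange(M) + 0.5) / M               # midpoints of a uniform grid on [0, 1]

def inner(f_vals, g_vals):
    """Midpoint-rule approximation of <f, g> = int_0^1 f(x) conj(g(x)) dx."""
    return np.mean(f_vals * np.conj(g_vals))

f_vals = x.astype(complex)                 # f(x) = x
N = 20
ns = np.arange(-N, N + 1)
phi = np.exp(2j * np.pi * np.outer(ns, x))         # row for index n holds e^{2 pi i n x}
c = np.array([inner(f_vals, p) for p in phi])      # coefficients c_n = <f, phi_n>
s_N = c @ phi                                      # partial sum s_N on the grid

norm_f_sq = inner(f_vals, f_vals).real
bessel_sum = np.sum(np.abs(c) ** 2)
err_sq = inner(f_vals - s_N, f_vals - s_N).real
print(bessel_sum, norm_f_sq, err_sq)
```

The sum of |c_n|² stays below ‖f‖₂², and the squared error of s_N equals the difference of the two, as in (4.7).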
Proof of Theorem 4.15. From (4.7),
\[
\|f - s_N\|_2^2 = \langle f, f \rangle - \sum_{n=1}^{N} |\langle f, \varphi_n \rangle|^2
\]
Definition 4.17. Each dyadic interval I ∈ D with |I| = 2^{−k} can be split in the
middle into its left child and right child, which are again dyadic intervals that we
denote by Iℓ and Ir, respectively.
Example 4.18. The interval I = [1/2, 1/2 + 1/4) is a dyadic interval and its left and right
children are given by Iℓ = [1/2, 1/2 + 1/8) and Ir = [1/2 + 1/8, 1/2 + 1/4).
[Figure: the dyadic intervals of generations D0, D1, D2, … in [0, 1), with D = ⋃_{k≥0} Dk, and a dyadic interval split into its children Iℓ and Ir.]
Lemma 4.19. (1) Two dyadic intervals are either disjoint or contained in each
other. That is, for every I, J ∈ D at least one of the following is true: I ∩J = ∅
or I ⊂ J or J ⊂ I.
(2) For every k ≥ 0 the dyadic intervals of generation k are a partition of [0, 1).
That is,
\[
[0, 1) = \bigcup_{I \in \mathcal{D}_k} I.
\]
[Figure: graph of a Haar function ψI, with labels |I|^{−1/2}, Iℓ and Ir.]
Let f ∈ C([0, 1]). Motivated by Theorem 4.11, we define for every positive integer n,
the orthogonal projection
\[
E_n f = \sum_{I \in \mathcal{D}_{<n}} \langle f, \psi_I \rangle\, \psi_I.
\]
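Compute
\[ \int_{I_\ell} f - \int_{I_r} f = |I|^{1/2} \int f \cdot \psi_I = |I|^{1/2} \langle f, \psi_I\rangle \]
and by the same reasoning,
\[ \int_{I_\ell} g - \int_{I_r} g = |I|^{1/2} \langle g, \psi_I\rangle. \]
Adding the previous two displays gives $\langle f\rangle_{I_\ell} = \langle g\rangle_{I_\ell}$ and subtracting them gives $\langle f\rangle_{I_r} = \langle g\rangle_{I_r}$. This concludes the proof.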
Proof of Theorem 4.27. Let ε > 0. By uniform continuity of f on [0, 1] (which
follows from Theorem 1.53) we may choose δ > 0 such that |f (t) − f (s)| < ε whenever
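t, s ∈ [0, 1] are such that |t − s| < δ. Let N ∈ N be large enough so that $2^{-N} < \delta$ and n ≥ N. Let t ∈ [0, 1] and $I \in D_n$ such that t ∈ I. Then by Theorem 4.26,
\[ |E_n f(t) - f(t)| = |\langle f\rangle_I - f(t)| \le |I|^{-1} \int_I |f(s) - f(t)|\,ds < \varepsilon. \]
The last inequality holds because $|s - t| \le |I| = 2^{-n} < \delta$ for all s ∈ I.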
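The estimate in this proof can also be observed numerically. The following Python sketch is only an illustration under stated assumptions: the test function `f`, the helper name `dyadic_average_error` and the grid resolution are hypothetical choices, and the averages $\langle f\rangle_I$ are approximated by a midpoint rule. It uses that $E_n f$ equals $\langle f\rangle_I$ on each $I \in D_n$, as in the display above.

```python
import numpy as np

def dyadic_average_error(f, n, samples_per_interval=64):
    """Approximate sup_t |E_n f(t) - f(t)|, using E_n f = <f>_I on each I in D_n."""
    num_intervals = 2 ** n
    # Midpoints of a fine uniform grid on [0, 1), grouped by dyadic interval.
    m = num_intervals * samples_per_interval
    t = (np.arange(m) + 0.5) / m
    values = f(t).reshape(num_intervals, samples_per_interval)
    averages = values.mean(axis=1, keepdims=True)  # midpoint rule for <f>_I
    return float(np.max(np.abs(values - averages)))

def f(t):
    # Hypothetical continuous test function on [0, 1] (an assumption of this sketch).
    return np.abs(t - 0.3) + np.sin(2 * np.pi * t)

# Approximate uniform errors for generations n = 1, ..., 8.
errors = [dyadic_average_error(f, n) for n in range(1, 9)]
```

For this particular f the list `errors` should shrink roughly like $2^{-n}$, since f is Lipschitz; for a merely continuous f the proof only guarantees convergence to zero, with no rate.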
Remark. This result goes back to A. Haar’s 1910 article Zur Theorie der orthogonalen Funktionensysteme in Math. Ann. 69 (1910), no. 3, p. 331–371. The functions
(En f )n are also called dyadic martingale averages of f and have wide applications in
modern analysis and probability theory.
Exercise 4.30. Recall the functions $r_n(x) = \operatorname{sgn}(\sin(2^n \pi x))$ from Exercise 4.10.
(i) Show that every rn for n ≥ 1 can be written as a finite linear combination of Haar
functions and determine the coefficients of this linear combination.
(ii) Show that the orthonormal system on [0, 1] given by (rn )n is not complete.
Exercise 4.31. Define
\[ \Delta_n f = E_{n+1} f - E_n f, \qquad Sf = \Big( \sum_{n \ge 1} |\Delta_n f|^2 \Big)^{1/2}. \]
(i) Assume that $\int_0^1 f = 0$. Prove that $\|Sf\|_2 = \|f\|_2$.
(ii) Show that for every m ∈ N there exists a function $f_m$ that is a finite linear combination of Haar functions such that $\sup_{x\in[0,1]} |f_m(x)| \le 1$ and $\sup_{x\in[0,1]} |Sf_m(x)| \ge m$.
4. Trigonometric polynomials
In the following we will only be concerned with the trigonometric system on [0, 1]:
\[ \phi_n(x) = e^{2\pi i n x} \quad (n \in \mathbb{Z}) \]
Definition 4.32. A trigonometric polynomial is a function of the form
\[ f(x) = \sum_{n=-N}^{N} c_n e^{2\pi i n x} \quad (x \in \mathbb{R}), \tag{4.9} \]
\[ f(x) = a_0 + \sum_{n=1}^{N} (a_n \cos(2\pi n x) + b_n \sin(2\pi n x)). \tag{4.10} \]
Exercise 4.33. Work out how the coefficients an , bn in (4.10) are related to the cn
in (4.9).
96 4. APPROXIMATION OF FUNCTIONS
One goal in this section is to show that this orthonormal system is in fact complete.
We denote by pc the space of piecewise continuous, 1-periodic functions f : R →
C (let us call a 1-periodic function piecewise continuous, if its restriction to [0, 1] is
piecewise continuous in the sense defined in the beginning of this section).
Definition 4.35. For a 1-periodic function f ∈ pc and n ∈ Z we define the nth Fourier coefficient by
\[ \hat f(n) = \int_0^1 f(t) e^{-2\pi i n t}\,dt. \]
The series
\[ \sum_{n=-\infty}^{\infty} \hat f(n) e^{2\pi i n x} \]
is called the Fourier series of f.
The question of when the Fourier series of a function f converges and in what sense
it represents the function f is a very subtle issue and we will only scratch the surface
in this lecture.
Definition 4.36. For a 1-periodic function f ∈ pc we define the partial sums
\[ S_N f(x) = \sum_{n=-N}^{N} \hat f(n) e^{2\pi i n x}. \]
where
\[ D_N(x) = \sum_{n=-N}^{N} e^{2\pi i n x}. \]
The sequence of functions $(D_N)_N$ is called the Dirichlet kernel. The Dirichlet kernel can be written more explicitly.
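Lemma 4.40. We have
\[ D_N(x) = \frac{\sin(2\pi(N + \frac12)x)}{\sin(\pi x)} \]
Proof.
\[ D_N(x) = \sum_{n=-N}^{N} e^{2\pi i n x} = e^{-2\pi i N x} \sum_{n=0}^{2N} e^{2\pi i n x} = e^{-2\pi i N x}\, \frac{e^{2\pi i (2N+1)x} - 1}{e^{2\pi i x} - 1} \]
\[ = \frac{e^{2\pi i (N + \frac12)x} - e^{-2\pi i (N + \frac12)x}}{e^{\pi i x} - e^{-\pi i x}} = \frac{\sin(2\pi(N + \frac12)x)}{\sin(\pi x)}. \]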
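As a sanity check of Lemma 4.40, one can compare the defining sum of exponentials with the closed form at sample points. The Python sketch below is an illustration only; the function names and the sample grid are assumptions of this sketch, and the grid avoids the integers, where sin(πx) = 0.

```python
import numpy as np

def dirichlet_sum(N, x):
    """D_N(x) as the defining sum of e^{2 pi i n x} over n = -N..N."""
    n = np.arange(-N, N + 1)
    return np.exp(2j * np.pi * np.outer(x, n)).sum(axis=1)

def dirichlet_closed(N, x):
    """D_N(x) via the closed form of Lemma 4.40 (requires sin(pi x) != 0)."""
    return np.sin(2 * np.pi * (N + 0.5) * x) / np.sin(np.pi * x)

x = np.linspace(0.01, 0.99, 197)  # sample points strictly between 0 and 1
# Largest discrepancy between the two formulas over a few values of N.
max_gap = max(
    np.max(np.abs(dirichlet_sum(N, x) - dirichlet_closed(N, x)))
    for N in (1, 5, 20)
)
```

The quantity `max_gap` should be of the size of floating-point rounding error. This is of course no substitute for the proof; it only guards against sign or index slips in the computation.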
We would like to approximate continuous functions by trigonometric polynomials.
If f is only continuous it may happen that SN f (x) does not converge. However, instead
of SN f we may also consider their arithmetic means. We define the Fejér kernel by
\[ K_N(x) = \frac{1}{N+1} \sum_{n=0}^{N} D_n(x). \]
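To get a feeling for why averaging helps, the following Python sketch evaluates $K_N$ directly from this definition and compares it with $D_N$. It is an illustration under assumptions: the grid, the value N = 8 and all function names are hypothetical choices, and the closed form $\frac{1}{N+1}\big(\frac{\sin(\pi(N+1)x)}{\sin(\pi x)}\big)^2$ used for comparison is the standard formula for the Fejér kernel, quoted here without proof.

```python
import numpy as np

def dirichlet(n, x):
    """D_n(x) as the sum of e^{2 pi i k x} over k = -n..n (a real number)."""
    k = np.arange(-n, n + 1)
    return np.exp(2j * np.pi * np.outer(x, k)).sum(axis=1).real

def fejer(N, x):
    """K_N(x) = (1/(N+1)) * sum_{n=0}^{N} D_n(x), straight from the definition."""
    return sum(dirichlet(n, x) for n in range(N + 1)) / (N + 1)

N = 8
M = 400
x = (np.arange(M) + 0.5) / M - 0.5   # midpoints of a uniform grid on [-1/2, 1/2]
K = fejer(N, x)
D = dirichlet(N, x)

integral_K = K.mean()                # midpoint rule over one period
closed_form = (np.sin(np.pi * (N + 1) * x) / np.sin(np.pi * x)) ** 2 / (N + 1)
```

On this grid `K` stays non-negative up to rounding and `integral_K` is 1 up to rounding, whereas `D` takes clearly negative values. Non-negativity is the feature that the Dirichlet kernel lacks.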
Remark. There is no unity for the convolution of functions. More precisely, there
exists no continuous function k such that k ∗ f = f for all continuous, 1-periodic f (this
is the content of Exercise 4.62). An approximation of unity is a sequence (kn )n that
approximates unity:
\[ \lim_{n\to\infty} k_n * f = f \]
for every continuous, 1-periodic f .
Theorem 4.45. Let (kn )n be a sequence of 1-periodic continuous functions such
that
(1) $k_n(x) \ge 0$.
(2) $\int_{-1/2}^{1/2} k_n(t)\,dt = 1$.
(3) For all 1/2 ≥ δ > 0 we have
\[ \int_{-\delta}^{\delta} k_n(t)\,dt \to 1 \]
as n → ∞.
Then (kn )n is an approximation of unity.
Assumption (3) is a precise way to express the idea that the “mass” of $k_n$ concentrates near the origin. Keeping in mind Assumption (2), Assumption (3) can be rewritten equivalently as:
\[ \int_{\frac12 \ge |t| \ge \delta} k_n(t)\,dt \to 0 \]
where
\[ A = \int_{|t| \le \delta} (f(x - t) - f(x)) k_n(t)\,dt, \qquad B = \int_{\frac12 \ge |t| \ge \delta} (f(x - t) - f(x)) k_n(t)\,dt. \]
In view of Theorem 4.15 this means that the trigonometric system is complete.
Corollary 4.48 (Parseval’s theorem). If f, g are 1-periodic, continuous functions,
then
\[ \langle f, g\rangle = \sum_{n=-\infty}^{\infty} \hat f(n) \overline{\hat g(n)}. \]
In particular,
\[ \|f\|_2^2 = \sum_{n=-\infty}^{\infty} |\hat f(n)|^2. \tag{4.13} \]
Proof. We have
\[ \langle S_N f, g\rangle = \sum_{n=-N}^{N} \hat f(n) \langle e^{2\pi i n x}, g\rangle = \sum_{n=-N}^{N} \hat f(n) \overline{\hat g(n)}. \]
Remark. Theorem 4.47 and Corollary 4.48 also hold for piecewise continuous and 1-periodic functions.
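Identity (4.13) can be checked numerically for a concrete function. The Python sketch below is an illustration under assumptions: the test function `f`, the number of sample points `M` and the truncation at |n| ≤ 20 are hypothetical choices, and both the integrals defining the Fourier coefficients and $\|f\|_2^2$ are replaced by Riemann sums over equally spaced points.

```python
import numpy as np

M = 256                                   # number of sample points in [0, 1)
t = np.arange(M) / M

def f(t):
    # Hypothetical smooth 1-periodic test function (an assumption of this sketch).
    return np.exp(np.cos(2 * np.pi * t)) + 1j * np.sin(4 * np.pi * t)

def fourier_coefficient(n):
    """Riemann-sum approximation of the n-th Fourier coefficient of f."""
    return np.mean(f(t) * np.exp(-2j * np.pi * n * t))

norm_squared = np.mean(np.abs(f(t)) ** 2)     # approximates ||f||_2^2
coefficient_sum = sum(abs(fourier_coefficient(n)) ** 2 for n in range(-20, 21))
```

For this f the Fourier coefficients decay very quickly, so the truncated sum `coefficient_sum` already agrees with `norm_squared` to high accuracy. For a function with slowly decaying coefficients many more terms would be needed.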
Exercise 4.49. (i) Let f be the 1-periodic function such that f(x) = x for x ∈ [0, 1). Compute the Fourier coefficient $\hat f(n)$ for every n ∈ Z and use Parseval’s theorem to derive the formula
\[ \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}. \]
(ii) Using Parseval’s theorem for a suitable 1-periodic function, determine the value of $\sum_{n=1}^{\infty} \frac{1}{n^4}$.
While the Fourier series of a continuous function does not necessarily converge pointwise, we can obtain pointwise convergence easily if we impose additional conditions.
Theorem 4.50. Let f be a 1-periodic continuous function and let x ∈ R. Assume
that f is differentiable at x. Then SN f (x) → f (x) as N → ∞.
Proof. By definition,
\[ S_N f(x) = \int_0^1 f(x - t) D_N(t)\,dt. \]
Also,
\[ \int_0^1 D_N(t)\,dt = \sum_{n=-N}^{N} \int_0^1 e^{2\pi i n t}\,dt = 1. \]
where
\[ g(t) = \frac{f(x - t) - f(x)}{\sin(\pi t)}. \]
Differentiability of f at x implies that g is continuous at 0. Indeed,
\[ \frac{f(x - t) - f(x)}{\sin(\pi t)} = \frac{f(x - t) - f(x)}{t} \cdot \frac{t}{\sin(\pi t)} \to -f'(x) \cdot \frac{1}{\pi} \]
as t → 0.
Exercise 4.51. Show that $\phi_n(x) = \sqrt{2} \sin(2\pi(n + \frac12)x)$ with n = 1, 2, . . . defines an orthonormal system on [0, 1].
With this exercise, the claim follows from (4.14) and the Riemann-Lebesgue lemma
(Corollary 4.13).
Exercise 4.52. Show that there exists a constant c > 0 such that
\[ \int_0^1 |D_N(x)|\,dx \ge c \log(2 + N) \]
\[ \sigma_N = \frac{s_1 + \cdots + s_N}{N}. \]
$\sigma_N$ is called the Nth Cesàro mean of the sequence $s_k$ or the Nth Cesàro sum of the series $\sum_{k=1}^{\infty} a_k$. If $\sigma_N$ converges to a limit S we say that the series $\sum_{k=1}^{\infty} a_k$ is Cesàro summable to S.
(ii) Prove that if $\sum_{k=1}^{\infty} a_k$ is summable to S (i.e. by definition converges with sum S) then $\sum_{k=1}^{\infty} a_k$ is Cesàro summable to S.
(iii) Prove that the sum $\sum_{k=1}^{\infty} (-1)^{k-1}$ does not converge but is Cesàro summable to some limit S and determine S.
5. The Stone-Weierstrass Theorem
Exercise 4.56. Let A ⊂ C([1, 2]) be the set of all polynomials of the form $p(x) = \sum_{k=0}^{n} c_k x^{2k+1}$ where $c_k \in \mathbb{C}$ and n a non-negative integer. Show that A is dense, but not an algebra.
Before we begin the proof of the Stone-Weierstrass theorem we first need some
preliminary lemmas.
Lemma 4.57. For every a > 0 there exists a sequence of polynomials $(p_n)_n$ with real coefficients such that $p_n(0) = 0$ for all n and $\sup_{y\in[-a,a]} |p_n(y) - |y|| \to 0$ as n → ∞.
Proof. From Weierstrass’ theorem we get that there exists a sequence of polynomials $q_n$ that converges uniformly to f(y) = |y| on [−a, a]. In particular $q_n(0) \to 0$. Now set $p_n(y) = q_n(y) - q_n(0)$.
Exercise 4.58. Work out an explicit sequence of polynomials $(p_n)_n$ that converges uniformly to x ↦ |x| on [−1, 1].
Let A ⊂ C(K) satisfy conditions (1),(2),(3). Observe that then also the closure $\overline{A}$ satisfies (1), (2), (3).
We may assume without loss of generality that we are dealing with real-valued
functions (otherwise split functions into real and imaginary parts f = g + ih and go
through the proof for both parts).
Lemma 4.59. If $f \in \overline{A}$, then $|f| \in \overline{A}$.
Proof. Let ε > 0 and $a = \max_{x\in K} |f(x)|$. By Lemma 4.57 there exist $c_1, \dots, c_n \in \mathbb{R}$ such that
\[ \Big| \sum_{i=1}^{n} c_i y^i - |y| \Big| \le \varepsilon. \]
Claim: For every x ∈ K there exists $g_x \in \overline{A}$ such that $g_x(x) = f(x)$ and $g_x(t) > f(t) - \varepsilon$ for t ∈ K.
Proof of Claim. Let y ∈ K. By Lemma 4.61 there exists hy ∈ A such that
hy (x) = f (x) and hy (y) = f (y). By continuity of hy there exists an open ball By
around y such that |hy (t) − f (t)| < ε for all t ∈ By . In particular,
hy (t) > f (t) − ε.
Observe that (By )y∈K is an open cover of K. Since K is compact, we can find a finite
subcover by By1 , . . . , Bym . Set
gx = max(hy1 , . . . , hym ).
By Lemma 4.60, $g_x \in \overline{A}$.
By continuity of gx there exists an open ball Ux such that
|gx (t) − f (t)| < ε
for t ∈ Ux . In particular,
gx (t) < f (t) + ε.
(Ux )x∈K is an open cover of K which has a finite subcover by Ux1 , . . . , Uxn . Then let
h = min(gx1 , . . . , gxn ).
By Lemma 4.60 we have $h \in \overline{A}$. Also,
f (t) − ε < h(t) < f (t) + ε
for all t ∈ K. That is,
|f (t) − h(t)| < ε
for all t ∈ K. This proves that $f \in \overline{A}$.
6. Further exercises
Exercise 4.62. Show that there exists no continuous 1-periodic function g such
that f ∗ g = f holds for all continuous 1-periodic functions f .
Hint: Use the Riemann-Lebesgue lemma.
Exercise 4.63. Give an alternative proof of Weierstrass’ theorem by using Fejér’s
theorem and then approximating the resulting trigonometric polynomials by truncated
Taylor expansions.
Exercise 4.64. Find a sequence of continuous functions $(f_n)_n$ on [0, 1] and a continuous function f on [0, 1] such that $\|f_n - f\|_2 \to 0$, but $f_n(x)$ does not converge to f(x) for any x ∈ [0, 1].
Exercise 4.65 (Weighted $L^2$ norms). Fix a function w ∈ C([a, b]) that is non-negative and does not vanish identically. Let us define another inner product by
\[ \langle f, g\rangle_{L^2(w)} = \int_a^b f(x) \overline{g(x)} w(x)\,dx \]
and a corresponding norm $\|f\|_{L^2(w)} = \langle f, f\rangle_{L^2(w)}^{1/2}$. Similarly, we say that $(\phi_n)_n$ is an orthonormal system by asking that $\langle \phi_n, \phi_m\rangle_{L^2(w)}$ is 1 if n = m and 0 otherwise. Verify that all theorems in Section 2 continue to hold when $\langle\cdot,\cdot\rangle$, $\|\cdot\|_2$ are replaced by $\langle\cdot,\cdot\rangle_{L^2(w)}$, $\|\cdot\|_{L^2(w)}$, respectively.
Exercise 4.66. Let w ∈ C([0, 1]) be such that w(x) ≥ 0 for all x ∈ [0, 1] and w ≢ 0. Prove that there exists a sequence of real-valued polynomials $(p_n)_n$ such that $p_n$ is of degree n and
\[ \int_0^1 p_n(x) p_m(x) w(x)\,dx = \begin{cases} 1, & \text{if } n = m, \\ 0, & \text{if } n \ne m \end{cases} \]
for all non-negative integers n, m.
Hint: Use the same proof as seen for $L^2_c(a, b)$ in the lecture!
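One standard way to produce such polynomials is the Gram-Schmidt procedure applied to the monomials 1, x, x², . . . with respect to the weighted inner product; whether this is the proof meant by the hint is an assumption on our part. The Python sketch below carries this out numerically for a hypothetical polynomial weight w(x) = 1 + x, for which all integrals can be computed exactly by polynomial integration. The names `inner` and `gram_schmidt` are illustrative, not notation from the notes.

```python
import math
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical weight w(x) = 1 + x on [0, 1]: continuous, non-negative, not zero.
w = Polynomial([1.0, 1.0])

def inner(p, q):
    """Weighted inner product: integral over [0, 1] of p(x) q(x) w(x) dx."""
    antiderivative = (p * q * w).integ()
    return float(antiderivative(1.0) - antiderivative(0.0))

def gram_schmidt(max_degree):
    """Orthonormalize 1, x, ..., x^max_degree with respect to inner()."""
    basis = []
    for n in range(max_degree + 1):
        p = Polynomial.basis(n)            # the monomial x^n
        for q in basis:
            p = p - inner(p, q) * q        # remove the component along q
        basis.append(p / math.sqrt(inner(p, p)))
    return basis

polys = gram_schmidt(3)
# Matrix of all inner products; should be close to the identity matrix.
gram = np.array([[inner(p, q) for q in polys] for p in polys])
```

The matrix `gram` is the identity up to rounding, and the n-th polynomial in `polys` has degree n. The sketch says nothing about why the procedure never divides by zero; that is the point where the assumptions on w enter in a proof.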
Exercise 4.71. Let f be the 1-periodic function such that f (x) = |x| for x ∈
[−1/2, 1/2]. Determine explicitly a sequence of trigonometric polynomials (pN )N such
that pN → f uniformly as N → ∞.
Exercise 4.72. Let f, g be continuous, 1-periodic functions.
(i) Show that $\widehat{f * g}(n) = \hat f(n) \hat g(n)$.
(ii) Show that $\widehat{f \cdot g}(n) = \sum_{m\in\mathbb{Z}} \hat f(n - m) \hat g(m)$.
(iii) If f is continuously differentiable, prove that $\widehat{f'}(n) = 2\pi i n \hat f(n)$.
(iv) Let y ∈ R and set $f_y(x) = f(x + y)$. Show that $\widehat{f_y}(n) = e^{2\pi i n y} \hat f(n)$.
(v) Let m ∈ Z, m ≠ 0 and set $f_m(x) = f(mx)$. Show that $\widehat{f_m}(n)$ equals $\hat f(\frac{n}{m})$ if m divides n and zero otherwise.
Exercise 4.73 (Legendre polynomials). Define $p_n(x) = \frac{d^n}{dx^n} [(1 - x^2)^n]$ for n = 0, 1, . . . and
\[ \phi_n(x) = p_n(x) \cdot \Big( \int_{-1}^{1} p_n(t)^2\,dt \Big)^{-1/2}. \]
Show that $(\phi_n)_{n=0,1,\dots}$ is a complete orthonormal system on [−1, 1].
Exercise 4.74. Let f be 1-periodic and k times continuously differentiable. Prove
that there exists a constant c > 0 such that
\[ |\hat f(n)| \le c |n|^{-k} \quad \text{for all } n \in \mathbb{Z}. \]
Hint: What can you say about the Fourier coefficients of $f^{(k)}$?
Exercise 4.75. Let f be 1-periodic and continuous.
(i) Suppose that $\hat f(n) = -\hat f(-n) \ge 0$ holds for all n ≥ 0. Prove that
\[ \sum_{n=1}^{\infty} \frac{\hat f(n)}{n} < \infty. \]
(ii) Show that there does not exist a 1-periodic continuous function f such that
\[ \hat f(n) = \frac{\operatorname{sgn}(n)}{\log |n|} \quad \text{for all } |n| \ge 2. \]
Here sgn(n) = 1 if n > 0 and sgn(n) = −1 if n < 0.
Exercise 4.76. Suppose that f is a 1-periodic function such that there exists c > 0 and α ∈ (0, 1] such that
\[ |f(x) - f(y)| \le c |x - y|^{\alpha} \]
holds for all x, y ∈ R. Show that the sequence of partial sums
\[ S_N f(x) = \sum_{n=-N}^{N} \hat f(n) e^{2\pi i n x} \]
converges uniformly to f as N → ∞.
Exercise 4.77. Let f ∈ C([0, 1]) and A ⊂ C([0, 1]) dense. Suppose that
\[ \int_0^1 f(x) a(x)\,dx = 0 \]
Exercise 4.80. Suppose f ∈ C([1, ∞)) and $\lim_{x\to+\infty} f(x) = a$. Show that f can
be uniformly approximated on [1, ∞) by functions of the form g(x) = p(1/x), where p
is a polynomial.
Exercise 4.81 (Stone-Weierstrass for finite sets). Let K be a finite set and A
a family of functions on K that is an algebra (i.e. closed under taking finite linear
combinations and products), separates points and vanishes nowhere. Give a purely
algebraic proof that A must then already contain every function on K. (That means
your proof is not allowed to use the concept of an inequality. In particular, you are not
allowed to use any facts about metric spaces such as the Stone-Weierstrass theorem.)
Hint: Take a close look at the proof of Stone-Weierstrass.
Exercise 4.82 (Uniform approximation by neural networks). Let $\sigma(t) = e^t$ for t ∈ R. Fix n ∈ N and let $K \subset \mathbb{R}^n$ be a compact set. As usual, let C(K) denote the space of real-valued continuous functions on K. Define a class of functions N ⊂ C(K) by saying that µ ∈ N iff there exist m ∈ N, $W \in \mathbb{R}^{m\times n}$, $v, b \in \mathbb{R}^m$ such that
\[ \mu(x) = \sum_{i=1}^{m} \sigma((Wx)_i + b_i) v_i \quad \text{for all } x \in K. \]
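To make the definition concrete, here is a minimal Python sketch that evaluates one such function µ. The particular matrix `W`, vectors `b`, `v` and input `x` are hypothetical values chosen only for illustration (n = 3 inputs, m = 2 hidden units).

```python
import numpy as np

def mu(x, W, b, v):
    """Evaluate mu(x) = sum_i sigma((W x)_i + b_i) * v_i with sigma(t) = e^t."""
    return float(np.sum(np.exp(W @ x + b) * v))

# Hypothetical parameters with n = 3 inputs and m = 2 hidden units.
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 2.0, 0.0]])
b = np.array([0.0, -1.0])
v = np.array([2.0, -3.0])
x = np.array([1.0, 0.5, 1.0])

value = mu(x, W, b, v)
```

Here W x + b = (0, 0.5), so `value` equals 2 − 3e^{1/2}. In the language of neural networks, W and b describe one hidden layer, σ is the activation function and v collects the output weights.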
[Figure: network diagram with inputs $x_1$, $x_2$, $x_3$ and output µ(x).]
Consequently
\[ b - a = \sum_{i=1}^{N} (x_i - x_{i-1}) = \sum_{\nu=1}^{K} \sum_{i\in J_\nu} (x_i - x_{i-1}) \le \sum_{\nu=1}^{K} \ell(J_\nu) \le \sum_{\nu=1}^{N} \ell(I_\nu). \]
112 5. FROM RIEMANN TO LEBESGUE*
Lemma 5.6. Let E be a compact Lebesgue null set. Then E has content zero.
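Proof. Let ε > 0. Since E is a null set there is a countable family $\{I_\nu\}_{\nu\in\mathbb{N}}$ of closed intervals covering E such that $\sum_{\nu=1}^{\infty} \ell(I_\nu) < \varepsilon/2$. Write $I_\nu = [a_\nu, b_\nu]$ and form the slightly larger open intervals $\tilde I_\nu = (a_\nu - \varepsilon 2^{-\nu-2}, b_\nu + \varepsilon 2^{-\nu-2})$ so that $\ell(\tilde I_\nu) = \ell(I_\nu) + \varepsilon 2^{-\nu-1}$ and thus
\[ \sum_{\nu=1}^{\infty} \ell(\tilde I_\nu) \le \sum_{\nu=1}^{\infty} \ell(I_\nu) + \sum_{\nu=1}^{\infty} \varepsilon 2^{-\nu-1} < \varepsilon/2 + \varepsilon/2 = \varepsilon. \]
Since E is compact we may choose finitely many $\tilde I_{\nu_1}, \dots, \tilde I_{\nu_M}$ such that $E \subset \bigcup_{l=1}^{M} \tilde I_{\nu_l}$ and $\sum_{l=1}^{M} \ell(\tilde I_{\nu_l}) \le \sum_{\nu=1}^{\infty} \ell(\tilde I_\nu) < \varepsilon$. Hence E has content zero.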
Corollary 5.7. Let a < b. Then [a, b] is not a Lebesgue null set.
Proof. This is an immediate consequence of Lemma 5.5 together with Lemma 5.6.
Exercise 5.8. Let E be the set of rational numbers in [a, b]. Show that E is a
Lebesgue null set but E is not of content zero.
The Lebesgue null sets are usually called sets of Lebesgue measure zero. We avoid this terminology because we have not defined Lebesgue measure here and indeed have not identified the class of sets on which it can be defined (the so-called Lebesgue measurable sets). A substitute for Lebesgue measure which can be defined on all subsets of R is Lebesgue outer measure:
Definition 5.9. For a subset E of R the Lebesgue outer measure $\lambda^*(E)$ of E is defined as the quantity $\lambda^*(E) = \inf \sum_{n=1}^{\infty} \ell(I_n)$ where the infimum is taken over all countable collections $\{I_n\}_{n\in\mathbb{N}}$ of intervals which have the property that $E \subset \bigcup_{n=1}^{\infty} I_n$.
With this definition, the Lebesgue null sets are simply the sets of Lebesgue outer
measure zero.
2. Lebesgue’s Characterization of the Riemann integral
We can now formulate the main theorem of this chapter.
Theorem 5.10. Let f : [a, b] → R be a bounded function. Then f is Riemann
integrable if and only if the set of discontinuities of f ,
Df := {x ∈ [a, b] : f is not continuous at x},
is a Lebesgue null set.
The following lemma linking oscillation to lower and upper sums is very helpful in
the proof of Theorem 5.10.
Lemma 5.11. Let f : [a, b] → R be a bounded function and assume that $\operatorname{osc}_f(x) < \gamma$ for all x ∈ [a, b]. Then there is a partition P of [a, b] such that U(f, P) − L(f, P) < γ(b − a).
Proof. By definition of $\operatorname{osc}_f(x)$ we can find a $\delta_x > 0$ such that
\[ M_{f,2\delta_x}(x) - m_{f,2\delta_x}(x) < \gamma. \]
Since [a, b] is compact we find $x_1, \dots, x_N$ such that [a, b] is contained in the union of the intervals $(x_i - \delta_{x_i}, x_i + \delta_{x_i})$. Consider the finite set consisting of a, b, the points $x_i$, and the corresponding points $x_i - \delta_{x_i}$ and $x_i + \delta_{x_i}$, and then discard those points which do not lie in [a, b]. The resulting set P is a partition of [a, b] with nodes $a = t_0 < \dots < t_M = b$ and if $t_{i-1}, t_i$ are consecutive nodes in this partition then
\[ \sup\{f(t) : t \in [t_{i-1}, t_i]\} - \inf\{f(t) : t \in [t_{i-1}, t_i]\} < \gamma. \]
Hence
\[ U(f, P) - L(f, P) < \gamma \sum_{i=1}^{M} (t_i - t_{i-1}) = \gamma(b - a) \]
and the lemma is proved.
Proof of Theorem 5.10. Part 1: Set of discontinuities is a null set =⇒ f is
Riemann integrable. By Lemma A.27 it suffices to construct, for given ε > 0, a partition
P such that
(5.1) U (f, P) − L(f, P) < ε.
The function f is bounded and thus there is C > 0 such that |f(x)| ≤ C for x ∈ [a, b]. Now choose $\varepsilon_1 > 0$ depending on ε; we will see (only at the end) that
\[ \varepsilon_1 = \frac{\varepsilon}{2C + b - a} \]
is an appropriate choice. Consider the set
\[ D(\varepsilon_1) = \{x \in [a, b] : \operatorname{osc}_f(x) \ge \varepsilon_1\}. \]
D(ε1 ) is a Lebesgue null set since D(ε1 ) ⊂ Df and Df is a Lebesgue null set. Also,
D(ε1 ) is a closed subset of [a, b], and thus compact and thus has content zero.
Thus there is a finite collection $\{I_\nu\}_{\nu=1}^{N}$ of closed intervals such that $\sum_{\nu=1}^{N} \ell(I_\nu) < \varepsilon_1$ and $D(\varepsilon_1) \subset \bigcup_{\nu=1}^{N} (I_\nu)^\circ$ (where $(I_\nu)^\circ$ denotes the interior of $I_\nu$).
We may choose a partition $P = \{a = x_0 < \dots < x_N = b\}$ such that each index i belongs to (at least) one of the following sets:
\[ J_1 = \{i : [x_{i-1}, x_i] \subset I_\nu \text{ for some } \nu \in \{1, \dots, N\}\}, \]
\[ J_2 = \{i : [x_{i-1}, x_i] \cap D(\varepsilon_1) = \emptyset\}. \]
Regarding the intervals $[x_{i-1}, x_i]$ with $i \in J_1$ we have
\[ \sum_{i\in J_1} (x_i - x_{i-1}) \le \sum_{\nu=1}^{N} \sum_{i:\,[x_{i-1},x_i]\subset I_\nu} (x_i - x_{i-1}) \le \sum_{\nu=1}^{N} \ell(I_\nu) < \varepsilon_1. \tag{5.2} \]
We observe that for all $i \in J_2$ we have $\operatorname{osc}_f(x) < \varepsilon_1$ for all $x \in [x_{i-1}, x_i]$. Thus by Lemma 5.11, we find a partition $P_i$ of $[x_{i-1}, x_i]$, labeled $\{x_{i-1} = x_{i,0} < \dots < x_{i,N_i} = x_i\}$, such that with
\[ U^i(f, P_i) := \sum_{j=1}^{N_i} (x_{i,j} - x_{i,j-1}) \sup_{[x_{i,j-1}, x_{i,j}]} f(x), \]
\[ L^i(f, P_i) := \sum_{j=1}^{N_i} (x_{i,j} - x_{i,j-1}) \inf_{[x_{i,j-1}, x_{i,j}]} f(x) \]
we have
\[ U^i(f, P_i) - L^i(f, P_i) < \varepsilon_1 (x_i - x_{i-1}). \tag{5.3} \]
In view of our choice ε1 = ε/(2C + b − a) we have proved the desired inequality (5.1).
Let (X, d) be a metric space. Recall that the interior $A^\circ$ of a set A ⊂ X is the set of interior points of A, i.e. the set of all x ∈ A such that there exists ε > 0 such that $B_\varepsilon(x) \subset A$. A set A ⊂ X is dense if $\overline{A} = X$. Note that A is dense if and only if for all non-empty open sets U ⊂ X we have A ∩ U ≠ ∅.
Definition 6.1. A set A ⊂ X is called nowhere dense if its closure has empty interior. In other words, if $(\overline{A})^\circ = \emptyset$. Equivalently, A is nowhere dense if and only if $\overline{A}$ contains no non-empty open set.
Note that:
1. A closed set A ⊂ X has empty interior if and only if $A^c = X \setminus A$ is open and dense. (This is because A is closed if and only if $A^c$ is open and A has empty interior if and only if $A^c$ is dense.)
2. A is nowhere dense if and only if $A^c$ contains an open dense set.
3. A is nowhere dense if and only if A is contained in a closed set with empty interior.
Example 6.2. The Cantor set
\[ C = [0, 1] \setminus \bigcup_{\ell=0}^{\infty} \bigcup_{k=0}^{3^\ell - 1} \Big( \frac{3k+1}{3^{\ell+1}}, \frac{3k+2}{3^{\ell+1}} \Big) \tag{6.1} \]
is a closed subset of [0, 1] and has empty interior. Therefore, it is nowhere dense.
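Formula (6.1) translates directly into a membership test, at least for finitely many levels ℓ. The Python sketch below is an illustration only: the function names are hypothetical, exact rational arithmetic is used to avoid rounding, and surviving a finite number of levels is of course only a necessary condition for lying in C. It uses that x lies in $(\frac{3k+1}{3^{\ell+1}}, \frac{3k+2}{3^{\ell+1}})$ for some k if and only if the fractional part of $3^\ell x$ lies in $(\frac13, \frac23)$.

```python
from fractions import Fraction
from math import floor

def removed_at_level(x, level):
    """True if x lies in one of the open intervals removed at this level of (6.1)."""
    y = x * 3 ** level
    k = floor(y)                       # candidate index k, so that k <= y < k + 1
    return Fraction(1, 3) < y - k < Fraction(2, 3)

def survives(x, levels):
    """True if x in [0, 1] is not removed at any of the levels 0, ..., levels - 1."""
    return 0 <= x <= 1 and not any(removed_at_level(x, l) for l in range(levels))
```

For example, `survives(Fraction(1, 4), 30)` returns True (in fact 1/4 belongs to C, although it is not an endpoint of a removed interval), while `survives(Fraction(1, 2), 30)` returns False because 1/2 is removed already at level 0.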
Lemma 6.3. Suppose $A_1, \dots, A_n \subset X$ are nowhere dense sets. Then $\bigcup_{k=1}^{n} A_k$ is nowhere dense.
Proof. Without loss of generality let n = 2. We need to show that $\overline{A_1 \cup A_2}$ has empty interior. Equivalently, setting $U_k = (\overline{A_k})^c$ for k = 1, 2, we show that $U_1 \cap U_2$ is dense. Let U ⊂ X be a non-empty open set. Then $V_1 = U \cap U_1$ is open and non-empty, because $U_1$ is dense. Since $U_2$ is also dense, $V_1 \cap U_2 = U \cap (U_1 \cap U_2)$ is non-empty, so $U_1 \cap U_2$ is dense.
Also, a subset of a nowhere dense set is nowhere dense and the closure of a nowhere
dense set is nowhere dense.
However, countable unions of nowhere dense sets are not necessarily nowhere dense
sets.
Example 6.4. Enumerate the rationals as $\mathbb{Q} = \{q_1, q_2, \dots\}$. For every k = 1, 2, . . . , the set $A_k = \{q_k\}$ is nowhere dense in R. But $\mathbb{Q} = \bigcup_{k=1}^{\infty} A_k \subset \mathbb{R}$ is not nowhere dense (it is dense!).
Definition 6.5. A set A ⊂ X is called meager (or of first category) in X if it is the countable union of nowhere dense sets. A is called comeager (or residual or of second category) if $A^c$ is meager.
116 6. THE BAIRE CATEGORY THEOREM*
The above example shows that Q ⊂ R is meager. In fact, every countable subset of
R is meager (because single points are nowhere dense in R).
By definition, countable unions of meager sets are meager. The choice of the word
“meager” suggests that meager sets are somehow “small” or “negligible”. But how
“large” can meager sets be? For example, can X be meager? That is, can we write
the entire metric space X as a countable union of nowhere dense subsets? The Baire
category theorem will show that the answer is no, if X is complete.
Theorem 6.6 (Baire category theorem). In a complete metric space, meager sets
have empty interior. Equivalently, countable intersections of open dense sets are dense.
Corollary 6.7. Let X be a complete metric space and A ⊂ X a meager set. Then A ≠ X. In other words, X is not a meager subset of itself.
Example 6.8. The conclusion of the Baire category theorem fails if we drop the
assumption that X is complete. Consider X = Q with the metric inherited from R
(so d(p, q) = |p − q|). Then X is a meager subset of itself because it is countable and
single points are nowhere dense in X (X has no isolated points). But the interior of X
is non-empty, because X is open in X.
Example 6.9. Not every set with empty interior is meager: consider the irrational numbers A = R \ Q. A has empty interior, because $A^c = \mathbb{Q}$ is dense. It is not meager, because otherwise $\mathbb{R} = A \cup A^c$ would be meager, which contradicts the Baire category theorem.
Exercise 6.10. Another notion of “smallness” is the following:
Definition. A set A ⊂ R is called a Lebesgue null set if for every ε > 0 there exist intervals $I_1, I_2, \dots$ such that
\[ A \subset \bigcup_{j=1}^{\infty} I_j \quad \text{and} \quad \sum_{j=1}^{\infty} |I_j| \le \varepsilon. \]
Proof of Theorem 6.6. Let $(U_n)_{n\in\mathbb{N}}$ be open dense sets. We need to show that $\bigcap_{n=1}^{\infty} U_n$ is dense. Let U ⊂ X be open and non-empty. It suffices to show that $U \cap \bigcap_{n=1}^{\infty} U_n$ is non-empty. Since $U_1$ is open and dense, $U \cap U_1$ is open and non-empty. Choose a closed ball $\overline{B}(x_1, r_1) \subset U \cap U_1$ with $r_1 \in (0, 1)$. Then $B(x_1, r_1) \cap U_2$ is open and non-empty (because $U_2$ is dense), so we can choose a closed ball $\overline{B}(x_2, r_2) \subset B(x_1, r_1) \cap U_2$ with $r_2 \in (0, \frac12)$. Iterating this process, we obtain a sequence of closed balls $(\overline{B}(x_n, r_n))_n$ such that $\overline{B}(x_n, r_n) \subset B(x_{n-1}, r_{n-1}) \cap U_n$ and $r_n \in (0, \frac1n)$. By Lemma 6.11 there exists a point x contained in $\bigcap_{n=1}^{\infty} \overline{B}(x_n, r_n)$. Since $\overline{B}(x_n, r_n) \subset U \cap U_n$ for all n ≥ 1, we have $x \in U \cap \bigcap_{n=1}^{\infty} U_n$.
The Baire category theorem has a number of interesting consequences.
It suffices to show that each An is nowhere dense. We first prove that An is closed.
Let (fk )k∈N ⊂ An be a sequence that converges to some f ∈ C([0, 1]). We show that
$f \in A_n$. Indeed, by assumption, there exists $(t_k)_{k\in\mathbb{N}} \subset [0, 1]$ such that
\[ \Big| \frac{f_k(t_k + h) - f_k(t_k)}{h} \Big| \le n \]
holds for all k ≥ 1 if $t_k + h \in [0, 1]$. By the Bolzano-Weierstrass theorem, we may assume without loss of generality that $(t_k)_{k\in\mathbb{N}}$ converges to some t ∈ [0, 1] (by passing to a subsequence). Then, by continuity of f,
\[ \Big| \frac{f(t + h) - f(t)}{h} \Big| = \lim_{k\to\infty} \Big| \frac{f_k(t_k + h) - f_k(t_k)}{h} \Big| \le n. \]
Therefore, f ∈ An and An is closed. Also, An has empty interior. Indeed, one can see
that C([0, 1]) \ An is dense because every f ∈ C([0, 1]) can be uniformly approximated
by a function that has arbitrarily large slope (think of “sawtooth” functions).
Exercise 6.13. Provide the details of this argument: show that An has empty
interior.
The Baire category theorem implies that A has empty interior. In other words, the
set of nowhere differentiable functions C([0, 1])\A is dense. In this sense, it is “generic”
behavior for continuous functions to be nowhere differentiable. In particular, we can
conclude that there exists f ∈ C([0, 1]) \ A (so f is nowhere differentiable) without
actually constructing such a function. On the other hand, one can also give explicit
examples of nowhere differentiable functions.
Example 6.14 (Weierstrass’ function). Consider the function f ∈ C([0, 1]) defined as
\[ f(x) = \sum_{n=0}^{\infty} b^{-n\alpha} \sin(b^n x), \]
where 0 < α < 1 and b > 1 are fixed. The function f is indeed continuous because
the series is uniformly convergent. In fact, f is the uniform limit of the sequence of
functions (fN )N considered in Exercise 1.105.
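The uniform convergence can be made quantitative: since |sin| ≤ 1, the N-th partial sum differs from f by at most the geometric tail $\sum_{n>N} b^{-n\alpha} = \frac{b^{-(N+1)\alpha}}{1 - b^{-\alpha}}$. The Python sketch below evaluates partial sums and this bound; the parameter values α = 1/2, b = 3 and the function names are assumptions made for illustration.

```python
import numpy as np

def weierstrass_partial(x, N, alpha=0.5, b=3.0):
    """Partial sum sum_{n=0}^{N} b^(-n*alpha) * sin(b^n * x)."""
    n = np.arange(N + 1)
    return np.sum(b ** (-n * alpha) * np.sin(np.outer(x, b ** n)), axis=1)

def tail_bound(N, alpha=0.5, b=3.0):
    """Geometric-series bound for the tail: sum over n > N of b^(-n*alpha)."""
    r = b ** (-alpha)
    return r ** (N + 1) / (1 - r)

x = np.linspace(0.0, 1.0, 1001)
# Largest difference between two partial sums on the sample grid.
gap = np.max(np.abs(weierstrass_partial(x, 25) - weierstrass_partial(x, 10)))
```

The value `gap` is bounded by `tail_bound(10)`, independently of x. Note that the terms oscillate faster and faster, so a plot on a fixed grid cannot resolve the fine structure of f; the sketch only illustrates the uniform bound.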
Exercise 6.15. Show that f is nowhere differentiable.
2. Sets of continuity*
Definition 6.16. Let X, Y be metric spaces and f : X → Y a map. The set
Cf = {x ∈ X : f is continuous at x} ⊂ X
is called the set of continuity of f . Similarly, Df = X \ Cf is called the set of
discontinuity of f .
Example 6.17. Let f : R → R be defined by f (x) = 1 if x is rational and f (x) = 0
if x is irrational. Then Cf = ∅.
Example 6.18. Let f : R → R be defined by f (x) = x if x is rational and f (x) = 0
if x is irrational. Then Cf = {0}.
Example 6.19. Consider the function f : R → R defined as follows: we set f(0) = 1 and if x ∈ Q \ {0}, then we let f(x) = 1/q, where $x = \frac{p}{q}$, where p ∈ Z, q ∈ N and the greatest common divisor of p and q is one. If x ∉ Q, then we let f(x) = 0. We claim that $C_f = \mathbb{R} \setminus \mathbb{Q}$. Indeed, say x ∈ R \ Q and $p_n/q_n \to x$ a rational approximation. Then $q_n \to \infty$ (otherwise, it must converge and then x would be rational). This implies that f is continuous at x. On the other hand, say x ∈ Q. Set $x_n = x + \frac{\sqrt{2}}{n}$. Then $x_n \notin \mathbb{Q}$ because $\sqrt{2} \notin \mathbb{Q}$, so $f(x_n) = 0$ for all n, so $\lim_{n\to\infty} f(x_n) = 0$, but f(x) ≠ 0. Hence f is not continuous at x.
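The first half of the argument can be watched in action. The Python sketch below evaluates f on a sequence of rationals converging to the irrational number √2; the recursion (p, q) ↦ (p + 2q, p + q) used to generate the approximations and the function names are assumptions of this sketch, not part of the notes.

```python
from fractions import Fraction

def f(x):
    """The function of Example 6.19 on rational inputs: f(0) = 1, f(p/q) = 1/q."""
    if x == 0:
        return Fraction(1)
    # Fraction stores x = p/q in lowest terms with q > 0.
    return Fraction(1, x.denominator)

def sqrt2_approximations(count):
    """Rationals p/q converging to sqrt(2): start at 1/1, then (p + 2q)/(p + q)."""
    p, q = 1, 1
    for _ in range(count):
        yield Fraction(p, q)
        p, q = p + 2 * q, p + q

values = [f(r) for r in sqrt2_approximations(8)]
```

The approximations are 1, 3/2, 7/5, 17/12, . . . and the corresponding values of f are 1, 1/2, 1/5, 1/12, . . . , which tend to 0 = f(√2), as the argument predicts.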
It is natural to ask which subsets of X arise as the set of continuity of some function
on X. For instance, does there exist a function f : R → R such that Cf = Q ?
Definition 6.20. A set A ⊂ X is called an Fσ -set if it is a countable union of
closed sets. A set G ⊂ X is called a Gδ -set if it is a countable intersection of open sets.
These names are motivated historically. The F in $F_\sigma$ is for fermé, which is French for closed. On the other hand, the G in $G_\delta$ is for Gebiet, which is German for region.
Examples 6.21. 1. Every open set is a $G_\delta$-set and every closed set is an $F_\sigma$-set.
2. Let x ∈ X. Then {x} is a $G_\delta$-set: it is the intersection of the open balls B(x, 1/n).
3. Q ⊂ R is an $F_\sigma$-set, because $\mathbb{Q} = \bigcup_{q\in\mathbb{Q}} \{q\}$ (a countable union of closed sets).
Theorem 6.22. Let X and Y be metric spaces and f : X → Y a map. Then
Cf ⊂ X is a Gδ -set and Df is an Fσ -set.
Proof. Let f : X → Y be given. It suffices to show that Cf is a Gδ -set. For every
S ⊂ X we define the oscillation of f on S by
\[ \omega_f(S) = \sup_{x, x' \in S} d_Y(f(x), f(x')) = \operatorname{diam} f(S). \]
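We are done if we can show that $U_n = \{x \in X : \operatorname{osc}_f(x) < \frac1n\}$ is open for every n ∈ N. Let $x_0 \in U_n$. Then $\operatorname{osc}_f(x_0) < \frac1n$. Therefore, there exists ε > 0 such that $\omega_f(B(x_0, \varepsilon)) < \frac1n$. Let $x \in B(x_0, \varepsilon/2)$. Then by the triangle inequality, $B(x, \varepsilon/2) \subset B(x_0, \varepsilon)$. Therefore,
\[ \operatorname{osc}_f(x) \le \omega_f(B(x, \varepsilon/2)) \le \omega_f(B(x_0, \varepsilon)) < \tfrac1n. \]
Thus, $B(x_0, \varepsilon/2) \subset U_n$ and so $U_n$ is open.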
As a sample application of the Baire category theorem we now answer one of our
previous questions negatively:
Lemma 6.23. Q ⊂ R is not a Gδ -set. Consequently, there exists no function f :
R → R such that Cf = Q.
Proof. Suppose Q is a Gδ -set. Then R \ Q is an Fσ -set and therefore can be
written as a countable union of closed sets A1 , A2 , . . . . Since R \ Q has empty interior
(its complement Q is dense), An ⊂ R \ Q also has empty interior for every n. Thus An
is nowhere dense, so R \ Q is meager. But then R = Q ∪ (R \ Q) must be meager, which
contradicts the Baire category theorem.
Observe that an Fσ -set is either meager or has non-empty interior: suppose A ⊂ X
is an Fσ -set with empty interior. Then it is a countable union of closed sets with empty
interior and therefore meager. Similarly, a Gδ -set is either comeager or not dense.
Remark. It is natural to ask if the converse of Theorem 6.22 is true in the following
sense: given a Gδ -set G ⊂ X, can we find a function f : X → R such that Cf = G ?
This cannot hold in general: suppose X contains an isolated point, that is X contains an
open set of the form {x}. Then necessarily x ∈ Cf , but x is not necessarily contained in
every possible Gδ -set. However, this turns out to be the only obstruction: if X contains
no isolated points, then for every Gδ -set G ⊂ X one can find f : X → R such that
Cf = G. For a very short proof of this, see S. S. Kim: A Characterization of the Set
of Points of Continuity of a Real Function. Amer. Math. Monthly 106 (1999), no. 3,
258–259.
3. Baire functions*
Consider again the Dirichlet function D
\[ D(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q}, \\ 0 & \text{if } x \notin \mathbb{Q}. \end{cases} \tag{6.2} \]
It is natural to ask whether D is the pointwise limit of a sequence of continuous func-
tions. The answer turns out to be no.
in (6.2); this identifies D as a Baire-2 function which by Theorem 6.25 is not Baire-1. Alternatively one can also use the formula $D(x) = \lim_{j\to\infty} (\lim_{m\to\infty} (\cos(j!\pi x))^{2m})$ to show that D is Baire-2.
In other words, a family of bounded linear operators is uniformly bounded if and only
if it is pointwise bounded.
This theorem is also called the uniform boundedness principle.
Proof. In the ’⇐’ direction there is nothing to show. Let us prove ’⇒’. Suppose that $\sup_{T\in F} \|Tx\|_Y < \infty$ for all x ∈ X. Define
\[ A_n = \{x \in X : \sup_{T\in F} \|Tx\|_Y \le n\} \subset X. \]
But $\|\ell_n\|_{op} \ge n$ because $|\ell_n(e_n)| = n$ (where $e_n$ denotes the sequence such that $e_n(m) = 0$ for every m ≠ n and $e_n(n) = 1$). Thus,
\[ \sup_{n\in\mathbb{N}} \|\ell_n\|_{op} = \infty. \]
Remark. In the proof we only needed that X is not meager. This is true if X is
complete, but it may also be true for an incomplete space.
As a first application of the uniform boundedness principle we prove that the point-
wise limit of a sequence of bounded linear operators on a Banach space must be a
bounded linear operator.
Corollary 6.29. Let X be a Banach space and Y a normed vector space. Suppose
(Tn )n∈N ⊂ L(X, Y ) is such that (Tn x)n∈N converges to some T x for every x ∈ X. Then
T ∈ L(X, Y ).
Proof. Linearity of T follows from linearity of limits. It remains to show that T is bounded. Let x ∈ X. Since $(T_n x)_{n\in\mathbb{N}}$ converges, we have $\sup_n \|T_n x\|_Y < \infty$ (convergent sequences are bounded). By the Banach-Steinhaus theorem, there exists C ∈ (0, ∞) such that $\|T_n\|_{op} \le C$ for every n. Let x ∈ X. Then
\[ \|Tx\|_Y = \lim_{n\to\infty} \|T_n x\|_Y \le C \|x\|_X. \]
Remark. Note that in the context of Corollary 6.29 it does not follow that $T_n \to T$ in L(X, Y). For instance, let $T_n : \ell^1 \to \ell^1$ and $T_n(x) = x_n e_n$. Then $T_n(x) \to 0$ as n → ∞ for every $x \in \ell^1$, but $\|T_n\|_{op} = 1$ for every n ∈ N, so $T_n$ does not converge to 0 in L(X, Y).
4.1. An application to Fourier series. Recall that for a 1-periodic continuous function f : R → C we defined the partial sums of its Fourier series by
\[ S_N f(x) = \sum_{n=-N}^{N} c_n e^{2\pi i n x} = f * D_N(x), \]
where $c_n = \int_0^1 f(t) e^{-2\pi i t n}\,dt$ and $D_N(x) = \sum_{n=-N}^{N} e^{2\pi i x n} = \frac{\sin(2\pi(N + \frac12)x)}{\sin(\pi x)}$ is the Dirichlet kernel (see Section 4).
The uniform boundedness principle directly implies the following:
Corollary 6.30. Let x0 ∈ R. There exists a 1-periodic continuous function f such
that the sequence (SN f (x0 ))N ⊂ C does not converge. That is, the Fourier series of f
does not converge at x0 .
In particular, this means that the Dirichlet kernels do not form an approximation of unity. To see why this is a consequence of the uniform boundedness principle, we first need to take another close look at the partial sums.
Lemma 6.31. There exists a constant c ∈ (0, ∞) such that for every N ∈ N,
\[ \int_0^1 |D_N(x)|\,dx \ge c \log(N). \]
Changing variables 2π(N + 12 )x 7→ x we see that the right hand side of this display
equals
Z π(2N +1) 2N Z π(k+1)
−1 | sin(x)| −1
X | sin(x)|
π dx = π dx.
0 x k=0 πk x
We have that
2N Z π(k+1) 2N Z πk+ π + π 2N Z πk+ π + π
X | sin(x)| X 2 100 | sin(x)| X 2 100 dx
dx ≥ dx ≥ c .
k=0 πk x k=0 πk+ π
2
− π
100
x k=0 πk+ π
2
− π
100
x
Here we have used that | sin(x)| ≥ c for some positive number c whenever |x| is at most
π
100
away from πk + π2 for some integer k ∈ Z (indeed, | sin(x)| ≥ sin(π/2 − π/100) > 0
for such x). Since x 7→ 1/x is a decreasing function,
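\[ \int_{\pi k + \frac{\pi}{2} - \frac{\pi}{100}}^{\pi k + \frac{\pi}{2} + \frac{\pi}{100}} \frac{dx}{x} \ge \frac{\pi}{50} \cdot \frac{1}{\pi k + \frac{\pi}{2} + \frac{\pi}{100}} \ge \frac{1}{50} \cdot \frac{1}{k+1}. \]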
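Thus,
\[ \sum_{k=0}^{2N} \int_{\pi k + \frac{\pi}{2} - \frac{\pi}{100}}^{\pi k + \frac{\pi}{2} + \frac{\pi}{100}} \frac{dx}{x} \ge \frac{1}{50} \sum_{k=0}^{2N} \frac{1}{k+1} \ge \frac{1}{50} \sum_{k=0}^{2N} \int_{k+1}^{k+2} \frac{dx}{x} = \frac{1}{50} \int_1^{2N+2} \frac{dx}{x} = \frac{1}{50} \log(2N+2), \]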
which implies the claim.
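The logarithmic growth in Lemma 6.31 can also be observed numerically. The sketch below approximates $\int_0^1 |D_N(x)|\,dx$ by a midpoint rule; the function names and the number of sample points per oscillation are our choices, and the output is a numerical approximation, not a proof.

```python
import math

def dirichlet(N, x):
    """D_N(x) = sin(2*pi*(N + 1/2)*x) / sin(pi*x), with the value 2N+1 at integers."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-15:
        return 2 * N + 1.0
    return math.sin(2 * math.pi * (N + 0.5) * x) / s

def l1_norm(N, points_per_lobe=200):
    """Midpoint-rule approximation of the integral of |D_N| over [0, 1]."""
    m = points_per_lobe * (2 * N + 1)
    h = 1.0 / m
    return h * sum(abs(dirichlet(N, (j + 0.5) * h)) for j in range(m))

# Print N, the approximate L^1 norm, and its ratio to log(2N + 2).
for N in (1, 4, 16, 64, 256):
    norm = l1_norm(N)
    print(N, round(norm, 4), round(norm / math.log(2 * N + 2), 4))
```

The printed norms keep growing with $N$, in line with the lower bound $c \log(N)$; the constant obtained from the proof above is far from optimal.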
Let us denote the space of 1-periodic continuous functions $f : \mathbb{R} \to \mathbb{C}$ by $C(\mathbb{T})$ (here
$\mathbb{T} = \mathbb{R}/\mathbb{Z} = S^1$ is the unit circle, which is a compact metric space¹). Then $C(\mathbb{T})$ is a
Banach space. Fix $x_0 \in \mathbb{R}$. We can define a linear map $T_N : C(\mathbb{T}) \to \mathbb{C}$ by
\[ T_N f = S_N f(x_0). \]
Lemma 6.32. For every $N \in \mathbb{N}$, $T_N : C(\mathbb{T}) \to \mathbb{C}$ is a bounded linear map and
\[ \|T_N\|_{\mathrm{op}} = \|D_N\|_1. \]
(Here $\|D_N\|_1 = \int_0^1 |D_N(x)|\,dx$.)
Proof. For every $f \in C(\mathbb{T})$ we have
\[ |T_N f| = |f * D_N(x_0)| \le \int_0^1 |f(x_0 - t) D_N(t)|\,dt \le \|f\|_\infty \int_0^1 |D_N(t)|\,dt = \|f\|_\infty \|D_N\|_1. \]
Therefore, $T_N$ is bounded and $\|T_N\|_{\mathrm{op}} \le \|D_N\|_1$. To prove the lower bound we let
\[ f(x) = \operatorname{sgn}(D_N(x_0 - x)). \]
While $f$ is not a continuous function, it can be approximated by continuous functions
as the following exercise shows.
Exercise 6.33. Show that for every $\varepsilon > 0$ there exists $g \in C(\mathbb{T})$ such that $|g(t)| \le 1$
for all $t \in \mathbb{R}$ and
\[ \int_0^1 |f(t) - g(t)|\,dt \le \frac{\varepsilon}{2N+1}. \]
Hint: Modify the function $f$ in a small enough neighborhood of each discontinuity; $g$
can be chosen to be a piecewise linear function.
¹ The metric being the quotient metric inherited from $\mathbb{R}$ or the subspace metric induced by the
inclusion $S^1 \subset \mathbb{R}^2$. These metrics are equivalent.
Remark. Continuous functions with divergent Fourier series can also be constructed
explicitly. The conclusion of Corollary 6.30 can be strengthened significantly: for every
Lebesgue null set² $A \subset \mathbb{T}$ there exists a continuous function whose Fourier series
diverges on $A$ (see J.-P. Kahane, Y. Katznelson: Sur les ensembles de divergence des
séries trigonométriques, Studia Math. 26 (1966), 305–306).
On the other hand, L. Carleson proved in 1966 that the Fourier series of a continuous
function must always converge almost everywhere (that is, everywhere except possibly
on a Lebesgue null set). This is a very deep result in Fourier analysis which is difficult
to prove (see M. Lacey, C. Thiele: A proof of boundedness of the Carleson operator,
Math. Res. Lett. 7 (2000), no. 4, 361–370 for a very elegant proof).
5. Further exercises
Exercise 6.34. We define the subset $A \subset \mathbb{R}$ as follows: $x \in A$ if and only if there
exists $c > 0$ such that
\[ |x - j 2^{-k}| \ge c\, 2^{-k} \]
holds for all $j \in \mathbb{Z}$ and integers $k \ge 0$. Show that $A$ is meager and dense.
Exercise 6.35. Let (X, d) be a complete metric space without isolated points.
Prove that X cannot be countable.
² See Exercise 6.10 for a definition on $\mathbb{R}$; Lebesgue null sets of $\mathbb{T}$ are precisely the images of Lebesgue
null sets on $\mathbb{R}$ under the canonical quotient map $\mathbb{R} \to \mathbb{R}/\mathbb{Z} = \mathbb{T}$.
Exercise 6.36. (i) Show that if X is a normed vector space and U ⊂ X a proper
subspace, then U has empty interior.
(ii) Let
X = {P : R → R | P is a polynomial}.
Use the Baire category theorem to prove that there exists no norm $\|\cdot\|$ on $X$ such that
$(X, \|\cdot\|)$ is a Banach space.
(iii) Let X be an infinite dimensional Banach space. Prove that X cannot have a
countable (linear-algebraic) basis.
Exercise 6.37. Consider $X = C([-1, 1])$ equipped with the usual norm $\|f\|_\infty =
\sup_{t\in[-1,1]} |f(t)|$. Let
\[ A_+ = \{f \in X : f(t) = f(-t) \ \forall t \in [-1, 1]\}, \]
\[ A_- = \{f \in X : f(t) = -f(-t) \ \forall t \in [-1, 1]\}. \]
(i) Show that $A_+$ and $A_-$ are meager.
(ii) Is $A_+ + A_- = \{f + g : f \in A_+, g \in A_-\}$ meager?
Exercise 6.38. Construct a function $f : \mathbb{R} \to \mathbb{R}$ such that $f$ is continuous at every
$x \in \mathbb{Z}$ and discontinuous at every $x \notin \mathbb{Z}$.
Exercise 6.39. For every interval (open, half-open or closed) I ⊂ R give an example
of a function f : R → R such that f is continuous on I and discontinuous on R \ I.
Exercise 6.40∗. Let $f : \mathbb{R} \to \mathbb{R}$ be a smooth function so that for every $x \in \mathbb{R}$ there
exists $n \ge 0$ with $f^{(n)}(x) = 0$. Prove that $f$ is a polynomial.
APPENDIX A
Review
1. Series
Let $(a_n)_{n\in\mathbb{N}}$ be a sequence of complex numbers. Recall that we say that the series
$\sum_{n=1}^\infty a_n$ converges if the sequence of partial sums $(\sum_{n=1}^N a_n)_{N\in\mathbb{N}}$ converges. In that
case, the symbol $\sum_{n=1}^\infty a_n$ represents the limit of this sequence. If the summands are
non-negative (that is, $a_n \ge 0$ for all $n \in \mathbb{N}$), then we also write
\[ \sum_{n=1}^\infty a_n < \infty \]
to denote that the series $\sum_{n=1}^\infty a_n$ converges. The series $\sum_{n=1}^\infty a_n$ is said to converge
absolutely if the series $\sum_{n=1}^\infty |a_n|$ converges.
Similarly, given a sequence of functions $(f_n)_{n\in\mathbb{N}}$ on a metric space $X$ we say that
$\sum_{n=1}^\infty f_n$ converges uniformly, if the sequence of partial sums $(\sum_{n=1}^N f_n)_{N\in\mathbb{N}}$ converges
uniformly.
We will also sometimes consider doubly infinite series of the form $\sum_{n=-\infty}^\infty a_n$ for a
sequence of complex numbers $(a_n)_{n\in\mathbb{Z}}$. Such a series is considered convergent if each of
the series $\sum_{n=0}^\infty a_{-n}$ and $\sum_{n=1}^\infty a_n$ converges (and its value is in this case the sum of the
values of these two series).
Lemma A.1 (Weierstrass M-test). Let $(f_n)_{n\in\mathbb{N}}$ be a sequence of functions on a
metric space $X$ such that there exists a sequence of non-negative real numbers $(M_n)_{n\in\mathbb{N}}$
with
\[ |f_n(x)| \le M_n \]
for all $n = 1, 2, \dots$ and all $x \in X$. Assume that $\sum_{n=1}^\infty M_n$ converges. Then the series
$\sum_{n=1}^\infty f_n$ converges uniformly.
Proof. Let $a < b$ and recall that for two Riemann integrable functions $h_1, h_2$ on
the interval $[a, b]$ which satisfy $h_1(x) \le h_2(x)$ for all $x \in [a, b]$ we also have $\int_a^b h_1 \le \int_a^b h_2$
(one proves this by considering first the corresponding inequalities for Riemann upper
and lower sums). Apply this fact together with the linearity of the integral to get
\[ \Big| \int_a^b f_n - \int_a^b f \Big| = \Big| \int_a^b (f_n - f) \Big| \le \int_a^b |f_n - f| \le (b - a) \sup_{[a,b]} |f_n - f|. \]
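The estimate in the last display can be checked on a concrete example. The sketch below is only an illustration with choices of our own: $f_n(x) = x + \sin(nx)/n$ and $f(x) = x$ on $[0, 1]$, integrals approximated by a midpoint rule, and the supremum replaced by a maximum over a fine grid.

```python
import math

def midpoint_integral(g, a, b, m=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (j + 0.5) * h) for j in range(m))

def sup_on_grid(g, a, b, m=20000):
    """Maximum of |g| over a uniform grid; a stand-in for the supremum on [a, b]."""
    return max(abs(g(a + j * (b - a) / m)) for j in range(m + 1))

a, b = 0.0, 1.0
f = lambda x: x                                  # the uniform limit

def f_n(n):
    """Hypothetical example sequence with |f_n - f| <= 1/n everywhere."""
    return lambda x: x + math.sin(n * x) / n

for n in (1, 10, 100):
    fn = f_n(n)
    lhs = abs(midpoint_integral(fn, a, b) - midpoint_integral(f, a, b))
    rhs = (b - a) * sup_on_grid(lambda x: fn(x) - f(x), a, b)
    print(n, lhs, rhs)
```

In each printed row the first number (the difference of the integrals) stays below the second (the bound), and both tend to zero as $n$ grows.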
(with the convention that if $\limsup_{n\to\infty} |c_n|^{1/n} = 0$, then $R = \infty$.)
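[Figure: the points $-R$, $0$, $R$ marked on the real axis; the complex plane $\mathbb{C}$ with a point $z = a + ib = re^{i\varphi}$, where $a$ and $b$ are the real and imaginary parts, $r$ the modulus and $\varphi$ the argument.]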
We finish the review section with a simple, but powerful theorem on the continuity
of power series on the convergence boundary.
Theorem A.14 (Abel). Let $f(x) = \sum_{n=0}^\infty c_n x^n$ be a power series with radius of
convergence $R = 1$. Assume that $\sum_{n=0}^\infty c_n$ converges. Then
\[ \lim_{x\to 1-} f(x) = \sum_{n=0}^\infty c_n. \]
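\[ \le (1 - x) \sum_{n=0}^{N} |s_n - s| x^n + (1 - x) \sum_{n=N+1}^{\infty} \overbrace{|s_n - s|}^{\le \varepsilon} x^n \le (1 - x) \sum_{n=0}^{N} |s_n - s| x^n + \varepsilon. \]
By making $x$ sufficiently close to 1 we can achieve that
\[ (1 - x) \sum_{n=0}^{N} |s_n - s| x^n \le \varepsilon. \]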
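As a sanity check of Theorem A.14, one can look at a concrete series. The sketch below uses the example $c_n = (-1)^n/(n+1)$ (our choice, not taken from the notes), for which $\sum_{n=0}^\infty c_n = \log 2$ and $f(x) = \log(1+x)/x$ for $0 < x < 1$; the power series is truncated after a fixed number of terms.

```python
import math

def c(n):
    """Coefficients of the example series: c_n = (-1)^n / (n + 1)."""
    return (-1) ** n / (n + 1)

def f(x, terms=20000):
    """Partial sum of the power series sum_{n>=0} c_n x^n, intended for 0 <= x < 1."""
    total, power = 0.0, 1.0
    for n in range(terms):
        total += c(n) * power
        power *= x
    return total

limit = math.log(2.0)   # the value of sum_{n>=0} (-1)^n / (n + 1)
for x in (0.9, 0.99, 0.999):
    print(x, f(x), abs(f(x) - limit))
```

The gap between $f(x)$ and $\log 2$ shrinks as $x$ approaches 1 from below, as Abel's theorem predicts.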
3. Taylor’s theorem
Theorem A.17. Let $I$ be an interval and let $f \in C^{n+1}(I)$, i.e., all derivatives of $f$
up to order $n + 1$ are continuous in $I$. Fix $a \in I$. Then for all $x \in I$,
\[ f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k + R_n(x, a), \]
where
\[ R_n(x, a) = \int_a^x \frac{(x - t)^n}{n!} f^{(n+1)}(t)\,dt = \frac{(x - a)^{n+1}}{n!} \int_0^1 (1 - s)^n f^{(n+1)}(a + s(x - a))\,ds. \tag{A.1} \]
Proof. We first observe that the second version and the first version of the remain-
der term are equivalent by changing variables (via the substitution t = a + s(x − a),
dt = (x − a)ds; note that t ranges from a to x as s ranges from 0 to 1).
holds for all $f \in C^{N+1}(I)$, then $(*)_N$ implies $(*)_{N+1}$ for all $N = 0, 1, 2, \dots$.
Theorem A.18. Let $f$ be as in Theorem A.17 and let $R_n$ be as in (A.1). Let
\[ M_{n+1} = \max\{|f^{(n+1)}(a + s(x - a))| : 0 \le s \le 1\}. \]
Then
\[ |R_n(x, a)| \le \frac{M_{n+1}}{(n + 1)!} |x - a|^{n+1}. \]
Proof. We have
\[ |R_n(x, a)| \le \frac{|x - a|^{n+1}}{n!} \int_0^1 (1 - s)^n |f^{(n+1)}(a + s(x - a))|\,ds \le M_{n+1} \frac{|x - a|^{n+1}}{n!} \int_0^1 (1 - s)^n\,ds = M_{n+1} \frac{|x - a|^{n+1}}{(n + 1)!}. \]
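The bound of Theorem A.18 is easy to test for a function whose derivatives are known. The sketch below takes $f = \exp$ (our choice), so that $f^{(n+1)} = \exp$ and, since $\exp$ is increasing, $M_{n+1} = e^{\max(a, x)}$; the helper names are hypothetical.

```python
import math

def taylor_poly_exp(x, a, n):
    """Taylor polynomial of exp of order n at a, evaluated at x."""
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(n + 1))

def remainder_bound_exp(x, a, n):
    """M_{n+1} |x - a|^{n+1} / (n+1)!, with M_{n+1} = exp(max(a, x))."""
    M = math.exp(max(a, x))
    return M * abs(x - a) ** (n + 1) / math.factorial(n + 1)

a = 0.0
for x in (1.0, -2.0):
    for n in (2, 5, 10):
        actual = abs(math.exp(x) - taylor_poly_exp(x, a, n))   # |R_n(x, a)|
        print(x, n, actual, remainder_bound_exp(x, a, n))
```

In every row the actual remainder is below the bound, and both decay quickly in $n$ because of the factorial in the denominator.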
Theorem A.19. Let $f$ be as in Theorem A.17 and let $R_n$ be as in (A.1). There is $\xi$
between $a$ and $x$ such that
\[ R_n(x, a) = \frac{(x - a)^{n+1}}{(n + 1)!} f^{(n+1)}(\xi). \]
Proof. Let
\[ m = \min\{f^{(n+1)}(a + s(x - a)) : 0 \le s \le 1\}, \]
\[ M = \max\{f^{(n+1)}(a + s(x - a)) : 0 \le s \le 1\}. \]
We estimate
\[ \int_0^1 (1 - s)^n m\,ds \le \int_0^1 (1 - s)^n f^{(n+1)}(a + s(x - a))\,ds \le \int_0^1 (1 - s)^n M\,ds \]
and hence
\[ m \le (n + 1) \int_0^1 (1 - s)^n f^{(n+1)}(a + s(x - a))\,ds \le M. \]
By the intermediate value theorem for continuous functions there is $\sigma \in [0, 1]$ such that
\[ f^{(n+1)}(a + \sigma(x - a)) = (n + 1) \int_0^1 (1 - s)^n f^{(n+1)}(a + s(x - a))\,ds. \tag{A.2} \]
If we set $\xi = a + \sigma(x - a)$ so that $\xi$ is on the line segment connecting $a$ to $x$ we get the
claimed statement from (A.1) and (A.2).
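For a concrete function the intermediate point $\xi$ can be computed explicitly. The sketch below again uses $f = \exp$ with $a = 0$, $x = 1$, $n = 2$ (an example of our choosing); since $f^{(n+1)} = \exp$ is invertible, combining (A.1) and (A.2) lets us solve for $\xi$ with a logarithm.

```python
import math

a, x, n = 0.0, 1.0, 2

# Taylor polynomial of exp at a = 0: all derivatives at 0 equal 1.
taylor = sum((x - a) ** k / math.factorial(k) for k in range(n + 1))
R = math.exp(x) - taylor                      # the remainder R_n(x, a)

# Solve (x - a)^{n+1} / (n+1)! * exp(xi) = R for xi.
xi = math.log(R * math.factorial(n + 1) / (x - a) ** (n + 1))
print(xi)
```

The computed $\xi$ lies strictly between $a$ and $x$, as the theorem asserts.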
\[ M_i(f) = \sup_{t\in[x_{i-1}, x_i]} f(t). \]
are called the lower and upper Riemann-Darboux integrals of f on the interval [a, b],
respectively. Here the sup and inf are taken over all partitions of [a, b].
Lemma A.25. Let $f : [a, b] \to \mathbb{R}$ be bounded. Then
\[ (b - a) \inf_{[a,b]} f \le \underline{I}_a^b(f) \le \overline{I}_a^b(f) \le (b - a) \sup_{[a,b]} f. \]
We are now ready to define the concept of Riemann integrable functions and the
Riemann integral of such functions.
Definition A.26. (i) Let $f : [a, b] \to \mathbb{R}$ be bounded. $f$ is called Riemann integrable
if $\underline{I}_a^b(f) = \overline{I}_a^b(f)$.
(ii) If $f$ is Riemann integrable the number $\underline{I}_a^b(f) = \overline{I}_a^b(f)$ is called the Riemann
integral of $f$, denoted by $\int_{[a,b]} f$ or by $\int_a^b f$ (or even by $\int_a^b f(t)\,dt$ ...)
Lemma A.27. Let f : [a, b] → R be a bounded function. Then f is Riemann
integrable if and only if for every ε > 0 there is a partition P of [a, b] such that
U (f, P ) − L(f, P ) < ε.
Proof. Suppose $f$ is Riemann integrable and let $\varepsilon > 0$. Then there are partitions $P_1, P_2$ of $[a, b]$
such that $L(f, P_1) > \int_a^b f - \varepsilon/2$, $U(f, P_2) < \int_a^b f + \varepsilon/2$ and thus $U(f, P_2) - L(f, P_1) < \varepsilon$.
Let $P$ be the refinement $P_1 \cup P_2$. Then $U(f, P_2) \ge U(f, P) \ge L(f, P) \ge L(f, P_1)$ and
hence $U(f, P) - L(f, P) < \varepsilon$.
Vice versa, assume that for every $\varepsilon > 0$ there is a partition $P_\varepsilon$ of $[a, b]$ such that
\[ U(f, P_\varepsilon) - L(f, P_\varepsilon) < \varepsilon. \]
Then $0 \le \overline{I}_a^b(f) - \underline{I}_a^b(f) \le U(f, P_\varepsilon) - L(f, P_\varepsilon) < \varepsilon$, and since $\varepsilon$ was arbitrary we conclude
$\overline{I}_a^b(f) = \underline{I}_a^b(f)$. Hence $f$ is Riemann integrable.
Theorem A.28. If f : [a, b] → R is continuous in [a, b] then f is Riemann inte-
grable.
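Proof. Recall that a continuous function on a compact set is uniformly continuous.
Hence given $\varepsilon > 0$ there exists $\delta > 0$ such that $|f(x) - f(\tilde{x})| < \varepsilon/(b - a)$ provided that
$|x - \tilde{x}| < \delta$. Let $N$ be such that $(b - a)/N < \delta$ and choose the partition $P = \{x_j :=
a + j\frac{b-a}{N},\ j = 0, \dots, N\}$. Let $I_i = [x_{i-1}, x_i]$, $i = 1, \dots, N$. Since $f$ attains its supremum
and infimum on each compact interval $I_i$, whose length is less than $\delta$, we have
\[ M_i(f) - m_i(f) = \sup_{I_i} f - \inf_{I_i} f < \frac{\varepsilon}{b - a} \]
for $i = 1, \dots, N$ so that
\[ U(f, P) - L(f, P) = \sum_{i=1}^{N} M_i(f)(x_i - x_{i-1}) - \sum_{i=1}^{N} m_i(f)(x_i - x_{i-1}) \]
\[ = \sum_{i=1}^{N} (M_i(f) - m_i(f))(x_i - x_{i-1}) < \frac{\varepsilon}{b - a} \sum_{i=1}^{N} (x_i - x_{i-1}) = \frac{\varepsilon}{b - a}(b - a) = \varepsilon. \]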
We can apply Lemma A.27 to see that f is Riemann integrable.
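The shrinking gap between upper and lower sums can be seen in a small computation. The sketch below is restricted to an increasing function (an assumption that makes the supremum and infimum on each subinterval exact endpoint values) and uses $f(t) = t^2$ on $[0, 1]$ as an example; `darboux_sums` is a hypothetical helper name.

```python
def darboux_sums(f, a, b, N):
    """Lower and upper sums of an increasing function f on the uniform partition with N pieces."""
    h = (b - a) / N
    xs = [a + j * h for j in range(N + 1)]
    # For increasing f: inf on [x_{i-1}, x_i] is f(x_{i-1}), sup is f(x_i).
    lower = sum(f(xs[i - 1]) * h for i in range(1, N + 1))
    upper = sum(f(xs[i]) * h for i in range(1, N + 1))
    return lower, upper

f = lambda t: t * t
for N in (10, 100, 1000):
    L, U = darboux_sums(f, 0.0, 1.0, N)
    print(N, L, U, U - L)
```

For this example the difference $U(f, P) - L(f, P)$ equals $(f(1) - f(0))/N$, so it drops below any given $\varepsilon$ once $N$ is large, and the sums squeeze the value $\int_0^1 t^2\,dt = 1/3$.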
Theorem A.29. (i) Let $f$ and $g$ be Riemann integrable functions on $[a, b]$ and
suppose that $f(x) \le g(x)$ for all $x \in [a, b]$. Then $\int_a^b f \le \int_a^b g$.
(ii) Let $f$ be Riemann integrable on $[a, b]$. Then
\[ \Big| \int_a^b f \Big| \le (b - a) \sup_{[a,b]} |f|. \]
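(iv) $\sum_{n=1}^\infty \cos(\pi n) \sin(\pi n^{-1})$   (v) $\sum_{n=2}^\infty \big( (1 + \frac{1}{n})^n - e \big)^2$   (vi) $\sum_{n=1}^\infty \frac{1}{n (n^{1/n})^{100}}$
(vii) $\sum_{n=2}^\infty 2^{-(\log(n))^a}$   (viii) $\sum_{n=1}^\infty \sum_{k=0}^{10n} (-1)^k \frac{n^k}{k!}$   (ix) $\sum_{n=1}^\infty \frac{1}{n^2 (1 - \cos(n))}$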
Exercise A.33. Prove or disprove convergence for each of the following sequences
and in case of convergence, determine the limit:
(i) $a_n = \sqrt{n^4 + \cos(n^2)} - n^2$
(ii) $a_n = n^2 + \frac{1}{2} n - \sqrt{n^4 + n^3}$
(iii) $a_n = \sum_{k=n}^{n^2} \frac{1}{k}$
(iv) $a_n = n \sum_{k=0}^\infty \frac{1}{n^2 + k^2}$
(v) $a_0 = 1$, $a_{n+1} = \frac{a_n}{2} + \frac{1}{a_n}$
(vi) $a_n = \prod_{k=2}^{n} \frac{k^2 - 1}{k^2}$
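Exercise A.34. For which $x \in \mathbb{R}$ do the following series converge? On which sets
do these series converge uniformly?
(i) $\sum_{n=1}^\infty n^2 x^n$   (ii) $\sum_{n=1}^\infty (3^{1/n} - 1)^n x^n$   (iii) $\sum_{n=1}^\infty \tan(n^{-2}) e^{nx}$
(iv) $\sum_{n=1}^\infty \frac{x^n}{n^n}$   (v) $\sum_{n=1}^\infty \frac{\sin(nx)}{n^2}$   (vi) $\sum_{n=1}^\infty 2^{-n} \tan(\lfloor x \rfloor + 1/n)$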
Exercise A.35. (i) Define $f$ by setting $f(x) = x$ for $x \ge 0$ and $f(x) = 0$ for $x < 0$.
Then $f$ is not differentiable at $x = 0$. Construct an example of a sequence $(f_n)_{n\in\mathbb{N}}$ of
continuously differentiable functions defined on $\mathbb{R}$, uniformly convergent on $\mathbb{R}$ to $f$.
(ii) Let $f_n(x) = n^{-1/2} \sin(nx)$. Show that $f_n$ converges uniformly on $\mathbb{R}$, but for every
$x \in \mathbb{R}$, the sequence $(f_n'(x))_{n\in\mathbb{N}}$ does not have a limit.
Exercise A.36. Give an example of a sequence (fn )n∈N of continuous bounded
functions on R that converges pointwise to some function f such that f is unbounded
and not continuous.
Exercise A.37. Determine the value of the series $\sum_{n=1}^\infty \frac{(-1)^n}{n(n+1)}$.