
Analysis II

Lecture Notes
Work in progress, last updated: May 12, 2023

Joris Roos and Andreas Seeger


J.R., Department of Mathematical Sciences, University of Massachusetts
Lowell, 265 Riverside St., Lowell, MA 01854, USA
A.S., Department of Mathematics, University of Wisconsin-Madison,
480 Lincoln Dr, Madison, WI-53706, USA
Contents

Note to students 5
Chapter 1. Metric spaces 7
1. Topology 7
2. The contraction principle 14
3. Compactness 17
4. Covering numbers and Minkowski dimension* 26
5. Oscillation as a quantification of discontinuity* 28
6. Further exercises 29
Chapter 2. Linear operators and derivatives 33
1. Bounded linear operators 33
2. Equivalence of norms 36
3. Dual spaces* 38
4. Sequential ℓp spaces* 39
5. Derivatives 41
6. Further exercises 46
Chapter 3. Differential calculus in Rn 49
1. Inverse function theorem 52
2. Implicit function theorem 54
3. Ordinary differential equations 56
4. Higher order derivatives and Taylor’s theorem 64
5. Local extrema 69
6. Local extrema on surfaces 71
7. Optimization and convexity* 73
8. Further exercises 79
Chapter 4. Approximation of functions 85
1. Polynomial approximation 85
2. Orthonormal systems 87
3. The Haar system 91
4. Trigonometric polynomials 95
5. The Stone-Weierstrass Theorem 103
6. Further exercises 105
Chapter 5. From Riemann to Lebesgue* 111
1. Lebesgue null sets 111
2. Lebesgue’s Characterization of the Riemann integral 112
Chapter 6. The Baire category theorem* 115
1. Nowhere differentiable continuous functions* 117
2. Sets of continuity* 118

3. Baire functions* 119


4. The uniform boundedness principle* 121
5. Further exercises 124
Appendix A. Review 127
1. Series 127
2. Power series 128
3. Taylor’s theorem 131
4. The Riemann integral 133
5. Further exercises 135
Note to students

These are lecture notes for a second undergraduate course in analysis, taught as
Math 522 at UW Madison. J.R. prepared a full set of lecture notes for the class in the
fall semesters of 2018 and 2019; they were preceded by individual notes on some of the
topics, written by A.S. for previous classes. The current version is by no means a final
one; all chapters are still undergoing revisions and some will be further expanded. We
are grateful to the students of several Math 522 classes for useful questions and remarks
on previous versions of the notes.
There is more content in these notes than we can cover in Math 522, and you may
receive updates about the precise lecture contents throughout the course.
The notes in the present form are likely to still contain typos, errors and imprecisions
of all kinds. Do not ever take anything that you read in a mathematical text for granted.
Think hard about what you are reading and try to make sense of it independently. If
that fails, then it’s time to ask somebody a question and that usually helps. In the spring
semester of 2023 the course will be taught by A.S. He will welcome all comments about
the contents of these notes - please let him know about any misprints or inaccuracies
that you may find.

There are many books on mathematical analysis, each of which will likely have a
large intersection with this course. Here are three very good ones:
• W. Rudin, Principles of mathematical analysis
• T. Apostol, Mathematical analysis: A modern approach to advanced calculus
• T. Körner, A Companion to Analysis: A Second First and First Second Course
in Analysis
For further self study in analysis we recommend the Princeton Lectures in Analysis I-
IV, by Stein and Shakarchi. Throughout the course A.S. will make concrete suggestions
for further reading related to the content of these lecture notes.
• E. M. Stein, R. Shakarchi, Fourier Analysis: An Introduction
• E. M. Stein, R. Shakarchi, Complex Analysis
• E. M. Stein, R. Shakarchi, Real Analysis: Measure Theory, Integration, and Hilbert Spaces
• E. M. Stein, R. Shakarchi, Functional Analysis
We mention two excellent books used in first year analysis graduate courses at UW
Madison.
• W. Rudin, Real and Complex Analysis
• G. Folland, Real Analysis, modern techniques and their applications.
Finally, a concise and more general treatment of differential calculus in normed spaces
can be found in chapter 1 of
• L. Hörmander, The analysis of linear partial differential operators, vol. I.

CHAPTER 1

Metric spaces

1. Topology
The notion of a metric space serves as a convenient abstract setting that underlies
all topics discussed in this course. A metric space can be thought of as a collection
of distinct objects that come with a distance between them. This provides a structure
that makes it meaningful to speak of notions such as convergence and continuity. It
will allow us to use the same terminology for potentially very different kinds of objects.
Definition 1.1 (Metric space). A set X equipped with a map d : X × X → [0, ∞)
is called a metric space if X is not the empty set and for all x, y, z ∈ X,
(1) d(x, y) = d(y, x),
(2) d(x, z) ≤ d(x, y) + d(y, z),
(3) d(x, y) = 0 if and only if x = y.
d is called a metric.

Figure 1. Property (2) is called the triangle inequality.

One may imagine d to stand for ‘distance’. If multiple metric spaces are relevant
at the same time, then we may also write dX for the metric d on the metric space X.
Examples 1.2. Some fundamental examples of metric spaces that will be important
in this course are
• the real numbers R with d(x, y) = |x − y|,
• closed and open intervals of real numbers (with the same metric),
• the complex numbers C with d(z, w) = |z − w|,
• n-dimensional Euclidean space Rn consisting of vectors x = (x1, . . . , xn) with the Euclidean metric
d(x, y) = (∑_{i=1}^n |xi − yi|^2)^{1/2},

• the space C([a, b]) of continuous functions [a, b] → C with
d(f, g) = sup_{x∈[a,b]} |f(x) − g(x)|,


• the space ℓ∞ of bounded sequences (an)n∈N of complex numbers with
d(a, b) = sup_{n∈N} |an − bn|,
• the space c0 of sequences (an)n∈N of complex numbers with lim_{n→∞} an = 0, with the same metric as for ℓ∞.
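For readers who like to experiment, here is a small Python sketch (not part of the notes; the grid size, the interval and the sample functions are arbitrary choices) that approximates the supremum metric on C([a, b]) by sampling on a finite grid and checks the triangle inequality numerically. A finite grid only gives a lower bound for the supremum, but for continuous functions a fine enough grid comes arbitrarily close.

import math

def d_sup(f, g, a=0.0, b=1.0, n=10_001):
    """Grid approximation of d(f, g) = sup over [a, b] of |f(x) - g(x)|."""
    xs = [a + (b - a) * i / (n - 1) for i in range(n)]
    return max(abs(f(x) - g(x)) for x in xs)

f = math.sin
g = math.cos
h = lambda x: x * x

# Triangle inequality d(f, h) <= d(f, g) + d(g, h), up to the grid approximation.
print(d_sup(f, h), "<=", d_sup(f, g) + d_sup(g, h))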
Exercise 1.3. Verify that each of the preceding examples is really a metric space.
In the following let X be a metric space with metric d.
1.1. Open and closed sets. For every x0 ∈ X and r > 0 define the open ball
B(x0 , r) = {x ∈ X : d(x, x0 ) < r},
and the closed ball
B̄(x0, r) = {x ∈ X : d(x, x0) ≤ r}.
Example 1.4. If X = R (always with the usual metric), then the open balls are
open intervals and the closed balls are closed intervals.
Should multiple metric spaces be involved we use subscripts on the metric and balls
to indicate which metric space we mean, i.e. BX (x0 , r) is a ball in the metric space X.
Definition 1.5 (Open set). Let X be a metric space and U ⊂ X. A point x ∈ U
is called interior in U if there exists r > 0 such that B(x, r) ⊂ U .
The set U ⊂ X is called open if every point x ∈ U is interior.
Clarification of notation: A ⊂ B means for us that A is a subset of B, not
necessarily a proper subset. That is, we also allow A = B. We will write A ⊊ B to refer to proper subsets.
Note that a union of open sets is open. The family of all open sets U ⊂ X is also called the topology of X. Note from the definition that the topology of a metric space X is determined by the open balls (B(x, r))_{x∈X, r>0}. The notion of open sets can be
generalized and leads to the concept of topological spaces, which we will not need in
this course.
Definition 1.6 (Closed set). A set A ⊂ X is called closed if its complement A^c = X \ A is open. If A ⊂ X is an arbitrary set, then Ā denotes the intersection of all closed sets containing A.
Since an intersection of closed sets is closed, Ā is closed by definition and called the closure of A. It is the ‘smallest’ closed set containing A in the sense that if A′ is a closed set with A ⊂ A′, then Ā ⊂ A′. As a consequence, a set A is closed if and only if A = Ā.
Exercise 1.7. Verify that open balls are open and closed balls are closed.
Note that the closure of the open ball B(x0, r) often coincides with the closed ball B̄(x0, r). While this is the case in most of the metric spaces encountered in this course, in general it is only true that the open ball is contained in its closure, which in turn is contained in the closed ball B̄(x0, r).
Example 1.8. Let X be a non-empty set. For x, y ∈ X let d(x, y) = 0 if x = y and d(x, y) = 1 if x ≠ y.
This defines a metric on X (called the trivial metric). The topology on X is very boring: every set is open (hence also every set is closed). Then, for every x ∈ X, the open ball B(x, 1) = {x} equals its own closure, while the closed ball is B̄(x, 1) = X.
Definition 1.9 (Accumulation point). Let A ⊂ X. A point x ∈ X is called an accumulation point of A if for every r > 0 there exists y ∈ B(x, r) ∩ A with y ≠ x.

Lemma 1.10. Let X be a metric space and A ⊂ X. Then the closure Ā is equal to the union of A and the set of accumulation points of A.
Proof. For one direction we take an arbitrary closed set C containing A and we have to show that every accumulation point x belongs to C. If x were in X \ C (an open set not intersecting A) then there would be an ε > 0 and a ball B(x, ε) such that B(x, ε) ⊂ X \ C and hence B(x, ε) ∩ A ⊂ B(x, ε) ∩ C = ∅, in contradiction to x being an accumulation point. Since C was an arbitrary closed set containing A we find that A and the set of accumulation points are both subsets of Ā.
To show the converse let x ∈ Ā \ A; we have to show that x is an accumulation point. Again argue by contradiction and suppose that x is not an accumulation point. Then there would exist a ball B(x, ε) containing no points of A (other than x itself, but that is excluded by assumption). Hence C = (X \ B(x, ε)) ∩ Ā would be a closed set containing A, with C ⊊ Ā, a contradiction to the definition of the closure of A.

1.2. Relative topology. If we have a metric space X with metric d and a non-
empty subset A ⊂ X, then A can be made a metric space by restricting the metric: we
define the metric dA : A × A → [0, ∞) by setting
dA (x, y) = d(x, y) for all x, y ∈ A.
In other words, dA is the restriction of d to the set A × A ⊂ X × X, also denoted
by dA = d|A×A . As a metric space, A comes with its own open sets: unpacking the
definition, a set U ⊂ A is open in A if and only if for every x ∈ U there exists r > 0
such that
BA (x, r) = {y ∈ A : d(x, y) < r} ⊂ U.
Observe that the open balls in A are not necessarily open balls in X. As a consequence,
a set U ⊂ A that is open in A is not necessarily open in X. However, the open sets in
A can be characterized by the open sets in X.
Lemma 1.11. Let A ⊂ X. A set U ⊂ A is open in A if and only if there exists an
open set V ⊂ X such that U = V ∩ A.
Proof. Suppose that U = V ∩ A with V open. We have to show that U is open in A. Let x ∈ U ⊂ V; then there is a ball B(x, rx) = {y ∈ X : d(x, y) < rx} contained in V.
Then BA(x, rx) ⊂ U, so x is an interior point of U (with respect to the metric on A).
Vice versa, let U be open in A. Then for every x ∈ U there is rx > 0 such that
x ∈ BA (x, rx ) ⊂ U , and thus U = ∪x∈U BA (x, rx ). Define V = ∪x∈U B(x, rx ). Then V is
open in X and V ∩ A = U . 
Example 1.12. Let X = R, A = [0, 1]. Then U = [0, 1/2) ⊂ A ⊂ X is open in A, but not open in R. However, there exists V ⊂ R open such that U = V ∩ A: for example, V = (−1, 1/2).

1.3. Convergence.
Definition 1.13 (Convergence). Let X be a metric space, (xn )n∈N ⊂ X a sequence
and x ∈ X. We say that (xn )n∈N converges to x if for all ε > 0 there exists N ∈ N
such that for all n ≥ N it holds that d(xn , x) < ε.
If (xn )n∈N converges to x we also call x the limit of the sequence and write x =
limn→∞ xn ; alternatively we may also write that xn → x in X.
Definition 1.14 (Cauchy sequence). Let X be a metric space. A sequence (xn )n∈N
in X is called Cauchy sequence if for every ε > 0 there exists N ∈ N such that for all
n, m ≥ N we have
d(xn , xm ) < ε.
Lemma 1.15. Every convergent sequence is a Cauchy sequence.
Proof. Let ε > 0. Since xn → x there is N so that for all k ≥ N we have d(xk, x) < ε/2. If m ≥ N, n ≥ N we get by the triangle inequality
d(xn, xm) ≤ d(xm, x) + d(x, xn) < ε/2 + ε/2 = ε
and since ε was arbitrary the result is proved. 
Definition 1.16 (Completeness). A metric space X is called complete if every
Cauchy sequence (xn )n∈N ⊂ X converges.
Example 1.17. The metric space of rational numbers, Q (with the usual metric)
is not complete: the sequence of rational numbers

(10^{−n} ⌊10^n √2⌋)_{n∈N} = (1.4, 1.41, 1.414, . . . )
is a Cauchy sequence, but it does not converge in Q. This is because it converges as a sequence of real numbers to the irrational number √2 ∉ Q.
The real numbers form an example of a complete metric space (in fact, they are
usually defined via completion of the rational numbers).
Lemma 1.18. If X is complete and A ⊂ X is closed, then A is a complete metric
space.
Proof. Let (xn)n∈N be a Cauchy sequence in A. Since dA = dX|A×A, (xn) is a Cauchy sequence in X, so by assumption it has a limit x ∈ X. By Lemma 1.10, x ∈ Ā, and by assumption Ā = A. Hence xn converges to x in A.
Note that this is not true if X is not complete: for example, every metric space is
a closed subset of itself, but not every metric space is complete.
1.4. Continuity.
Definition 1.19 (Continuity). Let X, Y be metric spaces.
(i) A map f : X → Y is called continuous at x ∈ X if for every ε > 0 there exists
δ > 0 such that if dX (x, y) < δ, then dY (f (x), f (y)) < ε.
(ii) f is called continuous if it is continuous at every x ∈ X. We also write
f ∈ C(X, Y ).
Lemma 1.20. Let f : X → Y and x ∈ X. The following are equivalent.
(i) f continuous at x.
(ii) For every sequence (xn )n∈N ⊂ X convergent to x, the sequence (f (xn ))n∈N
converges to f (x).

Proof. (i) =⇒ (ii). Suppose xn → x in X. Let ε > 0. Since by assumption f


is continuous at x we find δε > 0 such that dY (f (x̃), f (x)) < ε whenever dX (x̃, x) <
δε . Since limn→∞ xn = x, we find Nε so that dX (xn , x) < δε for n > Nε and thus
dY (f (xn ), f (x)) < ε for n > Nε . Since ε > 0 was arbitrary this shows that f (xn ) → f (x)
in Y.
(ii) =⇒ (i). We argue by contradiction and assume that f is not continuous at
x. Then there exists an ε > 0 so that there is no δ > 0 with the property that
dY (f (x̃), f (x)) < ε for all x̃ ∈ B(x, δ). Taking δ to be a reciprocal of a positive
integer, we see that for every n ∈ N there exists an xn with dX (xn , x) < 1/n but
dY (f (xn ), f (x)) ≥ ε. Clearly xn → x in X but (f (xn ))n∈N does not converge to f (x); this is a contradiction.

Definition 1.21 (Inverse images). Let f : X → Y be a function and A ⊂ Y . The
inverse image of A under f is defined as
f −1 (A) = {x ∈ X : f (x) ∈ A}.
This definition makes sense for all functions from X to Y . The notation does not
imply that f is invertible. Find examples to illustrate this. Convince yourself that if f
is invertible then f −1 (A) is the image of A under the inverse map.
Lemma 1.22. Let f : X → Y be a map between metric spaces X and Y . The
following are equivalent:
(i) f continuous.
(ii) f −1 (U ) ⊂ X is open for every open set U ⊂ Y .
Proof. (i) =⇒ (ii). Let U ⊂ Y be open, and let x ∈ f −1 (U ). Since U is open there
is an open ball B(f (x), ε) contained in U . Since f is continuous at x there is an open
ball B(x, δ) in X such that dY (f (x̃), f (x)) < ε for every x̃ ∈ B(x, δ). In particular f (x̃) ∈ U for every x̃ ∈ B(x, δ). Hence B(x, δ) ⊂ f −1 (U ) so that x is an interior point of f −1 (U ). Since x was chosen arbitrarily in f −1 (U ) we conclude that f −1 (U ) is open.
(ii) =⇒ (i). Let ε > 0. Let x ∈ X. By assumption f −1 (B(f (x), ε)) is open. The
point x belongs to this set and thus is an interior point. Hence there exists a δ > 0
(depending on x and ε) so that
(1.1) B(x, δ) ⊂ f −1 (B(f (x), ε)).
This means that dY (f (x̃), f (x)) < ε provided that dX (x̃, x) < δ. Since ε > 0 was arbitrary, f is continuous at x. Since x ∈ X was arbitrary, f is continuous. 
As a consequence of Lemma 1.20 and Lemma 1.22, continuity of f can also be
characterized by saying that f commutes with limits. That is,
lim_{n→∞} f(xn) = f(lim_{n→∞} xn),

provided that (xn )n∈N is a convergent sequence in X.


In this course we will mostly study real- or complex-valued functions on metric
spaces, i.e. f : X → R or f : X → C. Whether functions are real- or complex-valued
is often of little consequence to the heart of the matter. For definiteness we make the
convention that functions are always complex-valued, unless specified otherwise. The
space of continuous functions will be denoted by C(X), while the space of bounded
continuous functions is denoted Cb (X).

1.5. Uniform convergence.


Definition 1.23. A sequence (fn )n∈N of (real- or complex-valued) functions on a
set X is called uniformly convergent to a function f if for all ε > 0 there exists N ∈ N
such that for all n ≥ N and all x ∈ X,
|fn (x) − f (x)| < ε.
Compare this to pointwise convergence . To see the difference between the two it
helps to write down the two definitions using the symbolism of predicate logic:

∀ε > 0 ∃N ∈ N ∀x ∈ X ∀n ≥ N : |fn (x) − f (x)| < ε.

∀ε > 0 ∀x ∈ X ∃N ∈ N ∀n ≥ N : |fn (x) − f (x)| < ε.


Formally, the difference is an interchange in the order of universal and existential
quantifiers. The first is uniform convergence, where N needs to be chosen independently
of x (uniformly in x) and the second is pointwise convergence, where N is allowed to
depend on x. One can rephrase uniform convergence as follows:

Lemma 1.24. Let (fn )n∈N be a sequence of functions on a set X. Then (fn )n∈N
converges uniformly to f if and only if limn→∞ supx∈X |fn (x) − f (x)| = 0.
To illustrate the difference between the notions of pointwise convergence and uniform
convergence we consider
fn(x) = 1 − nx for 0 ≤ x ≤ n^{−1}, and fn(x) = 0 for n^{−1} < x ≤ 1,
as a sequence of functions on the metric space X = [0, 1]. For every x ∈ [0, 1] the numerical sequence (fn(x))_{n∈N} converges and we have
lim_{n→∞} fn(x) = 0 for 0 < x ≤ 1, and lim_{n→∞} fn(x) = 1 for x = 0.
Thus fn converges to the limit function f pointwise. However, for every n ∈ N,
sup_{x∈[0,1]} |fn(x) − f(x)| = 1,
and hence fn does not converge uniformly.
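The failure of uniform convergence in this example can also be observed numerically. The following Python sketch (an illustration only; the grid size is an arbitrary choice) evaluates the grid approximation of sup_{x∈[0,1]} |fn(x) − f(x)| for several n; the value stays near 1 no matter how large n is.

def f_n(n, x):
    # The functions from the example: 1 - n*x on [0, 1/n], and 0 on (1/n, 1].
    return 1.0 - n * x if x <= 1.0 / n else 0.0

def f_limit(x):
    # Pointwise limit: 1 at x = 0, and 0 for 0 < x <= 1.
    return 1.0 if x == 0 else 0.0

xs = [i / 10_000 for i in range(10_001)]          # grid on [0, 1]
for n in (1, 10, 100, 1000):
    print(n, max(abs(f_n(n, x) - f_limit(x)) for x in xs))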


In the following we collect some important facts surrounding uniform convergence
that will be used in this lecture. We give the short proofs, or at least sketches. However,
if you are feeling a bit rusty on these concepts, all of these are good exercises to try
and prove yourself directly from first principles.
Lemma 1.25. A sequence (fn )n∈N of functions on a set X converges uniformly if
and only if it is uniformly Cauchy , i.e. for every ε > 0 there exists N ∈ N such that
for all n, m ≥ N and all x ∈ X, |fn (x) − fm (x)| < ε.
Proof. First assume that fn converges uniformly to f . Then use
|fn (x) − fm (x)| = |fn (x) − f (x) + f (x) − fm (x)| ≤ |fn (x) − f (x)| + |f (x) − fm (x)|
to see that fn is uniformly Cauchy.
Now assume that (fn ) is uniformly Cauchy. Given ε > 0 there is N such that |fn (x)−
fm (x)| < ε/2 for n ≥ N , x ∈ X. In particular for every x ∈ X the numerical sequence
(fn (x))n∈N is Cauchy. Since all Cauchy sequences in R and C converge this numerical
sequence has a limit, call it f (x). Letting m → ∞ we see that limm→∞ |fn (x)−fm (x)| =
|fn (x) − f (x)| and it follows that for n ≥ N we get |fn (x) − f (x)| ≤ ε/2. Since ε is
arbitrary this means that (fn ) converges to f uniformly. 
Lemma 1.26. If (fn )n∈N converges uniformly to f and each fn is bounded, then f
is bounded.
(Recall that a function f : X → C is called bounded if there exists C > 0 such that
|f (x)| ≤ C for all x ∈ X.)
Proof. By the assumed uniform convergence there is N such that |fN (x) − f (x)| < 1 for all x ∈ X. Since fN is bounded there is M > 0 such that |fN (x)| ≤ M for all x ∈ X. Now use
|f (x)| = |f (x) − fN (x) + fN (x)| ≤ |f (x) − fN (x)| + |fN (x)| < 1 + M.

We shall now assume that X is a metric space, with metric d.
Lemma 1.27. Let X be a metric space and a ∈ X. If (fn )n∈N converges uniformly
to f and each fn is continuous at a, then f is continuous at a.
Proof. We have to show that given ε > 0 there is δ > 0 such that |f (x)−f (a)| < ε
provided that d(x, a) < δ.
Since fn converges uniformly to f there is an N ∈ N such that |fn (x) − f (x)| < ε/3
for n ≥ N , and all x ∈ X. Consider the continuous function fN . There is δ > 0 such
that |fN (x) − fN (a)| < ε/3 provided that d(x, a) < δ. For such x we get
|f (x) − f (a)| = |f (x) − fN (x) + fN (x) − fN (a) + fN (a) − f (a)|
≤ |f (x) − fN (x)| + |fN (x) − fN (a)| + |fN (a) − f (a)| < ε/3 + ε/3 + ε/3 = ε.

Lemma 1.28. Let X be a metric space. The space of bounded continuous functions
Cb (X) is a complete metric space with the supremum metric
d∞ (f, g) = sup |f (x) − g(x)|.
x∈X

(Recall that a metric space is complete if every Cauchy sequence converges.)


Proof. We leave the verification of the metric properties to the reader. Let (fn )n∈N
be a Cauchy sequence. Then (fn ) is uniformly Cauchy and by Lemma 1.25 it is uniformly convergent to a limiting function f . By Lemmas 1.26 and 1.27, f belongs to Cb (X) and thus indeed (fn ) converges to f with respect to the metric in Cb (X). 
Rephrasing Lemma 1.24 we get
Lemma 1.29. Let (fn )n∈N ⊂ Cb (X) be a sequence. Then (fn )n∈N converges in Cb (X)
(with respect to d∞ ) if and only if it converges uniformly to f for some f ∈ Cb (X).
Exercise 1.30. The concept of uniform convergence can be extended to sequences
of functions fn : X → Y where (Y, dY ) is a metric space. Extend the above definitions
and prove the relevant theorems.
Lemma 1.31. Let (fn)n∈N ⊂ Cb([a, b]) be a sequence that converges uniformly to f. Then f is Riemann integrable on [a, b] and lim_{n→∞} ∫_a^b fn = ∫_a^b f.

Proof. By Lemma 1.29, f is continuous and bounded on [a, b] and thus fn, f, |fn − f| are Riemann integrable. We have by Theorem A.29
|∫_a^b fn − ∫_a^b f| = |∫_a^b (fn − f)| ≤ ∫_a^b |fn − f| ≤ (b − a) sup_{a≤t≤b} |fn(t) − f(t)| = (b − a) d∞(fn, f)
and the conclusion immediately follows from Lemma 1.29.


The preceding theorem can be generalized as follows.
Lemma 1.32. Let (fn)n∈N be a sequence of Riemann integrable functions on an interval [a, b] that converges uniformly to f on [a, b]. Then f is Riemann integrable on [a, b] and lim_{n→∞} ∫_a^b fn = ∫_a^b f.
Proof. Left as an exercise. This requires a closer review of the Riemann integral,
see the Appendix, §4 . 
Lemma 1.33. Let (fn) be a sequence of functions differentiable in [a, b] such that fn′ is continuous in [a, b] for every n ∈ N. Suppose that there is x0 ∈ [a, b] such that the numerical sequence (fn(x0))n∈N converges and that the sequence of derivatives converges uniformly on [a, b] to a function g.
Then (fn)n∈N converges uniformly to a function f, and f is differentiable with f′(x) = g(x) for all x ∈ [a, b].
Proof of Lemma 1.33. By the fundamental theorem of calculus we have fn(x) − fn(x0) = ∫_{x0}^x fn′(t) dt. The right hand side converges uniformly to ∫_{x0}^x g(t) dt, and if c = lim_{n→∞} fn(x0) we see that fn converges uniformly to f(x) = c + ∫_{x0}^x g(t) dt. Since g is continuous, the function f is differentiable and f′ = g.
Careful: If fn → f uniformly and fn is differentiable on [a, b], then this does not
imply that f is differentiable.

2. The contraction principle


The contraction principle is a powerful tool in analysis.
Definition 1.34. A map ϕ : X → X is called a contraction (of X) if there exists
a constant c ∈ (0, 1) such that
(1.2) d(ϕ(x), ϕ(y)) ≤ c · d(x, y)
holds for all x, y ∈ X.
Observe that contractions are continuous.
Exercise 1.35. Let f : (a, b) → (a, b) be differentiable and suppose that c ∈ (0, 1) is such that |f′(x)| ≤ c for all x ∈ (a, b). Show that f is a contraction of (a, b).
Theorem 1.36 (Banach fixed point theorem). Let X be a complete metric space
and ϕ : X → X a contraction. Then there exists a unique x∗ ∈ X such that ϕ(x∗ ) = x∗ .
Remark. A point x ∈ X such that ϕ(x) = x is called a fixed point of ϕ.

Proof. Uniqueness: Suppose x0 , x1 ∈ X are fixed points of ϕ. Then


0 ≤ d(x0 , x1 ) = d(ϕ(x0 ), ϕ(x1 )) ≤ c · d(x0 , x1 ),
which implies d(x0 , x1 ) = 0, since c ∈ (0, 1). Thus x0 = x1 .
Existence: Pick x0 ∈ X arbitrarily and define a sequence (xn )n≥0 recursively by
xn+1 = ϕ(xn ).
We claim that (xn )n∈N is a Cauchy sequence. Indeed, by induction we see that
d(xn+1, xn) ≤ c d(xn, xn−1) ≤ c^2 d(xn−1, xn−2) ≤ · · · ≤ c^n d(x1, x0).
Thus, for n < m we can use the triangle inequality to obtain
d(xm, xn) ≤ d(xm, xm−1) + d(xm−1, xm−2) + · · · + d(xn+1, xn)
= ∑_{i=n}^{m−1} d(xi+1, xi) ≤ ∑_{i=n}^{m−1} c^i d(x1, x0) ≤ d(x1, x0) ∑_{i=n}^∞ c^i = c^n d(x1, x0)/(1 − c).
Thus, d(xm , xn ) converges to 0 as m > n → ∞. This shows that (xn )n∈N is a Cauchy
sequence. By completeness of X, it must converge to a limit which we call x∗ ∈ X. By
continuity of ϕ,
ϕ(x∗ ) = ϕ( lim xn ) = lim ϕ(xn ) = lim xn+1 = x∗ .
n→∞ n→∞ n→∞


Remarks. 1. The proof not only demonstrates the existence of the fixed point x∗ ,
but also gives an algorithm to compute it via successive applications of the map ϕ. We
can say something about how quickly the algorithm converges: the sequence (xn )n∈N
defined in the proof satisfies the inequality
d(xn, x∗) ≤ (c^n/(1 − c)) d(x0, x1),
so the speed of convergence depends only on the parameter c ∈ (0, 1) and the quality of the initial guess x0 ∈ X.
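The following Python sketch (illustrative only; the example map, starting point and tolerance are arbitrary choices, and the function name is ad hoc) implements this successive-approximation algorithm on R and uses the a priori bound c^n d(x0, x1)/(1 − c) as a stopping criterion. The map x ↦ cos(x)/2 is a contraction of R with constant c = 1/2 by the mean value theorem.

import math

def banach_fixed_point(phi, x0, c, tol=1e-12):
    """Successive approximation x_{n+1} = phi(x_n) with the a priori
    stopping rule c**n * d(x0, x1) / (1 - c) <= tol."""
    x_prev, x = x0, phi(x0)
    d01 = abs(x - x_prev)
    n = 1
    while c ** n * d01 / (1 - c) > tol:
        x = phi(x)
        n += 1
    return x, n

# phi(x) = cos(x)/2 is a contraction of R with constant 1/2.
x_star, steps = banach_fixed_point(lambda x: math.cos(x) / 2, x0=0.0, c=0.5)
print(x_star, steps)      # x_star satisfies x = cos(x)/2 up to the tolerance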
2. The contraction principle can be used to solve equations. For example, say we want
to solve F (x) = 0 (F is some function). Then we can set G(x) = F (x) + x. Then
F (x) = 0 if and only if x is a fixed point of G.
3. The conclusion does not necessarily hold if we drop the assumption that X is
complete: the map f : (0, 1) → (0, 1) defined by f (x) = x/2 is a contraction (in which
metric space?) but has no fixed point.
4. If we replace the contraction assumption (1.2) by the weaker condition
(1.3) |ϕ(x) − ϕ(y)| < |x − y|
for all x, y with x ≠ y, then ϕ may not have a fixed point in X. Consider ϕ(x) = x + e^{−x} on the complete metric space X = [0, ∞). One verifies that ϕ′(x) = 1 − e^{−x} ∈ (0, 1) for x ≥ 0, thus (1.3) is satisfied if x ≠ y. But clearly ϕ(x) − x > 0 for x ≥ 0, so ϕ does not have a fixed point.
Exercise 1.37. We are given h ∈ C([0, 1]) and K ∈ C([0, 1]2 ) such that |K(x, t)| ≤
3/4 for (x, t) ∈ [0, 1]2 . Consider the integral equation
(1.4) f(x) = ∫_0^1 K(x, t) f(t) dt + h(x), x ∈ [0, 1].

Show that there exists a unique function continuous in [0, 1] such that (1.4) holds.
Follow the following steps.
(i) Define for f ∈ C([0, 1])
T[f](x) = ∫_0^1 K(x, t) f(t) dt + h(x).
(ii) Show that T maps C([0, 1]) to C([0, 1]).
(iii) Show that sup_{x∈[0,1]} |T[f](x) − T[g](x)| ≤ (3/4) sup_{t∈[0,1]} |f(t) − g(t)| and conclude.
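To see this contraction in action numerically, here is a hedged Python sketch (the kernel, the grid size and the Riemann-sum quadrature are assumptions of the illustration, not part of the exercise; the function name is ad hoc). It discretizes [0, 1] and iterates f ↦ T[f]; since |K| ≤ 3/4, the iterates converge geometrically in the supremum metric.

def solve_fredholm(K, h, n=200, iters=80):
    """Iterate f <- T[f], T[f](x) = int_0^1 K(x, t) f(t) dt + h(x),
    with the integral replaced by a left-endpoint Riemann sum on n points."""
    xs = [i / n for i in range(n)]
    f = [h(x) for x in xs]                   # start the iteration at f_0 = h
    for _ in range(iters):
        f = [sum(K(x, t) * ft for t, ft in zip(xs, f)) / n + h(x) for x in xs]
    return xs, f

# Example data with |K| <= 3/4 on [0,1]^2 (chosen only for this illustration).
xs, f = solve_fredholm(K=lambda x, t: 0.75 * x * t, h=lambda x: 1.0 + x)
print(f[0], f[-1])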
Exercise 1.38. Let A > 0. We are given h ∈ C([0, A]) and K ∈ C([0, A]2 ) such
that |K(x, t)| ≤ B for all (x, t) ∈ [0, A]2 .
Consider the Volterra integral equation
(1.5) f(x) = ∫_0^x K(x, t) f(t) dt + h(x), x ∈ [0, A].
Show that there exists a unique function continuous in [0, A] such that (1.5) holds. Fill
in the details for the following steps.
(i) Define for f ∈ C([0, A])
V[f](x) = ∫_0^x K(x, t) f(t) dt + h(x).
(ii) Show that V maps C([0, A]) to C([0, A]).
(iii) Given a positive number M define a metric on C([0, A]) by
dM(f, g) = sup_{x∈[0,A]} |f(x) − g(x)| e^{−Mx}.

Show that C([0, A]) with this metric is a complete metric space. Show that a sequence
(fn )n∈N of functions in C([0, A]) converges uniformly if and only if it converges with
respect to dM .
(iv) Show
dM(V[f], V[g]) ≤ (B/M) dM(f, g),
so that with the choice M > B the map V becomes a contraction on (C([0, A]), dM).
Remark. The preceding example shows that a smart choice of the metric or metric
space can be crucial in solving such equations. This can often present highly nontrivial
problems in applications.
Exercise 1.39. Show that the system of equations
x1 + (1/10) cos(sin(2x2 + x1)) = 6,
x2 + (1/12) e^{−x1^2} + (1/10) cos(x1 + x2) = 7
has a unique solution (x1, x2) ∈ R2.
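A possible numerical companion (a sketch only; the rearrangement and the number of iterations are ad hoc choices): rewrite the system as a fixed point equation (x1, x2) = Φ(x1, x2) and iterate. The small factors 1/10 and 1/12 are what make Φ a contraction of R2 (for instance in the maximum metric), which is why the iteration stabilizes quickly.

import math

def phi(x1, x2):
    # Equivalent fixed point problem (x1, x2) = Phi(x1, x2).
    y1 = 6.0 - math.cos(math.sin(2.0 * x2 + x1)) / 10.0
    y2 = 7.0 - math.exp(-x1 ** 2) / 12.0 - math.cos(x1 + x2) / 10.0
    return y1, y2

x1, x2 = 0.0, 0.0
for _ in range(40):
    x1, x2 = phi(x1, x2)
print(x1, x2)     # approximate solution; plugging it back reproduces 6 and 7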
Exercise 1.40. (i) Show there is exactly one u ∈ C([−1, 1]) that satisfies the
integral equation
u(x) = x ∫_0^x t^2 cos(u(t)) dt, x ∈ [−1, 1].
Hint: Use the contraction principle in the space C([−1, 1]).
(ii) Show that u is differentiable. Is u′ differentiable?

Exercise 1.41. Let f : R → R be a C^1 function such that |f′(x)| ≤ a < 1 for all x ∈ R. Define a C^1 function g : R2 → R2 by g(x, y) = (x + f(y), y + f(x)). Show that the range of g is all of R2.
Exercise 1.42. Show that there exists a unique (x, y) ∈ R2 such that cos(sin(x)) =
y and sin(cos(y)) = x.
3. Compactness
The goal in this section is to study the general theory of compactness in metric
spaces. From Analysis I, you might already be familiar with compactness in R. By
the Heine-Borel theorem, a subset of Rn is compact if and only if it is bounded and
closed. We will see that this no longer holds in general metric spaces. We will also
study in detail compact subsets of the space of continuous functions C(K) where K is
a compact metric space (Arzelà-Ascoli theorem). Let (X, d) be a metric space. We first
review some basic definitions.
Definition 1.43. A collection (Gi)i∈I (I is an arbitrary index set) of open sets Gi ⊂ X is called an open cover of X if X ⊂ ⋃_{i∈I} Gi.
Definition 1.44. X is compact if every open cover of X contains a finite subcover. That is, if for every open cover (Gi)i∈I there exists m ∈ N and i1, . . . , im ∈ I such that X ⊂ ⋃_{j=1}^m Gij. This is also called the Heine-Borel property.

Definition 1.45. A subset A ⊂ X is called compact if (A, d|A×A ) is a compact


metric space.
Theorem 1.46 (Heine-Borel). A subset A ⊂ R is compact if and only if A is closed
and bounded.
This theorem also holds for subsets of Rn but not for subsets of general metric
spaces. We will later identify this as a special case of a more general theorem.
Definition 1.47. A subset A ⊂ X is called relatively compact or precompact if
the closure A ⊂ X is compact.
Examples 1.48.
• If X is finite, then it is compact.
• [a, b] ⊂ R is compact. [a, b), (a, b) ⊂ R are relatively compact.
• {x ∈ Rn : ∑_{i=1}^n |xi|^2 = 1} ⊂ Rn is compact.
• The set of orthogonal n × n matrices with real entries, O(n, R), is compact as a subset of R^{n^2}.
• For general X, the closed ball
B̄(x0, r) = {x ∈ X : d(x, x0) ≤ r} ⊂ X
is not necessarily compact (examples later).
As a warm-up in dealing with the definition of compactness let us prove the follow-
ing.
Lemma 1.49. A closed subset of a compact metric space is compact.
Proof. Let (Gi )i∈I be an open cover of a closed subset A ⊂ X. That is, Gi ⊂ A
is open with respect to A. Then Gi = Ui ∩ A for some open Ui ⊂ X (see Theorem 2.30
in Rudin’s book). Note that X\A is open. Thus,
{Ui : i ∈ I} ∪ {X\A}
is an open cover of X, which by compactness has a finite subcover {Uik : k =


1, . . . , M } ∪ {X\A}. Then {Gik : k = 1, . . . , M } is an open cover of A. 
Exercise 1.50. A collection {Fα : α ∈ A} of closed sets has the finite intersection property if for every finite subset A0 of A the intersection ⋂_{α∈A0} Fα is not empty.
Prove that the following statements (i), (ii) are equivalent.
(i) A metric space X, with metric d, is compact.
(ii) For every collection {Fα}α∈A of closed sets with the finite intersection property it follows that
⋂_{α∈A} Fα ≠ ∅.

Exercise 1.51. Let X be a compact metric space. Prove that there exists a count-
able, dense set E ⊂ X (recall that E ⊂ X is called dense if E = X).
Exercise 1.52. Construct a compact subset of real numbers whose accumulation
points form a countable set.
3.1. Compactness and continuity. We will now prove three key theorems that
relate compactness to continuity. In Analysis I you might have seen versions of these
on R or Rn . The proofs are not very interesting, but can serve as instructive examples
of how to prove statements involving the Heine-Borel property.
Theorem 1.53. Let X, Y be metric spaces and assume that X is compact. If a map
f : X → Y is continuous, then it is uniformly continuous.
Proof. Let ε > 0. We need to demonstrate the existence of a number δ > 0
such that for all x, y ∈ X we have that dX (x, y) ≤ δ implies dY (f (x), f (y)) ≤ ε. By
continuity, for every x ∈ X there exists a number δx > 0 such that for all y ∈ X,
dX (x, y) ≤ δx implies dY (f (x), f (y)) ≤ ε/2. Let
Bx = B(x, δx /2) = {y ∈ X : dX (x, y) < δx /2}.
Then (Bx )x∈X is an open cover of X. By compactness, there exists a finite subcover by
Bx1 , . . . , Bxm . Now we set
δ = (1/2) min(δx1 , . . . , δxm ).
We claim that this δ does the job. Indeed, let x, y ∈ X satisfy dX (x, y) ≤ δ. There exists i ∈ {1, . . . , m} such that x ∈ Bxi . Then
dX (xi , y) ≤ dX (xi , x) + dX (x, y) ≤ (1/2)δxi + δ ≤ δxi .

Figure 2. The balls Bxi , B(xi , δxi ), B(x, δ).


Thus, by definition of δxi ,
dY (f (x), f (y)) ≤ dY (f (x), f (xi )) + dY (f (xi ), f (y)) ≤ ε/2 + ε/2 = ε.


Theorem 1.54. Let X, Y be metric spaces and assume that X is compact. If a map
f : X → Y is continuous, then f (X) ⊂ Y is compact.
Note that for A ⊂ X we have A ⊂ f −1 (f (A)) and for B ⊂ Y we have f (f −1 (B)) ⊂
B, but equality need not hold in either case.
Proof. Let (Vi)i∈I be an open cover of f(X). Since f is continuous, the sets Ui = f^{−1}(Vi) ⊂ X are open. We have f(X) ⊂ ⋃_{i∈I} Vi. So,
X ⊂ f^{−1}(f(X)) ⊂ ⋃_{i∈I} f^{−1}(Vi) = ⋃_{i∈I} Ui.

Thus (Ui )i∈I is an open cover of X and by compactness there exists a finite subcover
{Ui1 , . . . , Uim }. That is,
X ⊂ ⋃_{k=1}^m Uik.
Consequently,
f(X) ⊂ ⋃_{k=1}^m f(Uik) ⊂ ⋃_{k=1}^m Vik.
Thus {Vi1 , . . . , Vim } is an open cover of f (X). 
Theorem 1.55. Let X be a compact metric space and f : X → R a continuous
function. Then there exists x0 ∈ X such that f (x0 ) = supx∈X f (x).
By passing from f to −f we see that the theorem also holds with sup replaced by
inf.
Proof. By Theorem 1.54, f (X) ⊂ R is compact. By the Heine-Borel Theorem
1.46, it is therefore closed and bounded. By completeness of the real numbers, f (X)
has a finite supremum sup f (X) and since f (X) is closed we have sup f (X) ∈ f (X), so
there exists x0 ∈ X such that f (x0 ) = sup f (X) = supx∈X f (x). 
Corollary 1.56. Let X be a compact metric space. Then every continuous func-
tion on X is bounded: C(X) = Cb (X).
For a converse of this statement, see Exercise 1.104 below.
Proof. Let f ∈ C(X). Then |f | : X → [0, ∞) is also continuous. By Theorem
1.55 there exists x0 ∈ X such that |f (x0 )| = supx∈X |f (x)|. Set C = |f (x0 )|. Then
|f (x)| ≤ C for all x ∈ X, so f is bounded. 
3.2. Sequential compactness and total boundedness.
Definition 1.57. A metric space X is sequentially compact if every sequence in X
has a convergent subsequence. This is also called the Bolzano-Weierstrass property .
Let us recall the Bolzano-Weierstrass theorem which you should have seen in Anal-
ysis I.
Theorem 1.58 (Bolzano-Weierstrass). Every bounded sequence in R has a conver-
gent subsequence.
Definition 1.59. A metric space X is bounded if it is contained in a single fixed
ball, i.e. if there exist x0 ∈ X and r > 0 such that X ⊂ B(x0 , r).

Definition 1.60. A metric space X is totally bounded if for every ε > 0 there exist
finitely many balls of radius ε that cover X.
Similarly, we define these terms for subsets A ⊂ X by considering (A, d|A×A ) as its
own metric space.
Note that
X totally bounded =⇒ X bounded.
The converse is generally false. However, for A ⊂ Rn we have that A is totally bounded
if and only if A is bounded.
Theorem 1.61. Let X be a metric space. The following are equivalent:
(1) X is compact
(2) X is sequentially compact
(3) X is totally bounded and complete
Corollary 1.62.
(1) (Heine-Borel Theorem) A subset A ⊂ Rn is compact if and only if it is bounded
and closed.
(2) (Bolzano-Weierstrass Theorem) A subset A ⊂ Rn is sequentially compact if
and only if it is bounded and closed.
Proof of Corollary 1.62. A subset A ⊂ Rn is closed if and only if A is com-
plete as a metric space (this is because Rn is complete). Also, A ⊂ Rn is bounded if
and only if it is totally bounded. Therefore, both claims follow from Theorem 1.61. 
Example 1.63. Let ℓ∞ be the space of bounded sequences (an)n∈N ⊂ C with d(a, b) = sup_{n∈N} |an − bn| (that is, ℓ∞ = Cb(N)). We claim that the closed unit ball around 0 = (0, 0, . . . ),
B̄(0, 1) = {a ∈ ℓ∞ : |an| ≤ 1 ∀n ∈ N},
is bounded and closed, but not compact. Indeed, let e^{(k)} ∈ ℓ∞ be the sequence with e^{(k)}_n = 0 for n ≠ k and e^{(k)}_n = 1 for n = k. Then e^{(k)} ∈ B̄(0, 1) for all k = 1, 2, . . . but (e^{(k)})_k ⊂ B̄(0, 1) does not have a convergent subsequence, because d(e^{(k)}, e^{(j)}) = 1 for all k ≠ j and therefore no subsequence can be Cauchy. Thus B̄(0, 1) is not sequentially compact and by Theorem 1.61 it is not compact.
Example 1.64. Let ℓ^1 be the space of complex sequences (an)n∈N ⊂ C such that ∑_n |an| < ∞. We define a metric on ℓ^1 by
d(a, b) = ∑_n |an − bn|.
Exercise 1.65. Show that the closed and bounded set B̄(0, 1) ⊂ ℓ^1 is not compact.
Proof of Theorem 1.61. X compact ⇒ X sequentially compact: Suppose that
X is compact, but not sequentially compact. Then there exists a sequence (xn )n∈N ⊂ X
without a convergent subsequence. Let A = {xn : n ∈ N} ⊂ X. Note that A must
be an infinite set (otherwise (xn )n∈N has a constant subsequence). Since A has no
accumulation points, we have that for every xn there is an open ball Bn such that
Bn ∩ A = {xn }. Also, A is a closed set, so X\A is open. Thus, {Bn : n ∈ N} ∪ {X\A}
is an open cover of X. By compactness of X, it has a finite subcover, but that is a


contradiction since A is an infinite set.

Figure 3.

X sequentially compact ⇒ X totally bounded and complete: Suppose X is sequen-


tially compact. Then it is complete, because every Cauchy sequence that has a conver-
gent subsequence must converge (prove this!). Suppose that X is not totally bounded.
Then there exists ε > 0 such that X cannot be covered by finitely many ε-balls.
Claim: There exists a sequence p1, p2, . . . in X such that d(pi, pj) ≥ ε for all i ≠ j.
Proof of claim. Pick p1 arbitrarily and then proceed inductively: say that we have constructed p1, . . . , pn already. Then there exists pn+1 such that d(pi, pn+1) ≥ ε for all i = 1, . . . , n, since otherwise we would have ⋃_{i=1}^n B(pi, ε) ⊃ X.
Now it remains to observe that the sequence (pn )n∈N has no convergent subsequence
(no subsequence can be Cauchy). Contradiction! Thus, X is totally bounded.

X totally bounded and complete ⇒ X sequentially compact: Assume that X is to-


tally bounded and complete. Let (xn )n∈N ⊂ X be a sequence. We will construct a con-
vergent subsequence. First we cover X by finitely many 1-balls. At least one of them,
call it B0, must contain infinitely many of the xn (that is, xn ∈ B0 for infinitely many n), so there is a subsequence (x_n^{(0)})_n ⊂ B0. Next, cover X by finitely many 1/2-balls. There is at least one, B1, that contains infinitely many of the x_n^{(0)}. Thus there is a subsequence (x_n^{(1)})_n ⊂ B1. Inductively, we obtain subsequences (x_n^{(0)})_n ⊃ (x_n^{(1)})_n ⊃ . . . of (xn)n∈N such that (x_n^{(k)})_n is contained in a ball of radius 2^{−k}. Now let an = x_n^{(n)}.
Then (an )n∈N is a subsequence of (xn )n∈N .
Claim: (an )n∈N is a Cauchy sequence.
Proof of claim. Let ε > 0 and N large enough so that 2^{−N+1} < ε. Then for m > n ≥ N we have
d(am, an) ≤ 2 · 2^{−n} ≤ 2^{−N+1} < ε,
because an, am ∈ Bn and Bn is a ball of radius 2^{−n}.
Since X is complete, the Cauchy sequence (an )n∈N converges.

X sequentially compact ⇒ X compact: Assume that X is sequentially compact.


Let (Gi )i∈I be an open cover of X.
Claim: There exists ε > 0 such that every ball of radius ε is contained in one of the
Gi .
Proof of claim. Suppose not. Then for every n ∈ N there is a ball Bn of radius 1/n that is not contained in any of the Gi. Let pn be the center of Bn. By sequential compactness, the sequence (pn)n∈N has a convergent subsequence (pnk)k∈N with some limit p ∈ X. Let i0 ∈ I be such that p ∈ Gi0. Since Gi0 is open there exists δ > 0 such that B(p, δ) ⊂ Gi0. Let k be large enough such that d(pnk, p) < δ/2 and 1/nk < δ/2. Then Bnk ⊂ B(p, δ) because if x ∈ Bnk, then
d(p, x) ≤ d(p, pnk) + d(pnk, x) < δ/2 + δ/2 = δ.

Thus, Bnk ⊂ B(p, δ) ⊂ Gi0 .

Figure 4.
This is a contradiction, because we assumed that the Bn are not contained in any
of the Gi . 
Now let ε > 0 be such that every ε-ball is contained in one of the Gi . We have already
proven earlier that X is totally bounded if it is sequentially compact. Thus there exist
p1 , . . . , pM such that the balls B(pj , ε) cover X. But each B(pj , ε) is contained in a Gi ,
say in Gij , so we have found a finite subcover:
X ⊂ ⋃_{j=1}^M B(pj, ε) ⊂ ⋃_{j=1}^M Gij.


Corollary 1.66. Compact subsets of metric spaces are bounded and closed.
Corollary 1.67. Let X be a complete metric space and A ⊂ X. Then A is totally
bounded if and only if it is relatively compact.
Exercise 1.68. Prove this.
3.3. Equicontinuity and the Arzelà-Ascoli theorem. Let (K, d) be a compact
metric space. By Corollary 1.56, continuous functions on K are automatically bounded.
Thus, C(K) = Cb (K) is a complete metric space with the supremum metric
d∞(f, g) = sup_{x∈K} |f(x) − g(x)|
(see Lemma 1.28). Convergence with respect to d∞ is uniform convergence (see Lemma 1.29).
In this section we ask ourselves when a subset F ⊂ C(K) is compact.
Example 1.69. Let F = {fn : n ∈ N} ⊂ C([0, 1]), where
fn(x) = x^n, x ∈ [0, 1].
F is not compact, because no subsequence of (fn)n∈N converges. This is because the pointwise limit f, given by f(x) = 0 for x ∈ [0, 1) and f(1) = 1, is not continuous, i.e. not in C([0, 1]).
The key concept that characterizes compactness in C(K) is equicontinuity.
Definition 1.70 (Equicontinuity). A subset F ⊂ C(K) is called equicontinuous if
for every ε > 0 there exists δ > 0 such that |f (x) − f (y)| < ε for all f ∈ F, x, y ∈ K
with d(x, y) < δ.

Definition 1.71. F ⊂ C(K) is called uniformly bounded if there exists C > 0


such that |f (x)| ≤ C for all x ∈ K and f ∈ F.
F ⊂ C(K) is called pointwise bounded if for all x ∈ K there exists C = C(x) > 0 such
that |f (x)| ≤ C for all f ∈ F.
Note that F ⊂ C(K) is uniformly bounded if and only if it is bounded (as a metric
space, see Definition 1.59). We have
F uniformly bounded ⇒ F pointwise bounded.
The converse is false in general.
Lemma 1.72. If (fn )n∈N ⊂ C(K) is uniformly convergent (on K), then {fn : n ∈ N}
is equicontinuous.
Proof. Let ε > 0. By uniform convergence there exists N ∈ N such that
sup |fn (x) − fN (x)| ≤ ε/3
x∈K

for n ≥ N . By uniform continuity (using Theorem 1.53) there exists δ > 0 such that
|fk (x) − fk (y)| ≤ ε/3
for all x, y ∈ K with d(x, y) < δ and all k = 1, . . . , N . Thus, for n ≥ N and x, y ∈ K
with d(x, y) < δ we have
|fn (x) − fn (y)| ≤ |fn (x) − fN (x)| + |fN (x) − fN (y)| + |fN (y) − fn (y)| ≤ 3 · ε/3 = ε.

Lemma 1.73. If F ⊂ C(K) is pointwise bounded and equicontinuous, then it is
uniformly bounded.
Proof. Choose δ > 0 such that
|f (x) − f (y)| ≤ 1
for all d(x, y) < δ, f ∈ F. Since K is totally bounded (by Theorem 1.61) there exist
p1 , . . . , pm ∈ K such that the balls B(pj , δ) cover K. By pointwise boundedness, for
every x ∈ K there exists C(x) such that |f (x)| ≤ C(x) for all f ∈ F. Set
C = max{C(p1 ), . . . , C(pm )}.
Then for f ∈ F and x ∈ K,
|f (x)| ≤ |f (pj )| + |f (x) − f (pj )| ≤ C + 1,
where j is chosen such that x ∈ B(pj , δ). 
Theorem 1.74 (Arzelà-Ascoli). A subset F of C(K) is totally bounded if and only
if it is pointwise bounded and equicontinuous.

Proof of necessity. We show that if F is a totally bounded subset of C(K)


then F is pointwise bounded and equicontinuous.
Let ε > 0. By the definition of total boundedness there are functions f1, . . . , fN in F so that for every f ∈ F there is an index i ∈ {1, . . . , N} with sup_{x∈K} |fi(x) − f(x)| < ε/4. Clearly for every x ∈ K,
|f(x)| ≤ |fi(x)| + |f(x) − fi(x)| ≤ max_{i=1,...,N} sup_{x∈K} |fi(x)| + ε/4,
so that F is pointwise bounded.



Now we show the equicontinuity of the family F. By Theorem 1.53 each fi is uniformly continuous. Thus for each i there exists a δi > 0 such that |fi(x) − fi(x′)| < ε/2 whenever dK(x, x′) < δi. Let δ = min{δ1, . . . , δN}. Then δ > 0 and we have |fi(x) − fi(x′)| < ε/2 for every i whenever dK(x, x′) < δ.
Now pick any f ∈ F, and let i be so that sup_{x∈K} |fi(x) − f(x)| < ε/4, and let x, x′ be so that dK(x, x′) < δ. Then
|f(x) − f(x′)| ≤ |f(x) − fi(x)| + |fi(x) − fi(x′)| + |fi(x′) − f(x′)|
≤ ε/4 + |fi(x) − fi(x′)| + ε/4 < ε.

Proof of sufficiency. We show that if F ⊂ C(K) is equicontinuous and point-
wise bounded then F is totally bounded.
Fix ε > 0. We shall first find a finite collection G of functions in B(K) so that for
every f ∈ F there exists a g ∈ G with sup_{x∈K} |f(x) − g(x)| < ε.
Let δ > 0 be such that for all f ∈ F we have |f(x) − f(x′)| < ε/4 whenever d(x, x′) < δ.
Again we use the compactness of K and cover K with finitely many balls B(xi , δ),
i = 1, . . . , L. There is Mi so that |f (xi )| ≤ Mi for all f ∈ F. Let M = 1+maxi=1,...,L Mi .
We now let A1 = B(x1, δ), and Ai = B(xi, δ) \ ⋃_{ν=1}^{i−1} B(xν, δ), for 2 ≤ i ≤ L. (Some
of the Ai could be empty but that does not matter).
Let Z^L(M, ε) be the set of L-tuples ~n of integers ~n = (n1, . . . , nL) with the property that |ni| ε/4 ≤ M for i = 1, . . . , L. Note that Z^L(M, ε) is a finite set (indeed its cardinality is ≤ (8M ε^{−1} + 1)^L).
We now define a collection G of functions which are constant on the sets Ai (these
are analogues of step functions). Namely, given ~n in Z^L(M, ε) we let g~n be the unique function that takes the value ni ε/4 on the set Ai (provided that that set is nonempty). Clearly the cardinality of G is not larger than the cardinality of Z^L(M, ε).
Let f ∈ F. Consider an Ai which by construction is a subset of B(xi , δ). Then
|f (x) − f (xi )| < ε/4 for all x ∈ Ai (this condition is vacuous if Ai is empty). Now
|f (xi )| ≤ Mi ≤ M and therefore there exists an integer ni with the property that
−M ≤ ni ε/4 ≤ M and |f (xi ) − ni ε/4| < ε/4. Then we also have that for i = 1, . . . , L
and for every x ∈ Ai ,
|f (x) − ni ε/4| ≤ |f (x) − f (xi )| + |f (xi ) − ni ε/4| < ε/4 + ε/4 = ε/2.
This implies that for this choice of ~n = (n1 , . . . , nL ) we get supx∈K |f (x) − g~n (x)| < ε/2.
Finally, we need to find a finite cover of F with ε-balls centered at points in F.
Consider the subcollection G̃ of functions in G for which the ball of radius ε/2 centered at g contains a function in F. Denote the functions in G̃ by g1, . . . , gN. The balls
of radius ε/2 centered at g1 , . . . gN cover F. For i = 1, . . . , N pick fi ∈ F so that
supx∈K |gi (x) − fi (x)| < ε/2. By the triangle inequality (for the norm in B(K) whose
restriction to C(K) is also the norm in C(K)) the ball of radius ε/2 centered at gi is
contained in the ball of radius ε centered at fi . Thus the balls of radius ε centered at
fi , i = 1, . . . , N cover the set F. 
We get as a corollary of the theorem of Arzela-Ascoli a characterization of compact-
ness.
Corollary 1.75. A closed subset F of C(K) is compact if and only if it is pointwise
bounded and equicontinuous.

Proof. Recall that the space C(K) is complete. Since we now assume that F
is closed in C(K) the metric space F is complete. Thus by the characterization of
compactness (F compact ⇐⇒ F totally bounded and complete) the corollary follows
from the theorem. 
Corollary 1.76. An equicontinuous and bounded sequence {fn } of functions in
C(K) has a uniformly convergent subsequence.
Proof. The closure of F = {fn : n ∈ N} is bounded, complete, and equicontinuous,
thus compact. By a part of the theorem on the characterization of compactness it is
also sequentially compact, therefore fn has a convergent subsequence. 
We now discuss a special case of the Arzelà-Ascoli theorem.
Corollary 1.77. Let F ⊂ C([a, b]) be such that
(i) F is bounded (i.e. uniformly bounded),
(ii) every f ∈ F is continuously differentiable and
F′ = {f′ : f ∈ F}
Then F is totally bounded.
Proof of Corollary 1.77. Using the mean value theorem we see that for all
x, y ∈ [a, b] there exists ξ ∈ [a, b] such that
f(x) − f(y) = f′(ξ)(x − y).
But since F′ is bounded there exists C > 0 such that
|f′(ξ)| ≤ C
for all f ∈ F, ξ ∈ [a, b]. Thus,
|f(x) − f(y)| ≤ C|x − y|
for all x, y ∈ [a, b] and all f ∈ F. This implies equicontinuity: for ε > 0 we set δ = C −1 ε.
Then for x, y ∈ [a, b] with |x − y| < δ we have
|f (x) − f (y)| ≤ C|x − y| < Cδ = ε.
Therefore the claim follows from Theorem 1.74. 
Example 1.78. Let F = {x ↦ ∑_{n=0}^∞ cn x^n : |cn| ≤ 1} ⊂ C([−1/2, 1/2]). The set F is bounded, because
|∑_{n=0}^∞ cn x^n| ≤ ∑_{n=0}^∞ 2^{−n} = 2
for all sequences (cn)n∈N with |cn| ≤ 1 and for all x ∈ [−1/2, 1/2]. Similarly,
F′ = {∑_{n=1}^∞ n cn x^{n−1} : |cn| ≤ 1}
is also bounded. Thus, F ⊂ C([−1/2, 1/2]) is relatively compact. However, note that F
interpreted as a subset of C([0, 1]) (with the understanding that convergence at x = 1
is also assumed) is not relatively compact (it contains the set in Example 1.69).

Example 1.79. The set
F = {x ↦ sin(πnx) : n ∈ Z} ⊂ C([0, 1])
is bounded, but not relatively compact. Indeed, suppose it is. Then by Arzelà-Ascoli it is equicontinuous, so there exists δ > 0 such that for all n ∈ N and for all x, y ∈ [0, 1] with |x − y| < δ we have |sin(πnx) − sin(πny)| < 1/2. Set x = 0 and y = 1/(2n) for n > δ^{−1}/2. Then |sin(πnx) − sin(πny)| = 1. Contradiction!

Example 1.80. Condition (i) from Corollary 1.77 is necessary, because relatively compact sets are bounded. Condition (ii) however is not necessary. Consider for example F = {fn : n = 1, 2, . . . } ⊂ C([0, 1]) with fn(x) = sin(nx)/√n. The set F is bounded, but F′ is unbounded. But the sequence (fn)n∈N is uniformly convergent, so by Lemma 1.72, F is equicontinuous and hence relatively compact.

4. Covering numbers and Minkowski dimension*

Definition 1.81. Let E be a totally bounded subset of a metric space X, i.e. for
every δ > 0 it is contained in a finite collection of δ-balls.
For δ > 0 let N (E, δ) be the minimal number of δ-balls needed to cover E (the
centers of these balls are not required to belong to E). This number is called the δ-
covering number of E; note that it depends not only on E but also on the underlying
metric space X and the given metric d. The function δ ↦ log N(E, δ) is called the
metric entropy function of E.
The definition of N (E, δ) is extended to sets that are not totally bounded if we allow
the value ∞. If E is not totally bounded then there exists a δ0 such that N (E, δ) = ∞
for δ < δ0 .
One is interested in the behavior of N (E, δ) for small δ. For compact E this serves
as a quantitative measure of compactness.
Definition 1.82. Let E be totally bounded. The number
\overline{dim}_M(E) = lim sup_{δ→0+} log N(E, δ) / log(1/δ)
is called the upper Minkowski dimension (also known as the box counting dimension or upper metric dimension) of E. The expression
\underline{dim}_M(E) = lim inf_{δ→0+} log N(E, δ) / log(1/δ)
is called the lower Minkowski dimension (or lower box counting or lower metric dimension) of E. If \overline{dim}_M(E) = \underline{dim}_M(E) = α we say that E has Minkowski dimension α.
Example 1.83. Let k ≤ n and let E denote a k-dimensional box in Rn:
E = [0, 1]^k × {0}^{n−k} = {x ∈ Rn : xj ∈ [0, 1] for 1 ≤ j ≤ k, xj = 0 for k < j ≤ n}.
Then there exist constants c, c′ > 0 such that
(1.6) c′ δ^{−k} ≤ N(E, δ) ≤ c δ^{−k}
for all δ ∈ (0, 1). Hence E has Minkowski dimension k.
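A crude numerical check of (1.6) in Python (illustration only; axis-parallel grid cubes are used instead of balls, which only affects the constants, and the function name is ad hoc): the number of cubes of side δ needed to cover the k-dimensional unit box grows like δ^{−k}, and the logarithmic ratio tends to k.

import math

def grid_cover_count(k, delta):
    """Number of axis-parallel cubes of side delta covering [0, 1]^k;
    comparable to N(E, delta) up to constants depending only on k."""
    return math.ceil(1.0 / delta) ** k

k = 3
for delta in (0.1, 0.01, 0.001):
    count = grid_cover_count(k, delta)
    print(delta, count, math.log(count) / math.log(1.0 / delta))  # ratio -> k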

Exercise 1.84. Let E ⊂ Rn be a compact set. Show that there exists a constant c ∈ (0, ∞) such that
N(E, δ) ≤ c · δ^{−n}
holds for all δ > 0. Hence \overline{dim}_M E ≤ n.
Exercise 1.85. (i) Show that if we replace the natural log in the above definitions by another logarithm log_b with base b > 1 then the definitions of the dimensions do not change.
(ii) Let α > 0. Suppose that for every ε > 0 there is a δ(ε) > 0 and a positive constant Cε ≥ 1 such that Cε^{−1} δ^{−α+ε} ≤ N(E, δ) ≤ Cε δ^{−α−ε} for 0 < δ < δ(ε). Show that E has Minkowski dimension α.
(iii) Let E ⊂ X be totally bounded and let Ē be the closure of E. Show: Ē is totally bounded and we have
N(E, δ) ≤ N(Ē, δ) ≤ N(E, δ′) if 0 < δ′ < δ.
(iv) Define N cent (E, δ) to be the minimal number of δ-balls with center in E needed
to cover E. Show that
N (E, δ) ≤ N cent (E, δ) ≤ N (E, δ/2).
(v) Let B1, . . . , BM be balls of radius δ in X, so that each ball has nonempty intersection with the set E. For each i = 1, . . . , M denote by Bi* the ball with the same center as Bi and radius 3δ. Assume that the balls B1*, . . . , BM* are disjoint. Prove that M ≤ N(E, δ).
Remark: This can be an effective tool to prove lower bounds for the covering num-
bers.
Exercise 1.86. Consider the following metrics in Rn .
• d1(x, y) = ∑_{i=1}^n |xi − yi|,
• d2(x, y) = (∑_{i=1}^n |xi − yi|^2)^{1/2},
• d∞(x, y) = max_{i=1,...,n} |xi − yi|.
(i) Let E ⊂ Rn and let N1 (E, δ), N2 (E, δ), N∞ (E, δ) be the metric entropy numbers
of E associated with the metrics d1, d2, d∞, respectively. Show that
N∞(E, δ) ≤ N2(E, δ) ≤ N1(E, δ) ≤ N2(E, δ/√n) ≤ N∞(E, δ/n).
(ii) Let Q = [0, 1]n be the unit cube in Rn . Show that Q has Minkowski dimension
n (with respect to any of the metrics d1, d2, d∞).
(iii) Let f be a differentiable function on [0, 1] with bounded derivative. Let E be
the set of all x = (x1 , x2 ) ∈ R2 for which 0 ≤ x1 ≤ 1 and x2 = f (x1 ). What is the
Minkowski dimension of E?

(iv) Let E be the set of all x = (x1 , x2 ) ∈ R2 for which 0 ≤ x1 ≤ 1 and x2 = x1 .
What is the Minkowski dimension of E?
Exercise 1.87. Let β > 0. Consider the subset E of R consisting of the numbers n^{−β} for n = 1, 2, . . . . Show that E has a Minkowski dimension and determine it.
Hint: It might help to try this first for the sequence 1/n which, perhaps counterintuitively, turns out to have Minkowski dimension 1/2.
Example 1.88. The Cantor middle third set C is the subset of [0, 1] consisting of numbers of the form
∑_{k=1}^∞ ak 3^{−k} where ak ∈ {0, 2}.
It can be written as
(1.7) C = [0, 1] \ ⋃_{ℓ=0}^∞ ⋃_{k=0}^{3^ℓ−1} ((3k+1)/3^{ℓ+1}, (3k+2)/3^{ℓ+1}).
C is a compact subset of [0, 1], with the property that for each N there are 2^N disjoint closed intervals of length 3^{−N} which cover C.
Exercise 1.89. Show that C has Minkowski dimension log 2 / log 3.
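A numerical box-counting sketch related to this exercise (not a proof; the random sampling and the depth of the base-3 expansion are ad hoc choices of the illustration): sample points of C via base-3 digits in {0, 2} and count how many triadic intervals of length 3^{−N} they occupy. The ratio log(count)/log(3^N) stays at log 2 / log 3 ≈ 0.6309 once the sample is large enough to hit all 2^N intervals.

import math
import random

def cantor_point(depth=30):
    """A point of the Cantor set: base-3 expansion with digits in {0, 2}."""
    return sum(random.choice((0, 2)) * 3.0 ** (-k) for k in range(1, depth + 1))

points = [cantor_point() for _ in range(100_000)]
for N in (1, 2, 4, 6, 8):
    # Index of the triadic interval of length 3**(-N) containing each point.
    boxes = {math.floor(p * 3 ** N) for p in points}
    print(N, len(boxes), math.log(len(boxes)) / math.log(3 ** N))
print(math.log(2) / math.log(3))      # the limiting value log 2 / log 3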
Exercise 1.90. Let A be the space of functions f : N → R (aka sequences) so that
|f(n)| ≤ 2^{−n} for all n ∈ N. It is a subset of the space of bounded sequences with norm ‖f‖∞ = sup_{n∈N} |f(n)| and associated metric d∞. Show that for δ < 1/2 the covering numbers N(A, δ) satisfy the bound
N(A, δ) ≤ (1/δ)^{C + (1/2) log_2(1/δ)},
where C is independent of δ. Hint: It helps to work with δ = 2^{−M} where M ∈ N.
Also provide a lower bound which shows that A does not have finite lower Minkowski
dimension.

5. Oscillation as a quantification of discontinuity*


In this section let (X, d) be a metric space and f : X → R be a function.
Definition 1.91. (i) Let f : X → R. For each x ∈ X and δ > 0 we form the
expressions
Mf,δ (x) = sup{f (y) : d(x, y) < δ, y ∈ X}
mf,δ (x) = inf{f (y) : d(x, y) < δ, y ∈ X}
Observe that, for fixed x ∈ X, mf,δ(x) increases as δ decreases, while Mf,δ(x) decreases as δ decreases. Thus Mf,δ(x) − mf,δ(x) is a nonnegative quantity which decreases as δ decreases. Hence the limit as δ → 0+ exists.
Definition 1.92. We call the quantity
oscf(x) = lim_{δ→0+} (Mf,δ(x) − mf,δ(x))
the oscillation of f at x.
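A small Python sketch approximating this definition (illustrative only; the sampling grid, the shrinking list of δ's and the sample function are arbitrary choices): estimate M_{f,δ}(x) and m_{f,δ}(x) by sampling f near x and report their difference for the smallest δ. For a jump function the oscillation at the jump is the size of the jump.

def oscillation(f, x, deltas=(1.0, 0.1, 0.01, 0.001), samples=2001):
    """Approximate osc_f(x) by M_{f,delta}(x) - m_{f,delta}(x) for the
    smallest delta in the list, sampling f on [x - delta, x + delta]."""
    gap = None
    for delta in deltas:
        ys = [f(x - delta + 2 * delta * i / (samples - 1)) for i in range(samples)]
        gap = max(ys) - min(ys)
    return gap

step = lambda t: 1.0 if t >= 0 else 0.0
print(oscillation(step, 0.0))   # 1.0: the size of the jump at 0
print(oscillation(step, 0.5))   # 0.0: the function is continuous at 0.5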
The number oscf (x) can be used to quantify discontinuities:
Lemma 1.93. Let f : X → R be a bounded function. Then f is continuous at x if
and only if oscf (x) = 0.
Proof. This is a consequence of the definition of continuity. 
Lemma 1.94. Let f : X → R be a bounded function. Then for every γ ≥ 0 the set
{x : oscf (x) ≥ γ} is closed.
Proof. The conclusion is shown by proving that the complement
Ωγ = {x : oscf (x) < γ}
is open. Let x ∈ Ωγ and choose ε such that 0 < ε < γ − oscf (x). By the definition of
oscf (x) we can pick δ > 0 such that Mf,δ (x)−mf,δ (x) < oscf (x)+ε. If d(y, x) < δ/2 and

d(z, y) < δ/2 then d(z, x) < δ and thus Mf,δ/2 (y) ≤ Mf,δ (x) and mf,δ/2 (y) ≥ mf,δ (x).
Hence
oscf (y) ≤ Mf,δ/2 (y) − mf,δ/2 (y) ≤ Mf,δ (x) − mf,δ (x) < oscf (x) + ε < γ
so that B(x, δ/2) ⊂ Ωγ . Hence x is an interior point of Ωγ and since x was chosen
arbitrarily in Ωγ this set is open. 
Exercise 1.95. Define f : [−10, 10] → R by f (x) = −4x for x ≤ 0, f (x) = sin(π/x)
for 0 < x < 3/2, f (x) = cos(π/x) for x ≥ 3/2. Determine oscf (x) for all x ∈ [−10, 10].
Exercise 1.96. Consider Thomae's function $f : [0,1] \to \mathbb R$, defined by
$$f(x) = \begin{cases} 0 & \text{if } x \in [0,1] \setminus \mathbb Q,\\ \frac{1}{n} & \text{if } x = \frac{m}{n} \text{ with } \gcd(m,n) = 1. \end{cases}$$
Find oscf (x) for all x ∈ [0, 1].

6. Further exercises
Exercise 1.97. Let (X, d) be a metric space and A ⊂ X a subset.
(i) Show that $A$ is totally bounded if and only if its closure $\overline A$ is totally bounded.
(ii) Assume that X is complete. Show that A is totally bounded if and only if A is
relatively compact. Which direction is still always true if X is not complete?
Exercise 1.98. Let $\ell^1$ denote the space of all sequences $(a_n)_{n\in\mathbb N}$ of complex numbers such that $\sum_{n=1}^\infty |a_n| < \infty$, equipped with the metric $d(a,b) = \sum_{n=1}^\infty |a_n - b_n|$.
(i) Prove that
$$A = \Big\{a \in \ell^1 : \sum_{n=1}^\infty |a_n| \le 1\Big\}$$
is bounded and closed, but not compact.
(ii) Let b ∈ `1 with bn ≥ 0 for all n ∈ N. Show that
B = {a ∈ `1 : |an | ≤ bn ∀ n ∈ N}
is compact.
Exercise 1.99. Recall that `∞ is the metric space of bounded sequences of complex
numbers equipped with the supremum metric d(a, b) = supn∈N |an − bn |. Let s ∈ `∞ be
a sequence of non–negative real numbers that converges to zero. Let
A = {a ∈ `∞ : |an | ≤ sn for all n}.
Prove that A ⊂ `∞ is compact.
Exercise 1.100. For each of the following subsets of C([0, 1]) prove or disprove
compactness:
(i) A1 = {f ∈ C([0, 1]) : maxx∈[0,1] |f (x)| ≤ 1}.
(ii) A2 = A1 ∩ {p : p polynomial of degree ≤ d} (where d ∈ N is given) .
(iii) A3 = A1 ∩ {f : f is a power series with infinite radius of convergence}.
Exercise 1.101. Let F ⊂ C([a, b]) be a bounded set. Assume that there exists a
function ω : [0, ∞) → [0, ∞) such that
$$\lim_{t\to 0^+} \omega(t) = \omega(0) = 0$$

and for all x, y ∈ [a, b], f ∈ F,


|f (x) − f (y)| ≤ ω(|x − y|).
Show that F ⊂ C([a, b]) is relatively compact.
Exercise 1.102. For $1 \le p < \infty$ we denote by $\ell^p$ the space of sequences $(a_n)_{n\in\mathbb N}$ of complex numbers such that $\sum_{n=1}^\infty |a_n|^p < \infty$. Define a metric on $\ell^p$ by
$$d(a,b) = \Big( \sum_{n\in\mathbb N} |a_n - b_n|^p \Big)^{1/p}.$$

The purpose of this exercise is to prove a theorem of Fréchet that characterizes com-
pactness in `p . Let F ⊂ `p .
(i) Assume that F is bounded and equisummable in the following sense: for all ε > 0
there exists N ∈ N such that
$$\sum_{n=N}^{\infty} |a_n|^p < \varepsilon \quad \text{for all } a \in F.$$

Then show that F is totally bounded.


(ii) Conversely, assume that F is totally bounded. Then show that it is equisummable
in the above sense.
Hint: Mimic the proof of Arzelà-Ascoli.
Exercise 1.103. Let $C^k([a,b])$ denote the space of $k$-times continuously differentiable functions on $[a,b]$ endowed with the metric
$$d(f,g) = \sum_{j=0}^{k} \sup_{x\in[a,b]} |f^{(j)}(x) - g^{(j)}(x)|.$$

Let 0 ≤ ` < k be integers and consider the canonical embedding map


ι : C k ([a, b]) → C ` ([a, b]) with ι(f ) = f.
Prove that if B ⊂ C k ([a, b]) is bounded, then the image ι(B) = {ι(f ) : f ∈ B} ⊂
C ` ([a, b]) is relatively compact. Hint: Use the Arzelà-Ascoli theorem.

Exercise 1.104. Let X be a metric space. Assume that for every continuous
function f : X → C there exists a constant Cf > 0 such that |f (x)| ≤ Cf for all
x ∈ X. Show that X is compact. Hint: Assume that X is not sequentially compact
and construct an unbounded continuous function on X.
Exercise 1.105. Consider $F = \{f_N : N \in \mathbb N\} \subset C([0,1])$ with
$$f_N(x) = \sum_{n=0}^{N} b^{-n\alpha} \sin(b^n x),$$

where 0 < α < 1 and b > 1 are fixed.


(a) Show that F is relatively compact in C([0, 1]).
(b) Show that the set of derivatives $\{f_N' : N \in \mathbb N\}$ is not a bounded subset of $C([0,1])$.
(c) Show that there exists c > 0 such that for all x, y ∈ R and N ∈ N,
|fN (x) − fN (y)| ≤ c|x − y|α .

Exercise 1.106. Suppose (X, d) is a metric space with a countable dense subset, i.e.
a set $A = \{x_1, x_2, \dots\} \subset X$ with $\overline A = X$. Let $\ell^\infty$ denote the metric space of bounded
sequences a = (an )n∈N of real numbers with metric d∞ (a, b) = supn∈N |an − bn |. Show
that there exists a map ι : X → `∞ with d∞ (ι(x), ι(y)) = d(x, y) for every x, y ∈ X (in
other words, X can be isometrically embedded into `∞ ).
CHAPTER 2

Linear operators and derivatives

1. Bounded linear operators


Let K denote either one of the fields R or C. Let X be a vector space over K.
Definition 2.1. A map k · k : X → [0, ∞) is called a norm if for all x, y ∈ X and
λ ∈ K,
kλxk = |λ| · kxk, kx + yk ≤ kxk + kyk, kxk = 0 ⇔ x = 0.
A K-vector space equipped with a norm is called a normed vector space . On every
normed vector space we have a natural metric space structure defined by
d(x, y) = kx − yk.
A normed vector space which is also complete as a metric space is called a Banach space.
Examples 2.2.
• Rn with the Euclidean norm is a Banach space.
• Rn with the norm kxk = supi=1,...,n |xi | is also a Banach space.
• If K is a compact metric space, then C(K) is a Banach space with the supre-
mum norm kf k∞ = supx∈K |f (x)|.
• The space of continuous functions on $[0,1]$ equipped with the $L^2$-norm $\|f\|_2 = \big( \int_0^1 |f(x)|^2\,dx \big)^{1/2}$ is a normed vector space, but not a Banach space (why?).
Example 2.3. The set of bounded sequences (an )n∈N of complex numbers equipped
with the `∞ -norm,
kak∞ = sup |an |
n=1,2,...

is a Banach space. As a metric space, $\ell^\infty$ coincides with $C_b(\mathbb N)$.


Exercise 2.4. Define $\ell^1 = \{(a_n)_{n\in\mathbb N} \subset \mathbb C : \sum_{n=1}^\infty |a_n| < \infty\}$. We equip $\ell^1$ with the norm defined by
$$\|a\|_1 = \sum_{n=1}^{\infty} |a_n|.$$

Prove that this defines a Banach space.


Exercise 2.5. Define $\ell^2 = \{(a_n)_{n\in\mathbb N} \subset \mathbb C : \sum_{n=1}^\infty |a_n|^2 < \infty\}$. We equip $\ell^2$ with the norm defined by
$$\|a\|_2 = \Big( \sum_{n=1}^{\infty} |a_n|^2 \Big)^{1/2}.$$

Prove that this is really a norm and that `2 is complete.



Let X, Y be normed vector spaces. Recall that a map T : X → Y is called linear if


T (x + λy) = T x + λT y
for every x, y ∈ X, λ ∈ K. Here we adopt the convention that whenever T is a linear
map we write T x instead of T (x) (unless brackets are necessary because of operator
precedence).
Definition 2.6. A linear map T : X → Y is called bounded if there exists C > 0
such that kT xkY ≤ CkxkX for all x ∈ X.
Linear maps between normed vector spaces are also referred to as linear operators .
Lemma 2.7. Let T : X → Y be a linear map. The following are equivalent:
(i) T is bounded
(ii) T is continuous
(iii) T is continuous at 0
(iv) supkxkX =1 kT xkY < ∞
Proof. (i) ⇒ (ii): By assumption and linearity, for x, y ∈ X,
kT x − T ykY = kT (x − y)kY ≤ Ckx − ykX .
This implies continuity.
(ii) ⇒ (iii): There is nothing to prove.
(iii) ⇒ (iv): By continuity at 0 there exists δ > 0 such that for x ∈ X with kxkX ≤ δ
we have kT xkY ≤ 1. Let x ∈ X with kxkX = 1. Then kδxkX = δ, so
kT (δx)kY ≤ 1
By linearity of T , kT xkY ≤ δ −1 . Thus, supkxkX =1 kT xkY ≤ δ −1 < ∞.
(iv) ⇒ (i): Let x ∈ X with x 6= 0. Let C = supkxkX =1 kT xkY < ∞. Then
x
= 1.
kxkX X
Thus,  x 
T ≤ C,
kxkX Y
and by linearity of T this implies
kT xkY ≤ CkxkX . 
Definition 2.8. By L(X, Y ) we denote the space of bounded linear maps T : X →
Y . For every T ∈ L(X, Y ) we define its operator norm by
$$\|T\|_{op} = \sup_{x \ne 0} \frac{\|Tx\|_Y}{\|x\|_X}.$$

We also denote kT kop by kT kX→Y .


One should think of kT kop as the best (i.e. smallest) constant C > 0 for which
kT xkY ≤ CkxkX
holds. We have by definition that
kT xkY ≤ kT kop kxkX .
Observe that by linearity of T and homogeneity of the norm,
kT kop = sup kT xkY = sup kT xkY .
kxkX =1 kxkX ≤1

Exercise 2.9. Show that L(X, Y ) endowed with the operator norm forms a normed
vector space (i.e. show that k · kop is a norm).
Example 2.10. Let A ∈ Rn×m be a real n × m matrix. We view A as a linear
map Rm → Rn : for x ∈ Rm , A(x) = A · x ∈ Rn . Let us equip Rn and Rm with the
corresponding k · k∞ norms. Consider the operator norm kAk∞→∞ = supkxk∞ =1 kAxk∞
with respect to these normed spaces:
$$\|Ax\|_\infty = \max_{i=1,\dots,n} \Big| \sum_{j=1}^m A_{ij} x_j \Big| \le \Big( \max_{i=1,\dots,n} \sum_{j=1}^m |A_{ij}| \Big) \|x\|_\infty.$$
This implies $\|A\|_{\infty\to\infty} \le \max_{i=1,\dots,n} \sum_{j=1}^m |A_{ij}|$. On the other hand, for given $i = 1, \dots, n$ we choose $x \in \mathbb R^m$ with $x_j = |A_{ij}|/A_{ij}$ if $A_{ij} \ne 0$ and $x_j = 0$ if $A_{ij} = 0$. Then $\|x\|_\infty \le 1$ and
$$\|A\|_{\infty\to\infty} \ge \|Ax\|_\infty \ge \sum_{j=1}^m |A_{ij}|.$$
Since $i$ was arbitrary, we get $\|A\|_{\infty\to\infty} \ge \max_{i=1,\dots,n} \sum_{j=1}^m |A_{ij}|$. Altogether we proved
$$\|A\|_{\infty\to\infty} = \max_{i=1,\dots,n} \sum_{j=1}^m |A_{ij}|.$$
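A quick numerical illustration (not part of the text) of the identity just proved: the maximum absolute row sum agrees with a brute-force evaluation of $\sup_{\|x\|_\infty = 1} \|Ax\|_\infty$ over all sign vectors, which is where the supremum is attained. The random matrix and its size are arbitrary choices for the demo.

import itertools
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))

row_sum_formula = np.max(np.sum(np.abs(A), axis=1))

# Evaluate ||Ax||_inf over all vectors with entries +-1 (these have ||x||_inf = 1).
best = max(np.max(np.abs(A @ np.array(s, dtype=float)))
           for s in itertools.product([-1.0, 1.0], repeat=6))

print(row_sum_formula, best)   # the two numbers coincide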

Exercise 2.11. Let $A \in \mathbb R^{n\times m}$. For $x \in \mathbb R^n$ we define $\|x\|_1 = \sum_{i=1}^n |x_i|$.
(i) Determine the value of kAk1→1 = supkxk1 =1 kAxk1 (that is, find a formula for kAk1→1
involving only finitely many computations in terms of the entries of A).
(ii) Do the same for kAk1→∞ = supkxk1 =1 kAxk∞ .

Exercise 2.12. Let $A \in \mathbb R^{n\times n}$. Define $\|x\|_2 = (\sum_{i=1}^n |x_i|^2)^{1/2}$ (Euclidean norm) and $\|A\|_{2\to2} = \sup_{\|x\|_2 = 1} \|Ax\|_2$. Observe that $AA^T$ is a symmetric $n\times n$ matrix and hence has only non-negative eigenvalues. Denote the largest eigenvalue of $AA^T$ by $\rho$. Prove that $\|A\|_{2\to2} = \sqrt{\rho}$. Hint: First consider the case that $A$ is symmetric. Use that symmetric matrices are orthogonally diagonalizable.
Exercise 2.13. Let $A \in \mathbb R^{n\times n}$ and define
$$\|A\|_{HS} = \Big( \sum_{i=1}^n \sum_{j=1}^n |A_{ij}|^2 \Big)^{1/2}.$$
This is a norm on $\mathbb R^{n\times n}$. Prove the following properties for all $A, B \in \mathbb R^{n\times n}$:
(i) $\|AB\|_{HS} \le \|A\|_{HS}\, \|B\|_{HS}$
(ii) $\|A\|_{HS} = \sqrt{\operatorname{trace}(AA^T)}$
(iii) $\|UA\|_{HS} = \|A\|_{HS}$ for all orthogonal $U \in \mathbb R^{n\times n}$
(iv) $\|A\|_{2\to2} \le \|A\|_{HS} \le \sqrt{n}\, \|A\|_{2\to2}$. Hint: First do Exercise 2.12.
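The following Python snippet is only a numerical illustration of Exercises 2.12 and 2.13, not a solution: it compares $\sqrt\rho$ (with $\rho$ the largest eigenvalue of $AA^T$) with numpy's spectral norm and checks the chain of inequalities in item (iv) for a randomly chosen matrix.

import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))

spec = np.sqrt(np.max(np.linalg.eigvalsh(A @ A.T)))   # sqrt(rho), as in Exercise 2.12
hs = np.sqrt(np.sum(A ** 2))                          # Hilbert-Schmidt norm

print(spec, np.linalg.norm(A, 2))          # agrees with the spectral norm of A
print(spec <= hs <= np.sqrt(n) * spec)     # the inequalities of item (iv) hold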
Example 2.14. Let A ∈ Rn×n be an invertible n × n matrix and b ∈ Rn . Say we
want to solve the linear system
Ax = b
for x. Of course, x = A−1 b. However, A−1 is expensive to compute if n is large, so
other methods are desirable for solving linear equations. Let
F (x) = λ(Ax − b) + x

for some constant λ 6= 0 that we may choose freely. Then Ax = b if and only if x is a
fixed point of F . Moreover,
kF (x) − F (y)k = kλA(x − y) + x − yk = k(λA + I)(x − y)k ≤ kλA + Ikop kx − yk.
Suppose that λ happens to be such that kλA + Ikop < 1. Then F : Rn → Rn is a
contraction, so we can compute the solution to the equation by the iteration xn+1 =
F (xn ).
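A minimal sketch of this iteration in Python, for illustration only. The choice $\lambda = -1/\|A\|_{op}$ for a symmetric positive definite $A$ is an assumption made for the demo (it guarantees $\|\lambda A + I\|_{op} < 1$ in this special case); it is not part of the example above.

import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)           # symmetric positive definite test matrix
b = rng.standard_normal(4)

lam = -1.0 / np.linalg.norm(A, 2)     # eigenvalues of lam*A + I then lie in [0, 1)
F = lambda x: lam * (A @ x - b) + x

x = np.zeros(4)
for _ in range(200):                  # fixed point iteration x_{n+1} = F(x_n)
    x = F(x)

print(np.max(np.abs(A @ x - b)))      # small residual: x approximates the solution of Ax = b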

2. Equivalence of norms
Definition 2.15. Two norms k·ka and k·kb on a vector space X are called equivalent
if there exist constants c, C > 0 such that
ckxka ≤ kxkb ≤ Ckxka
for all x ∈ X.
Exercise 2.16. Prove that equivalent norms generate the same topologies: if k · ka
and k · kb are equivalent then a set U ⊂ X is open with respect to k · ka if and only if
it is open with respect to k · kb .
Exercise 2.17. Show that equivalence of norms forms an equivalence relation on
the space of norms. That is, if we write n1 ∼ n2 to denote that two norms n1 , n2 are
equivalent, then prove that n1 ∼ n1 (reflexivity), n1 ∼ n2 ⇒ n2 ∼ n1 (symmetry) and
n1 ∼ n2 , n2 ∼ n3 ⇒ n1 ∼ n3 (transitivity).
Theorem 2.18. Let X be a finite-dimensional K-vector space. Then all norms on
X are equivalent.

Proof. Let $\{b_1, \dots, b_n\}$ be a basis. Then for every $x \in X$ we can write $x = \sum_{i=1}^n x_i b_i$ with uniquely determined coefficients $x_i \in \mathbb K$. Then $\|x\|_* = \max_i |x_i|$ defines
a norm on X. Let k · k be any norm on X. Since equivalence of norms is an equivalence
relation, it suffices to show that $\|\cdot\|_*$ and $\|\cdot\|$ are equivalent. We have
$$(2.1)\qquad \|x\| \le \sum_{i=1}^n |x_i|\, \|b_i\| \le \big( \max_{j=1,\dots,n} |x_j| \big) \sum_{i=1}^n \|b_i\| = C\|x\|_*,$$
where $C = \sum_{i=1}^n \|b_i\| \in (0,\infty)$. Now define
S = {x ∈ X : kxk∗ = 1}.

We claim that this is a compact set with respect to $\|\cdot\|_*$. Indeed, define the canonical isomorphism $\phi : \mathbb K^n \to X$, $(x_1,\dots,x_n) \mapsto \sum_{i=1}^n x_i b_i$. This is a continuous map (where
we equip Kn with the Euclidean metric, say) and S = φ(K), where K = {x ∈ Kn :
maxi |xi | = 1} is compact by the Heine-Borel Theorem (see Corollary 1.62). Thus S is
compact by Theorem 1.54.
Next note that the function x 7→ kxk is continuous with respect to the k · k∗ norm. This
is because by the triangle inequality and (2.1),
|kxk − kyk| ≤ kx − yk ≤ Ckx − yk∗ .
Thus by Theorem 1.55, x 7→ kxk attains its infimum on the compact set S and therefore
there exists c > 0 such that
(2.2) kyk ≥ c

x
for all y ∈ S. For x ∈ X, x 6= 0 we have kxk ∗
∈ S and thus by homogeneity of norms,
x
using (2.2) with y = kxk∗ gives
kxk ≥ ckxk∗ .
Thus we proved that k · k and k · k∗ are equivalent norms. 
In contrast, two given norms on an infinite-dimensional vector space are generally
not equivalent. For example, the supremum norm and the L2 -norm on C([0, 1]) are not
equivalent (as a consequence of Exercise 4.64).
Corollary 2.19. If X is finite-dimensional then every linear map T : X → Y is
bounded.
Proof. Let $\{x_1, \dots, x_n\} \subset X$ be a basis. Then for $x = \sum_{i=1}^n c_i x_i$ with $c_i \in \mathbb K$,
$$\|Tx\|_Y \le \sum_{i=1}^n |c_i|\, \|Tx_i\|_Y \le C \max_{i=1,\dots,n} |c_i|,$$
where $C = \sum_{i=1}^n \|Tx_i\|_Y$. By equivalence of norms we may assume that $\max_i |c_i|$ is the norm on $X$. □
This is not true if X is infinite-dimensional.
Example 2.20. Let X be the set of sequences of complex numbers (an )n∈N such
that supn∈N n|an | < ∞ and let Y be the space of bounded complex sequences. Then
X ⊂ Y . Equip both spaces with the norm kak = supn∈N |an |. The map T : X → Y ,
(k) (k)
(T a)n∈N = nan is not bounded: let en = 1 if k = n and en = 0 if k 6= n. Then
e(k) ∈ X and T e(k) = ke(k) and ke(k) k = 1. So
kT e(k) k = k
for every k ∈ N and therefore supkxk=1 kT xk = ∞.
Exercise 2.21. Let X be the set of continuously differentiable functions on [0, 1]
and let Y = C([0, 1]). We consider X and Y as normed vector spaces with the norm
kf k = supx∈[0,1] |f (x)|. Define a linear map T : X → Y by T f = f 0 . Show that T is
not bounded.

3. Dual spaces*
Theorem 2.22. Let X be a normed vector space and Y a Banach space. Then
L(X, Y ) is a Banach space (with the operator norm).
Proof. Let (Tn )n∈N ⊂ L(X, Y ) be a Cauchy sequence. Then for every x ∈ X,
(Tn x)n∈N ⊂ Y is Cauchy and by completeness of Y it therefore converges to some limit
which we call T x. This defines a linear operator T : X → Y . We claim that T is
bounded. Since (Tn )n∈N is a Cauchy sequence, it is a bounded sequence. Thus there
exists M > 0 such that kTn kop ≤ M for all n ∈ N. We have for x ∈ X,
kT xkY ≤ kT x − Tn xkY + kTn xkY ≤ kT x − Tn xkY + M kxkX .
Letting n → ∞ we get kT xkY ≤ M kxkX . So T is bounded with kT kop ≤ M . It
remains to show that Tn → T in L(X, Y ). That is, for all ε > 0 we need to find N ∈ N
such that
kTn x − T xkY ≤ εkxkX
for all n ≥ N and x ∈ X. Since (Tn )n∈N is a Cauchy sequence, there exists N ∈ N such
that
$$\|T_n x - T_m x\|_Y \le \tfrac{\varepsilon}{2} \|x\|_X$$
for all n, m ≥ N and x ∈ X. Fix x ∈ X. Then there exists mx ≥ N such that
$$\|T_{m_x} x - Tx\|_Y \le \tfrac{\varepsilon}{2} \|x\|_X.$$
Then if n ≥ N and x ∈ X,
kTn x − T xkY ≤ kTn x − Tmx xkY + kTmx x − T xkY ≤ εkxkX .

Definition 2.23. Let X be a normed vector space. Elements of L(X, K) are called
bounded linear functionals . L(X, K) is called the dual space of X and denoted X 0 .
Corollary 2.24. Dual spaces of normed vector spaces are Banach spaces.
Proof. This follows from Theorem 2.22 because K (which is R or C) is complete.

Theorem 2.25. If X is finite-dimensional, then X 0 is isomorphic to X.
Proof. Let {x1 , . . . , xn } ⊂ X be a basis. Then we can define a corresponding dual
basis of X 0 as follows: let fi ∈ X 0 , i ∈ {1, . . . , n} be the linear map given by fi (xi ) = 1
and fi (xj ) = 0 for j 6= i. Then we claim that {f1 , . . . , fn } is a basis of X 0 . Indeed, let
n
f ∈ X 0 . For x ∈ X we can write x = i=1 ci xi with uniquely determined ci ∈ K. Then
P
by linearity,
X n Xn
f (x) = ci f (xi ) = f (xi )fi (x),
i=1 i=1
because fi (x) = ci . Thus, the linear span of {f1 , . . . , fn } is X 0 . On the other hand,
suppose
Xn
bi f i = 0
i=1
for some coefficients (bi )i=1,...,n ⊂ K. Then for every j ∈ {1, . . . , n}, bj = ni=1 bi fi (xj ) =
P
0. Thus, {f1 , . . . , fn } is linearly independent. Thus, X 0 and X are isomorphic since
they have the same dimension. We can define an isomorphism φ : X → X 0 by xi 7→ fi
for i = 1, . . . , n. 

4. Sequential `p spaces*
Definition 2.26. Let P 1 ≤ p < ∞. Then we define `p as the set of all sequences
(xn )n=1,2,... ⊂ C such that ∞ p p
n=1 |xn | < ∞. The ` -norm is defined as
X∞ 1/p
p
kxkp = |xn | .
n=1

If $p \in [1,\infty]$ then the number $p' \in [1,\infty]$ such that $\frac{1}{p} + \frac{1}{p'} = 1$ is called the Hölder dual exponent of $p$.
Remark. The definition of `p extends to values of p < 1, but kxkp does not define
a norm if p < 1.
Our first goal is to show that k·kp really is a norm. To do that we need the following
generalization of the Cauchy-Schwarz inequality.
0
Theorem 2.27 (Hölder's inequality). Let $p \in [1,\infty]$ and $x \in \ell^p$, $y \in \ell^{p'}$. Then
$$\sum_{n=1}^{\infty} |x_n y_n| \le \|x\|_p\, \|y\|_{p'}.$$

We need an auxiliary Lemma which generalizes the usual inequality for two non-
negative numbers a, b
$$\sqrt{ab} \le \frac{a+b}{2},$$
comparing the geometrical mean of a, b (i.e. the sidelength of the square whose area
equals the area of the rectangle with sides a and b) with the arithmetical mean (the
number half way between a and b).
Lemma 2.28. Let a, b ≥ 0.
(i) Let 0 < ϑ < 1. Then
a1−ϑ bϑ ≤ (1 − ϑ)a + ϑb.
(ii) (Young's inequality) Let $p \in (1,\infty)$. Then
$$ab \le \frac{a^p}{p} + \frac{b^{p'}}{p'}.$$
Proof. Clearly the inequality holds if one of $a, b$ is 0. Also check that if the inequality is true for some $a, b$ then it is also true for $ta, tb$ where $t > 0$.
Assume now 0 < b ≤ a and let s = b/a. Then the stated inequality is equivalent
with sϑ ≤ (1 − ϑ) + ϑs for 0 ≤ s ≤ 1. Set f (s) = 1 − ϑ + ϑs − sϑ . Then f (1) = 0 and
$f'(s) < 0$ for $0 < s < 1$, thus $f(s) \ge 0$ for $0 \le s \le 1$, which implies the desired inequality.
The case 0 < a ≤ b is shown in the same way (in fact follows from the previous case by
interchanging $a, b$ and replacing $\vartheta$ by $1 - \vartheta$). This proves part (i).
For part (ii) set $x = a^p$, $y = b^{p'}$, $\vartheta = 1 - 1/p$ and observe that the inequality is then
equivalent with x1−ϑ y ϑ ≤ (1 − ϑ)x + ϑy which holds by part (i). 
Proof of Hölder’s inequality. Observe that the inequality is true if either x
or y are 0. Check that if the inequality is true for some choice of x and y then it is also
true for $sx, ty$ with $s > 0$, $t > 0$. Finally, if $p \in \{1, \infty\}$, the inequality is trivial. So we
assume p ∈ (1, ∞).

By Young’s inequality,
$$\sum_{n=1}^{\infty} |x_n y_n| \le \frac{1}{p} \sum_{n=1}^{\infty} |x_n|^p + \frac{1}{p'} \sum_{n=1}^{\infty} |y_n|^{p'}.$$
Observe that this yields the asserted inequality when $\|x\|_p = 1$ and $\|y\|_{p'} = 1$. Also we have $\big\| \tfrac{x}{\|x\|_p} \big\|_p = 1$, $\big\| \tfrac{y}{\|y\|_{p'}} \big\|_{p'} = 1$, and since the assertion holds for $x/\|x\|_p$ and $y/\|y\|_{p'}$ it holds also for $x$ and $y$. □
Theorem 2.29 (Minkowski’s inequality). Let p ∈ [1, ∞]. For x, y ∈ `p ,
kx + ykp ≤ kxkp + kykp .
Proof. If p ∈ {1, ∞} the inequality is trivial. Thus we assume p ∈ (1, ∞). If
kx + ykp = 0, the inequality is also trivial, so we can assume kx + ykp > 0. Now we
write
$$\|x+y\|_p^p \le \sum_{n=1}^{\infty} |x_n|\, |x_n + y_n|^{p-1} + \sum_{n=1}^{\infty} |y_n|\, |x_n + y_n|^{p-1}.$$
Using Hölder's inequality on both sums we obtain that this is
$$\le \|x\|_p\, \|x+y\|_{p'(p-1)}^{p-1} + \|y\|_p\, \|x+y\|_{p'(p-1)}^{p-1}.$$
We have $p'(p-1) = \frac{p}{p-1}(p-1) = p$, so we have proved that
$$\|x+y\|_p^p \le (\|x\|_p + \|y\|_p)\, \|x+y\|_p^{p-1}.$$
Dividing by $\|x+y\|_p^{p-1}$ gives the claim. □
We conclude that k · kp is a norm and `p a normed vector space.
Theorem 2.30. Let $p \in (1,\infty)$. The dual space $(\ell^p)'$ is isometrically isomorphic to $\ell^{p'}$.
Proof. By $e_k$ we denote the sequence which is 1 at position $k$ and 0 everywhere else.
Then we define a map $\phi : (\ell^p)' \to \ell^{p'}$ by $\phi(v) = (v(e_k))_k$. Clearly, this is a linear map. First we need to show that $\phi(v) \in \ell^{p'}$. Let $v \in (\ell^p)'$. For each $n$ we define $x^{(n)} \in \ell^p$ by
$$x_k^{(n)} = \begin{cases} \dfrac{|v(e_k)|^{p'}}{v(e_k)} & \text{if } k \le n,\ v(e_k) \ne 0,\\ 0 & \text{otherwise.} \end{cases}$$
We have on the one hand
$$v(x^{(n)}) = \sum_{k=1}^{n} |v(e_k)|^{p'}.$$
And on the other hand
$$|v(x^{(n)})| \le \|v\|_{op}\, \|x^{(n)}\|_p = \|v\|_{op} \Big( \sum_{k=1}^{n} |v(e_k)|^{p'} \Big)^{1/p}.$$
Here we have used that $p(p'-1) = p\big( \frac{p}{p-1} - 1 \big) = \frac{p}{p-1} = p'$. Combining these two we get
$$\Big( \sum_{k=1}^{n} |v(e_k)|^{p'} \Big)^{1/p'} \le \|v\|_{op}.$$

Letting $n \to \infty$ this implies that
$$\|\phi(v)\|_{p'} = \Big( \sum_{n=1}^{\infty} |v(e_n)|^{p'} \Big)^{1/p'} \le \|v\|_{op},$$
so $\phi(v) \in \ell^{p'}$. The calculation also shows that $\phi$ is bounded. It is easy to check that $\phi$ is injective. We show that it is surjective: let $x \in \ell^{p'}$. Then define $v \in (\ell^p)'$ by $v(y) = \sum_{n=1}^{\infty} x_n y_n$. By Hölder's inequality, $v$ is well-defined. We have $v(e_k) = x_k$, so
φ(v) = x. Thus φ is an isomorphism. It remains to show that φ is an isometry. We
have already seen that
kφ(v)kp0 ≤ kvkop
We leave it to the reader to verify the other inequality. 
Remark. It can be shown similarly that (`1 )0 = `∞ . However, the dual of `∞ is not `1 .
Corollary 2.31. `p is a Banach space for all p ∈ (1, ∞).
Remark. `1 and `∞ are also Banach spaces as we saw in Example 2.3 and Exercise 2.4.
Exercise 2.32. (i) Let 0 < p < ∞. For x ∈ Rn set kxkp = ( ni=1 |xi |p )1/p . Prove
P
that kxkp2 ≤ kxkp1 if p1 ≤ p2 .
Hint: First do this with the additional condition that kxkp1 = 1.
(ii) Show that `p ( `q if 1 ≤ p < q ≤ ∞.
(iii) Show that this inclusion extends to values of p1 , p2 ∈ (0, ∞).

5. Derivatives
Recall that a function f on an interval (a, b) is called differentiable at x ∈ (a, b) if
$\lim_{h\to 0} \frac{f(x+h) - f(x)}{h}$ exists. In other words, if there exists a number $T \in \mathbb R$ such that
$$\lim_{h\to 0} \frac{|f(x+h) - f(x) - Th|}{|h|} = 0.$$
In that case we denote that real number T by f 0 (x). A real number can be understood
as a linear map R → R:
R −→ L(R, R), T 7−→ (x 7→ T · x)
That is, the linear map associated with a real number T is given by multiplication with
T . Interpreting the derivative at a given point as a linear map, we can formulate the
definition in the general setting of normed vector spaces.
Definition 2.33. Let X, Y be normed vector spaces and U ⊂ X open. A map
F : U → Y is called Fréchet differentiable (we also say differentiable ) at x ∈ U if there
exists T ∈ L(X, Y ) such that
$$(2.3)\qquad \lim_{h\to 0} \frac{\|F(x+h) - F(x) - Th\|_Y}{\|h\|_X} = 0.$$
In that case we call T the (Fréchet) derivative of F at x and write T = DF (x) or
T = DF |x . F is called (Fréchet) differentiable if it is differentiable at every point
x ∈ U . When X = Rn we also use the following terminology: F is totally differentiable
and DF (x) is the total derivative of F at x.

Before we move on we need to verify that DF (x) is well-defined. That is, that T is
uniquely determined by F and x. Suppose T, Te ∈ L(X, Y ) both satisfy (2.3). Then
kT h − TehkY ≤ kF (x + h) − F (x) − T hkY + kF (x + h) − F (x) − TehkY
Thus, by (2.3),
$$\frac{\|Th - \widetilde T h\|_Y}{\|h\|_X} \longrightarrow 0 \quad \text{as } h \to 0.$$
In other words, for all ε > 0 there exists δ > 0 such that
(2.4) kT h − TehkY ≤ εkhkX
if khkX ≤ δ. By homogeneity of norms we argue that the inequality (2.4) must hold
for all $h \in X$: let $h \in X$, $h \ne 0$ be arbitrary. Then let $h_0 = \delta \frac{h}{\|h\|_X}$. By homogeneity of
norms we have kh0 kX = δ. Thus,
kT h0 − Teh0 kY ≤ εkh0 kX = εδ.
Multiplying both sides by δ −1 khkX and using homogeneity of norms and linearity of T ,
we obtain
kT h − TehkY ≤ εkhkX
for all h ∈ X (it is trivial for h = 0). Since ε > 0 was arbitrary (and is independent of
h), this implies kT h − TehkY = 0, so T h = Teh for all h. Thus T = Te.

Reminder: Big-O and little-o notation. Let f, g be maps between normed


vector spaces X, Y, Z: f : U → Y, g : U → Z, U ⊂ X open neighborhood of 0.
• Big-O: We write
(2.5) f (h) = O(g(h)) as h → 0
to mean
$$\limsup_{h\to 0} \frac{\|f(h)\|}{\|g(h)\|} < \infty.$$
This is equivalent to saying that there exists a C > 0 and δ > 0 such that
(2.6) kf (h)k ≤ Ckg(h)k
for all h with 0 < khk < δ.
• Little-o: Write
(2.7) f (h) = o(g(h)) as h → 0
to mean
$$\lim_{h\to 0} \frac{\|f(h)\|}{\|g(h)\|} = 0.$$

Comments.
• O and o are not functions and (2.5), (2.7) are not equations!
• This is an abuse of the equality sign: it would be more accurate to define
O(g) as the class of functions that satisfy (2.6), say to write f ∈ O(g).
• One can think of (say) O(g) as a placeholder for a function which may change
at every occurrence of the symbol O(g) but always satisfies the respective
condition that it is dominated by a constant times kg(h)k if khk is small.
• For brevity, we may sometimes not write out the phrase ”as h → 0”.

• There is nothing special about letting h tend to 0 in this definition. We can also
define o(g), O(g) with respect to another limit, for instance, say, as khk → ∞.
• If f (h) = o(g(h)), then f (h) = O(g(h)), but generally not vice versa.
• If f (h) = O(khkk ), then f (h) = o(khkk−ε ) for every ε > 0.

• f (h) = o(1) is equivalent to saying that f (h) → 0 as h → 0.


We can use little-o notation to restate the definition of derivatives in an equivalent
way: F is Fréchet differentiable at x if and only if there exists T ∈ L(X, Y ) such that
F (x + h) = F (x) + T h + o(khk) (as h → 0).
The derivative map T = DF |x provides a linear approximation to F (x + h) when
khk is small. Thus, in the same way as in the one-dimensional setting, the derivative
is a way to describe how the values of F change around a fixed point x.
Example 2.34. Let F : R2 → R be given by F (x1 , x2 ) = x1 cos(x2 ). We claim
that F is totally differentiable at every x = (x1 , x2 ) ∈ R2 . Indeed, let x ∈ R2 and
h = (h1 , h2 ) ∈ R2 \{0}. Then
F (x + h) = (x1 + h1 ) cos(x2 + h2 ) = x1 cos(x2 + h2 ) + h1 cos(x2 + h2 )
From Taylor’s theorem we have that
cos(t + ε) = cos(t) − sin(t)ε + O(ε2 ) as ε → 0
Thus,
F (x + h) = x1 cos(x2 ) − x1 sin(x2 )h2 + O(khk2 ) + h1 cos(x2 ) − h1 sin(x2 )h2 + O(khk2 )
F (x + h) − F (x) = h1 cos(x2 ) − x1 sin(x2 )h2 + O(khk2 )
This implies
F (x + h) = F (x) + T h + o(khk),
where we have set T h = h1 cos(x2 ) − x1 sin(x2 )h2 (this is a linear map R2 → R). So we
have proven that F is differentiable at x and
DF |x h = h1 cos(x2 ) − x1 sin(x2 )h2 .
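As an illustration only (not part of the example), the following Python lines check numerically that the remainder $F(x+h) - F(x) - DF|_x h$ is $o(\|h\|)$; the base point and direction are arbitrary choices.

import numpy as np

F = lambda x: x[0] * np.cos(x[1])
DF = lambda x, h: h[0] * np.cos(x[1]) - x[0] * np.sin(x[1]) * h[1]

x = np.array([1.3, 0.7])
direction = np.array([0.6, -0.8])           # a fixed unit vector
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * direction
    rem = F(x + h) - F(x) - DF(x, h)
    print(t, abs(rem) / np.linalg.norm(h))  # these ratios tend to 0 (in fact they are O(||h||))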
Rx
Example 2.35. Let F : C([0, 1]) → C([0, 1]) be given by F (f )(x) = 0 f (t)2 dt.
Then F is Fréchet differentiable at every f ∈ C([0, 1]). Indeed, we compute
$$F(f+h)(x) - F(f)(x) = \int_0^x (f(t)+h(t))^2\,dt - \int_0^x f(t)^2\,dt = 2\int_0^x f(t)h(t)\,dt + \int_0^x h(t)^2\,dt.$$
Set $T(h)(x) = 2\int_0^x f(t)h(t)\,dt$. This is a bounded linear map:
$$\|T(h)\|_\infty \le 2\int_0^1 |f(t)h(t)|\,dt \le C\|h\|_\infty,$$
where $C = 2\int_0^1 |f(t)|\,dt$. We have
$$F(f+h)(x) - F(f)(x) - T(h)(x) = \int_0^x h(t)^2\,dt.$$
Thus
$$\|F(f+h) - F(f) - Th\|_\infty \le \sup_{x\in[0,1]} \int_0^x h(t)^2\,dt \le \int_0^1 |h(t)|^2\,dt \le \sup_{x\in[0,1]} |h(x)|^2 = \|h\|_\infty^2.$$
This implies
$$\frac{1}{\|h\|_\infty} \|F(f+h) - F(f) - Th\|_\infty \le \|h\|_\infty \to 0$$
as $h \to 0$. Thus $F$ is Fréchet differentiable at $f$ and $DF|_f(h)(x) = 2\int_0^x f(t)h(t)\,dt$.
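A discretized sanity check of this computation (illustration only; the grid and the test functions $f$, $h$ are arbitrary choices): the remainder equals $\int_0^x h^2$, so its sup-norm should scale like $\|h\|_\infty^2$.

import numpy as np

t = np.linspace(0.0, 1.0, 2001)
dt = t[1] - t[0]

def cumint(g):
    # cumulative integral from 0 to x, left Riemann sum on the grid
    return np.concatenate(([0.0], np.cumsum(g[:-1]) * dt))

f = np.sin(3 * t)
for eps in [1e-1, 1e-2, 1e-3]:
    h = eps * np.cos(5 * t)
    remainder = cumint((f + h) ** 2) - cumint(f ** 2) - cumint(2 * f * h)
    print(eps, np.max(np.abs(remainder)) / np.max(np.abs(h)) ** 2)   # roughly constant in eps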
We go on to discuss some of the familiar properties of derivatives. It follows directly
from the definition that DF |x is linear in F . That is, if F : U → Y, G : U → Y are
differentiable at x ∈ U and λ ∈ R, then the function F + λG : U → Y defined by
(F + λG)(x) = F (x) + λG(x) is differentiable at x and D(F + λG)|x = DF |x + λDG|x .
Theorem 2.36 (Chain rule). Let X1 , X2 , X3 be normed vector spaces and U1 ⊂
X1 , U2 ⊂ X2 open. Let x ∈ U1 and g : U1 → X2 , f : U2 → X3 such that g is Fréchet
differentiable at x, g(U1 ) ⊂ U2 and f is Fréchet differentiable at g(x). Then the function
f ◦ g : U1 → X3 defined by (f ◦ g)(x) = f (g(x)) is Fréchet differentiable at x and
D(f ◦ g)|x h = Df |g(x) Dg|x h
for all h ∈ X1 .
Proof. Let x, x + h ∈ U1 . We write
f (g(x + h)) − f (g(x)) − Df |g(x) Dg|x h
= f (g(x) + k) − f (g(x)) − Df |g(x) k + Df |g(x) (g(x + h) − g(x) − Dg|x h),
where k = g(x + h) − g(x). Using the triangle inequality and that Df |g(x) is a bounded
linear map we obtain
|f (g(x + h)) − f (g(x)) − Df |g(x) Dg|x hkX3
(2.8)
≤ kf (g(x) + k) − f (g(x)) − Df |g(x) kkX3 + kDf |g(x) kop kg(x + h) − g(x) − Dg|x hkX2
We have
(2.9) kkkX2 = kg(x + h) − g(x)kX2 ≤ kDg|x kop khkX1 + o(khkX1 ).
Dividing by $\|h\|_{X_1}$ on both sides, (2.8) implies
$$\frac{1}{\|h\|_{X_1}} \|f(g(x+h)) - f(g(x)) - Df|_{g(x)} Dg|_x h\|_{X_3} \le \frac{\|k\|_{X_2}}{\|h\|_{X_1}} \cdot \frac{\|f(g(x)+k) - f(g(x)) - Df|_{g(x)} k\|_{X_3}}{\|k\|_{X_2}} + o(1), \quad \text{as } h \to 0.$$
By (2.9),
$$\frac{\|k\|_{X_2}}{\|h\|_{X_1}} \le \|Dg|_x\|_{op} + 1$$
if khkX1 is small enough. In particular, k → 0 as h → 0. Since f is differentiable at
g(x) we have that
kf (g(x) + k) − f (g(x)) − Df |g(x) kkX3
kkkX2
converges to 0 as h → 0 (since then k → 0). 
Theorem 2.37 (Product rule). Let X be a normed vector space, U ⊂ X open and
assume that F, G : U → R are differentiable at x ∈ U . Then the function F ·G : U → R,
(F · G)(x) = F (x)G(x) is also differentiable at x and
D(F · G) |x = F (x) · DG |x + G(x) · DF |x .

Exercise 2.38. Prove this.


Definition 2.39. Let X, Y be normed vector spaces, U ⊂ X open, F : U → Y .
Let v ∈ X with v 6= 0. If the limit
$$\lim_{h\to 0} \frac{F(x+hv) - F(x)}{h} \in Y \qquad (h \in \mathbb K \setminus \{0\})$$
exists, then it is called the directional derivative (or Gâteaux derivative ) of F at x in
direction v and denoted Dv F |x .
Theorem 2.40. Let X, Y be normed vector spaces, U ⊂ X open and F : U → Y
Fréchet differentiable at x ∈ U . Then for every v ∈ X, v 6= 0, the directional derivative
Dv F |x exists and
(2.10) Dv F |x = DF |x v.
Proof. By definition
F (x + hv) − F (x) − DF |x (hv) = o(h) as h → 0.
Therefore,
$$\frac{F(x+hv) - F(x)}{h} = DF|_x v + o(1) \quad \text{as } h \to 0.$$
In other words, the limit as h → 0 exists and equals DF |x v. 
Example 2.41. Consider F : R2 → R, F (x) = x21 + x22 (where x = (x1 , x2 ) ∈ R2 ).
Let e1 = (1, 0), e2 = (0, 1). Then the directional derivatives De1 F |x and De2 F |x exist
at every point x ∈ R2 and
De1 F |x = 2x1 , De2 F |x = 2x2 .
Also, DF |x exists at every x and we can compute it using De1 F |x and De2 F |x : let
v ∈ R2 and write v = v1 e1 + v2 e2 where v1 , v2 ∈ R. Then
DF |x v = v1 DF |x e1 + v2 DF |x e2
By Theorem 2.40 this equals
v1 De1 F |x + v2 De2 F |x = 2x1 v1 + 2x2 v2 .
Remark. The converse of Theorem 2.40 is not true!
Example 2.42. Let $F : \mathbb R^2 \to \mathbb R$ be defined by $F(x) = \frac{x_1^3}{x_1^2 + x_2^2}$ if $x \ne 0$ and $F(0) = 0$.
Then all directional derivatives $D_v F|_0$ for $v \ne 0$ exist: for $v = (v_1, v_2)$,
$$F(hv) - F(0) = h\, \frac{v_1^3}{v_1^2 + v_2^2},$$
so $D_v F|_0 = \frac{v_1^3}{v_1^2 + v_2^2}$. But $F$ is not totally differentiable at 0, otherwise we would have by
1 2
linearity of the total derivative,
Dv F |0 = DF |0 v = v1 De1 F |0 + v2 De2 F |0 = v1 ,
which is false.

6. Further exercises
Exercise 2.43. For $x, y \in \mathbb R^n$ define
$$\rho_p(x,y) = \sum_{i=1}^n |x_i - y_i|^p.$$

(i) Let 0 < p ≤ 1. Prove that ρp is a metric on Rn .


(ii) Let 1 < p < ∞. Prove that ρp is not a metric on Rn .
1/p
(iii) Let 0 < p < 1,Pn ≥ 2. Prove that ρp is not a metric on Rn .
(iv) Let kxkp = ( ni=1 |xi |p )1/p . Prove that if 0 < p < 1, n ≥ 2, neither kxkp nor
kxkpp define a norm on Rn .
P 1/p
n
Exercise 2.44. Let x ∈ Rn . Define kxkp = i=1 |x i |p
for 0 < p < ∞ and
kxk∞ = maxi=1,...,n |xi |.
(i) Show that limp→∞ kxkp = kxk∞ .
(ii) Show that limp→0 kxkp exists and determine its value (we also allow ∞ as a limit).
Exercise 2.45. Let C([0, 1]) be the space of continuous real-valued functions on
the interval [0, 1]. Assume 1 ≤ p < ∞.
(i) Show that for 1 ≤ p < ∞ the expression
Z 1 1/p
kf kp = |f (t)|p dt
0
defines a norm on C([0, 1]). You may choose to do part (iv) below first.
(ii) Let α < 1/p and define
(
t−α for n−1 ≤ t ≤ 1,
fn (t) =
nα for 0 ≤ t < n−1 .
Show that {fn }∞ n=0 is a Cauchy sequence in C([0, 1]), with respect to the norm k · kp .
Show that it is not a Cauchy sequence with respect to the usual sup-norm on C([0, 1]).
(iii) Show that C([0, 1]) with norm k · kp is not complete.
(iv) Let R([0, 1]) be the space of Riemann integrable functions defined on the interval
[0, 1]. Show that if f is Riemann integrable then |f |p is also Riemann integrable. Show
that kf kp defines a seminorm on R([0, 1]) but not a norm.
(v) Show for f, g ∈ R([0, 1]), 1 < p < ∞, p1 + p10 = 1,
Z 1
f (t)g(t)dt ≤ kf kp kgkp0 .
0
(vi) Show for f ∈ R([0, 1]),
kf kp1 ≤ kf kp2 if p1 ≤ p2 .
Exercise 2.46. Let $C(\mathbb R)$ be the set of continuous functions on $\mathbb R$. Let $w(t) = \frac{t}{1+t}$ for $t \ge 0$. Define
$$d(f,g) = \sum_{k=0}^{\infty} 2^{-k}\, w\Big( \sup_{x\in[-k,k]} |f(x) - g(x)| \Big).$$

(i) Show that d is a well-defined metric.


(ii) Show that C(R) is complete with this metric.
(iii) Show that there exists no norm k · k on C(R) such that d(f, g) = kf − gk.

Exercise 2.47. Consider the space `1 of absolutely summable sequences of complex


numbers. LetP p, q ∈ [1, ∞] with p 6= q. Then k · kp and k · kq are norms on `1 (recall
that kakp = ( ∞ p 1/p
n=1 |an | ) for p ∈ [1, ∞) and kak∞ = supn∈N |an |). Show that k · kp
and k · kq are not equivalent.
Exercise 2.48. Let $X$ be the space of continuous functions on $[0,1]$ equipped with the norm $\|f\| = \int_0^1 |f(t)|\,dt$. Define a linear map $T : X \to X$ by
$$Tf(x) = \int_0^x f(t)\,dt.$$

Show that T is well-defined and bounded and determine the value of kT kop .
Exercise 2.49. Let X, Y be normed vector spaces and F : X → Y a map.
(i) Show that F is continuous if it is Fréchet differentiable.
(ii) Prove that F is Fréchet differentiable if it is linear and bounded.
Exercise 2.50. Let V , W be normed vector spaces and let T : V → W be a
bounded linear transformation. Show that T is differentiable everywhere and compute
the derivative DTv for all v ∈ V .
Exercise 2.51. Let X = C([0, 1]) be the Banach space of continuous functions on
[0, 1] (with the supremum norm) and define a map F : X → X by
Z s
F (f )(s) = cos(f (t)2 )dt, s ∈ [0, 1].
0

(i) Show that F is Fréchet differentiable and compute the Fréchet derivative DF |f for
each f ∈ X.
(ii) Show that F X = {F (f ) : f ∈ X} ⊂ X is relatively compact.

Exercise 2.52. Let Rn×n denote the space of real n × n matrices equipped with
the matrix norm kAk = supkxk=1 kAxk. Define
F : Rn×n −→ Rn×n , F (A) = A2 .
Show that F is totally differentiable and compute DF |A .
Exercise 2.53. (i) Is there a constant C such that for all continuous functions f
on [0, 2] the inequality
Z 2
|f (t)|dt ≤ C max |f (x)|
0 0≤x≤2

holds? Is there a constant C such that for all continuous functions f on [0, 2] the reverse
inequality
$$\max_{0\le x\le 2} |f(x)| \le C \int_0^2 |f(t)|\,dt$$
holds? The expressions on both sides of the above inequalities define norms on $C([0,2])$. Are these equivalent norms?
(ii) True or false: There is a constant Cn such that for all polynomials P of degree
≤ n we have
Z 2
max |P (x)| ≤ Cn |P (t)|dt .
0≤x≤2 0

What about the analogous question concerning the inequality


Z 10−10
max |P (x)| ≤ Cn |P (t)|dt ?
0≤x≤200 0

Exercise 2.54. Let $1 \le p \le \infty$, $V = \mathbb R^n$, $W = \mathbb R^m$ and use on $V$ the norm $\|\cdot\|_1$ and on $W$ the norm $\|\cdot\|_p$. Let $T : V \to W$ be defined by $T(x) = Ax$ where $A$ is an $m\times n$ matrix. Show that the operator norm of $T$ is given by
$$\|T\|_{1\to p} = \max_{j=1,\dots,n} \Big( \sum_{i=1}^m |a_{ij}|^p \Big)^{1/p}.$$

Exercise 2.55. Let $1 \le p \le \infty$, $\frac1p + \frac1q = 1$, $V = \mathbb R^n$, $W = \mathbb R^m$ and use on $V$ the norm $\|\cdot\|_p$ and on $W$ the norm $\|\cdot\|_\infty$. Let $T : V \to W$ be defined by $T(x) = Ax$ where $A$ is an $m\times n$ matrix. Show that the operator norm of $T$ is given by
$$\|T\|_{p\to\infty} = \max_{i=1,\dots,m} \Big( \sum_{j=1}^n |a_{ij}|^q \Big)^{1/q}.$$
CHAPTER 3

Differential calculus in Rn

In this section we study the differential calculus of maps $f : U \to \mathbb R^m$, $U \subset \mathbb R^n$ open. We shall use the Euclidean norms on $\mathbb R^n$ and $\mathbb R^m$, i.e. $\|x\| = (\sum_{j=1}^n |x_j|^2)^{1/2}$ for $x \in \mathbb R^n$ and $\|y\| = (\sum_{i=1}^m |y_i|^2)^{1/2}$ for $y \in \mathbb R^m$. In this setting we refer to the Fréchet
derivative as total derivative . Whenever we speak of functions in this section, we mean
real-valued functions.
Definition 3.1. By ek we denote the kth unit vector in Rn . Then the directional
derivative in the direction ek is called kth partial derivative and denoted by ∂k f (x) or
∂xk f (x) (if it exists).
If f is totally differentiable at a point x = (x1 , . . . , xn ) ∈ U , then we can compute
its total derivative in terms of the partial derivatives by using (2.10):
$$(3.1)\qquad Df|_x h = \sum_{j=1}^n h_j\, Df|_x e_j = \sum_{j=1}^n h_j\, \partial_j f(x) \in \mathbb R^m.$$

By definition, Df |x is a linear map Rn → Rm . It is therefore given by multiplication


with a real m × n matrix. We will denote this matrix also by Df |x and call it the
Jacobian matrix of f at x . From (2.10) we conclude that the jth column vector of this
matrix is given by ∂j f (x) ∈ Rm . Therefore the Jacobian matrix is given by
$$Df|_x = (\partial_j f_i(x))_{i,j} = \begin{pmatrix} \partial_1 f_1(x) & \cdots & \partial_n f_1(x) \\ \vdots & \ddots & \vdots \\ \partial_1 f_m(x) & \cdots & \partial_n f_m(x) \end{pmatrix} \in \mathbb R^{m\times n},$$
where $f(x) = (f_1(x), \dots, f_m(x)) \in \mathbb R^m$. If $m = 1$, then the gradient of $f$ at $x$ is defined as¹
$$\nabla f(x) = Df|_x^T = \begin{pmatrix} \partial_1 f(x) \\ \vdots \\ \partial_n f(x) \end{pmatrix} \in \mathbb R^n.$$
(Note that n × 1 matrices are identified with vectors in Rn : Rn×1 = Rn .)
Example 3.2. Let $F : \mathbb R^3 \to \mathbb R^2$ be defined by $F(x) = (x_1 x_2 \sin(x_3),\, x_2^2 - e^{x_1})$. Then $F$ is totally differentiable and the Jacobian is given by
$$DF|_x = \begin{pmatrix} x_2\sin(x_3) & x_1\sin(x_3) & x_1 x_2\cos(x_3) \\ -e^{x_1} & 2x_2 & 0 \end{pmatrix}.$$
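For illustration only (not part of the example), one can approximate this Jacobian by central difference quotients and compare with the formula above; the evaluation point and step size are arbitrary choices.

import numpy as np

def F(x):
    return np.array([x[0] * x[1] * np.sin(x[2]), x[1] ** 2 - np.exp(x[0])])

def jacobian_formula(x):
    return np.array([
        [x[1] * np.sin(x[2]), x[0] * np.sin(x[2]), x[0] * x[1] * np.cos(x[2])],
        [-np.exp(x[0]),       2 * x[1],            0.0],
    ])

def jacobian_fd(x, eps=1e-6):
    # central differences column by column
    J = np.zeros((2, 3))
    for j in range(3):
        e = np.zeros(3); e[j] = eps
        J[:, j] = (F(x + e) - F(x - e)) / (2 * eps)
    return J

x = np.array([0.4, -1.2, 2.0])
print(np.max(np.abs(jacobian_formula(x) - jacobian_fd(x))))   # very small discrepancy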
Recall that a set A ⊂ Rn is called convex if tx + (1 − t)y ∈ A for every x, y ∈ A,
t ∈ [0, 1].

1Here M T ∈ Rn×m denotes the transpose of the matrix M ∈ Rm×n .



Theorem 3.3 (Mean value theorem). Let U ⊂ Rn be open and convex. Suppose
that f : U → R is totally differentiable on U . Then, for every x, y ∈ U , there exists
ξ ∈ U such that
f (x) − f (y) = Df |ξ (x − y)
and there exists t ∈ [0, 1] such that ξ = tx + (1 − t)y.

The idea of the proof is to apply the one-dimensional mean value theorem to the
function restricted to the line passing through x and y.

Proof. If x = y there is nothing to show. Let x 6= y. Define g : [0, 1] → R by


g(t) = f (tx + (1 − t)y). The function g is continuous on [0, 1] and differentiable on
(0, 1). By the one-dimensional mean value theorem there exists t0 ∈ [0, 1] such that
g(1) − g(0) = g 0 (t0 ). By the chain rule,

g 0 (t0 ) = Df |t0 x+(1−t0 )y (x − y).

Corollary 3.4. Under the assumptions of the previous theorem: if Df |x = 0 for


all x ∈ U , then f is constant.

Exercise 3.5. Show that the conclusion of the corollary also holds under the weaker
assumption that U is open and connected (rather than convex). Hint: Consider
overlapping open balls along a continuous path connecting two given points in U .

Definition 3.6. A map f : U → Rm , U ⊂ Rn open, is called continuously dif-


ferentiable (on U ) if it is totally differentiable on U and the map U → L(Rn , Rm ),
x 7→ Df |x is continuous. We denote the collection of continuously differentiable maps
by C 1 (U, Rm ). If m = 1 we also write C 1 (U, R) = C 1 (U ).

Remark. For f : U → R, continuity of the map U → Rn , x 7→ ∇f (x) is equivalent to


continuity of the map U → L(Rn , R), x 7→ Df |x .

Theorem 3.7. Let U ⊂ Rn be open. Let f : U → R. Then f ∈ C 1 (U ) if and


only if ∂j f (x) exists for every j ∈ {1, . . . , n} and x 7→ ∂j f (x) is continuous on U for
j ∈ {1, . . . , n}.

Remark. Without additional assumptions (such as continuity of x 7→ ∂j f (x)), existence


of partial derivatives does not imply total differentiability.

Exercise 3.8. Let $F : \mathbb R^2 \to \mathbb R$ be defined by $F(x) = \frac{x_1 x_2}{x_1^2 + x_2^2}$ if $x \ne 0$ and $F(0) = 0$.
(i) Show that the partial derivatives ∂1 F (x), ∂2 F (x) exist for every x ∈ R2 .
(ii) Show that F is not continuous at (0, 0).
(iii) Determine at which points F is totally differentiable.

Proof. Let f ∈ C 1 (U ). Then ∂j f (x) exists by Theorem 2.40 and x 7→ ∂j f (x)


is continuous because it can be written as the composition of the continuous maps
x 7→ ∇f (x) and πj : Rn → R, x 7→ xj : ∂j f (x) = (πj ◦ ∇f )(x).
Conversely, assume that ∂j f (x) exists for every x ∈ U , j ∈ {1, . . . , n} and x 7→ ∂j f (x) is
continuous. Let $x \in U$. Write $h = \sum_{j=1}^n h_j e_j$ and define $v_k = \sum_{j=1}^k h_j e_j$ for $1 \le k \le n$

and v0 = 0. Then, if khk is small enough so that x + h ∈ U , then


f (x + h) − f (x) =
f (x + vn ) − f (x + vn−1 ) + f (x + vn−1 ) − f (x + vn−2 ) + · · · + f (x + v1 ) − f (x + v0 )
Xn
= (f (x + vj ) − f (x + vj−1 )).
j=1

By the one-dimensional mean value theorem there exists tj ∈ [0, 1] such that
f (x + vj ) − f (x + vj−1 ) = f (x + vj−1 + hj ej ) − f (x + vj−1 ) = ∂j f (x + vj−1 + tj hj ej )hj .
By continuity of ∂j f , for every ε > 0 exists δ > 0 such that
|∂j f (y) − ∂j f (x)| ≤ ε/n for all j = 1, . . . , n,
whenever y ∈ U is such that kx − yk ≤ δ. We may choose δ small enough so that
x + h ∈ U whenever khk ≤ δ. Then, if khk ≤ δ (then also kvj k ≤ δ, kvj−1 + tj hj ej k ≤ δ)
we get
$$\Big| f(x+h) - f(x) - \sum_{j=1}^n h_j \partial_j f(x) \Big| \le \sum_{j=1}^n \big| f(x+v_j) - f(x+v_{j-1}) - h_j \partial_j f(x) \big| = \sum_{j=1}^n |h_j|\, |\partial_j f(x+v_{j-1}+t_j h_j e_j) - \partial_j f(x)| \le \sum_{j=1}^n |h_j|\, \frac{\varepsilon}{n} \le \varepsilon \|h\|.$$

Therefore, $f(x+h) - f(x) - Df|_x h = o(h)$, where
$$Df|_x h = \sum_{j=1}^n h_j \partial_j f(x),$$

so f is differentiable at x. Also, x 7→ ∇f (x) is continuous, because the ∂j f are contin-


uous. 
To conclude this introductory section, we discuss some variants of the mean value
theorem that will be useful later.
Theorem 3.9 (Mean value theorem, integral version). Let U ⊂ Rn be open and
convex and f ∈ C 1 (U ). Then for every x, y ∈ U ,
Z 1
f (x) − f (y) = Df |tx+(1−t)y (x − y)dt.
0

Proof. Let g(t) = f (tx + (1 − t)y). By the fundamental theorem of calculus and
the chain rule,
Z 1 Z 1
0
f (x) − f (y) = g(1) − g(0) = g (s)ds = Df |tx+(1−t)y (x − y)dt.
0 0

Theorem 3.10 (Mean value theorem, vector-valued case). Let U ⊂ Rn be open and
convex and F ∈ C 1 (U, Rm ). Then for every x, y ∈ U there exists θ ∈ [0, 1] such that
kF (x) − F (y)k ≤ kDF |ξ kop kx − yk,
where ξ = θx + (1 − θ)y.

Proof. Write F = (F1 , . . . , Fm ). Then by Theorem 3.9


Z 1
Fi (x) − Fi (y) = DFi |tx+(1−t)y (x − y)dt.
0
This implies Z 1
F (x) − F (y) = DF |tx+(1−t)y (x − y)dt.
0
By the triangle inequality, we have
Z 1
kF (x) − F (y)k ≤ kDF |tx+(1−t)y kop dtkx − yk
0
The map [0, 1] → R, t 7→ kDF |tx+(1−t)y kop is continuous (because F is C 1 ) and therefore
assumes its supremum at some point θ ∈ [0, 1]. Define ξ = θx + (1 − θ)y. Then
kF (x) − F (y)k ≤ kDF |ξ kop kx − yk.

Remark. If m ≥ 2 and F : U → Rm is C 1 and x, y ∈ U , then it is not necessarily true
that there exists ξ ∈ U such that F (x) − F (y) = DF |ξ (x − y).
Exercise 3.11. Find a C 1 map F : R → R2 and points x, y ∈ R such that there
does not exist ξ ∈ R such that F (x) − F (y) = DF |ξ (x − y).
Exercise 3.12. Let U ⊂ Rn be open and convex and F : U → U a differentiable
map. Show: If there exists c ∈ (0, 1) such that kDF |x kop ≤ c for all x ∈ U , then F is a
contraction of U .

1. Inverse function theorem


In this section we will see how the contraction principle can be applied to find (local)
inverses of maps between open sets in Rn , in other words to solve equations of the form
f (x) = y.
Definition 3.13. Let E ⊂ Rn . We say that a map f : E → Rn is locally invertible
at a ∈ E if there exist open sets U, V ⊂ Rn such that U ⊂ E, a ∈ U , f (a) ∈ V and
a function g : V → U such that g(f (x)) = x for all x ∈ U and f (g(y)) = y for all
y ∈ V . In that case we call g a local inverse of f (at a) and denote it by f |−1
U (this is
consistent with usual notation of inverse functions, because the restriction f |U of f to
U is an invertible map U → V ).
Theorem 3.14 (Inverse function theorem). Let E ⊂ Rn be open and let f : E → Rn
be differentiable on $E$. Let $a \in E$ and assume that $Df|_a$ is invertible and that $x \mapsto Df|_x$ is continuous at $a$.
(i) Then $f$ is locally invertible at $a$ in some open neighborhood $U \subset E$ of $a$ with $(f|_U)^{-1}$ differentiable on $V = f(U)$, and we have for all $x \in U$
$$D(f|_U^{-1})|_{f(x)} = (Df|_x)^{-1}.$$
(ii) If $f \in C^1(E, \mathbb R^n)$ then $(f|_U)^{-1} \in C^1(V, \mathbb R^n)$.


Proof. We want to apply the contraction principle. For fixed y ∈ Rn , consider the
map
$$\varphi_y(x) = x + Df|_a^{-1}(y - f(x)) \qquad (x \in E).$$
Then $f(x) = y$ if and only if $x$ is a fixed point of $\varphi_y$. Calculate
$$D\varphi_y|_x = I - Df|_a^{-1} Df|_x = Df|_a^{-1}(Df|_a - Df|_x).$$

Let $\lambda = \|Df|_a^{-1}\|_{op}$. By continuity of $Df$ at $a$, there exists an open ball $U \subset E$ such that
$$\|Df|_a - Df|_x\|_{op} \le \frac{1}{2\lambda} \quad \text{for } x \in U.$$
Then for $x, x' \in U$,
$$\|\varphi_y(x) - \varphi_y(x')\| \le \|D\varphi_y|_\xi\|_{op}\, \|x - x'\| \le \|Df|_a^{-1}\|_{op}\, \|Df|_a - Df|_\xi\|_{op}\, \|x - x'\| \le \tfrac12 \|x - x'\|,$$
where $\xi$ is a suitable point on the segment between $x$ and $x'$ (cf. Theorem 3.10, using that $U$ is a ball, hence convex).
Note that this doesn’t show that ϕy is a contraction, because ϕy (U ) may not be con-
tained in U . However, it does show that ϕy has at most one fixed point (by the same
argument used to show uniqueness in the Banach fixed point theorem). This already
implies that f is injective on U : for every y ∈ Rn we have f (x) = y for at most one
x ∈ U . Let V = f (U ). Then f |U : U → V is a bijection and has an inverse g : V → U .

Claim. V is open.
Proof of claim. Let y0 ∈ V . We need to show that there exists an open ball
around y0 that is contained in V . Since V = f (U ) there exists x0 ∈ U such that
f (x0 ) = y0 . Let r > 0 be small enough so that Br (x0 ) ⊂ U (possible because U is
open). Let ε > 0 and y ∈ Bε (y0 ). We will demonstrate that if ε > 0 is small enough,
then ϕy maps Br (x0 ) into itself. First note
$$\|\varphi_y(x_0) - x_0\| = \|Df|_a^{-1}(y - y_0)\| \le \lambda\varepsilon.$$
Hence, choosing $\varepsilon \le \frac{r}{2\lambda}$, we get for $x \in B_r(x_0)$ that
$$\|\varphi_y(x) - x_0\| \le \|\varphi_y(x) - \varphi_y(x_0)\| + \|\varphi_y(x_0) - x_0\| \le \tfrac12\|x - x_0\| + \tfrac r2 \le \tfrac r2 + \tfrac r2 = r.$$
Thus ϕy (x) ∈ Br (x0 ). This proves ϕy (Br (x0 )) ⊂ Br (x0 ), so ϕy is a contraction of Br (x0 ).
By the Banach fixed point theorem, ϕy must have a unique fixed point x ∈ Br (x0 ). So
by definition of ϕy we have f (x) = y, so y ∈ f (U ) = V . Therefore we have shown that
Bε (y0 ) ⊂ V , so V is open. 
It remains to show that g is differentiable on V and Dg|f (a) = Df |−1
a . We use the
following lemma.
Lemma 3.15. Let A, B ∈ Rn×n such that A is invertible and
(3.2) kB − Ak · kA−1 k < 1.
Then B is invertible. (Here k · k denotes the matrix norm, which is just the operator
norm: kAk = supkxk=1 kAxk.)
In other words, if a matrix A is invertible and B is a “small” perturbation of A
(“small” in the sense that (3.2) holds), then B is also invertible.
Proof. It suffices to show that B is injective. Let x 6= 0. Then we need to show
Bx 6= 0. Indeed,
kxk = kA−1 Axk ≤ kA−1 k · kAxk ≤ kA−1 k(k(A − B)xk + kBxk)
≤ kA−1 k · kB − Ak · kxk + kA−1 kkBxk,
which implies kA−1 kkBxk ≥ (1 − kA−1 k · kB − Ak)kxk > 0, so Bx 6= 0. 

Let y ∈ V ; there exists an x ∈ U such that f (x) = y. From the above,


$$\|Df|_a^{-1}\|\, \|Df|_x - Df|_a\| \le \tfrac12.$$

By the lemma, Df |x is invertible, and we need to show that g is differentiable at y and


that the total derivative is equal to Df |−1
x . Let k be such that y + k ∈ V . Then there
exists h such that y + k = f (x + h). We have
$$\|h\| \le \|h - Df|_a^{-1}k\| + \|Df|_a^{-1}k\|$$
and
$$h - Df|_a^{-1}k = h + Df|_a^{-1}(f(x) - f(x+h)) = \varphi_y(x+h) - \varphi_y(x),$$
so $\|h - Df|_a^{-1}k\| \le \tfrac12\|h\|$. Therefore, $\|h\| \le 2\lambda\|k\| \to 0$ as $\|k\| \to 0$. Now we compute

$$g(y+k) - g(y) - Df|_x^{-1}k = x + h - x - Df|_x^{-1}k = h - Df|_x^{-1}(f(x+h) - f(x)) = -Df|_x^{-1}\big( f(x+h) - f(x) - Df|_x h \big)$$
and so
$$\frac{1}{\|k\|}\|g(y+k) - g(y) - Df|_x^{-1}k\| \le \|Df|_x^{-1}\|\, \frac{\|f(x+h) - f(x) - Df|_x h\|}{\|h\|} \cdot \frac{\|h\|}{\|k\|} \le \|Df|_x^{-1}\|\, \frac{\|f(x+h) - f(x) - Df|_x h\|}{\|h\|}\, 2\lambda \longrightarrow 0 \text{ as } k \to 0.$$
Therefore $g$ is differentiable at $y$ with $Dg|_y = Df|_x^{-1}$. This finishes the proof of part (i)
of Theorem 3.14.
To prove part (ii) we assume f is of class C 1 , and it remains to show that Dg is
continuous. To show this we need another lemma.
Lemma 3.16. Let GL(n) denote the space of real invertible n × n matrices (equipped
with some norm). The map GL(n) → GL(n) defined by A 7→ A−1 is continuous.
This lemma follows because the entries of A−1 are rational functions with non-
vanishing denominator in terms of the entries of A (by Cramer’s rule).
Since Dg|y = Df |−1
x and compositions of continuous maps are continuous (Df is
continuous by assumption), we have that Dg must be continuous, so g ∈ C 1 (V, U ). 
Remark. If f is locally invertible at every point, it is not necessarily (globally) invertible
(that is, bijective).
Example 3.17. Let $f : \mathbb R^2 \to \mathbb R^2$ be given by $f(x) = (e^{x_2}\sin(x_1),\, e^{x_2}\cos(x_1))$. Then
$$Df|_x = \begin{pmatrix} e^{x_2}\cos(x_1) & e^{x_2}\sin(x_1) \\ -e^{x_2}\sin(x_1) & e^{x_2}\cos(x_1) \end{pmatrix}.$$
Thus $\det Df|_x = e^{2x_2}(\cos(x_1)^2 + \sin(x_1)^2) = e^{2x_2} \ne 0$, so by Theorem 3.14, $f$ is locally invertible at every point $x \in \mathbb R^2$. $f$ is not bijective: it is not injective because, for instance, $f(0,0) = f(2\pi, 0)$.
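As an illustration only, the fixed point map $\varphi_y(x) = x + Df|_a^{-1}(y - f(x))$ from the proof of Theorem 3.14 can be used to compute the local inverse of this $f$ numerically. The base point $a$, the perturbation of $f(a)$ and the number of iterations below are arbitrary choices for the demo.

import numpy as np

def f(x):
    return np.array([np.exp(x[1]) * np.sin(x[0]), np.exp(x[1]) * np.cos(x[0])])

def Df(x):
    return np.array([
        [ np.exp(x[1]) * np.cos(x[0]), np.exp(x[1]) * np.sin(x[0])],
        [-np.exp(x[1]) * np.sin(x[0]), np.exp(x[1]) * np.cos(x[0])],
    ])

a = np.array([0.3, 0.1])
Dfa_inv = np.linalg.inv(Df(a))

y = f(a) + np.array([0.05, -0.02])        # a point close to f(a)
x = a.copy()
for _ in range(50):                       # iterate x_{n+1} = phi_y(x_n)
    x = x + Dfa_inv @ (y - f(x))

print(np.max(np.abs(f(x) - y)))           # tiny residual: x solves f(x) = y near a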

2. Implicit function theorem


We will now use the inverse function theorem to prove a significant generalization
called the implicit function theorem. Let E ⊂ Rn × Rm be an open set and f : E → Rm
a C 1 map. Consider the zero set of f given by
Z = {(x, y) ∈ E : f (x, y) = 0}.

It is natural to ask when Z is locally the graph of a function. The implicit function
theorem gives a satisfactory answer. More precisely, given a point (x0 , y0 ) ∈ Z we ask
whether there exists an open neighborhood of (x0 , y0 ) so that Z intersected with that
neighborhood is given as the graph of a C 1 function in the sense that there exists g so
that f (x, g(x)) = 0 for x close to x0 . Another way to think of this is that we would like
to solve the system of equations given by f (x, y) = 0 for y, when x is given (this seems
reasonable since there are m equations and m unknowns).
Theorem 3.18 (Implicit function theorem). Let E ⊂ Rn × Rm be open, f ∈
C 1 (E, Rm ) and (x0 , y0 ) ∈ Z such that the matrix Dy f |(x0 ,y0 ) ∈ Rm×m is invertible.
Then there exist open neighborhoods U, V of x0 , y0 , respectively and a C 1 function
g : U → V so that
Z ∩ (U × V ) = {(x, g(x)) : x ∈ U }.
In other words, U × V ⊂ E and f (x, g(x)) = 0 for all x ∈ U . Moreover,
(3.3) Dg|x0 = −(Dy f |(x0 ,y0 ) )−1 Dx f |(x0 ,y0 ) .
Here, Dx f |(x0 ,y0 ) ∈ Rm×n denotes the Jacobian matrix of the function x 7→ f (x, y0 )
at x0 , and Dy f |(x0 ,y0 ) ∈ Rm×m the Jacobian matrix of the function y 7→ f (x0 , y) at y0 .
It is instructive to observe that the relation (3.3) follows from an application of the
chain rule when taking derivatives on both sides of the identity
f (x, g(x)) = 0
with respect to x. This is also known as implicit differentiation.
The formula (3.3) is especially useful in cases when it is difficult or impossible to
determine the implicit function g algebraically.
Proof. The proof is an application of the inverse function theorem, Theorem 3.14.
Define a map F : E → Rn × Rm by
F (x, y) = (x, f (x, y)).
Then $F$ is $C^1$ and $DF|_{(x_0,y_0)}$ is given by the $(n+m)\times(n+m)$ block matrix
$$\begin{pmatrix} I_n & 0 \\ D_x f|_{(x_0,y_0)} & D_y f|_{(x_0,y_0)} \end{pmatrix},$$
where In denotes the n × n identity matrix. Thus det DF |(x0 ,y0 ) = det Dy f |(x0 ,y0 ) 6= 0,
so DF |(x0 ,y0 ) is invertible. By Theorem 3.14, F is therefore locally invertible at (x0 , y0 ).
As a consequence, there exist an open neighborhood U 0 of x0 and an open neighborhood
$V$ of $y_0$ so that $U' \times V \subset E$, $F(U' \times V) \subset \mathbb R^n \times \mathbb R^m$ is open and $F|_{U'\times V}$ is invertible
with a C 1 inverse
G : F (U 0 × V ) → U 0 × V.
Let U = {x ∈ U 0 : (x, 0) ∈ F (U 0 × V )} ⊂ Rn . Then U is open, because F (U 0 × V ) is
open. Also, x0 ∈ U . For x ∈ U we can now define g(x) by G(x, 0) = (x, g(x)). Then
g(x) ∈ V and
(x, f (x, g(x))) = F (x, g(x)) = F (G(x, 0)) = (x, 0),
so f (x, g(x)) = 0 for all x ∈ U . Moreover, g is C 1 and satisfies (3.3). 
Example 3.19. Let n = m = 1 and f (x, y) = x2 + y 2 − 1. Then Z is the unit circle
around the origin, which is locally a graph at every point with $y_0 \ne 0$. Consistently with this, $D_y f|_{(x,y)} = 2y \ne 0$ if and only if $y \ne 0$.

In this case an implicit function g can be determined explicitly: if say (x0 , y0 ) = (0, 1),
then $g : (-1,1) \to \mathbb R$ with $g(x) = \sqrt{1 - x^2}$ is $C^1$ and satisfies $f(x, g(x)) = 0$.
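For illustration only, one can check formula (3.3) in this example against the derivative of the explicit solution; the point $x_0 = 0.6$ on the circle is an arbitrary choice.

import numpy as np

x0 = 0.6
y0 = np.sqrt(1.0 - x0 ** 2)                 # (x0, y0) lies on the zero set of f

Dx_f = 2 * x0                               # partial derivative of f in x at (x0, y0)
Dy_f = 2 * y0                               # partial derivative of f in y at (x0, y0)

implicit = -Dx_f / Dy_f                     # formula (3.3), here a 1x1 "matrix"
explicit = -x0 / np.sqrt(1.0 - x0 ** 2)     # g'(x0) computed from g(x) = sqrt(1 - x^2)

print(implicit, explicit)                   # both equal -0.75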
Example 3.20. Let n = m = 1 and f (x, y) = x2 − y 3 . Then Z is a cubic curve with
a cusp singularity at the origin. In this case, Z is (globally) the graph of the function
g : R → R with g(x) = |x|2/3 . However,
Dy f |(x,y) = −3y 2 .
so the implicit function theorem does not apply at the cusp (x0 , y0 ) = (0, 0) ∈ Z. This
is consistent with the fact that g is not C 1 at zero.
Example 3.21. Let n = m = 1 and f (x, y) = (y − x)(y + x). Then Z is locally the
graph of a function at every point except for the origin, where it has a self-intersection.
3. Ordinary differential equations
In this section we study initial value problems of the form
 0
y (t) = F (t, y(t))
y(t0 ) = y0 ,
where E ⊂ R × R is open, (t0 , y0 ) ∈ E and F ∈ C(E) are given. We say that a
differentiable function y : I → R defined on some open interval I ⊂ R that includes the
point t0 ∈ I is a solution to the initial value problem if (t, y(t)) ∈ E for all t ∈ I and
y(t0 ) = y0 and y 0 (t) = F (t, y(t)) for all t ∈ I. The equation y 0 (t) = F (t, y(t)) is a first
order ordinary differential equation . We also write this differential equation in short
form as
y 0 = F (t, y).
Geometric interpretation. At each point (t, y) ∈ E imagine a small line segment
with slope F (t, y). We are looking for a function such that its graph has the slope
F (t, y) at each point (t, y) on the graph of the function.

Figure 1. Visualization of F (t, y).



Example 3.22. Consider the equation $y' = \frac{y}{t}$. The solutions of this equation are of the form $y(t) = ct$ for $c \in \mathbb R$.
Example 3.23. Sometimes we can solve initial value problems by computing an explicit expression for $y$. Recall for instance that solving differential equations of the form $y' = f(t)g(y)$ is easy (by separation of variables). Consider for instance
$$y'(t) = \frac{t}{y(t)}, \qquad y(t_0) = y_0$$
for $(t_0, y_0) \in (0,\infty) \times (0,\infty)$. Then $y(t) = \sqrt{t^2 + y_0^2 - t_0^2}$. Note that if $y_0^2 - t_0^2 \ge 0$, then $y$ is defined on $I = (0,\infty)$. But if $y_0^2 - t_0^2 < 0$, then $y$ is only defined on $I = (\sqrt{t_0^2 - y_0^2}, \infty) \ni t_0$.
In general, however it is not easy to find a solution. It may also happen that the
solution is not expressible in terms of elementary functions. Try for instance, to solve
the initial value problem
$$y'(t) = e^{y(t)^2}\, t^2 \sin(t + y(t)), \qquad y(1) = 5.$$
Theorem 3.24 (Picard-Lindelöf). Let E ⊂ R × R be open, (t0 , y0 ) ∈ E, F ∈ C(E).
Let a > 0 and b > 0 be small enough such that
R = {(t, y) ∈ R2 : |t − t0 | ≤ a, |y − y0 | ≤ b} ⊂ E.
Let M = sup(t,y)∈R |F (t, y)| < ∞. Assume that there exists c ∈ (0, ∞) such that
(3.4) |F (t, y) − F (t, u)| ≤ c|y − u|
for all (t, y), (t, u) ∈ R. Define a∗ = min(a, b/M ) and let I = [t0 − a∗ , t0 + a∗ ]. Then
there exists a unique solution y : I → R to the initial value problem
 0
y (t) = F (t, y(t)),
(3.5)
y(t0 ) = y0 .


Figure 2. Visualization of F (t, y).



Remarks. 1. If F satisfies condition (3.4), we also say that F is Lipschitz continuous


in the second variable . Note that the solution interval I guaranteed by the theorem is
independent of the Lipschitz constant.
2. The condition (3.4) follows if F is differentiable in the second variable and
|∂y F (t, y)| ≤ c
for every (t, y) ∈ R (by the mean value theorem).
3. By the fundamental theorem of calculus, the initial value problem (3.5) is equivalent
to the integral equation
Z t
y(t) = y0 + F (s, y(s))ds.
t0

Corollary 3.25. Let E ⊂ R × R open, (t0 , y0 ) ∈ E, F ∈ C 1 (E). Then there exists


an interval I ⊂ R and a unique differentiable function y : I → R such that (t, y(t)) ∈ E
for all t ∈ I and y solves (3.5).
This is true because (3.4) follows from the mean value theorem and continuity of
the second derivative ∂y F .
Proof of Theorem 3.24. Let J = [y0 − b, y0 + b]. It suffices to show that there
exists a unique continuous function y : I → J such that
Z t
y(t) = y0 + F (s, y(s))ds
t0

(that is, y is a solution of the integral equation). Let


Y = {y : I → J : y continuous on I}.
For every y ∈ Y, t 7→ F (t, y(t)) is a well-defined continuous function on I. Define
Z t
T y(t) = y0 + F (s, y(s))ds.
t0

Claim. T Y ⊂ Y.
Proof of claim. Let y ∈ Y. Then T y is a continuous function on I. It remains
to show that T y(t) ∈ J for all t ∈ I. Recalling that |F (t, y)| ≤ M for all (t, y) ∈ R we
obtain: Z t
|T y(t) − y0 | ≤ |F (s, y(s))|ds ≤ |t0 − t|M ≤ M a∗ ≤ b,
t0
where we used that a∗ = min(a, b/M ) ≤ b/M . 
To apply the contraction principle we need to equip Y with a metric such that
T : Y → Y is a contraction and Y is complete. We could be tempted to try the
usual supremum metric d∞ (g1 , g2 ) = supt∈I |g1 (t) − g2 (t)|. Then Y ⊂ C(I) is closed, so
(Y, d∞ ) is a complete metric space. However, T will not necessarily be a contraction2
with respect to d∞ . Instead, we define the metric
d∗ (g1 , g2 ) = sup e−2c|t−t0 | |g1 (t) − g2 (t)|.
t∈I

2For the supremum metric to give rise to a contraction we would need to make the interval I
smaller.

Then d∗ (g1 , g2 ) ≤ d∞ (g1 , g2 ) ≤ e2ca∗ d∗ (g1 , g2 ). In other words, d∗ and d∞ are equivalent
metrics. This implies that (Y, d∗ ) is still complete.
Claim. T : Y → Y is a contraction with respect to d∗ .
Proof of claim. For g1 , g2 ∈ Y, t ∈ I we have by (3.4),
Z t
|T g1 (t) − T g2 (t)| = (F (s, g1 (s)) − F (s, g2 (s)))ds
t0
Z t
≤c |g1 (s) − g2 (s)|ds.
t0

Let us assume that t ∈ [t0 , t0 + a∗ ]. Then


$$|Tg_1(t) - Tg_2(t)| \le c\int_{t_0}^t |g_1(s) - g_2(s)|\,ds \le c\int_{t_0}^t e^{2c(s-t_0)}\, d_*(g_1,g_2)\,ds = c\, d_*(g_1,g_2)\, \frac{1}{2c}\big(e^{2c(t-t_0)} - 1\big) \le \tfrac12\, d_*(g_1,g_2)\, e^{2c|t-t_0|}.$$
Similarly, for t ∈ [t0 − a∗ , t0 ] we also have
|T g1 (t) − T g2 (t)| ≤ 21 d∗ (g1 , g2 )e2c|t−t0 | .
Thus,
e−2c|t−t0 | |T g1 (t) − T g2 (t)| ≤ 21 d∗ (g1 , g2 )
holds for all t ∈ I, so d∗ (T g1 , T g2 ) ≤ 12 d∗ (g1 , g2 ). 
By the Banach fixed point theorem, there exists a unique y ∈ Y such that T y = y,
i.e. a unique solution to the initial value problem (3.5). 
Remarks. 1. The proof is constructive. That is, it tells us how to compute the solution.
This is because the proof of the Banach fixed point theorem is constructive. Indeed,
construct a sequence (yn )n≥0 ⊂ Y by y0 (t) = y0 and
    yn(t) = y0 + ∫_{t0}^t F(s, yn−1(s)) ds    for n = 1, 2, . . .

Then (yn )n≥0 converges uniformly on I to the solution y. This method is called Picard
iteration .
2. Note that the length of the existence interval I does not depend on the size of the
constant c in (3.4).
Example 3.26. Consider the initial value problem
(3.6)    y'(t) = e^t sin(t + y(t)) / (t y(t) − 1),    y(1) = 5.
Let F(t, y) = e^t sin(t + y) / (ty − 1). We need to choose a rectangle R around the point (1, 5) where we have control over |F(t, y)| and |∂y F(t, y)|. Thus we need to stay away from the set of (t, y) such that ty − 1 = 0. Say,
    R = {(t, y) : |t − 1| ≤ 1/2, |y − 5| ≤ 1}.
Then for (t, y) ∈ R:
    |ty − 1| ≥ (1 − 1/2)(5 − 1) − 1 = 1.
Also, |e^t sin(t + y)| ≤ e^{3/2}. Setting M = e^{3/2}, we obtain
    |F(t, y)| ≤ M for all (t, y) ∈ R.

Compute
    ∂y F(t, y) = e^t cos(t + y)/(ty − 1) − t e^t sin(t + y)/(ty − 1)^2.
For (t, y) ∈ R we estimate
    |∂y F(t, y)| ≤ |e^t cos(t + y)/(ty − 1)| + |t e^t sin(t + y)/(ty − 1)^2| ≤ c,
where we have set c = e^{3/2} + (3/2) e^{3/2}. Then the number a∗ from Theorem 3.24 is a∗ = min(a, b/M) = min(1/2, 1/e^{3/2}) = e^{−3/2}. So the theorem yields the existence and uniqueness of a solution to the initial value problem (3.6) in the interval I = [1 − e^{−3/2}, 1 + e^{−3/2}]. We can also compute that solution by Picard iteration: let y0(t) = 5 and
    yn(t) = 5 + ∫_1^t e^s sin(s + yn−1(s)) / (s yn−1(s) − 1) ds.
The sequence (yn )n∈N converges uniformly on I to the solution y.
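The Picard iterates in Example 3.26 can also be carried out numerically. The following is a minimal sketch (not part of the formal development), assuming the numpy library; the integral defining each iterate is replaced by a cumulative trapezoidal sum on the right half-interval, and the grid size and number of iterations are arbitrary choices.

    import numpy as np

    def picard_iterates(F, t0, y0, a_star, n_iter=6, n_grid=200):
        """Approximate the Picard iterates y_0, ..., y_{n_iter} on [t0, t0 + a_star]."""
        t = np.linspace(t0, t0 + a_star, n_grid)
        dt = t[1] - t[0]
        y = np.full_like(t, float(y0))                      # y_0(t) = y0
        for _ in range(n_iter):
            g = F(t, y)                                     # integrand s -> F(s, y_{n-1}(s))
            cum = np.concatenate(([0.0], np.cumsum(0.5 * (g[:-1] + g[1:]) * dt)))
            y = y0 + cum                                    # y_n(t) = y0 + integral from t0 to t
        return t, y

    # Example 3.26: F(t, y) = e^t sin(t + y) / (t y - 1), y(1) = 5, a_star = e^{-3/2}
    t, y = picard_iterates(lambda t, y: np.exp(t) * np.sin(t + y) / (t * y - 1),
                           t0=1.0, y0=5.0, a_star=np.exp(-1.5))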
Example 3.27. Sometimes one can extend solutions beyond the interval obtained
from the Picard-Lindelöf theorem. Consider the initial value problem
(3.7)    y'(t) = cos(y(t)^2 − 2t^3),    y(0) = 1.
We claim that there exists a unique solution y : R → R. To prove this it suffices to
demonstrate the existence of a unique solution on the interval [−L, L] for every L > 0.
To do this we invoke the Picard-Lindelöf theorem. Set
R = {(t, y) ∈ R2 : |t| ≤ L, |y − 1| ≤ L}.
Let F (t, y) = cos(y 2 − 2t3 ). Then
|F (t, y)| ≤ 1 for all (t, y) ∈ R2 .
We have ∂y F (t, y) = −2y sin(y 2 − 2t3 ), so |∂y F (t, y)| ≤ 2|y| ≤ 2(L + 1) for all (t, y) ∈ R.
Then by Theorem 3.24, there exists a unique solution to (3.7) on I = [−L, L].
Example 3.28. If the Lipschitz condition (3.4) fails, then the initial value problem
may have more than one solution. Consider
(3.8)    y'(t) = |y(t)|^{1/2},    y(0) = 0.

The function y ↦ |y|^{1/2} is not Lipschitz continuous in any neighborhood of 0: for y > 0 its derivative (1/2) y^{−1/2} is unbounded as y → 0. The function y1(t) = 0 solves the
initial value problem (3.8). The function
    y2(t) = t^2/4 if t > 0,    y2(t) = 0 if t ≤ 0
also does.
Existence of a solution still holds without the assumption (3.4). We will prove this
as a consequence of the Arzelá-Ascoli theorem.

Theorem 3.29 (Peano existence theorem). Let E ⊂ R × R open, (t0 , y0 ) ∈ E,


F ∈ C(E),
R = {(t, y) : |t − t0 | ≤ a, |y − y0 | ≤ b} ⊂ E.
Let M = sup(t,y)∈R |F (t, y)| < ∞. Define a∗ = min(a, b/M ) and let I = [t0 − a∗ , t0 + a∗ ].
Then there exists a solution y : I → R to the initial value problem
(3.9)    y'(t) = F(t, y(t)),    y(t0) = y0.
Corollary 3.30. Let E ⊂ R × R open, (t0 , y0 ) ∈ E, F ∈ C(E). Then there exists
an interval I ⊂ R and a differentiable function y : I → R such that (t, y(t)) ∈ E for all
t ∈ I and y solves (3.5).
Proof. It suffices to produce a solution to the integral equation
(3.10)    y(t) = y0 + ∫_{t0}^t F(s, y(s)) ds
on the interval [t0 − a∗, t0 + a∗]. We restrict our attention to the interval [t0, t0 + a∗], which we denote by I. The construction is similar on the other half, [t0 − a∗, t0].
Let P be a partition P = {t0 < t1 < · · · < tN = t0 + a∗} of [t0, t0 + a∗]. We let ∆P = max_{0≤k≤N−1}(tk+1 − tk) denote the maximal width of P. We try to build an approximate solution given as a continuous piecewise linear function. The function yP : [t0, t0 + a∗] → R shall be defined so that yP(t0) = y0 and so that on every partition interval [tk, tk+1] the function is given by yP(t) − yP(tk) = mk(t − tk) with mk = F(tk, yP(tk)). To see that this is possible we establish the following

Claim 0. There is a unique continuous function yP on [t0, t0 + a∗] with yP(t0) = y0 such that the following properties hold.
(i) yP(t) = yP(tk−1) + F(tk−1, yP(tk−1))(t − tk−1) for tk−1 ≤ t ≤ tk, 1 ≤ k ≤ N.
(ii) For t0 ≤ t ≤ t0 +a∗ we have y0 −M (t−t0 ) ≤ yP (t) ≤ y0 +M (t−t0 ), in particular
|yP (t) − y0 | ≤ M a∗ ≤ b.
Proof of claim. We prove this by induction, establishing for each 1 ≤ n ≤ N
the properties (i) and (ii) on [t0 , tn ].
First define yP(t) = y0 + F(t0, y0)(t − t0) on [t0, t1]. Then yP(t0) = y0 so that (i) holds for t0 ≤ t ≤ t1. Since |F(t0, y0)| ≤ M we also have (ii) for t0 ≤ t ≤ t1.
Now assume that yP is already defined on [t0 , tn ] and (i) and (ii) hold on the interval
[t0 , tn ]. If n = N we are done, so let’s assume n < N . We define yP (t) = yP (tn ) +
F (tn , yP (tn ))(t − tn ) for tn ≤ t ≤ tn+1 . Clearly with the induction hypothesis on [t0 , tn ]
this makes yP a continuous piecewise linear function on [t0 , tn+1 ]. By the induction
hypothesis we have y0 − M (t − t0 ) ≤ yP (t) ≤ y0 + M (t − t0 ) for t0 ≤ t ≤ tn , and now,
since |F (tn , yP (tn ))| ≤ M we get for tn ≤ t ≤ tn+1
yP (t) ≤ yP (tn ) + M (t − tn ) ≤ y0 + M (tn − t0 ) + M (t − tn ) = y0 + M (t − t0 )
yP (t) ≥ yP (tn ) − M (t − tn ) ≥ y0 − M (tn − t0 ) − M (t − tn ) = y0 − M (t − t0 )
so that in particular (t, yP(t)) ∈ R for t0 ≤ t ≤ tn+1.
We have verified (i), (ii) on [t0, tn+1] which finishes the induction.
Letting n = N − 1 we have defined our polygonal function satisfying the properties (i) and (ii) on [t0, t0 + a∗]. □

Claim 1. For t, t′ ∈ [t0, t0 + a∗],
    |yP(t) − yP(t′)| ≤ M |t − t′|.
Proof of claim. In this proof we will write yP as y for brevity. Say t′ ∈ [tk, tk+1], t ∈ [tℓ, tℓ+1], k ≤ ℓ. If k = ℓ, then
    |y(t) − y(t′)| = |F(tk, y(tk))(t − t′)| ≤ M |t − t′|.
If k < ℓ, then
    |y(t) − y(t′)| = |y(t) − y(tℓ) + Σ_{j=k+1}^{ℓ−1} (y(tj+1) − y(tj)) + y(tk+1) − y(t′)|
        ≤ |y(t) − y(tℓ)| + Σ_{j=k+1}^{ℓ−1} |y(tj+1) − y(tj)| + |y(tk+1) − y(t′)|
        ≤ M(t − tℓ) + Σ_{j=k+1}^{ℓ−1} M(tj+1 − tj) + M(tk+1 − t′) = M(t − t′). □

Define gP (t) = F (tk , yP (tk )) for t ∈ (tk , tk+1 ]. Then gP is a step function and
yP0 (t) = gP (t) for t ∈ (tk , tk+1 ).

Let ε > 0. F is uniformly continuous on R, because R is compact (Theorem 1.53).


Thus there exists δ = δ(ε) > 0 such that
(3.11)    |F(t, y) − F(t′, y′)| ≤ ε
for all (t, y), (t′, y′) ∈ R with ‖(t, y) − (t′, y′)‖ ≤ 100δ.

Claim 2. Suppose that ∆P ≤ δ(ε) min(1, M −1 ). Then we have for all t ∈ [t0 , t0 + a∗ ]
that
    yP(t) = y0 + ∫_{t0}^t gP(s) ds    and    |gP(s) − F(s, yP(s))| ≤ ε if s ∈ (tk−1, tk).

Proof of claim. We will write y instead of yP and g instead of gP in this proof.


First we have for t = tk:
    y(tk) − y0 = y(tk) − y(t0) = Σ_{j=1}^k (y(tj) − y(tj−1))
        = Σ_{j=1}^k F(tj−1, y(tj−1))(tj − tj−1) = Σ_{j=1}^k ∫_{tj−1}^{tj} g(s) ds = ∫_{t0}^{tk} g(s) ds.
Similarly, for t ∈ (tk, tk+1):
    y(t) − y(tk) = F(tk, y(tk))(t − tk) = ∫_{tk}^t g(s) ds.
Thus,
    y(t) = y(tk) + ∫_{tk}^t g(s) ds = y0 + ∫_{t0}^{tk} g(s) ds + ∫_{tk}^t g(s) ds = y0 + ∫_{t0}^t g(s) ds.

Let s ∈ (tk−1 , tk ). Then


|g(s) − F (s, y(s))| = |F (tk−1 , y(tk−1 )) − F (s, y(s))|.

We have
|y(tk−1 ) − y(s)| ≤ M |tk−1 − s| ≤ M (tk − tk−1 ) ≤ M · ∆P ≤ δ.
Also, |tk−1 − s| ≤ tk − tk−1 ≤ ∆P ≤ δ. Thus,
k(tk−1 , y(tk−1 )) − (s, y(s))k ≤ 100δ.
By (3.11),
|g(s) − F (s, y(s))| = |F (tk−1 , y(tk−1 )) − F (s, y(s))| ≤ ε.

Claim 3. Suppose that ∆P ≤ δ(ε) min(1, M −1 ). Then it holds for all t ∈ [t0 , t0 + a∗ ]
that
    |yP(t) − (y0 + ∫_{t0}^t F(s, yP(s)) ds)| ≤ ε a∗.

Proof of claim. By Claim 2, the left hand side equals
    |∫_{t0}^t (gP(s) − F(s, yP(s))) ds| ≤ ∫_{t0}^t |gP(s) − F(s, yP(s))| ds.

Claim 2 implies that this is no larger than ε(t − t0 ) ≤ εa∗ . 


Claim 3 says that yP is almost a solution if the maximal width of the partition P is
sufficiently small. In the final step we use a compactness argument to obtain an honest
solution.

Claim 4. The set F = {yP : P partition} ⊂ C([t0 , t0 + a∗ ]) is relatively compact.


Proof of claim. By Claim 1,
    |yP(t) − yP(t′)| ≤ M |t − t′|    for all t, t′ ∈ [t0, t0 + a∗].
This implies that F is equicontinuous. It is also bounded:
|yP (t)| ≤ |yP (t0 )| + |yP (t) − yP (t0 )| ≤ |y0 | + M |t − t0 | ≤ |y0 | + M a∗ .
Thus the claim follows from the Arzelà-Ascoli theorem (Theorem 1.74). 
For n ∈ N, choose a partition Pn with ∆Pn ≤ δ(1/n) min(1, M^{−1}). Since F is relatively compact, the sequence (yPn)n∈N ⊂ F has a subsequence that converges to some limit y ∈ C([t0, t0 + a∗]). It remains to show that y is a solution to the integral
equation (3.10). Let us denote that subsequence by (yn )n∈N . By (uniform) continuity
of F , we have that F (s, yn (s)) → F (s, y(s)) uniformly in s ∈ [t0 , t] as n → ∞. Thus,
    ∫_{t0}^t F(s, yn(s)) ds −→ ∫_{t0}^t F(s, y(s)) ds    as n → ∞.

On the other hand, by Claim 3 we get


    |yn(t) − (y0 + ∫_{t0}^t F(s, yn(s)) ds)| ≤ a∗/n −→ 0    as n → ∞.
Therefore, y solves the integral equation (3.10). 
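The piecewise linear functions yP constructed in this proof are exactly the polygons produced by the classical Euler method (compare Exercise 3.79). The following minimal sketch, assuming numpy and a uniform partition, builds yP for the initial value problem of Example 3.27.

    import numpy as np

    def euler_polygon(F, t0, y0, a_star, n_steps=100):
        """Build the piecewise linear approximation y_P from the proof of Peano's
        theorem on [t0, t0 + a_star], using a uniform partition with n_steps pieces."""
        t = np.linspace(t0, t0 + a_star, n_steps + 1)
        y = np.empty_like(t)
        y[0] = y0
        for k in range(n_steps):
            # on [t_k, t_{k+1}] the slope is F(t_k, y_P(t_k))
            y[k + 1] = y[k] + F(t[k], y[k]) * (t[k + 1] - t[k])
        return t, y

    # Example 3.27: y'(t) = cos(y(t)^2 - 2 t^3), y(0) = 1, on [0, 2]
    t, y = euler_polygon(lambda t, y: np.cos(y**2 - 2 * t**3), t0=0.0, y0=1.0, a_star=2.0)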
The theory for ordinary differential equations that we have developed carries over to far more general settings.

Systems of first-order ordinary differential equations. The proofs of the Picard-


Lindelöf theorem and the Peano existence theorem can easily be extended to apply
to systems of differential equations:
    y'(t) = F(t, y(t)),    y(t0) = y0
for F : E → Rm , E ⊂ R × Rm open, (t0 , y0 ) ∈ E.

Higher-order differential equations. Let d ≥ 1 and consider the d-th order ordinary
differential equation given by
(3.12) y (d) (t) = F (t, y(t), y 0 (t), . . . , y (d−1) (t))
for some F : E → R, E ⊂ R × Rd open. We can transform this equation into a system
of d first-order equations: if Y = (Y1 , . . . , Yd ) solves the system
    Y1'(t) = Y2(t)
    Y2'(t) = Y3(t)
    ...
    Y_{d−1}'(t) = Yd(t)
    Yd'(t) = F(t, Y(t)),
then Y1 is a solution to (3.12); conversely, if y solves (3.12), then Y = (y, y', . . . , y^{(d−1)}) solves the system.
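For instance, the second-order equation y″(t) = −y(t) with initial data y(0) = 0, y′(0) = 1 becomes the system Y1′ = Y2, Y2′ = −Y1 with Y1(0) = 0, Y2(0) = 1. Its solution is (Y1, Y2) = (sin t, cos t), and Y1(t) = sin t indeed solves the original equation.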

4. Higher order derivatives and Taylor’s theorem


Definition 3.31. Let U ⊂ Rn be open and f : U → R. We define the partial
derivatives of second order as
∂i ∂j f = ∂xi (∂xj f ) for i, j ∈ {1, . . . , n}
(if ∂j f , ∂i (∂j f ) exist). If ∂i f and ∂i ∂j f exist and are continuous for all i, j ∈ {1, . . . , n},
then we say that f ∈ C 2 (U ). In this case we also write (motivated by the next theorem)
∂i ∂j f = ∂ij f.
Theorem 3.32 (Schwarz). Let U ⊂ Rn open. Fix i, j ∈ {1, . . . , n}, i 6= j. Let
f : U → R such that ∂i f, ∂j f , ∂i ∂j f exist at every point in U and ∂i ∂j f is continuous
at some point a ∈ U . Then ∂j ∂i f (a) exists and
∂j ∂i f (a) = ∂i ∂j f (a).
Remark. The algebraic reason for this result can be intuitively understood by replacing
the differentiation operators with corresponding difference operators. Define ∆h f (x) =
f (x + h) − f (x). Then by commutativity of addition we get ∆h ∆k = ∆k ∆h . Indeed,
∆h ∆k f (x) = ∆k f (x + h) − ∆k f (x) = f (x + k + h) − f (x + h) − (f (x + k) − f (x))
∆k ∆h f (x) = ∆h f (x + k) − ∆h f (x) = f (x + h + k) − f (x + k) − (f (x + h) − f (x))
and the two expressions are clearly equal. As motivation for the theorem we want to
choose for h and k the quantities hi ei and hj ej where ei , ej are unit vectors, and hi , hj
∆h e ∆h e f (x)
are small. We have ∂i ∂jf (x) = limhi →0 limhj →0 i i hi hjj j if the two limits exist and
we like to relate this limit to ∂j ∂i f (a). As ∆hi ei and ∆hj ej commute the main issue is
to interchange the order of limits.

Proof of Theorem 3.32. We first consider the case n = 2 , i = 1, j = 2 and


then reduce to the general case.
Let f be as in the theorem and a = (a1 , a2 )) ∈ U and (h1 , h2 ) ∈ R2 \{0} such that
(a1 + h1 , a2 + h2 ) are contained in an open ball around a that is contained in U . We
are assuming that ∂1 ∂2 f is continuous at a and want to show that ∂2 ∂1 f exists (and is
equal to ∂1 ∂2 f (a)). Hence we need to study the difference quotient
    (∂1 f(a1, a2 + h2) − ∂1 f(a1, a2)) / h2
for h2 → 0.
This leads us to consider the quantity
    ∆2(a, h) := ∆_{h2 e2}[∆_{h1 e1} f(a)]
        = (f(a1 + h1, a2 + h2) − f(a1, a2 + h2)) − (f(a1 + h1, a2) − f(a1, a2)).
Define g(t) := ∆_{h1 e1} f(a1, t) = f(a1 + h1, t) − f(a1, t). Since ∂2 f exists at every point in U, the mean value theorem implies that there exists η = η_h contained in the closed interval with endpoints a2 and a2 + h2 such that
    ∆2(a, h) = g(a2 + h2) − g(a2) = g′(η) h2 = h2 (∂2 f(a1 + h1, η) − ∂2 f(a1, η)).
Since ∂1 ∂2 f exists at every point in U, another application of the mean value theorem yields
    ∆2(a, h) = h1 h2 · ∂1 ∂2 f(ξ, η),
where ξ = ξ_{h1} is in the closed interval with endpoints a1 and a1 + h1.
Let ε > 0. Since ∂1 ∂2 f is continuous at (a1, a2),
    |∂1 ∂2 f(a) − ∂1 ∂2 f(x)| ≤ ε
whenever ‖a − x‖ is small enough. Thus, for small enough h1 and h2 we have
(3.13)    |∂1 ∂2 f(a) − ∆2(a, h)/(h1 h2)| ≤ ε.
Letting h1 → 0 and using that ∂1 f exists at every point we get for h2 ≠ 0
    lim_{h1→0} ∆2(a, h)/(h1 h2) = lim_{h1→0} [ (f(a1 + h1, a2 + h2) − f(a1, a2 + h2))/h1 − (f(a1 + h1, a2) − f(a1, a2))/h1 ] / h2
        = (∂1 f(a1, a2 + h2) − ∂1 f(a1, a2)) / h2,
and inequality (3.13) implies
    |∂1 ∂2 f(a) − (∂1 f(a1, a2 + h2) − ∂1 f(a1, a2))/h2| ≤ ε.
Since ε was arbitrary, ∂2 ∂1 f (a) exists and equals ∂1 ∂2 f (a).
Finally we consider the situation in Rn for any n ≥ 2. Let ei , ej be different unit
coordinate vectors and consider the function of two variables F (s1 , s2 ) = f (a + s1 ei +
s2 ej ) for (s1 , s2 ) near the origin. The assumption that ∂i ∂j f is continuous at a implies
that ∂1 ∂2 F is continuous at 0 ∈ R2 and so by the above ∂2 ∂1 F (0) = ∂1 ∂2 F (0). From
this and the definitions of partial derivatives we conclude that ∂j ∂i f (a) exists and is
equal to ∂i ∂j f (a). 
The following example shows that in Schwarz' theorem one cannot dispense with the assumption that ∂i ∂j f is continuous at a.

Exercise 3.33. Define f : R2 → R by


    f(x, y) = x y (x^2 − y^2)/(x^2 + y^2) if (x, y) ≠ (0, 0),    f(0, 0) = 0.
Show that the mixed partial derivatives ∂x ∂y f and ∂y ∂x f both exist at every point in
R2 , but that ∂x ∂y f (0, 0) 6= ∂y ∂x f (0, 0).
Corollary 3.34. If f ∈ C 2 (U ), then ∂xi ∂xj f = ∂xj ∂xi f for every i, j ∈ {1, . . . , n}.
Definition 3.35. Let U ⊂ Rn be an open set and f : U → R. Let k ∈ N. If all
partial derivatives of f up to order k exist and are continuous, i.e. for all j ∈ {1, . . . , k} and i1, . . . , ij ∈ {1, . . . , n} the derivatives ∂i1 · · · ∂ij f exist and are continuous, then we write f ∈ C^k(U) and say that f is k times continuously differentiable.
Corollary 3.36. If f ∈ C k (U ) and π : {1, . . . , k} → {1, . . . , k} is a bijection, then
∂i1 · · · ∂ik f = ∂iπ(1) · · · ∂iπ(k) f
for all i1 , . . . , ik ∈ {1, . . . , n}.
Multiindex notation. In order to make formulas involving higher order derivatives shorter and more readable, we introduce multiindex notation. A multiindex of order k is a vector α = (α1, . . . , αn) ∈ N0^n, where N0 = {0, 1, 2, . . . }, such that Σ_{i=1}^n αi = k. We write |α| = Σ_{i=1}^n αi. For every multiindex α we introduce the notation
    ∂^α f = ∂_{x1}^{α1} · · · ∂_{xn}^{αn} f,
where ∂_{xi}^{αi} is short for ∂_{xi} · · · ∂_{xi} (αi times). For x = (x1, . . . , xn) ∈ Rn we also write
    x^α = x1^{α1} · · · xn^{αn}
and α! = α1! · · · αn!. Moreover, for α, β ∈ N0^n, α ≤ β means that αi ≤ βi for every i ∈ {1, . . . , n}.
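For example, in R^2 the multiindex α = (2, 1) has order |α| = 3, α! = 2!·1! = 2, ∂^α f = ∂_{x1}^2 ∂_{x2} f and x^α = x1^2 x2.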
With this notation, we can state Taylor’s theorem in Rn quite succinctly.
Theorem 3.37 (Taylor). Let U ⊂ Rn be open and convex, f ∈ C k+1 (U ) and x, x +
y ∈ U . Then there exists ξ ∈ U such that
    f(x + y) = Σ_{|α|≤k} (∂^α f(x)/α!) y^α + Σ_{|α|=k+1} (∂^α f(ξ)/α!) y^α.

Moreover, ξ takes the form ξ = x + θy for some θ ∈ [0, 1].


Remark. Without multiindex notation the statement of this theorem would look much
more messy:
    Σ_{|α|≤k} (∂^α f(x)/α!) y^α = Σ_{α1,...,αn≥0, α1+···+αn≤k} (∂_{x1}^{α1} · · · ∂_{xn}^{αn} f(x) / (α1! · · · αn!)) y1^{α1} · · · yn^{αn}.

Proof of Theorem 3.37. The idea is to apply Taylor’s theorem in one dimension
to the function g : [0, 1] → R given by g(t) = f (x + ty). Let us compute the derivatives
of g.
Claim. For m = 1, . . . , k + 1,
    g^{(m)}(t) = Σ_{|α|=m} (m!/α!) ∂^α f(x + ty) y^α.

Proof of claim. We first show by induction on m that
    g^{(m)}(t) = Σ_{i1,...,im=1}^n ∂_{i1} · · · ∂_{im} f(x + ty) y_{i1} · · · y_{im}.
Indeed, for m = 1, by the chain rule,
    g′(t) = Σ_{i=1}^n ∂i f(x + ty) yi.
Suppose we have shown it for m. Then
    g^{(m+1)}(t) = (d/dt) g^{(m)}(t) = (d/dt) Σ_{i1,...,im=1}^n ∂_{i1} · · · ∂_{im} f(x + ty) y_{i1} · · · y_{im}.
By the chain rule this equals
    Σ_{i1,...,im=1}^n Σ_{i=1}^n ∂_{i1} · · · ∂_{im} ∂i f(x + ty) y_{i1} · · · y_{im} yi = Σ_{i1,...,i_{m+1}=1}^n ∂_{i1} · · · ∂_{i_{m+1}} f(x + ty) y_{i1} · · · y_{i_{m+1}}.
It remains to show that
    Σ_{i1,...,im=1}^n ∂_{i1} · · · ∂_{im} f(x + ty) y_{i1} · · · y_{im} = Σ_{|α|=m} (m!/α!) ∂^α f(x + ty) y^α.
This follows because for a given α = (α1, . . . , αn) with |α| = m there are
    m!/α! = m!/(α1! · · · αn!) = (m choose α1)(m − α1 choose α2) · · · (m − α1 − · · · − α_{n−1} choose αn)
many tuples (i1, . . . , im) ∈ {1, . . . , n}^m such that i appears exactly αi times among the ij's. In other words, this is the number of ways to sort m pairwise different marbles into n numbered bins such that bin number i contains exactly αi marbles. □
By the one-dimensional Taylor theorem, there exists a θ ∈ [0, 1] such that
    g(t) = Σ_{m=0}^k (g^{(m)}(0)/m!) t^m + (g^{(k+1)}(θ)/(k + 1)!) t^{k+1}.
From the claim we see that this equals
    Σ_{m=0}^k (1/m!) Σ_{|α|=m} (m!/α!) ∂^α f(x) y^α t^m + (1/(k + 1)!) Σ_{|α|=k+1} ((k + 1)!/α!) ∂^α f(x + θy) y^α t^{k+1}
        = Σ_{|α|≤k} (∂^α f(x)/α!) (ty)^α + Σ_{|α|=k+1} (∂^α f(ξ)/α!) (ty)^α,
where we have set ξ = x + θy. Letting t = 1 we obtain the conclusion of the theorem. □


Corollary 3.38. If E ⊂ Rn is open and f ∈ C k (E), then for every x ∈ E,
    f(x + y) = Σ_{|α|≤k} (∂^α f(x)/α!) y^α + o(‖y‖^k)    as y → 0.

Proof. Let x ∈ E and δ > 0 be small enough so that U = Bδ (x) ⊂ E. By Taylor’s


theorem we have for every y with x + y ∈ U that
    f(x + y) = Σ_{|α|≤k−1} (∂^α f(x)/α!) y^α + Σ_{|α|=k} (∂^α f(x + θy)/α!) y^α
             = Σ_{|α|≤k} (∂^α f(x)/α!) y^α + Σ_{|α|=k} ((∂^α f(x + θy) − ∂^α f(x))/α!) y^α
for some θ ∈ [0, 1]. Since ∂^α f is continuous for every |α| = k, it holds that
    |∂^α f(x + θy) − ∂^α f(x)| → 0    as y → 0.
Also |y^α| = |y1|^{α1} · · · |yn|^{αn} ≤ ‖y‖^{α1+···+αn} = ‖y‖^{|α|}, so
    Σ_{|α|=k} ((∂^α f(x + θy) − ∂^α f(x))/α!) y^α = o(‖y‖^k). □


Definition 3.39. Let E ⊂ Rn be open and f ∈ C 2 (E). We define the Hessian
matrix of f at x ∈ E by
 2
∂1 f (x) · · · ∂1 ∂n f (x)

D2 f |x = (∂i ∂j f (x))i,j=1,...,n =  .. .. ..  ∈ Rn×n .


. . .
∂n ∂1 f (x) · · · ∂n2 f (x)
We call det D2 f |x the Hessian determinant of f at x ∈ E.
Sometimes the term Hessian is used for both, the matrix and its determinant. By
Theorem 3.32 the Hessian matrix is symmetric.
Corollary 3.40. Let E ⊂ Rn be open, f ∈ C 2 (E) and x ∈ E. Then
    f(x + y) = f(x) + ⟨∇f(x), y⟩ + (1/2)⟨y, D^2 f|x y⟩ + o(‖y‖^2)    as y → 0.
(Here ⟨x, y⟩ = Σ_{i=1}^n xi yi denotes the inner product of two vectors x, y ∈ Rn.)

Proof. By Corollary 3.38,
    f(x + y) = f(x) + Σ_{|α|=1} (∂^α f(x)/α!) y^α + Σ_{|α|=2} (∂^α f(x)/α!) y^α + o(‖y‖^2)    as y → 0.
We have
    Σ_{|α|=1} (∂^α f(x)/α!) y^α = Σ_{i=1}^n ∂i f(x) yi = ⟨∇f(x), y⟩.
If |α| = 2 then either α = 2ei for some i ∈ {1, . . . , n} or α = ei + ej for some 1 ≤ i < j ≤ n. Thus,
    Σ_{|α|=2} (∂^α f(x)/α!) y^α = (1/2) Σ_{i=1}^n ∂i^2 f(x) yi^2 + Σ_{1≤i<j≤n} ∂i ∂j f(x) yi yj
        = (1/2) Σ_{i,j=1}^n ∂i ∂j f(x) yi yj = (1/2) Σ_{i=1}^n yi (D^2 f|x y)i = (1/2) ⟨y, D^2 f|x y⟩. □
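Corollary 3.40 can be checked numerically: the quadratic model f(x) + ⟨∇f(x), y⟩ + (1/2)⟨y, D^2 f|x y⟩ should match f(x + y) up to an error that is o(‖y‖^2). The following minimal sketch assumes numpy; the test function and its gradient and Hessian are chosen by hand purely for illustration.

    import numpy as np

    # test function f(x1, x2) = exp(x1) sin(x2), expanded at x = (0, 1)
    f    = lambda x: np.exp(x[0]) * np.sin(x[1])
    grad = lambda x: np.array([np.exp(x[0]) * np.sin(x[1]),
                               np.exp(x[0]) * np.cos(x[1])])
    hess = lambda x: np.array([[np.exp(x[0]) * np.sin(x[1]),  np.exp(x[0]) * np.cos(x[1])],
                               [np.exp(x[0]) * np.cos(x[1]), -np.exp(x[0]) * np.sin(x[1])]])

    x = np.array([0.0, 1.0])
    for h in (1e-1, 1e-2, 1e-3):
        y = h * np.array([1.0, -2.0])
        quad = f(x) + grad(x) @ y + 0.5 * y @ hess(x) @ y
        # the ratio below should tend to 0 as h -> 0, reflecting the o(||y||^2) error
        print(h, abs(f(x + y) - quad) / (y @ y))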

We discuss a further application of the Schwarz Theorem 3.32.



Theorem 3.41. Let U ⊂ Rn be open and F = (F1 , . . . , Fn ) : U → Rn be of class


C 1 , i.e. the partial derivatives ∂j Fi are continuous. Then
(i) If F is the gradient of a C 2 function g : U → R, F = ∇g then the compatibility
conditions
(3.14) ∂j Fi = ∂i Fj , i, j = 1, . . . , n
hold on U .
(ii) Suppose that U is convex. If the compatibility conditions (3.14) hold on U then
there exists a C 2 function g such that F = ∇g on U .
Proof. Part (i) is immediate from Theorem 3.32. Indeed if F = ∇g, i.e. Fj = ∂j g,
for j = 1, . . . , n then ∂i Fj = ∂i ∂j g = ∂j ∂i g = ∂j Fi .
For part (ii) we fix a in U and use that, by convexity of U, the line segment between a and x is contained in U for all x ∈ U. We can then define
    g(x) = ∫_0^1 (x − a)^T F(a + t(x − a)) dt.
(This is the standard calculus definition of a line integral of the vector field F along the line parametrized by γ(t) = a + t(x − a) with γ′(t) = x − a.)
We claim that F = ∇g. Compute (differentiating under the integral sign)
    ∂i g(x) = ∫_0^1 ∂i [ Σ_{j=1}^n (xj − aj) Fj(a + t(x − a)) ] dt
        = ∫_0^1 [ Fi(a + t(x − a)) + Σ_{j=1}^n (xj − aj) ∂i Fj(a + t(x − a)) t ] dt
        = ∫_0^1 [ Fi(a + t(x − a)) + Σ_{j=1}^n (xj − aj) ∂j Fi(a + t(x − a)) t ] dt
by the compatibility condition ∂i Fj = ∂j Fi. Observe using the chain rule that
    (d/dt) Fi(a + t(x − a)) = Σ_{j=1}^n (xj − aj) ∂j Fi(a + t(x − a)),
so that the last displayed expression is equal to
    ∫_0^1 [ Fi(a + t(x − a)) + t (d/dt)Fi(a + t(x − a)) ] dt = ∫_0^1 (d/dt)[ t Fi(a + t(x − a)) ] dt = 1·Fi(x) − 0·Fi(a) = Fi(x).

Here we have used, in the last displayed line, the product rule and then the fundamental
theorem of calculus. We have shown ∂i g = Fi , for all i and hence ∇g = F . 
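As a simple illustration of part (ii), take U = R^2 and F(x, y) = (2xy, x^2), so that ∂2 F1 = 2x = ∂1 F2. With a = 0 the formula gives
    g(x, y) = ∫_0^1 ( x · 2(tx)(ty) + y · (tx)^2 ) dt = ∫_0^1 3t^2 x^2 y dt = x^2 y,
and indeed ∇g = (2xy, x^2) = F.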

5. Local extrema
Let E ⊂ Rn be an open set and f : E → R a function.
Definition 3.42. A point a ∈ E is called a local maximum if there exists an open
set U ⊂ E with a ∈ U such that f (a) ≥ f (x) for all x ∈ U . It is called a strict local
maximum if f (a) > f (x) for all x ∈ U , x 6= a. We define the terms local minimum ,
strict local minimum accordingly. A point is called a (strict) local extremum if it is a
(strict) local maximum or a (strict) local minimum.

Theorem 3.43. Suppose the partial derivative ∂i f exists on E. If f has a local extremum at a ∈ E, then ∂i f(a) = 0.
Proof. Let δ > 0 be such that a + tei ∈ E for all |t| ≤ δ. Define g : (−δ, δ) → R
by g(t) = f (a + tei ). By the chain rule, g is differentiable and g 0 (t) = ∂i f (a + tei ). Also,
0 is a local extremum of g so by Analysis I, 0 = g 0 (0) = ∂i f (a). 
Corollary 3.44. If f is differentiable at a and a is a local extremum, then ∇f (a) =
0.
Remark. ∇f (a) = 0 is not a sufficient condition for a to be a local extremum. Think
of saddle points.
Definition 3.45. If a ∈ E is such that ∇f (a) = 0, then we call a a critical point
of f .
Recall from linear algebra: A matrix A ∈ R^{n×n} is called positive definite if ⟨x, Ax⟩ > 0 for all x ∈ Rn \ {0} and positive semidefinite if ⟨x, Ax⟩ ≥ 0 for all x ∈ Rn.

We also write A > 0 to express that A is positive definite and A ≥ 0 to express that A
is positive semidefinite. The terms negative definite, negative semidefinite are defined
accordingly. A is indefinite if it is not positive semidefinite and not negative semidef-
inite. Every real symmetric matrix has real eigenvalues and there is an orthonormal
basis of eigenvectors (spectral theorem). A real symmetric matrix is positive definite if
and only if all eigenvalues are positive.
Theorem 3.46. Let f ∈ C 2 (E) and a ∈ E with ∇f (a) = 0. Then
(1) if D2 f |a > 0, then a is a strict local minimum of f ,
(2) if D2 f |a < 0, then a is a strict local maximum of f ,
(3) if D2 f |a is indefinite, then a is not a local extremum of f .
Remark. If D2 f |x is only positive semidefinite or negative semidefinite, then we need
more information to be able to decide whether or not a is a local extremum.
Proof. We write A = D2 f |a . Let ε > 0. By Corollary 3.40 there exists δ > 0 such
that for all y with kyk ≤ δ we have
    f(a + y) = f(a) + (1/2)⟨y, Ay⟩ + r(y)
with |r(y)| ≤ ε‖y‖^2.
(1): Let A be positive definite. Let S = {y ∈ Rn : kyk = 1}. S is compact, so the
continuous map y 7→ hy, Ayi attains its minimum on S. That is, there exists y0 ∈ S
such that
hy0 , Ay0 i ≤ hy, Ayi
for all y ∈ S. Define α = hy0 , Ay0 i. Since y0 6= 0 and A is positive definite, α > 0. Let
y ∈ Rn, y ≠ 0. Then y/‖y‖ ∈ S, so
    α ≤ ⟨y/‖y‖, A(y/‖y‖)⟩ = (1/‖y‖^2)⟨y, Ay⟩.
Thus, ⟨y, Ay⟩ ≥ α‖y‖^2 for all y ∈ Rn. Now we set ε = α/4. Then
    f(a + y) ≥ f(a) + (1/2)⟨y, Ay⟩ − (α/4)‖y‖^2 ≥ f(a) + (α/2)‖y‖^2 − (α/4)‖y‖^2 = f(a) + (α/4)‖y‖^2 > f(a)
if y 6= 0, kyk ≤ δ. Therefore a is a local minimum.
(2): Follows from (1) by replacing f by −f .

(3): Let A be indefinite. We need to show that in every open neighborhood of a there
exist points y 0 , y 00 such that
f (y 00 ) < f (a) < f (y 0 ).
Since A is not negative semidefinite there exists ξ ∈ Rn such that α = hξ, Aξi > 0.
Then, for t ∈ R small enough such that |tξ| ≤ δ we have
    f(a + tξ) = f(a) + (1/2)⟨tξ, Atξ⟩ + r(tξ) = f(a) + (1/2)α t^2 + r(tξ).
Let ε > 0 be such that |r(tξ)| ≤ (α/4)t^2 for all |tξ| ≤ δ (recall that δ depends on ε). Then f(a + tξ) ≥ f(a) + (1/4)α t^2 > f(a). Similarly, since A is also not positive semidefinite,
there exists η ∈ Rn such that hη, Aηi < 0 and for small enough t, f (a + tη) < f (a). 
Examples 3.47. (1) Let f(x, y) = c + x^2 + y^2 for c ∈ R. Then
    D^2 f|0 = ( 2  0 )
              ( 0  2 )  > 0
and 0 is a strict local minimum of f (even a global minimum).
(2) Let f(x, y) = c + x^2 − y^2 for c ∈ R. Then
    D^2 f|0 = ( 2   0 )
              ( 0  −2 )
is indefinite and 0 is not a local extremum of f.
(3) Let f1(x, y) = x^2 + y^4, f2(x, y) = x^2, f3(x, y) = x^2 + y^3. Then
    D^2 fi|0 = ( 2  0 )
               ( 0  0 )  ≥ 0,
but f1 has a strict local minimum at 0, f2 has a (non-strict) local minimum at 0, and f3 has no local extremum at 0.

6. Local extrema on surfaces


A standard problem is to find local extrema of a function f of n variables in
the presence of additional constraints given by equations g(x) = 0, where g(x) = (g1(x), . . . , gm(x)). We shall assume m < n and that the gradients ∇gi
are linearly independent; we then seek to find a necessary condition for local extrema of
f on the surface given by g = 0. We say that f has a local maximum at a with respect
to the constraint g = 0 if there is an open neighborhood U of a such that f (a) ≥ f (x)
for all x ∈ U that satisfy g(x) = 0. The definition of a local minimum under the
constraint g = 0 is analogous.
Theorem 3.48. Let a ∈ Rn and let U be an open neighborhood of a. Let m < n, let g = (g1, . . . , gm) : U → Rm be in C^1(U; Rm), let f : U → R be in C^1(U) and assume that g(a) = 0. Assume that rank(Dg|a) = m, and assume that f has a local extremum at a under the constraint g = 0. Then the vectors ∇f(a), ∇g1(a), . . . , ∇gm(a) are linearly dependent, and there are λ1, . . . , λm ∈ R such that
    ∇f(a) = Σ_{k=1}^m λk ∇gk(a).

Remark: The numbers λ1 , . . . , λm are called Lagrange multipliers.



Proof of Theorem 3.48. Without loss of generality we may assume (after possibly relabeling the variables) that the m × m matrix with entries ∂i gj(a), i, j = 1, . . . , m, has maximal rank m. By the implicit function theorem the equation g(x) = 0 can be solved near a by expressing the variables xi, i = 1, . . . , m, as functions ui of x″ = (xm+1, . . . , xn). Thus there is a δ > 0 such that for
    Bδ = {x = (x′, x″) ∈ Rn : |xi − ai| < δ, i = 1, . . . , n}
the equation g(x) = 0 for x ∈ Bδ is satisfied if and only if x′ = u(x″) for x″ = (xm+1, . . . , xn). Hence
(3.15)    g(u(x″), x″) = 0
for x″ close to a″, and u(a″) = a′. By assumption the function F(x″) := f(u(x″), x″) has a local extremum at a″ and therefore, by Corollary 3.44, we have ∂ν F(a″) = 0 for ν = m + 1, . . . , n. By the chain rule
(3.16)    ∂ν F(a″) = ∂ν f(a) + Σ_{i=1}^m ∂i f(a) ∂ν ui(a″) = 0,    ν = m + 1, . . . , n.
In view of the assumption det(∂i gk(a))_{i,k} ≠ 0 the system of m linear equations
(3.17)    Σ_{k=1}^m λk ∂i gk(a) = ∂i f(a),    i = 1, . . . , m,
has a unique solution (λ1, . . . , λm). Differentiating in (3.15) with respect to xν we obtain
(3.18)    Σ_{i=1}^m ∂i gk(a) ∂ν ui(a″) + ∂ν gk(a) = 0
for ν = m + 1, . . . , n and k = 1, . . . , m. Combining (3.16) and (3.17) we obtain, for ν = m + 1, . . . , n,
    ∂ν f(a) = − Σ_{i=1}^m ∂i f(a) ∂ν ui(a″) = − Σ_{i=1}^m ( Σ_{k=1}^m λk ∂i gk(a) ) ∂ν ui(a″)
            = − Σ_{k=1}^m λk ( Σ_{i=1}^m ∂i gk(a) ∂ν ui(a″) ).
Now use (3.18) to obtain
(3.19)    ∂ν f(a) = Σ_{k=1}^m λk ∂ν gk(a),    ν = m + 1, . . . , n.
Combining (3.19) with (3.17) we get ∇f(a) = Σ_{k=1}^m λk ∇gk(a) as claimed. □
Example 3.49. For x ∈ Rn let g(x) = x^T x = Σ_{j=1}^n xj^2 and let S = {x : g(x) = 1}, the unit sphere centered at the origin. Consider the quadratic form
    f(x) = x^T A x = Σ_{i,j=1}^n aij xi xj,
where A is a real symmetric n × n matrix, for x ∈ S. The continuous function f has a maximum on the compact set S, which occurs at v, say. Theorem 3.48 tells us that ∇f(v) and ∇g(v) are linearly dependent and there is a λ ∈ R such that ∇f(v) = λ∇g(v). Compute ∇f(v) = 2Av and ∇g(v) = 2v. Thus we have Av = λv with v ∈ S. This means that v is an eigenvector of A, with corresponding eigenvalue λ.

Remark: The existence of an eigenvector in Rn for real symmetric matrices is an important first step in the proof of the spectral theorem for such matrices.
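As a further simple illustration of Theorem 3.48, maximize f(x, y) = x + y under the constraint g(x, y) = x^2 + y^2 − 1 = 0. The condition ∇f = λ∇g reads (1, 1) = λ(2x, 2y), so x = y = 1/(2λ); the constraint then forces x = y = ±1/√2. The maximum value √2 is attained at (1/√2, 1/√2), and the other critical point gives the minimum −√2.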

7. Optimization and convexity*


In applications it is often desirable to minimize a given function f : E → R, i.e. to
find x∗ ∈ E such that f (x∗ ) ≤ f (x) for all x ∈ E. We call such a point x∗ a global
minimum of f . We say that x∗ is a strict global minimum if f (x∗ ) < f (x) for all
x 6= x∗ .
Example 3.50 (Linear regression). Say we are given finitely many points
(x1 , y1 ), . . . , (xN , yN ) ∈ Rn × R.
Suppose for instance that these represent measurements or observations of some physical
system. For example, xi could represent a point in space and yi the corresponding air
pressure measurement. We are looking to discover a “hidden relation” between the x
and y coordinates. That is, we are looking for a function F : Rn → R such that F (xi )
is (at least roughly) yi . One way this is done is linear regression . Here we search only
among F that take the form
Fa,b (x) = hx, ai + b
n
with some parameters a ∈ R , b ∈ R. That is, we are trying to “model” the hidden
relation by an affine linear function. The task is now to find the parameters a, b such
that Fa,b “fits best” to the given data set. To make this precise we introduce the error
function
    E(a, b) = Σ_{i=1}^N (F_{a,b}(xi) − yi)^2.
The problem of linear regression is to find the parameters (a, b) such that E(a, b) is
minimal.
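To make this concrete, here is a minimal numerical sketch (assuming numpy and made-up sample data): as Example 3.60 below explains, minimizing E(a, b) is a linear least squares problem, which np.linalg.lstsq solves directly.

    import numpy as np

    # made-up data: N = 50 points x_i in R^2 with a noisy affine relation to y_i
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))
    y = X @ np.array([2.0, -1.0]) + 0.5 + 0.1 * rng.normal(size=50)

    # E(a, b) = sum_i (<a, x_i> + b - y_i)^2; append a column of ones to account for b
    M = np.hstack([X, np.ones((50, 1))])
    v, *_ = np.linalg.lstsq(M, y, rcond=None)
    a, b = v[:2], v[2]       # recovered parameters of the best-fitting affine function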
One approach to minimizing a function f : E → R is to solve the equation ∇f (x) =
0, i.e. to find all critical points. By Corollary 3.44 we know that every minimum must
be a critical point. However it is often difficult to solve that equation, so more practical
methods are needed.
Gradient descent. Choose x0 ∈ Rn arbitrary and let
xn+1 = xn − αn ∇f (xn )
where αn > 0 is a small enough number to be determined later. The idea of this
iteration is to keep moving into the direction where f decreases the fastest. Sometimes
this simple process successfully converges to a minimum and sometimes it doesn’t,
depending on f , x0 and αn . What we can say from the definition is that, if f ∈ C 1 (E)
and (xn )n∈N converges, then the limit is a critical point of f . The following lemma gives
some more hope.
Lemma 3.51. Let f ∈ C 1 (E). Then, for every x ∈ E and small enough α > 0,
f (x − α∇f (x)) ≤ f (x).
Proof. By the definition of total derivatives,
f (x − α∇f (x)) = f (x) + h∇f (x), −α∇f (x)i + o(α) = f (x) − αk∇f (x)k2 + o(α)
which is ≤ f (x) provided that α > 0 is small enough. 

Remark. Note that the smallness of α in this lemma depends on the point x. Also,
this result is not enough to prove anything about the convergence of gradient descent.
We will see that gradient descent works well if f is a convex function.
Definition 3.52. Let E ⊂ Rn be convex. A function f : E → R is called convex
if
f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y)
for all x, y ∈ E, t ∈ [0, 1]. f is called strictly convex if
f (tx + (1 − t)y) < tf (x) + (1 − t)f (y)
for all x 6= y ∈ E and t ∈ (0, 1).
Theorem 3.53. Let E ⊂ Rn be open and convex and f ∈ C 1 (E). Then f is convex
if and only if
f (u + v) ≥ f (u) + h∇f (u), vi
for all u, u + v ∈ E.
Proof. ⇒: Fix u, u + v ∈ E. By convexity, for t ∈ [0, 1],
f (u + tv) = f ((1 − t)u + t(u + v)) ≤ (1 − t)f (u) + tf (u + v).
By definition of the derivative,
    f(u + tv) = f(u) + t ∇f(u)^T v + r(t),
where lim_{t→0} r(t)/t = 0. Thus,
f (u) + th∇f (u), vi + r(t) ≤ (1 − t)f (u) + tf (u + v)
which implies
    f(u) + ⟨∇f(u), v⟩ − f(u + v) ≤ −r(t)/t → 0    as t → 0.
Therefore f (u) + h∇f (u), vi ≤ f (u + v).
⇐: Let x, y ∈ E, t ∈ [0, 1]. Let u = tx + (1 − t)y and v = x − u. Then the assumption
implies
f (x) ≥ f (u) + h∇f (u), x − ui.
On the other hand, letting v = y − u, the assumption implies
f (y) ≥ f (u) + h∇f (u), y − ui.
Therefore
tf (x) + (1 − t)f (y) ≥ t(f (u) + h∇f (u), x − ui) + (1 − t)(f (u) + h∇f (u), y − ui)
= f (u) + h∇f (u), t(x − u) + (1 − t)(y − u)i = f (u) + h∇f (u), tx + (1 − t)y − ui.
Recalling that u = tx + (1 − t)y, we get
tf (x) + (1 − t)f (y) ≥ f (u) = f (tx + (1 − t)y). 
Theorem 3.54. Let E ⊂ Rn be open and convex and f ∈ C 2 (E). Then
(1) f is convex if and only if D2 f |x ≥ 0 for all x ∈ E,
(2) f is strictly convex if D2 f |x > 0 for all x ∈ E.

Proof. We only prove (1). The proof of (2) is very similar. Let f be convex. By Taylor's theorem, for u, u + tv ∈ E,
    f(u + tv) = f(u) + t⟨∇f(u), v⟩ + (1/2) t^2 ⟨D^2 f|u v, v⟩ + o(t^2)
and by Theorem 3.53,
    f(u + tv) ≥ f(u) + t⟨∇f(u), v⟩.
Combining these two pieces of information we obtain
    (1/2) t^2 ⟨D^2 f|u v, v⟩ + o(t^2) ≥ 0,
which implies ⟨D^2 f|u v, v⟩ ≥ 0 for all v ∈ Rn.
Conversely, assume that D^2 f|u ≥ 0 for all u ∈ E. By Taylor's theorem, for all u, u + v ∈ E there exists ξ ∈ E such that
    f(u + v) = f(u) + ⟨∇f(u), v⟩ + (1/2)⟨D^2 f|ξ v, v⟩ ≥ f(u) + ⟨∇f(u), v⟩.
Therefore f is convex by Theorem 3.53. □
Remark. If f is strictly convex, then it does not follow that D2 f |x > 0 for all x.
Example 3.55. Let f : R → R, f (x) = x4 . Then D2 f |x = f 00 (x) = 12x2 which is 0
at x = 0, but f is strictly convex.
Theorem 3.56. Let E ⊂ Rn be open and convex and f ∈ C 2 (E). Then
(1) If f is convex, then every critical point of f is a global minimum.
(2) If f is strictly convex, then f has at most one critical point.
Remarks. 1. Convex functions may have more than one critical point. For instance,
the constant function f ≡ 0 is convex.
2. Conclusion (1) implies that if f is convex and gradient descent converges, then it
converges to a global minimum.
Proof. (1): Let ∇f (x∗ ) = 0. Then by Taylor’s theorem, for every x ∈ E there
exists ξ ∈ E such that
    f(x) = f(x∗) + ⟨∇f(x∗), x − x∗⟩ + (1/2)⟨D^2 f|ξ (x − x∗), x − x∗⟩ ≥ f(x∗),
since the first inner product vanishes (∇f(x∗) = 0) and the second-order term is ≥ 0.

(2): Let x1 , x2 ∈ E be critical points of f . By (1), they are global minima. This implies
f (x1 ) = f (x2 ). If x1 6= x2 , then by strict convexity,
    f(x1) = (f(x1) + f(x2))/2 > f((x1 + x2)/2).
This is a contradiction to x1 being a global minimum. Therefore x1 = x2 . 
Example 3.57. If k · k is a norm on Rn , then the function x 7→ kxk is convex:
ktx + (1 − t)yk ≤ tkxk + (1 − t)kyk
by the triangle inequality. Also, this function has a unique global minimum at x = 0.
Lemma 3.58. Let I ⊂ R, E ⊂ Rn be convex and suppose that
(1) f : E → I is convex, and
(2) g : I → R is convex and nondecreasing.
Then the function h : E → R given by h = g ◦ f is convex.

Proof. By convexity of f and since g is nondecreasing,


h(tx + (1 − t)y) = g(f (tx + (1 − t)y)) ≤ g(tf (x) + (1 − t)f (y)).
Since g is convex this is
≤ tg(f (x)) + (1 − t)g(f (y)) = th(x) + (1 − t)h(y).

Corollary 3.59. If k · k is a norm on Rn , then the function x 7→ kxk2 is convex.
Example 3.60. Recall the error function from linear regression (Example 3.50):
    E(a, b) = Σ_{i=1}^N (⟨a, xi⟩ + b − yi)^2.
We claim that E : R^{n+1} → R is a convex function. We first rewrite E(a, b) in a different form. Define an N × (n + 1) matrix M and a vector v ∈ R^{n+1} by
    M = ( x11  · · ·  x1n  1 )
        (  ⋮    ⋱     ⋮   ⋮ )
        ( xN1  · · ·  xNn  1 )   ∈ R^{N×(n+1)},        v = (a1, . . . , an, b)^T ∈ R^{n+1},
where xi = (xi1, . . . , xin) ∈ Rn for i = 1, . . . , N and a = (a1, . . . , an) ∈ Rn. Then
    E(a, b) = E(v) = Σ_{i=1}^N ((Mv)i − yi)^2 = ‖Mv − y‖^2,
where ‖c‖ = (Σ_{i=1}^N |ci|^2)^{1/2}.

Let us rename variables and consider


E(x) = kM x − yk2
for x ∈ Rn , M ∈ RN ×n , y ∈ RN . Let F : RN → R be defined by F (y) = kyk2 and
G : Rn → RN , G(x) = M x − y. We have
∂i F (y) = 2yi , so DF |y = 2y T ∈ R1×N .
and DG|x = M ∈ RN ×n . Therefore, by the chain rule we obtain
DE|x = 2(M x − y)T M = 2(M x)T M − 2y T M = 2xT M T M − 2y T M ∈ R1×n .
Therefore,
D2 E|x = (∂i DE|x )i=1,...,n = (2(M T M )i )i=1,...,n = 2M T M.
Notice that M T M is positive semidefinite because
hM T M x, xi = hM x, M xi = kM xk2 ≥ 0.
Therefore E is convex by Theorem 3.54.
Example 3.61. Convex functions do not necessarily have a critical point. For
instance the function f : R → R, f (x) = x is convex, because D2 f |x = f 00 (x) = 0 for
all x ∈ R. But ∇f (x) = f 0 (x) = 1 6= 0 for all x ∈ R.
It is also not enough to assume strict convexity. For instance, the function f : R → R,
f (x) = ex is strictly convex, because f 00 (x) = ex > 0. But f 0 (x) = ex > 0 for all x ∈ R.
This motivates us to consider a stronger notion of convexity.

Definition 3.62. Let E ⊂ Rn be convex and open. Let f ∈ C 2 (E). We say that
f is strongly convex if there exists β > 0 such that
hD2 f |x y, yi ≥ βkyk2
for all x ∈ E, y ∈ Rn .
Remarks. 1. f is strongly convex if and only if there exists β > 0 such that
D2 f |x − βI ≥ 0 for all x ∈ E. This follows directly from the definition using that
βkyk2 = hβIy, yi. The condition D2 f |x − βI ≥ 0 is equivalent to the smallest eigen-
value of D2 f |x being ≥ β. Yet another equivalent way of stating this is saying that the
function g(x) = f(x) − (β/2)‖x‖^2 is convex. This is because D^2 g|x = D^2 f|x − βI.
2. If f is strongly convex, then f is strictly convex (by Theorem 3.54).
3. If f is strictly convex, then f is not necessarily strongly convex. For example con-
sider f : R → R, f (x) = ex . For every β > 0 there exists x ∈ R such that ex < β
because ex → 0 as x → −∞.

The following exercise shows that the assumption of strong convexity is not as
restrictive as it may seem at first sight: strictly convex functions are strongly convex
when restricted to compact sets.
Exercise 3.63. Suppose that f ∈ C 2 (Rn ) is strictly convex. Let K ⊂ Rn be
compact and convex. Show that there exist β− , β+ > 0 such that
β− kyk2 ≤ hD2 f |x y, yi ≤ β+ kyk2
for all x ∈ K and y ∈ Rn . (In particular, f is strongly convex on K.)
Hint: Consider the minimal eigenvalue of D2 f |x as a function of x.
Theorem 3.64. Let E ⊂ Rn be open and convex. Let f ∈ C 2 (E). Then f is
strongly convex if and only if there exists γ > 0 such that
f (u + v) ≥ f (u) + h∇f (u), vi + γkvk2
for every u, u + v ∈ E.
Proof. ⇒: Let β > 0 be such that g(x) = f(x) − (β/2)‖x‖^2 is convex. Then by Theorem 3.53,
    g(u + v) ≥ g(u) + ⟨∇g(u), v⟩ = f(u) − (β/2)‖u‖^2 + ⟨∇f(u) − βu, v⟩.
On the other hand,
    g(u + v) = f(u + v) − (β/2)‖u + v‖^2.
Thus,
    f(u + v) ≥ f(u) + ⟨∇f(u), v⟩ + (β/2)(‖u + v‖^2 − ‖u‖^2 − 2⟨u, v⟩) = f(u) + ⟨∇f(u), v⟩ + (β/2)‖v‖^2.
⇐: This follows in the same way from the converse direction of Theorem 3.53. 
Theorem 3.65. Let f ∈ C 2 (Rn ) be strongly convex. Then for every c ∈ R, the
sublevel set
B = {x ∈ Rn : f (x) ≤ c}
is bounded.

Proof. By Theorem 3.64 we have


f (x) ≥ f (0) + h∇f (0), xi + γkxk2 .
Therefore, limkxk→∞ f (x) = ∞. Suppose that B is unbounded. Then there would exist
a sequence (xn )n≥1 ⊂ B such that limn→∞ kxn k = ∞. But f (xn ) ≤ c, so f (xn ) 6→ ∞
as n → ∞. Contradiction! 
Theorem 3.66. Let f ∈ C 2 (Rn ) be strongly convex. Then there exists a unique
global minimum of f .
Proof. By the previous theorem, the set B = {x ∈ Rn : f (x) ≤ f (0)} is bounded.
Thus, there exists R > 0 such that B ⊂ BR = {x ∈ Rn : kxk ≤ R}. BR is compact,
so f attains its minimum on BR (0) at some point x∗ ∈ BR . Then f (x∗ ) ≤ f (x) for all
x ∈ BR . It remains to show f (x∗ ) ≤ f (x) for all x 6∈ BR . If x 6∈ BR , then x 6∈ B, so
f (x) > f (0). Also, 0 ∈ BR , so f (x∗ ) ≤ f (0) < f (x). 
We conclude this discussion by proving that gradient descent converges for strongly
convex functions.
Theorem 3.67. Let f ∈ C 2 (Rn ) be strongly convex and x0 ∈ Rn . Define
xn+1 = xn − α∇f (xn ) for n ≥ 0.
If α is small enough, then (xn )n∈N converges to the global minimum x∗ of f .
Remark. The restriction to f defined on Rn is only for convenience (the same is true
for Theorems 3.65 and 3.66).
Lemma 3.68. Let A ∈ R^{n×n} be a symmetric and positive definite matrix. Then the matrix norm ‖A‖op = sup_{x≠0} ‖Ax‖/‖x‖ is equal to the largest eigenvalue of A.
(Here ‖x‖ = (Σ_{i=1}^n |xi|^2)^{1/2} is the Euclidean norm.)
Proof. Let {v1, . . . , vn} be an orthonormal basis of eigenvectors corresponding to eigenvalues λ1, . . . , λn, respectively, and write x = Σ_{i=1}^n xi vi. Then
    ‖Ax‖ = ‖Σ_{i=1}^n xi Avi‖ = ‖Σ_{i=1}^n xi λi vi‖,
which by orthogonality is equal to (Σ_{i=1}^n |xi|^2 λi^2)^{1/2} (use that ‖x‖ = ⟨x, x⟩^{1/2}). Thus
    ‖Ax‖ = (Σ_{i=1}^n |xi|^2 λi^2)^{1/2} ≤ max_{i=1,...,n} λi (Σ_{i=1}^n |xi|^2)^{1/2} = max_{i=1,...,n} λi ‖x‖.
Let max_{i=1,...,n} λi = λ_{i0}. We have shown that ‖A‖op ≤ λ_{i0}. On the other hand,
    ‖A v_{i0}‖ = λ_{i0} ‖v_{i0}‖ = λ_{i0},
so ‖A‖op = sup_{‖x‖=1} ‖Ax‖ ≥ ‖A v_{i0}‖ = λ_{i0}. □

Proof of Theorem 3.67. Let α > 0. Define T (x) = x − α∇f (x). Then xn+1 =
T (xn ). We want T to be a contraction. For R > 0 define BR = {x ∈ Rn : kx − x∗ k ≤
R}. Let R > 0 be large enough such that x0 ∈ BR .
Claim. If α is small enough, then T is a contraction of BR .

Proof of claim. x∗ is a global minimum of f , so ∇f (x∗ ) = 0. Thus, T (x∗ ) = x∗ .


We have
DT |x = I − αD2 f |x .
The largest eigenvalue of D2 f |x is a continuous function of x which is bounded on the
compact set BR . Therefore there exists γ > 0 such that
hD2 f |x y, yi ≤ γkyk2
for all y ∈ Rn and x ∈ BR . By strong convexity,
βkyk2 ≤ hD2 f |x y, yi ≤ γkyk2 .
In other words, the eigenvalues of D2 f |x are contained in the interval [β, γ] for all
x ∈ BR. Let α ≤ 1/(2γ). Then the eigenvalues of I − αD^2 f|x are contained in
    [1 − αγ, 1 − αβ] ⊆ [1/2, 1 − αβ] ⊂ (0, 1).
Set c = 1 − αβ < 1. By Lemma 3.68, we have
    ‖I − αD^2 f|x‖op ≤ c < 1.
Therefore, kT (x) − T (y)k ≤ ckx − yk for all x, y ∈ BR . It remains to show that
T (BR ) ⊂ BR . Let x ∈ BR . Then since T (x∗ ) = x∗ ,
kT (x) − x∗ k = kT (x) − T (x∗ )k ≤ ckx − x∗ k ≤ cR ≤ R.

The claim now follows from the contraction principle (more precisely, from the same
argument used to prove the Banach fixed point theorem). 
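The proof translates directly into a numerical illustration. The following minimal sketch (assuming numpy) runs gradient descent on the strongly convex quadratic f(x) = (1/2)⟨Ax, x⟩ − ⟨b, x⟩ of Exercise 3.93, whose unique global minimum is the solution of Ax = b; the step size α = 1/(2γ) matches the choice in the proof.

    import numpy as np

    def gradient_descent(grad, x0, alpha, n_steps=500):
        """Iterate x_{n+1} = x_n - alpha * grad(x_n)."""
        x = np.array(x0, dtype=float)
        for _ in range(n_steps):
            x = x - alpha * grad(x)
        return x

    # f(x) = 1/2 <Ax, x> - <b, x> with A symmetric positive definite; grad f(x) = Ax - b
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([1.0, 1.0])
    gamma = np.linalg.eigvalsh(A).max()        # largest eigenvalue of D^2 f = A
    x_star = gradient_descent(lambda x: A @ x - b, x0=[5.0, -5.0], alpha=1.0 / (2 * gamma))
    print(x_star, np.linalg.solve(A, b))       # compare with the exact minimizer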

8. Further exercises
Exercise 3.69. Let U ⊆ Rn be open and convex and f : U → R differentiable such
that ∂1 f (x) = 0 for all x ∈ U .
(i) Show that the value of f (x) for x = (x1 , . . . , xn ) ∈ U does not depend on x1 .
(ii) Does (i) still hold if we assume that U is connected instead of convex? Give a proof
or counterexample.
Exercise 3.70. A function f : Rn \ {0} → R is called homogeneous of degree α ∈ R
if f (λx) = λα f (x) for all λ > 0 and x ∈ Rn \ {0}. Suppose that f is differentiable in
Rn \ {0}. Then show that f is homogeneous of degree α if and only if
Xn
xi ∂i f (x) = αf (x)
i=1

for all x ∈ R \ {0}. Hint: Consider the function g(λ) = f (λx) − λα f (x).
n

Exercise 3.71. Define F : R2 → R2 by


F (x, y) = (x4 − y 4 , exy − e−xy ).
(i) Compute the Jacobian of F .
(ii) Let p0 ∈ R2 and p0 6= (0, 0). Show that there exist open neighborhoods U, V ⊂ R2
of p0 and F (p0 ), respectively and a function G : V → U such that G(F (p)) = p for all
p ∈ U and F (G(p)) = p for all p ∈ V .
(iii) Compute DG F (p0 ) .
(iv) Is F a bijective map?

Exercise 3.72. Let a ∈ R, a 6= 0 and E = {(x, y, z) ∈ R3 : a + x + y + z 6= 0} and


f : E → R3 defined by
 
    f(x, y, z) = ( x/(a + x + y + z), y/(a + x + y + z), z/(a + x + y + z) ).
(i) Compute the Jacobian determinant of f (that is, the determinant of the Jacobian
matrix).
(ii) Show that f is one-to-one and compute its inverse f −1 .
Exercise 3.73. Prove that there exists δ > 0 such that for all square matrices
A ∈ Rn×n with kA−Ik < δ (where I denotes the identity matrix) there exists B ∈ Rn×n
such that B 2 = A.
Exercise 3.74. Look at each of the following as an equation to be solved for x ∈ R
in terms of parameter y, z ∈ R. Notice that (x, y, z) = (0, 0, 0) is a solution for each of
these equations. For each one, prove that it can be solved for x as a C 1 -function of y, z
in a neighborhood of (0, 0, 0).
(a) cos(x)^2 − e^{sin(xy)^3} + x = z^2
(b) (x^2 + y^3 + z^4)^2 = sin(x − y + z)
(c) x^7 + y e^z x^3 − x^2 + x = log(1 + y^2 + z^2)

Exercise 3.75. Let (t0 , y0 ) ∈ R2 , c ∈ R and define Y0 (t) = y0 ,


    Yn(t) = y0 + c ∫_{t0}^t s Yn−1(s) ds.

(i) Which initial value problem does Y solve?


(ii) For the case t0 = 0 compute Yn (t) and Y (t) = limn→∞ Yn (t).
Exercise 3.76. Let F, (x0, y0), M, C, a∗ be as in the Picard–Lindelöf theorem (Theorem 3.24). The method of successive approximation produces a sequence Yn defined on [x0 − a∗, x0 + a∗], satisfying Y0(x) = y0 and Yn(x) = y0 + ∫_{x0}^x F(t, Yn−1(t)) dt for n ≥ 1. The Banach fixed point argument in the proof of the theorem gives that Yn converges uniformly to the solution of the initial value problem. The convergence rate for large n is better than predicted by the Banach fixed point argument. To see this prove by induction that
    max_{|x−x0|≤a∗} |Yn(x) − Yn−1(x)| ≤ M C^{n−1} a∗^n / n!
for n ≥ 1.
Exercise 3.77. Consider the initial value problem
    y'(t) = e^{y(t)^2} − 1/(t y(t)),    y(1) = 1.
Find an interval I = (1 − h, 1 + h) such that this problem has a unique solution y in I.
Give an explicit estimate for h (it does not need to be best possible).
Exercise 3.78. Consider the initial value problem
    y'(t) = t + sin(y(t)),    y(2) = 1.

Find the largest interval I ⊆ R containing t0 = 2 such that the problem has a unique solution y in I.
Exercise 3.79. Let F be a smooth function on R2 (i.e. partial derivatives of all
orders exist everywhere and are continuous) and suppose that the initial value problem
y 0 = F (t, y), y(t0 ) = y0 has a unique solution y on the interval I = [t0 , t0 + a] with
y smooth on I. Let h > 0 be sufficiently small and define tk = t0 + kh for integers
0 ≤ k ≤ a/h.
Define a function yh recursively by setting yh (t0 ) = y0 and
yh (t) = yh (tk ) + (t − tk )F (tk , yh (tk ))
for t ∈ (tk , tk+1 ] for integers 0 ≤ k ≤ a/h.
(i) From the proof of Peano’s theorem (Theorem 3.29) it follows that yh → y uniformly
on I as h → 0. Prove the following stronger statement: there exists a constant C > 0
such that for all t ∈ I and h > 0 sufficiently small,
|y(t) − yh (t)| ≤ Ch.
Hint: The left hand side is zero if t = t0 . Use Taylor expansion to study how the error
changes as t increases from tk to tk+1 .
(ii) Let F (t, y) = λy with λ ∈ R a parameter. Explicitly determine y, yh and a value
for C in (i).

Exercise 3.80. Let us improve the approximation from Exercise 3.79. In the
context of that exercise, define a piecewise linear function yh∗ recursively by setting
yh∗ (t0 ) = y0 and
yh∗ (t) = yh∗ (tk ) + (t − tk )G(tk , yh∗ (tk ), h),
for t ∈ (tk , tk+1 ] for integers 0 ≤ k ≤ a/h, where
    G(t, y, h) = (1/2)(F(t, y) + F(t + h, y + hF(t, y))).
Prove that there exists a constant C > 0 such that for all t ∈ I and h > 0 sufficiently
small,
|y(t) − yh∗ (t)| ≤ Ch2 .
Exercise 3.81. For a function f : [a, b] → R define
    I(f) = ∫_a^b (1 + f′(t)^2)^{1/2} dt.
Let A = {f ∈ C^2([a, b]) : f(a) = c, f(b) = d}. Determine f∗ ∈ A such that
    I(f∗) = inf_{f∈A} I(f).

What is the geometric meaning of I(f ) and inf f ∈A I(f )?


Exercise 3.82. Let f, g : Rn → R be smooth functions (that is, all partial deriva-
tives exist to arbitrary orders and are continuous). Show that for all multiindices
α ∈ N0^n,
    ∂^α(f · g)(x) = Σ_{β∈N0^n : β≤α} (α choose β) ∂^β f(x) ∂^{α−β} g(x)
for all x ∈ Rn, where (α choose β) = α!/(β!(α − β)!) = α1! · · · αn! / (β1! · · · βn! (α1 − β1)! · · · (αn − βn)!).

Exercise 3.83. Let f : R2 → R be such that ∂1 ∂2 f exists everywhere. Does it


follow that ∂1 f exists? Give a proof or counterexample.
Exercise 3.84. Determine the Taylor expansion of the function
    f : (0, ∞) × (0, ∞) → R,    f(x, y) = (x − y)/(x + y)
at the point (x, y) = (1, 1) up to order 2.
Exercise 3.85. Show that every continuous function f : [a, b] → [a, b] has a fixed
point.
Exercise 3.86. Let X be a real Banach space. Let B = {x ∈ X : kxk ≤ 1} and
∂B = {x ∈ X : kxk = 1}. Show that the following are equivalent:
(i) every continuous map f : B → B has a fixed point
(ii) there exists no continuous map r : B → ∂B such that r(b) = b for all b ∈ ∂B.
Exercise 3.87. Determine the local minima and maxima of the function
    f : R^2 → R,    f(x, y) = (4x^2 + y^2) e^{−x^2−4y^2}.
Exercise 3.88. Let E ⊂ Rn be open, f : E → R and x ∈ E. Assume that for y in
a neighborhood of 0 we have
    f(x + y) = Σ_{|α|≤k} cα y^α + o(‖y‖^k)
as y → 0 and
    f(x + y) = Σ_{|α|≤k} c̃α y^α + o(‖y‖^k)
as y → 0. Show that cα = c̃α for all |α| ≤ k.
Exercise 3.89. Let D = {(x, y) ∈ R2 : x2 + y 2 ≤ 1}. Determine the maximum
and minimum values of the function f : D → R, f (x, y) = 4x2 − 3xy.
Exercise 3.90. Let f ∈ C 2 (Rn ) and suppose that the Hessian of f is positive
definite at every point. Show that ∇f : Rn → Rn is an injective map.
Exercise 3.91. For f (x, y, z) = x + y + z determine all maxima and minima under
the constraints x2 + y 2 = 2 and x + z − 1 = 0. Use the method of Lagrange multipliers.
Exercise 3.92. Let f ∈ C 2 (Rn ) be strongly convex. Show that ∇f : Rn → Rn is
a diffeomorphism (that is, show that it is differentiable, bijective and that its inverse is
differentiable).
Exercise 3.93. Let f(x) = (1/2)⟨Ax, x⟩ − ⟨b, x⟩ + c with A ∈ R^{n×n} and b ∈ Rn, c ∈ R.
Assume that A is symmetric and positive definite. Show that f has a unique global
minimum at some point x∗ and determine f (x∗ ) in terms of A, b, c.
Exercise 3.94. Prove that the point x∗ from Exercise 3.93 can be computed using
gradient descent: that is, if x0 ∈ Rn arbitrary and
xn+1 = xn − α∇f (xn )
for n = 0, 1, 2, . . . , then the sequence (xn )n∈N converges to x∗ for all starting points
x0 ∈ Rn , provided that α is chosen sufficiently small.

Exercise 3.95. Let D ⊂ R2 be a finite set. Define a function E : R3 → R by


    E(a, b, c) = Σ_{x∈D} (a x1^2 + b x1 + c − x2)^2.

(1) Show that E is convex.


(2) Does there exist a set D such that E is strongly convex? Proof or counterex-
ample.

Exercise 3.96. (a) Find a convex function that is not bounded from below.
(b) Find a strictly convex function that is not bounded from below.
(c) If a function is strictly convex and bounded from below, does it necessarily have a
critical point? (Proof or counterexample.)
Exercise 3.97. (a) Give an example of a convex function that is not continuous.
(b) Let f : (a, b) → R. Show that if f is convex, then f is continuous.
Exercise 3.98. Construct a strictly convex function f : R → R such that f is not
differentiable at x for every x ∈ Q.
Exercise 3.99. Let f ∈ C 2 (Rn ). Recall that we defined f to be strongly convex if
there exists β > 0 such that hD2 f |x y, yi ≥ βkyk2 for every x, y ∈ Rn . Show that f is
strongly convex if and only if there exists γ > 0 such that
f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y) − γt(1 − t)kx − yk2
for all x, y ∈ Rn , t ∈ [0, 1].
(Consequently, that condition can serve as an alternative definition of strong convexity,
which is also valid if f is not C 2 .)
Exercise 3.100. (See also Exercise 4.82 as motivation for this exercise.) Fix a
function σ ∈ C 1 (R) and define for x ∈ Rn , W ∈ Rm×n , v ∈ Rm ,
    µ(x, W, v) = Σ_{i=1}^m σ((Wx)i) vi.
Given a finite set of points D = {(x1, y1), . . . , (xN, yN)} ⊂ Rn × R define
    E(W, v) = Σ_{i=1}^N (µ(xi, W, v) − yi)^2.

Is E necessarily convex? (Proof or counterexample.)


CHAPTER 4

Approximation of functions

In this section we want to study different ways to approximate continuous functions.


Let X be a normed vector space of functions (say, continuous functions on [0, 1])
and A ⊂ X some subspace of it (say, polynomials). Let f ∈ X be arbitrary. Our
goal is to ‘approximate‘ the function f by functions g in A. We measure the quality of
approximation by the error in norm, i.e. kf − gk.
The most basic question in this context is:
Can we make kf − gk arbitrarily small?
More precisely, we are asking if A is dense in X. Recall that A ⊂ X is called dense
if A = X. That is, if for every f ∈ X and every ε > 0 there exists a g ∈ A such that
kf − gk ≤ ε.

1. Polynomial approximation
Theorem 4.1 (Weierstrass). For every continuous function f on [a, b] there exists
a sequence of polynomials that converges uniformly to f .
In other words, the theorem says that the set A = {p : p polynomial} is dense in
C([a, b]).

There are many proofs of this theorem in the literature. We present a proof using
Bernstein polynomials . Without loss of generality we consider only the interval [a, b] =
[0, 1] (why are we allowed to do that?).
Let f be continuous on [0, 1]. Define for n = 1, 2, . . . :
    Bn f(t) = Σ_{k=0}^n f(k/n) (n choose k) t^k (1 − t)^{n−k}.

Bn f is a polynomial of degree n. We will show that Bn f → f uniformly on [0, 1]. By


the binomial theorem,
    1 = (t + 1 − t)^n = Σ_{k=0}^n (n choose k) t^k (1 − t)^{n−k}.

Thus,
(4.1)    Bn f(t) − f(t) = Σ_{k=0}^n (f(k/n) − f(t)) (n choose k) t^k (1 − t)^{n−k}.

Let ε > 0. By uniform continuity of f we choose δ > 0 be such that |f (t) − f (s)| ≤ ε/2
for all t, s ∈ [0, 1] with |t − s| ≤ δ. Now we write the sum on the right hand side of

(4.1) as I + II, where


    I = Σ_{0≤k≤n, |k/n−t|<δ} (f(k/n) − f(t)) (n choose k) t^k (1 − t)^{n−k},
    II = Σ_{0≤k≤n, |k/n−t|≥δ} (f(k/n) − f(t)) (n choose k) t^k (1 − t)^{n−k}.
We estimate I and II separately. For I we have from uniform continuity that
    |I| ≤ (ε/2) Σ_{k=0}^n (n choose k) t^k (1 − t)^{n−k} = ε/2.
To estimate II we first compute the Bernstein polynomials for the monomials 1, t, t2 .
Lemma 4.2. Let gm (t) = tm . Then
    Bn g0(t) = 1,    Bn g1(t) = t,    Bn g2(t) = t^2 + (t − t^2)/n    for n ≥ 2.
Proof. We have
    Bn g0(t) = Σ_{k=0}^n (n choose k) t^k (1 − t)^{n−k} = (t + (1 − t))^n = 1
by the binomial theorem. Next,
    Bn g1(t) = Σ_{k=0}^n (k/n) (n choose k) t^k (1 − t)^{n−k} = Σ_{k=1}^n (n−1 choose k−1) t^k (1 − t)^{n−k}
        = t Σ_{k=0}^{n−1} (n−1 choose k) t^k (1 − t)^{(n−1)−k} = t (t + (1 − t))^{n−1} = t.
To compute Bn g2 we use that
    (k^2/n^2) (n choose k) = (k/n) (n−1 choose k−1) = ( ((n−1)/n)·((k−1)/(n−1)) + 1/n ) (n−1 choose k−1)
        = ((n−1)/n) (n−2 choose k−2) + (1/n) (n−1 choose k−1).
Thus,
    Bn g2(t) = ((n−1)/n) Σ_{k=2}^n (n−2 choose k−2) t^k (1 − t)^{n−k} + (1/n) Σ_{k=1}^n (n−1 choose k−1) t^k (1 − t)^{n−k}
        = ((n−1)/n) t^2 + (1/n) t = t^2 + (t − t^2)/n. □

As a consequence, we obtain the following:

Lemma 4.3. For all t ∈ [0, 1],
    Σ_{k=0}^n (k/n − t)^2 (n choose k) t^k (1 − t)^{n−k} ≤ 1/n.
Proof. From the previous lemma,
    Σ_{k=0}^n (k/n − t)^2 (n choose k) t^k (1 − t)^{n−k} = Bn g2(t) − 2t Bn g1(t) + t^2 Bn g0(t)
        = t^2 + (t − t^2)/n − 2t^2 + t^2 = (t − t^2)/n.
Since t ∈ [0, 1] we have 0 ≤ t − t^2 = t(1 − t) ≤ 1. □
Now we are ready to estimate II. First note that f is bounded, so there exists C ≥ 0 such that |f(x)| ≤ C for all x ∈ [0, 1]. Choose N ∈ N such that 2Cδ^{−2}N^{−1} ≤ ε/2. Then for all n ≥ N,
    |II| ≤ 2C Σ_{0≤k≤n, |k/n−t|≥δ} (n choose k) t^k (1 − t)^{n−k} ≤ 2Cδ^{−2} Σ_{k=0}^n (k/n − t)^2 (n choose k) t^k (1 − t)^{n−k}
        ≤ 2Cδ^{−2} N^{−1} ≤ ε/2.
In the second inequality we have used that 1 ≤ δ^{−2}|k/n − t|^2 for the terms with |k/n − t| ≥ δ, and in the third we have used Lemma 4.3 together with n ≥ N. Thus if n ≥ N and t ∈ [0, 1], then
    |Bn f(t) − f(t)| ≤ |I| + |II| ≤ ε/2 + ε/2 = ε.
This concludes the proof of Weierstrass’ theorem.
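The Bernstein polynomials are easy to evaluate numerically, which gives a concrete impression of the (rather slow) uniform convergence. A minimal sketch, assuming numpy and Python's math.comb for the binomial coefficients; the test function and grid are arbitrary choices.

    import math
    import numpy as np

    def bernstein(f, n, t):
        """Evaluate B_n f(t) = sum_k f(k/n) C(n,k) t^k (1-t)^(n-k) for t in [0, 1]."""
        t = np.asarray(t, dtype=float)
        return sum(f(k / n) * math.comb(n, k) * t**k * (1 - t)**(n - k)
                   for k in range(n + 1))

    f = lambda t: np.abs(t - 0.5)              # a continuous (non-smooth) test function
    t = np.linspace(0.0, 1.0, 1001)
    for n in (10, 100, 1000):
        # uniform error sup_t |B_n f(t) - f(t)|, approximated on the grid
        print(n, np.max(np.abs(bernstein(f, n, t) - f(t))))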

2. Orthonormal systems
In the previous section we studied approximation of continuous functions in the
supremum norm, kf k∞ = supx∈[a,b] |f (x)|. In this section we turn our attention to
another important norm, the L2 norm.
Definition 4.4. For two piecewise continuous functions f, g on an interval [a, b] we
define their inner product by
(4.2)    ⟨f, g⟩ = ∫_a^b f(x) \overline{g(x)} dx.

If hf, gi = 0 we say that f and g are orthogonal. We define the L2 -norm of f by


    ‖f‖2 = ( ∫_a^b |f(x)|^2 dx )^{1/2}.
If ‖f‖2 = 1 then we say that f is L^2-normalized.
Note: Some comments are in order regarding the term ’piecewise continuous’. For
our purposes we call a function f , defined on an interval [a, b], piecewise continuous if
limx→x0 f (x) exists at every point x0 and is different from f (x0 ) at at most finitely many
points. We denote this class of functions by pc([a, b]). Piecewise continuous functions
are Riemann integrable.
The inner product has the following properties (for functions f, g, h and λ ∈ C):

• Sesquilinearity: ⟨f + λg, h⟩ = ⟨f, h⟩ + λ⟨g, h⟩ and ⟨h, f + λg⟩ = ⟨h, f⟩ + \overline{λ}⟨h, g⟩.
• Antisymmetry: ⟨f, g⟩ = \overline{⟨g, f⟩}.
• Positivity: hf, f i ≥ 0 (and > 0 unless f is zero except at possibly finitely many
points)
Theorem 4.5 (Cauchy-Schwarz inequality). For two piecewise continuous functions
f, g we have
|hf, gi| ≤ kf k2 kgk2 .
Proof. For nonnegative real numbers x and y we have the elementary inequality
    xy ≤ \frac{x^2}{2} + \frac{y^2}{2}.
Thus we have
    |⟨f, g⟩| ≤ \int_a^b |f(x)g(x)|\,dx ≤ \tfrac{1}{2} \int_a^b |f(x)|^2\,dx + \tfrac{1}{2} \int_a^b |g(x)|^2\,dx = \tfrac{1}{2}⟨f, f⟩ + \tfrac{1}{2}⟨g, g⟩.
Now we note that for every λ > 0, replacing f by λf and g by λ^{-1}g does not change the left hand side of this inequality. Thus we have for every λ > 0 that
(4.3)    |⟨f, g⟩| ≤ \frac{λ^2}{2}⟨f, f⟩ + \frac{1}{2λ^2}⟨g, g⟩.
Now we choose λ so that this inequality is as strong as possible: λ^2 = \sqrt{⟨g, g⟩/⟨f, f⟩} (we may assume that ⟨f, f⟩ ≠ 0 because otherwise there is nothing to show). Then
    |⟨f, g⟩| ≤ \sqrt{⟨f, f⟩}\,\sqrt{⟨g, g⟩}.
Note that one can arrive at this definition of λ in a systematic way: treat the right
hand side of (4.3) as a function of λ and minimize it using calculus. 
Corollary 4.6 (Minkowski’s inequality). For two functions f, g ∈ pc([a, b]),
kf + gk2 ≤ kf k2 + kgk2 .
Proof. We may assume ‖f + g‖_2 ≠ 0 because otherwise there is nothing to prove. Then, using the Cauchy-Schwarz inequality in the second step,
    ‖f + g‖_2^2 = \int_a^b |f + g|^2 ≤ \int_a^b |f + g||f| + \int_a^b |f + g||g| ≤ ‖f + g‖_2‖f‖_2 + ‖f + g‖_2‖g‖_2 = ‖f + g‖_2(‖f‖_2 + ‖g‖_2).
Dividing by ‖f + g‖_2 we obtain ‖f + g‖_2 ≤ ‖f‖_2 + ‖g‖_2. □
This is the triangle inequality for k · k2 . This makes d(f, g) = kf − gk2 a metric on
say, the set of continuous functions. Unfortunately, the resulting metric space is not
complete. (Its completion is a space called L2 ([a, b]), see Exercise 4.70.)
Definition 4.7. A sequence (φ_n)_n of piecewise continuous functions on [a, b] is called an orthonormal system on [a, b] if
    ⟨φ_n, φ_m⟩ = \int_a^b φ_n(x)\overline{φ_m(x)}\,dx = 0 if n ≠ m, and = 1 if n = m.
(The index n may run over the natural numbers, or the integers, a finite set of integers, or more generally any countable set. We will write \sum_n to denote a sum over all the indices. In proofs we will always adopt the interpretation that the index n runs over 1, 2, 3, . . . . This is no loss of generality.)

Notation: For a set A we denote by 1A the characteristic function of A . This is


the function such that 1A (x) = 1 when x ∈ A and 1A (x) = 0 when x 6∈ A.
Example 4.8 (Disjoint support). Let φ_n(x) = 1_{[n,n+1)} and N ∈ N. Then {φ_n}_{n=0}^{N-1} is an orthonormal system on [0, N].
Examples 4.9 (Trigonometric functions). The following are orthonormal systems on [0, 1]:
1. φ_n(x) = e^{2πinx}
2. φ_n(x) = \sqrt{2}\cos(2πnx)
3. φ_n(x) = \sqrt{2}\sin(2πnx)
Exercise 4.10 (Rademacher functions). For n = 0, 1, . . . and x ∈ [0, 1] we define r_n(x) = sgn(sin(2^n πx)). Show that (r_n)_n is an orthonormal system on [0, 1].
Let (φn )n be an orthonormal system and let f be a finite linear combination of the
functions (φn )n . Say,
(4.4)    f(x) = \sum_{n=1}^{N} c_n φ_n(x).
Then there is an easy way to compute the coefficients c_n:
    c_n = ⟨f, φ_n⟩ = \int_a^b f(x)\overline{φ_n(x)}\,dx.
To prove this we multiply (4.4) by \overline{φ_m(x)} and integrate over x:
    \int_a^b f(x)\overline{φ_m(x)}\,dx = \sum_{n=1}^{N} c_n \int_a^b φ_n(x)\overline{φ_m(x)}\,dx = \sum_{n=1}^{N} c_n ⟨φ_n, φ_m⟩ = c_m.

Notice that the formula cn = hf, φn i still makes sense if f is not of the form (4.4).
Theorem 4.11. Let (φ_n)_n be an orthonormal system on [a, b]. Let f be a piecewise continuous function. Consider
    s_N(x) = \sum_{n=1}^{N} ⟨f, φ_n⟩ φ_n(x).
Denote the linear span of the functions (φ_n)_{n=1,...,N} by X_N. Then
(4.5)    ‖f - s_N‖_2 ≤ ‖f - g‖_2
holds for all g ∈ X_N with equality if and only if g = s_N.
In other words, the theorem says that among all functions of the form \sum_{n=1}^{N} c_n φ_n(x), the function s_N defined by the coefficients c_n = ⟨f, φ_n⟩ is the best "L^2-approximation" to f in the sense that (4.5) holds.
This can be interpreted geometrically: the function s_N is the orthogonal projection of f onto the subspace X_N. As in Euclidean space, the orthogonal projection is characterized by being the point in X_N that is closest to f and it is uniquely determined by this property (see Figure 1).
Figure 1. sN is the orthogonal projection of f onto XN .

Theorem 4.12 (Bessel's inequality). If (φ_n)_n is an orthonormal system on [a, b] and f a piecewise continuous function on [a, b], then
(4.6)    \sum_n |⟨f, φ_n⟩|^2 ≤ ‖f‖_2^2.
Corollary 4.13 (Riemann-Lebesgue lemma). Let (φ_n)_{n=1,2,...} be an orthonormal system and f a piecewise continuous function. Then
    \lim_{n→∞} ⟨f, φ_n⟩ = 0.
This follows because the series \sum_{n=1}^{∞} |⟨f, φ_n⟩|^2 converges as a consequence of Bessel's inequality.
Definition 4.14. An orthonormal system (φ_n)_n is called complete if
    \sum_n |⟨f, φ_n⟩|^2 = ‖f‖_2^2
for all f.
Theorem 4.15. Let (φn )n be an orthonormal system on [a, b]. Let (sN )N be as
in Theorem 4.11. Then (φn )n is complete if and only if (sN )N converges to f in the
L2 -norm (that is, limN →∞ kf − sN k2 = 0) for every piecewise continuous f on [a, b].
We will later see that the orthonormal system φn (x) = e2πinx (n ∈ Z) on [0, 1] is
complete.
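Before turning to the proofs, here is a small numerical illustration (a sketch only, with an arbitrarily chosen sample function): for the trigonometric system φ_n(x) = e^{2πinx} one can approximate the coefficients ⟨f, φ_n⟩ by Riemann sums and observe Bessel's inequality, i.e. that the sum of |⟨f, φ_n⟩|^2 stays below ‖f‖_2^2.

    import cmath

    M = 2000                                      # Riemann-sum resolution on [0, 1]
    xs = [j / M for j in range(M)]
    f = lambda x: x * (1 - x)                     # sample function on [0, 1]

    def coeff(n):
        # <f, phi_n> ~ (1/M) * sum_j f(x_j) * exp(-2 pi i n x_j)
        return sum(f(x) * cmath.exp(-2j * cmath.pi * n * x) for x in xs) / M

    norm_sq = sum(abs(f(x)) ** 2 for x in xs) / M
    bessel_sum = sum(abs(coeff(n)) ** 2 for n in range(-20, 21))
    print(bessel_sum, "<=", norm_sq)              # Bessel's inequality, checked numerically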
Proof of Theorem 4.11. Let g ∈ X_N and write
    g(x) = \sum_{n=1}^{N} b_n φ_n(x).
Let us also write c_n = ⟨f, φ_n⟩. We have
    ⟨f, g⟩ = \sum_{n=1}^{N} \overline{b_n} ⟨f, φ_n⟩ = \sum_{n=1}^{N} c_n \overline{b_n}.
Using that (φ_n)_n is orthonormal we get
    ⟨g, g⟩ = \Big⟨ \sum_{n=1}^{N} b_n φ_n, \sum_{m=1}^{N} b_m φ_m \Big⟩ = \sum_{n=1}^{N} \sum_{m=1}^{N} b_n \overline{b_m} ⟨φ_n, φ_m⟩ = \sum_{n=1}^{N} |b_n|^2.
Thus,
    ⟨f - g, f - g⟩ = ⟨f, f⟩ - ⟨f, g⟩ - ⟨g, f⟩ + ⟨g, g⟩ = ⟨f, f⟩ - \sum_{n=1}^{N} c_n \overline{b_n} - \sum_{n=1}^{N} \overline{c_n} b_n + \sum_{n=1}^{N} |b_n|^2 = ⟨f, f⟩ - \sum_{n=1}^{N} |c_n|^2 + \sum_{n=1}^{N} |b_n - c_n|^2.
We have
(4.7)    ⟨f - s_N, f - s_N⟩ = ⟨f, f⟩ - ⟨f, s_N⟩ - ⟨s_N, f⟩ + ⟨s_N, s_N⟩ = ⟨f, f⟩ - 2\sum_{n=1}^{N} |c_n|^2 + \sum_{n=1}^{N} |c_n|^2 = ⟨f, f⟩ - \sum_{n=1}^{N} |c_n|^2.
Thus we have shown
    ⟨f - g, f - g⟩ = ⟨f - s_N, f - s_N⟩ + \sum_{n=1}^{N} |b_n - c_n|^2,
which implies the claim since \sum_{n=1}^{N} |b_n - c_n|^2 ≥ 0 with equality if and only if b_n = c_n for all n = 1, . . . , N. □
Proof of Theorem 4.12. From the calculation in (4.7),
    ⟨f, f⟩ - \sum_{n=1}^{N} |c_n|^2 = ⟨f - s_N, f - s_N⟩ ≥ 0,
so \sum_{n=1}^{N} |c_n|^2 ≤ ‖f‖_2^2 for all N. Letting N → ∞ this proves the claim (in particular, the series \sum_{n=1}^{∞} |c_n|^2 converges). □
Proof of Theorem 4.15. From (4.7),
    ‖f - s_N‖_2^2 = ⟨f, f⟩ - \sum_{n=1}^{N} |⟨f, φ_n⟩|^2.
This converges to 0 as N → ∞ for every piecewise continuous f if and only if (φ_n)_n is complete. □

3. The Haar system


In this section we discuss an important example of an orthonormal system on [0, 1].
Definition 4.16 (Dyadic intervals). For non-negative integers j, k with 0 ≤ j < 2^k we define
    I_{k,j} = [2^{-k} j, 2^{-k}(j+1)) ⊂ [0, 1].
The interval I_{k,j} is called a dyadic interval and k is called its generation. We denote by D_k the set of all dyadic intervals of generation k and by D = \bigcup_{k≥0} D_k the set of all dyadic intervals on [0, 1].
Definition 4.17. Each dyadic interval I ∈ D with |I| = 2^{-k} can be split in the middle into its left child and right child, which are again dyadic intervals that we denote by I_ℓ and I_r, respectively.
Example 4.18. The interval I = [\tfrac{1}{2}, \tfrac{1}{2} + \tfrac{1}{4}) is a dyadic interval and its left and right children are given by I_ℓ = [\tfrac{1}{2}, \tfrac{1}{2} + \tfrac{1}{8}) and I_r = [\tfrac{1}{2} + \tfrac{1}{8}, \tfrac{1}{2} + \tfrac{1}{4}).

Figure 2. Dyadic intervals.

Lemma 4.19. (1) Two dyadic intervals are either disjoint or contained in each other. That is, for every I, J ∈ D at least one of the following is true: I ∩ J = ∅ or I ⊂ J or J ⊂ I.
(2) For every k ≥ 0 the dyadic intervals of generation k form a partition of [0, 1). That is,
    [0, 1) = \bigcup_{I ∈ D_k} I.

Exercise 4.20. Prove this lemma.


Exercise 4.21. Let J ⊂ [0, 1] be any interval. Show that there exists I ∈ D such
that |I| ≤ |J| and 3I ⊃ J. (Here 3I denotes the interval with three times the length of
I and the same center as I.)
Definition 4.22. For each I ∈ D we define the Haar function associated with it by
(4.8)    ψ_I = |I|^{-1/2} (1_{I_ℓ} - 1_{I_r}).
The countable set of functions given by
    H = \{1_{[0,1]}\} ∪ \{ψ_I : I ∈ D\}
is called the Haar system on [0, 1].
Example 4.23. The Haar function associated with the dyadic interval I = [0, \tfrac{1}{2}) is given by
    ψ_{[0,1/2)} = \sqrt{2}\,(1_{[0,1/4)} - 1_{[1/4,1/2)}).

Figure 3. A Haar function ψI .

Lemma 4.24. The Haar system on [0, 1] is an orthonormal system.


Proof. Let f ∈ H. If f = 1_{[0,1]} then ‖f‖_2 = (\int_0^1 1^2)^{1/2} = 1. Otherwise, f = ψ_I for some I ∈ D. Then by (4.8) and since I_ℓ and I_r are disjoint,
    ‖f‖_2^2 = \int_0^1 |ψ_I|^2 = |I|^{-1} \int_0^1 (1_{I_ℓ} + 1_{I_r}) = 1.
Next let f, g ∈ H with f ≠ g. Suppose that one of f, g equals 1_{[0,1]}, say f = 1_{[0,1]}. Then g = ψ_J for some J ∈ D and thus
    ⟨f, g⟩ = \int_0^1 ψ_J = 0.
It remains to treat the case that f = ψ_I and g = ψ_J for I, J ∈ D with I ≠ J. By Lemma 4.19 (1), I and J are either disjoint or contained in each other. If I and J are disjoint, then ⟨ψ_I, ψ_J⟩ = 0. Otherwise they are contained in each other, say I ⊊ J. Then ψ_J is constant on the set where ψ_I is different from zero, equal to ±|J|^{-1/2} there. Thus,
    ⟨ψ_I, ψ_J⟩ = \int ψ_I · ψ_J = ±|J|^{-1/2} |I|^{-1/2} \int_0^1 (1_{I_ℓ} - 1_{I_r}) = 0.
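As a quick numerical cross-check of this lemma (an illustration only), one can sample the Haar functions of the first few generations on a fine dyadic grid and verify that their pairwise inner products are approximately 0 and their norms approximately 1.

    # Haar functions psi_I for dyadic I = [j*2^-k, (j+1)*2^-k), sampled on a grid.
    M = 4096                                   # grid resolution (a power of 2)
    xs = [(i + 0.5) / M for i in range(M)]     # midpoint sample points in [0, 1)

    def haar(k, j):
        a, mid, b = j * 2.0 ** -k, (j + 0.5) * 2.0 ** -k, (j + 1) * 2.0 ** -k
        h = 2.0 ** (k / 2)                     # the normalization |I|^{-1/2}
        return [h if a <= x < mid else (-h if mid <= x < b else 0.0) for x in xs]

    funcs = [haar(k, j) for k in range(4) for j in range(2 ** k)]
    inner = lambda u, v: sum(a * b for a, b in zip(u, v)) / M
    for p in range(len(funcs)):
        for q in range(p, len(funcs)):
            val = inner(funcs[p], funcs[q])
            assert abs(val - (1.0 if p == q else 0.0)) < 1e-9
    print("orthonormality verified for generations 0..3")

Because the grid is dyadic, the Riemann sums here are exact up to rounding, which is why such a small tolerance suffices.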

Let us write
    D_{<n} = \bigcup_{0 ≤ k < n} D_k
to denote the set of dyadic intervals of generation less than n. We want to study how continuous functions can be approximated by linear combinations of Haar functions.
Let f ∈ C([0, 1]). Motivated by Theorem 4.11, we define for every positive integer n the orthogonal projection
    E_n f = \sum_{I ∈ D_{<n}} ⟨f, ψ_I⟩ ψ_I.
Definition 4.25. For a function f on [0, 1] and an interval I ⊂ [0, 1] we write ⟨f⟩_I = |I|^{-1} \int_I f to denote the average or the mean of f on I.
Theorem 4.26. Let \int_0^1 f = 0. Then, for every I ∈ D_n,
    E_n f(x) = ⟨f⟩_I   if x ∈ I.
In other words,
    E_n f = \sum_{I ∈ D_n} ⟨f⟩_I\, 1_I.
Theorem 4.27. Suppose that \int_0^1 f = 0 and f ∈ C([0, 1]). Then
    E_n f → f uniformly on [0, 1] as n → ∞.
Remark. If f ∈ C([0, 1]) does not have mean zero then En f converges to f −hf i[0,1] .
Corollary 4.28. The Haar system is complete in the sense of Definition 4.14.
For every f ∈ C([0, 1]) we have
X
kf k22 = |hf i[0,1] |2 + |hf, ψI i|2 .
I∈D

Exercise 4.29. By using Theorem 4.27, prove Corollary 4.28.


Proof of Theorem 4.26. Fix n ≥ 0 and write g = En f . We prove something
seemingly stronger.
Claim. For every dyadic interval I of generation at most n, we have ⟨f⟩_I = ⟨g⟩_I.
This implies the statement in the theorem because E_n f is constant on dyadic intervals of generation n.
To prove the claim we perform an induction over the generations. To begin with, the claim holds for I = [0, 1) because \int_0^1 f = 0. Now suppose that it is true for some interval I ∈ D_{<n}. It suffices to show that it also holds for I_ℓ and I_r, i.e. that
    ⟨f⟩_{I_ℓ} = ⟨g⟩_{I_ℓ}   and   ⟨f⟩_{I_r} = ⟨g⟩_{I_r}.
Since the Haar system is orthonormal and I ∈ D_{<n},
    ⟨g, ψ_I⟩ = \sum_{J ∈ D_{<n}} ⟨f, ψ_J⟩⟨ψ_J, ψ_I⟩ = ⟨f, ψ_I⟩.
Compute
    \int_{I_ℓ} f - \int_{I_r} f = |I|^{1/2} \int f · ψ_I = |I|^{1/2} ⟨f, ψ_I⟩
and by the same reasoning,
    \int_{I_ℓ} g - \int_{I_r} g = |I|^{1/2} ⟨g, ψ_I⟩.
Combining the last three displays we get
    \int_{I_ℓ} f - \int_{I_r} f = \int_{I_ℓ} g - \int_{I_r} g.
By the inductive hypothesis we know that ⟨f⟩_I = ⟨g⟩_I, so
    \int_{I_ℓ} f + \int_{I_r} f = \int_{I_ℓ} g + \int_{I_r} g.

Adding the previous two displays gives hf iI` = hgiI` and subtracting them gives hf iIr =
hgiIr . This concludes the proof. 
Proof of Theorem 4.27. Let ε > 0. By uniform continuity of f on [0, 1] (which
follows from Theorem 1.53) we may choose δ > 0 such that |f (t) − f (s)| < ε whenever
t, s ∈ [0, 1] are such that |t - s| < δ. Let N ∈ N be large enough so that 2^{-N} < δ and n ≥ N. Let t ∈ [0, 1] and I ∈ D_n such that t ∈ I. Then by Theorem 4.26,
    |E_n f(t) - f(t)| = |⟨f⟩_I - f(t)| ≤ |I|^{-1} \int_I |f(s) - f(t)|\,ds < ε.
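Theorem 4.26 also suggests a simple way to compute E_n f in practice: on each dyadic interval of generation n one just takes the average of f. The following Python sketch (illustrative only; the sample function and parameters are arbitrary) approximates these averages by Riemann sums and prints the uniform error, which shrinks as n grows.

    import math

    f = lambda x: math.sin(2 * math.pi * x)        # continuous with mean zero on [0, 1]
    samples_per_cell = 64

    def E(n, x):
        # E_n f(x) = average of f over the generation-n dyadic interval containing x
        j = min(int(x * 2 ** n), 2 ** n - 1)
        a, h = j * 2.0 ** -n, 2.0 ** -n
        pts = [a + (i + 0.5) * h / samples_per_cell for i in range(samples_per_cell)]
        return sum(f(p) for p in pts) / samples_per_cell

    grid = [i / 1000 for i in range(1000)]
    for n in (2, 4, 6, 8):
        err = max(abs(E(n, x) - f(x)) for x in grid)
        print(n, err)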

Remark. This result goes back to A. Haar’s 1910 article Zur Theorie der orthogo-
nalen Funktionensysteme in Math. Ann. 69 (1910), no. 3, p. 331–371. The functions
(En f )n are also called dyadic martingale averages of f and have wide applications in
modern analysis and probability theory.
Exercise 4.30. Recall the functions rn (x) = sgn(sin(2n πx)) from Exercise 4.10.
(i) Show that every rn for n ≥ 1 can be written as a finite linear combination of Haar
functions and determine the coefficients of this linear combination.
(ii) Show that the orthonormal system on [0, 1] given by (rn )n is not complete.
Exercise 4.31. Define
    Δ_n f = E_{n+1} f - E_n f,   Sf = \Big( \sum_{n ≥ 1} |Δ_n f|^2 \Big)^{1/2}.
(i) Assume that \int_0^1 f = 0. Prove that ‖Sf‖_2 = ‖f‖_2.
(ii) Show that for every m ∈ N there exists a function f_m that is a finite linear combination of Haar functions such that \sup_{x∈[0,1]} |f_m(x)| ≤ 1 and \sup_{x∈[0,1]} |Sf_m(x)| ≥ m.
4. Trigonometric polynomials
In the following we will only be concerned with the trigonometric system on [0, 1]:
φn (x) = e2πinx (n ∈ Z)
Definition 4.32. A trigonometric polynomial is a function of the form
(4.9)    f(x) = \sum_{n=-N}^{N} c_n e^{2πinx}   (x ∈ R),
where N ∈ N and c_n ∈ C. If c_N or c_{-N} is non-zero, then N is called the degree of f.
From Euler's identity (see Fact A.12) we see that every trigonometric polynomial can also be written in the alternate form
(4.10)    f(x) = a_0 + \sum_{n=1}^{N} \big(a_n \cos(2πnx) + b_n \sin(2πnx)\big).

Exercise 4.33. Work out how the coefficients an , bn in (4.10) are related to the cn
in (4.9).

Every trigonometric polynomial is 1-periodic :


f (x) = f (x + 1)
for all x ∈ R.
Lemma 4.34. (e^{2πinx})_{n∈Z} forms an orthonormal system on [0, 1]. In particular,
(i) for all n ∈ Z,
    \int_0^1 e^{2πinx}\,dx = 0 if n ≠ 0, and = 1 if n = 0;
(ii) if f(x) = \sum_{n=-N}^{N} c_n e^{2πinx} is a trigonometric polynomial, then
    c_n = \int_0^1 f(t) e^{-2πint}\,dt.

One goal in this section is to show that this orthonormal system is in fact complete.
We denote by pc the space of piecewise continuous, 1-periodic functions f : R → C (let us call a 1-periodic function piecewise continuous if its restriction to [0, 1] is piecewise continuous in the sense defined at the beginning of Section 2).
Definition 4.35. For a 1-periodic function f ∈ pc and n ∈ Z we define the nth Fourier coefficient by
    \hat f(n) = \int_0^1 f(t) e^{-2πint}\,dt.
The series
    \sum_{n=-∞}^{∞} \hat f(n) e^{2πinx}
is called the Fourier series of f.
The question of when the Fourier series of a function f converges and in what sense it represents the function f is a very subtle issue and we will only scratch the surface in this lecture.
Definition 4.36. For a 1-periodic function f ∈ pc we define the partial sums
    S_N f(x) = \sum_{n=-N}^{N} \hat f(n) e^{2πinx}.

Remark. Note that since (φn )n is an orthonormal system, SN f is exactly the


orthogonal projection of f onto the space of trigonometric polynomials of degree ≤ N .
In particular, Theorem 4.11 tells us that
kf − SN f k2 ≤ kf − gk2
holds for all trigonometric polynomials g of degree ≤ N . That is, SN f is the best
approximation to f in the L2 -norm among all trigonometric polynomials of degree
≤ N.
Definition 4.37 (Convolution). For two 1-periodic functions f, g ∈ pc we define their convolution by
    f * g(x) = \int_0^1 f(t) g(x - t)\,dt.

Note that if f, g ∈ pc then f ∗ g ∈ pc.


Example 4.38. Suppose f is a given 1-periodic function and g is a 1-periodic function, non-negative and with \int_0^1 g = 1. Then (f * g)(x) can be viewed as a weighted average of f around x with weight profile g. For instance, if g = \frac{N}{2}\, 1_{[-1/N,1/N]} (extended 1-periodically), then (f * g)(x) is the average value of f in the interval [x - 1/N, x + 1/N].
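For concreteness, a convolution of 1-periodic functions can be approximated by a Riemann sum in the t-variable. The sketch below (with arbitrary sample choices) computes f * g for the box weight g of Example 4.38 and exhibits the local-averaging effect.

    import math

    M = 2000                                     # Riemann-sum points for the t-integral
    N = 10
    f = lambda x: math.cos(2 * math.pi * x)      # sample 1-periodic function
    g = lambda t: N / 2 if (t % 1.0) < 1 / N or (t % 1.0) > 1 - 1 / N else 0.0
    # g is the 1-periodic box weight (N/2) * 1_{[-1/N, 1/N]} with integral 1

    def conv(x):
        # (f * g)(x) = int_0^1 f(t) g(x - t) dt, approximated by a Riemann sum
        return sum(f(j / M) * g(x - j / M) for j in range(M)) / M

    for x in (0.0, 0.25, 0.5):
        print(x, f(x), conv(x))                  # conv(x) is a local average of f near x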
Lemma 4.39. For 1-periodic functions f, g ∈ pc,
f ∗ g = g ∗ f.
Proof. For x ∈ [0, 1],
    f * g(x) = \int_0^1 f(t) g(x-t)\,dt = \int_{x-1}^{x} f(x-t) g(t)\,dt
    = \int_{x-1}^{0} f(x-t) g(t)\,dt + \int_0^{x} f(x-t) g(t)\,dt
    = \int_{x}^{1} f(x-(t-1)) g(t-1)\,dt + \int_0^{x} f(x-t) g(t)\,dt
    = g * f(x),
where in the last step we used that f (x − (t − 1)) = f (x − t) and g(t − 1) = g(t) by
periodicity. 
It turns out that the partial sum S_N f can be written in terms of a convolution:
    S_N f(x) = \sum_{n=-N}^{N} \int_0^1 f(t) e^{-2πint}\,dt\; e^{2πinx} = \int_0^1 f(t) \sum_{n=-N}^{N} e^{2πin(x-t)}\,dt = f * D_N(x),
where
    D_N(x) = \sum_{n=-N}^{N} e^{2πinx}.
The sequence of functions (DN )N is called Dirichlet kernel . The Dirichlet kernel can
be written more explicitly.
Lemma 4.40. We have
    D_N(x) = \frac{\sin(2π(N + \frac{1}{2})x)}{\sin(πx)}.
Proof.
    D_N(x) = \sum_{n=-N}^{N} e^{2πinx} = e^{-2πiNx} \sum_{n=0}^{2N} e^{2πinx} = e^{-2πiNx}\, \frac{e^{2πi(2N+1)x} - 1}{e^{2πix} - 1}
    = \frac{e^{2πi(N+\frac{1}{2})x} - e^{-2πi(N+\frac{1}{2})x}}{e^{πix} - e^{-πix}} = \frac{\sin(2π(N + \frac{1}{2})x)}{\sin(πx)}.
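It is easy to check this identity numerically; the following Python sketch (illustrative only) compares the exponential sum with the closed form at a few sample points.

    import cmath, math

    def D_sum(N, x):
        # D_N(x) as the exponential sum
        return sum(cmath.exp(2j * math.pi * n * x) for n in range(-N, N + 1)).real

    def D_closed(N, x):
        # the closed form sin(2*pi*(N + 1/2)*x) / sin(pi*x)
        return math.sin(2 * math.pi * (N + 0.5) * x) / math.sin(math.pi * x)

    for N in (3, 10):
        for x in (0.1, 0.37, 0.73):
            assert abs(D_sum(N, x) - D_closed(N, x)) < 1e-9
    print("closed form for D_N verified at sample points")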

We would like to approximate continuous functions by trigonometric polynomials.
If f is only continuous it may happen that SN f (x) does not converge. However, instead
of SN f we may also consider their arithmetic means. We define the Fejér kernel by
    K_N(x) = \frac{1}{N+1} \sum_{n=0}^{N} D_n(x).
Lemma 4.41. We have
(4.11)    K_N(x) = \frac{1}{2(N+1)}\, \frac{1 - \cos(2π(N+1)x)}{\sin(πx)^2} = \frac{1}{N+1} \Big( \frac{\sin(π(N+1)x)}{\sin(πx)} \Big)^2.
Proof. Using that 2\sin(x)\sin(y) = \cos(x-y) - \cos(x+y),
    D_N(x) = \frac{\sin(2π(N+\frac{1}{2})x)}{\sin(πx)} = \frac{2\sin(πx)\sin(2π(N+\frac{1}{2})x)}{2\sin(πx)^2} = \frac{\cos(2πNx) - \cos(2π(N+1)x)}{2\sin(πx)^2}.
Thus,
    \sum_{n=0}^{N} D_n(x) = \frac{1}{2\sin(πx)^2} \sum_{n=0}^{N} \big( \cos(2πnx) - \cos(2π(n+1)x) \big) = \frac{1 - \cos(2π(N+1)x)}{2\sin(πx)^2},
and the claim now follows from the formula 1 - \cos(2x) = 2\sin(x)^2. □
As a consequence of this explicit formula we see that KN (x) ≥ 0 for all x ∈ R which
is not at all obvious from the initial definition. We define
σN f (x) = f ∗ KN (x).
Theorem 4.42 (Fejér). For every 1-periodic continuous function f ,
σN f → f
uniformly on R as N → ∞.
Corollary 4.43. Every 1-periodic continuous function can be uniformly approxi-
mated by trigonometric polynomials.
Remark. There is nothing special about the period 1 here. By considering the orthonormal system (L^{-1/2} e^{2πinx/L})_{n∈Z} on [0, L] we obtain a similar result for L-periodic functions.

Proof of Corollary 4.43. This follows from Fejér's theorem because σ_N f is a trigonometric polynomial. We write σ_N f(x) as
    \int_0^1 f(t)\, \frac{1}{N+1} \sum_{n=0}^{N} \sum_{k=-n}^{n} e^{2πik(x-t)}\,dt = \frac{1}{N+1} \sum_{n=0}^{N} \sum_{k=-n}^{n} \int_0^1 f(t) e^{-2πikt}\,dt\; e^{2πikx}
    = \frac{1}{N+1} \sum_{n=0}^{N} \sum_{k=-n}^{n} \hat f(k) e^{2πikx} = \frac{1}{N+1} \sum_{k=-N}^{N} \sum_{n=|k|}^{N} \hat f(k) e^{2πikx} = \sum_{k=-N}^{N} \Big(1 - \frac{|k|}{N+1}\Big) \hat f(k) e^{2πikx}. □
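The last display gives a convenient way to evaluate σ_N f directly from the Fourier coefficients. The Python sketch below (an illustration with an arbitrary sample function; coefficients are approximated by Riemann sums) computes σ_N f in this way and prints the uniform error, which decreases with N in accordance with Fejér's theorem.

    import cmath, math

    M = 2000
    f = lambda x: abs(x - 0.5)       # sample function on [0, 1); its periodic extension is continuous

    def fhat(n):
        # \hat f(n) ~ (1/M) * sum_j f(j/M) * exp(-2 pi i n j / M)
        return sum(f(j / M) * cmath.exp(-2j * math.pi * n * j / M) for j in range(M)) / M

    def sigma_values(N, xs):
        coeffs = {k: fhat(k) for k in range(-N, N + 1)}   # compute each coefficient once
        return [sum((1 - abs(k) / (N + 1)) * coeffs[k] * cmath.exp(2j * math.pi * k * x)
                    for k in range(-N, N + 1)).real for x in xs]

    grid = [i / 200 for i in range(200)]
    for N in (5, 20, 80):
        err = max(abs(s - f(x)) for s, x in zip(sigma_values(N, grid), grid))
        print(N, err)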

We will now derive Fejér’s Theorem as a consequence of a more general principle.


Definition 4.44 (Approximation of unity). A sequence of 1-periodic continuous functions (k_n)_n is called an approximation of unity if for all 1-periodic continuous functions f we have that f * k_n converges uniformly to f on R. That is,
    \sup_{x∈R} |f * k_n(x) - f(x)| → 0
as n → ∞.

Remark. There is no unity for the convolution of functions. More precisely, there
exists no continuous function k such that k ∗ f = f for all continuous, 1-periodic f (this
is the content of Exercise 4.62). An approximation of unity is a sequence (kn )n that
approximates unity:
lim kn ∗ f = f
n→∞
for every continuous, 1-periodic f .
Theorem 4.45. Let (k_n)_n be a sequence of 1-periodic continuous functions such that
(1) k_n(x) ≥ 0 for all x ∈ R;
(2) \int_{-1/2}^{1/2} k_n(t)\,dt = 1;
(3) for all 1/2 ≥ δ > 0 we have
    \int_{-δ}^{δ} k_n(t)\,dt → 1
as n → ∞.
Then (k_n)_n is an approximation of unity.
Figure 4. Approximation of unity
Assumption (3) is a precise way to express the idea that the "mass" of k_n concentrates near the origin. Keeping in mind Assumption (2), Assumption (3) can be rewritten equivalently as:
    \int_{\frac{1}{2} ≥ |t| ≥ δ} k_n(t)\,dt → 0.

Proof. Let f be 1-periodic and continuous. By continuity, f is bounded and


uniformly continuous on [−1/2, 1/2]. By periodicity, f is also bounded and uniformly
continuous on all of R. Let ε > 0. By uniform continuity there exists δ > 0 such that
(4.12) |f (x − t) − f (x)| ≤ ε/2
for all |t| < δ, x ∈ R. Using Assumption (2),
    f * k_n(x) - f(x) = \int_{-1/2}^{1/2} (f(x-t) - f(x)) k_n(t)\,dt = A + B,
where
    A = \int_{|t| ≤ δ} (f(x-t) - f(x)) k_n(t)\,dt,   B = \int_{\frac{1}{2} ≥ |t| ≥ δ} (f(x-t) - f(x)) k_n(t)\,dt.
By (4.12) and Assumptions (1) and (2),
    |A| ≤ \frac{ε}{2} \int_{|t| ≤ δ} k_n(t)\,dt ≤ \frac{ε}{2}.
Since f is bounded there exists C > 0 such that |f(x)| ≤ C for all x ∈ R. Let N be large enough so that for all n ≥ N,
    \int_{\frac{1}{2} ≥ |t| ≥ δ} k_n(t)\,dt ≤ \frac{ε}{4C}.
Thus, if n ≥ N,
    |B| ≤ 2C \int_{\frac{1}{2} ≥ |t| ≥ δ} k_n(t)\,dt ≤ \frac{ε}{2}.
This implies
    |f * k_n(x) - f(x)| ≤ ε/2 + ε/2 = ε
for n ≥ N and x ∈ R. □
for n ≥ N and x ∈ R. 
Corollary 4.46. The Fejér kernel (KN )N is an approximation of unity.
Proof. We verify the assumptions of Theorem 4.45. From (4.11) we see that K_N ≥ 0. Also,
    \int_{-1/2}^{1/2} K_N(t)\,dt = \frac{1}{N+1} \sum_{n=0}^{N} \sum_{k=-n}^{n} \int_{-1/2}^{1/2} e^{2πikt}\,dt = \frac{1}{N+1} \sum_{n=0}^{N} 1 = 1.
Now we verify the last property. Let \frac{1}{2} > δ > 0 and |x| ≥ δ. By (4.11),
    K_N(x) ≤ \frac{1}{N+1}\, \frac{1}{\sin(πδ)^2}.
Thus,
    \int_{\frac{1}{2} ≥ |t| ≥ δ} K_N(t)\,dt ≤ \frac{1}{N+1}\, \frac{1}{\sin(πδ)^2},
which converges to 0 as N → ∞. □
Therefore we have proven Fejér’s theorem. Note that although the Dirichlet kernel
also satisfies Assumptions (2) and (3), it is not an approximation of unity. In other
words, if f is continuous then it is not necessarily true that SN f → f uniformly.
However, we can use Fejér’s theorem to show that SN f → f in the L2 -norm.
Theorem 4.47. Let f be a 1-periodic and continuous function. Then
lim kSN f − f k2 = 0.
N →∞

Proof. Let ε > 0. By Fejér's theorem there exists a trigonometric polynomial p such that |f(x) - p(x)| ≤ ε/2 for all x ∈ R. Then
    ‖f - p‖_2 = \Big( \int_0^1 |f(x) - p(x)|^2\,dx \Big)^{1/2} ≤ ε/2.
Let N be the degree of p. Then S_N p = p by Lemma 4.34. Thus,


SN f − f = SN f − SN p + SN p − f = SN (f − p) + p − f.
By Minkowski’s inequality,
kSN f − f k2 ≤ kSN (f − p)k2 + kp − f k2
Bessel’s inequality (Theorem 4.12) says that kSN f k2 ≤ kf k2 . Therefore,
kSN f − f k2 ≤ 2kf − pk2 ≤ ε.


In view of Theorem 4.15 this means that the trigonometric system is complete.
Corollary 4.48 (Parseval's theorem). If f, g are 1-periodic, continuous functions, then
    ⟨f, g⟩ = \sum_{n=-∞}^{∞} \hat f(n)\, \overline{\hat g(n)}.
In particular,
(4.13)    ‖f‖_2^2 = \sum_{n=-∞}^{∞} |\hat f(n)|^2.
Proof. We have
    ⟨S_N f, g⟩ = \sum_{n=-N}^{N} \hat f(n) ⟨e^{2πinx}, g⟩ = \sum_{n=-N}^{N} \hat f(n)\, \overline{\hat g(n)}.
But ⟨S_N f, g⟩ → ⟨f, g⟩ as N → ∞ because
    |⟨S_N f, g⟩ - ⟨f, g⟩| = |⟨S_N f - f, g⟩| ≤ ‖S_N f - f‖_2 ‖g‖_2 → 0
as N → ∞. Here we have used the Cauchy-Schwarz inequality and the previous theorem. Equation (4.13) follows from putting f = g. □

Remark. Theorems 4.47 and Corollary 4.48 also hold for piecewise continuous and
1-periodic functions.
Exercise 4.49. (i) Let f be the 1-periodic function such that f(x) = x for x ∈ [0, 1). Compute the Fourier coefficient \hat f(n) for every n ∈ Z and use Parseval's theorem to derive the formula
    \sum_{n=1}^{∞} \frac{1}{n^2} = \frac{π^2}{6}.
(ii) Using Parseval's theorem for a suitable 1-periodic function, determine the value of \sum_{n=1}^{∞} \frac{1}{n^4}.

While the Fourier series of a continuous function does not necessarily converge point-
wise, we can obtain pointwise convergence easily if we impose additional conditions.
Theorem 4.50. Let f be a 1-periodic continuous function and let x ∈ R. Assume
that f is differentiable at x. Then SN f (x) → f (x) as N → ∞.
Proof. By definition,
    S_N f(x) = \int_0^1 f(x-t) D_N(t)\,dt.
Also,
    \int_0^1 D_N(t)\,dt = \sum_{n=-N}^{N} \int_0^1 e^{2πint}\,dt = 1.
Thus from Lemma 4.40,
(4.14)    S_N f(x) - f(x) = \int_0^1 (f(x-t) - f(x)) D_N(t)\,dt = \int_0^1 g(t) \sin(2π(N + \tfrac{1}{2})t)\,dt,
where
    g(t) = \frac{f(x-t) - f(x)}{\sin(πt)}.
Differentiability of f at x implies that g is continuous at 0. Indeed,
    \frac{f(x-t) - f(x)}{\sin(πt)} = \frac{f(x-t) - f(x)}{t} \cdot \frac{t}{\sin(πt)} → -\frac{1}{π} f'(x)
as t → 0.
Exercise 4.51. Show that φ_n(x) = \sqrt{2}\,\sin(2π(n + \tfrac{1}{2})x) with n = 1, 2, . . . defines an orthonormal system on [0, 1].
With this exercise, the claim follows from (4.14) and the Riemann-Lebesgue lemma
(Corollary 4.13). 
Exercise 4.52. Show that there exists a constant c > 0 such that
    \int_0^1 |D_N(x)|\,dx ≥ c \log(2 + N)
holds for all N = 0, 1, . . . .


Exercise 4.53. (i) Let (a_k)_k be a sequence of complex numbers with limit L. Prove that
    \lim_{n→∞} \frac{a_1 + · · · + a_n}{n} = L.
Given the sequence (a_k)_k, form the partial sums s_n = \sum_{k=1}^{n} a_k and let
    σ_N = \frac{s_1 + · · · + s_N}{N}.
σ_N is called the N th Cesàro mean of the sequence (s_k)_k or the N th Cesàro sum of the series \sum_{k=1}^{∞} a_k. If σ_N converges to a limit S we say that the series \sum_{k=1}^{∞} a_k is Cesàro summable to S.
(ii) Prove that if \sum_{k=1}^{∞} a_k is summable to S (i.e. by definition converges with sum S) then \sum_{k=1}^{∞} a_k is Cesàro summable to S.
(iii) Prove that the sum \sum_{k=1}^{∞} (-1)^{k-1} does not converge but is Cesàro summable to some limit S and determine S.

5. The Stone-Weierstrass Theorem


We have seen two different classes of continuous functions that are rich enough
to enable uniform approximation of arbitrary continuous functions: polynomials and
trigonometric polynomials. In other words, we have shown that polynomials are dense
in C([a, b]) and trigonometric polynomials are dense in C(R/Z) (space of continuous
and 1-periodic functions). The Stone-Weierstrass theorem gives a sufficient criterion
for a subset of C(K) to be dense (where K is a compact metric space). Both, Fejér’s
and Weierstrass’ theorems are covered by this more general theorem.
Theorem 4.54 (Stone-Weierstrass). Let K be a compact metric space and A ⊂ C(K). Assume that A satisfies the following conditions:
(1) A is a self-adjoint algebra: for f, g ∈ A, c ∈ C,
    f + g ∈ A,   f · g ∈ A,   c · f ∈ A,   \overline{f} ∈ A.
(2) A separates points: for all x, y ∈ K with x ≠ y there exists f ∈ A such that f(x) ≠ f(y).
(3) A vanishes nowhere: for all x ∈ K there exists f ∈ A such that f(x) ≠ 0.
Then A is dense in C(K) (that is, \overline{A} = C(K)).
Exercise 4.55. Let K be a compact metric space. Show that if a subset A ⊂ C(K) does not separate points or does not vanish nowhere, then A is not dense.
Exercise 4.56. Let A ⊂ C([1, 2]) be the set of all polynomials of the form p(x) = \sum_{k=0}^{n} c_k x^{2k+1}, where c_k ∈ C and n is a non-negative integer. Show that A is dense, but not an algebra.
Before we begin the proof of the Stone-Weierstrass theorem we first need some
preliminary lemmas.
Lemma 4.57. For every a > 0 there exists a sequence of polynomials (pn )n with real
coefficients such that pn (0) = 0 for all n and supy∈[−a,a] |pn (y) − |y|| → 0 as n → ∞.
Proof. From Weierstrass’ theorem we get that there exists a sequence of polyno-
mials qn that converges uniformly to f (y) = |y| on [−a, a]. In particular qn (0) → 0.
Now set pn (y) = qn (y) − qn (0). 
Exercise 4.58. Work out an explicit sequence of polynomials (pn )n that converges
uniformly to x 7→ |x| on [−1, 1].
Let A ⊂ C(K) satisfy conditions (1), (2), (3). Observe that then the closure \overline{A} also satisfies (1), (2), (3).
We may assume without loss of generality that we are dealing with real-valued functions (otherwise split functions into real and imaginary parts f = g + ih and go through the proof for both parts).
Lemma 4.59. If f ∈ \overline{A}, then |f| ∈ \overline{A}.
Proof. Let ε > 0 and a = \max_{x∈K} |f(x)|. By Lemma 4.57 there exist c_1, . . . , c_n ∈ R such that
    \Big| \sum_{i=1}^{n} c_i y^i - |y| \Big| ≤ ε
for all y ∈ [-a, a]. By Condition (1) we have that
    g = \sum_{i=1}^{n} c_i f^i ∈ \overline{A}.
Then |g(x) - |f(x)|| ≤ ε for all x ∈ K. Thus, |f| can be uniformly approximated by functions in \overline{A}. But \overline{A} is closed, so |f| ∈ \overline{A}. □
Lemma 4.60. If f_1, . . . , f_m ∈ \overline{A}, then \min(f_1, . . . , f_m) ∈ \overline{A} and \max(f_1, . . . , f_m) ∈ \overline{A}.
Proof. It suffices to show the claim for m = 2 (the general case then follows by induction). Let f, g ∈ \overline{A}. We have
    \min(f, g) = \frac{f+g}{2} - \frac{|f-g|}{2},   \max(f, g) = \frac{f+g}{2} + \frac{|f-g|}{2}.
Thus, Condition (1) and Lemma 4.59 imply that \min(f, g), \max(f, g) ∈ \overline{A}. □
Lemma 4.61. For every x0 , x1 ∈ K, x0 6= x1 and c0 , c1 ∈ R there exists f ∈ A such
that f (xi ) = ci for i = 0, 1.
In other words, any two points in K × R that could lie on the graph of a function
in A do lie on the graph of a function in A.
Proof. By Conditions (2) and (3) there exist g, h_0, h_1 ∈ A such that g(x_0) ≠ g(x_1) and h_i(x_i) ≠ 0 for i = 0, 1. Set
    u_0(x) = g(x)h_0(x) - g(x_1)h_0(x),
    u_1(x) = g(x)h_1(x) - g(x_0)h_1(x).
Then u_0(x_0) ≠ 0, u_0(x_1) = 0 and u_1(x_1) ≠ 0, u_1(x_0) = 0. Now let
    f(x) = \frac{c_0}{u_0(x_0)} u_0(x) + \frac{c_1}{u_1(x_1)} u_1(x).
Then f (x0 ) = c0 and f (x1 ) = c1 and f ∈ A by Condition (1). 
This lemma can be seen as a baby version of the full theorem: the statement can be
generalized to finitely many points. So we can use this generalization to find a function
in A that matches a given function f in any given collection of finitely many points
(see Exercise 4.81). Thus, if K was finite, we would already be done. If K is not finite,
we need to exploit compactness. Let us now get to the details.

Fix f ∈ C(K) and let ε > 0.

Claim: For every x ∈ K there exists g_x ∈ \overline{A} such that g_x(x) = f(x) and g_x(t) > f(t) - ε for t ∈ K.
Proof of Claim. Let y ∈ K. By Lemma 4.61 there exists hy ∈ A such that
hy (x) = f (x) and hy (y) = f (y). By continuity of hy there exists an open ball By
around y such that |hy (t) − f (t)| < ε for all t ∈ By . In particular,
hy (t) > f (t) − ε.
Observe that (By )y∈K is an open cover of K. Since K is compact, we can find a finite
subcover by By1 , . . . , Bym . Set
gx = max(hy1 , . . . , hym ).
By Lemma 4.60, g_x ∈ \overline{A}. □
By continuity of gx there exists an open ball Ux such that
|gx (t) − f (t)| < ε
for t ∈ Ux . In particular,
gx (t) < f (t) + ε.
(Ux )x∈K is an open cover of K which has a finite subcover by Ux1 , . . . , Uxn . Then let
h = min(gx1 , . . . , gxn ).
By Lemma 4.60 we have h ∈ A. Also,
f (t) − ε < h(t) < f (t) + ε
for all t ∈ K. That is,
|f (t) − h(t)| < ε
for all t ∈ K. This proves that f ∈ \overline{A}, and hence \overline{A} = C(K). □

6. Further exercises
Exercise 4.62. Show that there exists no continuous 1-periodic function g such
that f ∗ g = f holds for all continuous 1-periodic functions f .
Hint: Use the Riemann-Lebesgue lemma.
Exercise 4.63. Give an alternative proof of Weierstrass’ theorem by using Fejér’s
theorem and then approximating the resulting trigonometric polynomials by truncated
Taylor expansions.
Exercise 4.64. Find a sequence of continuous functions (fn )n on [0, 1] and a con-
tinuous function f on [0, 1] such that kfn − f k2 → 0, but fn (x) does not converge to
f (x) for any x ∈ [0, 1].
Exercise 4.65 (Weighted L2 norms). Fix a function w ∈ C([a, b]) that is non-
negative and does not vanish identically. Let us define another inner product by
    ⟨f, g⟩_{L^2(w)} = \int_a^b f(x)\overline{g(x)}\,w(x)\,dx
and a corresponding norm ‖f‖_{L^2(w)} = ⟨f, f⟩_{L^2(w)}^{1/2}. Similarly, we say that (φ_n)_n is an orthonormal system by asking that ⟨φ_n, φ_m⟩_{L^2(w)} is 1 if n = m and 0 otherwise. Verify that all theorems in Section 2 continue to hold when ⟨·,·⟩, ‖·‖_2 are replaced by ⟨·,·⟩_{L^2(w)}, ‖·‖_{L^2(w)}, respectively.
Exercise 4.66. Let w ∈ C([0, 1]) be such that w(x) ≥ 0 for all x ∈ [0, 1] and w is not identically zero. Prove that there exists a sequence of real-valued polynomials (p_n)_n such that p_n is of degree n and
    \int_0^1 p_n(x) p_m(x) w(x)\,dx = 1 if n = m, and = 0 if n ≠ m,
for all non-negative integers n, m.

Exercise 4.67 (Chebyshev polynomials). Define a sequence of polynomials (Tn )n


by T0 (x) = 1, T1 (x) = x and the recurrence relation Tn (x) = 2xTn−1 (x) − Tn−2 (x) for
n ≥ 2.
(i) Show that Tn (x) = cos(nt) if x = cos(t).
Hint: Use that 2 cos(a) cos(b) = cos(a + b) + cos(a − b) for all a, b ∈ C.
(ii) Compute
    \int_{-1}^{1} T_n(x) T_m(x)\, \frac{dx}{\sqrt{1 - x^2}}
for all non-negative integers n, m.
(iii) Prove that |Tn (x)| ≤ 1 for x ∈ [−1, 1] and determine when there is equality.
Exercise 4.68. Let d be a positive integer and f ∈ C([a, b]). Denote by Pd the set
of polynomials with real coefficients of degree ≤ d. Prove that there exists a polynomial
p∗ ∈ Pd such that kf − p∗ k∞ = inf p∈Pd kf − pk∞ .
Hint: Find a way to apply Theorem 1.55.
Exercise 4.69. Let f be smooth on [0, 1] (that is, arbitrarily often differentiable).
(i) Let p be a polynomial such that |f 0 (x) − p(x)| ≤ ε for all x ∈ [0, 1]. Construct a
polynomial q such that |f (x) − q(x)| ≤ ε for all x ∈ [0, 1].
(k)
(ii) Prove that there exists a sequence of polynomials (pn )n such that (pn )n converges
uniformly on [0, 1] to f (k) for all k = 0, 1, 2, . . . .
Exercise 4.70 (The space L2 ). Let (X, d) be a metric space. Recall that the
completion X of X is defined as follows: for two Cauchy sequences (an )n , (bn )n in X
we say that (an )n ∼ (bn )n if limn→∞ d(an , bn ) = 0. Then ∼ is an equivalence relation on
the space of Cauchy sequences and we define X as the set of equivalence classes. We
identify X with a subset of X by identifying x ∈ X with the equivalence class of the
constant sequence (x, x, . . . ). We make X a metric space by defining
d(a, b) = lim d(an , bn ),
n→∞

where (an )n , (bn )n are representatives of a, b ∈ X, respectively. Then X is a complete


metric space. Let us denote by L^2_c(a, b) the metric space of continuous functions on [a, b] equipped with the metric d(f, g) = ‖f - g‖_2, where ‖f‖_2 = (\int_a^b |f|^2)^{1/2}. Define L^2(a, b) to be the completion \overline{L^2_c(a, b)}.
(i) Define an inner product on L^2(a, b) by
    ⟨f, g⟩ = \lim_{n→∞} \int_a^b f_n(x)\overline{g_n(x)}\,dx,
for f, g ∈ L^2(a, b) with (f_n)_n, (g_n)_n being representatives of f, g, respectively. Show that this is well-defined: that is, show that the limit on the right hand side exists and is independent of the representatives (f_n)_n, (g_n)_n and that ⟨·,·⟩ is an inner product.
Hint: Use the Cauchy-Schwarz inequality on L^2_c(a, b).
For f ∈ L^2(a, b) we define ‖f‖_2 = ⟨f, f⟩^{1/2}. Let (φ_n)_{n=1,2,...} be an orthonormal system in L^2(a, b) (that is, ⟨φ_n, φ_m⟩ = 0 if n ≠ m and = 1 if n = m).
(ii) Prove Bessel's inequality: for every f ∈ L^2(a, b) it holds that
    \sum_{n=1}^{∞} |⟨f, φ_n⟩|^2 ≤ ‖f‖_2^2.
Hint: Use the same proof as seen for L^2_c(a, b) in the lecture!
(iii) Let (c_n)_n ⊂ C be a sequence of complex numbers and let
    f_N = \sum_{n=1}^{N} c_n φ_n ∈ L^2(a, b).
Show that (f_N)_N converges in L^2(a, b) if and only if
    \sum_{n=1}^{∞} |c_n|^2 < ∞.

Exercise 4.71. Let f be the 1-periodic function such that f (x) = |x| for x ∈
[−1/2, 1/2]. Determine explicitly a sequence of trigonometric polynomials (pN )N such
that pN → f uniformly as N → ∞.
Exercise 4.72. Let f, g be continuous, 1-periodic functions.
(i) Show that \widehat{f * g}(n) = \hat f(n)\,\hat g(n).
(ii) Show that \widehat{f · g}(n) = \sum_{m∈Z} \hat f(n - m)\,\hat g(m).
(iii) If f is continuously differentiable, prove that \widehat{f'}(n) = 2πin\,\hat f(n).
(iv) Let y ∈ R and set f_y(x) = f(x + y). Show that \widehat{f_y}(n) = e^{2πiny}\,\hat f(n).
(v) Let m ∈ Z, m ≠ 0 and set f_m(x) = f(mx). Show that \widehat{f_m}(n) equals \hat f(\tfrac{n}{m}) if m divides n and zero otherwise.
Exercise 4.73 (Legendre polynomials). Define p_n(x) = \frac{d^n}{dx^n}\big[(1 - x^2)^n\big] for n = 0, 1, . . . and
    φ_n(x) = p_n(x) · \Big( \int_{-1}^{1} p_n(t)^2\,dt \Big)^{-1/2}.
Show that (φ_n)_{n=0,1,...} is a complete orthonormal system on [-1, 1].
Exercise 4.74. Let f be 1-periodic and k times continuously differentiable. Prove
that there exists a constant c > 0 such that
|fb(n)| ≤ c|n|−k for all n ∈ Z.
Hint: What can you say about the Fourier coefficients of f (k) ?
Exercise 4.75. Let f be 1-periodic and continuous.
(i) Suppose that \hat f(n) = -\hat f(-n) ≥ 0 holds for all n ≥ 0. Prove that
    \sum_{n=1}^{∞} \frac{\hat f(n)}{n} < ∞.
(ii) Show that there does not exist a 1-periodic continuous function f such that
    \hat f(n) = \frac{\mathrm{sgn}(n)}{\log|n|}   for all |n| ≥ 2.
Here sgn(n) = 1 if n > 0 and sgn(n) = -1 if n < 0.
Exercise 4.76. Suppose that f is a 1-periodic function such that there exist c > 0 and α ∈ (0, 1] such that
    |f(x) - f(y)| ≤ c|x - y|^α
holds for all x, y ∈ R. Show that the sequence of partial sums
    S_N f(x) = \sum_{n=-N}^{N} \hat f(n) e^{2πinx}
converges uniformly to f as N → ∞.
Exercise 4.77. Let f ∈ C([0, 1]) and A ⊂ C([0, 1]) dense. Suppose that
    \int_0^1 f(x) a(x)\,dx = 0
for all a ∈ A. Show that f = 0.
Hint: Show that \int_0^1 |f(x)|^2\,dx = 0.
Exercise 4.78. Let f ∈ C([−1, 1]) and a ∈ [−1, 1]. Show that for every ε > 0 there
exists a polynomial p such that p(a) = f (a) and |f (x) − p(x)| < ε for all x ∈ [−1, 1].
Exercise 4.79. Prove that
    -\frac{1}{2} = \sum_{n=1}^{∞} (-1)^n \frac{\sin(n)}{n}.

Exercise 4.80. Suppose f ∈ C([1, ∞)) and limx→+∞ f (x) = a. Show that f can
be uniformly approximated on [1, ∞) by functions of the form g(x) = p(1/x), where p
is a polynomial.
Exercise 4.81 (Stone-Weierstrass for finite sets). Let K be a finite set and A
a family of functions on K that is an algebra (i.e. closed under taking finite linear
combinations and products), separates points and vanishes nowhere. Give a purely
algebraic proof that A must then already contain every function on K. (That means
your proof is not allowed to use the concept of an inequality. In particular, you are not
allowed to use any facts about metric spaces such as the Stone-Weierstrass theorem.)
Hint: Take a close look at the proof of Stone-Weierstrass.
Exercise 4.82 (Uniform approximation by neural networks). Let σ(t) = et for
t ∈ R. Fix n ∈ N and let K ⊂ Rn be a compact set. As usual, let C(K) denote the
space of real-valued continuous functions on K. Define a class of functions N ⊂ C(K)
by saying that µ ∈ N iff there exist m ∈ N, W ∈ R^{m×n}, v, b ∈ R^m such that
    µ(x) = \sum_{i=1}^{m} σ\big((Wx)_i + b_i\big)\, v_i   for all x ∈ K.

Prove that N is dense in C(K).


Remark. This is a special case of a well-known result of G. Cybenko, Approximation by
Superpositions of a Sigmoidal Function in Math. Control Signals Systems (1989). As a
real-world motivation for this problem, note that a function µ ∈ N can be interpreted as
a neural network with a single hidden layer, see Figure 5. Consequently, in this problem
you are asked to show that every continuous function can be uniformly approximated
by neural networks of this form.
Figure 5. Visualization of µ when n = 3 and m = 6.

Exercise 4.83. Let f be a continuous function on [0, 1] and N a positive integer. Define x_k = \frac{k}{N} for k = 0, . . . , N. Define
    L_N(x) = \sum_{k=0}^{N} f(x_k) \prod_{j=0,\, j≠k}^{N} \frac{x - x_j}{x_k - x_j}.
(i) Show that f(x_k) = L_N(x_k) for all k = 0, . . . , N and that L_N is the unique polynomial of degree ≤ N with this property.
(ii) Suppose f ∈ C^{N+1}([0, 1]). Show that for every x ∈ [0, 1] there exists ξ ∈ [0, 1] such that
    f(x) - L_N(x) = \frac{f^{(N+1)}(ξ)}{(N+1)!} \prod_{k=0}^{N} (x - x_k).
(iii) Show that LN does not necessarily converge to f uniformly on [0, 1]. (Find a
counterexample.)
(iv) Suppose f is given by a power series with infinite convergence radius. Does LN
necessarily converge to f uniformly on [0, 1] ?
Remark. The polynomials LN are also known as Lagrange interpolation polynomials .
CHAPTER 5

From Riemann to Lebesgue*

1. Lebesgue null sets


For a compact interval I = [c, d] we call d − c the length of I, also denoted by `(I).
Definition 5.1. A set E ⊂ [a, b] is called a Lebesgue null set if for every ε > 0 there is a sequence (I_n)_{n∈N} of intervals such that E ⊂ \bigcup_{n∈N} I_n and \sum_{n=1}^{∞} \ell(I_n) < ε.
Lemma 5.2. Countable unions of Lebesgue null sets are Lebesgue null sets.
Proof. Let E_k, k ∈ N, be Lebesgue null sets. For each k find a countable family of intervals {I_{k,n}}_{n=1}^{∞} such that \sum_{n=1}^{∞} \ell(I_{k,n}) < ε2^{-k-1} and E_k ⊂ \bigcup_{n=1}^{∞} I_{k,n}.
The family of intervals {I_{k,n}}_{(k,n)∈N^2} is countable (and can thus be arranged in a sequence) and we have
    \sum_{k=1}^{∞} \sum_{n=1}^{∞} \ell(I_{k,n}) < \sum_{k=1}^{∞} ε2^{-k-1} = ε/2. □
Exercise 5.3. Which theorems about series with nonnegative terms have been used
in this proof?
Definition 5.4. A set E ⊂ [a, b] has content zero if for every ε > 0 there is a finite set of intervals I_1, . . . , I_N such that E ⊂ \bigcup_{n=1}^{N} I_n and \sum_{n=1}^{N} \ell(I_n) < ε.
Note that any set of content zero is a Lebesgue null set, but the converse is not true (see Exercise 5.8 below).
Lemma 5.5. Let {I_ν}_{ν=1}^{N} be a finite collection of intervals such that [a, b] ⊂ \bigcup_{ν=1}^{N} I_ν. Then \sum_{ν=1}^{N} \ell(I_ν) ≥ b - a. In particular [a, b] does not have content zero.
Proof. Let J_ν := \overline{I_ν} ∩ [a, b]. Arrange the finite set formed by all the endpoints of these intervals in increasing order, written as a = x_0 < x_1 < · · · < x_M = b. Then every interval [x_{i-1}, x_i] is contained in at least one J_ν.
Define inductively sets of indices \mathcal{J}_ν. For ν = 1 set
    \mathcal{J}_1 = \{ i ∈ {1, . . . , M} : [x_{i-1}, x_i] ⊂ J_1 \}.
For any ν > 1 we are either in the situation that \mathcal{J}_1 ∪ · · · ∪ \mathcal{J}_{ν-1} contains all i ∈ {1, . . . , M} (then we stop the construction) or, if not, then we form
    \mathcal{J}_ν = \{ i ∈ {1, . . . , M} : [x_{i-1}, x_i] ⊂ J_ν and [x_{i-1}, x_i] \not\subset J_l for l ≤ ν - 1 \}.
The construction stops after K steps, where K ≤ N. Note that each index i is in exactly one family \mathcal{J}_ν and also for each ν we have
    \sum_{i ∈ \mathcal{J}_ν} (x_i - x_{i-1}) ≤ \ell(J_ν).
Consequently
    b - a = \sum_{i=1}^{M} (x_i - x_{i-1}) = \sum_{ν=1}^{K} \sum_{i ∈ \mathcal{J}_ν} (x_i - x_{i-1}) ≤ \sum_{ν=1}^{K} \ell(J_ν) ≤ \sum_{ν=1}^{N} \ell(I_ν). □

Lemma 5.6. Let E be a compact Lebesgue null set. Then E has content zero.
Proof. Let ε > 0. Since E is a null set there is a countable family {I_ν}_{ν∈N} of closed intervals such that E ⊂ \bigcup_{ν∈N} I_ν and \sum_{ν=1}^{∞} \ell(I_ν) < ε/2. Write I_ν = [a_ν, b_ν] and form the slightly larger open intervals \tilde I_ν = (a_ν - ε2^{-ν-2}, b_ν + ε2^{-ν-2}), so that \ell(\tilde I_ν) = \ell(I_ν) + ε2^{-ν-1} and thus
    \sum_{ν=1}^{∞} \ell(\tilde I_ν) ≤ \sum_{ν=1}^{∞} \ell(I_ν) + \sum_{ν=1}^{∞} ε2^{-ν-1} < ε/2 + ε/2 = ε.
Since E is compact we may choose finitely many \tilde I_{ν_1}, . . . , \tilde I_{ν_M} such that E ⊂ \bigcup_{l=1}^{M} \tilde I_{ν_l} and \sum_{l=1}^{M} \ell(\tilde I_{ν_l}) ≤ \sum_{ν=1}^{∞} \ell(\tilde I_ν) < ε. Hence E has content zero. □
Corollary 5.7. Let a < b. Then [a, b] is not a Lebesgue null set.
Proof. This is an immediate consequence of Lemma 5.5 together with Lemma 5.6. □
Exercise 5.8. Let E be the set of rational numbers in [a, b]. Show that E is a
Lebesgue null set but E is not of content zero.
The Lebesgue null sets are usually called sets of Lebesgue measure zero . We avoid
this terminology here because we have not defined Lebesgue measure here and indeed
have not identified the class of sets on which it can be defined (the so called Lebesgue
measurable sets). A substitute for Lebesgue measure which can be defined on all subsets
of R is Lebesgue outer measure:
Definition 5.9. For a subset E of R the Lebesgue outer measure λ*(E) of E is defined as the quantity λ*(E) = \inf \sum_{n=1}^{∞} \ell(I_n), where the infimum is taken over all countable collections {I_n}_{n∈N} of intervals which have the property that E ⊂ \bigcup_{n=1}^{∞} I_n.

With this definition, the Lebesgue null sets are simply the sets of Lebesgue outer
measure zero.
2. Lebesgue’s Characterization of the Riemann integral
We can now formulate the main theorem of this chapter.
Theorem 5.10. Let f : [a, b] → R be a bounded function. Then f is Riemann
integrable if and only if the set of discontinuities of f ,
Df := {x ∈ [a, b] : f is not continuous at x},
is a Lebesgue null set.
The following lemma linking oscillation to lower and upper sums is very helpful in
the proof of Theorem 5.10.
Lemma 5.11. Let f : [a, b] → R be a bounded function and assume that oscf (x) < γ
for all x ∈ [a, b]. Then there is a partition P of [a, b] such that U (f, P ) − L(f, P ) <
γ(b − a).
Proof. By definition of osc_f(x) we can find a δ_x > 0 such that
    M_{f,2δ_x}(x) - m_{f,2δ_x}(x) < γ.
Since [a, b] is compact we find x_1, . . . , x_N such that [a, b] is contained in the union of the intervals (x_i - δ_{x_i}, x_i + δ_{x_i}). Consider the finite set consisting of a, b, the x_i, and the corresponding points x_i - δ_{x_i} and x_i + δ_{x_i}, and then discard those points which do not lie in [a, b]. The resulting set P is a partition of [a, b] with nodes a = t_0 < · · · < t_M = b, and if t_{i-1}, t_i are consecutive nodes in this partition then
    \sup\{f(t) : t ∈ [t_{i-1}, t_i]\} - \inf\{f(t) : t ∈ [t_{i-1}, t_i]\} < γ.
Hence
    U(f, P) - L(f, P) < γ \sum_{i=1}^{M} (t_i - t_{i-1}) = γ(b - a)
and the lemma is proved. □
Proof of Theorem 5.10. Part 1: Set of discontinuities is a null set =⇒ f is
Riemann integrable. By Lemma A.27 it suffices to construct, for given ε > 0, a partition
P such that
(5.1) U (f, P) − L(f, P) < ε.
The function f is bounded and thus there is C > 0 such that |f (x)| ≤ C for x ∈ [a, b].
Now let ε_1 ≪ ε depending on ε; we will see (only at the end) that
    ε_1 = \frac{ε}{2C + b - a}
is an appropriate choice. Consider the set
D(ε1 ) = {x ∈ [a, b] : oscf (x) ≥ ε1 }.
D(ε1 ) is a Lebesgue null set since D(ε1 ) ⊂ Df and Df is a Lebesgue null set. Also,
D(ε1 ) is a closed subset of [a, b], and thus compact and thus has content zero.
Thus there is a finite collection {I_ν}_{ν=1}^{N} of closed intervals such that \sum_{ν=1}^{N} \ell(I_ν) < ε_1 and D(ε_1) ⊂ \bigcup_{ν=1}^{N} (I_ν)^◦ (where (I_ν)^◦ denotes the interior of I_ν).
We may choose a partition P = {a = x_0 < · · · < x_M = b} such that each index i belongs to (at least) one of the following sets:
    J_1 = \{i : [x_{i-1}, x_i] ⊂ I_ν for some ν ∈ {1, . . . , N}\},
    J_2 = \{i : [x_{i-1}, x_i] ∩ D(ε_1) = ∅\}.
Regarding the intervals [x_{i-1}, x_i] with i ∈ J_1 we have
(5.2)    \sum_{i∈J_1} (x_i - x_{i-1}) ≤ \sum_{ν=1}^{N} \sum_{i : [x_{i-1},x_i] ⊂ I_ν} (x_i - x_{i-1}) ≤ \sum_{ν=1}^{N} \ell(I_ν) < ε_1.
We observe that for all i ∈ J_2 we have osc_f(x) < ε_1 for all x ∈ [x_{i-1}, x_i]. Thus by Lemma 5.11, we find a partition P_i of [x_{i-1}, x_i], labeled {x_{i-1} = x_{i,0} < · · · < x_{i,N_i} = x_i}, such that with
    U^i(f, P_i) := \sum_{j=1}^{N_i} (x_{i,j} - x_{i,j-1}) \sup_{[x_{i,j-1}, x_{i,j}]} f(x),
    L^i(f, P_i) := \sum_{j=1}^{N_i} (x_{i,j} - x_{i,j-1}) \inf_{[x_{i,j-1}, x_{i,j}]} f(x),
we have
(5.3)    U^i(f, P_i) - L^i(f, P_i) < ε_1 (x_i - x_{i-1}).
Now the desired partition is defined by
    \mathcal{P} = \{x_i : i ∈ J_1\} ∪ \bigcup_{i∈J_2} \{x_{i,0}, . . . , x_{i,N_i}\}
and we can split and then estimate
    U(f, \mathcal{P}) - L(f, \mathcal{P}) = \sum_{i∈J_1} \Big( \sup_{[x_{i-1},x_i]} f - \inf_{[x_{i-1},x_i]} f \Big)(x_i - x_{i-1}) + \sum_{i∈J_2} \big(U^i(f, P_i) - L^i(f, P_i)\big)
    ≤ 2C \sum_{i∈J_1} (x_i - x_{i-1}) + \sum_{i∈J_2} \big(U^i(f, P_i) - L^i(f, P_i)\big).
By (5.2) and (5.3) we get
    U(f, \mathcal{P}) - L(f, \mathcal{P}) < 2Cε_1 + ε_1 \sum_{i∈J_2} (x_i - x_{i-1}) ≤ 2Cε_1 + (b - a)ε_1.
In view of our choice ε_1 = ε/(2C + b - a) we have proved the desired inequality (5.1).

Part 2: f is Riemann integrable =⇒ Set of discontinuities is a null set.


For each n ∈ N we define D^n = {x ∈ [a, b] : osc_f(x) ≥ 1/n}. Observe that D_f = \bigcup_{n=1}^{∞} D^n, by Lemma 1.93. Thus by Lemma 5.2 it suffices to show that each D^n is a Lebesgue null set.
Fix n ∈ N and let ε > 0. Since f is Riemann integrable there exists, by Lemma A.27, a partition P = {x_0 < · · · < x_N} of [a, b] such that
    U(f, P) - L(f, P) < ε/n.
Let J be the set of indices i for which I_i := [x_{i-1}, x_i] contains a point in D^n, so that D^n ⊂ \bigcup_{i∈J} I_i. Clearly we have
    M_i(f) - m_i(f) ≥ \inf_{x∈[x_{i-1},x_i]} osc_f(x) ≥ \frac{1}{n}   if i ∈ J.
Hence
    \sum_{i∈J} \ell(I_i) = \sum_{i∈J} \frac{(M_i(f) - m_i(f))(x_i - x_{i-1})}{M_i(f) - m_i(f)} ≤ n \sum_{i=1}^{N} (M_i(f) - m_i(f))(x_i - x_{i-1}) < n · \frac{ε}{n} = ε.
Since ε > 0 was arbitrary we have proved that each Dn is a Lebesgue null set, and thus
Df is a Lebesgue null set. 
CHAPTER 6

The Baire category theorem*

Let (X, d) be a metric space. Recall that the interior A^◦ of a set A ⊂ X is the set of interior points of A, i.e. the set of all x ∈ A such that there exists ε > 0 such that B_ε(x) ⊂ A. A set A ⊂ X is dense if \overline{A} = X. Note that A is dense if and only if for all non-empty open sets U ⊂ X we have A ∩ U ≠ ∅.
Definition 6.1. A set A ⊂ X is called nowhere dense if its closure has empty interior. In other words, if (\overline{A})^◦ = ∅. Equivalently, A is nowhere dense if and only if \overline{A} contains no non-empty open set.
Note that 1. A closed set A ⊂ X has empty interior if and only if Ac = X \ A is
open and dense. (This is because A is closed if and only if Ac is open and A has empty
interior if and only if Ac is dense.)
2. A is nowhere dense if and only if Ac contains an open dense set.
3. A is nowhere dense if and only if A is contained in a closed set with empty interior.
Example 6.2. The Cantor set
(6.1)    C = [0, 1] \setminus \bigcup_{ℓ=0}^{∞} \bigcup_{k=0}^{3^ℓ - 1} \Big( \frac{3k+1}{3^{ℓ+1}}, \frac{3k+2}{3^{ℓ+1}} \Big)
is a closed subset of [0, 1] and has empty interior. Therefore, it is nowhere dense.
Lemma 6.3. Suppose A_1, . . . , A_n ⊂ X are nowhere dense sets. Then \bigcup_{k=1}^{n} A_k is nowhere dense.
Proof. Without loss of generality let n = 2. We need to show that \overline{A_1 ∪ A_2} = \overline{A_1} ∪ \overline{A_2} has empty interior. Equivalently, setting U_k = (\overline{A_k})^c for k = 1, 2, we show that U_1 ∩ U_2 is dense. Let U ⊂ X be a non-empty open set. Then V_1 = U ∩ U_1 is open and non-empty, because U_1 is dense. Since U_2 is also dense, V_1 ∩ U_2 = U ∩ (U_1 ∩ U_2) is non-empty, so U_1 ∩ U_2 is dense. □
Also, a subset of a nowhere dense set is nowhere dense and the closure of a nowhere
dense set is nowhere dense.

However, countable unions of nowhere dense sets are not necessarily nowhere dense
sets.
Example 6.4. Enumerate the rationals as Q = {q_1, q_2, . . . }. For every k = 1, 2, . . . , the set A_k = {q_k} is nowhere dense in R. But Q = \bigcup_{k=1}^{∞} A_k ⊂ R is not nowhere dense (it is dense!).
Definition 6.5. A set A ⊂ X is called meager (or of first category ) in X if it
is the countable union of nowhere dense sets. A is called comeager (or residual or of
second category ) if Ac is meager.

The above example shows that Q ⊂ R is meager. In fact, every countable subset of
R is meager (because single points are nowhere dense in R).
By definition, countable unions of meager sets are meager. The choice of the word
“meager” suggests that meager sets are somehow “small” or “negligible”. But how
“large” can meager sets be? For example, can X be meager? That is, can we write
the entire metric space X as a countable union of nowhere dense subsets? The Baire
category theorem will show that the answer is no, if X is complete.
Theorem 6.6 (Baire category theorem). In a complete metric space, meager sets
have empty interior. Equivalently, countable intersections of open dense sets are dense.
Corollary 6.7. Let X be a complete metric space and A ⊂ X a meager set. Then
A 6= X. In other words, X is not a meager subset of itself.
Example 6.8. The conclusion of the Baire category theorem fails if we drop the
assumption that X is complete. Consider X = Q with the metric inherited from R
(so d(p, q) = |p − q|). Then X is a meager subset of itself because it is countable and
single points are nowhere dense in X (X has no isolated points). But the interior of X
is non-empty, because X is open in X.
Example 6.9. Not every set with empty interior is meager: consider the irrational
numbers A = R \ Q. A has empty interior, because Ac = Q is dense. It is not meager,
because otherwise R = A ∪ Ac would be meager, which contradicts the Baire category
theorem.
Exercise 6.10. Another notion of “smallness” is the following:
Definition. A set A ⊂ R is called a Lebesgue null set if for every ε > 0 there exist intervals I_1, I_2, . . . such that
    A ⊂ \bigcup_{j=1}^{∞} I_j   and   \sum_{j=1}^{∞} |I_j| ≤ ε.
(Here |I| denotes the length of the interval I.)


Give an example of a comeager Lebesgue null set. (Recall that a set is called comeager
if its complement is meager.)
(This implies in particular that Lebesgue null sets are not necessarily meager and meager
sets are not necessarily Lebesgue null sets.)
For the proof of Theorem 6.6 we will need the following lemma.
Lemma 6.11. Let X be complete and A_1 ⊃ A_2 ⊃ · · · a decreasing sequence of non-empty closed sets in X such that
    diam A_n = \sup_{x,y∈A_n} d(x, y) → 0
as n → ∞. Then \bigcap_{n=1}^{∞} A_n is non-empty.
Proof of Lemma 6.11. For every n ≥ 1 we choose x_n ∈ A_n. Then (x_n)_{n∈N} is a Cauchy sequence, because for all n ≥ m we have d(x_n, x_m) ≤ diam A_m → 0 as m → ∞. Since X is complete, there exists x ∈ X such that \lim_{n→∞} x_n = x. Let N ∈ N. Then A_N contains the sequence (x_n)_{n≥N} and since A_N is closed, it must also contain the limit of this sequence, so x ∈ A_N. This proves that x ∈ \bigcap_{N=1}^{∞} A_N. □
Proof of Theorem 6.6. Let (U_n)_{n∈N} be open dense sets. We need to show that \bigcap_{n=1}^{∞} U_n is dense. Let U ⊂ X be open and non-empty. It suffices to show that U ∩ \bigcap_{n=1}^{∞} U_n is non-empty. Since U_1 is open and dense, U ∩ U_1 is open and non-empty. Choose a closed ball B(x_1, r_1) ⊂ U ∩ U_1 with r_1 ∈ (0, 1). Then B(x_1, r_1) ∩ U_2 is open and non-empty (because U_2 is dense), so we can choose a closed ball B(x_2, r_2) ⊂ B(x_1, r_1) ∩ U_2 with r_2 ∈ (0, 1/2). Iterating this process, we obtain a sequence of closed balls (B(x_n, r_n))_n such that B(x_n, r_n) ⊂ B(x_{n-1}, r_{n-1}) ∩ U_n and r_n ∈ (0, 1/n). By Lemma 6.11 there exists a point x contained in \bigcap_{n=1}^{∞} B(x_n, r_n). Since B(x_n, r_n) ⊂ U ∩ U_n for all n ≥ 1, we have x ∈ U ∩ \bigcap_{n=1}^{∞} U_n. □
The Baire category theorem has a number of interesting consequences.

1. Nowhere differentiable continuous functions*


Theorem 6.12. Let A ⊂ C([0, 1]) be the set of all functions that are differentiable
at at least one point in [0, 1]. Then A is meager.
Proof. For n ∈ N we define A_n to be the set of all f ∈ C([0, 1]) such that there exists t ∈ [0, 1] such that
    \Big| \frac{f(t+h) - f(t)}{h} \Big| ≤ n
holds for all h ∈ R with t + h ∈ [0, 1]. Then
    A ⊂ \bigcup_{n=1}^{∞} A_n.
It suffices to show that each A_n is nowhere dense. We first prove that A_n is closed.
Let (f_k)_{k∈N} ⊂ A_n be a sequence that converges to some f ∈ C([0, 1]). We show that f ∈ A_n. Indeed, by assumption, there exists (t_k)_{k∈N} ⊂ [0, 1] such that
    \Big| \frac{f_k(t_k+h) - f_k(t_k)}{h} \Big| ≤ n
holds for all k ≥ 1 if t_k + h ∈ [0, 1]. By the Bolzano-Weierstrass theorem, we may assume without loss of generality that (t_k)_{k∈N} converges to some t ∈ [0, 1] (by passing to a subsequence). Then, by continuity of f,
    \Big| \frac{f(t+h) - f(t)}{h} \Big| = \lim_{k→∞} \Big| \frac{f_k(t_k+h) - f_k(t_k)}{h} \Big| ≤ n.
Therefore, f ∈ An and An is closed. Also, An has empty interior. Indeed, one can see
that C([0, 1]) \ An is dense because every f ∈ C([0, 1]) can be uniformly approximated
by a function that has arbitrarily large slope (think of “sawtooth” functions).
Exercise 6.13. Provide the details of this argument: show that An has empty
interior.

The Baire category theorem implies that A has empty interior. In other words, the
set of nowhere differentiable functions C([0, 1])\A is dense. In this sense, it is “generic”
behavior for continuous functions to be nowhere differentiable. In particular, we can
conclude that there exists f ∈ C([0, 1]) \ A (so f is nowhere differentiable) without
actually constructing such a function. On the other hand, one can also give explicit
examples of nowhere differentiable functions.
Example 6.14 (Weierstrass' function). Consider the function f ∈ C([0, 1]) defined as
    f(x) = \sum_{n=0}^{∞} b^{-nα} \sin(b^n x),
where 0 < α < 1 and b > 1 are fixed. The function f is indeed continuous because
the series is uniformly convergent. In fact, f is the uniform limit of the sequence of
functions (fN )N considered in Exercise 1.105.
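One can get a feeling for this behavior numerically. The sketch below (illustrative only, with the arbitrary choices α = 1/2, b = 2 and a truncation of the series at 30 terms) approximates f by a partial sum and prints difference quotients at shrinking scales; they grow rather than settle toward a limit.

    import math

    alpha, b, terms = 0.5, 2.0, 30

    def f(x):
        # partial sum of the series  sum_n b^{-n*alpha} * sin(b^n * x)
        return sum(b ** (-n * alpha) * math.sin(b ** n * x) for n in range(terms))

    x0 = 0.3
    for k in range(2, 22, 4):
        h = 2.0 ** -k
        print(k, (f(x0 + h) - f(x0)) / h)   # difference quotients do not stabilize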
Exercise 6.15. Show that f is nowhere differentiable.

2. Sets of continuity*
Definition 6.16. Let X, Y be metric spaces and f : X → Y a map. The set
Cf = {x ∈ X : f is continuous at x} ⊂ X
is called the set of continuity of f . Similarly, Df = X \ Cf is called the set of
discontinuity of f .
Example 6.17. Let f : R → R be defined by f (x) = 1 if x is rational and f (x) = 0
if x is irrational. Then Cf = ∅.
Example 6.18. Let f : R → R be defined by f (x) = x if x is rational and f (x) = 0
if x is irrational. Then Cf = {0}.
Example 6.19. Consider the function f : R → R defined as follows: we set f(0) = 1 and if x ∈ Q \ {0}, then we let f(x) = 1/q, where x = p/q with p ∈ Z, q ∈ N and the greatest common divisor of p and q equal to one. If x ∉ Q, then we let f(x) = 0. We claim that C_f = R \ Q. Indeed, say x ∈ R \ Q and let p_n/q_n → x be a rational approximation. Then q_n → ∞ (otherwise, along a subsequence the denominators take some fixed value q, and then x = lim p_n/q would be rational). This implies that f is continuous at x. On the other hand, say x ∈ Q. Set x_n = x + \frac{\sqrt{2}}{n}. Then x_n ∉ Q because \sqrt{2} ∉ Q, so f(x_n) = 0 for all n, so \lim_{n→∞} f(x_n) = 0, but f(x) ≠ 0. Hence f is not continuous at x.
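A small numeric experiment (a sketch only) makes this plausible: evaluating f along rational approximations of an irrational point gives values tending to 0, while near a rational point there are nearby points where f is 0 even though f itself is not. In the sketch, for the purposes of the illustration, non-Fraction inputs stand in for irrational numbers.

    from fractions import Fraction
    import math

    def f(x):
        # f(p/q) = 1/q in lowest terms (f(0) = 1); irrational inputs (modeled by floats) give 0
        if isinstance(x, Fraction):
            return 1.0 if x == 0 else 1.0 / x.denominator
        return 0.0

    # rational approximations p_n/q_n of sqrt(2): f(p_n/q_n) -> 0
    approx = [Fraction(math.isqrt(2 * 10 ** (2 * n)), 10 ** n) for n in range(1, 6)]
    print([f(r) for r in approx])

    # near the rational point 1/2, the points 1/2 + sqrt(2)/n give f = 0, while f(1/2) = 1/2
    print(f(Fraction(1, 2)), [f(0.5 + math.sqrt(2) / n) for n in (10, 100, 1000)])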
It is natural to ask which subsets of X arise as the set of continuity of some function
on X. For instance, does there exist a function f : R → R such that Cf = Q ?
Definition 6.20. A set A ⊂ X is called an Fσ -set if it is a countable union of
closed sets. A set G ⊂ X is called a Gδ -set if it is a countable intersection of open sets.
These names are motivated historically. The F in Fσ is for fermé which is French
for closed . On the other hand, the G in Gδ is for Gebiet which is German for region .
Examples 6.21. 1. Every open set is a Gδ -set and every closed set is an Fσ -set.
2. Let x ∈ X. Then {x} is a G_δ-set: it is the intersection of the open balls B(x, 1/n).
3. Q ⊂ R is an F_σ-set, because Q = \bigcup_{q∈Q} \{q\} (a countable union of closed sets).
Theorem 6.22. Let X and Y be metric spaces and f : X → Y a map. Then
Cf ⊂ X is a Gδ -set and Df is an Fσ -set.
Proof. Let f : X → Y be given. It suffices to show that Cf is a Gδ -set. For every
S ⊂ X we define the oscillation of f on S by
    ω_f(S) = \sup_{x,x'∈S} d_Y(f(x), f(x')) = diam f(S).
For a point x ∈ X we recall the definition of the oscillation of f at x,
    osc_f(x) = \inf_{ε>0} ω_f(B(x, ε)).
This coincides with Definition 1.91. Then we have
    x ∈ C_f ⟺ osc_f(x) = 0
and we can write the set of continuity of f as
    C_f = \bigcap_{n=1}^{∞} \{x ∈ X : osc_f(x) < \tfrac{1}{n}\}.
We are done if we can show that U_n = {x ∈ X : osc_f(x) < 1/n} is open for every n ∈ N. Let x_0 ∈ U_n. Then osc_f(x_0) < 1/n. Therefore, there exists ε > 0 such that ω_f(B(x_0, ε)) < 1/n. Let x ∈ B(x_0, ε/2). Then by the triangle inequality, B(x, ε/2) ⊂ B(x_0, ε). Therefore,
    osc_f(x) ≤ ω_f(B(x, ε/2)) ≤ ω_f(B(x_0, ε)) < \tfrac{1}{n}.
Thus, B(x_0, ε/2) ⊂ U_n and so U_n is open. □
As a sample application of the Baire category theorem we now answer one of our
previous questions negatively:
Lemma 6.23. Q ⊂ R is not a Gδ -set. Consequently, there exists no function f :
R → R such that Cf = Q.
Proof. Suppose Q is a Gδ -set. Then R \ Q is an Fσ -set and therefore can be
written as a countable union of closed sets A1 , A2 , . . . . Since R \ Q has empty interior
(its complement Q is dense), An ⊂ R \ Q also has empty interior for every n. Thus An
is nowhere dense, so R \ Q is meager. But then R = Q ∪ (R \ Q) must be meager, which
contradicts the Baire category theorem. 
Observe that an Fσ -set is either meager or has non-empty interior: suppose A ⊂ X
is an Fσ -set with empty interior. Then it is a countable union of closed sets with empty
interior and therefore meager. Similarly, a Gδ -set is either comeager or not dense.
Remark. It is natural to ask if the converse of Theorem 6.22 is true in the following
sense: given a Gδ -set G ⊂ X, can we find a function f : X → R such that Cf = G ?
This cannot hold in general: suppose X contains an isolated point, that is X contains an
open set of the form {x}. Then necessarily x ∈ Cf , but x is not necessarily contained in
every possible Gδ -set. However, this turns out to be the only obstruction: if X contains
no isolated points, then for every Gδ -set G ⊂ X one can find f : X → R such that
Cf = G. For a very short proof of this, see S. S. Kim: A Characterization of the Set
of Points of Continuity of a Real Function. Amer. Math. Monthly 106 (1999), no. 3,
258–259.

3. Baire functions*
Consider again the Dirichlet function D:
(6.2)   D(x) = 1 if x ∈ Q,   D(x) = 0 if x ∉ Q.
It is natural to ask whether D is the pointwise limit of a sequence of continuous func-
tions. The answer turns out to be no.
We start with a definition.


Definition 6.24. Let X be a metric space.
A function f : X → R is a Baire-1 function if there is a sequence of continuous
functions fn : X → R such that limn→∞ fn (x) = f (x) for all x ∈ X.
Clearly a Baire-1 function does not have to be continuous everywhere in X. However,
the following theorem shows that f will be continuous on a residual set.
Theorem 6.25. Let X be a complete metric space and let f : X → R be a Baire-1
function. Then the set Cf = {x : f is continuous at x} is a dense Gδ -set.
In the proof of this theorem we shall apply the Baire category theorem twice. The
first application is used in the following lemma.
Lemma 6.26. Let Y be a complete metric space and let fn : Y → R be continuous
functions on Y converging pointwise to f. Then for every α > 0 there is an N ∈ N and
an open ball B such that
|fn (x) − f (x)| ≤ α
for all n ≥ N and all x ∈ B.
Proof. Let
An = {x ∈ Y : |fj(x) − fk(x)| ≤ α for all j, k ≥ n}.
First observe that An is a closed set; indeed the set Ej,k = {x : |fj(x) − fk(x)| ≤ α} is
closed since |fj − fk| is continuous, and An = ⋂_{j,k≥n} Ej,k.
If x ∈ An and m ≥ n, then letting k → ∞ in |fm(x) − fk(x)| ≤ α gives |fm(x) − f(x)| ≤ α.
Moreover, since (fk(x))k∈N converges (and is in particular Cauchy) for every x ∈ Y, we have
Y = ⋃_{n=1}^∞ An. By Baire's theorem there exists N ∈ N such that AN is not nowhere
dense, and since AN is closed there is an open ball B ⊂ AN. For all n ≥ N and all x ∈ B
we then have |fn(x) − f(x)| ≤ α. We have thus proved the assertion. □
Proof of Theorem 6.25, conclusion. We recall the oscillation ωf(A) of f on a set A
and the oscillation oscf(x) of f at a point x from the previous section. Consider the
open sets WM = {x : oscf(x) < 1/M}; then Cf = ⋂_{M=1}^∞ WM. We show that WM
is dense in X. Let B0 be any open ball; we need to show that its intersection with WM is
not empty. Let B1 be an open ball such that B̄1 ⊂ B0. We apply Lemma 6.26 with the
complete metric space B̄1 (a closed subset of the complete space X) and α = (4M)^{−1},
and find an open ball B2 (which is an open ball in X contained in B0) and n ∈ N such
that |fn(x) − f(x)| ≤ (4M)^{−1} for all x ∈ B2. Since fn is continuous we find an open
ball B3 ⊂ B2 such that |fn(x) − fn(y)| < (4M)^{−1} for all x, y ∈ B3. Hence
|f(x) − f(y)| ≤ |f(x) − fn(x)| + |fn(x) − fn(y)| + |fn(y) − f(y)| ≤ 3/(4M) < 1/M
for x, y ∈ B3 and therefore oscf(x) < 1/M for all x ∈ B3. Thus B3 ⊂ WM.
We have identified the Gδ-set Cf as a countable intersection of dense open sets,
hence it is a dense set by Baire's theorem, and thus a residual set. □
Remark. Baire considered a hierarchy of increasing classes that we refer to as
Baire-n classes: One defines the Baire-0 class as the class of continuous functions on
X. Then for n ≥ 1 the Baire-n class consists of the pointwise limits of sequences of
functions in the Baire-(n − 1) class.
As an illustration consider an enumeration {rn } of the rational numbers and define
the function Dn so that Dn (rk ) = 1 for 1 ≤ k ≤ n and Dn (x) = 0 elsewhere. Verify that
the functions Dn are Baire-1 and that Dn converges pointwise to the Dirichlet function
in (6.2); this identifies D as a Baire-2 function which by Theorem 6.25 is not Baire-1.
Alternatively one can also use the formula D(x) = limj→∞ (limm→∞ (cos(j!πx))^{2m}) to
show that D is Baire-2.
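The double-limit formula can be explored numerically. The following Python sketch (illustration only; j = 8 and m = 200 are sample values) evaluates the inner expression (cos(j!πx))^{2m} at a few points: it is essentially 1 at rationals with small denominator and already very small at the sample irrationals.

import math

def inner(x, j=8, m=200):
    # The inner approximation (cos(j! * pi * x))^(2m) from the formula above.
    return math.cos(math.factorial(j) * math.pi * x) ** (2 * m)

# For a rational x = p/q with q <= j, j!*x is an integer, so cos(j!*pi*x) = +-1
# and the power equals 1; for an irrational x the base has absolute value < 1,
# so large m pushes the value towards 0.  (For a fixed j some irrationals can
# still give values close to 1, which is why the limit in j is needed as well.)
for x in [0.5, 2 / 3, math.sqrt(2) - 1, math.sqrt(3) - 1]:
    print(f"x = {x:.6f}   (cos(j!*pi*x))^(2m) = {inner(x):.3e}")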

4. The uniform boundedness principle*


The following theorem is one of the cornerstones of functional analysis and is a
direct application of the Baire category theorem.
Theorem 6.27 (Banach-Steinhaus). Let X be a Banach space and Y a normed
vector space. Let F ⊂ L(X, Y ) be a family of bounded linear operators. Then
sup_{T∈F} ‖Tx‖Y < ∞ for all x ∈ X  ⇐⇒  sup_{T∈F} ‖T‖op < ∞.

In other words, a family of bounded linear operators is uniformly bounded if and only
if it is pointwise bounded.
This theorem is also called the uniform boundedness principle.
Proof. In the '⇐' direction there is nothing to show. Let us prove '⇒'. Suppose
that sup_{T∈F} ‖Tx‖Y < ∞ for all x ∈ X. Define
An = {x ∈ X : sup_{T∈F} ‖Tx‖Y ≤ n} ⊂ X.
An is a closed set: if (xk)k∈N ⊂ An is a sequence with xk → x ∈ X, then since T
is continuous, ‖Tx‖Y = limk→∞ ‖Txk‖Y ≤ n for all T ∈ F, so x ∈ An. Also, the
assumption sup_{T∈F} ‖Tx‖Y < ∞ for all x ∈ X implies that
X = ⋃_{n=1}^∞ An.
By the Baire category theorem, X is not meager. Thus, there exists n0 ∈ N such that
An0 has non-empty interior. This means that there exists x0 ∈ An0 and ε > 0 such that
B(x0, ε) ⊂ An0.
Let x ∈ X be such that ‖x‖X ≤ ε. Then for all T ∈ F,
‖Tx‖Y = ‖T(x0 − x) − Tx0‖Y ≤ ‖T(x0 − x)‖Y + ‖Tx0‖Y ≤ 2n0.
Now we use the usual scaling trick: let x ∈ X satisfy ‖x‖X = 1. Then
‖Tx‖Y = ε^{−1} ‖T(εx)‖Y ≤ 2ε^{−1} n0.
This implies
sup_{T∈F} ‖T‖op = sup_{T∈F} sup_{‖x‖X=1} ‖Tx‖Y ≤ 2ε^{−1} n0 < ∞. □

Example 6.28. If X is not complete, then the conclusion of the theorem may fail.
For instance, let X be the space of all sequences (xn)n∈N ⊂ R such that at most finitely
many of the xn are non-zero. Equip X with the norm ‖x‖∞ = supn∈N |xn|. Define
ℓn : X → R by ℓn(x) = nxn. Each ℓn is a bounded linear map because
|ℓn(x)| = |nxn| ≤ n‖x‖∞.
For every x ∈ X there exists Nx ∈ N such that xn = 0 for all n > Nx. This implies that
sup_{n∈N} |ℓn(x)| = max{|ℓn(x)| : n = 1, . . . , Nx} < ∞.
But ‖ℓn‖op ≥ n because |ℓn(en)| = n (where en denotes the sequence such that en(m) =
0 for every m ≠ n and en(n) = 1). Thus,
sup_{n∈N} ‖ℓn‖op = ∞.
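A short Python sketch of this example (illustration only; finitely supported sequences are modelled as finite lists padded with zeros): each ℓn(x) stays bounded for a fixed x, while ℓn(en) = n shows that the operator norms are unbounded.

def ell(n, x):
    # l_n(x) = n * x_n for a finitely supported sequence x given as a list
    # (indices start at 1, as in the example; entries beyond the list are 0).
    return n * (x[n - 1] if n <= len(x) else 0.0)

x = [3.0, -1.0, 0.5]                      # a fixed finitely supported sequence
print([ell(n, x) for n in range(1, 9)])   # pointwise bounded: only finitely many nonzero values

def e(n):
    return [0.0] * (n - 1) + [1.0]        # the unit sequence e_n

print([ell(n, e(n)) for n in range(1, 9)])  # l_n(e_n) = n, so the operator norms are unbounded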

Remark. In the proof we only needed that X is not meager. This is true if X is
complete, but it may also be true for an incomplete space.
As a first application of the uniform boundedness principle we prove that the point-
wise limit of a sequence of bounded linear operators on a Banach space must be a
bounded linear operator.
Corollary 6.29. Let X be a Banach space and Y a normed vector space. Suppose
(Tn)n∈N ⊂ L(X, Y) is such that (Tn x)n∈N converges to some Tx for every x ∈ X. Then
T ∈ L(X, Y).
Proof. Linearity of T follows from linearity of limits. It remains to show that T is
bounded. For every x ∈ X the sequence (Tn x)n∈N converges, so supn ‖Tn x‖Y < ∞
(convergent sequences are bounded). By the Banach-Steinhaus theorem, there exists
C ∈ (0, ∞) such that ‖Tn‖op ≤ C for every n. Hence for every x ∈ X,
‖Tx‖Y = limn→∞ ‖Tn x‖Y ≤ C‖x‖X. □

Remark. Note that in the context of Corollary 6.29 it does not follow that Tn → T in
L(X, Y). For instance, let Tn : ℓ¹ → ℓ¹ be given by Tn(x) = xn en. Then Tn(x) → 0 as
n → ∞ for every x ∈ ℓ¹, but ‖Tn‖op = 1 for every n ∈ N, so Tn does not converge to 0
in L(ℓ¹, ℓ¹).
4.1. An application to Fourier series. Recall that for a 1-periodic continuous
function f : R → C we defined the partial sums of its Fourier series by
SN f(x) = ∑_{n=−N}^{N} cn e^{2πinx} = f ∗ DN(x),
where cn = ∫_0^1 f(t) e^{−2πint} dt and DN(x) = ∑_{n=−N}^{N} e^{2πinx} = sin(2π(N + 1/2)x)/sin(πx) is the Dirichlet
kernel (see Section 4).
The uniform boundedness principle directly implies the following:
Corollary 6.30. Let x0 ∈ R. There exists a 1-periodic continuous function f such
that the sequence (SN f (x0 ))N ⊂ C does not converge. That is, the Fourier series of f
does not converge at x0 .
In particular, this means that the Dirichlet kernels do not form an approximation
of unity. To see why this is a consequence of the uniform boundedness principle, we
first need to take another close look at the partial sums.
Lemma 6.31. There exists a constant c ∈ (0, ∞) such that for every N ∈ N,
∫_0^1 |DN(x)| dx ≥ c log(N).

Proof. Since |sin(x)| ≤ |x|,
∫_0^1 |DN(x)| dx = ∫_0^1 |sin(2π(N + 1/2)x)| / |sin(πx)| dx ≥ π^{−1} ∫_0^1 |sin(2π(N + 1/2)x)| / x dx.
Changing variables 2π(N + 1/2)x ↦ x we see that the right hand side of this display
equals
π^{−1} ∫_0^{π(2N+1)} |sin(x)|/x dx = π^{−1} ∑_{k=0}^{2N} ∫_{πk}^{π(k+1)} |sin(x)|/x dx.
We have that
∑_{k=0}^{2N} ∫_{πk}^{π(k+1)} |sin(x)|/x dx ≥ ∑_{k=0}^{2N} ∫_{πk+π/2−π/100}^{πk+π/2+π/100} |sin(x)|/x dx ≥ c ∑_{k=0}^{2N} ∫_{πk+π/2−π/100}^{πk+π/2+π/100} dx/x.
Here we have used that |sin(x)| ≥ c for some positive number c whenever |x| is at most
π/100 away from πk + π/2 for some integer k ∈ Z (indeed, |sin(x)| ≥ sin(π/2 − π/100) > 0
for such x). Since x ↦ 1/x is a decreasing function,
∫_{πk+π/2−π/100}^{πk+π/2+π/100} dx/x ≥ (π/50) · 1/(πk + π/2 + π/100) ≥ (1/50) · 1/(k + 1).
Thus,
∑_{k=0}^{2N} ∫_{πk+π/2−π/100}^{πk+π/2+π/100} dx/x ≥ (1/50) ∑_{k=0}^{2N} 1/(k + 1) ≥ (1/50) ∑_{k=0}^{2N} ∫_{k+1}^{k+2} dx/x = (1/50) ∫_1^{2N+2} dx/x = (1/50) log(2N + 2),
which implies the claim. 
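The logarithmic growth of ‖DN‖1 can be checked numerically. The following Python sketch (a midpoint-rule approximation of the integral, for illustration only) computes ‖DN‖1 for a few values of N and prints log N alongside.

import numpy as np

def dirichlet_L1(N, samples=400000):
    # Midpoint-rule approximation of int_0^1 |D_N(x)| dx with
    # D_N(x) = sin(2*pi*(N + 1/2)*x) / sin(pi*x); midpoints avoid the
    # removable singularities at x = 0 and x = 1.
    x = (np.arange(samples) + 0.5) / samples
    DN = np.sin(2 * np.pi * (N + 0.5) * x) / np.sin(np.pi * x)
    return np.mean(np.abs(DN))

for N in [4, 16, 64, 256, 1024]:
    print(f"N = {N:5d}   ||D_N||_1 = {dirichlet_L1(N):7.3f}   log N = {np.log(N):6.3f}")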
Let us denote the space of 1-periodic continuous functions f : R → C by C(T) (here
T = R/Z = S¹ is the unit circle, which is a compact metric space¹). Then C(T) is a
Banach space. Fix x0 ∈ R. We can define a linear map TN : C(T) → C by
TN f = SN f(x0).
Lemma 6.32. For every N ∈ N, TN : C(T) → C is a bounded linear map and
‖TN‖op = ‖DN‖1.
(Here ‖DN‖1 = ∫_0^1 |DN(x)| dx.)
Proof. For every f ∈ C(T) we have
|TN f| = |f ∗ DN(x0)| ≤ ∫_0^1 |f(x0 − t) DN(t)| dt ≤ ‖f‖∞ ∫_0^1 |DN(t)| dt = ‖f‖∞ ‖DN‖1.
Therefore, TN is bounded and ‖TN‖op ≤ ‖DN‖1. To prove the lower bound we let
f(x) = sgn(DN(x0 − x)).
While f is not a continuous function, it can be approximated by continuous functions
as the following exercise shows.
Exercise 6.33. Show that for every ε > 0 there exists g ∈ C(T) such that |g(t)| ≤ 1
for all t ∈ R and
∫_0^1 |f(t) − g(t)| dt ≤ ε/(2N + 1).
Hint: Modify the function f in a small enough neighborhood of each discontinuity; g
can be chosen to be a piecewise linear function.
¹The metric being the quotient metric inherited from R or the subspace metric induced by the
inclusion S¹ ⊂ R². These metrics are equivalent.
So let ε > 0 and choose g ∈ C(T) as in the exercise. We have
|TN f| = |f ∗ DN(x0)| = ∫_0^1 sgn(DN(t)) DN(t) dt = ∫_0^1 |DN(t)| dt = ‖DN‖1.
Moreover,
|TN g| ≥ |TN f| − |TN(f − g)|.
The error term |TN(f − g)| can be estimated as follows:
|TN(f − g)| ≤ ∫_0^1 |DN(x0 − t)| |f(t) − g(t)| dt ≤ ‖DN‖∞ ∫_0^1 |f(t) − g(t)| dt ≤ (2N + 1) · ε/(2N + 1) = ε,
and so
‖TN‖op ≥ |TN g| ≥ ‖DN‖1 − ε.
Since ε > 0 was arbitrary, this implies ‖TN‖op ≥ ‖DN‖1. □
Armed with this knowledge, we can now reveal Corollary 6.30 as a direct consequence of Theorem 6.27. Indeed, we have that
‖TN‖op = ‖DN‖1 ≥ c log(N)
and therefore
sup_{N∈N} ‖TN‖op = ∞.
So by Theorem 6.27 there must exist an f ∈ C(T) such that
sup_{N∈N} |TN f| = ∞.
In other words, (SN f(x0))N does not converge.

Remark. Continuous functions with divergent Fourier series can also be constructed
explicitly. The conclusion of Corollary 6.30 can be strengthened significantly: for
every Lebesgue null set² A ⊂ T there exists a continuous function whose Fourier series
diverges on A (see J.-P. Kahane, Y. Katznelson: Sur les ensembles de divergence des
séries trigonométriques, Studia Math. 26 (1966), 305–306).
On the other hand, L. Carleson proved in 1966 that the Fourier series of a continuous
function must always converge almost everywhere (that is, everywhere except possibly
on a Lebesgue null set). This is a very deep result in Fourier analysis which is difficult
to prove (see M. Lacey, C. Thiele: A proof of boundedness of the Carleson operator,
Math. Res. Lett. 7 (2000), no. 4, 361–370 for a very elegant proof).
5. Further exercises
Exercise 6.34. We define the subset A ⊂ R as follows: x ∈ A if and only if there
exists c > 0 such that
|x − j·2^{−k}| ≥ c·2^{−k}
holds for all j ∈ Z and integers k ≥ 0. Show that A is meager and dense.
Exercise 6.35. Let (X, d) be a complete metric space without isolated points.
Prove that X cannot be countable.
²See Exercise 6.10 for a definition on R; Lebesgue null sets of T are precisely the images of Lebesgue
null sets on R under the canonical quotient map R → R/Z = T.
Exercise 6.36. (i) Show that if X is a normed vector space and U ⊂ X a proper
subspace, then U has empty interior.
(ii) Let
X = {P : R → R | P is a polynomial}.
Use the Baire category theorem to prove that there exists no norm ‖·‖ on X such that
(X, ‖·‖) is a Banach space.
(iii) Let X be an infinite dimensional Banach space. Prove that X cannot have a
countable (linear-algebraic) basis.
Exercise 6.37. Consider X = C([−1, 1]) equipped with the usual norm ‖f‖∞ =
sup_{t∈[−1,1]} |f(t)|. Let
A+ = {f ∈ X : f (t) = f (−t) ∀t ∈ [−1, 1]},
A− = {f ∈ X : f (t) = −f (−t) ∀t ∈ [−1, 1]}.
(i) Show that A+ and A− are meager.
(ii) Is A+ + A− = {f + g : f ∈ A+ , g ∈ A− } meager?
Exercise 6.38. Construct a function f : R → R such that f is continuous at every
x ∈ Z and discontinuous at every x 6∈ Z.
Exercise 6.39. For every interval (open, half-open or closed) I ⊂ R give an example
of a function f : R → R such that f is continuous on I and discontinuous on R \ I.
Exercise 6.40∗. Let f : R → R be a smooth function so that for every x ∈ R there
exists n ≥ 0 with f (n) (x) = 0. Prove that f is a polynomial.
APPENDIX A

Review

1. Series
Let (an)n∈N be a sequence of complex numbers. Recall that we say that the series
∑_{n=1}^∞ an converges if the sequence of partial sums (∑_{n=1}^N an)N∈N converges. In that
case, the symbol ∑_{n=1}^∞ an represents the limit of this sequence. If the summands are
non-negative (that is, an ≥ 0 for all n ∈ N), then we also write
∑_{n=1}^∞ an < ∞
to denote that the series ∑_{n=1}^∞ an converges. The series ∑_{n=1}^∞ an is said to converge
absolutely if the series ∑_{n=1}^∞ |an| converges.
Similarly, given a sequence of functions (fn)n∈N on a metric space X we say that
∑_{n=1}^∞ fn converges uniformly, if the sequence of partial sums (∑_{n=1}^N fn)N∈N converges
uniformly.
We will also sometimes consider doubly infinite series of the form ∑_{n=−∞}^∞ an for a
sequence of complex numbers (an)n∈Z. Such a series is considered convergent if each of
the series ∑_{n=0}^∞ a−n and ∑_{n=1}^∞ an converges (and its value is in this case the sum of the
values of these two series).
Lemma A.1 (Weierstrass M -test). Let (fn )n∈N be a sequence of functions on a
metric space X such that there exists a sequence of non-negative real numbers (Mn )n∈N
with
|fn (x)| ≤ Mn
for all n = 1, 2, . . . and all x ∈ X. Assume that ∑_{n=1}^∞ Mn converges. Then the series
∑_{n=1}^∞ fn converges uniformly.

Proof. Let sm(x) = ∑_{k=1}^m fk(x). For ℓ < m we observe the estimate
|sm(x) − sℓ(x)| = |∑_{k=ℓ+1}^m fk(x)| ≤ ∑_{k=ℓ+1}^m |fk(x)| ≤ ∑_{k=ℓ+1}^m Mk.
Since ∑_k Mk converges there is, given ε > 0, an Nε such that ∑_{k=N1}^{N2} Mk < ε provided
that N1, N2 are greater than Nε (why?). Use this fact and the displayed estimate to
conclude the proof. □
conclude the proof. 
Lemma A.2. Suppose (fn )n∈N is a sequence of Riemann integrable functions on the
interval [a, b] which uniformly converges to some limit f on [a, b]. Then f is Riemann
integrable and
lim_{n→∞} ∫_a^b fn = ∫_a^b f.

Proof. Let a < b and recall that for two Riemann integrable functions h1, h2 on
the interval [a, b] which satisfy h1(x) ≤ h2(x) for all x ∈ [a, b] we also have ∫_a^b h1 ≤ ∫_a^b h2
(one proves this by considering first the corresponding inequalities for Riemann upper
and lower sums). Apply this fact together with the linearity of the integral to get
|∫_a^b fn − ∫_a^b f| = |∫_a^b (fn − f)| ≤ ∫_a^b |fn − f| ≤ (b − a) sup_{[a,b]} |fn − f|.
If fn converges to f uniformly then the right hand side converges to 0. □


Exercise A.3. Prove some of these facts yourself.
2. Power series
A power series is a function of the form
f(x) = ∑_{n=0}^∞ cn x^n
where cn ∈ C are some complex coefficients.
To a power series we can associate a number R ∈ [0, ∞] called its radius of conver-
gence such that
• ∑_{n=0}^∞ cn x^n converges for every |x| < R,
• ∑_{n=0}^∞ cn x^n diverges for every |x| > R.
On the convergence boundary |x| = R, the series may converge or diverge. The number
R can be computed by the Cauchy-Hadamard formula:
R = (lim sup_{n→∞} |cn|^{1/n})^{−1}
(with the convention that if lim sup_{n→∞} |cn|^{1/n} = 0, then R = ∞).

[Figure 1. Radius of convergence]
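As a quick numerical illustration of the Cauchy-Hadamard formula (a Python sketch; the coefficients cn = 2^n/n are a sample choice, for which one expects R = 1/2), one can compute |cn|^{1/n} via logarithms to avoid overflow:

import math

# Sample coefficients c_n = 2^n / n; then |c_n|^(1/n) = exp((n*log 2 - log n)/n).
for n in [10, 100, 1000, 10**5]:
    root = math.exp((n * math.log(2.0) - math.log(n)) / n)
    print(f"n = {n:6d}   |c_n|^(1/n) = {root:.6f}")
# The values approach 2, consistent with R = 1 / limsup |c_n|^(1/n) = 1/2.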

Lemma A.4. A power series with radius of convergence R converges uniformly on


[−R + ε, R − ε] for every 0 < ε < R. Consequently, power series are continuous on
(−R, R).
Exercise A.5. Prove this. Uniform convergence does not necessarily hold on
(−R, R); give an example.
Lemma A.6. If f(x) = ∑_{n=0}^∞ cn x^n has radius of convergence R, then f is differentiable on (−R, R) and
f′(x) = ∑_{n=1}^∞ n cn x^{n−1}
for |x| < R.
Example A.7. The exponential function is a power series defined by
exp(x) = ∑_{n=0}^∞ x^n/n!
The radius of convergence is R = ∞.
Lemma A.8. The exponential function is differentiable and exp′(x) = exp(x) for all
x ∈ R.
Lemma A.9. For all x, y ∈ R we have the functional equation
exp(x + y) = exp(x) exp(y).
It also makes sense to speak of exp(z) for z ∈ C since the series converges absolutely.
We also write ex instead of exp(x).
Example A.10. The trigonometric functions can also be defined by power series:
cos(x) = ∑_{n=0}^∞ (−1)^n x^{2n}/(2n)!,
sin(x) = ∑_{n=0}^∞ (−1)^n x^{2n+1}/(2n + 1)!
Lemma A.11. The functions sin and cos are differentiable and
sin′(x) = cos(x),   cos′(x) = −sin(x).
The trigonometric functions are related to the exponential function via complex
numbers.
Lemma A.12 (Euler’s identity). For all x ∈ R,
e^{ix} = cos(x) + i sin(x),
cos(x) = (e^{ix} + e^{−ix})/2,
sin(x) = (e^{ix} − e^{−ix})/(2i).
Lemma A.13 (Pythagorean theorem). For all x ∈ R,
cos(x)² + sin(x)² = 1.
Let us also recall basic properties of complex numbers at this point: For every
complex number z ∈ C there exist a, b ∈ R, r ≥ 0 and φ ∈ [0, 2π) such that
z = a + ib = re^{iφ}.
The complex conjugate of z is defined by
z̄ = a − ib = re^{−iφ}.
The absolute value of z is defined by
|z| = √(a² + b²) = r.
We have
|z|² = z z̄.
[Figure 2. Polar and cartesian coordinates in the complex plane: z = a + ib = re^{iφ}.]

We finish the review section with a simple, but powerful theorem on the continuity
of power series on the convergence boundary.
Theorem A.14 (Abel). Let f(x) = ∑_{n=0}^∞ cn x^n be a power series with radius of
convergence R = 1. Assume that ∑_{n=0}^∞ cn converges. Then
lim_{x→1−} f(x) = ∑_{n=0}^∞ cn.
(In particular, the limit exists.)


The key idea for the proof is Abel summation, also referred to as summation by
parts. The precise formula can be derived simply by reordering terms (we set a−1 = 0):
∑_{n=0}^N (an − an−1) bn = a0 b0 + a1 b1 − a0 b1 + a2 b2 − a1 b2 + · · · + aN bN − aN−1 bN
= a0 (b0 − b1) + a1 (b1 − b2) + · · · + aN−1 (bN−1 − bN) + aN bN
= aN bN + ∑_{n=0}^{N−1} an (bn − bn+1).
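The identity is easy to check numerically; the following Python sketch (random sample data, illustration only) compares both sides for one choice of (an) and (bn):

import random

N = 20
a = [random.random() for _ in range(N + 1)]
b = [random.random() for _ in range(N + 1)]

# Left-hand side: sum_{n=0}^{N} (a_n - a_{n-1}) b_n with a_{-1} = 0.
lhs = sum((a[n] - (a[n - 1] if n > 0 else 0.0)) * b[n] for n in range(N + 1))
# Right-hand side: a_N b_N + sum_{n=0}^{N-1} a_n (b_n - b_{n+1}).
rhs = a[N] * b[N] + sum(a[n] * (b[n] - b[n + 1]) for n in range(N))

print(abs(lhs - rhs))  # of the order of machine rounding error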
Proof. To apply summation by parts we set sn = ∑_{k=0}^n ck, s−1 = 0. Then
∑_{n=0}^N cn x^n = ∑_{n=0}^N (sn − sn−1) x^n = sN x^N + (1 − x) ∑_{n=0}^{N−1} sn x^n.
Let 0 < x < 1. Then
f(x) = (1 − x) ∑_{n=0}^∞ sn x^n.
Let s = ∑_{n=0}^∞ cn. By assumption, sn → s. Let ε > 0 and choose N ∈ N such that
|sn − s| < ε
for all n > N. Then,
|f(x) − s| = |(1 − x) ∑_{n=0}^∞ (sn − s) x^n|,
because (1 − x) ∑_{n=0}^∞ x^n = 1. Now we use the triangle inequality and split the sum at
n = N:
|f(x) − s| ≤ (1 − x) ∑_{n=0}^N |sn − s| x^n + (1 − x) ∑_{n=N+1}^∞ |sn − s| x^n
≤ (1 − x) ∑_{n=0}^N |sn − s| x^n + ε,
where in the second sum we used |sn − s| ≤ ε for n > N and (1 − x) ∑_{n=N+1}^∞ x^n ≤ 1.
By making x sufficiently close to 1 we can achieve that
(1 − x) ∑_{n=0}^N |sn − s| x^n ≤ ε.

This concludes the proof. 


Abel’s theorem provides a tool to evaluate convergent series.
Example A.15. Consider the power series
f(x) = ∑_{n=0}^∞ (−1)^n x^{2n+1}/(2n + 1).
The radius of convergence is R = 1. This is the Taylor series at x = 0 of the function
arctan.
Exercise A.16. (a) Prove that f (x) really is the Taylor series at x = 0 of arctan.
(b) Prove that arctan(x) is represented by its Taylor series at x = 0 for every |x| < 1,
i.e. that f (x) = arctan(x) for |x| < 1.
It follows from the alternating series test that ∑_{n=0}^∞ (−1)^n/(2n + 1) converges. Thus, Abel's
theorem implies that
∑_{n=0}^∞ (−1)^n/(2n + 1) = lim_{x→1−} arctan(x) = arctan(1) = π/4.
This is also known as Leibniz' formula.
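Numerically the convergence is quite slow: the alternating series estimate bounds the error after N terms by 1/(2N + 1). A quick Python check (illustration only):

import math

N = 10**6
s = sum((-1) ** n / (2 * n + 1) for n in range(N))
print(s, math.pi / 4, abs(s - math.pi / 4))  # error stays below 1/(2N+1) = 5e-7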

3. Taylor’s theorem
Theorem A.17. Let I be an interval and let f ∈ C^{n+1}(I), i.e. all derivatives of f
up to order n + 1 are continuous in I. Fix a ∈ I. Then for all x ∈ I,
f(x) = ∑_{k=0}^n f^{(k)}(a)/k! · (x − a)^k + Rn(x, a),
where
Rn(x, a) = ∫_a^x (x − t)^n/n! · f^{(n+1)}(t) dt
(A.1)          = (x − a)^{n+1}/n! · ∫_0^1 (1 − s)^n f^{(n+1)}(a + s(x − a)) ds.

Proof. We first observe that the second version and the first version of the remain-
der term are equivalent by changing variables (via the substitution t = a + s(x − a),
dt = (x − a)ds; note that t ranges from a to x as s ranges from 0 to 1).
For n = 0 the formula reads
f(x) = f(a) + ∫_a^x f′(t) dt,
which just follows from the fundamental theorem of calculus.
We also find by integration by parts, for f ∈ C^{(n+2)}(I),
∫_a^x (x − t)^n f^{(n+1)}(t) dt = [−(x − t)^{n+1}/(n + 1) · f^{(n+1)}(t)]_{t=a}^{t=x} − ∫_a^x −(x − t)^{n+1}/(n + 1) · f^{(n+2)}(t) dt
= (x − a)^{n+1}/(n + 1) · f^{(n+1)}(a) + ∫_a^x (x − t)^{n+1}/(n + 1) · f^{(n+2)}(t) dt,
which (after dividing by n!) shows
Rn(x, a) = (x − a)^{n+1}/(n + 1)! · f^{(n+1)}(a) + Rn+1(x, a)
and establishes the induction step of the proof of the formula. To be precise, if (∗N)
denotes the statement that
f(x) = ∑_{k=0}^N f^{(k)}(a)/k! · (x − a)^k + RN(x, a)
holds for all f ∈ C^{(N+1)}(I), then (∗N) implies (∗N+1) for all N = 0, 1, 2, . . . . □
Theorem A.18. Let f be as in Theorem A.17 and let Rn be as in (A.1). Let
Mn+1 = max{|f^{(n+1)}(a + s(x − a))| : 0 ≤ s ≤ 1}.
Then
|Rn(x, a)| ≤ Mn+1/(n + 1)! · |x − a|^{n+1}.
Proof. We have
|Rn(x, a)| ≤ |x − a|^{n+1}/n! · ∫_0^1 (1 − s)^n |f^{(n+1)}(a + s(x − a))| ds
≤ Mn+1 |x − a|^{n+1}/n! · ∫_0^1 (1 − s)^n ds = Mn+1 |x − a|^{n+1}/(n + 1)!. □
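As a numerical illustration of this bound, the following Python sketch (illustration only) takes f = exp, a = 0 and x = 1, so that Mn+1 = e:

import math

x = 1.0  # expand exp around a = 0 and evaluate at x = 1, so M_{n+1} = e^x = e
for n in range(1, 9):
    taylor = sum(x ** k / math.factorial(k) for k in range(n + 1))
    remainder = abs(math.exp(x) - taylor)                        # the actual |R_n(x, 0)|
    bound = math.exp(x) * x ** (n + 1) / math.factorial(n + 1)   # M_{n+1} |x|^{n+1} / (n+1)!
    print(f"n = {n}   |R_n| = {remainder:.3e}   bound = {bound:.3e}")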
Theorem A.19. Let f be as in Theorem A.17 and let Rn be as in (A.1). There is ξ
between a and x such that
Rn(x, a) = (x − a)^{n+1}/(n + 1)! · f^{(n+1)}(ξ).
Proof. Let
m = min{f^{(n+1)}(a + s(x − a)) : 0 ≤ s ≤ 1},
M = max{f^{(n+1)}(a + s(x − a)) : 0 ≤ s ≤ 1}.
We estimate
∫_0^1 (1 − s)^n m ds ≤ ∫_0^1 (1 − s)^n f^{(n+1)}(a + s(x − a)) ds ≤ ∫_0^1 (1 − s)^n M ds
and hence
m ≤ (n + 1) ∫_0^1 (1 − s)^n f^{(n+1)}(a + s(x − a)) ds ≤ M.
By the intermediate value theorem for continuous functions there is σ ∈ [0, 1] such that
(A.2)   f^{(n+1)}(a + σ(x − a)) = (n + 1) ∫_0^1 (1 − s)^n f^{(n+1)}(a + s(x − a)) ds.
If we set ξ = a + σ(x − a), so that ξ is on the line segment connecting a to x, we get the
claimed statement from (A.1) and (A.2). □

4. The Riemann integral


We recall some definitions. In what follows a, b ∈ R, with a < b are given. In this
section we recall basic definitions which lead to the definition of Riemann integrable
functions on [a, b], and the Riemann integral of such functions.
Definition A.20. (i) A partition P = {x0 , . . . , xn } of [a, b] is a finite subset of [a, b]
which includes the points a and b and is ordered in the following way:
a = x0 < ... < xi < xi+1 < ... < xn = b.
(ii) If P , P 0 are partitions of [a, b] with P ⊂ P 0 then P 0 is called a refinement of P .
Definition A.21. Given a partition P = {a = x0 < · · · < xn = b} of [a, b] and a
bounded function f : [a, b] → R define
mi(f) = inf_{t∈[xi−1,xi]} f(t),
Mi(f) = sup_{t∈[xi−1,xi]} f(t).
(i) The expression
L(f, P) = ∑_{i=1}^n mi(f)(xi − xi−1)
is called the lower sum of f with respect to the partition P.
(ii) The expression
U(f, P) = ∑_{i=1}^n Mi(f)(xi − xi−1)
is called the upper sum of f with respect to the partition P.
Lemma A.22. Let P, P′ be partitions of [a, b], let f : [a, b] → R be bounded, and let
P′ be a refinement of P. Then
(b − a) inf_{[a,b]} f ≤ L(f, P) ≤ L(f, P′) ≤ U(f, P′) ≤ U(f, P) ≤ (b − a) sup_{[a,b]} f.

Corollary A.23. Let P1 , P2 be partitions of [a, b]. Then L(f, P1 ) ≤ U (f, P2 ).


Definition A.24. Let f : [a, b] → R be a bounded function. The numbers
I̲_a^b(f) := sup_P L(f, P),    Ī_a^b(f) := inf_P U(f, P)
are called the lower and upper Riemann-Darboux integrals of f on the interval [a, b],
respectively. Here the sup and inf are taken over all partitions of [a, b].
Lemma A.25. Let f : [a, b] → R be bounded. Then
(b − a) inf_{[a,b]} f ≤ I̲_a^b(f) ≤ Ī_a^b(f) ≤ (b − a) sup_{[a,b]} f.
We are now ready to define the concept of Riemann integrable functions and the
Riemann integral of such functions.
Definition A.26. (i) Let f : [a, b] → R be bounded. f is called Riemann integrable
if I̲_a^b(f) = Ī_a^b(f).
(ii) If f is Riemann integrable, the number I̲_a^b(f) = Ī_a^b(f) is called the Riemann
integral of f, denoted by ∫_{[a,b]} f or by ∫_a^b f (or even by ∫_a^b f(t) dt).
Lemma A.27. Let f : [a, b] → R be a bounded function. Then f is Riemann
integrable if and only if for every ε > 0 there is a partition P of [a, b] such that
U (f, P ) − L(f, P ) < ε.
Proof. Suppose f is Riemann integrable and let ε > 0. Then there are partitions P1, P2 of [a, b]
such that L(f, P1) > ∫_a^b f − ε/2 and U(f, P2) < ∫_a^b f + ε/2, and thus U(f, P2) − L(f, P1) < ε.
Let P be the refinement P1 ∪ P2. Then U(f, P2) ≥ U(f, P) ≥ L(f, P) ≥ L(f, P1) and
hence U(f, P) − L(f, P) < ε.
Vice versa, assume that for every ε > 0 there is a partition Pε of [a, b] such that
U(f, Pε) − L(f, Pε) < ε.
Then Ī_a^b(f) − I̲_a^b(f) ≤ U(f, Pε) − L(f, Pε) < ε, and since ε was arbitrary we conclude
Ī_a^b(f) = I̲_a^b(f). Hence f is Riemann integrable. □
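The criterion of Lemma A.27 is easy to observe numerically. The following Python sketch (illustration only) computes L(f, P) and U(f, P) for f(x) = x² on [0, 1] with uniform partitions; since this f is increasing, the infimum and supremum on each subinterval are attained at its endpoints.

def lower_upper(f, a, b, n):
    # Lower and upper Darboux sums for an increasing function f on the uniform
    # partition a = x_0 < ... < x_n = b (inf/sup attained at the endpoints).
    xs = [a + i * (b - a) / n for i in range(n + 1)]
    L = sum(f(xs[i - 1]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    U = sum(f(xs[i]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    return L, U

for n in [10, 100, 1000, 10000]:
    L, U = lower_upper(lambda x: x * x, 0.0, 1.0, n)
    print(f"n = {n:6d}   L = {L:.6f}   U = {U:.6f}   U - L = {U - L:.1e}")
# Both sums pinch the value 1/3, and U - L = 1/n for this choice of f.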
Theorem A.28. If f : [a, b] → R is continuous in [a, b] then f is Riemann inte-
grable.
Proof. Recall that a continuous function on a compact set is uniformly continuous.
Hence given ε > 0 there exists δ > 0 such that |f(x) − f(x̃)| < ε/(b − a) provided that
|x − x̃| < δ. Let N be such that (b − a)/N < δ and choose the partition P = {xj :=
a + j(b − a)/N, j = 0, . . . , N}. Let Ij = [xj−1, xj], j = 1, . . . , N. Then
Mj(f) − mj(f) = sup_{Ij} f − inf_{Ij} f < ε/(b − a)
for j = 1, . . . , N, so that
U(f, P) − L(f, P) = ∑_{j=1}^N Mj(f)(xj − xj−1) − ∑_{j=1}^N mj(f)(xj − xj−1)
= ∑_{j=1}^N (Mj(f) − mj(f))(xj − xj−1) < ε/(b − a) · ∑_{j=1}^N (xj − xj−1) = ε/(b − a) · (b − a) = ε.
We can apply Lemma A.27 to see that f is Riemann integrable. □
Theorem A.29. (i) Let f and g be Riemann integrable functions on [a, b] and
suppose that f(x) ≤ g(x) for all x ∈ [a, b]. Then ∫_a^b f ≤ ∫_a^b g.
(ii) Let f be Riemann integrable on [a, b]. Then
|∫_a^b f| ≤ (b − a) sup_{[a,b]} |f|.

Exercise A.30. Let f : [a, b] → R be a bounded function. Under each of the
following hypotheses on f show that f is Riemann integrable.
(i) There is a point c ∈ [a, b] such that f is continuous on [a, b] \ {c}.
(ii) f is continuous except possibly at a finite number of points in [a, b].


(iii) f is continuous in [a, b] \ {ck : k ∈ N}, where (ck)k∈N is a convergent sequence
of points in [a, b].
Exercise A.31. Let f : [0, 1] → R be defined by
f(x) = x² if x ∈ Q,    f(x) = x if x ∈ [0, 1] \ Q.
Compute I̲_0^1(f) and Ī_0^1(f).
5. Further exercises
Exercise A.32. Prove or disprove convergence for each of the following series (a
and b are real parameters and convergence may depend on their values).
∞ ∞ ∞
X 1 X log n
a log
X
1/n n+1

(i) (ii) (log n) log n (iii) e − n
n=2
na (log(n))b n=3 n=1

∞ ∞  n 2 ∞
X
−1
X 1 X 1
(iv) cos(πn) sin(πn ) (v) 1+ −e (vi)
n=1 n=2
n n=1
n(n1/n )100

∞ ∞ X
10n ∞
X a
X nk  X 1
(vii) 2−(log(n)) (viii) (−1)k (ix)
n=2 n=1 k=0
k! n=1
n2 (1 − cos(n))
Exercise A.33. Prove or disprove convergence for each of the following sequences
and in case of
p convergence, determine the limit:
4 2 2
(i) an = n + cos(n
2 1
√)−n
(ii) an = n + 2 n − n4 + n3
P 2
(iii) an = nk=n k1
(iv) an = n ∞ 1
P
k=0 n2 +k2
(v) a0 = 1, an+1 = a2n + a1n
2
(vi) an = nk=2 k k−1
Q
2

Exercise A.34. For which x ∈ R do the following series converge? On which sets
do these series converge uniformly?

X ∞
X ∞
X
(i) 2 n
nx (ii) 1/n n n
(3 − 1) x (iii) tan(n−2 )enx
n=1 n=1 n=1
∞ ∞ ∞
X xn X sin(nx) X
(iv) (v) (vi) 2−n tan(bxc + 1/n)
n=1
nn n=1
n2 n=1

Exercise A.35. (i) Define f by setting f (x) = x for x ≥ 0 and f (x) = 0 for x < 0.
Then f is not differentiable at x = 0. Construct an example of a sequence (fn )n∈N of
continuously differentiable functions defined on R, uniformly convergent on R to f .
(ii) Let fn(x) = n^{−1/2} sin(nx). Show that fn converges uniformly on R, but for every
x ∈ R, the sequence (fn′(x))n∈N does not have a limit.
Exercise A.36. Give an example of a sequence (fn )n∈N of continuous bounded
functions on R that converges pointwise to some function f such that f is unbounded
and not continuous.
Exercise A.37. Determine the value of the series ∑_{n=1}^∞ (−1)^n/(n(n + 1)).

Exercise A.38. For a positive real number x define
f(x) = ∑_{n=0}^∞ 1/(n(n + 1) + x).
(i) Show that f : (0, ∞) → (0, ∞) is a well-defined and continuous function.
(ii) Prove that there exists a unique x0 ∈ (0, ∞) such that f (x0 ) = 2π.
(iii) Determine the value of x0 . Hint: Recall Leibniz’ formula from Example A.15.
Exercise A.39. Let f : R → R be a smooth function (i.e. derivatives of all orders
exist). Assume that there exist A > 0, R > 0 such that
|f^{(n)}(x)| ≤ A^n n!
for |x| < R. Show that there exists r > 0 such that for every |x| < r we have that
f(x) = ∑_{n=0}^∞ f^{(n)}(0)/n! · x^n.
(That is, prove that the series on the right hand side converges and that the limit is
f (x).)
