
Eidgenössische Technische Hochschule Zürich

Analysis II: Several Variables

Lecture Notes — Spring Semester 2024

Joaquim Serra

February 25, 2024



Preface
These notes are the continuation of ‘Analysis I: One Variable’ and follow the same format
and general spirit.
Originally crafted in German for the academic year 2016/2017 by Manfred Einsiedler
and Andreas Wieser, they were designed for the Analysis I and II courses in the Interdis-
ciplinary Natural Sciences, Physics, and Mathematics Bachelor programs. In the academic
year 2019/2020, a substantial revision was undertaken by Peter Jossen.

For the academic year 2023/2024, Joaquim Serra has developed this English version. It
differs from the German original in several aspects: reorganization and alternative proofs of
some materials, rewriting and expansion in certain areas, and a more concise presentation in
others. This version strictly aligns with the material presented in class, offering a streamlined
educational experience.

The courses Analysis I/II and Linear Algebra I/II are fundamental to the mathematics
curriculum at ETH and other universities worldwide. They lay the groundwork upon which
most future studies in mathematics and physics are built.

Throughout Analysis I/II, we will delve into various aspects of differential and integral
calculus. Although some topics might be familiar from high school, our approach requires
minimal prior knowledge beyond an intuitive understanding of variables and basic algebraic
skills. Contrary to high-school methods, our lectures emphasize the development of mathemat-
ical theory over algorithmic practice. Understanding and exploring topics such as differential
equations and multidimensional integral theorems is our primary goal. However, students are
encouraged to engage with numerous exercises from these notes and other resources to deepen
their understanding and proficiency in these new mathematical concepts.

Version: February 25, 2024. i


Contents

9 Metric spaces 2
9.1 Basics of Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
9.1.1 The Euclidean space Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
9.1.2 Definition of metric Space . . . . . . . . . . . . . . . . . . . . . . . . . . 5
9.1.3 Sequences, limits, and completeness . . . . . . . . . . . . . . . . . . . . 7
9.1.4 *The Reals as the Completion of Rationals (extra material; cf. Grund-
strukturen) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
9.2 Topology of Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 14
9.2.1 Open and closed sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
9.2.2 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
9.2.3 Banach’s Fixed-Point Theorem . . . . . . . . . . . . . . . . . . . . . . . 19
9.2.4 Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
9.2.5 Compactness and continuity . . . . . . . . . . . . . . . . . . . . . . . . . 26
9.2.6 Connectedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
9.3 Normed vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
9.3.1 Definition of Normed Vector spaces . . . . . . . . . . . . . . . . . . . . . 31
9.3.2 Inner product spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
9.3.3 Equivalence of norms in finite dimensional normed spaces . . . . . . . . 35
9.3.4 The space of bounded continuous functions with values in Rm . . . . . . 38
9.3.5 The Length of a Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

10 Multidimensional Differentiation 44
10.1 The Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
10.1.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
10.1.2 The Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
10.1.3 The Mean Value Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 53
10.2 Higher Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
10.2.1 Definition and basic properties . . . . . . . . . . . . . . . . . . . . . . . 55
10.2.2 Schwarz’s Theorem and Multi-index Notation . . . . . . . . . . . . . . . 56
10.2.3 Multidimensional Taylor Approximation . . . . . . . . . . . . . . . . . . 58

11 Potentials, Optimization and Convexity 62


11.1 Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

11.1.1 Critical Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
11.1.2 Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
11.1.3 Extrema with Constraints and Lagrange Multipliers . . . . . . . . . . . 65
11.2 Relevant examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
11.2.1 Operator norm of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . 68
11.2.2 Fundamental Theorem of Algebra . . . . . . . . . . . . . . . . . . . . . . 69
11.2.3 Diagonalizability of Symmetric Matrices . . . . . . . . . . . . . . . . . . 71
11.3 Potentials and the equation Du = F . . . . . . . . . . . . . . . . . . . . 73
11.3.1 The work of a Vector Field along a line . . . . . . . . . . . . . . . . . . 73
11.3.2 The Poincaré Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

12 Foundations of Differential Geometry 85


12.1 The inverse function Theorem (F) . . . . . . . . . . . . . . . . . . . . . . 85
12.2 Manifolds in Rn (F) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
12.2.1 Definition of Manifold . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
12.2.2 Tangent space to a manifold . . . . . . . . . . . . . . . . . . . . . . . . . 92
12.2.3 Examples and Non-Examples . . . . . . . . . . . . . . . . . . . . . . . . 92

13 NEW: Multidimensional Integration 93


13.1 The n-volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
13.2 Measurable Sets and the Integral . . . . . . . . . . . . . . . . . . . . . 96
13.2.1 Measurable sets and Fubini’s Theorem . . . . . . . . . . . . . . . . . . . 96
13.2.2 The integral of a continuous function . . . . . . . . . . . . . . . . . . . . 96
13.2.3 The change of variables formula . . . . . . . . . . . . . . . . . . . . . . . 96
13.3 Computation of Multiple integrals . . . . . . . . . . . . . . . . . . . . . 96

14 OLD: Multidimensional Integration 97


14.1 The Riemann Integral for Boxes . . . . . . . . . . . . . . . . . . . . . . . 97
14.1.1 Definition and Initial Properties . . . . . . . . . . . . . . . . . . . . . . . 97
14.1.2 Null Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
14.1.3 The Lebesgue Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
14.1.4 Riemann Integrability and Continuity . . . . . . . . . . . . . . . . . . . 106
14.2 The Riemann Integral over Jordan-Measurable Sets . . . . . . . . . 108
14.2.1 Jordan Measurability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
14.2.2 Fubini’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
14.3 Multidimensional Substitution Rule . . . . . . . . . . . . . . . . . . . . 116
14.3.1 The Substitution Rule and First Examples . . . . . . . . . . . . . . . . . 116
14.3.2 Linear Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
14.3.3 Proof of the Substitution Rule . . . . . . . . . . . . . . . . . . . . . . . 121
14.4 Improper Multiple Integrals and applications . . . . . . . . . . . . . . 126
14.5 Parameter Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
14.5.1 Interchanging Differentiation and Integration . . . . . . . . . . . . . . . 129
14.5.2 The Bessel Differential Equation . . . . . . . . . . . . . . . . . . . . . . 132

Chapter 9

Metric spaces

In Analysis I, we focused primarily on functions that operate between real numbers, R to R.


Consequently, we deeply explored the real number line, R, and its properties.
Now, in Analysis II, we expand our scope to functions that map from Rn to Rm , where n
and m are positive integers. In this context, we start by delving into the properties of Rn (or
Rm , as we can use them interchangeably since n and m are arbitrary).
We’ll discover that Rn, equipped with the standard Euclidean distance (which we’ll define
later), satisfies the axioms of a metric space. Understanding this concept is vital because it
lays the foundation for many fundamental definitions and results that we want to establish
for Rn and that can be extended to a broader range of metric spaces.
Additionally, this will allow us to revisit (and review) the crucial concept of convergence,
which we have seen in Analysis I for R, within the broader context of metric spaces. We’ll
emphasize the essential properties of metric spaces and their topology, particularly focusing
on compactness.
Furthermore, we’ll introduce and explore standard results related to normed vector spaces.
As we will see, these spaces are more general than Euclidean space Rn , but they possess a
higher degree of structure (i.e., more axioms) than metric spaces.


9.1 Basics of Metric Spaces


In this class, our primary focus will be on Rn , but we’ll find that certain definitions and proofs
become clearer when viewed within the broader context of metric spaces, which have fewer
axioms.
However, when we’re dealing with a general metric space X, it often helps to initially
visualize it as our familiar 2-dimensional or 3-dimensional spaces, R2 or R3 . This approach
can simplify the intuitive understanding of various arguments. If an argument relies on only
a limited set of fundamental properties of R3 (excluding aspects like angles and vector space
structure), it may be applicable to general metric spaces as well.

9.1.1 The Euclidean space Rn


For some integer n ≥ 1, we denote by Rn the set of all ordered n-tuples of real numbers. A
general element x ∈ Rn is thus of the form x = (x1 , . . . , xn ), where the xi ’s are real numbers.
(Even more rigorously, Rn := {x : {1, . . . , n} → R}.)

Rn is a vector space over the field of real numbers with coordinate-wise addition and
multiplication by a scalar. Rigorously, given x, y ∈ Rn and λ ∈ R we have

x + y := (x1 + y1 , . . . , xn + yn ),    λx := (λx1 , . . . , λxn ).

Definition 9.1: Euclidean Structure of Rn


Given x, y ∈ Rn we define the standard scalar product of x and y as

x · y = ⟨x, y⟩ := ∑_{1≤i≤n} xi yi ,

the Euclidean norm of x as

∥x∥ := ( ∑_{1≤i≤n} xi² )^{1/2} ,

and the Euclidean distance of x and y as

d(x, y) := ∥x − y∥ = ( ∑_{1≤i≤n} (xi − yi)² )^{1/2} .

The key property of the Euclidean distance that we want to abstract is the following:


Proposition 9.2: Triangle Inequality in Rn

For all x, y, z ∈ Rn
∥x − z∥ ≤ ∥x − y∥ + ∥y − z∥.

Equivalently, for all x, y ∈ Rn , ∥x + y∥ ≤ ∥x∥ + ∥y∥ .

Proof. To prove the equivalence of the two statements, consider a = x − y and b = y − z, so
that a + b = x − z.
We will see later a more general proof that applies to all inner product spaces (see Corollary
9.100), based on the Cauchy–Schwarz inequality (see Proposition 9.98). Let us give here a
hands-on argument.
Pick two points x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) in Rn ; we would like to show

( ∑_{i=1}^{n} (xi + yi)² )^{1/2} ≤ ( ∑_{i=1}^{n} xi² )^{1/2} + ( ∑_{i=1}^{n} yi² )^{1/2} .

Taking squares, this is equivalent to

∑_{i=1}^{n} xi² + 2 ∑_{i=1}^{n} xi yi + ∑_{i=1}^{n} yi² ≤ ∑_{i=1}^{n} xi² + 2 ( ∑_{i=1}^{n} xi² )^{1/2} ( ∑_{i=1}^{n} yi² )^{1/2} + ∑_{i=1}^{n} yi² ,

and, canceling terms, to

∑_{i=1}^{n} xi yi ≤ ( ∑_{i=1}^{n} xi² )^{1/2} ( ∑_{i=1}^{n} yi² )^{1/2} .    (9.1)

Therefore, the Proposition will follow if we can establish the validity of (9.1), or (squaring
it) of:

( ∑_{i=1}^{n} xi yi ) ( ∑_{j=1}^{n} xj yj ) = ∑_{i,j=1}^{n} xi xj yi yj ≤ ∑_{i,j=1}^{n} xi² yj² .

But this last inequality is easily established by summing over all pairs i, j ∈ {1, . . . , n} the
inequalities

2 xi xj yi yj ≤ xi² yj² + xj² yi²  ⇔  (xi yj − xj yi)² ≥ 0,

and observing that ∑_{i,j=1}^{n} xi² yj² = ∑_{i,j=1}^{n} xj² yi² .
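The chain of reductions above lends itself to a numerical sanity check. The following Python sketch (ours, not part of the original notes; the names `euclid_norm` and `check_triangle` are our own) verifies inequality (9.1) and the triangle inequality on random vectors:

```python
import math
import random

def euclid_norm(x):
    # ∥x∥ = (Σ x_i^2)^(1/2)
    return math.sqrt(sum(t * t for t in x))

def check_triangle(x, y, z):
    # ∥x − z∥ ≤ ∥x − y∥ + ∥y − z∥, with a tiny tolerance for rounding
    d = lambda a, b: euclid_norm([ai - bi for ai, bi in zip(a, b)])
    return d(x, z) <= d(x, y) + d(y, z) + 1e-12

random.seed(0)
for _ in range(1000):
    n = 5
    x, y, z = ([random.uniform(-10, 10) for _ in range(n)] for _ in range(3))
    # Cauchy–Schwarz, i.e., inequality (9.1)
    assert sum(a * b for a, b in zip(x, y)) <= euclid_norm(x) * euclid_norm(y) + 1e-12
    assert check_triangle(x, y, z)
print("all checks passed")
```

Of course, a finite random sample proves nothing; it only illustrates the two inequalities established in the proof.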


9.1.2 Definition of metric Space

Definition 9.3: Metric space


A metric space (X, d) is a nonempty set X together with a nonnegative function
d : X × X → [0, ∞), called the distance (or metric) on X, which satisfies:

(1) For all x, y ∈ X, d(x, y) = 0 if and only if x = y (Definiteness).

(2) For all x, y ∈ X, d(x, y) = d(y, x) (Symmetry).

(3) For all x, y, z ∈ X, d(x, z) ≤ d(x, y) + d(y, z) (Triangle inequality).

9.4. — A metric d on a set X assigns to each pair of points their distance. In this
interpretation, the definiteness condition states that the only point at zero distance from a
given point x ∈ X is x itself. The symmetry condition states that the distance from x ∈ X to
y ∈ X is the same as from y to x. Interpreting the distance between two points as the length
of a shortest path from one point to the other, the triangle inequality states that the length
of a shortest path from x to z is at most the length of the path one takes by first going from
x to y and then from y to z.

9.5. — When there is no possible confusion, we will often say “Let X be a metric space...”,
leaving the distance function unspecified. This is a shorter version of the more precise sentence
“Let (X, d) be a metric space...”.
Furthermore, we may refer to the set X as a space and the elements of X as points.
This is because we have in mind that X is some sort of geometric space, like a subset of the
plane or the surface of a sphere. In this setting, “spaces” and “points” will be synonymous
with “sets” and “elements”.

9.6. — Notice that the Euclidean space (Rn , d), with d(x, y) := ∥x − y∥, is a metric space.
In particular R, equipped with the absolute value distance |x − y| is a metric space.

Exercise 9.7. — Let (X, d) be a metric space and let ϕ : [0, ∞) → [0, ∞) be a function
which is concave, increasing, satisfies ϕ(0) = 0, and is not identically zero. Show that
(X, ϕ ◦ d) is again a metric space. For example, one can take ϕ(t) = √t, ϕ(t) = arctan t, or
ϕ(t) = t/(1 + t). Notice that the last two choices always give bounded distances.
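As a quick numerical illustration of this exercise (our own sketch, not part of the notes), one can test the triangle inequality for ϕ ∘ d with ϕ(t) = t/(1 + t) on random triples of reals:

```python
import random

def phi(t):
    # concave, increasing, phi(0) = 0, bounded above by 1
    return t / (1.0 + t)

def d_bounded(x, y):
    # phi composed with the standard distance on R
    return phi(abs(x - y))

random.seed(0)
for _ in range(1000):
    x, y, z = (random.uniform(-100, 100) for _ in range(3))
    # triangle inequality for phi ∘ d (tiny tolerance for rounding)
    assert d_bounded(x, z) <= d_bounded(x, y) + d_bounded(y, z) + 1e-12
    assert d_bounded(x, y) < 1.0  # this distance is bounded
print("phi ∘ d passed all sampled triangle inequalities")
```

The boundedness assertion reflects that ϕ(t) = t/(1 + t) < 1 for all t ≥ 0, so the new metric is bounded even though |x − y| is not.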

Example 9.8. — When X ⊂ R, we can take the standard metric d defined by

d(x, y) = |x − y| for all x, y ∈ X.


Example 9.9. — Let X be a set, and let d : X × X → R be defined by

d(x, y) = 1 if x ≠ y,  and  d(x, y) = 0 if x = y,

for x, y ∈ X. Then, (X, d) is a metric space. Indeed, d is definite and symmetric by definition.
Furthermore, d satisfies the triangle inequality: Let x, y, z be points in X. If d(x, z) = 0, then
d(x, z) ≤ d(x, y) + d(y, z) is trivially satisfied. If d(x, z) = 1, then x ̸= z, and y is at least
different from one point in {x, z}, so the triangle inequality also holds. This metric d is called
the discrete metric on the set X.
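The case analysis in the example can be checked by brute force on a small set (a sketch of ours, not part of the notes): enumerate all triples and verify the three metric axioms for the discrete metric.

```python
from itertools import product

def d_discrete(x, y):
    # the discrete metric: 1 if x != y, 0 otherwise
    return 0 if x == y else 1

X = ["a", "b", "c", "d"]
for x, y, z in product(X, repeat=3):
    assert d_discrete(x, z) <= d_discrete(x, y) + d_discrete(y, z)  # triangle
    assert d_discrete(x, y) == d_discrete(y, x)                     # symmetry
    assert (d_discrete(x, y) == 0) == (x == y)                      # definiteness
print("discrete metric verified on all triples of a 4-element set")
```

This is exactly the argument of the example: whenever d(x, z) = 1, the point y differs from at least one of x, z, so the right-hand side is at least 1.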

Example 9.10. — Let X = R2 , and define the Manhattan metric on X by

dNY (x, y) = |x1 − y1 | + |x2 − y2 |,

where we put x = (x1 , x2 ) and y = (y1 , y2 ). It can be verified (exercise) that dNY satisfies all
the axioms of a metric. The reason why dNY is called the Manhattan metric is that in grid-like
places such as Manhattan, one can get from (x1 , x2 ) to (y1 , y2 ) in the following way: first
move ‘horizontally’ (i.e., with constant second coordinate) from x = (x1 , x2 ) to (y1 , x2 ), and
then ‘vertically’ (with constant first coordinate) from (y1 , x2 ) to y = (y1 , y2 ); or vice versa:
from x = (x1 , x2 ) to (x1 , y2 ), and then to y = (y1 , y2 ). Since all streets in Manhattan run
either from west to east or from north to south, dNY measures the relevant distance between
two points.
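The two grid routes just described have the same total length, and the Euclidean distance never exceeds the Manhattan one. A small sketch (ours; the helper names `d_ny` and `d_euclid` are our own) makes this concrete:

```python
import math

def d_ny(x, y):
    # Manhattan metric on R^2: |x1 - y1| + |x2 - y2|
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

def d_euclid(x, y):
    return math.hypot(x[0] - y[0], x[1] - y[1])

x, y = (0.0, 0.0), (3.0, 4.0)
# route 1: horizontally to (y1, x2), then vertically to y
route1 = abs(x[0] - y[0]) + abs(x[1] - y[1])
# route 2: vertically to (x1, y2), then horizontally to y
route2 = abs(x[1] - y[1]) + abs(x[0] - y[0])
assert route1 == route2 == d_ny(x, y) == 7.0
# the straight-line distance is never larger than the grid distance
assert d_euclid(x, y) <= d_ny(x, y)
print(d_ny(x, y), d_euclid(x, y))
```

For x = (0, 0) and y = (3, 4) the Manhattan distance is 7 while the Euclidean distance is 5, the familiar 3-4-5 triangle.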

Exercise 9.11. — Let X be the set of all continuous real-valued functions defined on
[0, 1] ⊂ R. For f, g ∈ X set

d1 (f, g) := max{|f (x) − g(x)| | x ∈ [0, 1]}   and   d2 (f, g) := ∫₀¹ |f (x) − g(x)| dx.

Show that (X, d1 ) and (X, d2 ) are metric spaces.

Example 9.12. — If (X, d) is a metric space and X0 ⊂ X is some subset, then X0 inherits
a structure of metric space from X. Indeed, one can easily verify that (X0 , d0 ), where d0 is
the restriction of d to X0 × X0 ⊂ X × X, is a metric space.
For a more concrete instance of this, take X = R3 with the Euclidean distance d and let
X0 be the sphere {x ∈ R3 | x1² + x2² + x3² = R²}, for some R > 0. Then for any pair of
points x, y in the sphere we have

d0 (x, y) = √( (x1 − y1)² + (x2 − y2)² + (x3 − y3)² ) = √( 2R² − 2 x · y ).


An arguably more natural metric d1 on the sphere X0 can be defined by measuring the length
of the geodesic arc joining x and y. One can see that this metric is given by:

d1 (x, y) = R arccos( (x · y)/R² ) ∈ [0, πR].
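One can check numerically (a sketch of ours, not part of the notes) that the chord distance d0 and the geodesic distance d1 agree with the two formulas above, and that the chord is never longer than the arc:

```python
import math

R = 2.0

def d0(x, y):
    # chord distance; equals sqrt(2R^2 - 2 x·y) for points on the sphere
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def d1(x, y):
    # geodesic distance: R * arccos(x·y / R^2), clamped against rounding
    dot = sum(a * b for a, b in zip(x, y))
    return R * math.acos(max(-1.0, min(1.0, dot / R**2)))

x = (R, 0.0, 0.0)
y = (0.0, R, 0.0)  # a quarter turn away from x on the sphere of radius R
dot = sum(a * b for a, b in zip(x, y))
assert abs(d0(x, y) - math.sqrt(2 * R**2 - 2 * dot)) < 1e-12
assert abs(d1(x, y) - R * math.pi / 2) < 1e-12  # a quarter of a great circle
assert d0(x, y) <= d1(x, y)                     # the chord is shorter than the arc
print(d0(x, y), d1(x, y))
```

The clamp in `d1` guards against floating-point values of x·y/R² that fall marginally outside [−1, 1], where `acos` is undefined.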

9.1.3 Sequences, limits, and completeness


The definition of sequence in a set was given in Analysis I. We recall it next:

Definition 9.13: Sequences in a set

Let X be a set. A sequence in X is a function x : N → X. The image x(n) of n ∈ N
is also denoted by xn and referred to as the n-th term of x.
Instead of x : N → X, one often writes (xn )n∈N , (xn )n≥0 , or (xn )∞n=0 .

We introduce the following vocabulary, which is useful if used precisely.

Definition 9.14: “Eventually”


Let (xn )n≥0 ⊂ X be a sequence and let P : X → {true, false} be a property that an
element of X might have or not. Then one says that “xn satisfies P eventually” if there
exists N ∈ N such that P(xn ) is true for all n ≥ N . In other words, if P(xn ) holds true
along the sequence with only finitely many exceptions.
Definition 9.15: Convergent sequence

Let (X, d) be a metric space, x ∈ X and (xn )n∈N be a sequence in X. We say that
(xn )n∈N converges to x, or that x is the limit of the sequence (xn )n∈N , if

lim_{n→∞} d(xn , x) = 0.

In other words, for any ε > 0 eventually d(xn , x) < ε.

9.16. — When the metric space (X, d) is clear from the context, we may write limn xn = x
or even xn → x to express that (xn )n∈N converges to x.

Exercise 9.17. — In the setting of Exercise 9.7, show that a sequence converges in (X, d)
if and only if it converges in (X, ϕ ◦ d).

Lemma 9.18: Uniqueness of the limit

In a metric space, a convergent sequence has only one limit.

Proof. Let (X, d) be a metric space and let A, B ∈ X be limits of some sequence (xn )∞n=0 ;
we mean to show that A = B. Take ε > 0; then we can find NA , NB ∈ N such that d(xn , A) < ε/2


for all n ≥ NA , and d(xn , B) < ε/2 for all n ≥ NB . Then, for N := max{NA , NB }, we have
that

d(A, B) ≤ d(A, xN ) + d(xN , B) < ε/2 + ε/2 = ε,

where we used the triangle inequality. Since ε > 0 was arbitrary, it follows that d(A, B) = 0,
and thus A = B by the definiteness of d.

We recall the notions of subsequences and accumulation points

Definition 9.19: Subsequence

Let (xn )∞n=0 be a sequence in a set X.
A subsequence of (xn )∞n=0 is a sequence of the form (xf(k) )∞k=0 , where f : N → N
is a strictly increasing function. It is standard to denote subsequences by (xnk )k∈N ,
(xnk )k≥0 , or (xnk )∞k=0 (i.e., nk := f (k)).

Definition 9.20: Accumulation points


Let (X, d) be a metric space.

• Given Y ⊂ X, we say that x ∈ X is an accumulation point of Y if there
exists a sequence (yn )n≥0 ⊂ Y converging to x.

• Given a sequence (xn )n≥0 in X, we say that x is an accumulation point of the
sequence if some subsequence converges to x.

Lemma 9.21: Accumulation points of a convergent sequence


Let (xn )n∈N be a sequence in a metric space X, and let x ∈ X. Then (xn )n∈N converges
to x if and only if every subsequence of (xn )n∈N converges to x.
In other words: a sequence in a metric space is convergent if and only if it has a unique
accumulation point.

Proof. We first prove the “only if” part. Let (xf (n) )n∈N be a subsequence, i.e., f : N → N is
some strictly increasing map. Given ε > 0 there is N such that d(xn , x) < ε for all n ≥ N .
Hence, d(xf (n) , x) < ε for all n ≥ N , as f (n) ≥ n.
We now prove the “if” part: we can simply use that (xn )n≥0 is a subsequence of itself (i.e.,
f (n) = n), and hence it converges to x.

9.22. — A stronger version of the previous Lemma that is useful in some contexts asserts
the following: a sequence (xn )n∈N in a metric space converges to x if and only if every
subsequence (xf(n) )n∈N (f increasing) has a sub-subsequence (xg(f(n)) )n∈N (g increasing)
converging to x.
While the proof of the “only if” part is similar (g(f (n)) ≥ n), the “if” part is less trivial than
in the previous lemma. One can argue by contraposition: if xn does not converge to x, then

we want to find a subsequence from which we cannot extract any sub-subsequence converging
to x.
To do so, we start from the negation of “xn converges to x”. Recall:

xn → x ⇔ ∀ε > 0 ∃N ∈ N ∀n ≥ N : d(xn , x) < ε.

The negation of this is:

∃ε > 0 ∀N ∈ N ∃n ≥ N : d(xn , x) ≥ ε.

In other words, the set {n ∈ N | d(xn , x) ≥ ε} is infinite, and hence there is an increasing
sequence (nk )k≥0 such that d(xnk , x) ≥ ε. Notice that any sub-subsequence will still remain
at distance ≥ ε from x and hence will not converge to x.
This stronger version can be used, for example, to prove that a continuous function f :
[0, 1] → R has a unique minimum point if and only if all sequences (xn )n≥0 ⊂ [0, 1] such that
f (xn ) → min[0,1] f converge to the same limit point.

Lemma 9.23: Convergence in Rn


A sequence in Rn converges (in the Euclidean distance) if and only if it converges
coordinate-wise.

Proof. Let (xk )k∈N ⊂ Rn be a sequence. For j = 1, . . . , n, we denote by xk,j the j-th
component of the vector xk .
Assume that xk → x ∈ Rn . By definition, given ε > 0 and any j, it holds

|xk,j − xj | ≤ ( ∑_{i=1}^{n} (xk,i − xi )² )^{1/2} = ∥xk − x∥ < ε  eventually in k.

This proves that, for each j = 1, . . . , n, xk,j → xj as k → ∞ (as sequences of real numbers,
with the standard absolute value distance).
Assume now that for each j = 1, . . . , n it holds

xk,j → xj as k → ∞,

for some numbers xj ∈ R. We prove that xk → x in Rn , where x := (x1 , . . . , xn ). Given
ε > 0, for each j ∈ {1, . . . , n} there exists Nj ∈ N such that

|xk,j − xj | < ε/n  for all k ≥ Nj .

This means that

( ∑_{i=1}^{n} (xk,i − xi )² )^{1/2} ≤ √n · (ε/n) < ε  for all k ≥ max{N1 , . . . , Nn },


which proves that xk → x with respect to the Euclidean distance.
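The equivalence just proved can be illustrated numerically (our own sketch, not part of the notes): a sequence in R² converges in the Euclidean distance exactly when each coordinate converges.

```python
import math

def x_k(k):
    # a sequence in R^2 converging to the limit (0, 1)
    return (1.0 / k, 1.0 + 1.0 / k**2)

limit = (0.0, 1.0)

def euclid_dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Euclidean distances to the limit shrink along the sequence...
dists = [euclid_dist(x_k(k), limit) for k in (10, 100, 1000)]
assert dists[0] > dists[1] > dists[2]
# ...and each coordinate converges on its own
assert abs(x_k(1000)[0] - limit[0]) < 1e-2
assert abs(x_k(1000)[1] - limit[1]) < 1e-5
print(dists)
```

Note that the coordinates may converge at different rates (here 1/k versus 1/k²); the lemma only requires that each converges.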

We introduce the concept of completeness for metric spaces. This concept does not
conflict with the notion of completeness that we gave for R. We will soon show that R, as
well as C, is complete as a metric space. In contrast, the metric space Q is not complete.

Definition 9.24: Cauchy sequence

A sequence (xn )∞n=0 in a metric space (X, d) is a Cauchy sequence if, for every ε > 0,
there exists N ∈ N such that d(xm , xn ) < ε for all pairs of integers m, n with m ≥ N
and n ≥ N .

Exercise 9.25. — Prove the following elementary facts about Cauchy sequences in a metric
space (X, d):

• A Cauchy sequence is bounded (meaning that {d(xn , x0 ) | n ∈ N} ⊂ R is a bounded set).

• Every convergent sequence is a Cauchy sequence.

• A Cauchy sequence converges if and only if it has a converging subsequence.

Definition 9.26: Complete metric space


A metric space (X, d) is called complete if every Cauchy sequence in (X, d) converges.

Example 9.27. — The interval (0, 1) ⊂ R, endowed with the standard distance d(x, y) =
|x − y|, is not complete. However, N ⊂ R is complete, as well as [0, ∞).

Exercise 9.28. — Show that Q, with the distance inherited from the standard distance
on R, is not a complete metric space.

Exercise 9.29. — Show that the space X of all bounded sequences (xn )n≥0 of real numbers,
equipped with the distance

d( (xn )n≥0 , (yn )n≥0 ) = sup_{n≥0} |xn − yn | ,

is complete. Show also that the subspace X0 ⊂ X of sequences with only finitely many
non-zero terms is not complete.

Theorem 9.30: Completeness of Rn


For all n ≥ 1, Rn (with the Euclidean distance) is complete. In particular, R and C
are complete.


Proof. Similarly to the proof of Lemma 9.23, a sequence (xk ) in Rn is Cauchy if and only if
the coordinate sequences xk,j , j = 1, . . . , n, are Cauchy (in R). It then follows from Theorem
2.124 (Cauchy sequences in R converge) that xk converges coordinate-wise. Thus, by Lemma
9.23, xk converges in Rn .

9.31. — Completion of a metric space (extra material). Let (X, d) be a metric space. We
write CX for the set of all Cauchy sequences in X and define an equivalence relation on CX by

(xn )∞n=0 ∼ (yn )∞n=0 ⇐⇒ lim_{n→∞} d(xn , yn ) = 0.

The quotient set X̄ = CX /∼ , equipped with the metric d̄ given by

d̄( [(xn )∞n=0 ], [(yn )∞n=0 ] ) = lim_{n→∞} d(xn , yn ),

is called the completion of (X, d). The injection ι : X → X̄, mapping x ∈ X to the class of
the constant sequence with value x, is called the canonical embedding. For all x, y ∈ X,
we have

d(x, y) = d̄(ι(x), ι(y)),

which in particular implies that ι is injective.

Exercise 9.32. — Show that the objects introduced in 9.31 are well-defined. In particular,
verify that d̄ is indeed a metric on X̄.

Exercise 9.33. — As the name suggests, the completion (X̄, d̄) of a metric space is com-
plete, meaning that every Cauchy sequence in X̄ converges. A sequence in X̄ is essentially a
sequence of sequences, i.e.,

( [(xm,n )∞n=0 ] )∞m=0 .

Show that [(xn,n )∞n=0 ] is a limit of this sequence.

Exercise 9.34. — Let (X, d) be a metric space with completion (X̄, d̄). Let (Y, dY ) be a
complete metric space, and let f : X → Y be a function such that

d(x, y) = dY (f (x), f (y))

for all x, y ∈ X. Show that there exists a unique function f̄ : X̄ → Y such that f = f̄ ◦ ι
and

d̄(x, y) = dY (f̄ (x), f̄ (y))

for all x, y ∈ X̄. This can be interpreted as: “X̄ is the smallest complete metric space
containing X.”

9.1.4 *The Reals as the Completion of Rationals (extra material; cf. Grund-
strukturen)
In the first semester, we defined R as any complete ordered field, postulating its existence
(Definitions 2.18 and 2.19). The idea of completion of metric spaces allows one to easily
construct a model of R. This construction shows, in particular, the existence of a complete
ordered field. One can also prove with a bit of patience (although it is not hard to do so) that
actually there is only one model of R, in the sense that any two complete ordered fields must
be isomorphic.
The completion of Q serves as a model for a field of real numbers. First, note that the
construction of the completion of Q does not necessarily require a field of real numbers (as
the target space for the standard metric on Q). The set CQ of all Cauchy sequences in Q is
the set of all sequences of rational numbers (qn )∞n=0 such that

∀k ∈ N ∃N ∈ N : m, n ≥ N =⇒ |qn − qm | < 2^{−k} .

The set CQ is a vector space over Q with component-wise operations, and

N = { (qn )∞n=0 ∈ CQ | lim_{n→∞} qn = 0 }

is a linear subspace. The equivalence relation ∼ from 9.31 translates to (pn − qn )∞n=0 ∈ N .
We define the set R as the quotient

R = CQ /∼ = CQ /N

in the sense of linear algebra. Thus, R is a vector space over Q. We call the injective linear
map ι : Q → R, which assigns to q ∈ Q the class of the constant sequence with value q, the
canonical embedding. From now on, elements of R are called real numbers, and we consider
Q as a subset of R via the canonical embedding ι.
We define a product on R by component-wise multiplication. That is, for elements x =
[(pn )∞n=0 ] and y = [(qn )∞n=0 ], we define

x · y = [(pn qn )∞n=0 ].

It can be verified that this gives a well-defined commutative operation on R, satisfying the
distributive law with respect to addition, and compatible with the multiplication of rational
numbers via the canonical embedding. In particular, 1R = ι(1) = [(1)∞n=0 ] is the multiplicative
identity in R. If x = [(qn )∞n=0 ] is non-zero, then (qn )∞n=0 is a Cauchy sequence in Q that does
not converge to zero. Therefore, qn ̸= 0 for all but finitely many n ∈ N. The class of the
sequence (pn )∞n=0 given by

pn = 1 if qn = 0,  and  pn = qn^{−1} otherwise,


serves as a multiplicative inverse for x. This shows that R is a field with the given operations.
We use the usual order relation on Q to construct an order relation on R. For elements
x = [(pn )∞n=0 ] and y = [(qn )∞n=0 ] in R, we declare

x ≤ y

if there exists a sequence (rn )∞n=0 ∈ N such that pn − rn ≤ qn for all n ∈ N. It is left to
the diligent reader to verify that this indeed defines a well-defined order relation on R that
is compatible with the field structure on R. Thus, R is equipped with the structure of an
ordered field.
It remains to show that the ordered field R is complete in the sense of Definition ??. It
is easy to see that R satisfies the Archimedean Principle: Let x = [(qn )∞n=0 ] ∈ R be positive.
Then, (qn )∞n=0 is not a null sequence. Thus, there exists a k ∈ N such that |qn | > 2^{−k} for
infinitely many n ∈ N. However, (qn )∞n=0 is also a Cauchy sequence, so there exists N ∈ N
such that m, n ≥ N =⇒ |qn − qm | < 2^{−k−1} . This shows that |qn | > 2^{−k−1} , and even
qn > 2^{−k−1} for all but finitely many n ∈ N, since x > 0. This demonstrates ι(2^{−k−1}) ≤ x, or
simply 2^{−k−1} ≤ x, as we consider Q as a subset of R. Thus, the Archimedean Principle holds,
as stated in Corollary ??. Now, let X, Y ⊂ R be non-empty subsets such that x ≤ y for all
x ∈ X and y ∈ Y . We want to find a real number z = [(rn )∞n=0 ] ∈ R between X and Y . To
do this, we first choose arbitrary a0 , b0 ∈ Q such that [a0 , b0 ] ∩ X ̸= ∅ and [a0 , b0 ] ∩ Y ̸= ∅,
and set r0 = (a0 + b0 )/2. If x ≤ r0 ≤ y for all x ∈ X and y ∈ Y , we set z = r0 and we are
done. Otherwise, we define a1 and b1 as

a1 = a0 and b1 = r0 if [a0 , r0 ] ∩ X ̸= ∅ and [a0 , r0 ] ∩ Y ̸= ∅,
a1 = r0 and b1 = b0 if [r0 , b0 ] ∩ X ̸= ∅ and [r0 , b0 ] ∩ Y ̸= ∅,

and set r1 = (a1 + b1 )/2. By continuing this process, we either find an rn such that x ≤ rn ≤ y
for all x ∈ X and y ∈ Y , and we set z = rn , or we obtain sequences (an )∞n=0 , (bn )∞n=0 , and
(rn )∞n=0 with |bn − an | ≤ 2^{−n} |a0 − b0 | and

[an , bn ] ∩ X ̸= ∅ and [an , bn ] ∩ Y ̸= ∅.

As the diligent reader can verify, this implies that (an )∞n=0 , (bn )∞n=0 , and (rn )∞n=0 are all
Cauchy sequences, and the real number

z = [(an )∞n=0 ] = [(rn )∞n=0 ] = [(bn )∞n=0 ]

satisfies the inequalities x ≤ z ≤ y for all x ∈ X and y ∈ Y .


9.2 Topology of Metric Spaces

9.2.1 Open and closed sets

Definition 9.35: Open balls


Let (X, d) be a metric space, x ∈ X, and r > 0 a real number. In this context, we write

B(x, r) := {y ∈ X | d(x, y) < r}

and refer to the set B(x, r) as the open ball with center x and radius r.

Definition 9.36: Open and Closed sets


Let (X, d) be a metric space:
• A subset E ⊂ X is called open if, for every x ∈ E, there exists r > 0 such that
B(x, r) ⊂ E.

• The collection of all open sets, Td = {U ⊂ X | U open}, is called the topology


generated by d.

• A subset E ⊂ X is called closed if X \ E is open.

9.37. — In particular, ∅ and X are always both open and closed. In general, a subset
U ⊂ X need be neither open nor closed.
It is not true in general that the only “clopen” sets in a space are the empty set ∅ and the
space itself X. A set is termed “clopen” if it is both open and closed. For illustration, take the
space X = (0, 1) ∪ (2, 3), equipped with the standard metric from R. Here, the intervals (0, 1)
and (2, 3) are clopen: they are open and closed in X. This example underscores the presence
of other clopen sets beyond just ∅ and X. The significance of clopen sets will become more
apparent in our discussions on connectedness. As we will see, connected spaces are precisely
characterized by the absence of nontrivial (neither empty nor the whole space) clopen sets.

9.38. — Consider the set X = [0, 2], equipped with the standard metric inherited from
R. In this context, the subset [0, 1) is open within X (an exercise worth verifying). However,
when considered as subset of the whole R, [0, 1) is neither open nor closed. This example
illustrates that statements regarding the openness of a set like [0, 1) require clarity about the
ambient space (X, d) being considered. In practice, though, such nuances are often glossed
over when the context is clear, and delving into these subtleties is usually unnecessary for
typical discussions.


Exercise 9.39. — Let (X, d) be a metric space. Show that

• The open ball B(x, r) is an open set.

• Every finite subset of X is closed.

Proposition 9.40: Arbitrary unions and Finite intersections

Let (Ui )i∈I be any family of open subsets of X; then ⋃i∈I Ui is also open. If I is a
finite set, then ⋂i∈I Ui is also open.

Proof. Set

U = ⋃i∈I Ui

and let x ∈ U . Then there exists i ∈ I with x ∈ Ui , and since Ui is open, there exists an
ε > 0 such that B(x, ε) ⊂ Ui , implying B(x, ε) ⊂ U . Thus, U is open. Finally, let (Ui )i∈I be
a finite family of open subsets of X. Set

U = ⋂i∈I Ui

and let x ∈ U . Then x ∈ Ui for all i ∈ I, and for each i ∈ I, there exists εi > 0 such that
B(x, εi ) ⊂ Ui . For ε := min{εi | i ∈ I}, we have ε > 0 and B(x, ε) ⊂ Ui for all i ∈ I. Thus,
B(x, ε) ⊂ U , completing the proof.

Proposition 9.41: Arbitrary intersections and Finite unions

Let (Ai )i∈I be any family of closed subsets of X; then ⋂i∈I Ai is also closed. If I is a
finite set, then ⋃i∈I Ai is also closed.

Proof. Apply Proposition 9.40 to the (open) complements of the closed sets.

Example 9.42. — The intersection of infinitely many open sets may not be open. Take for
example R with the standard metric. The intersection of the family of open sets {(−1/n, 1/n) | n ∈ N}
gives {0}, which is not open. Taking complements, one obtains an example where an infinite
union of closed sets is not closed.


Definition 9.43:
Let X be a metric space and E ⊂ X. We define

• the interior E ◦ := ⋃{U ⊂ E : U is open}, which is the largest open set contained
in E.

• the closure Ē := ⋂{A ⊃ E : A is closed}, which is the smallest closed set
containing E.

• the (topological) boundary ∂E := Ē \ E ◦ .

Exercise 9.44. — Using Proposition 9.40, prove that E ◦ is always open while Ē and ∂E
are always closed.

Exercise 9.45. — For balls B(x, r), prove the inclusions B̄(x, r) ⊂ {y ∈ X | d(x, y) ≤ r} and
∂B(x, r) ⊂ {y ∈ X | d(x, y) = r}, and find a metric space in which these inclusions are strict.

Lemma 9.46: Open and Closed through sequences

Let (X, d) be a metric space.

(1) A subset U ⊂ X is open if and only if, for every convergent sequence in X with a
limit in U , the sequence eventually lies in U .

(2) A subset A ⊂ X is closed if and only if, for every convergent sequence (xn )∞n=0 in
X with xn ∈ A for all n ∈ N, the limit also lies in A. In other words, if and only
if A coincides with the set of all its accumulation points.

Proof. Let U ⊂ X be an open subset of X, and let (xn )∞ n=0 be a sequence in X with a limit x
in U . Then, there exists ε > 0 such that B(x, ε) ⊂ U , and since (xn )∞ n=0 converges to x, there
exists an N ∈ N such that xn ∈ B(x, ε) for all n ≥ N . Conversely, let V ⊂ X be a non-open
subset. Then there exists a point x ∈ V such that B(x, ε) \ V ̸= ∅ for every ε > 0. For each
n ∈ N, we can find xn ∈ B(x, 2−n ) \ V . The sequence (xn )∞ n=0 in X \ V converges to x ∈ V ,
and satisfies xn ∈/ V for every n ∈ N. This completes the proof of the first statement.
Let A ⊂ X be closed, and let (xn )∞n=0 be a convergent sequence in X with xn ∈ A for all
n ∈ N. Let x be the limit of the sequence (xn )∞n=0 . Then, U = X \ A is open and cannot
contain the limit x, as otherwise almost all elements of the sequence (xn )∞n=0 would
have to lie in U . Therefore, the limit x belongs to A. Finally, suppose A ⊂ X is not closed.
Then U = X \ A is not open, and according to the previous argument, there exists a sequence
(xn )∞n=0 in A = X \ U with a limit x ∈ U .

Exercise 9.47. — Let (X, d) be a complete metric space and E ⊂ X a closed subset.
Show that E is complete as well.


Proposition 9.48: Topological notion of convergence

Let (X, d) be a metric space. A sequence (xn )n≥0 converges to x if and only if, for
every open set U containing x, xn eventually lies in U .

Proof. Notice that we can rewrite the definition of convergence as follows: xn → x if and only
if, for all ε > 0, xn eventually lies in B(x, ε). Now, if U is any open set containing x, then
by definition of open set there exists ε > 0 such that B(x, ε) ⊂ U , and hence xn → x implies
that xn eventually lies in U , establishing the “only if” direction. For the “if” direction, note
that for a given ε > 0 we can take U = B(x, ε) (open balls are open), and hence xn eventually
lies in B(x, ε).

Corollary 9.49: Distances with same convergent sequences

Let X be a set endowed with two different distances d1 and d2 . Then (X, d1 ) and (X, d2 )
have the same convergent sequences if and only if the topologies generated by d1 and d2
coincide.
Proof. By Proposition 9.48, the notion of convergent sequence depends only on the collection
of open sets, that is, on the topology the distance generates. Hence distances generating the
same open sets have the same convergent sequences. Conversely, if the two distances have the
same convergent sequences, then by Lemma 9.46 (2) they have the same closed sets, and hence
the same open sets.

Exercise 9.50. — Let (X, d) be a metric space, and A ⊂ X a subset.

(i) Assume X is complete and A is closed. Show that the subspace A is also complete.

(ii) Assume A is complete. Show that A ⊂ X is closed.

9.2.2 Continuity
We now aim to generalize the concept of continuity to functions defined between metric
spaces.


Definition 9.51: Continuity


Let (X, dX ) and (Y, dY ) be metric spaces, and let f : X → Y be a function. We say
that f is continuous if one of the following equivalent conditions holds:

(1) We say f is ε − δ continuous if, for all x ∈ X and ε > 0, there exists a δ > 0
such that x′ ∈ X and dX (x, x′ ) < δ =⇒ dY (f (x), f (x′ )) < ε. In other words,
f (B(x, δ)) ⊂ B(f (x), ε).

(2) We say f is sequentially continuous if, for every convergent sequence (xn )n in
X with limit x = lim xn , the sequence (f (xn ))n converges in Y , with f (x) =
n→∞
lim f (xn ).
n→∞

(3) We say f is topologically continuous if, for every open subset U ⊂ Y , the
preimage f −1 (U ) = {x ∈ X | f (x) ∈ U } is open in X.

Proposition 9.52: The three faces of Continuity


Let X and Y be metric spaces, and let f : X → Y be a function. The following
conditions are equivalent:
(1) The function f is ε − δ continuous.

(2) The function f is sequentially continuous.

(3) The function f is topologically continuous.

Proof. (1) =⇒ (2): Let (xn )∞ n=0 be a convergent sequence in X with limit x ∈ X, and let
ε > 0. There exists a δ > 0 such that f (x′ ) ∈ B(f (x), ε) for all x′ ∈ B(x, δ). Since (xn )∞ n=0
converges to x, there exists an N ∈ N such that xn ∈ B(x, δ) for all n ≥ N . In particular, for
n ≥ N , f (xn ) ∈ B(f (x), ε). Since ε > 0 was arbitrary, it follows that lim f (xn ) = f (x), and
n→∞
thus f is sequentially continuous as claimed.
¬(3) =⇒ ¬(2): Assume f is not topologically continuous. Then there exists U ⊂ Y open such
that f −1 (U ) is not open. Hence there is a point x ∈ f −1 (U ) such that no ball B(x, r) is
contained in f −1 (U ); picking xn ∈ B(x, 2−n ) \ f −1 (U ) for each n gives a sequence with
xn → x. But then f (x) ∈ U while f (xn ) ∈ Y \ U for all n, so (f (xn ))n cannot converge to
f (x): the open set U would otherwise eventually contain its terms. Hence f is not sequentially
continuous.
(3) =⇒ (1): Let x ∈ X and ε > 0. The preimage f −1 (B(f (x), ε)) contains the point x
and is open by assumption, as B(f (x), ε) ⊂ Y is open. Thus, there exists a δ > 0 such that
B(x, δ) ⊂ f −1 (B(f (x), ε)). Therefore, f is ε-δ-continuous as claimed.


Definition 9.53:
Let (X, dX ) and (Y, dY ) be metric spaces. We say that f : X → Y is

• Uniformly continuous if for every ε > 0 there is δ > 0 such that

dY (f (x), f (x′ )) < ε whenever dX (x, x′ ) < δ.

• Lipschitz continuous if there is a constant L > 0 such that

dY (f (x), f (x′ )) ≤ L dX (x, x′ ) for all x, x′ ∈ X.

The constant L is called the Lipschitz constant of f .

Exercise 9.54. — Let (X, d) be a metric space, and let E ⊂ X be a non-empty subset.
For x ∈ X, define
fE (x) = inf{d(x, z) | z ∈ E}.

Show that the function fE : X → R is 1-Lipschitz continuous, and that E ⊂ X is closed if
and only if E = {x ∈ X | fE (x) = 0}.
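The function fE of Exercise 9.54 is easy to experiment with. A small sketch, under the assumption (an illustrative choice, not from the text) that E is a finite subset of R with the standard metric, so that E is closed:

```python
# Distance-to-set function f_E(x) = inf{d(x, z) : z in E}, for a finite
# (hence closed) subset E of R with the standard metric d(x, y) = |x - y|.

def dist_to_set(x, E):
    return min(abs(x - z) for z in E)

E = [0.0, 1.0, 2.5]
samples = [i / 10 for i in range(-20, 40)]

# 1-Lipschitz estimate: |f_E(x) - f_E(y)| <= |x - y| on all sample pairs.
for x in samples:
    for y in samples:
        assert abs(dist_to_set(x, E) - dist_to_set(y, E)) <= abs(x - y) + 1e-12

# f_E vanishes exactly on E, matching "E closed iff E = {f_E = 0}".
print([z for z in samples if dist_to_set(z, E) == 0.0])  # [0.0, 1.0, 2.5]
```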

Exercise 9.55. — Let (X, dX ) and (Y, dY ) be metric spaces, and assume that Y is complete.
Show that if E ⊂ X and f : E → Y is a uniformly continuous function, then there is a unique
continuous extension f̄ : Ē → Y , which is also uniformly continuous.

9.2.3 Banach’s Fixed-Point Theorem


Fixed-point theorems investigate when a mapping f : X → X possesses a fixed point, i.e.,
a point x ∈ X for which f (x) = x. They can be rather powerful tools for proving existence
theorems.
We prove in this section Banach’s Fixed-Point Theorem, which will be used later to prove
the Implicit Function Theorem ?? and the existence of solutions to ODEs.

Theorem 9.56: Banach’s Fixed-Point Theorem


Let (X, d) be a complete metric space and let T : X → X be a Lipschitz map with
Lipschitz constant λ < 1. In other words, assume that for some λ ∈ (0, 1) it holds

d(T (x), T (x′ )) ≤ λ d(x, x′ ) for all x, x′ ∈ X.

Then, there exists a unique element a ∈ X such that T (a) = a.

Such a function T is called a Lipschitz contraction. A point x ∈ X with T (x) = x is called a
fixed point of the mapping T , and the theorem states that a Lipschitz contraction has a
unique fixed point, provided the ambient space is complete (as in Definition ??).


Proof. First, we show uniqueness of a putative fixed point. Let a ∈ X and b ∈ X be fixed
points of T . Then,
d(a, b) = d(T (a), T (b)) ≤ λd(a, b),

which, since λ < 1, implies d(a, b) = 0 and thus a = b.


We turn to prove the existence of a fixed point. Choose any x0 ∈ X and define a sequence
(xn )∞n=0 recursively by xn+1 = T (xn ), for n ≥ 0. We claim that the sequence (xn )∞n=0 is a
Cauchy sequence. Iterating the contractivity assumption we find that, for all integers n ≥ 1,

d(xn+1 , xn ) = d(T (xn ), T (xn−1 )) ≤ λ d(xn , xn−1 ) ≤ λ2 d(xn−1 , xn−2 ) ≤ . . . ≤ λn d(x1 , x0 ).

Pick now any integers m ≥ n ≥ N ; using this observation and the triangle inequality we find

d(xn , xm ) ≤ d(xn , xn+1 ) + . . . + d(xm−1 , xm ) ≤ (λn + . . . + λm−1 ) d(x0 , x1 )
≤ d(x0 , x1 ) ∑p≥N λp = (λN /(1 − λ)) d(x0 , x1 ).

We crucially used that λ < 1 to sum the geometric series. Now, given any ε > 0, we can find
some N so large that (λN /(1 − λ)) d(x0 , x1 ) < ε, thus proving that (xn )∞n=0 is Cauchy (this
estimate is uniform in n, m as long as they are larger than N !).
Now we use the completeness assumption to infer that xn → a for some a ∈ X. Since T is
continuous, we have

T (a) = lim n→∞ T (xn ) = lim n→∞ xn+1 = lim n→∞ xn = a,

which shows that a is a fixed point of T .

9.57. — We remark that the proof is constructive and, in concrete situations, can be
implemented as an algorithm to find approximate fixed points.
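As a concrete illustration of 9.57, the iteration xn+1 = T (xn) can be run numerically. A minimal sketch with a standard example not taken from the text: T = cos on the complete space [0, 1], which is a contraction with λ = sin 1 < 1:

```python
import math

def fixed_point(T, x0, tol=1e-12, max_iter=10_000):
    """Banach iteration x_{n+1} = T(x_n) until successive terms stabilize."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence: is T really a contraction?")

# T = cos maps [0, 1] into itself and |cos'(x)| = |sin x| <= sin(1) < 1,
# so the theorem guarantees a unique fixed point.
a = fixed_point(math.cos, x0=0.5)
print(a)  # about 0.739085..., the unique fixed point of cos on [0, 1]
print(abs(math.cos(a) - a) < 1e-10)  # True
```

The number of iterations needed matches the geometric-series estimate in the proof: the error shrinks by a factor of roughly λ per step.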

Exercise 9.58. — Find examples for:

(1) A Lipschitz contraction T : X → X on a non-complete metric space X that has no fixed


point.

(2) A complete metric space (X, d) and an isometry (i.e., a mapping T : X → X with
d(T (x1 ), T (x2 )) = d(x1 , x2 )) that has no fixed point, and an isometry that has exactly
13 fixed points.


9.2.4 Compactness
A closed and bounded interval of the real line is called a compact interval, as we saw
in Analysis I. We proved some fundamental properties of continuous functions on compact
intervals: boundedness, existence of maxima and minima, and uniform continuity. We intend
to investigate these and other properties in the broader context of metric spaces. We start
by giving a general definition of compactness that works in metric spaces.
Let us immediately clarify a possible source of confusion.

Achtung! 9.59: Closed & Bounded VS Compact


• In a general metric space, it is not necessarily true that a closed and bounded set
is compact.

• Nevertheless, in Rn , which is the main focus of this course, it will turn out that a
closed and bounded set is indeed compact, and vice versa.

Interlude: “Cover”
Let E ⊂ X and let U = {Ui }i∈I be a family of subsets of X, where I is some set of
indices. We say that U covers E if
[ [
E⊂ U= Ui .
U ∈U i∈I

Any family V ⊂ U that still covers E is called a subcover of U.

Definition 9.60: Compactness


A metric space (X, d) is compact if one of the following equivalent conditions hold:

(1) X is topologically compact: every family of open sets {Ui }i∈I that cover X, has
a finite sub-family that still covers X.

(2) X is sequentially compact: every sequence (xn )n∈N in X has a subsequence that
is convergent in X.

(3) X is complete and totally bounded: for every r > 0, there exist finitely many
x1 , . . . , xn ∈ X such that the balls B(x1 , r), . . . , B(xn , r) cover X.

A subset E ⊂ X is compact in X if the metric space (E, d|E×E ) is compact.

9.61. — The definition of topological compactness does not explicitly use the distance
function d, but it is formulated only in terms of the collection of open sets (i.e., the topology).
For this reason it is called “topological”.


9.62. — The Bolzano-Weierstrass Theorem ensures that a closed and bounded interval of
R is sequentially compact.

Example 9.63. — Q ∩ [0, 2], endowed with the standard distance, is not topologically
compact. Consider the covering

Q ∩ [0, 2] = (Q ∩ [0, √2)) ∪ ⋃p∈Q, p>√2 (Q ∩ (p, 2]).

Any finite subfamily will miss some rationals just above √2.

Exercise 9.64. — Let X be a metric space. Show that if X is totally bounded, then it is
bounded, i.e., supx,x′ ∈X d(x, x′ ) < +∞.

Example 9.65. — The half-open interval X = (0, 1] ⊂ R is not compact. Indeed, the open
cover U = {(2−n , 1] | n ∈ N} has no finite subcover.
The main result of this section is Theorem 9.67, which shows that the definition of compact
metric space is indeed well-posed.

Exercise 9.66. — Show that a totally bounded metric space is bounded, and find an
example of a bounded metric space that is not totally bounded.

Theorem 9.67: The three faces of Compactness


Let (X, d) be a metric space, the following statements are equivalent:

(1) X is topologically compact.

(2) X is sequentially compact.

(3) X is totally bounded and complete (in the sense of Cauchy sequences).

We will prove that (1) =⇒ (3) =⇒ (2) =⇒ (1).


In order to prove the first of these implications we single out a rephrasing of (1) which is
sometimes useful to keep in mind.

Lemma 9.68: Nesting Principle

(X, d) is topologically compact if and only if it has the following property: for every
collection A of closed subsets of X, it holds

every intersection of finitely many elements of A is non-empty =⇒ ⋂A∈A A ̸= ∅.

Proof of Lemma 9.68. Assume X is compact, and let A be a collection of closed subsets of X
with an empty intersection. The collection of complements U = {X \ A | A ∈ A} is then an
open cover of X, and there exists a finite subcover X = U1 ∪ · · · ∪ Un of it. Set Ai = X \ Ui .
Then A1 ∩ · · · ∩ An = ∅. By contraposition, X satisfies the Nesting Principle.
Now, assume X satisfies the Nesting Principle, and show that X is compact. Let U be an
open cover of X. Then A = {X \ U | U ∈ U } is a collection of closed subsets with an empty
intersection. According to the Nesting Principle, there exist A1 , . . . , An ∈ A with an empty
intersection. Set Ui = X \ Ai . Then X = U1 ∪ · · · ∪ Un is a finite subcover. Since the cover U
was arbitrary, this shows that X is compact.

Proof that (1) =⇒ (3). We first prove that X is totally bounded. Pick any r > 0, consider the
open covering U := {B(x, r) : x ∈ X} and extract a finite subcover {B(x1 , r), . . . , B(xN , r)}.
Now we prove that X must be complete, hence we pick a Cauchy sequence (xn )n∈N and
show that it has a limit point. For each k ≥ 0 there is n(k) so large that

n, m ≥ n(k) =⇒ d(xn , xm ) < 2−k .

For each k, consider the closed ball Ak := B̄(xn(k) , 2−k ); any finite intersection of these is
nonempty: indeed, for every k1 , . . . , kN one has xm ∈ Ak1 ∩ . . . ∩ AkN , provided m ≥
max{n(k1 ), . . . , n(kN )}. Hence we can apply the Nesting Principle (Lemma 9.68) and find
some z ∈ ⋂k≥0 Ak . We claim that xn → z. Indeed, if m ≥ n(k) it holds

d(xm , z) ≤ d(xm , xn(k) ) + d(xn(k) , z) ≤ 21−k ,

and 21−k is arbitrarily small.

Before continuing the proof, we need an auxiliary lemma.

Lemma 9.69: Diagonal Argument


Let N ⊃ N0 ⊃ N1 ⊃ N2 ⊃ . . . be an infinite family of nested sets. Assume further that
each Nk has infinitely many elements. Then there exists f : N → N strictly increasing
such that f (k) ∈ Nk for all k ≥ 0.

Proof of Lemma 9.69. Set f (0) equal to an arbitrary element of N0 . Then set inductively
f (k) := min{m ∈ Nk : m > f (k − 1)}; this minimum exists because each Nk is infinite.
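The construction in Lemma 9.69 is effective whenever membership in each Nk can be tested. A sketch (it takes f(0) minimal rather than arbitrary, and the nested family Nk = {multiples of 2^k} is a hypothetical choice):

```python
# Diagonal selection from Lemma 9.69: given nested infinite sets
# N ⊇ N_0 ⊇ N_1 ⊇ ..., pick f(k) = min{m in N_k : m > f(k-1)},
# which is strictly increasing with f(k) in N_k.

def diagonal(memberships, depth):
    """memberships[k](m) tests m ∈ N_k; returns [f(0), ..., f(depth-1)]."""
    f, prev = [], -1
    for k in range(depth):
        m = prev + 1
        while not memberships[k](m):   # terminates since N_k is infinite
            m += 1
        f.append(m)
        prev = m
    return f

# Hypothetical nested family: N_k = {multiples of 2^k} (infinite, nested).
tests = [(lambda m, k=k: m % (2 ** k) == 0) for k in range(6)]
print(diagonal(tests, 6))  # [0, 2, 4, 8, 16, 32]
```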

Proof that (3) =⇒ (2). We pick any sequence (xn )n∈N and show that it admits a Cauchy
subsequence; by completeness, this will prove (2).
By assumption, we can cover X by a finite number of balls of radius 1; it follows that
(xn )n∈N frequently lies in one of these balls, say in B(z1 , 1) for some z1 ∈ X.
Accordingly, we define the set of indices N0 := {j ∈ N : xj ∈ B(z1 , 1)}, which is infinite.
Now we proceed to do the same thing with the restricted sequence (xn )n∈N0 , but we shorten
the radius from 1 to 1/2. Accordingly, we find a ball B(z2 , 1/2) such that the set N1 := {j ∈
N0 : xj ∈ B(z2 , 1/2)} is infinite.


We proceed inductively, halving the radius each time, and construct a descending family
of infinite sets N ⊃ N0 ⊃ N1 ⊃ N2 ⊃ . . . with the property that

∀k ≥ 0, ∃zk ∈ X, j ∈ Nk =⇒ d(zk , xj ) < 2−k .

We apply to these sets Lemma 9.69 and find f : N → N strictly increasing such that f (k) ∈ Nk
for all k ≥ 0. Then, the subsequence (xf (k) )k∈N is Cauchy: for n, m ≥ k it holds

f (n) ∈ Nn , f (m) ∈ Nm =⇒ f (n), f (m) ∈ Nk =⇒ xf (n) , xf (m) ∈ B(zk , 2−k ) =⇒ d(xf (n) , xf (m) ) < 21−k .

We now prove the remaining implication, concluding the proof of Theorem 9.67.

Proof that (2) =⇒ (1). Assume U is an open covering of X, we want to construct a finite
open subcover out of it.
We start with a preliminary observation. Consider the function

r(x) := 1 ∧ sup{r > 0 : B(x, r) ⊂ U for some U ∈ U}, for all x ∈ X.

Notice that r(x) > 0 since U is an open cover. For each x ∈ X we choose — once and for all —
some U (x) ∈ U which is almost optimal, in the sense that B(x, r(x)/2) ⊂ U (x).
Let us now proceed with the construction of our finite subcover. Pick any U0 ∈ U. If
X ⊂ U0 then we are done, otherwise there is some x1 ∈ X \U0 . In this case we set U1 := U (x1 ).
Now we check if X ⊂ U0 ∪ U1 , in which case we have found our finite subcover. If not,
there is some x2 ∈ X \ (U0 ∪ U1 ) and we set U2 := U (x2 ).
Now we check again if X ⊂ U0 ∪ U1 ∪ U2 , in which case we have found our finite subcover.
If not, there is some x3 ∈ X \ (U0 ∪ U1 ∪ U2 ) and we set U3 := U (x3 ).
If this procedure stops at a certain point, it means that we have found our finite open
subcover. So let us assume that it goes on indefinitely, and derive a contradiction. We obtain
a sequence (xn )n∈N with the property that

xm ∉ U0 ∪ . . . ∪ Un for all m > n ≥ 0.

By assumption (2), we have xf (n) → z for a suitable subsequence. On the other hand, for
every m > n we have xf (m) ∉ Uf (n) ⊃ B(xf (n) , r(xf (n) )/2); letting m → ∞ along the
subsequence, and using that the complement of an open ball is closed, we obtain
z ∉ B(xf (n) , r(xf (n) )/2), that is,

r(xf (n) ) ≤ 2 d(xf (n) , z) → 0.

This leads to a contradiction: since r(z) > 0, for n large enough we have d(xf (n) , z) < r(z)/4,
hence

B(xf (n) , r(z)/4) ⊂ B(z, r(z)/2) ⊂ U (z) ∈ U,

which, by the definition of r(xf (n) ), implies r(xf (n) ) ≥ min{1, r(z)/4} > 0, contradicting
r(xf (n) ) → 0.

Corollary 9.70: Product of Compacts is Compact


Let (X, dX ) and (Y, dY ) be two compact metric spaces. Endow X × Y with the distance
dX×Y defined as

dX×Y ((x, y), (x′ , y ′ )) := dX (x, x′ ) + dY (y, y ′ ).

Then also (X × Y, dX×Y ) is compact.

Proof. We use the sequential characterization. Let ((xn , yn ))n∈N be a sequence in X × Y .
By compactness of X, there is a subsequence (xf (n) )n converging to some x ∈ X; by
compactness of Y , a further subsequence (yf (g(n)) )n converges to some y ∈ Y . Since
dX×Y ((xn , yn ), (x, y)) = dX (xn , x) + dY (yn , y), the subsequence ((xf (g(n)) , yf (g(n)) ))n
converges to (x, y) in X × Y .

Corollary 9.71: Closed in Compact is Compact


Let (X, d) be compact and let E ⊂ X be closed. Then E is compact.

Proof. We check that E is sequentially compact: take any sequence (xn )n∈N ⊂ E, by com-
pactness of X, it has a converging subsequence xf (n) → x, for some x ∈ X. Since E is closed
it contains its accumulation points, so in fact x ∈ E.

Corollary 9.72: Compact is always Closed


If E is compact in (X, d), then E is closed.

Proof. Take any sequence in E converging to some x ∈ X. By compactness, a suitable
subsequence converges to some x′ ∈ E; by uniqueness of the limit, x = x′ ∈ E, so E is closed
(see (2) in Lemma 9.46).

We also easily get the following version of the Heine-Borel theorem in Rn .

Theorem 9.73: Compact subsets of Rn (Heine-Borel)

A subset K ⊂ Rn is compact if and only if it is closed and bounded.

Proof. If K is compact, then it is closed by Corollary 9.72; and bounded, because it is totally
bounded.
To show the converse, we show that K is complete and totally bounded. Since Rn is
complete and K is closed, K is complete as well.
Given r > 0, take some integer N ≥ 2n/r and consider all the closed cubes that have
sidelength 1/N , corners with coordinates of the form m/N for some m ∈ Z, and intersect K.
There are only finitely many such cubes, because K is bounded. Furthermore, each of them is
contained in the ball of radius r centered at any one of its corners (its diameter is √n/N ≤
n/N ≤ r/2); thus we have found our finite set of balls that cover K.
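The grid covering in the proof can be checked numerically. A sketch for n = 2, where the set K (sampled from the closed unit disk) and the radius are illustrative choices, not part of the proof:

```python
import itertools, math

def grid_centers(r, bound=1.0, n=2):
    """Grid corners with coordinates m/N, N >= 2n/r, near [-bound, bound]^n."""
    N = max(1, math.ceil(2 * n / r))
    M = math.ceil(bound * N) + 1
    ticks = [m / N for m in range(-M, M + 1)]
    return list(itertools.product(ticks, repeat=n))

def covered(point, centers, r):
    return any(math.dist(point, c) <= r for c in centers)

r = 0.3
centers = grid_centers(r)
# Sample points of K = closed unit disk; each lies in some ball B(c, r),
# since the nearest grid corner is within half the cube diameter < r.
sample = [(0.9 * math.cos(t), 0.9 * math.sin(t))
          for t in [2 * math.pi * k / 50 for k in range(50)]] + [(0.0, 0.0)]
print(all(covered(p, centers, r) for p in sample))  # True
```

Only finitely many centers are needed, exactly as the boundedness of K guarantees in the proof.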


Example 9.74. — We stress that the Heine-Borel Theorem fails for general metric spaces:
take R with the bounded distance d(x, y) := arctan |x − y| (see Exercise 9.7). Then in this
metric space the set N is closed and bounded, yet the sequence xn = n has no convergent
subsequence, so N is not compact.

Exercise 9.75. — Show that an open set U ⊂ Rn is complete if and only if U = Rn .

9.2.5 Compactness and continuity

Theorem 9.76: Continuous functions preserve compactness


Let X and Y be metric spaces, let f : X → Y be a continuous function, and let K ⊂ X
be a compact subset. Then, f (K) is a compact subset of Y .

Proof. Let U be an open cover of f (K). For each U ∈ U, the set f −1 (U ) ⊂ X is open due
to the continuity of f . The collection {f −1 (U ) | U ∈ U } is an open cover of K. Since K is
compact, there exist U1 , . . . , Un ∈ U such that {f −1 (Ui ) | 1 ≤ i ≤ n} is a cover of K. This
implies that {Ui | 1 ≤ i ≤ n} is a cover of f (K), and since U was arbitrary, it shows that f (K)
is compact.

Proposition 9.77: Continuous in a Compact is Uniformly Continuous


Let X and Y be metric spaces, and f : X → Y a continuous function. If X is compact,
then f is uniformly continuous.

Proof. Let ε > 0. Due to the continuity of f , for each x ∈ X, there exists δx > 0 such
that f (B(x, δx )) ⊂ B(f (x), ε/2). The collection {B(x, δx /2) | x ∈ X} forms an open cover of
X. Since X is compact by assumption, there exists a finite subcover of this collection. This
implies the existence of x1 , . . . , xn ∈ X such that

X = B(x1 , δx1 /2) ∪ · · · ∪ B(xn , δxn /2).

Let δ = min{δx1 , . . . , δxn }/2. For x, x′ ∈ X with d(x, x′ ) < δ, there exists k such that
x ∈ B(xk , δxk /2). This implies x′ ∈ B(xk , δxk ), leading to

d(f (x), f (x′ )) ≤ d(f (x), f (xk )) + d(f (xk ), f (x′ )) ≤ ε/2 + ε/2 = ε,

which completes the proof.

Corollary 9.78: Weierstrass


Let X be a compact metric space and f : X → R a continuous function. Then f admits
a maximum point, i.e., there exists x̄ ∈ X such that f (x̄) = supX f . An analogous
statement holds for the minimum. In particular, f must be bounded.


Proof. f (X) ⊂ R is compact by Theorem 9.76 and nonempty, so sup f (X) ∈ f (X), and any
element x̄ ∈ f −1 ({sup f (X)}) works.
The fact that sup f (X) ∈ f (X) is readily proved: by definition of the supremum there is a
sequence (sn ) ⊂ f (X) such that sn → sup f (X); but then sup f (X) ∈ f (X), since f (X) is
closed and so contains its accumulation points.

9.2.6 Connectedness

Definition 9.79: Connectedness


Let (X, d) be a metric space. We call X connected if, apart from ∅ ⊂ X and X ⊂ X,
there are no subsets of X that are both open and closed.
A subset E ⊂ X is called connected in X if (E, d|E×E ) is connected as an independent
metric space.
A subset Y ⊂ X is called a connected component of X if Y is non-empty, open,
closed, and connected.

Lemma 9.80:
Let (X, d) be a metric space, and let Y1 and Y2 be connected subspaces. If the intersection
Y1 ∩ Y2 is non-empty, then the union Y1 ∪ Y2 is connected.

Proof. Let A be a non-empty open and closed subset of Y1 ∪ Y2 . Then, A ∩ Yj is an open and
closed subset of Yj for j = 1, 2. Since A is non-empty, one of these intersections is non-empty;
let’s say A ∩ Y1 is non-empty. As Y1 is connected, we have A ∩ Y1 = Y1 , implying Y1 ⊂ A.
Since Y1 ∩ Y2 ̸= ∅, we have A ∩ Y2 ̸= ∅, and similarly, Y2 ⊂ A. In summary, A = Y1 ∪ Y2 . This
proves that Y1 ∪ Y2 is connected.

Exercise 9.81. — Generalize Lemma 9.80 for arbitrary unions.

Proposition 9.82: Connected subsets of R


A non-empty subset X ⊂ R is connected in R if and only if X is an interval.

Proof. Assume X ⊂ R is not an interval. Then there exist real numbers x1 < y < x2 with
x1 , x2 ∈ X and y ∈
/ X. Define the subsets

U1 = (−∞, y) ∩ X = (−∞, y] ∩ X and U2 = (y, ∞) ∩ X = [y, ∞) ∩ X.

These sets are open and closed (as subsets of X, not in R!), non-empty, and satisfy X = U1 ∪U2 .
Thus, X is not connected.
Now, let X ⊂ R be an interval. Assume there exist non-empty, open, and closed subsets
Y1 ⊂ X and Y2 ⊂ X, with Y1 ∩ Y2 = ∅ and Y1 ∪ Y2 = X. Choose x1 ∈ Y1 and x2 ∈ Y2 .


Without loss of generality, assume x1 = 0 and x2 = 1. Since X is an interval, it follows that
[0, 1] ⊂ X = Y1 ∪ Y2 . Now let us define the “magic” number

t∗ := sup{t ≥ 0 : [0, t] ⊂ Y1 },

since 1 ∈ Y2 we have t∗ ∈ [0, 1], so t∗ must belong either to Y1 or to Y2 , but both eventualities
are contradictory.
If t∗ ∈ Y1 then t∗ < 1 and, since Y1 is open, we could enlarge t∗ a bit, violating its very
definition.
If t∗ ∈ Y2 , since Y2 is open, it would mean that a whole little neighbourhood of t∗ is
contained in Y2 , but this is impossible because t∗ must be an accumulation point of Y1 .

Proposition 9.83:
Let (X, dX ) and (Y, dY ) be metric spaces, and let f : X → Y be continuous. If X is
connected, then the image f (X) is a connected subspace of Y .

Proof. Without loss of generality, assume that f : X → Y is surjective, replacing f : X → Y


by the continuous surjective map X → f (X), x 7→ f (x) if necessary. Suppose Y is not
connected. Then there exists an open and closed subset A ⊂ Y with A ̸= ∅ and A ̸= Y . The
preimage f −1 (A) of A under f is also open and closed due to the continuity of f . Since X is
connected, either f −1 (A) = ∅ or f −1 (A) = X. However, this implies A = ∅ or A = Y due to
the surjectivity of f , contradicting the assumption on A. Therefore, f (X) is connected.

Corollary 9.84: Intermediate Value Theorem


Let I ⊂ R be an interval, f : I → R a continuous function, and a, b ∈ I. For every
c ∈ R between f (a) and f (b), there exists an x ∈ I between a and b such that f (x) = c.

Proof. Without loss of generality, we can assume a < b. We apply Proposition 9.83 to the con-
tinuous function f |[a,b] : [a, b] → R. Consequently, f ([a, b]) is connected, since by Proposition
9.82, the interval [a, b] in R is connected. Again, according to Proposition 9.82, f ([a, b]) ⊂ R
must be an interval. As f (a), f (b) ∈ f ([a, b]), all values between f (a) and f (b) lie in the image
of f |[a,b] .

Exercise 9.85. — Show the following generalization of the Intermediate Value Theorem:
Let X be a connected topological space, and f : X → R be a continuous function. Let
a, b ∈ X. Then, for every c ∈ R between f (a) and f (b), there exists x ∈ X such that f (x) = c.

Interlude: Paths and Curves


Let X be a metric space. A path or curve in X is a continuous function γ : [0, 1] → X.
We call γ(0) the starting point and γ(1) the ending point. We also say that γ is a
path from γ(0) to γ(1). A path γ with γ(0) = γ(1) is called closed or a loop.


If s : [a, b] → [0, 1] is a bijective continuous function with continuous inverse we say that
γ ◦ s is a re-parametrization of γ. Furthermore, exactly one of the following happens

• either, s(a) = 0, s(b) = 1, then we say that s is orientation-preserving,

• or, s(a) = 1, s(b) = 0, then we say that s is orientation-reversing.

Definition 9.86:
We call a topological space X path-connected if, for every two points x, y ∈ X, there
exists a path γ : [0, 1] → X from x = γ(0) to y = γ(1).

Lemma 9.87:
Every path-connected topological space is connected.

Proof. Let X be a disconnected topological space. Then there exist non-empty, open, and
closed subsets U1 and U2 of X such that U1 ∩ U2 = ∅ and U1 ∪ U2 = X. Let x1 ∈ U1 and
x2 ∈ U2 . If X were path-connected, there would exist a path γ : [0, 1] → X from x1 to x2 .
However, this implies that V1 = γ −1 (U1 ) and V2 = γ −1 (U2 ) are non-empty, open, and closed
subsets of [0, 1] with V1 ∩ V2 = ∅ and V1 ∪ V2 = [0, 1], which is a contradiction since [0, 1] is
connected.

Exercise 9.88. — Sketch the subspace X ⊂ R2 given by

X = ({0} × [−1, 1]) ∪ {(t, sin(1/t)) | t > 0}

and show that X is connected but not path-connected.

Proposition 9.89:
Let U ⊂ Rn be an open subset. Then U is path-connected if and only if U is connected.

Proof. If U is path-connected, then U is also connected according to Lemma 9.87. Now,


assume U is connected, non-empty, and x0 ∈ U is a fixed point. We define the set

G = {x ∈ U | there exists a path in U from x0 to x}

and want to show that G = U . Since U is connected and G is non-empty, it suffices to show
that G is both open and closed.
Let x ∈ G and γ : [0, 1] → U be a path from x0 to x. Since U is open, there exists r > 0
such that B(x, r) ⊂ U . For any y ∈ B(x, r), the straight path t 7→ (1 − t)x + ty, connecting x
and y, lies in U . Concatenating these paths yields the path

t ↦ γ(2t) if 0 ≤ t ≤ 1/2, and t ↦ (2 − 2t)x + (2t − 1)y if 1/2 < t ≤ 1,


from x0 to y. Thus, y ∈ G, and since y was arbitrary, we have B(x, r) ⊂ G. This shows that
G is open. Using a similar argument, we can show that U \ G is open. If x ̸∈ G and r > 0
with B(x, r) ⊂ U , then all points in B(x, r) are not in G. If y ∈ G ∩ B(x, r), a concatenation
of paths as above would connect x to x0 . Therefore, B(x, r) ⊂ U \ G, and U \ G is open.
Thus, G is closed.
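The concatenated path used in the proof can be written down directly in coordinates. A sketch (the particular γ, x0, x, y below are hypothetical choices for illustration):

```python
# Concatenation from the proof: follow gamma (rescaled) on [0, 1/2], then
# the straight segment from x to y on (1/2, 1]; points live in R^2 here.

def concat_with_segment(gamma, x, y):
    def path(t):
        if t <= 0.5:
            return gamma(2 * t)
        s = 2 * t - 1  # maps (1/2, 1] onto (0, 1]
        return tuple((1 - s) * xi + s * yi for xi, yi in zip(x, y))
    return path

# Hypothetical data: gamma is the straight path from x0 = (0,0) to x = (1,0).
x0, x, y = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
gamma = lambda t: ((1 - t) * x0[0] + t * x[0], (1 - t) * x0[1] + t * x[1])
path = concat_with_segment(gamma, x, y)
print(path(0.0), path(0.5), path(1.0))  # (0.0, 0.0) (1.0, 0.0) (1.0, 1.0)
```

At t = 1/2 both formulas give the point x, so the concatenation is continuous, exactly as needed in the proof.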

Corollary 9.90: Rn and its balls are connected


For all n ≥ 1 and r > 0, the metric space Rn (with the standard distance) and the
subsets B(x, r) and B̄(x, r) of Rn are connected.



Chapter 9.3 Normed vector spaces

9.3 Normed vector spaces

9.3.1 Definition of Normed Vector spaces


A norm on a real vector space V is a function from V to R that assigns to each vector a
non-negative number, informally its length. In general there are many different norms on a
vector space, and we can use any of them to construct a distance and turn V into a metric
space. A particularly interesting class of norms is obtained from scalar products. Many
notions of this sections would carry out to vector spaces over C with little modifications, but
we will stick to real vector spaces.

Definition 9.91:
Let V be a vector space over R. A norm on V is a mapping ∥ · ∥ : V → [0, ∞) that
satisfies the following three properties.

(1) (Definiteness) For all v ∈ V , ∥v∥ = 0 ⇐⇒ v = 0.

(2) (Homogeneity) For all v ∈ V and all α ∈ R, ∥αv∥ = |α|∥v∥.

(3) (Triangle Inequality) For all v, w ∈ V , ∥v + w∥ ≤ ∥v∥ + ∥w∥.

The pair (V, ∥ · ∥) is called a normed vector space.

Example 9.92. — Let n ∈ N. The maximum norm or infinity norm ∥ · ∥∞ , and the
1-norm ∥ · ∥1 on Rn are defined by
∥v∥∞ = max{|v1 |, |v2 |, . . . , |vn |}    and    ∥v∥1 = |v1 | + |v2 | + · · · + |vn |

for v = (v1 , . . . , vn ) ∈ Rn . The properties of definiteness and homogeneity, as well as the
triangle inequality, can be verified as an exercise.
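Since these norms are completely elementary, they are easy to experiment with numerically. The following Python sketch (an illustration, not part of the notes; the function names are ours) spot-checks the three norm axioms on sample vectors:

```python
def norm_inf(v):
    # maximum norm: the largest absolute value among the entries
    return max(abs(x) for x in v)

def norm_1(v):
    # 1-norm: the sum of the absolute values of the entries
    return sum(abs(x) for x in v)

v = (3.0, -4.0, 1.0)
w = (-1.0, 2.0, 5.0)

assert norm_inf(v) == 4.0 and norm_1(v) == 8.0
# homogeneity: ||2v|| = 2 ||v||
assert norm_1([2 * x for x in v]) == 2 * norm_1(v)
# triangle inequality: ||v + w|| <= ||v|| + ||w||
s = [x + y for x, y in zip(v, w)]
assert norm_1(s) <= norm_1(v) + norm_1(w)
assert norm_inf(s) <= norm_inf(v) + norm_inf(w)
```

Of course, a finite check is no proof; the point is only to make the definitions concrete.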

Example 9.93. — If V is the vector space of continuous R-valued functions on [0, 1], we
define analogously the 1-norm and the infinity norm as
∥f ∥1 = ∫[0,1] |f (x)| dx    and    ∥f ∥∞ = sup{|f (x)| : x ∈ [0, 1]}.

We immediately observe that Normed Vector Spaces are “automatically” Metric Spaces.


Lemma 9.94: Normed Vector Spaces are Naturally Metric Spaces


Let V be a vector space over R and ∥ · ∥ be a norm on V . Define the function

d : V × V → [0, ∞), d(v, w) := ∥v − w∥,

then (V, d) is a metric space.

Proof. We check definiteness, symmetry, and the triangle inequality from the definition of a
metric (Definition 9.3). For v, w ∈ V , we have d(v, w) = ∥v − w∥ ≥ 0, and

d(v, w) = 0 ⇐⇒ ∥v − w∥ = 0 ⇐⇒ v − w = 0 ⇐⇒ v = w

by the definiteness of the norm. Using homogeneity of the norm for α = −1, we have for
v, w ∈ V ,
d(v, w) = ∥v − w∥ = ∥(−1)(v − w)∥ = ∥w − v∥ = d(w, v)

thus establishing the symmetry of d. Finally, using the triangle inequality of the norm, we
obtain

d(u, w) = ∥u − w∥ = ∥(u − v) + (v − w)∥ ≤ ∥u − v∥ + ∥v − w∥ = d(u, v) + d(v, w)

for all u, v, w ∈ V . This shows the triangle inequality for d, so d is indeed a metric on V .

We have seen that a normed space is naturally a metric space, now we check that the norm
is indeed continuous. We show sequential continuity.

Lemma 9.95: The Norm is continuous with respect to its own distance
Let V be an R-vector space, and let ∥ · ∥ be a norm on V . Let (vn )∞n=0 be a sequence in
V converging with respect to the norm ∥ · ∥ to a limit w ∈ V . Then,

limn→∞ ∥vn ∥ = ∥w∥.

Proof. Since (vn )∞n=0 converges to w with respect to ∥ · ∥, the real sequence (∥vn − w∥)∞n=0
converges to 0. Using
the triangle inequality, we get

∥w∥ − ∥vn − w∥ ≤ ∥vn ∥ ≤ ∥vn − w∥ + ∥w∥

and the lemma follows from the sandwich lemma for sequences of real numbers.


9.3.2 Inner product spaces

Definition 9.96:
Let V be a vector space over R. An inner product on V is a map

⟨−, −⟩ : V × V → R

that satisfies the following properties for all u, v, w ∈ V and α, β ∈ R:

(1) (Bilinearity) ⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩.

(2) (Symmetry) ⟨v, w⟩ = ⟨w, v⟩.

(3) (Definiteness) ⟨v, v⟩ ≥ 0 and ⟨v, v⟩ = 0 ⇐⇒ v = 0.

9.97. — An important example of an inner product is the Euclidean inner product or


standard inner product on Rd . It is given by

⟨−, −⟩ : Rd × Rd → R,    ⟨v, w⟩ = v1 w1 + v2 w2 + · · · + vd wd

for v = (v1 , . . . , vd ) and w = (w1 , . . . , wd ). The proof of bilinearity and symmetry is left as an
exercise. We verify definiteness. Let v = (v1 , . . . , vd ) ∈ Rd . Then,

⟨v, v⟩ = |v1 |2 + |v2 |2 + · · · + |vd |2 ≥ 0

is a non-negative real number. If v = 0, then ⟨v, v⟩ = 0. If ⟨v, v⟩ = 0, then each term |vk |2
must be zero, and thus vk = 0 for all k, implying v = 0.

Proposition 9.98: Cauchy-Schwarz Inequality

Let V be a vector space over R, let ⟨−, −⟩ be an inner product on V , and let ∥ · ∥ : V → R
be given by ∥v∥ = √⟨v, v⟩. Then the inequality holds

|⟨v, w⟩| ≤ ∥v∥∥w∥ (9.2)

for all v, w ∈ V . Furthermore, equality in (9.2) holds if and only if v and w are linearly
dependent.

Proof. If v = 0 or w = 0, then both sides of (9.2) are zero, and the vectors v, w are linearly
dependent. So, we assume that v ̸= 0 and w ̸= 0. Then, for α = ⟨v, w⟩∥w∥−2 , we have

∥v − αw∥2 = ⟨v − αw, v − αw⟩ = ⟨v, v − αw⟩ − α⟨w, v − αw⟩
= ⟨v, v⟩ − 2α⟨v, w⟩ + α2 ⟨w, w⟩ = ∥v∥2 − 2α⟨v, w⟩ + α2 ∥w∥2

= ∥v∥2 − 2 |⟨v, w⟩|2 /∥w∥2 + |⟨v, w⟩|2 /∥w∥2 = ∥v∥2 − |⟨v, w⟩|2 /∥w∥2 .

The real number ∥v − αw∥2 is non-negative, and it follows

∥v∥2 ∥w∥2 ≥ |⟨v, w⟩|2

implying the desired inequality (9.2). Equality holds if and only if ∥v − αw∥ = 0, implying
v = αw.
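A quick numerical sanity check of the Cauchy–Schwarz inequality for the standard inner product on R^5, in Python (illustrative only, not part of the notes; the names are ours):

```python
import math
import random

def inner(v, w):
    # Euclidean (standard) inner product on R^n
    return sum(x * y for x, y in zip(v, w))

def norm(v):
    # the norm induced by the inner product
    return math.sqrt(inner(v, v))

random.seed(0)
for _ in range(1000):
    v = [random.uniform(-10, 10) for _ in range(5)]
    w = [random.uniform(-10, 10) for _ in range(5)]
    # |<v,w>| <= ||v|| ||w||, with a tiny slack for floating-point rounding
    assert abs(inner(v, w)) <= norm(v) * norm(w) + 1e-9

# equality exactly when v, w are linearly dependent, e.g. w = 3v
v = [1.0, 2.0, -2.0]
w = [3 * x for x in v]
assert math.isclose(abs(inner(v, w)), norm(v) * norm(w))
```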

Exercise 9.99. — Prove the Cauchy-Schwarz inequality by following these steps: Let
a > 0. Show that for all v, w ∈ Rn , the inequality

|⟨v, w⟩| ≤ (a2 /2) ∥v∥2 + (1/(2a2 )) ∥w∥2

holds and conclude from it the Cauchy-Schwarz inequality.

We prove that the norm induced by an inner product is indeed a norm on V .

Corollary 9.100:
Let V be a vector space over R, and let ⟨−, −⟩ be an inner product on V . The map defined
by (9.3),

∥ · ∥ : V → R,    ∥v∥ = √⟨v, v⟩,

satisfies the triangle inequality and is a norm.

Proof. Definiteness and homogeneity follow directly from the definiteness and bilinearity
of the inner product. We only need to prove the triangle inequality. Let v, w ∈ V . Using the
Cauchy-Schwarz inequality, we have the estimate

∥v + w∥2 = ⟨v + w, v + w⟩ = ∥v∥2 + ⟨v, w⟩ + ⟨w, v⟩ + ∥w∥2
= ∥v∥2 + 2⟨v, w⟩ + ∥w∥2 ≤ ∥v∥2 + 2|⟨v, w⟩| + ∥w∥2
≤ ∥v∥2 + 2∥v∥∥w∥ + ∥w∥2 = (∥v∥ + ∥w∥)2 ,

which implies the desired result after taking the square root.

9.101. — Let V be a vector space over R. If ⟨−, −⟩ is an inner product on V , we call the
norm treated in Corollary 9.100

∥ · ∥ : V → R,    ∥v∥ = √⟨v, v⟩    (9.3)


the norm induced by ⟨−, −⟩. In particular, the Euclidean inner product on V = Rn induces
a norm on V = Rn . The Euclidean norm on Rn is given by

∥v∥ = √⟨v, v⟩ = √( |v1 |2 + · · · + |vn |2 )

for all v = (v1 , . . . , vn ) ∈ Rn .

9.102. — From now on, in order to keep the notation simple, we will denote the Euclidean
norm of a vector x ∈ Rn by |x| instead of ∥x∥, which we will reserve for (less standard) norms.
Notice that this notation does not create any ambiguity or collision with previously introduced
notations. Indeed:

• If n = 1 the Euclidean norm coincides with the absolute value.

• If n = 2 and we identify R2 with C via the usual map (x1 , x2 ) ↦ x1 + ix2 , then the
Euclidean norm coincides with the complex absolute value.

9.103. — The Euclidean norm on Rn holds a special position among all norms on Rn . On
R2 or R3 , it measures the “physical” length of vectors. However, many other norms ∥ · ∥ endow
Rn with the structure of a normed vector space. A standard family of norms is given by

∥x∥p := ( |x1 |p + · · · + |xn |p )1/p ,

where p ∈ [1, +∞) is a given number. Notice that the Euclidean norm |x| corresponds to the
p = 2 case.
One can check (exercise) that, for any given x ∈ Rn ,

limp→+∞ ∥x∥p = max1≤i≤n |xi | = ∥x∥∞ .

This is why the maximum norm is commonly called the infinity norm and denoted ∥ · ∥∞ .
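The convergence of ∥x∥p to the maximum norm as p grows is easy to observe numerically. A minimal Python sketch (ours, not from the notes):

```python
def norm_p(x, p):
    # p-norm on R^n for a real p >= 1
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, -4.0, 1.0]
values = [norm_p(x, p) for p in (1, 2, 4, 16, 64, 256)]

# the values decrease monotonically towards max|x_i| = 4
assert all(a >= b for a, b in zip(values, values[1:]))
assert values[0] == 8.0                 # the 1-norm
assert abs(values[-1] - 4.0) < 1e-2     # already very close to the max norm
```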

9.3.3 Equivalence of norms in finite dimensional normed spaces

Interlude: Finite and infinite dimensional vector spaces


A basis for a vector space V is a subset B = {ei }i∈I ⊂ V such that

• every v ∈ V can be written as v = Σi∈I vi ei for some coefficients {vi }i∈I ⊂ R,
only finitely many of which are nonzero (so that the previous sum always makes
sense; it is not a series!);


• the coefficients {vi }i∈I are uniquely determined; in other words, the following
implication holds:

Σi∈I vi ei = 0, with finitely many non-zero vi ∈ R =⇒ vi = 0 ∀i ∈ I.

For every vector space we can build a sequence of linearly independent vectors
e1 , e2 , e3 , . . . . If every such sequence necessarily stops, we obtain a finite basis and we
say that the vector space is finite dimensional. If the sequence can be continued in-
definitely, we say that the vector space is infinite dimensional.
All bases of a finite dimensional vector space have the same number of vectors. This
number is called the dimension of V .

Exercise 9.104. — Show that the space of polynomials with real coefficients R[x] is an
infinite dimensional vector space and find a basis.

9.105. — If V has finite dimension n ∈ N and we fix a basis B = {ei }1≤i≤n , then the map

ıB : (x1 , . . . , xn ) ↦ x1 e1 + . . . + xn en

is a (vector space) isomorphism from Rn to V and allows us to treat V as Rn for most
practical tasks. In particular, if (V, ∥ · ∥) is a normed space, then ıB induces a norm on Rn defined as

∥x∥ = ∥ıB (x)∥.

This is a motivation to prove results for Rn equipped with norms different from the Euclidean
one. Indeed, all the concepts and results that can be stated for general norms will automat-
ically hold in “abstract” finite dimensional normed vector spaces. We see next an important
instance of this.

Definition 9.106: Equivalent (i.e., comparable) norms

Let V be a vector space over R and ∥ · ∥1 and ∥ · ∥2 be two norms on V . We call ∥ · ∥1


and ∥ · ∥2 equivalent if there are constants A > 0 and B > 0 such that

∥v∥1 ≤ A∥v∥2 and ∥v∥2 ≤ B∥v∥1 for all v ∈ V.

Example 9.107. — Let n ∈ N. The 1-norm ∥ · ∥1 and the maximum norm ∥ · ∥∞ given in
Example 9.92 are equivalent, as the inequalities

∥v∥∞ ≤ ∥v∥1 and ∥v∥1 ≤ n∥v∥∞

hold for all v ∈ Rn . As we will show in Theorem 9.108, all norms on a finite-dimensional


vector space over R are equivalent to each other. This is not the case for infinite-dimensional
vector spaces. For example, the norms given in 9.92 on the space of continuous functions on
[0, 1] are not equivalent.
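The failure of equivalence on the space of continuous functions can be made concrete with the spike functions fn(x) = max(0, 1 − nx): their supremum norm is always 1 while their 1-norm is 1/(2n), so no constant can bound one norm by the other. A hedged Python illustration using grid approximations of the two norms (all names are ours):

```python
def f_n(n):
    # spike function on [0, 1]: f_n(x) = max(0, 1 - n*x)
    # exact norms: sup-norm 1 (attained at x = 0), 1-norm 1/(2n)
    return lambda x: max(0.0, 1.0 - n * x)

def sup_norm(f, m=10_000):
    # grid approximation of sup |f| on [0, 1]
    return max(abs(f(k / m)) for k in range(m + 1))

def one_norm(f, m=10_000):
    # midpoint Riemann sum approximating the integral of |f| on [0, 1]
    return sum(abs(f((k + 0.5) / m)) for k in range(m)) / m

for n in (1, 10, 100):
    fn = f_n(n)
    assert sup_norm(fn) == 1.0
    assert abs(one_norm(fn) - 1 / (2 * n)) < 1e-3

# the ratio sup_norm / one_norm ~ 2n is unbounded, so no inequality
# sup_norm <= A * one_norm can hold: the two norms are not equivalent
```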

Theorem 9.108:
All norms on Rn are equivalent (i.e., comparable).

Proof. Let ∥ · ∥ be any norm on Rn , and let ∥ · ∥1 denote the 1-norm on Rn given in Example
9.92. We show that ∥ · ∥ and ∥ · ∥1 are equivalent, which proves the Theorem.
Let e1 , . . . , en denote the standard basis of Rn , and let A = max{∥e1 ∥, ∥e2 ∥, . . . , ∥en ∥}. For
any vector v = x1 e1 + · · · + xn en ∈ V , we have
∥v∥ ≤ |x1 |∥e1 ∥ + · · · + |xn |∥en ∥ ≤ A ( |x1 | + · · · + |xn | ) = A∥v∥1 ,

which already shows one of the two required estimates. For the second estimate, consider the
set
S = {v ∈ V | ∥v∥1 = 1}

and the real number B = inf{∥v∥ | v ∈ S}. There exists a sequence (vn )∞n=0 in S such that
(∥vn ∥)∞n=0 converges to B. Since (vn )∞n=0 is bounded for the 1-norm, it contains
a convergent subsequence by the Heine–Borel theorem. By replacing (vn )∞n=0 with such a
subsequence, we can ensure that (vn )∞n=0 converges to some w ∈ V with respect to the 1-norm. For
any ε > 0, there exists an N ∈ N such that

n ≥ N =⇒ ∥w − vn ∥1 ≤ A−1 ε =⇒ ∥w − vn ∥ ≤ ε.

We deduce that the sequence (vn )∞n=0 also converges to w with respect to the norm ∥ · ∥. Thus,
by Lemma 9.95,
∥w∥1 = 1 and ∥w∥ = B,

and, in particular, w ̸= 0 and B > 0. For any vector v ̸= 0 in V , the vector (1/∥v∥1 ) v is an
element of S, and it satisfies

∥v∥ / ∥v∥1 = ∥ (1/∥v∥1 ) v ∥ ≥ B.
Thus, ∥v∥1 ≤ B −1 ∥v∥ for all v ̸= 0 in V . Since this estimate also holds for v = 0, it is shown
that the norms ∥ · ∥ and ∥ · ∥1 are equivalent.

9.109. — Notice that, as a simple consequence of Theorem 9.108, in every finite dimensional
vector space V all norms are equivalent. Indeed, we can fix a basis B = {e1 , . . . , en } of V and
use the map ıB defined in 9.105 to transfer the result from Rn to V .


9.3.4 The space of bounded continuous functions with values in Rm


If (X, d) is a metric space, we denote by C(X, Rm ) the vector space of continuous functions
defined on X with values in Rm . We denote by Cb (X, Rm ) the subspace of those continuous
functions which are also bounded, that is, such that supx∈X |f (x)| < +∞.

Proposition 9.110:
For all f ∈ Cb (X, R) set ∥f ∥ := supx∈X |f (x)|. Then (Cb (X, R), ∥ · ∥) is a complete
normed vector space. Furthermore, fn → f in this space if and only if the functions
(fn ) converge uniformly to f , meaning that

∀ε > 0, ∃N ∈ N, ∀x ∈ X, ∀n ≥ N : |f (x) − fn (x)| < ε.

The norm ∥ · ∥ defined above is called the supremum norm and is often denoted ∥ · ∥∞ .

Proof. First of all, ∥f ∥∞ < ∞ by assumption. Let us check the properties of the norm

• Zero norm implies zero function: If ∥f ∥ = 0, then |f (x)| = 0 for all x ∈ X, that is
f (x) = 0 for all x ∈ X. Hence f is the zero element of the vector space Cb (X, R).

• Homogeneity: For any λ ∈ R, x ∈ X, we have |λf (x)| = |λ||f (x)|. Taking the sup
over all x ∈ X, we get ∥λf ∥ = |λ|∥f ∥.

• Triangle inequality: For any f, g ∈ Cb (X, R) and x ∈ X we have

|f (x) + g(x)| ≤ |f (x)| + |g(x)| ≤ ∥f ∥ + ∥g∥.

Taking the sup over all x ∈ X, we get ∥f + g∥ ≤ ∥f ∥ + ∥g∥.

Now, let (fn )∞n=0 be a Cauchy sequence in Cb (X, R) with respect to ∥ · ∥. For each x ∈ X,
the sequence (fn (x))∞n=0 is a Cauchy sequence in R, and therefore converges to some limit
fx ∈ R. Define the function f : X → R by f (x) := fx . We claim that f ∈ Cb (X, R) and
that (fn )∞n=0 converges to f uniformly.

• Continuity of f : Fix x0 ∈ X and let ϵ > 0. Since (fn )∞n=0 is a Cauchy sequence with
respect to ∥ · ∥, there exists N ∈ N such that for all n, m ≥ N and all x ∈ X, we have
|fn (x) − fm (x)| < ϵ. Letting m → ∞, we obtain |fN (x) − f (x)| ≤ ϵ for all x ∈ X.
Since fN is continuous at x0 , there exists δ > 0 such that d(x, x0 ) < δ implies
|fN (x) − fN (x0 )| < ϵ. For such x,

|f (x) − f (x0 )| ≤ |f (x) − fN (x)| + |fN (x) − fN (x0 )| + |fN (x0 ) − f (x0 )| ≤ 3ϵ.

This shows that the function f : X → R is indeed continuous.

• Uniform convergence: Let ϵ > 0. Since (fn )∞ n=0 is a Cauchy sequence with respect
to ∥ · ∥, there exists N ∈ N such that for all n, m ≥ N and all x ∈ X, we have


|fn (x) − fm (x)| < ϵ. Fix n ≥ N . Then, for all x ∈ X, we have

|f (x) − fn (x)| = limm→∞ |fm (x) − fn (x)| ≤ ϵ.

This shows that (fn )∞n=0 converges uniformly to f .

• Boundedness: Taking ϵ = 1 in the uniform convergence estimate above, we find N ∈ N
such that, for all x ∈ X,

|f (x)| ≤ |fN (x)| + |f (x) − fN (x)| ≤ ∥fN ∥ + 1;

taking the supremum over x ∈ X, we find that f is bounded.

Conversely, suppose (fn )∞n=0 converges uniformly to f . Let ϵ > 0. By uniform convergence,
there exists N ∈ N such that for all n ≥ N and all x ∈ X, we have |fn (x) − f (x)| < ϵ.
This implies that, for all n ≥ N ,

∥fn − f ∥ = sup{|fn (x) − f (x)| | x ∈ X} ≤ ϵ.

Since ϵ > 0 was arbitrary, this shows that (fn )∞n=0 converges to f with respect to ∥ · ∥. This
completes the proof.
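To see the difference between convergence in the supremum norm and mere pointwise convergence, compare fn(x) = x^n (pointwise but not uniform on [0, 1]) with gn(x) = x/n (uniform). A small Python sketch approximating the sup-distance on a grid (illustrative only; names are ours):

```python
def sup_dist(f, g, m=1000):
    # grid approximation of the supremum distance on [0, 1]
    return max(abs(f(k / m) - g(k / m)) for k in range(m + 1))

# f_n(x) = x**n converges pointwise on [0, 1] to a discontinuous limit,
# but NOT uniformly: the sup-distance to the pointwise limit stays near 1
limit = lambda x: 1.0 if x == 1.0 else 0.0
for n in (1, 5, 50):
    f = lambda x, n=n: x ** n
    assert sup_dist(f, limit) > 0.9

# g_n(x) = x/n converges uniformly to 0: the sup-distance is 1/n -> 0
for n in (1, 10, 100):
    g = lambda x, n=n: x / n
    assert abs(sup_dist(g, lambda x: 0.0) - 1 / n) < 1e-12
```

Consistently with Proposition 9.110, the pointwise limit of the x^n sequence fails to be continuous, so no uniform convergence is possible.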

Exercise 9.111. — Show that if fj → f uniformly and xj → x, then fj (xj ) → f (x). (F)

By the Weierstrass theorem (Corollary 9.78), C(X, Rm ) = Cb (X, Rm ) whenever X is a
compact metric space.

Theorem 9.112: Ascoli–Arzelà (F)

Let (X, d) be a compact metric space and let F ⊂ C(X, Rm ) be a subset of functions
which is

(1) equi-bounded, meaning C := supf ∈F ,x∈X |f (x)| < ∞,

(2) equi-continuous, meaning

(9.4)

∀ε > 0, ∃δ > 0, ∀f ∈ F, d(x, y) ≤ δ =⇒ |f (x) − f (y)| ≤ ε .

Then the closure F̄ of F is compact in C(X, Rm ). Conversely, any compact subset of
C(X, Rm ) is equi-bounded and equi-continuous.


Proof of (⇒). (F) We show that F̄ is complete and totally bounded. For the sake of simplicity,
we work with m = 1; the general case follows arguing coordinate-by-coordinate.
Since F̄ is closed (by definition) and C(X) is complete (by Proposition 9.110), F̄ is
complete.


Let us fix ε > 0, and let δ > 0 be given by (9.4). Since X is compact, it is also totally
bounded, so we find points x1 , . . . , xN in X such that

X = B(x1 , δ) ∪ . . . ∪ B(xN , δ).

Now divide the interval [−C, C] into intervals shorter than ε, so that

[−C, C] = I1 ∪ . . . ∪ IM , M ∼ 2C/ε.

Consider the finite set of choices

Σ := { σ : {1, . . . , N } → {1, . . . , M } : ∃fσ ∈ F with fσ (xi ) ∈ Iσ(i) ∀i = 1, . . . , N },

and for each σ ∈ Σ we fix one such function fσ . We claim that the family of balls
{B(fσ , 4ε)}σ∈Σ covers F, that is,

∀g ∈ F, ∃σ ∈ Σ, ∀x ∈ X : |fσ (x) − g(x)| < 4ε.

To see this, define ρ : {1, . . . , N } → {1, . . . , M } by requiring g(xi ) ∈ Iρ(i) for all i = 1, . . . , N .
By definition, ρ ∈ Σ. Now, for any x ∈ X, take j in such a way that x ∈ B(xj , δ), and we can
bound

|fρ (x) − g(x)| ≤ |fρ (x) − fρ (xj )| + |fρ (xj ) − g(xj )| + |g(xj ) − g(x)| ≤ 3ε < 4ε,

where the first and the third terms are controlled by equicontinuity (d(x, xj ) < δ), and the
second by using that fρ (xj ) and g(xj ) both lie in Iρ(j) , which is shorter than ε.
Finally, we conclude by observing that if {B(fσ , 4ε)}σ∈Σ covers F, then {B(fσ , 5ε)}σ∈Σ must
cover F̄ (small exercise).

Proof of (⇐). (F) Assume F is compact in C(X). Then the product metric space X × F is
compact as well by Corollary 9.70. Furthermore, the function Φ : X × F → R, defined as

Φ : (x, f ) 7→ f (x),

is continuous (recall Exercise 9.111). By the Weierstrass Theorem, Φ is bounded, which means
that F is equi-bounded. By the Heine–Cantor Theorem, Φ is uniformly continuous, which means
that F is equi-continuous.

Exercise 9.113. — Generalize Theorem 9.76: the image of a compact set through a family
of equibounded and equicontinuous functions is compact in Rm . (F)

Exercise 9.114. — Show that, on a compact domain, pointwise convergence together with
equicontinuity implies uniform convergence.


Exercise 9.115. — Let uj : [0, 1] → R be a sequence of 2-Lipschitz functions such that
uj (1) = 0. Show that {uj }j∈N has a uniformly convergent subsequence. (F)

In practice, the Ascoli–Arzelà Theorem is often employed in its sequential version, which is

Corollary 9.116: Ascoli–Arzelà for sequences (F)

Let (X, d) be a compact metric space and let {fk } be a bounded sequence in C(X, Rm ).
Assume that {fk } is equi-continuous, meaning that

∀ε > 0, ∃δ > 0, ∀k ∈ N, d(x, y) ≤ δ =⇒ |fk (x) − fk (y)| ≤ ε .

Then {fk } has a subsequence that converges uniformly to some f ∈ C(X, Rm ).

9.3.5 The Length of a Curve

Definition 9.117: Length of a C 1 curve

Let γ ∈ C 1 ([a, b], Rn ) be a curve. We define the length of γ as


L(γ) := ∫[a,b] ∥γ ′ (t)∥ dt,

where the vector γ ′ (t) = (γ1′ (t), . . . , γn′ (t)) is the velocity and the number ∥γ ′ (t)∥ is
the speed of the path at time t.

Exercise 9.118. — Show that if s : [0, 1] → [a, b] is a C 1 bijective map with C 1 inverse,
then L(γ ◦ s) = L(γ).

The main result of this section is

Theorem 9.119: Intrinsic formula for the length (F)

Let γ ∈ C 1 ([0, 1], Rn ) be a curve. Then

∫[0,1] ∥γ ′ (s)∥ ds = sup { Σj ∥γ(tj+1 ) − γ(tj )∥ : N ∈ N, 0 = t0 ≤ t1 ≤ . . . ≤ tN = 1 },

where the sum runs over j = 0, . . . , N − 1.

Proof. We start by showing LHS ≥ RHS. Fix N ∈ N, a partition 0 = t0 ≤ t1 ≤ . . . ≤ tN = 1,
and any unit vectors ν0 , . . . , νN −1 ∈ Rn . Since νj · γ ′ (s) ≤ ∥γ ′ (s)∥ by Cauchy–Schwarz, we have

∫[0,1] ∥γ ′ (s)∥ ds = Σj ∫[tj ,tj+1 ] ∥γ ′ (s)∥ ds ≥ Σj ∫[tj ,tj+1 ] νj · γ ′ (s) ds = Σj νj · (γ(tj+1 ) − γ(tj )).


With the choice νj := (γ(tj+1 ) − γ(tj ))/∥γ(tj+1 ) − γ(tj )∥ (taking any unit vector when
γ(tj+1 ) = γ(tj )), the last term becomes

Σj νj · (γ(tj+1 ) − γ(tj )) = Σj ∥γ(tj+1 ) − γ(tj )∥,

which proves LHS ≥ RHS.


We turn to the proof of the reverse inequality, so we fix some small error threshold ε > 0. By
uniform continuity of the maps

t ↦ γ ′ (t)/ max{ε, ∥γ ′ (t)∥} and t ↦ γ ′ (t)

from [0, 1] to Rn , we find δ > 0 such that, whenever [a, b] ⊂ [0, 1] with |b − a| ≤ δ,

∥ γ ′ (a)/ max{ε, ∥γ ′ (a)∥} − γ ′ (b)/ max{ε, ∥γ ′ (b)∥} ∥ < ε and ∥γ ′ (a) − γ ′ (b)∥ < ε. (9.5)

Now divide [0, 1] into N equal intervals of length less than δ and consider one of them, say
[tj , tj+1 ]. We distinguish two cases.
Case 1. If ∥γ ′ (s)∥ ≤ 2ε for all s ∈ [tj , tj+1 ], then we bound directly

∫[tj ,tj+1 ] ∥γ ′ (s)∥ ds ≤ 2ε(tj+1 − tj ).

Case 2. If there is s̄ ∈ [tj , tj+1 ] such that ∥γ ′ (s̄)∥ > 2ε then, thanks to (9.5), we have

∥γ ′ (s)∥ ≥ ∥γ ′ (s̄)∥ − ∥γ ′ (s) − γ ′ (s̄)∥ ≥ 2ε − ε = ε, for all s ∈ [tj , tj+1 ],

and thus max{ε, ∥γ ′ ∥} = ∥γ ′ ∥ in this interval. Then by (9.5) we find

∥γ ′ (s)∥ = γ ′ (s) · γ ′ (tj )/∥γ ′ (tj )∥ + γ ′ (s) · ( γ ′ (s)/∥γ ′ (s)∥ − γ ′ (tj )/∥γ ′ (tj )∥ )
≤ γ ′ (s) · γ ′ (tj )/∥γ ′ (tj )∥ + ε∥γ ′ (s)∥.

Thus, integrating this inequality in [tj , tj+1 ], and using Cauchy-Schwarz, we find

(1 − ε) ∫[tj ,tj+1 ] ∥γ ′ (s)∥ ds ≤ ( γ ′ (tj )/∥γ ′ (tj )∥ ) · ∫[tj ,tj+1 ] γ ′ (s) ds
= ( γ ′ (tj )/∥γ ′ (tj )∥ ) · (γ(tj+1 ) − γ(tj )) ≤ ∥γ(tj+1 ) − γ(tj )∥.

So we proved that, in both Case 1 and Case 2, we have

(1 − ε) ∫[tj ,tj+1 ] ∥γ ′ (s)∥ ds ≤ ∥γ(tj+1 ) − γ(tj )∥ + 2ε(tj+1 − tj ).


Summing in j, we find that

(1 − ε) LHS = (1 − ε) ∫[0,1] ∥γ ′ (s)∥ ds = (1 − ε) Σj ∫[tj ,tj+1 ] ∥γ ′ (s)∥ ds
≤ Σj ( ∥γ(tj+1 ) − γ(tj )∥ + 2ε(tj+1 − tj ) ) ≤ RHS + 2ε,

and we conclude letting ε ↓ 0.
We remark that the idea is simply to take intervals so short that on each of them γ ′ (s) is
approximately constant. The use of ε is instrumental to handle the case where γ ′ is small
but not pointing in a well-defined direction. In other words, the map t ↦ γ ′ (t)/∥γ ′ (t)∥ is not
necessarily continuous (nor defined where γ ′ vanishes), so we regularized it with ε.
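Theorem 9.119 can be checked numerically on the unit circle γ(t) = (cos 2πt, sin 2πt), whose length is 2π: the inscribed polygonal lengths increase towards the value of the speed integral. A Python sketch over uniform partitions (illustrative only; not part of the notes):

```python
import math

# gamma(t) = (cos 2πt, sin 2πt): the unit circle traversed once; length 2π
gamma = lambda t: (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))
speed = lambda t: 2 * math.pi   # ||gamma'(t)|| is constant for this curve

def polygonal_length(gamma, N):
    # sum of ||gamma(t_{j+1}) - gamma(t_j)|| over a uniform partition
    pts = [gamma(j / N) for j in range(N + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def integral_length(speed, m=100_000):
    # midpoint Riemann sum of the speed over [0, 1]
    return sum(speed((k + 0.5) / m) for k in range(m)) / m

L = 2 * math.pi
assert abs(integral_length(speed) - L) < 1e-9
# polygonal sums stay below the length and increase towards it
for N in (4, 16, 256):
    assert polygonal_length(gamma, N) < L
assert polygonal_length(gamma, 4) < polygonal_length(gamma, 16)
assert abs(polygonal_length(gamma, 256) - L) < 1e-3
```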

Exercise 9.120. — Let U ⊂ Rn be open and connected. A path γ : [0, 1] → U is called


piecewise differentiable if there exist finitely many points 0 = t0 < t1 < t2 < . . . < tn = 1 in
[0, 1] such that for all k = 1, 2, . . . , n, the restriction γ|[tk−1 ,tk ] is continuously differentiable.

1. Show that for any two points x, y ∈ U , there exists a piecewise differentiable path from
x to y.

2. The length of a piecewise differentiable path as above is defined as

L(γ) = Σk ∫[tk−1 ,tk ] |γ ′ (s)| ds, where the sum runs over k = 1, . . . , n.

Verify that the path metric dpath (x, y) for x, y ∈ U , defined by

dpath (x, y) = inf{L(γ) | γ is a piecewise differentiable path from x to y},

is indeed a metric.

3. Show that the path metric induces the standard topology on U ⊂ Rn .

4. Let f : U → R be continuously differentiable with a bounded derivative. Show that f is
Lipschitz continuous when U is equipped with the path metric.

5. Find an example of a connected set U ⊂ Rn and a differentiable function f : U → R


with a bounded derivative that is not Lipschitz continuous with respect to the Euclidean
metric.



Chapter 10

Multidimensional Differentiation

In this chapter, we extend the concept of derivatives to functions defined on open subsets
U ⊂ Rn and taking values in Rm . We will not impose any restrictions on the positive integers
n, m.

10.1 The Differential

10.1.1 Definitions
The derivative of a real-variable function f : R → R at some point x0 ∈ R has various
equivalent interpretations. Of course each of these interpretations provides the same number
f ′ (x0 ), but the “meaning” we attach to this number is slightly different in each case. (F) Let
us recall three important ones:

• Slope of tangent line to the graph. We look at the graph of f , i.e. the curve
{y = f (x)} ⊂ R2 and write the tangent line to the graph at (x0 , f (x0 )) in the form
y = ax + b. Then we have a = f ′ (x0 ).

• Coefficient in infinitesimal linear approximation. The coefficient of the linear part
of the best affine approximation of f near x0 . More precisely, assume we want to approximate
f (x) ∼ a + b(x − x0 ) for x ∼ x0 , for some real numbers a, b. Then, choosing a = f (x0 )
and b = f ′ (x0 ), we get an approximation whose error is o(|x − x0 |) as x → x0 .

• Stretching Factor. Look at a short interval I around x0 and the corresponding interval
f (I) around f (x0 ). These two intervals are related through a “stretching factor” which
tends to f ′ (x0 ) as I is taken shorter. More rigorously, the family of functions

T fr : [−1, 1] ∋ y ↦ (f (x0 + ry) − f (x0 ))/r,

as r ↓ 0, converges uniformly on [−1, 1] to the linear map y ↦ f ′ (x0 )y.


To generalize derivatives to functions f : Rn → Rm we will start from the second point of
view. As will become clear later on, all three viewpoints (conveniently reinterpreted) remain
valid in the several-variables case.

Definition 10.1: Differential of a Function


Let U ⊂ Rn be open and f : U → Rm be a function. Then f is called differentiable
at x0 ∈ U if there exists a linear map L : Rn → Rm such that

limx→0 ∥f (x0 + x) − f (x0 ) − L(x)∥ / ∥x∥ = 0

holds. If such L exists it is unique (exercise) and it is called the differential of f at


the point x0 , and we denote it as
L = Dfx0 .

The function f is called differentiable in U if it is differentiable at every point in U .


10.2. — If f : U → Rm is differentiable at the point x0 ∈ U , with differential L = Dfx0 ,


we can express f as
f (x0 + x) = f (x0 ) + L(x) + R(x)

Here we recognize the affine-linear approximation x ↦ f (x0 ) + L(x) to f , and a remainder
term R(x) for which, according to the definition of the differential, R(x) = o(∥x∥) holds
as x → 0. A notation often found in the literature for the differential of f at the point x0 is
dfx0 .

10.3. — For functions f : R → R the derivative f ′ (x0 ) is a real number. Notice that in
this case L(y) = Dfx0 (y) = f ′ (x0 )y.


Figure 10.1: For a function f : R2 → R, the best affine approximation corresponds to the
tangent plane of the graph in R3 .

Applet 10.4 (Tangent Plane). As shown in the above image, we depict the tangent planes
for the graphs of two functions f : R2 → R. Additionally, we visualize the partial derivatives
and directional derivatives in Definition 10.5. Is there a directional derivative that vanishes
at every point?

Definition 10.5: Directional derivative


Let U ⊂ Rn be an open subset, x0 ∈ U , v ∈ Rn , and f : U → Rm . The directional
derivative of f in the direction v at x0 is

∂v f (x0 ) := (d/ds)|s=0 f (x0 + sv) = lims→0 (f (x0 + sv) − f (x0 ))/s ∈ Rm ,

provided that the limit exists. If v = ej for some j ∈ {1, . . . , n}, we call ∂ej f (x0 ) the
j-th partial derivative of f at x0 , and we may denote it by

Dj f (x0 ), ∂j f (x0 ), ∂f /∂xj (x0 ).

Of course, if the partial derivative in the j-th coordinate exists at every point in U , we
obtain a function ∂j f : U → Rm , which we call the j-th partial derivative of f .

10.6. — The partial derivative with respect to the j-th coordinate is simply the derivative
with respect to that variable, considering all other variables as constants. For example, for
the function f : R3 → R given by f (x, y, z) = x(y 2 + sin(z)), the
partial derivatives with respect to all coordinate directions are given by

∂x f (x, y, z) = y 2 + sin(z)
∂y f (x, y, z) = 2xy

∂z f (x, y, z) = x cos(z)

for all (x, y, z) ∈ R3 , as we can apply all known rules from Analysis I. If the total derivative
exists, we can connect it with partial derivatives and derivatives along arbitrary vectors using
the following proposition.

Proposition 10.7: Differentiable implies linear directional derivatives


Let U ⊂ Rn be open and let f : U → Rm be differentiable at x0 ∈ U . Then, for each
v ∈ Rn , the derivative of f in the direction v exists, and we have

∂v f (x0 ) = Dfx0 (v) ∈ Rm .

In particular ∂v+αw f (x0 ) = ∂v f (x0 ) + α∂w f (x0 ), for all α ∈ R and v, w ∈ Rn .

Proof. Assuming the total derivative Df (x0 ) exists, according to the definition of the deriva-
tive, f (x0 + h) = f (x0 ) + Df (x0 )(h) + o(∥h∥) holds for h → 0. Choosing h = sv for s → 0
and v ∈ Rn , we get

∂v f (x0 ) = lims→0 (f (x0 + sv) − f (x0 ))/s = lims→0 ( Df (x0 )(v) + o(1) ) = Df (x0 )(v)

which concludes the proof.
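Both the partial derivatives computed in 10.6 and the linearity statement of Proposition 10.7 can be spot-checked with central finite differences. A Python sketch (illustrative only; the helper names are ours):

```python
import math

# f(x, y, z) = x*(y**2 + sin z), the example from 10.6
f = lambda x, y, z: x * (y ** 2 + math.sin(z))

def partial(g, i, p, h=1e-6):
    # central finite-difference approximation of the i-th partial derivative at p
    a, b = list(p), list(p)
    a[i] += h
    b[i] -= h
    return (g(*a) - g(*b)) / (2 * h)

def directional(g, v, p, h=1e-6):
    # central finite difference along the direction v at p
    a = [pi + h * vi for pi, vi in zip(p, v)]
    b = [pi - h * vi for pi, vi in zip(p, v)]
    return (g(*a) - g(*b)) / (2 * h)

p = (1.5, -2.0, 0.7)
x, y, z = p
assert abs(partial(f, 0, p) - (y ** 2 + math.sin(z))) < 1e-6   # matches ∂x f
assert abs(partial(f, 1, p) - 2 * x * y) < 1e-6                # matches ∂y f
assert abs(partial(f, 2, p) - x * math.cos(z)) < 1e-6          # matches ∂z f

# Proposition 10.7: the directional derivative is linear in the direction
v = (1.0, 2.0, -1.0)
lin = sum(vi * partial(f, i, p) for i, vi in enumerate(v))
assert abs(directional(f, v, p) - lin) < 1e-5
```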

Exercise 10.8. — Let U ⊂ Rn be open, and let f1 , f2 : U → R be functions. Assume


that f1 and f2 are differentiable at x0 ∈ U . Show that the functions f1 + f2 and f1 f2 are
differentiable at x0 and that

D(f1 + f2 )(x0 ) = Df1 (x0 ) + Df2 (x0 )


D(f1 f2 )(x0 ) = f1 (x0 )Df2 (x0 ) + f2 (x0 )Df1 (x0 )

Formulate and prove analogous statements for directional derivatives in the direction of a fixed
vector v ∈ Rn .

Lemma 10.9: More variables in the target are not a problem


Let U ⊂ Rn be open, and let f : U → Rm be a function. Denote πj : Rm → R as the
projection onto the j-th component. Then, f is differentiable at x0 ∈ U if and only if
the components fj = πj ◦ f for each j ∈ {1, . . . , m} are differentiable at x0 . In this
case, we have
πj ◦ Df (x0 ) = D(πj ◦ f )(x0 ).

Proof. Assume that fj := πj ◦ f is differentiable at x0 for every j. There exists a linear


function Lj : Rn → R and a remainder term Rj : Rn → R such that

fj (x0 + x) = fj (x0 ) + Lj (x) + Rj (x)


with Rj (x) = o(∥x∥) for x → 0. We can summarize this componentwise as

f (x0 + x) = ( f1 (x0 + x), . . . , fm (x0 + x) )
= ( f1 (x0 ), . . . , fm (x0 ) ) + ( L1 (x), . . . , Lm (x) ) + ( R1 (x), . . . , Rm (x) ),

that is, f (x0 + x) = f (x0 ) + L(x) + R(x), where L = (L1 , . . . , Lm ) is linear and
R = (R1 , . . . , Rm ) satisfies R(x) = o(∥x∥) for x → 0. Thus, f is differentiable at x0 ,
and the claimed formula for Df (x0 ) holds. Conversely, if f is differentiable at the point x0 ,
then the claimed formula follows through the same calculation.

Theorem 10.10: Sufficient condition for Differentiability


Let U ⊂ Rn be open, and f : U → Rm be a function. If for every j ∈ {1, . . . , n} the
partial derivative ∂j f exists on the entire U and defines a continuous function, then f
is differentiable on the entire U .

Proof. Due to Lemma 10.9, we can assume m = 1. Let’s fix x0 ∈ U , and we need to show
that f is differentiable at x0 . By replacing f with x 7→ f (x + x0 ) − f (x0 ), we can also assume
that x0 = 0 and f (0) = 0. For x = (x1 , . . . , xn ) ∈ U , we then have

f (x) = f (x1 , x2 , x3 , . . . , xn ) −f (0, x2 , x3 . . . , xn )


+f (0, x2 , x3 , . . . , xn ) −f (0, 0, x3 , . . . , xn )
+f (0, 0, x3 , . . . , xn ) − ···
+ ··· −f (0, 0, . . . , 0, xn )
+f (0, 0, . . . , 0, xn ) −f (0, 0, . . . , 0, 0).

The function [0, xj ] → R defined by t ↦ f (0, 0, . . . , 0, t, xj+1 , . . . , xn ) is continuously
differentiable by hypothesis. Its derivative is given by the j-th partial derivative of f . Therefore, by

tiable by hypothesis. Its derivative is given by the j-th partial derivative of f . Therefore, by
the Mean Value Theorem, there exists an intermediate point ξj ∈ [0, xj ] such that

∂j f (0, . . . , 0, ξj , xj+1 , . . . , xn )xj = f (0, . . . , 0, xj , xj+1 , . . . , xn ) − f (0, . . . , 0, 0, xj+1 , . . . , xn )

holds. For any choice of such intermediate points ξj ∈ [0, xj ], we obtain

f (x) = ∂1 f (ξ1 , x2 , x3 , . . . , xn )x1


+∂2 f (0, ξ2 , x3 , . . . , xn )x2
+ ···
+∂n f (0, 0, . . . , 0, ξn )xn .

To show that the linear function L : (v1 , . . . , vn ) ↦ ∂1 f (0)v1 + · · · + ∂n f (0)vn is the differential
Df (0), we need to estimate the difference R(x) := f (x) − L(x) = f (0 + x) − f (0) − L(x):

R(x) = ( ∂1 f (ξ1 , x2 , x3 , . . . , xn ) − ∂1 f (0) ) x1
+ ( ∂2 f (0, ξ2 , x3 , . . . , xn ) − ∂2 f (0) ) x2
+ ···
+ ( ∂n f (0, 0, . . . , 0, ξn ) − ∂n f (0) ) xn .

According to the assumptions of the theorem, and because |xj |/∥x∥ ≤ 1 for all x ̸= 0 in Rn ,
the asymptotics

limx→0 R(x)/∥x∥ = 0

holds, demonstrating that f is differentiable at x0 = 0 with differential Df (0) = L.

The following exercise shows that the existence of partial derivatives of a function f , without
the continuity assumption, does not necessarily imply that the function is differentiable.

Exercise 10.11. — Consider the function f : R2 → R defined by



f (x, y) = xy/√(x2 + y 2 ) if (x, y) ̸= (0, 0), and f (0, 0) := 0,


for (x, y) ∈ R2 . Show that the partial derivatives ∂x f and ∂y f exist everywhere in R2 , but f
is not differentiable at (0, 0).

Definition 10.12: C 1 functions

We call a function f : U → Rm on an open subset U ⊂ Rn continuously differentiable if all the partial derivatives ∂i f , i = 1, . . . , n, exist and are continuous in U . In this case we write f ∈ C 1 (U, Rm ), and often f ∈ C 1 (U ) when m = 1.


10.13. — Notice that f ∈ C 1 (U, Rm ) if and only if the map Df : U → Hom(Rn , Rm )


mapping x 7→ Dfx is continuous.

10.14. — From Proposition 10.7, it follows in particular that the total derivative (when
it exists) Df (x0 ) is uniquely determined by the partial derivatives. Specifically, for v =
a1 e1 + · · · + an en ∈ Rn ,
Dfx0 (v) = Σ_{i=1}^{n} ai Dfx0 (ei ) = Σ_{i=1}^{n} ai ∂i f (x0 ).


The m × n matrix of the linear map Dfx0 : Rn → Rm is thus given with respect to the canonical bases by the m × n matrix

(∂1 f (x0 ), ∂2 f (x0 ), . . . , ∂n f (x0 )) = ( ∂f1 /∂x1 (x0 )  ∂f1 /∂x2 (x0 )  · · ·  ∂f1 /∂xn (x0 ) )
                                             ( ∂f2 /∂x1 (x0 )  ∂f2 /∂x2 (x0 )  · · ·  ∂f2 /∂xn (x0 ) )
                                             (       ...             ...       · · ·       ...       )
                                             ( ∂fm /∂x1 (x0 )  ∂fm /∂x2 (x0 )  · · ·  ∂fm /∂xn (x0 ) )

This matrix is referred to as the Jacobian matrix of f evaluated at the point x0 , commonly
denoted by Jf (x0 ). An alternative notation for the Jacobian matrix is Df (x0 ); however,
to prevent confusion with Dfx0 , which distinguishes between a linear map and a matrix
representation, we will not use this notation yet. While the difference between a matrix and a
linear map might appear to be a minor detail, it becomes more significant in future applications
such as in Differential Geometry or Physics.
Notice that, given U ⊂ Rn open, f ∈ C 1 (U, Rm ) if and only if the map

Jf : U → Matm,n (R) ∼= Rm×n

defined by x 7→ Jf (x) is continuous.


Example 10.15. — Let f : R2 → R2 be defined by f (x, y) = (x2 − cos(xy), y 4 − exp(x)).


The Jacobian matrix of f at (x, y) is then

Jf (x, y) = ( 2x + y sin(xy)   x sin(xy) )
            ( − exp(x)         4y 3      )

which is continuous as a function of (x, y) ∈ R2 .
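This Jacobian can be double-checked numerically. The following sketch in Python (the base point (0.5, −1.3), the step h = 10−6 and the tolerance are ad-hoc choices, not part of the notes) compares the analytic matrix with central finite differences.

```python
import math

def f(x, y):
    # the map of Example 10.15: f(x, y) = (x^2 - cos(xy), y^4 - exp(x))
    return (x * x - math.cos(x * y), y ** 4 - math.exp(x))

def jacobian_exact(x, y):
    # the analytic Jacobian computed above
    return [[2 * x + y * math.sin(x * y), x * math.sin(x * y)],
            [-math.exp(x), 4 * y ** 3]]

def jacobian_fd(x, y, h=1e-6):
    # central finite differences approximate the two columns of Jf
    fx_p, fx_m = f(x + h, y), f(x - h, y)
    fy_p, fy_m = f(x, y + h), f(x, y - h)
    return [[(fx_p[i] - fx_m[i]) / (2 * h), (fy_p[i] - fy_m[i]) / (2 * h)]
            for i in range(2)]

J_exact = jacobian_exact(0.5, -1.3)
J_fd = jacobian_fd(0.5, -1.3)
max_err = max(abs(J_exact[i][j] - J_fd[i][j]) for i in range(2) for j in range(2))
```

The agreement up to roughly h2 confirms both the entries of the matrix and that f is C 1 near the base point.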


10.1.2 The Chain Rule

Theorem 10.16: Chain Rule (pointwise version)

Let k, m, n ≥ 1, and let U ⊂ Rn and V ⊂ Rm be open. If f : U → V is differentiable at


x0 and g : V → Rk is differentiable at f (x0 ), then g ◦ f is differentiable at x0 , and the
differential of (g ◦ f ) at x0 is given by

D(g ◦ f )x0 = Dgf (x0 ) ◦ Dfx0 (10.1)

In other words, at the level of matrices,

J(g ◦ f )(x0 ) = Jg(f (x0 )) · Jf (x0 ),

or equivalently,

∂(g ◦ f )/∂xi (x0 ) = Σ_{j=1}^{m} ∂g/∂xj (f (x0 )) ∂fj /∂xi (x0 ).

Proof. By the definition of differentiability of f at x0 and g at y0 = f (x0 ), we have

f (x0 + x) = f (x0 ) + L(x) + R(x) and g(y0 + y) = g(y0 ) + M (y) + S(y)

with L = Df (x0 ) and R(x) = o(∥x∥) as x → 0, and M = Dg(y0 ) and S(y) = o(∥y∥) as y → 0.
Together, for x ∈ Rn small enough and y = f (x0 + x) − f (x0 ) = L(x) + R(x), we obtain the
equation

g(f (x0 + x)) = g(y0 + y) = g(y0 ) + M (y) + S(y) =


= g(f (x0 )) + M (L(x)) + M (R(x)) + S(L(x) + R(x))
| {z }
T (x)

and we want to show that T (x) = o(∥x∥) as x → 0. Since R(x) = o(∥x∥) as x → 0,


we also have ∥M (R(x))∥ ≤ ∥M ∥op ∥R(x)∥ = o(∥x∥) as x → 0. It remains to show that
S(L(x) + R(x)) = o(∥x∥) as x → 0. The fact that S(x) = o(∥x∥) as x → 0 means that for
every ϵ > 0, there exists δ > 0 such that for all y ∈ Rm

∥y∥ < δ =⇒ ∥S(y)∥ ≤ ϵ∥y∥ (10.2)

By the differentiability of f at x0 , for y = L(x) + R(x), we have the estimate

∥y∥ ≤ ∥L(x)∥ + ∥R(x)∥ ≤ ∥L∥op ∥x∥ + o(∥x∥)

as x → 0. Let C = ∥L∥op + 1; then, there exists η > 0 such that for all x ∈ Rn

∥x∥ < η =⇒ ∥L(x) + R(x)∥ < C∥x∥


For x ∈ Rn with ∥x∥ < min{η, δC −1 }, we have ∥L(x) + R(x)∥ < δ, and using (10.2), we also
have
∥S(L(x) + R(x))∥ ≤ ϵ∥L(x) + R(x)∥ ≤ Cϵ∥x∥.

Thus, we conclude the differentiability of g ◦ f at x0 and the equation (10.1).

Corollary 10.17: Chain Rule

Let k, m, n ≥ 1, and let U ⊂ Rn and V ⊂ Rm be open. If f ∈ C 1 (U, Rm ), g ∈ C 1 (V, Rk ), and f (U ) ⊂ V , then g ◦ f ∈ C 1 (U, Rk ) and

∂i (g ◦ f ) = Σ_{j=1}^{m} ((∂j g) ◦ f ) ∂i fj (10.3)

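The identity J(g ◦ f ) = Jg · Jf can be tested numerically on a concrete pair of maps. The sketch below (in Python; the maps f , g, the base point and the tolerance are our ad-hoc choices) compares the gradient of g ◦ f computed by central differences with the matrix product prescribed by the chain rule.

```python
import math

# f : R^2 -> R^2 and g : R^2 -> R, with hand-computed derivatives
def f(x, y):
    return (x * y, math.sin(x) + y)

def Jf(x, y):                 # 2x2 Jacobian of f
    return [[y, x], [math.cos(x), 1.0]]

def g(u, v):
    return u * u + math.exp(v)

def grad_g(u, v):             # 1x2 Jacobian of g
    return [2 * u, math.exp(v)]

def grad_composition_chain(x, y):
    # chain rule: the row vector Dg(f(x)) times the matrix Df(x)
    u, v = f(x, y)
    dg = grad_g(u, v)
    J = Jf(x, y)
    return [dg[0] * J[0][i] + dg[1] * J[1][i] for i in range(2)]

def grad_composition_fd(x, y, h=1e-6):
    # central differences applied directly to g o f
    c = lambda a, b: g(*f(a, b))
    return [(c(x + h, y) - c(x - h, y)) / (2 * h),
            (c(x, y + h) - c(x, y - h)) / (2 * h)]

x0, y0 = 0.7, -0.4
chain = grad_composition_chain(x0, y0)
fd = grad_composition_fd(x0, y0)
```

Note how the chain-rule version never differentiates the composition itself: only the derivatives of the two factors enter.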

Exercise 10.18. — Euler’s identity for homogeneous functions (F) Assume f ∈ C 1 (Rn \ {0}) is positively homogeneous of degree λ ∈ R, that is to say,

f (rx) = rλ f (x) for all r > 0, x ̸= 0.

Show that Σ_{j=1}^{n} xj ∂j f (x) = λf (x).
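For a concrete homogeneous function the identity can be checked numerically (this does not, of course, replace the proof asked for in the exercise). A sketch in Python with f (x, y) = x3 + xy 2 , which is positively homogeneous of degree λ = 3; the base point and tolerances are arbitrary.

```python
def f(x, y):
    # positively homogeneous of degree 3: f(rx, ry) = r^3 f(x, y)
    return x ** 3 + x * y ** 2

def euler_lhs(x, y, h=1e-6):
    # x * d_x f + y * d_y f via central finite differences
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return x * fx + y * fy

p = (1.2, -0.7)
lhs = euler_lhs(*p)
rhs = 3 * f(*p)   # lambda = 3 for this f
```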

Example 10.19. — (F) Let u : Rn → R be smooth; check that

Di (arctan(u)) = Di u/(1 + u2 ),   Di (1/u) = −Di u/u2 ,   Di (|Du|2 ) = 2 Σ_{j} Dj u Dij u,

Σ_{i} Dii (|Du|2 ) = 2 Σ_{i,j} (Dij u)2 + 2 Σ_{i} Di u Di ( Σ_{j} Djj u ).

If A ∈ O(n) and v(x) := u(Ax), then

Di v(x) = Σ_{j} Ai,j Dj u(Ax)   and   Σ_{i} Dii v(x) = Σ_{j} (Djj u)(Ax).

10.20. — Let’s consider the special case n = 1 for the chain rule. Suppose I ⊂ R is an
open interval, and γ : I → V ⊂ Rm is a differentiable function with values in an open subset
V ⊂ Rm . Further, let f : V → Rk be differentiable. Then, the chain rule implies that f ◦ γ is
differentiable, and the formula

(f ◦ γ)′ (t) = Dfγ(t) (γ ′ (t))

holds for all t ∈ I. If additionally k = 1, then f ◦ γ : I → R and Dfγ(t) γ ′ (t), expressed in


matrix form, is the matrix product of the 1 × m matrix Df (γ(t)) with the m × 1 matrix γ ′ (t).


In this case, we interpret Df (x) for x ∈ V as the column vector

gradf (x) = ∇f (x) = (Df (x))T ∈ Rm

and refer to it as the gradient of the function f at the point x. Using this notation, we
obtain the formula

(f ◦ γ)′ (t) = Df (γ(t)) · γ ′ (t) = ⟨∇f (γ(t)), γ ′ (t)⟩ (10.4)

for all t ∈ I.

10.21. — The concept of directional derivatives and the case of equality in the Cauchy-
Schwarz inequality allow us to provide a geometric interpretation of the gradient of a function.
If f : U → R is a differentiable function on an open subset U ⊂ Rn , then, according to
Proposition 10.7 and the Cauchy–Schwarz inequality,

∂v f (x) = Dfx (v) = ⟨∇f (x), v⟩ ≤ ∥∇f (x)∥ ∥v∥

for any vector v ∈ Rn , with equality if and only if ∇f (x) and v are linearly dependent. This
implies that the gradient of f at every point points in the direction of the greatest directional
derivative, indicating the direction of the steepest ascent of f around x. Furthermore, ∥∇f (x)∥
gives the slope in that direction.

10.1.3 The Mean Value Theorem


We formulate a generalization of the Mean Value Theorem for real-valued differentiable func-
tions on an open set U ⊂ Rn . To do this, we consider a given function f along a straight
segment in the open set.

Theorem 10.22: Mean Value Theorem


Let U ⊂ Rn be open, and f : U → R be differentiable. Let x0 ∈ U and h ∈ Rn such that
x0 + th ∈ U for all t ∈ [0, 1]. Then, there exists t ∈ (0, 1) such that for ξ = x0 + th, the
equation
f (x0 + h) − f (x0 ) = Dfξ (h) = ∂h f (ξ)

is satisfied.

Proof. The derivative of the straight path γ : t 7→ x0 + th for fixed x0 , h ∈ Rn is given by


γ ′ (t) = h. Therefore, the function g = f ◦ γ : [0, 1] → R satisfies all the conditions of the
one-dimensional Mean Value Theorem due to the chain rule in Theorem 10.16. Hence, there
exists t ∈ (0, 1) with g(1) − g(0) = g ′ (t) = Df (x0 + th)(h) according to the chain rule, and
thus
f (x0 + h) − f (x0 ) = g(1) − g(0) = g ′ (t) = Df (ξ)(h)


for ξ = x0 + th.

Corollary 10.23: Solutions of Df = 0

Let U ⊂ Rn be open and let f : U → Rm be differentiable with Df (x) = 0 for all x ∈ U .


Then, f is constant on each connected component of U .

Proof. It suffices to consider the case m = 1. Assuming that U is non-empty, we choose


x0 ∈ U and consider the subset

U ′ = {x ∈ U | f (x) = f (x0 )}

of U . Since f is continuous, U ′ ⊂ U is a closed subset of U . From the assumption and the


mean value theorem, it follows that U ′ is open: Indeed, for x ∈ U ′ , there exists ϵ > 0 such
that B(x, ϵ) ⊂ U , and since every point y ∈ B(x, ϵ) can be connected by a straight path to x,
it follows from Theorem 10.22 that f (y) = f (x) = f (x0 ). Thus, y ∈ U ′ , and since y ∈ B(x, ϵ)
was arbitrary, B(x, ϵ) ⊂ U ′ . Hence U ′ is non-empty, open, and closed in U , so it contains the connected component of U containing x0 ; since f = f (x0 ) on U ′ , the corollary follows.


Definition 10.24: Local Lipschitz continuity


A function f : X → Y between metric spaces X, Y is called locally Lipschitz contin-
uous if, for every x0 ∈ X, there exists ϵ > 0 such that f |B(x0 ,ϵ) is Lipschitz continuous.

Corollary 10.25: Differentiability vs. Lipschitz continuity

Let U ⊂ Rn be open and let f ∈ C 1 (U, Rm ). Then, f is locally Lipschitz continuous.


If U is additionally convex, and the differential is bounded (as a matrix), then f is
Lipschitz continuous.

Proof. It suffices to consider the case m = 1. First, assume that U is convex and the differential is bounded: there exists M ≥ 0 such that ∥Df (ξ)∥op ≤ M for all ξ ∈ U . From the
mean value theorem 10.22, it follows for x, y ∈ U

∥f (x) − f (y)∥ = ∥Df (ξ)(x − y)∥ ≤ M ∥x − y∥

for some ξ ∈ U , since U is convex and thus contains the straight segment between x and y.
This proves the second statement in the corollary. The first statement follows from the second
applied to the ball U0 = B(x0 , ϵ) where ϵ > 0 is chosen such that B(x0 , ϵ) ⊂ U . Indeed, U0
is convex, and the mapping ξ 7→ Df (ξ) is a continuous function on the compact set B(x0 , ϵ),
implying the boundedness of the derivative on B(x0 , ϵ).


10.2 Higher Derivatives

10.2.1 Definition and basic properties


Recall that for functions f : R → R we defined second and higher order derivatives recursively as follows: f (k+1) = (f (k) )′ , k ≥ 0, where f (0) = f . Next we introduce higher order derivatives for functions f : Rn → Rm .

Definition 10.26: C k functions

Let U ⊂ Rn be open, f : U → Rm be a function, and k ≥ 1. We say that f is k times continuously differentiable if, for all j1 , . . . , jk in {1, . . . , n}, the partial derivative

∂j1 ∂j2 · · · ∂jk f (x)

exists at every point x ∈ U and defines a continuous function of x ∈ U . We write

C k (U, Rm ) = {f : U → Rm | f is k times continuously differentiable}

for the vector space of k times continuously differentiable Rm -valued functions on U . We call the function f smooth if f is k times continuously differentiable for every k ≥ 1. We write

C ∞ (U, Rm ) = {f : U → Rm | f is k times continuously differentiable for all k ≥ 1}

for the vector space of smooth Rm -valued functions on U .

Proposition 10.27: Higher regularity of sums, products and compositions

Let f ∈ C k (Rn ), g ∈ C ℓ (Rn ) and ϕ ∈ C m (Rn , Rn ), then

(1) f + g, f · g are of class C min{k,ℓ}

(2) f ◦ ϕ is of class C min{k,m}

Proof. We prove only the second item and leave the first as an exercise. We do an induction on min{k, m}. Since a composition of continuous functions is continuous, the base case min{k, m} = 0 is handled. Now assume the statement has been proved whenever min{k, m} ≤ N and that we have f, ϕ, k, m with min{k, m} = N + 1. Certainly ϕ ∈ C N (since m ≥ N + 1) and ∂i f ∈ C k−1 , so the inductive assumption applied to ∂i f ◦ ϕ gives

∂j (f ◦ ϕ) = Σ_{i=1}^{n} (∂i f ◦ ϕ) ∂j ϕi ,

where ∂i f ◦ ϕ is of class C min{k−1,N } and ∂j ϕi is of class C m−1 .


Thus, by the first part, ∂j (f ◦ ϕ) is of class C min{k−1,m−1} = C N for all j, which means that f ◦ ϕ is in C N +1 .

10.2.2 Schwarz’s Theorem and Multi-index Notation

Theorem 10.28: Schwarz’s Theorem

Let U ⊂ Rn be open and let f ∈ C 2 (U, Rm ), then

∂j ∂i f (x) = ∂i ∂j f (x), for all i, j ∈ {1, . . . , n}, x ∈ U.

Proof. It suffices to consider the case n = 2, m = 1 and i = 1, j = 2, the general case


follows by renaming variables from the considered special case (applying it to the components
f1 , . . . , fm of f ). For x ∈ U and a sufficiently small h > 0, we define a function F by

F (h) = f (x1 + h, x2 + h) − f (x1 + h, x2 ) − f (x1 , x2 + h) + f (x1 , x2 ).

Furthermore, for a sufficiently small but fixed h > 0, we consider the differentiable function
φ : [0, 1] → R given by φ(t) = f (x1 + th, x2 + h) − f (x1 + th, x2 ) and obtain

F (h) = φ(1) − φ(0) = φ′ (ξ1 ) = (∂1 f (x1 + ξ1 h, x2 + h) − ∂1 f (x1 + ξ1 h, x2 )) h

for some ξ1 ∈ (0, 1) by the one-dimensional Mean Value Theorem.

Figure 10.2: The function h 7→ F (h) is a signed sum of function values of f at the corners of
a square (here marked by a solid line). The function t 7→ φ(t) corresponds to the difference of
function values on a vertical segment through the square.

Applying the one-dimensional Mean Value Theorem again to ψ : [0, 1] → R given by


ψ(t) = ∂1 f (x1 + ξ1 h, x2 + th) along with the chain rule, we obtain

F (h) = (∂1 f (x1 + ξ1 h, x2 + h) − ∂1 f (x1 + ξ1 h, x2 )) h = ∂2 ∂1 f (x1 + ξ1 h, x2 + ξ2 h) h2

for some intermediate point ξ2 ∈ (0, 1). Since both components were used symmetrically in
the function h 7→ F (h), we can perform the argument again with the roles of the first and
second components swapped. This yields similarly

F (h) = ∂1 ∂2 f (x1 + ξ1′ h, x2 + ξ2′ h)h2 .

for suitable ξ1′ , ξ2′ ∈ (0, 1). Dividing by h2 > 0, we obtain

∂2 ∂1 f (x1 + ξ1 h, x2 + ξ2 h) = ∂1 ∂2 f (x1 + ξ1′ h, x2 + ξ2′ h).

Since ξ1 , ξ2 , ξ1′ , ξ2′ ∈ (0, 1), the points (ξ1 h, ξ2 h) and (ξ1′ h, ξ2′ h) tend to (0, 0) as h tends to
0. Therefore, due to the continuity of both partial derivatives, we conclude ∂2 ∂1 f (x) =
∂1 ∂2 f (x).
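The equality of the mixed partials can also be observed numerically for a concrete function. A sketch in Python (the test function, base point and step sizes are arbitrary choices): each first-order partial is approximated by a central difference, and the two mixed second-order partials are obtained by differencing them in the two possible orders. Using different inner and outer steps ensures that the two computations are not trivially identical.

```python
import math

def f(x, y):
    return math.sin(x * y) + x ** 3 * y ** 2

def d1(x, y, h=1e-6):    # partial derivative in x, central difference
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def d2(x, y, h=1e-6):    # partial derivative in y, central difference
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def d2d1(x, y, h=1e-4):  # d_y d_x f
    return (d1(x, y + h) - d1(x, y - h)) / (2 * h)

def d1d2(x, y, h=1e-4):  # d_x d_y f
    return (d2(x + h, y) - d2(x - h, y)) / (2 * h)

x0, y0 = 0.8, 0.3
mixed_yx = d2d1(x0, y0)
mixed_xy = d1d2(x0, y0)
```

Both values approximate ∂1 ∂2 f (x0 , y0 ) = cos(x0 y0 ) − x0 y0 sin(x0 y0 ) + 6 x0 2 y0 .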


Definition 10.29: Hessian and Laplacian

The Hessian matrix of f ∈ C 2 (U ) at x ∈ U is the n × n matrix

Hij (x) = ∂i ∂j f (x)

for i, j ∈ {1, . . . , n}. Schwarz’s theorem 10.28 entails that H(x) is a symmetric matrix.
The Laplacian is the trace of this matrix,

∆f (x) := tr H(x) = Σ_{i=1}^{n} ∂ii f (x).

Exercise 10.30 (Polarisation formula). — Let f : Rn → R be a smooth function. Prove


that the map

Rn ∋ e 7→ (d2 /dt2 )|_{t=0} f (te) = D2 f (0)(e, e)

determines all the second derivatives ∂ij f .

Interlude: Multi-Indices and Polynomials of Several Variables

Any α ∈ Nn is called a multi-index (in n variables); its length is

|α| := α1 + . . . + αn .

We say that β ≤ α if βi ≤ αi for all i = 1, . . . , n, and define the factorial

α! := α1 ! · · · αn !, with 0! = 1.


A polynomial of n variables X1 , . . . , Xn of degree k can be uniquely (and compactly) expressed as

Σ_{α∈Nn ,|α|≤k} cα X^α := Σ_{|α|≤k} c(α1 ,...,αn ) X1^α1 · · · Xn^αn .

Many combinatorial formulas are simple when expressed in multi-index notation, such as the multinomial formula:

(X + Y )^α = Σ_{β≤α} (α choose β) X^β Y^(α−β) ,  where (α choose β) := (α1 choose β1 ) · · · (αn choose βn ).

Exercise 10.31. — Prove the identity

n^m /m! = Σ_{α∈Nn ,|α|=m} 1/α! .
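For small m and n this identity can be verified exactly by enumerating all multi-indices with |α| = m. A sketch in Python using exact rational arithmetic (the helper names are ours, not standard):

```python
from fractions import Fraction
from math import factorial

def compositions(m, n):
    # all alpha in N^n with |alpha| = m (weak compositions of m into n parts)
    if n == 1:
        yield (m,)
        return
    for first in range(m + 1):
        for rest in compositions(m - first, n - 1):
            yield (first,) + rest

def multi_factorial(alpha):
    # alpha! = alpha_1! * ... * alpha_n!
    prod = 1
    for a in alpha:
        prod *= factorial(a)
    return prod

def rhs(m, n):
    # the sum of 1/alpha! over all alpha with |alpha| = m
    return sum(Fraction(1, multi_factorial(a)) for a in compositions(m, n))

m, n = 4, 3
lhs = Fraction(n ** m, factorial(m))
rhs_val = rhs(m, n)
```

Both sides are the coefficient count hidden in the multinomial expansion of (1 + · · · + 1)^m .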

Thanks to Schwarz’s Theorem, multi-indices are also useful to express higher order derivatives.

Corollary 10.32: Schwarz in C k

Let f ∈ C k (Rn ) and let α ∈ Nn with |α| ≤ k. Then the derivative

∂ α f (x) = (∂x1 )^α1 · · · (∂xn )^αn f (x)

is independent of the precise order in which we take the derivatives.

Proof. Immediate using Schwarz’s Theorem (exercise).

10.2.3 Multidimensional Taylor Approximation

Theorem 10.33: Taylor’s Theorem

Let U ⊂ Rn be open and let f ∈ C k+1 (U ), k ≥ 0. Let x0 ∈ U and h ∈ Rn such that x0 + th ∈ U for all t ∈ [0, 1]. Then, we have

f (x0 + h) = Σ_{α∈Nn ,|α|≤k} (h^α /α!) ∂ α f (x0 ) + Rk+1 f (x0 , h)

where the remainder is given by

Rk+1 f (x0 , h) := ∫_0^1 (k + 1)(1 − t)^k Σ_{α∈Nn ,|α|=k+1} (h^α /α!) ∂ α f (x0 + th) dt = O(∥h∥^{k+1} ).

Proof. Since U is open, there exists ϵ > 0 such that x0 + th ∈ U for all t ∈ (−ϵ, 1 + ϵ). We apply the one-dimensional Taylor approximation to φ : (−ϵ, 1 + ϵ) → R given by φ(t) = f (x0 + th).


According to Taylor’s Theorem in one variable, the expansion of φ at 0, evaluated at 1, reads

φ(1) = Σ_{m=0}^{k} φ(m) (0)/m! + ∫_0^1 ((1 − t)^k /k!) φ(k+1) (t) dt. (10.5)

Applying the chain rule in Theorem 10.16 to φ, we get for t ∈ (−ϵ, 1 + ϵ) the derivatives

φ′ (t) = Σ_{i=1}^{n} ∂i f (x0 + th) hi ,
φ′′ (t) = Σ_{i,j=1}^{n} ∂i ∂j f (x0 + th) hi hj ,
φ′′′ (t) = Σ_{i,j,ℓ=1}^{n} ∂i ∂j ∂ℓ f (x0 + th) hi hj hℓ , . . .

So inductively for all m ≤ k + 1:

φ(m) (t) = Σ_{i1 ,...,im =1}^{n} ∂i1 · · · ∂im f (x0 + th) hi1 · · · him = m! Σ_{|α|=m} ∂ α f (x0 + th) h^α /α! ,

where we used a combinatorial count to re-write the last sum. Indeed unwrapping the defini-
tions one has

hi1 · · · him = h^α ⇔ αr = #{j ∈ {1, . . . , m} : ij = r} for all r = 1, . . . , n,

and so, given an n-multi-index α of length m, the equation hi1 · · · him = h^α has m!/α! solutions: one has to choose α1 indices among m to be sent to 1, then α2 among the remaining m − α1 to be sent to 2, etc. Hence the total number of solutions is

(m choose α1 )(m − α1 choose α2 )(m − α1 − α2 choose α3 ) · · · (m − α1 − · · · − αn−1 choose αn ) = m!/(α1 ! · · · αn !) = m!/α!

Substituting this into (10.5), we obtain the theorem.

Corollary 10.34: Practical computation of higher derivatives (F)

Let x0 ∈ U , f ∈ C k+1 (U ) and P (x) a polynomial of degree k ≥ 0. Assume that

|f (x0 + h) − P (h)| = o(|h|k ) as h → 0.

Then Dj f (x0 )(h, . . . , h) = Dj P (h), for all 0 ≤ j ≤ k.

Proof. By Taylor’s theorem we immediately get

P (h) − Σ_{j=0}^{k} (1/j!) D^j f (x0 )(h, . . . , h) = o(|h|^k ),


but two polynomials of degree k whose difference is o(|h|k ) must have exactly the same coef-
ficients.

This Corollary can be useful to compute Taylor polynomials for explicit functions, without
having to care about factorials etc.

Example 10.35. — (F) Let us compute the Taylor polynomial of degree 2 of


√(1 + x − y 2 )

around the origin. From Analysis I, you know that

√(1 + t) = 1 + t/2 − t2 /8 + O(t3 ),  t → 0.

Plugging in t = x − y 2 we find

√(1 + x − y 2 ) = 1 + (x − y 2 )/2 − (x − y 2 )2 /8 + O((x − y 2 )3 )
              = 1 + x/2 − x2 /8 − y 2 /2 + xy 2 /4 − y 4 /8 + O((x − y 2 )3 ).

Now we let (x, y) → (0, 0) and want a remainder which is o(r2 ), where r := √(x2 + y 2 ). Observing that

xy 2 /4 = O(r3 ),  y 4 /8 = O(r4 ),  O((x − y 2 )3 ) = O(r3 ),  as r ↓ 0,

we find the expansion

√(1 + x − y 2 ) = 1 + x/2 − x2 /8 − y 2 /2 + O(r3 ), as r ↓ 0.
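The quality of this quadratic approximation can be observed numerically: along any path to the origin, the error divided by r3 should stay bounded. A sketch in Python (the sample path (x, y) = (t, t) and the threshold 0.5 are arbitrary choices):

```python
import math

def f(x, y):
    return math.sqrt(1 + x - y * y)

def taylor2(x, y):
    # the degree-2 Taylor polynomial found above
    return 1 + x / 2 - x * x / 8 - y * y / 2

# error / r^3 along a path to the origin should remain bounded
ratios = []
for t in (0.1, 0.05, 0.01):
    x, y = t, t
    r = math.hypot(x, y)
    err = abs(f(x, y) - taylor2(x, y))
    ratios.append(err / r ** 3)
```

The ratios settle near a constant, consistent with an O(r3 ) remainder.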
2 8 2

Exercise 10.36. — (F) Compute the Taylor polynomials up to the quadratic order at (0, 0) of the following functions in two variables:

sin(xy),  √(1 + x + y 2 ),  exp(arctan(x − y)),  1/(1 − x2 − y 2 ),  . . .

Don’t use the general formula!

Exercise 10.37. — (F) Prove that the Taylor expansion of the determinant close to the identity is

det(I + tX) = 1 + t tr(X) + (t2 /2)(tr(X)2 − tr(X 2 )) + O(t3 ),

and that the one of the inverse matrix function is

(I + tX)−1 = I − tX + t2 X 2 + O(t3 ).
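For 2 × 2 matrices both expansions can be tested with elementary arithmetic; in fact, in the 2 × 2 case the determinant expansion is exact, since (tr(X)2 − tr(X 2 ))/2 = det X. A sketch in pure Python (the matrix X and the value of t are arbitrary choices):

```python
# 2x2 sanity check of both expansions, with no linear-algebra library
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def trace(A):
    return A[0][0] + A[1][1]

def inv2(A):
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

X = [[1.0, 2.0], [3.0, 4.0]]
t = 1e-3
I_tX = [[1 + t * X[0][0], t * X[0][1]], [t * X[1][0], 1 + t * X[1][1]]]

X2 = mat_mul(X, X)
det_approx = 1 + t * trace(X) + t ** 2 / 2 * (trace(X) ** 2 - trace(X2))
det_err = abs(det2(I_tX) - det_approx)

inv_exact = inv2(I_tX)
inv_approx = [[(1.0 if i == j else 0.0) - t * X[i][j] + t ** 2 * X2[i][j]
               for j in range(2)] for i in range(2)]
inv_err = max(abs(inv_exact[i][j] - inv_approx[i][j])
              for i in range(2) for j in range(2))
```

For the inverse, the error is of size t3 ∥X 3 ∥, as the Neumann series predicts.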


10.38. — The formula in Theorem 10.33 is called the Taylor expansion with remainder of f at the point x0 . The main term

P (h) = f (x0 ) + Σ_{j=1}^{k} (1/j!) D^j f (x0 )(h, . . . , h)

is, exactly as in the one-dimensional Taylor approximation, a polynomial function of h, but this time in the n variables h1 , . . . , hn . Here, D^j f (x0 )(h, . . . , h) is precisely the homogeneous part of degree j. The integral

R(h) = ∫_0^1 ((1 − t)^k /k!) φ(k+1) (t) dt

is called the remainder. The estimate R(h) = O(∥h∥^{k+1} ) follows from the one-dimensional case.

Applet 10.39 (Taylor Approximation). We observe how the first, second, or third-order Tay-
lor approximations approximate the function f (x, y) = sin(x) cos(y) + 2.



Chapter 11

Potentials, Optimization and Convexity


11.1 Optimization Problems

11.1.1 Critical Points


Let U ⊆ Rn be open and non-empty. We discuss the relationship between derivatives and
extrema of real-valued functions f : U → R. As with functions in one variable, the vanishing
of the derivative is a necessary, but not sufficient, condition for the presence of an extremum.

11.1. — Recall that an element x0 ∈ U is called a local maximum of f if there exists


δ > 0 such that f (x) ≤ f (x0 ) for all x ∈ B(x0 , δ). We say x0 is an isolated local maximum
or strict local maximum if there exists δ > 0 such that f (x) < f (x0 ) for all x ∈ B(x0 , δ)
with x ̸= x0 . The definition of a local minimum is analogous, and collectively, we refer to
them as local extrema.

Proposition 11.2: The differential vanishes at a local extremum


Let U ⊂ Rn be open, f : U → R be a function, and let x0 ∈ U be a point where f is
differentiable and assumes a local extremum. Then Df (x0 ) = 0.

Proof. Without loss of generality, assume that f attains a local maximum at x0 . For all
j ∈ {1, . . . , n} and sufficiently small h > 0, we have, by assumption,

f (x0 + hej ) − f (x0 ) ≤ 0 and f (x0 − hej ) − f (x0 ) ≤ 0

and, consequently, ∂j f (x0 ) = 0 due to

∂j f (x0 ) = lim_{h↓0} (f (x0 + hej ) − f (x0 ))/h ≤ 0  and  ∂j f (x0 ) = lim_{h↓0} (f (x0 − hej ) − f (x0 ))/(−h) ≥ 0.


Since j ∈ {1, . . . , n} was arbitrary, Df (x0 ) = 0.

Interlude: The Sign of a Symmetric Matrix (F)

Let A ∈ Rn×n be a symmetric matrix and let

qA (x) := Σ_{i,j} Ai,j xi xj ,  x ∈ Rn ,

be its associated quadratic polynomial. We say that

(1) A is positive definite if qA (x) > 0 for all x ̸= 0.

(2) A is negative definite if qA (x) < 0 for all x ̸= 0.

(3) A is indefinite if qA takes both positive and negative values.

(4) A is degenerate if det A = 0.

These conditions can be equivalently stated in terms of the eigenvalues of A (which can
be diagonalised thanks to the Spectral Theorem).

(1) All eigenvalues are positive.

(2) All eigenvalues are negative.

(3) At least one eigenvalue is positive and at least one is negative.

(4) Zero is an eigenvalue.

Proposition 11.3:
Let U ⊂ Rn be open, f : U → R be twice continuously differentiable, and let x0 ∈ U with Df (x0 ) = 0. Let H(x0 ) be the Hessian matrix of f at the point x0 .

(1) If H(x0 ) is positive definite, then f has a strict local minimum at x0 .

(2) If H(x0 ) is negative definite, then f has a strict local maximum at x0 .

(3) If H(x0 ) is indefinite and non-degenerate, then f has no local extremum at x0 .

(4) If H(x0 ) is degenerate (i.e., has zero determinant), then x0 might, or might not, be
an extremum point: the Hessian test is inconclusive.

Proof. The Hessian matrix H(x0 ) is the matrix of the second derivative of f at x0 as a symmetric bilinear form D2 f (x0 ) : Rn × Rn → R. Let Q denote the associated quadratic form,

Q(h) = D2 f (x0 )(h, h) = ⟨h, H(x0 )h⟩.


By Taylor’s Theorem (Theorem 10.33), we have

f (x0 + h) − f (x0 ) = (1/2) ∥h∥2 ( Q(h/∥h∥) + α(x0 , h) ) (11.1)

where α(x0 , h) = o(1) as h → 0. If Q is positive definite, then Q(w) > 0 for all w ∈ Sn−1 = {v ∈ Rn | ∥v∥ = 1}. Since Sn−1 is compact by the Heine–Borel Theorem and Q is continuous, Q attains a minimum on Sn−1 , so there exists c > 0 such that Q(w) ≥ c for all w ∈ Sn−1 . Furthermore, there exists δ > 0 such that the error term α(x0 , h) is smaller in absolute value than c/2 for h ∈ Rn with ∥h∥ < δ. It follows that

f (x0 + h) − f (x0 ) ≥ (1/2) ∥h∥2 ( Q(h/∥h∥) − c/2 ) ≥ (c/4) ∥h∥2 > 0

for all h ∈ B(0, δ) \ {0}, implying that f has a strict local minimum at x0 . If Q is negative
definite, we replace f with −f , which replaces Q with −Q. However, the quadratic form −Q is
positive definite, and thus, −f has a strict local minimum at x0 , proving the second statement
of the corollary.
If Q is indefinite but non-degenerate, then there exist w− , w+ ∈ Sn−1 with Q(w− ) < 0 and Q(w+ ) > 0. For sufficiently small s ∈ R \ {0}, we have |α(x0 , sw− )| < (1/2)|Q(w− )| and |α(x0 , sw+ )| < (1/2)Q(w+ ), and thus f (x0 + sw− ) − f (x0 ) < 0 and f (x0 + sw+ ) − f (x0 ) > 0 from (11.1). Therefore, f has neither a local minimum nor a local maximum at x0 .

Example 11.4. — The behavior at the point 0 ∈ R2 of functions such as (x, y) 7→ x2 + y 2 , (x, y) 7→ x2 − y 2 , and (x, y) 7→ −x2 − y 2 illustrates the three cases in Proposition 11.3. In the indefinite case, 0 is also referred to as a saddle point. The corresponding Hessian matrices are

( 2 0 )     ( 2 0 )     ( −2 0 )
( 0 2 ),    ( 0 −2 ),   ( 0 −2 ).

If the Hessian matrix is degenerate, i.e., if 0 is an eigenvalue of H(x0 ), then generally nothing can be said. The function f (x, y) = ax4 + by 4 has a local maximum, a local minimum, or neither at 0, depending on the signs of a and b, and the Hessian matrix at 0 is the zero matrix regardless of the choice of a and b.

Example 11.5. — Let a, b ∈ R be fixed parameters. We define f : R2 → R by f (x, y) =


x sin(y) + ax2 + by 2 for (x, y) ∈ R2 and consider the point 0 ∈ R2 . We have Df (0) = 0, and
the Hessian matrix of f at 0 is given by

H = ( 2a 1 )
    ( 1 2b )

with det H = 4ab − 1. We obtain the following cases.

• If a > 0 and 4ab − 1 > 0, then H is positive definite, and f has a local minimum at 0.

• If a < 0 and 4ab − 1 > 0, then H is negative definite, and f has a local maximum at 0.

• If det H = 4ab − 1 = 0, then the Hessian matrix is degenerate.

• If 4ab − 1 < 0, then H is indefinite, and 0 is a saddle point.
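These case distinctions for a 2 × 2 Hessian can be packaged into a small routine. A sketch in Python (the function names are ours; the criterion used — the sign of det H together with the sign of the (1, 1) entry — is the standard 2 × 2 definiteness test):

```python
def classify_2x2_symmetric(p, q, r):
    # sign of the symmetric matrix H = [[p, q], [q, r]]
    d = p * r - q * q            # determinant of H
    if d == 0:
        return "degenerate"
    if d < 0:
        return "indefinite"
    # d > 0: both eigenvalues share the sign of p
    return "positive definite" if p > 0 else "negative definite"

def hessian_case(a, b):
    # Hessian of f(x, y) = x sin(y) + a x^2 + b y^2 at the origin
    return classify_2x2_symmetric(2 * a, 1.0, 2 * b)
```

For instance, hessian_case(1, 1) reproduces the first bullet above, and hessian_case(0.5, 0.5) the degenerate case 4ab − 1 = 0.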

Exercise 11.6. — Let α ∈ R. Find all points (x, y) ∈ R2 where the derivative of the
function given by f (x, y) = x3 − y 3 + 3αxy vanishes. Determine whether each point is an
extremum and, if so, whether it is a local minimum or maximum.

11.1.2 Convexity
11.1.3 Extrema with Constraints and Lagrange Multipliers
In this paragraph we give fairly concrete recipes to tackle the following problem:

min{f (x) : x ∈ U , g1 (x) = . . . = gk (x) = 0} (11.2)

where

• U ⊂ Rn is open and bounded

• f : U → R is of class C 1

• for some k ≥ 0, g1 , . . . , gk : U → R are C 1 functions, with the understanding that when


k = 0 this constraint is always true.

We are going to prove that

• a minimum point x0 ∈ U exists;

• if x0 ∉ ∂U , then there are coefficients λ0 , . . . , λk , not all zero, such that

λ0 Df (x0 ) + λ1 Dg1 (x0 ) + . . . + λk Dgk (x0 ) = 0;

in particular, if k = 0 then we simply have Df (x0 ) = 0.


Proposition 11.7: Lagrange Multipliers

Let U ⊂ Rn be open, Br (x0 ) ⊂ U , k ≤ n and f, g1 , . . . gk functions in C 1 (U, R). Assume


that
X := {x ∈ U : g1 (x) = . . . = gk (x) = 0} ̸= ∅,

and that f |X has a local minimum at x0 ∈ X:

f (x) ≥ f (x0 ) for all x ∈ X ∩ Br (x0 ).

Then there are real numbers λ0 , . . . , λk such that

λ0 Df (x0 ) + λ1 Dg1 (x0 ) + . . . + λk Dgk (x0 ) = 0, λ20 + . . . + λ2k = 1.

Proof “à la De Giorgi”. Replacing f with f + ∥ · − x0 ∥4 we may assume that x0 is a strict local minimum, without changing Df (x0 ).
Step 1. Consider fε (x) := f (x) + (1/(2ε)) (g1 (x)2 + . . . + gk (x)2 ) for x ∈ B r (x0 ), and let xε be a minimum point of fε , which exists by compactness.
Step 2. We claim that fε (xε ) → f (x0 ) and xε → x0 as ε → 0. Indeed, we have

g1 (xε )2 + . . . + gk (xε )2 ≤ 2ε fε (xε ) ≤ 2ε f (x0 ) → 0,

so if xε → x̄, then x̄ ∈ X. Hence we have

f (x0 ) ≤ f (x̄) = lim f (xε ) ≤ lim inf fε (xε ) ≤ lim sup fε (xε ) ≤ f (x0 ),
ε ε ε

which implies x̄ = x0 (recall that x0 is a strict local minimum). This proves that fε (xε ) → f (x0 ) and that {xε } can only accumulate at x0 . On the other hand, {xε } ⊂ B r (x0 ) is bounded, so it must have accumulation points. Hence we must have xε → x0 as ε ↓ 0 (no need to take subsequences).
Step 3. In particular xε ∈ Br (x0 ) eventually, so

0 = εDfε (xε ) = εDf (xε ) + g1 (xε )Dg1 (xε ) + . . . + gk (xε )Dgk (xε ).

This means that the k + 1 vectors {Df (xε ), Dg1 (xε ), . . . , Dgk (xε )} are linearly dependent,
hence there is a unit vector λε ∈ Rk+1 such that

0 = λε0 Df (xε ) + λε1 Dg1 (xε ) + . . . + λεk Dgk (xε ).

Step 4. Passing this equation to the limit and using the compactness of the k-sphere we
conclude.

11.8. — Often Proposition 11.7 is used in practice in the following way, under the extra
assumption that
Dg(x) has rank k for all x ∈ U,


with g = (g1 , . . . , gk ). In this situation we consider the so-called Lagrangian function

L : U × Rk → R,  L(x, λ) = f (x) − Σ_{j=1}^{k} λj gj (x).

The components of λ ∈ Rk are called Lagrange multipliers. Proposition 11.7 then guarantees that there exists λ ∈ Rk such that the equations

∂xi L(x0 , λ) = 0  and  ∂λj L(x0 , λ) = 0

are satisfied for all i ∈ {1, . . . , n} and j ∈ {1, . . . , k}.

Example 11.9. — Consider the function F : R2 → R given by F (x, y) = y 3 − x2 , and the


compact set
K = {(x, y) ∈ [−1, 1]2 | F (x, y) = 0}

and aim to find the minimum of the function f (x, y) = 4y − 3x on K. The set M = K \
{(0, 0), (1, 1), (−1, 1)} is a one-dimensional manifold. We have

f (0, 0) = 0, f (1, 1) = 1, f (−1, 1) = 7.

Now, using the method of Lagrange multipliers, we seek the local extremum values of f |M .
The Lagrange function associated with f and F is given by

L(x, y, λ) = 4y − 3x − λ(y 3 − x2 ).

The partial derivatives are calculated as

∂x L(x, y, λ) = −3 + 2λx
∂y L(x, y, λ) = 4 − 3λy 2
∂λ L(x, y, λ) = −(y 3 − x2 ).

From −3 + 2λx = 0, we deduce λ ̸= 0 and x ̸= 0, with λ = 3/(2x). Similarly, from 4 − 3λy 2 = 0, we conclude that y ̸= 0 and λ = 4/(3y 2 ). Thus, 3/(2x) = 4/(3y 2 ), or equivalently x = (9/8) y 2 . Furthermore, ∂λ L(x, y, λ) = −(y 3 − x2 ) = 0. Substituting x = (9/8) y 2 , we obtain

0 = y 3 − ((9/8) y 2 )2 = y 3 − (9^2 /8^2 ) y 4 = y 3 (1 − (9^2 /8^2 ) y).

Since y ̸= 0, this yields y = 8^2 /9^2 and x = (9/8) y 2 = 8^3 /9^3 . Therefore, using the Lagrange multipliers method, we find a single additional candidate for extremal values:

f (8^3 /9^3 , 8^2 /9^2 ) = 4 · 8^2 /9^2 − 3 · 8^3 /9^3 = 1.053 . . . .


The set of all points in K where f attains a local extremum on K is thus contained in

{(0, 0), (1, 1), (−1, 1), (8^3 /9^3 , 8^2 /9^2 )}.

The global maximum of f on K is at the point (−1, 1) with a value of 7, and the global minimum is at the point (0, 0) with a value of 0.
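Since the candidate point has rational coordinates, the computation above can be verified exactly with rational arithmetic. A sketch in Python (variable names are ours): we check the constraint, the Lagrange equations with ∇f = (−3, 4) and ∇F = (−2x, 3y 2 ), and the value of f .

```python
from fractions import Fraction

x = Fraction(8 ** 3, 9 ** 3)   # the candidate found above
y = Fraction(8 ** 2, 9 ** 2)

on_constraint = (y ** 3 == x ** 2)      # F(x, y) = y^3 - x^2 = 0

# first Lagrange equation -3 + 2*lam*x = 0 determines lam
lam = Fraction(3, 1) / (2 * x)
# second Lagrange equation 4 - 3*lam*y^2 = 0 must then hold
eq2 = (4 == lam * 3 * y ** 2)

f_val = 4 * y - 3 * x                   # exact value of f at the candidate
```

The exact value is 256/243 = 1.0534 . . . , matching the decimal above.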

Applet 11.10 (Lagrange Multipliers and Normal Vectors). In this applet, we illustrate Propo-
sition ?? and Corollary ?? using a one-dimensional submanifold. Under this assumption, only
one gradient vector ∇F is present, so Proposition ?? states that ∇F and ∇f should be parallel.

11.2 Relevant examples

11.2.1 Operator norm of a matrix


11.11. — As a first simple application of the Heine-Borel theorem, we want to define a
natural norm on the vector space Matm,n (R) of real m×n matrices. First, note that Matm,n (R)
is isomorphic to Rmn as a real vector space. In particular, any norm on Rmn induces a norm
on Matm,n (R), and vice versa. All such norms are equivalent according to Theorem ??, so we
obtain a canonical induced topology on Matm,n (R) following Exercise ??.

Exercise 11.12. — Show that ⟨A, B⟩ = tr(AB T ) defines an inner product on Matn,n (R). Identify the corresponding norm on Rn^2 .

Definition 11.13: Operator Norm


The operator norm of a matrix A ∈ Matm,n (R) is defined by

∥A∥op = sup {∥Ax∥2 | x ∈ Rn with ∥x∥2 ≤ 1} ,

where ∥ · ∥2 represents the Euclidean norm.

Proposition 11.14:
For m, n ∈ N, the operator norm ∥ · ∥op indeed defines a norm on the vector space
Matm,n (R). Furthermore, the following inequalities hold:

∥Ax∥2 ≤ ∥A∥op ∥x∥2 and ∥AB∥op ≤ ∥A∥op ∥B∥op , (11.3)

for all x ∈ Rn and all matrices A, B for which the products are defined.

Proof. For the sake of brevity, we write B n = {x ∈ Rn | ∥x∥2 ≤ 1}. According to the Heine-
Borel theorem ??, B n ⊂ Rn is compact. Since the function B n → R given by x 7→ ∥Ax∥2 is
continuous as a composition of continuous functions, by Theorem ??, it is bounded, implying
∥A∥op < ∞.

Definiteness, i.e., ∥A∥op = 0 ⇐⇒ A = 0, and the relations

∥λA∥op = |λ|∥A∥op and ∥A + B∥op ≤ ∥A∥op + ∥B∥op

follow directly from the definition of the operator norm and the corresponding properties of
the Euclidean norm.
For the left inequality in (11.3), it is noted that there is nothing to prove in the case of
x = 0. If x ̸= 0, then
∥Ax∥2 = ∥A(x/∥x∥2 )∥2 ∥x∥2 ≤ ∥A∥op ∥x∥2 .

For the second inequality in (11.3), one calculates ∥ABx∥2 ≤ ∥A∥op ∥Bx∥2 ≤ ∥A∥op ∥B∥op for
all x ∈ B n , proving the claim.
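Numerically, the definition of ∥·∥op can be probed by brute force. The following sketch (plain Python; the helper names are made up for illustration) estimates the operator norm of a 2×2 matrix by maximizing ∥Ax∥2 over sampled unit vectors, and checks the submultiplicativity inequality of (11.3) on an example.

```python
import math

def op_norm_2x2(A, samples=20000):
    """Estimate ||A||_op = sup{||Ax||_2 : ||x||_2 <= 1} by sampling
    unit vectors x = (cos t, sin t); the sup is attained on the sphere."""
    best = 0.0
    for i in range(samples):
        t = 2 * math.pi * i / samples
        x = (math.cos(t), math.sin(t))
        Ax = (A[0][0] * x[0] + A[0][1] * x[1],
              A[1][0] * x[0] + A[1][1] * x[1])
        best = max(best, math.hypot(Ax[0], Ax[1]))
    return best

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[3.0, 0.0], [0.0, 1.0]]   # diagonal, so ||A||_op = 3
B = [[0.0, 1.0], [1.0, 0.0]]   # a reflection, so ||B||_op = 1
print(op_norm_2x2(A))          # ~3.0
# submultiplicativity ||AB||_op <= ||A||_op ||B||_op from (11.3):
print(op_norm_2x2(mat_mul(A, B)) <= op_norm_2x2(A) * op_norm_2x2(B) + 1e-9)  # True
```

For the diagonal matrix the supremum is attained at the first sampled vector x = (1, 0), so the estimate is exact; in general the sampling only gives a lower bound that converges as the mesh is refined.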

Exercise 11.15. — Show that for B ∈ Matm,m (R) with ∥B∥op < 1, the matrix Im − B is
invertible with
(Im − B)^{−1} = ∑_{j=0}^{∞} B^j .

Part of the exercise is to show that this series converges.


Upgrade your result by showing that the set of invertible matrices in Matm,m (R) is open and
that for B ∈ GLm (R), any C ∈ Matm,m (R) with ∥C − B∥op < ∥B^{−1}∥op^{−1} is also invertible.
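The Neumann series of the exercise converges geometrically, which makes it easy to test numerically. A minimal sketch (hand-rolled 2×2 matrix arithmetic; the example matrix B is an arbitrary choice with ∥B∥op < 1):

```python
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def neumann_sum(B, terms=200):
    """Partial sum sum_{j=0}^{terms-1} B^j, which converges to (I - B)^{-1}
    when ||B||_op < 1."""
    S = [[1.0, 0.0], [0.0, 1.0]]      # j = 0 term: the identity
    P = [[1.0, 0.0], [0.0, 1.0]]      # current power B^j
    for _ in range(1, terms):
        P = mat_mul(P, B)
        S = mat_add(S, P)
    return S

B = [[0.5, 0.25], [0.0, 0.5]]         # upper triangular, ||B||_op < 1
S = neumann_sum(B)                    # exact inverse of I - B is [[2, 1], [0, 2]]
I_minus_B = [[1 - B[0][0], -B[0][1]], [-B[1][0], 1 - B[1][1]]]
print(mat_mul(I_minus_B, S))          # ~ identity matrix
```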

11.2.2 Fundamental Theorem of Algebra


11.16. — A second application of the Heine-Borel theorem is a proof of the Fundamental
Theorem of Algebra. The Fundamental Theorem of Algebra states that every non-
constant polynomial with complex coefficients has a complex root. This is equivalent to the
statement that the field C is algebraically closed. Polynomial division with remainder and
induction show that any polynomial f ∈ C[T ] of degree n > 0 can be written as the product
of a scalar and n linear factors, i.e.,
f (T ) = a ∏_{j=1}^{n} (T − αj ) (11.4)

with a ∈ C× and α1 , . . . , αn ∈ C. There are various proofs of the Fundamental Theorem of


Algebra; some rely on complex analysis, while others on algebraic topology. Here, we will use
the asymptotic behavior of polynomials, the Heine-Borel theorem, and the polar decomposition
of complex numbers.
A version of the Fundamental Theorem can also be formulated for polynomials with real coefficients. We
note that for every f ∈ R[x] with a root α ∈ C, the complex conjugate ᾱ is also a root. In the
factorization of f into linear factors in (11.4), for every non-real root α ∈ C \ R, we have the factor

(x − α)(x − ᾱ) = x² − (α + ᾱ)x + |α|² = x² − 2 Re(α)x + |α|²


Applying this insight to every genuinely complex root of f shows that f can be
written as a product of real polynomials of degree 1 and degree 2. We have implicitly used
this in the discussion of the integration of rational functions.

Theorem 11.17: Fundamental Theorem of Algebra


Every non-constant polynomial f ∈ C[T ] has a root in C.

Proof. Let f ∈ C[T ] be a polynomial of degree n > 0. If f (0) = 0, we have already found
a root. So, assume f (0) ̸= 0 and set M = 2|f (0)| > 0. Since f is not constant, according to Proposition ??,
there exists an R ≥ 1 such that for all z ∈ C with |z| ≥ R, |f (z)| ≥ M holds. According to
the Heine-Borel theorem ??,
K = {z ∈ C | |z| ≤ R}

is a compact subset. We now apply the existence of the minimum value in Theorem ?? to the
continuous function z ∈ K 7→ |f (z)| ∈ R and find z0 ∈ K with |f (z0 )| = min{|f (z)| | z ∈ K}.
Since |f (0)| = M/2, the inequality |f (z0 )| < M holds. Since |f (z)| ≥ M for all z ∈ C \ K, we
obtain |f (z)| ≥ |f (z0 )| for all z ∈ C.
We claim that z0 is a root of f . Assume instead, for contradiction, that |f (z0 )| > 0.
We represent f as a power series around z0 :
f (z) = ∑_{k=0}^{n} b_k (z − z0 )^k

with b0 = f (z0 ), b1 = f ′ (z0 ), . . . , bn = f^{(n)} (z0 )/n! ∈ C. The existence of this representation
follows from polynomial division with remainder and induction on n = deg(f ). By the as-
sumption on z0 , b0 ̸= 0. Let ℓ ≥ 1 be the smallest index ≥ 1 with bℓ ̸= 0. Set z = z0 +r exp(iφ)
for a fixed φ ∈ R, which we will choose precisely later, and a varying r > 0. We have
f (z0 + re^{iφ}) = b0 + bℓ r^ℓ e^{iℓφ} + O(r^{ℓ+1}) = b0 ( 1 + (bℓ /b0 ) r^ℓ e^{iℓφ} + O(r^{ℓ+1}) )

as r → 0. Write bℓ /b0 = s e^{iψ} with s > 0 and choose φ = (π − ψ)/ℓ. Then e^{i(ℓφ+ψ)} = −1 and

|f (z0 + re^{iφ})| = |b0 | |1 − s r^ℓ + O(r^{ℓ+1})| ≤ |b0 | (1 − s r^ℓ ) + O(r^{ℓ+1})

for r → 0. For sufficiently small r > 0, this upper bound is smaller than |b0 |, leading to a
contradiction with |f (z0 )| = |b0 | = min{|f (z)| | z ∈ K}. This proves that f (z0 ) = 0 must
hold.

Exercise 11.18. — Let U ⊂ C be open, and f : U → C be a complex-valued function that


can be locally represented by power series (an analytic function). More precisely, for every
x0 ∈ U , there exists an r > 0 such that B(x0 , r) ⊂ U , and f on B(x0 , r) is equal to a power
series around x0 with a convergence radius greater than or equal to r. Mimicking the proof of
Theorem 11.17, show that if the function z ↦ |f (z)| attains a local minimum at some z0 ∈ U , and f is not constant near z0 , then f (z0 ) = 0.


11.2.3 Diagonalizability of Symmetric Matrices


With the method of Lagrange multipliers, we can relatively easily prove the following impor-
tant theorem from linear algebra.

Theorem 11.19:
Every symmetric matrix A ∈ Matn,n (R) is diagonalizable, and there exists an orthonor-
mal basis of Rn consisting of eigenvectors of A.

Lemma 11.20:
Let n ≥ 1 and A ∈ Matn,n (R) be a symmetric matrix. Then, A has a real eigenvector.

Proof. Consider the sphere S = Sn−1 as the zero set of the function F : Rn → R, given by
F (x) = ∥x∥² − 1, and the real-valued function

f : Rn → R, f (x) = x^t Ax.

Since S is compact, f |S attains a maximum and a minimum. Assume f |S has an extremum at


p ∈ S. Define

L : Rn × R → R, L(x, λ) = f (x) − λF (x) = x^t Ax − λ(∥x∥² − 1)

as the Lagrange function associated with S and f . According to Corollary ??, there exists
λ ∈ R such that

∂xj L(p, λ) = ∂j f (p) − λ∂j F (p) = 0 (11.5)

for all j = 1, . . . , n and ∂λ L(p, λ) = F (p) = 0. The latter simply implies ∥p∥ = 1, or in other
words, p ∈ S, as already known.
Now, let’s compute the partial derivatives of F and f . For all x ∈ Rn , we have

∂j F (x) = ∂j ( ∑_{k=1}^{n} x_k² ) = 2xj

∂j f (x) = ∂j ( ∑_{k,ℓ=1}^{n} a_{kℓ} x_k x_ℓ ) = ∑_{ℓ=1}^{n} a_{jℓ} x_ℓ + ∑_{k=1}^{n} a_{kj} x_k

by the product rule, and since ∂j (xk ) vanishes unless k = j. Since A is symmetric, we obtain

∂j f (x) = 2 ∑_{ℓ=1}^{n} a_{jℓ} x_ℓ = 2(Ax)j .

Thus, (Ap)j = λpj for all j = 1, . . . , n, or equivalently Ap = λp.


Proof of Theorem 11.19. In addition to Lemma 11.20, we need some more linear algebra for
the proof, which we will now carry out by induction on n. For n = 1, there is nothing to
prove. So, let A ∈ Matn,n (R) be a symmetric matrix. According to Lemma 11.20, there exists
a real eigenvector v1 ∈ Sn−1 corresponding to an eigenvalue λ1 ∈ R. Consider the orthogonal
complement

W = v1⊥ = {w ∈ Rn | ⟨w, v1 ⟩ = 0}

of v1 . For w ∈ W , due to the symmetry of A, we have

⟨Aw, v1 ⟩ = ⟨w, Av1 ⟩ = λ1 ⟨w, v1 ⟩ = 0

and it follows that A(W ) ⊂ W . Let w1 , . . . , wn−1 be an orthonormal basis of W with respect to
⟨·, ·⟩ (which exists due to the Gram-Schmidt orthonormalization process). For i, j ∈ {1, . . . , n−
1}, we now have
⟨Awj , wi ⟩ = ⟨wj , Awi ⟩ = ⟨Awi , wj ⟩ .

In other words, the basis representation B of A|W : W → W with respect to the basis
w1 , . . . , wn−1 is symmetric again. By the induction hypothesis, there exists an orthonormal
basis u2 , . . . , un of R^{n−1} consisting of eigenvectors of B. Since B (together with the standard basis of
R^{n−1} ) corresponds exactly to A|W (together with the orthonormal basis w1 , . . . , wn−1 ), the vectors
u2 , . . . , un correspond to an orthonormal basis v2 , . . . , vn of W consisting of eigenvectors of A, each of
which is orthogonal to v1 since it lies in W .
Thus, v1 , . . . , vn forms an orthonormal basis of Rn consisting of eigenvectors of A.

Exercise 11.21. — For the sake of completeness, we present an elementary argument for
the proof of Lemma 11.20 using the Fundamental Theorem. Let n ≥ 1 and A ∈ Matn,n (R) be
a symmetric matrix.

(i) Show that all complex eigenvalues of A are real.

(ii) Prove Lemma 11.20 by showing that A has a complex eigenvector if and only if A has a
real eigenvector.

The geometric understanding of the eigenvalues of A gained in our proof can also be utilized
differently. As an example, one can prove a special case of the Courant-Fischer theorem.

(iii) Show that the values


min_{x∈S^{n−1}} x^t Ax,   max_{x∈S^{n−1}} x^t Ax

represent the smallest and largest eigenvalue of A, respectively.
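Part (iii) can be verified numerically for a concrete symmetric matrix: sample x^t Ax over the unit circle and compare with the eigenvalues obtained from the characteristic polynomial. A sketch (the matrix A is an arbitrary example):

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]   # symmetric; eigenvalues 1 and 3

def quad_form(A, x):
    """x^t A x for a 2x2 matrix."""
    return (A[0][0] * x[0] * x[0] + (A[0][1] + A[1][0]) * x[0] * x[1]
            + A[1][1] * x[1] * x[1])

# Sample the quadratic form over the unit circle S^1:
vals = [quad_form(A, (math.cos(2 * math.pi * i / 10000),
                      math.sin(2 * math.pi * i / 10000)))
        for i in range(10000)]

# Eigenvalues of a symmetric 2x2 matrix from the characteristic polynomial:
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = math.sqrt(tr * tr / 4 - det)
lam_min, lam_max = tr / 2 - disc, tr / 2 + disc

print(min(vals), lam_min)   # both ~1
print(max(vals), lam_max)   # both ~3
```

On the circle the form equals 2 + sin(2t), so the sampled minimum 1 and maximum 3 match the smallest and largest eigenvalue, as the Courant–Fischer special case predicts.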

Exercise 11.22. — For n ≥ 2, prove that two points x, y ∈ Sn−1 have maximum distance
if and only if x = −y. Consider the function (x, y) 7→ ∥x − y∥2 on Sn−1 × Sn−1 ⊂ R2n .



Chapter 11.3 Potentials and the equation Du = F

11.3 Potentials and the equation Du = F

11.3.1 The work of a Vector Field along a line

Definition 11.23: Non-Definition of Vector Field


Let U ⊂ Rn be open. A vector field on U is a function F : U → Rn . Continuous,
continuously differentiable, or smooth vector fields are functions F : U → Rn with
corresponding regularity. If we write

F (x) = (F1 (x), . . . , Fn (x)),

then the functions Fi : U → R are called the components of the vector field.

11.24. — We often visualize vector fields by drawing the vector F (x) attached at the point x ∈ U .
In physics, vector fields are often force fields or indicate the flow velocity of a medium.

Definition 11.25: Work of a field along a path


Let U ⊂ Rn be an open subset, and let F : U → Rn be a continuous vector field and
γ : [a, b] → U a continuously differentiable path. We define the work of F along γ as
∫_γ F := ∫_a^b F (γ(t)) · γ′(t) dt.

If γ is piecewise continuously differentiable with respect to a partition a = t0 < t1 <


· · · < tN = b, the integral is interpreted as the sum of integrals over intervals [tk−1 , tk ].

In physics, one often uses the notation

∫_γ F = ∫_γ F⃗ · dℓ⃗.

Lemma 11.26:
Let U ⊂ Rn be an open subset, let F : U → Rn be a continuous vector field, let γ : [a, b] → U
be a continuously differentiable path, and let ψ : [0, 1] → [a, b] be a C 1 function such
that ψ(0) = a, ψ(1) = b. Then

∫_{γ◦ψ} F = ∫_γ F.

Proof. This is a consequence of the change of variable formula of Analysis I and the chain
rule:
∫_{γ◦ψ} F = ∫_0^1 F (γ(ψ(t))) · (γ ◦ ψ)′(t) dt = ∫_0^1 F (γ(ψ(t))) · γ′(ψ(t)) ψ′(t) dt


= ∫_a^b F (γ(s)) · γ′(s) ds = ∫_γ F.

Lemma 11.27:


Let U ⊂ Rn be open, and let γ : [0, 1] → U be a piecewise continuously differentiable
path. Then there exists a continuously differentiable reparametrization φ = γ ◦ ψ.
Moreover, φ′ (0) = φ′ (1) = 0 can be arranged.

Proof. Let 0 = s0 < s1 < . . . < sN = 1 be a suitable partition of the unit interval. There
exists a smooth function β : [0, 1] → R with non-negative values such that β(sk ) = 0 for
k = 0, 1, 2, . . . , N , and

∫_{s_{k−1}}^{s_k} β(t) dt = s_k − s_{k−1}

for k = 1, 2, . . . , N . The function ψ : [0, 1] → [0, 1] given by

ψ(t) = ∫_0^t β(s) ds

is continuously differentiable, monotonically increasing, and satisfies ψ(sk ) = sk and ψ ′ (sk ) =


0 for all k. The reparametrization φ = γ ◦ ψ is thus continuously differentiable, and φ′ (sk ) = 0
for all k, in particular, φ′ (s0 ) = φ′ (0) = 0 and φ′ (sN ) = φ′ (1) = 0.

Definition 11.28: Potential of a Vector Field


Let U ⊂ Rn be open, and F : U → Rn be a continuous vector field. A continuously
differentiable function f : U → R is called a potential for F , if F = grad(f ) holds.
That is to say
Di f (x) = Fi (x) for all x ∈ U, 1 ≤ i ≤ n.

If F admits a potential f in U , we say that F is conservative in U .

In the next two propositions we show that a vector field F admits a potential if and only
if the value of ∫_γ F depends only on the endpoints of γ.

Proposition 11.29:
Let U ⊂ Rn be open, and let F : U → Rn be a continuous vector field. Suppose there
exists a potential f : U → R for F . Then
∫_γ F = f (γ(1)) − f (γ(0))

for any piecewise continuously differentiable path γ : [0, 1] → U .


Proof. If γ : [0, 1] → U is a continuously differentiable path, then for t ∈ [0, 1], F (γ(t)) =
grad f (γ(t)) = Df (γ(t)), and thus
∫_γ F = ∫_0^1 ⟨F (γ(t)), γ′(t)⟩ dt = ∫_0^1 Df (γ(t))(γ′(t)) dt = ∫_0^1 (f ◦ γ)′(t) dt = f (γ(1)) − f (γ(0))

by the chain rule. If γ is only piecewise continuously differentiable with respect to a partition
0 = s0 < s1 < . . . < sN = 1, the calculation can be applied to the subintervals [sk−1 , sk ]. This
leads to a telescoping sum where all terms except f (γ(1)) − f (γ(0)) cancel.
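Proposition 11.29 lends itself to a numerical illustration: for a gradient field, a Riemann-sum approximation of ∫_γ F returns f(γ(1)) − f(γ(0)) regardless of the path. A sketch (the potential f(x, y) = x²y and the parabolic path are arbitrary choices, not taken from the text):

```python
def f(x, y):                 # a potential
    return x * x * y

def F(x, y):                 # its gradient field F = grad f
    return (2 * x * y, x * x)

def gamma(t):                # an arbitrary C^1 path from (0,0) to (1,1)
    return (t, t * t)

def work(F, gamma, n=100000):
    """Riemann sum for int_gamma F: the field is evaluated at chord
    midpoints and dotted with the chord displacements."""
    total = 0.0
    for i in range(n):
        x0, y0 = gamma(i / n)
        x1, y1 = gamma((i + 1) / n)
        Fx, Fy = F((x0 + x1) / 2, (y0 + y1) / 2)
        total += Fx * (x1 - x0) + Fy * (y1 - y0)
    return total

print(work(F, gamma))        # ~ f(1,1) - f(0,0) = 1
```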

11.30. — One of the many physical interpretations of such path integrals is the calculation
of work along a path γ. Suppose F (x) indicates the direction and strength of a force field at
point x ∈ U . Then, ⟨F (γ(t)), γ(t + δ) − γ(t)⟩ is approximately the work done when an object
moves along the path γ from γ(t) to γ(t + δ). This leads to the interpretation of ∫_γ F as the
work performed along the path γ. This total work generally depends on the chosen path, not
just the starting point γ(a) and the endpoint γ(b). However, the work done does not depend
on the chosen parameterization of the path, as shown in Lemma 11.26.

Example 11.31. — Consider the vector field F on R2 defined by F (x, y) = (−y, x) and
calculate the integral of F along different paths from (0, 0) to (1, 1).

Figure 11.1: The vector field F - vector lengths are scaled by a factor of 0.1.

Let γ0 , γ1 , and γ2 : [0, 1] → R2 be the paths from (0, 0) to (1, 1) given by γ0 (t) = (t, t) and
γ1 (t) = (2t, 0) if t ∈ [0, 1/2], (1, 2t − 1) if t ∈ [1/2, 1];
γ2 (t) = (0, 2t) if t ∈ [0, 1/2], (2t − 1, 1) if t ∈ [1/2, 1].


Then,

∫_{γ0} F = ∫_0^1 ⟨F (γ0 (t)), γ0′ (t)⟩ dt = ∫_0^1 ⟨(−t, t), (1, 1)⟩ dt = 0,

∫_{γ1} F = ∫_0^{1/2} ⟨(0, 2t), (2, 0)⟩ dt + ∫_{1/2}^{1} ⟨(1 − 2t, 1), (0, 2)⟩ dt = 1,

∫_{γ2} F = ∫_0^{1/2} ⟨(−2t, 0), (0, 2)⟩ dt + ∫_{1/2}^{1} ⟨(−1, 2t − 1), (2, 0)⟩ dt = −1.

We see that the work performed ∫_γ F depends on the chosen path γ. If one moves perpendicular
to the vector field, no work is done. If one moves with the vector field, positive work is done,
and if one moves against the vector field, negative work is done.
From these calculations, it follows in particular that the vector field F does not possess a
potential.
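The three values above can be reproduced with a short Riemann-sum computation, a sketch reusing the paths γ0, γ1, γ2 of Example 11.31:

```python
def F(x, y):                       # the field of Example 11.31
    return (-y, x)

def work(path, n=100000):
    """Riemann sum for the work of F along a path given as t -> (x, y):
    F at the chord midpoint, dotted with the chord displacement."""
    total = 0.0
    for i in range(n):
        x0, y0 = path(i / n)
        x1, y1 = path((i + 1) / n)
        Fx, Fy = F((x0 + x1) / 2, (y0 + y1) / 2)
        total += Fx * (x1 - x0) + Fy * (y1 - y0)
    return total

gamma0 = lambda t: (t, t)
gamma1 = lambda t: (2 * t, 0.0) if t <= 0.5 else (1.0, 2 * t - 1)
gamma2 = lambda t: (0.0, 2 * t) if t <= 0.5 else (2 * t - 1, 1.0)

print(work(gamma0), work(gamma1), work(gamma2))   # ~ 0, 1, -1
```

The path dependence of the three results is exactly the numerical evidence that F admits no potential.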

Exercise 11.32. — Let U ⊂ Rn be open and connected, and let x0 , x1 ∈ U . Show that x0
and x1 can be connected by a continuously differentiable path.

Exercise 11.33. — A loop in an open subset U ⊂ Rn is a path γ : [0, 1] → U with


γ(0) = γ(1). Show that a continuous vector field F : U → Rn is conservative if and only if,
for every piecewise continuously differentiable loop γ in U ,
∫_γ F = 0

holds.

Proposition 11.34:
Let U ⊂ Rn be open, and F : U → Rn be a continuous vector field. Assume that, for
all piecewise continuously differentiable paths γ : [0, 1] → U and σ : [0, 1] → U , with
γ(0) = σ(0), γ(1) = σ(1) it holds
∫_γ F = ∫_σ F.

Then F is conservative, i.e., it has a continuously differentiable potential f : U → R.

Proof. Let x0 ∈ U be a fixed point. Since U is connected, there exists, according to Exercise
11.32, for every x ∈ U , a piecewise continuously differentiable path γx in U with initial point
x0 and endpoint x. We consider the function f : U → R given by
f (x) = ∫_{γx} F


which does not depend on the chosen paths γx by assumption. For k = 1, 2, . . . , n, we verify
that ∂k f (x) = Fk (x), the k-th component of F (x) ∈ Rn . Let x ∈ U and h ∈ R \ {0} be small
enough so that x + thek ∈ U for all t ∈ [0, 1]. Using the path γx : [0, 1] → U from x0 to x, we
can define a path φh from x0 to x + hek as
φh (t) = γx (2t) if 0 ≤ t ≤ 1/2,  and  φh (t) = x + (2t − 1)hek if 1/2 ≤ t ≤ 1.

For the partial derivative ∂k f , we obtain

∂k f (x) = lim_{h→0} (f (x + hek ) − f (x))/h = lim_{h→0} (1/h) ( ∫_{φh} F − ∫_{γx} F )
= lim_{h→0} (1/h) ∫_0^1 ⟨F (x + thek ), hek ⟩ dt = lim_{h→0} ∫_0^1 Fk (x + shek ) ds = Fk (x)

due to Theorem 14.80 and the continuity of F . Since this holds for all x ∈ U and k =
1, 2, . . . , n, and F1 , . . . , Fn are continuous by assumption, it follows from Theorem 10.10 that
f is differentiable and grad f (x) = F (x) for all x ∈ U .

Corollary 11.35:
Let U ⊂ Rn be open, and let F be a continuously differentiable conservative vector
field on U , with components F1 , . . . , Fn . Then we necessarily have the integrability
conditions:
∂j Fk = ∂k Fj

for all pairs j, k ∈ {1, . . . , n}.

Proof. If F is conservative, then, by Definition 11.28, there exists a continuously differentiable
function f with grad f = F . Then, for j, k ∈ {1, . . . , n}, we have

∂j Fk = ∂j ∂k f = ∂k ∂j f = ∂k Fj . (11.6)

We are allowed to commute the second derivatives because f is C²(U ): indeed grad f = F ∈
C¹(U, Rn ), hence Schwarz's lemma applies.

11.3.2 The Poincaré Lemma


For a continuously differentiable vector field F on U ⊂ Rn , satisfying the partial differential
equations (11.6) established in Corollary 11.35 is a necessary condition for the existence of
a potential for F . We call these differential equations integrability conditions and aim
to investigate to what extent they are also sufficient for the existence of a potential for F .
The following example shows that the conditions (11.6) do not generally imply that F has a
potential.


Example 11.36. — Let U = R2 \ {0}, and consider the vector field F : U → R2 given by
F (x, y) = ( −y/(x² + y²), x/(x² + y²) )

for (x, y) ∈ U . A direct calculation shows

∂1 F2 (x, y) = ∂x ( x/(x² + y²) ) = (−x² + y²)/(x² + y²)² = ∂y ( −y/(x² + y²) ) = ∂2 F1 (x, y)

thus satisfying the integrability conditions (11.6) throughout U . However, F is not conserva-
tive. Let γ : [0, 1] → U be the continuously differentiable loop defined by

γ(t) = (cos(2πt), sin(2πt))

which rotates once counterclockwise around the unit circle. Then,
∫_γ F = 2π ∫_0^1 ⟨(− sin(2πt), cos(2πt)), (− sin(2πt), cos(2πt))⟩ dt = 2π,

even though γ is a closed path with γ(0) = γ(1) = (1, 0).
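The value 2π can be confirmed numerically with the same Riemann-sum approach applied to the circular loop (a sketch; n is just a mesh size):

```python
import math

def F(x, y):                          # the field of Example 11.36 on R^2 \ {0}
    r2 = x * x + y * y
    return (-y / r2, x / r2)

def loop_work(n=100000):
    """Work of F along the unit circle, by midpoint Riemann sums."""
    total = 0.0
    for i in range(n):
        t0, t1 = i / n, (i + 1) / n
        x0, y0 = math.cos(2 * math.pi * t0), math.sin(2 * math.pi * t0)
        x1, y1 = math.cos(2 * math.pi * t1), math.sin(2 * math.pi * t1)
        Fx, Fy = F((x0 + x1) / 2, (y0 + y1) / 2)
        total += Fx * (x1 - x0) + Fy * (y1 - y0)
    return total

print(loop_work(), 2 * math.pi)       # both ~6.2831...
```

A nonzero loop integral for a field satisfying the integrability conditions is the numerical fingerprint of the "hole" at the origin.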

In order to state the main theorem of this section we need to introduce the concept of
homotopy of curves.

Definition 11.37:
Let X be a metric space, and let γ0 and γ1 be paths in X with the same initial point
x0 = γ0 (0) = γ1 (0) and the same endpoint x1 = γ0 (1) = γ1 (1).
A homotopy from γ0 to γ1 is a continuous function H : [0, 1] × [0, 1] → X with the
following properties:

H(0, t) = γ0 (t), H(1, t) = γ1 (t) and H(s, 0) = x0 , H(s, 1) = x1

for all t ∈ [0, 1] and all s ∈ [0, 1]. We say γ1 is homotopic to γ0 if there exists a
homotopy from γ0 to γ1 .

11.38. — Let H be a homotopy from γ0 to γ1 as in the definition. For each fixed s ∈ [0, 1],
the function γs : t 7→ H(s, t) is a path from x0 to x1 . For s = 0 and s = 1, we obtain the
given paths γ0 and γ1 . This way, we can view the homotopy H as a parametrized family of
paths depending continuously on the parameter s ∈ [0, 1].

Exercise 11.39. — Let X be a topological space, x0 , x1 ∈ X, and P (x0 , x1 ) denote the


set of all paths in X from x0 to x1 . Show that the relation

γ0 ∼ γ1 ⇐⇒ γ1 is homotopic to γ0


Figure 11.2: A non-connected space X, a connected but not simply connected space Y , and
a simply connected space Z.

on the set P (x0 , x1 ) is an equivalence relation.

Definition 11.40: Simply connected space


A metric space X is called simply connected if it is path-connected and if, for all
x0 , x1 ∈ X, all paths from x0 to x1 are homotopic to each other.

11.41. — A connected topological space X is called simply connected if every path γ


from x0 to x1 in X can be continuously transformed into any other path from x0 to x1 .

Exercise 11.42. — Show that a simply connected open set X ⊂ Rn does not have “co-
dimension two” holes. This is the precise meaning: every continuous f : ∂D → X can be extended
to a continuous F : D → X, where D denotes the closed two-dimensional disk. In some sense, any
(distorted) loop bounds a (distorted) disk in X.

Exercise 11.43. — Let X be a topological space, and let γ : [0, 1] → X be a path from
x0 ∈ X to x1 ∈ X.

(i) (Reversing Paths) Show that t ∈ [0, 1] 7→ γ(1 − t) is a path from x1 to x0 .

(ii) (Concatenation of Paths) Suppose γ̃ : [0, 1] → X is a path from x1 to x2 ∈ X. Show


that

t ↦ γ(2t) if 0 ≤ t ≤ 1/2,  γ̃(2t − 1) if 1/2 < t ≤ 1

defines a path from x0 to x2 .

(iii) Repeat the proof of Proposition 9.89 using this.

Interlude: Domains in Analysis


Let U ⊂ Rn be a non-empty subset. We say that U

(1) is connected: The set U cannot be written as the disjoint union of two open,
non-empty subsets of U (Definition 9.79).


(2) is path-connected: For any two points x0 , x1 ∈ U , there exists a path in U from
x0 to x1 (Definition 9.86).

(3) is simply connected: For any two points x0 , x1 ∈ U , there exists a path in U from
x0 to x1 , and between any two such paths, there exists a homotopy (Definition ??).

(4) is star-shaped: There exists an x0 ∈ U such that for all x1 ∈ U and t ∈ [0, 1],
(1 − t)x0 + tx1 ∈ U .

(5) is convex: For all x0 , x1 ∈ U and all t ∈ [0, 1], (1 − t)x0 + tx1 is also in U .

Exercise 11.44. — Show that for a subset U ⊂ Rn , the implications

(5) =⇒ (4) =⇒ (3) =⇒ (2) =⇒ (1)

hold among the properties listed above. Find examples of subsets in R2 that demonstrate the
failure of each reverse implication.

Theorem 11.45: Poincaré Lemma


Let U ⊂ Rn be open, and let F : U → Rn be a continuously differentiable vector field
that satisfies the integrability conditions

∂k Fj = ∂j Fk (11.7)

for all j, k ∈ {1, . . . , n}. Let γ0 : [0, 1] → U and γ1 : [0, 1] → U be piecewise continuously
differentiable paths with the same initial point x0 and the same endpoint x1 . If γ0 and
γ1 are homotopic, then

∫_{γ0} F = ∫_{γ1} F.

Theorem 11.45 is an example of a so-called global integration theorem, because it depends
on the global nature of the domain U . This is evident in the following important corollary.

Corollary 11.46:
Let U ⊂ Rn be open and simply connected. A continuously differentiable vector field on
U is conservative if and only if it satisfies the integrability conditions (11.7).

Proof. This directly follows from Theorem 11.45, Proposition 11.34 and Definition 11.40.

Before proving Theorem 11.45 we show a simpler proof under the additional assumption
that U is convex.


Lemma 11.47: Poincaré Lemma for convex domains


Let U ⊂ Rn be open and convex, and let F : U → Rn be a continuously differentiable
vector field that satisfies the integrability conditions (11.7). Then F is conservative in
U.

Proof. The necessity of the integrability conditions was already proven in Corollary 11.35. For
the converse, assume without loss of generality that 0 ∈ U . We use the path integral of F
along the straight line from 0 to x ∈ U to define a function f : U → R by

f (x) = ∫_0^1 ⟨F (tx), x⟩ dt

for x ∈ U . As in the proof of Proposition 11.34, f is a candidate for a potential of F . Fix
j ∈ {1, . . . , n} and consider, as a preparation for the computation of ∂j f , for h ∈ Rn

∂h Fj = ∑_{k=1}^{n} hk ∂k Fj = ∑_{k=1}^{n} hk ∂j Fk (11.8)

by the assumed integrability conditions. According to Theorem 14.80, ∂j f exists, and for
x ∈ U it holds

∂j f (x) = ∫_0^1 ∂j ( ∑_{k=1}^{n} Fk (tx) xk ) dt = ∫_0^1 ( ∑_{k=1}^{n} (∂j Fk )(tx) t xk + Fj (tx) ) dt, (11.9)

since only the term with k = j requires the product rule, and the partial derivative of x 7→
Fk (tx) with respect to xj is given by t(∂j Fk )(tx) for x ∈ U , following from the chain rule. We
use (11.8) for h = x in (11.9) and obtain, by partial integration,
∂j f (x) = ∫_0^1 t ∂x Fj (tx) dt + ∫_0^1 Fj (tx) dt = [t Fj (tx)]_0^1 − ∫_0^1 Fj (tx) dt + ∫_0^1 Fj (tx) dt = Fj (x),

where we have recognized in ∂x Fj (tx) the derivative with respect to t of Fj (tx).
Thus, F = ∇f , and f is continuously differentiable by Theorem 10.10.
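The construction in the proof is effective: given a field satisfying the integrability conditions on a convex set, quadrature of f(x) = ∫_0^1 ⟨F(tx), x⟩ dt produces a potential. A sketch (the field F(x, y) = (2xy, x²), with exact potential x²y, is an illustrative choice, not from the text):

```python
def F(x, y):
    # satisfies d_y F_1 = 2x = d_x F_2 on all of R^2 (a convex domain)
    return (2 * x * y, x * x)

def potential(x, y, n=20000):
    """Midpoint-rule quadrature of f(x) = int_0^1 <F(t x), x> dt."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        Fx, Fy = F(t * x, t * y)
        total += (Fx * x + Fy * y) / n
    return total

# f(x, y) should equal x^2 y (up to quadrature error):
print(potential(1.0, 1.0))    # ~1.0
# and its finite-difference gradient should recover F:
h = 1e-5
gx = (potential(1.0 + h, 1.0) - potential(1.0 - h, 1.0)) / (2 * h)
gy = (potential(1.0, 1.0 + h) - potential(1.0, 1.0 - h)) / (2 * h)
print(gx, gy)                 # ~ F(1, 1) = (2, 1)
```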

Exercise 11.48. — For which values of λ ∈ R is the vector field F : R2 → R2 defined by

F (x, y) = ( λx exp(y), (y + 1 + x²) exp(y) )

conservative? Determine a potential for F for these values.

11.49. — In rough terms, the proof of Theorem 11.45 goes as follows. Let U ⊂ Rn be
open, and let x0 , x1 ∈ U . We equip the set

Ω := {γ : [0, 1] → U | γ(0) = x0 , γ(1) = x1 , and γ is continuous}


with the distance given by d(γ0 , γ1 ) := sup{∥γ0 (t) − γ1 (t)∥ | 0 ≤ t ≤ 1}, so that (Ω, d) is a
metric space whose “points” are in fact paths. In particular, concepts like continuous functions
and connectedness apply to this “abstract” metric space. We are also interested in the subset

Ω′ := {γ : [0, 1] → U | γ(0) = x0 , γ(1) = x1 , and γ is piecewise continuously differentiable}.

Lemma 11.50:
Ω′ is dense in Ω with respect to the distance d.

Proof. We need to prove that any continuous path γ can be approximated, in the distance
d, by a piecewise continuously differentiable path θ having the same endpoints. It is of course
enough to approximate each component of γ.
Once we observe that each γi : [0, 1] → R is uniformly continuous, it is sufficient to approx-
imate it by linear interpolation on a sufficiently fine mesh.

Lemma 11.51: Homotopies are “Paths of Paths”


The paths γ0 , γ1 ∈ Ω are homotopic if and only if there exists a continuous path φ :
[0, 1] → Ω with φ(0) = γ0 and φ(1) = γ1 .

Proof. If such a homotopy H exists, we set

φ(s) := ( [0, 1] ∋ t ↦ H(s, t) ), s ∈ [0, 1],

and this function is continuous. Conversely, if such a φ exists, we set

H(s, t) := (φ(s))(t), (s, t) ∈ [0, 1]².

Checking the continuity of these maps is an instructive exercise.

Proof of Theorem 11.45. Under the hypotheses of Theorem 11.45, let x0 and x1 be elements
of U , and define Ω, Ω′ as in 11.49. We consider the function
I : Ω′ → R, I(γ) = ∫_γ F = ∫_0^1 ⟨F (γ(t)), γ′(t)⟩ dt.

Step 1. We claim that for every point σ ∈ Ω, there exists an ϵ > 0 such that I is constant
on the open ball B(σ, ϵ) ∩ Ω′ (of course this ball is taken with respect to the distance d).
Fix σ ∈ Ω. The image σ([0, 1]) ⊂ U is compact, U ⊂ Rn is open and therefore there exists
an ϵ > 0 such that
B(σ(t), 2ϵ) ⊂ U for all t ∈ [0, 1],

where B(x, r) here denotes the open ball in Rn centered at x with radius r.


For γ, γ0 ∈ B(σ, ϵ) ∩ Ω′ and all s, t ∈ [0, 1], we define

H(s, t) := sγ(t) + (1 − s)γ0 (t) ∈ B(σ(t), 2ϵ) ⊂ U.

Notice that t 7→ H(s, t) is a piecewise continuously differentiable path in Rn (i.e., belongs to


Ω′ ), so we may consider the parameter integral
E(s) := I(sγ + (1 − s)γ0 ) = ∫_0^1 ⟨F (sγ(t) + (1 − s)γ0 (t)), sγ′(t) + (1 − s)γ0′(t)⟩ dt.

A direct but somewhat tedious calculation (we postpone it to the end of the proof) shows
E ′ (s) = 0 and so
I(γ0 ) = E(0) = E(1) = I(γ),

the claim is proved.


Step 2. We claim that the function I : Ω′ → R admits a (unique) continuous extension I¯
to the whole Ω.
We first define I¯. For any σ ∈ Ω pick εσ > 0 as in step 1 and some γσ ∈ B(σ, εσ ) ∩ Ω′ ,
which exists thanks to Lemma 11.50, then set

I¯(σ) := I(γσ ).

Thanks to step 1, the value of I¯(σ) does not depend on the specific curve γσ that we picked.
We check that I¯ is locally constant (and thus continuous) around any point σ0 . Pick σ, σ0
and εσ , εσ0 as in step 1 and assume d(σ, σ0 ) < εσ0 /10. Then, by density, pick some

γσ ∈ B(σ0 , εσ0 ) ∩ B(σ, εσ ) ∩ Ω′ .

By definition γσ is sufficiently close to both of them to say that I¯(σ0 ) = I(γσ ) = I¯(σ).
¯
Step 3. We show that I(γ0 ) = I(γ1 ). We proved that the set

{γ ∈ Ω | I¯(γ) = I¯(γ0 )}

is open (I¯ being locally constant) and closed (I¯ being continuous). Thus it must contain the
path-connected component of γ0 . If γ0 ∈ Ω and γ1 ∈ Ω are homotopic paths as in the theorem,
then, according to Lemma 11.51, γ0 and γ1 lie in the same path-connected component of Ω.
It follows that I(γ0 ) = I(γ1 ), completing the proof.
Step 4. We conclude checking that indeed E ′ (s) = 0. Observe that

∂t H(s, t) = sγ ′ (t) + (1 − s)γ0′ (t), ∂s H(s, t) = γ(t) − γ0 (t), ∂st H = ∂ts H.

So we have

E′(s) = (d/ds) ∑_k ∫_0^1 Fk (H(s, t)) (sγk′ (t) + (1 − s)γ0,k′ (t)) dt
= ∑_k ∫_0^1 ∂s ( Fk (H(s, t)) ∂t Hk (s, t) ) dt
= ∑_{k,ℓ} ∫_0^1 ∂ℓ Fk (H(s, t)) ∂s Hℓ (s, t) ∂t Hk (s, t) dt + ∑_k ∫_0^1 Fk (H(s, t)) ∂ts Hk (s, t) dt.

The trick here is to integrate by parts the second term to cancel the first one, exploiting both
the integrability conditions and γ, γ0 having the same ends.

∑_k ∫_0^1 Fk (H(s, t)) ∂ts Hk (s, t) dt = ∑_k ∫_0^1 Fk (H(s, t)) (γk′ (t) − γ0,k′ (t)) dt
= ∑_k ∫_0^1 Fk (H(s, t)) (d/dt)(γk (t) − γ0,k (t)) dt
= − ∑_k ∫_0^1 (d/dt)( Fk (H(s, t)) ) (γk (t) − γ0,k (t)) dt
= − ∑_{k,ℓ} ∫_0^1 ∂ℓ Fk (H(s, t)) ∂t Hℓ (s, t) ∂s Hk (s, t) dt

Now, exchanging the role of the indexes k, ℓ in the two sums, and using ∂k Fℓ = ∂ℓ Fk , one
finds
∑_{k,ℓ} ∫_0^1 ∂ℓ Fk (H) ∂s Hℓ ∂t Hk dt = ∑_{k,ℓ} ∫_0^1 ∂ℓ Fk (H) ∂t Hℓ ∂s Hk dt,

so the two terms in the expression for E′(s) cancel and indeed E′(s) = 0.

Applet 11.52 (Integrability Conditions). What different values for the path integral can you
obtain when considering closed paths? Why does the value of the path integral usually not
change, but sometimes does when you move the middle three points?



Chapter 12

Foundations of Differential Geometry

12.1 The inverse function Theorem (F)

Lemma 12.1: small Lipschitz perturbations of the identity (F)

Let (V, ∥ · ∥) be a complete normed vector space, let B1 ⊂ V be the open unit ball in the
norm ∥ · ∥, and let F : B1 → V be a function of the form F (x) = x + ϕ(x), with Lip(ϕ) ≤ λ < 1.
Then

(1) B1−λ (F (0)) ⊂ F (B1 ) ⊂ B1+λ (F (0)), in particular F (B1 ) is open,

(2) F is injective and F −1 : F (B1 ) → B1 is of the form F −1 (y) = y + ψ(y), with
Lip(ψ) ≤ λ/(1 − λ).

Proof. (F) We start with the first inclusion in (1), which is the core of this proof. For y ∈
B(F (0), 1 − λ) consider the recurrence

x0 = 0, xk+1 = y − ϕ(xk ).

If we prove that xk → x̄, then x̄ = y − ϕ(x̄), that is, we have solved y = F (x̄). Using the triangle
inequality and the contraction property of ϕ we find

|xk+1 − xk | = |ϕ(xk−1 ) − ϕ(xk )| ≤ λ|xk − xk−1 | ≤ . . . ≤ λ^k |x1 − x0 | = λ^k |y − F (0)|,

which proves that {xk } is Cauchy. We are not done yet though, we have to check that {xk }
is well defined, i.e., never escapes from B1 , as we are computing ϕ on it. This is checked by
the same computation, for all k it holds

|xk+1 | ≤ ∑_{i=0}^{k} |xi+1 − xi | ≤ |y − F (0)| ∑_{i=0}^{k} λ^i ≤ |y − F (0)|/(1 − λ) < 1.

Now, we prove that F (B1 ) is open. Let Br (x0 ) ⊂ B1 and y0 = F (x0 ). Then the rescaled map
Fx0 ,r (x) := (F (x0 + rx) − F (x0 ))/r, defined in B1 , is again a λ-Lipschitz perturbation of the
identity, with Fx0 ,r (0) = 0. Applying what we just proved to Fx0 ,r , we find

F (B1 ) ⊃ F (Br (x0 )) = F (x0 ) + r Fx0 ,r (B1 ) ⊃ F (x0 ) + r B1−λ (0) = B(1−λ)r (y0 ),

which means that F (B1 ) is open.


Finally, the second inclusion in (1) is readily checked

|F (x) − F (0)| = |x + ϕ(x) − ϕ(0)| ≤ |x| + λ|x| < 1 + λ.

We turn to (2). First of all, if F (x) = F (x′ ), then x − x′ = ϕ(x′ ) − ϕ(x), so |x − x′ | ≤ λ|x − x′ |,
which forces x = x′ since λ < 1. So F is injective and F −1 is a well-defined function. We call
ψ(y) := F −1 (y) − y, y ∈ F (B1 ). Then ψ satisfies

y = F (F −1 (y)) = y + ψ(y) + ϕ(y + ψ(y)), y ∈ F (B1 ).

This readily implies

|ψ(y) − ψ(y ′ )| ≤ |ϕ(y + ψ(y)) − ϕ(y ′ + ψ(y ′ ))| ≤ λ|y − y ′ + ψ(y) − ψ(y ′ )|
≤ λ|y − y ′ | + λ|ψ(y) − ψ(y ′ )|,

which is what we wanted to prove (again, we use λ < 1).
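The recurrence x_{k+1} = y − ϕ(x_k) from the proof is an effective algorithm. A one-dimensional sketch (V = R, with the made-up perturbation ϕ(x) = 0.3 sin x, so Lip(ϕ) ≤ 0.3 < 1):

```python
import math

lam = 0.3
phi = lambda x: lam * math.sin(x)      # Lip(phi) <= 0.3 < 1
F = lambda x: x + phi(x)               # Lipschitz perturbation of the identity

def invert(y, iters=100):
    """Solve y = F(x) by the contraction iteration x_{k+1} = y - phi(x_k)."""
    x = 0.0
    for _ in range(iters):
        x = y - phi(x)
    return x

y = 0.5
x = invert(y)
print(F(x))        # ~0.5: the iteration recovered a preimage of y
```

The error contracts by the factor λ at each step, exactly as the Cauchy estimate in the proof predicts.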

Lemma 12.2: Automatic Regularity of the inverse


Let U, V ⊂ Rn be open sets and let f : U → V and g : V → U be bijective functions.
Assume that f is differentiable at x̄ ∈ U with det Df (x̄) ̸= 0, that g is Lipschitz in V ,
and that
f (g(y)) = y, for all y ∈ V.

Then g must be differentiable at ȳ = f (x̄) and Dg(ȳ) = [Df (x̄)]−1 .

Proof. Let us set A := Df (x̄). By the differentiability assumption, as y → ȳ,

y = f (x̄ + (g(y) − g(ȳ))) = f (x̄) + A(g(y) − g(ȳ)) + o(|g(y) − g(ȳ)|).

Using that A is invertible, we re-write this as

g(y) = g(ȳ) + A−1 (y − ȳ) + o(∥A−1 ∥op |g(y) − g(ȳ)|). (12.1)

We conclude noticing that, since g is Lipschitz,


o(∥A−1 ∥op |g(y) − g(ȳ)|) = o( (|g(y) − g(ȳ)| / |y − ȳ|) · |y − ȳ| ) = O(1) o(|y − ȳ|) = o(|y − ȳ|).

Thus (12.1) is saying that g is differentiable at ȳ, with differential A−1 .

Version: February 25, 2024. 86


Chapter 12.1 The inverse function Theorem (F)

Lemma 12.3: Smoothness of the inverse


Let U ⊂ Rn×n be the set of invertible matrices, and let θ : U → U be defined as

θ : X 7→ X −1 .

Then θ ∈ C ∞ .

Proof. The formula for the inverse matrix (Cramer's rule) expresses the (p, q) entry of X −1 as a
polynomial in the entries {Xi,j } divided by the polynomial det X (which is nonzero on U ).
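The smoothness of θ can also be probed numerically. The first-order expansion of θ, namely Dθ(X)[H] = −X−1 H X−1 , is a standard fact that is not derived in the text; the sketch below (with a hypothetical 2 × 2 matrix) only checks it by finite differences:

```python
# Probe theta(X) = X^{-1} for 2x2 matrices: compare a finite difference of theta
# along a direction H with the standard derivative formula -X^{-1} H X^{-1}
# (assumed here, not proved in the text).

def inv2(X):
    """Inverse of a 2x2 matrix via Cramer's rule: polynomial entries over det X."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add2(X, Y, s=1.0):
    return [[X[i][j] + s * Y[i][j] for j in range(2)] for i in range(2)]

X = [[2.0, 1.0], [1.0, 3.0]]
H = [[0.5, -0.2], [0.1, 0.4]]
t = 1e-6
# finite difference of theta along H
fd = [[(inv2(add2(X, H, t))[i][j] - inv2(X)[i][j]) / t for j in range(2)] for i in range(2)]
# predicted derivative -X^{-1} H X^{-1}
Xi = inv2(X)
pred = [[-mul2(mul2(Xi, H), Xi)[i][j] for j in range(2)] for i in range(2)]
err = max(abs(fd[i][j] - pred[i][j]) for i in range(2) for j in range(2))
print(err)  # of order t
```
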

Theorem 12.4: Inverse function (F)

Let U ⊂ Rn be an open set and ϕ ∈ C 1 (U, Rn ) with det Dϕ(x̄) ̸= 0 at some x̄ ∈ U .


Then, for a sufficiently small r > 0,

(1) ϕ is injective in Br (x̄),

(2) the set ϕ(Br (x̄)) is open,

(3) the inverse function ψ : ϕ(Br (x̄)) → Br (x̄) is of class C 1 and

Dψ(y) = [Dϕ(ψ(y))]−1 , for all y ∈ ϕ(Br (x̄)).

Furthermore, if ϕ ∈ C k (U, Rn ) for some k ≥ 1, then also ψ ∈ C k (ϕ(Br (x̄)), U ).

Proof. Up to replacing ϕ(x) with ϕ(x + x̄) − ϕ(x̄), we can assume 0 = x̄ = ϕ(x̄).
Let A := Dϕ(x̄), then we claim that F (x) := A−1 ◦ ϕ(x) is a Lipschitz perturbation of the
identity in Br for r > 0 small. Indeed, as x → 0, we have

∥D(F (x) − x)∥op = ∥DF (x) − 1n ∥op = ∥A−1 ◦ (Dϕ(x) − A)∥op ≤ ∥A−1 ∥op ∥Dϕ(x) − A∥op → 0,

so choosing r small enough we have

det Dϕ(x) ̸= 0 and ∥D(F (x) − x)∥op ≤ 21 , for all x ∈ Br .

By Corollary 10.25 we deduce that Lip(F (x) − x, Br ) ≤ 1/2. By Lemma 12.1, we find that
F (Br ) is open, that F |Br is injective and that the inverse function G : F (Br ) → Br is Lipschitz.
Now (1) holds because ϕ = A ◦ F, which is a composition of injective functions.
For (2) we need to show that ϕ(Br ) = A(F (Br )) is open. But F (Br ) is open and an
invertible linear map sends open sets to open sets.
For (3) we set ψ(y) := G(A−1 y) for y ∈ ϕ(Br ) = A(F (Br )). Then it holds

z = A−1 ϕ(G(z)) for all z ∈ F (Br ),



Chapter 12.2 Manifolds in Rn (F)

which means, setting z := A−1 y, that

y = ϕ(ψ(y)), for all y ∈ ϕ(Br ).

Since ϕ is differentiable at every point of Br , and ψ is Lipschitz, then Lemma 12.2 shows that
ψ is differentiable for all y ∈ ϕ(Br ), and

Dψ(y) = [Dϕ(ψ(y))]−1 , for all y ∈ ϕ(Br (x̄)).

Finally, assume that ϕ ∈ C k and that we already know ψ ∈ C k−1 . The formula for Dψ gives it
as the composition of the following functions:

ψ : y 7→ ψ(y) which is of class C k−1 ,


Dϕ : x 7→ Dϕ(x) which is of class C k ,
θ : X 7→ X −1 which is of class C ∞ on the set of invertible matrices by Lemma 12.3.

Thus Dψ ∈ C k−1 , which means that ψ ∈ C k (bootstrap). This proves the last part of the
statement by induction from the case k = 1.
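In one variable the formula of Theorem 12.4 reads ψ′(y) = 1/ϕ′(ψ(y)), which is easy to check numerically. A minimal sketch with a hypothetical example (ϕ(x) = x + x³, invertible near 0 since ϕ′(0) = 1 ≠ 0):

```python
# Numerical illustration of Theorem 12.4 for n = 1 with phi(x) = x + x^3.

def phi(x):
    return x + x**3

def dphi(x):
    return 1 + 3 * x**2

def psi(y, tol=1e-14):
    """Invert phi by bisection (phi is strictly increasing on R)."""
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if phi(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

y = 0.3
h = 1e-6
fd = (psi(y + h) - psi(y - h)) / (2 * h)  # finite-difference psi'(y)
pred = 1 / dphi(psi(y))                   # [D phi(psi(y))]^{-1}
print(abs(fd - pred))
```
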

Definition 12.5:
Let U, V ⊂ Rn be open. A bijective, smooth function f : U → V with a smooth inverse
f −1 : V → U is called a diffeomorphism. If f and f −1 are both d-times continuously
differentiable for d ≥ 1, we call f a C d -diffeomorphism.

12.2 Manifolds in Rn (F)

12.2.1 Definition of Manifold


Let us introduce some rough intuition. There are three common ways to give a geometric
shape of dimension k in Rn :

• Cartesian. This means describing the set as the set of points that solve n − k given
equations.

• Parametric. This means identifying each point in the set with k numerical parameters,
which lie in some range and don’t necessarily have direct geometric meaning (think about
generalized coordinates in Lagrangian mechanics).

• Graphical. Describe the coordinates of the points in the set prescribing how the last
n − k coordinates depend on the first k coordinates.

Interlude: Linear space in Rn (F)

A set X ⊂ Rn is a k-dimensional linear subspace if one of the following equivalent


conditions holds:


(1) There is a linear map f : Rn → Rn−k , with maximal rank (i.e., with rank n − k),
such that X = {f = 0}. That is, X is described as the set of solutions of a system
of n − k equations.

(2) There is a linear map f : Rk → Rn , with maximal rank (i.e., with rank k), such
that X = f (Rk ). That is, X is parametrised with a linear function of k variables.

(3) Up to reordering the coordinates, X coincides with the graph of a linear map
f : Rk → Rn−k . In other words, (n − k) coordinates of every point in X can
be expressed as a linear function of the other k coordinates.

Let us recall that proving the equivalence of these definitions is not trivial in the linear case,
as one needs essentially to learn how to solve linear equations in order to pass from one
representation of X to the other.
Now we aim to define a suitable concept of k-dimensional surface in Rn , using C 1 maps
rather than linear maps. This is indeed possible, paying the price of working “locally” around
each point. In this framework, the Inverse Function Theorem is the key tool that allows us to
“locally solve non-singular equations”.

Definition 12.6: Manifold


Let 0 < k < n be integers and X ⊂ Rn be a nonempty set. We say that X is a k-
dimensional manifold embedded in Rn if, for every p ∈ X, one of the following
three equivalent conditions holds:

(1) There is f ∈ C 1 (Rn , Rn−k ) such that the matrix Df (p) has maximal rank (i.e., has
rank n − k) and for r > 0 sufficiently small

X ∩ Br (p) = {f = 0} ∩ Br (p).

(2) There is f ∈ C 1 (Rk , Rn ) such that f (0) = p, the matrix Df (0) has maximal rank
(i.e., has rank k) and for s, r > 0 sufficiently small

X ∩ Br (p) = f (Bs ) ∩ Br (p),

where Bs ⊂ Rk .

(3) There is f ∈ C 1 (Rk , Rn−k ) such that for r > 0 sufficiently small

X ∩ Br (p) = P (graph(f )) ∩ Br (p),

where P : Rn → Rn is some permutation of the coordinates.

Finally, we define a 0-dimensional submanifold as a discrete set of points in Rn and a


n-dimensional submanifold as an open subset of Rn .


The fact that these three conditions are equivalent is not obvious and will be proved by
means of the Inverse Function Theorem.
We will prove that (1) =⇒ (3) =⇒ (2) =⇒ (1).
We just record that (3) =⇒ (2) is straightforward since the graph of a function f : Rk →
Rn−k coincides, by definition, with the image of the function f˜: Rk → Rn given by f˜(x) =
(x, f (x)).

Theorem 12.7: (1) =⇒ (3), a.k.a. the Implicit Function Theorem

Let U ⊂ Rn × Rm be open and let F ∈ C 1 (U, Rm ). We write a point in Rn × Rm as


(x, y) with x ∈ Rn and y ∈ Rm . Assume that the m × m matrix

Dy F (x̄, ȳ) := {∂yk Fj (x̄, ȳ)}j,k

is invertible at a certain (x̄, ȳ) ∈ U. Then, for sufficiently small r, s > 0, there is a
function f from B(x̄, r) ⊂ Rn to B(ȳ, s) ⊂ Rm such that, for all (x, y) in the cylinder
B(x̄, r) × B(ȳ, s) ⊂ U , it holds

F (x, y) = F (x̄, ȳ) ⇐⇒ y = f (x),

and
Df (x) = −[(Dy F )(x, f (x))]−1 ◦ (Dx F )(x, f (x)).

Furthermore, if F ∈ C k (U, Rm ) for some k ≥ 1, then also f ∈ C k (Br (x̄), U ).

Proof. Consider the function ϕ ∈ C 1 (U, Rn+m ) given by

ϕ(x, y) := (x, F (x, y)), for all (x, y) ∈ U.

By assumption the matrix Dϕ(x̄, ȳ) — which has size (n + m) × (n + m) — has the block
decomposition

Dϕ(x̄, ȳ) = ( 1n            0           )
            ( Dx F (x̄, ȳ)   Dy F (x̄, ȳ) ),
so in particular it is invertible and we are under the assumptions of the Inverse function
Theorem. Hence ϕ has a smooth inverse when restricted to a small cylinder Br (x̄)×Bs (ȳ) ⊂ U ,
which is mapped to the open set V := ϕ(Br (x̄) × Bs (ȳ)) ⊂ Rn+m .
Let ψ ∈ C 1 (V, Br (x̄) × Bs (ȳ)) be the inverse. Notice that ψ must have the form

ψ(ζ, ξ) = (ζ, g(ζ, ξ)), for all (ζ, ξ) ∈ V,

for some g ∈ C 1 (V, Bs (ȳ)). Now by definition we have

(x, y) = (ψ ◦ ϕ)(x, y) = (x, g(x, F (x, y)) for all (x, y) ∈ Br (x̄) × Bs (ȳ),


thus we set f (x) := g(x, F (x̄, ȳ)), and get

(x, y) ∈ Br (x̄) × Bs (ȳ), F (x, y) = F (x̄, ȳ) =⇒ y = f (x).

For the converse implication we notice that ψ is injective, so we have

(x, y) ∈ Br (x̄) × Bs (ȳ), y = f (x) =⇒ g(x, F (x, y)) = g(x, F (x̄, ȳ)) =⇒ F (x, y) = F (x̄, ȳ).

The formula for Df (x) can be read off the matrix Dψ(ϕ(x, y)). Recall that

( 1   0 )−1     (  1         0   )
( A   B )     = ( −B−1 A    B−1  ),

as can be readily checked. So we have


Dψ(ϕ(x, y)) = ( 1n            0           )−1
              ( Dx F (x, y)   Dy F (x, y) )

            = ( 1n                                0               )
              ( −[Dy F (x, y)]−1 · Dx F (x, y)    [Dy F (x, y)]−1 ).

Hence Df (x) = −[Dy F (x, f (x))]−1 · Dx F (x, f (x)).
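Theorem 12.7 can be illustrated on the unit circle (a standard example, not taken from the text). With n = m = 1 and F (x, y) = x² + y² − 1 we have Dy F (0, 1) = 2 ≠ 0, the local solution is f (x) = √(1 − x²), and the theorem predicts f ′(x) = −(Dy F )−1 Dx F = −x/f (x):

```python
import math

# Implicit Function Theorem on {x^2 + y^2 = 1} near (0, 1):
# the solution set is locally the graph of f(x) = sqrt(1 - x^2),
# and Theorem 12.7 predicts f'(x) = -x / f(x).

def f(x):
    return math.sqrt(1 - x**2)

x = 0.4
h = 1e-6
fd = (f(x + h) - f(x - h)) / (2 * h)  # finite-difference f'(x)
pred = -x / f(x)                       # -(D_y F)^{-1} D_x F at (x, f(x))
print(abs(fd - pred))
```
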

Theorem 12.8: (2) =⇒ (1)

Let f ∈ C 1 (Rk , Rn ) with 0 < k < n be such that Df (0) has rank k. Then there are r > 0
small and g ∈ C 1 (Br (f (0)), Rn−k ) such that Dg(f (0)) has rank n − k and f (Rk ) ∩
Br (f (0)) = {g = 0} ∩ Br (f (0)).

Proof. Pick vectors v1 , . . . , vn−k in Rn and s > 0 small in such a way that

det( Df (z) | v1 | . . . | vn−k ) ̸= 0, for all |z| < s.

Consider the map ϕ ∈ C 1 (Bs × Rn−k , Rn ) given by ϕ(z, w) := f (z) + ∑_{j=1}^{n−k} wj vj , whose
differential at the origin is invertible, being

Dϕ(0) = ( Df (0) | v1 | . . . | vn−k ).

By the Inverse Function Theorem there are functions ζ ∈ C 1 (Br (f (0)), Rk ) and ω ∈ C 1 (Br (f (0)), Rn−k )
such that

x = f (ζ(x)) + ∑_{j=1}^{n−k} ωj (x)vj , for all x ∈ Br (f (0)),

and such a representation of x is unique in that ball. The choice g(x) := ω(x) thus provides
the sought function.


12.2.2 Tangent space to a manifold

Definition 12.9:
Let X ⊂ Rn be a k-dimensional manifold. The tangent space of X at p ∈ X is the
vector subspace of Rn defined as

Tp X := {γ ′ (0) | γ : (−1, 1) → X differentiable with γ(0) = p} ⊂ Rn .

12.10. — In plain terms, the set Tp X is the set of velocity vectors of short paths through
p in X. If we think of elements of Tp X as vectors with base point p, then Tp X corresponds
to a k-dimensional affine subspace of Rn that touches X at the point p. We note that the
choice of (−1, 1) as the domain of differentiable paths is arbitrary; we could just as well have
chosen another open neighborhood of 0 ∈ R, such as an open interval (−δ, δ) for an arbitrarily
small δ > 0.

The following Proposition shows how to practically compute Tp X and that it is indeed a
vector subspace of Rn (which is not transparent from the given definition).

Proposition 12.11:
According to how X is given around p, we equivalently have

(1) Tp X = ker Df (p) if X = {f = 0} in Br (p) for some f ∈ C 1 (Br (p), Rn−k );

(2) Tp X = ran Df (0) if X = f (Bs ) in Br (p) and f (0) = p, for some f ∈ C 1 (Bs , Rn )
with Bs ⊂ Rk ;

(3) Tp X is the tangent plane to the graph of f at p, if X is the graph of f : Rk → Rn−k in Br (p).

In particular, Tp X is a vector subspace of Rn .

Proof. ...

Definition 12.12:
Let X ⊂ Rn be a k-dimensional manifold. The normal space of X at p ∈ X is the
(n − k)-dimensional vector subspace of Rn which is orthogonal to Tp X.
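For a concrete computation, take the unit sphere in R³, written in Cartesian form as X = {f = 0} with f (x) = |x|² − 1. By Proposition 12.11(1), Tp X = ker Df (p), and since Df (p) = 2p (as a row vector), the tangent space at p is the plane orthogonal to p, while p spans the normal space. A minimal numerical sketch (the chosen path is a hypothetical example):

```python
import math

# Tangent space of the unit sphere at p = (0, 0, 1): the velocity of any
# differentiable path on the sphere through p is orthogonal to p.

p = (0.0, 0.0, 1.0)

def gamma(t):
    """A great circle through p, staying on the sphere."""
    return (math.sin(t), 0.0, math.cos(t))

h = 1e-6
v = tuple((gamma(h)[i] - gamma(-h)[i]) / (2 * h) for i in range(3))  # gamma'(0)
dot = sum(v[i] * p[i] for i in range(3))
print(v, dot)  # v is approximately (1, 0, 0); the dot product with p vanishes
```
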

12.2.3 Examples and Non-Examples



Chapter 13

NEW: Multidimensional Integration

13.1 The n-volume

Definition 13.1: Dyadic cubes and their measure

We call any subset of the form Q = 2−k (a + [0, 1)n ), for some a ∈ Zn and k ∈ Z, a
dyadic cube. Given such Q we define its measure µ(Q) = 2−kn .
For a finite disjoint union of dyadic cubes E = ∪i Qi , we define µ(E) = ∑i µ(Qi ).

Proposition 13.2: Properties of µ

The measure µ, defined over finite disjoint unions of dyadic cubes, satisfies the following
properties:

1. Translation invariance: µ(E + τ ) = µ(E) for all τ such that 2k τ ∈ Zn for some
k ∈ N.

2. Additivity: µ(E ∪ F ) = µ(E) + µ(F ) − µ(E ∩ F )

3. Normalization: µ([0, 1)n ) = 1

Proof. ... the key advantage of using dyadic cubes is that they allow us to refine any union or
intersection of dyadic cubes with sufficiently small cubes, all of the same size. At the same
time, we can approximate any real box.

13.3. — Actually, it is not difficult to prove that the properties (1)–(3) above completely
characterize µ.
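The inner/outer approximation by dyadic cubes that the next definition formalizes can be carried out by brute force. A minimal sketch (a hypothetical computation, not from the text): the area π of the unit disc in R² is squeezed between the total measure of the dyadic squares contained in the disc and of those meeting it:

```python
import math

def dyadic_bounds(k):
    """Inner/outer dyadic approximations of the area of the unit disc.

    A square 2^{-k}(a + [0,1)^2) is counted as 'inner' iff all four corners lie
    in the closed disc (equivalent to containment, since the disc is convex),
    and as 'outer' iff the point of the square closest to the origin is in the disc.
    """
    s = 2.0 ** (-k)
    n = 2 ** k
    inner = outer = 0
    for i in range(-n - 1, n + 1):
        for j in range(-n - 1, n + 1):
            x0, y0, x1, y1 = i * s, j * s, (i + 1) * s, (j + 1) * s
            if all(x * x + y * y <= 1 for x in (x0, x1) for y in (y0, y1)):
                inner += 1
            nx = min(max(0.0, x0), x1)  # clamp the origin into the square
            ny = min(max(0.0, y0), y1)
            if nx * nx + ny * ny <= 1:
                outer += 1
    return inner * s * s, outer * s * s

lo, hi = dyadic_bounds(6)
print(lo, math.pi, hi)  # lo < pi < hi, and hi - lo shrinks as k grows
```
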

Chapter 13.1 The n-volume

Definition 13.4: Jordan measurable sets and their measure


Given a set E ⊂ Rn let us define

µin (E) = sup { ∑i voln (Qi ) | Qi ⊂ E pairwise disjoint }

and

µout (E) = inf { ∑i voln (Qi ) | E ⊂ ∪i Qi } .

Here, the supremum and the infimum are taken among finite families of dyadic cubes.
If µin (E) = µout (E), then we say that E is Jordan measurable.
We then denote voln (E) = µin (E) = µout (E).

Proposition 13.5: first properties of voln

The measure voln , defined over all Jordan measurable sets E ⊂ Rn , satisfies the following
properties:

1. Translation invariance: if E is Jordan measurable and τ ∈ Rn , then E + τ is also
Jordan measurable and voln (E + τ ) = voln (E).

2. Additivity: If E, F are Jordan measurable then their intersection and union also
are, and voln (E ∪ F ) = voln (E) + voln (F ) − voln (E ∩ F ).

3. Normalization: voln ([0, 1)n ) = 1

Proposition 13.6: Affine transformations


Let L : Rn → Rn be an invertible linear map. If a set E ⊂ Rn is Jordan measurable,
then so is its image under L, which we denote LE. Moreover, there exists a positive
factor λ(L) such that

voln (LE) = λ(L) voln (E) for all E Jordan measurable.

Lemma 13.7: The factor for special stretchings


Given positive real numbers λ1 , . . . , λn , consider the 'special stretching' (affine
transformation) Sx = (λ1 x1 , . . . , λn xn ). Then λ(S) = ∏_{i=1}^{n} λi .

Lemma 13.8: Balls are Jordan measurable


Balls Br (x) ⊂ Rn are Jordan measurable and voln (Br (x)) = rn voln (B1 )

Proof. Show that the number of cubes in the family {[0, 2−k )n + 2−k T | T ∈ Zn } that intersect
the boundary {|x| = 1} of the unit ball is bounded by C(n)(2−k )1−n . ...

Proposition 13.9: The factor is the determinant


For every invertible linear map L : Rn → Rn we have λ(L) = | det L|


Proof. Using the polar decomposition of a matrix (see...) we have that L = R2 SR1 , where
S is a special stretching and the Ri are orthogonal matrices (i.e., rotations; characterized by
RRT = Id). Since orthogonal matrices leave the unit ball invariant, we have voln (B1 ) = voln (Ri B1 ) =
λ(Ri ) voln (B1 ) and hence λ(Ri ) = 1. On the other hand, λ is clearly multiplicative, that is,
λ(L) = λ(R2 SR1 ) = λ(R2 )λ(S)λ(R1 ) = λ(S) = | det S|.
Since orthogonal matrices have determinant of absolute value one, and the determinant is
multiplicative, we conclude | det S| = | det L| and we are done.
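In R² the identity λ(L) = |det L| can be checked directly on the unit square E = [0, 1)², whose image under L is a parallelogram of area |det L|. A minimal numerical sketch (the matrix is a hypothetical example):

```python
# Check lambda(L) = |det L| in R^2: compute the area of the image of the
# unit square under L with the shoelace formula.

def shoelace(pts):
    """Area of a polygon given its vertices in order."""
    n = len(pts)
    s = 0.0
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2

L = [[2.0, 1.0], [0.5, 3.0]]
det = L[0][0] * L[1][1] - L[0][1] * L[1][0]  # det L = 5.5

def apply_map(L, p):
    return (L[0][0] * p[0] + L[0][1] * p[1], L[1][0] * p[0] + L[1][1] * p[1])

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
image = [apply_map(L, p) for p in square]
print(shoelace(image), abs(det))  # both equal 5.5
```
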

Corollary 13.10: Boxes are Jordan measurable


Any box E = [a1 , b1 ) × [a2 , b2 ) × · · · × [an , bn ), where bi > ai , is Jordan measurable and
voln (E) = (b1 − a1 )(b2 − a2 ) · · · (bn − an ).

Corollary 13.11:
voln is also invariant under rotations.



Chapter 13.3 Measurable Sets and the Integral

13.2 Measurable Sets and the Integral

13.2.1 Measurable sets and Fubini’s Theorem

Definition 13.12: Jordan Measurable Set


A set E ⊂ Rn is measurable if the two numbers

sup{ ∑i vol(Pi ) : {Pi } is a finite disjoint collection of parallelograms contained in E },

inf{ ∑i vol(Pi ) : {Pi } is a finite collection of parallelograms covering E }

coincide. Their common value is vol(E).

Proposition 13.13: Well-Posedness


If P is a parallelogram the symbol vol(P ) is well-defined.

Proof. Is it clear that if P and P ′ are disjoint and P ∪ P ′ ⊂ Q, then vol(P ) + vol(P ′ ) ≤ vol(Q)?

Proposition 13.14: One can use straight Cubes


If E is Jordan measurable then one can use cubes with sides parallel to the axes to find
its volume.

Proof. By induction on the dimension one can show that an ε-tiling of a cube is off by Cn ε,
so sandwiching by cube tilings suffices.

13.2.2 The integral of a continuous function


13.2.3 The change of variables formula

13.3 Computation of Multiple integrals



Chapter 14

OLD: Multidimensional Integration

In this chapter, we will extend the Riemann integral over an interval [a, b] ⊂ R to a mul-
tidimensional Riemann integral over sufficiently nice open subsets of Rn . Again, it suffices
to consider real-valued functions, as the generalization to complex-valued or vector-valued
functions is done component-wise.

14.1 The Riemann Integral for Boxes


In this section, we fix a non-negative integer n and aim to replace intervals as integration
domains with subsets in Rn . We begin with the case of a box.

14.1.1 Definition and Initial Properties


14.1. — A box is a subset Q of Rn that is a product of intervals, i.e.,

Q = I1 × . . . × In

for intervals I1 , . . . , In ⊂ R. If the lengths of the intervals I1 , . . . , In are all equal, we also call
Q a cube. For n = 2, we refer to them as rectangles.

Definition 14.2. — For bounded non-empty intervals I1 , . . . , In , the volume of the box
Q = I1 × · · · × In in Rn is defined as
vol(Q) = ∏_{k=1}^{n} (bk − ak )

with ak = inf Ik and bk = sup Ik .

14.3. — Let Q = I1 × . . . × In be a bounded closed box with Ik = [ak , bk ]. A partition


of Q means specifying a partition for each interval Ik , in the sense of Definition ??. If such a
partition is given, i.e.,

ak = xk,0 < xk,1 < xk,2 < · · · < xk,l(k) = bk (14.1)


Chapter 14.1 The Riemann Integral for Boxes

for each k, then each α = (α1 , α2 , . . . , αn ) ∈ Nn with 1 ≤ αk ≤ l(k) is called an address for
this partition. For each such address, we write
Qα = ∏_{k=1}^{n} [xk,αk −1 , xk,αk ]

for the corresponding closed sub-box of Q. In other words, Qα ⊆ Q ⊆ Rn is the subset of all
(t1 , . . . , tn ) ∈ Rn for which xk,αk −1 ≤ tk ≤ xk,αk holds for all k = 1, 2, . . . , n. The addition
formula
vol(Q) = ∑_α vol(Qα ) (14.2)

is shown using complete induction, where the sum extends over all addresses for the given
partition (14.1). A refinement of the partition (14.1) is a partition

ak = yk,0 ≤ yk,1 ≤ yk,2 ≤ · · · ≤ yk,m(k) = bk k = 1, 2, . . . , n

such that, for each fixed k, the partition ak = yk,0 ≤ · · · ≤ yk,m(k) = bk of Ik is a refinement of
ak = xk,0 ≤ · · · ≤ xk,l(k) = bk , as discussed in ??. Two arbitrary partitions of Q always have
a common refinement.

Figure 14.1: On the left, a partition of a box [a1 , b1 ] × [a2 , b2 ] ⊆ R2 is shown. The horizontally
represented interval [a1 , b1 ] is divided into three parts, and the vertically represented interval
[a2 , b2 ] is divided into two parts. The address α = (3, 2) corresponds to the sub-box Qα =
[x1,2 , x1,3 ] × [x2,1 , x2,2 ]. On the right, a refinement of this partition is shown.

14.4. — Let Q ⊆ Rn be a box. A step function on Q is a bounded function f : Q → R


such that there exists a partition of Q for which, for each address α, the function f is constant on the
open sub-box Q◦α . In this case, we also say that f is a step function with respect to this
partition of Q. If cα is the constant value of f on the open sub-box Q◦α , we write
∫_Q f (x)dx = ∑_α cα vol(Qα ) (14.3)

where the sum extends over all addresses for the given partition. We call this number the
integral of f over Q. In the case n = 1, this brings back the definitions ?? and ?? of step
functions and their integral.
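Formula (14.3) is directly computable. A minimal sketch with a hypothetical partition of Q = [0, 1]² (each side split at 1/2) and hypothetical constant values cα :

```python
# Integral of a step function over Q = [0,1]^2 via formula (14.3):
# sum of (constant value on the open sub-box) * vol(sub-box).

xs = [0.0, 0.5, 1.0]        # partition of [0,1] in the x-direction
ys = [0.0, 0.5, 1.0]        # partition of [0,1] in the y-direction
values = {(1, 1): 1.0,      # c_alpha, indexed by the address alpha = (a1, a2)
          (1, 2): 2.0,
          (2, 1): 3.0,
          (2, 2): 4.0}

integral = 0.0
for (a1, a2), c in values.items():
    vol = (xs[a1] - xs[a1 - 1]) * (ys[a2] - ys[a2 - 1])
    integral += c * vol
print(integral)  # (1 + 2 + 3 + 4) * 0.25 = 2.5
```
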


Exercise 14.5. — Let Q ⊆ Rn be a box. Show that the set of all step functions T F(Q) on
Q forms a vector space with respect to pointwise addition and multiplication. Show that (14.3)
is independent of the choice of partition and thus defines a monotone and linear mapping
∫_Q : T F(Q) → R.

14.6. — Let Q ⊆ Rn be a box, T F denote the vector space of step functions on Q, and
f : Q → R be a function. We define the sets of lower sums U(f ) and upper sums O(f ) of
f as
U(f ) = { ∫_Q u dx | u ∈ T F and u ≤ f },    O(f ) = { ∫_Q o dx | o ∈ T F and f ≤ o }.

If f is bounded, these sets are non-empty. Due to the monotonicity of the integral for step
functions, the inequality
sup U(f ) ≤ inf O(f )

holds if f is bounded.

Definition 14.7. — Let Q ⊆ Rn be a box, and f : Q → R be a bounded function. We


call sup U(f ) the lower, and inf O(f ) the upper integral of f . The function f is called
Riemann integrable if sup U(f ) = inf O(f ). In this case, the common value is referred to
as the Riemann integral of f , and is written as
∫_Q f (x)dx = sup U(f ) = inf O(f )

14.8. — Step functions are Riemann integrable, and the Riemann integral of a step function
is precisely given by (14.3). The following alternative notations for the Riemann integral are
common:

∫_Q f (x1 , x2 , . . . , xn ) dx1 dx2 · · · dxn    and    ∫_Q f (x) dvol

The left-hand notation is useful when f is given as a function of variables x, y, z, . . . whose
combination has no specific name. The right-hand notation indicates that the integral is
defined with respect to the standard volume for boxes in Rn ; alternatives are conceivable and
are the subject of so-called measure theory. The following proposition is a generalization
of Proposition ??.

Proposition 14.9. — Let f : Q → R be bounded. The function f is Riemann integrable if


and only if, for every ε > 0, there are step functions u and o on Q such that
u ≤ f ≤ o    and    ∫_Q (o − u)dx < ε

Proof. The proof is the same as for Proposition ??.


Proposition 14.10. — Let Q ⊆ Rn be a box, and R(Q) denote the set of all Riemann
integrable functions on Q. Then R(Q) is an R-vector space with respect to pointwise addition
and multiplication, and integration
∫_Q : R(Q) → R

is an R-linear map. Integration is also monotonic and satisfies the triangle inequality: It holds
f ≤ g =⇒ ∫_Q f (x)dx ≤ ∫_Q g(x)dx    and    |∫_Q f (x)dx| ≤ ∫_Q |f (x)|dx

for all f, g ∈ R(Q). In particular, |f | is Riemann integrable.

Proof. The proof is analogous to the proof of the corresponding statements for the Riemann
integral on an interval; see Theorem ?? for linearity, Proposition ?? for monotonicity, and
Theorem ?? for the triangle inequality.

Exercise 14.11. — Let Q ⊆ R2 be the box [−1, 1]2 . Calculate the two-dimensional integral
∫_Q (2 − x2 − y 2 ) dxdy

according to the definitions of this section. Then calculate


F (y) = ∫_{−1}^{1} (2 − x2 − y 2 ) dx    and    ∫_{−1}^{1} F (y) dy

and compare the results.
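The comparison requested in the exercise can be previewed numerically (a sketch only; it does not replace the computation from the definitions). Both the two-dimensional Riemann sum and the iterated integral approach 16/3:

```python
# Exercise 14.11 numerically: 2D midpoint Riemann sum of 2 - x^2 - y^2 over
# [-1,1]^2 versus the iterated integral; both converge to 16/3.

def riemann_2d(n):
    """Midpoint Riemann sum on an n x n grid over [-1,1]^2."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = -1 + (i + 0.5) * h
            y = -1 + (j + 0.5) * h
            total += (2 - x * x - y * y) * h * h
    return total

def iterated(n):
    """Inner integral done exactly, F(y) = 10/3 - 2 y^2, then midpoint-sum in y."""
    h = 2.0 / n
    return sum((10.0 / 3 - 2 * (-1 + (j + 0.5) * h) ** 2) * h for j in range(n))

print(riemann_2d(200), iterated(200), 16 / 3)
```
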

Exercise 14.12. — Let Q ⊆ Rn be a closed box and f : Q → R a bounded function.


Assume an arbitrary partition of Q is given. Show that f is Riemann integrable if and only if
for each sub-box Qα of the given partition, the restricted function f |Qα : Qα → R is Riemann
integrable. Show that in this case
∫_Q f dx = ∑_α ∫_{Qα} f |Qα dx

holds.

14.1.2 Null Sets


Definition 14.13. — A subset N ⊆ Rn is called a Lebesgue null set or simply a null
set if, for every ε > 0, there exists a countable family of open boxes (Qk )k∈N in Rn such that

N ⊂ ∪_{k=0}^{∞} Qk    and    ∑_{k=0}^{∞} vol(Qk ) < ε    (14.4)


holds.

14.14. — We interpret the condition in the definition of null sets such that 0 is the only
reasonable value for the volume of a null set N . In fact, (14.4) can be read as stating that N
is contained in a set ∪_{ℓ=1}^{∞} Qℓ whose volume is small. Here, the volume of the countable union
∪_{ℓ=1}^{∞} Qℓ is not defined, but ∑_{ℓ=1}^{∞} vol(Qℓ ) can be regarded as an upper bound for it. We say
that a statement A about elements x ∈ Rn is true almost everywhere if
that a statement A about elements x ∈ Rn is true almost everywhere if

{x ∈ Rn | ¬A(x)}

is a null set. For example, if f : Rn → R is a function, we say f is almost everywhere


continuous if the set of x ∈ Rn at which f is discontinuous is a null set.

Lemma 14.15. — A subset of a null set is a null set. A countable union of null sets is
again a null set.

Proof. The first statement follows directly from (14.4). Let (Nj )j∈N be a countable family of
null sets in Rn , and let N be their union. Let ε > 0. Then, by the definition of null sets, for
each j ∈ N, there exists a countable family of open boxes (Qjk )k∈N such that

Nj ⊂ ∪_{k=0}^{∞} Qjk    and    ∑_{k=0}^{∞} vol(Qjk ) < ε/2^{j+1} .

This implies
N = ∪_{j=0}^{∞} Nj ⊂ ∪_{j=0}^{∞} ∪_{k=0}^{∞} Qjk    and    ∑_{j=0}^{∞} ∑_{k=0}^{∞} vol(Qjk ) < ∑_{j=0}^{∞} ε/2^{j+1} = ε.

Since N × N is countable, and ε > 0 was arbitrary, this shows that N is a null set.

Example 14.16. — Every countable subset in Rn is a null set. In particular, for example,
Qn ⊆ Rn is a null set, since Qn is countable. Every box with an empty interior is a null set.
Every linear subspace V ⊊ Rn is a null set in Rn . In Proposition 14.18, we will show that an
open subset U ⊆ Rn is a null set if and only if U is empty.

Exercise 14.17. — Let N ⊆ Rn be a null set, and α : Rn → Rn be a linear map. Show


that α(N ) is a null set.

Proposition 14.18. — A set X ⊆ Rn with a non-empty interior is not a null set.

Proof. If the interior of X is not empty, then X contains a closed box with a non-empty
interior. Let Q = [a1 , b1 ] × · · · × [an , bn ] be such a box, a1 < b1 , . . . , an < bn . It suffices to
show that Q is not a null set. If Q were a null set, there would exist open boxes O1 , O2 , . . .


with

Q ⊂ ∪_{k=1}^{∞} Ok    and    ∑_{k=1}^{∞} vol(Ok ) < vol(Q)/2 .

By the Heine-Borel Theorem ?? and Theorem ??, Q is compact, so there exists an m ∈ N


such that
Q ⊂ ∪_{k=1}^{m} Ok    and    ∑_{k=1}^{m} vol(Ok ) < vol(Q)/2 .    (14.5)

We define the boxes Qk = Ok ∩ Q and write Qk = [ak,1 , bk,1 ] × . . . × [ak,n , bk,n ] for all k ∈
{1, . . . , m}. In particular, Q = ∪_{k=1}^{m} Qk .

For each j ∈ {1, . . . , n}, we can define a partition of [aj , bj ] by arranging the points
{a1,j , b1,j , . . . , am,j , bm,j }. This gives us a partition of Q such that each closed sub-box Qk ⊆ Q
is a finite union of boxes Qα , where α are addresses in this partition of Q. It follows that
vol(Q) = ∑_α vol(Qα )    and    vol(Qk ) = ∑_{α : Qα ⊆Qk} vol(Qα )

which, after summing over k, results in the inequality


vol(Q) ≤ ∑_{k=1}^{m} vol(Qk ),

which contradicts (14.5).

Figure 14.2: After reducing to a finite cover of the cube Q, it is much easier to show that the
total volume of the covering cubes is at least as large as the volume of Q. In fact, for the
latter, a partition is fabricated from Q1 , . . . , Qm as illustrated, and then the addition formula
(??) is applied.

Exercise 14.19. — Show that a subset N ⊆ Rn is a null set if and only if, for every ε > 0,
there exists a family (Qℓ )ℓ∈N of closed cubes with

N ⊂ ∪_{ℓ=1}^{∞} Qℓ    and    ∑_{ℓ=1}^{∞} vol(Qℓ ) < ε.

Proposition 14.20. — Let Q ⊂ Rn−1 be a closed cube, and f : Q → R be Riemann


integrable. Then, the graph {(x, f (x)) | x ∈ Q} ⊆ Rn of f is a null set.


Proof. Let ε > 0. Since f is Riemann integrable, there exist step functions u
and o on Q with u ≤ f ≤ o and ∫_Q (o − u)dx < ε. We choose a partition of Q such that for each

address α, both functions u and o are constant on Q◦α . Let cα be the constant value of u, and
dα be the constant value of o on Q◦α . For each address α, we define the box Pα = Qα × [cα , dα ]
and obtain
graph(f ) ⊆ ( ∪_α Pα ) ∪ ( ∪_α ∂Qα × R )

where the second union corresponds to the grid of the partition of Q. It is a finite union of
axis-parallel hyperplanes in Rn , which we can cover with countably many cubes with volume
0. It holds

∑_α vol(Pα ) = ∑_α (dα − cα ) vol(Qα ) = ∫_Q (o − u)dx < ε,

showing that graph(f ) can be covered by countably many cubes with a sum of volumes less
than ε. Since ε > 0 was arbitrary, the proposition follows.

Exercise 14.21. — Let U ⊆ Rn be open, and let Φ : U → Rn be a Lipschitz map. Show


that the image of a Lebesgue null set under Φ is again a Lebesgue null set.

Exercise 14.22. — Let n ≥ 1, k ∈ {1, . . . , n − 1}, Q ⊆ Rn−k be a closed cube, and f be a


continuous function on Q with values in Rk . Show that graph(f ) ⊆ Rn is a null set.

Exercise 14.23. — Use the previous exercise to show that every k-dimensional submanifold
of Rn for k < n is a null set in Rn .

Exercise 14.24. — Given is a rectangle Q ⊆ R2 , which has a finite cover with rectangles
Q1 , . . . , Qn ⊂ Q that intersect only along their edges.

Assume that each of the rectangles Q1 , . . . , Qn has at least one edge of integer length. Show
that then Q also has at least one edge of integer length.

14.1.3 The Lebesgue Criterion


Theorem 14.25 (Lebesgue Criterion). — A real-valued, bounded function on a closed cube
Q is Riemann integrable if and only if it is continuous almost everywhere, i.e., if the set

N = {x ∈ Q | f is discontinuous at x}

is a null set.

Corollary 14.26. — Let Q be a closed cube with non-empty interior. Then, every con-
tinuous function f : Q → R is Riemann integrable.

Proof. This follows directly from the Lebesgue Criterion. Since Q is compact, all continuous
functions on Q are bounded.

14.27. — In Definition ??, the oscillation ω(f, x) of f : Q → R was introduced, describing


the fluctuation of f around a point x ∈ Q. For δ > 0, we define

ω(f, x, δ) = sup f (Q ∩ B∞ (x, δ)) − inf f (Q ∩ B∞ (x, δ)),

where B∞ (x, δ) denotes the ball with respect to the supremum norm. Such a ball is an open,
axis-parallel cube with center x and side length 2δ. The oscillation ω(f, x) is defined as the
limit, independent of the shape of the balls, of ω(f, x, δ) as δ → 0.
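The quantity ω(f, x, δ) can be approximated by sampling. A minimal sketch with a hypothetical one-dimensional jump function, showing that the oscillation is 1 at the discontinuity and 0 nearby:

```python
# Oscillation of f = 1_{[0, inf)} on R: omega(f, x, delta) is the sup minus
# the inf of f over the interval (x - delta, x + delta), approximated on a grid.

def f(x):
    return 1.0 if x >= 0 else 0.0

def osc(x, delta, samples=1000):
    vals = [f(x - delta + 2 * delta * i / samples) for i in range(samples + 1)]
    return max(vals) - min(vals)

print(osc(0.0, 1e-3), osc(0.5, 1e-3))  # 1.0 at the jump, 0.0 away from it
```
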

Lemma 14.28. — Let f : X → R be a bounded function on a metric space X. For every


η ≥ 0, the subset Nη = {x ∈ X | ω(f, x) ≥ η} ⊆ X is closed.

Proof. Let η ≥ 0, and let (xk )k∈N be a convergent sequence in X with ω(f, xk ) ≥ η for all
k ∈ N and with limit x ∈ X. Take δ > 0 arbitrarily. Then, there exists a k such that
xk ∈ B(x, δ). Since B(x, δ) is open, there is a δk > 0 with B(xk , δk ) ⊆ B(x, δ). From

sup f (B(x, δ)) ≥ sup f (B(xk , δk )), inf f (B(x, δ)) ≤ inf f (B(xk , δk )),

it follows ω(f, x, δ) ≥ ω(f, xk , δk ). With ω(f, xk , δk ) ≥ ω(f, xk ) ≥ η, we obtain ω(f, x, δ) ≥ η,


and since δ > 0 was arbitrary, the lemma follows.

Exercise 14.29. — Let f : X → R be a bounded function on a metric space X. Is the set


{x ∈ X | f is discontinuous at x} closed?

Proof of Theorem 14.25. First, we assume that the bounded real-valued function f on Q is
Riemann-integrable. Let η > 0 and ε > 0 be arbitrary. Then, according to Proposition 14.9,
there exist step functions u and o on Q with u ≤ f ≤ o and ∫_Q (o − u)dx < εη. We choose a

partition of Q so that for each address α, the function u is constant on Q◦α with value cα , and
o is constant on Q◦α with value dα . It holds
∑_α (dα − cα ) vol(Qα ) < εη.


We define the set of addresses A(η) = {α | (dα − cα ) ≥ η} and obtain


∑_{α∈A(η)} vol(Qα ) ≤ η −1 ∑_{α∈A(η)} (dα − cα ) vol(Qα ) < ε.

Now we consider the closed set Nη = {x ∈ Q | ω(f, x) ≥ η} from Lemma 14.28. For each
α ∉ A(η) and x ∈ Q◦α , there exists a δ > 0 such that B(x, δ) is contained in Q◦α . Therefore,

ω(f, x) ≤ ω(f, x, δ) ≤ sup f (Q◦α ) − inf f (Q◦α ) ≤ dα − cα < η

for all α ∉ A(η) and x ∈ Q◦α . Every element x ∈ Nη is either an element of ∂Qα for an address
/ A(η) and x ∈ Q◦α . Every element x ∈ Nη is either an element of ∂Qα for an address
α, or an element of Q◦α for an address α ∈ A(η). From
Nη ⊂ ( ∪_α ∂Qα ) ∪ ( ∪_{α∈A(η)} Q◦α )    and    ∑_{α∈A(η)} vol(Qα ) < ε

follows that Nη is a null set. From Lemma ??, it follows that


N = {x ∈ Q | f is discontinuous at x} = {x ∈ Q | ω(f, x) > 0} = ∪_{k=0}^{∞} N_{2^{−k}}

is a null set. This establishes one direction of the Lebesgue Criterion.


To prove the reverse direction, assume that f : Q → R is bounded, and N = {x ∈
Q | f is discontinuous at x} is a null set. Write M = sup |f (Q)| and let ε > 0. The set
Nε = {x ∈ Q | ω(f, x) ≥ ε} ⊆ N is, according to Lemma ??, a null set, closed according to
Lemma 14.28, and compact according to the Heine-Borel Theorem ??. Therefore, there exist
finitely many open cubes O1 , . . . , Om such that
Nε ⊆ ∪_{ℓ=1}^{m} Oℓ    and    ∑_{ℓ=1}^{m} vol(Oℓ ) < ε.    (14.6)

The subset K = Q \ ∪_{ℓ=1}^{m} Oℓ is compact according to the Heine-Borel Theorem. By construction,
K ⊆ Q \ Nε , and thus ω(f, x) < ε for all x ∈ K. According to Proposition ??, there
exists a δ > 0 such that for all x ∈ K, the estimate

ω( f |K , x, δ) < 2ε    (14.7)

holds. We now choose a partition of Q so that the mesh size is smaller than δ, and such that
each of the cubes Ok ∩ Q is a union of closed cubes Qα . We define step functions u and o with
u ≤ f ≤ o by
 
u(x) = inf f (Q◦α ) if x ∈ Q◦α for some address α, and u(x) = f (x) otherwise;
o(x) = sup f (Q◦α ) if x ∈ Q◦α for some address α, and o(x) = f (x) otherwise,

and want to show that Q (o − u)dx is small. To do this, we separate the sum corresponding
R

to (??) into two parts. In the first part, we sum over addresses α such that Qα is part of a


cube Oℓ , and in the second part, we sum over the remaining addresses α. On the first sum,
we then apply (14.6), and on the second sum, we apply (14.7). This results in
$$\int_Q (o-u)\,dx \le \sum_{\ell=1}^{m} \sum_{\alpha:\, Q_\alpha \subseteq O_\ell} 2M \operatorname{vol}(Q_\alpha) + \sum_{\alpha:\, Q_\alpha \subseteq K} 2\varepsilon \operatorname{vol}(Q_\alpha) \le 2M\varepsilon + 2\varepsilon \operatorname{vol}(Q) = 2(M + \operatorname{vol}(Q))\varepsilon,$$

and since ε > 0 was arbitrary, this implies the Riemann integrability of f .

14.1.4 Riemann Integrability and Continuity


Continuous functions on rectangles are integrable, as stated in Corollary 14.26, and are in
many ways simpler to understand than general Riemann-integrable functions. In this section,
we show that Riemann-integrable functions can be sandwiched between continuous functions.
This will later allow us to reduce some statements about Riemann-integrable functions to
statements about continuous functions.

Proposition 14.30. — Let Q ⊆ Rn be a closed rectangle. A bounded function f : Q → R


is Riemann-integrable if and only if, for every ε > 0, there exist continuous functions f−, f+ : Q → R satisfying
$$f_- \le f \le f_+ \qquad\text{and}\qquad \int_Q (f_+ - f_-)\,dx < \varepsilon. \qquad (14.8)$$

Proof. Suppose there exist continuous functions f− and f+ on Q satisfying (14.8) for ε > 0. Since f− and f+ are Riemann-integrable according to Corollary 14.26, there exist step functions u and o with u ≤ f− and $\int_Q (f_- - u)\,dx < \varepsilon$, as well as f+ ≤ o and $\int_Q (o - f_+)\,dx < \varepsilon$. It follows that u ≤ f ≤ o and
$$\int_Q (o-u)\,dx = \int_Q (o - f_+)\,dx + \int_Q (f_+ - f_-)\,dx + \int_Q (f_- - u)\,dx < 3\varepsilon.$$
Since ε > 0 was arbitrary, this implies the Riemann integrability of f from Proposition 14.9.
For the reverse direction, we proceed step by step, making progressively weaker assumptions
about the functions f .
Case 1: f = 1Q′ for a closed rectangle Q′ ⊆ Q. For δ > 0, consider the functions f+ and
f− on Q, given by

f+ (x) = 1 − min{1, δ −1 inf{∥y − x∥ | y ∈ Q′ }},


f− (x) = min{1, δ −1 inf{∥y − x∥ | y ∈ Q \ Q′ }}.

The functions f+ and f− are both continuous, and it holds f− ≤ 1Q′ ≤ f+. Indeed, the function f+ is constant with value 1 on Q′ and constant with value 0 outside a δ-neighborhood of Q′. The function f− is constant with value 0 on Q \ Q′ and constant with value 1 outside a δ-neighborhood of Q \ Q′. In particular, f−(x) = 1Q′(x) = f+(x) for all x


outside a δ-neighborhood of ∂Q′ , which implies the estimate for the integral in (14.8) as long
as δ is chosen small enough.
Case 2: f is a step function. We choose a partition of Q that suits f . By duplicating each
partition point, we can refine the partition so that for each subrectangle Qα , each side face
is again a subrectangle in the partition. Let N be the number of rectangles in this partition,
and let M ≥ 0 with −M ≤ f (x) ≤ M for all x ∈ Q. According to the previous case, for each
α there exist continuous functions fα,− ≤ 1Qα ≤ fα,+ with
$$\int_Q (f_{\alpha,+} - f_{\alpha,-})\,dx \le \frac{\varepsilon}{2MN}.$$
If the interior of Qα is empty, then $\int_Q 1_{Q_\alpha}\,dx = 0$, and thus
$$-\frac{\varepsilon}{2MN} < \int_Q f_{\alpha,-}\,dx \qquad\text{and}\qquad \int_Q f_{\alpha,+}\,dx < \frac{\varepsilon}{2MN}$$
according to the reverse triangle inequality. We consider the continuous functions


$$f_- = \sum_{\alpha:\, c_\alpha \ge 0} c_\alpha f_{\alpha,-} + \sum_{\alpha:\, c_\alpha < 0} c_\alpha f_{\alpha,+} - M \sum_{\beta} f_{\beta,+}, \qquad f_+ = \sum_{\alpha:\, c_\alpha \ge 0} c_\alpha f_{\alpha,+} + \sum_{\alpha:\, c_\alpha < 0} c_\alpha f_{\alpha,-} + M \sum_{\beta} f_{\beta,+}$$
where the sums over α run over all addresses α with Q◦α ̸= ∅, cα denotes the constant value of f on Q◦α, and the sums over β run over all addresses β with Q◦β = ∅. It holds f− ≤ f ≤ f+ by construction, and
$$\int_Q (f_+ - f_-)\,dx = \sum_{\alpha} |c_\alpha| \int_Q (f_{\alpha,+} - f_{\alpha,-})\,dx + 2M \sum_{\beta} \int_Q f_{\beta,+}\,dx < \sum_{\alpha} \frac{|c_\alpha|\,\varepsilon}{2MN} + 2M \sum_{\beta} \frac{\varepsilon}{2MN} \le \varepsilon,$$

which was to be shown.


Case 3: f is Riemann-integrable. Let ε > 0 and let u ≤ f ≤ o be step functions with $\int_Q (o-u)\,dx < \varepsilon$. We have already shown that there exist continuous functions u− ≤ u ≤ u+ and o− ≤ o ≤ o+ with $\int_Q (u_+ - u_-)\,dx \le \varepsilon$ and $\int_Q (o_+ - o_-)\,dx \le \varepsilon$. It follows that u− ≤ f ≤ o+ and
$$\int_Q (o_+ - u_-)\,dx \le \varepsilon + \int_Q (o_+ - o + u - u_-)\,dx \le \varepsilon + \int_Q (o_+ - o_-)\,dx + \int_Q (u_+ - u_-)\,dx \le 3\varepsilon,$$
which proves the statement, as ε > 0 was arbitrary.



Chapter 14.2 The Riemann Integral over Jordan-Measurable Sets

14.2 The Riemann Integral over Jordan-Measurable Sets


We now turn away from integration over axis-parallel rectangles and aim to consider more
general sets, such as the circular disk in R2 .

14.2.1 Jordan Measurability


Definition 14.31. — A subset B of Rn is called Jordan-measurable if there exists a closed rectangle Q in Rn with Q ⊇ B, such that the characteristic function 1B on Q is Riemann-integrable. The volume or Jordan measure of B is defined in this case by
$$\operatorname{vol}(B) = \int_Q 1_B\,dx.$$

Corollary 14.32 (Lebesgue’s Criterion). — A subset B ⊂ Rn is Jordan-measurable if and


only if B is bounded and the boundary ∂B is a null set. If B1 , B2 ⊂ Rn are Jordan-measurable,
then B1 ∪ B2 , B1 ∩ B2 , and B1 \ B2 are also Jordan-measurable.

Proof. The first statement follows from Lebesgue’s Criterion and the equality

∂B = {x ∈ Rn | 1B is not continuous at x} (14.9)

which can be verified directly. Now, let B1 , B2 ⊂ Q be Jordan-measurable. It holds

1B1 ∩B2 = 1B1 · 1B2 , 1B1 ∪B2 = 1B1 + 1B2 − 1B1 ∩B2 , 1B1 \B2 = 1B1 − 1B1 · 1B2

and, according to Equation (14.9), the boundaries ∂(B1 ∪ B2 ), ∂(B1 ∩ B2 ), and ∂(B1 \ B2 ) are
contained in the union ∂B1 ∪ ∂B2 and thus null sets. Again, by Lebesgue’s Criterion, B1 ∪ B2 ,
B1 ∩ B2 , and B1 \ B2 are Jordan-measurable.

Proposition 14.33. — Let Q ⊂ Rn−1 be a closed rectangle, f− , f+ : Q → R Riemann


integrable, and let D ⊆ Q be Jordan-measurable. Then, the set

B = {(x, y) ∈ Rn | x ∈ D, f− (x) ≤ y ≤ f+ (x)}

is Jordan-measurable.

Proof. The functions f− and f+ are bounded, so there exists M ≥ 0 such that −M < f− (x) <
M and −M < f+ (x) < M for all x ∈ Q. Let N− ⊆ D and N+ ⊆ D be the set of discontinuity
points of f− and f+ , respectively, and write N = ∂D ∪ N− ∪ N+ . The set B is bounded since
it is contained in Q × [−M, M ]. The boundary of B is contained in the union

N × [−M, M ] ∪ graph(f− ) ∪ graph(f+ )


By Proposition 14.20, graph(f− ) and graph(f+ ) are null sets. According to Lebesgue’s
Criterion, N as the union of three null sets is also a null set. Therefore, N × [−M, M ] is
a null set. In fact, if {Qk | k ∈ N} is a countable cover of N by open rectangles, then {Qk × [−M, M] | k ∈ N} is a countable cover by open rectangles of N × [−M, M], and from
$$\sum_{k=0}^{\infty} \operatorname{vol}(Q_k) < \frac{\varepsilon}{2M} \implies \sum_{k=0}^{\infty} \operatorname{vol}\big(Q_k \times [-M,M]\big) < \varepsilon$$

the claim follows. This implies that ∂B is a null set, so B is Jordan-measurable.

Exercise 14.34. — Show that for a null set N ⊆ Rn−1 , N ×R is also a null set in Rn . Show
that for a Jordan-measurable set D ⊆ Rn−1 and any a < b, D × [a, b] is a Jordan-measurable
subset of Rn .

Definition 14.35. — Let B ⊆ Rn be a Jordan-measurable subset, and let f be a real-


valued function on B. Then, f is called Riemann-integrable if there exists a closed rectangle
Q ⊆ Rn with B ⊆ Q, such that the function given by

$$f_!(x) = \begin{cases} f(x) & \text{if } x \in B \\ 0 & \text{if } x \in Q \setminus B \end{cases}$$
is Riemann-integrable on Q. In this case, we write
$$\int_B f\,dx = \int_Q f_!\,dx$$

and call this number the Riemann integral of f over B.

14.36. — Let B ⊆ Rn be a Jordan-measurable subset, and let f : Q → R be a Riemann-


integrable function. The integral of f over B, as defined in ??, depends a priori on the choice
of a closed box Q containing the set B. The independence of the choice of the box is shown
as follows: Suppose Q1 , Q2 ⊂ Rn are two closed boxes with B ⊂ Q1 and B ⊂ Q2 . Then,
Q = Q1 ∩ Q2 is also a box with B ⊆ Q.


The integral of the function f! extended by the constant value 0 over Q1 is equal to the
corresponding integral over Q, as becomes evident when considering a partition of Q1 for
which Q ⊆ Q1 is a sub-box, as shown in Exercise ??.

Exercise 14.37. — Let a ∈ Rn and λ ∈ R. Show that for every Jordan-measurable subset B ⊆ Rn, the subset a + λB = {a + λb | b ∈ B} is Jordan-measurable with volume
$$\operatorname{vol}(a + \lambda B) = |\lambda|^n \operatorname{vol}(B).$$
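The scaling identity can be sanity-checked numerically with a midpoint grid count. The following sketch uses illustrative choices (B the unit disk in R², a = (1, 2), λ = 2) that are not part of the exercise itself.

```python
import math

def grid_area(inside, lo, hi, n):
    """Midpoint-grid approximation of the area of {(x, y) in [lo, hi]^2 : inside(x, y)}."""
    h = (hi - lo) / n
    count = 0
    for i in range(n):
        x = lo + (i + 0.5) * h
        for j in range(n):
            if inside(x, lo + (j + 0.5) * h):
                count += 1
    return count * h * h

# illustrative choices: B = unit disk, a = (1, 2), lambda = 2
ax, ay, lam = 1.0, 2.0, 2.0

def in_B(x, y):
    return x * x + y * y <= 1.0

def in_image(x, y):          # membership in a + lambda * B
    return in_B((x - ax) / lam, (y - ay) / lam)

vol_B = grid_area(in_B, -1.1, 1.1, 900)            # about pi
vol_image = grid_area(in_image, -1.1, 4.1, 1000)   # about lambda^2 * pi
```

The grid error is of the order of the boundary length times the cell size, so the two counts match the predicted factor |λ|² up to a small tolerance.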

Corollary 14.38 (Lebesgue’s Criterion). — Let B ⊆ Rn be Jordan-measurable, and let


f : B → R be bounded. Then, f is Riemann-integrable if and only if f is almost everywhere
continuous on B, meaning the set of points of discontinuity of f is a null set. In particular,
every bounded continuous function on a Jordan-measurable set is Riemann-integrable.

Proof. Let Q be a closed box with B ⊆ Q. Since, by assumption, B is Jordan-measurable, 1B is Riemann-integrable on Q, and the set ∂B of points of discontinuity of 1B is a null set according to Lebesgue's Criterion in Theorem ??. Define f! : Q → R by
$$f_!(x) = \begin{cases} f(x) & \text{if } x \in B \\ 0 & \text{if } x \in Q \setminus B \end{cases}$$

then

{x ∈ Q | f! is discontinuous at x} ⊆ ∂B ∪ {x ∈ B | f is discontinuous at x}.

The corollary follows from Lebesgue’s Criterion in Theorem ??.

Corollary 14.39. — Let B ⊆ Rn be a Jordan-measurable subset. Linearity of the Riemann


integral, monotonicity, and the triangle inequality as in Proposition ?? hold analogously for
the Riemann integral of Riemann-integrable functions on B.

Proof. All statements follow directly by applying the propositions in Section 14.1.1 to f! as in
Definition ??.

Proposition 14.40. — Let B1 ⊆ Rn and B2 ⊆ Rn be Jordan-measurable, and let f :


B1 ∪ B2 → R be a Riemann-integrable function. Then, f |B1 is Riemann-integrable, and it


holds
$$\int_{B_1 \cup B_2} f\,dx = \int_{B_1} f\,dx + \int_{B_2} f\,dx - \int_{B_1 \cap B_2} f\,dx.$$

Proof. Integrability of the restriction of f to B1, B2, or B1 ∩ B2 follows from Lebesgue's Criterion. We have 1B1∪B2 = 1B1 + 1B2 − 1B1∩B2. The proposition follows by multiplying with f and using the linearity of the Riemann integral in Proposition ??.

Exercise 14.41. — A subset J ⊂ Rn is called a Jordan null set if, for every ϵ > 0, there
exists a finite family Q1 , . . . , Qm ⊂ Rn of open boxes such that
$$J \subseteq \bigcup_{k=1}^{m} Q_k, \qquad \sum_{k=1}^{m} \operatorname{vol}(Q_k) < \epsilon.$$

Show the equivalence of the following statements for a subset J ⊆ Rn :

1. J is a Jordan null set.

2. J is Jordan-measurable, and vol(J) = 0.

3. J is a bounded Lebesgue null set.

Exercise 14.42. — Let J be a Jordan null set, and let f : J → R be a bounded function. Show that f is Riemann-integrable, and $\int_J f\,dx = 0$.

Exercise 14.43. — Let Q be a closed box, and let f : Q → R be a Riemann-integrable function with f ≥ 0 and $\int_Q f\,dx = 0$. Show, in analogy to Exercise ??, that f = 0 almost everywhere. Formulate and prove the analogous statement for functions on Jordan-measurable subsets.

14.44. — Let Q be a closed box with non-empty interior, and R(Q) be the vector space
of Riemann-integrable functions on Q. Then, the expression
$$\|\cdot\|_1 : \mathcal{R}(Q) \to \mathbb{R}, \qquad \|f\|_1 = \int_Q |f|\,dx$$

defines a so-called seminorm on R(Q). In fact, ∥ · ∥1 satisfies all properties of a norm from Definition ??, except definiteness. Instead, if f ∈ R(Q) satisfies ∥f∥1 = 0, then by Exercise 14.43, f is almost everywhere equal to zero. The set

N (Q) = {f ∈ R(Q) | ∥f ∥1 = 0}

is a linear subspace of R(Q), and the seminorm ∥ · ∥1 can be interpreted as a norm on the quotient space R(Q)/N (Q). The resulting normed vector space is not complete and thus not as useful for advanced analysis. However, the completion of this and similarly constructed spaces plays a crucial role in measure theory, where such spaces are studied extensively.


14.2.2 The Fubini’s Theorem


So far, we have not seen a general method for the explicit calculation of multidimensional integrals. We address this in this section with the discussion of Fubini's theorem, named after Guido Fubini (1879–1943). This theorem states that an integral $\int_Q f\,dx$ over a box can be calculated as an iterated parameter integral, as hinted at in Exercise ??. We use the notation introduced in ?? for lower and upper sums U(f), O(f) for bounded functions f on a box Q.

Theorem 14.45 (Fubini). — Let P ⊂ Rn and Q ⊂ Rm be closed boxes, and f : P ×Q → R be


a Riemann-integrable function. For x ∈ P , define fx : Q → R as the function fx (y) = f (x, y),
and define
F− (x) = sup U(fx ) and F+ (x) = inf O(fx )

There exists a null set N ⊆ P such that for all x ∉ N, the function fx is Riemann-integrable, and thus
$$F_-(x) = F_+(x) = \int_Q f_x(y)\,dy = \int_Q f(x,y)\,dy$$
holds. The functions F− and F+ on P are both Riemann-integrable, and it holds
$$\int_{P\times Q} f(x,y)\,d(x,y) = \int_P F_-(x)\,dx = \int_P F_+(x)\,dx.$$

Proof. A partition of the box P × Q is given by a partition of P and a partition of Q. If


Pα ⊆ P and Qβ ⊆ Q are sub-boxes with respect to partitions of P and Q, then Pα × Qβ is a
sub-box with respect to the corresponding partition of P × Q, formally with address (α, β).
We will use this correspondence implicitly several times in the proof. Furthermore, according
to the definition of the volume of boxes, we have

vol(Pα × Qβ ) = vol(Pα ) · vol(Qβ ) (14.10)

where we do not distinguish in the notation between volumes of boxes in Rn+m , Rn , or Rm .


Let h : P ×Q → R be a step function with constant value cα,β on Pα◦ ×Q◦β for a corresponding
partition. For each address α and each fixed x ∈ Pα◦ , hx : y 7→ h(x, y) is then a step function
on Q with respect to the given partition of Q, and
$$\int_Q h_x(y)\,dy = \sum_{\beta} c_{\alpha,\beta} \operatorname{vol}(Q_\beta)$$

is independent of x ∈ Pα◦ . It follows that the functions defined by

H− (x) = sup U(hx ) and H+ (x) = inf O(hx ) (14.11)

are both step functions on P with respect to the given partition of P . It holds
$$\int_{P\times Q} h(x,y)\,d(x,y) = \int_P H_-(x)\,dx = \int_P H_+(x)\,dx \qquad (14.12)$$


due to (14.10).
With these preparations, we now consider a function F : P → R with F− ≤ F ≤ F+, choose ϵ > 0 and step functions u and o on P × Q with u ≤ f ≤ o and $\int_{P\times Q} (o-u)\,d(x,y) < \epsilon$. Defining step functions U− and O+ according to (14.11), it follows
$$U_- \le F \le O_+ \qquad\text{and}\qquad \int_P (O_+ - U_-)\,dx = \int_{P\times Q} (o-u)\,d(x,y) < \epsilon$$

from elementary properties of supremum and infimum as well as (14.12). This shows that F
is Riemann-integrable, and also that
$$\int_P F(x)\,dx = \int_{P\times Q} f(x,y)\,d(x,y)$$

holds. Choosing F = F− or F = F+ , the second statement of Fubini’s theorem is obtained.


It remains to show that the function fx : Q → R is Riemann-integrable for almost all x ∈ P, meaning that F+(x) = F−(x) holds for almost all x ∈ P. We know that g = F+ − F− is a non-negative, Riemann-integrable function on P with integral $\int_P g\,dx = 0$. According to the Lebesgue Criterion, the set of discontinuity points N ⊆ P of g is a null set. If g is continuous at x0 ∈ P, then g(x0) = 0 must hold; otherwise, there would be a δ > 0 such that g(x) ≥ ½g(x0) for all x ∈ B(x0, δ), leading to the contradiction
$$0 = \int_P g\,dx \ge \tfrac{1}{2} g(x_0)\operatorname{vol}(B(x_0,\delta)) > 0.$$

It follows that g(x) > 0 =⇒ x ∈ N , and therefore

fx not integrable ⇐⇒ g(x) > 0 =⇒ x ∈ N

concluding the proof of the theorem.

Corollary 14.46. — Let Q = [a1 , b1 ] × · · · × [an , bn ] be a box, and let f : Q → R be


Riemann-integrable. Then,
$$\int_Q f\,dx = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f(x_1, \ldots, x_n)\,dx_n \cdots dx_1$$

if all parameter integrals exist. Otherwise, the parameter integrals can be replaced by suprema
of lower sums or infima of upper sums.

Proof. This follows by repeatedly applying Fubini’s theorem.
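Corollary 14.46 can be illustrated numerically: for the illustrative choice f(x, y) = x y² on [0, 1] × [0, 2], both iteration orders approximate the same value 4/3.

```python
def iterated(f, a1, b1, a2, b2, n=400):
    """Midpoint approximation of the iterated integral of f over [a1, b1] x [a2, b2],
    integrating the second variable on the inside."""
    hx, hy = (b1 - a1) / n, (b2 - a2) / n
    return sum(f(a1 + (i + 0.5) * hx, a2 + (j + 0.5) * hy)
               for i in range(n) for j in range(n)) * hx * hy

f = lambda x, y: x * y * y
I_xy = iterated(f, 0.0, 1.0, 0.0, 2.0)                      # dy inside, dx outside
I_yx = iterated(lambda y, x: f(x, y), 0.0, 2.0, 0.0, 1.0)   # swapped order
# both approximate 4/3
```

The two sums run over exactly the same grid values, so they agree to machine precision, in line with the theorem.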

Corollary 14.47. — Let D ⊆ Rn−1 be a Jordan-measurable set, let φ− , φ+ : D → R be


Riemann-integrable with φ− ≤ φ+ , and let B ⊆ Rn be the Jordan-measurable subset

B = {(x, y) ∈ D × R | φ− (x) ≤ y ≤ φ+ (x)}.


For any Riemann-integrable function f on B, it holds


$$\int_B f(x,y)\,d(x,y) = \int_D \left( \int_{\varphi_-(x)}^{\varphi_+(x)} f(x,y)\,dy \right) dx,$$

where the same complications as in Theorem 14.45 may arise.

Proof. According to Proposition 14.33, B is indeed Jordan-measurable. Applying Fubini's theorem to a box P ⊆ Rn−1 with D ⊆ P and the interval Q = [inf φ−(D), sup φ+(D)], and denoting f! : P × Q → R as the zero-extension of f, we get
$$\int_B f(x,y)\,d(x,y) = \int_{P\times Q} f_!\,d(x,y) = \int_P \int_Q f_!\,dy\,dx = \int_D \int_{\varphi_-(x)}^{\varphi_+(x)} f(x,y)\,dy\,dx$$

as claimed.

Example 14.48. — Consider the subset B ⊆ R2 enclosed between the parabolas with equations y = x2 and x = y2. We view this set as a uniformly thick, homogeneous plate and want to calculate its center of mass.

From Section ??, we already know that the region defined by 0 ≤ y ≤ x2 and 0 ≤ x ≤ 1 has area 1/3. Due to symmetry, the area of B is also 1/3. Also, due to symmetry, the center of mass of B lies on the line with equation x = y. The x-coordinate xS of the center of mass S (and by symmetry also its y-coordinate) is given by definition through $x_S = \frac{1}{\operatorname{vol}(B)} \int_B x\,dy\,dx$ and is calculated with Corollary 14.47 as
$$x_S = \frac{1}{\operatorname{vol}(B)} \int_B x\,dy\,dx = 3 \int_0^1 \int_{x^2}^{\sqrt{x}} x\,dy\,dx = 3 \int_0^1 x\big(\sqrt{x} - x^2\big)\,dx = \frac{9}{20}.$$
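The computation above can be double-checked with a one-dimensional midpoint sum for the area and the first moment of the plate.

```python
import math

n = 2000
h = 1.0 / n
area = moment = 0.0
for i in range(n):
    x = (i + 0.5) * h
    width = math.sqrt(x) - x * x     # vertical extent of B over this x
    area += width * h                # area of B, approximately 1/3
    moment += x * width * h          # first moment with respect to x

x_S = moment / area                  # approximately 9/20
```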

Exercise 14.49. — Calculate the integral of the function f (x, y, z) = xyz over the set
B ⊆ R3 for

B = {(x, y, z) ∈ R3 | 0 ≤ x ≤ y ≤ z ≤ 1} and B = {(x, y, z) ∈ R3 | x2 + y 2 + z 2 ≤ 1}.

Calculate the volume of the set A ⊆ R2 enclosed between the curves x2 +y 2 = 8 and 4y = x2 +4.

Corollary 14.50 (Cavalieri’s Principle). — Let B ⊆ [a, b]×Rn−1 be a bounded and Jordan-
measurable set. Then,
Z b
vol(B) = vol(Bt )dt
a


where for t ∈ [a, b], the subset Bt ⊆ Rn−1 is given by Bt = {y ∈ Rn−1 | (t, y) ∈ B} and is
Jordan-measurable for almost all t ∈ [a, b].

Proof. Follows directly from Theorem 14.45 with f = 1B.
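Cavalieri's principle in action: slicing the unit ball in R³ at height t gives a disk of radius √(1 − t²), and integrating the slice areas recovers the volume 4π/3.

```python
import math

n = 1000
h = 2.0 / n
vol = 0.0
for i in range(n):
    t = -1.0 + (i + 0.5) * h
    vol += math.pi * (1.0 - t * t) * h   # area of the slice B_t, a disk of radius sqrt(1 - t^2)

# vol is close to 4*pi/3
```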

Exercise 14.51. — In Section ??, the volume of rotational bodies was defined. Here, we verify that the volume of Jordan-measurable subsets introduced in the last section is compatible with it. Let f : [a, b] → R≥0 be continuous, and let
$$K = \{(x,y,z) \in \mathbb{R}^3 \mid a \le x \le b,\ 0 \le \sqrt{y^2 + z^2} \le f(x)\}$$
be the rotational body given by f. Show that K is Jordan-measurable and that the volume of K in the sense of Definition 14.31 is given by $\pi \int_a^b f(x)^2\,dx$.

Exercise 14.52. — Calculate $\int_0^1 \int_x^1 \exp(y^2)\,dy\,dx$.

14.53. — Theorem 14.45 and Corollary 14.50 are, among other things, necessary even for very elementary volume calculations. For example, one might wonder whether the volume of a pyramid in R3 can be calculated geometrically by decomposing the pyramid into finitely many pieces that can be cut and reassembled into boxes. This method works excellently for polygons in R2. However, already for pyramids in R3, it can be proven, using the so-called Dehn invariant, that this method generally does not work; thus, for the volume calculation of polyhedra in R3, one has to resort to the integration methods of analysis.



Chapter 14.3 Multidimensional Substitution Rule

14.3 Multidimensional Substitution Rule

14.3.1 The Substitution Rule and First Examples


In this section, we introduce the multidimensional substitution rule for integrable functions
with compact support and illustrate it with some examples. We use the concept of a diffeo-
morphism from Definition 12.5. In this context, diffeomorphisms can be understood as smooth
coordinate transformations. The proof is discussed in the following two sections 14.3.2 and
14.3.3.

14.54. — Let n ≥ 1 and U ⊆ Rn be open. The support of a function f : U → R is the set
$$\operatorname{supp}(f) = \overline{\{x \in U \mid f(x) \neq 0\}}.$$

We say that f has compact support if supp(f) is a compact subset of U. If U ⊆ Rn is open and Jordan-measurable, and f : U → R is integrable, then the extension by zero of f to Rn, usually denoted as f! : Rn → R, has compact support in Rn: since U is bounded, U and therefore supp(f!) are also bounded, and as supp(f!) is closed by definition, the assertion follows from the Heine-Borel theorem. However, f : U → R itself need not have compact support. Conversely, if f : U → R is a function with compact support, we define the integral
$$\int_U f\,dx$$
as the integral of f over a box containing supp(f), if it exists, and in that case, we call f Riemann-integrable. Here, U does not necessarily need to be bounded or Jordan-measurable.

Theorem 14.55. — Let X, Y ⊆ Rn be open subsets, Φ : X → Y be a C 1 diffeomorphism,


and f : Y → R be a Riemann-integrable function with compact support. Then, the function
Φ∗ f : X → R defined by (Φ∗ f )(x) = (f ◦ Φ(x))| det(DΦ(x))| is Riemann-integrable, has
compact support, and satisfies
Z Z
f (y)dy = (f ◦ Φ(x))| det(DΦ(x))|dx.
Y X

14.56. — The reason why the hypothesis about the support of f is necessary is as follows: The function arctan : R → (−π/2, π/2) is a diffeomorphism. According to the substitution rule, here in dimension 1, for the constant function f : (−π/2, π/2) → R with value 1, it should hold
$$\pi = \int_{-\pi/2}^{\pi/2} 1\,dy = \int_{\mathbb{R}} \arctan'(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{x^2+1}\,dx.$$
This is not entirely incorrect, but in this case, the integral on the right must be understood as an improper Riemann integral, which we have not introduced in the multidimensional case


yet. According to Definition ??, the function x 7→ (x2 + 1)−1 on R is not Riemann-integrable,
as R as a subset of R is not Jordan-measurable.
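The truncated integrals can be computed numerically; they increase toward π, with a tail of size about 2/R cut off at each stage.

```python
import math

def truncated(R, n=20000):
    """Midpoint approximation of the integral of 1/(x^2 + 1) over [-R, R]."""
    h = 2.0 * R / n
    return sum(h / ((-R + (i + 0.5) * h) ** 2 + 1.0) for i in range(n))

vals = [truncated(R) for R in (10.0, 100.0, 1000.0)]
# vals increases toward pi; exactly, the truncated integral equals 2*arctan(R) = pi - O(1/R)
```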

Proposition 14.57. — Let X and Y be open subsets of Rn , and Φ : X → Y be a homeo-


morphism. Let f : Y → R be a function, and denote g = f ◦ Φ.

1. If f is Riemann-integrable, then g is also Riemann-integrable.

2. It holds supp(g) = Φ−1 (supp(f )).

Proof.

Example 14.58. — Let 0 < a < b and 0 < c < d be fixed parameters used to define the curvilinear bounded rectangle M ⊆ X = R2>0, which is bounded by the four curves
$$y = ax^2, \qquad y = bx^2, \qquad x = cy^2, \qquad x = dy^2.$$
This is given by $M = \{(x,y) \in X \mid \tfrac{1}{b} \le x^2 y^{-1} \le \tfrac{1}{a},\ \tfrac{1}{d} \le x^{-1} y^2 \le \tfrac{1}{c}\}$.

We want to use the substitution rule to calculate the area vol(M) of M. To do this, we introduce the variables u = x2y−1 and v = x−1y2. Through
$$\Psi\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x^2 y^{-1} \\ x^{-1} y^2 \end{pmatrix} \qquad\text{and}\qquad \Phi\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} u^{2/3} v^{1/3} \\ u^{1/3} v^{2/3} \end{pmatrix},$$
we obtain mutually inverse diffeomorphisms Ψ : X → X and Φ : X → X, and 1M ◦ Φ = 1Q for the rectangle Q = [1/b, 1/a] × [1/d, 1/c]. The Jacobian determinant of Φ is
$$\det(D\Phi(u,v)) = \det\begin{pmatrix} \tfrac{2}{3} u^{-1/3} v^{1/3} & \tfrac{1}{3} u^{2/3} v^{-2/3} \\[2pt] \tfrac{1}{3} u^{-2/3} v^{2/3} & \tfrac{2}{3} u^{1/3} v^{-1/3} \end{pmatrix} = \frac{4}{9} - \frac{1}{9} = \frac{1}{3}$$
for all (u, v) ∈ X. Thus, we obtain
$$\operatorname{vol}(M) = \int_X 1_M(x,y)\,dx\,dy = \int_X 1_Q(u,v)\cdot\tfrac{1}{3}\,du\,dv = \tfrac{1}{3}\Big(\tfrac{1}{a} - \tfrac{1}{b}\Big)\Big(\tfrac{1}{c} - \tfrac{1}{d}\Big).$$
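A direct grid count in the (x, y)-plane confirms the formula for vol(M); the values a = c = 1 and b = d = 2 below are illustrative, and for them the formula gives 1/12.

```python
a, b, c, d = 1.0, 2.0, 1.0, 2.0   # illustrative parameters with 0 < a < b, 0 < c < d
n = 1200
lo, hi = 0.3, 1.1                 # a square known to contain M for these parameters
h = (hi - lo) / n
count = 0
for i in range(n):
    x = lo + (i + 0.5) * h
    for j in range(n):
        y = lo + (j + 0.5) * h
        # membership test for M in the original coordinates
        if 1 / b <= x * x / y <= 1 / a and 1 / d <= y * y / x <= 1 / c:
            count += 1

area = count * h * h
exact = (1 / 3) * (1 / a - 1 / b) * (1 / c - 1 / d)   # = 1/12 for these parameters
```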


Example 14.59. — Let f : R2 → R be a Riemann-integrable function. We can apply Theorem 14.55 as follows:
$$\int_{\mathbb{R}^2} f(x,y)\,dx\,dy = \int_U f(x,y)\,dx\,dy = \int_0^{\infty} \int_0^{2\pi} f(r\cos\varphi,\, r\sin\varphi)\,r\,d\varphi\,dr,$$
where U = {(x, y) ∈ R2 | y ̸= 0 or x < 0}. In the first equation, we omit the integration over the Jordan null set {(x, y) | y = 0, x ≥ 0} using Proposition 14.40, and in the second equation, we apply Theorem 14.55 to the polar coordinate transformation discussed in Section ??,
$$\Phi : (0,\infty) \times (0, 2\pi) \to U, \qquad \Phi\begin{pmatrix} r \\ \varphi \end{pmatrix} = \begin{pmatrix} r\cos\varphi \\ r\sin\varphi \end{pmatrix}.$$
The factor r in the right integral is the Jacobian determinant of the polar coordinate transformation Φ.

Figure 14.3: The factor r as the Jacobian determinant has a geometric meaning. A box with side lengths Δr and Δφ corresponds to an almost rectangular section of a circular ring with "side lengths" Δr and approximately rΔφ, which are often informally denoted as dr and r dφ due to the desired smallness of these differences.
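As a numerical cross-check of the polar substitution, take the illustrative integrand f(x, y) = x² + y² restricted to the unit disk: a Cartesian grid sum and the polar formula ∫₀^{2π} ∫₀¹ r²·r dr dφ both give π/2.

```python
import math

# Cartesian midpoint sum over the unit disk
n = 800
h = 2.0 / n
cart = 0.0
for i in range(n):
    x = -1.0 + (i + 0.5) * h
    for j in range(n):
        y = -1.0 + (j + 0.5) * h
        if x * x + y * y <= 1.0:
            cart += (x * x + y * y) * h * h

# polar version: the integrand becomes r^2, and the Jacobian factor contributes one more r
m = 2000
hr = 1.0 / m
polar = 2.0 * math.pi * sum(((k + 0.5) * hr) ** 3 for k in range(m)) * hr
# both approximate pi/2
```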

Example 14.60. — Let f : R3 → R be a Riemann-integrable function. Then, with similar arguments as in Example 14.59, we have
$$\int_{\mathbb{R}^3} f(x,y,z)\,dx\,dy\,dz = \int_0^{\infty} \int_0^{\pi} \int_0^{2\pi} f(r\sin\theta\cos\varphi,\ r\sin\theta\sin\varphi,\ r\cos\theta)\, r^2\sin\theta\,d\varphi\,d\theta\,dr,$$
where the Jacobian determinant r2 sin θ can again be geometrically interpreted.
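For f = 1, the spherical integrand factorizes, so the volume of the unit ball can be checked with three one-dimensional midpoint sums.

```python
import math

n = 2000
hr, ht = 1.0 / n, math.pi / n
I_r = sum(((i + 0.5) * hr) ** 2 for i in range(n)) * hr         # integral of r^2 over [0, 1] -> 1/3
I_theta = sum(math.sin((i + 0.5) * ht) for i in range(n)) * ht  # integral of sin(theta) over [0, pi] -> 2
vol = I_r * I_theta * 2.0 * math.pi                             # -> 4*pi/3
```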

Exercise 14.61. — Compute the area for given constants 0 < a < b and 0 < c < d of the
set {(x, y) ∈ R2 | a ≤ y exp(−x) ≤ b, c ≤ y exp(x) ≤ d}.

Exercise 14.62. — Compute the following integral:
$$\int_V x^2 y z\,dx\,dy\,dz, \qquad V = \Big\{(x,y,z) \in \mathbb{R}^3_{>0} \ \Big|\ x^2 + y^2 \le 1,\ \tfrac{1}{\sqrt{3}} \le \tfrac{y}{x} \le \sqrt{3},\ z \le 1\Big\}.$$


14.3.2 Linear Substitution


First, we consider the important special case of a linear coordinate transformation, which will
be significant in the proof of the general case.

Lemma 14.63. — Let T : Rn → Rn be an invertible linear transformation given by an upper triangular matrix, and let f : Rn → R be a continuous function with compact support. Then,
$$\int_{\mathbb{R}^n} f(x)\,dx = |\det(T)| \int_{\mathbb{R}^n} f(T(x))\,dx.$$

Proof. Since T is invertible, the function f ◦ T : Rn → R has compact support, namely supp(f ◦ T) = T−1(supp f), and is Riemann-integrable. We express the integral $\int f(T(x))\,dx$ using Fubini's theorem as a multiple parameter integral
$$\int\!\!\int \cdots \int f\big(t_{11}x_1 + t_{12}x_2 + \cdots + t_{1n}x_n,\ t_{22}x_2 + t_{23}x_3 + \cdots + t_{2n}x_n,\ \ldots,\ t_{nn}x_n\big)\,dx_1\,dx_2 \cdots dx_n$$
where tij are the coefficients of the matrix associated with T. Each of these parameter integrals extends from −∞ to ∞ or alternatively from −R to R for a sufficiently large R > 0, as f has compact support. For the integral with respect to the variable xk, starting with k = 1, then for k = 2, and so on until k = n, we perform the one-dimensional linear substitution $y_k = t_{kk}x_k + t_{k,k+1}x_{k+1} + \cdots + t_{k,n}x_n$. In Leibniz notation, $dy_k = t_{kk}\,dx_k$, so $dx_k = |t_{kk}|^{-1}\,dy_k$ after taking absolute values. The multiple integral becomes
$$|t_{11} t_{22} \cdots t_{nn}|^{-1} \int\!\!\int \cdots \int f(y_1, y_2, \ldots, y_n)\,dy_n \cdots dy_2\,dy_1,$$
which, again using Fubini's theorem, is precisely the integral $|\det T|^{-1} \int f(y)\,dy$.
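Lemma 14.63 can be tested numerically for n = 2. Both the upper triangular matrix T = [[2, 1], [0, 3]] and the compactly supported f(x, y) = max(0, 1 − x² − y²) below are illustrative choices; the exact value of ∫ f is π/2.

```python
import math

def f(x, y):
    return max(0.0, 1.0 - x * x - y * y)   # continuous, supported on the closed unit disk

def grid(g, lo, hi, n=700):
    """Midpoint-grid approximation of the integral of g over [lo, hi]^2."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h, lo + (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h

det_T = 2 * 3                                     # determinant of [[2, 1], [0, 3]]
lhs = grid(f, -1.1, 1.1)                          # integral of f, about pi/2
rhs = grid(lambda x, y: f(2 * x + 1 * y, 3 * y), -1.1, 1.1)   # integral of f o T
# the lemma predicts lhs = det_T * rhs
```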

Lemma 14.64. — Let L : Rn → Rn be an invertible linear transformation, and f : Rn → R be a Riemann-integrable function. Then, f ◦ L is Riemann-integrable, and it holds
$$\int_{\mathbb{R}^n} f(x)\,dx = |\det(L)| \int_{\mathbb{R}^n} f(L(x))\,dx.$$

Proof. Let’s assume initially that the function f is continuous. From linear algebra, we know
that L can be written as the product of invertible matrices L = P ST , where P is a permutation
matrix, S is a lower triangular matrix, and T is an upper triangular matrix. The statement
of Lemma 14.63 holds equally for lower triangular matrices with the same proof, and also for
permutation matrices. It follows by applying Lemma 14.63 three times:
$$\int_{\mathbb{R}^n} f(x)\,dx = |\det(P)||\det(S)||\det(T)| \int_{\mathbb{R}^n} f(PST(x))\,dx = |\det(L)| \int_{\mathbb{R}^n} f(L(x))\,dx,$$

which proves the statement for continuous functions. Now, let f : Rn → R be any Riemann-
integrable function with support in an open box Q ⊆ Rn . According to Proposition ??, for


every ϵ > 0, there exist continuous functions f− and f+ on Rn with support in Q such that
$$f_- \le f \le f_+ \qquad\text{and}\qquad \int_{\mathbb{R}^n} (f_+ - f_-)\,dx < \epsilon.$$
From this, using the already treated case of continuous functions, we get
$$f_- \circ L \le f \circ L \le f_+ \circ L \qquad\text{and}\qquad \int_{\mathbb{R}^n} (f_+ \circ L - f_- \circ L)\,dx < |\det L|^{-1}\epsilon,$$
which, by Proposition ??, implies that f ◦ L is Riemann-integrable. We have
$$-\epsilon < \int_{\mathbb{R}^n} f_-(x)\,dx - |\det L| \int_{\mathbb{R}^n} f_+(L(x))\,dx \le \int_{\mathbb{R}^n} f(x)\,dx - |\det L| \int_{\mathbb{R}^n} f(L(x))\,dx \le \int_{\mathbb{R}^n} f_+(x)\,dx - |\det L| \int_{\mathbb{R}^n} f_-(L(x))\,dx < \epsilon,$$
which proves the lemma since ϵ > 0 was arbitrary.

Corollary 14.65. — Let L : Rn → Rn be a linear transformation, and B ⊆ Rn be a


Jordan-measurable subset. Then, L(B) is Jordan-measurable, and vol(L(B)) = | det L| vol(B).

Proof. If L is not invertible, then L(B) is bounded and contained in a proper linear subspace of Rn, and thus a Jordan null set. If L is invertible, the volume vol(L(B)) is given by Lemma 14.64 as
$$\int_{\mathbb{R}^n} 1_{L(B)}(x)\,dx = |\det L| \int_{\mathbb{R}^n} 1_{L(B)}(L(x))\,dx = |\det L| \int_{\mathbb{R}^n} 1_B(x)\,dx = |\det L| \operatorname{vol}(B),$$
which completes the proof.

Exercise 14.66. — Compute, for a symmetric positive definite matrix A ∈ Matn (R), the
volume of the ellipsoid {x ∈ Rn | ⟨Ax, x⟩ ≤ 1}, assuming knowledge of the volume of the
unit ball.

Corollary 14.67. — Let L ∈ Matn(R) with columns v1, . . . , vn ∈ Rn. Then, the parallelotope
$$P = L([0,1]^n) = \left\{ \sum_{i=1}^{n} s_i v_i \ \middle|\ 0 \le s_i \le 1 \right\}$$
is Jordan-measurable, and $\operatorname{vol}(P) = |\det(L)| = \sqrt{\operatorname{gram}(v_1, \ldots, v_n)}$.

Proof. This follows from Corollary 14.65, applied to the Jordan-measurable set [0, 1]n with
volume 1. Here, gram(v1 , . . . , vn ) denotes the Gram determinant, which is the determinant
of the matrix A with coefficients aij = ⟨vi , vj ⟩. It holds A = Lt L, and therefore | det L|2 =
det A.
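The identity |det L| = √(gram(v₁, …, vₙ)) is easy to verify for a concrete 3 × 3 example; the vectors below are an illustrative choice.

```python
import math

v = [[1.0, 2.0, 0.0],   # v1
     [0.0, 1.0, 3.0],   # v2
     [1.0, 0.0, 1.0]]   # v3

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

L = [[v[j][i] for j in range(3)] for i in range(3)]   # matrix with columns v1, v2, v3
gram = [[sum(v[i][k] * v[j][k] for k in range(3)) for j in range(3)] for i in range(3)]

vol_P = abs(det3(L))   # volume of the parallelotope; equals sqrt(det of the Gram matrix), here 7
```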


14.3.3 Proof of the Substitution Rule


In this section, we prove the substitution rule for functions with compact support, Theorem
14.55. We will repeatedly make use of the following fact.

14.68. — Let X ⊂ Rn be an open subset, and let K0 ⊂ X be a compact subset. Then, there exists δ0 > 0 such that the compact set
$$K_1 = K_0 + B_\infty(0, 2\delta_0) = \{x + y \mid x \in K_0,\ \|y\|_\infty \le 2\delta_0\}$$
is contained in X. The compactness of K1 follows from the Heine-Borel theorem. The containment of K1 in X implies that any closed box Q ⊆ Rn with maximum edge length less than δ0 and Q ∩ K0 ̸= ∅ is contained in K1. To see this, we can consider the continuous function, as per Exercise 9.54:
$$h : K_0 \to \mathbb{R}, \qquad h(x) = \sup\{\delta > 0 \mid B_\infty(x, 2\delta) \subseteq X\}.$$
Since X is open, we have h(x) > 0 for all x ∈ K0, and by Theorem ??, this function attains its minimum on K0. Then, $\delta_0 = \tfrac{1}{3} \min\{h(x) \mid x \in K_0\}$ fulfills the required condition.

Figure 14.4: The set K1 corresponds to a slightly inflated version of K0 within X, defined in
such a way that sufficiently small boxes intersecting K0 are fully contained in K1 .

Lemma 14.69. — Let X ⊆ Rn and Y ⊆ Rn be open, and let Φ : X → Y be a C1 diffeomorphism. Consider an axis-parallel closed cube Q0 ⊆ X with side length 2r > 0 and center x0 ∈ X. Set y0 = Φ(x0), L = DΦ(x0), and
$$\sigma = \max\{\|D\Phi(x) - L\|_{\mathrm{op}} \mid x \in Q_0\}. \qquad (14.13)$$
For any real number s satisfying $\sigma \|L^{-1}\|_{\mathrm{op}} \sqrt{n} \le s < 1$, we have
$$y_0 + (1-s)L(Q_0 - x_0) \subseteq \Phi(Q_0) \subseteq y_0 + (1+s)L(Q_0 - x_0). \qquad (14.14)$$


Proof. We can replace Φ with x ↦ Φ(x + x0) − y0, so that, without loss of generality, x0 = 0 and y0 = Φ(x0) = 0. In this case, the inclusions (14.14) become
$$(1-s)Q_0 \subseteq L^{-1}\Phi(Q_0) \subseteq (1+s)Q_0,$$
which, in turn, is equivalent to the inequalities
$$\|x\|_\infty \le r(1-s) \implies x \in L^{-1}\Phi(Q_0) \qquad (14.15)$$
$$\|x\|_\infty \le r \implies \|L^{-1}\Phi(x)\|_\infty \le (1+s)r \qquad (14.16)$$
For all x1, x2 ∈ Q0, we have
$$\Phi(x_2) - \Phi(x_1) - L(x_2) + L(x_1) = \int_0^1 \big(D\Phi(x_1 + t(x_2 - x_1)) - L\big)(x_2 - x_1)\,dt,$$
from which, using the vector-valued integral triangle inequality in (??), the estimate
$$\|(\Phi(x_2) - L(x_2)) - (\Phi(x_1) - L(x_1))\| \le \sigma \|x_2 - x_1\| \le \sigma \sqrt{n}\, \|x_2 - x_1\|_\infty \qquad (14.17)$$
follows. For x2 = x ∈ Q0 and x1 = 0, we obtain $\|\Phi(x) - L(x)\| \le \sigma \sqrt{n}\, r$, and thus
$$\|L^{-1}\Phi(x) - x\|_\infty \le \|L^{-1}\Phi(x) - x\| \le \|L^{-1}\|_{\mathrm{op}}\, \sigma \sqrt{n}\, r \le sr. \qquad (14.18)$$
Therefore, the claimed inequality in (14.16) holds. For the first inclusion, we apply the Banach Fixed Point Theorem 9.56 to the map
$$T_x : Q_0 \to Q_0, \qquad T_x(y) = x - (L^{-1}\Phi(y) - y).$$
For y ∈ Q0 and x ∈ (1 − s)Q0, we have, by (14.18),
$$\|T_x(y)\|_\infty \le \|x\|_\infty + \|L^{-1}\Phi(y) - y\|_\infty \le (1-s)r + sr = r.$$
Thus, Tx indeed maps every point in Q0 back to Q0. For y1, y2 ∈ Q0, (14.17) implies
$$\|T_x(y_1) - T_x(y_2)\| = \|(L^{-1}\Phi(y_1) - y_1) - (L^{-1}\Phi(y_2) - y_2)\| \le \|L^{-1}\|_{\mathrm{op}}\, \sigma\, \|y_1 - y_2\|.$$
This shows ∥Tx(y1) − Tx(y2)∥ ≤ s∥y1 − y2∥. Since s < 1 by assumption, we can apply the Banach Fixed Point Theorem, showing that for every x ∈ (1 − s)Q0, there exists a unique y ∈ Q0 such that Tx(y) = y, or equivalently L−1Φ(y) = x. This proves (14.15).

Lemma 14.70. — Let X, Y ⊂ Rn be open, let Φ : X → Y be a C1 diffeomorphism, and let K0 ⊆ X be a compact subset. Then, for any ϵ ∈ (0, 1), there exists δ > 0 with the following property: For any cube Q0 with center x0, side length smaller than δ, and Q0 ∩ K0 ̸= ∅, we have
$$\frac{\operatorname{vol}(\Phi(Q_0))}{1+\epsilon} \le |\det D\Phi(x_0)| \operatorname{vol}(Q_0) \le \frac{\operatorname{vol}(\Phi(Q_0))}{1-\epsilon}.$$


Proof. We will construct parallelotopes P + and P − such that P − ⊆ Φ(Q0 ) ⊆ P + and fulfill
the inequalities
vol(P+)/(1 + ϵ) < |det DΦ(x0)| vol(Q0) < vol(P−)/(1 − ϵ).
Let s ∈ (0, 1) be small enough such that

(1 − s)^n > 1 − ϵ,   (1 + s)^n < 1 + ϵ,

and let δ0 > 0 be small enough such that the compact set K1 = K0 + B∞ (0, 2δ0 ) is contained
in X. According to Theorem ??, the continuous function x 7→ ∥DΦ(x)−1 ∥op is bounded on
K1 , so we have
∥DΦ(x)−1 ∥op < M for all x ∈ K1

for some suitable M > 0. According to Proposition ??, DΦ is uniformly continuous on K1, so there exists δ ∈ (0, δ0) such that for all x0, x ∈ K1,

∥x0 − x∥∞ < δ =⇒ M · ∥DΦ(x) − DΦ(x0)∥op · √n < s

holds. Now, let x0 ∈ K0 and Q0 be a closed cube with center x0 and side length smaller than
δ. Write y0 = Φ(x0 ) and L = DΦ(x0 ). According to Lemma 14.69, we have

y0 + (1 − s)L(Q0 − x0 ) ⊆ Φ(Q0 ) ⊆ y0 + (1 + s)L(Q0 − x0 )

We set P + = y0 + (1 + s)L(Q0 − x0 ) and P − = y0 + (1 − s)L(Q0 − x0 ). Therefore, P − ⊆


Φ(Q0 ) ⊆ P + . According to Corollary 14.65,

vol(P−) = (1 − s)^n |det L| vol(Q0) and vol(P+) = (1 + s)^n |det L| vol(Q0),

which implies the inequalities in the lemma given the choice of s.
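The content of the lemma can be checked numerically for a concrete diffeomorphism. The map Φ and all names below are our own illustration, not from the notes: for Φ(x, y) = (eˣ, x + y) the image volume vol(Φ(Q)) equals the integral of |det DΦ| over Q (this is the substitution rule itself), which here has a closed form and can be compared with |det DΦ(x0)| vol(Q0) for shrinking cubes.

```python
import math

# Hypothetical example (not from the notes): Phi(x, y) = (exp(x), x + y),
# a diffeomorphism with det DPhi(x, y) = exp(x).
def det_dphi(x, y):
    return math.exp(x)

def vol_image(h):
    # vol(Phi(Q)) for the cube Q = [-h, h]^2, computed in closed form:
    # the integral of exp(x) over Q equals 2h * (exp(h) - exp(-h)).
    return 2 * h * (math.exp(h) - math.exp(-h))

x0 = (0.0, 0.0)
for h in [0.5, 0.1, 0.01]:
    vol_q = (2 * h) ** 2
    ratio = vol_image(h) / (det_dphi(*x0) * vol_q)
    print(h, ratio)  # the ratio tends to 1 as the side length shrinks
```

The printed ratios approach 1, exactly as the lemma predicts for small side lengths.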

Proof of Theorem ??. Let f : Y → R be a Riemann integrable function. Writing f = f+ − f−


for non-negative functions f+ and f− as in ?? and using linearity of the integral, we can
assume without loss of generality that f ≥ 0. Write g = f ◦ Φ. Then, according to Proposition
??, g : X → R is also Riemann-integrable, and the support K0 = supp(g) ⊆ X is compact.
Choose δ0 > 0 small enough so that the compact set

K1 := K0 + B∞ (0, δ0 )

is contained in X, and choose a closed box Q ⊆ Rn containing K1 . Write g! for the extension
by 0 of g to Q.
Let ϵ > 0, and let δ ∈ (0, δ0) be small enough that the statement of Lemma 14.70 holds for this ϵ and the compact set K0 ⊆ X. Since the function x ↦ det DΦ(x) is continuous on X and thus


uniformly continuous on K1 , we can additionally choose δ to be small enough that

∥x0 − x∥ ≤ δ =⇒ | det DΦ(x0 ) − det DΦ(x)| < ϵ

holds for all x0 , x ∈ K1 . Choose non-negative step functions u and o on Q with respect to a
partition of Q with mesh smaller than δ, such that
u ≤ g! ≤ o and ∫_Q (o − u) dx < ϵ.

Write cα ≥ 0 for the constant value of u on Q◦α and dα ≥ 0 for the constant value of o on Q◦α.
Let A be the set of addresses α in this partition for which Qα ∩ K0 ̸= ∅. For α ∈ A, we have
Qα ⊆ K1 , and
supp(g) ⊆ ⋃_{α∈A} Qα ⊆ X and supp(f) ⊆ ⋃_{α∈A} Φ(Qα) ⊆ Y.

For α ∈ A, both ∂Qα and Φ(∂Qα ) are Jordan measurable with volume zero. We can thus
calculate the integral of g| det DΦ| over X as
∫_X g(x)|det DΦ(x)| dx = Σ_{α∈A} ∫_{Q◦α} g(x)|det DΦ(x)| dx

and the integral of f over Y as


∫_Y f(y) dy = Σ_{α∈A} ∫_{Φ(Q◦α)} f(y) dy

We now estimate the individual summands in these representations. For brevity, write vα =
vol(Qα ) and v = vol(Q), and choose M > 0 such that

| det DΦ(xα )| ≤ M, |cα | ≤ M and |dα | ≤ M

for all α ∈ A, where xα denotes the center of Qα .

Figure 14.5: Geometric setup in the proof of the substitution rule. The boxes Qα with α ∈ A
are highlighted.


The given partition of Q having mesh smaller than δ means that for all α ∈ A and x ∈ Qα ,
the estimate ∥xα − x∥ ≤ δ holds. By choosing δ accordingly, this implies | det(DΦ(x)) −
det(DΦ(xα ))| ≤ ϵ for all α ∈ A and x ∈ Qα . This leads to the upper estimate
∫_{Q◦α} g(x)|det DΦ(x)| dx ≤ vα dα (|det DΦ(xα)| + ϵ) ≤ vα dα |det DΦ(xα)| + ϵ vα M.

Similarly, we can derive a lower estimate, using the step function u ≤ g with constant value
cα on Q◦α . Summing over α ∈ A gives
−ϵM v + Σ_{α∈A} vα cα |det DΦ(xα)| ≤ ∫_X g(x)|det DΦ(x)| dx ≤ ϵM v + Σ_{α∈A} vα dα |det DΦ(xα)|   (14.19)
By the choice of δ and Lemma 14.70, we have

vol(Φ(Q◦α)) dα ≤ (1 + ϵ)|det DΦ(xα)| vα dα ≤ |det DΦ(xα)| vα dα + ϵM² vα

for all α ∈ A. The function o ◦ Φ⁻¹ is constant with value dα on Φ(Q◦α) and an upper bound for f = g ◦ Φ⁻¹, which implies

∫_{Φ(Q◦α)} f dy ≤ ∫_{Φ(Q◦α)} (o ◦ Φ⁻¹) dy ≤ |det DΦ(xα)| vα dα + ϵM² vα.

Summing over α ∈ A gives

∫_Y f dy ≤ Σ_{α∈A} ( ϵM² vα + |det DΦ(xα)| vα dα ) ≤ ϵM² v + Σ_{α∈A} |det DΦ(xα)| vα dα.

Similarly, using the step function u ≤ g! with constant value cα on Q◦α, we derive a lower bound for the integral of f, which finally leads to the estimates

−ϵM² v + Σ_{α∈A} |det DΦ(xα)| vα cα ≤ ∫_Y f dy ≤ ϵM² v + Σ_{α∈A} |det DΦ(xα)| vα dα   (14.20)



Chapter 14.4 Improper Multiple Integrals and applications

14.4 Improper Multiple Integrals and applications


For nonnegative continuous f : Rⁿ → R, define

∫_{Rⁿ} f(x) dx = lim_{R→∞} ∫_{B_R} f(x) dx.

Notice that the change of variables formula and Fubini's theorem continue to hold for such improper integrals.

Example 14.71. — The Gaussian curve f : R → R, given by f(x) = exp(−x²), is (up to normalization) the probability density of the normal distribution. The normalized cumulative distribution function

F(x) = (1/√π) ∫_{−∞}^{x} exp(−t²) dt

is called the distribution function of the normal distribution and is an indispensable function in statistics. We compute the converging integral

I = ∫_{−∞}^{∞} exp(−x²) dx

by using polar coordinates in R² to calculate I². Indeed, we have

I² = ( ∫_R e^{−x²} dx )( ∫_R e^{−y²} dy ) = ∫_{R²} e^{−x²−y²} dx dy = ∫_0^{2π} ∫_0^∞ e^{−r²} r dr dφ = π ∫_0^∞ e^{−r²} 2r dr = π,

which implies I = √π. We have applied Fubini's Theorem to the entire R², which can be justified by considering suitable exhaustions, such as ([−m, m]²)_{m=0}^{∞} for R².
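As a quick numerical sanity check of I = √π (our own illustration, not part of the notes), a midpoint-rule approximation over [−10, 10] suffices, since the integrand is negligibly small outside that interval.

```python
import math

def gauss_integral(n=200_000):
    # midpoint rule on [-10, 10]; the tails beyond are below the target precision
    a, b = -10.0, 10.0
    h = (b - a) / n
    return h * sum(math.exp(-((a + (i + 0.5) * h) ** 2)) for i in range(n))

I = gauss_integral()
print(I, math.sqrt(math.pi))  # the two values agree to high precision
```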

Exercise 14.72. — Let A ∈ Matn,n (R) be symmetric and positive definite. Show

∫_{Rⁿ} exp(−⟨Ax, x⟩) dx = π^{n/2} / √(det A).
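For n = 2 the claimed identity can be verified numerically before attempting the proof. The matrix A = [[2, 1], [1, 2]] below is our own choice (symmetric, positive definite, det A = 3); a midpoint rule over [−5, 5]² suffices because the integrand decays rapidly.

```python
import math

def integrand(x, y):
    # <Ax, x> = 2x^2 + 2xy + 2y^2 for the sample matrix A = [[2, 1], [1, 2]]
    return math.exp(-(2 * x * x + 2 * x * y + 2 * y * y))

def integral_2d(n=600):
    # two-dimensional midpoint rule on [-5, 5]^2
    a, b = -5.0, 5.0
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        for j in range(n):
            total += integrand(x, a + (j + 0.5) * h)
    return total * h * h

approx = integral_2d()
exact = math.pi / math.sqrt(3)  # pi^{n/2} / sqrt(det A) with n = 2
print(approx, exact)
```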

Example 14.73. — As an application of the theory in this chapter, we want to prove the
equation known as the Basel problem:

ζ(2) = Σ_{n=1}^{∞} 1/n² = π²/6

The approach we take here is from T. Apostol [Apo1983]. To do this, we will evaluate the
integral

∫_0^1 ∫_0^1 1/(1 − xy) dx dy   (14.21)
in two different ways.


First, using the geometric series, we observe that

∫_0^1 ∫_0^1 1/(1 − xy) dx dy = ∫_0^1 ∫_0^1 Σ_{k=0}^{∞} (xy)^k dx dy = Σ_{k=0}^{∞} ∫_0^1 ∫_0^1 x^k y^k dx dy
= Σ_{k=0}^{∞} ∫_0^1 y^k/(k + 1) dy = Σ_{k=0}^{∞} 1/(k + 1)² = ζ(2)   (14.22)

It is left to the reader to show that the function f : (x, y) ↦ 1/(1 − xy) on [0, 1)² is improperly
Riemann-integrable and that the manipulations in (14.22) are meaningful.
Now, we want to calculate the integral (14.21) using a linear substitution. For this, we
consider new integration variables u, v given by the rotation
x = (u − v)/2,   y = (u + v)/2.

The new integration domain in the u, v-coordinates is the square with corners at (0, 0), (1, 1),
(2, 0), and (1, −1), as can be easily verified by substitution. It holds that

1/(1 − xy) = 1/(1 − ¼(u − v)(u + v)) = 4/(4 − u² + v²)

and with substitution and Fubini, we have


∫_0^1 ∫_0^1 1/(1 − xy) dx dy = ∫_0^1 ∫_{−u}^{u} (1/2) · 4/(4 − u² + v²) dv du + ∫_1^2 ∫_{−(2−u)}^{2−u} (1/2) · 4/(4 − u² + v²) dv du
= 4 ∫_0^1 ∫_0^{u} 1/(4 − u² + v²) dv du + 4 ∫_1^2 ∫_0^{2−u} 1/(4 − u² + v²) dv du =: I1 + I2,

where the factor 1/2 is the Jacobian determinant of the substitution and we used that the integrand is even in v.

A direct calculation (see (??)) shows


I1 = 4 ∫_0^1 1/√(4 − u²) · arctan( u/√(4 − u²) ) du = 4 ∫_0^{π/6} θ dθ = 2 (π/6)²,

where u = 2 sin(θ) is substituted. Similarly, we calculate


I2 = 4 ∫_1^2 1/√(4 − u²) · arctan( (2 − u)/√(4 − u²) ) du = 8 ∫_0^{π/6} φ dφ = 4 (π/6)²,

where u = 2 cos(2φ) is substituted. In summary, we obtain


ζ(2) = ∫_0^1 ∫_0^1 1/(1 − xy) dx dy = I1 + I2 = 2 (π/6)² + 4 (π/6)² = 6 (π/6)² = π²/6

as claimed.
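The identity, together with the series manipulation in (14.22), can be sanity-checked numerically (our own illustration, not from the notes):

```python
import math

# Partial sums of 1/n^2 approach pi^2/6; the tail beyond N is of size about 1/N.
N = 200_000
partial = sum(1.0 / n ** 2 for n in range(1, N + 1))
print(partial, math.pi ** 2 / 6)

# The series form in (14.22), summing 1/(k+1)^2 over k >= 0, has the same terms.
series = sum(1.0 / (k + 1) ** 2 for k in range(N))
print(abs(series - partial))  # identical up to rounding
```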

Exercise 14.74. — Carry out the above calculations in detail, justify all the formal steps.


Exercise 14.75. — Calculate the value of the alternating series Σ_{n=1}^{∞} (−1)^{n+1}/n².

Exercise 14.76. — Let n ∈ N. The set ∆n = {(x1 , . . . , xn ) ∈ [0, 1]n | x1 + · · · + xn ≤ 1} is


called the n-dimensional standard simplex. Calculate the volume vol(∆n ) and the integral
∫_{∆n} e^{x1+···+xn} dx1 ⋯ dxn.

Exercise 14.77. — For n ∈ N, let ωn denote the volume of the n-dimensional unit ball
B(0, 1) ⊆ Rn . Show that
ωn = π^{n/2} / Γ(n/2 + 1).
Calculate ω100 with computer assistance to 30 correct decimal places. For which n is ωn
maximal?
Hint: Show or use without proof that the one-dimensional integral
In = ∫_0^π sinⁿ(x) dx

satisfies the equation In In−1 = (1/n) I1 I0 = 2π/n and use this to derive a recursion formula for ωn.
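The closed formula makes the computer-assisted part of the exercise straightforward. The sketch below (our own, using floating point via math.gamma) gives about 15 significant digits rather than the 30 the exercise asks for; full precision would require a multi-precision library such as the standard decimal module.

```python
import math

def omega(n):
    # volume of the n-dimensional unit ball via the stated formula
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

print(omega(2))                      # pi (area of the unit disc)
print(omega(3))                      # 4*pi/3
print(omega(100))                    # of order 1e-40: high-dimensional balls are tiny
print(max(range(1, 30), key=omega))  # the dimension where omega_n is maximal
```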

Exercise 14.78. — Let (Bm)_{m=0}^{∞} be an exhaustion of a subset B ⊆ Rⁿ. Suppose f, g : B → R are such that |f| ≤ g and the functions f|Bm and g|Bm are Riemann-integrable for all m ∈ N. Assume g is improperly Riemann-integrable. Show that f and |f| are also improperly Riemann-integrable on B, and that

| ∫_B f dx | ≤ ∫_B |f| dx ≤ ∫_B g dx

holds.

Exercise 14.79. — For x, y > 0, the beta function is defined by


B(x, y) = ∫_0^1 t^{x−1} (1 − t)^{y−1} dt

Show that
B(x, y) = Γ(x)Γ(y) / Γ(x + y)
holds.
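The identity can be verified numerically before proving it (our own sketch; the test values x = 2.5, y = 3 are our choice, and the midpoint rule is adequate here because the integrand is bounded for x, y > 1):

```python
import math

def beta_integral(x, y, n=200_000):
    # midpoint rule for B(x, y); fine for x, y > 1 where the integrand is bounded
    h = 1.0 / n
    return h * sum(((i + 0.5) * h) ** (x - 1) * (1.0 - (i + 0.5) * h) ** (y - 1)
                   for i in range(n))

x, y = 2.5, 3.0
lhs = beta_integral(x, y)
rhs = math.gamma(x) * math.gamma(y) / math.gamma(x + y)
print(lhs, rhs)  # the two values agree closely
```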



Chapter 14.5 Parameter Integrals

14.5 Parameter Integrals

14.5.1 Interchanging Differentiation and Integration


Let a < b be real numbers, U ⊆ Rn be open, and f : U × [a, b] → R be a function. An integral
of the form

F(x) = ∫_a^b f(x, t) dt

is referred to as a parameter integral. Typically, we assume that the function f in n + 1


variables is at least continuous so that, in particular, for each fixed x ∈ U , the map t 7→ f (x, t)
is continuous and thus Riemann integrable.

Theorem 14.80: Integrals depending on a parameter


Let U ⊂ Rn be an open subset, a < b be real numbers, and f : U × [a, b] → R be
continuous. Then, the parameter integral
F(x) = ∫_a^b f(x, t) dt

defines a continuous function F : U → R. If the partial derivatives ∂k f for k = 1, . . . , n


exist and are continuous on U × [a, b], then F is continuously differentiable, and it holds
that

∂k F(x) = ∫_a^b ∂k f(x, t) dt

for all x ∈ U and k ∈ {1, . . . , n}.

Proof. We check the continuity of F at x0 ∈ U. Let ϵ > 0. Choose r > 0 such that B(x0, r) is contained in U. By the Heine-Borel Theorem, the set K := B(x0, r) × [a, b], taken with the closed ball B(x0, r), is compact, and f|K is uniformly continuous by the Heine-Cantor Theorem. Thus, there exists δ ∈ (0, r) such
that for all x ∈ B(x0 , δ) and t ∈ [a, b], the inequality |f (x, t) − f (x0 , t)| < ϵ(b − a)−1 holds.
This implies
|F(x) − F(x0)| ≤ ∫_a^b |f(x, t) − f(x0, t)| dt < ϵ

for all x ∈ B(x0 , δ), proving the continuity of F at x0 .


Assume now that the partial derivative ∂k f exists and is continuous. For s ∈ (−r, r) \ {0} and t ∈ [a, b], according to the Mean Value Theorem 10.22, there exists ξ ∈ (0, 1) such that

( f(x0 + s ek, t) − f(x0, t) ) / s = ∂k f(x0 + ξ s ek, t).

Let ϵ > 0. Due to the uniform continuity of ∂k f on K, there exists δ ∈ (0, r) such that for
x ∈ B(x0 , δ) and all t ∈ [a, b], the estimate

|∂k f (x, t) − ∂k f (x0 , t)| < ϵ(b − a)−1


holds. Combining this, for s ∈ (−δ, δ) \ {0}, we get

| (F(x0 + s ek) − F(x0))/s − ∫_a^b ∂k f(x0, t) dt | = | ∫_a^b ( (f(x0 + s ek, t) − f(x0, t))/s − ∂k f(x0, t) ) dt |
= | ∫_a^b ( ∂k f(x0 + ξ s ek, t) − ∂k f(x0, t) ) dt | < ϵ.

As ϵ > 0 was arbitrary, we conclude


∂k F(x0) = lim_{s→0} ( F(x0 + s ek) − F(x0) ) / s = ∫_a^b ∂k f(x0, t) dt.

By the first part of the theorem, ∂k F is continuous, and since k was arbitrary, continuous
differentiability of F follows from Theorem 10.10.

Example 14.81. — Theorem 14.80 allows us to analyze functions given by integrals. An


example of such a function is obtained by calculating the circumference of an ellipse with axis
lengths 2a and 2b. Let a ≥ b > 0 be real. The ellipse with parameters a, b is defined by the
solution set of the equation
x²/a² + y²/b² = 1.
A parametrization of this ellipse is given by γ(t) = (a cos(t), b sin(t)) for t ∈ [0, 2π]. The
circumference of the ellipse is then, as explained in Section ??:
L(γ) = ∫_0^{2π} √(a² sin²(t) + b² cos²(t)) dt = a ∫_0^{2π} √(sin²(t) + (b/a)² cos²(t)) dt
= a ∫_0^{2π} √(1 − ε² cos²(t)) dt = 4a ∫_0^{π/2} √(1 − ε² sin²(t)) dt,

where ε = √(1 − b²/a²) denotes the eccentricity of the ellipse, which measures the deviation of
the ellipse from a circle. The parameter integral
F(x) = ∫_0^{π/2} √(1 − x² sin²(t)) dt

defines, for x ∈ (0, 1), the so-called “complete elliptic integral of the second kind”. According
to Theorem 14.80 and induction, the function F : (0, 1) → R is smooth.
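Theorem 14.80 can also be checked numerically on this example (our own sketch, not from the notes): differentiating under the integral sign gives F′(x) = ∫_0^{π/2} −x sin²(t)/√(1 − x² sin²(t)) dt, which we compare with a central difference quotient of F.

```python
import math

def quad(f, a, b, n=20_000):
    # simple midpoint rule
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def F(x):
    return quad(lambda t: math.sqrt(1 - x * x * math.sin(t) ** 2), 0.0, math.pi / 2)

def dF(x):
    # the integrand differentiated in x, integrated over t (Theorem 14.80)
    return quad(lambda t: -x * math.sin(t) ** 2
                / math.sqrt(1 - x * x * math.sin(t) ** 2), 0.0, math.pi / 2)

x, h = 0.5, 1e-5
central = (F(x + h) - F(x - h)) / (2 * h)
print(dF(x), central)  # the two derivative values agree
```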


Corollary 14.82:
Let U ⊂ Rn be open, a < b be real numbers, and f : U × (a, b) → R be continuous
with continuous partial derivatives ∂k f for k ∈ {1, . . . , n}. Let α, β : U → (a, b) be
continuously differentiable. Then, the parameter integral with varying limits
F(x) = ∫_{α(x)}^{β(x)} f(x, t) dt

is continuously differentiable on U , and it holds that


∂k F(x) = f(x, β(x)) ∂k β(x) − f(x, α(x)) ∂k α(x) + ∫_{α(x)}^{β(x)} ∂k f(x, t) dt

for all x ∈ U .

Proof. We combine Theorem 14.80, the Fundamental Theorem of Calculus from Analysis I,
and the chain rule (10.16). For this, we define the auxiliary function
φ : U × (a, b)² → R,   (x, α, β) ↦ ∫_α^β f(x, t) dt.

First, we show that φ is continuous. Let (xn , αn , βn ) ∈ U × (a, b)2 be a sequence converging
to (x, α, β) ∈ U × (a, b)2 . Choose ϵ > 0 and c, d ∈ (a, b) such that

B(x, ϵ) ⊂ U and c ≤ αn , βn ≤ d

for all n ≥ 0. Thus, K = B(x, ϵ) × [c, d] is a compact subset of U × (a, b) and

M = max{|f (x′ , t′ )| | (x′ , t′ ) ∈ K}

exists. For all sufficiently large n, xn ∈ B(x, ϵ) holds, and consequently

|φ(xn, αn, βn) − φ(x, α, β)|
≤ | ∫_{αn}^{βn} f(xn, t) dt − ∫_α^β f(xn, t) dt | + | ∫_α^β f(xn, t) dt − ∫_α^β f(x, t) dt |
≤ M|αn − α| + M|βn − β| + | ∫_α^β ( f(xn, t) − f(x, t) ) dt |,

using the triangle inequality for integrals over the subintervals between αn and α, as well as
βn and β, and the bound M for the function values of f . For n → ∞, it follows from Theorem
14.80 that this expression tends to 0, hence φ is continuous.
According to Theorem 14.80, the partial derivatives ∂k φ for k = 1, 2, . . . , n exist and are
given by
∂k φ(x, α, β) = ∫_α^β ∂k f(x, t) dt.


As argued above, ∂k φ is also continuous. By the Fundamental Theorem of Calculus,


the partial derivatives of φ with respect to α and β exist and are given by

∂α φ(x, α, β) = −f (x, α) and ∂β φ(x, α, β) = f (x, β)

In particular, ∂α φ and ∂β φ are continuous. By Theorem 10.10, φ is continuously differen-


tiable.
The function F is given by F (x) = φ(x, α(x), β(x)), and can thus be seen as the composition
of φ with the function ψ : U → U × (a, b)2

ψ(x) = (x, α(x), β(x))

According to the assumption, ψ is also continuously differentiable, with a total derivative


given by
Dψ(x)(v) = (v, Dα(x)(v), Dβ(x)(v))

We can apply the chain rule and obtain that F is continuously differentiable, with a total
derivative

DF (x)(v) = Dφ(ψ(x))(Dψ(x)(v)) = Dφ(ψ(x))(v, Dα(x)(v), Dβ(x)(v)).

For v = ek and thus ∂k F (x) = DF (x)(ek ), it follows, in particular,

∂k F(x) = ∂k φ(x, α(x), β(x)) + ∂α φ(x, α(x), β(x)) ∂k α(x) + ∂β φ(x, α(x), β(x)) ∂k β(x)
= ∫_{α(x)}^{β(x)} ∂k f(x, t) dt − f(x, α(x)) ∂k α(x) + f(x, β(x)) ∂k β(x),

which was to be shown.
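A numerical spot check of the corollary (our own illustration; the choices f(x, t) = sin(xt), α(x) = x², β(x) = x are ours):

```python
import math

def quad(f, a, b, n=20_000):
    # simple midpoint rule
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def F(x):
    # F(x) = integral of sin(x t) for t from alpha(x) = x^2 to beta(x) = x
    return quad(lambda t: math.sin(x * t), x * x, x)

def dF(x):
    # boundary terms f(x, beta) beta' - f(x, alpha) alpha' plus the inner integral
    boundary = math.sin(x * x) * 1.0 - math.sin(x ** 3) * 2 * x
    interior = quad(lambda t: t * math.cos(x * t), x * x, x)
    return boundary + interior

x, h = 0.7, 1e-5
central = (F(x + h) - F(x - h)) / (2 * h)
print(dF(x), central)  # the corollary's formula matches the difference quotient
```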

14.5.2 The Bessel Differential Equation


As an application of the general theory of parameter integrals, we want to use it here to solve
a differential equation. For a parameter n ≥ 0, usually a natural number, the differential
equation
x2 u′′ (x) + xu′ (x) + (x2 − n2 )u(x) = 0 (14.23)

for u ∈ C²((0, ∞), R) is called the Bessel Differential Equation. It is linear and homogeneous of second order. From the Picard-Lindelöf existence and uniqueness theorem, which we
will prove towards the end of the semester, it follows that (14.23), together with any two initial
values u(1) = a and u′ (1) = b for a, b ∈ R, has a uniquely determined solution on (0, ∞). In
particular, the vector space of solutions to (14.23) is two-dimensional. We aim to provide two
linearly independent solutions. For this purpose, we assume n ∈ N.


14.83. — The function defined by the parameter integral


Jn(x) = (1/π) ∫_0^π cos(x sin(t) − nt) dt   (14.24)

is called the Bessel function of the first kind, and it solves the differential equation
(14.23), as we can verify using Theorem 14.80. Indeed, we have ∂x (cos(x sin(t) − nt)) =
− sin(x sin(t) − nt) sin(t), and therefore,
Jn′(x) = (1/π) ∫_0^π −sin(x sin(t) − nt) sin(t) dt

and similarly

Jn″(x) = −(1/π) ∫_0^π cos(x sin(t) − nt) sin²(t) dt.

For the expression x2 Jn′′ (x) + (x2 − n2 )Jn (x), we obtain


(1/π) ∫_0^π ( −x² cos(x sin(t) − nt) sin²(t) + (x² − n²) cos(x sin(t) − nt) ) dt
= (1/π) ∫_0^π cos(x sin(t) − nt) (x² cos²(t) − n²) dt
= (1/π) ∫_0^π cos(x sin(t) − nt) (x cos(t) − n) (x cos(t) + n) dt    [note x cos(t) − n = ∂t(x sin(t) − nt)]
= (1/π) [ sin(x sin(t) − nt)(x cos(t) + n) ]_0^π + (1/π) ∫_0^π sin(x sin(t) − nt) x sin(t) dt
= −x Jn′(x)

using integration by parts and the assumption n ∈ N. Therefore, (14.24) satisfies the differ-
ential equation (14.23).
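This can be confirmed numerically (our own sketch, not from the notes): approximate Jn by applying the midpoint rule to (14.24) and check that the left-hand side of (14.23) nearly vanishes, using finite differences for the derivatives.

```python
import math

def J(n, x, m=20_000):
    # midpoint-rule approximation of the integral representation (14.24)
    h = math.pi / m
    return sum(math.cos(x * math.sin((i + 0.5) * h) - n * (i + 0.5) * h)
               for i in range(m)) * h / math.pi

n, x, h = 2, 1.5, 1e-3
j0, jp, jm = J(n, x), J(n, x + h), J(n, x - h)
d1 = (jp - jm) / (2 * h)            # ~ Jn'(x)
d2 = (jp - 2 * j0 + jm) / (h * h)   # ~ Jn''(x)
residual = x * x * d2 + x * d1 + (x * x - n * n) * j0
print(residual)  # close to zero, as (14.23) predicts
```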

14.84. — The Bessel function of the second kind is defined by the improper integral
Yn(x) = (1/π) ∫_0^π sin(x sin(t) − nt) dt − (1/π) ∫_0^∞ ( exp(nt) + (−1)^n exp(−nt) ) exp(−x sinh(t)) dt

for x ∈ (0, ∞). It can be shown that Yn also satisfies the differential equation (14.23). We
have
lim_{x→0} Jn(x) = (1/π) ∫_0^π cos(nt) dt
according to Theorem 14.80, and
lim Yn (x) = −∞. (14.25)
x→0

This shows that Jn and Yn are linearly independent.


Exercise 14.85. — Let n ∈ N.

(a) Show that the Bessel function Yn of the second kind is well-defined and prove the asymp-
totics in (14.25).

(b) Assume a suitable generalization of differentiation under the integral for the improper
integrals and use it to prove that Yn is a solution to the Bessel differential equation
(14.23).

(c) For the proof of the appropriate generalization of differentiation under the integral, consider the real-valued function

f(x, t) = ( exp(nt) + (−1)^n exp(−nt) ) exp(−x sinh(t))

on (0, ∞) × [1, ∞) and

F(x, s) = 0 if s = 0,   F(x, s) = f(x, s⁻¹)/s² if s > 0

on (0, ∞) × [0, 1]. Show that

∫_1^∞ f(x, t) dt = ∫_0^1 F(x, s) ds

holds and that F satisfies all the conditions of Theorem 14.80.



Index

1-norm, 31
algebraically closed, 69
Basel problem, 126
beta function, 128
Cauchy sequence, 10
closed, 14
closure, 16
compact, 21
complete, 10
completion, 11, 111
connected, 27
connected component, 27
contraction, 19
converges, 7
Dehn Invariant, 115
diffeomorphism, 88
differentiable, 45, 55
differential, 45
discrete metric, 6
distance, 5
eccentricity, 130
ellipse, 130
ellipsoid, 120
Equivalence of norms, 36
Euclidean norm, 35
fixed point, 19
Gaussian curve, 126
Gram determinant, 120
Hessian matrix, 57
infinity norm, 31
integrability condition, 77
integrability conditions, 77
interior, 16
Jacobian matrix, 50
Laplacian, 57
limit, 7
Lipschitz constant, 19
Lipschitz continuous, 19
loop, 76
Lower sums, 99
Manhattan metric, 6
maximum norm, 31
measure theory, 99, 111
metric, 5
metric space, 5
norm, 31
Norm: Euclidean, 35; induced, 35
Normal distribution, 126
normal space, 92
Null set, 100
open, 14
parameter integral, 129
Partition, 97
points, 5
Riemann integral, 99
Schwarz, Theorem of, 56
seminorm, 111
sequence, 7
smooth, 55
space, 5
standard metric, 5
standard simplex, 128
Step function, 98
subsequence, 8
support, 116
tangent space, 92
term, 7
topological boundary, 16
topology, 14
uniformly continuous, 19
Upper sums, 99
vector field, 73



Bibliography

[ACa2003] N. A’Campo, A natural construction for the real numbers arXiv preprint 0301015,
(2003)

[Apo1983] T. Apostol, A proof that Euler missed: Evaluating ζ(2) the easy way The Mathe-
matical Intelligencer 5 no.3, p. 59–60 (1983)

[Aig2014] M. Aigner and G. M. Ziegler, Das BUCH der Beweise Springer, (2014)

[Amm2006] H. Amann and J. Escher, Analysis I, 3rd edition, Grundstudium Mathematik, Birkhäuser Basel, (2006)

[Bla2003] C. Blatter, Analysis I ETH Skript, https://people.math.ethz.ch/ blatter/dlp.html


(2003)

[Bol1817] B. Bolzano, Rein analytischer Beweis des Lehrsatzes, daß zwischen je zwei Werthen,
die ein entgegengesetztes Resultat gewähren, wenigstens eine reelle Wurzel der Gleichung
liege, Haase Verl. Prag (1817)

[Boo1847] G. Boole, The mathematical analysis of logic Philosophical library, (1847)

[Can1895] G. Cantor, Beiträge zur Begründung der transfiniten Mengenlehre Mathematische


Annalen 46 no.4, 481–512 (1895)

[Cau1821] A.L. Cauchy, Cours d’analyse de l’école royale polytechnique L’Imprimerie Royale,
Debure frères, Libraires du Roi et de la Bibliothèque du Roi. Paris, (1821)

[Ded1872] R. Dedekind, Stetigkeit und irrationale Zahlen Friedrich Vieweg und Sohn, Braun-
schweig (1872)

[Die1990] J. Dieudonné, Elements d’analyse Editions Jacques Gabay (1990)

[Hat02] A. Hatcher, Algebraic Topology Cambridge University Press (2002)

[Hil1893] D. Hilbert, Über die Transzendenz der Zahlen e und π Mathematische Annalen 43,
216-219 (1893)

[Hos1715] G.F.A. Marquis de l’Hôpital, Analyse des Infiniment Petits pour l’Intelligence des
Lignes Courbes 2nde Edition, F. Montalant, Paris (1715)


[Lin1894] E. Lindelöf, Sur l’application des méthodes d’approximations successives à l’étude


des intégrales réelles des équations différentielles ordinaires Journal de mathématiques pures
et appliquées 10 no.4, 117–128 (1894)

[Rus1903] B. Russell, The principles of mathematics WW Norton & Company, (1903)

[Rot88] J. J. Rotman, An introduction to Algebraic Topology Graduate Texts in Mathematics


119 Springer 1988

[Smu1978] R. Smullyan, What is the name of this book? Prentice-Hall, (1978)

[Zag1990] D. Zagier, A one-sentence proof that every prime p ≡ 1 mod 4 is a sum of two
squares. Amer. Math. Monthly 97, no.2, p. 144 (1990)
