Analysis II Script v1
Joaquim Serra
Preface
These notes are the continuation of ’Analysis I: One variable’ and follow the same format
and general spirit.
Originally crafted in German for the academic year 2016/2017 by Manfred Einsiedler
and Andreas Wieser, they were designed for the Analysis I and II courses in the Interdis-
ciplinary Natural Sciences, Physics, and Mathematics Bachelor programs. In the academic
year 2019/2020, a substantial revision was undertaken by Peter Jossen.
For the academic year 2023/2024, Joaquim Serra has developed this English version. It
differs from the German original in several aspects: reorganization and alternative proofs of
some materials, rewriting and expansion in certain areas, and a more concise presentation in
others. This version strictly aligns with the material presented in class, offering a streamlined
educational experience.
The courses Analysis I/II and Linear Algebra I/II are fundamental to the mathematics
curriculum at ETH and other universities worldwide. They lay the groundwork upon which
most future studies in mathematics and physics are built.
Throughout Analysis I/II, we will delve into various aspects of differential and integral
calculus. Although some topics might be familiar from high school, our approach requires
minimal prior knowledge beyond an intuitive understanding of variables and basic algebraic
skills. Contrary to high-school methods, our lectures emphasize the development of mathemat-
ical theory over algorithmic practice. Understanding and exploring topics such as differential
equations and multidimensional integral theorems is our primary goal. However, students are
encouraged to engage with numerous exercises from these notes and other resources to deepen
their understanding and proficiency in these new mathematical concepts.
9 Metric spaces 2
9.1 Basics of Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
9.1.1 The Euclidean space Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
9.1.2 Definition of metric Space . . . . . . . . . . . . . . . . . . . . . . . . . . 5
9.1.3 Sequences, limits, and completeness . . . . . . . . . . . . . . . . . . . . 7
9.1.4 *The Reals as the Completion of Rationals (extra material; cf. Grund-
strukturen) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
9.2 Topology of Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 14
9.2.1 Open and closed sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
9.2.2 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
9.2.3 Banach’s Fixed-Point Theorem . . . . . . . . . . . . . . . . . . . . . . . 19
9.2.4 Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
9.2.5 Compactness and continuity . . . . . . . . . . . . . . . . . . . . . . . . . 26
9.2.6 Connectedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
9.3 Normed vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
9.3.1 Definition of Normed Vector spaces . . . . . . . . . . . . . . . . . . . . . 31
9.3.2 Inner product spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
9.3.3 Equivalence of norms in finite dimensional normed spaces . . . . . . . . 35
9.3.4 The space of bounded continuous functions with values in Rm . . . . . . 38
9.3.5 The Length of a Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
10 Multidimensional Differentiation 44
10.1 The Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
10.1.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
10.1.2 The Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
10.1.3 The Mean Value Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 53
10.2 Higher Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
10.2.1 Definition and basic properties . . . . . . . . . . . . . . . . . . . . . . . 55
10.2.2 Schwarz’s Theorem and Multi-index notation . . . . . . . . . . . . . . . . 56
10.2.3 Multidimensional Taylor Approximation . . . . . . . . . . . . . . . . . . 58
11.1.1 Critical Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
11.1.2 Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
11.1.3 Extrema with Constraints and Lagrange Multipliers . . . . . . . . . . . 65
11.2 Relevant examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
11.2.1 Operator norm of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . 68
11.2.2 Fundamental Theorem of Algebra . . . . . . . . . . . . . . . . . . . . . . 69
11.2.3 Diagonalizability of Symmetric Matrices . . . . . . . . . . . . . . . . . . 71
11.3 Potentials and the equation Du = F . . . . . . . . . . . . . . . . . . . . 73
11.3.1 The work of a Vector Field along a line . . . . . . . . . . . . . . . . . . 73
11.3.2 The Poincaré Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Chapter 9
Metric spaces
9.1 Basics of Metric Spaces
9.1.1 The Euclidean space Rn
Rn is a vector space over the field of real numbers with the coordinate-wise addition and
multiplication by a scalar. Rigorously, given x, y ∈ Rn and λ ∈ R we have
x + y := (x1 + y1 , . . . , xn + yn ), λx := (λx1 , . . . , λxn ).
For all x, y, z ∈ Rn
∥x − z∥ ≤ ∥x − y∥ + ∥y − z∥.
Therefore, the Proposition will follow if we can establish the validity of (9.1), or (squaring
it) of:
\[ \Big(\sum_{i=1}^{n} x_i y_i\Big)\Big(\sum_{j=1}^{n} x_j y_j\Big) \;=\; \sum_{i,j=1}^{n} x_i x_j y_i y_j \;\le\; \sum_{i,j=1}^{n} x_i^2 y_j^2. \]
But this last inequality is easily established summing over all pairs i, j ∈ {1, . . . , n} the
inequalities
\[ 2 x_i x_j y_i y_j \le x_i^2 y_j^2 + x_j^2 y_i^2 \iff (x_i y_j - x_j y_i)^2 \ge 0; \]
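As a quick numerical sanity check of the squared Cauchy–Schwarz inequality above (this sketch is purely illustrative and not part of the proof; the helper name `dot` is ours):

```python
import random

def dot(x, y):
    """Standard inner product sum_i x_i * y_i on R^n."""
    return sum(a * b for a, b in zip(x, y))

random.seed(0)
for _ in range(1000):
    n = random.randint(1, 10)
    x = [random.uniform(-5, 5) for _ in range(n)]
    y = [random.uniform(-5, 5) for _ in range(n)]
    # (sum_i x_i y_i)^2 <= (sum_i x_i^2)(sum_j y_j^2), up to rounding error
    assert dot(x, y) ** 2 <= dot(x, x) * dot(y, y) + 1e-9
```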
9.4. — A metric d on a set X assigns to each pair of points their distance. In this
interpretation, the definiteness condition states that the only point at zero distance from a
given point x ∈ X is x itself. The symmetry condition states that the distance from x ∈ X to
y ∈ X is the same as from y to x. Interpreting the distance between two points as the length
of a shortest path from one point to the other, the triangle inequality states that the length
of a shortest path from x to z is at most the length of the path obtained by first going from
x to y and then from y to z.
9.5. — When there is no possible confusion, we will often say “Let X be a metric space...”,
leaving the distance function unspecified. This is a shorter version of the more precise sentence
“Let (X, d) be a metric space...”.
Furthermore, we may refer to the set X as a space and the elements of X as points.
This is because we have in mind that X is some sort of geometric space, like a subset of the
plane or the surface of a sphere. In this setting, “spaces” and “points” will be synonymous
with “sets” and “elements”.
9.6. — Notice that the Euclidean space (Rn , d), with d(x, y) := ∥x − y∥, is a metric space.
In particular R, equipped with the absolute value distance |x − y| is a metric space.
Exercise 9.7. — Let (X, d) be a metric space and let ϕ : [0, ∞) → [0, ∞) be a function which
is concave, increasing, satisfies ϕ(0) = 0, and is not identically zero. Show that (X, ϕ ◦ d) is
again a metric space. For example, one can take ϕ(t) = √t, ϕ(t) = arctan t, or ϕ(t) = t/(1 + t).
Notice that the last two choices always give bounded distances.
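A small numerical sketch of the exercise’s claim, under the illustrative choice ϕ(t) = t/(1 + t) applied to the standard distance on R (the function names are ours):

```python
import random

def d(x, y):
    """Standard distance on R."""
    return abs(x - y)

def phi(t):
    """Concave, increasing, phi(0) = 0, not identically zero."""
    return t / (1 + t)

random.seed(1)
for _ in range(1000):
    x, y, z = (random.uniform(-100, 100) for _ in range(3))
    # triangle inequality for phi∘d, up to rounding error
    assert phi(d(x, z)) <= phi(d(x, y)) + phi(d(y, z)) + 1e-12
    assert phi(d(x, z)) < 1  # this choice of phi gives a bounded metric
```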
Let X be any set and define the discrete distance d(x, y) := 0 if x = y and d(x, y) := 1 if x ̸= y,
for x, y ∈ X. Then, (X, d) is a metric space. Indeed, d is definite and symmetric by definition.
Furthermore, d satisfies the triangle inequality: Let x, y, z be points in X. If d(x, z) = 0, then
d(x, z) ≤ d(x, y) + d(y, z) is trivially satisfied. If d(x, z) = 1, then x ̸= z, so y differs from at
least one point of {x, z}; hence the right-hand side is at least 1 and the triangle inequality also
holds. This metric d is called
the discrete metric on the set X.
On R², define dNY(x, y) := |x1 − y1 | + |x2 − y2 |, where we put x = (x1 , x2 ) and y = (y1 , y2 ).
It can be verified (exercise) that dNY satisfies all
axioms of a metric. The reason why dNY is called the Manhattan metric is that in grid-like
places such as Manhattan, one can go from (x1 , x2 ) to (y1 , y2 ) in the following way: first
move ‘horizontally’ (i.e., with constant second coordinate) from x = (x1 , x2 ) to (y1 , x2 ), and
then ‘vertically’ (with constant first coordinate) from (y1 , x2 ) to y = (y1 , y2 ); or vice versa:
from x = (x1 , x2 ) to (x1 , y2 ), and then to y = (y1 , y2 ). Since all streets in Manhattan run either
from west to east or from north to south, dNY measures the relevant distance between two
points.
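The two grid paths just described have the same dNY-length, which a few lines of Python make concrete (the formula is the one above; the names are ours):

```python
def d_ny(x, y):
    """Manhattan metric on R^2: sum of coordinate-wise distances."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

x, y = (1.0, 2.0), (4.0, 6.0)
corner_h = (y[0], x[1])  # horizontal leg first, then vertical
corner_v = (x[0], y[1])  # vertical leg first, then horizontal
assert d_ny(x, corner_h) + d_ny(corner_h, y) == d_ny(x, y)
assert d_ny(x, corner_v) + d_ny(corner_v, y) == d_ny(x, y)
```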
Exercise 9.11. — Let X be the set of all continuous real-valued functions defined on
[0, 1] ⊂ R. For f, g ∈ X set
\[ d_1(f, g) := \max\{ |f(x) - g(x)| : x \in [0, 1] \} \qquad \text{and} \qquad d_2(f, g) := \int_0^1 |f(x) - g(x)| \, dx. \]
Show that d1 and d2 are both distances on X.
Example 9.12. — If (X, d) is a metric space and X0 ⊂ X is some subset, then X0 inherits
a structure of metric space from X. Indeed, one can easily verify that (X0 , d0 ), where d0 is
the restriction of d to X0 × X0 ⊂ X × X, is a metric space.
For a more concrete instance of this, take X = R³ with the Euclidean distance d and let
X0 be the sphere {x ∈ R³ | x₁² + x₂² + x₃² = R²}, for some R > 0. Then, for any pair of points
x, y ∈ X0 , the restricted distance d0 (x, y) = ∥x − y∥ is the length of the chord joining them.
An arguably more natural metric d1 on the sphere X0 can be defined by measuring the length
of the geodesic arc joining x and y. One can see that this metric is given by:
\[ d_1(x, y) = R \arccos\Big( \frac{x \cdot y}{R^2} \Big) \in [0, \pi R]. \]
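One can compare the chord distance d0 and the geodesic distance d1 numerically; the sketch below uses only the two formulas above (the helper names are ours, and the clamp guards against floating-point rounding):

```python
import math

def chord(x, y):
    """d0: Euclidean (chord) distance inherited from R^3."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def geodesic(x, y, R):
    """d1(x, y) = R * arccos(x . y / R^2), the great-circle distance."""
    s = sum(a * b for a, b in zip(x, y)) / R ** 2
    return R * math.acos(max(-1.0, min(1.0, s)))  # clamp for rounding safety

R = 2.0
north, south = (0.0, 0.0, R), (0.0, 0.0, -R)
assert math.isclose(chord(north, south), 2 * R)              # the diameter
assert math.isclose(geodesic(north, south, R), math.pi * R)  # half circumference
```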
Let (X, d) be a metric space, x ∈ X and (xn )n∈N be a sequence in X. We say that
(xn )n∈N converges to x, or that x is the limit of the sequence (xn )n∈N , if
\[ \lim_{n\to\infty} d(x_n, x) = 0. \]
9.16. — When the metric space (X, d) is clear from the context, we may write limn xn = x
or even xn → x to express that (xn )n∈N converges to x.
Exercise 9.17. — In the setting of Exercise 9.7, show that a sequence converges in (X, d)
if and only if it converges in (X, ϕ ◦ d).
Proof. Let (X, d) be a metric space and let A, B ∈ X be limits of the same sequence (x_n)_{n=0}^∞;
we want to show that A = B. Take ε > 0. Then we can find N_A, N_B ∈ N such that d(x_n, A) < ε/2
for all n ≥ N_A, and d(x_n, B) < ε/2 for all n ≥ N_B. Then, for N := max{N_A, N_B}, we have
\[ d(A, B) \le d(A, x_N) + d(x_N, B) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon, \]
where we used the triangle inequality. Since ε > 0 was arbitrary, it follows that d(A, B) = 0,
and thus A = B by the definiteness of d.
Let (x_n)_{n=0}^∞ be a sequence in a set X.
A subsequence of (x_n)_{n=0}^∞ is a sequence of the form (x_{f(k)})_{k=0}^∞, where f : N → N
is a strictly increasing map.
Proof. We first prove the “only if” part. Let (xf (n) )n∈N be a subsequence, i.e., f : N → N is
some strictly increasing map. Given ε > 0 there is N such that d(xn , x) < ε for all n ≥ N .
Hence, d(xf (n) , x) < ε for all n ≥ N , as f (n) ≥ n.
We now prove the “if” part: we can simply use that (x_n)_{n≥0} is a subsequence of itself
(i.e., f(n) = n), and hence it converges to x.
9.22. — A stronger version of the previous Lemma that is useful in some contexts asserts
the following: a sequence (xn )n∈N in a metric space converges to x if and only if every
partial sequence (xf (n) )n∈N (f increasing) has a sub-subsequence (xg(f (n)) )n∈N (g increasing)
converging to x.
While the proof of the “only if” part is similar (using g(f(n)) ≥ n), the “if” part is less trivial
than in the previous lemma. One can argue by contraposition: if x_n does not converge to x, then
\[ \exists \varepsilon > 0 \;\; \forall N \in \mathbb{N} \;\; \exists n \ge N : \; d(x_n, x) \ge \varepsilon. \]
In other words, the set {n ∈ N | d(x_n, x) ≥ ε} is infinite, and hence there is an increasing sequence
(nk )k≥0 such that d(xnk , x) ≥ ε. Notice that any sub-subsequence will still remain at distance
≥ ε from x and hence will not converge to x.
This stronger version can be used, for example, to prove that a continuous function f :
[0, 1] → R has a unique minimum point if and only if all sequences (x_n)_{n≥0} ⊂ [0, 1] such that
f(x_n) → min_{[0,1]} f converge to the same limit point.
Proof. Let {x_k}_{k∈N} ⊂ Rn be a sequence. For j = 1, . . . , n, we denote by x_{k,j} the j-th
component of the vector x_k.
Assume that xk → x ∈ Rn . By definition, given ε > 0 and any j, it holds
\[ |x_{k,j} - x_j| \le \Big( \sum_{i=1}^{n} (x_{k,i} - x_i)^2 \Big)^{1/2} = \lVert x_k - x \rVert \le \varepsilon \quad \text{eventually in } k. \]
This proves that, for each j = 1, . . . , n, xk,j → xj when k → ∞ (as sequences of real numbers,
with the standard absolute value distance).
Assume now that for each j = 1, . . . , n it holds x_{k,j} → x_j as k → ∞. Then, given ε > 0,
for each j there exists N_j ∈ N such that
\[ |x_{k,j} - x_j| < \frac{\varepsilon}{n} \quad \text{for all } k \ge N_j. \]
We introduce the concept of completeness for metric spaces. This concept does not
conflict with the notion of completeness that we gave for R. We will soon show that R, as
well as C, is complete as a metric space. In contrast, the metric space Q is not complete.
A sequence (x_n)_{n=0}^∞ in a metric space (X, d) is a Cauchy sequence if, for every ε > 0,
there exists N ∈ N such that d(x_m, x_n) < ε for all pairs of integers m, n with m, n ≥ N.
Exercise 9.25. — Prove the following elementary facts about Cauchy sequences in a metric
space (X, d):
Example 9.27. — The interval (0, 1) ⊂ R, endowed with the standard distance d(x, y) =
|x − y|, is not complete. However, N ⊂ R is complete, as well as [0, ∞).
Exercise 9.28. — Show that Q, with the distance inherited from the standard distance
on R, is not a complete metric space.
Exercise 9.29. — Show that the space X of all bounded sequences (x_n)_{n≥0} of real numbers,
equipped with the distance
\[ d\big( (x_n)_{n\ge 0}, (y_n)_{n\ge 0} \big) = \sup_{n \ge 0} |x_n - y_n|, \]
is complete. Show also that the subspace X0 of sequences with only finitely many non-zero
terms is not complete.
Proof. Similarly to the proof of Lemma 9.23, a sequence (x_k) in Rn is Cauchy if and only if the
coordinate sequences x_{k,j}, j = 1, . . . , n, are Cauchy (in R). It then follows from Theorem 2.124
(Cauchy sequences in R converge) that x_k converges coordinate-wise. Thus, by Lemma 9.23,
x_k converges in Rn.
9.31. — Completion of metric space (extra material) Let (X, d) be a metric space. We
write CX for the set of all Cauchy sequences in X and define an equivalence relation on CX by
\[ (x_n)_{n=0}^{\infty} \sim (y_n)_{n=0}^{\infty} \iff \lim_{n\to\infty} d(x_n, y_n) = 0. \]
The set X̄ of equivalence classes, equipped with the distance
\[ \bar d\big( [(x_n)_{n=0}^{\infty}], [(y_n)_{n=0}^{\infty}] \big) = \lim_{n\to\infty} d(x_n, y_n), \]
is called the completion of (X, d). The injection ι : X → X̄, mapping x ∈ X to the class of
the constant sequence with value x, is called the canonical embedding. For all x, y ∈ X,
we have
\[ d(x, y) = \bar d(\iota(x), \iota(y)), \]
Exercise 9.32. — Show that the objects introduced in 9.31 are well-defined. In particular,
verify that d̄ is indeed a metric on X̄.
Exercise 9.33. — As the name suggests, the completion (X̄, d̄) of a metric space is com-
plete, meaning that every Cauchy sequence in X̄ converges. A sequence in X̄ is essentially a
sequence of sequences, i.e.,
\[ \big( [(x_{m,n})_{n=0}^{\infty}] \big)_{m=0}^{\infty}. \]
Exercise 9.34. — Let (X, d) be a metric space with completion (X̄, d̄). Let (Y, dY) be a
complete metric space, and let f : X → Y be a function such that
\[ d(x, y) = d_Y(f(x), f(y)) \]
for all x, y ∈ X. Show that there exists a unique function f̄ : X̄ → Y such that f = f̄ ◦ ιX
and
\[ \bar d(x, y) = d_Y(\bar f(x), \bar f(y)) \]
for all x, y ∈ X̄. This can be interpreted as: “X̄ is the smallest complete metric space
containing X.”
9.1.4 *The Reals as the Completion of Rationals (extra material; cf. Grund-
strukturen)
In the first semester, we defined R as any complete ordered field, postulating its existence
(Definitions 2.18 and 2.19). The idea of completion of metric spaces allows one to easily
construct a model of R. This construction shows, in particular, the existence of a complete
ordered field. One can also prove, with a bit of patience (it is not hard to do so), that there
is actually only one model of R, in the sense that any two complete ordered fields must be
isomorphic.
The completion of Q serves as a model for a field of real numbers. First, note that the
construction of the completion of Q does not necessarily require a field of real numbers (as
the target space for the standard metric on Q). The set of all Cauchy sequences C in Q is the
set of all sequences of rational numbers (q_n)_{n=0}^∞ such that, for every rational ε > 0, there
exists N ∈ N with |q_m − q_n| < ε for all m, n ≥ N. The subset of null sequences
\[ N = \big\{ (q_n)_{n=0}^{\infty} \in C \;\big|\; \lim_{n\to\infty} q_n = 0 \big\} \]
is a linear subspace of C, and we define
\[ R = C/{\sim} = C/N \]
in the sense of linear algebra. Thus, R is a vector space over Q. The injective linear map
ι : Q → R, assigning to q ∈ Q the class of the constant sequence with value q, is called the
canonical embedding. From now on, elements of R are called real numbers, and we consider
Q as a subset of R via the canonical embedding ι.
We define a product on R by component-wise multiplication. That is, for elements
x = [(p_n)_{n=0}^∞] and y = [(q_n)_{n=0}^∞], we define
\[ x \cdot y = [(p_n q_n)_{n=0}^{\infty}]. \]
It can be verified that this gives a well-defined commutative operation on R, satisfying the
distributive law with respect to addition, and compatible with the multiplication of rational
numbers via the canonical embedding. In particular, 1_R = ι(1) = [(1)_{n=0}^∞] is the multiplicative
identity in R. If x = [(q_n)_{n=0}^∞] is non-zero, then (q_n)_{n=0}^∞ is a Cauchy sequence in Q that does
not converge to zero. Therefore, q_n ̸= 0 for all but finitely many n ∈ N. The class of the
sequence (p_n)_{n=0}^∞ given by
\[ p_n = \begin{cases} 1 & \text{if } q_n = 0, \\ q_n^{-1} & \text{otherwise,} \end{cases} \]
serves as a multiplicative inverse for x. This shows that R is a field with the given operations.
We use the usual order relation on Q to construct an order relation on R. For elements
x = [(p_n)_{n=0}^∞] and y = [(q_n)_{n=0}^∞] in R, we declare
\[ x \le y \]
if there exists a sequence (r_n)_{n=0}^∞ ∈ N such that p_n − r_n ≤ q_n for all n ∈ N. It is left to
the diligent reader to verify that this indeed defines a well-defined order relation on R that
is compatible with the field structure on R. Thus, R is equipped with the structure of an
ordered field.
It remains to show that the ordered field R is complete in the sense of Definition ??. It
is easy to see that R satisfies the Archimedean Principle: Let x = [(q_n)_{n=0}^∞] ∈ R be positive.
Then, (q_n)_{n=0}^∞ is not a null sequence. Thus, there exists a k ∈ N such that |q_n| > 2^{−k} for
infinitely many n ∈ N. However, (q_n)_{n=0}^∞ is also a Cauchy sequence, so there exists N ∈ N
such that m, n ≥ N implies |q_n − q_m| < 2^{−k−1}. This shows that |q_n| > 2^{−k−1}, and even
q_n > 2^{−k−1} for all but finitely many n ∈ N, since x > 0. This demonstrates ι(2^{−k−1}) ≤ x, or
simply 2^{−k−1} ≤ x, as we consider Q as a subset of R. Thus, the Archimedean Principle holds,
as stated in Corollary ??. Now, let X, Y ⊂ R be non-empty subsets such that x ≤ y for all
x ∈ X and y ∈ Y. We want to find a real number z = [(r_n)_{n=0}^∞] ∈ R between X and Y. To
do this, we first choose arbitrary a_0, b_0 ∈ Q such that [a_0, b_0] ∩ X ̸= ∅ and [a_0, b_0] ∩ Y ̸= ∅,
and set r_0 = (a_0 + b_0)/2. If x ≤ r_0 ≤ y for all x ∈ X and y ∈ Y, we set z = r_0 and we are
done. Otherwise, we define a_1 and b_1 as
\[ a_1 = a_0 \text{ and } b_1 = r_0 \quad \text{if } [a_0, r_0] \cap X \neq \emptyset \text{ and } [a_0, r_0] \cap Y \neq \emptyset, \]
\[ a_1 = r_0 \text{ and } b_1 = b_0 \quad \text{if } [r_0, b_0] \cap X \neq \emptyset \text{ and } [r_0, b_0] \cap Y \neq \emptyset, \]
and set r_1 = (a_1 + b_1)/2. By continuing this process, we either find an r_n such that
x ≤ r_n ≤ y for all x ∈ X and y ∈ Y, and we set z = r_n, or we obtain sequences (a_n)_{n=0}^∞,
(b_n)_{n=0}^∞, and (r_n)_{n=0}^∞, and we set
\[ z = [(a_n)_{n=0}^{\infty}] = [(r_n)_{n=0}^{\infty}] = [(b_n)_{n=0}^{\infty}]. \]
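The bisection above can be carried out concretely with exact rational arithmetic. The sketch below is our own illustration: it takes X = {q ∈ Q : q² < 2} and Y = {q ∈ Q : q > 0, q² > 2}, so the midpoints r_n form a Cauchy sequence of rationals whose class in R = C/N represents √2.

```python
from fractions import Fraction

# Bisection as in the construction above, for X = {q : q^2 < 2} and
# Y = {q > 0 : q^2 > 2}, starting from [a0, b0] = [1, 2].
a, b = Fraction(1), Fraction(2)
for _ in range(30):
    r = (a + b) / 2
    if r * r < 2:
        a = r  # [r, b] still meets both X and Y
    else:
        b = r  # [a, r] still meets both X and Y

assert a * a < 2 < b * b              # the invariant is preserved
assert b - a == Fraction(1, 2 ** 30)  # the interval is halved at each step
```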
Given x ∈ X and r > 0, we write B(x, r) := {y ∈ X | d(x, y) < r} and refer to the set B(x, r)
as the open ball with center x and radius r.
9.37. — In particular, ∅ and X are always both open and closed. In general, a subset U ⊂ X
need be neither open nor closed.
A set is termed “clopen” if it is both open and closed. It is not true in general that the only
clopen sets in a space are the empty set ∅ and the space itself X. For illustration, take the
space X = (0, 1) ∪ (2, 3), equipped with the standard metric from R. Here, the intervals (0, 1)
and (2, 3) are clopen: they are open and closed in X. This example underscores the presence
of other clopen sets beyond just ∅ and X. The significance of clopen sets will become more
apparent in our discussions on connectedness. As we will see, connected spaces are precisely
characterized by the absence of nontrivial (neither empty nor the whole space) clopen sets.
9.38. — Consider the set X = [0, 2], equipped with the standard metric inherited from
R. In this context, the subset [0, 1) is open within X (an exercise worth verifying). However,
when considered as a subset of the whole of R, [0, 1) is neither open nor closed. This example
illustrates that statements regarding the openness of a set like [0, 1) require clarity about the
ambient space (X, d) being considered. In practice, though, such nuances are often glossed
over when the context is clear, and delving into these subtleties is usually unnecessary for
typical discussions.
Proof. Set
\[ U = \bigcup_{i \in I} U_i \]
and let x ∈ U . Then there exists i ∈ I with x ∈ Ui , and since Ui is open, there exists an
ε > 0 such that B(x, ε) ⊂ Ui , implying B(x, ε) ⊂ U . Thus, U is open. Finally, let (Ui )i∈I be
a finite family of open subsets of X. Set
\[ U = \bigcap_{i \in I} U_i \]
and let x ∈ U . Then x ∈ Ui for all i ∈ I, and for each i ∈ I, there exists εi > 0 such that
B(x, εi ) ⊂ Ui . For ε := min{εi | i ∈ I}, we have ε > 0 and B(x, ε) ⊂ Ui for all i ∈ I. Thus,
B(x, ε) ⊂ U , completing the proof.
Proof. Apply Proposition 9.40 to the (open) complements of the closed sets.
Example 9.42. — The intersection of infinitely many open sets may not be open. Take for
example R with the standard metric. The intersection of the family of open sets {(−1/n, 1/n) | n ≥ 1}
is {0}, which is not open. Taking complements, one obtains an example where an infinite
union of closed sets is not closed.
Definition 9.43:
Let X be a metric space and E ⊂ X. We define:
• the interior E◦ of E, the largest open set contained in E;
• the closure E̅ of E, the smallest closed set containing E;
• the boundary ∂E := E̅ \ E◦.
Exercise 9.44. — Using Proposition 9.40, prove that E ◦ is always open while E and ∂E
are always closed.
Exercise 9.45. — For balls B(x, r) in the Euclidean space Rn, prove that the closure of B(x, r)
equals {y ∈ Rn | d(x, y) ≤ r}, and deduce ∂B(x, r) = {y ∈ Rn | d(x, y) = r}. (In a general
metric space, only the inclusion “⊂” holds: consider the discrete metric.)
(1) A subset U ⊂ X is open if and only if, for every convergent sequence in X with a
limit in U, the sequence eventually lies in U.
(2) A subset A ⊂ X is closed if and only if, for every convergent sequence (x_n)_{n=0}^∞ in
X with x_n ∈ A for all n ∈ N, the limit also lies in A. In other words, if and only
if A coincides with the set of all its accumulation points.
Proof. Let U ⊂ X be an open subset of X, and let (x_n)_{n=0}^∞ be a sequence in X with a limit x
in U. Then, there exists ε > 0 such that B(x, ε) ⊂ U, and since (x_n)_{n=0}^∞ converges to x, there
exists an N ∈ N such that x_n ∈ B(x, ε) ⊂ U for all n ≥ N. Conversely, let V ⊂ X be a non-open
subset. Then there exists a point x ∈ V such that B(x, ε) \ V ̸= ∅ for every ε > 0. For each
n ∈ N, we can find x_n ∈ B(x, 2^{−n}) \ V. The sequence (x_n)_{n=0}^∞ converges to x ∈ V,
yet satisfies x_n ∉ V for every n ∈ N. This completes the proof of the first statement.
Let A ⊂ X be closed, and let (x_n)_{n=0}^∞ be a convergent sequence in X with x_n ∈ A for all
n ∈ N. Let x be the limit of the sequence (x_n)_{n=0}^∞. Then, U = X \ A is open and cannot
contain the limit x of (x_n)_{n=0}^∞, as otherwise almost all elements of the sequence would
have to lie in U. Therefore, the limit x belongs to A. Finally, suppose A ⊂ X is not closed.
Then U = X \ A is not open, and according to the previous argument, there exists a sequence
(x_n)_{n=0}^∞ in A = X \ U with a limit x ∈ U.
Exercise 9.47. — Let (X, d) be a complete metric space and E ⊂ X a closed subset.
Show that E is complete as well.
Proof. Notice that we can rewrite the definition of convergence as follows: x_n → x if and only
if, for all ε > 0, x_n eventually lies in B(x, ε). Now, if U is any open set containing x, then
by definition of open set there exists ε > 0 such that B(x, ε) ⊂ U; hence x_n → x implies
that x_n eventually lies in U, establishing the “only if” direction. For the “if” direction, we just
note that for a given ε > 0 we can take U = B(x, ε) (open balls are open), and hence x_n
eventually lies in B(x, ε).
Let X be a set endowed with two different distances d1 and d2 . Then (X, d1 ) and (X, d2 )
have the same convergent sequences if and only if the topologies generated by d1 and d2
coincide.
Proof. By Corollary 9.48, the notion of convergent sequence only depends on the collection of
open sets; that is, it depends on the distance only through the topology it generates. Hence
distances generating the same open sets have the same convergent sequences. Conversely, by
the sequential characterization of open sets, distances with the same convergent sequences
generate the same topology.
9.2.2 Continuity
We now aim to generalize the concept of continuity to functions defined between metric
spaces.
(1) We say f is ε-δ continuous if, for all x ∈ X and ε > 0, there exists a δ > 0
such that, for all x′ ∈ X, dX(x, x′) < δ implies dY(f(x), f(x′)) < ε. In other words,
f(B(x, δ)) ⊂ B(f(x), ε).
(2) We say f is sequentially continuous if, for every convergent sequence (x_n)_n in
X with limit x = lim_{n→∞} x_n, the sequence (f(x_n))_n converges in Y, with
f(x) = lim_{n→∞} f(x_n).
(3) We say f is topologically continuous if, for every open subset U ⊂ Y , the
preimage f −1 (U ) = {x ∈ X | f (x) ∈ U } is open in X.
Proof. (1) =⇒ (2): Let (x_n)_{n=0}^∞ be a convergent sequence in X with limit x ∈ X, and let
ε > 0. There exists a δ > 0 such that f(x′) ∈ B(f(x), ε) for all x′ ∈ B(x, δ). Since (x_n)_{n=0}^∞
converges to x, there exists an N ∈ N such that x_n ∈ B(x, δ) for all n ≥ N. In particular, for
n ≥ N, f(x_n) ∈ B(f(x), ε). Since ε > 0 was arbitrary, it follows that lim_{n→∞} f(x_n) = f(x),
and thus f is sequentially continuous as claimed.
¬(3) =⇒ ¬(2): Assume f is not topologically continuous. Then there exists U ⊂ Y open such
that f⁻¹(U) is not open. Therefore, by the sequential characterization of open sets, there is
x ∈ f⁻¹(U) and a sequence (x_n)_{n≥0} ⊂ X \ f⁻¹(U) with x_n → x. But then f(x) ∈ U while
f(x_n) ∈ Y \ U for all n; since U is open, (f(x_n)) cannot converge to f(x). We have thus found
a sequence such that x_n → x but f(x_n) does not converge to f(x), so f is not sequentially
continuous.
(3) =⇒ (1): Let x ∈ X and ε > 0. The preimage f −1 (B(f (x), ε)) contains the point x
and is open by assumption, as B(f (x), ε) ⊂ Y is open. Thus, there exists a δ > 0 such that
B(x, δ) ⊂ f −1 (B(f (x), ε)). Therefore, f is ε-δ-continuous as claimed.
Definition 9.53:
Let (X, dX ) and (Y, dY ) be metric spaces. We say that f : X → Y is
Exercise 9.54. — Let (X, d) be a metric space, and let E ⊂ X be a non-empty subset.
For x ∈ X, define
fE (x) = inf{d(x, z) | z ∈ E}.
Exercise 9.55. — Let (X, dX ) and (Y, dY ) be metric spaces. Assume that Y is complete.
Show that if E ⊂ X and f : E → Y is a uniformly continuous function defined on the subset E,
then there is a unique continuous extension f̄ : E̅ → Y, which is also uniformly continuous.
Proof. First, we show uniqueness of a putative fixed point. Let a ∈ X and b ∈ X be fixed
points of T . Then,
\[ d(a, b) = d(T(a), T(b)) \le \lambda \, d(a, b), \]
which forces d(a, b) = 0 since λ < 1, and hence a = b by definiteness.
Next, pick any x_0 ∈ X and define recursively x_{n+1} := T(x_n); we show that (x_n)_{n=0}^∞ is a
Cauchy sequence. Iterating the contractivity assumption we find that, for all integers p ≥ 0,
\[ d(x_p, x_{p+1}) = d(T^p(x_0), T^p(x_1)) \le \lambda^p \, d(x_0, x_1). \]
Pick now any integers m ≥ n ≥ N; then, using this observation and the triangle inequality,
we find
\[ d(x_n, x_m) \le \sum_{p=n}^{m-1} d(x_p, x_{p+1}) = \sum_{p=n}^{m-1} d(T^p(x_0), T^p(x_1)) \le \sum_{p=n}^{m-1} \lambda^p \, d(x_0, x_1) \le d(x_0, x_1) \sum_{p \ge N} \lambda^p = \frac{\lambda^N}{1-\lambda} \, d(x_0, x_1). \]
We crucially used that λ < 1 to sum the geometric series. Now, given any ε > 0, we can find
some N so large that λ^N d(x_0, x_1)/(1 − λ) < ε, thus proving that (x_n)_{n=0}^∞ is Cauchy
(the estimate is uniform in n, m as long as they are both larger than N!).
Now we use the completeness assumption to infer that x_n → a for some a ∈ X. Since T is
continuous, we have T(a) = lim_{n→∞} T(x_n) = lim_{n→∞} x_{n+1} = a, so a is a fixed point of T.
9.57. — We remark that the proof is constructive and, in concrete situations, can be imple-
mented as an algorithm to find approximate fixed points.
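The iteration x_{n+1} = T(x_n) from the proof can be sketched as follows, for a contraction on a subset of R (the stopping rule and the tolerance are our illustrative choices):

```python
import math

def fixed_point(T, x0, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = T(x_n) until consecutive iterates are tol-close,
    as in the proof of Banach's Fixed-Point Theorem."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence: is T a contraction on a complete space?")

# cos is a contraction on [0, 1], since |cos'| = |sin| <= sin(1) < 1 there,
# and it maps [0, 1] into itself; the iteration converges to its unique fixed point.
a = fixed_point(math.cos, 1.0)
assert abs(math.cos(a) - a) < 1e-10
```

Note that the number of iterations needed matches the geometric-series estimate of the proof: the error shrinks at least by the factor λ at each step.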
(2) A complete metric space (X, d) and an isometry (i.e., a mapping T : X → X with
d(T(x_1), T(x_2)) = d(x_1, x_2)) that has no fixed point, and an isometry that has exactly
3 fixed points.
9.2.4 Compactness
A closed and bounded interval of the real line is called a compact interval, as we saw
in Analysis I. We proved some fundamental properties of continuous functions on compact
intervals: boundedness, existence of maxima and minima, and uniform continuity. We intend
to investigate these and other properties in the broader context of metric spaces. We start by
giving a general definition of compactness that works in metric spaces.
Let us immediately clarify a possible source of confusion.
• In a general metric space, a closed and bounded subset need not be compact.
• Nevertheless, in Rn, which is the main focus of this course, it will turn out that a
closed and bounded set is indeed compact, and vice versa.
Interlude: “Cover”
Let E ⊂ X and let U = {Ui }i∈I be a family of subsets of X, where I is some set of
indices. We say that U covers E if
\[ E \subset \bigcup_{U \in \mathcal{U}} U = \bigcup_{i \in I} U_i. \]
(1) X is topologically compact: every family of open sets {Ui }i∈I that covers X has
a finite sub-family that still covers X.
(2) X is sequentially compact: every sequence (xn )n∈N in X has a subsequence that
is convergent in X.
(3) X is complete and totally bounded: for every r > 0, there exist finitely many
x1 , . . . , xn ∈ X such that the balls B(x1 , r), . . . , B(xn , r) cover X.
9.61. — The definition of topological compactness does not explicitly use the distance
function d, but it is formulated only in terms of the collection of open sets (i.e., the topology).
For this reason it is called “topological”.
9.62. — The Bolzano-Weierstrass Theorem ensures that a closed and bounded interval of
R is sequentially compact.
Example 9.63. — Q ∩ [0, 2], endowed with the standard distance, is not topologically
compact. Consider the covering
\[ \mathbb{Q} \cap [0, 2] = \big( \mathbb{Q} \cap [0, \sqrt{2}) \big) \cup \bigcup_{p \in \mathbb{Q},\, p > \sqrt{2}} \big( \mathbb{Q} \cap (p, 2] \big). \]
Any finite subfamily will miss some rationals slightly larger than √2.
Exercise 9.64. — Let X be a metric space. Show that if X is totally bounded, then it is
bounded, i.e., supx,x′ ∈X d(x, x′ ) < +∞.
Example 9.65. — The half-open interval X = (0, 1] ⊂ R is not compact. Indeed, the open
cover U = {(2−n , 1] | n ∈ N} has no finite subcover.
The main result of this section is Theorem 9.67, which shows that the definition of compact
metric space is indeed well-posed.
Exercise 9.66. — Show that a totally bounded metric space is bounded, and find an
example of a bounded metric space that is not totally bounded.
(3) X is totally bounded and complete (in the sense of Cauchy sequences).
Proof of Lemma 9.68. Assume X is compact, and let A be a collection of closed subsets of X
with an empty intersection. The collection of complements U = {X \ A | A ∈ A} is then an
Version: February 25, 2024.
Proof that (1) =⇒ (3). We first prove that X is totally bounded. Pick any r > 0, consider the
open covering U := {B(x, r) : x ∈ X} and extract a finite subcover {B(x1 , r), . . . , B(xN , r)}.
Now we prove that X must be complete, hence we pick a Cauchy sequence (xn )n∈N and
show that it has a limit point. For each k ≥ 0 there is n(k) so large that d(x_n, x_m) ≤ 2^{−k}
for all n, m ≥ n(k).
For each k, consider the closed balls A_k := B̄(x_{n(k)}, 2^{−k}); any finite intersection of them is
nonempty: indeed, for every k_1, . . . , k_N one has x_m ∈ A_{k_1} ∩ · · · ∩ A_{k_N}, provided
m ≥ max{n(k_1), . . . , n(k_N)}. Hence we can apply the Nesting Principle (see Proposition 9.68)
and find some z ∈ ⋂_{k≥0} A_k. We claim that x_n → z. Indeed, if m ≥ n(k), it holds
Proof of Lemma 9.69. Set f (0) equal to an arbitrary element of N0 . Then set inductively
f (k) := min{m ∈ Nk : m > f (k − 1)}, this set is non-empty because each Nk is infinite.
Proof that (3) =⇒ (2). We pick any sequence (x_n)_{n∈N} and show that it admits a Cauchy
subsequence; by completeness, this will prove (2).
By assumption, we can cover X by a finite number of balls of radius 1; it follows that
(xn )n∈N will frequently lie in one of these balls, let this ball be B(z1 , 1) for some z1 ∈ X.
Accordingly, we define the set of indices N0 := {j ∈ N : xj ∈ B(z1 , 1)}, which is infinite.
Now we proceed to do the same thing with the restricted sequence (x_n)_{n∈N_0}, but we shorten
the radius from 1 to 1/2. Accordingly, we find a ball B(z_2, 1/2) such that the set N1 := {j ∈
N0 : xj ∈ B(z2 , 1/2)} is infinite.
We proceed inductively, halving the radius each time, and construct a descending family
of infinite sets N ⊃ N0 ⊃ N1 ⊃ N2 ⊃ . . . with the property that
We apply Lemma 9.69 to these sets and find f : N → N strictly increasing such that f(k) ∈ N_k
for all k ≥ 0. Then, the subsequence (x_{f(k)})_{k∈N} is Cauchy: for n, m ≥ k it holds
Proof that (2) =⇒ (1). Assume U is an open covering of X, we want to construct a finite
open subcover out of it.
We start with a preliminary observation. Consider the function
\[ r(x) := \min\big\{ 1, \; \sup\{ r > 0 : B(x, r) \subset U \text{ for some } U \in \mathcal{U} \} \big\}. \]
Notice that r(x) > 0 since U is an open cover. For each x ∈ X we choose — once and for all —
some U(x) ∈ U which is almost optimal in the sense that B(x, r(x)/2) ⊂ U(x).
Let us now proceed with the construction of our finite subcover. Pick any U0 ∈ U. If
X ⊂ U0 then we are done, otherwise there is some x1 ∈ X \U0 . In this case we set U1 := U (x1 ).
Now we check if X ⊂ U0 ∪ U1 , in which case we have found our finite subcover. If not,
there is some x2 ∈ X \ (U0 ∪ U1 ) and we set U2 := U (x2 ).
Now we check again if X ⊂ U0 ∪ U1 ∪ U2 , in which case we have found our finite subcover.
If not, there is some x3 ∈ X \ (U0 ∪ U1 ∪ U2 ) and we set U3 := U (x3 ).
If this procedure stops at a certain point, it means that we have found our finite open
subcover. So let us assume that it goes on indefinitely, and find a contradiction. We obtain
a sequence (xn)n∈N with the property that xn ∈/ U0 ∪ U1 ∪ . . . ∪ Un−1 and Un = U(xn) for every n ≥ 1.
By assumption (2), we have xf(n) → z for a suitable subsequence. On the other hand, for
each n ≥ 0 we have z ∈/ Uf(n), and in particular z ∈/ B(xf(n), r(xf(n))/2). Combining this
information with xf(n) → z, we find r(xf(n)) ≤ 2 d(xf(n), z) → 0. On the other hand, recalling the definition
r(xf(n)) = sup{r ∈ (0, 1) : B(xf(n), r) ⊂ U for some U ∈ U},
for n large enough we have d(xf(n), z) < r(z)/4, hence B(xf(n), r(z)/4) ⊂ B(z, r(z)/2) ⊂ U(z), and so
r(xf(n)) ≥ r(z)/4 > 0, which is impossible.
Proof. We check that E is sequentially compact: take any sequence (xn)n∈N ⊂ E. By com-
pactness of X, it has a converging subsequence xf(n) → x for some x ∈ X. Since E is closed,
it contains its accumulation points, so in fact x ∈ E.
We also easily obtain the following version of the Heine–Borel theorem in Rn.
Proof. If K is compact, then it is closed by Corollary 9.72; and bounded, because it is totally
bounded.
To show the converse, we prove that K is complete and totally bounded. Since Rn is
complete and K is closed, K is complete as well.
Given r > 0, take some integer N ≥ 2n/r and consider all the closed cubes that have
sidelength 1/N, have corners with coordinates of the form m/N for some m ∈ Z, and intersect K.
There are only finitely many such cubes, because K is bounded. Furthermore, each of them is
contained in the ball of radius r centered at any one of its corners; thus we have found our finite set
of balls that cover K.
Example 9.74. — We stress that the Heine-Borel Theorem fails for general metric spaces:
take R with the bounded distance d(x, y) := arctan |x − y| (see Exercise 9.7). Then in this
metric space the set N is closed and bounded, but not compact: any two distinct points of N are at
distance at least arctan(1), so the sequence xn = n admits no convergent subsequence.
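This failure can be illustrated numerically; the following Python sketch (not part of the original notes) samples distances in the metric d(x, y) = arctan|x − y|:

```python
import math

# The metric d(x, y) = arctan|x - y| on R (cf. Exercise 9.7).
def d(x, y):
    return math.atan(abs(x - y))

# N is bounded in this metric: every distance is below pi/2 ...
distances = [d(m, n) for m in range(50) for n in range(50)]
bounded = max(distances) < math.pi / 2

# ... yet the sequence x_n = n has no Cauchy subsequence: any two
# distinct terms stay at distance >= arctan(1) = pi/4.
separation = min(d(m, n) for m in range(50) for n in range(50) if m != n)

print(bounded, separation)
```

The uniform separation of the terms is what rules out any convergent subsequence.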
Proof. Let U be an open cover of f (A). For each U ∈ U, the set f −1 (U ) ⊂ X is open due
to the continuity of f . The collection {f −1 (U ) | U ∈ U } is an open cover of A. Since A is
compact, there exist U1 , . . . , Un ∈ U such that {f −1 (Ui ) | 1 ≤ i ≤ n} is a cover of A. This
implies that {Ui | 1 ≤ i ≤ n} is a cover of f (A), and since U was arbitrary, it shows that f (A)
is compact.
Proof. Let ε > 0. Due to the continuity of f, for each x ∈ X there exists δx > 0 such
that f(B(x, δx)) ⊂ B(f(x), ε/2). The collection {B(x, δx/2) : x ∈ X} forms an open cover of
X. Since X is compact by assumption, there exists a finite subcover of this collection. This
implies the existence of x1, . . . , xn ∈ X such that
X = B(x1, δx1/2) ∪ . . . ∪ B(xn, δxn/2).
Let δ = (1/2) min{δx1, . . . , δxn}. For x, x′ ∈ X with d(x, x′) < δ, there exists k such that
x ∈ B(xk, δxk/2). This implies x′ ∈ B(xk, δxk), leading to
d(f(x), f(x′)) ≤ d(f(x), f(xk)) + d(f(xk), f(x′)) ≤ ε/2 + ε/2 = ε,
Proof. f(X) ⊂ R is compact by Theorem 9.76 and nonempty, so sup f(X) ∈ f(X), and any
element x̄ ∈ f−1(sup f(X)) works.
The fact that sup f(X) ∈ f(X) is readily proved: by definition of supremum there is a
sequence (sn) ⊂ f(X) such that sn → sup f(X); but then sup f(X) ∈ f(X), since f(X) is
closed and so it contains its accumulation points.
9.2.6 Connectedness
Lemma 9.80:
Let (X, d) be a metric space, and let Y1 and Y2 be connected subspaces. If the intersection
Y1 ∩ Y2 is non-empty, then the union Y1 ∪ Y2 is connected.
Proof. Let A be a non-empty open and closed subset of Y1 ∪ Y2 . Then, A ∩ Yj is an open and
closed subset of Yj for j = 1, 2. Since A is non-empty, one of these intersections is non-empty;
let’s say A ∩ Y1 is non-empty. As Y1 is connected, we have A ∩ Y1 = Y1 , implying Y1 ⊂ A.
Since Y1 ∩ Y2 ̸= ∅, we have A ∩ Y2 ̸= ∅, and similarly, Y2 ⊂ A. In summary, A = Y1 ∪ Y2 . This
proves that Y1 ∪ Y2 is connected.
Proof. Assume X ⊂ R is not an interval. Then there exist real numbers x1 < y < x2 with
x1, x2 ∈ X and y ∈/ X. Define the subsets
U1 := X ∩ (−∞, y) and U2 := X ∩ (y, +∞).
These sets are open and closed (as subsets of X, not of R!), non-empty, disjoint, and satisfy X = U1 ∪ U2.
Thus, X is not connected.
Now, let X ⊂ R be an interval. Assume there exist non-empty, open, and closed subsets
Y1 ⊂ X and Y2 ⊂ X, with Y1 ∩ Y2 = ∅ and Y1 ∪ Y2 = X. Choose x1 ∈ Y1 and x2 ∈ Y2; after
relabeling and rescaling we may assume x1 = 0 and x2 = 1, so that [0, 1] ⊂ X. Set
t∗ := sup{t ≥ 0 : [0, t] ⊂ Y1};
since 1 ∈ Y2 we have t∗ ∈ [0, 1], so t∗ must belong either to Y1 or to Y2, but both eventualities
lead to a contradiction.
If t∗ ∈ Y1 then t∗ < 1 and, since Y1 is open, we could enlarge t∗ a bit, violating its very
definition.
If t∗ ∈ Y2 then, since Y2 is open, a whole small neighbourhood of t∗ would be
contained in Y2; but this is impossible because t∗ must be an accumulation point of Y1.
Proposition 9.83:
Let (X, dX ) and (Y, dY ) be metric spaces, and let f : X → Y be continuous. If X is
connected, then the image f (X) is a connected subspace of Y .
Proof. Without loss of generality, we can assume a < b. We apply Proposition 9.83 to the con-
tinuous function f |[a,b] : [a, b] → R. Consequently, f ([a, b]) is connected, since by Proposition
9.82, the interval [a, b] in R is connected. Again, according to Proposition 9.82, f ([a, b]) ⊂ R
must be an interval. As f (a), f (b) ∈ f ([a, b]), all values between f (a) and f (b) lie in the image
of f |[a,b] .
Exercise 9.85. — Show the following generalization of the Intermediate Value Theorem:
Let X be a connected topological space, and f : X → R be a continuous function. Let
a, b ∈ X. Then, for every c ∈ R between f (a) and f (b), there exists x ∈ X such that f (x) = c.
If s : [a, b] → [0, 1] is a bijective continuous function with continuous inverse, we say that
γ ◦ s is a re-parametrization of γ. Furthermore, exactly one of the following happens: s is
strictly increasing, or s is strictly decreasing.
Definition 9.86:
We call a topological space X path-connected if, for every two points x, y ∈ X, there
exists a path γ : [0, 1] → X from x = γ(0) to y = γ(1).
Lemma 9.87:
Every path-connected topological space is connected.
Proof. Let X be a disconnected topological space. Then there exist non-empty, open, and
closed subsets U1 and U2 of X such that U1 ∩ U2 = ∅ and U1 ∪ U2 = X. Let x1 ∈ U1 and
x2 ∈ U2 . If X were path-connected, there would exist a path γ : [0, 1] → X from x1 to x2 .
However, this implies that V1 = γ −1 (U1 ) and V2 = γ −1 (U2 ) are non-empty, open, and closed
subsets of [0, 1] with V1 ∩ V2 = ∅ and V1 ∪ V2 = [0, 1], which is a contradiction since [0, 1] is
connected.
Proposition 9.89:
Let U ⊂ Rn be an open subset. Then U is path-connected if and only if U is connected.
and want to show that G = U . Since U is connected and G is non-empty, it suffices to show
that G is both open and closed.
Let x ∈ G and γ : [0, 1] → U be a path from x0 to x. Since U is open, there exists r > 0
such that B(x, r) ⊂ U . For any y ∈ B(x, r), the straight path t 7→ (1 − t)x + ty, connecting x
and y, lies in U . Concatenating these paths yields the path
t ↦ γ(2t) for 0 ≤ t ≤ 1/2, t ↦ (2 − 2t)x + (2t − 1)y for 1/2 < t ≤ 1,
from x0 to y. Thus, y ∈ G, and since y was arbitrary, we have B(x, r) ⊂ G. This shows that
G is open. Using a similar argument, we can show that U \ G is open. If x ∈/ G and r > 0
with B(x, r) ⊂ U, then no point of B(x, r) lies in G: if there were some y ∈ G ∩ B(x, r), a
concatenation of paths as above would connect x to x0. Therefore, B(x, r) ⊂ U \ G, and U \ G is open.
Thus, G is closed.
Definition 9.91:
Let V be a vector space over R. A norm on V is a mapping ∥ · ∥ : V → [0, ∞) that
satisfies the following three properties: definiteness (∥v∥ = 0 if and only if v = 0),
homogeneity (∥αv∥ = |α| ∥v∥ for all α ∈ R and v ∈ V), and the triangle inequality
(∥v + w∥ ≤ ∥v∥ + ∥w∥ for all v, w ∈ V).
Example 9.92. — Let n ∈ N. The maximum norm or infinity norm ∥ · ∥∞, and the
1-norm ∥ · ∥1 on Rn are defined by
∥v∥∞ = max{|v1|, |v2|, . . . , |vn|} and ∥v∥1 = |v1| + |v2| + . . . + |vn|.
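As a quick sanity check, both norms and the three norm axioms can be verified numerically; the vectors below are arbitrary choices (a sketch, not part of the notes):

```python
# The maximum norm and the 1-norm on R^n from Example 9.92,
# together with a spot-check of the three norm axioms.
def norm_inf(v):
    return max(abs(x) for x in v)

def norm_1(v):
    return sum(abs(x) for x in v)

v, w, lam = [3.0, -4.0, 1.0], [-1.0, 2.0, 5.0], -2.5

for norm in (norm_inf, norm_1):
    assert norm(v) >= 0 and norm([0.0, 0.0, 0.0]) == 0            # definiteness
    assert norm([lam * x for x in v]) == abs(lam) * norm(v)        # homogeneity
    assert norm([a + b for a, b in zip(v, w)]) <= norm(v) + norm(w)  # triangle

print(norm_inf(v), norm_1(v))  # 4.0 8.0
```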
Example 9.93. — If V is the vector space of continuous R-valued functions on [0, 1], we
define analogously the 1-norm and the infinity norm as
∥f∥1 = ∫_0^1 |f| dx and ∥f∥∞ = sup{|f(x)| : x ∈ [0, 1]}.
We immediately observe that Normed Vector Spaces are “automatically” Metric Spaces.
Proof. We check definiteness, symmetry, and the triangle inequality in the definition of a
metric 9.3. For v, w ∈ V , we have d(v, w) = ∥v − w∥ ≥ 0, and
d(v, w) = 0 ⇐⇒ ∥v − w∥ = 0 ⇐⇒ v − w = 0 ⇐⇒ v = w
by the definiteness of the norm. Using homogeneity of the norm for α = −1, we have for
v, w ∈ V ,
d(v, w) = ∥v − w∥ = ∥(−1)(v − w)∥ = ∥w − v∥ = d(w, v)
thus establishing the symmetry of d. Finally, using the triangle inequality of the norm, we
obtain
d(u, w) = ∥u − w∥ = ∥(u − v) + (v − w)∥ ≤ ∥u − v∥ + ∥v − w∥ = d(u, v) + d(v, w)
for all u, v, w ∈ V. This shows the triangle inequality for d, so d is indeed a metric on V.
We have seen that a normed space is naturally a metric space, now we check that the norm
is indeed continuous. We show sequential continuity.
Lemma 9.95: The norm is continuous with respect to its own distance
Let V be an R-vector space, and let ∥ · ∥ be a norm on V. Let (vn)n∈N be a sequence in
V converging with respect to the norm ∥ · ∥ to a limit w ∈ V. Then ∥vn∥ → ∥w∥.
Proof. Let ε > 0. Since (vn)n∈N converges to w, there exists an N ∈ N such that for all
n ≥ N the estimate ∥vn − w∥ < ε holds; that is, the sequence (∥vn − w∥)n∈N converges to 0. Using
the triangle inequality, we get
| ∥vn∥ − ∥w∥ | ≤ ∥vn − w∥,
and the lemma follows from the sandwich lemma for sequences of real numbers.
Definition 9.96:
Let V be a vector space over R. An inner product on V is a map
⟨−, −⟩ : V × V → R
which is bilinear, symmetric, and definite.
The standard example is the Euclidean inner product on Rd,
⟨−, −⟩ : Rd × Rd → R, ⟨v, w⟩ = v1 w1 + . . . + vd wd,
for v = (v1 , . . . , vd ) and w = (w1 , . . . , wd ). The proof of bilinearity and symmetry is left as an
exercise. We verify definiteness. Let v = (v1, . . . , vd) ∈ Rd. Then
⟨v, v⟩ = v1 v1 + . . . + vd vd = |v1|^2 + . . . + |vd|^2 ≥ 0
is a non-negative real number. If v = 0, then ⟨v, v⟩ = 0. Conversely, if ⟨v, v⟩ = 0, then each term |vk|^2
must be zero, and thus vk = 0 for all k, implying v = 0.
Let V be a vector space over R, let ⟨−, −⟩ be an inner product on V, and let ∥ · ∥ : V → R
be given by ∥v∥ = √⟨v, v⟩. Then the inequality
|⟨v, w⟩| ≤ ∥v∥ ∥w∥ (9.2)
holds for all v, w ∈ V. Furthermore, equality in (9.2) holds if and only if v and w are linearly
dependent.
Proof. If v = 0 or w = 0, then both sides of (9.2) are zero, and the vectors v, w are linearly
dependent. So, we assume that v ≠ 0 and w ≠ 0. Then, for α = ⟨v, w⟩ ∥w∥^{−2}, we have
0 ≤ ∥v − αw∥^2 = ∥v∥^2 − 2α⟨v, w⟩ + α^2 ∥w∥^2 = ∥v∥^2 − ⟨v, w⟩^2 ∥w∥^{−2},
implying the desired inequality (9.2). Equality holds if and only if ∥v − αw∥ = 0, that is,
v = αw.
Exercise 9.99. — Prove the Cauchy–Schwarz inequality by following these steps: Let
a > 0. Show that for all v, w ∈ Rn, the inequality
|⟨v, w⟩| ≤ (a^2/2) ∥v∥^2 + (1/(2a^2)) ∥w∥^2
holds.
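The inequality can be explored numerically before proving it; the following sketch (with arbitrary test vectors) also shows that minimizing the right-hand side over a recovers the Cauchy–Schwarz bound:

```python
# Numerical check (a sketch, not from the notes) of the inequality in
# Exercise 9.99: |<v,w>| <= (a^2/2)||v||^2 + (1/(2a^2))||w||^2 for a > 0.
def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

def norm(v):
    return dot(v, v) ** 0.5

v, w = [1.0, 2.0, -1.0], [3.0, 0.5, 2.0]

lhs = abs(dot(v, w))
rhs_values = [(a**2 / 2) * norm(v)**2 + norm(w)**2 / (2 * a**2)
              for a in [0.1 * k for k in range(1, 100)]]

# the inequality holds for every sampled a ...
assert all(lhs <= r + 1e-12 for r in rhs_values)
# ... and the infimum over a is ||v||*||w||, the Cauchy-Schwarz bound
print(lhs, min(rhs_values), norm(v) * norm(w))
```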
Corollary 9.100:
Let V be a vector space over R, and let ⟨−, −⟩ be an inner product on V. The map defined
by
∥ · ∥ : V → R, ∥v∥ = √⟨v, v⟩ (9.3)
is a norm on V.
Proof. Definiteness and homogeneity follow directly from the definiteness and bilinearity
of the inner product. We only need to prove the triangle inequality. Let v, w ∈ V. Using the
Cauchy–Schwarz inequality, we have the estimate
∥v + w∥^2 = ∥v∥^2 + 2⟨v, w⟩ + ∥w∥^2 ≤ ∥v∥^2 + 2∥v∥∥w∥ + ∥w∥^2 = (∥v∥ + ∥w∥)^2,
which implies the desired result after taking the square root.
9.101. — Let V be a vector space over R. If ⟨−, −⟩ is an inner product on V, we call the
norm treated in Corollary 9.100
∥ · ∥ : V → R, ∥v∥ = √⟨v, v⟩ (9.3)
the norm induced by ⟨−, −⟩. In particular, from the Euclidean inner product on V = Rn,
we can define a norm on V = Rn. The Euclidean norm on Rn is given by
∥x∥ = √⟨x, x⟩ = √(|x1|^2 + . . . + |xn|^2).
9.102. — From now on, in order to keep the notation simple, we will denote the Euclidean
norm of a vector x in Rn by |x| instead of ∥x∥, which we will reserve for (less standard) norms.
Notice that this notation does not create any ambiguity or collision with previously introduced
notations. Indeed:
• If n = 2 and we identify R2 and C via the usual map (x1, x2) ↦ x1 + ix2,
then the Euclidean norm coincides with the complex absolute value.
9.103. — The Euclidean norm on Rn holds a special position among all norms on Rn. On
R2 or R3, it measures the “physical” length of vectors. However, many other norms ∥ · ∥ confer
on Rn the structure of a normed vector space. A standard family of norms is given by
∥x∥p := (|x1|^p + . . . + |xn|^p)^{1/p},
where p ∈ [1, +∞) is a given number. Notice that the Euclidean norm |x| corresponds to the
p = 2 case.
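The behaviour of this family as p grows can be observed numerically (a sketch; the vector is an arbitrary choice):

```python
# Sketch: the p-norms of a fixed vector decrease in p and approach the
# maximum of the absolute values of the entries as p grows.
def norm_p(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = [3.0, -1.0, 2.0]
values = [norm_p(x, p) for p in (1, 2, 4, 8, 16, 32)]

# monotone non-increasing in p ...
assert all(a >= b - 1e-12 for a, b in zip(values, values[1:]))
# ... and converging to max|x_i| = 3 from above
print(values, norm_p(x, 64))
```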
One can check (exercise) that for any given x ∈ Rn one has lim_{p→∞} ∥x∥p = max{|x1|, . . . , |xn|},
and that this expression defines a norm as well. That is why the maximum norm is commonly called
the infinity norm and denoted ∥ · ∥∞.
only finitely many of which are nonzero (so that the previous sum always makes
sense: it is not a series!);
• the coefficients {vi}i∈I are uniquely determined; in other words, the following
implication holds:
∑_{i∈I} vi ei = 0, with finitely many non-zero vi ∈ R =⇒ vi = 0 for all i ∈ I.
For every vector space we can obtain a sequence of linearly independent vectors
e1, e2, e3, . . . . If this sequence necessarily stops, we obtain a finite basis and we
say that the vector space is finite dimensional. If the sequence can be continued
indefinitely, we say that the vector space is infinite dimensional.
All bases of a finite dimensional vector space have the same number of vectors. This
number is called the dimension of V.
Exercise 9.104. — Show that the space of polynomials with real coefficients R[x] is an
infinite dimensional vector space and find a basis.
9.105. — If V has finite dimension n ∈ N and we fix a basis B = {ei}1≤i≤n, then the map
ıB : Rn → V, (x1, . . . , xn) ↦ x1 e1 + . . . + xn en
is a (vector space) isomorphism and allows us to treat V as Rn for most practical tasks. In
particular, if (V, ∥ · ∥) is a normed space, then ıB induces a norm on Rn, defined as x ↦ ∥ıB(x)∥.
This is a motivation to prove results for Rn equipped with norms different from the Euclidean
one. Indeed, all the concepts and results that can be stated for general norms will automat-
ically hold in “abstract” finite dimensional normed vector spaces. We see next an important
instance of this.
Example 9.107. — Let n ∈ N. The 1-norm ∥ · ∥1 and the maximum norm ∥ · ∥∞ given in
Example 9.92 are equivalent, as the inequalities
∥v∥∞ ≤ ∥v∥1 ≤ n ∥v∥∞
hold for all v ∈ Rn. As we will show in Theorem 9.108, all norms on a finite-dimensional
vector space over R are equivalent to each other. This is not the case for infinite-dimensional
vector spaces. For example, the norms given in 9.92 on the space of continuous functions on
[0, 1] are not equivalent.
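Example 9.107's equivalence, in the form ∥v∥∞ ≤ ∥v∥1 ≤ n ∥v∥∞, can be spot-checked on random vectors (a sketch, with n = 5 chosen arbitrarily):

```python
import random

# Sketch checking ||v||_inf <= ||v||_1 <= n * ||v||_inf on random
# vectors in R^5.
random.seed(0)
n = 5
ok = True
for _ in range(1000):
    v = [random.uniform(-10, 10) for _ in range(n)]
    n1 = sum(abs(x) for x in v)
    ninf = max(abs(x) for x in v)
    ok = ok and (ninf <= n1 <= n * ninf)
print("equivalence verified on samples:", ok)
```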
Theorem 9.108:
All norms on Rn are equivalent (i.e., comparable).
Proof. Let ∥ · ∥ be any norm on Rn , and let ∥ · ∥1 denote the 1-norm on Rn given in Example
9.92. We show that ∥ · ∥ and ∥ · ∥1 are equivalent, which proves the Theorem.
Let e1, . . . , en denote the standard basis of Rn, and let A = max{∥e1∥, ∥e2∥, . . . , ∥en∥}. For
any vector v = x1 e1 + · · · + xn en ∈ Rn, we have
∥v∥ ≤ |x1| ∥e1∥ + . . . + |xn| ∥en∥ ≤ A (|x1| + . . . + |xn|) = A ∥v∥1,
which already shows one of the two required estimates. For the second estimate, consider the
set
S = {v ∈ Rn : ∥v∥1 = 1}
and the real number B = inf{∥v∥ : v ∈ S}. There exists a sequence (vn)n∈N in S such that
the sequence (∥vn∥)n∈N converges to B. Since (vn)n∈N is bounded for the 1-norm, it contains
a convergent subsequence by the Heine–Borel theorem. By replacing (vn)n∈N with such a
subsequence, we can ensure that (vn)n∈N converges to some w ∈ Rn with respect to the 1-norm. For
every ε > 0 there is N ∈ N such that
n ≥ N =⇒ ∥w − vn∥1 ≤ A^{−1} ε =⇒ ∥w − vn∥ ≤ ε,
so vn → w with respect to ∥ · ∥ as well, and ∥w∥ = B by Lemma 9.95. Since w ∈ S we have
∥w∥1 = 1, hence w ≠ 0 and B = ∥w∥ > 0. By homogeneity, ∥v∥ ≥ B ∥v∥1 for all v ∈ Rn,
which is the second required estimate.
9.109. — Notice that, as a simple consequence of Theorem 9.108, in every finite dimensional
vector space V all norms are equivalent. Indeed, we can fix a basis B = {e1, . . . , en} of V and
use the map ıB defined in 9.105 to export the result from Rn to V.
Proposition 9.110:
For all f ∈ Cb(X, R) set ∥f∥ := supx∈X |f(x)|. Then (Cb(X, R), ∥ · ∥) is a complete
normed vector space. Furthermore, fn → f in this space if and only if the functions
(fn) converge uniformly to f, meaning that
supx∈X |fn(x) − f(x)| → 0 as n → ∞.
Proof. First of all, ∥f∥ < ∞ by assumption. Let us check the properties of the norm:
• Zero norm implies zero function: If ∥f ∥ = 0, then |f (x)| = 0 for all x ∈ X, that is
f (x) = 0 for all x ∈ X. Hence f is the zero element of the vector space Cb (X, R).
• Homogeneity: For any λ ∈ R, x ∈ X, we have |λf (x)| = |λ||f (x)|. Taking the sup
over all x ∈ X, we get ∥λf ∥ = |λ|∥f ∥.
• Continuity of f : Let (fn)n∈N be a Cauchy sequence with respect to ∥ · ∥. For each fixed
x ∈ X and all n, m, we have |fn(x) − fm(x)| ≤ ∥fn − fm∥, so the sequence (fn(x))n∈N is a
Cauchy sequence in R and therefore converges to some limit, which we call f(x). As shown
in the next point, fn → f uniformly; since a uniform limit of continuous functions is
continuous, the function f : X → R is indeed continuous.
• Uniform convergence: Let ϵ > 0. Since (fn)n∈N is a Cauchy sequence with respect
to ∥ · ∥, there exists N ∈ N such that for all n, m ≥ N and all x ∈ X, we have
|fn(x) − fm(x)| < ϵ. Letting m → ∞, we obtain that for all n ≥ N and all x ∈ X, we have
|fn(x) − f(x)| ≤ ϵ. This implies that ∥fn − f∥ ≤ ϵ for all n ≥ N, that is, fn converges
uniformly to f.
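A concrete illustration of convergence in the sup-norm (a sketch using the standard example fn(x) = x^n, which is not taken from the notes; the sup-norm is approximated on a grid):

```python
# f_n(x) = x^n converges to 0 uniformly on [0, 1/2] but not on [0, 1),
# as measured by an approximate sup-norm.
def sup_norm(f, a, b, samples=1000):
    return max(abs(f(a + (b - a) * k / samples)) for k in range(samples + 1))

def f_n(n):
    return lambda x: x ** n

# on [0, 1/2] the sup-norm of f_n is (1/2)^n -> 0: uniform convergence
on_half = [sup_norm(f_n(n), 0.0, 0.5) for n in (1, 5, 10, 20)]

# near 1 the sup stays close to 1: no uniform convergence on [0, 1)
on_unit = sup_norm(f_n(20), 0.0, 0.999)
print(on_half, on_unit)
```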
Let (X, d) be a compact metric space and let F ⊂ C(X, Rm) be a subset of functions.
Then F is compact if and only if it is closed, bounded, and equi-continuous, meaning that
∀ε > 0, ∃δ > 0, ∀f ∈ F, d(x, y) ≤ δ =⇒ |f(x) − f(y)| ≤ ε. (9.4)
Proof of (⇒). (F) We show that F is complete and totally bounded. For the sake of simplicity,
we work with m = 1, the general case follows arguing coordinate-by-coordinate.
Since F is closed (by definition) and C(X) is complete (by Proposition 9.110), then F is
complete.
Let us fix ε > 0, and let δ > 0 be given by (9.4). Since X is compact, it is also totally
bounded, so we find points x1, . . . , xN in X such that X = B(x1, δ) ∪ . . . ∪ B(xN, δ). Since F
is bounded, there is C > 0 such that ∥f∥ ≤ C for all f ∈ F, and we split [−C, C] into
consecutive intervals of length at most ε:
[−C, C] = I1 ∪ . . . ∪ IM, M ∼ 2C/ε.
Let Σ denote the (finite) set of maps σ : {1, . . . , N} → {1, . . . , M} such that there exists some
fσ ∈ F with fσ(xi) ∈ Iσ(i) for all i = 1, . . . , N; for each σ ∈ Σ we fix one such fσ once and for all.
We claim that the family of balls {B(fσ, 4ε)}σ∈Σ covers F.
To see this, take any g ∈ F and define ρ : {1, . . . , N} → {1, . . . , M} by requiring g(xi) ∈ Iρ(i) for all i = 1, . . . , N.
By definition ρ ∈ Σ. Now for any x ∈ X take j in such a way that x ∈ B(xj, δ) and we can
bound
|fρ(x) − g(x)| ≤ |fρ(x) − fρ(xj)| + |fρ(xj) − g(xj)| + |g(xj) − g(x)| ≤ 3ε,
where the first and the third terms are controlled by equicontinuity (d(x, xj) < δ), and the
second using that fρ(xj) and g(xj) both lie in Iρ(j), which is shorter than ε. Hence ∥fρ − g∥ ≤ 3ε < 4ε.
Finally we conclude observing that, since ε > 0 was arbitrary, the finite families
{B(fσ, 4ε)}σ∈Σ show that F is totally bounded (small exercise).
Proof of (⇐). (F) Assume F is compact in C(X). Then the product metric space X × F is
compact as well by Corollary 9.70. Furthermore, the function Φ : X × F → R, defined as
Φ : (x, f) ↦ f(x),
is continuous on the compact space X × F, hence uniformly continuous. The uniform continuity
of Φ in its first variable, uniformly in f, is exactly the equi-continuity (9.4) of F; boundedness
and closedness of F follow directly from its compactness.
Exercise 9.113. — Generalize Theorem 9.76: show that the image of a compact set through a family
of equibounded and equicontinuous functions is compact in Rm. (F)
Let (X, d) be a compact metric space and let (fk) be a bounded sequence in C(X, Rm).
Assume that (fk) is equi-continuous, meaning that
∀ε > 0, ∃δ > 0, ∀k ∈ N, d(x, y) ≤ δ =⇒ |fk(x) − fk(y)| ≤ ε.
Then (fk) admits a uniformly convergent subsequence.
where the vector γ ′ (t) = (γ1′ (t), . . . , γn′ (t)) is the velocity and the number ∥γ ′ (t)∥ is
the speed of the path at time t.
Exercise 9.118. — Show that if s : [0, 1] → [a, b] is a C 1 bijective map with C 1 inverse,
then L(γ ◦ s) = L(γ).
∫_0^1 ∥γ′(s)∥ ds = sup{ ∑_{j=0}^{N−1} ∥γ(tj+1) − γ(tj)∥ : N ∈ N, 0 = t0 ≤ t1 ≤ . . . ≤ tN−1 ≤ tN = 1 }.
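The formula can be tested numerically on the unit circle (a Python sketch; the parametrization is an arbitrary choice):

```python
import math

# For gamma(t) = (cos 2*pi*t, sin 2*pi*t) the integral of the speed is
# 2*pi, and inscribed polygonal lengths increase towards it, as the
# sup-formula predicts.
def gamma(t):
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))

def polygonal_length(N):
    pts = [gamma(j / N) for j in range(N + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

speed_integral = 2 * math.pi          # ||gamma'(t)|| = 2*pi for all t
lengths = [polygonal_length(N) for N in (4, 16, 64, 256)]

assert all(a <= b for a, b in zip(lengths, lengths[1:]))   # increasing
assert all(L <= speed_integral for L in lengths)           # below the sup
print(lengths, speed_integral)
```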
Proof. We start showing LHS ≥ RHS. Denote γ(t) = (γ1(t), . . . , γn(t)), fix N ∈ N, some
partition 0 = t0 ≤ t1 ≤ . . . ≤ tN−1 ≤ tN = 1 and any set of unit vectors ν0, . . . , νN−1 ∈ Rn:
∫_0^1 ∥γ′(s)∥ ds = ∑_{j=0}^{N−1} ∫_{tj}^{tj+1} ∥γ′(s)∥ ds ≥ ∑_{j=0}^{N−1} ∫_{tj}^{tj+1} νj · γ′(s) ds = ∑_{j=0}^{N−1} νj · (γ(tj+1) − γ(tj)).
With the choice νj := (γ(tj+1) − γ(tj)) / ∥γ(tj+1) − γ(tj)∥ the last term becomes
∑_{j=0}^{N−1} νj · ∫_{tj}^{tj+1} γ′(s) ds = ∑_{j=0}^{N−1} ∥γ(tj+1) − γ(tj)∥,
and taking the sup over all partitions gives LHS ≥ RHS.
To prove LHS ≤ RHS, fix ε > 0. The maps
t ↦ γ′(t) / max{ε, ∥γ′(t)∥} and t ↦ γ′(t)
are continuous on [0, 1], hence uniformly continuous; thus there is δ > 0 such that
[a, b] ⊂ [0, 1], |b − a| ≤ δ =⇒ ∥ γ′(a)/max{ε, ∥γ′(a)∥} − γ′(b)/max{ε, ∥γ′(b)∥} ∥ < ε. (9.5)
Now divide [0, 1] in N equal intervals of length less than δ and consider one of them, let it be
[tj , tj+1 ]. We distinguish two cases.
Case 1. If ∥γ′(s)∥ ≤ 2ε for all s ∈ [tj, tj+1], then we bound directly
∫_{[tj,tj+1]} ∥γ′(s)∥ ds ≤ 2ε (tj+1 − tj).
Case 2. If there is s̄ ∈ [tj, tj+1] such that ∥γ′(s̄)∥ > 2ε then, thanks to (9.5), we have
(γ′(tj) / ∥γ′(tj)∥) · γ′(s) ≥ (1 − ε) ∥γ′(s)∥ for all s ∈ [tj, tj+1].
Thus, integrating this inequality in [tj, tj+1], and using Cauchy–Schwarz, we find
(1 − ε) ∫_{[tj,tj+1]} ∥γ′(s)∥ ds ≤ (γ′(tj)/∥γ′(tj)∥) · ∫_{[tj,tj+1]} γ′(s) ds
= (γ′(tj)/∥γ′(tj)∥) · (γ(tj+1) − γ(tj)) ≤ ∥γ(tj+1) − γ(tj)∥.
Summing over the N intervals and combining the two cases, we obtain
(1 − ε) ∫_0^1 ∥γ′(s)∥ ds ≤ ∑_{j=0}^{N−1} ( ∥γ(tj+1) − γ(tj)∥ + 2ε (tj+1 − tj) ) ≤ RHS + 2ε,
and we conclude letting ε ↓ 0.
We remark that the idea is simply to take intervals so short that on each of them γ′(s) is
approximately constant. The use of ε is instrumental to handle the case where γ′ is in fact
small, but not pointing in a well-defined direction. In other words, the map t ↦ γ′(t)/∥γ′(t)∥ is not
necessarily continuous, so we regularized it with ε.
1. Show that for any two points x, y ∈ U , there exists a piecewise differentiable path from
x to y.
2. For a piecewise C1 path γ with nodes t0 ≤ t1 ≤ . . . ≤ tK, its length is
L(γ) = ∑_{k=1}^K ∫_{t_{k−1}}^{t_k} |γ′(s)| ds.
Show that dU(x, y) := inf{L(γ) : γ a piecewise C1 path in U from x to y}
is indeed a metric.
Multidimensional Differentiation
In this chapter, we extend the concept of derivatives to functions defined on open subsets
U ⊂ Rn and taking values in Rm . We will not impose any restrictions on the positive integers
n, m.
10.1.1 Definitions
The derivative of a real-variable function f : R → R at some point x0 ∈ R has various
equivalent interpretations. Of course each of these interpretations provides the same number
f ′ (x0 ), but the “meaning” we attach to this number is slightly different in each case. (F) Let
us recall three important ones:
• Slope of tangent line to the graph. We look at the graph of f , i.e. the curve
{y = f (x)} ⊂ R2 and write the tangent line to the graph at (x0 , f (x0 )) in the form
y = ax + b. Then we have a = f ′ (x0 ).
• Stretching Factor. Look at a short interval I around x0 and the corresponding interval
f(I) around f(x0). These two intervals are related through a “stretching factor” which
tends to f′(x0) as I shrinks. More rigorously, the family of rescaled functions
y ↦ (f(x0 + ry) − f(x0))/r
converges, as r ↓ 0, to the linear function y ↦ f′(x0) y.
Chapter 10.1 The Differential
TODO: I think it would be quite nice to show them that this is equivalent to the fact that
the blow-ups of f are linear maps. They have seen uniform convergence and it is geometrically
as insightful as the tangent plane approx (and more appropriate in some cases).
10.3. — For functions f : R → R the derivative f′(x0) is a real number. Notice that in
this case L(y) = Dfx0(y) = f′(x0) y.
Figure 10.1: For a function f : R2 → R, the best affine approximation corresponds to the
tangent plane of the graph in R3 .
Applet 10.4 (Tangent Plane). As shown in the above image, we depict the tangent planes
for the graphs of two functions f : R2 → R. Additionally, we visualize the partial derivatives
and directional derivatives in Definition 10.5. Is there a directional derivative that vanishes
at every point?
provided that the limit exists. If v = ej for some j ∈ {1, . . . , n}, we may denote
∂ej f(x0) by
Dj f(x0), ∂j f(x0), or ∂f/∂xj (x0).
Of course, if the partial derivative in the j-th coordinate exists at every point in U, we
obtain a function ∂j f : U → Rm, which we call the j-th partial derivative of f.
10.6. — The partial derivatives are computed by differentiating with respect to one of the
independent variables while considering all the other variables as constants. For example, for
the function f : R3 → R given by f(x, y, z) = x(y^2 + sin(z)), the partial derivatives with
respect to all coordinate directions are given by
∂x f(x, y, z) = y^2 + sin(z), ∂y f(x, y, z) = 2xy, ∂z f(x, y, z) = x cos(z)
for all (x, y, z) ∈ R3, as we can apply all known rules from Analysis I. If the total derivative
exists, we can connect it with partial derivatives and derivatives along arbitrary vectors using
the following proposition.
Version: February 25, 2024.
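As an aside, the partial derivatives computed in 10.6 can be spot-checked with central finite differences (a small Python sketch, not part of the notes):

```python
import math

# Finite-difference check of the partial derivatives of
# f(x, y, z) = x*(y^2 + sin z) computed in 10.6.
def f(x, y, z):
    return x * (y * y + math.sin(z))

def fd(g, p, i, h=1e-6):
    """Central difference of g at point p in the i-th coordinate."""
    q = list(p); q[i] += h
    r = list(p); r[i] -= h
    return (g(*q) - g(*r)) / (2 * h)

p = (1.3, -0.7, 2.1)                  # an arbitrary test point
x, y, z = p
exact = (y * y + math.sin(z), 2 * x * y, x * math.cos(z))

for i in range(3):
    assert abs(fd(f, p, i) - exact[i]) < 1e-6
print(exact)
```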
Proof. Assuming the total derivative Df(x0) exists, according to the definition of the deriva-
tive, f(x0 + h) = f(x0) + Df(x0)(h) + o(∥h∥) holds for h → 0. Choosing h = sv with s → 0
for a fixed v ∈ Rn, we get
f(x0 + sv) = f(x0) + s Df(x0)(v) + o(s), and hence ∂v f(x0) = Df(x0)(v).
Formulate and prove analogous statements for directional derivatives in the direction of a fixed
vector v ∈ Rn .
Proof. Due to Lemma 10.9, we can assume m = 1. Let’s fix x0 ∈ U , and we need to show
that f is differentiable at x0 . By replacing f with x 7→ f (x + x0 ) − f (x0 ), we can also assume
that x0 = 0 and f (0) = 0. For x = (x1 , . . . , xn ) ∈ U , we then have
To show that the linear function L : (v1, . . . , vn) ↦ ∂1 f(0)v1 + · · · + ∂n f(0)vn is the derivative
Df (0), we need to estimate the difference R(x) := f (x) − L(x) = f (0 + x) − f (0) − L(x).
Writing f(x) as a telescoping sum over the coordinates and applying the one-dimensional mean
value theorem in each variable, we find intermediate points ξ1, . . . , ξn such that
R(x) = (∂1 f(ξ1, x2, x3, . . . , xn) − ∂1 f(0)) x1
+ (∂2 f(0, ξ2, x3, . . . , xn) − ∂2 f(0)) x2
+ · · ·
+ (∂n f(0, 0, . . . , 0, ξn) − ∂n f(0)) xn.
According to the assumptions of the theorem, and because |xj|/∥x∥ ≤ 1 for all x ≠ 0, the
asymptotics
lim_{x→0} R(x)/∥x∥ = 0
holds, which proves that Df(0) = L.
The following exercise shows that the existence of partial derivatives of a function f , with-
out the continuity assumption, does not necessarily imply that the function is differentiable.
for (x, y) ∈ R2 . Show that the partial derivatives ∂x f and ∂y f exist everywhere in R2 , but f
is not differentiable at (0, 0).
10.14. — From Proposition 10.7, it follows in particular that the total derivative (when
it exists) Df (x0 ) is uniquely determined by the partial derivatives. Specifically, for v =
a1 e1 + · · · + an en ∈ Rn ,
Dfx0(v) = ∑_{i=1}^n ai Dfx0(ei) = ∑_{i=1}^n ai ∂i f(x0).
The m × n matrix of the linear map Dfx0 : Rn → Rm with respect to the canonical bases is
(∂1 f(x0), ∂2 f(x0), . . . , ∂n f(x0)) =
( ∂f1/∂x1(x0)  ∂f1/∂x2(x0)  · · ·  ∂f1/∂xn(x0) )
( ∂f2/∂x1(x0)  ∂f2/∂x2(x0)  · · ·  ∂f2/∂xn(x0) )
(      ...          ...       ...       ...     )
( ∂fm/∂x1(x0)  ∂fm/∂x2(x0)  · · ·  ∂fm/∂xn(x0) )
This matrix is referred to as the Jacobian matrix of f evaluated at the point x0 , commonly
denoted by Jf(x0). An alternative notation for the Jacobian matrix is Df(x0); however, to
avoid blurring the distinction between the linear map Dfx0 and its matrix representation, we
will not use this notation yet. While the difference between a matrix and a linear map might
appear to be a minor detail, it becomes more significant in future applications
such as in Differential Geometry or Physics.
Notice that, given U ⊂ Rn open, f ∈ C1(U, Rm) if and only if the map
Jf : U → Matm,n(R) ≅ Rm×n
is well defined and continuous.
TODO: exercise where they compute the derivative tangential/normal to a circle of the
function (x, y) 7→ x2 + y 2
TODO: example of computation of Jf for a particular f : R3 → R2
or equivalently:
∂(g ◦ f)/∂xi (x0) = ∑_{j=1}^m ∂g/∂yj (f(x0)) · ∂fj/∂xi (x0).
with L = Df (x0 ) and R(x) = o(∥x∥) as x → 0, and M = Dg(y0 ) and S(y) = o(∥y∥) as y → 0.
Together, for x ∈ Rn small enough and y = f (x0 + x) − f (x0 ) = L(x) + R(x), we obtain the
equation
as x → 0. Let C = ∥L∥op + 1; then, there exists η > 0 such that ∥L(x) + R(x)∥ ≤ C∥x∥ for all
x ∈ Rn with ∥x∥ < η. For x ∈ Rn with ∥x∥ < min{η, δC−1}, we have ∥L(x) + R(x)∥ < δ, and
using (10.2), we also have
∥S(L(x) + R(x))∥ ≤ ϵ∥L(x) + R(x)∥ ≤ Cϵ∥x∥.
TODO: add a lot of examples, they are usually not able to use correctly the chain rule in
exams....
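In the spirit of the TODO above, here is one worked numerical instance of the chain rule (the functions f and g are ad-hoc choices, and the result is compared against finite differences):

```python
import math

# Chain rule check for f(x1,x2) = (x1*x2, sin x1), g(y1,y2) = y1^2 + y2:
# d(g o f)/dx_i = sum_j dg/dy_j(f(x)) * df_j/dx_i(x).
def f(x1, x2):
    return (x1 * x2, math.sin(x1))

def g(y1, y2):
    return y1 * y1 + y2

x1, x2 = 0.8, -1.4
y1, y2 = f(x1, x2)

Jf = [[x2, x1],             # row j: partials of f_j
      [math.cos(x1), 0.0]]
grad_g = [2 * y1, 1.0]

chain = [sum(grad_g[j] * Jf[j][i] for j in range(2)) for i in range(2)]

# compare against central finite differences of g o f
h = 1e-6
def gof(a, b):
    return g(*f(a, b))
fd = [(gof(x1 + h, x2) - gof(x1 - h, x2)) / (2 * h),
      (gof(x1, x2 + h) - gof(x1, x2 - h)) / (2 * h)]

assert all(abs(c - d) < 1e-6 for c, d in zip(chain, fd))
print(chain)
```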
Exercise 10.18. — Euler’s identity for homogeneous functions (F). Assume f ∈ C1(Rn \ {0})
is positively homogeneous of degree λ ∈ R, that is to say f(tx) = t^λ f(x) for all t > 0 and all
x ≠ 0. Show that x · ∇f(x) = λ f(x) for all x ≠ 0.
Di(arctan(u)) = Di u / (1 + u^2), Di(1/u) = −Di u / u^2, Di(|Du|^2) = 2 ∑_j Dj u Dij u,
∑_i Dii(|Du|^2) = 2 ∑_{i,j} (Dij u)^2 + 2 ∑_i Di u Di(∑_j Djj u).
10.20. — Let’s consider the special case n = 1 for the chain rule. Suppose I ⊂ R is an
open interval, and γ : I → V ⊂ Rm is a differentiable function with values in an open subset
V ⊂ Rm . Further, let f : V → Rk be differentiable. Then, the chain rule implies that f ◦ γ is
differentiable, and the formula (f ◦ γ)′(t) = Dfγ(t)(γ′(t)) holds for all t ∈ I. In the case k = 1,
we collect the partial derivatives of f in the vector
∇f(x) := (∂1 f(x), . . . , ∂n f(x))
and refer to it as the gradient of the function f at the point x. Using this notation, we
obtain the formula
(f ◦ γ)′(t) = ∇f(γ(t)) · γ′(t)
for all t ∈ I.
10.21. — The concept of directional derivatives and the case of equality in the Cauchy-
Schwarz inequality allow us to provide a geometric interpretation of the gradient of a function.
If f : U → R is a differentiable function on an open subset U ⊂ Rn, then, according to
Proposition 10.7 and the Cauchy–Schwarz inequality,
∂v f(x) = ∇f(x) · v ≤ ∥∇f(x)∥ ∥v∥
for any vector v ∈ Rn, with equality if and only if ∇f(x) and v are linearly dependent. This
implies that the gradient of f at every point points in the direction of the greatest directional
derivative, indicating the direction of the steepest ascent of f around x. Furthermore, ∥∇f (x)∥
gives the slope in that direction.
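This interpretation is easy to test numerically (a sketch; the function f(x, y) = x^2 + 3y and the base point are arbitrary choices):

```python
import math

# Among all unit directions v, the directional derivative grad f(x) . v
# is maximal when v points along the gradient, with maximum ||grad f(x)||.
# Test with f(x, y) = x^2 + 3y at the point (1, 2): grad f = (2, 3).
grad = (2.0, 3.0)
gnorm = math.hypot(*grad)

# sample many unit vectors v = (cos t, sin t)
derivs = [grad[0] * math.cos(t) + grad[1] * math.sin(t)
          for t in [2 * math.pi * k / 720 for k in range(720)]]

best = max(derivs)
# the maximum is (up to discretization) the norm of the gradient
assert abs(best - gnorm) < 1e-3
print(best, gnorm)
```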
is satisfied.
for ξ = x0 + th.
U ′ = {x ∈ U | f (x) = f (x0 )}
TODO: Exercise: what if only a partial derivative is zero? What if U not connected?
Proof. It suffices to consider the case m = 1. First, assume that U is convex and that the
derivative is bounded: there exists M ≥ 0 such that ∥Df(ξ)∥op ≤ M for all ξ ∈ U. From the
mean value theorem 10.22, it follows that for x, y ∈ U
|f(x) − f(y)| = |Df(ξ)(x − y)| ≤ M ∥x − y∥
for some ξ ∈ U, since U is convex and thus contains the straight segment between x and y.
This proves the second statement in the corollary. The first statement follows from the second
applied to the ball U0 = B(x0 , ϵ) where ϵ > 0 is chosen such that B(x0 , ϵ) ⊂ U . Indeed, U0
is convex, and the mapping ξ 7→ Df (ξ) is a continuous function on the compact set B(x0 , ϵ),
implying the boundedness of the derivative on B(x0 , ϵ).
Definition 10.26:
Let U ⊂ Rn be open, f : U → Rm be a function, and k ≥ 1. We say that f is k times
continuously differentiable if, for all j1, . . . , jk in {1, . . . , n}, the partial derivative
∂j1 · · · ∂jk f : U → Rm exists and is continuous.
Proof. We prove only the second item and leave the first as an exercise. We do an induction on
min{k, m}. Since a composition of continuous functions is continuous, the base case
min{k, m} = 0 is handled. Now assume the statement has been proved whenever min{k, m} ≤
N, and that we have f, ϕ, k, m with min{k, m} = N + 1. For sure ϕ ∈ CN (since m ≥ N + 1)
and Df ∈ Ck−1, so the inductive assumption applied to ∂i f ◦ ϕ gives
∂j(f ◦ ϕ) = ∑_{i=1}^n (∂i f ◦ ϕ) · ∂j ϕi,
where ∂i f ◦ ϕ is of class C^{min{k−1,N}} and ∂j ϕi is of class C^{m−1}.
Thus by the first part ∂j(f ◦ ϕ) is of class C^{min{k−1, N, m−1}}, and min{k − 1, N, m − 1} ≥ N, for all j, which means
that f ◦ ϕ is in C^{N+1}.
Furthermore, for a sufficiently small but fixed h > 0, we consider the differentiable function
φ : [0, 1] → R given by φ(t) = f (x1 + th, x2 + h) − f (x1 + th, x2 ) and obtain
Figure 10.2: The function h 7→ F (h) is a signed sum of function values of f at the corners of
a square (here marked by a solid line). The function t 7→ φ(t) corresponds to the difference of
function values on a vertical segment through the square.
for some intermediate point ξ2 ∈ (0, 1). Since both components were used symmetrically in
the function h 7→ F (h), we can perform the argument again with the roles of the first and
second components swapped. This yields similarly
Since ξ1 , ξ2 , ξ1′ , ξ2′ ∈ (0, 1), the points (ξ1 h, ξ2 h) and (ξ1′ h, ξ2′ h) tend to (0, 0) as h tends to
0. Therefore, due to the continuity of both partial derivatives, we conclude ∂2 ∂1 f (x) =
∂1 ∂2 f (x).
TODO: along with the Hessian, perhaps define also the Laplacian?
for i, j ∈ {1, . . . , n}. Schwarz’s theorem 10.28 entails that H(x) is a symmetric matrix.
The Laplacian is the trace of this matrix:
∆f(x) := tr H(x) = ∑_{i=1}^n ∂ii f(x).
|α| := α1 + . . . + αn, α! := α1! · · · αn!, where 0! = 1.
Many combinatorial formulas are simple when expressed in multi-index notation, such
as the multinomial formula:
(X + Y)^α = ∑_{β≤α} (α choose β) X^β Y^{α−β}, where (α choose β) := (α1 choose β1) · · · (αn choose βn).
For instance, for all n, m ∈ N one has
n^m / m! = ∑_{α∈N^n, |α|=m} 1/α!.
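This identity is a finite computation for small n and m, so it can be verified directly (a Python sketch):

```python
from itertools import product
from math import factorial

# Verify n^m / m! = sum over multi-indices alpha in N^n with
# |alpha| = m of 1/alpha!.
def multi_indices(n, m):
    """All alpha = (a1,...,an) with nonnegative entries summing to m."""
    return [a for a in product(range(m + 1), repeat=n) if sum(a) == m]

def multi_factorial(alpha):
    prod = 1
    for a in alpha:
        prod *= factorial(a)
    return prod

n, m = 3, 4
lhs = n ** m / factorial(m)
rhs = sum(1 / multi_factorial(a) for a in multi_indices(n, m))
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)  # both approximately 3.375
```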
Thanks to Schwarz’s Theorem, multi-indices are also useful to express higher order deriva-
tives.
Proof. Since U is open, there exists ϵ > 0 such that x + th ∈ U for all t ∈ (−ϵ, 1 + ϵ). We apply
the one-dimensional Taylor approximation to φ : (−ϵ, 1 + ϵ) → R given by φ(t) = f (x + th).
φ(1) = Σ_{m=0}^{k} φ^(m) (0) / m! + ∫₀¹ φ^(k+1) (t) (1 − t)^k / k! dt. (10.5)
Applying the chain rule in Theorem 10.16 to φ, we get for t ∈ (−ϵ, 1 + ϵ) the derivatives
φ′(t) = Σ_{i=1}^{n} ∂i f (x + th) hi ,

φ′′(t) = Σ_{i,j=1}^{n} ∂i ∂j f (x + th) hi hj ,

φ′′′(t) = Σ_{i,j,ℓ=1}^{n} ∂i ∂j ∂ℓ f (x + th) hi hj hℓ , . . .
where we used a combinatorial count to re-write the last sum. Indeed, unwrapping the definitions one has

φ^(m) (t) = Σ_{i1 ,...,im =1}^{n} ∂i1 · · · ∂im f (x + th) hi1 · · · him ,

and so, given an n-multi-index α with |α| = m, the equation hi1 · · · him = h^α has m!/α! solutions (i1 , . . . , im ): one has to choose α1 indices among m to be sent to 1, then α2 indices among the remaining m − α1 to be sent to 2, etc. Hence the total number of solutions is
(m choose α1) (m−α1 choose α2) (m−α1−α2 choose α3) · · · (m−α1−. . .−αn−1 choose αn) = m! / (α1 ! · · · αn !) = m!/α! .
P (h) − Σ_{k=0}^{d} (1/k!) D^k f (x0 )(h, . . . , h) = o(|h|^d ),

but two polynomials of degree d whose difference is o(|h|^d ) must have exactly the same coefficients.
This Corollary can be useful to compute Taylor polynomials for explicit functions, without
having to care about factorials etc.
Plugging in t = x − y² we find

√(1 + x − y²) = 1 + (x − y²)/2 − (x − y²)²/8 + O((x − y²)³)
= 1 + x/2 − x²/8 − y²/2 + xy²/4 − y⁴/8 + O((x − y²)³).
Now we want (x, y) → (0, 0), and a remainder which is o(r²) where r := √(x² + y²). Observing that

(1/4) xy² = O(r³), (1/8) y⁴ = O(r⁴), O((x − y²)³) = O(r³), as r ↓ 0,
we find our expansion

√(1 + x − y²) = 1 + x/2 − x²/8 − y²/2 + O(r³), as r ↓ 0.
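One can check numerically that the remainder is indeed O(r³): the error of the quadratic polynomial, divided by r³, stays bounded as r ↓ 0. (This sketch is an added illustration, not part of the notes.)

```python
import math

f = lambda x, y: math.sqrt(1 + x - y**2)
# the second-order Taylor polynomial found above
P2 = lambda x, y: 1 + x / 2 - x**2 / 8 - y**2 / 2

def ratio(r):
    # error divided by r³ should stay bounded as r -> 0
    x = y = r / math.sqrt(2)          # a point at distance r from the origin
    return abs(f(x, y) - P2(x, y)) / r**3

for r in [1e-1, 1e-2, 1e-3]:
    print(r, ratio(r))
```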
Exercise 10.36. — (F) Compute the Taylor polynomials up to the quadratic order at
(0, 0) of the following functions in two variables
sin(xy), √(1 + x + y²), exp(arctan(x − y)), 1/(1 − x² − y²), . . .
Exercise 10.37. — (F) Prove the Taylor expansion of the determinant close to the identity
is
det(I + tX) = 1 + t tr(X) + (t²/2) ( tr(X)² − tr(X²) ) + O(t³),

and that the one of the inverse matrix function is

(I + tX)⁻¹ = I − tX + t²X² + O(t³).
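Both expansions are easy to test numerically (an added sketch, not part of the notes; the matrix X is an arbitrary random choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))

def det_approx(t):
    # second-order expansion of det(I + tX)
    trX, trX2 = np.trace(X), np.trace(X @ X)
    return 1 + t * trX + t**2 / 2 * (trX**2 - trX2)

def inv_err(t):
    # distance between (I + tX)^(-1) and its second-order expansion
    approx = np.eye(4) - t * X + t**2 * (X @ X)
    return np.linalg.norm(np.linalg.inv(np.eye(4) + t * X) - approx)

for t in [1e-2, 1e-3]:
    exact = np.linalg.det(np.eye(4) + t * X)
    print(t, abs(exact - det_approx(t)), inv_err(t))
```

Both errors shrink like t³, confirming the O(t³) remainders.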
10.38. — The formula in Theorem 10.33 is called the Taylor expansion with remainder
of f at the point x. The main term
P (h) = f (x) + Σ_{k=1}^{d} (1/k!) D^k f (x)(h, . . . , h)

is called the Taylor polynomial of f at x (of degree d), and R(h) := f (x + h) − P (h) is called the remainder. The estimate R(h) = O(∥h∥^{d+1} ) follows from the one-dimensional case.
Applet 10.39 (Taylor Approximation). We observe how the first, second, or third-order Tay-
lor approximations approximate the function f (x, y) = sin(x) cos(y) + 2.
Proof. Without loss of generality, assume that f attains a local maximum at x0 . For all
j ∈ {1, . . . , n} and sufficiently small h > 0, we have, by assumption,
These conditions can be equivalently stated in terms of the eigenvalues of A (which can
be diagonalised thanks to the Spectral Theorem).
Proposition 11.3:
Let U ⊂ Rn be open, f : U → R be twice continuously differentiable, and let x0 ∈ U
with Df (x0 ) = 0. Let H(x0 ) be the Hessian matrix of f at the point x0 .
(1) If H(x0 ) is positive definite, then f has a strict local minimum at x0 .
(2) If H(x0 ) is negative definite, then f has a strict local maximum at x0 .
(3) If H(x0 ) is indefinite and non-degenerate, then x0 is a saddle point: f has neither a
local minimum nor a local maximum at x0 .
(4) If H(x0 ) is degenerate (i.e., has zero determinant), then x0 might, or might not, be
an extremum point: the Hessian test is inconclusive.
Proof. The Hessian matrix H(x0 ) is the matrix of the second derivative of f at x0 as a
symmetric bilinear form D2 f (x0 ) : Rn × Rn → R. Let Q denote the associated quadratic
form.
Q(h) := D²f (x0 )(h, h) = ⟨h, H(x0 )h⟩.

By Taylor's theorem we may write, for small h ̸= 0,

f (x0 + h) − f (x0 ) = (1/2) ∥h∥² ( Q(h/∥h∥) + α(x0 , h) ), (11.1)
where α(x0 , h) = o(1) as h → 0. If Q is positive definite, then Q(w) > 0 for all w ∈ Sn−1 =
{v ∈ Rn | ∥v∥ = 1}. Since Sn−1 is compact by the Heine-Borel Theorem ??, and Q is
continuous, by the Compactness of Metric Spaces Theorem ??, there exists c > 0 such that
Q(w) ≥ c for all w ∈ Sn−1 . Furthermore, there exists δ > 0 such that the error term α(x0 , h) is smaller in absolute value than c/2 for h ∈ Rn with ∥h∥ < δ. It follows that

f (x0 + h) − f (x0 ) ≥ (1/2) ∥h∥² ( Q(h/∥h∥) − c/2 ) ≥ (c/4) ∥h∥² > 0
for all h ∈ B(0, δ) \ {0}, implying that f has a strict local minimum at x0 . If Q is negative
definite, we replace f with −f , which replaces Q with −Q. However, the quadratic form −Q is
positive definite, and thus −f has a strict local minimum at x0 , proving the second statement of the proposition.
If Q is indefinite but non-degenerate, then there exist w− , w+ ∈ Sn−1 with Q(w− ) < 0
and Q(w+ ) > 0. For sufficiently small s ∈ R \ {0}, we have |α(x0 , sw− )| < (1/2) |Q(w− )| and |α(x0 , sw+ )| < (1/2) Q(w+ ), and thus f (x0 + sw− ) − f (x0 ) < 0 and f (x0 + sw+ ) − f (x0 ) > 0
from (11.1). Therefore, f has neither a local minimum nor a local maximum at x0 .
The corresponding Hessian matrices are diag(2, 2), diag(2, −2), and diag(−2, −2). If the Hessian matrix
is degenerate, i.e., if 0 is an eigenvalue of H(x0 ), then generally nothing can be said. The
function f (x, y) = ax4 + by 4 has a local maximum, a local minimum, or neither at 0, and the
Hessian matrix at 0 is the zero matrix regardless of the choice of a and b.
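The eigenvalue version of the Hessian test is straightforward to automate; the following sketch (added here as an illustration, not part of the notes) classifies critical points from the eigenvalues of a symmetric Hessian, including the diagonal matrices discussed above:

```python
import numpy as np

def classify(H, tol=1e-12):
    # Hessian test via the eigenvalues of the symmetric matrix H
    ev = np.linalg.eigvalsh(H)
    if ev.min() > tol:
        return "strict local minimum"
    if ev.max() < -tol:
        return "strict local maximum"
    if ev.min() < -tol and ev.max() > tol:
        return "saddle point"
    return "inconclusive (degenerate Hessian)"

print(classify(np.diag([2.0, 2.0])))    # strict local minimum
print(classify(np.diag([2.0, -2.0])))   # saddle point
print(classify(np.diag([-2.0, -2.0])))  # strict local maximum
print(classify(np.zeros((2, 2))))       # inconclusive (degenerate Hessian)
```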
For instance, for f (x, y) = ax² + xy + by² the Hessian at 0 is H = ( 2a 1 ; 1 2b ), with det H = 4ab − 1.
• If a > 0 and 4ab − 1 > 0, then H is positive definite, and f has a local minimum at 0.
• If a < 0 and 4ab − 1 > 0, then H is negative definite, and f has a local maximum at 0.
Exercise 11.6. — Let α ∈ R. Find all points (x, y) ∈ R2 where the derivative of the
function given by f (x, y) = x3 − y 3 + 3αxy vanishes. Determine whether each point is an
extremum and, if so, whether it is a local minimum or maximum.
11.1.2 Convexity
11.1.3 Extrema with Constraints and Lagrange Multipliers
In this paragraph we give fairly concrete recipes to tackle the following problem
where
• f : U → R is of class C 1
• if x0 ∉ ∂U , then there are coefficients λ0 , . . . , λk , not all zero, such that
Proof “à la De Giorgi”. Replacing f with f + | · −x0 |4 we may assume that x0 is a strict local
minimum, without changing Df (x0 ).
Step 1. Consider fε (x) := f (x) + (1/(2ε)) ( g1 (x)² + . . . + gk (x)² ) for x ∈ B̄r (x0 ), and let xε be a minimum point of fε over B̄r (x0 ), which exists by compactness.
Step 2. We must have fε (xε ) → f (x0 ) and xε → x0 as ε ↓ 0. Indeed, we have

g1 (xε )² + . . . + gk (xε )² ≤ 2ε fε (xε ) ≤ 2ε f (x0 ) → 0,

and, for any accumulation point x̄ of {xε },

f (x0 ) ≤ f (x̄) = lim_ε f (xε ) ≤ lim inf_ε fε (xε ) ≤ lim sup_ε fε (xε ) ≤ f (x0 ),

which implies x̄ = x0 (recall that x0 is a strict local minimum). This proves that fε (xε ) → f (x0 ) and that {xε } can only accumulate at x0 . On the other hand, {xε } ⊂ B̄r (x0 ) is bounded, so it must have accumulation points. Hence we must have xε → x0 as ε ↓ 0 (no need to take subsequences).
Step 3. In particular xε ∈ Br (x0 ) eventually, so
0 = εDfε (xε ) = εDf (xε ) + g1 (xε )Dg1 (xε ) + . . . + gk (xε )Dgk (xε ).
This means that the k + 1 vectors {Df (xε ), Dg1 (xε ), . . . , Dgk (xε )} are linearly dependent, hence there is a unit vector λε = (λε,0 , . . . , λε,k ) ∈ R^{k+1} such that

λε,0 Df (xε ) + λε,1 Dg1 (xε ) + . . . + λε,k Dgk (xε ) = 0.

Step 4. Passing this equation to the limit along a subsequence with λε → λ (which exists by the compactness of the k-sphere), and using xε → x0 , we conclude.
11.8. — Often Proposition 11.7 is used in practice in the following way, under the extra
assumption that
Dg(x) has rank k for all x ∈ U,
L : U × R^k → R, L(x, λ) := f (x) − Σ_{j=1}^{k} λj gj (x).
The components of λ ∈ Rk are called Lagrange multipliers. Proposition 11.7 then guarantees that there exists λ ∈ Rk such that the equations
and aim to find the minimum of the function f (x, y) = 4y − 3x on K. The set M = K \
{(0, 0), (1, 1), (−1, 1)} is a one-dimensional manifold. We have
Now, using the method of Lagrange multipliers, we seek the local extremum values of f |M .
The Lagrange function associated with f and F is given by
L(x, y, λ) = 4y − 3x − λ(y 3 − x2 ).
∂x L(x, y, λ) = −3 + 2λx
∂y L(x, y, λ) = 4 − 3λy 2
∂λ L(x, y, λ) = −(y 3 − x2 ).
Since y ̸= 0, this yields y = 8²/9² and x = (9/8) y² = 8³/9³. Therefore, using the Lagrange multipliers method, we find a single additional candidate for extremal values:

f ( 8³/9³ , 8²/9² ) = 4 · 8²/9² − 3 · 8³/9³ = 1.053 . . . .
The set of all points in K where f attains a local extremum on K is thus contained in

{(0, 0), (1, 1), (−1, 1), ( 8³/9³ , 8²/9² )}.

The global maximum of f is at the point (−1, 1) with a value of 7, and the global minimum is at the point (0, 0) with a value of 0.
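The candidate point found by the Lagrange method can be double-checked numerically (a sketch added here, not part of the notes): it satisfies the constraint, ∇f is parallel to the gradient of the constraint there, and f takes the announced value.

```python
# candidate from the Lagrange computation on g(x, y) = y³ - x² = 0
# with f(x, y) = 4y - 3x
x, y = 8**3 / 9**3, 8**2 / 9**2

# the constraint holds: y³ = x²
print(y**3 - x**2)           # ≈ 0

# ∇f = (-3, 4) is parallel to ∇g = (-2x, 3y²): solve -3 = λ·(-2x)
lam = -3 / (-2 * x)
print(4 - lam * 3 * y**2)    # ≈ 0, i.e. 4 = λ·3y²

# the value of f at the candidate
print(4 * y - 3 * x)         # ≈ 1.0534...
```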
Applet 11.10 (Lagrange Multipliers and Normal Vectors). In this applet, we illustrate Propo-
sition ?? and Corollary ?? using a one-dimensional submanifold. Under this assumption, only
one gradient vector ∇F is present, so Proposition ?? states that ∇F and ∇f should be parallel.
Exercise 11.12. — Show that ⟨A, B⟩ = tr(AB^T ) defines an inner product on Matn,n (R). Identify the corresponding norm on R^{n²}.
Proposition 11.14:
For m, n ∈ N, the operator norm ∥ · ∥op indeed defines a norm on the vector space
Matm,n (R). Furthermore, the following inequalities hold:
Proof. For the sake of brevity, we write B n = {x ∈ Rn | ∥x∥2 ≤ 1}. According to the Heine-
Borel theorem ??, B n ⊂ Rn is compact. Since the function B n → R given by x 7→ ∥Ax∥2 is
continuous as a composition of continuous functions, by Theorem ??, it is bounded, implying
∥A∥op < ∞.
Version: February 25, 2024.
follow directly from the definition of the operator norm and the corresponding properties of
the Euclidean norm.
For the left inequality in (11.3), it is noted that there is nothing to prove in the case of
x = 0. If x ̸= 0, then
∥Ax∥₂ = ∥A ( x/∥x∥₂ )∥₂ ∥x∥₂ ≤ ∥A∥op ∥x∥₂ .
For the second inequality in (11.3), one calculates ∥ABx∥2 ≤ ∥A∥op ∥Bx∥2 ≤ ∥A∥op ∥B∥op for
all x ∈ B n , proving the claim.
Exercise 11.15. — Show that for B ∈ Matm,m (R) with ∥B∥op < 1, the matrix Im − B is
invertible with
(Im − B)⁻¹ = Σ_{j=0}^{∞} B^j .
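The geometric (Neumann) series converges quickly when ∥B∥op < 1. A numerical sketch (added illustration, with an arbitrary random B rescaled to operator norm 0.4):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
B *= 0.4 / np.linalg.norm(B, 2)    # rescale so the operator norm is 0.4 < 1

# partial sum of the Neumann series sum_j B^j
S = np.zeros((3, 3))
P = np.eye(3)
for _ in range(60):
    S += P
    P = P @ B

exact = np.linalg.inv(np.eye(3) - B)
print(np.linalg.norm(S - exact))   # tiny: the series converges geometrically
```

Here `np.linalg.norm(B, 2)` is the spectral norm, which coincides with the operator norm ∥B∥op for the Euclidean norm.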
standing. Applying this insight to every genuinely complex root of f shows that f can be
written as a product of real polynomials of degree 1 and degree 2. We have implicitly used
this in the discussion of the integration of rational functions.
Proof. Let f ∈ C[T ] be a polynomial of degree n > 0. If f (0) = 0, we have already found
a root. So, assume M = 2|f (0)| > 0. Since f is not constant, according to Proposition ??,
there exists an R ≥ 1 such that for all z ∈ C with |z| ≥ R, |f (z)| ≥ M holds. According to
the Heine-Borel theorem ??,
K = {z ∈ C | |z| ≤ R}
is a compact subset. We now apply the existence of the minimum value in Theorem ?? to the
continuous function z ∈ K 7→ |f (z)| ∈ R and find z0 ∈ K with |f (z0 )| = min{|f (z)| | z ∈ K}.
Since |f (0)| = M/2, the inequality |f (z0 )| ≤ |f (0)| < M holds. Since |f (z)| ≥ M for all z ∈ C \ K, we obtain |f (z)| ≥ |f (z0 )| for all z ∈ C.
We claim that z0 is a root of f . Assume instead, for the sake of contradiction, that |f (z0 )| > 0.
We represent f as a power series around z0 :
f (z) = Σ_{k=0}^{n} bk (z − z0 )^k

with b0 = f (z0 ), b1 = f ′(z0 ), . . . , bn = f^(n) (z0 )/n! ∈ C. The existence of this representation
follows from polynomial division with remainder and induction on n = deg(f ). By the as-
sumption on z0 , b0 ̸= 0. Let ℓ ≥ 1 be the smallest index ≥ 1 with bℓ ̸= 0. Set z = z0 +r exp(iφ)
for a fixed φ ∈ R, which we will choose precisely later, and a varying r > 0. We have
f (z0 + re^{iφ}) = b0 + bℓ r^ℓ e^{iℓφ} + O(r^{ℓ+1}) = b0 ( 1 + (bℓ /b0 ) r^ℓ e^{iℓφ} ) + O(r^{ℓ+1})

as r → 0. Write bℓ /b0 = s e^{iψ} and choose φ = (−ψ + π)/ℓ. Then e^{i(ℓφ+ψ)} = −1 and

|f (z0 + re^{iφ})| = |b0 ( 1 − s r^ℓ + O(r^{ℓ+1}) )| ≤ |b0 | ( 1 − s r^ℓ ) + O(r^{ℓ+1})
for r → 0. For sufficiently small r > 0, this upper bound is smaller than |b0 |, leading to a
contradiction with |f (z0 )| = |b0 | = min{|f (z)| | z ∈ K}. This proves that f (z0 ) = 0 must
hold.
Theorem 11.19:
Every symmetric matrix A ∈ Matn,n (R) is diagonalizable, and there exists an orthonor-
mal basis of Rn consisting of eigenvectors of A.
Lemma 11.20:
Let n ≥ 1 and A ∈ Matn,n (R) be a symmetric matrix. Then, A has a real eigenvector.
Proof. Consider the sphere S = S^{n−1} as the zero set of the function F : Rn → R given by F (x) = ∥x∥² − 1, and the real-valued function

f : Rn → R, f (x) = x^T A x.

Define

L(x, λ) := f (x) − λ F (x)

as the Lagrange function associated with S and f . According to Corollary ??, there exists λ ∈ R such that

∂j L(p, λ) = ∂j f (p) − λ ∂j F (p) = 0

for all j = 1, . . . , n and ∂λ L(p, λ) = −F (p) = 0. The latter simply implies ∥p∥ = 1, or in other words, p ∈ S, as already known.
Now, let's compute the partial derivatives of F and f . For all x ∈ Rn , we have

∂j F (x) = ∂j ( Σ_{k=1}^{n} x_k² ) = 2 x_j ,

∂j f (x) = ∂j Σ_{k,ℓ=1}^{n} a_{kℓ} x_k x_ℓ = Σ_{ℓ=1}^{n} a_{jℓ} x_ℓ + Σ_{k=1}^{n} a_{kj} x_k
by the product rule, and since ∂j (x_k ) is zero unless k = j. Since A is symmetric, we obtain

∂j f (x) = 2 Σ_{ℓ=1}^{n} a_{jℓ} x_ℓ = 2 (Ax)_j .
Proof of Theorem 11.19. In addition to Lemma 11.20, we need some more linear algebra for
the proof, which we will now carry out by induction on n. For n = 1, there is nothing to
prove. So, let A ∈ Matn,n (R) be a symmetric matrix. According to Lemma 11.20, there exists
a real eigenvector v1 ∈ Sn−1 corresponding to an eigenvalue λ1 ∈ R. Consider the orthogonal
complement
W = v1^⊥ = {w ∈ Rn | ⟨w, v1 ⟩ = 0}.

For w ∈ W we have ⟨Aw, v1 ⟩ = ⟨w, Av1 ⟩ = λ1 ⟨w, v1 ⟩ = 0, and it follows that A(W ) ⊂ W . Let w1 , . . . , wn−1 be an orthonormal basis of W with respect to
⟨·, ·⟩ (which exists due to the Gram-Schmidt orthonormalization process). For i, j ∈ {1, . . . , n−
1}, we now have
⟨Awj , wi ⟩ = ⟨wj , Awi ⟩ = ⟨Awi , wj ⟩ .
In other words, the basis representation B of A|W : W → W with respect to the basis
w1 , . . . , wn−1 is symmetric again. By the induction hypothesis, there exists an orthonormal
basis of Rn−1 consisting of eigenvectors of B. Since B (together with the standard basis of Rn−1 ) corresponds exactly to A|W (together with the orthonormal basis w1 , . . . , wn−1 ), this yields an orthonormal basis v2 , . . . , vn of W consisting of eigenvectors of A.
Thus, v1 , . . . , vn forms an orthonormal basis of Rn consisting of eigenvectors of A.
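Numerically, the content of Theorem 11.19 is exactly what `numpy.linalg.eigh` computes for a symmetric matrix. An illustrative check (the matrix is an arbitrary random symmetric one, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                  # an arbitrary symmetric matrix

w, V = np.linalg.eigh(A)           # eigenvalues and an orthonormal eigenbasis

# the columns of V are orthonormal ...
print(np.linalg.norm(V.T @ V - np.eye(4)))
# ... and are eigenvectors: A v_i = λ_i v_i
print(np.linalg.norm(A @ V - V @ np.diag(w)))
```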
Exercise 11.21. — For the sake of completeness, we present an elementary argument for
the proof of Lemma 11.20 using the Fundamental Theorem. Let n ≥ 1 and A ∈ Matn,n (R) be
a symmetric matrix.
(ii) Prove Lemma 11.20 by showing that A has a complex eigenvector if and only if A has a
real eigenvector.
The geometric understanding of the eigenvalues of A gained in our proof can also be utilized
differently. As an example, one can prove a special case of the Courant-Fischer theorem.
Exercise 11.22. — For n ≥ 2, prove that two points x, y ∈ Sn−1 have maximum distance
if and only if x = −y. Consider the function (x, y) 7→ ∥x − y∥2 on Sn−1 × Sn−1 ⊂ R2n .
then the functions Fi : U → R, for i = 1, . . . , n, are called the components of the vector field.
11.24. — We often visualize vector fields by drawing the vector F (x) with the point x ∈ U .
In physics, vector fields are often force fields or indicate the flow velocity of a medium.
Lemma 11.26:
Let U ⊂ Rn be an open subset, F : U → Rn be a continuous vector field, let γ : [a, b] → U be a continuously differentiable path, and let ψ : [0, 1] → [a, b] be a C¹ function such that ψ(0) = a, ψ(1) = b. Then

∫_{γ◦ψ} F = ∫_γ F.
Proof. This is a consequence of the change of variable formula of Analysis I and the chain
rule:
∫_{γ◦ψ} F = ∫₀¹ F (γ(ψ(t))) · (γ ◦ ψ)′(t) dt = ∫₀¹ F (γ(ψ(t))) · γ′(ψ(t)) ψ′(t) dt
= ∫_a^b F (γ(s)) · γ′(s) ds = ∫_γ F.
Proof. Let 0 = s0 < s1 < . . . < sN = 1 be a suitable partition of the unit interval. There
exists a smooth function β : [0, 1] → R with non-negative values such that β(sk ) = 0 for
k = 0, 1, 2, . . . , N , and

∫_{s_{k−1}}^{s_k} β(t) dt = s_k − s_{k−1} .
In the next two propositions we show that a vector field F admits a potential if and only if the value of γ ↦ ∫_γ F depends only on the endpoints of γ.
Proposition 11.29:
Let U ⊂ Rn be open, and let F : U → Rn be a continuous vector field. Suppose there
exists a potential f : U → R for F . Then, for every piecewise continuously differentiable path γ : [0, 1] → U ,

∫_γ F = f (γ(1)) − f (γ(0)).
Proof. If γ : [0, 1] → U is a continuously differentiable path, then for t ∈ [0, 1], F (γ(t)) =
grad f (γ(t)) = Df (γ(t)), and thus
∫_γ F = ∫₀¹ ⟨F (γ(t)), γ′(t)⟩ dt = ∫₀¹ Df (γ(t))(γ′(t)) dt = ∫₀¹ (f ◦ γ)′(t) dt = f (γ(1)) − f (γ(0))
by the chain rule. If γ is only piecewise continuously differentiable with respect to a partition
0 = s0 < s1 < . . . < sN = 1, the calculation can be applied to the subintervals [sk−1 , sk ]. This
leads to a telescoping sum where all terms except f (γ(1)) − f (γ(0)) cancel.
11.30. — One of the many physical interpretations of such path integrals is the calculation
of work along a path γ. Suppose F (x) indicates the direction and strength of a force field at
point x ∈ U . Then, ⟨F (γ(t)), γ(t + δ) − γ(t)⟩ is approximately the work done when an object
moves along the path γ from γ(t) to γ(t + δ). This leads to the interpretation of ∫_γ F as the
performed work along the path γ. This total work generally depends on the chosen path, not
just the starting point γ(a) and the endpoint γ(b). However, the work done does not depend
on the chosen parameterization of the path, as shown in Lemma 11.26.
Example 11.31. — Consider the vector field F on R2 defined by F (x, y) = (−y, x) and
calculate the integral of F along different paths from (0, 0) to (1, 1).
Figure 11.1: The vector field F - vector lengths are scaled by a factor of 0.1.
Let γ0 , γ1 , and γ2 : [0, 1] → R2 be the paths from (0, 0) to (1, 1) given by γ0 (t) = (t, t) and
γ1 (t) = { (2t, 0) if t ∈ [0, 1/2] ; (1, 2t − 1) if t ∈ [1/2, 1] },  γ2 (t) = { (0, 2t) if t ∈ [0, 1/2] ; (2t − 1, 1) if t ∈ [1/2, 1] }.
Then,

∫_{γ0} F = ∫₀¹ ⟨F (γ0 (t)), γ0′(t)⟩ dt = ∫₀¹ ⟨(−t, t), (1, 1)⟩ dt = 0,

∫_{γ1} F = ∫₀^{1/2} ⟨(0, 2t), (2, 0)⟩ dt + ∫_{1/2}^{1} ⟨(1 − 2t, 1), (0, 2)⟩ dt = 1,

∫_{γ2} F = ∫₀^{1/2} ⟨(−2t, 0), (0, 2)⟩ dt + ∫_{1/2}^{1} ⟨(−1, 2t − 1), (2, 0)⟩ dt = −1.
We see that the work performed ∫_γ F depends on the chosen path γ. If one moves perpendicular
to the vector field, no work is done. If one moves with the vector field, positive work is done,
and if one moves against the vector field, negative work is done.
From these calculations, it follows in particular that the vector field F does not possess a
potential.
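The three path integrals can be reproduced numerically with a simple Riemann-sum discretization of ∫_γ F (a sketch added here for illustration, not part of the notes):

```python
def path_integral(F, gamma, N=20000):
    # discretize ∫_γ F ≈ Σ F(midpoint) · (segment increment)
    total = 0.0
    for i in range(N):
        t0, t1 = i / N, (i + 1) / N
        x0, y0 = gamma(t0)
        x1, y1 = gamma(t1)
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        Fx, Fy = F(xm, ym)
        total += Fx * (x1 - x0) + Fy * (y1 - y0)
    return total

F = lambda x, y: (-y, x)
g0 = lambda t: (t, t)
g1 = lambda t: (2 * t, 0.0) if t <= 0.5 else (1.0, 2 * t - 1)
g2 = lambda t: (0.0, 2 * t) if t <= 0.5 else (2 * t - 1, 1.0)

print(path_integral(F, g0), path_integral(F, g1), path_integral(F, g2))
```

The three values are (up to rounding) 0, 1 and −1, matching the computation above.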
Exercise 11.32. — Let U ⊂ Rn be open and connected, and let x0 , x1 ∈ U . Show that x0
and x1 can be connected by a continuously differentiable path.
11.33. — A continuous vector field F : U → Rn is called conservative if it admits a potential, that is, if there exists f ∈ C¹(U ) such that grad f = F .
Proposition 11.34:
Let U ⊂ Rn be open and connected, and let F : U → Rn be a continuous vector field. Assume that, for all piecewise continuously differentiable paths γ : [0, 1] → U and σ : [0, 1] → U with γ(0) = σ(0) and γ(1) = σ(1), it holds that

∫_γ F = ∫_σ F.

Then F admits a potential f : U → R.
Proof. Let x0 ∈ U be a fixed point. Since U is connected, there exists, according to Exercise
11.32, for every x ∈ U , a piecewise continuously differentiable path γx in U with initial point
x0 and endpoint x. We consider the function f : U → R given by
f (x) = ∫_{γx} F,
which does not depend on the chosen paths γx by assumption. For k = 1, 2, . . . , n, we verify
that ∂k f (x) = Fk (x), the k-th component of F (x) ∈ Rn . Let x ∈ U and h ∈ R \ {0} be small
enough so that x + thek ∈ U for all t ∈ [0, 1]. Using the path γx : [0, 1] → U from x0 to x, we
can define a path φh from x0 to x + hek as
φh (t) = { γx (2t) if 0 ≤ t ≤ 1/2 ; x + (2t − 1) h e_k if 1/2 ≤ t ≤ 1 }.
due to Theorem 14.80 and the continuity of F . Since this holds for all x ∈ U and k =
1, 2, . . . , n, and F1 , . . . , Fn are continuous by assumption, it follows from Theorem 10.10 that
f is differentiable and grad f (x) = F (x) for all x ∈ U .
Corollary 11.35:
Let U ⊂ Rn be open, and let F be a continuously differentiable conservative vector
field on U , with components F1 , . . . , Fn . Then we necessarily have the integrability
conditions:
∂j Fk = ∂k Fj (11.6)

for all j, k ∈ {1, . . . , n}.

Proof. Writing F = grad f for a potential f (which is twice continuously differentiable since F is C¹), Schwarz's theorem gives ∂j Fk = ∂j ∂k f = ∂k ∂j f = ∂k Fj .
Example 11.36. — Let U = R² \ {0}, and consider the vector field F : U → R² given by

F (x, y) = ( −y/(x² + y²) , x/(x² + y²) ).

Its partial derivatives satisfy

∂1 F2 (x, y) = ∂x ( x/(x² + y²) ) = (−x² + y²)/(x² + y²)² = ∂y ( −y/(x² + y²) ) = ∂2 F1 (x, y),
thus satisfying the integrability conditions (11.6) throughout U . However, F is not conserva-
tive. Let γ : [0, 1] → U be the continuously differentiable loop defined by γ(t) = (cos(2πt), sin(2πt)); a direct computation gives ∫_γ F = 2π ̸= 0, whereas by Proposition 11.29 the integral of a conservative field along a loop must vanish.
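Taking the standard unit-circle loop γ(t) = (cos 2πt, sin 2πt) (an assumption here: the notes' explicit choice of loop is elided in this extract), a discretized computation of ∫_γ F indeed gives 2π:

```python
import math

# integrate F = (-y/(x²+y²), x/(x²+y²)) along γ(t) = (cos 2πt, sin 2πt)
def loop_integral(N=100000):
    total = 0.0
    for i in range(N):
        tm = (i + 0.5) / N                      # midpoint of the i-th step
        x = math.cos(2 * math.pi * tm)
        y = math.sin(2 * math.pi * tm)
        dx = -2 * math.pi * math.sin(2 * math.pi * tm) / N
        dy = 2 * math.pi * math.cos(2 * math.pi * tm) / N
        r2 = x * x + y * y
        total += (-y / r2) * dx + (x / r2) * dy
    return total

print(loop_integral(), 2 * math.pi)  # ≈ 6.2831...
```

Since the value on a loop is nonzero, F cannot have a potential on R² \ {0}, even though the integrability conditions hold there.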
In order to state the main theorem of this section we need to introduce the concept of
homotopy of curves.
Definition 11.37:
Let X be a metric space, and let γ0 and γ1 be paths in X with the same initial point
x0 = γ0 (0) = γ1 (0) and the same endpoint x1 = γ0 (1) = γ1 (1).
A homotopy from γ0 to γ1 is a continuous function H : [0, 1] × [0, 1] → X with the following properties:

H(0, t) = γ0 (t), H(1, t) = γ1 (t), H(s, 0) = x0 , H(s, 1) = x1

for all t ∈ [0, 1] and all s ∈ [0, 1]. We say γ1 is homotopic to γ0 if there exists a homotopy from γ0 to γ1 .
11.38. — Let H be a homotopy from γ0 to γ1 as in the definition. For each fixed s ∈ [0, 1],
the function γs : t 7→ H(s, t) is a path from x0 to x1 . For s = 0 and s = 1, we obtain the
given paths γ0 and γ1 . This way, we can view the homotopy H as a parametrized family of
paths depending continuously on the parameter s ∈ [0, 1].
γ0 ∼ γ1 ⇐⇒ γ1 is homotopic to γ0
Figure 11.2: A non-connected space X, a connected but not simply connected space Y , and
a simply connected space Z.
Exercise 11.42. — Show that a simply connected open set X ⊂ Rn does not have “codimension two” holes. This is the precise meaning: every continuous f : ∂D → X can be extended to a continuous F : D → X, where D denotes the closed two-dimensional disk. In some sense, any (distorted) loop bounds a (distorted) disk in X.
Exercise 11.43. — Let X be a topological space, and let γ : [0, 1] → X be a path from
x0 ∈ X to x1 ∈ X.
(1) is connected: The set U cannot be written as the disjoint union of two open,
non-empty subsets of U (Definition 9.79).
(2) is path-connected: For any two points x0 , x1 ∈ U , there exists a path in U from
x0 to x1 (Definition 9.86).
(3) is simply connected: For any two points x0 , x1 ∈ U , there exists a path in U from
x0 to x1 , and between any two such paths, there exists a homotopy (Definition ??).
(4) is star-shaped: There exists an x0 ∈ U such that for all x1 ∈ U and t ∈ [0, 1],
(1 − t)x0 + tx1 ∈ U .
(5) is convex: For all x0 , x1 ∈ U and all t ∈ [0, 1], (1 − t)x0 + tx1 is also in U .
Show that the implications (5) =⇒ (4) =⇒ (3) =⇒ (2) =⇒ (1) hold among the properties listed above. Find examples of subsets in R² that demonstrate the failure of each reverse implication.
Theorem 11.45:
Let U ⊂ Rn be open, and let F : U → Rn be a continuously differentiable vector field satisfying the integrability conditions

∂k Fj = ∂j Fk (11.7)

for all j, k ∈ {1, . . . , n}. Let γ0 : [0, 1] → U and γ1 : [0, 1] → U be piecewise continuously
differentiable paths with the same initial point x0 and the same endpoint x1 . If γ0 and
γ1 are homotopic, then Z Z
F = F.
γ0 γ1
Corollary 11.46:
Let U ⊂ Rn be open and simply connected. A continuously differentiable vector field on
U is conservative if and only if it satisfies the integrability conditions (11.7).
Proof. This directly follows from Theorem 11.45, Proposition 11.34 and Definition 11.40.
Before proving Theorem 11.45, we give a simpler argument under the additional assumption that U is convex.
Proof. The necessity of the integrability conditions was already proven in Corollary 11.35. For
the converse, assume without loss of generality that 0 ∈ U . We use the path integral of F
along the straight line from 0 to x ∈ U to define a function f : U → R by

f (x) = ∫₀¹ ⟨F (tx), x⟩ dt
for x ∈ U . As in the proof of Proposition 11.34, f is a candidate for a potential of F . Fix j ∈ {1, . . . , n} and consider, as a preparation for the computation of ∂j f , for h ∈ Rn ,
∂h Fj = Σ_{k=1}^{n} h_k ∂k Fj = Σ_{k=1}^{n} h_k ∂j Fk (11.8)
by the assumed integrability conditions. According to Theorem 14.80, ∂j f exists, and for x ∈ U it holds
∂j f (x) = ∫₀¹ ∂j ( Σ_{k=1}^{n} Fk (tx) x_k ) dt = ∫₀¹ ( Σ_{k=1}^{n} (∂j Fk )(tx) t x_k + Fj (tx) ) dt, (11.9)
since only the term with k = j requires the product rule, and the partial derivative of x 7→
Fk (tx) with respect to xj is given by t(∂j Fk )(tx) for x ∈ U , following from the chain rule. We
use (11.8) for h = x in (11.9) and obtain, by partial integration,
Z 1 Z 1 Z 1 Z 1
∂j f (x) = t ∂x Fj (tx) dt + Fj (tx)dt = [tFj (tx)]10 − Fj (tx)dt + Fj (tx)dt = Fj (x)
0 | {z } 0 0 0
where we have recognized the derivative with respect to t of Fj (tx) in the underbraced term.
Thus, F = ∇f , and f is continuously differentiable by Theorem 10.10.
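The construction f (x) = ∫₀¹ ⟨F (tx), x⟩ dt can be tested numerically on a concrete conservative field. In the sketch below (an added illustration, not from the notes) F (x, y) = (2xy, x²) is a made-up example satisfying ∂2 F1 = ∂1 F2 = 2x, with potential x²y:

```python
F = lambda x, y: (2 * x * y, x * x)   # a conservative field with potential x²y

def potential(x, y, N=2000):
    # midpoint rule for the radial integral ∫₀¹ <F(tx), x> dt
    total = 0.0
    for i in range(N):
        t = (i + 0.5) / N
        Fx, Fy = F(t * x, t * y)
        total += (Fx * x + Fy * y) / N
    return total

x0, y0 = 0.8, -0.5
print(potential(x0, y0), x0**2 * y0)  # agree up to quadrature error

# grad(potential) should recover F
h = 1e-5
gx = (potential(x0 + h, y0) - potential(x0 - h, y0)) / (2 * h)
gy = (potential(x0, y0 + h) - potential(x0, y0 - h)) / (2 * h)
print((gx, gy), F(x0, y0))
```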
11.49. — In rough terms, the proof of Theorem 11.45 goes as follows. Let U ⊂ Rn be
open, and let x1 , x2 ∈ U . We equip the set

Ω := { γ : [0, 1] → U continuous | γ(0) = x1 , γ(1) = x2 }

with the distance given by d(γ0 , γ1 ) := sup{∥γ0 (t) − γ1 (t)∥ | 0 ≤ t ≤ 1}, so that (Ω, d) is a metric space whose “points” are in fact paths. In particular, concepts like continuous functions and connectedness apply to this “abstract” metric space. We are also interested in the subset Ω′ ⊂ Ω of piecewise continuously differentiable paths.
Lemma 11.50:
Ω′ is dense in Ω with respect to the distance d.
Proof. We need to prove that any continuous path γ can be approximated, in the distance d, by a piecewise continuously differentiable path θ having the same endpoints. It is of course enough to approximate each component of γ.
Once we observe that each γi : [0, 1] → R is uniformly continuous, it is sufficient to approx-
imate it by linear interpolation of a sufficiently fine mesh.
Proof of Theorem 11.45. Under the hypotheses of Theorem 11.45, let x0 and x1 be elements
of U , and define Ω, Ω′ as in 11.49. We consider the function
I : Ω′ → R, I(γ) := ∫_γ F = ∫₀¹ ⟨F (γ(t)), γ′(t)⟩ dt.
Step 1. We claim that for every point σ ∈ Ω, there exists an ϵ > 0 such that I is constant
on the open ball B(σ, ϵ) ∩ Ω′ (of course this ball is taken with respect to the distance d).
Fix σ ∈ Ω. The image σ([0, 1]) ⊂ U is compact, U ⊂ Rn is open and therefore there exists
an ϵ > 0 such that
B(σ(t), 2ϵ) ⊂ U for all t ∈ [0, 1],
where B(σ(t), 2ϵ) denotes the open ball in Rn centered at σ(t) with radius 2ϵ.
Given γ0 , γ ∈ B(σ, ϵ) ∩ Ω′ , define E(s) := I( s γ + (1 − s) γ0 ) for s ∈ [0, 1]; this is well defined since s γ(t) + (1 − s) γ0 (t) ∈ B(σ(t), 2ϵ) ⊂ U for all t. A direct but somewhat tedious calculation (we postpone it to the end of the proof) shows E′(s) = 0, and so

I(γ0 ) = E(0) = E(1) = I(γ).

Step 2. By Lemma 11.50 and Step 1, I extends to a locally constant function I¯ : Ω → R by setting

I¯(σ) := I(γσ ),

where γσ is any path in B(σ, ϵ) ∩ Ω′ .

Step 3. The set

{γ ∈ Ω | I¯(γ) = I¯(γ0 )}

is open (I¯ being locally constant) and closed (I¯ being continuous). Thus it must contain the path-connected component of γ0 . If γ0 ∈ Ω and γ1 ∈ Ω are homotopic paths as in the theorem, then, according to Lemma 11.51, γ0 and γ1 lie in the same path-connected component of Ω. It follows that I(γ0 ) = I(γ1 ), completing the proof.
Step 4. We conclude by checking that indeed E′(s) = 0. Observe that, writing H(s, t) := s γ(t) + (1 − s) γ0 (t), we have ∂t Hk (s, t) = s γk′ (t) + (1 − s) γ0,k′ (t) and ∂s Hk (s, t) = γk (t) − γ0,k (t). So we have

E′(s) = Σ_k (d/ds) ∫₀¹ Fk (H(s, t)) ( s γk′ (t) + (1 − s) γ0,k′ (t) ) dt

= Σ_k ∫₀¹ ∂s ( Fk (H(s, t)) ∂t Hk (s, t) ) dt

= Σ_{k,ℓ} ∫₀¹ ∂ℓ Fk (H(s, t)) ∂s Hℓ (s, t) ∂t Hk (s, t) dt + Σ_k ∫₀¹ Fk (H(s, t)) ∂t ∂s Hk (s, t) dt.
The trick here is to integrate by parts the second term to cancel the first one, exploiting both
the integrability conditions and γ, γ0 having the same ends.
Σ_k ∫₀¹ Fk (H(s, t)) ∂t ∂s Hk (s, t) dt = Σ_k ∫₀¹ Fk (H(s, t)) ( γk′ (t) − γ0,k′ (t) ) dt

= Σ_k ∫₀¹ Fk (H(s, t)) (d/dt)( γk (t) − γ0,k (t) ) dt

= − Σ_k ∫₀¹ (d/dt)( Fk (H(s, t)) ) ( γk (t) − γ0,k (t) ) dt

= − Σ_{k,ℓ} ∫₀¹ ∂ℓ Fk (H(s, t)) ∂t Hℓ (s, t) ∂s Hk (s, t) dt,

where in the third equality we integrated by parts, the boundary terms vanishing because γ and γ0 have the same endpoints.
Now, exchanging the roles of the indices k and ℓ in this last expression, and using ∂k Fℓ = ∂ℓ Fk , one finds

Σ_{k,ℓ} ∫₀¹ ∂ℓ Fk (H) ∂t Hℓ ∂s Hk dt = Σ_{k,ℓ} ∫₀¹ ∂ℓ Fk (H) ∂s Hℓ ∂t Hk dt,

so the two terms in the formula for E′(s) cancel each other and E′(s) = 0.
Applet 11.52 (Integrability Conditions). What different values for the path integral can you
obtain when considering closed paths? Why does the value of the path integral usually not
change, but sometimes does when you move the middle three points?
Let (V, | · |) be a complete normed vector space, let B1 ⊂ V be the open unit ball, and let F : B1 → V be a function of the form F (x) = x + ϕ(x), with Lip(ϕ) ≤ λ < 1. Then:

(1) B(F (0), 1 − λ) ⊂ F (B1 ), and F (B1 ) is open;

(2) F is injective, and F⁻¹ : F (B1 ) → B1 is Lipschitz with Lip(F⁻¹) ≤ 1/(1 − λ).
Proof. (F) We start with the first inclusion in (1), which is the core of this proof. For y ∈
B(F (0), 1 − λ) consider the recurrence
x0 = 0, xk+1 = y − ϕ(xk ).
If we prove that xk → x̄, then x̄ = y − ϕ(x̄), that is, we have solved y = F (x̄). Using the triangle inequality and the contraction property of ϕ we find

|xk+1 − xk | = |ϕ(xk ) − ϕ(xk−1 )| ≤ λ |xk − xk−1 | ≤ . . . ≤ λ^k |x1 − x0 | = λ^k |y − F (0)|,

which proves that {xk } is Cauchy. We are not done yet though: we have to check that {xk }
is well defined, i.e., never escapes from B1 , as we are computing ϕ on it. This is checked by
the same computation, for all k it holds
|xk+1 | ≤ Σ_{i=0}^{k} |xi+1 − xi | ≤ |y − F (0)| Σ_{i=0}^{k} λ^i ≤ |y − F (0)| / (1 − λ) < 1.
Now, we prove that F (B1 ) is open. Let Br (x0 ) ⊂ B1 and y0 = F (x0 ). Consider the rescaled map Fx0 ,r (x) := ( F (x0 + rx) − x0 ) / r = x + (1/r) ϕ(x0 + rx), defined on B1 , which is again a λ-Lipschitz perturbation of the identity. By the first part,

F (B1 ) ⊃ F (Br (x0 )) = x0 + r Fx0 ,r (B1 ) ⊃ x0 + r B( Fx0 ,r (0), 1 − λ ) = B(1−λ)r (F (x0 )) = B(1−λ)r (y0 ),

showing that y0 is an interior point of F (B1 ).
We turn to (2). First of all, if F (x) = F (x′ ), then x − x′ = ϕ(x′ ) − ϕ(x), hence |x − x′ | ≤ λ |x − x′ |, which forces x = x′ since λ < 1. So F is injective and F⁻¹ is a well-defined function. We call ψ(y) := F⁻¹(y) − y, for y ∈ F (B1 ). Then ψ satisfies

|ψ(y) − ψ(y′ )| ≤ |ϕ(y + ψ(y)) − ϕ(y′ + ψ(y′ ))| ≤ λ |y − y′ + ψ(y) − ψ(y′ )| ≤ λ |y − y′ | + λ |ψ(y) − ψ(y′ )|,

so Lip(ψ) ≤ λ/(1 − λ), and F⁻¹(y) = y + ψ(y) is Lipschitz with constant at most 1/(1 − λ).
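The recurrence xk+1 = y − ϕ(xk ) from the proof is easy to run in practice. In the sketch below (an added illustration), ϕ(x) = (1/2) sin x is a made-up λ-Lipschitz perturbation with λ = 1/2, and y = 0.3 lies in B(F (0), 1 − λ) as the lemma requires:

```python
import math

phi = lambda x: 0.5 * math.sin(x)   # Lip(phi) <= 0.5 < 1
F = lambda x: x + phi(x)            # F(0) = 0, so B(F(0), 1-λ) = B(0, 0.5)

def solve(y, iters=60):
    # the contraction iteration x_{k+1} = y - phi(x_k) from the proof
    x = 0.0
    for _ in range(iters):
        x = y - phi(x)
    return x

y = 0.3
x_bar = solve(y)
print(F(x_bar), y)   # F(x̄) ≈ y
```

The errors |xk+1 − xk | shrink by a factor λ at each step, so 60 iterations are far more than enough for machine precision.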
θ : X ↦ X⁻¹ , defined on the open set U ⊂ Matn,n (R) of invertible matrices.
Then θ ∈ C∞ .
Proof. The formula for the inverse matrix expresses the (p, q) entry of X −1 as a polynomial
of {Xi,j } divided by the polynomial det X (which is nonzero in U ).
Proof. Up to replacing ϕ(x) with ϕ(x + x̄) − ϕ(x̄), we can assume 0 = x̄ = ϕ(x̄).
Let A := Dϕ(x̄), then we claim that F (x) := A−1 ◦ ϕ(x) is a Lipschitz perturbation of the
identity in Br for r > 0 small. Indeed, as x → 0, we have
∥D(F (x) − x)∥op = ∥DF (x) − 1n ∥op = ∥A−1 ◦ (Dϕ(x) − A)∥op ≤ ∥A−1 ∥op ∥Dϕ(x) − A∥op → 0,
By Corollary 10.25, for r small enough we deduce that Lip(F − id, Br ) ≤ 1/2. By Lemma 12.1 (rescaled to the ball Br ), we find that
F (Br ) is open, that F |Br is injective and that the inverse function G : F (Br ) → Br is Lipschitz.
Now (1) holds because ϕ = A ◦ F, which is a composition of injective functions.
For (2) we need to show that ϕ(Br ) = A(F (Br )) is open. But F (Br ) is open and an
invertible linear map sends open sets to open sets.
For (3) we set ψ(y) := G(A−1 y) for y ∈ ϕ(Br ) = A(F (Br )). Then it holds
Since ϕ is differentiable at every point of Br , and ψ is Lipschitz, then Lemma 12.2 shows that
ψ is differentiable for all y ∈ ϕ(Br ), and
Finally, assume that ϕ ∈ C k and that we already know ψ ∈ C k−1 . The formula for Dψ gives it as the composition of the following functions:

Dψ(y) = ( Dϕ(ψ(y)) )⁻¹ = (θ ◦ Dϕ ◦ ψ)(y), with ψ ∈ C k−1 , Dϕ ∈ C k−1 , θ ∈ C∞ .

Thus Dψ ∈ C k−1 , which means that ψ ∈ C k (bootstrap). This proves the last part of the statement by induction from the k = 1 case.
Definition 12.5:
Let U, V ⊂ Rn be open. A bijective, smooth function f : U → V with a smooth inverse
f −1 : V → U is called a diffeomorphism. If f and f −1 are both d-times continuously
differentiable for d ≥ 1, we call f a C d -diffeomorphism.
• Cartesian. This means describing the set as the set of points that solve n − k given equations.
• Parametric. This means identifying each point in the set with k numerical parameters,
which lie in some range and don’t necessarily have direct geometric meaning (think about
generalized coordinates in Lagrangian mechanics).
• Graphical. Describe the coordinates of the points in the set prescribing how the last
n − k coordinates depend on the first k coordinates.
(1) There is a linear map f : Rn → Rn−k , with maximal rank (i.e., with rank n − k),
such that X = {f = 0}. That is, X is described as the set of solutions of a system
of n − k equations.
(2) There is a linear map f : Rk → Rn , with maximal rank (i.e., with rank k), such
that X = f (Rk ). That is, X is parametrised with a linear function of k variables.
(3) up to reordering the coordinates, X coincides with the graph of a linear map
f : Rk → Rn−k . In other words, (n − k) coordinates of every point in X can
be expressed as a linear function of the other k coordinates.
Let us recall that proving the equivalence of these definitions is not trivial in the linear case,
as one needs essentially to learn how to solve linear equations in order to pass from one
representation of X to the other.
Now we aim to define a suitable concept of k-dimensional surface in Rn , using C 1 maps
rather than linear maps. This is indeed possible, paying the price of working “locally” around
each point. In this framework, the Inverse Function Theorem is the key tool that allows us to
“locally solve non-singular equations”.
(1) There is f ∈ C 1 (Rn , Rn−k ) such that the matrix Df (p) has maximal rank (i.e., has
rank n − k) and for r > 0 sufficiently small
X ∩ Br (p) = {f = 0} ∩ Br (p).
(2) There is f ∈ C 1 (Rk , Rn ) such that f (0) = p, the matrix Df (0) has maximal rank
(i.e., has rank k) and, for s, r > 0 sufficiently small,

X ∩ Br (p) = f (Bs ) ∩ Br (p),

where Bs ⊂ Rk .
(3) There is f ∈ C 1 (Rk , Rn−k ) such that, up to reordering the coordinates, for r > 0 sufficiently small

X ∩ Br (p) = { (x, f (x)) | x ∈ Rk } ∩ Br (p).
The fact that these three conditions are equivalent is not obvious and will be proved by
the means of the Inverse Function Theorem.
We will prove that (1) =⇒ (3) =⇒ (2) =⇒ (1).
We just record that (3) =⇒ (2) is straightforward since the graph of a function f : Rk →
Rn−k coincides, by definition, with the image of the function f˜: Rk → Rn given by f˜(x) =
(x, f (x)).
Theorem (Implicit Function Theorem). — Let U ⊆ Rn × Rm be open, let F ∈ C 1 (U, Rm ),
and assume that the m × m matrix Dy F is invertible at a certain (x̄, ȳ) ∈ U . Then, for
sufficiently small r, s > 0, there is a C 1 function f from B(x̄, r) ⊂ Rn to B(ȳ, s) ⊂ Rm
such that, for all (x, y) in the cylinder B(x̄, r) × B(ȳ, s) ⊂ U , it holds

F (x, y) = F (x̄, ȳ) if and only if y = f (x),

and

Df (x) = − (Dy F )(x, f (x))−1 ◦ (Dx F )(x, f (x)).
Proof. Consider the map ϕ(x, y) := (x, F (x, y)). By assumption the matrix Dϕ(x̄, ȳ) — which
has size (n + m) × (n + m) — has the block decomposition

Dϕ(x̄, ȳ) =
( 1n             0
  Dx F (x̄, ȳ)   Dy F (x̄, ȳ) ),

so in particular it is invertible and we are under the assumptions of the Inverse Function
Theorem. Hence ϕ has a smooth inverse when restricted to a small cylinder Br (x̄) × Bs (ȳ) ⊂ U ,
which is mapped to the open set V := ϕ(Br (x̄) × Bs (ȳ)) ⊂ Rn+m .
Let ψ ∈ C 1 (V, Br (x̄) × Bs (ȳ)) be the inverse. Notice that ψ must have the form
ψ(x, z) = (x, g(x, z)) for some C 1 function g, so that

(x, y) = (ψ ◦ ϕ)(x, y) = (x, g(x, F (x, y))) for all (x, y) ∈ Br (x̄) × Bs (ȳ).

Setting f (x) := g(x, F (x̄, ȳ)), we obtain, for (x, y) ∈ Br (x̄) × Bs (ȳ),

y = f (x) ⟺ g(x, F (x, y)) = g(x, F (x̄, ȳ)) ⟺ F (x, y) = F (x̄, ȳ),

where the last equivalence uses that g(x, ·) is injective.
The formula for Df (x̄) can be read off the matrix Dψ(x̄, ȳ). Recall that

( 1   0 )−1     ( 1        0   )
( A   B )    =  ( −B−1 A   B−1 ).
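As a sanity check of the formula Df (x) = −(Dy F )(x, f (x))−1 (Dx F )(x, f (x)), here is a numerical sketch for an assumed example, F (x, y) = x2 + y 2 − 1, whose zero set is locally the graph of f (x) = √(1 − x2 ):

```python
import math

# Numerical sketch of the implicit-function formula Df(x) = -(D_y F)^{-1} D_x F
# for the assumed example F(x, y) = x^2 + y^2 - 1 near (0.6, 0.8).

def f(x):
    return math.sqrt(1 - x**2)   # local solution y = f(x) of F(x, y) = 0

x = 0.6
y = f(x)                          # y = 0.8, so D_y F = 2y is invertible
h = 1e-6
df_numeric = (f(x + h) - f(x - h)) / (2 * h)   # finite-difference derivative
df_formula = -(2 * x) / (2 * y)                # -(D_y F)^{-1} (D_x F) = -x/y

print(df_numeric, df_formula)    # both ≈ -0.75
```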
Let f ∈ C 1 (Rk , Rn ) with 0 < k < n be such that Df (0) has rank k. Then there is r > 0
small and g ∈ C 1 (Br (f (0)), Rn−k ) such that Dg(f (0)) has rank n − k and f (Rk ) ∩
Br (f (0)) = {g = 0} ∩ Br (f (0)).
Proof. Pick vectors v1 , . . . , vn−k in Rn and s > 0 small in such a way that the map
(z, w) ∈ Bs × Rn−k ↦ f (z) + Σ_{j=1}^{n−k} wj vj has invertible differential at the origin.
By the Inverse Function Theorem there are functions ζ ∈ C 1 (Br (f (0)), Rk ) and
ω ∈ C 1 (Br (f (0)), Rn−k ) such that

x = f (ζ(x)) + Σ_{j=1}^{n−k} ωj (x) vj , for all x ∈ Br (f (0)),

and such a representation of x is unique in that ball. The choice g(x) := ω(x) thus provides
the sought function.
Definition 12.9:
Let X ⊂ Rn be a k-dimensional manifold. The tangent space of X at p ∈ X is the
vector subspace of Rn defined as

Tp X := {γ′(0) | γ : (−1, 1) → X differentiable, γ(0) = p}.
12.10. — In plain terms, the set Tp X is the set of velocity vectors of short paths through
p in X. If we think of elements of Tp X as vectors with base point p, then Tp X corresponds
to a k-dimensional affine subspace of Rn that touches X at the point p. We note that the
choice of (−1, 1) as the domain of differentiable paths is arbitrary; we could just as well have
chosen another open neighborhood of 0 ∈ R, such as an open interval (−δ, δ) for an arbitrarily
small δ > 0.
The following Proposition shows how to practically compute Tp X and that it is indeed a
vector subspace of Rn (which is not transparent from the given definition).
Proposition 12.11:
According to how X is given around p, we equivalently have
(1) Tp X = ker Df (p) if X ∩ Br (p) = {f = 0} ∩ Br (p), for some f ∈ C 1 (Rn , Rn−k )
with Df (p) of maximal rank;
(2) Tp X = ran Df (0) if X = f (Bs ) in Br (p) and f (0) = p, for some f ∈ C 1 (Bs , Rn )
with Bs ⊂ Rk ;
Proof. ...
Definition 12.12:
Let X ⊂ Rn be a k-dimensional manifold. The normal space of X at p ∈ X is the
(n − k)-dimensional vector subspace of Rn which is orthogonal to Tp X.
We call any subset of the form Q = 2−k (a + [0, 1)n ), for some a ∈ Zn and k ∈ Z, a
dyadic cube. Given such Q, we define its measure µ(Q) := 2−kn .
For a finite disjoint union of dyadic cubes E = ∪i Qi , we define µ(E) := Σi µ(Qi ).
The measure µ, defined over finite disjoint unions of dyadic cubes, satisfies the following
properties:
1. Translation invariance: µ(E + τ ) = µ(E) for all τ such that 2k τ ∈ Zn for some
k ∈ N.
Proof. ... the key advantage of using dyadic cubes is that they allow us to refine any union or
intersection of dyadic cubes with sufficiently small cubes, all of the same size. At the same
time, we can approximate any real box.
13.3. — Actually, it is not difficult to prove that the properties (1)-(3) below completely
characterize µ.
Chapter 13.1 The n-volume
µin (E) = sup { Σi voln (Qi ) | ∪i Qi ⊆ E, the Qi with pairwise disjoint interiors }

and

µout (E) = inf { Σi voln (Qi ) | E ⊂ ∪i Qi } .

Here, the supremum and the infimum are taken among finite families of rational cubes.
If µin (E) = µout (E), then we say that E is Jordan measurable,
and we then denote voln (E) = µin (E) = µout (E).
2. Additivity: If E, F are Jordan measurable then their intersection and union also
are, and voln (E ∪ F ) = voln (E) + voln (F ) − voln (E ∩ F ).
Proof. Show that the number of cubes in the family { [0, 2−k )n + εT | T ∈ Zn } that intersect
Proof. Using the polar decomposition of matrices (see...), we have that L = R2 SR1 , where
S is a special stretching and the Ri are orthogonal matrices (i.e., rotations, characterized by
RRT = Id). Since rotations leave the unit ball invariant, we have voln (B1 ) = voln (Ri B1 ) =
λ(Ri ) voln (B1 ), and hence λ(Ri ) = 1. On the other hand, λ is clearly multiplicative, that is,

λ(L) = λ(R2 SR1 ) = λ(R2 )λ(S)λ(R1 ) = λ(S) = |det S|.

Since orthogonal matrices have determinant ±1, and the determinant is also
multiplicative, |det S| = |det L| and we are done.
Corollary 13.11:
voln is also invariant under rotations.
Proof. By induction on the dimension, one can show that an ε-tiling of a cube is off by at
most Cn ε, so sandwiching by cube tilings suffices.
In this chapter, we will extend the Riemann integral over an interval [a, b] ⊂ R to a mul-
tidimensional Riemann integral over sufficiently nice open subsets of Rn . Again, it suffices
to consider real-valued functions, as the generalization to complex-valued or vector-valued
functions is done component-wise.
A box in Rn is a set of the form

Q = I1 × . . . × In

for intervals I1 , . . . , In ⊂ R. If the lengths of the intervals I1 , . . . , In are all equal, we also call
Q a cube. For n = 2, we refer to boxes as rectangles.
Definition 14.2. — For bounded non-empty intervals I1 , . . . , In , where Ik has endpoints
ak ≤ bk , the volume of the box Q = I1 × · · · × In in Rn is defined as

vol(Q) = Π_{k=1}^{n} (bk − ak ).
Given a partition

ak = xk,0 ≤ xk,1 ≤ · · · ≤ xk,l(k) = bk  (14.1)

of Ik for each k, then, for α = (α1 , α2 , . . . , αn ) ∈ Nn with 1 ≤ αk ≤ l(k), an address for this
partition is defined. For each such address, we write

Qα = Π_{k=1}^{n} [xk,αk −1 , xk,αk ]

for the corresponding closed sub-box of Q. In other words, Qα ⊆ Q ⊆ Rn is the subset of all
(t1 , . . . , tn ) ∈ Rn for which xk,αk −1 ≤ tk ≤ xk,αk holds for all k = 1, 2, . . . , n. The addition
formula

vol(Q) = Σ_α vol(Qα )  (14.2)
is shown using complete induction, where the sum extends over all addresses for the given
partition (14.1). A refinement of the partition (14.1) is a partition
such that, for each fixed k, the partition ak = yk,0 ≤ · · · ≤ yk,m(k) = bk of Ik is a refinement of
ak = xk,0 ≤ · · · ≤ xk,l(k) = bk , as discussed in ??. Two arbitrary partitions of Q always have
a common refinement.
Figure 14.1: On the left, a partition of a box [a1 , b1 ] × [a2 , b2 ] ⊆ R2 is shown. The horizontally
represented interval [a1 , b1 ] is divided into three parts, and the vertically represented interval
[a2 , b2 ] is divided into two parts. The address α = (3, 2) corresponds to the sub-box Qα =
[x1,2 , x1,3 ] × [x2,1 , x2,2 ]. On the right, a refinement of this partition is shown.
A step function on Q is a function f : Q → R that, for some partition of Q, is constant,
with value cα , on the interior Q◦α of each sub-box Qα . For such f we set

∫_Q f (x) dx = Σ_α cα vol(Qα ),  (14.3)

where the sum extends over all addresses for the given partition. We call this number the
integral of f over Q. In the case n = 1, this brings back the definitions ?? and ?? of step
functions and their integral.
Exercise 14.5. — Let Q ⊆ Rn be a box. Show that the set of all step functions T F(Q) on
Q forms a vector space with respect to pointwise addition and multiplication. Show that (14.3)
is independent of the choice of partition and thus defines a monotone and linear mapping

∫_Q : T F(Q) → R.
14.6. — Let Q ⊆ Rn be a box, T F denote the vector space of step functions on Q, and
f : Q → R be a function. We define the sets of lower sums U(f ) and upper sums O(f ) of
f as
U(f ) = { ∫_Q u dx | u ∈ T F and u ≤ f },  O(f ) = { ∫_Q o dx | o ∈ T F and f ≤ o }.
If f is bounded, these sets are non-empty. Due to the monotonicity of the integral for step
functions, the inequality
sup U(f ) ≤ inf O(f )
holds if f is bounded.
14.8. — Step functions are Riemann integrable, and the Riemann integral of a step function
is precisely given by (14.3). The following alternative notations for the Riemann integral are
common:

∫_Q f (x1 , x2 , . . . , xn ) dx1 dx2 · · · dxn  and  ∫_Q f (x) dvol.
Proposition 14.10. — Let Q ⊆ Rn be a box, and R(Q) denote the set of all Riemann
integrable functions on Q. Then R(Q) is an R-vector space with respect to pointwise addition
and multiplication, and integration
∫_Q : R(Q) → R
is an R-linear map. Integration is also monotonic and satisfies the triangle inequality: It holds
f ≤ g =⇒ ∫_Q f (x) dx ≤ ∫_Q g(x) dx  and  |∫_Q f (x) dx| ≤ ∫_Q |f (x)| dx.
Proof. The proof is analogous to the proof of the corresponding statements for the Riemann
integral on an interval; see Theorem ?? for linearity, Proposition ?? for monotonicity, and
Theorem ?? for the triangle inequality.
Exercise 14.11. — Let Q ⊆ R2 be the box [−1, 1]2 . Calculate the two-dimensional integral
∫_Q (2 − x2 − y 2 ) dx dy.
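The exact value of the integral in Exercise 14.11 is 16/3; a midpoint Riemann sum (a numerical sketch, not part of the exercise; the grid size is an arbitrary choice) confirms it:

```python
# Midpoint Riemann sum for Exercise 14.11: the integral of 2 - x^2 - y^2
# over Q = [-1, 1]^2 equals 16/3.
N = 200
h = 2.0 / N
total = 0.0
for i in range(N):
    for j in range(N):
        x = -1 + (i + 0.5) * h   # midpoints of the sub-squares
        y = -1 + (j + 0.5) * h
        total += (2 - x*x - y*y) * h * h
print(total)   # ≈ 16/3 ≈ 5.3333
```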
A subset N ⊆ Rn is called a null set if, for every ε > 0, there exists a countable
family (Qℓ )ℓ∈N of open boxes such that

N ⊂ ∪_{ℓ=1}^{∞} Qℓ  and  Σ_{ℓ=1}^{∞} vol(Qℓ ) < ε.  (14.4)
14.14. — We interpret the condition in the definition of null sets as saying that 0 is the only
reasonable value for the volume of a null set N . In fact, (14.4) can be read as stating that N
is contained in a set ∪_{ℓ=1}^{∞} Qℓ whose volume is small. Here, the volume of the countable
union ∪_{ℓ=1}^{∞} Qℓ is not defined, but Σ_{ℓ=1}^{∞} vol(Qℓ ) can be regarded as an upper bound
for it. We say that a statement A about elements x ∈ Rn is true almost everywhere if

{x ∈ Rn | ¬A(x)}

is a null set.
Lemma 14.15. — A subset of a null set is a null set. A countable union of null sets is
again a null set.
Proof. The first statement follows directly from (14.4). Let (Nj )j∈N be a countable family of
null sets in Rn , and let N be their union. Let ε > 0. Then, by the definition of null sets, for
each j ∈ N, there exists a countable family of open boxes (Qjk )k∈N such that

Nj ⊂ ∪_{k=0}^{∞} Qjk  and  Σ_{k=0}^{∞} vol(Qjk ) < ε/2^{j+1} .

This implies

N = ∪_{j=0}^{∞} Nj ⊂ ∪_{j=0}^{∞} ∪_{k=0}^{∞} Qjk  and  Σ_{j=0}^{∞} Σ_{k=0}^{∞} vol(Qjk ) < Σ_{j=0}^{∞} ε/2^{j+1} = ε.

Since N × N is countable, and ε > 0 was arbitrary, this shows that N is a null set.
Example 14.16. — Every countable subset in Rn is a null set. In particular, for example,
Qn ⊆ Rn is a null set, since Qn is countable. Every box with an empty interior is a null set.
Every linear subspace V ⊊ Rn is a null set in Rn . In Proposition 14.18, we will show that an
open subset U ⊆ Rn is a null set if and only if U is empty.
Proof. If the interior of X is not empty, then X contains a closed box with a non-empty
interior. Let Q = [a1 , b1 ] × · · · × [an , bn ] be such a box, a1 < b1 , . . . , an < bn . It suffices to
show that Q is not a null set. If Q were a null set, there would exist open boxes O1 , O2 , . . .
with

Q ⊂ ∪_{k=1}^{∞} Ok  and  Σ_{k=1}^{∞} vol(Ok ) < vol(Q)/2 .

Since Q is compact, already finitely many of these boxes, say O1 , . . . , Om , cover Q.
We define the boxes Qk = Ok ∩ Q and write Qk = [ak,1 , bk,1 ] × . . . × [ak,n , bk,n ] for all k ∈
{1, . . . , m}. In particular, Q = ∪_{k=1}^{m} Qk .
For each j ∈ {1, . . . , n}, we can define a partition of [aj , bj ] by arranging the points
{a1,j , b1,j , . . . , am,j , bm,j }. This gives us a partition of Q such that each closed sub-box Qk ⊆ Q
is a finite union of boxes Qα , where α are addresses in this partition of Q. It follows that
vol(Q) = Σ_α vol(Qα )  and  vol(Qk ) = Σ_{α | Qα ⊆ Qk} vol(Qα ).

Since every Qα is contained in at least one Qk , it follows that

vol(Q) ≤ Σ_{k=1}^{m} vol(Qk ) ≤ Σ_{k=1}^{m} vol(Ok ) < vol(Q)/2 ,

a contradiction.
Figure 14.2: After reducing to a finite cover of the cube Q, it is much easier to show that the
total volume of the covering cubes is at least as large as the volume of Q. In fact, for the
latter, a partition is fabricated from Q1 , . . . , Qm as illustrated, and then the addition formula
(??) is applied.
Exercise 14.19. — Show that a subset N ⊆ Rn is a null set if and only if, for every ε > 0,
there exists a family (Qℓ )ℓ∈N of closed cubes, with

N ⊂ ∪_{ℓ=1}^{∞} Qℓ  and  Σ_{ℓ=1}^{∞} vol(Qℓ ) < ε.
Proof. Let ε > 0. Since f is Riemann integrable, there exist step functions u
and o on Q with u ≤ f ≤ o and ∫_Q (o − u) dx < ε. We choose a partition of Q such that for each
address α, both functions u and o are constant on Q◦α . Let cα be the constant value of u, and
dα be the constant value of o on Q◦α . For each address α, we define the box Pα = Qα × [cα , dα ]
and obtain

graph(f ) ⊆ ∪_α Pα ∪ ∪_α (∂Qα × R),

where the second union corresponds to the grid of the partition of Q. It is a finite union of
axis-parallel hyperplanes in Rn , which we can cover with countably many cubes with volume
0. It holds

Σ_α vol(Pα ) = Σ_α (dα − cα ) vol(Qα ) = ∫_Q (o − u) dx < ε,
showing that graph(f ) can be covered by countably many cubes with a sum of volumes less
than ε. Since ε > 0 was arbitrary, the proposition follows.
Exercise 14.23. — Use the previous exercise to show that every k-dimensional submanifold
of Rn for k < n is a null set in Rn .
Exercise 14.24. — Let Q ⊆ R2 be a rectangle with a finite cover by rectangles
Q1 , . . . , Qn ⊂ Q that intersect only along their edges.
Assume that each of the rectangles Q1 , . . . , Qn has at least one edge of integer length. Show
that then Q also has at least one edge of integer length.
N = {x ∈ Q | f is discontinuous at x}
is a null set.
Corollary 14.26. — Let Q be a closed cube with non-empty interior. Then, every con-
tinuous function f : Q → R is Riemann integrable.
Proof. This follows directly from the Lebesgue Criterion. Since Q is compact, all continuous
functions on Q are bounded.
where B∞ (x, δ) denotes the ball with respect to the supremum norm. Such a ball is an open,
axis-parallel cube with center x and side length 2δ. The oscillation ω(f, x) is defined as the
limit, independent of the shape of the balls, of ω(f, x, δ) as δ → 0.
Proof. Let η ≥ 0, and let (xk )k∈N be a convergent sequence in X with ω(f, xk ) ≥ η for all
k ∈ N and with limit x ∈ X. Take δ > 0 arbitrarily. Then, there exists a k such that
xk ∈ B(x, δ). Since B(x, δ) is open, there is a δk > 0 with B(xk , δk ) ⊆ B(x, δ). From

sup f (B(x, δ)) ≥ sup f (B(xk , δk )),  inf f (B(x, δ)) ≤ inf f (B(xk , δk )),

it follows that ω(f, x, δ) ≥ ω(f, xk , δk ) ≥ ω(f, xk ) ≥ η. Since δ > 0 was arbitrary, we
conclude ω(f, x) ≥ η.
Proof of Theorem 14.25. First, we assume that the bounded real-valued function f on Q is
Riemann-integrable. Let η > 0 and ε > 0 be arbitrary. Then, according to Proposition 14.9,
there exist step functions u and o on Q with u ≤ f ≤ o and ∫_Q (o − u) dx < εη. We choose a
partition of Q so that for each address α, the function u is constant on Q◦α with value cα , and
o is constant on Q◦α with value dα . It holds

Σ_α (dα − cα ) vol(Qα ) < εη.
Now we consider the closed set Nη = {x ∈ Q | ω(f, x) ≥ η} from Lemma 14.28, and let
A(η) be the set of addresses α with dα − cα ≥ η. Then

η Σ_{α∈A(η)} vol(Qα ) ≤ Σ_α (dα − cα ) vol(Qα ) < εη,

so Σ_{α∈A(η)} vol(Qα ) < ε. For each α ∉ A(η) and x ∈ Q◦α , there exists a δ > 0 such that
B(x, δ) is contained in Q◦α . Therefore, ω(f, x) ≤ dα − cα < η
for all α ∉ A(η) and x ∈ Q◦α . Every element x ∈ Nη is either an element of ∂Qα for an address
α, or an element of Q◦α for an address α ∈ A(η). From

Nη ⊂ ∪_α ∂Qα ∪ ∪_{α∈A(η)} Q◦α  and  Σ_{α∈A(η)} vol(Qα ) < ε,

since the boundaries ∂Qα are null sets and ε > 0 was arbitrary, it follows that Nη is a null set.
Finally,

N = {x ∈ Q | f is discontinuous at x} = {x ∈ Q | ω(f, x) > 0} = ∪_{k=0}^{∞} N_{2^{−k}}

is a null set, being a countable union of null sets (Lemma 14.15).
By construction, K ⊆ Q \ Nε , and thus ω(f, x) < ε for all x ∈ K. According to Proposition ??, there
exists a δ > 0 such that for all x ∈ K, the estimate

ω(f |K , x, δ) < 2ε  (14.7)
holds. We now choose a partition of Q so that the mesh size is smaller than δ, and such that
each of the cubes Ok ∩ Q is a union of closed cubes Qα . We define step functions u and o with
u ≤ f ≤ o by
u(x) = inf f (Q◦α ) if x ∈ Q◦α for some address α, and u(x) = f (x) otherwise;
o(x) = sup f (Q◦α ) if x ∈ Q◦α for some address α, and o(x) = f (x) otherwise,
and want to show that ∫_Q (o − u) dx is small. To do this, we separate the sum corresponding
to (??) into two parts. In the first part, we sum over addresses α such that Qα is part of a
cube Oℓ , and in the second part, we sum over the remaining addresses α. On the first sum,
we then apply (14.6), and on the second sum, we apply (14.7). This results in
∫_Q (o − u) dx ≤ 2M Σ_{ℓ=1}^{m} Σ_{α | Qα ⊆ Oℓ} vol(Qα ) + 2ε Σ_{α | Qα ⊆ K} vol(Qα ) ≤ 2M ε + 2ε vol(Q) = 2(M + vol(Q))ε,
and since ε > 0 was arbitrary, this implies the Riemann integrability of f .
Proof. Suppose first that for every ε > 0 there exist continuous functions f− and f+ on Q
satisfying (14.8). Since f− and f+ are Riemann-integrable according to Corollary 14.26, there
exist step functions u and o with u ≤ f− and ∫_Q (f− − u) dx < ε, as well as f+ ≤ o and
∫_Q (o − f+ ) dx < ε. Then u ≤ f ≤ o and, by (14.8), ∫_Q (o − u) dx < 3ε.
Since ε > 0 was arbitrary, this implies the Riemann integrability of f from Proposition 14.9.
For the reverse direction, we proceed step by step, making progressively weaker assumptions
about the functions f .
Case 1: f = 1Q′ for a closed rectangle Q′ ⊆ Q. For δ > 0, consider the functions f+ and
f− on Q given, for instance, by

f+ (x) = max{0, 1 − δ −1 dist(x, Q′ )}  and  f− (x) = min{1, δ −1 dist(x, Q \ Q′ )}.

The functions f+ and f− are both continuous, and it holds f− ≤ 1Q′ ≤ f+ . Indeed, the
function f+ is constant with value 1 on Q′ and constant with value 0 outside a δ-neighborhood
of Q′ . The function f− is constant with value 0 on the complement of Q′ and constant with
value 1 outside a δ-neighborhood of Q \ Q′ . In particular, f− (x) = 1Q′ (x) = f+ (x) for all x
outside a δ-neighborhood of ∂Q′ , which implies the estimate for the integral in (14.8) as long
as δ is chosen small enough.
Case 2: f is a step function. We choose a partition of Q that suits f . By duplicating each
partition point, we can refine the partition so that for each subrectangle Qα , each side face
is again a subrectangle in the partition. Let N be the number of rectangles in this partition,
and let M ≥ 0 with −M ≤ f (x) ≤ M for all x ∈ Q. According to the previous case, for each
α there exist continuous functions fα,− ≤ 1Qα ≤ fα,+ with
∫_Q (fα,+ − fα,− ) dx ≤ ε/(2M N ),
where in each sum, the first two sums run over all addresses α with Q◦α ̸= ∅ and cα denotes
the constant value of f on Q◦α , and the second sum runs over all addresses β with Q◦β = ∅. It
holds f− ≤ f ≤ f+ by construction, and
∫_Q (f+ − f− ) dx = Σ_α |cα | ∫_Q (fα,+ − fα,− ) dx + 2M Σ_β ∫_Q (fβ,+ + fβ,− ) dx
< Σ_α |cα | ε/(2M N ) + 2M Σ_β ε/(2M N ) ≤ ε.
Case 3: f is an arbitrary Riemann-integrable function on Q. Given ε > 0, choose step
functions u and o on Q with u ≤ f ≤ o and ∫_Q (o − u) dx ≤ ε. By the previous case, there
exist continuous functions u− ≤ u ≤ u+ and o− ≤ o ≤ o+ with ∫_Q (u+ − u− ) dx ≤ ε and
∫_Q (o+ − o− ) dx ≤ ε. It follows that u− ≤ f ≤ o+ and

∫_Q (o+ − u− ) dx ≤ ε + ∫_Q (o+ − o + u − u− ) dx ≤ ε + ∫_Q (o+ − o− ) dx + ∫_Q (u+ − u− ) dx ≤ 3ε.
Proof. The first statement follows from Lebesgue’s Criterion and the equality
1B1 ∩B2 = 1B1 · 1B2 , 1B1 ∪B2 = 1B1 + 1B2 − 1B1 ∩B2 , 1B1 \B2 = 1B1 − 1B1 · 1B2
and, according to Equation (14.9), the boundaries ∂(B1 ∪ B2 ), ∂(B1 ∩ B2 ), and ∂(B1 \ B2 ) are
contained in the union ∂B1 ∪ ∂B2 and thus null sets. Again, by Lebesgue’s Criterion, B1 ∪ B2 ,
B1 ∩ B2 , and B1 \ B2 are Jordan-measurable.
is Jordan-measurable.
Proof. The functions f− and f+ are bounded, so there exists M ≥ 0 such that −M < f− (x) <
M and −M < f+ (x) < M for all x ∈ Q. Let N− ⊆ D and N+ ⊆ D be the set of discontinuity
points of f− and f+ , respectively, and write N = ∂D ∪ N− ∪ N+ . The set B is bounded since
it is contained in Q × [−M, M ]. The boundary of B is contained in the union

graph(f− ) ∪ graph(f+ ) ∪ (N × [−M, M ]).

By Proposition 14.20, graph(f− ) and graph(f+ ) are null sets. By Lemma 14.15, N , as the
union of three null sets, is also a null set. Therefore, N × [−M, M ] is
a null set. In fact, for a countable cover by open rectangles {Qk | k ∈ N} of N , the family
{Qk × [−M, M ] | k ∈ N} is a countable cover by open rectangles of N × [−M, M ], and

Σ_{k=0}^{∞} vol(Qk ) < ε/(2M )  =⇒  Σ_{k=0}^{∞} vol(Qk × [−M, M ]) < ε.

Hence ∂B is a null set, and B is Jordan-measurable by Lebesgue’s Criterion applied to 1B .
Exercise 14.34. — Show that for a null set N ⊆ Rn−1 , N ×R is also a null set in Rn . Show
that for a Jordan-measurable set D ⊆ Rn−1 and any a < b, D × [a, b] is a Jordan-measurable
subset of Rn .
The integral of the function f̃ , the extension of f by the constant value 0, over Q1 is equal to the
corresponding integral over Q, as becomes evident when considering a partition of Q1 for
which Q ⊆ Q1 is a sub-box, as shown in Exercise ??.
Exercise 14.37. — Let a ∈ Rn and λ ∈ R. Show that for every Jordan-measurable subset
B ⊆ Rn , the subset a + λB = {a + λb | b ∈ B} is Jordan-measurable with volume

vol(a + λB) = |λ|n vol(B).
Proof. All statements follow directly by applying the propositions in Section 14.1.1 to f̃ as in
Definition ??.
holds

∫_{B1 ∪B2} f dx = ∫_{B1} f dx + ∫_{B2} f dx − ∫_{B1 ∩B2} f dx.
Exercise 14.41. — A subset J ⊂ Rn is called a Jordan null set if, for every ϵ > 0, there
exists a finite family Q1 , . . . , Qm ⊂ Rn of open boxes such that
J ⊆ ∪_{k=1}^{m} Qk  and  Σ_{k=1}^{m} vol(Qk ) < ϵ.
Exercise 14.42. — Let J be a Jordan null set, and let f : J → R be a bounded function.
Show that f is Riemann-integrable, and ∫_J f dx = 0.
Exercise 14.43. — Let f ∈ R(Q) with ∫_Q |f | dx = 0. Show that f vanishes almost
everywhere. Formulate and prove the analogous statement for functions on Jordan-measurable
subsets.
14.44. — Let Q be a closed box with non-empty interior, and R(Q) be the vector space
of Riemann-integrable functions on Q. Then, the expression
∥ · ∥1 : R(Q) → R,  ∥f ∥1 = ∫_Q |f | dx
defines a so-called seminorm on R(Q). In fact, ∥ · ∥1 satisfies all properties of a norm from
Definition ??, except definiteness. Instead, for f ∈ R(Q) with ∥f ∥1 = 0 according to Exercise
14.43, f is almost everywhere equal to zero. The set
N (Q) = {f ∈ R(Q) | ∥f ∥1 = 0}
is a linear subspace of R(Q), and the seminorm ∥ · ∥1 can be interpreted as a norm on the
quotient space R(Q)/N (Q). The resulting normed vector space is not complete and thus
not as useful for advanced analysis. However, the completion of this and similarly constructed
spaces plays a crucial role in measure theory, where such completions are studied extensively.
parameter integral, as hinted at in Exercise ??. We use the notation introduced in ?? for
lower and upper sums U(f ), O(f ) for bounded functions f on a box Q.
are both step functions on P with respect to the given partition of P . It holds
∫_{P ×Q} h(x, y) d(x, y) = ∫_P H− (x) dx = ∫_P H+ (x) dx  (14.12)
due to (14.10).
With these preparations, we now consider a function F : P → R with F− ≤ F ≤ F+ ,
choose ϵ > 0 and step functions u and o on P × Q with u ≤ f ≤ o and ∫_{P ×Q} (o − u) d(x, y) < ϵ.
from elementary properties of supremum and infimum as well as (14.12). This shows that F
is Riemann-integrable, and also that
Z Z
F (x)dx = f (x, y)d(x, y)
P P ×Q
if all parameter integrals exist. Otherwise, the parameter integrals can be replaced by suprema
of lower sums or infima of upper sums.
as claimed.
Example 14.48. — Consider the subset B ⊆ R2 enclosed between the parabolas with
equations y = x2 and x = y 2 . We view this set as a uniformly thick, homogeneous plate and
want to calculate its center of mass.
From Section ??, we already know that the region defined by 0 ≤ y ≤ x2 and 0 ≤ x ≤ 1 has
area 1/3. Due to symmetry, the area of B is also 1/3. Also, due to symmetry, the center of mass
of B lies on the line with equation x = y. The x-coordinate xS , and thus also the y-coordinate,
of the center of mass S is given by definition through xS = (1/vol(B)) ∫_B x dy dx, and is
calculated with Corollary 14.47 as

xS = (1/vol(B)) ∫_B x dy dx = 3 ∫_0^1 ∫_{x^2}^{√x} x dy dx = 3 ∫_0^1 x(√x − x2 ) dx = 9/20.
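The one-dimensional integral in the last step of Example 14.48 can also be checked numerically (a sketch; the grid size is an arbitrary choice):

```python
import math

# Midpoint approximation of x_S = 3 * ∫_0^1 x*(sqrt(x) - x^2) dx.
N = 100000
h = 1.0 / N
xs = 0.0
for i in range(N):
    x = (i + 0.5) * h                       # midpoint of the i-th subinterval
    xs += 3 * x * (math.sqrt(x) - x * x) * h
print(xs)   # ≈ 9/20 = 0.45
```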
Exercise 14.49. — Calculate the integral of the function f (x, y, z) = xyz over the set
B ⊆ R3 for
Calculate the volume of the set A ⊆ R2 enclosed between the curves x2 +y 2 = 8 and 4y = x2 +4.
Corollary 14.50 (Cavalieri’s Principle). — Let B ⊆ [a, b]×Rn−1 be a bounded and Jordan-
measurable set. Then,
vol(B) = ∫_a^b vol(Bt ) dt,
where for t ∈ [a, b], the subset Bt ⊆ Rn−1 is given by Bt = {y ∈ Rn−1 | (t, y) ∈ B} and is
Jordan-measurable for almost all t ∈ [a, b].
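As a sketch of Cavalieri's Principle, take for B the unit ball in R3: the slices Bt are discs of radius √(1 − t2 ), so vol(B) = ∫_{−1}^{1} π(1 − t2 ) dt = 4π/3, which a midpoint sum over the slices reproduces (the grid size is an arbitrary choice):

```python
import math

# Cavalieri sketch: the volume of the unit ball in R^3 as the integral of the
# slice areas vol(B_t) = pi * (1 - t^2) for t in [-1, 1].
N = 20000
h = 2.0 / N
vol = sum(math.pi * (1 - (-1 + (k + 0.5) * h) ** 2) * h for k in range(N))
print(vol, 4 * math.pi / 3)   # both ≈ 4.18879
```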
Exercise 14.51. — In Section ??, the volume of rotational bodies was defined. Here,
we verify that the introduced volume of Jordan-measurable subsets in the last section is
compatible with it. Let f : [a, b] → R≥0 be continuous, and let

K = {(x, y, z) ∈ R3 | a ≤ x ≤ b, 0 ≤ √(y 2 + z 2 ) ≤ f (x)}

be the rotational body given by f . Show that K is Jordan-measurable and that the volume
of K in the sense of Definition 14.31 is given by π ∫_a^b f (x)2 dx.
Exercise 14.52. — Calculate ∫_0^1 ∫_x^1 exp(y 2 ) dy dx.
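A hint-style sketch for this exercise: swapping the order of integration (Fubini) turns the integral into ∫_0^1 ∫_0^y exp(y 2 ) dx dy = ∫_0^1 y exp(y 2 ) dy = (e − 1)/2, which we verify numerically (the grid size is an arbitrary choice):

```python
import math

# ∫_0^1 ∫_x^1 exp(y^2) dy dx equals, after swapping the order of integration,
# ∫_0^1 y*exp(y^2) dy = (e - 1)/2.
N = 2000
h = 1.0 / N
swapped = sum((k + 0.5) * h * math.exp(((k + 0.5) * h) ** 2) * h
              for k in range(N))
print(swapped, (math.e - 1) / 2)   # both ≈ 0.8591
```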
14.53. — Theorem 14.45 and Corollary 14.50 are, among other things, necessary even for
very elementary volume calculations. For example, one might wonder whether the volume of a
pyramid in R3 can be computed geometrically by decomposing the pyramid, via finitely many
operations of cutting and gluing, into boxes. This method works excellently for polygons in R2 .
However, already for pyramids in R3 , it can be proven, using the so-called Dehn Invariant,
that this method generally does not work; thus, for the volume calculation of polyhedra in
R3 , one has to resort to integration methods of analysis.
as the integral of f over a box containing supp(f ), if it exists, and in that case, we call f
Riemann-integrable. Here, U does not necessarily need to be bounded or Jordan-measurable.
14.56. — The reason why the hypothesis about the support of f is necessary is as follows:
The function arctan : R → (− π2 , π2 ) is a diffeomorphism. According to the substitution rule,
here in dimension 1, for the constant function f : (− π2 , π2 ) → R with value 1, it should hold

π = ∫_{−π/2}^{π/2} 1 dy = ∫_R arctan′ (x) dx = ∫_{−∞}^{∞} 1/(x2 + 1) dx.
This is not entirely incorrect, but in this case, the integral on the right must be understood
as an improper Riemann integral, which we have not introduced in the multidimensional case
yet. According to Definition ??, the function x 7→ (x2 + 1)−1 on R is not Riemann-integrable,
as R as a subset of R is not Jordan-measurable.
Proof.
Example 14.58. — Let 0 < a < b and 0 < c < d be fixed parameters used to define the
curvilinear rectangle M ⊆ X = R2>0 , which is bounded by the four curves
y = ax2 , y = bx2 , x = cy 2 , x = dy 2 .
We want to use the substitution rule to calculate the area vol(M ) of M . To do this, we
introduce the variables u = x2 y −1 and v = x−1 y 2 . The maps

Ψ(x, y) = (x2 y −1 , x−1 y 2 )  and  Φ(u, v) = (u2/3 v 1/3 , u1/3 v 2/3 )

are mutually inverse C 1 bijections of X, and Φ maps the rectangle [1/b, 1/a] × [1/d, 1/c]
onto M . We compute

det(DΦ(u, v)) = det ( (2/3) u−1/3 v 1/3    (1/3) u2/3 v −2/3
                      (1/3) u−2/3 v 2/3    (2/3) u1/3 v −1/3 ) = 4/9 − 1/9 = 1/3.
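A finite-difference sketch confirming that det DΦ = 1/3 at every point of R2>0 (the sample points and step size are arbitrary choices):

```python
# Numerical check that det DΦ = 1/3 for Φ(u, v) = (u^(2/3) v^(1/3), u^(1/3) v^(2/3)).
def phi(u, v):
    return (u ** (2/3) * v ** (1/3), u ** (1/3) * v ** (2/3))

def jacobian_det(f, u, v, h=1e-6):
    # central finite differences for the 2x2 Jacobian matrix
    fu = [(a - b) / (2 * h) for a, b in zip(f(u + h, v), f(u - h, v))]
    fv = [(a - b) / (2 * h) for a, b in zip(f(u, v + h), f(u, v - h))]
    return fu[0] * fv[1] - fu[1] * fv[0]

print(jacobian_det(phi, 2.0, 3.0))   # ≈ 1/3, independently of the point
```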
where U = {(x, y) ∈ R2 | y ̸= 0 or x < 0}. In the first equation, we omit the integration over
the Jordan null set {(x, y) | y = 0, x ≥ 0} using Proposition 14.40, and in the second equation,
we apply Theorem 14.55 to the polar coordinate transformation discussed in Section ??
Φ : (0, R) × (0, 2π) → U,  Φ(r, φ) = (r cos φ, r sin φ).
The factor r in the right integral is the Jacobian determinant of the polar coordinate trans-
formation Φ.
Figure 14.3: The factor r as the Jacobian determinant has a geometric meaning. A box with
side lengths △r and △φ corresponds to an almost rectangular section of a circular ring with
“side lengths” △r and approximately r△φ, which are often informally denoted dr and r dφ
when these increments are taken to be small.
Exercise 14.61. — Compute the area for given constants 0 < a < b and 0 < c < d of the
set {(x, y) ∈ R2 | a ≤ y exp(−x) ≤ b, c ≤ y exp(x) ≤ d}.
where tij are the coefficients of the matrix associated with T . Each of these parameter integrals
extends from −∞ to ∞ or alternatively from −R to R for a sufficiently chosen R > 0, as f has
compact support. For the integral with respect to the variable xk , starting with k = 1, then
for k = 2, and so on until k = n, we perform the one-dimensional linear substitution yk =
|tkk |xk + tk,k+1 xk+1 + · · · + tk,n xn . In Leibniz notation, dyk = |tkk | dxk , or dxk = |tkk |−1 dyk .
The multiple integral becomes

|t11 t22 · · · tnn |−1 ∫ · · · ∫ f (y1 , y2 , . . . , yn ) dyn · · · dy2 dy1 ,

which, again using Fubini’s theorem, is precisely the integral | det T |−1 ∫ f (y) dy.
Proof. Let’s assume initially that the function f is continuous. From linear algebra, we know
that L can be written as the product of invertible matrices L = P ST , where P is a permutation
matrix, S is a lower triangular matrix, and T is an upper triangular matrix. The statement
of Lemma 14.63 holds equally for lower triangular matrices with the same proof, and also for
permutation matrices. It follows by applying Lemma 14.63 three times:

∫_{Rn} f (P ST (x)) dx = | det(P )|−1 | det(S)|−1 | det(T )|−1 ∫_{Rn} f (x) dx = | det(L)|−1 ∫_{Rn} f (x) dx,
which proves the statement for continuous functions. Now, let f : Rn → R be any Riemann-
integrable function with support in an open box Q ⊆ Rn . According to Proposition ??, for
every ϵ > 0, there exist continuous functions f− and f+ on Rn with support in Q such that
f− ≤ f ≤ f+  and  ∫_{Rn} (f+ − f− ) dx < ϵ.
From this, using the already treated case of continuous functions, we get
f− ◦ L ≤ f ◦ L ≤ f+ ◦ L  and  ∫_{Rn} (f+ ◦ L − f− ◦ L) dx < | det L|−1 ϵ,
Proof. If L is not invertible, then L(B) is bounded and contained in a proper linear subspace
of Rn , and thus, a Jordan null set. If L is invertible, the volume vol(L(B)) is given by Lemma
14.64 as
∫_{Rn} 1L(B) (x) dx = | det L| ∫_{Rn} 1L(B) (L(x)) dx = | det L| ∫_{Rn} 1B (x) dx = | det L| vol(B).
Exercise 14.66. — Compute, for a symmetric positive definite matrix A ∈ Matn (R), the
volume of the ellipsoid {x ∈ Rn | ⟨Ax, x⟩ ≤ 1}, assuming knowledge of the volume of the
unit ball.
Corollary 14.67. — Let L ∈ Matn (R) with columns v1 , . . . , vn ∈ Rn . Then, the
parallelotope

P = L([0, 1]n ) = { Σ_{i=1}^{n} si vi | 0 ≤ si ≤ 1 }

is Jordan-measurable, and vol(P ) = | det(L)| = √(gram(v1 , . . . , vn )).
Proof. This follows from Corollary 14.65, applied to the Jordan-measurable set [0, 1]n with
volume 1. Here, gram(v1 , . . . , vn ) denotes the Gram determinant, which is the determinant
of the matrix A with coefficients aij = ⟨vi , vj ⟩. It holds A = Lt L, and therefore | det L|2 =
det A.
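The identity vol(P ) = | det L| = √(gram(v1 , . . . , vn )) can be checked directly for a sample matrix (the columns below are an arbitrary choice):

```python
# |det L| versus sqrt of the Gram determinant, with A_ij = <v_i, v_j> and A = L^T L.
cols = [(1.0, 2.0, 0.0), (0.0, 1.0, 1.0), (3.0, 0.0, 1.0)]  # columns v1, v2, v3

def det3(m):
    # cofactor expansion of a 3x3 determinant along the first row
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

L = [[cols[j][i] for j in range(3)] for i in range(3)]       # matrix with columns v_j
gram = [[sum(a * b for a, b in zip(v, w)) for w in cols] for v in cols]
print(abs(det3(L)), det3(gram) ** 0.5)   # → 7.0 7.0
```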
is contained in X. The compactness of K1 follows from the Heine-Borel theorem. The con-
tainment of K1 in X implies that any closed box Q ⊆ Rn with maximum edge length less than
δ0 and Q ∩ K0 ̸= ∅ is contained in K1 . To see this, we can consider the continuous function
h(x) := dist(x, Rn \ X), as per Exercise 9.54.
Since X is open, we have h(x) > 0 for all x ∈ K0 , and by Theorem ??, this function attains
its minimum on K0 . Then, δ0 = (1/3) min{h(x) | x ∈ K0 } fulfills the required condition.
Figure 14.4: The set K1 corresponds to a slightly inflated version of K0 within X, defined in
such a way that sufficiently small boxes intersecting K0 are fully contained in K1 .
Proof. We can replace Φ with Φ(x − x0 ) − y0 , without loss of generality, assuming x0 = 0 and
y0 = Φ(x0 ) = 0. In this case, the inclusions (14.14) become
from which, using the vector-valued integral triangle inequality in (??), the estimation

∥(Φ(x2 ) − L(x2 )) − (Φ(x1 ) − L(x1 ))∥ ≤ σ∥x2 − x1 ∥ ≤ σ√n ∥x2 − x1 ∥∞  (14.17)

follows. For x2 = x ∈ Q0 and x1 = 0, we obtain ∥Φ(x) − L(x)∥ ≤ σ√n r, and thus

∥L−1 Φ(x) − x∥∞ ≤ ∥L−1 Φ(x) − x∥ ≤ ∥L−1 ∥op σ√n r ≤ sr.  (14.18)
Therefore, the claimed inequality in (14.16) holds. For the first inclusion, we apply the Banach
Fixed Point Theorem 9.56 to the map

Tx (y) := y + x − L−1 Φ(y),  y ∈ Q0 .
This shows ∥Tx (y1 ) − Tx (y2 )∥ ≤ s∥y1 − y2 ∥. Since s < 1 by assumption, we can apply the
Banach Fixed Point Theorem, showing that for every x ∈ (1 − s)Q0 , there exists a unique
y ∈ Q0 such that Tx (y) = y or equivalently L−1 Φ(y) = x. This proves (14.16).
Proof. We will construct parallelotopes P + and P − such that P − ⊆ Φ(Q0 ) ⊆ P + and fulfill
the inequalities
vol(P + )/(1 + ϵ) < | det DΦ(x0 )| vol(Q0 ) < vol(P − )/(1 − ϵ).
Let s ∈ (0, 1) be small enough such that
and let δ0 > 0 be small enough such that the compact set K1 = K0 + B∞ (0, 2δ0 ) is contained
in X. According to Theorem ??, the continuous function x ↦ ∥DΦ(x)−1 ∥op is bounded on
K1 , so we have

∥DΦ(x)−1 ∥op < M for all x ∈ K1 .

Now, let x0 ∈ K0 and Q0 be a closed cube with center x0 and side length smaller than
δ. Write y0 = Φ(x0 ) and L = DΦ(x0 ). According to Lemma 14.69, we have
K1 := K0 + B∞ (0, δ0 )
is contained in X, and choose a closed box Q ⊆ Rn containing K1 . Write g̃ for the extension
of g by 0 to Q.
Let ϵ > 0, and let δ ∈ (0, δ0 ) be small enough such that δ satisfies the statement for ϵ > 0
in Lemma 14.70 for K0 ⊆ X. Since the function x ↦ det DΦ(x) is continuous on X, and thus
uniformly continuous on K1 , we may also assume that

| det DΦ(x) − det DΦ(x0 )| ≤ ϵ whenever ∥x − x0 ∥∞ ≤ δ

holds for all x0 , x ∈ K1 . Choose non-negative step functions u and o on Q with respect to a
partition of Q with mesh smaller than δ, such that

u ≤ g̃ ≤ o  and  ∫_Q (o − u) dx < ϵ.
Write cα ≥ 0 for the constant value of u on Q◦α and dα ≥ 0 for the constant value of o on Q◦α .
Let A be the set of addresses α in this partition for which Qα ∩ K0 ̸= ∅. For α ∈ A, we have
Qα ⊆ K1 , and
supp(g) ⊆ ∪_{α∈A} Qα ⊆ X  and  supp(f ) ⊆ ∪_{α∈A} Φ(Qα ) ⊆ Y.
For α ∈ A, both ∂Qα and Φ(∂Qα ) are Jordan measurable with volume zero. We can thus
calculate the integral of g| det DΦ| over X as
∫_X g(x)| det DΦ(x)| dx = Σ_{α∈A} ∫_{Q◦α} g(x)| det DΦ(x)| dx.
We now estimate the individual summands in these representations. For brevity, write vα =
vol(Qα ) and v = vol(Q), and choose M > 0 such that
Figure 14.5: Geometric setup in the proof of the substitution rule. The boxes Qα with α ∈ A
are highlighted.
The given partition of Q having mesh smaller than δ means that for all α ∈ A and x ∈ Qα ,
the estimate ∥xα − x∥ ≤ δ holds. By choosing δ accordingly, this implies | det(DΦ(x)) −
det(DΦ(xα ))| ≤ ϵ for all α ∈ A and x ∈ Qα . This leads to the upper estimate
∫_{Q◦α} g(x)|det DΦ(x)| dx ≤ vα dα (|det DΦ(xα)| + ϵ) ≤ vα dα |det DΦ(xα)| + ϵ vα M.
Similarly, we can derive a lower estimate, using the step function u ≤ g with constant value
cα on Q◦α . Summing over α ∈ A gives
−ϵM v + Σ_{α∈A} vα cα |det DΦ(xα)| ≤ ∫_X g(x)|det DΦ(x)| dx ≤ ϵM v + Σ_{α∈A} vα dα |det DΦ(xα)|   (14.19)
By the choice of δ and using Lemma 14.70, we have
(|det DΦ(xα)| − ϵ) vα ≤ vol(Φ(Qα)) ≤ (|det DΦ(xα)| + ϵ) vα
for all α ∈ A. The function o ∘ Φ⁻¹ is constant with value dα on Φ(Q◦α) and an upper bound for
f = g ∘ Φ⁻¹, which implies
∫_{Φ(Q◦α)} f dy ≤ ∫_{Φ(Q◦α)} (o ∘ Φ⁻¹) dy ≤ ϵM vα + |det DΦ(xα)| vα dα.
Similarly, using the step function u ≤ g with constant value cα on Q◦α , we derive a lower bound
for the integral of f , which finally leads to the estimates
−ϵM v + Σ_{α∈A} |det DΦ(xα)| vα cα ≤ ∫_Y f dy ≤ ϵM v + Σ_{α∈A} |det DΦ(xα)| vα dα   (14.20)
Notice that both the change of variables formula (here: polar coordinates) and Fubini's Theorem
enter the following classical computation:
I² = ∫_R e^{−x²} dx ∫_R e^{−y²} dy = ∫_{R²} e^{−x²−y²} dx dy = ∫_0^∞ ∫_0^{2π} e^{−r²} r dφ dr = π ∫_0^∞ e^{−r²} 2r dr = π,
which implies I = √π. We have applied Fubini's Theorem to the entire R², which can be
justified by considering suitable exhaustions, such as ([−m, m]²)_{m=0}^∞ for R².
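The value of the Gaussian integral can also be corroborated numerically. The following sketch (pure Python; the helper name `gauss_integral` is ours) approximates I by the trapezoidal rule on a truncated interval, which is harmless here because the integrand decays extremely fast:

```python
import math

def gauss_integral(half_width=8.0, steps=4000):
    """Approximate I = ∫_R exp(-x^2) dx by the composite trapezoidal rule
    on [-half_width, half_width]; the neglected tails are below 1e-27."""
    h = 2 * half_width / steps
    f = lambda x: math.exp(-x * x)
    total = 0.5 * (f(-half_width) + f(half_width))
    for i in range(1, steps):
        total += f(-half_width + i * h)
    return total * h

I = gauss_integral()
# I agrees with sqrt(pi) up to quadrature and truncation error
```

For smooth, rapidly decaying integrands the trapezoidal rule converges exceptionally fast, so even this crude truncation reproduces √π essentially to machine precision.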
Exercise 14.72. — Let A ∈ Matn,n(R) be symmetric and positive definite. Show that
∫_{Rⁿ} exp(−⟨Ax, x⟩) dx = π^{n/2} / √(det A).
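A quick numerical sanity check of this formula in dimension n = 2 (a sketch only; the matrix A below is an arbitrarily chosen positive definite example, and the quadrature helper is ours):

```python
import math

A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric positive definite, det A = 5
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]

def integrand(x, y):
    # <Ax, x> = a11 x^2 + 2 a12 x y + a22 y^2
    q = A[0][0] * x * x + 2 * A[0][1] * x * y + A[1][1] * y * y
    return math.exp(-q)

def double_trapezoid(f, half_width=6.0, steps=300):
    """Tensor-product trapezoidal rule on [-half_width, half_width]^2;
    the integrand is negligible outside this square."""
    h = 2 * half_width / steps
    s = 0.0
    for i in range(steps + 1):
        wx = 0.5 if i in (0, steps) else 1.0
        x = -half_width + i * h
        for j in range(steps + 1):
            wy = 0.5 if j in (0, steps) else 1.0
            y = -half_width + j * h
            s += wx * wy * f(x, y)
    return s * h * h

value = double_trapezoid(integrand)
expected = math.pi / math.sqrt(det_A)   # pi^{n/2} / sqrt(det A) with n = 2
```

The check does not replace the proof (which reduces to the one-dimensional Gaussian integral after diagonalizing A), but it catches sign and normalization mistakes immediately.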
Example 14.73. — As an application of the theory in this chapter, we want to prove the
equation known as the Basel problem:
ζ(2) = Σ_{n=1}^∞ 1/n² = π²/6.
The approach we take here is from T. Apostol [Apo1983]. To do this, we will evaluate the
integral
∫_0^1 ∫_0^1 1/(1 − xy) dx dy   (14.21)
in two different ways.
First, expanding the integrand into a geometric series and exchanging summation and integration,
∫_0^1 ∫_0^1 1/(1 − xy) dx dy = ∫_0^1 ∫_0^1 Σ_{k=0}^∞ (xy)^k dx dy = Σ_{k=0}^∞ ∫_0^1 ∫_0^1 x^k y^k dx dy
= Σ_{k=0}^∞ (1/(k+1)) ∫_0^1 y^k dy = Σ_{k=0}^∞ 1/(k+1)² = ζ(2).   (14.22)
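The slow convergence of this series can be watched directly. A small sketch comparing the partial sum S_N with π²/6, together with the elementary integral-comparison bounds 1/(N+1) < ζ(2) − S_N < 1/N for the tail:

```python
import math

N = 1000
S_N = sum(1.0 / (n * n) for n in range(1, N + 1))   # partial sum of zeta(2)
tail = math.pi ** 2 / 6 - S_N                        # remainder of the series

# Comparing the tail with ∫ x^{-2} dx over [N, ∞) and [N+1, ∞)
# gives the sandwich 1/(N+1) < tail < 1/N.
```

The tail is of order 1/N, so the partial sums alone would need about a million terms for six correct digits; the point of Apostol's argument is precisely to avoid such brute force.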
Secondly, we substitute x = (u − v)/2 and y = (u + v)/2, with Jacobian determinant 1/2.
The new integration domain in the u, v-coordinates is the square with corners at (0, 0), (1, 1),
(2, 0), and (1, −1), as can be easily verified by substitution. It holds that
1/(1 − xy) = 1/(1 − ¼(u − v)(u + v)) = 4/(4 − u² + v²)
as claimed.
Exercise 14.74. — Carry out the above calculations in detail, justifying all the formal steps.
Exercise 14.75. — Calculate the value of the alternating series Σ_{n=1}^∞ (−1)^{n+1}/n².
Exercise 14.77. — For n ∈ N, let ωn denote the volume of the n-dimensional unit ball
B(0, 1) ⊆ Rn . Show that
ωn = π^{n/2} / Γ(n/2 + 1).
Calculate ω100 with computer assistance to 30 correct decimal places. For which n is ωn
maximal?
Hint: Show or use without proof that the one-dimensional integral
In = ∫_0^π sinⁿ(x) dx
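Parts of this exercise can be explored numerically in double precision (the 30-digit value of ω100 requires multi-precision arithmetic, e.g. the mpmath package, which we do not assume here). A sketch using the claimed formula:

```python
import math

def omega(n):
    """Volume of the n-dimensional unit ball: pi^{n/2} / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

# Consistency with the familiar low-dimensional values:
#   omega(2) = pi (disc), omega(3) = 4*pi/3 (ball).
n_max = max(range(1, 51), key=omega)   # dimension with the largest volume
```

One finds that the maximum is attained at n = 5, and that ωn tends to 0 rapidly as n → ∞, since the Gamma function in the denominator grows much faster than π^{n/2}.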
B → R such that |f | ≤ g and the functions f |Bm and g|Bm are Riemann-integrable for all
m ∈ N. Assume g is improperly Riemann-integrable. Show that f and |f | are also improperly
Riemann-integrable on B, and
|∫_B f dx| ≤ ∫_B |f| dx ≤ ∫_B g dx
holds.
Show that
B(x, y) = Γ(x)Γ(y) / Γ(x + y)
holds.
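The definition of B is on an earlier page and not reproduced here; assuming the standard definition B(x, y) = ∫_0^1 t^{x−1}(1 − t)^{y−1} dt, the identity can be checked numerically with a sketch like the following (helper names ours):

```python
import math

def beta_numeric(x, y, steps=100000):
    """Midpoint-rule approximation of B(x, y) = ∫_0^1 t^{x-1}(1-t)^{y-1} dt
    (standard definition of the Beta function, assumed here; for x, y >= 1
    the integrand is bounded, so the midpoint rule converges)."""
    h = 1.0 / steps
    return h * sum(((i + 0.5) * h) ** (x - 1) * (1.0 - (i + 0.5) * h) ** (y - 1)
                   for i in range(steps))

def beta_via_gamma(x, y):
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)
```

For instance B(2, 3) = Γ(2)Γ(3)/Γ(5) = 2/24 = 1/12, and the quadrature reproduces this to many digits.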
Proof. We check the continuity of F at x0 ∈ U. Let ϵ > 0. Choose r > 0 such that B(x0, r) is
contained in U. By the Heine–Borel Theorem, the set K := B(x0, r) × [a, b] is compact,
and f|K is uniformly continuous by the Heine–Cantor Theorem. Thus, there exists δ ∈ (0, r) such
that for all x ∈ B(x0, δ) and t ∈ [a, b], the inequality |f(x, t) − f(x0, t)| < ϵ(b − a)⁻¹ holds.
This implies
|F(x) − F(x0)| ≤ ∫_a^b |f(x, t) − f(x0, t)| dt < ϵ.
Let ϵ > 0. Due to the uniform continuity of ∂k f on K, there exists δ ∈ (0, r) such that for
x ∈ B(x0, δ) and all t ∈ [a, b], the estimate |∂k f(x, t) − ∂k f(x0, t)| < ϵ holds. For
s ∈ (−δ, δ) \ {0}, the mean value theorem provides for each t some ξ = ξ(t) ∈ (0, 1) such that
|(F(x0 + s ek) − F(x0))/s − ∫_a^b ∂k f(x0, t) dt| = |∫_a^b ((f(x0 + s ek, t) − f(x0, t))/s − ∂k f(x0, t)) dt|
= |∫_a^b (∂k f(x0 + ξ s ek, t) − ∂k f(x0, t)) dt| ≤ ϵ(b − a).
By the first part of the theorem, ∂k F is continuous, and since k was arbitrary, continuous
differentiability of F follows from Theorem 10.10.
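The theorem can be illustrated numerically. In the sketch below (all names ours) we take the hypothetical example f(x, t) = e^{−xt²} on [a, b] = [0, 1] and compare a difference quotient of F with the integral of ∂x f:

```python
import math

def simpson(f, a, b, steps=1000):
    """Composite Simpson rule (steps must be even)."""
    h = (b - a) / steps
    s = f(a) + f(b)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def F(x):                       # F(x) = ∫_0^1 exp(-x t^2) dt
    return simpson(lambda t: math.exp(-x * t * t), 0.0, 1.0)

def F_prime_via_integral(x):    # ∫_0^1 ∂_x exp(-x t^2) dt = ∫_0^1 -t^2 exp(-x t^2) dt
    return simpson(lambda t: -t * t * math.exp(-x * t * t), 0.0, 1.0)

x0, h = 1.3, 1e-5
difference_quotient = (F(x0 + h) - F(x0 - h)) / (2 * h)
```

Both quantities approximate F′(x0), and their agreement to many digits reflects exactly the exchange of derivative and integral that the theorem licenses.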
defines, for x ∈ (0, 1), the so-called “complete elliptic integral of the second kind”. According
to Theorem 14.80 and induction, the function F : (0, 1) → R is smooth.
Corollary 14.82:
Let U ⊂ Rⁿ be open, a < b be real numbers, and f : U × (a, b) → R be continuous
with continuous partial derivatives ∂k f for k ∈ {1, . . . , n}. Let α, β : U → (a, b) be
continuously differentiable. Then, the parameter integral with varying limits
F(x) = ∫_{α(x)}^{β(x)} f(x, t) dt
is continuously differentiable, and
∂k F(x) = ∫_{α(x)}^{β(x)} ∂k f(x, t) dt − f(x, α(x)) ∂k α(x) + f(x, β(x)) ∂k β(x)
for all x ∈ U and k ∈ {1, . . . , n}.
Proof. We combine Theorem 14.80, the Fundamental Theorem of Calculus from Analysis I,
and the chain rule (10.16). For this, we define the auxiliary function
φ : U × (a, b)² → R, (x, α, β) ↦ ∫_α^β f(x, t) dt.
First, we show that φ is continuous. Let (xn, αn, βn) ∈ U × (a, b)² be a sequence converging
to (x, α, β) ∈ U × (a, b)². Choose ϵ > 0 and c, d ∈ (a, b) such that
B(x, ϵ) ⊂ U and c ≤ αn, βn ≤ d
for all large enough n. One then estimates |φ(xn, αn, βn) − φ(x, α, β)| using the triangle
inequality for integrals over the subintervals between αn and α, as well as βn and β, and the
bound M for the function values of f on a suitable compact set. For n → ∞, it follows from
Theorem 14.80 that this expression tends to 0, hence φ is continuous.
According to Theorem 14.80, the partial derivatives ∂k φ for k = 1, 2, . . . , n exist and are
given by
∂k φ(x, α, β) = ∫_α^β ∂k f(x, t) dt.
Moreover, by the Fundamental Theorem of Calculus, ∂α φ(x, α, β) = −f(x, α) and
∂β φ(x, α, β) = f(x, β). We can apply the chain rule and obtain that F is continuously
differentiable, with partial derivatives
∂k F(x) = ∂k φ(x, α(x), β(x)) + ∂α φ(x, α(x), β(x)) ∂k α(x) + ∂β φ(x, α(x), β(x)) ∂k β(x)
= ∫_{α(x)}^{β(x)} ∂k f(x, t) dt − f(x, α(x)) ∂k α(x) + f(x, β(x)) ∂k β(x).
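A numerical check of this differentiation formula for one concrete (hypothetical) choice, f(x, t) = sin(xt) with limits α(x) = x and β(x) = x², in dimension n = 1:

```python
import math

def simpson(f, a, b, steps=1000):
    """Composite Simpson rule (steps must be even)."""
    h = (b - a) / steps
    s = f(a) + f(b)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

f = lambda x, t: math.sin(x * t)
alpha = lambda x: x           # alpha(x) = x,   alpha'(x) = 1
beta = lambda x: x * x        # beta(x) = x^2,  beta'(x) = 2x

def F(x):
    return simpson(lambda t: f(x, t), alpha(x), beta(x))

def F_prime_formula(x):
    # ∫_{α(x)}^{β(x)} ∂_x f dt - f(x, α(x)) α'(x) + f(x, β(x)) β'(x)
    return (simpson(lambda t: t * math.cos(x * t), alpha(x), beta(x))
            - f(x, alpha(x)) * 1.0
            + f(x, beta(x)) * 2.0 * x)

x0, h = 1.2, 1e-5
numeric = (F(x0 + h) - F(x0 - h)) / (2 * h)
```

The central difference quotient of F and the three-term formula of the corollary agree to high accuracy, which exercises all three contributions: the integral of ∂x f and the two boundary terms.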
The differential equation
x² u′′(x) + x u′(x) + (x² − n²) u(x) = 0   (14.23)
for u ∈ C²((0, ∞), R) is called the Bessel Differential Equation. It is linear and homogeneous
of second order. From the Picard-Lindelöf existence and uniqueness theorem, which we
will prove towards the end of the semester, it follows that (14.23), together with any two initial
values u(1) = a and u′(1) = b for a, b ∈ R, has a uniquely determined solution on (0, ∞). In
particular, the vector space of solutions to (14.23) is two-dimensional. We aim to provide two
linearly independent solutions. For this purpose, we assume n ∈ N.
The function
Jn(x) = (1/π) ∫_0^π cos(x sin(t) − nt) dt   (14.24)
is called the Bessel function of the first kind, and it solves the differential equation
(14.23), as we can verify using Theorem 14.80. Indeed, we have ∂x(cos(x sin(t) − nt)) =
−sin(x sin(t) − nt) sin(t), and therefore
Jn′(x) = (1/π) ∫_0^π −sin(x sin(t) − nt) sin(t) dt
and similarly
Jn′′(x) = −(1/π) ∫_0^π cos(x sin(t) − nt) sin²(t) dt.
One checks that x² Jn′′(x) + x Jn′(x) + (x² − n²) Jn(x) = 0 using integration by parts and the
assumption n ∈ N. Therefore, (14.24) satisfies the differential equation (14.23).
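That the integral representation satisfies the Bessel equation can also be tested numerically: approximate Jn by quadrature and its derivatives by central differences, and check that x²u′′ + xu′ + (x² − n²)u is close to 0 (a sketch; function names ours):

```python
import math

def J(n, x, steps=2000):
    """J_n(x) = (1/pi) ∫_0^pi cos(x sin(t) - n t) dt, composite Simpson rule."""
    h = math.pi / steps
    g = lambda t: math.cos(x * math.sin(t) - n * t)
    s = g(0.0) + g(math.pi)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * g(i * h)
    return s * h / (3 * math.pi)

def bessel_residual(n, x, d=1e-3):
    """x^2 u'' + x u' + (x^2 - n^2) u for u = J_n, derivatives by
    central differences with step d; should be near 0 for all x > 0."""
    u = J(n, x)
    up = (J(n, x + d) - J(n, x - d)) / (2 * d)
    upp = (J(n, x + d) - 2 * u + J(n, x - d)) / (d * d)
    return x * x * upp + x * up + (x * x - n * n) * u
```

The residual is dominated by the O(d²) finite-difference error and stays far below any plausible failure of the identity, which makes this a cheap regression test for the integral representation.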
14.84. — The Bessel function of the second kind is defined by the improper integral
Yn(x) = (1/π) ∫_0^π sin(x sin(t) − nt) dt − (1/π) ∫_0^∞ (e^{nt} + (−1)ⁿ e^{−nt}) e^{−x sinh(t)} dt
for x ∈ (0, ∞). It can be shown that Yn also satisfies the differential equation (14.23). We
have
lim_{x→0} Jn(x) = (1/π) ∫_0^π cos(nt) dt
according to Theorem 14.80, and
lim_{x→0} Yn(x) = −∞.   (14.25)
(a) Show that the Bessel function Yn of the second kind is well-defined and prove the asymp-
totics in (14.25).
(b) Assume a suitable generalization of differentiation under the integral for the improper
integrals and use it to prove that Yn is a solution to the Bessel differential equation
(14.23).
(c) For the proof of the appropriate generalization of differentiation under the integral, con-
sider the real-valued function
smooth, 55
space, 5
standard metric, 5
standard simplex, 128
Step function, 98
subsequence, 8
support, 116
tangent space, 92
term, 7
topological boundary, 16
topology, 14
uniformly continuous, 19
Upper sums, 99
vector field, 73
[ACa2003] N. A’Campo, A natural construction for the real numbers, arXiv preprint 0301015
(2003).
[Apo1983] T. Apostol, A proof that Euler missed: Evaluating ζ(2) the easy way, The Mathe-
matical Intelligencer 5, no. 3, p. 59–60 (1983).
[Aig2014] M. Aigner and G. M. Ziegler, Das BUCH der Beweise, Springer (2014).
[Bol1817] B. Bolzano, Rein analytischer Beweis des Lehrsatzes, daß zwischen je zwei Werthen,
die ein entgegengesetztes Resultat gewähren, wenigstens eine reelle Wurzel der Gleichung
liege, Haase Verl., Prag (1817).
[Cau1821] A.L. Cauchy, Cours d’analyse de l’école royale polytechnique, L’Imprimerie Royale,
Debure frères, Libraires du Roi et de la Bibliothèque du Roi, Paris (1821).
[Ded1872] R. Dedekind, Stetigkeit und irrationale Zahlen, Friedrich Vieweg und Sohn, Braun-
schweig (1872).
[Hil1893] D. Hilbert, Über die Transzendenz der Zahlen e und π, Mathematische Annalen 43,
p. 216–219 (1893).
[Hos1715] G.F.A. Marquis de l’Hôpital, Analyse des Infiniment Petits pour l’Intelligence des
Lignes Courbes, 2nde Edition, F. Montalant, Paris (1715).
[Zag1990] D. Zagier, A one-sentence proof that every prime p ≡ 1 mod 4 is a sum of two
squares, Amer. Math. Monthly 97, no. 2, p. 144 (1990).