
ADVANCED CALCULUS

Second Edition

Gerald B. Folland
Preface to the Second Edition

The second edition of Advanced Calculus is identical to the first edition,
except for the following points:

• All of the typographical and mathematical errors that were listed on
the errata sheet linked to my web page (last updated in 2021) have
been corrected.

• A brief summary of basic logic has been added. Due to technical
difficulties in producing the new pdf file, it is appended at the very
end of the book, after the index, and it is not listed in the table of
contents.

• There are a few insignificant changes in the formatting of the text.

The first edition of this book was published by Prentice-Hall (later sub-
sumed into Pearson Education) from 2002 to 2022. After their decision to
discontinue publication, the publication rights reverted to me, and I am
making the book freely available to everyone in pdf form.

Gerald B. Folland
Department of Mathematics
University of Washington
Seattle, WA 98195-4350
[email protected]

August 4, 2023
Contents

Preface

1 Setting the Stage
1.1 Euclidean Spaces and Vectors
1.2 Subsets of Euclidean Space
1.3 Limits and Continuity
1.4 Sequences
1.5 Completeness
1.6 Compactness
1.7 Connectedness
1.8 Uniform Continuity

2 Differential Calculus
2.1 Differentiability in One Variable
2.2 Differentiability in Several Variables
2.3 The Chain Rule
2.4 The Mean Value Theorem
2.5 Functional Relations and Implicit Functions: A First Look
2.6 Higher-Order Partial Derivatives
2.7 Taylor’s Theorem
2.8 Critical Points
2.9 Extreme Value Problems
2.10 Vector-Valued Functions and Their Derivatives

3 The Implicit Function Theorem and Its Applications
3.1 The Implicit Function Theorem
3.2 Curves in the Plane
3.3 Surfaces and Curves in Space
3.4 Transformations and Coordinate Systems
3.5 Functional Dependence

4 Integral Calculus
4.1 Integration on the Line
4.2 Integration in Higher Dimensions
4.3 Multiple Integrals and Iterated Integrals
4.4 Change of Variables for Multiple Integrals
4.5 Functions Defined by Integrals
4.6 Improper Integrals
4.7 Improper Multiple Integrals
4.8 Lebesgue Measure and the Lebesgue Integral

5 Line and Surface Integrals; Vector Analysis
5.1 Arc Length and Line Integrals
5.2 Green’s Theorem
5.3 Surface Area and Surface Integrals
5.4 Vector Derivatives
5.5 The Divergence Theorem
5.6 Some Applications to Physics
5.7 Stokes’s Theorem
5.8 Integrating Vector Derivatives
5.9 Higher Dimensions and Differential Forms

6 Infinite Series
6.1 Definitions and Examples
6.2 Series with Nonnegative Terms
6.3 Absolute and Conditional Convergence
6.4 More Convergence Tests
6.5 Double Series; Products of Series

7 Functions Defined by Series and Integrals
7.1 Sequences and Series of Functions
7.2 Integrals and Derivatives of Sequences and Series
7.3 Power Series
7.4 The Complex Exponential and Trig Functions
7.5 Functions Defined by Improper Integrals
7.6 The Gamma Function
7.7 Stirling’s Formula

8 Fourier Series
8.1 Periodic Functions and Fourier Series
8.2 Convergence of Fourier Series
8.3 Derivatives, Integrals, and Uniform Convergence
8.4 Fourier Series on Intervals
8.5 Applications to Differential Equations
8.6 The Infinite-Dimensional Geometry of Fourier Series
8.7 The Isoperimetric Inequality

Appendices

A Summary of Linear Algebra
A.1 Vectors
A.2 Linear Maps and Matrices
A.3 Row Operations and Echelon Forms
A.4 Determinants
A.5 Linear Independence
A.6 Subspaces; Dimension; Rank
A.7 Invertibility
A.8 Eigenvectors and Eigenvalues

B Some Technical Proofs
B.1 The Heine-Borel Theorem
B.2 The Implicit Function Theorem
B.3 Approximation by Riemann Sums
B.4 Double Integrals and Iterated Integrals
B.5 Change of Variables for Multiple Integrals
B.6 Improper Multiple Integrals
B.7 Green’s Theorem and the Divergence Theorem

Answers to Selected Exercises

Bibliography

Index
PREFACE

This is a book about the theory and applications of derivatives (mostly partial),
integrals (mostly multiple or improper), and infinite series (mostly of functions
rather than of numbers), at a deeper level than is found in the standard calculus
books.
In recent years there has been a tendency for the courses that were once called
“advanced calculus” to turn into courses on the foundations of analysis. Students
typically start with a year and a half of calculus that emphasizes computations and
applications, then proceed (perhaps by way of a “bridge course” on mathematical
reasoning) to a course of an entirely theoretical nature that covers such things as
the topology of Euclidean space, the theory of the Riemann integral, and proofs of
some theorems that have been taken on faith before.
I am not persuaded that such a divorce of the practical from the theoretical
aspects of the subject is a good idea. On the one hand, the study of theoretical un-
derpinnings of ideas with which one is already familiar tends to be dry and tedious,
and the development of unfamiliar ideas can be rather daunting unless it is accom-
panied by some hands-on experience with concrete examples and applications. On
the other hand, relegation of the computations and applications to the elementary
courses means that students are not exposed to these matters on a more sophisti-
cated level. (How many students recognize that Taylor polynomials should be part
of one’s everyday tool kit? How many know that the integral test gives an effective
way of approximating the sum of a series?)
This book is an attempt to present a unified view of calculus in which theory
and practice can reinforce each other. On the theoretical side, it is reasonably com-
plete and self-contained. Accordingly, it contains a certain amount of “foundations
of analysis,” but I have kept this material to the bare minimum needed for the main
topics of the book. I also place a higher premium on intuitive understanding than
on formal proofs and technical definitions. Along with the latter, therefore, I often
offer informal arguments and ideas, sometimes involving infinitesimals, that may
provide more enlightenment than the strictly rigorous approach. The worked-out
examples and exercises run the gamut from routine calculations to theoretical ar-
guments; many of them involve a mixture of the two. The reader whose interest in
the theory is limited should be able to benefit from the book by skipping many of
the proofs.
The essential prerequisite for this book is a sound knowledge of the mechanics
of one-variable calculus. The theory of differentiation and integration on the real
line is presented, rather tersely, in Sections 2.1 and 4.1, but I assume that the reader
is thoroughly familiar with the standard techniques for calculating derivatives and
integrals. Some previous experience with infinite series, partial derivatives, and
multiple integrals might be helpful but is not really necessary. And, of course, for a
full appreciation of the theory one needs a certain level of comfort with mathemat-
ical reasoning, but that is best acquired with practice and experience.
An acquaintance with linear algebra is needed in a few places, particularly §2.8
(classification of critical points), §2.10 (differentiation of vector-valued functions
of vector variables), §3.1 and §§3.4–5 (the implicit function theorem for systems
of equations, the inverse mapping theorem, and functional dependence), and §4.4
(change of variables for multiple integrals). However, most of this material can
be done in the two- and three-dimensional cases (perhaps by eliding parts of some
proofs) with vector algebra and a little ad hoc discussion of matrices and determi-
nants. In any case, Appendix A provides a brief summary of the necessary concepts
and results from linear algebra.
A few of the more formidable proofs have been exiled to Appendix B. In some
of them, the ratio of the amount of work required to the amount of understanding
gained is especially high. Others involve ideas such as the Heine-Borel theorem or
partitions of unity that are best appreciated at a more advanced level. Of course,
the decisions on what to put into Appendix B reflect my personal tastes; instructors
will have to make their own choices of what to include or omit.
In this book a single numeration system is used for theorems, lemmas, corollar-
ies, propositions, and displayed formulas. Thus, for each m and n there is only one
item of any of these types labeled m.n, and it is guaranteed to follow m.(n − 1)
and precede m.(n + 1). This procedure minimizes the amount of effort needed to
locate referenced items.
In a few places I offer glimpses into the world of more advanced analysis.
Chapters 4 and 5 end with brief, informal sketches of the Lebesgue integral and
the theory of differential forms; Chapter 8 leads to the point where the realm of
eigenfunction expansions and spectral theory is visible on the horizon. I hope that
many of my readers will accept the invitation to explore further.
Acknowledgments. This book has benefited from the comments and suggestions
of a number of people: my colleague James Morrow, the students in the advanced
calculus classes that he and I have taught over the past three years in which prelimi-
nary versions of this book were used, and several reviewers, especially Jeffrey Fox.
I am also grateful to my editor, George Lobell, for his support and enthusiasm.
Errata. Responsibility for errors in this book, of course, remains with me.
Responsibility for informing me of these errors, however, rests with my readers.
Anyone who finds misprints, mistakes, or obscurities is urged to write to me at the
address below. I will post such things on a web site that will be accessible from
www.math.washington.edu.

Gerald B. Folland
Department of Mathematics
University of Washington
Seattle, WA 98195-4350
[email protected]
Chapter 1

SETTING THE STAGE

The first half of this chapter (§§1.1–4) presents basic facts and concepts concern-
ing geometry, vectors, limits, continuity, and sequences; the material in it is used
throughout the later chapters. The second half (§§1.5–8) deals with some of the
more technical topological results that underlie calculus. It is quite concise and in-
cludes nothing but what is needed in this book. The reader who wishes to proceed
quickly to the study of differentiation and integration may scan it quickly and refer
back to it as necessary; on the other hand, the reader who wishes to see a more
extensive development of this material is referred to books on the foundations of
analysis such as DePree and Swartz [5], Krantz [12], or Rudin [18].¹
At the outset, let us review some standard notation and terminology for future
reference:

• Sums: If $a_1, a_2, \ldots, a_k$ are numbers, their sum $a_1 + a_2 + \cdots + a_k$
is denoted by $\sum_1^k a_n$, or by $\sum_{n=1}^k a_n$ if necessary for clarity.
The sum need not be started at n = 1; more generally, if j < k, we have

$$\sum_j^k a_n = a_j + a_{j+1} + \cdots + a_k.$$

The letters j and k denote the limits of summation; the letter n is analogous
to a dummy variable in an integral and may be replaced by any other letter
that is not already in use without changing the meaning of the sum. We shall
occasionally write simply $\sum a_n$ when the limits of summation are
understood.
¹ Numbers in brackets refer to the bibliography at the end of the book.

• Factorials: If n is a positive integer, n! (“n factorial”) is the product of
all the integers from 1 to n. By convention, 0! = 1, so that the formula
n! = n · (n − 1)! remains true even for n = 1.

• Sets: If S and T are two sets, S ∪ T and S ∩ T denote their union and
intersection, respectively, and S \ T denotes the set of all elements of S that
are not in T. The expressions “S ⊂ T” and “T ⊃ S” both mean that S is a
subset of T, including the possibility that S = T, and “x ∈ S” and “x ∉ S”
mean, respectively, that x is or is not an element of S. The set of all objects
x satisfying a property P(x) is denoted by {x : P(x)}, and the empty set is
denoted by ∅.

The union and intersection of a family $S_1, S_2, \ldots, S_k$ of sets are denoted
by $\bigcup_1^k S_n$ and $\bigcap_1^k S_n$. The conventions for using the symbols
$\bigcup$ and $\bigcap$ are the same as those for the summation sign $\sum$
described above.

• Real numbers: The set of real numbers is denoted by $\mathbb{R}$. The following
notations are used for intervals in $\mathbb{R}$:

(a, b) = {x : a < x < b},   [a, b] = {x : a ≤ x ≤ b},
(a, b] = {x : a < x ≤ b},   [a, b) = {x : a ≤ x < b}.

Intervals of the form (a, b) are called open; intervals of the form [a, b] are
called closed; and intervals of the forms (a, b] and [a, b) are called half-open.
(Of course, the symbol (a, b) is also used to denote the ordered pair whose
first and second members are a and b, respectively; remarkably enough, this
rarely causes any confusion.)
If {x1 , . . . , xk } is a finite set of real numbers, its largest and smallest ele-
ments are denoted by max(x1 , . . . , xk ) and min(x1 , . . . , xk ), respectively.

• Infinity: In discussing limits it is often convenient to add two “points at
infinity” ∞ (also called +∞) and −∞ to the real number system. These are
not real numbers, and one can perform arithmetical operations on them only
with great caution, but there is no harm in thinking of them as actual math-
ematical objects. The points ±∞ may be used as endpoints of intervals; for
example, (a, ∞) = {x : x > a}. Intervals of the form [a, ∞) and (−∞, b]
are classified as closed intervals; (a, ∞) and (−∞, a) are open.

• Complex numbers: The imaginary unit $\sqrt{-1}$ is denoted by i, although the
letter i may be used for other purposes when complex numbers are not under
discussion. The set of complex numbers, that is, numbers of the form x + iy
where $x, y \in \mathbb{R}$, is denoted by $\mathbb{C}$. As a set, $\mathbb{C}$ may be identified with the
Cartesian plane by the correspondence x + iy ←→ (x, y), and we speak of
“the complex plane $\mathbb{C}$.” If z = x + iy is a complex number, x and y are
called its real and imaginary parts, respectively, and are denoted by Re z
and Im z. The number x − iy is called the complex conjugate of z and is
denoted by $\bar{z}$, and the number $\sqrt{z\bar{z}} = \sqrt{x^2 + y^2}$ (the distance from (x, y)
to the origin in the plane) is called the absolute value of z and is denoted by
|z|.

• Mappings and functions: A mapping, or map, is a rule f that assigns to each
element of some set A an element of some other set B (possibly equal to A).
We write f : A → B to display all these ingredients together. If x ∈ A, the
element of B assigned to x by f is called the value of f at x and is denoted
by f (x). If S is a subset of A, the set of values {f (x) : x ∈ S} is denoted
by f (S). The set A is called the domain of f , and the set f (A) (a subset of
B) is called the range of f . The mapping f : A → B is called one-to-one if
f (x) = f (y) only when x = y, and f is said to map A onto B if f (A) = B.
If f : A → B and g : B → C are mappings, their composition is the
mapping g ◦ f : A → C defined by (g ◦ f )(x) = g(f (x)).
A mapping f : A → B is said to be invertible if there is another mapping
g : B → A such that g(f (x)) = x for all x ∈ A and f (g(y)) = y for all
y ∈ B. The equation g(f (x)) = x can be valid for all x ∈ A only if f is
one-to-one, and the equation f (g(y)) = y can be valid for all y ∈ B only if
f maps A onto B. Conversely, if these two conditions are satisfied, it is easy
to verify that f is invertible. In this case, the mapping g is called the inverse
of f and is commonly denoted by $f^{-1}$.
Mappings are sometimes called “functions,” but we shall reserve the term
function for mappings whose values are real numbers, complex numbers,
or vectors. Mappings of a set A into itself (B = A) are sometimes called
transformations.

• Special functions: In this book, we denote the natural logarithm by log rather
than ln, this being the common usage in advanced mathematics. Also, we de-
note the principal branches of the inverse trig functions by arcsin, arccos, and
arctan; arcsin and arccos map [−1, 1] onto $[-\tfrac{1}{2}\pi, \tfrac{1}{2}\pi]$ and [0, π], respectively,
and arctan maps $\mathbb{R}$ onto $(-\tfrac{1}{2}\pi, \tfrac{1}{2}\pi)$.

• Logical symbols: We shall sometimes use the symbols =⇒ and ⇐⇒ to denote
logical implication and equivalence, respectively. That is, if A and B
are mathematical statements, “A =⇒ B” is read “A implies B” or “If A,
then B,” and “A ⇐⇒ B” is read “A is equivalent to B” or “A if and only
if B.” We point out that “A =⇒ B” and “not B =⇒ not A” are logically
equivalent; that is, in order to prove that hypothesis A implies conclusion B,
one may assume that B is false and show that A is false.
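
Several of the conventions listed above can be tried out numerically. The following is a minimal sketch in Python (a language chosen here for illustration, not one used in the book); it relies only on the standard `math` module, and the sample value z = 3 + 4i is an arbitrary choice.

```python
import math

# Factorials: the convention 0! = 1 keeps n! = n * (n - 1)! valid for n = 1.
factorial_rule_holds = math.factorial(1) == 1 * math.factorial(0)

# Complex numbers: for z = x + iy, the conjugate is x - iy and
# |z| = sqrt(z * conj(z)) = sqrt(x^2 + y^2).
z = complex(3.0, 4.0)
z_bar = z.conjugate()
abs_from_conjugate = math.sqrt((z * z_bar).real)  # z * conj(z) is real
abs_from_parts = math.sqrt(z.real ** 2 + z.imag ** 2)

# Special functions: math.log is the natural logarithm, matching the
# book's use of "log"; asin, acos, atan return principal branches.
log_of_e = math.log(math.e)
arcsin_endpoints = (math.asin(-1.0), math.asin(1.0))
arccos_endpoints = (math.acos(1.0), math.acos(-1.0))

print(factorial_rule_holds, abs_from_conjugate, abs_from_parts)
print(log_of_e, arcsin_endpoints, arccos_endpoints)
```

Python's built-in `abs(z)` computes the same absolute value directly; the longer route through the conjugate is shown only to mirror the definition.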

1.1 Euclidean Spaces and Vectors


We shall be studying functions of several real variables, say f (x1 , x2 , . . . , xn ).
In elementary treatments of the subject one usually focuses on the cases n = 2
and n = 3, because these are the ones where ordered n-tuples of numbers can
represent points in physical space. However, most of the ideas work equally well
for any number of variables, and it is helpful to continue using geometric language
in this more general setting even though “n-dimensional space” doesn’t correspond
directly to a physical object that can be visualized.
The set of all ordered n-tuples of real numbers is called n-dimensional
Euclidean space and is denoted by $\mathbb{R}^n$. We will denote such n-tuples either by writing
out the components or by single boldface letters:

$$\mathbf{x} = (x_1, x_2, \ldots, x_n).$$

The n-tuple whose components are all zero is denoted by $\mathbf{0}$:

$$\mathbf{0} = (0, 0, \ldots, 0).$$

When n = 2 or 3, we shall often write (x, y) or (x, y, z) instead of $(x_1, x_2)$ or
$(x_1, x_2, x_3)$, but we shall still use $\mathbf{x}$ as a single symbol to denote the ordered pair
or triple.
Ordered n-tuples of numbers lead a double life. We usually think of the n-
tuple (x1 , . . . , xn ) as representing the Cartesian coordinates of a point in the n-
dimensional space Rn . However, sometimes we think of it as representing a “quan-
tity with magnitude and direction” such as a force or velocity and visualize it as an
arrow. There is some virtue in maintaining a notational distinction between these
two concepts, but we shall not attempt to do so.
To express the basic ideas of n-dimensional geometry it is convenient to use
the language of vector algebra. Most of the vector operations work equally well in
any number of dimensions:
Addition: $\mathbf{x} + \mathbf{y} = (x_1 + y_1, \ldots, x_n + y_n)$,
Scalar multiplication: $c\mathbf{x} = (cx_1, \ldots, cx_n)$,
Dot product: $\mathbf{x} \cdot \mathbf{y} = x_1 y_1 + \cdots + x_n y_n$.

The exception is the cross product, which is peculiar to 3 dimensions; we shall
discuss it at the end of this section. If $\mathbf{x} \in \mathbb{R}^n$, the norm of $\mathbf{x}$ is defined to be

$$|\mathbf{x}| = \sqrt{x_1^2 + \cdots + x_n^2} = \sqrt{\mathbf{x} \cdot \mathbf{x}}.$$

Some people denote norms by double vertical bars, thus: ∥x∥.


There are two fundamental inequalities involving the dot product and norm,
Cauchy’s inequality and the triangle inequality. The reader is probably familiar
with them in dimensions 2 and 3, and the ideas are exactly the same in higher
dimensions.

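1.1 Proposition (Cauchy’s Inequality). For any $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$,

$$|\mathbf{a} \cdot \mathbf{b}| \le |\mathbf{a}|\,|\mathbf{b}|.$$

Proof. If $\mathbf{b} = \mathbf{0}$ then both sides of the inequality are 0. Otherwise, we introduce a
real variable t and consider the function

$$f(t) = |\mathbf{a} - t\mathbf{b}|^2 = (\mathbf{a} - t\mathbf{b}) \cdot (\mathbf{a} - t\mathbf{b}) = |\mathbf{a}|^2 - 2t\,\mathbf{a} \cdot \mathbf{b} + t^2 |\mathbf{b}|^2.$$

This is a quadratic function of t. Its minimum value occurs at $t = (\mathbf{a} \cdot \mathbf{b})/|\mathbf{b}|^2$, and
that minimum value is

$$f\bigl((\mathbf{a} \cdot \mathbf{b})/|\mathbf{b}|^2\bigr) = |\mathbf{a}|^2 - \frac{(\mathbf{a} \cdot \mathbf{b})^2}{|\mathbf{b}|^2}.$$

On the other hand, clearly $f(t) \ge 0$ for all t, so

$$|\mathbf{a}|^2 - \frac{(\mathbf{a} \cdot \mathbf{b})^2}{|\mathbf{b}|^2} \ge 0.$$

Multiplying through by $|\mathbf{b}|^2$, we obtain the desired result: $|\mathbf{a}|^2 |\mathbf{b}|^2 \ge (\mathbf{a} \cdot \mathbf{b})^2$.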
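
The proof can be followed numerically. The sketch below, written in Python purely for illustration, minimizes f(t) = |a − tb|² at t = (a · b)/|b|² and compares the result with the closed form |a|² − (a · b)²/|b|². The helper names `dot`, `norm`, and `minimize_f` and the sample vectors are assumptions made for this example, not notation from the text.

```python
import math

def dot(a, b):
    """Dot product of two equal-length sequences of numbers."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean norm |a| = sqrt(a . a)."""
    return math.sqrt(dot(a, a))

def minimize_f(a, b):
    """Return (t_star, f(t_star)) for f(t) = |a - t b|^2, assuming b != 0."""
    bb = dot(b, b)
    if bb == 0:
        raise ValueError("b must be nonzero")
    t_star = dot(a, b) / bb
    residual = [x - t_star * y for x, y in zip(a, b)]
    return t_star, dot(residual, residual)

# Hypothetical sample vectors in R^4.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 0.0, -1.0, 1.0]

t_star, f_min = minimize_f(a, b)
closed_form = dot(a, a) - dot(a, b) ** 2 / dot(b, b)
cauchy_holds = abs(dot(a, b)) <= norm(a) * norm(b)

print(t_star, f_min, closed_form, cauchy_holds)
```

Because f_min is a squared norm it cannot be negative, which is exactly the step that forces |a|² |b|² ≥ (a · b)².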

Note. Cauchy’s inequality is also called Schwarz’s inequality, the Cauchy-Schwarz
inequality, or Buniakovsky’s inequality. (Schwarz and Buniakovsky
independently discovered the corresponding result for integrals of functions, namely,

$$\left| \int_a^b f(x)g(x)\,dx \right| \le \left( \int_a^b |f(x)|^2\,dx \right)^{1/2} \left( \int_a^b |g(x)|^2\,dx \right)^{1/2},$$

which can be proved in much the same way.)

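1.2 Proposition (The Triangle Inequality). For any $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$,

$$|\mathbf{a} + \mathbf{b}| \le |\mathbf{a}| + |\mathbf{b}|.$$

Proof. We have $|\mathbf{a} + \mathbf{b}|^2 = (\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} + \mathbf{b}) = |\mathbf{a}|^2 + 2\mathbf{a} \cdot \mathbf{b} + |\mathbf{b}|^2$. By Cauchy’s
inequality, this last sum is at most $|\mathbf{a}|^2 + 2|\mathbf{a}|\,|\mathbf{b}| + |\mathbf{b}|^2 = (|\mathbf{a}| + |\mathbf{b}|)^2$, so the result
follows by taking square roots.

The distance between two points $\mathbf{x}$ and $\mathbf{y}$ in 3-space is given by

$$\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2} = |\mathbf{x} - \mathbf{y}|,$$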
and similarly for points in the plane. We shall take this as a definition of distance
in n-space for any n:
Distance from x to y = |x − y|.
By taking a = x − y and b = y − z in the triangle inequality, we see that
|x − z| ≤ |x − y| + |y − z|
for any x, y, z ∈ Rn . That is, the distance from x to z is at most the sum of the
distances from x to y and from y to z, for any intermediate point y. Hence the
name “triangle inequality”: One side of a triangle is at most the sum of the other
two sides.
If we think of two vectors x and y as arrows emanating from the same point, we
can speak of the angle θ between them. The familiar formula for θ in dimensions
2 and 3 remains valid in higher dimensions:
$$\theta = \arccos\left( \frac{\mathbf{x} \cdot \mathbf{y}}{|\mathbf{x}|\,|\mathbf{y}|} \right).$$
Cauchy’s inequality says that the quotient in parentheses always lies in the interval
[−1, 1], so it is indeed the cosine of some number θ ∈ [0, π].
In particular, the directions of two vectors x and y are perpendicular to each
other if and only if x · y = 0. In this case the vectors are said to be orthogonal to
each other.
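
The angle formula works unchanged in any dimension, as the following Python sketch illustrates. The function name `angle` and the sample vectors are hypothetical choices for this example; the clamp to [−1, 1] is a floating-point safeguard added here, since Cauchy's inequality guarantees that range only in exact arithmetic.

```python
import math

def dot(x, y):
    """Dot product of two equal-length sequences of numbers."""
    return sum(p * q for p, q in zip(x, y))

def norm(x):
    """Euclidean norm |x| = sqrt(x . x)."""
    return math.sqrt(dot(x, x))

def angle(x, y):
    """Angle in [0, pi] between two nonzero vectors of the same dimension."""
    nx, ny = norm(x), norm(y)
    if nx == 0 or ny == 0:
        raise ValueError("the angle is undefined for the zero vector")
    cosine = dot(x, y) / (nx * ny)
    # Guard against rounding that lands slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return math.acos(cosine)

# A 45-degree angle in the plane.
theta_plane = angle([1.0, 0.0], [1.0, 1.0])

# Two orthogonal vectors in R^4: their dot product is 2 - 2 + 0 + 0 = 0.
theta_r4 = angle([1.0, 2.0, 0.0, -1.0], [2.0, -1.0, 5.0, 0.0])

print(theta_plane, theta_r4)
```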
In many situations we need to control the magnitude, i.e., the norm, of a vector
x = (x1 , . . . , xn ), but it is often more convenient to work with the magnitudes of
the components $x_j$ of $\mathbf{x}$. In such cases the following inequalities are useful. Let M
be the largest of the numbers $|x_1|, \ldots, |x_n|$. Then $M^2 \le x_1^2 + \cdots + x_n^2$ (because
$M^2$ is one of the numbers on the right), and $x_1^2 + \cdots + x_n^2 \le nM^2$ (because each
number on the left is at most $M^2$). In other words,

$$\max\bigl(|x_1|, \ldots, |x_n|\bigr) \le |\mathbf{x}| \le \sqrt{n}\,\max\bigl(|x_1|, \ldots, |x_n|\bigr). \tag{1.3}$$
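
As a quick illustration of (1.3), the Python sketch below evaluates the three quantities for one sample vector. The vector (3, −4, 12) and the helper names are arbitrary choices for this example.

```python
import math

def max_abs_component(x):
    """Largest of |x_1|, ..., |x_n| (the number M in the text)."""
    return max(abs(c) for c in x)

def norm(x):
    """Euclidean norm of x."""
    return math.sqrt(sum(c * c for c in x))

x = [3.0, -4.0, 12.0]
M = max_abs_component(x)
lower = M                        # max |x_j|
middle = norm(x)                 # |x|
upper = math.sqrt(len(x)) * M    # sqrt(n) * max |x_j|

print(lower, middle, upper)
```

Here the norm is 13, which sits between M = 12 and √3 · 12 ≈ 20.8, as (1.3) requires.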

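Cross Products. Let $\mathbf{i} = (1, 0, 0)$, $\mathbf{j} = (0, 1, 0)$, and $\mathbf{k} = (0, 0, 1)$ be the
standard basis vectors for $\mathbb{R}^3$; then an arbitrary vector $\mathbf{a} \in \mathbb{R}^3$ can be written as

$$\mathbf{a} = (a_1, a_2, a_3) = a_1 \mathbf{i} + a_2 \mathbf{j} + a_3 \mathbf{k}.$$

The cross product of two vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^3$ is defined by

$$\mathbf{a} \times \mathbf{b} = \det \begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{pmatrix} = (a_2 b_3 - a_3 b_2)\mathbf{i} + (a_3 b_1 - a_1 b_3)\mathbf{j} + (a_1 b_2 - a_2 b_1)\mathbf{k}.$$

(For a review of determinants, see Appendix A, (A.24)–(A.33).) It is easily verified
that cross products distribute over addition and scalar multiplication in the usual
way:

$$(c_1 \mathbf{a}_1 + c_2 \mathbf{a}_2) \times \mathbf{b} = c_1 (\mathbf{a}_1 \times \mathbf{b}) + c_2 (\mathbf{a}_2 \times \mathbf{b}),$$

$$\mathbf{a} \times (c_1 \mathbf{b}_1 + c_2 \mathbf{b}_2) = c_1 (\mathbf{a} \times \mathbf{b}_1) + c_2 (\mathbf{a} \times \mathbf{b}_2).$$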

The cross product is anticommutative:

a × b = −b × a.

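It is not associative; that is, $\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) \ne (\mathbf{a} \times \mathbf{b}) \times \mathbf{c}$ in general. Instead, it
satisfies a quasi-associative law called the Jacobi identity:

$$\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) + \mathbf{b} \times (\mathbf{c} \times \mathbf{a}) + \mathbf{c} \times (\mathbf{a} \times \mathbf{b}) = \mathbf{0}.$$

A messy but straightforward calculation shows that

$$|\mathbf{a} \times \mathbf{b}|^2 = |\mathbf{a}|^2 |\mathbf{b}|^2 - (\mathbf{a} \cdot \mathbf{b})^2.$$

($|\mathbf{a} \times \mathbf{b}|^2$ is the sum of the squares of the components of $\mathbf{a} \times \mathbf{b}$. Multiply it out and
rearrange the terms to get $|\mathbf{a}|^2 |\mathbf{b}|^2 - (\mathbf{a} \cdot \mathbf{b})^2$.) If θ is the angle between $\mathbf{a}$ and $\mathbf{b}$
(0 ≤ θ ≤ π), we know that $\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}|\,|\mathbf{b}| \cos\theta$, so

$$|\mathbf{a} \times \mathbf{b}|^2 = |\mathbf{a}|^2 |\mathbf{b}|^2 (1 - \cos^2\theta), \quad\text{or}\quad |\mathbf{a} \times \mathbf{b}| = |\mathbf{a}|\,|\mathbf{b}| \sin\theta.$$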

If a and b represent two sides of a parallelogram and we take a to be the “base,”
then |b| sin θ is the “height”; hence, |a × b| is the area of the parallelogram
generated by a and b. Another easy calculation shows that

a · (a × b) = b · (a × b) = 0;

in other words, a × b is orthogonal to both a and b. See Figure 1.1.

[Figure 1.1: The geometry of the cross product.]


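The two statements just made (that |a × b| is the area of the parallelogram
generated by a and b, and that a × b is orthogonal to both a and b) specify the
magnitude and direction of a × b in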
purely geometric terms and show that a × b has an intrinsic geometric meaning,
independent of the choice of coordinate axes. Well, almost: The fact that a × b
is orthogonal to both a and b specifies its direction only up to a factor of ±1, and
this last bit of information is provided by the “right hand rule”: If you point the
thumb and first finger of your right hand in the directions of a and b, respectively,
and bend the middle finger so that it is perpendicular to both of them, the middle
finger points in the direction of a × b. Thus the definition of cross product is tied
to the convention of using “right-handed” coordinate systems. If we were to switch
to “left-handed” ones, all cross products would be multiplied by −1.
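
The algebraic identities for the cross product stated in this section can be spot-checked with exact integer arithmetic. The following Python sketch implements the component formula from the definition; the function names and the three sample vectors are assumptions made for this example. A check on one triple of vectors is an illustration, not a proof.

```python
def dot(a, b):
    """Dot product of two 3-tuples."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product via the component formula in the definition."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def add(*vectors):
    """Componentwise sum of any number of 3-tuples."""
    return tuple(sum(components) for components in zip(*vectors))

# Hypothetical sample vectors with integer entries, so all checks are exact.
a, b, c = (1, 2, 3), (4, 5, 6), (7, 8, 10)

axb = cross(a, b)

# Anticommutativity: a x b = -(b x a).
anticommutative = axb == tuple(-v for v in cross(b, a))

# |a x b|^2 = |a|^2 |b|^2 - (a . b)^2.
lhs = dot(axb, axb)
rhs = dot(a, a) * dot(b, b) - dot(a, b) ** 2

# Orthogonality: a . (a x b) = b . (a x b) = 0.
orthogonal = dot(a, axb) == 0 and dot(b, axb) == 0

# Jacobi identity: the three cyclic double products sum to the zero vector.
jacobi = add(cross(a, cross(b, c)), cross(b, cross(c, a)), cross(c, cross(a, b)))

print(axb, anticommutative, lhs, rhs, orthogonal, jacobi)
```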

EXERCISES
1. Let x = (3, −1, −1, 1) and y = (−2, 2, 1, 0). Compute the norms of x and y
and the angle between them.
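2. Given $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, show that
a. $|\mathbf{x} + \mathbf{y}|^2 = |\mathbf{x}|^2 + 2\mathbf{x} \cdot \mathbf{y} + |\mathbf{y}|^2$.
b. $|\mathbf{x} + \mathbf{y}|^2 + |\mathbf{x} - \mathbf{y}|^2 = 2(|\mathbf{x}|^2 + |\mathbf{y}|^2)$.
3. Suppose $\mathbf{x}_1, \ldots, \mathbf{x}_k \in \mathbb{R}^n$.
a. Generalize Exercise 2a to obtain a formula for $|\mathbf{x}_1 + \cdots + \mathbf{x}_k|^2$.
b. (The Pythagorean Theorem) Suppose the vectors $\mathbf{x}_j$ are mutually orthogonal,
i.e., that $\mathbf{x}_i \cdot \mathbf{x}_j = 0$ for $i \ne j$. Show that $|\mathbf{x}_1 + \cdots + \mathbf{x}_k|^2 =
|\mathbf{x}_1|^2 + \cdots + |\mathbf{x}_k|^2$.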
4. Under what conditions on a and b is Cauchy’s inequality an equality? (Exam-
ine the proof.)
5. Under what conditions on a and b is the triangle inequality an equality?
6. Show that $\bigl|\,|\mathbf{a}| - |\mathbf{b}|\,\bigr| \le |\mathbf{a} - \mathbf{b}|$ for every $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$.
7. Suppose a, b ∈ R3 .
a. Show that if a · b = 0 and a × b = 0, then either a = 0 or b = 0.
b. Show that if a · c = b · c and a × c = b × c for some nonzero c ∈ R3 ,
then a = b.
c. Show that (a × a)× b = a × (a × b) if and only if a and b are proportional
(i.e., one is a scalar multiple of the other).
8. Show that a · (b × c) is the determinant of the matrix whose rows are a, b, and
c (if these vectors are considered as row vectors) or the matrix whose columns
are a, b, and c (if they are considered as column vectors).

1.2 Subsets of Euclidean Space


In this section we introduce some standard terminology for sets in Rn .
First, the set of all points whose distance from a fixed point a is equal to some
number r is called the sphere of radius r about a, and the set of points whose dis-
tance from a is less than r is called the (open) ball of radius r about a. (In ordinary
English the word “sphere” is often used for both these purposes, but mathemati-
cians have found it helpful to reserve the word “sphere” for the spherical surface
and to use “ball” to denote the solid body.) We shall use the notation B(r, a) for
the ball of radius r about a:
$$B(r, \mathbf{a}) = \{\mathbf{x} \in \mathbb{R}^n : |\mathbf{x} - \mathbf{a}| < r\}.$$

Of course, in dimension 1 a ball is just an open interval, and in dimension 2
the words “disc” and “circle” may be used in place of “ball” and “sphere.”
A set S ⊂ Rn is called bounded if it is contained in some ball about the origin,
that is, if there is a constant C such that |x| < C for every x ∈ S.
When one studies functions of a single variable, one frequently considers inter-
vals in the real line, and it is often necessary to distinguish between open intervals
(with the endpoints excluded) and closed intervals (with the endpoints included).
When n > 1, there is a much greater variety of interesting subsets of Rn to be
considered, but the notions of “open” and “closed” are still fundamental. Here are
the definitions.
Let S be a subset of Rn .
• The complement of S is the set of all points in $\mathbb{R}^n$ that are not in S; we
denote it by $\mathbb{R}^n \setminus S$ or by $S^c$:

$$S^c = \mathbb{R}^n \setminus S = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{x} \notin S\}.$$


• A point $\mathbf{x} \in \mathbb{R}^n$ is called an interior point of S if all points sufficiently
close to $\mathbf{x}$ (including $\mathbf{x}$ itself) are also in S, that is, if S contains some ball
centered at $\mathbf{x}$. The set of all interior points of S is called the interior of S
and is denoted by $S^{\mathrm{int}}$:

$$S^{\mathrm{int}} = \{\mathbf{x} \in S : B(r, \mathbf{x}) \subset S \text{ for some } r > 0\}.$$

• A point x ∈ Rn is called a boundary point of S if every ball centered at x


contains both points in S and points in S c . (Note that if x is a boundary point
of S, x may belong to either S or S c .) The set of all boundary points of S is
called the boundary of S and is denoted by ∂S:
∂S = {x ∈ R^n : B(r, x) ∩ S ≠ ∅ and B(r, x) ∩ S^c ≠ ∅ for every r > 0}.

(Remark. We shall use the term “boundary” slightly differently in §5.7 in


connection with Stokes’s theorem, in the context of surfaces in R3 being
“bounded” by curves. But the present definition is the general-purpose one.)

• S is called open if it contains none of its boundary points.

• S is called closed if it contains all of its boundary points.

• The closure of S is the union of S and all its boundary points. It is denoted
by S̄:
S̄ = S ∪ ∂S.

• Finally, a neighborhood of a point x ∈ Rn is a set of which x is an interior


point. That is, S is a neighborhood of x if and only if x is an interior point
of S.

Let us examine these ideas a little more closely. First, notice that the boundary
points of S are the same as the boundary points of S c ; the definition of boundary
point remains unchanged if S and S c are switched. Moreover, if x is neither an
interior point of S nor an interior point of S c , then x must be a boundary point of
S. In other words, given S ⊂ Rn and x ∈ Rn , there are exactly three possibilities:
x is an interior point of S, or x is an interior point of S c , or x is a boundary point
of S.

1.4 Proposition. Suppose S ⊂ Rn .


a. S is open ⇐⇒ every point of S is an interior point.
b. S is closed ⇐⇒ S c is open.

Proof. Every point of S is either an interior point or a boundary point; thus S is


open ⇐⇒ every point of S is an interior point. On the other hand, S is closed
⇐⇒ it contains all of ∂S, which is the same as ∂(S c ); this happens precisely when
S c contains none of its boundary points, i.e., when S c is open.

EXAMPLE 1. Let S be B(ρ, 0), the ball of radius ρ about the origin. First,
given x ∈ S, let r = ρ − |x|. If |y − x| < r, then by the triangle inequality we
have |y| ≤ |y − x| + |x| < ρ, so that B(r, x) ⊂ S. Therefore, every x ∈ S is
an interior point of S, so S is open. Second, a similar calculation shows that if
|x| > ρ then B(r, x) ⊂ S c where r = |x| − ρ, so every point with |x| > ρ is an
interior point of S c . On the other hand, if |x| = ρ, then cx ∈ S for 0 < c < 1
and cx ∈ S c for c ≥ 1, and |cx − x| = |c − 1|ρ can be as small as we please,
so x is a boundary point. In other words, the boundary of S is the sphere of
radius ρ about the origin, and the closure of S is the closed ball {x : |x| ≤ ρ}.
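The triangle-inequality argument in Example 1 can be spot-checked numerically. The following is only a sketch under stated assumptions: the helper names (`norm`, `random_point_in_ball`, `ball_stays_inside`) are hypothetical, points are plain Python lists, and random sampling can illustrate the inclusion B(r, x) ⊂ B(ρ, 0) with r = ρ − |x| but cannot prove it.

```python
import math
import random

def norm(v):
    """Euclidean norm |v| of a point given as a list of floats."""
    return math.sqrt(sum(t * t for t in v))

def random_point_in_ball(center, radius, rng):
    """Return a point y with |y - center| < radius (rejection sampling from a cube)."""
    while True:
        offset = [rng.uniform(-radius, radius) for _ in center]
        if norm(offset) < radius:
            return [c + o for c, o in zip(center, offset)]

def ball_stays_inside(x, rho, trials=1000, seed=0):
    """Sample B(r, x) with r = rho - |x| and check every sample lies in B(rho, 0)."""
    rng = random.Random(seed)
    r = rho - norm(x)
    assert r > 0, "x must lie in the open ball B(rho, 0)"
    return all(norm(random_point_in_ball(x, r, rng)) < rho for _ in range(trials))

# Hypothetical sample point: x = (0.3, 0.4) has |x| = 0.5, so r = 0.5 when rho = 1.
print(ball_stays_inside([0.3, 0.4], 1.0))
```

Choosing r any larger than ρ − |x| would let samples escape the ball, which is why the example picks exactly this radius.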
EXAMPLE 2. Now let S be the ball of radius ρ about the origin together with
the “upper hemisphere” of its boundary:
S = B(ρ, 0) ∪ {x ∈ R^n : |x| = ρ and x_n > 0}.

The calculations in Example 1 show that S^int is the open ball B(ρ, 0), ∂S is
the sphere {x : |x| = ρ}, and S̄ is the closed ball {x : |x| ≤ ρ}. The set S is
neither open nor closed.
EXAMPLE 3. In the real line (i.e., n = 1), let S be the set of all rational
numbers. Since every ball in R — that is, every interval — contains both
rational and irrational numbers, every point of R is a boundary point of S. The
set S is neither open nor closed; its interior is empty; and its closure is R.

Subsets of Rn are often specified in terms of equations or inequalities — for


example, by an expression of the form
(1.5) S = {x ∈ R^n : f(x) □ 0},

where □ denotes one of the relations =, <, >, ≤, ≥. (Taking the quantity on the
right of □ to be 0 is no restriction; just move all the terms over to the left side.) We
anticipate some results from §1.3 in giving the following rule of thumb: Sets defined
by strict inequalities are open; sets defined by equalities or weak inequalities are
closed. More precisely, if S is given by (1.5) where the function f is continuous,
then S is open if □ denotes < or >, and S is closed if □ denotes =, ≤, or ≥. The
reader may feel free to use this rule in doing the exercises.

EXERCISES

1. For each of the following sets S in the plane R2 , do the following: (i) Draw a
sketch of S. (ii) Tell whether S is open, closed, or neither. (iii) Describe S^int,
S̄, and ∂S. (These descriptions should be in the same set-theoretic language as
the description of S itself given here.)
a. S = {(x, y) : 0 < x^2 + y^2 ≤ 4}.
b. S = {(x, y) : x^2 − x ≤ y ≤ 0}.
c. S = {(x, y) : x > 0, y > 0, and x + y > 1}.
d. S = {(x, y) : y = x^3}.
e. S = {(x, y) : x > 0 and y = sin(1/x)}.
f. S = {(x, y) : x^2 + y^2 < 1} \ {(x, 0) : x < 0}.
g. S = {(x, y) : x and y are rational numbers in [0, 1]}.
2. Show that for any S ⊂ R^n, S^int is open and ∂S and S̄ are both closed. (Hint:
Use the fact that balls are open, proved in Example 1.)
3. Show that if S1 and S2 are open, so are S1 ∪ S2 and S1 ∩ S2 .
4. Show that if S1 and S2 are closed, so are S1 ∪ S2 and S1 ∩ S2 . (One way is to
use Exercise 3 and Proposition 1.4b.)
5. Show that the boundary of S is the intersection of the closures of S and S c .
6. Give an example of an infinite collection S_1, S_2, . . . of closed sets whose union
⋃_{j=1}^∞ S_j is not closed.
7. There are precisely two subsets of Rn that are both open and closed. What are
they?
8. Give an example of a set S such that the interior of S is unequal to the interior
of the closure of S.
9. Show that the ball of radius r about a is contained in the ball of radius r + |a|
about the origin. Conclude that a set S ⊂ Rn is bounded if it is contained in
some ball (whose center can be anywhere in Rn ).

1.3 Limits and Continuity


We now commence our study of functions defined on Rn or subsets of Rn . For
the most part we shall be dealing with real-valued functions, but in many situations
we shall deal with vector-valued or complex-valued functions, that is, functions
whose values lie in Rk or C. For our present purposes we can regard C as R2 by
identifying the complex number u + iv with the ordered pair (u, v), so it is enough
to consider vector-valued functions. But we begin with the real-valued case.

Suppose f is a real-valued function defined on Rn . We say that

lim_{x→a} f(x) = L,

and call L the limit of f (x) as x approaches a, if f (x) becomes as close as we


wish to L provided x is sufficiently close to, but not equal to, a. More formally,
the statement limx→a f (x) = L means that for any positive number ϵ there is a
positive number δ so that

(1.6) |f (x) − L| < ϵ whenever 0 < |x − a| < δ.

This condition can be rephrased in terms of the individual components xj − aj of


x − a, as follows: limx→a f (x) = L if and only if for every positive number ϵ
there is a positive number δ′ so that
(1.7) |f(x) − L| < ϵ whenever 0 < max(|x_1 − a_1|, . . . , |x_n − a_n|) < δ′.

The equivalence of (1.6) and (1.7) follows from (1.3): If (1.6) is satisfied, then
(1.7) is satisfied with δ′ = δ/√n; and if (1.7) is satisfied, then (1.6) is satisfied
with δ = δ′.
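The choice δ′ = δ/√n can be checked numerically. The sketch below assumes that (1.3) is the comparison max_j |x_j| ≤ |x| ≤ √n · max_j |x_j|, which is not restated in this passage; the function names are hypothetical, and a random test only illustrates the inequality.

```python
import math
import random

def euclidean_norm(v):
    """The norm |v| used in condition (1.6)."""
    return math.sqrt(sum(t * t for t in v))

def max_norm(v):
    """The quantity max(|v_1|, ..., |v_n|) used in condition (1.7)."""
    return max(abs(t) for t in v)

def check_norm_comparison(n, trials=1000, seed=0):
    """Check max|v_j| <= |v| <= sqrt(n) * max|v_j| on random vectors in R^n."""
    rng = random.Random(seed)
    tol = 1e-12  # small slack for floating-point rounding
    for _ in range(trials):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        m, e = max_norm(v), euclidean_norm(v)
        if not (m <= e + tol and e <= math.sqrt(n) * m + tol):
            return False
    return True

print(check_norm_comparison(3))
```

The right-hand inequality is what makes δ′ = δ/√n work: if every |x_j − a_j| is less than δ/√n, then |x − a| is less than δ. The left-hand inequality gives the other direction with δ = δ′.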
More generally, we can consider functions f that are only defined on a subset
S of Rn and points a that lie in the closure of S. The definition of limx→a f (x) is
the same as before except that x is restricted to lie in the set S. It may be necessary,
for the sake of clarity, to specify this restriction explicitly; for this purpose we use
the notation
lim_{x→a, x∈S} f(x).

In particular, for a function f on the real line we often need to consider the one-
sided limits

lim_{x→a+} f(x) = lim_{x→a, x>a} f(x) and lim_{x→a−} f(x) = lim_{x→a, x<a} f(x).

For example, let f : R → R be the function defined by f (x) = x + 1 for |x| ≤ 1


and f (x) = 0 for |x| > 1. Then limx→1 f (x) does not exist, but limx→1− f (x) = 2
and limx→1+ f (x) = 0.
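A short numerical sketch of this example: sample f at points approaching 1 from each side. The helper name `one_sided_samples` and the step sizes 10^(−k) are illustrative assumptions, and finitely many samples only suggest the one-sided limits.

```python
def f(x):
    """f(x) = x + 1 for |x| <= 1 and f(x) = 0 for |x| > 1."""
    return x + 1 if abs(x) <= 1 else 0.0

def one_sided_samples(func, a, side, steps=6):
    """Values func(a + side * 10**-k) for k = 1..steps; side is +1 (right) or -1 (left)."""
    return [func(a + side * 10.0 ** (-k)) for k in range(1, steps + 1)]

left = one_sided_samples(f, 1.0, -1)   # values approach 2
right = one_sided_samples(f, 1.0, +1)  # values are all 0
print(left)
print(right)
```

Because the two lists settle toward different numbers, no single L can satisfy (1.6) at a = 1.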
Notice that the definition of limx→a f (x) does not involve the value f (a) at
all; only the values of f at points near a but unequal to a are relevant. Indeed, f
need not even be defined at a — a situation that arises, for example, in the limits
that define derivatives. On the other hand, if limx→a f (x) and f (a) both exist and
are equal, that is, if
lim_{x→a} f(x) = f(a),

then f is said to be continuous at a.


If f is continuous at every point of a set U ⊂ Rn , f is said to be continuous on
U . Going back to the condition (1.6) that defines limits, we see that the continuity
of f on U is equivalent to the following condition: For every positive number ϵ and
every a ∈ U there is a positive number δ so that

(1.8) |f (x) − f (a)| < ϵ whenever |x − a| < δ.

Informally speaking, f is continuous if changing the input values by a small amount


changes the output values by only a small amount.
The same definitions apply equally well to vector-valued functions, that is,
functions f with values in Rk for some k > 1. In this case the limit L is an el-
ement of Rk , and |f (x) − L| is the norm of the vector f (x) − L. In view of (1.3),
it is clear that

lim_{x→a} f(x) = L ⇐⇒ lim_{x→a} f_j(x) = L_j for j = 1, . . . , k.

Thus the study of limits and continuity of vector-valued functions is easily reduced
to the scalar case, to which we now return our attention.
We often express the relation limx→a f (x) = L informally by saying that f (x)
approaches L as x approaches a. In one dimension this works quite well; we can
envision x as the location of a particle that moves toward a from the right or the
left. But in higher dimensions there are infinitely many different paths along which
a particle might move toward a, and for the limit to exist one must get the same
result no matter which path is chosen. It is safer to abandon the “dynamic” picture
of a particle moving toward a; we should simply think in terms of f (x) being close
to L provided that x is close to a, without reference to any motion.
EXAMPLE 1. Let f(x, y) = xy/(x^2 + y^2) if (x, y) ≠ (0, 0), and let f(0, 0) = 0.
Show that lim_{(x,y)→(0,0)} f(x, y) does not exist and, in particular, that f is
discontinuous at (0, 0).
Solution. First, note that f (x, 0) = f (0, y) = 0 for all x and y, so
f (x, y) → 0 as (x, y) approaches (0, 0) along the x-axis or the y-axis. But
if we consider other straight lines passing through the origin, say y = cx, we
have f(x, cx) = cx^2/(x^2 + c^2 x^2) = c/(1 + c^2), so the limit as (x, y) approaches
(0, 0) along the line y = cx is c/(1 + c^2). Depending on the value
of c, this can be anything between −1/2 and 1/2 (these two extreme values being
achieved when c = −1 or c = 1). So there is no limit as (x, y) approaches
(0, 0) unrestrictedly.
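The dependence on the slope c is easy to see numerically. This is a minimal sketch; `value_along_line` is a hypothetical helper, and the slopes and step sizes are arbitrary choices.

```python
def f(x, y):
    """The function of Example 1, with f(0, 0) = 0."""
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x * x + y * y)

def value_along_line(c, x):
    """Value of f at the point (x, cx) on the line y = cx."""
    return f(x, c * x)

for c in (-1.0, 0.0, 0.5, 1.0, 2.0):
    samples = [value_along_line(c, 10.0 ** (-k)) for k in (1, 3, 6)]
    # Each sample agrees with c / (1 + c^2) up to rounding, however small x is.
    print(c, samples, c / (1 + c * c))
```

Along each line the value is constant in x, so shrinking x never brings the values for different slopes together.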

The argument just given suggests the following line of thought. We wish to
know if limx→a f (x) exists. We look at all the straight lines passing through a
and evaluate the limit of f (x) as x approaches a along each of those lines by one-
variable techniques; if we always get the same answer L, then we should have
limx→a f (x) = L, right? Unfortunately, this doesn’t work:

EXAMPLE 2. Let g(x, y) = x^2 y/(x^4 + y^2) if (x, y) ≠ (0, 0) and g(0, 0) = 0. Again
we have g(x, 0) = g(0, y) = 0, so the limit as (x, y) → (0, 0) along the
coordinate axes is 0. Moreover, if c ≠ 0,

g(x, cx) = cx^3/(x^4 + c^2 x^2) = cx/(c^2 + x^2) → 0 as x → 0,

so the limit as (x, y) → (0, 0) along any other straight line is also 0. But if we
approach along a parabola y = cx^2, we get

g(x, cx^2) = cx^4/(x^4 + c^2 x^4) = c/(1 + c^2),

which can be anything between −1/2 and 1/2 as before, so the limit does not
exist. (The similarity with Example 1 is not accidental: If f is the function in
Example 1 we have g(x, y) = f(x^2, y).)
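The contrast between straight lines and parabolas in Example 2 can be seen in a few evaluations. The sketch below is illustrative only; the sample values of c and x are arbitrary.

```python
def g(x, y):
    """The function of Example 2, with g(0, 0) = 0."""
    return 0.0 if (x, y) == (0.0, 0.0) else x * x * y / (x ** 4 + y * y)

c = 1.0
for k in (1, 2, 3, 4):
    x = 10.0 ** (-k)
    along_line = g(x, c * x)          # equals c*x / (c^2 + x^2), which tends to 0
    along_parabola = g(x, c * x * x)  # equals c / (1 + c^2) for every x != 0
    print(x, along_line, along_parabola)
```

With c = 1 the values along the line y = x shrink toward 0 while the values along the parabola y = x^2 stay at 1/2, so no single limit exists.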

After looking at examples like this one, one might become discouraged about
the possibility of ever proving that limits do exist! But things are not so bad. If f is a
continuous function, limx→a f (x) is simply f (a). Moreover, most of the functions
of several variables that one can easily write down are built up from continuous
functions of one variable by using the arithmetic operations plus composition, and
these operations all preserve continuity (except for division when the denominator
vanishes).
Here are the precise statements and proofs of the fundamental results. (The
reader may wish to skip the proofs; they are of some value as illustrations of the sort
of formal arguments involving limits that are important in more advanced analysis,
but they contribute little to an intuitive understanding of the results.)

1.9 Theorem. Suppose f : Rn → Rm is continuous on U ⊂ Rn and g : Rm → Rk


is continuous on f (U ) ⊂ Rm . Then the composite function g ◦ f : Rn → Rk is
continuous on U .

Proof. Let ϵ > 0 and a ∈ U be given, and let b = f (a). Since g is continuous on
f (U ), we can choose η > 0 so that |g(y)−g(b)| < ϵ whenever |y−b| < η. Having

chosen this η, since f is continuous on U we can find δ > 0 so that |f (x) − b| < η
whenever |x − a| < δ. Thus,

|x − a| < δ =⇒ |f (x) − f (a)| < η =⇒ |g(f (x)) − g(f (a))| < ϵ,

which says that g ◦ f is continuous on U .

1.10 Theorem. Let f1 (x, y) = x + y, f2 (x, y) = xy, and g(x) = 1/x. Then f1
and f2 are continuous on R2 and g is continuous on R \ {0}.
Proof. To prove continuity of f1 and f2 , we need to show that lim(x,y)→(a,b) x+y =
a + b and lim(x,y)→(a,b) xy = ab for every a, b ∈ R. That is, given ϵ > 0 and
a, b ∈ R, we need to find δ > 0 so that if |x − a| < δ and |y − b| < δ, then (i)
|(x + y) − (a + b)| < ϵ or (ii) |xy − ab| < ϵ. For (i) we can simply take δ = ϵ/2,
for if |x − a| < ϵ/2 and |y − b| < ϵ/2, then

|(x + y) − (a + b)| = |(x − a) + (y − b)| ≤ |x − a| + |y − b| < ϵ/2 + ϵ/2 = ϵ.

For (ii) we observe that xy − ab = (x − a)y + a(y − b), so we can make xy − ab


small by making the two terms on the right small. Indeed, let
δ = min(1, ϵ/(2(|a| + 1)), ϵ/(2(|b| + 1))).
If |x − a| < δ and |y − b| < δ, then |y| < |b| + δ ≤ |b| + 1, so

|xy − ab| ≤ |x − a||y| + |a||y − b|
≤ [ϵ/(2(|b| + 1))] (|b| + 1) + |a| [ϵ/(2(|a| + 1))] < ϵ/2 + ϵ/2 = ϵ.
This proves the continuity of f1 and f2 . As for g, to show that limx→a 1/x = 1/a
for a ≠ 0, we observe that
1/x − 1/a = (a − x)/(ax).
Given ϵ > 0, let δ be the smaller of the numbers |a|/2 and ϵa^2/2. If |x − a| < δ, then
|a| ≤ |a − x| + |x| < |a|/2 + |x| and hence |x| > |a|/2, so
|(x − a)/(ax)| < |ϵa^2/(2ax)| = ϵ |a/(2x)| < ϵ,
as desired.
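The explicit choices of δ in this proof can be tested on random inputs. The following sketch uses hypothetical function names and arbitrary sample values of a, b, and ϵ; passing a random test illustrates the estimates but is no substitute for the proof.

```python
import random

def delta_for_product(a, b, eps):
    """The delta used for f2(x, y) = xy in the proof of Theorem 1.10."""
    return min(1.0, eps / (2 * (abs(a) + 1)), eps / (2 * (abs(b) + 1)))

def delta_for_reciprocal(a, eps):
    """The delta used for g(x) = 1/x: the smaller of |a|/2 and eps * a^2 / 2."""
    assert a != 0
    return min(abs(a) / 2, eps * a * a / 2)

def check_product(a, b, eps, trials=1000, seed=0):
    """Check |xy - ab| < eps whenever |x - a| < delta and |y - b| < delta."""
    rng = random.Random(seed)
    d = delta_for_product(a, b, eps)
    for _ in range(trials):
        x, y = a + rng.uniform(-d, d), b + rng.uniform(-d, d)
        if abs(x - a) < d and abs(y - b) < d and not abs(x * y - a * b) < eps:
            return False
    return True

def check_reciprocal(a, eps, trials=1000, seed=0):
    """Check |1/x - 1/a| < eps whenever |x - a| < delta."""
    rng = random.Random(seed)
    d = delta_for_reciprocal(a, eps)
    for _ in range(trials):
        x = a + rng.uniform(-d, d)
        if abs(x - a) < d and not abs(1 / x - 1 / a) < eps:
            return False
    return True

print(check_product(3.0, -2.0, 0.01), check_reciprocal(0.5, 0.1))
```

Capping δ at 1 in the product case is what gives the bound |y| < |b| + 1, and capping δ at |a|/2 in the reciprocal case keeps x away from 0.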

1.11 Corollary. The function f3(x, y) = x − y is continuous on R^2, and the function
f4(x, y) = x/y is continuous on {(x, y) : y ≠ 0}.

Proof. With notation as in Theorem 1.10, we have f4 (x, y) = f2 (x, g(y)), so f4 is


the composition of continuous mappings and hence is continuous on the set where
y ≠ 0. Likewise, f3(x, y) = f1(x, f2(−1, y)), so f3 is continuous. (Alternatively,
continuity for f3 may be proved in exactly the same way as for f1 .)

1.12 Corollary. The sum, product, or difference of two continuous functions is


continuous; the quotient of two continuous functions is continuous on the set where
the denominator is nonzero.

Proof. Combine Theorem 1.10 and Corollary 1.11 with Theorem 1.9. For example,
if f and g are continuous functions on U ⊂ Rn , then f + g is continuous because
it is the composition of the continuous map (f, g) from U to R2 and the continuous
map (x, y) ↦ x + y from R^2 to R. Likewise for the other arithmetic operations.

The elementary functions of a single variable (polynomials, trig functions, ex-


ponential functions, etc.) are all continuous on their domains of definition, and
elementary functions of several variables are generally built up out of functions of
one variable by the arithmetic operations and composition. The preceding results
therefore allow the continuity of such functions to be established almost immediately
in most cases. For example, the function ϕ(x, y) = sin(3x + 2y)/(x^2 − y) is continuous
everywhere except along the parabola y = x^2, because it is built up from the
continuous functions of one variable 3x, 2y, x^2, and −y by taking sums (3x + 2y
and x^2 − y), composing with the sine function (sin(3x + 2y)), and then taking a
quotient. For another example, the function ψ(x, y) = x^y, defined on the region
where x > 0, is continuous there, because it can be rewritten as ψ(x, y) = e^(y log x),
which is assembled from the (continuous) exponential and logarithmic functions
and the operation of multiplication (y · log x). Similarly, the functions in Examples
1 and 2 are continuous everywhere except at the origin.
Let us look at one more example:
EXAMPLE 3. Let h(x, y) = xy(x^2 − y^2)/(x^2 + y^2) for (x, y) ≠ (0, 0) and h(0, 0) = 0.
Evaluate lim_{(x,y)→(2,3)} h(x, y) and lim_{(x,y)→(0,0)} h(x, y). Is h continuous at
(0, 0)?
Solution. The first limit is easy: Clearly h is continuous everywhere except
at the origin, so lim_{(x,y)→(2,3)} h(x, y) = h(2, 3) = 6(4 − 9)/(4 + 9) = −30/13.
The behavior of h at the origin requires a closer examination. Since h(x, 0) = 0
for all x, if the limit exists it must equal 0. Experimentation with lines and
parabolas as in Examples 1 and 2 fails to yield any evidence to the contrary.

In fact, the limit is 0, and this can be established with a little ad hoc estimating.
Clearly |x^2 − y^2| ≤ x^2 + y^2, so |h(x, y)| ≤ |xy|. But xy → 0 as
(x, y) → (0, 0), so h(x, y), being even smaller in absolute value than xy, must
also approach 0. Thus lim(x,y)→(0,0) h(x, y) = 0 and h is continuous at (0, 0).
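The estimate |h(x, y)| ≤ |xy| and the value h(2, 3) = −30/13 can both be checked with a few lines of code. This is a sketch with a hypothetical helper name, and the random test only illustrates the bound.

```python
import random

def h(x, y):
    """The function of Example 3, with h(0, 0) = 0."""
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def bound_holds(trials=1000, seed=0):
    """Check |h(x, y)| <= |xy| at random points of the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if abs(h(x, y)) > abs(x * y) + 1e-15:  # tiny slack for rounding
            return False
    return True

print(h(2.0, 3.0), -30 / 13)
print(bound_holds())
```

Since |xy| ≤ (x^2 + y^2)/2, the bound also shows |h(x, y)| < ϵ as soon as |(x, y)| is less than the square root of 2ϵ, which gives an explicit δ for the limit at the origin.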

We now establish the relation between inequalities on continuous functions and


open and closed sets that was mentioned at the end of the preceding section.

1.13 Theorem. Suppose f : Rn → Rk is continuous and U is a subset of Rk , and


let S = {x ∈ Rn : f (x) ∈ U }. Then S is open if U is open, and S is closed if U is
closed.

Proof. Suppose U is open. We shall show that S is open by showing that every
point a in S is an interior point of S. If a ∈ S, then f (a) ∈ U . Since U is open,
some ball centered at f (a) is contained in U ; that is, there is a positive number ϵ
such that every y ∈ Rk such that |y − f (a)| < ϵ is in U . Since f is continuous,
there is a positive number δ such that |f (x) − f (a)| < ϵ whenever |x − a| < δ.
But this means that f (x) ∈ U whenever |x − a| < δ, that is, x ∈ S whenever
|x − a| < δ. Thus a is an interior point of S.
On the other hand, suppose U is closed. Then the complement of U in R^k is open
by Proposition 1.4b, so the set S ′ = {x : f (x) ∈ U c } is open by the argument just
given. But S ′ is just the complement of S in Rn , so S is closed by Proposition 1.4b
again.

The result about the openness or closedness of sets defined by inequalities or


equations at the end of §1.2 is a corollary of Theorem 1.13. For example, if f :
Rn → R is a continuous function, the set {x : f (x) > 0} (resp.2 {x : f (x) = 0})
is of the form {x : f (x) ∈ U } where U = (0, ∞) (resp. U = {0}), and this U is
open (resp. closed).
Theorem 1.13 can be generalized to functions that are only defined on subsets of
Rn ; with notation as above, the correct statement is that if U is open (resp. closed)
then S is the intersection of the domain of f with an open (resp. closed) set. (For
example, the set {x ∈ R : log x ≤ 0}, namely (0, 1], is the intersection of the
domain of log, namely (0, ∞), with the closed set [0, 1]. On the other hand, the set
{x ∈ R : √x < 1}, namely [0, 1), is the intersection of the domain of the square
root function, namely [0, ∞), with the open set (−1, 1).) In particular, if U and the
domain of f are both open (resp. closed), then so is S.
The converse of Theorem 1.13 is also true; see Exercise 8.
2
“resp.” is an abbreviation for “respectively.”

EXERCISES
1. For the following functions f, show that lim_{(x,y)→(0,0)} f(x, y) does not exist.
a. f(x, y) = (x^2 + y)/√(x^2 + y^2)
b. f(x, y) = x/(x^4 + y^4)
c. f(x, y) = x^4 y^4/(x^2 + y^4)^3
2. For the following functions f, show that lim_{(x,y)→(0,0)} f(x, y) = 0.
a. f(x, y) = x^2 y^2/(x^2 + y^2)
b. f(x, y) = (3x^5 − xy^4)/(x^4 + y^4)
3. Let f(x, y) = x^(−1) sin(xy) for x ≠ 0. How should you define f(0, y) for
y ∈ R so as to make f a continuous function on all of R2 ?
4. Let f (x, y) = xy/(x2 + y 2 ) as in Example 1. Show that, although f is dis-
continuous at (0, 0), f (x, a) and f (a, y) are continuous functions of x and y,
respectively, for any a ∈ R (including a = 0). We say that f is separately
continuous in x and y.
5. Let f (x, y) = y(y − x2 )/x4 if 0 < y < x2 , f (x, y) = 0 otherwise. At which
point(s) is f discontinuous?
6. Let f (x) = x if x is rational, f (x) = 0 if x is irrational. Show that f is
continuous at x = 0 and nowhere else.
7. Let f (x) = 1/q if x = p/q where p and q are integers with no common factors
and q > 0, and f (x) = 0 if x is irrational. At which points, if any, is f
continuous?
8. Suppose f : Rn → Rk has the following property: For any open set U ⊂ Rk ,
{x : f (x) ∈ U } is an open set in Rn . Show that f is continuous on Rn . Show
also that the same result holds if “open” is replaced by “closed.”
9. Let U and V be open sets in Rn and let f be a one-to-one mapping from U onto
V (so that there is an inverse mapping f −1 : V → U ). Suppose that f and f −1
are both continuous. Show that for any set S such that S̄ ⊂ U and the closure of f(S) is contained in V
we have f (∂S) = ∂(f (S)).

1.4 Sequences
Generally speaking, a sequence is a collection of mathematical objects that is in-
dexed by the positive integers. The objects in question can be of any sort, such as

numbers, n-dimensional vectors, sets, etc. If the kth object in the sequence is X_k,
the sequence as a whole is usually denoted by {X_k}_{k=1}^∞, or just by {X_k}_1^∞ or even
{X_k} if there is no possibility of confusion. (We shall comment further on this
notation below.) Alternatively, we can write out the sequence as X_1, X_2, X_3, . . . .
We speak of a sequence in a set S if the objects of the sequence all belong to S.
EXAMPLE 1.
a. A sequence of numbers: 1, 4, 9, 16, . . .. The kth term in the sequence is k^2,
and the sequence as a whole may be written as {k^2}_1^∞.
b. A sequence of intervals: (−1, 1), (−1/2, 1/2), (−1/3, 1/3), (−1/4, 1/4), . . .. The kth
term in the sequence is the interval (−1/k, 1/k), and the sequence as a whole
may be written as {(−1/k, 1/k)}_1^∞.

Sequences can be defined by formulas, as in the examples above: x_k = k^2, or
I_k = (−1/k, 1/k). They can also be defined by recursion (or induction), that is, by
specifying the first term or the first few terms and then giving a rule that tells how
to obtain the kth term from the preceding ones.
EXAMPLE 2. The Fibonacci sequence is the sequence

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, . . . ,

in which the first two terms are equal to 1 and each of the remaining terms is
the sum of the two preceding ones (that is, x_k = x_{k−2} + x_{k−1}).
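A recursive definition of this kind translates directly into a loop. The function name `fibonacci` below is a hypothetical choice for this sketch.

```python
def fibonacci(count):
    """Return the first `count` terms, using x_1 = x_2 = 1 and x_k = x_{k-2} + x_{k-1}."""
    terms = []
    for k in range(count):
        terms.append(1 if k < 2 else terms[-2] + terms[-1])
    return terms

print(fibonacci(11))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

Only the two most recent terms are needed to produce the next one, which is exactly what the recursion rule says.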
EXAMPLE 3. Define a sequence {x_k} as follows: x_1 is a given positive integer
a. If x_k is odd, then x_{k+1} = 3x_k + 1; if x_k is even, then x_{k+1} = x_k/2. For
example, if a = 13, the sequence is

13, 40, 20, 10, 5, 16, 8, 4, 2, 1, 4, 2, 1, 4, 2, 1, . . . ,

ending in the infinite repetition of (4, 2, 1). It is a famous unsolved problem (as
of this writing) to prove or disprove that this sequence eventually ends in the
repeating figure (4, 2, 1) no matter what initial number a is chosen. (Try a few
values of a to see how it works! For more information, see Lagarias [13].)
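For readers who want to try a few values of a, here is a small sketch of the rule in Example 3. The function name and the `max_terms` safety cap are assumptions of this sketch; the cap is there because it is not known that every starting value reaches 1.

```python
def collatz_sequence(a, max_terms=1000):
    """Return x_1 = a, x_2, ... up to the first occurrence of 1 (or max_terms terms)."""
    assert a >= 1, "the starting value must be a positive integer"
    terms = [a]
    while terms[-1] != 1 and len(terms) < max_terms:
        x = terms[-1]
        terms.append(3 * x + 1 if x % 2 else x // 2)  # odd: 3x + 1, even: x / 2
    return terms

print(collatz_sequence(13))  # [13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
```

Once the value 1 appears, the sequence continues 4, 2, 1, 4, 2, 1, . . . forever, so stopping at the first 1 loses no information.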
It is convenient to make the definition of sequence a little more flexible by
allowing the index k to begin with something other than 1. Thus, we may speak of a
sequence {X_k}_0^∞ whose objects are X_0, X_1, X_2, . . ., or a sequence {X_k}_7^∞ whose
objects are X_7, X_8, X_9, . . .. We may also speak of a finite sequence whose terms
are indexed by a finite collection of integers, such as {X_k}_1^8 (a finite sequence of
eight terms), or a doubly infinite sequence whose terms are indexed by the whole
set of integers: {X_k}_{−∞}^∞.

Strictly speaking, a sequence in a set S is a rule that assigns to each positive


integer (or each integer in some other suitable set, as indicated above) an element
of S, in other words, a function or mapping from the positive integers to S. The
common functional notation would be to write X(k) instead of Xk for the value of
this mapping at the integer k, but for sequences it is customary to write the input
variable k as a subscript.
It is sometimes necessary to distinguish between the sequence {X_k}_1^∞ and the
set of values (i.e., the range) of the sequence, because a sequence may assume the
same value many times. For example, consider the sequence of numbers a_k =
(−1)^k. Then the sequence {a_k}_1^∞ is the function on the positive integers whose
values are alternately −1 and +1, which may be written out as

−1, 1, −1, 1, −1, 1, . . . ,

but its set of values is just the two-element set {−1, 1}. Since curly brackets are
commonly used to specify sets (as we just did with {−1, 1}), the notation {X_k}_1^∞
for a sequence invites confusion with the set whose elements are the X_k’s, and for
this reason some authors use other notations such as ⟨X_k⟩_1^∞. However, the notation
{X_k}_1^∞ is by far the most common one, and in practice it rarely causes problems,
so we shall stick with it.
For the remainder of this section we shall be concerned with sequences of num-
bers or n-dimensional vectors. We reserve the letter n for the dimension and use
letters such as k and j for the index on a sequence. Thus, for example, if {xk } is a
sequence in Rn , the components of the vector xk are (xk1 , . . . , xkn ).
A sequence {xk } in Rn is said to converge to the limit L if for every ϵ > 0
there is an integer K such that |xk − L| < ϵ whenever k > K; otherwise, {xk }
diverges. If {xk } converges to L, we write xk → L or L = limk→∞ xk .
We say that limk→∞ xk = ∞ (or +∞) if for every C > 0 there is an integer
K such that xk > C whenever k > K, and limk→∞ xk = −∞ if for every C > 0
there is an integer K such that xk < −C whenever k > K. (However, a sequence
whose limit is ±∞ is still called divergent.)
It follows easily from the estimates (1.3) that xk → L if and only if each
component of xk converges to the corresponding component of L, that is, xkm →
Lm for 1 ≤ m ≤ n. The study of convergence of sequences of vectors is thus
reducible to the study of convergence of numerical sequences.

EXAMPLE 4.
a. The sequence {1/k} converges to 0, since |(1/k) − 0| < ϵ whenever k >
(1/ϵ).
b. The sequence {k^2} diverges; more precisely, lim_{k→∞} k^2 = ∞.

c. The sequence {xk } = {(−1)k } diverges, but the subsequence {yj } =


{x2j−1 } of odd-numbered terms converges to −1, and the subsequence
{zj } = {x2j } of even-numbered terms converges to 1.
EXAMPLE 5. If C is any positive number, C^k/k! → 0 as k → ∞ (that is, k!
grows faster than exponentially as k → ∞). Indeed, pick an integer K > 2C.
For k > K, we then have

0 < C^k/k! = (C^K/K!) · C/(K + 1) · C/(K + 2) · · · C/k < (C^K/K!) · (1/2) · (1/2) · · · (1/2) = (C^K/K!) · 1/2^(k−K).

But C^K/K! is a fixed number, and 1/2^(k−K) → 0 as k → ∞.
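The comparison with a geometric sequence in Example 5 can be tabulated. The sketch below uses the hypothetical names `term` and `bound`, and the sample values C = 10 and K = 21 (any integer K > 2C would do).

```python
import math

def term(C, k):
    """The k-th term C^k / k!."""
    return C ** k / math.factorial(k)

def bound(C, K, k):
    """The comparison value (C^K / K!) / 2^(k - K), valid for k > K > 2C."""
    return term(C, K) / 2 ** (k - K)

C, K = 10.0, 21  # K is an integer with K > 2C
for k in (22, 30, 40, 60):
    # In each row the term lies strictly between 0 and the geometric bound.
    print(k, term(C, k), bound(C, K, k))
```

Each additional factor C/j with j > K is less than 1/2, so the terms shrink at least as fast as the bound, which halves at every step.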

Sequential convergence is often a useful tool in studying questions relating to


open and closed sets, continuity, and related matters. The fundamental results are
the following two theorems.

1.14 Theorem. Suppose S ⊂ Rn and x ∈ Rn . Then x belongs to the closure of S


if and only if there is a sequence of points in S that converges to x.

Proof. If {xk } is a sequence in S that converges to x, then every neighborhood of


x contains elements of S — namely, xk where k is sufficiently large — so x is in
the closure of S. Conversely, suppose x is in the closure of S. If x is in S itself, let
xk = x for all k. If not, for each k the ball of radius 1/k about x contains points
of S; pick one and call it xk . In either case, {xk } is a sequence of points in S that
converges to x.

1.15 Theorem. Given S ⊂ Rn , a ∈ S, and f : S → Rm , the following are


equivalent:
a. f is continuous at a.
b. For any sequence {xk } in S that converges to a, the sequence {f (xk )} con-
verges to f (a).

Proof. Suppose f is continuous at a and xk → a. Given ϵ > 0, we wish to show


that |f (xk ) − f (a)| < ϵ provided k is sufficiently large. But by the continuity of
f , there exists δ > 0 such that |f (xk ) − f (a)| < ϵ when |xk − a| < δ, and since
xk → a, there exists an integer K such that |xk − a| < δ whenever k > K.
Combining these, we get |f (xk ) − f (a)| < ϵ whenever k > K, as desired.
On the other hand, suppose f is not continuous at a. This means that there
exists ϵ > 0 such that for every δ > 0 there is a point x ∈ S with |x − a| < δ but
|f(x) − f(a)| ≥ ϵ. Taking δ equal to 1, 1/2, 1/3, . . ., we see that for each positive integer
k there is a point x_k ∈ S such that |x_k − a| < 1/k but |f(x_k) − f(a)| ≥ ϵ. The

sequence {xk } then converges to a, but the sequence {f (xk )} does not converge to
f (a).
We have shown that if (a) is true then (b) is true, and that if (a) is false then (b)
is false, so the proof is complete.

EXERCISES

1. For each of the following sequences {xk }, find the limit or show that the se-
quence diverges.

2k + 1 sin k kπ
a. xk = √ . b. xk = . c. xk = sin .
2 k+1 k 3

2. Let x_k = (3k + 4)/(k − 5); then lim_{k→∞} x_k = 3. Given ϵ > 0, find an integer K so
that |x_k − 3| < ϵ whenever k > K.
3. Define a sequence {xk } recursively by x1 = 1 and xk+1 = kxk /(k + 1) for
k ≥ 1. Find an explicit formula for xk . What is limk→∞ xk ?
4. Let {xk } and {yk } be sequences in R such that xk → a and yk → b. Show that
xk + yk → a + b and xk yk → ab. (Use Theorems 1.10 and 1.15.)
5. Given f : Rn → Rm ; show that limx→a f (x) = l if and only if f (xk ) → l for
every sequence {xk } that converges to a. (Adapt the proof of Theorem 1.15.)

A point a ∈ Rn is called an accumulation point of a set S ⊂ Rn if every neigh-


borhood of a contains infinitely many points of S. (The point a itself may or may
not belong to S. Some people use the terms “limit point” or “cluster point” instead
of “accumulation point.”) For example, the accumulation points of the interval
(−1, 1) in R are the points in the closed interval [−1, 1], and the only accumulation
point of the set {1, 1/2, 1/3, 1/4, . . .} is 0.

6. Show that a is an accumulation point of S if and only if there is a sequence


{xk } of points in S, none of which are equal to a, such that xk → a. (Adapt
the proof of Theorem 1.14.)
7. Show that the closure of S is the union of S and the set of all its accumulation
points.

1.5 Completeness
The essential properties of the real number system that underlie all the theorems of
calculus are summarized by saying that R is a complete ordered field. We explain
the meaning of these terms one by one:
A field is a set on which the operations of addition, subtraction, multiplication,
and division (by any nonzero number) are defined, subject to all the usual laws of
arithmetic: commutativity, associativity, etc. Besides the real numbers, examples of
fields include the rational numbers and the complex numbers, and there are many
others. (For more precise definitions and more examples, consult a textbook on
abstract algebra such as Birkhoff and Mac Lane [4] or Hungerford [8].)
An ordered field is a field equipped with a binary relation < that is transitive
(if a < b and b < c, then a < c) and antisymmetric (if a ≠ b, then either a < b or
b < a, but not both), and interacts with the arithmetic operations in the usual way
(if a < b then a + c < b + c for any c, and also ac < bc if c > 0). The real number
and rational number systems are ordered fields (with the usual meaning of “<”),
but the complex number system is not.
Finally, completeness is what distinguishes the real numbers from the smaller
ordered fields such as the rational numbers and makes possible the transition from
algebra to calculus; it means that there are “no holes” in the real number line. There
are several equivalent ways of stating the completeness property precisely. The one
we shall use as a starting point is the existence of least upper bounds.
If S is a subset of R, an upper bound for S is a number u such that x ≤ u for
all x ∈ S, and a lower bound for S is a number l such that x ≥ l for all x ∈ S.
The Completeness Axiom. Let S be a nonempty set of real numbers. If S has
an upper bound, then S has a least upper bound, called the supremum of S and
denoted by sup S. If S has a lower bound, then S has a greatest lower bound,
called the infimum of S and denoted by inf S.
If S has no upper bound, we shall define sup S to be +∞, and if S has no lower
bound, we shall define inf S to be −∞.
EXAMPLE 1.
a. If S is the interval (0, 1], then sup S = 1 and inf S = 0.
b. If S = {1, 1/2, 1/3, 1/4, . . .}, then sup S = 1 and inf S = 0.
c. If S = {1, 2, 3, 4, . . .}, then sup S = ∞ and inf S = 1.
d. If S is the single point a, then sup S = inf S = a.
e. If S = {x : x is rational and x² < 2}, then sup S = √2 and inf S = −√2.
This is an example of a set of rational numbers that has no supremum
or infimum within the set of rational numbers.

If S has an upper bound, the number a = sup S is the unique number such
that
i. x ≤ a for every x ∈ S and
ii. for every ϵ > 0 there exists x ∈ S with x > a − ϵ.
(i) expresses the fact that a is an upper bound, whereas (ii) expresses the fact that
there is no smaller upper bound. In particular, while sup S may or may not belong
to S itself, it always belongs to the closure of S. Similarly for inf S if S is bounded
below.
The completeness of the real number system plays a crucial role in establishing
the convergence of numerical sequences. The most basic result along these lines is
the following. First, some terminology: A sequence {xk } is called bounded if all
the numbers xk are contained in some bounded interval. A sequence {xk } is called
increasing if xj ≤ xk whenever j ≤ k, and decreasing if xj ≥ xk whenever
j ≤ k. A sequence that is either increasing or decreasing is called monotone (or
monotonic).

1.16 Theorem (The Monotone Sequence Theorem). Every bounded monotone se-
quence in R is convergent. More precisely, the limit of an increasing (resp. decreas-
ing) sequence is the supremum (resp. infimum) of its set of values.

Proof. Suppose {xk } is a bounded increasing sequence. Let l be the supremum of
the set of values {x1 , x2 , . . .}; I claim that xk → l. Since l is an upper bound, we
have xk ≤ l for all k. On the other hand, since l is the least upper bound, for any
ϵ > 0 there is some K for which xK > l − ϵ. Since the xk ’s increase with k, we
also have xk > l − ϵ for all k > K. Therefore, l − ϵ < xk ≤ l for all k > K, and
this shows that xk → l.
Similarly, if {xk } is decreasing, it converges to inf{x1 , x2 , . . .}.
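
As a numerical illustration of the theorem (a sketch, not part of the proof), the
following Python snippet tabulates the sequence xk = (1 + 1/k)^k. This particular
sequence is an assumption chosen for illustration; it is a standard fact, not proved
here, that it is increasing and bounded above by e, so the theorem says it converges
to the supremum of its set of values. The name compound is hypothetical.

```python
import math

def compound(k):
    """Return x_k = (1 + 1/k)**k for a positive integer k."""
    return (1.0 + 1.0 / k) ** k

# First 1000 terms of the illustrative sequence.
xs = [compound(k) for k in range(1, 1001)]

# Numerical (floating-point) checks of the two hypotheses of the theorem.
is_increasing = all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))
is_bounded = all(x < math.e for x in xs)

print(is_increasing, is_bounded)
print(xs[-1], math.e - xs[-1])  # the gap to the supremum shrinks as k grows
```

The checks are only evidence for finitely many terms; the convergence itself is what
the theorem supplies.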

EXAMPLE 2. Given a positive real number a, define a sequence {xk } recursively
as follows. x1 is some fixed positive real number, and for k ≥ 2,

\[ x_k = \frac{1}{2}\left( x_{k-1} + \frac{a}{x_{k-1}} \right). \]

Observe that if x_{k−1} > 0 then xk > 0 too; since we assume that x1 > 0,
every term of this sequence is positive. (In particular, division by zero is never
a problem.) We claim that xk → √a, no matter what initial x1 is chosen.
Indeed, if we assume that the sequence converges to a nonzero limit L, by
letting k → ∞ in the recursion formula we see that

\[ L = \frac{1}{2}\left( L + \frac{a}{L} \right), \quad\text{or}\quad L^2 = \tfrac{1}{2} L^2 + \tfrac{1}{2} a, \]

so that L² = a. Since xk > 0 for every k, we must have L > 0, and hence
L = √a. But this argument is without force until we know that {xk } converges
to a nonzero limit.
To verify this, observe that for k ≥ 2,

\[ x_k^2 = \tfrac{1}{4}\left( x_{k-1}^2 + 2a + a^2 x_{k-1}^{-2} \right)
         = a + \tfrac{1}{4}\left( x_{k-1}^2 - 2a + a^2 x_{k-1}^{-2} \right)
         = a + \tfrac{1}{4}\left( x_{k-1} - a x_{k-1}^{-1} \right)^2 \ge a. \]

Thus, starting with the second term, the sequence {xk } is bounded below by
√a > 0, and it is decreasing: since a ≤ xk² for k ≥ 2,

\[ x_{k+1} - x_k = \tfrac{1}{2}\left( a x_k^{-1} - x_k \right) \le \tfrac{1}{2}\left( x_k - x_k \right) = 0. \]

The convergence to a limit L ≥ √a now follows from the monotone sequence
theorem. (The verification that {xk } converges is not just a formality; see
Exercise 4.)
The sequence {xk } gives a computationally efficient recursive algorithm
for computing square roots.
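
The recursion of this example can be sketched in a few lines of Python. This is a
minimal illustration rather than a production routine; the function name
sqrt_iterates and the sample values a = 2, x1 = 10 are assumptions made here.

```python
import math

def sqrt_iterates(a, x1, steps):
    """Return [x_1, ..., x_steps] where x_k = (x_{k-1} + a / x_{k-1}) / 2."""
    if a <= 0 or x1 <= 0:
        raise ValueError("a and x1 must be positive")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    xs = [x1]
    for _ in range(steps - 1):
        prev = xs[-1]
        xs.append(0.5 * (prev + a / prev))
    return xs

xs = sqrt_iterates(2.0, 10.0, 12)
for k, x in enumerate(xs, start=1):
    # From the second term on, x_k should not drop below sqrt(a)
    # (up to rounding) and should not increase.
    print(k, x, x - math.sqrt(2.0))
```

In floating-point arithmetic the inequalities of the example hold only up to rounding
error, so any numerical check of them should allow a small tolerance.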

The following consequence of the monotone sequence theorem is also a useful
technical tool.

1.17 Theorem (The Nested Interval Theorem). Let I1 = [a1 , b1 ], I2 = [a2 , b2 ],
. . . be a sequence of closed, bounded intervals in R. Suppose that (a) I1 ⊃ I2 ⊃
I3 ⊃ · · · , and (b) the length bk − ak of Ik tends to 0 as k → ∞. Then there is
exactly one point contained in all of the intervals Ik .

Proof. The condition I1 ⊃ I2 ⊃ I3 ⊃ · · · means that a1 ≤ a2 ≤ a3 ≤ · · · and
b1 ≥ b2 ≥ b3 ≥ · · · , so the sequences {ak } and {bk } are monotone. They are also
bounded, since all ak and bk are contained in I1 ; hence, by the monotone sequence
theorem, they are both convergent. Moreover, since bk − ak → 0, their limits are
equal. Call their common limit l. Then ak ≤ l ≤ bk for all k, so l ∈ Ik for all
k. No other point l′ can be common to all Ik , for the length of Ik is less than the
distance |l − l′ | when k is sufficiently large.

It should be emphasized that the real point of the nested interval theorem is that
the intersection ⋂_{n=1}^{∞} In is nonempty; the fact that it can contain no more than one
point is pretty obvious from the assumption that the length of In tends to zero.
If {xk } is a sequence (in any set, not necessarily R), we may form a subsequence
of {xk } by deleting some of the terms and keeping the rest in their original
order. More precisely, a subsequence of {xk } is a sequence {x_{k_j}}_{j=1}^{∞} specified
by a one-to-one, increasing map j → kj from the set of positive integers into itself.
For example, by taking kj = 2j we obtain the subsequence of even-numbered
terms; by taking kj = j² we obtain the subsequence of those terms whose index is
a perfect square, and so on.
The following theorem is one of the most useful results in the foundations of
analysis; it is one version of the Bolzano-Weierstrass theorem, whose general form
will be found in Theorem 1.21.
1.18 Theorem. Every bounded sequence in R has a convergent subsequence.
Proof. Let {xk } be a bounded sequence, say xk ∈ [a, b] for all k. Bisect the interval
[a, b]; that is, consider the two intervals [a, (a + b)/2] and [(a + b)/2, b]. At least
one of these subintervals must contain xk for infinitely many k; call that subinterval
I1 . (If both of them contain xk for infinitely many k, pick the one on the left.) Now
bisect I1 . Again, one of the two halves must contain xk for infinitely many k; call
that half I2 . Proceeding inductively, we obtain a sequence of intervals Ij , each one
contained in the preceding one, each one half as long as the preceding one, and
each one containing xk for infinitely many k. By the nested interval theorem, there
is exactly one point l contained in every Ij .
It is now easy to construct a subsequence of {xk } that converges to l, as follows.
Pick an integer k1 such that x_{k_1} ∈ I1 , then pick k2 > k1 such that x_{k_2} ∈ I2 , then
pick k3 > k2 such that x_{k_3} ∈ I3 , and so forth. By construction of the Ij ’s, this
process can be continued indefinitely. Since x_{k_j} and l are both in Ij , and the length
of Ij is 2^{−j}(b − a), we have |x_{k_j} − l| ≤ 2^{−j}(b − a), which tends to 0 as j → ∞;
that is, x_{k_j} → l.
Theorem 1.18 generalizes easily to higher dimensions:
1.19 Theorem. Every bounded sequence in Rn has a convergent subsequence.
Proof. If |xk | ≤ C for all k, then the components xk1 , . . . , xkn all lie in the interval
[−C, C]. Hence, for each m = 1, . . . , n we can extract a convergent subsequence
from the sequence of mth components, {xkm } (k = 1, 2, . . .). The trouble is that the indices
on these subsequences might all be different, so we can’t put them together. (We
might have chosen the odd-numbered terms for m = 1 and the even-numbered
terms for m = 2, for example.) Instead, we have to proceed inductively. First
we choose a subsequence {x_{k_j}} such that the first components converge; then we
choose a sub-subsequence {x_{k_{j_i}}} whose second components also converge, and so
on until we find a (sub)^n sequence whose components all converge.
Another way to express the completeness of the real number system is to say
that every sequence whose terms get closer and closer to each other actually con-
verges. To be more precise, a sequence {xk } in Rn is called a Cauchy sequence if

xk − xj → 0 as k, j → ∞, that is, if for every ϵ > 0 there exists an integer K such
that |xk − xj | < ϵ whenever k > K and j > K.

1.20 Theorem. A sequence {xk } in Rn is convergent if and only if it is Cauchy.

Proof. Suppose xk → l. Since xk − xj = (xk − l) − (xj − l), we have 0 ≤
|xk − xj | ≤ |xk − l| + |xj − l|. Both terms on the right tend to zero as k, j → ∞;
hence so does xk − xj . Thus {xk } is Cauchy.
Now suppose {xk } is Cauchy. Taking ϵ = 1 in the definition of “Cauchy,”
we see that there is an integer K such that |xk − xj | < 1 if k, j > K. Then
|xk | < |x_{K+1}| + 1 for all k > K, and it follows that the sequence {xk } is bounded.
By Theorem 1.19, there is a subsequence {x_{k_j}} that converges to a limit l. But then
since {xk } is Cauchy, the whole sequence must also converge to l. Indeed, given
ϵ > 0, there is an integer J such that |x_{k_j} − l| < ϵ/2 if j > J, and there is an integer
K such that |xk − xm | < ϵ/2 if k, m > K. Pick an integer j > J such that kj > K;
then for k > K we have

\[ |x_k - l| \le |x_k - x_{k_j}| + |x_{k_j} - l| < \tfrac{1}{2}\epsilon + \tfrac{1}{2}\epsilon = \epsilon. \]

Therefore, xk → l.

EXERCISES

1. Find sup S and inf S for the following sets S. Do these numbers belong to S
or not?
a. S = {x : (2x² − 1)(x² − 1) < 0}.
b. S = {(−1)^k + 2^{−k} : k ≥ 0}.
c. S = {x : arctan x ≥ 1}.
2. Construct a sequence {xk } that has subsequences converging to three different
limits.
3. Consider the sequence 1/2, 1/3, 2/3, 1/4, 2/4, 3/4, 1/5, 2/5, 3/5, 4/5, . . ., obtained by
listing the rational numbers in (0, 1) with denominator n in increasing order, for n
successively equal to 2, 3, 4, . . .. Show that for any a ∈ [0, 1], there is a subsequence
that converges to a. (Hint: Consider the decimal expansion of a.)
4. Given a real number a, define a sequence {xk } recursively by x1 = a,
x_{k+1} = (xk )².
a. Show, as in Example 2, that if {xk } converges, its limit must be 0 or 1.
b. For which a is the limit equal to 0? equal to 1? nonexistent?
5. Define a sequence {xk } recursively by x1 = √2, x_{k+1} = √(2 + xk ). Show by
induction that (a) xk < 2 and (b) xk < x_{k+1} for all k. Then show that lim xk
exists and evaluate it.
6. Let rk be the ratio of the (k + 1)th term to the kth term of the Fibonacci
sequence (Example 2, §1.4). (Thus the first few rk ’s are 1, 2, 3/2, 5/3, . . .) Our
object is to show that lim_{k→∞} rk is the “golden ratio” ϕ = (1 + √5)/2, the
positive root of the equation x² = x + 1.
a. Show that

\[ r_{k+1} = \frac{r_k + 1}{r_k}, \qquad r_{k+2} = \frac{2 r_k + 1}{r_k + 1}. \]

b. Show that rk < ϕ if k is odd and rk > ϕ if k is even. Then show that
r_{k+2} − rk is positive if k is odd and negative if k is even. (Hint: For x > 0
we have x² < x + 1 if x < ϕ and x² > x + 1 if x > ϕ.)
c. Show that the subsequences {r_{2j−1}} and {r_{2j}} of odd- and even-numbered
terms both converge to ϕ.
7. Let {xk } be a sequence in Rn and x a point in Rn . Show that some subsequence
of {xk } converges to x if and only if every ball centered at x contains xk for
infinitely many values of k.
8. Show that every infinite bounded set in Rn has an accumulation point. (See
Exercises 6–7 in §1.4.)

Let {xk } (k = 1, 2, 3, . . .) be a bounded sequence in R. For m = 1, 2, 3, . . . , let

Ym = sup{xm , xm+1 , xm+2 , . . .}, ym = inf{xm , xm+1 , xm+2 , . . .}.

Then the sequence {Ym } is bounded and decreasing, and {ym } is bounded and
increasing (because the sup and inf are being taken over fewer and fewer numbers
as m increases), so they both converge. The limits lim Ym and lim ym are called
the limit superior and limit inferior of the sequence {xk }, respectively; they are
denoted by lim sup_{k→∞} xk and lim inf_{k→∞} xk :

\[ \limsup_{k\to\infty} x_k = \lim_{m\to\infty} \bigl( \sup\{x_k : k \ge m\} \bigr), \qquad
   \liminf_{k\to\infty} x_k = \lim_{m\to\infty} \bigl( \inf\{x_k : k \ge m\} \bigr). \]
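
To make the definitions of Ym and ym concrete, here is a small Python sketch that
computes the tail suprema and infima of a finite truncation of a sequence. The
sample sequence xk = (−1)^k (1 + 1/k) and the function name tail_sup_inf are
assumptions for illustration; for this sequence one expects lim sup xk = 1 and
lim inf xk = −1. Because only finitely many terms are stored, the computed values
agree with the true Ym and ym only for m well below the truncation length.

```python
def tail_sup_inf(xs):
    """For a finite list xs = [x_1, ..., x_N], return (Y, y) where
    Y[m] = max(xs[m:]) and y[m] = min(xs[m:]) (0-based m)."""
    n = len(xs)
    Y = [0.0] * n
    y = [0.0] * n
    running_max = float("-inf")
    running_min = float("inf")
    # Sweep from the end so each tail extremum is updated in O(1).
    for m in range(n - 1, -1, -1):
        running_max = max(running_max, xs[m])
        running_min = min(running_min, xs[m])
        Y[m] = running_max
        y[m] = running_min
    return Y, y

# x_k = (-1)^k (1 + 1/k) for k = 1, ..., 2000 (index i holds k = i + 1).
xs = [(-1) ** k * (1.0 + 1.0 / k) for k in range(1, 2001)]
Y, y = tail_sup_inf(xs)

for m in (1, 10, 100, 1000):
    print(m, Y[m - 1], y[m - 1])
```

The printed columns show the tail suprema decreasing and the tail infima increasing
as m grows, as the paragraph above asserts.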

The following exercises pertain to these ideas.

9. Show that lim sup xk is the number a uniquely specified by the following prop-
erty: For any ϵ > 0, there are infinitely many k for which xk > a − ϵ but only
finitely many for which xk > a + ϵ. What is the corresponding condition for
lim inf xk ?

10. Show that there is a subsequence of {xk } that converges to lim sup xk , and
one that converges to lim inf xk .
11. Show that if a ∈ R is the limit of some subsequence of {xk }, then lim inf xk ≤
a ≤ lim sup xk .
12. Show that {xk } converges if and only if lim sup xk = lim inf xk , in which
case this common value is equal to lim xk .

1.6 Compactness
A subset of Rn is called compact if it is both closed and bounded. (Note: The
notion of compactness can be extended to settings other than Rn , but a different
definition must be adopted; see the concluding paragraph of this section.) Com-
pactness is an important property, principally because it yields existence theorems
for limits in many situations. The fundamental result is the following theorem.

1.21 Theorem (The Bolzano-Weierstrass Theorem). If S is a subset of Rn , the
following are equivalent:
a. S is compact.
b. Every sequence of points in S has a convergent subsequence whose limit lies
in S.

Proof. Suppose S is compact. If {xk } is a sequence in S, it has a convergent
subsequence by Theorem 1.19 since S is bounded, and the limit lies in S by Theorem
1.14 since S is closed; thus (b) holds.
On the other hand, suppose S is not compact, i.e., S is either not closed or not
bounded. If S is not bounded, there is a sequence of points {xk } in S such that
|xk | → ∞. But then {xk } has no convergent subsequence, as any subsequence
must also satisfy |x_{k_j}| → ∞. If S is not closed, there is a point x that lies in the
closure of S but not in S. By Theorem 1.14 there is a sequence {xk } in S that converges to x.
Every subsequence also converges to x, which is not in S. Thus (b) is false if S is
either not closed or not bounded.

Remark. Every finite subset of Rn is obviously compact. If S is finite, (b) is
true because if {xk } is a sequence in S, then there must be a single point x ∈ S
such that xk = x for infinitely many k; the subsequence consisting of those xk ’s
trivially converges to x.

The Bolzano-Weierstrass theorem paves the way to the fundamental connection
between continuity and compactness:

1.22 Theorem. Continuous functions map compact sets to compact sets. That is,
suppose that S is a compact subset of Rn and f : S → Rm is continuous at every
point of S. Then the set

f (S) = {f (x) : x ∈ S}

is also compact.

Proof. Suppose {yk } is a sequence in the image f (S). For each k there is a point
xk ∈ S such that yk = f (xk ). Since S is compact, by the Bolzano-Weierstrass
theorem the sequence {xk } has a convergent subsequence {xkj } whose limit a
lies in S. Since f is continuous at a, by Theorem 1.15 the sequence {ykj } =
{f (xkj )} converges to the point f (a) ∈ f (S). Thus, every sequence in f (S) has a
subsequence whose limit lies in f (S). By the Bolzano-Weierstrass theorem again,
f (S) is compact.

It is not true, in general, that continuous functions map closed sets to closed
sets, or bounded sets to bounded sets. (See Exercises 1–2.) Only the combination
of closedness and boundedness is preserved.
An immediate consequence of Theorem 1.22 is the fundamental existence the-
orem for maxima and minima of real-valued functions.

1.23 Corollary (The Extreme Value Theorem). Suppose S ⊂ Rn is compact and
f : S → R is continuous. Then f has an absolute minimum value and an absolute
maximum value on S; that is, there exist points a, b ∈ S such that f (a) ≤ f (x) ≤
f (b) for all x ∈ S.

Proof. By Theorem 1.22, the set f (S) is a compact subset of R. Thus, it is
bounded, so inf f (S) and sup f (S) exist, and closed, so inf f (S) and sup f (S)
actually belong to f (S). But this says precisely that the set of values of f on S has
a smallest and a largest element, as desired.

The assumption that S is compact is necessary. If S is not closed or not
bounded, the function f might be unbounded, or its extreme values might occur
at points on the boundary of S that are not in S or “at infinity.” Here are a few
simple counterexamples with n = 1:

• f (x) = x, S = (0, 1). (The extreme values occur on the boundary.)

• f (x) = cot πx, S = (0, 1). (The values of f range from −∞ to ∞.)

• f (x) = arctan x, S = R. (f approaches but does not achieve the extreme
values ±π/2.)

• f (x) = 3x − x³ , S = R. (f has a local maximum at x = 1 and a local
minimum at x = −1, but no absolute maximum or minimum.)

Compactness also has another consequence that turns out to be extremely useful
in more advanced mathematical analysis, although its significance may not be very
clear at first sight. (It will not be used elsewhere in this book except in some of the
technical arguments in Appendix B, so it may be regarded as an optional topic.)
Suppose S is a subset of Rn . A collection U of subsets of Rn is called a covering
of S if S is contained in the union of the sets in U. For example, for each x ∈ S
we could pick an open ball Bx centered at x; then U = {Bx : x ∈ S} is a covering
of S.

1.24 Theorem (The Heine-Borel Theorem). If S is a subset of Rn , the following
are equivalent:
a. S is compact.
b. If U is any covering of S by open sets, there is a finite subcollection of U that
still forms a covering of S. (In brief: Every open covering of S has a finite
subcovering.)

Proof. The proof is given in Appendix B.1 (Theorem B.1).

Much of what we have done in this section and the preceding ones can be
generalized from subsets of Rn to subsets of more general spaces equipped with a
“distance function” that behaves more or less like the Euclidean distance d(x, y) =
|x − y|. (Such spaces are known as metric spaces; see DePree and Swartz [5],
Krantz [12], or Rudin [18].) For example, in studying the geometry of a surface
S in R³ , one might want to take the “distance” between two points x, y ∈ S to
be not the straight-line distance |x − y| but the length of the shortest curve on S
that joins x to y. Another class of examples is provided by spaces of functions,
where the “distance” between two functions f and g can be measured in a number
of different ways; we shall say more about this in Chapter 8. In this general setting,
the Bolzano-Weierstrass and Heine-Borel theorems are no longer completely valid.
The conditions on a set S in Theorem 1.21b and Theorem 1.24b still imply that S is
closed and bounded, but not conversely. These conditions are still very important,
however, so a shift in terminology is called for. The condition in Theorem 1.24b —
that every open cover of S has a finite subcover — is usually taken as the definition
of compactness in the general setting, and the condition in Theorem 1.21b — that
every sequence in S has a subsequence that converges in S — is called sequential
compactness.

EXERCISES
1. Give an example of
a. a closed set S ⊂ R and a continuous function f : R → R such that f (S)
is not closed;
b. an open set U ⊂ R and a continuous function f : R → R such that f (U )
is not open.
2. a. Give an example of a bounded set S ⊂ R \ {0} and a real-valued function
f that is defined and continuous on R \ {0} such that f (S) is not bounded.
b. However, show that if f : Rn → Rm is continuous everywhere and S ⊂
Rn is bounded, then f (S) is bounded.
3. Show that an infinite set S ⊂ Rn is compact if and only if every infinite subset
of S has an accumulation point that lies in S. (See Exercises 6–7 in §1.4 and
Exercise 8 in §1.5.)
4. Suppose S ⊂ Rn is compact, f : S → R is continuous, and f (x) > 0 for
every x ∈ S. Show that there is a number c > 0 such that f (x) ≥ c for every
x ∈ S.
5. (A generalization of the nested interval theorem) Suppose {Sk } is a sequence
of nonempty compact subsets of Rn such that S1 ⊃ S2 ⊃ S3 ⊃ . . .. Show that
there is at least one point contained in all of the Sk ’s (that is, ⋂_{k=1}^{∞} Sk ≠ ∅).
(This can be done using either the Bolzano-Weierstrass theorem or the Heine-
Borel theorem. Can you find both proofs?)
6. The distance between two sets U, V ⊂ Rn is defined to be

d(U, V ) = inf{|x − y| : x ∈ U, y ∈ V }.

a. Show that d(U, V ) = 0 if either of the sets U, V contains a point in the
closure of the other one.
b. Show that if U is compact, V is closed, and U ∩ V = ∅, then d(U, V ) > 0.
c. Give an example of two closed sets U and V in R² that have no point in
common but satisfy d(U, V ) = 0.

1.7 Connectedness
A set in Rn is said to be connected if it is “all in one piece,” that is, if it is not the
union of two nonempty subsets that do not touch each other. The formal definition
is as follows: A set S ⊂ Rn is disconnected if it is the union of two nonempty
subsets S1 and S2 , neither of which intersects the closure of the other one; in this
case we shall call the pair (S1 , S2 ) a disconnection of S. The set S is connected
if it is not disconnected.

FIGURE 1.2: The sets S and T in Example 1.

EXAMPLE 1. Let

\[ S_1 = \{(x, y) : (x + 1)^2 + y^2 < 1\}, \qquad S_2 = \{(x, y) : (x - 1)^2 + y^2 < 1\}, \]
\[ \overline{S}_2 = \{(x, y) : (x - 1)^2 + y^2 \le 1\}. \]

Then the set S = S1 ∪ S2 is disconnected, for the only point common to the
closures of S1 and S2 is (0, 0), which belongs to neither S1 nor S2 . However,
the set T = S1 ∪ S̄2 is connected, for (0, 0) belongs both to S̄2 and the closure
of S1 ; this point “connects” the two pieces of T . See Figure 1.2.

The connected subsets of the real line are easy to describe.

1.25 Theorem. The connected subsets of R are precisely the intervals (open, half-
open, or closed; bounded or unbounded).

Proof. If S ⊂ R is not an interval, there exist a, b ∈ S and c ∉ S such that
a < c < b. Let S1 = S ∩ (−∞, c) and S2 = S ∩ (c, ∞). Then S = S1 ∪ S2 (since
c ∉ S), and S1 and S2 are nonempty since a ∈ S1 and b ∈ S2 . The closures of
S1 and S2 are contained in (−∞, c] and [c, ∞), so the only point where they can
intersect is c, which is not in either S1 or S2 . Thus S is disconnected.
Conversely, suppose S is an interval. We shall suppose that S is disconnected
and derive a contradiction.
We first consider the case where S is compact, say S = [a, b]. Suppose (S1 , S2 )
is a disconnection of S. By relabeling if necessary, we take S2 to be the set that
contains b. Let c = sup S1 . Then c belongs to the closure of S1 , so it cannot be in
S2 ; hence c ∈ S1 . In particular, c ≠ b. But then the interval (c, b] is included in
S2 , and c is in the closure of this interval; so c is in the closure of S2 and so cannot
belong to S1 . This contradiction shows that S must be connected.
Finally, suppose S is a noncompact interval and (S1 , S2 ) is a disconnection of
S. Pick a ∈ S1 and b ∈ S2 ; then [a, b] ⊂ S since S is an interval. But then
[a, b] = T1 ∪ T2 where T1 = [a, b] ∩ S1 and T2 = [a, b] ∩ S2 . The sets T1 and T2
are nonempty (a ∈ T1 and b ∈ T2 ), and they are contained in S1 and S2 , so neither
one can intersect the closure of the other. But this means that [a, b] is disconnected,
which we have just proved to be false. Therefore, S is connected.

The following result, a cousin of Theorem 1.22, gives the basic relation between
continuity and connectedness:

1.26 Theorem. Continuous functions map connected sets to connected sets. That
is, suppose f : S → Rm is continuous at every point of S and S is connected. Then
the set

f (S) = {f (x) : x ∈ S}

is also connected.

Proof. We proceed by contraposition; that is, we assume that f (S) is disconnected
and deduce that S is disconnected. Thus, suppose that (U1 , U2 ) is a disconnection
of f (S). Let

S1 = {x ∈ S : f (x) ∈ U1 },   S2 = {x ∈ S : f (x) ∈ U2 }.

Then S1 and S2 are nonempty, and their union is S. If there were a point x ∈ S1
belonging to the closure of S2 , x would be the limit of a sequence {xk } in S2 by
Theorem 1.14. But then f (x) ∈ U1 and f (xk ) ∈ U2 , so f (x) = lim f (xk ) would
be in the closure of U2 by Theorem 1.14 again. This is impossible; hence S1 does
not intersect the closure of S2 , and likewise, S2 does not intersect the closure of S1 .
Thus S = S1 ∪ S2 is disconnected.

1.27 Corollary (The Intermediate Value Theorem). Suppose f : S → R is continuous
at every point of S and V ⊂ S is connected. If a, b ∈ V and f (a) < t < f (b)
or f (b) < t < f (a), there is a point c ∈ V such that f (c) = t.

Proof. By Theorems 1.25 and 1.26, f (V ) is an interval. It contains f (a) and f (b)
and hence contains the entire interval between them.
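
The corollary is an existence statement, but when V is an interval [a, b] in R,
repeated bisection turns it into a procedure, in the spirit of the nested interval
theorem: each step keeps a half-interval on which f − t still changes sign. The
following Python sketch is an illustration under assumptions made here (the name
bisect_level, the tolerance, and the sample function f (x) = x³ − x with t = 1 are
all hypothetical choices).

```python
def bisect_level(f, a, b, t, tol=1e-12):
    """Approximate a point c in [a, b] with f(c) = t, assuming f is
    continuous on [a, b] and t lies between f(a) and f(b)."""
    fa = f(a) - t
    fb = f(b) - t
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("t must lie between f(a) and f(b)")
    while b - a > tol:
        mid = 0.5 * (a + b)
        fm = f(mid) - t
        if fm == 0:
            return mid
        if fa * fm < 0:
            b = mid            # sign change lies in [a, mid]
        else:
            a, fa = mid, fm    # sign change lies in [mid, b]
    return 0.5 * (a + b)

def f(x):
    return x ** 3 - x

# f(0) = 0 < 1 < 6 = f(2), so some c in [0, 2] has f(c) = 1.
c = bisect_level(f, 0.0, 2.0, 1.0)
print(c, f(c))
```

The intervals produced by the loop are nested and their lengths tend to zero, which
is exactly the situation of Theorem 1.17.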

There is another notion of connectedness that is important in many situations.
A set S ⊂ Rn is called arcwise connected (or pathwise connected) if any two
points in S can be joined by a continuous curve in S, that is, if for any a, b in S
there is a continuous map f : [0, 1] → Rn such that f (0) = a, f (1) = b, and
f (t) ∈ S for all t ∈ [0, 1].

FIGURE 1.3: The set defined in (1.29).

It is useful to observe that the relation of being joined by a continuous curve is
transitive; that is, if there is a continuous curve in S from a to b, and one from b
to c, then there is one from a to c. Namely, if f : [0, 1] → S and g : [0, 1] → S
are continuous maps with f (0) = a, f (1) = g(0) = b, and g(1) = c, we obtain a
continuous h : [0, 1] → S by gluing f and g together:

\[ h(t) = \begin{cases} f(2t) & \text{if } 0 \le t \le \tfrac{1}{2}, \\ g(2t - 1) & \text{if } \tfrac{1}{2} \le t \le 1. \end{cases} \]
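
The gluing construction translates directly into code. The sketch below is only an
illustration: the name glue and the two sample paths in R² (a horizontal segment
followed by a vertical one) are assumptions made here. Note that the two formulas
agree at t = 1/2 precisely because f (1) = g(0).

```python
def glue(f, g):
    """Given paths f, g on [0, 1] with f(1) == g(0), return the path h
    that runs through f on [0, 1/2] and through g on [1/2, 1]."""
    def h(t):
        if not 0.0 <= t <= 1.0:
            raise ValueError("t must lie in [0, 1]")
        if t <= 0.5:
            return f(2.0 * t)
        return g(2.0 * t - 1.0)
    return h

def f(t):
    return (t, 0.0)      # from a = (0, 0) to b = (1, 0)

def g(t):
    return (1.0, t)      # from b = (1, 0) to c = (1, 1)

h = glue(f, g)
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, h(t))
```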

The following results explain the relation between connectedness and arcwise
connectedness.

1.28 Theorem. If S ⊂ Rn is arcwise connected, then S is connected.

Proof. We shall assume that S is disconnected and show that it is not arcwise con-
nected. Accordingly, suppose (S1 , S2 ) is a disconnection of S. Pick a ∈ S1 and
b ∈ S2 ; we claim that there is no continuous g : [0, 1] → S such that g(0) = a and
g(1) = b. If there were, the set V = g([0, 1]) would be connected by Theorems
1.25 and 1.26. But this cannot be so: V is the union of V ∩ S1 and V ∩ S2 ; these
sets are nonempty since a ∈ V ∩ S1 and b ∈ V ∩ S2 , and neither of them intersects
the closure of the other. Hence S is not arcwise connected.

The converse of Theorem 1.28 is false: A set can be connected without being
arcwise connected. A typical example is
(1.29) S = {(x, y) : 0 < x ≤ 2 and y = sin(π/x)} ∪ {(0, y) : y ∈ [−1, 1]},

pictured in Figure 1.3. S consists of two pieces, the graph of sin(π/x) and the
vertical line segment. These two sets do not form a disconnection of S, as the line
segment is included in the closure of the graph, but a point on the line segment
cannot be connected to a point on the graph by a continuous curve. The details are
sketched in Exercise 11.
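
The oscillation that prevents arcwise connectedness can be seen numerically. At
x = 2/(4m + 1) we have π/x = 2πm + π/2, so sin(π/x) = 1, and at x = 2/(4m + 3)
we have sin(π/x) = −1; both families of points lie on the graph in (1.29) and
approach x = 0 as m grows. The Python sketch below merely tabulates these points;
the name peaks_and_troughs is a hypothetical choice.

```python
import math

def peaks_and_troughs(count):
    """Return x-values in (0, 2] where sin(pi/x) equals +1 and -1."""
    peaks = [2.0 / (4 * m + 1) for m in range(count)]
    troughs = [2.0 / (4 * m + 3) for m in range(count)]
    return peaks, troughs

peaks, troughs = peaks_and_troughs(1000)

for x in peaks[:3] + peaks[-1:]:
    print(x, math.sin(math.pi / x))     # values near +1
for x in troughs[:3] + troughs[-1:]:
    print(x, math.sin(math.pi / x))     # values near -1
```

Every point (0, y) with −1 ≤ y ≤ 1 is therefore a limit of points on the graph,
which is why the line segment lies in the closure of the graph.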
However, open connected sets are arcwise connected:

1.30 Theorem. If S ⊂ Rn is open and connected, then S is arcwise connected.

Proof. Fix a point a ∈ S. Let S1 be the set of points in S that can be joined to a
by a continuous curve in S, and let S2 be the set of points in S that cannot; thus S1
and S2 are disjoint and S = S1 ∪ S2 . We shall show that
a. if x ∈ S1 , then all points sufficiently close to x are in S1 ;
b. if x ∈ S is in the closure of S1 , then x ∈ S1 .
(a) shows that no point of S1 can be in the closure of S2 , and (b) shows that no
point in the closure of S1 can be in S2 . Thus (S1 , S2 ) will form a disconnection
of S, contrary to the assumption that S is connected, unless S2 is empty — which
means that S is arcwise connected.
To prove (a) and (b), we use the fact that S is open, so that if x ∈ S, there is
a ball B centered at x that is included in S. If x ∈ S1 , then every y ∈ B is also
in S1 , for y can be joined to a by first joining x to a and then joining y to x by
the straight line segment from x to y, which lies in B and hence in S. Similarly,
if x is in the closure of S1 , by Theorem 1.14 there is a sequence {xk } of points in
S1 that converges to x. We have xk ∈ B for k sufficiently large, so again, x can
be joined to a by joining xk to a and then joining x to xk by a line segment in B;
hence x ∈ S1 . This completes the proof.

EXERCISES

1. Show directly from the definition that the following sets are disconnected.
(That is, produce a disconnection for each of them.)
a. The hyperbola {(x, y) ∈ R² : x² − y² = 1}.
b. Any finite set in Rn with at least two elements.
c. {(x, y, z) ∈ R³ : xyz > 0}.

2. Show that the unit sphere {(x, y, z) : x² + y² + z² = 1} in R³ is arcwise
connected. Can you generalize your argument to show that the unit sphere in
Rn is arcwise connected for all n > 1?

3. Suppose I is an interval in R and f : I → R is continuous and one-to-one (i.e.,
f (x1 ) ≠ f (x2 ) unless x1 = x2 ). Show that f must be strictly increasing or
strictly decreasing on I.

4. Suppose S1 and S2 are connected sets in Rn that contain at least one point in
common. Show that S1 ∪ S2 is connected. Is it true that S1 ∩ S2 must be
connected?

5. Show that an open set in Rn is disconnected if and only if it is the union of two
disjoint nonempty open subsets.

6. Show that a closed set in Rn is disconnected if and only if it is the union of two
disjoint nonempty closed subsets.

7. Show that S ⊂ Rn is disconnected if and only if there is a continuous function
f : S → R such that f (S) consists of the two points 0 and 1.

8. Show that the closure of a connected set is connected.

9. Let S = {x : |x| = 1} be the unit sphere in Rn , and let f : S → R be a
continuous function. Assuming the fact that S is connected (see Exercise 2),
show that there must be a pair of diametrically opposite points on S at which f
assumes the same value. (Hint: Consider g(x) = f (x) − f (−x).)

10. Suppose S is a connected set in R² that contains (1, 3) and (4, −1). Show that
S contains at least one point on the line x = y. (Hint: Consider f (x, y) =
x − y.)

11. Let S ⊂ R² be given by (1.29).
a. Show that S is connected. (Hint: The curve y = sin(π/x), x > 0, is
arcwise connected. Use Exercise 8.)
b. Show that S is not arcwise connected. (Suppose f : [0, 1] → S is con-
tinuous and satisfies f (0) = (2, 0) and f (1) = (0, 1). Show that the x-
coordinate of f (t) must assume all values between 2 and 0 as t ranges from
0 to 1, and conclude that for each positive integer k there exists tk ∈ [0, 1]
such that f (tk ) = (1/2k, 0). By passing to a convergent subsequence, you
can suppose that t0 = lim_{k→∞} tk exists. Show that the y-coordinate of
f (t) must assume all values between −1 and 1 as t ranges from tk to tk+1 ,
and derive a contradiction.)

1.8 Uniform Continuity


Suppose S is a subset of Rn . We recall that a function f : S → Rm is said to be
continuous on S if, for each x ∈ S, f (y) can be made as close as we wish to f (x)
by taking y sufficiently close to x. In general, the meaning of “sufficiently close”
will depend on x: If f is nearly constant near x, we may be able to move quite
a distance away from x without changing the value of f much, but if f is rapidly
varying near x, we will need to stay close to x to ensure that the value of f remains
close to f (x). For some purposes, however, it is important to have some control
over the rate at which f (y) approaches f (x) as y approaches x that is independent
of x. Functions for which this is possible are called uniformly continuous.
More precisely, a function f : S → Rm is said to be uniformly continuous on
S if for every ϵ > 0 there is a δ > 0 so that

|f (x) − f (y)| < ϵ whenever x, y ∈ S and |x − y| < δ.

The crucial point is that for simple continuity the number δ may depend on x, but
for uniform continuity it does not. This is a rather subtle point, and the reader
should not be discouraged if its significance is not immediately clear; some very
eminent mathematicians of the past also had trouble with it!
Some readers may find it enlightening to see these conditions rewritten in a
symbolic way that makes them as concise as possible. We employ the logical sym-
bols ∀ and ∃, which mean “for all” and “there exists,” respectively. With this un-
derstanding, the condition for f to be continuous on S is that

(1.31) ∀ϵ > 0 ∀x ∈ S ∃δ > 0 : ∀y ∈ S |x − y| < δ =⇒ |f (x) − f (y)| < ϵ,

whereas the condition for f to be uniformly continuous on S is that

(1.32) ∀ϵ > 0 ∃δ > 0 : ∀x, y ∈ S |x − y| < δ =⇒ |f (x) − f (y)| < ϵ.

The difference between (1.31) and (1.32) is that the “∀x” has been interchanged
with the “∃δ,” so that in (1.31) the δ is allowed to depend on x, whereas in (1.32)
the same δ must work for every x.

EXAMPLE 1. The function f (x) = sin x is uniformly continuous on R. Indeed,
since |f ′ (x)| = | cos x| ≤ 1 for all x, the mean value theorem (reviewed in
§2.1) shows that |f (x) − f (y)| ≤ |x − y| for all x, y. Thus, we can take δ = ϵ,
independent of x: If |x − y| < ϵ, then |f (x) − f (y)| < ϵ.
EXAMPLE 2. The function g(x) = x² is not uniformly continuous on R,
essentially because the slope of the graph at x = a increases without bound
as a → ∞. To be more precise, let us suppose that a > 0 and h > 0. Since
g(a + h) − g(a) = 2ah + h² > 2ah, there is no hope of getting |g(a + h) − g(a)| < ϵ
unless h < ϵ/(2a). Thus, the allowable δ in (1.31) at x = a must be smaller than
ϵ/(2a), which gets smaller as a gets larger. On the other hand, g is uniformly
continuous on every bounded interval, because on such an interval there is a
finite upper bound for |g ′ |, and the mean value theorem can be applied as in
Example 1.
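
The contrast between Examples 1 and 2 can be seen numerically. The following
Python sketch is only an illustration, not part of the argument; the helper name
largest_delta_for_square is our own. It assumes a ≥ 0 and uses the fact that
|y² − a²| < ϵ for all y with |y − a| < δ exactly when 2aδ + δ² ≤ ϵ.

```python
import math

def largest_delta_for_square(a: float, eps: float) -> float:
    """Largest delta with |y**2 - a**2| < eps whenever |y - a| < delta (a >= 0)."""
    # Positive root of 2*a*delta + delta**2 = eps, i.e. sqrt(a*a + eps) - a,
    # rewritten to avoid cancellation when a is large.
    return eps / (math.sqrt(a * a + eps) + a)

eps = 0.1
for a in (0.0, 1.0, 10.0, 100.0, 1000.0):
    print(f"a = {a:7.1f}   delta = {largest_delta_for_square(a, eps):.3e}")
```

The printed values of δ shrink roughly like ϵ/(2a) as a grows, so no single δ
serves every point of R, whereas for sin x the choice δ = ϵ works everywhere.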

Example 2 exemplifies the typical situation, in the following sense. On a set
that is not bounded or not closed, things can get worse and worse as one goes off to
infinity or to the boundary of the set; but on a compact set such pathologies cannot
occur.

1.33 Theorem. Suppose S ⊂ Rn and f : S → Rm is continuous at every point of
S. If S is compact, then f is uniformly continuous on S.

Proof. Suppose f is not uniformly continuous on S; we shall derive a contradiction.
The negation of the uniform continuity condition (1.32) is that

∃ϵ > 0 ∀δ > 0 ∃x, y ∈ S : |x − y| < δ and |f (x) − f (y)| ≥ ϵ.

Taking δ = 1, 1/2, 1/3, . . ., we see that for each positive integer k there exist xk , yk ∈
S such that |xk − yk | < 1/k and |f (xk ) − f (yk )| ≥ ϵ. By the Bolzano-Weierstrass
theorem, by passing to a subsequence we may assume that {xk } converges, say to
a; since S is closed, a ∈ S. Since |xk − yk | → 0, we also have yk → a. But then,
by the continuity of f at a, f (xk ) − f (yk ) → f (a) − f (a) = 0, contradicting the
assertion that |f (xk ) − f (yk )| ≥ ϵ.

It is remarkable that continuity is the only condition that must be imposed on f
in this theorem. In particular, in contrast to what Examples 1 and 2 might suggest,
no conditions on the derivatives of f enter the picture, even their existence! See
Exercise 2.

EXERCISES

1. A function f : S → Rm that satisfies

|f (x) − f (y)| ≤ C|x − y|^λ for all x, y ∈ S,

where C and λ are positive constants, is said to be Hölder continuous on S
(with exponent λ). Show that if f is Hölder continuous on S, then f is uniformly
continuous on S.
2. Suppose 0 < λ < 1.
a. Show that (a + b)^λ < a^λ + b^λ for all a, b > 0. (Hint: Since λ − 1 < 0, for
t > 0 we have (a + t)^(λ−1) < t^(λ−1). Integrate both sides from 0 to b.)
b. Let f_λ(x) = |x|^λ. Show that f_λ satisfies the condition in Exercise 1, with
S = R and C = 1, and hence conclude that f_λ is uniformly continuous on
R. (Note that f_λ is unbounded on R and that the slope of its graph becomes
infinite at the origin.)
3. Suppose that f : S → Rm and g : S → Rm are both uniformly continuous on
S. Show that f + g is uniformly continuous on S.
4. Show that if f : S → Rm is uniformly continuous on S and {xk } is a Cauchy
sequence in S, then {f (xk )} is also a Cauchy sequence. On the other hand, give
an example of a Cauchy sequence {xk } in (0, ∞) and a continuous function
f : (0, ∞) → R (of necessity, not uniformly continuous) such that {f (xk )} is
not Cauchy.
5. Show that if f : S → Rm is uniformly continuous and S is bounded, then f (S)
is bounded.
Chapter 2

DIFFERENTIAL CALCULUS

The main theme of this chapter is the theory and applications of differential cal-
culus for functions of several variables. The reader is expected to be familiar with
differential calculus for functions of one variable. However, we offer a review of
the one-variable theory that contains a few features that the reader may not have
seen before, and the one-variable theory makes another appearance in the section
on Taylor’s theorem.

2.1 Differentiability in One Variable


We begin with an approach to the notion of derivative that is a bit different from
the one usually found in elementary calculus books. This point of view is very
useful in more advanced work, and it is the one that leads to the proper notion of
differentiability for functions of several variables.
The basic idea is that a function f : R → R is differentiable at x = a if it is
approximately linear near x = a. Geometrically, this means that the graph of f
has a tangent line at x = a. Analytically, it means that there is a linear function
l(x) = mx + b satisfying the following two conditions:
• l(a) = f (a), so that b = f (a) − ma and hence l(x) = f (a) + m(x − a);
• the difference f (x) − l(x) tends to zero at a faster rate than x − a as x → a,
that is,
\[ \frac{f(x) - l(x)}{x - a} \to 0 \quad \text{as } x \to a. \]
It will be convenient to denote the increment x − a by h, so that

f (x) − l(x) = f (a + h) − f (a) − mh.

We think of this difference as a function of h and denote it by E(h); thus E(h) is
the error when we approximate f (a + h) by the linear function f (a) + mh.
We proceed to the formal definition. Suppose f is a real-valued function de-
fined on some open interval in R containing the point a. We say that f is differen-
tiable at a if there is a number m such that
\[ f(a + h) = f(a) + mh + E(h), \quad \text{where } \lim_{h \to 0} \frac{E(h)}{h} = 0; \tag{2.1} \]

in other words, if f (a + h) is the sum of the linear function f (a) + mh and an error
term that tends to zero more rapidly than h as h → 0. In this case we have

\[ m = \frac{f(a + h) - f(a) - E(h)}{h} = \frac{f(a + h) - f(a)}{h} - \frac{E(h)}{h}. \]
As h → 0 the last term on the right vanishes, so we see that

\[ m = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}. \tag{2.2} \]
Thus the number m is uniquely determined, and it is the derivative of f at a as
usually defined in elementary calculus books, denoted by f ′ (a). Conversely, if the
limit m in (2.2) exists, then (2.1) holds with E(h) = f (a + h) − f (a) − mh. Thus,
our definition of differentiability is equivalent to the usual one; it simply puts more
emphasis on the idea of linear approximation.
Observe that if E(h)/h vanishes as h → 0, then so does E(h) itself and hence
so does f (a + h) − f (a). That is, differentiability at a implies continuity at a.
It is often convenient to express the relation lim_{h→0} E(h)/h = 0 by saying that
“E(h) is o(h)” (pronounced “little oh of h”), meaning that E(h) is of smaller order
of magnitude than h. Thus the differentiability of f at x = a can be expressed by
saying that f (a + h) is the sum of a linear function of h and an error term that is
o(h).
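
To see the o(h) behavior concretely, one can tabulate E(h)/h for a specific
function. The Python sketch below is an illustration under our own choices:
f is the exponential function, a = 1, and the helper name linearization_error is
hypothetical. The ratio E(h)/h should shrink roughly in proportion to h.

```python
import math

def linearization_error(f, m, a, h):
    """E(h) = f(a + h) - f(a) - m*h, the error of the linear approximation."""
    return f(a + h) - f(a) - m * h

a = 1.0
m = math.exp(a)  # the derivative of exp at a, supplied by hand
for k in range(1, 6):
    h = 10.0 ** (-k)
    ratio = linearization_error(math.exp, m, a, h) / h
    print(f"h = {h:.0e}   E(h)/h = {ratio:.3e}")
```

For much smaller h the subtraction in E(h) is dominated by rounding error, so
the table should not be extended too far.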
The standard rules for differentiation are easily derived from (2.1). We illustrate
the ideas by working out the product rule.
The Product Rule: Suppose f and g are differentiable at x = a. Then

f (a + h) = f (a) + f ′ (a)h + E1 (h), g(a + h) = g(a) + g ′ (a)h + E2 (h),

where E1 (h) and E2 (h) are o(h). Multiplying these equations together yields
\[ f(a + h)g(a + h) = f(a)g(a) + \bigl[ f'(a)g(a) + f(a)g'(a) \bigr] h + E_3(h), \tag{2.3} \]
where
\[ E_3(h) = \bigl[ f(a) + f'(a)h + E_1(h) \bigr] E_2(h) + E_1(h) \bigl[ g(a) + g'(a)h \bigr] + f'(a)g'(a)h^2. \]

Here E3 (h) is o(h): the bracketed factors remain bounded as h → 0 while E1 (h)
and E2 (h) are o(h), and h² is itself o(h). Hence (2.3) is of the form (2.1)
with f replaced by f g and m = f ′ (a)g(a) + f (a)g′ (a). In other words, f g is
differentiable at a and (f g)′ (a) = f ′ (a)g(a) + f (a)g′ (a).
The chain rule can also be derived in this way; we shall do so, in a more general
setting, in §2.3.
We can also define “one-sided derivatives” of a function f at a point a. To
wit, the left-hand derivative f−′ (a) and the right-hand derivative f+′ (a) are the
one-sided limits
\[ f'_{\pm}(a) = \lim_{h \to 0\pm} \frac{f(a + h) - f(a)}{h}. \tag{2.4} \]
Clearly f is differentiable at a if and only if its left-hand and right-hand derivatives
at a exist and are equal. These notions are particularly useful in two situations: (i)
in discussing functions whose graphs have “corners” such as f (x) = |x|, which has
one-sided derivatives at the origin although it is not differentiable there, and (ii) in
discussing functions whose domain is a closed interval [a, b], where the one-sided
derivatives f+′ (a) and f−′ (b) may be significant.
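
As a small illustration of situation (i), the following Python sketch evaluates
the one-sided difference quotients of f (x) = |x| at the origin. The helper name
one_sided_quotients is our own; the point is only that the two quotients stay
at −1 and +1 however small the increment is.

```python
def one_sided_quotients(f, a, h):
    """Difference quotients of f at a with increments -h and +h (h > 0)."""
    left = (f(a - h) - f(a)) / (-h)
    right = (f(a + h) - f(a)) / h
    return left, right

for h in (0.1, 0.01, 0.001):
    print(h, one_sided_quotients(abs, 0.0, h))
```

Since the left-hand and right-hand quotients do not approach a common value,
|x| has both one-sided derivatives at 0 but is not differentiable there.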

The Mean Value Theorem. The definition of the derivative involves passing
from the “local” information given by the values of f (x) for x near a to the “in-
finitesimal” information f ′ (a), which (intuitively speaking) gives the infinitesimal
change in f corresponding to an infinitesimal change in x. To reverse the process
and pass from “infinitesimal” information to “local” information — that is, to ex-
tract information about f from a knowledge of f ′ — the principal tool is the mean
value theorem, one of the most important theoretical results of elementary calculus.
The derivation begins with the following result, which is important in its own right.

2.5 Proposition. Suppose f is defined on an open interval I and a ∈ I. If f has
a local maximum or minimum at the point a and f is differentiable at a, then
f ′ (a) = 0.

Proof. Suppose f has a local minimum at a; the argument at a maximum is similar.
In the difference quotient [f (a + h) − f (a)]/h, the numerator is ≥ 0 for all h near
0 since f (a + h) ≥ f (a), so the quotient has the same sign as h. It follows that the
one-sided limits as h → 0 from the left and right must be ≤ 0 and ≥ 0, respectively;
since they are both equal to f ′ (a), the only possibility is that f ′ (a) = 0.

2.6 Lemma (Rolle’s Theorem). Suppose f is continuous on [a, b] and differentiable
on (a, b). If f (a) = f (b), there is at least one point c ∈ (a, b) such that f ′ (c) = 0.

Proof. By the extreme value theorem (1.23), f assumes a maximum value and a
minimum value on [a, b]. If the maximum and minimum each occur at an endpoint,
then f is constant on [a, b] since the values at the endpoints are equal, so f ′ (x) = 0
for all x ∈ (a, b). Otherwise, at least one of them occurs at some interior point
c ∈ (a, b), and then f ′ (c) = 0 by Proposition 2.5.

2.7 Theorem (Mean Value Theorem I). Suppose f is continuous on [a, b] and dif-
ferentiable on (a, b). There is at least one point c ∈ (a, b) such that

\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]
Proof. The straight line joining (a, f (a)) to (b, f (b)) is the graph of the function

\[ l(x) = f(a) + \frac{f(b) - f(a)}{b - a}\,(x - a), \]
and the assertion is that there is a point c ∈ (a, b) where the slope of the graph
y = f (x) is the same as the slope of this line, in other words, where the derivative
of the difference g(x) = f (x) − l(x) is zero. But f and l have the same values at
a and b, so g(a) = g(b) = 0, and the conclusion then follows by applying Rolle’s
theorem to g.
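
For a particular f whose derivative is known and continuous, a point c as in
Theorem 2.7 can sometimes be located numerically. The Python sketch below does
this for the sample function f (x) = x³ on [0, 2] (our choice for illustration)
by bisection on f ′ (c) minus the slope of the chord. The helper name bisect_root
is hypothetical, and the method relies on the continuity of f ′, which the
theorem itself does not assume.

```python
import math

def bisect_root(phi, lo, hi, tol=1e-12):
    """Find a zero of phi in [lo, hi] by bisection, assuming a sign change."""
    if phi(lo) * phi(hi) > 0:
        raise ValueError("phi must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(lo) * phi(mid) <= 0:
            hi = mid          # a zero lies in [lo, mid]
        else:
            lo = mid          # a zero lies in [mid, hi]
    return 0.5 * (lo + hi)

a, b = 0.0, 2.0
f = lambda x: x ** 3
df = lambda x: 3 * x ** 2            # derivative supplied by hand
slope = (f(b) - f(a)) / (b - a)      # slope of the chord, here 4
c = bisect_root(lambda x: df(x) - slope, a, b)
print(c, 2 / math.sqrt(3))           # numerical c and the exact value 2/sqrt(3)
```

Here the equation 3c² = 4 can of course be solved by hand; the sketch merely
shows what "finding c" would mean in a case where it is feasible.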

The mean value theorem is nonconstructive; that is, although it asserts the ex-
istence of a certain point c ∈ (a, b), it gives no clue as to how to find that point.
Students often find this perplexing at first, but in fact the whole power of the mean
value theorem comes from situations where there is no need to know precisely
where c is. In many applications, one has information about the behavior of f ′ on
some interval, and one deduces information about f on that same interval. The
following theorem comprises the most important of them.
We say that a function f is increasing (resp. strictly increasing) on an interval
I if f (a) ≤ f (b) (resp. f (a) < f (b)) whenever a, b ∈ I and a < b; similarly for
decreasing and strictly decreasing.

2.8 Theorem. Suppose f is differentiable on the open interval I.
a. If |f ′ (x)| ≤ C for all x ∈ I, then |f (b) − f (a)| ≤ C|b − a| for all a, b ∈ I.
b. If f ′ (x) = 0 for all x ∈ I, then f is constant on I.
c. If f ′ (x) ≥ 0 (resp. f ′ (x) > 0, f ′ (x) ≤ 0, or f ′ (x) < 0) for all x ∈ I, then f is
increasing (resp. strictly increasing, decreasing, or strictly decreasing) on I.

Proof. Given a, b ∈ I with a ≠ b, the mean value theorem gives f (b) − f (a) =
f ′ (c)(b − a) for some c between a and b. In (a)
or (b) we know that |f ′ (c)| ≤ C or f ′ (c) = 0, respectively, and we conclude that
|f (b) − f (a)| ≤ C|b − a| or f (b) = f (a). In (c), if we know that f ′ (c) ≥ 0, we
conclude that f (b) − f (a) ≥ 0 for b > a, and similarly for the other cases.

In case the reader feels that we are belaboring the obvious here, we should point
out that the mere differentiability of f at a single point a gives less information
about the behavior of f near x = a than we would like. For example, if f ′ (a) > 0,
it does not follow that f is increasing in some neighborhood of a; see Exercises 3
and 4.
The mean value theorem admits the following important generalization, of
which we shall present some applications below.

2.9 Theorem (Mean Value Theorem II). Suppose that f and g are continuous on
[a, b] and differentiable on (a, b), and g′ (x) ̸= 0 for all x ∈ (a, b). Then there exists
c ∈ (a, b) such that
\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}. \]
Proof. Let

h(x) = [f (b) − f (a)][g(x) − g(a)] − [g(b) − g(a)][f (x) − f (a)].

Then h is continuous on [a, b] and differentiable on (a, b), and h(a) = h(b) = 0.
By Rolle’s theorem, there is a point c ∈ (a, b) such that

0 = h′ (c) = [f (b) − f (a)]g′ (c) − [g(b) − g(a)]f ′ (c).

Since g ′ is never 0 on (a, b), we have g′ (c) ̸= 0 and also g(b) − g(a) ̸= 0 (by the
mean value theorem, since g(b) − g(a) = g′ (c̃)(b − a) for some c̃ ∈ (a, b)). Hence
we can divide by both these quantities to obtain the desired result.

L’Hôpital’s Rule. Often one is faced with the evaluation of limits of quotients
f (x)/g(x) where f and g both tend to zero or infinity. The collection of related
results that go under the name of “l’Hôpital’s rule” enable one to evaluate such
limits in many cases by examining the quotient of the derivatives, f ′ (x)/g ′ (x).
The cases involving the indeterminate form 0/0 can be summarized as follows.

2.10 Theorem (L’Hôpital’s Rule I). Suppose f and g are differentiable functions
on (a, b) and
\[ \lim_{x \to a+} f(x) = \lim_{x \to a+} g(x) = 0. \]
If g′ never vanishes on (a, b) and the limit
\[ \lim_{x \to a+} \frac{f'(x)}{g'(x)} = L \]
exists, then g never vanishes on (a, b) and
\[ \lim_{x \to a+} \frac{f(x)}{g(x)} = L. \]
The same result holds for
• the left-hand limit lim_{x→a−}, if f and g are differentiable on an interval (d, a),
• the two-sided limit lim_{x→a}, if f and g are differentiable on intervals (d, a) and
(a, b), and
• the limit lim_{x→∞} or lim_{x→−∞}, if f and g are differentiable on an interval
(b, ∞) or (−∞, b).
Proof. If we (re)define f (a) and g(a) to be 0, then f and g are continuous on the
interval [a, x] for x < b. By Theorem 2.9, for each x ∈ (a, b) there exists c ∈ (a, x)
(depending on x) such that
\[ \frac{f(x)}{g(x)} = \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f'(c)}{g'(c)}. \]
Since c ∈ (a, x), c approaches a+ as x does, so
\[ \lim_{x \to a+} \frac{f(x)}{g(x)} = \lim_{c \to a+} \frac{f'(c)}{g'(c)} = L. \]

The proof for left-hand limits is similar, and the case of two-sided limits is obtained
by combining right-hand and left-hand limits. Finally, for the case a = ±∞, we
set y = 1/x and consider the functions F (y) = f (1/y) and G(y) = g(1/y).
Since F ′ (y) = −f ′ (1/y)/y² and G′ (y) = −g′ (1/y)/y², we have F ′ (y)/G′ (y) =
f ′ (1/y)/g′ (1/y), so by the results just proved,
\[ \lim_{x \to \pm\infty} \frac{f(x)}{g(x)} = \lim_{y \to 0\pm} \frac{F(y)}{G(y)} = \lim_{y \to 0\pm} \frac{F'(y)}{G'(y)} = \lim_{x \to \pm\infty} \frac{f'(x)}{g'(x)}. \]

Under the conditions of Theorem 2.10, it may well happen that f ′ (x) and g ′ (x)
tend to zero also, so that the limit of f ′ (x)/g ′ (x) cannot be evaluated immediately.
In this case we can apply Theorem 2.10 again to evaluate the limit by examining
f ′′ (x)/g ′′ (x). More generally, if the functions f, f ′, . . . , f^(k−1), g, g ′, . . . , g^(k−1)
all tend to zero as x tends to a+ or a− or ±∞, but f^(k)(x)/g^(k)(x) → L, then
f (x)/g(x) → L.

EXAMPLE 1. Let f (x) = 2x − sin 2x, g(x) = x² sin x, a = 0. Then f , g, and
their first two derivatives vanish at x = a, but the third derivatives do not, so
\[
\begin{aligned}
\lim_{x \to 0} \frac{2x - \sin 2x}{x^2 \sin x}
&= \lim_{x \to 0} \frac{2 - 2\cos 2x}{2x \sin x + x^2 \cos x}
 = \lim_{x \to 0} \frac{4 \sin 2x}{(2 - x^2) \sin x + 4x \cos x} \\
&= \lim_{x \to 0} \frac{8 \cos 2x}{(6 - x^2) \cos x - 6x \sin x} = \frac{4}{3}.
\end{aligned}
\]
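
The value 4/3 can be checked by evaluating the original quotient at a few small
values of x. The Python sketch below is only a sanity check, not a proof; the
function name quotient and the sample points are our own choices.

```python
import math

def quotient(x):
    """(2x - sin 2x) / (x**2 * sin x), defined for x != 0."""
    return (2 * x - math.sin(2 * x)) / (x * x * math.sin(x))

for x in (0.5, 0.1, 0.01):
    print(f"x = {x:<5}  quotient = {quotient(x):.8f}")
print("4/3 =", 4 / 3)
```

For much smaller x the numerator 2x − sin 2x is computed with heavy
cancellation, so the printed values eventually stop improving; this is a
limitation of floating-point arithmetic, not of the theorem.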
The corresponding result for limits of the form ∞/∞ is also true.
2.11 Theorem (L’Hôpital’s Rule II). Theorem 2.10 remains valid when the hypoth-
esis that lim f (x) = lim g(x) = 0 (as x → a+, x → a−, etc.) is replaced by the
hypothesis that lim |f (x)| = lim |g(x)| = ∞.
Proof. We consider the case of left-hand limits as x → a−; the other cases follow
as in Theorem 2.10.
Given ϵ > 0, we wish to show that |[f (x)/g(x)] − L| < ϵ provided that x is
sufficiently close to a on the left. Since f ′ (x)/g′ (x) → L and |g(x)| → ∞, we can
choose x0 < a so that
\[ \left| \frac{f'(x)}{g'(x)} - L \right| < \frac{\epsilon}{2} \quad \text{and} \quad g(x) \ne 0 \quad \text{for } x_0 < x < a. \]
Moreover, by Theorem 2.9, if x0 < x < a we have
\[ \frac{f(x) - f(x_0)}{g(x) - g(x_0)} = \frac{f'(c)}{g'(c)} \quad \text{for some } c \in (x_0, x), \]
and hence, since x0 < c < a,
\[ \left| \frac{f(x) - f(x_0)}{g(x) - g(x_0)} - L \right| < \frac{\epsilon}{2} \quad \text{for } x_0 < x < a. \]
Next, division of top and bottom by g(x) yields
\[ \frac{f(x) - f(x_0)}{g(x) - g(x_0)} = \frac{\dfrac{f(x)}{g(x)} - \dfrac{f(x_0)}{g(x)}}{1 - \dfrac{g(x_0)}{g(x)}}. \]
Since |g(x)| → ∞ as x → a, the quotients f (x0 )/g(x) and g(x0 )/g(x) can be
made as close to zero as we please by taking x sufficiently close to a. It follows
that for x sufficiently close to a we have
\[ \left| \frac{f(x) - f(x_0)}{g(x) - g(x_0)} - \frac{f(x)}{g(x)} \right| < \frac{\epsilon}{2}, \]
and hence, by the preceding estimate,
\[ \left| \frac{f(x)}{g(x)} - L \right| < \epsilon, \]
which is what we needed to show.

The following special cases of Theorem 2.11 are of fundamental importance.

2.12 Corollary. For any a > 0 we have

\[ \lim_{x \to +\infty} \frac{x^a}{e^x} = \lim_{x \to +\infty} \frac{\log x}{x^a} = \lim_{x \to 0+} \frac{\log x}{x^{-a}} = 0. \]

That is, the exponential function e^x grows more rapidly than any power of x as
x → +∞, whereas | log x| grows more slowly than any positive power of x as
x → +∞ and more slowly than any negative power of x as x → 0+.

Proof. For the first limit, let k be the smallest integer that is ≥ a. A k-fold
application of Theorem 2.11 yields
\[ \lim_{x \to +\infty} \frac{x^a}{e^x} = \lim_{x \to +\infty} \frac{a(a - 1) \cdots (a - k + 1)\,x^{a-k}}{e^x}, \]
and the latter limit is zero because a − k ≤ 0. For the other two limits, a single
application of Theorem 2.11 suffices:
\[ \lim_{x \to +\infty} \frac{\log x}{x^a} = \lim_{x \to +\infty} \frac{1}{a x^a} = 0, \qquad \lim_{x \to 0+} \frac{\log x}{x^{-a}} = \lim_{x \to 0+} \frac{x^a}{-a} = 0. \]

By raising the quantities in Corollary 2.12 to a positive power b and replacing
a by a/b, we obtain the more general formulas
\[ \lim_{x \to +\infty} \frac{x^a}{e^{bx}} = \lim_{x \to +\infty} \frac{(\log x)^b}{x^a} = \lim_{x \to 0+} \frac{|\log x|^b}{x^{-a}} = 0 \qquad (a, b > 0). \tag{2.13} \]
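
A quick numerical look at two of these limits may help fix the orders of
magnitude. The Python sketch below uses exponents and sample points of our own
choosing (a = 5 for the power against e^x, a = 1/2 for the logarithm); the
function names are hypothetical.

```python
import math

def power_over_exp(x, a):
    """x**a / e**x, computed via logarithms to avoid overflow."""
    return math.exp(a * math.log(x) - x)

def log_over_power(x, a):
    """log(x) / x**a."""
    return math.log(x) / x ** a

for x in (10.0, 50.0, 100.0):
    print(f"x = {x:5.0f}  x^5/e^x = {power_over_exp(x, 5):.3e}  "
          f"log x / x^0.5 = {log_over_power(x, 0.5):.3e}")
```

The first column collapses very quickly, while the second decreases only
slowly, which is consistent with the statement that log x is dominated by every
positive power of x but only barely so for small exponents.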

Vector-Valued Functions. The differential calculus generalizes easily to
functions of a real variable with values in R^n rather than R. If f = (f1 , . . . , fn ) is such
a function, its derivative at the point a is defined to be
\[ \mathbf{f}'(a) = \lim_{h \to 0} \frac{\mathbf{f}(a + h) - \mathbf{f}(a)}{h}. \]
The jth component of the difference quotient on the right is h^(−1) [fj (a+h)−fj (a)].
It follows that f is differentiable if and only if each of its component functions fj
is differentiable, and that differentiation is simply performed componentwise:
\[ \mathbf{f}'(a) = \bigl( f_1'(a), \ldots, f_n'(a) \bigr). \]
The usual rules of differentiation generalize easily to this situation. In particular,
there are two forms of the product rule: one for the product of a scalar function ϕ
and a vector function f , and one for the dot product of two vector functions f and g:
(ϕf )′ = ϕ′ f + ϕf ′ , (f · g)′ = f ′ · g + f · g′ .
The first of these is just the ordinary product rule applied to each component ϕfj
of ϕf , and the second one is almost as easy (Exercise 8). Similarly, when n = 3
we have the product rule for cross products:
(f × g)′ = f ′ × g + f × g′ .
(The only point that needs attention here is that the factors f and g must be in the
same order in all three products.)
The most common geometric interpretation of a function f : R → Rn (n >
1) is as the parametric representation of a curve in Rn . That is, the independent
variable t is interpreted as time, and f (t) is the position of a particle moving in
Rn at time t that traces out a curve as t varies. In this setting, the derivative f ′ (t)
represents the velocity of the particle at time t.
Of particular importance are the straight lines in Rn . If a, c ∈ Rn and c ̸= 0,
the line through a in the direction parallel to the vector c is represented parametri-
cally by l(t) = a + tc. In particular, for the line passing through two points a and
b we have c = b − a, and the line is given by l(t) = a + t(b − a); the line segment
from a to b is obtained by restricting t to the interval [0, 1].
If f : R → Rn gives a parametric representation of a curve in Rn and f ′ (a) ̸= 0,
the function l(t) = f (a) + tf ′ (a) gives a parametric representation of the tangent
line to the curve at the point f (a). (If f ′ (a) = 0, the curve may not have a tangent
line at f (a). For example, if f (t) = (t³, |t|³), then f ′ (0) = (0, 0), but the curve in
question is the graph y = |x|.) We shall discuss these matters more thoroughly in
Chapter 3.
It should be pointed out that the mean value theorem is not valid for vector-
valued functions. For example, the function f (t) = (cos t, sin t) satisfies f (0) =
f (2π), but f ′ (t) = (− sin t, cos t), so there is no point t where f ′ (t) = 0. However,
some of the corollaries of the mean value theorem remain valid. In particular, if
|f ′ (t)| ≤ M for all t ∈ [a, b], then
|f (b) − f (a)| ≤ M |b − a|.
We shall prove this for the more general case of functions of several variables in
§2.10.
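
The circle example above is easy to verify numerically. In the Python sketch
below (function names f, df, and norm are our own), the chord f (2π) − f (0) is
zero up to rounding, yet the speed |f ′ (t)| equals 1 at every sampled t, so no
point t can satisfy f ′ (t) = 0; the inequality |f (b) − f (a)| ≤ M |b − a| with
M = 1 nevertheless holds.

```python
import math

def f(t):
    """The curve f(t) = (cos t, sin t)."""
    return (math.cos(t), math.sin(t))

def df(t):
    """Its derivative f'(t) = (-sin t, cos t), taken componentwise."""
    return (-math.sin(t), math.cos(t))

def norm(v):
    """Euclidean length of a vector in the plane."""
    return math.hypot(v[0], v[1])

a, b = 0.0, 2 * math.pi
chord = norm((f(b)[0] - f(a)[0], f(b)[1] - f(a)[1]))   # |f(b) - f(a)|
speeds = [norm(df(a + (b - a) * k / 1000)) for k in range(1001)]
print(chord, min(speeds), max(speeds))
```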

EXERCISES

1. Suppose that f is differentiable on the interval I and that f ′ (x) > 0 for all
x ∈ I except for finitely many points at which f ′ (x) = 0. Show that f is
strictly increasing on I.
2. Define the function f by f (x) = x² sin(1/x) if x ̸= 0 and f (0) = 0. Show that
f is differentiable at every x ∈ R, including x = 0, but that f ′ is discontinuous
at x = 0. (Calculating f ′ (x) for x ̸= 0 is easy; to calculate f ′ (0) you need to
go back to the definition of derivative.)
3. Let f be the function in Exercise 2, and let g(x) = f (x) + (1/2)x. Show that
g′ (0) > 0 but that there is no neighborhood of 0 on which g is increasing.
(More precisely, every interval containing 0 has subintervals on which g is
decreasing.)
4. Define the function h by h(x) = x² if x is rational, h(x) = 0 if x is irrational.
Show that h is differentiable at x = 0, even though it is discontinuous at every
other point.
5. Suppose that f is continuous on [a, b] and differentiable on (a, b), and that the
right-hand limit L = limx→a+ f ′ (x) exists. Show that the right-hand derivative
f+′ (a) exists and equals L. (Hint: Consider the difference quotients defining
f+′ (a) and use the mean value theorem.) Of course, the analogous result for
left-hand limits at b also holds.
6. Suppose that f is three times differentiable on an interval containing a. Show
that
\[ \lim_{h \to 0} \frac{f(a + 2h) - 2f(a + h) + f(a)}{h^2} = f''(a), \]
\[ \lim_{h \to 0} \frac{f(a + 3h) - 3f(a + 2h) + 3f(a + h) - f(a)}{h^3} = f^{(3)}(a). \]
Can you find the generalization to higher derivatives?
7. Show that for any a, b ∈ R, lim_{x→0} (1 + ax)^(b/x) = e^(ab). (Hint: Take logarithms.)
8. Suppose f and g are differentiable functions on R with values in Rn .
a. Show that (f · g)′ = f ′ · g + f · g′ .
b. Suppose also that n = 3, and show that (f × g)′ = f ′ × g + f × g′ .
9. Define the function f by f (x) = e^(−1/x²) if x ̸= 0, f (0) = 0.
a. Show that lim_{x→0} f (x)/x^n = 0 for all n > 0. (You’ll find that a
simple-minded application of Theorem 2.10 doesn’t work. Try setting y = 1/x²
instead.)
b. Show that f is differentiable at x = 0 and that f ′ (0) = 0.
c. Show by induction on k that for x ̸= 0, f^(k)(x) = P (1/x)e^(−1/x²), where
P is a polynomial of degree 3k.
d. Show by induction on k that f^(k)(0) exists and equals 0 for all k. (Use the
results of (a) and (c) to compute the derivative of f^(k−1) at x = 0 directly
from the definition, as in (b).)
The upshot is that f possesses derivatives of all orders at every point and that
f^(k)(0) = 0 for all k.
10. Exercise 2 shows that it is possible for f ′ to exist at every point of an interval
I but to have discontinuities. It is an intriguing fact that when f ′ exists at every
point of I, it has the intermediate value property whether or not it is continuous.
More precisely:
Darboux’s Theorem. Suppose f is differentiable on [a, b]. If v is any num-
ber between f ′ (a) and f ′ (b), there is a point c ∈ (a, b) such that f ′ (c) = v.
Prove Darboux’s theorem, as follows: To simplify the notation, consider
the case a = 0, b = 1. Define h : [0, 2] → R by setting h(0) = f ′ (0),

\[ h(x) = \frac{f(x) - f(0)}{x} \ \text{ if } 0 < x \le 1, \qquad h(x) = \frac{f(1) - f(x - 1)}{2 - x} \ \text{ if } 1 \le x < 2, \]

and h(2) = f ′ (1). Show that h is continuous on [0, 2] and apply the intermedi-
ate value theorem to it. (This argument has a simple geometric interpretation,
which you can find if you think of h(x) as the slope of the chord joining a
certain pair of points on the graph of f .)

2.2 Differentiability in Several Variables


The simplest notion of derivative for a function of several variables is that of partial
derivatives, which are just the derivatives of the function with respect to each of
its variables when the others are held fixed. That is, the partial derivative of a
function f (x1 , . . . , xn ) with respect to the variable xj is

\[ \lim_{h \to 0} \frac{f(x_1, \ldots, x_j + h, \ldots, x_n) - f(x_1, \ldots, x_j, \ldots, x_n)}{h}, \]

provided that the limit exists.


The most common notations for the partial derivative just defined are
\[ \frac{\partial f}{\partial x_j}, \qquad f_{x_j}, \qquad f_j, \qquad \partial_{x_j} f, \qquad \partial_j f. \]

The first one is a modification of the Leibniz notation df /dx for ordinary deriva-
tives with the d replaced by the “curly d” ∂. The second one, with the variable of
differentiation indicated merely as a subscript on the function, is often used when
the first one seems too cumbersome. The third one is a variation on the second one
that is used when one does not want to commit oneself to naming the independent
variables but wants to speak of “the partial derivative of f with respect to its jth
variable.” The notations f_{x_j} and f_j have the disadvantage that they may conflict
with other uses of subscripts, for example denoting an ordered list of functions
by f1 , f2 , f3 , . . . . It has therefore become increasingly common in advanced
mathematics to use the notations ∂_{x_j} f and ∂_j f instead, which are reasonably compact
and at the same time quite unambiguous.

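EXAMPLE 1. Let
\[ f(x, y, z) = \frac{e^{3x} \sin xy}{1 + 5y - 7z}. \]
Then
\[
\begin{aligned}
\partial_x f = \partial_1 f = \frac{\partial f}{\partial x} &= \frac{3e^{3x} \sin xy + e^{3x} y \cos xy}{1 + 5y - 7z}, \\
\partial_y f = \partial_2 f = \frac{\partial f}{\partial y} &= \frac{(1 + 5y - 7z)\,e^{3x} x \cos xy - 5e^{3x} \sin xy}{(1 + 5y - 7z)^2}, \\
\partial_z f = \partial_3 f = \frac{\partial f}{\partial z} &= \frac{7e^{3x} \sin xy}{(1 + 5y - 7z)^2}.
\end{aligned}
\]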
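
Formulas like these are easy to get wrong by a sign or a factor, and a finite
difference check is a cheap safeguard. The Python sketch below compares the
three formulas of Example 1 with central difference quotients at one sample
point; the point, the step size h, and the function names are our own
assumptions, and agreement at one point is evidence, not proof.

```python
import math

def f(x, y, z):
    return math.exp(3 * x) * math.sin(x * y) / (1 + 5 * y - 7 * z)

def grad_formulas(x, y, z):
    """The three partial derivatives as given in Example 1."""
    d = 1 + 5 * y - 7 * z
    e = math.exp(3 * x)
    fx = (3 * e * math.sin(x * y) + e * y * math.cos(x * y)) / d
    fy = (d * e * x * math.cos(x * y) - 5 * e * math.sin(x * y)) / d ** 2
    fz = 7 * e * math.sin(x * y) / d ** 2
    return fx, fy, fz

def grad_numeric(x, y, z, h=1e-6):
    """Central-difference approximations to the three partial derivatives."""
    fx = (f(x + h, y, z) - f(x - h, y, z)) / (2 * h)
    fy = (f(x, y + h, z) - f(x, y - h, z)) / (2 * h)
    fz = (f(x, y, z + h) - f(x, y, z - h)) / (2 * h)
    return fx, fy, fz

point = (0.3, 0.7, 0.2)   # arbitrary point with 1 + 5y - 7z != 0
print(grad_formulas(*point))
print(grad_numeric(*point))
```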

The partial derivatives of a function give information about how the value of
the function changes when just one of the independent variables changes; that is,
they tell how the function varies along the lines parallel to the coordinate axes.
Sometimes this is just what is needed, but often we want something more. We may
want to know how the function behaves when several of the variables are changed at
once; or we may want to consider a new coordinate system, rotated with respect to
the old one, and ask how the function varies along the lines parallel to the new axes.
Do the partial derivatives provide such information? Without additional conditions
on the function, the answer is no.

EXAMPLE 2. Let us take another look at the function in Example 1 of §1.3:
\[ f(x, y) = \frac{xy}{x^2 + y^2} \ \text{ for } (x, y) \ne (0, 0), \qquad f(0, 0) = 0. \tag{2.14} \]
We have already observed that f is discontinuous at the origin; it approaches
different limits as (x, y) approaches the origin along different straight lines.
However, we have f (x, 0) = 0 for all x and f (0, y) = 0 for all y, so the partial
derivatives fx (0, 0) and fy (0, 0) both exist and equal zero:
\[ f_x(0, 0) = \lim_{h \to 0} \frac{f(h, 0) - f(0, 0)}{h} = 0 = \lim_{h \to 0} \frac{f(0, h) - f(0, 0)}{h} = f_y(0, 0). \]
Clearly fx (0, 0) and fy (0, 0) aren’t describing the behavior of f near the origin
very well: when either x or y is varied while the other is held fixed at 0, f
doesn’t change at all, but when both are varied at once, f can change quite
drastically!
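
The behavior just described can be observed directly. The Python sketch below
evaluates the function (2.14) along the axes and along a few lines y = mx
through the origin; the step h and the slopes m are our own sample choices.
Along y = mx the value is m/(1 + m²) for every x ≠ 0, so it does not shrink as
the point approaches the origin.

```python
def f(x, y):
    """The function (2.14): xy/(x^2 + y^2) away from the origin, 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

h = 1e-8
fx0 = (f(h, 0.0) - f(0.0, 0.0)) / h   # difference quotient for f_x(0, 0)
fy0 = (f(0.0, h) - f(0.0, 0.0)) / h   # difference quotient for f_y(0, 0)
print("difference quotients at the origin:", fx0, fy0)

for m in (0.5, 1.0, 2.0, -1.0):
    # along the line y = m x the value is m / (1 + m^2), however small x is
    print(f"slope {m:4.1f}:  f(h, m h) =", f(h, m * h))
```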

We need to give more thought to what it should mean for a function of several
variables to be differentiable. The right idea is provided by the characterization of
differentiability in one variable that we developed in the preceding section. Namely,
a function f (x) is differentiable at a point x = a if there is a linear function l(x)
such that l(a) = f (a) and the difference f (x) − l(x) tends to zero faster than x − a
as x approaches a. Now, the general linear¹ function of n variables has the form

l(x) = b + c1 x1 + · · · + cn xn = b + c · x,

and the condition l(a) = f (a) forces b to be f (a) − c · a, so that l(x) = f (a) + c ·
(x − a). With this in mind, here is the formal definition.
A function f defined on an open set S ⊂ Rn is called differentiable at a point
a ∈ S if there is a vector c ∈ Rn such that
\[ \lim_{\mathbf{h} \to \mathbf{0}} \frac{f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - \mathbf{c} \cdot \mathbf{h}}{|\mathbf{h}|} = 0. \tag{2.15} \]
In this case c (which is uniquely determined by (2.15), as we shall see shortly) is
called the gradient of f at a and is denoted by ∇f (a). Denoting the numerator
of the quotient on the left side of (2.15) by E(h), we observe that (2.15) can be
rewritten as
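\[ f(\mathbf{a} + \mathbf{h}) = f(\mathbf{a}) + \nabla f(\mathbf{a}) \cdot \mathbf{h} + E(\mathbf{h}), \quad \text{where } \frac{E(\mathbf{h})}{|\mathbf{h}|} \to 0 \text{ as } \mathbf{h} \to \mathbf{0}, \tag{2.16} \]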
which clearly expresses the fact that f (a + h), as a function of h, is well approxi-
mated by the linear function f (a) + ∇f (a) · h near h = 0.
¹ Unfortunately the term “linear” has two common meanings as applied to functions: “first-degree
polynomial” and “satisfying l(ax + by) = al(x) + bl(y).” The first meaning (the one used here)
allows a constant term; the second does not. See Appendix A, (A.5).

FIGURE 2.1: A tangent plane to a smooth surface.

What does this mean? First, let us establish the geometric intuition. If n = 2,
the graph of the equation z = f (x) (with $\mathbf{x} = (x, y)$) represents a surface in
3-space, and the graph of the equation z = f (a) + ∇f (a) · (x − a) (x is the
variable; a is fixed) represents a plane. These two objects both pass through the
point (a, f (a)), and at nearby points x = a + h we have
\[ z_{\text{surface}} - z_{\text{plane}} = f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - \nabla f(\mathbf{a}) \cdot \mathbf{h}. \]
Condition (2.16) says precisely that this difference tends to zero faster than |h| as
h → 0. Geometrically, this means that the plane z = f (a) + ∇f (a) · (x − a) is
the tangent plane to the surface z = f (x) at x = a, as indicated in Figure 2.1.
The same interpretation is valid in any number of variables, with a little stretch of
the imagination: The equation z = f (x) represents a “hypersurface” in R^(n+1) with
coordinates (x1 , . . . , xn , z), and the equation z = f (a)+∇f (a)·(x−a) represents
its “tangent hyperplane” at a.
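
Condition (2.16) can be watched at work on a concrete function. In the Python
sketch below, the function f (x, y) = xy + sin x, the base point a = (1, 2), and
the direction of the increment are all our own choices for illustration, and the
gradient is computed by hand. The quantity E(h)/|h| should shrink in proportion
to |h|.

```python
import math

def f(x, y):
    return x * y + math.sin(x)        # sample function, chosen for illustration

def grad_f(x, y):
    return (y + math.cos(x), x)       # its partial derivatives, computed by hand

def relative_error(a, h):
    """E(h)/|h| with E(h) = f(a + h) - f(a) - grad f(a) . h, as in (2.16)."""
    g = grad_f(*a)
    E = f(a[0] + h[0], a[1] + h[1]) - f(*a) - (g[0] * h[0] + g[1] * h[1])
    return E / math.hypot(h[0], h[1])

a = (1.0, 2.0)
for t in (1e-1, 1e-2, 1e-3):
    h = (0.6 * t, -0.8 * t)           # increment of length t in a fixed direction
    print(f"|h| = {t:.0e}   E(h)/|h| = {relative_error(a, h):.3e}")
```

A single direction is used here for brevity; differentiability requires the
same decay for increments approaching 0 from every direction.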
Next, let us establish the connection with partial derivatives and the uniqueness
of the vector c in (2.15). Suppose f is differentiable at a. If we take the increment
h in (2.16) to be of the form $\mathbf{h} = (h, 0, \ldots, 0)$ with h ∈ R, we have $\mathbf{c} \cdot \mathbf{h} = c_1 h$
and $|\mathbf{h}| = \pm h$ (depending on the sign of h). Thus (2.16) says (after multiplying
through by −1 if h is negative) that
\[ \lim_{h \to 0} \frac{f(a_1 + h, a_2, \ldots, a_n) - f(a_1, \ldots, a_n)}{h} - c_1 = 0, \]
or in other words, that c1 = ∂1 f (a). Likewise, cj = ∂j f (a) for j = 2, . . . , n. To
summarize:
2.17 Theorem. If f is differentiable at a, then the partial derivatives ∂j f (a) all
exist, and they are the components of the vector ∇f (a).
We also have the following:
2.18 Theorem. If f is differentiable at a, then f is continuous at a.
2.2. Differentiability in Several Variables 57

Proof. Multiplying (2.15) through by |h|, we see that f (a + h) − f (a) − ∇f (a) · h → 0
as h → 0. Since ∇f (a) · h clearly vanishes as h does, we have f (a + h) −
f (a) → 0 as h → 0, which says precisely that f is continuous at a.

The converses of Theorems 2.17 and 2.18 are false. The continuity of f does
not imply the differentiability of f even in dimension n = 1 (think of functions like
f (x) = |x| whose graphs have corners). When n > 1, the mere existence of the
partial derivatives of f does not imply the differentiability of f either. The example
(2.14) demonstrates this: Its partial derivatives exist, but it is not continuous at the
origin, so it cannot be differentiable there.
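A small numerical sketch of this counterexample follows. It assumes that (2.14) is the function f(x, y) = xy/(x^2 + y^2) with f(0, 0) = 0, which is the form consistent with the partial derivatives quoted for it elsewhere in this section; the helper names are hypothetical. The difference quotients along the axes vanish, so both partial derivatives at the origin are 0, yet f stays at 1/2 along the diagonal y = x.

```python
def f(x, y):
    # Assumed form of example (2.14), with f(0, 0) = 0.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_x_quotient(h):
    # Difference quotient for the x-partial derivative at the origin.
    return (f(h, 0.0) - f(0.0, 0.0)) / h

def partial_y_quotient(h):
    # Difference quotient for the y-partial derivative at the origin.
    return (f(0.0, h) - f(0.0, 0.0)) / h

steps = (1e-1, 1e-4, 1e-8)
quotients = [(partial_x_quotient(h), partial_y_quotient(h)) for h in steps]
diagonal_values = [f(t, t) for t in steps]

print(quotients)        # difference quotients along the axes
print(diagonal_values)  # values of f approaching the origin along y = x
```

Since f does not tend to f(0, 0) along the diagonal, f is not continuous at the origin, and Theorem 2.18 rules out differentiability there.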
To restate what we have just shown: For a function f to be differentiable at a
it is necessary for the partial derivatives ∂j f (a) to exist, but not sufficient. How,
then, do we know when a function is differentiable? Fortunately, there is a simple
condition, not too much stronger than the existence of the partial derivatives, that
guarantees differentiability.
2.19 Theorem. Let f be a function defined on an open set in Rn that contains the
point a. Suppose that the partial derivatives ∂j f all exist on some neighborhood of
a and that they are continuous at a. Then f is differentiable at a.
Proof. Let’s consider the case n = 2, to keep the notation simple. We wish to show
that
\[ \frac{f(a+h) - f(a) - c\cdot h}{|h|} \to 0 \ \text{ as } h \to 0, \quad \text{where } c = \bigl(\partial_1 f(a), \partial_2 f(a)\bigr). \tag{2.20} \]
To do this, we shall analyze the increment f (a + h) − f (a) by making the change
one variable at a time:
\[ \begin{aligned} f(a+h) - f(a) &= \bigl[f(a_1+h_1, a_2+h_2) - f(a_1, a_2+h_2)\bigr] \\ &\quad + \bigl[f(a_1, a_2+h_2) - f(a_1, a_2)\bigr]. \end{aligned} \tag{2.21} \]

We assume that h is small enough so that the partial derivatives ∂j f (x) exist when-
ever |x − a| ≤ |h|. In this case, we can use the one-variable mean value theorem to
express the differences on the right side of (2.21) in terms of the partial derivatives
of f at suitable points. If we set g(t) = f (t, a2 + h2 ), we have

\[ \begin{aligned} f(a_1+h_1, a_2+h_2) - f(a_1, a_2+h_2) &= g(a_1+h_1) - g(a_1) \\ &= g'(a_1+c_1)h_1 = \partial_1 f(a_1+c_1, a_2+h_2)h_1 \end{aligned} \]

for some number c1 lying between 0 and h1 . Similarly,

\[ f(a_1, a_2+h_2) - f(a_1, a_2) = \partial_2 f(a_1, a_2+c_2)h_2 \]

for some c2 between 0 and h2. Substituting these results back into (2.21) and then
into the left side of (2.20), we obtain
\[ \begin{aligned} \frac{f(a+h) - f(a) - c\cdot h}{|h|} &= \bigl[\partial_1 f(a_1+c_1, a_2+h_2) - \partial_1 f(a_1, a_2)\bigr]\frac{h_1}{|h|} \\ &\quad + \bigl[\partial_2 f(a_1, a_2+c_2) - \partial_2 f(a_1, a_2)\bigr]\frac{h_2}{|h|}. \end{aligned} \]
Now let h → 0. The expressions in brackets tend to 0 because the partial deriva-
tives ∂j f are continuous at a, and the ratios h1 /|h| and h2 /|h| are bounded by 1 in
absolute value. Thus (2.20) is valid and f is differentiable at a.
The idea for general n is exactly the same. We write f (a + h) − f (a) as the
sum of n increments, each of which involves a change in only one variable — for
example, the first of them is
f (a1 + h1 , a2 + h2 , . . . , an + hn ) − f (a1 , a2 + h2 , . . . , an + hn )
— and then use the mean value theorem to express each difference in terms of a
partial derivative of f and proceed as before.

A function f whose partial derivatives ∂j f all exist and are continuous on an
open set S is said to be of class C^1 on S. For short, we shall also say that “f is
C^1 on S” or “f ∈ C^1(S)” and refer to “a C^1 function f .” Theorems 2.17 and 2.19
then say that
\[ C^1 \implies \text{differentiable} \implies \text{partial derivatives exist.} \]
The reverse implications are false. We already know that existence of partial deriva-
tives does not imply differentiability, and there are differentiable functions whose
derivatives are discontinuous. The standard example in one variable is the function
in Exercise 2, §2.1, and it is easy to generate higher-dimensional examples from
this one.
For most of the elementary functions that we shall work with, the continuity
of the partial derivatives is obvious by inspection, so verifying the differentiability
of a function is usually no problem. For example, for (x, y) ≠ (0, 0) the partial
derivatives of our old friend (2.14) are
\[ \partial_x f(x,y) = \frac{y^3 - x^2 y}{(x^2+y^2)^2}, \qquad \partial_y f(x,y) = \frac{x^3 - x y^2}{(x^2+y^2)^2}, \]
which are continuous everywhere except at the origin. Thus, by Theorem 2.19,
f is differentiable at every point except the origin.
We conclude this section by examining a few ramifications of the notion of
differentiability.

Differentials. Suppose f is differentiable at a, so that

f (a + h) − f (a) = ∇f (a) · h + error,

where the error term is negligibly small in comparison with h. If we neglect the
error term, the resulting approximation to the increment f (a + h) − f (a) is called
the differential of f at a and is denoted by df (a; h) or dfa (h):

(2.22) df (a; h) = dfa (h) = ∇f (a) · h = ∂1 f (a)h1 + · · · + ∂n f (a)hn .

If we set f (x) = u and h = dx = (dx1 , . . . , dxn ), this formula can be written
informally as
\[ du = \frac{\partial f}{\partial x_1}\,dx_1 + \frac{\partial f}{\partial x_2}\,dx_2 + \cdots + \frac{\partial f}{\partial x_n}\,dx_n. \]
We can think of this in two ways. Intuitively, if we think of dx1 , . . . , dxn as in-
finitesimal increments in the independent variables x1 , . . . , xn , then du is the cor-
responding infinitesimal increment in the dependent variable u. Or, if we think of
dx1 , . . . , dxn as honest, finite increments, du is the corresponding increment in the
u value, not on the (hyper)surface u = f (x), but on its tangent (hyper)plane: It is
the linear approximation to the increment in the function f .
Differentials obey the usual elementary rules of differentiation, such as the sum,
product, and quotient rules:
\[ d(f+g) = df + dg, \qquad d(fg) = f\,dg + g\,df, \qquad d\Bigl(\frac{f}{g}\Bigr) = \frac{g\,df - f\,dg}{g^2}. \]

This follows from (2.22) and the fact that the partial derivatives obey these rules.
We’ll see later how differentials interact with the chain rule.
Differentials are handy for approximating small changes in a function. Here’s
an example:

EXAMPLE 3. A right circular cone has height 5 and base radius 3. (a) About
how much does the volume increase if the height is increased to 5.02 and the
radius is increased to 3.01? (b) If the height is increased to 5.02, by about how
much should the radius be decreased to keep the volume constant?
Solution. The volume of a cone is given by V = (1/3)πr^2 h, so dV =
(2/3)πrh dr + (1/3)πr^2 dh. (a) If r = 3, h = 5, dr = .01, and dh = .02, we
have dV = (2/3)π(3)(5)(.01) + (1/3)π(3^2)(.02) = .16π ≈ .50. (b) If r = 3, h = 5,
dh = .02, as in (a) we have dV = 10π dr + .06π, so dV = 0 if dr = −.006.
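As a sanity check on Example 3, the differential estimate can be compared with the exact change in volume. This is a minimal sketch; the function names are hypothetical and only the formulas V = (1/3)πr^2 h and dV = (2/3)πrh dr + (1/3)πr^2 dh come from the example.

```python
import math

def cone_volume(r, h):
    # V = (1/3) * pi * r^2 * h
    return math.pi * r * r * h / 3.0

def cone_differential(r, h, dr, dh):
    # dV = (2/3) * pi * r * h * dr + (1/3) * pi * r^2 * dh
    return (2.0 / 3.0) * math.pi * r * h * dr + (1.0 / 3.0) * math.pi * r * r * dh

def radius_change_for_constant_volume(r, h, dh):
    # Setting dV = 0 and solving for dr gives dr = -r * dh / (2 * h).
    return -r * dh / (2.0 * h)

estimate = cone_differential(3.0, 5.0, 0.01, 0.02)
exact = cone_volume(3.01, 5.02) - cone_volume(3.0, 5.0)
dr_needed = radius_change_for_constant_volume(3.0, 5.0, 0.02)

print(estimate, exact, dr_needed)
```

The estimate and the exact increment agree to about two decimal places, which is what one expects when the error is of second order in (dr, dh).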

Directional Derivatives. The partial derivatives ∂j f give information about
how f (x) varies as x moves along lines parallel to the coordinate axes. Sometimes
we wish to study the variation of f along oblique lines instead. Thus, given a unit
vector u and a base point a, we consider the line passing through a in the direction
u, which can be represented parametrically by g(t) = a + tu. The directional
derivative of f at a in the direction u is defined to be
\[ \partial_u f(a) = \frac{d}{dt} f(a+tu)\Big|_{t=0} = \lim_{t\to 0} \frac{f(a+tu) - f(a)}{t}, \]
provided that the limit exists. For example, if u is the unit vector in the positive
jth coordinate direction (that is, u = (0, . . . , 1, . . . , 0) with the 1 in the jth place),
then ∂u f is just the partial derivative ∂j f .

2.23 Theorem. If f is differentiable at a, then the directional derivatives of f at a
all exist, and they are given by
\[ \partial_u f(a) = \nabla f(a)\cdot u. \tag{2.24} \]

Proof. Differentiability of f means that
\[ \frac{f(a+h) - f(a) - \nabla f(a)\cdot h}{|h|} \to 0 \quad \text{as } h \to 0. \tag{2.25} \]
We take h = tu. If t > 0, then |h| = t and the expression on the left of (2.25) is
\[ \frac{f(a+tu) - f(a)}{t} - \nabla f(a)\cdot u. \]
If t < 0, then |h| = −t and the expression on the left of (2.25) is
\[ -\frac{f(a+tu) - f(a)}{t} + \nabla f(a)\cdot u. \]
In either case, this quantity tends to 0 as t → 0, which means that ∂u f (a) exists
and equals ∇f (a) · u.

It is possible for all the directional derivatives of f to exist even if f is not
differentiable, but in that case they cannot be computed from the partial derivatives
by the simple formula (2.24); see Exercise 7.
Consideration of directional derivatives leads to a geometric interpretation of
the gradient vector ∇f (a) when this vector is nonzero. Indeed, by (2.24) and
Cauchy’s inequality, we have |∂u f (a)| ≤ |∇f (a)| for every unit vector u, and
the extreme case ∂u f (a) = |∇f (a)| occurs when u is the unit vector in the same
direction as ∇f (a). Thus, ∇f (a) is the vector whose magnitude is the largest di-
rectional derivative of f at a, and whose direction is the direction of that derivative.
In other words, ∇f (a) points in the direction of steepest increase of f at a, and its
magnitude is the rate of increase of f in that direction.

EXAMPLE 4. Let f (x, y) = x^2 + 5xy^2, a = (−2, 1). (a) Find the directional
derivative of f at a in the direction of the vector v = (12, 5). (b) What is the
largest of the directional derivatives of f at a, and in what direction does it
occur?
Solution. We have ∇f (x, y) = (2x + 5y^2, 10xy), so that ∇f (−2, 1) =
(1, −20). The unit vector in the direction of v is u = (12/13, 5/13), so the
directional derivative in this direction is ∇f (a) · u = (1, −20) · (12/13, 5/13) = −88/13.
The largest directional derivative at a is |∇f (a)| = √401, and it occurs in the
direction (1/√401)(1, −20).
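The numbers in Example 4 can be reproduced with a few lines of code. The sketch below evaluates formula (2.24) directly and, as an independent check, approximates the limit in the definition of the directional derivative by a central difference quotient. The helper names and the step size 1e-6 are assumptions made for illustration.

```python
import math

def f(x, y):
    return x * x + 5.0 * x * y * y

def grad_f(x, y):
    # Gradient from the example: (2x + 5y^2, 10xy).
    return (2.0 * x + 5.0 * y * y, 10.0 * x * y)

def directional_derivative(a, v):
    """Formula (2.24): grad f(a) . u, where u = v/|v|."""
    g = grad_f(*a)
    return (g[0] * v[0] + g[1] * v[1]) / math.hypot(*v)

def difference_quotient(a, v, t=1e-6):
    """Central difference approximation along the unit vector u = v/|v|."""
    norm = math.hypot(*v)
    u = (v[0] / norm, v[1] / norm)
    forward = f(a[0] + t * u[0], a[1] + t * u[1])
    backward = f(a[0] - t * u[0], a[1] - t * u[1])
    return (forward - backward) / (2.0 * t)

a = (-2.0, 1.0)
v = (12.0, 5.0)
by_formula = directional_derivative(a, v)
by_limit = difference_quotient(a, v)
largest = math.hypot(*grad_f(*a))  # |grad f(a)|

print(by_formula, by_limit, largest)
```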

EXERCISES
1. For each of the following functions f , (i) compute ∇f , (ii) find the directional
derivative of f at the point (1, −2) in the direction (3/5, 4/5).
a. f (x, y) = x^2 y + sin πxy.
b. f (x, y) = e^{4x−y^2}.
c. f (x, y) = (x + 2y + 4)/(7x + 3y).
2. For each of the following functions f , (i) compute the differential df , (ii) use
the differential to estimate the difference f (1.1, 1.2, −0.1) − f (1, 1, 0).
a. f (x, y, z) = x^2 e^{x−y+3z}.
b. f (x, y, z) = y^3 + log(x + z^2).
3. Let w = f (x, y, z) = x^2 y^{3/2} z/(z + 1). Suppose that, at the outset, (x, y, z) =
(5, 4, 1), so that w = 100. Use differentials to answer the following questions.
a. Suppose we change x to 5.03 and y to 3.92. By (about) how much should
we change z in order to keep w = 100?
b. Suppose we want to increase the value of w a little bit by changing the
value of only one of the independent variables. Which variable should
we choose to get the biggest increase in w for the smallest change of the
independent variable?
4. Show that u = f (x, y, z) = xe^{2z} + y^{−1} e^{5z} satisfies the differential equation
x∂x u + 2y∂y u + ∂z u = 3u.
5. Show that u = f (x, y) = xy/(xy − y + 2x) satisfies the differential equation
x^2 ∂x u + y^2 ∂y u = u^2.
6. For j = 1, . . . , n, define the function fj on R^n \ {0} by fj (x) = xj /|x|. Show
that Σ_{j=1}^{n} xj dfj ≡ 0.
7. Let f (x, y) = x^2 y/(x^2 + y^2) if (x, y) ≠ (0, 0) and f (0, 0) = 0.
a. Show that f is continuous at (0, 0). (Hint: Since 0 ≤ (x ± y)^2 = x^2 +
y^2 ± 2xy, we have |xy| ≤ (1/2)(x^2 + y^2) for all x, y.)
b. Show that the directional derivatives ∂u f (0, 0) all exist, and compute them.
(Work directly with the definition of directional derivative. The best way
to write a unit vector in R^2 is as u = (cos θ, sin θ).)
c. Show that f is not differentiable at (0, 0). (Hint: If it were, the directional
derivatives ∂u f (0, 0) would be related to the partial derivatives ∂x f (0, 0)
and ∂y f (0, 0) by (2.24).)
8. Suppose f is a function defined on an open set S ⊂ R^n. Show that if the
partial derivatives ∂j f exist and are bounded on S, then f is continuous on S.
(Exercise 7 provides an example of a function that satisfies these conditions on
S = R^2 but is not everywhere differentiable.)

2.3 The Chain Rule


There are several different but closely related versions of the chain rule for func-
tions of several variables. The most basic one concerns the situation where we have
a function f (x1 , . . . , xn ) and the variables x1 , . . . , xn are themselves functions of
a single real variable t. To be precise, suppose xj = gj (t), or x = g(t); we then
have the composite function ϕ(t) = f (g(t)).
We recall that the derivative g′(t) is defined componentwise:
\[ g'(t) = \bigl(g_1'(t), \dots, g_n'(t)\bigr). \]
Geometrically speaking, the equation x = g(t) represents a parametrized curve in
R^n; we may think of a particle moving in R^n whose position at time t is g(t). In
this case the vector g′ (t) is the velocity of the particle at time t; it is tangent to the
curve at g(t), and its magnitude is the speed at which the particle is traveling along
the curve.
2.26 Theorem (Chain Rule I). Suppose that g(t) is differentiable at t = a, f (x)
is differentiable at x = b, and b = g(a). Then the composite function ϕ(t) =
f (g(t)) is differentiable at t = a, and its derivative is given by
\[ \phi'(a) = \nabla f(b)\cdot g'(a), \]
or, in Leibniz notation, with w = f (x),
\[ \frac{dw}{dt} = \frac{\partial w}{\partial x_1}\frac{dx_1}{dt} + \cdots + \frac{\partial w}{\partial x_n}\frac{dx_n}{dt}. \tag{2.27} \]

Proof. Differentiability of f and g at the appropriate points means that
\[ f(b+h) = f(b) + \nabla f(b)\cdot h + E_1(h), \qquad E_1(h)/|h| \to 0 \text{ as } h \to 0; \]
\[ g(a+u) = g(a) + u\,g'(a) + E_2(u), \qquad |E_2(u)|/u \to 0 \text{ as } u \to 0. \]
In the first equation we take h = g(a + u) − g(a). By the second equation, we also
have h = ug′(a) + E2(u), and we are given that g(a) = b, so
\[ \begin{aligned} \phi(a+u) = f(g(a+u)) = f(b+h) &= f(b) + \nabla f(b)\cdot h + E_1(h) \\ &= f(g(a)) + \nabla f(b)\cdot[u\,g'(a) + E_2(u)] + E_1(h) \\ &= \phi(a) + u\,\nabla f(b)\cdot g'(a) + E_3(u), \end{aligned} \]
where
\[ E_3(u) = \nabla f(b)\cdot E_2(u) + E_1(h). \]

We claim that the error term E3(u) satisfies E3(u)/u → 0 as u → 0. Granted this,
we have
\[ \frac{\phi(a+u) - \phi(a)}{u} = \nabla f(b)\cdot g'(a) + \frac{E_3(u)}{u} \to \nabla f(b)\cdot g'(a) \quad \text{as } u \to 0, \]
so that ϕ′(a) = ∇f (b) · g′(a) as claimed.
Showing that E3(u)/u → 0 is just a matter of sorting out the mess a little.
The fact that |E2(u)|/u → 0 takes care of the first term in E3(u), by Cauchy’s
inequality:
\[ \frac{|\nabla f(b)\cdot E_2(u)|}{|u|} \le |\nabla f(b)|\,\Bigl|\frac{E_2(u)}{u}\Bigr| \to 0. \]
It also implies that when u is small we have |E2(u)| ≤ |u| and hence
\[ |h| = |u\,g'(a) + E_2(u)| \le \bigl(|g'(a)| + 1\bigr)|u|. \]

Now the second term in E3 (u), namely E1 (h), becomes negligibly small in com-
parison to |h| as |h| → 0, and the estimate above shows that |h| in turn is bounded
by a constant times |u|, so E1 (h) becomes negligibly small in comparison to |u| as
u → 0, which means that E1 (h)/u → 0 as desired.
EXAMPLE 1. Suppose w = f (x, y, z) is a differentiable function of (x, y, z),
and that x = t^4 − t, y = sin 3t, and z = e^{−2t}. Then w can be regarded as a
composite function of t, and we have
\[ \begin{aligned} \frac{dw}{dt} &= \frac{d}{dt} f(t^4 - t, \sin 3t, e^{-2t}) \\ &= (\partial_1 f)\cdot(4t^3 - 1) + (\partial_2 f)\cdot(3\cos 3t) + (\partial_3 f)\cdot(-2e^{-2t}), \end{aligned} \]
where the partial derivatives ∂j f are all evaluated at (t^4 − t, sin 3t, e^{−2t}).
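Example 1 leaves f unspecified. To see the formula at work numerically, the sketch below picks a concrete f, namely f(x, y, z) = xy + z^2, which is purely a hypothetical choice, and compares the chain rule expression with a central difference quotient of the composite function. The functions g and g_prime encode the curve and its derivative from the example.

```python
import math

def g(t):
    # The curve from Example 1: x = t^4 - t, y = sin 3t, z = e^(-2t).
    return (t ** 4 - t, math.sin(3.0 * t), math.exp(-2.0 * t))

def g_prime(t):
    # Componentwise derivative of g.
    return (4.0 * t ** 3 - 1.0, 3.0 * math.cos(3.0 * t), -2.0 * math.exp(-2.0 * t))

def f(x, y, z):
    # Hypothetical sample function; the example itself leaves f general.
    return x * y + z * z

def grad_f(x, y, z):
    return (y, x, 2.0 * z)

def chain_rule_derivative(t):
    # dw/dt = grad f(g(t)) . g'(t), as in Theorem 2.26.
    return sum(p * q for p, q in zip(grad_f(*g(t)), g_prime(t)))

def numerical_derivative(t, step=1e-6):
    # Central difference quotient of the composite function t -> f(g(t)).
    return (f(*g(t + step)) - f(*g(t - step))) / (2.0 * step)

t0 = 0.7
print(chain_rule_derivative(t0), numerical_derivative(t0))
```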

Suppose now that the variables x1 , . . . , xn are differentiable functions, not of
a single real variable t, but of a family of variables t = (t1 , . . . , tm ); say, xj =
gj (t1 , . . . , tm ), or x = g(t). If f is a differentiable function of x, we then have the
composite function ϕ(t) = f (g(t)). The chain rule, as stated above, can be used
to compute the partial derivatives of ϕ with respect to the variables tk . Indeed, we
simply fix all but one of those variables and apply the chain rule to the resulting
function of the remaining single variable to obtain
\[ \frac{\partial\phi}{\partial t_k}(a) = \nabla f(b)\cdot\frac{\partial g}{\partial t_k}(a) \qquad (b = g(a)), \tag{2.28} \]
or, setting w = f (x),
\[ \frac{\partial w}{\partial t_k} = \frac{\partial w}{\partial x_1}\frac{\partial x_1}{\partial t_k} + \cdots + \frac{\partial w}{\partial x_n}\frac{\partial x_n}{\partial t_k}. \]

To be precise, this calculation shows that if the partial derivatives ∂g/∂tk exist
at t = a and if f is differentiable at x = b = g(a), then the partial derivatives
∂ϕ/∂tk exist at t = a and are given by (2.28). It also shows that if g is of class
C^1 near a and f is of class C^1 near b = g(a), then ϕ is of class C^1, and in
particular is differentiable, near a. Indeed, under these hypotheses, (2.28) shows
that the partial derivatives ∂ϕ/∂tk are continuous.
It is also natural to ask whether the composite function f ◦ g is differentiable
when f and g are only assumed to be differentiable rather than C^1. The answer is
affirmative. When t is only a single real variable, this result is contained in the chain
rule as stated and proved above. The proof for the general case, t = (t1 , . . . , tm ),
is almost identical except that the notation is a little messier, and we shall not take
the trouble to write it out. But we shall give a formal statement of the result:

2.29 Theorem (Chain Rule II). Suppose that g1 , . . . , gn are functions of t =
(t1 , . . . , tm ) and f is a function of x = (x1 , . . . , xn ). Let b = g(a) and ϕ = f ◦ g.
If g1 , . . . , gn are differentiable at a (resp. of class C^1 near a) and f is differentiable
at b (resp. of class C^1 near b), then ϕ is differentiable at a (resp. of class C^1 near
a), and its partial derivatives are given by
\[ \frac{\partial\phi}{\partial t_k} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial t_k} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial t_k}, \tag{2.30} \]
where the derivatives ∂f /∂xj are evaluated at b and the derivatives ∂ϕ/∂tk and
∂xj /∂tk = ∂gj /∂tk are evaluated at a.
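EXAMPLE 2. Suppose that f is a differentiable function of x and y and that
x = s log(1 + t^2) and y = cos(s^3 + 5t). Then the partial derivatives of the
composite function z = f (s log(1 + t^2), cos(s^3 + 5t)) are given by
\[ \frac{\partial z}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s} = f_x\cdot\log(1+t^2) + f_y\cdot(-3s^2)\sin(s^3+5t), \]
\[ \frac{\partial z}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t} = f_x\cdot\frac{2st}{1+t^2} + f_y\cdot(-5)\sin(s^3+5t). \]
Here, the partial derivatives of f are to be evaluated at (s log(1 + t^2), cos(s^3 +
5t)).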
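The two formulas of Example 2 can be tested against finite differences once a concrete f is chosen. In the sketch below, f(x, y) = x^2 y is a hypothetical sample function, and central_difference is a helper introduced only for this check; the inner functions x and y are those of the example.

```python
import math

def x_of(s, t):
    return s * math.log(1.0 + t * t)

def y_of(s, t):
    return math.cos(s ** 3 + 5.0 * t)

def f(x, y):
    # Hypothetical sample function; the example itself leaves f general.
    return x * x * y

def f_x(x, y):
    return 2.0 * x * y

def f_y(x, y):
    return x * x

def z(s, t):
    return f(x_of(s, t), y_of(s, t))

def dz_ds(s, t):
    # First formula of Example 2, with f_x and f_y evaluated at (x, y).
    x, y = x_of(s, t), y_of(s, t)
    return (f_x(x, y) * math.log(1.0 + t * t)
            + f_y(x, y) * (-3.0 * s * s) * math.sin(s ** 3 + 5.0 * t))

def dz_dt(s, t):
    # Second formula of Example 2.
    x, y = x_of(s, t), y_of(s, t)
    return (f_x(x, y) * (2.0 * s * t / (1.0 + t * t))
            + f_y(x, y) * (-5.0) * math.sin(s ** 3 + 5.0 * t))

def central_difference(func, point, step=1e-6):
    return (func(point + step) - func(point - step)) / (2.0 * step)

s0, t0 = 1.2, 0.5
print(dz_ds(s0, t0), central_difference(lambda s: z(s, t0), s0))
print(dz_dt(s0, t0), central_difference(lambda t: z(s0, t), t0))
```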
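The chain rule (2.30) has a neat interpretation in terms of differentials. Let
w = f (x). If we regard x1 , . . . , xn as independent variables, we have
\[ dw = \frac{\partial w}{\partial x_1}\,dx_1 + \cdots + \frac{\partial w}{\partial x_n}\,dx_n. \tag{2.31} \]
On the other hand, if we regard x1 , . . . , xn as functions of the variables t1 , . . . , tm
and w as the composite function f (x(t)), we have
\[ dx_j = \frac{\partial x_j}{\partial t_1}\,dt_1 + \cdots + \frac{\partial x_j}{\partial t_m}\,dt_m \tag{2.32} \]
and
\[ dw = \frac{\partial w}{\partial t_1}\,dt_1 + \cdots + \frac{\partial w}{\partial t_m}\,dt_m. \tag{2.33} \]
If we substitute the expressions (2.32) for dxj into (2.31) and regroup the terms,
we obtain
\[ \begin{aligned} dw &= \frac{\partial w}{\partial x_1}\Bigl(\frac{\partial x_1}{\partial t_1}\,dt_1 + \cdots + \frac{\partial x_1}{\partial t_m}\,dt_m\Bigr) + \cdots + \frac{\partial w}{\partial x_n}\Bigl(\frac{\partial x_n}{\partial t_1}\,dt_1 + \cdots + \frac{\partial x_n}{\partial t_m}\,dt_m\Bigr) \\ &= \Bigl(\frac{\partial w}{\partial x_1}\frac{\partial x_1}{\partial t_1} + \cdots + \frac{\partial w}{\partial x_n}\frac{\partial x_n}{\partial t_1}\Bigr)dt_1 + \cdots + \Bigl(\frac{\partial w}{\partial x_1}\frac{\partial x_1}{\partial t_m} + \cdots + \frac{\partial w}{\partial x_n}\frac{\partial x_n}{\partial t_m}\Bigr)dt_m. \end{aligned} \]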

The content of the chain rule (2.30) is precisely that this last expression for dw coin-
cides with (2.33). In other words, the differential formalism has the chain rule “built
in,” just as it does in one variable (where the chain rule dw/dt = (dw/dx)(dx/dt)
is just a matter of “canceling the dx’s”).
The preceding discussion concerns the situation where the variable w depends
on a set of variables xj , and the xj ’s depend on a different set of variables tk .
However, in many situations the variables on different “levels” can get mixed up
with each other. The typical example is as follows. Consider a physical quantity
w = f (x, y, z, t) whose value depends on the position (x, y, z) and the time t
(temperature, for example, or air pressure in a region of the atmosphere). Consider
also a vehicle moving through space, so that its coordinates (x, y, z) are functions
of t. We wish to know how the quantity w varies in time, as measured by an
observer on the vehicle; that is, we are interested in the behavior of the composite
function
\[ w = f\bigl(x(t), y(t), z(t), t\bigr). \]
Here t enters not only as a “first-level” variable, as the last argument of f , but also
as a “second-level” variable through the t-dependence of x, y, z.
How should this be handled? There is no real problem; the only final indepen-
dent variable is t, so the chain rule in the form (2.27) can be applied:
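\[ \frac{dw}{dt} = \frac{\partial w}{\partial x}\frac{dx}{dt} + \frac{\partial w}{\partial y}\frac{dy}{dt} + \frac{\partial w}{\partial z}\frac{dz}{dt} + \frac{\partial w}{\partial t}. \tag{2.34} \]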
In the last term we have omitted the derivative dt/dt, which of course equals 1. (If
this makes you nervous, denote the fourth variable in f by u instead of t; then we
are considering w = f (x(t), y(t), z(t), u(t)) where u(t) = t.)
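The distinction between the total derivative dw/dt and the explicit partial derivative ∂w/∂t in (2.34) shows up clearly in a concrete case. The sketch below assumes a hypothetical quantity f(x, y, z, t) = xy + zt and a hypothetical path x = cos t, y = sin t, z = t^2; neither comes from the text. For this choice the composite function is cos t sin t + t^3, whose derivative is cos 2t + 3t^2.

```python
import math

def position(t):
    # Hypothetical path of the vehicle.
    return (math.cos(t), math.sin(t), t * t)

def velocity(t):
    return (-math.sin(t), math.cos(t), 2.0 * t)

def partials_of_f(x, y, z, t):
    # For the sample f(x, y, z, t) = x*y + z*t:
    # (df/dx, df/dy, df/dz, df/dt) = (y, x, t, z).
    return (y, x, t, z)

def total_derivative(t):
    """Right side of (2.34): three path terms plus the explicit t-term."""
    x, y, z = position(t)
    fx, fy, fz, ft = partials_of_f(x, y, z, t)
    vx, vy, vz = velocity(t)
    return fx * vx + fy * vy + fz * vz + ft

def explicit_partial(t):
    """Only the last term of (2.34): the explicit dependence of f on t."""
    x, y, z = position(t)
    return partials_of_f(x, y, z, t)[3]

t0 = 1.0
print(total_derivative(t0), explicit_partial(t0))
```

The two printed numbers differ, which is the point of the notation: the observer on the vehicle measures the first, while the second describes only how f changes at a fixed position.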
Notice the subtle use of notation: The dw/dt on the left of (2.34) denotes the
“total derivative” of w, taking into account all the ways in which w depends on t,
whereas the ∂w/∂t on the right denotes the partial derivative that involves only the
explicit dependence of the function f on its fourth variable t. This notation works
well enough in this situation, but it becomes inadequate if there is more than one
final independent variable.
Suppose, for example, that we are studying a function w = f (x, y, t, s), and
that x and y are themselves functions of the independent variables t and s. Then
the analogue of (2.34) would be
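\[ \frac{\partial w}{\partial t} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial w}{\partial t}, \]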
but this is nonsense! The ∂w/∂t’s on the left and on the right denote different
things. In such a situation we must use one of the alternative notations for partial
derivatives that offer more precision, or perhaps add some subscripts to the ∂w/∂t’s
to specify their meaning. In this case, if x = ϕ(t, s) and y = ψ(t, s), we could
write
\[ \frac{\partial w}{\partial t} = (\partial_1 f)(\partial_1\phi) + (\partial_2 f)(\partial_1\psi) + \partial_3 f. \tag{2.35} \]

FIGURE 2.2: Diagram of dependence for the basic chain rule (w → x1 , x2 , . . . , xn → t).

The mixture of dependent-and-independent-variable notation on the left and
functional notation on the right in (2.35) is perhaps inelegant, but it does the job!
In general, it is best not to be too doctrinaire about deciding to use one notation
for partial derivatives rather than another one; clarity is more important than con-
sistency. We shall be quite free about adopting whichever notation works best in a
particular situation, and the exercises aim at encouraging the reader to do likewise.
When the relations among the variables become too complicated for comfort,
we can often sort things out by drawing a schematic diagram of the functional
relationships. The idea is as follows:
i. Write down the dependent variable on the left of the page, a list of the inde-
pendent variables on which it ultimately depends on the right, and lists of the
intermediate variables in the middle.
ii. Whenever one variable p depends directly on another one q, draw a line joining
them; this line represents the partial derivative ∂p/∂q.
iii. To find the derivative of the variable w on the left with respect to one of the
variables t on the right, consider all the ways you can go from w to t by follow-
ing the lines. For each such path, write down the product of partial derivatives
corresponding to the lines along the path, then add the results.
The diagram for the basic chain rule (2.27) is shown in Figure 2.2: The path
from w to xj to t gives the term (∂w/∂xj )(dxj /dt) in (2.27). On the other hand,
Figure 2.3 gives the diagram for w = f (x, y, t, s) where x and y depend on t and
s: There are three paths from w to t (w to x to t, w to y to t, and w to t directly)
that give the three terms on the right of (2.35).
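The path rule in steps i to iii is easy to mechanize. The sketch below stores the diagram as a dictionary of edges, each carrying the numerical value of the corresponding partial derivative at some fixed point, and sums the products along all paths. The function name, the dictionary layout, and all the numerical edge values are hypothetical; the recursion also assumes the diagram has no cycles.

```python
def derivative_by_paths(edges, start, target):
    """Sum, over all paths from start to target, of the product of edge values.

    edges maps each variable to a dict {variable it depends on directly:
    value of the corresponding partial derivative}. The diagram is assumed
    to be acyclic.
    """
    if start == target:
        return 1.0  # the empty path contributes the empty product
    total = 0.0
    for child, partial in edges.get(start, {}).items():
        total += partial * derivative_by_paths(edges, child, target)
    return total

# Diagram of Figure 2.3: w = f(x, y, t, s), x = phi(t, s), y = psi(t, s).
# The numbers are made-up values of the partial derivatives at one point.
edges = {
    "w": {"x": 2.0, "y": -1.0, "t": 0.5, "s": 4.0},  # d1 f, d2 f, d3 f, d4 f
    "x": {"t": 3.0, "s": 7.0},                       # d1 phi, d2 phi
    "y": {"t": 5.0, "s": 1.0},                       # d1 psi, d2 psi
}

dw_dt = derivative_by_paths(edges, "w", "t")
dw_ds = derivative_by_paths(edges, "w", "s")
print(dw_dt, dw_ds)
```

With these values the three paths from w to t contribute 2·3, (−1)·5, and 0.5, which is the pattern of the three terms in (2.35).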

FIGURE 2.3: Diagram of dependence for w = f (x, y, t, s), x = ϕ(t, s), y = ψ(t, s).

Here is another useful corollary of the chain rule. A function f on R^n is called
(positively) homogeneous of degree a (a ∈ R) if f (tx) = t^a f (x) for all t > 0 and
x ≠ 0.
2.36 Theorem (Euler’s Theorem). If f is homogeneous of degree a, then at any
point x where f is differentiable we have

x1 ∂1 f (x) + x2 ∂2 f (x) + · · · + xn ∂n f (x) = af (x).

Proof. Consider the function ϕ(t) = f (tx). On the one hand, since f (tx) =
t^a f (x), we have ϕ′(t) = at^{a−1} f (x) = at^{−1} f (tx). On the other, by the chain rule
we have
\[ \phi'(t) = \nabla f(tx)\cdot\frac{d}{dt}(tx) = x\cdot\nabla f(tx). \]
Setting t = 1 and equating the two expressions for ϕ′(1), we obtain the asserted
result.
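Euler's theorem is easy to confirm on a specific homogeneous function. The sketch below uses f(x, y) = x^3 + xy^2, a hypothetical sample that is homogeneous of degree 3, with its gradient entered by hand; the helper name euler_sides is likewise an assumption.

```python
def f(x, y):
    # Hypothetical sample function, homogeneous of degree 3.
    return x ** 3 + x * y ** 2

def grad_f(x, y):
    return (3.0 * x ** 2 + y ** 2, 2.0 * x * y)

def euler_sides(x, y, degree=3.0):
    """Return (x . grad f(x), degree * f(x)); Theorem 2.36 says they agree."""
    gx, gy = grad_f(x, y)
    return x * gx + y * gy, degree * f(x, y)

lhs, rhs = euler_sides(1.5, -2.0)
print(lhs, rhs)

# Homogeneity itself: f(t x) = t^3 f(x) with t = 2.
print(f(2.0 * 1.5, 2.0 * -2.0), 2.0 ** 3 * f(1.5, -2.0))
```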

We conclude this section with an additional geometric insight into the meaning
of the gradient of a function. If F is a differentiable function of (x, y, z) ∈ R^3, the
locus of the equation F (x, y, z) = 0 is typically a smooth two-dimensional surface
S in R^3. (We shall consider this matter more systematically in Chapter 3.) Suppose
that (x, y, z) = g(t) is a parametric representation of a smooth curve on S. On the
one hand, by the chain rule we have (d/dt)F (g(t)) = ∇F (g(t)) · g′(t). On the
other hand, since the curve lies on S, we have F (g(t)) = 0 for all t and hence
(d/dt)F (g(t)) = 0. Thus, for any curve on S, the gradient of F is orthogonal
to the tangent vector to the curve at each point on the curve. Since such curves can
go in any direction on the surface, we conclude that at any point a ∈ S, ∇F (a) is
orthogonal to every vector that is tangent to S at a. (Of course, this is interesting
only if ∇F (a) ≠ 0.) We summarize:
2.37 Theorem. Suppose that F is a differentiable function on some open set U ⊂
R^3, and suppose that the set
\[ S = \bigl\{(x, y, z) \in U : F(x, y, z) = 0\bigr\} \]
is a smooth surface. If a ∈ S and ∇F (a) ≠ 0, then the vector ∇F (a) is perpendicular,
or normal, to the surface S at a.

2.38 Corollary. Under the conditions of the theorem, the equation of the tangent
plane to S at a is ∇F (a) · (x − a) = 0.

This formula for the tangent plane to a surface agrees with the one we gave in
§2.2 when the surface is the graph of a function f (x, y). The easy verification is
left to the reader (Exercise 5).
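Theorem 2.37 and Corollary 2.38 can be illustrated on a sphere. In the sketch below, the surface F(x, y, z) = x^2 + y^2 + z^2 − 14 = 0, the point a = (1, 2, 3), and the great circle through a are all hypothetical choices. The code checks that the curve stays on the surface, that ∇F is orthogonal to the curve's velocity, and that a chosen point satisfies the tangent plane equation ∇F (a) · (x − a) = 0, which here reads x + 2y + 3z = 14.

```python
import math

def F(p):
    # Hypothetical surface: the sphere of radius sqrt(14) about the origin.
    return p[0] ** 2 + p[1] ** 2 + p[2] ** 2 - 14.0

def grad_F(p):
    return (2.0 * p[0], 2.0 * p[1], 2.0 * p[2])

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a = (1.0, 2.0, 3.0)
scale = math.sqrt(14.0 / 5.0)
b = (2.0 * scale, -1.0 * scale, 0.0)  # orthogonal to a, same length as a

def curve(t):
    # Great circle on the sphere with curve(0) = a.
    return tuple(math.cos(t) * ai + math.sin(t) * bi for ai, bi in zip(a, b))

def curve_velocity(t):
    return tuple(-math.sin(t) * ai + math.cos(t) * bi for ai, bi in zip(a, b))

def tangent_plane_residual(x):
    # Left side of the tangent plane equation grad F(a) . (x - a) = 0.
    return dot(grad_F(a), tuple(xi - ai for xi, ai in zip(x, a)))

for t in (0.0, 0.3, 1.0):
    print(t, F(curve(t)), dot(grad_F(curve(t)), curve_velocity(t)))
print(tangent_plane_residual((14.0, 0.0, 0.0)))
```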
A similar result holds if we have two equations F (x, y, z) = 0 and G(x, y, z) =
0. Each of them (usually) represents a surface, and the intersection of the two
surfaces is (usually) a curve. At any point a on this curve, the vectors ∇F (a) and
∇G(a) are both perpendicular to the curve, and if they are linearly independent,
they span the normal plane to the curve at a.
These ideas carry over into dimensions other than 3. For n = 2, an equation
F (x, y) = 0 typically represents a curve C, and ∇F (a, b) is normal to C at each
(a, b) ∈ C. For n > 3, we simply stretch our imagination to say that ∇F (a) is
normal to the hypersurface defined by F (x) = 0 at x = a.

EXERCISES

In these exercises, all functions in question are assumed to be differentiable.


1. Find the indicated derivatives of w in terms of the derivatives of f, g, h.
a. w = f (x, y, t), x = g(y, t), y = h(t). What is dw/dt?
b. w = f (x, u, v), u = g(x, y), v = h(x, z). What are ∂x w, ∂y w, ∂z w?
(∂x w refers to the complete dependence of w on x, as opposed to ∂1 f .)
c. w = f (u), u = g(x, y), y = h(x). What is dw/dx?
2. Find ∂x w and ∂y w in terms of the partial derivatives ∂1 f , ∂2 f , and ∂3 f .
a. w = f (2x − y^2, x sin 3y, x^4).
b. w = f (e^{x−3y}, log(x^2 + 1), y^4 + 4).
c. w = arctan[f (y^2, 2x − y, −4)].
3. Show that the given function u satisfies the given differential equation.
a. u = f (3x + 2y); 2∂x u − 3∂y u = 0.
b. u = xy + xf (y/x); x∂x u + y∂y u − u = xy.
c. u = f (xz, yz); x∂x u + y∂y u = z∂z u.
4. Let u = f (r) and r = |x| = (x1^2 + · · · + xn^2)^{1/2}. Show that Σ_{j=1}^{n} (∂u/∂xj )^2 =
[f ′(r)]^2.
5. Show that the formula for the tangent plane to the surface z = f (x, y) given in
§2.2 coincides with the formula for the tangent plane to the surface F (x, y, z) =
0 given in this section, when F (x, y, z) = f (x, y) − z.
6. Find the tangent plane to the surface in R^3 described by the given equation at
the given point a ∈ R^3.
a. z = x^2 − y^3, a = (2, −1, 5).
b. x^2 + 2y^2 + 3z^2 = 6, a = (1, 1, −1).
c. z = √x + arctan y, a = (9, 0, 3).
d. xyz^2 − log(z − 1) = 8, a = (−2, −1, 2).
7. Suppose ϕ(x) is defined by a formula in which x occurs in several places.
(For example, there are three x’s in ϕ(x) = x2 ex /(x + 3).) Show that the
derivative ϕ′ (x) is obtained by differentiating with respect to each of the x’s
in turn, treating the others as constants, and adding the results. (Hint: If x
occurs in n places in the formula for ϕ, let F (x1 , . . . , xn ) be the function of
n variables obtained by replacing each of the x’s in the formula by a different
variable. How do you express ϕ in terms of F ?) Notice that the rules for
differentiating sums and products are special cases of this result, obtained by
taking ϕ(x) = f (x) + g(x) or ϕ(x) = f (x)g(x). What is the derivative of
ϕ(x) = f (x)^{g(x)}?

2.4 The Mean Value Theorem


The mean value theorem for functions of n variables can be stated as follows. We
recall that if a and b are two points in Rn , the line passing through them can be
described parametrically by g(t) = a + t(b − a). In particular, the line segment
whose endpoints are a and b is the set of points a + t(b − a) with 0 ≤ t ≤ 1.

2.39 Theorem (Mean Value Theorem III). Let S be a region in Rn that contains
the points a and b as well as the line segment L that joins them. Suppose that f is
a function defined on S that is continuous at each point of L and differentiable at
each point of L except perhaps the endpoints a and b. Then there is a point c on L
such that
f (b) − f (a) = ∇f (c) · (b − a).

Proof. Let h = b − a; then L = {a + th : 0 ≤ t ≤ 1}. Define ϕ(t) = f (a + th)
for 0 ≤ t ≤ 1. Since f is continuous on L, ϕ is continuous on [0, 1]. Moreover, by
the chain rule, ϕ is differentiable on (0, 1) and
\[ \phi'(t) = \nabla f(a+th)\cdot\frac{d}{dt}(a+th) = \nabla f(a+th)\cdot h = \nabla f(a+th)\cdot(b-a). \]
By the one-variable mean value theorem, there is a point u ∈ (0, 1) such that
ϕ(1) − ϕ(0) = ϕ′(u) · (1 − 0) = ϕ′(u). Let c = a + uh; then
\[ f(b) - f(a) = \phi(1) - \phi(0) = \phi'(u) = \nabla f(c)\cdot(b-a). \]
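The theorem only asserts that a suitable point c exists, but for a specific function it can be located numerically. The sketch below assumes the hypothetical sample f(x, y) = x^3 + y^2 with a = (0, 0) and b = (1, 2), and finds the parameter u by bisection. Bisection is not part of the proof; it works here only because ∇f (a + uh) · h − [f (b) − f (a)] changes sign on [0, 1] for this sample, which is assumed rather than guaranteed in general.

```python
def f(p):
    # Hypothetical sample function f(x, y) = x^3 + y^2.
    return p[0] ** 3 + p[1] ** 2

def grad_f(p):
    return (3.0 * p[0] ** 2, 2.0 * p[1])

def mean_value_point(a, b, iterations=60):
    """Find u in [0, 1] with grad f(a + u h) . h = f(b) - f(a), h = b - a.

    Uses bisection, assuming the gap below changes sign on [0, 1].
    """
    h = tuple(bi - ai for ai, bi in zip(a, b))
    target = f(b) - f(a)

    def gap(u):
        c = tuple(ai + u * hi for ai, hi in zip(a, h))
        return sum(gi * hi for gi, hi in zip(grad_f(c), h)) - target

    left, right = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (left + right)
        if gap(left) * gap(mid) <= 0.0:
            right = mid
        else:
            left = mid
    u = 0.5 * (left + right)
    c = tuple(ai + u * hi for ai, hi in zip(a, h))
    return u, c

a = (0.0, 0.0)
b = (1.0, 2.0)
u, c = mean_value_point(a, b)
print(u, c)
```

For this sample ϕ′(u) = 3u^2 + 8u and f (b) − f (a) = 5, so u is the positive root of 3u^2 + 8u − 5 = 0.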

To state the principal corollaries of the mean value theorem, we need a defini-
tion. A set S ⊂ Rn is called convex if whenever a, b ∈ S, the line segment from
a to b also lies in S. Clearly every convex set is arcwise connected (line segments
are arcs!), but most connected sets are not convex. See Figure 2.4.

EXAMPLE 1. Every ball is convex. Indeed, let B = {x : |x − c| < r} be the
ball of radius r about c. If a, b ∈ B, for 0 ≤ t ≤ 1 we have
\[ \begin{aligned} \bigl|[a + t(b-a)] - c\bigr| &= \bigl|(1-t)(a-c) + t(b-c)\bigr| \\ &\le (1-t)|a-c| + t|b-c| < (1-t)r + tr = r, \end{aligned} \]
so a + t(b − a) ∈ B. (We have used the fact that t and 1 − t are both nonnegative
when 0 ≤ t ≤ 1.)

2.40 Corollary. Suppose that f is differentiable on an open convex set S and
|∇f (x)| ≤ M for every x ∈ S. Then |f (b) − f (a)| ≤ M |b − a| for all a, b ∈ S.

Proof. The line segment from a to b lies in S, and for some c on this segment we
have f (b) − f (a) = ∇f (c) · (b − a). Hence, by Cauchy’s inequality, |f (b) −
f (a)| ≤ |∇f (c)| |b − a| ≤ M |b − a|.

2.41 Corollary. Suppose f is differentiable on an open convex set S and ∇f (x) =
0 for all x ∈ S. Then f is constant on S.

Proof. Pick a ∈ S and take M = 0 in Corollary 2.40. We conclude that for every
b ∈ S, |f (b) − f (a)| = 0, that is, f (b) = f (a).

The hypothesis of convexity is essential in Corollary 2.40. In a situation like
that of the set S2 in Figure 2.4, |b − a| is small, but f (b) − f (a) could be quite large
even when |∇f | is small in S2 . (Think of a gently sloping spiral ramp.) However,
Corollary 2.41 can be generalized substantially.

FIGURE 2.4: A convex set (S1 ), a set that is connected but not convex
(S2 ), and a disconnected set (S3 ).

2.42 Theorem. Suppose that f is differentiable on an open connected set S and
∇f (x) = 0 for all x ∈ S. Then f is constant on S.
Proof. Pick a ∈ S, and define S1 = {x ∈ S : f (x) = f (a)} and S2 = {x ∈ S :
f (x) ≠ f (a)}. We shall show that S2 must be empty, and hence that f is constant
on S = S1 , by showing that otherwise (S1 , S2 ) would be a disconnection of S.
Clearly S1 and S2 are disjoint and their union is S; moreover, a ∈ S1 . The
set S2 is open (by Theorem 1.13) because the complement of the point f (a) is
an open subset of R. The set S1 is also open, for the following reason. Suppose
x ∈ S1 . Since S is open, there is a ball B centered at x that is contained in S.
Since B is convex, f is constant on B by Corollary 2.41, and hence B ⊂ S1 . That
is, every x ∈ S1 is an interior point of S1 , so S1 is open. Since both S1 and S2
are open, neither one can intersect the closure of the other one without intersecting
the other one itself. But clearly S1 and S2 are disjoint, their union is S, and S1 is
nonempty since it contains a. Therefore, (S1 , S2 ) is a disconnection of S unless S2
is empty.

The hypothesis of connectedness is necessary here. If S = S ′ ∪ S ′′ where S ′
and S ′′ are open and disjoint, we obtain a counterexample by taking f (x) = 0 for
x ∈ S ′ and f (x) = 1 for x ∈ S ′′ . (See Figure 2.4. Differentiability of a function f
on the set S3 there affords no control over the relation between the values of f at a
and b.)

EXERCISES
1. State and prove two analogues of Rolle’s theorem for functions of several vari-
ables, whose hypotheses are, respectively, the following:
a. f is differentiable on a set containing the line segment from a to b, and
f (a) = f (b).
b. f is differentiable on a bounded open set S, continuous on the closure of
S, and constant on the boundary of S.
2. Question: If f is differentiable on a connected open set S and ∂1 f (x) = 0 for
all x ∈ S, must f be independent of x1 on S (that is, f (a) = f (b) whenever
a, b ∈ S and aj = bj for all j ̸= 1)?
a. Show that the answer is yes when S is convex.
b. Give a counterexample to show that the answer is no in general. (Hint:
Think of a staircase where you go halfway up on one flight, make a 180◦
turn on a flat landing, then go the rest of the way up on a second flight
parallel to the first one.)

2.5 Functional Relations and Implicit Functions: A First Look
Often we are presented with an equation F (x1 , . . . , xn ) = 0 relating a collection
of variables x1 , . . . , xn . (There is no harm in taking the right side to be 0; just
move everything over to the left side of the equation.) It may be possible to solve
this equation for one of the variables in terms of the remaining ones, say xn =
g(x1 , . . . , xn−1 ), and we wish to study the resulting function g in terms of the
original function F .
To make things clearer, let us change the notation a little, replacing n by n +
1 and denoting the last variable xn by y; thus, the given equation has the form
F (x1 , . . . , xn , y) = 0, and it is supposed to determine y as a function of x =
(x1 , . . . , xn ).
Let us be clear about what we mean by saying that “it is possible to solve for
y.” First, we mean that it is possible to solve in principle, not necessarily that there
is an explicit formula for y. Second, there might be more than one solution, and
obtaining y as a function of the xj ’s then involves making a definite choice among
the solutions; moreover, the domain of this function may be smaller than one would
suspect from the original equation.
EXAMPLE 1.
a. Consider the equation $x - y - y^5 = 0$. It’s easy to solve this for x in terms
of y, $x = y + y^5$, but there is no nice algebraic formula for y in terms of
x. However, $y + y^5$ is a strictly increasing function of y (its derivative is
$1 + 5y^4$, which is positive everywhere), and its values clearly range from
−∞ to ∞, so for each x there is exactly one y satisfying $x = y + y^5$, and
we can call it g(x). The object in such a situation is to use the equation
$x = y + y^5$ to study the function g.
b. The equation $x^2 + y^2 + z^2 = 1$ can be solved for z as a continuous function
of x and y in two ways, $z = \sqrt{1 - x^2 - y^2}$ and $z = -\sqrt{1 - x^2 - y^2}$, both
of which are defined only for $x^2 + y^2 \le 1$.

At this stage we are not going to worry about these matters, or about the ques-
tion of when it is possible to solve the equation at all; such questions will be ad-
dressed in Chapter 3. Rather, we shall assume that there is a differentiable function
g(x1 , . . . , xn ), defined for x1 , . . . , xn in some region S ⊂ Rn , so that the equation
F (x1 , . . . , xn , y) = 0 is satisfied identically when g(x1 , . . . , xn ) is substituted for
y:
\[ F\bigl(x_1, \dots, x_n, g(x_1, \dots, x_n)\bigr) \equiv 0, \qquad (x_1, \dots, x_n) \in S. \tag{2.43} \]

In this situation we can use the chain rule to compute the partial derivatives
of g in terms of the partial derivatives of F , simply by differentiating the equation
(2.43) with respect to the variables xj :

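\[ \partial_j F + \partial_{n+1} F \, \frac{\partial g}{\partial x_j} = 0, \quad\text{so}\quad \frac{\partial g}{\partial x_j} = -\frac{\partial_j F}{\partial_{n+1} F}. \tag{2.44} \]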

EXAMPLE 1 (continued).
a. Differentiation of the equation $x - y - y^5 = 0$ with respect to x yields
$1 - (dy/dx) - 5y^4 (dy/dx) = 0$, or $dy/dx = 1/(1 + 5y^4)$. Of course,
this gives dy/dx in terms of y instead of x, and we don’t have a formula
for y in terms of x, but this is better than nothing!
b. Differentiation of $x^2 + y^2 + z^2 = 1$ with respect to x, with z as the dependent
variable, gives $2x + 2z(\partial z/\partial x) = 0$, or $\partial z/\partial x = -x/z$. It is easily
verified that this formula is correct whether we take $z = \sqrt{1 - x^2 - y^2}$ or
$z = -\sqrt{1 - x^2 - y^2}$.
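
As a numerical illustration of Example 1a (a minimal sketch, not part of the original text), the following Python code computes g(x) by bisection and compares a difference quotient for g with the formula dy/dx = 1/(1 + 5y^4). The names solve_y, implicit_slope, and difference_quotient, the bracketing interval, and the step size h are assumptions made for this illustration.

```python
def solve_y(x):
    """Return the unique y with y + y**5 == x, located by bisection."""
    # y + y^5 is strictly increasing, and |y| <= |y + y^5| for every real y,
    # so the root lies inside the interval below.
    lo, hi = -1.0 - abs(x), 1.0 + abs(x)
    for _ in range(200):  # far more halvings than double precision needs
        mid = (lo + hi) / 2.0
        if mid + mid**5 < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def implicit_slope(y):
    """dy/dx obtained by implicit differentiation of x - y - y^5 = 0."""
    return 1.0 / (1.0 + 5.0 * y**4)


def difference_quotient(x, h=1e-5):
    """Central difference approximation to g'(x), where g = solve_y."""
    return (solve_y(x + h) - solve_y(x - h)) / (2.0 * h)


if __name__ == "__main__":
    x = 2.0
    y = solve_y(x)  # here y is 1 up to rounding, since 1 + 1**5 = 2
    print(y, implicit_slope(y), difference_quotient(x))
```

The two slopes agree to several digits, even though no explicit formula for g is available.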

In a related situation, we may wish to differentiate a function ϕ(x1 , . . . , xn , y)
where the variables x1 , . . . , xn , y satisfy a relation F (x1 , . . . , xn , y) = 0. Assuming,
as before, that the equation F (x1 , . . . , xn , y) = 0 can be solved for y, say
y = g(x1 , . . . , xn ), it then becomes a matter of applying the chain rule to the
composite function
\[ w = \phi\bigl(x_1, \dots, x_n, g(x_1, \dots, x_n)\bigr), \]
to obtain
\[ \frac{\partial w}{\partial x_j} = \partial_j \phi + (\partial_{n+1} \phi)(\partial_j g). \]
The derivatives ∂j g can then be evaluated by using (2.44).


In such a situation, however, there is a tricky point that must be confronted. Let
us explain it in the case of three variables for simplicity. That is, suppose we are
given w = ϕ(x, y, z) where x, y, z are constrained to satisfy F (x, y, z) = 0, and
suppose we can solve the latter equation for any one of the three variables in terms
of the other two. If we take x as an independent variable, the meaning of ∂w/∂x
depends critically on whether we take y or z as the other independent variable.

EXAMPLE 2. Let $w = x^2 + y^2 + z$, and suppose x, y, z are constrained to
satisfy x + y + z = 0. If we take x and y as independent variables, then
z = −(x + y), so
\[ w = x^2 + y^2 - x - y, \qquad \frac{\partial w}{\partial x} = 2x - 1. \]
But if we take x and z as independent variables, then y = −(x + z), and
\[ w = x^2 + (x + z)^2 + z = 2x^2 + 2xz + z^2 + z, \qquad \frac{\partial w}{\partial x} = 4x + 2z. \]
Clearly, these two formulas for ∂w/∂x almost never agree.

The usual way to clarify this situation is to put subscripts on the partial
derivatives to indicate which variables are being held fixed:
\[ \left.\frac{\partial w}{\partial x}\right|_y = \text{derivative of $w$ with respect to $x$ when $y$ is fixed.} \]

Thus, in Example 2,
\[ \left.\frac{\partial w}{\partial x}\right|_y = 2x - 1, \qquad \left.\frac{\partial w}{\partial x}\right|_z = 4x + 2z. \]
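
The dependence of ∂w/∂x on the choice of the second independent variable can be checked numerically. The following Python sketch (an illustration under our own naming, not part of the original text) eliminates z or y using the constraint x + y + z = 0 from Example 2 and differentiates with respect to x by a central difference.

```python
def w_with_xy_independent(x, y):
    """w = x^2 + y^2 + z, with z eliminated via z = -(x + y)."""
    z = -(x + y)
    return x**2 + y**2 + z


def w_with_xz_independent(x, z):
    """w = x^2 + y^2 + z, with y eliminated via y = -(x + z)."""
    y = -(x + z)
    return x**2 + y**2 + z


def partial_in_first_argument(func, x, other, h=1e-6):
    """Central difference in x while the second argument is held fixed."""
    return (func(x + h, other) - func(x - h, other)) / (2.0 * h)


if __name__ == "__main__":
    # The point (x, y, z) = (1, 2, -3) satisfies x + y + z = 0.
    x, y, z = 1.0, 2.0, -3.0
    print(partial_in_first_argument(w_with_xy_independent, x, y))  # close to 2x - 1
    print(partial_in_first_argument(w_with_xz_independent, x, z))  # close to 4x + 2z
```

Both functions represent the same quantity w at the same point of the constraint surface; only the variable held fixed differs.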

The preceding ideas work in much the same way when we are given more than
one constraint equation. For example, if we are given two equations F (x, y, u, v) =
0 and G(x, y, u, v) = 0, we may be able to solve them for the two variables u and
v in terms of the other two variables x and y. In this case the partial derivatives
of u and v with respect to x, say, can be calculated by differentiating the equations
F = 0 and G = 0, obtaining
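\[ \partial_x F + \partial_u F \, \frac{\partial u}{\partial x} + \partial_v F \, \frac{\partial v}{\partial x} = 0, \]
\[ \partial_x G + \partial_u G \, \frac{\partial u}{\partial x} + \partial_v G \, \frac{\partial v}{\partial x} = 0, \]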
and then solving these (linear!) equations simultaneously for ∂u/∂x and ∂v/∂x.
By Cramer’s rule (Appendix A, (A.54)), the result is
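\[ \frac{\partial u}{\partial x} = -\frac{\det\begin{pmatrix} \partial_x F & \partial_v F \\ \partial_x G & \partial_v G \end{pmatrix}}{\det\begin{pmatrix} \partial_u F & \partial_v F \\ \partial_u G & \partial_v G \end{pmatrix}}, \qquad \frac{\partial v}{\partial x} = -\frac{\det\begin{pmatrix} \partial_u F & \partial_x F \\ \partial_u G & \partial_x G \end{pmatrix}}{\det\begin{pmatrix} \partial_u F & \partial_v F \\ \partial_u G & \partial_v G \end{pmatrix}}. \]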

EXAMPLE 3. Suppose the quantities x, y, and z are initially equal to 1, 0, and
2, respectively, and are constrained to satisfy the equations
$x^5 + x(y^3 + 1)z - 2yz^5 = 3$ and $yz = \sin(2x + y - z)$. By about how much do
y and z change if x is changed to 1.02?
Solution. We need to find dy/dx and dz/dx, which we abbreviate as y ′
and z ′ . Differentiating the two equations with respect to x, treating y and z as
implicit functions of x, we obtain

\[ 5x^4 + (y^3 + 1)z + 3xy^2 z y' + x(y^3 + 1)z' - 2z^5 y' - 10yz^4 z' = 0, \]
\[ zy' + yz' = \cos(2x + y - z) \cdot (2 + y' - z'). \]

We could solve these equations for y ′ and z ′ as they stand, but since we are
interested in the answer at (x, y, z) = (1, 0, 2), we can simplify matters by
substituting in these values right now. The first equation reduces to 7 + z ′ −
64y ′ = 0 and the second one to 2y ′ = 2 + y ′ − z ′ , or
\[ 64y' - z' = 7, \qquad y' + z' = 2 \qquad \bigl(\text{when } (x, y, z) = (1, 0, 2)\bigr). \]
Solving these equations yields $y' = \frac{9}{65}$ and $z' = \frac{121}{65}$, so, returning to
the original question, $dy = y'\,dx = \frac{9}{65}(.02) = \frac{9}{3250}$ and
$dz = z'\,dx = \frac{121}{65}(.02) = \frac{121}{3250}$.
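
The arithmetic of Example 3 can be reproduced exactly with rational numbers. The sketch below (illustrative only; the helper cramer_2x2 and the variable names are our own) solves the linear system 64y′ − z′ = 7, y′ + z′ = 2 by Cramer's rule and forms the differentials for dx = 0.02.

```python
from fractions import Fraction


def cramer_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*p + a12*q = b1, a21*p + a22*q = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the coefficient matrix is singular")
    p = Fraction(b1 * a22 - a12 * b2, det)
    q = Fraction(a11 * b2 - b1 * a21, det)
    return p, q


# Linearized constraints at (x, y, z) = (1, 0, 2):
#   64 y' - z' = 7,   y' + z' = 2.
y_prime, z_prime = cramer_2x2(64, -1, 7, 1, 1, 2)

dx = Fraction(2, 100)   # x changes from 1 to 1.02
dy = y_prime * dx       # first-order change in y
dz = z_prime * dx       # first-order change in z

print(y_prime, z_prime)  # 9/65 121/65
print(dy, dz)            # 9/3250 121/3250
```

These are first-order estimates; the actual changes in y and z differ from dy and dz by terms of higher order in dx.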

EXERCISES

1. Compute ∂z/∂x and ∂z/∂y when z is determined as a function of y and x by
the following equations:
a. $x + y^2 + z^3 = 3xyz$.
b. $2x^2 + 3y^2 + z^2 = e^{-z}$.
2. Suppose y and z are determined as functions of x by the equations z = x2 − y 2
and z = 2x+4y. Find dy/dx and dz/dx (a) by solving the equations explicitly
for y and z; (b) by implicit differentiation.
3. Compute dy/dt and dz/dt when y and z are determined as functions of t by
the equations $y^5 + e^{yz} + zt^2 = 1$ and $y^2 + z^4 = t^2$.
4. If u = x2 + 3y 2 and y = xz, there are two possible meanings for ∂u/∂x
depending on whether the independent variables are taken as (x, y) or (x, z).
Compute both of them.
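5. Let $V = \pi r^2 h$ and $S = 2\pi r(r + h)$ (the volume and surface area of a circular
cylinder). Compute
\[ \left.\frac{\partial V}{\partial h}\right|_r, \qquad \left.\frac{\partial V}{\partial h}\right|_S, \qquad \left.\frac{\partial V}{\partial S}\right|_r, \qquad \left.\frac{\partial S}{\partial V}\right|_r, \]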

where the subscript indicates the variable that is being held fixed.
6. Suppose that F (x, y, z) = 0 is an equation that can be solved to yield any of
the three variables as a function of the other two. Show that
\[ \frac{\partial x}{\partial y} \, \frac{\partial y}{\partial z} \, \frac{\partial z}{\partial x} = -1, \]
∂y ∂z ∂x

provided that the symbols are interpreted properly. (Part of the problem is to
say what the proper interpretation is.)
7. Suppose that the variables E, T , V , and P are related by a pair of equations,
f (E, T, V, P ) = 0 and g(E, T, V, P ) = 0, that can be solved for any two of the
variables in terms of the other two, and suppose that the differential equation
∂V E − T ∂T P + P = 0 is satisfied when V and T are taken as the independent
variables. Show that ∂P E + T ∂T V + P ∂P V = 0 when P and T are taken as
the independent variables. (This example comes from thermodynamics, where
E, T , V , and P represent energy, temperature, volume, and pressure.)

2.6 Higher-Order Partial Derivatives


If f is a differentiable function on an open set S ⊂ Rn , its partial derivatives ∂j f
are also functions on S, and they themselves may have partial derivatives. The
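standard notations for the second-order derivative
\[ \frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_j}\right) \]
are
\[ \frac{\partial^2 f}{\partial x_i \, \partial x_j}, \quad f_{x_j x_i}, \quad f_{ji}, \quad \partial_{x_i} \partial_{x_j} f, \quad \partial_i \partial_j f \]
if i ̸= j and
\[ \frac{\partial^2 f}{\partial x_j^2}, \quad f_{x_j x_j}, \quad f_{jj}, \quad \partial_{x_j}^2 f, \quad \partial_j^2 f \]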
if i = j. The analogues of these notations for higher-order partial derivatives
should be pretty clear. However, all of them become quite cumbersome when the
order of the derivative is even moderately large. There is a more compact notation
for partial derivatives of arbitrary order that we shall introduce below.
A function f is said to be of class C k on an open set U if all of its partial
derivatives of order ≤ k — that is, all the derivatives ∂i1 ∂i2 · · · ∂il f , for all choices
of the indices ij and all l ≤ k — exist and are continuous on U . We also say that f
is of class C k on a nonopen set S if it is of class C k on some open set that includes
S. If the partial derivatives of f of all orders exist and are continuous on U , f is
said to be of class C ∞ on U .
It is common to refer to the derivatives ∂j2 f and ∂i ∂j f (i ̸= j) as pure and
mixed second-order partial derivatives of f , respectively. In this connection, a
question that immediately arises is whether the order of differentiation matters.
In other words, is ∂i ∂j f the same as ∂j ∂i f ? Experimentation with elementary
examples suggests that the answer is yes.
EXAMPLE 1. If $g(x, y) = x \sin(x^3 + e^{2y})$, we have
\[ \partial_x g = \sin(x^3 + e^{2y}) + 3x^3 \cos(x^3 + e^{2y}), \qquad \partial_y g = 2xe^{2y} \cos(x^3 + e^{2y}). \]
Differentiating ∂x g with respect to y and ∂y g with respect to x yields
\[ \partial_y \partial_x g(x, y) = 2e^{2y} \cos(x^3 + e^{2y}) - 6x^3 e^{2y} \sin(x^3 + e^{2y}) = \partial_x \partial_y g(x, y). \]
However, the following example shows that ∂i ∂j f may fail to coincide with
∂j ∂i f .
EXAMPLE 2. Let
\[ f(x, y) = \frac{xy(x^2 - y^2)}{x^2 + y^2} \ \text{ if } (x, y) \ne (0, 0), \qquad f(0, 0) = 0. \]
Since f (x, 0) = f (0, y) = 0 for all x, y, we have ∂x f (0, 0) = ∂y f (0, 0) = 0,
and a little calculation shows that for (x, y) ̸= (0, 0),
\[ \partial_x f(x, y) = \frac{x^4 y + 4x^2 y^3 - y^5}{(x^2 + y^2)^2}, \qquad \partial_y f(x, y) = \frac{x^5 - 4x^3 y^2 - xy^4}{(x^2 + y^2)^2}. \]
In particular, ∂x f (0, y) = −y and ∂y f (x, 0) = x for all x, y, so
\[ \partial_y \partial_x f(0, 0) = -1 \quad\text{but}\quad \partial_x \partial_y f(0, 0) = 1. \]
On the other hand, another little calculation shows that
\[ \partial_y \partial_x f(x, y) = \partial_x \partial_y f(x, y) = \frac{x^6 + 9x^4 y^2 - 9x^2 y^4 - y^6}{(x^2 + y^2)^3} \quad\text{for } (x, y) \ne (0, 0). \]
This last expression has no limit as (x, y) → (0, 0) (approaching (0,0) along
different straight lines gives different limits). Thus, we see that ∂y ∂x f and
∂x ∂y f exist everywhere, are continuous except at the origin, and are equal
except at the origin.
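
The unequal mixed partials at the origin in Example 2 can be observed numerically. The following Python sketch (an illustration only; the function names and the step size are our own choices) uses the formulas for ∂x f and ∂y f given in the example and differentiates each once more at (0, 0) by a central difference.

```python
def fx(x, y):
    """The partial derivative of f with respect to x, from Example 2."""
    if x == 0 and y == 0:
        return 0.0
    return (x**4 * y + 4 * x**2 * y**3 - y**5) / (x**2 + y**2) ** 2


def fy(x, y):
    """The partial derivative of f with respect to y, from Example 2."""
    if x == 0 and y == 0:
        return 0.0
    return (x**5 - 4 * x**3 * y**2 - x * y**4) / (x**2 + y**2) ** 2


h = 1e-6
# d/dy of fx along the y-axis at the origin: approximates ∂y∂x f(0, 0).
dy_dx_f = (fx(0.0, h) - fx(0.0, -h)) / (2 * h)
# d/dx of fy along the x-axis at the origin: approximates ∂x∂y f(0, 0).
dx_dy_f = (fy(h, 0.0) - fy(-h, 0.0)) / (2 * h)

print(dy_dx_f, dx_dy_f)  # close to -1 and 1 respectively
```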
Fortunately, the pathological behavior in Example 2 is quite atypical. The fol-
lowing theorem guarantees that the order of differentiation is immaterial in most
situations that arise in practice.
2.45 Theorem. Let f be a function defined in an open set S ⊂ Rn . Suppose a ∈ S
and i, j ∈ {1, . . . , n}. If the derivatives ∂i f , ∂j f , ∂i ∂j f , and ∂j ∂i f exist in S, and
if ∂i ∂j f and ∂j ∂i f are continuous at a, then ∂i ∂j f (a) = ∂j ∂i f (a).
Proof. Since only the variables xi and xj are actually involved here, we may as well
assume that n = 2 and write x = (x, y) and a = (a, b), so that we are studying
the derivatives ∂x ∂y f and ∂y ∂x f . These derivatives can be regarded as limits of
second-order difference quotients, so we begin by examining the “difference of
differences” obtained when x and y are both changed by an amount h:
\[ \begin{aligned} D &= \bigl[f(a + h, b + h) - f(a + h, b)\bigr] - \bigl[f(a, b + h) - f(a, b)\bigr] \\ &= \bigl[f(a + h, b + h) - f(a, b + h)\bigr] - \bigl[f(a + h, b) - f(a, b)\bigr]. \end{aligned} \]
That is, if we set
ϕ(t) = f (a + h, b + t) − f (a, b + t), ψ(t) = f (a + t, b + h) − f (a + t, b),
we have
D = ϕ(h) − ϕ(0) = ψ(h) − ψ(0).
We apply the (one-variable) mean value theorem twice to the first expression for
D, obtaining
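\[ \begin{aligned} D = \phi'(v)h &= \bigl[\partial_y f(a + h, b + v) - \partial_y f(a, b + v)\bigr] h \\ &= \partial_x \partial_y f(a + u, b + v) h^2, \end{aligned} \]
where u and v are some numbers between 0 and h. Likewise, using the second
expression for D, we obtain
\[ \begin{aligned} D = \psi'(\tilde u)h &= \bigl[\partial_x f(a + \tilde u, b + h) - \partial_x f(a + \tilde u, b)\bigr] h \\ &= \partial_y \partial_x f(a + \tilde u, b + \tilde v) h^2, \end{aligned} \]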
where $\tilde u$ and $\tilde v$ are some other numbers between 0 and h. Equating these two
expressions and cancelling the $h^2$, we have
\[ \partial_x \partial_y f(a + u, b + v) = \partial_y \partial_x f(a + \tilde u, b + \tilde v). \]
Now let h → 0. Then $u, v, \tilde u, \tilde v \to 0$ also, so since ∂x ∂y f and ∂y ∂x f are assumed
continuous at (a, b), we obtain ∂x ∂y f (a, b) = ∂y ∂x f (a, b).
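
The "difference of differences" D in the proof can be examined numerically. The sketch below (illustrative, with names of our own choosing) computes D/h² for the function g of Example 1 and compares it with the formula for the mixed partial derivative found there; the base point (0.5, 0.2) and the step h are arbitrary assumptions.

```python
import math


def g(x, y):
    """The function of Example 1: g(x, y) = x sin(x^3 + e^(2y))."""
    return x * math.sin(x**3 + math.exp(2.0 * y))


def mixed_partial_formula(x, y):
    """The common value of the two mixed partials of g, from Example 1."""
    s = x**3 + math.exp(2.0 * y)
    e2y = math.exp(2.0 * y)
    return 2.0 * e2y * math.cos(s) - 6.0 * x**3 * e2y * math.sin(s)


def second_difference_quotient(f, a, b, h):
    """D / h^2, where D is the difference of differences from the proof."""
    d = (f(a + h, b + h) - f(a + h, b)) - (f(a, b + h) - f(a, b))
    return d / h**2


if __name__ == "__main__":
    a, b = 0.5, 0.2
    for h in (1e-2, 1e-3, 1e-4):
        print(h, second_difference_quotient(g, a, b, h), mixed_partial_formula(a, b))
```

As h decreases, D/h² approaches the mixed partial derivative, as the proof predicts for a function of class C².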

2.46 Corollary. If f is of class $C^2$ on an open set S, then ∂i ∂j f = ∂j ∂i f on S, for
all i and j.

Once this is known, an elementary but slightly messy inductive argument shows
that the analogous result for higher-order derivatives is also true:

2.47 Theorem. If f is of class C k on an open set S, then

∂i1 ∂i2 · · · ∂ik f = ∂j1 ∂j2 · · · ∂jk f on S

whenever the sequence {j1 , . . . , jk } is a reordering of the sequence {i1 , . . . , ik }.

The fact that the order of differentiation in a mixed partial derivative can occa-
sionally matter is a technicality that is of essentially no importance in applications.
In fact, by adopting a more sophisticated viewpoint one can prove a theorem to
the effect that, under very general conditions, ∂i ∂j f and ∂j ∂i f are always equal
“almost everywhere,” which is enough to allow regarding them as equal for all
practical purposes.
The chain rule can be used to compute higher-order partial derivatives of com-
posite functions, but there are some pitfalls to be avoided. To be concrete, suppose
that w = f (x, y) and that x and y are functions of s and t. Assume that all the
functions in question are at least of class C 2 . To begin with, the chain rule for
first-order derivatives gives
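\[ \frac{\partial w}{\partial s} = \frac{\partial w}{\partial x} \frac{\partial x}{\partial s} + \frac{\partial w}{\partial y} \frac{\partial y}{\partial s}. \tag{2.48} \]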

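If we want to compute $\partial^2 w/\partial s^2$, we differentiate (2.48) with respect to s, obtaining
\[ \frac{\partial^2 w}{\partial s^2} = \frac{\partial}{\partial s}\left(\frac{\partial w}{\partial x}\right) \frac{\partial x}{\partial s} + \frac{\partial w}{\partial x} \frac{\partial^2 x}{\partial s^2} + \frac{\partial}{\partial s}\left(\frac{\partial w}{\partial y}\right) \frac{\partial y}{\partial s} + \frac{\partial w}{\partial y} \frac{\partial^2 y}{\partial s^2}. \tag{2.49} \]
The first pitfall is to write $\frac{\partial}{\partial s}\left(\frac{\partial w}{\partial x}\right)$ as a mixed partial derivative $\frac{\partial^2 w}{\partial s \, \partial x}$. This
makes no sense because when we write ∂w/∂x we are thinking of w as a function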
of x and y, not x and s. Rather, ∂w/∂x is a function of x and y just like w, and
to differentiate it with respect to s we use the chain rule again; and likewise for
∂w/∂y:
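\[ \frac{\partial}{\partial s}\left(\frac{\partial w}{\partial x}\right) = \frac{\partial^2 w}{\partial x^2} \frac{\partial x}{\partial s} + \frac{\partial^2 w}{\partial x \, \partial y} \frac{\partial y}{\partial s}, \qquad \frac{\partial}{\partial s}\left(\frac{\partial w}{\partial y}\right) = \frac{\partial^2 w}{\partial x \, \partial y} \frac{\partial x}{\partial s} + \frac{\partial^2 w}{\partial y^2} \frac{\partial y}{\partial s}. \tag{2.50} \]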
Now we plug these results into (2.49) to get the final answer, which thus contains
quite a few terms. Pitfall number 2: It’s easy to forget some of these terms.
In this situation it’s usually advantageous to use the notation fx and fy in-
stead of ∂w/∂x and ∂w/∂y, and likewise for second-order derivatives. This makes
(2.48)–(2.50) look a little more manageable:
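\[ \frac{\partial w}{\partial s} = f_x \frac{\partial x}{\partial s} + f_y \frac{\partial y}{\partial s}, \]
\[ \frac{\partial^2 w}{\partial s^2} = \frac{\partial f_x}{\partial s} \frac{\partial x}{\partial s} + f_x \frac{\partial^2 x}{\partial s^2} + \frac{\partial f_y}{\partial s} \frac{\partial y}{\partial s} + f_y \frac{\partial^2 y}{\partial s^2}, \]
\[ \frac{\partial f_x}{\partial s} = f_{xx} \frac{\partial x}{\partial s} + f_{xy} \frac{\partial y}{\partial s}, \qquad \frac{\partial f_y}{\partial s} = f_{xy} \frac{\partial x}{\partial s} + f_{yy} \frac{\partial y}{\partial s}. \]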
The final result is then
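\[ \frac{\partial^2 w}{\partial s^2} = f_{xx} \left(\frac{\partial x}{\partial s}\right)^2 + 2f_{xy} \frac{\partial x}{\partial s} \frac{\partial y}{\partial s} + f_{yy} \left(\frac{\partial y}{\partial s}\right)^2 + f_x \frac{\partial^2 x}{\partial s^2} + f_y \frac{\partial^2 y}{\partial s^2}. \]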
Of course, similar results also hold for the other second-order derivatives of w.
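EXAMPLE 3. Suppose u = f (x, y), $x = s^2 - t^2$, y = 2st. Assuming f is of
class $C^2$, find $\partial^2 u/\partial s \, \partial t$ in terms of the derivatives of f .
Solution. $\frac{\partial u}{\partial t} = f_x \frac{\partial x}{\partial t} + f_y \frac{\partial y}{\partial t} = -2t f_x + 2s f_y$, so
\[ \begin{aligned} \frac{\partial^2 u}{\partial s \, \partial t} &= -2t[2s f_{xx} + 2t f_{xy}] + 2s[2s f_{xy} + 2t f_{yy}] + 2f_y \\ &= -4st f_{xx} + 4(s^2 - t^2) f_{xy} + 4st f_{yy} + 2f_y. \end{aligned} \]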

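EXAMPLE 4. Let us see what happens to some derivatives when we change
from Cartesian to polar coordinates. Let u = f (x, y), where f is of class $C^2$,
and let x = r cos θ and y = r sin θ. Then
\[ \frac{\partial u}{\partial r} = f_x \frac{\partial x}{\partial r} + f_y \frac{\partial y}{\partial r} = (\cos\theta) f_x + (\sin\theta) f_y, \]
\[ \frac{\partial u}{\partial \theta} = f_x \frac{\partial x}{\partial \theta} + f_y \frac{\partial y}{\partial \theta} = -(r \sin\theta) f_x + (r \cos\theta) f_y. \]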
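Proceeding to the second derivatives,
\[ \begin{aligned} \frac{\partial^2 u}{\partial r^2} &= (\cos\theta) \frac{\partial f_x}{\partial r} + (\sin\theta) \frac{\partial f_y}{\partial r} \\ &= (\cos^2\theta) f_{xx} + (2 \cos\theta \sin\theta) f_{xy} + (\sin^2\theta) f_{yy}, \end{aligned} \]
\[ \begin{aligned} \frac{\partial^2 u}{\partial \theta^2} &= -(r \cos\theta) f_x - (r \sin\theta) \frac{\partial f_x}{\partial \theta} - (r \sin\theta) f_y + (r \cos\theta) \frac{\partial f_y}{\partial \theta} \\ &= (r^2 \sin^2\theta) f_{xx} - (2r^2 \sin\theta \cos\theta) f_{xy} + (r^2 \cos^2\theta) f_{yy} - r \frac{\partial u}{\partial r}. \end{aligned} \]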

The calculation of the mixed derivative ∂ 2 u/∂r∂θ is left to the reader (Exercise
2).
Notice, in particular, that by combining the last two equations and using
the identity sin2 θ + cos2 θ = 1, we obtain

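\[ \frac{\partial^2 u}{\partial r^2} + \frac{1}{r} \frac{\partial u}{\partial r} + \frac{1}{r^2} \frac{\partial^2 u}{\partial \theta^2} = f_{xx} + f_{yy}. \]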

The expression on the right, the sum of the pure second partial derivatives of f
with respect to a Cartesian coordinate system, turns up in many practical and
theoretical applications; it is called the Laplacian of f . (We shall encounter
it again in Chapter 5.) What we have just accomplished is the calculation of
the Laplacian in polar coordinates. We state this result formally, with slightly
different notation.

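2.51 Proposition. Suppose u is a $C^2$ function of (x, y) in some open set in $\mathbb{R}^2$. If
(x, y) is related to (r, θ) by x = r cos θ, y = r sin θ, we have
\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = \frac{\partial^2 u}{\partial r^2} + \frac{1}{r} \frac{\partial u}{\partial r} + \frac{1}{r^2} \frac{\partial^2 u}{\partial \theta^2}. \]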
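
Proposition 2.51 lends itself to a numerical sanity check. In the Python sketch below (an illustration, not part of the original text), the sample function u, the evaluation point, and the step size are assumptions chosen for convenience; the polar side of the identity is approximated by central differences in r and θ.

```python
import math


def u(x, y):
    # Sample C^2 function chosen for illustration. Its Laplacian is 6xy,
    # because x^3 y contributes 6xy and e^x sin y is harmonic.
    return x**3 * y + math.exp(x) * math.sin(y)


def laplacian_cartesian(x, y):
    """Exact value of u_xx + u_yy for the sample function above."""
    return 6.0 * x * y


def laplacian_polar(r, theta, h=1e-4):
    """u_rr + u_r / r + u_thetatheta / r^2, by central differences."""

    def U(rr, tt):
        return u(rr * math.cos(tt), rr * math.sin(tt))

    u_r = (U(r + h, theta) - U(r - h, theta)) / (2.0 * h)
    u_rr = (U(r + h, theta) - 2.0 * U(r, theta) + U(r - h, theta)) / h**2
    u_tt = (U(r, theta + h) - 2.0 * U(r, theta) + U(r, theta - h)) / h**2
    return u_rr + u_r / r + u_tt / r**2


if __name__ == "__main__":
    r, theta = 1.5, 0.7
    x, y = r * math.cos(theta), r * math.sin(theta)
    print(laplacian_polar(r, theta), laplacian_cartesian(x, y))
```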

Multi-index Notation. Traditional notations for partial derivatives become
rather cumbersome for derivatives of order higher than two, and they make it rather
difficult to write Taylor’s theorem in an intelligible fashion. However, a better
notation, which is now in common usage in the literature of partial differential
equations, is available.
A multi-index is an n-tuple of nonnegative integers. Multi-indices are gener-
ally denoted by the Greek letters α or β:
\[ \alpha = (\alpha_1, \alpha_2, \dots, \alpha_n), \quad \beta = (\beta_1, \beta_2, \dots, \beta_n) \qquad \bigl(\alpha_j, \beta_j \in \{0, 1, 2, \dots\}\bigr). \]
If α is a multi-index, we define

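\[ |\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_n, \qquad \alpha! = \alpha_1! \, \alpha_2! \cdots \alpha_n!, \]
\[ x^\alpha = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n} \quad (\text{where } x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n), \]
\[ \partial^\alpha f = \partial_1^{\alpha_1} \partial_2^{\alpha_2} \cdots \partial_n^{\alpha_n} f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}}. \]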

The number |α| = α1 + · · · + αn is called the order or degree of α. Thus, the
order of α is the same as the order of $x^\alpha$ as a monomial or the order of $\partial^\alpha$ as a
partial derivative. (The notation |α| = α1 + · · · + αn conflicts with the notation
$|x| = (x_1^2 + \cdots + x_n^2)^{1/2}$ for the norm of an n-tuple of real numbers, but the meaning
will be clear from the context.)
If f is a function of class C k , by Theorem 2.47 the order of differentiation in a
kth-order partial derivative of f is immaterial. Thus, the generic kth-order partial
derivative of f can be written simply as ∂ α f with |α| = k.

EXAMPLE 5. With n = 3 and x = (x, y, z), we have
\[ \partial^{(0,3,0)} f = \frac{\partial^3 f}{\partial y^3}, \qquad x^{(2,1,5)} = x^2 y z^5. \]

As the notation xα indicates, multi-indices are handy for writing not only
derivatives but also polynomials in several variables. To illustrate their use, we
present a generalization of the binomial theorem.

2.52 Theorem (The Multinomial Theorem). For any x = (x1 , x2 , . . . , xn ) ∈ Rn
and any positive integer k,
\[ (x_1 + x_2 + \cdots + x_n)^k = \sum_{|\alpha| = k} \frac{k!}{\alpha!} x^\alpha. \]

Proof. The case n = 2 is just the binomial theorem:

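\[ (x_1 + x_2)^k = \sum_{j=0}^{k} \frac{k!}{j!(k - j)!} x_1^j x_2^{k-j} = \sum_{\alpha_1 + \alpha_2 = k} \frac{k!}{\alpha_1! \, \alpha_2!} x_1^{\alpha_1} x_2^{\alpha_2} = \sum_{|\alpha| = k} \frac{k!}{\alpha!} x^\alpha, \]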

where we have set α1 = j, α2 = k − j, and α = (α1 , α2 ). The general case follows
by induction on n. Suppose the result is true for n < N and x = (x1 , . . . , xN ). By
using the result for n = 2 and then the result for n = N − 1, we obtain
\[ \begin{aligned} (x_1 + \cdots + x_N)^k &= \bigl[(x_1 + \cdots + x_{N-1}) + x_N\bigr]^k \\ &= \sum_{i+j=k} \frac{k!}{i! \, j!} (x_1 + \cdots + x_{N-1})^i x_N^j \\ &= \sum_{i+j=k} \frac{k!}{i! \, j!} \sum_{|\beta| = i} \frac{i!}{\beta!} \tilde{x}^\beta x_N^j, \end{aligned} \]
where $\beta = (\beta_1, \dots, \beta_{N-1})$ and $\tilde{x} = (x_1, \dots, x_{N-1})$. To conclude, we set
$\alpha = (\beta_1, \dots, \beta_{N-1}, j)$, so that $\beta! \, j! = \alpha!$ and $\tilde{x}^\beta x_N^j = x^\alpha$. Observing that α runs over
all multi-indices of order k when β runs over all multi-indices of order i = k − j
and j runs from 0 to k, we obtain $\sum_{|\alpha| = k} k! \, x^\alpha / \alpha!$.
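
The multinomial theorem can be verified directly for small cases. The following Python sketch (illustrative; the helper names multi_indices and multinomial_sum are our own) enumerates all multi-indices of order k and evaluates the right side of Theorem 2.52 in exact rational arithmetic.

```python
from fractions import Fraction
from math import factorial, prod


def multi_indices(n, k):
    """Yield every n-tuple of nonnegative integers whose entries sum to k."""
    if n == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in multi_indices(n - 1, k - first):
            yield (first,) + rest


def multinomial_sum(x, k):
    """Sum of (k!/alpha!) x^alpha over all multi-indices alpha of order k."""
    total = Fraction(0)
    for alpha in multi_indices(len(x), k):
        alpha_factorial = prod(factorial(a) for a in alpha)
        x_alpha = prod(xi**a for xi, a in zip(x, alpha))
        total += Fraction(factorial(k), alpha_factorial) * x_alpha
    return total


if __name__ == "__main__":
    x = (2, -3, 5)
    k = 4
    print(multinomial_sum(x, k), sum(x) ** k)  # both equal 256
```

For n variables there are C(n + k − 1, k) multi-indices of order k, so this direct enumeration is practical only for small n and k.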

EXERCISES
In these exercises, all functions in question are assumed to be of class C 2 .

1. Verify by explicit calculation that ∂x ∂y f = ∂y ∂x f :
a. $f(x, y) = x^2 y + \sin \pi xy$.
b. $f(x, y) = e^{4x - y^2}$.
c. f (x, y) = (x + 2y + 4)/(7x + 3y).
2. Calculate ∂ 2 u/∂r∂θ if u = f (x, y), x = r cos θ, y = r sin θ. (See Example
4.)
3. Compute the indicated derivatives of w in terms of the derivatives of f :
a. $\partial_x^2 w$ and $\partial_x \partial_y w$, if $w = f(2x - y^2, x \sin 3y, x^4)$.
b. $\partial_x \partial_y w$ and $\partial_y^2 w$, if $w = f(e^{x - 3y}, \log(x^2 + 1), \sqrt{y^4 + 4})$.
4. Show that if u = F (x + g(y)), then $u_x u_{xy} = u_y u_{xx}$.
5. Suppose that f is a homogeneous function of degree a on $\mathbb{R}^n$. Show that
$\sum_{j,k=1}^{n} x_j x_k \partial_j \partial_k f = a(a - 1)f$ (cf. Euler’s theorem (2.36) and its proof).
6. Suppose u = f (x, y), $x = s^2 - t^2$, y = 2st. Show that $\partial_s^2 u + \partial_t^2 u =
4(s^2 + t^2)(\partial_x^2 f + \partial_y^2 f)$ (cf. Example 3).
7. Suppose u = f (x − ct) + g(x + ct), where c is a constant. Show that $\partial_x^2 u =
c^{-2} \partial_t^2 u$.
8. For x = (x, y, z) ∈ R3 \ {0} and t ∈ R, let $F(x, t) = r^{-1} g(ct - r)$, where
c is a constant, g is a $C^2$ function of one variable, and r = |x|. Show that
$\partial_x^2 F + \partial_y^2 F + \partial_z^2 F = c^{-2} \partial_t^2 F$.
9. For x ∈ Rn \ {0}, let F (x) = f (r) where f is a $C^2$ function on (0, ∞) and
r = |x|. Show that $\partial_1^2 F + \cdots + \partial_n^2 F = f''(r) + (n - 1) r^{-1} f'(r)$.
10. Derive the following version of the product rule for partial derivatives:
$\partial^\alpha (fg) = \sum_{\beta + \gamma = \alpha} (\alpha! / \beta! \, \gamma!) \, \partial^\beta f \, \partial^\gamma g$.
11. Prove the following n-dimensional binomial theorem: For all x, y ∈ Rn we
have $(x + y)^\alpha = \sum_{\beta + \gamma = \alpha} (\alpha! / \beta! \, \gamma!) \, x^\beta y^\gamma$.

2.7 Taylor’s Theorem


In this section we discuss Taylor expansions in their finite form, as polynomial
approximations to a function rather than expansions in infinite series. We begin
with a review of Taylor’s theorem for functions of one real variable.
Taylor’s theorem is a higher-order version of the tangent line approximation; it
says that a function f of class C k on an interval I containing the point x = a is the
sum of a certain polynomial of degree k and a remainder term that vanishes more
rapidly than |x − a|k as x → a. Specifically, the polynomial P = Pa,k of order k
such that P (j) (0) = f (j) (a) for 0 ≤ j ≤ k, namely
\[ P_{a,k}(h) = \sum_{j=0}^{k} \frac{f^{(j)}(a)}{j!} h^j, \tag{2.53} \]

is called the kth-order Taylor polynomial for f based at a, and the difference
\[ R_{a,k}(h) = f(a + h) - P_{a,k}(h) = f(a + h) - \sum_{j=0}^{k} \frac{f^{(j)}(a)}{j!} h^j \tag{2.54} \]

is called the kth-order Taylor remainder. The various versions of Taylor’s theorem
provide formulas or estimates for Ra,k that ensure that the Taylor polynomial Pa,k
is a good approximation to f near a. The ones most commonly known involve the
stronger assumption that f is of class C k+1 and yield the stronger conclusion that
the remainder vanishes as rapidly as |x − a|k+1 . We present two of these, as well
as one that yields the more general form of the theorem stated above.
The easiest version of Taylor’s theorem to derive is the following.
2.55 Theorem (Taylor’s Theorem with Integral Remainder, I). Suppose that f is
of class C k+1 (k ≥ 0) on an interval I ⊂ R, and a ∈ I. Then the remainder Ra,k
defined by (2.53)–(2.54) is given by
\[ R_{a,k}(h) = \frac{h^{k+1}}{k!} \int_0^1 (1 - t)^k f^{(k+1)}(a + th) \, dt. \tag{2.56} \]
Proof. For k = 0 the assertion is just that
\[ f(a + h) = f(a) + h \int_0^1 f'(a + th) \, dt, \tag{2.57} \]
which is easily verified by the substitution u = a + th:
\[ h \int_0^1 f'(a + th) \, dt = \int_a^{a+h} f'(u) \, du = f(a + h) - f(a). \]

The trick now is to integrate (2.57) by parts, choosing for the antiderivative of the
constant function 1 not t but t − 1, alias −(1 − t):
\[ \begin{aligned} h \int_0^1 f'(a + th) \, dt &= -(1 - t) h f'(a + th) \Big|_0^1 + h \int_0^1 (1 - t) f''(a + th) h \, dt \\ &= f'(a) h + h^2 \int_0^1 (1 - t) f''(a + th) \, dt. \end{aligned} \]

Plugging this into (2.57), we obtain (2.56) in the case k = 1. If we integrate by
parts again,
\[ \begin{aligned} h^2 \int_0^1 (1 - t) f''(a + th) \, dt &= h^2 \, \frac{-(1 - t)^2}{2} f''(a + th) \Big|_0^1 + h^2 \int_0^1 \frac{(1 - t)^2}{2} f'''(a + th) h \, dt \\ &= \frac{f''(a)}{2} h^2 + \frac{h^3}{2} \int_0^1 (1 - t)^2 f'''(a + th) \, dt, \end{aligned} \]

we obtain the theorem for k = 2. The pattern is now clear: Integrating (2.57) by
parts k times yields (2.56).

Next we present a modification of Theorem 2.55 that works without assuming
that f has any additional derivatives beyond the ones occurring in the Taylor
polynomial.

2.58 Theorem (Taylor’s Theorem with Integral Remainder, II). Suppose that
f is of class C k (k ≥ 1) on an interval I ⊂ R, and a ∈ I. Then the remain-
der Ra,k defined by (2.53)–(2.54) is given by
\[ R_{a,k}(h) = \frac{h^k}{(k - 1)!} \int_0^1 (1 - t)^{k-1} \bigl[f^{(k)}(a + th) - f^{(k)}(a)\bigr] \, dt. \tag{2.59} \]
Proof. We begin by using Theorem 2.55, with k replaced by k − 1:
\[ f(a + h) - \sum_{j=0}^{k-1} \frac{f^{(j)}(a)}{j!} h^j = \frac{h^k}{(k - 1)!} \int_0^1 (1 - t)^{k-1} f^{(k)}(a + th) \, dt. \]

Subtracting $f^{(k)}(a) h^k / k!$ from both sides gives
\[ f(a + h) - \sum_{j=0}^{k} \frac{f^{(j)}(a)}{j!} h^j = \frac{h^k}{(k - 1)!} \int_0^1 (1 - t)^{k-1} f^{(k)}(a + th) \, dt - f^{(k)}(a) \frac{h^k}{k!}. \]

In view of the fact that
\[ \frac{h^k}{k!} = \frac{h^k}{(k - 1)!} \int_0^1 (1 - t)^{k-1} \, dt, \]

this gives (2.59).

The formulas (2.56) and (2.59) are generally used not to obtain the exact value
of the remainder but to obtain an estimate for it. The main results are in the follow-
ing corollaries.

2.60 Corollary. If f is of class C k on I, then Ra,k (h)/hk → 0 as h → 0.

Proof. $f^{(k)}$ is continuous at a, so for any ϵ > 0 there exists δ > 0 such that
$|f^{(k)}(y) - f^{(k)}(a)| < \epsilon$ when |y − a| < δ. In particular,
\[ \bigl|f^{(k)}(a + th) - f^{(k)}(a)\bigr| < \epsilon \quad\text{for } 0 \le t \le 1 \text{ when } |h| < \delta. \]

Hence, (2.59) gives
\[ |R_{a,k}(h)| \le \frac{|h|^k}{(k - 1)!} \int_0^1 (1 - t)^{k-1} \epsilon \, dt = \frac{\epsilon}{k!} |h|^k \quad\text{for } |h| < \delta. \]
In other words, $|R_{a,k}(h)/h^k| < \epsilon/k!$ whenever |h| < δ, and hence $R_{a,k}(h)/h^k \to 0$
as h → 0.

Thus, if f is of class $C^k$ near x = a, we can write f (x) as the sum of a kth-order
polynomial (the Taylor polynomial) in h = x − a and a remainder that vanishes at
x = a faster than any nonzero term in the polynomial. Notice that for k = 1, this
is just a restatement of the differentiability of f . If f is actually of class $C^{k+1}$, we
obtain a better estimate from (2.56):
2.61 Corollary. If f is of class $C^{k+1}$ on I and $|f^{(k+1)}(x)| \le M$ for x ∈ I, then
\[ |R_{a,k}(h)| \le \frac{M}{(k + 1)!} |h|^{k+1} \qquad (a + h \in I). \]
Proof. By (2.56),
\[ |R_{a,k}(h)| \le \frac{|h|^{k+1}}{k!} \int_0^1 (1 - t)^k M \, dt = \frac{M}{(k + 1)!} |h|^{k+1}. \]

Finally, we present Lagrange’s form of the remainder, which turns Taylor’s
theorem into a higher-order version of the mean value theorem. Just as we deduced
the mean value theorem from Rolle’s theorem, we shall obtain Lagrange’s formula
from the following variant of Rolle’s theorem.
2.62 Lemma. Suppose g is k + 1 times differentiable on [a, b]. If g(a) = g(b) and
g(j) (a) = 0 for 1 ≤ j ≤ k, then there is a point c ∈ (a, b) such that g(k+1) (c) = 0.
Proof. By Rolle’s theorem, there is a point c1 ∈ (a, b) such that g′ (c1 ) = 0. Since
g′ is continuous on [a, c1 ] and differentiable on (a, c1 ), and g′ (a) = g′ (c1 ) = 0,
there is a point c2 ∈ (a, c1 ) such that g ′′ (c2 ) = 0. Proceeding inductively, we find
that for 1 ≤ j ≤ k + 1 there is a point cj ∈ (a, cj−1 ) such that g (j) (cj ) = 0, and
the final case j = k + 1 is the desired result.
2.63 Theorem (Taylor’s Theorem with Lagrange’s Remainder). Suppose f is k + 1
times differentiable on an interval I ⊂ R, and a ∈ I. For each h ∈ R such that
a + h ∈ I there is a point c between 0 and h such that
\[ R_{a,k}(h) = f^{(k+1)}(a + c) \, \frac{h^{k+1}}{(k + 1)!}. \tag{2.64} \]
Proof. Let us fix a particular h, and suppose for now that h > 0. Let
\[ \begin{aligned} g(t) &= R_{a,k}(t) - \frac{R_{a,k}(h)}{h^{k+1}} t^{k+1} \\ &= f(a + t) - f(a) - f'(a) t - \cdots - \frac{f^{(k)}(a)}{k!} t^k - \frac{R_{a,k}(h)}{h^{k+1}} t^{k+1}. \end{aligned} \]
The coefficient of $t^{k+1}$ is chosen to make g(h) = 0, and clearly g(0) = 0. Similarly,
for j ≤ k we have
\[ g^{(j)}(t) = f^{(j)}(a + t) - f^{(j)}(a) - \cdots - \frac{f^{(k)}(a)}{(k - j)!} t^{k-j} - \frac{R_{a,k}(h)}{h^{k+1}} (k + 1) \cdots (k + 2 - j) \, t^{k+1-j}, \]
so $g^{(j)}(0) = 0$. Therefore, by Lemma 2.62, there is a point c ∈ (0, h) such that
\[ 0 = g^{(k+1)}(c) = f^{(k+1)}(a + c) - \frac{R_{a,k}(h)}{h^{k+1}} (k + 1)!. \]
But this is precisely (2.64). The case h < 0 is handled similarly by considering the
function $\tilde g(t) = g(-t)$ on the interval [0, |h|].

Corollary 2.61 is obviously an immediate consequence of (2.64).


Remark. In Theorem 2.55 we assumed that f is of class C k+1 , but in Theo-
rem 2.63 we needed only the existence, not the continuity, of f (k+1) . Actually, in
Theorem 2.55 it is enough to assume that f (k+1) is Riemann integrable.
For the convenience of the reader, we recall a few of the most familiar and
useful Taylor expansions, which are easily derived from the definition (2.53). They
will be used without comment in the rest of the book.

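2.65 Proposition. The Taylor polynomials of degree k about a = 0 of the functions
\[ e^x, \qquad \cos x, \qquad \sin x, \qquad (1 - x)^{-1} \]
are, respectively,
\[ \sum_{0 \le j \le k} \frac{x^j}{j!}, \qquad \sum_{0 \le j \le k/2} \frac{(-1)^j x^{2j}}{(2j)!}, \qquad \sum_{0 \le j \le (k-1)/2} \frac{(-1)^j x^{2j+1}}{(2j + 1)!}, \qquad \sum_{0 \le j \le k} x^j. \]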
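
As an illustration of how Proposition 2.65 combines with Corollary 2.61, the following Python sketch (not part of the original text; the function names are our own) evaluates the Taylor polynomial of sin x of degree k about 0 and compares the actual remainder with the bound M|h|^(k+1)/(k+1)!, taking M = 1 because every derivative of sin is bounded by 1 in absolute value.

```python
import math


def sin_taylor_polynomial(x, k):
    """Taylor polynomial of sin of degree k about 0 (Proposition 2.65)."""
    total = 0.0
    # The sum runs over 0 <= j <= (k - 1)/2; it is empty when k = 0.
    for j in range((k - 1) // 2 + 1):
        total += (-1) ** j * x ** (2 * j + 1) / math.factorial(2 * j + 1)
    return total


def remainder_bound(h, k, M=1.0):
    """The bound M |h|^(k+1) / (k+1)! from Corollary 2.61."""
    return M * abs(h) ** (k + 1) / math.factorial(k + 1)


if __name__ == "__main__":
    h = 0.5
    for k in range(8):
        actual = abs(math.sin(h) - sin_taylor_polynomial(h, k))
        print(k, actual, remainder_bound(h, k))
```

For odd k the actual remainder is typically much smaller than the bound, because the Taylor polynomials of degree k and k + 1 of sin coincide when k is odd.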

Taylor polynomials have many uses. From a practical point of view, they allow
one to approximate complicated functions by polynomials that are relatively easy
to compute with. On the more theoretical side, it is an important general principle
that the behavior of a function f (x) near x = a is largely determined by the first
nonvanishing term, apart from the constant term f (a), in its Taylor expansion. That
is, if f ′ (a) ̸= 0, then the tangent line approximation f (x) ≈ f (a) + f ′ (a)(x − a)
is a good one. If f ′ (a) = 0 but f ′′ (a) ̸= 0, the second-order term is decisive,
and so forth. This is the basis for the second-derivative test for local extrema: If
f ′ (a) = 0 and f ′′ (a) ̸= 0, then $f(x) \approx f(a) + \frac{1}{2} f''(a)(x - a)^2$, and the expression on the right
is a quadratic function with a maximum or minimum at a, depending on the sign
of f ′′ (a). (See Exercise 9 and §2.8.) The following example illustrates another
application of this principle.

EXAMPLE 1. Use Taylor expansions to evaluate $\displaystyle\lim_{x\to 0} \frac{x^2 - \sin x^2}{x^4(1-\cos x)}$.

Solution. We have
\[
x^2 - \sin x^2 = x^2 - (x^2 - \tfrac16 x^6 + \cdots) = \tfrac16 x^6 + \cdots,
\]
\[
x^4(1-\cos x) = x^4\bigl[1 - (1 - \tfrac12 x^2 + \cdots)\bigr] = \tfrac12 x^6 + \cdots,
\]
where the dots denote error terms that vanish faster than $x^6$ as $x \to 0$. Therefore,
\[
\frac{x^2 - \sin x^2}{x^4(1-\cos x)} = \frac{\tfrac16 x^6 + \cdots}{\tfrac12 x^6 + \cdots} = \frac{\tfrac16 + \cdots}{\tfrac12 + \cdots},
\]
where the dots in the last fraction denote error terms that vanish as $x \to 0$. The
limit is therefore $\tfrac13$. (To appreciate the efficiency of this calculation, try doing
it by l’Hôpital’s rule!)
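A quick numerical check of this limit is sketched below. The name `quotient` and the sample points are our own choices. We deliberately avoid very small $x$: the numerator $x^2 - \sin x^2$ is a difference of nearly equal numbers, and floating-point cancellation eventually spoils the quotient.

```python
import math

def quotient(x):
    """(x^2 - sin(x^2)) / (x^4 (1 - cos x)), defined for x != 0."""
    return (x**2 - math.sin(x**2)) / (x**4 * (1 - math.cos(x)))

# The values should drift toward 1/3 as x decreases.
for x in (0.4, 0.2, 0.1):
    print(x, quotient(x))
```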
We now generalize these results to functions on $\mathbb{R}^n$. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is of
class $C^k$ on a convex open set $S$. We can derive a Taylor expansion for $f(x)$ about
a point $a \in S$ by looking at the restriction of $f$ to the line joining $a$ and $x$. That is,
we set $h = x - a$ and
\[
g(t) = f(a + t(x-a)) = f(a + th).
\]
By the chain rule,
\[
g'(t) = h\cdot\nabla f(a+th),
\]
and hence
\[
g^{(j)}(t) = (h\cdot\nabla)^j f(a+th),
\]
where the expression on the right denotes the result of applying the operation
\[
h\cdot\nabla = h_1\frac{\partial}{\partial x_1} + \cdots + h_n\frac{\partial}{\partial x_n} \tag{2.66}
\]
$j$ times to $f$. The Taylor formula for $g$ with $a = 0$ and $h = 1$,
\[
g(1) = \sum_{j=0}^{k} \frac{g^{(j)}(0)}{j!}\,1^j + (\text{remainder}),
\]
therefore yields
\[
f(a+h) = \sum_{j=0}^{k} \frac{(h\cdot\nabla)^j f(a)}{j!} + R_{a,k}(h), \tag{2.67}
\]
where formulas for $R_{a,k}(h)$ can be obtained from the formulas (2.56), (2.59), or
(2.64) applied to $g$.
It is usually preferable, however, to rewrite (2.67) and the accompanying formulas for the remainder so that the partial derivatives of $f$ appear more explicitly.
To do this, we apply the multinomial theorem to the expression (2.66) to get
\[
(h\cdot\nabla)^j = \sum_{|\alpha|=j} \frac{j!}{\alpha!}\,h^\alpha\partial^\alpha.
\]
Substituting this into (2.67) and the remainder formulas, we obtain the following:

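2.68 Theorem (Taylor’s Theorem in Several Variables). Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is
of class $C^k$ on an open convex set $S$. If $a \in S$ and $a + h \in S$, then
\[
f(a+h) = \sum_{|\alpha|\le k} \frac{\partial^\alpha f(a)}{\alpha!}\,h^\alpha + R_{a,k}(h), \tag{2.69}
\]
where
\[
R_{a,k}(h) = k\sum_{|\alpha|=k} \frac{h^\alpha}{\alpha!}\int_0^1 (1-t)^{k-1}\bigl[\partial^\alpha f(a+th) - \partial^\alpha f(a)\bigr]\,dt. \tag{2.70}
\]
If $f$ is of class $C^{k+1}$ on $S$, we also have
\[
R_{a,k}(h) = (k+1)\sum_{|\alpha|=k+1} \frac{h^\alpha}{\alpha!}\int_0^1 (1-t)^{k}\,\partial^\alpha f(a+th)\,dt, \tag{2.71}
\]
and
\[
R_{a,k}(h) = \sum_{|\alpha|=k+1} \partial^\alpha f(a+ch)\,\frac{h^\alpha}{\alpha!} \quad\text{for some } c \in (0,1). \tag{2.72}
\]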

This result bears a pleasing similarity to the single-variable formulas (2.54),
(2.56), (2.59), and (2.64): a triumph for multi-index notation! It may be reassuring, however, to see the formula for the second-order Taylor polynomial written
out in the more familiar notation:
\[
P_{a,2}(h) = f(a) + \sum_{j=1}^{n} \partial_j f(a)\,h_j + \frac12\sum_{j,k=1}^{n} \partial_j\partial_k f(a)\,h_j h_k \tag{2.73}
\]
\[
\phantom{P_{a,2}(h)} = f(a) + \sum_{j=1}^{n} \partial_j f(a)\,h_j + \frac12\sum_{j=1}^{n} \partial_j^2 f(a)\,h_j^2 + \sum_{1\le j<k\le n} \partial_j\partial_k f(a)\,h_j h_k. \tag{2.74}
\]
The first of these formulas is (2.67) with $k = 2$; the second one is (2.69). (Every
multi-index $\alpha$ of order 2 is either of the form $(\ldots, 2, \ldots)$ or $(\ldots, 1, \ldots, 1, \ldots)$,
where the dots denote zero entries, so the sum over $|\alpha| = 2$ in (2.69) breaks up into
the last two sums in (2.74).) Notice that the mixed derivatives $\partial_j\partial_k$ ($j \ne k$) occur
twice in (2.73) (since $\partial_j\partial_k = \partial_k\partial_j$) but only once in (2.74) (since $j < k$ there);
this accounts for the disappearance of the factor of $\tfrac12$ in the last sum in (2.74).
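The bookkeeping with the factor of one half can be checked mechanically. The sketch below evaluates the second-order Taylor polynomial in both ways, once as the full double sum of (2.73) and once in the split form of (2.74). The function names and the sample gradient and Hessian are hypothetical data chosen only for illustration; the Hessian is assumed symmetric.

```python
def p2_full(f_a, grad, hess, h):
    """Form (2.73): full double sum over j, k with an overall factor 1/2."""
    n = len(h)
    linear = sum(grad[j] * h[j] for j in range(n))
    quad = sum(hess[j][k] * h[j] * h[k] for j in range(n) for k in range(n))
    return f_a + linear + 0.5 * quad

def p2_split(f_a, grad, hess, h):
    """Form (2.74): pure squares carry 1/2, each mixed term with j < k appears once."""
    n = len(h)
    linear = sum(grad[j] * h[j] for j in range(n))
    squares = 0.5 * sum(hess[j][j] * h[j] ** 2 for j in range(n))
    mixed = sum(hess[j][k] * h[j] * h[k]
                for j in range(n) for k in range(j + 1, n))
    return f_a + linear + squares + mixed

# Hypothetical sample data: value, gradient, and symmetric Hessian at a.
f_a, grad, hess = 1.0, [0.0, 1.0], [[2.0, 3.0], [3.0, 1.0]]
h = [1.0, 2.0]
print(p2_full(f_a, grad, hess, h), p2_split(f_a, grad, hess, h))
```

Both calls return the same number because each off-diagonal entry of the Hessian is counted twice in the first form and once, without the one half, in the second.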
We also have the following analogue of Corollaries 2.60 and 2.61:

2.75 Corollary. If $f$ is of class $C^k$ on $S$, then $R_{a,k}(h)/|h|^k \to 0$ as $h \to 0$. If $f$ is
of class $C^{k+1}$ on $S$ and $|\partial^\alpha f(x)| \le M$ for $x \in S$ and $|\alpha| = k+1$, then
\[
|R_{a,k}(h)| \le \frac{M}{(k+1)!}\,\|h\|^{k+1},
\]
where
\[
\|h\| = |h_1| + |h_2| + \cdots + |h_n|.
\]

Proof. The proof of the first assertion is the same as the proof of Corollary 2.60.
As for the second, it follows easily from either (2.71) or (2.72) that
\[
|R_{a,k}(h)| \le M\sum_{|\alpha|=k+1} \frac{|h^\alpha|}{\alpha!},
\]
and this last expression equals $M\|h\|^{k+1}/(k+1)!$ by the multinomial theorem.
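The multinomial identity used in the last step, $\sum_{|\alpha|=m} |h^\alpha|/\alpha! = \|h\|^m/m!$, can be confirmed by brute force for small cases. The sketch below enumerates multi-indices directly; the helper names (`multi_indices`, `multinomial_sum`, `one_norm`) are ours.

```python
from math import factorial, prod

def multi_indices(n, m):
    """Yield every multi-index alpha with n entries and |alpha| = m."""
    if n == 1:
        yield (m,)
        return
    for first in range(m + 1):
        for rest in multi_indices(n - 1, m - first):
            yield (first,) + rest

def multinomial_sum(h, m):
    """Sum over |alpha| = m of |h^alpha| / alpha!."""
    total = 0.0
    for alpha in multi_indices(len(h), m):
        h_alpha = prod(abs(x) ** a for x, a in zip(h, alpha))
        alpha_factorial = prod(factorial(a) for a in alpha)
        total += h_alpha / alpha_factorial
    return total

def one_norm(h):
    """The norm |h_1| + ... + |h_n| appearing in Corollary 2.75."""
    return sum(abs(x) for x in h)

h, m = (0.5, -1.0, 2.0), 3
print(multinomial_sum(h, m), one_norm(h) ** m / factorial(m))
```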

An essential fact about the Taylor expansion of a function $f$ about a point $a$
is that it is the only way to write $f$ as the sum of a polynomial of degree $k$ and a
remainder that vanishes to higher order than $|x-a|^k$ as $x \to a$. To see this, we
need the following lemma.

2.76 Lemma. If $P(h)$ is a polynomial of degree $\le k$ that vanishes to order $> k$ as
$h \to 0$ [i.e., $P(h)/|h|^k \to 0$], then $P \equiv 0$.

Proof. The hypothesis implies that, for each fixed $h$, $P(th)/t^k \to 0$ as $t \to 0$.
Write $P = P_0 + P_1 + \cdots + P_k$ where $P_j$ is the sum of the terms of order $j$ in $P$;
thus
\[
P(th) = P_0 + tP_1(h) + t^2P_2(h) + \cdots + t^kP_k(h).
\]
$P_0$ is the constant term; since $P(0) = 0$ we must have $P_0 = 0$. Hence, dividing by
$t$,
\[
\frac{P(th)}{t} = P_1(h) + tP_2(h) + \cdots + t^{k-1}P_k(h).
\]
Since $P(th)/t \to 0$, we must have $P_1(h) = 0$. But then, dividing by $t$ again,
\[
\frac{P(th)}{t^2} = P_2(h) + \cdots + t^{k-2}P_k(h),
\]
so $P_2(h) = 0$ since $P(th)/t^2 \to 0$. Continuing inductively, we conclude that
$P_j(h) = 0$ for all $j$, so $P \equiv 0$.

2.77 Theorem. Suppose $f$ is of class $C^k$ near $a$. If $f(a+h) = Q(h) + E(h)$
where $Q$ is a polynomial of degree $\le k$ and $E(h)/|h|^k \to 0$ as $h \to 0$, then $Q$ is
the Taylor polynomial $P_{a,k}$.

Proof. Corollary 2.75 says that $f(a+h) = P_{a,k}(h) + R_{a,k}(h)$, where $R_{a,k}(h)/|h|^k$
tends to zero as $h$ does. If also $f(a+h) = Q(h) + E(h)$, then $Q - P_{a,k} = R_{a,k} - E$,
so
\[
\frac{Q(h) - P_{a,k}(h)}{|h|^k} = \frac{R_{a,k}(h) - E(h)}{|h|^k} \to 0.
\]
By Lemma 2.76, $Q = P_{a,k}$.

Theorem 2.77 has the following important practical consequence. If one wants
to compute the Taylor expansion of $f$, it may be very tedious to calculate all the
derivatives needed in formula (2.69) directly. But if one can find, by any means
whatever, a polynomial $Q$ of degree $\le k$ such that $[f(a+h) - Q(h)]/|h|^k \to 0$,
then $Q$ must be the Taylor polynomial. This enables one to generate new Taylor
expansions from old ones by operations such as substitution, multiplication, etc.
EXAMPLE 2. Find the 3rd-order Taylor polynomial of $f(x,y) = e^{x^2+y}$ about
$(x,y) = (0,0)$.
Solution. The direct method is to calculate the derivatives $f_x$, $f_y$, $f_{xx}$, $f_{xy}$,
$f_{yy}$, $f_{xxx}$, $f_{xxy}$, $f_{xyy}$, and $f_{yyy}$, and then plug the results into (2.69), but only a
masochist would do this. Instead, use the familiar expansion for the exponential
function (Proposition 2.65), neglecting all terms of order higher than 3:
\begin{align*}
e^{x^2+y} &= 1 + (x^2+y) + \tfrac12(x^2+y)^2 + \tfrac16(x^2+y)^3 + (\text{order} > 3)\\
&= 1 + x^2 + y + \tfrac12(x^4 + 2x^2y + y^2) + \tfrac16(x^6 + 3x^4y + 3x^2y^2 + y^3) + (\text{order} > 3)\\
&= 1 + y + x^2 + \tfrac12y^2 + x^2y + \tfrac16y^3 + (\text{order} > 3).
\end{align*}
In the last line we have thrown the terms $x^4$, $x^6$, $x^4y$, and $x^2y^2$ into the garbage
pail, since they are themselves of order $> 3$. Thus the answer is $1 + y + x^2 + \tfrac12y^2 + x^2y + \tfrac16y^3$. Alternatively,
\begin{align*}
e^{x^2+y} = e^{x^2}e^{y} &= (1 + x^2 + \cdots)(1 + y + \tfrac12y^2 + \tfrac16y^3 + \cdots)\\
&= 1 + y + x^2 + \tfrac12y^2 + x^2y + \tfrac16y^3 + \cdots
\end{align*}
where the dots indicate terms of order $> 3$.
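By Theorem 2.77, the polynomial found in Example 2 is characterized by the fact that the error divided by $|h|^3$ tends to zero. The sketch below watches this ratio shrink along one ray through the origin. The names `p3` and `scaled_error` and the choice of direction are ours; a numerical experiment of this kind illustrates the statement but of course proves nothing.

```python
import math

def f(x, y):
    return math.exp(x**2 + y)

def p3(x, y):
    """Third-order Taylor polynomial of exp(x^2 + y) about (0, 0), from Example 2."""
    return 1 + y + x**2 + y**2 / 2 + x**2 * y + y**3 / 6

def scaled_error(t, direction=(1.0, 1.0)):
    """|f(h) - P3(h)| / |h|^3 at h = t * direction."""
    x, y = t * direction[0], t * direction[1]
    return abs(f(x, y) - p3(x, y)) / math.hypot(x, y) ** 3

# The ratio should decrease roughly in proportion to t.
for t in (0.5, 0.1, 0.01):
    print(t, scaled_error(t))
```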

EXERCISES
1. Let $f(x) = x^2(x - \sin x)$ and $g(x) = (e^x - 1)(\cos 2x - 1)^2$.
a. Compute the Taylor polynomials of order 5 based at $a = 0$ of $f$ and $g$.
(Don’t compute any derivatives; use Proposition 2.65 as a starting point.)
b. Use the result of (a) to find $\lim_{x\to0} f(x)/g(x)$ without using l’Hôpital’s
rule.
2. Find the Taylor polynomial $P_{1,3}(h)$ and give a constant $C$ such that $|R_{1,3}(h)| \le Ch^4$ on the interval $|h| \le \tfrac12$ for each of the following functions.
a. $f(x) = \log x$.
b. $f(x) = \sqrt{x}$.
c. $f(x) = (x+3)^{-1}$.
3. Show that $|\sin x - x + \tfrac16x^3| < .08$ for $|x| \le \tfrac12\pi$. (Hint: $x - \tfrac16x^3$ is actually
the 4th-order Taylor polynomial of $\sin x$.) How large do you have to take $k$ so
that the $k$th-order Taylor polynomial of $\sin x$ about $a = 0$ approximates $\sin x$
to within .01 for $|x| \le \tfrac12\pi$?
4. Use a Taylor approximation to $e^{-x^2}$ to compute $\int_0^1 e^{-x^2}\,dx$ to three decimal
places, and prove the accuracy of your answer. (Hint: It’s easier to apply
Corollary 2.61 to $f(t) = e^{-t}$ and set $t = x^2$ than to apply Corollary 2.61
to $e^{-x^2}$ directly.)
5. Find the Taylor polynomial of order 4 based at $a = (0,0)$ for each of the
following functions. Don’t compute any derivatives; use Proposition 2.65.
a. $f(x,y) = x\sin(x+y)$.
b. $e^{xy}\cos(x^2+y^2)$.
c. $e^{x-2y}/(1+x^2-y)$.
6. Find the 3rd-order Taylor polynomial of $f(x,y) = x + \cos\pi y + x\log y$ based
at $a = (3,1)$.
7. Find the 3rd-order Taylor polynomial of $f(x,y,z) = x^2y + z$ based at $a = (1,2,1)$. The remainder vanishes identically; why? (You can see this either
from the Taylor remainder formula or by algebra.)
8. Suppose $f$ is defined on the open interval $I$ and $a \in I$. The Taylor polynomial
$P_{a,k}$ is well defined provided merely that $f$ is of class $C^{k-1}$ on $I$ and $f^{(k)}(a)$
exists. Show that under these hypotheses, the remainder $R_{a,k} = f - P_{a,k}$ still
satisfies $\lim_{h\to0} R_{a,k}(h)/h^k = 0$. (Hint: Apply l’Hôpital’s rule $k-1$ times,
then recall precisely what it means for $f^{(k)}(a)$ to exist.)
9. Suppose that $f$ is of class $C^k$ on an open interval containing the point $a$, and
that $f'(a) = \cdots = f^{(k-1)}(a) = 0$ but $f^{(k)}(a) \ne 0$. Use Corollary 2.60 to
show that (i) if $k$ is even, then $f$ has a local maximum or local minimum at $a$
according as $f^{(k)}(a)$ is negative or positive, and (ii) if $k$ is odd, $f$ has neither a
maximum nor a minimum at $a$.
10. Suppose $f$ is of class $C^k$ on an open convex set $S \subset \mathbb{R}^n$ and its $k$th-order
derivatives, $\partial^\alpha f$ with $|\alpha| = k$, satisfy
\[
|\partial^\alpha f(y) - \partial^\alpha f(x)| \le C|y-x|^\lambda \qquad (x, y \in S),
\]
where $C$ and $\lambda$ are positive constants (cf. Exercise 1 in §1.8). Use (2.70) to
show that there is another positive constant $C'$ such that
\[
|R_{a,k}(h)| \le C'|h|^{k+\lambda} \qquad (a \in S \text{ and } a + h \in S).
\]

2.8 Critical Points


We know from elementary calculus that in studying a differentiable function f of a
real variable, it is particularly important to look at the points where the derivative
f ′ vanishes. The same is true for functions of several variables.
Suppose f is a differentiable function on some open set S ⊂ Rn . The point
a ∈ S is called a critical point for f if ∇f (a) = 0. Finding the critical points of f
is a matter of solving the n equations ∂1 f (x) = 0,. . . , ∂n f (x) = 0 simultaneously
for the n quantities x1 , . . . , xn .
We say that f has a local maximum (or local minimum) at a if f (x) ≤ f (a)
(or f (x) ≥ f (a)) for all x in some neighborhood of a. Just as in the one-variable
case, we have:

2.78 Proposition. If f has a local maximum or minimum at a and f is differentiable


at a, then ∇f (a) = 0.

Proof. If f has a local maximum or minimum at a, then for any unit vector u,
the function g(t) = f (a + tu) has a local maximum or minimum at t = 0, so
g′ (0) = ∂u f (a) = 0. In particular, ∂j f (a) = 0 for all j, so ∇f (a) = 0.

How can we tell whether a function has a local maximum or minimum (or nei-
ther) at a critical point? For functions of one variable we have the second derivative
test: If f is of class C 2 , then f has a local minimum at a if f ′′ (a) > 0 and a local
maximum if f ′′ (a) < 0. (If f ′′ (a) = 0, no conclusion can be drawn.) Something
similar happens for functions of n variables, but the situation is a good deal more
complicated. The full story involves a certain amount of linear algebra; the reader
who is content to consider the case of two variables and wishes to skip the linear
algebra may proceed directly to Theorem 2.82.
Suppose $f$ is a real-valued function of class $C^2$ on some open set $S \subset \mathbb{R}^n$ and
that $f$ has a critical point at $a$, i.e., $\nabla f(a) = 0$. Instead of one second derivative to
examine at $a$, we have a whole $n \times n$ matrix of them, called the Hessian of $f$ at $a$:
\[
H = H(a) = \begin{pmatrix}
\partial_1^2 f(a) & \partial_1\partial_2 f(a) & \cdots & \partial_1\partial_n f(a)\\
\partial_2\partial_1 f(a) & \partial_2^2 f(a) & \cdots & \partial_2\partial_n f(a)\\
\vdots & \vdots & \ddots & \vdots\\
\partial_n\partial_1 f(a) & \partial_n\partial_2 f(a) & \cdots & \partial_n^2 f(a)
\end{pmatrix}. \tag{2.79}
\]
The equality of mixed partials (Theorem 2.45) guarantees that this is a symmetric
matrix, that is, $H_{ij} = H_{ji}$.
By (2.73), the second-order Taylor expansion of $f$ about $a$ is
\[
f(a+k) = f(a) + \sum_{j=1}^{n} \partial_j f(a)\,k_j + \frac12\sum_{i,j=1}^{n} \partial_i\partial_j f(a)\,k_i k_j + R_{a,2}(k).
\]
(We use $k$ rather than $h$ for the increment in this section to avoid a notational clash
with the Hessian $H$.) If $\nabla f(a) = 0$, the first-order sum vanishes, and the second-order sum is $\tfrac12\sum H_{ij}k_ik_j = \tfrac12 Hk\cdot k$. In short,
\[
f(a+k) = f(a) + \tfrac12 Hk\cdot k + R_{a,2}(k). \tag{2.80}
\]

Now we can begin to see how to analyze the behavior of $f$ about $a$ in terms of
the matrix $H$. To start with the simplest situation, suppose it happens that all the
mixed partials $\partial_i\partial_j f$ ($i \ne j$) vanish at $a$. Denoting $\partial_j^2 f(a)$ by $\lambda_j$, we then have
\[
f(a+k) = f(a) + \frac12\sum_{j=1}^{n} \lambda_j k_j^2 + R_{a,2}(k).
\]
Let us neglect the remainder term for the moment. If all $\lambda_j$ are positive, then
$\sum\lambda_jk_j^2 > 0$ for all $k \ne 0$, so $f$ has a local minimum; likewise, if all $\lambda_j$ are negative, then $f$ has a local maximum. If some $\lambda_j$ are positive and some are negative,
then $\sum\lambda_jk_j^2$ will be positive for some values of $k$ and negative for others, so $f$ will
have neither a maximum nor a minimum. It’s not hard to see that these conclusions
remain valid when the remainder term is included; we shall present the details below. Only when some of the $\lambda_j$ are zero is the outcome unclear; it is precisely in
this situation that the remainder term plays a significant role.
This is all very well, but the condition that ∂i ∂j f (a) = 0 for i ̸= j is ob-
viously very special. However, it can always be achieved by a suitable rotation
of coordinates, that is, by replacing the standard basis for Rn with another suit-
ably chosen orthonormal basis. This is the content of the spectral theorem, which
says that every symmetric matrix has an orthonormal eigenbasis (see Appendix A,
(A.56)–(A.58)). With this result in hand, we arrive at the second-derivative test for
functions of several variables.

2.81 Theorem. Suppose $f$ is of class $C^2$ at $a$ and that $\nabla f(a) = 0$, and let $H$ be
the Hessian matrix (2.79). For $f$ to have a local minimum at $a$, it is necessary for
the eigenvalues of $H$ all to be nonnegative and sufficient for them all to be strictly
positive. For $f$ to have a local maximum at $a$, it is necessary for the eigenvalues of
$H$ all to be nonpositive and sufficient for them all to be strictly negative.

Proof. We prove only the first assertion; the argument for the second one is similar.
Let $u_1, \ldots, u_n$ be an orthonormal eigenbasis for $H$ with eigenvalues $\lambda_1, \ldots, \lambda_n$.
Our assertion is then that $f$ has a local minimum if all the eigenvalues are (strictly)
positive but not if some eigenvalue is negative.
If all eigenvalues are positive, let $l$ be the smallest of them. Writing $k = c_1u_1 + \cdots + c_nu_n$ as before, we have
\[
Hk\cdot k = \sum\lambda_jc_j^2 \ge l\sum c_j^2 = l|k|^2.
\]
But when $k$ is near $0$, the error term in (2.80) is less than $\tfrac14 l|k|^2$ by Corollary 2.75,
so
\[
f(a+k) - f(a) \ge \tfrac12 l|k|^2 - \tfrac14 l|k|^2 > 0.
\]
Thus $f$ has a local minimum. On the other hand, if some eigenvalue, say $\lambda_1$, is
negative, the same argument shows that $f(a + tu_1) - f(a) < 0$ for small $t \ne 0$, so
$f$ does not have a local minimum.

In short, if all eigenvalues are positive, then $f$ has a local minimum; if all
eigenvalues are negative, then $f$ has a local maximum. If there are two eigenvalues
of opposite signs, then $f$ is said to have a saddle point. At a saddle point, $f$ has
neither a maximum nor a minimum; its graph goes up in one direction and down in
some other direction. The only cases where we can’t be sure what’s going on are
those where all the eigenvalues of $H$ are nonnegative or nonpositive but at least one
of them is zero. When that happens, if $k$ is an eigenvector with eigenvalue 0 (i.e.,
$k$ is in the nullspace of $H$), the quadratic term in (2.80) vanishes and the remainder
term becomes significant; to determine the behavior of $f$ near $a$ we need to look at
the higher-order terms in the Taylor expansion.

FIGURE 2.5: Left: A local maximum ($z = -x^2 - y^2$). Middle: A
saddle point ($z = x^2 - y^2$). Right: A degenerate critical point ($z = x^3 - y^2$).

Some types of critical points are illustrated in Figure 2.5. A critical point for
which zero is an eigenvalue of the Hessian matrix $H$ (or equivalently, for which
$\det H = 0$ or $H$ is singular) is called degenerate.
In two dimensions it is easy to sort out the various cases:

2.82 Theorem. Suppose $f$ is of class $C^2$ on an open set in $\mathbb{R}^2$ containing the point
$a$, and suppose $\nabla f(a) = 0$. Let $\alpha = \partial_1^2 f(a)$, $\beta = \partial_1\partial_2 f(a)$, $\gamma = \partial_2^2 f(a)$.
Then:
a. If $\alpha\gamma - \beta^2 < 0$, $f$ has a saddle point at $a$.
b. If $\alpha\gamma - \beta^2 > 0$ and $\alpha > 0$, $f$ has a local minimum at $a$.
c. If $\alpha\gamma - \beta^2 > 0$ and $\alpha < 0$, $f$ has a local maximum at $a$.
d. If $\alpha\gamma - \beta^2 = 0$, no conclusion can be drawn.

Proof. The determinant of the Hessian matrix $H = \begin{pmatrix}\alpha & \beta\\ \beta & \gamma\end{pmatrix}$ is $\alpha\gamma - \beta^2$. Since the
determinant is the product of the eigenvalues, the two eigenvalues have opposite
signs if $\alpha\gamma - \beta^2 < 0$, and they have the same sign if $\alpha\gamma - \beta^2 > 0$. In the
latter case, $H$ is positive (or negative) definite when the eigenvalues are positive
(or negative), and since $\alpha = Hu\cdot u$ where $u = (1, 0)$, these cases occur precisely
when $\alpha > 0$ or $\alpha < 0$. The result now follows from Theorem 2.81.
EXAMPLE 1. Find and classify the critical points of the function $f(x,y) = xy(12 - 3x - 4y)$.
Solution. We have
\[
\partial_x f = 12y - 6xy - 4y^2 = y(12 - 6x - 4y),
\]
\[
\partial_y f = 12x - 3x^2 - 8xy = x(12 - 3x - 8y).
\]
Thus, if $\partial_x f = 0$ then $y = 0$ or $12 - 6x - 4y = 0$, and if $\partial_y f = 0$ then $x = 0$
or $12 - 3x - 8y = 0$. So there are four possibilities:
\[
y = x = 0, \qquad y = 12 - 3x - 8y = 0,
\]
\[
12 - 6x - 4y = x = 0, \qquad\text{and}\qquad 12 - 6x - 4y = 12 - 3x - 8y = 0.
\]
Solving these gives the critical points $(0,0)$, $(4,0)$, $(0,3)$, and $(\tfrac43, 1)$. Since
$\partial_x^2 f = -6y$, $\partial_y^2 f = -8x$, and $\partial_x\partial_y f = 12 - 6x - 8y$, Theorem 2.82 shows
that the first three of these are saddle points and the last is a local maximum.
The geometry of this example is quite simple. The set where $f = 0$ is the
union of the three lines $x = 0$, $y = 0$, and $3x + 4y = 12$. These lines separate
the plane into regions on which $f$ is alternately positive and negative. The three
saddle points are the points where these lines intersect, and the local maximum
is the “peak” in the middle of the triangle defined by these lines.
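The classification in Example 1 can be replayed mechanically. The sketch below encodes the determinant test of Theorem 2.82 and applies it to the four critical points, using the second partials computed in the example. The function names and the string labels are our own, and exact fractions are used so that no rounding intervenes.

```python
from fractions import Fraction

def classify(alpha, beta, gamma):
    """Determinant test of Theorem 2.82 at a critical point in the plane."""
    det = alpha * gamma - beta**2
    if det < 0:
        return "saddle point"
    if det > 0:
        return "local minimum" if alpha > 0 else "local maximum"
    return "inconclusive"

def second_partials(x, y):
    """(f_xx, f_xy, f_yy) for f(x, y) = xy(12 - 3x - 4y), as in Example 1."""
    return -6 * y, 12 - 6 * x - 8 * y, -8 * x

critical_points = [(0, 0), (4, 0), (0, 3), (Fraction(4, 3), 1)]
results = {p: classify(*second_partials(*p)) for p in critical_points}
for point, kind in results.items():
    print(point, kind)
```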
EXAMPLE 2. Find and classify the critical points of the function $f(x,y) = y^3 - 3x^2y$.
Solution. We have $\partial_x f = -6xy$ and $\partial_y f = 3y^2 - 3x^2$. Thus, if $\partial_x f = 0$,
then either $x = 0$ or $y = 0$, and the equation $\partial_y f = 0$ then forces $x = y = 0$.
So $(0,0)$ is the only critical point. The reader may readily verify that all the
second derivatives of $f$ vanish at $(0,0)$, so Theorem 2.82 is of no use. But since
$f(x,y) = y(y - \sqrt3\,x)(y + \sqrt3\,x)$, the lines $y = 0$ and $y = \pm\sqrt3\,x$ separate the
plane into six regions on which $f$ is alternately positive and negative, and these
regions all meet at the origin. Thus $f$ has neither a maximum nor a minimum at
the origin. This configuration is called a “monkey saddle.” (The three regions
where $f < 0$ provide places for the two legs and tail of a monkey sitting on the
graph of f at the origin.)
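The alternating sign pattern is easy to confirm by sampling $f$ at the midpoint of each of the six sectors. In the sketch below the function name `sector_signs`, the sampling radius, and the choice of midpoints (angles of 30°, 90°, ..., 330°) are ours.

```python
import math

def f(x, y):
    return y**3 - 3 * x**2 * y

def sector_signs(radius=1e-3):
    """Sign of f at the midpoints of the six sectors cut out by y = 0 and y = ±sqrt(3) x."""
    signs = []
    for m in range(6):
        theta = math.pi / 6 + m * math.pi / 3   # 30°, 90°, 150°, 210°, 270°, 330°
        value = f(radius * math.cos(theta), radius * math.sin(theta))
        signs.append(1 if value > 0 else -1)
    return signs

print(sector_signs())
```

Three of the six entries are negative and three positive, however small the radius, so the origin is neither a local maximum nor a local minimum.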

EXERCISES
1. Find all the critical points of the following functions. Tell whether each nondegenerate critical point is a local maximum, local minimum, or saddle point. If
possible, tell whether the degenerate critical points are local extrema too.
a. $f(x,y) = x^2 + 3y^4 + 4y^3 - 12y^2$.
b. $f(x,y) = x^4 - 2x^2 + y^3 - 6y$.
c. $f(x,y) = (x-1)(x^2 - y^2)$.
d. $f(x,y) = x^2y^2(2 - x - y)$.
e. $f(x,y) = (2x^2 + y^2)e^{-x^2-y^2}$.
f. $f(x,y) = ax^{-1} + by^{-1} + xy$, $a, b \ne 0$. (The nature of the critical point
depends on the signs of $a$ and $b$.)
g. $f(x,y,z) = x^3 - 3x - y^3 + 9y + z^2$.
h. $f(x,y,z) = (3x^2 + 2y^2 + z^2)e^{-x^2-y^2-z^2}$.
i. $f(x,y,z) = xyz(4 - x - y - z)$.
2. What are the conditions on $a, b, c$ for $f(x,y) = ax^2 + bxy + cy^2$ to have a
minimum, maximum, or saddle point at the origin?
3. The origin is a degenerate critical point of the functions $f_1(x,y) = x^2 + y^4$,
$f_2(x,y) = x^2 - y^4$, and $f_3(x,y) = x^2 + y^3$. Describe the graphs of these three
functions near the origin. Is the origin a local extremum for any of them?
4. Let $f(x,y) = (y - x^2)(y - 2x^2)$.
a. Show that the origin is a degenerate critical point of $f$.
b. Show that the restriction of $f$ to any line through the origin (i.e., the function $g(t) = f(at, bt)$ for any $(a,b) \ne (0,0)$) has a local minimum at the
origin, but $f$ does not have a local minimum at the origin. (Hint: Consider
the regions where $f > 0$ or $f < 0$.)
5. Let $H$ be the Hessian of $f$. Show that for any unit vector $u$, $Hu\cdot u$ is the
second directional derivative of $f$ in the direction $u$.

2.9 Extreme Value Problems


In the previous section we studied the critical points of a differentiable function,
which include its local maxima and minima. In this section we consider the prob-
lem of finding the absolute maximum or minimum of a differentiable function on a
set S ⊂ Rn , which has a somewhat different flavor.
The fundamental theoretical fact that underlies this study is the extreme value
theorem (1.23), whose statement we now recall: If S is a compact subset of Rn and
f is a continuous function on S, then f assumes a minimum and a maximum value
on S — that is, there are points a, b ∈ S such that f (a) ≤ f (x) ≤ f (b) for all
x ∈ S. As the examples that we presented in §1.6 show, the conclusion is generally
invalid if S fails to be both closed and bounded. Accordingly, we shall assume
throughout this section that S is closed, but we shall include some discussion of the
situation when S is unbounded. Moreover, to keep the problem within the realm
of calculus, we shall assume that S is either (i) the closure of an open set with
a smooth or piecewise smooth boundary, or (ii) a smooth submanifold, such as a
curve or surface, defined by one or more constraint equations. (These geometric
notions will be studied in more detail in Chapter 3.)
Suppose, to begin with, that S is the closure of an open set in Rn , and that we
wish to find the absolute maximum or minimum of a differentiable function f on
S. We assume that the boundary of S is a smooth submanifold (a curve if n = 2, a
surface if n = 3) that can be described as the level set of a differentiable function
G, or that it is the union of a finite number of pieces of this form. (For example,
if S is a cube, its boundary is the union of six faces, each of which is a region in a
smooth surface, viz., a plane.) If S is bounded, the extreme values are guaranteed
to exist, and we can proceed as follows.

i. If an extreme value occurs at an interior point of $S$, that point must be a critical point of $f$. So, we find all the critical points of $f$ inside $S$ and compute
the values of $f$ at these points.

ii. To find candidates for extreme values on the boundary, we can apply the
techniques for solving extremal problems with constraints presented below.

iii. Finally, we pick the smallest and largest of the values of $f$ at the points
found in steps (i) and (ii); these will be the minimum and maximum of $f$ on
$S$. There is usually no need to worry about the second derivative test in this
situation.

If S is unbounded, the procedure is the same, but we must add an extra argu-
ment to show that the desired extremum actually exists. This must be done on a
case-by-case basis, as there is no general procedure available; however, here are a
couple of simple results that cover many situations in practice and illustrate the sort
of reasoning that must be employed.

2.83 Theorem. Let $f$ be a continuous function on an unbounded closed set $S \subset \mathbb{R}^n$.
a. If $f(x) \to +\infty$ as $|x| \to \infty$ ($x \in S$), then $f$ has an absolute minimum but no
absolute maximum on $S$.
b. If $f(x) \to 0$ as $|x| \to \infty$ ($x \in S$) and there is a point $x_0 \in S$ where $f(x_0) > 0$
(resp. $f(x_0) < 0$), then $f$ has an absolute maximum (resp. minimum) on $S$.

Proof. (a) If $f(x) \to +\infty$ as $|x| \to \infty$, then clearly $f$ has no maximum. On the
other hand, pick a point $x_0 \in S$ and let $V = \{x \in S : f(x) \le f(x_0)\}$. Then $V$ is
closed (by Theorem 1.13) and bounded (since $f(x) > f(x_0)$ when $|x|$ is large). By
the extreme value theorem, $f$ has a minimum on $V$, say at $a \in V$. But then $f(a)$ is
the absolute minimum of $f$ on $S$ because $f(x) > f(x_0) \ge f(a)$ for $x \in S \setminus V$.
The proof of (b) is similar. If $f(x_0) > 0$, let $V = \{x \in S : f(x) \ge f(x_0)\}$. Then
$V$ is closed (by Theorem 1.13) and bounded (since $f(x) \to 0$ as $|x| \to \infty$). By the
extreme value theorem, $f$ has a maximum on $V$, say at $a \in V$. But then $f(a)$ is the
absolute maximum of $f$ on $S$ because $f(x) < f(x_0) \le f(a)$ for $x \in S \setminus V$.

EXAMPLE 1. Find the absolute maximum and minimum values of the function
$f(x,y) = \dfrac{x}{x^2 + (y-1)^2 + 4}$ on the first quadrant $S = \{(x,y) : x, y \ge 0\}$.
Solution. Clearly $f(x,y) \ge 0$ for $x, y \ge 0$, and $f(0,y) = 0$, so the
minimum is zero, achieved at all points on the $y$-axis. Moreover, $f(x,y)$ is less
than the smaller of $x^{-1}$ and $\tfrac12|y-1|^{-1}$ (since $x^2 + (y-1)^2 \ge 2x|y-1|$), so it vanishes as $|(x,y)| \to \infty$. Hence,
by Theorem 2.83, $f$ has a maximum on $S$, which must occur either in the
interior of $S$ or on the positive $x$-axis. A short calculation that we leave to the
reader shows that the only critical point of $f$ in $S$ is at $(2,1)$, and $f(2,1) = \tfrac14$.
Also, $f(x,0) = x/(x^2+5)$, and the critical points of this function of one
variable are at $x = \pm\sqrt5$. Only $x = \sqrt5$ is relevant for our purposes, and
$f(\sqrt5, 0) = \sqrt5/10$, which is a bit less than $\tfrac14$. Thus the maximum value of $f$
on $S$ is $\tfrac14$.
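A crude grid search is no substitute for the argument above, but it offers a quick sanity check of the answer. In the sketch below the window $[0,20]\times[0,20]$, the grid spacing, and the name `grid_max` are arbitrary choices of ours; the decay of $f$ at infinity is what justifies ignoring the rest of the quadrant.

```python
def f(x, y):
    return x / (x**2 + (y - 1)**2 + 4)

def grid_max(x_max=20.0, y_max=20.0, steps=400):
    """Largest value of f on a uniform grid over [0, x_max] x [0, y_max].

    Returns a tuple (value, x, y).
    """
    best = (f(0.0, 0.0), 0.0, 0.0)
    for i in range(steps + 1):
        for j in range(steps + 1):
            x, y = x_max * i / steps, y_max * j / steps
            best = max(best, (f(x, y), x, y))
    return best

value, x, y = grid_max()
print(value, x, y)
# Compare with the candidate on the positive x-axis:
print(f(5 ** 0.5, 0.0))
```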

Let us turn to the study of extremum problems with constraints. To be precise,
we consider the following situation: We wish to minimize or maximize a differentiable function $f$ on the set
\[
S = \{x : G(x) = 0\},
\]
where $G$ is of class $C^1$ and $\nabla G(x) \ne 0$ on $S$. (The latter assumption guarantees
that the set $S$ is smooth in the sense that it possesses a tangent (hyper)plane at every
point $a \in S$, namely, the (hyper)plane through $a$ that is perpendicular to the vector
$\nabla G(a)$; see Theorem 2.37 and §§3.3–4.) Most applied max-min problems are of
this sort, including the ones one first meets in freshman calculus, for example,
“Find the maximum area of a rectangle with a given perimeter $P$,” i.e., maximize
$xy$ subject to the constraint $2x + 2y = P$.
There are several methods for attacking such a problem. The most obvious
one is to solve the constraint equation G(x) = 0 for one of the variables, either
explicitly or implicitly, and thus reduce the problem to finding the critical points of
a function of the remaining n−1 variables. (Of course, this is what one always does
in freshman calculus.) Another possibility is to describe the set S parametrically
and thus obtain an (n − 1)-variable problem with the parameters as independent
variables. This is particularly effective when $S$ is a closed curve or surface such as
a circle or sphere that cannot be described in its entirety as the graph of a function.
There is yet another method, however, which may be derived from the follow-
ing considerations. Suppose that f , as a function on the set S = {x : G(x) = 0},
has a local extremum at x = a. If x = h(t) is a curve on S passing through a at
t = 0, the composite function ϕ(t) = f (h(t)) has a local extremum at t = 0, so
∇f (a) · h′ (0) = ϕ′ (0) = 0. Thus, ∇f (a) is orthogonal to the tangent vector to
every curve on S passing through a; in other words, ∇f (a) is normal to S at a.
But we already know that ∇G(a) is normal to S at a since S is a level set of G. It
follows that ∇f is proportional to ∇G at a:

∇f (a) = λ∇G(a) for some λ ∈ R.

This is the key to the method. The n equations ∂j f = λ∂j G together with the
constraint equation G = 0 give n+1 equations in the n+1 variables x1 , . . . , xn and
λ, and solving them simultaneously will locate the local extrema of f on S. (It will
also produce the appropriate values of λ, which are usually not of much interest,
although one may have to find them in the process of solving for the xj ’s.) This
method is called Lagrange’s method, and the parameter λ is called the Lagrange
multiplier for the problem.
The other methods described above involve reducing the original n-variable
problem to an (n − 1)-variable problem, whereas Lagrange’s method deals directly
with the original n variables. This may be advantageous when the reduction is awk-
ward or when it would involve breaking some symmetry of the original problem.
The disadvantage is that, whereas the other methods lead to solving n − 1 equations
in n − 1 variables, Lagrange’s method requires solving n + 1 equations in n + 1
variables.

EXAMPLE 2. Let’s try out Lagrange’s method on the simple problem of maximizing the area of a rectangle with perimeter $P$. Here $f(x,y) = xy$ and
$G(x,y) = 2x + 2y - P$, so the equations $\partial_x f = \lambda\partial_x G$, $\partial_y f = \lambda\partial_y G$, and
$G = 0$ become
\[
y = 2\lambda, \qquad x = 2\lambda, \qquad 2x + 2y = P.
\]
The first two equations give $y = x$; substituting into the third equation shows
that $x = y = \tfrac14P$, so the maximum of $f$ is $\tfrac1{16}P^2$. (Note that the only relevant
values of $x$ and $y$ are $0 \le x, y \le \tfrac12P$, so we’re working on a compact set and
the existence of the maximum is not in question. The minimum on this set,
namely 0, is achieved when $x = 0$, $y = \tfrac12P$, or vice versa.)
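The three equations of Example 2 are simple enough to solve by hand, but a short exact computation makes the bookkeeping explicit. In the sketch below, `rectangle_lagrange` is a hypothetical helper name, and the perimeter 20 is just a sample value.

```python
from fractions import Fraction

def rectangle_lagrange(P):
    """Solve y = 2*lam, x = 2*lam, 2x + 2y = P exactly; returns (x, y, lam)."""
    lam = Fraction(P) / 8      # substituting x = y = 2*lam into 2x + 2y = P gives 8*lam = P
    x = y = 2 * lam
    return x, y, lam

P = 20
x, y, lam = rectangle_lagrange(P)
print(x, y, lam, x * y)        # the area x*y should equal P^2/16

# Any other rectangle with the same perimeter, (x + t) by (y - t), has smaller area.
print(all((x + t) * (y - t) < x * y for t in (Fraction(1, 10), 1, 3, 5)))
```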
EXAMPLE 3. Find the absolute maximum and minimum of $f(x,y) = x^2 + y^2 + y$ on the disc $x^2 + y^2 \le 1$.
Solution. We have $f_x = 2x$, $f_y = 2y + 1$. Thus the only critical point is
at $(0, -\tfrac12)$ (which lies in the disc), at which $f = -\tfrac14$. To see what happens on
the boundary, we can use Lagrange’s method with $G(x,y) = x^2 + y^2 - 1$. We
have to solve
\[
2x = 2\lambda x, \qquad 2y + 1 = 2\lambda y, \qquad x^2 + y^2 = 1.
\]
The first equation implies that either $x = 0$ or $\lambda = 1$. The latter alternative
is impossible since the equation $2y + 1 = 2y$ has no solutions, so $x = 0$ and
then $y = \pm1$ (since $x^2 + y^2 = 1$). We have $f(0,1) = 2$, $f(0,-1) = 0$.
So the absolute maximum is 2 (at $(0,1)$) and the absolute minimum is $-\tfrac14$ (at
$(0, -\tfrac12)$).
We could also analyze $f$ on the boundary by parametrizing the latter as
$x = \cos\theta$, $y = \sin\theta$. Then $f(\cos\theta, \sin\theta) = 1 + \sin\theta$, which has a maximum
value of 2 at $\theta = \tfrac12\pi$ and a minimum value of 0 at $\theta = \tfrac32\pi$.
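The recipe used in Example 3 (compare the values at interior critical points with the extreme values on the boundary) can be mimicked numerically. The sketch below samples the boundary circle through its parametrization; the sample count and the names `boundary_extremes`, `absolute_max`, and `absolute_min` are our own choices.

```python
import math

def f(x, y):
    return x**2 + y**2 + y

def boundary_extremes(samples=3600):
    """Max and min of f over sample points (cos t, sin t) of the unit circle."""
    values = [f(math.cos(2 * math.pi * i / samples),
                math.sin(2 * math.pi * i / samples))
              for i in range(samples)]
    return max(values), min(values)

interior_value = f(0.0, -0.5)      # value at the only interior critical point
b_max, b_min = boundary_extremes()
absolute_max = max(interior_value, b_max)
absolute_min = min(interior_value, b_min)
print(absolute_max, absolute_min)
```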
Similar ideas work when there is more than one constraint equation. Let’s
consider the case of two equations:
\[
S = \{x : G_1(x) = G_2(x) = 0\}.
\]
Here G1 and G2 are differentiable functions (the subscripts are labels for the func-
tions, not partial derivatives), and we assume that the vectors ∇G1 (x) and ∇G2 (x)
are linearly independent for x ∈ S. (Again, this guarantees that S is a “smooth”
set, as we shall see in Chapter 3.) To find the extreme values of a differentiable
function on S, we have three methods:
• Solve the equations G1 (x) = G2 (x) = 0 for two of the variables and find
the critical points of the resulting function of the remaining n − 2 variables.
• Find a parametrization of the set S in terms of parameters t1 , . . . , tn−2 , and
find the critical points of f as a function of these variables.
• (Lagrange’s method) At a local extremum, ∇f must be normal to S and
hence must be a linear combination of ∇G1 and ∇G2 :
∇f = λ∇G1 + µ∇G2 for some λ, µ ∈ R.
The n equations ∂j f = λ∂j G1 + µ∂j G2 together with the two constraint
equations G1 = G2 = 0 can be solved for the n + 2 variables x1 , . . . , xn , λ,
and µ, yielding the points where local extrema can occur.
The generalization to k constraint equations should now be pretty clear.
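To illustrate the two-constraint version of Lagrange's method, here is a sketch for a hypothetical problem of our own choosing: minimize $f(x,y,z) = x^2 + y^2 + z^2$ subject to $G_1 = x + y + z - 3 = 0$ and $G_2 = x - z - 1 = 0$. Because $f$ is quadratic and the constraints are linear, the $n + 2 = 5$ equations form a linear system, which we solve exactly with a small Gauss-Jordan routine (`solve_linear` is our helper, not a library function).

```python
from fractions import Fraction

def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination over exact fractions.

    A is assumed to be square and invertible.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

# Unknowns (x, y, z, lam, mu). Rows: the three components of
# grad f = lam * grad G1 + mu * grad G2, followed by G1 = 0 and G2 = 0.
A = [[2, 0, 0, -1, -1],
     [0, 2, 0, -1,  0],
     [0, 0, 2, -1,  1],
     [1, 1, 1,  0,  0],
     [1, 0, -1, 0,  0]]
b = [0, 0, 0, 3, 1]
x, y, z, lam, mu = solve_linear(A, b)
print(x, y, z, lam, mu)
```

For nonlinear $f$ or $G_j$ the equations are no longer linear and must be solved by other means, but the count of equations and unknowns is the same.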
2.9. Extreme Value Problems 105

EXERCISES
1. Find the extreme values of f(x, y) = 2x² + y² + 2x on the set {(x, y) :
x² + y² ≤ 1}.
2. Find the extreme values of f(x, y) = 3x² − 2y² + 2y on the set {(x, y) :
x² + y² ≤ 1}.
3. Find the extreme values of f(x, y) = x³ − x + y² − 2y on the closed triangular
region with vertices at (−1, 0), (1, 0), and (0, 2).
4. Find the extreme values of f(x, y) = 3x² − 8xy − 4y² + 2x + 16y on the set
{(x, y) : 0 ≤ x ≤ 4, 0 ≤ y ≤ 3}.
5. Let f(x, y) = (A − bx − cy)² + x² + y², where A, b, c are positive constants.
Show that f has an absolute minimum on R² and find it.
6. Show that f(x, y) = (x² + 2y²)e^(−x²−y²) has an absolute minimum and maxi-
mum on R², and find them.
7. Show that f(x, y) = (x² − 2y²)e^(−x²−y²) has an absolute minimum and maxi-
mum on R², and find them.
8. Let f(x, y) = xy + 3x⁻¹ + 4y⁻¹. Show that f has a minimum but no maximum
on the set {(x, y) : x, y > 0}, and find the minimum.
9. Find the extreme values of f(x, y, z) = x² + 2y² + 3z² on the unit sphere
{(x, y, z) : x² + y² + z² = 1}.
10. Let (x1, y1), . . . , (xk, yk) be points in the plane whose x-coordinates are not
all equal. The linear function f(x) = ax + b such that the sum of the squares
of the vertical distances from the given points to the line y = ax + b (namely,
Σ_{j=1}^{k} (yj − axj − b)²) is minimized is called the linear least-squares fit to the
points (xj, yj). Show that it is given by

    a = (k⁻¹ Σ_{j=1}^{k} xj yj − x̄ ȳ) / (k⁻¹ Σ_{j=1}^{k} xj² − x̄²),    b = ȳ − a x̄,

where x̄ = k⁻¹ Σ_{j=1}^{k} xj and ȳ = k⁻¹ Σ_{j=1}^{k} yj are the averages of the xj’s and
yj’s.
11. Let x, y, z be positive variables and a, b, c positive constants. Find the mini-
mum of x + y + z subject to the constraint (a/x) + (b/y) + (c/z) = 1.
12. Find the minimum possible value of the sum of the three linear dimensions
(length, breadth, and width) of a rectangular box whose volume is a given
constant V . Is there a maximum possible value?
13. Find the point on the line through (1, 0, 0) and (0, 1, 0) that is closest to the
line through (0, 0, 0) and (1, 1, 1). (Hint: Minimize the square of the distance.)
14. Find the maximum possible volume of a rectangular solid if the sum of the
areas of the bottom and the four vertical sides is a constant A, and find the
dimensions of the box that has the maximum volume.
15. The two planes x + z = 4 and 3x − y = 6 intersect in a line L. Use Lagrange’s
method to find the point on L that is closest to the origin. (Hint: Minimize the
square of the distance.)
16. Find the maximum value of (xv − yu)² subject to the constraints x² + y² = a²
and u² + v² = b². Do this (a) by Lagrange’s method, (b) by the parametrization
x = a cos θ, y = a sin θ, u = b cos ϕ, v = b sin ϕ.
17. Let P1 = (x1 , y1 ) and P2 = (x2 , y2 ) be two points in the plane such that
x1 ≠ x2 and y1 > 0 > y2. A particle travels in a straight line from P1 to a point
Q on the x-axis with speed v1 , then in a straight line from Q to P2 with speed
v2 . The point Q is allowed to vary. Use Lagrange’s method to show that the
total travel time from P1 to P2 is minimized when (sin θ1 )/(sin θ2 ) = v1 /v2 ,
where θ1 (resp. θ2 ) is the angle between the line P1 Q (resp. QP2 ) and the
vertical line through Q. (Hint: Take θ1 , θ2 as the independent variables.)
18. Let x1, x2, . . . , xn denote nonnegative numbers. For c > 0, maximize the
product x1 x2 · · · xn subject to the constraint x1 + x2 + · · · + xn = c, and hence
derive the inequality of geometric and arithmetic means,

    (x1 x2 · · · xn)^(1/n) ≤ (x1 + x2 + · · · + xn)/n    (x1, . . . , xn ≥ 0),

where equality holds if and only if the xj’s are all equal.
19. Let A be a symmetric n × n matrix, and let f (x) = (Ax) · x for x ∈ Rn . Show
that the maximum and minimum of f on the unit sphere {x : |x| = 1} are the
largest and smallest eigenvalues of A.

2.10 Vector-Valued Functions and Their Derivatives


So far our focus has been on real-valued functions on Rn , that is, mappings from
Rn to R. In a number of situations, however, it is useful to consider vector-valued
functions, that is, mappings (or maps, for short) from Rn to Rm where n and m
are any positive integers. We shall denote such functions or mappings by boldface
letters such as f :
    f(x) = (f1(x), f2(x), . . . , fm(x)).

Examples of the uses of such mappings include the following:



• Functions from R to Rᵐ can be interpreted as parametrized curves in Rᵐ.
Similarly, maps from R² to Rᵐ give parametrizations of 2-dimensional sur-
faces in Rᵐ, and so forth.

• In the situation of the chain rule, where w is a function of x1, . . . , xn and the
xj’s are functions of other variables t1, . . . , tk, we are dealing with a map
x = g(t) from Rᵏ to Rⁿ.

• A map f : Rⁿ → Rⁿ can represent a vector field, that is, a map that assigns
to each point x a vector quantity f(x) such as a force or a magnetic field.

• A map f : Rⁿ → Rⁿ can represent a transformation of a region of space
obtained by applying geometric operations such as dilations and rotations.
For example, under the transformation f(x) = 2x + a, a region in Rⁿ is
expanded by a factor of 2 and then moved over by the amount a.

• A map f : Rⁿ → Rⁿ can represent the transformation from one coordi-
nate system to another, for example, the polar coordinate map f(r, θ) =
(r cos θ, r sin θ).

We shall have more to say about all of these interpretations in Chapter 3.


The simplest mappings from Rⁿ to Rᵐ are the linear² ones, that is, maps
f : Rⁿ → Rᵐ that satisfy

    f(ax + by) = a f(x) + b f(y)    (a, b ∈ R, x, y ∈ Rⁿ).

Such a map is represented by an m × n matrix A = (Ajk), in such a way that if
elements of Rⁿ and Rᵐ are represented as column vectors, f(x) is just the matrix
product Ax. In other words,

    fj(x) = Σ_{k=1}^{n} Ajk xk.

You can see that the study of mappings from Rn to Rm is complicated, as the study
of the linear ones already constitutes the subject of linear algebra! However, the
basic ideas of differential calculus generalize easily from the scalar case. The only
bits of linear algebra we need for present purposes are the correspondence between
linear maps and matrices, the notion of addition and multiplication of matrices, and
the notion of determinant; see Appendix A, (A.3)–(A.15) and (A.24)-(A.33).
² Here we use the word “linear” in the more restrictive sense; see Appendix A, (A.5).

A mapping f from an open set S ⊂ Rⁿ into Rᵐ is said to be differentiable at
a ∈ S if there is an m × n matrix L such that

(2.84)    lim_{h→0} |f(a + h) − f(a) − Lh| / |h| = 0.
There can only be one such matrix L (the reason is given in the next paragraph),
and it is called the (Fréchet) derivative of f at a. Commonly used notations for
it include Df (a), Da f , f ′ (a), and dfa . We shall denote it by Df (a). Thus, if f is
differentiable on S, the map Df that assigns to each a ∈ S the derivative Df (a) is
a matrix-valued function on S.
We need to verify that there is at most one matrix L satisfying (2.84). If L′ is
another such matrix, we have
    |Lh − L′h| = |[f(a + h) − f(a) − L′h] − [f(a + h) − f(a) − Lh]|
               ≤ |f(a + h) − f(a) − L′h| + |f(a + h) − f(a) − Lh|,

so that |Lh − L′h|/|h| → 0. But if L′ ≠ L, we can pick a unit vector u with
Lu ≠ L′u. Setting h = su, we have h → 0 as s → 0, but

    |L(su) − L′(su)| / |su| = |s(Lu − L′u)| / |s| = |Lu − L′u| ↛ 0.

This is a contradiction, so L′ = L.
In the scalar case m = 1 (where the map f is just a real-valued function f), the definition of differentiability
above coincides with the old one, and Df (a) is just ∇f (a), considered as a row
vector, i.e., a 1×n matrix. (If we think of ∇f (a) as a column vector, then Df (a) =
[∇f (a)]∗ .) Something similar happens when m > 1. Indeed, a vector v approaches
the vector 0 precisely when each of its components approaches the number 0, so
(2.84) is equivalent to the equations
    lim_{h→0} |fj(a + h) − fj(a) − Lj · h| / |h| = 0    (j = 1, . . . , m)
where Lj is the jth row of the matrix L. But these equations say that the compo-
nents fj are differentiable at x = a and that ∇fj (a) = Lj . In short, we have:
2.85 Proposition. An Rm -valued function f is differentiable at a precisely when
each of its components f1 , . . . , fm is differentiable at a. In this case, Df (a) is the
matrix whose jth row is the row vector ∇fj (a). In other words,
         ⎛ ∂f1/∂x1   · · ·   ∂f1/∂xn ⎞
    Df = ⎜    ⋮                 ⋮    ⎟ .
         ⎝ ∂fm/∂x1   · · ·   ∂fm/∂xn ⎠
The general form of the chain rule can now be stated very simply:
2.86 Theorem (Chain Rule III). Suppose g : Rk → Rn is differentiable at a ∈ Rk
and f : Rn → Rm is differentiable at g(a) ∈ Rn . Then H = f ◦ g : Rk → Rm is
differentiable at a, and

DH(a) = Df (g(a))Dg(a),

where the expression on the right is the product of the matrices Df (g(a)) and
Dg(a).
Proof. Differentiability of H is equivalent to the differentiability of each of its
components Hi = fi ◦ g, and for these we have, by Theorem 2.29,
    ∂m Hi = (∂1 fi)(∂m g1) + · · · + (∂n fi)(∂m gn) = Σ_{j=1}^{n} (∂j fi)(∂m gj).

(∂m Hi and ∂m gj are to be evaluated at a, ∂j fi at g(a).) But ∂m Hi is the (i, m)th
entry of the matrix DH, and the sum on the right is the (i, m)th entry of the product
matrix (Df)(Dg), so we are done.

Since the product of two matrices gives the composition of the linear transfor-
mations defined by those matrices, the chain rule just says that the linear approxi-
mation of a composition is the composition of the linear approximations.
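
As a concrete check of the matrix form of the chain rule, here is a small worked example. The maps g and f below are hypothetical choices made for illustration and do not come from the text.

```latex
\[
\mathbf{g}(s,t) = (s^2,\; st), \qquad \mathbf{f}(x,y) = (x+y,\; xy), \qquad
\mathbf{H}(s,t) = \mathbf{f}(\mathbf{g}(s,t)) = (s^2 + st,\; s^3 t).
\]
\[
D\mathbf{g}(s,t) = \begin{pmatrix} 2s & 0 \\ t & s \end{pmatrix}, \qquad
D\mathbf{f}(x,y) = \begin{pmatrix} 1 & 1 \\ y & x \end{pmatrix}, \qquad
D\mathbf{f}(\mathbf{g}(s,t)) = \begin{pmatrix} 1 & 1 \\ st & s^2 \end{pmatrix}.
\]
\[
D\mathbf{f}(\mathbf{g}(s,t))\,D\mathbf{g}(s,t)
= \begin{pmatrix} 2s + t & s \\ 3s^2 t & s^3 \end{pmatrix}
= D\mathbf{H}(s,t).
\]
```

The last matrix agrees with what one gets by differentiating the components s² + st and s³t of H directly.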
As we pointed out at the end of §2.1, the mean value theorem is false for vector-
valued functions. That is, for a differentiable Rm -valued function f with m > 1,
given two points a and b there is usually no c on the line segment between a and b
such that f (b) − f (a) = [Df (c)][b − a]. However, the main corollary of the mean
value theorem, an estimate on |f (a) − f (b)| in terms of a bound on the derivative
of f , is still valid. To state it, we employ the following terminology: The norm of
a linear mapping A : Rn → Rm is the smallest constant C such that |Ax| ≤ C|x|
for all x ∈ Rn . The norm of A is denoted by ∥A∥; thus,

(2.87) |Ax| ≤ ∥A∥ |x| (x ∈ Rn ).

Equivalently, ∥A∥ = max{|Ax| : |x| = 1}; see Exercise 9. An estimate for ∥A∥
in terms of the entries Ajk is given in Exercise 10.
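
The characterization ∥A∥ = max{|Ax| : |x| = 1} can be explored numerically. The sketch below is only an illustration for matrices with two columns: it samples unit vectors (cos t, sin t) on a grid, so in general it gives a lower estimate of ∥A∥ rather than the exact value. The function name operator_norm_2d and the sample matrix are assumptions of this example.

```python
import math

def operator_norm_2d(A, samples=3600):
    """Estimate ||A|| = max{|Ax| : |x| = 1} for a matrix with two columns
    by sampling unit vectors x = (cos t, sin t) on a uniform grid."""
    best = 0.0
    for i in range(samples):
        t = 2 * math.pi * i / samples
        x = (math.cos(t), math.sin(t))
        Ax = [row[0] * x[0] + row[1] * x[1] for row in A]
        best = max(best, math.sqrt(sum(c * c for c in Ax)))
    return best

A = [[3.0, 0.0], [0.0, 1.0]]  # hypothetical example matrix
norm_A = operator_norm_2d(A)

# The row-sum bound of Exercise 10a: sqrt(m) * max_j sum_k |A_jk|.
bound = math.sqrt(len(A)) * max(sum(abs(a) for a in row) for row in A)
print(norm_A, bound)
```

For this diagonal matrix the maximum is attained at x = (1, 0), which lies on the sampling grid, so the estimate coincides with ∥A∥ = 3; the bound of Exercise 10a is larger, as it should be.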
2.88 Theorem. Suppose f is a differentiable Rm -valued function on an open con-
vex set S ⊂ Rn , and suppose that ∥Df (x)∥ ≤ M for all x ∈ S. Then

|f (b) − f (a)| ≤ M |b − a| for all a, b ∈ S.



Proof. Given a unit vector u ∈ Rᵐ, let us consider the scalar-valued function
fu(x) = u · f(x). Clearly fu is differentiable on S and ∂k fu = u · ∂k f =
Σ_{j=1}^{m} uj ∂k fj. By the mean value theorem (2.39) applied to fu, then, there is a
point c on the line segment between a and b (depending on u) such that

    u · [f(b) − f(a)] = fu(b) − fu(a) = [∇fu(c)] · [b − a]
                      = Σ_{j,k} uj ∂k fj(c)(bk − ak) = u · [(Df(c))(b − a)].

Hence, by Cauchy’s inequality, the fact that |u| = 1, and (2.87),


    |u · [f(b) − f(a)]| ≤ |u| ∥Df(c)∥ |b − a| ≤ M |b − a|.

The desired result now follows by taking u to be the unit vector in the direction of
f (b)−f (a), so that u·[f (b)−f (a)] = |f (b)−f (a)|. (Of course, if f (b)−f (a) = 0,
the result is trivial.)
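
To see both the failure of the mean value theorem and the validity of the estimate in Theorem 2.88 in one simple case, consider the unit circle traversed once. This particular curve is chosen here for illustration and may differ from the example given in §2.1.

```latex
\[
\mathbf{f}(t) = (\cos t,\; \sin t), \qquad a = 0,\quad b = 2\pi, \qquad
D\mathbf{f}(c) = \begin{pmatrix} -\sin c \\ \cos c \end{pmatrix}.
\]
\[
\mathbf{f}(b) - \mathbf{f}(a) = \mathbf{0}, \qquad
\bigl|[D\mathbf{f}(c)](b-a)\bigr| = 2\pi \neq 0 \quad\text{for every } c,
\]
\[
\|D\mathbf{f}(c)\| = 1 \ \text{for all } c, \qquad\text{so}\qquad
|\mathbf{f}(b)-\mathbf{f}(a)| = 0 \le 1\cdot|b-a| = 2\pi .
\]
```

No point c satisfies f(b) − f(a) = [Df(c)][b − a], yet the inequality of Theorem 2.88 holds with M = 1.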

In the case m = n, the Fréchet derivative Df of a function f : Rⁿ → Rⁿ is
an n × n matrix of functions, defined on the set S where f is differentiable, so we
can form its determinant. This determinant, a scalar-valued function on S, is called
the Jacobian of the mapping f. It is sometimes denoted by J_f, or, if y = f(x), by
∂(y1, . . . , yn)/∂(x1, . . . , xn):

(2.89)    det Df = J_f = ∂(y1, . . . , yn)/∂(x1, . . . , xn).
(The last notation may look peculiar at first, but it is actually quite handy.) Since
the determinant of a product of two matrices is the product of the determinants, the
chain rule implies that if y = f (x) and x = g(t) (t, x, y ∈ Rn ), then

    J_{f◦g}(t) = J_f(g(t)) J_g(t), or

(2.90)    ∂(y1, . . . , yn)/∂(t1, . . . , tn) = [∂(y1, . . . , yn)/∂(x1, . . . , xn)] · [∂(x1, . . . , xn)/∂(t1, . . . , tn)].
If f : Rn → Rm with n > m, we can form a number of different Jacobians by
singling out m of the independent variables for attention and treating the others as
constants, thereby considering f as a function from Rm to Rm . In other words, we
can look at the determinants of all the m × m submatrices of the m × n matrix Df .
The last notation in (2.89) is handy in this situation because it allows us to name
the m independent variables that have been singled out. Similarly, if n < m, we
can consider the determinants of the n × n submatrices of Df obtained by singling
out n of the components of f .
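
A familiar instance of the notation (2.89) is the polar coordinate map mentioned at the beginning of this section. The computation below is a routine illustration rather than part of the original text.

```latex
\[
\mathbf{f}(r,\theta) = (r\cos\theta,\; r\sin\theta), \qquad
D\mathbf{f}(r,\theta) =
\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix},
\]
\[
\frac{\partial(x,y)}{\partial(r,\theta)} = \det D\mathbf{f}(r,\theta)
= r\cos^2\theta + r\sin^2\theta = r .
\]
```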

EXAMPLE 1. Let (u, v) = f(x, y, z) = (2x + y³, xe^(5y−7z)). Then

    Df(x, y, z) = ⎛ 2            3y²             0            ⎞
                  ⎝ e^(5y−7z)    5xe^(5y−7z)    −7xe^(5y−7z)  ⎠ ,

so

    ∂(u, v)/∂(x, y) = (10x − 3y²)e^(5y−7z),    ∂(u, v)/∂(y, z) = −21xy²e^(5y−7z),

    ∂(u, v)/∂(x, z) = −14xe^(5y−7z).
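
The derivative matrix and the three Jacobians of Example 1 can be checked numerically. The following sketch approximates Df by central differences at an arbitrarily chosen sample point and compares the result with the formulas above; the helper names numerical_jacobian, analytic_jacobian, and minor are assumptions of this illustration.

```python
import math

def f(x, y, z):
    """The map of Example 1: (u, v) = (2x + y^3, x e^{5y - 7z})."""
    return (2 * x + y ** 3, x * math.exp(5 * y - 7 * z))

def numerical_jacobian(func, point, h=1e-6):
    """Approximate the derivative matrix by central differences.
    Row j holds the partial derivatives of the j-th component."""
    values = func(*point)
    J = [[0.0] * len(point) for _ in values]
    for k in range(len(point)):
        plus, minus = list(point), list(point)
        plus[k] += h
        minus[k] -= h
        fp, fm = func(*plus), func(*minus)
        for j in range(len(values)):
            J[j][k] = (fp[j] - fm[j]) / (2 * h)
    return J

def analytic_jacobian(x, y, z):
    """The matrix Df(x, y, z) computed in Example 1."""
    e = math.exp(5 * y - 7 * z)
    return [[2.0, 3 * y ** 2, 0.0], [e, 5 * x * e, -7 * x * e]]

def minor(J, c1, c2):
    """Determinant of the 2 x 2 submatrix of J formed by columns c1 and c2."""
    return J[0][c1] * J[1][c2] - J[0][c2] * J[1][c1]

p = (1.0, 2.0, 1.5)  # arbitrary sample point; 5y - 7z = -0.5 keeps the numbers moderate
Jn = numerical_jacobian(f, p)
Ja = analytic_jacobian(*p)
print(Jn)
print(minor(Ja, 0, 1), minor(Ja, 1, 2), minor(Ja, 0, 2))
```

The three calls to minor correspond to ∂(u, v)/∂(x, y), ∂(u, v)/∂(y, z), and ∂(u, v)/∂(x, z), respectively.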

EXERCISES

1. Let (u, v) = f(x, y, z) = (xyz² − 4y², 3xy² − yz). Compute Df(x, y, z),
∂(u, v)/∂(x, y), ∂(u, v)/∂(y, z), and ∂(u, v)/∂(x, z).
2. Let (u, v, w) = f(x, y) = (x + 6y, 3xy, x² − 3y²). Compute Df(x, y),
∂(u, v)/∂(x, y), ∂(v, w)/∂(x, y), and ∂(u, w)/∂(x, y).
3. Define f : R² → R³ by f(u, v) = (u² − 5v, ve^(2u), 2u − log(1 + v²)).
a. Compute Df(u, v). What is Df(0, 0)?
b. Suppose g : R² → R² is of class C¹, g(1, 2) = (0, 0), and Dg(1, 2) is the
2 × 2 matrix with rows (1, 2) and (3, 4). Compute D(f ◦ g)(1, 2).
4. Define f : R³ → R² by f(x, y, z) = (2x + (y − 1)² − sin z, 3x + 2e^(2y−5z)).
a. Compute Df(x, y, z). What are f(0, 0, 0) and Df(0, 0, 0)?
b. Let g be as in Exercise 3b. Compute D(g ◦ f)(0, 0, 0).
5. Show that if f : Rn → Rm is defined by f (x) = Ax + b, where A is an m × n
matrix and b ∈ Rm , then Df (x) = A for all x.
6. Suppose f : Rn → R is of class C 2 ; then ∇f is a C 1 mapping from Rn to
itself. Show that D(∇f ) is the Hessian matrix of f .
7. Suppose f and g are differentiable mappings from Rn to Rm . Show that their
dot product, h(x) = f (x) · g(x), is a differentiable real-valued function on Rn ,
and that
∇h(x) = [Df (x)]∗ g(x) + [Dg(x)]∗ f (x),

if we think of ∇h(x), f (x), and g(x) as column vectors. (Here A∗ denotes the
transpose of the matrix A; see Appendix A, (A.15).)
8. Suppose that w = f (x, y, t, s) and x and y are also functions of t and s (the
situation depicted in Figure 2.3). The total dependence of w on t and s can be
expressed by writing w = f (g(t, s)) where g(t, s) = (x(t, s), y(t, s), t, s).
Show that the chain rule (2.86), applied to the composite function f ◦ g, yields
the same result as the one obtained in §2.3.
9. Let A : Rn → Rm be a linear map.
a. Show that the function ϕ(x) = |Ax| has a maximum value on the set
{x : |x| = 1}.
b. Let M be the maximum in part (a). Show that |Ax| ≤ M |x| for all x ∈ Rn ,
with equality for at least one unit vector x. Deduce that M = ∥A∥.
10. Let A : Rⁿ → Rᵐ be a linear map.
a. Show that ∥A∥ ≤ √m · max_{1≤j≤m} (Σ_{k=1}^{n} |Ajk|). (Hint: Use (1.3).)
b. Show that this inequality is an equality when the matrix of A is given by
Aj1 = 1 and Ajk = 0 for k > 1 (1 ≤ j ≤ m).
Chapter 3

THE IMPLICIT FUNCTION THEOREM AND ITS APPLICATIONS

In this chapter we take up the general question of the local solvability of systems
of equations involving nonlinear differentiable functions. The main result is the
implicit function theorem, one of the major theoretical results of advanced calcu-
lus. Among other things, it provides the key to answering many questions about
relations between analytic properties of functions and geometric properties of the
sets they define. We shall present some of its applications to the study of geomet-
ric transformations, coordinate systems, and various ways of representing curves,
surfaces, and smooth sets of higher dimension.

3.1 The Implicit Function Theorem


In this section we consider the problem of solving an equation F (x1 , . . . , xn ) = 0
for one of the variables xj as a function of the remaining n − 1 variables, or more
generally of solving a system of k such equations for k of the variables as functions
of the remaining n − k variables.
We begin with the case of a single equation, and to develop some feeling for
the geometry of the problem we consider the cases n = 2 and n = 3. For n = 2
we are given an equation F (x, y) = 0 relating the variables x and y, and we ask
when we can solve for y as a function of x or vice versa. Geometrically, the set
S = {(x, y) : F (x, y) = 0} will usually be some sort of curve, and our question
is: When can S be represented as the graph of a function y = f (x) or x = g(y)?
Likewise, for n = 3, the set where F(x, y, z) = 0 will usually be a surface, and we
ask when this surface can be represented as the graph of a function z = f(x, y),
y = g(x, z), or x = h(y, z).
Simple examples show that it is usually impossible to represent the whole set
S = {x : F (x) = 0} as the graph of a function. For example, if n = 2 and
F(x, y) = x² + y² − 1, the set S is the unit circle. We can represent the upper or
lower semicircle as the graph of f(x) = ±√(1 − x²), and the right or left semicircle
as the graph of g(y) = ±√(1 − y²), but the whole circle is not a graph. Thus, in
order to get reasonable results, we must be content only to represent pieces of S
as graphs. More specifically, our object will be to represent a piece of S in the
neighborhood of a given point a ∈ S as a graph.
Since we want to single out one of the variables as the one to be solved for, we
make a little change of notation: We denote the number of variables by n + 1 and
denote the last variable by y rather than xn+1 . We then have the following precise
analytical statement of the problem:
Given a function F (x, y) of class C 1 and a point (a, b) satisfying F (a, b) = 0,
when is there
i. a function f (x), defined in some open set in Rn containing a, and
ii. an open set U ⊂ Rn+1 containing (a, b), such that for (x, y) ∈ U ,
F (x, y) = 0 ⇐⇒ y = f (x)?
We do not try to specify in advance how big the open sets in question will be; that
will depend strongly on the nature of the function F .
The key to the answer is to look at the linear case. If

L(x1 , . . . , xn , y) = α1 x1 + · · · + αn xn + βy + c,

the solution is obvious: The equation L(x, y) = 0 can be solved for y if and only
if the coefficient β is nonzero. But near a given point (a, b), every differentiable
function F (x, y) is approximately linear; in fact, if F (a, b) = 0,

    F(x, y) = [∂1 F(a, b)](x1 − a1) + · · · + [∂n F(a, b)](xn − an)
              + [∂y F(a, b)](y − b) + small error.

If the “small error” were not there, the equation F(x, y) = 0 could be solved for y
precisely when ∂y F(a, b) ≠ 0. We now show that the condition ∂y F(a, b) ≠ 0 is
still the appropriate one when the error term is taken into account.
3.1 Theorem (The Implicit Function Theorem for a Single Equation). Let
F(x, y) be a function of class C¹ on some neighborhood of a point (a, b) ∈ Rⁿ⁺¹.
Suppose that F(a, b) = 0 and ∂y F(a, b) ≠ 0. Then there exist positive numbers
r0, r1 such that the following conclusions are valid.

[Figure: a box centered at (a, b), with side lengths labeled 2r0 and 2r1, drawn in the (x, y) plane.]
FIGURE 3.1: The geometry of the implicit function theorem. ∂y F > 0
in the box, F > 0 on the top side, F < 0 on the bottom side, and
F = 0 on the curve.

a. For each x in the ball |x − a| < r0 there is a unique y such that |y − b| < r1
and F (x, y) = 0. We denote this y by f (x); in particular, f (a) = b.
b. The function f thus defined for |x − a| < r0 is of class C 1 , and its partial
derivatives are given by
(3.2)    ∂j f(x) = − ∂j F(x, f(x)) / ∂y F(x, f(x)).
Notes.
i. The number r0 may be very small, and there is no way to estimate its size
without further hypotheses on F .
ii. The formula (3.2) for ∂j f is, of course, the one obtained via the chain rule
by differentiating the equation F (x, f (x)) = 0.
Proof. We first prove (a). We may assume that ∂y F (a, b) > 0 (by replacing F by
−F if necessary). Since ∂y F is continuous, it remains positive in some neighbor-
hood of (a, b), say for |x−a| < r1 and |y −b| < r1 . On this set, F (x, y) is a strictly
increasing function of y for each fixed x. In particular, since F (a, b) = 0 we have
F (a, b + r1 ) > 0 and F (a, b − r1 ) < 0. The continuity of F then implies that for
some r0 ≤ r1 we have F (x, b + r1 ) > 0 and F (x, b − r1 ) < 0 for |x − a| < r0 .
In short, for each x in the ball B = {x : |x−a| < r0 } we have F (x, b−r1 ) < 0,
F (x, b + r1 ) > 0, and F (x, y) is strictly increasing as a function of y for |y − b| <
r1 . It follows from the intermediate value theorem that there is a unique y for each
x ∈ B that satisfies |y − b| < r1 and F (x, y) = 0, which establishes (a). See
Figure 3.1.
Next we observe that the function y = f (x) thus defined is continuous at x =
a; in other words, for any ϵ > 0 there is a δ > 0 such that |f (x) − f (a)| < ϵ
whenever |x − a| < δ. Indeed, the argument just given shows that |f (x) − f (a)| =
|y − b| < r1 whenever |x − a| < r0 , and we could repeat that argument with r1
replaced by any smaller number ϵ to obtain an appropriate δ in place of r0 .
In fact, this argument can also be applied with a replaced by any other point x0
in the ball B to show that f is continuous at x0 . To recapitulate it briefly: Given
ϵ > 0, there exists δ > 0 such that if |x − x0 | < δ we have F (x, y0 − ϵ) < 0 and
F (x, y0 + ϵ) > 0, where y0 = f (x0 ). For each such x there is a unique y such
that |y − y0 | < ϵ and F (x, y) = 0, and that y is f (x); hence |f (x) − f (x0 )| =
|y − y0 | < ϵ.
Now that we know that f is continuous on B, we can show that its partial
derivatives ∂j f exist on B and are given by (3.2) — which also shows that they are
continuous. Given x ∈ B and a (small) real number h, let y = f (x) and

    k = f(x + h) − f(x), where
    the vector h = (0, . . . , 0, h, 0, . . . , 0) has the number h in the jth place.

(In the expressions x + h and x + th below, h denotes this vector; as a factor or
a denominator, h denotes the number.) Then y + k = f(x + h), so
F(x + h, y + k) = F(x, y) = 0. Hence, by the mean value theorem,

    0 = F(x + h, y + k) − F(x, y)
      = h ∂j F(x + th, y + tk) + k ∂y F(x + th, y + tk)

for some t ∈ (0, 1). Rearranging this equation gives

    [f(x + h) − f(x)] / h = k/h = − ∂j F(x + th, y + tk) / ∂y F(x + th, y + tk).

Now let h → 0. Since f is continuous we also have k → 0, and then since ∂j F
and ∂y F are continuous and ∂y F ≠ 0, passage to the limit yields (3.2).
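
The proof of part (a) is constructive in spirit: for fixed x, the function y ↦ F(x, y) is strictly increasing and changes sign on the interval |y − b| < r1, so its zero can be located by bisection. The sketch below carries this out for the unit circle F(x, y) = x² + y² − 1 near (a, b) = (0, 1), and then compares a finite-difference slope with formula (3.2). The choice r1 = 0.5, the tolerance, and the function name implicit_y are assumptions made for this illustration.

```python
def F(x, y):
    """Sample equation: the unit circle x^2 + y^2 - 1 = 0."""
    return x * x + y * y - 1.0

def implicit_y(x, b=1.0, r1=0.5, tol=1e-12):
    """Find the unique y with |y - b| < r1 and F(x, y) = 0 by bisection.

    This mirrors part (a) of the proof: F(x, b - r1) < 0 < F(x, b + r1),
    and F(x, .) is strictly increasing on the interval because dF/dy = 2y > 0."""
    lo, hi = b - r1, b + r1
    assert F(x, lo) < 0.0 < F(x, hi), "x lies outside the ball |x - a| < r0"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(x, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = 0.3
y = implicit_y(x)

# Formula (3.2): f'(x) = -(dF/dx)/(dF/dy) = -(2x)/(2y), evaluated at (x, f(x)).
slope_formula = -(2 * x) / (2 * y)

# Central-difference approximation to f'(x) for comparison.
h = 1e-5
slope_numeric = (implicit_y(x + h) - implicit_y(x - h)) / (2 * h)
print(y, slope_formula, slope_numeric)
```

With these choices the sign condition holds exactly when x² < 0.75, which plays the role of the ball |x − a| < r0 in the theorem.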

3.3 Corollary. Let F be a function of class C¹ on Rⁿ, and let S = {x : F(x) = 0}.
For every a ∈ S such that ∇F(a) ≠ 0 there is a neighborhood N of a such that
S ∩ N is the graph of a C¹ function.

Proof. Since ∇F(a) ≠ 0, we have ∂j F(a) ≠ 0 for some j. The equation F = 0
can then be solved to yield xj as a C¹ function of the remaining variables near the
point a.

EXAMPLE 1. Let F(x, y) = x − y² − 1, for which ∂x F(x, y) = 1 and
∂y F(x, y) = −2y. First, ∂x F is never 0, so the implicit function theorem
guarantees that the equation F(x, y) = 0 can be solved for x locally near any
point (a, b) for which F(a, b) = 0. Of course, for this particular F it is easy
to solve for x explicitly (namely, x = y² + 1), and this solution is valid
not just locally but globally. Next, ∂y F(a, b) = 0 precisely when b = 0, so
the implicit function theorem guarantees that the equation F(x, y) = 0 can be
solved uniquely for y near any point (a, b) such that F(a, b) = 0 and b ≠ 0.
In fact, the possible solutions are y = √(x − 1) and y = −√(x − 1). For x very
close to a only one of these solutions will be very close to b (namely, √(x − 1)
if b > 0 and −√(x − 1) if b < 0), and this solution is the one that figures in
the implicit function theorem. Also, these solutions are defined only for x ≥ 1,
so the number r0 in the statement of the implicit function theorem can be at
most a − 1.
Finally, we have F(1, 0) = 0, but the equation F(x, y) = 0 cannot be solved
uniquely for y as a function of x in any neighborhood of (1, 0): If x > 1 there
are two solutions, both equally close to 0, and if x < 1 there are none.
EXAMPLE 2. For a contrast with Example 1, let G(x, y) = x − e^(1−x) − y³.
First, ∂x G(a, b) = 1 + e^(1−a) > 1 for all (a, b), so the implicit function theorem
guarantees that the equation G(x, y) = 0 can be solved for x locally near
any point (a, b) such that G(a, b) = 0. It is not hard to see (Exercise 4) that
there is a single solution that works globally, but there is no nice formula for
this solution in terms of elementary functions. Next, ∂y G(a, b) = −3b², so
the implicit function theorem guarantees that the equation G(x, y) = 0 can
be solved for y as a C¹ function of x locally near any point (a, b) such that
G(a, b) = 0 and b ≠ 0. In fact, the solution is y = (x − e^(1−x))^(1/3), which is
globally uniquely defined but fails to be differentiable at the point where y = 0
(i.e., x = 1).
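
Although Example 2 offers no elementary formula for x in terms of y, the solution is easy to compute numerically. The sketch below uses Newton's method in the variable x, which is a technique chosen here for illustration and is not part of the text; it relies on the fact noted above that ∂x G = 1 + e^(1−x) > 1 never vanishes. The function name solve_for_x, the starting guess, and the tolerances are assumptions.

```python
import math

def G(x, y):
    """The function of Example 2: G(x, y) = x - e^{1-x} - y^3."""
    return x - math.exp(1.0 - x) - y ** 3

def solve_for_x(y, x0=1.0, tol=1e-12, max_iter=100):
    """Solve G(x, y) = 0 for x at fixed y by Newton's method.

    The derivative dG/dx = 1 + e^{1-x} is always greater than 1,
    so the Newton step is well defined at every iterate."""
    x = x0
    for _ in range(max_iter):
        step = G(x, y) / (1.0 + math.exp(1.0 - x))
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")

for y in (0.0, 1.0, -2.0):
    x = solve_for_x(y)
    print(y, x, G(x, y))
```

At y = 0 the iteration returns x = 1 immediately, in agreement with the remark that y = 0 corresponds to x = 1.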

We now turn to the more general problem of solving several equations simul-
taneously for some of the variables occurring in them. This will require some
facts about invertible matrices and determinants, for which we refer to Appendix
A, (A.24)–(A.33) and (A.50)–(A.55). To fix the notation, we shall consider k func-
tions F1 , . . . , Fk of n + k variables x1 , . . . , xn , y1 , . . . , yk , and ask when we can
solve the equations

           F1(x1, . . . , xn, y1, . . . , yk) = 0,
(3.4)          ⋮
           Fk(x1, . . . , xn, y1, . . . , yk) = 0

for the y’s in terms of the x’s. We shall use vector notation to abbreviate (3.4) as

(3.5) F(x, y) = 0.
We assume that F is of class C 1 near a point (a, b) such that F(a, b) = 0, and we
ask when (3.5) determines y as a C 1 function of x in some neighborhood of (a, b).
Again the key to the problem is to consider the linear case,

(3.6) Ax + By + c = 0,

where A is a k × n matrix, B is a k × k matrix, and c ∈ Rᵏ. Here the criterion for
solvability is obvious: The matrix B must be invertible, in which case the solution
is y = −B⁻¹(Ax + c). Now, the linear approximation to the equation (3.5) near the
point (a, b) is an equation of the form (3.6) in which the matrix B is the (partial)
Fréchet derivative of F with respect to the variables y, evaluated at (a, b):

(3.7)    Bij = (∂Fi/∂yj)(a, b)    (1 ≤ i, j ≤ k).

Hence, the crucial requirement is that

(3.8) the matrix B defined by (3.7) is invertible.

Invertibility of a matrix can be characterized in a number of different ways, as
discussed in Appendix A, (A.52). For example, (3.8) can be expressed more geo-
metrically as the condition that the gradient vectors ∇y Fj = (∂y1 Fj, . . . , ∂yk Fj),
1 ≤ j ≤ k, are linearly independent at (a, b). However, the version of (3.8) that
is directly used in the proof of the following theorem, as well as in many of its
applications, is that det B ≠ 0. We therefore state the theorem in these terms.

3.9 Theorem (The Implicit Function Theorem for a System of Equations).
Let F(x, y) be an Rᵏ-valued function of class C¹ on some neighborhood of a
point (a, b) ∈ Rⁿ⁺ᵏ and let Bij = (∂Fi/∂yj)(a, b). Suppose that F(a, b) = 0
and det B ≠ 0. Then there exist positive numbers r0, r1 such that the following
conclusions are valid.
a. For each x in the ball |x − a| < r0 there is a unique y such that |y − b| < r1
and F(x, y) = 0. We denote this y by f (x); in particular, f (a) = b.
b. The function f thus defined for |x−a| < r0 is of class C 1 , and its partial deriva-
tives ∂xj f can be computed by differentiating the equations F(x, f (x)) = 0
with respect to xj and solving the resulting linear system of equations for
∂xj f1 , . . . , ∂xj fk .

Proof. The proof is presented in Appendix B.2 (Theorem B.2). In a nutshell, it
proceeds by induction on k. The hypothesis that det B ≠ 0 implies that at least
one of the (k − 1) × (k − 1) submatrices of B is invertible. By inductive hypothesis,
one can solve the corresponding system of k − 1 equations for k − 1 of the variables
yj; then, after substituting the results into the remaining equation, one solves that
equation for the remaining variable. The main difficulty is in showing that the
implicit function theorem can be applied to the last equation.

E XAMPLE 3. Consider the problem of solving the equations

(3.10)    x − yu² = 0,    xy + uv = 0

for u and v as functions of x and y. Setting F = x − yu² and G = xy + uv,
we see that

    ∂(F, G)/∂(u, v) = det ⎛ −2yu   0 ⎞
                          ⎝   v    u ⎠ = −2yu²,

so the implicit function theorem guarantees a local solution near any point
(x0, y0, u0, v0) at which (3.10) holds provided that −2y0 u0² ≠ 0, that is, y0 ≠ 0
and u0 ≠ 0. Notice that under this condition, the first equation in (3.10) im-
plies that x0 ≠ 0 and that x0 and y0 have the same sign; the second equation
then implies that v0 ≠ 0 and that u0 and v0 have opposite signs.
It is not hard to find the solution explicitly:

    u = ±√(x/y),    v = ∓√(xy³),

the signs of u and v being the same as the signs of u0 and v0 , respectively. This
solution is valid for all (x, y) in the same quadrant as (x0 , y0 ). The problems
that arise if y0 = 0 or u0 = 0 are evident: If y0 = 0, then the formula for u
does not even make sense for y = y0 ; if u0 = 0, then x0 must also be 0, and
the square roots present the same sort of problem as in Example 1.
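
Part (b) of Theorem 3.9 can be illustrated on this example. Differentiating the two equations (3.10) with respect to x, with u and v regarded as functions of x and y, gives a linear system for ∂x u and ∂x v whose coefficient matrix is exactly the matrix appearing in ∂(F, G)/∂(u, v). The computation below is a sketch added for illustration.

```latex
\[
1 - 2yu\,\partial_x u = 0, \qquad y + v\,\partial_x u + u\,\partial_x v = 0,
\]
\[
\begin{pmatrix} -2yu & 0 \\ v & u \end{pmatrix}
\begin{pmatrix} \partial_x u \\ \partial_x v \end{pmatrix}
= \begin{pmatrix} -1 \\ -y \end{pmatrix},
\]
\[
\partial_x u = \frac{1}{2yu}, \qquad
\partial_x v = -\frac{y}{u} - \frac{v}{2yu^2} = -\frac{y}{u} - \frac{v}{2x}.
\]
```

The last simplification uses yu² = x from the first equation of (3.10). The system is solvable precisely when the determinant −2yu² is nonzero, and the results agree with what one obtains by differentiating the explicit solution.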

EXERCISES

1. Investigate the possibility of solving the equation x² − 4x + 2y² − yz = 1
for each of its variables in terms of the other two near the point (2, −1, 3). Do
this both by checking the hypotheses of the implicit function theorem and by
explicitly computing the solutions.
2. Show that the equation x² + 2xy + 3y² = c can be solved either for y as a
C¹ function of x or for x as a C¹ function of y (but perhaps not both) near any
point (a, b) such that a² + 2ab + 3b² = c, provided that c > 0. What happens
if c = 0 or if c < 0?
3. Can the equation (x² + y² + 2z²)^(1/2) = cos z be solved uniquely for y in terms
of x and z near (0, 1, 0)? For z in terms of x and y?
4. Sketch the graph of the equation x − e^(1−x) − y³ = 0 in Example 2. Show
graphically that for each x there is a unique y satisfying this equation, and vice
versa.
5. Suppose F(x, y) is a C¹ function such that F(0, 0) = 0. What conditions on
F will guarantee that the equation F(F(x, y), y) = 0 can be solved for y as a
C¹ function of x near (0, 0)?
6. Investigate the possibility of solving the equations xy + 2yz − 3xz = 0, xyz +
x − y = 1 for two of the variables as functions of the third near the point
(x, y, z) = (1, 1, 1).
7. Investigate the possibility of solving the equations u³ + xv − y = 0, v³ + yu −
x = 0 for any two of the variables as functions of the other two near the point
(x, y, u, v) = (0, 1, 1, −1).
8. Investigate the possibility of solving the equations xy² + xzu + yv² = 3 and
u³yz + 2xv − u²v² = 2 for u and v as functions of x, y, and z near x = y =
z = u = v = 1.
9. Can the equations x² + y² + z² = 6, xy + tz = 2, xz + ty + e^t = 0 be solved
for x, y, and z as C¹ functions of t near (x, y, z, t) = (−1, −2, 1, 0)?

3.2 Curves in the Plane


In this section we examine the relations between various ways of representing
smooth curves in the plane. Here we shall take “smooth” to mean that the curve
possesses a tangent line at each point and that the tangent line varies continuously
with the point of tangency. (Don’t worry if this last continuity condition seems a
little unclear; we will reformulate it more precisely below.) Thus “smooth” is the
geometric equivalent of “C 1 .”
There are three common ways of representing smooth curves in the plane R2 :
i. as the graph of a function, y = f(x) or x = f(y), where f is of class C¹;

ii. as the locus¹ of an equation F(x, y) = 0, where F is of class C¹;

iii. parametrically, as the range of a C¹ function f : (a, b) → R².


¹ The locus of an equation F(x) = c is the set of all x that satisfy the equation.

FIGURE 3.2: Left: The sets x² − y² = c for c = ±1 (the hyperbolas)
and c = 0 (the cross). Right: The sets y³ = x² + c for c = 1 (top),
c = 0 (middle), and c = −1 (bottom).

Of these, (i) is the simplest, and it is a special case of the other two. Indeed, the curve
given by y = f(x) is the locus of the equation F(x, y) = 0 where F(x, y) =
y − f(x), and it is also the range of the map t ↦ (t, f(t)). The representations
(ii) and (iii) are more flexible, but they are also too general as they stand because
the sets represented by them may not be smooth curves. Consider the following
examples, in which c denotes an arbitrary real constant:

EXAMPLE 1. Let F(x, y) = x^2 + y^2 − c. The set where F(x, y) = 0 is a
smooth curve (a circle) if c > 0, but it is a single point if c = 0 and it is the
empty set if c < 0.
EXAMPLE 2. Let G(x, y) = x^2 − y^2 − c. The set where G(x, y) = 0 is a
hyperbola (the union of two disjoint smooth curves) if c ≠ 0, but if c = 0 it
is the union of the two lines y = x and y = −x. The latter set looks like a
smooth curve in a neighborhood of any of its points except the origin, where
the two lines cross. See Figure 3.2.
EXAMPLE 3. Let H(x, y) = y^3 − x^2 − c. The set where H(x, y) = 0 is a
smooth curve if c ≠ 0, but when c = 0 it is a curve with a sharp cusp at the
origin. The latter set can also be described parametrically by f(t) = (t^3, t^2).
See Figure 3.2.
EXAMPLE 4. The function g(t) = (sin^2 t, cos^2 t) is C^1, but its range is the
line segment from (0, 1) to (1, 0). The point g(t) traverses this line segment
from (0, 1) to (1, 0) as t goes from 0 to π/2, then traverses it in the reverse
direction as t goes from π/2 to π, and this pattern is repeated on every interval
[nπ, (n + 1)π].
122 Chapter 3. The Implicit Function Theorem and Its Applications

In these examples, the functions in question are all of class C^1, but the sets they
describe fail to be smooth curves at certain points. However, they share a common
feature: The points where smoothness fails (namely, the origin in Examples 1–3
and the points (0, 1) and (1, 0) in Example 4) are the points where the derivatives
of the relevant functions vanish. That is, the origin is the one and only point where
the gradients ∇F, ∇G, and ∇H vanish, and it is the image under f of the one and
only point (t = 0) where f′ vanishes. Moreover, (0, 1) and (1, 0) are the images
under g of the points t = nπ and t = (n + 1/2)π where g′(t) = 0.
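This observation is easy to check numerically. The following Python sketch is only an illustration: the helper names are ours, the derivatives are computed by hand for the functions of Examples 1–4 (taking c = 0 in Examples 1–3), and nothing here replaces the argument in the text.

```python
import math

# Hand-computed gradients of the functions in Examples 1-3 with c = 0.
def grad_F(x, y):      # F(x, y) = x^2 + y^2
    return (2 * x, 2 * y)

def grad_G(x, y):      # G(x, y) = x^2 - y^2
    return (2 * x, -2 * y)

def grad_H(x, y):      # H(x, y) = y^3 - x^2
    return (-2 * x, 3 * y ** 2)

def f_prime(t):        # f(t) = (t^3, t^2), the cusp of Example 3
    return (3 * t ** 2, 2 * t)

def g(t):              # g(t) = (sin^2 t, cos^2 t), Example 4
    return (math.sin(t) ** 2, math.cos(t) ** 2)

def g_prime(t):        # d/dt sin^2 t = sin 2t and d/dt cos^2 t = -sin 2t
    return (math.sin(2 * t), -math.sin(2 * t))

# All three gradients vanish at the origin, and f'(0) = (0, 0).
origin_gradients = [grad_F(0, 0), grad_G(0, 0), grad_H(0, 0)]
print("gradients at the origin:", origin_gradients)
print("f'(0) =", f_prime(0))

# g'(t) vanishes (up to rounding) at t = n*pi and t = (n + 1/2)*pi,
# where g(t) is an endpoint of the segment.
for n in range(2):
    for t in (n * math.pi, (n + 0.5) * math.pi):
        print("t =", round(t, 4), " g(t) =", g(t), " g'(t) =", g_prime(t))
```

Away from these parameter values g′(t) is nonzero, which is consistent with the segment being smooth at its interior points.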
This suggests that it might be a good idea to impose the extra conditions that
∇F ≠ 0 on the set where F = 0 in (ii) and that f′(t) ≠ 0 in (iii). And indeed, with
the help of the implicit function theorem, it is easy to see that under these extra
conditions the representations (i)–(iii) are all locally equivalent. That is, if a curve
is represented in one of the forms (i)–(iii) and a is a point on the curve, at least a
small piece of the curve including the point a can also be represented in the other
two forms.
We now make this precise. Since (i) is more special than either (ii) or (iii), as
we have observed above, it is enough to see that a curve given by (ii) or (iii) can
also be represented in the form (i).
3.11 Theorem.
a. Let F be a real-valued function of class C^1 on an open set in R^2, and let S =
{(x, y) : F(x, y) = 0}. If a ∈ S and ∇F(a) ≠ 0, there is a neighborhood N
of a in R^2 such that S ∩ N is the graph of a C^1 function f (either y = f(x) or
x = f(y)).
b. Let f : (a, b) → R^2 be a function of class C^1. If f′(t_0) ≠ 0, there is an open
interval I containing t_0 such that the set {f(t) : t ∈ I} is the graph of a C^1
real-valued function g (either y = g(x) or x = g(y)).
Proof. Part (a) is a special case of Corollary 3.3. As for (b), let f = (ϕ, ψ). If
f′(t_0) ≠ 0, then either ϕ′(t_0) ≠ 0 or ψ′(t_0) ≠ 0; let’s assume that the former
condition holds. Let F(x, t) = x − ϕ(t) and x_0 = ϕ(t_0). Since ∂_t F(x_0, t_0) =
−ϕ′(t_0) ≠ 0, the implicit function theorem guarantees that the equation x = ϕ(t)
can be solved for t as a C^1 function of x, say t = ω(x), in some neighborhood of the
point (x_0, t_0). But then (ϕ(t), ψ(t)) = (x, ψ(ω(x))) for t in some neighborhood I
of t_0; that is, the set {f(t) : t ∈ I} is the graph of the C^1 function g = ψ ∘ ω. (If
ψ′(t_0) ≠ 0 instead, one can make the same argument with x and y switched.)
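To see the construction in the proof of (b) at work, here is a minimal Python sketch. The map (ϕ(t), ψ(t)) = (t + t^3, t^2) is an example chosen here for illustration, and Newton's method is merely one convenient way to compute ω numerically; the proof itself only asserts that ω exists and is C^1.

```python
def phi(t):
    """First component of the illustrative map (t + t^3, t^2)."""
    return t + t ** 3

def phi_prime(t):
    """phi'(t) = 1 + 3t^2, which is never zero."""
    return 1 + 3 * t ** 2

def psi(t):
    """Second component of the illustrative map."""
    return t ** 2

def omega(x, t_start=0.0, steps=60):
    """Solve phi(t) = x for t by Newton's method (a numerical stand-in
    for the function t = omega(x) supplied by the implicit function theorem)."""
    t = t_start
    for _ in range(steps):
        t -= (phi(t) - x) / phi_prime(t)
    return t

def graph_function(x):
    """The composite psi(omega(x)); the curve is the graph y = psi(omega(x))."""
    return psi(omega(x))

# Each point (phi(t), psi(t)) of the curve lies on the graph of psi o omega.
for t in (-1.0, -0.3, 0.0, 0.5, 1.0):
    x, y = phi(t), psi(t)
    print(f"t = {t:5.2f}   point = ({x:8.4f}, {y:6.4f})   psi(omega(x)) = {graph_function(x):6.4f}")
```

Because ϕ′ never vanishes for this particular map, the whole curve happens to be a single graph; in general the theorem only promises this on a small interval around t_0.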

It should be noted that the conditions of nonvanishing derivatives in Theorem
3.11 are automatically satisfied in the special case where the curve is given in the
form (i). That is, if F(x, y) = y − f(x), then ∂F/∂y = 1, so ∇F never vanishes;
similarly, the parametrization t ↦ (t, f(t)) has derivative (1, f′(t)) ≠ (0, 0).

With this in mind, we may make the following more formal definition of a
smooth curve: A set S ⊂ R^2 is a smooth curve if (a) S is connected, and (b)
every a ∈ S has a neighborhood N such that S ∩ N is the graph of a C^1 function
f (either y = f(x) or x = f(y)). This agrees with the notion of smooth curve
introduced at the beginning of this section: The curve described by y = f(x)
has a tangent line at each point (x_0, f(x_0)), and that line is given by an equation
y − f(x_0) = f′(x_0)(x − x_0) whose coefficients depend continuously on x_0.
It should be emphasized that the conditions ∇F ≠ 0 and f′ ≠ 0 in Theorem
3.11 are sufficient for the smoothness of the associated curves but not necessary.
In other words, the condition ∇F(a) = 0 or f′(t_0) = 0 allows the possibility
of non-smoothness at a or f(t_0) but does not guarantee it. For example, suppose
G(x, y) is a C^1 function whose gradient does not vanish on the set S = {(x, y) :
G(x, y) = 0}, so that S is a smooth curve, and let F = G^2. Then the set where
F = 0 coincides with S, but ∇F = 2G∇G ≡ 0 on S! Similarly, as t ranges over
the interval (−1, 1), the functions f(t) and g(t) = f(t^3) describe the same curve,
but g′(0) = 0 no matter what f is.
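Both phenomena can be seen in a concrete case. In the sketch below, the choices G(x, y) = x^2 + y^2 − 1 and f(t) = (t, t^2) are ours, made only for illustration; the function names are hypothetical helpers.

```python
import math

def G(x, y):
    """G(x, y) = x^2 + y^2 - 1; its zero set is the unit circle."""
    return x * x + y * y - 1

def grad_G(x, y):
    return (2 * x, 2 * y)

def grad_F(x, y):
    """Gradient of F = G^2, computed from the formula grad F = 2 G grad G."""
    gx, gy = grad_G(x, y)
    return (2 * G(x, y) * gx, 2 * G(x, y) * gy)

# On the circle, grad G has length 2 but grad F vanishes (up to rounding),
# even though F = 0 and G = 0 describe the same smooth curve.
circle_points = [(math.cos(s), math.sin(s)) for s in (0.0, 1.0, 2.5, 4.0)]
for (x, y) in circle_points:
    print("grad G =", grad_G(x, y), "  grad F =", grad_F(x, y))

# Reparametrizing f(t) = (t, t^2) as g(t) = f(t^3) = (t^3, t^6) traces the same
# piece of parabola for -1 < t < 1, but g'(t) = (3t^2, 6t^5) vanishes at t = 0.
def g_prime(t):
    return (3 * t ** 2, 6 * t ** 5)

print("g'(0) =", g_prime(0))
```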
The following question remains: Suppose S is a subset of R^2 that is described
in one of the forms (i)–(iii), and suppose that the regularity condition ∇F ≠ 0 on
S (in case (ii)) or f′(t) ≠ 0 for all t ∈ (a, b) (in case (iii)) is satisfied. Theorem
3.11 shows that every sufficiently small piece of S is a smooth curve, but is the
entire set S a smooth curve? In case (i) the answer is clearly yes. However, in cases
(ii) and (iii) the answer may be no.
The trouble in case (ii) is that S may be disconnected. For example, if F =
GH, then S is the union of the sets {(x, y) : G(x, y) = 0} and {(x, y) : H(x, y) =
0}, and these sets may well be disjoint and form a disconnection of S. (Also see
Exercise 6.)
EXAMPLE 5. Let F(x, y) = (x^2 + y^2 − 1)(x^2 + y^2 − 2). Then the set where
F = 0 is the union of two disjoint circles centered at the origin. See Figure
3.3.
EXAMPLE 6. Let F(x, y) = (x^2 + y^2 − 1)(x^2 + y^2 − 2x). Then the set S
where F = 0 is the union of the circles of radius 1 about (0, 0) and (1, 0).
These circles intersect at the points (1/2, ±√3/2), and S is not a smooth curve
at these points. The reader may verify that ∇F = (0, 0) at these points, in
accordance with Theorem 3.11. See Figure 3.3 and also Exercise 6.
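The verification suggested in Example 6 can also be done numerically. The following sketch (with helper names of our own choosing) writes F = F1·F2 and applies the product rule; it is meant as a sanity check, not as a substitute for the hand computation.

```python
import math

def F1(x, y):                      # circle of radius 1 about (0, 0)
    return x * x + y * y - 1

def F2(x, y):                      # circle of radius 1 about (1, 0)
    return x * x + y * y - 2 * x

def grad_F1(x, y):
    return (2 * x, 2 * y)

def grad_F2(x, y):
    return (2 * x - 2, 2 * y)

def grad_F(x, y):
    """Gradient of F = F1 * F2 by the product rule."""
    a, b = F1(x, y), F2(x, y)
    (ax, ay), (bx, by) = grad_F1(x, y), grad_F2(x, y)
    return (a * bx + b * ax, a * by + b * ay)

# The two crossing points of the circles.
crossings = [(0.5, math.sqrt(3) / 2), (0.5, -math.sqrt(3) / 2)]
for (x, y) in crossings:
    print("F1 =", F1(x, y), " F2 =", F2(x, y), " grad F =", grad_F(x, y))

# At a point lying on only one of the circles the gradient is nonzero.
print("grad F at (-1, 0):", grad_F(-1.0, 0.0))
```

The gradient vanishes at the crossings because both factors vanish there, which is the general fact recorded in Exercise 6.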
As for the representation (iii), a set of the form {f(t) : a < t < b} is necessarily
connected if f is continuous (Theorem 1.26). However, the function f(t) may not
be one-to-one, in which case the curve it describes may be traced more than once
(as we observed in Example 4) or may cross itself. These phenomena can happen
even if f′(t) never vanishes. Consequently, the condition f′(t) ≠ 0 is not sufficient
to guarantee that the set S = {f(t) : t ∈ (a, b)} is a smooth curve, only that
the pieces of it obtained by restricting t to small intervals are smooth curves. In
practice, sometimes one simply imposes the extra assumption that f is one-to-one
in order to avoid various pitfalls.

FIGURE 3.3: The sets in Examples 5 (left), 6 (middle), and 8 (right).

EXAMPLE 7. Let f(t) = (cos t, sin t). Then f′(t) = (− sin t, cos t) is never
zero since the sine and cosine functions have no common zeros, but f is
one-to-one on the interval (a, b) only when b − a ≤ 2π. The range {f(t) : t ∈ R} of f
is a smooth curve (namely, the unit circle), but in order to obtain a one-to-one
correspondence between points on the circle and values of the parameter t, one
must restrict t to an interval of the form [a, a + 2π) or (a, a + 2π].
EXAMPLE 8. Let f(t) = (t^3 − t, t^2). Then f′(t) = (3t^2 − 1, 2t) never vanishes,
but f(−1) = f(1) = (0, 1). The curve {f(t) : t ∈ R} loops around and
crosses itself at (0, 1), so it fails to be a smooth curve at that point. However,
{f(t) : t ∈ I} is a smooth curve as long as I is an interval whose closure does
not contain both −1 and 1. See Figure 3.3.
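The self-crossing in Example 8 can be examined directly. The sketch below is an illustration with helper names of our own; it confirms that the two branches through (0, 1) have different tangent directions, which is why no neighborhood of that point meets the curve in a single graph.

```python
def f(t):
    """The curve of Example 8."""
    return (t ** 3 - t, t ** 2)

def f_prime(t):
    return (3 * t ** 2 - 1, 2 * t)

def det2(a, b):
    """Determinant of the 2x2 matrix with rows a and b; nonzero iff a, b are independent."""
    return a[0] * b[1] - a[1] * b[0]

# Both parameter values give the same point ...
print("f(-1) =", f(-1), "  f(1) =", f(1))

# ... but the tangent vectors there are linearly independent.
crossing_det = det2(f_prime(-1), f_prime(1))
print("f'(-1) =", f_prime(-1), "  f'(1) =", f_prime(1), "  det =", crossing_det)

# f' never vanishes: its second component is zero only at t = 0,
# where the first component equals -1.
print("f'(0) =", f_prime(0))
```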
The reader with access to a computer graphics program may find it entertaining
to experiment with examples similar to the ones in this section to obtain a better
understanding of the relations between analytic and geometric properties of functions
and to see the various types of singularities that can arise when the regularity
condition ∇F ≠ 0 or f′(t) ≠ 0 is violated.

EXERCISES
1. For each of the following functions F(x, y), determine whether the set S =
{(x, y) : F(x, y) = 0} is a smooth curve. Draw a sketch of S. Examine the
nature of S near any points where ∇F = 0. Near which points of S is S the
graph of a function y = f(x)? x = f(y)?
a. F(x, y) = x^2 + 3y^2 − 3.
b. F(x, y) = x^2 − 3y^2 − 3.
c. F(x, y) = x − 3(y^2 + 1).
d. F(x, y) = xy(x + y − 1).
e. F(x, y) = (x^2 + y^2)(y − x^2 − 1).
f. F(x, y) = (x^2 + y^2)(y − x^2).
g. F(x, y) = (e^x − 1)^2 + (sin y − 1)^2.
2. Let S_p = {(x, y) : x^p + y^p = 1}, where p is a positive integer.
a. Show that S_p is a smooth curve for all p.
b. Draw a sketch of S_p. (The geometry of S_p depends strongly on whether p
is even or odd.)
c. Which portions of S_p can be represented as the graph of a continuous
function y = f(x)? x = f(y)? What if f is required to be C^1? (Again, the
cases p even, p odd and > 1, and p = 1 are different.)
3. For each of the following functions f(t), determine whether the set S = {f(t) :
t ∈ R} is a smooth curve. Draw a sketch of S. Examine the nature of S near
any points f(t) where f′(t) = 0.
a. f(t) = (t^2 − 1, t + 1).
b. f(t) = (t^2 − 1, t^2 + 1).
c. f(t) = (t^3 − 1, t^3 + 1).
d. f(t) = (cos^3 t, sin^3 t).
e. f(t) = (cos t + cos 2t, sin t + sin 2t).
4. Let ϕ(s) = s^2 if s ≥ 0, ϕ(s) = −s^2 if s < 0.
a. Show that ϕ is of class C^1, even at s = 0.
b. Let f(t) = (ϕ(cos t), ϕ(sin t)). Show that {f(t) : t ∈ R} is the square
with vertices at (±1, 0) and (0, ±1). For which values of t is f′(t) = 0?
What are the corresponding points f(t)?
5. Let f(t) = ((t^2 − 1)/(t^2 + 1), t(t^2 − 1)/(t^2 + 1)) and S = {f(t) : t ∈ R}.
a. Show that S is the locus of the equation y^2(1 − x) = x^2(1 + x).
b. Draw a sketch of S. (S is a curve containing a loop; it is called a strophoid.)
Show that S is asymptotic to the line x = 1.
c. Discuss the nature of the point (0, 0) where S crosses itself in terms of the
parametric and nonparametric representations of S in (a).
6. Let F_1 and F_2 be C^1 functions on some open set U in the plane, and let F_3 =
F_1 F_2. For j = 1, 2, 3, let S_j = {x ∈ U : F_j(x) = 0}.
a. Show that S_3 = S_1 ∪ S_2.
b. Show that if a ∈ S_1 ∩ S_2, then ∇F_3(a) = 0.

3.3 Surfaces and Curves in Space


In this section we discuss ways of representing smooth surfaces and curves in R^3,
with a brief sketch of the situation in higher dimensions.

Surfaces in R^3. The standard ways of representing surfaces in 3-space are
analogous to the standard ways of representing curves in the plane:

i. as the graph of a function, z = f(x, y) (or y = f(x, z) or x = f(y, z)),
where f is of class C^1;

ii. as the locus of an equation F(x, y, z) = 0, where F is of class C^1;

iii. parametrically, as the range of a C^1 function f : R^2 → R^3.

As before, (i) is a special case of (ii) and (iii), with F(x, y, z) = z − f(x, y) and
the parametrization (u, v) ↦ (u, v, f(u, v)), and as before, some additional conditions
need to be imposed in cases (ii) and (iii) in order to guarantee the smoothness of the
surface. The condition in case (ii) is exactly the same as for curves, namely, that

(3.12) ∇F(x, y, z) ≠ (0, 0, 0) whenever F(x, y, z) = 0.

The situation in case (iii) needs to be examined a little more closely.
To be precise, we assume that f is a C^1 map from some open set U ⊂ R^2 into
R^3, and we consider the set

S = {x ∈ R^3 : x = f(u), u ∈ U}.

Here x = (x, y, z) and u = (u, v); the variables u and v are the parameters used
to represent the surface S. We can think of them as giving a coordinate system on
S, with the coordinate grid being formed by the images of the lines v = constant
and u = constant, that is, the curves given parametrically by x = f (u, c) and
x = f (c, v). The picture is as in Figure 3.4.
FIGURE 3.4: Parametric representation of a surface.

What is the appropriate nondegeneracy condition on the derivatives of f? A first
guess might be that the Fréchet derivative Df (a 3 × 2 matrix) should be nonzero,
but this is not enough. We can obtain more insight by looking at the case where
f is linear, that is, f(u, v) = ua + vb + c for some a, b, c ∈ R^3. Typically the
range of such an f is a plane, but if the vectors a and b are linearly dependent
(that is, if one is a scalar multiple of the other) it will only be a line (unless
a = b = 0, in which case it is a single point). Now, for a general smooth f, the
linear approximation to f near a point (u_0, v_0) is f(u, v) ≈ (u − u_0)a + (v − v_0)b + c
where the vectors a, b, and c are ∂_u f, ∂_v f, and f evaluated at (u_0, v_0). Hence we
are led to the regularity hypothesis:

(3.13) the vectors ∂f/∂u (u, v) and ∂f/∂v (u, v) are linearly independent
at each (u, v) ∈ U.

Since two vectors in R^3 are linearly independent if and only if their cross product
is nonzero, (3.13) can be restated as

(3.14) [∂f/∂u × ∂f/∂v](u, v) ≠ 0 at each (u, v) ∈ U.

If S is the graph of a function f and we take the standard parametrization (u, v) ↦
(u, v, f(u, v)), the condition (3.13) or (3.14) is automatically satisfied, because
the partial derivatives of this parametrization are (1, 0, ∂_u f) and (0, 1, ∂_v f).
Notice that ∂u f and ∂v f are the tangent vectors to the “coordinate curves”
x = f (u, c) and x = f (c, v) described above. Thus, the condition (3.13) means
that these tangent vectors, at each point of the surface, are nonzero and point in
different directions; this implies that the coordinate curves are smooth and intersect
nontangentially.
With these things in mind, we arrive at the analogue of Theorem 3.11 for sur-
faces.
3.15 Theorem.
a. Let F be a real-valued function of class C^1 on an open set in R^3, and let
S = {(x, y, z) : F(x, y, z) = 0}. If a ∈ S and ∇F(a) ≠ 0, there is a
neighborhood N of a in R^3 such that S ∩ N is the graph of a C^1 function f
(either z = f(x, y), y = f(x, z), or x = f(y, z)).
b. Let f be a C^1 mapping from an open set in R^2 into R^3. If [∂_u f × ∂_v f](u_0, v_0) ≠
0, there is a neighborhood N of (u_0, v_0) in R^2 such that the set {f(u, v) :
(u, v) ∈ N} is the graph of a C^1 function.
Proof. Part (a) is a special case of Corollary 3.3. As for (b), let f = (ϕ, ψ, χ). The
components of the cross product ∂_u f × ∂_v f are, up to sign and order, just the
Jacobians ∂(ϕ, ψ)/∂(u, v), ∂(ϕ, χ)/∂(u, v), and ∂(ψ, χ)/∂(u, v). Under the
hypothesis of (b), at least one of them, let us say ∂(ϕ, ψ)/∂(u, v), is nonzero at
(u_0, v_0). The implicit function theorem then guarantees that the pair of equations
x = ϕ(u, v), y = ψ(u, v) can be solved to yield u and v as C^1 functions of x and y
near u = u_0, v = v_0, x = ϕ(u_0, v_0), y = ψ(u_0, v_0). Substituting these functions
for u and v in the equation z = χ(u, v) then yields z as a C^1 function of x and y
whose graph is the set {f(u, v) : (u, v) ∈ N} for a suitable neighborhood N of (u_0, v_0).
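The relation between the cross product and the three Jacobians used in the proof can be checked on a small example. In the sketch below the map (ϕ, ψ, χ) = (u + v, uv, u − v^2) is our own illustrative choice, and the helper names are hypothetical.

```python
def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def jacobian(p_u, p_v, q_u, q_v):
    """The 2x2 determinant d(p, q)/d(u, v) built from four partial derivatives."""
    return p_u * q_v - p_v * q_u

def partials(u, v):
    """Partial derivatives of (phi, psi, chi) = (u + v, u*v, u - v^2)."""
    d_u = (1.0, v, 1.0)          # (phi_u, psi_u, chi_u)
    d_v = (1.0, u, -2.0 * v)     # (phi_v, psi_v, chi_v)
    return d_u, d_v

u, v = 2.0, 3.0
d_u, d_v = partials(u, v)
normal = cross(d_u, d_v)

J_phi_psi = jacobian(d_u[0], d_v[0], d_u[1], d_v[1])
J_phi_chi = jacobian(d_u[0], d_v[0], d_u[2], d_v[2])
J_psi_chi = jacobian(d_u[1], d_v[1], d_u[2], d_v[2])

# The cross product is (d(psi,chi)/d(u,v), -d(phi,chi)/d(u,v), d(phi,psi)/d(u,v)).
print("cross product:", normal)
print("Jacobians    :", (J_psi_chi, -J_phi_chi, J_phi_psi))
```

In particular the cross product is nonzero exactly when at least one of the three Jacobians is nonzero, which is all the proof needs.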

Thus the representations (i)–(iii) for surfaces are locally equivalent in the
presence of the regularity conditions (3.12) and (3.13); a smooth surface is a connected
subset of R^3 that can be locally described in any of these three forms. The
potential global problems with the representations (ii) and (iii) are the same as for plane
curves; namely, the set where a C^1 function F vanishes may be disconnected, and
a map f that is locally one-to-one need not be globally one-to-one.
EXAMPLE 1. Let f(u, v) = ((u + v) cos(u − v), (u + v) sin(u − v), u + v).
The set S = f(R^2) is a right circular cone with vertex at the origin; it
is described nonparametrically by the equation x^2 + y^2 = z^2. The set S is
a smooth surface except at the origin, which accords with the fact that the
gradient of F(x, y, z) = x^2 + y^2 − z^2 vanishes at the origin and nowhere else.
Correspondingly, the vectors

∂_u f = (cos(u − v) − (u + v) sin(u − v), sin(u − v) + (u + v) cos(u − v), 1)

and

∂_v f = (cos(u − v) + (u + v) sin(u − v), sin(u − v) − (u + v) cos(u − v), 1)

are linearly independent except when u + v = 0, in which case they coincide.
The map f is locally one-to-one except along the line u + v = 0, and this entire
line is mapped to the origin. (The reader will recognize that u + v and u − v are
really the r and θ of cylindrical coordinates in R^3. We have chosen to disguise
them a little in order to display a situation where ∂_u f and ∂_v f are both nonzero
but are linearly dependent where the singularities occur.)
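The degeneracy along u + v = 0 shows up clearly in the length of the cross product ∂_u f × ∂_v f. The sketch below (helper names are ours) evaluates it at a few points; a short hand computation with r = u + v suggests that the length equals 2√2·|u + v|, and the code compares against that value.

```python
import math

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def d_u(u, v):
    """Partial derivative of the cone parametrization with respect to u."""
    r, th = u + v, u - v
    return (math.cos(th) - r * math.sin(th), math.sin(th) + r * math.cos(th), 1.0)

def d_v(u, v):
    """Partial derivative of the cone parametrization with respect to v."""
    r, th = u + v, u - v
    return (math.cos(th) + r * math.sin(th), math.sin(th) - r * math.cos(th), 1.0)

def normal_length(u, v):
    """Length of the cross product of the two tangent vectors."""
    n = cross(d_u(u, v), d_v(u, v))
    return math.sqrt(sum(c * c for c in n))

for (u, v) in [(1.0, -1.0), (0.7, 0.3), (2.0, 1.0), (-0.5, -1.5)]:
    print((u, v), " |normal| =", normal_length(u, v),
          "  2*sqrt(2)*|u+v| =", 2 * math.sqrt(2) * abs(u + v))
```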
EXAMPLE 2. The unit sphere S = {(x, y, z) : x^2 + y^2 + z^2 = 1} can be
parametrized by spherical coordinates,

f(θ, ϕ) = (cos θ sin ϕ, sin θ sin ϕ, cos ϕ).

Here θ is the longitude and ϕ is the co-latitude, i.e., the latitude as measured
from the north pole rather than the equator. The longitude θ is only well defined
up to multiples of 2π, but the co-latitude is usually restricted to the interval
[0, π]. The sphere is a smooth surface, but the map f does not provide a “good”
parametrization of the whole sphere because it is not locally one-to-one when
sin ϕ = 0. (That is, the longitude is completely undetermined at the north and
south poles.) This degeneracy is also reflected in the tangent vectors

∂_θ f = (− sin θ sin ϕ, cos θ sin ϕ, 0),
∂_ϕ f = (cos θ cos ϕ, sin θ cos ϕ, − sin ϕ);

they are linearly independent when sin ϕ ≠ 0, but ∂_θ f = 0 when sin ϕ = 0.
However, if we restrict θ and ϕ to the rectangle −π < θ < π, 0 < ϕ < π, we
obtain a good parametrization of the sphere with the “international date line”
removed.

Finally, a few words about finding the tangent plane to a smooth surface S at a
point a ∈ S. In general, the tangent plane is given by the equation n · (x − a) = 0,
where n is a (nonzero) normal vector to S at a. We have already observed in
Theorem 2.37 that when S is given by an equation F = 0, then the vector ∇F (a)
is normal to S at a. On the other hand, when S is given parametrically as the range
of a map f (u, v), the vectors ∂u f (b, c) and ∂v f (b, c) are tangent to certain curves
in S and hence to S itself at f (b, c); we therefore obtain a normal at f (b, c) by
taking their cross product. In both cases, the conditions on F or f that guarantee
the smoothness of S also guarantee that these normal vectors are nonzero.
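As a concrete illustration of the two recipes for a normal vector, the following sketch compares them on the unit sphere with its spherical-coordinate parametrization. The sample point (θ, ϕ) = (0.8, 1.1) and all helper names are our own choices.

```python
import math

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def f(theta, phi):
    """Spherical-coordinate parametrization of the unit sphere."""
    return (math.cos(theta) * math.sin(phi), math.sin(theta) * math.sin(phi), math.cos(phi))

def d_theta(theta, phi):
    return (-math.sin(theta) * math.sin(phi), math.cos(theta) * math.sin(phi), 0.0)

def d_phi(theta, phi):
    return (math.cos(theta) * math.cos(phi), math.sin(theta) * math.cos(phi), -math.sin(phi))

def grad_F(x, y, z):
    """Gradient of F(x, y, z) = x^2 + y^2 + z^2 - 1."""
    return (2 * x, 2 * y, 2 * z)

theta0, phi0 = 0.8, 1.1
a = f(theta0, phi0)
n_param = cross(d_theta(theta0, phi0), d_phi(theta0, phi0))   # normal from the parametrization
n_grad = grad_F(*a)                                           # normal from the equation F = 0

def plane_residual(x, n=n_param, a=a):
    """n . (x - a), which is zero exactly when x lies on the tangent plane at a."""
    return sum(ni * (xi - ai) for ni, xi, ai in zip(n, x, a))

print("normal from cross product:", n_param)
print("normal from gradient     :", n_grad)
print("cross of the two normals :", cross(n_param, n_grad))   # zero if they are parallel
```

The two normals differ only by a scalar factor, so they define the same tangent plane n · (x − a) = 0.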

Curves in R^3. Curves in R^3 are generally described either parametrically or
as the intersection of two surfaces. The situation where two of the coordinates are
given as functions of the third one can be considered as a special case of either of
these. Thus, once again we have three kinds of representation for curves:

i. as a graph, y = f(x) and z = g(x) (or similar expressions with the
coordinates permuted), where f and g are C^1 functions;

ii. as the locus of two equations F(x, y, z) = G(x, y, z) = 0, where F and G
are C^1 functions;

iii. parametrically, as the range of a C^1 function f : R → R^3.

The form (ii) describes the curve as the intersection of the two surfaces F = 0
and G = 0, and (i) is a special case of (ii) (with F(x, y, z) = y − f(x) and
G(x, y, z) = z − g(x)) and of (iii) (with the parametrization t ↦ (t, f(t), g(t))).
By now the reader should be able to guess what the appropriate regularity
condition for cases (ii) and (iii) is. In (iii) it is simply that f′(t) ≠ 0, and in (ii) it is
that

∇F(x) and ∇G(x) are linearly independent
at every x at which F(x) = G(x) = 0.

(Geometrically, this means that the surfaces F = 0 and G = 0 are nowhere tangent
to each other.) With these conditions we have an analogue of Theorems 3.11 and
3.15. Rather than give another precise statement and proof, we sketch the ideas and
leave the details to the reader (Exercise 7).
First, if ∇F and ∇G are linearly independent, then at least one of the Jacobians
∂(F, G)/∂(x, y), ∂(F, G)/∂(x, z), and ∂(F, G)/∂(y, z) must be nonzero; let us
say the last one. Then the implicit function theorem guarantees that the equations
F = G = 0 can be solved for y and z as functions of x. Second, if f′(t) ≠ 0,
then one of the components of f′(t) must be nonzero; let us say the first one. Then
the equation x = f_1(t) can be solved for t in terms of x, and then the equations
y = f_2(t) and z = f_3(t) yield y and z as functions of x. In either case we end up
with the representation (i).
Let us say a little more about what can go wrong in case (ii) when ∇F and
∇G are linearly dependent. The potential problems are clearly displayed in the
following situation: Let F(x, y, z) = z − ϕ(x, y), where ϕ is a C^1 function, and
let G(x, y, z) = z. Then the sets where F = 0 and G = 0 are smooth surfaces; the
former is the graph of ϕ, whereas the latter is the xy-plane. The intersection of these
two surfaces is the curve in the xy-plane described by the equation ϕ(x, y) = 0.
Now, this curve can have all sorts of singularities if there are points on it where
∇ϕ = (0, 0), as we have discussed in §3.2. But since ∇F = (−∂_x ϕ, −∂_y ϕ, 1) and
∇G = (0, 0, 1), the points where ∇ϕ = (0, 0) are precisely the points where ∇F
and ∇G are linearly dependent.
If a curve S is represented parametrically by a function f (t), the derivative f ′ (t)
furnishes a tangent vector to S at the point f (t). On the other hand, if S is given
by a pair of equations F = G = 0 and a ∈ S, the vectors ∇F (a) and ∇G(a) are
both normal to S at a and hence span the normal plane to S at a. One can therefore
obtain a tangent vector to S at a by taking their cross product.
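The two ways of producing a tangent vector can be compared on a simple example. In the sketch below the surfaces F = x^2 + y^2 − 1 (a cylinder) and G = z − x (a plane), together with the parametrization (cos t, sin t, cos t) of their intersection, are illustrative choices of ours rather than examples from the text.

```python
import math

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def grad_F(x, y, z):
    """Gradient of F(x, y, z) = x^2 + y^2 - 1."""
    return (2 * x, 2 * y, 0.0)

def grad_G(x, y, z):
    """Gradient of G(x, y, z) = z - x."""
    return (-1.0, 0.0, 1.0)

def f(t):
    """A parametrization of the curve F = G = 0."""
    return (math.cos(t), math.sin(t), math.cos(t))

def f_prime(t):
    return (-math.sin(t), math.cos(t), -math.sin(t))

def tangent_from_gradients(p):
    """Tangent vector obtained as the cross product of the two normals at p."""
    return cross(grad_F(*p), grad_G(*p))

for t in (0.0, 0.9, 2.0, 4.5):
    p = f(t)
    print("t =", t, "  grad F x grad G =", tangent_from_gradients(p), "  f'(t) =", f_prime(t))
```

At each sample point the cross product of the gradients is a nonzero multiple of f′(t), so both constructions give the same tangent line.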

Higher Dimensions. The pattern for representations of curves and surfaces that
we have established in this section and the preceding one should be pretty clear by
now, and it generalizes readily to higher dimensions. We sketch the main points
briefly and leave it to the ambitious reader to work out the details.
The general name for a “smooth k-dimensional object” is manifold; thus, a
curve is a 1-dimensional manifold and a surface is a 2-dimensional manifold. Here
we consider the question of representing k-dimensional manifolds in R^n, for any
positive integers k and n with n > k. The two general forms, corresponding to (ii)
and (iii) above for curves and surfaces, are as follows.
The Nonparametric Form: A k-dimensional manifold S in R^n can be described
as the set of simultaneous solutions of n − k equations. That is, given C^1 functions
F_1, . . . , F_{n−k} defined on some open set U ⊂ R^n, or equivalently a C^1 mapping
F = (F_1, . . . , F_{n−k}) from U into R^{n−k}, we can consider the set

(3.16) S = {x : F(x) = 0}.

The regularity condition that guarantees that S is a smooth k-dimensional manifold
is that

∇F_1(x), . . . , ∇F_{n−k}(x) are linearly independent at each x ∈ S,

or, equivalently,

the matrix DF(x) has rank n − k at every x ∈ S.

This condition implies that, for each x_0 ∈ S, some (n − k) × (n − k) submatrix
of DF(x_0) is nonsingular, and the implicit function theorem then implies that the
equations F(x) = 0 can be solved near x_0 for n − k of the variables as C^1 functions
of the remaining k variables. This leads to the more special representation of (small
pieces of) S by the equations analogous to (i) for curves and surfaces, namely,
x′′ = g(x′), where x′′ represents an ordered (n − k)-tuple of coordinates and x′ is
the ordered k-tuple of remaining coordinates.
The Parametric Form: Given a C^1 map f from some open set V ⊂ R^k into R^n,
we can consider the set

(3.17) S = {f(u) : u ∈ V}.

The regularity condition that guarantees that S is a smooth k-dimensional manifold
is that

∂_{u_1} f(u), . . . , ∂_{u_k} f(u) are linearly independent at each u ∈ V,

or equivalently,

the matrix Df(u) has rank k at each u ∈ V.

This condition implies that, for each u_0 ∈ V, some k × k submatrix of Df(u_0)
is invertible, say the one formed from the rows i_1, . . . , i_k. The implicit function
theorem then implies that the equations x_{i_j} = f_{i_j}(u_1, . . . , u_k) (1 ≤ j ≤ k) can
be solved near u_0 to yield the u_j’s as C^1 functions of x′ = (x_{i_1}, . . . , x_{i_k}).
Substituting these functions for the u_j’s in the remaining equations x_l = f_l(u_1, . . . , u_k)
again yields a representation of (small pieces of) S analogous to (i) for curves and
surfaces.
It is perhaps worth pointing out what these two representations boil down to
in the linear case. That is, suppose S is a k-dimensional vector subspace of R^n;
then S can be represented in the forms (3.16) or (3.17) where the functions F and
f are linear and hence are given by matrices. (3.16) is the representation of S as
the nullspace of an (n − k) × n matrix, and (3.17) is the representation of S as the
column space of an n × k matrix; in both cases the regularity condition is that the
rank of the matrix in question is as large as possible.
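A small instance of the linear case, with n = 3 and k = 2, may help fix ideas. In the sketch below the plane x + y + z = 0 and the matrices A and B are our own illustrative choices: A is a 1 × 3 matrix whose nullspace is the plane, and B is a 3 × 2 matrix whose column space is the same plane.

```python
def matvec(M, x):
    """Product of a matrix (given as a list of rows) with a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

A = [[1, 1, 1]]                      # (n - k) x n = 1 x 3; S is its nullspace
B = [[1, 0], [0, 1], [-1, -1]]       # n x k = 3 x 2; S is its column space

def F(x):
    """Nonparametric description: S = {x : F(x) = 0}."""
    return matvec(A, x)

def f(u):
    """Parametric description: S = {f(u) : u in R^2}."""
    return matvec(B, u)

# Every point produced by the parametrization satisfies the equation.
for u in ([1, 0], [0, 1], [2, -5], [3, 7]):
    print("u =", u, "  f(u) =", f(u), "  F(f(u)) =", F(f(u)))

# Rank conditions: A has a nonzero entry (rank 1 = n - k), and the top 2 x 2
# block of B has nonzero determinant (rank 2 = k).
top_minor_of_B = B[0][0] * B[1][1] - B[0][1] * B[1][0]
print("top 2x2 minor of B:", top_minor_of_B)
```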

EXERCISES

1. For each of the following maps f : R^2 → R^3, describe the surface S = f(R^2)
and find a description of S as the locus of an equation F(x, y, z) = 0. Find the
points where ∂_u f and ∂_v f are linearly dependent, and describe the singularities
of S (if any) at these points.
a. f(u, v) = (2u + v, u − v, 3u).
b. f(u, v) = (au cos v, bu sin v, u) (a, b > 0).
c. f(u, v) = (cos u cosh v, sin u cosh v, sinh v).
d. f(u, v) = (u cos v, u sin v, u^2).
2. Find an equation for the tangent plane to the following parametrized surfaces
at the point (1, −2, 1). (The first step is to find the values of the parameters u, v
that yield this point.)
a. x = e^{u−v}, y = u − 3v, z = (1/2)(u^2 + v^2).
b. x = 1/(u + v), y = −(u + e^v), z = u^3.
3. Find a parametrization for each of the following surfaces (perhaps involving an
angular variable that is defined only up to multiples of 2π).
a. The surface obtained by revolving the curve z = f(x) (a < x < b) in the
xz-plane around the z-axis, where a > 0.
b. The surface obtained by revolving the curve z = f(x) (a < x < b) in the
xz-plane around the x-axis, where f(x) > 0.
c. The lower sheet of the hyperboloid z^2 − 2x^2 − y^2 = 1.
d. The cylinder x^2 + z^2 = 9.
4. Find a parametric description of the following lines:
a. The intersection of the planes x − 2y + z = 3 and 2x − y − z = −1.
b. The intersection of the planes x + 2y = 3 and y − 3z = 2.
5. Let S be the circle formed by intersecting the plane x + z = 1 with the sphere
x^2 + y^2 + z^2 = 1.
a. Find a parametrization of S.
b. Find parametric equations for the tangent line to S at the point (1/2, −1/√2, 1/2).
6. Let S be the intersection of the cone z^2 = x^2 + y^2 and the plane z = ax + 1,
where a ∈ R.
a. Show that S is a circle if a = 0, an ellipse if |a| < 1, a parabola if |a| = 1,
and a hyperbola if |a| > 1.
b. Find a parametrization for S in the first two cases and for the part of S
lying above the xy-plane in the third case.
7. Give a precise statement and proof of the analogue of Theorem 3.11 for curves
in R^3.

3.4 Transformations and Coordinate Systems


In this section we study smooth mappings from R^n to itself in more detail, with
emphasis on geometric intuition for the cases n = 2 and n = 3.
Suppose f : R^n → R^n is a map of class C^1. We can regard f as a
transformation of R^n, that is, an operation that moves the points in R^n around in some
definite fashion. When n > 1, such transformations are usually best pictured with
“before and after” sketches. That is, if x = f(u), we think of u and x as living
in two separate copies of R^n. We draw a sketch of u-space with some geometric
figures in it, such as a grid of coordinate lines, then draw a sketch of x-space with
the images of those figures under the transformation f.
EXAMPLE 1. Define f : R^2 → R^2 by f(u, v) = (1/2)(√3 u − v, u + √3 v). The
map f represents a counterclockwise rotation through the angle π/6 about the
origin (since √3/2 = cos(π/6) and 1/2 = sin(π/6)). See Figure 3.5.
EXAMPLE 2. Define f : R^2 → R^2 by f(u, v) = (2u, v). The map f simply stretches
out the u coordinate by a factor of 2. See Figure 3.6.

FIGURE 3.5: The transformation f(u, v) = (1/2)(√3 u − v, u + √3 v).

FIGURE 3.6: The transformation f(u, v) = (2u, v).

EXAMPLE 3. Define f : R^2 → R^2 by f(u, v) = (u^2 − v^2, 2uv). Unlike the
previous two examples, this f is not one-to-one; it maps (u, v) and (−u, −v) to
the same point. (It’s not hard to check that this is the only duplication of values:
If f(u, v) = f(z, w) then (z, w) = ±(u, v).) In order to draw an intelligible
picture, we restrict attention to the region u > 0. We also denote f(u, v) by
(x, y), so the “before” and “after” pictures are the uv-plane and the xy-plane.
The image of the vertical line u = c under f is given by x = c^2 − v^2, y = 2cv.
Elimination of v yields x = c^2 − y^2/(4c^2), the equation of a parabola that opens
out to the left. On the other hand, the image of the horizontal line v = c is
given by x = u^2 − c^2, y = 2cu, which yields x = y^2/(4c^2) − c^2. Since we
are assuming u > 0, we have y > 0 or y < 0 depending on whether c > 0
or c < 0; in either case this curve is half of a parabola opening to the right.
See Figure 3.7: The v-axis is mapped to the negative x-axis (both (0, v) and
(0, −v) being mapped to (−v^2, 0)), as indicated by the dotted lines, and the
right half of the uv-plane is bent to the left to fill up the rest of the xy-plane.
We can also draw the reverse picture. The horizontal line y = c in the
xy-plane corresponds to the curve 2uv = c in the uv-plane, which is a hyperbola
whose asymptotes are the coordinate axes. The vertical line x = c corresponds
to the curve u^2 − v^2 = c, which is a hyperbola whose asymptotes are the
lines v = ±u when c ≠ 0 and the union of these two lines when c = 0. See
Figure 3.8.

FIGURE 3.7: The transformation (x, y) = (u^2 − v^2, 2uv), showing the
image in the xy-plane of the coordinate grid in the half-plane u > 0.

FIGURE 3.8: The transformation (x, y) = (u^2 − v^2, 2uv), showing
the curves in the half-plane u > 0 that map to the coordinate grid in
the xy-plane.
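The claims of Example 3 about the images of coordinate lines are easy to test numerically. The following Python sketch is an informal check with helper names of our own; the sample values of c, u, and v are arbitrary.

```python
def f(u, v):
    """The transformation (x, y) = (u^2 - v^2, 2uv) of Example 3."""
    return (u * u - v * v, 2 * u * v)

# f takes the same value at (u, v) and (-u, -v).
print("f(1.5, -0.5) =", f(1.5, -0.5), "  f(-1.5, 0.5) =", f(-1.5, 0.5))

def on_left_opening_parabola(x, y, c):
    """Residual of x = c^2 - y^2/(4c^2), the image of the vertical line u = c."""
    return x - (c * c - y * y / (4 * c * c))

def on_right_opening_parabola(x, y, c):
    """Residual of x = y^2/(4c^2) - c^2, the image of the horizontal line v = c."""
    return x - (y * y / (4 * c * c) - c * c)

c = 1.25
vertical_residuals = [on_left_opening_parabola(*f(c, v), c) for v in (-2.0, -0.5, 0.0, 1.0, 3.0)]
horizontal_residuals = [on_right_opening_parabola(*f(u, c), c) for u in (0.1, 0.5, 1.0, 2.0)]
print("residuals on image of u = c:", vertical_residuals)
print("residuals on image of v = c:", horizontal_residuals)

# Points of the v-axis land on the negative x-axis.
print("f(0, 2) =", f(0.0, 2.0), "  f(0, -2) =", f(0.0, -2.0))
```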

We can think of mappings from R3 to itself pictorially in the same way, though
the pictures are harder to draw. Figure 3.9 shows what happens to a cube under the
transformation f (u, v, w) = (−2u, v, 21 w).
Another common interpretation of a map f : Rn → Rn is as a coordinate
system on Rn . For example, we usually think of f (r, θ) = (r cos θ, r sin θ) as
representing polar coordinates in the plane. In the preceding discussion we thought
in terms of moving the points in Rn around without changing the labeling system
(namely, Cartesian coordinates); here we are thinking of leaving the points alone
but giving them different labels (polar rather than Cartesian coordinates.) It’s just
a matter of point of view; the same transformation f can be interpreted either way.
For example, the systems of parabolas and hyperbolas in Figures 3.7 and 3.8 can
136 Chapter 3. The Implicit Function Theorem and Its Applications

FIGURE 3.9: The transformation $f(u, v, w) = (-2u, v, \tfrac{1}{2}w)$. (The u and w axes are horizontal and vertical, respectively.)

FIGURE 3.10: The polar coordinate transformation $(x, y) = (r\cos\theta, r\sin\theta)$.

be interpreted as the grids for curvilinear coordinate systems in the appropriate
parts of the plane, and the map $f(r, \theta) = (r\cos\theta, r\sin\theta)$ can be interpreted as a
transformation. Figure 3.10 shows a representative piece of it.
Not all mappings $f : \mathbb{R}^n \to \mathbb{R}^n$ can be used as coordinate systems, however. A
“good” coordinate system should have the property that there is a one-to-one
correspondence between points and their coordinates; that is, each set of coordinates
should specify a unique point in $\mathbb{R}^n$, and two different sets of coordinates should
specify different points. Polar coordinates, for example, do not satisfy this
condition: $(r, \theta)$ and $(r, \phi)$ are polar coordinates of the same point whenever
$\theta - \phi$ is an integer multiple of $2\pi$, or whenever $r = 0$. This fact always has the
potential to cause problems when polar coordinates are used. However, if we restrict
$r$ and $\theta$ to satisfy $r > 0$ and $-\pi < \theta < \pi$, we do get a “good” coordinate system, not
on the whole plane, but on the plane with the negative real axis removed. Likewise, the
map $(u, v) = (x^2 - y^2, 2xy)$ in Example 3, restricted to the half-plane $x > 0$, gives
a “good” coordinate system on the uv-plane with the negative u-axis removed.
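
To make the last claim concrete, here is a minimal Python sketch of the map $(u, v) = (x^2 - y^2, 2xy)$ together with an explicit inverse on the half-plane $x > 0$. The helper name `f_inverse` and the closed formula $x = \sqrt{(u + \sqrt{u^2 + v^2})/2}$, $y = v/(2x)$ are our own derivation (using $x^2 + y^2 = \sqrt{u^2 + v^2}$), offered as an illustration under those assumptions.

```python
import math

def f(x, y):
    """The map (u, v) = (x^2 - y^2, 2xy)."""
    return (x * x - y * y, 2 * x * y)

def f_inverse(u, v):
    """Inverse of f on the half-plane x > 0 (hypothetical helper).

    Defined for (u, v) off the closed negative u-axis.
    """
    if v == 0 and u <= 0:
        raise ValueError("(u, v) lies on the removed negative u-axis")
    rho = math.hypot(u, v)            # rho = x^2 + y^2
    x = math.sqrt((u + rho) / 2)      # positive root, so x > 0
    y = v / (2 * x)
    return (x, y)

x0, y0 = 0.7, -1.3                    # sample point with x0 > 0
u0, v0 = f(x0, y0)
x1, y1 = f_inverse(u0, v0)

# Restricted to x > 0, the coordinates are recovered uniquely.
round_trip_ok = math.isclose(x0, x1) and math.isclose(y0, y1)

# On the whole plane the map is two-to-one: (x, y) and (-x, -y) collide.
two_to_one = f(x0, y0) == f(-x0, -y0)

print(round_trip_ok, two_to_one)
```

The guard in `f_inverse` reflects exactly the set that has to be removed from the uv-plane for the coordinates to be "good".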
In short, our attention is directed to transformations $f$ of class $C^1$ that map an
open set $U \subset \mathbb{R}^n$ in a one-to-one fashion onto another open set $V \subset \mathbb{R}^n$. There is
one further requirement that is natural to impose, namely, that the inverse mapping
$f^{-1} : V \to U$ should also be of class $C^1$, so that the correspondence is smooth in
both directions. Hence, the question arises: Given a $C^1$ transformation $f : U \to V$,
when does $f$ possess a $C^1$ inverse $f^{-1} : V \to U$? That is, when can the equation
$f(x) = y$ be solved uniquely for $x$ as a $C^1$ function of $y$?
This question is clearly closely related to the ones that led to the implicit func-
tion theorem, and indeed, if we restrict attention to the solvability of the equation
f (x) = y in a small neighborhood of a point, its answer becomes a special case of
that theorem. As we did before, we can guess what the answer should be by looking
at the linear approximation. If f (a) = b, the linear approximation to the equation
f (x) = y at the point (a, b) is T (x − a) = y − b where the matrix T is the Fréchet
derivative Df (a), and the latter equation can be solved for x precisely when T is
invertible, that is, when the Jacobian det Df (a) is nonzero. We are therefore led to
the following theorem.
3.18 Theorem (The Inverse Mapping Theorem). Let $U$ and $V$ be open sets in $\mathbb{R}^n$,
$a \in U$, and $b = f(a)$. Suppose that $f : U \to V$ is a mapping of class $C^1$ and the
Fréchet derivative $Df(a)$ is invertible (that is, the Jacobian $\det Df(a)$ is nonzero).
Then there exist neighborhoods $M \subset U$ and $N \subset V$ of $a$ and $b$, respectively, so
that $f$ is a one-to-one map from $M$ onto $N$, and the inverse map $f^{-1}$ from $N$ to $M$
is also of class $C^1$. Moreover, if $y = f(x) \in N$, then $D(f^{-1})(y) = [Df(x)]^{-1}$.
Proof. The existence of the inverse map is equivalent to the unique solvability of
the equation $F(x, y) = 0$ for $x$, where $F(x, y) = f(x) - y$. Since the derivative of
$F$ as a function of $x$ is just $Df(x)$, the implicit function theorem (3.9) guarantees
that this unique solvability will hold for $(x, y)$ near $(a, b)$ provided that $Df(a)$ is
invertible. (In referring to the statement of the implicit function theorem, however,
note that the roles of the variables $x$ and $y$ have been reversed here.) Moreover,
since $f^{-1}(f(x)) = x$ for $x \in M$, the chain rule gives $D(f^{-1})(f(x)) \cdot Df(x) = I$
where $I$ is the $n \times n$ identity matrix; in other words, $D(f^{-1})(y) = [Df(x)]^{-1}$
where $y = f(x)$.
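
The formula $D(f^{-1})(y) = [Df(x)]^{-1}$ can be checked by hand on a familiar case. The Python sketch below uses the polar coordinate map and the local inverse $g(x, y) = (\sqrt{x^2 + y^2}, \operatorname{atan2}(y, x))$; the function names `Df`, `Dg`, and `matmul2` and the sample point are hypothetical choices for this illustration, and the entries of `Dg` were computed by differentiating $g$ directly.

```python
import math

def Df(r, theta):
    """Derivative matrix of f(r, theta) = (r cos theta, r sin theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def Dg(x, y):
    """Derivative matrix of the local inverse g(x, y) = (sqrt(x^2+y^2), atan2(y, x))."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r,   y / r],
            [-y / r2, x / r2]]

def matmul2(A, B):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

r0, theta0 = 2.0, 0.6                          # sample point with r0 > 0
x0, y0 = r0 * math.cos(theta0), r0 * math.sin(theta0)

# Chain rule: Dg(f(r, theta)) * Df(r, theta) should be the identity matrix.
product = matmul2(Dg(x0, y0), Df(r0, theta0))
is_identity = all(
    math.isclose(product[i][j], 1.0 if i == j else 0.0, abs_tol=1e-12)
    for i in range(2) for j in range(2)
)
print(is_identity)
```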

It is to be emphasized that the inverse mapping theorem is local in nature; the
global invertibility of $f$ is a more delicate matter. To be more precise, consider the
following question: Suppose $f : U \to V$ is of class $C^1$ and that $Df(x)$ is invertible
for every $x \in U$. Is $f$ one-to-one on $U$?
When $n = 1$, the answer is yes, provided that $U$ is an interval. Here we are
considering a $C^1$ function $f(x)$ such that $f'(x) \neq 0$ on the interval $U = (a, b)$.
Since $f'$ is continuous, we must have either $f'(x) > 0$ for all $x \in (a, b)$, so that $f$
is strictly increasing, or $f'(x) < 0$ for all $x \in (a, b)$, so that $f$ is strictly decreasing.
In either case, $f$ is one-to-one.

When $n > 1$, however, the answer is no. The simplest counterexample is
our old friend the polar coordinate map, $f(r, \theta) = (r\cos\theta, r\sin\theta)$, on the set
$U = \{(r, \theta) : r > 0\}$. We have
\[
Df(r, \theta) = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix},
\qquad \det Df(r, \theta) = r(\cos^2\theta + \sin^2\theta) = r,
\]

so $\det Df \neq 0$ on $U$, but $f$ is not one-to-one since $f(r, \theta + 2k\pi) = f(r, \theta)$. It is,
however, locally one-to-one, in that it is one-to-one if one restricts $\theta$ to any interval
of length less than $2\pi$. (Notice also that the Jacobian of the polar coordinate map
vanishes when $r = 0$. This accords with the fact that the polar coordinate map is
not even locally invertible there; the angular coordinate is completely undetermined
at the origin.)
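
A short numerical experiment makes both halves of this counterexample visible. The sketch below approximates $\det Df$ by central differences (the step size `h` and the helper name `jacobian_det` are arbitrary choices of ours) and compares the images of $(r, \theta)$ and $(r, \theta + 2\pi)$; it illustrates the statements above and is not a substitute for the calculation.

```python
import math

def f(r, theta):
    """Polar coordinate map."""
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian_det(r, theta, h=1e-6):
    """Central-difference approximation to det Df(r, theta)."""
    x_rp, y_rp = f(r + h, theta)
    x_rm, y_rm = f(r - h, theta)
    x_tp, y_tp = f(r, theta + h)
    x_tm, y_tm = f(r, theta - h)
    dx_dr, dy_dr = (x_rp - x_rm) / (2 * h), (y_rp - y_rm) / (2 * h)
    dx_dt, dy_dt = (x_tp - x_tm) / (2 * h), (y_tp - y_tm) / (2 * h)
    return dx_dr * dy_dt - dx_dt * dy_dr

det_at_2 = jacobian_det(2.0, 0.6)     # should be close to r = 2
det_at_0 = jacobian_det(0.0, 0.6)     # should be close to r = 0

# Nonzero Jacobian at r = 2, yet two different coordinate pairs give one point.
p = f(2.0, 0.6)
q = f(2.0, 0.6 + 2 * math.pi)
same_point = (math.isclose(p[0], q[0], abs_tol=1e-12)
              and math.isclose(p[1], q[1], abs_tol=1e-12))

print(det_at_2, det_at_0, same_point)
```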
The question of global invertibility is a delicate one. Consider the following
situation: Let $f : \mathbb{R}^n \to \mathbb{R}^n$ be a map whose component functions are all polynomials,
and suppose that the Jacobian $\det Df$ is identically equal to 1. Is $f$ globally
invertible? The answer is so far unknown; this is a famous unsolved problem.
We should also point out that the invertibility of $Df(a)$ is not necessary for the
existence of an inverse map, although it is necessary for the differentiability of that
inverse. (Example: Let $f(x) = x^3$. Then $f$ has the global inverse $f^{-1}(y) = y^{1/3}$,
but $f(0) = f'(0) = 0$ and $f^{-1}$ is not differentiable at 0.)

EXERCISES

1. For each of the following transformations (u, v) = f (x, y), (i) compute the
Jacobian det Df , (ii) draw a sketch of the images of some of the lines x =
constant and y = constant in the uv-plane, (iii) find formulas for the local
inverses of f when they exist.
a. $u = e^x \cos y$, $v = e^x \sin y$.
b. $u = x^2$, $v = y/x$.
c. $u = x^2 + 2xy + y^2$, $v = 2x + 2y$.
2. Let $(u, v) = f(x, y) = (x - 2y, 2x - y)$.
a. Compute the inverse transformation $(x, y) = f^{-1}(u, v)$.
b. Find the image in the uv-plane of the triangle bounded by the lines y = x,
y = −x, and y = 1 − 2x.
c. Find the region in the xy-plane that is mapped to the triangle with vertices
(0, 0), (−1, 2), and (2, 1) in the uv-plane.
3. Let $u = \sin x \cosh y$, $v = \cos x \sinh y$.
a. Show that the images of the lines x = constant (resp. y = constant) in the
uv-plane are hyperbolas (resp. ellipses).
b. Show that $\partial(u, v)/\partial(x, y) = \cos^2 x + \sinh^2 y$. At what points $(x, y)$ does
this Jacobian vanish? Show that the corresponding points in the uv-plane
are $(\pm 1, 0)$.
c. (optional) Show that the ellipses and hyperbolas in (a) all have foci at
$(\pm 1, 0)$.
4. Let (u, v) = f (x, y) = (x − y, xy).
a. Sketch some of the curves x − y = constant and xy = constant in the
xy-plane. Which regions in the xy-plane map onto the rectangle in the
uv-plane given by 0 ≤ u ≤ 1, 1 ≤ v ≤ 4? There are two of them; draw a
picture of them.
b. Compute the derivative Df and the Jacobian J = det Df .
c. The Jacobian $J$ vanishes at $(a, b)$ precisely when the gradients $\nabla u(a, b)$
and $\nabla v(a, b)$ are linearly dependent, i.e., when the level sets of $u$ and $v$
passing through $(a, b)$ are tangent to each other. (If this doesn’t seem
obvious at first, think about it!) Use your sketch of the level sets in (a) to
show pictorially that this assertion is correct.
d. Notice that f (2, −3) = (5, −6). Compute explicitly the local inverse g of
f such that g(5, −6) = (2, −3) and also compute its derivative Dg.
e. Show by explicit calculation that the matrices Df (2, −3) and Dg(5, −6)
are inverses of each other.
5. Find a one-to-one $C^1$ mapping $f$ from the first quadrant of the xy-plane to the
first quadrant of the uv-plane such that the region where $x^2 \le y \le 2x^2$ and
$1 \le xy \le 3$ is mapped to a rectangle. Compute the Jacobian $\det Df$ and the
inverse mapping $f^{-1}$. (Hint: Map all the regions where $ax^2 \le y \le bx^2$ and
$c \le xy \le d$ to rectangles.)
6. Let $f : \mathbb{R}^3 \to \mathbb{R}^3$ be the spherical coordinate map,
\[ (x, y, z) = f(r, \phi, \theta) = (r\sin\phi\cos\theta,\; r\sin\phi\sin\theta,\; r\cos\phi). \]
Thus $r$ is the distance to the origin, $\phi$ is the co-latitude (the angle from the
positive z-axis), and $\theta$ is the longitude.
a. Describe the surfaces in xyz-space that are the images of the planes r =
constant, $\phi$ = constant, and $\theta$ = constant.
b. Compute the derivative $Df$ and show that $\det Df(r, \phi, \theta) = r^2 \sin\phi$.
c. What is the condition on the point $(r_0, \phi_0, \theta_0)$ for $f$ to be locally invertible
about this point? What is the corresponding condition on $(x_0, y_0, z_0) =
f(r_0, \phi_0, \theta_0)$?
7. We have obtained the inverse mapping theorem as a corollary of the implicit
function theorem. It is also possible to prove the inverse mapping theorem directly
and then obtain the implicit function theorem as a corollary of it. Do this
last step; that is, assume the inverse mapping theorem and deduce the implicit
function theorem from it. (Hint: Let $F(x, y)$ be as in Theorem 3.9. Apply the
inverse mapping theorem to the transformation $G : \mathbb{R}^{n+k} \to \mathbb{R}^{n+k}$ defined by
$G(x, y) = (x, F(x, y))$.)

3.5 Functional Dependence


In the implicit function theorem and its applications discussed in the preceding
sections, we have drawn consequences from the nonvanishing of various Jacobians.
In this section we consider the opposite situation, in which a Jacobian vanishes
identically.
For motivation, let us first consider the linear case. Let $A$ be an $n \times n$ matrix,
and define $F : \mathbb{R}^n \to \mathbb{R}^n$ by $F(x) = Ax$ (where $x$ is considered as a column
vector). If $A$ is nonsingular, $F$ is a one-to-one map from $\mathbb{R}^n$ onto itself, whose
inverse is $F^{-1}(y) = A^{-1}y$. However, if $\det A = 0$, the range of $F$ (namely,
the column space of $A$) is a proper linear subspace of $\mathbb{R}^n$, and the components
$(f_1, \ldots, f_n)$ of $F$ satisfy at least one nontrivial linear relation. More precisely, if
the rank of $A$ is $k$, where $k < n$, then the range of $F$ is a k-dimensional subspace
of $\mathbb{R}^n$, and the components of $F$ satisfy $n - k$ independent linear relations (namely,
the relations satisfied by the rows of $A$).

EXAMPLE 1. Let $F = (f_1, f_2, f_3)$ be given by the matrix
\[
A = \begin{pmatrix} 1 & 2 & -1 \\ 1 & -3 & 4 \\ 2 & -1 & 3 \end{pmatrix},
\]
that is,
\[
\begin{aligned}
f_1(x, y, z) &= x + 2y - z, \\
f_2(x, y, z) &= x - 3y + 4z, \\
f_3(x, y, z) &= 2x - y + 3z.
\end{aligned}
\]
It is easily verified that $\det A = 0$, that the first two rows of $A$ are independent,
and that the third row is the sum of the first two. This last relation means that
the functions $f_1, f_2, f_3$ satisfy the linear relation $f_3 = f_1 + f_2$. Equivalently,
the range of $F$ is the plane defined by the equation $y_3 = y_1 + y_2$.

EXAMPLE 2. Let $F = (f_1, f_2, f_3)$ be given by the matrix
\[
A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 4 & -2 \\ -3 & -6 & 3 \end{pmatrix},
\]
that is,
\[
\begin{aligned}
f_1(x, y, z) &= x + 2y - z, \\
f_2(x, y, z) &= 2x + 4y - 2z, \\
f_3(x, y, z) &= -3x - 6y + 3z.
\end{aligned}
\]
Here the rank of $A$ is 1, and the functions $f_j$ satisfy the relations $f_2 = 2f_1$,
$f_3 = -3f_1$. The range of $F$ is the line passing through the origin and the point
$(1, 2, -3)$.
More generally, one can consider linear maps $F : \mathbb{R}^m \to \mathbb{R}^n$ defined by $n \times m$
matrices $A$. The range of such a map is a linear subspace of $\mathbb{R}^n$ whose dimension
is the rank of $A$. It must happen when $n > m$, and may happen when $n \le m$,
that this subspace is a proper subspace of $\mathbb{R}^n$, in which case the components of $F$
satisfy nontrivial linear relations.
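
The ranks quoted in Examples 1 and 2 are easy to confirm mechanically. The following Python sketch computes the rank by Gaussian elimination in exact rational arithmetic; the function name `rank` and the use of `fractions.Fraction` are our own choices for this illustration, and the row reduction shown is one standard way to compute a rank, not a method taken from the text.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows) via Gaussian elimination over the rationals."""
    rows = [[Fraction(entry) for entry in row] for row in matrix]
    rank_count = 0
    for col in range(len(rows[0])):
        # Find a row at or below position rank_count with a nonzero entry in this column.
        pivot = next((r for r in range(rank_count, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        # Clear the entries below the pivot.
        for r in range(rank_count + 1, len(rows)):
            factor = rows[r][col] / rows[rank_count][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank_count])]
        rank_count += 1
    return rank_count

A1 = [[1, 2, -1], [1, -3, 4], [2, -1, 3]]      # matrix of Example 1
A2 = [[1, 2, -1], [2, 4, -2], [-3, -6, 3]]     # matrix of Example 2

rank_A1 = rank(A1)   # the components satisfy 3 - rank_A1 independent relations
rank_A2 = rank(A2)

# Row relations: third row of A1 is the sum of the first two;
# rows of A2 are 1, 2, -3 times the first row.
relation_A1 = all(A1[2][j] == A1[0][j] + A1[1][j] for j in range(3))
relation_A2 = all(A2[1][j] == 2 * A2[0][j] and A2[2][j] == -3 * A2[0][j] for j in range(3))

print(rank_A1, rank_A2, relation_A1, relation_A2)
```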
Now let us return to the study of more general functions. The appropriate ana-
logue of “linear dependence” for nonlinear functions is “functional dependence,”
which means that the functions in question satisfy a nontrivial functional relation,
in other words, that one of them must be expressible as a function of the others.
We shall formulate this idea precisely in a way that is appropriate for C 1 func-
tions, although the notion of functional dependence does not really depend on any
differentiability conditions.
Suppose $f_1, \ldots, f_n$ are $C^1$ real-valued functions on an open set $U \subset \mathbb{R}^m$.
We say that $f_1, \ldots, f_n$ are functionally dependent on $U$ if there is a $C^1$ function
$\Phi : \mathbb{R}^n \to \mathbb{R}$ such that
\[
\Phi\bigl(f_1(x), \ldots, f_n(x)\bigr) = 0 \quad\text{and}\quad \nabla\Phi\bigl(f_1(x), \ldots, f_n(x)\bigr) \neq 0 \quad\text{for } x \in U. \tag{3.19}
\]
The nonvanishing of ∇Φ guarantees, via the implicit function theorem, that the
equation Φ = 0 can be solved locally for one of the variables in terms of the others;
in other words, one of the functions fj can be expressed in terms of the remaining
ones.
Geometrically, (3.19) means that the range of the map $f = (f_1, \ldots, f_n)$ is
contained in the hypersurface $\{y : \Phi(y) = 0\}$ in $\mathbb{R}^n$, so that it is at most
$(n - 1)$-dimensional. (It might be even smaller, of course; the functions $f_j$ might satisfy
other relations in addition to the equation $\Phi(f(x)) = 0$.)

EXAMPLE 3. The functions
\[
\begin{aligned}
f_1(x, y, z) &= x + y + z, \\
f_2(x, y, z) &= xy + xz + yz, \\
f_3(x, y, z) &= x^2 + y^2 + z^2
\end{aligned}
\]
are functionally dependent on $\mathbb{R}^3$, for $f_3 = f_1^2 - 2f_2$.
EXAMPLE 4. The functions $f_1(x, y) = 3x + 1$, $f_2(x, y) = x^2 - y$ are not
functionally dependent on any open set in $\mathbb{R}^2$. Indeed, the transformation $f =
(f_1, f_2)$ is a one-to-one map from $\mathbb{R}^2$ onto itself whose inverse $g = (g_1, g_2)$ is
given by $g_1(u, v) = \tfrac{1}{3}(u - 1)$, $g_2(u, v) = \tfrac{1}{9}(u - 1)^2 - v$; hence the values of
$f$ are not subject to any restrictions.
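
The inverse formulas in Example 4 can be verified by composing the two maps in both orders. The sketch below does this in exact rational arithmetic on a small grid of sample points; the grid and the variable names are arbitrary choices made for this illustration.

```python
from fractions import Fraction

def f(x, y):
    """The transformation (f1, f2) = (3x + 1, x^2 - y) of Example 4."""
    return (3 * x + 1, x * x - y)

def g(u, v):
    """The claimed inverse (g1, g2) = ((u - 1)/3, (u - 1)^2/9 - v)."""
    return ((u - 1) / 3, (u - 1) ** 2 / 9 - v)

# A small grid of exact rational sample points (arbitrary choice).
points = [(Fraction(a), Fraction(b, 2)) for a in range(-2, 3) for b in range(-3, 4)]

g_after_f = all(g(*f(x, y)) == (x, y) for x, y in points)   # g(f(x, y)) = (x, y)
f_after_g = all(f(*g(u, v)) == (u, v) for u, v in points)   # f(g(u, v)) = (u, v)

print(g_after_f, f_after_g)
```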
It should be noted that the question of functional dependence is interesting
only when the number of functions does not exceed the number of independent
variables; when it does, functional dependence is almost automatic. For example,
if $f$ and $g$ are any two $C^1$ functions of one variable, then $f$ and $g$ are functionally
dependent on any interval $I$ on which either $f' \neq 0$ or $g' \neq 0$. Indeed, if $f' \neq 0$
on $I$, then $f$ is one-to-one on $I$ and so has an inverse; then $\Phi(f(x), g(x)) = 0$ on $I$
where $\Phi(u, v) = g(f^{-1}(u)) - v$.
The main results of this section concern the close relation between the func-
tional dependence of a family of functions and the linear dependence of their linear
approximations. To begin with, we consider the case where the number of functions
equals the number of independent variables.
3.20 Theorem. Suppose $f = (f_1, \ldots, f_n)$ is a $C^1$ map on some open set $U \subset \mathbb{R}^n$.
If $f_1, \ldots, f_n$ are functionally dependent on $U$, then the Jacobian $\det Df$ vanishes
identically on $U$.
Proof. Functional dependence of the $f_j$’s means that there is a $C^1$ function $\Phi$ such
that $\Phi(f(x)) = 0$ and $\nabla\Phi(f(x)) \neq 0$ for $x \in U$. Differentiation of the equation
$\Phi(f(x)) = 0$ with respect to the variables $x_1, \ldots, x_n$ via the chain rule yields
\[
\begin{aligned}
(\partial_1\Phi)(\partial_1 f_1) + (\partial_2\Phi)(\partial_1 f_2) + \cdots + (\partial_n\Phi)(\partial_1 f_n) &= 0, \\
&\;\;\vdots \\
(\partial_1\Phi)(\partial_n f_1) + (\partial_2\Phi)(\partial_n f_2) + \cdots + (\partial_n\Phi)(\partial_n f_n) &= 0,
\end{aligned}
\]
where the derivatives of $\Phi$ are evaluated at $f(x)$ and the derivatives of the $f_j$’s are
evaluated at $x$. Thus, at each $x \in U$, the system of equations
\[
\begin{aligned}
(\partial_1 f_1)y_1 + (\partial_1 f_2)y_2 + \cdots + (\partial_1 f_n)y_n &= 0, \\
&\;\;\vdots \\
(\partial_n f_1)y_1 + (\partial_n f_2)y_2 + \cdots + (\partial_n f_n)y_n &= 0,
\end{aligned}
\]
has a nonzero solution, namely $y = \nabla\Phi(f(x))$. Therefore, its coefficient matrix
$(\partial_j f_k(x))$, which is nothing but the transpose of $Df(x)$, must be singular, and
hence $\det Df(x) = 0$.
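
The mechanism of this proof can be watched at work on Example 3, where the relation is $\Phi(y_1, y_2, y_3) = y_1^2 - 2y_2 - y_3$. The Python sketch below evaluates $Df$ and $\nabla\Phi$ at a few integer points (chosen arbitrarily) and checks that $\nabla\Phi(f(x))$ solves the homogeneous system and that $\det Df$ vanishes. The names `F`, `DF`, `grad_phi`, and `det3` are hypothetical, and the partial derivatives were computed by hand.

```python
def F(x, y, z):
    """The functions f1, f2, f3 of Example 3."""
    return (x + y + z, x * y + x * z + y * z, x * x + y * y + z * z)

def DF(x, y, z):
    """Derivative matrix: row k holds the gradient of f_(k+1)."""
    return [[1, 1, 1],
            [y + z, x + z, x + y],
            [2 * x, 2 * y, 2 * z]]

def grad_phi(y1, y2, y3):
    """Gradient of Phi(y1, y2, y3) = y1^2 - 2*y2 - y3; never zero."""
    return (2 * y1, -2, -1)

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

points = [(1, 2, 3), (-2, 5, 7), (0, 0, 0), (4, -1, 2)]   # arbitrary integer samples

# The functional relation f3 = f1^2 - 2 f2 itself.
relation_ok = all(F(*p)[2] == F(*p)[0] ** 2 - 2 * F(*p)[1] for p in points)

# Chain rule: sum over k of (d_k Phi)(d_j f_k) = 0 for each j.
chain_ok = all(
    sum(grad_phi(*F(*p))[k] * DF(*p)[k][j] for k in range(3)) == 0
    for p in points for j in range(3)
)

# Consequently the Jacobian determinant vanishes.
det_ok = all(det3(DF(*p)) == 0 for p in points)

print(relation_ok, chain_ok, det_ok)
```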

More interesting is the fact that the converse of this theorem is also true: The
vanishing of the Jacobian det Df implies the functional dependence of the fj ’s. We
now present a version of this result with an additional hypothesis (the constancy of
the rank of Df ) that yields a sharper conclusion. We formulate it so that it also cov-
ers the case when the number of functions differs from the number of independent
variables.

3.21 Theorem. Let $f = (f_1, \ldots, f_n)$ be a $C^1$ map from a connected open set
$U \subset \mathbb{R}^m$ into $\mathbb{R}^n$. Suppose that the matrix $Df(x)$ has rank $k$ at every $x \in U$,
where $k < n$. Then every $x_0 \in U$ has a neighborhood $N$ such that $f_1, \ldots, f_n$ are
functionally dependent on $N$ and $f(N)$ is a smooth k-dimensional submanifold of
$\mathbb{R}^n$.

(The restriction to a small neighborhood $N$ is necessary because the set $f(U)$
can cross itself, as in Example 8 in §3.2.)
Since Df (x) is an n × m matrix, its rank k always satisfies k ≤ m and k ≤ n.
When k = m, the situation described here is simply the representation in para-
metric form of an m-dimensional submanifold of Rn , as discussed in §§3.2–3, and
the conclusion of the theorem is that such a submanifold can also be described as
the locus of a system of equations. In other words, the case k = m boils down
to Theorems 3.11b and 3.15b and their generalizations to higher dimensions. The
case where more needs to be said is the one where k < m.
Rather than proving this theorem in complete generality, we shall restrict atten-
tion to the case where m = n = 3 and k is 1 or 2. The ideas in the general case
are the same; only the details are more cumbersome. (See also Exercise 2.) Let us
restate the theorem for the special case:

3.22 Theorem. Let $\mathbf{f} = (f, g, h)$ be a $C^1$ map from a connected open set $U \subset \mathbb{R}^3$
into $\mathbb{R}^3$. Suppose that the matrix $D\mathbf{f}(\mathbf{x})$ has rank $k$ at every $\mathbf{x} \in U$, where $k = 1$
or 2. Then every $\mathbf{x}_0 \in U$ has a neighborhood $N$ such that the functions $f, g, h$ are
functionally dependent on $N$ and $\mathbf{f}(N)$ is a smooth curve (if $k = 1$) or a smooth
surface (if $k = 2$).

Proof. Let $\mathbf{x} = (x, y, z)$, $u = f(\mathbf{x})$, $v = g(\mathbf{x})$, and $w = h(\mathbf{x})$, and fix $\mathbf{x}_0 =
(x_0, y_0, z_0) \in U$.
First suppose $k = 1$. Since the matrix $D\mathbf{f}(\mathbf{x}_0)$ has rank 1, it has at least one
nonzero entry; by relabeling the functions and variables, we may assume that the
(1, 1) entry is nonzero, that is, $\partial_x f(\mathbf{x}_0) \neq 0$. By the implicit function theorem,
then, the equation $u = f(x, y, z)$ can be solved near $\mathbf{x} = \mathbf{x}_0$, $u = u_0 = f(\mathbf{x}_0)$, to
yield $x$ as a function of $y$, $z$, and $u$. Then $v$ and $w$ turn into functions of $y$, $z$, and
$u$ also. Implicit differentiation of the equations $u = f(x, y, z)$ and $v = g(x, y, z)$
with respect to $y$ (taking $y$, $z$, and $u$ as the independent variables) yields
\[
\begin{aligned}
0 &= (\partial_x f)(\partial_y x) + (\partial_y f), \\
\partial_y v &= (\partial_x g)(\partial_y x) + (\partial_y g).
\end{aligned}
\]
Solving the first equation for $\partial_y x$ and substituting the result into the second
equation then yields
\[
\partial_y v = \frac{-\partial_y f}{\partial_x f}\,\partial_x g + \partial_y g
= \frac{1}{\partial_x f} \cdot \frac{\partial(f, g)}{\partial(x, y)}.
\]
But since $D\mathbf{f}$ has rank 1, all of its $2 \times 2$ submatrices are singular; therefore,
$\partial(f, g)/\partial(x, y) \equiv 0$ and hence $\partial_y v \equiv 0$. Restricting to a convex neighborhood
of $(y_0, z_0, u_0)$, we conclude that $v$ is independent of $y$. For exactly the same reason,
$v$ is independent of $z$, and $w$ is independent of $y$ and $z$. That is, $v$ and $w$ are
functions of $u$ alone, say $v = \phi(u)$ and $w = \psi(u)$. This shows that $f, g, h$ are
functionally dependent (namely, $g(\mathbf{x}) = \phi(f(\mathbf{x}))$ and $h(\mathbf{x}) = \psi(f(\mathbf{x}))$) and that the
image of a neighborhood of $\mathbf{x}_0$ under $\mathbf{f}$ is the locus of the equations $v = \phi(u)$,
$w = \psi(u)$, which is a smooth curve.
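Now let us turn to the case $k = 2$. Here some $2 \times 2$ submatrix of $D\mathbf{f}(\mathbf{x}_0)$ is
nonsingular; by relabeling the functions and variables, we can assume that it is the
one in the upper left corner, so that $\partial(f, g)/\partial(x, y)$ is nonzero at $\mathbf{x}_0$. By the implicit
function theorem, the equations $u = f(x, y, z)$ and $v = g(x, y, z)$ can be solved
near $\mathbf{x} = \mathbf{x}_0$, $u = u_0 = f(\mathbf{x}_0)$, $v = v_0 = g(\mathbf{x}_0)$, to yield $x$ and $y$ as functions of
$u$, $v$, and $z$. Taking $u$, $v$, and $z$ as the independent variables, then, we differentiate
the equations $u = f(x, y, z)$, $v = g(x, y, z)$, and $w = h(x, y, z)$ implicitly with
respect to $z$ to obtain
\[
\begin{aligned}
0 &= (\partial_x f)(\partial_z x) + (\partial_y f)(\partial_z y) + (\partial_z f), \\
0 &= (\partial_x g)(\partial_z x) + (\partial_y g)(\partial_z y) + (\partial_z g), \\
\partial_z w &= (\partial_x h)(\partial_z x) + (\partial_y h)(\partial_z y) + (\partial_z h),
\end{aligned}
\]
or
\[
\begin{aligned}
(\partial_x f)(\partial_z x) + (\partial_y f)(\partial_z y) &= -\partial_z f, \\
(\partial_x g)(\partial_z x) + (\partial_y g)(\partial_z y) &= -\partial_z g, \\
(\partial_x h)(\partial_z x) + (\partial_y h)(\partial_z y) - (\partial_z w) &= -\partial_z h.
\end{aligned}
\]
These equations may be solved simultaneously for $\partial_z x$, $\partial_z y$, and $\partial_z w$. By Cramer’s
rule (Appendix A, (A.54)),
\[
\partial_z w
= \det\begin{pmatrix} \partial_x f & \partial_y f & -\partial_z f \\ \partial_x g & \partial_y g & -\partial_z g \\ \partial_x h & \partial_y h & -\partial_z h \end{pmatrix}
\Bigg/ \det\begin{pmatrix} \partial_x f & \partial_y f & 0 \\ \partial_x g & \partial_y g & 0 \\ \partial_x h & \partial_y h & -1 \end{pmatrix}
= \frac{\partial(f, g, h)}{\partial(x, y, z)} \Bigg/ \frac{\partial(f, g)}{\partial(x, y)}.
\]
The denominator is nonzero by assumption, but the numerator is zero because $D\mathbf{f}$
has rank 2. Hence $w$ is independent of $z$; that is, $w$ depends only on $u$ and $v$,
say $w = \phi(u, v)$. This shows that $f, g, h$ are functionally dependent (namely, $h(\mathbf{x}) =
\phi(f(\mathbf{x}), g(\mathbf{x}))$) and that the image of a neighborhood of $\mathbf{x}_0$ under $\mathbf{f}$ is the locus
of the equation $w = \phi(u, v)$, which is a smooth surface.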
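
For a concrete instance of the case $k = 1$, consider the map with components $f = s$, $g = s^2$, $h = s^3$, where $s = x + y + z$. This example is ours, not the text's, chosen so that everything is computable by hand. The Python sketch below checks at a few integer points that every $2 \times 2$ minor of the derivative matrix vanishes while the (1, 1) entry does not, and that $v = \phi(u) = u^2$, $w = \psi(u) = u^3$, as the theorem predicts.

```python
from itertools import combinations

def F(x, y, z):
    """Hypothetical rank-1 example: (f, g, h) = (s, s^2, s^3) with s = x + y + z."""
    s = x + y + z
    return (s, s ** 2, s ** 3)

def DF(x, y, z):
    """Derivative matrix; every row is a multiple of (1, 1, 1)."""
    s = x + y + z
    return [[1, 1, 1], [2 * s] * 3, [3 * s * s] * 3]

def all_2x2_minors_vanish(m):
    """True if every 2 x 2 subdeterminant of the 3 x 3 matrix m is zero."""
    for r1, r2 in combinations(range(3), 2):
        for c1, c2 in combinations(range(3), 2):
            if m[r1][c1] * m[r2][c2] - m[r1][c2] * m[r2][c1] != 0:
                return False
    return True

points = [(1, 2, 3), (-2, 5, 7), (0, 1, -1), (4, -1, 2)]   # arbitrary samples

# Rank is exactly 1: the (1, 1) entry is nonzero and all 2 x 2 minors vanish.
rank_one = all(DF(*p)[0][0] != 0 and all_2x2_minors_vanish(DF(*p)) for p in points)

# The two functional relations v = phi(u) and w = psi(u).
relations_ok = all(F(*p)[1] == F(*p)[0] ** 2 and F(*p)[2] == F(*p)[0] ** 3
                   for p in points)

print(rank_one, relations_ok)
```

Here the image of the map is the curve $\{(u, u^2, u^3)\}$, in line with the conclusion that $\mathbf{f}(N)$ is a smooth curve when $k = 1$.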

We conclude with a few words about the assumption that the rank of $Df$ is constant.
Suppose that $A(x)$ is a matrix whose entries depend continuously on $x \in U$
($U$ an open subset of $\mathbb{R}^m$), and the rank of $A(x_0)$ is $k$. Since a set of linearly
independent vectors remains linearly independent if the vectors are perturbed slightly,
the rank of $A(x)$ is at least $k$ when $x$ is sufficiently close to $x_0$. In other words,
for each $k$ the set $\{x \in U : \operatorname{rank}(A(x)) \ge k\}$ is open. In particular, if $k_0$ is the
maximum rank of $A(x)$ as $x$ ranges over $U$, then $\{x \in U : \operatorname{rank}(A(x)) = k_0\}$ is
open.
Now, in this chapter we have been concerned with $C^1$ maps $f : U \to \mathbb{R}^n$ ($U$
an open subset of $\mathbb{R}^m$) and the matrix in question is the derivative $Df(x)$. If $k_0$
is the maximum rank of this matrix as $x$ ranges over $U$, the set $V = \{x \in U :
\operatorname{rank}(Df(x)) = k_0\}$ is open, and the theorems of this chapter can be applied on $V$.
(The implicit function and inverse mapping theorems deal with the case when $k_0$ is
as large as possible, namely, $k_0 = \min(m, n)$; the theorems of this section provide
information for smaller values of $k$.) The typical situation is that $V$ is dense in $U$,
that is, the set U \ V has no interior points. Thus, the structure of the mapping
f near “most” points of U (the ones in V ) is fairly simple to understand, but at
the remaining points, various kinds of singularities can occur. The study of such
singularities is a substantial and rather intricate branch of mathematical analysis.

EXERCISES

1. For each of the following maps $\mathbf{f} = (f, g, h)$, determine whether $f, g, h$ are
functionally dependent on some open set $U \subset \mathbb{R}^3$ by examining the Jacobian
$\partial(f, g, h)/\partial(x, y, z)$. If they are, determine the rank of $D\mathbf{f}$ on $U$ and find
functional relations (one relation if $\operatorname{rank}(D\mathbf{f}) = 2$, two relations if $\operatorname{rank}(D\mathbf{f}) =
1$) satisfied by $f, g, h$.
a. $f(x, y, z) = x + y - z$, $g(x, y, z) = x - y + z$, $h(x, y, z) = x^2 + y^2 + z^2 - 2yz$.
b. $f(x, y, z) = x^2 + y^2 + z^2$, $g(x, y, z) = x + y + z$, $h(x, y, z) = y - z$.
c. $f(x, y, z) = y^{1/2} \sin x$, $g(x, y, z) = y\cos^2 x - y$, $h(x, y, z) = z - 3$.
d. $f(x, y, z) = xy + z$, $g(x, y, z) = x^2y^2 + 2xyz + z^2$, $h(x, y, z) = 2 - xy - z$.
e. $f(x, y, z) = \log x - \log y + z$, $g(x, y, z) = \log x - \log y - z$, $h(x, y, z) =
(x^2 + 2y^2)/xy$.
f. $f(x, y, z) = x - y + z$, $g(x, y, z) = x^2 - y^2$, $h(x, y, z) = x + z$.
2. Write out the statement and give a precise proof for the following special cases
of Theorem 3.21, along the lines of Theorem 3.22.
a. m = n = 2, k = 1.
b. m = 2, n = 3, k = 1.
Chapter 4

INTEGRAL CALCULUS

In this chapter we study the integration of functions of one and several real vari-
ables. As we assume that the reader is already familiar with the standard techniques
of integration for functions of one variable, our discussion of integration on the line
is limited to theoretical issues. On the other hand, some of these issues arise also in
higher dimensions, and we shall sometimes invoke the careful treatment of the one-
variable case as an excuse for being somewhat sketchy in developing the theory for
several variables.
In elementary calculus, the term “integral” can refer either to the antiderivative
of a function $f$ or to a limit of sums of the form $\sum f(x_j)\,\Delta x_j$; one speaks of
indefinite or definite integrals. At the more advanced level, and in particular in this
book, “integral” almost always carries the latter meaning. The notion of integra-
tion as a sophisticated form of summation is one of the truly fundamental ideas of
mathematical analysis, and it arises in many contexts where the connection with
differentiation is tenuous or nonexistent.

4.1 Integration on the Line

Recall that for a nonnegative function $f$, the basic geometric interpretation of the
integral $\int_a^b f(x)\,dx$ is as the area of the region between the graph of $f$ and the
x-axis over the interval $[a, b]$. The idea for computing this area is to subdivide the
interval $[a, b]$ into small subintervals $[x_0, x_1], [x_1, x_2], \ldots, [x_{J-1}, x_J]$, with $x_0 = a$
and $x_J = b$, and to approximate the region under the graph of $f$ by a union of
rectangles based on the intervals $[x_{j-1}, x_j]$. If we choose the height $h_j$ of the
jth rectangle to be smaller (resp. larger) than all the values of $f$ on the interval
$[x_{j-1}, x_j]$, the corresponding sum $\sum_{j=1}^{J} h_j(x_j - x_{j-1})$ will be a lower (resp. upper)
bound for the area under the graph of $f$. If all goes well, these lower and upper
approximations will approach each other as we subdivide the interval $[a, b]$ into
smaller and smaller pieces, and their common limit will be the integral of $f$.
Let us make this more precise, introducing some useful definitions along the
way. A partition $P$ of the interval $[a, b]$ is a subdivision of $[a, b]$ into nonoverlapping
subintervals, specified by giving the subdivision points $x_1, \ldots, x_{J-1}$ along
with the endpoints $x_0 = a$ and $x_J = b$. In symbols, we shall write
\[
P = \{x_0, x_1, \ldots, x_J\}, \quad\text{with } a = x_0 < x_1 < \cdots < x_J = b.
\]
If $P$ and $P'$ are partitions of $[a, b]$, we say that $P'$ is a refinement of $P$ if $P'$ is
obtained from $P$ by adding in more subdivision points, that is, if $P \subset P'$.
Observe that if $P$ and $Q$ are any two partitions of $[a, b]$, they can be combined
into a single partition $P \cup Q$ whose subdivision points are those of $P$ together with
those of $Q$; $P \cup Q$ is a refinement of both $P$ and $Q$.
Now let $f$ be a bounded real-valued function on $[a, b]$. (We make no continuity
assumptions on $f$ at this point.) Given a partition $P = \{x_0, \ldots, x_J\}$ of $[a, b]$, for
$1 \le j \le J$ we set
\[
m_j = \inf\{f(x) : x_{j-1} \le x \le x_j\}, \qquad M_j = \sup\{f(x) : x_{j-1} \le x \le x_j\}. \tag{4.1}
\]
(If $f$ is continuous, $m_j$ and $M_j$ are just the minimum and maximum values of
$f$ on $[x_{j-1}, x_j]$, which exist by the extreme value theorem.) We then define the
lower Riemann sum $s_P f$ and the upper Riemann sum $S_P f$ corresponding to the
partition $P$ by
\[
s_P f = \sum_{j=1}^{J} m_j(x_j - x_{j-1}), \qquad S_P f = \sum_{j=1}^{J} M_j(x_j - x_{j-1}). \tag{4.2}
\]
See Figure 4.1, where the lower and upper Riemann sums are the sums of the areas
of the rectangles, an area being counted as negative if the rectangle is below the
x-axis.
If $m$ and $M$ are the infimum and supremum of the values of $f$ over the whole
interval $[a, b]$, we clearly have $m_j \ge m$ and $M_j \le M$ for all $j$, and hence
\[
\begin{aligned}
s_P f &\ge m \sum_{j=1}^{J} (x_j - x_{j-1}) = m(b - a), \\
S_P f &\le M \sum_{j=1}^{J} (x_j - x_{j-1}) = M(b - a).
\end{aligned}
\]

FIGURE 4.1: Lower and upper Riemann sums.

The same argument shows that if one of the subintervals $[x_{j-1}, x_j]$ is subdivided
further, the lower sum $s_P f$ becomes larger while the upper sum $S_P f$ becomes
smaller. In short:
4.3 Lemma. If $P'$ is a refinement of $P$, then $s_{P'} f \ge s_P f$ and $S_{P'} f \le S_P f$.
An immediate consequence of this is that any lower Riemann sum for $f$ is at most
any upper Riemann sum for $f$:
4.4 Lemma. If $P$ and $Q$ are any partitions of $[a, b]$, then $s_P f \le S_Q f$.
Proof. Consider the common refinement $P \cup Q$. By Lemma 4.3,
\[ s_P f \le s_{P \cup Q} f \le S_{P \cup Q} f \le S_Q f. \]
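
Lemmas 4.3 and 4.4 can be illustrated on a small example. The Python sketch below computes lower and upper Riemann sums for $f(x) = x^2$ on $[0, 1]$, using the fact that for an increasing function the infimum and supremum on each subinterval are attained at the left and right endpoints. The function name `riemann_sums_increasing`, the sample partitions, and the use of exact fractions are our own choices; the helper is valid only for increasing functions.

```python
from fractions import Fraction

def riemann_sums_increasing(f, partition):
    """Lower and upper Riemann sums of an INCREASING function f.

    For increasing f, m_j = f(x_{j-1}) and M_j = f(x_j).
    """
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(left) * (right - left) for left, right in pairs)
    upper = sum(f(right) * (right - left) for left, right in pairs)
    return lower, upper

def square(x):
    return x * x

P = [Fraction(0), Fraction(1, 2), Fraction(1)]
Q = [Fraction(0), Fraction(1, 3), Fraction(1)]
R = sorted(set(P) | set(Q))            # common refinement P ∪ Q

s_P, S_P = riemann_sums_increasing(square, P)
s_Q, S_Q = riemann_sums_increasing(square, Q)
s_R, S_R = riemann_sums_increasing(square, R)

# Lemma 4.3: refining raises lower sums and lowers upper sums.
lemma_4_3 = s_R >= s_P and s_R >= s_Q and S_R <= S_P and S_R <= S_Q
# Lemma 4.4: any lower sum is at most any upper sum.
lemma_4_4 = s_P <= S_Q and s_Q <= S_P

print(s_P, S_P, s_Q, S_Q, s_R, S_R, lemma_4_3, lemma_4_4)
```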

Next, we define the lower and upper integrals of $f$ on $[a, b]$ by
\[
\underline{I}_a^b(f) = \sup_P s_P f, \qquad \overline{I}_a^b(f) = \inf_P S_P f,
\]
the supremum and infimum being taken over all partitions $P$ of $[a, b]$. By Lemma
4.4, we have $\underline{I}_a^b(f) \le \overline{I}_a^b(f)$. If the upper and lower integrals coincide, $f$ is called
Riemann integrable on $[a, b]$, and the common value of the upper and lower integrals
is the Riemann integral $\int_a^b f(x)\,dx$. We shall generally omit the eponym
“Riemann,” as the Riemann integral is the only one we shall use in this book, but it
is significant not only for historical reasons but in order to distinguish the Riemann
integral from the more sophisticated Lebesgue integral.
At first sight it would seem difficult to determine whether a function f is inte-
grable and to evaluate its integral, as the definitions involve all possible partitions
of [a, b]. The following lemma is the key to making these calculations more man-
ageable.

4.5 Lemma. If $f$ is a bounded function on $[a, b]$, the following conditions are
equivalent:
a. $f$ is integrable on $[a, b]$.
b. For every $\epsilon > 0$ there is a partition $P$ of $[a, b]$ such that $S_P f - s_P f < \epsilon$.
Proof. If $S_P f - s_P f < \epsilon$ for some partition $P$, then $\overline{I}_a^b f - \underline{I}_a^b f < \epsilon$, and since
$\epsilon$ is arbitrary, it follows that $\overline{I}_a^b f = \underline{I}_a^b f$, i.e., $f$ is integrable. Conversely, if $f$ is a
bounded function and $\epsilon$ is positive, we can find partitions $Q$ and $Q'$ of $[a, b]$ such
that $S_Q f < \overline{I}_a^b f + \tfrac{1}{2}\epsilon$ and $s_{Q'} f > \underline{I}_a^b f - \tfrac{1}{2}\epsilon$. Thus, if $f$ is integrable, we have
$S_Q f - s_{Q'} f < \epsilon$. Let $P = Q \cup Q'$; then by Lemma 4.3, $s_{Q'} f \le s_P f \le S_P f \le
S_Q f$, so $S_P f - s_P f \le S_Q f - s_{Q'} f < \epsilon$.

The condition (b) in Lemma 4.5 not only gives a workable criterion for integrability
but also gives us some leverage for computing the integral of an integrable
function $f$. Indeed, for any partition $P$ we have
\[
s_P f \le \int_a^b f(x)\,dx \le S_P f,
\]
so if $S_P f - s_P f < \epsilon$, $S_P f$ and $s_P f$ are both within $\epsilon$ of $\int_a^b f(x)\,dx$. The latter
quantity is therefore the limit of the sums $S_P f$ or $s_P f$ as $P$ runs through any
sequence of partitions such that $S_P f - s_P f \to 0$.
We next present the fundamental additivity properties of the integral, which are
easy but not quite trivial consequences of the definitions:

4.6 Theorem.
a. Suppose $a < b < c$. If $f$ is integrable on $[a, b]$ and on $[b, c]$, then $f$ is integrable
on $[a, c]$, and
\[
\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx. \tag{4.7}
\]
b. If $f$ and $g$ are integrable on $[a, b]$, then so is $f + g$, and
\[
\int_a^b \bigl[f(x) + g(x)\bigr]\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx. \tag{4.8}
\]

Proof. (a) Given $\epsilon > 0$, let $P$ and $Q$ be partitions of $[a, b]$ and $[b, c]$, respectively,
such that $S_P f - s_P f < \tfrac{1}{2}\epsilon$ and $S_Q f - s_Q f < \tfrac{1}{2}\epsilon$. Then $P \cup Q$ is a partition of
$[a, c]$ and
\[ S_{P \cup Q} f = S_P f + S_Q f, \qquad s_{P \cup Q} f = s_P f + s_Q f. \]
It follows that $S_{P \cup Q} f - s_{P \cup Q} f < \epsilon$, so that $f$ is integrable on $[a, c]$ by Lemma 4.5.
Moreover, $\int_a^c f(x)\,dx$ is within $\epsilon$ of $S_{P \cup Q} f$, and $\int_a^b f(x)\,dx$ and $\int_b^c f(x)\,dx$ are
within $\tfrac{1}{2}\epsilon$ of $S_P f$ and $S_Q f$, respectively, so $\int_a^c f(x)\,dx$ is within $2\epsilon$ of
$\int_a^b f(x)\,dx + \int_b^c f(x)\,dx$. Since $\epsilon$ is arbitrary, (4.7) follows.
(b) Given $\epsilon > 0$, choose partitions $P$ and $Q$ of $[a, b]$ such that $S_P f - s_P f < \tfrac{1}{2}\epsilon$
and $S_Q g - s_Q g < \tfrac{1}{2}\epsilon$, and let $R = P \cup Q$ be the common refinement of $P$ and
$Q$. Then by Lemma 4.3 we have $S_R f - s_R f \le S_P f - s_P f$ and $S_R g - s_R g \le
S_Q g - s_Q g$. Moreover, the maximum of the sum of two functions is at most the
sum of the maxima, and the minimum of the sum is at least the sum of the minima,
so
\[ S_R(f + g) \le S_R f + S_R g, \qquad s_R(f + g) \ge s_R f + s_R g. \]
Hence,
\[ S_R(f + g) \le S_R f + S_R g < s_R f + \tfrac{1}{2}\epsilon + s_R g + \tfrac{1}{2}\epsilon \le s_R(f + g) + \epsilon. \]
In other words, $S_R(f + g) - s_R(f + g) < \epsilon$, so that $f + g$ is integrable by Lemma
4.5. The formula (4.8) then follows in much the same way as (4.7).

Remark. We make the usual convention that
\[
\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx;
\]
then (4.7) holds no matter how the points $a, b, c$ are ordered.
The following theorem lists some more standard properties of integrals. They
are all quite easy to derive from the definitions with the help of Lemma 4.5, and we
leave their proofs as Exercises 2–5.
4.9 Theorem. Suppose $f$ is integrable on $[a, b]$.
a. If $c \in \mathbb{R}$, then $cf$ is integrable on $[a, b]$, and $\int_a^b cf(x)\,dx = c\int_a^b f(x)\,dx$.
b. If $[c, d] \subset [a, b]$, then $f$ is integrable on $[c, d]$.
c. If $g$ is integrable on $[a, b]$ and $f(x) \le g(x)$ for $x \in [a, b]$, then $\int_a^b f(x)\,dx \le
\int_a^b g(x)\,dx$.
d. $|f|$ is integrable on $[a, b]$, and $\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx$.
We now derive some useful criteria for integrability. The first one has a very
simple proof, and in conjunction with Theorem 4.6a it establishes the integrability
of most of the functions that arise in elementary calculus. (Such functions have
only a finite number of local maxima and minima on any bounded interval [a, b],
so one can break [a, b] up into finitely many subintervals on which the function in
question is monotone, apply Theorem 4.10 on each subinterval, and then add the
results by Theorem 4.6a.)

FIGURE 4.2: An increasing function and a partition with equal subintervals. The difference between the upper and lower Riemann sums is
the sum of the areas of the solid rectangles, which is easily found by
stacking them.

4.10 Theorem. If f is bounded and monotone on [a, b], then f is integrable on [a, b].

Proof. Suppose f is increasing on [a, b]; the proof is similar if f is decreasing.
Consider the partition $P_k$ of [a, b] into k equal subintervals of length (b − a)/k.
Since f is increasing, the quantities $m_j$ and $M_j$ in (4.1) are given by
\[ m_j = f(x_{j-1}), \qquad M_j = f(x_j), \]
and hence the lower and upper Riemann sums are
\[ s_{P_k} f = \frac{b-a}{k}\sum_{j=0}^{k-1} f(x_j), \qquad S_{P_k} f = \frac{b-a}{k}\sum_{j=1}^{k} f(x_j), \]
and their difference is
\[ S_{P_k} f - s_{P_k} f = \frac{b-a}{k}\bigl[f(x_k) - f(x_0)\bigr] = \frac{(b-a)[f(b)-f(a)]}{k}. \]
This can be made as small as we please by taking k sufficiently large, so f is
integrable by Lemma 4.5. (The geometry of this calculation is shown in Figure
4.2.)
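
The telescoping identity in this proof is easy to check numerically. The following is only a minimal sketch, not part of the text's development: the function name `riemann_bounds_increasing` and the choice f(x) = x³ on [0, 2] are assumptions made for illustration.

```python
def riemann_bounds_increasing(f, a, b, k):
    """Lower and upper Riemann sums of an increasing f on [a, b], k equal subintervals."""
    h = (b - a) / k
    xs = [a + j * h for j in range(k + 1)]
    lower = h * sum(f(x) for x in xs[:-1])   # m_j = f(x_{j-1})
    upper = h * sum(f(x) for x in xs[1:])    # M_j = f(x_j)
    return lower, upper


def cube(x):
    return x ** 3


a, b, k = 0.0, 2.0, 1000
lower, upper = riemann_bounds_increasing(cube, a, b, k)
gap = upper - lower
# Value predicted by the proof: (b - a)[f(b) - f(a)] / k
predicted = (b - a) * (cube(b) - cube(a)) / k
```

Here `gap` and `predicted` agree up to rounding, and both sums bracket the exact value 4 of the integral of x³ over [0, 2].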

The next criterion for integrability is the one that is most commonly stated in
calculus books. Its proof, however, is frequently omitted because it relies on the
notion of uniform continuity that we studied in §1.8.

4.11 Theorem. If f is continuous on [a, b], then f is integrable on [a, b].



Proof. First, f is bounded on [a, b] by Theorem 1.23, so the upper and lower Rie-
mann sums for any partition exist. By Theorem 1.33, f is uniformly continuous
on [a, b]; thus, given ϵ > 0, we can find δ > 0 so that |f (x) − f (y)| < ϵ/(b − a)
whenever x, y ∈ [a, b] and |x − y| < δ. Let P be any partition of [a, b] whose
subintervals [xj−1 , xj ] all have length less than δ. Then |f (x) − f (y)| < ϵ/(b − a)
whenever x and y both lie in the same subinterval, and in particular the maximum
and minimum values of f on that subinterval differ by less than ϵ/(b − a). But this
means that
\[ S_P f - s_P f = \sum_{j=1}^{J} (M_j - m_j)(x_j - x_{j-1}) < \frac{\epsilon}{b-a}\sum_{j=1}^{J} (x_j - x_{j-1}) = \frac{\epsilon}{b-a}(b-a) = \epsilon. \]

By Lemma 4.5, then, f is integrable.
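
As a concrete instance of this argument, take f = sin on [0, π]. Since |sin x − sin y| ≤ |x − y|, the choice δ = ϵ/(b − a) works in the proof. The sketch below is illustrative only (the helper `sine_bounds` is a hypothetical name); it uses the fact that on a subinterval of [0, π] the minimum of sin is attained at an endpoint, and the maximum is 1 if the subinterval contains π/2 and is attained at an endpoint otherwise.

```python
import math


def sine_bounds(a, b, k):
    """Lower and upper Riemann sums of sin on [a, b] within [0, pi], k equal subintervals."""
    h = (b - a) / k
    lower = upper = 0.0
    for j in range(k):
        left, right = a + j * h, a + (j + 1) * h
        ends = (math.sin(left), math.sin(right))
        m = min(ends)                                   # sin is concave on [0, pi]
        M = 1.0 if left <= math.pi / 2 <= right else max(ends)
        lower += m * h
        upper += M * h
    return lower, upper


eps = 1e-3
a, b = 0.0, math.pi
delta = eps / (b - a)                    # valid because sin is Lipschitz with constant 1
k = math.ceil((b - a) / delta) + 1       # guarantees mesh (b - a)/k < delta
lower, upper = sine_bounds(a, b, k)
```

With this k, the difference `upper - lower` is below ϵ, and the two sums bracket the exact integral 2.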

Theorem 4.11 can be extended to functions that have some discontinuities, as
long as the set of discontinuities is “small.” The following result suffices for most
practical purposes.
4.12 Theorem. If f is bounded on [a, b] and continuous at all except finitely many
points in [a, b], then f is integrable on [a, b].
Proof. Let $y_1, \dots, y_L$ be the points in [a, b] where f is discontinuous, and let m
and M be the infimum and supremum of {f(x) : a ≤ x ≤ b}, the set of values of f
on [a, b]. Given δ > 0, let
\[ I_l = [a, b] \cap [y_l - \delta,\, y_l + \delta], \]
and let
\[ U = \bigcup_{l=1}^{L} I_l, \qquad V = [a, b] \setminus U^{\mathrm{int}}. \]
Thus U is a union of small intervals that contain the discontinuities of f, and V is
the remainder of [a, b]. Each interval $I_l$ has length at most 2δ, and there are L of
these intervals, so the total length of the set U is at most 2Lδ. On the other hand,
V is a finite union of closed intervals, on each of which f is continuous.
Let P be any partition of [a, b] that includes the endpoints of the intervals $I_l$
among its subdivision points. Then we can write
\[ S_P f = S_P^U f + S_P^V f, \qquad s_P f = s_P^U f + s_P^V f, \]
where $S_P^U f$ (resp. $S_P^V f$) is the sum of the terms $M_j(x_j - x_{j-1})$ in $S_P f$ for which
the interval $[x_{j-1}, x_j]$ is contained in U (resp. V), and likewise for $s_P^U f$ and $s_P^V f$.
Now, let ϵ > 0 be given. Since f is continuous on each of the closed intervals
that constitute V, Theorem 4.11 shows that we can make
\[ S_P^V f - s_P^V f < \tfrac12\epsilon \]
by choosing the partition P sufficiently fine. On the other hand,
\[ S_P^U f - s_P^U f = \sum_{[x_{j-1},x_j]\subset U} (M_j - m_j)(x_j - x_{j-1}) \le (M - m)(\text{length of } U) \le (M - m)\,2L\delta, \]
and we can make this less than $\tfrac12\epsilon$ by taking δ < ϵ/[4L(M − m)]. In short, for a
suitably chosen P we have $S_P f - s_P f < \epsilon$, so f is integrable by Lemma 4.5.

The preceding argument actually proves more than is stated in Theorem 4.12.
It is not necessary that the set of discontinuities of f be finite, only that it can be
covered by finitely many intervals I1 , . . . , IL whose total length is as small as we
please. Certain infinite sets, such as convergent sequences, also have this property
(Exercise 6). We make it into a formal definition: A set Z ⊂ R is said to have zero
content if for any ϵ > 0 there is a finite collection of intervals $I_1, \dots, I_L$ such that
(i) $Z \subset \bigcup_{l=1}^{L} I_l$, and (ii) the sum of the lengths of the $I_l$’s is less than ϵ. The proof of
Theorem 4.12 now yields the following result:

4.13 Theorem. If f is bounded on [a, b] and the set of points in [a, b] at which f is
discontinuous has zero content, then f is integrable on [a, b].
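
To make the definition of zero content concrete, here is a hedged sketch for one particular infinite set, {1/n : n ≥ 1}. The function name `cover_reciprocals` and the specific splitting of ϵ are assumptions chosen for illustration: one interval of length ϵ/2 around the limit 0 catches all but finitely many points, and the remaining points get tiny intervals of total length ϵ/4.

```python
import math


def cover_reciprocals(eps):
    """Finite list of intervals covering {1/n : n >= 1} with total length 3*eps/4 < eps."""
    intervals = [(-eps / 4, eps / 4)]            # contains every 1/n with n > 4/eps
    n_outside = max(1, math.floor(4 / eps))      # the points 1/n with n <= 4/eps
    r = eps / (8 * n_outside)                    # half-width used for those points
    for n in range(1, n_outside + 1):
        intervals.append((1 / n - r, 1 / n + r))
    return intervals


eps = 0.01
intervals = cover_reciprocals(eps)
total_length = sum(right - left for left, right in intervals)
```

The list is finite (here 401 intervals), its total length is 3ϵ/4, and every point 1/n lies in at least one interval.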

Theorem 4.13 is only a technical refinement of Theorem 4.12, and the reader
should not attach undue importance to it.¹ We mention it because its analogue in
higher dimensions does play a significant role in the theory, as we shall see. We
also remark that neither of Theorems 4.10 and 4.13 includes the other; the set of
discontinuities of a monotone function need not have zero content, and there are
continuous functions that are not monotone on any interval.
If f is an integrable function on [a, b], the value of $\int_a^b f(x)\,dx$ is somewhat
insensitive to the values of f at individual points, in the following sense:

4.14 Proposition. Suppose f and g are integrable on [a, b] and f(x) = g(x) for
all except finitely many points x ∈ [a, b]. Then $\int_a^b f(x)\,dx = \int_a^b g(x)\,dx$.
¹ It does, however, point the way toward a necessary and sufficient condition for a function to be
integrable, which we shall describe at the end of §4.8.

Proof. First suppose g is identically zero. That is, we are assuming that f (x) = 0
for all x ∈ [a, b] except for finitely many points y1 , . . . , yL . Let Pk be the partition
of [a, b] into k equal subintervals, and take k large enough so that the points yl all
lie in different subintervals. Then
\[ S_{P_k} f = \frac{b-a}{k}\sum_{l=1}^{L} \max\bigl(f(y_l), 0\bigr), \qquad s_{P_k} f = \frac{b-a}{k}\sum_{l=1}^{L} \min\bigl(f(y_l), 0\bigr). \]
Both these quantities tend to zero as k → ∞, and hence $\int_a^b f(x)\,dx = 0$.
The general case follows by applying this argument to the difference f − g.

The main use of Proposition 4.14 is in the context of functions with finitely
many discontinuities, as in Theorem 4.12. For such a function f there is often no
“right” way to define f at the points where it is discontinuous. Proposition 4.14
assures us that this problem is of no consequence as far as integration is concerned;
we may define f at these points however we like, or indeed leave f undefined there,
without any effect on $\int_a^b f(x)\,dx$.
Next, we present a general version of the fundamental theorem of calculus. Its
two parts say in effect that differentiating an integral or integrating a derivative
leads back to the original function.
4.15 Theorem (The Fundamental Theorem of Calculus).
a. Let f be an integrable function on [a, b]. For x ∈ [a, b], let $F(x) = \int_a^x f(t)\,dt$
(which is well defined by Theorem 4.9b). Then F is continuous on [a, b]; moreover,
F′(x) exists and equals f(x) at every x at which f is continuous.
b. Let F be a continuous function on [a, b] that is differentiable except perhaps at
finitely many points in [a, b], and let f be a function on [a, b] that agrees with
F′ at all points where the latter is defined. If f is integrable on [a, b], then
$\int_a^b f(t)\,dt = F(b) - F(a)$.
Proof. (a) If x, y ∈ [a, b], by (4.7) we have
\[ F(y) - F(x) = \int_x^y f(t)\,dt. \]
Let C = sup{|f(t)| : t ∈ [a, b]}; then by Theorem 4.9d,
\[ |F(y) - F(x)| \le \int_x^y |f(t)|\,dt \le \int_x^y C\,dt = C|y - x|, \]
which implies that F is continuous. Next, suppose that f is continuous at x; thus,
given ϵ > 0, there is a δ > 0 so that |f(t) − f(x)| < ϵ whenever |t − x| < δ. Since
\[ f(x) = \frac{f(x)}{y-x}\int_x^y dt = \frac{1}{y-x}\int_x^y f(x)\,dt, \]
we have
\[ \frac{F(y) - F(x)}{y-x} - f(x) = \frac{1}{y-x}\int_x^y [f(t) - f(x)]\,dt. \]
Hence, if |y − x| < δ, we have |f(t) − f(x)| < ϵ for all t between y and x, so
\[ \left|\frac{F(y) - F(x)}{y-x} - f(x)\right| \le \frac{1}{|y-x|}\left|\int_x^y \epsilon\,dt\right| = \epsilon. \]
It follows that $\lim_{y\to x} [F(y) - F(x)]/(y - x) = f(x)$, as claimed.


(b) Let P = {x0 , . . . , xJ } be a partition of [a, b]; by adding in extra points,
we may assume that all the points where F is not differentiable are among the
subdivision points xj . Then, for each j, F is continuous on the interval [xj−1 , xj ]
and differentiable on its interior, so by the mean value theorem,
F (xj ) − F (xj−1 ) = F ′ (tj )(xj − xj−1 ) = f (tj )(xj − xj−1 )
for some point $t_j \in (x_{j-1}, x_j)$. Adding up these equalities yields
\[ F(b) - F(a) = F(x_J) - F(x_0) = \sum_{j=1}^{J} f(t_j)(x_j - x_{j-1}), \]
which implies that
\[ s_P f \le F(b) - F(a) \le S_P f. \]
Since f is integrable, we can make $s_P f$ and $S_P f$ as close to $\int_a^b f(x)\,dx$ as we like
by choosing P suitably, and the desired result follows immediately.
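
Both parts of the theorem can be probed numerically. The sketch below is illustrative rather than definitive; the helper names `riemann_sum`, `sign`, and `G` are hypothetical, and midpoint sums stand in for the integral. For part (b) it uses F(x) = |x| on [−1, 2], which fails to be differentiable only at 0; for part (a) it uses f(t) = t² and a symmetric difference quotient at x = 1.

```python
def riemann_sum(f, a, b, k):
    """Midpoint Riemann sum of f on [a, b] with k equal subintervals."""
    h = (b - a) / k
    return h * sum(f(a + (j + 0.5) * h) for j in range(k))


def sign(x):
    # Agrees with F'(x) for F(x) = |x| wherever F' exists; the value at 0 is arbitrary.
    return (x > 0) - (x < 0)


# Part (b): the integral of F' over [-1, 2] should equal F(2) - F(-1).
integral_of_derivative = riemann_sum(sign, -1.0, 2.0, 3000)
ftc_value = abs(2.0) - abs(-1.0)


# Part (a): G(x) approximates the integral of t^2 from 0 to x, so G'(1) should be 1.
def G(x):
    return riemann_sum(lambda t: t * t, 0.0, x, 20000)


dx = 1e-3
difference_quotient = (G(1.0 + dx) - G(1.0 - dx)) / (2 * dx)
```

The partition in part (b) has 0 among its subdivision points, matching the assumption made at the start of the proof of (b).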

We have developed the notion of the integral of a function f in terms of the up-
per and lower Riemann sums SP f and sP f . More generally, if P = {x0 , . . . , xJ }
is a partition of [a, b] and tj is any point in the interval [xj−1 , xj ] (1 ≤ j ≤ J), the
quantity
\[ \sum_{j=1}^{J} f(t_j)(x_j - x_{j-1}) \]
is called a Riemann sum for f associated to the partition P. Clearly, if $m_j$ and $M_j$
are as in (4.1) we have $m_j \le f(t_j) \le M_j$, so that
\[ s_P f \le \sum_{j=1}^{J} f(t_j)(x_j - x_{j-1}) \le S_P f. \]
Thus, if f is integrable and we choose the partition P so that $s_P f$ and $S_P f$ are
good approximations to $\int_a^b f(x)\,dx$, all the Riemann sums corresponding to P will
also be good approximations to $\int_a^b f(x)\,dx$.
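
The sandwich inequality above holds for any partition and any choice of tags, not just equal subintervals. As a rough illustration (the function `sandwich`, the random partition, and f(x) = x² on [0, 1] are all assumptions made for this sketch), one can draw an irregular partition and arbitrary tags and compare the three sums:

```python
import random


def sandwich(f, points, rng):
    """Lower sum, tagged Riemann sum, and upper sum for an increasing f on a partition."""
    lower = tagged = upper = 0.0
    for left, right in zip(points, points[1:]):
        width = right - left
        t = rng.uniform(left, right)     # arbitrary tag t_j in [x_{j-1}, x_j]
        lower += f(left) * width         # m_j = f(x_{j-1}) because f is increasing
        tagged += f(t) * width
        upper += f(right) * width        # M_j = f(x_j)
    return lower, tagged, upper


rng = random.Random(0)
# An irregular partition of [0, 1]: the endpoints plus 200 random interior points.
points = sorted({0.0, 1.0, *(rng.random() for _ in range(200))})
lower, tagged, upper = sandwich(lambda x: x * x, points, rng)
```

Whatever tags are drawn, `tagged` lies between `lower` and `upper`, and both bounds bracket the exact value 1/3.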

One last question should be addressed: Given an integrable function f on [a, b],
for which partitions P do the sums $s_P f$ and $S_P f$ furnish a good approximation to
$\int_a^b f(x)\,dx$? It might seem that the answer might depend strongly on the nature of
the function f, but in fact, any partition whose subintervals are sufficiently small
will do the job. More precisely:

4.16 Proposition. Suppose f is integrable on [a, b]. Given ϵ > 0, there exists δ > 0
such that if $P = \{x_0, \dots, x_J\}$ is any partition of [a, b] satisfying
\[ \max_{1\le j\le J} (x_j - x_{j-1}) < \delta, \]
the sums $s_P f$ and $S_P f$ differ from $\int_a^b f(x)\,dx$ by at most ϵ.

Proof. The proof is presented in Appendix B.3 (Theorem B.7).


Proposition 4.16 shows, in particular, that one can always compute $\int_a^b f(x)\,dx$
as the limit as k → ∞ of sPk f or SPk f , where Pk is the partition of [a, b] into k
equal subintervals.
One final remark: The definite integral, which is defined as a limit of Riemann
sums, may be considered on the intuitive level as a sum of infinitely many infinites-
imal terms. This notion, which is probably quite obvious to the alert reader, is often
not stated explicitly in mathematics texts because of its lack of rigorous meaning.
But the fact is that in many situations — and we shall encounter several of them
later on — the interpretation of the integral as a sum of infinitesimals is the clearest
way to understand what is going on.

EXERCISES

1. Let f(x) = 1 if x is rational, f(x) = 0 if x is irrational. Show that f is not
integrable on any interval.
2. Prove Theorem 4.9a. (Hint: Show that sP (cf ) = csP f and SP (cf ) = cSP f
if c ≥ 0, and sP (cf ) = cSP f and SP (cf ) = csP f if c < 0.)
3. Prove Theorem 4.9b. (Hint: Consider partitions of [a, b] for which c and d are
among the subdivision points.)
4. Prove Theorem 4.9c.
5. Prove Theorem 4.9d. (Hint: To prove that |f| is integrable, show that
$S_P|f| - s_P|f| \le S_P f - s_P f$. For the inequality $\bigl|\int f\bigr| \le \int |f|$, observe that ±f ≤ |f|
and use Theorem 4.9c.)

6. Let {xk } be a convergent sequence in R. Show that the set {x1 , x2 , . . .} has
zero content.
7. Let f be an integrable function on [a, b]. Suppose that f(x) ≥ 0 for all x
and there is at least one point $x_0 \in [a, b]$ at which f is continuous and strictly
positive. Show that $\int_a^b f(x)\,dx > 0$.
8. Let f be an integrable function on [a, b]. Prove the following formulas directly
from the definitions:
a. For any c > 0, $\int_a^b f(x)\,dx = c\int_{a/c}^{b/c} f(cx)\,dx$.
b. $\int_a^b f(x)\,dx = \int_{-b}^{-a} f(-x)\,dx$.
c. For any c ∈ R, $\int_a^b f(x)\,dx = \int_{a-c}^{b-c} f(x + c)\,dx$.
9. Suppose g and h are continuous functions on [a, b], and f is a continuous func-
tion on R2 . Show that for any ϵ > 0 there is a δ > 0 such that if P =
{x0 , . . . , xJ } is any partition of [a, b] satisfying max1≤j≤J (xj − xj−1 ) < δ,
then
\[ \left|\int_a^b f(g(x), h(x))\,dx - \sum_{j=1}^{J} f(g(x_j'), h(x_j''))(x_j - x_{j-1})\right| < \epsilon \]

for any choice of x′j , x′′j in the interval [xj−1 , xj ]. (The point is that x′j and x′′j
need not be equal, so the sum in this inequality may not be a genuine Riemann
sum for the integral.)

4.2 Integration in Higher Dimensions


In this section we develop the theory of multiple integrals. The basic ideas are much
the same as for single integrals; the most serious complication comes from the
greater variety of regions over which integration is to be performed. To minimize
the complexity of the notation, we first develop the two-dimensional case and then
sketch the extension to higher dimensions.
Here and in what follows we shall employ the following notation. If S and T
are sets, their Cartesian product S × T is the set of all ordered pairs (s, t) with
s ∈ S and t ∈ T. For example, the plane is the Cartesian product of the line with
itself: R² = R × R. This idea extends in the obvious way to products of n sets,
with ordered n-tuples replacing ordered pairs; for example, R³ = R × R × R. We
can also think of R³ as R² × R or as R × R².

Double Integrals. We begin by defining the double integral of a function over
a rectangular region in the plane. In this chapter, by a rectangle we shall mean a
set of the form
\[ R = [a, b] \times [c, d] = \bigl\{(x, y) \in \mathbb{R}^2 : x \in [a, b],\ y \in [c, d]\bigr\}. \]
(Thus, a “rectangle” in this sense is always closed, and its sides are always parallel
to the coordinate axes.) A partition of R is a subdivision of R into subrectangles
obtained by partitioning both sides of R. Thus, a partition P is specified by its
subdivision points,
\[ P = \{x_0, \dots, x_J;\ y_0, \dots, y_K\}, \qquad \begin{cases} a = x_0 < \cdots < x_J = b, \\ c = y_0 < \cdots < y_K = d, \end{cases} \]
and it yields a decomposition of R into the subrectangles
\[ R_{jk} = [x_{j-1}, x_j] \times [y_{k-1}, y_k] \]
with area
\[ \Delta A_{jk} = (x_j - x_{j-1})(y_k - y_{k-1}). \]
Now let f be a bounded function on the rectangle R. Given a partition P as
above, we set
\[ m_{jk} = \inf\bigl\{f(x, y) : (x, y) \in R_{jk}\bigr\}, \qquad M_{jk} = \sup\bigl\{f(x, y) : (x, y) \in R_{jk}\bigr\}, \]
and define the lower and upper Riemann sums of f corresponding to P by
\[ s_P f = \sum_{j=1}^{J}\sum_{k=1}^{K} m_{jk}\,\Delta A_{jk}, \qquad S_P f = \sum_{j=1}^{J}\sum_{k=1}^{K} M_{jk}\,\Delta A_{jk}. \]
The lower and upper integrals of f on R are
\[ \underline{I}_R(f) = \sup_P s_P f, \qquad \overline{I}_R(f) = \inf_P S_P f, \]
the supremum and infimum being taken over all partitions P of R. If the lower and
upper integrals coincide, f is called (Riemann) integrable on R, and the common
value of the upper and lower integrals is called the (Riemann) integral of f over
R and is denoted by
\[ \iint_R f\,dA \quad\text{or}\quad \iint_R f(x, y)\,dx\,dy. \]
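
The two-dimensional lower and upper sums just defined can be computed directly in simple cases. The following is a minimal sketch under stated assumptions: the name `double_sums` is hypothetical, the partition uses equal subdivisions, and f is assumed nondecreasing in each variable on R, so that $m_{jk}$ and $M_{jk}$ are attained at the lower-left and upper-right corners of $R_{jk}$.

```python
def double_sums(f, a, b, c, d, J, K):
    """Lower and upper sums on [a, b] x [c, d] for f nondecreasing in each variable."""
    xs = [a + (b - a) * j / J for j in range(J + 1)]
    ys = [c + (d - c) * k / K for k in range(K + 1)]
    lower = upper = 0.0
    for j in range(1, J + 1):
        for k in range(1, K + 1):
            area = (xs[j] - xs[j - 1]) * (ys[k] - ys[k - 1])   # Delta A_jk
            lower += f(xs[j - 1], ys[k - 1]) * area            # m_jk at lower-left corner
            upper += f(xs[j], ys[k]) * area                    # M_jk at upper-right corner
    return lower, upper


# f(x, y) = x*y on [0, 1] x [0, 2]; the exact double integral is 1.
lower, upper = double_sums(lambda x, y: x * y, 0.0, 1.0, 0.0, 2.0, 200, 200)
```

For this grid the two sums are about 0.990 and 1.010, and they close in on 1 as J and K grow.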

These notions are entirely analogous to their one-dimensional counterparts.
The reader should refer back to §4.1 for a more detailed discussion, which can
easily be adapted to the present situation. However, we have not yet built a satisfactory
definition of two-dimensional integrals, because we often wish to integrate
functions over regions other than rectangles. The solution to this problem is simple,
in principle: To integrate a function f over a bounded region S ⊂ R², we draw a
large rectangle R that contains S, (re)define f to be zero outside of S, and integrate
the resulting function over R.
To express this neatly, it is convenient to introduce another definition. If S is a
subset of R² (or Rⁿ, or indeed any set), the characteristic function or indicator
function of S is the function $\chi_S$ defined by
\[ \chi_S(x) = \begin{cases} 1 & \text{if } x \in S, \\ 0 & \text{otherwise.} \end{cases} \]

Now, suppose S is a bounded subset of R² and f is a bounded function on R².
Let R be a rectangle that contains S. We say that f is integrable on S if $f\chi_S$ is
integrable on R, in which case we define the integral of f over S by
\[ \iint_S f\,dA = \iint_R f\chi_S\,dA. \]
It is easily verified that this definition does not depend on the choice of the enveloping
rectangle R, since the integrand $f\chi_S$ vanishes outside of S. (It also does
not depend on the values of f outside of S. We could just as well assume that
f is only defined on S or on some set containing S, with the understanding that
$(f\chi_S)(x) = 0$ for x ∉ S.)
The properties of integrals in two dimensions are very similar to those in one;
the following theorem provides a list of the most basic ones. The proof is essentially
identical to that of Theorems 4.6 and 4.9; we leave the details to the interested
reader.
4.17 Theorem.
a. If $f_1$ and $f_2$ are integrable on the bounded set S and $c_1, c_2 \in \mathbb{R}$, then
$c_1 f_1 + c_2 f_2$ is integrable on S, and
\[ \iint_S [c_1 f_1 + c_2 f_2]\,dA = c_1 \iint_S f_1\,dA + c_2 \iint_S f_2\,dA. \]
b. Let $S_1$ and $S_2$ be bounded sets with no points in common, and let f be a
bounded function. If f is integrable on $S_1$ and on $S_2$, then f is integrable
on $S_1 \cup S_2$, in which case
\[ \iint_{S_1\cup S_2} f\,dA = \iint_{S_1} f\,dA + \iint_{S_2} f\,dA. \]
c. If f and g are integrable on S and f(x) ≤ g(x) for x ∈ S, then $\iint_S f\,dA \le \iint_S g\,dA$.
d. If f is integrable on S, then so is |f|, and $\bigl|\iint_S f\,dA\bigr| \le \iint_S |f|\,dA$.
At this point we need to say more about the conditions under which a function is
integrable. In the one-variable situation, we can get along quite well by restricting
attention to continuous functions, but that is not the case here: Even if the function
f is continuous, the function $\chi_S$ that enters into the definition of $\iint_S f\,dA$ is not.
The starting point is the analogue of Theorem 4.13. The notion of “zero content”
transfers readily to sets in the plane; namely, a set Z ⊂ R² is said to have zero
content if for any ϵ > 0 there is a finite collection of rectangles $R_1, \dots, R_M$ such
that (i) $Z \subset \bigcup_{m=1}^{M} R_m$, and (ii) the sum of the areas of the $R_m$’s is less than ϵ. We
then have:
4.18 Theorem. Suppose f is a bounded function on the rectangle R. If the set of
points in R at which f is discontinuous has zero content, then f is integrable on R.
Proof. The proof is essentially identical to that of Theorem 4.13. That is, one
first shows that f is integrable if f is continuous on all of R by the argument
that proves Theorem 4.11, then encompasses the general case by the argument that
proves Theorem 4.12. Details are left to the reader.
The notion of “zero content” is considerably more interesting in the plane than
on the line, as the sets having this property include not only finite sets but things
such as smooth curves (that is, curves parametrized by C 1 functions f : [a, b] →
R2 ). The following proposition summarizes the results we will need; see also Ex-
ercise 2.
4.19 Proposition.
a. If Z ⊂ R² has zero content and U ⊂ Z, then U has zero content.
b. If $Z_1, \dots, Z_k$ have zero content, then so does $\bigcup_{j=1}^{k} Z_j$.
c. If $f : (a_0, b_0) \to \mathbb{R}^2$ is of class C¹, then f([a, b]) has zero content whenever
$a_0 < a < b < b_0$.
Proof. Parts (a) and (b) are easy, and their proofs are left as an exercise. As for
(c), let Pk = {t0 , . . . , tk } be the partition of [a, b] into k equal subintervals of
length δ = (b − a)/k, and let C be an upper bound for {|f ′ (t)| : t ∈ [a, b]}. By
the mean value theorem applied to the two components x(t), y(t) of f (t), we have
|x(t) − x(tj )| ≤ Cδ and |y(t) − y(tj )| ≤ Cδ for t ∈ [tj−1 , tj ]. In other words,
f ([tj−1 , tj ]) is contained in the square of side length 2Cδ centered at f (tj ). Hence,
f([a, b]) is contained in the union of these squares, and the sum of their areas is
k(2Cδ)² = 4C²(b − a)²/k. This can be made as small as we please by taking k
sufficiently large, so f([a, b]) has zero content.
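
The covering used in the proof of (c) can be tried out on a specific curve. The sketch below is an illustration under stated assumptions: the curve is the unit circle t ↦ (cos t, sin t) on [0, 2π], for which C = 1 bounds |f′(t)|, and the names `covering_squares` and `covered` are hypothetical. The check samples finitely many points of each arc, so it is a sanity test rather than a proof.

```python
import math

A, B, C = 0.0, 2 * math.pi, 1.0      # parameter interval and a bound C >= |f'(t)|


def covering_squares(k):
    """Centers f(t_j), j = 1..k, and the half-side C*delta of the covering squares."""
    delta = (B - A) / k
    centers = [(math.cos(A + j * delta), math.sin(A + j * delta)) for j in range(1, k + 1)]
    return centers, C * delta, delta


def covered(k, samples_per_arc=20):
    """Check that sampled points of each arc f([t_{j-1}, t_j]) lie in the j-th square."""
    centers, half, delta = covering_squares(k)
    for j, (cx, cy) in enumerate(centers, start=1):
        for i in range(samples_per_arc + 1):
            t = A + (j - 1 + i / samples_per_arc) * delta
            if abs(math.cos(t) - cx) > half or abs(math.sin(t) - cy) > half:
                return False
    return True


k = 500
centers, half, delta = covering_squares(k)
total_area = k * (2 * half) ** 2     # k squares of side 2*C*delta
```

The total area equals 4C²(b − a)²/k = 16π²/k, which shrinks to zero as k increases even though the circle itself is an infinite set.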
To apply Theorem 4.18 to the integrand $f\chi_S$ in the definition of $\iint_S f\,dA$, we
need to know about the discontinuities of $\chi_S$. The following lemma provides the
answer.

4.20 Lemma. The function $\chi_S$ is discontinuous at x if and only if x is in the
boundary of S.

Proof. If x is in the interior of S, then $\chi_S$ is identically 1 on some ball containing
x, so it is continuous at x. Likewise, if x is in the interior of the complement $S^c$,
then $\chi_S$ is identically 0 near x and hence continuous at x. But if x is in the boundary
of S, then there are points arbitrarily close to x where $\chi_S = 1$ and other such points
where $\chi_S = 0$, so $\chi_S$ is discontinuous at x.

In view of Theorem 4.18 and Lemma 4.20, to have a good notion of integra-
tion over a set S, we should require the boundary of S to have zero content. We
make this condition into a formal definition: A set S ⊂ R2 is Jordan measurable
if it is bounded and its boundary has zero content. (We shall comment further on
this nomenclature below.) We shall generally say “measurable” rather than “Jor-
dan measurable,” but we advise the reader that in more advanced works the term
“measurable” refers to the more general concept of Lebesgue measurability (see
§4.8).
By Proposition 4.19, any bounded set whose boundary is a finite union of pieces
of smooth curves is measurable; these are the sets that we almost always encounter
in practice. The following theorem gives a convenient criterion for integrability.

4.21 Theorem. Let S be a measurable subset of R². Suppose f : R² → R is
bounded and the set of points in S at which f is discontinuous has zero content.
Then f is integrable on S.

Proof. The only points where f χS can be discontinuous are those points in the
closure of S where either f or χS is discontinuous. By Lemma 4.20 and Proposition
4.19b, the set of such points has zero content. By Theorem 4.18, f χS is integrable
on any rectangle R containing S, and hence f is integrable on S.

To complete the picture, we need the following generalization of Proposition
4.14, which shows that sets of zero content are negligible for the purposes of integration.

4.22 Proposition. Suppose Z ⊂ R² has zero content. If f : R² → R is bounded,
then f is integrable on Z and $\iint_Z f\,dA = 0$.

Proof. Given ϵ > 0, there is a finite collection of rectangles $R_1, \dots, R_M$ such that
$Z \subset \bigcup_{m=1}^{M} R_m$ and the sum of the areas of the $R_m$’s is less than ϵ. By subdividing
these rectangles if necessary, we can assume that they have disjoint² interiors and
form part of a grid obtained by partitioning some large rectangle R. Denoting this
partition by P, the area of $R_j$ by $|R_j|$, and $\sup_x |f(x)|$ by C, we have
\[ -C\epsilon < -C\sum_{j=1}^{M} |R_j| \le s_P(f\chi_Z) \le S_P(f\chi_Z) \le C\sum_{j=1}^{M} |R_j| < C\epsilon. \]
Since ϵ is arbitrary, the desired conclusion follows directly from the definition of
the integral.

4.23 Corollary.
a. Suppose f is integrable on S ⊂ R². If g is bounded and g(x) = f(x) except for
x in a set of zero content, then g is integrable on S and $\iint_S g\,dA = \iint_S f\,dA$.
b. Suppose f is integrable on S and on T, and S ∩ T has zero content. Then f is
integrable on S ∪ T, and $\iint_{S\cup T} f\,dA = \iint_S f\,dA + \iint_T f\,dA$.

Proof. For (a), apply Proposition 4.22 to the function f − g. For (b), we are as-
suming that f χS and f χT are integrable; moreover, by Proposition 4.22, f χS∩T
is integrable and its integral is zero. But f χS∪T = f χS + f χT − f χS∩T , so the
result follows.

Area. The problem of determining the area of regions in the plane goes back
to antiquity. The first effective general method of attacking this problem was pro-
vided by the integral calculus in one variable, which yields the area of a region
under a graph, or of a region between two graphs. It therefore produces a theory
of area for regions that can be broken up into finitely many subregions bounded by
graphs of (nice) functions. However, the two-variable theory of integration con-
tains, as a special case, a theory of area (due to the French mathematician Jordan)
that encompasses more complicated sorts of regions too. Namely, if S is any Jordan
measurable set in the plane, its area is the integral over S of the constant function
f(x) ≡ 1:
\[ (\text{area})(S) = \iint_S 1\,dA = \iint \chi_S\,dA, \]
the latter integral being taken over any rectangle that contains S.
² A collection $\{S_j\}$ of sets is disjoint if $S_j \cap S_k = \varnothing$ for $j \ne k$.

Let us pause to see just what this means. Given any bounded set S ⊂ R², to
compute $\iint_S \chi_S\,dA$ we enclose S in a large rectangle R and consider a partition P
of R, which produces a grid of small rectangles that cover S. The lower sum for this
partition is simply the sum of the areas of the small rectangles that are contained
in S, whereas the upper sum is the sum of the areas of the small rectangles that
intersect S. Taking the supremum of the lower sums and the infimum of the upper
sums yields quantities that may be called the inner area and outer area of S:
\[ \underline{A}(S) = \underline{I}_R(\chi_S), \qquad \overline{A}(S) = \overline{I}_R(\chi_S). \]
When these two quantities coincide, that is, when the characteristic function $\chi_S$ is
integrable, their common value is the area of S. See Figure 4.3.

FIGURE 4.3: Approximations to the inner and outer areas of a region.
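
The grid description of inner and outer area translates directly into a computation. The following is a hedged sketch for one specific set, the closed unit disc inside R = [−1, 1]², with a hypothetical helper `disc_areas`. It relies on two facts about the disc: a grid cell is contained in it exactly when the cell's farthest corner from the origin is, and a cell intersects it exactly when the cell's point nearest the origin does.

```python
import math


def disc_areas(n):
    """Lower and upper sums of the indicator of the closed unit disc on an n-by-n grid."""
    h = 2.0 / n
    inner = outer = 0.0
    for j in range(n):
        x0, x1 = -1 + j * h, -1 + (j + 1) * h
        for k in range(n):
            y0, y1 = -1 + k * h, -1 + (k + 1) * h
            # Squared distance from the origin to the farthest corner of the cell.
            far = max(abs(x0), abs(x1)) ** 2 + max(abs(y0), abs(y1)) ** 2
            # Squared distance to the nearest point of the cell (clamp 0 into each side).
            near = min(max(x0, 0.0), x1) ** 2 + min(max(y0, 0.0), y1) ** 2
            if far <= 1.0:       # cell contained in the disc: counts toward the lower sum
                inner += h * h
            if near <= 1.0:      # cell meets the disc: counts toward the upper sum
                outer += h * h
    return inner, outer


inner, outer = disc_areas(200)
```

The two values bracket π, and the gap between them is the total area of the cells that meet the boundary circle, which shrinks as the grid is refined.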
When do we have $\underline{A}(S) = \overline{A}(S)$? It is not hard to see (Exercises 3–5) that for
any bounded set S,
• S and its interior $S^{\mathrm{int}}$ have the same inner area;

• S and its closure $\overline{S}$ have the same outer area;

• the inner area of $S^{\mathrm{int}}$ plus the outer area of the boundary ∂S equals the outer
area of the closure $\overline{S}$.
It follows that the inner and outer areas of S coincide precisely when the outer area
of the boundary ∂S is zero. But a moment’s thought shows that this is nothing but
the condition that ∂S should have zero content. In short, the inner and outer area
of S coincide precisely when S is measurable. This is the explanation for the name
“measurable”: The measurable sets are the ones that have a well-defined area.
Although the class of Jordan measurable sets is much more extensive than the
class of sets whose area can be computed by one-variable calculus, it is not as big
as we would ideally wish. It does not include all bounded open sets or all compact

sets, for example. Moreover, it does not behave well with respect to passage to
limits: The union of a sequence of measurable sets, all contained in a common
rectangle, need not be measurable.
A simple example of the latter phenomenon can be obtained by considering the
sets $S_k$ of all points in the unit square whose x-coordinate is an integer multiple of
$2^{-k}$. Each $S_k$ is the union of a finite collection of line segments, so it is measurable
and its area is zero. However, the union $\bigcup_{k=1}^{\infty} S_k$ is the set of all points in the unit
square whose x-coordinate has a terminating base-2 decimal expansion. This set
is dense in the unit square but has no interior, from which it is easy to see that its
inner area is 0 but its outer area is 1 (Exercises 3 and 4). By “fattening up” the
sets Sk (replacing the line segments in them by thin rectangles), we can also obtain
examples of open sets and closed sets that are not measurable (Exercise 6).
The defects of the Jordan theory of area carry over more generally to the theory
of integration we are discussing, and for more advanced work one needs the more
sophisticated Lebesgue theory of measure and integration, of which we present a
brief sketch in §4.8. It is largely for this reason that we are being somewhat cavalier
about presenting all the theoretical details in this chapter; there seems to be little
virtue in expending an enormous amount of effort on a theory that must be upgraded
when one proceeds to a more advanced level.

Higher Dimensions. The theory of n-dimensional integrals is almost identical
to the theory of double integrals; the only reason we have not considered an arbitrary
n from the beginning is that the notation is simpler, and the geometric intuition
is clearer, when n = 2. We have merely to replace rectangles by n-dimensional
rectangular boxes, that is, regions in Rⁿ of the form
\[ R = [a_1, b_1] \times \cdots \times [a_n, b_n] = \bigl\{x : a_1 \le x_1 \le b_1, \dots, a_n \le x_n \le b_n\bigr\}. \]
The n-dimensional volume of such a box is the product of the lengths of its sides,
$\prod_{j=1}^{n} (b_j - a_j)$. (Here $\prod$ is the product sign, analogous to $\sum$ for sums.) A partition
of such a box is specified by partitioning each of its “sides” $[a_1, b_1], \dots, [a_n, b_n]$.
The notion of “zero content” generalizes to n dimensions in the obvious way:
A bounded set Z ⊂ Rⁿ has zero content if for any ϵ > 0 there are rectangular
boxes $R_1, \dots, R_K$ whose total volume is less than ϵ, such that $Z \subset \bigcup_{j=1}^{K} R_j$. The
analogue of Proposition 4.19c is that smooth submanifolds of dimension k < n in
Rⁿ (given parametrically by C¹ maps $f : \mathbb{R}^k \to \mathbb{R}^n$) have zero content.
With these modifications, the definition of integrability and Theorems 4.17,
4.18, and 4.21 work just as in the 2-dimensional case. The element of area dA
becomes an element of n-dimensional volume, which may be denoted by $dV^n$,
$d^n x$, or $dx_1 \cdots dx_n$: thus, the notation for n-dimensional integrals over a region
S ⊂ Rⁿ is
\[ \int\!\cdots\!\int_S f\,dV^n = \int\!\cdots\!\int_S f(x)\,d^n x = \int\!\cdots\!\int_S f(x_1, \dots, x_n)\,dx_1 \cdots dx_n, \]
where $\int\!\cdots\!\int$ is shorthand for a row of n integral signs. When n = 3, we usually
write dV instead of $dV^3$, the V denoting ordinary 3-dimensional volume.
We conclude with a useful fact about integrals in any number of dimensions.

4.24 Theorem (The Mean Value Theorem for Integrals). Let S be a compact, connected,
measurable subset of Rⁿ, and let f and g be continuous functions on S
with g ≥ 0. Then there is a point a ∈ S such that
\[ \int\!\cdots\!\int_S f(x)g(x)\,d^n x = f(a) \int\!\cdots\!\int_S g(x)\,d^n x. \]

Proof. Let m and M be the minimum and maximum values of f on S, which exist
since S is compact. Since g ≥ 0, we have mg ≤ fg ≤ Mg on S, and hence
\[ m \int\!\cdots\!\int_S g(x)\,d^n x \le \int\!\cdots\!\int_S f(x)g(x)\,d^n x \le M \int\!\cdots\!\int_S g(x)\,d^n x. \]
If $\int\!\cdots\!\int_S g = 0$, these inequalities force $\int\!\cdots\!\int_S fg = 0$ as well, and any a ∈ S will do.
Otherwise the quotient $(\int\!\cdots\!\int fg)/(\int\!\cdots\!\int g)$ lies between m and M, so by the intermediate
value theorem, it is equal to f(a) for some a ∈ S.

The special case g ≡ 1 is of particular interest:

4.25 Corollary. Let S be a compact, connected, measurable subset of Rⁿ, and let
f be a continuous function on S. Then there is a point a ∈ S such that
\[ \int\!\cdots\!\int_S f(x)\,d^n x = f(a)\,|S|, \]
where |S| denotes the n-dimensional volume of S.

The ratio of $\int\!\cdots\!\int_S f(x)\,d^n x$ to the n-dimensional volume of S is, by definition,
the average or mean value of f on S. Corollary 4.25 says that when f is continuous
and S is compact and connected, there is some point in S at which the actual
value of f is the average value.
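
In the simplest case n = 1 with S = [0, 2] and f(x) = x², the point promised by Corollary 4.25 can be located numerically. This is only a sketch under assumptions: `average_value` uses a midpoint sum as a stand-in for the integral, and `point_with_value` is a hypothetical bisection helper that relies on f being continuous and increasing on S.

```python
def average_value(f, a, b, k=100000):
    """Approximate mean value of f on [a, b] using a midpoint Riemann sum."""
    h = (b - a) / k
    return sum(f(a + (j + 0.5) * h) for j in range(k)) * h / (b - a)


def point_with_value(f, target, lo, hi, tol=1e-12):
    """Bisection for f(x) = target, assuming f is continuous and increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def square(x):
    return x * x


avg = average_value(square, 0.0, 2.0)            # exact value is (8/3)/2 = 4/3
a_star = point_with_value(square, avg, 0.0, 2.0)
```

The point found is close to √(4/3) ≈ 1.1547, which lies in S, and f takes its average value there.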

EXERCISES
1. Prove Proposition 4.19(a,b).
2. Let f : [a, b] → R be an integrable function.
a. Show that the graph of f in R² has zero content. (Hint: Given a partition
P of [a, b], interpret $S_P f - s_P f$ as a sum of areas of rectangles that cover
the graph of f.)
b. Suppose f ≥ 0 and let S = {(x, y) : x ∈ [a, b], 0 ≤ y ≤ f(x)}. Show
that S is measurable and that its area (as defined in this section) equals
$\int_a^b f(x)\,dx$.
3. Let S be a bounded set in R². Show that S and $S^{\mathrm{int}}$ have the same inner area.
(Hint: For any rectangle contained in S, there are slightly smaller rectangles
contained in $S^{\mathrm{int}}$.)
4. Let S be a bounded set in R². Show that S and its closure $\overline{S}$ have the same outer area.
(Hint: For any rectangle that does not intersect S, there are slightly smaller
rectangles that do not intersect $\overline{S}$.)
5. Let S be a bounded set in R2 . Show that the inner area of S plus the outer area
of ∂S equals the outer area of S. (Use Exercises 3 and 4.)
6. Let S be the subset of the x-axis consisting of the union of the open interval
of length 1/4 centered at 1/2, the open intervals of length 1/16 centered at
1/4 and 3/4, the open intervals of length 1/64 centered at 1/8, 3/8, 5/8, and
7/8, and so forth. Let U = S × (0, 1) be the union of the open rectangles of
height 1 based on these intervals. Thus U is the union of one rectangle of
area 1/4, two rectangles of area 1/16, four rectangles of area 1/64, . . . ,
some of which overlap.
a. Show that U is an open subset of the unit square R = [0, 1] × [0, 1].
b. Show that the inner area of U is less than 1/2.
c. Show that U is dense in R and hence that its outer area is 1. (Use Exercise
4.)
d. Let V = R \ U. Show that V is a closed set whose inner area is 0 and
whose outer area is bigger than 1/2.
7. (The Second Mean Value Theorem for Integrals) Suppose f is continuous on
[a, b] and ϕ is of class $C^1$ and increasing on [a, b]. Show that there is a point
c ∈ [a, b] such that
\[ \int_a^b f(x)\varphi(x)\,dx = \varphi(a)\int_a^c f(x)\,dx + \varphi(b)\int_c^b f(x)\,dx. \]
(Hint: First suppose ϕ(b) = 0. Set $F(x) = \int_a^x f(t)\,dt$, integrate by parts
to show that $\int_a^b f(x)\varphi(x)\,dx = -\int_a^b F(x)\varphi'(x)\,dx$, and apply
Theorem 4.24 to the latter integral. To remove the condition ϕ(b) = 0, show that
if the conclusion is true for f and ϕ, it is true for f and ϕ + C for any
constant C.)

4.3 Multiple Integrals and Iterated Integrals


The next issue to be addressed is the evaluation of n-dimensional integrals. The
usual procedure is to reduce them to one-dimensional integrals.
Again we focus on the case n = 2, and we begin by considering the integral of
a function f over a rectangle R. Given a partition
$P = \{x_0, \dots, x_J;\ y_0, \dots, y_K\}$ of R, we pick points
$\tilde x_j \in [x_{j-1}, x_j]$ and $\tilde y_k \in [y_{k-1}, y_k]$
(1 ≤ j ≤ J, 1 ≤ k ≤ K) and form the Riemann sum
\[ \sum_{j=1}^{J}\sum_{k=1}^{K} f(\tilde x_j, \tilde y_k)\,\Delta x_j\,\Delta y_k \qquad (\Delta x_j = x_j - x_{j-1},\ \Delta y_k = y_k - y_{k-1}). \]
If f is integrable on R, this double sum approximates the integral
$\iint_R f(x, y)\,dx\,dy$. On the other hand, for each fixed y, the sum
$\sum_{j=1}^{J} f(\tilde x_j, y)\,\Delta x_j$ is a Riemann sum for the single
integral $g(y) = \int_a^b f(x, y)\,dx$, and then the sum
$\sum_{k=1}^{K} g(\tilde y_k)\,\Delta y_k$ is a Riemann sum for the integral
$\int_c^d g(y)\,dy$. Thus, in an approximate sense,

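\[ \iint_R f(x, y)\,dx\,dy \approx \sum_{j=1}^{J}\sum_{k=1}^{K} f(\tilde x_j, \tilde y_k)\,\Delta x_j\,\Delta y_k \approx \sum_{k=1}^{K}\left[\int_a^b f(x, \tilde y_k)\,dx\right]\Delta y_k \approx \int_c^d\left[\int_a^b f(x, y)\,dx\right]dy. \]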

In short, if there are no unexpected pitfalls we should have
\[ \iint_R f\,dA = \int_c^d\left[\int_a^b f(x, y)\,dx\right]dy. \]
We could also play the same game with x and y switched, obtaining
\[ \iint_R f\,dA = \int_a^b\left[\int_c^d f(x, y)\,dy\right]dx. \]

If f is continuous on the rectangle R, it is not hard to make this argument
rigorous by using the uniform continuity of f. However, we need to allow
discontinuous functions in order to encompass integrals over more general
regions, and
here there is one potential pitfall: The integrability of f on R need not imply the
integrability of f (x, y0 ), as a function of x for fixed y0 , on [a, b]. The line seg-
ment {(x, y) : a ≤ x ≤ b, y = y0 } is a set of zero content, after all, so f could
be discontinuous at every point on it, and its behavior as a function of x could be
quite wild. This problem is actually not too serious, and we shall sweep it under
the rug by making the assumption — quite harmless in practice — that it does not
occur. The resulting theorem is as follows. It is sometimes referred to as Fubini’s
theorem, although that name belongs more properly to the generalization of the
theorem to Lebesgue integrals.
4.26 Theorem. Let R = {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}, and let f be an
integrable function on R. Suppose that, for each y ∈ [c, d], the function $f_y$
defined by $f_y(x) = f(x, y)$ is integrable on [a, b], and the function
$g(y) = \int_a^b f(x, y)\,dx$ is integrable on [c, d]. Then
\[ \iint_R f\,dA = \int_c^d\left[\int_a^b f(x, y)\,dx\right]dy. \tag{4.27} \]
Likewise, if $f^x(y) = f(x, y)$ is integrable on [c, d] for each x ∈ [a, b], and
$h(x) = \int_c^d f(x, y)\,dy$ is integrable on [a, b], then
\[ \iint_R f\,dA = \int_a^b\left[\int_c^d f(x, y)\,dy\right]dx. \tag{4.28} \]

Proof. The proof is presented in Appendix B.4 (Theorem B.9). The issue that
must be addressed is the permissibility of first letting the x-subdivisions get finer
and finer, and then doing the same for the y-subdivisions, or vice versa, as opposed
to requiring both subdivisions to become finer at the same time.
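
The heuristic behind Theorem 4.26 can be tried out numerically. The sketch below, in Python, compares a double Riemann sum with the two iterated sums for one continuous integrand on a rectangle. The function names, the integrand f(x, y) = xy², the rectangle [0, 1] × [0, 2], and the use of midpoints as sample points are all assumptions of this example, not part of the theorem.

```python
def midpoint_sum(g, a, b, n):
    """Riemann sum for the integral of g over [a, b] using n midpoints."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


def double_sum(f, a, b, c, d, J, K):
    """Double Riemann sum over [a, b] x [c, d] with midpoint sample points."""
    dx, dy = (b - a) / J, (d - c) / K
    return sum(f(a + (j + 0.5) * dx, c + (k + 0.5) * dy)
               for j in range(J) for k in range(K)) * dx * dy


def iterated_dx_dy(f, a, b, c, d, J, K):
    """Integrate in x first (inner), then in y (outer), as in (4.27)."""
    return midpoint_sum(lambda y: midpoint_sum(lambda x: f(x, y), a, b, J), c, d, K)


def iterated_dy_dx(f, a, b, c, d, J, K):
    """Integrate in y first (inner), then in x (outer), as in (4.28)."""
    return midpoint_sum(lambda x: midpoint_sum(lambda y: f(x, y), c, d, K), a, b, J)


f = lambda x, y: x * y ** 2  # illustrative integrand; its exact integral is 4/3
args = (f, 0.0, 1.0, 0.0, 2.0, 100, 100)
results = [double_sum(*args), iterated_dx_dy(*args), iterated_dy_dx(*args)]
```

For a continuous integrand all three numbers agree closely with the exact value; the hypotheses of the theorem matter only for badly behaved integrands.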

The integrals on the right side of (4.27) and (4.28) are called iterated integrals.
It is customary to omit the brackets in these integrals and to write, for example,
\[ \int_c^d\int_a^b f(x, y)\,dx\,dy, \]
with the understanding that the integration is to be done “from the inside out.”
That is, the innermost integral $\int_a^b$ corresponds to the innermost
differential dx, and the integral with respect to the corresponding variable x
is to be performed first. Some people find it clearer to write the differentials
dx and dy next to the integral signs to which they pertain, thus:
\[ \int_c^d dy \int_a^b dx\, f(x, y). \]

FIGURE 4.4: $\iint \cdots\,dx\,dy$ versus $\iint \cdots\,dy\,dx$.

If our region of integration is not the whole rectangle R but a subset S, the in-
tegration effectively stops at the boundary of S, and the limits of integration should
be adjusted accordingly. For example, if S is bounded above and below by the
graphs of two functions,
\[ S = \{(x, y) : a \le x \le b,\ \varphi(x) \le y \le \psi(x)\}, \tag{4.29} \]
we have
\[ \iint_S f\,dA = \int_a^b\int_{\varphi(x)}^{\psi(x)} f(x, y)\,dy\,dx. \tag{4.30} \]
Here it is essential to integrate first in y, then in x, since the limits ϕ(x) and ψ(x)
furnish part of the x-dependence of the integrand for the outer integral
$\int_a^b \cdots\,dx$.
It is important to observe that if S is a region of the form (4.29) where ϕ and
ψ are of class $C^1$, and f is continuous on S, the hypotheses in Theorem 4.26
that allow integration first in y and then in x are automatically satisfied, so that
(4.30) is valid. Indeed, the integrability of $f\chi_S$ on any rectangle R ⊃ S follows
from Proposition 4.19c and Theorem 4.21, and the integrability of the function
$(f\chi_S)(x, y)$ as a function of y for fixed x is obvious since it is continuous except
at the two points y = ϕ(x) and y = ψ(x).
On the other hand, if S is bounded on the left and right by graphs of functions
of y, we obtain a formula similar to (4.30) with the roles of x and y reversed.
In general, most of the regions S that arise in practice can be decomposed into a
finite number of pieces S1 , . . . , SK , each of which is of the form (4.29) or of the
analogous form with x and y switched. By using the additivity property (Theorem
4.17b), we can reduce the computation of $\iint_S f\,dA$ to the calculation of
iterated integrals on these subregions.
Figure 4.4 may be helpful in interpreting iterated integrals. The sketch on the
left symbolizes $\iint \cdots\,dx\,dy$, in which we integrate first over the
horizontal lines that run from the left side to the right side of the region,
then integrate over the y-interval that comprises the y-coordinates of all
these lines. Similarly, the sketch on the right symbolizes
$\iint \cdots\,dy\,dx$.

FIGURE 4.5: The regions of integration in Example 1 (left) and Example 2 (right).

EXAMPLE 1. Find the volume of the region in $\mathbb{R}^3$ above the triangle T in
the xy-plane with vertices (0, 0), (1, 0), and (1, 2) and below the surface
$z = xy + y^2$. (See Figure 4.5.)
Solution. The volume in question is $\iint_T (xy + y^2)\,dA$, which can be
expressed as an iterated integral in two ways:
\[ \int_0^2\int_{y/2}^1 (xy + y^2)\,dx\,dy \quad\text{or}\quad \int_0^1\int_0^{2x} (xy + y^2)\,dy\,dx. \]
For the sake of illustration, we perform both calculations:
\[ \int_0^2\int_{y/2}^1 (xy + y^2)\,dx\,dy = \int_0^2 \bigl[\tfrac12 x^2 y + xy^2\bigr]_{y/2}^{1}\,dy = \int_0^2 \bigl(\tfrac12 y + y^2 - \tfrac58 y^3\bigr)\,dy, \]
\[ \int_0^1\int_0^{2x} (xy + y^2)\,dy\,dx = \int_0^1 \bigl[\tfrac12 xy^2 + \tfrac13 y^3\bigr]_0^{2x}\,dx = \int_0^1 \tfrac{14}{3} x^3\,dx. \]
Both single integrals on the right evaluate to 7/6.
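
The two iterated integrals of Example 1 can be checked by machine. The following Python sketch uses Simpson's rule with two subintervals, which happens to be exact here because every integrand that occurs is a polynomial of degree at most 3 in the variable being integrated. The helper name `simpson` and this choice of quadrature rule are assumptions of the sketch.

```python
def simpson(g, a, b, n=2):
    """Composite Simpson's rule with n (even) subintervals; exact for cubics."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


f = lambda x, y: x * y + y * y

# Integrate first in x (from y/2 to 1), then in y (from 0 to 2).
dx_dy = simpson(lambda y: simpson(lambda x: f(x, y), y / 2, 1), 0, 2)

# Integrate first in y (from 0 to 2x), then in x (from 0 to 1).
dy_dx = simpson(lambda x: simpson(lambda y: f(x, y), 0, 2 * x), 0, 1)
```

Both values agree with 7/6 up to rounding error. Note how the inner limits depend on the outer variable, exactly as in the two displayed iterated integrals.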


EXAMPLE 2. Let S be the region between the parabolas $x = 4 - y^2$ and
$x = y^2 - 4$. (See Figure 4.5.) A double integral $\iint_S f(x, y)\,dA$ can be
reduced to iterated integrals in two ways. Integrating first in x is more
straightforward:
\[ \int_{-2}^{2}\int_{y^2-4}^{4-y^2} f(x, y)\,dx\,dy. \]
To integrate first in y, we must break up S into its left and right halves:
\[ \int_{-4}^{0}\int_{-\sqrt{4+x}}^{\sqrt{4+x}} f(x, y)\,dy\,dx + \int_{0}^{4}\int_{-\sqrt{4-x}}^{\sqrt{4-x}} f(x, y)\,dy\,dx. \]

The ideas in higher dimensions are entirely similar. The analogue of Theo-
rem 4.26 is that an integral over an n-dimensional rectangular solid with sides
[a1 , b1 ], . . . , [an , bn ] can be evaluated as an n-fold iterated integral,
\[ \int\cdots\int_R f\,dV = \int_{a_n}^{b_n}\cdots\int_{a_1}^{b_1} f(x_1, \dots, x_n)\,dx_1\cdots dx_n, \]

provided that the indicated integrals exist. The meaning of the iterated integral
on the right is that the integration is to be performed first with respect to x1 and
last with respect to $x_n$. However, the same formula remains valid with the n
integrations performed in any order. The only thing that needs some care is
that the integral signs $\int_{a_j}^{b_j}$ must be matched up with the
differentials $dx_j$ in the right order
so as to get the right limits of integration, and the convention is the same as in
the case n = 2: The integrations are to be performed in order from innermost to
outermost.
When the region of integration is something other than a rectangular solid, set-
ting up the right limits of integration can be rather complicated. A typical situation
in 3 dimensions is as follows: The region of integration S is the region in between
two graphs,
\[ S = \{(x, y, z) : (x, y) \in U,\ \varphi(x, y) \le z \le \psi(x, y)\}, \]
based on some region U in the xy-plane. The region U in turn is the region between
two graphs,
\[ U = \{(x, y) : a \le x \le b,\ \sigma(x) \le y \le \tau(x)\}, \]
based on an interval [a, b] ⊂ $\mathbb{R}$. We then have
\[ \iiint_S f\,dV = \int_a^b\int_{\sigma(x)}^{\tau(x)}\int_{\varphi(x,y)}^{\psi(x,y)} f(x, y, z)\,dz\,dy\,dx. \]

The rule to remember is that limits of integration in an iterated integral can
depend on the remaining “outer” variables whose integration is yet to be performed,
but not on the “inner” variables that have already been integrated out. The final
answer should be a number, not a function of some of the variables!

EXAMPLE 3. Find the mass of the tetrahedron T formed by the three coordinate
planes and the plane x + y + 2z = 2 (see Figure 4.6) if the mass density
is given by $\rho(x, y, z) = e^{-z}$.

FIGURE 4.6: The tetrahedron in Example 3, with vertices (0, 0, 0), (2, 0, 0), (0, 2, 0), and (0, 0, 1).

Solution. There are six ways to write the triple integral $\iiint_T e^{-z}\,dV$ as an
iterated integral, although only three of them are essentially different, namely,
\[ \int_0^2\int_0^{2-x}\int_0^{1-(x+y)/2} e^{-z}\,dz\,dy\,dx, \qquad \int_0^1\int_0^{2-2z}\int_0^{2-y-2z} e^{-z}\,dx\,dy\,dz, \]
\[ \int_0^2\int_0^{1-(y/2)}\int_0^{2-y-2z} e^{-z}\,dx\,dz\,dy. \]
(The remaining three can be obtained from these simply by interchanging x
and y, since T and the density function are invariant under this interchange.)
Using the first of these, we obtain
\[ \int_0^2\int_0^{2-x} \bigl(1 - e^{(x+y)/2-1}\bigr)\,dy\,dx = \int_0^2 \bigl[y - 2e^{(x+y)/2-1}\bigr]_0^{2-x}\,dx \]
\[ = \int_0^2 \bigl(2e^{(x/2)-1} - x\bigr)\,dx = \bigl[4e^{(x/2)-1} - \tfrac12 x^2\bigr]_0^2 = 2 - 4e^{-1}. \]
The reader may verify that the other two iterated integrals give the same answer.
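
A numerical cross-check of Example 3 is sketched below in Python: two of the three iterated integrals are evaluated by nested composite Simpson sums and compared with 2 − 4e⁻¹. The helper names and the number of subintervals are assumptions of this sketch; the limits of integration are copied from the displays above.

```python
import math


def simpson(g, a, b, n=20):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


rho = lambda x, y, z: math.exp(-z)


def mass_dz_dy_dx():
    """Innermost integration in z, then y, then x."""
    return simpson(
        lambda x: simpson(
            lambda y: simpson(lambda z: rho(x, y, z), 0, 1 - (x + y) / 2),
            0, 2 - x),
        0, 2)


def mass_dx_dy_dz():
    """Innermost integration in x, then y, then z."""
    return simpson(
        lambda z: simpson(
            lambda y: simpson(lambda x: rho(x, y, z), 0, 2 - y - 2 * z),
            0, 2 - 2 * z),
        0, 1)


expected = 2 - 4 / math.e
```

In each nested call the limits of an inner integral involve only variables bound by the enclosing lambdas, which mirrors the rule that inner limits may depend only on outer variables.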

In the preceding discussion, iterated integrals appeared as a tool for evaluating
n-dimensional integrals. However, they also arise in a number of other contexts
in advanced analysis where a quantity is defined by performing two or more
integrations in succession. In this context, the significance of Theorem 4.26 is that
under suitable hypotheses on the integrand f, the order of integration in an iterated
integral can be reversed:
\[ \int_a^b\int_c^d f(x, y)\,dy\,dx = \int_c^d\int_a^b f(x, y)\,dx\,dy. \tag{4.31} \]

More precisely, (4.31) is valid if f satisfies the conditions in Theorem 4.26 for
both (4.27) and (4.28) to hold. (See Exercise 13 for an example to demonstrate
the significance of these conditions.) The importance of this result can hardly be
overestimated; it is an extremely powerful tool for evaluating quantities defined by
integrals. We shall see a number of examples in later chapters.
EXAMPLE 4. Evaluate $\int_0^2\int_{y/2}^1 y e^{-x^3}\,dx\,dy$.
Solution. The integral cannot be evaluated by elementary methods as it
stands, since $e^{-x^3}$ has no elementary antiderivative. However, it can be
interpreted as $\iint_T y e^{-x^3}\,dA$ where T is the triangle with vertices
(0, 0), (1, 0), and (1, 2) as in Example 1. Writing this double integral as an
iterated integral in the other order leads to an easy calculation:
\[ \int_0^1\int_0^{2x} y e^{-x^3}\,dy\,dx = \int_0^1 \bigl[\tfrac12 y^2\bigr]_0^{2x} e^{-x^3}\,dx = \int_0^1 2x^2 e^{-x^3}\,dx = \bigl[-\tfrac23 e^{-x^3}\bigr]_0^1 = \tfrac23\bigl(1 - e^{-1}\bigr). \]
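
The reversal of order in Example 4 can also be confirmed numerically. In the Python sketch below, the original iterated integral is approximated directly by nested Simpson sums (no antiderivative is needed for that) and compared with the reversed-order integral and with the closed form 2(1 − e⁻¹)/3. The quadrature helper and its step count are assumptions of the sketch.

```python
import math


def simpson(g, a, b, n=40):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


integrand = lambda x, y: y * math.exp(-x ** 3)

# Order as given: x from y/2 to 1 (inner), y from 0 to 2 (outer).
original_order = simpson(lambda y: simpson(lambda x: integrand(x, y), y / 2, 1), 0, 2)

# Reversed order: y from 0 to 2x (inner), x from 0 to 1 (outer).
reversed_order = simpson(lambda x: simpson(lambda y: integrand(x, y), 0, 2 * x), 0, 1)

closed_form = (2 / 3) * (1 - math.exp(-1))
```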

Applications. Double and triple integrals can be used to calculate physical and
geometric quantities in much the same way as single integrals. Here are a few
standard examples:
• If f(x, y) ≥ 0, the integral $\iint_S f\,dA$ can be interpreted as the volume of the
region in $\mathbb{R}^3$ between the graph of f and the xy-plane that lies over the base
region S.

• Suppose that a quantity of some substance (which might be mass, electric
charge, a particular chemical compound, etc.) is distributed throughout a
region U ⊂ $\mathbb{R}^3$. It is frequently useful to think of the distribution
of the substance as being described by a density function ρ; the meaning
of this, in practical terms, is that the amount of substance in a set S ⊂ U
is $\iiint_S \rho(x)\,d^3x$. This idea works also in other dimensions, for example, to
describe distributions of a substance on a planar surface or a line.
(The reader may wish for a more careful discussion of the meaning of ρ.
Informally, ρ(x) represents the ratio of the amount of substance in an
infinitesimal cube centered at x to the volume of that cube. To make this rigorous,
one should interpret ρ(x) as the limit of the ratio of the amount of substance
in a finite cube centered at x to the volume of that cube as the side length of
the cube tends to zero. One can then prove, under suitable hypotheses, that
the amount of substance in any region S is $\iiint_S \rho(x)\,d^3x$. But a complete
analysis of these matters is beyond the scope of this book.)

• Suppose that a massive object with mass density ρ(x) occupies the region
S ⊂ $\mathbb{R}^3$, so that its total mass is $m = \iiint_S \rho(x)\,d^3x$. The center of
gravity (or center of mass) of the object is the point $\bar{x}$ whose coordinates are
$\bar{x}_j = m^{-1}\iiint_S x_j\,\rho(x)\,d^3x$. In the special case where ρ ≡ 1,
$\bar{x}$ is the centroid of the region S, which is the
point whose coordinates are the average values of the coordinate functions
on S. The center of mass, in general, can be interpreted similarly as the point
whose coordinates are the weighted averages of the coordinate functions on
S where the weighting is given by the density ρ.

• Again suppose that a massive object with mass density ρ(x) occupies the
region S ⊂ $\mathbb{R}^3$, and let L be a line in $\mathbb{R}^3$. The moment of inertia of the
body about the line L, a quantity that is useful in analyzing rotational motion
about L, is $\iiint_S d(x)^2\rho(x)\,d^3x$, where d(x) is the distance from x to L. (For
example, if L is the z-axis, then $d(x, y, z)^2 = x^2 + y^2$.)
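
To make the last two formulas concrete, here is a small Python sketch that approximates the mass, the center of mass, and the moment of inertia about the z-axis for a body occupying the unit cube [0, 1]³. The density ρ(x, y, z) = 1 + z, the cube, and the midpoint rule are hypothetical choices made for illustration only.

```python
def triple_midpoint(g, n=40):
    """Midpoint-rule approximation of the integral of g over the unit cube [0, 1]^3."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return sum(g(x, y, z) for x in pts for y in pts for z in pts) * h ** 3


rho = lambda x, y, z: 1.0 + z  # hypothetical density, chosen for illustration

mass = triple_midpoint(rho)

# j-th coordinate of the center of mass: (1/m) * integral of x_j * rho.
center = tuple(
    triple_midpoint(lambda x, y, z, j=j: (x, y, z)[j] * rho(x, y, z)) / mass
    for j in range(3)
)

# Moment of inertia about the z-axis, where d(x, y, z)^2 = x^2 + y^2.
inertia_z = triple_midpoint(lambda x, y, z: (x * x + y * y) * rho(x, y, z))
```

For this density the exact values are m = 3/2, center of mass (1/2, 1/2, 5/9), and moment of inertia 1; the center of mass sits above the centroid (1/2, 1/2, 1/2) because the body is denser near the top.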

EXERCISES

1. Evaluate the following double integrals.
a. $\iint_S (x + 3y^3)\,dA$, S = the upper half (y ≥ 0) of the unit disc $x^2 + y^2 \le 1$.
b. $\iint_S (x^2 - y)\,dA$, S = the region between the parabola $x = y^2$ and the
line x = 2y.
2. Find the volume of the region above the triangle in the xy-plane with vertices
(0, 0), (1, 0), and (0, 1), and below the surface z = 6xy(1 − x − y).
3. For the following regions S ⊂ $\mathbb{R}^2$, express the double integral $\iint_S f\,dA$ in
terms of iterated integrals in two different ways.
a. S = the region in the left half plane between the curve $y = x^3$ and the line
y = 4x.
b. S = the triangle with vertices (0, 0), (2, 2), and (3, 1).
c. S = the region between the parabolas $y = x^2$ and $y = 6 - 4x - x^2$.
4. Express each of the following iterated integrals as a double integral and as an
iterated integral in the opposite order. (That is, find the region of integration
for the double integral and the limits of integration for the other iterated
integral.)
a. $\int_0^1\int_{x^2}^{x^{1/3}} f(x, y)\,dy\,dx$.
b. $\int_0^1\int_{-y}^{2y} f(x, y)\,dx\,dy$.
c. $\int_1^2\int_0^{\log x} f(x, y)\,dy\,dx$.

5. Evaluate the following iterated integrals. (You may need to reverse the order of
integration.)
a. $\int_1^3\int_1^y y e^{2x}\,dx\,dy$.
b. $\int_0^1\int_{\sqrt{x}}^1 \cos(y^3 + 1)\,dy\,dx$.
c. $\int_1^2\int_{1/x}^1 y e^{xy}\,dy\,dx$.
6. Fill in the blanks:
$\int_0^1\int_{2x^2}^{x+1} f(y)\,dy\,dx = \int_0^1 [\ \ ]\,dy + \int_1^2 [\ \ ]\,dy$.
The expressions you obtain for the [ ]’s should not contain integral signs.
7. Given a continuous function g : $\mathbb{R}$ → $\mathbb{R}$, let
$h(x) = \int_0^x\int_0^y g(t)\,dt\,dy$. That is, h is obtained by integrating g
twice, starting the integration at 0. Show that h can be expressed as a single
integral, namely, $h(x) = \int_0^x (x - t)g(t)\,dt$. (Note
that x can be treated as a constant here; y and t are the variables of integration.)

8. Let S ⊂ $\mathbb{R}^3$ be the region between the paraboloid $z = x^2 + y^2$ and the plane
z = 1. Express the triple integral $\iiint_S f\,dV$ as an iterated integral with the order of
integration (a) z, y, x; (b) y, z, x; (c) x, y, z. (That is, find the appropriate limits
of integration in each case.)
9. Express the iterated integral $\int_0^1\int_0^{1-y^2}\int_0^y f(x, y, z)\,dz\,dx\,dy$
a. as a triple integral (i.e., describe the region of integration);
b. as an iterated integral in the order z, y, x;
c. as an iterated integral in the order y, z, x.

10. Find the centroid of the tetrahedron bounded by the coordinate planes and the
plane (x/a) + (y/b) + (z/c) = 1.

11. An object with mass density ρ(x, y, z) = yz occupies the cube {(x, y, z) : 0 ≤
x, y, z ≤ 2}. Find its mass and center of mass.

12. A body with charge density ρ(x, y, z) = 2z occupies the region bounded below
by the parabolic cylinder $z = x^2 - 3$, above by the plane z = x − 1, and on the
sides by the planes y = 0 and y = 2. Find its net charge (total positive charge
minus total negative charge).

13. Let $f(x, y) = y^{-2}$ if 0 < x < y < 1, $f(x, y) = -x^{-2}$ if 0 < y < x < 1, and
f(x, y) = 0 otherwise, and let S be the unit square [0, 1] × [0, 1].
a. Show that f is not integrable on S, but that f(x, y) is integrable on [0, 1]
as a function of x for each fixed y and as a function of y for each fixed x.
b. Show by explicit calculation that the iterated integrals
$\int_0^1\int_0^1 f(x, y)\,dx\,dy$ and $\int_0^1\int_0^1 f(x, y)\,dy\,dx$ both exist
and are unequal.

4.4 Change of Variables for Multiple Integrals


To motivate the ideas in this section, we recall the change-of-variable formula for
single definite integrals: If g is a one-to-one function of class $C^1$ on the interval
[a, b], then for any continuous function f,
\[ \int_a^b f(g(u))\,g'(u)\,du = \int_{g(a)}^{g(b)} f(x)\,dx. \tag{4.32} \]
The proof is a simple matter of combining the chain rule and the fundamental
theorem of calculus. Indeed, if F is an antiderivative of f, the right side of (4.32) is
F(g(b)) − F(g(a)), which in turn equals $\int_a^b (F\circ g)'(u)\,du$, and the latter
integrand is f(g(u))g′(u). (Formula (4.32) is actually valid when f is merely
integrable, but we shall not worry about this refinement here.)
There is one slightly tricky point here, which we point out now because it will
be significant later. If g is an increasing function, (4.32) is fine as it stands, but
if g is decreasing, the endpoints on the integral on the right are in the “wrong”
order, and we might prefer to put them back in the “right” order by introducing a
minus sign: $\int_{g(a)}^{g(b)} = -\int_{g(b)}^{g(a)}$. Since g is increasing or
decreasing according as g′ is positive or negative, we could rewrite (4.32) as
\[ \int_{[a,b]} f(g(u))\,|g'(u)|\,du = \int_{g([a,b])} f(x)\,dx. \tag{4.33} \]
Here g([a, b]) is the interval to which [a, b] is mapped under g, and for any interval
I the symbol $\int_I$ means the integral from the left endpoint of I to the right endpoint.
The replacement of g′ by |g′| compensates for the extra minus sign that comes from
adjusting the order of the endpoints when g is decreasing.
In practice it is often more convenient to have all the g’s on one side of the
equation. If we set I = g([a, b]), we have $[a, b] = g^{-1}(I)$, and (4.33) becomes
\[ \int_I f(x)\,dx = \int_{g^{-1}(I)} f(g(u))\,|g'(u)|\,du. \tag{4.34} \]
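
The role of the absolute value in (4.34) is easy to see in a numerical experiment. The Python sketch below takes the decreasing map g(u) = e⁻ᵘ on [0, 1], so that I = g([0, 1]) = [e⁻¹, 1], together with f(x) = 1/x; these choices, and the midpoint-rule helper, are assumptions made purely for illustration.

```python
import math


def midpoint(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func from a to b."""
    step = (b - a) / n
    return sum(func(a + (i + 0.5) * step) for i in range(n)) * step


g = lambda u: math.exp(-u)    # decreasing on [0, 1]
dg = lambda u: -math.exp(-u)  # g'(u) < 0
f = lambda x: 1.0 / x

# Left side of (4.34): integral of f over I, from its left endpoint to its right one.
lhs = midpoint(f, g(1.0), g(0.0))

# Right side of (4.34): uses |g'(u)|.
rhs = midpoint(lambda u: f(g(u)) * abs(dg(u)), 0.0, 1.0)

# Left side of (4.32): uses g'(u), so it equals the integral from g(0) to g(1).
signed = midpoint(lambda u: f(g(u)) * dg(u), 0.0, 1.0)
```

Here `lhs` and `rhs` are both close to 1, while `signed` is close to −1: without the absolute value one obtains the integral over I with its endpoints in the "wrong" order.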

Our object is to find the analogous formula for multiple integrals. It is natural
to use (4.34) rather than (4.32) as a starting point, since for multiple integrals the
issue of left-to-right or right-to-left disappears and we just speak of integrals over
a region, like the integrals over intervals that appear in (4.34). More precisely,
suppose G is a one-to-one transformation from a region R in $\mathbb{R}^n$ to another region
S = G(R) in $\mathbb{R}^n$; then $R = G^{-1}(S)$, and the formula we are seeking should look
something like this:
\[ \int\cdots\int_S f(x)\,d^n x = \int\cdots\int_{G^{-1}(S)} f(G(u))\,[????]\,d^n u. \tag{4.35} \]
The missing ingredient is the quantity that will play the role of |g′(u)| in the formula
(4.34).

FIGURE 4.7: The element of area in polar coordinates.

Now, the g′(u) in (4.32) or (4.34) is the factor that relates the differentials du
and dx under the transformation x = g(u). In n variables, the n-fold differential
$d^n x = dx_1\cdots dx_n$ represents the “element of volume,” that is, the volume of an
infinitesimal piece of n-space. So the question is: How does the volume of a tiny
piece of n-space change when one applies the transformation G?
To get a feeling for what is going on, let us look at the polar coordinate map

(x, y) = G(r, θ) = (r cos θ, r sin θ).

A small rectangle in the rθ-plane with lower left corner at (r, θ) and sides of length
dr and dθ is mapped to a small region in the xy-plane bounded by two line seg-
ments of length dr and two circular arcs of length r dθ and (r + dr) dθ. When dr
and dθ are very small, this is essentially a rectangle with sides dr and r dθ, so its
area is r dr dθ. In short, a small bit of the rθ-plane with area dr dθ is mapped to a
small bit of the xy-plane with area r dr dθ; see Figure 4.7. Hence, in this case the
missing factor in (4.35) is simply r, and (4.35) becomes
\[ \iint_S f(x, y)\,dx\,dy = \iint_R f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta. \tag{4.36} \]

Here S is a region in the xy-plane and R = G−1 (S) is the corresponding region in
the rθ-plane. Our argument here has been very informal, but this result is correct,
and it gives the formula for computing double integrals in polar coordinates.
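
Formula (4.36) can be tested on a simple case. In the Python sketch below, S is the unit disc and f(x, y) = x² + y², so that R is the rectangle 0 ≤ r ≤ 1, 0 ≤ θ ≤ 2π and both sides should equal π/2. The disc, the integrand, the grid sizes, and the function names are assumptions of this example.

```python
import math


def disc_integral_cartesian(f, n=400):
    """Midpoint sum over [-1, 1]^2, keeping only sample points inside the unit disc."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y <= 1.0:
                total += f(x, y)
    return total * h * h


def disc_integral_polar(f, n=400):
    """Midpoint sum over 0 <= r <= 1, 0 <= theta <= 2*pi, with the factor r from (4.36)."""
    dr, dt = 1.0 / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            t = (j + 0.5) * dt
            total += f(r * math.cos(t), r * math.sin(t)) * r
    return total * dr * dt


f = lambda x, y: x * x + y * y
cartesian = disc_integral_cartesian(f)
polar = disc_integral_polar(f)
```

The polar sum is noticeably more accurate for the same number of sample points, because the region of integration becomes a rectangle and no cells straddle the boundary.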
The case of a linear mapping of the plane is also easy to analyze. Given a
matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with det A = ad − bc ≠ 0,
let x = G(u) = Au, that is,
(x, y) = G(u, v) = (au + bv, cu + dv). The transformation G takes the unit
vectors (1, 0) and (0, 1) to the vectors (a, c) and (b, d), so it maps the standard
coordinate grid to a grid of parallelograms with sides parallel to these vectors. In
particular, it maps the square [0, 1] × [0, 1] to the parallelogram with vertices at
(0, 0), (a, c), (b, d), and (a + b, c + d), as indicated in Figure 4.8. The area of that
parallelogram is |ad − bc|, that is, |det A|. (To see this, think of the plane as sitting
in $\mathbb{R}^3$ and recall the geometric interpretation of the cross product: The area of the
parallelogram is
\[ |(a\mathbf{i} + c\mathbf{j}) \times (b\mathbf{i} + d\mathbf{j})| = |(ad - bc)\mathbf{k}| = |ad - bc|.) \]

FIGURE 4.8: The linear map (x, y) = (au + bv, cu + dv).

Since the map G is linear, it commutes with translations and dilations, so if R is
any square in the uv-plane, its image under G is a parallelogram in the xy-plane
whose area is |det A| times that of R. It follows that the missing factor in (4.35)
should be simply |det A|, so that for linear maps of the plane, (4.35) becomes
\[ \iint_S f(x, y)\,dx\,dy = |ad - bc| \iint_{G^{-1}(S)} f(au + bv, cu + dv)\,du\,dv. \]
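
The area claim for linear maps of the plane can be checked directly. The Python sketch below maps the corners of the unit square by G(u, v) = (au + bv, cu + dv) and computes the area of the image parallelogram with the shoelace formula, a technique not used in the text and brought in here only as an independent check. The two sample matrices are arbitrary.

```python
def apply(A, p):
    """Apply the linear map with matrix A = ((a, b), (c, d)) to the point p = (u, v)."""
    (a, b), (c, d) = A
    u, v = p
    return (a * u + b * v, c * u + d * v)


def shoelace_area(pts):
    """Area of a polygon whose vertices are listed in order around the boundary."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


def det2(A):
    (a, b), (c, d) = A
    return a * d - b * c


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]

A_pos = ((2, 1), (1, 3))  # det = 5
A_neg = ((1, 2), (3, 1))  # det = -5 (orientation-reversing)

area_pos = shoelace_area([apply(A_pos, p) for p in unit_square])
area_neg = shoelace_area([apply(A_neg, p) for p in unit_square])
```

Both areas equal 5, the absolute value of the determinant; the sign of det A records only whether G preserves or reverses orientation.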

The situation is similar for linear mappings of 3-space. Namely, let x =
G(u) = Au where A is an invertible 3 × 3 matrix. If i, j, and k are the
standard basis vectors for $\mathbb{R}^3$, we have Ai = a, Aj = b, and Ak = c where a, b, c are
the columns of A, so A maps the unit cube to the parallelepiped generated by these
vectors. To find the volume of that parallelepiped, think of the face lying in the
bc-plane (the plane spanned by b and c) as its base.
Then the area of the base is |b × c|, and the height is the length of the projection of
a onto a line perpendicular to the bc-plane, namely, the line generated by b × c.
But this length is |a| |cos θ| where θ is the angle between a and b × c (we need the
absolute value because θ might be obtuse). Hence,
\[ \text{Volume} = |b \times c|\,|a|\,|\cos\theta| = |a \cdot (b \times c)|, \]
which is nothing but |det A| (Exercise 8 in §1.1). As before, we conclude that for
the linear map G(u) = Au of $\mathbb{R}^3$, the missing factor in (4.35) should be |det A|.

It is now reasonable to conjecture that the same result should hold for linear
mappings of Rn for any n. We proceed to show that this is correct.
4.37 Theorem. Let A be an invertible n × n matrix, and let G(u) = Au be the
corresponding linear transformation of $\mathbb{R}^n$. Suppose S is a measurable region in
$\mathbb{R}^n$ and f is an integrable function on S. Then $G^{-1}(S) = \{A^{-1}x : x \in S\}$ is
measurable and f ◦ G is integrable on $G^{-1}(S)$, and
\[ \int\cdots\int_S f(x)\,d^n x = |\det A| \int\cdots\int_{G^{-1}(S)} f(Au)\,d^n u. \tag{4.38} \]

Proof. The proof of the measurability of $G^{-1}(S)$ and the integrability of f ◦ G,
which is not profound but rather tedious, is given in Appendix B.5 (Corollaries
B.16 and B.17). (Actually, what is proved in Appendix B.5 is that if f is continuous
except on a set of zero content, a slightly stronger condition than integrability, then
the same is true of f ◦ G.) Here we concentrate on proving (4.38). The proof
naturally requires some linear algebra, in particular, the facts about elementary row
operations and determinants in (A.17)–(A.18), (A.28), and (A.30) of Appendix A.
Step 1: Let us agree to (re)define f(x) to be 0 for x ∉ S. Then f(Au) = 0
for u ∉ $G^{-1}(S)$, and we can replace the regions S and $G^{-1}(S)$ in (4.38) by $\mathbb{R}^n$.

This makes the integrals in (4.38) look improper, but they really are not, since the
integrands vanish outside bounded sets. The point is that now we don’t have to
worry about what the limits of integration in each variable are; we can take them to
be ±∞.
Step 2: We prove the theorem when G is an “elementary transformation,” that
is, the transformation given by performing a single elementary row operation on
the column vector u. There are three kinds of elementary transformations,
corresponding to the three types of row operations (see (A.17)–(A.18)):
1. Multiply the kth component by a nonzero number c, leaving all the other
components alone:
\[ G_1(u_1, \dots, u_k, \dots, u_n) = (u_1, \dots, cu_k, \dots, u_n). \]
2. Add a multiple of the jth component to the kth component, leaving all the
other components alone:
\[ G_2(u_1, \dots, u_k, \dots, u_n) = (u_1, \dots, u_k + cu_j, \dots, u_n). \]
3. Interchange the jth and kth components:
\[ G_3(u_1, \dots, u_j, \dots, u_k, \dots, u_n) = (u_1, \dots, u_k, \dots, u_j, \dots, u_n). \]

The corresponding matrices $A_1$, $A_2$, $A_3$ are obtained by performing the same row
operations on the identity matrix. Since det I = 1, the rules that tell how row
operations affect determinants (see (A.30)) give
\[ \det A_1 = c, \qquad \det A_2 = 1, \qquad \det A_3 = -1. \tag{4.39} \]

It is easy to verify that (4.38) holds for these three types of transformations.
The first two involve a change in only the kth variable, so we can integrate first
with respect to that variable and use (4.34) (or, rather, the simple special cases of
(4.34) discussed in Exercise 8 of §4.1). Thus, for $G_1$ we set $x_k = cu_k$ and obtain
\[ \int_{-\infty}^{\infty} f(\dots, x_k, \dots)\,dx_k = \int_{-c^{-1}\infty}^{c^{-1}\infty} f(\dots, cu_k, \dots)\,c\,du_k = |c| \int_{-\infty}^{\infty} f(\dots, cu_k, \dots)\,du_k. \]
(The endpoints have to be switched if c < 0, which accounts for replacing c by |c|,
as in the discussion preceding (4.34).) Likewise, for $G_2$ we set $x_k = u_k + cu_j$ and
obtain
\[ \int_{-\infty}^{\infty} f(\dots, x_k, \dots)\,dx_k = \int_{-\infty}^{\infty} f(\dots, u_k + cu_j, \dots)\,du_k. \]

($u_j$ is a constant as far as this calculation is concerned.) Now an integration with
respect to the remaining variables (for which $x_i = u_i$) yields
\[ \int\cdots\int f(x)\,d^n x = |c| \int\cdots\int f(G_1(u))\,d^n u = \int\cdots\int f(G_2(u))\,d^n u. \]

In view of (4.39), this establishes (4.38) for $G_1$ and $G_2$. As for $G_3$, we have
\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(\dots, u_j, \dots, u_k, \dots)\,du_j\,du_k = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(\dots, u_k, \dots, u_j, \dots)\,du_j\,du_k, \]
simply because the variables $u_j$ and $u_k$ are dummy variables here. That is, we are
integrating f with respect to its jth and kth variables, and it doesn’t matter what
we call them. Now an integration with respect to the remaining variables, together
with (4.39), gives (4.38) for $G_3$.
Step 3: We next verify that if (4.38) is valid for the linear maps G(u) = Au
and H(u) = Bu, then it is also valid for the composition (G ◦ H)(u) = ABu.
Indeed, if we set v = Bu and x = Av, we have
\[ \int\cdots\int_S f(x)\,d^n x = |\det A| \int\cdots\int_{G^{-1}(S)} f(Av)\,d^n v = |\det A|\,|\det B| \int\cdots\int_{H^{-1}(G^{-1}(S))} f(ABu)\,d^n u. \]
But (det A)(det B) = det(AB) and $H^{-1}(G^{-1}(S)) = (G \circ H)^{-1}(S)$, so the
integral on the right equals
\[ |\det(AB)| \int\cdots\int_{(G\circ H)^{-1}(S)} f(ABu)\,d^n u, \]

as claimed.
The Final Step: From Step 3, it follows easily by induction that if (4.38) is valid
for G1 , . . . , Gk , then it is also valid for the composition G1 ◦· · ·◦Gk . Thus, in view
of Step 2, to complete the proof we have merely to observe that every invertible
linear transformation of Rn is a composition of elementary transformations. This
is equivalent to the fact that every invertible matrix A can be row-reduced to the
identity matrix; see (A.52) (in particular, the equivalence of (a) and (i)) and (A.53)
in Appendix A.
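
The structure of this proof, reduction to elementary transformations whose determinants multiply, can be mirrored in a short computation. The Python sketch below row-reduces an invertible matrix to the identity while recording the determinant of each elementary operation, in the sense of (4.39), and recovers det A from the product. The function name and the particular elimination order are assumptions of the sketch; exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def det_by_row_reduction(A):
    """Row-reduce A to the identity, tracking the determinants of the operations used."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    det_of_ops = Fraction(1)  # product of the determinants of the elementary operations
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        if pivot != col:  # type 3: interchange two rows, det = -1
            M[col], M[pivot] = M[pivot], M[col]
            det_of_ops *= -1
        c = 1 / M[col][col]  # type 1: multiply a row by c, det = c
        M[col] = [c * x for x in M[col]]
        det_of_ops *= c
        for r in range(n):  # type 2: add a multiple of one row to another, det = 1
            if r != col and M[r][col] != 0:
                m = M[r][col]
                M[r] = [x - m * y for x, y in zip(M[r], M[col])]
    assert M == [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    # E_k ... E_1 A = I, so det A = 1 / (det E_k ... det E_1).
    return 1 / det_of_ops


d3 = det_by_row_reduction([[0, 2, 1], [1, 1, 0], [3, 0, 2]])
d2 = det_by_row_reduction([[2, 1], [1, 3]])
```

Since the operations reduce A to I, the matrix A is the product of the inverse elementary matrices, and |det A| is the product of the volume-scaling factors of those elementary transformations.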

There is one more simple class of transformations for which the change-of-
variable formula is easily established, namely the translations. These are the map-
pings of the form G(u) = u + b where b is a fixed vector. Indeed, we just make
the substitution $x_j = u_j + b_j$, $dx_j = du_j$ in each variable separately to conclude
that
\[ \int\cdots\int_S f(x)\,d^n x = \int\cdots\int_{S-b} f(u + b)\,d^n u. \]

Combining this with Theorem 4.37, we see that if G(u) = Au + b, then
\[ \int\cdots\int_S f(x)\,d^n x = |\det A| \int\cdots\int_{G^{-1}(S)} f(Au + b)\,d^n u. \tag{4.40} \]
In particular, by taking f ≡ 1, we see that the n-dimensional volume of S is |det A|
times the n-dimensional volume of $G^{-1}(S)$.
It is now easy to guess what the change-of-variable formula for a general
invertible $C^1$ transformation must be. Indeed, suppose that U and V are open sets in
$\mathbb{R}^n$, G : U → V is a one-to-one transformation of class $C^1$ whose derivative DG(u) is
invertible for all u ∈ U, and f is a continuous function on V. To relate the integral
of f over a measurable set S ⊂ V to an integral of f ◦ G over $T = G^{-1}(S)$,
we think of the former as a sum of infinitesimal terms $f(x)\,d^n x$, each of which is
the value of f at a point x multiplied by the volume $d^n x$ of an infinitesimal region
dS located at x. Under the transformation x = G(u), f(x) becomes f(G(u)),
and the region dS is the image under G of another infinitesimal region dT whose
volume is $d^n u$. But on the infinitesimal level, the differentiable map G is the same
as its linearization:
\[ G(u + du) = x + DG(u)\cdot du. \]
Therefore, by (4.40), the elements of volume $d^n x$ and $d^n u$ are related by the
formula $d^n x = |\det DG(u)|\,d^n u$. Putting this all together, we arrive at the main
theorem.
4.41 Theorem. Given open sets U and V in $\mathbb{R}^n$, let G : U → V be a one-to-one
transformation of class $C^1$ whose derivative DG(u) is invertible for all u ∈ U.
Suppose that T ⊂ U and S ⊂ V are measurable sets such that $\overline{T}$ ⊂ U and
G(T) = S. If f is an integrable function on S, then f ◦ G is integrable on T, and
\[ \int\cdots\int_S f(x)\,d^n x = \int\cdots\int_T f(G(u))\,|\det DG(u)|\,d^n u. \tag{4.42} \]
Proof. We present a proof in Appendix B.5 (Theorem B.24), under the slightly
stronger hypothesis that f is continuous except on a set with zero content. The
key idea is explained in the preceding paragraph, but turning it into a solid proof
is a surprisingly laborious task. An interesting and quite different approach to the
problem can be found in Lax [14], [15]. It shifts the hard work to a different part
of the argument; in particular, it uses the notion of partition of unity developed in
Appendix B.7.

Notice that the results derived earlier in this section are indeed special cases of
Theorem 4.41. If G is a linear map, G(u) = Au, then DG(u) = A for all u,
so | det DG(u)| = | det A| is a constant that can be brought outside the integral
sign. And if G is the polar coordinate map, G(r, θ) = (r cos θ, r sin θ), then
det DG(r, θ) = r, so we recover (4.36).
Let us record the corresponding results for the standard “polar” coordinate systems in $\mathbb{R}^3$, shown in Figure 4.9. Cylindrical coordinates are just polar coordinates in the $xy$-plane with the $z$-coordinate added in,
\[ G_{\mathrm{cyl}}(r, \theta, z) = (r\cos\theta,\ r\sin\theta,\ z). \]
It is easily verified that $\det DG_{\mathrm{cyl}}(r, \theta, z) = r$ again, so the formula for integration in cylindrical coordinates is
\[ \iiint_S f(x, y, z)\,dx\,dy\,dz = \iiint_{G_{\mathrm{cyl}}^{-1}(S)} f(r\cos\theta,\ r\sin\theta,\ z)\,r\,dr\,d\theta\,dz. \tag{4.43} \]
[FIGURE 4.9: Cylindrical coordinates (left) and spherical coordinates (right).]

Spherical coordinates are given by
\[ G_{\mathrm{sph}}(r, \varphi, \theta) = (r\sin\varphi\cos\theta,\ r\sin\varphi\sin\theta,\ r\cos\varphi). \]
Here $r$ is the distance from the origin, $\theta$ is the longitude, and $\varphi$ is the co-latitude (the angle from the positive $z$-axis). The reader may check that $\det DG_{\mathrm{sph}}(r, \varphi, \theta) = r^2\sin\varphi$ (Exercise 6c, §3.4), so the formula for integration in spherical coordinates is
\[ \iiint_S f(x, y, z)\,dx\,dy\,dz = \iiint_{G_{\mathrm{sph}}^{-1}(S)} f(r\sin\varphi\cos\theta,\ r\sin\varphi\sin\theta,\ r\cos\varphi)\,r^2\sin\varphi\,dr\,d\varphi\,d\theta. \tag{4.44} \]
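The two Jacobian determinants used in (4.43) and (4.44) can be spot-checked numerically. The sketch below is only an illustration; the function names, the sample points, and the step size are assumptions. It approximates the $3 \times 3$ Jacobian matrix by central differences and compares its determinant with $r$ and with $r^2 \sin\varphi$.

```python
import math

def g_cyl(r, theta, z):
    """Cylindrical coordinate map (r, theta, z) -> (x, y, z)."""
    return (r * math.cos(theta), r * math.sin(theta), z)

def g_sph(r, phi, theta):
    """Spherical coordinate map (r, phi, theta) -> (x, y, z)."""
    return (r * math.sin(phi) * math.cos(theta),
            r * math.sin(phi) * math.sin(theta),
            r * math.cos(phi))

def jacobian_det(g, p, h=1e-6):
    """Determinant of the Jacobian of g at p, using central differences."""
    cols = []
    for k in range(3):
        plus, minus = list(p), list(p)
        plus[k] += h
        minus[k] -= h
        gp, gm = g(*plus), g(*minus)
        cols.append([(gp[i] - gm[i]) / (2 * h) for i in range(3)])
    # J[i][k] is the partial derivative of the i-th output in the k-th input.
    J = [[cols[k][i] for k in range(3)] for i in range(3)]
    return (J[0][0] * (J[1][1] * J[2][2] - J[1][2] * J[2][1])
            - J[0][1] * (J[1][0] * J[2][2] - J[1][2] * J[2][0])
            + J[0][2] * (J[1][0] * J[2][1] - J[1][1] * J[2][0]))

print(jacobian_det(g_cyl, (1.5, 0.4, -2.0)), 1.5)
print(jacobian_det(g_sph, (2.0, 0.7, 1.3)), 4.0 * math.sin(0.7))
```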

We conclude with some examples.


EXAMPLE 1. Find the volume and the centroid of the region $S$ above the surface $z = x^2 + y^2$ and below the plane $z = 4$. (See Figure 4.10.)
Solution. Because of the circular symmetry, it is most convenient to use polar coordinates. The projection of $S$ onto the $xy$-plane is the disc of radius 2 about the origin, so the volume of $S$ is
\[ V = \int_0^2\!\int_0^{2\pi} (4 - r^2)\,r\,d\theta\,dr = 2\pi\bigl[2r^2 - \tfrac14 r^4\bigr]_0^2 = 8\pi. \]
By symmetry, the centroid lies on the $z$-axis, and its $z$-coordinate is
\[ \bar z = \frac1V \iiint_S z\,dV = \frac{1}{8\pi}\int_0^2\!\int_{r^2}^4\!\int_0^{2\pi} rz\,d\theta\,dz\,dr = \frac14\int_0^2 \bigl[\tfrac12 r z^2\bigr]_{z=r^2}^{z=4}\,dr = \frac14\bigl[4r^2 - \tfrac1{12} r^6\bigr]_0^2 = \frac83. \]
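The values $V = 8\pi$ and $\bar z = 8/3$ can be confirmed with a crude numerical integration. The following sketch is an illustration, not part of the solution; the helper `midpoint` and the grid sizes are assumptions. It applies the composite midpoint rule in the variables $r$ and $z$, with the $\theta$-integration contributing the factor $2\pi$.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint rule for the integral of f over [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

# Volume: integrate (4 - r^2) r over 0 <= r <= 2, times 2*pi from theta.
volume = 2 * math.pi * midpoint(lambda r: (4 - r**2) * r, 0.0, 2.0, 2000)

def z_moment_at(r):
    """Integral of r*z over r^2 <= z <= 4 for a fixed r."""
    return midpoint(lambda z: r * z, r**2, 4.0, 50)

z_bar = 2 * math.pi * midpoint(z_moment_at, 0.0, 2.0, 2000) / volume

print(volume, 8 * math.pi)
print(z_bar, 8 / 3)
```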
[FIGURE 4.10: The regions in Example 1 (left) and Example 2 (right).]

EXAMPLE 2. Find the volume of the “ice cream cone” $T$ bounded below by the cone $z = 2\sqrt{x^2 + y^2}$ and above by the sphere $x^2 + y^2 + z^2 = 1$. (See Figure 4.10.)
Solution. In spherical coordinates $(r, \varphi, \theta)$, the equation of the cone is $\tan\varphi = \tfrac12$ and the equation of the sphere is $r = 1$. Hence the volume is
\[ \int_0^1\!\int_0^{\tan^{-1}(1/2)}\!\int_0^{2\pi} r^2\sin\varphi\,d\theta\,d\varphi\,dr = (2\pi)\bigl[-\cos\varphi\bigr]_0^{\tan^{-1}(1/2)}\bigl[\tfrac13 r^3\bigr]_0^1 = \frac{2\pi}{3}\left(1 - \frac{2}{\sqrt5}\right). \]
This can also be done in cylindrical coordinates $(r, \theta, z)$ (note that the meaning of $r$ has changed here), in which the equation of the cone is $z = 2r$ and the equation of the sphere is $r^2 + z^2 = 1$. The projection of $T$ onto the $xy$-plane is the disc $r \le 1/\sqrt5$, so the volume is
\[ \int_0^{1/\sqrt5}\!\int_{2r}^{\sqrt{1-r^2}}\!\int_0^{2\pi} r\,d\theta\,dz\,dr = 2\pi\int_0^{1/\sqrt5} \bigl(r\sqrt{1-r^2} - 2r^2\bigr)\,dr = \frac{2\pi}{3}\Bigl[-(1-r^2)^{3/2} - 2r^3\Bigr]_0^{1/\sqrt5}, \]
which yields the same answer as before.
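That the two computations agree can also be seen numerically. In the sketch below (an illustration only; the helper `midpoint` and the number of subintervals are assumptions), the spherical-coordinate answer is evaluated in closed form and the cylindrical-coordinate integral is approximated by the midpoint rule.

```python
import math

def midpoint(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

# Closed form obtained from spherical coordinates.
closed_form = (2 * math.pi / 3) * (1 - 2 / math.sqrt(5))

# Cylindrical coordinates: 2*pi times the integral of r*sqrt(1-r^2) - 2 r^2.
cylindrical = 2 * math.pi * midpoint(
    lambda r: r * math.sqrt(1 - r * r) - 2 * r * r, 0.0, 1 / math.sqrt(5))

print(closed_form, cylindrical)
```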
[FIGURE 4.11: The regions in Example 3 (left) and Example 4 (right).]

EXAMPLE 3. Let $P$ be the parallelogram bounded by the lines $x - y = 0$, $x + 2y = 0$, $x - y = 1$, and $x + 2y = 6$. (See Figure 4.11.) Compute $\iint_P xy\,dA$.
Solution. The equations of the bounding lines suggest the linear transformation $u = x - y$, $v = x + 2y$, which maps $P$ to the rectangle $0 \le u \le 1$, $0 \le v \le 6$. In the notation of Theorem 4.37, $P$ plays the role of $S$ and this transformation is $G^{-1}$; its inverse $G$ is easily computed to be $x = \tfrac13(2u + v)$, $y = \tfrac13(v - u)$, whose determinant is $\tfrac13$. Thus, by Theorem 4.37,
\[ \iint_P xy\,dA = \frac13\int_0^1\!\int_0^6 \left(\frac{2u+v}{3}\right)\left(\frac{v-u}{3}\right) dv\,du, \]
which is easily computed to be $\tfrac{77}{27}$.


Alternatively, one can readily calculate that the vertices of $P$ are $(0, 0)$, $(\tfrac83, \tfrac53)$, $(\tfrac23, -\tfrac13)$, and $(2, 2)$. It follows that $P$ is the image of the unit square $0 \le s, t \le 1$ under the transformation
\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \tfrac23 & 2 \\ -\tfrac13 & 2 \end{pmatrix}\begin{pmatrix} s \\ t \end{pmatrix}, \]
where the columns of the $2 \times 2$ matrix are the vectors from the origin to the two adjacent vertices. Taking this transformation as $G$ in Theorem 4.37 yields
\[ \iint_P xy\,dA = 2\int_0^1\!\int_0^1 \bigl(\tfrac23 s + 2t\bigr)\bigl(-\tfrac13 s + 2t\bigr)\,dt\,ds. \]
This integral is essentially the same as the preceding one; the variables $(s, t)$ and $(u, v)$ are related by $u = s$, $v = 6t$.
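Both parametrizations of $P$ can be tested against the value $\tfrac{77}{27}$. The following sketch is illustrative; the helper `midpoint2` and the grid size are assumptions. It approximates each of the two iterated integrals by a two-dimensional midpoint rule.

```python
def midpoint2(f, a, b, c, d, n=400):
    """Midpoint rule for the integral of f(s, t) over a<=s<=b, c<=t<=d."""
    hs = (b - a) / n
    ht = (d - c) / n
    total = 0.0
    for i in range(n):
        s = a + (i + 0.5) * hs
        for j in range(n):
            t = c + (j + 0.5) * ht
            total += f(s, t)
    return total * hs * ht

# First approach: x = (2u + v)/3, y = (v - u)/3, with |det| = 1/3.
first = midpoint2(
    lambda u, v: ((2 * u + v) / 3) * ((v - u) / 3) / 3, 0.0, 1.0, 0.0, 6.0)

# Second approach: x = (2/3)s + 2t, y = -(1/3)s + 2t, with |det| = 2.
second = midpoint2(
    lambda s, t: 2 * (2 * s / 3 + 2 * t) * (-s / 3 + 2 * t), 0.0, 1.0, 0.0, 1.0)

print(first, second, 77 / 27)
```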

EXAMPLE 4. Let $R$ be the region in the first quadrant of the $xy$-plane bounded by the $x$-axis and the parabolas $x = 1 - \tfrac14 y^2$, $x = \tfrac14 y^2 - 1$, and $x = 4 - \tfrac1{16} y^2$. (See Figure 4.11.) What is $\iint_R xy\,dx\,dy$?
Solution. Refer back to Example 3 in §3.4: The region $R$ is the image of the rectangle $\{(u, v) : 1 \le u \le 2,\ 0 \le v \le 1\}$ under the map $G(u, v) = (u^2 - v^2, 2uv)$. We have $DG(u, v) = \begin{pmatrix} 2u & -2v \\ 2v & 2u \end{pmatrix}$ and hence $\det DG(u, v) = 4(u^2 + v^2)$. Thus, the substitutions $x = u^2 - v^2$, $y = 2uv$ give
\[ \iint_R xy\,dx\,dy = \int_0^1\!\int_1^2 (u^2 - v^2)(2uv)\,4(u^2 + v^2)\,du\,dv = \int_0^1\!\int_1^2 8(u^5 v - u v^5)\,du\,dv \]
\[ = \int_0^1 \bigl(\tfrac43 u^6 v - 4u^2 v^5\bigr)\Big|_{u=1}^{u=2}\,dv = \int_0^1 (84v - 12v^5)\,dv = \bigl(42v^2 - 2v^6\bigr)\Big|_0^1 = 40. \]
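As a numerical cross-check of the value 40, one can integrate the transformed integrand over the rectangle $1 \le u \le 2$, $0 \le v \le 1$. The sketch below is an illustration under stated assumptions: the helper `midpoint2`, the function name `integrand`, and the grid size are choices made for the example.

```python
def midpoint2(f, a, b, c, d, n=400):
    """Midpoint rule for the integral of f(u, v) over a<=u<=b, c<=v<=d."""
    hu = (b - a) / n
    hv = (d - c) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * hu
        for j in range(n):
            v = c + (j + 0.5) * hv
            total += f(u, v)
    return total * hu * hv

def integrand(u, v):
    """x*y times |det DG| after substituting x = u^2 - v^2, y = 2uv."""
    x = u * u - v * v
    y = 2 * u * v
    jacobian = 4 * (u * u + v * v)
    return x * y * jacobian

value = midpoint2(integrand, 1.0, 2.0, 0.0, 1.0)
print(value)
```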

EXERCISES
1. Find the area of the region inside the cardioid $r = 1 + \cos\theta$ (polar coordinates).
2. Find the centroid of the half-cone $\sqrt{x^2 + y^2} \le z \le 1$, $x \ge 0$.
3. Find the volume of the region inside both the sphere $x^2 + y^2 + z^2 = 4$ and the cylinder $x^2 + y^2 = 1$.
4. Find the volume of the region above the $xy$-plane, below the cone $z = 2 - \sqrt{x^2 + y^2}$, and inside the cylinder $(x - 1)^2 + y^2 = 1$.
5. Find the mass of a right circular cylinder of base radius $R$ and height $h$ if the mass density is $c$ times the distance from the bottom of the cylinder.
6. Find the volume of the portion of the sphere $x^2 + y^2 + z^2 = 4$ lying above the plane $z = 1$.
7. Find the mass of a ball of radius $R$ if the mass density is $c$ times the distance from the boundary of the ball.
8. Find the centroid of the portion of the ball $x^2 + y^2 + z^2 \le 1$ lying in the first octant ($x, y, z \ge 0$).
9. Find the centroid of the parallelogram bounded by the lines $x - 3y = 0$, $2x + y = 0$, $x - 3y = 10$, and $2x + y = 15$.
10. Calculate $\iint_S (x + y)^4 (x - y)^{-5}\,dA$ where $S$ is the square $-1 \le x + y \le 1$, $1 \le x - y \le 3$.
11. Find the volume of the ellipsoid $(x + 2y)^2 + (x - 2y + z)^2 + 3z^2 = 1$.
12. Let $S$ be the region in the first quadrant bounded by the curves $xy = 1$, $xy = 4$, and the lines $y = x$, $y = 4x$. Find the area and the centroid of $S$ by using the transformation $u = xy$, $v = y/x$.
13. Let $S$ be the region in the first quadrant bounded by the curves $xy = 1$, $xy = 3$, $x^2 - y^2 = 1$, and $x^2 - y^2 = 4$. Compute $\iint_S (x^2 + y^2)\,dA$. (Hint: Let $G(x, y) = (xy, x^2 - y^2)$. What is $|\det DG|$?)
14. Use the transformation $x = u - uv$, $y = uv$ to evaluate $\iint_S (x + y)^{-1}\,dA$ where $S$ is the region in the first quadrant between the lines $x + y = 1$ and $x + y = 4$.
15. Use “double polar coordinates” $x = r\cos\theta$, $y = r\sin\theta$, $z = s\cos\varphi$, $w = s\sin\varphi$ in $\mathbb{R}^4$ to compute the 4-dimensional volume of the ball $x^2 + y^2 + z^2 + w^2 = R^2$.

4.5 Functions Defined by Integrals


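Suppose $f(\mathbf{x}, \mathbf{y})$ is a function of $\mathbf{x} \in \mathbb{R}^m$ and $\mathbf{y} \in \mathbb{R}^n$. If $f(\mathbf{x}, \mathbf{y})$ is integrable over the set $S \subset \mathbb{R}^n$ as a function of $\mathbf{y}$ for each fixed $\mathbf{x}$, we can form a new function of $\mathbf{x}$ by integrating out $\mathbf{y}$:
\[ F(\mathbf{x}) = \int\!\cdots\!\int_S f(\mathbf{x}, \mathbf{y})\,d^n\mathbf{y}. \tag{4.45} \]
The question then arises as to how properties of $f$ such as continuity and differentiability relate to the corresponding properties of $F$.
Perhaps the most basic question of this sort is the following. Suppose that
\[ \lim_{\mathbf{x}\to\mathbf{a}} f(\mathbf{x}, \mathbf{y}) = g(\mathbf{y}) \qquad (\mathbf{y} \in S); \]
is it true that
\[ \lim_{\mathbf{x}\to\mathbf{a}} F(\mathbf{x}) = \int\!\cdots\!\int_S g(\mathbf{y})\,d^n\mathbf{y}\,? \]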
In other words, can one interchange the operations of integrating with respect to y
and taking a limit with respect to x? Is the limit of the integral equal to the integral
of the limit? In general, the answer is no.

EXAMPLE 1. Let
\[ f(x, y) = \frac{x^2 y}{(x^2 + y^2)^2} \quad \bigl((x, y) \ne (0, 0)\bigr), \qquad f(0, 0) = 0. \]
Evidently $\lim_{x\to0} f(x, y) = 0$ for each $y$ (although for different reasons when $y = 0$ or when $y \ne 0$). However, $\lim_{x\to0}\int_0^1 f(x, y)\,dy \ne 0$; in fact, for $x \ne 0$,
\[ \int_0^1 \frac{x^2 y}{(x^2 + y^2)^2}\,dy = -\frac{x^2}{2(x^2 + y^2)}\bigg|_{y=0}^{y=1} = \frac{1}{2(1 + x^2)}, \]
which tends to $\tfrac12$ as $x \to 0$.
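The failure of the interchange in this example is easy to observe numerically. The sketch below is an illustration; the helper `midpoint`, the name `F`, and the sample values of $x$ are assumptions. It approximates $\int_0^1 f(x, y)\,dy$ for small $x$ and compares the result with $1/(2(1 + x^2))$, while the integral of the pointwise limit $f(0, y) = 0$ is zero.

```python
def midpoint(func, a, b, n):
    """Composite midpoint rule for the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))

def f(x, y):
    """The function of Example 1, with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x * x + y * y) ** 2

def F(x, n=20000):
    """Approximation to the integral of f(x, y) over 0 <= y <= 1."""
    return midpoint(lambda y: f(x, y), 0.0, 1.0, n)

for x in (0.5, 0.1, 0.05):
    print(x, F(x), 1 / (2 * (1 + x * x)))
print(0.0, F(0.0))
```

A fine grid is used because the integrand develops a tall, narrow peak near $y = 0$ as $x$ shrinks, which is exactly the unboundedness responsible for the discrepancy.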
Notice, however, that the $f$ in Example 1 is discontinuous, and indeed unbounded, at the origin; for instance, $f(x, x) = 1/(4x) \to \infty$ as $x \to 0^+$. ($f(x_0, y)$ is bounded as a function of $y$ for each fixed $x_0$, but its maximum value tends to infinity as $x_0 \to 0$.) If we impose some stronger conditions on $f$, we can obtain an affirmative result. The following theorem is not the last word in the subject (see Corollary 4.53), but it suffices for many purposes.

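4.46 Theorem. Suppose $S$ and $T$ are compact subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, and $S$ is measurable. If $f(\mathbf{x}, \mathbf{y})$ is continuous on the set $T \times S = \{(\mathbf{x}, \mathbf{y}) : \mathbf{x} \in T,\ \mathbf{y} \in S\}$, then the function $F$ defined by (4.45) is continuous on $T$.

Proof. Given $\epsilon > 0$, we wish to find $\delta > 0$ so that $|F(\mathbf{x}) - F(\mathbf{x}')| < \epsilon$ whenever $|\mathbf{x} - \mathbf{x}'| < \delta$. Let $|S|$ denote the $n$-dimensional volume of $S$. Since $T \times S$ is compact, $f$ is uniformly continuous on it by Theorem 1.33, so there is a $\delta > 0$ so that $|f(\mathbf{x}, \mathbf{y}) - f(\mathbf{x}', \mathbf{y})| < \epsilon/|S|$ whenever $\mathbf{y} \in S$, $\mathbf{x}, \mathbf{x}' \in T$, and $|\mathbf{x} - \mathbf{x}'| < \delta$. But then
\[ |F(\mathbf{x}) - F(\mathbf{x}')| \le \int\!\cdots\!\int_S |f(\mathbf{x}, \mathbf{y}) - f(\mathbf{x}', \mathbf{y})|\,d^n\mathbf{y} < \int\!\cdots\!\int_S \frac{\epsilon}{|S|}\,d^n\mathbf{y} = \epsilon, \]
and we are done.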

Remark. In the statement of Theorem 4.46 we could assume that $T$ is open rather than compact. Indeed, every point $\mathbf{x}$ in an open set $T$ is the center of a closed ball $B$ that is contained in $T$. Since $B$ is compact, the preceding argument shows that $F$ is continuous on $B$, and hence $F$ is continuous at every $\mathbf{x} \in T$.
A related question concerns differentiability. Suppose that $f$ is differentiable as a function of $\mathbf{x}$ for each $\mathbf{y} \in S$; is it true that $F$ is differentiable in $\mathbf{x}$ and that its partial derivatives $\partial_{x_j} F$ are the integrals of the derivatives $\partial_{x_j} f$? In other words, is the integral of the derivative equal to the derivative of the integral? This is another question about the interchange of limits and integrals. Indeed, it is always true that the finite difference $F(\mathbf{x} + \mathbf{h}) - F(\mathbf{x})$ is the integral of $f(\mathbf{x} + \mathbf{h}, \mathbf{y}) - f(\mathbf{x}, \mathbf{y})$, simply because integration is a linear operation, and the question is what happens in the limit as $\mathbf{h} \to \mathbf{0}$. As in Example 1, things can go wrong; see Exercise 1. Our main positive result is as follows.

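4.47 Theorem. Suppose $S \subset \mathbb{R}^n$ is compact and measurable, and $T \subset \mathbb{R}^m$ is open. If $f$ and $\nabla_{\mathbf{x}} f$ are continuous on $T \times S$, then the function $F$ defined by (4.45) is of class $C^1$ on $T$, and
\[ \frac{\partial F}{\partial x_j}(\mathbf{x}) = \int\!\cdots\!\int_S \frac{\partial f}{\partial x_j}(\mathbf{x}, \mathbf{y})\,d^n\mathbf{y} \qquad (\mathbf{x} \in T). \tag{4.48} \]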

Proof. Given a point $\mathbf{x}_0 \in T$, choose $r > 0$ small enough so that $\mathbf{x} \in T$ whenever $|\mathbf{x} - \mathbf{x}_0| \le 2r$. We shall show that $F$ is of class $C^1$ on $B(r, \mathbf{x}_0)$ and prove (4.48) for $\mathbf{x} \in B(r, \mathbf{x}_0)$; since $\mathbf{x}_0$ is an arbitrary point in $T$, this will establish the theorem. For the purpose of computing $\partial_{x_j} F$, the other variables $x_k$ ($k \ne j$) play no role, so we may assume that $m = 1$. In fact, in order to simplify the notation a bit, we shall also assume that $n = 1$; the proof for general $n$ is exactly the same. Accordingly, we write $x$ and $y$ instead of $\mathbf{x}$ and $\mathbf{y}$ henceforth.
For $0 < |h| \le r$ and $|x - x_0| \le r$, we consider the difference quotient
\[ \frac{F(x + h) - F(x)}{h} = \int_S \frac{f(x + h, y) - f(x, y)}{h}\,dy. \]
By the mean value theorem, we have $f(x + h, y) - f(x, y) = h\,\partial_x f(x + th, y)$, where $t$ is some number between 0 and 1 depending on $x$, $h$, and $y$. Hence,
\[ \frac{F(x + h) - F(x)}{h} - \int_S \partial_x f(x, y)\,dy = \int_S \bigl[\partial_x f(x + th, y) - \partial_x f(x, y)\bigr]\,dy. \tag{4.49} \]
The argument now proceeds as in the proof of Theorem 4.46. Since $\partial_x f$ is continuous on the compact set $\overline{B(2r, x_0)} \times S$, which contains both $(x, y)$ and $(x + th, y)$, it is uniformly continuous there by Theorem 1.33. Thus, given $\epsilon > 0$, we can find $\delta > 0$ so that the absolute value of the integrand on the right of (4.49) is less than $\epsilon/|S|$ for all $y \in S$, $x \in B(r, x_0)$, and $t \in (0, 1)$, whenever $|h| < \delta$. It follows that
\[ \left| \frac{F(x + h) - F(x)}{h} - \int_S \partial_x f(x, y)\,dy \right| < \int_S \frac{\epsilon}{|S|}\,dy = \epsilon \quad \text{for } |h| < \delta, \]
and hence that
\[ \lim_{h\to0} \left[ \frac{F(x + h) - F(x)}{h} - \int_S \partial_x f(x, y)\,dy \right] = 0, \]
as claimed. Finally, the right side of (4.48) is a continuous function of $\mathbf{x}$ by Theorem 4.46 (applied to $\partial_{x_j} f$ in place of $f$, together with the remark following that theorem), so $F$ is of class $C^1$.

EXAMPLE 2. Let $F(x) = \int_0^\pi y^{-1} e^{xy}\sin y\,dy$. This integral cannot be evaluated in elementary terms; however, we have $F'(x) = \int_0^\pi e^{xy}\sin y\,dy$, which can be evaluated by two integrations by parts. The result is that $F'(x) = (e^{\pi x} + 1)/(x^2 + 1)$.
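The formula for $F'(x)$ can be compared with a finite-difference derivative of a numerically computed $F$. The following sketch is an illustration only; the helper names, the grid size, and the difference step are assumptions. The midpoint rule is convenient here because it never evaluates the integrand at $y = 0$.

```python
import math

def midpoint(func, a, b, n):
    """Composite midpoint rule for the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))

def F(x, n=4000):
    """Approximation to the integral of exp(x*y) * sin(y) / y over [0, pi]."""
    return midpoint(lambda y: math.exp(x * y) * math.sin(y) / y,
                    0.0, math.pi, n)

def dF_numeric(x, h=1e-4):
    """Central-difference approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

def dF_formula(x):
    """The closed form (e^{pi x} + 1) / (x^2 + 1) from Example 2."""
    return (math.exp(math.pi * x) + 1) / (x * x + 1)

for x in (-1.0, 0.0, 0.5):
    print(x, dF_numeric(x), dF_formula(x))
```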
Situations often occur in which the variable $x$ occurs in the limits of integration as well as the integrand. For simplicity we consider the case where $x$ and $y$ are scalar variables:
\[ F(x) = \int_a^{\varphi(x)} f(x, y)\,dy. \tag{4.50} \]
We suppose that $f$ is continuous in $x$ and $y$ and of class $C^1$ in $x$ for each $y$, and that $\varphi$ is of class $C^1$. If $f$ does not depend on $x$, the derivative of $F$ can be computed by the fundamental theorem of calculus together with the chain rule:
\[ \frac{d}{dx}\int_a^{\varphi(x)} f(y)\,dy = f(\varphi(x))\,\varphi'(x). \]
For the more general case (4.50), we can differentiate $F$ by combining this result with Theorem 4.47 according to the recipe in Exercise 7 of §2.3: Differentiate with respect to each $x$ in (4.50) in turn while treating the others as constants, and add the results. The upshot is that
\[ F'(x) = f(x, \varphi(x))\,\varphi'(x) + \int_a^{\varphi(x)} \frac{\partial f}{\partial x}(x, y)\,dy. \tag{4.51} \]

EXAMPLE 3. Given a continuous function $g$ on $\mathbb{R}$, let
\[ h(x) = \int_0^x (x - y)\,g(y)\,dy. \]
Then
\[ h'(x) = (x - x)\,g(x) + \int_0^x g(y)\,dy = \int_0^x g(y)\,dy, \]
and hence $h''(x) = g(x)$. (Cf. Exercise 7 in §4.3, where this result is approached from a different angle.)
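For a concrete instance, take $g = \cos$, for which $h(x) = 1 - \cos x$ and indeed $h'' = \cos = g$. The sketch below is illustrative; the choice of $g$, the helper names, and the step sizes are assumptions. It computes $h$ by the midpoint rule and approximates $h''$ by a second central difference.

```python
import math

def midpoint(func, a, b, n):
    """Composite midpoint rule for the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))

def h(x, g, n=2000):
    """Approximation to the integral of (x - y) g(y) over 0 <= y <= x."""
    return midpoint(lambda y: (x - y) * g(y), 0.0, x, n)

def second_derivative(func, x, d=1e-3):
    """Second central difference approximation to func''(x)."""
    return (func(x + d) - 2 * func(x) + func(x - d)) / (d * d)

x0 = 1.2
h_value = h(x0, math.cos)
h_second = second_derivative(lambda x: h(x, math.cos), x0)

print(h_value, 1 - math.cos(x0))
print(h_second, math.cos(x0))
```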

The hypotheses of Theorems 4.46 and 4.47 can be weakened considerably, but
only at the cost of a more intricate proof. More sophisticated theories of integra-
tion (see §4.8) furnish a powerful theorem, the so-called dominated convergence
theorem, that generally provides the sharpest results in these situations. The full
statement of this theorem requires more background than we have available here,
but its restriction to the context of Riemann integrable functions is the following
result, in which the crucial condition is the existence of the uniform bound C.

4.52 Theorem (The Bounded Convergence Theorem). Let $S$ be a measurable subset of $\mathbb{R}^n$ and $\{f_j\}$ a sequence of integrable functions on $S$. Suppose that $f_j(\mathbf{y}) \to f(\mathbf{y})$ for each $\mathbf{y} \in S$, where $f$ is an integrable function on $S$, and that there is a constant $C$ such that $|f_j(\mathbf{y})| \le C$ for all $j$ and all $\mathbf{y} \in S$. Then
\[ \lim_{j\to\infty} \int\!\cdots\!\int_S f_j(\mathbf{y})\,d^n\mathbf{y} = \int\!\cdots\!\int_S f(\mathbf{y})\,d^n\mathbf{y}. \]

An elementary (but not simple) proof for the case where S is an interval in R
can be found in Lewin [17]. The full dominated convergence theorem can be found
in Bear [3, p. 68], DePree and Swartz [5, p. 194], Jones [9, p. 133], and Rudin [18,
p. 321].
Theorem 4.52 implies the following improvements on Theorems 4.46 and 4.47.

4.53 Corollary. Let $S$ be a measurable subset of $\mathbb{R}^n$ and $T$ a subset of $\mathbb{R}^m$. Suppose $f(\mathbf{x}, \mathbf{y})$ is a function on $T \times S$ that is integrable as a function of $\mathbf{y} \in S$ for each $\mathbf{x} \in T$, and let $F$ be defined by (4.45).
a. If $f(\mathbf{x}, \mathbf{y})$ is continuous as a function of $\mathbf{x} \in T$ for each $\mathbf{y} \in S$, and there is a constant $C$ such that $|f(\mathbf{x}, \mathbf{y})| \le C$ for all $\mathbf{x} \in T$ and $\mathbf{y} \in S$, then $F$ is continuous on $T$.
b. Suppose $T$ is open. If $f(\mathbf{x}, \mathbf{y})$ is of class $C^1$ as a function of $\mathbf{x} \in T$ for each $\mathbf{y} \in S$, and there is a constant $C$ such that $|\nabla_{\mathbf{x}} f(\mathbf{x}, \mathbf{y})| \le C$ for all $\mathbf{x} \in T$ and $\mathbf{y} \in S$, then $F$ is of class $C^1$ on $T$ and (4.48) holds.

Proof. To prove part (a), by Theorem 1.15 it is enough to show that $F(\mathbf{x}_j) \to F(\mathbf{x})$ whenever $\{\mathbf{x}_j\}$ is a sequence in $T$ converging to $\mathbf{x} \in T$. This follows by applying the bounded convergence theorem to the sequence of functions $f_j(\mathbf{y}) = f(\mathbf{x}_j, \mathbf{y})$. Similarly, part (b) is proved by applying the bounded convergence theorem to the sequence of difference quotients with increments $\mathbf{h}_j$, where $\{\mathbf{h}_j\}$ is a sequence tending to zero along one of the coordinate axes. The uniform bound on these quotients is obtained by applying the mean value theorem as in the proof of Theorem 4.47; details are left as Exercise 8.

EXERCISES
1. Let $f(x, y) = x^3 y^{-2} e^{-x^2/y}$ if $y > 0$, $f(x, y) = 0$ if $y \le 0$.
a. Show that $f(x, y)$ is of class $C^1$ as a function of $x$ for each fixed $y$ and as a function of $y$ for each fixed $x$, but that $f$ is unbounded in any neighborhood of the origin. (For the smoothness in $y$, cf. Exercise 9 in §2.1.)
b. Let $F(x) = \int_0^1 f(x, y)\,dy$. Show that $F(x) = xe^{-x^2}$ and hence that $F'(0) = 1$, but that $\int_0^1 \partial_x f(0, y)\,dy = 0$.
2. Compute $F'(x)$ for the functions $F(x)$ defined for $x > 0$ by the following formulas. Your answers should not contain integral signs.
a. $F(x) = \int_0^1 \log(1 + xe^y)\,dy$.
b. $F(x) = \int_1^{x^2} y^{-1}\cos(xy^2)\,dy$.
c. $F(x) = \int_1^{3x} y^{-1} e^{xy}\,dy$.
3. Given a continuous function $g$ on $\mathbb{R}$, let $h(x) = \int_0^x (x - y)e^{x-y} g(y)\,dy$. Show that $h'' - 2h' + h = g$.
4. Given a continuous function $g$ on $\mathbb{R}$, let $h(x) = \tfrac12\int_0^x [\sin 2(x - y)]\,g(y)\,dy$. Show that $h'' + 4h = g$.
5. Given $F(x) = \int_{\psi(x)}^{\varphi(x)} f(x, y)\,dy$, find $F'(x)$, assuming suitable smoothness conditions on $\psi$, $\varphi$, and $f$.
6. (How to compress $n$ antidifferentiations into one) Let $f$ be a continuous function on $\mathbb{R}$. For $n \ge 1$, let
\[ f^{[n]}(x) = \frac{1}{(n-1)!}\int_0^x (x - y)^{n-1} f(y)\,dy. \]
Show that $\bigl(f^{[n]}\bigr)' = f^{[n-1]}$ for $n > 1$ and conclude that $f^{[n]}$ is an $n$th-order antiderivative of $f$.
7. Let $f$ be any continuous function on $[0, 1]$. For $x \in \mathbb{R}$ and $t > 0$, let
\[ u(x, t) = t^{-1/2}\int_0^1 e^{-(x-y)^2/4t} f(y)\,dy, \qquad v(x, t) = t\int_0^1 \frac{f(y)}{(x - y)^2 + t^2}\,dy. \]
a. Show that $\partial_t u = \partial_x^2 u$.
b. Show that $\partial_x^2 v + \partial_t^2 v = 0$.
8. Complete the deduction of Corollary 4.53b from the bounded convergence theorem.

4.6 Improper Integrals


In this section we return to integration in one variable. The Riemann theory of
integration pertains to bounded functions on finite intervals, but there are many sit-
uations where one needs to integrate functions over infinite intervals (i.e., half-lines
or the whole line) or functions that are unbounded near some point in the interval
of integration. Such integrals are called improper, and they are defined in terms of
limits of ordinary integrals. To do a really good job with improper integrals, one
should adopt the more powerful Lebesgue theory of integration, sketched in §4.8.
(Even then, additional limiting procedures are needed to handle integrals such as
the one in Example 3 below.) Here we content ourselves with a short discussion of
useful results about simple types of improper integrals.
The two most basic types of improper integrals are as follows:
I. $\int_a^\infty f(x)\,dx$, where $f$ is integrable over every finite subinterval $[a, b]$.
II. $\int_a^b f(x)\,dx$, where $f$ is integrable over $[c, b]$ for every $c > a$ but is unbounded near $x = a$.

We study these two types in turn and then consider integrals of more complicated
sorts that can be obtained by combining them.

Improper Integrals of Type I. In this subsection, all functions in question are assumed to be defined on $[a, \infty)$ and integrable on $[a, b]$ for every $b > a$.
The definition of the improper integral is
\[ \int_a^\infty f(x)\,dx = \lim_{b\to\infty}\int_a^b f(x)\,dx. \]
More precisely, the integral $\int_a^\infty f(x)\,dx$ is said to converge if the limit on the right exists, in which case its value is defined to be that limit; otherwise the integral is said to diverge, and it is not assigned a numerical value. (However, we may say that $\int_a^\infty f(x)\,dx = \infty$ if $\int_a^b f(x)\,dx$ grows without bound as $b \to \infty$.)

EXAMPLE 1.
a. $\int_0^\infty e^{-x}\,dx = \lim_{b\to\infty}\bigl[-e^{-x}\bigr]_0^b = 1$, since $\lim_{b\to\infty} e^{-b} = 0$.
b. $\int_0^\infty \cos x\,dx$ diverges, since $\lim_{b\to\infty}\sin b$ does not exist.

Our main concern here is not with the evaluation of $\int_a^\infty f(x)\,dx$ but with the more basic question of whether or not it converges. At the outset, we make one simple but useful remark: If $c > a$, the convergence of $\int_a^\infty f(x)\,dx$ is equivalent to the convergence of $\int_c^\infty f(x)\,dx$, the difference between the two being the ordinary integral $\int_a^c f(x)\,dx$. Thus, the convergence of $\int_a^\infty f(x)\,dx$ depends only on the behavior of $f(x)$ as $x \to \infty$, not on its behavior on a finite interval $[a, c]$.
We first consider the situation when $f \ge 0$. In this case, the integral $\int_a^b f(x)\,dx$ increases along with the upper endpoint $b$, so we can exploit the following variant of the monotone sequence theorem.

4.54 Lemma. If $\varphi$ is a bounded increasing function on $[a, \infty)$, then $\lim_{x\to\infty}\varphi(x)$ exists and equals $\sup\{\varphi(x) : x \ge a\}$.

Proof. The proof is left to the reader (Exercise 7); it is essentially identical to the proof of the monotone sequence theorem (1.16).
By applying Lemma 4.54 to the function $\varphi(x) = \int_a^x f(t)\,dt$, we see that the integral $\int_a^\infty f(x)\,dx$ converges if and only if $\int_a^b f(x)\,dx$ remains bounded as $b \to \infty$. This immediately leads to the basic comparison test for convergence.

4.55 Theorem. Suppose that $0 \le f(x) \le g(x)$ for all sufficiently large $x$. If $\int_a^\infty g(x)\,dx$ converges, so does $\int_a^\infty f(x)\,dx$. If $\int_a^\infty f(x)\,dx$ diverges, so does $\int_a^\infty g(x)\,dx$.

Proof. By the remarks following the definition of convergence, we may assume that $0 \le f(x) \le g(x)$ for all $x \ge a$. If $\int_a^\infty g(x)\,dx$ converges, it provides an upper bound for $\varphi(b) = \int_a^b f(x)\,dx$ as $b \to \infty$:
\[ \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx \le \int_a^\infty g(x)\,dx. \]
The convergence of $\int_a^\infty f(x)\,dx$ then follows from Lemma 4.54. The second assertion is equivalent to the first one.

The following variant of Theorem 4.55 is sometimes easier to apply:

4.56 Corollary. Suppose $f > 0$, $g > 0$, and $f(x)/g(x) \to l$ as $x \to \infty$. If $0 < l < \infty$, then $\int_a^\infty f(x)\,dx$ and $\int_a^\infty g(x)\,dx$ are both convergent or both divergent. If $l = 0$, the convergence of $\int_a^\infty g(x)\,dx$ implies the convergence of $\int_a^\infty f(x)\,dx$. If $l = \infty$, the divergence of $\int_a^\infty g(x)\,dx$ implies the divergence of $\int_a^\infty f(x)\,dx$.

Proof. If $0 < l < \infty$, the fact that $f(x)/g(x) \to l$ yields the estimates $f(x) \le 2l\,g(x)$ and $f(x) \ge \tfrac12 l\,g(x)$ for sufficiently large $x$, so the first assertion follows by comparing $f$ to a multiple of $g$. If $l = 0$ (resp. $l = \infty$), we have $f(x) \le g(x)$ (resp. $f(x) \ge g(x)$) for sufficiently large $x$, whence the other assertions follow.

The functions most often used for comparison in Theorem 4.55 and Corollary 4.56 are the power functions $x^{-p}$. Taking $a = 1$ for convenience, for $p \ne 1$ we have, as $b \to \infty$,
\[ \int_1^b \frac{dx}{x^p} = \frac{b^{1-p} - 1}{1 - p} \to \begin{cases} \infty & \text{if } p < 1, \\ (p - 1)^{-1} & \text{if } p > 1, \end{cases} \]
and $\int_1^b x^{-1}\,dx = \log b \to \infty$. In short, $\int_1^\infty x^{-p}\,dx$ converges if and only if $p > 1$. Combining this fact with Theorem 4.55, we obtain the following handy rule:

4.57 Corollary. If $0 \le f(x) \le Cx^{-p}$ for all sufficiently large $x$, where $p > 1$, then $\int_a^\infty f(x)\,dx$ converges. If $f(x) \ge cx^{-1}$ ($c > 0$) for all sufficiently large $x$, then $\int_a^\infty f(x)\,dx$ diverges.
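The dichotomy at $p = 1$ is visible in a few evaluations of the closed form for $\int_1^b x^{-p}\,dx$. The sketch below is a small illustration; the function name `power_integral` and the sample exponents are assumptions.

```python
import math

def power_integral(p, b):
    """Exact value of the integral of x**(-p) over [1, b], for b >= 1."""
    if p == 1:
        return math.log(b)
    return (b ** (1 - p) - 1) / (1 - p)

for p in (0.5, 1, 2):
    values = [power_integral(p, 10.0 ** k) for k in (1, 3, 6)]
    print(p, values)
```

For $p = 2$ the values settle toward $(p - 1)^{-1} = 1$, while for $p = 0.5$ and $p = 1$ they keep growing as $b$ increases.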
EXAMPLE 2. The integral $\int_0^\infty [(2x + 14)/(x^3 + 1)]\,dx$ converges, because
\[ \frac{2x + 14}{x^3 + 1} \le \frac{4x}{x^3} = \frac{4}{x^2} \quad \text{for } x \ge 7. \]
Alternatively, we could observe that
\[ \frac{2x + 14}{x^3 + 1} \bigg/ \frac{1}{x^2} \to 2 \quad \text{as } x \to \infty \]
and use Corollary 4.56 with $g(x) = x^{-2}$ to establish the convergence of the integral over, say, $[1, \infty)$. (The integral over $[0, 1]$ is proper.) Note that we are not comparing $\int_0^\infty [(2x + 14)/(x^3 + 1)]\,dx$ to $\int_0^\infty x^{-2}\,dx$, which presents an additional difficulty because $x^{-2}$ is unbounded at $x = 0$; the comparison of $(2x + 14)/(x^3 + 1)$ with $x^{-2}$ is significant only for large $x$.

It should be noted that the power functions $x^{-p}$ do not quite tell the whole story. There are functions whose rate of decay at infinity is faster than $x^{-1}$ but slower than $x^{-p}$ for $p > 1$, and their integrals may be either convergent or divergent; see Exercises 4 and 5.
Next we remove the assumption that f is nonnegative, and with a view toward
future applications, we shall allow f to be complex-valued. The question of con-
vergence can often be reduced to the case where f ≥ 0 via the following result.
4.58 Theorem. If $\int_a^\infty |f(x)|\,dx$ converges, then $\int_a^\infty f(x)\,dx$ converges.

Proof. First suppose $f$ is real-valued. Let $f^+(x) = \max[f(x), 0]$ and $f^-(x) = \max[-f(x), 0]$. Then we have $0 \le f^+(x) \le |f(x)|$ and $0 \le f^-(x) \le |f(x)|$, so $\int_a^\infty f^+(x)\,dx$ and $\int_a^\infty f^-(x)\,dx$ converge by Theorem 4.55. But $f = f^+ - f^-$, so $\int_a^\infty f(x)\,dx$ converges also.
If $f$ is complex-valued, we have $|\operatorname{Re} f(x)| \le |f(x)|$ and $|\operatorname{Im} f(x)| \le |f(x)|$, so the convergence of $\int_a^\infty |f(x)|\,dx$ implies the convergence of $\int_a^\infty |\operatorname{Re} f(x)|\,dx$ and $\int_a^\infty |\operatorname{Im} f(x)|\,dx$ and hence (by the preceding argument) the convergence of the real and imaginary parts of $\int_a^\infty f(x)\,dx$.
The integral $\int_a^\infty f(x)\,dx$ is called absolutely convergent if $\int_a^\infty |f(x)|\,dx$ converges. Theorem 4.55 and its corollaries can be used to test for absolute convergence, by applying them to $|f|$. It is possible, however, for $\int_a^\infty f(x)\,dx$ to converge even when $\int_a^\infty |f(x)|\,dx$ diverges because of cancellation effects between positive and negative values. Here is an important example.

EXAMPLE 3. The integral $\int_1^\infty \frac{\sin x}{x}\,dx$ is not absolutely convergent (Exercise 8), but it is convergent. To see this, integrate by parts:
\[ \int_1^b \frac{\sin x}{x}\,dx = \frac{-\cos x}{x}\bigg|_1^b - \int_1^b \frac{\cos x}{x^2}\,dx. \]
Now, $\int_1^\infty |x^{-2}\cos x|\,dx$ converges by Corollary 4.57 since $|x^{-2}\cos x| \le x^{-2}$, so the integral on the right approaches a finite limit as $b \to \infty$; moreover, since $|b^{-1}\cos b| \le b^{-1} \to 0$, so does the other term. Hence $\lim_{b\to\infty}\int_1^b x^{-1}\sin x\,dx$ exists, as claimed.
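The contrast between convergence and absolute convergence in this example shows up clearly in numerical partial integrals. The following sketch is only an illustration; the helper names, the step size, and the chosen upper limits are assumptions. It approximates $\int_1^b x^{-1}\sin x\,dx$ and $\int_1^b x^{-1}|\sin x|\,dx$ by the midpoint rule for several values of $b$.

```python
import math

def midpoint(func, a, b, step=0.01):
    """Composite midpoint rule on [a, b] with subintervals of length <= step."""
    n = max(1, math.ceil((b - a) / step))
    width = (b - a) / n
    return width * sum(func(a + (i + 0.5) * width) for i in range(n))

def signed_partial(b):
    """Approximation to the integral of sin(x)/x over [1, b]."""
    return midpoint(lambda x: math.sin(x) / x, 1.0, b)

def absolute_partial(b):
    """Approximation to the integral of |sin(x)|/x over [1, b]."""
    return midpoint(lambda x: abs(math.sin(x)) / x, 1.0, b)

for k in (25, 50, 100, 200):
    b = k * math.pi
    print(k, signed_partial(b), absolute_partial(b))
```

The first column of values stabilizes, while the second keeps increasing by a roughly constant amount each time $b$ doubles, which is the logarithmic growth one expects from comparison with $x^{-1}$.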

Improper Integrals of Type II. In this subsection, all functions in question are assumed to be defined on $(a, b]$ and integrable on $[c, b]$ for every $c > a$.
The definition of the improper integral in this situation is
\[ \int_a^b f(x)\,dx = \lim_{c > a,\ c\to a}\int_c^b f(x)\,dx. \]
That is, $\int_a^b f(x)\,dx$ converges if the limit on the right exists, and diverges otherwise. The obvious analogues of the results in the preceding subsection are valid in this situation with essentially the same proofs; one has merely to replace conditions like “$x \to \infty$” or “for sufficiently large $x$” by “$x \to a$” or “for $x$ sufficiently close to $a$.” For instance, here is the basic comparison test:

4.59 Theorem. Suppose that $0 \le f(x) \le g(x)$ for all $x$ sufficiently close to $a$. If $\int_a^b g(x)\,dx$ converges, so does $\int_a^b f(x)\,dx$. If $\int_a^b f(x)\,dx$ diverges, so does $\int_a^b g(x)\,dx$.

The functions most often used for comparison in this situation are the power functions $(x - a)^{-p}$, but now the condition for convergence is $p < 1$ rather than $p > 1$. Indeed, for $p \ne 1$, as $c \to a$,
\[ \int_c^b (x - a)^{-p}\,dx = \frac{(x - a)^{1-p}}{1 - p}\bigg|_c^b \to \begin{cases} (1 - p)^{-1}(b - a)^{1-p} & \text{if } p < 1, \\ \infty & \text{if } p > 1, \end{cases} \]
and $\int_c^b (x - a)^{-1}\,dx = \log(x - a)\big|_c^b \to \infty$. Hence the analogue of Corollary 4.57 is as follows:

4.60 Corollary. If $0 \le f(x) \le C(x - a)^{-p}$ for $x$ near $a$, where $p < 1$, then $\int_a^b f(x)\,dx$ converges. If $f(x) > c(x - a)^{-1}$ ($c > 0$) for $x$ near $a$, then $\int_a^b f(x)\,dx$ diverges.
EXAMPLE 4. $\int_0^1 x^{-2}\sin 3x\,dx$ diverges. Indeed, $x^{-1}\sin 3x \to 3$ as $x \to 0$, so $x^{-2}\sin 3x > 2x^{-1}$ for $x$ near 0.
Theorem 4.58 also remains valid in this situation; that is, absolute convergence implies convergence.

EXAMPLE 5. $\int_0^1 x^{-1/2}\sin(x^{-1})\,dx$ is absolutely convergent, because $|x^{-1/2}\sin(x^{-1})| \le x^{-1/2}$.

Other Types of Improper Integrals. Various other kinds of improper integrals can be built up out of those of types I and II.
First, obviously one can consider the “mirror images” of types I and II; that is, integrals of the form $\int_{-\infty}^b f(x)\,dx$ where $f$ is integrable on $[a, b]$ for all $a < b$, or integrals of the form $\int_a^b f(x)\,dx$ where $f$ is integrable on $[a, c]$ for all $c < b$ but is unbounded near $x = b$. The ideas are exactly the same; only minor notational changes are needed. (In the latter situation, the comparison functions for the analogue of Corollary 4.60 are the power functions $|x - b|^{-p} = (b - x)^{-p}$.)
Second, one can consider improper integrals $\int_a^b f(x)\,dx$ where a difficulty occurs at both endpoints of the interval of integration, either because the endpoint is at infinity or because the integrand is unbounded there. The trick here is to pick an intermediate point $c \in (a, b)$ and write $\int_a^b = \int_a^c + \int_c^b$, thus reducing the integral to a sum of two integrals that are each of type I or II; the original integral is said to be convergent if and only if each of the two subintegrals is convergent. For example, if $f$ is integrable over every finite interval $[a, b]$, we define
\[ \int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^0 f(x)\,dx + \int_0^\infty f(x)\,dx = \lim_{a\to-\infty}\int_a^0 f(x)\,dx + \lim_{b\to\infty}\int_0^b f(x)\,dx. \]
The integral on the left converges only when both of the limits on the right exist independently of one another; there is no relation between the variables $a$ and $b$. The same ideas apply to $\int_a^\infty f(x)\,dx$ when $f$ is unbounded at $a$ or to $\int_a^b f(x)\,dx$ when $f$ is unbounded at both $a$ and $b$.
EXAMPLE 6. $\int_{-\infty}^\infty dx/(1 + x^2)$ converges; the integrals over $(-\infty, 0]$ and $[0, \infty)$ are both convergent by comparison to $x^{-2}$. In fact,
\[ \int_{-\infty}^\infty \frac{dx}{1 + x^2} = \lim_{a\to-\infty,\ b\to+\infty} \arctan x\,\Big|_a^b = \frac{\pi}{2} - \left(-\frac{\pi}{2}\right) = \pi. \]

EXAMPLE 7. $\int_0^\infty x^{-p}\,dx$ is divergent for every $p$. Indeed, if $p < 1$, $\int_0^1 x^{-p}\,dx$ converges but $\int_1^\infty x^{-p}\,dx$ diverges, whereas the reverse is true if $p > 1$. If $p = 1$, these integrals both diverge.

EXAMPLE 8. Consider $\int_0^\infty f(x)\,dx$ where $f(x) = 1/(x^{1/2} + x^{3/2})$. Since $0 < f(x) < x^{-1/2}$, $\int_0^1 f(x)\,dx$ converges by Corollary 4.60. Since $0 < f(x) < x^{-3/2}$, $\int_1^\infty f(x)\,dx$ converges by Corollary 4.57. Hence $\int_0^\infty f(x)\,dx$ converges.

Finally, one can consider improper integrals $\int_a^b f(x)\,dx$ where $f$ is unbounded near one or more interior points of $[a, b]$. Again the trick is to break up $[a, b]$ into subintervals such that the singularities of $f$ occur only at endpoints of the subintervals and consider the integrals of $f$ over the subintervals separately.
EXAMPLE 9. Let $f(x) = (x^3 - 8x^2)^{-1/3}$, and let us consider $\int_0^9 f(x)\,dx$ and $\int_0^\infty f(x)\,dx$. The singularities of $f$ occur at $x = 0$ and $x = 8$, so for the first integral we write
\[ \int_0^9 = \int_0^c + \int_c^8 + \int_8^9 \qquad (0 < c < 8). \]
We have $|f(x)| = x^{-2/3}|x - 8|^{-1/3}$, which is approximately $\tfrac12 x^{-2/3}$ for $x$ near 0 and approximately $\tfrac14 |x - 8|^{-1/3}$ for $x$ near 8. Hence all three subintegrals are absolutely convergent by Corollary 4.60, and the original integral $\int_0^9$ converges. On the other hand, $f(x)$ is positive for $x > 8$ and $f(x)/x^{-1} = (1 - 8x^{-1})^{-1/3} \to 1$ as $x \to \infty$, so $\int_9^\infty f(x)\,dx$ diverges by Corollary 4.56. It follows that $\int_0^\infty f(x)\,dx$ diverges too.
The definition of the improper integral $\int_a^b f(x)\,dx$ given above when $f$ has a singularity in the interior of $[a, b]$ is a little too restrictive for some purposes. Consider, for example, $\int_{-1}^1 x^{-1}\,dx$. According to our definition, this integral is to be considered as the limit of
\[ \int_{-1}^{-\delta} \frac{dx}{x} + \int_\epsilon^1 \frac{dx}{x} = \log\delta - \log\epsilon = \log\left(\frac{\delta}{\epsilon}\right) \tag{4.61} \]
as $\delta$ and $\epsilon$ decrease to 0, and this limit does not exist: When $\delta$ and $\epsilon$ are extremely small, their ratio can be arbitrarily large or arbitrarily small. However, since $x^{-1}$ is an odd function, it seems natural to interpret the value of the integral as 0; the negative infinity of $\int_{-1}^0 x^{-1}\,dx$ should exactly cancel the positive infinity of $\int_0^1 x^{-1}\,dx$. We can achieve this result by modifying (4.61) so as to preserve the symmetry of the situation, namely, by taking $\delta = \epsilon$, so that $\log(\delta/\epsilon) = 0$.
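The dependence of (4.61) on how $\delta$ and $\epsilon$ shrink can be made concrete. The sketch below is illustrative; the function name `truncated_integral` and the three shrinking schemes are assumptions. It evaluates the two truncated pieces with the antiderivative $\log|x|$ and compares the symmetric choice $\delta = \epsilon$ with the choices $\delta = 2\epsilon$ and $\delta = \epsilon^2$.

```python
import math

def truncated_integral(delta, eps):
    """Integral of 1/x over [-1, -delta] plus the integral over [eps, 1]."""
    left = math.log(delta) - math.log(1.0)   # log|x| from x = -1 to x = -delta
    right = math.log(1.0) - math.log(eps)    # log|x| from x = eps to x = 1
    return left + right

for eps in (1e-2, 1e-4, 1e-8):
    print(eps,
          truncated_integral(eps, eps),        # symmetric truncation
          truncated_integral(2 * eps, eps),    # fixed ratio delta/eps = 2
          truncated_integral(eps ** 2, eps))   # delta shrinks much faster
```

The symmetric truncation gives 0 for every $\epsilon$, the fixed ratio gives $\log 2$, and the third scheme decreases without bound, so no single limit exists unless a relation between $\delta$ and $\epsilon$ is imposed.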
These considerations lead to the following definition. Suppose $a < c < b$, and
suppose $f$ is integrable on $[a, c - \epsilon]$ and on $[c + \epsilon, b]$ for all $\epsilon > 0$. The (Cauchy)
principal value of the integral $\int_a^b f(x)\,dx$ is
\[ \mathrm{P.V.}\int_a^b f(x)\,dx = \lim_{\epsilon \to 0} \Bigl[ \int_a^{c-\epsilon} f(x)\,dx + \int_{c+\epsilon}^{b} f(x)\,dx \Bigr], \]
provided that the limit exists. Of course, if $\int_a^b f(x)\,dx$ converges, its Cauchy
principal value is its ordinary value.
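
As a small numerical illustration of (4.61), the following Python sketch evaluates the two excised integrals in closed form from the antiderivative $\log|x|$. The function name `excised_integral` is ours and purely illustrative. With $\delta = \epsilon$ the result is 0 for every $\epsilon$, whereas the lopsided choice $\delta = 2\epsilon$ gives $\log 2$, so the value really does depend on how $\delta$ and $\epsilon$ shrink.

```python
import math


def excised_integral(delta: float, eps: float) -> float:
    """Value of (4.61): integral of 1/x over [-1, -delta] plus [eps, 1].

    Both pieces come from the antiderivative log|x|.
    """
    left = math.log(delta) - math.log(1.0)   # integral over [-1, -delta]
    right = math.log(1.0) - math.log(eps)    # integral over [eps, 1]
    return left + right


for eps in (1e-2, 1e-4, 1e-8):
    symmetric = excised_integral(eps, eps)       # delta = eps (principal value)
    lopsided = excised_integral(2 * eps, eps)    # delta = 2 * eps
    print(eps, symmetric, lopsided)
```

The symmetric column stays at zero while the lopsided column stays near $\log 2$, no matter how small $\epsilon$ is.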

The following proposition describes a typical situation in which principal values occur.

4.62 Proposition. Suppose $a < 0 < b$. If $\phi$ is continuous on $[a, b]$ and differentiable
at 0, then $\mathrm{P.V.}\int_a^b x^{-1}\phi(x)\,dx$ exists.

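Proof. First we check the case $\phi \equiv 1$ by explicit calculation:
\[ \mathrm{P.V.}\int_a^b \frac{dx}{x} = \lim_{\epsilon \to 0} \Bigl[ \int_a^{-\epsilon} \frac{dx}{x} + \int_{\epsilon}^{b} \frac{dx}{x} \Bigr] = \lim_{\epsilon \to 0} \Bigl[ \log|x| \Big|_a^{-\epsilon} + \log x \Big|_{\epsilon}^{b} \Bigr] = \log\Bigl(\frac{b}{|a|}\Bigr). \]
For the general case, we write $\phi(x) = \phi(0) + [\phi(x) - \phi(0)]$, obtaining
\[ \mathrm{P.V.}\int_a^b \frac{\phi(x)}{x}\,dx = \phi(0)\,\mathrm{P.V.}\int_a^b \frac{dx}{x} + \int_a^b \frac{\phi(x) - \phi(0)}{x}\,dx. \]
We have just seen that the first quantity on the right exists, and the second one is a
proper integral: The integrand is actually continuous on $[a, b]$ if we define its value
at $x = 0$ to be $\phi'(0)$.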
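
The decomposition used in this proof also gives a practical way to compute a principal value. The Python sketch below is only an illustration under stated assumptions: the names `midpoint_rule` and `principal_value_over_x` are ours, the proper integral is approximated by a plain midpoint rule, and the nodes are assumed never to land exactly on $x = 0$ (which holds for the two calls shown).

```python
import math


def midpoint_rule(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def principal_value_over_x(phi, a, b, n=2000):
    """P.V. integral of phi(x)/x over [a, b] with a < 0 < b.

    Follows the proof: phi(0) * log(b/|a|) plus the proper integral of
    (phi(x) - phi(0)) / x.  Assumption: no midpoint node equals 0, so the
    removable singularity at x = 0 is never evaluated.
    """
    phi0 = phi(0.0)
    proper_part = midpoint_rule(lambda x: (phi(x) - phi0) / x, a, b, n)
    return phi0 * math.log(b / abs(a)) + proper_part


print(principal_value_over_x(math.exp, -1.0, 1.0))        # phi(x) = e^x
print(principal_value_over_x(lambda x: 1.0, -1.0, 2.0))   # phi = 1: log(b/|a|)
```

For $\phi \equiv 1$ the proper part vanishes and only the explicit logarithm from the first step of the proof remains.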

The notion of principal value is also occasionally applied to integrals of the
form $\int_{-\infty}^{\infty} f(x)\,dx$ in which $f$ is integrable over any finite interval:
\[ \mathrm{P.V.}\int_{-\infty}^{\infty} f(x)\,dx = \lim_{R \to \infty} \int_{-R}^{R} f(x)\,dx. \]
For example, the integral $\int_{-\infty}^{\infty} x(1 + x^2)^{-1}\,dx$ is divergent because the integrand
is asymptotically equal to $x^{-1}$ as $x \to \pm\infty$, but its principal value is zero because
the integrand is odd.
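
To see concretely why the symmetric truncation matters in this example, here is a short Python sketch (the function name `truncated_integral` is ours) that uses the antiderivative $\tfrac12\log(1 + x^2)$ of $x(1 + x^2)^{-1}$. The symmetric truncations $[-R, R]$ give 0, while the asymmetric truncations $[-R, 2R]$ approach $\log 2$.

```python
import math


def truncated_integral(lower: float, upper: float) -> float:
    """Integral of x / (1 + x^2) over [lower, upper].

    Uses the antiderivative (1/2) * log(1 + x^2).
    """
    return 0.5 * (math.log1p(upper * upper) - math.log1p(lower * lower))


for R in (1e1, 1e3, 1e6):
    symmetric = truncated_integral(-R, R)       # the principal-value truncation
    lopsided = truncated_integral(-R, 2 * R)    # an asymmetric truncation
    print(R, symmetric, lopsided)
```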

EXERCISES

1. Determine whether the following improper integrals of type I converge.
   a. $\int_1^\infty \frac{dx}{x\sqrt{x + 3}}$.
   b. $\int_3^\infty \frac{x^2 - 3x - 1}{x(x^2 + 2)}\,dx$.
   c. $\int_0^\infty x^2 e^{-x^2}\,dx$.
   d. $\int_3^\infty \frac{\sin 4x}{x^2 - x - 2}\,dx$.
   e. $\int_1^\infty \tan\frac{1}{x}\,dx$.
2. Determine whether the following improper integrals of type II converge.
   a. $\int_0^1 \frac{x}{\sqrt{1 - x^2}}\,dx$.
   b. $\int_{\pi/2}^{\pi} \cot x\,dx$.
   c. $\int_0^1 \frac{1 - x}{x^2 - 4x + 3}\,dx$.
   d. $\int_0^1 \frac{dx}{x^{1/2}(x^2 + x)^{1/3}}$.
   e. $\int_0^1 \frac{1 - \cos x}{\sin^3 2x}\,dx$.
3. Determine whether the following improper integrals converge. In each case
   it will be necessary to break up the integral into a sum of integrals of types I
   and/or II.
   a. $\int_0^\infty x^{-3/4} e^{-x}\,dx$.
   b. $\int_0^1 x^{-1/3}(1 - x)^{-2}\,dx$.
   c. $\int_0^\infty \frac{\sqrt{x}}{e^x - 1}\,dx$.
   d. $\int_0^\infty \frac{dx}{x(x - 1)^{1/3}}$.
   e. $\int_0^\infty x^{-1/5} \sin\frac{1}{x}\,dx$.
   f. $\int_{-\infty}^{\infty} \frac{e^x}{e^x + x^2}\,dx$.

4. For $p > 0$, let $f_p(x) = x^{-1}(\log x)^{-p}$.
   a. Given $p > 0$ and $\epsilon > 0$, show that $x^{-1-\epsilon} < f_p(x) < x^{-1}$ for sufficiently
      large $x$.
   b. For which $p$ does $\int_2^\infty f_p(x)\,dx$ converge?
5. Let $f_p$ be as in Exercise 4 and $g_p(x) = (x \log x)^{-1}(\log\log x)^{-p}$.
   a. Given $p > 0$ and $\epsilon > 0$, show that $f_{1+\epsilon}(x) < g_p(x) < f_1(x)$ for sufficiently
      large $x$.
   b. For which $p$ does $\int_3^\infty g_p(x)\,dx$ converge?
6. Let $f(x) = 1$ on the intervals $[1, 1\tfrac12]$, $[2, 2\tfrac14]$, $[3, 3\tfrac18]$, $\ldots$, and $f(x) = 0$ elsewhere.
   a. Show that $\int_0^\infty f(x)\,dx$ converges (and is equal to 1) although $f(x) \not\to 0$ as
      $x \to \infty$.
   b. Modify $f$ to make an example of a function $g$ such that $\int_0^\infty g(x)\,dx$ converges
      although $g(x)$ does not remain bounded as $x \to \infty$.
7. Prove Lemma 4.54.
8. Prove that $\int_1^\infty x^{-1}|\sin x|\,dx$ diverges. (Hint: Show that there is a constant
   $c > 0$ such that $\int_{n\pi}^{(n+1)\pi} x^{-1}|\sin x|\,dx > c\int_{n\pi}^{(n+1)\pi} x^{-1}\,dx$ for all $n \ge 1$.)
9. (Dirichlet’s Test for Convergence) Let $f$ be continuous and let $g$ be $C^1$ on
   $[a, \infty)$. Suppose that (i) the function $F(x) = \int_a^x f(t)\,dt$ remains bounded
   as $x \to \infty$; (ii) $g'(x) \le 0$ on $[a, \infty)$ and $\lim_{x \to \infty} g(x) = 0$. Show that
   $\int_a^\infty f(x)g(x)\,dx$ converges. (Hint: Example 3 is the case $f(x) = \sin x$,
   $g(x) = x^{-1}$. Generalize the argument given there.)
10. Evaluate $\mathrm{P.V.}\int_{-1}^{1} \frac{dx}{x(x + 2)}$.
11. Suppose $\phi$ is of class $C^3$ on $[-1, 1]$. Show that $\mathrm{P.V.}\int_{-1}^{1} x^{-3}\phi(x)\,dx$ exists if
    and only if $\phi'(0) = 0$. (Hint: Consider the second-order Taylor expansion of
    $\phi$.)

4.7 Improper Multiple Integrals


The problem of defining improper integrals in dimensions $n > 1$ is trickier than in
dimension 1. Suppose, for example, that $f$ is a continuous function on $\mathbb{R}^2$ and we
wish to define $\iint_{\mathbb{R}^2} f\,dA$. The obvious idea is to set
\[ \iint_{\mathbb{R}^2} f\,dA = \lim_{r \to \infty} \iint_{S_r} f\,dA, \]
where the $S_r$’s are a family of measurable sets that fill out $\mathbb{R}^2$ as $r \to \infty$. For
instance, we could take $S_r$ to be the disc of radius $r$ about the origin, or the square
of side length $r$ centered at the origin, or the rectangle of side lengths $r$ and $r^2$
centered at the origin, or the disc of radius $r$ centered at $(15, -37)$, and so on. The
difficulty is evident: There is a bewildering array of possibilities, with no rationale
for choosing one over another and no guarantee that different families $S_r$ will yield
the same limit.
Evidently there is some work to be done, and we shall not give all the details
here. The outcome, in a nutshell, is that everything goes well when the integrand is
nonnegative or when the integral is absolutely convergent, but not otherwise.

We begin by considering the situation where a nonnegative function $f$ is to be
integrated over a set $S \subset \mathbb{R}^n$. We suppose that $f$ is not integrable on $S$ according
to the definitions in §4.2, either because $S$ is unbounded or because $f$ is unbounded
on $S$. Instead, we assume the following:

(4.63)    $S$ is the union of an increasing sequence of sets $U_1, U_2, \ldots$,
          \[ S = \bigcup_{j=1}^{\infty} U_j \qquad (U_1 \subset U_2 \subset U_3 \subset \cdots), \]
          where each $U_j$ is measurable and $f$ is integrable on each $U_j$.

EXAMPLE 1. If $S = \mathbb{R}^n$ and $f$ is continuous on $\mathbb{R}^n$, we can take $U_j$ to be
the ball of radius $j$ about the origin. As noted above, there are many other
possibilities.
EXAMPLE 2. Suppose $f$ is continuous on $\mathbb{R}^n \setminus \{0\}$ but $f(x) \to \infty$ as $x \to 0$,
and $S$ is the ball $\{x : |x| \le 1\}$. Then we can take $U_j$ to be the spherical shell
$\{x : 1/j \le |x| \le 1\}$. (Strictly speaking, the union of the $U_j$’s is $S \setminus \{0\}$, but
this is immaterial: Omission of a single point, or any set of zero content, from
a domain has no effect on integration over that domain.)
With $S$, $f$, and $U_j$ as in (4.63), the integrals $\int\cdots\int_{U_j} f\,dV^n$ exist for all $j$, and
they increase along with $j$ since the sets $U_j$ do. It therefore follows from the
monotone sequence theorem that the limit
\[ \lim_{j \to \infty} \int\cdots\int_{U_j} f\,dV^n \]
always exists, provided that we allow $+\infty$ as a value, and this limit is an obvious
candidate for the value of the improper integral $\int\cdots\int_S f\,dV^n$.
Here is the crucial point: Suppose that $\{\widetilde{U}_j\}$ is another sequence of sets
satisfying the conditions of (4.63). Then the two limits
\[ \lim_{j \to \infty} \int\cdots\int_{U_j} f\,dV^n \qquad\text{and}\qquad \lim_{j \to \infty} \int\cdots\int_{\widetilde{U}_j} f\,dV^n \]
are equal. Therefore, it makes sense to define the integral of $f$ over $S$ by

(4.64)    \[ \int\cdots\int_S f\,dV^n = \lim_{j \to \infty} \int\cdots\int_{U_j} f\,dV^n, \]

where {Uj } is any sequence of sets satisfying the conditions of (4.63). It is un-
derstood that the value of the integral may be +∞, in which case we say that the
integral diverges.

The proof that the limit in (4.64) is independent of the choice of {Uj }, in full
generality, requires the Lebesgue theory of integration. We shall give a proof under
some additional restrictions on S and the Uj ’s, usually easy to satisfy in practice,
in Appendix B.6 (Theorem B.25).
It is also true that improper multiple integrals of nonnegative functions can be
evaluated as iterated improper integrals under suitable conditions on S and f so
that the latter integrals exist. For example,
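\[ \iint_{\mathbb{R}^2} f\,dA = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dy\,dx, \]
and if $S = \{(x, y) : 0 \le x \le y\}$,
\[ \iint_S f\,dA = \int_0^{\infty}\int_0^{y} f(x, y)\,dx\,dy = \int_0^{\infty}\int_x^{\infty} f(x, y)\,dy\,dx. \]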

We shall not attempt to state a general theorem to cover all the various cases (much
less give a precise proof), but we assure the reader that as long as the integrand is
nonnegative, there is almost never any difficulty.
The analogue of the comparison test, Theorem 4.55, is valid for multiple im-
proper integrals, with the same proof. Again the basic comparison functions are
powers of |x|, but the critical exponent depends on the dimension.

4.65 Proposition. For $p > 0$, define $f_p$ on $\mathbb{R}^n \setminus \{0\}$ by $f_p(x) = |x|^{-p}$. The integral
of $f_p$ over a ball $\{x : |x| < a\}$ is finite if and only if $p < n$; the integral of $f_p$ over
the complement of a ball, $\{x : |x| > a\}$, is finite if and only if $p > n$.

Proof. We present the proof when $n = 2$. The only singularity of $f_p$ is at the
origin, so we may use the annuli $\{x : \epsilon < |x| < a\}$ and $\{x : a < |x| < b\}$ as
approximating regions. In polar coordinates, the integrals then become
\[ \int_{\epsilon}^{a}\int_0^{2\pi} r^{-p}\,r\,d\theta\,dr, \qquad \int_a^b\int_0^{2\pi} r^{-p}\,r\,d\theta\,dr. \]
As $\epsilon \to 0$ and $b \to \infty$ we obtain $2\pi\int_0^a r^{1-p}\,dr$ and $2\pi\int_a^\infty r^{1-p}\,dr$, which are
convergent when $p < 2$ and $p > 2$, respectively.
The proof for general n is similar, using spherical coordinates and their ana-
logues in higher dimensions. The reader is invited to work out the case n = 3 in
Exercise 1.
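
The dichotomy in the case $n = 2$ is easy to observe numerically. The following Python sketch (the function name `annulus_integral` is ours) evaluates $2\pi\int r^{1-p}\,dr$ over an annulus in closed form, then shrinks the inner radius toward 0 and grows the outer radius toward infinity for $p = 1, 2, 3$.

```python
import math


def annulus_integral(p: float, r_in: float, r_out: float) -> float:
    """Integral of |x|^(-p) over r_in < |x| < r_out in the plane.

    In polar coordinates this is 2*pi times the integral of r^(1-p) dr.
    """
    if p == 2:
        return 2 * math.pi * math.log(r_out / r_in)
    return 2 * math.pi * (r_out ** (2 - p) - r_in ** (2 - p)) / (2 - p)


for p in (1.0, 2.0, 3.0):
    near_origin = [annulus_integral(p, eps, 1.0) for eps in (1e-2, 1e-4, 1e-8)]
    near_infinity = [annulus_integral(p, 1.0, b) for b in (1e2, 1e4, 1e8)]
    print(p, near_origin, near_infinity)
```

For $p = 1$ only the values near the origin settle down, for $p = 3$ only the values near infinity do, and for the critical exponent $p = 2$ both grow logarithmically.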

As another example of improper double integrals, we now perform a classic
calculation that leads to one of the most important formulas in mathematics.
Let us consider the integral
\[ \iint_{\mathbb{R}^2} e^{-x^2 - y^2}\,dA. \]
On the one hand, we can take the approximating regions $U_j$ to be discs centered at
the origin and switch to polar coordinates:
\[ \iint_{\mathbb{R}^2} e^{-x^2 - y^2}\,dA = \lim_{R \to \infty} \int_0^R\int_0^{2\pi} e^{-r^2}\,r\,d\theta\,dr = \int_0^\infty\int_0^{2\pi} e^{-r^2}\,r\,d\theta\,dr = 2\pi\Bigl[-\tfrac12 e^{-r^2}\Bigr]_0^\infty = \pi. \]
On the other hand, we can take the approximating regions to be squares centered at
the origin and stick to Cartesian coordinates:
\[ \iint_{\mathbb{R}^2} e^{-x^2 - y^2}\,dA = \lim_{R \to \infty} \int_{-R}^{R}\int_{-R}^{R} e^{-x^2} e^{-y^2}\,dx\,dy = \Bigl(\int_{-\infty}^{\infty} e^{-x^2}\,dx\Bigr)\Bigl(\int_{-\infty}^{\infty} e^{-y^2}\,dy\Bigr). \]
The two integrals in parentheses are equal, of course; the name of the variable of
integration is irrelevant. We have shown that
\[ \Bigl(\int_{-\infty}^{\infty} e^{-x^2}\,dx\Bigr)^2 = \pi. \]
Since $e^{-x^2} > 0$, we can take the positive square root of both sides to obtain the
magic formula:

4.66 Proposition. $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$.

The function $e^{-x^2}$ turns up in many contexts. In particular, it is essentially
the “bell curve” or “normal distribution” of probability and statistics, but in that
setting one must rescale it so that the total area under its graph is 1; Proposition
4.66 provides the appropriate scaling factor. Proposition 4.66 is remarkable not
only because it is inaccessible by elementary calculus (the antiderivative of $e^{-x^2}$ is
not an elementary function) but because it presents the number $\pi$ in a starring role
that has nothing to do with circles.
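
Proposition 4.66 can be checked numerically without any special functions. The sketch below is an illustration only: the function name `gaussian_integral`, the cutoff $R = 8$, and the number of midpoint nodes are our own choices, justified by the fact that the tail of $e^{-x^2}$ beyond $|x| = 8$ is negligible.

```python
import math


def gaussian_integral(R: float = 8.0, n: int = 4000) -> float:
    """Midpoint-rule approximation of the integral of exp(-x^2) over [-R, R]."""
    h = 2 * R / n
    return h * sum(math.exp(-((-R + (k + 0.5) * h) ** 2)) for k in range(n))


approx = gaussian_integral()
print(approx, math.sqrt(math.pi))   # the two values should agree closely
print(approx ** 2, math.pi)         # squaring recovers pi, as in the proof
```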

Now, what about functions that are not nonnegative? Let us suppose that $S$, $f$,
and $\{U_j\}$ are as in (4.63), but $f$ is merely assumed to be real-valued. The essential
point is that the preceding theory can be applied to $|f|$, so that it makes sense to say
that $\int\cdots\int_S |f|\,dV^n$ converges. If this condition holds, the argument used to prove
Theorem 4.58 shows that $\lim_{j \to \infty} \int\cdots\int_{U_j} f\,dV^n$ exists and that
\[ \lim_{j \to \infty} \int\cdots\int_{U_j} f\,dV^n = \int\cdots\int_S f^+\,dV^n - \int\cdots\int_S f^-\,dV^n, \]
where $f^+(x) = \max[f(x), 0]$ and $f^-(x) = \max[-f(x), 0]$. The integrals on the
right converge by comparison to the integral of $|f|$, and they are independent of
the choice of $\{U_j\}$; hence, so is the limit on the left. In short, if $\int\cdots\int_S |f|\,dV^n$
converges, we may define the improper integral of $f$ over $S$ by formula (4.64); the
limit in question exists and is independent of the choice of approximating sequence
$\{U_j\}$.
The same result holds if f is complex-valued; we simply consider its real and
imaginary parts separately.
In dimensions n > 1, however, there is no general theory of improper integrals
that are convergent but not absolutely convergent. Such integrals, when they arise,
must be defined by specific limiting procedures that are adapted to the situation at
hand.

EXERCISES

1. Prove Proposition 4.65 for the case n = 3.


2. Determine whether the following improper integrals converge, and evaluate the
   ones that do.
   a. $\iiint_{\mathbb{R}^3} \frac{dV}{1 + x^2 + y^2 + z^2}$.
   b. $\iint_{x, y > 0} \frac{dA}{(1 + x^2 + y^2)^2}$.
   c. $\iiint_{x^2 + y^2 + z^2 < 1} \frac{z^2}{(x^2 + y^2 + z^2)^{3/2}}\,dV$.
   d. $\iint_{x > 0} x e^{-x^2 - y^2}\,dA$.
   e. $\iint_{x^2 + y^2 < 1} \frac{x^2}{(x^2 + y^2)^2}\,dA$.

3. The electrostatic potential generated by a distribution of electric charge in $\mathbb{R}^3$
   with density $\rho$ is defined to be
   \[ \phi(x) = \iiint_{\mathbb{R}^3} \frac{\rho(x - y)}{4\pi|y|}\,d^3y. \]
   Show that this integral is absolutely convergent if $\rho$ is continuous and vanishes
   outside a bounded set.
4. Let $f(x, y) = (x^2 - y^2)(x^2 + y^2)^{-2}$, and let $S$ be the unit square $[0, 1] \times [0, 1]$.
   a. Show that $\iint_S |f|\,dA = \infty$.
   b. Show by explicit calculation that the iterated integrals $\int_0^1\int_0^1 f(x, y)\,dx\,dy$
      and $\int_0^1\int_0^1 f(x, y)\,dy\,dx$ both exist and are unequal.

4.8 Lebesgue Measure and the Lebesgue Integral


In several places in this book we allude to the fact that in advanced analysis, the
Riemann theory of integration that we have developed here is replaced by the more
sophisticated theory due to Lebesgue. Detailed accounts of the Lebesgue integral
can be found in Bear [3], Jones [9], and Rudin [18]. Here we shall content our-
selves with a brief informal description of how it works. (Note: There are several
ways to develop the Lebesgue theory of integration; in some treatments, the char-
acterization of Lebesgue measure and the Lebesgue integral that we give here are
theorems rather than definitions.) In a few places we need the notion of the sum of
an infinite series, for which the reader is referred to §6.1.
The starting point is a refined concept of n-dimensional measure, independent
of any theory of integration. To keep things on a concrete level, let us explain this
concept for the case n = 2.
In the Jordan theory of area, described in §4.2, we find the area of a set $S \subset \mathbb{R}^2$
by approximating S from the inside and the outside by unions of rectangles. For
the Lebesgue notion of area, we use a two-step approximation process: We first
approximate S from the inside by compact sets and from the outside by open sets,
then approximate the compact sets from the outside and the open sets from the
inside by unions of rectangles. More precisely, let us agree to call a set that is
the union of a finite collection of rectangles with disjoint interiors a tiled set. The
Lebesgue measure $m(S)$ of a set $S \subset \mathbb{R}^2$ is then defined as follows:

• If $T = \bigcup_{k=1}^{K} R_k$ is a tiled set, where the $R_k$’s are rectangles with disjoint
  interiors, the Lebesgue measure $m(T)$ is the sum of the areas of the $R_k$’s.

• The Lebesgue measure of a compact set $K$ is
  \[ m(K) = \inf\{\, m(T) : T \text{ is a tiled set and } T \supset K \,\}. \]

• The Lebesgue measure of an open set $U$ is
  \[ m(U) = \sup\{\, m(T) : T \text{ is a tiled set and } T \subset U \,\}. \]

• A set $S \subset \mathbb{R}^2$ is said to be Lebesgue measurable if the quantities
  \[ \sup\{\, m(K) : K \text{ is compact and } K \subset S \,\} \]
  and
  \[ \inf\{\, m(U) : U \text{ is open and } U \supset S \,\} \]
  are equal, in which case their common value is the Lebesgue measure $m(S)$.

Note that there is no assumption that the sets in question are bounded (although
compact sets are bounded by definition); the Lebesgue theory applies equally well
to bounded and unbounded sets.
The notion of $n$-dimensional Lebesgue measure for sets in $\mathbb{R}^n$ is entirely similar;
only the terminology needs to be modified a little. Every set that one will ever
meet in “real life” (in particular, every open set, every closed set, every intersection
of countably many open sets, every union of countably many closed sets, and
so on) is Lebesgue measurable.³ Lebesgue measure has the following fundamental
additivity property: If $\{S_j\}$ is a finite or infinite sequence of disjoint Lebesgue
measurable sets, then $\bigcup_j S_j$ is Lebesgue measurable and $m\bigl(\bigcup_j S_j\bigr) = \sum_j m(S_j)$. In
the Jordan theory, this additivity is guaranteed to hold only for finitely many sets;
the extension to infinitely many sets is the crucial property that allows the Lebesgue
theory to handle various limiting processes more smoothly.
It is not hard to show that every open set $U \subset \mathbb{R}^n$ is the union of a finite or
countably infinite family of rectangular boxes Rj (intervals when n = 1) with dis-
joint interiors, and the Lebesgue measure of U is just the sum of the n-dimensional
volumes of the boxes. (In general these boxes are not part of a fixed grid of boxes;
if there are infinitely many of them, the diameter of Rj generally tends to zero as
$j \to \infty$.) It follows that a set $S \subset \mathbb{R}^n$ has Lebesgue measure zero if and only if for
every ϵ > 0, S is contained in the union of a finite or countable family of boxes,
the sum of whose volumes is less than ϵ. The only difference between this and the
condition that S have zero content is the fact that here we allow an infinite family
of boxes, but as with additivity, this difference is significant. In particular, every
countable set has Lebesgue measure zero (if $S = \{x_1, x_2, \ldots\}$, let $R_j$ be a box
centered at $x_j$ with volume $2^{-j}\epsilon$), whereas many countable sets (the set of points
with rational coordinates, for example) are not Jordan measurable.

³ For those who know some set theory: More precisely, one cannot construct Lebesgue
nonmeasurable sets without invoking the axiom of choice.

With the notion of Lebesgue measure in hand, we turn to the Lebesgue inte-
gral. First we specify the class of functions to which the theory applies. A function
$f : \mathbb{R}^n \to \mathbb{R}$ is called Lebesgue measurable if, for every interval $I \subset \mathbb{R}$, the
set $\{x \in \mathbb{R}^n : f(x) \in I\}$ is Lebesgue measurable. Again, every function that
one will ever meet in “real life” is Lebesgue measurable. In particular, every con-
tinuous function is Lebesgue measurable, and if f is Riemann integrable on the
Jordan measurable set S, then f χS is Lebesgue measurable. Moreover, if {fj } is
a sequence of Lebesgue measurable functions such that fj (x) → f (x) for every
x, then the limit f is Lebesgue measurable. (This last statement is quite false if
“Lebesgue measurable” is replaced by “Riemann integrable”!)
Suppose that $f$ is Lebesgue measurable and nonnegative. Rather than partitioning
the domain of $f$, we partition the set $[0, \infty)$ in which $f$ takes its values into
small intervals $[0, 2^{-n})$, $[2^{-n}, 2 \cdot 2^{-n})$, $[2 \cdot 2^{-n}, 3 \cdot 2^{-n})$, and so on, and form the
sum
\[ S_n f = \sum_{j=0}^{\infty} \frac{j}{2^n}\, m\Bigl(\Bigl\{ x : \frac{j}{2^n} \le f(x) < \frac{j+1}{2^n} \Bigr\}\Bigr). \]
(The Lebesgue measurability of $f$ is needed so that the terms in this sum are well
defined. One or more of them may be infinite, in which case the value of the
sum is $+\infty$.) The sums $S_n f$ increase with $n$ because the associated partitions of
$[0, \infty)$ become finer and finer, so they have a limit (possibly $+\infty$), which is defined
to be the Lebesgue integral of $f$ (over $\mathbb{R}^n$), denoted by $\int f\,dm$. More generally,
we define the Lebesgue integral of $f$ over any Lebesgue measurable set $S \subset \mathbb{R}^n$,
denoted by $\int_S f\,dm$, to be $\int (f\chi_S)\,dm$. Note that neither the function $f$ nor the
set $S$ needs to be bounded; for nonnegative integrands there are no “improper”
integrals in the Lebesgue theory.
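
To make the sums $S_n f$ concrete, here is a small Python sketch for the function $f(x) = x$ on $[0, 1]$ (and $f = 0$ elsewhere), in dimension one. The function name `lebesgue_sum` and the choice of example are ours. For $0 \le j < 2^n$ the set where $j/2^n \le f(x) < (j+1)/2^n$ is, for $j \ge 1$, an interval of length $2^{-n}$; the $j = 0$ term contributes nothing, assuming the usual convention $0 \cdot \infty = 0$ (that set has infinite measure because $f$ vanishes outside $[0, 1]$); and the term $j = 2^n$ involves only the single point $x = 1$. Exact rational arithmetic shows $S_n f = (2^n - 1)/2^{n+1}$, which increases to the expected value $\tfrac12$.

```python
from fractions import Fraction


def lebesgue_sum(n: int) -> Fraction:
    """S_n f for f(x) = x on [0, 1] and f = 0 elsewhere (dimension one)."""
    step = Fraction(1, 2 ** n)
    total = Fraction(0)
    for j in range(2 ** n):
        level = j * step    # the value j / 2^n attached to the j-th slab
        measure = step      # length of {x in [0, 1] : j/2^n <= x < (j+1)/2^n}
        total += level * measure
    return total


for n in range(1, 8):
    print(n, lebesgue_sum(n), float(lebesgue_sum(n)))
```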
Now we drop the assumption that $f \ge 0$. If $f$ is any Lebesgue measurable
function, we write it as the difference of the two nonnegative functions
\[ f^+(x) = \max[f(x), 0] \qquad\text{and}\qquad f^-(x) = \max[-f(x), 0] \]
and define the Lebesgue integral $\int f\,dm$ to be $\int f^+\,dm - \int f^-\,dm$. The integral
$\int f\,dm$ is not defined in the case where $\int f^+\,dm$ and $\int f^-\,dm$ are both infinite,
although in some instances one can define it as an “improper” integral by limiting
procedures such as those in §4.6. (Example 3 in §4.6 illustrates this phenomenon.)
The Lebesgue integral is an extension of the Riemann integral. That is, if
the (proper) Riemann integral $\int_S f\,dV^n$ exists, then so does the Lebesgue integral
$\int_S f\,dm$, and the two are equal; but the class of Lebesgue integrable functions is
much bigger than the class of Riemann integrable functions. We conclude with two
additional remarks about the relation between the Lebesgue and Riemann integrals.

• The notion of Lebesgue measure provides a definitive answer to the question
  of which functions are Riemann integrable. Namely, a function $f : \mathbb{R}^n \to \mathbb{R}$
  is Riemann integrable on the bounded set $S$ if and only if $f$ is bounded on $S$
  and the set of points at which $f\chi_S$ is discontinuous has Lebesgue measure
  zero. (Cf. Theorems 4.13 and 4.18 and the discussion of zero content versus
  zero measure above.)

• There is a way of giving the Riemann theory of integration an extra twist to
  obtain an integral, called the Henstock-Kurzweil integral, generalized Riemann
  integral, or gauge integral, that is equivalent to the Lebesgue integral
  for nonnegative functions but also gives a well-defined result for some functions
  $f$ for which $\int f^+\,dm$ and $\int f^-\,dm$ are both infinite. See Bartle [2] for
  a brief introduction and DePree and Swartz [5] for a complete treatment. The
  virtue of this theory is that it yields a powerful theory of integration within
  the same conceptual framework as the familiar Riemann integral without the
  necessity of developing a theory of measure first. The compensating virtue
  of the Lebesgue theory is that it generalizes readily to yield useful notions of
  measure and integration in many important situations other than the classical
  integral on Euclidean space.
Chapter 5

LINE AND SURFACE INTEGRALS; VECTOR ANALYSIS

The themes of this chapter are (1) integrals over curves and surfaces and (2) differ-
ential operations on vector fields, which combine to yield (3) a group of theorems
relating integrals over curves, surfaces, and regions in space that are among the
most powerful and useful results of advanced calculus.
At the outset, let us explain the term “vector field” in more detail. Let F be
an $\mathbb{R}^n$-valued function defined on some subset of $\mathbb{R}^n$. We have encountered such
things in previous chapters, where we generally thought of them as representing
transformations from one region of $\mathbb{R}^n$ to another or coordinate systems on regions
of $\mathbb{R}^n$. In this chapter, however, we think of such an $F$ as a function that assigns to
each point x in its domain a vector F(x), represented pictorially as an arrow based
at x, and we therefore call it a vector field. Two simple vector fields are sketched in
Figure 5.1. The primary physical motivation is the idea of a force field. For exam-
ple, F could represent a gravitational field, F(x) being the gravitational force felt
by a unit mass located at x, or an electric field, F(x) being the electrostatic force
felt by a unit charge located at x. There are many other physical interpretations; for
example, in a moving fluid like a stream of water, F(x) could represent the velocity
of the fluid at position x. (In all these examples, F(x) may also depend on other
parameters such as the time t.)
One other general comment: The notion of differentiability, or being of class
$C^k$, is defined for functions on open sets, because to compute the derivative of a
function at a point it is necessary to know the values of the function at neighboring
points. However, we shall frequently be dealing with functions and vector fields on
closed sets. When we say that a function or vector field is of class $C^k$ on a closed
set $S \subset \mathbb{R}^n$, we always mean that it is of class $C^k$ on some open set containing $S$.

FIGURE 5.1: The vector fields $F(x, y) = (x, y)$ (left) and $F(x, y) = (-y, x)$ (right).

5.1 Arc Length and Line Integrals


In this section we discuss integrals over curves, traditionally called “line integrals,”
which are generalizations of ordinary (one-dimensional) integrals over intervals on
the real line. As one would expect, they are based on the idea of cutting up the curve
into many tiny pieces, forming appropriate Riemann sums, and passing to the limit.
However, there are two species of line integrals, appropriate for integrating
real-valued or vector-valued functions, depending on how one adapts the differential $dx$
appearing in $\int_a^b f(x)\,dx$ to the more general situation. Our discussion here will
be on the informal, intuitive level where we think of $dx$ as being an infinitesimal
increment in the variable $x$.

Differentials on Curves; Arc Length. Suppose $C$ is a smooth curve in $\mathbb{R}^n$.
We consider two nearby points $x$ and $x + dx$ on the curve; here

(5.1)    \[ dx = (dx_1, \ldots, dx_n) \]

is the vector difference between the two points, and we imagine it as being infinitely
small. We may, however, be more interested in the distance between the two points,
traditionally denoted by $ds$, which is

(5.2)    \[ ds = |dx| = \sqrt{dx_1^2 + \cdots + dx_n^2}. \]


To give these differentials a precise meaning that can be used for calculations, the
best procedure is to parametrize the curve. Thus, we assume that $C$ is given by
parametric equations $x = g(t)$, $a \le t \le b$, where $g$ is of class $C^1$ and $g'(t) \ne 0$.
Then the neighboring points $x$ and $x + dx$ are given by $g(t)$ and $g(t + dt)$, so

(5.3)    \[ dx = g(t + dt) - g(t) = g'(t)\,dt = \Bigl(\frac{dx_1}{dt}, \ldots, \frac{dx_n}{dt}\Bigr)\,dt. \]

(The difference between the increment of $g$ and its linear approximation disappears
in the infinitesimal limit.) Moreover,

(5.4)    \[ |dx| = |g'(t)|\,dt = \sqrt{\Bigl(\frac{dx_1}{dt}\Bigr)^2 + \cdots + \Bigl(\frac{dx_n}{dt}\Bigr)^2}\,dt, \]

which is just what one gets by formally multiplying and dividing the expression on
the right of (5.2) by dt.
What happens if we sum up all the infinitesimal increments dx or ds — that
is, if we integrate the differentials dx or ds = |dx| over the curve? Integration of
the vector increments dx just gives the total vector increment, that is, the vector
difference between the initial and final points on the curve:
(5.5)    \[ \int_C dx = \int_a^b g'(t)\,dt = g(b) - g(a). \]

This is nothing but the fundamental theorem of calculus applied to the components
of $g$; it is simple but not very exciting. On the other hand, $ds$ is the straight-line
distance between two infinitesimally close points $x$ and $x + dx$ on the curve, and
since smooth curves are indistinguishable from their linear approximations on the
infinitesimal level, $ds$ is the arc length of the bit of curve between $x$ and $x + dx$.
Adding these up gives the total arc length of the curve:

(5.6)    \[ \text{Arc length} = \int_C ds = \int_a^b |g'(t)|\,dt. \]

Our derivation of (5.6) in terms of infinitesimals was meant as motivation rather
than as a rigorous proof of anything. Henceforth, we shall take (5.6) as a definition
of arc length for a smooth curve. (There is another, perhaps better, definition that
does not require the curve to be $C^1$; we shall discuss it at the end of this section.)
There is, however, one crucial issue that must be addressed: The arc length of a
curve $C$ is an intrinsic property of the geometric object $C$ and should not depend
on the particular parametrization we use. To see that this is the case, suppose we
choose a new parameter $u$ related to $t$ by $t = \phi(u)$, where $\phi$ is a one-to-one smooth
mapping from the interval $[c, d]$ to the interval $[a, b]$. Then the curve $C$ described
by $x = g(t)$ is also described by $x = (g \circ \phi)(u)$, $c \le u \le d$, so we should have
\[ \text{Arc length} = \int_c^d |(g \circ \phi)'(u)|\,du = \int_c^d |g'(\phi(u))|\,|\phi'(u)|\,du, \]
where for the second equality we have used the chain rule. This does indeed agree
with (5.6), by formula (4.34).

FIGURE 5.2: Two oriented curves.
The same independence of parametrization holds for the related integral (5.5),
with one subtle but important difference. The integral $\int_a^b g'(t)\,dt$ gives the vector
difference between the two endpoints of the curve, which is clearly independent of
the parametrization except insofar as the parametrization determines which is the
initial point and which is the final point. If we choose a new parameter u as above
so that t is a decreasing function of u (thus a = ϕ(d) and b = ϕ(c)), then the initial
and final points get switched, and so their difference is multiplied by −1.
The issue here is that a parametrization x = g(t) determines an orientation for
the curve C, that is, a determination of which direction along the curve is “forward”
and which direction is “backward,” the “forward” direction being the direction in
which the point g(t) moves as t increases. The orientation of a curve can be conve-
niently indicated in a picture by drawing one or more arrowheads along the curve
that point in the “forward” direction, as indicated in Figure 5.2. The substance of
the preceding paragraph is then that the integral (5.5) depends on the parametriza-
tion only insofar as the parametrization determines a choice of orientation. In
contrast, the arc length of a curve is independent even of the orientation.
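
The independence of arc length from both the parametrization and the orientation can be checked numerically. The sketch below is only an illustration: the curve (the parabola $g(t) = (t, t^2)$, $0 \le t \le 1$), the reparametrization $t = \phi(u) = (u + u^2)/2$, the reversal $t = 1 - u$, and all function names are our own choices, and the speed $|g'(t)|$ is estimated by a central difference rather than computed symbolically.

```python
import math


def speed(curve, t, h=1e-6):
    """|g'(t)| estimated by a central difference in each component."""
    ahead, behind = curve(t + h), curve(t - h)
    return math.sqrt(sum(((p - q) / (2 * h)) ** 2 for p, q in zip(ahead, behind)))


def arc_length(curve, a, b, n=2000):
    """Midpoint-rule approximation of (5.6), the integral of |g'(t)| over [a, b]."""
    step = (b - a) / n
    return step * sum(speed(curve, a + (k + 0.5) * step) for k in range(n))


def parabola(t):
    return (t, t * t)


def reparametrized(u):
    return parabola((u + u * u) / 2)    # t = phi(u), increasing on [0, 1]


def reversed_parabola(u):
    return parabola(1 - u)              # t = 1 - u, opposite orientation


for curve in (parabola, reparametrized, reversed_parabola):
    print(curve.__name__, arc_length(curve, 0.0, 1.0))
```

All three values agree with the closed form $\tfrac14\bigl(2\sqrt5 + \sinh^{-1} 2\bigr)$ for this parabola, up to discretization error.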
The notion of arc length extends in an obvious way to piecewise smooth curves,
obtained by joining finitely many smooth curves together end-to-end but allow-
ing corners or cusps at the joining points; we simply compute the lengths of the
smooth pieces and add them up. We can express this more precisely in terms of
parametrizations, as follows: The function $g : [a, b] \to \mathbb{R}^n$ is called piecewise
smooth if (i) it is continuous, and (ii) its derivative exists and is continuous except
perhaps at finitely many points $t_j$ where the one-sided limits $\lim_{t \to t_j \pm} g'(t)$ exist.
(Note. In Chapter 8 we shall use the term “piecewise smooth” in a slightly different
sense.) In this case $|g'(t)|$ is an integrable function on $[a, b]$ by Theorem 4.12 (the
fact that it may be undefined at a few points is immaterial), and its integral gives
the arc length. The same generalization also applies to the line integrals discussed
below.
Remarks.
i. The parametrization $x = g(t)$ may be considered as representing the curve $C$
   as the path traced out by a moving particle whose position at time $t$ is $g(t)$.
   The derivative $g'(t)$ is then the velocity of the particle, and its norm $|g'(t)|$
   is the speed of the particle. Integrating the velocity, $\int_a^b g'(t)\,dt$, gives the net
   difference in the initial and final positions of the particle, whereas integrating
   the speed, $\int_a^b |g'(t)|\,dt$, gives the total distance traveled by the particle, i.e., the
   arc length of the curve.
ii. In the preceding discussion, we have implicitly assumed that the parametrization
   $x = g(t)$ is one-to-one. This is not always the case if we think of $g(t)$
   as the position of a particle at time $t$, for the particle can traverse a path more
   than once. For example, $g(t) = (\cos t, \sin t)$ represents a particle moving
   around the unit circle with constant speed. If we restrict $t$ to an interval of
   length $\le 2\pi$, we get a one-to-one parametrization of part or all of the circle,
   but from the physical point of view there is no reason to make such a restriction.
   However, the interpretations in the preceding paragraph hold whether $g$ is
   one-to-one or not: $\int_a^b g'(t)\,dt$ is still $g(b) - g(a)$, and $\int_a^b |g'(t)|\,dt$ is still the total
   distance traveled by the particle from time $a$ to time $b$; it can be interpreted
   as arc length if the portions of the curve that are traversed more than once are
   counted with the appropriate multiplicity.
iii. While theoretically simple, calculation of arc length tends to be difficult in
practice because the square root implicit in the definition of the norm |g′ (t)|
often leads to unpleasant integrands. This is just a fact of life.

Line Integrals of Scalar Functions. If $f$ is a continuous function whose domain
includes a smooth (or piecewise smooth) curve $C$ in $\mathbb{R}^n$, we can integrate $f$
over the curve, taking the differential in the integral to be the element of arc length
$ds$. Thus, if $C$ is parametrized by $x = g(t)$, $a \le t \le b$, we define

(5.7)    \[ \int_C f\,ds = \int_a^b f(g(t))\,|g'(t)|\,dt. \]

This is independent of the parametrization and the orientation, by the same
chain-rule calculation that we performed above for the case $f \equiv 1$.

As an example of an application of such integrals, we can define the average
value of $f$ over the curve $C$, just like the average value over a region:
\[ \text{Average of } f \text{ over } C = \frac{\int_C f\,ds}{\text{Arc length of } C} = \frac{\int_C f\,ds}{\int_C ds}. \]
EXAMPLE 1. What is the centroid of the upper half of the unit circle,
$C = \{(x, y) : x^2 + y^2 = 1,\ y \ge 0\}$?
Solution. The centroid of $C$ is the point whose coordinates $(\bar{x}, \bar{y})$ are the
averages of $x$ and $y$ over $C$. Clearly $\bar{x} = 0$ by symmetry. Just to get some
practice, let’s do the calculation of the arc length of $C$ (which of course is $\pi$)
and $\int_C y\,ds$ with two different parametrizations: (i) taking $x$ as the parameter
and $y = \sqrt{1 - x^2}$, and (ii) taking the polar angle $\theta$ as the parameter, $x = \cos\theta$,
$y = \sin\theta$. (Note that these two parametrizations give opposite orientations on
$C$; the first goes from left to right, the second from right to left.)
In the first parametrization, we have
\[
dy = \frac{-x\,dx}{\sqrt{1 - x^2}}, \qquad
ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + \frac{x^2}{1 - x^2}}\,dx = \frac{dx}{\sqrt{1 - x^2}},
\]
so
\[
\int_C y\,ds = \int_{-1}^{1} \frac{\sqrt{1 - x^2}}{\sqrt{1 - x^2}}\,dx = x\Big|_{-1}^{1} = 2,
\qquad
\int_C ds = \int_{-1}^{1} \frac{dx}{\sqrt{1 - x^2}} = \arcsin x\Big|_{-1}^{1} = \pi.
\]
In the second one, we have
\[
dx = -\sin\theta\,d\theta, \qquad dy = \cos\theta\,d\theta; \qquad ds = \sqrt{dx^2 + dy^2} = d\theta,
\]
so
\[
\int_C y\,ds = \int_0^{\pi} \sin\theta\,d\theta = -\cos\theta\Big|_0^{\pi} = 2,
\qquad
\int_C ds = \int_0^{\pi} d\theta = \pi.
\]
Either way, $\bar{y} = 2/\pi$.
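As a numerical cross-check of Example 1 (a sketch added for illustration, not part of the original text), the following Python snippet approximates $\int_C y\,ds$ and the arc length by an inscribed polygon, once for each parametrization. The helper names `scalar_line_integral`, `by_x`, `by_theta`, and `centroid_height`, and the choice of 20000 segments, are assumptions made here.

```python
import math

def scalar_line_integral(f, g, a, b, n=20000):
    """Approximate the integral of f over the curve g([a, b]) with respect to arc length.

    The curve is replaced by an inscribed polygon with n segments; f is sampled
    at the parameter midpoint of each segment and weighted by the chord length.
    """
    total = 0.0
    prev = g(a)
    for j in range(1, n + 1):
        t_mid = a + (b - a) * (j - 0.5) / n
        cur = g(a + (b - a) * j / n)
        total += f(*g(t_mid)) * math.dist(prev, cur)
        prev = cur
    return total

# (i) x as the parameter, left to right; max() guards against rounding below 0.
def by_x(x):
    return (x, math.sqrt(max(0.0, 1.0 - x * x)))

# (ii) polar angle as the parameter, right to left.
def by_theta(theta):
    return (math.cos(theta), math.sin(theta))

def centroid_height(g, a, b):
    """Average of y over the curve: (integral of y ds) / (arc length)."""
    length = scalar_line_integral(lambda x, y: 1.0, g, a, b)
    moment = scalar_line_integral(lambda x, y: y, g, a, b)
    return moment / length

y_bar_x = centroid_height(by_x, -1.0, 1.0)
y_bar_theta = centroid_height(by_theta, 0.0, math.pi)
print(y_bar_x, y_bar_theta, 2 / math.pi)
```

Both computed values should be close to $2/\pi$, even though the two parametrizations have opposite orientations. The x-parametrization converges more slowly because its speed $|\mathbf{g}'|$ is unbounded near $x = \pm 1$.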

Line Integrals of Vector Fields. We can define the integral of an $\mathbb{R}^m$-valued
function over a curve in $\mathbb{R}^n$, simply by integrating each component separately; that
is, if $\mathbf{F} = (F_1, \ldots, F_m)$, then
$\int_C \mathbf{F}\,ds = \bigl(\int_C F_1\,ds, \ldots, \int_C F_m\,ds\bigr)$. There is not
much to be said about such integrals that does not follow immediately from the
facts about scalar-valued integrals. One significant fact, however, does require a
little extra proof, namely the analogue of Theorem 4.9d. We state it for ordinary
integrals over [a, b]; the generalization to integrals over curves is easy (Exercise 7a).

5.8 Proposition. If $\mathbf{F}$ is a continuous $\mathbb{R}^m$-valued function on $[a, b]$, then
\[
\left| \int_a^b \mathbf{F}(t)\,dt \right| \le \int_a^b |\mathbf{F}(t)|\,dt.
\]

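Proof. For any unit vector $\mathbf{u}$, we have
\[
\left| \left( \int_a^b \mathbf{F}(t)\,dt \right) \cdot \mathbf{u} \right|
= \left| \int_a^b \mathbf{F}(t) \cdot \mathbf{u}\,dt \right|
\le \int_a^b |\mathbf{F}(t) \cdot \mathbf{u}|\,dt
\le \int_a^b |\mathbf{F}(t)|\,dt.
\]
Here we have applied Theorem 4.9d to the scalar-valued function $\mathbf{F}(t) \cdot \mathbf{u}$ and then
invoked Cauchy’s inequality. If $\int_a^b \mathbf{F}(t)\,dt = \mathbf{0}$ there is nothing to prove; otherwise
the desired result is obtained by taking $\mathbf{u}$ to be the unit vector in the direction of
$\int_a^b \mathbf{F}(t)\,dt$, so that the left side becomes $\bigl| \int_a^b \mathbf{F}(t)\,dt \bigr|$.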

Of greater interest is a scalar-valued line integral for vector fields, that is, for
$\mathbb{R}^n$-valued functions on $\mathbb{R}^n$. If $C$ is a smooth (or piecewise smooth) curve in $\mathbb{R}^n$
and $\mathbf{F}$ is a continuous vector field defined on some neighborhood of $C$ in $\mathbb{R}^n$, the
line integral of $\mathbf{F}$ over $C$ is
\[
\int_C \mathbf{F} \cdot d\mathbf{x} = \int_C (F_1\,dx_1 + F_2\,dx_2 + \cdots + F_n\,dx_n).
\]
That is, if $C$ is described parametrically by $\mathbf{x} = \mathbf{g}(t)$, $a \le t \le b$, then
\[
\int_C \mathbf{F} \cdot d\mathbf{x} = \int_a^b \mathbf{F}(\mathbf{g}(t)) \cdot \mathbf{g}'(t)\,dt. \tag{5.9}
\]

If we make a change of parameters, say $t = \varphi(u)$, the chain rule
$\mathbf{g}'(t)\,dt = \mathbf{g}'(\varphi(u))\varphi'(u)\,du$ together with the change-of-variable formula for ordinary
(single) integrals guarantees that the quantity on the right of (5.9) is unchanged, except
that the new endpoints of integration may end up in the wrong order. Therefore:
The line integral $\int_C \mathbf{F} \cdot d\mathbf{x}$ is independent of the parametrization as long as
the orientation is unchanged, but it acquires a factor of $-1$ if the orientation is
reversed. That is, $\int_C \mathbf{F} \cdot d\mathbf{x}$ is a well-defined quantity once the vector field $\mathbf{F}$ and
the oriented curve $C$ are specified.
The line integral $\int_C \mathbf{F} \cdot d\mathbf{x}$ can be expressed as an integral of a scalar function
over $C$. Indeed, let us choose a parametrization $\mathbf{x} = \mathbf{g}(t)$ and set
\[
\mathbf{t}(\mathbf{g}(t)) = \frac{\mathbf{g}'(t)}{|\mathbf{g}'(t)|}, \qquad
F_{\mathrm{tang}}(\mathbf{x}) = \mathbf{F}(\mathbf{x}) \cdot \mathbf{t}(\mathbf{x}).
\]
That is, $\mathbf{t}(\mathbf{x})$ is the unit tangent vector to the curve $C$ in the forward direction at the
point $\mathbf{x}$, and $F_{\mathrm{tang}}(\mathbf{x})$ is the component of $\mathbf{F}(\mathbf{x})$ in the direction of $\mathbf{t}(\mathbf{x})$. Then
\[
\mathbf{F}(\mathbf{g}(t)) \cdot \mathbf{g}'(t)\,dt
= \mathbf{F}(\mathbf{g}(t)) \cdot \mathbf{t}(\mathbf{g}(t))\,|\mathbf{g}'(t)|\,dt
= F_{\mathrm{tang}}(\mathbf{g}(t))\,ds,
\]

so
\[
\int_C \mathbf{F} \cdot d\mathbf{x} = \int_C F_{\mathrm{tang}}\,ds. \tag{5.10}
\]
That is, $\int_C \mathbf{F} \cdot d\mathbf{x}$ is the integral of the tangential component of $\mathbf{F}$ with respect to
arc length. The dependence on the orientation here comes through $F_{\mathrm{tang}}$, which
changes sign if the orientation is reversed. (Any temptation to compute specific
line integrals by using (5.10), however, should probably be resisted, because the
element of arc length $ds$ is often hard to compute with. It is almost always better to
use the basic definition (5.9) instead.)
Remarks.
i. If $\mathbf{F}$ is a force field, then $\int_C \mathbf{F} \cdot d\mathbf{x}$ represents a quantity of energy; it is the
work done by the force on a particle that traverses the curve $C$.
ii. The integrand F · dx = F1 dx1 + · · · + Fn dxn in a line integral, with the
dx’s included, is often called a differential form, and we speak of integrating
a differential form over a curve. We shall return to this notion in §5.9.
What does all this boil down to when n = 1? In this case, vector fields and
scalar functions are the same thing, and both the scalar and vector versions of line
integrals are just ordinary one-variable integrals. The former, however, is indepen-
dent of orientation, whereas the latter depends on orientation. The distinction is the
same as the one between formulas (4.32) and (4.33) in §4.4; it is a question of
\[
\int_{[a,b]} f(x)\,dx \quad \text{versus} \quad \int_a^b f(x)\,dx.
\]

In the integral on the left we must have a ≤ b; but in the integral on the right a and
b can occur in either order, and the sign of the integral depends on the order.
EXAMPLE 2. Let $C$ be the ellipse formed by the intersection of the circular
cylinder $x^2 + y^2 = 1$ and the plane $z = 2y + 1$, oriented counterclockwise
as viewed from above, and let $\mathbf{F}(x, y, z) = (y, z, x)$. Calculate
$\int_C \mathbf{F} \cdot d\mathbf{x} = \int_C (y\,dx + z\,dy + x\,dz)$.
Solution. We can parametrize $C$ by $x = \cos t$, $y = \sin t$, $z = 2\sin t + 1$,
with $0 \le t \le 2\pi$. Then $d\mathbf{x} = (-\sin t, \cos t, 2\cos t)\,dt$, so
\[
\mathbf{F} \cdot d\mathbf{x} = \bigl[ -\sin^2 t + (2\sin t + 1)\cos t + 2\cos^2 t \bigr]\,dt
= (\cos 2t + \sin 2t + \cos t + \cos^2 t)\,dt.
\]
The integral of the first three terms over $[0, 2\pi]$ vanishes, and the integral of the
last one is $\pi$. So $\int_C \mathbf{F} \cdot d\mathbf{x} = \pi$.

FIGURE 5.3: Approximation of a curve by a piecewise linear curve.

Note that it doesn’t matter which point on C we choose to start and end at.
Instead of taking t ∈ [0, 2π], we could take t ∈ [a, a + 2π] for any a ∈ R; the
answer is the same since the integral of a trig function over a complete period
is independent of the particular period chosen.
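The claims in Example 2 can be checked numerically. The Python sketch below (added for illustration; the function name `vector_line_integral` and the step count are assumptions) applies the midpoint rule to formula (5.9) for the original parameter interval, for a shifted starting point, and for the reversed orientation.

```python
import math

def vector_line_integral(F, g, dg, a, b, n=2000):
    """Midpoint-rule approximation of the integral of F(g(t)) . g'(t) for t from a to b.

    If a > b the step is negative, which reproduces the sign change that
    comes with reversing the orientation.
    """
    h = (b - a) / n
    total = 0.0
    for j in range(n):
        t = a + (j + 0.5) * h
        Fx = F(*g(t))
        total += sum(Fi * vi for Fi, vi in zip(Fx, dg(t))) * h
    return total

def F(x, y, z):
    return (y, z, x)

def g(t):
    return (math.cos(t), math.sin(t), 2 * math.sin(t) + 1)

def dg(t):
    # Derivative of g, computed by hand.
    return (-math.sin(t), math.cos(t), 2 * math.cos(t))

two_pi = 2 * math.pi
forward = vector_line_integral(F, g, dg, 0.0, two_pi)
shifted = vector_line_integral(F, g, dg, 1.3, 1.3 + two_pi)
backward = vector_line_integral(F, g, dg, two_pi, 0.0)
print(forward, shifted, backward)
```

The first two values should agree with $\pi$ and the third with $-\pi$, in line with the remark that the starting point is irrelevant while reversing the orientation changes the sign.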

Rectifiable Curves. There is an alternative definition of arc length that requires
no a priori hypotheses about the smoothness of the curve. One cuts the curve $C$ up
into a finite number of pieces by inserting subdivision points and approximates
C by the piecewise linear curve obtained by connecting the dots, as indicated in
Figure 5.3. The length of the piecewise linear approximation is obtained by adding
up the lengths of its constituent line segments, and the arc length of C is defined to
be the limit of this sum as the subdivision is made finer and finer.
To make this more precise, it is convenient to describe $C$ parametrically. Thus,
we assume that $C$ is the range of a one-to-one continuous mapping $\mathbf{g} : [a, b] \to \mathbb{R}^n$.
Given a partition $P = \{t_0, \ldots, t_J\}$ of $[a, b]$, the sum of the lengths of the line
segments joining the points $\mathbf{g}(t_j)$ is
\[
L_P(C) = \sum_{j=1}^{J} |\mathbf{g}(t_j) - \mathbf{g}(t_{j-1})|.
\]
If the set of numbers
\[
\mathcal{L} = \bigl\{ L_P(C) : P \text{ is a partition of } [a, b] \bigr\}
\]
is bounded, then $C$ is called rectifiable, and the arc length $L(C)$ is defined to be
the supremum of $\mathcal{L}$:
\[
L(C) = \sup \bigl\{ L_P(C) : P \text{ is a partition of } [a, b] \bigr\}.
\]

Note that if $P'$ is a refinement of $P$ then $L_{P'}(C) \ge L_P(C)$, by the triangle inequality;
hence the supremum is indeed the appropriate sort of limit. This estimate also
implies that the supremum is unchanged if we consider only partitions containing
a given $c \in (a, b)$ among their subdivision points, and from this it follows that arc
length is additive: If $C_1$ and $C_2$ are the curves parametrized by $\mathbf{g}(t)$ for $t \in [a, c]$
and $t \in [c, b]$, then $L(C) = L(C_1) + L(C_2)$. See Exercise 8.
We now show that this definition coincides with our previous one for $C^1$ curves.

5.11 Theorem. With notation as above, if $\mathbf{g}$ is of class $C^1$, then $C$ is rectifiable,
and
\[
L(C) = \int_a^b |\mathbf{g}'(t)|\,dt.
\]

Proof. For any partition $P$ of $[a, b]$, by (5.5) and Proposition 5.8 we have
\[
L_P(C) = \sum_{j=1}^{J} \left| \int_{t_{j-1}}^{t_j} \mathbf{g}'(t)\,dt \right|
\le \sum_{j=1}^{J} \int_{t_{j-1}}^{t_j} |\mathbf{g}'(t)|\,dt
= \int_a^b |\mathbf{g}'(t)|\,dt.
\]
It follows that $L(C) \le \int_a^b |\mathbf{g}'(t)|\,dt$, and in particular that $C$ is rectifiable.
Next, for $r, s \in [a, b]$, let $C_r^s$ be the curve parametrized by $\mathbf{g}(t)$ with $t \in [r, s]$,
and let $\varphi(s) = L(C_a^s)$. (That is, we consider the length of the curve $C$, starting
at $t = a$, as a function of the right endpoint of the parameter interval.) Suppose
$h > 0$. Since arc length is additive, we have $L(C_s^{s+h}) = \varphi(s + h) - \varphi(s)$, so by
the inequality we have just proved (applied to the curve $C_s^{s+h}$) and the mean value
theorem for integrals,
\[
L(C_s^{s+h}) = \varphi(s + h) - \varphi(s) \le \int_s^{s+h} |\mathbf{g}'(t)|\,dt = h\,|\mathbf{g}'(\sigma)|,
\]
where $\sigma$ is some number between $s$ and $s + h$. On the other hand, $|\mathbf{g}(s + h) - \mathbf{g}(s)|$
is $L_P(C_s^{s+h})$ where $P$ is the trivial partition $\{s, s + h\}$, and hence it is no bigger
than $L(C_s^{s+h})$. Combining these estimates and dividing by $h$, we see that
\[
\left| \frac{\mathbf{g}(s + h) - \mathbf{g}(s)}{h} \right| \le \frac{\varphi(s + h) - \varphi(s)}{h} \le |\mathbf{g}'(\sigma)|.
\]
As $h \to 0$, the quantities on the left and right approach $|\mathbf{g}'(s)|$, and hence so does
the one in the middle. A slight modification of this argument works also for $h < 0$,
so we conclude that $\varphi$ is differentiable and that $\varphi'(s) = |\mathbf{g}'(s)|$. The desired result
is now immediate:
\[
L(C) = \varphi(b) = \varphi(b) - \varphi(a) = \int_a^b |\mathbf{g}'(s)|\,ds.
\]
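To see the two definitions of arc length agree in a concrete case, here is a small Python sketch (added for illustration; the cycloid and the helper name `polygonal_length` are choices made here, not taken from the text). It computes $L_P(C)$ for uniform partitions with $2, 4, 8, \ldots$ segments, each of which refines the previous one, and compares with $\int_0^{2\pi} |\mathbf{g}'(t)|\,dt = 8$.

```python
import math

def polygonal_length(g, a, b, segments):
    """L_P(C) for the uniform partition of [a, b] into the given number of segments."""
    points = [g(a + (b - a) * j / segments) for j in range(segments + 1)]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def cycloid(t):
    return (t - math.sin(t), 1.0 - math.cos(t))

# Doubling the segment count makes each partition a refinement of the previous one.
lengths = [polygonal_length(cycloid, 0.0, 2 * math.pi, 2 ** k) for k in range(1, 13)]

# |g'(t)| = sqrt((1 - cos t)^2 + sin^2 t) = 2 sin(t/2) on [0, 2*pi], so the integral is 8.
exact = 8.0
for k, length in enumerate(lengths, start=1):
    print(2 ** k, length, exact - length)
```

The printed lengths should increase with each refinement and stay below 8, approaching it from below as the supremum definition predicts.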

EXERCISES
1. Find the arc length of the following parametrized curves:
a. $\mathbf{g}(t) = (a\cos t, a\sin t, bt)$, $t \in [0, 2\pi]$.
b. $\mathbf{g}(t) = (\tfrac13 t^3 - t, t^2)$, $t \in [0, 2]$.
c. $\mathbf{g}(t) = (\log t, 2t, t^2)$, $t \in [1, e]$.
d. $\mathbf{g}(t) = (6t, 4t^{3/2}, -4t^{3/2}, 3t^2)$, $t \in [0, 2]$.
2. Express the arc length of the following curves in terms of the integral
\[
E(k) = \int_0^{\pi/2} \sqrt{1 - k^2 \sin^2 t}\,dt \qquad (0 < k < 1),
\]
for suitable values of $k$. ($E(k)$ is one of the standard elliptic integrals, so
called because of their connection with the arc length of an ellipse.)
a. An ellipse with semimajor axis $a$ and semiminor axis $b$.
b. The portion of the intersection of the sphere $x^2 + y^2 + z^2 = 4$ and the
cylinder $x^2 + y^2 - 2y = 0$ lying in the first octant.
3. Find the centroid of the curve y = cosh x, −1 ≤ x ≤ 1.
4. Compute $\int_C \sqrt{z}\,ds$ where $C$ is parametrized by $\mathbf{g}(t) = (2\cos t, 2\sin t, t^2)$,
$0 \le t \le 2\pi$.
5. Compute $\int_C \mathbf{F} \cdot d\mathbf{x}$ for the following $\mathbf{F}$ and $C$:
a. $\mathbf{F}(x, y, z) = (yz, x^2, xz)$; $C$ is the line segment from $(0, 0, 0)$ to $(1, 1, 1)$.
b. $\mathbf{F}$ is as in (a); $C$ is the portion of the curve $y = x^2$, $z = x^3$ from $(0, 0, 0)$
to $(1, 1, 1)$.
c. $\mathbf{F}(x, y) = (x - y, x + y)$; $C$ is the circle $x^2 + y^2 = 1$, oriented clockwise.
d. $\mathbf{F}(x, y) = (x^2 y, x^3 y^2)$; $C$ is the closed curve formed by portions of the
line $y = 4$ and the parabola $y = x^2$, oriented counterclockwise.
6. Compute the following line integrals:
a. $\int_C (x e^{-y}\,dx + \sin \pi x\,dy)$, where $C$ is the portion of the parabola $y = x^2$
from $(0, 0)$ to $(1, 1)$.
b. $\int_C (y\,dx + z\,dy + xy\,dz)$, where $C$ is given by $x = \cos t$, $y = \sin t$, $z = t$
with $0 \le t \le 2\pi$.
c. $\int_C (y^2\,dx - 2x\,dy)$, where $C$ is the triangle with vertices $(0, 0)$, $(1, 0)$, and
$(1, 1)$, oriented counterclockwise.
7. Let $\mathbf{F} : \mathbb{R}^n \to \mathbb{R}^m$ be a continuous map, and let $C$ be a $C^1$ curve in $\mathbb{R}^n$.
a. Deduce from Proposition 5.8 that $\bigl| \int_C \mathbf{F}\,ds \bigr| \le \int_C |\mathbf{F}|\,ds$.
b. In the case $m = n$, show that $\bigl| \int_C \mathbf{F} \cdot d\mathbf{x} \bigr| \le \int_C |\mathbf{F}|\,ds$.
8. Prove in detail that arc length, as defined for rectifiable curves, is additive; that
is, if C, C1 , and C2 are the curves parametrized by g(t) for t ∈ [a, b], t ∈ [a, c],
and t ∈ [c, b], then L(C) = L(C1 ) + L(C2 ).

9. Let $\mathbf{g}(t) = (g(t), h(t))$ be a $C^1$ parametrization of a plane curve. Given a
partition $P = \{t_0, \ldots, t_J\}$ of $[a, b]$, the distance between two neighboring
points $\mathbf{g}(t_{j-1})$ and $\mathbf{g}(t_j)$ is
\[
\sqrt{[g(t_j) - g(t_{j-1})]^2 + [h(t_j) - h(t_{j-1})]^2}.
\]
Use the mean value theorem to express the differences inside the square root in
terms of $g'$ and $h'$, and then use Exercise 9 in §4.1 to give an alternate proof of
Theorem 5.11. (Exactly the same idea works for curves in $\mathbb{R}^n$.)

5.2 Green’s Theorem


Green’s theorem is the simplest of a group of theorems — actually, they’re all
special cases of one big theorem, as we shall indicate in §5.9 — that say that “the
integral of something over the boundary of a region equals the integral of something
else over the region itself.” To state it, we need some terminology.
A simple closed curve in $\mathbb{R}^n$ is a curve whose starting and ending points coincide,
but that does not intersect itself otherwise. More precisely, a simple closed
curve is one that can be parametrized by a continuous map $\mathbf{x} = \mathbf{g}(t)$, $a \le t \le b$,
such that $\mathbf{g}(a) = \mathbf{g}(b)$ but $\mathbf{g}(s) \ne \mathbf{g}(t)$ unless $\{s, t\} = \{a, b\}$.
We shall use the term regular region to mean a compact set in $\mathbb{R}^n$ that is the
closure of its interior. Equivalently, a compact set $S \subset \mathbb{R}^n$ is a regular region if
every neighborhood of every point on the boundary $\partial S$ contains points in $S^{\mathrm{int}}$. For
example, a closed ball is a regular region, but a closed line segment in $\mathbb{R}^n$ ($n > 1$)
is not, because its interior is empty.
Now let $n = 2$. We say that a regular region $S \subset \mathbb{R}^2$ has a piecewise smooth
boundary if the boundary $\partial S$ consists of a finite union of disjoint, piecewise
smooth simple closed curves, where “piecewise smooth” has the meaning assigned
in the previous section. (We thus allow the possibility that $S$ contains “holes,” so
that its boundary may be disconnected.) In this case, the positive orientation on $\partial S$
is the orientation on each of the closed curves that make up the boundary such that
the region $S$ is on the left with respect to the positive direction on the curve. More
precisely, if $\mathbf{x}$ is a point on $\partial S$ at which $\partial S$ is smooth, and $\mathbf{t} = (t_1, t_2)$ is the unit
tangent vector in the positive direction at that point, then the vector $\mathbf{n} = (t_2, -t_1)$,
obtained by rotating $\mathbf{t}$ by $90^\circ$ clockwise, points out of $S$. (That is, $\mathbf{x} + \epsilon\mathbf{n} \notin S$ for
small $\epsilon > 0$.) See Figure 5.4.
If $\mathbf{F} = (F_1, F_2)$ is a continuous vector field on $\mathbb{R}^2$, we denote by
\[
\int_{\partial S} \mathbf{F} \cdot d\mathbf{x} \quad \text{or} \quad \int_{\partial S} F_1\,dx_1 + F_2\,dx_2
\]
the sum of the line integrals of $\mathbf{F}$ over the positively oriented closed curves that
make up $\partial S$.

FIGURE 5.4: A region with piecewise smooth, positively oriented boundary.
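5.12 Theorem (Green’s Theorem). Suppose $S$ is a regular region in $\mathbb{R}^2$ with piecewise
smooth boundary $\partial S$. Suppose also that $\mathbf{F}$ is a vector field of class $C^1$ on $S$.
Then
\[
\int_{\partial S} \mathbf{F} \cdot d\mathbf{x}
= \iint_S \left( \frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2} \right) dA. \tag{5.13}
\]
In the more common notation, if we set $\mathbf{F} = (P, Q)$ and $\mathbf{x} = (x, y)$,
\[
\int_{\partial S} P\,dx + Q\,dy
= \iint_S \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA. \tag{5.14}
\]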
Proof. First we consider a very restricted class of regions, for which the proof is
quite simple. We shall say that the region $S$ is x-simple if it is the region between
the graphs of two functions of $x$, that is, if it has the form
\[
S = \bigl\{ (x, y) : a \le x \le b,\ \varphi_1(x) \le y \le \varphi_2(x) \bigr\}, \tag{5.15}
\]
where $\varphi_1$ and $\varphi_2$ are continuous, piecewise smooth functions on $[a, b]$. Likewise,
we say that $S$ is y-simple if it has the form
\[
S = \bigl\{ (x, y) : c \le y \le d,\ \psi_1(y) \le x \le \psi_2(y) \bigr\}, \tag{5.16}
\]
where $\psi_1$ and $\psi_2$ are continuous, piecewise smooth functions on $[c, d]$.
EXAMPLE 1. The region bounded by the curve $y = \tfrac18 x^3 - 1$, the line $x + 2y = 2$,
and the y-axis is both x-simple and y-simple. (See Figure 5.5.) It has the
forms (5.15) and (5.16) with
\[
a = 0, \quad b = 2, \quad \varphi_1(x) = \tfrac18 x^3 - 1, \quad \varphi_2(x) = 1 - \tfrac12 x,
\]
\[
c = -1, \quad d = 1, \quad \psi_1(y) = 0, \quad
\psi_2(y) = \begin{cases} 2(y + 1)^{1/3} & \text{if } -1 \le y \le 0, \\ 2 - 2y & \text{if } 0 \le y \le 1. \end{cases}
\]

FIGURE 5.5: The region in Example 1.

Now let us suppose that $S$ is both x-simple and y-simple. If we write $S$ in
the form (5.15), then $\partial S$ consists of (i) the curve $y = \varphi_1(x)$, oriented from left to
right, (ii) the curve $y = \varphi_2(x)$, oriented from right to left, and (iii) portions of the
vertical lines $x = a$ and $x = b$, which may reduce to single points. The line integral
$\int_{\partial S} P\,dx$ is the sum of the integrals over these pieces. On the vertical lines, $x$ is
constant and so $dx = 0$ (that is, $dx/dt = 0$ in any parametrization), so these pieces
contribute nothing. On the curves $y = \varphi_1(x)$ and $y = \varphi_2(x)$ we can take $x$ as the
parameter, except that the orientation is wrong for $y = \varphi_2(x)$; hence
\[
\int_{\partial S} P\,dx = \int_a^b P(x, \varphi_1(x))\,dx - \int_a^b P(x, \varphi_2(x))\,dx.
\]
On the other hand, by the fundamental theorem of calculus,
\[
\iint_S \frac{\partial P}{\partial y}\,dA
= \int_a^b \int_{\varphi_1(x)}^{\varphi_2(x)} \frac{\partial P}{\partial y}\,dy\,dx
= \int_a^b \bigl[ P(x, \varphi_2(x)) - P(x, \varphi_1(x)) \bigr]\,dx.
\]
Comparing these equalities, we obtain
\[
\int_{\partial S} P\,dx = -\iint_S \frac{\partial P}{\partial y}\,dA.
\]
In exactly the same way, using the representation (5.16) for $S$, we see that
\[
\int_{\partial S} Q\,dy = \iint_S \frac{\partial Q}{\partial x}\,dA.
\]
(There is no minus sign here, because if we take $y$ as the parameter for the curves
$x = \psi_1(y)$ and $x = \psi_2(y)$, the orientation is wrong for $\psi_1$ and right for $\psi_2$.) Adding
these last two equalities, we obtain the desired result (5.14).
FIGURE 5.6: A decomposition of the region in Figure 5.4 into simple subregions.

Thus Green’s theorem is established for regions that are both x-simple and y-simple.
There is now an immediate generalization to a much larger class of regular
regions. Namely, suppose the region $S$ can be cut up into finitely many subregions,
say $S = S_1 \cup \cdots \cup S_k$, where
a. the $S_j$’s may intersect along common edges but have disjoint interiors;
b. each $S_j$ has a piecewise smooth boundary and is both x-simple and y-simple.
(See Figure 5.6.) Since the $S_j$’s overlap only in a set of zero content, by Corollary
4.23b we have
\[
\iint_S \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA
= \sum_{j=1}^{k} \iint_{S_j} \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA.
\]
On the other hand, we also have
\[
\int_{\partial S} (P\,dx + Q\,dy) = \sum_{j=1}^{k} \int_{\partial S_j} (P\,dx + Q\,dy),
\]
because the integrals over the parts of the boundaries of the $S_j$’s that are not parts of
the boundary of $S$ all cancel out. In more detail, if $S_i$ and $S_j$ have a common edge
$C$, then $C$ will have one orientation as part of $\partial S_i$ and the opposite orientation
as part of $\partial S_j$, so the two integrals over $C$ that make up parts of $\int_{\partial S_i}$ and $\int_{\partial S_j}$
will cancel each other. Therefore, we obtain Green’s theorem for the region $S$ by
applying Green’s theorem to the simple regions $S_j$ and adding up the results.
The result we have just obtained is sufficient for most practical purposes, but
it is not definitive. The class of regular regions that can be cut up into simple
subregions does not include all regions with $C^1$ boundary, much less all regions
with piecewise smooth boundary, and it may be difficult to tell whether a given
region has this property. For example, the region
\[
\bigl\{ (x, y) : 0 \le x \le 1,\ 0 \le y \le 1 + x^3 \sin x^{-1} \bigr\}
\]
is x-simple but cannot be cut up into finitely many y-simple subregions because
the graph of $x^3 \sin x^{-1}$ has infinitely many “wiggles.” The deduction of the general
case from the special cases considered here requires some additional machinery that
is of interest in its own right; we present it in Appendix B.7 (Theorem B.28).

EXAMPLE 2. Let $C$ be the unit circle $x^2 + y^2 = 1$, oriented counterclockwise.
The line integral
\[
\int_C \bigl( \sqrt{1 + x^2} - y e^{xy} + 3y \bigr)\,dx + \bigl( x^2 - x e^{xy} + \log(1 + y^4) \bigr)\,dy
\]
is difficult to evaluate directly, but it yields easily to Green’s theorem. Indeed,
$C$ is the oriented boundary of the unit disc $D$, so the integral equals
\[
\iint_D \left[ \frac{\partial}{\partial x} \bigl( x^2 - x e^{xy} + \log(1 + y^4) \bigr)
- \frac{\partial}{\partial y} \bigl( \sqrt{1 + x^2} - y e^{xy} + 3y \bigr) \right] dA
= \iint_D (2x - 3)\,dA = -3\pi.
\]
(The integral of $2x$ over $D$ vanishes by symmetry.)
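Although the line integral in Example 2 is awkward by hand, it can be evaluated numerically as a check on the value $-3\pi$. The Python sketch below is an illustration added here; the function name `circle_integral` and the number of sample points are assumptions. It applies the midpoint rule to the parametrization $x = \cos t$, $y = \sin t$.

```python
import math

def P(x, y):
    return math.sqrt(1 + x * x) - y * math.exp(x * y) + 3 * y

def Q(x, y):
    return x * x - x * math.exp(x * y) + math.log(1 + y ** 4)

def circle_integral(P, Q, n=4000):
    """Midpoint rule for the integral of P dx + Q dy over the unit circle, counterclockwise."""
    h = 2 * math.pi / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * h
        x, y = math.cos(t), math.sin(t)
        # dx = -sin t dt and dy = cos t dt along the circle.
        total += (P(x, y) * (-math.sin(t)) + Q(x, y) * math.cos(t)) * h
    return total

direct = circle_integral(P, Q)
via_green = -3 * math.pi
print(direct, via_green)
```

The two printed numbers should agree closely, since the integrand is smooth and periodic in $t$.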


EXAMPLE 3. It is an amusing and sometimes useful fact that the area of a regular
region $S$ in the plane can be expressed as a line integral over the boundary
$\partial S$. This can be done in many different ways; for instance,
\[
\text{Area of } S = \int_{\partial S} x\,dy = -\int_{\partial S} y\,dx = \tfrac12 \int_{\partial S} (x\,dy - y\,dx).
\]
Indeed, Green’s theorem shows that all of these integrals are equal to $\iint_S 1\,dA$.
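For a polygon, the last formula in Example 3 can be evaluated exactly edge by edge, which gives the familiar "shoelace" rule. The Python sketch below is an illustration added here (the function name `polygon_area` and the test shapes are choices made for this example).

```python
import math

def polygon_area(vertices):
    """Signed area (1/2) * sum(x_i * y_{i+1} - x_{i+1} * y_i) of a closed polygon.

    This is the line integral (1/2) * (x dy - y dx) evaluated exactly along
    each straight edge; it is positive for counterclockwise vertex order.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
area_ccw = polygon_area(square)        # counterclockwise order
area_cw = polygon_area(square[::-1])   # clockwise order

# Inscribed polygon approximating the ellipse x = 3 cos t, y = 2 sin t (area 6*pi).
n = 2000
ellipse = [(3 * math.cos(2 * math.pi * j / n), 2 * math.sin(2 * math.pi * j / n))
           for j in range(n)]
area_ellipse = polygon_area(ellipse)
print(area_ccw, area_cw, area_ellipse, 6 * math.pi)
```

Reversing the vertex order reverses the orientation of the boundary and therefore the sign of the result, consistent with the behavior of line integrals of vector fields under a change of orientation.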
The line integral $\int_{\partial S} \mathbf{F} \cdot d\mathbf{x}$ is the integral of the tangential component of $\mathbf{F}$
over $\partial S$. However, Green’s theorem can also be interpreted as a statement about
the integral of the normal component of a vector field.
To see this, recall that counterclockwise and clockwise rotations by $90^\circ$ in
the plane are given by the transformations $R_+(x, y) = (-y, x)$ and
$R_-(x, y) = (y, -x)$, respectively. Thus, if $\mathbf{t} = (t_1, t_2)$ is the unit tangent vector to $\partial S$ at a
point on $\partial S$, pointing in the forward direction, then $\mathbf{n} = R_-(\mathbf{t}) = (t_2, -t_1)$ is the
unit normal vector to $\partial S$ pointing out of $S$. Given a vector field $\mathbf{F} = (F_1, F_2)$,
let $\widetilde{\mathbf{F}} = R_+(\mathbf{F}) = (-F_2, F_1)$ be the vector field obtained by rotating the values
of $\mathbf{F}$ by $90^\circ$ counterclockwise. Then the normal component of $\mathbf{F}$ is the tangential
component of $\widetilde{\mathbf{F}}$:
\[
\mathbf{F} \cdot \mathbf{n} = F_1 t_2 - F_2 t_1 = \widetilde{\mathbf{F}} \cdot \mathbf{t}.
\]
Hence, by applying Green’s theorem to the rotated field $\widetilde{\mathbf{F}}$, we obtain the following
result:

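5.17 Corollary. Suppose $S$ is a regular region in $\mathbb{R}^2$ with piecewise smooth boundary
$\partial S$, and let $\mathbf{n}(\mathbf{x})$ be the unit outward normal vector to $\partial S$ at $\mathbf{x} \in \partial S$. Suppose
also that $\mathbf{F}$ is a vector field of class $C^1$ on $S$. Then
\[
\int_{\partial S} \mathbf{F} \cdot \mathbf{n}\,ds
= \iint_S \left( \frac{\partial F_1}{\partial x_1} + \frac{\partial F_2}{\partial x_2} \right) dA. \tag{5.18}
\]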

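Let us see what Green’s theorem says when $\mathbf{F}$ is the gradient of a $C^2$ function
$f$, so that $F_1 = \partial_1 f$ and $F_2 = \partial_2 f$. Formula (5.13) gives
\[
\int_{\partial S} \nabla f \cdot d\mathbf{x} = \iint_S (\partial_1 \partial_2 f - \partial_2 \partial_1 f)\,dA = \iint_S 0\,dA = 0.
\]
This is no surprise; it is easy to see directly that the line integral of a gradient over
any closed curve vanishes. Indeed, if the curve $C$ is parametrized by $\mathbf{x} = \mathbf{g}(t)$ with
$\mathbf{g}(a) = \mathbf{g}(b)$, then by the chain rule,
\[
\int_C \nabla f \cdot d\mathbf{x} = \int_a^b \nabla f(\mathbf{g}(t)) \cdot \mathbf{g}'(t)\,dt
= \int_a^b \frac{d}{dt} f(\mathbf{g}(t))\,dt = f(\mathbf{g}(b)) - f(\mathbf{g}(a)) = 0.
\]
The formula (5.18) gives a more interesting result. $\nabla f \cdot \mathbf{n}$ is the directional derivative
of $f$ in the outward normal direction to $\partial S$, or normal derivative of $f$ on $\partial S$,
often denoted by $\partial f / \partial n$; and (5.18) says that
\[
\int_{\partial S} \frac{\partial f}{\partial n}\,ds
= \iint_S \left( \frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} \right) dA.
\]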

The integrand on the right is the Laplacian of f , which we encountered in §2.6 and
which will play an important role in §5.6.
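Formula (5.18) lends itself to a quick numerical sanity check. In the Python sketch below (an illustration added here, with the field $\mathbf{F}(x, y) = (x^3, y^3)$, the unit disc, and all function names chosen as assumptions), the boundary flux and the integral of the divergence are computed independently by the midpoint rule; both should be close to $3\pi/2$.

```python
import math

def F(x, y):
    return (x ** 3, y ** 3)

def div_F(x, y):
    # d(x^3)/dx + d(y^3)/dy, computed by hand.
    return 3 * x * x + 3 * y * y

def boundary_flux(n=2000):
    """Integral of F . n ds over the unit circle, where n = (cos t, sin t) and ds = dt."""
    h = 2 * math.pi / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * h
        nx, ny = math.cos(t), math.sin(t)
        F1, F2 = F(nx, ny)
        total += (F1 * nx + F2 * ny) * h
    return total

def divergence_integral(nr=400, nt=400):
    """Midpoint rule in polar coordinates for the integral of div F over the unit disc."""
    hr, ht = 1.0 / nr, 2 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * hr
        for j in range(nt):
            t = (j + 0.5) * ht
            total += div_F(r * math.cos(t), r * math.sin(t)) * r * hr * ht
    return total

flux_value = boundary_flux()
div_value = divergence_integral()
print(flux_value, div_value, 1.5 * math.pi)
```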

EXERCISES

1. Evaluate the following line integrals by using Green’s theorem.
a. The integral in Exercise 5c in §5.1.
b. The integral in Exercise 6c in §5.1.
c. $\int_C [(x^2 + 10xy + y^2)\,dx + (5x^2 + 5xy)\,dy]$, where $C$ is the square with
vertices $(0, 0)$, $(2, 0)$, $(0, 2)$, and $(2, 2)$, oriented counterclockwise.
d. $\int_{\partial S} (3x^2 \sin y^2\,dx + 2x^3 y \cos y^2\,dy)$, where $S$ is any regular region with
piecewise smooth boundary.
2. Let $S$ be the annulus $1 \le x^2 + y^2 \le 4$. Compute $\int_{\partial S} (xy^2\,dy - x^2 y\,dx)$, both
directly and by using Green’s theorem.
3. Find the positively oriented simple closed curve $C$ that maximizes the line
integral $\int_C [y^3\,dx + (3x - x^3)\,dy]$.
4. Use Green’s theorem as in Example 3 to calculate the area under one arch of
the cycloid described parametrically by $x = R(t - \sin t)$, $y = R(1 - \cos t)$.
5. Let $S = \{(x, y) : a \le x \le b,\ 0 \le y \le f(x)\}$, where $f$ is a nonnegative $C^1$
function on $[a, b]$. Explain how the formula $A = -\int_{\partial S} y\,dx$ for the area of $S$
in Example 3 leads to the familiar formula $A = \int_a^b f(x)\,dx$.
6. Let $S$ be a regular region in $\mathbb{R}^2$ with piecewise smooth boundary, and let $f$ and
$g$ be functions of class $C^2$ on $S$. Show that
\[
\int_{\partial S} f \frac{\partial g}{\partial n}\,ds
= \iint_S \bigl[ f(\partial_x^2 g + \partial_y^2 g) + \nabla f \cdot \nabla g \bigr]\,dA.
\]
7. The point of this exercise is to show how Green’s theorem can be used to deduce
a special case of Theorem 4.41. Let $U$, $V$ be connected open sets in $\mathbb{R}^2$,
and let $\mathbf{G} : U \to V$ be a one-to-one transformation of class $C^1$ whose derivative
$D\mathbf{G}(\mathbf{u})$ is invertible for all $\mathbf{u} \in U$. Moreover, let $S$ be a regular region in $V$
with piecewise smooth boundary, let $A$ be its area, and let $T = \mathbf{G}^{-1}(S)$.
a. The Jacobian $\det D\mathbf{G}$ is either everywhere positive or everywhere negative
on $U$; why?
b. Suppose $\det D\mathbf{G}(\mathbf{u}) > 0$ for all $\mathbf{u} \in U$. Write $A = -\int_{\partial S} y\,dx$ as in Example 3,
make a change of variable to transform this line integral into
a line integral over $\partial T$, and apply Green’s theorem to deduce that
$A = \iint_T \det D\mathbf{G}\,dA$.
c. By a similar argument, show that if $\det D\mathbf{G}(\mathbf{u}) < 0$ for all $\mathbf{u} \in U$, then
$A = -\iint_T \det D\mathbf{G}\,dA = \iint_T |\det D\mathbf{G}|\,dA$. Where does the minus sign
come from?

5.3 Surface Area and Surface Integrals


In this section we discuss integrals of functions and vector fields over smooth surfaces
in $\mathbb{R}^3$. Like line integrals, surface integrals come in two varieties, unoriented
and oriented. On a curve the orientation is a matter of deciding which direction
along a curve is “positive”; on a surface it is a matter of deciding which side of the
surface is the “positive” side. The convenient way of specifying the orientation of
a smooth surface in $\mathbb{R}^3$ is to make a choice of one of the two unit normal vectors
at each point of the surface, in such a way that the choice varies continuously with
the point. The “positive” side of the surface is the one into which the normal arrow
points.

FIGURE 5.7: A Möbius band.

It is important to note that not every surface can be oriented. The standard
example of a nonorientable surface is the Möbius band, which can be constructed
by taking a long strip of paper, giving it a half twist, and gluing the ends together.
(That is, call the two sides of the original strip A and B; the ends are to be glued
together so that side A of one end matches with side B of the other.) A sketch of a
Möbius band is given in Figure 5.7, but the best way to appreciate the features of
the Möbius band is to make one for yourself.
However, if a surface forms part of the boundary of a regular region in $\mathbb{R}^3$, it
is always orientable, and the standard specification for the orientation is that the
positive normal vector is the one pointing out of the region.

Surface Area. We begin by deriving a formula for the area of a region on
a smooth surface $S$. We shall assume that $S$ is represented parametrically as the
image of a connected open set $W$ in the uv-plane under a one-to-one $C^1$ map
$\mathbf{G} : W \to \mathbb{R}^3$:
\[
\mathbf{x} = (x, y, z) = \mathbf{G}(u, v), \qquad (u, v) \in W.
\]
For a given surface S, it may not be the case that all of S can be represented by a
single parametrization. We shall assume, however, that S can be cut up into finitely
many pieces which each admit a parametrization; it is then enough to consider the
pieces separately. Also, it is usually sufficient to have a good parametrization for a
subset of S whose complement is of lower dimension, such as the one provided by
spherical coordinates on the unit sphere with the “international date line” removed.
To see how to compute surface area on $S$, consider a small rectangle in the uv-plane
with vertices $(u, v)$, $(u + \Delta u, v)$, $(u, v + \Delta v)$, and $(u + \Delta u, v + \Delta v)$. Its
image under the map $\mathbf{G}$ is a small quadrilateral (with curved sides) on the surface
$S$ whose vertices are $\mathbf{G}(u, v)$, $\mathbf{G}(u + \Delta u, v)$, etc. (See Figure 3.4 in §3.3.) In the
limit in which the increments $\Delta u$ and $\Delta v$ become infinitesimals $du$ and $dv$, this
quadrilateral becomes a parallelogram whose sides from the vertex $\mathbf{x} = \mathbf{G}(u, v)$ to
the two adjacent vertices are described by the vectors
\[
\mathbf{G}(u + du, v) - \mathbf{G}(u, v) = \frac{\partial \mathbf{G}}{\partial u}\,du
\quad \text{and} \quad
\mathbf{G}(u, v + dv) - \mathbf{G}(u, v) = \frac{\partial \mathbf{G}}{\partial v}\,dv.
\]
These two vectors are tangent to the surface $S$ at $\mathbf{x}$, so their cross product is a
vector normal to $S$ at $\mathbf{x}$, whose magnitude is the area of the parallelogram they
span. Therefore, the element of area on $S$ is given in terms of the parametrization
$\mathbf{x} = \mathbf{G}(u, v)$ by
\[
dA = \left| \frac{\partial \mathbf{G}}{\partial u} \times \frac{\partial \mathbf{G}}{\partial v} \right| du\,dv. \tag{5.19}
\]
In other words, if $R$ is a measurable subset of $W$ in the uv-plane and $\mathbf{G}(R)$ is the
corresponding region in the surface $S$,
\[
\text{Area of } \mathbf{G}(R) = \iint_R \left| \frac{\partial \mathbf{G}}{\partial u} \times \frac{\partial \mathbf{G}}{\partial v} \right| du\,dv. \tag{5.20}
\]

Henceforth we shall take (5.20) as the definition of area for a parametrized
surface. One might wonder if surface area can also be defined by considering poly-
hedral approximations to the surface, as polygonal approximations to a curve were
used to define arc length in the appendix of §5.1. The answer is affirmative, but this
matter is a good deal trickier than the theory of arc length, and we shall not pursue
it further.
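Let us be a little more explicit about the formula (5.19). With the notation
$\mathbf{G}(u, v) = (x, y, z)$, we have
\[
\frac{\partial \mathbf{G}}{\partial u} \times \frac{\partial \mathbf{G}}{\partial v}
= \det \begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial_u x & \partial_u y & \partial_u z \\ \partial_v x & \partial_v y & \partial_v z \end{pmatrix}
= \frac{\partial(y, z)}{\partial(u, v)}\,\mathbf{i} + \frac{\partial(z, x)}{\partial(u, v)}\,\mathbf{j} + \frac{\partial(x, y)}{\partial(u, v)}\,\mathbf{k}.
\]
Thus,
\[
dA = \sqrt{ \left( \frac{\partial(y, z)}{\partial(u, v)} \right)^2 + \left( \frac{\partial(z, x)}{\partial(u, v)} \right)^2 + \left( \frac{\partial(x, y)}{\partial(u, v)} \right)^2 }\ du\,dv. \tag{5.21}
\]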

Computationally, this is usually a horrible mess. (But what did you expect? Arc
length is already problematic; surface area must be worse!)

As with arc length, we must verify that our informally-derived formula for surface
area really makes sense by checking that it is independent of the parametrization.
Thus, suppose we make a change of variables $(u, v) = \Phi(s, t)$, where $\Phi$ is a
one-to-one $C^1$ map from a region $V$ in the st-plane to the region $W$ in the uv-plane.
The elements of area are then related by
\[
du\,dv = \left| \frac{\partial(u, v)}{\partial(s, t)} \right| ds\,dt,
\]
by Theorem 4.41. If we plug this into (5.21), we get
\[
dA = \sqrt{\alpha^2 + \beta^2 + \gamma^2}\ ds\,dt, \quad \text{where } \alpha = \frac{\partial(y, z)}{\partial(u, v)} \frac{\partial(u, v)}{\partial(s, t)}, \text{ etc.}
\]
But by the chain rule and the fact that the determinant of a product is the product
of the determinants, we have
\[
\frac{\partial(y, z)}{\partial(u, v)} \frac{\partial(u, v)}{\partial(s, t)} = \frac{\partial(y, z)}{\partial(s, t)},
\]
and likewise for the other two terms. Hence, in the st-parametrization,
\[
dA = \sqrt{ \left( \frac{\partial(y, z)}{\partial(s, t)} \right)^2 + \left( \frac{\partial(z, x)}{\partial(s, t)} \right)^2 + \left( \frac{\partial(x, y)}{\partial(s, t)} \right)^2 }\ ds\,dt.
\]
This is of exactly the same form as (5.21), as we wished to show.


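The formula for surface area becomes a little less hideous in the special case
where the surface is the graph of a function, $z = \varphi(x, y)$. In this case we can take
$x$ and $y$ as the parameters, that is,
\[
\mathbf{G}(x, y) = \bigl( x, y, \varphi(x, y) \bigr).
\]
Here $\partial_x \mathbf{G} = (1, 0, \partial_x \varphi)$ and $\partial_y \mathbf{G} = (0, 1, \partial_y \varphi)$, so
\[
\frac{\partial \mathbf{G}}{\partial x} \times \frac{\partial \mathbf{G}}{\partial y} = -(\partial_x \varphi)\,\mathbf{i} - (\partial_y \varphi)\,\mathbf{j} + \mathbf{k},
\qquad
dA = \sqrt{1 + (\partial_x \varphi)^2 + (\partial_y \varphi)^2}\ dx\,dy. \tag{5.22}
\]
(Note that our surface is a level set of the function $\Phi(x, y, z) = z - \varphi(x, y)$ and
that $-(\partial_x \varphi)\,\mathbf{i} - (\partial_y \varphi)\,\mathbf{j} + \mathbf{k} = \nabla\Phi$; we deduced that $\nabla\Phi$ is normal to the surface
by other means in Theorem 2.37.)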

EXAMPLE 1. Let us compute the surface area of the unit sphere $x^2 + y^2 + z^2 = 1$.
We can proceed in two ways:
Solution I. The upper hemisphere is the graph of the function
$\varphi(x, y) = \sqrt{1 - x^2 - y^2}$. A little calculation yields
\[
\sqrt{1 + (\partial_x \varphi)^2 + (\partial_y \varphi)^2} = \frac{1}{\sqrt{1 - x^2 - y^2}},
\]
and by (5.22), the area of the upper hemisphere is obtained by integrating this
function over the unit disc. (Note that this integral is improper, as the integrand
blows up along the boundary of the disc.) Switching to polar coordinates yields
\[
\int_0^1 \int_0^{2\pi} \frac{r}{\sqrt{1 - r^2}}\,d\theta\,dr = -2\pi \sqrt{1 - r^2}\,\Big|_0^1 = 2\pi.
\]
Hence the area of the whole sphere is $4\pi$.
Solution II. We can parametrize the sphere by the spherical coordinates
$x = \sin\varphi \cos\theta$, $y = \sin\varphi \sin\theta$, $z = \cos\varphi$. An easy calculation yields
\[
\frac{\partial(y, z)}{\partial(\varphi, \theta)} = \sin^2\varphi \cos\theta, \qquad
\frac{\partial(z, x)}{\partial(\varphi, \theta)} = \sin^2\varphi \sin\theta, \qquad
\frac{\partial(x, y)}{\partial(\varphi, \theta)} = \cos\varphi \sin\varphi,
\]
and the sum of the squares of these quantities is
\[
\sin^4\varphi\,(\cos^2\theta + \sin^2\theta) + \cos^2\varphi \sin^2\varphi = \sin^2\varphi\,(\cos^2\varphi + \sin^2\varphi) = \sin^2\varphi.
\]
Hence, by (5.21), the area of the sphere is
\[
\int_0^{\pi} \int_0^{2\pi} \sin\varphi\,d\theta\,d\varphi = -2\pi \cos\varphi\,\Big|_0^{\pi} = 4\pi.
\]
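Formula (5.20) can also be applied numerically without working out the Jacobians by hand. The Python sketch below is an illustration added here; the names `cross`, `surface_area`, and `sphere`, the grid sizes, and the finite-difference step `eps` are all assumptions. It approximates the partial derivatives of $\mathbf{G}$ by central differences and integrates the norm of their cross product by the midpoint rule.

```python
import math

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def surface_area(G, u0, u1, v0, v1, nu=200, nv=200, eps=1e-6):
    """Midpoint-rule approximation of the integral of |G_u x G_v| over [u0, u1] x [v0, v1].

    The partial derivatives are approximated by central differences with step eps.
    """
    hu, hv = (u1 - u0) / nu, (v1 - v0) / nv
    total = 0.0
    for i in range(nu):
        u = u0 + (i + 0.5) * hu
        for j in range(nv):
            v = v0 + (j + 0.5) * hv
            Gu = [(p - q) / (2 * eps) for p, q in zip(G(u + eps, v), G(u - eps, v))]
            Gv = [(p - q) / (2 * eps) for p, q in zip(G(u, v + eps), G(u, v - eps))]
            total += math.hypot(*cross(Gu, Gv)) * hu * hv
    return total

def sphere(phi, theta):
    """Spherical-coordinate parametrization of the unit sphere."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

area = surface_area(sphere, 0.0, math.pi, 0.0, 2 * math.pi)
print(area, 4 * math.pi)
```

The computed value should be close to $4\pi$, matching both solutions of Example 1.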

Surface Integrals of Scalar Functions. Now that we know how to compute
surface area, it is easy to define the integral of a real-valued continuous function
over a surface: It is just $\iint_S f\,dA$, where $dA$ is the element of surface area defined
above. (To keep the notation simple, we shall take the region over which the
integration is performed to be the whole surface $S$; the idea is exactly the same
for integration over subsets of $S$.) More precisely, if $S$ admits a parametrization
$\mathbf{x} = \mathbf{G}(u, v)$ with $(u, v) \in W$, where $W$ is tacitly assumed to be measurable,
\[
\iint_S f\,dA = \iint_W f(\mathbf{G}(u, v)) \left| \frac{\partial \mathbf{G}}{\partial u} \times \frac{\partial \mathbf{G}}{\partial v} \right| du\,dv.
\]
If $S$ is the graph of a function $z = \varphi(x, y)$, $(x, y) \in W$, the result is
\[
\iint_S f\,dA = \iint_W f\bigl( x, y, \varphi(x, y) \bigr) \sqrt{1 + (\partial_x \varphi)^2 + (\partial_y \varphi)^2}\ dx\,dy.
\]

Surface Integrals of Vector Fields. The element of area dA on a surface S


parametrized by x = G(u, v) is the norm of the vector (∂u G × ∂v G) du dv. It is
natural to regard the vector (∂u G × ∂v G) du dv itself as a “vector element of area”
for S: its magnitude gives the area of a small bit of S, and its direction, namely the
normal direction to S, specifies how that bit is oriented in space. That is, we have
- .
∂G ∂G
× du dv = n dA
∂u ∂v
where n is a unit normal vector to the surface S. We have already observed that
dA is independent of the parametrization, and clearly so is n up to a factor of ±1.
However, using a different parametrization (for example, interchanging u and v)
might result in replacing n by −n. In other words, a parametrization for a surface
S gives a definite orientation for S, that is, a specification of which side of S is
the “positive” side.
Now suppose S is a surface with a specified orientation, and F is a continuous
vector field defined on a neighborhood of S. The surface integral of F over S is
defined to be
\[
\iint_S \mathbf{F}\cdot\mathbf{n}\,dA.
\]
Thus, if S is parametrized by x = G(u, v), (u, v) ∈ W , we have
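\[
\iint_S \mathbf{F}\cdot\mathbf{n}\,dA = \iint_W \mathbf{F}(\mathbf{G}(u,v)) \cdot \left( \frac{\partial\mathbf{G}}{\partial u} \times \frac{\partial\mathbf{G}}{\partial v} \right) du\,dv.
\]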
This integral is independent of the choice of parametrization as long as the paramet-
rization induces the specified orientation of S; switching to the opposite orientation
results in multiplying the integral by −1. (If S is a nonorientable surface such as a
Möbius band, $\iint_S \mathbf{F}\cdot\mathbf{n}\,dA$ is not defined.)
A geometric-physical interpretation of this is easy to obtain. F·n is the normal
component of F along S; it is positive or negative according as F points into the
positive or negative side of S. We can think of F as representing the flow of some
substance (air, for example, although there is no need to be specific at this point):
the magnitude of F(x) is the rate of flow of the substance past x and its direction
is the direction of flow. The integral $\iint_S \mathbf{F}\cdot\mathbf{n}\,dA$ then represents the net flow, or
flux, of F across the surface S from the negative side to the positive side. We shall
discuss this in more detail in §5.6.
As with line integrals, surface integrals of vector fields are often easier to compute
than surface integrals of scalar functions because the inconvenient square root
in the formula for dA does not appear in the vector n dA. Let us see, for example,
what $\iint_S \mathbf{F}\cdot\mathbf{n}\,dA$ becomes when S is the graph of a function with domain $W \subset \mathbb{R}^2$,
say z = ϕ(x, y). As in the preceding discussion of surface area, we take x and y
as the parameters and find that
\[
\mathbf{n}\,dA = \bigl[ -(\partial_x\varphi)\mathbf{i} - (\partial_y\varphi)\mathbf{j} + \mathbf{k} \bigr]\,dx\,dy.
\]

The orientation here is the one with the normal pointing upward, since its z com-
ponent is positive. Thus, if F = F1 i + F2 j + F3 k and G(x, y) = (x, y, ϕ(x, y)),
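\[
\iint_S \mathbf{F}\cdot\mathbf{n}\,dA
= \iint_W \left[ -F_1(\mathbf{G}(x,y))\frac{\partial\varphi}{\partial x} - F_2(\mathbf{G}(x,y))\frac{\partial\varphi}{\partial y} + F_3(\mathbf{G}(x,y)) \right] dx\,dy. \tag{5.23}
\]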
Here and in what follows, we adopt the common practice of denoting by i, j,
and k the unit vectors in the positive coordinate directions and writing vector fields
in R3 as F = F1 i + F2 j + F3 k in preference to F = (F1 , F2 , F3 ); this serves to
emphasize the interpretation of F as a vector field rather than a transformation.
EXAMPLE 2. Let S be the portion of the cone $x^2 + y^2 = z^2$ with 0 ≤ z ≤ 1,
oriented so that the normal points upward, and let $\mathbf{F}(x,y,z) = x^2\mathbf{i} + yz\mathbf{j} + y\mathbf{k}$.
Compute $\iint_S \mathbf{F}\cdot\mathbf{n}\,dA$.
Solution. One way is to use polar coordinates as parameters: G(r, θ) =
(r cos θ, r sin θ, r). Then we have ∂r G = (cos θ)i + (sin θ)j + k and ∂θ G =
−(r sin θ)i + (r cos θ)j, so
∂r G × ∂θ G = −(r cos θ)i − (r sin θ)j + rk.

This gives the right orientation since the z component, namely r, is positive.
Thus,
\[
\iint_S \mathbf{F}\cdot\mathbf{n}\,dA
= \int_0^{2\pi} \int_0^1 \bigl[ (r\cos\theta)^2(-r\cos\theta) + (r\sin\theta)r(-r\sin\theta) + (r\sin\theta)r \bigr]\,dr\,d\theta,
\]

whose value is easily found to be $-\tfrac14\pi$. Alternatively, we could use the
representation $z = \sqrt{x^2 + y^2}$ and use (5.23). The reader may verify that this leads
to
\[
\iint_S \mathbf{F}\cdot\mathbf{n}\,dA = \iint_{x^2+y^2\le 1} \left[ \frac{-x^3 - y^2\sqrt{x^2+y^2}}{\sqrt{x^2+y^2}} + y \right] dx\,dy,
\]
and conversion of this integral to polar coordinates leads to the same rθ-integral
as before.
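As a sanity check on Example 2, the rθ-integral can be approximated numerically. This is a rough sketch under the same parametrization G(r, θ) = (r cos θ, r sin θ, r); the function name and grid size are illustrative, and the midpoint rule is only one of many quadrature choices.

```python
import math

def flux_example_2(n=200):
    """Midpoint-rule estimate of the flux integral in Example 2."""
    dr = 1.0 / n
    dtheta = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            theta = (j + 0.5) * dtheta
            x, y, z = r * math.cos(theta), r * math.sin(theta), r
            field = (x * x, y * z, y)   # F = x^2 i + yz j + y k on the cone
            normal = (-x, -y, r)        # dG/dr x dG/dtheta, upward-pointing
            total += sum(f * c for f, c in zip(field, normal)) * dr * dtheta
    return total

print(flux_example_2(), -math.pi / 4)
```

The estimate should be close to −π/4, the value stated in the example.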

Finally, as a practical matter we need to extend the ideas in this section from
smooth surfaces to piecewise smooth surfaces. Giving a satisfactory general def-
inition of a “piecewise smooth surface” is a rather messy business, and we shall
not attempt it. For our present purposes, it will suffice to assume that the surface S
under consideration is the union of finitely many pieces S1 , . . . , Sk that satisfy the
following conditions:
i. Each Sj admits a smooth parametrization as discussed above.

ii. The intersections Si ∩ Sj are either empty or finite unions of smooth curves.
We then define integration over S in the obvious way:
\[
\iint_S f\,dA = \sum_{j=1}^{k} \iint_{S_j} f\,dA.
\]

Condition (ii) guarantees that the parts of S that are counted more than once on
the right, namely the intersections Si ∩ Sj , contribute nothing to the integral, by
Propositions 4.19 and 4.22.

EXAMPLE 3.
a. Let S be the surface of a cube; then we can take S1 , . . . , S6 to be the faces
of the cube.
b. Let S be the surface of the cylindrical solid $\{(x,y,z) : x^2 + y^2 \le 1,\ |z| \le 1\}$.
We can write S = S1 ∪ S2 ∪ S3 where S1 and S2 are the discs forming
the top and bottom and S3 is the circular vertical side. S1 and S2 can be
parametrized by $(x,y) \to (x,y,1)$ and $(x,y) \to (x,y,-1)$ with $x^2 + y^2 \le 1$,
and S3 can be parametrized by $(\theta,z) \to (\cos\theta, \sin\theta, z)$ with 0 ≤ θ < 2π
and |z| ≤ 1. If one wishes to use only one-to-one parametrizations with
compact parameter domains, one can cut S3 further into two pieces, say
the left and right halves defined by 0 ≤ θ ≤ π and π ≤ θ ≤ 2π.
Remark. In condition (ii) above, we have in mind that the sets Sj will intersect
each other only along their edges, although there is nothing to forbid them from
crossing one another. For example, S could be the union of the two spheres S1 =
{x : |x| = 1} and S2 = {x : |x − i| = 1}. This added generality is largely useless
but also harmless.

EXERCISES
1. Find the area of the part of the surface z = xy inside the cylinder $x^2 + y^2 = a^2$.
2. Find the area of the part of the surface $z = x^2 + y^2$ inside the cylinder
$x^2 + y^2 = a^2$.
3. Suppose 0 < a < b. Find the area of the torus obtained by revolving the circle
$(x-b)^2 + z^2 = a^2$ in the xz-plane about the z axis. (Hint: The torus may be
parametrized by x = (b + a cos ϕ) cos θ, y = (b + a cos ϕ) sin θ, z = a sin ϕ,
with 0 ≤ ϕ, θ ≤ 2π.)
4. Find the area of the ellipsoid $(x/a)^2 + (y/a)^2 + (z/b)^2 = 1$.
5. Find the centroid of the upper hemisphere of the unit sphere $x^2 + y^2 + z^2 = 1$.
6. Compute $\iint_S (x^2 + y^2)\,dA$ where S is the portion of the sphere $x^2 + y^2 + z^2 = 4$
with z ≥ 1.
7. Compute $\iint_S (x^2 + y^2 - 2z^2)\,dA$ where S is the unit sphere. Can you find the
answer by symmetry considerations without doing any calculations?
8. Calculate $\iint_S \mathbf{F}\cdot\mathbf{n}\,dA$ for the following F and S.
a. $\mathbf{F}(x,y,z) = xz\mathbf{i} - xy\mathbf{k}$; S is the portion of the surface z = xy with
0 ≤ x ≤ 1, 0 ≤ y ≤ 2, oriented so that the normal points upward.
b. $\mathbf{F}(x,y,z) = x^2\mathbf{i} + z\mathbf{j} - y\mathbf{k}$; S is the unit sphere $x^2 + y^2 + z^2 = 1$, oriented
so that the normal points outward (away from the center).
c. $\mathbf{F}(x,y,z) = xy\mathbf{i} + z\mathbf{j}$; S is the triangle with vertices (2, 0, 0), (0, 2, 0),
(0, 0, 2), oriented so that the normal points upward.
d. $\mathbf{F}(x,y,z) = z^2\mathbf{k}$; S is the boundary of the region $x^2 + y^2 \le 1$, a ≤ z ≤ b,
oriented so that the normal points out of the region. (You should be able to
do this in your head.)
e. $\mathbf{F}(x,y,z) = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$; S is the boundary of the region
$x^2 + y^2 \le z \le \sqrt{2 - x^2 - y^2}$, oriented so that the normal points out of the region.

5.4 Vector Derivatives


Let ∇ denote the n-tuple of partial differential operators ∂j = ∂/∂xj :

∇ = (∂1 , . . . , ∂n ).

We are already familiar with this notation in connection with the gradient of a C 1
function on Rn , which is the vector field defined by

grad f = ∇f = (∂1 f, . . . , ∂n f ).

We can also use ∇ to form interesting combinations of the derivatives of a vector
field, via the dot and cross product. If F is a $C^1$ vector field on an open subset of
$\mathbb{R}^n$, the divergence of F is the function defined by

div F = ∇ · F = ∂1 F1 + · · · + ∂n Fn .

The geometric (coordinate-invariant) meaning of ∇ · F will be explained in §5.5.


Next, suppose n = 3. If F is a C 1 vector field on an open subset of R3 , the
curl of F is the vector field defined by

curl F = ∇ × F = (∂2 F3 − ∂3 F2 )i + (∂3 F1 − ∂1 F3 )j + (∂1 F2 − ∂2 F1 )k.

(Some authors write rot F instead of curl F; “rot” stands for “rotation.”) Again,
the curl has a geometric significance that will be explained later, in §5.7.
We shall employ the notations div F and curl F in preference to ∇·F and ∇×F
because they seem to be more readable. In this section we shall also write grad f
instead of ∇f for the sake of consistency; later we shall use these two notations
interchangeably.
The operators grad, curl, and div satisfy product rules with respect to scalar
multiplication and dot and cross products. As these rules are useful and some of
them are not obvious, it is well to make a list for handy reference. In the following
formulas, f and g are real-valued functions and F and G are vector fields, all of
class C 1 .

(5.24) grad(f g) = f grad g + g grad f


(5.25) grad(F · G) = (F · ∇)G + F × (curl G) + (G · ∇)F + G × (curl F)
(5.26) curl(f G) = f curl G + (grad f ) × G
(5.27) curl(F × G) = (G · ∇)F + (div G)F − (F · ∇)G − (div F)G
(5.28) div(f G) = f div G + (grad f ) · G
(5.29) div(F × G) = G · (curl F) − F · (curl G)
In (5.25) and (5.27), F · ∇ denotes the directional derivative $\sum_j F_j\partial_j$, that is,
\[
(\mathbf{F}\cdot\nabla)\mathbf{G} = \sum_j F_j \frac{\partial\mathbf{G}}{\partial x_j}.
\]
Equations (5.24) and (5.28) are valid in Rn for any n; the others, which involve
cross products and curls, are restricted to n = 3. The proofs of all these formulas
are just a matter of computation; we leave them to the reader as exercises.
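A product rule such as (5.29) can be spot-checked numerically before one attempts the proof. The sketch below is an illustration only: the two sample fields, the sample point, and the step size `H` are arbitrary assumptions, and the derivatives are central finite differences rather than exact ones, so the two sides agree only up to a small discretization error.

```python
import math

H = 1e-5  # finite-difference step (an arbitrary small value)

def partial(f, p, j, k):
    """Central-difference estimate of d(f_k)/d(x_j) at the point p."""
    plus, minus = list(p), list(p)
    plus[j] += H
    minus[j] -= H
    return (f(*plus)[k] - f(*minus)[k]) / (2 * H)

def div(f, p):
    return sum(partial(f, p, j, j) for j in range(3))

def curl(f, p):
    return (partial(f, p, 1, 2) - partial(f, p, 2, 1),
            partial(f, p, 2, 0) - partial(f, p, 0, 2),
            partial(f, p, 0, 1) - partial(f, p, 1, 0))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_field(f, g):
    """Return the vector field F x G as a function of (x, y, z)."""
    def fxg(x, y, z):
        a, b = f(x, y, z), g(x, y, z)
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])
    return fxg

# Two hypothetical smooth fields chosen only for illustration.
def F(x, y, z):
    return (y * z, x * x, math.sin(x) + z)

def G(x, y, z):
    return (x * y, math.exp(z), x * z * z)

p = (0.3, -0.7, 1.1)
lhs = div(cross_field(F, G), p)                         # div(F x G)
rhs = dot(G(*p), curl(F, p)) - dot(F(*p), curl(G, p))   # G.curl F - F.curl G
print(lhs, rhs)
```

Agreement at one point proves nothing, of course, but a disagreement would reveal a sign error immediately; replacing the minus sign in `rhs` by a plus makes the two numbers differ.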
We can combine the operations grad, curl, and div pairwise in several ways.
That is, if f and F are of class C 2 , we can form

curl(grad f ), div(curl F), div(grad f ), curl(curl F), grad(div F).


It is an important fact that the first two of these always vanish, by the equality
of mixed partials:

(5.30) curl(grad f )
= (∂2 ∂3 f − ∂3 ∂2 f )i + (∂3 ∂1 f − ∂1 ∂3 f )j + (∂1 ∂2 f − ∂2 ∂1 f )k = 0

and

(5.31) div(curl F)
= ∂1 (∂2 F3 − ∂3 F2 ) + ∂2 (∂3 F1 − ∂1 F3 ) + ∂3 (∂1 F2 − ∂2 F1 ) = 0.

Schematically, we have

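\[
\text{scalar functions} \xrightarrow{\ \operatorname{grad}\ } \text{vector fields} \xrightarrow{\ \operatorname{curl}\ } \text{vector fields} \xrightarrow{\ \operatorname{div}\ } \text{scalar functions}
\]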

and (5.30) and (5.31) say that the composition of two successive mappings is zero.
The third combination, div(grad f ), which makes sense in any number of di-
mensions, is of fundamental importance for both physical and purely mathematical
reasons. It is called the Laplacian of f and is usually denoted by $\nabla^2 f$ or $\Delta f$:

\[
\nabla^2 f = \Delta f = \operatorname{div}(\operatorname{grad} f) = \partial_1^2 f + \cdots + \partial_n^2 f. \tag{5.32}
\]

The last two combinations are of less interest by themselves, but together they yield
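the Laplacian for vector fields in $\mathbb{R}^3$:

\[
\operatorname{grad}(\operatorname{div}\mathbf{F}) - \operatorname{curl}(\operatorname{curl}\mathbf{F}) = \nabla^2\mathbf{F} = (\nabla^2 F_1)\mathbf{i} + (\nabla^2 F_2)\mathbf{j} + (\nabla^2 F_3)\mathbf{k}. \tag{5.33}
\]

The verification of (5.33) is a straightforward but somewhat tedious calculation that
we leave to the reader.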

EXERCISES

1. Compute the curl and divergence of the following vector fields.
a. $\mathbf{F}(x,y,z) = xy^2\mathbf{i} + xy\mathbf{j} + xy\mathbf{k}$.
b. $\mathbf{F}(x,y,z) = (\sin yz)\mathbf{i} + (xz\cos yz)\mathbf{j} + (xy\cos yz)\mathbf{k}$.
c. $\mathbf{F}(x,y,z) = x^2 z\mathbf{i} + 4xyz\mathbf{j} + (y - 3xz^2)\mathbf{k}$.
2. Compute the Laplacians of the following functions.
a. $f(x,y) = x^5 - 10x^3y^2 + 5xy^4$.
b. $f(x,y,z) = xy^2 - 4yz^3$.
c. $f(\mathbf{x}) = |\mathbf{x}|^a$ ($\mathbf{x} \in \mathbb{R}^n \setminus \{0\}$, $a \in \mathbb{R}$). (Hint: Use Exercise 9 in §2.6.)
d. $f(x,y) = \log(x^2 + y^2)$ ($(x,y) \ne (0,0)$).
3. Let $\mathbf{F}(x,y,z) = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$. Show that for any $\mathbf{a} \in \mathbb{R}^3$, we have
$\operatorname{curl}(\mathbf{a}\times\mathbf{F}) = 2\mathbf{a}$, $\operatorname{div}[(\mathbf{a}\cdot\mathbf{F})\mathbf{a}] = |\mathbf{a}|^2$, and $\operatorname{div}[(\mathbf{a}\times\mathbf{F})\times\mathbf{a}] = 2|\mathbf{a}|^2$.
4. Prove (5.24) and (5.25).
5. Prove (5.26) and (5.27).
6. Prove (5.28) and (5.29).
7. Prove (5.33).
8. Why is the minus sign in (5.29) there? That is, on grounds of symmetry, without
going through any calculations, why must the formula div(F × G) =
G · (curl F) + F · (curl G) be wrong?
9. Show that for any $C^2$ functions f and g, div(grad f × grad g) = 0.

5.5 The Divergence Theorem


The divergence theorem, also known as Gauss’s theorem or Ostrogradski’s the-
orem, is the 3-dimensional analogue of the version (5.18) of Green’s theorem; it
relates surface integrals over the boundary of a regular region in R3 to volume inte-
grals over the region itself. The divergence theorem is valid for regions with piece-
wise smooth boundaries, but we shall allow the meaning of “piecewise smooth”
to remain a little vague; see the remarks at the end of §5.3. To formulate precise
conditions that encompass all the cases of interest would involve a rather arduous
excursion into technicalities, and the more restricted class of regions covered by the
following argument suffices for most purposes.
5.34 Theorem (The Divergence Theorem). Suppose R is a regular region in $\mathbb{R}^3$
with piecewise smooth boundary ∂R, oriented so that the positive normal points
out of R. Suppose also that F is a vector field of class $C^1$ on R. Then
\[
\iint_{\partial R} \mathbf{F}\cdot\mathbf{n}\,dA = \iiint_R \operatorname{div}\mathbf{F}\,dV. \tag{5.35}
\]

Proof. As with Green’s theorem, we begin by considering a class of simple regions.
We say that R is xy-simple if it has the form
\[
R = \bigl\{ (x,y,z) : (x,y) \in W,\ \varphi_1(x,y) \le z \le \varphi_2(x,y) \bigr\},
\]

where W is a regular region in the xy-plane and ϕ1 and ϕ2 are piecewise smooth
functions on W . We define the notions of yz-simple and xz-simple similarly, and
we say that R is simple if it is xy-simple, yz-simple, and xz-simple.

Suppose now that R is simple. We shall prove the divergence theorem for
the region R by considering the components of F separately. That is, let F =
F1 i + F2 j + F3 k; we shall show that
\[
\iint_{\partial R} F_3\,\mathbf{k}\cdot\mathbf{n}\,dA = \iiint_R \partial_3 F_3\,dV,
\]
and similarly for the other two components. Since R is xy-simple, the boundary
∂R consists of three pieces: the “top” and “bottom” surfaces z = ϕ2 (x, y) and
z = ϕ1 (x, y) and the “sides” consisting of the union of the vertical line segments
from (x, y, ϕ1 (x, y)) to (x, y, ϕ2 (x, y)) as (x, y) ranges over the boundary of W .
The outward normal to R is horizontal on the sides, i.e., k · n = 0 there, so the
sides contribute nothing to the surface integral. For the top and bottom surfaces we
use (5.23). The outward normal points upward on the top surface and downward
on the bottom surface, so
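\begin{align*}
\iint_{\partial R} F_3\,\mathbf{k}\cdot\mathbf{n}\,dA
&= \iint_W F_3(x, y, \varphi_2(x,y))\,dx\,dy - \iint_W F_3(x, y, \varphi_1(x,y))\,dx\,dy \\
&= \iint_W \int_{\varphi_1(x,y)}^{\varphi_2(x,y)} \partial_3 F_3(x,y,z)\,dz\,dx\,dy \\
&= \iiint_R \partial_3 F_3(x,y,z)\,dV,
\end{align*}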
as claimed. The proof for F1 i and F2 j is the same, using the assumptions that R is
yz-simple and xz-simple.
It now follows that the divergence theorem is valid for regions that can be cut
up into finitely many simple regions R1 , . . . , Rk . The integrals of div F over the
regions R1 , . . . , Rk add up to the integral over R, and the integrals of F · n over
the boundaries ∂R1 , . . . , ∂Rk add up to the integral over ∂R because the integrals
over the portions of the ∂Rj ’s that are not part of ∂R cancel out. (The reasoning is
the same as in the proof of Green’s theorem.)
The completion of the proof for general regular regions with smooth boundary,
with indications of how to generalize it to the piecewise smooth case, is given in
Appendix B.7 (Theorem B.30).
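To see the theorem at work on a concrete simple region, one can compare both sides of (5.35) numerically on the unit cube 0 ≤ x, y, z ≤ 1. The following sketch uses a sample field chosen here purely for illustration (it does not come from the text), and approximates both integrals with the midpoint rule.

```python
def F(x, y, z):
    """Sample field F = xy i + y z^2 j + (x + z^3) k (an illustrative choice)."""
    return (x * y, y * z * z, x + z ** 3)

def div_F(x, y, z):
    """Divergence of the sample field, computed by hand: y + z^2 + 3 z^2."""
    return y + 4 * z * z

def volume_integral(n=40):
    """Midpoint-rule estimate of the integral of div F over the unit cube."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return sum(div_F(x, y, z) for x in pts for y in pts for z in pts) * h ** 3

def boundary_flux(n=40):
    """Midpoint-rule estimate of the outward flux of F through the six faces."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    total = 0.0
    for s in pts:
        for t in pts:
            # Outward normals are +i, -i on x = 1, x = 0, and similarly for y, z.
            total += F(1.0, s, t)[0] - F(0.0, s, t)[0]
            total += F(s, 1.0, t)[1] - F(s, 0.0, t)[1]
            total += F(s, t, 1.0)[2] - F(s, t, 0.0)[2]
    return total * h * h

print(volume_integral(), boundary_flux())
```

For this field both integrals equal 11/6 exactly, so the two printed values should agree with each other and with 1.8333… up to quadrature error.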
Armed with the divergence theorem, we can obtain a better understanding of
the meaning of div F. Suppose F is a vector field of class C 1 in some open set
containing the point a. For r > 0, let Br be the ball of radius r about a. If r is
very small, the average value of div F(x) on the ball Br is very nearly equal to
div F(a). Therefore, by the divergence theorem,
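\[
\operatorname{div}\mathbf{F}(\mathbf{a}) \approx \frac{3}{4\pi r^3} \iiint_{B_r} \operatorname{div}\mathbf{F}\,dV = \frac{3}{4\pi r^3} \iint_{\partial B_r} \mathbf{F}\cdot\mathbf{n}\,dA.
\]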

This approximation becomes better and better as r → 0, and hence


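\[
\operatorname{div}\mathbf{F}(\mathbf{a}) = \lim_{r\to 0} \frac{3}{4\pi r^3} \iint_{|\mathbf{x}-\mathbf{a}|=r} \mathbf{F}\cdot\mathbf{n}\,dA. \tag{5.36}
\]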

The integral on the right is the flux of F across ∂Br from the inside (Br ) to the
outside (the complement of Br ). If we think of the vector field as representing
the flow of some substance through space, the integral represents the amount of
substance flowing out of Br minus the amount of substance flowing in; thus, the
condition div F(a) > 0 means that there is a net outflow near a, in other words,
that F tends to “diverge” from a. (The effect is subtle, though: One has to divide
the flux by $r^3$ in (5.36) to get something that does not vanish in the limit.) In any
case, the integral in (5.36) is a geometrically defined quantity that is independent
of the choice of coordinates; this gives the promised coordinate-free interpretation
of div F.
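The limit (5.36) can be observed numerically. In the sketch below, the field, the base point, and the radii are arbitrary illustrative choices; the flux across the sphere of radius r about a is approximated by a midpoint rule in spherical coordinates and divided by the volume of the ball.

```python
import math

def F(x, y, z):
    """Sample field F = x^3 i + y j, whose divergence is 3x^2 + 1."""
    return (x ** 3, y, 0.0)

def flux_per_volume(a, r, n=100):
    """Flux of F out of the sphere |x - a| = r, divided by the ball's volume."""
    dphi = math.pi / n
    dtheta = 2 * math.pi / n
    flux = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        for j in range(n):
            theta = (j + 0.5) * dtheta
            # Unit outward normal at the point a + r*nrm of the sphere.
            nrm = (math.sin(phi) * math.cos(theta),
                   math.sin(phi) * math.sin(theta),
                   math.cos(phi))
            f = F(a[0] + r * nrm[0], a[1] + r * nrm[1], a[2] + r * nrm[2])
            f_dot_n = f[0] * nrm[0] + f[1] * nrm[1] + f[2] * nrm[2]
            # Area element on the sphere of radius r: r^2 sin(phi) dphi dtheta.
            flux += f_dot_n * r * r * math.sin(phi) * dphi * dtheta
    return flux / (4.0 / 3.0 * math.pi * r ** 3)

a = (0.5, -0.2, 0.3)
div_at_a = 3 * a[0] ** 2 + 1
for r in (0.1, 0.01):
    print(r, flux_per_volume(a, r), div_at_a)
```

The quotient approaches div F(a) = 1.75 as r shrinks; for this particular field the deviation is proportional to r².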
Among the important consequences of the divergence theorem are the follow-
ing identities.
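5.37 Corollary (Green’s Formulas). Suppose R is a regular region in $\mathbb{R}^3$ with
piecewise smooth boundary, and f and g are functions of class $C^2$ on R. Then
\begin{align}
\iint_{\partial R} f\,\nabla g\cdot\mathbf{n}\,dA &= \iiint_R (\nabla f\cdot\nabla g + f\,\nabla^2 g)\,dV, \tag{5.38} \\
\iint_{\partial R} (f\,\nabla g - g\,\nabla f)\cdot\mathbf{n}\,dA &= \iiint_R (f\,\nabla^2 g - g\,\nabla^2 f)\,dV. \tag{5.39}
\end{align}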

Proof. An application of the product rule (5.28) shows that $\operatorname{div}(f\nabla g) = \nabla f\cdot\nabla g + f\,\nabla^2 g$,
so the divergence theorem applied to F = f ∇g yields (5.38). The
corresponding equation with f and g switched also holds; by subtracting the latter
equation from the former we obtain (5.39).

The directional derivative ∇f · n that occurs in these formulas is called the
outward normal derivative of f on ∂R and is often denoted by ∂f /∂n.

EXERCISES
In several of these exercises it will be useful to note that if $S_r$ is the sphere of
radius r about the origin, the unit outward normal to $S_r$ at a point $\mathbf{x} \in S_r$ is just
$r^{-1}\mathbf{x}$. This is geometrically obvious if you think about it a little. Alternatively,
since $S_r$ is a level set of the function $|\mathbf{x}|^2 = x^2 + y^2 + z^2$, we know that
$\nabla(|\mathbf{x}|^2) = 2x\mathbf{i} + 2y\mathbf{j} + 2z\mathbf{k} = 2\mathbf{x}$ is normal to $S_r$, so the unit normal is
$|\mathbf{x}|^{-1}\mathbf{x} = r^{-1}\mathbf{x}$ for $\mathbf{x} \in S_r$.
1. Use the divergence theorem to evaluate the surface integral $\iint_S \mathbf{F}\cdot\mathbf{n}\,dA$ for the
following F and S, where S is oriented so that the positive normal points out
of the region bounded by S.
a. F, S as in Exercise 8b in §5.3.
b. F, S as in Exercise 8e in §5.3.
c. $\mathbf{F}(x,y,z) = x^2\mathbf{i} + y^2\mathbf{j} + z^2\mathbf{k}$; S is the surface of the cube 0 ≤ x, y, z ≤ a.
d. $\mathbf{F}(x,y,z) = (x/a^2)\mathbf{i} + (y/b^2)\mathbf{j} + (z/c^2)\mathbf{k}$; S is the ellipsoid
$(x/a)^2 + (y/b)^2 + (z/c)^2 = 1$.
e. $\mathbf{F}(x,y,z) = x^2\mathbf{i} - 2xy\mathbf{j} + z^2\mathbf{k}$; S is the surface of the cylindrical solid
{(x, y, z) : (x, y) ∈ W, 1 ≤ z ≤ 2} where W is a smoothly bounded
regular region in the plane with area A.
2. Let $\mathbf{F}(x,y,z) = (x^2 + y^2 + z^2)(x\mathbf{i} + y\mathbf{j} + z\mathbf{k})$ and let S be the sphere of radius
a about the origin. Compute $\iint_S \mathbf{F}\cdot\mathbf{n}\,dA$ both directly and by the divergence
theorem.
3. Let R be a regular region in $\mathbb{R}^3$ with piecewise smooth boundary. Show that
the volume of R is $\tfrac13 \iint_{\partial R} \mathbf{F}\cdot\mathbf{n}\,dA$ where $\mathbf{F}(x,y,z) = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$.
4. Prove the following integration-by-parts formula for triple integrals:
\[
\iiint_R f\,\frac{\partial g}{\partial x}\,dV = -\iiint_R \frac{\partial f}{\partial x}\,g\,dV + \iint_{\partial R} f g\,n_x\,dA,
\]
where $n_x$ is the x-component of the unit outward normal to ∂R. (Of course,
similar formulas also hold with x replaced by y and z.)
5. Suppose R is a regular region in $\mathbb{R}^3$ with piecewise smooth boundary, and f is
a function of class $C^2$ on R.
a. Show that $\displaystyle \iint_{\partial R} \frac{\partial f}{\partial n}\,dA = \iiint_R \nabla^2 f\,dV$.
b. Show that if $\nabla^2 f = 0$, then $\displaystyle \iint_{\partial R} f\,\frac{\partial f}{\partial n}\,dA = \iiint_R |\nabla f|^2\,dV$.
6. Let $\mathbf{x} = (x,y,z)$ and $g(\mathbf{x}) = |\mathbf{x}|^{-1} = (x^2 + y^2 + z^2)^{-1/2}$.
a. Compute $\nabla g(\mathbf{x})$ for $\mathbf{x} \ne 0$.
b. Show that $\nabla^2 g(\mathbf{x}) = 0$ for $\mathbf{x} \ne 0$. (Cf. Exercise 9 in §2.6.)
c. Show by direct calculation that $\iint_S (\partial g/\partial n)\,dA = -4\pi$ if S is any sphere
centered at the origin.
d. Since $\partial g/\partial n = \nabla g\cdot\mathbf{n}$ and $\nabla^2 g = \operatorname{div}(\nabla g)$, why do (b) and (c) not
contradict the divergence theorem?
e. Show that $\iint_{\partial R} (\partial g/\partial n)\,dA = -4\pi$ if R is any regular region with piecewise
smooth boundary whose interior contains the origin. (Hint: Consider
the region obtained by excising a small ball about the origin from R.)
7. Suppose that f is a $C^2$ function on $\mathbb{R}^3$ that satisfies Laplace’s equation $\nabla^2 f = 0$.
a. By applying (5.39) to f and g, with g as in Exercise 6 and $R = \{\mathbf{x} : \epsilon \le |\mathbf{x}| \le r\}$,
show that the mean values of f on the spheres $|\mathbf{x}| = r$ and
$|\mathbf{x}| = \epsilon$ are equal. (Use Exercises 5a and 6.)
b. Conclude that the mean value of f on any sphere centered at the origin is
equal to the value of f at the origin. (Remark: There is nothing special
about the origin here. By applying this result to $\tilde f(\mathbf{x}) = f(\mathbf{x} + \mathbf{a})$, which
also satisfies Laplace’s equation, we see that the mean value of f on any
sphere is the value of f at the center. The converse is also true; a function
that has this mean value property must satisfy Laplace’s equation.)

5.6 Some Applications to Physics


In this section we illustrate the uses of the divergence theorem by deriving some
important differential equations of mathematical physics. We make a standing as-
sumption that all unspecified mathematical functions that denote physical quantities
are smooth enough to ensure the validity of the calculations.

Flow of Material. We have previously alluded to an interpretation of a vector
field in terms of material flowing through space. We now develop this idea in more
detail.
Suppose there is some substance moving through a region of space — it might
be air, water, electric charge, or whatever. The distribution of the substance is given
by a density function ρ(x, t); thus ρ(x, t) dV is the amount of substance at time t
in a small box of volume dV located at the point x = (x, y, z). The substance is
moving around, so we also have the velocity field v(x, t) that gives the velocity of
the substance at position x and time t.
Now consider a small bit of oriented surface dS (imagined, not physical) with
area dA and normal vector n located near the point x. (We shall picture dS as a
parallelogram, but its exact shape is unimportant.) At what rate does the substance
flow through this bit of surface?
First suppose that n is parallel to the velocity v = v(x, t). We picture a small
box with vertical face dS and length |v| dt, where dt is a small increment in time,
as in Figure 5.8a. We assume that the box is sufficiently small so that the
velocity and density are essentially constant throughout the box during the time
interval (t, t + dt). Then the substance that flows through the surface dS in the
time interval dt is just the contents of the box at time t. The volume of the box is
|v| dt dA, so the amount of substance in the box is ρ|v| dt dA. In short, the rate of
flow of substance through dS is ρ|v| dA.

[FIGURE 5.8: Flow of material through a surface element dS. Panel (a): the normal n is parallel to the velocity v. Panel (b): the normal n makes an angle θ with v.]

Now suppose, more generally, that the angle from the velocity v to the normal
n to dS is θ. We apply the same reasoning to the box in Figure 5.8b. The vertical
height of the box is now | cos θ| times the slant height of dS, so the volume of the
box is |v| | cos θ| dt dA = |v · n| dt dA. Therefore, the rate of flow of substance
through dS is ρv · n dA if we take orientation into account, that is, if we count the
flow as negative when it goes in across dS in the direction opposite to n.
Passing from the infinitesimal level to the macroscopic level, we conclude that
the rate of flow of substance through a surface S is
\[
\iint_S \mathbf{J}\cdot\mathbf{n}\,dA, \quad\text{where } \mathbf{J}(\mathbf{x},t) = \rho(\mathbf{x},t)\,\mathbf{v}(\mathbf{x},t).
\]

The time-dependent vector field J = ρv that occurs here represents the momentum
density if ρ is the mass density of the substance, and it represents the current density
if the substance is electric charge and ρ is the charge density. Our earlier remarks
about interpreting vector fields in terms of flows really mean thinking of the vector
field as a momentum or current density.

A Conservation Law. Now we come to the application of the divergence theorem.
In the context of the preceding discussion, suppose that the substance is
conserved, i.e., that it is neither created nor destroyed. Consider a regular region R
in space with smooth boundary ∂R. The total amount of substance in R at time t
is $\iiint_R \rho(\mathbf{x},t)\,d^3\mathbf{x}$. Since the substance is conserved, the only way for this amount
to change is for the substance to flow in or out through ∂R. Therefore,
\[
\frac{d}{dt} \iiint_R \rho(\mathbf{x},t)\,d^3\mathbf{x} = -\iint_{\partial R} \mathbf{J}\cdot\mathbf{n}\,dA.
\]
(The integral on the right is positive when the substance flows out of R, i.e., when
the amount of substance in R is decreasing; hence the minus sign.) The quantity
on the left is the integral over R of ∂ρ/∂t, by Theorem 4.47. We can use the
divergence theorem to convert the integral on the right to another integral over R,
obtaining
\[
\iiint_R \frac{\partial\rho}{\partial t}(\mathbf{x},t)\,dV = -\iiint_R \operatorname{div}\mathbf{J}\,dV. \tag{5.40}
\]

Now, this relation holds for any region R. In particular, let us take R = Br to
be the ball of radius r centered at the point x. After division of both sides by the
volume of Br , (5.40) says that the mean values of ∂ρ/∂t and − div J on Br are
equal. Letting r → 0 and assuming that these functions are continuous, we see that
their values at the center x are equal. In short, we have

\[
\frac{\partial\rho}{\partial t} + \operatorname{div}\mathbf{J} = 0, \tag{5.41}
\]
the classic differential equation relating the charge and current densities (or mass
and momentum densities, etc.).
This argument is reversible; that is, (5.41) implies that the substance is con-
served. Indeed, suppose R is a regular region such that no substance flows in or out
of R. Integrating (5.41) and using Theorem 4.47 and the divergence theorem, we
obtain
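\[
\frac{d}{dt} \iiint_R \rho\,dV = \iiint_R \frac{\partial\rho}{\partial t}\,dV = -\iiint_R \operatorname{div}\mathbf{J}\,dV = -\iint_{\partial R} \mathbf{J}\cdot\mathbf{n}\,dA = 0,
\]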

so the amount of substance in R remains constant. Although (5.41) is equivalent
to the conservation of the substance, it is more informative than the mere statement
that the substance is neither created nor destroyed; it provides information about how the
substance can move around.
The conservation law (5.41) has an important consequence for an incompress-
ible fluid such as water. Incompressibility means that the density ρ is a constant,
so that on the one hand, ∂ρ/∂t = 0, and on the other, div J = div(ρv) = ρ div v.
Thus, (5.41) implies that the velocity field v for an incompressible fluid satisfies
div v = 0.
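The integral form of the conservation law has a simple discrete analogue that may help fix ideas. The sketch below is an illustration and not part of the text's derivation: it treats a one-dimensional periodic row of cells, takes J = ρv with a constant velocity v > 0, and updates each cell by the difference of the fluxes through its two faces (a standard first-order upwind scheme). The grid, time step, and initial density are arbitrary assumptions.

```python
import math

def upwind_step(rho, v, dx, dt):
    """One update of d(rho)/dt + d(J)/dx = 0 with J = rho*v, v > 0, periodic cells.

    Cell i loses what flows out through its right face and gains what flows
    in through its left face, which is the right face of cell i-1.
    """
    n = len(rho)
    flux = [v * rho[i] for i in range(n)]  # flux[i]: J through right face of cell i
    # For i = 0, flux[i - 1] is flux[-1], the last cell, which gives periodicity.
    return [rho[i] - (dt / dx) * (flux[i] - flux[i - 1]) for i in range(n)]

n, v = 100, 1.0
dx = 1.0 / n
dt = 0.5 * dx / v   # keeps v*dt/dx <= 1 so the scheme is stable
rho = [math.exp(-(((i + 0.5) * dx - 0.5) / 0.1) ** 2) for i in range(n)]

mass_before = sum(rho) * dx
for _ in range(200):
    rho = upwind_step(rho, v, dx, dt)
mass_after = sum(rho) * dx
print(mass_before, mass_after)
```

Because every face flux is subtracted from one cell and added to its neighbor, the total amount of substance changes only by rounding error, mirroring the cancellation of interior boundary terms in the continuous argument.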

The Heat Equation. We now derive a mathematical model for the transfer
of heat through a substance by diffusion. (If the substance in question is a fluid
like water or air, our model does not take convection effects into account; we must
assume that the fluid is immobile on the macroscopic scale. But our model is valid
for the diffusion of heat in solids as well as in fluids that cannot flow readily, such
as air in a down jacket.) Our model will take the form of a differential equation for
the temperature u(x, t) at position x and time t.
The first basic physical assumption (which may be a simplification of the real-
life situation) is that the thermal energy density is proportional to the temperature.
The constant of proportionality σ is the specific heat density; it is the product of the
usual specific heat or heat capacity and the mass density of the substance. The total
thermal energy (or “heat,” for short) within a region R at time t is then
\[
\iiint_R \sigma u(\mathbf{x},t)\,d^3\mathbf{x}.
\]

The next assumption is Newton’s law of cooling, which says that heat flows
from hotter to colder regions at a rate proportional to the difference in temperature.
In our situation, the precise interpretation of this statement is that the flux of heat
per unit area in the direction of the unit vector n at the point x is proportional to the
directional derivative ∇u(x) · n of the temperature in the direction n, the constant
of proportionality being negative since heat flows in the direction of decreasing
temperature. Denoting the constant of proportionality by −K, then, we see that the
flux of heat across an oriented surface S with normal vector n is
\[
-\iint_S K\,\nabla u\cdot\mathbf{n}\,dA.
\]

K is called the thermal conductivity.


Next, the amount of heat in a regular region R can change only by the flow of
heat across the boundary ∂R or by the creation or destruction of heat within R (by
a chemical or nuclear reaction, for example). Thus, if we denote by F (x, t) the rate
per unit volume at which heat is being produced at position x at time t, we have
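\[
\frac{d}{dt} \iiint_R \sigma u(\mathbf{x},t)\,d^3\mathbf{x} = \iint_{\partial R} K\,\nabla u(\mathbf{x},t)\cdot\mathbf{n}\,dA + \iiint_R F(\mathbf{x},t)\,d^3\mathbf{x}.
\]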

Here n denotes the unit outward normal to ∂R, as usual, and the minus sign on the
surface integral has disappeared because a positive flow of heat out of R represents
a decrease of heat in R.
As in the preceding subsection, we bring the d/dt inside the integral and apply
the divergence theorem to obtain
\[
\iiint_R \sigma\,\frac{\partial u}{\partial t}\,dV = \iiint_R K\,\nabla^2 u\,dV + \iiint_R F\,dV.
\]

Since this holds for an arbitrary regular region R, we conclude as before that
\[
\sigma\,\frac{\partial u}{\partial t}(\mathbf{x},t) = K\,\nabla^2 u(\mathbf{x},t) + F(\mathbf{x},t). \tag{5.42}
\]
This partial differential equation is known as the (inhomogeneous) heat equation;
it is of fundamental importance in the study of all sorts of diffusion processes. The
important special case F = 0 (the homogeneous equation) is what is usually called
the heat equation.
We have implicitly assumed that the specific heat density σ and the thermal
conductivity K are constants. However, the same arguments apply to the more
general situation where they are allowed to depend on position, as will be the case
where the material through which the heat is diffusing varies in some way from
point to point. The reader may verify that the result is the following generalized
heat equation:
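\[
\sigma(\mathbf{x})\,\frac{\partial u}{\partial t}(\mathbf{x},t) = \operatorname{div}\bigl[ K(\mathbf{x})\,\nabla u(\mathbf{x},t) \bigr] + F(\mathbf{x},t).
\]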
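For a feel of how the homogeneous equation (5.42) behaves, here is a small numerical experiment in the simplest setting: constant σ and K, no source term, and a temperature depending on one space variable only, so that the equation reads σ ∂u/∂t = K ∂²u/∂x². The scheme is the standard explicit finite-difference method, which is not discussed in the text; the constants, grid, and boundary conditions (temperature held at 0 at both ends of [0, π]) are assumptions made for the illustration.

```python
import math

def heat_step(u, lam):
    """One explicit step u_i += lam*(u_{i+1} - 2u_i + u_{i-1}); ends stay fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + lam * (u[i + 1] - 2 * u[i] + u[i - 1])
    return new

K, sigma = 1.0, 2.0            # illustrative constants
n = 50
dx = math.pi / n
lam = 0.25                     # lam = K*dt/(sigma*dx^2); needs lam <= 1/2 for stability
dt = lam * sigma * dx * dx / K
steps = 250

xs = [i * dx for i in range(n + 1)]
u = [math.sin(x) for x in xs]
u[0] = u[-1] = 0.0             # boundary temperature held at exactly 0
for _ in range(steps):
    u = heat_step(u, lam)

# For this initial condition the exact solution is exp(-K t / sigma) * sin x.
t = steps * dt
exact = [math.exp(-K * t / sigma) * math.sin(x) for x in xs]
max_err = max(abs(a - b) for a, b in zip(u, exact))
print(t, max_err)
```

The computed temperature profile decays toward zero while keeping its shape, in agreement with the exact solution, which is easily checked to satisfy the one-dimensional equation.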
Potentials and Laplace’s Equation. The electric field generated by a system
of electric charges is the vector field E whose value at a point x is the force felt
by a unit positive charge located at x as the result of the electrostatic attraction or
repulsion to the system of charges. If the system is just a single unit positive charge
at the point p, the field is given by the usual inverse square law force,
$\mathbf{E}(\mathbf{x}) = (\mathbf{x}-\mathbf{p})/|\mathbf{x}-\mathbf{p}|^3$. (There should be a constant of proportionality, but we shall
assume that units of measurement have been chosen so that the constant is 1.) For
many purposes, it is more convenient to work with the electric potential
$u(\mathbf{x}) = |\mathbf{p}-\mathbf{x}|^{-1}$, which is related to the electric field E by

E = −∇u.

(For any points x1 and x2 , u(x2 )−u(x1 ) is the work done in moving a unit positive
charge from x1 to x2 through the field E.)
If, instead of a single charge at one point, our system of charges consists of
a number of charges located at different points, the electric field (resp. electric
potential) generated by the system is just the sum of the fields (resp. potentials)
generated by the individual charges. We wish to consider the case where there
is a continuous distribution of charge (an idealization, but a useful one) in some
bounded region of space. That is, we are given a charge density function ρ(p), a
continuous function that vanishes outside some bounded set R. The field generated
by such a charge distribution is found in the usual way: Chop up the set R into tiny
pieces, treat the charge coming from each piece as a point charge, and add up the
resulting fields or potentials. We shall work primarily with the potentials, for which
the result is
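\[
u(\mathbf{x}) = \iiint_{\mathbb{R}^3} \frac{\rho(\mathbf{p})}{|\mathbf{p}-\mathbf{x}|}\,d^3\mathbf{p}. \tag{5.43}
\]
It will be convenient to make the substitution y = p − x. This is just a translation
of coordinates, so its Jacobian is 1, and we obtain
\[
u(\mathbf{x}) = \iiint_{\mathbb{R}^3} \frac{\rho(\mathbf{x}+\mathbf{y})}{|\mathbf{y}|}\,d^3\mathbf{y}. \tag{5.44}
\]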
A couple of comments are in order about this integral. We have written it as
an integral over R3 , but it really extends only over the bounded region R − x =
{y : x + y ∈ R} on which ρ(x + y) ̸= 0. The integral is improper because of
the singularity of |y|−1 at the origin, but one can easily see that it is absolutely
convergent by Proposition 4.65.
The main object of this subsection is to derive an important differential equation
relating u and ρ. The key point is the fact that the Laplacian of |y|^{-1} vanishes
except at the origin (where it is undefined):
\[
\nabla^2\bigl(|y|^{-1}\bigr) = 0 \quad \text{for } y \ne 0. \tag{5.45}
\]
The proof is a straightforward calculation (Exercise 2c in §5.4 or Exercise 6b in
§5.5).
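
Before doing the calculation by hand, one can get a quick numerical sanity check of (5.45). The following is only a sketch: it approximates the Laplacian by second-order central differences, and the helper names (inv_norm, laplacian), the step size h, and the sample point are illustrative choices of ours, not anything taken from the text.

```python
import math

def inv_norm(y):
    """The function |y|^(-1) on R^3 minus the origin."""
    return 1.0 / math.sqrt(sum(c * c for c in y))

def laplacian(f, y, h=1e-3):
    """Second-order central-difference approximation to the Laplacian of f at y."""
    total = 0.0
    for i in range(len(y)):
        fwd = list(y)
        fwd[i] += h
        bwd = list(y)
        bwd[i] -= h
        total += (f(fwd) + f(bwd) - 2.0 * f(y)) / (h * h)
    return total

point = [1.0, 2.0, 2.0]  # |y| = 3, well away from the singularity at the origin

# Should be close to 0, up to discretization and rounding error.
lap_inv_norm = laplacian(inv_norm, point)

# Control case: the Laplacian of |y|^2 is 6 everywhere in R^3.
lap_square = laplacian(lambda y: sum(c * c for c in y), point)

print(lap_inv_norm, lap_square)
```

The control case is there to show that the difference scheme does detect a nonzero Laplacian when there is one; the check says nothing about the behavior at the origin, where |y|^{-1} is undefined.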

5.46 Theorem. Suppose ρ is a function of class C^2 on ℝ^3 that vanishes outside
a bounded set, and let u be defined by (5.44). Then u is of class C^2 and ∇^2 u =
−4πρ.

Proof. We can differentiate u by passing the derivatives under the integral sign.
They fall on ρ, which is assumed to be of class C^2, so u is of class C^2 and
\[
\nabla^2 u(x) = \iiint \frac{\nabla^2\rho(x + y)}{|y|}\, d^3y.
\]
(Strictly speaking, Theorem 4.47 does not apply because of the singularity of the
integrand at the origin, but this is a minor technicality. One can finesse the problem,
for example, by switching to spherical coordinates, in which the r^2 sin ϕ coming
from the volume element cancels the r^{-1} of the integrand with room to spare.)
Here ∇^2 ρ(x + y) is obtained by differentiating ρ with respect to x, but the same
result is obtained by taking the derivatives with respect to y, for ∂_{x_j}[ρ(x + y)] =
(∂_j ρ)(x + y) = ∂_{y_j}[ρ(x + y)]. We can therefore use Green’s formula to transfer
the derivatives to |y|^{-1}. We need to take some care, however, since the singularity
of |y|^{-1} does not remain harmless after being differentiated twice.
Let us fix the point x and choose positive numbers ϵ and K, with ϵ < 1 and K
large enough so that ρ(x + y) = 0 if |y| ≥ K − 1. Let R_{ϵ,K} = {y : ϵ < |y| < K}.
We then have
\[
\nabla^2 u(x) = \lim_{\epsilon \to 0} \iiint_{R_{\epsilon,K}} \frac{\nabla^2\rho(x + y)}{|y|}\, d^3y.
\]
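The integrand has no singularities in the region R_{ϵ,K}, so we can apply Green’s
formula (5.39) to obtain
\[
\nabla^2 u(x) = \lim_{\epsilon \to 0} \biggl[ \iiint_{R_{\epsilon,K}} \rho(x + y)\, \nabla^2\bigl(|y|^{-1}\bigr)\, d^3y
+ \iint_{\partial R_{\epsilon,K}} \bigl[ \nabla\rho(x + y)\, |y|^{-1} - \rho(x + y)\, \nabla\bigl(|y|^{-1}\bigr) \bigr] \cdot n \, dA \biggr].
\]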

The integral over R_{ϵ,K} on the right vanishes by (5.45). Also, the boundary of R_{ϵ,K}
consists of two pieces, the sphere |y| = K and the sphere |y| = ϵ, and the integral
over |y| = K is zero because ρ(x + y) and its derivatives vanish for |y| > K − 1.
Therefore,
\[
\nabla^2 u(x) = \lim_{\epsilon \to 0} \iint_{|y| = \epsilon} \bigl[ \nabla\rho(x + y)\, |y|^{-1} - \rho(x + y)\, \nabla\bigl(|y|^{-1}\bigr) \bigr] \cdot n \, dA. \tag{5.47}
\]

Here n denotes the unit normal to the sphere |y| = ϵ that is outward with respect
to R_{ϵ,K} and hence inward in the usual sense.
Since the first derivatives of ρ are continuous, |∇ρ(x + y)| is bounded by some
constant C for |y| ≤ 1, and hence
\[
\left| \iint_{|y| = \epsilon} \frac{\nabla\rho(x + y) \cdot n}{|y|}\, dA \right|
\le \iint_{|y| = \epsilon} \frac{C}{\epsilon}\, dA = \frac{C}{\epsilon}\, 4\pi\epsilon^2 = 4\pi C\epsilon, \tag{5.48}
\]

which vanishes as ϵ → 0. To evaluate the second term in (5.47), we observe that
n = −ϵ^{-1} y. (See the remark preceding the exercises in §5.5.) An easy calculation
gives ∇(|y|^{-1}) = −y/|y|^3, so ∇(|y|^{-1}) · n = ϵ^{-1} |y|^2/|y|^3 = ϵ^{-2}. Therefore,
(5.47) and (5.48) show that
\[
\begin{aligned}
\nabla^2 u(x) &= -\lim_{\epsilon \to 0} \iint_{|y| = \epsilon} \frac{\rho(x + y)}{\epsilon^2}\, dA \\
&= (-4\pi) \lim_{\epsilon \to 0} \left[ \frac{1}{4\pi\epsilon^2} \iint_{|y| = \epsilon} \rho(x + y)\, dA \right].
\end{aligned}
\]
But the expression inside the brackets is just the mean value of ρ(x + y) on the
sphere |y| = ϵ, which tends to ρ(x) as ϵ → 0, so the proof is complete.

Remark. The hypothesis that ρ is of class C^2 can be weakened (C^1 is more
than enough); we impose it simply to avoid technicalities in the proof. In fact, if
ρ vanishes outside a bounded set and is integrable there, then the equation ∇^2 u =
−4πρ holds on any open set on which ρ is C^1. The key ideas of the proof are all
present in the preceding argument.

5.49 Corollary. The electric field E is related to the charge density ρ by div E =
4πρ.

Proof. 4πρ = −∇^2 u = div(−∇u) = div E.

The differential equation ∇^2 u = −4πρ is called the inhomogeneous Laplace
equation or Poisson equation. The special case ∇^2 u = 0, valid in regions where
there are no charges, is the (homogeneous) Laplace equation. These equations
have been extensively studied; solutions of ∇^2 u = 0, in particular, have many
interesting properties and applications in many areas.
Everything we have said applies also to gravitational potentials and fields gen-
erated by mass distributions with mass density ρ, except for some minus signs
coming from the fact that masses attract whereas like charges repel. Specifically,
the gravitational potential is given by u(x) = −∭ ρ(x + y)|y|^{-1} d^3y, and it
satisfies ∇^2 u = 4πρ.
It should be noted that the preceding discussion applies only to situations where
the charge or mass density ρ is static, that is, unchanging in time. If the charges or
masses move around, things become more complicated. The basic reason is that if
a charge or mass at p is moved to a nearby point p′, the potential it induces cannot
change instantly from |x − p|^{-1} to |x − p′|^{-1} throughout all of space, because the
news of the move can only travel with the speed of light. For electricity, the physics
of time-varying fields is contained in Maxwell’s equations, which we shall present
below; for gravity, it is described by general relativity. (If the time dependence
is not too rapid, however, the relativistic effects will be small and the preceding
calculations can be used as a good approximation. This is more often the case with
gravity than with electricity, because gravity is a much weaker interaction.)

Maxwell’s Equations. Maxwell’s equations are the fundamental differential
equations that are the foundation for the classical (unquantized) theory of electricity
and magnetism. They relate the electric field E, the magnetic field B, the charge
density ρ, and the current density J. In suitably normalized units, they are
\[
\begin{aligned}
\operatorname{div} E &= 4\pi\rho, & \operatorname{curl} E &= -\frac{1}{c}\frac{\partial B}{\partial t}, \\
\operatorname{div} B &= 0, & \operatorname{curl} B &= \frac{1}{c}\frac{\partial E}{\partial t} + \frac{4\pi}{c}\, J,
\end{aligned}
\tag{5.50}
\]
where c is the speed of light. This is not the place for a thorough study of Maxwell’s
equations and their consequences for physics, but we wish to point out a couple of
features of them in connection with the ideas we have been developing. In what
follows we shall assume that all functions in question are of class C^2, so that the
second derivatives make sense and the mixed partials are equal.
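First, Maxwell’s equations contain the law of conservation of charge. Indeed,
by formula (5.31) we have
\[
\frac{\partial\rho}{\partial t} = \frac{1}{4\pi} \operatorname{div} \frac{\partial E}{\partial t}
= \frac{c}{4\pi} \operatorname{div}(\operatorname{curl} B) - \operatorname{div} J = -\operatorname{div} J,
\]
and this is the conservation law in the form (5.41). Second, in a region of space
with no charges or currents (ρ = 0 and J = 0), by formula (5.33) we have
\[
\nabla^2 E = \nabla(\operatorname{div} E) - \operatorname{curl}(\operatorname{curl} E)
= 0 + \frac{1}{c} \operatorname{curl} \frac{\partial B}{\partial t} = \frac{1}{c^2} \frac{\partial^2 E}{\partial t^2}
\]
and
\[
\nabla^2 B = \nabla(\operatorname{div} B) - \operatorname{curl}(\operatorname{curl} B)
= 0 - \frac{1}{c} \operatorname{curl} \frac{\partial E}{\partial t} = \frac{1}{c^2} \frac{\partial^2 B}{\partial t^2}.
\]
That is, the components of E and B all satisfy the differential equation
\[
\nabla^2 f = \frac{1}{c^2} \frac{\partial^2 f}{\partial t^2}. \tag{5.51}
\]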
This is the wave equation, another of the fundamental equations of mathematical
physics. It describes the propagation of waves in many different situations; here it
concerns electromagnetic radiation — light, radio waves, X-rays, and so on.
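
As a small illustration of (5.51), one can check numerically that a plane wave sin(k · x − ωt) satisfies the wave equation when ω = c|k|. This is a sketch under our own assumptions: the wave speed, the wave vector, the sample point, and the helper names below are arbitrary illustrative choices, and the derivatives are approximated by central second differences.

```python
import math

C_SPEED = 2.0                 # wave speed c, in arbitrary units (illustrative value)
K = (1.0, -2.0, 0.5)          # wave vector k, chosen arbitrarily
OMEGA = C_SPEED * math.sqrt(sum(k * k for k in K))  # frequency with omega = c|k|

def plane_wave(x, y, z, t):
    """f(x, t) = sin(k . x - omega t)."""
    return math.sin(K[0] * x + K[1] * y + K[2] * z - OMEGA * t)

def second_diff(f, args, i, h=1e-3):
    """Central second difference of f with respect to its i-th argument."""
    up = list(args)
    up[i] += h
    dn = list(args)
    dn[i] -= h
    return (f(*up) + f(*dn) - 2.0 * f(*args)) / (h * h)

point = (0.3, -0.7, 1.1, 0.4)  # (x, y, z, t)

# Left side of (5.51): the spatial Laplacian.
laplacian = sum(second_diff(plane_wave, point, i) for i in range(3))

# Right side of (5.51): (1/c^2) times the second time derivative.
time_term = second_diff(plane_wave, point, 3) / C_SPEED ** 2

residual = laplacian - time_term
print(laplacian, time_term, residual)
```

The two sides agree up to discretization error, while each side separately is far from zero at the chosen point; changing OMEGA so that ω ≠ c|k| makes the residual visibly nonzero.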

EXERCISES
Besides distributions of charge or mass in 3-space, one can consider distributions on
surfaces or curves (physically: thin plates or wires). The formula for the associated
potential or field is similar to (5.43) except that the triple integral is replaced by a
surface or line integral, and the density ρ represents charge or mass per unit area or
unit length rather than unit volume. In the following exercises, “uniform” means
“of constant density.”
1. Consider a uniform distribution of mass on the sphere of radius r about the
origin. Show that
a. inside the sphere, the potential is constant and the gravitational field van-
ishes;
b. outside the sphere, the potential and field are the same as if the entire mass
were located at the origin.
2. Consider a uniform distribution of mass on the solid ball of radius R about the
origin. Show that the gravitational field at a point x is the same as if the mass
closer to the origin than x were all located at the origin and the mass farther
from the origin than x (if any) were absent. (Use Exercise 1.)
3. Consider a uniform distribution of charge on the z-axis, with density ρ (charge
per unit length).
a. Compute the electric field generated by this distribution. (The relevant
formula is similar to (5.43), but 1/|p − x| is replaced by the negative of its
gradient with respect to x, namely, (x − p)/|x − p|^3.)
b. Show that the modification of (5.43) that presumably gives the potential
for this charge distribution is a divergent integral.
c. To resolve the difficulty presented by (b), we make use of the fact that
the defining property of the potential u, namely ∇u = −E, only deter-
mines u up to an additive constant, so we may subtract constants from u
without affecting the physics. Consider instead a uniform distribution of
charge on the interval [−R, R] on the z-axis with density ρ. Compute the
potential u_R generated by this distribution, and show that u_R − 2ρ log R
converges as R → ∞ to a function whose gradient is the negative of the
field found in (a). (This sort of removal of divergences by “subtracting off
infinite constants” is common in quantum field theory, where it is known
as renormalization.)
4. Prove the following two-dimensional analogue of Theorem 5.46: Suppose ρ is
a function of class C^2 on ℝ^2 that vanishes outside a bounded set, and let
\[
u(x) = \int \rho(x + y) \log|y| \, d^2y.
\]
Then u is of class C^2 and ∇^2 u = 2πρ. (The proof is very similar to that of
Theorem 5.46; see Exercise 2d in §5.4.)

5.7 Stokes’s Theorem


Stokes’s theorem is the generalization of Green’s theorem in which the plane is
replaced by a curved surface. The precise setting is as follows.

FIGURE 5.9: An oriented surface and its positively oriented boundary.

Let S_0 be a smooth surface in ℝ^3, and let S be a region in S_0 that is bounded by
a piecewise smooth curve ∂S. By this we mean that ∂S is the boundary of S within
the surface S_0.¹ (Of course, if we think of S as a subset of ℝ^3, it has no interior
and so is its own boundary.) We assume that S is oriented by a choice of normal
vector field n, so we can speak of the positive and negative sides of S, and we give
∂S the orientation compatible with the orientation of S in the sense we used in
Green’s theorem. This means, informally speaking, that if you walk around ∂S in
the positive direction, standing on the positive side of S, then S is on your left. In
more mathematical terms, if t is the unit tangent to ∂S in the forward direction at
a point x ∈ ∂S, then n × t, considered as an arrow emanating from x, points into
S. See Figure 5.9.

5.52 Theorem (Stokes’s Theorem). Let S and ∂S be as described above, and let
F be a C^1 vector field defined on some neighborhood of S in ℝ^3. Then
\[
\int_{\partial S} F \cdot dx = \iint_S (\operatorname{curl} F) \cdot n \, dA. \tag{5.53}
\]

Proof. If S is a region in the xy-plane, then n = k = (0, 0, 1); moreover, F · dx
involves only the x- and y-components of F, i.e., F_1 and F_2, and (curl F) · n is the
z-component of curl F, namely ∂_1 F_2 − ∂_2 F_1. Hence Stokes’s theorem reduces to
Green’s theorem in this case.
Next, suppose that S admits a parametrization x = G(u, v), so that S is the
image under G of a regular region W in the uv-plane and ∂S is the image of ∂W.
We assume that this parametrization yields the given orientation on S (otherwise,
just switch u and v). We use the parametrization to pull back the integrals over S
and ∂S to integrals over W and ∂W, and we apply Green’s theorem to the latter. It
is just a matter of seeing that this change of variables works out as it should.

¹ Here are the precise definitions: A point x ∈ S is in the interior of S relative to S_0 if it has a
neighborhood U (in ℝ^3) such that U ∩ S_0 ⊂ S; it is in the boundary of S relative to S_0 if all of
its neighborhoods contain points in S and points in S_0 \ S. S is regular if it is compact and every
neighborhood of every (relative) boundary point contains points in the (relative) interior.

As in the proofs of Green’s theorem and the divergence theorem, we consider
the components of F separately. Thus, if we write F = F i + G j + H k, it is enough
to prove (5.53) for F i, G j, and H k separately. All three of them work the same
way, so we shall just do F i, for which (5.53) reduces to
\[
\int_{\partial S} F(x, y, z)\, dx = \iint_S \bigl[ (\partial_z F)\, j - (\partial_y F)\, k \bigr] \cdot n \, dA. \tag{5.54}
\]

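Now, using the parametrization x = G(u, v), we have
\[
\begin{aligned}
\iint_S \bigl[ (\partial_z F)\, j - (\partial_y F)\, k \bigr] \cdot n \, dA
&= \iint_W \bigl[ (\partial_z F)\, j - (\partial_y F)\, k \bigr] \cdot \left( \frac{\partial G}{\partial u} \times \frac{\partial G}{\partial v} \right) du\, dv \\
&= \iint_W \left( \frac{\partial F}{\partial z} \frac{\partial(z, x)}{\partial(u, v)} - \frac{\partial F}{\partial y} \frac{\partial(x, y)}{\partial(u, v)} \right) du\, dv.
\end{aligned}
\tag{5.55}
\]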

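On the other hand, since the formalism of differentials automatically encodes the
chain rule,
\[
\int_{\partial S} F \, dx = \int_{\partial W} F \left( \frac{\partial x}{\partial u}\, du + \frac{\partial x}{\partial v}\, dv \right).
\]
(In both of these equations, F and its derivatives are evaluated at G(u, v).) We
apply Green’s theorem to this last line integral:
\[
\int_{\partial W} F \left( \frac{\partial x}{\partial u}\, du + \frac{\partial x}{\partial v}\, dv \right)
= \iint_W \left[ \frac{\partial}{\partial u} \left( F \frac{\partial x}{\partial v} \right) - \frac{\partial}{\partial v} \left( F \frac{\partial x}{\partial u} \right) \right] du\, dv.
\]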

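By the product rule and the chain rule, the integrand on the right equals
\[
\begin{aligned}
&\left( \frac{\partial F}{\partial x} \frac{\partial x}{\partial u} + \frac{\partial F}{\partial y} \frac{\partial y}{\partial u} + \frac{\partial F}{\partial z} \frac{\partial z}{\partial u} \right) \frac{\partial x}{\partial v} + F \frac{\partial^2 x}{\partial u\, \partial v} \\
&\qquad - \left( \frac{\partial F}{\partial x} \frac{\partial x}{\partial v} + \frac{\partial F}{\partial y} \frac{\partial y}{\partial v} + \frac{\partial F}{\partial z} \frac{\partial z}{\partial v} \right) \frac{\partial x}{\partial u} - F \frac{\partial^2 x}{\partial v\, \partial u} \\
&\qquad = \frac{\partial F}{\partial z} \frac{\partial(z, x)}{\partial(u, v)} - \frac{\partial F}{\partial y} \frac{\partial(x, y)}{\partial(u, v)}.
\end{aligned}
\]
But this is the integrand on the right side of (5.55), so (5.54) is proved.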
Finally, as in the proofs of Green’s theorem and the divergence theorem, we
obtain Stokes’s theorem more generally for surfaces S that can be cut up into a
finite number of pieces that each admit a parametrization by applying the preceding
argument to the pieces and adding up the results. Alternatively, we can obtain
Stokes’s theorem for general surfaces by an adaptation of the proof of Green’s
theorem in Appendix B.7.
EXAMPLE 1. Use Stokes’s theorem to compute ∫_C F · dx where F(x, y, z) =
√(x^2 + 1) i + x j + 2y k and C is the intersection of the surfaces z = xy and
x^2 + y^2 = 1, oriented counterclockwise as viewed from above.

Solution. C is the boundary of the portion S of the surface z = xy inside
the cylinder x^2 + y^2 = 1, and its orientation is compatible with the orientation
of S with the normal pointing upward. We have curl F(x, y, z) = 2i + k and
n dA = (−y i − x j + k) dx dy, so
\[
\int_C F \cdot dx = \iint_{x^2 + y^2 \le 1} (1 - 2y)\, dx\, dy = \pi.
\]
(No computation is necessary here; the integral of 1 is the area of the disc and
the integral of −2y vanishes by symmetry.)
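
The value π in Example 1 can be cross-checked numerically by evaluating both sides of Stokes’s theorem. The sketch below parametrizes C by x = cos t, y = sin t, z = xy and uses the midpoint rule; the function names and grid sizes are our own illustrative choices. The i-component of F is taken here to be √(x^2 + 1); since it depends on x alone, it contributes nothing to the circulation around a closed curve, so the check does not hinge on that choice.

```python
import math

def F(x, y, z):
    """Field of Example 1; the i-component (assumed sqrt(x^2 + 1)) depends on x alone."""
    return (math.sqrt(x * x + 1.0), x, 2.0 * y)

def line_integral(n=2000):
    """Midpoint-rule approximation of the circulation of F around C,
    with C given by x = cos t, y = sin t, z = cos t sin t, 0 <= t <= 2 pi."""
    dt = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        x, y = math.cos(t), math.sin(t)
        z = x * y
        dx, dy = -math.sin(t), math.cos(t)         # x'(t), y'(t)
        dz = math.cos(t) ** 2 - math.sin(t) ** 2   # z'(t) = d/dt (cos t sin t)
        f1, f2, f3 = F(x, y, z)
        total += (f1 * dx + f2 * dy + f3 * dz) * dt
    return total

def surface_integral(n=400):
    """Midpoint rule in polar coordinates for the integral of (1 - 2y) over the unit disc."""
    dr = 1.0 / n
    dth = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            th = (j + 0.5) * dth
            total += (1.0 - 2.0 * r * math.sin(th)) * r * dr * dth
    return total

circulation = line_integral()
flux = surface_integral()
print(circulation, flux, math.pi)
```

Both numbers should agree with π up to quadrature error, which is consistent with (though of course no substitute for) the computation in the solution.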

There is an interesting feature of Stokes’s theorem that does not appear in its
siblings. A closed curve in ℝ^2 is the boundary of just one regular region in ℝ^2,
and a closed surface in ℝ^3 is the boundary of just one regular region in ℝ^3; but a
closed curve in ℝ^3 is the boundary of infinitely many surfaces in ℝ^3! For example,
the unit circle in the xy-plane is the boundary of the unit disc in the xy-plane, the
upper and lower hemispheres of the unit sphere in ℝ^3, the portion of the paraboloid
z = 1 − x^2 − y^2 lying above the unit disc, and so forth. Stokes’s theorem says that
if C is a closed curve in ℝ^3 and S is any oriented surface bounded by C, then
\[
\int_C F \cdot dx = \iint_S (\operatorname{curl} F) \cdot n \, dA
\]
for any C^1 vector field F, provided that the orientations on C and S are compatible.
EXAMPLE 2. Let F(x, y, z) = [e^{xz} + e^{x+2y}] i + [log(2 + y + z) + 2e^{x+2y}] j +
3xyz k. Compute ∬_S curl F · n dA, where S is the portion of the surface z =
1 − x^2 − y^2 above the xy-plane, oriented with the normal pointing upward.
Solution. We have curl F(x, y, z) = [3xz − (2 + y + z)^{-1}] i + [x e^{xz} − 3yz] j
and n dA = (2x i + 2y j + k) dx dy, so direct evaluation of the integral is quite
unpleasant. By Stokes’s theorem, the integral equals ∫_C F · dx where C is
the unit circle in the xy-plane; this is not much better. However, by Stokes’s
theorem again, the latter line integral is equal to ∬_D curl F · n dA where D is
the unit disc in the xy-plane. Here n = k, so curl F · n = 0 and the integral
vanishes!

Here is an analogue of the fact that the integral of the gradient of a function
over any closed curve vanishes:

5.56 Corollary. If S is a closed surface (i.e., a surface with no boundary) in ℝ^3
with unit outward normal n, and F is a C^1 vector field on S, then ∬_S (curl F) ·
n dA = 0.

Proof. If F extends differentiably to the region R inside S, this follows from the
divergence theorem, since div(curl F) = 0 for any F. However, it is true even if F
has singularities inside S. To see this, draw a small simple closed curve C in S (say,
the image of a small circle in the uv-plane under a parametrization x = G(u, v)).
C divides S into two regular regions S_1 and S_2, and we have
\[
\iint_S (\operatorname{curl} F) \cdot n \, dA = \iint_{S_1} (\operatorname{curl} F) \cdot n \, dA + \iint_{S_2} (\operatorname{curl} F) \cdot n \, dA. \tag{5.57}
\]
On the other hand, if we give C the orientation compatible with S_1, Stokes’s theo-
rem gives
\[
\iint_{S_1} (\operatorname{curl} F) \cdot n \, dA = \int_C F \cdot dx = -\iint_{S_2} (\operatorname{curl} F) \cdot n \, dA,
\]
because the orientation compatible with S_2 is the opposite one. Hence the terms on
the right of (5.57) cancel.
(Note: We had to say that C is a “small” closed curve, because otherwise C
might not divide S into two pieces. For example, take S to be a torus [the surface
of a doughnut] and C to be a circle that goes completely around S in one direction.)

Stokes’s theorem gives a geometric, coordinate-free interpretation of the curl of
a vector field. Namely, suppose F is a C^1 vector field on some open set containing
the point a; here’s how to find the component of curl F(a) in the direction of any
unit vector u, that is, (curl F(a)) · u. Let D_ϵ be the disc of radius ϵ centered at a
in the plane perpendicular to u, oriented so that u is the positive normal for D_ϵ. As
ϵ → 0, the average value of (curl F) · u over D_ϵ approaches its value at a:
\[
(\operatorname{curl} F(a)) \cdot u = \lim_{\epsilon \to 0} \frac{1}{\pi\epsilon^2} \iint_{D_\epsilon} (\operatorname{curl} F) \cdot u \, dA.
\]
Since u is the normal to D_ϵ, Stokes’s theorem gives
\[
(\operatorname{curl} F(a)) \cdot u = \lim_{\epsilon \to 0} \frac{1}{\pi\epsilon^2} \int_{C_\epsilon} F \cdot dx, \tag{5.58}
\]
where C_ϵ is the circle of radius ϵ about a in the plane perpendicular to u, traversed
counterclockwise as viewed from the side on which u lies. This is the promised
coordinate-free description of curl F.
If we think of F as a force field, ∫_{C_ϵ} F · dx is the work done by F on a particle
that moves around C_ϵ. Thus (5.58) says that (curl F(a)) · u represents the tendency
of the force F to push the particle around C_ϵ, counterclockwise if (curl F(a)) · u
is positive and clockwise if it is negative (as viewed from the u-side).
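
To see the limit (5.58) at work, here is a small numerical sketch. The field F = −y i + x^3 j, the point a = (1, 0, 0), and the direction u = k are illustrative choices of ours, not taken from the text; for this field curl F = (3x^2 + 1) k, so (curl F(a)) · u = 4. The circulation is approximated by the midpoint rule.

```python
import math

def F(x, y, z):
    """An illustrative field (our choice, not from the text): F = -y i + x^3 j."""
    return (-y, x ** 3, 0.0)

def circulation_ratio(eps, a=(1.0, 0.0, 0.0), n=2000):
    """Circulation of F around the circle of radius eps about a in the plane
    perpendicular to u = k (counterclockwise seen from above), divided by pi eps^2."""
    dt = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        x = a[0] + eps * math.cos(t)
        y = a[1] + eps * math.sin(t)
        dx = -eps * math.sin(t)   # x'(t)
        dy = eps * math.cos(t)    # y'(t)
        f1, f2, _ = F(x, y, a[2])
        total += (f1 * dx + f2 * dy) * dt
    return total / (math.pi * eps * eps)

# The ratios should approach (curl F(a)) . k = 3 * 1^2 + 1 = 4 as eps shrinks.
ratios = [circulation_ratio(eps) for eps in (0.5, 0.1, 0.01)]
print(ratios)
```

For this particular field the ratio can be computed in closed form by Green’s theorem as 4 + 3ϵ^2/4, so the printed values decrease toward 4, which is the component of the curl predicted by (5.58).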

EXERCISES
1. Use Stokes’s theorem to calculate ∫_C [(x − z) dx + (x + y) dy + (y + z) dz]
where C is the ellipse where the plane z = y intersects the cylinder x^2 + y^2 = 1,
oriented counterclockwise as viewed from above.
2. Use Stokes’s theorem to evaluate ∫_C [y dx + y^2 dy + (x + 2z) dz] where C is the
curve of intersection of the sphere x^2 + y^2 + z^2 = a^2 and the plane y + z = a,
oriented counterclockwise as viewed from above.
3. Given any nonvertical plane P parallel to the x-axis, let C be the curve of
intersection of P with the cylinder x^2 + y^2 = a^2. Show that ∫_C [(yz − y) dx +
(xz + x) dy] = 2πa^2.
4. Evaluate ∬_S curl F · n dA where F(x, y, z) = y i + (x − 2x^3 z) j + xy^3 k and S
is the upper half of the sphere x^2 + y^2 + z^2 = a^2.
5. Let F(x, y, z) = 2x i + 2y j + (x^2 + y^2 + z^2) k and let S be the lower half of the
ellipsoid (x^2/4) + (y^2/9) + (z^2/27) = 1. Use Stokes’s theorem to calculate
the flux of curl F across S from the lower side to the upper side.
6. Define the vector field F on the complement of the z-axis by F(x, y, z) =
(−y i + x j)/(x^2 + y^2).
a. Show that curl F = 0.
b. Show by direct calculation that ∫_C F · dx = 2π for any horizontal circle C
centered at a point on the z-axis.
c. Why do (a) and (b) not contradict Stokes’s theorem?
7. Let C_r denote the circle of radius r about the origin in the xz-plane, oriented
counterclockwise as viewed from the positive y-axis. Suppose F is a C^1 vector
field on the complement of the y-axis in ℝ^3 such that ∫_{C_1} F · dx = 5 and
curl F(x, y, z) = 3j + (z i − x k)/(x^2 + z^2)^2. Compute ∫_{C_r} F · dx for every r.
8. Let S be a smooth oriented surface in ℝ^3 with piecewise smooth, compatibly
oriented boundary ∂S. Suppose f is C^1 and g is C^2 on some open set contain-
ing S. Show that
\[
\int_{\partial S} f\, \nabla g \cdot dx = \iint_S (\nabla f \times \nabla g) \cdot n \, dA.
\]

5.8 Integrating Vector Derivatives


In this section we study the question of solving the equations
grad f = G, curl F = G, div F = g
for f or F, given g or G. We first consider the equation ∇f = G, and we begin
with a simple and useful result:
5.59 Proposition. Suppose G is a continuous vector field on an open set R in ℝ^n.
The following two conditions are equivalent:
a. If C_1 and C_2 are any two oriented curves in R with the same initial point and
the same final point, then ∫_{C_1} G · dx = ∫_{C_2} G · dx.
b. If C is any closed curve in R, ∫_C G · dx = 0.
Proof. (a) implies (b): Suppose C starts and ends at a. Then C has the same initial
and final point as the “constant curve” C_2 described by x(t) ≡ a, and obviously
∫_{C_2} G · dx = 0 since dx = 0 on C_2.
(b) implies (a): Suppose C_1 and C_2 start at a and end at b. Let C be the closed
curve obtained by following C_1 from a to b and then C_2 backwards from b to a.
Then 0 = ∫_C G · dx = ∫_{C_1} G · dx − ∫_{C_2} G · dx.
A vector field G that satisfies (a) and (b) is called conservative in the region
R. (The word “conservative” has to do with conservation of energy. If we interpret
G as a force field, condition (b) says that the force does no net work on a particle
that returns to its starting point.) A good deal of mathematical physics is based on
the following characterization of conservative vector fields:
5.60 Proposition. A continuous vector field G in an open set R ⊂ ℝ^n is conser-
vative if and only if G is the gradient of a C^1 function f on R.
Proof. If G = ∇f and C is a closed curve parametrized by x = g(t), a ≤ t ≤ b,
by the chain rule we have
\[
\int_C \nabla f \cdot dx = \int_a^b \nabla f(g(t)) \cdot g'(t)\, dt = \int_a^b \frac{d}{dt} f(g(t))\, dt
= f(g(b)) - f(g(a)) = 0
\]
because g(b) = g(a), so condition (b) in Proposition 5.59 is satisfied.


Conversely, suppose G is conservative in R. To construct a function of which
G is the gradient, we shall assume R is connected. (Otherwise we can consider
each connected piece of R separately.) Pick a base point a ∈ R. For any x ∈ R,
let C be a curve in R from a to x (such a curve always exists, by Theorem 1.30)
and define f(x) = ∫_C G · dx. This definition makes sense by condition (a)
in Proposition 5.59: It doesn’t matter which curve we pick. We shall show that
G = ∇f by showing that G_j = ∂_j f for each j; it is enough to do the case j = 1.
Let h = (h, 0, …, 0). Given x ∈ R, suppose h is small enough so that the line
segment L from x to x + h lies entirely in R. We have f(x) = ∫_C G · dx where
C is a curve from a to x. We can make a curve from a to x + h by joining L onto
the end of C, so that f(x + h) = ∫_C G · dx + ∫_L G · dx. But then
\[
\frac{f(x + h) - f(x)}{h} = \frac{1}{h} \int_L G \cdot dx = \frac{1}{h} \int_0^h G_1(x_1 + t, x_2, \dots, x_n)\, dt,
\]
and by letting h → 0 we obtain ∂_1 f(x) = G_1(x).

The function f in Proposition 5.60 is determined up to an additive constant, as-
suming that R is connected. It is called the potential associated to the conservative
vector field G.
It remains to find a good method for determining whether a vector field is con-
servative, i.e., whether it is the gradient of a function. Another way of phrasing this
question: When is a differential form G_1 dx_1 + · · · + G_n dx_n the differential of a
function? We shall assume that the vector field G is of class C^1 on an open set
R. In this case, there is an obvious necessary condition for G to be a gradient of
a function on R. Indeed, if G_j = ∂_j f, then ∂_j G_k and ∂_k G_j are both equal to the
mixed partial ∂_j ∂_k f, so
\[
\frac{\partial G_j}{\partial x_k} - \frac{\partial G_k}{\partial x_j} = 0 \quad \text{for all } j \ne k. \tag{5.61}
\]

We observe that when n = 3, the quantities in (5.61) are the components of curl G,
so that (5.61) is equivalent to the condition curl G = 0.
The condition (5.61) is almost sufficient to guarantee that G is a gradient; the
only possible problem arises from the geometry of R, as we shall explain in more
detail below. When R is convex, the problem disappears, and we have the following
result. Our proof will only be complete in dimensions 2 and 3 because it invokes
Green’s or Stokes’s theorem, but the same idea works in higher dimensions.

5.62 Theorem. Suppose R is a convex open set in ℝ^n and G is a C^1 vector field
on R. If G satisfies (5.61) in R (which means that curl G = 0 in R in the case
n = 3), then G is the gradient of a C^2 function on R.

Proof. The idea is similar to the proof of Proposition 5.60, but we do not know
yet that condition (a) of Proposition 5.59 is satisfied, so we must be more careful.
Pick a base point a in R, and define f(x) for x ∈ R by f(x) = ∫_{L(a,x)} G ·
dx, where L(a, x) is the line segment from a to x. (We need the hypothesis of
convexity so that this line segment lies in R.) To show that G(x) = ∇f(x), let
h = (h, 0, …, 0) be small enough so that x + h ∈ R. Let C be the triangular
closed curve obtained by following L(a, x) from a to x, L(x, x + h) from x to
x + h, and then L(a, x + h) backwards from x + h to a. Green’s theorem (if
n = 2), Stokes’s theorem (if n = 3), or the higher-dimensional version of Stokes’s
theorem (if n > 3; see §5.9) converts ∫_C G · dx into a double integral over the
solid triangle whose boundary is C, whose integrand vanishes by (5.61). Hence
∫_C G · dx = 0, or in other words,
\[
f(x + h) - f(x) = \int_{L(a, x+h)} G \cdot dx - \int_{L(a, x)} G \cdot dx = \int_{L(x, x+h)} G \cdot dx.
\]
Now the same argument as in Proposition 5.60 shows that ∂_1 f = G_1, and likewise
∂_j f = G_j for the other j.

The hypothesis of convexity in Theorem 5.62 is stronger than necessary; one
can generalize the argument by using curves other than straight lines. What is
crucial is that when one joins the points a, x, and x + h by line segments or curves,
the resulting “triangle” is the boundary of a piece of surface that lies entirely in R,
so that the condition (5.61) and Stokes’s theorem can be applied. This may not be
the case if the region R has “holes.” The following example shows what can go
wrong in such a case.

EXAMPLE 1. Let R be the complement of the z-axis in ℝ^3, and let
\[
G(x, y, z) = \frac{-y\, i + x\, j}{x^2 + y^2}.
\]
It is easily verified that the condition curl G = 0 is satisfied on R, but that
G is not conservative on R; in fact, ∫_C G · dx = 2π when C is the unit
circle. (See Exercise 6 in §5.7.) The key to the mystery is as follows: G is
really the gradient of the angular variable θ in cylindrical coordinates, but θ
is not a well-defined function on R. It is defined only up to multiples of 2π.
However, if we choose a convex subregion S ⊂ R (for example, the half-
space y > 0), we can choose a well-defined “branch” of the angle θ on S (for
example, 0 < θ < π), and then G is the gradient of this function on S. The
same example can be used in ℝ^2, taking R to be the complement of the origin.
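
The phenomenon in Example 1 is easy to observe numerically. The following sketch works with the two-dimensional version of G and compares its circulation around a circle that encircles the origin with its circulation around a circle that stays inside a convex region avoiding the origin; it also estimates ∂_1 G_2 − ∂_2 G_1 by central differences. The function names, the particular circles, and the sample point are illustrative choices of ours.

```python
import math

def G(x, y):
    """G = (-y i + x j) / (x^2 + y^2), defined away from the origin."""
    r2 = x * x + y * y
    return (-y / r2, x / r2)

def circulation(cx, cy, radius, n=2000):
    """Midpoint-rule circulation of G around the counterclockwise circle
    with centre (cx, cy) and the given radius."""
    dt = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        x = cx + radius * math.cos(t)
        y = cy + radius * math.sin(t)
        g1, g2 = G(x, y)
        total += (g1 * (-radius * math.sin(t)) + g2 * (radius * math.cos(t))) * dt
    return total

def curl_z(x, y, h=1e-5):
    """Central-difference approximation to dG_2/dx - dG_1/dy."""
    d1_g2 = (G(x + h, y)[1] - G(x - h, y)[1]) / (2.0 * h)
    d2_g1 = (G(x, y + h)[0] - G(x, y - h)[0]) / (2.0 * h)
    return d1_g2 - d2_g1

around_origin = circulation(0.0, 0.0, 1.0)   # the unit circle: encircles the hole
away_from_it = circulation(3.0, 0.0, 1.0)    # stays in the convex half-plane x > 0
local_curl = curl_z(0.8, -0.3)

print(around_origin, away_from_it, local_curl)
```

The first circulation comes out as 2π and the second as 0 (up to rounding), while the local quantity in (5.61) is numerically zero: the obstruction is global, coming from the hole in R, and cannot be detected by differentiating G at any single point.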

The hypothesis on R that should replace convexity in Theorem 5.62 to give the
best result is that every simple closed curve in R is the boundary of a surface lying
entirely in R. (The proof requires more advanced techniques.) The region R in
Example 1 does not have this property; no closed curve that encircles the z-axis
can be the boundary of a surface in R.
In practice, if R is a rectangular box, to find a function whose gradient is G one
can proceed in a more simple-minded way than is indicated in the proof of Theorem
5.62. Consider the 2-dimensional case, where R = [a, b] × [α, β] and G(x, y) =
P(x, y) i + Q(x, y) j. Assuming that ∂_x Q = ∂_y P, we begin by integrating P with
respect to x, including a “constant” of integration that can depend on the other
variable y:
\[
f(x, y) = \int_c^x P(t, y)\, dt + \phi(y).
\]
Here c can be any point in the interval [a, b]. Any such f will satisfy ∂_x f = P. To
obtain ∂_y f = Q, differentiate the formula for f with respect to y and use Theorem
4.47:
\[
\partial_y f(x, y) = \int_c^x \partial_y P(t, y)\, dt + \phi'(y) = \int_c^x \partial_x Q(t, y)\, dt + \phi'(y)
= Q(x, y) - Q(c, y) + \phi'(y).
\]
Thus we obtain the desired f by taking ϕ to be an antiderivative of Q(c, y).
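
The recipe just described can be carried out numerically when the integrals are not available in closed form. The sketch below is one possible rendering, under our own assumptions: it uses the composite Simpson rule for both integrals, takes the base point c = 0 and fixes ϕ by ϕ(y) = ∫_d^y Q(c, s) ds with d = 0, and tests the result on the field P = y^2 e^{xy}, Q = (xy + 1)e^{xy} + cos y (the field of Example 2 below), for which a potential is known explicitly. The function names are ours.

```python
import math

def P(x, y):
    return y * y * math.exp(x * y)

def Q(x, y):
    return (x * y + 1.0) * math.exp(x * y) + math.cos(y)

def simpson(g, lo, hi, n=200):
    """Composite Simpson rule for the integral of g from lo to hi (n must be even).
    Also works when hi < lo, giving the signed integral."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * g(lo + k * h)
    return s * h / 3.0

def potential(x, y, c=0.0, d=0.0):
    """f(x, y) = int_c^x P(t, y) dt + phi(y), where phi(y) = int_d^y Q(c, s) ds
    is an antiderivative of Q(c, .)."""
    return simpson(lambda t: P(t, y), c, x) + simpson(lambda s: Q(c, s), d, y)

def exact(x, y):
    """Closed-form potential y e^{xy} + sin y, normalized to vanish at (0, 0)."""
    return y * math.exp(x * y) + math.sin(y)

samples = [(0.5, 0.3), (1.0, -0.8), (-0.7, 1.2)]
errors = [abs(potential(x, y) - exact(x, y)) for x, y in samples]
print(errors)
```

With c = d = 0 the numerically constructed f vanishes at the origin, so it should match the closed form exactly rather than only up to an additive constant; a different choice of c or d shifts f by a constant.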


The same idea works in n variables. If G is a vector field on ℝ^n that satisfies
(5.61), we integrate G_1 with respect to x_1 to get
\[
f(x_1, \dots, x_n) = \int_a^{x_1} G_1(t, x_2, \dots, x_n)\, dt + \phi(x_2, \dots, x_n).
\]
Then ∂_1 f = G_1. Differentiating this formula with respect to x_2, …, x_n and using
the facts that ∂_j G_1 = ∂_1 G_j, we obtain formulas for ∂_2 ϕ, …, ∂_n ϕ. The problem is
thereby reduced to a similar problem (finding a function with a given gradient) in
one less variable, so we can proceed inductively.

EXAMPLE 2. Let G(x, y) = [y^2 e^{xy}] i + [(xy + 1) e^{xy} + cos y] j. We have
∂_1 G_2 = ∂_2 G_1 = (2y + xy^2) e^{xy}, so (5.61) is satisfied. To find a function f
such that ∇f = G, we set
\[
f(x, y) = \int y^2 e^{xy}\, dx = y e^{xy} + \phi(y).
\]
Then ∂_y f = (xy + 1) e^{xy} + ϕ′(y); matching this up with the second component
yields ϕ′(y) = cos y, so we can take ϕ(y) = sin y. The general solution is
f(x, y) = y e^{xy} + sin y + C.
EXAMPLE 3. Let G(x, y, z) = yz i + (xz + y) j + (xy − z) k. An easy calculation
shows that curl G = 0. To find a function f such that ∇f = G, we integrate
the first component with respect to x, obtaining f(x, y, z) = xyz + ϕ(y, z).
Differentiating this in y and z yields ∂_y f = xz + ∂_y ϕ and ∂_z f = xy + ∂_z ϕ.
Therefore, we must have ∂_y ϕ = y and ∂_z ϕ = −z. Integrating the first of these
equations with respect to y gives ϕ(y, z) = (1/2) y^2 + ψ(z), so ∂_z ϕ = ψ′(z) = −z
and ψ(z) = −(1/2) z^2 + C. Putting this all together,

f(x, y, z) = xyz + (1/2) y^2 − (1/2) z^2 + C.

Next, we turn to the question of solving the equation curl F = G, where G is a
C^1 vector field on some open set R ⊂ ℝ^3. There is an obvious necessary condition
for solvability: Since div(curl F) = 0 for any F (formula (5.31)), we must have
div G = 0 on R. Again, this condition turns out to be sufficient provided that R
has “no holes,” but here the meaning of “no holes” is somewhat different. Instead
of requiring that every closed curve in R be the boundary of a surface that lies
entirely in R, we require that every closed surface in R be the boundary of a 3-
dimensional region that lies entirely in R. For example, the complement of the
z-axis in ℝ^3 satisfies the second condition but not the first; the complement of the
origin satisfies the first condition but not the second. An example of a vector field
G that satisfies div G = 0 on the complement of the origin but is not the curl
of any vector field there is provided by G(x) = x/|x|^3, the “inverse square law
force.” This G cannot be a curl because its integral over a sphere about the origin
is nonzero, and this contradicts Corollary 5.56. (See Exercise 6 in §5.5; our G is
the negative of the gradient of the g there.)
Convex regions have no holes, no matter what one means by “holes,” and the
following analogue of Theorem 5.62 is valid.

5.63 Theorem. Suppose R is a convex open set in ℝ^3 and G is a C^1 vector field
on R. If G satisfies div G = 0 on R, then G is the curl of a C^2 vector field on R.

Proof. We shall not give the general proof but shall content ourselves with present-
ing an algorithm for solving curl F = G when R is a rectangular box, similar to the
one given above for solving ∇f = G. Suppose that R = [a_1, b_1] × [a_2, b_2] × [a_3, b_3]
and G is a C^1 vector field satisfying div G = 0 on R. Unlike the problem of find-
ing a function with a given gradient, whose solution is unique up to an additive
constant, there is lots of freedom in choosing an F such that curl F = G, for if
curl F = G then also curl(F + ∇f) = G for any smooth function f. This gives
enough leeway to allow us to assume that the z-component of F is zero. Thus, let
us write G = G_1 i + G_2 j + G_3 k and F = F_1 i + F_2 j; we then want

curl F = −∂_z F_2 i + ∂_z F_1 j + (∂_x F_2 − ∂_y F_1) k = G_1 i + G_2 j + G_3 k.

We solve the first two equations by taking


* z * z
F2 = − G1 (x, y, t) dt + ϕ(x, y), F1 = G2 (x, y, t) dt + ψ(x, y),
c c

where c is some chosen point in [a3 , b3 ]. We then have


\[ \partial_x F_2 - \partial_y F_1 = -\int_c^z \bigl[\partial_y G_2(x, y, t) + \partial_x G_1(x, y, t)\bigr]\,dt + \partial_x \varphi(x, y) - \partial_y \psi(x, y). \]

Since div G = 0, this equals


\[ \int_c^z \partial_z G_3(x, y, t)\,dt + \partial_x \varphi(x, y) - \partial_y \psi(x, y) = G_3(x, y, z) - G_3(x, y, c) + \partial_x \varphi(x, y) - \partial_y \psi(x, y). \]

We therefore achieve our goal by choosing ϕ and ψ to satisfy

∂x ϕ(x, y) − ∂y ψ(x, y) = G3 (x, y, c).

There is still lots of freedom here; for example, we could take


\[ \varphi(x, y) = \int_a^x G_3(t, y, c)\,dt, \qquad \psi(x, y) = 0 \qquad (a \in [a_1, b_1]). \]

If div G = 0, a vector field F such that curl F = G is called a vector potential
for G.

EXAMPLE 4. Find a vector potential for the vector field


G(x, y, z) = (6xz + x³)i − (3x²y + y²)j + (4x + 2yz − 3z²)k.
Solution. First one should verify that div G = 0 so as not to go on a fool’s
errand. Having done so, one can take F = F1 i + F2 j where

∂z F1 = −(3x²y + y²), ∂z F2 = −(6xz + x³),
∂x F2 − ∂y F1 = 4x + 2yz − 3z².

Solving the first two equations yields

F1 = −3x²yz − y²z + ψ(x, y), F2 = −3xz² − x³z + ϕ(x, y),

and plugging these results into the third equation yields ∂x ϕ − ∂y ψ = 4x.
Therefore, one solution (with ϕ = 2x² and ψ = 0) is

F0 = −(3x²yz + y²z)i + (2x² − 3xz² − x³z)j;

the general solution is F = F0 + ∇f where f is an arbitrary C 2 function.
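
The result of Example 4 can be spot-checked numerically. The following is a minimal sketch, not part of the text's argument: it approximates partial derivatives by central differences (the step size h and the sample point p are arbitrary choices, and the helper names partial, curl, and div are illustrative) and compares curl F0 with G at one point.

```python
# Numerical spot check of Example 4 using central differences.
# The helper names and the sample point are illustrative assumptions.

def G(x, y, z):
    return (6*x*z + x**3, -(3*x**2*y + y**2), 4*x + 2*y*z - 3*z**2)

def F0(x, y, z):
    # Vector potential found in Example 4 (its z-component is zero).
    return (-(3*x**2*y*z + y**2*z), 2*x**2 - 3*x*z**2 - x**3*z, 0.0)

def partial(f, i, j, p, h=1e-5):
    """Central-difference estimate of d(f_i)/d(x_j) at the point p."""
    hi, lo = list(p), list(p)
    hi[j] += h
    lo[j] -= h
    return (f(*hi)[i] - f(*lo)[i]) / (2*h)

def curl(f, p):
    return (partial(f, 2, 1, p) - partial(f, 1, 2, p),
            partial(f, 0, 2, p) - partial(f, 2, 0, p),
            partial(f, 1, 0, p) - partial(f, 0, 1, p))

def div(f, p):
    return sum(partial(f, i, i, p) for i in range(3))

p = (0.7, -1.3, 0.4)
div_G = div(G, p)        # should be close to 0
curl_F0 = curl(F0, p)    # should be close to G(*p)
```

Agreement at a single point is of course only a sanity check, not a proof; the symbolic computation in the example is what establishes curl F0 = G.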


Now, what about the equation div F = g? Here there are no obstructions to
solvability, and there is an enormous amount of freedom in finding a solution. For
example, if we wish to solve div F = g in a rectangular box in Rn , we could take
\[ \mathbf{F} = (F, 0, \dots, 0), \qquad F(\mathbf{x}) = \int_c^{x_1} g(t, x_2, \dots, x_n)\,dt, \]

or similar expressions with the variables permuted; there are many other possi-
bilities. In fact, this problem is so easy that it seems reasonable to make it more
interesting by imposing additional conditions on F. We restrict attention to the
three-dimensional situation, but there are similar results in higher dimensions.
The key result here is Theorem 5.46, which shows that we can solve the equa-
tion div F = g subject to the restriction that curl F = 0. More precisely, suppose
R is a bounded open set in R3 and g is of class C 1 on the closure R̄. (In Theorem 5.46 g
was assumed to be C 2 , but see the remarks following the proof.) Smoothness on
R̄ means that g can be extended as a C 1 function to an open set containing R̄, and
it can be modified outside R̄ so as to vanish outside some bounded set while
remaining of class C 1 . (One multiplies g by a C 1 function that is identically 1 on R̄
and vanishes outside some slightly larger region; we omit the details, which are of
little importance for this argument.) Hence we may assume that g is C 1 on R3 and
vanishes outside a bounded set. Then, by Theorem 5.46, the function
\[ u(\mathbf{x}) = -\frac{1}{4\pi} \iiint_{\mathbb{R}^3} \frac{g(\mathbf{x} + \mathbf{y})}{|\mathbf{y}|}\,d^3\mathbf{y} \]
satisfies ∇2 u = g, and so the vector field F = ∇u satisfies both div F = g and
curl F = 0 on R.
With this result in hand, we show that the equations curl F = G and div F = g
can be solved simultaneously (for the same F).
5.64 Theorem. Let R be a bounded convex open set in R3 . For any C 1 function g
on R and any C 2 vector field G on R such that div G = 0, there is a C 2 vector
field F on R such that curl F = G and div F = g on R.
Proof. Let H be a solution of curl H = G, as in Theorem 5.63, and let u be
a solution of ∇2 u = g − div H, as explained above. Let F = ∇u + H; then
curl F = curl(∇u) + G = G and div F = ∇2 u + div H = g.

There is a companion result to Theorem 5.64: Not every vector field is a gra-
dient, and not every vector field is a curl, but every vector field is the sum of a
gradient and a curl. The proof is left to the reader as Exercise 3, where a more
precise statement is given.
One might also ask about uniqueness in Theorem 5.64; that is, to what extent is
a vector field determined by its curl and divergence? Clearly, if F satisfies curl F =
G and div F = g, then so does F + H whenever curl H = 0 and div H = 0.
Solutions of the latter pair of equations can be obtained simply by taking H = ∇ϕ
where ϕ is any solution of Laplace’s equation ∇2 ϕ = 0. Such solutions exist in
great abundance, so the F in Theorem 5.64 is far from unique. However, one can
pin down a unique solution by imposing suitable boundary conditions.
5.65 Proposition. Let R be a bounded convex open set in R3 with piecewise smooth
boundary. Suppose H is a C 1 vector field on R such that curl H = 0 and div H =
0 on R and H · n = 0 on ∂R. Then H vanishes identically on R.
Proof. By Theorem 5.62, H is the gradient of a function u on R, and ∇2 u =
div H = 0. Since H · n = ∂u/∂n, by Green’s formula (5.38) we have
\[ 0 = \iint_{\partial R} u\,\frac{\partial u}{\partial n}\,dA = \iiint_R \bigl(|\nabla u|^2 + u\nabla^2 u\bigr)\,dV = \iiint_R \bigl(|\mathbf{H}|^2 + 0\bigr)\,dV. \]

But |H|2 is a nonnegative continuous function, so its integral over R can be zero
only if |H|2 (and hence H) vanishes identically on R.

By applying Proposition 5.65 to the difference of two solutions of the problem
in Theorem 5.64, we see that if F and F′ are vector fields with the same curl and
divergence on R and the same normal component on ∂R, then F = F′ on R.
We conclude with a few remarks about the application of the results of this
section to Maxwell’s equations (5.50). First, we observe that the curl of the electric
field E vanishes only when there are no time-varying magnetic fields present. Only
in this case is E the gradient of a potential function. However, div B = 0 always
(this expresses the fact that there are no “magnetic charges”), so B is the curl of a
vector potential A. We then have
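\[ \operatorname{curl}\Bigl(\mathbf{E} + \frac{1}{c}\frac{\partial \mathbf{A}}{\partial t}\Bigr) = \operatorname{curl}\mathbf{E} + \frac{1}{c}\frac{\partial \mathbf{B}}{\partial t} = 0, \]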

so E + c⁻¹ ∂t A is the gradient of a function −ϕ. The four-component quantity
(ϕ, A) = (ϕ, A1 , A2 , A3 ) is called the electromagnetic 4-potential. It is best
regarded as a vector in 4-dimensional space-time, with ϕ being the time component,
in the context of special relativity.

EXERCISES

1. Determine whether each of the following vector fields is the gradient of a func-
tion f , and if so, find f . The vector fields in (a)–(c) are on R2 ; those in (d)–(f)
are on R3 , and the one in (g) is on R4 . In all cases i, j, k, and l denote unit
vectors along the positive x-, y-, z-, and w-axes.
a. G(x, y) = (2xy + x²)i + (x² − y²)j.
b. G(x, y) = (3y² + 5x⁴y)i + (x⁵ − 6xy)j.
c. G(x, y) = (2e^{2x} sin y − 3y + 5)i + (e^{2x} cos y − 3x)j.
d. G(x, y, z) = (yz − y sin xy)i + (xz − x sin xy + z cos yz)j + (xy +
y cos yz)k.
e. G(x, y, z) = (y − z)i + (x − z)j + (x − y)k.
f. G(x, y, z) = 2xyi + (x² + log z)j + ((y + 2)/z)k (z > 0).
g. G(x, y, z, w) = (xw² + yzw)i + (xzw + yz² − 2e^{2y+z})j + (xyw + y²z −
e^{2y+z} − w sin zw)k + (xyz + x²w − z sin zw)l.
2. Determine whether each of the following vector fields is the curl of a vector
field F, and if so, find such an F.
a. G(x, y, z) = (x³ + yz)i + (y − 3x²y)j + 4y²k.
b. G(x, y, z) = (xy + z)i + xzj − (yz + x)k.
c. G(x, y, z) = (xe^{−x²z²} − 6x)i + (5y + 2z)j + (z − ze^{−x²z²})k.
3. Let R be a bounded convex open set in R3 . Show that for any C 2 vector
field H on R there exist a C 2 function f and a C 2 vector field G such that
H = grad f + curl G. (Hint: Solve ∇2 f = div H.)
4. Let F = F1 i + F2 j be a C 1 vector field on S = R2 \ {(0, 0)} such that
∂1 F2 = ∂2 F1 on S (but F may be singular at the origin).
a. Let Cr be the circle of radius r about the origin, oriented counterclockwise.
Show that ∫_{Cr} F · dx is a constant α that does not depend on r. (Hint:
Consider the region between two circles.)
b. Show that ∫_C F · dx = α for any simple closed curve C, oriented
counterclockwise, that encircles the origin.
c. Let F0 = (xj − yi)/(x² + y²) as in Example 1. Show that F − (α/2π)F0
is the gradient of a function on S. (Thus, all curl-free vector fields on S
that are not gradients can be obtained from F0 by adding gradients.)

5.9 Higher Dimensions and Differential Forms


Green’s theorem has to do with integrals of vector fields in the plane, and the
divergence theorem and Stokes’s theorem have to do with integrals of vector fields
in 3-space. What happens in dimension n? There are a couple of things we can say
without too much additional explanation.
First, the obvious analogue of the divergence theorem holds in Rn for any
n > 1. To wit, if R is a regular region in Rn bounded by a piecewise smooth
hypersurface ∂R, and F is a C 1 vector field on R, then
\[ \int\!\cdots\!\int_{\partial R} \mathbf{F} \cdot \mathbf{n}\,dV^{n-1} = \int\!\cdots\!\int_R \operatorname{div} \mathbf{F}\,dV^n. \]

Here dV^n is the n-dimensional volume element in Rn and dV^{n−1} is the
(n − 1)-dimensional “area” element on ∂R. The “vector area element” n dV^{n−1} is given
by a formula analogous to the one in R3 . Namely, if (part of) ∂R is parametrized
by x = G(u1 , . . . , un−1 ), then
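\[ \mathbf{n}\,dV^{n-1} = \det \begin{pmatrix} \mathbf{e}_1 & \cdots & \mathbf{e}_n \\ \partial_1 G_1 & \cdots & \partial_1 G_n \\ \vdots & & \vdots \\ \partial_{n-1} G_1 & \cdots & \partial_{n-1} G_n \end{pmatrix} du_1 \cdots du_{n-1}, \]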
where e1 , . . . , en are the standard basis vectors for Rn . (The reader may verify that
in the case n = 2, these formulas yield Green’s theorem in the form (5.18).)
Second, the analogue of the divergence theorem in dimension 1 is just the fun-
damental theorem of calculus:
\[ f(b) - f(a) = \int_{[a,b]} f'(t)\,dt. \]

On the real line, vector fields are the same thing as functions, and the divergence of
a vector field is just the derivative of a function. A regular region in R is an interval
[a, b], whose boundary is the two-element set {a, b}. Since the boundary is finite,
“integration” over the boundary is just summation, and the minus sign on f (a)
comes from assigning the proper “orientation” to the two points in the boundary.
There are also analogues of Stokes’s theorem in higher dimensions, which say
that the integral of some gadget G over the boundary of a k-dimensional submani-
fold of Rn equals the integral of another gadget formed from the first derivatives of
G over the submanifold itself. However, to formulate things properly in this general
setting, it is necessary to develop some additional algebraic machinery, the theory
of differential forms. To do so is beyond the scope of this book; what follows is
intended to provide an informal introduction to the ideas involved. For a detailed
treatment of differential forms, we refer the reader to Hubbard and Hubbard [7] and
Weintraub [19].
Roughly speaking, a differential k-form is an object whose mission in life is to
be integrated over k-dimensional sets; thus, 1-forms are designed to be integrated
over curves, 2-forms are designed to be integrated over surfaces, and so on. Here
is how the ideas of vector analysis that we have been studying can be reformulated
in terms of differential forms.

1-Forms. A differential 1-form on Rn is an expression of the form


ω = F1 (x1 , . . . , xn ) dx1 + · · · + Fn (x1 , . . . , xn ) dxn ,
where the Fj ’s are continuous functions. There is an obvious correspondence be-
tween the 1-form ω and the vector field F = (F1 , . . . , Fn ). In particular, in 3
dimensions the correspondence between 1-forms and vector fields takes the form
(5.66) ω = F dx + G dy + H dz ←→ F = F i + Gj + Hk.
One type of 1-form that we have already encountered is the differential of a C 1
function,
df = (∂1 f ) dx1 + · · · + (∂n f ) dxn .
However, not every 1-form is the differential of a function; the necessary condition
for ω to be of the form df is (5.61).
We note that the set of 1-forms on Rn is a vector space. That is, it makes sense
to add 1-forms to each other and to multiply them by scalars. In fact, the “scalars”
here can be taken to be not just constants but arbitrary continuous functions on Rn .
Thus, if α = A1 dx1 + · · · + An dxn and β = B1 dx1 + · · · + Bn dxn are 1-forms
and f is a continuous function,
α + β = (A1 + B1 ) dx1 + · · · + (An + Bn ) dxn ,
f α = (f A1 ) dx1 + · · · + (f An ) dxn .
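Any smooth mapping T : Rk → Rn induces a mapping of 1-forms in the
opposite direction, that is, an operation T∗ which takes 1-forms on Rn to 1-forms
on Rk . Schematically:
\[ \begin{array}{ccc} \mathbb{R}^k & \xrightarrow{\;\mathbf{T}\;} & \mathbb{R}^n \\ \text{1-forms on } \mathbb{R}^k & \xleftarrow{\;\mathbf{T}^*\;} & \text{1-forms on } \mathbb{R}^n \end{array} \]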

This operation is just the “built-in chain rule” for differentials of functions, ex-
tended to arbitrary 1-forms. To wit, let x1 , . . . , xn and u1 , . . . , uk be the coordi-
nates on Rn and Rk , respectively. If ω = F1 dx1 + · · · + Fn dxn is a 1-form on
Rn , its pullback via T is the 1-form T∗ ω on Rk defined by substituting into ω the
expressions for the x’s in terms of the u’s and the dx’s in terms of the du’s:
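\[ \begin{aligned} \mathbf{T}^*\omega &= \widetilde{F}_1 \Bigl( \frac{\partial x_1}{\partial u_1}\,du_1 + \cdots + \frac{\partial x_1}{\partial u_k}\,du_k \Bigr) + \cdots + \widetilde{F}_n \Bigl( \frac{\partial x_n}{\partial u_1}\,du_1 + \cdots + \frac{\partial x_n}{\partial u_k}\,du_k \Bigr) \\ &= \Bigl( \widetilde{F}_1 \frac{\partial x_1}{\partial u_1} + \cdots + \widetilde{F}_n \frac{\partial x_n}{\partial u_1} \Bigr)\,du_1 + \cdots + \Bigl( \widetilde{F}_1 \frac{\partial x_1}{\partial u_k} + \cdots + \widetilde{F}_n \frac{\partial x_n}{\partial u_k} \Bigr)\,du_k, \end{aligned} \tag{5.67} \]
where
\[ \widetilde{F}_m(u_1, \dots, u_k) = F_m\bigl(\mathbf{T}(u_1, \dots, u_k)\bigr). \]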
Two special cases are of particular interest. First, the chain rule says that when
ω = df , T∗ ω = d(f ◦ T). Second, when k = 1 so that T : R → Rn defines a
curve in Rn , (5.67) becomes
\[ \mathbf{T}^*\omega = \Bigl( (F_1 \circ \mathbf{T})\,\frac{dx_1}{du} + \cdots + (F_n \circ \mathbf{T})\,\frac{dx_n}{du} \Bigr)\,du. \]
1-forms can be integrated over curves. To begin with, a 1-form on R is merely
something of the form ω = g(t) dt, and its integral over an interval [a, b] is just
what you think it is:
\[ \int_{[a,b]} \omega = \int_a^b g(t)\,dt. \]
Now, if ω = F1 dx1 + · · · + Fn dxn is a 1-form on Rn and C is a smooth curve
parametrized by x = g(t), ∫_C ω is defined by pulling ω back to R via g and
integrating the result as before:
\[ \int_C \omega = \int_{[a,b]} \mathbf{g}^*\omega = \int_a^b \Bigl( F_1(\mathbf{g}(t))\,\frac{dx_1}{dt} + \cdots + F_n(\mathbf{g}(t))\,\frac{dx_n}{dt} \Bigr)\,dt. \]
In other words, if we identify ω with the vector field F as before,
\[ \int_C \omega = \int_C \mathbf{F} \cdot d\mathbf{x}. \]
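
The pullback definition of the integral of a 1-form over a curve translates directly into a numerical recipe. The sketch below is only an illustration under stated assumptions: the function name line_integral, the midpoint rule, and the example form ω = −y dx + x dy on the unit circle are choices made here, not taken from the text. The derivative of the parametrization is supplied by hand.

```python
import math

def line_integral(F, g, dg, a, b, n=20000):
    """Midpoint-rule approximation of the integral over [a, b] of
    sum_j F_j(g(t)) * g_j'(t) dt, i.e. the integral of the pullback g*omega."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        coeffs = F(g(t))      # F_1(g(t)), ..., F_n(g(t))
        velocity = dg(t)      # dx_1/dt, ..., dx_n/dt
        total += sum(c * v for c, v in zip(coeffs, velocity)) * h
    return total

# Illustrative example: omega = -y dx + x dy over the unit circle,
# parametrized counterclockwise by g(t) = (cos t, sin t), 0 <= t <= 2*pi.
F = lambda x: (-x[1], x[0])
g = lambda t: (math.cos(t), math.sin(t))
dg = lambda t: (-math.sin(t), math.cos(t))

value = line_integral(F, g, dg, 0.0, 2 * math.pi)
```

Here the pulled-back form is (sin² t + cos² t) dt = dt, so the computed value should agree with 2π up to rounding error.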
2-Forms and the Exterior Product. We now define a notion of a “product of
two 1-forms” that is related to the cross product of vector fields in R3 but works
in any number of dimensions. This product is called the exterior product; the
exterior product of two 1-forms α and β is denoted by α ∧ β. The novel feature of
this is that α ∧ β is no longer a 1-form but a new type of object called a 2-form.
Without specifying what a 2-form is just yet, we list the basic properties that
the exterior product is to have. First, it distributes over addition and scalar multi-
plication in the usual way. That is, if α1 , α2 , and β are 1-forms on Rn and f1 and
f2 are continuous functions on Rn ,

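\[ \begin{aligned} (f_1\alpha_1 + f_2\alpha_2) \wedge \beta &= f_1(\alpha_1 \wedge \beta) + f_2(\alpha_2 \wedge \beta), \\ \beta \wedge (f_1\alpha_1 + f_2\alpha_2) &= f_1(\beta \wedge \alpha_1) + f_2(\beta \wedge \alpha_2). \end{aligned} \tag{5.68} \]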

Second, the exterior product is anticommutative:

(5.69) β ∧ α = −α ∧ β.

Thus, if α = A1 dx1 + · · · + An dxn and β = B1 dx1 + · · · + Bn dxn , we can
expand α ∧ β according to (5.68) to obtain
\[ \alpha \wedge \beta = \sum_{i=1}^n \sum_{j=1}^n A_i B_j\,dx_i \wedge dx_j. \tag{5.70} \]

But according to (5.69), dxj ∧ dxi = −dxi ∧ dxj and dxi ∧ dxi = 0. Thus the
terms with i = j in (5.70) drop out, and for i ≠ j we can combine the ijth and jith
terms into one:

\[ \begin{aligned} A_i B_j\,dx_i \wedge dx_j + A_j B_i\,dx_j \wedge dx_i &= (A_i B_j - A_j B_i)\,dx_i \wedge dx_j \\ &= (A_j B_i - A_i B_j)\,dx_j \wedge dx_i. \end{aligned} \]

We have the option of using either of the two expressions on the right, and the usual
choice is to use the one where the first index is smaller than the second one. (In R3
a different choice is sometimes convenient, as we shall soon see.) Thus, we finally
obtain
\[ \alpha \wedge \beta = \sum_{1 \le i < j \le n} (A_i B_j - A_j B_i)\,dx_i \wedge dx_j. \]

In general, a differential 2-form on Rn is an expression of the type
\[ \omega = \sum_{1 \le i < j \le n} C_{ij}(x_1, \dots, x_n)\,dx_i \wedge dx_j, \tag{5.71} \]
where the Cij are continuous functions on Rn . We note that the number of terms
in this sum, that is, the number of pairs (i, j) with 1 ≤ i < j ≤ n, is ½ n(n − 1).
In (5.71) we also have the option of rewriting dxi ∧ dxj as −dxj ∧ dxi if we so
choose.
What does this really mean? We have been proceeding purely formally, without
saying what meaning is to be attached to the expressions dxi ∧dxj . In the full-dress
treatment of this subject, 2-forms are defined to be alternating rank-2 tensor fields
over Rn , but this is somewhat beside the point. For now it is probably best to
think of a 2-form on Rn simply as a ½ n(n − 1)-tuple of functions, namely the
functions Cij in (5.71), and the expressions dxi ∧ dxj simply as a convenient set of
signposts to mark the various components, just as i, j, and k are used to mark the
components of vector fields in R3 . The important features of 2-forms are not their
precise algebraic definition but the way they transform under changes of variables
and the way they integrate over surfaces.
Before proceeding to these matters, however, let us see how things look in the
3-dimensional case. When n = 3 we also have ½ n(n − 1) = 3, so 2-forms have 3
components just as vector fields and 1-forms do: This is the “accident” that makes
n = 3 special! The general 2-form on R3 can be written as

ω = F (x, y, z) dy ∧ dz + G(x, y, z) dz ∧ dx + H(x, y, z) dx ∧ dy,

so there is a one-to-one correspondence between 2-forms and vector fields:

(5.72) ω = F dy ∧ dz + G dz ∧ dx + H dx ∧ dy ←→ F = F i + Gj + Hk.

Observe carefully how we have set this correspondence up: we have written the
basis elements dxi ∧ dxj with the variables in cyclic order,

dx before dy before dz before dx,

rather than the “i < j” order we used above, so that the middle term is dz ∧ dx
rather than dx ∧ dz. Also, we identify the unit vector i in the x direction with the
2-form dy ∧ dz from which dx is missing, and likewise for j and k.
The exterior product in 3 dimensions looks like this: If

α = A1 dx + A2 dy + A3 dz, β = B1 dx + B2 dy + B3 dz,

then

α ∧ β = (A2 B3 − A3 B2 ) dy ∧ dz + (A3 B1 − A1 B3 ) dz ∧ dx
+ (A1 B2 − A2 B1 ) dx ∧ dy.
Thus, if we identify α and β with vector fields according to (5.66) and α ∧ β with
a vector field according to (5.72), the exterior product turns into the cross product:
α ←→ F, β ←→ G, α∧β ←→ F × G.
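
The coefficient formula for α ∧ β and its identification with the cross product can be checked mechanically at a single point. This is a small sketch under stated assumptions: 1-forms are represented by their coefficient tuples at one point, indices are 0-based, and the helper names wedge and cross are illustrative.

```python
from itertools import combinations

def wedge(A, B):
    """Coefficients of alpha ^ beta on the basis dx_i ^ dx_j with i < j
    (0-based indices), for 1-forms with coefficient tuples A and B."""
    n = len(A)
    return {(i, j): A[i] * B[j] - A[j] * B[i]
            for i, j in combinations(range(n), 2)}

def cross(A, B):
    """Ordinary cross product in R^3."""
    return (A[1] * B[2] - A[2] * B[1],
            A[2] * B[0] - A[0] * B[2],
            A[0] * B[1] - A[1] * B[0])

A, B = (1, 2, 3), (4, 5, 6)
w = wedge(A, B)

# Reorder into the cyclic basis dy^dz, dz^dx, dx^dy of (5.72);
# note that dz^dx = -(dx^dz), hence the sign on the middle entry.
as_vector = (w[(1, 2)], -w[(0, 2)], w[(0, 1)])
```

With these sample coefficients both as_vector and cross(A, B) come out to (−3, 6, −3). The same wedge function works for any n, where it returns ½ n(n − 1) coefficients and no cross product is available.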

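Pullbacks and Integrals of 2-Forms. We have seen that a smooth mapping
T : Rk → Rn induces a “pullback” mapping T∗ that takes 1-forms on Rn to
1-forms on Rk . It also induces a pullback mapping, still denoted by T∗ , from 2-forms
on Rn to 2-forms on Rk , in exactly the same way: We simply substitute T(u) for
x and ∑_j (∂xm /∂uj ) duj for dxm . Thus,
\[ \begin{aligned} \mathbf{T}^*(dx_l \wedge dx_m) &= \Bigl( \frac{\partial x_l}{\partial u_1}\,du_1 + \cdots + \frac{\partial x_l}{\partial u_k}\,du_k \Bigr) \wedge \Bigl( \frac{\partial x_m}{\partial u_1}\,du_1 + \cdots + \frac{\partial x_m}{\partial u_k}\,du_k \Bigr) \\ &= \sum_{i<j} \frac{\partial(x_l, x_m)}{\partial(u_i, u_j)}\,du_i \wedge du_j, \end{aligned} \]
so in general, if
\[ \omega = \sum_{l<m} C_{lm}(\mathbf{x})\,dx_l \wedge dx_m, \]
then
\[ \mathbf{T}^*\omega = \sum_{l<m} \sum_{i<j} C_{lm}(\mathbf{T}(\mathbf{u}))\,\frac{\partial(x_l, x_m)}{\partial(u_i, u_j)}\,du_i \wedge du_j. \]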
It is a consequence of the chain rule that the pullback operation behaves properly
under composition of mappings, namely, (T1 ◦ T2 )∗ ω = T∗2 (T∗1 ω).
We can now show how to integrate 2-forms over surfaces. First consider the
simplest case, where the surface is simply a region D in R2 . If we name the coor-
dinates on R2 x and y, the general 2-form on R2 has the form ω = f (x, y) dx ∧ dy,
and its integral over D is the obvious thing:
\[ \iint_D f(x, y)\,dx \wedge dy = \iint_D f(x, y)\,dx\,dy, \tag{5.73} \]
the integral on the right being the ordinary double integral of f over D. The only
subtle point is that the integral on the left is an oriented integral, the orientation
being carried in the fact that dx comes before dy in dx ∧ dy. If we wrote dy ∧ dx
instead, we would introduce a minus sign.
The nice thing about (5.73) is that the change-of-variable formula for double
integrals is more or less built into it. Namely, suppose T : R2 → R2 is an invertible
C 1 transformation, say T(u, v) = (x, y). If ω = f (x, y) dx ∧ dy, then
\[ \mathbf{T}^*\omega = f(\mathbf{T}(u, v))\,\frac{\partial(x, y)}{\partial(u, v)}\,du \wedge dv = f(\mathbf{T}(u, v))\,(\det D\mathbf{T})\,du\,dv, \]
so the change-of-variable formula simply says that
\[ \iint_{\mathbf{T}(D)} \omega = \iint_D \mathbf{T}^*\omega. \tag{5.74} \]

In other words, the formalism of differential forms produces the necessary Jacobian
factor automatically. The change-of-variable formula as we have seen it before
involved | det DT| rather than det DT, but this discrepancy is accounted for by
the difference between ordinary integrals and oriented integrals.
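
As a concrete instance of the Jacobian appearing automatically, consider polar coordinates, T(r, θ) = (r cos θ, r sin θ). This choice of T is an illustrative example, not one made in the text above. Substituting for dx and dy and using dr ∧ dr = dθ ∧ dθ = 0 and dθ ∧ dr = −dr ∧ dθ gives:

```latex
\begin{aligned}
dx &= \cos\theta\,dr - r\sin\theta\,d\theta, \qquad
dy = \sin\theta\,dr + r\cos\theta\,d\theta, \\
dx \wedge dy
  &= (\cos\theta)(r\cos\theta)\,dr \wedge d\theta
   - (r\sin\theta)(\sin\theta)\,d\theta \wedge dr \\
  &= r(\cos^2\theta + \sin^2\theta)\,dr \wedge d\theta
   = r\,dr \wedge d\theta .
\end{aligned}
```

Hence T∗(f dx ∧ dy) = f(r cos θ, r sin θ) r dr ∧ dθ, in agreement with ∂(x, y)/∂(r, θ) = r.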
Now we turn to the case of integrals over a surface S in Rn . The idea is the
same as for line integrals: If ω is a 2-form on Rn and S is a surface parametrized
by x = G(u, v), (u, v) ∈ D ⊂ R2 , we define ∬_S ω by pulling ω back to D via G
and using (5.73) to define the resulting integral:
\[ \iint_S \omega = \iint_D \mathbf{G}^*\omega. \]
This is independent of the parametrization, in the following sense: If G = G̃ ◦ T
where T : R2 → R2 is a C 1 transformation, then by (5.74),
\[ \iint_D \mathbf{G}^*\omega = \iint_D \mathbf{T}^*\widetilde{\mathbf{G}}^*\omega = \iint_{\mathbf{T}(D)} \widetilde{\mathbf{G}}^*\omega. \]

Let us see how this looks in the case n = 3. If


ω = A dy ∧ dz + B dz ∧ dx + C dx ∧ dy and (x, y, z) = G(u, v),
then G∗ ω equals
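\[ \Bigl( A(\mathbf{G}(u, v))\,\frac{\partial(y, z)}{\partial(u, v)} + B(\mathbf{G}(u, v))\,\frac{\partial(z, x)}{\partial(u, v)} + C(\mathbf{G}(u, v))\,\frac{\partial(x, y)}{\partial(u, v)} \Bigr)\,du \wedge dv, \]
and hence ∬_S ω equals
\[ \iint_D \Bigl( A(\mathbf{G}(u, v))\,\frac{\partial(y, z)}{\partial(u, v)} + B(\mathbf{G}(u, v))\,\frac{\partial(z, x)}{\partial(u, v)} + C(\mathbf{G}(u, v))\,\frac{\partial(x, y)}{\partial(u, v)} \Bigr)\,du\,dv. \]
But this is something we have seen before. Indeed, we have
\[ \frac{\partial(y, z)}{\partial(u, v)}\,\mathbf{i} + \frac{\partial(z, x)}{\partial(u, v)}\,\mathbf{j} + \frac{\partial(x, y)}{\partial(u, v)}\,\mathbf{k} = \frac{\partial \mathbf{G}}{\partial u} \times \frac{\partial \mathbf{G}}{\partial v}, \]
so if we identify ω with the vector field F = Ai + Bj + Ck as in (5.72), we have
\[ \iint_S \omega = \iint_S \mathbf{F} \cdot \mathbf{n}\,dA. \]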

Hence the notion of surface integrals of vector fields in R3 also fits into the theory
of differential forms.
3-Forms. A differential 3-form on Rn is an expression of the form
\[ \omega = \sum_{1 \le i < j < k \le n} C_{ijk}(x_1, \dots, x_n)\,dx_i \wedge dx_j \wedge dx_k. \tag{5.75} \]

Here, as in the case of 2-forms, one can think of the expressions dxi ∧ dxj ∧ dxk
simply as formal basis elements, and one can put the indices i, j, k in an order other
than i < j < k with the understanding that whenever one interchanges two of the
dx’s one introduces a minus sign. The number of terms in the sum in (5.75) is the
binomial coefficient n!/[3!(n − 3)!]. When n = 3, this number is 1: All 3-forms on
R3 have the form
ω = f (x, y, z) dx ∧ dy ∧ dz
and hence can be identified with functions:
f (x, y, z) dx ∧ dy ∧ dz ←→ f (x, y, z).
The notion of exterior product extends so as to yield a 3-form as the product
of three 1-forms or as the product of a 1-form and a 2-form. The idea is pretty
obvious: dxi ∧ dxj ∧ dxk is the exterior product of the three 1-forms dxi , dxj , and
dxk , or the 1-form dxi and the 2-form dxj ∧ dxk , or the 2-form dxi ∧ dxj and the
1-form dxk . The exterior product distributes over sums and scalar multiples in the
usual way, and the anticommutative law becomes
α ∧ β = (−1)^{lm} β ∧ α if α is an l-form and β is an m-form.
Here is how it works when n = 3: If
α = A1 dx + A2 dy + A3 dz,
β = B1 dx + B2 dy + B3 dz,
γ = C1 dx + C2 dy + C3 dz,
ω = W1 dy ∧ dz + W2 dz ∧ dx + W3 dx ∧ dy,
then
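\[ \alpha \wedge (\beta \wedge \gamma) = (\alpha \wedge \beta) \wedge \gamma = \det \begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \\ C_1 & C_2 & C_3 \end{pmatrix} dx \wedge dy \wedge dz, \]
\[ \alpha \wedge \omega = \omega \wedge \alpha = (A_1 W_1 + A_2 W_2 + A_3 W_3)\,dx \wedge dy \wedge dz. \]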
Thus, if we identify α, β, γ with the vector fields F, G, H and ω with the vector
field V, the exterior product turns into the scalar triple product and dot product:
α∧β∧γ ←→ F · (G × H), α∧ω ←→ F · V.
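
The identification of the coefficient of α ∧ β ∧ γ with the scalar triple product can be confirmed on sample numbers. The following sketch assumes coefficient tuples evaluated at a single point; the helper names and the sample values are illustrative choices.

```python
def cross(G, H):
    return (G[1] * H[2] - G[2] * H[1],
            G[2] * H[0] - G[0] * H[2],
            G[0] * H[1] - G[1] * H[0])

def dot(F, V):
    return sum(a * b for a, b in zip(F, V))

def det3(A, B, C):
    """Determinant of the 3x3 matrix with rows A, B, C
    (cofactor expansion along the first row)."""
    return (A[0] * (B[1] * C[2] - B[2] * C[1])
            - A[1] * (B[0] * C[2] - B[2] * C[0])
            + A[2] * (B[0] * C[1] - B[1] * C[0]))

A, B, C = (1, 2, 3), (0, 1, 4), (5, 6, 0)
triple = dot(A, cross(B, C))   # F . (G x H)
coefficient = det3(A, B, C)    # coefficient of dx ^ dy ^ dz
```

For these sample rows both quantities equal 1, as the determinant formula predicts.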
Pullbacks and integrals of 3-forms work just as before; we restrict ourselves to
the 3-dimensional case. Let ω = f (x, y, z) dx ∧ dy ∧ dz. If T : R3 → R3 is a
C 1 transformation, say T(u, v, w) = (x, y, z), we obtain T∗ ω by substituting in the
formulas for x, y, z, dx, dy, and dz in terms of u, v, w; the result is
\[ \mathbf{T}^*\omega = f(\mathbf{T}(u, v, w))\,\frac{\partial(x, y, z)}{\partial(u, v, w)}\,du \wedge dv \wedge dw. \]
The integral of ω over a region D ⊂ R3 is defined in the obvious way:
\[ \iiint_D f(x, y, z)\,dx \wedge dy \wedge dz = \iiint_D f, \]
and the change-of-variable formula (for oriented integrals) reads
\[ \iiint_{\mathbf{T}(D)} \omega = \iiint_D \mathbf{T}^*\omega. \]

We have now sketched the whole idea of differential forms in dimension 3.
In dimension n one needs to develop the theory of k-forms for all k ≤ n, which
requires the machinery of multilinear algebra.

The Exterior Derivative. When the operations of gradient, curl, and diver-
gence are expressed in terms of differential forms, they are all instances of a single
operation, denoted by d and called the exterior derivative, which maps k-forms
on Rn into (k + 1)-forms on Rn :
\[ \text{0-forms} \xrightarrow{\;d\;} \text{1-forms} \xrightarrow{\;d\;} \text{2-forms} \xrightarrow{\;d\;} \text{3-forms} \xrightarrow{\;d\;} \cdots. \]
Here’s how it works.
First, a 0-form is, by definition, a function; if f is a 0-form, then df is just the
differential of f . If we identify 1-forms with vector fields, df becomes ∇f . That
is, the gradient is the exterior derivative on 0-forms.
Now, any k-form ω with k ≥ 1 is a sum of terms of the form f β where f is a
function and β is one of the basis elements (dxi for 1-forms, dxi ∧ dxj for 2-forms,
etc.). dω is defined to be the (k + 1)-form obtained by replacing each such term f β
by df ∧ β.
This is what it looks like when ω = A1 dx1 + A2 dx2 + · · · + An dxn is a
1-form:
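\[ \begin{aligned} d\omega &= dA_1 \wedge dx_1 + \cdots + dA_n \wedge dx_n \\ &= \Bigl( \frac{\partial A_1}{\partial x_1}\,dx_1 + \cdots + \frac{\partial A_1}{\partial x_n}\,dx_n \Bigr) \wedge dx_1 + \cdots + \Bigl( \frac{\partial A_n}{\partial x_1}\,dx_1 + \cdots + \frac{\partial A_n}{\partial x_n}\,dx_n \Bigr) \wedge dx_n \\ &= \sum_{i<j} \Bigl( \frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j} \Bigr)\,dx_i \wedge dx_j. \end{aligned} \]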
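When n = 3 and we write x, y, z instead of x1 , x2 , x3 , we obtain
\[ d\omega = \Bigl( \frac{\partial A_3}{\partial y} - \frac{\partial A_2}{\partial z} \Bigr)\,dy \wedge dz + \Bigl( \frac{\partial A_1}{\partial z} - \frac{\partial A_3}{\partial x} \Bigr)\,dz \wedge dx + \Bigl( \frac{\partial A_2}{\partial x} - \frac{\partial A_1}{\partial y} \Bigr)\,dx \wedge dy. \]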

But this is just the curl! That is, if we identify the 1-form ω and the 2-form dω
with vector fields F and G in the standard way, then G = curl F. The curl is the
exterior derivative on 1-forms in R3 .
Now suppose that ω = A dy ∧ dz + B dz ∧ dx + C dx ∧ dy is a 2-form. As the
notation in higher dimensions gets messy, we shall write out only the 3-dimensional
case:

dω = dA ∧ dy ∧ dz + dB ∧ dz ∧ dx + dC ∧ dx ∧ dy
= (∂x A dx + ∂y A dy + ∂z A dz) ∧ dy ∧ dz
+ (∂x B dx + ∂y B dy + ∂z B dz) ∧ dz ∧ dx
+ (∂x C dx + ∂y C dy + ∂z C dz) ∧ dx ∧ dy
= (∂x A + ∂y B + ∂z C) dx ∧ dy ∧ dz.

(For the last equality we have used the fact that an exterior product containing two
identical factors vanishes and the fact that the product dx ∧ dy ∧ dz is unchanged by
cyclic permutation of its three terms.) If we identify ω with a vector field F and dω
with a function g as before, we see that g = div F. The divergence is the exterior
derivative on 2-forms in R3 .
We observed earlier that curl(∇f ) = 0 for any function f and div(curl F) = 0
for any vector field F. The interpretation of these identities in terms of differential
forms is that d(df ) = 0 for any 0-form (function) f and d(dω) = 0 for any 1-form
ω. It is true in general that

(5.76) d(dω) = 0

for any k-form ω on Rn . In all cases the proof of this fact boils down to the equality
of mixed partials.
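
For the simplest case, a 0-form f, the reduction to equality of mixed partials can be written out in one line. The computation below is a sketch that assumes f is of class C 2 and applies the formula for the exterior derivative of a 1-form with Aj = ∂j f:

```latex
\begin{aligned}
d(df) &= d\Bigl(\sum_{j=1}^n \partial_j f\,dx_j\Bigr)
       = \sum_{j=1}^n \sum_{i=1}^n \partial_i \partial_j f\,dx_i \wedge dx_j \\
      &= \sum_{1 \le i < j \le n}
         \bigl(\partial_i \partial_j f - \partial_j \partial_i f\bigr)\,dx_i \wedge dx_j
       = 0 .
\end{aligned}
```

The terms with i = j vanish because dxi ∧ dxi = 0, and each remaining pair cancels because ∂i ∂j f = ∂j ∂i f.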
As an illustration of the exterior derivative, we give the relativistically covari-
ant reformulation of Maxwell’s equations (5.50). The key idea is to think of elec-
tromagnetism as a phenomenon in 4-dimensional space-time rather than a time-
dependent phenomenon in 3-dimensional space. The electric and magnetic fields
E = (Ex , Ey , Ez ) and B = (Bx , By , Bz ) are combined into a single entity, the
electromagnetic field tensor, which we identify in two ways with a 2-form on R4 :

ω = c(Ex dx ∧ dt + Ey dy ∧ dt + Ez dz ∧ dt)
+ Bx dy ∧ dz + By dz ∧ dx + Bz dx ∧ dy,

ω∗ = c(Bx dx ∧ dt + By dy ∧ dt + Bz dz ∧ dt)
− Ex dy ∧ dz − Ey dz ∧ dx − Ez dx ∧ dy,
where c is the speed of light. Also, the charge and current densities ρ and J =
(Jx , Jy , Jz ) are combined into a single entity, the 4-current density, which we iden-
tify with a 3-form on R4 :
γ = Jx dy ∧ dz ∧ dt + Jy dz ∧ dx ∧ dt + Jz dx ∧ dy ∧ dt − ρ dx ∧ dy ∧ dz.
The four Maxwell equations (5.50) then turn into the two equations
dω = 0, dω∗ = 4πγ.
The verification of this is a good way for readers to see whether they have learned
how to compute exterior derivatives!

Stokes’s Theorem. We can now state the general theorem that encompasses the
integral theorems of the preceding sections and their higher dimensional analogues:
5.77 Theorem (The General Stokes Theorem). Let M be a smooth, oriented k-
dimensional submanifold of Rn with a piecewise smooth boundary ∂M , and let
∂M carry the orientation that is (in a suitable sense) compatible with the one on
M . If ω is a (k − 1)-form of class C 1 on an open set containing M , then
\[ \int\!\cdots\!\int_{\partial M} \omega = \int\!\cdots\!\int_M d\omega. \]
We conclude with a final suggestive remark. The formal differential-algebraic
identity d(dω) = 0 stated above has a geometric counterpart. The boundary of a
region in the plane is a closed curve with no endpoints, and the boundary of a region
in 3-space is a closed surface with no edge. In general, the boundary of a (smoothly
bounded) region M in a k-dimensional manifold is a (k − 1)-dimensional manifold
with no boundary, that is,
(5.78) ∂(∂M ) = ∅.
The general Stokes theorem shows that (5.76) and (5.78) are in some sense
equivalent. Indeed, if M is k-dimensional and ω is a (k − 2)-form, the Stokes
theorem gives
\[ \int\!\cdots\!\int_{\partial(\partial M)} \omega = \int\!\cdots\!\int_{\partial M} d\omega = \int\!\cdots\!\int_M d(d\omega). \]
If we accept the geometric fact that ∂(∂M ) = ∅, then the integral on the left
vanishes, and hence so does the integral on the right. But since this happens for
every M , it follows that d(dω) = 0. Similarly, if we know that d(dω) = 0 for
every ω, we can conclude that ∂(∂M ) = ∅. This sort of interplay of algebra,
analysis, and geometry is a significant feature of much of modern mathematics.
Chapter 6

INFINITE SERIES

Infinite series are sums with infinitely many terms, of which the most familiar
examples are the nonterminating decimal expansions. For instance, the equality
π = 3.14159 . . . is an abbreviation of the statement that π is the sum of the infinite
series
\[ 3 + \frac{1}{10} + \frac{4}{10^2} + \frac{1}{10^3} + \frac{5}{10^4} + \frac{9}{10^5} + \cdots. \]
The procedure by which one makes sense out of such sums stands alongside dif-
ferentiation and integration as one of the fundamental limiting processes of mathe-
matical analysis. Just as decimal expansions provide a useful way of obtaining all
real numbers from the finite decimal fractions, infinite series provide a flexible and
powerful way of building complicated functions out of simple ones.
This chapter is devoted to the foundations of the theory of infinite series. In
it we develop the basic facts about series of numbers; then in the next chapter we
proceed to the study of series of functions.

6.1 Definitions and Examples


Informally speaking, an infinite series (or just a series, for short) is an expression
of the form
\[ \sum_{0}^{\infty} a_n = a_0 + a_1 + a_2 + \cdots. \]

Here the ak ’s can be real numbers, complex numbers, vectors, and so on; for the
present, we shall mainly consider the case where they are real numbers.

It is not immediately clear what precise meaning is to be attached to an
expression of the form ∑_0^∞ an that involves a sum of infinitely many terms. The formal
definition must be phrased in terms of limits of finite sums, as follows.
Given a sequence {an}_0^∞ of real numbers (or complex numbers, vectors, etc.),
we can form a new sequence {sk}_0^∞ by adding up the terms of the original sequence
successively:

s0 = a0 , s1 = a0 + a1 , s2 = a0 + a1 + a2 , . . . ,
sk = a0 + a1 + · · · + ak .

An infinite series is formally defined to be a pair of sequences {an } and {sk }
related by these equations, and the notation ∑_0^∞ an is to be regarded as a convenient
way of encoding this information. The an ’s are called the terms of the series, and
the sk ’s are called the partial sums of the series. If the sequence {sk } of partial
sums converges to a limit S, then the series is said to be convergent, S is called its
sum, and we write ∑_0^∞ an = S; otherwise, the series is said to be divergent, and
no numerical meaning is attached to the expression ∑_0^∞ an . (However, if sk → ∞
as k → ∞, we may say that ∑_0^∞ an = ∞.)
Remark. We have elected to start the numbering of the sequences {an } and
{sk } at n = 0 and k = 0, since this is perhaps the most common situation in
practice. However, we could equally well start at some other point, for instance,
\[ \sum_{5}^{\infty} a_n = a_5 + a_6 + a_7 + \cdots, \]

for which we would write

s5 = a5 , s6 = a5 + a6 , s7 = a5 + a6 + a7 , . . . .

Before proceeding further, let us record a couple of very simple but important
facts about series.

6.1 Theorem.
a. If the series $\sum_0^\infty a_n$ and $\sum_0^\infty b_n$ are convergent, with sums $S$ and $T$, then
$\sum_0^\infty (a_n + b_n)$ is convergent, with sum $S + T$.
b. If the series $\sum_0^\infty a_n$ is convergent, with sum $S$, then for any $c \in \mathbb{R}$ the series
$\sum_0^\infty ca_n$ is convergent, with sum $cS$.
c. If the series $\sum_0^\infty a_n$ is convergent, then $\lim_{n\to\infty} a_n = 0$. Equivalently, if
$a_n \not\to 0$ as $n \to \infty$, then the series $\sum_0^\infty a_n$ is divergent.

Proof. Let $\{s_k\}$ and $\{t_k\}$ be the sequences of partial sums of the series $\sum_0^\infty a_n$ and
$\sum_0^\infty b_n$, respectively. (a) and (b) follow from the fact that if $s_k \to S$ and $t_k \to T$,
then $s_k + t_k \to S + T$ and $cs_k \to cS$. As for (c), we observe that $a_n = s_n - s_{n-1}$.
If the series converges to the sum $S$, it follows that $\lim a_n = \lim s_n - \lim s_{n-1} =
S - S = 0$.

At present we are thinking primarily of series whose terms are numbers, but
most of the really significant applications of series come from situations where the
terms $a_n$ depend on a variable $x$. In this case the series $\sum_0^\infty a_n(x)$ may converge
for some values of x and diverge for others, and it defines a function whose domain
is the set of all x for which it converges. We shall explore this idea in more detail
in the next chapter; at this point we recall some familiar examples.
One of the simplest and most useful infinite series is the geometric series, in
which the ratio of two succeeding terms is a constant $x$. That is, the geometric
series with initial term $a$ and ratio $x$ is

$$a + ax + ax^2 + ax^3 + \cdots = \sum_0^\infty ax^n.$$

The constant $a$ can be factored out, according to Theorem 6.1b, so it suffices to
consider the case $a = 1$.
The partial sums $s_k = \sum_0^k x^n$ of the series $\sum_0^\infty x^n$ are easily evaluated. If
$x = 1$, then of course $s_k = 1 + 1 + \cdots + 1 = k + 1$. If $x \ne 1$, we observe that

$$s_k = 1 + x + \cdots + x^k,$$
$$xs_k = x + \cdots + x^k + x^{k+1},$$

and subtracting the second equation from the first yields $(1 - x)s_k = 1 - x^{k+1}$.
Therefore,

$$s_k = \frac{1 - x^{k+1}}{1 - x} \ \text{ if } x \ne 1, \qquad s_k = k + 1 \ \text{ if } x = 1. \tag{6.2}$$

If $|x| < 1$, then $x^{k+1} \to 0$ as $k \to \infty$, so $s_k \to (1 - x)^{-1}$. It also follows easily
from (6.2), or from Theorem 6.1c, that $\{s_k\}$ diverges when $|x| \ge 1$. In short, we
have:

6.3 Theorem. The geometric series $\sum_0^\infty x^n$ converges if and only if $|x| < 1$, in
which case its sum is $(1 - x)^{-1}$.
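
As a numerical illustration of formula (6.2) and Theorem 6.3, the following minimal Python sketch compares the directly summed partial sums with the closed form. The function names `geometric_partial_sum` and `geometric_closed_form` are chosen here for illustration and do not come from the text.

```python
def geometric_partial_sum(x: float, k: int) -> float:
    """Return s_k = 1 + x + ... + x**k by adding the terms one by one."""
    return sum(x**n for n in range(k + 1))


def geometric_closed_form(x: float, k: int) -> float:
    """Return s_k from formula (6.2)."""
    if x == 1:
        return float(k + 1)
    return (1 - x ** (k + 1)) / (1 - x)


if __name__ == "__main__":
    # For |x| < 1 the two columns agree and approach 1/(1 - x);
    # for |x| >= 1 the partial sums do not settle down.
    for x in (0.5, -0.9, 1.0, 1.1):
        print(x, geometric_partial_sum(x, 50), geometric_closed_form(x, 50))
```

For $|x| < 1$ the printed values should be close to $(1 - x)^{-1}$, while for $x = 1$ and $x = 1.1$ they keep growing with $k$.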

Another familiar result that leads to infinite series is Taylor's theorem. We
recall that if $f$ is a function of class $C^\infty$ (that is, possessing derivatives of all orders)
on some interval $(-c, c)$ centered at the origin, for any positive integer $k$ we have

$$f(x) = f(0) + f'(0)x + \cdots + \frac{f^{(k)}(0)}{k!}x^k + R_k(x) \qquad (|x| < c). \tag{6.4}$$

If it happens that $R_k(x) \to 0$ as $k \to \infty$, we can let $k \to \infty$ in (6.4) to obtain an
infinite series expansion of $f(x)$, the Taylor series of $f$ (centered at $x = 0$):

$$f(x) = \sum_0^\infty \frac{f^{(n)}(0)}{n!}x^n. \tag{6.5}$$

One simple sufficient condition to guarantee that $R_k(x) \to 0$ follows from the
estimate for the Taylor remainder in Corollary 2.61:

$$|R_k(x)| \le \sup_{|t| \le |x|} |f^{(k+1)}(t)| \, \frac{|x|^{k+1}}{(k+1)!} \qquad (|x| < c).$$

6.6 Theorem. Let $f$ be a function of class $C^\infty$ on the interval $(-c, c)$, where
$0 < c \le \infty$.
a. If there exist constants $a, b > 0$ such that $|f^{(k)}(x)| \le ab^k k!$ for all $|x| < c$ and
$k \ge 0$, then (6.5) holds for $|x| < \min(c, b^{-1})$.
b. If there exist constants $A, B > 0$ such that $|f^{(k)}(x)| \le AB^k$ for all $|x| < c$
and $k \ge 0$, then (6.5) holds for $|x| < c$.

Proof. By Corollary 2.61, the estimate $|f^{(k)}(x)| \le ab^k k!$ implies the estimate
$|R_{k-1}(x)| \le a|bx|^k$ for $|x| < c$. If also $|x| < b^{-1}$, then $|bx|^k \to 0$ as $k \to \infty$,
so (6.4) yields the result (a). To deduce (b), we observe that the factorial function
grows faster than exponentially (see Example 5 in §1.4), so that for any positive
$A$, $B$, and $b$, the sequence $A(B/b)^k/k!$ tends to zero as $k \to \infty$. Letting $a$ be the
largest term in this sequence, we have

$$AB^k = \left[ A\,\frac{(B/b)^k}{k!} \right] b^k k! \le ab^k k!,$$

so the estimate $|f^{(k)}(x)| \le AB^k$, for a given $A$ and $B$, implies the estimate
$|f^{(k)}(x)| \le ab^k k!$ for every $b > 0$ (with $a$ depending on $b$). Hence (b) follows
from (a).

Remark. The interval (−c, c) might not be the whole set where the function
f and its derivatives are defined. It may be necessary to restrict x to a proper
subinterval of the domain of f to obtain the estimates on f (k) (x) in Theorem 6.6,
as Example 2 will show.

EXAMPLE 1. Let $f(x) = \cos x$. The derivatives $f^{(k)}(x)$ are equal to $\pm\cos x$ or
$\pm\sin x$, depending on $k$, so they all satisfy $|f^{(k)}(x)| \le 1$ for all $x$. By Theorem
6.6b, it follows that $\cos x$ is the sum of its Taylor series, $\sum_0^\infty (-1)^n x^{2n}/(2n)!$,
for all $x$. For exactly the same reason, $\sin x$ is the sum of its Taylor series,
$\sum_0^\infty (-1)^n x^{2n+1}/(2n+1)!$, for all $x$.
EXAMPLE 2. Let $f(x) = e^x$. Here $f^{(k)}(x) = e^x$ for all $k$. We cannot obtain a
good estimate on $f^{(k)}(x)$ that is valid for all $x$ at once, but for $|x| < c$ we have
$|f^{(k)}(x)| < e^c$. By Theorem 6.6b, it follows that $e^x$ is the sum of its Taylor
series, $\sum_0^\infty x^n/n!$, for $|x| < c$. But $c$ is arbitrary, so in fact $e^x = \sum_0^\infty x^n/n!$
for all $x$.
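
The remainder estimate from Corollary 2.61 can be checked numerically for $f(x) = e^x$. The sketch below is only an illustration: the helper names `exp_taylor` and `remainder_bound` are invented here, and the bound uses the fact that the supremum of $e^t$ over $|t| \le |x|$ is $e^{|x|}$.

```python
import math


def exp_taylor(x: float, k: int) -> float:
    """Partial sum of the Taylor series of e**x through the term x**k / k!."""
    term, total = 1.0, 1.0
    for n in range(1, k + 1):
        term *= x / n  # turns x**(n-1)/(n-1)! into x**n/n!
        total += term
    return total


def remainder_bound(x: float, k: int) -> float:
    """Upper bound for |R_k(x)|: e**|x| * |x|**(k+1) / (k+1)!."""
    return math.exp(abs(x)) * abs(x) ** (k + 1) / math.factorial(k + 1)


if __name__ == "__main__":
    for k in (5, 10, 20):
        error = abs(math.exp(2.0) - exp_taylor(2.0, k))
        print(k, error, remainder_bound(2.0, k))
```

The actual error should never exceed the bound, and both shrink rapidly as $k$ grows because of the factorial in the denominator.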
Finally, we mention one other simple type of series that arises from time to
time. Just as $\int_a^b f(x)\,dx$ is easy to compute when $f$ is the derivative of a known
function, the series $\sum_0^\infty a_n$ is easy to sum when the terms $a_n$ are the differences of
a known sequence $\{b_n\}$. That is, suppose $a_0 = b_0$ and $a_n = b_n - b_{n-1}$ for $n \ge 1$;
then

$$s_k = a_0 + a_1 + \cdots + a_k = b_0 + (b_1 - b_0) + \cdots + (b_k - b_{k-1}) = b_k,$$

so the series $\sum_0^\infty a_n$ converges if and only if the sequence $\{b_n\}$ converges, in which
case $\sum_0^\infty a_n = \lim b_n$. Such series are called telescoping series.
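
A small exact-arithmetic sketch of the telescoping identity $s_k = b_k$ follows. The particular choice $b_n = n/(n+1)$ is an assumption made for illustration; it gives $a_n = 1/[n(n+1)]$ for $n \ge 1$, and the series then sums to $\lim b_n = 1$.

```python
from fractions import Fraction


def b(n: int) -> Fraction:
    """The known sequence b_n = n / (n + 1) (an illustrative choice)."""
    return Fraction(n, n + 1)


def a(n: int) -> Fraction:
    """Terms of the telescoping series: a_0 = b_0, a_n = b_n - b_{n-1}."""
    return b(0) if n == 0 else b(n) - b(n - 1)


def partial_sum(k: int) -> Fraction:
    """s_k = a_0 + a_1 + ... + a_k, computed by brute force."""
    return sum((a(n) for n in range(k + 1)), Fraction(0))


if __name__ == "__main__":
    for k in (1, 5, 50):
        print(k, partial_sum(k), b(k))  # the two fractions coincide
```

Because `Fraction` arithmetic is exact, the equality $s_k = b_k$ holds without rounding error.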

EXERCISES
1. Find the values of x for which each of the following series converges and com-
pute its sum.
a. $2(x+1) + 4(x+1)^4 + 8(x+1)^7 + \cdots + 2^{n+1}(x+1)^{3n+1} + \cdots$
b. $10x^{-2} + 20x^{-4} + 40x^{-6} + \cdots + 10 \cdot 2^n x^{-2(n+1)} + \cdots$
c. $1 + (1-x)/(1+x) + (1-x)^2/(1+x)^2 + \cdots + (1-x)^n/(1+x)^n + \cdots$
d. $\log x + (\log x)^2 + (\log x)^3 + \cdots + (\log x)^n + \cdots$
2. Tell whether each of the following series converges; if it does, find its sum.
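a. $1 + \frac{3}{4} + \frac{5}{8} + \frac{9}{16} + \frac{17}{32} + \cdots$
b. $\frac{1}{1\cdot 2} + \frac{1}{2\cdot 3} + \frac{1}{3\cdot 4} + \frac{1}{4\cdot 5} + \cdots$ (Hint: $[n(n+1)]^{-1} = n^{-1} - (n+1)^{-1}$.)
c. $(\sqrt{2} - 1) + (\sqrt{3} - \sqrt{2}) + (\sqrt{4} - \sqrt{3}) + \cdots$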
d. 1 − 12 + 1 − 31 + 1 − 14 + 1 − 51 + · · ·
3. Let $f(x) = \log(1 + x)$. Show that the Taylor remainder $R_{0,k}(x)$ (defined by
(2.54)) tends to zero as $k \to \infty$ for $-1 < x \le 1$, and conclude that

$$\log(1 + x) = \sum_1^\infty (-1)^{n+1} \frac{x^n}{n} \quad \text{for } -1 < x \le 1.$$

(Hint: Lagrange's formula for $R_{0,k}$ easily yields the desired result when $-\frac{1}{2} <
x \le 1$ but not when $-1 < x \le -\frac{1}{2}$. For $x < 0$, use the integral formula
(2.56) for $R_{0,k}$ and the mean value theorem for integrals to show that
$|R_{0,k}(x)| = |x|(x' - x)^k (x' + 1)^{-k-1}$ for some $x' \in (x, 0)$, and thence show
that $|R_{0,k}(x)| < |x|^{k+1}/(1 + x)$.)
4. Given a sequence $\{a_n\}$ of numbers, let $\prod_1^k a_n$ denote the product of the numbers
$a_1, \ldots, a_k$. The infinite product $\prod_1^\infty a_n$ is said to converge to the number
$P$ if the sequence of partial products converges to $P$:

$$\prod_1^\infty a_n = \lim_{k\to\infty} \prod_1^k a_n = \lim_{k\to\infty} a_1 a_2 \cdots a_k.$$

(Note: In many books one finds a more complicated definition that takes account
of the peculiar role of the number 0 with regard to multiplication.)
a. Show that if $\prod_1^\infty a_n$ converges to a nonzero number $P$, then $\lim_{n\to\infty} a_n =
1$. (This is the analogue of Theorem 6.1c for products.)
b. Show that if $\prod_1^\infty a_n$ converges to a nonzero number $P$, then $\sum_1^\infty \log a_n$
converges after omission of those terms for which $a_n < 0$. (By (a), there
can only be finitely many such terms, and no $a_n$ can be 0.) Conversely,
show that if $a_n > 0$ for all $n$ and $\sum_1^\infty \log a_n$ converges to $S$, then $\prod_1^\infty a_n$
converges to $e^S$. (See also Exercise 5 in §6.3.)

6.2 Series with Nonnegative Terms


In this section we begin the systematic study of the convergence of infinite series
by considering series with nonnegative terms. If $a_n \ge 0$ for all $n$, the partial sums
$s_k = a_0 + \cdots + a_k$ form an increasing sequence. By the monotone sequence
theorem, therefore, the series $\sum_0^\infty a_n$ converges if and only if the partial sums $s_k$
have a finite upper bound. This observation leads to a variety of comparison tests,
in which the partial sums $s_k$ are compared to more easily computable quantities
that can be shown to be bounded or unbounded.

The Integral Test. If $a_n = f(n)$ where $f$ is a function of a real variable, a
sum $\sum_{n=j}^k a_n$ can be compared to an integral $\int_j^k f(x)\,dx$. The virtue of this idea is
that although integration is a more sophisticated concept than summation, integrals
are often easier to compute than sums! The fundamental theorem, whose pictorial
meaning is indicated in Figure 6.1, is as follows:

FIGURE 6.1: Comparison of $\int_j^k f(x)\,dx$ (the area under the curve)
with $\sum_{n=j}^{k-1} f(n)$ and $\sum_{n=j+1}^k f(n)$ (its upper and lower Riemann
sums).

6.7 Theorem. Suppose $f$ is a positive, decreasing function on the half-line $[a, \infty)$.
Then for any integers $j$, $k$ with $a \le j < k$,

$$\sum_{n=j}^{k-1} f(n) \ge \int_j^k f(x)\,dx \ge \sum_{n=j+1}^k f(n).$$

Proof. Since $f$ is decreasing, for $n \le x \le n+1$ we have $f(n) \ge f(x) \ge f(n+1)$,
and hence

$$f(n) = \int_n^{n+1} f(n)\,dx \ge \int_n^{n+1} f(x)\,dx \ge \int_n^{n+1} f(n+1)\,dx = f(n+1).$$

Adding up these inequalities from $n = j$ to $n = k - 1$, we obtain the asserted
result.
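
The two inequalities of Theorem 6.7 can be observed numerically. The sketch below assumes the specific function $f(x) = x^{-2}$ (chosen here only because its integral $\int_j^k x^{-2}\,dx = 1/j - 1/k$ is available in closed form); the function names are illustrative.

```python
def f(x: float) -> float:
    """A positive, decreasing function on [1, infinity)."""
    return 1.0 / x**2


def integral(j: int, k: int) -> float:
    """Exact value of the integral of x**-2 from j to k."""
    return 1.0 / j - 1.0 / k


def upper_sum(j: int, k: int) -> float:
    """Sum of f(n) for n = j, ..., k - 1 (left endpoints)."""
    return sum(f(n) for n in range(j, k))


def lower_sum(j: int, k: int) -> float:
    """Sum of f(n) for n = j + 1, ..., k (right endpoints)."""
    return sum(f(n) for n in range(j + 1, k + 1))


if __name__ == "__main__":
    for j, k in [(1, 2), (1, 10), (3, 50)]:
        print(j, k, upper_sum(j, k), integral(j, k), lower_sum(j, k))
```

In each printed row the three numbers should appear in decreasing order, as the theorem asserts.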

An immediate corollary is the following test for convergence.

6.8 Corollary (The Integral Test). Suppose $f$ is a positive, decreasing function on
the half-line $[1, \infty)$. Then the series $\sum_1^\infty f(n)$ converges if and only if the improper
integral $\int_1^\infty f(x)\,dx$ converges.

Proof. Let $s_k = \sum_{n=1}^k f(n)$. If $\int_1^\infty f(x)\,dx < \infty$, we have

$$s_k = f(1) + \sum_2^k f(n) \le f(1) + \int_1^k f(x)\,dx \le f(1) + \int_1^\infty f(x)\,dx,$$

so the partial sums are bounded above and hence the series converges. On the other
hand, if $\int_1^\infty f(x)\,dx = \infty$, we have

$$s_k = \sum_1^{k-1} f(n) + f(k) \ge \int_1^k f(x)\,dx + f(k) \to \infty \text{ as } k \to \infty,$$

so the series diverges.

Of course, a similar result relates $\sum_J^\infty f(n)$ to $\int_J^\infty f(x)\,dx$, for any integer $J$.
We chose $J = 1$ because it is appropriate for the following important application.

6.9 Theorem. The series $\sum_1^\infty n^{-p}$ converges if $p > 1$ and diverges if $p \le 1$.

Proof. The same is true of the integrals $\int_1^\infty x^{-p}\,dx$, for

$$\int_1^\infty x^{-p}\,dx = \lim_{K\to\infty} \left. \frac{x^{1-p}}{1-p} \right|_1^K =
\begin{cases} (p-1)^{-1} & \text{if } p > 1, \\ \infty & \text{if } p < 1, \end{cases}$$

and $\int_1^\infty x^{-1}\,dx = \lim_{K\to\infty} \log x \big|_1^K = \infty$.

Theorem 6.7 does more than provide a test for convergence; it also provides an
approximation to the partial sums and the full sum of the series. In the convergent
case, this can be used to provide a numerical approximation to the sum $\sum_1^\infty f(n)$
or an estimate of how many terms must be used for a partial sum to provide a good
approximation; in the divergent case, it can be used to estimate how rapidly the
partial sums grow.
Suppose, for example, that $f$ is positive and decreasing, and that $\int_1^\infty f(x)\,dx <
\infty$. By letting $k \to \infty$ in Theorem 6.7, we obtain

$$\sum_1^\infty f(n) \ge \int_1^\infty f(x)\,dx \ge \sum_2^\infty f(n),$$

and hence

$$\int_1^\infty f(x)\,dx \le \sum_1^\infty f(n) \le f(1) + \int_1^\infty f(x)\,dx.$$

This gives an approximation to the sum $\sum_1^\infty f(n)$ with an error of at most $f(1)$.
A better approximation can be obtained by using this estimate not for the whole
series but for its tail end:

$$\int_k^\infty f(x)\,dx \le \sum_k^\infty f(n) \le f(k) + \int_k^\infty f(x)\,dx.$$

Adding on the first $k - 1$ terms of the series, we see that

$$\sum_1^\infty f(n) \approx \sum_1^{k-1} f(n) + \int_k^\infty f(x)\,dx, \quad \text{with an error of at most } f(k). \tag{6.10}$$

The error $f(k)$ will be as small as we please provided $k$ is sufficiently large.

EXAMPLE 1. To evaluate $\sum_1^\infty n^{-4}$ with an error of at most 0.0001, we take
$k = 10$ in (6.10) to get

$$\sum_1^\infty n^{-4} \approx 1^{-4} + 2^{-4} + \cdots + 9^{-4} + \int_{10}^\infty x^{-4}\,dx
= 1^{-4} + 2^{-4} + \cdots + 9^{-4} + \tfrac{1}{3} 10^{-3}.$$

A bit of work with a pocket calculator yields the value of this last sum as
$1.08226\ldots$, so we can conclude that $1.08226 < \sum_1^\infty n^{-4} < 1.08236$. (The
exact value of $\sum_1^\infty n^{-4}$ is $\pi^4/90 = 1.0823232\ldots$; see Exercise 3 in §8.3 or
Exercise 9a in §8.6.)
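
The pocket-calculator computation in Example 1 can be reproduced with a few lines of Python. This is a minimal sketch of estimate (6.10) for $f(x) = x^{-4}$, for which $\int_k^\infty x^{-4}\,dx = 1/(3k^3)$; the function name `tail_estimate` is an invention of this sketch.

```python
import math


def tail_estimate(k: int) -> float:
    """Estimate (6.10) for the sum of n**-4: the first k - 1 terms plus the tail integral."""
    head = sum(n**-4 for n in range(1, k))   # 1**-4 + ... + (k-1)**-4
    tail_integral = 1.0 / (3 * k**3)         # integral of x**-4 from k to infinity
    return head + tail_integral


if __name__ == "__main__":
    estimate = tail_estimate(10)
    exact = math.pi**4 / 90
    # The true sum lies between the estimate and the estimate plus f(10) = 10**-4.
    print(estimate, exact, estimate + 10**-4)
```

Taking a larger $k$ shrinks the error bound $f(k) = k^{-4}$ further, at the cost of adding more terms by hand.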

General Comparison Tests. One can often decide whether a series of nonnegative
terms converges by comparing it to a series whose convergence or divergence
is known. The general method is as follows.

6.11 Theorem. Suppose $0 \le a_n \le b_n$ for $n \ge 0$. If $\sum_0^\infty b_n$ converges, then so
does $\sum_0^\infty a_n$. If $\sum_0^\infty a_n$ diverges, then so does $\sum_0^\infty b_n$.

Proof. Let $s_k = \sum_0^k a_n$ and $t_k = \sum_0^k b_n$; thus $0 \le s_k \le t_k$ for all $k$. If $\sum_0^\infty b_n$
converges, the numbers $t_k$ form a bounded set; hence so do the numbers $s_k$, so the
sequence $\{s_k\}$ converges by the monotone sequence theorem. This proves the first
assertion, to which the second one is logically equivalent.

A couple of remarks are in order concerning this result. First, the convergence
or divergence of a series is unaffected if finitely many terms are deleted from or
added to the series. Hence, the comparison an ≤ bn only has to be valid for all
n ≥ N , where N is some (possibly large) positive integer. Second, the convergence
or divergence of a series is unaffected if all the terms of the series are multiplied by
a nonzero constant. Hence, the comparison an ≤ bn can be replaced by an ≤ cbn ,
where c is any positive number.
When $a_n$ is an algebraic function of $n$ (obtained from $n$ by applying various
combinations of the arithmetic operations together with the operation of raising to
a power, $x \to x^a$), one can usually decide the convergence of $\sum a_n$ by comparing
it to one of the series $\sum_1^\infty n^{-p}$, discussed in Theorem 6.9. The rule of thumb,
obtained by combining Theorems 6.9 and 6.11, is that if $a_n \ge cn^{-1}$ then $\sum a_n$
diverges, whereas if $a_n \le cn^{-p}$ for some $p > 1$ then $\sum a_n$ converges.

EXAMPLE 2. The series $\sum_1^\infty (2n-1)^{-1} = 1 + \frac{1}{3} + \frac{1}{5} + \cdots$ diverges by
comparison to $\sum_1^\infty n^{-1}$, for

$$\frac{1}{2n-1} > \frac{1}{2n} = \frac{1}{2} \cdot \frac{1}{n}.$$

EXAMPLE 3. The series $\sum_1^\infty (n^2 - 6n + 10)^{-1}$ converges by comparison to
$\sum_1^\infty n^{-2}$, but here the comparison takes more work to establish. Since $6n > 10$
except for $n = 1$, it is not true that $(n^2 - 6n + 10)^{-1} \le n^{-2}$. However, we can
observe that when $n > 12$ we have $6n < \frac{1}{2}n^2$, and hence

$$\frac{1}{n^2 - 6n + 10} < \frac{1}{(n^2/2) + 10} < \frac{1}{(n^2/2)} = \frac{2}{n^2} \qquad (n > 12),$$

which gives the desired comparison. However, there is also a simpler way to
proceed. The key observation is that when $n$ is large, $-6n + 10$ is negligibly
small in comparison with $n^2$, so $(n^2 - 6n + 10)^{-1}$ is practically equal to $n^{-2}$.
More precisely,

$$\frac{(n^2 - 6n + 10)^{-1}}{n^{-2}} = \frac{n^2}{n^2 - 6n + 10} = \frac{1}{1 - 6n^{-1} + 10n^{-2}} \to 1 \text{ as } n \to \infty,$$

which immediately gives the comparison $(n^2 - 6n + 10)^{-1} < 2n^{-2}$ when $n$ is
large.
The second method for solving Example 3 can be formulated quite generally;
the result is often called the limit comparison test:
6.12 Theorem. Suppose $\{a_n\}$ and $\{b_n\}$ are sequences of positive numbers and
that $a_n/b_n$ approaches a positive, finite limit as $n \to \infty$. Then the series $\sum_0^\infty a_n$
and $\sum_0^\infty b_n$ are either both convergent or both divergent.

Proof. If $a_n/b_n \to l$ as $n \to \infty$, where $0 < l < \infty$, we have $\frac{1}{2}l < a_n/b_n < 2l$
when $n$ is large; that is, $a_n < 2lb_n$ and $b_n < (2/l)a_n$. The result therefore follows
from Theorem 6.11 and the remarks following it.

Theorem 6.12 can be extended a little. If $a_n/b_n \to 0$ as $n \to \infty$, then $a_n <
b_n$ for large $n$, so the convergence of $\sum b_n$ will imply the convergence of $\sum a_n$.
Likewise, if $a_n/b_n \to \infty$, then $a_n > b_n$ for large $n$, so the convergence of $\sum a_n$
will imply the convergence of $\sum b_n$. However, the reverse implications are not
valid in these cases.

Comparisons to the Geometric Series. There are a couple of very useful
convergence tests that are based on a comparison to the geometric series $\sum_0^\infty r^n$,
where $r > 0$. We recall that this series converges for $r < 1$ and diverges for $r \ge 1$.

6.13 Theorem (The Ratio Test). Suppose $\{a_n\}$ is a sequence of positive numbers.
a. If $a_{n+1}/a_n < r$ for all sufficiently large $n$, where $r < 1$, then the series $\sum_0^\infty a_n$
converges. On the other hand, if $a_{n+1}/a_n \ge 1$ for all sufficiently large $n$, then
the series $\sum_0^\infty a_n$ diverges.
b. Suppose that $l = \lim_{n\to\infty} a_{n+1}/a_n$ exists. Then the series $\sum_0^\infty a_n$ converges
if $l < 1$ and diverges if $l > 1$. No conclusion can be drawn if $l = 1$.

Proof. Suppose $a_{n+1}/a_n < r < 1$ for all $n \ge N$. Then

$$a_{N+1} < ra_N, \quad a_{N+2} < ra_{N+1} < r^2 a_N, \quad a_{N+3} < ra_{N+2} < r^3 a_N, \quad \ldots,$$

so $a_{N+m} < r^m a_N$ for all $m > 0$. The series $\sum_0^\infty a_n$ therefore converges by
comparison to the geometric series $\sum r^m$:

$$\sum_0^\infty a_n < a_0 + \cdots + a_{N-1} + a_N(1 + r + r^2 + \cdots) < \infty.$$

On the other hand, if $a_{n+1}/a_n \ge 1$ then $a_{n+1} \ge a_n$; if this is so for all $n \ge N$,
then $a_n \not\to 0$, so $\sum a_n$ cannot converge. This proves (a).
Assertion (b) is a corollary of (a). If $l < 1$, choose $r$ with $l < r < 1$. If
$\lim a_{n+1}/a_n = l$, then $a_{n+1}/a_n < r$ for large $n$, so $\sum a_n$ converges. If $l > 1$,
then $a_{n+1}/a_n \ge 1$ for large $n$, so $\sum a_n$ diverges. Finally, if we take $a_n = n^{-p}$,
we know that $\sum_1^\infty a_n$ converges if $p > 1$ and diverges if $p \le 1$; but $a_{n+1}/a_n =
[n/(n+1)]^p \to 1$ no matter what $p$ is. Hence the test is inconclusive if $l = 1$.

6.14 Theorem (The Root Test). Suppose $\{a_n\}$ is a sequence of positive numbers.
a. If $a_n^{1/n} < r$ for all sufficiently large $n$, where $r < 1$, then the series $\sum_0^\infty a_n$
converges. On the other hand, if $a_n^{1/n} \ge 1$ for all sufficiently large $n$, then the
series $\sum_0^\infty a_n$ diverges.
b. Suppose that $l = \lim_{n\to\infty} a_n^{1/n}$ exists. Then the series $\sum_0^\infty a_n$ converges if
$l < 1$ and diverges if $l > 1$. No conclusion can be drawn if $l = 1$.

Proof. If $a_n^{1/n} < r$, we have $a_n < r^n$, so we have an immediate comparison to the
geometric series $\sum r^n$ that gives the convergence of $\sum a_n$ when $r < 1$. If $a_n^{1/n} \ge 1$
then $a_n \ge 1$, so $a_n \not\to 0$ and $\sum a_n$ diverges. This proves (a).
Part (b) follows as in the proof of the ratio test. If $a_n^{1/n} \to l < 1$, we pick
$r \in (l, 1)$ and obtain $a_n^{1/n} < r$ for large $n$, so $\sum a_n$ converges. If $a_n^{1/n} \to l > 1$,
then $a_n^{1/n} \ge 1$ for large $n$, and $\sum a_n$ diverges. Finally, for $a_n = n^{-p}$ we have
$a_n^{1/n} = n^{-p/n} \to 1$ for any $p$, so the test is inconclusive when $l = 1$.

Note: In the last line of this proof, and in Example 4 below, we use the fact
that $\lim_{x\to\infty} x^{1/x} = 1$. To see this, observe that $\log(x^{1/x}) = (\log x)/x$, and
$\lim_{x\to\infty} (\log x)/x = 0$ by l'Hôpital's rule.
It can be shown that if $a_{n+1}/a_n$ converges to a limit $l$, then $a_n^{1/n}$ also converges
to the same limit; but the convergence of $a_n^{1/n}$ does not imply the convergence of
$a_{n+1}/a_n$. (See Example 6.) Thus the root test is, in theory, more powerful than
the ratio test. However, the ratio test is often more convenient to use in practice,
especially for series whose terms involve factorials or similar sorts of products.

EXAMPLE 4. Let $a_n = n^2/2^n$. The ratio test and the root test can both be used
to establish the convergence of $\sum_0^\infty a_n$:

$$\frac{a_{n+1}}{a_n} = \frac{(n+1)^2/2^{n+1}}{n^2/2^n} = \frac{1}{2}\left(\frac{n+1}{n}\right)^2 \to \frac{1}{2}, \qquad a_n^{1/n} = \frac{n^{2/n}}{2} \to \frac{1}{2}.$$

EXAMPLE 5. Let $a_n = \dfrac{1 \cdot 4 \cdot 7 \cdots (3n+1)}{2^n n!}$. Here the root test is cumbersome,
but the ratio test works easily:

$$\frac{a_{n+1}}{a_n} = \frac{1 \cdot 4 \cdots (3n+1)(3n+4)/2^{n+1}(n+1)!}{1 \cdot 4 \cdots (3n+1)/2^n n!} = \frac{3n+4}{2(n+1)} \to \frac{3}{2},$$

so $\sum_0^\infty a_n$ diverges.

EXAMPLE 6. Let $a_n = 2^{-n/2}$ if $n$ is even and $a_n = 2^{-(n-1)/2}$ if $n$ is odd; thus

$$\sum_0^\infty a_n = 1 + 1 + \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{4} + \tfrac{1}{8} + \tfrac{1}{8} + \cdots.$$

Here $a_{n+1}/a_n$ equals 1 if $n$ is even and $\frac{1}{2}$ if $n$ is odd, so the ratio test (even
the more general form in part (a) of Theorem 6.13) fails; its hypotheses are not
satisfied. But the root test works: $a_n^{1/n}$ equals $2^{-1/2}$ if $n$ is even and $2^{-(n-1)/2n}$
if $n$ is odd; both of these expressions converge to $2^{-1/2}$ as $n \to \infty$, so the series
converges. (Of course, this can also be proved more simply. By grouping the
terms in pairs, one sees that $\sum_0^\infty a_n = 2\sum_0^\infty 2^{-m} = 4$.)
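
The contrast between the two tests in Example 6 is easy to see numerically. The following sketch is illustrative only; it encodes both cases of the definition of $a_n$ as $2^{-\lfloor n/2 \rfloor}$, and the helper names are not from the text.

```python
def a(n: int) -> float:
    """Terms of Example 6: 2**(-n/2) for even n, 2**(-(n-1)/2) for odd n."""
    return 2.0 ** (-(n // 2))


def ratio(n: int) -> float:
    """Ratio of successive terms, a_{n+1} / a_n."""
    return a(n + 1) / a(n)


def root(n: int) -> float:
    """n-th root of a_n, for n >= 1."""
    return a(n) ** (1.0 / n)


if __name__ == "__main__":
    print([ratio(n) for n in range(8)])        # alternates between 1 and 1/2
    print([root(n) for n in (10, 100, 1000)])  # approaches 2**-0.5
    print(sum(a(n) for n in range(200)))       # partial sum close to 4
```

The ratios never settle down, whereas the roots cluster near $2^{-1/2} \approx 0.707 < 1$, which is what the root test needs.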

Raabe's Test. The ratio test and the root test are, in a sense, rather crude,
for the indecisive cases where $\lim a_{n+1}/a_n = 1$ or $\lim a_n^{1/n} = 1$ include many
commonly encountered series such as $\sum_1^\infty n^{-p}$. The reason for this insensitivity is
that the terms of the geometric series $\sum r^n$ either converge to zero exponentially
fast (if $r < 1$) or not at all (if $r \ge 1$), so they do not furnish a useful comparison for
quantities such as $n^{-p}$ that tend to zero only polynomially fast. However, there is
another test, Raabe's test, that is sometimes useful in the case where $\lim a_{n+1}/a_n =
1$. The class of problems for which Raabe's test is effective is rather limited, and
there is another way of attacking the most important of them that we shall present in
§7.6. Hence we view Raabe's test as an optional topic; however, the insight behind
it is of interest in its own right.
The idea is to use the ratios $a_{n+1}/a_n$ to compare the series $\sum a_n$ to one of the
series $\sum n^{-p}$ rather than to the geometric series. For the series $\sum n^{-p}$, the ratio
of two successive terms is $(n+1)^{-p}/n^{-p} = [1 + (1/n)]^{-p}$. To put this quantity
in a form more amenable to comparison, we use the tangent line approximation to
the function $f(x) = (1+x)^{-p}$ at $x = 0$. Since $f'(x) = -p(1+x)^{-p-1}$ and
$f''(x) = p(p+1)(1+x)^{-p-2}$, Lagrange's formula for the error term gives

$$(1+x)^{-p} = 1 - px + E(x), \qquad 0 < E(x) < \frac{p(p+1)}{2}x^2 \ \text{ for } x > 0.$$

Hence,

$$\frac{(n+1)^{-p}}{n^{-p}} = \left(1 + \frac{1}{n}\right)^{-p} = 1 - \frac{p}{n} + E_n, \qquad 0 < E_n < \frac{p(p+1)}{2n^2}. \tag{6.15}$$

Thus, $n[1 - (n+1)^{-p}/n^{-p}]$ is approximately $p$ when $n$ is large. With this in mind,
we are ready for the main result.

6.16 Theorem (Raabe's Test). Let $\{a_n\}$ be a sequence of positive numbers. Suppose that

$$\frac{a_{n+1}}{a_n} \to 1 \quad \text{and} \quad n\left(1 - \frac{a_{n+1}}{a_n}\right) \to L \quad \text{as } n \to \infty.$$

If $L > 1$, the series $\sum a_n$ converges, and if $L < 1$, the series $\sum a_n$ diverges. (If
$L = 1$, no conclusion can be drawn.)

Proof. If $L > 1$, choose a number $p$ with $1 < p < L$. Then, when $n$ is large, we
have $n[1 - (a_{n+1}/a_n)] > p$, that is, $a_{n+1}/a_n < 1 - (p/n)$. Thus, by (6.15),

$$\frac{a_{n+1}}{a_n} < 1 - \frac{p}{n} < \frac{(n+1)^{-p}}{n^{-p}}, \quad \text{or} \quad \frac{a_{n+1}}{(n+1)^{-p}} < \frac{a_n}{n^{-p}}.$$

Thus the sequence $\{a_n/n^{-p}\}$ is decreasing, so it is bounded above by a constant
$C$. In other words, $a_n \le Cn^{-p}$, so since $p > 1$, $\sum a_n$ converges by comparison to
$\sum n^{-p}$.
On the other hand, if $L < 1$, choose numbers $p$ and $q$ with $L < q < p < 1$.
Then, when $n$ is large, we have $n[1 - (a_{n+1}/a_n)] < q$, that is, $(a_{n+1}/a_n) >
1 - (q/n)$. If also $n > p(p+1)/2(p-q)$, we have $p(p+1)/2n^2 < (p-q)/n$, so
by (6.15),

$$\frac{a_{n+1}}{a_n} > 1 - \frac{q}{n} = 1 - \frac{p}{n} + \frac{p-q}{n} > 1 - \frac{p}{n} + E_n = \frac{(n+1)^{-p}}{n^{-p}}.$$

Thus $(n+1)^{-p}/a_{n+1} < n^{-p}/a_n$, so the sequence $\{n^{-p}/a_n\}$ is decreasing. As
before, this gives $n^{-p} \le Ca_n$, and $p < 1$ in this case, so $\sum a_n$ diverges by
comparison to $\sum n^{-p}$.

The main applications of Raabe's test are to series whose terms involve quotients
of factorial-like products. The following example is typical.
EXAMPLE 7. Let $a_n = \dfrac{1 \cdot 4 \cdot 7 \cdots (3n+1)}{n^2 3^n n!}$. We have

$$\frac{a_{n+1}}{a_n} = \frac{1 \cdot 4 \cdots (3n+1)(3n+4)/(n+1)^2 3^{n+1}(n+1)!}{1 \cdot 4 \cdots (3n+1)/n^2 3^n n!} = \frac{(3n+4)n^2}{3(n+1)^3}.$$

This tends to 1 as $n \to \infty$ (the dominant term on both top and bottom is $3n^3$),
so the ratio test fails. But

$$n\left(1 - \frac{a_{n+1}}{a_n}\right) = n\left(1 - \frac{(3n+4)n^2}{3(n+1)^3}\right) = \frac{5n^3 + 9n^2 + 3n}{3(n+1)^3} \to \frac{5}{3},$$

and $\frac{5}{3} > 1$, so the series $\sum a_n$ converges.
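
The algebra in Example 7 can be double-checked with exact rational arithmetic. In this sketch the function names (`a`, `ratio`, `raabe`) are illustrative assumptions; `a(n)` builds the term directly from its definition for $n \ge 1$, and `ratio(n)` uses the simplified formula derived above so that large $n$ can be handled cheaply.

```python
from fractions import Fraction
from math import factorial


def a(n: int) -> Fraction:
    """a_n = 1*4*7*...*(3n+1) / (n**2 * 3**n * n!), for n >= 1."""
    numerator = 1
    for j in range(n + 1):
        numerator *= 3 * j + 1
    return Fraction(numerator, n**2 * 3**n * factorial(n))


def ratio(n: int) -> Fraction:
    """Simplified ratio a_{n+1}/a_n = (3n+4) n**2 / (3 (n+1)**3)."""
    return Fraction((3 * n + 4) * n**2, 3 * (n + 1) ** 3)


def raabe(n: int) -> Fraction:
    """The quantity n * (1 - a_{n+1}/a_n) appearing in Raabe's test."""
    return n * (1 - ratio(n))


if __name__ == "__main__":
    print(a(6) / a(5) == ratio(5))           # the simplification is exact
    for n in (10, 1000, 10**6):
        print(n, float(ratio(n)), float(raabe(n)))
```

The ratios creep up toward 1 while the Raabe quantity approaches $5/3$, consistent with the conclusion that the series converges.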

Concluding Remarks. Faced with an infinite series $\sum a_n$, how does one decide
how to test it for convergence? Some series require more cleverness than
others, but the following rules of thumb may be helpful.

• Does $a_n \to 0$ as $n \to \infty$? If not, $\sum a_n$ diverges.

• If $a_n$ is an algebraic function of $n$ (say, a rational function of $n$, or a similar
expression involving fractional powers of $n$), try comparison with $\sum n^{-p}$ for
a suitable value of $p$.

• If $a_n$ involves expressions with $n$ in the exponent, try the ratio test or the root
test.

• If an involves factorial-like products, the ratio test is the best bet. If the ratio
test fails because lim an+1 /an = 1, try Raabe’s test.

• The integral test may be useful when numerical estimates are desired or when
the series is near the borderline between convergence and divergence.
In any case, one should beware of confusing the various sequences that arise in
the study of infinite series. For any infinite series $\sum a_n$, one has the sequence $\{a_n\}$
of terms and the sequence {sk } of partial sums. In the ratio test, one considers the
sequence {an+1 /an } of ratios of successive terms of a series, whereas in the limit
comparison test, one considers the sequence {an /bn } of ratios of corresponding
terms of two different series. Don’t mix these sequences up!

EXERCISES
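In Exercises 1–18, test the series for convergence.
1. $\sum_0^\infty \dfrac{\sqrt{n+1}}{n^2 - 4n + 5}$.
2. $\sum_1^\infty ne^{-n}$.
3. $\sum_1^\infty \dfrac{2n^2 - n}{2n^{8/3} + n}$.
4. $\sum_1^\infty \dfrac{n+1}{n!}$.
5. $\sum_0^\infty \dfrac{(2n+1)^{3n}}{(3n+1)^{2n}}$.
6. $\sum_0^\infty \dfrac{1^2 \cdot 3^2 \cdots (2n+1)^2}{3^n (2n)!}$.
7. $\sum_1^\infty \dfrac{n!}{10^n}$.
8. $\sum_2^\infty (\log n)^{-100}$.
9. $\sum_0^\infty \dfrac{1 \cdot 3 \cdots (2n+1)}{2 \cdot 5 \cdots (3n+2)}$.
10. $\sum_0^\infty \dfrac{(n!)^2}{(2n)!}$.
11. $\sum_0^\infty \dfrac{3^n n!}{n^n}$.
12. $\sum_1^\infty \left(\dfrac{n}{n+1}\right)^{n^2}$.
13. $\sum_1^\infty [1 - \cos(1/n)]$.
14. $\sum_1^\infty \dfrac{\sqrt{n+1} - \sqrt{n}}{\sqrt{n+2}}$.
15. $\sum_1^\infty \sin \dfrac{n}{n^2 + 3}$.
16. $\sum_1^\infty \dfrac{n^2 [\pi + (-1)^n]^n}{5^n}$.
17. $\sum_1^\infty \dfrac{1 \cdot 3 \cdots (2n-1)}{4 \cdot 6 \cdots (2n+2)}$.
18. $\sum_1^\infty \dfrac{2 \cdot 4 \cdots (2n)}{3 \cdot 5 \cdots (2n+1)}$.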
19. Suppose $a_n > 0$. Show that if $\sum a_n$ converges, then so does $\sum a_n^p$ for any
$p > 1$.
20. Show that $\sum_2^\infty \dfrac{1}{n(\log n)^p}$ converges if $p > 1$ and diverges if $p \le 1$.
21. For which $p$ does $\sum_4^\infty \dfrac{1}{n(\log n)(\log\log n)^p}$ converge?
22. By Exercise 20, $\sum_2^\infty 1/[n \log n]$ diverges while $\sum_2^\infty 1/[n(\log n)^2]$ converges.
Use Theorem 6.7 to show that

$$4.88 < \sum_2^{10^{40}} \frac{1}{n \log n} < 5.61, \qquad \sum_{10^{40}}^\infty \frac{1}{n(\log n)^2} \approx 0.011.$$

The point is that for series such as these that are near the borderline between
convergence and divergence, attempts at numerical approximation by adding
up the first few terms aren't much use. If you add up the first $10^{40}$ terms of the
first series, you get no clue that the series diverges; and if you add up the first
$10^{40}$ terms of the second one, the answer you get still differs from the full sum
in the second decimal place. (By way of comparison, the universe is around
$10^{18}$ seconds old, and the earth contains around $10^{50}$ atoms.)
23. Verify that $x/(x^2+1)^2$ is decreasing for $x \ge 3^{-1/2}$, and thence show that
$0.38 < \sum_1^\infty n/(n^2+1)^2 < 0.41$.
24. Let $c_k = 1 + \frac{1}{2} + \cdots + \frac{1}{k} - \log k$. Show that the sequence $\{c_k\}$ is positive
and decreasing, and hence convergent. ($\lim_{k\to\infty} c_k$ is conventionally denoted
by $\gamma$ and is called Euler's constant or the Euler-Mascheroni constant. It is
approximately equal to 0.57721; it is conjectured to be transcendental, but at
present no one knows whether it is even irrational.)
25. Suppose $a_n > 0$ for all $n > 0$, and let $L = \limsup a_n^{1/n}$ (see Exercises 9–12
in §1.5). Show that $\sum_1^\infty a_n$ converges if $L < 1$ and diverges if $L > 1$.

6.3 Absolute and Conditional Convergence


We now consider the question of convergence of series whose terms may be either
positive or negative. To a certain extent, this question may be reduced to the study
of series with nonnegative terms, via the notion of absolute convergence.
A series $\sum_0^\infty a_n$ is called absolutely convergent if the series $\sum_0^\infty |a_n|$ converges.
For series with nonnegative terms, absolute convergence is the same thing
as convergence. For more general series, the basic result is as follows.

6.17 Theorem. Every absolutely convergent series is convergent.

Proof. Suppose $\sum_0^\infty |a_n|$ converges. Let $s_k = \sum_0^k a_n$ and $S_k = \sum_0^k |a_n|$. The
sequence $\{S_k\}$ is convergent and hence Cauchy, so given $\epsilon > 0$, there exists an
integer $K$ such that

$$|a_{j+1}| + \cdots + |a_k| = S_k - S_j < \epsilon \quad \text{whenever } k > j \ge K.$$

But then

$$|s_k - s_j| = |a_{j+1} + \cdots + a_k| \le |a_{j+1}| + \cdots + |a_k| < \epsilon \quad \text{whenever } k > j \ge K,$$

so the sequence $\{s_k\}$ is also Cauchy. By Theorem 1.20, the sequence $\{s_k\}$, and
hence the series $\sum a_n$, is convergent.

Important Remark. We can consider series whose terms are complex numbers
or n-dimensional vectors instead of real numbers. The definition of absolute con-
vergence is the same, with |an | denoting the norm of the vector an . Theorem 6.17
remains valid in this more general setting, with exactly the same proof.
The converse of Theorem 6.17 is false; a series that is not absolutely convergent
may still converge because of cancellation between the positive and negative terms.
A series that converges but does not converge absolutely is said to be conditionally
convergent.
EXAMPLE 1. Let $a_n = 1/(n+1)$ if $n$ is even, $a_n = -1/n$ if $n$ is odd; thus,

$$\sum_0^\infty a_n = 1 - 1 + \tfrac{1}{3} - \tfrac{1}{3} + \tfrac{1}{5} - \tfrac{1}{5} + \cdots.$$

Clearly $s_k = 0$ if $k$ is odd and $s_k = 1/(k+1)$ if $k$ is even, so the series
converges to the sum 0. However,

$$\sum_0^\infty |a_n| = 1 + 1 + \tfrac{1}{3} + \tfrac{1}{3} + \cdots = 2\sum_0^\infty \frac{1}{2n+1},$$

which diverges by comparison to $\sum n^{-1}$.
EXAMPLE 2. Here is a more interesting example. The series

$$\sum_1^\infty \frac{(-1)^{n-1}}{n} = 1 - \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{4} + \cdots$$

is not absolutely convergent since $\sum_1^\infty n^{-1}$ diverges. However, it is the Taylor
series for $f(x) = \log(1+x)$ at $x = 1$. Indeed, for $n > 0$ we have $f^{(n)}(x) =
(-1)^{n-1}(n-1)!(1+x)^{-n}$, so Taylor's formula gives

$$\log(1+x) = \sum_1^k \frac{(-1)^{n-1}(n-1)!}{n!}x^n + R_k(x)
= x - \frac{x^2}{2} + \frac{x^3}{3} + \cdots + \frac{(-1)^{k-1}x^k}{k} + R_k(x),$$

and by Corollary 2.61,

$$|R_k(1)| \le \frac{1}{(k+1)!} \sup_{0 \le t \le 1} \left| \frac{(-1)^k k!}{(1+t)^{k+1}} \right| = \frac{1}{k+1},$$

which tends to zero as $k \to \infty$. It follows that $\sum_1^\infty (-1)^{n-1}/n$ converges to
$\log 2$.
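
The bound $|R_k(1)| \le 1/(k+1)$ from Example 2 can be observed directly. The following minimal sketch (with the illustrative name `alt_harmonic_partial`) sums the first $k$ terms and compares the error with the bound; note how slowly the partial sums approach $\log 2$.

```python
import math


def alt_harmonic_partial(k: int) -> float:
    """Partial sum 1 - 1/2 + 1/3 - ... + (-1)**(k-1) / k."""
    return sum((-1) ** (n - 1) / n for n in range(1, k + 1))


if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        error = abs(alt_harmonic_partial(k) - math.log(2))
        print(k, error, 1 / (k + 1))  # the error stays below 1/(k+1)
```

Gaining one more correct decimal digit requires roughly ten times as many terms, which is typical of conditionally convergent series.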

It is to be emphasized that conditionally convergent series converge only because
of cancellation between positive and negative terms. More precisely, let

$$a_n^+ = \max(a_n, 0), \qquad a_n^- = \max(-a_n, 0).$$

That is, $a_n^+ = a_n$ if $a_n$ is positive and $a_n^+ = 0$ otherwise, and $a_n^- = |a_n|$ if $a_n$
is negative and $a_n^- = 0$ otherwise; the nonzero $a_n^+$'s are the positive terms of the
series $\sum a_n$, and the nonzero $a_n^-$'s are the absolute values of the negative terms.
Observe that

$$a_n^+ - a_n^- = a_n, \qquad a_n^+ + a_n^- = |a_n|.$$

6.18 Theorem. If $\sum a_n$ is absolutely convergent, the series $\sum a_n^+$ and $\sum a_n^-$ are
both convergent. If $\sum a_n$ is conditionally convergent, the series $\sum a_n^+$ and $\sum a_n^-$
are both divergent.

Proof. This theorem follows from the following three facts:

i. The convergence of $\sum |a_n|$ implies the convergence of $\sum a_n^+$ and $\sum a_n^-$.
ii. The divergence of $\sum |a_n|$ implies the divergence of at least one of $\sum a_n^+$ and
$\sum a_n^-$.
iii. If $\sum a_n$ converges, it cannot happen that one of $\sum a_n^+$ and $\sum a_n^-$ converges
while the other one diverges.
The first of these is clear since $0 \le a_n^+ \le |a_n|$ and $0 \le a_n^- \le |a_n|$, and the second
is clear since $|a_n| = a_n^+ + a_n^-$. As for the third, let $s_k$ and $s_k^\pm$ denote the $k$th partial
sums of the series $\sum a_n$ and $\sum a_n^\pm$; thus $s_k = s_k^+ - s_k^-$. Suppose, to be definite,
that $\sum a_n^+ = \infty$ while $\sum a_n^- = S < \infty$; then for any $C > 0$ (no matter how
large), for sufficiently large $k$ we will have $s_k^+ > C + S$, while $s_k^- \le S$, so that
$s_k > C + S - S = C$. It follows that $s_k \to +\infty$, so $\sum a_n$ diverges.

Absolutely convergent series are much more pleasant to deal with than conditionally
convergent ones. For one thing, they converge more rapidly; the partial
sums $s_k$ of conditionally convergent series tend to provide poor approximations to
the full sum unless one takes $k$ very large because the divergence of $\sum |a_n|$ implies
that $a_n$ cannot tend to zero very rapidly as $n \to \infty$. For another thing, the sum
of an absolutely convergent series cannot be affected by rearranging the terms, but
this is not the case for conditionally convergent series!
Let us explain this mysterious statement in more detail. The terms of a series
$\sum_0^\infty a_n$ are presented in a definite order: $a_0, a_1, a_2, \ldots$. We might think of forming
a new series by writing down these terms in a different order, such as

$$a_0, a_2, a_1, a_4, a_6, a_3, a_8, a_{10}, a_5, \ldots,$$

where we take the first two even-numbered terms, the first odd-numbered term,
the next two even-numbered terms, the next odd-numbered term, and so forth. In
general, if $\sigma$ is any one-to-one mapping from the set of nonnegative integers onto
itself, we can form the series $\sum_0^\infty a_{\sigma(n)}$, which we call a rearrangement of $\sum_0^\infty a_n$.
(The reasons why we would want to do this are perhaps not so clear right now, but
we will encounter situations in §6.5 where this issue must be addressed.) The sharp
contrast between absolutely and conditionally convergent series with respect to
rearrangements is explained in the following two theorems.

6.19 Theorem. If $\sum_0^\infty a_n$ is absolutely convergent with sum $S$, then every
rearrangement $\sum_0^\infty a_{\sigma(n)}$ is also absolutely convergent with sum $S$.

Proof. First suppose $a_n \ge 0$ for all $n$. Every term of the rearranged series
$\sum a_{\sigma(n)}$ is among the terms of the original series $\sum a_n$, and hence the partial
sums of the rearranged series cannot exceed $S$. It follows that the full sum $S'$ of the
rearranged series satisfies $S' \le S$. The same reasoning shows that $S \le S'$, so $S' = S$.

Now we do the general case. If $\sum |a_n| < \infty$, we have $\sum |a_{\sigma(n)}| < \infty$ by what
we have just proved. Hence, given $\epsilon > 0$, for $k$ sufficiently large we have

\[ \sum_{k+1}^\infty |a_n| < \epsilon \quad\text{and}\quad \sum_{k+1}^\infty |a_{\sigma(n)}| < \epsilon. \]

Given such a $k$, let $K$ be the largest of the numbers $\sigma(0), \ldots, \sigma(k)$, so that

\[ \{\sigma(0), \sigma(1), \ldots, \sigma(k)\} \subset \{0, 1, \ldots, K\}. \]

The elements of $\{0, 1, \ldots, K\} \setminus \{\sigma(0), \sigma(1), \ldots, \sigma(k)\}$ are among the
$\sigma(n)$'s with $n \ge k + 1$, so

\[ \Bigl| \sum_0^K a_n - \sum_0^k a_{\sigma(n)} \Bigr| \le \sum_{k+1}^\infty |a_{\sigma(n)}| < \epsilon. \]

But then

\[ \Bigl| \sum_0^k a_{\sigma(n)} - S \Bigr|
   \le \Bigl| \sum_0^k a_{\sigma(n)} - \sum_0^K a_n \Bigr| + \Bigl| \sum_0^K a_n - S \Bigr|
   \le \epsilon + \sum_{K+1}^\infty |a_n| < 2\epsilon. \]

As $\epsilon$ is arbitrary, we conclude that $\sum_0^\infty a_{\sigma(n)} = S$.

6.20 Theorem. Suppose $\sum_0^\infty a_n$ is conditionally convergent. Given any real
number $S$, there is a rearrangement $\sum_0^\infty a_{\sigma(n)}$ that converges to $S$.

Proof. By Theorem 6.18, the series $\sum a_n^+$ and $\sum a_n^-$ of positive and negative
terms from $\sum a_n$ both diverge; but since $\sum a_n$ converges, we have $a_n \to 0$ as
$n \to \infty$. These pieces of information are all we need.
Suppose $S \ge 0$. (A similar argument works for $S < 0$.) We construct the
desired rearrangement as follows:
1. Add up the positive terms from the series $\sum a_n$ (in their original order) until
the sum exceeds $S$. This is possible since $\sum a_n^+ = \infty$. Stop as soon as the sum
exceeds $S$.
2. Now start adding in the negative terms (in their original order) until the sum
becomes less than $S$. Again, this is possible since $\sum a_n^- = \infty$. Stop as soon as
the sum is less than $S$.
3. Repeat steps 1 and 2 ad infinitum. That is, add in positive terms until the sum
is greater than $S$, then add in negative terms until the sum is less than $S$, and
so forth. This process never terminates since the series $\sum a_n^+$ and $\sum a_n^-$ both
diverge, and sooner or later every term from the original series will be added
into the new series. The result is a rearrangement $\sum_0^\infty a_{\sigma(n)}$ of the original
series.
We claim that this rearrangement converges to $S$. Indeed, given $\epsilon > 0$, there exists
an integer $N$ so that $|a_n| < \epsilon$ if $n > N$. If we choose $K$ large enough so that all
the terms $a_0, a_1, \ldots, a_N$ are included among the terms $a_{\sigma(0)}, a_{\sigma(1)}, \ldots, a_{\sigma(K)}$,
then $|a_{\sigma(n)}| < \epsilon$ if $n > K$. It follows that the partial sums $\sum_0^k a_{\sigma(n)}$ differ
from $S$ by less than $\epsilon$ if $k > K$, because the procedure specifies switching from
positive to negative terms or vice versa as soon as the sum is greater than or less than
$S$; if the sum became greater than $S + \epsilon$ or less than $S - \epsilon$, we would have added
in too many terms of the same sign. Hence the sums $\sum_0^k a_{\sigma(n)}$ converge to $S$.
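
The three-step construction in this proof is easy to carry out by machine. The Python sketch below is a hedged illustration rather than part of the argument: it assumes the series has no zero terms, takes the alternating harmonic series $\sum_0^\infty (-1)^n/(n+1)$ as its example, and uses hypothetical names (`rearrange_to`, `alt_harmonic`).

```python
from itertools import count

def alt_harmonic(n):
    # a_n = (-1)^n / (n + 1); conditionally convergent with sum log 2
    return (-1) ** n / (n + 1)

def rearrange_to(target, term, num_terms):
    """First num_terms terms of the rearrangement built in the proof of
    Theorem 6.20 for a target sum S >= 0.  Assumes term(n) is never zero."""
    positives = (term(n) for n in count() if term(n) > 0)  # original order
    negatives = (term(n) for n in count() if term(n) < 0)  # original order
    out, total, adding_positive = [], 0.0, True
    while len(out) < num_terms:
        a = next(positives) if adding_positive else next(negatives)
        out.append(a)
        total += a
        if adding_positive and total > target:
            adding_positive = False      # step 1 done: the sum exceeds S
        elif not adding_positive and total < target:
            adding_positive = True       # step 2 done: the sum is below S
    return out

terms = rearrange_to(1.5, alt_harmonic, 20000)
print(terms[:6])
print(sum(terms))
```

With target $S = 1.5$ the rearranged partial sums hover around 1.5 even though the original series sums to $\log 2 \approx 0.693$.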

EXERCISES
1. Show that the following series are absolutely convergent.
a. $\sum_0^\infty x^n \cos n\theta$ ($|x| < 1$, $\theta \in \mathbb{R}$).
b. $\sum_1^\infty n^{-2} \sin n\theta$ ($\theta \in \mathbb{R}$).
c. $\sum_1^\infty (-1)^n n^2 3^{1-n} x^n$ ($|x| < 3$).

2. Suppose $\sum a_n$ is conditionally convergent. Show that there are rearrangements
of $\sum a_n$ whose partial sums diverge to $+\infty$ or $-\infty$.

3. Consider the rearrangement of the series $\sum_1^\infty (-1)^{n-1}/n$ obtained by taking
two positive terms, one negative term, two positive terms, one negative term,
and so forth:
\[ 1 + \tfrac13 - \tfrac12 + \tfrac15 + \tfrac17 - \tfrac14 + \tfrac19 + \tfrac1{11} - \tfrac16 + \cdots. \]

Show that the sum of this series is $\frac32 \log 2$. (Hint: Deduce from Example 2 that
$0 + \frac12 + 0 - \frac14 + 0 + \frac16 + 0 - \cdots = \frac12 \log 2$ and add this to the result of
Example 2.)

4. Let $\sum_0^\infty a_n$ be a convergent series, and let $\sum_0^\infty b_n$ be its rearrangement
obtained by interchanging each even-numbered term with the odd-numbered term
immediately following it: $a_1 + a_0 + a_3 + a_2 + a_5 + a_4 + \cdots$. Show that
$\sum_0^\infty b_n = \sum_0^\infty a_n$.

5. Suppose $a_n > -1$ for all $n$. By suitable applications of Taylor’s theorem to the
functions $\log(1 + x)$ or $e^x$, show the following:
a. $\sum a_n$ is absolutely convergent if and only if $\sum \log(1 + a_n)$ is absolutely
convergent. (This is of interest in connection with Exercise 4 of §6.1: If
$\sum |a_n| < \infty$, then $\prod (1 + a_n)$ converges.)
b. Let $a_n = (-1)^{n+1}/\sqrt{n}$. Then $\sum_1^\infty a_n$ is conditionally convergent (see
Theorem 6.22 below), but $\sum_1^\infty \log(1 + a_n)$ diverges.

6.4 More Convergence Tests


The tests we developed in §6.2 for the convergence of series of nonnegative terms
immediately yield tests for the absolute convergence of more general series. We
sum up the most important results:

6.21 Theorem.
a. If $|a_n| \le Cn^{-1-\epsilon}$ for some $C, \epsilon > 0$, then $\sum a_n$ converges absolutely. If
$|a_n| \ge Cn^{-1}$ for some $C > 0$, then $\sum a_n$ either converges conditionally or
diverges.
b. (The Ratio Test) If $|a_{n+1}/a_n| \to l$ as $n \to \infty$, then $\sum a_n$ converges absolutely
if $l < 1$ and diverges if $l > 1$.
c. (The Root Test) If $|a_n|^{1/n} \to l$ as $n \to \infty$, then $\sum a_n$ converges absolutely if
$l < 1$ and diverges if $l > 1$.

In the ratio and root tests, the divergence (rather than conditional convergence)
when l > 1 is guaranteed because an ̸→ 0 in this case; see the proofs of Theorems
6.13 and 6.14. The statements of the ratio and root tests can be sharpened a bit as
in Theorems 6.13a and 6.14a.
Warning. It is a common mistake to obtain incorrect results by forgetting the
absolute values in Theorem 6.21. For example, the series $\sum_0^\infty (-2)^n$ satisfies
$a_{n+1}/a_n = -2$, and $-2 < 1$, but the series diverges!


It remains to investigate criteria that will yield information about conditional
convergence as well as absolute convergence. By far the most commonly used

result of this kind pertains to alternating series, that is, series whose terms alternate
in sign. Such a series can be written in the form $\sum (-1)^n a_n$ or $\sum (-1)^{n-1} a_n$
(depending on whether the even or odd numbered terms are positive), where $a_n > 0$;
we shall consider the first form for the sake of definiteness.

6.22 Theorem (The Alternating Series Test). Suppose the sequence $\{a_n\}$ is
decreasing and $\lim_{n\to\infty} a_n = 0$. Then the series $\sum_0^\infty (-1)^n a_n$ is convergent.
Moreover, if $s_k$ and $S$ denote the $k$th partial sum and the full sum of this series, we
have

\[ s_k > S \text{ for } k \text{ even}, \qquad s_k < S \text{ for } k \text{ odd}, \qquad\text{and}\qquad |s_k - S| < a_{k+1} \text{ for all } k. \]

Proof. Since $a_k \ge a_{k+1}$ for all $k$, we have

\[ s_{2m+1} = s_{2m-1} + a_{2m} - a_{2m+1} \ge s_{2m-1}, \]
\[ s_{2m+2} = s_{2m} - a_{2m+1} + a_{2m+2} \le s_{2m}. \]

Thus the sequence $\{s_{2m-1}\}$ of odd-numbered partial sums is increasing and the
sequence $\{s_{2m}\}$ of even-numbered partial sums is decreasing. This monotonicity
further yields

\[ s_{2m-1} = s_{2m-2} - a_{2m-1} \le s_{2m-2} \le s_0, \]
\[ s_{2m} = s_{2m-1} + a_{2m} \ge s_{2m-1} \ge s_1, \]

so $\{s_{2m-1}\}$ and $\{s_{2m}\}$ are bounded above and below, respectively. By the
monotone sequence theorem, these sequences both converge, and since
$s_{2m} - s_{2m-1} = a_{2m} \to 0$, their limits are equal. Thus the whole sequence $\{s_k\}$
converges, that is, the series $\sum (-1)^n a_n$ converges. The even-numbered partial sums
decrease to the full sum $S$ while the odd-numbered ones increase, so $S < s_{2m}$ and
$S > s_{2m-1}$ for all $m$. In particular,

\[ 0 < S - s_{2m-1} < s_{2m} - s_{2m-1} = a_{2m}, \]
\[ 0 < s_{2m} - S < s_{2m} - s_{2m+1} = a_{2m+1}, \]

so $|s_k - S| < a_{k+1}$ whether $k$ is even or odd.
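
The conclusions of Theorem 6.22 can be checked on a concrete series. The sketch below is an illustration only; it takes $a_n = 1/(n+1)$, for which the full sum is $\log 2$, and the helper name is hypothetical.

```python
import math

def alternating_partial_sums(a, k_max):
    """Partial sums s_0, ..., s_{k_max} of the series sum_{n>=0} (-1)^n a(n)."""
    sums, s = [], 0.0
    for n in range(k_max + 1):
        s += (-1) ** n * a(n)
        sums.append(s)
    return sums

def a(n):
    # decreasing to 0; the alternating series has sum log 2
    return 1.0 / (n + 1)

S = math.log(2)
sums = alternating_partial_sums(a, 50)
for k in (0, 1, 10, 11, 50):
    # signed error s_k - S next to the bound a_{k+1}
    print(k, sums[k] - S, a(k + 1))
```

The signed error is positive for even $k$ and negative for odd $k$, and in each row its absolute value is smaller than the first neglected term $a_{k+1}$.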


EXAMPLE 1. The series $\sum_1^\infty (-1)^n (e^{1/n} - 1)$ converges by the alternating
series test, because $e^{1/n}$ decreases to 1 as $n \to \infty$. The convergence is only
conditional, however, since $e^{1/n} - 1 \approx 1/n$ when $n$ is large. (More precisely,
by Taylor’s theorem we have $e^x = 1 + x + R(x)$ where $|R(x)| \le Cx^2$ for
$0 \le x \le 1$. Thus $\sum (e^{1/n} - 1) = \sum n^{-1} + \sum R(1/n)$; the first series on the
right diverges, while the second converges by comparison to $\sum n^{-2}$.)

The alternating series test is a useful test for conditional convergence, but the
fact that the difference between a partial sum and the full sum is less in absolute
value than the first neglected term is also of interest in the absolutely convergent
case. (This estimate for the error in replacing the full sum by a partial sum is, in
most cases, accurate to within an order of magnitude.)
The alternating series test can be applied to a series $\sum (-1)^n a_n$ for which
$\lim a_n = 0$ provided that the $a_n$’s decrease from some point onward. (Of course,
the inequalities for the partial sums are only valid from that point onward too.)
However, the monotonicity condition cannot be dropped entirely, as the following
example shows:

\[ 1 - \tfrac12 + \tfrac12 - \tfrac14 + \tfrac13 - \tfrac16 + \cdots + \tfrac1m - \tfrac1{2m} + \cdots. \]

Here $a_n \to 0$ as $n \to \infty$, but not monotonically, and the series diverges. (The sum
of the first $2m$ terms is $\frac12 (1 + \frac12 + \frac13 + \cdots + \frac1m)$, a partial sum of the
divergent series $\frac12 \sum n^{-1}$.)
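
The parenthetical claim about the first $2m$ terms can be confirmed directly. The following sketch is illustrative only, with hypothetical function names; it compares the partial sums of the series above with half the harmonic numbers.

```python
def partial_sum(num_terms):
    """Sum of the first num_terms terms of 1 - 1/2 + 1/2 - 1/4 + 1/3 - 1/6 + ..."""
    s = 0.0
    for i in range(num_terms):
        m = i // 2 + 1                      # terms 2m-1 and 2m are 1/m and -1/(2m)
        s += 1.0 / m if i % 2 == 0 else -1.0 / (2 * m)
    return s

def harmonic(m):
    """Harmonic number 1 + 1/2 + ... + 1/m."""
    return sum(1.0 / j for j in range(1, m + 1))

for m in (1, 10, 1000, 100000):
    print(m, partial_sum(2 * m), 0.5 * harmonic(m))
```

The two columns agree up to rounding and grow without bound, so the terms tend to zero while the series diverges.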
The tests we have developed can be used to analyze a wide variety of power
series, that is, series of the form $\sum_0^\infty c_n (x - a)^n$ where $x$ is a real variable. In
typical cases, the ratio test or the root test will establish that there is some number r
such that the series converges absolutely for |x−a| < r and diverges for |x−a| > r.
The convergence at the two remaining points x = a ± r can then be studied by one
of the other tests.

EXAMPLE 2. Consider the series $\sum_0^\infty \dfrac{(-1)^n (x-3)^n}{(n+1)2^{2n+1}}$. We start with
the ratio test:

\[ \left| \frac{a_{n+1}}{a_n} \right|
   = \left| \frac{(-1)^{n+1} (x-3)^{n+1} / (n+2) 2^{2n+3}}{(-1)^n (x-3)^n / (n+1) 2^{2n+1}} \right|
   = \frac{n+1}{n+2} \cdot \frac{|x-3|}{4} \to \frac{|x-3|}{4}. \]

Thus the series converges absolutely for $|x-3| < 4$ and diverges for $|x-3| > 4$.
(The root test would also yield this result.) The two remaining points are where
$x - 3 = \pm 4$, that is, $x = -1$ and $x = 7$. At these two points the series becomes

\[ \sum_0^\infty \frac{(-1)^n (-4)^n}{(n+1)2^{2n+1}} = \frac12 \sum_0^\infty \frac{1}{n+1}
   \quad\text{and}\quad
   \sum_0^\infty \frac{(-1)^n 4^n}{(n+1)2^{2n+1}} = \frac12 \sum_0^\infty \frac{(-1)^n}{n+1}. \]

The first of these diverges, while the second one converges by the alternating
series test. The convergence is only conditional, by the divergence of the first
series. Thus the original series converges absolutely for $-1 < x < 7$, converges
conditionally at $x = 7$, and diverges elsewhere.

We conclude with another test for convergence (absolute or conditional) that
generalizes the alternating series test and is sometimes useful for trigonometric
series. Its proof is based on the following discrete analogue of the
integration-by-parts formula, in which a sum $\sum_0^k a_n b_n$ is rewritten by
“differentiating” the sequence $\{a_n\}$ and “integrating” the sequence $\{b_n\}$.

6.23 Lemma (Summation by Parts). Given two numerical sequences $\{a_n\}$ and
$\{b_n\}$, let
\[ a_n' = a_n - a_{n-1}, \qquad B_n = b_0 + \cdots + b_n. \]
Then
\[ \sum_0^k a_n b_n = a_k B_k - \sum_1^k a_n' B_{n-1}. \]

Proof. We have $b_0 = B_0$, and $b_n = -B_{n-1} + B_n$ for $n \ge 1$, so

\begin{align*}
a_0 b_0 + a_1 b_1 + a_2 b_2 + \cdots + a_k b_k
  &= a_0 B_0 - a_1 B_0 + a_1 B_1 - a_2 B_1 + a_2 B_2 - \cdots - a_k B_{k-1} + a_k B_k \\
  &= -a_1' B_0 - a_2' B_1 - \cdots - a_k' B_{k-1} + a_k B_k. \tag{6.24}
\end{align*}
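
As a sanity check, the identity of Lemma 6.23 can be tested on arbitrary finite sequences. The following Python sketch is an illustration under that reading of the lemma (lists `a` and `b` hold $a_0, \ldots, a_k$ and $b_0, \ldots, b_k$); the function name is hypothetical.

```python
import random

def summation_by_parts_sides(a, b):
    """Return (lhs, rhs) of Lemma 6.23 for lists a, b of equal length k + 1."""
    k = len(a) - 1
    B, running = [], 0.0
    for x in b:
        running += x
        B.append(running)                  # B[n] = b_0 + ... + b_n
    lhs = sum(a[n] * b[n] for n in range(k + 1))
    # a'_n = a_n - a_{n-1} plays the role of a discrete derivative
    rhs = a[k] * B[k] - sum((a[n] - a[n - 1]) * B[n - 1] for n in range(1, k + 1))
    return lhs, rhs

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(20)]
b = [random.uniform(-1, 1) for _ in range(20)]
print(summation_by_parts_sides(a, b))
```

The two returned numbers agree up to rounding error for any choice of the sequences, since the lemma is a purely algebraic identity.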

6.25 Theorem (Dirichlet’s Test). Let $\{a_n\}$ and $\{b_n\}$ be numerical sequences.
Suppose that the sequence $\{a_n\}$ is decreasing and tends to 0 as $n \to \infty$, and that the
sums $B_n = b_0 + \cdots + b_n$ are bounded in absolute value by a constant $C$ independent
of $n$. Then the series $\sum_0^\infty a_n b_n$ converges.

Proof. With notation as in Lemma 6.23, $\sum_0^k a_n b_n = a_k B_k - \sum_1^k a_n' B_{n-1}$, so
it is enough to show that $\lim_{k\to\infty} a_k B_k$ exists and that the series
$\sum_1^\infty a_n' B_{n-1}$ converges. The first assertion is easy: Since $|B_k| \le C$ and
$a_k \to 0$, we have $|a_k B_k| \le C a_k \to 0$. On the other hand, since $\{a_n\}$ is
decreasing, we have $a_n' \le 0$ for all $n$, so

\[ \sum_1^k |a_n' B_{n-1}| \le C \sum_1^k |a_n'|
   = C \bigl[ (a_0 - a_1) + (a_1 - a_2) + \cdots + (a_{k-1} - a_k) \bigr] = C(a_0 - a_k) \le C a_0 \]

for all $k$. It follows that the series $\sum_1^\infty a_n' B_{n-1}$ is absolutely convergent and
hence convergent.

Dirichlet’s test includes the alternating series test as a special case, by taking
$b_n = (-1)^n$, for which $B_n = 1$ or 0 according as $n$ is even or odd. The other
situations in which it is most commonly applied are those with bn = sin nθ or
bn = cos nθ, where θ is not an integer multiple of 2π. That the hypotheses on {bn }
in Dirichlet’s test are satisfied in these cases is shown by the following calculation.

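6.26 Lemma. If $\theta$ is not an integer multiple of $2\pi$, then

\[ \sum_1^k \cos n\theta = \frac{\cos \frac12 (k+1)\theta \cdot \sin \frac12 k\theta}{\sin \frac12 \theta}, \]
\[ \sum_1^k \sin n\theta = \frac{\sin \frac12 (k+1)\theta \cdot \sin \frac12 k\theta}{\sin \frac12 \theta}. \]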

Proof. These formulas can be established by using various trigonometric identities.
The easiest method is to use Euler’s formula $\cos x + i \sin x = e^{ix}$ (which we shall
discuss in detail in §7.5). By the formula (6.2) for the sum of a finite geometric
series,

\begin{align*}
\sum_1^k e^{in\theta}
  &= e^{i\theta} \frac{e^{ik\theta} - 1}{e^{i\theta} - 1}
   = e^{i\theta} \frac{e^{ik\theta/2} [e^{ik\theta/2} - e^{-ik\theta/2}]}{e^{i\theta/2} [e^{i\theta/2} - e^{-i\theta/2}]} \\
  &= e^{i(k+1)\theta/2} \frac{e^{ik\theta/2} - e^{-ik\theta/2}}{e^{i\theta/2} - e^{-i\theta/2}} \\
  &= \bigl[ \cos \tfrac12 (k+1)\theta + i \sin \tfrac12 (k+1)\theta \bigr] \frac{\sin \frac12 k\theta}{\sin \frac12 \theta}.
\end{align*}

The asserted formulas follow by taking the real and imaginary parts of both sides.
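
The closed forms of Lemma 6.26 are easy to test against direct summation. The sketch below is an illustration with hypothetical function names; it also prints $|\csc \frac12\theta|$, which bounds both sums uniformly in $k$.

```python
import math

def cos_sum_direct(k, theta):
    return sum(math.cos(n * theta) for n in range(1, k + 1))

def sin_sum_direct(k, theta):
    return sum(math.sin(n * theta) for n in range(1, k + 1))

def cos_sum_closed(k, theta):
    # closed form from Lemma 6.26; requires sin(theta/2) != 0
    return math.cos(0.5 * (k + 1) * theta) * math.sin(0.5 * k * theta) / math.sin(0.5 * theta)

def sin_sum_closed(k, theta):
    return math.sin(0.5 * (k + 1) * theta) * math.sin(0.5 * k * theta) / math.sin(0.5 * theta)

theta = 1.0
bound = 1.0 / abs(math.sin(0.5 * theta))       # |csc(theta/2)|
for k in (1, 5, 50, 500):
    print(k, cos_sum_direct(k, theta), cos_sum_closed(k, theta),
          sin_sum_direct(k, theta), sin_sum_closed(k, theta), bound)
```

For each $k$ the direct and closed-form values agree up to rounding, and neither sum exceeds the bound in absolute value.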

6.27 Corollary. Suppose that the sequence $\{a_n\}$ decreases to 0 as $n \to \infty$. Then
the series $\sum_1^\infty a_n \cos n\theta$ converges for all $\theta$ except perhaps for integer
multiples of $2\pi$, and the series $\sum_1^\infty a_n \sin n\theta$ converges for all $\theta$.

Proof. The hypotheses of Dirichlet’s test are satisfied for $\theta \ne 2\pi j$, for if $b_n$ is
either $\cos n\theta$ or $\sin n\theta$, the lemma implies that $|B_n| \le |\csc \frac12 \theta|$ for all $n$.
(If $\theta = 2\pi j$, the series $\sum a_n \sin n\theta$ converges trivially since $\sin n\theta = 0$ for
all $n$.)
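
A numerical illustration of Corollary 6.27, offered only as a sketch: take $a_n = 1/n$ and $\theta = 1$. The comparison value $(\pi - \theta)/2$ is the standard Fourier-series sum of $\sum_1^\infty n^{-1} \sin n\theta$ for $0 < \theta < 2\pi$; it is quoted here as an outside fact and is not established in this section. The function name is hypothetical.

```python
import math

def sine_series_partial_sum(theta, k):
    """s_k = sum_{n=1}^{k} sin(n*theta) / n, i.e. a_n = 1/n and b_n = sin(n*theta)."""
    return sum(math.sin(n * theta) / n for n in range(1, k + 1))

theta = 1.0
for k in (10, 1000, 100000):
    print(k, sine_series_partial_sum(theta, k))
print("(pi - theta)/2 =", (math.pi - theta) / 2)
```

The partial sums settle down slowly, which is typical of a series that converges only conditionally.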

EXERCISES
In Exercises 1–9, determine the values of $x$ at which the series converges absolutely
or conditionally.

1. $\sum_0^\infty \dfrac{(x+2)^n}{n^2+1}$.

2. $\sum_1^\infty n^3 (2x-1)^n$.

3. $\sum_0^\infty \dfrac{x^{2n}}{1 \cdot 3 \cdots (2n+1)}$.

4. $\sum_1^\infty \dfrac{n x^{n+2}}{5^n (n+1)^2}$.

5. $\sum_0^\infty \dfrac{(-1)^n (x-4)^n}{(2n-3) \log(n+3)}$.

6. $\sum_1^\infty \dfrac{1}{\sqrt{n}} \Bigl( \dfrac{x-1}{x+1} \Bigr)^n$.

7. $\sum_1^\infty \dfrac{2 \cdot 4 \cdots (2n)}{1 \cdot 3 \cdots (2n-1)} \bigl( \tfrac12 x - 3 \bigr)^n$.

8. $\sum_0^\infty \dfrac{(-1)^n (x+1)^{2n}}{3n+2}$.

9. $\sum_0^\infty \dfrac{1 \cdot 3 \cdots (2n+1)}{2 \cdot 5 \cdots (3n+2)} x^n$.

In Exercises 10–14, determine whether the series converges absolutely, converges
conditionally, or diverges.

10. $\sum_2^\infty (-1)^n \log \Bigl( \dfrac{n+1}{n} \Bigr)$.

11. $\sum_1^\infty (-1)^n \displaystyle\int_n^{n+1} \frac{\log(x+7)}{x} \, dx$.

12. $\sum_1^\infty \dfrac{(-1)^n}{n^{1/n}}$.

13. $\sum_1^\infty (-1)^{n-1} \log(n \sin n^{-1})$.

14. $\sum_1^\infty (-1)^{n-1} \Bigl[ e - \Bigl( \dfrac{n+1}{n} \Bigr)^n \Bigr]$.

15. Use the alternating series test to show that
$x^{-1} \sin x = 1 - \frac{1}{3!} x^2 + \frac{1}{5!} x^4 - \frac{1}{7!} x^6 + E(x)$ where
$0 < E(x) < 0.027$ for $|x| \le \pi$.

16. (Abel’s Test) Suppose $\sum a_n$ is a convergent series and $\{b_n\}$ is a decreasing
sequence of positive numbers. ($\lim b_n$ need not be zero.) Show that $\sum a_n b_n$
converges. (This can be done by using Dirichlet’s test or by modifying the
proof of Dirichlet’s test.)

17. Show that if $\sum_1^\infty a_n$ converges, then so does $\sum_1^\infty n^{-p} a_n$ for any $p > 0$. For
which $p$ can you guarantee absolute convergence without knowing anything
more about the $a_n$’s?

18. For which $x$ and $\theta$ does $\sum_1^\infty n^{-1} x^n \cos n\theta$ converge?

6.5 Double Series; Products of Series


A double infinite series, informally speaking, is an expression of the form

\[ \sum_{m,n=0}^\infty a_{mn}, \tag{6.28} \]

that is, a series whose terms are indexed by ordered pairs of nonnegative integers.
The difficulty in making precise sense out of such an expression is that it is not
clear what one should mean by a “partial sum.” Two obvious candidates are the
“square” partial sums and the “triangular” partial sums

\[ s_k^\square = \sum_{m,n=0}^k a_{mn}, \qquad s_k^\triangle = \sum_{m+n \le k} a_{mn}, \]

which are defined by adding up all the terms $a_{mn}$ for which $(m, n)$ lies in the
outlined regions in Figure 6.2. (Note that passing from $s_k^\square$ or $s_k^\triangle$ to
$s_{k+1}^\square$ or $s_{k+1}^\triangle$
involves adding not just a single term but a finite set of terms to the sum. It is not
necessary to specify the order in which these terms are added, as finite addition
is commutative.) Clearly there are many other possibilities. Indeed, there are in-
finitely many ways to enumerate the set of ordered pairs of nonnegative integers,
each of which leads to a different notion of “partial sums.”

FIGURE 6.2: Schematic representation of square and triangular partial sums of a
double series.

There is yet another possibility: One can consider the double series (6.28) as
an iterated series, just as one can regard double integrals as iterated integrals. That
is, one could interpret (6.28) as

\[ \sum_{m=0}^\infty \Bigl( \sum_{n=0}^\infty a_{mn} \Bigr) \quad\text{or}\quad \sum_{n=0}^\infty \Bigl( \sum_{m=0}^\infty a_{mn} \Bigr), \]

in which one forms the ordinary series $\sigma_m = \sum_{n=0}^\infty a_{mn}$ for each $m$ and then
adds up the sums to obtain $\sum_{m=0}^\infty \sigma_m$, or similarly with $m$ and $n$ switched. This is
different from the partial-sum procedures discussed above because the intermediate
steps involve infinite sums rather than finite ones.
How is one to make sense out of all these ways of interpreting (6.28)? The
answer, in a nutshell, is that the situation is similar to that for improper double
integrals discussed in §4.7: For series of positive terms, or for absolutely conver-
gent series, there is no problem, as all interpretations lead to the same answer.
Otherwise, one must proceed with great caution.
Let us explain this in more detail. Given any one-to-one correspondence
$j \leftrightarrow (m, n)$ between the set of nonnegative integers and the set of ordered pairs
of nonnegative integers, we can set $b_j = a_{mn}$ and form the ordinary infinite series
$\sum_0^\infty b_j$; we call such a series an ordering of the double series
$\sum_{m,n=0}^\infty a_{mn}$. The essential point is that the orderings of $\sum a_{mn}$ are all
rearrangements of one another, and we can apply Theorem 6.19.
First, if $a_{mn} \ge 0$, then either all orderings of $\sum a_{mn}$ diverge or all orderings
converge, and in the latter case their sums are all equal. Thus, the sum of the series
$\sum a_{mn}$ is well defined as a positive number or $+\infty$, independent of the choice of
ordering.
Second, without the assumption of positivity, if $\sum |b_j|$ is convergent for one
ordering of $\sum a_{mn}$, then the same is true for every ordering. In this case the series

$\sum a_{mn}$ is called absolutely convergent, and by Theorem 6.19 again, all orderings
of $\sum a_{mn}$ have the same sum, which we call the sum of the double series
$\sum a_{mn}$. Moreover, an argument similar to the proof of Theorem 6.19 shows that
the double series $\sum a_{mn}$ is absolutely convergent if and only if the iterated series
$\sum_m (\sum_n |a_{mn}|)$ is convergent, in which case
$\sum_{m,n} a_{mn} = \sum_m (\sum_n a_{mn})$. (See Exercises 5 and 6.)

Given a double series $\sum a_{mn}$, we can therefore proceed as follows. First we
evaluate the series $\sum |a_{mn}|$ by ordering it in some fashion or treating it as an
iterated series; if it turns out to be finite, we can then evaluate $\sum a_{mn}$ by ordering
it in any fashion or treating it as an iterated series.

What if $\sum a_{mn}$ is not absolutely convergent? Let us separate out the positive
and negative terms as we did in Theorem 6.18. The argument in the proof of
Theorem 6.18 shows that if $\sum a_{mn}^+ = \infty$ but $\sum a_{mn}^- < \infty$, then all orderings of
$\sum a_{mn}$ diverge to $+\infty$; likewise, if $\sum a_{mn}^+ < \infty$ but $\sum a_{mn}^- = \infty$, then all
orderings of $\sum a_{mn}$ diverge to $-\infty$. On the other hand, if
$\sum a_{mn}^+ = \sum a_{mn}^- = \infty$ but $a_{mn} \to 0$ as $m, n \to \infty$, the proof of Theorem 6.20
shows that various orderings of $\sum a_{mn}$ can converge to any real number. In this
case, therefore, we simply cannot make numerical sense out of the expression
$\sum a_{mn}$ without specifying more precisely how the summation is to be performed.
An important situation in which double series occur is in multiplying two series
together. The basic result is as follows.

6.29 Theorem. Suppose that $\sum_0^\infty a_m$ and $\sum_0^\infty b_n$ are both absolutely
convergent, with sums $A$ and $B$. Then the double series $\sum_{m,n=0}^\infty a_m b_n$ is
absolutely convergent, and its sum is $AB$.

Proof. We consider the square partial sums of $\sum a_m b_n$, which are just the products
of the partial sums of $\sum a_m$ and $\sum b_n$:

\[ \sum_{m,n=0}^k a_m b_n = \Bigl( \sum_0^k a_m \Bigr) \Bigl( \sum_0^k b_n \Bigr). \tag{6.30} \]

If we replace $a_m$ and $b_n$ by $|a_m|$ and $|b_n|$ in (6.30), the right side is bounded by the
finite quantity $(\sum_0^\infty |a_m|)(\sum_0^\infty |b_n|)$, which shows that the double series
$\sum a_m b_n$ is absolutely convergent. Then, letting $k \to \infty$ in (6.30), we obtain
$\sum a_m b_n = AB$.

Under the conditions of Theorem 6.29, we are free to use any ordering of
$\sum a_m b_n$ that we choose, and in particular, we can use the triangular partial sums
rather than the square ones. This is the natural thing to do when considering power

series. Indeed, if $\sum a_n x^n$ and $\sum b_n x^n$ are absolutely convergent for a
particular value of $x$, their product is $\sum a_m b_n x^{m+n}$, which can also be expressed
as a power series if we group together all the terms involving a given power of $x$. The
terms involving $x^j$ are those with $m + n = j$, i.e., those with $m = 0, 1, \ldots, j$ and
$n = j - m$. Collecting these terms together yields

\[ \Bigl( \sum_0^\infty a_n x^n \Bigr) \Bigl( \sum_0^\infty b_n x^n \Bigr) = \sum_{j=0}^\infty \Bigl( \sum_{m+n=j} a_n b_m \Bigr) x^j. \]

The expression on the right is a power series whose $j$th coefficient is a finite sum
of products of the original coefficients; its partial sums are precisely the triangular
partial sums of the double series $\sum a_m b_n x^{m+n}$.
The same procedure can also be used for series without an $x$ (by taking $x = 1$,
if you like). That is, given two convergent series $\sum_0^\infty a_m$ and $\sum_0^\infty b_n$, we can
form the series

\[ \sum_{j=0}^\infty \Bigl( \sum_{m+n=j} a_n b_m \Bigr) = \sum_{j=0}^\infty (a_0 b_j + a_1 b_{j-1} + \cdots + a_{j-1} b_1 + a_j b_0), \]

whose partial sums are the triangular partial sums of the double series $\sum a_m b_n$;
it is called the Cauchy product of $\sum a_m$ and $\sum b_n$. As we have seen, if $\sum a_m$
and $\sum b_n$ are absolutely convergent, their Cauchy product is too, and its sum is
$(\sum a_m)(\sum b_n)$. In fact, the Cauchy product converges to $(\sum a_m)(\sum b_n)$
provided that at least one of $\sum a_m$ and $\sum b_n$ is absolutely convergent (see Krantz
[12, pp. 109–10], or Rudin [18, p. 74]). However, if $\sum a_m$ and $\sum b_n$ are both
conditionally convergent, their Cauchy product may diverge. (See Exercise 4.)
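
The Cauchy product is a finite convolution of coefficient lists, so it is simple to compute. The sketch below is illustrative (the function name `cauchy_product` is hypothetical); it squares the geometric series, first at the level of coefficients and then numerically at $x = \frac12$, where Theorem 6.29 predicts the sum $(1 - \frac12)^{-2} = 4$.

```python
def cauchy_product(a, b):
    """Coefficients c_j = a_0 b_j + a_1 b_{j-1} + ... + a_j b_0
    for j = 0, ..., min(len(a), len(b)) - 1."""
    size = min(len(a), len(b))
    return [sum(a[n] * b[j - n] for n in range(j + 1)) for j in range(size)]

# Coefficients of the geometric series are all 1, so c_j counts the
# pairs (m, n) with m + n = j.
ones = [1] * 8
print(cauchy_product(ones, ones))

# Numerical check at x = 1/2: terms a_n = x^n, truncated after 60 terms.
x = 0.5
terms = [x ** n for n in range(60)]
print(sum(cauchy_product(terms, terms)), (1 - x) ** -2)
```

Only the first `min(len(a), len(b))` coefficients are returned, because later coefficients would need terms beyond the truncation.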

EXERCISES
1. By multiplying the geometric series by itself, show that for $|x| < 1$,
a. $(1 - x)^{-2} = \sum_0^\infty (n + 1) x^n$;
b. $(1 - x)^{-3} = \frac12 \sum_0^\infty (n + 1)(n + 2) x^n$.

2. Let $f(x) = \sum_0^\infty x^n / n!$. Show directly from this formula that
$f(x) f(y) = f(x + y)$.

3. Verify that the Taylor series of $(1 - 4x)^{-1/2}$ about $x = 0$ is
$\sum_0^\infty (2n)! \, x^n / (n!)^2$ and that this series converges absolutely for
$|x| < \frac14$. Then, taking for granted that the sum of this series actually is
$(1 - 4x)^{-1/2}$ (which we shall prove in §7.3), multiply the series by itself and
conclude that for any positive integer $j$,
\[ \sum_{n=0}^j \frac{(2n)! \, (2j - 2n)!}{(n!)^2 \, ((j - n)!)^2} = 4^j. \]

4. Show that the series $\sum_0^\infty (-1)^n (n+1)^{-1/2}$ is conditionally convergent and
that the Cauchy product of this series with itself diverges. (Hint: The maximum
of the function $f(x) = (x + 1)(j - x + 1)$ occurs at $x = \frac12 j$, and hence
$(n + 1)(j - n + 1) \le (\frac12 j + 1)^2$ for $n = 0, \ldots, j$.)

5. Show that $\sum_{m,n=0}^\infty a_{mn} = \sum_{m=0}^\infty (\sum_{n=0}^\infty a_{mn})$ whenever
$a_{mn} \ge 0$ for all $m, n \ge 0$.

6. Suppose $\sum_{m,n=0}^\infty a_{mn}$ is absolutely convergent. Show that the iterated series
$\sum_{m=0}^\infty (\sum_{n=0}^\infty a_{mn})$ converges to the sum $\sum_{m,n=0}^\infty a_{mn}$. (Use
Exercise 5.)

7. Show that $\sum_{m,n=1}^\infty (m + n)^{-p}$ converges if and only if $p > 2$. (Hint: Use
triangular partial sums.)

8. Let $a_{mn} = 1$ if $m = n$, $a_{mn} = -1$ if $m - n = 1$, and $a_{mn} = 0$ otherwise.
Show that the iterated series $\sum_{n=0}^\infty \sum_{m=0}^\infty a_{mn}$ and
$\sum_{m=0}^\infty \sum_{n=0}^\infty a_{mn}$ both converge, but their sums are unequal.
Chapter 7

FUNCTIONS DEFINED BY
SERIES AND INTEGRALS

In this chapter we study the convergence of sequences and series whose terms are
functions of a variable x and improper integrals whose integrand contains x as a
free variable. In all these situations, the study of the resulting function of x may
reveal unpleasant surprises unless we have some control over the way the rate of
convergence varies along with x; the most commonly encountered form of such
control, uniform convergence, is a major theme of this chapter.

7.1 Sequences and Series of Functions


We recall that a sequence $\{f_k\}_0^\infty$ of functions is a map that assigns to each
nonnegative integer $k$ a function $f_k$. It is implicitly assumed that the functions $f_k$
are all defined on some common domain $S$ (usually a subset of $\mathbb{R}$ or $\mathbb{R}^n$) and all
take values in the same space ($\mathbb{R}$, $\mathbb{C}$, or $\mathbb{R}^m$).
What does it mean for a sequence of functions {fk } defined on a set S ⊂ Rn
to converge to a function f on S? The most obvious interpretation is that

(7.1) fk (x) → f (x) for every x ∈ S.

This is, indeed, what is usually meant by the statement “fk → f on S” when no
further qualification is added; when we wish to be very clear about it, we shall say
that fk → f pointwise on S when (7.1) holds.
Unfortunately, pointwise convergence is a rather badly behaved operation in
the sense that it does not interact well with other limiting operations, such as dif-
ferentiation and integration. Consider the following group of examples:


FIGURE 7.1: Some of the functions defined in (7.2). Top: $f_1$ (dashed)
and $f_3$ (solid). Middle: $g_1$ (dashed) and $g_3$ (solid). Bottom: $h_1$
(dashed) and $h_3$ (solid).

EXAMPLE 1. Let

\[ f_k(x) = \frac1k \arctan kx, \qquad g_k(x) = f_k'(x) = \frac{1}{k^2 x^2 + 1}, \qquad
   h_k(x) = g_k'(x) = \frac{-2k^2 x}{(k^2 x^2 + 1)^2}. \tag{7.2} \]

Observe that $f_k(x) = k^{-1} f_1(kx)$, $g_k(x) = g_1(kx)$, and $h_k(x) = k h_1(kx)$.
In graphical terms, as shown in Figure 7.1, this means that the graph of $f_k$ is
obtained from the graph of $f_1$ by shrinking the $x$ and $y$ scales by a factor of $k$;
the graph of $g_k$ is obtained from the graph of $g_1$ by shrinking the $x$ scale by a
factor of $k$ and leaving the $y$ scale unchanged; and the graph of $h_k$ is obtained
from the graph of $h_1$ by shrinking the $x$ scale and expanding the $y$ scale by a
factor of $k$. We have:
i. $f_k(x) \to 0$ for all $x$, since $|f_k(x)| \le \pi/2k$.
ii. $g_k(x) \to 0$ for all $x \ne 0$, but $g_k(0) = 1$ for all $k$. That is,
\[ \lim_{k\to\infty} g_k(x) = g(x) \equiv \begin{cases} 1 & \text{if } x = 0, \\ 0 & \text{otherwise.} \end{cases} \]
iii. $h_k(x) \to 0$ for all $x$. ($h_k(0) = 0$ for all $k$, and if $x \ne 0$,
$h_k(x) \approx -2/k^2 x^3$ for large $k$.)
Therefore, $g$ is discontinuous even though the $g_k$’s are all continuous; moreover,
since $g_k$ is the derivative of $f_k$ and an antiderivative of $h_k$,

\[ \lim_{k\to\infty} f_k'(0) = 1 \ne 0 = \Bigl( \lim_{k\to\infty} f_k \Bigr)'(0); \]
\[ \lim_{k\to\infty} \lim_{x\to 0} g_k(x) = 1 \ne 0 = \lim_{x\to 0} \lim_{k\to\infty} g_k(x); \]
\[ \lim_{k\to\infty} \int_0^1 h_k(x) \, dx = -1 \ne 0 = \int_0^1 \Bigl[ \lim_{k\to\infty} h_k(x) \Bigr] dx. \]
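
These failures of interchange can be seen in numbers. The Python sketch below is only an illustration with hypothetical helper names; it evaluates the functions in (7.2) for large $k$ and uses the fact, noted above, that $g_k$ is an antiderivative of $h_k$ to compute $\int_0^1 h_k$ exactly.

```python
import math

def f(k, x):
    return math.atan(k * x) / k

def g(k, x):
    return 1.0 / (k * k * x * x + 1.0)

def h(k, x):
    return -2.0 * k * k * x / (k * k * x * x + 1.0) ** 2

def integral_h(k):
    # integral of h_k over [0, 1], computed as g_k(1) - g_k(0)
    return g(k, 1.0) - g(k, 0.0)

for k in (1, 10, 1000, 10 ** 6):
    # pointwise values at x = 0.5 tend to 0, but g_k(0) stays at 1
    # and the integral of h_k tends to -1
    print(k, f(k, 0.5), g(k, 0.5), h(k, 0.5), g(k, 0.0), integral_h(k))
```

Each of $f_k(0.5)$, $g_k(0.5)$, $h_k(0.5)$ tends to 0, while $g_k(0)$ remains 1 and the integrals approach $-1$ rather than 0.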

Clearly, if we want some theorems to the effect that “the integral of the limit is
the limit of the integrals,” or “the derivative of a limit is the limit of the derivatives,”
pointwise convergence is the wrong condition to impose. We now develop a more
stringent notion of convergence that removes some of the pathologies.
The real trouble with pointwise convergence is as follows. The statement
“fk (x) → f (x) for all x ∈ S” means that, for each x, fk (x) will be close to
f (x) provided k is sufficiently large, but the rate of convergence of fk (x) to f (x)
can be very different for different values of $x$. For example, if $g_k$ is as in (7.2), for
all $x \ne 0$ we have $g_k(x) \to 0$, so $|g_k(x)| < 10^{-4}$ (say) provided $k$ is sufficiently

large; for x = 10, “sufficiently large” means k ≥ 10, but for x = 0.1, it means
k ≥ 1000. If, however, we have some control over the rate of convergence that is
independent of the particular point x, then many of the pathologies disappear.
The precise definition is as follows. A sequence {fk } of functions defined on a
set S ⊂ Rn is said to converge uniformly on S to the function f if for every ϵ > 0
there is an integer K such that

(7.3) |fk (x) − f (x)| < ϵ whenever k > K and x ∈ S.

The point here is that the same K will work for every x ∈ S. Another way of
writing (7.3) is

\[ \sup_{x \in S} |f_k(x) - f(x)| \le \epsilon \quad \text{whenever } k > K. \tag{7.4} \]

The geometry of this inequality is indicated in Figure 7.2. Yet another way of
expressing uniform convergence is the following, which is sufficiently useful to be
displayed as a theorem.
7.5 Theorem. The sequence {fk } converges to f uniformly on S if and only if
there is a sequence {Ck } of positive constants such that |fk (x) − f (x)| ≤ Ck for
all x ∈ S and limk→∞ Ck = 0.
Proof. If $f_k \to f$ uniformly, by (7.4) we can take $C_k = \sup_{x \in S} |f_k(x) - f(x)|$.
Conversely, if Ck → 0, for any ϵ > 0 there exists K such that Ck < ϵ whenever
k > K, and hence |fk (x) − f (x)| ≤ Ck < ϵ for all x ∈ S whenever k > K; that
is, (7.3) holds.

Let us take another look at the examples in (7.2) with regard to uniform
convergence. First, the sequence $\{f_k\}$ defined by $f_k(x) = k^{-1} \arctan kx$ converges
uniformly to 0 on $\mathbb{R}$, since we can take $C_k = \pi/2k$ in Theorem 7.5. Second, the
sequence $\{g_k\}$ defined by $g_k(x) = (k^2 x^2 + 1)^{-1}$ does not converge uniformly to
its limit $g$ on $\mathbb{R}$; indeed,

\[ \sup_{x \in \mathbb{R}} |g_k(x) - g(x)| = \sup_{x \ne 0} \frac{1}{k^2 x^2 + 1} = 1 \quad \text{for all } k. \]

(Notice that the supremum is not actually achieved; the maximum of $(k^2 x^2 + 1)^{-1}$
occurs at $x = 0$, but $g(0) = 1$, so $g_k(0) - g(0) = 0$. See Figure 7.2.) Finally, the
sequence $\{h_k\}$ defined by $h_k(x) = -2k^2 x (k^2 x^2 + 1)^{-2}$ does not converge
uniformly to its limit 0 on $\mathbb{R}$. Indeed, a bit of calculus shows that the minimum and
maximum values of $h_k(x)$, achieved at $x = \pm 1/\sqrt{3}\,k$, are $\mp 9k/8\sqrt{3}$, so
$\sup_x |h_k(x) - 0|$ actually tends to $\infty$ rather than 0.
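
The three suprema just discussed can be probed numerically. The sketch below is a hedged illustration with hypothetical names: instead of searching for the supremum, it evaluates $|g_k - g|$ at the point $x = 1/k^2$ (a choice made here because it lies ever closer to 0) and $|h_k|$ at its critical point $x = 1/\sqrt{3}\,k$.

```python
import math

def g(k, x):
    return 1.0 / (k * k * x * x + 1.0)

def h(k, x):
    return -2.0 * k * k * x / (k * k * x * x + 1.0) ** 2

def g_limit(x):
    # pointwise limit of g_k: 1 at x = 0 and 0 elsewhere
    return 1.0 if x == 0 else 0.0

for k in (1, 10, 100):
    x_g = 1.0 / k ** 2                  # a point near 0 where g_k is still close to 1
    x_h = 1.0 / (math.sqrt(3) * k)      # critical point of h_k
    print(k,
          math.pi / (2 * k),            # the bound C_k for f_k, which tends to 0
          abs(g(k, x_g) - g_limit(x_g)),
          abs(h(k, x_h)), 9 * k / (8 * math.sqrt(3)))
```

The first column tends to 0 (uniform convergence of $f_k$), the second approaches 1 (no uniform convergence of $g_k$), and the last two columns agree and grow linearly in $k$.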

FIGURE 7.2: Left: Uniform convergence. For $k$ large, the graph of
$f_k - f$ is contained in the shaded strip $|y| < \epsilon$. Right: Nonuniform
convergence of the sequence $\{g_k\}$ in (7.2). The spike of $g_k - g$ around
the origin becomes narrower as $k \to \infty$ but is never wholly within the
shaded strip.

On the other hand, the bad behavior in these examples is all at x = 0. The
sequences {gk } and {hk } do converge uniformly to 0 on the intervals [δ, ∞) and
$(-\infty, -\delta]$ for any $\delta > 0$. For $g_k$ this is clear:

\[ |g_k(x) - 0| = \frac{1}{k^2 x^2 + 1} \le \frac{1}{\delta^2 k^2 + 1} \qquad (x \le -\delta \text{ or } x \ge \delta), \]

and $(\delta^2 k^2 + 1)^{-1} \to 0$ as $k \to \infty$. For $h_k$ we do not get a good estimate for the
first few values of $k$, but (by the same bit of calculus as in the preceding paragraph)
when $k > 1/(\sqrt{3}\,\delta)$ the function $h_k$ is positive and increasing on $(-\infty, -\delta]$ and
negative and increasing on $[\delta, \infty)$, so the maximum of $|h_k|$ on these intervals occurs
at the endpoints $\pm\delta$:

\[ |h_k(x) - 0| \le \frac{2\delta k^2}{(\delta^2 k^2 + 1)^2} \qquad \Bigl( x \le -\delta \text{ or } x \ge \delta, \ k > \frac{1}{\sqrt{3}\,\delta} \Bigr). \]

The phenomenon exhibited here is quite common. That is, one has a sequence
$\{f_k\}$ of functions that converge pointwise to $f$ on a set $S$; the convergence is not
uniform on all of $S$ but is uniform on many “slightly smaller” subsets of $S$. The
situation we shall encounter most often is where $S$ is an open interval $(a, b)$, and
the “bad behavior” occurs near the endpoints, so that the convergence is uniform on
$[a + \delta, b - \delta]$ for any $\delta > 0$. In this case, the sequence of constants $C_k$ in Theorem
7.5 will generally depend on $\delta$, as they do in the preceding examples.
The notion of Cauchy sequence has an obvious adaptation to the context of uni-
form convergence. Namely, a sequence {fk } of functions on a set S is uniformly
Cauchy if for every ϵ > 0 there is an integer K so that

(7.6) |fj (x) − fk (x)| < ϵ whenever j, k > K and x ∈ S,



or in other words,

\[ \sup_{x \in S} |f_j(x) - f_k(x)| < \epsilon \quad \text{whenever } j, k > K. \]

We have the following analogue of Theorem 1.20:


7.7 Theorem. The sequence {fk } is uniformly Cauchy on S if and only if there is
a function f on S such that fk → f uniformly on S.
Proof. If {fk } is uniformly Cauchy, then for each x ∈ S the numerical sequence
{fk (x)} is Cauchy. By Theorem 1.20, it has a limit, which we call f (x). Letting
j → ∞ in (7.6), we see that |fk (x) − f (x)| ≤ ϵ whenever k > K and x ∈ S,
so that fk → f uniformly on S. Conversely, if fk → f uniformly on S, we have
|fk (x) − f (x)| ≤ Ck for all x ∈ S, where Ck → 0 as k → ∞, and

|fj (x) − fk (x)| ≤ |fj (x) − f (x)| + |f (x) − fk (x)| ≤ Cj + Ck ,

and Cj + Ck < ϵ when j and k are sufficiently large, so (7.6) holds.

One of the most important properties of uniform convergence is that it preserves
continuity, as mere pointwise convergence does not (see the example {gk } in (7.2)).

7.8 Theorem. Suppose fk → f uniformly on S. If each fk is continuous on S,
then so is f .
Proof. Given a point a ∈ S, we show that f is continuous at a. Given ϵ > 0, we
can choose k large enough so that |fk (x) − f (x)| < ϵ/3 for all x ∈ S. Since fk is
continuous, there exists δ > 0 so that |fk (x) − fk (a)| < ϵ/3 whenever |x − a| < δ
and x ∈ S. But then, under these same conditions on x, we have

\[ |f(x) - f(a)| \le |f(x) - f_k(x)| + |f_k(x) - f_k(a)| + |f_k(a) - f(a)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon, \]
which shows that f is continuous at a.

Theorem 7.8 can be strengthened somewhat, because the continuity of a function
f at a point a depends only on the behavior of f at points close to a. Hence,
if fk is continuous on S and fk → f pointwise on S, it is not necessary to have
uniform convergence on all of S to guarantee continuity of the limit function f ; it
is enough to have uniform convergence on some neighborhood of each point in S.
For example, if S is the interval (a, b) and fk → f uniformly on [a + δ, b − δ] for
each δ > 0, we conclude that f is continuous on [a + δ, b − δ] for each δ and hence
that f is continuous on all of (a, b).
7.1. Sequences and Series of Functions 317

The preceding discussion of sequences of functions leads immediately to results
about series of functions. Namely, given a sequence of functions $\{f_n\}_0^\infty$ defined
on a set $S$, we can form the infinite series $\sum_0^\infty f_n(x)$ for each $x \in S$. If this
series converges for each $x \in S$, we say that the series $\sum_0^\infty f_n$ is (pointwise)
convergent on $S$; in this case, its sum defines a function on $S$, which we also denote
by $\sum_0^\infty f_n$. The series $\sum_0^\infty f_n$ is said to be uniformly convergent on $S$ if the
sequence of partial sums, $s_k = \sum_0^k f_n$, is uniformly convergent on $S$.

EXAMPLE 2. The geometric series $\sum_0^\infty x^n$ converges pointwise on $(-1, 1)$ to
$(1 - x)^{-1}$. Denoting the $k$th partial sum by $s_k(x)$, we have
\[ s_k(x) = \frac{1 - x^{k+1}}{1 - x}, \quad \text{so} \quad \Bigl| s_k(x) - \frac{1}{1 - x} \Bigr| = \frac{|x|^{k+1}}{1 - x}. \]
The latter quantity tends to $\infty$ as $x \to 1$ and to $\tfrac{1}{2}$ as $x \to -1$ no matter what
$k$ is, so the convergence is not uniform on $(-1, 1)$. (This is hardly surprising,
since the series diverges at both endpoints.) But it is uniform on $[-r, r]$ for any
$r < 1$, for
\[ \frac{|x|^{k+1}}{1 - x} \le \frac{r^{k+1}}{1 - r} \quad \text{for } |x| \le r, \]
and this quantity vanishes as $k \to \infty$.
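
The estimate in Example 2 can be checked numerically. The following is a minimal sketch, not part of the original text: the value r = 0.9, the sampling grid, and the function names are illustrative assumptions. It compares the largest sampled error on [−r, r] with the bound r^(k+1)/(1 − r), and then shows that the error at the moving point x = 1 − 1/k does not shrink.

```python
def partial_sum(x, k):
    """k-th partial sum s_k(x) = 1 + x + ... + x**k of the geometric series."""
    return sum(x**n for n in range(k + 1))

def sup_error(k, points):
    """Largest sampled value of |s_k(x) - 1/(1 - x)| over the given points."""
    return max(abs(partial_sum(x, k) - 1.0 / (1.0 - x)) for x in points)

r = 0.9  # illustrative choice; any 0 < r < 1 behaves the same way
grid = [-r + 2 * r * j / 400 for j in range(401)]  # sample points in [-r, r]

for k in (10, 50, 100):
    bound = r ** (k + 1) / (1 - r)  # the bound from Example 2
    print(k, sup_error(k, grid), bound)

# Near x = 1 the error stays large however big k is: at x = 1 - 1/k it equals
# k * (1 - 1/k)**(k + 1), which grows roughly like k/e.
for k in (10, 50, 100):
    x = 1 - 1 / k
    print(k, abs(partial_sum(x, k) - 1.0 / (1.0 - x)))
```

A finite grid can only suggest the behavior of a supremum, so this is an illustration of the estimate rather than a proof of it.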

The following is the most commonly used test for uniform convergence of series:

7.9 Theorem (The Weierstrass M-Test). Let $\{f_n\}_0^\infty$ be a sequence of functions on
the set $S$. Suppose there is a sequence $\{M_n\}_0^\infty$ of positive constants such that (i)
$|f_n(x)| \le M_n$ for all $x \in S$ and all $n$, and (ii) $\sum_0^\infty M_n < \infty$. Then the series
$\sum_0^\infty f_n$ is absolutely and uniformly convergent on $S$.

Proof. The series $\sum_0^\infty f_n(x)$ is absolutely convergent for each $x \in S$ by comparison
to the series $\sum_0^\infty M_n$. Let us denote its sum by $s(x)$, the $k$th partial sum
$\sum_0^k f_n(x)$ by $s_k(x)$, and $\sum_{k+1}^\infty M_n$ by $C_k$; then
\[ |s(x) - s_k(x)| \le \sum_{k+1}^\infty |f_n(x)| \le \sum_{k+1}^\infty M_n = C_k \qquad (x \in S). \]
But $C_k \to 0$ as $k \to \infty$ since the series $\sum M_n$ is convergent, so it follows from
Theorem 7.5 that the sequence $\{s_k\}$, i.e., the series $\sum f_n$, is uniformly convergent
on $S$.
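
The tail bound C_k in this proof is easy to observe numerically. The sketch below is an illustration under stated assumptions, not part of the original text: it uses the series with terms x^n/n^2 on [−1, 1] and M_n = 1/n^2 (the sum starts at n = 1), takes the known value π²/6 for the sum of 1/n^2, and uses a long partial sum as a stand-in for the full sum s(x).

```python
import math

def f(n, x):
    """n-th term of the illustrative series sum_{n>=1} x**n / n**2."""
    return x**n / n**2

def partial_sum(x, k):
    """s_k(x) = f(1, x) + ... + f(k, x)."""
    return sum(f(n, x) for n in range(1, k + 1))

def tail_bound(k):
    """C_k = sum_{n>k} M_n with M_n = 1/n**2, using sum_{n>=1} 1/n**2 = pi**2/6."""
    return math.pi**2 / 6 - sum(1.0 / n**2 for n in range(1, k + 1))

N = 2000  # length of the partial sum used in place of the full sum s(x)
grid = [-1 + 2 * j / 100 for j in range(101)]  # sample points in [-1, 1]
reference = {x: partial_sum(x, N) for x in grid}

for k in (5, 20, 80):
    worst = max(abs(reference[x] - partial_sum(x, k)) for x in grid)
    print(k, worst, tail_bound(k))
```

In each row the sampled error stays below C_k, and both columns shrink together as k grows, which is the uniformity the theorem asserts.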

The tribute to Weierstrass in the name of this theorem is appropriate, since
Weierstrass was one of the pioneers in the rigorous theory of infinite series; but the
term “M-test” signifies nothing more than the fact that the sequence of constants in
the theorem is traditionally denoted by {Mn }.
It is quite possible for a series of functions to be uniformly convergent on
S without being absolutely convergent. (See Exercises 5 and 6.) Therefore, the
Weierstrass M-test, unlike its cousin Theorem 7.5, gives a sufficient condition for
uniform convergence but not a necessary one.

EXAMPLE 3. The M-test gives an easy verification that the geometric series
$\sum_0^\infty x^n$ converges uniformly on $[-r, r]$ for any $r < 1$, by taking $M_n = r^n$.
($|x^n| \le r^n$ for $|x| \le r$, and $\sum r^n < \infty$.)

EXAMPLE 4. The Taylor series for $\log(1 + x)$, $\sum_1^\infty (-1)^{n-1} x^n / n$, converges
absolutely for $x \in (-1, 1)$ (by the ratio test) and conditionally at $x = 1$ (by the
alternating series test). Since $|(-1)^{n-1} x^n / n| \le r^n / n$ when $|x| \le r$, the M-test
(with $M_n = r^n / n$) shows that this series converges uniformly on $[-r, r]$ for
any $r < 1$. It actually converges uniformly on $[-r, 1]$ for any $r < 1$, but the
M-test will not yield this result because the convergence at 1 is only conditional.
(The result needed here is a theorem of Abel that we shall present in §7.3.)

Theorem 7.8, concerning the continuity of limits of sequences, translates immediately
into a theorem about continuity of sums of series, as follows:

7.10 Theorem. Suppose $\{f_n\}$ is a sequence of continuous functions on a set $S$. If
the series $\sum f_n$ converges uniformly on $S$, its sum is a continuous function on $S$.

Proof. Apply Theorem 7.8 to the sequence of partial sums.

The remarks following Theorem 7.8, to the effect that local uniform conver-
gence is enough to yield continuity, apply to this situation also.

EXERCISES

1. For each of the following sequences $\{f_k\}$ of functions, compute $\lim_{k\to\infty} f_k$ on
the given interval and tell whether the convergence is uniform on that interval.
If not, is the convergence uniform on some slightly smaller sets?
a. $f_k(x) = x^k$, $x \in [0, 1]$.
b. $f_k(x) = x^{1/k}$, $x \in [0, 1]$.
c. $f_k(x) = \sin^k x$, $x \in [0, \pi]$.
d. $f_k(x) = k^{-1} e^{-x^2/k}$, $x \in \mathbb{R}$.
e. $f_k(x) = kxe^{-kx}$, $x \in [0, \infty)$.
f. $f_k(x) = (x/k)e^{-x/k}$, $x \in [0, \infty)$.
g. $f_k(x) = x^k/(1 + x^{2k})$, $x \in [0, \infty)$.
2. Test the following series for absolute and uniform convergence; state the
interval(s) on which you obtain such convergence. What can you conclude about
the continuity of the sum of the series?
a. $\sum_0^\infty e^{-nx}$.
b. $\sum_0^\infty \dfrac{x^n}{n^2 + n + 1}$.
c. $\sum_1^\infty \dfrac{n x^n}{2^{n+3}}$.
d. $\sum_1^\infty \dfrac{\cos nx}{n^3}$.
e. $\sum_1^\infty \dfrac{1}{x^2 + n^2}$.
f. $\sum_1^\infty n^{-x}$.
3. Let $f_k(x) = g(x)x^k$, where $g$ is continuous on $[0, 1]$ and $g(1) = 0$. Show that
$f_k \to 0$ uniformly on $[0, 1]$. (Cf. Exercise 1a.)
4. Show that the series $\sum_1^\infty \dfrac{1}{x^2 - n^2}$ converges uniformly on any compact interval
that does not contain a nonzero integer, and conclude that the sum of the series
is a continuous function on $\mathbb{R} \setminus \{\pm 1, \pm 2, \ldots\}$.
5. Show that the series $\sum_1^\infty \dfrac{(-1)^{n-1}}{x^2 + n}$ converges uniformly on $\mathbb{R}$, although the
convergence is conditional at every point.
6. Given a sequence $\{c_n\}$ of real numbers such that $\sum_1^\infty c_n$ converges, consider
the series $\sum_1^\infty c_n \dfrac{x^n}{1 - x^n}$ ($x \ne \pm 1$). (Such a series is called a Lambert series.)
a. Show that the series converges absolutely and uniformly on $[-a, a]$ for any
$a < 1$.
b. Show that the series converges uniformly on $(-\infty, -b]$ and on $[b, \infty)$ for
any $b > 1$, and that the convergence is absolute if and only if $\sum_1^\infty |c_n| < \infty$.
(Hint: $x^n (1 - x^n)^{-1} = (1 - x^n)^{-1} - 1$.)
7. Let $\{f_k\}$ be a sequence of functions defined on a set $S$, and let $S_1, \ldots, S_M$ be
a finite collection of subsets of $S$. Show that if $\{f_k\}$ converges uniformly on
each $S_m$, then it converges uniformly on $\bigcup_1^M S_m$.
8. Let {fk } be a sequence of continuous functions on [a, b]. Show that if {fk }
converges uniformly on (a, b), then it converges uniformly on [a, b].
9. Let {fk } be a sequence of continuous functions on a compact set S ⊂ Rn .
Suppose that (a) the sequence {fk (x)} is bounded and increasing (and hence
has a limit) for each x ∈ S, and (b) the function f = limk→∞ fk is continuous
on S. Show that fk → f uniformly on S. (Hint: Given ϵ > 0, apply Exercise
5 in §1.6 to the sets Sk = {x ∈ S : f (x) − fk (x) ≥ ϵ}.)

7.2 Integrals and Derivatives of Sequences and Series


If $\{f_k\}$ is a sequence of functions on the interval $[a, b]$ and $f_k \to f$ on $[a, b]$, is
it true that $\int_a^b f_k(x)\,dx \to \int_a^b f(x)\,dx$? The sequence $\{h_k\}$ in (7.2) shows that
the answer is sometimes no. The best general affirmative result in the context of
Riemann integration is the bounded convergence theorem that we stated in §4.5.
As we indicated there, the proof of that theorem is beyond the scope of this book;
however, uniform convergence yields an affirmative result with an easy proof. It
works equally well for n-dimensional integrals, so we present it in that generality.

7.11 Theorem. Suppose $S$ is a measurable set in $\mathbb{R}^n$ and $\{f_k\}$ is a sequence of
integrable functions on $S$ that converges uniformly to an integrable function $f$ on
$S$. Then
\[ \int\!\cdots\!\int_S f(x)\,d^n x = \lim_{k\to\infty} \int\!\cdots\!\int_S f_k(x)\,d^n x. \]

Proof. By Theorem 7.5, there is a sequence $\{C_k\}$ of constants such that $C_k \to 0$
and $|f_k(x) - f(x)| \le C_k$ for $x \in S$. But then
\[ \Bigl| \int\!\cdots\!\int_S f_k(x)\,d^n x - \int\!\cdots\!\int_S f(x)\,d^n x \Bigr| \le \int\!\cdots\!\int_S |f_k(x) - f(x)|\,d^n x \le \int\!\cdots\!\int_S C_k\,d^n x. \]
This last quantity is the n-dimensional volume of $S$ times $C_k$, which tends to zero
as $k \to \infty$.

Returning to the one-dimensional situation, we now ask the corresponding
question for derivatives: If $f_k \to f$, is it true that $f_k' \to f'$? Equivalently, setting
$g_k = f_k - f$, if $g_k \to 0$, is it true that $g_k' \to 0$? Here the answer is clearly no
in general; the function $g_k$ can be very small but also very wiggly, so that $g_k'$ is not
small.

EXAMPLE 1. Let $g_k(x) = k^{-1} \sin kx$. Then $|g_k(x)| \le k^{-1}$ for all $x$, so
$g_k \to 0$ uniformly on $\mathbb{R}$. On the other hand, $g_k'(x) = \cos kx$; the sequence
$\{\cos kx\}$ does not converge at all for most values of $x$, and when it does
(namely, when $x$ is an even multiple of $\pi$) its limit is 1, not 0.
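
The contrast in Example 1 between the size of g_k and the size of its derivative can be seen in a few lines. This is a small illustrative sketch; the sampling grid on [−10, 10] and the function names are assumptions made for the illustration, not part of the original text.

```python
import math

def g(k, x):
    """g_k(x) = sin(kx)/k from Example 1."""
    return math.sin(k * x) / k

def g_prime(k, x):
    """The derivative g_k'(x) = cos(kx)."""
    return math.cos(k * x)

grid = [-10 + 20 * j / 2000 for j in range(2001)]  # sample points in [-10, 10]

for k in (1, 10, 100, 1000):
    sup_g = max(abs(g(k, x)) for x in grid)
    # sup_g is at most 1/k, yet g_k'(0) is always 1, and g_k'(1) = cos(k)
    # keeps oscillating instead of settling down.
    print(k, sup_g, g_prime(k, 0.0), g_prime(k, 1.0))
```

The second column shrinks like 1/k while the third stays at 1, which is the failure of convergence of the derivatives to 0 described above.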

In this situation, the crucial uniformity hypothesis is not on the original se-
quence {fk } but on the differentiated sequence {fk′ }. Here is the result:

7.12 Theorem. Let $\{f_k\}$ be a sequence of functions of class $C^1$ on the interval
$[a, b]$. Suppose that $\{f_k\}$ converges pointwise to $f$ and that $\{f_k'\}$ converges
uniformly to $g$ on $[a, b]$. Then $f$ is of class $C^1$ on $[a, b]$, and $g = f'$.

Proof. The function $g$ is continuous on $[a, b]$ by Theorem 7.8, so it is integrable
over any subinterval of $[a, b]$. By Theorem 7.11,
\[ \int_a^x g(t)\,dt = \lim_{k\to\infty} \int_a^x f_k'(t)\,dt = \lim_{k\to\infty} \bigl[ f_k(x) - f_k(a) \bigr] = f(x) - f(a). \]
Thus $f(x) = f(a) + \int_a^x g(t)\,dt$. But by the fundamental theorem of calculus, the
function on the right is differentiable and its derivative is $g$.

The example {fk } in (7.2) shows that pointwise convergence of {fk′ } is not
sufficient to obtain lim(fk′ ) = (lim fk )′ . On the other hand, Theorem 7.12 can be
extended somewhat. Since differentiability (like continuity) is a local property, it is
enough for the convergence of {fk′ } to be uniform on a neighborhood of each point,
rather than on the whole interval in question. In many situations, the sequence
{fk } is defined on an open interval (a, b) and one has uniform convergence of
{fk′ } on each compact subinterval [a + δ, b − δ]; this suffices to guarantee that
lim(fk′ ) = (lim fk )′ on (a, b).
The results on term-by-term integration and differentiation of series are imme-
diate consequences of those for sequences. We have merely to apply Theorems
7.11 and 7.12 to the partial sums of the series to obtain the following theorem.

7.13 Theorem. Suppose that $\{f_n\}$ is a sequence of continuous functions on the
interval $[a, b]$ and that the series $\sum f_n$ converges pointwise on $[a, b]$.
a. If $\sum f_n$ converges uniformly on $[a, b]$, then
\[ \int_a^b \Bigl( \sum f_n(x) \Bigr) dx = \sum \int_a^b f_n(x)\,dx. \]
b. If the $f_n$'s are of class $C^1$ and the series $\sum f_n'$ converges uniformly on $[a, b]$,
then the sum $\sum f_n$ is of class $C^1$ on $[a, b]$ and
\[ \frac{d}{dx} \Bigl( \sum f_n(x) \Bigr) = \sum f_n'(x) \qquad (x \in [a, b]). \]

EXERCISES

1. Let $f(x) = \sum_1^\infty n^{-2} \sin nx$. Show that $f$ is a continuous function on $\mathbb{R}$ and
that $\int_0^{\pi/2} f(x)\,dx = \sum_{n=1,3,5,\ldots} n^{-3} + 2 \sum_{n=2,6,10,\ldots} n^{-3}$.
2. Let $f(x) = \sum_1^\infty (x + n)^{-2}$. Show that $f$ is a continuous function on $[0, \infty)$
and that $\int_0^1 f(x)\,dx = 1$.
3. Let $f_k(x) = x \arctan kx$.
a. Show that $\lim_{k\to\infty} f_k(x) = \tfrac{1}{2}\pi|x|$.
b. Show that $\lim_{k\to\infty} f_k'(x)$ exists for every $x$, including $x = 0$, but that the
convergence is not uniform in any interval containing 0.
4. For each of the series (a–f) in Exercise 2, §7.1, show that the series can be
differentiated term-by-term on its interval of convergence (except at the endpoints
in (b)).
5. For $x \ne \pm 1, \pm 2, \ldots$, let $f(x) = 2x \sum_1^\infty (x^2 - n^2)^{-1}$ (see Exercise 4, §7.1).
Show that $f$ is of class $C^1$ on its domain and that
$f'(x) = -\sum_1^\infty \bigl[ (x - n)^{-2} + (x + n)^{-2} \bigr]$.
6. Let $f$ be a continuous function on $[0, \infty)$ such that $0 \le f(x) \le Cx^{-1-\epsilon}$ for
some $C, \epsilon > 0$, and let $a = \int_0^\infty f(x)\,dx$. (The estimate on $f$ implies the
convergence of this integral.) Let $f_k(x) = kf(kx)$.
a. Show that $\lim_{k\to\infty} f_k(x) = 0$ for all $x > 0$ and that the convergence is
uniform on $[\delta, \infty)$ for any $\delta > 0$.
b. Show that $\lim_{k\to\infty} \int_0^1 f_k(x)\,dx = a$.
c. Show that $\lim_{k\to\infty} \int_0^1 f_k(x)g(x)\,dx = ag(0)$ for any integrable function $g$
on $[0, 1]$ that is continuous at 0. (Hint: Write $\int_0^1 = \int_0^\delta + \int_\delta^1$.)

7.3 Power Series


A power series is an infinite series of the form

\[ \sum_0^\infty a_n (x - b)^n = a_0 + a_1 (x - b) + a_2 (x - b)^2 + \cdots, \tag{7.14} \]

where $x$ is a real or complex variable. The lower limit of summation is always
$n = 0$ in principle, although the first few terms might vanish ($a_0 = \cdots = a_k = 0$);
the crucial point is that only nonnegative integer powers of x−b are allowed. (Thus,
one might think of a power series as a “polynomial of infinite degree in x − b.”)
The study of series of the general form (7.14) can be reduced to the special case
b = 0 by the change of variable x → x + b, and we do so henceforth.
The first order of business in studying power series is to determine the range of
values of the variable x for which they converge. The key observation is as follows.
7.15 Lemma. If the power series $\sum_0^\infty a_n x^n$ converges for $x = x_0$, then it
converges absolutely for all $x$ such that $|x| < |x_0|$.

Proof. The convergence of $\sum a_n x_0^n$ implies that $a_n x_0^n \to 0$, and in particular that
$|a_n x_0^n| \le C$ for some constant $C$ independent of $n$. Since
\[ |a_n x^n| = |a_n x_0^n| \, \Bigl| \frac{x^n}{x_0^n} \Bigr| \le C \, \Bigl| \frac{x}{x_0} \Bigr|^n, \]
for $|x| < |x_0|$ the series $\sum a_n x^n$ converges absolutely by comparison with the
geometric series $\sum |x/x_0|^n$.

7.16 Theorem. For any power series $\sum_0^\infty a_n x^n$, there is a number $R \in [0, \infty]$,
called the radius of convergence of the series, such that the series converges
absolutely for $|x| < R$ and diverges for $|x| > R$. (When $R = 0$, this means that the
series converges only for $x = 0$; when $R = \infty$, it means that the series converges
absolutely for all $x$.)

Proof. Let $R = \sup\{|x_0| : \sum a_n x_0^n \text{ converges}\}$. ($R \ge 0$ since the series always
converges at $x_0 = 0$.) Thus $\sum a_n x^n$ diverges if $|x| > R$. On the other hand, if
$|x| < R$, there exists $x_0$ such that $|x_0| > |x|$ and $\sum a_n x_0^n$ converges, and then
$\sum a_n x^n$ converges absolutely by Lemma 7.15.

Important Remark. The reader has probably been thinking of an and x as real
numbers, but Theorem 7.16 is valid, with exactly the same proof, when an and x
are complex numbers.

Theorem 7.16 says that the set of all real $x$ such that $\sum a_n x^n$ converges is an
open interval centered at 0, possibly together with one or both endpoints, and the
set of all complex $x$ such that $\sum a_n x^n$ converges is an open disc centered at 0 in
the complex plane, possibly together with some or all of its boundary points. The
behavior of the series on the boundary of the region of convergence must be decided
on a case-by-case basis.

EXAMPLE 1. Consider the series
\[ \sum_1^\infty \frac{x^n}{n^2}, \qquad \sum_0^\infty x^n, \qquad \sum_1^\infty \frac{x^n}{n}. \]
An easy application of the ratio test shows that each of these series converges
absolutely for $|x| < 1$ and diverges for $|x| > 1$, so their radius of convergence
is 1. The first one is absolutely convergent when $|x| = 1$ by comparison with
$\sum n^{-2}$, whereas the second is divergent when $|x| = 1$ because $x^n \not\to 0$ as
$n \to \infty$ in that case. The third one is divergent when $x = 1$ but is conditionally
convergent at $x = -1$ by the alternating series test. It is also conditionally
convergent at all other complex numbers $x$ such that $|x| = 1$, by Dirichlet’s
test. (Indeed, take $a_n = n^{-1}$ and $b_n = x^n$. Then $b_1 + \cdots + b_n$ is a finite
geometric series whose sum equals $x(1 - x^n)/(1 - x)$, and this is bounded by
$2|x|/|1 - x|$ for all $n$.)

The standard tools for determining the radius of convergence of a power series
are the ratio test and the root test. We have already seen how this works in §6.4
(especially Example 2 and Exercises 1–9), so we shall not belabor the point here.
However, see Exercise 1. In fact, a slight extension of the root test yields a formula
for the radius of convergence of an arbitrary power series; see Exercise 4.
Theorem 7.16 shows that any power series converges absolutely within the re-
gion |x| < R. Equally important is that it converges uniformly on compact subsets
of this region.
7.17 Theorem. Let $R$ be the radius of convergence of $\sum_0^\infty a_n x^n$. For any $r < R$,
the series $\sum_0^\infty a_n x^n$ converges uniformly on the set $\{x : |x| \le r\}$, and its sum is a
continuous function on the set $\{x : |x| < R\}$.

Proof. For $|x| \le r$ we have $|a_n x^n| \le |a_n| r^n$, and the series $\sum |a_n| r^n$ is convergent
since $\sum a_n x^n$ is absolutely convergent at $x = r$. The first assertion therefore
follows from the Weierstrass M-test, and the second follows from the first by Theorem
7.8.

We now turn to the question of integrating power series. In this discussion we
take $x$ to be a real variable.

7.18 Theorem. Suppose the series $f(x) = \sum_0^\infty a_n x^n$ has radius of convergence
$R > 0$.
a. If $-R < a < b < R$, then $\displaystyle \int_a^b f(x)\,dx = \sum_0^\infty a_n \frac{b^{n+1} - a^{n+1}}{n + 1}$.
b. If $F$ is any antiderivative of $f$, then $\displaystyle F(x) = F(0) + \sum_0^\infty \frac{a_n}{n + 1} x^{n+1}$ for $|x| < R$.

Proof. Assertion (a) follows immediately from Theorems 7.13a and 7.17. The
fundamental theorem of calculus then shows that $\sum_0^\infty a_n x^{n+1}/(n + 1)$ is an
antiderivative of $f$ on $(-R, R)$ (specifically, the one whose value at $x = 0$ is zero), and
any other antiderivative differs from this one by a constant.

Theorem 7.18 gives a way of generating new series expansions from old ones.
EXAMPLE 2. If we integrate the geometric series $\sum_0^\infty (-x)^n = (1 + x)^{-1}$
($|x| < 1$), we obtain
\[ \log(1 + x) = \int_0^x \frac{dt}{1 + t} = \sum_0^\infty \frac{(-1)^n}{n + 1} x^{n+1} = \sum_1^\infty \frac{(-1)^{n-1}}{n} x^n \qquad (|x| < 1). \]
(The last equality is obtained by the change of variable $n \to n - 1$.) Similarly,
integration of the geometric series $\sum_0^\infty (-x^2)^n = (1 + x^2)^{-1}$ leads to
\[ \arctan x = \int_0^x \frac{dt}{1 + t^2} = \sum_0^\infty \frac{(-1)^n x^{2n+1}}{2n + 1} \qquad (|x| < 1). \]

The series for log(1+x) is easily obtained from Taylor’s theorem (see Exercise
3 in §6.1), but not the series for arctan x; the computation of the high-order
derivatives of the latter function is very cumbersome. (Remark: The expansion
of log(1+x) is also valid at x = 1, and that of arctan x is also valid at x = ±1.
However, these facts do not follow from Theorem 7.18. The extra result needed
here is Abel’s theorem, which we shall present below.)
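
The two expansions of Example 2 can be compared with library values of log(1 + x) and arctan x. The following is a minimal sketch for illustration only; the function names, the sample points, and the number of terms are assumptions, not part of the original text.

```python
import math

def log1p_series(x, terms):
    """Partial sum of sum_{n>=1} (-1)**(n-1) * x**n / n, using `terms` terms."""
    return sum((-1) ** (n - 1) * x**n / n for n in range(1, terms + 1))

def arctan_series(x, terms):
    """Partial sum of sum_{n>=0} (-1)**n * x**(2n+1) / (2n+1), using `terms` terms."""
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(terms))

for x in (0.1, 0.5, -0.5, 0.9):
    # Print the differences from the library functions; they should be tiny.
    print(x,
          log1p_series(x, 200) - math.log1p(x),
          arctan_series(x, 200) - math.atan(x))
```

Convergence is fast for small |x| and noticeably slower as |x| approaches 1, in line with the geometric comparison used in Lemma 7.15.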

Theorem 7.18 also offers a technique for expressing definite or indefinite inte-
grals of functions that have no elementary antiderivatives in a computable form.

EXAMPLE 3. The function $f(x) = x^{-1} \sin x$ has no elementary antiderivative,
but
\[ \int_0^x \frac{\sin t}{t}\,dt = \int_0^x \sum_0^\infty \frac{(-1)^m t^{2m}}{(2m + 1)!}\,dt = \sum_0^\infty \frac{(-1)^m x^{2m+1}}{(2m + 1) \cdot (2m + 1)!}. \]
This gives a precise analytic expression for $\int_0^x t^{-1} \sin t\,dt$ that is valid for all
$x$, and the first few terms, $x - \tfrac{1}{18} x^3 + \tfrac{1}{600} x^5 + \cdots$, furnish a good numerical
approximation to the integral when $x$ is not too large.
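
As an illustration of Example 3, the series can be compared with a direct numerical quadrature of the integral. This sketch is not from the original text: the composite Simpson rule, the helper names, and the choice of sample points are assumptions made for the comparison.

```python
import math

def sine_integral_series(x, terms=20):
    """Partial sum of sum_{m>=0} (-1)**m x**(2m+1) / ((2m+1) * (2m+1)!)."""
    return sum(
        (-1) ** m * x ** (2 * m + 1) / ((2 * m + 1) * math.factorial(2 * m + 1))
        for m in range(terms)
    )

def sinc(t):
    """sin(t)/t, extended by its limiting value 1 at t = 0."""
    return 1.0 if t == 0 else math.sin(t) / t

def simpson(func, a, b, n=1000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * func(a + j * h)
    return total * h / 3

for x in (0.5, 1.0, 2.0):
    three_terms = x - x**3 / 18 + x**5 / 600  # the terms quoted in Example 3
    print(x, sine_integral_series(x), three_terms, simpson(sinc, 0.0, x))
```

For moderate x the three quoted terms already agree with the quadrature to a few decimal places, and the longer partial sum agrees to many more.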
Next, what about term-by-term differentiation of a power series $\sum_0^\infty a_n x^n$?
According to Theorem 7.13b, we must examine the convergence of the series
$\sum_0^\infty n a_n x^{n-1}$ obtained by termwise differentiation, which we shall call the
derived series. At first glance, the latter series seems less likely to converge than
the original series, since the $n$th term of the derived series is much larger than the
corresponding term of the original series when $n$ is large (by a factor of $n/|x|$). But
in fact, the only values of $x$ for which this really matters are those on the boundary
of the interval (or disc) of convergence; elsewhere, the exponential behavior of $x^n$
as $n \to \infty$ swamps the extra factor of $n$, as will be seen in the following proof.
7.19 Theorem. The radius of convergence of any power series $\sum_0^\infty a_n x^n$ is equal
to the radius of convergence of the derived series $\sum_0^\infty n a_n x^{n-1}$.

Proof. Let $R$ and $R'$ be the radii of convergence of $\sum_0^\infty a_n x^n$ and $\sum_0^\infty n a_n x^{n-1}$,
respectively. Suppose $|x| < R'$. Then $\sum n a_n x^{n-1}$ is absolutely convergent, and
\[ |a_n x^n| = \frac{|x|}{n} \, |n a_n x^{n-1}| \le |n a_n x^{n-1}| \quad \text{for large } n, \]
so $\sum a_n x^n$ is absolutely convergent by comparison. Thus, if $|x| < R'$ then $|x| \le R$,
and it follows that $R' \le R$.
On the other hand, if $|x| < R$, we can pick a number $r$ such that $|x| < r < R$.
Then the series $\sum a_n r^n$ is absolutely convergent, and
\[ |n a_n x^{n-1}| = \frac{1}{|x|} \Bigl( n \Bigl| \frac{x}{r} \Bigr|^n \Bigr) |a_n| r^n. \]
Since $|x/r| < 1$, the sequence $|x/r|^n$ tends to 0 exponentially fast as $n \to \infty$,
and hence $n|x/r|^n \to 0$ also. In particular, we have $|n a_n x^{n-1}| \le |a_n| r^n$ for $n$
large, so $\sum n a_n x^{n-1}$ converges (absolutely) by comparison to $\sum |a_n r^n|$. In short,
if $|x| < R$ then $|x| \le R'$, and it follows that $R \le R'$. Combining this inequality
with the one in the preceding paragraph, we conclude that $R = R'$.

Combining this result with Theorem 7.13b, we obtain the fundamental theorem
on term-by-term differentiation of a power series.
7.20 Theorem. Suppose the radius of convergence of the series $f(x) = \sum a_n x^n$
is $R > 0$. Then the function $f$ is of class $C^\infty$ on the interval $(-R, R)$, and its $k$th
derivative may be computed on $(-R, R)$ by differentiating the series $\sum_0^\infty a_n x^n$
termwise $k$ times.

Proof. In view of Theorem 7.19, Theorem 7.13b shows that $f'(x) = \sum n a_n x^{n-1}$
for $|x| < R$. It now follows by induction on $k$ that, for any positive integer $k$, $f$ is
of class $C^k$ on $(-R, R)$ and that $f^{(k)}$ is the sum of the $k$-times derived series.
7.21 Corollary. Every power series $\sum_0^\infty a_n x^n$ with a positive radius of
convergence is the Taylor series of its sum; that is, if $f(x) = \sum_0^\infty a_n x^n$ for $|x| < R$
($R > 0$), then
\[ a_n = \frac{f^{(n)}(0)}{n!}. \]
Proof. Since $(d/dx)^n x^k = 0$ when $k < n$ and $(d/dx)^n x^n \equiv n!$, we have
\[ f^{(n)}(x) = \frac{d^n}{dx^n} \bigl( a_0 + a_1 x + \cdots + a_n x^n + \cdots \bigr) = n!\,a_n + \cdots, \]
where the last set of dots denotes terms containing positive powers of $x$. Setting
$x = 0$, we obtain $f^{(n)}(0) = n!\,a_n$.

7.22 Corollary. If $\sum_0^\infty a_n x^n = \sum_0^\infty b_n x^n$ for $|x| < R$ ($R > 0$), then $a_n = b_n$ for
all $n$.
Proof. We have $a_n = f^{(n)}(0)/n! = b_n$ where $f(x)$ is the common sum of the two
series.

The following examples will illustrate the use of Theorem 7.20. The second one
contains a result of importance in its own right, the binomial formula for fractional
and negative exponents.
EXAMPLE 4. Suppose we wish to express the sum of the series $\sum_1^\infty x^n/n^2$
in terms of familiar elementary functions. The key is to recognize that this
series is related to the geometric series $\sum_0^\infty x^n$, and that the factors of $1/n$
should arise from integrating the latter series. With this in mind, we proceed as
follows. Setting $f(x) = \sum_1^\infty x^n/n^2$, we obtain successively
\[ f'(x) = \sum_1^\infty \frac{x^{n-1}}{n}, \qquad xf'(x) = \sum_1^\infty \frac{x^n}{n}, \qquad (xf')'(x) = \sum_1^\infty x^{n-1} = \frac{1}{1 - x}. \]
Undoing these transformations in turn yields
\[ xf'(x) = -\log(1 - x), \qquad f'(x) = -\frac{\log(1 - x)}{x}, \]
and, finally,
\[ f(x) = -\int_0^x \frac{\log(1 - t)}{t}\,dt. \]
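
The integral formula obtained in Example 4 can be checked against the series numerically. The sketch below is illustrative only: the Simpson rule, the helper names, and the sample points are assumptions, and the integrand is extended by its limiting value 1 at t = 0.

```python
import math

def dilog_series(x, terms=200):
    """Partial sum of sum_{n>=1} x**n / n**2."""
    return sum(x**n / n**2 for n in range(1, terms + 1))

def integrand(t):
    """-log(1 - t)/t, extended by its limiting value 1 at t = 0."""
    return 1.0 if t == 0 else -math.log1p(-t) / t

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals; b < a is allowed."""
    h = (b - a) / n
    total = func(a) + func(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * func(a + j * h)
    return total * h / 3

for x in (0.25, 0.5, -0.5, 0.9):
    print(x, dilog_series(x), simpson(integrand, 0.0, x))
```

The two columns agree to many decimal places for each sample point inside (−1, 1).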
EXAMPLE 5. Let $\alpha$ be a real number. Since
\[ \frac{d^n}{dx^n} (1 + x)^\alpha = \alpha(\alpha - 1) \cdots (\alpha - n + 1)(1 + x)^{\alpha - n}, \]
the Taylor series of $(1 + x)^\alpha$ is
\[ f_\alpha(x) = \sum_{n=0}^\infty \binom{\alpha}{n} x^n, \quad \text{where} \quad \binom{\alpha}{n} = \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!} \tag{7.23} \]
(with the understanding that $\binom{\alpha}{0} = 1$). This series is called the binomial series
of order $\alpha$. When $\alpha$ is a nonnegative integer $k$, the terms with $n > k$ all vanish
since they contain a factor of $(\alpha - k)$, and we obtain the familiar binomial
expansion formula for $(1 + x)^k$. For other values of $\alpha$, the Taylor series is a
genuine infinite series, and one can easily check by the ratio test that its radius
of convergence is 1. Our aim is to verify that the sum of this series is actually
$(1 + x)^\alpha$ for $|x| < 1$.
We need the following formulas concerning the generalized binomial
coefficients $\binom{\alpha}{n}$:
\[ n \binom{\alpha}{n} = \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{(n - 1)!} = \alpha \binom{\alpha - 1}{n - 1}; \tag{7.24} \]
\[ \binom{\alpha}{n} = \frac{[(\alpha - n) + n](\alpha - 1) \cdots (\alpha - n + 1)}{n!} = \binom{\alpha - 1}{n} + \binom{\alpha - 1}{n - 1}. \tag{7.25} \]
Now, if $f_\alpha(x)$ is defined by (7.23) for $|x| < 1$, by (7.24) we have
\[ f_\alpha'(x) = \sum_1^\infty n \binom{\alpha}{n} x^{n-1} = \alpha \sum_1^\infty \binom{\alpha - 1}{n - 1} x^{n-1} = \alpha \sum_0^\infty \binom{\alpha - 1}{n} x^n = \alpha f_{\alpha-1}(x). \]
(For the third equality we have made the change of variable $n \to n + 1$.) On
the other hand,
\[ (1 + x) f_{\alpha-1}(x) = \sum_0^\infty \binom{\alpha - 1}{n} x^n + \sum_0^\infty \binom{\alpha - 1}{n} x^{n+1} = \sum_0^\infty \Bigl[ \binom{\alpha - 1}{n} + \binom{\alpha - 1}{n - 1} \Bigr] x^n = \sum_0^\infty \binom{\alpha}{n} x^n = f_\alpha(x). \]
In the second equality, we substituted $n - 1$ for $n$ in the second sum, and
the third equality comes from (7.25). Combining these results, we see that
$(1 + x) f_\alpha'(x) = \alpha f_\alpha(x)$. Multiplying through by $(1 + x)^{-\alpha-1}$ yields
\[ 0 = (1 + x)^{-\alpha} f_\alpha'(x) - \alpha(1 + x)^{-\alpha-1} f_\alpha(x) = \frac{d}{dx} \bigl[ (1 + x)^{-\alpha} f_\alpha(x) \bigr]. \]
Thus $(1 + x)^{-\alpha} f_\alpha(x)$ is a constant $C$, and setting $x = 0$, we see that $C =
f_\alpha(0) = 1$. In short, $f_\alpha(x) = (1 + x)^\alpha$, as claimed.
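
The identities (7.24) and (7.25) and the conclusion of Example 5 lend themselves to a quick machine check. The following sketch is an illustration, not part of the original text: the helper names are hypothetical, exact rational arithmetic is used for the coefficient identities, and the choices α = 1/2 and α = −3/2 are arbitrary sample values.

```python
from fractions import Fraction

def binom(alpha, n):
    """Generalized binomial coefficient alpha(alpha-1)...(alpha-n+1)/n!, with binom(alpha, 0) = 1."""
    result = Fraction(1)
    for j in range(n):
        result *= Fraction(alpha - j) / (j + 1)
    return result

def binomial_series(alpha, x, terms):
    """Partial sum of the binomial series (7.23), using `terms` terms."""
    return sum(float(binom(alpha, n)) * x**n for n in range(terms))

alpha = Fraction(-3, 2)
for n in range(1, 8):
    # (7.24) and (7.25), checked exactly with rational arithmetic
    print(n,
          n * binom(alpha, n) == alpha * binom(alpha - 1, n - 1),
          binom(alpha, n) == binom(alpha - 1, n) + binom(alpha - 1, n - 1))

alpha = Fraction(1, 2)
for x in (0.3, -0.5, 0.9):
    print(x, binomial_series(alpha, x, 200), (1 + x) ** float(alpha))
```

The identity checks print True for every n tried, and the partial sums match (1 + x)^α closely for the sample points in (−1, 1).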
EXAMPLE 6. The series $\sum_0^\infty (-1)^n x^{2n}$ is a geometric series with ratio $-x^2$,
so it converges to $(1 + x^2)^{-1}$ for $|x| < 1$ and diverges elsewhere. By Corollary
7.21, this series is the Taylor series of the function $f(x) = (1 + x^2)^{-1}$ about
x = 0. Now, the function f is C ∞ on the whole real line, so it seems rather
mysterious that its Taylor series converges only on a finite interval. Why should
the series behave badly as x → ±1 when the function itself does not? The
mystery is dispelled by considering complex values of x and recalling that the
region of convergence of a power series in the complex plane is always a disc.
The function f (x) does blow up at x = ±i, so the largest disc about the origin
in the complex plane on which f is smooth is the disc |x| < 1.
Abel’s Theorem. Suppose $f(x) = \sum_0^\infty a_n x^n$ is a power series whose radius of
convergence $R$ is positive and finite. We have seen that the convergence is uniform
on any compact subinterval of (−R, R) and hence that f is continuous on (−R, R).
But now suppose that the series converges at one of the endpoints, say x = R. Does
the uniformity of convergence and the continuity of the sum persist up to this point?
If the series converges absolutely at x = R, then the M-test (with Mn =
|an |Rn ) shows that the series converges absolutely and uniformly on [−R, R], so
its sum is continuous there. But when the convergence is only conditional, a more
subtle argument is needed. The necessary tool is the summation-by-parts formula
that we used to obtain Dirichlet’s test; since we need a slightly different version of
that formula than the one given in Lemma 6.23 (namely, formula (7.27)), we shall
simply derive it as we proceed.

7.26 Theorem (Abel’s Theorem). If the series $\sum_0^\infty a_n x^n$ converges at $x = R$
(resp. $x = -R$), then it converges uniformly on the interval $[0, R]$ (resp. $[-R, 0]$)
and hence defines a continuous function on that interval.

Proof. Convergence at $x = -R$ (and uniform convergence on $[-R, 0]$) of $f(x) =
\sum a_n x^n$ is the same as convergence at $x = R$ (and uniform convergence on $[0, R]$)
of $f(-x) = \sum (-1)^n a_n x^n$, so it is enough to consider convergence at $x = R$.
Moreover, convergence at $x = R$ (and uniform convergence on $[0, R]$) of $f(x) =
\sum a_n x^n$ is the same as convergence at $x = 1$ (and uniform convergence on $[0, 1]$)
of $f(Rx) = \sum a_n R^n x^n$. In short, it is enough to assume that $\sum_0^\infty a_n$ converges
and to prove that $\sum_0^\infty a_n x^n$ converges uniformly on $[0, 1]$. To do this we must
show that the tail end $\sum_k^\infty a_n x^n$ of the series converges uniformly to zero on $[0, 1]$
as $k \to \infty$.
For $k \ge 1$, let $A_k = \sum_k^\infty a_n$ be the $k$th tail end of the series $\sum_0^\infty a_n$, so that
$a_k = A_k - A_{k+1}$. For $l > k$ and $x \in [0, 1]$ we have
\[ a_k x^k + \cdots + a_l x^l = (A_k - A_{k+1})x^k + \cdots + (A_l - A_{l+1})x^l = A_k x^k + A_{k+1}(x^{k+1} - x^k) + \cdots + A_l(x^l - x^{l-1}) - A_{l+1}x^l. \]
Let $l \to \infty$: then $A_{l+1} \to 0$ and $x^l$ remains bounded, so the last term on the right
disappears and we obtain
\[ \sum_k^\infty a_n x^n = A_k x^k + \sum_k^\infty A_{n+1}(x^{n+1} - x^n). \tag{7.27} \]
Now, given $\epsilon > 0$, we can choose $k$ so large that $|A_n| < \tfrac{1}{2}\epsilon$ whenever $n \ge k$.
Since $x \in [0, 1]$, we have $x^{n+1} - x^n \le 0$, so (7.27) yields
\[ \Bigl| \sum_k^\infty a_n x^n \Bigr| \le |A_k| x^k + \sum_k^\infty |A_{n+1}|(x^n - x^{n+1}) \le \tfrac{1}{2}\epsilon x^k + \tfrac{1}{2}\epsilon \sum_k^\infty (x^n - x^{n+1}). \]
If $x = 1$, the series on the right vanishes; if $0 \le x < 1$, it is a telescoping series
whose sum is $x^k$. In either case, we obtain
\[ \Bigl| \sum_k^\infty a_n x^n \Bigr| \le \epsilon x^k \le \epsilon \]
for all $x \in [0, 1]$ when $k$ is sufficiently large, which establishes the desired uniform
convergence.
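
The summation-by-parts identity (7.27) can be verified exactly in the special case where only finitely many coefficients are nonzero, since then every tail A_k is a finite sum. The sketch below makes that assumption; the particular coefficients a_n = (−1)^n/(n + 1) for n < 12 and the point x = 2/3 are arbitrary choices for illustration, and rational arithmetic keeps the comparison exact.

```python
from fractions import Fraction

# a_0, ..., a_11; all later coefficients are taken to be zero (an assumption
# made so that the tails A_k are finite sums and the check is exact).
a = [Fraction((-1) ** n, n + 1) for n in range(12)]

def tail(k):
    """A_k = sum of a_n for n >= k (a finite sum here)."""
    return sum(a[k:], Fraction(0))

def lhs(k, x):
    """Left side of (7.27): sum of a_n x**n for n >= k."""
    return sum(a[n] * x**n for n in range(k, len(a)))

def rhs(k, x):
    """Right side of (7.27): A_k x**k + sum of A_{n+1} (x**(n+1) - x**n) for n >= k."""
    return tail(k) * x**k + sum(
        tail(n + 1) * (x ** (n + 1) - x**n) for n in range(k, len(a))
    )

x = Fraction(2, 3)
for k in (1, 4, 9):
    print(k, lhs(k, x) == rhs(k, x))
```

Each comparison prints True. The general case treated in the proof adds only the limiting step l → ∞.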

Remark. If $\sum a_n R^n$ converges, we already know (Theorem 7.17) that $\sum a_n x^n$
converges uniformly on $[-r, r]$ for any $r < R$. Combining this with Abel’s
theorem, we see that $\sum a_n x^n$ converges uniformly on $[-r, R]$. (See Exercise 7 in
§7.1.)
The continuity of the series at the endpoint can be restated in the following way.
Recall that $\lim_{x\to a-} f(x)$ denotes the limit of $f(x)$ as $x$ approaches $a$ from the left.

7.28 Corollary. If $\sum_0^\infty a_n$ converges, then $\lim_{x\to 1-} \sum_0^\infty a_n x^n = \sum_0^\infty a_n$.

EXAMPLE 7. The expansion $\arctan x = \sum_0^\infty (-1)^n x^{2n+1}/(2n + 1)$ was
established in Example 2 for $|x| < 1$. Since the series also converges at $x = 1$
(by the alternating series test), we obtain a neat series formula for $\pi$:
\[ \tfrac{1}{4}\pi = \lim_{x\to 1-} \arctan x = \sum_0^\infty \frac{(-1)^n}{2n + 1} = 1 - \tfrac{1}{3} + \tfrac{1}{5} - \tfrac{1}{7} + \cdots. \]
The converse of Corollary 7.28 is false: The limit $S = \lim_{x\to 1-} \sum_0^\infty a_n x^n$ may
exist even when $\sum_0^\infty a_n$ diverges. (Example: Take $a_n = (-1)^n$; then $\sum_0^\infty a_n x^n =
(1 + x)^{-1}$ for $|x| < 1$, so $S = \tfrac{1}{2}$.) In this case the series $\sum a_n$ is said to be Abel
summable to the sum $S$. Abel summation provides a way of making sense out
of certain divergent series that is useful in some situations, one of which we shall
discuss in §8.2.
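
Both halves of this discussion can be illustrated numerically: the slow convergence of the series for π/4 in Example 7, and the Abel sum 1/2 of the divergent series with a_n = (−1)^n. This is a minimal sketch; the function names, term counts, and sample values of x are assumptions chosen for the illustration.

```python
import math

def leibniz_partial(terms):
    """Partial sum of sum_{n>=0} (-1)**n / (2n+1), using `terms` terms."""
    return sum((-1) ** n / (2 * n + 1) for n in range(terms))

for terms in (10, 1000, 100000):
    print(terms, 4 * leibniz_partial(terms), math.pi)

def alternating_power_series(x, terms):
    """Partial sum of sum_{n>=0} (-1)**n * x**n, i.e. the case a_n = (-1)**n."""
    return sum((-1) ** n * x**n for n in range(terms))

# As x increases toward 1 the sums approach 1/2, although the series at x = 1
# itself (1 - 1 + 1 - ...) diverges.
for x in (0.9, 0.99, 0.999):
    print(x, alternating_power_series(x, 50000), 1 / (1 + x))
```

The first loop shows that roughly a thousand terms are needed for three correct digits of π, so the formula is elegant rather than practical.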

EXERCISES
1. Let $\{a_n\}_0^\infty$ be a sequence of real or complex numbers.
a. Suppose that $|a_{n+1}/a_n|$ converges to a limit $L$ as $n \to \infty$. Show that the
radius of convergence of $\sum_0^\infty a_n x^n$ is $L^{-1}$.
b. Suppose that $|a_n|^{1/n}$ converges to a limit $L$ as $n \to \infty$. Show that the
radius of convergence of $\sum_0^\infty a_n x^n$ is $L^{-1}$.
2. Show that if the sequence $\{a_n\}_0^\infty$ is bounded, the radius of convergence of
$\sum_0^\infty a_n x^n$ is at least 1.
3. Suppose the radius of convergence of $\sum_0^\infty a_n x^n$ is $R$. What is the radius of
convergence of $\sum_0^\infty a_n x^{kn}$ ($k = 2, 3, 4, \ldots$)?
4. Show that for any sequence $\{a_n\}_0^\infty$, the radius of convergence of $\sum_0^\infty a_n x^n$ is
the reciprocal of $\limsup_{n\to\infty} |a_n|^{1/n}$. (See Exercises 9–12 in §1.5 and
Exercise 25 in §6.2.)
5. Show that each of the following functions of $x$ admits a power series expansion
on some interval centered at the origin. Find the expansion and give its interval
of validity.
a. $\int_0^x e^{-t^2}\,dt$.
b. $\int_0^x \cos t^2\,dt$.
c. $\int_0^x t^{-1} \log(1 + 2t)\,dt$.
6. Use the series expansions in Exercise 5 to calculate the following integrals to
three decimal places, and prove the accuracy of your answer.
a. $\int_0^1 e^{-t^2}\,dt$.
b. $\int_0^1 \cos t^2\,dt$.
c. $\int_0^{1/2} t^{-1} \log(1 + 2t)\,dt$.
7. Let $f(x) = \sum_0^\infty a_n x^n$ be a power series with positive radius of convergence.
Show that $f(-x) = f(x)$ (resp. $f(-x) = -f(x)$) for all $x$ in the interval of
convergence if and only if $a_n = 0$ for all odd $n$ (resp. all even $n$).
8. Let $k$ be a nonnegative integer. The Bessel function of order $k$ is the function
$J_k$ defined by

$$J_k(x) = \sum_0^\infty \frac{(-1)^n}{n!\,(n+k)!} \left(\frac{x}{2}\right)^{2n+k}.$$

a. Verify that the series defining $J_k(x)$ converges for all $x$.
b. Show that $(d/dx)[x^k J_k(x)] = x^k J_{k-1}(x)$.
c. Show that $(d/dx)[x^{-k} J_k(x)] = -x^{-k} J_{k+1}(x)$.
d. Show that $u = J_k(x)$ satisfies the differential equation $x^2 u'' + x u' + (x^2 - k^2)u = 0$.
9. Show that the series

$$1 + \frac{x^3}{2 \cdot 3} + \frac{x^6}{2 \cdot 3 \cdot 5 \cdot 6} + \cdots + \frac{x^{3n}}{2 \cdot 3 \cdot 5 \cdot 6 \cdots (3n-1)(3n)} + \cdots$$

converges for all $x$ and that its sum $f(x)$ satisfies $f''(x) = x f(x)$.
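10. Express the sums of the following series in terms of elementary functions and
(perhaps) their antiderivatives in the manner of Example 4.
a. $\displaystyle\sum_1^\infty \frac{n x^n}{(n+1)!}$.
b. $\displaystyle\sum_0^\infty \frac{(-1)^n x^{2n+1}}{(2n+1) \cdot (2n+2)!}$.
c. $\displaystyle\sum_0^\infty \frac{x^n}{(n+1)^2\, n!}$.
d. $\displaystyle\sum_0^\infty \frac{(-1)^n (2n+1) x^{2n}}{(2n)!}$.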

11. Consider the function $f(x) = \int_0^x \arctan t\,dt$.
a. Perform the integration to evaluate $f$ in terms of elementary functions.
b. Using the result of Example 2, compute the Taylor series of $f(x)$ (centered
at the origin) and show that it converges to $f(x)$ for $x \in [-1, 1]$. (The
endpoints require special attention.)
c. Deduce that

$$1 - \frac12 - \frac13 + \frac14 + \frac15 - \frac16 - \frac17 + \cdots = \tfrac14\pi - \tfrac12\log 2.$$

7.4 The Complex Exponential and Trig Functions


The series $\sum_0^\infty z^n/n!$ converges absolutely for every complex number $z$, by the
ratio test, so we can use it to define the exponential function for a complex variable:

$$\exp(z) = e^z = \sum_0^\infty \frac{z^n}{n!} \qquad (z \in \mathbb{C}).$$

This extended exponential function still obeys the basic law of exponents. Indeed,
by Theorem 6.29,

$$e^z e^w = \sum_{m,n=0}^\infty \frac{z^m w^n}{m!\,n!} = \sum_{k=0}^\infty \sum_{m+n=k} \frac{z^m w^n}{m!\,n!} = \sum_{k=0}^\infty \frac{(z+w)^k}{k!} = e^{z+w}. \tag{7.29}$$

(In the third equality we have used the binomial theorem.)
Let $i = \sqrt{-1}$ be the imaginary unit. Since $i^2 = -1$, we have $i^3 = -i$ and
$i^4 = 1$, so

$$i^{4n} = 1, \quad i^{4n+1} = i, \quad i^{4n+2} = -1, \quad i^{4n+3} = -i \qquad (n = 0, 1, 2, \ldots).$$

Therefore, when $z = ix$ is purely imaginary,

$$e^{ix} = \sum_0^\infty \frac{i^n x^n}{n!} = \left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots\right) + i\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right).$$

The series on the right are the Taylor series of $\cos x$ and $\sin x$, so we have arrived
at Euler’s formula

$$e^{ix} = \cos x + i \sin x. \tag{7.30}$$
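Euler's formula and the law of exponents (7.29) can be spot-checked numerically by summing the defining series directly. The following is a minimal sketch, not part of the text's argument: the helper name `exp_series` and the cutoff of 60 terms are illustrative assumptions, and the library functions `math.cos` and `math.sin` are used only as a reference for comparison.

```python
import math

def exp_series(z, terms=60):
    """Partial sum of sum_{n>=0} z**n / n!, built term by term."""
    total = 0j
    term = 1 + 0j                  # z**0 / 0!
    for n in range(terms):
        total += term
        term *= z / (n + 1)        # next term: z**(n+1) / (n+1)!
    return total

# Euler's formula (7.30) at a sample real x.
x = 1.3
euler_lhs = exp_series(1j * x)
euler_rhs = complex(math.cos(x), math.sin(x))

# Law of exponents (7.29) at sample complex z, w.
z, w = 0.5 + 1.0j, -1.0 + 0.25j
law_lhs = exp_series(z) * exp_series(w)
law_rhs = exp_series(z + w)

print(abs(euler_lhs - euler_rhs), abs(law_lhs - law_rhs))
```

Both printed differences should be at the level of floating-point rounding for arguments of this size; larger $|z|$ would need more terms.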

This is the appropriate place to raise the issue of the definition of $\cos x$ and
$\sin x$. These functions are so familiar that we take them entirely for granted, but the
definitions presented in elementary trigonometry (as ratios of sides of right triangles,
or as the coordinates of the point where the unit circle intersects the ray that
makes an angle $x$ with the positive horizontal axis) are quite unsatisfactory, for
they provide neither a precise formula nor a computationally effective algorithm.
(Think for a minute: How could you possibly use these definitions to calculate
cos(1) to four decimal places?)¹ In fact, the best procedure is to use Taylor series
as a definition! That is, we define $\cos x$ and $\sin x$ for all real (or, for that matter,
complex) numbers $x$ by

$$\cos x = \sum_0^\infty \frac{(-1)^n x^{2n}}{(2n)!}, \qquad \sin x = \sum_0^\infty \frac{(-1)^n x^{2n+1}}{(2n+1)!}. \tag{7.31}$$

We now indicate how to derive all the familiar properties of the trig functions
from these definitions. First, it is clear from (7.31) that

(7.32) cos(−x) = cos x, sin(−x) = − sin x,

so that $e^{-ix} = \cos x - i \sin x$. Second, termwise differentiation of (7.31) immediately yields

$$\cos' = -\sin, \qquad \sin' = \cos. \tag{7.33}$$

Third, the addition formulas for sine and cosine follow easily from the law of exponents:

$$\begin{aligned}
\cos(x \pm y) + i \sin(x \pm y) &= e^{i(x \pm y)} = e^{ix} e^{\pm iy} \\
&= (\cos x + i \sin x)(\cos y \pm i \sin y) \\
&= (\cos x \cos y \mp \sin x \sin y) + i(\sin x \cos y \pm \cos x \sin y).
\end{aligned}$$

Taking the real and imaginary parts of both sides, we obtain

$$\begin{aligned}
\cos(x \pm y) &= \cos x \cos y \mp \sin x \sin y, \\
\sin(x \pm y) &= \sin x \cos y \pm \cos x \sin y.
\end{aligned} \tag{7.34}$$

In particular, we have the Pythagorean identity

$$\cos^2 x + \sin^2 x = \cos(x - x) = \cos 0 = 1. \tag{7.35}$$


¹ A similar problem arises if one tries to define $e^x$ directly. However, here there is an alternative:
Define $\log x$ to be $\int_1^x t^{-1}\,dt$ and then define exp to be the inverse function of log. The analogous
procedure for developing trig functions, taking the equation $\arcsin x = \int_0^x (1 - t^2)^{-1/2}\,dt$ as a
starting point, is less satisfactory, because the inverse function of arcsin is not the whole sine function
but only its restriction to the interval $[-\pi/2, \pi/2]$.

Next, we have to bring the number $\pi$ into play somehow. We can proceed as
follows. The series $\sum_0^\infty (-1)^n 2^{2n}/(2n)!$ for $\cos 2$ is an alternating series whose
terms decrease in magnitude starting with $n = 1$, so by the alternating series test,

$$\cos 2 = 1 - \frac{2^2}{2!} = -1 \quad \text{with error less than} \quad \frac{2^4}{4!} = \frac23.$$

In particular, cos 2 < 0, and of course cos 0 = 1 > 0, so by the intermediate value
theorem there is at least one number a ∈ (0, 2) such that cos a = 0. Therefore, the
set Z = {x ≥ 0 : cos x = 0} is nonempty; it is closed since cos is continuous;
hence it contains its greatest lower bound, which is positive since cos 0 = 1. We
denote this smallest positive zero of cos by $\tfrac12\pi$. (Again, this may be taken as a
definition of the number $\pi$, from which its other familiar properties can be derived.)
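To make this definition concrete, one can locate the zero numerically using nothing but the series (7.31). The sketch below is illustrative only: the names `cos_series` and `zero_in_bracket` and the iteration counts are assumptions, and bisection merely finds a sign change of the partial sum in $(0, 2)$; it does not by itself prove that this zero is the smallest positive one.

```python
import math

def cos_series(x, terms=40):
    """Partial sum of the Taylor series (7.31) for cos x."""
    total = 0.0
    term = 1.0                                        # (-1)**0 * x**0 / 0!
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))  # next term of the series
    return total

def zero_in_bracket(lo=0.0, hi=2.0, iterations=60):
    """Bisection on [lo, hi], assuming cos_series(lo) > 0 > cos_series(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if cos_series(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

half_pi = zero_in_bracket()
print(2 * half_pi, math.pi)   # the library constant is shown only for comparison
```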
Now, by (7.33), $(d/dx) \sin x = \cos x > 0$ for $0 \le x < \tfrac12\pi$, so sin is increasing
on $[0, \tfrac12\pi]$, and $\sin 0 = 0$; hence $\sin \tfrac12\pi > 0$. But by (7.35), $\sin^2 \tfrac12\pi = \sin^2 \tfrac12\pi +
\cos^2 \tfrac12\pi = 1$; hence, $\sin \tfrac12\pi = 1$. In summary,

$$\cos 0 = \sin \tfrac12\pi = 1, \qquad \sin 0 = \cos \tfrac12\pi = 0. \tag{7.36}$$

All of the familiar formulas of (precalculus) trigonometry can be derived from
the even-odd relations (7.32), the addition formulas (7.34), and the special values
(7.36), and these together with (7.33) yield all the formulas for integration and
differentiation of trigonometric functions. For example, (7.34) and (7.36) yield the
complementarity relations

$$\begin{aligned}
\cos(\tfrac12\pi - x) &= \cos \tfrac12\pi \cos x + \sin \tfrac12\pi \sin x = \sin x, \\
\sin(\tfrac12\pi - x) &= \sin \tfrac12\pi \cos x - \cos \tfrac12\pi \sin x = \cos x.
\end{aligned} \tag{7.37}$$

These, in turn, yield the $2\pi$-periodicity of sine and cosine. Indeed, replacing $x$ by
$-x$ in (7.37) and using (7.32), we see that $\cos(x + \tfrac12\pi) = -\sin x$ and $\sin(x + \tfrac12\pi) =
\cos x$, whence

$$\begin{aligned}
\cos(x + \pi) &= \cos(x + \tfrac12\pi + \tfrac12\pi) = -\sin(x + \tfrac12\pi) = -\cos x, \\
\sin(x + \pi) &= \sin(x + \tfrac12\pi + \tfrac12\pi) = \cos(x + \tfrac12\pi) = -\sin x,
\end{aligned}$$

and therefore

$$\cos(x + 2\pi) = -\cos(x + \pi) = \cos x, \qquad \sin(x + 2\pi) = -\sin(x + \pi) = \sin x.$$



EXERCISES

1. Recall that the hyperbolic sine and cosine functions are defined by $\sinh z =
\tfrac12(e^z - e^{-z})$ and $\cosh z = \tfrac12(e^z + e^{-z})$. Here, $z$ may now be taken to be a
complex number.
a. Show that sinh ix = i sin x and cosh ix = cos x.
b. Show that sinh(z +w) = sinh z cosh w+cosh z sinh w and cosh(z +w) =
cosh z cosh w + sinh z sinh w.
c. Express sinh(x + iy) and cosh(x + iy) in terms of real functions of the
real variables x and y.
2. Verify that the formula (d/dx)ecx = cecx remains valid when c is a complex
number. (However, x is still a real variable, since we have not discussed differ-
entiation of functions of a complex variable.)
3. Let $a$ and $b$ be real numbers. Compute $\int e^{(a+ib)x}\,dx$ by using the result of
Exercise 2; then, by taking real and imaginary parts, deduce the formulas

$$\int e^{ax} \cos bx\,dx = \frac{e^{ax}(a \cos bx + b \sin bx)}{a^2 + b^2},$$

$$\int e^{ax} \sin bx\,dx = \frac{e^{ax}(a \sin bx - b \cos bx)}{a^2 + b^2}.$$

7.5 Functions Defined by Improper Integrals


In the preceding sections we have considered infinite series of functions. The analogue
for integrals is an improper integral $\int_c^d f(x, t)\,dt$, where the integrand contains
a free variable $x$ as well as the variable of integration and the resulting integral
defines a function of $x$. The integral may be improper because $c = -\infty$ or $d = \infty$
or because of singularities of the function $f$. To keep the notation simple, we shall
restrict our discussion to the case where $d = \infty$ and $f$ has no singularities on
$[c, \infty)$, but everything we say extends to the other cases with the obvious modifications.
In this situation, the notion of uniform convergence is as follows: We say that
the integral $\int_c^\infty f(x, t)\,dt$ converges uniformly for $x \in I$ (where $I$ is an interval
in $\mathbb{R}$) if the difference between the “partial integral” $\int_c^d$ and the full integral $\int_c^\infty$
(that is, the “tail end” $\int_d^\infty$) tends to zero uniformly for $x \in I$ as $d \to \infty$.
Precisely, this means that

$$\sup_{x \in I} \left| \int_d^\infty f(x, t)\,dt \right| \to 0 \quad \text{as } d \to \infty.$$

The most useful test for uniform convergence is the following analogue of the
Weierstrass M-test. The proof is essentially identical to that of the M-test, and we
leave the details to the reader (Exercise 1).
7.38 Theorem. Suppose there is a function $g(t) \ge 0$ on $[c, \infty)$ such that (i)
$|f(x, t)| \le g(t)$ for all $x \in I$ and $t \ge c$, and (ii) $\int_c^\infty g(t)\,dt < \infty$. Then
$\int_c^\infty f(x, t)\,dt$ converges absolutely and uniformly for $x \in I$.
The consequences of uniform convergence for continuity, integration, and differentiation
of the function $F(x) = \int_c^\infty f(x, t)\,dt$ are much the same as for series.
The following two theorems provide analogues of Theorems 7.10 and 7.13 in the
present setting.
7.39 Theorem. Suppose that $f(x, t)$ is a continuous function on the set $\{(x, t) :
x \in I,\ t \ge c\}$ and that the integral $\int_c^\infty f(x, t)\,dt$ is uniformly convergent for
$x \in I$. Then:
a. The function $F(x) = \int_c^\infty f(x, t)\,dt$ is continuous on $I$.
b. If $[a, b] \subset I$, then

$$\int_a^b \int_c^\infty f(x, t)\,dt\,dx = \int_c^\infty \int_a^b f(x, t)\,dx\,dt.$$

Proof. The conclusions are true if $\int_c^\infty$ is replaced by $\int_c^d$ where $d < \infty$, by Theorems
4.46 and 4.26. (a) then follows because the uniform limit of continuous
functions is continuous, and (b) follows by the argument in the proof of Theorem
7.11.
7.40 Theorem. Suppose that $f(x, t)$ and its partial derivative $\partial_x f(x, t)$ are continuous
functions on the set $\{(x, t) : x \in I,\ t \ge c\}$. Suppose also that the integral
$\int_c^\infty f(x, t)\,dt$ converges for $x \in I$ and the integral $\int_c^\infty \partial_x f(x, t)\,dt$ converges uniformly
for $x \in I$. Then the former integral is differentiable on $I$ as a function of $x$,
and

$$\frac{d}{dx} \int_c^\infty f(x, t)\,dt = \int_c^\infty \frac{\partial f}{\partial x}(x, t)\,dt.$$
Theorem 7.40 may be deduced from Theorem 7.39 in much the same way as
Theorem 7.12 was deduced from Theorem 7.11 (Exercise 2).
Let us state explicitly the result of combining Theorems 7.39 and 7.40 with
Theorem 7.38:
7.41 Theorem. The conclusions of Theorem 7.39 are valid whenever $|f(x, t)| \le
g(t)$ for all $x \in I$ and $t \ge c$, where $\int_c^\infty g(t)\,dt < \infty$. The conclusions of Theorem
7.40 are valid whenever $\int_c^\infty f(x, t)\,dt$ converges for $x \in I$ and $|\partial_x f(x, t)| \le g(t)$
for all $x \in I$ and $t \ge c$, where $\int_c^\infty g(t)\,dt < \infty$.

The manipulation of improper integrals by the foregoing theorems can be quite
an entertaining exercise, and it leads to a number of interesting and useful results.
Let us look at some examples.
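EXAMPLE 1. Evaluate $\displaystyle\int_0^\infty \frac{\arctan(bt) - \arctan(at)}{t}\,dt$ where $0 < a < b$.

Solution: We recognize that the integrand is $\int_a^b (x^2 t^2 + 1)^{-1}\,dx$. For $x \ge a$
and $t \ge 0$ we have $(x^2 t^2 + 1)^{-1} \le (a^2 t^2 + 1)^{-1}$, and $\int_0^\infty (a^2 t^2 + 1)^{-1}\,dt < \infty$.
Thus, by Theorem 7.38, the integral $\int_0^\infty (x^2 t^2 + 1)^{-1}\,dt$ is uniformly convergent
for $x \ge a$, so we can apply Theorem 7.39 to obtain

$$\begin{aligned}
\int_0^\infty \frac{\arctan(bt) - \arctan(at)}{t}\,dt &= \int_0^\infty \int_a^b \frac{1}{x^2 t^2 + 1}\,dx\,dt \\
&= \int_a^b \int_0^\infty \frac{1}{x^2 t^2 + 1}\,dt\,dx = \int_a^b x^{-1} \arctan xt\,\Big|_0^\infty\,dx = \int_a^b \frac{\pi}{2x}\,dx \\
&= \frac{\pi}{2} \log\left(\frac{b}{a}\right).
\end{aligned}$$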
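The value in Example 1 is easy to test numerically. The sketch below is an added illustration under stated assumptions: `integrate_half_line` is a hypothetical helper that maps $(0, \infty)$ to $(0, 1)$ by $t = s/(1-s)$ and applies the midpoint rule, and the sample values $a = 1$, $b = 3$ and the number of nodes are arbitrary choices.

```python
import math

def integrate_half_line(g, n=100_000):
    """Midpoint rule for the integral of g over (0, inf), after the
    substitution t = s / (1 - s), dt = ds / (1 - s)**2 with 0 < s < 1."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h              # midpoints avoid s = 0 and s = 1
        t = s / (1.0 - s)
        total += g(t) / (1.0 - s) ** 2
    return total * h

a, b = 1.0, 3.0
numeric = integrate_half_line(lambda t: (math.atan(b * t) - math.atan(a * t)) / t)
exact = 0.5 * math.pi * math.log(b / a)
print(numeric, exact)
```

The substitution works well here because the integrand behaves like a constant times $t^{-2}$ for large $t$, so the transformed integrand stays bounded near $s = 1$.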

EXAMPLE 2. Let

$$F(x) = \int_0^\infty e^{-xt^2}\,dt, \qquad x > 0.$$

Since $(\partial^k/\partial x^k) e^{-xt^2} = (-t^2)^k e^{-xt^2}$, by Theorem 7.40 we can conclude that

$$F^{(k)}(x) = (-1)^k \int_0^\infty t^{2k} e^{-xt^2}\,dt \qquad (x > 0),$$

provided that we establish the uniform convergence of the integral on the right.
In fact, the convergence is not uniform on the whole interval $(0, \infty)$, but it is
uniform on $[\delta, \infty)$ for any $\delta > 0$, which is sufficient. This follows easily from
Theorem 7.38, since $t^{2k} e^{-xt^2} \le t^{2k} e^{-\delta t^2}$ for $x \ge \delta$.
On the other hand, we can evaluate $F(x)$ explicitly by making the substitution
$u = x^{1/2} t$ and invoking Proposition 4.66:

$$F(x) = \int_0^\infty e^{-u^2} x^{-1/2}\,du = \frac{\sqrt{\pi}}{2}\,x^{-1/2},$$

and therefore

$$F^{(k)}(x) = \frac{\sqrt{\pi}}{2}\,(-\tfrac12)(-\tfrac32) \cdots (-k + \tfrac12)\,x^{-k-(1/2)}.$$

Comparing the two formulas for $F^{(k)}(x)$, we conclude that

$$\int_0^\infty t^{2k} e^{-xt^2}\,dt = (\tfrac12)(\tfrac32) \cdots (k - \tfrac12)\,\frac{\sqrt{\pi}}{2x^{k+(1/2)}}.$$

This result can also be obtained by a laborious $k$-fold integration by parts ($u =
t^{2k-1}$, $dv = t e^{-xt^2}\,dt$, etc.), but differentiation under the integral gives a rather
painless derivation.
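As a check on the closed form just obtained, one can compare it with a direct quadrature. This is a sketch under assumptions, not a proof: `simpson` and `closed_form` are illustrative names, the sample values $k = 2$, $x = 1.5$ are arbitrary, and the infinite range is truncated at $t = 12$, where the integrand is far below rounding error.

```python
import math

def simpson(g, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi] with n subintervals (n even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return total * h / 3.0

def closed_form(k, x):
    """(1/2)(3/2)...(k - 1/2) * sqrt(pi) / (2 * x**(k + 1/2))."""
    product = 1.0
    for j in range(1, k + 1):
        product *= j - 0.5
    return product * math.sqrt(math.pi) / (2.0 * x ** (k + 0.5))

k, x = 2, 1.5
numeric = simpson(lambda t: t ** (2 * k) * math.exp(-x * t * t), 0.0, 12.0)
print(numeric, closed_form(k, x))
```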
EXAMPLE 3. We now derive one of the most important of all integral formulas:

$$\int_0^\infty \frac{\sin t}{t}\,dt = \frac{\pi}{2}. \tag{7.42}$$

This is a bit tricky, since the integral is not absolutely convergent. (Note that
since $t^{-1} \sin t \to 1$ as $t \to 0$, the integral over $[0, 1]$ is an ordinary proper integral.
The convergence of the integral over $[1, \infty)$ was proved in §4.6 [Example
3].) Our strategy will be to consider an improper integral with two parameters:

$$F(x, y) = \int_0^\infty \frac{e^{-xt} \sin yt}{t}\,dt \qquad (x > 0,\ y \in \mathbb{R}). \tag{7.43}$$

Again, this integral is proper at $t = 0$, and for $x > 0$ it is absolutely convergent.
First, we fix $x > 0$ and consider the integral as a function of $y$. Formal
differentiation of (7.43) with respect to $y$ leads to

$$\frac{\partial F}{\partial y} = \int_0^\infty e^{-xt} \cos yt\,dt.$$

By Theorem 7.41, this formula is indeed valid, since $|e^{-xt} \cos yt| \le e^{-xt}$ for
all $y$ and $\int_0^\infty e^{-xt}\,dt < \infty$. The integral on the right can be evaluated by
elementary calculus (integrate by parts twice, or use Exercise 3 in §7.4), and
the result is

$$\frac{\partial F}{\partial y} = e^{-xt}\,\frac{y \sin yt - x \cos yt}{x^2 + y^2}\,\bigg|_0^\infty = \frac{x}{x^2 + y^2}.$$

Now we can recover $F$ by integrating in $y$. Obviously $F(x, 0) = 0$, so we get
the right constant of integration by starting the integration at 0:

$$F(x, y) = \int_0^y \frac{x}{x^2 + s^2}\,ds = \arctan(y/x).$$

The variable $y$ has now served its purpose, and we henceforth set it equal to 1.
We have shown that

$$\int_0^\infty \frac{e^{-xt} \sin t}{t}\,dt = \arctan(1/x) \qquad (x > 0). \tag{7.44}$$

We now wish to let $x \to 0$. In order to pass the limit under the integral sign
in (7.44), it is enough to show that the integral in (7.44) is uniformly convergent
for $x \ge 0$. Unfortunately, Theorem 7.38 does not apply here, since the integral
is not absolutely convergent at $x = 0$. (Theorem 7.38 easily yields the uniform
convergence for $x \ge \delta$ for any $\delta > 0$, but that isn’t good enough!) Recall the
meaning of uniform convergence: What we need to show is that

$$\sup_{x \ge 0} \left| \int_b^\infty \frac{e^{-xt} \sin t}{t}\,dt \right| \to 0 \quad \text{as } b \to \infty.$$

To this end, we use integration by parts,² taking $u = t^{-1}$ and $dv = e^{-xt} \sin t\,dt$;
the result is

$$\int_b^\infty \frac{e^{-xt} \sin t}{t}\,dt = \frac{e^{-bx}(x \sin b + \cos b)}{(x^2 + 1)\,b} - \int_b^\infty \frac{e^{-xt}(x \sin t + \cos t)}{(x^2 + 1)\,t^2}\,dt.$$

Now,

$$\left| \frac{e^{-xt}(x \sin t + \cos t)}{x^2 + 1} \right| \le \frac{x + 1}{x^2 + 1}.$$

The quantity on the right is continuous on $\mathbb{R}$ and tends to zero as $x \to \infty$, so it
is bounded by a constant $C$ for $x \ge 0$. Therefore,

$$\sup_{x \ge 0} \left| \int_b^\infty \frac{e^{-xt} \sin t}{t}\,dt \right| \le \frac{C}{b} + C \int_b^\infty \frac{dt}{t^2} = \frac{2C}{b},$$

which tends to zero as $b \to \infty$, as desired. Thus the convergence is uniform in
(7.44), and it follows that

$$\int_0^\infty \frac{\sin t}{t}\,dt = \lim_{x \to 0+} \int_0^\infty \frac{e^{-xt} \sin t}{t}\,dt = \lim_{x \to 0+} \arctan(1/x) = \frac{\pi}{2}.$$
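Formula (7.44) can be checked numerically for a few values of $x$ bounded away from 0, where the damping factor makes truncation harmless. The following sketch is illustrative only: the helper names, the truncation point $t = 80$, and the sample values of $x$ are assumptions, and it says nothing about the delicate limit $x \to 0$ handled by the uniform-convergence argument above.

```python
import math

def simpson(g, lo, hi, n):
    """Composite Simpson rule on [lo, hi] with n subintervals (n even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return total * h / 3.0

def damped_dirichlet(x, upper=80.0, n=80_000):
    """Approximate the integral in (7.44), truncated at t = upper."""
    def g(t):
        if t == 0.0:
            return 1.0                 # limit of exp(-x t) sin(t) / t as t -> 0
        return math.exp(-x * t) * math.sin(t) / t
    return simpson(g, 0.0, upper, n)

checks = {x: (damped_dirichlet(x), math.atan(1.0 / x)) for x in (0.5, 1.0, 2.0)}
for x, (numeric, exact) in checks.items():
    print(x, numeric, exact)
```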

² The idea is much the same as the use of summation by parts in the proof of Abel’s theorem.

EXERCISES
1. Prove Theorem 7.38.
2. Prove Theorem 7.40.
3. Suppose $x > 0$. Verify that $\int_0^\infty e^{-xt}\,dt = x^{-1}$, justify differentiating under
the integral sign, and deduce that $\int_0^\infty t^n e^{-xt}\,dt = n!\,x^{-n-1}$.
4. Verify that $\int_0^\infty (t^2 + x)^{-1}\,dt = \tfrac12\pi x^{-1/2}$, justify differentiating under the integral
sign, and thence evaluate $\int_0^\infty (t^2 + x)^{-n}\,dt$.
5. Show that $\displaystyle\int_0^\infty \frac{e^{-bx} - e^{-ax}}{x}\,dx = \log\frac{a}{b}$ for $a, b > 0$.
6. Show that $\displaystyle\int_0^\infty \frac{e^{-bx} - e^{-ax}}{x} \cos x\,dx = \tfrac12 \log\frac{1 + a^2}{1 + b^2}$ for $a, b > 0$.
7. Show that $\displaystyle\int_0^\infty e^{-x}\,\frac{1 - \cos ax}{x}\,dx = \tfrac12 \log(1 + a^2)$ for all $a \in \mathbb{R}$.
8. Deduce from (7.42) that

$$\int_0^\infty \frac{\sin xt}{t}\,dt = \begin{cases} \tfrac12\pi & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -\tfrac12\pi & \text{if } x < 0. \end{cases}$$

Show that the convergence is uniform for $x \in I$ if $I$ is any compact interval
with $0 \notin I$, but not if $0 \in I$.
9. Use Exercise 8 to show that $\displaystyle\int_0^\infty \frac{\sin^2 xt}{t^2}\,dt = \tfrac12\pi x$ for $x > 0$.
10. Let $\displaystyle I(a, b) = \int_0^\infty \frac{\cos bx - \cos ax}{x^2}\,dx$.
a. Show that $I(a, b)$ is convergent for all $a, b \in \mathbb{R}$ and that the convergence is
uniform for $a$ in any finite interval when $b$ is fixed (or vice versa).
b. Use Exercise 8 to show that $I(a, b) = \tfrac12\pi(a - b)$ if $a, b > 0$.
c. Show that $I(a, b) = \tfrac12\pi(|a| - |b|)$ for all $a, b \in \mathbb{R}$.
11. Let $F(x) = \int_0^\infty e^{-t^2} \cos xt\,dt$ for $x \in \mathbb{R}$.
a. Justify differentiating under the integral sign and thence show that $F'(x) =
-\tfrac12 x F(x)$.
b. Show that $F(x) = \tfrac12\sqrt{\pi}\,e^{-x^2/4}$.
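12. Let $G(x) = \int_0^\infty e^{-t^2} \sin xt\,dt$ for $x \in \mathbb{R}$. Proceeding as in Exercise 11, show
that $G(x) = \tfrac12 e^{-x^2/4} \int_0^x e^{t^2/4}\,dt$. (Integrating by parts gives $G'(x) = \tfrac12 - \tfrac12 x G(x)$ with $G(0) = 0$, which accounts for the factor $\tfrac12$.)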
13. Show that $\displaystyle\int_0^\infty \frac{1 - e^{-xt^2}}{t^2}\,dt = \sqrt{\pi x}$ for $x \ge 0$.
14. Let $F(x) = \int_0^\infty e^{-t^2 - (x^2/t^2)}\,dt$.
a. Show that $F$ is a continuous function on $\mathbb{R}$ that satisfies $F'(x) = -2F(x)$
for $x > 0$ and $F'(x) = 2F(x)$ for $x < 0$.
b. Show that $F(x) = \tfrac12\sqrt{\pi}\,e^{-2|x|}$.
c. Evaluate $\int_0^\infty e^{-pt^2 - (q/t^2)}\,dt$ for $p, q > 0$.
15. Let $f$ be a continuous function on $[0, \infty)$ that satisfies $|f(x)| \le a(1 + x)^N e^{bx}$
for some $a, b, N \ge 0$. The Laplace transform of $f$ is the function $L[f]$ defined
on $(b, \infty)$ by

$$L[f](s) = \int_0^\infty e^{-sx} f(x)\,dx.$$

a. Show that $L[f]$ is of class $C^\infty$ on $(b, \infty)$ and $(d/ds)^n L[f] = (-1)^n L[f_n]$
where $f_n(x) = x^n f(x)$.
b. Suppose that $f$ is of class $C^1$ on $[0, \infty)$ and that $f'$ satisfies the same sort of
exponential growth condition as $f$. Show that $L[f'](s) = sL[f](s) - f(0)$.

7.6 The Gamma Function


Perhaps the most important of all functions defined by improper integrals is the
gamma function $\Gamma(x)$ defined for $x > 0$ by

$$\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt, \tag{7.45}$$

which has a way of turning up in many unexpected places. Let us analyze the
integrals over $[0, 1]$ and $[1, \infty)$ separately. The integral over $[0, 1]$ is proper for
$x \ge 1$ and improper but convergent for $0 < x < 1$. In fact, by Theorem 7.38 it
is uniformly convergent for $x \ge \delta$, for any $\delta > 0$, since $0 < t^{x-1} e^{-t} \le t^{\delta-1}$ for
$x \ge \delta$ and $0 \le t \le 1$. The integral over $[1, \infty)$ is convergent for all $x$ and uniformly
convergent for $x \le C$, for any constant $C$, since $0 < t^{x-1} e^{-t} \le t^{C-1} e^{-t}$ for
$x \le C$ and $t \ge 1$. Therefore, the integral defining $\Gamma(x)$ is convergent for $x > 0$
and uniformly convergent on $\delta \le x \le C$ for any $\delta > 0$ and $C > 0$.
It follows that $\Gamma$ is a continuous function on $(0, \infty)$. In fact, $\Gamma$ is of class $C^\infty$ on
$(0, \infty)$, and its derivatives can be calculated by differentiating under the integral:

$$\Gamma^{(k)}(x) = \int_0^\infty (\log t)^k\,t^{x-1} e^{-t}\,dt. \tag{7.46}$$

Since $|\log t|$ grows more slowly than any power of $t$ as $t \to 0$ or $t \to \infty$, the argument
of the preceding paragraph shows that the integral on the right is absolutely
and uniformly convergent for $\delta \le x \le C$ for any positive $\delta$ and $C$, so Theorem
7.40 guarantees the validity of (7.46).
The most important property of $\Gamma$ is that it satisfies the functional equation

$$\Gamma(x + 1) = x\Gamma(x). \tag{7.47}$$

The proof is a simple integration by parts ($u = t^x$, $dv = e^{-t}\,dt$):

$$\Gamma(x + 1) = \int_0^\infty t^x e^{-t}\,dt = -t^x e^{-t}\,\Big|_0^\infty + \int_0^\infty x t^{x-1} e^{-t}\,dt = 0 + x\Gamma(x).$$

There are two values of $\Gamma$ that can be calculated easily by hand:

$$\Gamma(1) = \int_0^\infty e^{-t}\,dt = -e^{-t}\,\Big|_0^\infty = 1,$$

$$\Gamma(\tfrac12) = \int_0^\infty t^{-1/2} e^{-t}\,dt = 2\int_0^\infty e^{-u^2}\,du = \sqrt{\pi}.$$

(For the second one we set $u = t^{1/2}$ and used Proposition 4.66.) The functional
equation (7.47) now yields the values of $\Gamma$ at all positive integers and half-integers:

$$\Gamma(2) = 1\,\Gamma(1) = 1, \quad \Gamma(3) = 2\,\Gamma(2) = 2!, \quad \Gamma(4) = 3\,\Gamma(3) = 3!, \ \ldots$$

$$\Gamma(\tfrac32) = \tfrac12\,\Gamma(\tfrac12) = \tfrac12\sqrt{\pi}, \quad \Gamma(\tfrac52) = \tfrac32\,\Gamma(\tfrac32) = \tfrac32 \cdot \tfrac12\sqrt{\pi},$$

and so by induction,

$$\Gamma(n) = (n - 1)!, \qquad \Gamma(n + \tfrac12) = (n - \tfrac12) \cdots \tfrac32 \cdot \tfrac12\sqrt{\pi}. \tag{7.48}$$

Thus the gamma function provides an extension of the factorial function to non-
integers: x! = Γ(x + 1), if you like. It is the natural extension of the factorial
function, not just because it gives the right values at the integers, but because the
functional equation Γ(x + 1) = xΓ(x) is the natural generalization of the recursive
formula n! = n · (n − 1)! that defines factorials.
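The values in (7.48) and the functional equation (7.47) can be compared against a library implementation of the gamma function. In this sketch, `math.gamma` from the Python standard library plays the role of $\Gamma$, and `half_integer_gamma` is an illustrative helper that evaluates the product in (7.48) directly; the sample points are arbitrary.

```python
import math

def half_integer_gamma(n):
    """Gamma(n + 1/2) = (n - 1/2)(n - 3/2)...(3/2)(1/2) sqrt(pi), as in (7.48)."""
    value = math.sqrt(math.pi)
    for j in range(n):
        value *= j + 0.5
    return value

for n in range(1, 6):
    # Columns: n, Gamma(n), (n-1)!, Gamma(n + 1/2), product formula from (7.48).
    print(n, math.gamma(n), math.factorial(n - 1),
          math.gamma(n + 0.5), half_integer_gamma(n))

# Functional equation (7.47) at a non-integer point.
x = 2.7
print(math.gamma(x + 1), x * math.gamma(x))
```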
Other factorial-like products (more precisely, products of numbers in an arithmetic
progression) can also be expressed in terms of the gamma function. Indeed,
since

$$\Gamma(c + n + 1) = (c + n)\Gamma(c + n) = \cdots = (c + n)(c + n - 1) \cdots c\,\Gamma(c),$$

for $a, b > 0$ we have

$$a(a + b) \cdots (a + nb) = b^{n+1}\left(\frac{a}{b}\right)\left(\frac{a}{b} + 1\right) \cdots \left(\frac{a}{b} + n\right) = b^{n+1}\,\frac{\Gamma(\frac{a}{b} + n + 1)}{\Gamma(\frac{a}{b})}. \tag{7.49}$$

The functional equation, written in the form

$$\Gamma(x) = \frac{\Gamma(x + 1)}{x},$$

shows that $\Gamma(x)$ blows up like $x^{-1}$ as $x \to 0$. It also provides a way of extending
the gamma function to negative values of $x$. Indeed, the expression on the right is
defined for all $x > -1$ except $x = 0$, and it can be taken as a definition of $\Gamma(x)$
for $-1 < x < 0$. Once this has been done, $\Gamma(x + 1)/x$ is defined for all $x > -2$
except $x = 0, -1$, and it can be taken as a definition of $\Gamma(x)$ for $-2 < x < -1$.
Proceeding inductively, we eventually obtain a definition of $\Gamma(x)$ for all $x$ except
the nonpositive integers, where $\Gamma(x)$ blows up. In more explicit form, it is

$$\Gamma(x) = \frac{\Gamma(x + n)}{x(x + 1) \cdots (x + n - 1)} \qquad (x > -n). \tag{7.50}$$

This extended gamma function still satisfies the functional equation (7.47), more or
less by definition, and (7.49) remains valid provided that a/b is not a nonpositive
integer.
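The extension (7.50) translates directly into a short routine. This is a minimal sketch under assumptions: `gamma_extended` is a hypothetical name, `math.gamma` is used for positive arguments, and the comparison with `math.gamma` at negative non-integers relies on the library accepting such arguments (it does in CPython).

```python
import math

def gamma_extended(x):
    """Gamma(x) for real x that is not a nonpositive integer, via (7.50):
    shift x into (0, inf) and divide by x(x + 1)...(x + n - 1)."""
    if x > 0:
        return math.gamma(x)
    if x == math.floor(x):
        raise ValueError("Gamma has a pole at each nonpositive integer")
    n = int(math.floor(-x)) + 1          # smallest n with x + n > 0
    denominator = 1.0
    for j in range(n):
        denominator *= x + j
    return math.gamma(x + n) / denominator

# Gamma(-3/2) = Gamma(1/2) / ((-3/2)(-1/2)) = (4/3) sqrt(pi).
print(gamma_extended(-1.5), 4.0 * math.sqrt(math.pi) / 3.0)
```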
The qualitative behavior of the gamma function for x > 0 can be analyzed as
follows: Since Γ(1) = Γ(2) = 1, there is a critical point x0 in the interval (1, 2)
by Rolle’s theorem. On the other hand, from (7.46) it is clear that Γ′′ (x) > 0
for x > 0, so that Γ′ (x) is strictly increasing. It follows that Γ is decreasing for
0 < x < x0 and increasing for x > x0 ; in particular, it has a minimum at x0 . Also,
it tends to ∞ as x → 0 or x → ∞, so its graph is roughly U-shaped. The behavior
for x < 0 can then be deduced from (7.50). The graph of Γ is sketched in Figure
7.3.
A number of useful integrals can be transformed into the integral defining $\Gamma(x)$
by a change of variables. We single out two particularly useful ones, obtained by
setting $u = bt$ and $v = t^2$, respectively:

$$\int_0^\infty t^{x-1} e^{-bt}\,dt = \int_0^\infty \left(\frac{u}{b}\right)^{x-1} e^{-u}\,\frac{du}{b} = b^{-x}\Gamma(x) \qquad (b > 0), \tag{7.51}$$

$$\int_0^\infty t^{2x-1} e^{-t^2}\,dt = \int_0^\infty v^{(2x-1)/2} e^{-v}\,\frac{dv}{2v^{1/2}} = \tfrac12\Gamma(x). \tag{7.52}$$

There is another important integral related to the gamma function, the so-called
beta function

$$B(x, y) = \int_0^1 t^{x-1}(1 - t)^{y-1}\,dt \qquad (x, y > 0). \tag{7.53}$$

FIGURE 7.3: Graph of the equation $y = \Gamma(x)$, $-4 < x < 4$. (The
lines $x = -k$, $k = 0, 1, 2, \ldots$, are vertical asymptotes.)

Since the integrand is approximately equal to $t^{x-1}$ for $t$ near 0 and to $(1 - t)^{y-1}$
for $t$ near 1, the integral is proper when $x, y \ge 1$ and convergent for $x, y > 0$. Like
the gamma function, the beta function can be expressed in a number of different
forms by changes of variable in the integral. Other than (7.53), the most important
of these is obtained by the substitution $t = \sin^2\theta$, which makes $1 - t = \cos^2\theta$ and
$dt = 2 \sin\theta \cos\theta\,d\theta$, so that

$$B(x, y) = 2\int_0^{\pi/2} \sin^{2x-1}\theta\,\cos^{2y-1}\theta\,d\theta. \tag{7.54}$$

The relation between the gamma and beta functions is as follows:

7.55 Theorem. For $x, y > 0$, $\displaystyle B(x, y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x + y)}$.

Proof. We employ the same device that we used to calculate $\int_{-\infty}^\infty e^{-x^2}\,dx$ in §4.7:
We express $\Gamma(x)$ and $\Gamma(y)$ by (7.52), write $\Gamma(x)\Gamma(y)$ as an iterated integral, convert
the latter to a double integral, and switch to polar coordinates:

$$\begin{aligned}
\Gamma(x)\Gamma(y) &= 4\int_0^\infty t^{2x-1} e^{-t^2}\,dt \int_0^\infty s^{2y-1} e^{-s^2}\,ds \\
&= 4\int_0^\infty \int_0^\infty s^{2y-1} t^{2x-1} e^{-s^2 - t^2}\,ds\,dt \\
&= 4\int_0^{\pi/2} \int_0^\infty (r \cos\theta)^{2y-1} (r \sin\theta)^{2x-1} e^{-r^2}\,r\,dr\,d\theta \\
&= 4\int_0^{\pi/2} \cos^{2y-1}\theta\,\sin^{2x-1}\theta\,d\theta \int_0^\infty r^{2x+2y-1} e^{-r^2}\,dr \\
&= B(x, y)\Gamma(x + y).
\end{aligned}$$

In the last step we have used (7.52) and (7.54).
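Theorem 7.55 can be spot-checked by evaluating the trigonometric form (7.54) with a quadrature rule and comparing with the gamma quotient. The sketch below is an illustration under assumptions: the helper names are hypothetical, `math.gamma` stands in for $\Gamma$, and the sample points are chosen with $x, y \ge 1$ so that the integrand is smooth on $[0, \tfrac12\pi]$.

```python
import math

def simpson(g, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with n subintervals (n even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return total * h / 3.0

def beta_trig(x, y):
    """B(x, y) computed from the trigonometric form (7.54)."""
    def g(theta):
        return math.sin(theta) ** (2 * x - 1) * math.cos(theta) ** (2 * y - 1)
    return 2.0 * simpson(g, 0.0, 0.5 * math.pi)

def beta_gamma(x, y):
    """B(x, y) computed from Theorem 7.55."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)

for x, y in ((2.5, 1.5), (2.0, 3.0)):
    print(x, y, beta_trig(x, y), beta_gamma(x, y))
```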

We draw two useful consequences from Theorem 7.55. The first one is another
functional equation for the gamma function; the second one compares the growth
of Γ(x) and Γ(x + a) as x → ∞.

7.56 Theorem (The Duplication Formula). $\Gamma(2x) = \pi^{-1/2}\,2^{2x-1}\,\Gamma(x)\Gamma(x + \tfrac12)$.

Proof. Assume that $x > 0$. By taking $y = x$ in Theorem 7.55 and observing that
the function $t(1 - t)$ is symmetric about $t = \tfrac12$, we see that

$$\frac{\Gamma(x)^2}{\Gamma(2x)} = \int_0^1 \big[t(1 - t)\big]^{x-1}\,dt = 2\int_0^{1/2} \big[t(1 - t)\big]^{x-1}\,dt.$$

By the substitution

$$t = \tfrac12(1 - s^{1/2}), \qquad dt = -\tfrac14 s^{-1/2}\,ds, \qquad t(1 - t) = \tfrac14(1 - s)$$

and another application of Theorem 7.55, we obtain

$$\frac{\Gamma(x)^2}{\Gamma(2x)} = 2^{1-2x}\int_0^1 s^{-1/2}(1 - s)^{x-1}\,ds = 2^{1-2x}\,\frac{\Gamma(\tfrac12)\Gamma(x)}{\Gamma(x + \tfrac12)}.$$

Since $\Gamma(\tfrac12) = \pi^{1/2}$, the result follows. The extension to negative values of $x$ is left
to the reader (Exercise 6).
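As a quick sanity check on the constants in the duplication formula (a worked instance added for illustration, using only the values from (7.48)), take $x = 1$ and $x = \tfrac32$:

```latex
\begin{align*}
x = 1:\quad & \pi^{-1/2}\, 2^{1}\, \Gamma(1)\,\Gamma(\tfrac32)
  = \pi^{-1/2} \cdot 2 \cdot 1 \cdot \tfrac12\sqrt{\pi} = 1 = \Gamma(2), \\
x = \tfrac32:\quad & \pi^{-1/2}\, 2^{2}\, \Gamma(\tfrac32)\,\Gamma(2)
  = \pi^{-1/2} \cdot 4 \cdot \tfrac12\sqrt{\pi} \cdot 1 = 2 = \Gamma(3).
\end{align*}
```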

7.57 Theorem. For $a > 0$, $\displaystyle\lim_{x \to \infty} \frac{\Gamma(x + a)}{x^a\,\Gamma(x)} = 1$.

Proof. By Theorem 7.55, the substitution $t = e^{-u}$, and formula (7.51),

$$\frac{\Gamma(x)\Gamma(a)}{\Gamma(x + a)} = \int_0^1 t^{x-1}(1 - t)^{a-1}\,dt = \int_0^\infty (1 - e^{-u})^{a-1} e^{-xu}\,du.$$

When $x$ is large, $e^{-xu}$ is very small unless $u$ is close to 0, and in that case $1 - e^{-u}$ is
approximately $u$. Hence, the integral on the right should be approximately equal to
$\int_0^\infty u^{a-1} e^{-xu}\,du = x^{-a}\Gamma(a)$, which is what we are trying to show. More precisely,
we have

$$\begin{aligned}
\frac{\Gamma(x)\Gamma(a)}{\Gamma(x + a)} &= \int_0^\infty u^{a-1} e^{-xu}\,du + \int_0^\infty \big[(1 - e^{-u})^{a-1} - u^{a-1}\big] e^{-xu}\,du \\
&= x^{-a}\Gamma(a) + \int_0^\infty \big[(1 - e^{-u})^{a-1} - u^{a-1}\big] e^{-xu}\,du.
\end{aligned}$$

Multiplying both sides by $x^a/\Gamma(a)$, we obtain

$$\frac{x^a\,\Gamma(x)}{\Gamma(x + a)} - 1 = \frac{x^a}{\Gamma(a)}\int_0^\infty \left[\left(\frac{1 - e^{-u}}{u}\right)^{a-1} - 1\right] u^{a-1} e^{-xu}\,du. \tag{7.58}$$

It remains to show that the quantity on the right tends to zero as $x \to \infty$.
The function defined by $f(u) = (1 - e^{-u})/u$ for $u \ne 0$ and $f(0) = 1$ is
everywhere positive and of class $C^\infty$ (even at $u = 0$, for it is the sum of the power
series $\sum_1^\infty (-1)^{n-1} u^{n-1}/n!$). Hence the same is true of $f(u)^{a-1}$, so the function
$g(u) = f(u)^{a-1} - 1$ is smooth and vanishes at $u = 0$. By the mean value theorem,
then, for $0 \le u \le 1$ we have $|g(u)| = |g(u) - g(0)| \le Cu$ where $C$ is the
maximum value of $|g'(u)|$ on $[0, 1]$. On the other hand, for $u > 1$ we clearly have
$0 < f(u) < 1$ and hence $-1 < g(u) < 0$. Therefore, the quantity on the right of
(7.58) is bounded in absolute value by

$$\begin{aligned}
\frac{x^a}{\Gamma(a)}&\left[\int_0^1 C u^a e^{-xu}\,du + \int_1^\infty u^{a-1} e^{-xu}\,du\right] \\
&\le \frac{x^a}{\Gamma(a)}\left[C\int_0^1 u^a e^{-xu}\,du + \int_1^\infty u^a e^{-xu}\,du\right] \le (C + 1)\,\frac{x^a}{\Gamma(a)}\int_0^\infty u^a e^{-xu}\,du \\
&= (C + 1)\,\frac{\Gamma(a + 1)}{x\,\Gamma(a)} = (C + 1)\,\frac{a}{x},
\end{aligned}$$

where we have used (7.51) again in the last step. In short, the right side of (7.58) is
dominated by $x^{-1}$ as $x \to \infty$, so we are done.
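The rate of convergence in Theorem 7.57 can be observed numerically. The following sketch is illustrative only: `gamma_ratio` is a hypothetical helper, `math.lgamma` (the logarithm of the gamma function) is used to avoid overflow for large $x$, and the choice $a = \tfrac13$ is arbitrary.

```python
import math

def gamma_ratio(x, a):
    """Gamma(x + a) / (x**a * Gamma(x)), computed in log space."""
    return math.exp(math.lgamma(x + a) - math.lgamma(x) - a * math.log(x))

a = 1.0 / 3.0
samples = (10.0, 100.0, 1000.0, 10000.0)
errors = [abs(gamma_ratio(x, a) - 1.0) for x in samples]
for x, err in zip(samples, errors):
    print(x, err, x * err)     # x * err stays bounded, consistent with an O(1/x) error
```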
Theorem 7.57 can be used as an effective alternative to Raabe’s test to decide
the convergence of series involving quotients of factorial-like products, for such
quotients can be expressed as quotients of gamma functions by (7.49).

EXAMPLE 1. Let us reconsider Example 7 from §6.2, namely,

$$\sum_1^\infty \frac{1 \cdot 4 \cdot 7 \cdots (3n + 1)}{n^2\,3^n\,n!}.$$

Since

$$1 \cdot 4 \cdot 7 \cdots (3n + 1) = 3^n\big[\tfrac43 \cdot \tfrac73 \cdots (n + \tfrac13)\big] = 3^n\,\frac{\Gamma(n + \tfrac43)}{\Gamma(\tfrac43)},$$

and $n! = \Gamma(n + 1)$, the $n$th term of the series is

$$\frac{\Gamma(n + \tfrac43)}{n^2\,\Gamma(\tfrac43)\,\Gamma(n + 1)}.$$

By Theorem 7.57, $\Gamma(n + \tfrac43)/\Gamma(n + 1)$ is approximately $n^{1/3}$ when $n$ is large,
so the series converges by comparison to $\sum n^{-5/3}$.
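The two expressions for the $n$th term, and the comparison with $n^{-5/3}$, can be checked numerically. In this sketch the function names are illustrative assumptions, and `math.lgamma` is used so that large $n$ does not overflow.

```python
import math

def term_product(n):
    """n-th term 1*4*7...(3n+1) / (n**2 * 3**n * n!), as a running product."""
    value = 1.0 / n ** 2
    for j in range(1, n + 1):
        value *= (3 * j + 1) / (3 * j)
    return value

def term_gamma(n):
    """The same term written as Gamma(n + 4/3) / (n**2 Gamma(4/3) Gamma(n + 1))."""
    log_quotient = (math.lgamma(n + 4.0 / 3.0)
                    - math.lgamma(4.0 / 3.0)
                    - math.lgamma(n + 1.0))
    return math.exp(log_quotient) / n ** 2

for n in (1, 10, 200):
    print(n, term_product(n), term_gamma(n))

# term * n**(5/3) should approach 1 / Gamma(4/3) as n grows.
n = 10 ** 6
print(term_gamma(n) * n ** (5.0 / 3.0) * math.gamma(4.0 / 3.0))
```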

EXERCISES

1. Prove the duplication formula for the case where x is a positive integer simply
by using (7.48).
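2. Show that for $a, b > 0$,

$$\int_0^1 \left(\log\frac{1}{t}\right)^{a-1} dt = \Gamma(a), \qquad \int_0^1 \left(\log\frac{1}{t}\right)^{a-1} t^{b-1}\,dt = b^{-a}\Gamma(a).$$

3. Evaluate the following integrals:
a. $\int_0^\infty x^4 e^{-x^2}\,dx$.
b. $\int_0^\infty e^{-3x}\sqrt{x}\,dx$.
c. $\int_0^\infty x^9 e^{-x^4}\,dx$.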
4. Prove the following identities directly from the definition (7.53) (without using
Theorem 7.55):
a. B(x, y) = B(y, x).
b. B(x, 1) = x−1 .
c. $B(x + 1, y) + B(x, y + 1) = B(x, y)$.
d. $B(x, y) = \int_0^\infty (1 + t)^{-x-y}\,t^{y-1}\,dt$.
5. Given $a, b, c > 0$, evaluate $\int_0^1 x^a(1 - x^b)^c\,dx$ in terms of the gamma function.
6. Use the functional equation (7.47) to show that if the duplication formula (7.56)
is valid for a particular value of x, then it is also true for x − 1. Thence show
how to deduce its validity for all x from its validity for x > 0. (In case x is
a nonpositive integer or half-integer, the formula is valid in the sense that both
sides are infinite.)
7. Use (7.54), Theorem 7.55, and (7.48) to evaluate $\int_0^{\pi/2} \sin^k x\,dx$. (The form of
the answer is different depending on whether $k$ is even or odd.)
8. Prove Wallis’s formula:

$$\frac{\pi}{2} = \lim_{n \to \infty} \frac{2 \cdot 2 \cdot 4 \cdot 4 \cdot 6 \cdot 6 \cdots (2n)(2n)}{1 \cdot 3 \cdot 3 \cdot 5 \cdot 5 \cdot 7 \cdots (2n - 1)(2n + 1)}.$$

(Hint: Denote the fraction on the right by $c_n$. Use Exercise 7 and the fact that
$\sin^{2n+1} x < \sin^{2n} x < \sin^{2n-1} x$ for $0 < x < \tfrac12\pi$ to show that $c_n < \tfrac12\pi <
(2n + 1)c_n/2n$.)
9. Suppose $f$ is a continuous function on $[0, \infty)$. For $\alpha > 0$, define the function
$I_\alpha[f]$ on $[0, \infty)$ by

$$I_\alpha[f](x) = \frac{1}{\Gamma(\alpha)}\int_0^x (x - t)^{\alpha-1} f(t)\,dt.$$

$I_\alpha[f]$ is called the $\alpha$th-order fractional integral of $f$.
a. Show that the derivative of $I_{\alpha+1}[f]$ is $I_\alpha[f]$ for $\alpha > 0$, and the derivative
of $I_1[f]$ is $f$. (This generalizes Exercise 6 in §4.5.)
b. Show that $I_\alpha[I_\beta[f]] = I_{\alpha+\beta}[f]$ for all $\alpha, \beta > 0$.
10. Test the following series for convergence in the manner of Example 1.
a. $\displaystyle\sum_0^\infty \frac{1 \cdot 4 \cdots (3n + 1)}{2 \cdot 5 \cdots (3n + 2)}$
b. $\displaystyle\sum_0^\infty \frac{4^n\,n!}{5 \cdot 9 \cdots (4n + 5)}$
11. Show that $\displaystyle\sum_1^\infty \left[\frac{1 \cdot 3 \cdots (2n - 1)}{2 \cdot 4 \cdots (2n)}\right]^p$ converges if and only if $p > 2$. (Try both
Theorem 7.57 and Raabe’s test; you’ll find that the latter doesn’t work in the
borderline case $p = 2$.)
12. Suppose $a, b, c > 0$. Show that $\displaystyle\sum_0^\infty \frac{\Gamma(a + n)\Gamma(b + n)}{\Gamma(c + n)\,n!}$ converges if and only
if $a + b < c$.

7.7 Stirling’s Formula


Stirling’s formula is a simple and useful approximation to Γ(x) for large x; in
particular, it tells precisely how rapidly Γ(x) grows as x → ∞.
We begin by analyzing the case where $x$ is an integer $n + 1$, for which $\Gamma(x) =
n!$. First, observe that

$$\log(n!) = \log 1 + \log 2 + \cdots + \log n.$$

The sum on the right suggests a Riemann sum for $\int \log x\,dx$. Indeed, it is the
midpoint Riemann sum for $\int_{1/2}^{n+(1/2)} \log x\,dx$ corresponding to a division into $n$
equal subintervals, so the latter integral provides an approximation to $\log(n!)$. In
more detail, using this Riemann sum means taking $\log k$ as an approximation to

$$\int_{k-(1/2)}^{k+(1/2)} \log x\,dx = \int_{-1/2}^{1/2} \log(k + x)\,dx.$$

To see how good this approximation is, we approximate $\log(k + x)$ by its tangent
line at $x = 0$ and use Taylor’s theorem to estimate the error:

$$\log(k + x) = \log k + \frac{x}{k} + E_k(x), \qquad |E_k(x)| \le \sup_{|t| \le |x|} \frac{1}{(k + t)^2}\,\frac{x^2}{2!}.$$

(Here $(k + t)^{-2}$ is the absolute value of the second derivative of $\log(k + t)$.) Clearly,
for $|x| \le \tfrac12$ and $k \ge 1$ we have

$$|E_k(x)| \le \frac{1}{8(k - \tfrac12)^2} \le \frac{1}{8(\tfrac12 k)^2} \le \frac{1}{2k^2}.$$

Hence,

$$\int_{-1/2}^{1/2} \log(k + x)\,dx = \int_{-1/2}^{1/2} \big[\log k + k^{-1}x + E_k(x)\big]\,dx = \log k + c_k,$$

where

$$|c_k| = \left|\int_{-1/2}^{1/2} E_k(x)\,dx\right| \le \frac{1}{2k^2}. \tag{7.59}$$
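The estimate (7.59) can be tested directly, since the integral of $\log x$ over $[k - \tfrac12, k + \tfrac12]$ has a closed form. The sketch below is an added illustration: the function names are assumptions, and the range $1 \le k \le 1000$ is an arbitrary sample.

```python
import math

def antiderivative(x):
    """x log x - x, an antiderivative of log x."""
    return x * math.log(x) - x

def midpoint_error(k):
    """c_k: the integral of log x over [k - 1/2, k + 1/2], minus log k."""
    return antiderivative(k + 0.5) - antiderivative(k - 0.5) - math.log(k)

for k in (1, 2, 10, 100):
    print(k, midpoint_error(k), 1.0 / (2 * k * k))   # |c_k| next to the bound in (7.59)
```

In these samples $|c_k|$ is considerably smaller than $1/(2k^2)$; the bound in (7.59) is generous, but all that matters for the argument is that $\sum c_k$ converges.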
Adding these equalities up from $k = 1$ to $k = n$, we obtain

$$\int_{1/2}^{n+(1/2)} \log x\,dx = \log(n!) + \sum_1^n c_k.$$

On the other hand,

$$\begin{aligned}
\int_{1/2}^{n+(1/2)} \log x\,dx &= \big[x \log x - x\big]_{1/2}^{n+(1/2)} = (n + \tfrac12)\log(n + \tfrac12) - n - \tfrac12\log\tfrac12 \\
&= (n + \tfrac12)\log n - n + (n + \tfrac12)\log\left(\frac{n + \tfrac12}{n}\right) - \tfrac12\log\tfrac12.
\end{aligned}$$

Therefore,

$$\log(n!) - (n + \tfrac12)\log n + n = (n + \tfrac12)\log\big(1 + (2n)^{-1}\big) - \tfrac12\log\tfrac12 - \sum_1^n c_k.$$

Since $\log(1 + x) \approx x$ for $x$ near 0, as $n \to \infty$ the quantity on the right approaches
the constant $\tfrac12 - \tfrac12\log\tfrac12 - \sum_1^\infty c_k$, where the series converges by the estimate (7.59).
Exponentiating both sides, we obtain a preliminary version of Stirling’s formula:
7.60 Lemma. As $n \to \infty$, $\dfrac{n!}{n^{n+(1/2)} e^{-n}}$ approaches a limit $L \in (0, \infty)$.
Since $\Gamma(n) = n!/n$, Lemma 7.60 says that $\Gamma(n)/(n^{n-(1/2)} e^{-n}) \to L$ as $n \to
\infty$. We now extend this result from integers $n$ to real numbers $x$. To do so we need
a slight strengthening of Theorem 7.57, namely, the uniformity of the convergence
with respect to the parameter $a$.

7.61 Lemma. For any $A > 0$, $\displaystyle\sup_{0 \le a \le A}\left|\frac{x^a\,\Gamma(x)}{\Gamma(x + a)} - 1\right| \to 0$ as $x \to \infty$.

Proof. With $g(u) = f(u)^{a-1} - 1$ as in the proof of Theorem 7.57, the function
$|g'(u)| = |(a - 1)f(u)^{a-2} f'(u)|$ is jointly continuous in $a$ and $u$ in the compact
region $a \in [0, A]$, $u \in [0, 1]$, so its maximum on this region is finite. The constant $C$
in that proof can be taken to be this maximum when $a \in [0, A]$, and the conclusion
of the proof shows that

$$\sup_{0 \le a \le A}\left|\frac{x^a\,\Gamma(x)}{\Gamma(x + a)} - 1\right| \le \frac{(C + 1)A}{x},$$

which yields the desired result.

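7.62 Lemma. $\displaystyle \lim_{x\to\infty} \frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}} = L$, where $L$ is as in Lemma 7.60.

Proof. Any number $x \ge 1$ can be written as $x = n + a$ where $n$ is a positive integer
and $0 \le a < 1$, so that
\[ \begin{aligned}
\frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}} &= \frac{\Gamma(n+a)}{(n+a)^{n+a-(1/2)}e^{-n-a}} \\
&= \left[\frac{\Gamma(n)}{n^{n-(1/2)}e^{-n}}\right] \left[\frac{\Gamma(n+a)}{n^a\,\Gamma(n)}\right] \left[e^a\left(\frac{n+a}{n}\right)^{-n-a+(1/2)}\right].
\end{aligned} \]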
By Lemma 7.60, the first factor in this last expression will be as close to $L$ as we
please when $n$ is sufficiently large. By Lemma 7.61, the second factor will be as
close to 1 as we please when $n$ is sufficiently large and $0 \le a \le 1$. The same is
also true of the third factor; indeed, by taking logarithms it is enough to verify that
\[ \left| a - (n + a - \tfrac12)\log\left(1 + \frac{a}{n}\right) \right| \]
will be as close to 0 as we please when $n$ is sufficiently large and $0 \le a < 1$, and
this is easily accomplished by using the Taylor expansion of $\log(1 + t)$ about $t = 0$.
(Details are left to the reader as Exercise 1.) Combining these results, we see that
$\Gamma(x)/(x^{x-(1/2)}e^{-x})$ becomes as close to $L$ as we please when $x$ is sufficiently large,
as claimed.
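7.63 Theorem (Stirling's Formula). $\displaystyle \lim_{x\to\infty} \frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}} = \sqrt{2\pi}$.

Proof. It remains only to identify the constant $L$ in Lemma 7.62. According to that
lemma, the quantities
\[ \frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}}, \qquad \frac{\Gamma(x+\frac12)}{(x+\frac12)^{x}e^{-x-(1/2)}}, \qquad \frac{\Gamma(2x)}{(2x)^{2x-(1/2)}e^{-2x}} \]
all approach $L$ as $x \to \infty$. Dividing the product of the first two by the third and
using the duplication formula
\[ \frac{\Gamma(x)\,\Gamma(x+\frac12)}{\Gamma(2x)} = 2^{1-2x}\sqrt{\pi}, \]
we see that
\[ \begin{aligned}
L &= \lim_{x\to\infty} \frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}} \cdot \frac{\Gamma(x+\frac12)}{(x+\frac12)^{x}e^{-x-(1/2)}} \cdot \frac{(2x)^{2x-(1/2)}e^{-2x}}{\Gamma(2x)} \\
&= \lim_{x\to\infty} \sqrt{2\pi e}\,\left(1 + \frac{1}{2x}\right)^{-x}.
\end{aligned} \]
The last factor on the right tends to $e^{-1/2}$ as $x \to \infty$, so we are done.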

Stirling's formula is often written as
\[ \Gamma(x) \sim \sqrt{2\pi}\,x^{x-(1/2)}e^{-x}, \]

where $\sim$ means that the ratio of the quantities on the left and right approaches 1 as
$x \to \infty$. (The difference of these two quantities, however, tends to $\infty$ along with
$x$.)
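Both statements in the preceding remark can be checked numerically. The following Python sketch is an illustration only: the helper names `stirling_ratio` and `stirling_difference` are ours, and the standard-library functions `math.lgamma` and `math.gamma` stand in for $\log\Gamma$ and $\Gamma$. The ratio is computed through logarithms so that large arguments do not overflow.

```python
import math

def stirling_ratio(x: float) -> float:
    """Gamma(x) divided by sqrt(2*pi) * x**(x - 1/2) * exp(-x), via logarithms."""
    log_gamma = math.lgamma(x)
    log_approx = 0.5 * math.log(2 * math.pi) + (x - 0.5) * math.log(x) - x
    return math.exp(log_gamma - log_approx)

def stirling_difference(x: float) -> float:
    """Gamma(x) minus the Stirling approximation (only for moderate x, to avoid overflow)."""
    approx = math.sqrt(2 * math.pi) * x ** (x - 0.5) * math.exp(-x)
    return math.gamma(x) - approx

for x in (10.0, 20.0, 50.0):
    # The ratio drifts toward 1 while the difference keeps growing.
    print(x, stirling_ratio(x), stirling_difference(x))
```

Running the loop should show the first printed column of values approaching 1 while the second grows rapidly, which is the contrast the remark describes.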

EXERCISES

1. Complete the proof of Lemma 7.62 by showing that for some constant $C > 0$
we have $\sup_{0\le a\le 1} \bigl| a - (n + a - \tfrac12)\log[1 + (a/n)] \bigr| \le C/n$.
2. If a fair coin is tossed $2n$ times, the probability that it will come up heads
exactly $n$ times is $(2n)!/[(n!)^2\,2^{2n}]$. (The total number of possible outcomes is
$2^{2n}$, and the number of those with exactly $n$ heads is the binomial coefficient
$\binom{2n}{n} = (2n)!/(n!)^2$.) Use Stirling's formula to estimate this probability when
$n$ is large.
3. Stirling's formula for factorials,
\[ \lim_{n\to\infty} \frac{n!}{n^{n+(1/2)}e^{-n}} = \sqrt{2\pi}, \]
can be proved more simply than the general case. One begins, as we did, by
proving Lemma 7.60, but it is then enough to evaluate the constant $L$ there.
To do this, show that the fraction on the right of Wallis's formula (Exercise 8
in §7.6) equals $[2^n n!]^4/\bigl([(2n)!]^2(2n+1)\bigr)$, then use Lemma 7.60 to show that it
approaches $\tfrac14 L^2$ as $n \to \infty$; conclude that $L = \sqrt{2\pi}$.
Chapter 8

FOURIER SERIES

Fourier series are infinite series that use the trigonometric functions $\cos n\theta$ and
$\sin n\theta$, or, equivalently, $e^{in\theta}$ and $e^{-in\theta}$, as the basic building blocks, in the same
way that power series use the monomials $x^n$. They are a basic tool for analyzing
periodic functions, and they therefore have applications in the study of physical
phenomena that are periodic in time (such as circular or oscillatory motion) or in
space (such as crystal lattices). They can also be used to analyze functions defined
on finite intervals in ways that are useful in solving differential equations, and this
leads to many other applications in physics and engineering. The theory of Fourier
series and its ramifications is an extensive subject that lies at the heart of much
of modern mathematical analysis. Here we present only the basics; for further
information we refer the reader to Folland [6], Kammler [10], and Körner [11].

8.1 Periodic Functions and Fourier Series


A function f on R is called periodic with period P , or P -periodic for short, if
f (x + P ) = f (x) for all x. In this case, f is completely determined by its values
on any interval [a, a + P ) of length P , including one but not both of the endpoints;
conversely, any function f defined on an interval [a, a + P ) can be extended in a
unique way to be a periodic function on R, by declaring that f (x + nP ) = f (x)
for all x ∈ [a, a + P ) and all integers n. This correspondence between functions
on intervals and periodic functions on R will be useful later; for the time being, we
focus our attention on periodic functions.
Unlike power series, Fourier series can be used to represent functions that are
quite irregular. To keep the discussion reasonably simple, however, we shall make
a standing assumption that all functions under consideration are piecewise continuous.
By this we mean, precisely, the following: A function $f$ defined on an interval
$[a, b]$ is piecewise continuous on $[a, b]$ if it is continuous except at finitely many
points in $[a, b]$, and at each such point the one-sided limits
\[ f(x+) = \lim_{\epsilon\to0+} f(x+\epsilon), \qquad f(x-) = \lim_{\epsilon\to0+} f(x-\epsilon) \tag{8.1} \]

exist (and are finite). Moreover, we shall say that a P -periodic function f on R is
piecewise continuous if it is piecewise continuous on each interval of length P . (If
it is piecewise continuous on one such interval, of course, it is piecewise continuous
on all of them.)
Note. It is sometimes convenient to allow a piecewise continuous function to
be undefined at the points where it has jumps. This does not affect anything that
follows in a significant way.
A piecewise continuous function is integrable over every bounded interval in
its domain. In this connection, the following elementary fact is worth pointing out
explicitly: If f is P -periodic and piecewise continuous, the integrals of f over all
intervals of length P are equal:
\[ \int_a^{a+P} f(x)\,dx = \int_0^P f(x)\,dx \quad \text{for every } a \in \mathbb{R}. \tag{8.2} \]
The proof is left to the reader (Exercise 9).
By making the change of variable θ = 2πx/P , we can convert any P -periodic
function into a 2π-periodic function. Namely, if f (x + P ) = f (x) and we set
g(θ) = f (x) = f (P θ/2π), then g(θ + 2π) = g(θ). We may therefore restrict
attention to the case where the period is 2π, and we shall generally denote the inde-
pendent variable by θ. There is no presumption that θ denotes an angle, however;
it is just a convenient name for a real variable.
The basic idea of Fourier analysis is that an arbitrary piecewise continuous $2\pi$-periodic
function $f(\theta)$ can be expanded as an infinite linear combination of the
functions $e^{in\theta}$ ($n = 0, \pm1, \pm2, \ldots$), or equivalently of the functions $\cos n\theta$ and
$\sin n\theta$ ($n = 0, 1, 2, \ldots$). In terms of the functions $e^{in\theta}$, this expansion takes the
form
\[ f(\theta) = \sum_{-\infty}^{\infty} c_n e^{in\theta}. \tag{8.3} \]

Here $f$ may be either real-valued or complex-valued; the $c_n$'s are complex numbers,
and the series on the right is always to be interpreted as the limit of the symmetric
partial sums in which the $n$th and $(-n)$th terms are added in together:
\[ \sum_{-\infty}^{\infty} c_n e^{in\theta} = \lim_{k\to\infty} \sum_{-k}^{k} c_n e^{in\theta}. \]

Since $e^{\pm in\theta} = \cos n\theta \pm i\sin n\theta$, combining the $n$th and $(-n)$th terms gives
\[ c_n e^{in\theta} + c_{-n}e^{-in\theta} = a_n\cos n\theta + b_n\sin n\theta, \]

where
\[ a_n = c_n + c_{-n}, \qquad b_n = i(c_n - c_{-n}). \]
Therefore, (8.3) can be rewritten as
\[ f(\theta) = \tfrac12 a_0 + \sum_{1}^{\infty} (a_n\cos n\theta + b_n\sin n\theta). \tag{8.4} \]

The grouping of the $n$th and $(-n)$th terms in (8.3) corresponds to the grouping of
the $\cos n\theta$ and $\sin n\theta$ terms in (8.4). (The factor of $\tfrac12$ in front of $a_0$ is an artifact of
the definition $a_0 = c_0 + c_{-0} = 2c_0$.)
The series (8.3) and (8.4) can be used interchangeably. The more traditional
form is (8.4), but each of them has its advantages. The advantages of (8.4) derive
from the fact that cos nθ and sin nθ are real-valued and are respectively even and
odd; the advantages of (8.3) derive from the fact that exponentials tend to be eas-
ier to manipulate than trig functions. For developing the basic theory, the latter
consideration is compelling, so we shall work mostly with (8.3).
The questions that face us are as follows: Given a 2π-periodic function f , can
it be expanded in a series of the form (8.3)? If so, how do we find the coefficients
cn in this series? It turns out to be easier to tackle the second question first. That
is, we first assume that f can be expressed in the form (8.3) and figure out what
the coefficients cn must be; then we show that with this choice of cn , the expansion
(8.3) is actually valid under suitable hypotheses on f .
Suppose, then, that the series $\sum_{-\infty}^{\infty} c_n e^{in\theta}$ converges pointwise to the function
$f(\theta)$, and suppose also that the convergence is sufficiently well behaved that
term-by-term integration is permissible. The coefficients $c_n$ can then be evaluated by
the following device. To compute $c_k$, we multiply both sides of (8.3) by $e^{-ik\theta}$ and
integrate over $[-\pi, \pi]$:
\[ \int_{-\pi}^{\pi} f(\theta)e^{-ik\theta}\,d\theta = \sum_{-\infty}^{\infty} c_n \int_{-\pi}^{\pi} e^{i(n-k)\theta}\,d\theta. \]

Now,
\[ \int_{-\pi}^{\pi} e^{i(n-k)\theta}\,d\theta = \begin{cases} [i(n-k)]^{-1}e^{i(n-k)\theta}\big|_{-\pi}^{\pi} = 0 & \text{if } n \ne k, \\ \theta\big|_{-\pi}^{\pi} = 2\pi & \text{if } n = k. \end{cases} \tag{8.5} \]

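Thus all the terms on the right of the integrated series vanish except for the one
with $n = k$, and we obtain
\[ \int_{-\pi}^{\pi} f(\theta)e^{-ik\theta}\,d\theta = 2\pi c_k, \]
or, relabeling $k$ as $n$,
\[ c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)e^{-in\theta}\,d\theta. \tag{8.6} \]
This is the promised formula for the coefficients $c_n$. The corresponding formula
for $a_n$ and $b_n$ in (8.4) follows immediately:
\[ \begin{aligned}
a_n &= c_n + c_{-n} = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)\bigl[e^{-in\theta} + e^{in\theta}\bigr]\,d\theta = \frac{1}{\pi}\int_{-\pi}^{\pi} f(\theta)\cos n\theta\,d\theta, \\
b_n &= i(c_n - c_{-n}) = \frac{i}{2\pi}\int_{-\pi}^{\pi} f(\theta)\bigl[e^{-in\theta} - e^{in\theta}\bigr]\,d\theta = \frac{1}{\pi}\int_{-\pi}^{\pi} f(\theta)\sin n\theta\,d\theta.
\end{aligned} \tag{8.7} \]
Of course, according to (8.2), the integrals over $[-\pi, \pi]$ in (8.6) and (8.7) can be
replaced by integrals over any interval of length $2\pi$.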
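Formulas (8.6) and (8.7) translate directly into a numerical recipe. The sketch below is a hedged illustration, not part of the theory: the function names are ours, the integral in (8.6) is approximated by the midpoint rule, and the test function $f(\theta) = \theta$ on $(-\pi, \pi)$ is simply a convenient choice for which integration by parts gives $c_1 = -i$, $a_1 = 0$, $b_1 = 2$.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=20000):
    """Midpoint-rule approximation of c_n = (1/2pi) * int_{-pi}^{pi} f(t) e^{-int} dt."""
    h = 2 * math.pi / samples
    total = 0j
    for j in range(samples):
        t = -math.pi + (j + 0.5) * h  # midpoint of the j-th subinterval
        total += f(t) * cmath.exp(-1j * n * t)
    return total * h / (2 * math.pi)

def real_coefficients(f, n, samples=20000):
    """Return (a_n, b_n) using a_n = c_n + c_{-n} and b_n = i(c_n - c_{-n})."""
    c_pos = fourier_coefficient(f, n, samples)
    c_neg = fourier_coefficient(f, -n, samples)
    return c_pos + c_neg, 1j * (c_pos - c_neg)

# Illustrative test function (our assumption): f(t) = t on (-pi, pi).
a1, b1 = real_coefficients(lambda t: t, 1)
print(fourier_coefficient(lambda t: t, 1), a1, b1)
```

The quadrature rule is interchangeable; the midpoint rule is used here only because it avoids evaluating $f$ at the endpoints, where a periodic extension may jump.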
It is useful to keep in mind that in either (8.3) or (8.4), the constant term in the
series is
\[ c_0 = \tfrac12 a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)\,d\theta, \tag{8.8} \]
the mean value of $f$ on the interval $[-\pi, \pi]$ (or on any interval of length $2\pi$).
What have we accomplished? We have shown that if $f(\theta)$ is the sum of a series
$\sum_{-\infty}^{\infty} c_n e^{in\theta}$, and if term-by-term integration is legitimate, then the coefficients
$c_n$ must be given by (8.6), but as yet we know almost nothing about the class of
functions that can be represented by such series. But now the formula (8.6) provides
a starting point for studying this matter. Indeed, if $f$ is any integrable $2\pi$-periodic
function, the quantities
\[ a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(\theta)\cos n\theta\,d\theta, \qquad b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(\theta)\sin n\theta\,d\theta, \]
\[ c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)e^{-in\theta}\,d\theta \]
are well defined. We call them the Fourier coefficients of $f$, and we call the series
\[ \sum_{-\infty}^{\infty} c_n e^{in\theta} = \tfrac12 a_0 + \sum_{1}^{\infty} (a_n\cos n\theta + b_n\sin n\theta) \]

the Fourier series of $f$.

The study of general Fourier series will be undertaken in the following sections.
We conclude this one by working out two simple examples.
EXAMPLE 1. Let $f(\theta)$ be the $2\pi$-periodic function determined by the formula
$f(\theta) = \theta$ ($-\pi < \theta \le \pi$).
That is, $f$ is the sawtooth wave depicted in the top graph of Figure 8.1. The
calculation of the Fourier coefficients $c_n$ is an easy integration by parts for
$n \ne 0$:
\[ c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} \theta e^{-in\theta}\,d\theta = \frac{1}{2\pi}\left[\frac{\theta e^{-in\theta}}{-in} + \frac{e^{-in\theta}}{n^2}\right]_{-\pi}^{\pi} = \frac{(-1)^{n+1}}{in}, \]

since $e^{\pm in\pi} = (-1)^n$. Moreover, $c_0 = 0$ since the mean value of $f$ is clearly
zero. Thus the Fourier series of $f$ is
\[ \sum_{n\ne0} \frac{(-1)^{n+1}}{in}\,e^{in\theta}. \]

Grouping together the $n$th and $(-n)$th terms yields the equivalent form
\[ 2\sum_{1}^{\infty} \frac{(-1)^{n+1}}{n}\sin n\theta. \tag{8.9} \]

(We could also have obtained this series directly by using (8.7); we have $a_n = 0$
for all $n$ since $f$ is odd, and a calculation similar to the one above shows that
$b_n = 2(-1)^{n+1}/n$.)
The series (8.9) converges for all $\theta$ by Dirichlet's test. (See Corollary
6.27. The factor of $(-1)^{n+1}$ does not affect the result, since $(-1)^n\sin n\theta =
\sin n(\theta + \pi)$.) The sketches of some of the partial sums in Figure 8.1 lend
plausibility to the conjecture that (8.9) does indeed converge to the function $f(\theta)$,
at least at the points where $f$ is continuous. (At the points $\theta = (2k + 1)\pi$ where
$f$ is discontinuous, every term in (8.9) vanishes.)
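The behavior of the partial sums of (8.9) can also be explored numerically rather than graphically. The following is a minimal Python sketch; the function name `sawtooth_partial_sum` is ours, and the sample point $\theta = 1$ is an arbitrary choice of a point of continuity. It evaluates $S_k(\theta) = 2\sum_1^k (-1)^{n+1} n^{-1}\sin n\theta$ at $\theta = 1$ and at the jump $\theta = \pi$.

```python
import math

def sawtooth_partial_sum(theta: float, k: int) -> float:
    """S_k(theta) = 2 * sum_{n=1}^{k} (-1)^(n+1) * sin(n*theta) / n, as in (8.9)."""
    return 2 * sum((-1) ** (n + 1) * math.sin(n * theta) / n for n in range(1, k + 1))

for k in (4, 10, 16, 2000):
    # At theta = 1 the sums should drift toward f(1) = 1 (slowly, since the
    # terms decay only like 1/n); at theta = pi every term is zero up to rounding.
    print(k, sawtooth_partial_sum(1.0, k), sawtooth_partial_sum(math.pi, k))
```

The slow convergence visible here is typical of a series whose coefficients decay only like $1/n$.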
EXAMPLE 2. Let $g(\theta)$ be the $2\pi$-periodic function determined by the formula
$g(\theta) = |\theta|$ ($-\pi \le \theta \le \pi$).
That is, $g$ is the triangle wave depicted in the top graph of Figure 8.2. Here it
is a bit easier to calculate the Fourier coefficients in terms of sines and cosines.
Since $g$ is an even function, we have $b_n = 0$ for all $n$ and
\[ a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} g(\theta)\cos n\theta\,d\theta = \frac{2}{\pi}\int_{0}^{\pi} \theta\cos n\theta\,d\theta. \]

FIGURE 8.1: Top to bottom: The sawtooth wave of Example 1 and the partial sums $S_4$, $S_{10}$, and $S_{16}$ of its Fourier series ($S_k = 2\sum_1^k (-1)^{n+1} n^{-1}\sin n\theta$).

FIGURE 8.2: Top to bottom: The triangle wave of Example 2 and the partial sums $S_1$, $S_2$, and $S_3$ of its Fourier series ($S_k = (\pi/2) - (4/\pi)\sum_1^k (2m-1)^{-2}\cos(2m-1)\theta$).


For $n = 0$ we have $a_0 = (2/\pi)\int_0^{\pi} \theta\,d\theta = \pi$, and for $n > 0$ an integration by
parts gives
\[ a_n = \frac{2}{\pi}\left[\frac{\theta\sin n\theta}{n} + \frac{\cos n\theta}{n^2}\right]_0^{\pi} = \frac{2}{\pi}\,\frac{(-1)^n - 1}{n^2}. \]

In other words, $a_n = 0$ when $n$ is even and $a_n = -4/(\pi n^2)$ when $n$ is odd, so
we obtain the Fourier series
\[ \frac{\pi}{2} - \frac{4}{\pi}\sum_{n=1,3,5,\ldots} \frac{\cos n\theta}{n^2} = \frac{\pi}{2} - \frac{4}{\pi}\sum_{1}^{\infty} \frac{\cos(2m-1)\theta}{(2m-1)^2}. \tag{8.10} \]
Since $\sum_1^{\infty} n^{-2} < \infty$, this series converges absolutely and uniformly by the
Weierstrass M-test. Again, a glance at its first few partial sums in Figure 8.2
supports the conjecture that its full sum is $g(\theta)$.
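The uniform convergence supplied by the M-test can be made quantitative. Assuming the conjecture just stated (that the sum of (8.10) is $g$), the error of the $k$th partial sum is at most the tail $(4/\pi)\sum_{m>k}(2m-1)^{-2}$, which an integral comparison bounds by $2/(\pi(2k-1))$. The Python sketch below is our own illustration with hypothetical helper names; it compares the observed maximum error on a grid with that bound.

```python
import math

def triangle_partial_sum(theta: float, k: int) -> float:
    """S_k(theta) = pi/2 - (4/pi) * sum_{m=1}^{k} cos((2m-1)*theta) / (2m-1)^2, as in (8.10)."""
    s = sum(math.cos((2 * m - 1) * theta) / (2 * m - 1) ** 2 for m in range(1, k + 1))
    return math.pi / 2 - 4 / math.pi * s

def max_error_on_grid(k: int, points: int = 401) -> float:
    """Largest value of |S_k(theta) - |theta|| over an evenly spaced grid in [-pi, pi]."""
    grid = [-math.pi + 2 * math.pi * j / (points - 1) for j in range(points)]
    return max(abs(triangle_partial_sum(t, k) - abs(t)) for t in grid)

def tail_bound(k: int) -> float:
    """(4/pi) * sum_{m>k} (2m-1)^(-2) <= (4/pi) * int_k^inf dx/(2x-1)^2 = 2/(pi*(2k-1))."""
    return 2 / (math.pi * (2 * k - 1))

for k in (1, 2, 3, 50):
    print(k, max_error_on_grid(k), tail_bound(k))
```

In contrast with the sawtooth series, the bound here is independent of $\theta$, which is exactly what uniform convergence means.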

EXERCISES
In Exercises 1–8, find the Fourier series of the $2\pi$-periodic function $f(\theta)$ that
is given on the interval $(-\pi, \pi)$ by the indicated formula. (All but Exercise 5 are
either even or odd, so their Fourier series are naturally expressed in terms of cosines
or sines.) Sketches of these functions are given in Figure 8.3.
1. $f(\theta) = \begin{cases} -1 & (-\pi < \theta < 0), \\ 1 & (0 < \theta < \pi) \end{cases}$ (the square wave).
2. $f(\theta) = \sin^2\theta$. (You don't need calculus if you look at this the right way.)
3. $f(\theta) = |\sin\theta|$. (Hint: $\sin a\cos b = \tfrac12[\sin(a + b) + \sin(a - b)]$.)
4. $f(\theta) = \theta^2$.
5. $f(\theta) = e^{b\theta}$ ($b > 0$).
6. $f(\theta) = \theta(\pi - |\theta|)$.
7. $f(\theta) = \begin{cases} 1/a & (|\theta| < a), \\ -1/(\pi - a) & (a < |\theta| < \pi), \end{cases}$ where $0 < a < \pi$. (The values of $f$
are chosen to make the areas of the rectangles between the graph of $f$ and the
$x$-axis on the intervals $[0, a]$ and $[a, \pi]$ both equal to 1.)
8. $f(\theta) = \begin{cases} a^{-2}(a - |\theta|) & (|\theta| < a), \\ 0 & (a < |\theta| < \pi), \end{cases}$ where $0 < a < \pi$. (The constants are
chosen to make the areas of the triangles under the graph of $f$ equal to 1.)
9. Prove that (8.2) is valid for every piecewise continuous $P$-periodic function $f$.
(This can be done either directly by changes of variable or by differentiating
$\int_a^{a+P} f(x)\,dx$ with respect to $a$ via Theorem 4.15a.)

8.2 Convergence of Fourier Series


Given a piecewise continuous $2\pi$-periodic function $f$, we form its Fourier series:
\[ \sum_{-\infty}^{\infty} c_n e^{in\theta}, \qquad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)e^{-in\theta}\,d\theta. \tag{8.11} \]

Does this series converge? If so, what is its sum?


FIGURE 8.3: The functions in Exercises 1–8 of §8.1.

These questions are rather delicate. In the first place, since $|e^{in\theta}| \equiv 1$, a necessary
condition for the convergence of the Fourier series is that $c_n \to 0$ as $n \to \infty$,
but the only estimate on the $c_n$'s that is obvious from the definition is that they are
bounded by a constant:
\[ |c_n| \le \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(\theta)e^{-in\theta}|\,d\theta = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(\theta)|\,d\theta \le \sup_{\theta} |f(\theta)|. \]

However, it is actually true that cn → 0; in fact, we can say something more precise.

8.12 Theorem (Bessel's Inequality). If $f$ is $2\pi$-periodic and piecewise continuous
and $c_n$ is defined by (8.11), then
\[ \sum_{-\infty}^{\infty} |c_n|^2 \le \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(\theta)|^2\,d\theta. \]

In particular, $\sum |c_n|^2 < \infty$, and hence $\lim_{n\to\pm\infty} c_n = 0$.

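Proof. We examine the difference between $f$ and a partial sum of its Fourier series.
Since the absolute value of a complex number $z$ is given by $|z|^2 = z\bar z$, we have
\[ \begin{aligned}
\left| f(\theta) - \sum_{-N}^{N} c_n e^{in\theta} \right|^2 &= \left( f(\theta) - \sum_{-N}^{N} c_n e^{in\theta} \right)\left( \overline{f(\theta)} - \sum_{-N}^{N} \overline{c_n}\,e^{-in\theta} \right) \\
&= |f(\theta)|^2 - \sum_{-N}^{N} \bigl[ \overline{c_n}\,f(\theta)e^{-in\theta} + c_n\overline{f(\theta)}\,e^{in\theta} \bigr] + \sum_{m,n=-N}^{N} c_m\overline{c_n}\,e^{i(m-n)\theta}.
\end{aligned} \]

Next, integration of both sides over $[-\pi, \pi]$, using the definition of $c_n$ and the
relation (8.5), yields
\[ \begin{aligned}
\frac{1}{2\pi}\int_{-\pi}^{\pi} \left| f(\theta) - \sum_{-N}^{N} c_n e^{in\theta} \right|^2 d\theta &= \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(\theta)|^2\,d\theta - \sum_{-N}^{N} \bigl[ \overline{c_n}c_n + c_n\overline{c_n} \bigr] + \sum_{-N}^{N} c_n\overline{c_n} \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(\theta)|^2\,d\theta - \sum_{-N}^{N} |c_n|^2.
\end{aligned} \]

The integral on the left is clearly nonnegative, so
\[ 0 \le \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(\theta)|^2\,d\theta - \sum_{-N}^{N} |c_n|^2. \]

Letting $N \to \infty$, we obtain the desired result.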
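As a concrete check of Bessel's inequality, take the sawtooth wave of Example 1 in §8.1, for which $c_0 = 0$ and $|c_n| = 1/|n|$ for $n \ne 0$, while $(2\pi)^{-1}\int_{-\pi}^{\pi} \theta^2\,d\theta = \pi^2/3$. The short Python sketch below (the names `bessel_lhs` and `BESSEL_RHS` are ours) tabulates the symmetric partial sums of $\sum |c_n|^2$ against that bound.

```python
import math

def bessel_lhs(N: int) -> float:
    """sum_{n=-N}^{N} |c_n|^2 for the sawtooth wave: c_0 = 0 and |c_n| = 1/|n| otherwise."""
    return 2 * sum(1.0 / n ** 2 for n in range(1, N + 1))

# Right side of Bessel's inequality for f(theta) = theta:
# (1/2pi) * integral of theta^2 over [-pi, pi] = pi^2 / 3.
BESSEL_RHS = math.pi ** 2 / 3

for N in (1, 10, 100, 1000):
    print(N, bessel_lhs(N), BESSEL_RHS)
```

The partial sums increase with $N$ and stay below the right-hand side, as the theorem requires.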



To proceed further in our study of the convergence of the Fourier series of a
function $f$, we must take a closer look at the partial sums
\[ S_N^f(\theta) = \sum_{-N}^{N} c_n e^{in\theta}, \qquad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\psi)e^{-in\psi}\,d\psi. \tag{8.13} \]

Substitution of the formula for $c_n$ into the sum yields
\[ \begin{aligned}
S_N^f(\theta) &= \frac{1}{2\pi}\sum_{-N}^{N}\int_{-\pi}^{\pi} f(\psi)e^{in(\theta-\psi)}\,d\psi = \frac{1}{2\pi}\sum_{-N}^{N}\int_{-\pi}^{\pi} f(\psi)e^{in(\psi-\theta)}\,d\psi \\
&= \frac{1}{2\pi}\sum_{-N}^{N}\int_{-\pi}^{\pi} f(\phi+\theta)e^{in\phi}\,d\phi.
\end{aligned} \]

(The second equality is obtained by replacing $n$ by $-n$, which leaves the sum from
$-N$ to $N$ unchanged, and the third one comes from the change of variable $\phi =
\psi - \theta$ with the help of (8.2).) In other words,
\[ S_N^f(\theta) = \int_{-\pi}^{\pi} f(\phi+\theta)D_N(\phi)\,d\phi, \quad \text{where } D_N(\phi) = \frac{1}{2\pi}\sum_{-N}^{N} e^{in\phi}. \tag{8.14} \]

$D_N$ is called the $N$th Dirichlet kernel. Its essential properties are summarized in
the following lemma.

8.15 Lemma. Let $D_N(\phi)$ be the function defined in (8.14). Then:

a. $\displaystyle \int_{-\pi}^{0} D_N(\phi)\,d\phi = \int_{0}^{\pi} D_N(\phi)\,d\phi = \frac12$.
b. $\displaystyle D_N(\phi) = \frac{1}{2\pi}\,\frac{e^{i(N+1)\phi} - e^{-iN\phi}}{e^{i\phi} - 1}$.

Proof. The validity of (a) is most easily seen by rewriting (8.14) as $D_N(\phi) =
(2\pi)^{-1} + \pi^{-1}\sum_1^N \cos n\phi$ and integrating this sum term by term. Since $\sin 0 =
\sin(\pm n\pi) = 0$, only the constant term gives a nonzero contribution. To prove (b),
we use the formula (6.2) for the sum of a finite geometric progression:
\[ 2\pi D_N(\phi) = e^{-iN\phi}\sum_{0}^{2N} e^{in\phi} = e^{-iN\phi}\,\frac{e^{i(2N+1)\phi} - 1}{e^{i\phi} - 1} = \frac{e^{i(N+1)\phi} - e^{-iN\phi}}{e^{i\phi} - 1}. \]

Incidentally, if we multiply and divide the formula in Lemma 8.15b for $D_N(\phi)$
by $e^{-i\phi/2}$, we obtain
\[ D_N(\phi) = \frac{1}{2\pi}\,\frac{e^{i(N+(1/2))\phi} - e^{-i(N+(1/2))\phi}}{e^{i\phi/2} - e^{-i\phi/2}} = \frac{\sin(N+\frac12)\phi}{2\pi\sin\frac12\phi}. \]

This shows that $D_N$ is real-valued and gives an easy way to visualize it: Its graph
is the rapidly oscillating sine wave $y = \sin(N + \tfrac12)\phi$, amplitude-modulated to fit
inside the envelope $y = \pm(2\pi\sin\tfrac12\phi)^{-1}$. (The reader may wish to generate graphs
of $D_N$ for various values of $N$ on a computer.)
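For readers who take up that suggestion, here is a small Python starting point. It is a sketch with names of our own choosing, using only the standard library; it evaluates $D_N$ both from the defining sum in (8.14) and from the closed form just derived, and it checks Lemma 8.15a by the midpoint rule. The values it returns can be passed to any plotting tool.

```python
import cmath
import math

def dirichlet_sum(N: int, phi: float) -> float:
    """D_N(phi) = (1/2pi) * sum_{n=-N}^{N} e^{i n phi}, straight from (8.14)."""
    total = sum(cmath.exp(1j * n * phi) for n in range(-N, N + 1))
    return total.real / (2 * math.pi)  # the imaginary parts cancel in pairs

def dirichlet_closed(N: int, phi: float) -> float:
    """sin((N + 1/2) phi) / (2 pi sin(phi / 2)); requires phi not a multiple of 2 pi."""
    return math.sin((N + 0.5) * phi) / (2 * math.pi * math.sin(0.5 * phi))

def half_integral(N: int, samples: int = 2000) -> float:
    """Midpoint-rule approximation of the integral of D_N over [0, pi] (Lemma 8.15a)."""
    h = math.pi / samples
    return h * sum(dirichlet_sum(N, (j + 0.5) * h) for j in range(samples))

for N in (1, 5, 20):
    print(N, dirichlet_sum(N, 0.7), dirichlet_closed(N, 0.7), half_integral(N))
```

Note that the closed form is undefined at $\phi = 0$, where the defining sum gives $D_N(0) = (2N+1)/2\pi$.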
We are now ready to formulate and prove the basic convergence theorem for
Fourier series. It turns out that piecewise continuity of a periodic function f is not
enough to yield a good result. Instead we shall assume, in effect, that not only
f but also its derivative f ′ is piecewise continuous. More precisely, we shall say
that a periodic function f is piecewise smooth if, on any bounded interval, f is
of class $C^1$ except at finitely many points, at which the one-sided limits $f(\theta+)$,
$f(\theta-)$, $f'(\theta+)$, and $f'(\theta-)$ (as defined in (8.1)) exist and are finite. (Note that this
definition of piecewise smoothness is more general than that given in §5.1, which
required the function to be continuous.) Pictorially, f is piecewise smooth if its
graph over any bounded interval is a smooth curve except at finitely many points
where it has jumps (if f is discontinuous) or corners (if f is continuous but f ′ is
discontinuous). In addition, the one-sided tangent lines at the jumps and corners
are not allowed to be vertical.

8.16 Theorem. Suppose $f$ is $2\pi$-periodic and piecewise smooth. Then the partial
sums $S_N^f(\theta)$ of the Fourier series of $f$, defined by (8.13), converge pointwise to
$\tfrac12[f(\theta-) + f(\theta+)]$. In particular, they converge to $f(\theta)$ at each point $\theta$ where $f$ is
continuous.

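Proof. By Lemma 8.15a, we have
\[ \tfrac12 f(\theta-) = f(\theta-)\int_{-\pi}^{0} D_N(\phi)\,d\phi, \qquad \tfrac12 f(\theta+) = f(\theta+)\int_{0}^{\pi} D_N(\phi)\,d\phi, \]

so by (8.14), the difference between $S_N^f(\theta)$ and its asserted limit is
\[ \begin{aligned}
S_N^f(\theta) &- \tfrac12\bigl[f(\theta-) + f(\theta+)\bigr] \\
&= \int_{-\pi}^{0} \bigl[f(\phi+\theta) - f(\theta-)\bigr]D_N(\phi)\,d\phi + \int_{0}^{\pi} \bigl[f(\phi+\theta) - f(\theta+)\bigr]D_N(\phi)\,d\phi.
\end{aligned} \]

Our object is to show that this quantity vanishes as $N \to \infty$. By Lemma 8.15b, we
can rewrite it as
\[ \frac{1}{2\pi}\int_{-\pi}^{\pi} g(\phi)\bigl[e^{i(N+1)\phi} - e^{-iN\phi}\bigr]\,d\phi, \tag{8.17} \]

where
\[ g(\phi) = \begin{cases} \dfrac{f(\phi+\theta) - f(\theta-)}{e^{i\phi} - 1} & \text{if } -\pi \le \phi < 0, \\[2ex] \dfrac{f(\phi+\theta) - f(\theta+)}{e^{i\phi} - 1} & \text{if } 0 < \phi \le \pi. \end{cases} \]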
(We could define $g(0)$ to be anything we please; altering the value at this one point
does not affect (8.17), by Proposition 4.14.) On the interval $[-\pi, \pi]$, $g(\phi)$ is
continuous wherever $f(\phi + \theta)$ is and has jump discontinuities wherever $f(\phi + \theta)$ does,
except for an additional singularity at $\phi = 0$ caused by the vanishing of $e^{i\phi} - 1$
there. But this singularity is also at worst a jump discontinuity; that is, the limits
$g(0+)$ and $g(0-)$ both exist. Indeed, by l'Hôpital's rule,
\[ g(0+) = \lim_{\phi\to0+} \frac{f(\phi+\theta) - f(\theta+)}{e^{i\phi} - 1} = \lim_{\phi\to0+} \frac{f'(\phi+\theta)}{ie^{i\phi}} = \frac{f'(\theta+)}{i}, \]

and likewise $g(0-) = i^{-1}f'(\theta-)$. In short, $g$ is piecewise continuous.


Now we are done. By Bessel's inequality, the Fourier coefficients of $g$,
\[ C_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} g(\phi)e^{-in\phi}\,d\phi, \]

tend to zero as $n \to \pm\infty$. But the quantity (8.17) is simply $C_{-N-1} - C_N$, so it
vanishes as $N \to \infty$, as desired.

If f is piecewise continuous, there may be some question as to how to define f


at its points of discontinuity; as we mentioned earlier, we may wish to allow f to
remain undefined at these points. But Theorem 8.16 shows that for the purposes of
Fourier analysis, the natural choice is the average of the left- and right-hand limits:
$f(\theta) = \tfrac12[f(\theta-) + f(\theta+)]$. We shall say that $f$ is standardized if it satisfies this
condition at all θ; thus, every standardized piecewise smooth 2π-periodic function
is the sum of its Fourier series at every point.

8.18 Corollary. If f and g are standardized piecewise smooth 2π-periodic func-


tions with the same Fourier coefficients, then f = g.

Proof. f and g are the sum of the same Fourier series.



To illustrate Theorem 8.16, let us consider the two examples in §8.1.


EXAMPLE 1. The sawtooth wave $f(\theta)$ defined by $f(\theta) = \theta$ for $|\theta| < \pi$ is
smooth except at the odd multiples of $\pi$, where its left- and right-hand limits are
$\pi$ and $-\pi$, respectively. Thus the Fourier series of $f$ converges to $f$ everywhere
except at the odd multiples of $\pi$, where it converges to 0. On the interval
$(-\pi, \pi)$, the result is
\[ \sum_{1}^{\infty} \frac{(-1)^{n+1}}{n}\sin n\theta = \frac{\theta}{2} \quad \text{for } -\pi < \theta < \pi. \]

In particular, by taking $\theta = \tfrac12\pi$, we obtain the interesting formula
\[ \sum_{1}^{\infty} \frac{(-1)^{m-1}}{2m-1} = 1 - \frac13 + \frac15 - \frac17 + \cdots = \frac{\pi}{4}, \]

which we derived by other methods in Example 7 of §7.3.


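EXAMPLE 2. The triangle wave $g(\theta)$ defined by $g(\theta) = |\theta|$ for $|\theta| < \pi$ is
piecewise smooth and everywhere continuous, so it is the sum of its Fourier
series at every point. Thus,
\[ \frac{\pi}{2} - \frac{4}{\pi}\sum_{1}^{\infty} \frac{\cos(2m-1)\theta}{(2m-1)^2} = |\theta| \quad \text{for } -\pi \le \theta \le \pi. \]

By taking $\theta = 0$ (or $\theta = \pm\pi$), we obtain another interesting formula:
\[ \sum_{1}^{\infty} \frac{1}{(2m-1)^2} = 1 + \frac{1}{3^2} + \frac{1}{5^2} + \frac{1}{7^2} + \cdots = \frac{\pi^2}{8}. \]

From this it is also easy to obtain the sum
\[ S = \sum_{1}^{\infty} \frac{1}{n^2} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots \]

by separating out the odd and even terms:
\[ \begin{aligned}
S &= \left(1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots\right) + \left(\frac{1}{2^2} + \frac{1}{4^2} + \frac{1}{6^2} + \cdots\right) \\
&= \frac{\pi^2}{8} + \frac14\left(1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\right) = \frac{\pi^2}{8} + \frac{S}{4},
\end{aligned} \]
so that $3S/4 = \pi^2/8$, or $S = \pi^2/6$.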
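The three numerical identities obtained in these two examples are easy to test by summing many terms. The Python sketch below is only a sanity check with helper names of our own; the truncation point of $10^5$ terms is arbitrary, and the agreement is limited by the slow $1/N$ decay of the tails.

```python
import math

def leibniz_sum(terms: int) -> float:
    """Partial sum of sum_{m>=1} (-1)^(m-1) / (2m-1), which Example 1 equates to pi/4."""
    return sum((-1) ** (m - 1) / (2 * m - 1) for m in range(1, terms + 1))

def odd_square_sum(terms: int) -> float:
    """Partial sum of sum_{m>=1} 1/(2m-1)^2, which Example 2 equates to pi^2/8."""
    return sum(1.0 / (2 * m - 1) ** 2 for m in range(1, terms + 1))

def square_sum(terms: int) -> float:
    """Partial sum of S = sum_{n>=1} 1/n^2, which Example 2 equates to pi^2/6."""
    return sum(1.0 / n ** 2 for n in range(1, terms + 1))

N = 100_000
print(leibniz_sum(N), math.pi / 4)
print(odd_square_sum(N), math.pi ** 2 / 8)
print(square_sum(N), math.pi ** 2 / 6)
```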

FIGURE 8.4: The function $h$ of Example 3.

We conclude by remarking that one can often use simple changes of variable
to generate new Fourier expansions from old ones without recalculating the coeffi-
cients from scratch.
EXAMPLE 3. Consider the modified triangle wave $h$ whose graph is given
in Figure 8.4. It is related to the triangle wave $g$ in Example 2 by $h(\theta) =
g(\theta + \tfrac12\pi)$, and $\cos(2m-1)(\theta + \tfrac12\pi) = (-1)^m\sin(2m-1)\theta$, so
\[ h(\theta) = \frac{\pi}{2} + \frac{4}{\pi}\sum_{1}^{\infty} \frac{(-1)^{m-1}\sin(2m-1)\theta}{(2m-1)^2}. \]

Abel Summability of Fourier Series. The Fourier coefficients of a periodic
function $f$ are defined whenever $f$ is piecewise continuous, but we have proved
the convergence of the Fourier series only when $f$ is piecewise smooth. In fact,
it has been known since 1876 that there are continuous periodic functions whose
Fourier series fail to converge at some points. (The examples are all quite
complicated.) However, if $f$ is merely piecewise continuous, we can still recover $f$ from
its Fourier series $\sum_{-\infty}^{\infty} c_n e^{in\theta}$ by the method of Abel summation that we discussed
at the end of §7.3. Namely, for $0 < r < 1$ we consider the series
\[ A_r f(\theta) = \sum_{-\infty}^{\infty} r^{|n|} c_n e^{in\theta} \tag{8.19} \]

and its limit as $r \to 1-$ (i.e., as $r$ approaches 1 from the left).

Since the coefficients $c_n$ are bounded, the series (8.19) converges absolutely by
comparison to the geometric series $\sum r^{|n|}$. Moreover, substitution of the formula
(8.6) for $c_n$ into (8.19) gives
\[ A_r f(\theta) = \frac{1}{2\pi}\sum_{-\infty}^{\infty}\int_{-\pi}^{\pi} r^{|n|} f(\psi)e^{in(\theta-\psi)}\,d\psi. \]

Since $f$ is bounded, the Weierstrass M-test (comparison to $\sum r^{|n|}$ again) gives the
uniform convergence to justify interchange of the summation and integration, and
a couple of manipulations like those that lead to (8.14) then show that
\[ A_r f(\theta) = \int_{-\pi}^{\pi} f(\theta+\phi)P_r(\phi)\,d\phi, \quad \text{where } P_r(\phi) = \frac{1}{2\pi}\sum_{-\infty}^{\infty} r^{|n|}e^{in\phi}. \tag{8.20} \]

The function $P_r$ is called the Poisson kernel. Like the Dirichlet kernel, it satisfies
\[ \int_{-\pi}^{0} P_r(\phi)\,d\phi = \int_{0}^{\pi} P_r(\phi)\,d\phi = \frac12 \tag{8.21} \]
(write $P_r(\phi) = (2\pi)^{-1} + \pi^{-1}\sum_1^{\infty} r^n\cos n\phi$ and integrate term by term), and it is
easily expressed in closed form since it is the sum of two geometric series:
\[ \begin{aligned}
P_r(\phi) &= \frac{1}{2\pi}\left[\sum_{0}^{\infty} r^n e^{in\phi} + \sum_{1}^{\infty} r^n e^{-in\phi}\right] = \frac{1}{2\pi}\left[\frac{1}{1 - re^{i\phi}} + \frac{re^{-i\phi}}{1 - re^{-i\phi}}\right] \\
&= \frac{1 - r^2}{2\pi(1 - re^{i\phi})(1 - re^{-i\phi})} = \frac{1 - r^2}{2\pi(1 + r^2 - 2r\cos\phi)}.
\end{aligned} \tag{8.22} \]

However, the Poisson kernel has one additional crucial property that is not shared
by the Dirichlet kernel:
\[ \text{For any } \delta > 0,\ P_r(\phi) \to 0 \text{ uniformly on } [-\pi, -\delta] \text{ and on } [\delta, \pi] \text{ as } r \to 1-. \tag{8.23} \]

Indeed, by (8.22), for $\delta \le |\phi| \le \pi$ we have
\[ 0 \le P_r(\phi) \le \frac{1 - r^2}{2\pi(1 + r^2 - 2r\cos\delta)}, \]

and the expression on the right tends to zero as $r \to 1-$. With these results in hand,
we come to the main theorem.
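Both the closed form (8.22) and the decay property (8.23) lend themselves to a quick numerical check. The Python sketch below is an illustration under our own naming; the series is truncated at a fixed number of terms, which is harmless here because $r^n$ is negligible long before the cutoff for the values of $r$ used.

```python
import math

def poisson_series(r: float, phi: float, terms: int = 2000) -> float:
    """Truncation of (1/2pi) * sum over n of r^|n| e^{i n phi}, written with cosines."""
    s = 1.0 + 2.0 * sum(r ** n * math.cos(n * phi) for n in range(1, terms + 1))
    return s / (2 * math.pi)

def poisson_closed(r: float, phi: float) -> float:
    """Closed form (8.22): (1 - r^2) / (2 pi (1 + r^2 - 2 r cos(phi)))."""
    return (1 - r * r) / (2 * math.pi * (1 + r * r - 2 * r * math.cos(phi)))

def bound_away_from_zero(r: float, delta: float) -> float:
    """Upper bound for P_r on delta <= |phi| <= pi used to justify (8.23)."""
    return poisson_closed(r, delta)

print(poisson_series(0.9, 1.0), poisson_closed(0.9, 1.0))
for r in (0.9, 0.99, 0.999):
    # The bound shrinks toward 0 as r approaches 1 from the left, with delta fixed.
    print(r, bound_away_from_zero(r, 0.5))
```

By contrast, $P_r(0) = (1+r)/(2\pi(1-r))$ blows up as $r \to 1-$, so the mass of the kernel concentrates near $\phi = 0$.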

8.24 Theorem. Suppose that $f$ is $2\pi$-periodic. If $f$ is piecewise continuous, then
\[ \lim_{r\to1-} A_r f(\theta) = \tfrac12[f(\theta-) + f(\theta+)] \]

for every $\theta$. If $f$ is continuous, then $A_r f \to f$ uniformly on $\mathbb{R}$ as $r \to 1-$.


Proof. We sketch the ideas and leave the details to the reader as Exercises 5 and 6.
Given $\theta \in \mathbb{R}$ and $\epsilon > 0$, we choose $\delta > 0$ small enough so that $|f(\theta+\phi) - f(\theta+)| < \epsilon$
when $0 < \phi < \delta$ and $|f(\theta+\phi) - f(\theta-)| < \epsilon$ when $-\delta < \phi < 0$. We then write
the formula (8.20) for $A_r f(\theta)$ as
\[ A_r f(\theta) = \left[\int_{-\pi}^{-\delta} + \int_{-\delta}^{0} + \int_{0}^{\delta} + \int_{\delta}^{\pi}\right] f(\theta+\phi)P_r(\phi)\,d\phi. \]

The first and last integrals tend to zero as $r \to 1-$ by (8.23). In the second and
third integrals, $f(\theta + \phi)$ is within $\epsilon$ of $f(\theta-)$ and $f(\theta+)$, respectively, and (8.21)
and (8.23) together show that the integrals of $P_r(\phi)$ over $[-\delta, 0]$ and $[0, \delta]$ tend to
$\tfrac12$ as $r \to 1-$. The upshot is that $A_r f(\theta)$ is within $2\epsilon$ of $\tfrac12[f(\theta-) + f(\theta+)]$ when
$r$ is sufficiently close to 1, and since $\epsilon$ is arbitrary, the first assertion is proved.
If $f$ is continuous, it is uniformly continuous on $[-\pi, \pi]$ by Theorem 1.33 and
hence uniformly continuous on $\mathbb{R}$ by periodicity. This means that the $\delta$ in the
preceding paragraph can be chosen independent of $\theta$, and the argument given there
then yields uniform convergence.

EXERCISES

1. Find the Fourier series of the sawtooth waves depicted below by modifying the
series in Example 1.

[Figure: two sawtooth waves, panels (a) and (b); the surviving axis labels are $\pi/2$ and $\pi$ in panel (a) and $2$ and $\pi$ in panel (b).]

2. Find the Fourier series of the $2\pi$-periodic function $f(\theta)$ defined by $f(\theta) =
(\theta - \tfrac14\pi)^2$ on the interval $[-\tfrac34\pi, \tfrac54\pi]$. Use the result of Exercise 4 in §8.1.
3. Find the Fourier series of the $2\pi$-periodic functions defined on the interval
$(-\pi, \pi)$ by the indicated formulas by modifying the series in the exercises of
§8.1.
a. $f(\theta) = \begin{cases} 0 & (-\pi < \theta < 0), \\ 1 & (0 < \theta < \pi). \end{cases}$
b. $f(\theta) = \begin{cases} 0 & (-\pi < \theta < 0), \\ \sin\theta & (0 < \theta < \pi). \end{cases}$ (Hint: $\max(x, 0) = \tfrac12(x + |x|)$.)
c. $f(\theta) = \begin{cases} (2a)^{-1} & (|\theta| < a), \\ 0 & (a < |\theta| < \pi), \end{cases}$ where $0 < a < \pi$.
d. $f(\theta) = \sinh\theta$.
4. Find the sums of the following series by applying Theorem 8.16 to the series
obtained in the indicated exercises from §8.1 and choosing appropriate values
of $\theta$.
a. $\displaystyle \sum_{1}^{\infty} \frac{1}{4n^2 - 1}$ and $\displaystyle \sum_{1}^{\infty} \frac{(-1)^{n+1}}{4n^2 - 1}$ (Exercise 3). Can you sum the first series
in a more elementary way by rewriting it as a telescoping series?
b. $\displaystyle \sum_{1}^{\infty} \frac{1}{n^2}$ and $\displaystyle \sum_{1}^{\infty} \frac{(-1)^{n+1}}{n^2}$ (Exercise 4).
c. $\displaystyle \sum_{1}^{\infty} \frac{(-1)^n}{n^2 + b^2}$ and $\displaystyle \sum_{1}^{\infty} \frac{1}{n^2 + b^2}$, where $b > 0$ (Exercise 5).
d. $\displaystyle \sum_{1}^{\infty} \frac{(-1)^{n+1}}{(2n-1)^3}$ (Exercise 6).
5. Fill in the details of the proof of the first assertion of Theorem 8.24.
6. Fill in the details of the proof of the second assertion of Theorem 8.24.

8.3 Derivatives, Integrals, and Uniform Convergence


We next study the differentiation and integration of Fourier series. As a first step,
we point out that by the fundamental theorem of calculus as stated in §4.1, the
formula
\[ f(b) - f(a) = \int_a^b f'(\theta)\,d\theta \tag{8.25} \]

is valid when $f$ is continuous and piecewise smooth, even though $f'$ may be
undefined at finitely many points. (However, it is generally false if $f$ itself has
jump discontinuities.) In particular, if $f$ and $g$ are both continuous and piecewise
smooth, then so is $fg$, and an application of (8.25) to the latter function yields the
integration-by-parts formula
\[ \int_a^b f'(x)g(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b f(x)g'(x)\,dx. \]

The first main result is that there is a very simple relation between the Fourier
coefficients of $f$ and those of $f'$.

8.26 Theorem. Suppose $f$ is $2\pi$-periodic, continuous, and piecewise smooth, and
let $c_n$ and $c'_n$ be the Fourier coefficients of $f$ and $f'$, given by (8.6). Then
\[ c'_n = inc_n. \]

Equivalently, if $a_n$, $b_n$ and $a'_n$, $b'_n$ are the Fourier coefficients of $f$ and $f'$ given by
(8.7), then $a'_n = nb_n$ and $b'_n = -na_n$.

Proof. Simply integrate by parts:
\[ c'_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f'(\theta)e^{-in\theta}\,d\theta = \frac{1}{2\pi}f(\theta)e^{-in\theta}\Big|_{-\pi}^{\pi} - \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)(-ine^{-in\theta})\,d\theta. \]

The first term on the right vanishes because $f(\theta)e^{-in\theta}$ is $2\pi$-periodic, and the
second one is $inc_n$. The argument for $a_n$ and $b_n$ is similar (Exercise 1).
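The relation $c'_n = inc_n$ can be observed numerically on the triangle wave $g(\theta) = |\theta|$ of §8.1, whose derivative on $(-\pi, \pi)$ is the square wave. The Python sketch below is our own illustration (function names and the midpoint-rule quadrature are assumptions of the sketch, not part of the text). An even number of sample points is used so that the corner at $\theta = 0$ falls on the boundary between two subintervals.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=20000):
    """Midpoint-rule approximation of (1/2pi) * int_{-pi}^{pi} f(t) e^{-int} dt."""
    h = 2 * math.pi / samples
    total = 0j
    for j in range(samples):
        t = -math.pi + (j + 0.5) * h  # midpoints never land exactly on t = 0
        total += f(t) * cmath.exp(-1j * n * t)
    return total * h / (2 * math.pi)

def triangle(t: float) -> float:
    """g(t) = |t| on (-pi, pi): continuous and piecewise smooth."""
    return abs(t)

def square(t: float) -> float:
    """g'(t) = sign(t) on (-pi, pi), undefined (set to 0) at the corner t = 0."""
    return float((t > 0) - (t < 0))

for n in (1, 2, 3):
    lhs = fourier_coefficient(square, n)            # c'_n
    rhs = 1j * n * fourier_coefficient(triangle, n)  # i n c_n
    print(n, lhs, rhs)
```

For $n = 1$ both columns should be close to $-2i/\pi$, and for even $n$ both should be close to zero.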

Note that Theorem 8.26 makes no claim about the Fourier series of f ′ ; it is
valid whether or not that series actually converges. If we add more conditions on f
to ensure that it does, we obtain the following result:

8.27 Corollary. Suppose that $f$ is $2\pi$-periodic, continuous, and piecewise smooth,
and that $f'$ is also piecewise smooth. If
\[ \sum_{-\infty}^{\infty} c_n e^{in\theta} = \tfrac12 a_0 + \sum_{1}^{\infty} (a_n\cos n\theta + b_n\sin n\theta) \]

is the Fourier series of $f$, then $f'(\theta)$ is the sum of the derived series
\[ \sum_{-\infty}^{\infty} inc_n e^{in\theta} = \sum_{1}^{\infty} (nb_n\cos n\theta - na_n\sin n\theta) \]

at every $\theta$ at which $f'(\theta)$ exists. At the exceptional points where $f'$ has jumps, the
series converges to $\tfrac12[f'(\theta-) + f'(\theta+)]$.

Proof. By Theorem 8.16, $f'$ is the sum of its Fourier series everywhere except
where it has jumps, and the coefficients in that series are given by Theorem 8.26.

EXAMPLE 1. The triangle wave (Example 2 in §8.1) is continuous and piecewise
smooth, and its derivative is the square wave (Exercise 1 in §8.1). We can
therefore recover the result of the latter exercise by differentiating the series
(8.10):
\[ \frac{4}{\pi}\sum_{1}^{\infty} \frac{\sin(2m-1)\theta}{2m-1} = \begin{cases} -1 & (-\pi < \theta < 0), \\ 1 & (0 < \theta < \pi). \end{cases} \]

Next, we consider integration of Fourier series. There is one annoying point


that must be kept in mind: If $f$ is a piecewise continuous $2\pi$-periodic function, its
indefinite integral $F(\theta) = \int_0^{\theta} f(\phi)\,d\phi$ will be periodic only when
\[ \int_{-\pi}^{\pi} f(\phi)\,d\phi = F(\pi) - F(-\pi) = 0, \]

that is, when the mean value of f over an interval of length 2π is zero, or, equiv-
alently, when the constant term in the Fourier series of f vanishes. We make this
assumption in the following theorem; if it is not valid, we may wish to subtract off
the constant term and deal with it separately.

8.28 Theorem. Suppose $f$ is $2\pi$-periodic and piecewise continuous, with Fourier
coefficients $c_n$ given by (8.6) or $a_n$, $b_n$ given by (8.7). Assume that $c_0 = \tfrac12 a_0 = 0$.
If $F$ is a continuous, piecewise smooth function such that $F' = f$ (except at the
points where $f$ has jumps), then
\[ F(\theta) = C_0 + \sum_{n\ne0} \frac{c_n}{in}\,e^{in\theta} = C_0 + \sum_{1}^{\infty} \left(\frac{a_n}{n}\sin n\theta - \frac{b_n}{n}\cos n\theta\right) \]

for all $\theta$, where $C_0$ is the mean value of $F$ on $[-\pi, \pi]$.

Proof. F is 2π-periodic by (8.2), for

\[ F(\theta + 2\pi) - F(\theta) = \int_{\theta}^{\theta+2\pi} f(\phi)\,d\phi = \int_{-\pi}^{\pi} f(\phi)\,d\phi = 2\pi c_0 = 0. \]

By Theorem 8.16, F is the sum of its Fourier series at every point, and by Theorem
8.26, its Fourier coefficients Cn are given for n ̸= 0 by inCn = cn (and likewise
for the cosine and sine coefficients). The constant term C0 is, as always, the mean
value of F .

Observe that the series in Theorem 8.28 is obtained by formally integrating the
Fourier series of f term-by-term, whether the latter series converges or not.
8.3. Derivatives, Integrals, and Uniform Convergence 375

EXAMPLE 2. Subtraction of the mean value from the triangle wave (Example
2 in §8.1) and multiplication by −2 gives

\[ \pi - 2|\theta| = \frac{8}{\pi} \sum_{m=1}^{\infty} \frac{\cos(2m-1)\theta}{(2m-1)^2} \qquad (|\theta| \le \pi), \]

and integration of both sides from 0 to θ then yields

\[ \pi\theta - \theta|\theta| = \frac{8}{\pi} \sum_{m=1}^{\infty} \frac{\sin(2m-1)\theta}{(2m-1)^3} \qquad (|\theta| \le \pi), \]

which is the result of Exercise 6 in §8.1.
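The integrated identity in Example 2 can be spot-checked numerically. The sketch below is only an illustration; the function names and the truncation at 200 terms are choices made here. It compares a partial sum of the sine series with the closed form πθ − θ|θ|.

```python
import math

def integrated_series(theta, terms=200):
    """Partial sum (8/pi) * sum_{m=1}^{terms} sin((2m-1)*theta) / (2m-1)^3."""
    total = sum(math.sin((2 * m - 1) * theta) / (2 * m - 1) ** 3
                for m in range(1, terms + 1))
    return 8.0 / math.pi * total

def closed_form(theta):
    """Left-hand side pi*theta - theta*|theta|, valid for |theta| <= pi."""
    return math.pi * theta - theta * abs(theta)

if __name__ == "__main__":
    for theta in (-3.0, -1.0, 0.5, 2.0):
        print(theta, integrated_series(theta), closed_form(theta))
```

Because the coefficients now decay like the inverse cube of 2m − 1, far fewer terms are needed than for the square wave of Example 1.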

Theorem 8.28 and Corollary 8.27 exhibit situations where we can integrate
or differentiate a series termwise without worrying about uniform convergence.
However, uniform and absolute convergence are still highly desirable things, so
we present a simple criterion for the Fourier series of a function to have these
properties.

8.29 Theorem. If f is 2π-periodic, continuous, and piecewise smooth, then the
Fourier series of f is absolutely and uniformly convergent.

Proof. Let $c_n$ and $c'_n$ be the Fourier coefficients of f and f ′ . Since
$|c_n e^{in\theta}| = |c_n|$, the absolute convergence of $\sum c_n e^{in\theta}$ is equivalent to the
convergence of $\sum |c_n|$, and by the Weierstrass M-test, this also implies the uniform
convergence of $\sum c_n e^{in\theta}$. But by Theorem 8.26, $c_n = c'_n/(in)$ for n ≠ 0, so

\[ |c_n| \le \tfrac12 \left( |c'_n|^2 + |n|^{-2} \right) \qquad (n \ne 0). \]

(The inequality $\alpha\beta \le \tfrac12(\alpha^2 + \beta^2)$ is valid for all α, β ∈ R since
$\alpha^2 + \beta^2 - 2\alpha\beta = (\alpha - \beta)^2 \ge 0$.) But the series $\sum |c'_n|^2$ and $\sum n^{-2}$ are both
convergent (by Bessel’s inequality in the former case, since f ′ is piecewise
continuous), and hence so is $\sum |c_n|$.
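To make the uniform convergence concrete, here is a small sketch (an illustration added here, with hypothetical function names) for the triangle wave |θ|, whose series π/2 − (4/π) Σ cos(2m−1)θ/(2m−1)² was used above. The M-test argument says the sup-norm error of a partial sum is at most the tail of the sum of the absolute values of the coefficients; the code measures the error on a grid and compares it with that tail.

```python
import math

def triangle_partial_sum(theta, terms):
    """pi/2 - (4/pi) * sum_{m=1}^{terms} cos((2m-1)*theta) / (2m-1)^2."""
    s = sum(math.cos((2 * m - 1) * theta) / (2 * m - 1) ** 2
            for m in range(1, terms + 1))
    return math.pi / 2 - 4.0 / math.pi * s

def tail_bound(terms):
    """(4/pi) * sum_{m>terms} 1/(2m-1)^2, using sum_{m>=1} 1/(2m-1)^2 = pi^2/8."""
    partial = sum(1.0 / (2 * m - 1) ** 2 for m in range(1, terms + 1))
    return 4.0 / math.pi * (math.pi ** 2 / 8 - partial)

def max_error(terms, grid=2001):
    """Largest |f - partial sum| over an evenly spaced grid on [-pi, pi]."""
    worst = 0.0
    for j in range(grid):
        theta = -math.pi + 2 * math.pi * j / (grid - 1)
        worst = max(worst, abs(abs(theta) - triangle_partial_sum(theta, terms)))
    return worst

if __name__ == "__main__":
    for terms in (5, 20, 80):
        print(terms, max_error(terms), tail_bound(terms))
```

For this particular function the bound is attained at θ = 0, where every cosine equals 1, so the measured error and the tail should agree closely.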

We conclude this section with an important feature of Fourier series, which we
state as a general principle rather than as a precise theorem:

The degree of smoothness of a periodic function is closely related to the rate of
decay of its Fourier coefficients, that is, to the rate of convergence of its Fourier
series.

Indeed, let f be a 2π-periodic function with Fourier coefficients $c_n$. If f is of
class $C^k$, then $f^{(k)}$ is a continuous 2π-periodic function whose Fourier coefficients
are $(in)^k c_n$, by Theorem 8.26. By Bessel’s inequality, $\lim_{n\to\pm\infty} |n^k c_n| = 0$, so $|c_n|$
tends to zero faster than $|n|^{-k}$ as n → ±∞. Conversely, suppose $|c_n| \le C|n|^{-k-\epsilon}$
for some C, ϵ > 0. Then $\sum |n^j c_n| < \infty$ for j < k, so the series $\sum c_n e^{in\theta}$ can be
differentiated termwise k − 1 times with the differentiated series being absolutely
and uniformly convergent, and hence its sum f (assuming f is standardized) is of
class $C^{k-1}$. (Several other variations can be played on this theme.)
We can see this phenomenon in the examples of §8.1. The sawtooth wave
has discontinuities, and its Fourier coefficients decay like $n^{-1}$; the triangle wave
is continuous but its first derivative is not, and its Fourier coefficients decay like
$n^{-2}$. Figures 8.1 and 8.2 show clearly that the Fourier series of the triangle wave
converges more rapidly than that of the sawtooth wave.
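The two decay rates can be observed numerically. The following sketch is an illustration only: the midpoint-rule quadrature, the sample count, and the function names are assumptions made here. It approximates the sine coefficients of the sawtooth wave f(θ) = θ and the cosine coefficients of the triangle wave f(θ) = |θ| on [−π, π], then rescales them by n and n² respectively.

```python
import math

def fourier_coefficient(f, n, kind, samples=20000):
    """Midpoint-rule approximation of a_n (kind='cos') or b_n (kind='sin'),
    i.e. (1/pi) * integral over [-pi, pi] of f(theta) * trig(n*theta)."""
    h = 2 * math.pi / samples
    trig = math.cos if kind == "cos" else math.sin
    total = 0.0
    for j in range(samples):
        theta = -math.pi + (j + 0.5) * h
        total += f(theta) * trig(n * theta)
    return total * h / math.pi

def sawtooth(theta):
    return theta

def triangle(theta):
    return abs(theta)

if __name__ == "__main__":
    for n in (1, 3, 5, 7, 9):
        b_n = fourier_coefficient(sawtooth, n, "sin")
        a_n = fourier_coefficient(triangle, n, "cos")
        # n*|b_n| and n^2*|a_n| should stay roughly constant as n grows.
        print(n, n * abs(b_n), n * n * abs(a_n))
```

The rescaled values stay near 2 and 4/π, consistent with the closed forms 2(−1)^{n+1}/n and −4/(πn²) (n odd) that appear in the series quoted in this chapter.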

EXERCISES

1. Verify the assertion about an and bn in Theorem 8.26.


2. Given a ∈ (0, π), let f be the 2π-periodic function defined by $f(\theta) = a^{-1}$ for
|θ| < a and $f(\theta) = (a - \pi)^{-1}$ for a < |θ| < π.
a. Find the formula for $g(\theta) = \int_0^\theta f(\phi)\,d\phi$ on [−π, π] and sketch its graph.
b. Use the Fourier series of f found in Exercise 7 of §8.1 to compute the
Fourier series of g.
3. By applying Theorem 8.28 to the result of Exercise 4 of §8.1, show that:
a. $\theta^3 - \pi^2\theta = 12 \sum_{n=1}^{\infty} \dfrac{(-1)^n \sin n\theta}{n^3}$ (|θ| ≤ π).
b. $\theta^4 - 2\pi^2\theta^2 = -\dfrac{7\pi^4}{15} + 48 \sum_{n=1}^{\infty} \dfrac{(-1)^{n+1} \cos n\theta}{n^4}$ (|θ| ≤ π).
c. $\sum_{n=1}^{\infty} \dfrac{1}{n^4} = \dfrac{\pi^4}{90}$.
4. From Exercise 3 of §8.1, we know that

\[ \sin\theta = \frac{2}{\pi} - \frac{4}{\pi} \sum_{n=1}^{\infty} \frac{\cos 2n\theta}{4n^2 - 1} \qquad \text{for } 0 \le \theta \le \pi. \]

Show that this series can be differentiated or integrated termwise to yield two
apparently different series expansions of cos θ for 0 < θ < π, and reconcile
these two expansions. (Hint: Example 1 of §8.2 is useful.)

5. Let f (θ) be the 2π-periodic function such that $f(\theta) = e^{\theta}$ for |θ| < π, and let
$\sum_{-\infty}^{\infty} c_n e^{in\theta}$ be its Fourier series. If we formally differentiate this equation,
we obtain $e^{\theta} = \sum_{-\infty}^{\infty} i n c_n e^{in\theta}$ for |θ| < π. But then $c_n$ and $inc_n$ are both
equal to $(2\pi)^{-1} \int_{-\pi}^{\pi} e^{\theta} e^{-in\theta}\,d\theta$, so $c_n = inc_n$ and hence $c_n = 0$ for all n.
Clearly this is wrong; where is the mistake?
6. How smooth are the following functions? That is, for which k can you show
that the function is of class C k ?
" ∞
" ∞
"
einθ cos nθ cos 2n θ
a. . b. . c.
n6/5 (1 + n6 ) 2n 2n

=0 0 0

8.4 Fourier Series on Intervals


A 2π-periodic function is completely determined by its values on any interval of
length 2π. Conversely, if one is given a function f defined on an interval of length
2π, say [−π, π], one can extend f to be a 2π-periodic function on R by declaring
that f (θ + 2kπ) = f (θ) for all θ ∈ [−π, π] and k ∈ Z. (Actually, this definition
is not consistent at the points θ = (2k + 1)π unless f (−π) = f (π), but one can
redefine f to be any given number at these points, such as ½[f (−π) + f (π)].) If the
original f on [−π, π] is piecewise continuous or piecewise smooth, the same will
be true of its periodic extension. However, even if f is perfectly smooth on [−π, π],
there will usually be discontinuities in the periodic extension or its derivatives at
the points (2k + 1)π where the translates of f are joined together. (For example,
the periodic extension of f (θ) = θ on [−π, π] is the sawtooth wave.)
By considering the periodic extension, then, one can use Fourier series to ex-
pand a piecewise smooth function on [−π, π] in terms of trig functions. All of
the results in the preceding sections apply, except that in using the results of §8.3
one must remember to take into account the possible extra discontinuities in the
periodic extension or its derivatives at the points (2k + 1)π.
There is an extra twist we can add to this construction that is useful in many
situations. Suppose that we are considering functions on [0, π] rather than [−π, π].
Given a piecewise continuous function f on [0, π], we first extend it to [−π, π] by
declaring it to be even (see Figure 8.5), and then extend it to be 2π-periodic on R.
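That is, we define the even extension $f_{\text{even}}$ of f on [−π, π] by

\[ f_{\text{even}}(\theta) = \begin{cases} f(\theta) & \text{if } 0 \le \theta \le \pi, \\ f(-\theta) & \text{if } -\pi \le \theta \le 0. \end{cases} \]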

FIGURE 8.5: A function on [0, π] (above) and its even and odd extensions
to [−π, π] (below, left and right).

For this extension the Fourier sine coefficients $b_n$ all vanish because
$f_{\text{even}}(\theta) \sin n\theta$ is an odd function, and the cosine coefficients $a_n$ are given by

\[ a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f_{\text{even}}(\theta) \cos n\theta \,d\theta = \frac{2}{\pi} \int_{0}^{\pi} f(\theta) \cos n\theta \,d\theta. \]

The resulting Fourier series is $\tfrac12 a_0 + \sum_{n=1}^{\infty} a_n \cos n\theta$.
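On the other hand, we could also consider the odd extension of f to [−π, π]
(see Figure 8.5):

\[ f_{\text{odd}}(\theta) = \begin{cases} f(\theta) & \text{if } 0 < \theta < \pi, \\ -f(-\theta) & \text{if } -\pi < \theta < 0, \\ 0 & \text{if } \theta = 0, \pm\pi. \end{cases} \]

Here the Fourier cosine coefficients $a_n$ all vanish, and the sine coefficients $b_n$ are
given by

\[ b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f_{\text{odd}}(\theta) \sin n\theta \,d\theta = \frac{2}{\pi} \int_{0}^{\pi} f(\theta) \sin n\theta \,d\theta. \]

The resulting Fourier series is $\sum_{n=1}^{\infty} b_n \sin n\theta$.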
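We are thus led to the following definitions: If f is a piecewise continuous
function on [0, π], its Fourier cosine series is the series

\[ \tfrac12 a_0 + \sum_{n=1}^{\infty} a_n \cos n\theta, \qquad a_n = \frac{2}{\pi} \int_{0}^{\pi} f(\theta) \cos n\theta \,d\theta, \]

and its Fourier sine series is the series

\[ \sum_{n=1}^{\infty} b_n \sin n\theta, \qquad b_n = \frac{2}{\pi} \int_{0}^{\pi} f(\theta) \sin n\theta \,d\theta. \]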

EXAMPLE 1. Let f (θ) = θ on [0, π]. The even and odd periodic extensions of
f are the triangle and sawtooth waves, respectively, and the Fourier cosine and
sine series of f are

\[ \frac{\pi}{2} - \frac{4}{\pi} \sum_{m=1}^{\infty} \frac{\cos(2m-1)\theta}{(2m-1)^2} \qquad \text{and} \qquad 2 \sum_{n=1}^{\infty} \frac{(-1)^{n+1} \sin n\theta}{n}, \]

respectively.
If f is piecewise smooth on [0, π], its even and odd periodic extensions will be
piecewise smooth on R. If f (0) = f (0+) and f (π) = f (π−), its even periodic
extension will be continuous at both 0 and π, but its odd periodic extension will
have jumps at 0 or π unless f (0) = 0 or f (π) = 0, respectively. In any case, an
application of Theorem 8.16 to these extensions easily yields the following:
8.30 Theorem. Suppose f is piecewise smooth on [0, π]. The Fourier cosine series
and the Fourier sine series of f converge to ½[f (θ−) + f (θ+)] at every θ ∈ (0, π).
The cosine series converges to f (0+) at θ = 0 and to f (π−) at θ = π; the sine
series converges to 0 at both these points.
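Theorem 8.30 can be illustrated with the two series for f (θ) = θ from Example 1. This is a sketch added here, with hypothetical function names and an arbitrary number of terms. Inside (0, π) both partial sums approach θ; at the endpoints the cosine series approaches f (0+) = 0 and f (π−) = π, while the sine series gives 0.

```python
import math

def cosine_series(theta, terms):
    """pi/2 - (4/pi) * sum_{m=1}^{terms} cos((2m-1)*theta) / (2m-1)^2."""
    s = sum(math.cos((2 * m - 1) * theta) / (2 * m - 1) ** 2
            for m in range(1, terms + 1))
    return math.pi / 2 - 4.0 / math.pi * s

def sine_series(theta, terms):
    """2 * sum_{n=1}^{terms} (-1)^(n+1) * sin(n*theta) / n."""
    return 2.0 * sum((-1) ** (n + 1) * math.sin(n * theta) / n
                     for n in range(1, terms + 1))

if __name__ == "__main__":
    for theta in (0.0, 1.0, 2.5, math.pi):
        print(theta, cosine_series(theta, 5000), sine_series(theta, 5000))
```

The sine series converges noticeably more slowly in the interior, which reflects the jump of the odd periodic extension at ±π.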
We may wish to consider periodic functions with period other than 2π, or func-
tions defined on intervals other than [0, π]. The general situation can be reduced to
the one we have studied by a linear change of variable; we record the results for
future reference.
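Suppose f (x) is a piecewise smooth 2l-periodic function. We make the change
of variables

\[ \theta = \frac{\pi x}{l}, \qquad g(\theta) = f(x) = f\!\left( \frac{l\theta}{\pi} \right). \]

Then g is 2π-periodic, and we have

\[ g(\theta) = \sum_{n=-\infty}^{\infty} c_n e^{in\theta}, \qquad c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} g(\theta) e^{-in\theta} \,d\theta. \]

The substitution θ = πx/l then yields the Fourier series for f :

\[ f(x) = \sum_{n=-\infty}^{\infty} c_n e^{in\pi x/l}, \qquad c_n = \frac{1}{2l} \int_{-l}^{l} f(x) e^{-in\pi x/l} \,dx. \tag{8.31} \]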

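The corresponding formula in terms of sines and cosines is

\[ f(x) = \tfrac12 a_0 + \sum_{n=1}^{\infty} \left( a_n \cos\frac{n\pi x}{l} + b_n \sin\frac{n\pi x}{l} \right), \]

where

\[ a_n = \frac{1}{l} \int_{-l}^{l} f(x) \cos\frac{n\pi x}{l} \,dx, \qquad b_n = \frac{1}{l} \int_{-l}^{l} f(x) \sin\frac{n\pi x}{l} \,dx. \]

It follows that the Fourier cosine and sine series of a piecewise smooth function f
on the interval [0, l] are

\[ f(x) = \tfrac12 a_0 + \sum_{n=1}^{\infty} a_n \cos\frac{n\pi x}{l}, \qquad a_n = \frac{2}{l} \int_{0}^{l} f(x) \cos\frac{n\pi x}{l} \,dx, \tag{8.32} \]

and

\[ f(x) = \sum_{n=1}^{\infty} b_n \sin\frac{n\pi x}{l}, \qquad b_n = \frac{2}{l} \int_{0}^{l} f(x) \sin\frac{n\pi x}{l} \,dx. \tag{8.33} \]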
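The coefficient formulas in (8.32) and (8.33) translate directly into a numerical recipe. The sketch below is illustrative only: the midpoint rule, the sample count, the test function f (x) = x, and the interval length l = 2 are all assumptions made here. It approximates the integrals and compares them with the closed forms obtained by integrating by parts.

```python
import math

def cosine_coefficient(f, l, n, samples=4000):
    """a_n = (2/l) * integral_0^l f(x) cos(n*pi*x/l) dx, by the midpoint rule."""
    h = l / samples
    return 2.0 / l * h * sum(
        f((j + 0.5) * h) * math.cos(n * math.pi * (j + 0.5) * h / l)
        for j in range(samples))

def sine_coefficient(f, l, n, samples=4000):
    """b_n = (2/l) * integral_0^l f(x) sin(n*pi*x/l) dx, by the midpoint rule."""
    h = l / samples
    return 2.0 / l * h * sum(
        f((j + 0.5) * h) * math.sin(n * math.pi * (j + 0.5) * h / l)
        for j in range(samples))

if __name__ == "__main__":
    l = 2.0
    f = lambda x: x
    for n in range(1, 5):
        # Closed forms for f(x) = x on [0, l], from integration by parts.
        a_exact = 2 * l * ((-1) ** n - 1) / (n * math.pi) ** 2
        b_exact = 2 * l * (-1) ** (n + 1) / (n * math.pi)
        print(n, cosine_coefficient(f, l, n), a_exact,
              sine_coefficient(f, l, n), b_exact)
```

With l = π these reduce to the coefficients of the triangle and sawtooth waves in Example 1.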

We conclude with a few remarks comparing Taylor series and Fourier series,

\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n \qquad \text{and} \qquad f(x) = \sum_{n=-\infty}^{\infty} c_n e^{in\pi x/l}, \]

as ways of expanding a function f on an interval centered at the origin. First,
Taylor series are only defined for functions of class $C^\infty$, whereas the smoothness
requirements for Fourier series are quite minimal. The Taylor coefficients $f^{(n)}(0)/n!$
depend only on the values of f in an arbitrarily small neighborhood of the origin,
whereas the Fourier coefficients cn depend on the values of f over the whole inter-
val [−l, l]. The partial sums of the Taylor series provide an excellent approximation
to f (x) when |x| is small but are often quite useless when |x| is large; the partial
sums of the Fourier series tend to approximate f about equally well over the whole
interval [−l, l]. (This last statement is a bit of an oversimplification!)
Despite their differences, there is a connection between Taylor and Fourier
series that is of considerable importance in more advanced mathematics. Namely, let
us consider a power series $f(z) = \sum_{n=0}^{\infty} a_n z^n$ as a function of the complex variable
z. If we write z in polar coordinates as $z = re^{i\theta}$ and fix r, we obtain a function
$g(\theta) = f(re^{i\theta})$ of the variable θ, and the power series for f becomes a Fourier
series for g: $g(\theta) = \sum_{n=0}^{\infty} (a_n r^n) e^{in\theta}$. (It is a special kind of Fourier series, however,
since the coefficient of $e^{in\theta}$ vanishes for all n < 0.)
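The link between power series and Fourier series can be checked on a concrete example. In the sketch below (added for illustration; the choice of the exponential function, the radius r = 0.7, and the equally spaced sampling rule are assumptions made here), the Fourier coefficients of g(θ) = exp(re^{iθ}) should equal rⁿ/n! for n ≥ 0 and vanish for n < 0.

```python
import cmath
import math

def fourier_coefficient(g, n, samples=256):
    """Approximate c_n = (1/2pi) * integral of g(theta) * exp(-i*n*theta)
    over [-pi, pi] using equally spaced samples."""
    total = 0j
    for j in range(samples):
        theta = -math.pi + 2 * math.pi * j / samples
        total += g(theta) * cmath.exp(-1j * n * theta)
    return total / samples

R = 0.7  # fixed radius of the circle |z| = R (an arbitrary choice)

def g(theta):
    """Restriction of f(z) = exp(z) to the circle z = R * exp(i*theta)."""
    return cmath.exp(R * cmath.exp(1j * theta))

if __name__ == "__main__":
    for n in range(-3, 6):
        expected = R ** n / math.factorial(n) if n >= 0 else 0.0
        print(n, abs(fourier_coefficient(g, n)), expected)
```

Equally spaced sampling is very accurate here because g is a smooth periodic function, so a modest number of samples suffices.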

EXERCISES

1. Find the Fourier cosine series and the Fourier sine series of the following func-
tions on the interval [0, π]. All of these series can be derived from the results
of the examples and exercises in §8.1 without computing the coefficients from
scratch.
a. f (θ) = 1.
b. f (θ) = sin θ.
c. f (θ) = θ². (For the sine series, use Example 1 and Exercise 6 in §8.1.)
d. f (θ) = θ for 0 ≤ θ ≤ ½π, f (θ) = π − θ for ½π ≤ θ ≤ π.
2. Expand the given function in a series of the given type. As in Exercise 1, use
previously derived results as much as possible.
a. f (x) = 1; sine series on [0, 1].
b. f (x) = 1 for 0 < x < 2, f (x) = −1 for 2 < x < 4; cosine series on
[0, 4].
c. f (x) = lx − x²; sine series on [0, l].
d. $f(x) = e^x$; series of the form $\sum_{-\infty}^{\infty} c_n e^{2\pi i n x}$ on [0, 1].

3. Suppose f is a piecewise continuous function on [0, 2l] that satisfies f (x) =


f (2l − x) (that is, the graph of f is symmetric about the line x = l). Let an and
bn be the Fourier cosine and sine coefficients of f (given by (8.32) and (8.33)
with l replaced by 2l). Show that an = 0 for n odd and bn = 0 for n even.
4. Show that a piecewise smooth function f on [0, l] can be expanded in a series
as follows:

\[ f(x) = \sum_{n=1}^{\infty} \beta_n \sin\frac{(n - \tfrac12)\pi x}{l}, \qquad \beta_n = \frac{2}{l} \int_{0}^{l} f(x) \sin\frac{(n - \tfrac12)\pi x}{l} \,dx. \]

(Hint: Extend f to [0, 2l] by making it even about x = l, i.e., f (x) = f (2l − x)
for x ∈ [l, 2l], and use Exercise 3.)

8.5 Applications to Differential Equations


Fourier series were originally invented in order to solve some boundary value prob-
lems of mathematical physics. In this section we study a few basic examples.

Heat Flow in an Insulated Rod. Consider a rod occupying the interval [0, l],
insulated so that no heat can enter or leave it, and let f (x) be the temperature at
position x and time t = 0. How does the temperature distribution evolve with time?
(Note: Instead of thinking of a thin rod, one can think of a thick cylindrical slab
occupying the region where 0 ≤ x ≤ l and (y, z) ∈ R, where R is a bounded
region in the yz-plane, as in Figure 8.6. The model of heat flow described here is
valid under the hypothesis that the temperature depends only on x.)

FIGURE 8.6: The cylindrical slab {(x, y, z) : 0 ≤ x ≤ l, (y, z) ∈ R}.

Let u(x, t) denote the temperature at position x and time t; thus u satisfies the
initial condition u(x, 0) = f (x). As we showed in §5.6, u obeys the heat equation
$\partial_t u = k\,\partial_x^2 u$, where k is a positive constant (equal to K/σ in (5.42)). Since the
rate of heat flow across the point x is proportional to $-\partial_x u(x, t)$ (Newton’s law
of cooling), the fact that no heat enters or leaves the ends of the rod means that u
satisfies the boundary conditions $\partial_x u(0, t) = \partial_x u(l, t) = 0$. In summary,

\[ \frac{\partial u}{\partial t} = k \frac{\partial^2 u}{\partial x^2}, \qquad u(x, 0) = f(x), \qquad \frac{\partial u}{\partial x}(0, t) = \frac{\partial u}{\partial x}(l, t) = 0. \tag{8.34} \]

This is the problem we propose to solve.
The first step is to find a family of solutions of the heat equation satisfying the
right boundary conditions by a device called separation of variables. The idea is
to look for solutions of the form u(x, t) = ϕ(x)ψ(t). For such a function, the heat
equation becomes

\[ \phi(x)\psi'(t) = k\phi''(x)\psi(t), \qquad \text{or} \qquad \frac{\psi'(t)}{k\psi(t)} = \frac{\phi''(x)}{\phi(x)}. \]

In this last equation, the quantities on the left and right depend only on t and x,
respectively, so they must both be equal to a constant that we call −α. Thus,

\[ \psi'(t) = -k\alpha\psi(t), \qquad \phi''(x) = -\alpha\phi(x). \]

These are simple ordinary differential equations, and the general solutions are
readily found:

\[ \psi(t) = C_0 e^{-k\alpha t}, \qquad \phi(x) = C_1 \cos\sqrt{\alpha}\,x + C_2 \sin\sqrt{\alpha}\,x. \]

We have thus found a large family of solutions of the heat equation of the form
ϕ(x)ψ(t). For these solutions, the boundary conditions $\partial_x u(0, t) = \partial_x u(l, t) = 0$
translate into the conditions ϕ′ (0) = ϕ′ (l) = 0. But

\[ \phi'(x) = \sqrt{\alpha}\,\bigl( -C_1 \sin\sqrt{\alpha}\,x + C_2 \cos\sqrt{\alpha}\,x \bigr), \]

so the condition ϕ′ (0) = 0 forces $C_2 = 0$, and the condition ϕ′ (l) = 0 then forces
$\sqrt{\alpha}$ to be a multiple of π/l, or $\alpha = n^2\pi^2/l^2$ where n is an integer (which might as
well be nonnegative). In short, we have obtained the following family of solutions
of the heat equation together with the boundary conditions:

\[ u_n(x, t) = \exp\!\left( \frac{-n^2\pi^2 k t}{l^2} \right) \cos\frac{n\pi x}{l} \qquad (n = 0, 1, 2, 3, \ldots). \]
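As a sanity check on the separated solutions, the following sketch (added here; the values l = 1 and k = 0.5, the step sizes, and the function names are arbitrary assumptions) uses central finite differences to confirm that each product of the exponential and cosine factors above approximately satisfies the heat equation and has zero slope in x at both ends of the rod.

```python
import math

def u_n(n, x, t, l=1.0, k=0.5):
    """Separated solution exp(-n^2 pi^2 k t / l^2) * cos(n pi x / l)."""
    return math.exp(-(n * math.pi / l) ** 2 * k * t) * math.cos(n * math.pi * x / l)

def heat_residual(n, x, t, l=1.0, k=0.5, h=1e-4):
    """Finite-difference estimate of u_t - k * u_xx, which should be near 0."""
    u_t = (u_n(n, x, t + h, l, k) - u_n(n, x, t - h, l, k)) / (2 * h)
    u_xx = (u_n(n, x + h, t, l, k) - 2 * u_n(n, x, t, l, k)
            + u_n(n, x - h, t, l, k)) / h ** 2
    return u_t - k * u_xx

def slope_in_x(n, x, t, l=1.0, k=0.5, h=1e-6):
    """Finite-difference estimate of the x-derivative of u_n at (x, t)."""
    return (u_n(n, x + h, t, l, k) - u_n(n, x - h, t, l, k)) / (2 * h)

if __name__ == "__main__":
    for n in range(4):
        print(n, heat_residual(n, 0.3, 0.1),
              slope_in_x(n, 0.0, 0.1), slope_in_x(n, 1.0, 0.1))
```

The residuals are not exactly zero because of discretization and rounding error, but they should be small compared with the size of the individual derivative terms.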
Since the heat equation and the boundary conditions are linear, we obtain more
general solutions by taking linear combinations of these. In fact, we can pass to
infinite linear combinations, that is, infinite series of the form

\[ u(x, t) = \sum_{n=0}^{\infty} a_n \exp\!\left( \frac{-n^2\pi^2 k t}{l^2} \right) \cos\frac{n\pi x}{l}. \tag{8.35} \]

Finally, we are ready to tackle the initial condition u(x, 0) = f (x). If we set
t = 0 in (8.35), we obtain

\[ u(x, 0) = \sum_{n=0}^{\infty} a_n \cos\frac{n\pi x}{l}, \]

so we can make u(x, 0) equal to f (x) by taking the series on the right to be the
Fourier cosine series of f , defined by (8.32)! (Note that the constant term, which
we called $\tfrac12 a_0$ before, is called $a_0$ here.) In other words, to solve the problem (8.34),
we take u(x, t) to be defined by (8.35), where the coefficients $a_n$ are given in terms
of the initial data f by

\[ a_0 = \frac{1}{l} \int_{0}^{l} f(x) \,dx, \qquad a_n = \frac{2}{l} \int_{0}^{l} f(x) \cos\frac{n\pi x}{l} \,dx \quad (n > 0). \]
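The recipe just described (compute the cosine coefficients of f, then attach the exponential factors) can be sketched in a few lines. Everything specific below is an assumption made for illustration: the initial temperature f (x) = x(l − x), the values l = 1 and k = 1, the midpoint-rule quadrature, and the truncation at 100 terms.

```python
import math

def cosine_coefficients(f, l, count, samples=2000):
    """Return [a_0, ..., a_{count-1}] with a_0 the mean of f on [0, l] and
    a_n = (2/l) * integral_0^l f(x) cos(n pi x / l) dx for n > 0."""
    h = l / samples
    xs = [(j + 0.5) * h for j in range(samples)]
    coeffs = [sum(f(x) for x in xs) * h / l]
    for n in range(1, count):
        coeffs.append(2.0 / l * h * sum(f(x) * math.cos(n * math.pi * x / l)
                                        for x in xs))
    return coeffs

def temperature(coeffs, l, k, x, t):
    """Truncated series (8.35) evaluated at position x and time t >= 0."""
    return sum(a * math.exp(-(n * math.pi / l) ** 2 * k * t)
               * math.cos(n * math.pi * x / l)
               for n, a in enumerate(coeffs))

if __name__ == "__main__":
    l, k = 1.0, 1.0
    f = lambda x: x * (l - x)
    coeffs = cosine_coefficients(f, l, 100)
    for t in (0.0, 0.01, 0.1, 1.0):
        print(t, [round(temperature(coeffs, l, k, x, t), 4)
                  for x in (0.0, 0.25, 0.5)])
```

At t = 0 the truncated series should reproduce f up to a small truncation error, and for large t every sampled value should settle at the mean of f, which is 1/6 for this choice.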
At this point we should stop to verify that the proposed solution (8.35) of the
problem (8.34) really works, as the passage from finite linear combinations to infi-
nite series has the potential to cause difficulties. In fact, everything turns out quite
nicely for this problem. In the first place, if the initial temperature distribution f (x)
is continuous and piecewise smooth (a reasonable physical assumption), the same
will be true of its even 2l-periodic extension, so by Theorem 8.29, its Fourier series
is absolutely and uniformly convergent. In particular, $\sum_{n=1}^{\infty} |a_n| < \infty$. The
absolute value of the nth term of the series in (8.35) is at most $|a_n|$, so the Weierstrass
M-test shows that this series converges absolutely and uniformly for 0 ≤ x ≤ l and
t ≥ 0 to define a continuous function u(x, t) there. Moreover, for t > 0, the
exponential factors in (8.35) decay rapidly as n → ∞, which makes the convergence
even better. In particular, repeated differentiation with respect to t or x introduces
factors of $n^k$ into the series, which are still overpowered by the decay of the
exponential factors, so the differentiated series still converges absolutely and uniformly.
It follows that u(x, t) is of class $C^\infty$ for t > 0 and that termwise differentiation
is permissible; u therefore satisfies the heat equation and the boundary conditions
because each term of the series does.
Two further remarks: First, as t → ∞, the exponential factors in (8.35) all
tend rapidly to zero except for the one with n = 0, and so u(x, t) approaches
the constant a0 , the mean value of f on the interval [0, l]. In physical terms this
means that the rod approaches thermal equilibrium as time progresses. Second, the
series (8.35) will usually diverge when t < 0, for then the exponential factors grow
rather than decay! This corresponds to the physical fact that time is irreversible for
diffusion processes governed by the heat equation.

The Vibrating String. We now study the vibrations of a string stretched across
the interval 0 ≤ x ≤ l and fixed at the endpoints. (Think of a guitar string, and see
Figure 8.7.) Here u(x, t) will denote the displacement of the string (in a direction
perpendicular to the x-axis) at position x and time t. The relevant differential
equation is the wave equation $\partial_t^2 u = c^2 \partial_x^2 u$, where c is a positive constant that
can be interpreted as the speed with which disturbances propagate down the string.
(See Folland [6, pp. 388–90] or Kammler [10, pp. 526–7] for a derivation of the
wave equation from physical principles.) Since the string is fixed at both ends,
the boundary conditions for this problem are u(0, t) = u(l, t) = 0. As for initial
conditions, since the wave equation is second-order in t we need to specify both
the initial displacement u(x, 0) and the initial velocity $\partial_t u(x, 0)$. Thus the problem
we have to solve is

\[ \frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}, \qquad u(x, 0) = f(x), \qquad \frac{\partial u}{\partial t}(x, 0) = g(x), \qquad u(0, t) = u(l, t) = 0, \tag{8.36} \]

where f and g are specified functions on [0, l].


Again we employ the technique of separation of variables and look for solutions
of the wave equation of the form u(x, t) = ϕ(x)ψ(t). For such functions the wave
equation becomes

\[ \phi(x)\psi''(t) = c^2 \phi''(x)\psi(t), \qquad \text{or} \qquad \frac{\psi''(t)}{c^2 \psi(t)} = \frac{\phi''(x)}{\phi(x)}. \]

FIGURE 8.7: A vibrating string fixed at its ends.

In the last equation, the quantities on the left and right depend only on t and x,
respectively, so they are both equal to a constant −α, and we obtain the ordinary
differential equations

\[ \psi''(t) + \alpha c^2 \psi(t) = 0, \qquad \phi''(x) + \alpha\phi(x) = 0. \]

The general solution of the second equation is

\[ \phi(x) = C_1 \cos\sqrt{\alpha}\,x + C_2 \sin\sqrt{\alpha}\,x. \]

The boundary condition u(0, t) = 0 forces $C_1$ to vanish, and then the boundary
condition u(l, t) = 0 forces $\sqrt{\alpha}$ to be a multiple of π/l, so $\alpha = n^2\pi^2/l^2$ for some
(positive) integer n. With this value of α, the general solution of the differential
equation for ψ is

\[ \psi(t) = b \cos\frac{n\pi c t}{l} + B \sin\frac{n\pi c t}{l}. \]

(The arbitrary constants are labeled b and B for reasons that will become clearer in
a moment.)
For each positive integer n, we therefore have the solution

\[ u_n(x, t) = \left( b_n \cos\frac{n\pi c t}{l} + B_n \sin\frac{n\pi c t}{l} \right) \sin\frac{n\pi x}{l}. \]

Taking linear combinations and passing to limits, we are led to the series solution

\[ u(x, t) = \sum_{n=1}^{\infty} \left( b_n \cos\frac{n\pi c t}{l} + B_n \sin\frac{n\pi c t}{l} \right) \sin\frac{n\pi x}{l}. \tag{8.37} \]

It remains to satisfy the initial conditions. Setting t = 0 in (8.37) yields

\[ u(x, 0) = \sum_{n=1}^{\infty} b_n \sin\frac{n\pi x}{l}, \]

so we satisfy the condition u(x, 0) = f (x) by taking the $b_n$’s to be the Fourier sine
coefficients of f :

\[ b_n = \frac{2}{l} \int_{0}^{l} f(x) \sin\frac{n\pi x}{l} \,dx. \]
Moreover, formally differentiating (8.37) with respect to t and then setting t = 0
yields

\[ \frac{\partial u}{\partial t}(x, 0) = \sum_{n=1}^{\infty} \frac{n\pi c}{l} B_n \sin\frac{n\pi x}{l}, \]

so we should be able to satisfy the condition $\partial_t u(x, 0) = g(x)$ by taking $n\pi c B_n/l$
to be the nth Fourier sine coefficient of g:

\[ B_n = \frac{2}{n\pi c} \int_{0}^{l} g(x) \sin\frac{n\pi x}{l} \,dx. \]
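The formulas for the coefficients in (8.37) can be exercised on initial data for which the answer is known in closed form. The sketch below is an illustration only: the data f (x) = sin(πx/l) + ½ sin(3πx/l) and g(x) = sin(2πx/l), the values l = 1 and c = 2, the midpoint-rule quadrature, and the function names are all assumptions made here.

```python
import math

def sine_coefficients(func, l, count, samples=2000):
    """Fourier sine coefficients (2/l) * integral_0^l func(x) sin(n pi x / l) dx
    for n = 1..count, approximated by the midpoint rule."""
    h = l / samples
    xs = [(j + 0.5) * h for j in range(samples)]
    return [2.0 / l * h * sum(func(x) * math.sin(n * math.pi * x / l) for x in xs)
            for n in range(1, count + 1)]

def displacement(f, g, l, c, x, t, count=10):
    """Truncated series (8.37) with b_n from f and B_n from g."""
    b = sine_coefficients(f, l, count)
    g_coeffs = sine_coefficients(g, l, count)
    total = 0.0
    for n in range(1, count + 1):
        omega = n * math.pi * c / l       # frequency n*pi*c/l of the nth term
        big_b = g_coeffs[n - 1] / omega   # B_n = (sine coefficient of g) / omega
        total += ((b[n - 1] * math.cos(omega * t) + big_b * math.sin(omega * t))
                  * math.sin(n * math.pi * x / l))
    return total

if __name__ == "__main__":
    l, c = 1.0, 2.0
    f = lambda x: math.sin(math.pi * x / l) + 0.5 * math.sin(3 * math.pi * x / l)
    g = lambda x: math.sin(2 * math.pi * x / l)
    print(displacement(f, g, l, c, 0.3, 0.4))
```

For these data only the terms n = 1, 2, 3 survive, so the output can be compared with cos(πct/l) sin(πx/l) + ½ cos(3πct/l) sin(3πx/l) + (l/(2πc)) sin(2πct/l) sin(2πx/l).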

Again, we ask: Does this really work? It is physically reasonable to assume
that the initial functions f and g are continuous and piecewise smooth and satisfy
the boundary conditions f (0) = f (l) = g(0) = g(l) = 0. Their odd 2l-periodic
extensions will then have the same properties, so their Fourier series will be
absolutely and uniformly convergent by Theorem 8.29. In particular, $\sum |b_n| < \infty$ and
$\sum |nB_n| < \infty$, so by the Weierstrass M-test, the series (8.37) is absolutely and
uniformly convergent for 0 ≤ x ≤ l, −∞ < t < ∞. However, there is no reason
for the twice-differentiated series that should represent $\partial_t^2 u$ or $c^2 \partial_x^2 u$, namely,

\[ -\frac{\pi^2 c^2}{l^2} \sum_{n=1}^{\infty} n^2 \left( b_n \cos\frac{n\pi c t}{l} + B_n \sin\frac{n\pi c t}{l} \right) \sin\frac{n\pi x}{l}, \tag{8.38} \]

to converge. The extra factor of $n^2$ makes the terms larger, and there is no
exponential decay anywhere to compensate. If we recall that the decay of Fourier
coefficients is related to the degree of smoothness of the function in question, the
contrast with the heat equation may be expressed as follows: The diffusion of heat
tends to smooth out irregularities in the initial temperature distribution, but in wave
motion, any initial roughness simply propagates without dying out.
We can obtain a positive result by imposing more differentiability hypotheses
on f and g. If we assume that not only f and g but also the first two derivatives of f
and the first derivative of g are continuous and piecewise smooth, and that not only
f and g but also f ′′ vanishes at the endpoints (so that its odd periodic extension
is continuous there), then Theorems 8.26 and 8.29 imply that $\sum n^2 |b_n| < \infty$ and
$\sum n^2 |B_n| < \infty$, which guarantees the absolute and uniform convergence of (8.38).
This is also enough to guarantee that the formal differentiation of (8.37) that led to
the formula for the $B_n$’s is valid.
However, these additional assumptions are rather unnatural from a physical
point of view. The obvious model for a plucked string, for example, is to take
f to be a piecewise linear function as in Figure 8.8. It is easy to calculate the
coefficients $b_n$ explicitly for such an f (Exercise 4), and they turn out to decay
exactly like $n^{-2}$. The series (8.37) therefore converges nicely, and we may expect
it to provide a good description of the physical vibration of the string. On the other
hand, the twice-differentiated series (8.38) does not converge at all, so it is hard to
say in what sense (8.37) satisfies the wave equation. The resolution of this paradox
is to expand our vision of what a solution of a differential equation ought to be and
to develop a notion of “weak solution” that will encompass examples such as this
one. But this is a more advanced topic; see, for example, Folland [6, §9.5].

FIGURE 8.8: A model for a plucked string.

Taking for granted that the series (8.37) really is the solution of the boundary
value problem (8.36), we say a few words about its physical interpretation. Think of
the string as being a producer of musical notes such as a guitar string. The nth term
in the series (8.37), as a function of t, is a pure sine wave with frequency nπc/l,
which represents a musical tone at a pure, definite pitch. The series (8.37) therefore
shows how the sound produced by the string can be resolved into a superposition
of these pure pitches. Typically, the coefficients bn and Bn decrease as n increases,
so that the largest contribution comes from the first term, n = 1. This is the
“fundamental” pitch, and the higher n’s are the “overtones” that give the note its
particular tone quality.

Related Problems. The heat flow and vibration problems (8.34) and (8.36)
can be modified by changing the boundary conditions; this leads to models of other
interesting physical processes. Here are a few examples:
1. The boundary value problem

\[ \frac{\partial u}{\partial t} = k \frac{\partial^2 u}{\partial x^2}, \qquad u(x, 0) = f(x), \qquad u(0, t) = u(l, t) = 0 \]

models the flow of heat in a rod that occupies the interval 0 ≤ x ≤ l when both
ends are held at temperature zero, for instance by immersing them in ice water.
(Note that the heat equation doesn’t care where the zero point of the temperature
scale is located; if u is a solution, so is u + c for any constant c. Of course, this
means that the validity of the heat equation as a model for actual thermodynamic
processes has its limitations, as absolute zero exists physically.) The method of
solution is exactly the same as for the insulated problem (8.34), except that the
boundary conditions for ϕ(x) are ϕ(0) = ϕ(l) = 0. Thus, as in the vibrating string
problem, we obtain ϕ(x) = sin(nπx/l), and the solution is given by

\[ u(x, t) = \sum_{n=1}^{\infty} b_n \exp\!\left( \frac{-n^2\pi^2 k t}{l^2} \right) \sin\frac{n\pi x}{l}, \]

where $\sum b_n \sin(n\pi x/l)$ is the Fourier sine series of f (x).
2. The boundary value problem

\[ \frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}, \qquad u(x, 0) = f(x), \qquad \frac{\partial u}{\partial t}(x, 0) = g(x), \qquad \frac{\partial u}{\partial x}(0, t) = \frac{\partial u}{\partial x}(l, t) = 0 \]

models the vibration of air in a cylindrical pipe occupying the interval 0 ≤ x ≤ l
that is open at both ends. (Examples: flutes and some organ pipes.) Here u(x, t)
represents the longitudinal displacement of the air at position x and time t. The
boundary conditions $\partial_x u(0, t) = \partial_x u(l, t) = 0$ come from the fact that the change
in air pressure due to the displacement u is proportional to $\partial_x u$, and the air pressure
at both ends must remain equal to the ambient air pressure. Again, the solution is
very similar to (8.37) except that it involves cosines instead of sines in x:

\[ u(x, t) = \tfrac12 (a_0 + A_0 t) + \sum_{n=1}^{\infty} \left( a_n \cos\frac{n\pi c t}{l} + A_n \sin\frac{n\pi c t}{l} \right) \cos\frac{n\pi x}{l}, \]

where $\tfrac12 a_0 + \sum_{n=1}^{\infty} a_n \cos(n\pi x/l)$ and $\tfrac12 A_0 + \sum_{n=1}^{\infty} (n\pi c A_n/l) \cos(n\pi x/l)$ are the
Fourier cosine series of f and g, respectively. (The term $\tfrac12 (a_0 + A_0 t)$ represents
a flow of air down the tube with constant velocity, of no importance for the
vibrations.) As with the vibrating string, the vibrations of the pipe are a superposition of
vibrations at the definite frequencies nπc/l (n = 1, 2, 3, . . .).
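3. We can also mix the two types of boundary conditions we have been
considering: for the heat equation,

\[ \frac{\partial u}{\partial t} = k \frac{\partial^2 u}{\partial x^2}, \qquad u(x, 0) = f(x), \qquad u(0, t) = \frac{\partial u}{\partial x}(l, t) = 0, \]

or the wave equation,

\[ \frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}, \qquad u(x, 0) = f(x), \qquad \frac{\partial u}{\partial t}(x, 0) = g(x), \qquad u(0, t) = \frac{\partial u}{\partial x}(l, t) = 0. \]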

The first of these models heat flow in a rod where one end is held at temperature
zero and the other is insulated; the second models vibrations of air in cylindrical
pipes where one end is closed and the other is open, such as clarinets and some
organ pipes. In both of them, separation of variables leads to the ordinary
differential equation ϕ′′ (x) = −αϕ(x) with boundary conditions ϕ(0) = ϕ′ (l) = 0. The
general solution of the differential equation is $\phi(x) = C_1 \cos\sqrt{\alpha}\,x + C_2 \sin\sqrt{\alpha}\,x$;
the condition ϕ(0) = 0 forces $C_1$ to vanish, and then the condition ϕ′ (l) = 0 forces
$\sqrt{\alpha}$ to be of the form $(n - \tfrac12)\pi/l$ with n a positive integer. We are therefore led to
try to expand the initial functions in a series of the form

\[ f(x) = \sum_{n=1}^{\infty} a_n \sin\left( n - \tfrac12 \right) \frac{\pi x}{l}. \]
This can indeed be done; the technique for reducing this problem to one of ordinary
Fourier sine series is outlined in Exercise 4 of §8.4.
It is interesting to note that the resulting frequencies for the vibrating pipe are
(n − ½)πc/l (n = 1, 2, 3, . . .). In particular, the fundamental frequency for a pipe
closed at one end and open at the other, namely ½πc/l, is half as great as for a
pipe of equal length that is open at both ends. Moreover, only the odd-numbered
multiples of this fundamental frequency occur as “harmonics” for half-open pipes,
whereas all integer multiples occur for open pipes; as a result, the two kinds of
pipes produce notes of different tone qualities.
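The comparison between the two kinds of pipes is easy to tabulate. In this sketch (an illustration added here) the speed c = 343 and the length l = 0.5 are arbitrary assumed values, the function names are hypothetical, and the quantities computed are the frequencies nπc/l and (n − ½)πc/l exactly as written in the text.

```python
import math

def open_pipe_frequencies(c, l, count):
    """Frequencies n*pi*c/l (n = 1..count) for a pipe open at both ends."""
    return [n * math.pi * c / l for n in range(1, count + 1)]

def half_open_pipe_frequencies(c, l, count):
    """Frequencies (n - 1/2)*pi*c/l (n = 1..count) for a pipe closed at one end."""
    return [(n - 0.5) * math.pi * c / l for n in range(1, count + 1)]

if __name__ == "__main__":
    c, l = 343.0, 0.5  # assumed speed of sound and pipe length, for illustration
    open_f = open_pipe_frequencies(c, l, 4)
    half_f = half_open_pipe_frequencies(c, l, 4)
    # Ratio of fundamentals, then each overtone as a multiple of its fundamental.
    print(half_f[0] / open_f[0])
    print([f / open_f[0] for f in open_f])
    print([f / half_f[0] for f in half_f])
```

The printed multiples are 1, 2, 3, 4 for the open pipe and 1, 3, 5, 7 for the half-open pipe, which is the pattern of harmonics described above.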
4. Clearly there are many other variations to be played on this theme — dif-
ferent boundary conditions, other differential equations, and so on. A few further
examples are outlined in the exercises, and we shall indicate a more general frame-
work in which such problems can be studied in the next section.

EXERCISES
1. A rod 100 cm long is insulated along its length and at both ends. Suppose that
its initial temperature is u(x, 0) = x (x in cm, u in °C, t in sec, 0 ≤ x ≤ 100),
and that its diffusivity coefficient k is 1.1 cm²/sec (about right if the rod is made
of copper).
a. Find the temperature u(x, t) for t > 0. (For the relevant Fourier series, see
Example 1 of §8.4.)
b. Show that the first three nonvanishing terms of the series (including the
constant term) give the temperature accurately to within 1° when t = 60
(one minute after starting). What are u(0, 60), u(10, 60), and u(40, 60) to
the nearest 1°? (Hint: $\sum_{n=1}^{\infty} (2n - 1)^{-2} = \pi^2/8$, so
$\sum_{n=3}^{\infty} (2n - 1)^{-2} = (\pi^2/8) - 1 - \tfrac19 \approx 0.123$.)
c. Show that u(x, t) is within 1° of its equilibrium value of 50° for all x when
t ≥ 3600 (i.e., after one hour). (Don’t work too hard; crude estimates are
enough.)
2. Find the temperature function u(θ, t) (t > 0) for a rod bent into the shape of
a circular hoop, given the initial temperature u(θ, 0) = f (θ). (Here θ denotes
the angular coordinate on the circle, and the boundary conditions for a straight
rod are replaced by the requirement that u should be a 2π-periodic function of
θ.)
3. As we found in §5.6, the inhomogeneous heat equation $\partial_t u = k\,\partial_x^2 u + G$ can
be used to model heat flow in a rod when the total amount of heat energy is not
constant; here G is a function of x and t, with units of degrees per unit time,
that accounts for the addition or subtraction of heat from the rod. Let us solve
the initial value problem with constant-temperature boundary conditions,

\[ \partial_t u = k\,\partial_x^2 u + G, \qquad u(x, 0) = f(x), \qquad u(0, t) = u(l, t) = 0, \]

making appropriate assumptions on f and G so that Fourier expansions are
valid. Motivated by the solution (8.35) for the special case G ≡ 0, we expand
everything in a Fourier sine series. That is, for each t we write
$G(x, t) = \sum_{n=1}^{\infty} \beta_n(t) \sin(n\pi x/l)$, and we try to find a solution in the form
$u(x, t) = \sum_{n=1}^{\infty} b_n(t) \sin(n\pi x/l)$, where the coefficients $b_n(t)$ are to be determined. Plug
this into the equation $\partial_t u = k\,\partial_x^2 u + G$ to obtain an ordinary differential
equation for each $b_n(t)$, with initial condition determined by the requirement that
$\sum_{n=1}^{\infty} b_n(0) \sin(n\pi x/l)$ should be the Fourier sine series of f (x). Then solve
these ordinary differential equations to obtain u. What conditions on f and G
will guarantee the validity of these calculations?
4. Consider a vibrating string occupying the interval [0, l]. Suppose the string is
plucked at x = b (0 < b < l) so that its initial displacement u(x, 0) is mx/b
for 0 ≤ x ≤ b and m(l − x)/(l − b) for b ≤ x ≤ l (that is, u(x, 0) is linear on
[0, b] and on [b, l], and equal to m at x = b), and its initial velocity ∂t u(x, 0)
is zero. (Note: For this to be a realistic model of a plucked string, we should
have l ≫ m.)
a. Find the Fourier series for u(x, t) for t > 0. (The result of Exercise 2 of
§8.3 can be used.)
b. Compute the coefficients b1 , . . . , b5 of the first five terms (notation as in
(8.37)) numerically when b = (0.4)l and when b = (0.1)l. Observe that
the higher frequencies contribute a lot more to u(x, t) when b = (0.1)l
than when b = (0.4)l. (Musically: Plucking a string nearer the end gives a
note with more “harmonics.”)

5. The model for a vibrating string given by the wave equation is unrealistic be-
cause it predicts that the vibration will continue forever without dying out. Real
strings, however, are not perfectly elastic, so the vibrational energy is gradu-
ally dissipated. A better model is obtained by the following modification of the
wave equation:
\[ \partial_t^2 u = c^2\,\partial_x^2 u - 2\delta\,\partial_t u, \]
where δ is a small positive constant. (The left side is the acceleration, and the
terms on the right are the effects of the elastic restoring force and the damping
force that tends to slow the motion down. The factor of 2 is just for conve-
nience.) Find the general solution of this differential equation subject to the
boundary conditions u(0, t) = u(l, t) = 0 by modifying the method used in
the text for the ordinary wave equation. Assume that δ < πc/l. You should find
that the solutions decay exponentially in time and that the frequencies decrease
as the damping constant δ increases.
Exercises 6 and 7 concern the Dirichlet problem for a bounded open set $S \subset \mathbb{R}^2$:
Given a function f on the boundary ∂S, find a solution of Laplace’s equation
$\partial_x^2 u + \partial_y^2 u = 0$ on S such that u = f on ∂S. (A physical interpretation: Find
the steady-state distribution of heat in S when the temperature on the boundary is
given.)
6. Consider the Dirichlet problem for a rectangle:
\[ \partial_x^2 u + \partial_y^2 u = 0 \quad \text{for } 0 < x < l,\ 0 < y < L; \]
\[ u(x, 0) = f_1(x), \quad u(x, L) = f_2(x), \quad u(0, y) = g_1(y), \quad u(l, y) = g_2(y). \]

a. Suppose we can solve this problem in the two special cases $g_1 = g_2 = 0$
and $f_1 = f_2 = 0$. How can the solutions $u_1$ and $u_2$ for these cases be
combined to yield the solution for the general case?
b. Henceforth we assume that g1 = g2 = 0 (the case f1 = f2 = 0 is sim-
ilar). Use separation of variables to find solutions of Laplace’s equation
satisfying u(0, y) = u(l, y) = 0 in the form u(x, y) = ϕ(x)ψ(y); then
use Fourier techniques to find the (infinite) linear combination of these so-
lutions that satisfies u(x, 0) = f1 (x) and u(x, L) = f2 (x). (Hint: The
general solution of $\psi'' - c^2\psi = 0$ can be written in the form ψ(y) =
a sinh cy + b sinh c(L − y). [Why?] This form of the solution is more
convenient than the more obvious a sinh cy + b cosh cy.)
7. Consider the Dirichlet problem for the unit disc:
\[ \partial_x^2 u + \partial_y^2 u = 0 \ \text{ for } x^2 + y^2 < 1, \qquad u(\cos\theta, \sin\theta) = f(\theta). \]

If we think of u as a function of the polar coordinates (r, θ) rather than the
Cartesian coordinates (x, y), by Proposition 2.51 this becomes

\[ r^2\,\partial_r^2 u + r\,\partial_r u + \partial_\theta^2 u = 0 \ \text{ for } r < 1, \qquad u(1, \theta) = f(\theta). \]

a. Use separation of variables to find solutions of this differential equation in
the form u(r, θ) = ϕ(r)ψ(θ). Keep in mind that the solutions must be 2π-periodic
functions of θ and that they must be smooth at the origin, where
r = 0 and θ is undefined. (Hint: The general solution of the Euler equation
$r^2\phi'' + r\phi' - c^2\phi = 0$ is $\phi(r) = a r^{c} + b r^{-c}$ if $c \ne 0$, $a + b \log r$ if $c = 0$.)
Then use Fourier techniques to find the (infinite) linear combination of
these solutions that satisfies u(1, θ) = f(θ).
b. You should find that u(r, θ) equals $A_r f(\theta)$, the Abel approximant to f
defined by (8.19). Use (8.20) and (8.22) to derive the Poisson integral
formula for the solution:
\[ u(r, \theta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{1 - r^2}{1 + r^2 - 2r\cos\phi}\, f(\theta - \phi)\, d\phi. \]

8.6 The Infinite-Dimensional Geometry of Fourier Series


In this section we shall re-examine the notion of Fourier series in the light of a
profound analogy with certain ideas from vector algebra. We begin with a quick
review of the latter.
When expressed in algebraic terms, the concepts of Euclidean geometry in n
dimensions are based on the vector-space structure of $\mathbb{R}^n$ (that is, the operations
of vector addition and scalar multiplication), together with the dot product or inner
product a · b, in terms of which we can define lengths ($|a| = (a \cdot a)^{1/2}$) and angles
(the angle from a to b is $\arccos(a \cdot b / |a|\,|b|)$). The “natural” coordinate systems
for this geometry are the ones arising from an orthonormal basis for $\mathbb{R}^n$, that is, a
basis $u_1, \dots, u_n$ such that $u_j \cdot u_k$ equals 0 for $j \ne k$ and 1 for $j = k$. The formula
for expressing an arbitrary vector x in terms of such a basis is given very simply in
terms of inner products:
\[ x = \sum_{1}^{n} c_j u_j, \qquad c_j = x \cdot u_j. \]
(The formula for $c_j$ results from taking the inner product of both sides of the equation
$x = \sum_{1}^{n} c_k u_k$ with $u_j$ to yield $x \cdot u_j = \sum_{1}^{n} c_k\, u_k \cdot u_j = c_j$.)

Similar ideas underlie the study of complex n-dimensional vectors. The main
difference is that, since the absolute value |z| of a complex number z is given by
$(z\bar{z})^{1/2}$ rather than $(z^2)^{1/2}$, the appropriate definition of inner product is
\[ \langle a, b \rangle = \sum_{1}^{n} a_j \overline{b_j} \qquad (a, b \in \mathbb{C}^n). \tag{8.39} \]
(Recall that the conjugate $\bar{z}$ of a complex number z = x + iy (x, y ∈ R) is defined
to be x − iy. The notation a · b is also used for the complex inner product, but
we introduce the new notation ⟨a, b⟩ to avoid confusion with the real case and
to prepare for further developments.) Thus ⟨a, b⟩ is a linear function of a but
a conjugate-linear function of b (meaning that ⟨a, cb⟩ equals $\bar{c}\langle a, b \rangle$ rather than
c⟨a, b⟩), and $\langle b, a \rangle = \overline{\langle a, b \rangle}$. The magnitude or norm of the vector a is still given
by $|a| = \langle a, a \rangle^{1/2}$, and we still call two vectors a and b orthogonal if ⟨a, b⟩ = 0.
As in the real case, a basis $u_1, \dots, u_n$ for $\mathbb{C}^n$ is orthonormal if $\langle u_j, u_k \rangle$ is 0 if
$j \ne k$ and 1 if $j = k$. The expansion formula for a vector $x \in \mathbb{C}^n$ with respect to
an orthonormal basis is exactly the same:
\[ x = \sum_{1}^{n} c_j u_j, \qquad c_j = \langle x, u_j \rangle. \]
If the basis $\{u_j\}$ is orthogonal ($\langle u_j, u_k \rangle = 0$ for $j \ne k$) but not normalized ($\|u_j\|$
not necessarily equal to 1), the formula becomes
\[ x = \sum_{1}^{n} c_j u_j, \qquad c_j = \frac{\langle x, u_j \rangle}{\|u_j\|^2}. \tag{8.40} \]
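As a concrete check of (8.40), the following is a minimal Python sketch that expands a vector of C² in an orthogonal but unnormalized basis and then rebuilds it from its coefficients. The helper names `inner`, `coefficients`, and `reconstruct`, and the particular vectors `u1`, `u2`, `x`, are illustrative assumptions, not notation from the text.

```python
# Sketch: expand a vector in an orthogonal (not normalized) basis of C^n,
# using c_j = <x, u_j> / ||u_j||^2 as in (8.40).

def inner(a, b):
    """Complex inner product <a, b> = sum_j a_j * conj(b_j), as in (8.39)."""
    return sum(aj * bj.conjugate() for aj, bj in zip(a, b))

def coefficients(x, basis):
    """Coefficients c_j of x with respect to an orthogonal basis."""
    return [inner(x, u) / inner(u, u).real for u in basis]

def reconstruct(coeffs, basis):
    """Form the linear combination sum_j c_j u_j."""
    n = len(basis[0])
    return [sum(c * u[i] for c, u in zip(coeffs, basis)) for i in range(n)]

# Illustrative orthogonal basis of C^2 (chosen only for this example):
# <u1, u2> = 1*1 + i*conj(-i) = 1 + i*i = 0, and ||u1||^2 = ||u2||^2 = 2.
u1 = [1 + 0j, 1j]
u2 = [1 + 0j, -1j]
x = [2 + 1j, 3 - 4j]

c = coefficients(x, [u1, u2])
x_again = reconstruct(c, [u1, u2])
print(c, x_again)
```

If the division by ‖u_j‖² is omitted, the reconstructed vector comes out scaled by 2 in this example, which is exactly the discrepancy between the orthonormal and the merely orthogonal formulas.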

Now we are ready to make the conceptual leap from the discrete and finite-
dimensional to the continuous and infinite dimensional. Suppose we are studying
functions on an interval [a, b] — let us say, piecewise continuous, complex-valued
ones. We regard such a function f as a “vector” whose “components” are the
values f (x) as x ranges over [a, b]. We define the inner product of two functions
f and g just as in (8.39) except that the sum is replaced by an integral:
\[ \langle f, g \rangle = \int_a^b f(x)\,\overline{g(x)}\, dx. \tag{8.41} \]

Further, we define the norm of a function f to be

\[ \|f\| = \langle f, f \rangle^{1/2} = \left( \int_a^b |f(x)|^2\, dx \right)^{1/2}, \]

and we define two functions f and g to be orthogonal on [a, b] if ⟨f, g⟩ = 0. A
sequence of functions $\{\phi_n\}$ is called orthogonal if $\langle \phi_m, \phi_n \rangle = 0$ for $m \ne n$, and
orthonormal if, in addition, $\|\phi_n\| = 1$ for all n.
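For example, take the interval [a, b] to be [−π, π], and define $e_n(x) = e^{inx}$.
Then, since $\overline{e^{inx}} = e^{-inx}$, by (8.5) we have
\[ \langle e_m, e_n \rangle = \int_{-\pi}^{\pi} e^{i(m-n)x}\, dx = \begin{cases} 2\pi & \text{if } m = n, \\ 0 & \text{otherwise.} \end{cases} \]
Thus $\{e_n\}_{-\infty}^{\infty}$ is an orthogonal set; the corresponding orthonormal set is $\{\phi_n\}_{-\infty}^{\infty}$
where $\phi_n = (2\pi)^{-1/2} e_n$. The formula for the Fourier series of a function f,
\[ f = \sum_{-\infty}^{\infty} c_n e_n, \qquad c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx}\, dx = \frac{\langle f, e_n \rangle}{\|e_n\|^2}, \]
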

is an exact analogue of the formula (8.40) for the expansion of a vector in terms of
an orthogonal basis!
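The orthogonality relation for the functions e_n, and the identification of c_n with ⟨f, e_n⟩/‖e_n‖², can be checked numerically. The sketch below approximates the inner product (8.41) by a midpoint rule; the quadrature scheme, the step count, and the names `inner_product`, `e`, and `fourier_coefficient` are assumptions made for illustration only.

```python
import cmath
import math

def inner_product(f, g, a, b, steps=4000):
    """Midpoint-rule approximation of <f, g> = integral_a^b f(x) conj(g(x)) dx."""
    h = (b - a) / steps
    total = 0j
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += f(x) * g(x).conjugate()
    return total * h

def e(n):
    """The function e_n(x) = exp(i n x)."""
    return lambda x: cmath.exp(1j * n * x)

def fourier_coefficient(f, n):
    """c_n = <f, e_n> / ||e_n||^2, with ||e_n||^2 = 2*pi on [-pi, pi]."""
    return inner_product(f, e(n), -math.pi, math.pi) / (2 * math.pi)

print(abs(inner_product(e(3), e(3), -math.pi, math.pi)))  # close to 2*pi
print(abs(inner_product(e(2), e(5), -math.pi, math.pi)))  # close to 0
print(fourier_coefficient(lambda t: t, 1))                # sawtooth: close to -i
```

For the sawtooth f(θ) = θ the exact value is c_1 = (−1)²/i = −i, so the last line gives a rough sanity check on the quadrature as well.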
A similar interpretation holds for Fourier cosine and sine series. To wit, it is
easy to verify (Exercise 1) that $\{\cos n\pi x/l\}_{0}^{\infty}$ and $\{\sin n\pi x/l\}_{1}^{\infty}$ are orthogonal
sets on the interval [0, l], and that the formulas for the Fourier cosine and sine
coefficients of a function f on [0, l] are analogous to (8.40).
There are some unanswered questions here, however. The inner product ⟨f, g⟩
makes sense when f and g are piecewise continuous on [a, b], but we have proved
the validity of Fourier expansions only for piecewise smooth functions. So, what is
the “right” class of functions to consider here? Can we make sense out of Fourier
series for functions that may not be piecewise smooth?
The key insight is that pointwise convergence is the wrong notion of conver-
gence in this situation. Instead, we should use a notion of convergence that arises
from the geometry of the inner product. That is, we think of the set
\[ PC(a, b) = \text{set of all piecewise continuous complex-valued functions on } [a, b] \]
as an “infinite-dimensional Euclidean space” with the notions of length and angle
given by the inner product (8.41). The “distance” between two functions is to be
interpreted as the norm of their difference,
\[ \text{Distance from } f \text{ to } g = \|f - g\| = \left( \int_a^b |f(x) - g(x)|^2\, dx \right)^{1/2}, \]
and the corresponding notion of convergence is that
\[ f_k \to f \iff \|f_k - f\| \to 0, \quad \text{i.e.,} \quad \int_a^b |f_k(x) - f(x)|^2\, dx \to 0. \]
This notion of convergence is called convergence in norm or mean-square convergence.
Note. If the distance ∥f − g∥ between two piecewise continuous functions is
zero, it does not follow that f is identically equal to g, but only that f (x) = g(x) for
all except perhaps finitely many values of x. In this setting, it is appropriate not to
worry about this technicality and to think of two functions as being the same when
they differ only at finitely many points. This issue already arose in connection
with the behavior of the Fourier series of f at points where f is discontinuous
(cf. Corollary 8.18).
Mean-square convergence is rather different from pointwise convergence, and
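neither one implies the other. For example, let us take [a, b] = [−1, 1]. If
\[ f_k(x) = \begin{cases} k & \text{if } 0 < x < 1/k, \\ 0 & \text{otherwise,} \end{cases} \]
then $f_k \to 0$ pointwise but $\|f_k\| = \bigl( \int_0^{1/k} k^2\, dx \bigr)^{1/2} = \sqrt{k} \to \infty$. On the other
hand, if
\[ g_k(x) = \begin{cases} 1 & \text{if } -1/k < x < 1/k, \\ 0 & \text{otherwise,} \end{cases} \]
then $\|g_k\| = \bigl( \int_{-1/k}^{1/k} dx \bigr)^{1/2} = \sqrt{2/k} \to 0$, but $g_k(0) = 1 \not\to 0$. (By replacing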
the interval (−1/k, 1/k) here by an interval Ik whose length tends to 0 but whose
midpoint oscillates back and forth within the interval [−1, 1] as k → ∞, one can
construct examples of sequences {gk } that converge in norm but do not converge
at any point.) However, for uniform rather than pointwise convergence there is
something to say.

8.42 Proposition. If fk → f uniformly on [a, b], then fk → f in norm on [a, b].

Proof. If $f_k \to f$ uniformly, there is a sequence $\{C_k\}$ of constants such that
$|f_k(x) - f(x)| \le C_k$ for all x ∈ [a, b] and $C_k \to 0$, so
\[ \int_a^b |f_k(x) - f(x)|^2\, dx \le (b - a) C_k^2 \to 0. \]

More generally, $f_k \to f$ in norm provided that $f_k \to f$ pointwise and there is a
constant C such that $|f_k(x)| \le C$ for all k and all x ∈ [a, b]; this follows from the
bounded convergence theorem (4.52).
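The two examples given before Proposition 8.42 (a sequence that tends to 0 pointwise while its norms blow up, and a sequence whose norms tend to 0 while its value at 0 stays at 1) can be tabulated numerically. The following is only a sketch: the midpoint quadrature, the step count, and the names `l2_norm`, `f_k`, `g_k` are our own choices for illustration.

```python
import math

def l2_norm(f, a, b, steps=40000):
    """Midpoint-rule approximation of ( integral_a^b |f(x)|^2 dx )^(1/2)."""
    h = (b - a) / steps
    total = sum(abs(f(a + (i + 0.5) * h)) ** 2 for i in range(steps))
    return math.sqrt(total * h)

def f_k(k):
    """f_k(x) = k for 0 < x < 1/k and 0 otherwise; exact norm is sqrt(k)."""
    return lambda x: float(k) if 0 < x < 1 / k else 0.0

def g_k(k):
    """g_k(x) = 1 for -1/k < x < 1/k and 0 otherwise; exact norm is sqrt(2/k)."""
    return lambda x: 1.0 if -1 / k < x < 1 / k else 0.0

for k in (4, 25, 100):
    print(k,
          l2_norm(f_k(k), -1, 1), math.sqrt(k),       # grows without bound
          l2_norm(g_k(k), -1, 1), math.sqrt(2 / k),   # tends to 0
          f_k(k)(0.5), g_k(k)(0.0))                   # pointwise values
```

The last two columns show the pointwise behavior: f_k(0.5) is 0 for every k ≥ 2, while g_k(0) remains 1 for every k.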

The introduction of norm convergence is justified by the fact that the Fourier
series of any piecewise continuous function f on [−π, π] converges in norm to f .
This is a substantial result, but there is more to be said before we state a formal
theorem.
The space P C(a, b) of piecewise continuous functions on [a, b] fails to be a
good infinite-dimensional analogue of Euclidean space in one crucial respect: it is
not complete. That is, if {fk } is a sequence in P C(a, b) such that ∥fj − fk ∥ → 0
as j, k → ∞, there may not be a function f ∈ P C(a, b) such that ∥fk − f ∥ → 0.
For example, with [a, b] = [0, 1], let
\[ f_k(x) = \begin{cases} x^{-1/4} & \text{if } x > 1/k, \\ 0 & \text{otherwise.} \end{cases} \]
It is easily verified that $\|f_j - f_k\|^2 = 2\,|j^{-1/2} - k^{-1/2}| \to 0$ as $j, k \to \infty$. However,
the function to which the $f_k$’s are converging is clearly $f(x) = x^{-1/4}$ (x > 0),
which does not belong to PC(0, 1) because it blows up at 0. Thus, to fill in the
“holes” in P C(a, b) one will have to deal with unbounded functions and improper
integrals. But even this is not enough; with more cleverness one can construct ex-
amples where the limiting function f is not (Riemann) integrable on any subinterval
of [a, b].
What is needed here is the Lebesgue integral, which handles integrals of un-
bounded and discontinuous functions more capably (see §4.8). The appropriate
“completion” of the space PC(a, b) is the space of square-integrable functions,
\[ L^2(a, b) = \Bigl\{ f : f \text{ is Lebesgue measurable on } [a, b] \text{ and } \int_a^b |f(x)|^2\, dx < \infty \Bigr\}, \]
where the integral is a Lebesgue integral. (The name “$L^2$” is pronounced “L-two”;
the L is in honor of Lebesgue and the 2 refers to the exponent in $|f(x)|^2$.)
We can now state the general convergence theorem for Fourier series.
8.43 Theorem. Let $e_n(\theta) = e^{in\theta}$.
a. If $f \in L^2(-\pi, \pi)$, the Fourier series
\[ \sum_{-\infty}^{\infty} c_n e_n, \qquad c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\theta) e^{-in\theta}\, d\theta, \]
converges in norm to f, that is,
\[ \lim_{N \to \infty} \int_{-\pi}^{\pi} \Bigl| f(\theta) - \sum_{-N}^{N} c_n e^{in\theta} \Bigr|^2 d\theta = 0. \]
b. Bessel’s inequality is an equality: For any $f \in L^2(-\pi, \pi)$,
\[ \sum_{-\infty}^{\infty} |c_n|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(\theta)|^2\, d\theta. \]
c. If $\{c_n\}_{-\infty}^{\infty}$ is any sequence of complex numbers such that $\sum_{-\infty}^{\infty} |c_n|^2$ converges,
then the series $\sum_{-\infty}^{\infty} c_n e_n$ converges in norm to a function in $L^2(-\pi, \pi)$.

Proof. A full proof of Theorem 8.43 is beyond the scope of this book. (One may
be found in Jones [9, p. 325] or Rudin [18, pp. 328ff.].) However, the idea is as
follows. If f is continuous and piecewise smooth, we know that its Fourier series
converges uniformly (Theorem 8.29) and hence in norm, so (a) is valid for such f .
We then obtain the result for arbitrary f ∈ L2 (−π, π) by a limiting argument that
involves proving that any function in L2 (−π, π) is the limit in norm of a sequence
of continuous, piecewise smooth functions. (A partial result in this direction is
indicated in Exercise 7.) (b) follows easily because, as we showed in the proof of
Bessel’s inequality,
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(\theta)|^2\, d\theta - \sum_{-N}^{N} |c_n|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \Bigl| f(\theta) - \sum_{-N}^{N} c_n e^{in\theta} \Bigr|^2 d\theta, \]
and the integral on the right tends to zero as N → ∞ since the series converges
in norm to f. (c) follows from (b) and the completeness of $L^2(-\pi, \pi)$. Indeed, by
(b),
\[ \int_{-\pi}^{\pi} \Bigl| \sum_{M \le |n| \le N} c_n e^{in\theta} \Bigr|^2 d\theta = 2\pi \sum_{M \le |n| \le N} |c_n|^2, \]
so the partial sums of the series $\sum c_n e_n$ are Cauchy in norm; by completeness, the
series converges in norm.

Theorem 8.43 says that $\{e^{inx}\}_{-\infty}^{\infty}$ is an orthogonal basis for $L^2(-\pi, \pi)$, that
is, an orthogonal set with the property that every function in $L^2(-\pi, \pi)$ can be
expanded uniquely as a norm-convergent series of scalar multiples of functions in
the set. Likewise, $\{\cos nx\}_{0}^{\infty}$ and $\{\sin nx\}_{1}^{\infty}$ are orthogonal bases for $L^2(0, \pi)$;
see Exercises 1 and 2.
The equality in Theorem 8.43b,
\[ \sum_{-\infty}^{\infty} |c_n|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(\theta)|^2\, d\theta, \tag{8.44} \]
is known as Parseval’s identity; it is the infinite-dimensional analogue of the
Pythagorean theorem for finite-dimensional vectors, if we think of f as an
infinite-dimensional vector and the $c_n$’s as the components of this vector with respect to the
orthogonal basis $\{e_n\}$. The factor of 2π is there because $\|e_n\|^2 = 2\pi$.
As an illustration of the use of Parseval’s identity, we give another derivation
of the formula $\sum_{1}^{\infty} n^{-2} = \pi^2/6$. (The first one was in Example 2 of §8.2.) Let
f be the sawtooth wave function (f(θ) = θ for |θ| < π). We calculated in §8.1
that its Fourier coefficients are given by $c_0 = 0$ and $c_n = (-1)^{n+1}/in$ for $n \ne 0$.
Therefore,
\[ \sum_{1}^{\infty} \frac{1}{n^2} = \frac{1}{2} \Bigl( \sum_{1}^{\infty} \frac{1}{n^2} + \sum_{-\infty}^{-1} \frac{1}{n^2} \Bigr) = \frac{1}{2} \sum_{-\infty}^{\infty} |c_n|^2 = \frac{1}{4\pi} \int_{-\pi}^{\pi} \theta^2\, d\theta = \frac{\pi^2}{6}. \]

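The sawtooth computation can be checked numerically by comparing partial sums of Σ|c_n|² with the mean square (1/2π)∫θ² dθ = π²/3. This is a rough sketch only; the function names and the midpoint quadrature are assumptions of ours, and the coefficient magnitudes |c_n| = 1/|n| are taken from the formula quoted above.

```python
import math

def sum_abs_c_squared(N):
    """Sum of |c_n|^2 over 0 < |n| <= N for the sawtooth, where |c_n| = 1/|n|."""
    return sum(2.0 / (n * n) for n in range(1, N + 1))

def mean_square_of_sawtooth(steps=10000):
    """Midpoint-rule value of (1/(2*pi)) * integral of theta^2 over [-pi, pi]."""
    h = 2 * math.pi / steps
    total = sum((-math.pi + (i + 0.5) * h) ** 2 for i in range(steps))
    return total * h / (2 * math.pi)

for N in (10, 1000, 100000):
    # Half the two-sided sum is the one-sided sum of 1/n^2.
    print(N, sum_abs_c_squared(N) / 2, math.pi ** 2 / 6)

print(mean_square_of_sawtooth(), math.pi ** 2 / 3)
```

The partial sums approach π²/6 slowly, with an error of roughly 1/N, which reflects the slow decay of the sawtooth's coefficients.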

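Parseval’s identity easily yields the following generalization of itself, which is
often useful:
8.45 Corollary. If $f, g \in L^2(-\pi, \pi)$ have the Fourier series $\sum c_n e_n$ and $\sum \gamma_n e_n$,
then
\[ \sum_{-\infty}^{\infty} c_n \overline{\gamma_n} = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\theta)\,\overline{g(\theta)}\, d\theta. \tag{8.46} \]

Proof. We apply (8.44) to the functions f, g, and f + g:
\begin{align*}
\sum \bigl( |c_n|^2 + 2 \operatorname{Re} c_n \overline{\gamma_n} + |\gamma_n|^2 \bigr) &= \sum |c_n + \gamma_n|^2 \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(\theta) + g(\theta)|^2\, d\theta
 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \bigl( |f(\theta)|^2 + 2 \operatorname{Re} f(\theta)\overline{g(\theta)} + |g(\theta)|^2 \bigr)\, d\theta \\
&= \sum |c_n|^2 + \frac{1}{\pi} \operatorname{Re} \int_{-\pi}^{\pi} f(\theta)\overline{g(\theta)}\, d\theta + \sum |\gamma_n|^2.
\end{align*}
It follows that $\operatorname{Re} \sum c_n \overline{\gamma_n} = \operatorname{Re}\, (1/2\pi) \int_{-\pi}^{\pi} f(\theta)\overline{g(\theta)}\, d\theta$. The same calculation,
with f replaced by if, shows that the imaginary parts are also equal.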

The Fourier bases $\{e^{inx}\}_{-\infty}^{\infty}$, $\{\cos nx\}_{0}^{\infty}$ and $\{\sin nx\}_{1}^{\infty}$ play a special role
among all the orthogonal bases for $L^2(-\pi, \pi)$ and $L^2(0, \pi)$ because these functions
are eigenfunctions for the differential operators $d/dx$ and $d^2/dx^2$. To explain this
in more detail, we recall that an eigenvector for a linear transformation T on $\mathbb{R}^n$
or $\mathbb{C}^n$ is a nonzero vector x such that T x = λx for some scalar λ. (See Appendix
A, (A.56)–(A.58)). In our situation, the “vectors” are functions in $L^2(-\pi, \pi)$ or
$L^2(0, \pi)$, and the linear transformation in question is $d/dx$ or $d^2/dx^2$, defined not
on the whole $L^2$ space but on a suitable subspace of functions that possess the
requisite derivatives and satisfy certain boundary conditions. Indeed, we have
\[ \frac{d}{dx} e^{inx} = in e^{inx}, \qquad \frac{d^2}{dx^2} \cos nx = -n^2 \cos nx, \qquad \frac{d^2}{dx^2} \sin nx = -n^2 \sin nx. \]
The functions $e^{inx}$ are precisely the eigenfunctions of $d/dx$ on [−π, π] that satisfy
the periodicity condition f(−π) = f(π), and the functions cos nx and sin nx are
precisely the eigenfunctions of $d^2/dx^2$ on [0, π] that satisfy the boundary conditions
f′(0) = f′(π) = 0 and f(0) = f(π) = 0, respectively. The Fourier expansion
of a function therefore provides the analogue of the spectral theorem (A.58)
for these fundamental differential operators, with all the resulting simplifications
that one expects when one finds an orthonormal eigenbasis for a matrix.
For example, we can rederive the solution (8.35) of the insulated heat flow
problem (8.34) as follows. To solve the heat equation $\partial_t u = k\,\partial_x^2 u$ subject to the
boundary conditions $\partial_x u(0, t) = \partial_x u(l, t) = 0$, we take u to be the sum of a series
of eigenfunctions of $\partial_x^2$ satisfying these boundary conditions:
\[ u(x, t) = \sum_{0}^{\infty} \alpha_n(t) \cos\frac{n\pi x}{l}. \]
Plugging this into the heat equation turns the partial differential equation $\partial_t u = k\,\partial_x^2 u$
into the ordinary differential equations $\alpha_n'(t) = -k(n\pi/l)^2 \alpha_n(t)$ for the
coefficients. The latter are easily solved to yield $\alpha_n(t) = a_n e^{-k(n\pi/l)^2 t}$ and hence
the solution (8.35).
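The eigenfunction method just described translates directly into a short computation: find the cosine coefficients of the initial temperature, let each one decay at its own exponential rate, and sum. The sketch below is illustrative only. It normalizes the coefficients so that u(x, 0) = Σ_{n≥0} α_n(0) cos(nπx/l) (so α_0(0) is the mean of f, which may differ by a factor of 2 from the convention for a_0 used elsewhere), it truncates the series, and it uses a midpoint rule; the function names and the sample data are assumptions.

```python
import math

def initial_alphas(f, l, n_terms, steps=2000):
    """Coefficients alpha_n(0) with f(x) ~ sum_{n>=0} alpha_n(0) cos(n*pi*x/l) on [0, l]."""
    h = l / steps
    xs = [(i + 0.5) * h for i in range(steps)]
    alphas = []
    for n in range(n_terms):
        integral = sum(f(x) * math.cos(n * math.pi * x / l) for x in xs) * h
        # The constant mode has norm^2 = l; the others have norm^2 = l/2.
        alphas.append(integral / l if n == 0 else 2 * integral / l)
    return alphas

def temperature(alphas, k, l, x, t):
    """Truncated series sum_n alpha_n(0) exp(-k (n pi/l)^2 t) cos(n pi x/l)."""
    return sum(a * math.exp(-k * (n * math.pi / l) ** 2 * t)
               * math.cos(n * math.pi * x / l)
               for n, a in enumerate(alphas))

# Illustrative data: rod of length 2, diffusivity 0.5, initial temperature
# f(x) = 3 + cos(pi x), which is a mean value plus the n = 2 cosine mode.
l, k = 2.0, 0.5
alphas = initial_alphas(lambda x: 3 + math.cos(math.pi * x), l, n_terms=8)
print(alphas[:3])                         # close to [3, 0, 1]
print(temperature(alphas, k, l, 0.3, 0.2))
print(temperature(alphas, k, l, 0.3, 50)) # close to the mean value 3
```

For this initial condition the exact solution is u(x, t) = 3 + e^{−kπ²t} cos πx, so the output can be compared against a closed form; as t grows, only the constant mode survives.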
There is an extensive theory of eigenfunction expansions associated to bound-
ary value problems. Many such expansions yield interesting orthogonal bases for
L2 spaces. Others, in which there is a “continuous spectrum” instead of (or in addi-
tion to) a “discrete spectrum,” involve integrals instead of (or in addition to) infinite
series. A great deal of interesting mathematics has arisen from these ideas, and its
ramifications spread far beyond the problems for which it was originally devised.
An introduction to this subject can be found, for example, in Folland [6].

EXERCISES

1. Show that $\{\cos nx\}_{0}^{\infty}$ and $\{\sin nx\}_{1}^{\infty}$ are orthogonal sets of functions on [0, π].
What are the norms of these functions?
2. Deduce from Theorem 8.43 that if $f \in L^2(0, \pi)$, the Fourier cosine and sine
series of f both converge to f in norm.

3. Determine the constants a, b, and c so that the functions
\[ f_0(x) = 1, \qquad f_1(x) = x + a, \qquad f_2(x) = x^2 + bx + c \]
form an orthogonal set on [0, 1].
4. Suppose $\{\phi_n\}_{1}^{\infty}$ is an orthonormal set of functions on [0, l], and let $\phi_n^{+}$ and
$\phi_n^{-}$ be the even and odd extensions of $\phi_n$ to [−l, l]. Show that
$\{\phi_n^{+}/\sqrt{2}\}_{1}^{\infty} \cup \{\phi_n^{-}/\sqrt{2}\}_{1}^{\infty}$ is an orthonormal set on [−l, l].
5. Suppose $\{\phi_n\}_{1}^{\infty}$ is an orthonormal set of functions on [a, b]. Given c > 0 and
d ∈ R, let $\psi_n(x) = c^{1/2} \phi_n(cx + d)$. Show that $\{\psi_n\}_{1}^{\infty}$ is an orthonormal set
on [(a − d)/c, (b − d)/c].
6. Suppose $\{\phi_n\}_{1}^{\infty}$ is an orthonormal set of functions on [0, 1], and let
$\psi_n(x) = \sqrt{2x}\, \phi_n(x^2)$. Show that $\{\psi_n\}_{1}^{\infty}$ is also an orthonormal set on [0, 1].
7. Show that any piecewise continuous function on [a, b] is the limit in norm of
a sequence of continuous functions on [a, b] by the argument suggested by the
following picture:

[Figure: (piecewise continuous function) = lim (continuous functions)]

8. Show that in terms of the cosine and sine coefficients $a_n$ and $b_n$ defined by
(8.7), Parseval’s identity takes the form
\[ \int_{-\pi}^{\pi} |f(\theta)|^2\, d\theta = \frac{\pi}{2} |a_0|^2 + \pi \sum_{1}^{\infty} \bigl( |a_n|^2 + |b_n|^2 \bigr). \]

9. Evaluate the following series by applying Parseval’s identity, in the form given
in Exercise 8, to certain of the Fourier series found in the exercises of §8.1 and
§8.3. (Remember that the constant term is $\tfrac{1}{2} a_0$, not $a_0$.)
a. $\sum_{1}^{\infty} \dfrac{1}{n^4}$
b. $\sum_{1}^{\infty} \dfrac{1}{(2n-1)^6}$
c. $\sum_{1}^{\infty} \dfrac{1}{n^8}$
d. $\sum_{1}^{\infty} \dfrac{\sin^2 na}{n^2}$ (First assume that 0 < a < π, then deduce the general result.)
10. Suppose that f is 2π-periodic, real-valued, and of class $C^1$. Show that f′ is
orthogonal to f on [−π, π] in two ways: (i) directly from the fact that $2ff' = (f^2)'$,
and (ii) by expanding f in a Fourier series and using (8.46). (Hint: When
f is real we have $c_{-n} = \overline{c_n}$; why?)

8.7 The Isoperimetric Inequality


We conclude this chapter by using Fourier analysis together with Green’s theorem
(thereby joining two of the main threads of this book) to show that among all simple
closed curves in the plane with a given length, the circle is the one that encloses the
greatest area.
First, a few preliminaries. Suppose $g : [a, b] \to \mathbb{R}^2$ is a continuous, piecewise
smooth parametrized curve in the plane. (Thus, the components of g are
continuous, piecewise smooth functions on [a, b]; g′(t) is defined except perhaps
at finitely many points, and we make the usual nondegeneracy assumption that
$g'(t) \ne 0$.) The arc-length function $s = \phi(t) = \int_a^t |g'(u)|\, du$ is a continuous,
piecewise smooth, strictly increasing function on [a, b]. It therefore has an inverse
function, $t = \phi^{-1}(s)$, with the same properties, defined on the interval [0, L] where
L = ϕ(b) is the total length of the curve. We can then reparametrize the curve by
$h(s) = g(\phi^{-1}(s))$, s ∈ [0, L]; we then say that the curve is parametrized by arc
length. In this parametrization, the speed |h′(s)| is identically equal to 1 (except at
isolated points where it is undefined):
\[ |h'(s)| = \bigl| g'(\phi^{-1}(s))\,[\phi^{-1}]'(s) \bigr| = \frac{|g'(t)|}{\phi'(t)} = 1. \tag{8.47} \]
Now, suppose in addition that our curve is a simple closed curve; this means
that, for 0 ≤ s1 < s2 ≤ L, h(s1 ) = h(s2 ) only when s1 = 0 and s2 = L. We can
then extend the function h from [0, L] to R by requiring it to be L-periodic; this
extension is still continuous and piecewise smooth. (Indeed, this is the natural way
to think of a simple closed curve. We think of θ = 2πs/L as the angular coordinate
on a circle; then h(s) traces out the curve as θ goes once around the circle.)
Finally, we observe that we can identify R2 with the complex plane C and
the vector-valued function h = (h1 , h2 ) with the complex-valued function ζ =
h1 + ih2 . The “velocity” h′ (s) then turns into ζ ′ (s), and the condition (8.47)
becomes |ζ ′ (s)| ≡ 1.
Now we are ready to state our theorem:

8.48 Theorem (The Isoperimetric Inequality). Suppose that C is a piecewise smooth,
simple closed curve in the plane. Let L be the length of C and A the area of the
region enclosed by C. Then $A \le L^2/4\pi$, with equality if and only if C is a circle
of radius L/2π.
Proof. We identify the plane with $\mathbb{C}$. Dilating the plane by a factor of r, z → rz,
has the effect of multiplying the length of a curve by r and the area of a region by $r^2$,
so it is enough to consider the case L = 2π, for which the conclusion is that A ≤ π.
By the preceding remarks, then, we can assume that C is given by z = ζ(s), where
ζ is a continuous, piecewise smooth, 2π-periodic, complex-valued function on R,
and |ζ′(s)| ≡ 1 (except at isolated points where ζ′(s) is undefined). We expand ζ
in a Fourier series:
\[ \zeta(s) = \sum_{-\infty}^{\infty} c_n e^{ins}. \]
Since ζ is continuous and piecewise smooth, the nth Fourier coefficient of ζ′ is
$inc_n$, by Theorem 8.26. Since |ζ′(s)| ≡ 1, Parseval’s identity implies that
\[ 1 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |\zeta'(s)|^2\, ds = \sum_{-\infty}^{\infty} n^2 |c_n|^2. \tag{8.49} \]
On the other hand, by Green’s theorem (see Example 3 in §5.2), the area of the
region enclosed by C is
\[ A = \Bigl| \frac{1}{2} \int_C x\, dy - y\, dx \Bigr|. \]
(The absolute value is there because we do not specify whether C is positively or
negatively oriented.) Moreover,
\[ x\, dy - y\, dx = \operatorname{Im} \bigl[ (x - iy)(dx + i\, dy) \bigr] = \operatorname{Im} \bar{z}\, dz, \]
so
\[ A = \Bigl| \frac{1}{2} \operatorname{Im} \int_C \bar{z}\, dz \Bigr| = \frac{1}{2} \Bigl| \operatorname{Im} \int_{-\pi}^{\pi} \overline{\zeta(s)}\, \zeta'(s)\, ds \Bigr|. \]
Thus, by the general form (8.46) of Parseval’s identity,
\[ A = \pi \Bigl| \operatorname{Im} \sum_{-\infty}^{\infty} \overline{c_n}\, i n c_n \Bigr| = \pi \Bigl| \sum_{-\infty}^{\infty} n |c_n|^2 \Bigr|. \]
Comparing this with (8.49) yields the desired upper bound for A:
\[ A = \pi \Bigl| \sum_{-\infty}^{\infty} n |c_n|^2 \Bigr| \le \pi \sum_{-\infty}^{\infty} |n|\, |c_n|^2 \le \pi \sum_{-\infty}^{\infty} n^2 |c_n|^2 = \pi. \]

Moreover, the second inequality is strict unless $c_n = 0$ for |n| > 1. In that case,
the first inequality becomes
\[ \bigl|\, |c_1|^2 - |c_{-1}|^2 \,\bigr| \le |c_1|^2 + |c_{-1}|^2, \]
which is strict unless either $c_1$ or $c_{-1}$ vanishes. Thus A < π unless
$\zeta(s) = c_0 + c_1 e^{is}$ or $\zeta(s) = c_0 + c_{-1} e^{-is}$, both of which describe a circle centered at
$c_0$, traversed counterclockwise or clockwise, respectively. (In either case the radius
is 1 since $|c_{\pm 1}| = |\zeta'(s)| = 1$.)
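The inequality A ≤ L²/4π is easy to test on polygons, which are piecewise smooth simple closed curves. In the sketch below the area is computed by the shoelace formula, which is the polygonal case of the line integral ½∮ x dy − y dx used in the proof. The function names and the choice of regular polygons inscribed in the unit circle are illustrative assumptions.

```python
import math

def perimeter(vertices):
    """Length of the closed polygon through the given vertices."""
    n = len(vertices)
    return sum(math.hypot(vertices[(i + 1) % n][0] - vertices[i][0],
                          vertices[(i + 1) % n][1] - vertices[i][1])
               for i in range(n))

def enclosed_area(vertices):
    """|(1/2) * closed integral of x dy - y dx|, evaluated edge by edge."""
    n = len(vertices)
    twice = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice += x0 * y1 - x1 * y0   # contribution of one straight edge
    return abs(twice) / 2

def isoperimetric_deficit(vertices):
    """L^2/(4*pi) - A, which Theorem 8.48 asserts is nonnegative."""
    L = perimeter(vertices)
    return L * L / (4 * math.pi) - enclosed_area(vertices)

def regular_polygon(n):
    """Vertices of a regular n-gon inscribed in the unit circle."""
    return [(math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n))
            for j in range(n)]

for n in (3, 4, 6, 12, 100, 1000):
    print(n, isoperimetric_deficit(regular_polygon(n)))
```

The deficit is positive for every polygon and shrinks toward 0 as the regular n-gon approaches the circle, in line with the equality case of the theorem.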
Appendix A

SUMMARY OF LINEAR
ALGEBRA

This appendix consists of a brief summary of the definitions and results from linear
algebra that are needed in the text (and a little more). Brief indications of proofs
are given where it is easy to do so, but lack of any proof does not necessarily mean
that a statement is supposed to be obvious. More complete treatments can be found
in texts on linear algebra such as Anton [1] and Lay [16].

A.1 Vectors
Most of the basic terminology concerning n-dimensional vectors is contained in
§1.1; we introduce a few more items here.
(A.1) If x1 , . . . , xk are vectors in Rn , any vector of the form
c1 x1 + c2 x2 + · · · + ck xk (c1 , . . . , ck ∈ R)
is called a linear combination of x1 , . . . , xk . The set of all linear combinations of
x1 , . . . , xk is called the linear span of x1 , . . . , xk .
Geometrically, the linear span of a single nonzero vector x (that is, the set of all
scalar multiples of x) is the straight line through x and the origin. The linear span
of a pair of nonzero vectors x and y is the plane containing x, y, and the origin
unless y is a scalar multiple of x, in which case it is just the line through x and the
origin.
(A.2) For 1 ≤ j ≤ n, we define ej to be the vector in Rn whose jth component
is 1 and whose other components are all 0:
e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, 0, . . . , 1).


We call $e_1, \dots, e_n$ the standard basis vectors for $\mathbb{R}^n$. (When n = 3, the common
notation is i, j, k rather than $e_1, e_2, e_3$.) Every vector $x \in \mathbb{R}^n$ can be written
uniquely as a linear combination of the standard basis vectors:

(x1 , x2 , . . . , xn ) = x1 e1 + x2 e2 + · · · + xn en .

A.2 Linear Maps and Matrices


(A.3) Let m and n be positive integers. A map A : Rn → Rm is called linear if
it preserves the vector operations of addition and scalar multiplication:

(A.4) A(x + y) = A(x) + A(y), A(cx) = cA(x) (x, y ∈ Rn , c ∈ R).

(A.5) In elementary mathematics, a “linear function” of the real variable x is
something of the form f(x) = ax + b. As a mapping from $\mathbb{R}^1$ to $\mathbb{R}^1$, such a
function is linear in the sense just defined only when b = 0. More generally,
mappings from Rn to Rm of the form f (x) = A(x) + b, where A satisfies (A.4),
are called “linear” in some contexts, as in Chapters 2 and 3 when we speak of
the “linear approximation” to a differentiable map. However, within the subject of
linear algebra, and in particular throughout this appendix, “linear” is always meant
in the strict sense (A.4), and the term affine is used for the more general notion.
The feature that immediately distinguishes linear maps in the strict sense among
the affine ones is that they satisfy A(0) = 0.
(A.6) If A is linear, we have
\[ A\Bigl( \sum_{1}^{n} x_j e_j \Bigr) = \sum_{1}^{n} x_j A(e_j), \]
so A is completely determined by its values on the standard basis vectors. Let us
denote the jth component of $A(e_k)$ by $A_{jk}$:
\[ A(e_1) = (A_{11}, A_{21}, \dots, A_{m1}), \quad \dots, \quad A(e_n) = (A_{1n}, A_{2n}, \dots, A_{mn}). \]
Then for any $x \in \mathbb{R}^n$ we have
\[ A(x) = y, \quad \text{where} \quad y_j = \sum_{k=1}^{n} A_{jk} x_k. \tag{A.7} \]
Thus A can be completely described by the m · n numbers $A_{jk}$.
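The passage from a linear map to its array of numbers can be carried out mechanically: apply the map to each standard basis vector and record the results as columns. The following sketch does this for a sample map from R³ to R²; the names `standard_basis`, `matrix_of`, `apply`, and `example_map` are hypothetical, and the map itself is chosen only for illustration.

```python
def standard_basis(n):
    """The vectors e_1, ..., e_n of R^n as lists."""
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]

def matrix_of(linear_map, n):
    """Entries A[j][k] = j-th component of A(e_k), stored as a list of rows."""
    columns = [linear_map(e) for e in standard_basis(n)]  # A(e_1), ..., A(e_n)
    m = len(columns[0])
    return [[columns[k][j] for k in range(n)] for j in range(m)]

def apply(A, x):
    """Formula (A.7): y_j = sum_k A[j][k] * x[k]."""
    return [sum(A[j][k] * x[k] for k in range(len(x))) for j in range(len(A))]

def example_map(x):
    """A sample linear map from R^3 to R^2 (illustrative only)."""
    return [x[0] + 2 * x[1], 3 * x[2] - x[1]]

A = matrix_of(example_map, 3)
print(A)                        # [[1, 2, 0], [0, -1, 3]]
print(apply(A, [4, 5, 6]))      # agrees with example_map([4, 5, 6])
```

Note that the code indexes rows and columns from 0, whereas the text indexes them from 1.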



(A.8) Such a collection $(A_{jk}) = \{A_{jk} : 1 \le j \le m,\ 1 \le k \le n\}$ of m · n
real numbers is called an m × n matrix. It is pictured as a rectangular array, with
the first index j labeling the rows of the array and the second index k labeling the
columns:
\[ \begin{pmatrix} A_{11} & A_{12} & \dots & A_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \dots & A_{mn} \end{pmatrix}. \]
(More precisely, such an array is a real m × n matrix. One can also consider ma-
trices whose entries are other sorts of algebraic objects, such as complex numbers
or polynomials.) The formula (A.7) defines a one-to-one correspondence between
linear maps from Rn to Rm and m × n matrices. Henceforth we shall use the same
letter A to denote either a linear map or its associated matrix; the meaning will be
clear from the context.
(A.9) Linear maps from Rn to Rm can be added to one another and multiplied by
scalars:
(A + B)(x) = A(x) + B(x), (cA)(x) = c(A(x)).
On the level of matrices, this is just addition and multiplication in each entry —
that is, vector addition and scalar multiplication, if we think of m × n matrices as
mn-dimensional vectors.
(A.10) Suppose that A : Rn → Rm and B : Rm → Rl are linear maps. We can
then consider their composition B ◦ A : Rn → Rl , and it is easy to check that B ◦ A
is again linear. It is customary in linear algebra to denote this composition simply
by BA, and we do so henceforth.
Given $x \in \mathbb{R}^n$, let y = A(x) and z = B(y). On the one hand, we have
\[ z_i = \sum_{j=1}^{m} B_{ij} y_j = \sum_{j=1}^{m} \sum_{k=1}^{n} B_{ij} A_{jk} x_k, \]
and on the other,
\[ z_i = \sum_{k=1}^{n} (BA)_{ik} x_k. \]
It follows that the matrix BA is obtained from the matrices B and A by the formula
\[ (BA)_{ik} = \sum_{j=1}^{m} B_{ij} A_{jk}. \]
In general, if B is an l × m matrix and A is an m × n matrix, the l × n matrix BA
defined by this formula is called the product of the matrices B and A.
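The product formula can be written out directly and checked against composition of maps. This is a minimal sketch with matrices stored as lists of rows; the names `apply` and `product` and the sample matrices are assumptions for illustration.

```python
def apply(A, x):
    """y_j = sum_k A[j][k] * x[k], the action of a matrix on a vector."""
    return [sum(A[j][k] * x[k] for k in range(len(x))) for j in range(len(A))]

def product(B, A):
    """(BA)[i][k] = sum_j B[i][j] * A[j][k] for an l x m matrix B and m x n matrix A."""
    m = len(A)
    if any(len(row) != m for row in B):
        raise ValueError("columns of B must match rows of A")
    n = len(A[0])
    return [[sum(B[i][j] * A[j][k] for j in range(m)) for k in range(n)]
            for i in range(len(B))]

# Illustrative data: A maps R^3 to R^2, B maps R^2 to R^3, so BA is 3 x 3.
A = [[1, 2, 0],
     [0, 1, 3]]
B = [[1, 0],
     [2, 1],
     [0, 4]]
x = [1, -1, 2]

BA = product(B, A)
print(BA)                         # [[1, 2, 0], [2, 5, 3], [0, 4, 12]]
print(apply(BA, x))               # [-1, 3, 20]
print(apply(B, apply(A, x)))      # the same vector, computed as B(A(x))
```

The size check in `product` enforces the requirement that the rows of B have the same length as the columns of A.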

(A.11) It is important to note that the product BA is defined only if the number
of columns in B is the same as the number of rows in A, that is, if the length of
a row in B is equal to the length of a column in A. It is also important to note
that matrix multiplication is not commutative: In general, BA ̸= AB, even when
both products are defined. However, matrix multiplication is associative; that is,
(CB)A = C(BA) for any A, B, C such that all products in question are defined.
It also distributes over addition in the obvious way: C(A + B) = CA + CB and
(A + B)D = AD + BD.
(A.12) Let I be the identity mapping on Rn , I(x) = x for all x ∈ Rn . The
corresponding matrix is called the n × n identity matrix and is denoted by I or by
In if the size needs to be specified. It is the matrix whose columns are the standard
basis vectors e1 , . . . , en , that is, the matrix whose entries Ijk are equal to 1 when
j = k and 0 when j ̸= k. If A is any m × n matrix, we have Im A = A and
AIn = A. This is obvious since the composition of any map A with the identity
map is just A; it is also easy to verify from the definition of matrix products in
(A.10).
(A.13) Let A : Rn → Rn be a linear map. If there is another linear map B :
Rn → Rn such that AB(x) = BA(x) = x for all x ∈ Rn (that is, in terms of
matrices, AB = BA = In ), then A (or its associated matrix) is called invertible
or nonsingular, and B is called the inverse of A and is denoted by $A^{-1}$. It is easy
to verify that if $A_1$ and $A_2$ are both invertible, then so is their product $A_1 A_2$, and
$(A_1 A_2)^{-1} = A_2^{-1} A_1^{-1}$. We shall say more about invertibility in (A.50)–(A.55).
(A.14) Vectors in Rn can be thought of as n × 1 matrices (called column vec-
tors) or as 1 × n matrices (called row vectors), and scalars can be thought of as
1 × 1 matrices. With these identifications, we can reinterpret some of the preceding
formulas:
• If A : Rn → Rm and x ∈ Rn , then by (A.7), A(x) is the matrix product
Ax, where x and A(x) are considered as column vectors. For this reason, we
(almost) always think of vectors as column vectors when we perform matrix
calculations with linear maps. Moreover, we shall henceforth write Ax in
preference to A(x).
• Let B be an l × m matrix and A an m × n matrix; then the rows of B and
the columns of A can both be considered as vectors in Rm . The (ik)th entry
of the product matrix BA is the dot product of the ith row of B with the kth
column of A.

(A.15) The transpose or adjoint of an m × n matrix A is the n × m matrix $A^*$
defined by $(A^*)_{jk} = A_{kj}$. (Many people denote $A^*$ by $A^t$ or $A^T$.) Thus, the rows
of $A^*$ are the columns of A and vice versa. As linear maps, A and $A^*$ are related
through the dot product:
\[ x \cdot Ay = A^* x \cdot y, \tag{A.16} \]
since both sides are equal to the double sum $\sum_{j=1}^m \sum_{k=1}^n x_j A_{jk} y_k$. It is easy to
check that $(AB)^* = B^* A^*$.
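As a quick numerical check of (A.16), the sketch below evaluates both sides for one 2 × 3 matrix. The helper names `transpose`, `dot`, and `apply`, and the sample data, are illustrative assumptions rather than notation from the text.

```python
def transpose(A):
    """The n x m matrix whose (j, k) entry is A[k][j]."""
    return [list(col) for col in zip(*A)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply(A, x):
    """The matrix-vector product Ax, with x regarded as a column vector."""
    return [dot(row, x) for row in A]

A = [[1, 2, 3], [4, 5, 6]]   # a 2 x 3 matrix, i.e. a map from R^3 to R^2
x = [1, -1]                  # a vector in R^2
y = [2, 0, 1]                # a vector in R^3

lhs = dot(x, apply(A, y))             # x . Ay
rhs = dot(apply(transpose(A), x), y)  # A*x . y
```

Both sides evaluate to the same number, the double sum of the terms $x_j A_{jk} y_k$.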

A.3 Row Operations and Echelon Forms


(A.17) In high school algebra one learns techniques for solving systems of linear
equations that involve multiplying equations by scalars, adding one equation to
another one, and so forth. When systematized and translated into matrix language,
these methods amount to performing “row operations” on matrices. The three types
of elementary row operations on a matrix are defined as follows. Let A be an
m × n matrix, and let $r_1, \dots, r_m$ be the rows of A (considered as vectors in $\mathbb{R}^n$).
i. Multiply one row by a nonzero scalar. (That is, for some j, replace $r_j$ by $c r_j$
with c ≠ 0, and leave all the other rows unchanged.)

ii. Add a scalar multiple of one row to another row. (That is, for some j ≠ k,
replace $r_j$ by $r_j + c r_k$, and leave all the other rows unchanged.)

iii. Interchange two rows. (That is, for some j ≠ k, replace $r_j$ by $r_k$ and $r_k$ by
$r_j$, and leave all other rows unchanged.)

(A.18) For each elementary row operation, the matrix obtained by performing
that operation on the identity matrix Im is called the corresponding elementary
matrix. For example, the entries of the elementary matrix corresponding to the
operation (ii) are 1 on the main diagonal, c in the (jk)th slot, and 0 elsewhere.
We leave it as an easy exercise for the reader to verify that performing an elemen-
tary row operation on a matrix A is the same as multiplying A on the left by the
corresponding elementary matrix.
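The exercise in (A.18) can be tried on a concrete case. The sketch below treats only operation (ii), uses 0-based row indices (so `j = 2, k = 0` means the third and first rows), and introduces the hypothetical helpers `identity`, `add_multiple`, and `matmul` purely for illustration.

```python
def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def add_multiple(M, j, k, c):
    """Return a copy of M in which row j is replaced by row j + c * row k."""
    out = [row[:] for row in M]
    out[j] = [a + c * b for a, b in zip(M[j], M[k])]
    return out

def matmul(B, A):
    return [[sum(B[i][j] * A[j][k] for j in range(len(A)))
             for k in range(len(A[0]))] for i in range(len(B))]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

E = add_multiple(identity(3), 2, 0, -7)  # the elementary matrix: same operation applied to I_3
direct = add_multiple(A, 2, 0, -7)       # the row operation applied to A itself
via_product = matmul(E, A)               # left multiplication by E
```

The matrix E has 1's on the diagonal and the scalar −7 in the slot for row 2, column 0, and the two results `direct` and `via_product` coincide.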
(A.19) It is important to note that the elementary row operations, and their as-
sociated matrices, are all invertible, and their inverses are operations of the same
types. Indeed, the inverses of the operations

\[ r_j \to c r_j, \qquad r_j \to r_j + c r_k, \qquad r_j \leftrightarrow r_k \]
are
\[ r_j \to c^{-1} r_j, \qquad r_j \to r_j - c r_k, \qquad r_j \leftrightarrow r_k. \]
(A.20) Row operations can be used to transform a matrix into certain standard
forms that are useful for many purposes. The definitions are as follows. A matrix
is said to be in echelon form if the following conditions are satisfied:

• In every nonzero row (that is, every row in which at least one entry is non-
zero), the first nonzero entry is equal to 1.

• If the jth and kth rows are nonzero, and j < k, the initial 1 in row j is to the
left of the initial 1 in row k.

• The zero rows (if any) are below all of the nonzero rows.

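The following matrices are in echelon form:
\[
\begin{pmatrix} 1 & 3 & 5 \\ 0 & 1 & 0 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 4 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}.
\]
A matrix is said to be in reduced echelon form if it is in echelon form, and in
addition,

• The entries above and below the initial 1’s in the nonzero rows are all 0.

The matrices displayed above are not in reduced echelon form, but the following
matrices are:
\[
\begin{pmatrix} 1 & 0 & -5 \\ 0 & 1 & -3 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 4 \\ 0 & 0 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 7 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
\]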

(A.21) Suppose A is a square matrix (say, n × n) in echelon form, and suppose
A has no zero rows. The first nonzero entry in each row is a 1, and these initial
1’s occur successively farther to the right. Since the n initial 1’s must occur in
n different columns, the only possibility is that the initial 1 in the jth row occurs
precisely in the jth column. In other words, the entries of A on the main diagonal
are all equal to 1, and below the main diagonal they are all equal to 0. If A is in
reduced echelon form, then all the entries above the main diagonal must also be 0.
In short, the only n × n matrix in reduced echelon form with no zero rows is the
identity matrix In .
(A.22) The simplest algorithm for turning a given m × n matrix A into one in
echelon form by elementary row operations, known as row reduction or Gaussian
elimination, can be described as follows:

1. If necessary, interchange the first row with another row so that the leftmost
nonzero column has a nonzero entry in the first row.

2. Multiply the first row by the reciprocal of its first nonzero entry (thus turning
the first nonzero entry into a 1).

3. Add multiples of the first row to the rows below so as to make the entries
below the initial 1 in the first row equal to 0.

4. Set the first row aside and apply steps 1–3 to the submatrix obtained by omit-
ting the first row. Repeat this process until no nonzero rows remain.

Once this is done, the matrix can be further transformed into one in reduced echelon
form as follows:

5. Add multiples of each nonzero row to the rows above so as to make the
entries above the initial 1’s equal to 0.
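The steps above can be sketched in code as follows. This is one possible arrangement, not the only one: it clears the entries above and below each initial 1 in the same pass (that is, it merges steps 3 and 5), and it uses exact rational arithmetic from Python's `fractions` module to avoid rounding. The function name `rref` is our own.

```python
from fractions import Fraction

def rref(rows):
    """Reduced echelon form of a matrix given as a list of rows."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    pivot_row = 0
    for col in range(n):
        # Step 1: find a row at or below pivot_row with a nonzero entry in this column.
        src = next((r for r in range(pivot_row, m) if M[r][col] != 0), None)
        if src is None:
            continue
        M[pivot_row], M[src] = M[src], M[pivot_row]
        # Step 2: scale the row so that its first nonzero entry becomes 1.
        lead = M[pivot_row][col]
        M[pivot_row] = [x / lead for x in M[pivot_row]]
        # Steps 3 and 5 together: clear the rest of this column, above and below.
        for r in range(m):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[pivot_row])]
        # Step 4: set this row aside and continue with the rows below it.
        pivot_row += 1
        if pivot_row == m:
            break
    return M

R = rref([[0, 2, 4], [1, 1, 1], [2, 4, 6]])
```

For this input the first two rows become (1, 0, −1) and (0, 1, 2), and the third row reduces to zero.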

(A.23) All of the ideas in this section have analogues for columns in place of
rows. That is, we have the elementary column operations (multiply a column by
a nonzero scalar, add a multiple of one column to another one, interchange two
columns), which are implemented by multiplying a matrix on the right by the cor-
responding elementary matrix. They can be used to transform a matrix into one in
column-echelon form or reduced column-echelon form, whose definitions are the
obvious modifications of the ones given above for (row-)echelon forms.

A.4 Determinants
(A.24) The determinant is a function that assigns to each square matrix A a
certain number det A. For 2 × 2 and 3 × 3 matrices, the determinant is given by
\[ \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc, \tag{A.25} \]
\[ \det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}
   = a(ei - fh) - b(di - fg) + c(dh - eg). \tag{A.26} \]

For larger matrices, the explicit formula for the determinant is quite a mess. How-
ever, this formula is of little use; the important things about determinants are the
properties they possess, which lead to more efficient ways of computing them. The
following seven items constitute a list of the most fundamental properties of deter-
minants. In them, A and B denote n × n matrices.
(A.27) det In = 1.
(A.28) det(AB) = (det A)(det B).
(A.29) For each j, det A is a linear function of the jth row of A when the other
rows are kept fixed. (Thus, for example, when j = 1,
\[
\det \begin{pmatrix} a r_1' + b r_1'' \\ r_2 \\ \vdots \end{pmatrix}
= a \det \begin{pmatrix} r_1' \\ r_2 \\ \vdots \end{pmatrix}
+ b \det \begin{pmatrix} r_1'' \\ r_2 \\ \vdots \end{pmatrix},
\]
where the $r_j$’s denote row vectors.) In particular, if A has a zero row, det A = 0.
(A.30) (Behavior under elementary row operations)

• If one row of A is multiplied by c and the other rows are left unchanged,
det A is multiplied by c.

• If a multiple of the kth row of A is added to the jth row and the other rows
are left unchanged, det A is unchanged.

• If two rows of A are interchanged, det A is multiplied by −1.

(A.31) Let $M^{jk}$ denote the (n − 1) × (n − 1) matrix obtained by deleting the jth
row and kth column of A. Then, for each j,
\[ \det A = \sum_{k=1}^n (-1)^{j+k} A_{jk} \det M^{jk}. \]

This formula is called the cofactor expansion of det A along the jth row. (For
example, in view of equation (A.25), equation (A.26) gives the cofactor expansion
of the determinant of a 3 × 3 matrix along its first row.)
(A.32) det(A∗ ) = det A. Consequently, properties (A.29) and (A.30) remain
valid if “row” is replaced by “column,” and we can sum over j instead of k in the
cofactor expansion.
(A.33) (How to compute determinants) The cofactor expansion reduces n × n
determinants to determinants of smaller size and so can be used recursively to
compute a determinant. However, for large matrices it is much more efficient to use
row operations. That is, to compute det A, row-reduce A and keep track of what
happens to the determinant as each row operation is performed, according to the
rules in (A.30). At the end, we have a matrix in reduced echelon form, which (by
(A.21)) is either the identity matrix (whose determinant is 1, by (A.27)) or a matrix
with a zero row (whose determinant is 0, by (A.29)).
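Both ways of computing a determinant can be compared on a small example. The sketch below is illustrative only; the function names are ours, and the row-reduction version stops once the matrix is upper triangular with 1's on the diagonal, since the remaining operations of type (ii) needed to reach the identity do not change the determinant.

```python
from fractions import Fraction

def det_cofactor(A):
    """Cofactor expansion along the first row, as in (A.31) with j = 1."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for k in range(n):
        minor = [row[:k] + row[k + 1:] for row in A[1:]]
        total += (-1) ** k * A[0][k] * det_cofactor(minor)
    return total

def det_row_reduce(A):
    """Track the determinant through row operations, following (A.30)."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    det = Fraction(1)
    for col in range(n):
        src = next((r for r in range(col, n) if M[r][col] != 0), None)
        if src is None:
            return Fraction(0)      # the reduction will end with a zero row
        if src != col:
            M[col], M[src] = M[src], M[col]
            det = -det              # an interchange multiplies det by -1
        lead = M[col][col]
        det *= lead                 # scaling the row by 1/lead divides det by lead
        M[col] = [x / lead for x in M[col]]
        for r in range(col + 1, n):
            factor = M[r][col]      # adding a multiple of a row leaves det unchanged
            M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return det

A = [[2, 1, 3], [0, -1, 4], [1, 2, 5]]
```

The recursive expansion takes on the order of n! multiplications, while the row-reduction version takes on the order of n³ operations, which is the efficiency gain referred to in (A.33).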

A.5 Linear Independence


(A.34) The vectors $x_1, \dots, x_k \in \mathbb{R}^n$ are said to be linearly dependent if they
satisfy a nontrivial linear equation, that is, if there are scalars $c_1, \dots, c_k$, not all
zero, such that
\[ c_1 x_1 + c_2 x_2 + \cdots + c_k x_k = 0. \]
If $c_j \ne 0$, this equation can be solved for $x_j$:
\[ x_j = -c_j^{-1} \bigl( c_1 x_1 + \cdots + c_{j-1} x_{j-1} + c_{j+1} x_{j+1} + \cdots + c_k x_k \bigr). \]
Hence, $x_1, \dots, x_k$ are linearly dependent if and only if one of them is a linear
combination of the others. If $x_1, \dots, x_k$ are not linearly dependent, they are said to
be linearly independent. That is, linear independence of $x_1, \dots, x_k$ means that
\[ c_1 x_1 + \cdots + c_k x_k = 0 \quad\text{only when}\quad c_1 = \cdots = c_k = 0. \]
In the case k = 2, linear independence of $x_1$ and $x_2$ means simply that $x_1$ and
$x_2$ are not scalar multiples of one another.
(A.35) If A is a matrix in echelon form, then the nonzero rows $r_1, \dots, r_k$ of A are
linearly independent. Indeed, suppose that $\sum_1^k c_j r_j = 0$. In the column in which
the initial 1 in the first row appears, the entries in all the other rows are 0; hence,
the entry of $\sum_1^k c_j r_j$ in this column is $c_1$, and so $c_1 = 0$. That being the case, we
have $\sum_2^k c_j r_j = 0$; the same argument now shows that $c_2 = 0$, and so forth.
(A.36) A set of vectors $x_1, \dots, x_k$ is called orthonormal if they are mutually
orthogonal and have unit norm:
\[ x_i \cdot x_j = 0 \text{ for } i \ne j, \quad\text{and}\quad |x_j| = 1 \text{ for all } j. \]
(For example, the standard basis vectors $e_j$ for $\mathbb{R}^n$ are orthonormal.) If $x_1, \dots, x_k$
are orthonormal, then they are linearly independent. Indeed, suppose $\sum_1^k c_j x_j = 0$;
we wish to show that $c_j = 0$ for all j. To see that a particular coefficient $c_i$ is
zero, take the dot product of both sides of the equation $0 = \sum_{j=1}^k c_j x_j$ with $x_i$. All
of the dot products $x_j \cdot x_i$ vanish except for j = i, so we obtain $0 = c_i |x_i|^2 = c_i$.
(A.37) In general, to determine whether $x_1, \dots, x_k \in \mathbb{R}^n$ are linearly independent,
we can regard them as the rows of a k × n matrix and perform a row reduction.
The rows of the resulting echelon matrix are linear combinations of the original
rows $x_j$. If they are all nonzero, then they are linearly independent by (A.35), and
so are $x_1, \dots, x_k$. But if there is a zero row, then $x_1, \dots, x_k$ are linearly dependent,
because that row is a nontrivial linear combination of the $x_j$’s.
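The test in (A.37) amounts to counting the nonzero rows that survive a row reduction. The following is a minimal sketch under that reading; the names `rank` and `linearly_independent` are ours, and exact rational arithmetic is used so that a zero row is recognized reliably.

```python
from fractions import Fraction

def rank(vectors):
    """Number of nonzero rows after reducing the matrix whose rows are the vectors."""
    M = [[Fraction(x) for x in v] for v in vectors]
    m, n = len(M), len(M[0])
    pivot_row = 0
    for col in range(n):
        src = next((r for r in range(pivot_row, m) if M[r][col] != 0), None)
        if src is None:
            continue
        M[pivot_row], M[src] = M[src], M[pivot_row]
        lead = M[pivot_row][col]
        M[pivot_row] = [x / lead for x in M[pivot_row]]
        for r in range(pivot_row + 1, m):
            factor = M[r][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return pivot_row

def linearly_independent(vectors):
    # Independent exactly when no zero row appears in the echelon form.
    return rank(vectors) == len(vectors)

dependent = [(1, 0, 1), (0, 1, 1), (1, 1, 2)]    # third vector = first + second
independent = [(1, 0, 1), (0, 1, 1), (0, 0, 1)]
```

In the first list the reduction produces a zero row, so only two nonzero rows remain; in the second all three rows survive.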

A.6 Subspaces; Dimension; Rank


(A.38) A vector subspace of Rn , or just a subspace for short, is a subset X of
Rn such that
i. if x, y ∈ X then x + y ∈ X, and
ii. if x ∈ X and c ∈ R then cx ∈ X.
Subspaces are closed under taking linear combinations; that is, if x1 , . . . , xk ∈ X
and c1 , . . . , ck ∈ R, then c1 x1 + · · · + ck xk ∈ X. The largest subspace of Rn is Rn
itself, and the smallest one is the trivial subspace consisting of the single element
0. When n > 1, there are also subspaces of intermediate size. For example, when
n = 2, the intermediate subspaces are the lines through the origin; when n = 3,
they are the lines and planes through the origin.
(A.39) The linear span of any set of vectors in Rn is easily seen to be a subspace
of Rn .
(A.40) Let X be a subspace of Rn . A set of vectors in X is called a basis for
X if it is linearly independent and its linear span is X. For example, the standard
basis vectors e1 , . . . , en for Rn are a basis for Rn in this sense. One can show that
any two bases for X have the same number of elements; that number is called the
dimension of X and is denoted by dim X. The dimension of Rn itself is n, and we
define the dimension of the trivial subspace {0} to be 0; the dimension of any other
subspace is an integer strictly between 0 and n.
If $x_1, \dots, x_k$ is a basis for X, then any element of X can be written in one
and only one way as $\sum_{j=1}^k c_j x_j$. Thus, the dimension of X is the number of real
parameters (namely, the coefficients $c_j$) that are needed to specify an element of X.
(A.41) Let X be a subspace of $\mathbb{R}^n$. Its orthogonal complement $X^\perp$ is the set of
all vectors that are orthogonal to every vector in X:
\[ X^\perp = \{ x \in \mathbb{R}^n : x \cdot y = 0 \text{ for all } y \in X \}. \]
It is easy to verify that X⊥ is also a subspace. For example, in R3 , the orthogonal
complement of a plane through the origin is the line through the origin perpendic-
ular to it, and vice versa; the orthogonal complement of R3 is {0}, and vice versa.
The complementary relation between the dimensions of X and X⊥ in this example
persists in higher dimensions:
(A.42) For any subspace X ⊂ Rn , dim X + dim X⊥ = n.

(A.43) Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map. There are two subspaces (one of $\mathbb{R}^n$
and one of $\mathbb{R}^m$) naturally associated to A: its nullspace
\[ N(A) = \{ x \in \mathbb{R}^n : Ax = 0 \}, \]
and its range
\[ R(A) = \{ y \in \mathbb{R}^m : y = Ax \text{ for some } x \in \mathbb{R}^n \}. \]
It is an easy exercise to check that N(A) and R(A) are indeed subspaces. If we
think of A as an m × n matrix, R(A) is the linear span of the columns of A,
because these columns are the vectors obtained by applying A to the standard basis
vectors for Rn . Hence, R(A) is sometimes called the column space of A.
(A.44) We can also consider the nullspace and range of the transpose A∗ : Rm →
Rn . The range R(A∗ ) is the linear span of the columns of A∗ , which are the rows
of A; hence R(A∗ ) is sometimes called the row space of A. The spaces N(A),
R(A), N(A∗ ), and R(A∗ ) are related as follows:
(A.45) N(A∗ ) = R(A)⊥ ; N(A) = R(A∗ )⊥ .
This follows easily from the relation (A.16). Indeed, x ∈ R(A)⊥ ⇐⇒ x · Ay = 0
for all y ⇐⇒ A∗ x · y = 0 for all y ⇐⇒ A∗ x = 0 ⇐⇒ x ∈ N(A∗ ), and
likewise with A and A∗ switched.
(A.46) The fundamental identity concerning dimensions is the following:
(A.47) For any linear map A : Rn → Rm , dim N(A) + dim R(A) = n.
The intuitive reason behind this identity is simple. An element of Rn has n degrees
of freedom (one can vary any of its n components). The elements of the nullspace
N(A) are all mapped by A to the single vector 0, resulting in a loss of dim N(A)
degrees of freedom and leaving n − dim N(A) degrees of freedom for the range
R(A).
(A.48) From the preceding results, we obtain one more important relation:
(A.49) For any linear map A : Rn → Rm , dim R(A) = dim R(A∗ ).
Indeed, by (A.42), (A.45), and (A.47),
dim R(A) = n − dim N(A) = n − dim R(A∗ )⊥ = dim R(A∗ ).
The common dimension of R(A) and R(A∗ ) is called the rank of A.
A.7 Invertibility
(A.50) We recall from the introduction to Chapter 1 that a mapping f : X →
Y from a set X to another set Y is invertible if there is another mapping g : Y → X
such that g(f (x)) = x for all x ∈ X and f (g(y)) = y for all y ∈ Y , and that f is
invertible if and only if f maps X onto Y and f is one-to-one.
(A.51) Now let A : Rn → Rm be a linear map. We first observe that A is one-
to-one if and only if N(A) = {0}, for Ax = Ay if and only if x − y ∈ N(A).
In particular, if m < n, then by (A.47) we have dim N(A) = n − dim R(A) ≥
n − m > 0, so A cannot be one-to-one. On the other hand, if m > n, then by
(A.47) again, dim R(A) ≤ n < m, so R(A) cannot be all of Rm . Hence, A can
be invertible in the sense of (A.50) only when n = m; in this case, it is not hard to
check that the inverse of A (if it exists) is again a linear map. Thus, for linear maps
the definition of invertibility in (A.50) agrees with the one in (A.13).
(A.52) For a linear map A : Rn → Rn , the following conditions are all equiva-
lent:
a. A is invertible.
b. R(A) = Rn .
c. N(A) = {0}.
d. R(A∗ ) = Rn .
e. N(A∗ ) = {0}.
f. The columns of the matrix A are linearly independent.
g. The rows of the matrix A are linearly independent.
h. det A ≠ 0.
i. The matrix A is a product of elementary matrices.
(A.53) Let us prove (A.52). First, (a) is equivalent to the conjunction of (b) and
(c) by the discussion in (A.50)–(A.51). (b) and (c) are equivalent to each other by
(A.47), as are (d) and (e), and (b) and (d) are equivalent by (A.49). (f) is equivalent
to (c), for if $c_j = Ae_j$ is the jth column of A, we have $\sum_{j=1}^n a_j c_j = 0$ if and only
if $\sum a_j e_j \in N(A)$; similarly, (g) is equivalent to (e).
Next, we can perform elementary row operations on A to turn A into a matrix
B in reduced echelon form; since performing row operations does not change the
row space of a matrix, we have $R(A^*) = R(B^*)$. But by (A.21) and (A.33), either
B = I, in which case det A ≠ 0 and $R(A^*) = R(I) = \mathbb{R}^n$; or B contains at least
one zero row, in which case det A = 0 and $\dim R(A^*) = \dim R(B^*) < n$; thus
(h) is equivalent to (d).
We have shown that (a)–(h) are all equivalent. Finally, we observed in (A.19)
that every elementary matrix is invertible, and hence so is every product of
elementary matrices. Conversely, if A is invertible, let $B = A^{-1}$. Then B is invertible
also, so B can be row-reduced to the identity matrix; that is, there is a product E
of elementary matrices such that EB = I. But E = E(BA) = (EB)A = A, so
A is a product of elementary matrices. Thus (a) is equivalent to (i).

(A.54) (Cramer’s Rule) If A is invertible and $b \in \mathbb{R}^n$, the vector $x = A^{-1} b$
is given by $x_j = (\det B^j)/(\det A)$, where $B^j$ is the matrix obtained from A by
replacing its jth column with the column vector b. This is not a computationally
efficient way of solving Ax = b when n is large, but the fact that the solution can
be expressed as a quotient of determinants is theoretically important.

(A.55) In particular, computing $A^{-1}$ amounts to solving $Ax_j = e_j$ for j =
1, . . . , n, where the $e_j$’s are the standard basis vectors: The solutions $x_j$ are the
columns of $A^{-1}$. It follows that the entries of $A^{-1}$ are rational functions of the
entries of A whose common denominator is det A.
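For small systems Cramer's rule is easy to carry out directly. The sketch below assumes integer entries (so that the quotients can be formed exactly with `Fraction`) and computes determinants by cofactor expansion; the names `det` and `cramer` and the sample system are ours.

```python
from fractions import Fraction

def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k] * det([row[:k] + row[k + 1:] for row in A[1:]])
               for k in range(n))

def cramer(A, b):
    """Solve Ax = b for an invertible integer matrix A by Cramer's rule."""
    d = det(A)
    if d == 0:
        raise ValueError("A is not invertible")
    x = []
    for j in range(len(A)):
        # B^j: the matrix A with its jth column replaced by b.
        Bj = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(A)]
        x.append(Fraction(det(Bj), d))
    return x

A = [[2, 1], [1, 3]]
b = [3, 5]
x = cramer(A, b)   # solves 2*x1 + x2 = 3, x1 + 3*x2 = 5
```

Each component is visibly a quotient with denominator det A = 5, in line with the remark that the solution is a quotient of determinants.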

A.8 Eigenvectors and Eigenvalues


(A.56) Let A be an n × n matrix. A nonzero vector $x \in \mathbb{R}^n$ is called an
eigenvector for A if there is a scalar λ ∈ R such that Ax = λx; in this case, λ is called
the eigenvalue of A for the vector x. The equation Ax = λx can be rewritten as
(A − λI)x = 0; hence, λ is an eigenvalue of A (that is, there is a nonzero x such
that Ax = λx) if and only if N(A − λI) ≠ {0}. By (A.52), this condition is
equivalent to det(A − λI) = 0. It is easy to see that det(A − λI) is a polynomial
of degree n in λ, called the characteristic polynomial of A, and the eigenvalues
of A are precisely the roots of this polynomial.
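For a 2 × 2 matrix with rows (a, b) and (c, d), formula (A.25) gives det(A − λI) = λ² − (a + d)λ + (ad − bc), so the real eigenvalues can be read off from the quadratic formula. The sketch below does exactly this; the function name is ours, and the results are floating-point numbers.

```python
import math

def real_eigenvalues_2x2(A):
    """Real roots of the characteristic polynomial of a 2 x 2 matrix."""
    (a, b), (c, d) = A
    # det(A - tI) = t^2 - (a + d) t + (ad - bc)
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        return []            # no real roots, hence no real eigenvalues
    root = math.sqrt(disc)
    return sorted({(trace - root) / 2, (trace + root) / 2})

symmetric = [[2, 1], [1, 2]]   # eigenvalues 1 and 3
rotation = [[0, -1], [1, 0]]   # rotation by 90 degrees
```

The rotation matrix has characteristic polynomial λ² + 1, which has no real roots: no nonzero vector in the plane is mapped to a multiple of itself by a quarter turn.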

(A.57) The analysis of a matrix A is greatly facilitated if there is an eigenbasis
for A, that is, a basis $b_1, \dots, b_n$ of $\mathbb{R}^n$ consisting of eigenvectors for A. Indeed,
suppose $Ab_j = \lambda_j b_j$. Any $x \in \mathbb{R}^n$ can be written as a linear combination of the
$b_j$’s, say $x = \sum_{j=1}^n c_j b_j$, and then $Ax = \sum_{j=1}^n \lambda_j c_j b_j$. In other words, once
the basis $b_1, \dots, b_n$ is known, the action of A is completely determined by the n
numbers $\lambda_j$ rather than the $n^2$ numbers $A_{jk}$.

(A.58) Not all matrices have eigenbases. (In fact, some matrices have no eigen-
values at all, as long as we allow only real numbers. The situation changes dramat-
ically if we consider complex matrices and complex eigenvalues, but even then A
may not have an eigenbasis when the characteristic polynomial has multiple roots.)
However, there is an important class of matrices that do have eigenbases.
The n×n matrix A is called symmetric if A = A∗ , that is, if Ajk = Akj for all
j and k. One can show that every symmetric matrix has an orthonormal eigenbasis.
This is one of the major results of linear algebra, known as the spectral theorem
or principal axis theorem.
Appendix B

SOME TECHNICAL PROOFS

B.1 The Heine-Borel Theorem


B.1 Theorem. If S is a subset of $\mathbb{R}^n$, the following are equivalent:
a. S is compact.
b. If $\mathcal{U}$ is any covering of S by open sets, there is a finite subcollection of $\mathcal{U}$ that
still forms a covering of S.

Proof. If S is not compact, by the Bolzano-Weierstrass theorem there is a sequence
$\{x_k\}$ in S, no subsequence of which converges to any point of S. This means that
for each x ∈ S there is an open ball $D_x$ centered at x that contains $x_k$ for at most
finitely many values of k (Exercise 7, §1.5). The collection $\mathcal{U} = \{D_x : x \in S\}$ is
then an open cover of S. Any finite subcollection can contain at most finitely many
of the $x_k$’s and hence cannot cover all of S.
Conversely, suppose S is compact. Since S is bounded, it is contained in some
closed rectangular box
\[
B_0 = [a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_n, b_n]
    = \{ x : a_1 \le x_1 \le b_1,\ a_2 \le x_2 \le b_2,\ \dots,\ a_n \le x_n \le b_n \}.
\]
By bisecting the intervals $[a_j, b_j]$, we can write $B_0$ as the union of $2^n$ boxes whose
side lengths are half as big as those of $B_0$; we denote this collection of boxes by
$\mathcal{B}_1$. By bisecting the sides of each box in $\mathcal{B}_1$, we can write $B_0$ as the union of $2^{2n}$
boxes whose side lengths are $\frac14$ as big as those of $B_0$; we denote this collection of
boxes by $\mathcal{B}_2$. Continuing inductively, for each positive integer k we can write $B_0$
as the union of $2^{kn}$ boxes whose side lengths are $2^{-k}$ times as big as those of $B_0$,
and we denote this collection of boxes by $\mathcal{B}_k$.

Now suppose $\mathcal{U}$ is a covering of S by open sets. We claim that there is an
integer k such that each box in $\mathcal{B}_k$ that intersects S is included in one of the open
sets in $\mathcal{U}$. Once we know this, we are done. There are finitely many (in fact, $2^{kn}$)
boxes in $\mathcal{B}_k$; let $B_1, \dots, B_m$ be the ones that intersect S. Each $B_j$ is included in
some $U_j \in \mathcal{U}$; the sets $B_1, \dots, B_m$ cover S, and hence so do $U_1, \dots, U_m$.
It remains to prove the claim. Suppose, to the contrary, that for each k there is
a box $B_k \in \mathcal{B}_k$ containing a point $x_k \in S$ but not included in any set in $\mathcal{U}$. By
the Bolzano-Weierstrass theorem, by passing to a subsequence we may assume that
$\{x_k\}$ converges to some point x ∈ S. This x is contained in some open set U in the
collection $\mathcal{U}$. Since U is open, there is a positive number ϵ such that every point y
with |y − x| < ϵ is contained in U. Now pick k large enough so that $|x_k - x| < \frac12\epsilon$
and also $2^{-k} \bigl[ \sum_1^n (b_j - a_j)^2 \bigr]^{1/2} < \frac12\epsilon$. The latter condition implies that the distance
between any two points of the box $B_k$ is less than $\frac12\epsilon$. Thus, if $y \in B_k$, then
\[ |y - x| \le |y - x_k| + |x_k - x| < \tfrac12\epsilon + \tfrac12\epsilon = \epsilon. \]
But this means that $B_k \subset U$, contrary to assumption. This contradiction completes
the proof.

B.2 The Implicit Function Theorem


B.2 Theorem. Let F(x, y) be an $\mathbb{R}^k$-valued function of class $C^1$ on some
neighborhood of a point $(a, b) \in \mathbb{R}^{n+k}$, and let $B_{ij} = (\partial F_i/\partial y_j)(a, b)$. Suppose that
F(a, b) = 0 and det B ≠ 0. Then for some positive numbers $r_0$, $r_1$, the following
conclusions are valid.
a. For each x in the ball $|x - a| < r_0$ there is a unique y such that $|y - b| < r_1$
and F(x, y) = 0. We denote this y by f(x); in particular, f(a) = b.
b. The function f thus defined for $|x - a| < r_0$ is of class $C^1$, and its partial
derivatives $\partial_{x_j} f$ can be computed by differentiating the equations F(x, f(x)) = 0
with respect to $x_j$ and solving the resulting linear system of equations for
$\partial_{x_j} f_1, \dots, \partial_{x_j} f_k$.
Proof. The proof proceeds by induction on k. The case k = 1 is the implicit
function theorem for a single equation, proved in §3.1. We assume that the result
is valid when the number of equations is 1, 2, . . . , k − 1 and deduce it when the
number of equations is k.
Let $M^{ij}$ denote the (k − 1) × (k − 1) matrix obtained by deleting the ith row
and the jth column from the matrix B. By the cofactor expansion along the last
row (see (A.31) in Appendix A),
\[
\det B = (-1)^{k+1} B_{k1} \det M^{k1} + (-1)^{k+2} B_{k2} \det M^{k2} + \cdots + B_{kk} \det M^{kk}. \tag{B.3}
\]
Since det B ≠ 0 by assumption, at least one term in this sum must be nonzero. By
reordering the variables if necessary, we can assume that the last term is nonzero,
so $\det M^{kk} \ne 0$.
Now, $M^{kk}$ is the matrix of partial derivatives of $F_1, \dots, F_{k-1}$ with respect to
the variables $y_1, \dots, y_{k-1}$, evaluated at (a, b), so by inductive hypothesis, the k − 1
equations
\[ F_1(x, y) = F_2(x, y) = \cdots = F_{k-1}(x, y) = 0 \]
determine $y_1, \dots, y_{k-1}$ as $C^1$ functions of $x_1, \dots, x_n$ and $y_k$ in some
neighborhood of (a, b):
\[ y_j = g_j(x, y_k) \qquad (j \le k - 1). \]
Let G be the function of $x_1, \dots, x_n, y_k$ obtained by substituting the $g_j$’s for the $y_j$’s
in the last function $F_k$:
\[ G(x, y_k) = F_k(x, g(x, y_k), y_k). \]

We wish to use the implicit function theorem for a single equation to solve the
equation $G(x, y_k) = 0$ for $y_k$ as a $C^1$ function of x, say $y_k = f_k(x)$. Then for
j < k we will have $y_j = f_j(x)$ where $f_j(x) = g_j(x, f_k(x))$, and the proof will
be complete. (The method for computing the partial derivatives of f stated in (b) is
just implicit differentiation, as discussed in §2.5.)
Our task is to verify that the hypothesis of the implicit function theorem, namely
$\partial_{y_k} G(a, b_k) \ne 0$, is satisfied. To do this we need the chain rule, some facts about
determinants, and perseverance. To begin with,
\[
\frac{\partial G}{\partial y_k} = \sum_{j=1}^{k-1} \frac{\partial F_k}{\partial y_j} \frac{\partial g_j}{\partial y_k} + \frac{\partial F_k}{\partial y_k},
\]
so setting (x, y) = (a, b) gives
\[
\frac{\partial G}{\partial y_k}(a, b_k) = \sum_{j=1}^{k-1} B_{kj} \frac{\partial g_j}{\partial y_k}(a, b_k) + B_{kk}. \tag{B.4}
\]
To evaluate $\partial g_j/\partial y_k$, we differentiate the equations $F_i(x, g(x, y_k), y_k) = 0$ for
i < k, obtaining
\[
\sum_{j=1}^{k-1} \frac{\partial F_i}{\partial y_j} \frac{\partial g_j}{\partial y_k} + \frac{\partial F_i}{\partial y_k} = 0 \qquad (i < k),
\]
which at (x, y) = (a, b) becomes
\[
\sum_{j=1}^{k-1} B_{ij} \frac{\partial g_j}{\partial y_k}(a, b_k) = -B_{ik} \qquad (i < k). \tag{B.5}
\]
These k − 1 equations can be solved for the desired quantities $(\partial g_j/\partial y_k)(a, b_k)$
by Cramer’s rule (see (A.54) in Appendix A). The coefficient matrix in (B.5),
$(B_{ij})_{i,j=1}^{k-1}$, is what we called $M^{kk}$ above, and the matrix obtained by replacing
its jth column by the numbers $-B_{ik}$ on the right of (B.5) is
\[
\begin{pmatrix}
B_{11} & \cdots & -B_{1k} & \cdots & B_{1(k-1)} \\
\vdots & & \vdots & & \vdots \\
B_{(k-1)1} & \cdots & -B_{(k-1)k} & \cdots & B_{(k-1)(k-1)}
\end{pmatrix}.
\]
But this is just the matrix $M^{kj}$ obtained by deleting the kth row and the jth column
from B except that the column involving the $B_{ik}$’s has been multiplied by −1 and
moved from the last slot to the jth slot. The determinant of this matrix is therefore
$(-1)^{k-j} \det M^{kj}$: one factor of −1 because of the minus signs on the column of
$B_{ik}$’s, and k − j − 1 more factors of −1 from interchanging that column with the
succeeding k − j − 1 columns to move it back to its rightful place on the right end.
In short, the application of Cramer’s rule to the system (B.5) yields
\[
\frac{\partial g_j}{\partial y_k}(a, b_k) = (-1)^{k-j} \frac{\det M^{kj}}{\det M^{kk}}.
\]
Now we are done. Substitute this result back into (B.4), noting that $(-1)^{-j} =
(-1)^j$, and recall (B.3):
\[
\frac{\partial G}{\partial y_k}(a, b_k)
= \sum_{j=1}^{k-1} (-1)^{j+k} B_{kj} \frac{\det M^{kj}}{\det M^{kk}} + B_{kk}
= \frac{\sum_{j=1}^{k} (-1)^{j+k} B_{kj} \det M^{kj}}{\det M^{kk}}
= \frac{\det B}{\det M^{kk}}.
\]
Since det B ≠ 0 by assumption, this completes the verification that $\partial_{y_k} G(a, b_k) \ne 0$
and hence the proof of the theorem.

B.3 Approximation by Riemann Sums


The subject of this section is Proposition 4.16 and its generalization to multiple
integrals.

B.6 Lemma. Suppose f is an integrable function on [a, b] and |f(x)| ≤ C for
x ∈ [a, b]. Let $P = \{x_0, \dots, x_J\}$ be a partition of [a, b] such that $\max_j (x_j - x_{j-1}) < \delta$,
and let $P'$ be another partition obtained by adding N extra points to P. Then
$S_P f < S_{P'} f + 2CN\delta$ and $s_P f > s_{P'} f - 2CN\delta$.

Proof. We consider the upper sums $S_P f$ and $S_{P'} f$; the argument for the lower
sums is similar. If no extra point is added in the interval $(x_{j-1}, x_j)$ in passing from
P to $P'$, both sums contain the term $M_j (x_j - x_{j-1})$, where $M_j$ is the supremum
of f on $[x_{j-1}, x_j]$. If extra points are added, the term $M_j (x_j - x_{j-1})$ in $S_P f$ is
replaced by a sum of similar terms corresponding to subintervals of $[x_{j-1}, x_j]$. Both
$M_j (x_j - x_{j-1})$ and the latter sum are bounded in absolute value by $C(x_j - x_{j-1}) <
C\delta$, so their difference is bounded by 2Cδ. The total change from $S_P f$ to $S_{P'} f$
is the sum of these differences, of which there are at most N, so it is less than
2CNδ.

Remark. The conclusion of this lemma is significant only when Nδ ≪ 1, and
hence when N is much less than the number J of subdivision points of P (since
Jδ > b − a).

B.7 Theorem. Suppose f is integrable on [a, b]. Given ϵ > 0, there exists δ > 0
such that if $P = \{x_0, \dots, x_J\}$ is any partition of [a, b] satisfying
\[ \max_{1 \le j \le J} (x_j - x_{j-1}) < \delta, \]
any Riemann sum $\sum_1^J f(t_j)(x_j - x_{j-1})$ associated to P differs from $\int_a^b f(x)\,dx$ by
at most ϵ.

Proof. It is enough to prove the result for the lower and upper sums $s_P f$ and $S_P f$,
as all other Riemann sums lie in between these two. Pick a partition Q of [a, b] such
that $S_Q f < \int_a^b f(x)\,dx + \frac12\epsilon$ and $s_Q f > \int_a^b f(x)\,dx - \frac12\epsilon$. Let N be the number
of subdivision points in Q, and let C be an upper bound for |f| on [a, b]; we claim
that any $\delta < \epsilon/(4NC)$ will do the job. Indeed, suppose $P = \{x_0, \dots, x_J\}$ satisfies
$\max_j (x_j - x_{j-1}) < \delta$. Then the partition P ∪ Q is obtained by adding at most N
points to P (namely, the points of Q that are not already in P). By Lemma B.6 and
Lemma 4.3,
\[
S_P f < S_{P \cup Q} f + 2NC\delta < S_{P \cup Q} f + \tfrac12\epsilon \le S_Q f + \tfrac12\epsilon \le \int_a^b f(x)\,dx + \epsilon,
\]
and likewise $s_P f > \int_a^b f(x)\,dx - \epsilon$. Since $s_P f \le \int_a^b f(x)\,dx \le S_P f$, the proof is
complete.
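The content of Theorem B.7 can be observed numerically. The sketch below is only an illustration, not part of the proof: it uses equal subintervals, the function f(x) = x² on [0, 1] (whose integral is 1/3), and right endpoints as the points $t_j$, so that the Riemann sum is the upper sum because f is increasing. The function name `riemann_sum` and the parameter `tag` are ours.

```python
def riemann_sum(f, a, b, J, tag=0.5):
    """Riemann sum over J equal subintervals; tag in [0, 1] places t_j in each one."""
    dx = (b - a) / J
    return sum(f(a + (j + tag) * dx) * dx for j in range(J))

def f(x):
    return x * x

exact = 1 / 3
# tag=1.0 selects right endpoints; the mesh size is delta = 1/J.
errors = [abs(riemann_sum(f, 0.0, 1.0, J, tag=1.0) - exact) for J in (10, 100, 1000)]
```

For this f the error of the right-endpoint sum is (3J + 1)/(6J²), roughly 1/(2J), so it shrinks in proportion to the mesh size δ = 1/J.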

In the next two sections we shall need the generalization of Theorem B.7 to
multiple integrals. The idea is exactly the same, but the notation is more compli-
cated. We give the precise statement of the result but leave the adaptation of the
one-dimensional proof to the reader.

B.8 Theorem. Suppose f is integrable on the rectangular box $B = [a_1, b_1] \times \cdots \times
[a_n, b_n]$. Given ϵ > 0, there exists δ > 0 such that if
\[
P = \{ x_{10}, \dots, x_{1J_1};\ x_{20}, \dots, x_{2J_2};\ \dots;\ x_{n0}, \dots, x_{nJ_n} \}
\]
is any partition of B satisfying
\[ \max_{1 \le i \le n}\ \max_{1 \le j \le J_i} (x_{ij} - x_{i(j-1)}) < \delta, \]
any Riemann sum for f associated to P differs from $\int \cdots \int_B f(x)\,d^n x$ by at most ϵ.

B.4 Double Integrals and Iterated Integrals


B.9 Theorem. Let R = [a, b] × [c, d], and let f be an integrable function on R.
Suppose that, for each y ∈ [c, d], the function $f_y$ defined by $f_y(x) = f(x, y)$ is
integrable on [a, b], and the function $g(y) = \int_a^b f(x, y)\,dx$ is integrable on [c, d].
Then
\[ \iint_R f\,dA = \int_c^d \left[ \int_a^b f(x, y)\,dx \right] dy. \]

Proof. Let $P_{JK} = \{x_0, \dots, x_J;\ y_0, \dots, y_K\}$ be the partition of R obtained by
subdividing [a, b] and [c, d], respectively, into J and K equal subintervals of length
∆x = (b − a)/J and ∆y = (d − c)/K. Given ϵ > 0, there is an integer N such
that
\[
\left| \iint_R f\,dA - \sum_{j=1}^J \sum_{k=1}^K f(x_j, y_k)\,\Delta x\,\Delta y \right| < \frac{\epsilon}{3} \tag{B.10}
\]
provided that J ≥ N and K ≥ N, and also
\[
\left| \int_c^d \left[ \int_a^b f(x, y)\,dx \right] dy - \sum_{k=1}^K \int_a^b f(x, y_k)\,dx\,\Delta y \right| < \frac{\epsilon}{3} \tag{B.11}
\]
provided that K ≥ N. (For (B.10) we are applying Theorem B.8 to the function f,
and for (B.11) we are applying Theorem B.7 to the function $g(y) = \int_a^b f(x, y)\,dx$.)
Let us fix K to be equal to N; then the points $y_k$ are also fixed. By Theorem B.7
again, we can choose J large enough so that
\[
\left| \int_a^b f(x, y_k)\,dx - \sum_{j=1}^J f(x_j, y_k)\,\Delta x \right| < \frac{\epsilon}{3(d - c)}
\]
for all k = 1, . . . , K. Then
\[
\left| \sum_{j=1}^J \sum_{k=1}^K f(x_j, y_k)\,\Delta x\,\Delta y - \sum_{k=1}^K \int_a^b f(x, y_k)\,dx\,\Delta y \right|
\le \sum_{k=1}^K \left| \sum_{j=1}^J f(x_j, y_k)\,\Delta x - \int_a^b f(x, y_k)\,dx \right| \Delta y
< \frac{K \epsilon\,\Delta y}{3(d - c)} = \frac{\epsilon}{3}.
\]

Therefore, by (B.10),
)* * )
) K * b
" ) 2ϵ
) )
) f dA − f (x, yk ) dx ∆y ) < ,
) R a ) 3
k=1

and hence by (B.11),


)* * * d +* b , )
) )
) f dA − f (x, y) dx dy )) < ϵ.
)
R c a

Since ϵ is arbitrary, the double integral and the iterated integral must be equal.
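
As a numerical illustration of Theorem B.9 (a sketch only, not part of the proof), the following Python fragment compares a Riemann sum of the kind appearing in (B.10) with a midpoint approximation of the iterated integral. The integrand f(x, y) = x e^{xy} on [0, 1] × [0, 1], the function names, and the grid sizes are assumptions made for this example; for this f both integrals equal e − 2.

```python
import math

def double_riemann_sum(f, a, b, c, d, J, K):
    """Riemann sum over the uniform J-by-K partition, sampled at the
    right endpoints (x_j, y_k) as in (B.10)."""
    dx = (b - a) / J
    dy = (d - c) / K
    total = 0.0
    for j in range(1, J + 1):
        for k in range(1, K + 1):
            total += f(a + j * dx, c + k * dy) * dx * dy
    return total

def iterated_integral(f, a, b, c, d, n=400):
    """Midpoint-rule approximation of the iterated integral:
    inner integral in x for each fixed y, then outer integral in y."""
    dx = (b - a) / n
    dy = (d - c) / n
    outer = 0.0
    for k in range(n):
        y = c + (k + 0.5) * dy
        inner = sum(f(a + (j + 0.5) * dx, y) for j in range(n)) * dx
        outer += inner * dy
    return outer

def f(x, y):
    # Hypothetical test integrand, chosen because its integral is known.
    return x * math.exp(x * y)

exact = math.e - 2.0  # integral of x*e^(xy) over the unit square
riemann = double_riemann_sum(f, 0.0, 1.0, 0.0, 1.0, 200, 200)
iterated = iterated_integral(f, 0.0, 1.0, 0.0, 1.0)
print(riemann, iterated, exact)
```

The right-endpoint sum converges only at first order in the mesh size, which is why it is less accurate than the midpoint value at comparable cost; both approach the same limit, as the theorem asserts.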

B.5 Change of Variables for Multiple Integrals


The object of this section is to show that measurability and the zero-content prop-
erty are preserved under invertible C 1 transformations, and to prove Theorem 4.37.
The arguments are rather difficult, and we must begin by developing some tools.
For the calculations in this section, it will be convenient to measure the magnitude
of a vector $\mathbf{x} \in \mathbb{R}^n$ not by the Euclidean norm |x| but by the “max-norm”
\[ \|\mathbf{x}\| = \max\{ |x_1|, |x_2|, \dots, |x_n| \}. \]
As we observed in (1.3), the norms |x| and ∥x∥ are comparable to each other in
the sense that $\|\mathbf{x}\| \le |\mathbf{x}| \le \sqrt{n}\,\|\mathbf{x}\|$. The max-norm shares the following basic
properties with the Euclidean norm:
\[ \|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|, \qquad \|c\mathbf{x}\| = |c|\,\|\mathbf{x}\|, \qquad \|\mathbf{x}\| = 0 \iff \mathbf{x} = \mathbf{0}. \]


However, the set
\[ Q(r, \mathbf{x}) = \{ \mathbf{y} : \|\mathbf{y} - \mathbf{x}\| < r \} \]
is not the ball of radius r about x but rather the open cube (or square, if n = 2) of
side length 2r centered at x.
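Suppose $A : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map with associated matrix $(A_{jk})$. For any
$\mathbf{x} \in \mathbb{R}^n$ we have
\[ \|A\mathbf{x}\| = \max_{j=1}^{m} \Bigl| \sum_{k=1}^{n} A_{jk} x_k \Bigr| \le \Bigl( \max_{j=1}^{m} \sum_{k=1}^{n} |A_{jk}| \Bigr) \|\mathbf{x}\|. \]
Hence, if we define
\[ \|A\| = \max_{j=1}^{m} \sum_{k=1}^{n} |A_{jk}|, \tag{B.12} \]
we have
\[ \|A\mathbf{x}\| \le \|A\|\,\|\mathbf{x}\|. \]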
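
The inequality ∥Ax∥ ≤ ∥A∥ ∥x∥ can be checked numerically. The sketch below (an illustration with hypothetical helper names and a randomly chosen matrix, not part of the text) computes the max-norm of a vector and the matrix norm (B.12) as the largest absolute row sum, samples random vectors, and also exhibits a sign vector for which equality holds.

```python
import random

def max_norm(x):
    """Max-norm of a vector: the largest absolute value of its entries."""
    return max(abs(t) for t in x)

def op_norm(A):
    """Matrix norm (B.12): the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def mat_vec(A, x):
    """Product of the matrix A (a list of rows) with the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

random.seed(0)
# Hypothetical 3-by-4 example matrix with entries in [-1, 1].
A = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]

# Largest observed ratio ||Ax|| / ||x|| over random sample vectors.
worst = 0.0
for _ in range(1000):
    x = [random.uniform(-1.0, 1.0) for _ in range(4)]
    worst = max(worst, max_norm(mat_vec(A, x)) / max_norm(x))

# Equality is attained at the sign pattern of the row with the largest
# absolute row sum, so (B.12) is the smallest constant that works.
j = max(range(len(A)), key=lambda i: sum(abs(a) for a in A[i]))
x_star = [1.0 if a >= 0 else -1.0 for a in A[j]]

print(worst, op_norm(A), max_norm(mat_vec(A, x_star)))
```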
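We shall need the variant of Theorem 2.88 that pertains to the norms just defined,
and an extension of it to nonconvex sets:

B.13 Lemma. Suppose F is a differentiable map from a convex set $W \subset \mathbb{R}^n$ into
$\mathbb{R}^m$, and suppose that ∥DF(x)∥ ≤ M for all x ∈ W (where ∥DF(x)∥ is defined
by (B.12)). Then
\[ \|\mathbf{F}(\mathbf{x}) - \mathbf{F}(\mathbf{y})\| \le M \|\mathbf{x} - \mathbf{y}\| \quad \text{for all } \mathbf{x}, \mathbf{y} \in W. \]

Proof. Let $\mathbf{F} = (F_1, \dots, F_m)$. By the mean value theorem (2.39), for each j there
is a point c on the line segment between x and y such that
\[ F_j(\mathbf{x}) - F_j(\mathbf{y}) = \nabla F_j(\mathbf{c}) \cdot (\mathbf{x} - \mathbf{y}) = \sum_{k=1}^{n} (\partial_k F_j(\mathbf{c}))(x_k - y_k). \]
But then
\[ |F_j(\mathbf{x}) - F_j(\mathbf{y})| \le \sum_{k=1}^{n} |\partial_k F_j(\mathbf{c})|\,\|\mathbf{x} - \mathbf{y}\| \le \|D\mathbf{F}(\mathbf{c})\|\,\|\mathbf{x} - \mathbf{y}\| \le M \|\mathbf{x} - \mathbf{y}\|. \]
Taking the maximum over j, we obtain the desired result.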

B.14 Lemma. Suppose F is a map of class C 1 from an open set U ⊂ Rn into Rm .


For any compact set R ⊂ U there is a constant C such that
∥F(x) − F(y)∥ ≤ C∥x − y∥ for all x, y ∈ R.

Proof. Since U is open, for each x ∈ R there is a positive number r such that
the cube Q(2r, x) is contained in U. By the Heine-Borel theorem, R is covered
by finitely many of the cubes Q(r, x) with side length half as large, say
$R \subset \bigcup_{j=1}^{J} Q(r_j, \mathbf{x}_j)$. Let $r_0$ be the smallest of the numbers $r_1, \dots, r_J$. Moreover, let
$C_1$ and $C_2$ be the maximum values of ∥DF(x)∥ and ∥F(x)∥ as x ranges over R.
(These maxima exist since R is compact and ∥DF(x)∥ and ∥F(x)∥ are continuous
functions of x ∈ R.)
Now suppose x, y ∈ R; then either $\|\mathbf{x} - \mathbf{y}\| < r_0$ or $\|\mathbf{x} - \mathbf{y}\| \ge r_0$. In
the first case, both x and y lie in one of the cubes $Q(2r_j, \mathbf{x}_j)$. (Indeed, x lies
in one of the cubes $Q(r_j, \mathbf{x}_j)$ since they cover R, and then $\mathbf{y} \in Q(r_j + r_0, \mathbf{x}_j)$.)
Since $Q(2r_j, \mathbf{x}_j)$ is convex, we can apply Lemma B.13 to conclude that
$\|\mathbf{F}(\mathbf{x}) - \mathbf{F}(\mathbf{y})\| \le C_1 \|\mathbf{x} - \mathbf{y}\|$. In the second case, we simply have
\[ \|\mathbf{F}(\mathbf{x}) - \mathbf{F}(\mathbf{y})\| \le \|\mathbf{F}(\mathbf{x})\| + \|\mathbf{F}(\mathbf{y})\| \le 2C_2 \le \frac{2C_2}{r_0} \|\mathbf{x} - \mathbf{y}\|. \]
Hence we can take $C = \max(C_1, 2C_2/r_0)$.

Before proceeding, we need to make one more observation: In developing the


theory of integration one uses (n-dimensional) rectangles in a number of places; it
is enough to use cubes instead. First, in defining the integral of an integrable func-
tion over a measurable set S, we can enclose S in a cube Q and restrict attention to
the approximating sums obtained by partitioning Q evenly into smaller subcubes;
these sums converge to the integral by Theorem B.8. Second, in showing that a set
has zero content, we consider coverings of a set S by finite unions of rectangles
whose total volume is small. We can enlarge each rectangle by an arbitrarily small
amount to obtain one whose vertices have rational coordinates, and the latter rect-
angle can be subdivided into cubes of side length 1/d where d is the least common
denominator of its side lengths.
Now we are ready to address the central issues of this section. For the rest of
this section, G will denote a one-to-one transformation of class C 1 from an open
set U ⊂ Rn onto another open set V ⊂ Rn whose derivative DG(u) is invertible
for all u ∈ U . By the inverse mapping theorem, G−1 : V → U also has the same
properties. Moreover, we denote the n-dimensional volume of a measurable set
S ⊂ Rn by V n (S). (Thus, if S is a cube of side length r, we have V n (S) = r n .)

B.15 Theorem. Suppose K ⊂ U is a compact set with zero content. Then G(K)
also has zero content.

Proof. First, since U is open, for each u ∈ K there is a cube centered at u whose
vertices have rational coordinates and whose closure lies in U. Since K is compact,

finitely many of these cubes cover K; thus, $K \subset R^{\mathrm{int}}$ where R is a finite union of
closed cubes contained in U. Let C be the constant in Lemma B.14, with R being
the set we have just defined.
Since K has zero content, for any ϵ > 0 there is a finite collection of cubes
$\{Q(r_j, \mathbf{x}_j)\}$ such that $K \subset \bigcup_j Q(r_j, \mathbf{x}_j)$ and $\sum_j V^n(Q(r_j, \mathbf{x}_j)) = \sum_j (2r_j)^n < \epsilon/C^n$
(recall that $Q(r_j, \mathbf{x}_j)$ has side length $2r_j$),
and these cubes can be taken to be subsets of R. (See the remarks following Lemma
B.14.) By Lemma B.14, $\mathbf{G}(Q(r_j, \mathbf{x}_j)) \subset Q(Cr_j, \mathbf{G}(\mathbf{x}_j))$. Thus G(K) is contained
in the union of the cubes $Q(Cr_j, \mathbf{G}(\mathbf{x}_j))$, and the sum of their volumes is
$\sum_j (2Cr_j)^n = C^n \sum_j (2r_j)^n < \epsilon$. It follows that G(K) has zero content.

B.16 Corollary. Suppose T is a measurable set with $\overline{T} \subset U$. Then G(T) is also
measurable.

Proof. First we observe that T is bounded (because it is measurable), so its boundary
∂T is compact. Moreover, G(∂T) = ∂(G(T)). (This is an easy consequence
of the fact that G and $\mathbf{G}^{-1}$ are both continuous; the proof is left as an exercise
to the reader.) Now, measurability means that the boundary ∂T is a set of zero
content. (In particular, ∂T is bounded, and hence compact since it is closed.) By
Theorem B.15, ∂(G(T)) = G(∂T) has zero content, so G(T) is measurable.

B.17 Corollary. If f : V → R is continuous except possibly on a compact set of


zero content, then the same is true of f ◦ G : U → R.

Proof. Suppose f is continuous on V \ K, where K ⊂ V is compact and has zero


content. Since G is continuous, f ◦ G is continuous on U \ G−1 (K). Since G−1
is continuous, G−1 (K) is compact (by Theorem 1.22) and has zero content (by
Theorem B.15).

We now present a sequence of lemmas leading up to the main change-of-variable
theorem. The heart of the argument is Lemma B.21.
If S and T are subsets of $\mathbb{R}^n$, the distance from S to T is defined to be
\[ d(S, T) = \inf\{ |\mathbf{x} - \mathbf{y}| : \mathbf{x} \in S,\ \mathbf{y} \in T \}. \]

B.18 Lemma. Suppose that S and T are disjoint closed subsets of Rn and S is
compact. Then d(S, T ) > 0.

Proof. If the assertion is false, there exist sequences {xj } in S and {yj } in T such
that |xj − yj | → 0. Since S is compact, by passing to a subsequence we may
assume that xj converges to a point x ∈ S. But then yj → x also, so x ∈ T since
T is closed. This is impossible since S ∩ T = ∅.

B.19 Lemma. Suppose Q ⊂ U is a closed cube. For any invertible linear map
$A : \mathbb{R}^n \to \mathbb{R}^n$,
\[ V^n(\mathbf{G}(Q)) \le |\det A| \Bigl( \sup_{\mathbf{u} \in Q} \|A^{-1} D\mathbf{G}(\mathbf{u})\| \Bigr)^{n} V^n(Q). \]

Proof. Let $C = \sup_{\mathbf{u} \in Q} \|A^{-1} D\mathbf{G}(\mathbf{u})\|$ (which is finite since Q is compact), and
notice that $A^{-1} D\mathbf{G}(\mathbf{u}) = D(A^{-1} \circ \mathbf{G})(\mathbf{u})$ since $A^{-1}$ is linear. We apply Lemma
B.13 to the map $\mathbf{F} = A^{-1} \circ \mathbf{G}$ on the set W = Q to see that $A^{-1}(\mathbf{G}(Q))$ is
contained in a cube Q′ whose side length is C times the side length of Q, and
whose volume is therefore $C^n$ times that of Q. Hence, by Theorem 4.35,
\[ |\det A|^{-1}\, V^n(\mathbf{G}(Q)) = V^n(A^{-1}(\mathbf{G}(Q))) \le V^n(Q') = C^n V^n(Q), \]
as claimed.

B.20 Lemma. Let R be a compact subset of U. For any ϵ > 0 there is a δ > 0
such that
\[ \Bigl|\, \|D\mathbf{G}(\mathbf{u})^{-1} D\mathbf{G}(\mathbf{v})\| - 1 \Bigr| < \epsilon \quad \text{and} \quad \Bigl|\, |\det D\mathbf{G}(\mathbf{u})|^{-1}\, |\det D\mathbf{G}(\mathbf{v})| - 1 \Bigr| < \epsilon \]
whenever u, v ∈ R and ∥u − v∥ < δ.

Proof. By (A.55) in Appendix A, the entries of the matrix $D\mathbf{G}(\mathbf{u})^{-1} D\mathbf{G}(\mathbf{v})$ vary
continuously as u, v vary over R, so the functions $\phi(\mathbf{u}, \mathbf{v}) = \|D\mathbf{G}(\mathbf{u})^{-1} D\mathbf{G}(\mathbf{v})\|$
and $\psi(\mathbf{u}, \mathbf{v}) = |\det D\mathbf{G}(\mathbf{u})|^{-1}\, |\det D\mathbf{G}(\mathbf{v})|$ are continuous on R × R. Moreover,
ϕ(u, u) = ψ(u, u) = 1 for all u ∈ R. (It follows easily from the definition
(B.12) that ∥I∥ = 1.) Since R × R is compact, ϕ and ψ are uniformly continuous
(Theorem 1.33). Hence, for any ϵ > 0 there is a δ > 0 such that |ϕ(u, v) −
ϕ(u′, v′)| < ϵ whenever ∥u − u′∥ + ∥v − v′∥ < δ, and likewise for ψ. Taking
u′ = v′ = u, we obtain the desired conclusions.

For the remainder of this section, we denote n-dimensional integrals by a single
integral sign rather than $\int \cdots \int$.

B.21 Lemma. Let T be a measurable set such that $\overline{T} \subset U$. Then
\[ V^n(\mathbf{G}(T)) \le \int_T |\det D\mathbf{G}|\,dV^n. \tag{B.22} \]

Proof. Since ∂T and ∂(G(T)) have zero content (Corollary B.16), the quantities
on either side of (B.22) are unchanged if we replace T by its closure $\overline{T}$. Hence we may as well
assume that $T = \overline{T}$ is compact.
We shall prove (B.22) by approximating the quantities on either side by finite
sums corresponding to a grid of small cubes. In detail, the process is as follows.
Pick a closed cube Q0 such that T ⊂ Q0 , and denote the side length of Q0 by l. By
partitioning the sides of Q0 into M equal pieces, we obtain a partition of Q0 into
$M^n$ equal subcubes of side length l/M; denote this collection of closed cubes by
$\mathcal{Q}_M$. Since the distance from T to the complement of U is strictly positive by Lemma
B.18, all of the cubes in QM that intersect T will be contained in U provided M is
sufficiently large, say M ≥ M0 . For each M ≥ M0 , let RM be the union of those
cubes in QM that intersect T . Then RM is a compact set such that T ⊂ RM ⊂ U ,
and V n (RM ) → V n (T ) as M → ∞.
Now, let ϵ > 0 be given. We choose δ > 0 as in Lemma B.20, and we then pick
M ≥ M0 large enough so that l/M < δ and V n (RM ) < V n (T ) + ϵ.
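Let $Q_1, \dots, Q_K$ be the cubes in $\mathcal{Q}_M$ that intersect T, so that $R_M = \bigcup_{k=1}^{K} Q_k$,
and let $\mathbf{x}_k$ be the center of $Q_k$. Since l/M < δ, Lemma B.20 applies whenever
$\mathbf{u} \in Q_k$ and $\mathbf{v} = \mathbf{x}_k$. Thus, by Lemma B.19, with $A = D\mathbf{G}(\mathbf{x}_k)$,
\[ V^n(\mathbf{G}(Q_k)) \le |\det D\mathbf{G}(\mathbf{x}_k)|\,(1 + \epsilon)^n\, V^n(Q_k), \]
so
\[ V^n(\mathbf{G}(T)) \le \sum_{k=1}^{K} V^n(\mathbf{G}(Q_k)) < (1 + \epsilon)^n \sum_{k=1}^{K} |\det D\mathbf{G}(\mathbf{x}_k)|\, V^n(Q_k). \]
On the other hand, by Lemma B.20 again,
\[ |\det D\mathbf{G}(\mathbf{u})| > (1 - \epsilon)\,|\det D\mathbf{G}(\mathbf{x}_k)| \quad \text{for all } \mathbf{u} \in Q_k, \]
so
\[ |\det D\mathbf{G}(\mathbf{x}_k)|\, V^n(Q_k) = \int_{Q_k} |\det D\mathbf{G}(\mathbf{x}_k)|\,d^n\mathbf{u} < \frac{1}{1 - \epsilon} \int_{Q_k} |\det D\mathbf{G}(\mathbf{u})|\,d^n\mathbf{u}. \]
In short,
\[ V^n(\mathbf{G}(T)) \le \frac{(1 + \epsilon)^n}{1 - \epsilon} \sum_{k=1}^{K} \int_{Q_k} |\det D\mathbf{G}|\,dV^n = \frac{(1 + \epsilon)^n}{1 - \epsilon} \int_{R_M} |\det D\mathbf{G}|\,dV^n. \]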

Finally, let C be the maximum of |det DG(u)| as u ranges over the compact set
$R_{M_0}$. Then
\[ \left| \int_{R_M} |\det D\mathbf{G}|\,dV^n - \int_T |\det D\mathbf{G}|\,dV^n \right| = \int_{R_M \setminus T} |\det D\mathbf{G}|\,dV^n \le C\,[V^n(R_M) - V^n(T)] < C\epsilon. \]
Therefore,
\[ V^n(\mathbf{G}(T)) \le \frac{(1 + \epsilon)^n}{1 - \epsilon} \int_T |\det D\mathbf{G}|\,dV^n + C\,\frac{(1 + \epsilon)^n}{1 - \epsilon}\,\epsilon. \]
Since ϵ is arbitrary and C is independent of ϵ, (B.22) follows.

B.23 Lemma. Let T be a measurable set such that $\overline{T} \subset U$, and let f be a bounded
nonnegative function on G(T) that is continuous except perhaps on a set of zero
content. Then
\[ \int_{\mathbf{G}(T)} f(\mathbf{x})\,d^n\mathbf{x} \le \int_T f(\mathbf{G}(\mathbf{u}))\,|\det D\mathbf{G}(\mathbf{u})|\,d^n\mathbf{u}. \]

Proof. Consider a lower Riemann sum for $\int_{\mathbf{G}(T)} f$:
\[ s_P f = \sum_{j=1}^{J} m_j V^n(Q_j), \]
where the $Q_j$’s are cubes with disjoint interiors contained in G(T) and
$m_j = \inf_{\mathbf{x} \in Q_j} f(\mathbf{x})$. (The hypothesis f ≥ 0 is needed so that the cubes $Q_j$ satisfy
$Q_j \subset \mathbf{G}(T)$, not just $Q_j \cap \mathbf{G}(T) \ne \emptyset$.) By Theorem B.15 and Corollary B.16 (applied
to $\mathbf{G}^{-1}$), the sets $\mathbf{G}^{-1}(Q_j)$ are measurable and overlap only in sets of zero content.
By Lemma B.21, then, we have
\[ \begin{aligned}
s_P f = \sum_j m_j V^n(Q_j)
&\le \sum_j m_j \int_{\mathbf{G}^{-1}(Q_j)} |\det D\mathbf{G}|\,dV^n \le \sum_j \int_{\mathbf{G}^{-1}(Q_j)} (f \circ \mathbf{G})\,|\det D\mathbf{G}|\,dV^n \\
&= \int_{\bigcup_j \mathbf{G}^{-1}(Q_j)} (f \circ \mathbf{G})\,|\det D\mathbf{G}|\,dV^n \le \int_T (f \circ \mathbf{G})\,|\det D\mathbf{G}|\,dV^n.
\end{aligned} \]
(For the last inequality we used the fact that $\bigcup_j \mathbf{G}^{-1}(Q_j) \subset T$ and the assumption
that f ≥ 0.) Taking the supremum over all lower Riemann sums $s_P f$, we obtain
the desired conclusion.

At last we come to the main result, for which we restate the hypotheses in
full. We assume that f : G(T ) → R is bounded and continuous except on a
set of zero content (and hence is integrable on G(T )); by Corollary B.17, this
implies that f ◦ G : T → R is also bounded and continuous except on a set of
zero content (and hence is integrable on T ). It is actually enough to assume that
f is integrable on G(T ), but then an additional argument would be necessary to
establish the integrability of f ◦ G.

B.24 Theorem. Let G be a one-to-one transformation of class $C^1$ from an open
set $U \subset \mathbb{R}^n$ onto another open set $V \subset \mathbb{R}^n$ whose derivative DG(u) is invertible
for all u ∈ U. Let T be a measurable set such that $\overline{T} \subset U$, and let f be a bounded
function on G(T) that is continuous except perhaps on a set of zero content. Then
\[ \int_{\mathbf{G}(T)} f(\mathbf{x})\,d^n\mathbf{x} = \int_T f(\mathbf{G}(\mathbf{u}))\,|\det D\mathbf{G}(\mathbf{u})|\,d^n\mathbf{u}. \]

Proof. It suffices to show that each of these integrals is less than or equal to the
other one. For f ≥ 0, Lemma B.23 proves one of these inequalities, and the
reverse inequality follows by applying Lemma B.23 to the inverse transformation.
More precisely, if in Lemma B.23 we replace T by G(T), G by $\mathbf{G}^{-1}$, and f by
$(f \circ \mathbf{G})\,|\det D\mathbf{G}|$, we obtain
\[ \int_T f(\mathbf{G}(\mathbf{u}))\,|\det D\mathbf{G}(\mathbf{u})|\,d^n\mathbf{u} \le \int_{\mathbf{G}(T)} f(\mathbf{G}(\mathbf{G}^{-1}(\mathbf{x})))\,|\det D\mathbf{G}(\mathbf{G}^{-1}(\mathbf{x}))|\,|\det D\mathbf{G}^{-1}(\mathbf{x})|\,d^n\mathbf{x}. \]
But by the chain rule (2.86), the matrices $D\mathbf{G}(\mathbf{G}^{-1}(\mathbf{x}))$ and $D\mathbf{G}^{-1}(\mathbf{x})$ are inverses
of each other, so their determinants are reciprocals of each other; hence, the integral
on the right is simply $\int_{\mathbf{G}(T)} f(\mathbf{x})\,d^n\mathbf{x}$. Thus the theorem is proved for the case
f ≥ 0. The general case follows by writing f = (f + C) − C where C ≥ 0 is
sufficiently large that f + C ≥ 0 on G(T). The argument just given applies to f + C
and to the constant function C; subtracting the results yields the theorem.
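
To see the change-of-variables formula at work, here is a small numerical check (an illustrative sketch, not part of the proof). It assumes the polar-coordinate map G(r, θ) = (r cos θ, r sin θ), the rectangle T = [1, 2] × [0, π/2], and the integrand f(x, y) = xy; these choices and all function names are hypothetical. The left side is computed directly in Cartesian coordinates over the quarter annulus G(T), the right side over T with the factor |det DG| = r, and both should be close to 15/8.

```python
import math

def midpoint_2d(g, a, b, c, d, n=200):
    """Midpoint-rule approximation of the integral of g over [a,b] x [c,d]."""
    hx, hy = (b - a) / n, (d - c) / n
    return sum(g(a + (i + 0.5) * hx, c + (j + 0.5) * hy)
               for i in range(n) for j in range(n)) * hx * hy

def f(x, y):
    return x * y

def G(r, t):
    """Polar-coordinate map; one-to-one with invertible derivative for r > 0."""
    return r * math.cos(t), r * math.sin(t)

def abs_det_DG(r, t):
    """|det DG(r, t)| = r for the polar map."""
    return abs(r)

# Right side of the theorem: integral over T = [1,2] x [0, pi/2].
rhs = midpoint_2d(lambda r, t: f(*G(r, t)) * abs_det_DG(r, t),
                  1.0, 2.0, 0.0, math.pi / 2)

def lhs_cartesian(n=200):
    """Left side: integrate f over the quarter annulus 1 <= x^2 + y^2 <= 4,
    x, y >= 0, slicing in x and integrating in y between the two arcs."""
    hx = 2.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * hx
        y_lo = math.sqrt(max(0.0, 1.0 - x * x))  # inner arc (0 when x >= 1)
        y_hi = math.sqrt(4.0 - x * x)            # outer arc
        hy = (y_hi - y_lo) / n
        total += sum(f(x, y_lo + (j + 0.5) * hy) for j in range(n)) * hy * hx
    return total

lhs = lhs_cartesian()
print(lhs, rhs, 15.0 / 8.0)
```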

B.6 Improper Multiple Integrals


In this section we denote multiple integrals by a single integral sign rather than
$\int \cdots \int$.

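B.25 Theorem. Let S be an open set in $\mathbb{R}^n$, and let f be a nonnegative function
on S that is integrable over every compact subset of S. Let $\{U_j\}$ and $\{\widetilde{U}_j\}$ be
sequences of compact subsets of S such that
\[ U_1 \subset U_2 \subset U_3 \subset \cdots, \qquad \widetilde{U}_1 \subset \widetilde{U}_2 \subset \widetilde{U}_3 \subset \cdots, \qquad \text{and} \qquad \bigcup_{1}^{\infty} U_j^{\mathrm{int}} = S = \bigcup_{1}^{\infty} \widetilde{U}_j^{\mathrm{int}}. \]
Then
\[ \lim_{j \to \infty} \int_{U_j} f\,dV^n = \lim_{j \to \infty} \int_{\widetilde{U}_j} f\,dV^n, \]
where the limits may be finite or +∞.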

Proof. The limits in question exist by the monotone sequence theorem, because
$\int_{U_j} f\,dV^n$ and $\int_{\widetilde{U}_j} f\,dV^n$ increase with j. Let $I = \lim_{j \to \infty} \int_{U_j} f\,dV^n$, and let c
be any number less than I. We then have $\int_{U_j} f\,dV^n > c$ when j is sufficiently
large, say j ≥ J. Now $U_J \subset S = \bigcup_{1}^{\infty} \widetilde{U}_j^{\mathrm{int}}$, so by the Heine-Borel theorem, for
some finite K we have $U_J \subset \bigcup_{1}^{K} \widetilde{U}_j^{\mathrm{int}} \subset \widetilde{U}_K$. But then, for j ≥ K,
\[ \int_{\widetilde{U}_j} f\,dV^n \ge \int_{\widetilde{U}_K} f\,dV^n \ge \int_{U_J} f\,dV^n > c. \]
Since c is an arbitrary number less than I, it follows that
$\lim \int_{\widetilde{U}_j} f\,dV^n \ge I = \lim \int_{U_j} f\,dV^n$. The same argument works with the roles of the $U_j$’s and $\widetilde{U}_j$’s
switched, so the two limits are actually equal.
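
A concrete instance of Theorem B.25 (an illustration under stated assumptions, not taken from the text): take $S = \mathbb{R}^2$ and $f(x, y) = e^{-(x^2 + y^2)}$, and exhaust S once by closed discs of radius R and once by closed squares [−R, R]². Both integrals have closed forms, π(1 − e^{−R²}) and π erf(R)², so the common limit π can be observed without numerical integration. The function names below are hypothetical.

```python
import math

def integral_over_disc(R):
    """Integral of exp(-(x^2+y^2)) over the closed disc of radius R,
    computed in polar coordinates: pi * (1 - exp(-R^2))."""
    return math.pi * (1.0 - math.exp(-R * R))

def integral_over_square(R):
    """Integral of exp(-(x^2+y^2)) over [-R, R]^2. The integrand factors,
    and the integral of exp(-x^2) over [-R, R] is sqrt(pi) * erf(R)."""
    return math.pi * math.erf(R) ** 2

# Both increasing sequences approach pi as the sets exhaust the plane.
for j in range(1, 6):
    print(j, integral_over_disc(j), integral_over_square(j))
```

The interlacing used in the proof is visible here as well: the disc of radius R lies inside the square [−R, R]², which lies inside the disc of radius R√2, so the three integrals are ordered accordingly.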

B.7 Green’s Theorem and the Divergence Theorem


The object of this section is to show how to prove Green’s theorem and its analogues
in higher dimensions for general C 1 domains. For this purpose we need to develop
a technical tool, the notion of partitions of unity, that has many uses in advanced
analysis.

B.26 Lemma. For any rectangular box $B = [a_1, b_1] \times \cdots \times [a_n, b_n]$ in $\mathbb{R}^n$, there is
a $C^\infty$ function f on $\mathbb{R}^n$ such that f(x) > 0 for $\mathbf{x} \in B^{\mathrm{int}}$ and f(x) = 0 otherwise.

Proof. In the case n = 1, B = [a, b], we can take f to be
\[ f_a^b(x) = \begin{cases} e^{1/(x-a)(x-b)} & \text{if } a < x < b, \\ 0 & \text{otherwise.} \end{cases} \]
(Note that the exponent 1/(x − a)(x − b) is negative for a < x < b and tends to
−∞ as x → a+ or x → b−.) An argument like that in Exercise 9, §2.1, shows that
f and all its derivatives vanish as x → a+ or x → b−, so f is $C^\infty$ even at a and b.
For the n-dimensional case, then, the function
\[ f(\mathbf{x}) = f_{a_1}^{b_1}(x_1)\, f_{a_2}^{b_2}(x_2) \cdots f_{a_n}^{b_n}(x_n) \]
does the job.
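
The functions in Lemma B.26 are easy to write down in code. The following sketch (with hypothetical function names) implements the one-dimensional function $f_a^b$ and the product used for a box; it illustrates the definition only and does not check smoothness.

```python
import math

def bump(a, b):
    """Return the function f_a^b of Lemma B.26: exp(1/((x-a)(x-b))) on the
    open interval (a, b) and 0 elsewhere."""
    def f(x):
        if a < x < b:
            return math.exp(1.0 / ((x - a) * (x - b)))
        return 0.0
    return f

def box_bump(box):
    """Product of one-dimensional bumps for a box given as a list of
    (a_i, b_i) pairs; positive on the interior of the box, zero elsewhere."""
    factors = [bump(a, b) for a, b in box]
    def f(x):
        return math.prod(g(t) for g, t in zip(factors, x))
    return f

f01 = bump(0.0, 1.0)
print(f01(0.5))                       # exp(-4), the maximum value on (0, 1)
g = box_bump([(0.0, 1.0), (0.0, 2.0)])
print(g((0.5, 1.0)), g((0.5, 2.0)))   # positive inside, zero on the boundary
```

In floating-point arithmetic the values very close to a or b underflow to 0.0, although the exact function is strictly positive throughout (a, b).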

If f is a function on Rn , the support of f , denoted by supp(f ), is the closure


of the set of all points x such that f (x) ̸= 0; in other words, it is the smallest closed
set outside of which f vanishes.
B.27 Theorem. Suppose $K \subset \mathbb{R}^n$ is compact and $U_1, \dots, U_J$ are open sets such
that $K \subset \bigcup_{1}^{J} U_j$. Then there exists a finite collection $\{\phi_m\}_{1}^{M}$ of $C^\infty$ functions
such that
a. the support of each $\phi_m$ is compact and contained in one of the sets $U_j$, and
b. $\sum_{1}^{M} \phi_m(\mathbf{x}) = 1$ for all x ∈ K.

Proof. The starting point is a fact we demonstrated in the course of proving Theorem
B.1: There is a grid $\mathcal{B}$ of closed rectangular boxes such that each box in $\mathcal{B}$
that intersects K is contained in one of the sets $U_j$. Let $B_1, \dots, B_M$ be the boxes
in $\mathcal{B}$ that intersect K, and let $B_{M+1}, \dots, B_N$ be the additional boxes in $\mathcal{B}$ that
intersect at least one of $B_1, \dots, B_M$. (Thus, $\bigcup_{1}^{M} B_m$ is a compact set contained in $\bigcup_{1}^{J} U_j$
whose interior contains K, and $\bigcup_{1}^{N} B_m$ is obtained by adding one additional layer
of boxes around the boundary of $\bigcup_{1}^{M} B_m$.)
For 1 ≤ m ≤ M, the box $B_m$ is contained in one of the $U_j$’s, say $U_{j(m)}$; let
$c_m = d(B_m, U_{j(m)}^{c})$. (Here d(S, T) is the distance from S to T, defined before
Lemma B.18.) On the other hand, for M < m ≤ N we have $B_m \cap K = \emptyset$; let
$c_m = d(B_m, K)$. The numbers $c_m$ are all positive by Lemma B.18. Let η be the
smallest of the side lengths of the $B_m$’s, let
\[ \delta = \frac{1}{2\sqrt{n}} \min(c_1, \dots, c_N, \eta), \]
and for 1 ≤ m ≤ N let $\widetilde{B}_m$ be the closed box with the same center as $B_m$ whose
side lengths are larger than those of $B_m$ by the amount δ. Then the boxes $\widetilde{B}_m$ have
the following properties: First, each point of $B_m$ is in the interior of $\widetilde{B}_m$. Second,
since $\delta < \tfrac{1}{2}\eta$, for m ≤ M each point of $\widetilde{B}_m$ is in the interior of one of the $\widetilde{B}_l$’s. (It
is the points on the boundary of $\widetilde{B}_m$ that are at issue here, and it may happen that
l > M.) Third, if $\mathbf{x} \in \widetilde{B}_m$, there is a point $\mathbf{y} \in B_m$ such that $|x_j - y_j| \le \delta$ for all
j, and hence $|\mathbf{x} - \mathbf{y}| \le \delta\sqrt{n}$. Since $\delta\sqrt{n} < \tfrac{1}{2} c_m$, it follows that $\widetilde{B}_m \subset U_{j(m)}$ for
m ≤ M and $\widetilde{B}_m \cap K = \emptyset$ for m > M.
Now, for 1 ≤ m ≤ N, choose a $C^\infty$ function $\psi_m$ such that $\psi_m > 0$ on the interior of $\widetilde{B}_m$
and $\psi_m = 0$ elsewhere, according to Lemma B.26, and let
\[ \phi_m = \frac{\psi_m}{\sum_{l=1}^{N} \psi_l} \qquad (1 \le m \le M), \]
with the understanding that $\phi_m = 0$ outside $\widetilde{B}_m$. Since the sum in the denominator
is strictly positive on $\bigcup_{1}^{N} \widetilde{B}_l^{\mathrm{int}}$, an open set that includes $\widetilde{B}_m$, the function $\phi_m$ is
$C^\infty$ and $\operatorname{supp}(\phi_m) = \widetilde{B}_m \subset U_{j(m)}$. Finally, for l > M we have $\widetilde{B}_l \cap K = \emptyset$ and
hence $\psi_l = 0$ on K; therefore, for x ∈ K,
\[ \sum_{m=1}^{M} \phi_m(\mathbf{x}) = \frac{\sum_{m=1}^{M} \psi_m(\mathbf{x})}{\sum_{l=1}^{N} \psi_l(\mathbf{x})} = \frac{\sum_{m=1}^{M} \psi_m(\mathbf{x})}{\sum_{l=1}^{M} \psi_l(\mathbf{x})} = 1, \]
so the $\phi_m$’s have all the desired properties.
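
The construction can be imitated in one dimension. The sketch below is a simplified illustration, not the grid construction of the proof: it assumes K = [0, 1] covered by U₁ = (−0.5, 0.6) and U₂ = (0.4, 1.5), and the intervals chosen for the bumps are hypothetical. Two "inner" bumps play the role of ψ₁, . . . , ψ_M and two "outer" bumps, supported away from K, play the role of the extra layer ψ_{M+1}, . . . , ψ_N that keeps the denominator positive near the supports.

```python
import math

def bump(a, b):
    """exp(1/((x-a)(x-b))) on (a, b), zero elsewhere (as in Lemma B.26)."""
    def f(x):
        if a < x < b:
            return math.exp(1.0 / ((x - a) * (x - b)))
        return 0.0
    return f

# Assumed data: K = [0, 1], U1 = (-0.5, 0.6), U2 = (0.4, 1.5).
psi_inner = [bump(-0.4, 0.55), bump(0.45, 1.4)]    # supports inside U1 and U2
psi_outer = [bump(-0.9, -0.05), bump(1.05, 1.9)]   # supports disjoint from K

def phi(m, x):
    """phi_m = psi_m / (sum of all psi's), taken to be 0 where every psi
    vanishes, i.e. outside the union of the supports."""
    denom = sum(p(x) for p in psi_inner + psi_outer)
    return psi_inner[m](x) / denom if denom > 0.0 else 0.0

# On K the outer bumps vanish, so phi_0 + phi_1 = 1 there.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, phi(0, x), phi(1, x), phi(0, x) + phi(1, x))
```

Without the outer bumps the quotient ψ₁/(ψ₁ + ψ₂) would equal 1 right up to the left edge of its support and then drop to 0, so it would not even be continuous; this is the reason for the additional layer of boxes in the proof.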

The collection of functions {ϕm } in Theorem B.27 is called a partition of


unity on K subordinate to the covering {Uj }.
We are now ready to prove Green’s theorem for general regions with smooth
boundary. Afterwards, we shall indicate how to extend the proof to regions with
piecewise smooth boundary.

B.28 Theorem. Suppose S is a compact region in $\mathbb{R}^2$ whose boundary ∂S is a finite
union of simple closed curves of class $C^1$, equipped with the positive orientation.
Suppose also that P and Q are $C^1$ functions on S. Then
\[ \int_{\partial S} P\,dx + Q\,dy = \iint_S \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA. \]

Proof. The starting point is the special case of Green’s theorem, proved in §5.2, in
which S is x-simple and y-simple. (What we actually need here is the case where
S is a rectangle with sides parallel to the axes.) In contrast to the method used in
§5.2 to handle more general regions, instead of cutting up the region into simple
pieces, we shall use a partition of unity to cut up the integrand into pieces that are
easily analyzed by a change of variables.
By Theorem 3.13, for every point x ∈ ∂S there is an open disc D centered at x
such that the portion of ∂S within D is the graph of a C 1 function, either y = f (x)
or x = f(y). By the Heine-Borel theorem, we can select finitely many of these
discs, say $D_1, \dots, D_J$, so that $\partial S \subset \bigcup_{1}^{J} D_j$. Then $D_1, \dots, D_J$, and $S^{\mathrm{int}}$ form an
open covering of S.

FIGURE B.1: The transformation (x, y) → (u, v). The disc D is indicated by
the dashed circle on the left; the rectangle R to which Green’s theorem is to be
applied is dotted on the right.

By Theorem B.27 we can choose a partition of unity $\{\phi_m\}_{1}^{M}$ on S subordinate
to this covering. Then $P = \sum_{1}^{M} \phi_m P$ and $Q = \sum_{1}^{M} \phi_m Q$ on S, and $\phi_m P$ and
$\phi_m Q$ are still of class $C^1$, so it suffices to prove the theorem with P and Q replaced
by $\phi_m P$ and $\phi_m Q$ for m = 1, . . . , M. In short, it is enough to prove the theorem
when supp(P) and supp(Q) are either (a) contained in $S^{\mathrm{int}}$ or (b) contained in a
disc D such that D ∩ ∂S is the graph of a $C^1$ function.
In case (a), P and Q both vanish on ∂S, so $\int_{\partial S} P\,dx + Q\,dy = 0$. Also, P
and Q remain $C^1$ if we extend them to be zero outside of S. But then we can apply
Green’s theorem on any rectangle R that includes S to conclude that
\[ \iint_S \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA = \iint_R \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA = \int_{\partial R} P\,dx + Q\,dy = 0. \]

Thus the theorem is true in case (a).


Case (b) is the more interesting one. Suppose, to be definite, that P and Q are
supported in D, where ∂S ∩ D is a portion of the graph of a C 1 function y = f (x).
(The case x = f (y) is similar.) We define a change of variables (x, y) = G(u, v)
on D by

x = u, y = v + f (u); that is, u = x, v = y − f (x).

The transformation $\mathbf{G}^{-1}(x, y) = (u, v)$ maps D to a bounded region in the
uv-plane, ∂S ∩ D to a line segment L in the u-axis, and S ∩ [supp(P) ∪ supp(Q)]
to a bounded region T in either the upper or the lower half-plane. More precisely,
T will be in the upper half-plane if S lies above the graph y = f(x) and in the
lower half-plane if S lies below; thus, the relative orientations of T and L are the
same as those of S and ∂S. See Figure B.1.
Let R be a rectangle in the uv-plane, one of whose sides is the segment L, that
includes T. Then the functions $\widetilde{P} = P \circ \mathbf{G}$ and $\widetilde{Q} = Q \circ \mathbf{G}$ are $C^1$ functions on R
that vanish on the three sides of R other than L.
Now, dx = du and dy = f′(u) du + dv, so
\[ \int_{\partial S} P\,dx + Q\,dy = \int_L \widetilde{P}\,du + \widetilde{Q}\,[f'(u)\,du + dv], \]
where L is oriented as a portion of ∂R. Since $\widetilde{P}$ and $\widetilde{Q}$ vanish on the other sides of
R, we can apply Green’s theorem on R to conclude that
\[ \begin{aligned}
\int_{\partial S} P\,dx + Q\,dy &= \int_{\partial R} [\widetilde{P}(u, v) + \widetilde{Q}(u, v) f'(u)]\,du + \widetilde{Q}(u, v)\,dv \\
&= \iint_R \left( \frac{\partial \widetilde{Q}}{\partial u} - \frac{\partial \widetilde{P}}{\partial v} - \frac{\partial \widetilde{Q}}{\partial v} f'(u) \right) du\,dv.
\end{aligned} \tag{B.29} \]

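But by the chain rule,
\[ \frac{\partial \widetilde{Q}}{\partial u} - \frac{\partial \widetilde{Q}}{\partial v} f'(u) = \frac{\partial Q}{\partial x}\bigg|_{(x,y) = \mathbf{G}(u,v)}; \qquad \frac{\partial \widetilde{P}}{\partial v} = \frac{\partial P}{\partial y}\bigg|_{(x,y) = \mathbf{G}(u,v)}. \]
Also,
\[ \frac{\partial(x, y)}{\partial(u, v)} = \det D\mathbf{G} = \det \begin{pmatrix} 1 & 0 \\ f'(u) & 1 \end{pmatrix} = 1, \]
so du dv = dx dy by Theorem B.24. It follows that the double integral (B.29) is
equal to $\iint_S (\partial_x Q - \partial_y P)\,dx\,dy$, which completes the proof.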
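
As a sanity check of the statement of Theorem B.28 (a numerical sketch, not a substitute for the proof), one can compare the two sides on a region with smooth boundary. The example assumes S is the closed unit disc and P = −y³, Q = x³, so that ∂Q/∂x − ∂P/∂y = 3x² + 3y² and both sides equal 3π/2; the function names are hypothetical, and the area integral is evaluated in polar coordinates, which itself relies on Theorem B.24.

```python
import math

def P(x, y):
    return -y ** 3

def Q(x, y):
    return x ** 3

def curl(x, y):
    """dQ/dx - dP/dy for the fields above, computed by hand."""
    return 3.0 * x * x + 3.0 * y * y

def line_integral(n=2000):
    """Integral of P dx + Q dy over the unit circle, counterclockwise,
    using the parametrization (cos t, sin t) and the midpoint rule."""
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t), math.cos(t)   # derivative of the parametrization
        total += (P(x, y) * dx + Q(x, y) * dy) * h
    return total

def area_integral(n=400):
    """Integral of curl over the unit disc, in polar coordinates
    (area element r dr dt), using the midpoint rule."""
    hr, ht = 1.0 / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            t = (j + 0.5) * ht
            total += curl(r * math.cos(t), r * math.sin(t)) * r * hr * ht
    return total

line = line_integral()
area = area_integral()
print(line, area, 1.5 * math.pi)
```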

Let us indicate how this argument can be extended to a region S with piece-
wise smooth boundary. Recall from §5.1 that “piecewise smooth” means that ∂S
consists of curves that are smooth except at finitely many points, where they have
“corners,” i.e., where the direction of the curve changes abruptly. If x0 is such a
point, there is a small disc D centered at x0 such that ∂S ∩ D is the union of por-
tions of two smooth curves that intersect at x0 . By Theorem 3.13, by shrinking D
if necessary we may assume that these curves are the loci of equations F(x) = 0
and G(x) = 0, where ∇F ≠ 0 and ∇G ≠ 0 on D. We shall assume that $\nabla F(\mathbf{x}_0)$
and $\nabla G(\mathbf{x}_0)$ are linearly independent. (The exceptional case where they are not,
that is, where the two curves are tangent at $\mathbf{x}_0$ and the region has a sharp “cusp”
rather than a “corner” at $\mathbf{x}_0$, must be handled by an additional limiting argument,
in which S is approximated by regions with smooth boundaries.) Then, by
the inverse mapping theorem, by shrinking D yet further we may assume that the
transformation u = F(x, y), v = G(x, y) has a $C^1$ inverse on D.

FIGURE B.2: Transformation of a region with a corner. The rectangles
to which Green’s theorem is to be applied are dotted on the right.

Now, as in the proof of Theorem B.28, we can cover ∂S by finitely many discs
D1 , . . . DJ such that ∂S ∩ Dj is the graph of a smooth function, together with
finitely many discs $D_{J+1}, \dots, D_K$ centered at the corners and satisfying the conditions
of the preceding paragraph. By using a partition of unity subordinate to
the covering {D1 , . . . , DK , S int } of S, we reduce to the case where P and Q are
supported in one of these discs. The discs Dj of the first kind (j ≤ J) are han-
dled as before. For the ones centered at a corner, we use the change of variables
u = F (x, y), v = G(x, y) described above to reduce to the case where the bound-
ary consists of a segment of the u-axis and a segment of the v-axis that meet at the
origin. (This change of variables is not as simple as the one we used before, so
the calculations are more complicated, but the idea is the same.) If S occupies the
“inside” of the corner, the calculation boils down to Green’s theorem on a rectangle
as before; if S occupies the “outside,” it boils down to Green’s theorem for two
rectangles; see Figure B.2.
Finally, we prove the divergence theorem for general regions with C 1 boundary.
The argument can be extended to handle regions with piecewise smooth boundary
in a manner similar to that in the preceding paragraphs.

B.30 Theorem. Suppose R is a compact region in $\mathbb{R}^3$ with $C^1$ boundary ∂R, oriented
so that the positive normal points out of R. Suppose also that F is a vector
field of class $C^1$ on R. Then
\[ \iint_{\partial R} \mathbf{F} \cdot \mathbf{n}\,dA = \iiint_R \operatorname{div} \mathbf{F}\,dV. \]

Proof. The proof is very similar to that of Theorem B.28, so we shall omit many
details. By using a partition of unity, we reduce the problem to proving the theorem
when $\operatorname{supp}(\mathbf{F}) \subset R^{\mathrm{int}}$ or when supp(F) ⊂ B where B is a ball such that ∂R ∩ B
is the graph of a $C^1$ function, say z = ϕ(x, y). In the first case, the integrals
$\iint_{\partial R} \mathbf{F} \cdot \mathbf{n}\,dA$ and $\iiint_R \operatorname{div} \mathbf{F}\,dV$ both vanish, as in Theorem B.28. In the second
case, we introduce a change of variables on B, (x, y, z) = G(u, v, w), defined by
\[ x = u, \quad y = v, \quad z = w + \phi(u, v); \qquad \text{that is,} \qquad u = x, \quad v = y, \quad w = z - \phi(x, y), \]
and set $\widetilde{\mathbf{F}} = \mathbf{F} \circ \mathbf{G}$. The set corresponding to ∂R ∩ B in uvw-space is a region S
in the uv-plane. Let Q be a rectangular box in uvw-space that includes $\mathbf{G}^{-1}(R \cap \operatorname{supp}(\mathbf{F}))$,
one face of which is a rectangle in the uv-plane that includes S. (S and
Q correspond to L and R in the proof of Theorem B.28.)
We parametrize the surface z = ϕ(x, y) by (u, v) → (u, v, ϕ(u, v)) to see that
\[ \iint_{\partial R} \mathbf{F} \cdot \mathbf{n}\,dA = \pm \iint_S \bigl[ -(\partial_u \phi)\widetilde{F}_1 - (\partial_v \phi)\widetilde{F}_2 + \widetilde{F}_3 \bigr]\,dA = \iint_{\partial Q} \mathbf{H} \cdot \mathbf{n}\,dA, \]
where
\[ \mathbf{H} = \bigl[ -(\partial_u \phi)\widetilde{F}_1 - (\partial_v \phi)\widetilde{F}_2 + \widetilde{F}_3 \bigr]\,\mathbf{k}. \]
Here the ± is + or − depending on whether R (resp. Q) lies below or above the
surface z = ϕ(x, y) (resp. the uv-plane), that is, on whether the outward normal to
Q on S is +k or −k; the last equality holds because $\widetilde{\mathbf{F}}$ vanishes on ∂Q \ S. In the
vector field H, the functions $\widetilde{F}_j$ depend on (u, v, w), but ϕ depends only on (u, v).
By the divergence theorem for the box Q (proved in §5.5), then,
\[ \iint_{\partial R} \mathbf{F} \cdot \mathbf{n}\,dA = \iiint_Q \operatorname{div} \mathbf{H}\,dV = \iiint_Q \left( -\frac{\partial \phi}{\partial u} \frac{\partial \widetilde{F}_1}{\partial w} - \frac{\partial \phi}{\partial v} \frac{\partial \widetilde{F}_2}{\partial w} + \frac{\partial \widetilde{F}_3}{\partial w} \right) dV. \tag{B.31} \]

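Now, $\widetilde{\mathbf{F}}(u, v, w) = \mathbf{F}(u, v, w + \phi(u, v))$, so by the chain rule,
\[ \frac{\partial \widetilde{F}_1}{\partial u} = \widetilde{\frac{\partial F_1}{\partial x}} + \widetilde{\frac{\partial F_1}{\partial z}}\,\frac{\partial \phi}{\partial u}, \qquad \frac{\partial \widetilde{F}_2}{\partial v} = \widetilde{\frac{\partial F_2}{\partial y}} + \widetilde{\frac{\partial F_2}{\partial z}}\,\frac{\partial \phi}{\partial v}, \]
and
\[ \frac{\partial \widetilde{F}_j}{\partial w} = \widetilde{\frac{\partial F_j}{\partial z}} \quad \text{for } j = 1, 2, 3, \]
where the tildes continue to denote composition with G. Substituting these formulas
into (B.31), we obtain
\[ \iint_{\partial R} \mathbf{F} \cdot \mathbf{n}\,dA = \iiint_Q \left( \widetilde{\frac{\partial F_1}{\partial x}} - \frac{\partial \widetilde{F}_1}{\partial u} + \widetilde{\frac{\partial F_2}{\partial y}} - \frac{\partial \widetilde{F}_2}{\partial v} + \widetilde{\frac{\partial F_3}{\partial z}} \right) dV. \tag{B.32} \]
We are almost done. On the one hand, by integrating first with respect to u or
v, we see that
\[ \iiint_Q \frac{\partial \widetilde{F}_1}{\partial u}\,dV = \iiint_Q \frac{\partial \widetilde{F}_2}{\partial v}\,dV = 0, \]
because $\widetilde{F}_1$ and $\widetilde{F}_2$ vanish on the vertical faces of Q. On the other hand, the
transformation G is volume-preserving,
\[ \frac{\partial(x, y, z)}{\partial(u, v, w)} = 1, \]
so by Theorem B.24,
\[ \iiint_Q \left( \widetilde{\frac{\partial F_1}{\partial x}} + \widetilde{\frac{\partial F_2}{\partial y}} + \widetilde{\frac{\partial F_3}{\partial z}} \right) dV = \iiint_Q \widetilde{\operatorname{div} \mathbf{F}}\,dV = \iiint_R \operatorname{div} \mathbf{F}\,dV. \]
Therefore, (B.32) reduces to the desired result:
\[ \iint_{\partial R} \mathbf{F} \cdot \mathbf{n}\,dA = \iiint_R \operatorname{div} \mathbf{F}\,dV. \]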

In conclusion, we remark that these calculations appear more natural if the argu-
ment is recast in the language of differential forms as described in §5.9.
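
The statement of Theorem B.30 can likewise be checked numerically on a simple region. The sketch below is only an illustration: it assumes R is the closed unit ball and F = (x³, y³, z³), for which div F = 3(x² + y² + z²) and both sides equal 12π/5. All names are hypothetical, and both integrals are approximated by the midpoint rule in spherical coordinates.

```python
import math

def F(x, y, z):
    return (x ** 3, y ** 3, z ** 3)

def div_F(x, y, z):
    """Divergence of F, computed by hand."""
    return 3.0 * (x * x + y * y + z * z)

def sphere_point(r, p, t):
    """Point with spherical coordinates (r, p, t); p is the polar angle."""
    return (r * math.sin(p) * math.cos(t),
            r * math.sin(p) * math.sin(t),
            r * math.cos(p))

def flux(n=200):
    """Integral of F . n over the unit sphere; on the unit sphere the outward
    normal at a point equals the point itself, and dA = sin(p) dp dt."""
    hp, ht = math.pi / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        p = (i + 0.5) * hp
        for j in range(n):
            t = (j + 0.5) * ht
            nx, ny, nz = sphere_point(1.0, p, t)
            Fx, Fy, Fz = F(nx, ny, nz)
            total += (Fx * nx + Fy * ny + Fz * nz) * math.sin(p) * hp * ht
    return total

def volume_integral(n=60):
    """Integral of div F over the unit ball, with dV = r^2 sin(p) dr dp dt."""
    hr, hp, ht = 1.0 / n, math.pi / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            p = (j + 0.5) * hp
            for k in range(n):
                t = (k + 0.5) * ht
                x, y, z = sphere_point(r, p, t)
                total += div_F(x, y, z) * r * r * math.sin(p) * hr * hp * ht
    return total

surface_side = flux()
volume_side = volume_integral()
print(surface_side, volume_side, 12.0 * math.pi / 5.0)
```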
Answers to Selected Exercises

CHAPTER 1

Section 1.1
1. $\|\mathbf{x}\| = 2\sqrt{3}$, ∥y∥ = 3, θ = 5π/6.

Section 1.2
1. (a) Not open or closed; $\partial S = \{(0, 0)\} \cup \{(x, y) : x^2 + y^2 = 4\}$.
(b) Closed; $\partial S = \{(x, 0) : 0 \le x \le 1\} \cup \{(x, x^2 - x) : 0 \le x \le 1\}$.
(c) Open; $\overline{S} = \{(x, y) : x \ge 0,\ y \ge 0, \text{ and } x + y \ge 1\}$.
(d) Closed; $S^{\mathrm{int}} = \emptyset$.

Section 1.3
3. f (0, y) = y.
5. Discontinuous only at (0, 0).
7. Continuous at every irrational.

Section 1.4

1. (a) 1/ 2. (b) 0. (c) Diverges.
2. Any K ≥ (19/ϵ) + 5 will work.
3. lim xk = 0.

Section 1.5
1. (a) sup S = 1, inf S = −1.
(b) sup S = 2, inf S = −1.
(c) sup S = ∞, inf S = π/4.
5. lim xk = 2.


CHAPTER 2

Section 2.2
1. (a) $\nabla f(x, y) = (2xy + \pi y \cos \pi xy,\ x^2 + \pi x \cos \pi xy)$;
$[\nabla f(1, -2)] \cdot (\tfrac{3}{5}, \tfrac{4}{5}) = -\tfrac{1}{5}(8 + 2\pi)$.
2. (a) $df = e^{x - y + 3z}\,[(2x + x^2)\,dx - x^2\,dy + 3x^2\,dz]$;
f(1.1, 1.2, −0.1) − f(1, 1, 0) ≈ −0.2.
3. (a) dz = 0.036. (b) z.

Section 2.3
1. (Derivatives of f , g, and h are to be evaluated at the same points as f , g, and h
themselves.) (a) dw/dt = f1 (g1 h′ + g2 ) + f2 h′ + f3 .
(b) ∂x w = f1 + f2 g1 + f3 h1 , ∂y w = f2 g2 , ∂z w = f3 h2 .
(c) dw/dx = f ′ (g1 + g2 h′ ).
2. (a) ∂x w = 2f1 + (sin 3y)f2 + 4x3 f3 , ∂y w = −2yf1 + (3x cos 3y)f2 (f1 and
f2 evaluated at (2x − y 2 , x sin 3y, x4 )).
(c) ∂x w = 2(∂2 f )/(f 2 + 1), ∂y w = (2y∂1 f − ∂2 f )/(f 2 + 1) (f and its
derivatives evaluated at (y 2 , 2x − y)).
6. (a) z = 4x − 3y − 6.
(b) 2x + 4y − 6z = 12.

Section 2.5
1. (a) ∂z/∂x = (1 − 3yz)/(3xy − 3z 2 ), ∂z/∂y = (2y − 3xz)/(3xy − 3z 2 ).
3. dz/dt = (2yzt + 5y 4 t + zteyz )/(10y 4 z 3 + 2z 4 eyz − y 2 eyz − yt2 ).
4. 2x, 2x + 6xz 2 .
5. (∂V /∂h)|r = πr 2 , (∂V /∂h)|S = πr 2 − 2πr 2 h/(2r + h), (∂V /∂S)|r =
r/2, (∂S/∂V )|r = 2/r.

Section 2.6
2. r sin θ cos θ(fyy − fxx ) + r(cos2 θ − sin2 θ)fxy − (sin θ)fx + (cos θ)fy .
3. (a) ∂x2 w = 4f11 + 4(sin 3y)f12 + 16x3 f13 + (sin2 3y)f22 + 8x3 (sin 3y)f23 +
16x6 f33 +12x2 f3 , ∂x ∂y w = −4yf11 +(6x cos 3y−2y sin 3y)f12 −8x3 yf13 +
3x(sin 3y cos 3y)f22 + 12x4 (cos 3y)f23 + 3(cos 3y)f2 .

Section 2.7
1. (b) 1/24.
2. (a) $P_{1,3}(h) = h - \tfrac{1}{2}h^2 + \tfrac{1}{3}h^3$, C = 4.
(b) $P_{1,3}(h) = 1 + \tfrac{1}{2}h - \tfrac{1}{8}h^2 + \tfrac{1}{16}h^3$, $C = 5 \cdot 2^{-7/2}$. (Note: These C’s come
from Lagrange’s formula and may not be optimal.)

4. 0.747.
5. (a) $x^2 + xy - \tfrac{1}{6}(x^4 + 3x^3y + 3x^2y^2 + xy^3)$.
(b) $1 + xy - \tfrac{1}{2}(x^4 + x^2y^2 + y^4)$.
6. P(3,1),3 (h, k) = 2 + h + 3k + hk + 12 (π 2 − 3)k2 − 21 hk2 + k3 .
7. P(1,2,1),3 (h, k, l) = 3 + 4h + k + l + 2h2 + 2hk + h2 k.

Section 2.8
1. (a) (0, −2)
√ and (0, 1) minima, √ (0, 0) saddle. √ √
(b) (±1, 2) minima, (0, − 2) maximum, (±1, − 2) and (0, 2) saddles.
(c) (1, ±1) and (0, 0) saddles, ( 32 , 0) minimum.
(e) (0, 0) minimum, (±1, 0) maxima, (0, ±1) saddles.
(f) ((a2 /b)1/3 , (b2 /a)1/3 ), a minimum if a and b have the same sign and a
maximum otherwise.

Section 2.9
1. min = − 12 , max = 4.
2. min = −4, max = 16 .
√5 √
3. min = (308 − 62 31)/27, max = 2/3 3.
4. min = − 85 3 , max = 56.
5. A2 /(1 + b2 + c2 ).
6. min = 0, max = 2/e.
7. min = −2/e, max = 1/e.
8. 3(12)1/3
9. min = 1, √ max√= 3.

11. ( a + b + c)2 .
12. 3V 1/3 .
13. ( 21 , 12 , 0).

14. Vmax = A3/2 /6 3.
15. (2, 0, 2).
16. a2 b2 .

Section 2.10
1. $\partial(u, v)/\partial(x, y) = 3xy^2 z^2 - yz^3 + 24y^3$, $\partial(u, v)/\partial(x, z) = -y^2 z^2 - 6xy^3 z$,
$\partial(u, v)/\partial(y, z) = xyz^2 + 8y^2 - 12x^2 y^2 z$.
2. $\partial(u, v)/\partial(x, y) = 3x - 18y$, $\partial(v, w)/\partial(x, y) = -6x^2 - 18y^2$,
$\partial(u, w)/\partial(x, y) = -12x - 6y$.
3. (b) $\begin{pmatrix} -15 & -20 \\ 3 & 4 \\ 2 & 4 \end{pmatrix}$.
4. (b) $\begin{pmatrix} 8 & 6 & -21 \\ 18 & 10 & -43 \end{pmatrix}$.

CHAPTER 3

Section 3.1
3. y yes; z no.
5. ∂2 F (0, 0) ≠ 0 and ∂1 F (0, 0) ≠ −1.
6. Can solve for x and y or y and z.
7. Can solve for any pair.
9. Yes.

Section 3.2
1. (a), (c), (f) are smooth curves.
3. (a), (c), (e) are smooth curves.

Section 3.3
1. (a) Plane.
(b) Elliptic cone.
(c) Hyperboloid of revolution.
(d) Paraboloid of revolution.
2. (a) 2x − y − z = 3. (b) x − y = 3.
3. (a) One possibility: f (u, v) = (u cos v, u sin v, f (u)) (a < u < b, |v| ≤ π).
4. (a) One possibility: f (t) = (1 + t, 13 + t, 83 + t).

5. (a) One possibility: $f(t) = \tfrac12(1 + \cos t,\ \sqrt2 \sin t,\ 1 - \cos t)$. (b) One possibility:
$f(t) = \tfrac12(1 + t,\ -\sqrt2,\ 1 - t)$.

Section 3.4
1. (a) $\det Df = e^{2x}$; $x = \tfrac12\log(u^2 + v^2)$; $y$ is given up to multiples of $2\pi$ by
$\arctan(v/u)$ when $u > 0$, $\tfrac12\pi - \arctan(u/v)$ when $v > 0$, $\pi + \arctan(v/u)$
when $u < 0$, $\tfrac32\pi - \arctan(u/v)$ when $v < 0$.
2. (a) $(x, y) = \tfrac13(2v - u,\ v - 2u)$.
4. (d) $g(u, v) = \tfrac12\bigl(u - \sqrt{u^2 + 4v},\ -u - \sqrt{u^2 - 4v}\bigr)$.

Section 3.5
1. One relation for (a), (c), and (e); two relations for (d).

CHAPTER 4

Section 4.3

1. (a) 54 . (b) 32 35 (5 − 2).
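2. $\tfrac1{20}$.
3. (a) $\int_{-2}^{0}\int_{4x}^{x^3} f(x, y)\,dy\,dx$, $\int_{-8}^{0}\int_{y^{1/3}}^{y/4} f(x, y)\,dx\,dy$.
(b) $\int_{0}^{2}\int_{x/3}^{x} f(x, y)\,dy\,dx + \int_{2}^{3}\int_{x/3}^{4-x} f(x, y)\,dy\,dx$,
$\int_{0}^{1}\int_{y}^{3y} f(x, y)\,dx\,dy + \int_{1}^{2}\int_{y}^{4-y} f(x, y)\,dx\,dy$.
4. (a) $\int_{0}^{1}\int_{y^3}^{y^{1/2}} f(x, y)\,dx\,dy$.
(b) $\int_{-1}^{0}\int_{-x}^{1} f(x, y)\,dy\,dx + \int_{0}^{2}\int_{x/2}^{1} f(x, y)\,dy\,dx$.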
5. (a) 58 e6 − 17 e2 . (b) 13 (sin 2 − sin 1). (c) 21 e2 − e.
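6. $\int_{0}^{1} f(y)\sqrt{y/2}\,dy + \int_{1}^{2} f(y)\bigl(\sqrt{y/2} - y + 1\bigr)\,dy$.
8. (a) $\int_{-1}^{1}\int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}}\int_{x^2+y^2}^{1} f\,dz\,dy\,dx$.
(b) $\int_{-1}^{1}\int_{x^2}^{1}\int_{-\sqrt{z-x^2}}^{\sqrt{z-x^2}} f\,dy\,dz\,dx$.
(c) $\int_{0}^{1}\int_{-\sqrt z}^{\sqrt z}\int_{-\sqrt{z-y^2}}^{\sqrt{z-y^2}} f\,dx\,dy\,dz$.
9. (b) $\int_{0}^{1}\int_{0}^{\sqrt{1-x}}\int_{0}^{y} f\,dz\,dy\,dx$.
(c) $\int_{0}^{1}\int_{0}^{\sqrt{1-x}}\int_{z}^{\sqrt{1-x}} f\,dy\,dz\,dx$.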
10. $\tfrac14(a, b, c)$.
11. mass = 8, center of mass = (1, 43 , 43 ).
12. $-\tfrac{126}{5}$.

Section 4.4
1. 3π/2.
2. ( π1 , 0, 34 ).

3. $4\pi(\tfrac83 - \sqrt3)$.
4. $2\pi - \tfrac{32}{9}$.
5. $\tfrac12\pi cR^2h^2$.
6. 5π/3.
7. $\pi cR^4/3$.
8. $(\tfrac38, \tfrac38, \tfrac38)$.
9. $\tfrac1{14}(55, -5)$.
10. $\tfrac4{81}$.
11. $\pi/3\sqrt3$.
12. $A = \tfrac32\log 4$, $x = \dfrac{14}{9\log 4}$, $y = \dfrac{28}{9\log 4}$.
13. 3.
14. 3.
15. $\tfrac12\pi^2R^4$.

Section 4.5
2. (a) $\dfrac1x \log\dfrac{1 + e^x}{1 + x}$.
(b) $(2x)^{-1}(5\cos x^5 - \cos x)$.
(c) $x^{-1}(2e^{3x^2} - e^x)$.

Section 4.6
1. (a) Converges. (b) Diverges. (c) Converges. (d) Converges. (e) Diverges.
2. (a) Converges. (b) Diverges. (c) Converges. (d) Converges. (e) Diverges.
3. (a) Converges. (b) Diverges. (c) Converges. (d) Diverges. (e) Converges.
(f) Diverges.
4. (b) p > 1.
5. (b) p > 1.
10. $-\tfrac12\log 3$.

Section 4.7
2. (a) Diverges. (b) $\tfrac14\pi$. (c) $2\pi/3$. (d) $\tfrac12\sqrt\pi$. (e) Diverges.

CHAPTER 5

Section 5.1

1. (a) $2\pi\sqrt{a^2 + b^2}$. (b) $\tfrac{14}{3}$. (c) $e^2$. (d) 24.
2. (a) $4aE\bigl(\sqrt{1 - (b/a)^2}\bigr)$. (b) $2^{3/2}E(2^{-1/2})$.
3. $\bigl(0,\ (2 + \sinh 2)/4\sinh 1\bigr)$.


4. 32 [(1 + 4π 2 )3/2 − 1].
5. (a) 1. (b) 23 9856
21 . (c) −2π. (d) 45 .
6. (a) 2 (1 − e ) + (2/π). (b) −π. (c) − 43 .
1 −1

Section 5.2
1. (c) 12. (d) 0.
2. $\tfrac{15}{2}\pi$.
3. The circle $x^2 + y^2 = 1$.
4. $3\pi R^2$.

Section 5.3
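1. $\tfrac23\pi[(1 + a^2)^{3/2} - 1]$.
2. $\tfrac16\pi[(1 + 4a^2)^{3/2} - 1]$.
3. $4\pi^2 ab$.
4. $2\pi a^2 + \dfrac{2\pi ab^2}{\sqrt{a^2 - b^2}} \log\Bigl(\dfrac{a + \sqrt{a^2 - b^2}}{b}\Bigr)$ if $a > b$,
$2\pi a^2 + \dfrac{2\pi ab^2}{\sqrt{b^2 - a^2}} \arcsin\Bigl(\dfrac{\sqrt{b^2 - a^2}}{b}\Bigr)$ if $b > a$.
5. $(0, 0, \tfrac12)$.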
6. 20π/3.
7. 0.
8. (a) − 179 . (b) 0. (c) 2. (d) π(b2 − a2 ). (e) π(25/2 − 72 ).

Section 5.4
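1. (a) $\operatorname{curl}\mathbf F = x\mathbf i - y\mathbf j + (y - 2xy)\mathbf k$, $\operatorname{div}\mathbf F = x + y^2$.
(b) $\operatorname{curl}\mathbf F = \mathbf 0$, $\operatorname{div}\mathbf F = -x(y^2 + z^2)\sin yz$.
(c) $\operatorname{curl}\mathbf F = (1 - 4xy)\mathbf i - (x^2 - 3z^2)\mathbf j + 4yz\mathbf k$, $\operatorname{div}\mathbf F = 0$.
2. (a) 0. (b) $2x - 24yz$. (c) $a(a + n - 2)|\mathbf x|^{a-2}$. (d) 0.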

Section 5.5
1. (c) $3a^4$. (d) $4\pi(a^2b^2 + b^2c^2 + a^2c^2)/3abc$. (e) $3A$.
2. $4\pi a^5$.
6. (a) $-\mathbf x/|\mathbf x|^3$.

Section 5.6
3. (a) $2\rho(x\mathbf i + y\mathbf j)/(x^2 + y^2)$.

Section 5.7
1. 2π.
2. $-\pi a^2/\sqrt2$.
4. 0.
5. 0.
7. $5 + 3\pi(r^2 - 1)$.

Section 5.8
1. (a) $x^2y + \tfrac13x^3 - \tfrac13y^3 + C$.
(b) Not a gradient.
(c) $e^{2x}\sin y - 3xy + 5x + C$.
(d) $xyz + \cos xy + \sin yz + C$.
(e) Not a gradient.
(f) $x^2y + (y + 2)\log z + C$.
(g) $\tfrac12(xw + yz)^2 - e^{2y+z} + \cos zw + C$.
2. (a) Not a curl.
(b) $\tfrac12xz^2\,\mathbf i - (xyz + \tfrac12x^2 + \tfrac12z^2)\,\mathbf j + \nabla f$.
(c) $(5yz + z^2)\,\mathbf i + \bigl(6xz - x\int_0^z e^{-x^2t^2}\,dt\bigr)\,\mathbf j + \nabla f$.

CHAPTER 6

Section 6.1
1. (a) $-1 - 2^{-1/3} < x < -1 + 2^{-1/3}$; $(2x + 2)/[1 - 2(x + 1)^3]$.
(b) $x < -\sqrt2$ or $x > \sqrt2$; $10/(x^2 - 2)$.
(c) $x > 0$; $\tfrac12(1 + x^{-1})$.
(d) $e^{-1} < x < e$; $\log x/(1 - \log x)$.
2. (a) Diverges. (b) 1. (c) Diverges. (d) Diverges.

Section 6.2
1. Converges.
2. Converges.
3. Diverges.
4. Converges.
5. Diverges.
6. Converges.
7. Diverges.
8. Diverges.
9. Converges.
10. Converges.
11. Diverges.
12. Converges.
13. Converges.
14. Diverges.
15. Diverges.
16. Converges.
17. Converges.
18. Diverges.
21. p > 1.

Section 6.4
1. Converges absolutely for −3 ≤ x ≤ −1.
2. Converges absolutely for 0 < x < 1.
3. Converges absolutely for all x.
4. Converges absolutely for −5 < x < 5, conditionally for x = −5.
5. Converges absolutely for 2 < x < 6, conditionally for x = 6.
6. Converges absolutely for x > 0, conditionally for x = 0.
7. Converges absolutely for 4 < x < 8.
8. Converges absolutely for −2 < x < 0, conditionally for x = −2 and x = 0.
9. Converges absolutely for − 32 < x < 23 , conditionally for x = − 32 .
10. Converges conditionally.
11. Converges conditionally.
12. Diverges.
13. Converges absolutely.
14. Converges conditionally.
18. Converges when |x| < 1 and θ ∈ R, when x = 1 and θ ≠ 2kπ, or when
x = −1 and θ ≠ (2k + 1)π.

CHAPTER 7
Section 7.1
1. (a) Uniform convergence on [0, 1 − δ] (δ > 0).
(b) Uniform convergence on [δ, 1] (δ > 0).
(c) Uniform convergence on $[0, \tfrac12\pi - \delta]$ and $[\tfrac12\pi + \delta, \pi]$ ($\delta > 0$).
(d) Uniform convergence on R.
(e) Uniform convergence on [δ, ∞) (δ > 0).
(f) Uniform convergence on [0, b] (b < ∞).
(g) Uniform convergence on [0, 1 − δ] and [1 + δ, ∞) (δ > 0).
2. (a) Uniform convergence on [δ, ∞) (δ > 0).
(b) Uniform convergence on [−1, 1].
(c) Uniform convergence on [−2 + δ, 2 − δ] (δ > 0).
(d) Uniform convergence on R.
(e) Uniform convergence on R.
(f) Uniform convergence on [1 + δ, ∞) (δ > 0).

Section 7.3
5. (a) $\sum_{0}^{\infty} \dfrac{(-1)^n x^{2n+1}}{n!\,(2n + 1)}$, $x \in \mathbb R$.
(b) $\sum_{0}^{\infty} \dfrac{(-1)^n x^{4n+1}}{(2n)!\,(4n + 1)}$, $x \in \mathbb R$.
(c) $\sum_{1}^{\infty} \dfrac{(-1)^{n-1}(2x)^n}{n^2}$, $|x| \le \tfrac12$.
10. (a) $e^x + x^{-1}(1 - e^x)$.
(b) $\int_0^x t^{-2}(1 - \cos t)\,dt$.
(c) $x^{-1}\int_0^x t^{-1}(e^t - 1)\,dt$.
(d) cos x − x sin x.

Section 7.5
4. $\dfrac{1 \cdot 3 \cdots (2n - 3)}{2 \cdot 4 \cdots (2n - 2)} \cdot \dfrac{\pi}{2x^{(2n-1)/2}}$.
Section 7.6
√ ' 3 √
3. (a) 83 π. (b) 12 π/27. (c) 16 π.
5. Γ((a + 1)/b)Γ(c + 1)/bΓ(c + 1 + (a + 1)/b).
7. $\dfrac{1 \cdot 3 \cdots (k - 1)}{2 \cdot 4 \cdots k} \cdot \dfrac{\pi}{2}$ if $k$ is even, $\dfrac{2 \cdot 4 \cdots (k - 1)}{3 \cdot 5 \cdots k}$ if $k$ is odd (and $k > 1$).
10. (a) Diverges. (b) Converges.

CHAPTER 8

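Section 8.1
1. $\dfrac{4}{\pi} \sum_{1}^{\infty} \dfrac{\sin(2m - 1)\theta}{2m - 1}$.
2. $\tfrac12 - \tfrac12\cos 2\theta$.
3. $\dfrac{2}{\pi} - \dfrac{4}{\pi} \sum_{1}^{\infty} \dfrac{\cos 2m\theta}{4m^2 - 1}$.
4. $\dfrac{\pi^2}{3} + 4\sum_{1}^{\infty} \dfrac{(-1)^n}{n^2}\cos n\theta$.
5. $\dfrac{\sinh b\pi}{\pi} \sum_{-\infty}^{\infty} \dfrac{(-1)^n}{b - in}\, e^{in\theta}$.
6. $\dfrac{8}{\pi} \sum_{1}^{\infty} \dfrac{\sin(2m - 1)\theta}{(2m - 1)^3}$.
7. $\dfrac{2}{a(\pi - a)} \sum_{1}^{\infty} \dfrac{\sin na}{n}\cos n\theta$.
8. $\dfrac{1}{2\pi} + \dfrac{2}{\pi} \sum_{1}^{\infty} \dfrac{1 - \cos na}{n^2a^2}\cos n\theta$.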

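Section 8.2
1. (a) $\sum_{1}^{\infty} \dfrac{\sin 2n\theta}{n}$.
(b) $1 - \dfrac{2}{\pi} \sum_{1}^{\infty} \dfrac{\sin 2n\theta}{n}$.
2. $\dfrac{\pi^2}{3} + 4\sum_{1}^{\infty} \dfrac{(-1)^n}{n^2}\bigl(\cos\tfrac14 n\pi \cos n\theta + \sin\tfrac14 n\pi \sin n\theta\bigr)$.
3. (a) $\dfrac12 + \dfrac{2}{\pi} \sum_{1}^{\infty} \dfrac{\sin(2m - 1)\theta}{2m - 1}$.
(b) $\dfrac{1}{\pi} - \dfrac{2}{\pi} \sum_{1}^{\infty} \dfrac{\cos 2m\theta}{4m^2 - 1} + \dfrac12\sin\theta$.
(c) $\dfrac{1}{2\pi} + \dfrac{1}{\pi} \sum_{1}^{\infty} \dfrac{\sin na}{na}\cos n\theta$.
(d) $\dfrac{2\sinh\pi}{\pi} \sum_{1}^{\infty} \dfrac{(-1)^{n-1} n}{n^2 + 1}\sin n\theta$.
4. (a) $\tfrac12$, $\tfrac14(\pi - 2)$.
(b) $\tfrac16\pi^2$, $\tfrac1{12}\pi^2$.
(c) $(\pi b\,\operatorname{csch}\pi b - 1)/2b^2$, $(\pi b\coth\pi b - 1)/2b^2$.
(d) $\tfrac1{32}\pi^3$.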

Section 8.3
2. (b) $\dfrac{2}{a(\pi - a)} \sum_{1}^{\infty} \dfrac{\sin na}{n^2}\sin n\theta$.
6. (a) k = 6.
(b) k = ∞.
(c) k = 0, i.e., the function is merely continuous. (It is known to be nowhere
differentiable.)

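Section 8.4
1. (a) 1; $\dfrac{4}{\pi} \sum_{1}^{\infty} \dfrac{\sin(2m - 1)\theta}{2m - 1}$.
(b) $\dfrac{2}{\pi} - \dfrac{4}{\pi} \sum_{1}^{\infty} \dfrac{\cos 2m\theta}{4m^2 - 1}$; $\sin\theta$.
(c) $\dfrac{\pi^2}{3} + 4\sum_{1}^{\infty} \dfrac{(-1)^n}{n^2}\cos n\theta$; $2\pi\sum_{1}^{\infty} \dfrac{(-1)^{n+1}}{n}\sin n\theta - \dfrac{8}{\pi}\sum_{1}^{\infty} \dfrac{\sin(2m - 1)\theta}{(2m - 1)^3}$.
(d) $\dfrac{\pi}{4} - \dfrac{2}{\pi}\sum_{1}^{\infty} \dfrac{\cos(4m - 2)\theta}{(2m - 1)^2}$; $\dfrac{4}{\pi}\sum_{1}^{\infty} (-1)^{m+1}\dfrac{\sin(2m - 1)\theta}{(2m - 1)^2}$.
2. (a) $\dfrac{4}{\pi} \sum_{1}^{\infty} \dfrac{\sin(2m - 1)\pi x}{2m - 1}$.
(b) $\dfrac{4}{\pi} \sum_{1}^{\infty} \dfrac{(-1)^{m+1}\cos(\tfrac12 m - \tfrac14)\pi x}{2m - 1}$.
(c) $\dfrac{8l^2}{\pi^3} \sum_{1}^{\infty} \dfrac{\sin(2m - 1)\pi x/l}{(2m - 1)^3}$.
(d) $(e - 1) \sum_{-\infty}^{\infty} \dfrac{e^{2\pi inx}}{1 - 2\pi in}$.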

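Section 8.5
1. (a) $u(x, t) = 50 - \dfrac{400}{\pi^2} \sum_{1}^{\infty} \dfrac{1}{(2m - 1)^2}\, e^{-(0.00011)(2m-1)^2\pi^2 t} \cos\dfrac{(2m - 1)\pi x}{100}$.
2. $u(x, t) = \sum_{-\infty}^{\infty} c_n \exp[in\theta - n^2kt]$ where $f(\theta) = \sum_{-\infty}^{\infty} c_n e^{in\theta}$.
3. $b_n(t) = \exp(-n^2\pi^2kt/l^2)\Bigl[b_n(0) + \int_0^t \beta_n(s)\exp(n^2\pi^2ks/l^2)\,ds\Bigr]$.
4. (a) $u(x, t) = \dfrac{2l^2m}{\pi^2 b(l - b)} \sum_{1}^{\infty} \dfrac{1}{n^2}\sin\dfrac{n\pi b}{l}\sin\dfrac{n\pi x}{l}\cos\dfrac{n\pi ct}{l}$.
5. $u(x, t) = \sum_{1}^{\infty} e^{-\delta t}(b_n\cos\omega_n t + B_n\sin\omega_n t)\sin(n\pi x/l)$,
where $\omega_n^2 = (n\pi c/l)^2 - \delta^2$.
6. (b) $u(x, y) = \sum_{1}^{\infty} \Bigl[a_n^1\,\dfrac{\sinh(n\pi(L - y)/l)}{\sinh(n\pi L/l)} + a_n^2\,\dfrac{\sinh(n\pi y/l)}{\sinh(n\pi L/l)}\Bigr]\sin(n\pi x/l)$,
where $f_1(x) = \sum_{1}^{\infty} a_n^1\sin(n\pi x/l)$ and $f_2(x) = \sum_{1}^{\infty} a_n^2\sin(n\pi x/l)$.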

Section 8.6
3. $a = -\tfrac12$, $b = -1$, $c = \tfrac16$.
9. (a) $\pi^4/90$.
(b) $\pi^6/960$.
(c) $\pi^8/9450$.
(d) $\tfrac12a(\pi - a)$ if $0 \le a \le \pi$, $\pi$-periodic as a function of $a$.

Bibliography

[1] H. Anton, Elementary Linear Algebra (7th ed.), John Wiley, New York, 1994.

[2] R. G. Bartle, Return to the Riemann integral, Amer. Math. Monthly 103
(1996), 625–632.

[3] H. S. Bear, A Primer of Lebesgue Integration, Academic Press, San Diego,
1995.

[4] G. Birkhoff and S. Mac Lane, A Survey of Modern Algebra (5th ed.),
A K Peters, Wellesley, MA, 1997.

[5] J. D. DePree and C. W. Swartz, Introduction to Real Analysis, John Wiley,
New York, 1988.

[6] G. B. Folland, Fourier Analysis and its Applications, Brooks/Cole, Pacific
Grove, CA, 1992.

[7] J. H. Hubbard and B. B. Hubbard, Vector Calculus, Linear Algebra, and
Differential Forms, Prentice-Hall, Upper Saddle River, NJ, 1999.

[8] T. W. Hungerford, Abstract Algebra: an Introduction (2nd ed.), Saunders
College Publishing, Fort Worth, 1997.

[9] B. F. Jones, Lebesgue Integration on Euclidean Space, Jones and Bartlett,
Boston, 1993.

[10] D. W. Kammler, A First Course in Fourier Analysis, Prentice Hall, Upper
Saddle River, NJ, 2000.

[11] T. W. Körner, Fourier Analysis, Cambridge University Press, Cambridge, UK,
1988.

[12] S. G. Krantz, Real Analysis and Foundations, CRC Press, Boca Raton, FL,
1991.

[13] J. C. Lagarias, The 3x + 1 problem and its generalizations, Amer. Math.
Monthly 92 (1985), 3–23.

[14] P. D. Lax, Change of variables in multiple integrals, Amer. Math. Monthly 106
(1999), 497–501.

[15] P. D. Lax, Change of variables in multiple integrals II, Amer. Math. Monthly
108 (2001), 115–119.

[16] D. C. Lay, Linear Algebra and its Applications (2nd ed.), Addison-Wesley,
Reading, MA, 1997.

[17] J. W. Lewin, A truly elementary approach to the bounded convergence
theorem, Amer. Math. Monthly 93 (1986), 395–397.

[18] W. Rudin, Principles of Mathematical Analysis (3rd ed.), McGraw-Hill, New
York, 1976.

[19] S. H. Weintraub, Differential Forms: A Complement to Vector Calculus,
Academic Press, San Diego, 1997.

Index

Abel summability, 331 piecewise smooth, 222


Abel’s test, 306 boundary point, 10
Abel’s theorem, 330 bounded convergence theorem, 191
absolute convergence bounded sequence, 25
of a series, 295, 308 bounded set, 9
of an improper integral, 196
accumulation point, 23 C 1 , 58
adjoint of a matrix, 408 C k , 78
affine map, 406 C ∞ , 78
alternating series, 301 Cartesian product, 158
alternating series test, 301 Cauchy principal value, 199
angle, 6 Cauchy product, 309
arc length, 213, 219 Cauchy sequence, 27
arcwise connected set, 35 uniformly, 315
area, 163, 164, 230 Cauchy’s inequality, 5
inner, 164 center of gravity, 175
outer, 164 centroid, 175
average of a function, 166 chain rule, 62, 64, 109
characteristic function, 160
ball, 9 characteristic polynomial, 417
basis, 414 closed interval, 2
orthogonal, 398 closed set, 10
standard, 406 closure, 10
Bessel function, 332 cofactor expansion, 412
Bessel’s inequality, 364 column space, 415
beta function, 344 column vector, 408
binomial series, 328 compact set, 30, 32
Bolzano-Weierstrass theorem, 30 complement, 9
bound completeness, 24
lower, 24 composition of mappings, 3
upper, 24 conditional convergence, 296
boundary, 10, 253 conjugate of a complex number, 3


connected set, 34 decreasing sequence, 25


arcwise, 35 degenerate critical point, 98
conservative vector field, 258 density, 174
content (zero), 154, 161, 165 derivative, 44
continuity directional, 60
at a point, 14 exterior, 275
Hölder, 40 Fréchet, 108
on a set, 14 left-hand, 45
piecewise, 356 normal, 227
separate, 19 outward normal, 241
uniform, 39 partial, 53
convergence mixed, 78
absolute pure, 78
of a series, 295, 308 right-hand, 45
of an improper integral, 196 derived series, 326
conditional, 296 determinant, 411
in norm, 395 differentiability, 44, 55, 108
mean-square, 395 differential, 59
of a sequence, 21 differential form, 218, 268–275
of a series, 280 dimension of a subspace, 414
of an improper integral, 194 directional derivative, 60
of improper integral, 197 Dirichlet kernel, 365
pointwise, 311, 317 Dirichlet problem, 391
uniform, 314, 317, 336 Dirichlet’s test
convex set, 71 for improper integrals, 202
covering, 32 for series, 303
Cramer’s rule, 417 disconnected set, 33
critical point, 95 disconnection, 34
degenerate, 98 distance
cross product, 7 between two points, 6
curl of a vector field, 237 between two sets, 33, 428
curve divergence
piecewise smooth, 214 of a sequence, 21
simple closed, 222 of a series, 280
smooth, 123 of a vector field, 237
cylindrical coordinates, 183 of an improper integral, 194
of improper integral, 197
Darboux’s theorem, 53 divergence theorem, 239
decreasing function, 46 domain of a mapping, 3

duplication formula, 346 Green’s theorem, 223

echelon form, 410 Hölder continuity, 40


reduced, 410 half-open interval, 2
eigenbasis, 417 heat equation, 247, 382
eigenvalue, 417 Heine-Borel theorem, 32
eigenvector, 417 Henstock-Kurzweil integral, 210
electric field, 247 Hessian, 96
elementary matrix, 409 homogeneity, 68
elementary row operation, 409
elliptic integral, 221 identity matrix, 408
Euclidean space, 4 imaginary part, 3
Euler’s theorem, 68 implicit function theorem, 115, 118
Euler-Mascheroni constant, 295 improper integral, 193
exterior derivative, 275 increasing function, 46
exterior product, 270 increasing sequence, 25
extreme value theorem, 31 induction, 20
inequality
factorial, 2 Bessel’s, 364
Fibonacci sequence, 20 Cauchy’s, 5
flux, 233 isoperimetric, 402
Fourier coefficient, 358 infimum, 24
Fourier cosine series, 378 infinite product, 284
Fourier series, 358 infinite series, 280
Fourier sine series, 379 inner product, 394
Fréchet derivative, 108 integrability, 149, 159, 160
fractional integral, 349 square, 396
Fubini’s theorem, 169 integral
function, 3 elliptic, 221
functional dependence, 141 fractional, 349
functional equation, 343 gauge, 210
fundamental theorem of calculus, 155 Henstock-Kurzweil, 210
improper, 193
gamma function, 342 iterated, 169
gauge integral, 210 Lebesgue, 209
Gauss’s theorem, 239 line, 217
Gaussian elimination, 410 lower, 149, 159
geometric series, 281 Riemann, 149, 159
gradient, 55 surface, 233
Green’s formulas, 241 upper, 149, 159

integral test, 285 locus, 120


interior, 10 lower bound, 24
interior point, 10
intermediate value theorem, 35 manifold, 131
inverse map, 3
of a mapping, 3 mapping, 3
of a matrix, 408 invertible, 3
inverse mapping theorem, 137 linear, 406
invertible mapping, 3 one-to-one, 3
invertible matrix, 408 matrix, 407
isoperimetric inequality, 402 invertible, 408
iterated integral, 169 symmetric, 418
Maxwell’s equations, 250
Jacobian, 110 mean value of a function, 166
Jordan measurability, 162 mean value theorem, 46
for integrals, 166, 167
l’Hôpital’s rule, 47, 49 generalized, 47
Lagrange multiplier, 103 mean-square convergence, 395
Lagrange’s method, 103 measurability, 162, 208, 209
Lambert series, 319 measure, 208
Laplace transform, 342 moment of inertia, 175
Laplace’s equation, 250 monotone sequence, 25
Laplacian, 82, 238 monotone sequence theorem, 25
least-squares fit, 105 multi-index, 82
Lebesgue integral, 209
Lebesgue measurability, 208, 209 neighborhood, 10
Lebesgue measure, 208 nested interval theorem, 26
left-hand derivative, 45 nonsingular matrix, 408
limit norm
of a function, 13 of a function, 394
one-sided, 13 of a linear map, 109
limit comparison test, 288 of a vector, 5
limit inferior, 29 normal component, 233
limit superior, 29 normal derivative, 227
line integral, 217 nullspace, 415
linear combination, 405
linear dependence, 413 one-to-one mapping, 3
linear mapping, 406 open interval, 2
linear span, 405 open set, 10
local maximum or minimum, 95 order of a multi-index, 83

ordering of a double series, 307 Cartesian, 158


orientation Cauchy, 309
of a curve, 214 cross, 7
positive, 222 exterior, 270
orthogonal basis, 398 infinite, 284
orthogonal complement, 414 inner, 394
orthogonality of matrices, 407
of a sequence, 394 pullback, 269
of functions, 394 Pythagorean theorem, 8
of vectors, 6
orthonormal sequence, 394 Raabe’s test, 291
radius of convergence, 323
orthonormality, 413
range of a mapping, 3
Ostrogradski’s theorem, 239
rank of a matrix, 415
outward normal derivative, 241
ratio test, 289, 300
real part, 3
Parseval’s identity, 398
rearrangement of a series, 298
partial derivative, 53
rectangle, 158
partial sum, 280
rectifiable curve, 219
partition
recursion, 20
of a rectangle, 159
refinement of a partition, 148
of an interval, 148
region
of unity, 435
regular, 222
pathwise connected set, 35
simple, 223, 239
periodic function, 355
regular region, 222
piecewise continuity, 356
renormalization, 252
piecewise smoothness
Riemann integrability, 149, 159
of a curve, 214
Riemann integral, 149
of a function, 214, 366
Riemann sum, 156
of boundary, 222 lower, 148, 159
pointwise convergence, 311, 317 upper, 148, 159
Poisson equation, 250 right-hand derivative, 45
Poisson integral formula, 392 Rolle’s theorem, 46
Poisson kernel, 370 root test, 289, 300
positive orientation, 222 row reduction, 410
potential, 247, 259 row space, 415
power series, 323 row vector, 408
principal axis theorem, 418
principal value, 199 saddle point, 97
product sawtooth wave, 359

separate continuity, 19 strictly decreasing function, 46


separation of variables, 382 strictly increasing function, 46
sequence, 19 strophoid, 125
bounded, 25 subsequence, 26
Cauchy, 27 subspace, 414
decreasing, 25 sum of a series, 280
doubly infinite, 20 summation by parts, 303
Fibonacci, 20 support of a function, 433
finite, 20 supremum, 24
increasing, 25 surface integral, 233
monotone, 25 symmetric matrix, 418
orthogonal, 394
orthonormal, 394 tangent line, 51
series tangent plane, 56
alternating, 301 Taylor polynomial, 85
binomial, 328 Taylor remainder, 85
derived, 326 Taylor series, 282
Fourier, 358 Taylor’s theorem
Fourier cosine, 378 in several variables, 91
Fourier sine, 379 with integral remainder, 85, 86
geometric, 281 with Lagrange’s remainder, 88
infinite, 280 telescoping series, 283
Lambert, 319 term of a series, 280
power, 323 test
Taylor, 282 Abel’s, 306
telescoping, 283 alternating series, 301
simple closed curve, 222 Dirichlet’s
simple region, 223, 239 for improper integrals, 202
smooth curve, 123 for series, 303
smooth surface, 128 integral, 285
spectral theorem, 418 limit comparison, 288
sphere, 9 M, 317
spherical coordinates, 184 Raabe’s, 291
square wave, 362 ratio, 289, 300
square-integrability, 396 root, 289, 300
standard basis, 406 Weierstrass, 317
standardized function, 367 theorem
Stirling’s formula, 352 Abel’s, 330
Stokes’s theorem, 253, 277 Bolzano-Weierstrass, 30

bounded convergence, 191


Darboux’s, 53
divergence, 239
Euler’s, 68
extreme value, 31
Fubini’s, 169
fundamental (of calculus), 155
Gauss’s, 239
Green’s, 223
Heine-Borel, 32
intermediate value, 35
mean value, 46
monotone sequence, 25
nested interval, 26
Ostrogradski’s, 239
Pythagorean, 8
Rolle’s, 46
Stokes’s, 253, 277
transformation, 3, 133
transpose of a matrix, 408
triangle inequality, 5
triangle wave, 359

uniform continuity, 39
uniform convergence
of a sequence, 314
of a series, 317
of an improper integral, 336
upper bound, 24

vector field, 211


conservative, 258
vector potential, 263
velocity, 51

Wallis’s formula, 349


wave equation, 251, 384
Weierstrass M-test, 317
work, 218

zero content, 154, 161, 165


SUMMARY OF BASIC LOGIC

This appendix is a very brief summary of the basic language and principles
of mathematical logic. More extensive treatments can be found in many
places, such as The Tools of Mathematical Reasoning by Tamara J. Lakins
(American Mathematical Society, 2016).
Statements. Mathematics deals with statements (or assertions or propositions)
that have a definite truth value: they must be either true or false.
In this discussion we use the letters P and Q to denote such statements. For
example, P could stand for the statement “5 > 2” (true) and Q could stand
for the statement “every odd number is divisible by 3” (false). Statements
can be quite complex objects built up out of simpler statements. For example,
the statement “every real number x can be written as n + y where n is
an integer and 0 ≤ y < 1” is built from the statements “x = n + y,” “n is
an integer,” and “0 ≤ y < 1” together with a couple of quantifiers. (See (2)
below.)
The Fundamental Operations. The basic logical operations to create
new statements from old ones are defined by the English words “and,” “or,”
and “not,” which logicians like to indicate by the symbols ∧, ∨, and ¬. If P
and Q are statements, the statement P ∧ Q is true precisely when P and Q
are both true; the statement P ∨ Q is true precisely when either P or Q is
true (or both);¹ and ¬P is true precisely when P is false.
Observe that the negation ¬ interchanges ∧ and ∨. If it is not the case
that P and Q are both true, then one or the other (or both) must be false;
and if it is not the case that P is true or Q is true, then both must be false:

(1)    ¬(P ∧ Q) ≡ ¬P ∨ ¬Q,    ¬(P ∨ Q) ≡ ¬P ∧ ¬Q.

Here the symbol ≡ means that the statements on either side of it are logically
equivalent: they both have the same truth value, no matter whether P and
Q are true or false.
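
Because P and Q can each take only two truth values, the two equivalences in (1) can be checked by running through all four cases. The following Python sketch does exactly that; the helper name `equivalent` is our own choice for illustration and does not come from the text.

```python
from itertools import product


def equivalent(f, g, arity=2):
    """Return True if f and g agree on every assignment of truth values."""
    return all(
        f(*values) == g(*values)
        for values in product([False, True], repeat=arity)
    )


# First law in (1): not (P and Q) has the same truth table as (not P) or (not Q).
first_law = equivalent(
    lambda p, q: not (p and q),
    lambda p, q: (not p) or (not q),
)

# Second law in (1): not (P or Q) has the same truth table as (not P) and (not Q).
second_law = equivalent(
    lambda p, q: not (p or q),
    lambda p, q: (not p) and (not q),
)

print(first_law, second_law)
```

A check of this kind is a complete verification here only because the truth table is finite.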
The symbols ∧ and ∨ will probably remind the reader of the symbols
∩ and ∪ in set theory. This is no accident. If P and Q are the statements
“x ∈ A” and “x ∈ B,” where x denotes an element of some set S and A and
B denote subsets of S, then P ∧ Q and P ∨ Q are the statements “x ∈ A ∩ B”
and “x ∈ A ∪ B.” Also, ¬P is the statement “x ∉ A.”
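
The correspondence between the logical operations and the set operations can be tried out on a small example. In the sketch below the ambient set S and the subsets A and B are arbitrary choices made only for illustration.

```python
# An arbitrary finite ambient set S with two subsets A and B.
S = set(range(10))
A = {x for x in S if x % 2 == 0}   # P(x): "x is in A"
B = {x for x in S if x < 5}        # Q(x): "x is in B"

# "P and Q" corresponds to membership in the intersection of A and B.
and_matches = all((x in A and x in B) == (x in A & B) for x in S)

# "P or Q" corresponds to membership in the union of A and B.
or_matches = all((x in A or x in B) == (x in A | B) for x in S)

# "not P" corresponds to membership in the complement of A within S.
not_matches = all((x not in A) == (x in S - A) for x in S)

print(and_matches, or_matches, not_matches)
```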
¹ That is, the word “or” is always to be interpreted in the inclusive sense: saying that
P is true or Q is true includes the possibility that they are both true.

Many mathematical statements involve a variable that can take different
values, such as the x in the preceding statements. We can denote such
a statement by P(x) to indicate the variable object explicitly; it is always
assumed, either explicitly or implicitly, that x is an element of some specified
set. P(x) may be true for some x’s and false for others. For example,
“x² − x − 6 = 0” is a statement about real numbers; it is true for x = 3 and
x = −2 and false for all other values of x.
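
A statement P(x) behaves like a function that returns a truth value for each x. As a small sketch, the statement about the quadratic above can be written as a Python function and evaluated on a sample of integers; the sample range is an arbitrary choice, and checking a finite sample does not by itself say anything about the remaining real numbers.

```python
def P(x):
    """The statement 'x^2 - x - 6 = 0' as a function of x."""
    return x**2 - x - 6 == 0


# Evaluate P(x) on a finite sample of integers (an arbitrary choice).
sample = range(-10, 11)
true_for = [x for x in sample if P(x)]

print(true_for)
```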
Quantifiers. Often we are interested not in the truth of P(x) for a
particular x but wish to say something about its truth as x ranges over
some specified set S. The two most common species of such statements
are “P(x) is true for all x ∈ S” and “P(x) is true for at least one x ∈ S.”
Logicians use the universal quantifier ∀ and the existential quantifier ∃ (read
as “for all” and “there exists”) for these situations. That is,

    ∀x ∈ S P(x)    and    ∃x ∈ S P(x)

are the symbolic form of the statements “for all x in S, P(x) is true” and
“there exists an x in S such that P(x) is true.” Note that the English versions
of these statements can be reformulated in various ways such as “P(x) is true
for every x in S” and “P(x) is true for some x in S” in which the quantifying
clause follows the P(x), but in symbolic form, the quantifiers must always
precede P(x). Note also that when the set S is clearly understood, it is often
omitted from the quantifier; that is, we just say “∀x” or “∃x” rather than
“∀x ∈ S” or “∃x ∈ S.”
Example: The sentence at the end of the first paragraph can be written
symbolically as

(2)    ∀x ∈ R ∃n ∈ R ∃y ∈ R (n ∈ Z) ∧ (0 ≤ y < 1) ∧ (x = n + y).

Negation interchanges ∀ and ∃ just as it does ∧ and ∨. That is, if it is
not the case that P(x) is true for all x ∈ S, then P(x) must be false for some
x ∈ S; and if it is not the case that there is an x ∈ S for which P(x) is true,
then P(x) must be false for all x ∈ S:

    ¬[∀x ∈ S P(x)] ≡ [∃x ∈ S ¬P(x)],    ¬[∃x ∈ S P(x)] ≡ [∀x ∈ S ¬P(x)].
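
Over a finite set S, the universal and existential quantifiers correspond to Python's built-in `all` and `any`, so the two negation rules can be tried on an example. The set S and the statement P(x) below are arbitrary illustrative choices; an infinite set would need a proof rather than a loop.

```python
# A finite set S and a sample statement P(x), both chosen only for illustration.
S = range(-5, 6)


def P(x):
    return x * x > 3


# not (for all x, P(x))  versus  (there exists x with not P(x))
not_forall = not all(P(x) for x in S)
exists_not = any(not P(x) for x in S)

# not (there exists x with P(x))  versus  (for all x, not P(x))
not_exists = not any(P(x) for x in S)
forall_not = all(not P(x) for x in S)

print(not_forall == exists_not, not_exists == forall_not)
```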

Implications. The majority of significant mathematical statements are
implications, that is, statements of the form P ⇒ Q, read as “P implies
Q” or (more commonly) “if P, then Q.” In this situation, P is called the
hypothesis and Q is called the conclusion. The “if . . . then” construction
is used in ordinary English in several different ways, but in mathematics it
has just one precise interpretation in terms of the truth values of P and Q.
Namely, when P is true then Q must also be true, but when P is false, Q
can be either true or false. That is, the only forbidden situation is that P is
true and Q is false:

    P ⇒ Q ≡ ¬(P ∧ ¬Q).

(In view of (1), this means that P ⇒ Q is logically equivalent to ¬P ∨ Q.
It is a matter of psychology rather than logic to prefer the former version to
the latter.)
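
The truth table of an implication is short enough to print in full. The sketch below defines a helper `implies` (our own name, not notation from the text) by the "forbidden situation" rule and lists it next to the alternative form ¬P ∨ Q for comparison.

```python
from itertools import product


def implies(p, q):
    """P => Q: false only in the forbidden case where P is true and Q is false."""
    return not (p and not q)


# Print the four rows: P, Q, P => Q, and (not P) or Q.
for p, q in product([True, False], repeat=2):
    print(p, q, implies(p, q), (not p) or q)
```

The last two columns agree in every row, and the only row containing False is the one with P true and Q false.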
Implications involving a variable x often implicitly contain an unexpressed
universal quantifier. For example, “if x > 0 then e^x > 1” is a
statement about real numbers, and it really should be prefaced by “for all
x ∈ R.” This is rarely a source of confusion, except that the negation of
such a statement contains an existential quantifier that cannot be omitted.
The negation of the (false) statement “if x > 0 then 3x > 1” is “there is an
x such that x > 0 but 3x ≤ 1” (true; any x with 0 < x ≤ 1/3 will work).
The converse of an implication P ⇒ Q is the implication Q ⇒ P. These
two statements are different and must not be confused with each other. For
example, the statement “if 0 < x < 1 then x³ < x” is true; its converse “if
x³ < x then 0 < x < 1” is false (any x < −1 is a counterexample).
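
The difference between an implication and its converse can be seen numerically by searching a sample of points for counterexamples to each. The sketch below uses exact rational arithmetic on the grid k/10 for −30 ≤ k ≤ 30, which is an arbitrary choice; finding no counterexample on a grid is only evidence, whereas a single counterexample does refute a universally quantified statement.

```python
from fractions import Fraction


def hypothesis(x):
    """The statement 0 < x < 1."""
    return 0 < x < 1


def conclusion(x):
    """The statement x^3 < x."""
    return x**3 < x


# Sample points k/10 for k = -30, ..., 30 (an arbitrary grid).
sample = [Fraction(k, 10) for k in range(-30, 31)]

# Counterexamples to "if 0 < x < 1 then x^3 < x": hypothesis true, conclusion false.
forward_fails = [x for x in sample if hypothesis(x) and not conclusion(x)]

# Counterexamples to the converse: x^3 < x holds but 0 < x < 1 does not.
converse_fails = [x for x in sample if conclusion(x) and not hypothesis(x)]

print(len(forward_fails), len(converse_fails))
```

On this grid the original implication has no counterexamples, while every sample point below −1 is a counterexample to the converse.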
There is, however, a way of “reversing the order” in an implication that
yields an equivalent statement. The assertion P ⇒ Q means that Q must
be true when P is true; thus if Q is false, P must also be false. That is,

    P ⇒ Q ≡ ¬Q ⇒ ¬P.

The statement ¬Q ⇒ ¬P is called the contrapositive of the statement
P ⇒ Q. These two statements are logically equivalent; for both of them, the
forbidden situation is that P is true while Q is false. This equivalence gives
a useful strategy for proving an implication P ⇒ Q (“proof by contraposition”).
Namely, instead of assuming the hypothesis P and reasoning one’s
way to the conclusion Q, one assumes the hypothesis ¬Q and reasons one’s
way to the conclusion ¬P.
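
A truth-table check makes the contrast with the converse concrete: the contrapositive agrees with the original implication in all four cases, while the converse does not. The helper `implies` below is our own illustrative name.

```python
from itertools import product


def implies(p, q):
    """P => Q, written in the equivalent form (not P) or Q."""
    return (not p) or q


rows = list(product([False, True], repeat=2))

# The contrapositive (not Q) => (not P) has the same truth table as P => Q.
contrapositive_same = all(implies(p, q) == implies(not q, not p) for p, q in rows)

# The converse Q => P does not: the two differ when exactly one of P, Q is true.
converse_same = all(implies(p, q) == implies(q, p) for p, q in rows)

print(contrapositive_same, converse_same)
```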
If P ⇒ Q and Q ⇒ P are both true, we say that the statements P and
Q are equivalent and write P ⇔ Q (read as “P if and only if Q”). One can
replace the implication Q ⇒ P by its contrapositive:

    P ⇔ Q ≡ (P ⇒ Q) ∧ (¬P ⇒ ¬Q).

That is, the statement P ⇔ Q means that P and Q (which, in practice, will
usually contain a variable x) always have the same truth value (no matter
what x is). Proving that P ⇔ Q is usually a matter of making two separate
arguments to show that P ⇒ Q and Q ⇒ P or that P ⇒ Q and ¬P ⇒ ¬Q.
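
Both descriptions of the biconditional can be checked against the plain requirement that P and Q have the same truth value. In the sketch below, `implies` and `iff` are our own helper names, introduced only for this illustration.

```python
from itertools import product


def implies(p, q):
    """P => Q, written as (not P) or Q."""
    return (not p) or q


def iff(p, q):
    """P <=> Q, defined as (P => Q) and (Q => P)."""
    return implies(p, q) and implies(q, p)


rows = list(product([False, True], repeat=2))

# P <=> Q holds exactly when P and Q have the same truth value.
same_truth_value = all(iff(p, q) == (p == q) for p, q in rows)

# Replacing Q => P by its contrapositive (not P) => (not Q) changes nothing.
contrapositive_form = all(
    iff(p, q) == (implies(p, q) and implies(not p, not q)) for p, q in rows
)

print(same_truth_value, contrapositive_form)
```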
(Note: The “equivalences” ≡ and ⇔ are different. “P ≡ Q” means
that P and Q have the same truth values simply by virtue of their logical
structure; “P ⇔ Q” means that P and Q have the same truth value by
virtue of their specific content.)
Proof by Contradiction. We conclude with a few words about the proof
technique known as proof by contradiction. The underlying logical principle
is that mathematical statements must be either true or false, so that a
statement that is not false must be true:

    ¬(¬P) ⇒ P.

Thus, one can prove a statement P by assuming ¬P and reasoning one’s
way to a contradiction. For example, the classic proof that √2 is irrational
is of this form. One assumes that √2 = p/q where p and q are integers with
no common factor (so that p/q is a fraction “in lowest terms”) and shows
that p and q must both be even, contradicting the condition that they have
no common factor.
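
A computer search cannot replace the proof, but it shows what the proof rules out. If √2 were equal to p/q, then 2q² would be the perfect square p². The sketch below looks for such a q among the first ten thousand positive integers; the bound and the helper name `is_perfect_square` are arbitrary choices for illustration, and an empty result for a finite range is only consistent with the theorem, not a proof of it.

```python
from math import isqrt


def is_perfect_square(n):
    """Return True if the non-negative integer n is the square of an integer."""
    r = isqrt(n)
    return r * r == n


# If sqrt(2) = p/q with integers p and q > 0, then p*p == 2*q*q.
# Collect every q up to the bound for which 2*q*q is a perfect square.
solutions = [q for q in range(1, 10_001) if is_perfect_square(2 * q * q)]

print(solutions)
```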
When proving a statement that has the form of an implication, a common
tactical error is to confuse proof by contraposition with proof by contradiction.
That is, wishing to prove P ⇒ Q, one can assume its negation P ∧ ¬Q
and deduce a contradiction. But more often than not, the argument never
uses the hypothesis P until the very end; rather, it proves the contrapositive
¬Q ⇒ ¬P and then uses the conclusion ¬P as a contradiction to P.
In these cases it makes a cleaner argument to phrase it as a contraposition
from the start: assume ¬Q and deduce ¬P, and you’re done. A true proof
by contradiction, which involves the risky business of reasoning on the basis
of an assumption that will turn out to be incorrect, is a weapon to be used
in cases of necessity, but only then. As G. H. Hardy said, “It is a far finer
gambit than any chess play: a chess player may offer the sacrifice of a pawn
or even a piece, but a mathematician offers the game.”
