Advanced Calculus - Folland
Second Edition
Gerald B. Folland
Preface to the Second Edition
The first edition of this book was published by Prentice-Hall (later sub-
sumed into Pearson Education) from 2002 to 2022. After their decision to
discontinue publication, the publication rights reverted to me, and I am
making the book freely available to everyone in pdf form.
Gerald B. Folland
Department of Mathematics
University of Washington
Seattle, WA 98195-4350
[email protected]
August 4, 2023
Contents
Preface ix
2 Differential Calculus 43
2.1 Differentiability in One Variable . . . . . . . . . . . . . . . . . . 43
2.2 Differentiability in Several Variables . . . . . . . . . . . . . . . . 53
2.3 The Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.4 The Mean Value Theorem . . . . . . . . . . . . . . . . . . . . . 70
2.5 Functional Relations and Implicit Functions: A First Look . . . . 73
2.6 Higher-Order Partial Derivatives . . . . . . . . . . . . . . . . . . 77
2.7 Taylor’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 85
2.8 Critical Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
2.9 Extreme Value Problems . . . . . . . . . . . . . . . . . . . . . . 100
2.10 Vector-Valued Functions and Their Derivatives . . . . . . . . . . 106
Appendices
Bibliography 453
Index 455
PREFACE
This is a book about the theory and applications of derivatives (mostly partial),
integrals (mostly multiple or improper), and infinite series (mostly of functions
rather than of numbers), at a deeper level than is found in the standard calculus
books.
In recent years there has been a tendency for the courses that were once called
“advanced calculus” to turn into courses on the foundations of analysis. Students
typically start with a year and a half of calculus that emphasizes computations and
applications, then proceed (perhaps by way of a “bridge course” on mathematical
reasoning) to a course of an entirely theoretical nature that covers such things as
the topology of Euclidean space, the theory of the Riemann integral, and proofs of
some theorems that have been taken on faith before.
I am not persuaded that such a divorce of the practical from the theoretical
aspects of the subject is a good idea. On the one hand, the study of theoretical un-
derpinnings of ideas with which one is already familiar tends to be dry and tedious,
and the development of unfamiliar ideas can be rather daunting unless it is accom-
panied by some hands-on experience with concrete examples and applications. On
the other hand, relegation of the computations and applications to the elementary
courses means that students are not exposed to these matters on a more sophisti-
cated level. (How many students recognize that Taylor polynomials should be part
of one’s everyday tool kit? How many know that the integral test gives an effective
way of approximating the sum of a series?)
This book is an attempt to present a unified view of calculus in which theory
and practice can reinforce each other. On the theoretical side, it is reasonably com-
plete and self-contained. Accordingly, it contains a certain amount of “foundations
of analysis,” but I have kept this material to the bare minimum needed for the main
topics of the book. I also place a higher premium on intuitive understanding than
on formal proofs and technical definitions. Along with the latter, therefore, I often
offer informal arguments and ideas, sometimes involving infinitesimals, that may
provide more enlightenment than the strictly rigorous approach. The worked-out
examples and exercises run the gamut from routine calculations to theoretical ar-
guments; many of them involve a mixture of the two. The reader whose interest in
the theory is limited should be able to benefit from the book by skipping many of
the proofs.
The essential prerequisite for this book is a sound knowledge of the mechanics
of one-variable calculus. The theory of differentiation and integration on the real
line is presented, rather tersely, in Sections 2.1 and 4.1, but I assume that the reader
is thoroughly familiar with the standard techniques for calculating derivatives and
integrals. Some previous experience with infinite series, partial derivatives, and
multiple integrals might be helpful but is not really necessary. And, of course, for a
full appreciation of the theory one needs a certain level of comfort with mathemat-
ical reasoning, but that is best acquired with practice and experience.
An acquaintance with linear algebra is needed in a few places, particularly §2.8
(classification of critical points), §2.10 (differentiation of vector-valued functions
of vector variables), §3.1 and §§3.4–5 (the implicit function theorem for systems
of equations, the inverse mapping theorem, and functional dependence), and §4.4
(change of variables for multiple integrals). However, most of this material can
be done in the two- and three-dimensional cases (perhaps by eliding parts of some
proofs) with vector algebra and a little ad hoc discussion of matrices and determi-
nants. In any case, Appendix A provides a brief summary of the necessary concepts
and results from linear algebra.
A few of the more formidable proofs have been exiled to Appendix B. In some
of them, the ratio of the amount of work required to the amount of understanding
gained is especially high. Others involve ideas such as the Heine-Borel theorem or
partitions of unity that are best appreciated at a more advanced level. Of course,
the decisions on what to put into Appendix B reflect my personal tastes; instructors
will have to make their own choices of what to include or omit.
In this book a single numeration system is used for theorems, lemmas, corollar-
ies, propositions, and displayed formulas. Thus, for each m and n there is only one
item of any of these types labeled m.n, and it is guaranteed to follow m.(n − 1)
and precede m.(n + 1). This procedure minimizes the amount of effort needed to
locate referenced items.
In a few places I offer glimpses into the world of more advanced analysis.
Chapters 4 and 5 end with brief, informal sketches of the Lebesgue integral and
the theory of differential forms; Chapter 8 leads to the point where the realm of
eigenfunction expansions and spectral theory is visible on the horizon. I hope that
many of my readers will accept the invitation to explore further.
Acknowledgments. This book has benefited from the comments and suggestions
of a number of people: my colleague James Morrow, the students in the advanced
calculus classes that he and I have taught over the past three years in which prelimi-
nary versions of this book were used, and several reviewers, especially Jeffrey Fox.
I am also grateful to my editor, George Lobell, for his support and enthusiasm.
Errata. Responsibility for errors in this book, of course, remains with me.
Responsibility for informing me of these errors, however, rests with my readers.
Anyone who finds misprints, mistakes, or obscurities is urged to write to me at the
address below. I will post such things on a web site that will be accessible from
www.math.washington.edu.
Gerald B. Folland
Department of Mathematics
University of Washington
Seattle, WA 98195-4350
[email protected]
Chapter 1
Setting the Stage
The first half of this chapter (§§1.1–4) presents basic facts and concepts concern-
ing geometry, vectors, limits, continuity, and sequences; the material in it is used
throughout the later chapters. The second half (§§1.5–8) deals with some of the
more technical topological results that underlie calculus. It is quite concise and in-
cludes nothing but what is needed in this book. The reader who wishes to proceed
quickly to the study of differentiation and integration may scan it quickly and refer
back to it as necessary; on the other hand, the reader who wishes to see a more
extensive development of this material is referred to books on the foundations of
analysis such as DePree and Swartz [5], Krantz [12], or Rudin [18].¹
At the outset, let us review some standard notation and terminology for future
reference:
• Sums: If a1 , a2 , . . . , ak are numbers, their sum a1 + a2 + · · · + ak is denoted
by ∑₁ᵏ an , or by ∑ₙ₌₁ᵏ an if necessary for clarity. The sum need not be
started at n = 1; more generally, if j < k, we have

∑ⱼᵏ an = aj + aj+1 + · · · + ak .

The letters j and k denote the limits of summation; the letter n is analo-
gous to a dummy variable in an integral and may be replaced by any other
letter that is not already in use without changing the meaning of the sum.
We shall occasionally write simply ∑ an when the limits of summation are
understood.
¹ Numbers in brackets refer to the bibliography at the end of the book.
• Sets: If S and T are two sets, S ∪ T and S ∩ T denote their union and
intersection, respectively, and S \ T denotes the set of all elements of S that
are not in T . The expressions “S ⊂ T ” and “T ⊃ S” both mean that S is a
subset of T , including the possibility that S = T , and “x ∈ S” and “x ∉ S”
mean, respectively, that x is or is not an element of S. The set of all objects
x satisfying a property P (x) is denoted by {x : P (x)}, and the empty set is
denoted by ∅.
Intervals of the form (a, b) are called open; intervals of the form [a, b] are
called closed; and intervals of the forms (a, b] and [a, b) are called half-open.
(Of course, the symbol (a, b) is also used to denote the ordered pair whose
first and second members are a and b, respectively; remarkably enough, this
rarely causes any confusion.)
If {x1 , . . . , xk } is a finite set of real numbers, its largest and smallest ele-
ments are denoted by max(x1 , . . . , xk ) and min(x1 , . . . , xk ), respectively.
• Special functions: In this book, we denote the natural logarithm by log rather
than ln, this being the common usage in advanced mathematics. Also, we denote
the principal branches of the inverse trig functions by arcsin, arccos, and arctan;
arcsin and arccos map [−1, 1] onto [−π/2, π/2] and [0, π], respectively, and
arctan maps R onto (−π/2, π/2).
|a · b| ≤ |a| |b|.

f ((a · b)/|b|²) = |a|² − (a · b)²/|b|².

|a|² − (a · b)²/|b|² ≥ 0.

Multiplying through by |b|², we obtain the desired result: |a|²|b|² ≥ (a · b)².
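As a numerical sanity check (an illustrative sketch of ours; the helper names dot and norm are not the book's), Cauchy's inequality can be verified on a few sample vectors in Rn:

```python
import math

# Check |a . b| <= |a| |b| on sample vectors, with equality in the
# proportional case (cf. Exercise 4 below).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

pairs = [((3, -1, -1, 1), (-2, 2, 1, 0)),
         ((1, 2, 3), (4, 5, 6)),
         ((2, 0), (0, 7))]
for a, b in pairs:
    assert abs(dot(a, b)) <= norm(a) * norm(b) + 1e-12

# Equality occurs when a and b are proportional:
assert math.isclose(abs(dot((1, 2), (3, 6))), norm((1, 2)) * norm((3, 6)))
```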
Cross Products. Let i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1) be the
standard basis vectors for R3 ; then an arbitrary vector a ∈ R3 can be written as
a = (a1 , a2 , a3 ) = a1 i + a2 j + a3 k.
a × b = −b × a.
a × (b × c) + b × (c × a) + c × (a × b) = 0.
(|a × b|² is the sum of the squares of the components of a × b. Multiply it out and
rearrange the terms to get |a|²|b|² − (a · b)².) If θ is the angle between a and b
(0 ≤ θ ≤ π), we know that a · b = |a| |b| cos θ, so

a · (a × b) = b · (a × b) = 0;
[Figure: the vectors a and b with the angle θ between them, and a × b perpendicular to both.]
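The cross-product identities stated above can be checked by direct computation. Here is an illustrative Python sketch (ours, not the book's); integer vectors make every check exact:

```python
# Verify the stated identities on sample vectors in R^3.

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(*vs):
    return tuple(sum(c) for c in zip(*vs))

a, b, c = (1, 2, 3), (-4, 0, 5), (2, -1, 7)

# Anticommutativity: a x b = -(b x a)
assert cross(a, b) == tuple(-x for x in cross(b, a))

# Jacobi identity: a x (b x c) + b x (c x a) + c x (a x b) = 0
assert add(cross(a, cross(b, c)), cross(b, cross(c, a)),
           cross(c, cross(a, b))) == (0, 0, 0)

# Orthogonality: a . (a x b) = b . (a x b) = 0
assert dot(a, cross(a, b)) == 0 and dot(b, cross(a, b)) == 0

# Lagrange identity: |a x b|^2 = |a|^2 |b|^2 - (a . b)^2
assert dot(cross(a, b), cross(a, b)) == dot(a, a) * dot(b, b) - dot(a, b)**2
```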
EXERCISES
1. Let x = (3, −1, −1, 1) and y = (−2, 2, 1, 0). Compute the norms of x and y
and the angle between them.
2. Given x, y ∈ Rn , show that
a. |x + y|2 = |x|2 + 2x · y + |y|2 .
b. |x + y|2 + |x − y|2 = 2(|x|2 + |y|2 ).
3. Suppose x1 , . . . , xk ∈ Rn .
a. Generalize Exercise 2a to obtain a formula for |x1 + · · · + xk |2 .
b. (The Pythagorean Theorem) Suppose the vectors xj are mutually orthog-
onal, i.e., that xi · xj = 0 for i ≠ j. Show that |x1 + · · · + xk |² =
|x1 |² + · · · + |xk |².
4. Under what conditions on a and b is Cauchy’s inequality an equality? (Exam-
ine the proof.)
5. Under what conditions on a and b is the triangle inequality an equality?
6. Show that | |a| − |b| | ≤ |a − b| for every a, b ∈ Rn .
7. Suppose a, b ∈ R3 .
a. Show that if a · b = 0 and a × b = 0, then either a = 0 or b = 0.
b. Show that if a · c = b · c and a × c = b × c for some nonzero c ∈ R3 ,
then a = b.
c. Show that (a × a)× b = a × (a × b) if and only if a and b are proportional
(i.e., one is a scalar multiple of the other).
8. Show that a · (b × c) is the determinant of the matrix whose rows are a, b, and
c (if these vectors are considered as row vectors) or the matrix whose columns
are a, b, and c (if they are considered as column vectors).
• The closure of S is the union of S and all its boundary points. It is denoted
by S̄:
S̄ = S ∪ ∂S.
Let us examine these ideas a little more closely. First, notice that the boundary
points of S are the same as the boundary points of S c ; the definition of boundary
point remains unchanged if S and S c are switched. Moreover, if x is neither an
interior point of S nor an interior point of S c , then x must be a boundary point of
S. In other words, given S ⊂ Rn and x ∈ Rn , there are exactly three possibilities:
x is an interior point of S, or x is an interior point of S c , or x is a boundary point
of S.
EXAMPLE 1. Let S be B(ρ, 0), the ball of radius ρ about the origin. First,
given x ∈ S, let r = ρ − |x|. If |y − x| < r, then by the triangle inequality we
have |y| ≤ |y − x| + |x| < ρ, so that B(r, x) ⊂ S. Therefore, every x ∈ S is
an interior point of S, so S is open. Second, a similar calculation shows that if
|x| > ρ then B(r, x) ⊂ S c where r = |x| − ρ, so every point with |x| > ρ is an
interior point of S c . On the other hand, if |x| = ρ, then cx ∈ S for 0 < c < 1
and cx ∈ S c for c ≥ 1, and |cx − x| = |c − 1|ρ can be as small as we please,
so x is a boundary point. In other words, the boundary of S is the sphere of
radius ρ about the origin, and the closure of S is the closed ball {x : |x| ≤ ρ}.
EXAMPLE 2. Now let S be the ball of radius ρ about the origin together with
the “upper hemisphere” of its boundary:

S = B(ρ, 0) ∪ {x ∈ Rn : |x| = ρ and xn > 0}.
The calculations in Example 1 show that S int is the open ball B(ρ, 0); ∂S is
the sphere {x : |x| = ρ}, and S̄ is the closed ball {x : |x| ≤ ρ}. The set S is
neither open nor closed.
EXAMPLE 3. In the real line (i.e., n = 1), let S be the set of all rational
numbers. Since every ball in R — that is, every interval — contains both
rational and irrational numbers, every point of R is a boundary point of S. The
set S is neither open nor closed; its interior is empty; and its closure is R.
where ∗ denotes one of the relations =, <, >, ≤, ≥. (Taking the quantity on the
right of ∗ to be 0 is no restriction; just move all the terms over to the left side.) We
anticipate some results from §1.3 in giving the following rule of thumb: Sets defined
by strict inequalities are open; sets defined by equalities or weak inequalities are
closed. More precisely, if S is given by (1.5) where the function f is continuous,
then S is open if ∗ denotes < or >, and S is closed if ∗ denotes =, ≤, or ≥. The
reader may feel free to use this rule in doing the exercises.
EXERCISES
1. For each of the following sets S in the plane R2 , do the following: (i) Draw a
sketch of S. (ii) Tell whether S is open, closed, or neither. (iii) Describe S int ,
S, and ∂S. (These descriptions should be in the same set-theoretic language as
the description of S itself given here.)
a. S = {(x, y) : 0 < x² + y² ≤ 4}.
b. S = {(x, y) : x² − x ≤ y ≤ 0}.
c. S = {(x, y) : x > 0, y > 0, and x + y > 1}.
d. S = {(x, y) : y = x³}.
e. S = {(x, y) : x > 0 and y = sin(1/x)}.
f. S = {(x, y) : x² + y² < 1} \ {(x, 0) : x < 0}.
g. S = {(x, y) : x and y are rational numbers in [0, 1]}.
2. Show that for any S ⊂ Rn , S int is open and ∂S and S̄ are both closed. (Hint:
Use the fact that balls are open, proved in Example 1.)
3. Show that if S1 and S2 are open, so are S1 ∪ S2 and S1 ∩ S2 .
4. Show that if S1 and S2 are closed, so are S1 ∪ S2 and S1 ∩ S2 . (One way is to
use Exercise 3 and Proposition 1.4b.)
5. Show that the boundary of S is the intersection of the closures of S and S c .
6. Give an example of an infinite collection S1 , S2 , . . . of closed sets whose union
⋃∞ 1 Sj is not closed.
7. There are precisely two subsets of Rn that are both open and closed. What are
they?
8. Give an example of a set S such that the interior of S is unequal to the interior
of the closure of S.
9. Show that the ball of radius r about a is contained in the ball of radius r + |a|
about the origin. Conclude that a set S ⊂ Rn is bounded if it is contained in
some ball (whose center can be anywhere in Rn ).
limx→a f (x) = L,
The equivalence of (1.6) and (1.7) follows from (1.3): If (1.6) is satisfied, then
(1.7) is satisfied with δ′ = δ/√n; and if (1.7) is satisfied, then (1.6) is satisfied
with δ = δ′ .
More generally, we can consider functions f that are only defined on a subset
S of Rn and points a that lie in the closure of S. The definition of limx→a f (x) is
the same as before except that x is restricted to lie in the set S. It may be necessary,
for the sake of clarity, to specify this restriction explicitly; for this purpose we use
the notation
limx→a, x∈S f (x).
In particular, for a function f on the real line we often need to consider the one-
sided limits
Thus the study of limits and continuity of vector-valued functions is easily reduced
to the scalar case, to which we now return our attention.
We often express the relation limx→a f (x) = L informally by saying that f (x)
approaches L as x approaches a. In one dimension this works quite well; we can
envision x as the location of a particle that moves toward a from the right or the
left. But in higher dimensions there are infinitely many different paths along which
a particle might move toward a, and for the limit to exist one must get the same
result no matter which path is chosen. It is safer to abandon the “dynamic” picture
of a particle moving toward a; we should simply think in terms of f (x) being close
to L provided that x is close to a, without reference to any motion.
EXAMPLE 1. Let f (x, y) = xy/(x² + y²) if (x, y) ≠ (0, 0), and let f (0, 0) = 0.
Show that lim(x,y)→(0,0) f (x, y) does not exist — and, in particular, f is
discontinuous at (0, 0).
Solution. First, note that f (x, 0) = f (0, y) = 0 for all x and y, so
f (x, y) → 0 as (x, y) approaches (0, 0) along the x-axis or the y-axis. But
if we consider other straight lines passing through the origin, say y = cx, we
have f (x, cx) = cx²/(x² + c²x²) = c/(1 + c²), so the limit as (x, y) approaches
(0, 0) along the line y = cx is c/(1 + c²). Depending on the value of c, this
can be anything between −1/2 and 1/2 (these two extreme values being achieved
when c = −1 or c = 1). So there is no limit as (x, y) approaches (0, 0)
unrestrictedly.
The argument just given suggests the following line of thought. We wish to
know if limx→a f (x) exists. We look at all the straight lines passing through a
and evaluate the limit of f (x) as x approaches a along each of those lines by one-
variable techniques; if we always get the same answer L, then we should have
limx→a f (x) = L, right? Unfortunately, this doesn’t work:
EXAMPLE 2. Let g(x, y) = x²y/(x⁴ + y²) if (x, y) ≠ (0, 0) and g(0, 0) = 0. Again
we have g(x, 0) = g(0, y) = 0, so the limit as (x, y) → (0, 0) along the
coordinate axes is 0. Moreover, if c ≠ 0,

g(x, cx) = cx³/(x⁴ + c²x²) = cx/(c² + x²) → 0 as x → 0,

so the limit as (x, y) → (0, 0) along any other straight line is also 0. But if we
approach along a parabola y = cx², we get

g(x, cx²) = cx⁴/(x⁴ + c²x⁴) = c/(1 + c²),

which can be anything between −1/2 and 1/2 as before, so the limit does not
exist. (The similarity with Example 1 is not accidental: If f is the function in
Example 1 we have g(x, y) = f (x², y).)
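A brief numerical illustration of Example 2 (ours, not the book's): the values of g tend to 0 along every straight line through the origin, yet equal c/(1 + c²) identically along the parabola y = cx².

```python
# The limit of g(x, y) = x^2 y / (x^4 + y^2) at the origin is path-dependent.

def g(x, y):
    if (x, y) == (0, 0):
        return 0.0
    return x**2 * y / (x**4 + y**2)

# Along every straight line y = c*x the values tend to 0 ...
for c in (1, -2, 5):
    assert abs(g(1e-4, c * 1e-4)) < 1e-3

# ... but along the parabola y = c*x**2 they equal c/(1 + c**2),
for c in (1, -1, 3):
    assert abs(g(1e-4, c * 1e-8) - c / (1 + c**2)) < 1e-9

# so the two-dimensional limit does not exist.
```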
After looking at examples like this one, one might become discouraged about
the possibility of ever proving that limits do exist! But things are not so bad. If f is a
continuous function, limx→a f (x) is simply f (a). Moreover, most of the functions
of several variables that one can easily write down are built up from continuous
functions of one variable by using the arithmetic operations plus composition, and
these operations all preserve continuity (except for division when the denominator
vanishes).
Here are the precise statements and proofs of the fundamental results. (The
reader may wish to skip the proofs; they are of some value as illustrations of the sort
of formal arguments involving limits that are important in more advanced analysis,
but they contribute little to an intuitive understanding of the results.)
Proof. Let ϵ > 0 and a ∈ U be given, and let b = f (a). Since g is continuous on
f (U ), we can choose η > 0 so that |g(y)−g(b)| < ϵ whenever |y−b| < η. Having
chosen this η, since f is continuous on U we can find δ > 0 so that |f (x) − b| < η
whenever |x − a| < δ. Thus,
1.10 Theorem. Let f1 (x, y) = x + y, f2 (x, y) = xy, and g(x) = 1/x. Then f1
and f2 are continuous on R2 and g is continuous on R \ {0}.
Proof. To prove continuity of f1 and f2 , we need to show that lim(x,y)→(a,b) x+y =
a + b and lim(x,y)→(a,b) xy = ab for every a, b ∈ R. That is, given ϵ > 0 and
a, b ∈ R, we need to find δ > 0 so that if |x − a| < δ and |y − b| < δ, then (i)
|(x + y) − (a + b)| < ϵ or (ii) |xy − ab| < ϵ. For (i) we can simply take δ = ϵ/2,
for if |x − a| < ϵ/2 and |y − b| < ϵ/2, then
Proof. Combine Theorem 1.10 and Corollary 1.11 with Theorem 1.9. For example,
if f and g are continuous functions on U ⊂ Rn , then f + g is continuous because
it is the composition of the continuous map (f, g) from U to R2 and the continuous
map (x, y) ↦ x + y from R2 to R. Likewise for the other arithmetic operations.
In fact, the limit is 0, and this can be established with a little ad hoc estimating.
Clearly |x² − y²| ≤ x² + y², so |h(x, y)| ≤ |xy|. But xy → 0 as
(x, y) → (0, 0), so h(x, y), being even smaller in absolute value than xy, must
also approach 0. Thus lim(x,y)→(0,0) h(x, y) = 0 and h is continuous at (0, 0).
Proof. Suppose U is open. We shall show that S is open by showing that every
point a in S is an interior point of S. If a ∈ S, then f (a) ∈ U . Since U is open,
some ball centered at f (a) is contained in U ; that is, there is a positive number ϵ
such that every y ∈ Rk such that |y − f (a)| < ϵ is in U . Since f is continuous,
there is a positive number δ such that |f (x) − f (a)| < ϵ whenever |x − a| < δ.
But this means that f (x) ∈ U whenever |x − a| < δ, that is, x ∈ S whenever
|x − a| < δ. Thus a is an interior point of S.
On the other hand, suppose U is closed. Then the complement of U in Rk is open
by Proposition 1.4b, so the set S ′ = {x : f (x) ∈ U c } is open by the argument just
given. But S ′ is just the complement of S in Rn , so S is closed by Proposition 1.4b
again.
EXERCISES
1. For the following functions f , show that lim(x,y)→(0,0) f (x, y) does not ex-
ist.
a. f (x, y) = (x² + y)/√(x² + y²)
b. f (x, y) = x/(x⁴ + y⁴)
c. f (x, y) = x⁴y⁴/(x² + y⁴)³
2. For the following functions f , show that lim(x,y)→(0,0) f (x, y) = 0.
a. f (x, y) = x²y²/(x² + y²)   b. f (x, y) = (3x⁵ − xy⁴)/(x⁴ + y⁴)
3. Let f (x, y) = x⁻¹ sin(xy) for x ≠ 0. How should you define f (0, y) for
y ∈ R so as to make f a continuous function on all of R2 ?
4. Let f (x, y) = xy/(x² + y²) as in Example 1. Show that, although f is dis-
continuous at (0, 0), f (x, a) and f (a, y) are continuous functions of x and y,
respectively, for any a ∈ R (including a = 0). We say that f is separately
continuous in x and y.
5. Let f (x, y) = y(y − x²)/x⁴ if 0 < y < x², f (x, y) = 0 otherwise. At which
point(s) is f discontinuous?
6. Let f (x) = x if x is rational, f (x) = 0 if x is irrational. Show that f is
continuous at x = 0 and nowhere else.
7. Let f (x) = 1/q if x = p/q where p and q are integers with no common factors
and q > 0, and f (x) = 0 if x is irrational. At which points, if any, is f
continuous?
8. Suppose f : Rn → Rk has the following property: For any open set U ⊂ Rk ,
{x : f (x) ∈ U } is an open set in Rn . Show that f is continuous on Rn . Show
also that the same result holds if “open” is replaced by “closed.”
9. Let U and V be open sets in Rn and let f be a one-to-one mapping from U onto
V (so that there is an inverse mapping f −1 : V → U ). Suppose that f and f −1
are both continuous. Show that for any set S such that S ⊂ U and f (S) ⊂ V
we have f (∂S) = ∂(f (S)).
1.4 Sequences
Generally speaking, a sequence is a collection of mathematical objects that is in-
dexed by the positive integers. The objects in question can be of any sort, such as
numbers, n-dimensional vectors, sets, etc. If the kth object in the sequence is Xk ,
the sequence as a whole is usually denoted by {Xk }∞ k=1 , or just by {Xk }∞ 1 or even
{Xk } if there is no possibility of confusion. (We shall comment further on this
notation below.) Alternatively, we can write out the sequence as X1 , X2 , X3 , . . ..
We speak of a sequence in a set S if the objects of the sequence all belong to S.
EXAMPLE 1.
a. A sequence of numbers: 1, 4, 9, 16, . . .. The kth term in the sequence is k²,
and the sequence as a whole may be written as {k²}∞ 1 .
b. A sequence of intervals: (−1, 1), (−1/2, 1/2), (−1/3, 1/3), (−1/4, 1/4), . . ..
The kth term in the sequence is the interval (−1/k, 1/k), and the sequence as a
whole may be written as {(−1/k, 1/k)}∞ 1 .
in which the first two terms are equal to 1 and each of the remaining terms is
the sum of the two preceding ones (that is, xk = xk−2 + xk−1 ).
EXAMPLE 3. Define a sequence {xk } as follows: x1 is a given positive integer
a. If xk is odd, then xk+1 = 3xk + 1; if xk is even, then xk+1 = xk /2. For
example, if a = 13, the sequence is

13, 40, 20, 10, 5, 16, 8, 4, 2, 1, 4, 2, 1, . . . ,

ending in the infinite repetition of (4, 2, 1). It is a famous unsolved problem (as
of this writing) to prove or disprove that this sequence eventually ends in the
repeating figure (4, 2, 1) no matter what initial number a is chosen. (Try a few
values of a to see how it works! For more information, see Lagarias [13].)
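The iteration in Example 3 is easy to experiment with. Here is a minimal sketch (ours; the helper name collatz is not the book's):

```python
# The rule of Example 3: x_{k+1} = 3*x_k + 1 if x_k is odd, x_k / 2 if even.

def collatz(a, max_terms=1000):
    """Return the sequence starting at a, stopping once it reaches 1."""
    seq = [a]
    while seq[-1] != 1 and len(seq) < max_terms:
        x = seq[-1]
        seq.append(3 * x + 1 if x % 2 else x // 2)
    return seq

print(collatz(13))  # [13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
```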
It is convenient to make the definition of sequence a little more flexible by
allowing the index k to begin with something other than 1. Thus, we may speak of a
sequence {Xk }∞ 0 whose objects are X0 , X1 , X2 , . . ., or a sequence {Xk }∞ 7 ,
whose objects are X7 , X8 , X9 , . . .. We may also speak of a finite sequence
whose terms are indexed by a finite collection of integers, such as {Xk }8 1 (a finite
sequence of eight terms), or a doubly infinite sequence whose terms are indexed
by the whole set of integers: {Xk }∞ −∞ .
but its set of values is just the two-element set {−1, 1}. Since curly brackets are
commonly used to specify sets (as we just did with {−1, 1}), the notation {Xk }∞ 1
for a sequence invites confusion with the set whose elements are the Xk ’s, and for
this reason some authors use other notations such as ⟨Xk ⟩∞ 1 . However, the notation
{Xk }∞ 1 is by far the most common one, and in practice it rarely causes problems,
so we shall stick with it.
For the remainder of this section we shall be concerned with sequences of num-
bers or n-dimensional vectors. We reserve the letter n for the dimension and use
letters such as k and j for the index on a sequence. Thus, for example, if {xk } is a
sequence in Rn , the components of the vector xk are (xk1 , . . . , xkn ).
A sequence {xk } in Rn is said to converge to the limit L if for every ϵ > 0
there is an integer K such that |xk − L| < ϵ whenever k > K; otherwise, {xk }
diverges. If {xk } converges to L, we write xk → L or L = limk→∞ xk .
We say that limk→∞ xk = ∞ (or +∞) if for every C > 0 there is an integer
K such that xk > C whenever k > K, and limk→∞ xk = −∞ if for every C > 0
there is an integer K such that xk < −C whenever k > K. (However, a sequence
whose limit is ±∞ is still called divergent.)
It follows easily from the estimates (1.3) that xk → L if and only if each
component of xk converges to the corresponding component of L, that is, xkm →
Lm for 1 ≤ m ≤ n. The study of convergence of sequences of vectors is thus
reducible to the study of convergence of numerical sequences.
EXAMPLE 4.
a. The sequence {1/k} converges to 0, since |(1/k) − 0| < ϵ whenever k >
(1/ϵ).
b. The sequence {k²} diverges; more precisely, limk→∞ k² = ∞.
0 < Cᵏ/k! = (Cᴷ/K!) · C/(K + 1) · C/(K + 2) · · · C/k
< (Cᴷ/K!) · (1/2) · (1/2) · · · (1/2) = (Cᴷ/K!) · (1/2)ᵏ⁻ᴷ .
sequence {xk } then converges to a, but the sequence {f (xk )} does not converge to
f (a).
We have shown that if (a) is true then (b) is true, and that if (a) is false then (b)
is false, so the proof is complete.
EXERCISES
1. For each of the following sequences {xk }, find the limit or show that the se-
quence diverges.
a. xk = √(2k + 1)/(2√(k + 1)).   b. xk = (sin k)/k.   c. xk = sin(kπ/3).
2. Let xk = (3k + 4)/(k − 5); then limk→∞ xk = 3. Given ϵ > 0, find an integer
K so that |xk − 3| < ϵ whenever k > K.
3. Define a sequence {xk } recursively by x1 = 1 and xk+1 = kxk /(k + 1) for
k ≥ 1. Find an explicit formula for xk . What is limk→∞ xk ?
4. Let {xk } and {yk } be sequences in R such that xk → a and yk → b. Show that
xk + yk → a + b and xk yk → ab. (Use Theorems 1.10 and 1.15.)
5. Given f : Rn → Rm ; show that limx→a f (x) = l if and only if f (xk ) → l for
every sequence {xk } that converges to a. (Adapt the proof of Theorem 1.15.)
1.5 Completeness
The essential properties of the real number system that underlie all the theorems of
calculus are summarized by saying that R is a complete ordered field. We explain
the meaning of these terms one by one:
A field is a set on which the operations of addition, subtraction, multiplication,
and division (by any nonzero number) are defined, subject to all the usual laws of
arithmetic: commutativity, associativity, etc. Besides the real numbers, examples of
fields include the rational numbers and the complex numbers, and there are many
others. (For more precise definitions and more examples, consult a textbook on
abstract algebra such as Birkhoff and Mac Lane [4] or Hungerford [8].)
An ordered field is a field equipped with a binary relation < that is transitive
(if a < b and b < c, then a < c) and antisymmetric (if a ≠ b, then either a < b or
b < a, but not both), and interacts with the arithmetic operations in the usual way
(if a < b then a + c < b + c for any c, and also ac < bc if c > 0). The real number
and rational number systems are ordered fields (with the usual meaning of “<”),
but the complex number system is not.
Finally, completeness is what distinguishes the real numbers from the smaller
ordered fields such as the rational numbers and makes possible the transition from
algebra to calculus; it means that there are “no holes” in the real number line. There
are several equivalent ways of stating the completeness property precisely. The one
we shall use as a starting point is the existence of least upper bounds.
If S is a subset of R, an upper bound for S is a number u such that x ≤ u for
all x ∈ S, and a lower bound for S is a number l such that x ≥ l for all x ∈ S.
The Completeness Axiom. Let S be a nonempty set of real numbers. If S has
an upper bound, then S has a least upper bound, called the supremum of S and
denoted by sup S. If S has a lower bound, then S has a greatest lower bound,
called the infimum of S and denoted by inf S.
If S has no upper bound, we shall define sup S to be +∞, and if S has no lower
bound, we shall define inf S to be −∞.
EXAMPLE 1.
a. If S is the interval (0, 1], then sup S = 1 and inf S = 0.
b. If S = {1, 1/2, 1/3, 1/4, . . .}, then sup S = 1 and inf S = 0.
c. If S = {1, 2, 3, 4, . . .}, then sup S = ∞ and inf S = 1.
d. If S is the single point a, then sup S = inf S = a.
e. If S = {x : x is rational and x² < 2}, then sup S = √2 and inf S = −√2.
This is an example of a set of rational numbers that has no supremum
or infimum within the set of rational numbers.
If S has an upper bound, the number a = sup S is the unique number such
that
i. x ≤ a for every x ∈ S and
ii. for every ϵ > 0 there exists x ∈ S with x > a − ϵ.
(i) expresses the fact that a is an upper bound, whereas (ii) expresses the fact that
there is no smaller upper bound. In particular, while sup S may or may not belong
to S itself, it always belongs to the closure of S. Similarly for inf S if S is bounded
below.
The completeness of the real number system plays a crucial role in establishing
the convergence of numerical sequences. The most basic result along these lines is
the following. First, some terminology: A sequence {xn } is called bounded if all
the numbers xn are contained in some bounded interval. A sequence {xn } is called
increasing if xn ≤ xm whenever n ≤ m, and decreasing if xn ≥ xm whenever
n ≤ m. A sequence that is either increasing or decreasing is called monotone (or
monotonic).
1.16 Theorem (The Monotone Sequence Theorem). Every bounded monotone se-
quence in R is convergent. More precisely, the limit of an increasing (resp. decreas-
ing) sequence is the supremum (resp. infimum) of its set of values.
Observe that if xk−1 > 0 then xk > 0 too; since we assume that x1 > 0,
every term of this sequence is positive. (In particular, division by zero is never
a problem.) We claim that xk → √a, no matter what initial x1 is chosen.
Indeed, if we assume that the sequence converges to a nonzero limit L, by
letting k → ∞ in the recursion formula we see that
L = ½(L + a/L),   or   L² = ½L² + ½a,
26 Chapter 1. Setting the Stage
so that L² = a. Since xk > 0 for every k, we must have L > 0, and hence
L = √a. But this argument is without force until we know that {xk } converges
to a nonzero limit.
To verify this, observe that for k ≥ 2,
xk − √a = (xk−1 − √a)² / (2xk−1 ) ≥ 0.
Thus, starting with the second term, the sequence {xk } is bounded below by
√a > 0, and it is decreasing:
xk+1 − xk = ½(a/xk − xk ) ≤ ½(xk − xk ) = 0.
The convergence to a limit L ≥ √a now follows from the monotone sequence
theorem. (The verification that {xk } converges is not just a formality; see
Exercise 4.)
The sequence {xk } gives a computationally efficient recursive algorithm
for computing square roots.
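The recursion behind this example is xk+1 = ½(xk + a/xk ). A minimal sketch in Python (the function name, tolerance, and iteration cap are our own choices, not part of the text):

```python
def sqrt_iter(a, x1, tol=1e-12, max_iter=100):
    """Approximate sqrt(a) by the recursion x_{k+1} = (x_k + a/x_k)/2.

    Assumes a > 0 and x1 > 0, as in the example above.
    """
    x = x1
    for _ in range(max_iter):
        x_next = 0.5 * (x + a / x)
        if abs(x_next - x) < tol:  # stop once successive terms agree
            return x_next
        x = x_next
    return x
```

Starting from any positive x1, the iterates decrease toward √a from the second term on, as the argument above shows; in practice the convergence is very fast, which is why the method is computationally efficient.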
It should be emphasized that the real point of the nested interval theorem is that
the intersection ⋂_{n=1}^∞ In is nonempty; the fact that it can contain no more than
one point is pretty obvious from the assumption that the length of In tends to zero.
If {xk } is a sequence (in any set, not necessarily R), we may form a subse-
quence of {xk } by deleting some of the terms and keeping the rest in their original
order. More precisely, a subsequence of {xk } is a sequence {xkj }_{j=1}^∞ specified
by a one-to-one, increasing map j → kj from the set of positive integers into it-
self. For example, by taking kj = 2j we obtain the subsequence of even-numbered
terms; by taking kj = j 2 we obtain the subsequence of those terms whose index is
a perfect square, and so on.
The following theorem is one of the most useful results in the foundations of
analysis; it is one version of the Bolzano-Weierstrass theorem, whose general form
will be found in Theorem 1.21.
1.18 Theorem. Every bounded sequence in R has a convergent subsequence.
Proof. Let {xk } be a bounded sequence, say xk ∈ [a, b] for all k. Bisect the interval
[a, b] — that is, consider the two intervals [a, 12 (a + b)] and [ 12 (a + b), b]. At least
one of these subintervals must contain xk for infinitely many k; call that subinterval
I1 . (If both of them contain xk for infinitely many k, pick the one on the left.) Now
bisect I1 . Again, one of the two halves must contain xk for infinitely many k; call
that half I2 . Proceeding inductively, we obtain a sequence of intervals Ij , each one
contained in the preceding one, each one half as long as the preceding one, and
each one containing xk for infinitely many k. By the nested interval theorem, there
is exactly one point l contained in every Ij .
It is now easy to construct a subsequence of {xk } that converges to l, as follows.
Pick an integer k1 such that xk1 ∈ I1 , then pick k2 > k1 such that xk2 ∈ I2 , then
pick k3 > k2 such that xk3 ∈ I3 , and so forth. By construction of the Ij ’s, this
process can be continued indefinitely. Since xkj and l are both in Ij , and the length
of Ij is 2^{−j} (b − a), we have |xkj − l| ≤ 2^{−j} (b − a), which tends to 0 as j → ∞;
that is, xkj → l.
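The bisection argument in this proof can be mimicked numerically. The sketch below (all names are our own) works with a finite sample of the sequence and at each step keeps the half-interval containing at least half of the surviving terms, a finite stand-in for "infinitely many":

```python
def bisection_cluster(xs, a, b, steps=40):
    """Approximate a cluster point of the terms xs within [a, b]
    by repeated bisection, as in the proof of Theorem 1.18."""
    pts = [x for x in xs if a <= x <= b]
    for _ in range(steps):
        m = 0.5 * (a + b)
        left = [x for x in pts if x <= m]
        right = [x for x in pts if x >= m]
        # Keep the half with more surviving terms (ties go left,
        # mirroring "pick the one on the left" in the proof).
        if len(left) >= len(right):
            pts, b = left, m
        else:
            pts, a = right, m
    return 0.5 * (a + b)
```

For the sequence xk = (−1)^k, whose terms pile up at ±1, the routine homes in on one of the two cluster points.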
Theorem 1.18 generalizes easily to higher dimensions:
1.19 Theorem. Every bounded sequence in Rn has a convergent subsequence.
Proof. If |xk | ≤ C for all k, then the components xk1 , . . . , xkn all lie in the interval
[−C, C]. Hence, for each m = 1, . . . , n we can extract a convergent subsequence
from the sequence of mth components, {xkm }_{k=1}^∞ . The trouble is that the indices
on these subsequences might all be different, so we can’t put them together. (We
might have chosen the odd-numbered terms for m = 1 and the even-numbered
terms for m = 2, for example.) Instead, we have to proceed inductively. First
we choose a subsequence {xkj } such that the first components converge; then we
choose a sub-subsequence {xkji } whose second components also converge, and so
on until we find a (sub)n sequence whose components all converge.
Another way to express the completeness of the real number system is to say
that every sequence whose terms get closer and closer to each other actually con-
verges. To be more precise, a sequence {xk } in Rn is called a Cauchy sequence if
|xj − xk | → 0 as j, k → ∞; that is, if for every ϵ > 0 there is an N such that
|xj − xk | < ϵ whenever j, k ≥ N .
Therefore, xk → l.
EXERCISES
1. Find sup S and inf S for the following sets S. Do these numbers belong to S
or not?
a. S = {x : (2x² − 1)(x² − 1) < 0}.
b. S = {(−1)^k + 2^{−k} : k ≥ 0}.
c. S = {x : arctan x ≥ 1}.
2. Construct a sequence {xk } that has subsequences converging to three different
limits.
3. Consider the sequence 1/2, 1/3, 2/3, 1/4, 2/4, 3/4, 1/5, 2/5, 3/5, 4/5, . . .,
obtained by listing the rational numbers in (0, 1) with denominator n in increasing
order, for n successively equal to 2, 3, 4, . . .. Show that for any a ∈ [0, 1], there is a subsequence
that converges to a. (Hint: Consider the decimal expansion of a.)
4. Given a real number a, define a sequence {xk } recursively by x1 = a,
xk+1 = xk².
a. Show, as in Example 2, that if {xk } converges, its limit must be 0 or 1.
b. For which a is the limit equal to 0? equal to 1? nonexistent?
5. Define a sequence {xk } recursively by x1 = √2, xk+1 = √(2 + xk ). Show by
induction that (a) xk < 2 and (b) xk < xk+1 for all k. Then show that lim xk
exists and evaluate it.
6. Let rk be the ratio of the (k + 1)th term to the kth term of the Fibonacci
sequence (Example 2, §1.4). (Thus the first few rk ’s are 1, 2, 3/2, 5/3, . . . .) Our
object is to show that limk→∞ rk is the “golden ratio” ϕ = ½(1 + √5), the
positive root of the equation x² = x + 1.
a. Show that
rk+1 = (rk + 1)/rk ,    rk+2 = (2rk + 1)/(rk + 1).
b. Show that rk < ϕ if k is odd and rk > ϕ if k is even. Then show that
rk+2 − rk is positive if k is odd and negative if k is even. (Hint: For x > 0
we have x² < x + 1 if x < ϕ and x² > x + 1 if x > ϕ.)
c. Show that the subsequences {r2j−1 } and {r2j } of odd- and even-numbered
terms both converge to ϕ.
7. Let {xk } be a sequence in Rn and x a point in Rn . Show that some subsequence
of {xk } converges to x if and only if every ball centered at x contains xk for
infinitely many values of k.
8. Show that every infinite bounded set in Rn has an accumulation point. (See
Exercises 6–7 in §1.4.)
Let {xk }_{1}^∞ be a bounded sequence in R. For m = 1, 2, 3, . . . , let
Ym = sup{xk : k ≥ m},    ym = inf{xk : k ≥ m}.
Then the sequence {Ym } is bounded and decreasing, and {ym } is bounded and
increasing (because the sup and inf are being taken over fewer and fewer numbers
as m increases), so they both converge. The limits lim Ym and lim ym are called
the limit superior and limit inferior of the sequence {xk }, respectively; they are
denoted by lim supk→∞ xk and lim inf k→∞ xk :
lim sup_{k→∞} xk = lim_{m→∞} ( sup{xk : k ≥ m} ),    lim inf_{k→∞} xk = lim_{m→∞} ( inf{xk : k ≥ m} ).
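The tail suprema Ym and infima ym are easy to approximate from a long finite sample of the sequence. A small illustration in Python (the sample sequence and function names are our own):

```python
def tail_sup(xs, m):
    """Y_m = sup{x_k : k >= m}, over a finite sample where
    xs[k-1] plays the role of x_k."""
    return max(xs[m - 1:])

def tail_inf(xs, m):
    """y_m = inf{x_k : k >= m} over the same finite sample."""
    return min(xs[m - 1:])

# x_k = (-1)^k + 1/k has lim sup = 1 and lim inf = -1.
xs = [(-1) ** k + 1.0 / k for k in range(1, 10001)]
```

As m grows, tail_sup(xs, m) decreases toward lim sup xk = 1 and tail_inf(xs, m) increases toward lim inf xk = −1, illustrating that {Ym } is decreasing and {ym } is increasing.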
9. Show that lim sup xk is the number a uniquely specified by the following prop-
erty: For any ϵ > 0, there are infinitely many k for which xk > a − ϵ but only
finitely many for which xk > a + ϵ. What is the corresponding condition for
lim inf xk ?
10. Show that there is a subsequence of {xk } that converges to lim sup xk , and
one that converges to lim inf xk .
11. Show that if a ∈ R is the limit of some subsequence of {xk }, then lim inf xk ≤
a ≤ lim sup xk .
12. Show that {xk } converges if and only if lim sup xk = lim inf xk , in which
case this common value is equal to lim xk .
1.6 Compactness
A subset of Rn is called compact if it is both closed and bounded. (Note: The
notion of compactness can be extended to settings other than Rn , but a different
definition must be adopted; see the concluding paragraph of this section.) Com-
pactness is an important property, principally because it yields existence theorems
for limits in many situations. The fundamental result is the following theorem.
1.22 Theorem. Continuous functions map compact sets to compact sets. That is,
suppose that S is a compact subset of Rn and f : S → Rm is continuous at every
point of S. Then the set
f (S) = {f (x) : x ∈ S}
is also compact.
Proof. Suppose {yk } is a sequence in the image f (S). For each k there is a point
xk ∈ S such that yk = f (xk ). Since S is compact, by the Bolzano-Weierstrass
theorem the sequence {xk } has a convergent subsequence {xkj } whose limit a
lies in S. Since f is continuous at a, by Theorem 1.15 the sequence {ykj } =
{f (xkj )} converges to the point f (a) ∈ f (S). Thus, every sequence in f (S) has a
subsequence whose limit lies in f (S). By the Bolzano-Weierstrass theorem again,
f (S) is compact.
It is not true, in general, that continuous functions map closed sets to closed
sets, or bounded sets to bounded sets. (See Exercises 1–2.) Only the combination
of closedness and boundedness is preserved.
An immediate consequence of Theorem 1.22 is the fundamental existence the-
orem for maxima and minima of real-valued functions.
• f (x) = cot πx, S = (0, 1). (The values of f range from −∞ to ∞.)
Compactness also has another consequence that turns out to be extremely useful
in more advanced mathematical analysis, although its significance may not be very
clear at first sight. (It will not be used elsewhere in this book except in some of the
technical arguments in Appendix B, so it may be regarded as an optional topic.)
Suppose S is a subset of Rn . A collection U of subsets of Rn is called a covering
of S if S is contained in the union of the sets in U. For example, for each x ∈ S
we could pick an open ball Bx centered at x; then U = {Bx : x ∈ S} is a covering
of S.
Much of what we have done in this section and the preceding ones can be
generalized from subsets of Rn to subsets of more general spaces equipped with a
“distance function” that behaves more or less like the Euclidean distance d(x, y) =
|x − y|. (Such spaces are known as metric spaces; see DePree and Swartz [5],
Krantz [12], or Rudin [18].) For example, in studying the geometry of a surface
S in R3 , one might want to take the “distance” between two points x, y ∈ S to
be not the straight-line distance |x − y| but the length of the shortest curve on S
that joins x to y. Another class of examples is provided by spaces of functions,
where the “distance” between two functions f and g can be measured in a number
of different ways; we shall say more about this in Chapter 8. In this general setting,
the Bolzano-Weierstrass and Heine-Borel theorems are no longer completely valid.
The conditions on a set S in Theorem 1.21b and Theorem 1.24b still imply that S is
closed and bounded, but not conversely. These conditions are still very important,
however, so a shift in terminology is called for. The condition in Theorem 1.24b —
that every open cover of S has a finite subcover — is usually taken as the definition
of compactness in the general setting, and the condition in Theorem 1.21b — that
every sequence in S has a subsequence that converges in S — is called sequential
compactness.
EXERCISES
1. Give an example of
a. a closed set S ⊂ R and a continuous function f : R → R such that f (S)
is not closed;
b. an open set U ⊂ R and a continuous function f : R → R such that f (U )
is not open.
2. a. Give an example of a bounded set S ⊂ R \ {0} and a real-valued function
f that is defined and continuous on R \ {0} such that f (S) is not bounded.
b. However, show that if f : Rn → Rm is continuous everywhere and S ⊂
Rn is bounded, then f (S) is bounded.
3. Show that an infinite set S ⊂ Rn is compact if and only if every infinite subset
of S has an accumulation point that lies in S. (See Exercises 6–7 in §1.4 and
Exercise 8 in §1.5.)
4. Suppose S ⊂ Rn is compact, f : S → R is continuous, and f (x) > 0 for
every x ∈ S. Show that there is a number c > 0 such that f (x) ≥ c for every
x ∈ S.
5. (A generalization of the nested interval theorem) Suppose {Sk } is a sequence
of nonempty compact subsets of Rn such that S1 ⊃ S2 ⊃ S3 ⊃ · · · . Show that
there is at least one point contained in all of the Sk ’s (that is, ⋂_{1}^∞ Sk ̸= ∅).
(This can be done using either the Bolzano-Weierstrass theorem or the Heine-
Borel theorem. Can you find both proofs?)
6. The distance between two sets U, V ⊂ Rn is defined to be
d(U, V ) = inf{ |x − y| : x ∈ U, y ∈ V }.
1.7 Connectedness
A set in Rn is said to be connected if it is “all in one piece,” that is, if it is not the
union of two nonempty subsets that do not touch each other. The formal definition
is as follows: A set S ⊂ Rn is disconnected if it is the union of two nonempty
subsets S1 and S2 , neither of which intersects the closure of the other one; in this
case we shall call the pair (S1 , S2 ) a disconnection of S. The set S is connected
if it is not disconnected.
EXAMPLE 1. Let
S1 = {(x, y) : (x + 1)² + y² < 1},    S2 = {(x, y) : (x − 1)² + y² < 1},
S̄2 = {(x, y) : (x − 1)² + y² ≤ 1}.
Then the set S = S1 ∪ S2 is disconnected, for the only point common to the
closures of S1 and S2 is (0, 0), which belongs to neither S1 nor S2 . However,
the set T = S1 ∪ S̄2 is connected, for (0, 0) belongs both to S̄2 and the closure
of S1 ; this point “connects” the two pieces of T . See Figure 1.2.
1.25 Theorem. The connected subsets of R are precisely the intervals (open, half-
open, or closed; bounded or unbounded).
The following result, a cousin of Theorem 1.22, gives the basic relation between
continuity and connectedness:
1.26 Theorem. Continuous functions map connected sets to connected sets. That
is, suppose f : S → Rm is continuous at every point of S and S is connected. Then
the set
f (S) = {f (x) : x ∈ S}
is also connected.
Proof. It suffices to show that if f (S) is disconnected, then so is S. Suppose
(U1 , U2 ) is a disconnection of f (S), and let Sj = {x ∈ S : f (x) ∈ Uj } for j = 1, 2.
Then S1 and S2 are nonempty, and their union is S. If there were a point x ∈ S1
belonging to the closure of S2 , x would be the limit of a sequence {xk } in S2 by
Theorem 1.14. But then f (x) ∈ U1 and f (xk ) ∈ U2 , so f (x) = lim f (xk ) would
be in the closure of U2 by Theorem 1.14 again. This is impossible; hence S1 does
not intersect the closure of S2 , and likewise, S2 does not intersect the closure of S1 .
Thus S = S1 ∪ S2 is disconnected.
1.27 Theorem (The Intermediate Value Theorem). Suppose V ⊂ R is an interval,
f : V → R is continuous, and a, b ∈ V . Then f takes on every value between f (a)
and f (b).
Proof. By Theorems 1.25 and 1.26, f (V ) is an interval. It contains f (a) and f (b)
and hence contains the entire interval between them.
The following results explain the relation between connectedness and arcwise
connectedness.
1.28 Theorem. Every arcwise connected set in Rn is connected.
Proof. We shall assume that S is disconnected and show that it is not arcwise con-
nected. Accordingly, suppose (S1 , S2 ) is a disconnection of S. Pick a ∈ S1 and
b ∈ S2 ; we claim that there is no continuous g : [0, 1] → S such that g(0) = a and
g(1) = b. If there were, the set V = g([0, 1]) would be connected by Theorems
1.25 and 1.26. But this cannot be so: V is the union of V ∩ S1 and V ∩ S2 ; these
sets are nonempty since a ∈ V ∩ S1 and b ∈ V ∩ S2 , and neither of them intersects
the closure of the other. Hence S is not arcwise connected.
The converse of Theorem 1.28 is false: A set can be connected without being
arcwise connected. A typical example is
(1.29)    S = {(x, y) : 0 < x ≤ 2 and y = sin(π/x)} ∪ {(0, y) : y ∈ [−1, 1]},
pictured in Figure 1.3. S consists of two pieces, the graph of sin(π/x) and the
vertical line segment. These two sets do not form a disconnection of S, as the line
segment is included in the closure of the graph, but a point on the line segment
cannot be connected to a point on the graph by a continuous curve. The details are
sketched in Exercise 11.
However, open connected sets are arcwise connected:
Proof. Fix a point a ∈ S. Let S1 be the set of points in S that can be joined to a
by a continuous curve in S, and let S2 be the set of points in S that cannot; thus S1
and S2 are disjoint and S = S1 ∪ S2 . We shall show that
a. if x ∈ S1 , then all points sufficiently close to x are in S1 ;
b. if x ∈ S is in the closure of S1 , then x ∈ S1 .
(a) shows that no point of S1 can be in the closure of S2 , and (b) shows that no
point in the closure of S1 can be in S2 . Thus (S1 , S2 ) will form a disconnection
of S, contrary to the assumption that S is connected, unless S2 is empty — which
means that S is arcwise connected.
To prove (a) and (b), we use the fact that S is open, so that if x ∈ S, there is
a ball B centered at x that is included in S. If x ∈ S1 , then every y ∈ B is also
in S1 , for y can be joined to a by first joining x to a and then joining y to x by
the straight line segment from x to y, which lies in B and hence in S. Similarly,
if x is in the closure of S1 , by Theorem 1.14 there is a sequence {xk } of points in
S1 that converges to x. We have xk ∈ B for k sufficiently large, so again, x can
be joined to a by joining xk to a and then joining x to xk by a line segment in B;
hence x ∈ S1 . This completes the proof.
EXERCISES
1. Show directly from the definition that the following sets are disconnected.
(That is, produce a disconnection for each of them.)
a. The hyperbola {(x, y) ∈ R2 : x² − y² = 1}.
b. Any finite set in Rn with at least two elements.
c. {(x, y, z) ∈ R3 : xyz > 0}.
4. Suppose S1 and S2 are connected sets in Rn that contain at least one point in
common. Show that S1 ∪ S2 is connected. Is it true that S1 ∩ S2 must be
connected?
5. Show that an open set in Rn is disconnected if and only if it is the union of two
disjoint nonempty open subsets.
6. Show that a closed set in Rn is disconnected if and only if it is the union of two
disjoint nonempty closed subsets.
10. Suppose S is a connected set in R2 that contains (1, 3) and (4, −1). Show that
S contains at least one point on the line x = y. (Hint: Consider f (x, y) =
x − y.)
The crucial point is that for simple continuity the number δ may depend on x, but
for uniform continuity it does not. This is a rather subtle point, and the reader
should not be discouraged if its significance is not immediately clear; some very
eminent mathematicians of the past also had trouble with it!
Some readers may find it enlightening to see these conditions rewritten in a
symbolic way that makes them as concise as possible. We employ the logical sym-
bols ∀ and ∃, which mean “for all” and “there exists,” respectively. With this un-
derstanding, the condition for f to be continuous on S is that
(1.31)    ∀ϵ > 0 ∀x ∈ S ∃δ > 0 ∀y ∈ S : |x − y| < δ =⇒ |f (x) − f (y)| < ϵ,
while the condition for f to be uniformly continuous on S is that
(1.32)    ∀ϵ > 0 ∃δ > 0 ∀x ∈ S ∀y ∈ S : |x − y| < δ =⇒ |f (x) − f (y)| < ϵ.
The difference between (1.31) and (1.32) is that the “∀x” has been interchanged
with the “∃δ,” so that in (1.31) the δ is allowed to depend on x, whereas in (1.32)
the same δ must work for every x.
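To see concretely how δ can depend on x, consider f (x) = x² on R (our illustration, not the text's): the steeper the graph near x, the smaller δ must be. A hedged sketch in Python:

```python
def good_delta(x, eps):
    """A delta that works at x for f(x) = x**2 and the given eps.

    If |y - x| < delta <= eps/(2|x| + 1) and delta <= 1, then
    |y**2 - x**2| = |y - x| * |y + x| < delta * (2|x| + 1) <= eps.
    """
    return min(1.0, eps / (2 * abs(x) + 1))
```

Since good_delta shrinks as |x| grows, no single δ works for every x at once: x² is continuous on R but not uniformly continuous there.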
EXERCISES
DIFFERENTIAL CALCULUS
The main theme of this chapter is the theory and applications of differential cal-
culus for functions of several variables. The reader is expected to be familiar with
differential calculus for functions of one variable. However, we offer a review of
the one-variable theory that contains a few features that the reader may not have
seen before, and the one-variable theory makes another appearance in the section
on Taylor’s theorem.
A function f defined on an open interval about the point a ∈ R is said to be
differentiable at a if there is a number m such that
(2.1)    f (a + h) = f (a) + mh + E(h),   where lim_{h→0} E(h)/h = 0;
in other words, if f (a + h) is the sum of the linear function f (a) + mh and an error
term that tends to zero more rapidly than h as h → 0. In this case we have
(2.2)    m = lim_{h→0} (f (a + h) − f (a))/h.
Thus the number m is uniquely determined, and it is the derivative of f at a as
usually defined in elementary calculus books, denoted by f ′ (a). Conversely, if the
limit m in (2.2) exists, then (2.1) holds with E(h) = f (a + h) − f (a) − mh. Thus,
our definition of differentiability is equivalent to the usual one; it simply puts more
emphasis on the idea of linear approximation.
Observe that if E(h)/h vanishes as h → 0, then so does E(h) itself and hence
so does f (a + h) − f (a). That is, differentiability at a implies continuity at a.
It is often convenient to express the relation limh→0 E(h)/h = 0 by saying that
“E(h) is o(h)” (pronounced “little oh of h”), meaning that E(h) is of smaller order
of magnitude than h. Thus the differentiability of f at x = a can be expressed by
saying that f (a + h) is the sum of a linear function of h and an error term that is
o(h).
The standard rules for differentiation are easily derived from (2.1). We illustrate
the ideas by working out the product rule.
The Product Rule: Suppose f and g are differentiable at x = a. Then
f (a + h) = f (a) + f ′ (a)h + E1 (h),    g(a + h) = g(a) + g′ (a)h + E2 (h),
where E1 (h) and E2 (h) are o(h). Multiplying these equations together yields
(2.3)    f (a + h)g(a + h) = f (a)g(a) + [f ′ (a)g(a) + f (a)g′ (a)]h + E3 (h),
where
E3 (h) = [f (a) + f ′ (a)h + E1 (h)]E2 (h) + E1 (h)[g(a) + g′ (a)h] + f ′ (a)g′ (a)h².
Clearly E3 (h) is o(h) since E1 (h) and E2 (h) are, so (2.3) is of the form (2.1)
with f replaced by f g and m = f ′ (a)g(a) + f (a)g′ (a). In other words, f g is
differentiable at a and (f g)′ (a) = f ′ (a)g(a) + f (a)g′ (a).
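As a sanity check on this computation, the error term E3 (h) can be evaluated numerically for a concrete pair of functions (our choice, not the text's: f (x) = sin x and g(x) = e^x at a = 0) and seen to be o(h):

```python
import math

a = 0.0
f, fp = math.sin, math.cos          # f and f'
g, gp = math.exp, math.exp          # g and g'

def E3(h):
    # Error in the linear approximation of f*g at a, as in (2.3):
    # f(a+h)g(a+h) - f(a)g(a) - [f'(a)g(a) + f(a)g'(a)] h
    linear = f(a) * g(a) + (fp(a) * g(a) + f(a) * gp(a)) * h
    return f(a + h) * g(a + h) - linear
```

Here E3(h)/h shrinks with h, consistent with (f g)′ (0) = f ′ (0)g(0) + f (0)g′ (0) = 1.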
The chain rule can also be derived in this way; we shall do so, in a more general
setting, in §2.3.
We can also define “one-sided derivatives” of a function f at a point a. To
wit, the left-hand derivative f−′ (a) and the right-hand derivative f+′ (a) are the
one-sided limits
(2.4)    f±′ (a) = lim_{h→0±} (f (a + h) − f (a))/h.
Clearly f is differentiable at a if and only if its left-hand and right-hand derivatives
at a exist and are equal. These notions are particularly useful in two situations: (i)
in discussing functions whose graphs have “corners” such as f (x) = |x|, which has
one-sided derivatives at the origin although it is not differentiable there, and (ii) in
discussing functions whose domain is a closed interval [a, b], where the one-sided
derivatives f+′ (a) and f−′ (b) may be significant.
The Mean Value Theorem. The definition of the derivative involves passing
from the “local” information given by the values of f (x) for x near a to the “in-
finitesimal” information f ′ (a), which (intuitively speaking) gives the infinitesimal
change in f corresponding to an infinitesimal change in x. To reverse the process
and pass from “infinitesimal” information to “local” information — that is, to ex-
tract information about f from a knowledge of f ′ — the principal tool is the mean
value theorem, one of the most important theoretical results of elementary calculus.
The derivation begins with the following result, which is important in its own right.
2.6 Theorem (Rolle’s Theorem). Suppose f is continuous on [a, b] and differentiable
on (a, b), and f (a) = f (b). Then there is a point c ∈ (a, b) such that f ′ (c) = 0.
Proof. By the extreme value theorem (1.23), f assumes a maximum value and a
minimum value on [a, b]. If the maximum and minimum each occur at an endpoint,
then f is constant on [a, b] since the values at the endpoints are equal, so f ′ (x) = 0
for all x ∈ (a, b). Otherwise, at least one of them occurs at some interior point
c ∈ (a, b), and then f ′ (c) = 0 by Proposition 2.5.
2.7 Theorem (Mean Value Theorem I). Suppose f is continuous on [a, b] and dif-
ferentiable on (a, b). There is at least one point c ∈ (a, b) such that
f ′ (c) = (f (b) − f (a))/(b − a).
Proof. The straight line joining (a, f (a)) to (b, f (b)) is the graph of the function
l(x) = f (a) + [(f (b) − f (a))/(b − a)] (x − a),
and the assertion is that there is a point c ∈ (a, b) where the slope of the graph
y = f (x) is the same as the slope of this line, in other words, where the derivative
of the difference g(x) = f (x) − l(x) is zero. But f and l have the same values at
a and b, so g(a) = g(b) = 0, and the conclusion then follows by applying Rolle’s
theorem to g.
The mean value theorem is nonconstructive; that is, although it asserts the ex-
istence of a certain point c ∈ (a, b), it gives no clue as to how to find that point.
Students often find this perplexing at first, but in fact the whole power of the mean
value theorem comes from situations where there is no need to know precisely
where c is. In many applications, one has information about the behavior of f ′ on
some interval, and one deduces information about f on that same interval. The
following theorem comprises the most important of them.
We say that a function f is increasing (resp. strictly increasing) on an interval
I if f (a) ≤ f (b) (resp. f (a) < f (b)) whenever a, b ∈ I and a < b; similarly for
decreasing and strictly decreasing.
In case the reader feels that we are belaboring the obvious here, we should point
out that the mere differentiability of f at a single point a gives less information
about the behavior of f near x = a than we would like. For example, if f ′ (a) > 0,
it does not follow that f is increasing in some neighborhood of a; see Exercises 3
and 4.
The mean value theorem admits the following important generalization, of
which we shall present some applications below.
2.9 Theorem (Mean Value Theorem II). Suppose that f and g are continuous on
[a, b] and differentiable on (a, b), and g′ (x) ̸= 0 for all x ∈ (a, b). Then there exists
c ∈ (a, b) such that
f ′ (c)/g′ (c) = (f (b) − f (a))/(g(b) − g(a)).
Proof. Let
h(x) = [f (b) − f (a)][g(x) − g(a)] − [g(b) − g(a)][f (x) − f (a)].
Then h is continuous on [a, b] and differentiable on (a, b), and h(a) = h(b) = 0.
By Rolle’s theorem, there is a point c ∈ (a, b) such that
0 = h′ (c) = [f (b) − f (a)]g′ (c) − [g(b) − g(a)]f ′ (c).
Since g′ is never 0 on (a, b), we have g′ (c) ̸= 0 and also g(b) − g(a) ̸= 0 (by the
mean value theorem, since g(b) − g(a) = g′ (c̃)(b − a) for some c̃ ∈ (a, b)). Hence
we can divide by both these quantities to obtain the desired result.
L’Hôpital’s Rule. Often one is faced with the evaluation of limits of quotients
f (x)/g(x) where f and g both tend to zero or infinity. The collection of related
results that go under the name of “l’Hôpital’s rule” enable one to evaluate such
limits in many cases by examining the quotient of the derivatives, f ′ (x)/g ′ (x).
The cases involving the indeterminate form 0/0 can be summarized as follows.
2.10 Theorem (L’Hôpital’s Rule I). Suppose f and g are differentiable functions
on (a, b), with g′ (x) ̸= 0 on (a, b), and
lim_{x→a+} f (x) = lim_{x→a+} g(x) = 0.
If lim_{x→a+} f ′ (x)/g′ (x) = L (finite or infinite), then lim_{x→a+} f (x)/g(x) = L.
The proof for left-hand limits is similar, and the case of two-sided limits is obtained
by combining right-hand and left-hand limits. Finally, for the case a = ±∞, we
set y = 1/x and consider the functions F (y) = f (1/y) and G(y) = g(1/y).
Since F ′ (y) = −f ′ (1/y)/y² and G′ (y) = −g′ (1/y)/y², we have F ′ (y)/G′ (y) =
f ′ (1/y)/g′ (1/y), so by the results just proved,
lim_{x→±∞} f (x)/g(x) = lim_{y→0±} F (y)/G(y) = lim_{y→0±} F ′ (y)/G′ (y) = lim_{x→±∞} f ′ (x)/g′ (x).
Under the conditions of Theorem 2.10, it may well happen that f ′ (x) and g ′ (x)
tend to zero also, so that the limit of f ′ (x)/g ′ (x) cannot be evaluated immediately.
In this case we can apply Theorem 2.10 again to evaluate the limit by examining
f ′′ (x)/g ′′ (x). More generally, if the functions f, f ′ , . . . , f (k−1) , g, g ′ , . . . , g(k−1)
all tend to zero as x tends to a+ or a− or ±∞, but f (k)(x)/g(k) (x) → L, then
f (x)/g(x) → L.
For any a > 0,
lim_{x→+∞} x^a/e^x = lim_{x→+∞} (log x)/x^a = lim_{x→0+} (log x)/x^{−a} = 0.
That is, the exponential function ex grows more rapidly than any power of x as
x → +∞, whereas | log x| grows more slowly than any positive power of x as
x → +∞ and more slowly than any negative power of x as x → 0+.
Proof. For the first limit, let k be the smallest integer that is ≥ a. A k-fold appli-
cation of Theorem 2.11 yields
lim_{x→+∞} x^a/e^x = lim_{x→+∞} a(a − 1) · · · (a − k + 1)x^{a−k}/e^x ,
and the latter limit is zero because a − k ≤ 0. For the other two limits, a single
application of Theorem 2.11 suffices:
lim_{x→+∞} (log x)/x^a = lim_{x→+∞} 1/(ax^a ) = 0,    lim_{x→0+} (log x)/x^{−a} = lim_{x→0+} x^a/(−a) = 0.
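These limits can be sampled numerically; the sample points x = 100, 10^8, 10^{−8} and the exponents below are our own choices:

```python
import math

# x^a / e^x -> 0 as x -> +infinity, even for a large power a = 10:
r1 = 100.0**10 / math.exp(100.0)     # already astronomically small at x = 100

# (log x) / x^a -> 0 as x -> +infinity, even for a small power a = 1/2:
r2 = math.log(1e8) / 1e8**0.5

# (log x) / x^(-a) = x^a log x -> 0 as x -> 0+ (here a = 1/2):
r3 = 1e-8**0.5 * math.log(1e-8)
```

All three values are tiny, in line with the proposition: the exponential dominates every power, and the logarithm is dominated by every positive power.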
If f is a function defined near a ∈ R with values in Rn , its derivative is defined by
the same formula as before:
f ′ (a) = lim_{h→0} (f (a + h) − f (a))/h.
The jth component of the difference quotient on the right is h^{−1} [fj (a + h) − fj (a)].
It follows that f is differentiable if and only if each of its component functions fj
is differentiable, and that differentiation is simply performed componentwise:
f ′ (a) = (f1′ (a), . . . , fn′ (a)).
The usual rules of differentiation generalize easily to this situation. In particular,
there are two forms of the product rule: one for the product of a scalar function ϕ
and a vector function f , and one for the dot product of two vector functions f and g:
(ϕf )′ = ϕ′ f + ϕf ′ , (f · g)′ = f ′ · g + f · g′ .
The first of these is just the ordinary product rule applied to each component ϕfj
of ϕf , and the second one is almost as easy (Exercise 8). Similarly, when n = 3
we have the product rule for cross products:
(f × g)′ = f ′ × g + f × g′ .
(The only point that needs attention here is that the factors f and g must be in the
same order in all three products.)
The most common geometric interpretation of a function f : R → Rn (n >
1) is as the parametric representation of a curve in Rn . That is, the independent
variable t is interpreted as time, and f (t) is the position of a particle moving in
Rn at time t that traces out a curve as t varies. In this setting, the derivative f ′ (t)
represents the velocity of the particle at time t.
Of particular importance are the straight lines in Rn . If a, c ∈ Rn and c ̸= 0,
the line through a in the direction parallel to the vector c is represented parametri-
cally by l(t) = a + tc. In particular, for the line passing through two points a and
b we have c = b − a, and the line is given by l(t) = a + t(b − a); the line segment
from a to b is obtained by restricting t to the interval [0, 1].
If f : R → Rn gives a parametric representation of a curve in Rn and f ′ (a) ̸= 0,
the function l(t) = f (a) + tf ′ (a) gives a parametric representation of the tangent
line to the curve at the point f (a). (If f ′ (a) = 0, the curve may not have a tangent
line at f (a). For example, if f (t) = (t³, |t|³ ), then f ′ (0) = (0, 0), but the curve in
question is the graph y = |x|.) We shall discuss these matters more thoroughly in
Chapter 3.
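Componentwise differentiation and the tangent-line formula l(t) = f (a) + tf ′ (a) can be sketched numerically; in the code below (all names and the step size are our own) the derivative is approximated by a central difference:

```python
import math

def f(t):
    # a parametric curve in R^2 (the unit circle)
    return (math.cos(t), math.sin(t))

def fprime(t, h=1e-6):
    # componentwise central-difference approximation to f'(t)
    return tuple((p - q) / (2 * h) for p, q in zip(f(t + h), f(t - h)))

def tangent_line(a, t):
    # l(t) = f(a) + t * f'(a), a parametric tangent line at f(a)
    p, v = f(a), fprime(a)
    return tuple(pi + t * vi for pi, vi in zip(p, v))
```

At a = 0 this recovers f ′ (0) ≈ (0, 1) = (−sin 0, cos 0), and tangent_line(0, t) traces the vertical tangent line through the point (1, 0).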
It should be pointed out that the mean value theorem is not valid for vector-
valued functions. For example, the function f (t) = (cos t, sin t) satisfies f (0) =
f (2π), but f ′ (t) = (− sin t, cos t), so there is no point t where f ′ (t) = 0. However,
some of the corollaries of the mean value theorem remain valid. In particular, if
|f ′ (t)| ≤ M for all t ∈ [a, b], then
|f (b) − f (a)| ≤ M |b − a|.
We shall prove this for the more general case of functions of several variables in
§2.10.
EXERCISES
1. Suppose that f is differentiable on the interval I and that f ′ (x) > 0 for all
x ∈ I except for finitely many points at which f ′ (x) = 0. Show that f is
strictly increasing on I.
2. Define the function f by f (x) = x² sin(1/x) if x ̸= 0 and f (0) = 0. Show that
f is differentiable at every x ∈ R, including x = 0, but that f ′ is discontinuous
at x = 0. (Calculating f ′ (x) for x ̸= 0 is easy; to calculate f ′ (0) you need to
go back to the definition of derivative.)
3. Let f be the function in Exercise 2, and let g(x) = f (x) + ½x. Show that
g′ (0) > 0 but that there is no neighborhood of 0 on which g is increasing.
(More precisely, every interval containing 0 has subintervals on which g is
decreasing.)
4. Define the function h by h(x) = x² if x is rational, h(x) = 0 if x is irrational.
Show that h is differentiable at x = 0, even though it is discontinuous at every
other point.
5. Suppose that f is continuous on [a, b] and differentiable on (a, b), and that the
right-hand limit L = lim_{x→a+} f′(x) exists. Show that the right-hand derivative
f′+(a) exists and equals L. (Hint: Consider the difference quotients defining
f′+(a) and use the mean value theorem.) Of course, the analogous result for
left-hand limits at b also holds.
6. Suppose that f is three times differentiable on an interval containing a. Show
that
   lim_{h→0} [f(a + 2h) − 2f(a + h) + f(a)] / h^2 = f′′(a),

   lim_{h→0} [f(a + 3h) − 3f(a + 2h) + 3f(a + h) − f(a)] / h^3 = f^(3)(a).
Can you find the generalization to higher derivatives?
7. Show that for any a, b ∈ R, lim_{x→0} (1 + ax)^{b/x} = e^{ab}. (Hint: Take logarithms.)
8. Suppose f and g are differentiable functions on R with values in Rn .
a. Show that (f · g)′ = f ′ · g + f · g′ .
b. Suppose also that n = 3, and show that (f × g)′ = f ′ × g + f × g′ .
9. Define the function f by f(x) = e^{−1/x^2} if x ≠ 0, f(0) = 0.
2.2. Differentiability in Several Variables 53
a. Show that lim_{x→0} f(x)/x^n = 0 for all n > 0. (You’ll find that a simple-minded
application of Theorem 2.10 doesn’t work. Try setting y = 1/x^2
instead.)
b. Show that f is differentiable at x = 0 and that f ′ (0) = 0.
c. Show by induction on k that for x ≠ 0, f^(k)(x) = P(1/x)e^{−1/x^2}, where
P is a polynomial of degree 3k.
d. Show by induction on k that f^(k)(0) exists and equals 0 for all k. (Use the
results of (a) and (c) to compute the derivative of f^(k−1) at x = 0 directly
from the definition, as in (b).)
The upshot is that f possesses derivatives of all orders at every point and that
f^(k)(0) = 0 for all k.
10. Exercise 2 shows that it is possible for f ′ to exist at every point of an interval
I but to have discontinuities. It is an intriguing fact that when f ′ exists at every
point of I, it has the intermediate value property whether or not it is continuous.
More precisely:
Darboux’s Theorem. Suppose f is differentiable on [a, b]. If v is any num-
ber between f ′ (a) and f ′ (b), there is a point c ∈ (a, b) such that f ′ (c) = v.
Prove Darboux’s theorem, as follows: To simplify the notation, consider
the case a = 0, b = 1. Define h : [0, 2] → R by setting h(0) = f′(0), h(2) = f′(1), and

   h(x) = [f(x) − f(0)]/x for 0 < x ≤ 1,   h(x) = [f(1) − f(x − 1)]/(2 − x) for 1 ≤ x < 2.

Show that h is continuous on [0, 2] and apply the intermediate value theorem to it. (This argument has a simple geometric interpretation,
which you can find if you think of h(x) as the slope of the chord joining a
certain pair of points on the graph of f .)
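The difference-quotient formulas of Exercise 6 are easy to probe numerically before proving them. A quick check (Python; the choice f(t) = e^t is ours, convenient because every derivative of f at a equals e^a):

```python
import math

a = 0.5
exact = math.exp(a)   # f''(a) = f'''(a) = e^a for f(t) = e^t

def f(t):
    return math.exp(t)

def d2(h):
    # [f(a + 2h) - 2 f(a + h) + f(a)] / h^2  ->  f''(a)
    return (f(a + 2*h) - 2*f(a + h) + f(a)) / h**2

def d3(h):
    # [f(a + 3h) - 3 f(a + 2h) + 3 f(a + h) - f(a)] / h^3  ->  f'''(a)
    return (f(a + 3*h) - 3*f(a + 2*h) + 3*f(a + h) - f(a)) / h**3

err2 = abs(d2(1e-4) - exact)
err3 = abs(d3(1e-3) - exact)
```

The binomial-coefficient pattern in the numerators is the clue to the generalization asked for at the end of the exercise.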
   lim_{h→0} [f(x1, . . . , xj + h, . . . , xn) − f(x1, . . . , xj, . . . , xn)] / h,
The most common notations for the partial derivative just defined are
   ∂f/∂xj,   f_{xj},   f_j,   ∂_{xj} f,   ∂_j f.
The first one is a modification of the Leibniz notation df /dx for ordinary deriva-
tives with the d replaced by the “curly d” ∂. The second one, with the variable of
differentiation indicated merely as a subscript on the function, is often used when
the first one seems too cumbersome. The third one is a variation on the second one
that is used when one does not want to commit oneself to naming the independent
variables but wants to speak of “the partial derivative of f with respect to its jth
variable.” The notations fxj and fj have the disadvantage that they may conflict
with other uses of subscripts — for example, denoting an ordered list of functions
by f1 , f2 , f3 , . . .. It has therefore become increasingly common in advanced math-
ematics to use the notations ∂xj f and ∂j f instead, which are reasonably compact
and at the same time quite unambiguous.
EXAMPLE 1. Let f(x, y, z) = e^{3x} sin(xy) / (1 + 5y − 7z). Then

   ∂f/∂x = (3 sin(xy) + y cos(xy)) e^{3x} / (1 + 5y − 7z),
   ∂f/∂y = x e^{3x} cos(xy) / (1 + 5y − 7z) − 5 e^{3x} sin(xy) / (1 + 5y − 7z)^2,
   ∂f/∂z = 7 e^{3x} sin(xy) / (1 + 5y − 7z)^2.
The partial derivatives of a function give information about how the value of
the function changes when just one of the independent variables changes; that is,
they tell how the function varies along the lines parallel to the coordinate axes.
Sometimes this is just what is needed, but often we want something more. We may
want to know how the function behaves when several of the variables are changed at
once; or we may want to consider a new coordinate system, rotated with respect to
the old one, and ask how the function varies along the lines parallel to the new axes.
Do the partial derivatives provide such information? Without additional conditions
on the function, the answer is no.
We need to give more thought to what it should mean for a function of several
variables to be differentiable. The right idea is provided by the characterization of
differentiability in one variable that we developed in the preceding section. Namely,
a function f (x) is differentiable at a point x = a if there is a linear function l(x)
such that l(a) = f (a) and the difference f (x) − l(x) tends to zero faster than x − a
as x approaches a. Now, the general linear¹ function of n variables has the form
l(x) = b + c1 x1 + · · · + cn xn = b + c · x,
and the condition l(a) = f (a) forces b to be f (a) − c · a, so that l(x) = f (a) + c ·
(x − a). With this in mind, here is the formal definition.
A function f defined on an open set S ⊂ Rn is called differentiable at a point
a ∈ S if there is a vector c ∈ Rn such that
(2.15)   lim_{h→0} [f(a + h) − f(a) − c · h] / |h| = 0.
In this case c (which is uniquely determined by (2.15), as we shall see shortly) is
called the gradient of f at a and is denoted by ∇f (a). Denoting the numerator
of the quotient on the left side of (2.15) by E(h), we observe that (2.15) can be
rewritten as
(2.16)   f(a + h) = f(a) + ∇f(a) · h + E(h),   where E(h)/|h| → 0 as h → 0,
which clearly expresses the fact that f (a + h), as a function of h, is well approxi-
mated by the linear function f (a) + ∇f (a) · h near h = 0.
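The definition can be probed numerically: for a differentiable f, the quotient E(h)/|h| in (2.16) should shrink roughly linearly with |h|. A minimal check (Python; the sample f and the step sizes are our choices):

```python
import math

def f(x, y):
    return x**2 + 5 * x * y**2   # sample function; grad f = (2x + 5y^2, 10xy)

a1, a2 = -2.0, 1.0
grad = (2*a1 + 5*a2**2, 10*a1*a2)   # = (1, -20) at a = (-2, 1)

def ratio(h1, h2):
    # E(h)/|h| from (2.16); should tend to 0 as h -> 0
    E = f(a1 + h1, a2 + h2) - f(a1, a2) - (grad[0]*h1 + grad[1]*h2)
    return abs(E) / math.hypot(h1, h2)

r_big, r_small = ratio(1e-2, 1e-2), ratio(1e-5, 1e-5)
```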
¹Unfortunately the term “linear” has two common meanings as applied to functions: “first-degree
polynomial” and “satisfying l(ax + by) = al(x) + bl(y).” The first meaning — the one used here
— allows a constant term; the second does not. See Appendix A, (A.5).
What does this mean? First, let us establish the geometric intuition. If n = 2,
the graph of the equation z = f (x) (with x = (x, y)) represents a surface in
3-space, and the graph of the equation z = f (a) + ∇f (a) · (x − a) (x is the
variable; a is fixed) represents a plane. These two objects both pass through the
point (a, f (a)), and at nearby points x = a + h we have
   z_surface − z_plane = f(a + h) − f(a) − ∇f(a) · h.
Condition (2.16) says precisely that this difference tends to zero faster than h as
h → 0. Geometrically, this means that the plane z = f (a) + ∇f (a) · (x − a) is
the tangent plane to the surface z = f (x) at x = a, as indicated in Figure 2.1.
The same interpretation is valid in any number of variables, with a little stretch of
the imagination: The equation z = f (x) represents a “hypersurface” in Rn+1 with
coordinates (x1 , . . . , xn , z), and the equation z = f (a)+∇f (a)·(x−a) represents
its “tangent hyperplane” at a.
Next, let us establish the connection with partial derivatives and the uniqueness
of the vector c in (2.15). Suppose f is differentiable at a. If we take the increment
h in (2.16) to be of the form h = (h, 0, . . . , 0) with h ∈ R, we have c · h = c1 h
and |h| = ±h (depending on the sign of h). Thus (2.16) says (after multiplying
through by −1 if h is negative) that
   lim_{h→0} { [f(a1 + h, a2, . . . , an) − f(a1, . . . , an)] / h − c1 } = 0,
or in other words, that c1 = ∂1 f (a). Likewise, cj = ∂j f (a) for j = 2, . . . , n. To
summarize:
2.17 Theorem. If f is differentiable at a, then the partial derivatives ∂j f (a) all
exist, and they are the components of the vector ∇f (a).
We also have the following:
2.18 Theorem. If f is differentiable at a, then f is continuous at a.
The converses of Theorems 2.17 and 2.18 are false. The continuity of f does
not imply the differentiability of f even in dimension n = 1 (think of functions like
f (x) = |x| whose graphs have corners). When n > 1, the mere existence of the
partial derivatives of f does not imply the differentiability of f either. The example
(2.14) demonstrates this: Its partial derivatives exist, but it is not continuous at the
origin, so it cannot be differentiable there.
To restate what we have just shown: For a function f to be differentiable at a
it is necessary for the partial derivatives ∂j f (a) to exist, but not sufficient. How,
then, do we know when a function is differentiable? Fortunately, there is a simple
condition, not too much stronger than the existence of the partial derivatives, that
guarantees differentiability.
2.19 Theorem. Let f be a function defined on an open set in Rn that contains the
point a. Suppose that the partial derivatives ∂j f all exist on some neighborhood of
a and that they are continuous at a. Then f is differentiable at a.
Proof. Let’s consider the case n = 2, to keep the notation simple. We wish to show
that
(2.20)   [f(a + h) − f(a) − c · h] / |h| → 0 as h → 0,   where c = (∂1 f(a), ∂2 f(a)).
To do this, we shall analyze the increment f (a + h) − f (a) by making the change
one variable at a time:
(2.21)   f(a + h) − f(a) = [f(a1 + h1, a2 + h2) − f(a1, a2 + h2)]
                           + [f(a1, a2 + h2) − f(a1, a2)].
We assume that h is small enough so that the partial derivatives ∂j f (x) exist when-
ever |x − a| ≤ |h|. In this case, we can use the one-variable mean value theorem to
express the differences on the right side of (2.21) in terms of the partial derivatives
of f at suitable points. If we set g(t) = f(t, a2 + h2), then by the mean value theorem

   f(a1 + h1, a2 + h2) − f(a1, a2 + h2) = g(a1 + h1) − g(a1) = ∂1 f(a1 + c1, a2 + h2) h1

for some c1 between 0 and h1, and similarly

   f(a1, a2 + h2) − f(a1, a2) = ∂2 f(a1, a2 + c2) h2

for some c2 between 0 and h2. Substituting these results back into (2.21) and then
into the left side of (2.20), we obtain
   [f(a + h) − f(a) − c · h] / |h|
      = [∂1 f(a1 + c1, a2 + h2) − ∂1 f(a1, a2)] (h1/|h|)
      + [∂2 f(a1, a2 + c2) − ∂2 f(a1, a2)] (h2/|h|).
Now let h → 0. The expressions in brackets tend to 0 because the partial deriva-
tives ∂j f are continuous at a, and the ratios h1 /|h| and h2 /|h| are bounded by 1 in
absolute value. Thus (2.20) is valid and f is differentiable at a.
The idea for general n is exactly the same. We write f (a + h) − f (a) as the
sum of n increments, each of which involves a change in only one variable — for
example, the first of them is
f (a1 + h1 , a2 + h2 , . . . , an + hn ) − f (a1 , a2 + h2 , . . . , an + hn )
— and then use the mean value theorem to express each difference in terms of a
partial derivative of f and proceed as before.
where the error term is negligibly small in comparison with h. If we neglect the
error term, the resulting approximation to the increment f (a + h) − f (a) is called
the differential of f at a and is denoted by df(a; h) or df_a(h):

   df(a; h) = ∇f(a) · h = ∂1 f(a) h1 + · · · + ∂n f(a) hn.

This follows from (2.22) and the fact that the partial derivatives obey these rules.
We’ll see later how differentials interact with the chain rule.
Differentials are handy for approximating small changes in a function. Here’s
an example:
EXAMPLE 3. A right circular cone has height 5 and base radius 3. (a) About
how much does the volume increase if the height is increased to 5.02 and the
radius is increased to 3.01? (b) If the height is increased to 5.02, by about how
much should the radius be decreased to keep the volume constant?
Solution. The volume of a cone is given by V = (1/3)πr^2 h, so dV =
(2/3)πrh dr + (1/3)πr^2 dh. (a) If r = 3, h = 5, dr = .01, and dh = .02, we
have dV = (2/3)π(3)(5)(.01) + (1/3)π(3^2)(.02) = .16π ≈ .50. (b) If r = 3, h = 5,
dh = .02, as in (a) we have dV = 10π dr + .06π, so dV = 0 if dr = −.006.
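The approximation in Example 3 can be compared with the exact change in volume; a quick numerical check (Python):

```python
import math

def V(r, h):
    return math.pi * r**2 * h / 3.0

r, h, dr, dh = 3.0, 5.0, 0.01, 0.02

# dV = (2/3) pi r h dr + (1/3) pi r^2 dh, as in the solution
dV = (2/3) * math.pi * r * h * dr + (1/3) * math.pi * r**2 * dh
exact_change = V(r + dr, h + dh) - V(r, h)

# part (b): setting dV = 0 with dh = .02 gives dr = -r dh / (2 h)
dr_b = -r * dh / (2 * h)
```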
We take h = tu. If t > 0, then |h| = t and the expression on the left of (2.25) is

   [f(a + tu) − f(a)]/t − ∇f(a) · u.

If t < 0, then |h| = −t and the expression on the left of (2.25) is

   −[f(a + tu) − f(a)]/t + ∇f(a) · u.
In either case, this quantity tends to 0 as t → 0, which means that ∂u f (a) exists
and equals ∇f (a) · u.
direction as ∇f (a). Thus, ∇f (a) is the vector whose magnitude is the largest di-
rectional derivative of f at a, and whose direction is the direction of that derivative.
In other words, ∇f (a) points in the direction of steepest increase of f at a, and its
magnitude is the rate of increase of f in that direction.
EXAMPLE 4. Let f(x, y) = x^2 + 5xy^2, a = (−2, 1). (a) Find the directional
derivative of f at a in the direction of the vector v = (12, 5). (b) What is the
largest of the directional derivatives of f at a, and in what direction does it
occur?
Solution. We have ∇f(x, y) = (2x + 5y^2, 10xy), so that ∇f(−2, 1) =
(1, −20). The unit vector in the direction of v is u = (12/13, 5/13), so the direc-
tional derivative in this direction is ∇f(a) · u = (1, −20) · (12/13, 5/13) = −88/13.
The largest directional derivative at a is |∇f(a)| = √401, and it occurs in the
direction (1/√401)(1, −20).
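The answer in Example 4 can be double-checked with a difference quotient along u (Python):

```python
def f(x, y):
    return x**2 + 5 * x * y**2

a1, a2 = -2.0, 1.0
grad = (1.0, -20.0)          # grad f(-2, 1), as computed in the example
u = (12/13, 5/13)            # unit vector in the direction of v = (12, 5)
directional = grad[0]*u[0] + grad[1]*u[1]    # = -88/13

t = 1e-6
numeric = (f(a1 + t*u[0], a2 + t*u[1]) - f(a1, a2)) / t
```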
EXERCISES
1. For each of the following functions f , (i) compute ∇f , (ii) find the directional
derivative of f at the point (1, −2) in the direction (3/5, 4/5).
a. f(x, y) = x^2 y + sin πxy.
b. f(x, y) = e^{4x−y^2}.
c. f(x, y) = (x + 2y + 4)/(7x + 3y).
2. For each of the following functions f , (i) compute the differential df , (ii) use
the differential to estimate the difference f (1.1, 1.2, −0.1) − f (1, 1, 0).
a. f(x, y, z) = x^2 e^{x−y+3z}.
b. f(x, y, z) = y^3 + log(x + z^2).
3. Let w = f(x, y, z) = x^2 y^{3/2} z / (z + 1). Suppose that, at the outset, (x, y, z) =
(5, 4, 1), so that w = 100. Use differentials to answer the following ques-
tions.
a. Suppose we change x to 5.03 and y to 3.92. By (about) how much should
we change z in order to keep w = 100?
b. Suppose we want to increase the value of w a little bit by changing the
value of only one of the independent variables. Which variable should
we choose to get the biggest increase in w for the smallest change of the
independent variable?
4. Show that u = f(x, y, z) = x e^{2z} + y^{−1} e^{5z} satisfies the differential equation
x ∂x u + 2y ∂y u + ∂z u = 3u.
(2.27)   dw/dt = (∂w/∂x1)(dx1/dt) + · · · + (∂w/∂xn)(dxn/dt).
In the first equation we take h = g(a + u) − g(a). By the second equation, we also
have h = ug′(a) + E2(u), and we are given that g(a) = b, so

   f(g(a + u)) = f(b) + ∇f(b) · h + E1(h) = f(b) + u ∇f(b) · g′(a) + E3(u),

where

   E3(u) = ∇f(b) · E2(u) + E1(h).
We claim that the error term E3(u) satisfies E3(u)/u → 0 as u → 0. Granted this,
we have

   ϕ′(a) = lim_{u→0} [ϕ(a + u) − ϕ(a)]/u = ∇f(b) · g′(a),

which is the desired formula. As for the claim, the first term in E3(u), namely
∇f(b) · E2(u), becomes negligibly small in comparison to |u| because E2(u)/u → 0.
Now the second term in E3 (u), namely E1 (h), becomes negligibly small in com-
parison to |h| as |h| → 0, and the estimate above shows that |h| in turn is bounded
by a constant times |u|, so E1 (h) becomes negligibly small in comparison to |u| as
u → 0, which means that E1 (h)/u → 0 as desired.
   dw/dt = (d/dt) f(t^4 − t, sin 3t, e^{−2t})
         = (∂1 f) · (4t^3 − 1) + (∂2 f) · (3 cos 3t) + (∂3 f) · (−2e^{−2t}),

where the partial derivatives ∂j f are all evaluated at (t^4 − t, sin 3t, e^{−2t}).
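Since the f in this example is left unspecified, a concrete (hypothetical) choice of f lets us test the formula against a difference quotient (Python):

```python
import math

# a hypothetical f of class C^1, with its partial derivatives written out
def f(x, y, z):  return x * y + z**2
def f1(x, y, z): return y        # partial in the first variable
def f2(x, y, z): return x        # partial in the second variable
def f3(x, y, z): return 2 * z    # partial in the third variable

def w(t):
    return f(t**4 - t, math.sin(3*t), math.exp(-2*t))

def dw_chain(t):
    x, y, z = t**4 - t, math.sin(3*t), math.exp(-2*t)
    return (f1(x, y, z) * (4*t**3 - 1)
            + f2(x, y, z) * (3 * math.cos(3*t))
            + f3(x, y, z) * (-2 * math.exp(-2*t)))

t0, h = 0.7, 1e-6
numeric = (w(t0 + h) - w(t0 - h)) / (2 * h)   # central difference quotient
```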
(2.28)   (∂ϕ/∂tk)(a) = ∇f(b) · (∂g/∂tk)(a)   (b = g(a)),
   ∂w/∂tk = (∂w/∂x1)(∂x1/∂tk) + · · · + (∂w/∂xn)(∂xn/∂tk).
To be precise, this calculation shows that if the partial derivatives ∂g/∂tk exist
at t = a and if f is differentiable at x = b = g(a), then the partial derivatives
∂ϕ/∂tk exist at t = a and are given by (2.28). It also shows that if g is of class
C 1 near a and f is of class C 1 near b = g(a), then ϕ is of class C 1 , and in
particular is differentiable, near a. Indeed, under these hypotheses, (2.28) shows
that the partial derivatives ∂ϕ/∂tk are continuous.
It is also natural to ask whether the composite function f ◦ g is differentiable
when f and g are only assumed to be differentiable rather than C 1 . The answer is
affirmative. When t is only a single real variable, this result is contained in the chain
rule as stated and proved above. The proof for the general case, t = (t1 , . . . , tm ),
is almost identical except that the notation is a little messier, and we shall not take
the trouble to write it out. But we shall give a formal statement of the result:
The content of the chain rule (2.30) is precisely that this last expression for dw coin-
cides with (2.33). In other words, the differential formalism has the chain rule “built
in,” just as it does in one variable (where the chain rule dw/dt = (dw/dx)(dx/dt)
is just a matter of “canceling the dx’s”).
The preceding discussion concerns the situation where the variable w depends
on a set of variables xj , and the xj ’s depend on a different set of variables tk .
However, in many situations the variables on different “levels” can get mixed up
with each other. The typical example is as follows. Consider a physical quantity
w = f (x, y, z, t) whose value depends on the position (x, y, z) and the time t
(temperature, for example, or air pressure in a region of the atmosphere). Consider
also a vehicle moving through space, so that its coordinates (x, y, z) are functions
of t. We wish to know how the quantity w varies in time, as measured by an
observer on the vehicle; that is, we are interested in the behavior of the composite
function

   w = f(x(t), y(t), z(t), t).
Here t enters not only as a “first-level” variable, as the last argument of f , but also
as a “second-level” variable through the t-dependence of x, y, z.
How should this be handled? There is no real problem; the only final indepen-
dent variable is t, so the chain rule in the form (2.27) can be applied:
(2.34)   dw/dt = (∂w/∂x)(dx/dt) + (∂w/∂y)(dy/dt) + (∂w/∂z)(dz/dt) + ∂w/∂t.
In the last term we have omitted the derivative dt/dt, which of course equals 1. (If
this makes you nervous, denote the fourth variable in f by u instead of t; then we
are considering w = f (x(t), y(t), z(t), u(t)) where u(t) = t.)
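Formula (2.34) is easy to verify on a concrete example (Python; the particular f and the path below are our invention):

```python
import math

# hypothetical quantity w = f(x, y, z, t) = x*y + z*t, observed along the path
# x = cos t, y = sin t, z = t^2
def w(t):
    x, y, z = math.cos(t), math.sin(t), t**2
    return x * y + z * t

def dw_total(t):
    x, y, z = math.cos(t), math.sin(t), t**2
    return (y * (-math.sin(t))       # (dw/dx)(dx/dt)
            + x * math.cos(t)        # (dw/dy)(dy/dt)
            + t * (2 * t)            # (dw/dz)(dz/dt)
            + z)                     # the explicit partial dw/dt

t0, h = 1.3, 1e-6
numeric = (w(t0 + h) - w(t0 - h)) / (2 * h)
```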
Notice the subtle use of notation: The dw/dt on the left of (2.34) denotes the
“total derivative” of w, taking into account all the ways in which w depends on t,
whereas the ∂w/∂t on the right denotes the partial derivative that involves only the
explicit dependence of the function f on its fourth variable t. This notation works
well enough in this situation, but it becomes inadequate if there is more than one
final independent variable.
Suppose, for example, that we are studying a function w = f (x, y, t, s), and
that x and y are themselves functions of the independent variables t and s. Then
the analogue of (2.34) would be
   ∂w/∂t = (∂w/∂x)(∂x/∂t) + (∂w/∂y)(∂y/∂t) + ∂w/∂t,
but this is nonsense! The ∂w/∂t’s on the left and on the right denote different
things. In such a situation we must use one of the alternative notations for partial
2.3. The Chain Rule 67
derivatives that offer more precision, or perhaps add some subscripts to the ∂w/∂t’s
to specify their meaning. In this case, if x = ϕ(t, s) and y = ψ(t, s), we could
write
(2.35)   ∂w/∂t = (∂1 f)(∂1 ϕ) + (∂2 f)(∂1 ψ) + ∂3 f.
The mixture of dependent-and-independent-variable notation on the left and
functional notation on the right in (2.35) is perhaps inelegant, but it does the job!
In general, it is best not to be too doctrinaire about deciding to use one notation
for partial derivatives rather than another one; clarity is more important than con-
sistency. We shall be quite free about adopting whichever notation works best in a
particular situation, and the exercises aim at encouraging the reader to do likewise.
When the relations among the variables become too complicated for comfort,
we can often sort things out by drawing a schematic diagram of the functional
relationships. The idea is as follows:
i. Write down the dependent variable on the left of the page, a list of the inde-
pendent variables on which it ultimately depends on the right, and lists of the
intermediate variables in the middle.
ii. Whenever one variable p depends directly on another one q, draw a line joining
them; this line represents the partial derivative ∂p/∂q.
iii. To find the derivative of the variable w on the left with respect to one of the
variables t on the right, consider all the ways you can go from w to t by follow-
ing the lines. For each such path, write down the product of partial derivatives
corresponding to the lines along the path, then add the results.
The diagram for the basic chain rule (2.27) is shown in Figure 2.2: The path
from w to xj to t gives the term (∂w/∂xj )(dxj /dt) in (2.27). On the other hand,
Figure 2.3 gives the diagram for w = f (x, y, t, s) where x and y depend on t and
s: There are three paths from w to t (w to x to t, w to y to t, and w to t directly)
that give the three terms on the right of (2.35).
Proof. Consider the function ϕ(t) = f(tx). On the one hand, since f(tx) =
t^a f(x), we have ϕ′(t) = at^{a−1} f(x) = at^{−1} f(tx). On the other, by the chain rule
we have

   ϕ′(t) = ∇f(tx) · (d/dt)(tx) = x · ∇f(tx).
Setting t = 1 and equating the two expressions for ϕ′ (1), we obtain the asserted
result.
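The relation just proved, x · ∇f(x) = a f(x) for f homogeneous of degree a, can be illustrated concretely (Python; the degree-3 function below is our choice):

```python
def f(x, y):
    return x**2 * y              # homogeneous of degree 3: f(tx, ty) = t^3 f(x, y)

def grad_f(x, y):
    return (2 * x * y, x**2)

x, y, t = 1.7, -0.4, 2.5
lhs = x * grad_f(x, y)[0] + y * grad_f(x, y)[1]   # x . grad f(x)
rhs = 3 * f(x, y)                                 # a f(x) with a = 3
homog = f(t * x, t * y) - t**3 * f(x, y)          # homogeneity itself
```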
We conclude this section with an additional geometric insight into the meaning
of the gradient of a function. If F is a differentiable function of (x, y, z) ∈ R3 , the
locus of the equation F (x, y, z) = 0 is typically a smooth two-dimensional surface
S in R3 . (We shall consider this matter more systematically in Chapter 3.) Suppose
that (x, y, z) = g(t) is a parametric represention of a smooth curve on S. On the
one hand, by the chain rule we have (d/dt)F (g(t)) = ∇F (g(t)) · g′ (t). On the
other hand, since the curve lies on S, we have F (g(t)) = 0 for all t and hence
(d/dt)F (g(t)) = 0. Thus, for any curve on S, the gradient of F is orthogonal
to the tangent vector to the curve at each point on the curve. Since such curves can
go in any direction on the surface, we conclude that at any point a ∈ S, ∇F (a) is
orthogonal to every vector that is tangent to S at a. (Of course, this is interesting
only if ∇F (a) ̸= 0.) We summarize:
2.3. The Chain Rule 69
2.38 Corollary. Under the conditions of the theorem, the equation of the tangent
plane to S at a is ∇F (a) · (x − a) = 0.
This formula for the tangent plane to a surface agrees with the one we gave in
§2.2 when the surface is the graph of a function f (x, y). The easy verification is
left to the reader (Exercise 5).
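A numerical illustration with the unit sphere F(x, y, z) = x^2 + y^2 + z^2 − 1 (Python; the point and the curve on the sphere are our choices):

```python
import math

a = (0.6, 0.0, 0.8)       # a point on the unit sphere
b = (0.0, 1.0, 0.0)       # a unit vector orthogonal to a

def g(t):
    # a great circle on the sphere with g(0) = a
    return tuple(math.cos(t) * ai + math.sin(t) * bi for ai, bi in zip(a, b))

def F(p):
    return p[0]**2 + p[1]**2 + p[2]**2 - 1.0

grad = tuple(2 * c for c in a)      # grad F(a)
h = 1e-6
tangent = tuple((g(h)[i] - g(-h)[i]) / (2 * h) for i in range(3))
dot = sum(gi * ti for gi, ti in zip(grad, tangent))   # should vanish
on_sphere = F(g(0.5))                                 # the curve stays on S
```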
A similar result holds if we have two equations F (x, y, z) = 0 and G(x, y, z) =
0. Each of them (usually) represents a surface, and the intersection of the two
surfaces is (usually) a curve. At any point a on this curve, the vectors ∇F (a) and
∇G(a) are both perpendicular to the curve, and if they are linearly independent,
they span the normal plane to the curve at a.
These ideas carry over into dimensions other than 3. For n = 2, an equation
F (x, y) = 0 typically represents a curve C, and ∇F (a, b) is normal to C at each
(a, b) ∈ C. For n > 3, we simply stretch our imagination to say that ∇F (a) is
normal to the hypersurface defined by F (x) = 0 at x = a.
EXERCISES
2.39 Theorem (Mean Value Theorem III). Let S be a region in Rn that contains
the points a and b as well as the line segment L that joins them. Suppose that f is
a function defined on S that is continuous at each point of L and differentiable at
each point of L except perhaps the endpoints a and b. Then there is a point c on L
such that
f (b) − f (a) = ∇f (c) · (b − a).
   ϕ′(t) = ∇f(a + th) · (d/dt)(a + th) = ∇f(a + th) · h = ∇f(a + th) · (b − a).
By the one-variable mean value theorem, there is a point u ∈ (0, 1) such that
ϕ(1) − ϕ(0) = ϕ′(u) · (1 − 0) = ϕ′(u). Let c = a + uh; then

   f(b) − f(a) = ϕ(1) − ϕ(0) = ϕ′(u) = ∇f(c) · (b − a),

which proves the theorem.
To state the principal corollaries of the mean value theorem, we need a defini-
tion. A set S ⊂ Rn is called convex if whenever a, b ∈ S, the line segment from
a to b also lies in S. Clearly every convex set is arcwise connected (line segments
are arcs!), but most connected sets are not convex. See Figure 2.4.
so a+t(b−a) ∈ B. (We have used the fact that t and 1−t are both nonnegative
when 0 ≤ t ≤ 1.)
Proof. The line segment from a to b lies in S, and for some c on this segment we
have f (b) − f (a) = ∇f (c) · (b − a). Hence, by Cauchy’s inequality, |f (b) −
f (a)| ≤ |∇f (c)| |b − a| ≤ M |b − a|.
Proof. Pick a ∈ S and take M = 0 in Corollary 2.40. We conclude that for every
b ∈ S, |f (b) − f (a)| = 0, that is, f (b) = f (a).
FIGURE 2.4: A convex set (S1), a set that is connected but not convex
(S2), and a disconnected set (S3).
EXERCISES
1. State and prove two analogues of Rolle’s theorem for functions of several vari-
ables, whose hypotheses are, respectively, the following:
2.5. Functional Relations and Implicit Functions: A First Look 73
we can call it g(x). The object in such a situation is to use the equation
x = y + y 5 to study the function g.
b. The equation x^2 + y^2 + z^2 = 1 can be solved for z as a continuous function
of x and y in two ways, z = √(1 − x^2 − y^2) and z = −√(1 − x^2 − y^2), both
of which are defined only for x^2 + y^2 ≤ 1.
At this stage we are not going to worry about these matters, or about the ques-
tion of when it is possible to solve the equation at all; such questions will be ad-
dressed in Chapter 3. Rather, we shall assume that there is a differentiable function
g(x1 , . . . , xn ), defined for x1 , . . . , xn in some region S ⊂ Rn , so that the equation
F (x1 , . . . , xn , y) = 0 is satisfied identically when g(x1 , . . . , xn ) is substituted for
y:
(2.43)   F(x1, . . . , xn, g(x1, . . . , xn)) ≡ 0,   (x1, . . . , xn) ∈ S.
In this situation we can use the chain rule to compute the partial derivatives
of g in terms of the partial derivatives of F , simply by differentiating the equation
(2.43) with respect to the variables xj :
(2.44)   ∂j F + ∂_{n+1}F (∂g/∂xj) = 0,   so   ∂g/∂xj = −∂j F / ∂_{n+1}F.
EXAMPLE 1 (continued).
a. Differentiation of the equation x − y − y^5 = 0 with respect to x yields
1 − (dy/dx) − 5y^4 (dy/dx) = 0, or dy/dx = 1/(1 + 5y^4). Of course,
this gives dy/dx in terms of y instead of x, and we don’t have a formula
for y in terms of x, but this is better than nothing!
b. Differentiation of x^2 + y^2 + z^2 = 1 with respect to x, with z as the depen-
dent variable, gives 2x + 2z(∂z/∂x) = 0, or ∂z/∂x = −x/z. It is easily
verified that this formula is correct whether we take z = √(1 − x^2 − y^2) or
z = −√(1 − x^2 − y^2).
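Part (b) is easy to confirm numerically on the upper sheet (Python):

```python
import math

def z(x, y):
    # the upper solution z = sqrt(1 - x^2 - y^2)
    return math.sqrt(1.0 - x**2 - y**2)

x0, y0, h = 0.3, 0.4, 1e-6
numeric = (z(x0 + h, y0) - z(x0 - h, y0)) / (2 * h)   # dz/dx by central difference
implicit = -x0 / z(x0, y0)                            # the formula -x/z
```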
The usual way to clarify this situation is to put subscripts on the partial deriva-
tives to indicate which variables are being held fixed:
   (∂w/∂x)_y = derivative of w with respect to x when y is fixed.
Thus, in Example 2,
   (∂w/∂x)_y = 2x − 1,   (∂w/∂x)_z = 4x + 2z.
The preceding ideas work in much the same way when we are given more than
one constraint equation. For example, if we are given two equations F (x, y, u, v) =
0 and G(x, y, u, v) = 0, we may be able to solve them for the two variables u and
v in terms of the other two variables x and y. In this case the partial derivatives
of u and v with respect to x, say, can be calculated by differentiating the equations
F = 0 and G = 0, obtaining
   ∂x F + ∂u F (∂u/∂x) + ∂v F (∂v/∂x) = 0,
   ∂x G + ∂u G (∂u/∂x) + ∂v G (∂v/∂x) = 0,
and then solving these (linear!) equations simultaneously for ∂u/∂x and ∂v/∂x.
By Cramer’s rule (Appendix A, (A.54)), the result is
   ∂u/∂x = − det[∂x F  ∂v F; ∂x G  ∂v G] / det[∂u F  ∂v F; ∂u G  ∂v G],
   ∂v/∂x = − det[∂u F  ∂x F; ∂u G  ∂x G] / det[∂u F  ∂v F; ∂u G  ∂v G].
We could solve these equations for y ′ and z ′ as they stand, but since we are
interested in the answer at (x, y, z) = (1, 0, 2), we can simplify matters by
substituting in these values right now. The first equation reduces to 7 + z ′ −
64y ′ = 0 and the second one to 2y ′ = 2 + y ′ − z ′ , or
   64y′ − z′ = 7,   y′ + z′ = 2   (when (x, y, z) = (1, 0, 2)).
Solving these equations yields y′ = 9/65 and z′ = 121/65, so — returning to
the original question — dy = y′ dx = (9/65)(.02) = 9/3250 and dz = z′ dx =
(121/65)(.02) = 121/3250.
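The little linear system can be checked with exact rational arithmetic (Python):

```python
from fractions import Fraction as Fr

# the system from the example: 64 y' - z' = 7, y' + z' = 2, solved by Cramer's rule
det = Fr(64) * 1 - (-1) * 1                 # = 65
yp = (Fr(7) * 1 - (-1) * 2) / det           # y' = 9/65
zp = (Fr(64) * 2 - Fr(7) * 1) / det         # z' = 121/65

dy = yp * Fr(2, 100)                        # y' dx with dx = .02
dz = zp * Fr(2, 100)
```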
EXERCISES
where the subscript indicates the variable that is being held fixed.
6. Suppose that F (x, y, z) = 0 is an equation that can be solved to yield any of
the three variables as a function of the other two. Show that
   (∂x/∂y)(∂y/∂z)(∂z/∂x) = −1,
provided that the symbols are interpreted properly. (Part of the problem is to
say what the proper interpretation is.)
7. Suppose that the variables E, T , V , and P are related by a pair of equations,
f (E, T, V, P ) = 0 and g(E, T, V, P ) = 0, that can be solved for any two of the
variables in terms of the other two, and suppose that the differential equation
∂V E − T ∂T P + P = 0 is satisfied when V and T are taken as the independent
variables. Show that ∂P E + T ∂T V + P ∂P V = 0 when P and T are taken as
the independent variables. (This example comes from thermodynamics, where
E, T , V , and P represent energy, temperature, volume, and pressure.)
if i ≠ j and

   ∂^2 f/∂xj^2,   f_{xj xj},   f_{jj},   ∂_{xj}^2 f,   ∂_j^2 f

if i = j. The analogues of these notations for higher-order partial derivatives
should be pretty clear. However, all of them become quite cumbersome when the
order of the derivative is even moderately large. There is a more compact notation
for partial derivatives of arbitrary order that we shall introduce below.
A function f is said to be of class C k on an open set U if all of its partial
derivatives of order ≤ k — that is, all the derivatives ∂i1 ∂i2 · · · ∂il f , for all choices
of the indices ij and all l ≤ k — exist and are continuous on U . We also say that f
is of class C k on a nonopen set S if it is of class C k on some open set that includes
S. If the partial derivatives of f of all orders exist and are continuous on U , f is
said to be of class C ∞ on U .
It is common to refer to the derivatives ∂j2 f and ∂i ∂j f (i ̸= j) as pure and
mixed second-order partial derivatives of f , respectively. In this connection, a
question that immediately arises is whether the order of differentiation matters.
In other words, is ∂i ∂j f the same as ∂j ∂i f ? Experimentation with elementary
examples suggests that the answer is yes.
EXAMPLE 1. If g(x, y) = x sin(x^3 + e^{2y}), we have

   ∂x g = sin(x^3 + e^{2y}) + 3x^3 cos(x^3 + e^{2y}),   ∂y g = 2xe^{2y} cos(x^3 + e^{2y}).

Differentiating ∂x g with respect to y and ∂y g with respect to x yields

   ∂y ∂x g(x, y) = 2e^{2y} cos(x^3 + e^{2y}) − 6x^3 e^{2y} sin(x^3 + e^{2y}) = ∂x ∂y g(x, y).
However, the following example shows that ∂i ∂j f may fail to coincide with
∂j ∂i f .
EXAMPLE 2. Let

   f(x, y) = xy(x^2 − y^2)/(x^2 + y^2)   if (x, y) ≠ (0, 0),   f(0, 0) = 0.
Since f(x, 0) = f(0, y) = 0 for all x, y, we have ∂x f(0, 0) = ∂y f(0, 0) = 0,
and a little calculation shows that for (x, y) ≠ (0, 0),

   ∂x f(x, y) = (x^4 y + 4x^2 y^3 − y^5)/(x^2 + y^2)^2,   ∂y f(x, y) = (x^5 − 4x^3 y^2 − xy^4)/(x^2 + y^2)^2.

In particular, ∂x f(0, y) = −y and ∂y f(x, 0) = x for all x, y, so

   ∂y ∂x f(0, 0) = −1   but   ∂x ∂y f(0, 0) = 1.
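The unequal mixed partials can be seen numerically with nested difference quotients, provided the inner and outer step sizes are kept well separated (Python):

```python
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)

eps, delta = 1e-7, 1e-3     # inner and outer step sizes

def dx_f(x, y):
    return (f(x + eps, y) - f(x - eps, y)) / (2 * eps)

def dy_f(x, y):
    return (f(x, y + eps) - f(x, y - eps)) / (2 * eps)

dydx = (dx_f(0.0, delta) - dx_f(0.0, -delta)) / (2 * delta)   # ~ -1
dxdy = (dy_f(delta, 0.0) - dy_f(-delta, 0.0)) / (2 * delta)   # ~ +1
```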
2.6. Higher-Order Partial Derivatives 79
where ũ and ṽ are some other numbers between 0 and h. Equating these two
expressions and cancelling the h^2, we have

   ∂x ∂y f(a + u, b + v) = ∂y ∂x f(a + ũ, b + ṽ).
Once this is known, an elementary but slightly messy inductive argument shows
that the analogous result for higher-order derivatives is also true:
The fact that the order of differentiation in a mixed partial derivative can occa-
sionally matter is a technicality that is of essentially no importance in applications.
In fact, by adopting a more sophisticated viewpoint one can prove a theorem to
the effect that, under very general conditions, ∂i ∂j f and ∂j ∂i f are always equal
“almost everywhere,” which is enough to allow regarding them as equal for all
practical purposes.
The chain rule can be used to compute higher-order partial derivatives of com-
posite functions, but there are some pitfalls to be avoided. To be concrete, suppose
that w = f (x, y) and that x and y are functions of s and t. Assume that all the
functions in question are at least of class C 2 . To begin with, the chain rule for
first-order derivatives gives
(2.48)   ∂w/∂s = (∂w/∂x)(∂x/∂s) + (∂w/∂y)(∂y/∂s).
of x and y, not x and s. Rather, ∂w/∂x is a function of x and y just like w, and
to differentiate it with respect to s we use the chain rule again; and likewise for
∂w/∂y:
(2.50)   (∂/∂s)(∂w/∂x) = (∂^2w/∂x^2)(∂x/∂s) + (∂^2w/∂x∂y)(∂y/∂s),
         (∂/∂s)(∂w/∂y) = (∂^2w/∂x∂y)(∂x/∂s) + (∂^2w/∂y^2)(∂y/∂s).
Now we plug these results into (2.49) to get the final answer, which thus contains
quite a few terms. Pitfall number 2: It’s easy to forget some of these terms.
In this situation it’s usually advantageous to use the notation fx and fy in-
stead of ∂w/∂x and ∂w/∂y, and likewise for second-order derivatives. This makes
(2.48)–(2.50) look a little more manageable:
$$\frac{\partial w}{\partial s} = f_x \frac{\partial x}{\partial s} + f_y \frac{\partial y}{\partial s},$$
$$\frac{\partial^2 w}{\partial s^2} = \frac{\partial f_x}{\partial s}\frac{\partial x}{\partial s} + f_x \frac{\partial^2 x}{\partial s^2} + \frac{\partial f_y}{\partial s}\frac{\partial y}{\partial s} + f_y \frac{\partial^2 y}{\partial s^2},$$
$$\frac{\partial f_x}{\partial s} = f_{xx}\frac{\partial x}{\partial s} + f_{xy}\frac{\partial y}{\partial s}, \qquad
\frac{\partial f_y}{\partial s} = f_{xy}\frac{\partial x}{\partial s} + f_{yy}\frac{\partial y}{\partial s}.$$
The final result is then
$$\frac{\partial^2 w}{\partial s^2} = f_{xx}\Bigl(\frac{\partial x}{\partial s}\Bigr)^2 + 2 f_{xy}\frac{\partial x}{\partial s}\frac{\partial y}{\partial s} + f_{yy}\Bigl(\frac{\partial y}{\partial s}\Bigr)^2 + f_x \frac{\partial^2 x}{\partial s^2} + f_y \frac{\partial^2 y}{\partial s^2}.$$
Of course, similar results also hold for the other second-order derivatives of w.
EXAMPLE 3. Suppose $u = f(x, y)$, $x = s^2 - t^2$, $y = 2st$. Assuming $f$ is of
class $C^2$, find $\partial^2 u/\partial s\,\partial t$ in terms of the derivatives of $f$.
Solution. We have
$$\frac{\partial u}{\partial t} = f_x \frac{\partial x}{\partial t} + f_y \frac{\partial y}{\partial t} = -2t f_x + 2s f_y,$$
so
$$\frac{\partial^2 u}{\partial s\,\partial t} = -2t\bigl[2s f_{xx} + 2t f_{xy}\bigr] + 2s\bigl[2s f_{xy} + 2t f_{yy}\bigr] + 2 f_y
= -4st f_{xx} + 4(s^2 - t^2) f_{xy} + 4st f_{yy} + 2 f_y.$$
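The formula of Example 3 can be sanity-checked numerically. The sketch below picks an arbitrary $C^2$ test function, $f(x,y) = x^2 y + y^3$ (our choice, not from the text; any smooth $f$ would do), and compares the formula against a finite-difference cross derivative of the composite function:

```python
# Numerical check of  d^2u/(ds dt) = -4st f_xx + 4(s^2 - t^2) f_xy
#                                    + 4st f_yy + 2 f_y
# for x = s^2 - t^2, y = 2st, using the test function f(x,y) = x^2 y + y^3.

def f(x, y):
    return x * x * y + y ** 3

def u(s, t):
    return f(s * s - t * t, 2 * s * t)

s, t = 1.0, 0.5
x, y = s * s - t * t, 2 * s * t

# exact partials of the test function
fy_ = x * x + 3 * y * y
fxx = 2 * y
fxy = 2 * x
fyy = 6 * y

formula = (-4 * s * t * fxx + 4 * (s * s - t * t) * fxy
           + 4 * s * t * fyy + 2 * fy_)

# cross second difference of the composite function
h = 1e-4
numeric = (u(s + h, t + h) - u(s + h, t - h)
           - u(s - h, t + h) + u(s - h, t - h)) / (4 * h * h)
print(formula, numeric)
```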
The calculation of the mixed derivative ∂ 2 u/∂r∂θ is left to the reader (Exercise
2).
Notice, in particular, that by combining the last two equations and using
the identity sin2 θ + cos2 θ = 1, we obtain
$$\frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} + \frac{1}{r^2}\frac{\partial^2 u}{\partial \theta^2} = f_{xx} + f_{yy}.$$
The expression on the right, the sum of the pure second partial derivatives of f
with respect to a Cartesian coordinate system, turns up in many practical and
theoretical applications; it is called the Laplacian of f . (We shall encounter
it again in Chapter 5.) What we have just accomplished is the calculation of
the Laplacian in polar coordinates. We state this result formally, with slightly
different notation.
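The polar-coordinate form of the Laplacian can be tested numerically. The sketch below uses the test function $f(x,y) = x^3 + xy^2$ (our choice; its Laplacian is $8x$) and central differences:

```python
import math

# Check  u_rr + (1/r) u_r + (1/r^2) u_theta_theta = f_xx + f_yy
# for u(r, th) = f(r cos th, r sin th), with f(x, y) = x^3 + x y^2,
# whose Laplacian is 8x.

def f(x, y):
    return x ** 3 + x * y * y

def u(r, th):
    return f(r * math.cos(th), r * math.sin(th))

def d1(g, a, h=1e-4):
    # first central difference
    return (g(a + h) - g(a - h)) / (2 * h)

def d2(g, a, h=1e-4):
    # second central difference
    return (g(a + h) - 2 * g(a) + g(a - h)) / (h * h)

r, th = 1.5, 0.7
lhs = (d2(lambda s: u(s, th), r)
       + d1(lambda s: u(s, th), r) / r
       + d2(lambda p: u(r, p), th) / (r * r))
rhs = 8 * r * math.cos(th)        # f_xx + f_yy at the corresponding (x, y)
print(lhs, rhs)
```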
If α is a multi-index, we define
$$|\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_n, \qquad \alpha! = \alpha_1!\,\alpha_2!\cdots\alpha_n!,$$
$$\mathbf x^\alpha = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n} \quad (\text{where } \mathbf x = (x_1, x_2, \ldots, x_n) \in \mathbf R^n),$$
$$\partial^\alpha f = \partial_1^{\alpha_1} \partial_2^{\alpha_2} \cdots \partial_n^{\alpha_n} f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1}\,\partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}}.$$
For example, with $n = 3$ and the variables written as $(x, y, z)$,
$$\partial^{(0,3,0)} f = \frac{\partial^3 f}{\partial y^3}, \qquad \mathbf x^{(2,1,5)} = x^2 y z^5.$$
As the notation xα indicates, multi-indices are handy for writing not only
derivatives but also polynomials in several variables. To illustrate their use, we
present a generalization of the binomial theorem.
For $n = 2$ this is just the binomial theorem:
$$(x_1 + x_2)^k = \sum_{j=0}^{k} \frac{k!}{j!\,(k-j)!}\, x_1^j x_2^{k-j}
= \sum_{\alpha_1 + \alpha_2 = k} \frac{k!}{\alpha_1!\,\alpha_2!}\, x_1^{\alpha_1} x_2^{\alpha_2}
= \sum_{|\alpha| = k} \frac{k!}{\alpha!}\, \mathbf x^\alpha,$$
where we have set α1 = j, α2 = k −j, and α = (α1 , α2 ). The general case follows
by induction on n. Suppose the result is true for n < N and x = (x1 , . . . , xN ). By
84 Chapter 2. Differential Calculus
using the result for n = 2 and then the result for n = N − 1, we obtain
$$(x_1 + \cdots + x_N)^k = \bigl[(x_1 + \cdots + x_{N-1}) + x_N\bigr]^k$$
$$= \sum_{i+j=k} \frac{k!}{i!\,j!}\,(x_1 + \cdots + x_{N-1})^i\, x_N^j$$
$$= \sum_{i+j=k} \frac{k!}{i!\,j!} \sum_{|\beta| = i} \frac{i!}{\beta!}\, \tilde{\mathbf x}^\beta x_N^j,$$
where $\tilde{\mathbf x} = (x_1, \ldots, x_{N-1})$ and $\beta$ ranges over multi-indices with $N - 1$ entries.
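The multinomial theorem can be spot-checked by brute force. The sketch below (the helper `multinomial_sum` is our own, not from the text) enumerates all multi-indices with $|\alpha| = k$ for $n = 3$ and compares both sides exactly, in integer arithmetic:

```python
from math import factorial
from itertools import product

# Brute-force check of the multinomial theorem for n = 3:
#   (x1 + x2 + x3)^k = sum over |alpha| = k of  (k!/alpha!) x^alpha

def multinomial_sum(xs, k):
    n = len(xs)
    total = 0
    for alpha in product(range(k + 1), repeat=n):
        if sum(alpha) != k:
            continue
        coef = factorial(k)
        term = 1
        for a, x in zip(alpha, xs):
            coef //= factorial(a)   # exact: multinomial coefficients are integers
            term *= x ** a
        total += coef * term
    return total

print(multinomial_sum((2, 3, 5), 4), (2 + 3 + 5) ** 4)
```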
EXERCISES
In these exercises, all functions in question are assumed to be of class C 2 .
is called the kth-order Taylor polynomial for f based at a, and the difference
$$R_{a,k}(h) = f(a + h) - P_{a,k}(h) = f(a + h) - \sum_{j=0}^{k} \frac{f^{(j)}(a)}{j!}\, h^j \tag{2.54}$$
is called the kth-order Taylor remainder. The various versions of Taylor’s theorem
provide formulas or estimates for Ra,k that ensure that the Taylor polynomial Pa,k
is a good approximation to f near a. The ones most commonly known involve the
stronger assumption that f is of class C k+1 and yield the stronger conclusion that
the remainder vanishes as rapidly as |x − a|k+1 . We present two of these, as well
as one that yields the more general form of the theorem stated above.
The easiest version of Taylor’s theorem to derive is the following.
2.55 Theorem (Taylor’s Theorem with Integral Remainder, I). Suppose that f is
of class C k+1 (k ≥ 0) on an interval I ⊂ R, and a ∈ I. Then the remainder Ra,k
defined by (2.53)–(2.54) is given by
$$R_{a,k}(h) = \frac{h^{k+1}}{k!} \int_0^1 (1 - t)^k f^{(k+1)}(a + th)\, dt. \tag{2.56}$$
The trick now is to integrate (2.57) by parts, choosing for the antiderivative of the
constant function 1 not t but t − 1, alias −(1 − t):
$$h \int_0^1 f'(a + th)\, dt = -(1 - t)\,h f'(a + th)\Big|_0^1 + h \int_0^1 (1 - t) f''(a + th)\, h\, dt$$
$$= f'(a)\,h + h^2 \int_0^1 (1 - t) f''(a + th)\, dt.$$
Integrating by parts once more, we obtain the theorem for k = 2. The pattern is now clear: Integrating (2.57) by
parts k times yields (2.56).
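Formula (2.56) is easy to test numerically. The sketch below takes $f = \exp$, $a = 0$, $k = 3$, and approximates the integral with a composite Simpson rule (the quadrature routine is our own utility, not anything from the text):

```python
import math

# Check (2.56) for f = exp, a = 0, k = 3:
#   R(h) = e^h - sum_{j<=3} h^j/j!
# should equal  (h^4 / 3!) * integral_0^1 (1 - t)^3 e^{t h} dt.

def simpson(g, a, b, n=1000):
    # composite Simpson rule on [a, b] with n (even) subintervals
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

h, k = 0.5, 3
lhs = math.exp(h) - sum(h ** j / math.factorial(j) for j in range(k + 1))
rhs = (h ** (k + 1) / math.factorial(k)) * simpson(
    lambda t: (1 - t) ** k * math.exp(t * h), 0.0, 1.0)
print(lhs, rhs)
```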
2.58 Theorem (Taylor’s Theorem with Integral Remainder, II). Suppose that
f is of class C k (k ≥ 1) on an interval I ⊂ R, and a ∈ I. Then the remain-
der Ra,k defined by (2.53)–(2.54) is given by
$$R_{a,k}(h) = \frac{h^{k}}{(k-1)!} \int_0^1 (1 - t)^{k-1} \bigl[f^{(k)}(a + th) - f^{(k)}(a)\bigr]\, dt. \tag{2.59}$$
2.7. Taylor’s Theorem 87
$$f(a + h) - \sum_{j=0}^{k} \frac{f^{(j)}(a)}{j!}\, h^j = \frac{h^k}{(k-1)!} \int_0^1 (1 - t)^{k-1} f^{(k)}(a + th)\, dt - f^{(k)}(a)\, \frac{h^k}{k!}.$$
The formulas (2.56) and (2.59) are generally used not to obtain the exact value
of the remainder but to obtain an estimate for it. The main results are in the follow-
ing corollaries.
Proof. $f^{(k)}$ is continuous at $a$, so for any $\epsilon > 0$ there exists $\delta > 0$ such that
$|f^{(k)}(y) - f^{(k)}(a)| < \epsilon$ when $|y - a| < \delta$. In particular,
$$\bigl|f^{(k)}(a + th) - f^{(k)}(a)\bigr| < \epsilon \quad \text{for } 0 \le t \le 1 \text{ when } |h| < \delta.$$
By (2.59), then,
$$|R_{a,k}(h)| \le \frac{|h|^k}{(k-1)!} \int_0^1 (1 - t)^{k-1}\, \epsilon\, dt = \frac{\epsilon\, |h|^k}{k!}.$$
In other words, $|R_{a,k}(h)/h^k| < \epsilon/k!$ whenever $|h| < \delta$, and hence $R_{a,k}(h)/h^k \to 0$ as $h \to 0$.
so $g^{(j)}(0) = 0$. Therefore, by Lemma 2.62, there is a point $c \in (0, h)$ such that
$$0 = g^{(k+1)}(c) = f^{(k+1)}(a + c) - \frac{R_{a,k}(h)}{h^{k+1}}\,(k+1)!.$$
But this is precisely (2.64). The case $h < 0$ is handled similarly by considering the
function $\tilde g(t) = g(-t)$ on the interval $[0, |h|]$.
Taylor polynomials have many uses. From a practical point of view, they allow
one to approximate complicated functions by polynomials that are relatively easy
to compute with. On the more theoretical side, it is an important general principle
that the behavior of a function f (x) near x = a is largely determined by the first
nonvanishing term, apart from the constant term f (a), in its Taylor expansion. That
is, if f ′ (a) ̸= 0, then the tangent line approximation f (x) ≈ f (a) + f ′ (a)(x − a)
is a good one. If f ′ (a) = 0 but f ′′ (a) ̸= 0, the second-order term is decisive,
and so forth. This is the basis for the second-derivative test for local extrema: If
$f''(a) \ne 0$, then $f(x) \approx f(a) + \tfrac12 f''(a)(x - a)^2$, and the expression on the right
is a quadratic function with a maximum or minimum at a, depending on the sign
of $f''(a)$. (See Exercise 9 and §2.8.) The following example illustrates another
application of this principle.
EXAMPLE 1. Use Taylor expansions to evaluate $\displaystyle\lim_{x\to 0} \frac{x^2 - \sin x^2}{x^4(1 - \cos x)}$.
Solution. We have
$$x^2 - \sin x^2 = x^2 - \bigl(x^2 - \tfrac16 x^6 + \cdots\bigr) = \tfrac16 x^6 + \cdots,$$
$$x^4(1 - \cos x) = x^4\bigl[1 - \bigl(1 - \tfrac12 x^2 + \cdots\bigr)\bigr] = \tfrac12 x^6 + \cdots,$$
where the dots denote error terms that vanish faster than $x^6$ as $x \to 0$. Therefore,
$$\frac{x^2 - \sin x^2}{x^4(1 - \cos x)} = \frac{\tfrac16 x^6 + \cdots}{\tfrac12 x^6 + \cdots} = \frac{\tfrac16 + \cdots}{\tfrac12 + \cdots},$$
where the dots in the last fraction denote error terms that vanish as $x \to 0$. The
limit is therefore $\tfrac13$. (To appreciate the efficiency of this calculation, try doing
it by l'Hôpital's rule!)
We now generalize these results to functions on Rn . Suppose f : Rn → R is of
class C k on a convex open set S. We can derive a Taylor expansion for f (x) about
a point a ∈ S by looking at the restriction of f to the line joining a and x. That is,
we set $\mathbf h = \mathbf x - \mathbf a$ and $g(t) = f(\mathbf a + t\mathbf h)$, so that $g(0) = f(\mathbf a)$ and $g(1) = f(\mathbf x)$. Applying
the one-variable Taylor theorem to $g$ (whose derivatives are computed with the chain
rule: $g^{(j)}(t) = (\mathbf h \cdot \nabla)^j f(\mathbf a + t\mathbf h)$) therefore yields
$$f(\mathbf a + \mathbf h) = \sum_{j=0}^{k} \frac{(\mathbf h \cdot \nabla)^j f(\mathbf a)}{j!} + R_{\mathbf a,k}(\mathbf h), \tag{2.67}$$
where formulas for Ra,k (h) can be obtained from the formulas (2.56), (2.59), or
(2.64) applied to g.
2.7. Taylor’s Theorem 91
Substituting this into (2.67) and the remainder formulas, we obtain the following:
where
$$R_{\mathbf a,k}(\mathbf h) = k \sum_{|\alpha| = k} \frac{\mathbf h^\alpha}{\alpha!} \int_0^1 (1 - t)^{k-1} \bigl[\partial^\alpha f(\mathbf a + t\mathbf h) - \partial^\alpha f(\mathbf a)\bigr]\, dt. \tag{2.70}$$
and
$$R_{\mathbf a,k}(\mathbf h) = \sum_{|\alpha| = k+1} \frac{\mathbf h^\alpha}{\alpha!}\, \partial^\alpha f(\mathbf a + c\mathbf h) \quad \text{for some } c \in (0, 1). \tag{2.72}$$
The first of these formulas is (2.67) with k = 2; the second one is (2.69). (Every
multi-index α of order 2 is either of the form (. . . , 2, . . .) or (. . . , 1, . . . , 1, . . .),
where the dots denote zero entries, so the sum over |α| = 2 in (2.69) breaks up into
the last two sums in (2.74).) Notice that the mixed derivatives ∂j ∂k (j ̸= k) occur
twice in (2.73) (since ∂j ∂k = ∂k ∂j ) but only once in (2.74) (since j < k there);
this accounts for the disappearance of the factor of $\tfrac12$ in the last sum in (2.74).
We also have the following analogue of Corollaries 2.60 and 2.61:
$$|R_{\mathbf a,k}(\mathbf h)| \le \frac{M}{(k+1)!}\, \|\mathbf h\|^{k+1}, \qquad \text{where } \|\mathbf h\| = |h_1| + |h_2| + \cdots + |h_n|.$$
Proof. The proof of the first assertion is the same as the proof of Corollary 2.60.
As for the second, it follows easily from either (2.71) or (2.72) that
$$|R_{\mathbf a,k}(\mathbf h)| \le M \sum_{|\alpha| = k+1} \frac{|\mathbf h^\alpha|}{\alpha!},$$
and this last expression equals M ∥h∥k+1 /(k+1)! by the multinomial theorem.
$$\frac{P(t\mathbf h)}{t^2} = P_2(\mathbf h) + \cdots + t^{k-2} P_k(\mathbf h),$$
Proof. Corollary 2.75 says that f (a+h) = Pa,k (h)+Ra,k (h), where Ra,k (h)/|h|k
tends to zero as h does. If also f (a+h) = Q(h)+E(h), then Q−Pa,k = Ra,k −E,
so
$$\frac{Q(\mathbf h) - P_{\mathbf a,k}(\mathbf h)}{|\mathbf h|^k} = \frac{R_{\mathbf a,k}(\mathbf h) - E(\mathbf h)}{|\mathbf h|^k} \to 0.$$
By Lemma 2.76, Q = Pa,k .
Theorem 2.77 has the following important practical consequence. If one wants
to compute the Taylor expansion of f , it may be very tedious to calculate all the
derivatives needed in formula (2.69) directly. But if one can find, by any means
whatever, a polynomial Q of degree k such that [f (a + h) − Q(h)]/|h|k → 0,
then Q must be the Taylor polynomial. This enables one to generate new Taylor
expansions from old ones by operations such as substitution, multiplication, etc.
EXAMPLE 2. Find the 3rd-order Taylor polynomial of $f(x, y) = e^{x^2+y}$ about
$(x, y) = (0, 0)$.
Solution. The direct method is to calculate the derivatives fx , fy , fxx , fxy ,
fyy , fxxx , fxxy , fxyy , and fyyy , and then plug the results into (2.69), but only a
masochist would do this. Instead, use the familiar expansion for the exponential
function (Proposition 2.65), neglecting all terms of order higher than 3:
$$e^{x^2+y} = 1 + (x^2 + y) + \tfrac12(x^2 + y)^2 + \tfrac16(x^2 + y)^3 + (\text{order} > 3)$$
$$= 1 + x^2 + y + \tfrac12(x^4 + 2x^2 y + y^2) + \tfrac16(x^6 + 3x^4 y + 3x^2 y^2 + y^3) + (\text{order} > 3)$$
$$= 1 + y + x^2 + \tfrac12 y^2 + x^2 y + \tfrac16 y^3 + (\text{order} > 3).$$
In the last line we have thrown the terms $x^4$, $x^6$, $x^4 y$, and $x^2 y^2$ into the garbage
pail, since they are themselves of order $> 3$. Thus the answer is $1 + y + x^2 + \tfrac12 y^2 + x^2 y + \tfrac16 y^3$. Alternatively,
$$e^{x^2+y} = e^{x^2} e^y = (1 + x^2 + \cdots)\bigl(1 + y + \tfrac12 y^2 + \tfrac16 y^3 + \cdots\bigr)
= 1 + y + x^2 + \tfrac12 y^2 + x^2 y + \tfrac16 y^3 + \cdots.$$
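The uniqueness criterion behind this shortcut, that $[f(\mathbf h) - Q(\mathbf h)]/|\mathbf h|^3 \to 0$ forces $Q$ to be the Taylor polynomial, can be watched numerically. A small Python sketch (the sample radii are arbitrary):

```python
import math

# Check that Q(x, y) = 1 + y + x^2 + y^2/2 + x^2 y + y^3/6 satisfies
# [f(h) - Q(h)] / |h|^3 -> 0 for f(x, y) = e^{x^2 + y}, as the Taylor
# polynomial must.

def f(x, y):
    return math.exp(x * x + y)

def Q(x, y):
    return 1 + y + x * x + y * y / 2 + x * x * y + y ** 3 / 6

def err_over_cube(t):
    # error along the diagonal (t, t), divided by |h|^3
    rho = math.hypot(t, t)
    return abs(f(t, t) - Q(t, t)) / rho ** 3

r1, r2 = err_over_cube(0.1), err_over_cube(0.01)
print(r1, r2)  # r2 is roughly ten times smaller than r1
```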
EXERCISES
1. Let $f(x) = x^2(x - \sin x)$ and $g(x) = (e^x - 1)(\cos 2x - 1)^2$.
a. Compute the Taylor polynomials of order 5 based at a = 0 of f and g.
(Don’t compute any derivatives; use Proposition 2.65 as a starting point.)
b. Use the result of (a) to find limx→0 f (x)/g(x) without using l’Hôpital’s
rule.
2. Find the Taylor polynomial $P_{1,3}(h)$ and give a constant $C$ such that $|R_{1,3}(h)| \le Ch^4$ on the interval $|h| \le \tfrac12$ for each of the following functions.
a. $f(x) = \log x$.
b. $f(x) = \sqrt x$.
c. $f(x) = (x + 3)^{-1}$.
3. Show that $|\sin x - x + \tfrac16 x^3| < .08$ for $|x| \le \tfrac12\pi$. (Hint: $x - \tfrac16 x^3$ is actually
the 4th-order Taylor polynomial of $\sin x$.) How large do you have to take $k$ so
that the $k$th-order Taylor polynomial of $\sin x$ about $a = 0$ approximates $\sin x$
to within .01 for $|x| \le \tfrac12\pi$?
4. Use a Taylor approximation to $e^{-x^2}$ to compute $\int_0^1 e^{-x^2}\,dx$ to three decimal
places, and prove the accuracy of your answer. (Hint: It's easier to apply
Corollary 2.61 to $f(t) = e^{-t}$ and set $t = x^2$ than to apply Corollary 2.61
to $e^{-x^2}$ directly.)
5. Find the Taylor polynomial of order 4 based at a = (0, 0) for each of the
following functions. Don’t compute any derivatives; use Proposition 2.65.
a. f (x, y) = x sin(x + y).
b. $f(x, y) = e^{xy} \cos(x^2 + y^2)$.
c. $f(x, y) = e^{x-2y}/(1 + x^2 - y)$.
6. Find the 3rd-order Taylor polynomial of f (x, y) = x + cos πy + x log y based
at a = (3, 1).
7. Find the 3rd-order Taylor polynomial of f (x, y, z) = x2 y + z based at a =
(1, 2, 1). The remainder vanishes identically; why? (You can see this either
from the Taylor remainder formula or by algebra.)
2.8. Critical Points 95
where C and λ are positive constants (cf. Exercise 1 in §1.8). Use (2.70) to
show that there is another positive constant C ′ such that
Proof. If f has a local maximum or minimum at a, then for any unit vector u,
the function g(t) = f (a + tu) has a local maximum or minimum at t = 0, so
g′ (0) = ∂u f (a) = 0. In particular, ∂j f (a) = 0 for all j, so ∇f (a) = 0.
How can we tell whether a function has a local maximum or minimum (or nei-
ther) at a critical point? For functions of one variable we have the second derivative
test: If f is of class C 2 , then f has a local minimum at a if f ′′ (a) > 0 and a local
maximum if f ′′ (a) < 0. (If f ′′ (a) = 0, no conclusion can be drawn.) Something
similar happens for functions of n variables, but the situation is a good deal more
complicated. The full story involves a certain amount of linear algebra; the reader
who is content to consider the case of two variables and wishes to skip the linear
algebra may proceed directly to Theorem 2.82.
Suppose f is a real-valued function of class C 2 on some open set S ⊂ Rn and
that f has a critical point at a, i.e., ∇f (a) = 0. Instead of one second derivative to
examine at a, we have a whole n × n matrix of them, called the Hessian of f at a:
$$H = H(\mathbf a) = \begin{pmatrix}
\partial_1^2 f(\mathbf a) & \partial_1\partial_2 f(\mathbf a) & \cdots & \partial_1\partial_n f(\mathbf a) \\
\partial_2\partial_1 f(\mathbf a) & \partial_2^2 f(\mathbf a) & \cdots & \partial_2\partial_n f(\mathbf a) \\
\vdots & \vdots & \ddots & \vdots \\
\partial_n\partial_1 f(\mathbf a) & \partial_n\partial_2 f(\mathbf a) & \cdots & \partial_n^2 f(\mathbf a)
\end{pmatrix}. \tag{2.79}$$
The equality of mixed partials (Theorem 2.45) guarantees that this is a symmetric
matrix, that is, Hij = Hji .
By (2.73), the second-order Taylor expansion of f about a is
$$f(\mathbf a + \mathbf k) = f(\mathbf a) + \sum_{j=1}^{n} \partial_j f(\mathbf a)\, k_j + \frac12 \sum_{i,j=1}^{n} \partial_i \partial_j f(\mathbf a)\, k_i k_j + R_{\mathbf a,2}(\mathbf k).$$
(We use k rather than h for the increment in this section to avoid a notational clash
with the Hessian H.) If $\nabla f(\mathbf a) = 0$, the first-order sum vanishes, and the second-order sum is $\frac12 \sum H_{ij} k_i k_j = \frac12 H\mathbf k \cdot \mathbf k$. In short,
$$f(\mathbf a + \mathbf k) = f(\mathbf a) + \tfrac12 H\mathbf k \cdot \mathbf k + R_{\mathbf a,2}(\mathbf k). \tag{2.80}$$
Now we can begin to see how to analyze the behavior of f about a in terms of
the matrix H. To start with the simplest situation, suppose it happens that all the
mixed partials ∂i ∂j f (i ̸= j) vanish at a. Denoting ∂j2 f (a) by λj , we then have
$$f(\mathbf a + \mathbf k) = f(\mathbf a) + \tfrac12 \sum_{j=1}^{n} \lambda_j k_j^2 + R_{\mathbf a,2}(\mathbf k).$$
Let us neglect the remainder term for the moment. If all $\lambda_j$ are positive, then
$\sum_j \lambda_j k_j^2 > 0$ for all $\mathbf k \ne 0$, so f has a local minimum; likewise, if all $\lambda_j$ are negative, then f has a local maximum. If some $\lambda_j$ are positive and some are negative,
then $\sum_j \lambda_j k_j^2$ will be positive for some values of $\mathbf k$ and negative for others, so f will
have neither a maximum nor a minimum. It's not hard to see that these conclusions
remain valid when the remainder term is included; we shall present the details be-
low. Only when some of the λj are zero is the outcome unclear; it is precisely in
this situation that the remainder term plays a significant role.
This is all very well, but the condition that ∂i ∂j f (a) = 0 for i ̸= j is ob-
viously very special. However, it can always be achieved by a suitable rotation
of coordinates, that is, by replacing the standard basis for Rn with another suit-
ably chosen orthonormal basis. This is the content of the spectral theorem, which
says that every symmetric matrix has an orthonormal eigenbasis (see Appendix A,
(A.56)–(A.58)). With this result in hand, we arrive at the second-derivative test for
functions of several variables.
Proof. We prove only the first assertion; the argument for the second one is similar.
Let u1 , . . . , un be an orthonormal eigenbasis for H with eigenvalues λ1 , . . . , λn .
Our assertion is then that f has a local minimum if all the eigenvalues are (strictly)
positive but not if some eigenvalue is negative.
If all eigenvalues are positive, let $l$ be the smallest of them. Writing $\mathbf k = c_1 \mathbf u_1 +
\cdots + c_n \mathbf u_n$ as before, we have
$$H\mathbf k \cdot \mathbf k = \sum \lambda_j c_j^2 \ge l \sum c_j^2 = l|\mathbf k|^2.$$
But when $\mathbf k$ is near $0$, the error term in (2.80) is less than $\tfrac14 l|\mathbf k|^2$ by Corollary 2.75,
so
$$f(\mathbf a + \mathbf k) - f(\mathbf a) \ge \tfrac12 l|\mathbf k|^2 - \tfrac14 l|\mathbf k|^2 > 0.$$
Thus f has a local minimum. On the other hand, if some eigenvalue, say λ1 , is
negative, the same argument shows that f (a + tu1 ) − f (a) < 0 for small t ̸= 0, so
f does not have a local minimum.
In short, if all eigenvalues are positive, then f has a local minimum; if all
eigenvalues are negative, then f has a local maximum. If there are two eigenvalues
of opposite signs, then f is said to have a saddle point. At a saddle point, f has
neither a maximum nor a minimum; its graph goes up in one direction and down in
some other direction. The only cases where we can’t be sure what’s going on are
those where all the eigenvalues of H are nonnegative or nonpositive but at least one
of them is zero. When that happens, if k is an eigenvector with eigenvalue 0 (i.e.,
k is in the nullspace of H), the quadratic term in (2.80) vanishes and the remainder
term becomes significant; to determine the behavior of f near a we need to look at
the higher-order terms in the Taylor expansion.
Some types of critical points are illustrated in Figure 2.5. A critical point for
which zero is an eigenvalue of the Hessian matrix H — or equivalently, for which
det H = 0 or H is singular — is called degenerate.
In two dimensions it is easy to sort out the various cases:
EXAMPLE 1. Find and classify the critical points of the function $f(x, y) = xy(12 - 3x - 4y)$.
Solution. We have $\partial_x f = y(12 - 6x - 4y)$ and $\partial_y f = x(12 - 3x - 8y)$, so $\nabla f = 0$ in four cases:
$$y = x = 0; \quad y = 0,\ 12 - 3x - 8y = 0; \quad 12 - 6x - 4y = 0,\ x = 0; \quad \text{and} \quad 12 - 6x - 4y = 12 - 3x - 8y = 0.$$
Solving these gives the critical points $(0, 0)$, $(4, 0)$, $(0, 3)$, and $(\tfrac43, 1)$. Since
$\partial_x^2 f = -6y$, $\partial_y^2 f = -8x$, and $\partial_x \partial_y f = 12 - 6x - 8y$, Theorem 2.82 shows
that the first three of these are saddle points and the last is a local maximum.
The geometry of this example is quite simple. The set where f = 0 is the
union of the three lines x = 0, y = 0, and 3x + 4y = 12. These lines separate
the plane into regions on which f is alternately positive and negative. The three
saddle points are the points where these lines intersect, and the local maximum
is the “peak” in the middle of the triangle defined by these lines.
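The classification in Example 1 can be verified mechanically. The sketch below uses the two-variable discriminant $D = f_{xx}f_{yy} - f_{xy}^2$ (this is the criterion of Theorem 2.82, whose statement is assumed here: $D < 0$ gives a saddle, $D > 0$ with $f_{xx} < 0$ a local maximum, $D > 0$ with $f_{xx} > 0$ a local minimum):

```python
# Classify the critical points of f(x, y) = xy(12 - 3x - 4y) with the
# two-variable discriminant D = f_xx f_yy - f_xy^2.

def classify(x, y):
    fxx = -6 * y
    fyy = -8 * x
    fxy = 12 - 6 * x - 8 * y
    D = fxx * fyy - fxy ** 2
    if D < 0:
        return "saddle"
    if D > 0:
        return "max" if fxx < 0 else "min"
    return "degenerate"

points = [(0, 0), (4, 0), (0, 3), (4 / 3, 1)]
print([classify(x, y) for (x, y) in points])
```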
EXAMPLE 2. Find and classify the critical points of the function $f(x, y) = y^3 - 3x^2 y$.
Solution. We have $\partial_x f = -6xy$ and $\partial_y f = 3y^2 - 3x^2$. Thus, if $\partial_x f = 0$,
then either $x = 0$ or $y = 0$, and the equation $\partial_y f = 0$ then forces $x = y = 0$.
So $(0, 0)$ is the only critical point. The reader may readily verify that all the
second derivatives of f vanish at $(0, 0)$, so Theorem 2.82 is of no use. But since
$f(x, y) = y(y - \sqrt3\, x)(y + \sqrt3\, x)$, the lines $y = 0$ and $y = \pm\sqrt3\, x$ separate the
plane into six regions on which f is alternately positive and negative, and these
regions all meet at the origin. Thus f has neither a maximum nor a minimum at
the origin. This configuration is called a “monkey saddle.” (The three regions
where f < 0 provide places for the two legs and tail of a monkey sitting on the
graph of f at the origin.)
EXERCISES
1. Find all the critical points of the following functions. Tell whether each nonde-
generate critical point is a local maximum, local minimum, or saddle point. If
possible, tell whether the degenerate critical points are local extrema too.
a. f (x, y) = x2 + 3y 4 + 4y 3 − 12y 2 .
b. f (x, y) = x4 − 2x2 + y 3 − 6y.
c. f (x, y) = (x − 1)(x2 − y 2 ).
d. f (x, y) = x2 y 2 (2 − x − y).
e. $f(x, y) = (2x^2 + y^2)e^{-x^2-y^2}$.
f. $f(x, y) = ax^{-1} + by^{-1} + xy$, $a, b \ne 0$. (The nature of the critical point
depends on the signs of a and b.)
g. f (x, y, z) = x3 − 3x − y 3 + 9y + z 2 .
h. $f(x, y, z) = (3x^2 + 2y^2 + z^2)e^{-x^2-y^2-z^2}$.
i. f (x, y, z) = xyz(4 − x − y − z).
2. What are the conditions on a, b, c for f (x, y) = ax2 + bxy + cy 2 to have a
minimum, maximum, or saddle point at the origin?
3. The origin is a degenerate critical point of the functions f1 (x, y) = x2 + y 4 ,
f2 (x, y) = x2 − y 4 , and f3 (x, y) = x2 + y 3 . Describe the graphs of these three
functions near the origin. Is the origin a local extremum for any of them?
4. Let f (x, y) = (y − x2 )(y − 2x2 ).
a. Show that the origin is a degenerate critical point of f .
b. Show that the restriction of f to any line through the origin (i.e., the func-
tion g(t) = f (at, bt) for any (a, b) ̸= (0, 0)) has a local minimum at the
origin, but f does not have a local minimum at the origin. (Hint: Consider
the regions where f > 0 or f < 0.)
5. Let H be the Hessian of f . Show that for any unit vector u, Hu · u is the
second directional derivative of f in the direction u.
of calculus, we shall assume that S is either (i) the closure of an open set with
a smooth or piecewise smooth boundary, or (ii) a smooth submanifold, such as a
curve or surface, defined by one or more constraint equations. (These geometric
notions will be studied in more detail in Chapter 3.)
Suppose, to begin with, that S is the closure of an open set in Rn , and that we
wish to find the absolute maximum or minimum of a differentiable function f on
S. We assume that the boundary of S is a smooth submanifold (a curve if n = 2, a
surface if n = 3) that can be described as the level set of a differentiable function
G, or that it is the union of a finite number of pieces of this form. (For example,
if S is a cube, its boundary is the union of six faces, each of which is a region in a
smooth surface, viz., a plane.) If S is bounded, the extreme values are guaranteed
to exist, and we can proceed as follows.
i. First, find the critical points of f in the interior of S; these are the candidates for interior extreme values.
ii. To find candidates for extreme values on the boundary, we can apply the
techniques for solving extremal problems with constraints presented below.
iii. Finally, we pick the smallest and largest of the values of f at the points
found in steps (i) and (ii); these will be the minimum and maximum of f on
S. There is usually no need to worry about the second derivative test in this
situation.
If S is unbounded, the procedure is the same, but we must add an extra argu-
ment to show that the desired extremum actually exists. This must be done on a
case-by-case basis, as there is no general procedure available; however, here are a
couple of simple results that cover many situations in practice and illustrate the sort
of reasoning that must be employed.
the extreme value theorem, f has a minimum on V , say at a ∈ V . But then f (a) is
the absolute minimum of f on V because f (x) > f (x0 ) ≥ f (a) for x ∈ S \ V .
The proof of (b) is similar. If f (x0 ) > 0, let V = {x : f (x) ≥ f (x0 )}. Then
V is closed (by Theorem 1.13) and bounded (since f (x) → 0 as |x| → ∞). By the
extreme value theorem, f has a maximum on V , say at a ∈ V . But then f (a) is the
absolute maximum of f on S because f (x) < f (x0 ) ≤ f (a) for x ∈ S \ V .
EXAMPLE 1. Find the absolute maximum and minimum values of the function
$$f(x, y) = \frac{x}{x^2 + (y - 1)^2 + 4}$$
on the first quadrant $S = \{(x, y) : x, y \ge 0\}$.
Solution. Clearly $f(x, y) \ge 0$ for $x, y \ge 0$, and $f(0, y) = 0$, so the
minimum is zero, achieved at all points on the y-axis. Moreover, $f(x, y)$ is less
than the smaller of $x^{-1}$ and $x(y - 1)^{-2}$, so it vanishes as $|(x, y)| \to \infty$. Hence,
by Theorem 2.83, f has a maximum on S, which must occur either in the
interior of S or on the positive x-axis. A short calculation that we leave to the
reader shows that the only critical point of f in S is at $(2, 1)$, and $f(2, 1) = \tfrac14$.
Also, $f(x, 0) = x/(x^2 + 5)$, and the critical points of this function of one
variable are at $x = \pm\sqrt5$. Only $x = \sqrt5$ is relevant for our purposes, and
$f(\sqrt5, 0) = \sqrt5/10$, which is a bit less than $\tfrac14$. Thus the maximum value of f
on S is $\tfrac14$.
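A numerical cross-check of this example (the grid search is a crude sanity test, with an arbitrary granularity and window):

```python
import math

# f(x, y) = x / (x^2 + (y - 1)^2 + 4) on the first quadrant:
# the interior critical point (2, 1) gives the value 1/4, and the
# boundary candidate (sqrt 5, 0) gives the smaller sqrt(5)/10.

def f(x, y):
    return x / (x * x + (y - 1) ** 2 + 4)

interior = f(2.0, 1.0)               # = 1/4
boundary = f(math.sqrt(5.0), 0.0)    # = sqrt(5)/10

# crude grid search over [0, 6] x [0, 4] as a sanity check
m = max(f(0.02 * i, 0.02 * j) for i in range(301) for j in range(201))
print(interior, boundary, m)
```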
This is the key to the method. The n equations ∂j f = λ∂j G together with the
constraint equation G = 0 give n+1 equations in the n+1 variables x1 , . . . , xn and
λ, and solving them simultaneously will locate the local extrema of f on S. (It will
also produce the appropriate values of λ, which are usually not of much interest,
although one may have to find them in the process of solving for the xj ’s.) This
method is called Lagrange’s method, and the parameter λ is called the Lagrange
multiplier for the problem.
The other methods described above involve reducing the original n-variable
problem to an (n − 1)-variable problem, whereas Lagrange’s method deals directly
with the original n variables. This may be advantageous when the reduction is awk-
ward or when it would involve breaking some symmetry of the original problem.
The disadvantage is that, whereas the other methods lead to solving n − 1 equations
in n − 1 variables, Lagrange’s method requires solving n + 1 equations in n + 1
variables.
E XAMPLE 2. Let’s try out Lagrange’s method on the simple problem of max-
imizing the area of a rectangle with perimeter P . Here f (x, y) = xy and
G(x, y) = 2x + 2y − P , so the equations ∂x f = λ∂x G, ∂y f = λ∂y G, and
G = 0 become
y = 2λ, x = 2λ, 2x + 2y = P.
The first two equations give y = x; substituting into the third equation shows
1 2
that x = y = 14 P , so the maximum of f is 16 P . (Note that the only relevant
1
values of x and y are 0 ≤ x, y ≤ 2 P , so we’re working on a compact set and
the existence of the maximum is not in question. The minimum on this set,
namely 0, is achieved when x = 0, y = 12 P , or vice versa.)
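A numerical cross-check of Example 2 (the perimeter value and grid granularity are arbitrary choices):

```python
# Rectangle of perimeter P with maximal area: Lagrange's equations give
# y = 2*lam, x = 2*lam, 2x + 2y = P, so x = y = P/4 and area = P^2/16.

P = 10.0
x = y = P / 4
lam = y / 2                 # the multiplier, from y = 2*lam
area = x * y                # = P^2 / 16

# sanity check: no admissible rectangle does better
best = max(t * (P / 2 - t) for t in [0.01 * i for i in range(501)])
print(area, best)
```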
EXERCISES
1. Find the extreme values of f (x, y) = 2x2 + y 2 + 2x on the set {(x, y) :
x2 + y 2 ≤ 1}.
2. Find the extreme values of f (x, y) = 3x2 − 2y 2 + 2y on the set {(x, y) :
x2 + y 2 ≤ 1}.
3. Find the extreme values of f (x, y) = x3 − x + y 2 − 2y on the closed triangular
region with vertices at (−1, 0), (1, 0), and (0, 2).
4. Find the extreme values of f (x, y) = 3x2 − 8xy − 4y 2 + 2x + 16y on the set
{(x, y) : 0 ≤ x ≤ 4, 0 ≤ y ≤ 3}.
5. Let f (x, y) = (A − bx − cy)2 + x2 + y 2 , where A, b, c are positive constants.
Show that f has an absolute minimum on R2 and find it.
6. Show that $f(x, y) = (x^2 + 2y^2)e^{-x^2-y^2}$ has an absolute minimum and maximum on $\mathbf R^2$, and find them.
7. Show that $f(x, y) = (x^2 - 2y^2)e^{-x^2-y^2}$ has an absolute minimum and maximum on $\mathbf R^2$, and find them.
8. Let $f(x, y) = xy + 3x^{-1} + 4y^{-1}$. Show that f has a minimum but no maximum
on the set $\{(x, y) : x, y > 0\}$, and find the minimum.
9. Find the extreme values of f (x, y, z) = x2 + 2y 2 + 3z 2 on the unit sphere
{(x, y, z) : x2 + y 2 + z 2 = 1}.
10. Let (x1 , y1 ), . . . , (xk , yk ) be points in the plane whose x-coordinates are not
all equal. The linear function f (x) = ax + b such that the sum of the squares
of the vertical distances from the given points to the line $y = ax + b$ (namely,
$\sum_{1}^{k}(y_j - ax_j - b)^2$) is minimized is called the linear least-squares fit to the
points $(x_j, y_j)$. Show that it is given by
$$a = \frac{k^{-1}\sum_{1}^{k} x_j y_j - \bar x\,\bar y}{k^{-1}\sum_{1}^{k} x_j^2 - \bar x^2}, \qquad b = \bar y - a\bar x,$$
where $\bar x = k^{-1}\sum_{1}^{k} x_j$ and $\bar y = k^{-1}\sum_{1}^{k} y_j$ are the averages of the $x_j$'s and
$y_j$'s.
11. Let x, y, z be positive variables and a, b, c positive constants. Find the mini-
mum of x + y + z subject to the constraint (a/x) + (b/y) + (c/z) = 1.
12. Find the minimum possible value of the sum of the three linear dimensions
(length, breadth, and width) of a rectangular box whose volume is a given
constant V . Is there a maximum possible value?
13. Find the point on the line through (1, 0, 0) and (0, 1, 0) that is closest to the
line through (0, 0, 0) and (1, 1, 1). (Hint: Minimize the square of the distance.)
14. Find the maximum possible volume of a rectangular solid if the sum of the
areas of the bottom and the four vertical sides is a constant A, and find the
dimensions of the box that has the maximum volume.
15. The two planes x + z = 4 and 3x − y = 6 intersect in a line L. Use Lagrange’s
method to find the point on L that is closest to the origin. (Hint: Minimize the
square of the distance.)
16. Find the maximum value of (xv − yu)2 subject to the constraints x2 + y 2 = a2
and u2 +v 2 = b2 . Do this (a) by Lagrange’s method, (b) by the parametrization
x = a cos θ, y = a sin θ, u = b cos ϕ, v = b sin ϕ.
17. Let P1 = (x1 , y1 ) and P2 = (x2 , y2 ) be two points in the plane such that
x1 ̸= x2 and y1 > 0 > y2 . A particle travels in a straight line from P1 to a point
Q on the x-axis with speed v1 , then in a straight line from Q to P2 with speed
v2 . The point Q is allowed to vary. Use Lagrange’s method to show that the
total travel time from P1 to P2 is minimized when (sin θ1 )/(sin θ2 ) = v1 /v2 ,
where θ1 (resp. θ2 ) is the angle between the line P1 Q (resp. QP2 ) and the
vertical line through Q. (Hint: Take θ1 , θ2 as the independent variables.)
18. Let x1 , x2 , . . . , xn denote nonnegative numbers. For c > 0, maximize the
product x1 x2 · · · xn subject to the constraint x1 + x2 + · · · + xn = c, and hence
derive the inequality of geometric and arithmetic means,
$$\bigl(x_1 x_2 \cdots x_n\bigr)^{1/n} \le \frac{x_1 + x_2 + \cdots + x_n}{n} \qquad (x_1, \ldots, x_n \ge 0),$$
where equality holds if and only if the xj ’s are all equal.
19. Let A be a symmetric n × n matrix, and let f (x) = (Ax) · x for x ∈ Rn . Show
that the maximum and minimum of f on the unit sphere {x : |x| = 1} are the
largest and smallest eigenvalues of A.
• A map f : Rn → Rn can represent a vector field, that is, a map that assigns
to each point x a vector quantity f (x) such as a force or a magnetic field.
You can see that the study of mappings from Rn to Rm is complicated, as the study
of the linear ones already constitutes the subject of linear algebra! However, the
basic ideas of differential calculus generalize easily from the scalar case. The only
bits of linear algebra we need for present purposes are the correspondence between
linear maps and matrices, the notion of addition and multiplication of matrices, and
the notion of determinant; see Appendix A, (A.3)–(A.15) and (A.24)-(A.33).
²Here we use the word "linear" in the more restrictive sense; see Appendix A, (A.5).
The general form of the chain rule can now be stated very simply:
2.86 Theorem (Chain Rule III). Suppose g : Rk → Rn is differentiable at a ∈ Rk
and f : Rn → Rm is differentiable at g(a) ∈ Rn . Then H = f ◦ g : Rk → Rm is
differentiable at a, and
DH(a) = Df (g(a))Dg(a),
where the expression on the right is the product of the matrices Df (g(a)) and
Dg(a).
Proof. Differentiability of H is equivalent to the differentiability of each of its
components Hi = fi ◦ g, and for these we have, by Theorem 2.29,
$$\partial_m H_i = (\partial_1 f_i)(\partial_m g_1) + \cdots + (\partial_n f_i)(\partial_m g_n) = \sum_{j=1}^{n} (\partial_j f_i)(\partial_m g_j).$$
Since the product of two matrices gives the composition of the linear transfor-
mations defined by those matrices, the chain rule just says that the linear approxi-
mation of a composition is the composition of the linear approximations.
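This matrix form of the chain rule is easy to test numerically. The sketch below uses two hypothetical maps of our own choosing, $g(s,t) = (st,\, s+t)$ and $f(x,y) = (xy,\, x-y)$, and compares the product of their difference-quotient Jacobians with the Jacobian of the composition:

```python
# Chain Rule III:  DH(a) = Df(g(a)) Dg(a),  checked with central
# differences for g(s, t) = (s t, s + t) and f(x, y) = (x y, x - y).

def g(s, t):
    return (s * t, s + t)

def f(x, y):
    return (x * y, x - y)

def H(s, t):
    return f(*g(s, t))

def jacobian(F, a, h=1e-6):
    # central-difference Jacobian of a map F: R^2 -> R^2 at the point a
    J = []
    for i in range(2):
        row = []
        for j in range(2):
            ap = list(a); ap[j] += h
            am = list(a); am[j] -= h
            row.append((F(*ap)[i] - F(*am)[i]) / (2 * h))
        J.append(row)
    return J

a = (1.2, 0.7)
Dg = jacobian(g, a)
Df = jacobian(f, g(*a))
product = [[sum(Df[i][k] * Dg[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
DH = jacobian(H, a)
print(product)
print(DH)
```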
As we pointed out at the end of §2.1, the mean value theorem is false for vector-
valued functions. That is, for a differentiable Rm -valued function f with m > 1,
given two points a and b there is usually no c on the line segment between a and b
such that f (b) − f (a) = [Df (c)][b − a]. However, the main corollary of the mean
value theorem, an estimate on |f (a) − f (b)| in terms of a bound on the derivative
of f , is still valid. To state it, we employ the following terminology: The norm of
a linear mapping A : Rn → Rm is the smallest constant C such that |Ax| ≤ C|x|
for all $\mathbf x \in \mathbf R^n$. The norm of A is denoted by $\|A\|$; thus $|A\mathbf x| \le \|A\|\,|\mathbf x|$ for all $\mathbf x$, and $\|A\|$ is the smallest constant with this property.
Equivalently, ∥A∥ = max{|Ax| : |x| = 1}; see Exercise 9. An estimate for ∥A∥
in terms of the entries Ajk is given in Exercise 10.
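For a concrete feel for $\|A\|$, here is a small Python sketch for a $2\times 2$ example. It uses the standard linear-algebra fact (not proved in this chapter) that $\|A\|$ is the square root of the largest eigenvalue of $A^{\mathsf T}A$, computable here by the quadratic formula, and compares it with a sampled maximum of $|A\mathbf x|$ over unit vectors:

```python
import math

# ||A|| for a 2x2 matrix: sqrt of the largest eigenvalue of A^T A
# (a symmetric matrix with nonnegative eigenvalues), checked against
# max |Ax| over sampled unit vectors x.

A = [[1.0, 2.0], [3.0, 4.0]]

# entries of B = A^T A
b11 = A[0][0] ** 2 + A[1][0] ** 2
b12 = A[0][0] * A[0][1] + A[1][0] * A[1][1]
b22 = A[0][1] ** 2 + A[1][1] ** 2
tr, det = b11 + b22, b11 * b22 - b12 ** 2
lam_max = (tr + math.sqrt(tr * tr - 4 * det)) / 2
norm_A = math.sqrt(lam_max)

def Ax_len(theta):
    # |Ax| for the unit vector x = (cos theta, sin theta)
    x, y = math.cos(theta), math.sin(theta)
    u = A[0][0] * x + A[0][1] * y
    v = A[1][0] * x + A[1][1] * y
    return math.hypot(u, v)

worst = max(Ax_len(2 * math.pi * i / 1000) for i in range(1000))
print(norm_A, worst)  # the sampled maximum approaches ||A||
```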
2.88 Theorem. Suppose f is a differentiable Rm -valued function on an open con-
vex set S ⊂ Rn , and suppose that ∥Df (x)∥ ≤ M for all x ∈ S. Then
The desired result now follows by taking u to be the unit vector in the direction of
f (b)−f (a), so that u·[f (b)−f (a)] = |f (b)−f (a)|. (Of course, if f (b)−f (a) = 0,
the result is trivial.)
so
$$\frac{\partial(u, v)}{\partial(x, y)} = (10x - 3y^2)e^{5y-7z}, \qquad
\frac{\partial(u, v)}{\partial(y, z)} = -21xy^2 e^{5y-7z}, \qquad
\frac{\partial(u, v)}{\partial(x, z)} = -14x e^{5y-7z}.$$
EXERCISES
if we think of ∇h(x), f (x), and g(x) as column vectors. (Here A∗ denotes the
transpose of the matrix A; see Appendix A, (A.15).)
8. Suppose that w = f (x, y, t, s) and x and y are also functions of t and s (the
situation depicted in Figure 2.3). The total dependence of w on t and s can be
expressed by writing w = f (g(t, s)) where g(t, s) = (x(t, s), y(t, s), t, s).
Show that the chain rule (2.86), applied to the composite function f ◦ g, yields
the same result as the one obtained in §2.3.
9. Let A : Rn → Rm be a linear map.
a. Show that the function ϕ(x) = |Ax| has a maximum value on the set
{x : |x| = 1}.
b. Let M be the maximum in part (a). Show that |Ax| ≤ M |x| for all x ∈ Rn ,
with equality for at least one unit vector x. Deduce that M = ∥A∥.
10. Let A : Rn → Rm be a linear map.
    a. Show that ∥A∥ ≤ √m max_{1≤j≤m} (Σ_{k=1}^{n} |Ajk |). (Hint: Use (1.3).)
    b. Show that this inequality is an equality when the matrix of A is given by
       Aj1 = 1 and Ajk = 0 for k > 1 (1 ≤ j ≤ m).
Chapter 3

The Implicit Function Theorem and Its Applications
ask when this surface can be represented as the graph of a function z = f (x, y),
y = g(x, z), or x = h(y, z).
Simple examples show that it is usually impossible to represent the whole set
S = {x : F (x) = 0} as the graph of a function. For example, if n = 2 and
F (x, y) = x2 + y 2 − 1, the set S is the unit circle. We can represent the upper or
lower semicircle as the graph of f (x) = ±√(1 − x2 ), and the right or left semicircle
as the graph of g(y) = ±√(1 − y 2 ), but the whole circle is not a graph. Thus, in
order to get reasonable results, we must be content only to represent pieces of S
as graphs. More specifically, our object will be to represent a piece of S in the
neighborhood of a given point a ∈ S as a graph.
Since we want to single out one of the variables as the one to be solved for, we
make a little change of notation: We denote the number of variables by n + 1 and
denote the last variable by y rather than xn+1 . We then have the following precise
analytical statement of the problem:
Given a function F (x, y) of class C 1 and a point (a, b) satisfying F (a, b) = 0,
when is there
i. a function f (x), defined in some open set in Rn containing a, and
ii. an open set U ⊂ Rn+1 containing (a, b), such that for (x, y) ∈ U ,
F (x, y) = 0 ⇐⇒ y = f (x)?
We do not try to specify in advance how big the open sets in question will be; that
will depend strongly on the nature of the function F .
The key to the answer is to look at the linear case. If
L(x1 , . . . , xn , y) = α1 x1 + · · · + αn xn + βy + c,
the solution is obvious: The equation L(x, y) = 0 can be solved for y if and only
if the coefficient β is nonzero. But near a given point (a, b), every differentiable
function F (x, y) is approximately linear; in fact, if F (a, b) = 0,

    F (x, y) = ∂1 F (a, b)(x1 − a1 ) + · · · + ∂n F (a, b)(xn − an ) + ∂y F (a, b)(y − b) + (small error).

If the “small error” were not there, the equation F (x, y) = 0 could be solved for y
precisely when ∂y F (a, b) ̸= 0. We now show that the condition ∂y F (a, b) ̸= 0 is
still the appropriate one when the error term is taken into account.
3.1 Theorem (The Implicit Function Theorem for a Single Equation). Let
F (x, y) be a function of class C 1 on some neighborhood of a point (a, b) ∈ Rn+1 .
Suppose that F (a, b) = 0 and ∂y F (a, b) ̸= 0. Then there exist positive numbers
r0 , r1 such that the following conclusions are valid.
[FIGURE 3.1: the rectangle |x − a| < r0 , |y − b| < r1 (width 2r0 , height 2r1 ) centered at (a, b) in the (x, y)-plane.]
a. For each x in the ball |x − a| < r0 there is a unique y such that |y − b| < r1
and F (x, y) = 0. We denote this y by f (x); in particular, f (a) = b.
b. The function f thus defined for |x − a| < r0 is of class C 1 , and its partial
derivatives are given by
(3.2) ∂j f (x) = −∂j F (x, f (x))/∂y F (x, f (x)).
Notes.
i. The number r0 may be very small, and there is no way to estimate its size
without further hypotheses on F .
ii. The formula (3.2) for ∂j f is, of course, the one obtained via the chain rule
by differentiating the equation F (x, f (x)) = 0.
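A quick check of (3.2) on the circle of the preceding section: with F (x, y) = x² + y² − 1 and (a, b) = (0, 1), the implicit solution is f (x) = √(1 − x²), and its derivative matches −∂x F/∂y F exactly.

```python
import math

# F(x, y) = x^2 + y^2 - 1 near (a, b) = (0, 1); implicit solution f(x).
F  = lambda x, y: x * x + y * y - 1
Fx = lambda x, y: 2 * x                  # partial_x F
Fy = lambda x, y: 2 * y                  # partial_y F
f  = lambda x: math.sqrt(1 - x * x)

for x in [-0.5, 0.0, 0.3, 0.8]:
    assert abs(F(x, f(x))) < 1e-12                        # F(x, f(x)) = 0
    fprime = -x / math.sqrt(1 - x * x)                    # f'(x), computed directly
    assert abs(fprime - (-Fx(x, f(x)) / Fy(x, f(x)))) < 1e-12   # formula (3.2)
```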
Proof. We first prove (a). We may assume that ∂y F (a, b) > 0 (by replacing F by
−F if necessary). Since ∂y F is continuous, it remains positive in some neighbor-
hood of (a, b), say for |x−a| < r1 and |y −b| < r1 . On this set, F (x, y) is a strictly
increasing function of y for each fixed x. In particular, since F (a, b) = 0 we have
F (a, b + r1 ) > 0 and F (a, b − r1 ) < 0. The continuity of F then implies that for
some r0 ≤ r1 we have F (x, b + r1 ) > 0 and F (x, b − r1 ) < 0 for |x − a| < r0 .
In short, for each x in the ball B = {x : |x−a| < r0 } we have F (x, b−r1 ) < 0,
F (x, b + r1 ) > 0, and F (x, y) is strictly increasing as a function of y for |y − b| <
r1 . It follows from the intermediate value theorem that there is a unique y for each
x ∈ B that satisfies |y − b| < r1 and F (x, y) = 0, which establishes (a). See
Figure 3.1.
Next we observe that the function y = f (x) thus defined is continuous at x =
a; in other words, for any ϵ > 0 there is a δ > 0 such that |f (x) − f (a)| < ϵ
whenever |x − a| < δ. Indeed, the argument just given shows that |f (x) − f (a)| =
|y − b| < r1 whenever |x − a| < r0 , and we could repeat that argument with r1
replaced by any smaller number ϵ to obtain an appropriate δ in place of r0 .
In fact, this argument can also be applied with a replaced by any other point x0
in the ball B to show that f is continuous at x0 . To recapitulate it briefly: Given
ϵ > 0, there exists δ > 0 such that if |x − x0 | < δ we have F (x, y0 − ϵ) < 0 and
F (x, y0 + ϵ) > 0, where y0 = f (x0 ). For each such x there is a unique y such
that |y − y0 | < ϵ and F (x, y) = 0, and that y is f (x); hence |f (x) − f (x0 )| =
|y − y0 | < ϵ.
Now that we know that f is continuous on B, we can show that its partial
derivatives ∂j f exist on B and are given by (3.2) — which also shows that they are
continuous. Given x ∈ B and a (small) real number h, let y = f (x) and
k = f (x + h) − f (x), where

    h = (0, . . . , 0, h, 0, . . . , 0) with the h in the jth place.

Then F (x + h, y + k) = F (x + h, f (x + h)) = 0, so by the mean value theorem
applied to the function t ↦ F (x + th, y + tk) there is a t ∈ (0, 1) such that

    0 = F (x + h, y + k) − F (x, y)
      = h∂j F (x + th, y + tk) + k∂y F (x + th, y + tk).
EXAMPLE 1. Let F (x, y) = x − y 2 − 1. Since ∂x F = 1 everywhere, the
equation F (x, y) = 0 can be solved for x as a function of y locally near any
point (a, b) for which F (a, b) = 0. Of course, for this particular F it is easy
to solve for x explicitly — namely, x = y 2 + 1 — and this solution is valid
not just locally but globally. Next, ∂y F (a, b) = 0 precisely when b = 0, so
the implicit function theorem guarantees that the equation F (x, y) = 0 can be
solved uniquely for y near any point (a, b) such that F (a, b) = 0 and b ̸= 0.
In fact, the possible solutions are y = √(x − 1) and y = −√(x − 1). For x very
close to a, only one of these solutions will be very close to b — namely, √(x − 1)
if b > 0 and −√(x − 1) if b < 0 — and this solution is the one that figures in
the implicit function theorem. Also, these solutions are defined only for x ≥ 1,
so the number r0 in the statement of the implicit function theorem can be at most a − 1.
Finally, we have F (1, 0) = 0, but the equation F (x, y) = 0 cannot be solved
uniquely for y as a function of x in any neighborhood of (1, 0): If x > 1 there
are two solutions, both equally close to 0, and if x < 1 there are none.
EXAMPLE 2. For a contrast with Example 1, let G(x, y) = x − e1−x − y 3 .
First, ∂x G(a, b) = 1 + e1−a > 1 for all (a, b), so the implicit function theorem
guarantees that the equation G(x, y) = 0 can be solved for x locally near
any point (a, b) such that G(a, b) = 0. It is not hard to see (Exercise 4) that
there is a single solution that works globally, but there is no nice formula for
this solution in terms of elementary functions. Next, ∂y G(a, b) = −3b2 , so
the implicit function theorem guarantees that the equation G(x, y) = 0 can
be solved for y as a C 1 function of x locally near any point (a, b) such that
G(a, b) = 0 and b ̸= 0. In fact, the solution is y = (x − e1−x )1/3 , which is
globally uniquely defined but fails to be differentiable at the point where y = 0
(i.e., x = 1).
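The global solvability for x in Example 2 can be seen numerically: since G(x, y) = x − e^(1−x) − y³ is strictly increasing in x, a simple bisection (an illustrative implementation, not from the text) finds the unique x for each y, and the explicit solution y = (x − e^(1−x))^(1/3) inverts it.

```python
import math

# G(x, y) = x - e^(1-x) - y^3 is strictly increasing in x, so for each y
# the equation G(x, y) = 0 has exactly one solution x, found by bisection.
def G(x, y):
    return x - math.exp(1 - x) - y ** 3

def solve_x(y, lo=-50.0, hi=50.0):
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if G(mid, y) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# At y = 0 the solution is x = 1, since 1 - e^0 - 0 = 0.
assert abs(solve_x(0.0) - 1.0) < 1e-9

# The explicit solution y = (x - e^(1-x))^(1/3) inverts the map y -> x:
x = solve_x(0.7)
assert abs(G(x, 0.0) ** (1 / 3) - 0.7) < 1e-6   # G(x, 0) = x - e^(1-x) here
```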
We now turn to the more general problem of solving several equations simul-
taneously for some of the variables occurring in them. This will require some
facts about invertible matrices and determinants, for which we refer to Appendix
A, (A.24)–(A.33) and (A.50)–(A.55). To fix the notation, we shall consider k func-
tions F1 , . . . , Fk of n + k variables x1 , . . . , xn , y1 , . . . , yk , and ask when we can
solve the equations
F1 (x1 , . . . , xn , y1 , . . . , yk ) = 0,
(3.4) ..
.
Fk (x1 , . . . , xn , y1 , . . . , yk ) = 0
for the y’s in terms of the x’s. We shall use vector notation to abbreviate (3.4) as
(3.5) F(x, y) = 0.
We assume that F is of class C 1 near a point (a, b) such that F(a, b) = 0, and we
ask when (3.5) determines y as a C 1 function of x in some neighborhood of (a, b).
Again the key to the problem is to consider the linear case,
(3.6) Ax + By + c = 0,
yj ; then, after substituting the results into the remaining equation, one solves that
equation for the remaining variable. The main difficulty is in showing that the
implicit function theorem can be applied to the last equation.
(3.10) x − yu2 = 0, xy + uv = 0
the signs of u and v being the same as the signs of u0 and v0 , respectively. This
solution is valid for all (x, y) in the same quadrant as (x0 , y0 ). The problems
that arise if y0 = 0 or u0 = 0 are evident: If y0 = 0, then the formula for u
does not even make sense for y = y0 ; if u0 = 0, then x0 must also be 0, and
the square roots present the same sort of problem as in Example 1.
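The solution of (3.10) described here can be verified directly: the first equation gives u² = x/y, and substituting into the second gives v = −xy/u. A check near a point with x0 , y0 , u0 all positive (sign choices as in the discussion above):

```python
import math

# Solving (3.10): x - y u^2 = 0 and xy + uv = 0, taking the positive root
# for u (the branch with u0 > 0, y0 > 0).
def solve(x, y):
    u = math.sqrt(x / y)        # from the first equation: u^2 = x / y
    v = -x * y / u              # from the second equation: uv = -xy
    return u, v

for (x, y) in [(1.0, 1.0), (2.0, 0.5), (0.3, 4.0)]:
    u, v = solve(x, y)
    assert abs(x - y * u * u) < 1e-12     # first equation of (3.10)
    assert abs(x * y + u * v) < 1e-12     # second equation of (3.10)
```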
EXERCISES
3. Can the equation (x2 + y 2 + 2z 2 )1/2 = cos z be solved uniquely for y in terms
of x and z near (0, 1, 0)? For z in terms of x and y?
4. Sketch the graph of the equation x − e1−x − y 3 = 0 in Example 2. Show
graphically that for each x there is a unique y satisfying this equation, and vice
versa.
5. Suppose F (x, y) is a C 1 function such that F (0, 0) = 0. What conditions on
F will guarantee that the equation F (F (x, y), y) = 0 can be solved for y as a
C 1 function of x near (0, 0)?
6. Investigate the possibility of solving the equations xy + 2yz − 3xz = 0, xyz +
x − y = 1 for two of the variables as functions of the third near the point
(x, y, z) = (1, 1, 1).
7. Investigate the possibility of solving the equations u3 + xv − y = 0, v 3 + yu −
x = 0 for any two of the variables as functions of the other two near the point
(x, y, u, v) = (0, 1, 1, −1).
8. Investigate the possibility of solving the equations xy 2 + xzu + yv 2 = 3 and
u3 yz + 2xv − u2 v 2 = 2 for u and v as functions of x, y, and z near x = y =
z = u = v = 1.
9. Can the equations x2 + y 2 + z 2 = 6, xy + tz = 2, xz + ty + et = 0 be solved
for x, y, and z as C 1 functions of t near (x, y, z, t) = (−1, −2, 1, 0)?
y − f (x), and it is also the range of the map f (t) = (t, f (t)). The representations
(ii) and (iii) are more flexible, but they are also too general as they stand because
the sets represented by them may not be smooth curves. Consider the following
examples, in which c denotes an arbitrary real constant:
In these examples, the functions in question are all of class C 1 , but the sets they
describe fail to be smooth curves at certain points. However, they share a common
feature: The points where smoothness fails — namely, the origin in Examples 1–3
and the points (0, 1) and (1, 0) in Example 4 — are the points where the derivatives
of the relevant functions vanish. That is, the origin is the one and only point where
the gradients ∇F , ∇G, and ∇H vanish, and it is the image under f of the one and
only point (t = 0) where f ′ vanishes. Moreover, (0, 1) and (1, 0) are the images
under g of the points t = nπ and t = (n + 1/2)π where g′ (t) = 0.
This suggests that it might be a good idea to impose the extra conditions that
∇F ̸= 0 on the set where F = 0 in (ii) and that f ′ (t) ̸= 0 in (iii). And indeed, with
the help of the implicit function theorem, it is easy to see that under these extra
conditions the representations (i)–(iii) are all locally equivalent. That is, if a curve
is represented in one of the forms (i)–(iii) and a is a point on the curve, at least a
small piece of the curve including the point a can also be represented in the other
two forms.
We now make this precise. Since (i) is more special than either (ii) or (iii), as
we have observed above, it is enough to see that a curve given by (ii) or (iii) can
also be represented in the form (i).
3.11 Theorem.
a. Let F be a real-valued function of class C 1 on an open set in R2 , and let S =
{(x, y) : F (x, y) = 0}. If a ∈ S and ∇F (a) ̸= 0, there is a neighborhood N
of a in R2 such that S ∩ N is the graph of a C 1 function f (either y = f (x) or
x = f (y)).
b. Let f : (a, b) → R2 be a function of class C 1 . If f ′ (t0 ) ̸= 0, there is an open
interval I containing t0 such that the set {f (t) : t ∈ I} is the graph of a C 1
function f (either y = f (x) or x = f (y)).
Proof. Part (a) is a special case of Corollary 3.3. As for (b), let f = (ϕ, ψ). If
f ′ (t0 ) ̸= 0, then either ϕ′ (t0 ) ̸= 0 or ψ ′ (t0 ) ̸= 0; let’s assume that the former
condition holds. Let F (x, t) = x − ϕ(t) and x0 = ϕ(t0 ). Since ∂t F (x0 , t0 ) =
−ϕ′ (t0 ) ̸= 0, the implicit function theorem guarantees that the equation x = ϕ(t)
can be solved for t as a C 1 function of x, say t = ω(x), in some neighborhood of the
point (x0 , t0 ). But then (ϕ(t), ψ(t)) = (x, ψ(ω(x))) for t in some neighborhood I
of t0 ; that is, the set {f (t) : t ∈ I} is the graph of the C 1 function f = ψ ◦ ω. (If
ψ ′ (t0 ) ̸= 0 instead, one can make the same argument with x and y switched.)
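The construction in part (b) can be carried out concretely for the illustrative choice f (t) = (cos t, sin t) near t0 = π/2 (a hypothetical example, not from the text): ϕ = cos has ϕ′(π/2) = −1 ̸= 0, the local inverse is ω = arccos, and the resulting graph function is ψ ◦ ω, i.e. y = sin(arccos x) = √(1 − x²).

```python
import math

# Theorem 3.11b applied to f(t) = (cos t, sin t) near t0 = pi/2.
phi, psi = math.cos, math.sin
omega = math.acos              # local inverse of phi, valid for t in (0, pi)

for t in [1.2, 1.5, 1.8]:      # a neighborhood of pi/2
    x, y = phi(t), psi(t)
    assert abs(omega(x) - t) < 1e-12                          # omega inverts phi
    assert abs(psi(omega(x)) - math.sqrt(1 - x * x)) < 1e-12  # graph y = sqrt(1 - x^2)
```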
With this in mind, we may make the following more formal definition of a
smooth curve: A set S ⊂ R2 is a smooth curve if (a) S is connected, and (b)
every a ∈ S has a neighborhood N such that S ∩ N is the graph of a C 1 function
f (either y = f (x) or x = f (y)). This agrees with the notion of smooth curve
introduced at the beginning of this section: The curve described by y = f (x)
has a tangent line at each point (x0 , f (x0 )), and that line is given by an equation
y − f (x0 ) = f ′ (x0 )(x − x0 ) whose coefficients depend continuously on x0 .
It should be emphasized that the conditions ∇F ̸= 0 and f ′ ̸= 0 in Theorem
3.11 are sufficient for the smoothness of the associated curves but not necessary.
In other words, the condition ∇F (a) = 0 or f ′ (t0 ) = 0 allows the possibility
of non-smoothness at a or f (t0 ) but does not guarantee it. For example, suppose
G(x, y) is a C 1 function whose gradient does not vanish on the set S = {(x, y) :
G(x, y) = 0}, so that S is a smooth curve, and let F = G2 . Then the set where
F = 0 coincides with S, but ∇F = 2G∇G ≡ 0 on S! Similarly, as t ranges over
the interval (−1, 1), the functions f (t) and g(t) = f (t3 ) describe the same curve,
but g′ (0) = 0 no matter what f is.
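The F = G² phenomenon is easy to verify for the concrete choice G(x, y) = x² + y² − 1 (illustrative; any G with nonvanishing gradient on its zero set would do): F = G² vanishes exactly on the unit circle, a smooth curve, yet ∇F = 2G∇G vanishes identically there.

```python
import math

# F = (x^2 + y^2 - 1)^2 vanishes exactly on the unit circle S, a smooth
# curve, yet grad F = 2 G grad G = 0 at every point of S.
def gradF(x, y):
    G = x * x + y * y - 1
    return (2 * G * 2 * x, 2 * G * 2 * y)

assert gradF(1.0, 0.0) == (0.0, 0.0)        # exactly zero at (1, 0) in S
for t in [0.0, 1.0, 2.5]:
    x, y = math.cos(t), math.sin(t)         # a point of S
    gx, gy = gradF(x, y)
    assert abs(gx) < 1e-12 and abs(gy) < 1e-12
```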
The following question remains: Suppose S is a subset of R2 that is described
in one of the forms (i)–(iii), and suppose that the regularity condition ∇F ̸= 0 on
S (in case (ii)) or f ′ (t) ̸= 0 for all t ∈ (a, b) (in case (iii)) is satisfied. Theorem
3.11 shows that every sufficiently small piece of S is a smooth curve, but is the
entire set S a smooth curve? In case (i) the answer is clearly yes. However, in cases
(ii) and (iii) the answer may be no.
The trouble in case (ii) is that S may be disconnected. For example, if F =
GH, then S is the union of the sets {(x, y) : G(x, y) = 0} and {(x, y) : H(x, y) =
0}, and these sets may well be disjoint and form a disconnection of S. (Also see
Exercise 6.)
EXAMPLE 5. Let F (x, y) = (x2 + y 2 − 1)(x2 + y 2 − 2). Then the set where
F = 0 is the union of two disjoint circles centered at the origin. See Figure
3.3.
EXAMPLE 6. Let F (x, y) = (x2 + y 2 − 1)(x2 + y 2 − 2x). Then the set S
where F = 0 is the union of the circles of radius 1 about (0, 0) and (1, 0).
These circles intersect at the points (1/2, ±√3/2), and S is not a smooth curve
at these points. The reader may verify that ∇F = (0, 0) at these points, in
accordance with Theorem 3.11. See Figure 3.3 and also Exercise 6.
As for the representation (iii), a set of the form {f (t) : a < t < b} is necessarily
connected if f is continuous (Theorem 1.26). However, the function f (t) may not
be one-to-one, in which case the curve it describes may be traced more than once
(as we observed in Example 4) or may cross itself. These phenomena can happen
even if f ′ (t) never vanishes. Consequently, the condition f ′ (t) ̸= 0 is not sufficient
to guarantee that the set S = {f (t) : t ∈ (a, b)} is a smooth curve, only that
the pieces of it obtained by restricting t to small intervals are smooth curves. In
practice, sometimes one simply imposes the extra assumption that f is one-to-one
in order to avoid various pitfalls.
EXAMPLE 7. Let f (t) = (cos t, sin t). Then f ′ (t) = (− sin t, cos t) is never
zero since the sine and cosine functions have no common zeros, but f is one-to-
one on the interval (a, b) only when b − a ≤ 2π. The range {f (t) : t ∈ R} of f
is a smooth curve (namely, the unit circle), but in order to obtain a one-to-one
correspondence between points on the circle and values of the parameter t, one
must restrict t to an interval of the form [a, a + 2π) or (a, a + 2π].
EXAMPLE 8. Let f (t) = (t3 −t, t2 ). Then f ′ (t) = (3t2 −1, 2t) never vanishes,
but f (−1) = f (1) = (0, 1). The curve {f (t) : t ∈ R} loops around and
crosses itself at (0, 1), so it fails to be a smooth curve at that point. However,
{f (t) : t ∈ I} is a smooth curve as long as I is an interval whose closure does
not contain both −1 and 1. See Figure 3.3.
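The two claims of Example 8 can be checked directly: f (−1) = f (1) = (0, 1), while f ′(t) = (3t² − 1, 2t) cannot vanish, since its second component forces t = 0 but 3·0² − 1 ̸= 0.

```python
# f(t) = (t^3 - t, t^2): f' never vanishes, yet f(-1) = f(1) = (0, 1),
# so the full image crosses itself at (0, 1).
f      = lambda t: (t ** 3 - t, t ** 2)
fprime = lambda t: (3 * t ** 2 - 1, 2 * t)

assert f(-1) == f(1) == (0, 1)
# f'(t) = (0, 0) would require t = 0 and 3t^2 = 1 simultaneously:
for t in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    assert fprime(t) != (0.0, 0.0)
```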
The reader with access to a computer graphics program may find it entertaining
to experiment with examples similar to the ones in this section to obtain a better
understanding of the relations between analytic and geometric properties of func-
tions and to see the various types of singularities that can arise when the regularity
condition ∇F ̸= 0 or f ′ (t) ̸= 0 is violated.
EXERCISES
1. For each of the following functions F (x, y), determine whether the set S =
{(x, y) : F (x, y) = 0} is a smooth curve. Draw a sketch of S. Examine the
3.2. Curves in the Plane
As before, (i) is a special case of (ii) and (iii), with F (x, y, z) = z − f (x, y) and
f (u, v) = (u, v, f (u, v)), and as before, some additional conditions need to be
imposed in cases (ii) and (iii) in order to guarantee the smoothness of the surface.
The condition in case (ii) is exactly the same as for curves, namely, that

(3.12) ∇F (x) ̸= 0 for each x ∈ S.
Here x = (x, y, z) and u = (u, v); the variables u and v are the parameters used
to represent the surface S. We can think of them as giving a coordinate system on
S, with the coordinate grid being formed by the images of the lines v = constant
and u = constant, that is, the curves given parametrically by x = f (u, c) and
x = f (c, v). The picture is as in Figure 3.4.
What is the appropriate nondegeneracy condition on the derivatives of f ? A first
guess might be that the Fréchet derivative Df (a 3 × 2 matrix) should be nonzero,
but this is not enough. We can obtain more insight by looking at the case where
f is linear, that is, f (u, v) = ua + vb + c for some a, b, c ∈ R3 . Typically the
range of such an f is a plane, but if the vectors a and b are linearly dependent
— that is, if one is a scalar multiple of the other — it will only be a line (unless
a = b = 0, in which case it is a single point). Now, for a general smooth f , the
linear approximation to f near a point (u0 , v0 ) is f (u, v) ≈ ua + vb + c where
a = ∂u f (u0 , v0 ) and b = ∂v f (u0 , v0 ). The appropriate condition is therefore that

(3.13) ∂u f (u, v) and ∂v f (u, v) are linearly independent at each (u, v) ∈ U .
3.3. Surfaces and Curves in Space
[FIGURE 3.4: the map f carries the grid of lines u = constant, v = constant in the uv-plane to a curvilinear coordinate grid on the surface S in xyz-space.]
Since two vectors in R3 are linearly independent if and only if their cross product
is nonzero, (3.13) can be restated as
(3.14) (∂f /∂u × ∂f /∂v)(u, v) ̸= 0 at each (u, v) ∈ U .
Thus the representations (i)–(iii) for surfaces are locally equivalent in the pres-
ence of the regularity conditions (3.12) and (3.13); a smooth surface is a connected
subset of R3 that can be locally described in any of these three forms. The poten-
tial global problems with the representations (ii) and (iii) are the same as for plane
curves; namely, the set where a C 1 function F vanishes may be disconnected, and
a map f that is locally one-to-one need not be globally one-to-one.
EXAMPLE 1. Let f (u, v) = ((u + v) cos(u − v), (u + v) sin(u − v), u + v).
The set S = f (R2 ) is a right circular cone with vertex at the origin; it
is described nonparametrically by the equation x2 + y 2 = z 2 . The set S is
a smooth surface except at the origin, which accords with the fact that the
gradient of F (x, y, z) = x2 + y 2 − z 2 vanishes at the origin and nowhere else.
Correspondingly, the vectors
∂u f = (cos(u − v) − (u + v) sin(u − v), sin(u − v) + (u + v) cos(u − v), 1)

and

∂v f = (cos(u − v) + (u + v) sin(u − v), sin(u − v) − (u + v) cos(u − v), 1)
are linearly independent except when u + v = 0, in which case they coincide.
The map f is locally one-to-one except along the line u + v = 0, and this entire
line is mapped to the origin. (The reader will recognize that u + v and u − v are
really the r and θ of cylindrical coordinates in R3 . We have chosen to disguise
them a little in order to display a situation where ∂u f and ∂v f are both nonzero
but are linearly dependent where the singularities occur.)
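The behavior of these tangent vectors can be confirmed numerically: their cross product vanishes exactly on the line u + v = 0 (where the two vectors coincide) and is nonzero elsewhere.

```python
import math

# Tangent vectors of f(u, v) = ((u+v)cos(u-v), (u+v)sin(u-v), u+v).
def partials(u, v):
    s, d = u + v, u - v
    fu = (math.cos(d) - s * math.sin(d), math.sin(d) + s * math.cos(d), 1.0)
    fv = (math.cos(d) + s * math.sin(d), math.sin(d) - s * math.cos(d), 1.0)
    return fu, fv

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

fu, fv = partials(1.0, -1.0)          # u + v = 0: the singular line
assert fu == fv                       # the two tangent vectors coincide
assert cross(fu, fv) == (0.0, 0.0, 0.0)

fu, fv = partials(1.0, 0.5)           # u + v != 0: a smooth point
assert any(abs(c) > 1e-9 for c in cross(fu, fv))
```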
EXAMPLE 2. Let f (θ, ϕ) = (cos θ sin ϕ, sin θ sin ϕ, cos ϕ), a map of R2 onto
the unit sphere. Here θ is the longitude and ϕ is the co-latitude, i.e., the latitude as measured
from the north pole rather than the equator. The longitude θ is only well defined
up to multiples of 2π, but the co-latitude is usually restricted to the interval
[0, π]. The sphere is a smooth surface, but the map f does not provide a “good”
parametrization of the whole sphere because it is not locally one-to-one when
sin ϕ = 0. (That is, the longitude is completely undetermined at the north and
south poles.) This degeneracy is also reflected in the tangent vectors
Finally, a few words about finding the tangent plane to a smooth surface S at a
point a ∈ S. In general, the tangent plane is given by the equation n · (x − a) = 0,
where n is a (nonzero) normal vector to S at a. We have already observed in
Theorem 2.37 that when S is given by an equation F = 0, then the vector ∇F (a)
is normal to S at a. On the other hand, when S is given parametrically as the range
of a map f (u, v), the vectors ∂u f (b, c) and ∂v f (b, c) are tangent to certain curves
in S and hence to S itself at f (b, c); we therefore obtain a normal at f (b, c) by
taking their cross product. In both cases, the conditions on F or f that guarantee
the smoothness of S also guarantee that these normal vectors are nonzero.
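The agreement of the two normals can be checked on the cone of Example 1, where both descriptions are available: F (x, y, z) = x² + y² − z² gives ∇F = (2x, 2y, −2z), and the cross product of finite-difference tangent vectors of the parametrization gives a parallel vector.

```python
import math

# The cone of Example 1: parametrized by f, and cut out by F = x^2+y^2-z^2.
def f(u, v):
    s, d = u + v, u - v
    return (s * math.cos(d), s * math.sin(d), s)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

u, v, h = 1.0, 0.5, 1e-6
fu = tuple((a - b) / (2 * h) for a, b in zip(f(u + h, v), f(u - h, v)))
fv = tuple((a - b) / (2 * h) for a, b in zip(f(u, v + h), f(u, v - h)))
n1 = cross(fu, fv)                    # normal from the parametrization
x, y, z = f(u, v)
n2 = (2 * x, 2 * y, -2 * z)           # grad F at f(u, v)
# Parallel vectors have zero cross product:
assert all(abs(c) < 1e-4 for c in cross(n1, n2))
```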
i. as a graph, y = f (x) and z = g(x) (or similar expressions with the coordi-
nates permuted), where f and g are C 1 functions;
ii. as the set of solutions of a pair of equations F (x, y, z) = G(x, y, z) = 0,
where F and G are C 1 functions;
iii. parametrically, as the range of a C 1 map f : (a, b) → R3 .
The form (ii) describes the curve as the intersection of the two surfaces F = 0
and G = 0, and (i) is a special case of (ii) (with F (x, y, z) = y − f (x) and
G(x, y, z) = z − g(x)) and of (iii) (with f (t) = (t, f (t), g(t))).
By now the reader should be able to guess what the appropriate regularity con-
dition for cases (ii) and (iii) is. In (iii) it is simply that f ′ (t) ̸= 0, and in (ii) it is
that ∇F (x) and ∇G(x) are linearly independent at each point x of the curve.
(Geometrically, this means that the surfaces F = 0 and G = 0 are nowhere tangent
to each other.) With these conditions we have an analogue of Theorems 3.11 and
3.15. Rather than give another precise statement and proof, we sketch the ideas and
leave the details to the reader (Exercise 7).
First, if ∇F and ∇G are linearly independent, then at least one of the Jacobians
∂(F, G)/∂(x, y), ∂(F, G)/∂(x, z), and ∂(F, G)/∂(y, z) must be nonzero; let us
say the last one. Then the implicit function theorem guarantees that the equations
F = G = 0 can be solved for y and z as functions of x. Second, if f ′ (t) ̸= 0,
then one of the components of f ′ (t) must be nonzero; let us say the first one. Then
the equation x = f1 (t) can be solved for t in terms of x, and then the equations
y = f2 (t) and z = f3 (t) yield y and z as functions of x. In either case we end up
with the representation (i).
Let us say a little more about what can go wrong in case (ii) when ∇F and
∇G are linearly dependent. The potential problems are clearly displayed in the
following situation: Let F (x, y, z) = z − ϕ(x, y), where ϕ is a C 1 function, and
let G(x, y, z) = z. Then the sets where F = 0 and G = 0 are smooth surfaces; the
former is the graph of ϕ, whereas the latter is the xy-plane. The intersection of these
two surfaces is the curve in the xy-plane described by the equation ϕ(x, y) = 0.
Now, this curve can have all sorts of singularities if there are points on it where
∇ϕ = (0, 0), as we have discussed in §3.2. But since ∇F = (−∂x ϕ, −∂y ϕ, 1) and
∇G = (0, 0, 1), the points where ∇ϕ = (0, 0) are precisely the points where ∇F
and ∇G are linearly dependent.
If a curve S is represented parametrically by a function f (t), the derivative f ′ (t)
furnishes a tangent vector to S at the point f (t). On the other hand, if S is given
by a pair of equations F = G = 0 and a ∈ S, the vectors ∇F (a) and ∇G(a) are
both normal to S at a and hence span the normal plane to S at a. One can therefore
obtain a tangent vector to S at a by taking their cross product.
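A concrete check with the unit circle in the xy-plane, realized as the intersection of the sphere F = x² + y² + z² − 1 = 0 and the plane G = z = 0 (an illustrative choice): the cross product ∇F × ∇G is parallel to the velocity of the obvious parametrization.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

t = 0.8
p = (math.cos(t), math.sin(t), 0.0)          # a point of the circle
gradF = (2 * p[0], 2 * p[1], 2 * p[2])       # grad of x^2 + y^2 + z^2 - 1
gradG = (0.0, 0.0, 1.0)                      # grad of z
tangent = cross(gradF, gradG)
# g(t) = (cos t, sin t, 0) parametrizes the circle, g'(t) = (-sin t, cos t, 0),
# and the two tangent vectors are parallel (zero cross product):
velocity = (-math.sin(t), math.cos(t), 0.0)
assert all(abs(c) < 1e-12 for c in cross(tangent, velocity))
```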
Higher Dimensions. The pattern for representations of curves and surfaces that
we have established in this section and the preceding one should be pretty clear by
now, and it generalizes readily to higher dimensions. We sketch the main points
briefly and leave it to the ambitious reader to work out the details.
The general name for a “smooth k-dimensional object” is manifold; thus, a
curve is a 1-dimensional manifold and a surface is a 2-dimensional manifold. Here
we consider the question of representing k-dimensional manifolds in Rn , for any
positive integers k and n with n > k. The two general forms, corresponding to (ii)
and (iii) above for curves and surfaces, are as follows.
The Nonparametric Form: A k-dimensional manifold S in Rn can be described
as the set of simultaneous solutions of n − k equations. That is, given C 1 functions
F1 , . . . , Fn−k defined on some open set U ⊂ Rn , or equivalently a C 1 mapping
F = (F1 , . . . , Fn−k ) from U into Rn−k , we can consider the set
(3.16) S = {x : F(x) = 0}.
or, equivalently,
or equivalently,
EXERCISES
b. The surface obtained by revolving the curve z = f (x) (a < x < b) in the
xz-plane around the x-axis, where f (x) > 0.
c. The lower sheet of the hyperboloid z 2 − 2x2 − y 2 = 1.
d. The cylinder x2 + z 2 = 9.
4. Find a parametric description of the following lines:
a. The intersection of the planes x − 2y + z = 3 and 2x − y − z = −1.
b. The intersection of the planes x + 2y = 3 and y − 3z = 2.
5. Let S be the circle formed by intersecting the plane x + z = 1 with the sphere
x2 + y 2 + z 2 = 1.
a. Find a parametrization of S.
b. Find parametric equations for the tangent line to S at the point (1/2, −1/√2, 1/2).
6. Let S be the intersection of the cone z 2 = x2 + y 2 and the plane z = ax + 1,
where a ∈ R.
a. Show that S is a circle if a = 0, an ellipse if |a| < 1, a parabola if |a| = 1,
and a hyperbola if |a| > 1.
b. Find a parametrization for S in the first two cases and for the part of S
lying above the xy-plane in the third case.
7. Give a precise statement and proof of the analogue of Theorem 3.11 for curves
in R3 .
FIGURE 3.5: The transformation f (u, v) = (1/2)(√3 u − v, u + √3 v).
lines v = ±u when c ̸= 0 and the union of these two lines when c = 0. See
Figure 3.8.
We can think of mappings from R3 to itself pictorially in the same way, though
the pictures are harder to draw. Figure 3.9 shows what happens to a cube under the
transformation f (u, v, w) = (−2u, v, (1/2)w).
Another common interpretation of a map f : Rn → Rn is as a coordinate
system on Rn . For example, we usually think of f (r, θ) = (r cos θ, r sin θ) as
representing polar coordinates in the plane. In the preceding discussion we thought
in terms of moving the points in Rn around without changing the labeling system
(namely, Cartesian coordinates); here we are thinking of leaving the points alone
but giving them different labels (polar rather than Cartesian coordinates). It’s just
a matter of point of view; the same transformation f can be interpreted either way.
For example, the systems of parabolas and hyperbolas in Figures 3.7 and 3.8 can
FIGURE 3.10: The polar coordinate transformation (x, y) = (r cos θ, r sin θ).
one further requirement that is natural to impose, namely, that the inverse mapping
f −1 : V → U should also be of class C 1 , so that the correspondence is smooth in
both directions. Hence, the question arises: Given a C 1 transformation f : U → V ,
when does f possess a C 1 inverse f −1 : V → U ? That is, when can the equation
f (x) = y be solved uniquely for x as a C 1 function of y?
This question is clearly closely related to the ones that led to the implicit func-
tion theorem, and indeed, if we restrict attention to the solvability of the equation
f (x) = y in a small neighborhood of a point, its answer becomes a special case of
that theorem. As we did before, we can guess what the answer should be by looking
at the linear approximation. If f (a) = b, the linear approximation to the equation
f (x) = y at the point (a, b) is T (x − a) = y − b where the matrix T is the Fréchet
derivative Df (a), and the latter equation can be solved for x precisely when T is
invertible, that is, when the Jacobian det Df (a) is nonzero. We are therefore led to
the following theorem.
3.18 Theorem (The Inverse Mapping Theorem). Let U and V be open sets in Rn ,
a ∈ U , and b = f (a). Suppose that f : U → V is a mapping of class C 1 and the
Fréchet derivative Df (a) is invertible (that is, the Jacobian det Df (a) is nonzero).
Then there exist neighborhoods M ⊂ U and N ⊂ V of a and b, respectively, so
that f is a one-to-one map from M onto N , and the inverse map f −1 from N to M
is also of class C 1 . Moreover, if y = f (x) ∈ N , D(f −1 )(y) = [Df (x)]−1 .
Proof. The existence of the inverse map is equivalent to the unique solvability of
the equation F(x, y) = 0 for x, where F(x, y) = f (x) − y. Since the derivative of
F as a function of x is just Df (x), the implicit function theorem (3.9) guarantees
that this unique solvability will hold for (x, y) near (a, b) provided that Df (a) is
invertible. (In referring to the statement of the implicit function theorem, however,
note that the roles of the variables x and y have been reversed here.) Moreover,
since f −1 (f (x)) = x for x ∈ M , the chain rule gives D(f −1 )(f (x)) · Df (x) = I
where I is the n × n identity matrix; in other words, D(f −1 )(y) = [Df (x)]−1
where y = f (x).
so det Df ̸= 0 on U , but f is not one-to-one since f (r, θ + 2kπ) = f (r, θ). It is,
however, locally one-to-one, in that it is one-to-one if one restricts θ to any interval
of length less than 2π. (Notice also that the Jacobian of the polar coordinate map
vanishes when r = 0. This accords with the fact that the polar coordinate map is
not even locally invertible there; the angular coordinate is completely undetermined
at the origin.)
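The final identity of Theorem 3.18, D(f⁻¹)(y) = [Df (x)]⁻¹, can be verified numerically for the polar coordinate map with its local inverse g(x, y) = (√(x² + y²), arctan(y/x)) (written here with atan2, valid for r > 0 and −π < θ < π):

```python
import math

# Polar coordinates and a local inverse, away from the origin.
f = lambda r, th: (r * math.cos(th), r * math.sin(th))
g = lambda x, y: (math.hypot(x, y), math.atan2(y, x))

def jac(F, p, h=1e-6):
    """2x2 numerical Jacobian of F at p by central differences."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        pp, pm = list(p), list(p)
        pp[j] += h
        pm[j] -= h
        Fp, Fm = F(*pp), F(*pm)
        for i in range(2):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J

p = (2.0, 0.6)
A = jac(f, p)                 # Df(p); det Df = r = 2 here
B = jac(g, f(*p))             # D(f^{-1}) at f(p)
prod = [[sum(B[i][k] * A[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
# D(f^{-1})(f(p)) Df(p) is the identity matrix:
assert all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < 1e-6
           for i in range(2) for j in range(2))
```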
The question of global invertibility is a delicate one. Consider the following
situation: Let f : Rn → Rn be a map whose component functions are all polyno-
mials, and suppose that the Jacobian det Df is identically equal to 1. Is f globally
invertible? The answer is so far unknown; this is a famous unsolved problem.
We should also point out that the invertibility of Df (a) is not necessary for the
existence of an inverse map, although it is necessary for the differentiability of that
inverse. (Example: Let f (x) = x3 . Then f has the global inverse f −1 (y) = y 1/3 ,
but f (0) = f ′ (0) = 0 and f −1 is not differentiable at 0.)
EXERCISES
1. For each of the following transformations (u, v) = f (x, y), (i) compute the
Jacobian det Df , (ii) draw a sketch of the images of some of the lines x =
constant and y = constant in the uv-plane, (iii) find formulas for the local
inverses of f when they exist.
a. u = ex cos y, v = ex sin y.
b. u = x2 , v = y/x.
c. u = x2 + 2xy + y 2 , v = 2x + 2y.
2. Let (u, v) = f (x, y) = (x − 2y, 2x − y).
a. Compute the inverse transformation (x, y) = f −1 (u, v).
b. Find the image in the uv-plane of the triangle bounded by the lines y = x,
y = −x, and y = 1 − 2x.
c. Find the region in the xy-plane that is mapped to the triangle with vertices
(0, 0), (−1, 2), and (2, 1) in the uv-plane.
3.4. Transformations and Coordinate Systems 139
that is,
f1 (x, y, z) = x + 2y − z,
f2 (x, y, z) = x − 3y + 4z,
f3 (x, y, z) = 2x − y + 3z.
It is easily verified that det A = 0, that the first two rows of A are independent,
and that the third row is the sum of the first two. This last relation means that
the functions f1 , f2 , f3 satisfy the linear relation f3 = f1 + f2 . Equivalently,
the range of F is the plane defined by the equation y3 = y1 + y2 .
3.5. Functional Dependence 141
has a nonzero solution, namely y = ∇Φ(f (x)). Therefore, its coefficient matrix
(∂j fk (x)), which is nothing but the transpose of Df (x), must be singular, and
hence det Df (x) = 0.
More interesting is the fact that the converse of this theorem is also true: The vanishing of the Jacobian det Df implies the functional dependence of the fj’s. We now present a version of this result with an additional hypothesis (the constancy of the rank of Df) that yields a sharper conclusion. We formulate it so that it also covers the case when the number of functions differs from the number of independent variables.
Proof. Let x = (x, y, z), u = f (x), v = g(x), and w = h(x), and fix x0 =
(x0 , y0 , z0 ) ∈ U .
First suppose k = 1. Since the matrix Df (x0 ) has rank 1, it has at least one
nonzero entry; by relabeling the functions and variables, we may assume that the
144 Chapter 3. The Implicit Function Theorem and Its Applications
(1, 1) entry is nonzero, that is, ∂x f (x0 ) ̸= 0. By the implicit function theorem,
then, the equation u = f (x, y, z) can be solved near x = x0 , u = u0 = f (x0 ), to
yield x as a function of y, z, and u. Then v and w turn into functions of y, z, and
u also. Implicit differentiation of the equations u = f (x, y, z) and v = g(x, y, z)
with respect to y (taking y, z, and u as the independent variables) yields

0 = ∂x f · ∂y x + ∂y f,    ∂y v = ∂x g · ∂y x + ∂y g.

Solving the first equation for ∂y x and substituting the result into the second equation then yields

∂y v = (−∂y f / ∂x f) ∂x g + ∂y g = (1/∂x f) · ∂(f, g)/∂(x, y).
But since Df has rank 1, all of its 2 × 2 submatrices are singular; therefore, ∂(f, g)/∂(x, y) ≡ 0 and hence ∂y v ≡ 0. Restricting to a convex neighborhood of (y0, z0, u0), we conclude that v is independent of y. For exactly the same reason, v is independent of z, and w is independent of y and z. That is, v and w are functions of u alone, say v = ϕ(u) and w = ψ(u). This shows that f, g, h are
functionally dependent — g(x) = ϕ(f (x)) and h(x) = ψ(f (x)) — and that the
image of a neighborhood of x0 under f is the locus of the equations v = ϕ(u),
w = ψ(u), which is a smooth curve.
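The rank-1 conclusion is easy to illustrate numerically. In the following Python sketch (an editorial illustration with hypothetical functions, not an example from the text), f, g, h depend on (x, y, z) only through s = x + y + z, so Df has rank 1 everywhere, all 2 × 2 minors of Df vanish, and g = ϕ(f), h = ψ(f) with ϕ(u) = u² and ψ(u) = sin u.

```python
import math

# Hypothetical rank-1 example: all three functions factor through s = x+y+z
f = lambda x, y, z: x + y + z
g = lambda x, y, z: (x + y + z) ** 2        # phi(u) = u^2
h = lambda x, y, z: math.sin(x + y + z)     # psi(u) = sin u

def grad(F, p, eps=1e-6):
    # gradient by central differences
    out = []
    for j in range(3):
        qp, qm = list(p), list(p)
        qp[j] += eps
        qm[j] -= eps
        out.append((F(*qp) - F(*qm)) / (2 * eps))
    return out

p = (0.2, -0.5, 1.1)
gf, gg, gh = grad(f, p), grad(g, p), grad(h, p)

# every 2x2 minor of Df(p) vanishes: the three gradients are parallel
pairs = [(gf, gg), (gf, gh), (gg, gh)]
minors = [u[i] * v[j] - u[j] * v[i]
          for u, v in pairs for i in range(3) for j in range(3)]

# and the functional relations hold at p
rel1 = g(*p) - f(*p) ** 2
rel2 = h(*p) - math.sin(f(*p))
```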
Now let us turn to the case k = 2. Here some 2 × 2 submatrix of Df (x0 ) is
nonsingular; by relabeling the functions and variables, we can assume that it is the
one in the upper left corner, so that ∂(f, g)/∂(x, y) is nonzero at x0 . By the implicit
function theorem, the equations u = f (x, y, z) and v = g(x, y, z) can be solved
near x = x0 , u = u0 = f (x0 ), v = v0 = g(x0 ), to yield x and y as functions of
u, v, and z. Taking u, v, and z as the independent variables, then, we differentiate
the equations u = f (x, y, z), v = g(x, y, z), and w = h(x, y, z) implicitly with
respect to z to obtain
or
We conclude with a few words about the assumption that the rank of Df is constant. Suppose that A(x) is a matrix whose entries depend continuously on x ∈ U (U an open subset of Rᵐ), and the rank of A(x0) is k. Since a set of linearly independent vectors remains linearly independent if the vectors are perturbed slightly, the rank of A(x) is at least k when x is sufficiently close to x0. In other words,
for each k the set {x ∈ U : rank(A(x)) ≥ k} is open. In particular, if k0 is the
maximum rank of A(x) as x ranges over U , then {x ∈ U : rank(A(x)) = k0 } is
open.
Now, in this chapter we have been concerned with C 1 maps f : U → Rn (U
an open subset of Rm ) and the matrix in question is the derivative Df (x). If k0
is the maximum rank of this matrix as x ranges over U , the set V = {x ∈ U :
rank(Df (x)) = k0 } is open, and the theorems of this chapter can be applied on V .
(The implicit function and inverse mapping theorems deal with the case when k0 is
as large as possible, namely, k0 = min(m, n); the theorems of this section provide
information for smaller values of k.) The typical situation is that V is dense in U ,
that is, the set U \ V has no interior points. Thus, the structure of the mapping
f near “most” points of U (the ones in V ) is fairly simple to understand, but at
the remaining points, various kinds of singularities can occur. The study of such
singularities is a substantial and rather intricate branch of mathematical analysis.
EXERCISES
1. For each of the following maps f = (f, g, h), determine whether f, g, h are
functionally dependent on some open set U ⊂ R3 by examining the Jacobian
∂(f, g, h)/∂(x, y, z). If they are, determine the rank of Df on U and find
functional relations (one relation if rank(Df ) = 2, two relations if rank(Df ) =
1) satisfied by f, g, h.
a. f(x, y, z) = x + y − z, g(x, y, z) = x − y + z, h(x, y, z) = x² + y² + z² − 2yz.
b. f(x, y, z) = x² + y² + z², g(x, y, z) = x + y + z, h(x, y, z) = y − z.
c. f(x, y, z) = y^{1/2} sin x, g(x, y, z) = y cos² x − y, h(x, y, z) = z − 3.
d. f(x, y, z) = xy + z, g(x, y, z) = x²y² + 2xyz + z², h(x, y, z) = 2 − xy − z.
e. f(x, y, z) = log x − log y + z, g(x, y, z) = log x − log y − z, h(x, y, z) = (x² + 2y²)/xy.
f. f(x, y, z) = x − y + z, g(x, y, z) = x² − y², h(x, y, z) = x + z.
2. Write out the statement and give a precise proof for the following special cases
of Theorem 3.21, along the lines of Theorem 3.22.
a. m = n = 2, k = 1.
b. m = 2, n = 3, k = 1.
Chapter 4
INTEGRAL CALCULUS
In this chapter we study the integration of functions of one and several real vari-
ables. As we assume that the reader is already familiar with the standard techniques
of integration for functions of one variable, our discussion of integration on the line
is limited to theoretical issues. On the other hand, some of these issues arise also in
higher dimensions, and we shall sometimes invoke the careful treatment of the one-
variable case as an excuse for being somewhat sketchy in developing the theory for
several variables.
In elementary calculus, the term “integral” can refer either to the antiderivative of a function f or to a limit of sums of the form ∑ f(xj) ∆xj; one speaks of indefinite or definite integrals. At the more advanced level, and in particular in this
book, “integral” almost always carries the latter meaning. The notion of integra-
tion as a sophisticated form of summation is one of the truly fundamental ideas of
mathematical analysis, and it arises in many contexts where the connection with
differentiation is tenuous or nonexistent.
148 Chapter 4. Integral Calculus
approximations will approach each other as we subdivide the interval [a, b] into
smaller and smaller pieces, and their common limit will be the integral of f .
Let us make this more precise, introducing some useful definitions along the way. A partition P of the interval [a, b] is a subdivision of [a, b] into nonoverlapping subintervals, specified by giving the subdivision points x1, . . . , xJ−1 along with the endpoints x0 = a and xJ = b. In symbols, we shall write
P = {x0, x1, . . . , xJ},   with a = x0 < x1 < · · · < xJ = b.
(If f is continuous, mj and Mj are just the minimum and maximum values of
f on [xj−1 , xj ], which exist by the extreme value theorem.) We then define the
lower Riemann sum sP f and the upper Riemann sum SP f corresponding to the
partition P by
(4.2)   sP f = ∑_{j=1}^{J} mj (xj − xj−1),    SP f = ∑_{j=1}^{J} Mj (xj − xj−1).
See Figure 4.1, where the lower and upper Riemann sums are the sums of the areas
of the rectangles, an area being counted as negative if the rectangle is below the
x-axis.
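As a concrete numerical sketch (an editorial illustration in Python, not part of the text): for f(x) = x² on [0, 1] with J equal subintervals, f is increasing, so mj = f(xj−1) and Mj = f(xj), and the lower and upper sums bracket the eventual integral 1/3 with gap (f(1) − f(0))/J.

```python
# Lower and upper Riemann sums for an increasing f on [a, b] with J equal
# subintervals; here m_j = f(x_{j-1}) and M_j = f(x_j).
def riemann_sums(f, a, b, J):
    xs = [a + (b - a) * j / J for j in range(J + 1)]
    lower = sum(f(xs[j - 1]) * (xs[j] - xs[j - 1]) for j in range(1, J + 1))
    upper = sum(f(xs[j]) * (xs[j] - xs[j - 1]) for j in range(1, J + 1))
    return lower, upper

lo, hi = riemann_sums(lambda x: x * x, 0.0, 1.0, 1000)
# lo < 1/3 < hi, and hi - lo = (f(1) - f(0))/1000 = 0.001
```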
If m and M are the infimum and supremum of the values of f over the whole
interval [a, b], we clearly have mj ≥ m and Mj ≤ M for all j, and hence
sP f ≥ m ∑_{j=1}^{J} (xj − xj−1) = m(b − a),

SP f ≤ M ∑_{j=1}^{J} (xj − xj−1) = M(b − a).
4.1. Integration on the Line 149
The same argument shows that if one of the subintervals [xj−1 , xj ] is subdivided
further, the lower sum sP f becomes larger while the upper sum SP f becomes
smaller. In short:
4.3 Lemma. If P ′ is a refinement of P , then sP ′ f ≥ sP f and SP ′ f ≤ SP f .
An immediate consequence of this is that any lower Riemann sum for f is less
than any upper Riemann sum for f :
4.4 Lemma. If P and Q are any partitions of [a, b], then sP f ≤ SQ f .
Proof. Consider the common refinement P ∪ Q. By Lemma 4.3,
sP f ≤ sP∪Q f ≤ SP∪Q f ≤ SQ f.
the supremum and infimum being taken over all partitions P of [a, b]. By Lemma 4.4, we have I_a^b(f) ≤ Ī_a^b(f), where I_a^b(f) denotes the lower integral (the supremum of the lower sums) and Ī_a^b(f) the upper integral (the infimum of the upper sums). If the upper and lower integrals coincide, f is called Riemann integrable on [a, b], and the common value of the upper and lower integrals is the Riemann integral ∫_a^b f(x) dx. We shall generally omit the eponym “Riemann,” as the Riemann integral is the only one we shall use in this book, but it is significant not only for historical reasons but also as a way to distinguish this integral from the more sophisticated Lebesgue integral.
At first sight it would seem difficult to determine whether a function f is integrable and to evaluate its integral, as the definitions involve all possible partitions of [a, b]. The following lemma is the key to making these calculations more manageable.
4.5 Lemma. If f is a bounded function on [a, b], the following conditions are equivalent:
a. f is integrable on [a, b].
b. For every ϵ > 0 there is a partition P of [a, b] such that SP f − sP f < ϵ.
Proof. If SP f − sP f < ϵ for some partition P, then Ī_a^b f − I_a^b f < ϵ, and since ϵ is arbitrary, it follows that Ī_a^b f = I_a^b f, i.e., f is integrable. Conversely, if f is a bounded function and ϵ is positive, we can find partitions Q and Q′ of [a, b] such that SQ f < Ī_a^b f + ϵ/2 and sQ′ f > I_a^b f − ϵ/2. Thus, if f is integrable, we have SQ f − sQ′ f < ϵ. Let P = Q ∪ Q′; then by Lemma 4.3, sQ′ f ≤ sP f ≤ SP f ≤ SQ f, so SP f − sP f ≤ SQ f − sQ′ f < ϵ.
The condition (b) in Lemma 4.5 not only gives a workable criterion for integrability but also gives us some leverage for computing the integral of an integrable
function f . Indeed, for any partition P we have
sP f ≤ ∫_a^b f(x) dx ≤ SP f,

so if SP f − sP f < ϵ, then SP f and sP f are both within ϵ of ∫_a^b f(x) dx. The latter quantity is therefore the limit of the sums SP f or sP f as P runs through any sequence of partitions such that SP f − sP f → 0.
We next present the fundamental additivity properties of the integral, which are
easy but not quite trivial consequences of the definitions:
4.6 Theorem.
a. Suppose a < b < c. If f is integrable on [a, b] and on [b, c], then f is integrable
on [a, c], and
(4.7)   ∫_a^c f(x) dx = ∫_a^b f(x) dx + ∫_b^c f(x) dx.
Proof. (a) Given ϵ > 0, let P and Q be partitions of [a, b] and [b, c], respectively, such that SP f − sP f < ϵ/2 and SQ f − sQ f < ϵ/2. Then P ∪ Q is a partition of
[a, c] and
SP ∪Q f = SP f + SQ f, sP ∪Q f = sP f + sQ f.
mj = f (xj−1 ), Mj = f (xj ),
The next criterion for integrability is the one that is most commonly stated in
calculus books. Its proof, however, is frequently omitted because it relies on the
notion of uniform continuity that we studied in §1.8.
Proof. First, f is bounded on [a, b] by Theorem 1.23, so the upper and lower Riemann sums for any partition exist. By Theorem 1.33, f is uniformly continuous
on [a, b]; thus, given ϵ > 0, we can find δ > 0 so that |f (x) − f (y)| < ϵ/(b − a)
whenever x, y ∈ [a, b] and |x − y| < δ. Let P be any partition of [a, b] whose
subintervals [xj−1 , xj ] all have length less than δ. Then |f (x) − f (y)| < ϵ/(b − a)
whenever x and y both lie in the same subinterval, and in particular the maximum
and minimum values of f on that subinterval differ by less than ϵ/(b − a). But this
means that
SP f − sP f = ∑_{j=1}^{J} (Mj − mj)(xj − xj−1) < (ϵ/(b − a)) ∑_{j=1}^{J} (xj − xj−1) = (ϵ/(b − a))(b − a) = ϵ.
and let

U = ⋃_{l=1}^{L} I_l,    V = [a, b] \ U^int.

Thus U is a union of small intervals that contain the discontinuities of f, and V is the remainder of [a, b]. Each interval I_l has length at most 2δ, and there are L of these intervals, so the total length of the set U is at most 2Lδ. On the other hand, V is a finite union of closed intervals, on each of which f is continuous.
Let P be any partition of [a, b] that includes the endpoints of the intervals I_l among its subdivision points. Then we can write

SP f = S_P^U f + S_P^V f,    sP f = s_P^U f + s_P^V f,
where S_P^U f (resp. S_P^V f) is the sum of the terms Mj (xj − xj−1) in SP f for which the interval [xj−1, xj] is contained in U (resp. V), and likewise for s_P^U f and s_P^V f.
Now, let ϵ > 0 be given. Since f is continuous on each of the closed intervals that constitute V, Theorem 4.11 shows that we can make S_P^V f − s_P^V f < ϵ/2 by refining P suitably. On the other hand,

S_P^U f − s_P^U f ≤ (M − m)(length of U) ≤ (M − m)2Lδ,

and we can make this less than ϵ/2 by taking δ < ϵ/4L(M − m). In short, for a suitably chosen P we have SP f − sP f < ϵ, so f is integrable by Lemma 4.5.
The preceding argument actually proves more than is stated in Theorem 4.12.
It is not necessary that the set of discontinuities of f be finite, only that it can be
covered by finitely many intervals I1 , . . . , IL whose total length is as small as we
please. Certain infinite sets, such as convergent sequences, also have this property
(Exercise 6). We make it into a formal definition: A set Z ⊂ R is said to have zero
content # if for any ϵ > 0 there is a finite collection of intervals I1 , . . . , IL such that
(i) Z ⊂ L 1 Il , and (ii) the sum of the lengths of the Il ’s is less than ϵ. The proof of
Theorem 4.12 now yields the following result:
4.13 Theorem. If f is bounded on [a, b] and the set of points in [a, b] at which f is
discontinuous has zero content, then f is integrable on [a, b].
Theorem 4.13 is only a technical refinement of Theorem 4.12, and the reader
should not attach undue importance to it.1 We mention it because its analogue in
higher dimensions does play a significant role in the theory, as we shall see. We
also remark that neither of Theorems 4.10 and 4.13 includes the other; the set of
discontinuities of a monotone function need not have zero content, and there are
continuous functions that are not monotone on any interval.
If f is an integrable function on [a, b], the value of ∫_a^b f(x) dx is somewhat insensitive to the values of f at individual points, in the following sense:
4.14 Proposition. Suppose f and g are integrable on [a, b] and f(x) = g(x) for all except finitely many points x ∈ [a, b]. Then ∫_a^b f(x) dx = ∫_a^b g(x) dx.
¹It does, however, point the way toward a necessary and sufficient condition for a function to be integrable, which we shall describe at the end of §4.8.
Proof. First suppose g is identically zero. That is, we are assuming that f (x) = 0
for all x ∈ [a, b] except for finitely many points y1 , . . . , yL . Let Pk be the partition
of [a, b] into k equal subintervals, and take k large enough so that the points yl all
lie in different subintervals. Then
S_{P_k} f = ((b − a)/k) ∑_{l=1}^{L} max{f(y_l), 0},    s_{P_k} f = ((b − a)/k) ∑_{l=1}^{L} min{f(y_l), 0}.

Both these quantities tend to zero as k → ∞, and hence ∫_a^b f(x) dx = 0.
The general case follows by applying this argument to the difference f − g.
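The shrinking upper sums in this proof are easy to watch numerically. The Python sketch below (an editorial illustration, not part of the text) takes a function that vanishes except at L points and computes its upper sum over k equal subintervals; since each nonzero point contributes at most its value times (b − a)/k (twice, if it lands on a subdivision point), the upper sums tend to 0.

```python
# Upper sum over k equal subintervals of a function that is zero except
# at the given points, where it takes the given (here positive) values.
def upper_sum_spikes(points, values, a, b, k):
    dx = (b - a) / k
    total = 0.0
    for j in range(k):
        lo, hi = a + j * dx, a + (j + 1) * dx
        in_piece = [v for p, v in zip(points, values) if lo <= p <= hi]
        total += max(in_piece + [0.0]) * dx  # M_j for this subinterval
    return total

points, values = [0.25, 0.5, 0.9], [3.0, 7.0, 2.0]
s_coarse = upper_sum_spikes(points, values, 0.0, 1.0, 10)
s_fine = upper_sum_spikes(points, values, 0.0, 1.0, 10000)
# s_fine is tiny: the integral of such a function is 0
```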
The main use of Proposition 4.14 is in the context of functions with finitely
many discontinuities, as in Theorem 4.12. For such a function f there is often no
“right” way to define f at the points where it is discontinuous. Proposition 4.14
assures us that this problem is of no consequence as far as integration is concerned;
we may define f at these points however we like, or indeed leave f undefined there, without any effect on ∫_a^b f(x) dx.
Next, we present a general version of the fundamental theorem of calculus. Its
two parts say in effect that differentiating an integral or integrating a derivative
leads back to the original function.
4.15 Theorem (The Fundamental Theorem of Calculus).
a. Let f be an integrable function on [a, b]. For x ∈ [a, b], let F(x) = ∫_a^x f(t) dt (which is well defined by Theorem 4.9b). Then F is continuous on [a, b]; moreover, F′(x) exists and equals f(x) at every x at which f is continuous.
b. Let F be a continuous function on [a, b] that is differentiable except perhaps at finitely many points in [a, b], and let f be a function on [a, b] that agrees with F′ at all points where the latter is defined. If f is integrable on [a, b], then ∫_a^b f(t) dt = F(b) − F(a).
Proof. (a) If x, y ∈ [a, b], by (4.7) we have

F(y) − F(x) = ∫_x^y f(t) dt.
Let C = sup{|f (t)| : t ∈ [a, b]}; then by Theorem 4.9d,
|F(y) − F(x)| ≤ ∫_x^y |f(t)| dt ≤ ∫_x^y C dt = C|y − x|,
which implies that F is continuous. Next, suppose that f is continuous at x; thus,
given ϵ > 0, there is a δ > 0 so that |f(t) − f(x)| < ϵ whenever |t − x| < δ. Since

f(x) = (1/(y − x)) ∫_x^y f(x) dt,
we have

(F(y) − F(x))/(y − x) − f(x) = (1/(y − x)) ∫_x^y [f(t) − f(x)] dt.
Hence, if |y − x| < δ, we have |f (t) − f (x)| < ϵ for all t between y and x, so
|(F(y) − F(x))/(y − x) − f(x)| ≤ (1/|y − x|) |∫_x^y ϵ dt| = ϵ.
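Part (a) of the fundamental theorem is pleasant to check numerically: build F by summation and differentiate it by a difference quotient. The Python sketch below is an editorial illustration (not part of the text); the midpoint sum stands in for the integral, and f = cos is an arbitrary continuous choice.

```python
import math

# F(x) = integral from 0 to x of cos t dt, approximated by midpoint sums;
# the difference quotient of F at x should approach f(x) = cos x.
def integral(f, a, b, n=40000):
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

F = lambda x: integral(math.cos, 0.0, x)
x, h = 1.0, 1e-3
dq = (F(x + h) - F(x - h)) / (2 * h)
# dq is close to cos(1) = 0.5403...
```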
We have developed the notion of the integral of a function f in terms of the upper and lower Riemann sums SP f and sP f. More generally, if P = {x0, . . . , xJ}
is a partition of [a, b] and tj is any point in the interval [xj−1 , xj ] (1 ≤ j ≤ J), the
quantity

∑_{j=1}^{J} f(tj)(xj − xj−1)
is called a Riemann sum for f associated to the partition P . Clearly, if mj and Mj
are as in (4.1) we have mj ≤ f (tj ) ≤ Mj , so that
sP f ≤ ∑_{j=1}^{J} f(tj)(xj − xj−1) ≤ SP f.
One last question should be addressed: Given an integrable function f on [a, b], for which partitions P do the sums sP f and SP f furnish a good approximation to ∫_a^b f(x) dx? It might seem that the answer would depend strongly on the nature of the function f, but in fact, any partition whose subintervals are sufficiently small will do the job. More precisely:
4.16 Proposition. Suppose f is integrable on [a, b]. Given ϵ > 0, there exists δ > 0 such that if P = {x0, . . . , xJ} is any partition of [a, b] satisfying max_{1≤j≤J}(xj − xj−1) < δ, then the sums sP f and SP f differ from ∫_a^b f(x) dx by at most ϵ.
EXERCISES
6. Let {xk } be a convergent sequence in R. Show that the set {x1 , x2 , . . .} has
zero content.
7. Let f be an integrable function on [a, b]. Suppose that f(x) ≥ 0 for all x and there is at least one point x0 ∈ [a, b] at which f is continuous and strictly positive. Show that ∫_a^b f(x) dx > 0.
8. Let f be an integrable function on [a, b]. Prove the following formulas directly from the definitions:
a. For any c > 0, ∫_a^b f(x) dx = c ∫_{a/c}^{b/c} f(cx) dx.
b. ∫_a^b f(x) dx = ∫_{−b}^{−a} f(−x) dx.
c. For any c ∈ R, ∫_a^b f(x) dx = ∫_{a−c}^{b−c} f(x + c) dx.
9. Suppose g and h are continuous functions on [a, b], and f is a continuous function on R². Show that for any ϵ > 0 there is a δ > 0 such that if P = {x0, . . . , xJ} is any partition of [a, b] satisfying max_{1≤j≤J}(xj − xj−1) < δ, then
|∫_a^b f(g(x), h(x)) dx − ∑_{j=1}^{J} f(g(x′j), h(x″j))(xj − xj−1)| < ϵ
for any choice of x′j , x′′j in the interval [xj−1 , xj ]. (The point is that x′j and x′′j
need not be equal, so the sum in this inequality may not be a genuine Riemann
sum for the integral.)
(Thus, a “rectangle” in this sense is always closed, and its sides are always parallel
to the coordinate axes.) A partition of R is a subdivision of R into subrectangles
obtained by partitioning both sides of R. Thus, a partition P is specified by its
subdivision points,

P = {x0, . . . , xJ; y0, . . . , yK},   a = x0 < · · · < xJ = b,   c = y0 < · · · < yK = d,
The partition P divides R into subrectangles Rjk = [xj−1, xj] × [yk−1, yk] with area

∆Ajk = (xj − xj−1)(yk − yk−1).
Now let f be a bounded function on the rectangle R. Given a partition P as
above, we set
mjk = inf{f(x, y) : (x, y) ∈ Rjk},    Mjk = sup{f(x, y) : (x, y) ∈ Rjk},
I_R(f) = sup_P sP f,    Ī_R(f) = inf_P SP f,
the supremum and infimum being taken over all partitions P of R. If the lower and
upper integrals coincide, f is called (Riemann) integrable on R, and the common
value of the upper and lower integrals is called the (Riemann) integral of f over
R and is denoted by
∬_R f dA   or   ∬_R f(x, y) dx dy.
easily be adapted to the present situation. However, we have not yet built a satisfactory definition of two-dimensional integrals, because we often wish to integrate
functions over regions other than rectangles. The solution to this problem is simple,
in principle: To integrate a function f over a bounded region S ⊂ R2 , we draw a
large rectangle R that contains S, (re)define f to be zero outside of S, and integrate
the resulting function over R.
To express this neatly, it is convenient to introduce another definition. If S is a
subset of R2 (or Rn , or indeed any set), the characteristic function or indicator
function of S is the function χS defined by
χS(x) = 1 if x ∈ S, and χS(x) = 0 otherwise.
It is easily verified that this definition does not depend on the choice of the enveloping rectangle R, since the integrand f χS vanishes outside of S. (It also does not depend on the values of f outside of S. We could just as well assume that f is only defined on S or on some set containing S, with the understanding that (f χS)(x) = 0 for x ∉ S.)
The properties of integrals in two dimensions are very similar to those in one;
the following theorem provides a list of the most basic ones. The proof is essentially
identical to that of Theorems 4.6 and 4.9; we leave the details to the interested
reader.
4.17 Theorem.
a. If f1 and f2 are integrable on the bounded set S and c1 , c2 ∈ R, then c1 f1 +
c2 f2 is integrable on S, and
∬_S [c1 f1 + c2 f2] dA = c1 ∬_S f1 dA + c2 ∬_S f2 dA.
c. If f and g are integrable on S and f(x) ≤ g(x) for x ∈ S, then ∬_S f dA ≤ ∬_S g dA.
d. If f is integrable on S, then so is |f|, and |∬_S f dA| ≤ ∬_S |f| dA.
At this point we need to say more about the conditions under which a function is
integrable. In the one-variable situation, we can get along quite well by restricting
attention to continuous functions, but that is not the case here: Even if the function f is continuous, the function χS that enters into the definition of ∬_S f dA is not.
The starting point is the analogue of Theorem 4.13. The notion of “zero content”
transfers readily to sets in the plane; namely, a set Z ⊂ R² is said to have zero content if for any ϵ > 0 there is a finite collection of rectangles R1, . . . , RM such that (i) Z ⊂ ⋃_{m=1}^{M} Rm, and (ii) the sum of the areas of the Rm’s is less than ϵ. We
then have:
4.18 Theorem. Suppose f is a bounded function on the rectangle R. If the set of
points in R at which f is discontinuous has zero content, then f is integrable on R.
Proof. The proof is essentially identical to that of Theorem 4.13. That is, one
first shows that f is integrable if f is continuous on all of R by the argument
that proves Theorem 4.11, then encompasses the general case by the argument that
proves Theorem 4.12. Details are left to the reader.
The notion of “zero content” is considerably more interesting in the plane than on the line, as the sets having this property include not only finite sets but things such as smooth curves (that is, curves parametrized by C¹ functions f : [a, b] → R²). The following proposition summarizes the results we will need; see also Exercise 2.
4.19 Proposition.
a. If Z ⊂ R² has zero content and U ⊂ Z, then U has zero content.
b. If Z1, . . . , Zk have zero content, then so does ⋃_{j=1}^{k} Zj.
c. If f : (a0, b0) → R² is of class C¹, then f([a, b]) has zero content whenever a0 < a < b < b0.
Proof. Parts (a) and (b) are easy, and their proofs are left as an exercise. As for
(c), let Pk = {t0 , . . . , tk } be the partition of [a, b] into k equal subintervals of
length δ = (b − a)/k, and let C be an upper bound for {|f ′ (t)| : t ∈ [a, b]}. By
the mean value theorem applied to the two components x(t), y(t) of f (t), we have
|x(t) − x(tj )| ≤ Cδ and |y(t) − y(tj )| ≤ Cδ for t ∈ [tj−1 , tj ]. In other words,
f ([tj−1 , tj ]) is contained in the square of side length 2Cδ centered at f (tj ). Hence,
f ([a, b]) is contained in the union of these squares, and the sum of their areas is
k(2Cδ)2 = 4C 2 (b − a)2 /k. This can be made as small as we please by taking k
sufficiently large, so f ([a, b]) has zero content.
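The covering in this proof is concrete enough to compute. The Python sketch below (an editorial illustration, not part of the text) takes the unit circle f(t) = (cos t, sin t) on [0, 2π], where C = 1 bounds |x′| and |y′|: it confirms by sampling that each piece f([tj−1, tj]) lies in the square of side 2Cδ centered at f(tj), and evaluates the total area bound k(2Cδ)² = 4C²(b − a)²/k, which shrinks like 1/k.

```python
import math

a, b, C = 0.0, 2 * math.pi, 1.0  # parameter interval and derivative bound

def covering_area(k):
    # total area of the k covering squares: k * (2*C*delta)^2 = 4C^2(b-a)^2/k
    delta = (b - a) / k
    return k * (2 * C * delta) ** 2

def squares_cover_curve(k, samples=20):
    # sample-check that f([t_{j-1}, t_j]) lies in the square of side
    # 2*C*delta centered at f(t_j), as the mean value theorem predicts
    delta = (b - a) / k
    for j in range(1, k + 1):
        tj = a + j * delta
        cx, cy = math.cos(tj), math.sin(tj)
        for m in range(samples + 1):
            t = tj - delta + m * delta / samples
            if abs(math.cos(t) - cx) > C * delta or abs(math.sin(t) - cy) > C * delta:
                return False
    return True
```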
162 Chapter 4. Integral Calculus
To apply Theorem 4.18 to the integrand f χS in the definition of ∬_S f dA, we need to know about the discontinuities of χS. The following lemma provides the answer.
In view of Theorem 4.18 and Lemma 4.20, to have a good notion of integra-
tion over a set S, we should require the boundary of S to have zero content. We
make this condition into a formal definition: A set S ⊂ R2 is Jordan measurable
if it is bounded and its boundary has zero content. (We shall comment further on
this nomenclature below.) We shall generally say “measurable” rather than “Jordan measurable,” but we advise the reader that in more advanced works the term
“measurable” refers to the more general concept of Lebesgue measurability (see
§4.8).
By Proposition 4.19, any bounded set whose boundary is a finite union of pieces
of smooth curves is measurable; these are the sets that we almost always encounter
in practice. The following theorem gives a convenient criterion for integrability.
Proof. The only points where f χS can be discontinuous are those points in the
closure of S where either f or χS is discontinuous. By Lemma 4.20 and Proposition
4.19b, the set of such points has zero content. By Theorem 4.18, f χS is integrable
on any rectangle R containing S, and hence f is integrable on S.
−Cϵ < −C ∑_{j=1}^{M} |Rj| ≤ sP(f χZ) ≤ SP(f χZ) ≤ C ∑_{j=1}^{M} |Rj| < Cϵ.
Since ϵ is arbitrary, the desired conclusion follows directly from the definition of
the integral.
4.23 Corollary.
a. Suppose f is integrable on S ⊂ R². If g is bounded and g(x) = f(x) except for x in a set of zero content, then g is integrable on S and ∬_S g dA = ∬_S f dA.
b. Suppose f is integrable on S and on T, and S ∩ T has zero content. Then f is integrable on S ∪ T, and ∬_{S∪T} f dA = ∬_S f dA + ∬_T f dA.
Proof. For (a), apply Proposition 4.22 to the function f − g. For (b), we are assuming that f χS and f χT are integrable; moreover, by Proposition 4.22, f χS∩T is integrable and its integral is zero. But f χS∪T = f χS + f χT − f χS∩T, so the
result follows.
Area. The problem of determining the area of regions in the plane goes back to antiquity. The first effective general method of attacking this problem was provided by the integral calculus in one variable, which yields the area of a region under a graph, or of a region between two graphs. It therefore produces a theory of area for regions that can be broken up into finitely many subregions bounded by graphs of (nice) functions. However, the two-variable theory of integration contains, as a special case, a theory of area (due to the French mathematician Jordan) that encompasses more complicated sorts of regions too. Namely, if S is any Jordan
measurable set in the plane, its area is the integral over S of the constant function f(x) ≡ 1:

(area)(S) = ∬_S 1 dA = ∬ χS dA,
the latter integral being taken over any rectangle that contains S.
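The grid picture behind this definition can be computed directly. The Python sketch below (an editorial illustration, not part of the text) partitions [−1, 1]² into a k × k grid and, for the unit disk, counts the squares contained in the disk (a lower sum for χS) and the squares meeting the disk (an upper sum); the two counts times the square area sandwich π, and the gap shrinks as the grid is refined.

```python
# Inner and outer area of the unit disk from a k-by-k grid on [-1, 1]^2
def disk_areas(k):
    dx = 2.0 / k

    def nearest(lo, hi):
        # distance from 0 to the interval [lo, hi]
        return 0.0 if lo <= 0.0 <= hi else min(abs(lo), abs(hi))

    inner = outer = 0
    for i in range(k):
        for j in range(k):
            x0, x1 = -1 + i * dx, -1 + (i + 1) * dx
            y0, y1 = -1 + j * dx, -1 + (j + 1) * dx
            # farthest corner inside the disk => whole square inside
            far = max(x * x + y * y for x in (x0, x1) for y in (y0, y1))
            if far <= 1.0:
                inner += 1
            # nearest point of the square inside the disk => square meets disk
            if nearest(x0, x1) ** 2 + nearest(y0, y1) ** 2 <= 1.0:
                outer += 1
    return inner * dx * dx, outer * dx * dx

lo_area, hi_area = disk_areas(200)
# lo_area < pi < hi_area, with the gap shrinking as k grows
```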
Let us pause to see just what this means. Given any bounded set S ⊂ R², to compute ∬ χS dA we enclose S in a large rectangle R and consider a partition P
²A collection {Sj} of sets is disjoint if Sj ∩ Sk = ∅ for j ≠ k.
of R, which produces a grid of small rectangles that cover S. The lower sum for this partition is simply the sum of the areas of the small rectangles that are contained in S, whereas the upper sum is the sum of the areas of the small rectangles that intersect S. Taking the supremum of the lower sums and the infimum of the upper sums yields quantities that may be called the inner area A(S) and the outer area Ā(S) of S.
When these two quantities coincide, that is, when the characteristic function χS is
integrable, their common value is the area of S. See Figure 4.3.
When do we have A(S) = Ā(S)? It is not hard to see (Exercises 3–5) that for any bounded set S,
• S and its interior S^int have the same inner area;
• the inner area of S^int plus the outer area of the boundary ∂S equals the outer area of the closure of S.
It follows that the inner and outer areas of S coincide precisely when the outer area
of the boundary ∂S is zero. But a moment’s thought shows that this is nothing but
the condition that ∂S should have zero content. In short, the inner and outer area
of S coincide precisely when S is measurable. This is the explanation for the name
“measurable”: The measurable sets are the ones that have a well-defined area.
Although the class of Jordan measurable sets is much more extensive than the
class of sets whose area can be computed by one-variable calculus, it is not as big
as we would ideally wish. It does not include all bounded open sets or all compact
4.2. Integration in Higher Dimensions 165
sets, for example. Moreover, it does not behave well with respect to passage to
limits: The union of a sequence of measurable sets, all contained in a common
rectangle, need not be measurable.
A simple example of the latter phenomenon can be obtained by considering the sets Sk of all points in the unit square whose x-coordinate is an integer multiple of 2^{−k}. Each Sk is the union of a finite collection of line segments, so it is measurable and its area is zero. However, the union ⋃_{k=1}^{∞} Sk is the set of all points in the unit square whose x-coordinate has a terminating base-2 expansion. This set
is dense in the unit square but has no interior, from which it is easy to see that its
inner area is 0 but its outer area is 1 (Exercises 3 and 4). By “fattening up” the
sets Sk (replacing the line segments in them by thin rectangles), we can also obtain
examples of open sets and closed sets that are not measurable (Exercise 6).
The defects of the Jordan theory of area carry over more generally to the theory
of integration we are discussing, and for more advanced work one needs the more
sophisticated Lebesgue theory of measure and integration, of which we present a
brief sketch in §4.8. It is largely for this reason that we are being somewhat cavalier
about presenting all the theoretical details in this chapter; there seems to be little
virtue in expending an enormous amount of effort on a theory that must be upgraded
when one proceeds to a more advanced level.
The integral of a function f over a set S ⊂ Rⁿ is denoted by

∫···∫_S f dVⁿ = ∫···∫_S f(x) dⁿx = ∫···∫_S f(x1, . . . , xn) dx1 · · · dxn,

where ∫···∫ is shorthand for a row of n integral signs. When n = 3, we usually write dV instead of dV³, the V denoting ordinary 3-dimensional volume.
We conclude with a useful fact about integrals in any number of dimensions.
4.24 Theorem (The Mean Value Theorem for Integrals). Let S be a compact, connected, measurable subset of Rⁿ, and let f and g be continuous functions on S with g ≥ 0. Then there is a point a ∈ S such that

∫···∫_S f(x)g(x) dⁿx = f(a) ∫···∫_S g(x) dⁿx.
Proof. Let m and M be the minimum and maximum values of f on S, which exist
since S is compact. Since g ≥ 0, we have mg ≤ fg ≤ Mg on S, and hence

m ∫···∫_S g(x) dⁿx ≤ ∫···∫_S f(x)g(x) dⁿx ≤ M ∫···∫_S g(x) dⁿx.
Thus the quotient (∫···∫_S fg)/(∫···∫_S g) lies between m and M, so by the intermediate value theorem, it is equal to f(a) for some a ∈ S.
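The point a promised by Theorem 4.24 can be located numerically in one dimension. In this Python sketch (an editorial illustration, not part of the text) we take f(x) = eˣ and g(x) = x(1 − x) ≥ 0 on [0, 1]; since f = exp is invertible, the point is simply a = log of the quotient of the two integrals.

```python
import math

# f(x) = e^x, g(x) = x(1 - x) >= 0 on [0, 1]; the theorem promises a point
# a with integral(f*g) = f(a) * integral(g). Since f = exp, a = log(ratio).
def midpoint_int(F, a, b, n=20000):
    dx = (b - a) / n
    return sum(F(a + (i + 0.5) * dx) for i in range(n)) * dx

f = math.exp
g = lambda x: x * (1.0 - x)

ratio = midpoint_int(lambda x: f(x) * g(x), 0.0, 1.0) / midpoint_int(g, 0.0, 1.0)
a_point = math.log(ratio)
# exact values: integral(fg) = 3 - e, integral(g) = 1/6, so ratio = 6(3 - e)
```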
EXERCISES
1. Prove Proposition 4.19(a,b).
2. Let f : [a, b] → R be an integrable function.
a. Show that the graph of f in R^2 has zero content. (Hint: Given a partition P of [a, b], interpret S_P f − s_P f as a sum of areas of rectangles that cover the graph of f.)
b. Suppose f ≥ 0 and let S = {(x, y) : x ∈ [a, b], 0 ≤ y ≤ f(x)}. Show that S is measurable and that its area (as defined in this section) equals ∫_a^b f(x) dx.
3. Let S be a bounded set in R^2. Show that S and its interior S^int have the same inner area. (Hint: For any rectangle contained in S, there are slightly smaller rectangles contained in S^int.)
4. Let S be a bounded set in R^2. Show that S and its closure S̄ have the same outer area. (Hint: For any rectangle that does not intersect S, there are slightly smaller rectangles that do not intersect S̄.)
5. Let S be a bounded set in R2 . Show that the inner area of S plus the outer area
of ∂S equals the outer area of S. (Use Exercises 3 and 4.)
6. Let S be the subset of the x-axis consisting of the union of the open interval of length 1/4 centered at 1/2, the open intervals of length 1/16 centered at 1/4 and 3/4, the open intervals of length 1/64 centered at 1/8, 3/8, 5/8, and 7/8, and so forth. Let U = S × (0, 1) be the union of the open rectangles of height 1 based on these intervals. Thus U is the union of one rectangle of area 1/4, two rectangles of area 1/16, four rectangles of area 1/64, …, some of which overlap.
a. Show that U is an open subset of the unit square R = [0, 1] × [0, 1].
b. Show that the inner area of U is less than 1/2.
c. Show that U is dense in R and hence that its outer area is 1. (Use Exercise
4.)
d. Let V = R \ U. Show that V is a closed set whose inner area is 0 and whose outer area is bigger than 1/2.
7. (The Second Mean Value Theorem for Integrals) Suppose f is continuous on
[a, b] and ϕ is of class C 1 and increasing on [a, b]. Show that there is a point
c ∈ [a, b] such that
∫_a^b f(x)ϕ(x) dx = ϕ(a) ∫_a^c f(x) dx + ϕ(b) ∫_c^b f(x) dx.

(Hint: First suppose ϕ(b) = 0. Set F(x) = ∫_a^x f(t) dt, integrate by parts to show that ∫_a^b f(x)ϕ(x) dx = −∫_a^b F(x)ϕ′(x) dx, and apply Theorem 4.24 to the latter integral. To remove the condition ϕ(b) = 0, show that if the conclusion is true for f and ϕ, it is true for f and ϕ + C for any constant C.)

168 Chapter 4. Integral Calculus
∫∫_R f(x, y) dx dy ≈ Σ_(j=1)^J Σ_(k=1)^K f(x_j, y_k) Δx_j Δy_k ≈ Σ_(k=1)^K (∫_a^b f(x, y_k) dx) Δy_k ≈ ∫_c^d (∫_a^b f(x, y) dx) dy.
We could also play the same game with x and y switched, obtaining
∫∫_R f dA = ∫_a^b (∫_c^d f(x, y) dy) dx.

Here there is one potential pitfall: The integrability of f on R need not imply the
integrability of f (x, y0 ), as a function of x for fixed y0 , on [a, b]. The line seg-
ment {(x, y) : a ≤ x ≤ b, y = y0 } is a set of zero content, after all, so f could
be discontinuous at every point on it, and its behavior as a function of x could be
quite wild. This problem is actually not too serious, and we shall sweep it under
the rug by making the assumption — quite harmless in practice — that it does not
occur. The resulting theorem is as follows. It is sometimes referred to as Fubini’s
theorem, although that name belongs more properly to the generalization of the
theorem to Lebesgue integrals.
4.26 Theorem. Let R = {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}, and let f be an
integrable function on R. Suppose that, for each y ∈ [c, d], the function fy defined
by f_y(x) = f(x, y) is integrable on [a, b], and the function g(y) = ∫_a^b f(x, y) dx is
integrable on [c, d]. Then

(4.27)  ∫∫_R f dA = ∫_c^d (∫_a^b f(x, y) dx) dy.

Likewise, if f^x(y) = f(x, y) is integrable on [c, d] for each x ∈ [a, b], and h(x) = ∫_c^d f(x, y) dy is integrable on [a, b], then

(4.28)  ∫∫_R f dA = ∫_a^b (∫_c^d f(x, y) dy) dx.
Proof. The proof is presented in Appendix B.4 (Theorem B.9). The issue that
must be addressed is the permissibility of first letting the x-subdivisions get finer
and finer, and then doing the same for the y-subdivisions, or vice versa, as opposed
to requiring both subdivisions to become finer at the same time.
The integrals on the right side of (4.27) and (4.28) are called iterated integrals.
It is customary to omit the brackets in these integrals and to write, for example,
∫_c^d ∫_a^b f(x, y) dx dy,
with the understanding that the integration is to be done "from the inside out." That is, the innermost integral ∫_a^b corresponds to the innermost differential dx, and the
integral with respect to the corresponding variable x is to be performed first. Some
people find it clearer to write the differentials dx and dy next to the integral signs
to which they pertain, thus:
∫_c^d dy ∫_a^b dx f(x, y).
FIGURE 4.4: ∫∫ ··· dx dy versus ∫∫ ··· dy dx.
If our region of integration is not the whole rectangle R but a subset S, the in-
tegration effectively stops at the boundary of S, and the limits of integration should
be adjusted accordingly. For example, if S is bounded above and below by the
graphs of two functions,
(4.29)  S = {(x, y) : a ≤ x ≤ b, ϕ(x) ≤ y ≤ ψ(x)},
we have
(4.30)  ∫∫_S f dA = ∫_a^b ∫_(ϕ(x))^(ψ(x)) f(x, y) dy dx.
Here it is essential to integrate first in y, then in x, since the limits ϕ(x) and ψ(x)
furnish part of the x-dependence of the integrand for the outer integral ∫_a^b ··· dx.
It is important to observe that if S is a region of the form (4.29) where ϕ and
ψ are of class C 1 , and f is continuous on S, the hypotheses in Theorem 4.26
that allow integration first in y and then in x are automatically satisfied, so that
(4.30) is valid. Indeed, the integrability of f χS on any rectangle R ⊃ S follows
from Proposition 4.19c and Theorem 4.21, and the integrability of the function
(f χS )(x, y) as a function of y for fixed x is obvious since it is continuous except
at the two points y = ϕ(x) and y = ψ(x).
On the other hand, if S is bounded on the left and right by graphs of functions
of y, we obtain a formula similar to (4.30) with the roles of x and y reversed.
In general, most of the regions S that arise in practice can be decomposed into a
finite number of pieces S1 , . . . , SK , each of which is of the form (4.29) or of the
analogous form with x and y switched. By using the additivity property (Theorem 4.17b), we can reduce the computation of ∫∫_S f dA to the calculation of iterated integrals on these subregions.
Figure 4.4 may be helpful in interpreting iterated integrals. The sketch on the left symbolizes ∫∫ ··· dx dy, in which we integrate first over the horizontal lines that run from the left side to the right side of the region, then integrate over the y-interval that comprises the y-coordinates of all these lines. Similarly, the sketch on the right symbolizes ∫∫ ··· dy dx.
4.3. Multiple Integrals and Iterated Integrals 171
FIGURE 4.5: The regions of integration in Example 1 (left) and Example 2 (right).
To integrate first in y, we must break up R into its left and right halves:
∫_(−4)^0 ∫_(−√(4+x))^(√(4+x)) f(x, y) dy dx + ∫_0^4 ∫_(−√(4−x))^(√(4−x)) f(x, y) dy dx.
The ideas in higher dimensions are entirely similar. The analogue of Theo-
rem 4.26 is that an integral over an n-dimensional rectangular solid with sides
[a1 , b1 ], . . . , [an , bn ] can be evaluated as an n-fold iterated integral,
∫···∫_R f dV = ∫_(a_n)^(b_n) ··· ∫_(a_1)^(b_1) f(x_1, …, x_n) dx_1 ··· dx_n,
provided that the indicated integrals exist. The meaning of the iterated integral
on the right is that the integration is to be performed first with respect to x1 and
last with respect to xn . However, the same formula remains valid with the n inte-
grations performed in any order. The only thing that needs some care is that the integral signs ∫_(a_j)^(b_j) must be matched up with the differentials dx_j in the right order
so as to get the right limits of integration, and the convention is the same as in
the case n = 2: The integrations are to be performed in order from innermost to
outermost.
When the region of integration is something other than a rectangular solid, set-
ting up the right limits of integration can be rather complicated. A typical situation
in 3 dimensions is as follows: The region of integration S is the region in between
two graphs,
S = {(x, y, z) : (x, y) ∈ U, ϕ(x, y) ≤ z ≤ ψ(x, y)},
based on some region U in the xy-plane. The region U in turn is the region between
two graphs,
U = {(x, y) : a ≤ x ≤ b, σ(x) ≤ y ≤ τ(x)},
EXAMPLE 3. Find the mass of the tetrahedron T formed by the three coordi-
nate planes and the plane x + y + 2z = 2 (see Figure 4.6) if the mass density
is given by ρ(x, y, z) = e−z .
FIGURE 4.6: The tetrahedron T, with vertices (0, 0, 0), (2, 0, 0), (0, 2, 0), and (0, 0, 1).
Solution. There are six ways to write the triple integral ∫∫∫_T e^(−z) dV as an iterated integral, although only three of them are essentially different, namely,

∫_0^2 ∫_0^(2−x) ∫_0^(1−(x+y)/2) e^(−z) dz dy dx,   ∫_0^1 ∫_0^(2−2z) ∫_0^(2−y−2z) e^(−z) dx dy dz,

∫_0^2 ∫_0^(1−(y/2)) ∫_0^(2−y−2z) e^(−z) dx dz dy.
The reader may verify that the other three iterated integrals give the same answer.
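As a quick sanity check (our own sketch, not part of the text), the three iterated integrals can be approximated by midpoint sums; the closed-form value 2 − 4/e quoted in the comments is our own hand computation, not stated in the text.

```python
import math

# Sanity check (ours): approximate the three iterated integrals by midpoint
# sums, doing the innermost integral in closed form where convenient.  The
# value 2 - 4/e below is our own hand computation.
def mid(h, a, b, n=400):
    if b <= a:
        return 0.0
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

# dz dy dx: the inner integral of e^{-z} from 0 to 1-(x+y)/2 is
# 1 - exp(-(1 - (x+y)/2))
I1 = mid(lambda x: mid(lambda y: 1 - math.exp(-(1 - (x + y) / 2)), 0, 2 - x), 0, 2)
# dx dy dz: the inner x-integral just contributes the length 2 - y - 2z
I2 = mid(lambda z: math.exp(-z) * mid(lambda y: 2 - y - 2 * z, 0, 2 - 2 * z), 0, 1)
# dx dz dy
I3 = mid(lambda y: mid(lambda z: (2 - y - 2 * z) * math.exp(-z), 0, 1 - y / 2), 0, 2)

assert abs(I1 - I2) < 1e-3 and abs(I2 - I3) < 1e-3
assert abs(I1 - (2 - 4 / math.e)) < 1e-3
```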
More precisely, (4.31) is valid if f satisfies the conditions in Theorem 4.26 for
both (4.27) and (4.28) to hold. (See Exercise 13 for an example to demonstrate
the significance of these conditions.) The importance of this result can hardly be
overestimated; it is an extremely powerful tool for evaluating quantities defined by
integrals. We shall see a number of examples in later chapters.
EXAMPLE 4. Evaluate ∫_0^2 ∫_(y/2)^1 y e^(−x³) dx dy.
Solution. The integral cannot be evaluated by elementary methods as it stands, since e^(−x³) has no elementary antiderivative. However, it can be interpreted as ∫∫_T y e^(−x³) dA where T is the triangle with vertices (0, 0), (1, 0), and (1, 2) as in Example 1. Writing this double integral as an iterated integral in the other order leads to an easy calculation:
∫_0^1 ∫_0^(2x) y e^(−x³) dy dx = ∫_0^1 (1/2)y² |_0^(2x) e^(−x³) dx = ∫_0^1 2x² e^(−x³) dx = −(2/3) e^(−x³) |_0^1 = (2/3)(1 − e^(−1)).
Applications. Double and triple integrals can be used to calculate physical and
geometric quantities in much the same way as single integrals. Here are a few
standard examples:
• If f(x, y) ≥ 0, the integral ∫∫_S f dA can be interpreted as the volume of the region in R^3 between the graph of f and the xy-plane that lies over the base region S.
• Again suppose that a massive object with mass density ρ(x) occupies the region S ⊂ R^3, and let L be a line in R^3. The moment of inertia of the body about the line L, a quantity that is useful in analyzing rotational motion about L, is ∫∫∫_S d(x)² ρ(x) d³x, where d(x) is the distance from x to L. (For example, if L is the z-axis, then d(x, y, z)² = x² + y².)
EXERCISES
1. Evaluate the following double integrals.
a. ∫∫_S (x + 3y³) dA, S = the upper half (y ≥ 0) of the unit disc x² + y² ≤ 1.
b. ∫∫_S (x² − y) dA, S = the region between the parabola x = y² and the line x = 2y.
2. Find the volume of the region above the triangle in the xy-plane with vertices
(0, 0), (1, 0), and (0, 1), and below the surface z = 6xy(1 − x − y).
3. For the following regions S ⊂ R^2, express the double integral ∫∫_S f dA in terms of iterated integrals in two different ways.
a. S = the region in the left half plane between the curve y = x3 and the line
y = 4x.
b. S = the triangle with vertices (0, 0), (2, 2), and (3, 1).
c. S = the region between the parabolas y = x2 and y = 6 − 4x − x2 .
4. Express each of the following iterated integrals as a double integral and as an
iterated integral in the opposite order. (That is, find the region of integration
for the double integral and the limits of integration for the other iterated inte-
gral.)
a. ∫_0^1 ∫_(x²)^(x^(1/3)) f(x, y) dy dx.
b. ∫_0^1 ∫_(−y)^(2y) f(x, y) dx dy.
c. ∫_1^2 ∫_0^(log x) f(x, y) dy dx.
5. Evaluate the following iterated integrals. (You may need to reverse the order of
integration.)
a. ∫_1^3 ∫_1^y y e^(2x) dx dy.
b. ∫_0^1 ∫_(√x)^1 cos(y³ + 1) dy dx.
c. ∫_1^2 ∫_(1/x)^1 y e^(xy) dy dx.
6. Fill in the blanks: ∫_0^1 ∫_(2x²)^(x+1) f(y) dy dx = ∫_0^1 [ ] dy + ∫_1^2 [ ] dy. The expressions you obtain for the [ ]'s should not contain integral signs.
7. Given a continuous function g : R → R, let h(x) = ∫_0^x ∫_0^y g(t) dt dy. That is, h is obtained by integrating g twice, starting the integration at 0. Show that h can be expressed as a single integral, namely, h(x) = ∫_0^x (x − t) g(t) dt. (Note that x can be treated as a constant here; y and t are the variables of integration.)
10. Find the centroid of the tetrahedron bounded by the coordinate planes and the
plane (x/a) + (y/b) + (z/c) = 1.
11. An object with mass density ρ(x, y, z) = yz occupies the cube {(x, y, z) : 0 ≤
x, y, z ≤ 2}. Find its mass and center of mass.
12. A body with charge density ρ(x, y, z) = 2z occupies the region bounded below
by the parabolic cylinder z = x2 − 3, above by the plane z = x − 1, and on the
sides by the planes y = 0 and y = 2. Find its net charge (total positive charge
minus total negative charge).
13. Let f(x, y) = y⁻² if 0 < x < y < 1, f(x, y) = −x⁻² if 0 < y < x < 1, and f(x, y) = 0 otherwise, and let S be the unit square [0, 1] × [0, 1].
a. Show that f is not integrable on S, but that f (x, y) is integrable on [0, 1]
as a function of x for each fixed y and as a function of y for each fixed x.
b. Show by explicit calculation that the iterated integrals ∫_0^1 ∫_0^1 f(x, y) dx dy and ∫_0^1 ∫_0^1 f(x, y) dy dx both exist and are unequal.
4.4. Change of Variables for Multiple Integrals 177
The proof is a simple matter of combining the chain rule and the fundamental the-
orem of calculus. Indeed, if F is an antiderivative of f, the right side of (4.32) is F(g(b)) − F(g(a)), which in turn equals ∫_a^b (F ∘ g)′(u) du, and the latter integrand is f(g(u)) g′(u). (Formula (4.32) is actually valid when f is merely integrable, but
we shall not worry about this refinement here.)
There is one slightly tricky point here, which we point out now because it will
be significant later. If g is an increasing function, (4.32) is fine as it stands, but
if g is decreasing, the endpoints on the integral on the right are in the “wrong”
order, and we might prefer to put them back in the “right” order by introducing a
minus sign: ∫_(g(a))^(g(b)) = −∫_(g(b))^(g(a)). Since g is increasing or decreasing according as g′ is
positive or negative, we could rewrite (4.32) as
(4.33)  ∫_([a,b]) f(g(u)) |g′(u)| du = ∫_(g([a,b])) f(x) dx.
Here g([a, b]) is the interval to which [a, b] is mapped under g, and for any interval I the symbol ∫_I means the integral from the left endpoint of I to the right endpoint.
The replacement of g′ by |g′ | compensates for the extra minus sign that comes from
adjusting the order of the endpoints when g is decreasing.
In practice it is often more convenient to have all the g’s on one side of the
equation. If we set I = g([a, b]), we have [a, b] = g⁻¹(I), and (4.33) becomes

(4.34)  ∫_I f(x) dx = ∫_(g⁻¹(I)) f(g(u)) |g′(u)| du.
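A quick numerical check of (4.34) — our own toy example, not from the text — uses the decreasing map g(u) = 1 − u², for which |g′(u)| = 2u.

```python
import math

# Tiny check (our example) of (4.34): the decreasing map g(u) = 1 - u^2 on
# [a, b] = [0, 1] has g([0, 1]) = [0, 1] and |g'(u)| = 2u; both sides equal
# e - 1 for f(x) = e^x.
def mid(h, a, b, n=20000):
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

lhs = mid(math.exp, 0, 1)                                     # integral of f over I
rhs = mid(lambda u: math.exp(1 - u * u) * abs(-2 * u), 0, 1)  # f(g(u)) |g'(u)|
assert abs(lhs - rhs) < 1e-6
```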
Our object is to find the analogous formula for multiple integrals. It is natural
to use (4.34) rather than (4.32) as a starting point, since for multiple integrals the
issue of left-to-right or right-to-left disappears and we just speak of integrals over
a region, like the integrals over intervals that appear in (4.34). More precisely,
suppose G is a one-to-one transformation from a region R in Rn to another region
S = G(R) in Rn ; then R = G−1 (S), and the formula we are seeking should look
FIGURE 4.7: The image under the polar coordinate map of a small coordinate rectangle with sides dr and dθ.
The missing ingredient is the quantity that will play the role of |g′ (u)| in the formula
(4.34).
Now, the g′ (u) in (4.32) or (4.34) is the factor that relates the differentials du
and dx under the transformation x = g(u). In n variables, the n-fold differential
dn x = dx1 · · · dxn represents the “element of volume,” that is, the volume of an
infinitesimal piece of n-space. So the question is: How does the volume of a tiny
piece of n-space change when one applies the transformation G?
To get a feeling for what is going on, let us look at the polar coordinate map G(r, θ) = (r cos θ, r sin θ).
A small rectangle in the rθ-plane with lower left corner at (r, θ) and sides of length
dr and dθ is mapped to a small region in the xy-plane bounded by two line seg-
ments of length dr and two circular arcs of length r dθ and (r + dr) dθ. When dr
and dθ are very small, this is essentially a rectangle with sides dr and r dθ, so its
area is r dr dθ. In short, a small bit of the rθ-plane with area dr dθ is mapped to a
small bit of the xy-plane with area r dr dθ; see Figure 4.7. Hence, in this case the
missing factor in (4.35) is simply r, and (4.35) becomes

(4.36)  ∫∫_S f(x, y) dx dy = ∫∫_R f(r cos θ, r sin θ) r dr dθ.
Here S is a region in the xy-plane and R = G−1 (S) is the corresponding region in
the rθ-plane. Our argument here has been very informal, but this result is correct,
and it gives the formula for computing double integrals in polar coordinates.
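The polar formula is easy to test numerically; the sketch below (ours, with f = 1 and S = the unit disc) computes the area of the disc both ways.

```python
import math

# Rough check (ours) of (4.36) with f = 1 and S = the unit disc: the polar
# side gives the area pi almost immediately.
def mid(h, a, b, n=1000):
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

# Cartesian side: sum the cross-section lengths 2*sqrt(1 - x^2)
cart = mid(lambda x: 2 * math.sqrt(max(0.0, 1 - x * x)), -1, 1, n=200000)
# Polar side: integral of r dr dtheta over 0 <= r <= 1, 0 <= theta <= 2*pi
polar = 2 * math.pi * mid(lambda r: r, 0, 1)
assert abs(polar - math.pi) < 1e-9
assert abs(cart - polar) < 1e-3
```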
The case of a linear mapping of the plane is also easy to analyze. Given a matrix A = ( a b ; c d ) with det A = ad − bc ≠ 0, let x = G(u) = Au, that is,
FIGURE 4.8: The image of the unit square [0, 1] × [0, 1] under the linear map G.
(x, y) = G(u, v) = (au + bv, cu + dv). The transformation G takes the unit
vectors (1, 0) and (0, 1) to the vectors (a, c) and (b, d), so it maps the standard
coordinate grid to a grid of parallelograms with sides parallel to these vectors. In
particular, it maps the square [0, 1] × [0, 1] to the parallelogram with vertices at
(0, 0), (a, c), (b, d), and (a + b, c + d), as indicated in Figure 4.8. The area of that
parallelogram is |ad − bc|, that is, | det A|. (To see this, think of the plane as sitting in R^3 and recall the geometric interpretation of the cross product: The area of the parallelogram is the length of the cross product (a, c, 0) × (b, d, 0) = (0, 0, ad − bc), which is nothing but | det A| (Exercise 8 in §1.1).) As before, we conclude that for the linear map G(u) = Au of R^2, the missing factor in (4.35) should be | det A|.
It is now reasonable to conjecture that the same result should hold for linear
mappings of Rn for any n. We proceed to show that this is correct.
4.37 Theorem. Let A be an invertible n × n matrix, and let G(u) = Au be the
corresponding linear transformation of Rn . Suppose S is a measurable region in
Rn and f is an integrable function on S. Then G−1 (S) = {A−1 x : x ∈ S} is
measurable and f ◦ G is integrable on G−1 (S), and
(4.38)  ∫···∫_S f(x) d^n x = | det A| ∫···∫_(G⁻¹(S)) f(Au) d^n u.
Proof. The proof of the measurability of G−1 (S) and the integrability of f ◦ G,
which is not profound but rather tedious, is given in Appendix B.5 (Corollaries
B.16 and B.17). (Actually, what is proved in Appendix B.5 is that if f is continuous
except on a set of zero content, a slightly stronger condition than integrability, then
the same is true of f ◦ G.) Here we concentrate on proving (4.38). The proof
naturally requires some linear algebra, in particular, the facts about elementary row
operations and determinants in (A.17)–(A.18), (A.28), and (A.30) of Appendix A.
Step 1: Let us agree to (re)define f(x) to be 0 for x ∉ S. Then f(Au) = 0 for u ∉ G⁻¹(S), and we can replace the regions S and G⁻¹(S) in (4.38) by R^n.
This makes the integrals in (4.38) look improper, but they really are not, since the
integrands vanish outside bounded sets. The point is that now we don’t have to
worry about what the limits of integration in each variable are; we can take them to
be ±∞.
Step 2: We prove the theorem when G is an “elementary transformation,” that
is, the transformation given by performing a single elementary row operation on
the column vector u. There are three kinds of elementary transformations, corre-
sponding to the three types of row operations (see (A.17)–(A.18)):
1. Multiply the kth component by a nonzero number c, leaving all the other
components alone:
2. Add a multiple of the jth component to the kth component, leaving all the other components alone:
3. Interchange the jth and kth components:
G3(u_1, …, u_j, …, u_k, …, u_n) = (u_1, …, u_k, …, u_j, …, u_n).
It is easy to verify that (4.38) holds for these three types of transformations.
The first two involve a change in only the kth variable, so we can integrate first
with respect to that variable and use (4.34) (or, rather, the simple special cases of
(4.34) discussed in Exercise 8 of §4.1). Thus, for G1 we set xk = cuk and obtain
∫_(−∞)^∞ f(…, x_k, …) dx_k = ∫_(−c⁻¹∞)^(c⁻¹∞) f(…, cu_k, …) c du_k = |c| ∫_(−∞)^∞ f(…, cu_k, …) du_k.
(The endpoints have to be switched if c < 0, which accounts for replacing c by |c|,
as in the discussion preceding (4.34).) Likewise, for G2 we set xk = uk + cuj and
obtain

∫_(−∞)^∞ f(…, x_k, …) dx_k = ∫_(−∞)^∞ f(…, u_k + cu_j, …) du_k.
simply because the variables uj and uk are dummy variables here. That is, we are
integrating f with respect to its jth and kth variables, and it doesn’t matter what
we call them. Now an integration with respect to the remaining variables, together
with (4.39), gives (4.38) for G3 .
Step 3: We next verify that if (4.38) is valid for the linear maps G(u) = Au
and H(u) = Bu, then it is also valid for the composition (G ◦ H)(u) = ABu.
But (det A)(det B) = det(AB) and H−1 (G−1 (S)) = (G ◦ H)−1 (S), so the
integral on the right equals
| det(AB)| ∫···∫_((G∘H)⁻¹(S)) f(ABu) d^n u,
as claimed.
The Final Step: From Step 3, it follows easily by induction that if (4.38) is valid
for G1 , . . . , Gk , then it is also valid for the composition G1 ◦· · ·◦Gk . Thus, in view
of Step 2, to complete the proof we have merely to observe that every invertible
linear transformation of Rn is a composition of elementary transformations. This
is equivalent to the fact that every invertible matrix A can be row-reduced to the
identity matrix; see (A.52) (in particular, the equivalence of (a) and (i)) and (A.53)
in Appendix A.
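Theorem 4.37 can be spot-checked numerically. In the sketch below (our own; the matrix and integrand are arbitrary choices, not from the text), A = ( 2 1 ; 0 3 ), f(x₁, x₂) = x₁ + x₂, and G⁻¹(S) is taken to be the unit square, so that S = A([0, 1]²) is a parallelogram.

```python
# Numerical spot-check (ours) of (4.38).  The matrix and integrand are our
# arbitrary choices: A = (2 1; 0 3), det A = 6, f(x1, x2) = x1 + x2, and
# G^{-1}(S) = the unit square, so S = A([0,1]^2) has vertices
# (0,0), (2,0), (3,3), (1,3).
def mid2(h, n=400):
    # midpoint sum of h over the unit square
    d = 1.0 / n
    return sum(h((i + 0.5) * d, (j + 0.5) * d)
               for i in range(n) for j in range(n)) * d * d

# right side of (4.38): |det A| times the integral of f(Au) over the square
rhs = 6 * mid2(lambda u, v: (2 * u + v) + 3 * v)

# left side: grid over the bounding box [0,3] x [0,3], keeping points of S
n = 800
d = 3.0 / n
lhs = 0.0
for i in range(n):
    for j in range(n):
        x, y = (i + 0.5) * d, (j + 0.5) * d
        v = y / 3.0
        u = (x - v) / 2.0
        if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:   # A^{-1}x in the unit square
            lhs += (x + y) * d * d

assert abs(rhs - 18.0) < 1e-6
assert abs(lhs - rhs) < 0.3
```

The left side converges slowly because the grid only approximates the slanted boundary of S; the right side, over a coordinate rectangle, is far easier — which is the whole point of the change-of-variables formula.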
There is one more simple class of transformations for which the change-of-
variable formula is easily established, namely the translations. These are the map-
pings of the form G(u) = u + b where b is a fixed vector. Indeed, we just make
the substitution xj = uj + bj , dxj = duj in each variable separately to conclude
that

∫···∫_S f(x) d^n x = ∫···∫_(S−b) f(u + b) d^n u.
Notice that the results derived earlier in this section are indeed special cases of
Theorem 4.41. If G is a linear map, G(u) = Au, then DG(u) = A for all u,
so | det DG(u)| = | det A| is a constant that can be brought outside the integral
sign. And if G is the polar coordinate map, G(r, θ) = (r cos θ, r sin θ), then
det DG(r, θ) = r, so we recover (4.36).
Let us record the corresponding results for the standard “polar” coordinate sys-
tems in R3 , shown in Figure 4.9. Cylindrical coordinates are just polar coordi-
nates in the xy-plane with the z-coordinate added in,
Gcyl (r, θ, z) = (r cos θ, r sin θ, z).
It is easily verified that det DGcyl (r, θ, z) = r again, so the formula for integration
in cylindrical coordinates is
(4.43)  ∫∫∫_S f(x, y, z) dx dy dz = ∫∫∫_(G_cyl⁻¹(S)) f(r cos θ, r sin θ, z) r dr dθ dz.
FIGURE 4.9: Cylindrical coordinates (left) and spherical coordinates (right).
EXAMPLE 2. Find the volume of the "ice cream cone" T bounded below by the cone z = 2√(x² + y²) and above by the sphere x² + y² + z² = 1. (See Figure 4.10.)

Solution. In spherical coordinates (r, ϕ, θ), the equation of the cone is tan ϕ = 1/2 and the equation of the sphere is r = 1. Hence the volume is
This can also be done in cylindrical coordinates (r, θ, z) (note that the meaning
of r has changed here), in which the equation of the cone is z = 2r and the
equation of the sphere is r² + z² = 1. The projection of T onto the xy-plane is the disc r ≤ 1/√5, so the volume is

∫_0^(1/√5) ∫_(2r)^(√(1−r²)) ∫_0^(2π) r dθ dz dr = 2π ∫_0^(1/√5) (r√(1 − r²) − 2r²) dr = (2π/3) [−(1 − r²)^(3/2) − 2r³]_0^(1/√5),
which yields the same answer as before.
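Finishing the bracketed evaluation by hand gives V = (2π/3)(1 − 2/√5); that closed form is our own computation, not stated in the text, and the sketch below (ours) checks it against a midpoint sum for the cylindrical integral.

```python
import math

# Numerical check (ours) of Example 2's cylindrical computation.  Finishing
# the bracketed evaluation by hand gives V = (2*pi/3)*(1 - 2/sqrt(5)); that
# closed form is our own computation, not stated in the text.
def mid(h, a, b, n=100000):
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

R = 1 / math.sqrt(5)
V = 2 * math.pi * mid(lambda r: r * math.sqrt(1 - r * r) - 2 * r * r, 0, R)
assert abs(V - (2 * math.pi / 3) * (1 - 2 / math.sqrt(5))) < 1e-6
```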
EXAMPLE 3. Let P be the parallelogram bounded by the lines x − y = 0,
x + 2y = 0, x − y = 1, and x + 2y = 6. (See Figure 4.11.) Compute
∫∫_P xy dA.
Solution. The equations of the bounding lines suggest the linear transfor-
mation u = x − y, v = x + 2y, which maps P to the rectangle 0 ≤ u ≤ 1,
0 ≤ v ≤ 6. In the notation of Theorem 4.37, P plays the role of S and this
transformation is G⁻¹; its inverse G is easily computed to be x = (1/3)(2u + v), y = (1/3)(v − u), with det DG = 1/3.
FIGURE 4.11: The parallelogram P of Example 3 and the region R of Example 4.
∫∫_P xy dA = (1/3) ∫_0^1 ∫_0^6 ((2u + v)/3)((v − u)/3) dv du,
where the columns of the 2 × 2 matrix are the vectors from the origin to the
two adjacent vertices. Taking this transformation as G in Theorem 4.37 yields
∫∫_P xy dA = 2 ∫_0^1 ∫_0^1 ((2/3)s + 2t)(−(1/3)s + 2t) dt ds.
This integral is essentially the same as the preceding one; the variables (s, t)
and (u, v) are related by u = s, v = 6t.
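A numerical cross-check (ours) that the two parametrizations describe the same integral; the reference value 77/27 is our own hand evaluation, not given in the text.

```python
# Cross-check (ours) that the two parametrizations in Example 3 give the
# same number.  The reference value 77/27 is our own hand evaluation, not
# stated in the text.
def mid2(h, n=500):
    # midpoint sum over the unit square
    d = 1.0 / n
    return sum(h((i + 0.5) * d, (j + 0.5) * d)
               for i in range(n) for j in range(n)) * d * d

# first form: substitute v = 6w to pull the integral back to the unit square
first = 6 * mid2(lambda u, w: (1 / 3) * ((2 * u + 6 * w) / 3) * ((6 * w - u) / 3))
# second form is already an integral over the unit square
second = mid2(lambda s, t: 2 * ((2 / 3) * s + 2 * t) * (-(1 / 3) * s + 2 * t))

assert abs(first - second) < 1e-3
assert abs(first - 77 / 27) < 1e-3
```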
EXAMPLE 4. Let R be the region in the first quadrant of the xy-plane bounded by the x-axis and the parabolas x = 1 − (1/4)y², x = (1/4)y² − 1, and x = 4 − (1/16)y². (See Figure 4.11.) What is ∫∫_R xy dx dy?
Solution. Refer back to Example 3 in §3.4: The region R is the image of the rectangle {(u, v) : 1 ≤ u ≤ 2, 0 ≤ v ≤ 1} under the map G(u, v) = (u² − v², 2uv). We have DG(u, v) = ( 2u −2v ; 2v 2u ) and hence det DG(u, v) = 4(u² + v²).
EXERCISES
1. Find the area of the region inside the cardioid r = 1+cos θ (polar coordinates).
2. Find the centroid of the half-cone √(x² + y²) ≤ z ≤ 1, x ≥ 0.
3. Find the volume of the region inside both the sphere x2 + y 2 + z 2 = 4 and the
cylinder x2 + y 2 = 1.
4. Find the volume of the region above the xy-plane, below the cone z = 2 − √(x² + y²), and inside the cylinder (x − 1)² + y² = 1.
5. Find the mass of a right circular cylinder of base radius R and height h if the
mass density is c times the distance from the bottom of the cylinder.
6. Find the volume of the portion of the sphere x2 + y 2 + z 2 = 4 lying above the
plane z = 1.
7. Find the mass of a ball of radius R if the mass density is c times the distance
from the boundary of the ball.
8. Find the centroid of the portion of the ball x2 + y 2 + z 2 ≤ 1 lying in the first
octant (x, y, z ≥ 0).
9. Find the centroid of the parallelogram bounded by the lines x − 3y = 0, 2x +
y = 0, x − 3y = 10, and 2x + y = 15.
10. Calculate ∫∫_S (x + y)⁴ (x − y)⁻⁵ dA where S is the square −1 ≤ x + y ≤ 1, 1 ≤ x − y ≤ 3.
11. Find the volume of the ellipsoid (x + 2y)2 + (x − 2y + z)2 + 3z 2 = 1.
12. Let S be the region in the first quadrant bounded by the curves xy = 1, xy = 4,
and the lines y = x, y = 4x. Find the area and the centroid of S by using the
transformation u = xy, v = y/x.
13. Let S be the region in the first quadrant bounded by the curves xy = 1, xy = 3, x² − y² = 1, and x² − y² = 4. Compute ∫∫_S (x² + y²) dA. (Hint: Let G(x, y) = (xy, x² − y²). What is | det DG|?)
14. Use the transformation x = u − uv, y = uv to evaluate ∫∫_S (x + y)⁻¹ dA where S is the region in the first quadrant between the lines x + y = 1 and x + y = 4.
15. Use "double polar coordinates" x = r cos θ, y = r sin θ, z = s cos ϕ, w = s sin ϕ in R^4 to compute the 4-dimensional volume of the ball x² + y² + z² + w² ≤ R².
The question then arises as to how properties of f such as continuity and differen-
tiability relate to the corresponding properties of F .
Perhaps the most basic question of this sort is the following. Suppose that lim_(x→a) f(x, y) = g(y) for each y ∈ S; is it true that

lim_(x→a) F(x) = ∫···∫_S g(y) d^n y?
In other words, can one interchange the operations of integrating with respect to y
and taking a limit with respect to x? Is the limit of the integral equal to the integral
of the limit? In general, the answer is no.
EXAMPLE 1. Let

f(x, y) = x²y/(x² + y²)²  ((x, y) ≠ (0, 0)),  f(0, 0) = 0.

Evidently lim_(x→0) f(x, y) = 0 for each y (although for different reasons when y = 0 or when y ≠ 0). However, lim_(x→0) ∫_0^1 f(x, y) dy ≠ 0; in fact,

∫_0^1 x²y/(x² + y²)² dy = −x²/(2(x² + y²)) |_(y=0)^(y=1) = 1/(2(1 + x²)),

which tends to 1/2 as x → 0.
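The failure of the interchange is easy to see numerically; in the sketch below (ours), the integral of f(x, ·) stays near 1/(2(1 + x²)), hence near 1/2, for small x, while the pointwise limit of f integrates to 0.

```python
# Sketch (ours) of the failed interchange in Example 1: the integral of
# f(x, .) over [0, 1] stays near 1/(2(1 + x^2)) -- hence near 1/2 -- for
# small x, although the pointwise limit of f integrates to 0.
def F(x, n=50000):
    dy = 1.0 / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * dy
        total += x * x * y / (x * x + y * y) ** 2
    return total * dy

for x in (0.1, 0.05, 0.02):
    assert abs(F(x) - 1 / (2 * (1 + x * x))) < 1e-3
assert F(0.02) > 0.49     # nowhere near 0, the integral of the limit
```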
4.5. Functions Defined by Integrals 189
Proof. Given ϵ > 0, we wish to find δ > 0 so that |F (x) − F (x′ )| < ϵ whenever
|x − x′ | < δ. Let |S| denote the n-dimensional volume of S. Since T × S is
compact, f is uniformly continuous on it by Theorem 1.33, so there is a δ > 0 so
that |f (x, y) − f (x′ , y)| < ϵ/|S| whenever y ∈ S, x, x′ ∈ T , and |x − x′ | < δ.
But then

|F(x) − F(x′)| ≤ ∫···∫_S |f(x, y) − f(x′, y)| d^n y < ∫···∫_S (ϵ/|S|) d^n y = ϵ,
The argument now proceeds as in the proof of Theorem 4.46. Since ∂x f is contin-
uous on the compact set B(r, x0 ) × S, it is uniformly continuous there by Theorem
1.33. Thus, given ϵ > 0, we can find δ > 0 so that the integrand on the right of
(4.49) is less than ϵ/|S| for all y ∈ S, x ∈ B(r, x0 ), and t ∈ (0, 1), whenever
|h| < δ. It follows that
| (F(x + h) − F(x))/h − ∫_S ∂_x f(x, y) dy | < ∫_S (ϵ/|S|) dy = ϵ  for |h| < δ,
as claimed.
EXAMPLE 2. Let F(x) = ∫_0^π y⁻¹ e^(xy) sin y dy. This integral cannot be evaluated in elementary terms; however, we have F′(x) = ∫_0^π e^(xy) sin y dy, which can be evaluated by two integrations by parts. The result is that F′(x) = (e^(πx) + 1)/(x² + 1).
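A numerical check of this result (our own sketch): approximate F by midpoint sums and compare a central difference quotient with the closed form for F′.

```python
import math

# Numerical check (ours) of Example 2: approximate F by midpoint sums and
# compare a central difference quotient with (e^{pi x} + 1)/(x^2 + 1).
def F(x, n=20000):
    dy = math.pi / n
    return sum(math.exp(x * y) * math.sin(y) / y
               for y in ((i + 0.5) * dy for i in range(n))) * dy

for x in (-1.0, 0.0, 0.7):
    h = 1e-4
    deriv = (F(x + h) - F(x - h)) / (2 * h)
    assert abs(deriv - (math.exp(math.pi * x) + 1) / (x * x + 1)) < 1e-3
```

Note that the midpoint nodes never hit y = 0, so the (removable) singularity of y⁻¹ sin y causes no trouble.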
Situations often occur in which the variable x occurs in the limits of integration
as well as the integrand. For simplicity we consider the case where x and y are
scalar variables:
(4.50)  F(x) = ∫_a^(ϕ(x)) f(x, y) dy.
We suppose that f is continuous in x and y and of class C 1 in x for each y, and that
ϕ is of class C 1 . If f does not depend on x, the derivative of F can be computed
by the fundamental theorem of calculus together with the chain rule:
(d/dx) ∫_a^(ϕ(x)) f(y) dy = f(ϕ(x)) ϕ′(x).
For the more general case (4.50), we can differentiate F by combining this result
with Theorem 4.47 according to the recipe in Exercise 7 of §2.3: Differentiate with
respect to each x in (4.50) in turn while treating the others as constants, and add
the results. The upshot is that
(4.51)  F′(x) = f(x, ϕ(x)) ϕ′(x) + ∫_a^(ϕ(x)) (∂f/∂x)(x, y) dy.
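Formula (4.51) can be tested numerically; in the sketch below (our own example, not from the text) f(x, y) = sin(xy), ϕ(x) = x², and a = 0.

```python
import math

# Check (our own example) of (4.51) with f(x, y) = sin(x*y), phi(x) = x^2,
# a = 0: F'(x) = f(x, phi(x)) * phi'(x) + integral of df/dx.
def mid(h, a, b, n=20000):
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

def F(x):
    return mid(lambda y: math.sin(x * y), 0.0, x * x)

x = 1.3
step = 1e-4
numeric = (F(x + step) - F(x - step)) / (2 * step)
formula = math.sin(x * x * x) * 2 * x + mid(lambda y: y * math.cos(x * y), 0.0, x * x)
assert abs(numeric - formula) < 1e-4
```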
Then

h′(x) = (x − x) g(x) + ∫_0^x g(y) dy = ∫_0^x g(y) dy,
and hence h′′ (x) = g(x). (Cf. Exercise 7 in §4.3, where this result is ap-
proached from a different angle.)
The hypotheses of Theorems 4.46 and 4.47 can be weakened considerably, but
only at the cost of a more intricate proof. More sophisticated theories of integra-
tion (see §4.8) furnish a powerful theorem, the so-called dominated convergence
theorem, that generally provides the sharpest results in these situations. The full
statement of this theorem requires more background than we have available here,
but its restriction to the context of Riemann integrable functions is the following
result, in which the crucial condition is the existence of the uniform bound C.
An elementary (but not simple) proof for the case where S is an interval in R
can be found in Lewin [17]. The full dominated convergence theorem can be found
in Bear [3, p. 68], DePree and Swartz [5, p. 194], Jones [9, p. 133], and Rudin [18,
p. 321].
Theorem 4.52 implies the following improvements on Theorems 4.46 and 4.47.
Proof. To prove part (a), by Theorem 1.15 it is enough to show that F (xj ) → F (x)
whenever {xj } is a sequence in S converging to x ∈ S. This follows by applying
the bounded convergence theorem to the sequence of functions fj (y) = f (xj , y).
Similarly, part (b) is proved by applying the bounded convergence theorem to the
sequence of difference quotients with increments hj , where {hj } is a sequence
tending to zero along one of the coordinate axes. The uniform bound on these quo-
tients is obtained by applying the mean value theorem as in the proof of Theorem
4.47; details are left as Exercise 8.
EXERCISES
1. Let f(x, y) = x³ y⁻² e^(−x²/y) if y > 0, f(x, y) = 0 if y ≤ 0.
a. Show that f(x, y) is of class C¹ as a function of x for each fixed y and as a function of y for each fixed x, but that f is unbounded in any neighborhood of the origin. (For the smoothness in y, cf. Exercise 9 in §2.1.)
b. Let F(x) = ∫_0^1 f(x, y) dy. Show that F(x) = x e^(−x²) and hence that F′(0) = 1, but that ∫_0^1 ∂_x f(0, y) dy = 0.
2. Compute F′(x) for the functions F(x) defined for x > 0 by the following formulas. Your answers should not contain integral signs.
a. $F(x) = \int_0^1 \log(1 + x e^y)\,dy$.
b. $F(x) = \int_1^{x^2} y^{-1} \cos(xy^2)\,dy$.
c. $F(x) = \int_1^{3x} y^{-1} e^{xy}\,dy$.
4.6. Improper Integrals 193
3. Given a continuous function g on R, let $h(x) = \int_0^x (x-y)e^{x-y} g(y)\,dy$. Show that h″ − 2h′ + h = g.
4. Given a continuous function g on R, let $h(x) = \frac{1}{2}\int_0^x [\sin 2(x-y)]\,g(y)\,dy$. Show that h″ + 4h = g.
5. Given $F(x) = \int_{\psi(x)}^{\varphi(x)} f(x,y)\,dy$, find F′(x), assuming suitable smoothness conditions on ψ, ϕ, and f.
6. (How to compress n antidifferentiations into one) Let f be a continuous function on R. For n ≥ 1, let
$$ f^{[n]}(x) = \frac{1}{(n-1)!} \int_0^x (x-y)^{n-1} f(y)\,dy. $$
Show that $\bigl(f^{[n]}\bigr)' = f^{[n-1]}$ for n > 1 and conclude that $f^{[n]}$ is an nth-order antiderivative of f.
7. Let f be any continuous function on [0, 1]. For x ∈ R and t > 0, let
$$ u(x,t) = t^{-1/2} \int_0^1 e^{-(x-y)^2/4t} f(y)\,dy, \qquad v(x,t) = t \int_0^1 \frac{f(y)}{(x-y)^2 + t^2}\,dy. $$
We study these two types in turn and then consider integrals of more complicated
sorts that can be obtained by combining them.
EXAMPLE 1.
a. $\int_0^\infty e^{-x}\,dx = \lim_{b\to\infty} \bigl[-e^{-x}\bigr]_0^b = 1$, since $\lim_{b\to\infty} e^{-b} = 0$.
b. $\int_0^\infty \cos x\,dx$ diverges, since $\lim_{b\to\infty} \sin b$ does not exist.
Our main concern here is not with the evaluation of $\int_a^\infty f(x)\,dx$ but with the more basic question of whether or not it converges. At the outset, we make one simple but useful remark: If c > a, the convergence of $\int_a^\infty f(x)\,dx$ is equivalent to the convergence of $\int_c^\infty f(x)\,dx$, the difference between the two being the ordinary integral $\int_a^c f(x)\,dx$. Thus, the convergence of $\int_a^\infty f(x)\,dx$ depends only on the behavior of f(x) as x → ∞, not on its behavior on a finite interval [a, c].
We first consider the situation when f ≥ 0. In this case, the integral $\int_a^b f(x)\,dx$
increases along with the upper endpoint b, so we can exploit the following variant
of the monotone sequence theorem.
4.54 Lemma. If ϕ is a bounded increasing function on [a, ∞), then limx→∞ ϕ(x)
exists and equals sup{ϕ(x) : x ≥ a}.
Proof. The proof is left to the reader (Exercise 7); it is essentially identical to the
proof of the monotone sequence theorem (1.16).
By applying Lemma 4.54 to the function $\varphi(x) = \int_a^x f(t)\,dt$, we see that the integral $\int_a^\infty f(x)\,dx$ converges if and only if $\int_a^b f(x)\,dx$ remains bounded as b → ∞. This immediately leads to the basic comparison test for convergence.
Proof. If 0 < l < ∞, the fact that f(x)/g(x) → l yields the estimates $f(x) \le 2l\,g(x)$ and $f(x) \ge \frac{1}{2}l\,g(x)$ for sufficiently large x, so the first assertion follows by comparing f to a multiple of g. If l = 0 (resp. l = ∞), we have f(x) ≤ g(x) (resp. g(x) ≤ f(x)) for sufficiently large x, whence the other assertions follow.
The functions most often used for comparison in Theorem 4.55 and Corollary 4.56 are the power functions $x^{-p}$. Taking a = 1 for convenience, for p ≠ 1 we have
$$ \int_1^b \frac{dx}{x^p} = \frac{b^{1-p} - 1}{1-p} \to \begin{cases} \infty & \text{if } p < 1, \\ (p-1)^{-1} & \text{if } p > 1, \end{cases} $$
and $\int_1^b x^{-1}\,dx = \log b \to \infty$. In short, $\int_1^\infty x^{-p}\,dx$ converges if and only if p > 1.
Combining this fact with Theorem 4.55, we obtain the following handy rule:
4.57 Corollary. If $0 \le f(x) \le C x^{-p}$ for all sufficiently large x, where p > 1, then $\int_a^\infty f(x)\,dx$ converges. If $f(x) \ge c x^{-1}$ (c > 0) for all sufficiently large x, then $\int_a^\infty f(x)\,dx$ diverges.
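The dichotomy at p = 1 is easy to observe numerically. In the Python sketch below (a hand-rolled midpoint rule; the step counts and cutoffs are arbitrary choices), $\int_1^b x^{-2}\,dx$ approaches 1 as b grows, while $\int_1^b x^{-1/2}\,dx$ grows without bound:

```python
import math

def power_integral(p, b, n=50000):
    # midpoint rule for the proper integral of x^(-p) over [1, b]
    h = (b - 1.0) / n
    return sum((1.0 + (k + 0.5) * h) ** (-p) for k in range(n)) * h

for b in (10.0, 100.0, 1000.0):
    # p = 2 settles near 1/(p-1) = 1; p = 1/2 grows like 2*sqrt(b)
    print(b, power_integral(2.0, b), power_integral(0.5, b))
```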
EXAMPLE 2. The integral $\int_0^\infty \frac{2x+14}{x^3+1}\,dx$ converges, because
$$ \frac{2x+14}{x^3+1} \le \frac{4x}{x^3} = \frac{4}{x^2} \quad\text{for } x \ge 7. $$
196 Chapter 4. Integral Calculus
Alternatively, we can note that the ratio of the integrand to $x^{-2}$ tends to 2 as x → ∞ and use Corollary 4.56 with $g(x) = x^{-2}$ to establish the convergence of the integral over, say, [1, ∞). (The integral over [0, 1] is proper.) Note that we are not comparing $\int_0^\infty \frac{2x+14}{x^3+1}\,dx$ to $\int_0^\infty x^{-2}\,dx$, which presents an additional difficulty because $x^{-2}$ is unbounded at x = 0; the comparison of $(2x+14)/(x^3+1)$ with $x^{-2}$ is significant only for large x.
It should be noted that the power functions $x^{-p}$ do not quite tell the whole story. There are functions whose rate of decay at infinity is faster than $x^{-1}$ but slower than $x^{-p}$ for every p > 1, and their integrals may be either convergent or divergent; see Exercises 4 and 5.
Next we remove the assumption that f is nonnegative, and with a view toward
future applications, we shall allow f to be complex-valued. The question of con-
vergence can often be reduced to the case where f ≥ 0 via the following result.
4.58 Theorem. If $\int_a^\infty |f(x)|\,dx$ converges, then $\int_a^\infty f(x)\,dx$ converges.

Proof. First suppose f is real-valued. Let $f^+(x) = \max[f(x), 0]$ and $f^-(x) = \max[-f(x), 0]$. Then we have $0 \le f^+(x) \le |f(x)|$ and $0 \le f^-(x) \le |f(x)|$, so $\int_a^\infty f^+(x)\,dx$ and $\int_a^\infty f^-(x)\,dx$ converge by Theorem 4.55. But $f = f^+ - f^-$, so $\int_a^\infty f(x)\,dx$ converges also.
If f is complex-valued, we have $|\operatorname{Re} f(x)| \le |f(x)|$ and $|\operatorname{Im} f(x)| \le |f(x)|$, so the convergence of $\int_a^\infty |f(x)|\,dx$ implies the convergence of $\int_a^\infty |\operatorname{Re} f(x)|\,dx$ and $\int_a^\infty |\operatorname{Im} f(x)|\,dx$, and hence (by the preceding argument) the convergence of the real and imaginary parts of $\int_a^\infty f(x)\,dx$.
The integral $\int_a^\infty f(x)\,dx$ is called absolutely convergent if $\int_a^\infty |f(x)|\,dx$ converges. Theorem 4.55 and its corollaries can be used to test for absolute convergence, by applying them to |f|. It is possible, however, for $\int_a^\infty f(x)\,dx$ to converge even when $\int_a^\infty |f(x)|\,dx$ diverges, because of cancellation effects between positive and negative values. Here is an important example.
EXAMPLE 3. The integral $\int_1^\infty \frac{\sin x}{x}\,dx$ is not absolutely convergent (Exercise 8), but it is convergent. To see this, integrate by parts:
$$ \int_1^b \frac{\sin x}{x}\,dx = \left.\frac{-\cos x}{x}\right|_1^b - \int_1^b \frac{\cos x}{x^2}\,dx. $$
Now, $\int_1^\infty |x^{-2}\cos x|\,dx$ converges by Corollary 4.57 since $|x^{-2}\cos x| \le x^{-2}$, so the integral on the right approaches a finite limit as b → ∞; moreover, since $|b^{-1}\cos b| \le b^{-1} \to 0$, so does the other term. Hence $\lim_{b\to\infty}\int_1^b x^{-1}\sin x\,dx$ exists, as claimed.
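The contrast between convergence and absolute convergence is visible numerically: the truncated integrals $\int_1^b (\sin x)/x\,dx$ settle down near a limit as b grows, while $\int_1^b |\sin x|/x\,dx$ grows like a multiple of log b. A Python sketch (the quadrature scheme and mesh sizes are arbitrary editorial choices):

```python
import math

def simpson(f, a, b, n):
    # composite Simpson's rule; n must be even
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

results = {}
for b in (10.0, 100.0, 1000.0):
    n = 200 * int(b)   # keep the mesh fine relative to the period of sin x
    results[b] = (simpson(lambda x: math.sin(x) / x, 1.0, b, n),
                  simpson(lambda x: abs(math.sin(x)) / x, 1.0, b, n))
    print(b, results[b])
```

The first column of output stabilizes; the second keeps climbing by roughly the same amount each time b is multiplied by 10.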
Improper Integrals of Type II. In this subsection, all functions in question are
assumed to be defined on (a, b] and integrable on [c, b] for every c > a.
The definition of the improper integral in this situation is
$$ \int_a^b f(x)\,dx = \lim_{c>a,\; c\to a} \int_c^b f(x)\,dx. $$
That is, $\int_a^b f(x)\,dx$ converges if the limit on the right exists, and diverges otherwise. The obvious analogues of the results in the preceding subsection are valid in
this situation with essentially the same proofs; one has merely to replace conditions
like “x → ∞” or “for sufficiently large x” by “x → a” or “for x sufficiently close
to a.” For instance, here is the basic comparison test:
4.59 Theorem. Suppose that 0 ≤ f(x) ≤ g(x) for all x sufficiently close to a. If $\int_a^b g(x)\,dx$ converges, so does $\int_a^b f(x)\,dx$. If $\int_a^b f(x)\,dx$ diverges, so does $\int_a^b g(x)\,dx$.
The functions most often used for comparison in this situation are the power functions $(x-a)^{-p}$, but now the condition for convergence is p < 1 rather than p > 1. Indeed, for p ≠ 1,
$$ \int_c^b (x-a)^{-p}\,dx = \left.\frac{(x-a)^{1-p}}{1-p}\right|_c^b \to \begin{cases} (1-p)^{-1}(b-a)^{1-p} & \text{if } p < 1, \\ \infty & \text{if } p > 1, \end{cases} $$
and $\int_c^b (x-a)^{-1}\,dx = \log(x-a)\big|_c^b \to \infty$. Hence the analogue of Corollary 4.57 is as follows:
4.60 Corollary. If $0 \le f(x) \le C(x-a)^{-p}$ for x near a, where p < 1, then $\int_a^b f(x)\,dx$ converges. If $f(x) > c(x-a)^{-1}$ (c > 0) for x near a, then $\int_a^b f(x)\,dx$ diverges.
EXAMPLE 4. $\int_0^1 x^{-2}\sin 3x\,dx$ diverges. Indeed, $x^{-1}\sin 3x \to 3$ as x → 0, so $x^{-2}\sin 3x > 2x^{-1}$ for x near 0.
Theorem 4.58 also remains valid in this situation; that is, absolute convergence
implies convergence.
EXAMPLE 5. $\int_0^1 x^{-1/2}\sin(x^{-1})\,dx$ is absolutely convergent, because $|x^{-1/2}\sin(x^{-1})| \le x^{-1/2}$.
The integral on the left converges only when both of the limits on the right exist independently of one another; there is no relation between the variables a and b. The same ideas apply to $\int_a^\infty f(x)\,dx$ when f is unbounded at a or to $\int_a^b f(x)\,dx$ when f is unbounded at both a and b.
EXAMPLE 6. $\int_{-\infty}^{\infty} dx/(1+x^2)$ converges; the integrals over (−∞, 0] and [0, ∞) are both convergent by comparison to $x^{-2}$. In fact,
$$ \int_{-\infty}^{\infty} \frac{dx}{1+x^2} = \lim_{a\to-\infty,\; b\to+\infty} \arctan x\,\Big|_a^b = \frac{\pi}{2} - \Bigl(-\frac{\pi}{2}\Bigr) = \pi. $$
EXAMPLE 7. $\int_0^\infty x^{-p}\,dx$ is divergent for every p. Indeed, if p < 1, $\int_0^1 x^{-p}\,dx$ converges but $\int_1^\infty x^{-p}\,dx$ diverges, whereas the reverse is true if p > 1. If p = 1, these integrals both diverge.
EXAMPLE 8. Consider $\int_0^\infty f(x)\,dx$ where $f(x) = 1/(x^{1/2} + x^{3/2})$. Since $0 < f(x) < x^{-1/2}$, $\int_0^1 f(x)\,dx$ converges by Corollary 4.60. Since $0 < f(x) < x^{-3/2}$, $\int_1^\infty f(x)\,dx$ converges by Corollary 4.57. Hence $\int_0^\infty f(x)\,dx$ converges.
Finally, one can consider improper integrals $\int_a^b f(x)\,dx$ where f is unbounded
near one or more interior points of [a, b]. Again the trick is to break up [a, b] into
subintervals such that the singularities of f occur only at endpoints of the subinter-
vals and consider the integrals of f over the subintervals separately.
EXAMPLE 9. Let $f(x) = (x^3 - 8x^2)^{-1/3}$, and let us consider $\int_0^9 f(x)\,dx$ and $\int_0^\infty f(x)\,dx$. The singularities of f occur at x = 0 and x = 8, so for the first integral we write
$$ \int_0^9 = \int_0^c + \int_c^8 + \int_8^9 \qquad (0 < c < 8). $$
For the general case, we write ϕ(x) = ϕ(0) + [ϕ(x) − ϕ(0)], obtaining
$$ \mathrm{P.V.}\int_a^b \frac{\varphi(x)}{x}\,dx = \varphi(0)\,\mathrm{P.V.}\int_a^b \frac{dx}{x} + \int_a^b \frac{\varphi(x)-\varphi(0)}{x}\,dx. $$
We have just seen that the first quantity on the right exists, and the second one is a
proper integral: The integrand is actually continuous on [a, b] if we define its value
at x = 0 to be ϕ′ (0).
The notion of principal value is also occasionally applied to integrals of the form $\int_{-\infty}^{\infty} f(x)\,dx$ in which f is integrable over any finite interval:
$$ \mathrm{P.V.}\int_{-\infty}^{\infty} f(x)\,dx = \lim_{R\to\infty} \int_{-R}^{R} f(x)\,dx. $$
For example, the integral $\int_{-\infty}^{\infty} x(1+x^2)^{-1}\,dx$ is divergent because the integrand is asymptotically equal to $x^{-1}$ as x → ±∞, but its principal value is zero because the integrand is odd.
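This sensitivity to the truncation can be seen directly: symmetric truncations of $\int x/(1+x^2)\,dx$ vanish, while lopsided truncations such as [−R, 2R] tend to $\frac{1}{2}\log 4 = \log 2$. A Python sketch (midpoint rule; the interval [−R, 2R] is just one convenient asymmetry chosen for illustration):

```python
import math

def midpoint(f, a, b, n=100000):
    # composite midpoint rule on [a, b]
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

f = lambda x: x / (1.0 + x * x)   # odd integrand, not absolutely integrable on R
sym = midpoint(f, -100.0, 100.0)    # symmetric truncation: cancels to 0
asym = midpoint(f, -100.0, 200.0)   # lopsided truncation: near log 2
print(sym, asym, math.log(2))
```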
EXERCISES
1. Determine whether the following improper integrals of type I converge.
a. $\int_1^\infty \frac{dx}{x^2\sqrt{x+3}}$.
b. $\int_3^\infty \frac{x^2 - 3x - 1}{x(x+2)^2}\,dx$.
c. $\int_0^\infty x^2 e^{-x^2}\,dx$.
d. $\int_3^\infty \frac{\sin 4x}{x^2 - x - 2}\,dx$.
e. $\int_1^\infty \tan\frac{1}{x}\,dx$.
2. Determine whether the following improper integrals of type II converge.
a. $\int_0^1 \frac{x}{\sqrt{1-x^2}}\,dx$.
b. $\int_{\pi/2}^{\pi} \cot x\,dx$.
c. $\int_0^1 \frac{\sqrt{1-x}}{x^2 - 4x + 3}\,dx$.
d. $\int_0^1 \frac{dx}{x^{1/2}(x^2+x)^{1/3}}$.
e. $\int_0^1 \frac{1-\cos x}{\sin^3 2x}\,dx$.
3. Determine whether the following improper integrals converge. In each case it will be necessary to break up the integral into a sum of integrals of types I and/or II.
a. $\int_0^\infty x^{-3/4} e^{-x}\,dx$.
b. $\int_0^1 x^{-1/3}(1-x)^{-2}\,dx$.
c. $\int_0^\infty \frac{\sqrt{x}}{e^x - 1}\,dx$.
d. $\int_0^\infty \frac{dx}{x(x-1)^{1/3}}$.
e. $\int_0^\infty x^{-1/5}\sin\frac{1}{x}\,dx$.
f. $\int_{-\infty}^{\infty} \frac{e^x}{e^x + x^2}\,dx$.
where the $S_r$'s are a family of measurable sets that fill out $\mathbf R^2$ as r → ∞. For instance, we could take $S_r$ to be the disc of radius r about the origin, or the square of side length r centered at the origin, or the rectangle of side lengths r and $r^2$ centered at the origin, or the disc of radius r centered at (15, −37), and so on. The
difficulty is evident: There is a bewildering array of possibilities, with no rationale
for choosing one over another and no guarantee that different families Sr will yield
the same limit.
Evidently there is some work to be done, and we shall not give all the details
here. The outcome, in a nutshell, is that everything goes well when the integrand is
nonnegative or when the integral is absolutely convergent, but not otherwise.
4.7. Improper Multiple Integrals 203
always exists, provided that we allow +∞ as a value, and this limit is an obvious candidate for the value of the improper integral $\int\cdots\int_S f\,dV^n$.
Here is the crucial point: Suppose that $\{\widetilde U_j\}$ is another sequence of sets satisfying the conditions of (4.63). Then the two limits
$$ \lim_{j\to\infty} \int\cdots\int_{U_j} f\,dV^n \quad\text{and}\quad \lim_{j\to\infty} \int\cdots\int_{\widetilde U_j} f\,dV^n $$
where {Uj } is any sequence of sets satisfying the conditions of (4.63). It is un-
derstood that the value of the integral may be +∞, in which case we say that the
integral diverges.
204 Chapter 4. Integral Calculus
The proof that the limit in (4.64) is independent of the choice of {Uj }, in full
generality, requires the Lebesgue theory of integration. We shall give a proof under
some additional restrictions on S and the Uj ’s, usually easy to satisfy in practice,
in Appendix B.6 (Theorem B.25).
It is also true that improper multiple integrals of nonnegative functions can be
evaluated as iterated improper integrals under suitable conditions on S and f so
that the latter integrals exist. For example,
$$ \iint_{\mathbf R^2} f\,dA = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x,y)\,dy\,dx. $$
We shall not attempt to state a general theorem to cover all the various cases (much
less give a precise proof), but we assure the reader that as long as the integrand is
nonnegative, there is almost never any difficulty.
The analogue of the comparison test, Theorem 4.55, is valid for multiple im-
proper integrals, with the same proof. Again the basic comparison functions are
powers of |x|, but the critical exponent depends on the dimension.
4.65 Proposition. For p > 0, define $f_p$ on $\mathbf R^n \setminus \{0\}$ by $f_p(\mathbf x) = |\mathbf x|^{-p}$. The integral of $f_p$ over a ball $\{\mathbf x : |\mathbf x| < a\}$ is finite if and only if p < n; the integral of $f_p$ over the complement of a ball, $\{\mathbf x : |\mathbf x| > a\}$, is finite if and only if p > n.
On the one hand, we can take the approximating regions $U_j$ to be discs centered at the origin and switch to polar coordinates:
$$ \iint_{\mathbf R^2} e^{-x^2-y^2}\,dA = \lim_{R\to\infty} \int_0^R\!\int_0^{2\pi} e^{-r^2} r\,d\theta\,dr = \int_0^\infty\!\int_0^{2\pi} e^{-r^2} r\,d\theta\,dr = 2\pi\Bigl[-\tfrac{1}{2}e^{-r^2}\Bigr]_0^\infty = \pi. $$
On the other hand, we can take the approximating regions to be squares centered at the origin and stick to Cartesian coordinates:
$$ \iint_{\mathbf R^2} e^{-x^2-y^2}\,dA = \lim_{R\to\infty} \int_{-R}^{R}\!\int_{-R}^{R} e^{-x^2} e^{-y^2}\,dx\,dy = \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-y^2}\,dy\right). $$
The two integrals in parentheses are equal, of course; the name of the variable of integration is irrelevant. We have shown that
$$ \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)^{2} = \pi. $$
Since $e^{-x^2} > 0$, we can take the positive square root of both sides to obtain the magic formula:

4.66 Proposition. $\displaystyle\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$.
The function $e^{-x^2}$ turns up in many contexts. In particular, it is essentially the “bell curve” or “normal distribution” of probability and statistics, but in that setting one must rescale it so that the total area under its graph is 1; Proposition 4.66 provides the appropriate scaling factor. Proposition 4.66 is remarkable not only because it is inaccessible by elementary calculus (the antiderivative of $e^{-x^2}$ is not an elementary function) but because it presents the number π in a starring role that has nothing to do with circles.
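Proposition 4.66 is also easy to corroborate numerically: since the tail of $e^{-x^2}$ beyond |x| = 8 is smaller than $e^{-64}$, truncating at ±8 and applying any standard quadrature rule reproduces $\sqrt{\pi}$ to high accuracy. A Python sketch (the truncation point and mesh are arbitrary editorial choices):

```python
import math

def simpson(f, a, b, n=4000):
    # composite Simpson's rule; n must be even
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

# truncating at +/-8 is harmless: the omitted tail is below e^(-64)
approx = simpson(lambda x: math.exp(-x * x), -8.0, 8.0)
print(approx, math.sqrt(math.pi))
```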
Now, what about functions that are not nonnegative? Let us suppose that S, f, and $\{U_j\}$ are as in (4.63), but f is merely assumed to be real-valued. The essential point is that the preceding theory can be applied to |f|, so that it makes sense to say that $\int\cdots\int_S |f|\,dV^n$ converges. If this condition holds, the argument used to prove Theorem 4.58 shows that $\lim_{j\to\infty} \int\cdots\int_{U_j} f\,dV^n$ exists and that
$$ \lim_{j\to\infty} \int\cdots\int_{U_j} f\,dV^n = \int\cdots\int_S f^+\,dV^n - \int\cdots\int_S f^-\,dV^n, $$
where $f^+(x) = \max[f(x), 0]$ and $f^-(x) = \max[-f(x), 0]$. The integrals on the right converge by comparison to the integral of |f|, and they are independent of the choice of $\{U_j\}$; hence, so is the limit on the left. In short, if $\int\cdots\int_S |f|\,dV^n$ converges, we may define the improper integral of f over S by formula (4.64); the limit in question exists and is independent of the choice of approximating sequence $\{U_j\}$.
The same result holds if f is complex-valued; we simply consider its real and
imaginary parts separately.
In dimensions n > 1, however, there is no general theory of improper integrals
that are convergent but not absolutely convergent. Such integrals, when they arise,
must be defined by specific limiting procedures that are adapted to the situation at
hand.
EXERCISES
are equal, in which case their common value is the Lebesgue measure m(S).
Note that there is no assumption that the sets in question are bounded (although
compact sets are bounded by definition); the Lebesgue theory applies equally well
to bounded and unbounded sets.
The notion of n-dimensional Lebesgue measure for sets in Rn is entirely simi-
lar; only the terminology needs to be modified a little. Every set that one will ever
meet in “real life” — in particular, every open set, every closed set, every intersec-
tion of countably many open sets, every union of countably many closed sets, and
so on — is Lebesgue measurable.³ Lebesgue measure has the following fundamental additivity property: If $\{S_j\}$ is a finite or infinite sequence of disjoint Lebesgue measurable sets, then $\bigcup S_j$ is Lebesgue measurable and $m\bigl(\bigcup S_j\bigr) = \sum m(S_j)$. In
the Jordan theory, this additivity is guaranteed to hold only for finitely many sets;
the extension to infinitely many sets is the crucial property that allows the Lebesgue
theory to handle various limiting processes more smoothly.
It is not hard to show that every open set U ⊂ Rn is the union of a finite or
countably infinite family of rectangular boxes Rj (intervals when n = 1) with dis-
joint interiors, and the Lebesgue measure of U is just the sum of the n-dimensional
volumes of the boxes. (In general these boxes are not part of a fixed grid of boxes;
if there are infinitely many of them, the diameter of Rj generally tends to zero as
j → ∞.) It follows that a set S ⊂ Rn has Lebesgue measure zero if and only if for
every ϵ > 0, S is contained in the union of a finite or countable family of boxes,
the sum of whose volumes is less than ϵ. The only difference between this and the
condition that S have zero content is the fact that here we allow an infinite family
³For those who know some set theory: More precisely, one cannot construct Lebesgue nonmeasurable sets without invoking the axiom of choice.
4.8. Lebesgue Measure and the Lebesgue Integral 209
212 Chapter 5. Line and Surface Integrals; Vector Analysis
F IGURE 5.1: The vector fields F(x, y) = (x, y) (left) and F(x, y) =
(−y, x) (right).
closed sets. When we say that a function or vector field is of class C k on a closed
set S ⊂ Rn , we always mean that it is of class C k on some open set containing S.
is the vector difference between the two points, and we imagine it as being infinitely
small. We may, however, be more interested in the distance between the two points,
traditionally denoted by ds, which is
(5.2)  $ds = |d\mathbf x| = \sqrt{dx_1^2 + \cdots + dx_n^2}.$
5.1. Arc Length and Line Integrals 213
To give these differentials a precise meaning that can be used for calculations, the
best procedure is to parametrize the curve. Thus, we assume that C is given by parametric equations $\mathbf x = \mathbf g(t)$, a ≤ t ≤ b, where g is of class $C^1$ and $\mathbf g'(t) \ne 0$.
Then the neighboring points $\mathbf x$ and $\mathbf x + d\mathbf x$ are given by g(t) and g(t + dt), so

(5.3)  $d\mathbf x = \mathbf g(t+dt) - \mathbf g(t) = \mathbf g'(t)\,dt = \Bigl(\dfrac{dx_1}{dt}, \ldots, \dfrac{dx_n}{dt}\Bigr)\,dt.$
(The difference between the increment of g and its linear approximation disappears
in the infinitesimal limit.) Moreover,
(5.4)  $|d\mathbf x| = |\mathbf g'(t)|\,dt = \sqrt{\Bigl(\dfrac{dx_1}{dt}\Bigr)^2 + \cdots + \Bigl(\dfrac{dx_n}{dt}\Bigr)^2}\,dt,$

which is just what one gets by formally multiplying and dividing the expression on the right of (5.2) by dt.
What happens if we sum up all the infinitesimal increments dx or ds — that
is, if we integrate the differentials dx or ds = |dx| over the curve? Integration of
the vector increments dx just gives the total vector increment, that is, the vector
difference between the initial and final points on the curve:
(5.5)  $\displaystyle\int_C d\mathbf x = \int_a^b \mathbf g'(t)\,dt = \mathbf g(b) - \mathbf g(a).$
This is nothing but the fundamental theorem of calculus applied to the components
of g; it is simple but not very exciting. On the other hand, ds is the straight-line
distance between two infinitesimally close points x and x + dx on the curve, and
since smooth curves are indistinguishable from their linear approximations on the
infinitesimal level, ds is the arc length of the bit of curve between dx and x + dx.
Adding these up gives the total arc length of the curve:
(5.6)  $\displaystyle \text{Arc length} = \int_C ds = \int_a^b |\mathbf g'(t)|\,dt.$
where for the second equality we have used the chain rule. This does indeed agree
with (5.6), by formula (4.34).
The same independence of parametrization holds for the related integral (5.5), with one subtle but important difference. The integral $\int_a^b \mathbf g'(t)\,dt$ gives the vector
difference between the two endpoints of the curve, which is clearly independent of
the parametrization except insofar as the parametrization determines which is the
initial point and which is the final point. If we choose a new parameter u as above
so that t is a decreasing function of u (thus a = ϕ(d) and b = ϕ(c)), then the initial
and final points get switched, and so their difference is multiplied by −1.
The issue here is that a parametrization x = g(t) determines an orientation for
the curve C, that is, a determination of which direction along the curve is “forward”
and which direction is “backward,” the “forward” direction being the direction in
which the point g(t) moves as t increases. The orientation of a curve can be conve-
niently indicated in a picture by drawing one or more arrowheads along the curve
that point in the “forward” direction, as indicated in Figure 5.2. The substance of
the preceding paragraph is then that the integral (5.5) depends on the parametriza-
tion only insofar as the parametrization determines a choice of orientation. In
contrast, the arc length of a curve is independent even of the orientation.
The notion of arc length extends in an obvious way to piecewise smooth curves,
obtained by joining finitely many smooth curves together end-to-end but allow-
ing corners or cusps at the joining points; we simply compute the lengths of the
smooth pieces and add them up. We can express this more precisely in terms of
parametrizations, as follows: The function g : [a, b] → Rn is called piecewise
smooth if (i) it is continuous, and (ii) its derivative exists and is continuous except perhaps at finitely many points $t_j$ where the one-sided limits $\lim_{t\to t_j^\pm} \mathbf g'(t)$ exist.
(Note. In Chapter 8 we shall use the term “piecewise smooth” in a slightly different
sense.) In this case |g′ (t)| is an integrable function on [a, b] by Theorem 4.12 (the
fact that it may be undefined at a few points is immaterial), and its integral gives
the arc length. The same generalization also applies to the line integrals discussed
below.
Remarks.
i. The parametrization x = g(t) may be considered as representing the curve C
as the path traced out by a moving particle whose position at time t is g(t).
The derivative g′ (t) is then the velocity of the particle, and its norm |g′ (t)|
is the speed of the particle. Integrating the velocity, $\int_a^b \mathbf g'(t)\,dt$, gives the net difference in the initial and final positions of the particle, whereas integrating the speed, $\int_a^b |\mathbf g'(t)|\,dt$, gives the total distance traveled by the particle, i.e., the arc length of the curve.
ii. In the preceding discussion, we have implicitly assumed that the parametri-
zation x = g(t) is one-to-one. This is not always the case if we think of g(t)
as the position of a particle at time t, for the particle can traverse a path more
than once. For example, g(t) = (cos t, sin t) represents a particle moving
around the unit circle with constant speed. If we restrict t to an interval of
length ≤ 2π, we get a one-to-one parametrization of part or all of the circle,
but from the physical point of view there is no reason to make such a restriction.
However, the interpretations in the preceding paragraph hold whether g is one-to-one or not: $\int_a^b \mathbf g'(t)\,dt$ is still $\mathbf g(b) - \mathbf g(a)$, and $\int_a^b |\mathbf g'(t)|\,dt$ is still the total
distance traveled by the particle from time a to time b; it can be interpreted
as arc length if the portions of the curve that are traversed more than once are
counted with the appropriate multiplicity.
iii. While theoretically simple, calculation of arc length tends to be difficult in
practice because the square root implicit in the definition of the norm |g′ (t)|
often leads to unpleasant integrands. This is just a fact of life.
This is independent of the parametrization and the orientation, by the same chain-
rule calculation that we performed above for the case f ≡ 1.
Here we have applied Theorem 4.9d to the scalar-valued function F(t) · u and then invoked Cauchy's inequality. The desired result is obtained by taking u to be the unit vector in the direction of $\int_a^b \mathbf F(t)\,dt$.
Of greater interest is a scalar-valued line integral for vector fields — that is, for
Rn -valued functions on Rn . If C is a smooth (or piecewise smooth) curve in Rn
and F is a continuous vector field defined on some neighborhood of C in Rn , the
line integral of F over C is
$$ \int_C \mathbf F \cdot d\mathbf x = \int_C (F_1\,dx_1 + F_2\,dx_2 + \cdots + F_n\,dx_n). $$
so

(5.10)  $\displaystyle\int_C \mathbf F \cdot d\mathbf x = \int_C F_{\text{tang}}\,ds.$

That is, $\int_C \mathbf F\cdot d\mathbf x$ is the integral of the tangential component of F with respect to
arc length. The dependence on the orientation here comes through Ftang , which
changes sign if the orientation is reversed. (Any temptation to compute specific
line integrals by using (5.10), however, should probably be resisted, because the
element of arc length ds is often hard to compute with. It is almost always better to
use the basic definition (5.9) instead.)
Remarks.
i. If F is a force field, then $\int_C \mathbf F\cdot d\mathbf x$ represents a quantity of energy; it is the work done by the force on a particle that traverses the curve C.
ii. The integrand F · dx = F1 dx1 + · · · + Fn dxn in a line integral, with the
dx’s included, is often called a differential form, and we speak of integrating
a differential form over a curve. We shall return to this notion in §5.9.
What does all this boil down to when n = 1? In this case, vector fields and
scalar functions are the same thing, and both the scalar and vector versions of line
integrals are just ordinary one-variable integrals. The former, however, is indepen-
dent of orientation, whereas the latter depends on orientation. The distinction is the
same as the one between formulas (4.32) and (4.33) in §4.4; it is a question of
$$ \int_{[a,b]} f(x)\,dx \quad\text{versus}\quad \int_a^b f(x)\,dx. $$
In the integral on the left we must have a ≤ b; but in the integral on the right a and
b can occur in either order, and the sign of the integral depends on the order.
EXAMPLE 2. Let C be the ellipse formed by the intersection of the circular cylinder $x^2 + y^2 = 1$ and the plane z = 2y + 1, oriented counterclockwise as viewed from above, and let F(x, y, z) = (y, z, x). Calculate $\int_C \mathbf F\cdot d\mathbf x = \int_C (y\,dx + z\,dy + x\,dz)$.
Solution. We can parametrize C by x = cos t, y = sin t, z = 2 sin t + 1, with 0 ≤ t ≤ 2π. Then $d\mathbf x = (-\sin t,\ \cos t,\ 2\cos t)\,dt$, so
$$ \mathbf F\cdot d\mathbf x = \bigl[-\sin^2 t + (2\sin t + 1)\cos t + 2\cos^2 t\bigr]\,dt = (\cos 2t + \sin 2t + \cos t + \cos^2 t)\,dt. $$
The integral of the first three terms over [0, 2π] vanishes, and the integral of the last one is π. So $\int_C \mathbf F\cdot d\mathbf x = \pi$.
Note that it doesn’t matter which point on C we choose to start and end at.
Instead of taking t ∈ [0, 2π], we could take t ∈ [a, a + 2π] for any a ∈ R; the
answer is the same since the integral of a trig function over a complete period
is independent of the particular period chosen.
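Readers can confirm the value π in Example 2 numerically: since the integrand is smooth and 2π-periodic, even a plain Riemann sum over one full period converges extremely rapidly. A Python sketch (the sample count is an arbitrary choice):

```python
import math

def integrand(t):
    # F(g(t)) . g'(t) for F(x,y,z) = (y, z, x) on x = cos t, y = sin t, z = 2 sin t + 1
    x, y, z = math.cos(t), math.sin(t), 2.0 * math.sin(t) + 1.0
    dx, dy, dz = -math.sin(t), math.cos(t), 2.0 * math.cos(t)
    return y * dx + z * dy + x * dz

n = 20000
h = 2.0 * math.pi / n
value = sum(integrand(k * h) for k in range(n)) * h   # Riemann sum over [0, 2*pi]
print(value, math.pi)
```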
is bounded, then C is called rectifiable, and the arc length L(C) is defined to be the supremum of these sums:
$$ L(C) = \sup\bigl\{ L_P(C) : P \text{ is a partition of } [a,b] \bigr\}. $$
length is additive: If C1 and C2 are the curves parametrized by g(t) for t ∈ [a, c]
and t ∈ [c, b], then L(C) = L(C1 ) + L(C2 ). See Exercise 8.
We now show that this definition coincides with our previous one for C 1 curves.
Proof. For any partition P of [a, b], by (5.5) and Proposition 5.8 we have
$$ L_P(C) = \sum_{1}^{J} \left| \int_{t_{j-1}}^{t_j} \mathbf g'(t)\,dt \right| \le \sum_{1}^{J} \int_{t_{j-1}}^{t_j} |\mathbf g'(t)|\,dt = \int_a^b |\mathbf g'(t)|\,dt. $$
It follows that $L(C) \le \int_a^b |\mathbf g'(t)|\,dt$, and in particular that C is rectifiable.
Next, for r, s ∈ [a, b], let $C_r^s$ be the curve parametrized by g(t) with t ∈ [r, s], and let $\varphi(s) = L(C_a^s)$. (That is, we consider the length of the curve C, starting at t = a, as a function of the right endpoint of the parameter interval.) Suppose
h > 0. Since arc length is additive, we have $L(C_s^{s+h}) = \varphi(s+h) - \varphi(s)$, so by the inequality we have just proved (applied to the curve $C_s^{s+h}$) and the mean value theorem for integrals,
$$ L(C_s^{s+h}) = \varphi(s+h) - \varphi(s) \le \int_s^{s+h} |\mathbf g'(t)|\,dt = h\,|\mathbf g'(\sigma)|, $$
where σ is some number between s and s + h. On the other hand, $|\mathbf g(s+h) - \mathbf g(s)|$ is $L_P(C_s^{s+h})$ where P is the trivial partition {s, s + h}, and hence it is no bigger than $L(C_s^{s+h})$. Combining these estimates and dividing by h, we see that
$$ \left| \frac{\mathbf g(s+h) - \mathbf g(s)}{h} \right| \le \frac{\varphi(s+h) - \varphi(s)}{h} \le |\mathbf g'(\sigma)|. $$
As h → 0, the quantities on the left and right approach |g′ (s)|, and hence so does
the one in the middle. A slight modification of this argument works also for h < 0,
so we conclude that ϕ is differentiable and that ϕ′ (s) = |g′ (s)|. The desired result
is now immediate:
$$ L(C) = \varphi(b) = \varphi(b) - \varphi(a) = \int_a^b |\mathbf g'(s)|\,ds. $$
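The definition of L(C) as a supremum of inscribed polygonal lengths can be watched converging in a simple case: for the unit circle, an inscribed regular n-gon has length 2n sin(π/n), which increases to 2π as the partition is refined. A Python sketch (the partition sizes are arbitrary choices):

```python
import math

def polygon_length(n):
    # L_P(C) for the unit circle g(t) = (cos t, sin t) with a uniform partition
    pts = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
           for k in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

for n in (4, 16, 256, 4096):
    print(n, polygon_length(n))   # increases toward 2*pi
```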
EXERCISES
1. Find the arc length of the following parametrized curves:
a. $\mathbf g(t) = (a\cos t,\ a\sin t,\ bt)$, t ∈ [0, 2π].
b. $\mathbf g(t) = (\frac{1}{3}t^3 - t,\ t^2)$, t ∈ [0, 2].
c. $\mathbf g(t) = (\log t,\ 2t,\ t^2)$, t ∈ [1, e].
d. $\mathbf g(t) = (6t,\ 4t^{3/2},\ -4t^{3/2},\ 3t^2)$, t ∈ [0, 2].
2. Express the arc length of the following curves in terms of the integral
$$ E(k) = \int_0^{\pi/2} \sqrt{1 - k^2\sin^2 t}\,dt \qquad (0 < k < 1), $$
for suitable values of k. (E(k) is one of the standard elliptic integrals, so
called because of their connection with the arc length of an ellipse.)
a. An ellipse with semimajor axis a and semiminor axis b.
b. The portion of the intersection of the sphere x2 + y 2 + z 2 = 4 and the
cylinder x2 + y 2 − 2y = 0 lying in the first octant.
3. Find the centroid of the curve y = cosh x, −1 ≤ x ≤ 1.
4. Compute $\int_C \sqrt{z}\,ds$ where C is parametrized by $\mathbf g(t) = (2\cos t,\ 2\sin t,\ t^2)$, 0 ≤ t ≤ 2π.
5. Compute $\int_C \mathbf F\cdot d\mathbf x$ for the following F and C:
a. $\mathbf F(x,y,z) = (yz,\ x^2,\ xz)$; C is the line segment from (0, 0, 0) to (1, 1, 1).
b. F is as in (a); C is the portion of the curve y = x², z = x³ from (0, 0, 0) to (1, 1, 1).
c. $\mathbf F(x,y) = (x - y,\ x + y)$; C is the circle x² + y² = 1, oriented clockwise.
d. $\mathbf F(x,y) = (x^2 y,\ x^3 y^2)$; C is the closed curve formed by portions of the line y = 4 and the parabola y = x², oriented counterclockwise.
6. Compute the following line integrals:
a. $\int_C (x e^{-y}\,dx + \sin \pi x\,dy)$, where C is the portion of the parabola y = x² from (0, 0) to (1, 1).
b. $\int_C (y\,dx + z\,dy + xy\,dz)$, where C is given by x = cos t, y = sin t, z = t with 0 ≤ t ≤ 2π.
c. $\int_C (y\,dx - 2x\,dy)$, where C is the triangle with vertices (0, 0), (1, 0), and (1, 1), oriented counterclockwise.
7. Let $\mathbf F : \mathbf R^n \to \mathbf R^m$ be a continuous map, and let C be a $C^1$ curve in $\mathbf R^n$.
Use the mean value theorem to express the differences inside the square root in
terms of g′ and h′ , and then use Exercise 9 in §4.1 to give an alternate proof of
Theorem 5.11. (Exactly the same idea works for curves in Rn .)
the sum of the line integrals of F over the positively oriented closed curves that
make up ∂S.
5.12 Theorem (Green’s Theorem). Suppose S is a regular region in R2 with piece-
wise smooth boundary ∂S. Suppose also that F is a vector field of class C 1 on S.
Then
(5.13)  $\displaystyle\int_{\partial S} \mathbf F\cdot d\mathbf x = \iint_S \left( \frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2} \right) dA.$

In the more common notation, if we set F = (P, Q) and x = (x, y),

(5.14)  $\displaystyle\int_{\partial S} P\,dx + Q\,dy = \iint_S \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA.$
Proof. First we consider a very restricted class of regions, for which the proof is
quite simple. We shall say that the region S is x-simple if it is the region between
the graphs of two functions of x, that is, if it has the form
(5.15)  $S = \bigl\{(x,y) : a \le x \le b,\ \varphi_1(x) \le y \le \varphi_2(x)\bigr\},$
where ϕ1 and ϕ2 are continuous, piecewise smooth functions on [a, b]. Likewise,
we say that S is y-simple if it has the form
% &
(5.16) S = (x, y) : c ≤ y ≤ d, ψ1 (y) ≤ x ≤ ψ2 (y) ,
where ψ1 and ψ2 are continuous, piecewise smooth functions on [c, d].
EXAMPLE 1. The region bounded by the curve y = x³/8 − 1, the line x + 2y =
2, and the y-axis is both x-simple and y-simple. (See Figure 5.5.) It has the
forms (5.15) and (5.16) with

a = 0,  b = 2,  ϕ₁(x) = x³/8 − 1,  ϕ₂(x) = 1 − x/2,

c = −1,  d = 1,  ψ₁(y) = 0,  ψ₂(y) = { 2(y + 1)^(1/3)  if −1 ≤ y ≤ 0,
                                      { 2 − 2y          if 0 ≤ y ≤ 1.
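As a consistency check, the area of this region can be computed from either description, (5.15) or (5.16), and the two values agree (a sketch assuming SymPy; not part of the text):

```python
import sympy as sp

# Area of the Example 1 region computed from both descriptions.
x, y = sp.symbols('x y')

# x-simple description (5.15): integrate phi2 - phi1 over [0, 2].
area_x = sp.integrate((1 - x/2) - (x**3/8 - 1), (x, 0, 2))

# y-simple description (5.16): integrate psi2 - psi1 over [-1, 1], in two pieces.
area_y = (sp.integrate(2 * (y + 1)**sp.Rational(1, 3), (y, -1, 0))
          + sp.integrate(2 - 2*y, (y, 0, 1)))
print(area_x, area_y)  # 5/2 5/2
```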
224 Chapter 5. Line and Surface Integrals; Vector Analysis
In exactly the same way, using the representation (5.16) for S, we see that

∫_∂S Q dy = ∬_S (∂Q/∂x) dA.

(There is no minus sign here, because if we take y as the parameter for the curves
x = ψ₁(y) and x = ψ₂(y), the orientation is wrong for ψ₁ and right for ψ₂.)
Adding these last two equalities, we obtain the desired result (5.14).
5.2. Green’s Theorem 225
FIGURE 5.6: A decomposition of the region in Figure 5.4 into simple subregions.
Thus Green’s theorem is established for regions that are both x-simple and y-
simple. There is now an immediate generalization to a much larger class of regular
regions. Namely, suppose the region S can be cut up into finitely many subregions,
say S = S1 ∪ ⋯ ∪ Sk, where
a. the Sj ’s may intersect along common edges but have disjoint interiors;
b. each Sj has a piecewise smooth boundary and is both x-simple and y-simple.
(See Figure 5.6.) Since the Sj’s overlap only in a set of zero content, by Corollary
4.23b we have

∬_S (∂Q/∂x − ∂P/∂y) dA = Σ_{j=1}^k ∬_{Sj} (∂Q/∂x − ∂P/∂y) dA.

Likewise, the line integral of F over ∂S is the sum of the line integrals over the
∂Sj’s, because the integrals over the parts of the boundaries of the Sj’s that are not parts of
the boundary of S all cancel out. In more detail, if Si and Sj have a common edge
C, then C will have one orientation as part of ∂Si and the opposite ; orientation
;
as part of ∂Sj , so the two integrals over C that make up parts of ∂Si and ∂Sj
will cancel each other. Therefore, we obtain Green’s theorem for the region S by
applying Green’s theorem to the simple regions Sj and adding up the results.
The result we have just obtained is sufficient for most practical purposes, but
it is not definitive. The class of regular regions that can be cut up into simple
subregions does not include all regions with C 1 boundary, much less all regions
with piecewise smooth boundary, and it may be difficult to tell whether a given
region has this property. For example, the region
{(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 + x³ sin x⁻¹}
is x-simple but cannot be cut up into finitely many y-simple subregions because
the graph of x3 sin x−1 has infinitely many “wiggles.” The deduction of the general
case from the special cases considered here requires some additional machinery that
is of interest in its own right; we present it in Appendix B.7 (Theorem B.28).
∬_D [∂/∂x (x² − x e^{xy} + log(1 + y⁴)) − ∂/∂y (√(1 + x²) − y e^{xy} + 3y)] dA

    = ∬_D (2x − 3) dA = −3π.
Let us see what Green’s theorem says when F is the gradient of a C 2 function
f , so that F1 = ∂1 f and F2 = ∂2 f . Formula (5.13) gives
∫_∂S ∇f · dx = ∬_S (∂₁∂₂f − ∂₂∂₁f) dA = ∬_S 0 dA = 0.

This is no surprise; it is easy to see directly that the line integral of a gradient over
any closed curve vanishes. Indeed, if the curve C is parametrized by x = g(t) with
g(a) = g(b), then by the chain rule,

∫_C ∇f · dx = ∫_a^b ∇f(g(t)) · g′(t) dt = ∫_a^b (d/dt) f(g(t)) dt
            = f(g(b)) − f(g(a)) = 0.
The formula (5.18) gives a more interesting result. ∇f · n is the directional deriva-
tive of f in the outward normal direction to ∂S, or normal derivative of f on ∂S,
often denoted by ∂f/∂n; and (5.18) says that

∫_∂S (∂f/∂n) ds = ∬_S (∂²f/∂x₁² + ∂²f/∂x₂²) dA.
The integrand on the right is the Laplacian of f , which we encountered in §2.6 and
which will play an important role in §5.6.
EXERCISES
7. The point of this exercise is to show how Green’s theorem can be used to de-
duce a special case of Theorem 4.41. Let U, V be connected open sets in R²,
and let G : U → V be a one-to-one transformation of class C¹ whose deriva-
tive DG(u) is invertible for all u ∈ U. Moreover, let S be a regular region in V
with piecewise smooth boundary, let A be its area, and let T = G⁻¹(S).
a. The Jacobian det DG is either everywhere positive or everywhere negative
on U; why?
b. Suppose det DG(u) > 0 for all u ∈ U. Write A = ∫_∂S y dx as in Ex-
ample 3, make a change of variable to transform this line integral into
a line integral over ∂T, and apply Green’s theorem to deduce that A =
∬_T det DG dA.
c. By a similar argument, show that if det DG(u) < 0 for all u ∈ U, then
A = −∬_T det DG dA = ∬_T |det DG| dA. Where does the minus sign
come from?
a smooth surface in R3 is to make a choice of one of the two unit normal vectors
at each point of the surface, in such a way that the choice varies continuously with
the point. The “positive” side of the surface is the one into which the normal arrow
points.
It is important to note that not every surface can be oriented. The standard
example of a nonorientable surface is the Möbius band, which can be constructed
by taking a long strip of paper, giving it a half twist, and gluing the ends together.
(That is, call the two sides of the original strip A and B; the ends are to be glued
together so that side A of one end matches with side B of the other.) A sketch of a
Möbius band is given in Figure 5.7, but the best way to appreciate the features of
the Möbius band is to make one for yourself.
However, if a surface forms part of the boundary of a regular region in R3 , it
is always orientable, and the standard specification for the orientation is that the
positive normal vector is the one pointing out of the region.
image under the map G is a small quadrilateral (with curved sides) on the surface
S whose vertices are G(u, v), G(u + ∆u, v), etc. (See Figure 3.4 in §3.3.) In the
limit in which the increments ∆u and ∆v become infinitesimals du and dv, this
quadrilateral becomes a parallelogram whose sides from the vertex x = G(u, v) to
the two adjacent vertices are described by the vectors
G(u + du, v) − G(u, v) = (∂G/∂u) du   and   G(u, v + dv) − G(u, v) = (∂G/∂v) dv.
These two vectors are tangent to the surface S at x, so their cross product is a
vector normal to S at x, whose magnitude is the area of the parallelogram they
span. Therefore, the element of area on S is given in terms of the parametrization
x = G(u, v) by
(5.19)    dA = |∂G/∂u × ∂G/∂v| du dv.
In other words, if R is a measurable subset of W in the uv-plane and G(R) is the
corresponding region in the surface S,
(5.20)    Area of G(R) = ∬_R |∂G/∂u × ∂G/∂v| du dv.
Thus,
(5.21)    dA = √[ (∂(y,z)/∂(u,v))² + (∂(z,x)/∂(u,v))² + (∂(x,y)/∂(u,v))² ] du dv.
Computationally, this is usually a horrible mess. (But what did you expect? Arc
length is already problematic; surface area must be worse!)
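For a surface where the computation is tractable, formula (5.19) does give the expected answer. Here is a check for the sphere of radius a (a sketch assuming SymPy; the spherical parametrization is this example's choice):

```python
import sympy as sp

# Area of the sphere of radius a via (5.19)-(5.20).
# phi is the polar angle, theta the azimuth.
a = sp.symbols('a', positive=True)
phi, th = sp.symbols('phi theta')
G = sp.Matrix([a*sp.sin(phi)*sp.cos(th),
               a*sp.sin(phi)*sp.sin(th),
               a*sp.cos(phi)])

normal = G.diff(phi).cross(G.diff(th))      # dG/dphi x dG/dtheta
norm_sq = sp.simplify(normal.dot(normal))   # a**4 * sin(phi)**2
dA = a**2 * sp.sin(phi)                     # = sqrt(norm_sq), since sin(phi) >= 0 on [0, pi]
assert sp.simplify(norm_sq - dA**2) == 0

area = sp.integrate(dA, (phi, 0, sp.pi), (th, 0, 2*sp.pi))
print(area)  # 4*pi*a**2
```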
5.3. Surface Area and Surface Integrals 231
As with arc length, we must verify that our informally-derived formula for sur-
face area really makes sense by checking that it is independent of the parametriza-
tion. Thus, suppose we make a change of variables (u, v) = Φ(s, t), where Φ is a
one-to-one C 1 map from a region V in the st-plane to the region W in the uv-plane.
The elements of area are then related by

du dv = |∂(u,v)/∂(s,t)| ds dt.

But by the chain rule and the fact that the determinant of a product is the product
of the determinants, we have

∂(y,z)/∂(s,t) = (∂(y,z)/∂(u,v)) (∂(u,v)/∂(s,t)),

and likewise for the other two terms. Hence, in the st-parametrization,

dA = √[ (∂(y,z)/∂(s,t))² + (∂(z,x)/∂(s,t))² + (∂(x,y)/∂(s,t))² ] ds dt.
(5.22)    ∂G/∂x × ∂G/∂y = −(∂x ϕ)i − (∂y ϕ)j + k,

          dA = √(1 + (∂x ϕ)² + (∂y ϕ)²) dx dy.
(Note that our surface is a level set of the function Φ(x, y, z) = z − ϕ(x, y) and
that −(∂x ϕ)i − (∂y ϕ)j + k = ∇Φ; we deduced that ∇Φ is normal to the surface
by other means in Theorem 2.37.)
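For a graph z = ϕ(x, y), formula (5.22) is easy to apply. As an illustration (assuming SymPy; the plane z = x + y over the unit square is this example's choice, not the text's):

```python
import sympy as sp

# Area of the graph z = x + y over the unit square, via (5.22).
x, y = sp.symbols('x y')
phi = x + y
dA = sp.sqrt(1 + sp.diff(phi, x)**2 + sp.diff(phi, y)**2)  # constant sqrt(3)
area = sp.integrate(dA, (x, 0, 1), (y, 0, 1))
print(area)  # sqrt(3)
```

The area element is constant here because the tilted plane makes a fixed angle with the xy-plane.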
say z = ϕ(x, y). As in the preceding discussion of surface area, we take x and y
as the parameters and find that
n dA = [−(∂x ϕ)i − (∂y ϕ)j + k] dx dy.
The orientation here is the one with the normal pointing upward, since its z com-
ponent is positive. Thus, if F = F1 i + F2 j + F3 k and G(x, y) = (x, y, ϕ(x, y)),
(5.23)    ∬_S F · n dA
          = ∬_W [−F₁(G(x, y)) (∂ϕ/∂x) − F₂(G(x, y)) (∂ϕ/∂y) + F₃(G(x, y))] dx dy.
Here and in what follows, we adopt the common practice of denoting by i, j,
and k the unit vectors in the positive coordinate directions and writing vector fields
in R3 as F = F1 i + F2 j + F3 k in preference to F = (F1 , F2 , F3 ); this serves to
emphasize the interpretation of F as a vector field rather than a transformation.
EXAMPLE 2. Let S be the portion of the cone x² + y² = z² with 0 ≤ z ≤ 1,
oriented so that the normal points upward, and let F(x, y, z) = x²i + yzj + yk.
Compute ∬_S F · n dA.
Solution. One way is to use polar coordinates as parameters: G(r, θ) =
(r cos θ, r sin θ, r). Then we have ∂r G = (cos θ)i + (sin θ)j + k and ∂θ G =
−(r sin θ)i + (r cos θ)j, so

∂r G × ∂θ G = −(r cos θ)i − (r sin θ)j + rk.

This gives the right orientation since the z component, namely r, is positive.
Thus,

∬_S F · n dA
    = ∫₀^{2π} ∫₀¹ [(r cos θ)²(−r cos θ) + (r sin θ)r(−r sin θ) + (r sin θ)r] dr dθ,
Finally, as a practical matter we need to extend the ideas in this section from
smooth surfaces to piecewise smooth surfaces. Giving a satisfactory general def-
inition of a “piecewise smooth surface” is a rather messy business, and we shall
not attempt it. For our present purposes, it will suffice to assume that the surface S
under consideration is the union of finitely many pieces S1 , . . . , Sk that satisfy the
following conditions:
i. Each Sj admits a smooth parametrization as discussed above.
ii. The intersections Si ∩ Sj are either empty or finite unions of smooth curves.
We then define integration over S in the obvious way:
∬_S f dA = Σ_{j=1}^k ∬_{Sj} f dA.
Condition (ii) guarantees that the parts of S that are counted more than once on
the right, namely the intersections Si ∩ Sj , contribute nothing to the integral, by
Propositions 4.19 and 4.22.
EXAMPLE 3.
a. Let S be the surface of a cube; then we can take S1 , . . . , S6 to be the faces
of the cube.
b. Let S be the surface of the cylindrical solid {(x, y, z) : x² + y² ≤ 1, |z| ≤
1}. We can write S = S1 ∪ S2 ∪ S3 where S1 and S2 are the discs forming
the top and bottom and S3 is the circular vertical side. S1 and S2 can be
parametrized by (x, y) → (x, y, 1) and (x, y) → (x, y, −1) with x² + y² ≤
1, and S3 can be parametrized by (θ, z) → (cos θ, sin θ, z) with 0 ≤ θ <
2π and |z| ≤ 1. If one wishes to use only one-to-one parametrizations with
compact parameter domains, one can cut S3 further into two pieces, say
the left and right halves defined by 0 ≤ θ ≤ π and π ≤ θ ≤ 2π.
Remark. In condition (ii) above, we have in mind that the sets Sj will intersect
each other only along their edges, although there is nothing to forbid them from
crossing one another. For example, S could be the union of the two spheres S1 =
{x : |x| = 1} and S2 = {x : |x − i| = 1}. This added generality is largely useless
but also harmless.
EXERCISES
1. Find the area of the part of the surface z = xy inside the cylinder x² + y² = a².
2. Find the area of the part of the surface z = x² + y² inside the cylinder x² + y² =
a².
3. Suppose 0 < a < b. Find the area of the torus obtained by revolving the circle
(x − b)² + z² = a² in the xz-plane about the z axis. (Hint: The torus may be
parametrized by x = (b + a cos ϕ) cos θ, y = (b + a cos ϕ) sin θ, z = a sin ϕ,
with 0 ≤ ϕ, θ ≤ 2π.)
4. Find the area of the ellipsoid (x/a)² + (y/a)² + (z/b)² = 1.
5. Find the centroid of the upper hemisphere of the unit sphere x² + y² + z² = 1.
6. Compute ∬_S (x² + y²) dA where S is the portion of the sphere x² + y² + z² = 4
with z ≥ 1.
7. Compute ∬_S (x² + y² − 2z²) dA where S is the unit sphere. Can you find the
answer by symmetry considerations without doing any calculations?
8. Calculate ∬_S F · n dA for the following F and S.
a. F(x, y, z) = xzi − xyk; S is the portion of the surface z = xy with
0 ≤ x ≤ 1, 0 ≤ y ≤ 2, oriented so that the normal points upward.
b. F(x, y, z) = x²i + zj − yk; S is the unit sphere x² + y² + z² = 1, oriented
so that the normal points outward (away from the center).
c. F(x, y, z) = xyi + zj; S is the triangle with vertices (2, 0, 0), (0, 2, 0),
(0, 0, 2), oriented so that the normal points upward.
d. F(x, y, z) = z²k; S is the boundary of the region x² + y² ≤ 1, a ≤ z ≤ b,
oriented so that the normal points out of the region. (You should be able to
do this in your head.)
e. F(x, y, z) = xi + yj + zk; S is the boundary of the region x² + y² ≤ z ≤
√(2 − x² − y²), oriented so that the normal points out of the region.
∇ = (∂1 , . . . , ∂n ).
We are already familiar with this notation in connection with the gradient of a C 1
function on Rn , which is the vector field defined by
grad f = ∇f = (∂1 f, . . . , ∂n f ).
div F = ∇ · F = ∂1 F1 + · · · + ∂n Fn .
(Some authors write rot F instead of curl F; “rot” stands for “rotation.”) Again,
the curl has a geometric significance that will be explained later, in §5.7.
We shall employ the notations div F and curl F in preference to ∇·F and ∇×F
because they seem to be more readable. In this section we shall also write grad f
instead of ∇f for the sake of consistency; later we shall use these two notations
interchangeably.
The operators grad, curl, and div satisfy product rules with respect to scalar
multiplication and dot and cross products. As these rules are useful and some of
them are not obvious, it is well to make a list for handy reference. In the following
formulas, f and g are real-valued functions and F and G are vector fields, all of
class C 1 .
It is an important fact that the first two of these always vanish, by the equality
of mixed partials:
(5.30) curl(grad f )
= (∂2 ∂3 f − ∂3 ∂2 f )i + (∂3 ∂1 f − ∂1 ∂3 f )j + (∂1 ∂2 f − ∂2 ∂1 f )k = 0
and
(5.31) div(curl F)
= ∂1 (∂2 F3 − ∂3 F2 ) + ∂2 (∂3 F1 − ∂1 F3 ) + ∂3 (∂1 F2 − ∂2 F1 ) = 0.
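Both identities can be confirmed symbolically for arbitrary C² data (a sketch assuming SymPy, which identifies mixed partials automatically):

```python
import sympy as sp

# Symbolic check of (5.30) and (5.31) for arbitrary smooth f and F.
x1, x2, x3 = sp.symbols('x1 x2 x3')
f = sp.Function('f')(x1, x2, x3)
F = [sp.Function('F%d' % i)(x1, x2, x3) for i in (1, 2, 3)]
X = (x1, x2, x3)

def grad(g):
    return [sp.diff(g, v) for v in X]

def curl(G):
    return [sp.diff(G[2], X[1]) - sp.diff(G[1], X[2]),
            sp.diff(G[0], X[2]) - sp.diff(G[2], X[0]),
            sp.diff(G[1], X[0]) - sp.diff(G[0], X[1])]

def div(G):
    return sum(sp.diff(G[i], X[i]) for i in range(3))

print([sp.simplify(c) for c in curl(grad(f))])  # [0, 0, 0]
print(sp.simplify(div(curl(F))))                # 0
```

The cancellation happens term by term, exactly as in (5.30) and (5.31), because equality of mixed partials holds for C² functions.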
Schematically, we have
and (5.30) and (5.31) say that the composition of two successive mappings is zero.
The third combination, div(grad f ), which makes sense in any number of di-
mensions, is of fundamental importance for both physical and purely mathematical
reasons. It is called the Laplacian of f and is usually denoted by ∇²f or ∆f:

∇²f = div(grad f) = ∂₁²f + ⋯ + ∂ₙ²f.
The last two combinations are of less interest by themselves, but together they yield
the Laplacian for vector fields in R³:

(5.33)    ∇(div F) − curl(curl F) = ∇²F,

where ∇²F is computed componentwise: ∇²F = (∇²F₁)i + (∇²F₂)j + (∇²F₃)k.
EXERCISES
where W is a regular region in the xy-plane and ϕ1 and ϕ2 are piecewise smooth
functions on W . We define the notions of yz-simple and xz-simple similarly, and
we say that R is simple if it is xy-simple, yz-simple, and xz-simple.
Suppose now that R is simple. We shall prove the divergence theorem for
the region R by considering the components of F separately. That is, let F =
F1 i + F2 j + F3 k; we shall show that
∬_∂R F₃k · n dA = ∭_R ∂₃F₃ dV,
and similarly for the other two components. Since R is xy-simple, the boundary
∂R consists of three pieces: the “top” and “bottom” surfaces z = ϕ2 (x, y) and
z = ϕ1 (x, y) and the “sides” consisting of the union of the vertical line segments
from (x, y, ϕ1 (x, y)) to (x, y, ϕ2 (x, y)) as (x, y) ranges over the boundary of W .
The outward normal to R is horizontal on the sides, i.e., k · n = 0 there, so the
sides contribute nothing to the surface integral. For the top and bottom surfaces we
use (5.23). The outward normal points upward on the top surface and downward
on the bottom surface, so
∬_∂R F₃k · n dA = ∬_W F₃(x, y, ϕ₂(x, y)) dx dy − ∬_W F₃(x, y, ϕ₁(x, y)) dx dy

                = ∬_W [ ∫_{ϕ₁(x,y)}^{ϕ₂(x,y)} ∂₃F₃(x, y, z) dz ] dx dy

                = ∭_R ∂₃F₃(x, y, z) dV,
as claimed. The proof for F1 i and F2 j is the same, using the assumptions that R is
yz-simple and xz-simple.
It now follows that the divergence theorem is valid for regions that can be cut
up into finitely many simple regions R1 , . . . , Rk . The integrals of div F over the
regions R1 , . . . , Rk add up to the integral over R, and the integrals of F · n over
the boundaries ∂R1 , . . . , ∂Rk add up to the integral over ∂R because the integrals
over the portions of the ∂Rj ’s that are not part of ∂R cancel out. (The reasoning is
the same as in the proof of Green’s theorem.)
The completion of the proof for general regular regions with smooth boundary,
with indications of how to generalize it to the piecewise smooth case, is given in
Appendix B.7 (Theorem B.30).
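The theorem can be spot-checked on a simple region (a sketch assuming SymPy; the unit cube and the sample field are this example's choices, not the text's):

```python
import sympy as sp

# Divergence theorem on the unit cube for F = (xy, yz, zx).
x, y, z = sp.symbols('x y z')
F = sp.Matrix([x*y, y*z, z*x])

# Volume integral of div F = y + z + x.
vol = sp.integrate(sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z),
                   (x, 0, 1), (y, 0, 1), (z, 0, 1))

# Flux through the six faces; on each face only the normal component survives.
flux = (sp.integrate(F[0].subs(x, 1) - F[0].subs(x, 0), (y, 0, 1), (z, 0, 1))
        + sp.integrate(F[1].subs(y, 1) - F[1].subs(y, 0), (x, 0, 1), (z, 0, 1))
        + sp.integrate(F[2].subs(z, 1) - F[2].subs(z, 0), (x, 0, 1), (y, 0, 1)))

print(vol, flux)  # 3/2 3/2
```

The cube is simple in all three senses, so this is exactly the situation covered by the proof above.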
Armed with the divergence theorem, we can obtain a better understanding of
the meaning of div F. Suppose F is a vector field of class C 1 in some open set
containing the point a. For r > 0, let Br be the ball of radius r about a. If r is
very small, the average value of div F(x) on the ball Br is very nearly equal to
div F(a). Therefore, by the divergence theorem,
(5.36)    div F(a) ≈ (3/(4πr³)) ∭_{Br} div F dV = (3/(4πr³)) ∬_{∂Br} F · n dA.
5.5. The Divergence Theorem 241
The integral on the right is the flux of F across ∂Br from the inside (Br ) to the
outside (the complement of Br ). If we think of the vector field as representing
the flow of some substance through space, the integral represents the amount of
substance flowing out of Br minus the amount of substance flowing in; thus, the
condition div F(a) > 0 means that there is a net outflow near a, in other words,
that F tends to “diverge” from a. (The effect is subtle, though: One has to divide
the flux by r 3 in (5.36) to get something that does not vanish in the limit.) In any
case, the integral in (5.36) is a geometrically defined quantity that is independent
of the choice of coordinates; this gives the promised coordinate-free interpretation
of div F.
Among the important consequences of the divergence theorem are the follow-
ing identities.
5.37 Corollary (Green’s Formulas). Suppose R is a regular region in R3 with
piecewise smooth boundary, and f and g are functions of class C 2 on R. Then
(5.38)    ∬_∂R f ∇g · n dA = ∭_R (∇f · ∇g + f ∇²g) dV,

(5.39)    ∬_∂R (f ∇g − g ∇f) · n dA = ∭_R (f ∇²g − g ∇²f) dV.
Proof. An application of the product rule (5.28) shows that div(f ∇g) = ∇f ·
∇g + f ∇²g, so the divergence theorem applied to F = f ∇g yields (5.38). The
corresponding equation with f and g switched also holds; by subtracting the latter
equation from the former we obtain (5.39).
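The product rule used in this proof, div(f ∇g) = ∇f · ∇g + f ∇²g, can be confirmed symbolically for arbitrary smooth f and g (a sketch assuming SymPy):

```python
import sympy as sp

# Symbolic check of div(f grad g) = grad f . grad g + f * laplacian(g).
x, y, z = sp.symbols('x y z')
f = sp.Function('f')(x, y, z)
g = sp.Function('g')(x, y, z)
X = (x, y, z)

lhs = sum(sp.diff(f * sp.diff(g, v), v) for v in X)
rhs = (sum(sp.diff(f, v) * sp.diff(g, v) for v in X)
       + f * sum(sp.diff(g, v, 2) for v in X))
print(sp.simplify(lhs - rhs))  # 0
```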
EXERCISES
In several of these exercises it will be useful to note that if Sr is the sphere of
radius r about the origin, the unit outward normal to Sr at a point x ∈ Sr is just
r⁻¹x. This is geometrically obvious if you think about it a little. Alternatively,
since Sr is a level set of the function |x|² = x² + y² + z², we know that ∇(|x|²) =
2xi + 2yj + 2zk = 2x is normal to Sr, so the unit normal is |x|⁻¹x = r⁻¹x for
x ∈ Sr.
1. Use the divergence theorem to evaluate the surface integral ∬_S F · n dA for the
following F and S, where S is oriented so that the positive normal points out
of the region bounded by S.
a. F, S as in Exercise 8b in §5.3.
b. F, S as in Exercise 8e in §5.3.
c. F(x, y, z) = x²i + y²j + z²k; S is the surface of the cube 0 ≤ x, y, z ≤ a.
d. F(x, y, z) = (x/a²)i + (y/b²)j + (z/c²)k; S is the ellipsoid (x/a)² +
(y/b)² + (z/c)² = 1.
e. F(x, y, z) = x²i − 2xyj + z²k; S is the surface of the cylindrical solid
{(x, y, z) : (x, y) ∈ W, 1 ≤ z ≤ 2} where W is a smoothly bounded
regular region in the plane with area A.
2. Let F(x, y, z) = (x² + y² + z²)(xi + yj + zk) and let S be the sphere of radius
a about the origin. Compute ∬_S F · n dA both directly and by the divergence
theorem.
3. Let R be a regular region in R³ with piecewise smooth boundary. Show that
the volume of R is (1/3) ∬_∂R F · n dA where F(x, y, z) = xi + yj + zk.
4. Prove the following integration-by-parts formula for triple integrals:

∭_R f (∂g/∂x) dV = −∭_R (∂f/∂x) g dV + ∬_∂R f g nₓ dA,

where nₓ is the x-component of the unit outward normal to ∂R. (Of course,
similar formulas also hold with x replaced by y and z.)
5. Suppose R is a regular region in R³ with piecewise smooth boundary, and f is
a function of class C² on R.
a. Show that ∬_∂R (∂f/∂n) dA = ∭_R ∇²f dV.
b. Show that if ∇²f = 0, then ∬_∂R f (∂f/∂n) dA = ∭_R |∇f|² dV.
6. Let x = (x, y, z) and g(x) = |x|⁻¹ = (x² + y² + z²)^(−1/2).
a. Compute ∇g(x) for x ≠ 0.
b. Show that ∇²g(x) = 0 for x ≠ 0. (Cf. Exercise 9 in §2.6.)
c. Show by direct calculation that ∬_S (∂g/∂n) dA = −4π if S is any sphere
centered at the origin.
d. Since ∂g/∂n = ∇g · n and ∇²g = div(∇g), why do (b) and (c) not
contradict the divergence theorem?
e. Show that ∬_∂R (∂g/∂n) dA = −4π if R is any regular region with piece-
wise smooth boundary whose interior contains the origin. (Hint: Consider
the region obtained by excising a small ball about the origin from R.)
5.6. Some Applications to Physics 243
FIGURE 5.8: Flow through a surface element dS with velocity v: (a) v parallel to the normal n; (b) v at an angle θ to n.
|v| dt dA, so the amount of substance in the box is ρ|v| dt dA. In short, the rate of
flow of substance through dS is ρ|v| dA.
Now suppose, more generally, that the angle from the velocity v to the normal
n to dS is θ. We apply the same reasoning to the box in Figure 5.8b. The vertical
height of the box is now | cos θ| times the slant height of dS, so the volume of the
box is |v| | cos θ| dt dA = |v · n| dt dA. Therefore, the rate of flow of substance
through dS is ρv · n dA if we take orientation into account, that is, if we count the
flow as negative when it goes in across dS in the direction opposite to n.
Passing from the infinitesimal level to the macroscopic level, we conclude that
the rate of flow of substance through a surface S is

∬_S J · n dA,   where J(x, t) = ρ(x, t)v(x, t).
The time-dependent vector field J = ρv that occurs here represents the momentum
density if ρ is the mass density of the substance, and it represents the current density
if the substance is electric charge and ρ is the charge density. Our earlier remarks
about interpreting vector fields in terms of flows really mean thinking of the vector
field as a momentum or current density.
(The integral on the right is positive when the substance flows out of S, i.e., when
the amount of substance in S is decreasing; hence the minus sign.) The quantity
on the left is the integral over R of ∂ρ/∂t, by Theorem 4.47. We can use the
divergence theorem to convert the integral on the right to another integral over R,
obtaining
(5.40)    ∭_R (∂ρ/∂t)(x, t) dV = −∭_R div J dV.
Now, this relation holds for any region R. In particular, let us take R = Br to
be the ball of radius r centered at the point x. After division of both sides by the
volume of Br , (5.40) says that the mean values of ∂ρ/∂t and − div J on Br are
equal. Letting r → 0 and assuming that these functions are continuous, we see that
their values at the center x are equal. In short, we have
(5.41)    ∂ρ/∂t + div J = 0,
the classic differential equation relating the charge and current densities (or mass
and momentum densities, etc.).
This argument is reversible; that is, (5.41) implies that the substance is con-
served. Indeed, suppose R is a regular region such that no substance flows in or out
of R. Integrating (5.41) and using Theorem 4.47 and the divergence theorem, we
obtain
(d/dt) ∭_R ρ dV = ∭_R (∂ρ/∂t) dV = −∭_R div J dV = −∬_∂R J · n dA = 0,

so the total amount of substance in R remains constant in time.
The Heat Equation. We now derive a mathematical model for the transfer
of heat through a substance by diffusion. (If the substance in question is a fluid
like water or air, our model does not take convection effects into account; we must
assume that the fluid is immobile on the macroscopic scale. But our model is valid
for the diffusion of heat in solids as well as in fluids that cannot flow readily, such
as air in a down jacket.) Our model will take the form of a differential equation for
the temperature u(x, t) at position x and time t.
The first basic physical assumption (which may be a simplification of the real-
life situation) is that the thermal energy density is proportional to the temperature.
The constant of proportionality σ is the specific heat density; it is the product of the
usual specific heat or heat capacity and the mass density of the substance. The total
thermal energy (or “heat,” for short) within a region R at time t is then

∭_R σu(x, t) d³x.
The next assumption is Newton’s law of cooling, which says that heat flows
from hotter to colder regions at a rate proportional to the difference in temperature.
In our situation, the precise interpretation of this statement is that the flux of heat
per unit area in the direction of the unit vector n at the point x is proportional to the
directional derivative ∇u(x) · n of the temperature in the direction n, the constant
of proportionality being negative since heat flows in the direction of decreasing
temperature. Denoting the constant of proportionality by −K, then, we see that the
flux of heat across an oriented surface S with normal vector n is

−∬_S K ∇u · n dA.
Here n denotes the unit outward normal to ∂R, as usual, and the minus sign on the
surface integral has disappeared because a positive flow of heat out of R represents
a decrease of heat in R.
As in the preceding subsection, we bring the d/dt inside the integral and apply
the divergence theorem to obtain

∭_R σ (∂u/∂t) dV = ∭_R K ∇²u dV + ∭_R F dV.
Since this holds for an arbitrary regular region R, we conclude as before that

(5.42)    σ (∂u/∂t)(x, t) = K ∇²u(x, t) + F(x, t).
This partial differential equation is known as the (inhomogeneous) heat equation;
it is of fundamental importance in the study of all sorts of diffusion processes. The
important special case F = 0 (the homogeneous equation) is what is usually called
the heat equation.
We have implicitly assumed that the specific heat density σ and the thermal
conductivity K are constants. However, the same arguments apply to the more
general situation where they are allowed to depend on position, as will be the case
where the material through which the heat is diffusing varies in some way from
point to point. The reader may verify that the result is the following generalized
heat equation:
σ(x) (∂u/∂t)(x, t) = div(K(x) ∇u(x, t)) + F(x, t).
Potentials and Laplace’s Equation. The electric field generated by a system
of electric charges is the vector field E whose value at a point x is the force felt
by a unit positive charge located at x as the result of the electrostatic attraction or
repulsion to the system of charges. If the system is just a single unit positive charge
at the point p, the field is given by the usual inverse square law force, E(x) =
(x − p)/|x − p|3 . (There should be a constant of proportionality, but we shall
assume that units of measurement have been chosen so that the constant is 1.) For
many purposes, it is more convenient to work with the electric potential u(x) =
|p − x|−1 , which is related to the electric field E by
E = −∇u.
(For any points x1 and x2 , u(x2 )−u(x1 ) is the work done in moving a unit positive
charge from x1 to x2 through the field E.)
If, instead of a single charge at one point, our system of charges consists of
a number of charges located at different points, the electric field (resp. electric
potential) generated by the system is just the sum of the fields (resp. potentials)
generated by the individual charges. We wish to consider the case where there
is a continuous distribution of charge (an idealization, but a useful one) in some
bounded region of space. That is, we are given a charge density function ρ(p), a
continuous function that vanishes outside some bounded set R. The field generated
by such a charge distribution is found in the usual way: Chop up the set R into tiny
pieces, treat the charge coming from each piece as a point charge, and add up the
resulting fields or potentials. We shall work primarily with the potentials, for which
the result is
(5.43)    u(x) = ∭_{R³} ρ(p)/|p − x| d³p.
Proof. We can differentiate u by passing the derivatives under the integral sign.
They fall on ρ, which is assumed to be of class C 2 , so u is of class C 2 and
∇²u(x) = ∭ (∇²ρ(x + y) / |y|) d³y.
(Strictly speaking, Theorem 4.47 does not apply because of the singularity of the
integrand at the origin, but this is a minor technicality. One can finesse the problem,
for example, by switching to spherical coordinates, in which the r 2 sin ϕ coming
from the volume element cancels the r −1 of the integrand with room to spare.)
Here ∇2 ρ(x + y) is obtained by differentiating ρ with respect to x, but the same
result is obtained by taking the derivatives with respect to y, for ∂xj [ρ(x + y)] =
(∂j ρ)(x + y) = ∂yj [ρ(x + y)]. We can therefore use Green’s formula to transfer
the derivatives to |y|−1 . We need to take some care, however, since the singularity
of |y|−1 does not remain harmless after being differentiated twice.
Let us fix the point x and choose positive numbers ϵ and K, with ϵ < 1 and K
large enough so that ρ(x + y) = 0 if |y| ≥ K − 1. Let Rϵ,K = {y : ϵ < |y| < K}.
We then have

∇²u(x) = lim_{ϵ→0} ∭_{R_{ϵ,K}} (∇²ρ(x + y) / |y|) d³y.
The integrand has no singularities in the region Rϵ,K , so we can apply Green’s
formula (5.39) to obtain
∇²u(x) = lim_{ϵ→0} [ ∭_{R_{ϵ,K}} ρ(x + y) ∇²(|y|⁻¹) d³y

          + ∬_{∂R_{ϵ,K}} (∇ρ(x + y)|y|⁻¹ − ρ(x + y)∇(|y|⁻¹)) · n dA ].
The integral over Rϵ,K on the right vanishes by (5.45). Also, the boundary of Rϵ,K
consists of two pieces, the sphere |y| = K and the sphere |y| = ϵ, and the integral
over |y| = K is zero because ρ(x + y) and its derivatives vanish for |y| > K − 1.
Therefore,
(5.47)    ∇²u(x) = lim_{ϵ→0} ∬_{|y|=ϵ} [∇ρ(x + y)|y|⁻¹ − ρ(x + y)∇(|y|⁻¹)] · n dA.
Here n denotes the unit normal to the sphere |y| = ϵ that is outward with respect
to Rϵ,K and hence inward in the usual sense.
Since the first derivatives of ρ are continuous, |∇ρ(x + y)| is bounded by some
constant C for |y| ≤ 1, and hence
(5.48)    | ∬_{|y|=ϵ} (∇ρ(x + y) · n / |y|) dA | ≤ ∬_{|y|=ϵ} (C/ϵ) dA = (C/ϵ) 4πϵ² = 4πCϵ,
But the expression inside the brackets is just the mean value of ρ(x + y) on the
sphere |y| = ϵ, which tends to ρ(x) as ϵ → 0, so the proof is complete.
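The harmonicity of |y|⁻¹ away from the origin, the fact (5.45) invoked in the proof above, can be checked symbolically (a sketch assuming SymPy):

```python
import sympy as sp

# Check that g(y) = |y|^(-1) is harmonic away from the origin.
y1, y2, y3 = sp.symbols('y1 y2 y3')
g = 1 / sp.sqrt(y1**2 + y2**2 + y3**2)
lap = sum(sp.diff(g, v, 2) for v in (y1, y2, y3))
print(sp.simplify(lap))  # 0
```

Each second derivative contributes −|y|⁻³ + 3yⱼ²|y|⁻⁵, and the three contributions sum to zero.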
5.49 Corollary. The electric field E is related to the charge density ρ by div E =
4πρ.
density ρ, and the current density J. In suitably normalized units, they are

(5.50)    div E = 4πρ,   curl E = −(1/c) ∂B/∂t,
          div B = 0,     curl B = (1/c) ∂E/∂t + (4π/c) J,
where c is the speed of light. This is not the place for a thorough study of Maxwell’s
equations and their consequences for physics, but we wish to point out a couple of
features of them in connection with the ideas we have been developing. In what
follows we shall assume that all functions in question are of class C 2 , so that the
second derivatives make sense and the mixed partials are equal.
First, Maxwell’s equations contain the law of conservation of charge. Indeed,
by formula (5.31) we have
∂ρ/∂t = (1/4π) div(∂E/∂t) = (c/4π) div(curl B) − div J = −div J,
and this is the conservation law in the form (5.41). Second, in a region of space
with no charges or currents (ρ = 0 and J = 0), by formula (5.33) we have
∇²E = ∇(div E) − curl(curl E) = 0 + (1/c) curl(∂B/∂t) = (1/c²) ∂²E/∂t²
and
∇²B = ∇(div B) − curl(curl B) = 0 − (1/c) curl(∂E/∂t) = (1/c²) ∂²B/∂t².
That is, the components of E and B all satisfy the differential equation
(5.51)    ∇²f = (1/c²) ∂²f/∂t².
This is the wave equation, another of the fundamental equations of mathematical
physics. It describes the propagation of waves in many different situations; here it
concerns electromagnetic radiation — light, radio waves, X-rays, and so on.
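One can verify directly that a one-dimensional plane wave satisfies (5.51) (a sketch assuming SymPy; the particular waveform is this example's choice, not the text's):

```python
import sympy as sp

# f(x, t) = sin(k(x - c t)) satisfies f_xx = (1/c^2) f_tt.
x, t, k, c = sp.symbols('x t k c')
f = sp.sin(k * (x - c * t))
residual = sp.diff(f, x, 2) - sp.diff(f, t, 2) / c**2
print(sp.simplify(residual))  # 0
```

The profile travels to the right with speed c, which is why the x- and t-derivatives balance.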
EXERCISES
Besides distributions of charge or mass in 3-space, one can consider distributions on
surfaces or curves (physically: thin plates or wires). The formula for the associated
potential or field is similar to (5.43) except that the triple integral is replaced by a
surface or line integral, and the density ρ represents charge or mass per unit area or
unit length rather than unit volume. In the following exercises, “uniform” means
“of constant density.”
252 Chapter 5. Line and Surface Integrals; Vector Analysis
5.52 Theorem (Stokes’s Theorem). Let S and ∂S be as described above, and let
F be a C¹ vector field defined on some neighborhood of S in R³. Then

(5.53)   ∫_{∂S} F · dx = ∫∫_S (curl F) · n dA.
On the other hand, since the formalism of differentials automatically encodes the
chain rule,

    ∫_{∂S} F dx = ∫_{∂W} F [(∂x/∂u) du + (∂x/∂v) dv].
(In both of these equations, F and its derivatives are evaluated at G(u, v).) We
apply Green’s theorem to this last line integral:
    ∫_{∂W} F [(∂x/∂u) du + (∂x/∂v) dv] = ∫∫_W [∂/∂u (F ∂x/∂v) − ∂/∂v (F ∂x/∂u)] du dv.
By the product rule and the chain rule, the integrand on the right equals
    [(∂F/∂x)(∂x/∂u) + (∂F/∂y)(∂y/∂u) + (∂F/∂z)(∂z/∂u)] (∂x/∂v) + F ∂²x/∂u∂v
        − [(∂F/∂x)(∂x/∂v) + (∂F/∂y)(∂y/∂v) + (∂F/∂z)(∂z/∂v)] (∂x/∂u) − F ∂²x/∂v∂u
        = (∂F/∂z) ∂(z, x)/∂(u, v) − (∂F/∂y) ∂(x, y)/∂(u, v).
But this is the integrand on the right side of (5.55), so (5.54) is proved.
(No computation is necessary here; the integral of 1 is the area of the disc and
the integral of −2y vanishes by symmetry.)
There is an interesting feature of Stokes’s theorem that does not appear in its
siblings. A closed curve in R² is the boundary of just one regular region in R²,
and a closed surface in R³ is the boundary of just one regular region in R³; but a
closed curve in R³ is the boundary of infinitely many surfaces in R³! For example,
the unit circle in the xy-plane is the boundary of the unit disc in the xy-plane, the
upper and lower hemispheres of the unit sphere in R³, the portion of the paraboloid
z = 1 − x² − y² lying above the unit disc, and so forth. Stokes’s theorem says that
if C is a closed curve in R³ and S is any oriented surface bounded by C, then

    ∫_C F · dx = ∫∫_S (curl F) · n dA

for any C¹ vector field F, provided that the orientations on C and S are compatible.
EXAMPLE 2. Let F(x, y, z) = [e^{xz} + e^{x+2y}]i + [log(2 + y + z) + 2e^{x+2y}]j +
3xyzk. Compute ∫∫_S curl F · n dA, where S is the portion of the surface z =
1 − x² − y² above the xy-plane, oriented with the normal pointing upward.
    Solution. We have curl F(x, y, z) = [3xz − (2 + y + z)⁻¹]i + [xe^{xz} − 3yz]j
and n dA = (2xi + 2yj + k) dx dy, so direct evaluation of the integral is quite
unpleasant. By Stokes’s theorem, the integral equals ∫_C F · dx where C is
the unit circle in the xy-plane; this is not much better. However, by Stokes’s
theorem again, the latter line integral is equal to ∫∫_D curl F · n dA where D is
the unit disc in the xy-plane. Here n = k, so curl F · n = 0 and the integral
vanishes!
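The vanishing of the k-component of curl F, on which the example turns, can be confirmed with a short symbolic computation (a sketch using the sympy library, not part of the text):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# The field of Example 2, as reconstructed above.
F = [sp.exp(x*z) + sp.exp(x + 2*y),
     sp.log(2 + y + z) + 2*sp.exp(x + 2*y),
     3*x*y*z]

def curl(F):
    """Curl of a vector field in Cartesian coordinates."""
    return [sp.diff(F[2], y) - sp.diff(F[1], z),
            sp.diff(F[0], z) - sp.diff(F[2], x),
            sp.diff(F[1], x) - sp.diff(F[0], y)]

G = curl(F)
assert G[0] == 3*x*z - 1/(2 + y + z)   # i-component
assert G[1] == x*sp.exp(x*z) - 3*y*z   # j-component
assert G[2] == 0                       # k-component: so curl F . k = 0 on the disc
```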
Here is an analogue of the fact that the integral of the gradient of a function
over any closed curve vanishes:
Proof. If F extends differentiably to the region R inside S, this follows from the
divergence theorem, since div(curl F) = 0 for any F. However, it is true even if F
has singularities inside S. To see this, draw a small simple closed curve C in S (say,
the image of a small circle in the uv-plane under a parametrization x = G(u, v)).
C divides S into two regular regions S1 and S2 , and we have
(5.57)   ∫∫_S (curl F) · n dA = ∫∫_{S₁} (curl F) · n dA + ∫∫_{S₂} (curl F) · n dA.
On the other hand, if we give C the orientation compatible with S1 , Stokes’s theo-
rem gives
    ∫∫_{S₁} (curl F) · n dA = ∫_C F · dx = −∫∫_{S₂} (curl F) · n dA,
because the orientation compatible with S2 is the opposite one. Hence the terms on
the right of (5.57) cancel.
(Note: We had to say that C is a “small” closed curve, because otherwise C
might not divide S into two pieces. For example, take S to be a torus [the surface
of a doughnut] and C to be a circle that goes completely around S in one direction.)
EXERCISES
1. Use Stokes’s theorem to calculate ∫_C [(x − z) dx + (x + y) dy + (y + z) dz]
   where C is the ellipse where the plane z = y intersects the cylinder x² + y² = 1,
   oriented counterclockwise as viewed from above.
2. Use Stokes’s theorem to evaluate ∫_C [y dx + y² dy + (x + 2z) dz] where C is the
   curve of intersection of the sphere x² + y² + z² = a² and the plane y + z = a,
   oriented counterclockwise as viewed from above.
3. Given any nonvertical plane P parallel to the x-axis, let C be the curve of
   intersection of P with the cylinder x² + y² = a². Show that ∫_C [(yz − y) dx +
   (xz + x) dy] = 2πa².
4. Evaluate ∫∫_S curl F · n dA where F(x, y, z) = yi + (x − 2x³z)j + xy³k and S
   is the upper half of the sphere x² + y² + z² = a².
5. Let F(x, y, z) = 2xi + 2yj + (x² + y² + z²)k and let S be the lower half of the
   ellipsoid (x²/4) + (y²/9) + (z²/27) = 1. Use Stokes’s theorem to calculate
   the flux of curl F across S from the lower side to the upper side.
6. Define the vector field F on the complement of the z-axis by F(x, y, z) =
   (−yi + xj)/(x² + y²).
   a. Show that curl F = 0.
   b. Show by direct calculation that ∫_C F · dx = 2π for any horizontal circle C
      centered at a point on the z-axis.
   c. Why do (a) and (b) not contradict Stokes’s theorem?
7. Let C_r denote the circle of radius r about the origin in the xz-plane, oriented
   counterclockwise as viewed from the positive y-axis. Suppose F is a C¹ vector
   field on the complement of the y-axis in R³ such that ∫_{C₁} F · dx = 5 and
   curl F(x, y, z) = 3j + (zi − xk)/(x² + z²)². Compute ∫_{C_r} F · dx for every r.
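For Exercise 6, the circulation in part (b) can be checked numerically. Here is a sketch (not part of the text) using the numpy library, parametrizing a horizontal circle of radius r by (r cos t, r sin t); the integrand turns out to be identically 1, so the answer 2π is independent of r:

```python
import numpy as np

# F = (-y i + x j)/(x^2 + y^2), integrated around a horizontal circle of
# radius r centered on the z-axis: x = r cos t, y = r sin t, 0 <= t < 2*pi.
def circulation(r, n=10000):
    t = np.linspace(0.0, 2.0*np.pi, n, endpoint=False)
    x, y = r*np.cos(t), r*np.sin(t)
    dx, dy = -r*np.sin(t), r*np.cos(t)          # dx/dt, dy/dt
    integrand = (-y*dx + x*dy) / (x**2 + y**2)  # equals 1 identically
    return integrand.mean() * 2.0*np.pi         # midpoint rule on [0, 2*pi]

for r in (0.5, 1.0, 7.0):
    assert abs(circulation(r) - 2.0*np.pi) < 1e-8
```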
(5.61)   ∂G_j/∂x_k − ∂G_k/∂x_j = 0   for all j ≠ k.

We observe that when n = 3, the quantities in (5.61) are the components of curl G,
so that (5.61) is equivalent to the condition curl G = 0.
The condition (5.61) is almost sufficient to guarantee that G is a gradient; the
only possible problem arises from the geometry of R, as we shall explain in more
detail below. When R is convex, the problem disappears, and we have the following
result. Our proof will only be complete in dimensions 2 and 3 because it invokes
Green’s or Stokes’s theorem, but the same idea works in higher dimensions.
Proof. The idea is similar to the proof of Proposition 5.60, but we do not know
yet that condition (a) of Proposition 5.59 is satisfied, so we must be more careful.
Pick a base point a in R, and define f(x) for x ∈ R by f(x) = ∫_{L(a,x)} G · dx,
where L(a, x) is the line segment from a to x. (We need the hypothesis of
convexity so that this line segment lies in R.) To show that G(x) = ∇f(x), let
h = (h, 0, . . . , 0) be small enough so that x + h ∈ R. Let C be the triangular
closed curve obtained by following L(a, x) from a to x, L(x, x + h) from x to
x + h, and then L(a, x + h) backwards from x + h to a. Green’s theorem (if
n = 2), Stokes’s theorem (if n = 3), or the higher-dimensional version of Stokes’s
theorem (if n > 3; see §5.9) converts ∫_C G · dx into a double integral, over the
solid triangle whose boundary is C, whose integrand vanishes by (5.61). Hence
∫_C G · dx = 0, or in other words,

    f(x + h) − f(x) = ∫_{L(a,x+h)} G · dx − ∫_{L(a,x)} G · dx = ∫_{L(x,x+h)} G · dx.

Now the same argument as in Proposition 5.60 shows that ∂₁f = G₁, and likewise
∂_j f = G_j for the other j.
    G(x, y, z) = (−yi + xj)/(x² + y²).
The hypothesis on R that should replace convexity in Theorem 5.62 to give the
best result is that every simple closed curve in R is the boundary of a surface lying
entirely in R. (The proof requires more advanced techniques.) The region R in
Example 1 does not have this property; no closed curve that encircles the z-axis
can be the boundary of a surface in R.
In practice, if R is a rectangular box, to find a function whose gradient is G one
can proceed in a more simple-minded way than is indicated in the proof of Theorem
5.62. Consider the 2-dimensional case, where R = [a, b] × [α, β] and G(x, y) =
P(x, y)i + Q(x, y)j. Assuming that ∂_x Q = ∂_y P, we begin by integrating P with
respect to x, including a “constant” of integration that can depend on the other
variable y:

    f(x, y) = ∫_c^x P(t, y) dt + ϕ(y).
Here c can be any point in the interval [a, b]. Any such f will satisfy ∂x f = P . To
obtain ∂y f = Q, differentiate the formula for f with respect to y and use Theorem
4.47:
    ∂_y f(x, y) = ∫_c^x ∂_y P(t, y) dt + ϕ′(y) = ∫_c^x ∂_x Q(t, y) dt + ϕ′(y)
                = Q(x, y) − Q(c, y) + ϕ′(y).
Then ∂_y f = (xy + 1)e^{xy} + ϕ′(y); matching this up with the second component
yields ϕ′(y) = cos y, so we can take ϕ(y) = sin y. The general solution is

    f(x, y) = ye^{xy} + sin y + C.
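The two integration steps above are mechanical and can be carried out symbolically. The following sketch (not part of the text, using the sympy library) applies them to the example's field as reconstructed here, P = y²e^{xy} and Q = (xy + 1)e^{xy} + cos y:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)  # y > 0 keeps the antiderivative simple

# The example's field, as reconstructed: G = P i + Q j.
P = y**2 * sp.exp(x*y)
Q = (x*y + 1) * sp.exp(x*y) + sp.cos(y)
assert sp.simplify(sp.diff(Q, x) - sp.diff(P, y)) == 0  # necessary condition

# Step 1: integrate P in x, up to an unknown "constant" phi(y).
f0 = sp.integrate(P, x)                                 # y*exp(x*y)
# Step 2: phi'(y) = Q - d(f0)/dy; integrate it in y.
phi = sp.integrate(sp.simplify(Q - sp.diff(f0, y)), y)  # sin(y)
f = f0 + phi

assert sp.simplify(sp.diff(f, x) - P) == 0   # grad f = G, up to a constant
assert sp.simplify(sp.diff(f, y) - Q) == 0
```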
EXAMPLE 3. Let G(x, y, z) = yzi + (xz + y)j + (xy − z)k. An easy calculation
shows that curl G = 0. To find a function f such that ∇f = G, we integrate
the first component with respect to x, obtaining f(x, y, z) = xyz + ϕ(y, z).
Differentiating this in y and z yields ∂_y f = xz + ∂_y ϕ and ∂_z f = xy + ∂_z ϕ.
Therefore, we must have ∂_y ϕ = y and ∂_z ϕ = −z. Integrating the first of these
equations with respect to y gives ϕ(y, z) = ½y² + ψ(z), so ∂_z ϕ = ψ′(z) = −z
and ψ(z) = −½z² + C. Putting this all together,

    f(x, y, z) = xyz + ½y² − ½z² + C.
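Example 3 can be double-checked in a few lines; this sketch (not part of the text, using the sympy library) verifies both curl G = 0 and ∇f = G for the potential just found:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
G = sp.Matrix([y*z, x*z + y, x*y - z])   # the field of Example 3
f = x*y*z + y**2/2 - z**2/2              # the potential found above

grad = sp.Matrix([sp.diff(f, v) for v in (x, y, z)])
curlG = sp.Matrix([sp.diff(G[2], y) - sp.diff(G[1], z),
                   sp.diff(G[0], z) - sp.diff(G[2], x),
                   sp.diff(G[1], x) - sp.diff(G[0], y)])

assert curlG == sp.zeros(3, 1)   # curl G = 0, as claimed
assert grad == G                 # grad f = G
```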
Proof. We shall not give the general proof but shall content ourselves with present-
ing an algorithm for solving curl F = G when R is a rectangular box, similar to the
one given above for solving ∇f = G. Suppose that R = [a₁, b₁] × [a₂, b₂] × [a₃, b₃]
and G is a C¹ vector field satisfying div G = 0 on R. Unlike the problem of finding
a function with a given gradient, whose solution is unique up to an additive
constant, there is a lot of freedom in choosing an F such that curl F = G, for if
curl F = G then also curl(F + ∇f) = G for any smooth function f. This gives
enough leeway to allow us to assume that the z-component of F is zero. Thus, let
us write G = G₁i + G₂j + G₃k and F = F₁i + F₂j; we then want

    ∂_z F₁ = −(3x²y + y²),   ∂_z F₂ = −(6xz + x³),
    ∂_x F₂ − ∂_y F₁ = 4x + 2yz − 3z².
and plugging these results into the third equation yields ∂_x ϕ − ∂_y ψ = 4x.
Therefore, one solution (with ϕ = 2x² and ψ = 0) is
or similar expressions with the variables permuted; there are many other possi-
bilities. In fact, this problem is so easy that it seems reasonable to make it more
interesting by imposing additional conditions on F. We restrict attention to the
three-dimensional situation, but there are similar results in higher dimensions.
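The vector-potential algorithm just described can be automated. The following sketch (not part of the text, using the sympy library) takes F₃ = 0, so that the requirements are ∂_z F₂ = −G₁, ∂_z F₁ = G₂, and ∂_x F₂ − ∂_y F₁ = G₃; the last equation is arranged by adding a function of (x, y) alone to F₂. It is applied here to the divergence-free field of Exercise 2(b):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# A divergence-free field (that of Exercise 2b): G = (xy + z, xz, -(yz + x)).
G1, G2, G3 = x*y + z, x*z, -(y*z + x)
assert sp.diff(G1, x) + sp.diff(G2, y) + sp.diff(G3, z) == 0

# With F3 = 0:  dF1/dz = G2,  dF2/dz = -G1,  dF2/dx - dF1/dy = G3;
# the last condition is met by adding int G3(x, y, 0) dx to F2.
F1 = sp.integrate(G2, (z, 0, z))
F2 = -sp.integrate(G1, (z, 0, z)) + sp.integrate(G3.subs(z, 0), x)
F3 = sp.S.Zero

curlF = sp.Matrix([sp.diff(F3, y) - sp.diff(F2, z),
                   sp.diff(F1, z) - sp.diff(F3, x),
                   sp.diff(F2, x) - sp.diff(F1, y)])
assert curlF == sp.Matrix([G1, G2, G3])   # curl F = G
```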
The key result here is Theorem 5.46, which shows that we can solve the equation
div F = g subject to the restriction that curl F = 0. More precisely, suppose
R is a bounded open set in R³ and g is of class C¹ on R̄. (In Theorem 5.46 g
was assumed to be C², but see the remarks following the proof.) Smoothness on
R̄ means that g can be extended as a C¹ function to an open set containing R̄, and
it can be modified outside R̄ so as to vanish outside some bounded set while
remaining of class C¹. (One multiplies g by a C¹ function that is identically 1 on R̄
and vanishes outside some slightly larger region; we omit the details, which are of
little importance for this argument.) Hence we may assume that g is C¹ on R³ and
vanishes outside a bounded set. Then, by Theorem 5.46, the function

    u(x) = −(1/4π) ∫∫∫_{R³} [g(x + y)/|y|] d³y
There is a companion result to Theorem 5.64: Not every vector field is a gra-
dient, and not every vector field is a curl, but every vector field is the sum of a
gradient and a curl. The proof is left to the reader as Exercise 3, where a more
precise statement is given.
One might also ask about uniqueness in Theorem 5.64; that is, to what extent is
a vector field determined by its curl and divergence? Clearly, if F satisfies curl F =
G and div F = g, then so does F + H whenever curl H = 0 and div H = 0.
Solutions of the latter pair of equations can be obtained simply by taking H = ∇ϕ
where ϕ is any solution of Laplace’s equation ∇²ϕ = 0. Such solutions exist in
great abundance, so the F in Theorem 5.64 is far from unique. However, one can
pin down a unique solution by imposing suitable boundary conditions.
5.65 Proposition. Let R be a bounded convex open set in R³ with piecewise smooth
boundary. Suppose H is a C¹ vector field on R such that curl H = 0 and div H =
0 on R and H · n = 0 on ∂R. Then H vanishes identically on R.
Proof. By Theorem 5.62, H is the gradient of a function u on R, and ∇²u =
div H = 0. Since H · n = ∂u/∂n, by Green’s formula (5.38) we have

    0 = ∫∫_{∂R} u (∂u/∂n) dA = ∫∫∫_R [|∇u|² + u ∇²u] dV = ∫∫∫_R [|H|² + 0] dV.
But |H|² is a nonnegative continuous function, so its integral over R can be zero
only if |H|² (and hence H) vanishes identically on R.
field E vanishes only when there are no time-varying magnetic fields present. Only
in this case is E the gradient of a potential function. However, div B = 0 always
(this expresses the fact that there are no “magnetic charges”), so B is the curl of a
vector potential A. We then have
    curl(E + (1/c) ∂A/∂t) = curl E + (1/c) ∂B/∂t = 0,
EXERCISES
1. Determine whether each of the following vector fields is the gradient of a func-
tion f , and if so, find f . The vector fields in (a)–(c) are on R2 ; those in (d)–(f)
are on R3 , and the one in (g) is on R4 . In all cases i, j, k, and l denote unit
vectors along the positive x-, y-, z-, and w-axes.
a. G(x, y) = (2xy + x²)i + (x² − y²)j.
b. G(x, y) = (3y² + 5x⁴y)i + (x⁵ − 6xy)j.
c. G(x, y) = (2e^{2x} sin y − 3y + 5)i + (e^{2x} cos y − 3x)j.
d. G(x, y, z) = (yz − y sin xy)i + (xz − x sin xy + z cos yz)j + (xy +
   y cos yz)k.
e. G(x, y, z) = (y − z)i + (x − z)j + (x − y)k.
f. G(x, y, z) = 2xyi + (x² + log z)j + ((y + 2)/z)k (z > 0).
g. G(x, y, z, w) = (xw² + yzw)i + (xzw + yz² − 2e^{2y+z})j + (xyw + y²z −
   e^{2y+z} − w sin zw)k + (xyz + x²w − z sin zw)l.
2. Determine whether each of the following vector fields is the curl of a vector
field F, and if so, find such an F.
a. G(x, y, z) = (x³ + yz)i + (y − 3x²y)j + 4y²k.
b. G(x, y, z) = (xy + z)i + xzj − (yz + x)k.
c. G(x, y, z) = (xe^{−x²z²} − 6x)i + (5y + 2z)j + (z − ze^{−x²z²})k.
3. Let R be a bounded convex open set in R³. Show that for any C² vector
   field H on R there exist a C² function f and a C² vector field G such that
   H = grad f + curl G. (Hint: Solve ∇²f = div H.)
4. Let F = F₁i + F₂j be a C¹ vector field on S = R² \ {(0, 0)} such that
   ∂₁F₂ = ∂₂F₁ on S (but F may be singular at the origin).
   a. Let C_r be the circle of radius r about the origin, oriented counterclockwise.
      Show that ∫_{C_r} F · dx is a constant α that does not depend on r. (Hint:
      Consider the region between two circles.)
   b. Show that ∫_C F · dx = α for any simple closed curve C, oriented counter-
      clockwise, that encircles the origin.
   c. Let F₀ = (xj − yi)/(x² + y²) as in Example 1. Show that F − (α/2π)F₀
      is the gradient of a function on S. (Thus, all curl-free vector fields on S
      that are not gradients can be obtained from F₀ by adding gradients.)
dimensional “area” element on ∂R. The “vector area element” n dV_{n−1} is given
by a formula analogous to the one in R³. Namely, if (part of) ∂R is parametrized
by x = G(u₁, . . . , u_{n−1}), then

    n dV_{n−1} = det ⎛    e₁     ···    e_n     ⎞
                     ⎜   ∂₁G₁    ···   ∂₁G_n    ⎟ du₁ · · · du_{n−1},
                     ⎜    ⋮              ⋮      ⎟
                     ⎝ ∂_{n−1}G₁ ··· ∂_{n−1}G_n ⎠

where e₁, . . . , e_n are the standard basis vectors for Rⁿ. (The reader may verify that
in the case n = 2, these formulas yield Green’s theorem in the form (5.18).)
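When n = 3, expanding the determinant along its first row gives the familiar cross product ∂G/∂u × ∂G/∂v. As a quick check (a sketch using the sympy library, not part of the text), the paraboloid parametrization G(u, v) = (u, v, 1 − u² − v²) reproduces the vector area element (2u, 2v, 1) used in Example 2 of the previous section:

```python
import sympy as sp

u, v = sp.symbols('u v')

# Parametrize the surface z = 1 - x^2 - y^2 by G(u, v) = (u, v, 1 - u^2 - v^2).
G = sp.Matrix([u, v, 1 - u**2 - v**2])

# For n = 3, the first-row expansion of the determinant is the cross product
# of the two tangent vectors dG/du and dG/dv.
n_dA = G.diff(u).cross(G.diff(v))

assert list(n_dA) == [2*u, 2*v, 1]   # matches n dA = (2x i + 2y j + k) dx dy
```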
Second, the analogue of the divergence theorem in dimension 1 is just the
fundamental theorem of calculus:

    f(b) − f(a) = ∫_{[a,b]} f′(t) dt.
On the real line, vector fields are the same thing as functions, and the divergence of
a vector field is just the derivative of a function. A regular region in R is an interval
[a, b], whose boundary is the two-element set {a, b}. Since the boundary is finite,
“integration” over the boundary is just summation, and the minus sign on f (a)
comes from assigning the proper “orientation” to the two points in the boundary.
There are also analogues of Stokes’s theorem in higher dimensions, which say
that the integral of some gadget G over the boundary of a k-dimensional submani-
fold of Rn equals the integral of another gadget formed from the first derivatives of
G over the submanifold itself. However, to formulate things properly in this general
setting, it is necessary to develop some additional algebraic machinery, the theory
of differential forms. To do so is beyond the scope of this book; what follows is
intended to provide an informal introduction to the ideas involved. For a detailed
treatment of differential forms, we refer the reader to Hubbard and Hubbard [7] and
Weintraub [19].
Roughly speaking, a differential k-form is an object whose mission in life is to
be integrated over k-dimensional sets; thus, 1-forms are designed to be integrated
over curves, 2-forms are designed to be integrated over surfaces, and so on. Here
is how the ideas of vector analysis that we have been studying can be reformulated
in terms of differential forms.
This operation is just the “built-in chain rule” for differentials of functions, extended
to arbitrary 1-forms. To wit, let x₁, . . . , x_n and u₁, . . . , u_k be the coordinates
on Rⁿ and R^k, respectively. If ω = F₁ dx₁ + · · · + F_n dx_n is a 1-form on
Rⁿ, its pullback via T is the 1-form T*ω on R^k defined by substituting into ω the
expressions for the x’s in terms of the u’s and the dx’s in terms of the du’s:

(5.67)
    T*ω = F̃₁ [(∂x₁/∂u₁) du₁ + · · · + (∂x₁/∂u_k) du_k] + · · · + F̃_n [(∂x_n/∂u₁) du₁ + · · · + (∂x_n/∂u_k) du_k]
        = [F̃₁ ∂x₁/∂u₁ + · · · + F̃_n ∂x_n/∂u₁] du₁ + · · · + [F̃₁ ∂x₁/∂u_k + · · · + F̃_n ∂x_n/∂u_k] du_k,

where

    F̃_m(u₁, . . . , u_k) = F_m(T(u₁, . . . , u_k)).
Two special cases are of particular interest. First, the chain rule says that when
ω = df, T*ω = d(f ∘ T). Second, when k = 1, so that T : R → Rⁿ defines a
curve in Rⁿ, (5.67) becomes

    T*ω = [(F₁ ∘ T) dx₁/du + · · · + (F_n ∘ T) dx_n/du] du.
1-forms can be integrated over curves. To begin with, a 1-form on R is merely
something of the form ω = g(t) dt, and its integral over an interval [a, b] is just
what you think it is:
    ∫_{[a,b]} ω = ∫_a^b g(t) dt.
Now, if ω = F₁ dx₁ + · · · + F_n dx_n is a 1-form on Rⁿ and C is a smooth curve
parametrized by x = g(t), ∫_C ω is defined by pulling ω back to R via g and
integrating the result as before:

    ∫_C ω = ∫_{[a,b]} g*ω = ∫_a^b [F₁(g(t)) dx₁/dt + · · · + F_n(g(t)) dx_n/dt] dt.
In other words, if we identify ω with the vector field F as before,
    ∫_C ω = ∫_C F · dx.
(5.69) β ∧ α = −α ∧ β.
But according to (5.69), dx_j ∧ dx_i = −dx_i ∧ dx_j and dx_i ∧ dx_i = 0. Thus the
terms with i = j in (5.70) drop out, and for i ≠ j we can combine the ijth and jith
terms into one:
We have the option of using either of the two expressions on the right, and the usual
choice is to use the one where the first index is smaller than the second one. (In R3
a different choice is sometimes convenient, as we shall soon see.) Thus, we finally
obtain

    α ∧ β = Σ_{1≤i<j≤n} (A_i B_j − A_j B_i) dx_i ∧ dx_j.
where the C_{ij} are continuous functions on Rⁿ. We note that the number of terms
in this sum, that is, the number of pairs (i, j) with 1 ≤ i < j ≤ n, is ½n(n − 1).
In (5.71) we also have the option of rewriting dx_i ∧ dx_j as −dx_j ∧ dx_i if we so
choose.
What does this really mean? We have been proceeding purely formally, without
saying what meaning is to be attached to the expressions dx_i ∧ dx_j. In the full-dress
treatment of this subject, 2-forms are defined to be alternating rank-2 tensor fields
over Rⁿ, but this is somewhat beside the point. For now it is probably best to
think of a 2-form on Rⁿ simply as a ½n(n − 1)-tuple of functions, namely the
functions C_{ij} in (5.71), and the expressions dx_i ∧ dx_j simply as a convenient set of
signposts to mark the various components, just as i, j, and k are used to mark the
components of vector fields in R³. The important features of 2-forms are not their
precise algebraic definition but the way they transform under changes of variables
and the way they integrate over surfaces.
Before proceeding to these matters, however, let us see how things look in the
3-dimensional case. When n = 3 we also have ½n(n − 1) = 3, so 2-forms have 3
components just as vector fields and 1-forms do: This is the “accident” that makes
n = 3 special! The general 2-form on R³ can be written as
(5.72) ω = F dy ∧ dz + G dz ∧ dx + H dx ∧ dy ←→ F = F i + Gj + Hk.
Observe carefully how we have set this correspondence up: we have written the
basis elements dxi ∧ dxj with the variables in cyclic order,
rather than the “i < j” order we used above, so that the middle term is dz ∧ dx
rather than dx ∧ dz. Also, we identify the unit vector i in the x direction with the
2-form dy ∧ dz from which dx is missing, and likewise for j and k.
The exterior product in 3 dimensions looks like this: If
α = A1 dx + A2 dy + A3 dz, β = B1 dx + B2 dy + B3 dz,
then
α ∧ β = (A2 B3 − A3 B2 ) dy ∧ dz + (A3 B1 − A1 B3 ) dz ∧ dx
+ (A1 B2 − A2 B1 ) dx ∧ dy.
Thus, if we identify α and β with vector fields according to (5.66) and α ∧ β with
a vector field according to (5.72), the exterior product turns into the cross product:
α ←→ F, β ←→ G, α∧β ←→ F × G.
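This correspondence is easy to verify symbolically; the sketch below (not part of the text, using the sympy library) compares the three wedge coefficients with the built-in cross product:

```python
import sympy as sp

# Coefficients of two 1-forms alpha = A . (dx, dy, dz), beta = B . (dx, dy, dz).
A = sp.Matrix(sp.symbols('A1 A2 A3'))
B = sp.Matrix(sp.symbols('B1 B2 B3'))

# Components of alpha ^ beta in the basis dy^dz, dz^dx, dx^dy:
wedge = sp.Matrix([A[1]*B[2] - A[2]*B[1],   # coefficient of dy ^ dz
                   A[2]*B[0] - A[0]*B[2],   # coefficient of dz ^ dx
                   A[0]*B[1] - A[1]*B[0]])  # coefficient of dx ^ dy

assert wedge == A.cross(B)   # the exterior product is the cross product
```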
so in general, if

    ω = Σ_{l<m} C_{lm}(x) dx_l ∧ dx_m,

then

    T*ω = Σ_{l<m} Σ_{i<j} C_{lm}(T(u)) [∂(x_l, x_m)/∂(u_i, u_j)] du_i ∧ du_j.
It is a consequence of the chain rule that the pullback operation behaves properly
under composition of mappings, namely, (T1 ◦ T2 )∗ ω = T∗2 (T∗1 ω).
We can now show how to integrate 2-forms over surfaces. First consider the
simplest case, where the surface is simply a region D in R2 . If we name the coor-
dinates on R2 x and y, the general 2-form on R2 has the form ω = f (x, y) dx ∧ dy,
and its integral over D is the obvious thing:
(5.73)   ∫∫_D f(x, y) dx ∧ dy = ∫∫_D f(x, y) dx dy,
the integral on the right being the ordinary double integral of f over D. The only
subtle point is that the integral on the left is an oriented integral, the orientation
being carried in the fact that dx comes before dy in dx ∧ dy. If we wrote dy ∧ dx
instead, we would introduce a minus sign.
The nice thing about (5.73) is that the change-of-variable formula for double
integrals is more or less built into it. Namely, suppose T : R2 → R2 is an invertible
C¹ transformation, say T(u, v) = (x, y). If ω = f(x, y) dx ∧ dy, then

    T*ω = f(T(u, v)) [∂(x, y)/∂(u, v)] du ∧ dv = f(T(u, v)) (det DT) du ∧ dv,
In other words, the formalism of differential forms produces the necessary Jacobian
factor automatically. The change-of-variable formula as we have seen it before
involved | det DT| rather than det DT, but this discrepancy is accounted for by
the difference between ordinary integrals and oriented integrals.
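The polar-coordinates map T(r, θ) = (r cos θ, r sin θ) illustrates how the Jacobian factor appears in the pullback. This sketch (not part of the text, using the sympy library) computes det DT and recovers the area of the unit disc:

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)

# T(r, theta) = (r cos theta, r sin theta); omega = f(x, y) dx ^ dy.
x, y = r*sp.cos(th), r*sp.sin(th)

# The pullback multiplies f by det DT = d(x, y)/d(r, theta).
J = sp.Matrix([[sp.diff(x, r), sp.diff(x, th)],
               [sp.diff(y, r), sp.diff(y, th)]]).det()
assert sp.simplify(J) == r   # the familiar polar-coordinates factor

# Check on f = 1 over the unit disc: the integral should be the area, pi.
area = sp.integrate(sp.integrate(sp.simplify(J), (r, 0, 1)), (th, 0, 2*sp.pi))
assert area == sp.pi
```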
Now we turn to the case of integrals over a surface S in Rⁿ. The idea is the
same as for line integrals: If ω is a 2-form on Rⁿ and S is a surface parametrized
by x = G(u, v), (u, v) ∈ D ⊂ R², we define ∫∫_S ω by pulling ω back to D via G
and using (5.73) to define the resulting integral:

    ∫∫_S ω = ∫∫_D G*ω.
This is independent of the parametrization, in the following sense: If G = G̃ ∘ T
where T : R² → R² is a C¹ transformation, then by (5.74),

    ∫∫_D G*ω = ∫∫_D T*(G̃*ω) = ∫∫_{T(D)} G̃*ω.
Hence the notion of surface integrals of vector fields in R3 also fits into the theory
of differential forms.
Here, as in the case of 2-forms, one can think of the expressions dx_i ∧ dx_j ∧ dx_k
simply as formal basis elements, and one can put the indices i, j, k in an order other
than i < j < k with the understanding that whenever one interchanges two of the
dx’s one introduces a minus sign. The number of terms in the sum in (5.75) is the
binomial coefficient n!/[3!(n − 3)!]. When n = 3, this number is 1: All 3-forms on
R³ have the form

    ω = f(x, y, z) dx ∧ dy ∧ dz

and hence can be identified with functions:
The notion of exterior product extends so as to yield a 3-form as the product
of three 1-forms or as the product of a 1-form and a 2-form. The idea is pretty
obvious: dx_i ∧ dx_j ∧ dx_k is the exterior product of the three 1-forms dx_i, dx_j, and
dx_k, or the 1-form dx_i and the 2-form dx_j ∧ dx_k, or the 2-form dx_i ∧ dx_j and the
1-form dx_k. The exterior product distributes over sums and scalar multiples in the
usual way, and the anticommutative law becomes

    α ∧ β = (−1)^{lm} β ∧ α   if α is an l-form and β is an m-form.
Here is how it works when n = 3: If

    α = A₁ dx + A₂ dy + A₃ dz,
    β = B₁ dx + B₂ dy + B₃ dz,
    γ = C₁ dx + C₂ dy + C₃ dz,
    ω = W₁ dy ∧ dz + W₂ dz ∧ dx + W₃ dx ∧ dy,
then

    α ∧ (β ∧ γ) = (α ∧ β) ∧ γ = det ⎛A₁ A₂ A₃⎞
                                    ⎜B₁ B₂ B₃⎟ dx ∧ dy ∧ dz,
                                    ⎝C₁ C₂ C₃⎠

    α ∧ ω = ω ∧ α = (A₁W₁ + A₂W₂ + A₃W₃) dx ∧ dy ∧ dz.
Thus, if we identify α, β, γ with the vector fields F, G, H and ω with the vector
field V, the exterior product turns into the scalar triple product and dot product:
α∧β∧γ ←→ F · (G × H), α∧ω ←→ F · V.
The Exterior Derivative. When the operations of gradient, curl, and diver-
gence are expressed in terms of differential forms, they are all instances of a single
operation, denoted by d and called the exterior derivative, which maps k-forms
on Rn into (k + 1)-forms on Rn :
d d d d
0-forms −→ 1-forms −→ 2-forms −→ 3-forms −→ · · · .
Here’s how it works.
First, a 0-form is, by definition, a function; if f is a 0-form, then df is just the
differential of f . If we identify 1-forms with vector fields, df becomes ∇f . That
is, the gradient is the exterior derivative on 0-forms.
Now, any k-form ω with k ≥ 1 is a sum of terms of the form f β where f is a
function and β is one of the basis elements (dxi for 1-forms, dxi ∧ dxj for 2-forms,
etc.). dω is defined to be the (k + 1)-form obtained by replacing each such term f β
by df ∧ β.
This is what it looks like when ω = A₁ dx₁ + A₂ dx₂ + · · · + A_n dx_n is a
1-form:

    dω = dA₁ ∧ dx₁ + · · · + dA_n ∧ dx_n
       = [(∂A₁/∂x₁) dx₁ + · · · + (∂A₁/∂x_n) dx_n] ∧ dx₁ + · · ·
           + [(∂A_n/∂x₁) dx₁ + · · · + (∂A_n/∂x_n) dx_n] ∧ dx_n
       = Σ_{i<j} (∂A_j/∂x_i − ∂A_i/∂x_j) dx_i ∧ dx_j.
But this is just the curl! That is, if we identify the 1-form ω and the 2-form dω
with vector fields F and G in the standard way, then G = curl F. The curl is the
exterior derivative on 1-forms in R3 .
Now suppose that ω = A dy ∧ dz + B dz ∧ dx + C dx ∧ dy is a 2-form. As the
notation in higher dimensions gets messy, we shall write out only the 3-dimensional
case:
dω = dA ∧ dy ∧ dz + dB ∧ dz ∧ dx + dC ∧ dx ∧ dy
= (∂x A dx + ∂y A dy + ∂z A dz) ∧ dy ∧ dz
+ (∂x B dx + ∂y B dy + ∂z B dz) ∧ dz ∧ dx
+ (∂x C dx + ∂y C dy + ∂z C dz) ∧ dx ∧ dy
= (∂x A + ∂y B + ∂z C) dx ∧ dy ∧ dz.
(For the last equality we have used the fact that an exterior product containing two
identical factors vanishes and the fact that the product dx ∧ dy ∧ dz is unchanged by
cyclic permutation of its three terms.) If we identify ω with a vector field F and dω
with a function g as before, we see that g = div F. The divergence is the exterior
derivative on 2-forms in R3 .
We observed earlier that curl(∇f ) = 0 for any function f and div(curl F) = 0
for any vector field F. The interpretation of these identities in terms of differential
forms is that d(df ) = 0 for any 0-form (function) f and d(dω) = 0 for any 1-form
ω. It is true in general that
(5.76) d(dω) = 0
for any k-form ω on Rn . In all cases the proof of this fact boils down to the equality
of mixed partials.
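In R³ these two identities can be checked in a few lines. The sketch below (not part of the text, using the sympy library, with arbitrarily chosen smooth f and F) verifies curl(∇f) = 0 and div(curl F) = 0, which is d(dω) = 0 for 0-forms and 1-forms:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

def grad(f):
    return sp.Matrix([sp.diff(f, v) for v in (x, y, z)])

def curl(F):
    return sp.Matrix([sp.diff(F[2], y) - sp.diff(F[1], z),
                      sp.diff(F[0], z) - sp.diff(F[2], x),
                      sp.diff(F[1], x) - sp.diff(F[0], y)])

def div(F):
    return sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)

# Arbitrary (hypothetically chosen) smooth test functions:
f = sp.exp(x*y) * sp.sin(z)
F = sp.Matrix([x*y**2, sp.cos(y*z), x + z**3])

assert curl(grad(f)) == sp.zeros(3, 1)   # d(df) = 0 for 0-forms
assert div(curl(F)) == 0                 # d(d omega) = 0 for 1-forms
```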
As an illustration of the exterior derivative, we give the relativistically covari-
ant reformulation of Maxwell’s equations (5.50). The key idea is to think of elec-
tromagnetism as a phenomenon in 4-dimensional space-time rather than a time-
dependent phenomenon in 3-dimensional space. The electric and magnetic fields
E = (Ex , Ey , Ez ) and B = (Bx , By , Bz ) are combined into a single entity, the
Stokes’s Theorem. We can now state the general theorem that encompasses the
integral theorems of the preceding sections and their higher dimensional analogues:
5.77 Theorem (The General Stokes Theorem). Let M be a smooth, oriented k-
dimensional submanifold of Rn with a piecewise smooth boundary ∂M , and let
∂M carry the orientation that is (in a suitable sense) compatible with the one on
M. If ω is a (k − 1)-form of class C¹ on an open set containing M, then

    ∫···∫_{∂M} ω = ∫···∫_M dω.
We conclude with a final suggestive remark. The formal differential-algebraic
identity d(dω) = 0 stated above has a geometric counterpart. The boundary of a
region in the plane is a closed curve with no endpoints, and the boundary of a region
in 3-space is a closed surface with no edge. In general, the boundary of a (smoothly
bounded) region M in a k-dimensional manifold is a (k − 1)-dimensional manifold
with no boundary, that is,
(5.78) ∂(∂M ) = ∅.
The general Stokes theorem shows that (5.76) and (5.78) are in some sense
equivalent. Indeed, if M is k-dimensional and ω is a (k − 2)-form, the Stokes
theorem gives

    ∫···∫_{∂(∂M)} ω = ∫···∫_{∂M} dω = ∫···∫_M d(dω).
If we accept the geometric fact that ∂(∂M ) = ∅, then the integral on the left
vanishes, and hence so does the integral on the right. But since this happens for
every M , it follows that d(dω) = 0. Similarly, if we know that d(dω) = 0 for
every ω, we can conclude that ∂(∂M ) = ∅. This sort of interplay of algebra,
analysis, and geometry is a significant feature of much of modern mathematics.
Chapter 6
INFINITE SERIES
Infinite series are sums with infinitely many terms, of which the most familiar
examples are the nonterminating decimal expansions. For instance, the equality
π = 3.14159 . . . is an abbreviation of the statement that π is the sum of the infinite
series
    3 + 1/10 + 4/10² + 1/10³ + 5/10⁴ + 9/10⁵ + · · · .
The procedure by which one makes sense out of such sums stands alongside dif-
ferentiation and integration as one of the fundamental limiting processes of mathe-
matical analysis. Just as decimal expansions provide a useful way of obtaining all
real numbers from the finite decimal fractions, infinite series provide a flexible and
powerful way of building complicated functions out of simple ones.
This chapter is devoted to the foundations of the theory of infinite series. In
it we develop the basic facts about series of numbers; then in the next chapter we
proceed to the study of series of functions.
Here the a_k’s can be real numbers, complex numbers, vectors, and so on; for the
present, we shall mainly consider the case where they are real numbers.
It is not immediately clear what precise meaning is to be attached to an
expression of the form Σ₀^∞ a_n that involves a sum of infinitely many terms. The
formal definition must be phrased in terms of limits of finite sums, as follows.
Given a sequence {a_n}₀^∞ of real numbers (or complex numbers, vectors, etc.),
we can form a new sequence {s_k}₀^∞ by adding up the terms of the original sequence
successively:

    s₀ = a₀,  s₁ = a₀ + a₁,  s₂ = a₀ + a₁ + a₂,  . . . ,  s_k = a₀ + a₁ + · · · + a_k.
An infinite series is formally defined to be a pair of sequences {a_n} and {s_k} re-
lated by these equations, and the notation Σ₀^∞ a_n is to be regarded as a convenient
way of encoding this information. The a_n’s are called the terms of the series, and
the s_k’s are called the partial sums of the series. If the sequence {s_k} of partial
sums converges to a limit S, then the series is said to be convergent, S is called its
sum, and we write Σ₀^∞ a_n = S; otherwise, the series is said to be divergent, and
no numerical meaning is attached to the expression Σ₀^∞ a_n. (However, if s_k → ∞
as k → ∞, we may say that Σ₀^∞ a_n = ∞.)
Remark. We have elected to start the numbering of the sequences $\{a_n\}$ and $\{s_k\}$ at $n = 0$ and $k = 0$, since this is perhaps the most common situation in practice. However, we could equally well start at some other point, for instance,
$$\sum_5^\infty a_n = a_5 + a_6 + a_7 + \cdots,$$
$$s_5 = a_5, \quad s_6 = a_5 + a_6, \quad s_7 = a_5 + a_6 + a_7, \quad \ldots.$$
Before proceeding further, let us record a couple of very simple but important
facts about series.
6.1 Theorem.
a. If the series $\sum_0^\infty a_n$ and $\sum_0^\infty b_n$ are convergent, with sums $S$ and $T$, then $\sum_0^\infty (a_n + b_n)$ is convergent, with sum $S + T$.
b. If the series $\sum_0^\infty a_n$ is convergent, with sum $S$, then for any $c \in \mathbb{R}$ the series $\sum_0^\infty c a_n$ is convergent, with sum $cS$.
c. If the series $\sum_0^\infty a_n$ is convergent, then $\lim_{n\to\infty} a_n = 0$. Equivalently, if $a_n \not\to 0$ as $n \to \infty$, then the series $\sum_0^\infty a_n$ is divergent.
Proof. Let $\{s_k\}$ and $\{t_k\}$ be the sequences of partial sums of the series $\sum_0^\infty a_n$ and $\sum_0^\infty b_n$, respectively. (a) and (b) follow from the fact that if $s_k \to S$ and $t_k \to T$, then $s_k + t_k \to S + T$ and $cs_k \to cS$. As for (c), we observe that $a_n = s_n - s_{n-1}$. If the series converges to the sum $S$, it follows that $\lim a_n = \lim s_n - \lim s_{n-1} = S - S = 0$.
At present we are thinking primarily of series whose terms are numbers, but most of the really significant applications of series come from situations where the terms $a_n$ depend on a variable $x$. In this case the series $\sum_0^\infty a_n(x)$ may converge for some values of $x$ and diverge for others, and it defines a function whose domain is the set of all $x$ for which it converges. We shall explore this idea in more detail in the next chapter; at this point we recall some familiar examples.
One of the simplest and most useful infinite series is the geometric series, in which the ratio of two succeeding terms is a constant $x$. That is, the geometric series with initial term $a$ and ratio $x$ is
$$a + ax + ax^2 + ax^3 + \cdots = \sum_0^\infty ax^n.$$
To compute the partial sums it suffices to take $a = 1$. We have
$$s_k = 1 + x + \cdots + x^k, \qquad x s_k = x + \cdots + x^k + x^{k+1},$$
and subtracting the second equation from the first yields $(1 - x)s_k = 1 - x^{k+1}$. Therefore,
$$(6.2) \qquad s_k = \frac{1 - x^{k+1}}{1 - x} \ \text{ if } x \neq 1, \qquad s_k = k + 1 \ \text{ if } x = 1.$$
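As a quick numerical illustration of formula (6.2), the following Python sketch (an illustration, not part of the original text) compares directly computed partial sums of the geometric series with the closed form, and with the limit $1/(1-x)$ when $|x| < 1$:

```python
def geometric_partial_sum(x: float, k: int) -> float:
    """Return s_k = 1 + x + x**2 + ... + x**k by direct addition."""
    return sum(x**n for n in range(k + 1))

x = 0.5
for k in (1, 5, 20):
    closed_form = (1 - x**(k + 1)) / (1 - x)   # formula (6.2), valid for x != 1
    assert abs(geometric_partial_sum(x, k) - closed_form) < 1e-12

# For |x| < 1 the partial sums tend to 1/(1 - x):
assert abs(geometric_partial_sum(x, 50) - 1 / (1 - x)) < 1e-12
```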
One simple sufficient condition to guarantee that $R_k(x) \to 0$ follows from the estimate for the Taylor remainder in Corollary 2.61:
$$|R_k(x)| \le \sup_{|t| \le |x|} |f^{(k+1)}(t)| \, \frac{|x|^{k+1}}{(k+1)!} \qquad (|x| < c).$$
6.6 Theorem. Let $f$ be a function of class $C^\infty$ on the interval $(-c, c)$, where $0 < c \le \infty$.
a. If there exist constants $a, b > 0$ such that $|f^{(k)}(x)| \le a b^k k!$ for all $|x| < c$ and $k \ge 0$, then (6.5) holds for $|x| < \min(c, b^{-1})$.
b. If there exist constants $A, B > 0$ such that $|f^{(k)}(x)| \le A B^k$ for all $|x| < c$ and $k \ge 0$, then (6.5) holds for $|x| < c$.
Proof. By Corollary 2.61, the estimate $|f^{(k)}(x)| \le a b^k k!$ implies the estimate $|R_{k-1}(x)| \le a|bx|^k$ for $|x| < c$. If also $|x| < b^{-1}$, then $|bx|^k \to 0$ as $k \to \infty$, so (6.4) yields the result (a). To deduce (b), we observe that the factorial function grows faster than exponentially (see Example 5 in §1.4), so that for any positive $A$, $B$, and $b$, the sequence $A(B/b)^k/k!$ tends to zero as $k \to \infty$. Letting $a$ be the largest term in this sequence, we have
$$A B^k = A \frac{(B/b)^k}{k!} \, b^k k! \le a b^k k!,$$
so the estimate $|f^{(k)}(x)| \le A B^k$, for a given $A$ and $B$, implies the estimate $|f^{(k)}(x)| \le a b^k k!$ for every $b > 0$ (with $a$ depending on $b$). Hence (b) follows from (a).
Remark. The interval (−c, c) might not be the whole set where the function
f and its derivatives are defined. It may be necessary to restrict x to a proper
subinterval of the domain of f to obtain the estimates on f (k) (x) in Theorem 6.6,
as Example 2 will show.
EXAMPLE 1. Let $f(x) = \cos x$. The derivatives $f^{(k)}(x)$ are equal to $\pm\cos x$ or $\pm\sin x$, depending on $k$, so they all satisfy $|f^{(k)}(x)| \le 1$ for all $x$. By Theorem 6.6b, it follows that $\cos x$ is the sum of its Taylor series, $\sum_0^\infty (-1)^n x^{2n}/(2n)!$, for all $x$. For exactly the same reason, $\sin x$ is the sum of its Taylor series, $\sum_0^\infty (-1)^n x^{2n+1}/(2n+1)!$, for all $x$.
EXAMPLE 2. Let $f(x) = e^x$. Here $f^{(k)}(x) = e^x$ for all $k$. We cannot obtain a good estimate on $f^{(k)}(x)$ that is valid for all $x$ at once, but for $|x| < c$ we have $|f^{(k)}(x)| < e^c$. By Theorem 6.6b, it follows that $e^x$ is the sum of its Taylor series, $\sum_0^\infty x^n/n!$, for $|x| < c$. But $c$ is arbitrary, so in fact $e^x = \sum_0^\infty x^n/n!$ for all $x$.
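A numerical sketch (Python, added for illustration) of Examples 1 and 2: the Taylor partial sums of $\cos x$ and $e^x$ converge to the functions themselves for every $x$; here we test at $x = 2$:

```python
import math

def cos_taylor(x: float, terms: int) -> float:
    """Partial sum of the Taylor series of cos x (Example 1)."""
    return sum((-1)**n * x**(2*n) / math.factorial(2*n) for n in range(terms))

def exp_taylor(x: float, terms: int) -> float:
    """Partial sum of the Taylor series of e^x (Example 2)."""
    return sum(x**n / math.factorial(n) for n in range(terms))

x = 2.0
assert abs(cos_taylor(x, 20) - math.cos(x)) < 1e-12
assert abs(exp_taylor(x, 30) - math.exp(x)) < 1e-12
```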
Finally, we mention one other simple type of series that arises from time to time. Just as $\int_a^b f(x)\,dx$ is easy to compute when $f$ is the derivative of a known function, the series $\sum_0^\infty a_n$ is easy to sum when the terms $a_n$ are the differences of a known sequence $\{b_n\}$. That is, suppose $a_0 = b_0$ and $a_n = b_n - b_{n-1}$ for $n \ge 1$; then
$$s_k = a_0 + a_1 + \cdots + a_k = b_0 + (b_1 - b_0) + \cdots + (b_k - b_{k-1}) = b_k,$$
so the series $\sum_0^\infty a_n$ converges if and only if the sequence $\{b_n\}$ converges, in which case $\sum_0^\infty a_n = \lim b_n$. Such series are called telescoping series.
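A small Python illustration of a telescoping series: with $a_n = 1/(n(n+1)) = 1/n - 1/(n+1)$ for $n \ge 1$ (indexing from 1 rather than 0), the partial sums collapse to $s_k = 1 - 1/(k+1)$, so the series sums to 1:

```python
def telescoping_partial_sum(k: int) -> float:
    """s_k for the telescoping series with a_n = 1/(n(n+1)), n = 1, ..., k."""
    return sum(1 / (n * (n + 1)) for n in range(1, k + 1))

# The sum collapses term by term: s_k = 1 - 1/(k+1), so the full sum is 1.
for k in (1, 10, 1000):
    assert abs(telescoping_partial_sum(k) - (1 - 1 / (k + 1))) < 1e-12
```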
EXERCISES
1. Find the values of $x$ for which each of the following series converges and compute its sum.
a. $2(x+1) + 4(x+1)^4 + 8(x+1)^7 + \cdots + 2^{n+1}(x+1)^{3n+1} + \cdots$
b. $10x^{-2} + 20x^{-4} + 40x^{-6} + \cdots + 10 \cdot 2^n x^{-2(n+1)} + \cdots$
c. $1 + (1-x)/(1+x) + (1-x)^2/(1+x)^2 + \cdots + (1-x)^n/(1+x)^n + \cdots$
d. $\log x + (\log x)^2 + (\log x)^3 + \cdots + (\log x)^n + \cdots$
2. Tell whether each of the following series converges; if it does, find its sum.
a. $1 + \frac{3}{4} + \frac{5}{8} + \frac{9}{16} + \frac{17}{32} + \cdots$
b. $\frac{1}{1\cdot2} + \frac{1}{2\cdot3} + \frac{1}{3\cdot4} + \frac{1}{4\cdot5} + \cdots$ (Hint: $[n(n+1)]^{-1} = n^{-1} - (n+1)^{-1}$.)
c. $(\sqrt{2} - 1) + (\sqrt{3} - \sqrt{2}) + (\sqrt{4} - \sqrt{3}) + \cdots$
d. $1 - \frac{1}{2} + 1 - \frac{1}{3} + 1 - \frac{1}{4} + 1 - \frac{1}{5} + \cdots$
3. Let $f(x) = \log(1+x)$. Show that the Taylor remainder $R_{0,k}(x)$ (defined by (2.54)) tends to zero as $k \to \infty$ for $-1 < x \le 1$, and conclude that
$$\log(1+x) = \sum_1^\infty (-1)^{n+1} \frac{x^n}{n} \qquad \text{for } -1 < x \le 1.$$
(Hint: Lagrange's formula for $R_{0,k}$ easily yields the desired result when $-\frac12 < x \le 1$ but not when $-1 < x \le -\frac12$. For $x < 0$, use the integral formula (2.56) for $R_{0,k}$ and the mean value theorem for integrals to show that $|R_{0,k}(x)| = |x|(x' - x)^k (x' + 1)^{-k-1}$ for some $x' \in (x, 0)$, and thence show that $|R_{0,k}(x)| < |x|^{k+1}/(1+x)$.)
4. Given a sequence $\{a_n\}$ of numbers, let $\prod_1^k a_n$ denote the product of the numbers $a_1, \ldots, a_k$. The infinite product $\prod_1^\infty a_n$ is said to converge to the number $P$ if the sequence of partial products converges to $P$:
$$\prod_1^\infty a_n = \lim_{k\to\infty} \prod_1^k a_n = \lim_{k\to\infty} a_1 a_2 \cdots a_k.$$
(Note: In many books one finds a more complicated definition that takes account of the peculiar role of the number 0 with regard to multiplication.)
a. Show that if $\prod_1^\infty a_n$ converges to a nonzero number $P$, then $\lim_{n\to\infty} a_n = 1$. (This is the analogue of Theorem 6.1c for products.)
b. Show that if $\prod_1^\infty a_n$ converges to a nonzero number $P$, then $\sum_1^\infty \log a_n$ converges after omission of those terms for which $a_n < 0$. (By (a), there can only be finitely many such terms, and no $a_n$ can be 0.) Conversely, show that if $a_n > 0$ for all $n$ and $\sum_1^\infty \log a_n$ converges to $S$, then $\prod_1^\infty a_n$ converges to $e^S$. (See also Exercise 5 in §6.3.)
6.2 Series with Nonnegative Terms

FIGURE 6.1: Comparison of $\int_j^k f(x)\,dx$ (the area under the curve) with $\sum_{n=j}^{k-1} f(n)$ and $\sum_{n=j+1}^{k} f(n)$ (its upper and lower Riemann sums).
6.7 Theorem. Suppose $f$ is a positive, decreasing function on the half-line $[a, \infty)$. Then for any integers $j, k$ with $a \le j < k$,
$$\sum_{n=j}^{k-1} f(n) \ge \int_j^k f(x)\,dx \ge \sum_{n=j+1}^{k} f(n).$$

$$s_k = f(1) + \sum_{n=2}^{k} f(n) \le f(1) + \int_1^k f(x)\,dx \le f(1) + \int_1^\infty f(x)\,dx,$$
so the partial sums are bounded above and hence the series converges. On the other hand, if $\int_1^\infty f(x)\,dx = \infty$, we have
$$s_k = \sum_{n=1}^{k-1} f(n) + f(k) \ge \int_1^k f(x)\,dx + f(k) \to \infty \quad \text{as } k \to \infty,$$
;∞
Proof. The same is true of the integrals 1 x−p dx, for
* ∞ ) 7
−p x1−p ))K (p − 1)−1 if p > 1,
x dx = lim =
1 K→∞ 1 − p )1 ∞ if p < 1,
;∞ )K
and 1 x−1 dx = limK→∞ log x)1 = ∞.
Theorem 6.7 does more than provide a test for convergence; it also provides an approximation to the partial sums and the full sum of the series. In the convergent case, this can be used to provide a numerical approximation to the sum $\sum_1^\infty f(n)$ or an estimate of how many terms must be used for a partial sum to provide a good approximation; in the divergent case, it can be used to estimate how rapidly the partial sums grow.
Suppose, for example, that $f$ is positive and decreasing, and that $\int_1^\infty f(x)\,dx < \infty$. By letting $k \to \infty$ in Theorem 6.7, we obtain
$$\sum_1^\infty f(n) \ge \int_1^\infty f(x)\,dx \ge \sum_2^\infty f(n),$$
and hence
$$\int_1^\infty f(x)\,dx \le \sum_1^\infty f(n) \le f(1) + \int_1^\infty f(x)\,dx.$$
This gives an approximation to the sum $\sum_1^\infty f(n)$ with an error of at most $f(1)$. A better approximation can be obtained by using this estimate not for the whole series but for its tail end:
$$(6.10) \qquad \int_k^\infty f(x)\,dx \le \sum_k^\infty f(n) \le f(k) + \int_k^\infty f(x)\,dx.$$
k = 10 in (6.10) to get
$$\sum_1^\infty n^{-4} \approx 1^{-4} + 2^{-4} + \cdots + 9^{-4} + \int_{10}^\infty x^{-4}\,dx.$$
A bit of work with a pocket calculator yields the value of this last sum as $1.08226\ldots$, so we can conclude that $1.08226 < \sum_1^\infty n^{-4} < 1.08236$. (The exact value of $\sum_1^\infty n^{-4}$ is $\pi^4/90 = 1.0823232\ldots$; see Exercise 3 in §8.3 or Exercise 9a in §8.6.)
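The estimate above is easy to reproduce in a few lines of Python (an added illustration): compute the two bounds obtained from (6.10) with $k = 10$ and check them against the exact value $\pi^4/90$ quoted in the text.

```python
import math

head = sum(n**-4 for n in range(1, 10))    # 1^-4 + 2^-4 + ... + 9^-4
tail_integral = 10.0**-3 / 3               # integral of x^-4 from 10 to infinity
lower = head + tail_integral               # lower bound from (6.10)
upper = head + 10.0**-4 + tail_integral    # upper bound: adds f(10) = 10^-4

exact = math.pi**4 / 90
assert lower < exact < upper
assert abs((upper - lower) - 10.0**-4) < 1e-15   # bracket width is f(10)
```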
General Comparison Tests. One can often decide whether a series of nonneg-
ative terms converges by comparing it to a series whose convergence or divergence
is known. The general method is as follows.
6.11 Theorem. Suppose $0 \le a_n \le b_n$ for $n \ge 0$. If $\sum_0^\infty b_n$ converges, then so does $\sum_0^\infty a_n$. If $\sum_0^\infty a_n$ diverges, then so does $\sum_0^\infty b_n$.
Proof. Let $s_k = \sum_0^k a_n$ and $t_k = \sum_0^k b_n$; thus $0 \le s_k \le t_k$ for all $k$. If $\sum_0^\infty b_n$ converges, the numbers $t_k$ form a bounded set; hence so do the numbers $s_k$, so the sequence $\{s_k\}$ converges by the monotone sequence theorem. This proves the first assertion, to which the second one is logically equivalent.
A couple of remarks are in order concerning this result. First, the convergence
or divergence of a series is unaffected if finitely many terms are deleted from or
added to the series. Hence, the comparison an ≤ bn only has to be valid for all
n ≥ N , where N is some (possibly large) positive integer. Second, the convergence
or divergence of a series is unaffected if all the terms of the series are multiplied by
a nonzero constant. Hence, the comparison an ≤ bn can be replaced by an ≤ cbn ,
where c is any positive number.
When $a_n$ is an algebraic function of $n$ (obtained from $n$ by applying various combinations of the arithmetic operations together with the operation of raising to a power, $x \to x^a$), one can usually decide the convergence of $\sum a_n$ by comparing it to one of the series $\sum_1^\infty n^{-p}$ discussed in Theorem 6.9. The rule of thumb, obtained by combining Theorems 6.9 and 6.11, is that if $a_n \ge c n^{-1}$ then $\sum a_n$ diverges, whereas if $a_n \le c n^{-p}$ for some $p > 1$ then $\sum a_n$ converges.
EXAMPLE 2. The series $\sum_1^\infty (2n-1)^{-1} = 1 + \frac13 + \frac15 + \cdots$ diverges by comparison to $\sum_1^\infty n^{-1}$, for
$$\frac{1}{2n-1} > \frac{1}{2n} = \frac12 \cdot \frac1n.$$
EXAMPLE 3. The series $\sum_1^\infty (n^2 - 6n + 10)^{-1}$ converges by comparison to $\sum_1^\infty n^{-2}$, but here the comparison takes more work to establish. Since $6n > 10$ except for $n = 1$, it is not true that $(n^2 - 6n + 10)^{-1} \le n^{-2}$. However, we can observe that when $n > 12$ we have $6n < \frac12 n^2$, and hence
$$\frac{1}{n^2 - 6n + 10} < \frac{1}{(n^2/2) + 10} < \frac{1}{n^2/2} = \frac{2}{n^2} \qquad (n > 12),$$
which gives the desired comparison. However, there is also a simpler way to proceed. The key observation is that when $n$ is large, $-6n + 10$ is negligibly small in comparison with $n^2$, so $(n^2 - 6n + 10)^{-1}$ is practically equal to $n^{-2}$. More precisely,
$$\frac{(n^2 - 6n + 10)^{-1}}{n^{-2}} = \frac{n^2}{n^2 - 6n + 10} = \frac{1}{1 - 6n^{-1} + 10n^{-2}} \to 1 \quad \text{as } n \to \infty,$$
which immediately gives the comparison $(n^2 - 6n + 10)^{-1} < 2n^{-2}$ when $n$ is large.
The second method for solving Example 3 can be formulated quite generally;
the result is often called the limit comparison test:
6.12 Theorem. Suppose $\{a_n\}$ and $\{b_n\}$ are sequences of positive numbers and that $a_n/b_n$ approaches a positive, finite limit as $n \to \infty$. Then the series $\sum_0^\infty a_n$ and $\sum_0^\infty b_n$ are either both convergent or both divergent.
Proof. If $a_n/b_n \to l$ as $n \to \infty$, where $0 < l < \infty$, we have $\frac12 l < a_n/b_n < 2l$ when $n$ is large; that is, $a_n < 2l b_n$ and $b_n < (2/l) a_n$. The result therefore follows from Theorem 6.11 and the remarks following it.
Theorem 6.12 can be extended a little. If $a_n/b_n \to 0$ as $n \to \infty$, then $a_n < b_n$ for large $n$, so the convergence of $\sum b_n$ will imply the convergence of $\sum a_n$. Likewise, if $a_n/b_n \to \infty$, then $a_n > b_n$ for large $n$, so the convergence of $\sum a_n$ will imply the convergence of $\sum b_n$. However, the reverse implications are not valid in these cases.
6.13 Theorem (The Ratio Test). Suppose $\{a_n\}$ is a sequence of positive numbers.
a. If $a_{n+1}/a_n < r$ for all sufficiently large $n$, where $r < 1$, then the series $\sum_0^\infty a_n$ converges. On the other hand, if $a_{n+1}/a_n \ge 1$ for all sufficiently large $n$, then the series $\sum_0^\infty a_n$ diverges.
b. Suppose that $l = \lim_{n\to\infty} a_{n+1}/a_n$ exists. Then the series $\sum_0^\infty a_n$ converges if $l < 1$ and diverges if $l > 1$. No conclusion can be drawn if $l = 1$.
6.14 Theorem (The Root Test). Suppose $\{a_n\}$ is a sequence of positive numbers.
a. If $a_n^{1/n} < r$ for all sufficiently large $n$, where $r < 1$, then the series $\sum_0^\infty a_n$ converges. On the other hand, if $a_n^{1/n} \ge 1$ for all sufficiently large $n$, then the series $\sum_0^\infty a_n$ diverges.
b. Suppose that $l = \lim_{n\to\infty} a_n^{1/n}$ exists. Then the series $\sum_0^\infty a_n$ converges if $l < 1$ and diverges if $l > 1$. No conclusion can be drawn if $l = 1$.
Proof. If $a_n^{1/n} < r$, we have $a_n < r^n$, so we have an immediate comparison to the geometric series $\sum r^n$ that gives the convergence of $\sum a_n$ when $r < 1$. If $a_n^{1/n} \ge 1$ then $a_n \ge 1$, so $a_n \not\to 0$ and $\sum a_n$ diverges. This proves (a).
Part (b) follows as in the proof of the ratio test. If $a_n^{1/n} \to l < 1$, we pick $r \in (l, 1)$ and obtain $a_n^{1/n} < r$ for large $n$, so $\sum a_n$ converges. If $a_n^{1/n} \to l > 1$, then $a_n^{1/n} \ge 1$ for large $n$, and $\sum a_n$ diverges. Finally, for $a_n = n^{-p}$ we have $a_n^{1/n} = n^{-p/n} \to 1$ for any $p$, so the test is inconclusive when $l = 1$.
Note: In the last line of this proof, and in Example 4 below, we use the fact that $\lim_{x\to\infty} x^{1/x} = 1$. To see this, observe that $\log(x^{1/x}) = (\log x)/x$, and $\lim_{x\to\infty} (\log x)/x = 0$ by l'Hôpital's rule.
It can be shown that if $a_{n+1}/a_n$ converges to a limit $l$, then $a_n^{1/n}$ also converges to the same limit; but the convergence of $a_n^{1/n}$ does not imply the convergence of $a_{n+1}/a_n$. (See Example 6.) Thus the root test is, in theory, more powerful than the ratio test. However, the ratio test is often more convenient to use in practice, especially for series whose terms involve factorials or similar sorts of products.
EXAMPLE 4. Let $a_n = n^2/2^n$. The ratio test and the root test can both be used to establish the convergence of $\sum_0^\infty a_n$:
$$\frac{a_{n+1}}{a_n} = \frac{(n+1)^2/2^{n+1}}{n^2/2^n} = \frac12 \left(\frac{n+1}{n}\right)^2 \to \frac12, \qquad a_n^{1/n} = \frac{n^{2/n}}{2} \to \frac12.$$
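The two limits computed in Example 4 can be observed numerically; the following Python sketch (illustrative only) evaluates the ratio and root statistics at a moderately large $n$:

```python
def a(n: int) -> float:
    """a_n = n^2 / 2^n from Example 4."""
    return n**2 / 2**n

n = 1000
ratio = a(n + 1) / a(n)    # tends to 1/2
root = a(n) ** (1 / n)     # also tends to 1/2, but more slowly
assert abs(ratio - 0.5) < 1e-2
assert abs(root - 0.5) < 1e-1
```

Note that the root statistic approaches its limit much more slowly than the ratio, which is one practical reason the ratio test is usually preferred when both apply.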
EXAMPLE 5. Let $a_n = \dfrac{1 \cdot 4 \cdot 7 \cdots (3n+1)}{2^n n!}$. Here the root test is cumbersome, but the ratio test works easily:
$$\frac{a_{n+1}}{a_n} = \frac{3n+4}{2(n+1)} \to \frac32 > 1,$$
so the series diverges.
EXAMPLE 6. Let $a_n = 2^{-m}$ when $n = 2m$ or $n = 2m+1$; that is, the series is $1 + 1 + \frac12 + \frac12 + \frac14 + \frac14 + \cdots$. Here $a_{n+1}/a_n$ equals 1 if $n$ is even and $\frac12$ if $n$ is odd, so the ratio test (even the more general form in part (a) of Theorem 6.13) fails; its hypotheses are not satisfied. But the root test works: $a_n^{1/n}$ equals $2^{-1/2}$ if $n$ is even and $2^{-(n-1)/2n}$ if $n$ is odd; both of these expressions converge to $2^{-1/2}$ as $n \to \infty$, so the series converges. (Of course, this can also be proved more simply. By grouping the terms in pairs, one sees that $\sum_0^\infty a_n = 2 \sum_0^\infty 2^{-m} = 4$.)
Raabe's Test. The ratio test and the root test are, in a sense, rather crude, for the indecisive cases where $\lim a_{n+1}/a_n = 1$ or $\lim a_n^{1/n} = 1$ include many commonly encountered series such as $\sum_1^\infty n^{-p}$. The reason for this insensitivity is that the terms of the geometric series $\sum r^n$ either converge to zero exponentially fast (if $r < 1$) or not at all (if $r \ge 1$), so they do not furnish a useful comparison for quantities such as $n^{-p}$ that tend to zero only polynomially fast. However, there is another test, Raabe's test, that is sometimes useful in the case where $\lim a_{n+1}/a_n = 1$. The class of problems for which Raabe's test is effective is rather limited, and there is another way of attacking the most important of them that we shall present in §7.6. Hence we view Raabe's test as an optional topic; however, the insight behind it is of interest in its own right.
The idea is to use the ratios $a_{n+1}/a_n$ to compare the series $\sum a_n$ to one of the series $\sum n^{-p}$ rather than to the geometric series. For the series $\sum n^{-p}$, the ratio of two successive terms is $(n+1)^{-p}/n^{-p} = [1 + (1/n)]^{-p}$. To put this quantity in a form more amenable to comparison, we use the tangent line approximation to the function $f(x) = (1+x)^{-p}$ at $x = 0$. Since $f'(x) = -p(1+x)^{-p-1}$ and $f''(x) = p(p+1)(1+x)^{-p-2}$, Lagrange's formula for the error term gives
$$(1+x)^{-p} = 1 - px + E(x), \qquad 0 < E(x) < \frac{p(p+1)}{2} x^2 \ \text{ for } x > 0.$$
Hence,
$$(6.15) \qquad \frac{(n+1)^{-p}}{n^{-p}} = \left(1 + \frac{1}{n}\right)^{-p} = 1 - \frac{p}{n} + E_n, \qquad 0 < E_n < \frac{p(p+1)}{2n^2}.$$
Thus, $n[1 - (n+1)^{-p}/n^{-p}]$ is approximately $p$ when $n$ is large. With this in mind, we are ready for the main result.
6.16 Theorem (Raabe's Test). Let $\{a_n\}$ be a sequence of positive numbers. Suppose that
$$\frac{a_{n+1}}{a_n} \to 1 \quad \text{and} \quad n\left(1 - \frac{a_{n+1}}{a_n}\right) \to L \quad \text{as } n \to \infty.$$
If $L > 1$, the series $\sum a_n$ converges, and if $L < 1$, the series $\sum a_n$ diverges. (If $L = 1$, no conclusion can be drawn.)
Proof. If $L > 1$, choose a number $p$ with $1 < p < L$. Then, when $n$ is large, we have $n[1 - (a_{n+1}/a_n)] > p$, that is, $a_{n+1}/a_n < 1 - (p/n)$. Thus, by (6.15),
The main applications of Raabe’s test are to series whose terms involve quo-
tients of factorial-like products. The following example is typical.
EXAMPLE 7. Let $a_n = \dfrac{1 \cdot 4 \cdot 7 \cdots (3n+1)}{n^2\, 3^n\, n!}$. We have
$$\frac{a_{n+1}}{a_n} = \frac{1 \cdot 4 \cdots (3n+1)(3n+4)\big/(n+1)^2\, 3^{n+1}\, (n+1)!}{1 \cdot 4 \cdots (3n+1)\big/n^2\, 3^n\, n!} = \frac{(3n+4)n^2}{3(n+1)^3}.$$
This tends to 1 as $n \to \infty$ (the dominant term on both top and bottom is $3n^3$), so the ratio test fails. But
$$n\left(1 - \frac{a_{n+1}}{a_n}\right) = n\left(1 - \frac{(3n+4)n^2}{3(n+1)^3}\right) = \frac{5n^3 + 9n^2 + 3n}{3(n+1)^3} \to \frac{5}{3},$$
and $\frac{5}{3} > 1$, so the series $\sum a_n$ converges.
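The computation in Example 7 is easy to check numerically. The Python sketch below (an added illustration) evaluates the simplified ratio $a_{n+1}/a_n = (3n+4)n^2/(3(n+1)^3)$ and the Raabe statistic $n(1 - a_{n+1}/a_n)$ for large $n$:

```python
def ratio(n: int) -> float:
    """a_{n+1}/a_n for Example 7, in the simplified form derived in the text."""
    return (3 * n + 4) * n**2 / (3 * (n + 1) ** 3)

n = 10**6
assert abs(ratio(n) - 1.0) < 1e-5       # the ratio test is inconclusive
raabe = n * (1 - ratio(n))
assert abs(raabe - 5 / 3) < 1e-3        # Raabe limit L = 5/3 > 1: convergence
```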
Concluding Remarks. Faced with an infinite series $\sum a_n$, how does one decide how to test it for convergence? Some series require more cleverness than others, but the following rules of thumb may be helpful.
• Does $a_n \to 0$ as $n \to \infty$? If not, $\sum a_n$ diverges.
• If $a_n$ involves expressions with $n$ in the exponent, try the ratio test or the root test.
• If an involves factorial-like products, the ratio test is the best bet. If the ratio
test fails because lim an+1 /an = 1, try Raabe’s test.
• The integral test may be useful when numerical estimates are desired or when
the series is near the borderline between convergence and divergence.
In any case, one should beware of confusing the various sequences that arise in the study of infinite series. For any infinite series $\sum a_n$, one has the sequence $\{a_n\}$ of terms and the sequence $\{s_k\}$ of partial sums. In the ratio test, one considers the sequence $\{a_{n+1}/a_n\}$ of ratios of successive terms of a series, whereas in the limit comparison test, one considers the sequence $\{a_n/b_n\}$ of ratios of corresponding terms of two different series. Don't mix these sequences up!
EXERCISES
In Exercises 1–18, test the series for convergence.
1. $\sum_0^\infty \dfrac{\sqrt{n+1}}{n^2 - 4n + 5}$.
2. $\sum_1^\infty n e^{-n}$.
3. $\sum_1^\infty \dfrac{2n^2 - n}{2n^{8/3} + n}$.
4. $\sum_1^\infty \dfrac{n+1}{n!}$.
5. $\sum_0^\infty \dfrac{(2n+1)3^n}{(3n+1)2^n}$.
6. $\sum_0^\infty \dfrac{1^2 \cdot 3^2 \cdots (2n+1)^2}{3^n (2n)!}$.
7. $\sum_1^\infty \dfrac{n!}{10^n}$.
8. $\sum_2^\infty (\log n)^{-100}$.
9. $\sum_0^\infty \dfrac{1 \cdot 3 \cdots (2n+1)}{2 \cdot 5 \cdots (3n+2)}$.
10. $\sum_0^\infty \dfrac{(n!)^2}{(2n)!}$.
11. $\sum_0^\infty \dfrac{3^n n!}{n^n}$.
12. $\sum_1^\infty \left(\dfrac{n}{n+1}\right)^{n^2}$.
13. $\sum_1^\infty [1 - \cos(1/n)]$.
14. $\sum_1^\infty \dfrac{\sqrt{n+1} - \sqrt{n}}{\sqrt{n+2}}$.
15. $\sum_1^\infty \sin\dfrac{n}{n^2+3}$.
16. $\sum_1^\infty \dfrac{n^2 [\pi + (-1)^n]^n}{5^n}$.
17. $\sum_1^\infty \dfrac{1 \cdot 3 \cdots (2n-1)}{4 \cdot 6 \cdots (2n+2)}$.
18. $\sum_1^\infty \dfrac{2 \cdot 4 \cdots (2n)}{3 \cdot 5 \cdots (2n+1)}$.
19. Suppose $a_n > 0$. Show that if $\sum a_n$ converges, then so does $\sum a_n^p$ for any $p > 1$.
20. Show that $\sum_2^\infty \dfrac{1}{n(\log n)^p}$ converges if $p > 1$ and diverges if $p \le 1$.
21. For which $p$ does $\sum_4^\infty \dfrac{1}{n(\log n)(\log\log n)^p}$ converge?
22. By Exercise 20, $\sum_2^\infty 1/[n \log n]$ diverges while $\sum_2^\infty 1/[n(\log n)^2]$ converges. Use Theorem 6.7 to show that
$$4.88 < \sum_2^{10^{40}} \frac{1}{n \log n} < 5.61, \qquad \sum_{10^{40}}^\infty \frac{1}{n(\log n)^2} \approx 0.011.$$
The point is that for series such as these that are near the borderline between convergence and divergence, attempts at numerical approximation by adding up the first few terms aren't much use. If you add up the first $10^{40}$ terms of the first series, you get no clue that the series diverges; and if you add up the first $10^{40}$ terms of the second one, the answer you get still differs from the full sum in the second decimal place. (By way of comparison, the universe is around $10^{18}$ seconds old, and the earth contains around $10^{50}$ atoms.)
23. Verify that $x/(x^2+1)^2$ is decreasing for $x \ge 3^{-1/2}$, and thence show that $0.38 < \sum_1^\infty n/(n^2+1)^2 < 0.41$.
24. Let $c_k = 1 + \frac12 + \cdots + \frac1k - \log k$. Show that the sequence $\{c_k\}$ is positive and decreasing, and hence convergent. ($\lim_{k\to\infty} c_k$ is conventionally denoted by $\gamma$ and is called Euler's constant or the Euler-Mascheroni constant. It is approximately equal to 0.57721; it is conjectured to be transcendental, but at present no one knows whether it is even irrational.)
25. Suppose $a_n > 0$ for all $n > 0$, and let $L = \limsup a_n^{1/n}$ (see Exercises 9–12 in §1.5). Show that $\sum_1^\infty a_n$ converges if $L < 1$ and diverges if $L > 1$.
Important Remark. We can consider series whose terms are complex numbers
or n-dimensional vectors instead of real numbers. The definition of absolute con-
vergence is the same, with |an | denoting the norm of the vector an . Theorem 6.17
remains valid in this more general setting, with exactly the same proof.
The converse of Theorem 6.17 is false; a series that is not absolutely convergent
may still converge because of cancellation between the positive and negative terms.
A series that converges but does not converge absolutely is said to be conditionally
convergent.
EXAMPLE 1. Let $a_n = 1/(n+1)$ if $n$ is even, $a_n = -1/n$ if $n$ is odd; thus,
$$\sum_0^\infty a_n = 1 - 1 + \tfrac13 - \tfrac13 + \tfrac15 - \tfrac15 + \cdots.$$
series for $f(x) = \log(1+x)$ at $x = 1$. Indeed, for $n > 0$ we have $f^{(n)}(x) = (-1)^{n-1}(n-1)!(1+x)^{-n}$, so Taylor's formula gives
$$\log(1+x) = \sum_1^k \frac{(-1)^{n-1}(n-1)!}{n!} x^n + R_k(x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + \frac{(-1)^{k-1} x^k}{k} + R_k(x),$$
and by Corollary 2.61,
$$|R_k(1)| \le \frac{1}{(k+1)!} \sup_{0 \le t \le 1} \left|\frac{(-1)^k k!}{(1+t)^{k+1}}\right| = \frac{1}{k+1},$$
which tends to zero as $k \to \infty$. It follows that $\sum_1^\infty (-1)^{n-1}/n$ converges to $\log 2$.
Given a series $\sum a_n$ with real terms, we define
$$a_n^+ = \max(a_n, 0), \qquad a_n^- = \max(-a_n, 0).$$
That is, $a_n^+ = a_n$ if $a_n$ is positive and $a_n^+ = 0$ otherwise, and $a_n^- = |a_n|$ if $a_n$ is negative and $a_n^- = 0$ otherwise; the nonzero $a_n^+$'s are the positive terms of the series $\sum a_n$, and the nonzero $a_n^-$'s are the absolute values of the negative terms. Observe that
$$a_n^+ - a_n^- = a_n, \qquad a_n^+ + a_n^- = |a_n|.$$
6.18 Theorem. If $\sum a_n$ is absolutely convergent, the series $\sum a_n^+$ and $\sum a_n^-$ are both convergent. If $\sum a_n$ is conditionally convergent, the series $\sum a_n^+$ and $\sum a_n^-$ are both divergent.
Absolutely convergent series are much more pleasant to deal with than conditionally convergent ones. For one thing, they converge more rapidly; the partial sums $s_k$ of conditionally convergent series tend to provide poor approximations to the full sum unless one takes $k$ very large, because the divergence of $\sum |a_n|$ implies that $a_n$ cannot tend to zero very rapidly as $n \to \infty$. For another thing, the sum of an absolutely convergent series cannot be affected by rearranging the terms, but this is not the case for conditionally convergent series!
Let us explain this mysterious statement in more detail. The terms of a series $\sum_0^\infty a_n$ are presented in a definite order: $a_0, a_1, a_2, \ldots$. We might think of forming a new series by writing down these terms in a different order, such as
$$a_0, a_2, a_1, a_4, a_6, a_3, a_8, a_{10}, a_5, \ldots,$$
where we take the first two even-numbered terms, the first odd-numbered term, the next two even-numbered terms, the next odd-numbered term, and so forth. In general, if $\sigma$ is any one-to-one mapping from the set of nonnegative integers onto itself, we can form the series $\sum_0^\infty a_{\sigma(n)}$, which we call a rearrangement of $\sum_0^\infty a_n$. (The reasons why we would want to do this are perhaps not so clear right now, but we will encounter situations in §6.5 where this issue must be addressed.) The sharp contrast between absolutely and conditionally convergent series with respect to rearrangements is explained in the following two theorems.
6.19 Theorem. If $\sum_0^\infty a_n$ is absolutely convergent with sum $S$, then every rearrangement $\sum_0^\infty a_{\sigma(n)}$ is also absolutely convergent with sum $S$.
Proof. First suppose $a_n \ge 0$ for all $n$. Every term of the rearranged series $\sum a_{\sigma(n)}$ is among the terms of the original series $\sum a_n$, and hence the partial sums of the rearranged series cannot exceed $S$. It follows that the full sum $S'$ of the rearranged series satisfies $S' \le S$. The same reasoning shows that $S \le S'$, so $S' = S$.
Now we do the general case. If $\sum |a_n| < \infty$, we have $\sum |a_{\sigma(n)}| < \infty$ by what we have just proved. Hence, given $\epsilon > 0$, for $k$ sufficiently large we have
$$\sum_{k+1}^\infty |a_n| < \epsilon \quad \text{and} \quad \sum_{k+1}^\infty |a_{\sigma(n)}| < \epsilon.$$
Given such a $k$, let $K$ be the largest of the numbers $\sigma(0), \ldots, \sigma(k)$, so that
$$\{\sigma(0), \sigma(1), \ldots, \sigma(k)\} \subset \{0, 1, \ldots, K\}.$$
The elements of $\{0, 1, \ldots, K\} \setminus \{\sigma(0), \sigma(1), \ldots, \sigma(k)\}$ are among the $\sigma(n)$'s with $n \ge k+1$, so
$$\left|\sum_0^K a_n - \sum_0^k a_{\sigma(n)}\right| \le \sum_{k+1}^\infty |a_{\sigma(n)}| < \epsilon.$$
But then
$$\left|\sum_0^k a_{\sigma(n)} - S\right| \le \left|\sum_0^k a_{\sigma(n)} - \sum_0^K a_n\right| + \left|\sum_0^K a_n - S\right| \le \epsilon + \sum_{K+1}^\infty |a_n| < 2\epsilon.$$
As $\epsilon$ is arbitrary, we conclude that $\sum_0^\infty a_{\sigma(n)} = S$.
6.20 Theorem. Suppose $\sum_0^\infty a_n$ is conditionally convergent. Given any real number $S$, there is a rearrangement $\sum_0^\infty a_{\sigma(n)}$ that converges to $S$.
Proof. By Theorem 6.18, the series $\sum a_n^+$ and $\sum a_n^-$ of positive and negative terms from $\sum a_n$ both diverge; but since $\sum a_n$ converges, we have $a_n \to 0$ as $n \to \infty$. These pieces of information are all we need.
Suppose $S \ge 0$. (A similar argument works for $S < 0$.) We construct the desired rearrangement as follows:
1. Add up the positive terms from the series $\sum a_n$ (in their original order) until the sum exceeds $S$. This is possible since $\sum a_n^+ = \infty$. Stop as soon as the sum exceeds $S$.
2. Now start adding in the negative terms (in their original order) until the sum becomes less than $S$. Again, this is possible since $\sum a_n^- = \infty$. Stop as soon as the sum is less than $S$.
3. Repeat steps 1 and 2 ad infinitum. That is, add in positive terms until the sum is greater than $S$, then add in negative terms until the sum is less than $S$, and so forth. This process never terminates since the series $\sum a_n^+$ and $\sum a_n^-$ both diverge, and sooner or later every term from the original series will be added into the new series. The result is a rearrangement $\sum_0^\infty a_{\sigma(n)}$ of the original series.
We claim that this rearrangement converges to $S$. Indeed, given $\epsilon > 0$, there exists an integer $N$ so that $|a_n| < \epsilon$ if $n > N$. If we choose $K$ large enough so that all the terms $a_0, a_1, \ldots, a_N$ are included among the terms $a_{\sigma(0)}, a_{\sigma(1)}, \ldots, a_{\sigma(K)}$, then $|a_{\sigma(n)}| < \epsilon$ if $n > K$. It follows that the partial sums $\sum_0^k a_{\sigma(n)}$ differ from $S$ by less than $\epsilon$ if $k > K$, because the procedure specifies switching from positive to negative terms or vice versa as soon as the sum is greater than or less than $S$; if the sum became greater than $S + \epsilon$ or less than $S - \epsilon$, we would have added in too many terms of the same sign. Hence the sums $\sum_0^k a_{\sigma(n)}$ converge to $S$.
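The procedure in this proof is easy to simulate. The following Python sketch (an added illustration) applies it to the conditionally convergent alternating harmonic series $\sum_1^\infty (-1)^{n-1}/n$ with the arbitrarily chosen target $S = 0.3$:

```python
def rearranged_partial_sum(S: float, num_terms: int) -> float:
    """Greedy rearrangement from the proof of Theorem 6.20, applied to
    sum (-1)^(n-1)/n: take the next unused positive term while the running
    sum is <= S, otherwise take the next unused negative term."""
    pos = iter(1 / n for n in range(1, 10**8, 2))     # 1, 1/3, 1/5, ...
    neg = iter(-1 / n for n in range(2, 10**8, 2))    # -1/2, -1/4, ...
    total = 0.0
    for _ in range(num_terms):
        total += next(pos) if total <= S else next(neg)
    return total

# After many terms, the partial sums of the rearrangement hug the target.
assert abs(rearranged_partial_sum(0.3, 200_000) - 0.3) < 1e-4
```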
EXERCISES
1. Show that the following series are absolutely convergent.
a. $\sum_0^\infty x^n \cos n\theta$ ($|x| < 1$, $\theta \in \mathbb{R}$).
b. $\sum_1^\infty n^{-2} \sin n\theta$ ($\theta \in \mathbb{R}$).
c. $\sum_1^\infty (-1)^n n^2 3^{1-n} x^n$ ($|x| < 3$).
2. Suppose $\sum a_n$ is conditionally convergent. Show that there are rearrangements of $\sum a_n$ whose partial sums diverge to $+\infty$ or $-\infty$.
3. Consider the rearrangement of the series $\sum_1^\infty (-1)^{n-1}/n$ obtained by taking two positive terms, one negative term, two positive terms, one negative term, and so forth:
$$1 + \tfrac13 - \tfrac12 + \tfrac15 + \tfrac17 - \tfrac14 + \tfrac19 + \tfrac1{11} - \tfrac16 + \cdots.$$
Show that the sum of this series is $\frac32 \log 2$. (Hint: Deduce from Example 2 that $0 + \frac12 + 0 - \frac14 + 0 + \frac16 + 0 - \cdots = \frac12 \log 2$ and add this to the result of Example 2.)
4. Let $\sum_0^\infty a_n$ be a convergent series, and let $\sum_0^\infty b_n$ be its rearrangement obtained by interchanging each even-numbered term with the odd-numbered term immediately following it: $a_1 + a_0 + a_3 + a_2 + a_5 + a_4 + \cdots$. Show that $\sum_0^\infty b_n = \sum_0^\infty a_n$.
5. Suppose $a_n > -1$ for all $n$. By suitable applications of Taylor's theorem to the functions $\log(1+x)$ or $e^x$, show the following:
a. $\sum a_n$ is absolutely convergent if and only if $\sum \log(1 + a_n)$ is absolutely convergent. (This is of interest in connection with Exercise 4 of §6.1: If $\sum |a_n| < \infty$, then $\prod (1 + a_n)$ converges.)
b. Let $a_n = (-1)^{n+1}/\sqrt{n}$. Then $\sum_1^\infty a_n$ is conditionally convergent (see Theorem 6.22 below), but $\sum_1^\infty \log(1 + a_n)$ diverges.
6.4 More Convergence Tests

6.21 Theorem.
a. If $|a_n| \le C n^{-1-\epsilon}$ for some $C, \epsilon > 0$, then $\sum a_n$ converges absolutely. If $|a_n| \ge C n^{-1}$ for some $C > 0$, then $\sum a_n$ either converges conditionally or diverges.
b. (The Ratio Test) If $|a_{n+1}/a_n| \to l$ as $n \to \infty$, then $\sum a_n$ converges absolutely if $l < 1$ and diverges if $l > 1$.
c. (The Root Test) If $|a_n|^{1/n} \to l$ as $n \to \infty$, then $\sum a_n$ converges absolutely if $l < 1$ and diverges if $l > 1$.
In the ratio and root tests, the divergence (rather than conditional convergence)
when l > 1 is guaranteed because an ̸→ 0 in this case; see the proofs of Theorems
6.13 and 6.14. The statements of the ratio and root tests can be sharpened a bit as
in Theorems 6.13a and 6.14a.
Warning. It is a common mistake to obtain incorrect results by forgetting the absolute values in Theorem 6.21. For example, the series $\sum_0^\infty (-2)^n$ satisfies

$s_k > S$ for $k$ even, $s_k < S$ for $k$ odd, and $|s_k - S| < a_{k+1}$ for all $k$.
Thus the sequence $\{s_{2m-1}\}$ of odd-numbered partial sums is increasing and the sequence $\{s_{2m}\}$ of even-numbered partial sums is decreasing. This monotonicity further yields

so $\{s_{2m-1}\}$ and $\{s_{2m}\}$ are bounded above and below, respectively. By the monotone sequence theorem, these sequences both converge, and since $s_{2m} - s_{2m-1} = a_{2m} \to 0$, their limits are equal. Thus the whole sequence $\{s_k\}$ converges, that is, the series $\sum (-1)^n a_n$ converges. The even-numbered partial sums decrease to the full sum $S$ while the odd-numbered ones increase, so $S < s_{2m}$ and $S > s_{2m-1}$ for all $m$. In particular,
The alternating series test is a useful test for conditional convergence, but the fact that the difference between a partial sum and the full sum is less in absolute value than the first neglected term is also of interest in the absolutely convergent case. (This estimate for the error in replacing the full sum by a partial sum is, in most cases, accurate to within an order of magnitude.)
The alternating series test can be applied to a series $\sum (-1)^n a_n$ for which $\lim a_n = 0$ provided that the $a_n$'s decrease from some point onward. (Of course, the inequalities for the partial sums are only valid from that point onward too.) However, the monotonicity condition cannot be dropped entirely, as the following example shows:
$$1 - \tfrac12 + \tfrac12 - \tfrac14 + \tfrac13 - \tfrac16 + \cdots + \tfrac1m - \tfrac1{2m} + \cdots.$$
Here $a_n \to 0$ as $n \to \infty$, but not monotonically, and the series diverges. (The sum of the first $2m$ terms is $\frac12(1 + \frac12 + \frac13 + \cdots + \frac1m)$, a partial sum of the divergent series $\sum \frac12 n^{-1}$.)
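These facts are easy to observe numerically for the alternating harmonic series $\sum_1^\infty (-1)^{n-1}/n = \log 2$ from Example 2; a Python sketch, added for illustration:

```python
import math

S = math.log(2)
s, partials = 0.0, []
for n in range(1, 101):
    s += (-1) ** (n - 1) / n
    partials.append(s)

# Each partial sum differs from log 2 by less than the first omitted term.
for k, sk in enumerate(partials, start=1):
    assert abs(sk - S) < 1 / (k + 1)

# Successive partial sums bracket the full sum.
assert partials[0] > S > partials[1]
```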
The tests we have developed can be used to analyze a wide variety of power series, that is, series of the form $\sum_0^\infty c_n (x - a)^n$ where $x$ is a real variable. In typical cases, the ratio test or the root test will establish that there is some number $r$ such that the series converges absolutely for $|x - a| < r$ and diverges for $|x - a| > r$. The convergence at the two remaining points $x = a \pm r$ can then be studied by one of the other tests.
EXAMPLE 2. Consider the series $\sum_{0}^{\infty} \dfrac{(-1)^n (x-3)^n}{(n+1)2^{2n+1}}$. We start with the ratio
test:
$$\left|\frac{a_{n+1}}{a_n}\right| = \left|\frac{(-1)^{n+1}(x-3)^{n+1}/(n+2)2^{2n+3}}{(-1)^n(x-3)^n/(n+1)2^{2n+1}}\right| = \frac{n+1}{n+2}\cdot\frac{|x-3|}{4} \to \frac{|x-3|}{4}.$$
Thus the series converges absolutely for |x−3| < 4 and diverges for |x−3| > 4.
(The root test would also yield this result.) The two remaining points are where
x − 3 = ±4, that is, x = −1 and x = 7. At these two points the series becomes
$$\sum_{0}^{\infty} \frac{(-1)^n(-4)^n}{(n+1)2^{2n+1}} = \frac{1}{2}\sum_{0}^{\infty}\frac{1}{n+1} \qquad\text{and}\qquad \sum_{0}^{\infty} \frac{(-1)^n 4^n}{(n+1)2^{2n+1}} = \frac{1}{2}\sum_{0}^{\infty}\frac{(-1)^n}{n+1}.$$
The first of these diverges, while the second one converges by the alternating
series test. The convergence is only conditional, by the divergence of the first
series. Thus the original series converges absolutely for −1 < x < 7, con-
verges conditionally at x = 7, and diverges elsewhere.
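A quick numerical check of this example (my own sketch in plain Python; the helper name is hypothetical) computes the partial sums by the same term ratio used in the ratio test above:

```python
import math

# Partial sums of sum_{n>=0} (-1)^n (x-3)^n / ((n+1) 2^(2n+1)).
# Terms are generated by the recurrence a_{n+1}/a_n = -(x-3)/4 * (n+1)/(n+2),
# which avoids overflowing powers.

def partial_sum(x, num_terms):
    term = 0.5  # the n = 0 term is 1/2
    s = 0.0
    for n in range(num_terms):
        s += term
        term *= -(x - 3.0) / 4.0 * (n + 1.0) / (n + 2.0)
    return s

# At x = 7 the series is (1/2) sum (-1)^n/(n+1), converging (slowly,
# conditionally) to (1/2) log 2; well inside the interval convergence is fast.
print(partial_sum(7.0, 100000), 0.5 * math.log(2.0))
print(partial_sum(4.0, 60), 2.0 * math.log(1.25))
```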
6.4. More Convergence Tests 303
6.23 Lemma (Summation by Parts). Given two numerical sequences $\{a_n\}$ and
$\{b_n\}$, let
$$a'_n = a_n - a_{n-1}, \qquad B_n = b_0 + \cdots + b_n.$$
Then
$$\sum_{0}^{k} a_n b_n = a_k B_k - \sum_{1}^{k} a'_n B_{n-1}.$$
Proof.
$$\begin{aligned} a_0 b_0 + a_1 b_1 + a_2 b_2 + \cdots + a_k b_k &= a_0 B_0 - a_1 B_0 + a_1 B_1 - a_2 B_1 + a_2 B_2 - \cdots - a_k B_{k-1} + a_k B_k \\ &= -a'_1 B_0 - a'_2 B_1 - \cdots - a'_k B_{k-1} + a_k B_k. \end{aligned} \tag{6.24}$$
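As a sanity check (a plain-Python sketch of my own, not from the text), the summation-by-parts identity can be verified numerically on random finite sequences:

```python
import random

# Check: sum_{n=0}^{k} a_n b_n = a_k B_k - sum_{n=1}^{k} (a_n - a_{n-1}) B_{n-1},
# where B_n = b_0 + ... + b_n.

def summation_by_parts_rhs(a, b):
    k = len(a) - 1
    B, total = [], 0.0
    for bn in b:          # running partial sums B_n
        total += bn
        B.append(total)
    return a[k] * B[k] - sum((a[n] - a[n - 1]) * B[n - 1] for n in range(1, k + 1))

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(8)]
b = [random.uniform(-1, 1) for _ in range(8)]
lhs = sum(an * bn for an, bn in zip(a, b))
print(lhs, summation_by_parts_rhs(a, b))
```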
6.25 Theorem (Dirichlet's Test). Let $\{a_n\}$ and $\{b_n\}$ be numerical sequences. Suppose that the sequence $\{a_n\}$ is decreasing and tends to 0 as $n \to \infty$, and that the
sums $B_n = b_0 + \cdots + b_n$ are bounded in absolute value by a constant $C$ independent
of $n$. Then the series $\sum_{0}^{\infty} a_n b_n$ converges.
Proof. With notation as in Lemma 6.23, $\sum_{0}^{k} a_n b_n = a_k B_k - \sum_{1}^{k} a'_n B_{n-1}$, so
it is enough to show that $\lim_{k\to\infty} a_k B_k$ exists and that the series $\sum_{1}^{\infty} a'_n B_{n-1}$
converges. The first assertion is easy: Since $|B_k| \le C$ and $a_k \to 0$, we have
$|a_k B_k| \le C a_k \to 0$. On the other hand, since $\{a_n\}$ is decreasing, we have $a'_n \le 0$
for all $n$, so
$$\sum_{1}^{k} |a'_n B_{n-1}| \le C \sum_{1}^{k} |a'_n| = C\bigl[(a_0 - a_1) + (a_1 - a_2) + \cdots + (a_{k-1} - a_k)\bigr] = C(a_0 - a_k) \le C a_0$$
for all $k$. It follows that the series $\sum_{1}^{\infty} a'_n B_{n-1}$ is absolutely convergent and hence
convergent.
304 Chapter 6. Infinite Series
Dirichlet’s test includes the alternating series test as a special case, by taking
bn = (−1)n , for which Bn = 1 or 0 according as n is even or odd. The other
situations in which it is most commonly applied are those with bn = sin nθ or
bn = cos nθ, where θ is not an integer multiple of 2π. That the hypotheses on {bn }
in Dirichlet’s test are satisfied in these cases is shown by the following calculation.
$$\sum_{1}^{k} \cos n\theta = \frac{\cos\frac{1}{2}(k+1)\theta \cdot \sin\frac{1}{2}k\theta}{\sin\frac{1}{2}\theta}, \qquad \sum_{1}^{k} \sin n\theta = \frac{\sin\frac{1}{2}(k+1)\theta \cdot \sin\frac{1}{2}k\theta}{\sin\frac{1}{2}\theta}.$$
$$\begin{aligned} \sum_{1}^{k} e^{in\theta} &= e^{i\theta}\,\frac{e^{ik\theta}-1}{e^{i\theta}-1} = e^{i\theta}\,\frac{e^{ik\theta/2}\bigl[e^{ik\theta/2}-e^{-ik\theta/2}\bigr]}{e^{i\theta/2}\bigl[e^{i\theta/2}-e^{-i\theta/2}\bigr]} = e^{i(k+1)\theta/2}\,\frac{e^{ik\theta/2}-e^{-ik\theta/2}}{e^{i\theta/2}-e^{-i\theta/2}} \\ &= \bigl[\cos\tfrac{1}{2}(k+1)\theta + i\sin\tfrac{1}{2}(k+1)\theta\bigr]\frac{\sin\frac{1}{2}k\theta}{\sin\frac{1}{2}\theta}. \end{aligned}$$
The asserted formulas follow by taking the real and imaginary parts of both sides.
Proof. The hypotheses of Dirichlet's test are satisfied for $\theta \ne 2\pi j$, for if $b_n$ is either
$\cos n\theta$ or $\sin n\theta$, the lemma implies that $|B_n| \le |\csc \frac{1}{2}\theta|$ for all $n$. (If $\theta = 2\pi j$, the
series $\sum a_n \sin n\theta$ converges trivially since $\sin n\theta = 0$ for all $n$.)
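These bounds are easy to see numerically. The following plain-Python sketch (my own, not from the text) checks the closed form for $\sum_1^k \cos n\theta$ and that the partial sums stay within the $|\csc\frac12\theta|$ bound:

```python
import math

def cos_partial_sum(k, theta):
    """B_k = sum_{n=1}^{k} cos(n*theta), computed directly."""
    return sum(math.cos(n * theta) for n in range(1, k + 1))

def closed_form(k, theta):
    """cos((k+1)theta/2) * sin(k theta/2) / sin(theta/2), as derived above."""
    return math.cos((k + 1) * theta / 2) * math.sin(k * theta / 2) / math.sin(theta / 2)

theta = 1.0  # not a multiple of 2*pi
bound = abs(1.0 / math.sin(theta / 2))  # |csc(theta/2)|
for k in (5, 50, 500):
    print(k, cos_partial_sum(k, theta), closed_form(k, theta), bound)
```

With $a_n = 1/n$, Dirichlet's test then gives convergence of $\sum n^{-1}\cos n\theta$ for these $\theta$.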
EXERCISES
In Exercises 1–9, determine the values of x at which the series converges absolutely
or conditionally.
1. $\sum_{0}^{\infty} \dfrac{(x+2)^n}{n^2+1}$.
2. $\sum_{1}^{\infty} n^3 (2x-1)^n$.
3. $\sum_{0}^{\infty} \dfrac{x^{2n}}{1 \cdot 3 \cdots (2n+1)}$.
4. $\sum_{1}^{\infty} \dfrac{n x^{n+2}}{5^n (n+1)^2}$.
5. $\sum_{0}^{\infty} \dfrac{(-1)^n (x-4)^n}{(2n-3)\log(n+3)}$.
6. $\sum_{1}^{\infty} \dfrac{1}{\sqrt{n}} \left(\dfrac{x-1}{x+1}\right)^n$.
7. $\sum_{1}^{\infty} \dfrac{2 \cdot 4 \cdots (2n)}{1 \cdot 3 \cdots (2n-1)} \bigl(\tfrac{1}{2}x - 3\bigr)^n$.
8. $\sum_{0}^{\infty} \dfrac{(-1)^n (x+1)^{2n}}{3^n + 2}$.
9. $\sum_{0}^{\infty} \dfrac{1 \cdot 3 \cdots (2n+1)}{2 \cdot 5 \cdots (3n+2)}\, x^n$.
13. $\sum_{1}^{\infty} (-1)^{n-1} \log(n \sin n^{-1})$.
14. $\sum_{1}^{\infty} (-1)^{n-1} \left[e - \left(\dfrac{n+1}{n}\right)^n\right]$.
15. Use the alternating series test to show that $x^{-1}\sin x = 1 - \frac{1}{3!}x^2 + \frac{1}{5!}x^4 - \frac{1}{7!}x^6 + E(x)$ where $0 < E(x) < 0.027$ for $|x| \le \pi$.
16. (Abel's Test) Suppose $\sum a_n$ is a convergent series and $\{b_n\}$ is a decreasing
sequence of positive numbers. ($\lim b_n$ need not be zero.) Show that $\sum a_n b_n$
converges. (This can be done by using Dirichlet's test or by modifying the
proof of Dirichlet's test.)
17. Show that if $\sum_{1}^{\infty} a_n$ converges, then so does $\sum_{1}^{\infty} n^{-p} a_n$ for any $p > 0$. For
which $p$ can you guarantee absolute convergence without knowing anything
more about the $a_n$'s?
18. For which $x$ and $\theta$ does $\sum_{1}^{\infty} n^{-1} x^n \cos n\theta$ converge?
that is, a series whose terms are indexed by ordered pairs of nonnegative integers.
The difficulty in making precise sense out of such an expression is that it is not
clear what one should mean by a “partial sum.” Two obvious candidates are the
“square” partial sums and the “triangular” partial sums
$$s^{\square}_k = \sum_{m,n=0}^{k} a_{mn}, \qquad s^{\triangle}_k = \sum_{m+n \le k} a_{mn},$$
which are defined by adding up all the terms $a_{mn}$ for which $(m,n)$ lies in the
outlined regions in Figure 6.2. (Note that passing from $s^{\square}_k$ or $s^{\triangle}_k$ to $s^{\square}_{k+1}$ or $s^{\triangle}_{k+1}$
involves adding not just a single term but a finite set of terms to the sum. It is not
necessary to specify the order in which these terms are added, as finite addition
is commutative.) Clearly there are many other possibilities. Indeed, there are in-
finitely many ways to enumerate the set of ordered pairs of nonnegative integers,
each of which leads to a different notion of “partial sums.”
6.5. Double Series; Products of Series 307
[Figure 6.2: the square region $m, n \le k$ (left) and the triangular region $m + n \le k$ (right) in the $(m,n)$ lattice.]
There is yet another possibility: One can consider the double series (6.28) as
an iterated series, just as one can regard double integrals as iterated integrals. That
is, one could interpret (6.28) as
$$\sum_{m=0}^{\infty} \Bigl(\sum_{n=0}^{\infty} a_{mn}\Bigr) \qquad\text{or}\qquad \sum_{n=0}^{\infty} \Bigl(\sum_{m=0}^{\infty} a_{mn}\Bigr),$$
in which one forms the ordinary series $\sigma_m = \sum_{n=0}^{\infty} a_{mn}$ for each $m$ and then
adds up the sums to obtain $\sum_{m=0}^{\infty} \sigma_m$, or similarly with $m$ and $n$ switched. This is
different from the partial-sum procedures discussed above because the intermediate
steps involve infinite sums rather than finite ones.
How is one to make sense out of all these ways of interpreting (6.28)? The
answer, in a nutshell, is that the situation is similar to that for improper double
integrals discussed in §4.7: For series of positive terms, or for absolutely conver-
gent series, there is no problem, as all interpretations lead to the same answer.
Otherwise, one must proceed with great caution.
Let us explain this in more detail. Given any one-to-one correspondence $j \leftrightarrow (m,n)$ between the set of nonnegative integers and the set of ordered pairs of nonnegative integers, we can set $b_j = a_{mn}$ and form the ordinary infinite series $\sum_{0}^{\infty} b_j$;
we call such a series an ordering of the double series $\sum_{m,n=0}^{\infty} a_{mn}$. The essential
point is that the orderings of $\sum a_{mn}$ are all rearrangements of one another, and we
can apply Theorem 6.19.
First, if $a_{mn} \ge 0$, then either all orderings of $\sum a_{mn}$ diverge or all orderings
converge, and in the latter case their sums are all equal. Thus, the sum of the series
$\sum a_{mn}$ is well defined as a positive number or $+\infty$, independent of the choice of
ordering.
Second, without the assumption of positivity, if $\sum |b_j|$ is convergent for one
ordering of $\sum a_{mn}$, then the same is true for every ordering. In this case the series
$\sum a_{mn}$ is called absolutely convergent, and by Theorem 6.19 again, all orderings of $\sum a_{mn}$ have the same sum, which we call the sum of the double series
$\sum a_{mn}$. Moreover, an argument similar to the proof of Theorem 6.19 shows that
the double series $\sum a_{mn}$ is absolutely convergent if and only if the iterated series
$\sum_m \bigl(\sum_n |a_{mn}|\bigr)$ is convergent, in which case $\sum_{m,n} a_{mn} = \sum_m \bigl(\sum_n a_{mn}\bigr)$. (See
Exercises 5 and 6.)
Given a double series $\sum a_{mn}$, we can therefore proceed as follows. First we
evaluate the series $\sum |a_{mn}|$ by ordering it in some fashion or treating it as an iterated series; if it turns out to be finite, we can then evaluate $\sum a_{mn}$ by ordering it in
any fashion or treating it as an iterated series.
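This procedure can be sketched numerically (my own plain-Python example, not from the text) for the absolutely convergent double series $a_{mn} = 2^{-m}3^{-n}$, whose sum factors as $\bigl(\sum 2^{-m}\bigr)\bigl(\sum 3^{-n}\bigr) = 2 \cdot \frac32 = 3$:

```python
# Two ways of summing the absolutely convergent double series a_{mn} = 2^{-m} 3^{-n}:
# as an iterated series and via square partial sums. Both approach 3.

def iterated_sum(num_terms):
    total = 0.0
    for m in range(num_terms):
        # sigma_m = sum over n of a_{mn}, truncated
        sigma_m = sum(2.0 ** (-m) * 3.0 ** (-n) for n in range(num_terms))
        total += sigma_m
    return total

def square_partial_sum(k):
    # s_k (square): all terms with 0 <= m, n <= k
    return sum(2.0 ** (-m) * 3.0 ** (-n)
               for m in range(k + 1) for n in range(k + 1))

print(iterated_sum(60), square_partial_sum(60))
```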
What if $\sum a_{mn}$ is not absolutely convergent? Let us separate out the positive
and negative terms as we did in Theorem 6.18. The argument in the proof of Theorem 6.18 shows that if $\sum a^+_{mn} = \infty$ but $\sum a^-_{mn} < \infty$, then all orderings of $\sum a_{mn}$
diverge to $+\infty$; likewise, if $\sum a^+_{mn} < \infty$ but $\sum a^-_{mn} = \infty$, then all orderings of
$\sum a_{mn}$ diverge to $-\infty$. On the other hand, if $\sum a^+_{mn} = \sum a^-_{mn} = \infty$ but $a_{mn} \to 0$
as $m, n \to \infty$, the proof of Theorem 6.20 shows that various orderings of $\sum a_{mn}$
can converge to any real number. In this case, therefore, we simply cannot make
numerical sense out of the expression $\sum a_{mn}$ without specifying more precisely
how the summation is to be performed.
An important situation in which double series occur is in multiplying two series
together. The basic result is as follows.
6.29 Theorem. Suppose that $\sum_{0}^{\infty} a_m$ and $\sum_{0}^{\infty} b_n$ are both absolutely convergent,
with sums $A$ and $B$. Then the double series $\sum_{m,n=0}^{\infty} a_m b_n$ is absolutely convergent,
and its sum is $AB$.
Proof. We consider the square partial sums of $\sum a_m b_n$, which are just the products
of the partial sums of $\sum a_m$ and $\sum b_n$:
$$(6.30)\qquad \sum_{m,n=0}^{k} a_m b_n = \Bigl(\sum_{0}^{k} a_m\Bigr)\Bigl(\sum_{0}^{k} b_n\Bigr).$$
If we replace $a_m$ and $b_n$ by $|a_m|$ and $|b_n|$ in (6.30), the right side is bounded by the
finite quantity $\bigl(\sum_{0}^{\infty} |a_m|\bigr)\bigl(\sum_{0}^{\infty} |b_n|\bigr)$, which shows that the double series $\sum a_m b_n$
is absolutely convergent. Then, letting $k \to \infty$ in (6.30), we obtain $\sum a_m b_n = AB$.
Under the conditions of Theorem 6.29, we are free to use any ordering of
$\sum a_m b_n$ that we choose, and in particular, we can use the triangular partial sums
rather than the square ones. This is the natural thing to do when considering power
series. Indeed, if $\sum a_n x^n$ and $\sum b_n x^n$ are absolutely convergent for a particular value of $x$, their product is $\sum a_m b_n x^{m+n}$, which can also be expressed as a
power series if we group together all the terms involving a given power of $x$. The
terms involving $x^j$ are those with $m + n = j$, i.e., those with $m = 0, 1, \dots, j$ and
$n = j - m$. Collecting these terms together yields
$$\Bigl(\sum_{0}^{\infty} a_n x^n\Bigr)\Bigl(\sum_{0}^{\infty} b_n x^n\Bigr) = \sum_{j=0}^{\infty} \Bigl(\sum_{m+n=j} a_n b_m\Bigr) x^j.$$
The expression on the right is a power series whose $j$th coefficient is a finite sum
of products of the original coefficients; its partial sums are precisely the triangular
partial sums of the double series $\sum a_m b_n x^{m+n}$.
The same procedure can also be used for series without an $x$ (by taking $x = 1$,
if you like). That is, given two convergent series $\sum_{0}^{\infty} a_m$ and $\sum_{0}^{\infty} b_n$, we can form
the series
$$\sum_{j=0}^{\infty} \Bigl(\sum_{m+n=j} a_n b_m\Bigr) = \sum_{j=0}^{\infty} (a_0 b_j + a_1 b_{j-1} + \cdots + a_{j-1} b_1 + a_j b_0),$$
whose partial sums are the triangular partial sums of the double series $\sum a_m b_n$;
it is called the Cauchy product of $\sum a_m$ and $\sum b_n$. As we have seen, if $\sum a_m$
and $\sum b_n$ are absolutely convergent, their Cauchy product is too, and its sum is
$\bigl(\sum a_m\bigr)\bigl(\sum b_n\bigr)$. In fact, the Cauchy product converges to $\bigl(\sum a_m\bigr)\bigl(\sum b_n\bigr)$ provided that at least one of $\sum a_m$ and $\sum b_n$ is absolutely convergent (see Krantz
[12, pp. 109–10], or Rudin [18, p. 74]). However, if $\sum a_m$ and $\sum b_n$ are both
conditionally convergent, their Cauchy product may diverge. (See Exercise 4.)
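The Cauchy product construction can be sketched as follows (a plain-Python illustration of my own; the two geometric series and function names are not from the text):

```python
# Cauchy product: the j-th grouped term is a_0 b_j + a_1 b_{j-1} + ... + a_j b_0.
# With a_n = (1/2)^n and b_n = (1/3)^n (both absolutely convergent with
# sums 2 and 3/2), the Cauchy product should sum to 2 * 3/2 = 3.

def cauchy_product_partial(a, b, num_terms):
    """Sum the first num_terms triangular (grouped) terms of the product series."""
    total = 0.0
    for j in range(num_terms):
        total += sum(a(n) * b(j - n) for n in range(j + 1))
    return total

a = lambda n: 0.5 ** n
b = lambda n: (1.0 / 3.0) ** n
print(cauchy_product_partial(a, b, 80))
```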
EXERCISES
1. By multiplying the geometric series by itself, show that for $|x| < 1$,
a. $(1-x)^{-2} = \sum_{0}^{\infty} (n+1)x^n$;
b. $(1-x)^{-3} = \frac{1}{2}\sum_{0}^{\infty} (n+1)(n+2)x^n$.
2. Let $f(x) = \sum_{0}^{\infty} x^n/n!$. Show directly from this formula that $f(x)f(y) = f(x+y)$.
3. Verify that the Taylor series of $(1-4x)^{-1/2}$ about $x = 0$ is $\sum_{0}^{\infty} (2n)!\,x^n/(n!)^2$
and that this series converges absolutely for $|x| < \frac{1}{4}$. Then, taking for granted
that the sum of this series actually is $(1-4x)^{-1/2}$ (which we shall prove in
§7.3), multiply the series by itself and conclude that for any positive integer $j$,
$$\sum_{n=0}^{j} \frac{(2n)!\,(2j-2n)!}{(n!)^2\,((j-n)!)^2} = 4^j.$$
4. Show that the series $\sum_{0}^{\infty} (-1)^n (n+1)^{-1/2}$ is conditionally convergent and that
the Cauchy product of this series with itself diverges. (Hint: The maximum
of the function $f(x) = (x+1)(j-x+1)$ occurs at $x = \frac{1}{2}j$, and hence
$(n+1)(j-n+1) \le (\frac{1}{2}j+1)^2$ for $n = 0, \dots, j$.)
5. Show that $\sum_{m,n=0}^{\infty} a_{mn} = \sum_{m=0}^{\infty} \bigl(\sum_{n=0}^{\infty} a_{mn}\bigr)$ whenever $a_{mn} \ge 0$ for all
$m, n \ge 0$.
6. Suppose $\sum_{m,n=0}^{\infty} a_{mn}$ is absolutely convergent. Show that the iterated series
$\sum_{m=0}^{\infty} \bigl(\sum_{n=0}^{\infty} a_{mn}\bigr)$ converges to the sum $\sum_{m,n=0}^{\infty} a_{mn}$. (Use Exercise 5.)
7. Show that $\sum_{m,n=1}^{\infty} (m+n)^{-p}$ converges if and only if $p > 2$. (Hint: Use
triangular partial sums.)
8. Let $a_{mn} = 1$ if $m = n$, $a_{mn} = -1$ if $m - n = 1$, and $a_{mn} = 0$ otherwise.
Show that the iterated series $\sum_{n=0}^{\infty} \sum_{m=0}^{\infty} a_{mn}$ and $\sum_{m=0}^{\infty} \sum_{n=0}^{\infty} a_{mn}$ both
converge, but their sums are unequal.
Chapter 7
FUNCTIONS DEFINED BY
SERIES AND INTEGRALS
In this chapter we study the convergence of sequences and series whose terms are
functions of a variable x and improper integrals whose integrand contains x as a
free variable. In all these situations, the study of the resulting function of x may
reveal unpleasant surprises unless we have some control over the way the rate of
convergence varies along with x; the most commonly encountered form of such
control, uniform convergence, is a major theme of this chapter.
This is, indeed, what is usually meant by the statement “fk → f on S” when no
further qualification is added; when we wish to be very clear about it, we shall say
that fk → f pointwise on S when (7.1) holds.
Unfortunately, pointwise convergence is a rather badly behaved operation in
the sense that it does not interact well with other limiting operations, such as dif-
ferentiation and integration. Consider the following group of examples:
312 Chapter 7. Functions Defined by Series and Integrals
EXAMPLE 1. Let
$$(7.2)\qquad f_k(x) = \frac{1}{k}\arctan kx, \qquad g_k(x) = f'_k(x) = \frac{1}{k^2x^2+1}, \qquad h_k(x) = g'_k(x) = \frac{-2k^2x}{(k^2x^2+1)^2}.$$
Observe that fk (x) = k−1 f1 (kx), gk (x) = g1 (kx), and hk (x) = kh1 (kx).
In graphical terms, as shown in Figure 7.1, this means that the graph of fk is
obtained from the graph of f1 by shrinking the x and y scales by a factor of k;
the graph of gk is obtained from the graph of g1 by shrinking the x scale by a
factor of k and leaving the y scale unchanged; and the graph of hk is obtained
from the graph of h1 by shrinking the x scale and expanding the y scale by a
factor of k. We have:
i. $f_k(x) \to 0$ for all $x$, since $|f_k(x)| \le \pi/2k$.
ii. $g_k(x) \to 0$ for all $x \ne 0$, but $g_k(0) = 1$ for all $k$. That is,
$$\lim_{k\to\infty} g_k(x) = g(x) \equiv \begin{cases} 1 & \text{if } x = 0, \\ 0 & \text{otherwise.} \end{cases}$$
iii. $h_k(x) \to 0$ for all $x$. ($h_k(0) = 0$ for all $k$, and if $x \ne 0$, $h_k(x) \approx -2/k^2x^3$
for large $k$.)
Therefore, $g$ is discontinuous even though the $g_k$'s are all continuous; moreover, since $g_k$ is the derivative of $f_k$ and an antiderivative of $h_k$,
$$\lim_{k\to\infty} f'_k(0) = 1 \ne 0 = \Bigl(\lim_{k\to\infty} f_k\Bigr)'(0);$$
$$\lim_{k\to\infty} \lim_{x\to0} g_k(x) = 1 \ne 0 = \lim_{x\to0} \lim_{k\to\infty} g_k(x);$$
$$\lim_{k\to\infty} \int_0^1 h_k(x)\,dx = -1 \ne 0 = \int_0^1 \Bigl[\lim_{k\to\infty} h_k(x)\Bigr] dx.$$
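The failure of the last interchange is easy to see numerically. The following plain-Python sketch (my own, not from the text) approximates $\int_0^1 h_k$ by a midpoint rule; the pointwise limit of $h_k$ is 0, yet the integrals approach $-1$:

```python
# h_k(x) = -2 k^2 x / (k^2 x^2 + 1)^2; exactly, the integral over [0,1]
# is g_k(1) - g_k(0) = 1/(k^2+1) - 1, which tends to -1 as k grows.

def h(k, x):
    return -2.0 * k * k * x / (k * k * x * x + 1.0) ** 2

def integral_h(k, num_steps=100000):
    """Midpoint-rule approximation of the integral of h_k over [0, 1]."""
    dx = 1.0 / num_steps
    return sum(h(k, (i + 0.5) * dx) for i in range(num_steps)) * dx

for k in (1, 10, 100):
    # integral stays near 1/(k^2+1) - 1, while h_k(x) -> 0 at each fixed x
    print(k, integral_h(k), h(k, 0.5))
```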
Clearly, if we want some theorems to the effect that “the integral of the limit is
the limit of the integrals,” or “the derivative of a limit is the limit of the derivatives,”
pointwise convergence is the wrong condition to impose. We now develop a more
stringent notion of convergence that removes some of the pathologies.
The real trouble with pointwise convergence is as follows. The statement
“fk (x) → f (x) for all x ∈ S” means that, for each x, fk (x) will be close to
f (x) provided k is sufficiently large, but the rate of convergence of fk (x) to f (x)
can be very different for different values of x. For example, if gk is as in (7.2), for
all x ̸= 0 we have gk (x) → 0, so |gk (x)| < 10−4 (say) provided k is sufficiently
large; for x = 10, “sufficiently large” means k ≥ 10, but for x = 0.1, it means
k ≥ 1000. If, however, we have some control over the rate of convergence that is
independent of the particular point x, then many of the pathologies disappear.
The precise definition is as follows. A sequence $\{f_k\}$ of functions defined on a
set $S \subset \mathbb{R}^n$ is said to converge uniformly on $S$ to the function $f$ if for every $\epsilon > 0$
there is an integer $K$ such that
$$(7.3)\qquad |f_k(x) - f(x)| < \epsilon \ \text{ for all } x \in S \text{ whenever } k > K.$$
The point here is that the same $K$ will work for every $x \in S$. Another way of
writing (7.3) is
$$(7.4)\qquad \sup_{x \in S} |f_k(x) - f(x)| \le \epsilon \ \text{ whenever } k > K.$$
The geometry of this inequality is indicated in Figure 7.2. Yet another way of
expressing uniform convergence is the following, which is sufficiently useful to be
displayed as a theorem.
7.5 Theorem. The sequence {fk } converges to f uniformly on S if and only if
there is a sequence {Ck } of positive constants such that |fk (x) − f (x)| ≤ Ck for
all x ∈ S and limk→∞ Ck = 0.
Proof. If fk → f uniformly, by (7.4) we can take Ck = supx∈S |fk (x) − f (x)|.
Conversely, if Ck → 0, for any ϵ > 0 there exists K such that Ck < ϵ whenever
k > K, and hence |fk (x) − f (x)| ≤ Ck < ϵ for all x ∈ S whenever k > K; that
is, (7.3) holds.
Let us take another look at the examples in (7.2) with regard to uniform convergence. First, the sequence $\{f_k\}$ defined by $f_k(x) = k^{-1}\arctan kx$ converges
uniformly to 0 on $\mathbb{R}$, since we can take $C_k = \pi/2k$ in Theorem 7.5. Second, the
sequence $\{g_k\}$ defined by $g_k(x) = (k^2x^2+1)^{-1}$ does not converge uniformly to
its limit $g$ on $\mathbb{R}$; indeed,
$$\sup_{x\in\mathbb{R}} |g_k(x) - g(x)| = \sup_{x\ne0} \frac{1}{k^2x^2+1} = 1 \quad\text{for all } k.$$
(Notice that the supremum is not actually achieved; the maximum of $(k^2x^2+1)^{-1}$
occurs at $x = 0$, but $g(0) = 1$, so $g_k(0) - g(0) = 0$. See Figure 7.2.) Finally, the sequence $\{h_k\}$ defined by $h_k(x) = -2k^2x(k^2x^2+1)^{-2}$ does not converge uniformly
to its limit 0 on $\mathbb{R}$. Indeed, a bit of calculus shows that the minimum and maximum
values of $h_k(x)$, achieved at $x = \pm1/\sqrt{3}\,k$, are $\mp 9k/8\sqrt{3}$, so $\sup_x |h_k(x) - 0|$
actually tends to $\infty$ rather than 0.
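The sup-criterion of Theorem 7.5 can be probed numerically (my own plain-Python sketch, not from the text; the grid search is only a crude stand-in for the supremum over all of $\mathbb{R}$):

```python
import math

# For f_k(x) = arctan(kx)/k the exact supremum of |f_k - 0| over R is pi/(2k),
# which tends to 0, so the convergence is uniform. A grid over [-100, 100]
# approximates this supremum from below.

def sup_fk(k, grid_max=100.0, num_points=100001):
    step = 2.0 * grid_max / (num_points - 1)
    return max(abs(math.atan(k * (-grid_max + i * step)) / k)
               for i in range(num_points))

for k in (1, 10, 100):
    print(k, sup_fk(k), math.pi / (2 * k))
```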
7.1. Sequences and Series of Functions 315
On the other hand, the bad behavior in these examples is all at $x = 0$. The
sequences $\{g_k\}$ and $\{h_k\}$ do converge uniformly to 0 on the intervals $[\delta, \infty)$ and
$(-\infty, -\delta]$ for any $\delta > 0$. For $g_k$ this is clear:
$$|g_k(x) - 0| = \frac{1}{k^2x^2+1} \le \frac{1}{\delta^2k^2+1} \quad (x \le -\delta \text{ or } x \ge \delta),$$
and $(\delta^2k^2+1)^{-1} \to 0$ as $k \to \infty$. For $h_k$ we do not get a good estimate for the
first few values of $k$, but (by the same bit of calculus as in the preceding paragraph) when
$k > 1/\sqrt{3}\,\delta$ the function $h_k$ is positive and increasing on $(-\infty, -\delta]$ and negative
and increasing on $[\delta, \infty)$, so the maximum of $|h_k|$ on these intervals occurs at the
endpoints $\pm\delta$:
$$|h_k(x) - 0| \le \frac{2\delta k^2}{(\delta^2 k^2 + 1)^2} \quad \Bigl(x \le -\delta \text{ or } x \ge \delta,\ k > \frac{1}{\sqrt{3}\,\delta}\Bigr).$$
The phenomenon exhibited here is quite common. That is, one has a sequence
{fk } of functions that converge pointwise to f on a set S; the convergence is not
uniform on all of S but is uniform on many “slightly smaller” subsets of S. The
situation we shall encounter most often is where S is an open interval (a, b), and
the “bad behavior” occurs near the endpoints, so that the convergence is uniform on
$[a+\delta, b-\delta]$ for any $\delta > 0$. In this case, the sequence of constants $C_k$ in Theorem
7.5 will generally depend on $\delta$, as they do in the preceding examples.
The notion of Cauchy sequence has an obvious adaptation to the context of uni-
form convergence. Namely, a sequence {fk } of functions on a set S is uniformly
Cauchy if for every ϵ > 0 there is an integer K so that
or in other words,
$$|f(x) - f(a)| \le |f(x) - f_k(x)| + |f_k(x) - f_k(a)| + |f_k(a) - f(a)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon,$$
which shows that f is continuous at a.
The following is the most commonly used test for uniform convergence of se-
ries:
$$|s(x) - s_k(x)| \le \sum_{k+1}^{\infty} |f_n(x)| \le \sum_{k+1}^{\infty} M_n = C_k \quad (x \in S).$$
But $C_k \to 0$ as $k \to \infty$ since the series $\sum M_n$ is convergent, so it follows from
Theorem 7.5 that the sequence $\{s_k\}$, i.e., the series $\sum f_n$, is uniformly convergent
on $S$.
EXAMPLE 3. The M-test gives an easy verification that the geometric series
$\sum_{0}^{\infty} x^n$ converges uniformly on $[-r, r]$ for any $r < 1$, by taking $M_n = r^n$.
($|x^n| \le r^n$ for $|x| \le r$, and $\sum r^n < \infty$.)
EXAMPLE 4. The Taylor series for $\log(1+x)$, $\sum_{1}^{\infty} (-1)^{n-1}x^n/n$, converges
absolutely for $x \in (-1, 1)$ (by the ratio test) and conditionally at $x = 1$ (by the
alternating series test). Since $|(-1)^{n-1}x^n/n| \le r^n/n$ when $|x| \le r$, the M-test
(with $M_n = r^n/n$) shows that this series converges uniformly on $[-r, r]$ for
any $r < 1$. It actually converges uniformly on $[-r, 1]$ for any $r < 1$, but the M-test will not yield this result because the convergence at 1 is only conditional.
(The result needed here is a theorem of Abel that we shall present in §7.3.)
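The M-test bound in Example 4 can be checked numerically (a plain-Python sketch of my own; the grid and truncation choices are arbitrary): for every $x \in [-r, r]$, the error after $k$ terms is dominated by the single tail $\sum_{k+1}^{\infty} r^n/n$.

```python
import math

# Partial sums of the log(1+x) series and the uniform M-test tail bound.

def log_series_partial(x, k):
    return sum((-1.0) ** (n - 1) * x ** n / n for n in range(1, k + 1))

def tail_bound(r, k, num_terms=2000):
    """sum_{n=k+1}^{infinity} r^n / n, truncated far enough out to be accurate."""
    return sum(r ** n / n for n in range(k + 1, k + 1 + num_terms))

r, k = 0.9, 50
# worst error over a grid of x in [-r, r]
worst = max(abs(log_series_partial(t / 100.0 * r, k) - math.log(1 + t / 100.0 * r))
            for t in range(-100, 101))
print(worst, tail_bound(r, k))
```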
The remarks following Theorem 7.8, to the effect that local uniform conver-
gence is enough to yield continuity, apply to this situation also.
EXERCISES
that does not contain a nonzero integer, and conclude that the sum of the series
is a continuous function on R \ {±1, ±2, . . .}.
5. Show that the series $\sum_{1}^{\infty} \dfrac{(-1)^{n-1}}{x^2+n}$ converges uniformly on $\mathbb{R}$, although the
convergence is conditional at every point.
6. Given a sequence $\{c_n\}$ of real numbers such that $\sum_{1}^{\infty} c_n$ converges, consider
the series $\sum_{1}^{\infty} c_n \dfrac{x^n}{1-x^n}$ ($x \ne \pm1$). (Such a series is called a Lambert series.)
a. Show that the series converges absolutely and uniformly on $[-a, a]$ for any
$a < 1$.
b. Show that the series converges uniformly on $(-\infty, -b]$ and on $[b, \infty)$ for
any $b > 1$, and that the convergence is absolute if and only if $\sum_{1}^{\infty} |c_n| < \infty$.
This last quantity is the n-dimensional volume of S times Ck , which tends to zero
as k → ∞.
7.2. Integrals and Derivatives of Sequences and Series 321
E XAMPLE 1. Let gk (x) = k−1 sin kx. Then |gk (x)| ≤ k−1 for all x, so
gk → 0 uniformly on R. On the other hand, gk′ (x) = cos kx; the sequence
{cos kx} does not converge at all for most values of x, and when it does —
namely, when x is an even multiple of π — its limit is 1, not 0.
In this situation, the crucial uniformity hypothesis is not on the original se-
quence {fk } but on the differentiated sequence {fk′ }. Here is the result:
Thus $f(x) = f(a) + \int_a^x g(t)\,dt$. But by the fundamental theorem of calculus, the
function on the right is differentiable and its derivative is $g$.
The example {fk } in (7.2) shows that pointwise convergence of {fk′ } is not
sufficient to obtain lim(fk′ ) = (lim fk )′ . On the other hand, Theorem 7.12 can be
extended somewhat. Since differentiability (like continuity) is a local property, it is
enough for the convergence of {fk′ } to be uniform on a neighborhood of each point,
rather than on the whole interval in question. In many situations, the sequence
{fk } is defined on an open interval (a, b) and one has uniform convergence of
{fk′ } on each compact subinterval [a + δ, b − δ]; this suffices to guarantee that
lim(fk′ ) = (lim fk )′ on (a, b).
The results on term-by-term integration and differentiation of series are imme-
diate consequences of those for sequences. We have merely to apply Theorems
7.11 and 7.12 to the partial sums of the series to obtain the following theorem.
a. If $\sum f_n$ converges uniformly on $[a, b]$, then
$$\int_a^b \Bigl[\sum f_n(x)\Bigr] dx = \sum \int_a^b f_n(x)\,dx.$$
b. If the $f_n$'s are of class $C^1$ and the series $\sum f'_n$ converges uniformly on $[a, b]$,
then the sum $\sum f_n$ is of class $C^1$ on $[a, b]$ and
$$\frac{d}{dx}\Bigl[\sum f_n(x)\Bigr] = \sum f'_n(x) \quad (x \in [a, b]).$$
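Part (a) can be illustrated numerically (my own plain-Python sketch, not from the text), using the geometric series on $[0, \frac12]$, where it converges uniformly: the sum of the termwise integrals agrees with the integral of the sum, $-\log(1 - \frac12)$.

```python
import math

# Term-by-term integration of sum_{n>=0} t^n on [0, x] for x = 1/2:
# each term integrates to x^{n+1}/(n+1), and the total should be -log(1 - x).

def sum_of_integrals(x, num_terms):
    return sum(x ** (n + 1) / (n + 1) for n in range(num_terms))

x = 0.5
print(sum_of_integrals(x, 60), -math.log(1 - x))
```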
EXERCISES
1. Let $f(x) = \sum_{1}^{\infty} n^{-2}\sin nx$. Show that $f$ is a continuous function on $\mathbb{R}$ and
that $\int_0^{\pi/2} f(x)\,dx = \sum_{n=1,3,5,\dots} n^{-3} + 2\sum_{n=2,6,10,\dots} n^{-3}$.
2. Let $f(x) = \sum_{1}^{\infty} (x+n)^{-2}$. Show that $f$ is a continuous function on $[0, \infty)$
and that $\int_0^1 f(x)\,dx = 1$.
3. Let $f_k(x) = x \arctan kx$.
a. Show that $\lim_{k\to\infty} f_k(x) = \frac{1}{2}\pi|x|$.
b. Show that $\lim_{k\to\infty} f'_k(x)$ exists for every $x$, including $x = 0$, but that the
convergence is not uniform in any interval containing 0.
4. For each of the series (a–f) in Exercise 2, §7.1, show that the series can be dif-
ferentiated term-by-term on its interval of convergence (except at the endpoints
in (b)).
5. For $x \ne \pm1, \pm2, \dots$, let $f(x) = 2x\sum_{1}^{\infty} (x^2 - n^2)^{-1}$ (see Exercise 4, §7.1).
Show that $f$ is of class $C^1$ on its domain and that $f'(x) = -\sum_{1}^{\infty} \bigl[(x-n)^{-2} + (x+n)^{-2}\bigr]$.
6. Let $f$ be a continuous function on $[0, \infty)$ such that $0 \le f(x) \le Cx^{-1-\epsilon}$ for
some $C, \epsilon > 0$, and let $a = \int_0^\infty f(x)\,dx$. (The estimate on $f$ implies the
convergence of this integral.) Let $f_k(x) = kf(kx)$.
a. Show that $\lim_{k\to\infty} f_k(x) = 0$ for all $x > 0$ and that the convergence is
uniform on $[\delta, \infty)$ for any $\delta > 0$.
b. Show that $\lim_{k\to\infty} \int_0^1 f_k(x)\,dx = a$.
c. Show that $\lim_{k\to\infty} \int_0^1 f_k(x)g(x)\,dx = ag(0)$ for any integrable function $g$
on $[0, 1]$ that is continuous at 0. (Hint: Write $\int_0^1 = \int_0^\delta + \int_\delta^1$.)
7.3. Power Series 323
Important Remark. The reader has probably been thinking of an and x as real
numbers, but Theorem 7.16 is valid, with exactly the same proof, when an and x
are complex numbers.
Theorem 7.16 says that the set of all real $x$ such that $\sum a_n x^n$ converges is an
open interval centered at 0, possibly together with one or both endpoints, and the
set of all complex $x$ such that $\sum a_n x^n$ converges is an open disc centered at 0 in
the complex plane, possibly together with some or all of its boundary points. The
behavior of the series on the boundary of the region of convergence must be decided
on a case-by-case basis.
An easy application of the ratio test shows that each of these series converges
absolutely for $|x| < 1$ and diverges for $|x| > 1$, so their radius of convergence
is 1. The first one is absolutely convergent when $|x| = 1$ by comparison with
$\sum n^{-2}$, whereas the second is divergent when $|x| = 1$ because $x^n \not\to 0$ as
$n \to \infty$ in that case. The third one is divergent at $x = 1$ but is conditionally
convergent at $x = -1$ by the alternating series test. It is also conditionally
convergent at all other complex numbers $x$ such that $|x| = 1$, by Dirichlet's
test. (Indeed, take $a_n = n^{-1}$ and $b_n = x^n$. Then $b_1 + \cdots + b_n$ is a finite
geometric series whose sum equals $x(1-x^n)/(1-x)$, and this is bounded by
$2|x|/|1-x|$ for all $n$.)
The standard tools for determining the radius of convergence of a power series
are the ratio test and the root test. We have already seen how this works in §6.4
(especially Example 2 and Exercises 1–9), so we shall not belabor the point here.
However, see Exercise 1. In fact, a slight extension of the root test yields a formula
for the radius of convergence of an arbitrary power series; see Exercise 4.
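As a numerical sketch of the ratio-test approach (my own plain-Python example, not from the text), one can watch $|a_n/a_{n+1}|$ settle toward the radius of convergence; here the coefficients from Example 2 of §6.4 give $R = 4$:

```python
# Ratio-test estimate of the radius of convergence: R = lim |a_n / a_{n+1}|.
# For a_n = (-1)^n / ((n+1) 2^(2n+1)), the ratios tend to 4.

def ratio_estimate(a, n):
    return abs(a(n) / a(n + 1))

a = lambda n: (-1.0) ** n / ((n + 1) * 2.0 ** (2 * n + 1))
for n in (1, 10, 150):  # kept small enough to avoid float overflow
    print(n, ratio_estimate(a, n))
```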
Theorem 7.16 shows that any power series converges absolutely within the re-
gion |x| < R. Equally important is that it converges uniformly on compact subsets
of this region.
7.17 Theorem. Let $R$ be the radius of convergence of $\sum_{0}^{\infty} a_n x^n$. For any $r < R$,
the series $\sum_{0}^{\infty} a_n x^n$ converges uniformly on the set $\{x : |x| \le r\}$, and its sum is a
continuous function on the set $\{x : |x| < R\}$.
Proof. For $|x| \le r$ we have $|a_n x^n| \le |a_n| r^n$, and the series $\sum |a_n| r^n$ is convergent
since $\sum a_n x^n$ is absolutely convergent at $x = r$. The first assertion therefore follows from the Weierstrass M-test, and the second follows from the first by Theorem
7.8.
Proof. Assertion (a) follows immediately from Theorems 7.13a and 7.17. The fundamental theorem of calculus then shows that $\sum_{0}^{\infty} a_n x^{n+1}/(n+1)$ is an antiderivative of $f$ on $(-R, R)$ (specifically, the one whose value at $x = 0$ is zero), and
any other antiderivative differs from this one by a constant.
Theorem 7.18 gives a way of generating new series expansions from old ones.
EXAMPLE 2. If we integrate the geometric series $\sum_{0}^{\infty} (-x)^n = (1+x)^{-1}$
($|x| < 1$), we obtain
$$\log(1+x) = \int_0^x \frac{dt}{1+t} = \sum_{0}^{\infty} \frac{(-1)^n}{n+1} x^{n+1} = \sum_{1}^{\infty} \frac{(-1)^{n-1}}{n} x^n \quad (|x| < 1).$$
The series for log(1+x) is easily obtained from Taylor’s theorem (see Exercise
3 in §6.1), but not the series for arctan x; the computation of the high-order
derivatives of the latter function is very cumbersome. (Remark: The expansion
of log(1+x) is also valid at x = 1, and that of arctan x is also valid at x = ±1.
However, these facts do not follow from Theorem 7.18. The extra result needed
here is Abel’s theorem, which we shall present below.)
Theorem 7.18 also offers a technique for expressing definite or indefinite inte-
grals of functions that have no elementary antiderivatives in a computable form.
respectively. Suppose $|x| < R'$. Then $\sum n a_n x^{n-1}$ is absolutely convergent, and
$$|a_n x^n| = \frac{|x|}{n}\,|n a_n x^{n-1}| \le |n a_n x^{n-1}| \quad\text{for large } n,$$
so $\sum a_n x^n$ is absolutely convergent by comparison. Thus, if $|x| < R'$ then $|x| \le R$, and it follows that $R' \le R$.
On the other hand, if $|x| < R$, we can pick a number $r$ such that $|x| < r < R$.
Then the series $\sum a_n r^n$ is absolutely convergent, and
$$|n a_n x^{n-1}| = \frac{1}{|x|} \Bigl[ n \Bigl|\frac{x}{r}\Bigr|^n \Bigr] |a_n| r^n.$$
Combining this result with Theorem 7.13b, we obtain the fundamental theorem
on term-by-term differentiation of a power series.
7.20 Theorem. Suppose the radius of convergence of the series $f(x) = \sum a_n x^n$
is $R > 0$. Then the function $f$ is of class $C^\infty$ on the interval $(-R, R)$, and its $k$th
derivative may be computed on $(-R, R)$ by differentiating the series $\sum_{0}^{\infty} a_n x^n$
termwise $k$ times.
Proof. In view of Theorem 7.19, Theorem 7.13b shows that $f'(x) = \sum n a_n x^{n-1}$
for $|x| < R$. It now follows by induction on $k$ that, for any positive integer $k$, $f$ is
of class $C^k$ on $(-R, R)$ and that $f^{(k)}$ is the sum of the $k$-times derived series.
7.21 Corollary. Every power series $\sum_{0}^{\infty} a_n x^n$ with a positive radius of convergence is the Taylor series of its sum; that is, if $f(x) = \sum_{0}^{\infty} a_n x^n$ for $|x| < R$
($R > 0$), then
$$a_n = \frac{f^{(n)}(0)}{n!}.$$
Proof. Since $(d/dx)^n x^k = 0$ when $k < n$ and $(d/dx)^n x^n \equiv n!$, we have
$$f^{(n)}(x) = \frac{d^n}{dx^n}\bigl[a_0 + a_1 x + \cdots + a_n x^n + \cdots\bigr] = n!\,a_n + \cdots,$$
where the last set of dots denotes terms containing positive powers of $x$. Setting
$x = 0$, we obtain $f^{(n)}(0) = n!\,a_n$.
7.22 Corollary. If $\sum_{0}^{\infty} a_n x^n = \sum_{0}^{\infty} b_n x^n$ for $|x| < R$ ($R > 0$), then $a_n = b_n$ for
all $n$.
Proof. We have $a_n = f^{(n)}(0)/n! = b_n$ where $f(x)$ is the common sum of the two
series.
The following examples will illustrate the use of Theorem 7.20. The second one
contains a result of importance in its own right, the binomial formula for fractional
and negative exponents.
EXAMPLE 4. Suppose we wish to express the sum of the series $\sum_{1}^{\infty} x^n/n^2$
in closed form; call it $f(x)$. Termwise differentiation gives $f'(x) = \sum_{1}^{\infty} x^{n-1}/n$, so
$$x f'(x) = \sum_{1}^{\infty} \frac{x^n}{n} = -\log(1-x), \qquad f'(x) = -\frac{\log(1-x)}{x},$$
and, finally,
$$f(x) = -\int_0^x \frac{\log(1-t)}{t}\,dt.$$
EXAMPLE 5. Let $\alpha$ be a real number. Since
$$\frac{d^n}{dx^n}(1+x)^\alpha = \alpha(\alpha-1)\cdots(\alpha-n+1)(1+x)^{\alpha-n},$$
the Taylor series of $(1+x)^\alpha$ is
$$(7.23)\qquad f_\alpha(x) = \sum_{n=0}^{\infty} \binom{\alpha}{n} x^n, \quad\text{where}\quad \binom{\alpha}{n} = \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}$$
(with the understanding that $\binom{\alpha}{0} = 1$). This series is called the binomial series
of order α. When α is a nonnegative integer k, the terms with n > k all vanish
since they contain a factor of (α − k), and we obtain the familiar binomial
expansion formula for (1 + x)k . For other values of α, the Taylor series is a
genuine infinite series, and one can easily check by the ratio test that its radius
of convergence is 1. Our aim is to verify that the sum of this series is actually
(1 + x)α for |x| < 1.
We need the following formulas concerning the generalized binomial coefficients $\binom{\alpha}{n}$:
$$(7.24)\qquad n\binom{\alpha}{n} = \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{(n-1)!} = \alpha\binom{\alpha-1}{n-1};$$
$$(7.25)\qquad \binom{\alpha}{n} = \frac{\bigl[(\alpha-n)+n\bigr](\alpha-1)\cdots(\alpha-n+1)}{n!} = \binom{\alpha-1}{n} + \binom{\alpha-1}{n-1}.$$
$$f'_\alpha(x) = \sum_{1}^{\infty} n\binom{\alpha}{n} x^{n-1} = \alpha\sum_{1}^{\infty} \binom{\alpha-1}{n-1} x^{n-1} = \alpha\sum_{0}^{\infty} \binom{\alpha-1}{n} x^n = \alpha f_{\alpha-1}(x).$$
(For the third equality we have made the change of variable $n \to n+1$.) On
the other hand,
$$\begin{aligned} (1+x) f_{\alpha-1}(x) &= \sum_{0}^{\infty} \binom{\alpha-1}{n} x^n + \sum_{0}^{\infty} \binom{\alpha-1}{n} x^{n+1} \\ &= \sum_{0}^{\infty} \Bigl[\binom{\alpha-1}{n} + \binom{\alpha-1}{n-1}\Bigr] x^n = \sum_{0}^{\infty} \binom{\alpha}{n} x^n = f_\alpha(x). \end{aligned}$$
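The conclusion of Example 5, that the binomial series (7.23) sums to $(1+x)^\alpha$ for $|x| < 1$, can be checked numerically (a plain-Python sketch of my own; the recurrence used to generate the terms is the identity $\binom{\alpha}{n+1} = \binom{\alpha}{n}\frac{\alpha-n}{n+1}$):

```python
# Partial sums of the binomial series sum_{n>=0} binom(alpha, n) x^n,
# compared against (1 + x)^alpha for |x| < 1.

def binomial_series(alpha, x, num_terms):
    term = 1.0  # binom(alpha, 0) x^0
    s = 0.0
    for n in range(num_terms):
        s += term
        # binom(alpha, n+1) = binom(alpha, n) * (alpha - n) / (n + 1)
        term *= (alpha - n) / (n + 1) * x
    return s

# For a nonnegative integer alpha the series terminates and is exact.
for alpha, x in ((0.5, 0.3), (-1.5, -0.4), (3.0, 0.9)):
    print(alpha, x, binomial_series(alpha, x, 200), (1 + x) ** alpha)
```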
7.26 Theorem (Abel's Theorem). If the series $\sum_{0}^{\infty} a_n x^n$ converges at $x = R$
(resp. $x = -R$), then it converges uniformly on the interval $[0, R]$ (resp. $[-R, 0]$)
and hence defines a continuous function on that interval.
Proof. Convergence at $x = -R$ (and uniform convergence on $[-R, 0]$) of $f(x) = \sum a_n x^n$ is the same as convergence at $x = R$ (and uniform convergence on $[0, R]$)
of $f(-x) = \sum (-1)^n a_n x^n$, so it is enough to consider convergence at $x = R$.
Moreover, convergence at $x = R$ (and uniform convergence on $[0, R]$) of $f(x) = \sum a_n x^n$ is the same as convergence at $x = 1$ (and uniform convergence on $[0, 1]$)
of $f(Rx) = \sum a_n R^n x^n$. In short, it is enough to assume that $\sum_{0}^{\infty} a_n$ converges
and to prove that $\sum_{0}^{\infty} a_n x^n$ converges uniformly on $[0, 1]$. To do this we must
show that the tail end $\sum_{k}^{\infty} a_n x^n$ of the series converges uniformly to zero on $[0, 1]$
as $k \to \infty$.
For $k \ge 1$, let $A_k = \sum_{k}^{\infty} a_n$ be the $k$th tail end of the series $\sum_{0}^{\infty} a_n$, so that
$a_k = A_k - A_{k+1}$. For $l > k$ and $x \in [0, 1]$ we have
Let $l \to \infty$: then $A_{l+1} \to 0$ and $x^l$ remains bounded, so the last term on the right disappears and we obtain
$$(7.27)\qquad \sum_{n=k}^\infty a_n x^n = A_k x^k + \sum_{n=k}^\infty A_{n+1}\bigl(x^{n+1} - x^n\bigr).$$
Now, given $\epsilon > 0$, we can choose $k$ so large that $|A_n| < \frac12\epsilon$ whenever $n \ge k$. Since $x \in [0, 1]$, we have $x^{n+1} - x^n \le 0$, so (7.27) yields
$$\Bigl|\sum_{n=k}^\infty a_n x^n\Bigr| \le |A_k|x^k + \sum_{n=k}^\infty |A_{n+1}|\bigl(x^n - x^{n+1}\bigr) \le \tfrac12\epsilon x^k + \tfrac12\epsilon\sum_{n=k}^\infty \bigl(x^n - x^{n+1}\bigr) \le \epsilon$$
for all $x \in [0, 1]$ when $k$ is sufficiently large, since the telescoping sum $\sum_{n=k}^\infty (x^n - x^{n+1})$ is at most $x^k \le 1$. This establishes the desired uniform convergence.
Remark. If $\sum a_n R^n$ converges, we already know (Theorem 7.17) that $\sum a_n x^n$ converges uniformly on $[-r, r]$ for any $r < R$. Combining this with Abel's theorem, we see that $\sum a_n x^n$ converges uniformly on $[-r, R]$. (See Exercise 7 in §7.1.)
The continuity of the series at the endpoint can be restated in the following way. Recall that $\lim_{x\to a^-} f(x)$ denotes the limit of $f(x)$ as $x$ approaches $a$ from the left.
7.28 Corollary. If $\sum_0^\infty a_n$ converges, then $\lim_{x\to 1^-}\sum_0^\infty a_n x^n = \sum_0^\infty a_n$.
EXAMPLE 7. The expansion $\arctan x = \sum_0^\infty (-1)^n x^{2n+1}/(2n+1)$ was established in Example 2 for $|x| < 1$. Since the series also converges at $x = 1$ (by the alternating series test), we obtain a neat series formula for $\pi$:
$$\tfrac14\pi = \lim_{x\to 1^-}\arctan x = \sum_0^\infty \frac{(-1)^n}{2n+1} = 1 - \tfrac13 + \tfrac15 - \tfrac17 + \cdots.$$
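As an aside (ours, not the text's): this series converges quite slowly — the alternating series bound gives an error of at most $1/(2N+1)$ after $N$ terms — which is typical of behavior at the boundary of the interval of convergence. A short Python check:

```python
import math

def leibniz_partial(N):
    """Partial sum of 1 - 1/3 + 1/5 - ... (N terms), which tends to pi/4."""
    return sum((-1) ** n / (2 * n + 1) for n in range(N))

# Alternating series estimate: the error is below the first omitted term.
for N in (10, 100, 1000):
    assert abs(leibniz_partial(N) - math.pi / 4) < 1 / (2 * N + 1)
```

So even 1000 terms give only about three correct decimal places of $\pi/4$.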
The converse of Corollary 7.28 is false: The limit $S = \lim_{x\to 1^-}\sum_0^\infty a_n x^n$ may exist even when $\sum_0^\infty a_n$ diverges. (Example: Take $a_n = (-1)^n$; then $\sum_0^\infty a_n x^n = (1+x)^{-1}$ for $|x| < 1$, so $S = \frac12$.) In this case the series $\sum a_n$ is said to be Abel summable to the sum $S$. Abel summation provides a way of making sense out of certain divergent series that is useful in some situations, one of which we shall discuss in §8.2.
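For this particular example the Abel limit is easy to watch numerically. The sketch below (ours, not part of the text) evaluates $\sum_0^\infty (-1)^n x^n = (1+x)^{-1}$ at points approaching 1 from the left:

```python
def abel_sum(x, terms=100000):
    """Partial sum of sum (-1)^n x^n, which converges for |x| < 1."""
    return sum((-1) ** n * x ** n for n in range(terms))

# The sums match 1/(1 + x), which tends to the Abel sum 1/2 as x -> 1-.
for x in (0.9, 0.99, 0.999):
    assert abs(abel_sum(x) - 1 / (1 + x)) < 1e-12
```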
EXERCISES
1. Let $\{a_n\}_0^\infty$ be a sequence of real or complex numbers.
   a. Suppose that $|a_{n+1}/a_n|$ converges to a limit $L$ as $n \to \infty$. Show that the radius of convergence of $\sum_0^\infty a_n x^n$ is $L^{-1}$.
2. Show that if the sequence $\{a_n\}_0^\infty$ is bounded, the radius of convergence of $\sum_0^\infty a_n x^n$ is at least 1.
3. Suppose the radius of convergence of $\sum_0^\infty a_n x^n$ is $R$. What is the radius of convergence of $\sum_0^\infty a_n x^{kn}$ $(k = 2, 3, 4, \ldots)$?
4. Show that for any sequence $\{a_n\}_0^\infty$, the radius of convergence of $\sum_0^\infty a_n x^n$ is the reciprocal of $\limsup_{n\to\infty} |a_n|^{1/n}$. (See Exercises 9–12 in §1.5 and Exercise 25 in §6.2.)
5. Show that each of the following functions of x admits a power series expansion
on some interval centered at the origin. Find the expansion and give its interval
of validity.
332 Chapter 7. Functions Defined by Series and Integrals
a. $\int_0^x e^{-t^2}\,dt$.
b. $\int_0^x \cos t^2\,dt$.
c. $\int_0^x t^{-1}\log(1+2t)\,dt$.
6. Use the series expansions in Exercise 5 to calculate the following integrals to three decimal places, and prove the accuracy of your answer.
   a. $\int_0^1 e^{-t^2}\,dt$.
   b. $\int_0^1 \cos t^2\,dt$.
   c. $\int_0^{1/2} t^{-1}\log(1+2t)\,dt$.
7. Let $f(x) = \sum_0^\infty a_n x^n$ be a power series with positive radius of convergence. Show that $f(-x) = f(x)$ (resp. $f(-x) = -f(x)$) for all $x$ in the interval of convergence if and only if $a_n = 0$ for all odd $n$ (resp. all even $n$).
8. Let $k$ be a nonnegative integer. The Bessel function of order $k$ is the function $J_k$ defined by
$$J_k(x) = \sum_{n=0}^\infty \frac{(-1)^n}{n!\,(n+k)!}\Bigl(\frac{x}{2}\Bigr)^{2n+k}.$$
converges for all x and that its sum f (x) satisfies f ′′ (x) = xf (x).
10. Express the sums of the following series in terms of elementary functions and (perhaps) their antiderivatives in the manner of Example 4.
   a. $\sum_1^\infty \dfrac{n x^n}{(n+1)!}$.
   b. $\sum_0^\infty \dfrac{(-1)^n x^{2n+1}}{(2n+1)\cdot(2n+2)!}$.
   c. $\sum_0^\infty \dfrac{x^n}{(n+1)^2\, n!}$.
   d. $\sum_0^\infty \dfrac{(-1)^n (2n+1) x^{2n}}{(2n)!}$.
7.4. The Complex Exponential and Trig Functions 333
11. Consider the function $f(x) = \int_0^x \arctan t\,dt$.
   a. Perform the integration to evaluate $f$ in terms of elementary functions.
   b. Using the result of Example 2, compute the Taylor series of $f(x)$ (centered at the origin) and show that it converges to $f(x)$ for $x \in [-1, 1]$. (The endpoints require special attention.)
   c. Deduce that
   $$1 - \tfrac12 - \tfrac13 + \tfrac14 + \tfrac15 - \tfrac16 - \tfrac17 + \cdots = \tfrac14\pi - \tfrac12\log 2.$$
This extended exponential function still obeys the basic law of exponents. Indeed, by Theorem 6.29,
$$(7.29)\qquad e^z e^w = \sum_{m,n=0}^\infty \frac{z^m w^n}{m!\,n!} = \sum_{k=0}^\infty \sum_{m+n=k} \frac{z^m w^n}{m!\,n!} = \sum_{k=0}^\infty \frac{(z+w)^k}{k!} = e^{z+w}.$$
In particular, taking $z = ix$ with $x$ real and separating real and imaginary parts in the defining series gives
$$e^{ix} = \sum_{n=0}^\infty \frac{(ix)^n}{n!} = \sum_{k=0}^\infty \frac{(-1)^k x^{2k}}{(2k)!} + i\sum_{k=0}^\infty \frac{(-1)^k x^{2k+1}}{(2k+1)!}.$$
The series on the right are the Taylor series of $\cos x$ and $\sin x$, so we have arrived at Euler's formula:
$$e^{ix} = \cos x + i\sin x.$$
This is the appropriate place to raise the issue of the definition of cos x and
sin x. These functions are so familiar that we take them entirely for granted, but the
334 Chapter 7. Functions Defined by Series and Integrals
We now indicate how to derive all the familiar properties of the trig functions
from these definitions. First, it is clear from (7.31) that
Third, the addition formulas for sine and cosine follow easily from the law of ex-
ponents:
Next, we have to bring the number $\pi$ into play somehow. We can proceed as follows. The series $\sum_0^\infty (-1)^n 2^{2n}/(2n)!$ for $\cos 2$ is an alternating series whose terms decrease in magnitude starting with $n = 1$, so by the alternating series test,
$$\cos 2 = 1 - \frac{2^2}{2!} = -1 \quad\text{with error less than}\quad \frac{2^4}{4!} = \frac{2}{3}.$$
In particular, $\cos 2 < 0$, and of course $\cos 0 = 1 > 0$, so by the intermediate value theorem there is at least one number $a \in (0, 2)$ such that $\cos a = 0$. Therefore, the set $Z = \{x \ge 0 : \cos x = 0\}$ is nonempty; it is closed since $\cos$ is continuous; hence it contains its greatest lower bound, which is positive since $\cos 0 = 1$. We denote this smallest positive zero of $\cos$ by $\frac12\pi$. (Again, this may be taken as a definition of the number $\pi$, from which its other familiar properties can be derived.)
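This definition of $\frac12\pi$ is effective in practice: $\cos$ is positive at 0 and negative at 2, and (as it turns out) changes sign only once on $[0, 2]$, so bisection applied to the Taylor series recovers the familiar numerical value. A sketch (ours, not part of the text):

```python
import math

def cos_series(x, terms=30):
    """Taylor series for cos x; 30 terms is ample for |x| <= 2."""
    return sum((-1) ** n * x ** (2 * n) / math.factorial(2 * n) for n in range(terms))

# cos 0 = 1 > 0 and cos 2 < 0; bisect down to the zero the text names pi/2.
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if cos_series(mid) > 0:
        lo = mid
    else:
        hi = mid
assert abs(lo - math.pi / 2) < 1e-12
```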
Now, by (7.33), $(d/dx)\sin x = \cos x > 0$ for $0 \le x < \frac12\pi$, so $\sin$ is increasing on $[0, \frac12\pi]$, and $\sin 0 = 0$; hence $\sin\frac12\pi > 0$. But by (7.35), $\sin^2\frac12\pi = \sin^2\frac12\pi + \cos^2\frac12\pi = 1$; hence $\sin\frac12\pi = 1$. In summary,
These, in turn, yield the $2\pi$-periodicity of sine and cosine. Indeed, replacing $x$ by $-x$ in (7.37) and using (7.32), we see that $\cos(x + \frac12\pi) = -\sin x$ and $\sin(x + \frac12\pi) = \cos x$, whence
and therefore
EXERCISES
1. Recall that the hyperbolic sine and cosine functions are defined by $\sinh z = \frac12(e^z - e^{-z})$ and $\cosh z = \frac12(e^z + e^{-z})$. Here, $z$ may now be taken to be a complex number.
   a. Show that $\sinh ix = i\sin x$ and $\cosh ix = \cos x$.
   b. Show that $\sinh(z+w) = \sinh z\cosh w + \cosh z\sinh w$ and $\cosh(z+w) = \cosh z\cosh w + \sinh z\sinh w$.
   c. Express $\sinh(x+iy)$ and $\cosh(x+iy)$ in terms of real functions of the real variables $x$ and $y$.
2. Verify that the formula $(d/dx)e^{cx} = ce^{cx}$ remains valid when $c$ is a complex number. (However, $x$ is still a real variable, since we have not discussed differentiation of functions of a complex variable.)
3. Let $a$ and $b$ be real numbers. Compute $\int e^{(a+ib)x}\,dx$ by using the result of Exercise 2; then, by taking real and imaginary parts, deduce the formulas
$$\int e^{ax}\cos bx\,dx = \frac{e^{ax}(a\cos bx + b\sin bx)}{a^2+b^2}, \qquad \int e^{ax}\sin bx\,dx = \frac{e^{ax}(a\sin bx - b\cos bx)}{a^2+b^2}.$$
The most useful test for uniform convergence is the following analogue of the
Weierstrass M-test. The proof is essentially identical to that of the M-test, and we
leave the details to the reader (Exercise 1).
7.38 Theorem. Suppose there is a function $g(t) \ge 0$ on $[c, \infty)$ such that (i) $|f(x,t)| \le g(t)$ for all $x \in I$ and $t \ge c$, and (ii) $\int_c^\infty g(t)\,dt < \infty$. Then $\int_c^\infty f(x,t)\,dt$ converges absolutely and uniformly for $x \in I$.
The consequences of uniform convergence for continuity, integration, and differentiation of the function $F(x) = \int_c^\infty f(x,t)\,dt$ are much the same as for series. The following two theorems provide analogues of Theorems 7.10 and 7.13 in the present setting.
7.39 Theorem. Suppose that $f(x,t)$ is a continuous function on the set $\{(x,t) : x \in I,\ t \ge c\}$ and that the integral $\int_c^\infty f(x,t)\,dt$ is uniformly convergent for $x \in I$. Then:
a. The function $F(x) = \int_c^\infty f(x,t)\,dt$ is continuous on $I$.
b. If $[a,b] \subset I$, then
$$\int_a^b\!\int_c^\infty f(x,t)\,dt\,dx = \int_c^\infty\!\int_a^b f(x,t)\,dx\,dt.$$
Proof. The conclusions are true if $\int_c^\infty$ is replaced by $\int_c^d$ where $d < \infty$, by Theorems 4.46 and 4.26. (a) then follows because the uniform limit of continuous functions is continuous, and (b) follows by the argument in the proof of Theorem 7.11.
7.40 Theorem. Suppose that $f(x,t)$ and its partial derivative $\partial_x f(x,t)$ are continuous functions on the set $\{(x,t) : x \in I,\ t \ge c\}$. Suppose also that the integral $\int_c^\infty f(x,t)\,dt$ converges for $x \in I$ and the integral $\int_c^\infty \partial_x f(x,t)\,dt$ converges uniformly for $x \in I$. Then the former integral is differentiable on $I$ as a function of $x$, and
$$\frac{d}{dx}\int_c^\infty f(x,t)\,dt = \int_c^\infty \frac{\partial f}{\partial x}(x,t)\,dt.$$
Theorem 7.40 may be deduced from Theorem 7.39 in much the same way as
Theorem 7.12 was deduced from Theorem 7.11 (Exercise 2).
Let us state explicitly the result of combining Theorems 7.39 and 7.40 with
Theorem 7.38:
7.41 Theorem. The conclusions of Theorem 7.39 are valid whenever $|f(x,t)| \le g(t)$ for all $x \in I$ and $t \ge c$, where $\int_c^\infty g(t)\,dt < \infty$. The conclusions of Theorem 7.40 are valid whenever $\int_c^\infty f(x,t)\,dt$ converges for $x \in I$ and $|\partial_x f(x,t)| \le g(t)$ for all $x \in I$ and $t \ge c$, where $\int_c^\infty g(t)\,dt < \infty$.
EXAMPLE 2. Let
$$F(x) = \int_0^\infty e^{-xt^2}\,dt, \qquad x > 0.$$
Since $(\partial^k/\partial x^k)e^{-xt^2} = (-t^2)^k e^{-xt^2}$, by Theorem 7.40 we can conclude that
$$F^{(k)}(x) = (-1)^k \int_0^\infty t^{2k} e^{-xt^2}\,dt \qquad (x > 0),$$
provided that we establish the uniform convergence of the integral on the right. In fact, the convergence is not uniform on the whole interval $(0, \infty)$, but it is uniform on $[\delta, \infty)$ for any $\delta > 0$, which is sufficient. This follows easily from Theorem 7.38, since $t^{2k}e^{-xt^2} \le t^{2k}e^{-\delta t^2}$ for $x \ge \delta$.
On the other hand, we can evaluate $F(x)$ explicitly by making the substitution $u = x^{1/2}t$ and invoking Proposition 4.66:
$$F(x) = \int_0^\infty e^{-u^2}\,x^{-1/2}\,du = \frac{\sqrt\pi}{2}\,x^{-1/2},$$
and therefore
$$F^{(k)}(x) = \frac{\sqrt\pi}{2}\,\bigl(-\tfrac12\bigr)\bigl(-\tfrac32\bigr)\cdots\bigl(-k+\tfrac12\bigr)\,x^{-k-(1/2)}.$$
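A numerical spot-check of this closed form (our sketch, not part of the text): a midpoint rule on a long finite interval approximates the improper integral well, since the tail of $e^{-xt^2}$ is negligible.

```python
import math

def F_numeric(x, steps=200000):
    """Midpoint-rule approximation of the integral of e^{-x t^2} over [0, inf)."""
    T = 12 / math.sqrt(x)          # beyond T the integrand is below e^{-144}
    h = T / steps
    return h * sum(math.exp(-x * ((k + 0.5) * h) ** 2) for k in range(steps))

for x in (0.5, 1.0, 4.0):
    assert abs(F_numeric(x) - 0.5 * math.sqrt(math.pi / x)) < 1e-6
```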
7.5. Functions Defined by Improper Integrals 339
This is a bit tricky, since the integral is not absolutely convergent. (Note that since $t^{-1}\sin t \to 1$ as $t \to 0$, the integral over $[0, 1]$ is an ordinary proper integral. The convergence of the integral over $[1, \infty)$ was proved in §4.6 [Example 3].) Our strategy will be to consider an improper integral with two parameters:
$$(7.43)\qquad F(x, y) = \int_0^\infty \frac{e^{-xt}\sin yt}{t}\,dt \qquad (x > 0,\ y \in \mathbb{R}).$$
Differentiating formally under the integral sign with respect to $y$ gives
$$\frac{\partial F}{\partial y} = \int_0^\infty e^{-xt}\cos yt\,dt.$$
By Theorem 7.41, this formula is indeed valid, since $|e^{-xt}\cos yt| \le e^{-xt}$ for all $y$ and $\int_0^\infty e^{-xt}\,dt < \infty$. The integral on the right can be evaluated by elementary calculus (integrate by parts twice, or use Exercise 3 in §7.4), and the result is
$$\frac{\partial F}{\partial y} = \left[\frac{e^{-xt}\,(y\sin yt - x\cos yt)}{x^2+y^2}\right]_0^\infty = \frac{x}{x^2+y^2}.$$
Since $F(x, 0) = 0$, integrating in $y$ yields $F(x, y) = \arctan(y/x)$. The variable $y$ has now served its purpose, and we henceforth set it equal to 1. We have shown that
$$(7.44)\qquad \int_0^\infty \frac{e^{-xt}\sin t}{t}\,dt = \arctan(1/x) \qquad (x > 0).$$
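Formula (7.44) is easy to spot-check numerically for $x$ bounded away from 0, where the factor $e^{-xt}$ provides rapid decay (our sketch, not the text's):

```python
import math

def damped_sinc_integral(x, T=80.0, steps=400000):
    """Midpoint rule for the integral of e^{-xt} sin(t)/t over (0, T]."""
    h = T / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += math.exp(-x * t) * math.sin(t) / t
    return total * h

# Compare with arctan(1/x); for x >= 0.5 the tail beyond T = 80 is negligible.
for x in (0.5, 1.0, 2.0):
    assert abs(damped_sinc_integral(x) - math.atan(1 / x)) < 1e-5
```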
We now wish to let $x \to 0$. In order to pass the limit under the integral sign in (7.44), it is enough to show that the integral in (7.44) is uniformly convergent for $x \ge 0$. Unfortunately, Theorem 7.38 does not apply here, since the integral is not absolutely convergent at $x = 0$. (Theorem 7.38 easily yields the uniform convergence for $x \ge \delta$ for any $\delta > 0$, but that isn't good enough!) Recall the meaning of uniform convergence: What we need to show is that
$$\sup_{x\ge 0}\left|\int_b^\infty \frac{e^{-xt}\sin t}{t}\,dt\right| \to 0 \quad\text{as } b \to \infty.$$
To this end, we use integration by parts,² taking $u = t^{-1}$ and $dv = e^{-xt}\sin t\,dt$; the result is
$$\int_b^\infty \frac{e^{-xt}\sin t}{t}\,dt = \frac{e^{-bx}(x\sin b + \cos b)}{(x^2+1)b} - \int_b^\infty \frac{e^{-xt}(x\sin t + \cos t)}{(x^2+1)t^2}\,dt.$$
Now,
$$\left|\frac{e^{-xt}(x\sin t + \cos t)}{x^2+1}\right| \le \frac{x+1}{x^2+1},$$
and the quantity on the right is bounded for all $x \ge 0$ (by $\frac32$, say, since $x/(x^2+1) \le \frac12$ and $1/(x^2+1) \le 1$), so both terms on the right of the preceding equation are at most $C/b$ in absolute value, uniformly in $x \ge 0$; the desired uniform convergence follows.
² The idea is much the same as the use of summation by parts in the proof of Abel's theorem.
EXERCISES
1. Prove Theorem 7.38.
2. Prove Theorem 7.40.
3. Suppose $x > 0$. Verify that $\int_0^\infty e^{-xt}\,dt = x^{-1}$, justify differentiating under the integral sign, and deduce that $\int_0^\infty t^n e^{-xt}\,dt = n!\,x^{-n-1}$.
4. Verify that $\int_0^\infty (t^2+x)^{-1}\,dt = \frac12\pi x^{-1/2}$, justify differentiating under the integral sign, and thence evaluate $\int_0^\infty (t^2+x)^{-n}\,dt$.
5. Show that $\displaystyle\int_0^\infty \frac{e^{-bx} - e^{-ax}}{x}\,dx = \log\frac{a}{b}$ for $a, b > 0$.
6. Show that $\displaystyle\int_0^\infty \frac{e^{-bx} - e^{-ax}}{x}\cos x\,dx = \frac12\log\frac{1+a^2}{1+b^2}$ for $a, b > 0$.
7. Show that $\displaystyle\int_0^\infty e^{-x}\,\frac{1-\cos ax}{x}\,dx = \frac12\log(1+a^2)$ for all $a \in \mathbb{R}$.
8. Deduce from (7.42) that
$$\int_0^\infty \frac{\sin xt}{t}\,dt = \begin{cases} \tfrac12\pi & \text{if } x > 0,\\ 0 & \text{if } x = 0,\\ -\tfrac12\pi & \text{if } x < 0.\end{cases}$$
Show that the convergence is uniform for $x \in I$ if $I$ is any compact interval with $0 \notin I$, but not if $0 \in I$.
9. Use Exercise 8 to show that $\displaystyle\int_0^\infty \frac{\sin^2 xt}{t^2}\,dt = \frac12\pi x$ for $x > 0$.
10. Let $I(a,b) = \displaystyle\int_0^\infty \frac{\cos bx - \cos ax}{x^2}\,dx$.
   a. Show that $I(a,b)$ is convergent for all $a, b \in \mathbb{R}$ and that the convergence is uniform for $a$ in any finite interval when $b$ is fixed (or vice versa).
   b. Use Exercise 8 to show that $I(a,b) = \frac12\pi(a-b)$ if $a, b > 0$.
   c. Show that $I(a,b) = \frac12\pi(|a| - |b|)$ for all $a, b \in \mathbb{R}$.
11. Let $F(x) = \int_0^\infty e^{-t^2}\cos xt\,dt$ for $x \in \mathbb{R}$.
   a. Justify differentiating under the integral sign and thence show that $F'(x) = -\frac12 xF(x)$.
   b. Show that $F(x) = \frac12\sqrt\pi\, e^{-x^2/4}$.
12. Let $G(x) = \int_0^\infty e^{-t^2}\sin xt\,dt$ for $x \in \mathbb{R}$. Proceeding as in Exercise 11, show that $G(x) = \frac12 e^{-x^2/4}\int_0^x e^{t^2/4}\,dt$.
13. Show that $\displaystyle\int_0^\infty \frac{1 - e^{-xt^2}}{t^2}\,dt = \sqrt{\pi x}$ for $x \ge 0$.
14. Let $F(x) = \int_0^\infty e^{-t^2 - (x^2/t^2)}\,dt$.
   a. Show that $F$ is a continuous function on $\mathbb{R}$ that satisfies $F'(x) = -2F(x)$ for $x > 0$ and $F'(x) = 2F(x)$ for $x < 0$.
   b. Show that $F(x) = \frac12\sqrt\pi\, e^{-2|x|}$.
   c. Evaluate $\int_0^\infty e^{-pt^2 - (q/t^2)}\,dt$ for $p, q > 0$.
15. Let $f$ be a continuous function on $[0, \infty)$ that satisfies $|f(x)| \le a(1+x)^N e^{bx}$ for some $a, b, N \ge 0$. The Laplace transform of $f$ is the function $L[f]$ defined on $(b, \infty)$ by
$$L[f](s) = \int_0^\infty e^{-sx} f(x)\,dx.$$
   a. Show that $L[f]$ is of class $C^\infty$ on $(b, \infty)$ and $(d/ds)^n L[f] = (-1)^n L[f_n]$ where $f_n(x) = x^n f(x)$.
   b. Suppose that $f$ is of class $C^1$ on $[0, \infty)$ and that $f'$ satisfies the same sort of exponential growth condition as $f$. Show that $L[f'](s) = sL[f](s) - f(0)$.
the gamma function
$$\Gamma(x) = \int_0^\infty t^{x-1}e^{-t}\,dt,$$
which has a way of turning up in many unexpected places. Let us analyze the integrals over $[0, 1]$ and $[1, \infty)$ separately. The integral over $[0, 1]$ is proper for $x \ge 1$ and improper but convergent for $0 < x < 1$. In fact, by Theorem 7.38 it is uniformly convergent for $x \ge \delta$, for any $\delta > 0$, since $0 < t^{x-1}e^{-t} \le t^{\delta-1}$ for $x \ge \delta$ and $0 \le t \le 1$. The integral over $[1, \infty)$ is convergent for all $x$ and uniformly convergent for $x \le C$, for any constant $C$, since $0 < t^{x-1}e^{-t} \le t^{C-1}e^{-t}$ for $x \le C$ and $t \ge 1$. Therefore, the integral defining $\Gamma(x)$ is convergent for $x > 0$ and uniformly convergent on $\delta \le x \le C$ for any $\delta > 0$ and $C > 0$.
It follows that $\Gamma$ is a continuous function on $(0, \infty)$. In fact, $\Gamma$ is of class $C^\infty$ on $(0, \infty)$, and its derivatives can be calculated by differentiating under the integral:
$$(7.46)\qquad \Gamma^{(k)}(x) = \int_0^\infty (\log t)^k\, t^{x-1} e^{-t}\,dt.$$
Since | log t| grows more slowly than any power of t as t → 0 or t → ∞, the argu-
ment of the preceding paragraph shows that the integral on the right is absolutely
7.6. The Gamma Function 343
and so by induction,
$$(7.48)\qquad \Gamma(n) = (n-1)!, \qquad \Gamma(n+\tfrac12) = \bigl(n-\tfrac12\bigr)\bigl(n-\tfrac32\bigr)\cdots\tfrac32\cdot\tfrac12\,\sqrt\pi.$$
Thus the gamma function provides an extension of the factorial function to non-
integers: x! = Γ(x + 1), if you like. It is the natural extension of the factorial
function, not just because it gives the right values at the integers, but because the
functional equation Γ(x + 1) = xΓ(x) is the natural generalization of the recursive
formula n! = n · (n − 1)! that defines factorials.
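Both facts — (7.48) and the functional equation — can be confirmed against the gamma function in Python's standard library (our check, not part of the text):

```python
import math

# Gamma(n) = (n - 1)! for positive integers n.
for n in range(1, 10):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# Gamma(n + 1/2) = (n - 1/2)(n - 3/2) ... (1/2) * sqrt(pi).
for n in range(1, 8):
    prod = 1.0
    for k in range(n):
        prod *= n - k - 0.5
    assert math.isclose(math.gamma(n + 0.5), prod * math.sqrt(math.pi))

# The functional equation Gamma(x + 1) = x Gamma(x) at non-integer points.
for x in (0.3, 1.7, 4.2):
    assert math.isclose(math.gamma(x + 1), x * math.gamma(x))
```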
Other factorial-like products — more precisely, products of numbers in an arithmetic progression — can also be expressed in terms of the gamma function.
The formula
$$\Gamma(x) = \frac{\Gamma(x+1)}{x}$$
shows that $\Gamma(x)$ blows up like $x^{-1}$ as $x \to 0$. It also provides a way of extending the gamma function to negative values of $x$. Indeed, the expression on the right is defined for all $x > -1$ except $x = 0$, and it can be taken as a definition of $\Gamma(x)$ for $-1 < x < 0$. Once this has been done, $\Gamma(x+1)/x$ is defined for all $x > -2$ except $x = 0, -1$, and it can be taken as a definition of $\Gamma(x)$ for $-2 < x < -1$. Proceeding inductively, we eventually obtain a definition of $\Gamma(x)$ for all $x$ except the nonpositive integers, where $\Gamma(x)$ blows up. In more explicit form, it is
$$(7.50)\qquad \Gamma(x) = \frac{\Gamma(x+n)}{x(x+1)\cdots(x+n-1)} \qquad (x > -n).$$
This extended gamma function still satisfies the functional equation (7.47), more or
less by definition, and (7.49) remains valid provided that a/b is not a nonpositive
integer.
The qualitative behavior of the gamma function for x > 0 can be analyzed as
follows: Since Γ(1) = Γ(2) = 1, there is a critical point x0 in the interval (1, 2)
by Rolle’s theorem. On the other hand, from (7.46) it is clear that Γ′′ (x) > 0
for x > 0, so that Γ′ (x) is strictly increasing. It follows that Γ is decreasing for
0 < x < x0 and increasing for x > x0 ; in particular, it has a minimum at x0 . Also,
it tends to ∞ as x → 0 or x → ∞, so its graph is roughly U-shaped. The behavior
for x < 0 can then be deduced from (7.50). The graph of Γ is sketched in Figure
7.3.
A number of useful integrals can be transformed into the integral defining $\Gamma(x)$ by a change of variables. We single out two particularly useful ones, obtained by setting $u = bt$ and $v = t^2$, respectively:
$$(7.51)\qquad \int_0^\infty t^{x-1} e^{-bt}\,dt = \int_0^\infty \Bigl(\frac{u}{b}\Bigr)^{x-1} e^{-u}\,\frac{du}{b} = b^{-x}\Gamma(x) \qquad (b > 0),$$
$$(7.52)\qquad \int_0^\infty t^{2x-1} e^{-t^2}\,dt = \int_0^\infty v^{(2x-1)/2} e^{-v}\,\frac{dv}{2v^{1/2}} = \tfrac12\Gamma(x).$$
There is another important integral related to the gamma function, the so-called beta function
$$(7.53)\qquad B(x, y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt \qquad (x, y > 0).$$
Since the integrand is approximately equal to tx−1 for t near 0 and to (1 − t)y−1
for t near 1, the integral is proper when x, y ≥ 1 and convergent for x, y > 0. Like
the gamma function, the beta function can be expressed in a number of different
forms by changes of variable in the integral. Other than (7.53), the most important
of these is obtained by the substitution $t = \sin^2\theta$, which makes $1 - t = \cos^2\theta$ and $dt = 2\sin\theta\cos\theta\,d\theta$, so that
$$(7.54)\qquad B(x, y) = 2\int_0^{\pi/2} \sin^{2x-1}\theta\,\cos^{2y-1}\theta\,d\theta.$$
7.55 Theorem. For $x, y > 0$, $\displaystyle B(x, y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}$.
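Before the proof, a numerical sanity check of the identity (our sketch, not the text's): evaluate the beta integral (7.53) by the midpoint rule — taking $x, y \ge 1$ so the integrand is bounded — and compare with the gamma quotient.

```python
import math

def beta_numeric(x, y, steps=200000):
    """Midpoint-rule approximation of the beta integral (7.53)."""
    h = 1.0 / steps
    return h * sum(((k + 0.5) * h) ** (x - 1) * (1 - (k + 0.5) * h) ** (y - 1)
                   for k in range(steps))

for x, y in [(1.5, 2.5), (2.0, 3.0), (1.0, 4.0)]:
    expected = math.gamma(x) * math.gamma(y) / math.gamma(x + y)
    assert abs(beta_numeric(x, y) - expected) < 1e-6
```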
Proof. We employ the same device that we used to calculate $\int_{-\infty}^\infty e^{-x^2}\,dx$ in §4.7: We express $\Gamma(x)$ and $\Gamma(y)$ by (7.52), write $\Gamma(x)\Gamma(y)$ as an iterated integral, convert
We draw two useful consequences from Theorem 7.55. The first one is another
functional equation for the gamma function; the second one compares the growth
of Γ(x) and Γ(x + a) as x → ∞.
Proof. Assume that $x > 0$. By taking $y = x$ in Theorem 7.55 and observing that the function $t(1-t)$ is symmetric about $t = \frac12$, we see that
$$\frac{\Gamma(x)^2}{\Gamma(2x)} = \int_0^1 \bigl[t(1-t)\bigr]^{x-1}\,dt = 2\int_0^{1/2} \bigl[t(1-t)\bigr]^{x-1}\,dt.$$
By the substitution
Since $\Gamma(\frac12) = \pi^{1/2}$, the result follows. The extension to negative values of $x$ is left to the reader (Exercise 6).
7.57 Theorem. For $a > 0$, $\displaystyle\lim_{x\to\infty} \frac{\Gamma(x+a)}{x^a\,\Gamma(x)} = 1$.
When $x$ is large, $e^{-xu}$ is very small unless $u$ is close to 0, and in that case $1 - e^{-u}$ is approximately $u$. Hence, the integral on the right should be approximately equal to $\int_0^\infty u^{a-1}e^{-xu}\,du = x^{-a}\Gamma(a)$, which is what we are trying to show. More precisely, we have
$$\frac{\Gamma(x)\Gamma(a)}{\Gamma(x+a)} = \int_0^\infty u^{a-1}e^{-xu}\,du + \int_0^\infty \bigl[(1-e^{-u})^{a-1} - u^{a-1}\bigr]e^{-xu}\,du = x^{-a}\Gamma(a) + \int_0^\infty \bigl[(1-e^{-u})^{a-1} - u^{a-1}\bigr]e^{-xu}\,du.$$
where we have used (7.51) again in the last step. In short, the right side of (7.58) is
dominated by x−1 as x → ∞, so we are done.
Since
$$1\cdot4\cdot7\cdots(3n+1) = 3^n\Bigl[\tfrac43\cdot\tfrac73\cdots\bigl(n+\tfrac13\bigr)\Bigr] = 3^n\,\frac{\Gamma(n+\frac43)}{\Gamma(\frac43)},$$
the quantity under consideration can be rewritten as
$$\frac{\Gamma(n+\frac43)}{n^2\,\Gamma(\frac43)\,\Gamma(n+1)}.$$
EXERCISES
1. Prove the duplication formula for the case where x is a positive integer simply
by using (7.48).
2. Show that for $a, b > 0$,
$$\int_0^1 \Bigl(\log\frac1t\Bigr)^{a-1}\,dt = \Gamma(a), \qquad \int_0^1 \Bigl(\log\frac1t\Bigr)^{a-1} t^{b-1}\,dt = b^{-a}\Gamma(a).$$
3. Evaluate the following integrals:
   a. $\int_0^\infty x^4 e^{-x^2}\,dx$.
   b. $\int_0^\infty e^{-3x}\sqrt{x}\,dx$.
   c. $\int_0^\infty x^9 e^{-x^4}\,dx$.
4. Prove the following identities directly from the definition (7.53) (without using
Theorem 7.55):
a. $B(x, y) = B(y, x)$.
b. $B(x, 1) = x^{-1}$.
8. (Wallis's formula) Show that
$$\frac\pi2 = \lim_{n\to\infty} \frac{2\cdot2\cdot4\cdot4\cdot6\cdot6\cdots(2n)(2n)}{1\cdot3\cdot3\cdot5\cdot5\cdot7\cdots(2n-1)(2n+1)}.$$
(Hint: Denote the fraction on the right by $c_n$. Use Exercise 7 and the fact that $\sin^{2n+1}x < \sin^{2n}x < \sin^{2n-1}x$ for $0 < x < \frac12\pi$ to show that $c_n < \frac12\pi < (2n+1)c_n/2n$.)
9. Suppose $f$ is a continuous function on $[0, \infty)$. For $\alpha > 0$, define the function $I_\alpha[f]$ on $[0, \infty)$ by
$$I_\alpha[f](x) = \frac{1}{\Gamma(\alpha)}\int_0^x (x-t)^{\alpha-1} f(t)\,dt.$$
12. Suppose $a, b, c > 0$. Show that $\displaystyle\sum_0^\infty \frac{\Gamma(a+n)\Gamma(b+n)}{\Gamma(c+n)\,n!}$ converges if and only if $a + b < c$.
To see how good this approximation is, we approximate $\log(k+x)$ by its tangent line at $x = 0$ and use Taylor's theorem to estimate the error:
$$\log(k+x) = \log k + \frac{x}{k} + E_k(x), \qquad |E_k(x)| \le \sup_{|t|\le|x|} \frac{1}{(k+t)^2}\cdot\frac{x^2}{2!}.$$
(Here $(k+t)^{-2}$ is the absolute value of the second derivative of $\log(k+t)$.) Clearly, for $|x| \le \frac12$ and $k \ge 1$ we have
$$|E_k(x)| \le \frac{1}{8(k-\frac12)^2} \le \frac{1}{8(\frac12 k)^2} = \frac{1}{2k^2}.$$
Hence,
$$\int_{-1/2}^{1/2} \log(k+x)\,dx = \int_{-1/2}^{1/2} \bigl[\log k + k^{-1}x + E_k(x)\bigr]\,dx = \log k + c_k,$$
where
$$(7.59)\qquad |c_k| = \left|\int_{-1/2}^{1/2} E_k(x)\,dx\right| \le \frac{1}{2k^2}.$$
7.7. Stirling’s Formula 351
Therefore,
$$\log(n!) - \bigl(n+\tfrac12\bigr)\log n + n = \bigl(n+\tfrac12\bigr)\log\bigl(1+(2n)^{-1}\bigr) - \tfrac12\log\tfrac12 - \sum_{k=1}^n c_k.$$
Proof. With $g(u) = f(u)^{a-1} - 1$ as in the proof of Theorem 7.57, the function $|g'(u)| = |(a-1)f(u)^{a-2}f'(u)|$ is jointly continuous in $a$ and $u$ in the compact region $a \in [0, A]$, $u \in [0, 1]$, so its maximum on this region is finite. The constant $C$ in that proof can be taken to be this maximum when $a \in [0, A]$, and the conclusion of the proof shows that
$$\sup_{0\le a\le A}\left|\frac{x^a\,\Gamma(x)}{\Gamma(x+a)} - 1\right| \le \frac{(C+1)A}{x}.$$
7.62 Lemma. $\displaystyle\lim_{x\to\infty} \frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}} = L$, where $L$ is as in Lemma 7.60.
Proof. Any number $x \ge 1$ can be written as $x = n + a$ where $n$ is a positive integer and $0 \le a < 1$, so that
$$\frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}} = \frac{\Gamma(n+a)}{(n+a)^{n+a-(1/2)}e^{-n-a}} = \left[\frac{\Gamma(n)}{n^{n-(1/2)}e^{-n}}\right]\left[\frac{\Gamma(n+a)}{n^a\,\Gamma(n)}\right]\left[\Bigl(\frac{n+a}{n}\Bigr)^{-n-a+(1/2)} e^a\right].$$
By Lemma 7.60, the first factor in this last expression will be as close to $L$ as we please when $n$ is sufficiently large. By Lemma 7.61, the second factor will be as close to 1 as we please when $n$ is sufficiently large and $0 \le a \le 1$. The same is also true of the third factor; indeed, by taking logarithms it is enough to verify that
$$\Bigl|a - \bigl(n+a-\tfrac12\bigr)\log\Bigl(1+\frac{a}{n}\Bigr)\Bigr|$$
will be as close to 0 as we please when $n$ is sufficiently large and $0 \le a < 1$, and this is easily accomplished by using the Taylor expansion of $\log(1+t)$ about $t = 0$. (Details are left to the reader as Exercise 1.) Combining these results, we see that $\Gamma(x)/x^{x-(1/2)}e^{-x}$ becomes as close to $L$ as we please when $x$ is sufficiently large, as claimed.
7.63 Theorem (Stirling's Formula). $\displaystyle\lim_{x\to\infty} \frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}} = \sqrt{2\pi}$.
Proof. It remains only to identify the constant $L$ in Lemma 7.62. According to that lemma, the quantities
$$\frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}}, \qquad \frac{\Gamma(x+\frac12)}{(x+\frac12)^x e^{-x-(1/2)}}, \qquad \frac{\Gamma(2x)}{(2x)^{2x-(1/2)}e^{-2x}}$$
all approach $L$ as $x \to \infty$. Dividing the product of the first two by the third and using the duplication formula
$$\frac{\Gamma(x)\Gamma(x+\frac12)}{\Gamma(2x)} = 2^{1-2x}\sqrt\pi,$$
we see that
$$L = \lim_{x\to\infty} \frac{\Gamma(x)}{x^{x-(1/2)}e^{-x}}\cdot\frac{\Gamma(x+\frac12)}{(x+\frac12)^x e^{-x-(1/2)}}\cdot\frac{(2x)^{2x-(1/2)}e^{-2x}}{\Gamma(2x)} = \lim_{x\to\infty} \sqrt{2\pi e}\,\Bigl(1+\frac{1}{2x}\Bigr)^{-x}.$$
The last factor on the right tends to $e^{-1/2}$ as $x \to \infty$, so we are done.
where $\sim$ means that the ratio of the quantities on the left and right approaches 1 as $x \to \infty$. (The difference of these two quantities, however, tends to $\infty$ along with $x$.)
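Numerically, the convergence in Theorem 7.63 is visible already for moderate $x$; working with $\log\Gamma$ avoids overflow in $x^{x-1/2}$ (our sketch, not the text's). The error in the ratio is known to shrink like $1/(12x)$, though the theorem itself asserts only that it tends to 0.

```python
import math

def stirling_ratio(x):
    """Gamma(x) / (x^{x - 1/2} e^{-x}), computed in log space to avoid overflow."""
    return math.exp(math.lgamma(x) - ((x - 0.5) * math.log(x) - x))

target = math.sqrt(2 * math.pi)
errs = [abs(stirling_ratio(x) - target) for x in (5.0, 50.0, 500.0)]
assert errs[0] > errs[1] > errs[2]   # the ratio approaches sqrt(2 pi)
assert errs[2] < 1e-3
```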
EXERCISES
1. Complete the proof of Lemma 7.62 by showing that for some constant $C > 0$ we have $\sup_{0\le a\le 1}\bigl|a - (n+a-\frac12)\log[1+(a/n)]\bigr| \le C/n$.
2. If a fair coin is tossed $2n$ times, the probability that it will come up heads exactly $n$ times is $(2n)!/\bigl[(n!)^2 2^{2n}\bigr]$. (The total number of possible outcomes is $2^{2n}$, and the number of those with exactly $n$ heads is the binomial coefficient $\binom{2n}{n} = (2n)!/(n!)^2$.) Use Stirling's formula to estimate this probability when $n$ is large.
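A sketch of the estimate this exercise asks for (our code, not the text's): Stirling's formula gives $(2n)!/\bigl[(n!)^2 2^{2n}\bigr] \approx 1/\sqrt{\pi n}$, with relative error on the order of $1/(8n)$.

```python
import math

def heads_prob(n):
    """Exact probability of exactly n heads in 2n fair tosses."""
    return math.comb(2 * n, n) / 4 ** n

for n in (10, 100, 1000):
    estimate = 1 / math.sqrt(math.pi * n)
    assert abs(heads_prob(n) / estimate - 1) < 1 / n  # relative error ~ 1/(8n)
```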
3. Stirling's formula for factorials,
$$\lim_{n\to\infty} \frac{n!}{n^{n+(1/2)}e^{-n}} = \sqrt{2\pi},$$
can be proved more simply than the general case. One begins, as we did, by proving Lemma 7.60, but it is then enough to evaluate the constant $L$ there. To do this, show that the fraction on the right of Wallis's formula (Exercise 8 in §7.6) equals $[2^n n!]^4/[(2n)!]^2(2n+1)$, then use Lemma 7.60 to show that it approaches $\frac14 L^2$ as $n \to \infty$; conclude that $L = \sqrt{2\pi}$.
Chapter 8
FOURIER SERIES
Fourier series are infinite series that use the trigonometric functions cos nθ and
sin nθ, or, equivalently, einθ and e−inθ , as the basic building blocks, in the same
way that power series use the monomials xn . They are a basic tool for analyzing
periodic functions, and they therefore have applications in the study of physical
phenomena that are periodic in time (such as circular or oscillatory motion) or in
space (such as crystal lattices). They can also be used to analyze functions defined
on finite intervals in ways that are useful in solving differential equations, and this
leads to many other applications in physics and engineering. The theory of Fourier
series and its ramifications is an extensive subject that lies at the heart of much
of modern mathematical analysis. Here we present only the basics; for further
information we refer the reader to Folland [6], Kammler [10], and Körner [11].
356 Chapter 8. Fourier Series
exist (and are finite). Moreover, we shall say that a P -periodic function f on R is
piecewise continuous if it is piecewise continuous on each interval of length P . (If
it is piecewise continuous on one such interval, of course, it is piecewise continuous
on all of them.)
Note. It is sometimes convenient to allow a piecewise continuous function to
be undefined at the points where it has jumps. This does not affect anything that
follows in a significant way.
A piecewise continuous function is integrable over every bounded interval in
its domain. In this connection, the following elementary fact is worth pointing out
explicitly: If f is P -periodic and piecewise continuous, the integrals of f over all
intervals of length P are equal:
$$(8.2)\qquad \int_a^{a+P} f(x)\,dx = \int_0^P f(x)\,dx \quad\text{for every } a \in \mathbb{R}.$$
The proof is left to the reader (Exercise 9).
By making the change of variable θ = 2πx/P , we can convert any P -periodic
function into a 2π-periodic function. Namely, if f (x + P ) = f (x) and we set
g(θ) = f (x) = f (P θ/2π), then g(θ + 2π) = g(θ). We may therefore restrict
attention to the case where the period is 2π, and we shall generally denote the inde-
pendent variable by θ. There is no presumption that θ denotes an angle, however;
it is just a convenient name for a real variable.
The basic idea of Fourier analysis is that an arbitrary piecewise continuous 2π-
periodic function f (θ) can be expanded as an infinite linear combination of the
functions einθ (n = 0, ±1, ±2, . . .), or equivalently of the functions cos nθ and
sin nθ (n = 0, 1, 2, . . .). In terms of the functions einθ , this expansion takes the
form
$$(8.3)\qquad f(\theta) = \sum_{-\infty}^{\infty} c_n e^{in\theta}.$$
Since $e^{\pm in\theta} = \cos n\theta \pm i\sin n\theta$, combining the $n$th and $(-n)$th terms gives
$$c_n e^{in\theta} + c_{-n} e^{-in\theta} = a_n\cos n\theta + b_n\sin n\theta, \quad\text{where}\quad a_n = c_n + c_{-n},\ \ b_n = i(c_n - c_{-n}).$$
Therefore, (8.3) can be rewritten as
$$(8.4)\qquad f(\theta) = \tfrac12 a_0 + \sum_{n=1}^\infty \bigl(a_n\cos n\theta + b_n\sin n\theta\bigr).$$
The grouping of the $n$th and $(-n)$th terms in (8.3) corresponds to the grouping of the $\cos n\theta$ and $\sin n\theta$ terms in (8.4). (The factor of $\frac12$ in front of $a_0$ is an artifact of the definition $a_0 = c_0 + c_{-0} = 2c_0$.)
The series (8.3) and (8.4) can be used interchangeably. The more traditional
form is (8.4), but each of them has its advantages. The advantages of (8.4) derive
from the fact that cos nθ and sin nθ are real-valued and are respectively even and
odd; the advantages of (8.3) derive from the fact that exponentials tend to be eas-
ier to manipulate than trig functions. For developing the basic theory, the latter
consideration is compelling, so we shall work mostly with (8.3).
The questions that face us are as follows: Given a 2π-periodic function f , can
it be expanded in a series of the form (8.3)? If so, how do we find the coefficients
cn in this series? It turns out to be easier to tackle the second question first. That
is, we first assume that f can be expressed in the form (8.3) and figure out what
the coefficients cn must be; then we show that with this choice of cn , the expansion
(8.3) is actually valid under suitable hypotheses on f .
Suppose, then, that the series $\sum_{-\infty}^\infty c_n e^{in\theta}$ converges pointwise to the function $f(\theta)$, and suppose also that the convergence is sufficiently well behaved that term-by-term integration is permissible. The coefficients $c_n$ can then be evaluated by the following device. To compute $c_k$, we multiply both sides of (8.3) by $e^{-ik\theta}$ and integrate over $[-\pi, \pi]$:
$$\int_{-\pi}^{\pi} f(\theta)e^{-ik\theta}\,d\theta = \sum_{-\infty}^{\infty} c_n \int_{-\pi}^{\pi} e^{i(n-k)\theta}\,d\theta.$$
Now,
$$(8.5)\qquad \int_{-\pi}^{\pi} e^{i(n-k)\theta}\,d\theta = \begin{cases} [i(n-k)]^{-1}e^{i(n-k)\theta}\big|_{-\pi}^{\pi} = 0 & \text{if } n \ne k,\\[2pt] \theta\big|_{-\pi}^{\pi} = 2\pi & \text{if } n = k.\end{cases}$$
Thus all the terms on the right of the integrated series vanish except for the one with $n = k$, and we obtain
$$\int_{-\pi}^{\pi} f(\theta)e^{-ik\theta}\,d\theta = 2\pi c_k,$$
or, relabeling $k$ as $n$,
$$(8.6)\qquad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)e^{-in\theta}\,d\theta.$$
This is the promised formula for the coefficients $c_n$. The corresponding formulas for $a_n$ and $b_n$ in (8.4) follow immediately:
$$(8.7)\qquad a_n = c_n + c_{-n} = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)\bigl[e^{-in\theta} + e^{in\theta}\bigr]\,d\theta = \frac1\pi\int_{-\pi}^{\pi} f(\theta)\cos n\theta\,d\theta,$$
$$b_n = i(c_n - c_{-n}) = \frac{i}{2\pi}\int_{-\pi}^{\pi} f(\theta)\bigl[e^{-in\theta} - e^{in\theta}\bigr]\,d\theta = \frac1\pi\int_{-\pi}^{\pi} f(\theta)\sin n\theta\,d\theta.$$
Of course, according to (8.2), the integrals over [−π, π] in (8.6) and (8.7) can be
replaced by integrals over any interval of length 2π.
It is useful to keep in mind that in either (8.3) or (8.4), the constant term in the series is
$$(8.8)\qquad c_0 = \tfrac12 a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)\,d\theta,$$
the mean value of $f$ on the interval $[-\pi, \pi]$ (or on any interval of length $2\pi$).
What have we accomplished? We have shown that if $f(\theta)$ is the sum of a series $\sum_{-\infty}^\infty c_n e^{in\theta}$, and if term-by-term integration is legitimate, then the coefficients $c_n$ must be given by (8.6), but as yet we know almost nothing about the class of functions that can be represented by such series. But now the formula (8.6) provides a starting point for studying this matter. Indeed, if $f$ is any integrable $2\pi$-periodic function, the quantities
$$a_n = \frac1\pi\int_{-\pi}^{\pi} f(\theta)\cos n\theta\,d\theta, \qquad b_n = \frac1\pi\int_{-\pi}^{\pi} f(\theta)\sin n\theta\,d\theta, \qquad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)e^{-in\theta}\,d\theta$$
are well defined. We call them the Fourier coefficients of $f$, and we call the series
$$\sum_{-\infty}^{\infty} c_n e^{in\theta} = \tfrac12 a_0 + \sum_{n=1}^\infty \bigl(a_n\cos n\theta + b_n\sin n\theta\bigr)$$
the Fourier series of $f$.
The study of general Fourier series will be undertaken in the following sections.
We conclude this one by working out two simple examples.
EXAMPLE 1. Let $f(\theta)$ be the $2\pi$-periodic function determined by the formula
$$f(\theta) = \theta \qquad (-\pi < \theta \le \pi).$$
That is, $f$ is the sawtooth wave depicted in the top graph of Figure 8.1. The calculation of the Fourier coefficients $c_n$ is an easy integration by parts for $n \ne 0$:
$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} \theta e^{-in\theta}\,d\theta = \frac{1}{2\pi}\left[\frac{\theta e^{-in\theta}}{-in} + \frac{e^{-in\theta}}{n^2}\right]_{-\pi}^{\pi} = \frac{(-1)^{n+1}}{in},$$
while $c_0 = 0$ since $f$ has mean zero. Grouping together the $n$th and $(-n)$th terms yields the equivalent form
$$(8.9)\qquad \sum_{n=1}^\infty \frac{2(-1)^{n+1}}{n}\sin n\theta.$$
(We could also have obtained this series directly by using (8.7); we have $a_n = 0$ for all $n$ since $f$ is odd, and a calculation similar to the one above shows that $b_n = 2(-1)^{n+1}/n$.)
The series (8.9) converges for all θ by Dirichlet’s test. (See Corollary
6.27. The factor of (−1)n+1 does not affect the result, since (−1)n sin nθ =
sin n(θ + π).) The sketches of some of the partial sums in Figure 8.1 lend plau-
sibility to the conjecture that (8.9) does indeed converge to the function f (θ),
at least at the points where f is continuous. (At the points θ = (2k + 1)π where
f is discontinuous, every term in (8.9) vanishes.)
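The conjectured convergence can be probed numerically. A minimal sketch (ours, not the book's; the function name and the truncation level are arbitrary choices):

```python
import math

def sawtooth_partial(theta, N):
    """Partial sum of the series (8.9): sum of 2(-1)^(n+1) sin(n*theta)/n for n = 1..N."""
    return sum(2 * (-1) ** (n + 1) * math.sin(n * theta) / n for n in range(1, N + 1))

# At a point of continuity the partial sums approach f(theta) = theta ...
assert abs(sawtooth_partial(1.0, 20000) - 1.0) < 1e-2
# ... while at the jump theta = pi every term vanishes (up to rounding).
assert abs(sawtooth_partial(math.pi, 100)) < 1e-10
```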
EXAMPLE 2. Let g(θ) be the 2π-periodic function determined by the formula g(θ) = |θ| for −π ≤ θ ≤ π.
That is, g is the triangle wave depicted in the top graph of Figure 8.2. Here it
is a bit easier to calculate the Fourier coefficients in terms of sines and cosines.
Since g is an even function, we have bn = 0 for all n and
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} g(\theta)\cos n\theta\,d\theta = \frac{2}{\pi}\int_{0}^{\pi}\theta\cos n\theta\,d\theta.$$
For n = 0 we have $a_0 = (2/\pi)\int_0^{\pi}\theta\,d\theta = \pi$, and for n > 0 an integration by parts gives
$$a_n = \frac{2}{\pi}\left[\frac{\theta\sin n\theta}{n} + \frac{\cos n\theta}{n^2}\right]_0^{\pi} = \frac{2}{\pi}\,\frac{(-1)^n - 1}{n^2}.$$
Thus $a_n = 0$ when n is even and $a_n = -4/\pi n^2$ when n is odd, so the Fourier series of g is
$$\frac{\pi}{2} - \frac{4}{\pi}\sum_{1}^{\infty}\frac{\cos(2m-1)\theta}{(2m-1)^2},$$
which converges absolutely and uniformly on R by the Weierstrass M-test. Again, a glance at its first few partial sums in Figure 8.2 supports the conjecture that its full sum is g(θ).
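Since the cosine coefficients decay like $1/n^2$, this series can be summed numerically to high accuracy. An illustrative sketch (our code; the truncation level M is an arbitrary choice):

```python
import math

def triangle_series(theta, M=2000):
    """pi/2 - (4/pi) * sum of cos((2m-1)*theta)/(2m-1)^2: the Fourier series of |theta|."""
    s = sum(math.cos((2 * m - 1) * theta) / (2 * m - 1) ** 2 for m in range(1, M + 1))
    return math.pi / 2 - (4 / math.pi) * s

# The absolutely convergent series really does sum to g(theta) = |theta| on [-pi, pi].
for theta in (0.0, 1.0, -2.5, math.pi):
    assert abs(triangle_series(theta) - abs(theta)) < 1e-3
```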
EXERCISES
In Exercises 1–8, find the Fourier series of the 2π-periodic function f (θ) that
is given on the interval (−π, π) by the indicated formula. (All but Exercise 5 are
either even or odd, so their Fourier series are naturally expressed in terms of cosines
or sines.) Sketches of these functions are given in Figure 8.3.
1. $f(\theta) = \begin{cases} -1 & (-\pi < \theta < 0) \\ 1 & (0 < \theta < \pi) \end{cases}$ (the square wave).
2. f (θ) = sin²θ. (You don't need calculus if you look at this the right way.)
3. f (θ) = |sin θ|. (Hint: sin a cos b = ½[sin(a + b) + sin(a − b)].)
4. f (θ) = θ².
5. f (θ) = ebθ (b > 0).
6. f (θ) = θ(π − |θ|).
7. $f(\theta) = \begin{cases} 1/a & (|\theta| < a) \\ -1/(\pi - a) & (a < |\theta| < \pi) \end{cases}$ where 0 < a < π. (The values of f are chosen to make the areas of the rectangles between the graph of f and the x-axis on the intervals [0, a] and [a, π] both equal to 1.)
8. $f(\theta) = \begin{cases} a^{-2}(a - |\theta|) & (|\theta| < a) \\ 0 & (a < |\theta| < \pi) \end{cases}$ where 0 < a < π. (The constants are chosen to make the areas of the triangles under the graph of f equal to 1.)
9. Prove that (8.2) is valid for every piecewise continuous P-periodic function f. (This can be done either directly by changes of variable or by differentiating $\int_a^{a+P}$ with respect to a via Theorem 4.15a.)
[Figure 8.3: sketches of the functions in Exercises 1–8.]
If f is a bounded integrable 2π-periodic function, its Fourier coefficients are bounded by a constant:
$$|c_n| \le \frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl|f(\theta)e^{-in\theta}\bigr|\,d\theta = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|\,d\theta \le \sup_{\theta}|f(\theta)|.$$
However, it is actually true that cn → 0; in fact, we can say something more precise.
Theorem (Bessel's inequality). If f is 2π-periodic and integrable on [−π, π], then
$$\sum_{-\infty}^{\infty}|c_n|^2 \le \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta.$$
In particular, $\sum|c_n|^2 < \infty$, and hence $\lim_{n\to\pm\infty} c_n = 0$.
Proof. We examine the difference between f and a partial sum of its Fourier series.
Since the absolute value of a complex number z satisfies $|z|^2 = z\bar z$, we have
$$\Bigl|f(\theta) - \sum_{-N}^{N} c_n e^{in\theta}\Bigr|^2 = \Bigl(f(\theta) - \sum_{-N}^{N} c_n e^{in\theta}\Bigr)\Bigl(\overline{f(\theta)} - \sum_{-N}^{N}\overline{c_n}\,e^{-in\theta}\Bigr)$$
$$= |f(\theta)|^2 - \sum_{-N}^{N}\Bigl[\overline{c_n}\,f(\theta)e^{-in\theta} + c_n\overline{f(\theta)}\,e^{in\theta}\Bigr] + \sum_{m,n=-N}^{N} c_m\overline{c_n}\,e^{i(m-n)\theta}.$$
−N m,n=−N
Next, integration of both sides over [−π, π], using the definition of cn and the
relation (8.5), yields
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\Bigl|f(\theta) - \sum_{-N}^{N} c_n e^{in\theta}\Bigr|^2 d\theta = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta - \sum_{-N}^{N}\bigl[c_n\overline{c_n} + \overline{c_n}\,c_n\bigr] + \sum_{-N}^{N} c_n\overline{c_n}$$
$$= \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta - \sum_{-N}^{N}|c_n|^2.$$
Since the left-hand side is nonnegative, $\sum_{-N}^{N}|c_n|^2 \le \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^2\,d\theta$ for every N, and letting N → ∞ gives the result.
We denote by $S_N^f(\theta)$ the Nth partial sum of the Fourier series of f:
$$(8.13)\qquad S_N^f(\theta) = \sum_{-N}^{N} c_n e^{in\theta},\qquad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\psi)e^{-in\psi}\,d\psi.$$
Substituting the formula for $c_n$ into the sum, we obtain
$$S_N^f(\theta) = \sum_{-N}^{N}\frac{1}{2\pi}\int_{-\pi}^{\pi} f(\psi)e^{in(\theta-\psi)}\,d\psi = \sum_{-N}^{N}\frac{1}{2\pi}\int_{-\pi}^{\pi} f(\psi)e^{in(\psi-\theta)}\,d\psi$$
$$= \sum_{-N}^{N}\frac{1}{2\pi}\int_{-\pi}^{\pi} f(\phi+\theta)e^{in\phi}\,d\phi.$$
(The second equality is obtained by replacing n by −n, which leaves the sum from
−N to N unchanged, and the third one comes from the change of variable ϕ =
ψ − θ with the help of (8.2).) In other words,
$$(8.14)\qquad S_N^f(\theta) = \int_{-\pi}^{\pi} f(\phi+\theta)D_N(\phi)\,d\phi,\qquad\text{where}\quad D_N(\phi) = \frac{1}{2\pi}\sum_{-N}^{N} e^{in\phi}.$$
$$2\pi D_N(\phi) = e^{-iN\phi}\sum_{0}^{2N} e^{in\phi} = e^{-iN\phi}\,\frac{e^{i(2N+1)\phi} - 1}{e^{i\phi} - 1} = \frac{e^{i(N+1)\phi} - e^{-iN\phi}}{e^{i\phi} - 1}.$$
Incidentally, if we multiply and divide the formula in Lemma 8.15b for $D_N(\phi)$ by $e^{-i\phi/2}$, we obtain
$$D_N(\phi) = \frac{\sin(N+\frac12)\phi}{2\pi\sin\frac12\phi}.$$
This shows that $D_N$ is real-valued and gives an easy way to visualize it: Its graph is the rapidly oscillating sine wave $y = \sin(N+\frac12)\phi$, amplitude-modulated to fit inside the envelope $y = \pm(2\pi\sin\frac12\phi)^{-1}$. (The reader may wish to generate graphs
of DN for various values of N on a computer.)
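The closed form for the Dirichlet kernel can also be verified against its defining sum; a quick numerical sketch (ours, not the book's):

```python
import cmath
import math

def dirichlet_sum(N, phi):
    """D_N(phi) computed literally as (1/2pi) * sum of e^{in*phi} for n = -N..N."""
    return sum(cmath.exp(1j * n * phi) for n in range(-N, N + 1)).real / (2 * math.pi)

def dirichlet_closed(N, phi):
    """The closed form sin((N + 1/2)*phi) / (2*pi*sin(phi/2)), valid for phi != 0."""
    return math.sin((N + 0.5) * phi) / (2 * math.pi * math.sin(phi / 2))

for N in (1, 5, 12):
    for phi in (0.3, 1.7, -2.2):
        assert abs(dirichlet_sum(N, phi) - dirichlet_closed(N, phi)) < 1e-12
```

At φ = 0 the closed form has a removable singularity; the sum there equals (2N + 1)/2π.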
We are now ready to formulate and prove the basic convergence theorem for
Fourier series. It turns out that piecewise continuity of a periodic function f is not
enough to yield a good result. Instead we shall assume, in effect, that not only
f but also its derivative f ′ is piecewise continuous. More precisely, we shall say
that a periodic function f is piecewise smooth if, on any bounded interval, f is
of class C 1 except at finitely many points, at which the one-sided limits f (θ+),
f (θ−), f ′ (θ+), and f ′ (θ−) (as defined in (8.1)) exist and are finite. (Note that this
definition of piecewise smoothness is more general than that given in §5.1, which
required the function to be continuous.) Pictorially, f is piecewise smooth if its
graph over any bounded interval is a smooth curve except at finitely many points
where it has jumps (if f is discontinuous) or corners (if f is continuous but f ′ is
discontinuous). In addition, the one-sided tangent lines at the jumps and corners
are not allowed to be vertical.
8.16 Theorem. Suppose f is 2π-periodic and piecewise smooth. Then the partial sums $S_N^f(\theta)$ of the Fourier series of f, defined by (8.13), converge pointwise to $\frac12[f(\theta-) + f(\theta+)]$. In particular, they converge to f(θ) at each point θ where f is continuous.
By (8.14) and the fact that $\int_{-\pi}^{0} D_N(\phi)\,d\phi = \int_{0}^{\pi} D_N(\phi)\,d\phi = \frac12$, the difference between $S_N^f(\theta)$ and its asserted limit is
$$S_N^f(\theta) - \tfrac12\bigl[f(\theta-)+f(\theta+)\bigr] = \int_{-\pi}^{0}\bigl[f(\phi+\theta)-f(\theta-)\bigr]D_N(\phi)\,d\phi + \int_{0}^{\pi}\bigl[f(\phi+\theta)-f(\theta+)\bigr]D_N(\phi)\,d\phi.$$
Combining the two integrals, we can write this difference as
$$(8.17)\qquad \int_{-\pi}^{\pi} g(\phi)\,\bigl(e^{i\phi}-1\bigr)\,D_N(\phi)\,d\phi,\qquad\text{where}\quad g(\phi) = \begin{cases}\dfrac{f(\phi+\theta)-f(\theta-)}{e^{i\phi}-1} & \text{if } -\pi \le \phi < 0,\\[2ex]\dfrac{f(\phi+\theta)-f(\theta+)}{e^{i\phi}-1} & \text{if } 0 < \phi \le \pi.\end{cases}$$
(We could define g(0) to be anything we please; altering the value at this one point
does not affect (8.17), by Proposition 4.14.) On the interval [−π, π], g(ϕ) is con-
tinuous wherever f (ϕ + θ) is and has jump discontinuities wherever f (ϕ + θ) does,
except for an additional singularity at ϕ = 0 caused by the vanishing of eiϕ − 1
there. But this singularity is also at worst a jump discontinuity; that is, the limits
g(0+) and g(0−) both exist. Indeed, by l’Hôpital’s rule,
$$g(0+) = \lim_{\phi\to 0+}\frac{f(\phi+\theta)-f(\theta+)}{e^{i\phi}-1} = \lim_{\phi\to 0+}\frac{f'(\phi+\theta)}{ie^{i\phi}} = \frac{f'(\theta+)}{i},$$
We conclude by remarking that one can often use simple changes of variable
to generate new Fourier expansions from old ones without recalculating the coeffi-
cients from scratch.
EXAMPLE 3. Consider the modified triangle wave h whose graph is given in Figure 8.4. It is related to the triangle wave g in Example 2 by $h(\theta) = g(\theta + \frac12\pi)$, and $\cos(2m-1)(\theta + \frac12\pi) = (-1)^m\sin(2m-1)\theta$, so
$$h(\theta) = \frac{\pi}{2} + \frac{4}{\pi}\sum_{1}^{\infty}\frac{(-1)^{m-1}\sin(2m-1)\theta}{(2m-1)^2}.$$
We can also study the convergence of Fourier series by the method of Abel summability discussed at the end of §7.3. Namely, for 0 < r < 1 we consider the series
$$(8.19)\qquad A_r f(\theta) = \sum_{-\infty}^{\infty} r^{|n|} c_n e^{in\theta}.$$
Since f is bounded, the Weierstrass M-test (comparison to $\sum r^{|n|}$ again) gives the
uniform convergence to justify interchange of the summation and integration, and
a couple of manipulations like those that lead to (8.14) then show that
$$(8.20)\qquad A_r f(\theta) = \int_{-\pi}^{\pi} f(\theta+\phi)P_r(\phi)\,d\phi,\qquad\text{where}\quad P_r(\phi) = \frac{1}{2\pi}\sum_{-\infty}^{\infty} r^{|n|} e^{in\phi}.$$
The function Pr is called the Poisson kernel. Like the Dirichlet kernel, it satisfies
$$(8.21)\qquad \int_{-\pi}^{0} P_r(\phi)\,d\phi = \int_{0}^{\pi} P_r(\phi)\,d\phi = \frac12$$
(write $P_r(\phi) = (2\pi)^{-1} + \pi^{-1}\sum_{1}^{\infty} r^n\cos n\phi$ and integrate term by term), and it is
easily expressed in closed form since it is the sum of two geometric series:
(8.22)
$$P_r(\phi) = \frac{1}{2\pi}\Bigl[\sum_{0}^{\infty} r^n e^{in\phi} + \sum_{1}^{\infty} r^n e^{-in\phi}\Bigr] = \frac{1}{2\pi}\Bigl[\frac{1}{1-re^{i\phi}} + \frac{re^{-i\phi}}{1-re^{-i\phi}}\Bigr]$$
$$= \frac{1-r^2}{2\pi(1-re^{i\phi})(1-re^{-i\phi})} = \frac{1-r^2}{2\pi(1+r^2-2r\cos\phi)}.$$
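The closed form (8.22) can be checked against a truncation of the defining series; a small numerical sketch of ours:

```python
import math

def poisson_sum(r, phi, M=200):
    """P_r(phi) via the truncated series (1/2pi) * (1 + 2 * sum of r^n cos(n*phi))."""
    s = 1.0 + 2.0 * sum(r ** n * math.cos(n * phi) for n in range(1, M + 1))
    return s / (2 * math.pi)

def poisson_closed(r, phi):
    """The closed form (1 - r^2) / (2*pi*(1 + r^2 - 2*r*cos(phi)))."""
    return (1 - r * r) / (2 * math.pi * (1 + r * r - 2 * r * math.cos(phi)))

for r in (0.3, 0.8):
    for phi in (0.0, 1.2, 3.0):
        assert abs(poisson_sum(r, phi) - poisson_closed(r, phi)) < 1e-9
```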
However, the Poisson kernel has one additional crucial property that is not shared
by the Dirichlet kernel:
(8.23)
For any δ > 0, Pr (ϕ) → 0 uniformly on [−π, −δ] and on [δ, π] as r → 1−.
Indeed, if δ ≤ |ϕ| ≤ π, then cos ϕ ≤ cos δ, so
$$0 \le P_r(\phi) \le \frac{1-r^2}{2\pi(1+r^2-2r\cos\delta)},$$
and the expression on the right tends to zero as r → 1−. With these results in hand, we come to the main theorem.

8.24 Theorem. Suppose f is 2π-periodic and piecewise continuous. Then $A_r f(\theta) \to \frac12[f(\theta-)+f(\theta+)]$ as r → 1− for every θ; in particular, $A_r f(\theta) \to f(\theta)$ wherever f is continuous. If f is continuous everywhere, then $A_r f \to f$ uniformly on R as r → 1−.
Proof. We sketch the ideas and leave the details to the reader as Exercises 5 and 6.
Given θ ∈ R and ϵ > 0, we choose δ > 0 small enough so that |f (θ+ϕ)−f (θ+)| <
ϵ when 0 < ϕ < δ and |f (θ + ϕ) − f (θ−)| < ϵ when −δ < ϕ < 0. We then write
the formula (8.20) for Ar f (θ) as
+* −δ * 0 * δ * π,
Ar f (θ) = + + + f (θ + ϕ)Pr (ϕ) dϕ.
−π −δ 0 δ
The first and last integrals tend to zero as r → 1− by (8.23). In the second and
third integrals, f (θ + ϕ) is within ϵ of f (θ−) and f (θ+), respectively, and (8.21)
and (8.23) together show that the integrals of $P_r(\phi)$ over [−δ, 0] and [0, δ] tend to ½ as r → 1−. The upshot is that $A_r f(\theta)$ is within 2ϵ of ½[f(θ−) + f(θ+)] when
r is sufficiently close to 1, and since ϵ is arbitrary, the first assertion is proved.
If f is continuous, it is uniformly continuous on [−π, π] by Theorem 1.33 and
hence uniformly continuous on R by periodicity. This means that the δ in the
preceding paragraph can be chosen independent of θ, and the argument given there
then yields uniform convergence.
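This behavior is easy to observe for the square wave of Exercise 1 in §8.1, whose Fourier series is $(4/\pi)\sum_1^{\infty}\sin(2m-1)\theta/(2m-1)$. A numerical sketch (ours; the truncation M only needs to make $r^{2M}$ negligible):

```python
import math

def abel_square(theta, r, M=4000):
    """Abel mean A_r f for the square wave, series (4/pi) * sum sin((2m-1)theta)/(2m-1)."""
    return (4 / math.pi) * sum(
        r ** (2 * m - 1) * math.sin((2 * m - 1) * theta) / (2 * m - 1)
        for m in range(1, M + 1))

# At a point of continuity, A_r f(theta) -> f(theta) = 1 as r -> 1-.
assert abs(abel_square(1.0, 0.999) - 1.0) < 0.05
# At the jump theta = 0, every Abel mean equals the average (f(0-) + f(0+))/2 = 0.
assert abel_square(0.0, 0.999) == 0.0
```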
EXERCISES
1. Find the Fourier series of the sawtooth waves depicted below by modifying the
series in Example 1.
[Sketches of the two sawtooth waves, (a) and (b).]
2. Find the Fourier series of the 2π-periodic function f (θ) defined by f (θ) =
$(\theta - \frac14\pi)^2$ on the interval $[-\frac34\pi, \frac54\pi]$. Use the result of Exercise 4 in §8.1.
3. Find the Fourier series of the 2π-periodic functions defined on the interval
(−π, π) by the indicated formulas by modifying the series in the exercises of
§8.1.
a. $f(\theta) = \begin{cases} 0 & (-\pi < \theta < 0) \\ 1 & (0 < \theta < \pi). \end{cases}$
b. $f(\theta) = \begin{cases} 0 & (-\pi < \theta < 0) \\ \sin\theta & (0 < \theta < \pi). \end{cases}$ (Hint: max(x, 0) = ½(x + |x|).)
c. $f(\theta) = \begin{cases} (2a)^{-1} & (|\theta| < a) \\ 0 & (a < |\theta| < \pi), \end{cases}$ where 0 < a < π.
d. f (θ) = sinh θ.
4. Find the sums of the following series by applying Theorem 8.16 to the series
obtained in the indicated exercises from §8.1 and choosing appropriate values
of θ.
a. $\sum_{1}^{\infty}\dfrac{1}{4n^2-1}$ and $\sum_{1}^{\infty}\dfrac{(-1)^{n+1}}{4n^2-1}$ (Exercise 3). Can you sum the first series in a more elementary way by rewriting it as a telescoping series?
b. $\sum_{1}^{\infty}\dfrac{1}{n^2}$ and $\sum_{1}^{\infty}\dfrac{(-1)^{n+1}}{n^2}$ (Exercise 4).
c. $\sum_{1}^{\infty}\dfrac{(-1)^n}{n^2+b^2}$ and $\sum_{1}^{\infty}\dfrac{1}{n^2+b^2}$, where b > 0 (Exercise 5).
d. $\sum_{1}^{\infty}\dfrac{(-1)^{n+1}}{(2n-1)^3}$ (Exercise 6).
5. Fill in the details of the proof of the first assertion of Theorem 8.24.
6. Fill in the details of the proof of the second assertion of Theorem 8.24.
The fundamental theorem of calculus,
$$(8.25)\qquad f(b) - f(a) = \int_a^b f'(x)\,dx,$$
is valid when f is continuous and piecewise smooth, even though f′ may be undefined at finitely many points. (However, it is generally false if f itself has
jump discontinuities.) In particular, if f and g are both continuous and piecewise
smooth, then so is f g, and an application of (8.25) to the latter function yields the
integration-by-parts formula
$$\int_a^b f'(x)g(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b f(x)g'(x)\,dx.$$
The first main result is that there is a very simple relation between the Fourier
coefficients of f and those of f ′ .
8.26 Theorem. Suppose f is 2π-periodic, continuous, and piecewise smooth. If $c_n$ and $c_n'$ are the Fourier coefficients of f and f′, respectively, then
$$c_n' = inc_n.$$
Equivalently, if an , bn and a′n , b′n are the Fourier coefficients of f and f ′ given by
(8.7), then a′n = nbn and b′n = −nan .
Proof. By the integration-by-parts formula above,
$$c_n' = \frac{1}{2\pi}\int_{-\pi}^{\pi} f'(\theta)e^{-in\theta}\,d\theta = \frac{1}{2\pi}\Bigl[f(\theta)e^{-in\theta}\Bigr]_{-\pi}^{\pi} + \frac{in}{2\pi}\int_{-\pi}^{\pi} f(\theta)e^{-in\theta}\,d\theta.$$
The first term on the right vanishes because $f(\theta)e^{-in\theta}$ is 2π-periodic, and the second one is $inc_n$. The argument for $a_n$ and $b_n$ is similar (Exercise 1).
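Theorem 8.26 can be illustrated numerically with the triangle wave and its derivative, the square wave; the following sketch (ours, using midpoint-rule quadrature) checks $c_n' = inc_n$ for a few n:

```python
import cmath
import math

def cn(f, n, M=20000):
    """Midpoint-rule approximation of (1/2pi) * integral of f(t) e^{-int} over [-pi, pi]."""
    h = 2 * math.pi / M
    total = 0j
    for k in range(M):
        t = -math.pi + (k + 0.5) * h
        total += f(t) * cmath.exp(-1j * n * t)
    return total * h / (2 * math.pi)

tri = abs                                    # triangle wave: continuous, piecewise smooth
sq = lambda t: math.copysign(1.0, t)         # its derivative, the square wave

# Theorem 8.26: the coefficients of f' are in times the coefficients of f.
for n in (1, 2, 5):
    assert abs(cn(sq, n) - 1j * n * cn(tri, n)) < 1e-3
```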
Note that Theorem 8.26 makes no claim about the Fourier series of f ′ ; it is
valid whether or not that series actually converges. If we add more conditions on f
to ensure that it does, we obtain the following result:
8.27 Corollary. Suppose f is 2π-periodic, continuous, and piecewise smooth, and that f′ is also piecewise smooth. If $\sum_{-\infty}^{\infty} c_n e^{in\theta} = \frac12 a_0 + \sum_1^{\infty}(a_n\cos n\theta + b_n\sin n\theta)$ is the Fourier series of f, then f′(θ) is the sum of the derived series
$$\sum_{-\infty}^{\infty} inc_n e^{in\theta} = \sum_{1}^{\infty}\bigl(nb_n\cos n\theta - na_n\sin n\theta\bigr)$$
at every θ at which f′(θ) exists. At the exceptional points where f′ has jumps, the series converges to $\frac12[f'(\theta-) + f'(\theta+)]$.
Proof. By Theorem 8.16, f ′ is the sum of its Fourier series everywhere except
where it has jumps, and the coefficients in that series are given by Theorem 8.26.
The function $F(\theta) = \int_0^{\theta} f(\psi)\,d\psi$ is 2π-periodic precisely when $\int_{-\pi}^{\pi} f(\psi)\,d\psi = 0$; that is, when the mean value of f over an interval of length 2π is zero, or, equivalently, when the constant term in the Fourier series of f vanishes. We make this
assumption in the following theorem; if it is not valid, we may wish to subtract off
the constant term and deal with it separately.
8.28 Theorem. Suppose f is 2π-periodic and piecewise continuous, with mean value zero, and let $F(\theta) = \int_0^{\theta} f(\psi)\,d\psi$. Then F is continuous, piecewise smooth, and 2π-periodic, and for every θ,
$$F(\theta) = C_0 + \sum_{n\neq 0}\frac{c_n}{in}\,e^{in\theta} = C_0 + \sum_{1}^{\infty}\Bigl(\frac{a_n}{n}\sin n\theta - \frac{b_n}{n}\cos n\theta\Bigr).$$
By Theorem 8.16, F is the sum of its Fourier series at every point, and by Theorem
8.26, its Fourier coefficients Cn are given for n ̸= 0 by inCn = cn (and likewise
for the cosine and sine coefficients). The constant term C0 is, as always, the mean
value of F .
Observe that the series in Theorem 8.28 is obtained by formally integrating the
Fourier series of f term-by-term, whether the latter series converges or not.
EXAMPLE 2. Subtraction of the mean value from the triangle wave (Example 2 in §8.2) and multiplication by −2 gives
$$\pi - 2|\theta| = \frac{8}{\pi}\sum_{1}^{\infty}\frac{\cos(2m-1)\theta}{(2m-1)^2}\qquad(|\theta| \le \pi),$$
Theorem 8.28 and the Corollary 8.27 exhibit situations where we can integrate
or differentiate a series termwise without worrying about uniform convergence.
However, uniform and absolute convergence are still highly desirable things, so
we present a simple criterion for the Fourier series of a function to have these
properties.
8.29 Theorem. If f is 2π-periodic, continuous, and piecewise smooth, then the Fourier series of f converges absolutely and uniformly on R.

Proof. Let $c_n$ and $c_n'$ be the Fourier coefficients of f and f′. Since $|c_n e^{in\theta}| = |c_n|$, the absolute convergence of $\sum c_n e^{in\theta}$ is equivalent to the convergence of $\sum|c_n|$, and by the Weierstrass M-test, this also implies the uniform convergence of $\sum c_n e^{in\theta}$. But by Theorem 8.26, $c_n = c_n'/in$ for n ≠ 0, so
EXERCISES
Show that this series can be differentiated or integrated termwise to yield two
apparently different series expansions of cos θ for 0 < θ < π, and reconcile
these two expansions. (Hint: Example 1 of §8.2 is useful.)
5. Let f(θ) be the 2π-periodic function such that $f(\theta) = e^{\theta}$ for |θ| < π, and let $\sum_{-\infty}^{\infty} c_n e^{in\theta}$ be its Fourier series. If we formally differentiate this equation, we obtain $e^{\theta} = \sum_{-\infty}^{\infty} inc_n e^{in\theta}$ for |θ| < π. But then $c_n$ and $inc_n$ are both equal to $(2\pi)^{-1}\int_{-\pi}^{\pi} e^{\theta}e^{-in\theta}\,d\theta$, so $c_n = inc_n$ and hence $c_n = 0$ for all n. Clearly this is wrong; where is the mistake?
6. How smooth are the following functions? That is, for which k can you show that the function is of class $C^k$?

a. $\sum_{n\neq 0}\dfrac{e^{in\theta}}{n^{6/5}}$.  b. $\sum_{0}^{\infty}\dfrac{\cos n\theta}{1+n^6}$.  c. $\sum_{0}^{\infty}\dfrac{\cos 2^n\theta}{2^n}$.
For this extension the Fourier sine coefficients $b_n$ all vanish because $f_{\text{even}}(\theta)\sin n\theta$ is an odd function of θ.
FIGURE 8.5: A function on [0, π] (above) and its even and odd extensions to [−π, π] (below, left and right).
EXAMPLE 1. Let f (θ) = θ on [0, π]. The even and odd periodic extensions of
f are the triangle and sawtooth waves, respectively, and the Fourier cosine and
sine series of f are
$$\frac{\pi}{2} - \frac{4}{\pi}\sum_{1}^{\infty}\frac{\cos(2m-1)\theta}{(2m-1)^2}\qquad\text{and}\qquad 2\sum_{1}^{\infty}\frac{(-1)^{n+1}\sin n\theta}{n},$$
respectively.
If f is piecewise smooth on [0, π], its even and odd periodic extensions will be
piecewise smooth on R. If f (0) = f (0+) and f (π) = f (π−), its even periodic
extension will be continuous at both 0 and π, but its odd periodic extension will
have jumps at 0 or π unless f (0) = 0 or f (π) = 0, respectively. In any case, an
application of Theorem 8.16 to these extensions easily yields the following:
8.30 Theorem. Suppose f is piecewise smooth on [0, π]. The Fourier cosine series and the Fourier sine series of f converge to ½[f(θ−) + f(θ+)] at every θ ∈ (0, π).
The cosine series converges to f (0+) at θ = 0 and to f (π−) at θ = π; the sine
series converges to 0 at both these points.
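Both series in Example 1 above can be summed numerically to watch this endpoint behavior; an illustrative sketch of ours:

```python
import math

def sine_series_theta(theta, N=5000):
    """Fourier sine series of f(theta) = theta on [0, pi]: 2 * sum (-1)^(n+1) sin(n*theta)/n."""
    return sum(2 * (-1) ** (n + 1) * math.sin(n * theta) / n for n in range(1, N + 1))

def cosine_series_theta(theta, M=5000):
    """Fourier cosine series of theta on [0, pi]: pi/2 - (4/pi) sum cos((2m-1)theta)/(2m-1)^2."""
    s = sum(math.cos((2 * m - 1) * theta) / (2 * m - 1) ** 2 for m in range(1, M + 1))
    return math.pi / 2 - (4 / math.pi) * s

# Interior point: both series converge to f(theta) = theta.
assert abs(sine_series_theta(2.0) - 2.0) < 1e-2
assert abs(cosine_series_theta(2.0) - 2.0) < 1e-3
# Endpoint theta = pi: the sine series gives 0, the cosine series gives f(pi-) = pi.
assert abs(sine_series_theta(math.pi)) < 1e-8
assert abs(cosine_series_theta(math.pi) - math.pi) < 1e-3
```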
We may wish to consider periodic functions with period other than 2π, or func-
tions defined on intervals other than [0, π]. The general situation can be reduced to
the one we have studied by a linear change of variable; we record the results for
future reference.
Suppose f(x) is a piecewise smooth 2l-periodic function. We make the change of variables
$$\theta = \frac{\pi x}{l},\qquad g(\theta) = f(x) = f\Bigl(\frac{l\theta}{\pi}\Bigr).$$
Then g is 2π-periodic, and we have
$$g(\theta) = \sum_{-\infty}^{\infty} c_n e^{in\theta},\qquad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} g(\theta)e^{-in\theta}\,d\theta.$$
In terms of x this becomes
$$f(x) = \sum_{-\infty}^{\infty} c_n e^{in\pi x/l} = \tfrac12 a_0 + \sum_{1}^{\infty}\Bigl(a_n\cos\frac{n\pi x}{l} + b_n\sin\frac{n\pi x}{l}\Bigr),$$
where
$$a_n = \frac{1}{l}\int_{-l}^{l} f(x)\cos\frac{n\pi x}{l}\,dx,\qquad b_n = \frac{1}{l}\int_{-l}^{l} f(x)\sin\frac{n\pi x}{l}\,dx.$$
It follows that the Fourier cosine and sine series of a piecewise smooth function f
on the interval [0, l] are
$$(8.32)\qquad f(x) = \tfrac12 a_0 + \sum_{1}^{\infty} a_n\cos\frac{n\pi x}{l},\qquad a_n = \frac{2}{l}\int_0^l f(x)\cos\frac{n\pi x}{l}\,dx,$$
and
$$(8.33)\qquad f(x) = \sum_{1}^{\infty} b_n\sin\frac{n\pi x}{l},\qquad b_n = \frac{2}{l}\int_0^l f(x)\sin\frac{n\pi x}{l}\,dx.$$
We conclude with a few remarks comparing Taylor series and Fourier series,
$$f(x) = \sum_{0}^{\infty}\frac{f^{(n)}(0)}{n!}\,x^n\qquad\text{and}\qquad f(x) = \sum_{-\infty}^{\infty} c_n e^{in\pi x/l},$$
EXERCISES
1. Find the Fourier cosine series and the Fourier sine series of the following func-
tions on the interval [0, π]. All of these series can be derived from the results
of the examples and exercises in §8.1 without computing the coefficients from
scratch.
a. f (θ) = 1.
b. f (θ) = sin θ.
c. f (θ) = θ². (For the sine series, use Example 1 and Exercise 6 in §8.1.)
d. f (θ) = θ for 0 ≤ θ ≤ ½π, f (θ) = π − θ for ½π ≤ θ ≤ π.
2. Expand the given function in a series of the given type. As in Exercise 1, use
previously derived results as much as possible.
a. f (x) = 1; sine series on [0, 1].
b. f (x) = 1 for 0 < x < 2, f (x) = −1 for 2 < x < 4; cosine series on
[0, 4].
c. f (x) = lx − x²; sine series on [0, l].
d. f (x) = $e^x$; series of the form $\sum_{-\infty}^{\infty} c_n e^{2\pi inx}$ on [0, 1].
(Hint: Extend f to [0, 2l] by making it even about x = l, i.e., f (x) = f (2l − x)
for x ∈ [l, 2l], and use Exercise 3.)
Heat Flow in an Insulated Rod. Consider a rod occupying the interval [0, l],
insulated so that no heat can enter or leave it, and let f (x) be the temperature at
position x and time t = 0. How does the temperature distribution evolve with time?
(Note: Instead of thinking of a thin rod, one can think of a thick cylindrical slab whose temperature varies only in the direction of its axis; the mathematics is the same.)
Separating variables, that is, substituting u(x, t) = ϕ(x)ψ(t) into the heat equation $\partial_t u = k\,\partial_x^2 u$, leads to the equations
$$\psi'(t) = -k\alpha\,\psi(t),\qquad \phi''(x) = -\alpha\,\phi(x).$$
These are simple ordinary differential equations, and the general solutions are readily found:
$$\psi(t) = C_0 e^{-k\alpha t},\qquad \phi(x) = C_1\cos\sqrt{\alpha}\,x + C_2\sin\sqrt{\alpha}\,x.$$
We have thus found a large family of solutions of the heat equation of the form
ϕ(x)ψ(t). For these solutions, the boundary conditions $\partial_x u(0,t) = \partial_x u(l,t) = 0$ amount to $\phi'(0) = \phi'(l) = 0$. Since $\phi'(x) = \sqrt{\alpha}\,(-C_1\sin\sqrt{\alpha}\,x + C_2\cos\sqrt{\alpha}\,x)$, the condition $\phi'(0) = 0$ forces $C_2 = 0$, and the condition $\phi'(l) = 0$ then forces $\sqrt{\alpha}$ to be a multiple of π/l, or $\alpha = n^2\pi^2/l^2$ where n is an integer (which might as
well be nonnegative). In short, we have obtained the following family of solutions
of the heat equation together with the boundary conditions:
$$u_n(x,t) = \exp\Bigl(\frac{-n^2\pi^2 kt}{l^2}\Bigr)\cos\frac{n\pi x}{l}\qquad(n = 0, 1, 2, 3, \ldots).$$
Since the heat equation and the boundary conditions are linear, we obtain more
general solutions by taking linear combinations of these. In fact, we can pass to
infinite linear combinations — that is, infinite series of the form
$$(8.35)\qquad u(x,t) = \sum_{0}^{\infty} a_n\exp\Bigl(\frac{-n^2\pi^2 kt}{l^2}\Bigr)\cos\frac{n\pi x}{l}.$$
Finally, we are ready to tackle the initial condition u(x, 0) = f (x). If we set
t = 0 in (8.35), we obtain
$$u(x,0) = \sum_{0}^{\infty} a_n\cos\frac{n\pi x}{l},$$
so we can make u(x, 0) equal to f (x) by taking the series on the right to be the
Fourier cosine series of f, defined by (8.32)! (Note that the constant term, which we called ½a₀ before, is called a₀ here.) In other words, to solve the problem (8.34),
we take u(x, t) to be defined by (8.35), where the coefficients an are given in terms
of the initial data f by
$$a_0 = \frac{1}{l}\int_0^l f(x)\,dx,\qquad a_n = \frac{2}{l}\int_0^l f(x)\cos\frac{n\pi x}{l}\,dx\quad(n > 0).$$
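As a concrete illustration (ours, not the book's), take l = k = 1 and initial data f(x) = x, whose cosine coefficients are $a_0 = \frac12$ and $a_n = 2((-1)^n - 1)/(n\pi)^2$:

```python
import math

def heat_u(x, t, N=2000):
    """Series (8.35) with l = k = 1 for the initial data f(x) = x (insulated ends)."""
    u = 0.5                                            # a_0 = mean value of f on [0, 1]
    for n in range(1, N + 1):
        an = 2 * ((-1) ** n - 1) / (n * math.pi) ** 2  # cosine coefficients of x
        u += an * math.exp(-(n * math.pi) ** 2 * t) * math.cos(n * math.pi * x)
    return u

# At t = 0 the series reproduces the initial temperature ...
assert abs(heat_u(0.3, 0.0) - 0.3) < 1e-3
# ... and for large t the rod approaches its equilibrium (mean) temperature 1/2.
assert abs(heat_u(0.3, 10.0) - 0.5) < 1e-9
```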
At this point we should stop to verify that the proposed solution (8.35) of the
problem (8.34) really works, as the passage from finite linear combinations to infi-
nite series has the potential to cause difficulties. In fact, everything turns out quite
nicely for this problem. In the first place, if the initial temperature distribution f (x)
is continuous and piecewise smooth (a reasonable physical assumption), the same
will be true of its even 2l-periodic extension, so by Theorem 8.29, its Fourier series is absolutely and uniformly convergent. In particular, $\sum_{1}^{\infty}|a_n| < \infty$. The abso-
lute value of the nth term of the series in (8.35) is at most |an |, so the Weierstrass
M-test shows that this series converges absolutely and uniformly for 0 ≤ x ≤ l and
t ≥ 0 to define a continuous function u(x, t) there. Moreover, for t > 0, the ex-
ponential factors in (8.35) decay rapidly as n → ∞, which makes the convergence
even better. In particular, repeated differentiation with respect to t or x introduces
factors of nk into the series, which are still overpowered by the decay of the expo-
nential factors, so the differentiated series still converges absolutely and uniformly.
It follows that u(x, t) is of class $C^{\infty}$ for t > 0 and that termwise differentiation
is permissible; u therefore satisfies the heat equation and the boundary conditions
because each term of the series does.
Two further remarks: First, as t → ∞, the exponential factors in (8.35) all
tend rapidly to zero except for the one with n = 0, and so u(x, t) approaches
the constant a0 , the mean value of f on the interval [0, l]. In physical terms this
means that the rod approaches thermal equilibrium as time progresses. Second, the
series (8.35) will usually diverge when t < 0, for then the exponential factors grow
rather than decay! This corresponds to the physical fact that time is irreversible for
diffusion processes governed by the heat equation.
The Vibrating String. We now study the vibrations of a string stretched across
the interval 0 ≤ x ≤ l and fixed at the endpoints. (Think of a guitar string, and see
Figure 8.7.) Here u(x, t) will denote the displacement of the string (in a direction
perpendicular to the x-axis) at position x and time t. The relevant differential
equation is the wave equation ∂t2 u = c2 ∂x2 u, where c is a positive constant that
can be interpreted as the speed with which disturbances propagate down the string.
(See Folland [6, pp. 388–90] or Kammler [10, pp. 526–7] for a derivation of the
wave equation from physical principles.) Since the string is fixed at both ends,
the boundary conditions for this problem are u(0, t) = u(l, t) = 0. As for initial
conditions, since the wave equation is second-order in t we need to specify both
the initial displacement u(x, 0) and the initial velocity ∂t u(x, 0). Thus the problem
we have to solve is
(8.36)
$$\frac{\partial^2 u}{\partial t^2} = c^2\frac{\partial^2 u}{\partial x^2},\qquad u(x,0) = f(x),\quad\frac{\partial u}{\partial t}(x,0) = g(x),\qquad u(0,t) = u(l,t) = 0.$$
As before, we substitute u(x, t) = ϕ(x)ψ(t) into the wave equation and divide by $c^2\phi(x)\psi(t)$, obtaining
$$\frac{\psi''(t)}{c^2\psi(t)} = \frac{\phi''(x)}{\phi(x)}.$$
In the last equation, the quantities on the left and right depend only on t and x,
respectively, so they are both equal to a constant −α, and we obtain the ordinary
differential equations
$$\psi''(t) + \alpha c^2\psi(t) = 0,\qquad \phi''(x) + \alpha\phi(x) = 0.$$
The general solution of the second equation is
$$\phi(x) = C_1\cos\sqrt{\alpha}\,x + C_2\sin\sqrt{\alpha}\,x.$$
The boundary condition u(0, t) = 0 forces $C_1$ to vanish, and then the boundary condition u(l, t) = 0 forces $\sqrt{\alpha}$ to be a multiple of π/l, so $\alpha = n^2\pi^2/l^2$ for some (positive) integer n. With this value of α, the general solution of the differential
equation for ψ is
$$\psi(t) = b\cos\frac{n\pi ct}{l} + B\sin\frac{n\pi ct}{l}.$$
(The arbitrary constants are labeled b and B for reasons that will become clearer in
a moment.)
For each positive integer n, we therefore have the solution
$$u_n(x,t) = \Bigl(b_n\cos\frac{n\pi ct}{l} + B_n\sin\frac{n\pi ct}{l}\Bigr)\sin\frac{n\pi x}{l}.$$
Taking linear combinations and passing to limits, we are led to the series solution
$$(8.37)\qquad u(x,t) = \sum_{1}^{\infty}\Bigl(b_n\cos\frac{n\pi ct}{l} + B_n\sin\frac{n\pi ct}{l}\Bigr)\sin\frac{n\pi x}{l}.$$
Setting t = 0 gives $u(x,0) = \sum_{1}^{\infty} b_n\sin(n\pi x/l)$, so we satisfy the condition u(x, 0) = f(x) by taking the bn's to be the Fourier sine
coefficients of f :
$$b_n = \frac{2}{l}\int_0^l f(x)\sin\frac{n\pi x}{l}\,dx.$$
to converge. The extra factor of n2 makes the terms larger, and there is no ex-
ponential decay anywhere to compensate. If we recall that the decay of Fourier
coefficients is related to the degree of smoothness of the function in question, the
contrast with the heat equation may be expressed as follows: The diffusion of heat
tends to smooth out irregularities in the initial temperature distribution, but in wave
motion, any initial roughness simply propagates without dying out.
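As an illustration of these remarks (our sketch, with l = c = 1), take the smooth initial shape f(x) = x(1 − x) released from rest (g = 0, so all Bn = 0); its sine coefficients are $b_n = 8/(n\pi)^3$ for odd n and 0 for even n:

```python
import math

def string_u(x, t, N=400):
    """Series (8.37) with l = c = 1, B_n = 0, for the initial shape f(x) = x(1 - x)."""
    u = 0.0
    for n in range(1, N + 1, 2):                       # only odd n contribute
        bn = 8 / (n * math.pi) ** 3                    # sine coefficients of x(1 - x)
        u += bn * math.cos(n * math.pi * t) * math.sin(n * math.pi * x)
    return u

# At t = 0 the series reproduces the initial shape f(x) = x(1 - x) ...
assert abs(string_u(0.25, 0.0) - 0.25 * 0.75) < 1e-5
# ... and the motion is periodic in t with period 2l/c = 2.
assert abs(string_u(0.3, 0.7) - string_u(0.3, 2.7)) < 1e-9
```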
We can obtain a positive result by imposing more differentiability hypotheses
on f and g. If we assume that not only f and g but also the first two derivatives of f
and the first derivative of g are continuous and piecewise smooth, and that not only
f and g but also f″ vanishes at the endpoints (so that its odd periodic extension is continuous there), then Theorems 8.26 and 8.29 imply that $\sum n^2|b_n| < \infty$ and $\sum n^2|B_n| < \infty$, which guarantees the absolute and uniform convergence of (8.38).
This is also enough to guarantee that the formal differentiation of (8.37) that led to
the formula for the Bn ’s is valid.
However, these additional assumptions are rather unnatural from a physical
point of view. The obvious model for a plucked string, for example, is to take
f to be a piecewise linear function as in Figure 8.8. It is easy to calculate the
coefficients bn explicitly for such an f (Exercise 4), and they turn out to decay
exactly like n−2 . The series (8.37) therefore converges nicely, and we may expect
it to provide a good description of the physical vibration of the string. On the other
hand, the twice-differentiated series (8.38) does not converge at all, so it is hard to
say in what sense (8.37) satisfies the wave equation. The resolution of this paradox
is to expand our vision of what a solution of a differential equation ought to be and
to develop a notion of “weak solution” that will encompass examples such as this
one. But this is a more advanced topic; see, for example, Folland [6, §9.5].
Taking for granted that the series (8.37) really is the solution of the boundary
value problem (8.36), we say a few words about its physical interpretation. Think of
the string as being a producer of musical notes such as a guitar string. The nth term
in the series (8.37), as a function of t, is a pure sine wave with frequency nπc/l,
which represents a musical tone at a pure, definite pitch. The series (8.37) therefore
shows how the sound produced by the string can be resolved into a superposition
of these pure pitches. Typically, the coefficients bn and Bn decrease as n increases,
so that the largest contribution comes from the first term, n = 1. This is the
“fundamental” pitch, and the higher n’s are the “overtones” that give the note its
particular tone quality.
Related Problems. The heat flow and vibration problems (8.34) and (8.36)
can be modified by changing the boundary conditions; this leads to models of other
interesting physical processes. Here are a few examples:
1. The boundary value problem
$$\frac{\partial u}{\partial t} = k\frac{\partial^2 u}{\partial x^2},\qquad u(x,0) = f(x),\qquad u(0,t) = u(l,t) = 0$$
models the flow of heat in a rod that occupies the interval 0 ≤ x ≤ l when both
ends are held at temperature zero — by immersing them in ice water, for instance.
(Note that the heat equation doesn’t care where the zero point of the temperature
scale is located; if u is a solution, so is u + c for any constant c. Of course, this
means that the validity of the heat equation as a model for actual thermodynamic
processes has its limitations, as absolute zero exists physically.) The method of
solution is exactly the same as for the insulated problem (8.34), except that the
boundary conditions for ϕ(x) are ϕ(0) = ϕ(l) = 0. Thus, as in the vibrating string
problem, we obtain ϕ(x) = sin(nπx/l), and the solution is given by
$$u(x,t) = \sum_{1}^{\infty} b_n\exp\Bigl(\frac{-n^2\pi^2 kt}{l^2}\Bigr)\sin\frac{n\pi x}{l},$$
where $\sum b_n\sin(n\pi x/l)$ is the Fourier sine series of f(x).
2. The boundary value problem
$$\frac{\partial^2 u}{\partial t^2} = c^2\frac{\partial^2 u}{\partial x^2},\qquad u(x,0) = f(x),\quad\frac{\partial u}{\partial t}(x,0) = g(x),\quad\frac{\partial u}{\partial x}(0,t) = \frac{\partial u}{\partial x}(l,t) = 0$$
models the vibration of air in a cylindrical pipe occupying the interval 0 ≤ x ≤ l
that is open at both ends. (Examples: flutes and some organ pipes.) Here u(x, t)
represents the longitudinal displacement of the air at position x and time t. The
boundary conditions ∂x u(0, t) = ∂x u(l, t) = 0 come from the fact that the change
in air pressure due to the displacement u is proportional to ∂x u, and the air pressure
at both ends must remain equal to the ambient air pressure. Again, the solution is
very similar to (8.37) except that it involves cosines instead of sines in x:
$$u(x,t) = \tfrac12(a_0 + A_0 t) + \sum_{1}^{\infty}\Bigl(a_n\cos\frac{n\pi ct}{l} + A_n\sin\frac{n\pi ct}{l}\Bigr)\cos\frac{n\pi x}{l},$$
where $\tfrac12 a_0 + \sum_{1}^{\infty} a_n\cos(n\pi x/l)$ and $\tfrac12 A_0 + \sum_{1}^{\infty}(n\pi cA_n/l)\cos(n\pi x/l)$ are the Fourier cosine series of f and g, respectively. (The term $\tfrac12(a_0 + A_0 t)$ represents
a flow of air down the tube with constant velocity, of no importance for the vibra-
tions.) As with the vibrating string, the vibrations of the pipe are a superposition of
vibrations at the definite frequencies nπc/l (n = 1, 2, 3, . . .).
3. We can also mix the two types of boundary conditions we have been consid-
ering: for the heat equation,
$$\frac{\partial u}{\partial t} = k\frac{\partial^2 u}{\partial x^2},\qquad u(x,0) = f(x),\qquad u(0,t) = \frac{\partial u}{\partial x}(l,t) = 0,$$
or the wave equation,
$$\frac{\partial^2 u}{\partial t^2} = c^2\frac{\partial^2 u}{\partial x^2},\qquad u(x,0) = f(x),\quad\frac{\partial u}{\partial t}(x,0) = g(x),\quad u(0,t) = \frac{\partial u}{\partial x}(l,t) = 0.$$
The first of these models heat flow in a rod where one end is held at temperature
zero and the other is insulated; the second models vibrations of air in cylindrical
pipes where one end is closed and the other is open, such as clarinets and some
organ pipes. In both of them, separation of variables leads to the ordinary differen-
tial equation $\phi''(x) = -\alpha\phi(x)$ with boundary conditions $\phi(0) = \phi'(l) = 0$. The general solution of the differential equation is $\phi(x) = C_1\cos\sqrt{\alpha}\,x + C_2\sin\sqrt{\alpha}\,x$; the condition $\phi(0) = 0$ forces $C_1$ to vanish, and then the condition $\phi'(l) = 0$ forces $\sqrt{\alpha}$ to be of the form $(n - \frac12)\pi/l$ with n a positive integer. We are therefore led to
try to expand the initial functions in a series of the form
$$f(x) = \sum_{1}^{\infty} a_n\sin\Bigl(n - \tfrac12\Bigr)\frac{\pi x}{l}.$$
This can indeed be done; the technique for reducing this problem to one of ordinary
Fourier sine series is outlined in Exercise 4 of §8.4.
It is interesting to note that the resulting frequencies for the vibrating pipe are $(n - \frac12)\pi c/l$ (n = 1, 2, 3, . . .). In particular, the fundamental frequency for a pipe closed at one end and open at the other, namely $\frac12\pi c/l$, is half as great as for a
pipe of equal length that is open at both ends. Moreover, only the odd-numbered
multiples of this fundamental frequency occur as “harmonics” for half-open pipes,
whereas all integer multiples occur for open pipes; as a result, the two kinds of
pipes produce notes of different tone qualities.
4. Clearly there are many other variations to be played on this theme — dif-
ferent boundary conditions, other differential equations, and so on. A few further
examples are outlined in the exercises, and we shall indicate a more general frame-
work in which such problems can be studied in the next section.
EXERCISES
1. A rod 100 cm long is insulated along its length and at both ends. Suppose that
its initial temperature is u(x, 0) = x (x in cm, u in °C, t in sec, 0 ≤ x ≤ 100),
and that its diffusivity coefficient k is 1.1 cm2 /sec (about right if the rod is made
of copper).
a. Find the temperature u(x, t) for t > 0. (For the relevant Fourier series, see
Example 1 of §8.4.)
b. Show that the first three nonvanishing terms of the series (including the
constant term) give the temperature accurately to within 1° when t = 60 (one minute after starting). What are u(0, 60), u(10, 60), and u(40, 60) to the nearest 1°? (Hint: $\sum_{1}^{\infty}(2n-1)^{-2} = \pi^2/8$, so $\sum_{3}^{\infty}(2n-1)^{-2} = (\pi^2/8) - 1 - \frac19 \approx 0.123$.)
c. Show that u(x, t) is within 1° of its equilibrium value of 50° for all x when
t ≥ 3600 (i.e., after one hour). (Don’t work too hard; crude estimates are
enough.)
2. Find the temperature function u(θ, t) (t > 0) for a rod bent into the shape of
a circular hoop, given the initial temperature u(θ, 0) = f (θ). (Here θ denotes
the angular coordinate on the circle, and the boundary conditions for a straight
rod are replaced by the requirement that u should be a 2π-periodic function of
θ.)
3. As we found in §5.6, the inhomogeneous heat equation ∂t u = k∂x2 u + G can
be used to model heat flow in a rod when the total amount of heat energy is not
constant; here G is a function of x and t, with units of degrees per unit time,
that accounts for the addition or subtraction of heat from the rod. Let us solve
the initial value problem with constant-temperature boundary conditions,
5. The model for a vibrating string given by the wave equation is unrealistic be-
cause it predicts that the vibration will continue forever without dying out. Real
strings, however, are not perfectly elastic, so the vibrational energy is gradu-
ally dissipated. A better model is obtained by the following modification of the
wave equation:
∂t2 u = c2 ∂x2 u − 2δ∂t u,
where δ is a small positive constant. (The left side is the acceleration, and the
terms on the right are the effects of the elastic restoring force and the damping
force that tends to slow the motion down. The factor of 2 is just for conve-
nience.) Find the general solution of this differential equation subject to the
boundary conditions u(0, t) = u(l, t) = 0 by modifying the method used in
the text for the ordinary wave equation. Assume that δ < πc/l. You should find
that the solutions decay exponentially in time and that the frequencies decrease
as the damping constant δ increases.
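A sketch of where the exercise leads: substituting u = ∑ β_n(t) sin(nπx/l) turns the damped wave equation into β″_n + 2δβ′_n + (nπc/l)²β_n = 0, whose solutions decay like e^{−δt} and oscillate at angular frequency √((nπc/l)² − δ²). The formula below anticipates that answer; it is assumed here, not derived:

```python
import math

def damped_mode(n, c, l, delta):
    """Decay rate and angular frequency of the nth mode, assuming each time
    coefficient satisfies beta'' + 2*delta*beta' + (n*pi*c/l)**2 * beta = 0
    (hypothetical helper; requires delta < pi*c/l)."""
    omega0 = n * math.pi * c / l
    return delta, math.sqrt(omega0**2 - delta**2)

c, l = 1.0, 1.0   # arbitrary illustrative values
_, w_light = damped_mode(1, c, l, 0.1)   # light damping
_, w_heavy = damped_mode(1, c, l, 1.0)   # heavier damping (still < pi*c/l)
# Both frequencies are below the undamped value pi*c/l, and the more heavily
# damped mode is slower, as the exercise predicts.
```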
Exercises 6 and 7 concern the Dirichlet problem for a bounded open set S ⊂
R2 : Given a function f on the boundary ∂S, find a solution of Laplace’s equation
∂x2 u + ∂y2 u = 0 on S such that u = f on ∂S. (A physical interpretation: Find
the steady-state distribution of heat in S when the temperature on the boundary is
given.)
6. Consider the Dirichlet problem for a rectangle:
∂x2 u + ∂y2 u = 0 for 0 < x < l, 0 < y < L;
u(x, 0) = f1 (x), u(x, L) = f2 (x), u(0, y) = g1 (y), u(l, y) = g2 (y).
Similar ideas underlie the study of complex n-dimensional vectors. The main
difference is that, since the absolute value |z| of a complex number z is given by
(z z̄)^{1/2} rather than (z²)^{1/2}, the appropriate definition of inner product is

(8.39)  ⟨a, b⟩ = ∑_{j=1}^{n} a_j b̄_j    (a, b ∈ Cⁿ).
If the basis {uj } is orthogonal (⟨uj , uk ⟩ = 0 for j ̸= k) but not normalized (∥uj ∥
not necessarily equal to 1), the formula becomes

(8.40)  x = ∑_{j=1}^{n} c_j u_j,    c_j = ⟨x, u_j⟩ / ∥u_j∥².
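Formula (8.40) can be illustrated with a small numeric sketch (the basis below is an arbitrary orthogonal, non-normalized basis of R³):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def expand(x, basis):
    # (8.40): c_j = <x, u_j> / ||u_j||^2 for an orthogonal (not necessarily
    # normalized) basis
    return [dot(x, u) / dot(u, u) for u in basis]

basis = [(1, 1, 0), (1, -1, 0), (0, 0, 2)]   # orthogonal, not normalized
x = (3.0, 1.0, 5.0)
c = expand(x, basis)
# Reconstruct x as the sum of c_j * u_j:
recon = tuple(sum(cj * u[i] for cj, u in zip(c, basis)) for i in range(3))
```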
Now we are ready to make the conceptual leap from the discrete and finite-
dimensional to the continuous and infinite dimensional. Suppose we are studying
functions on an interval [a, b] — let us say, piecewise continuous, complex-valued
ones. We regard such a function f as a “vector” whose “components” are the
values f (x) as x ranges over [a, b]. We define the inner product of two functions
f and g just as in (8.39) except that the sum is replaced by an integral:

(8.41)  ⟨f, g⟩ = ∫_a^b f(x) ḡ(x) dx.
where ϕ_n = (2π)^{−1/2} e_n. The formula for the Fourier series of a function f,

f = ∑_{n=−∞}^{∞} c_n e_n,    c_n = (1/2π) ∫_{−π}^{π} f(x) e^{−inx} dx = ⟨f, e_n⟩ / ∥e_n∥²,
is an exact analogue of the formula (8.40) for the expansion of a vector in terms of
an orthogonal basis!
A similar interpretation holds for Fourier cosine and sine series. To wit, it is
easy to verify (Exercise 1) that {cos(nπx/l)}_{n=0}^{∞} and {sin(nπx/l)}_{n=1}^{∞} are orthogonal
sets on the interval [0, l], and that the formulas for the Fourier cosine and sine
coefficients of a function f on [0, l] are analogous to (8.40).
There are some unanswered questions here, however. The inner product ⟨f, g⟩
makes sense when f and g are piecewise continuous on [a, b], but we have proved
the validity of Fourier expansions only for piecewise smooth functions. So, what is
the “right” class of functions to consider here? Can we make sense out of Fourier
series for functions that may not be piecewise smooth?
The key insight is that pointwise convergence is the wrong notion of conver-
gence in this situation. Instead, we should use a notion of convergence that arises
from the geometry of the inner product. That is, we think of the set
P C(a, b) = set of all piecewise continuous complex-valued functions on [a, b]
as an “infinite-dimensional Euclidean space” with the notions of length and angle
given by the inner product (8.41). The “distance” between two functions is to be
interpreted as the norm of their difference,
Distance from f to g = ∥f − g∥ = ( ∫_a^b |f(x) − g(x)|² dx )^{1/2}.
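This distance can be approximated by discretizing the integral; a minimal sketch (midpoint rule, with an arbitrary resolution):

```python
import math

def l2_distance(f, g, a, b, n=10000):
    # Midpoint-rule approximation of ( integral_a^b |f - g|^2 dx )^(1/2);
    # n is an arbitrary resolution.
    h = (b - a) / n
    s = sum(abs(f(a + (i + 0.5) * h) - g(a + (i + 0.5) * h))**2 for i in range(n))
    return math.sqrt(s * h)

# Distance between f(x) = x and g = 0 on [0, 1]: exactly 1/sqrt(3).
d = l2_distance(lambda x: x, lambda x: 0.0, 0.0, 1.0)
```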
The introduction of norm convergence is justified by the fact that the Fourier
series of any piecewise continuous function f on [−π, π] converges in norm to f .
This is a substantial result, but there is more to be said before we state a formal
theorem.
The space P C(a, b) of piecewise continuous functions on [a, b] fails to be a
good infinite-dimensional analogue of Euclidean space in one crucial respect: it is
not complete. That is, if {fk } is a sequence in P C(a, b) such that ∥fj − fk ∥ → 0
as j, k → ∞, there may not be a function f ∈ P C(a, b) such that ∥fk − f ∥ → 0.
For example, with [a, b] = [0, 1], let

f_k(x) = x^{−1/4} if x > 1/k,    f_k(x) = 0 otherwise.
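One can see numerically that {f_k} is Cauchy in this norm even though its pointwise limit x^{−1/4} leaves P C(0, 1): for j < k the norm has the closed form ∥f_j − f_k∥² = ∫_{1/k}^{1/j} x^{−1/2} dx = 2(j^{−1/2} − k^{−1/2}). A sketch:

```python
import math

def norm_diff(j, k):
    """||f_j - f_k|| for j < k, using the closed form
    integral of x**(-1/2) over [1/k, 1/j] = 2*(j**-0.5 - k**-0.5)."""
    return math.sqrt(2.0 * (1.0 / math.sqrt(j) - 1.0 / math.sqrt(k)))

d_mid = norm_diff(100, 200)
d_tiny = norm_diff(10000, 20000)
# The differences shrink as j, k grow, so the sequence is Cauchy in norm.
```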
where the integral is a Lebesgue integral. (The name “L2 ” is pronounced “L-two”;
the L is in honor of Lebesgue and the 2 refers to the exponent in |f (x)|2 .)
We can now state the general convergence theorem for Fourier series.
8.43 Theorem. Let en (θ) = einθ .
a. If f ∈ L2 (−π, π), the Fourier series
∑_{n=−∞}^{∞} c_n e_n,    c_n = (1/2π) ∫_{−π}^{π} f(θ) e^{−inθ} dθ,
c. If {c_n}_{n=−∞}^{∞} is any sequence of complex numbers such that ∑_{n=−∞}^{∞} |c_n|² converges,
Proof. A full proof of Theorem 8.43 is beyond the scope of this book. (One may
be found in Jones [9, p. 325] or Rudin [18, pp. 328ff.].) However, the idea is as
follows. If f is continuous and piecewise smooth, we know that its Fourier series
converges uniformly (Theorem 8.29) and hence in norm, so (a) is valid for such f .
We then obtain the result for arbitrary f ∈ L2 (−π, π) by a limiting argument that
involves proving that any function in L2 (−π, π) is the limit in norm of a sequence
of continuous, piecewise smooth functions. (A partial result in this direction is
indicated in Exercise 7.) (b) follows easily because, as we showed in the proof of
Bessel’s inequality,
(1/2π) ∫_{−π}^{π} |f(θ)|² dθ − ∑_{n=−N}^{N} |c_n|² = (1/2π) ∫_{−π}^{π} | f(θ) − ∑_{n=−N}^{N} c_n e^{inθ} |² dθ,
and the integral on the right tends to zero as N → ∞ since the series converges
in norm to f . (c) follows from (b) and the completeness of L2 (−π, π). Indeed, by
(b),
∫_{−π}^{π} | ∑_{M≤|n|≤N} c_n e^{inθ} |² dθ = 2π ∑_{M≤|n|≤N} |c_n|²,

so the partial sums of the series ∑ c_n e_n are Cauchy in norm; by completeness, the
series converges in norm.
Let f be the sawtooth wave function (f(θ) = θ for |θ| < π). We calculated in §8.1
that its Fourier coefficients are given by c0 = 0 and cn = (−1)n+1 /in for n ̸= 0.
Therefore,
∑_{n=1}^{∞} 1/n² = ½ ( ∑_{n=−∞}^{−1} 1/n² + ∑_{n=1}^{∞} 1/n² ) = ½ ∑_{n≠0} |c_n|² = (1/4π) ∫_{−π}^{π} θ² dθ = π²/6.
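The identity can be checked against a partial sum (a sketch; the truncation point is arbitrary):

```python
import math

# |c_n|^2 = 1/n^2 for n != 0 for the sawtooth wave, so Parseval's identity
# gives sum_{n>=1} 1/n^2 = pi^2/6; compare a partial sum.
N = 100000
partial = sum(1.0 / n**2 for n in range(1, N + 1))
```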
on the whole L2 space but on a suitable subspace of functions that possess the
requisite derivatives and satisfy certain boundary conditions. Indeed, we have
(d/dx) e^{inx} = in e^{inx},    (d²/dx²) cos nx = −n² cos nx,    (d²/dx²) sin nx = −n² sin nx.
The functions einx are precisely the eigenfunctions of d/dx on [−π, π] that satisfy
the periodicity condition f (−π) = f (π), and the functions cos nx and sin nx are
precisely the eigenfunctions of d2 /dx2 on [0, π] that satisfy the boundary condi-
tions f ′ (0) = f ′ (π) = 0 and f (0) = f (π) = 0, respectively. The Fourier expan-
sion of a function therefore provides the analogue of the spectral theorem (A.58)
for these fundamental differential operators, with all the resulting simplifications
that one expects when one finds an orthonormal eigenbasis for a matrix.
For example, we can rederive the solution (8.35) of the insulated heat flow
problem (8.34) as follows. To solve the heat equation ∂t u = k∂x2 u subject to the
boundary conditions ∂x u(0, t) = ∂x u(l, t) = 0, we take u to be the sum of a series
of eigenfunctions of ∂x2 satisfying these boundary conditions:
u(x, t) = ∑_{n=0}^{∞} α_n(t) cos(nπx/l).
Plugging this into the heat equation turns the partial differential equation ∂t u =
k∂x²u into the ordinary differential equations α′_n(t) = −k(nπ/l)² α_n(t) for the
coefficients. The latter are easily solved to yield α_n(t) = a_n e^{−k(nπ/l)² t} and hence
the solution (8.35).
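The resulting solution formula is easy to evaluate numerically; a minimal sketch with made-up two-term initial data (the coefficients, diffusivity, and length below are illustrative, not from the text):

```python
import math

def heat_solution(coeffs, k, l, x, t):
    """u(x, t) = sum_n coeffs[n] * exp(-k*(n*pi/l)**2 * t) * cos(n*pi*x/l);
    coeffs[n] plays the role of a_n in (8.35)."""
    return sum(a_n * math.exp(-k * (n * math.pi / l)**2 * t) * math.cos(n * math.pi * x / l)
               for n, a_n in enumerate(coeffs))

coeffs = [50.0, 10.0]   # illustrative initial data u(x, 0) = 50 + 10 cos(pi x/l)
k, l = 1.0, 1.0         # arbitrary diffusivity and length
u0 = heat_solution(coeffs, k, l, 0.3, 0.0)
u_late = heat_solution(coeffs, k, l, 0.3, 10.0)   # relaxes to the mean value 50
```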
There is an extensive theory of eigenfunction expansions associated to bound-
ary value problems. Many such expansions yield interesting orthogonal bases for
L2 spaces. Others, in which there is a “continuous spectrum” instead of (or in addi-
tion to) a “discrete spectrum,” involve integrals instead of (or in addition to) infinite
series. A great deal of interesting mathematics has arisen from these ideas, and its
ramifications spread far beyond the problems for which it was originally devised.
An introduction to this subject can be found, for example, in Folland [6].
EXERCISES
8. Show that in terms of the cosine and sine coefficients an and bn defined by
(8.7), Parseval’s identity takes the form
∫_{−π}^{π} |f(θ)|² dθ = (π/2) |a_0|² + π ∑_{n=1}^{∞} ( |a_n|² + |b_n|² ).
9. Evaluate the following series by applying Parseval’s identity, in the form given
in Exercise 8, to certain of the Fourier series found in the exercises of §8.1 and
§8.3. (Remember that the constant term is a_0/2, not a_0.)
a. ∑_{n=1}^{∞} 1/n⁴
b. ∑_{n=1}^{∞} 1/(2n − 1)⁶
c. ∑_{n=1}^{∞} 1/n⁸
d. ∑_{n=1}^{∞} (sin² na)/n² (First assume that 0 < a < π, then deduce the general result.)
10. Suppose that f is 2π-periodic, real-valued, and of class C 1 . Show that f ′ is
orthogonal to f on [−π, π] in two ways: (i) directly from the fact that 2f f ′ =
(f 2 )′ , and (ii) by expanding f in a Fourier series and using (8.46). (Hint: When
f is real we have c−n = cn ; why?)
On the other hand, by Green’s theorem (see Example 3 in §5.2), the area of the
region enclosed by C is

A = ½ | ∫_C (x dy − y dx) |.
(The absolute value is there because we do not specify whether C is positively or
negatively oriented.) Moreover,
x dy − y dx = Im[ (x − iy)(dx + i dy) ] = Im( z̄ dz ),
so

A = ½ | Im ∫_C z̄ dz | = ½ | Im ∫_{−π}^{π} ζ̄(s) ζ′(s) ds |.
Thus, by the general form (8.46) of Parseval’s identity,
A = π | Im ∑_{n=−∞}^{∞} c̄_n · in c_n | = π | ∑_{n=−∞}^{∞} n |c_n|² |.
Comparing this with (8.49) yields the desired upper bound for A:
A = π | ∑_{n=−∞}^{∞} n |c_n|² | ≤ π ∑_{n=−∞}^{∞} |n| |c_n|² ≤ π ∑_{n=−∞}^{∞} n² |c_n|² = π.
Moreover, the second inequality is strict unless cn = 0 for |n| > 1. In that case,
the first inequality becomes
| |c_1|² − |c_{−1}|² | ≤ |c_1|² + |c_{−1}|²,
which is strict unless either c1 or c−1 vanishes. Thus A < π unless ζ(s) =
c0 + c1 eis or ζ(s) = c0 + c−1 e−is , both of which describe a circle centered at
c0 , traversed counterclockwise or clockwise, respectively. (In either case the radius
is 1 since |c±1 | = |ζ ′ (s)| = 1.)
Appendix A
SUMMARY OF LINEAR
ALGEBRA
This appendix consists of a brief summary of the definitions and results from linear
algebra that are needed in the text (and a little more). Brief indications of proofs
are given where it is easy to do so, but lack of any proof does not necessarily mean
that a statement is supposed to be obvious. More complete treatments can be found
in texts on linear algebra such as Anton [1] and Lay [16].
A.1 Vectors
Most of the basic terminology concerning n-dimensional vectors is contained in
§1.1; we introduce a few more items here.
(A.1) If x1 , . . . , xk are vectors in Rn , any vector of the form
c1 x1 + c2 x2 + · · · + ck xk (c1 , . . . , ck ∈ R)
is called a linear combination of x1 , . . . , xk . The set of all linear combinations of
x1 , . . . , xk is called the linear span of x1 , . . . , xk .
Geometrically, the linear span of a single nonzero vector x (that is, the set of all
scalar multiples of x) is the straight line through x and the origin. The linear span
of a pair of nonzero vectors x and y is the plane containing x, y, and the origin
unless y is a scalar multiple of x, in which case it is just the line through x and the
origin.
(A.2) For 1 ≤ j ≤ n, we define ej to be the vector in Rn whose jth component
is 1 and whose other components are all 0:
e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, 0, . . . , 1).
Every x ∈ Rⁿ can be written as a linear combination of these standard basis vectors:

(x_1, x_2, . . . , x_n) = x_1e_1 + x_2e_2 + · · · + x_ne_n.
(A.11) It is important to note that the product BA is defined only if the number
of columns in B is the same as the number of rows in A, that is, if the length of
a row in B is equal to the length of a column in A. It is also important to note
that matrix multiplication is not commutative: In general, BA ̸= AB, even when
both products are defined. However, matrix multiplication is associative; that is,
(CB)A = C(BA) for any A, B, C such that all products in question are defined.
It also distributes over addition in the obvious way: C(A + B) = CA + CB and
(A + B)D = AD + BD.
(A.12) Let I be the identity mapping on Rn , I(x) = x for all x ∈ Rn . The
corresponding matrix is called the n × n identity matrix and is denoted by I or by
In if the size needs to be specified. It is the matrix whose columns are the standard
basis vectors e1 , . . . , en , that is, the matrix whose entries Ijk are equal to 1 when
j = k and 0 when j ̸= k. If A is any m × n matrix, we have Im A = A and
AIn = A. This is obvious since the composition of any map A with the identity
map is just A; it is also easy to verify from the definition of matrix products in
(A.10).
(A.13) Let A : Rn → Rn be a linear map. If there is another linear map B :
Rn → Rn such that AB(x) = BA(x) = x for all x ∈ Rn (that is, in terms of
matrices, AB = BA = In ), then A (or its associated matrix) is called invertible
or nonsingular, and B is called the inverse of A and is denoted by A−1 . It is easy
to verify that if A1 and A2 are both invertible, then so is their product A1 A2 , and
(A_1A_2)⁻¹ = A_2⁻¹A_1⁻¹. We shall say more about invertibility in (A.50)–(A.55).
(A.14) Vectors in Rn can be thought of as n × 1 matrices (called column vec-
tors) or as 1 × n matrices (called row vectors), and scalars can be thought of as
1 × 1 matrices. With these identifications, we can reinterpret some of the preceding
formulas:
• If A : Rn → Rm and x ∈ Rn , then by (A.7), A(x) is the matrix product
Ax, where x and A(x) are considered as column vectors. For this reason, we
(almost) always think of vectors as column vectors when we perform matrix
calculations with linear maps. Moreover, we shall henceforth write Ax in
preference to A(x).
• Let B be an l × m matrix and A an m × n matrix; then the rows of B and
the columns of A can both be considered as vectors in Rm . The (ik)th entry
of the product matrix BA is the dot product of the ith row of B with the kth
column of A.
The rows of A∗ are the columns of A and vice versa. As linear maps, A and A∗ are related
through the dot product:
(A.16) x · Ay = A∗ x · y,
since both sides are equal to the double sum ∑_{j=1}^{m} ∑_{k=1}^{n} x_j A_{jk} y_k. It is easy to
check that (AB)∗ = B∗A∗.
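Identity (A.16) can be spot-checked numerically (a sketch with an arbitrary 2×3 matrix, using the transpose for A∗):

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1, 2, 3], [4, 5, 6]]   # 2x3, so A maps R^3 -> R^2 and A* maps R^2 -> R^3
x = [1, -1]                  # x in R^2
y = [2, 0, 1]                # y in R^3
lhs = dot(x, matvec(A, y))               # x . Ay
rhs = dot(matvec(transpose(A), x), y)    # A*x . y
```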
ii. Add a scalar multiple of one row to another row. (That is, for some j ̸= k,
replace rj by rj + crk , and leave all the other rows unchanged.)
iii. Interchange two rows. (That is, for some j ̸= k, replace rj by rk and rk by
rj , and leave all other rows unchanged.)
(A.18) For each elementary row operation, the matrix obtained by performing
that operation on the identity matrix Im is called the corresponding elementary
matrix. For example, the entries of the elementary matrix corresponding to the
operation (ii) are 1 on the main diagonal, c in the (jk)th slot, and 0 elsewhere.
We leave it as an easy exercise for the reader to verify that performing an elemen-
tary row operation on a matrix A is the same as multiplying A on the left by the
corresponding elementary matrix.
(A.19) It is important to note that the elementary row operations, and their as-
sociated matrices, are all invertible, and their inverses are operations of the same
types. Indeed, the inverses of the operations
rj → crj , rj → rj + crk , rj ↔ rk
are
rj → c−1 rj , rj → rj − crk , rj ↔ rk .
(A.20) Row operations can be used to transform a matrix into certain standard
forms that are useful for many purposes. The definitions are as follows. A matrix
is said to be in echelon form if the following conditions are satisfied:
• In every nonzero row (that is, every row in which at least one entry is non-
zero), the first nonzero entry is equal to 1.
• If the jth and kth rows are nonzero, and j < k, the initial 1 in row j is to the
left of the initial 1 in row k.
• The zero rows (if any) are below all of the nonzero rows.
A matrix is said to be in reduced echelon form if, in addition, the following condition holds:
• The entries above and below the initial 1’s in the nonzero rows are all 0.
The matrices displayed above are not in reduced echelon form, but the following
matrices are:
[ 1  0  −5 ]      [ 1  4 ]      [ 1  7  0 ]
[ 0  1  −3 ],     [ 0  0 ],     [ 0  0  1 ]
                                [ 0  0  0 ]
Any matrix can be transformed into one in echelon form by the following procedure:
1. If necessary, interchange the first row with another row so that the leftmost
nonzero column has a nonzero entry in the first row.
2. Multiply the first row by the reciprocal of its first nonzero entry (thus turning
the first nonzero entry into a 1).
3. Add multiples of the first row to the rows below so as to make the entries
below the initial 1 in the first row equal to 0.
4. Set the first row aside and apply steps 1–3 to the submatrix obtained by omit-
ting the first row. Repeat this process until no nonzero rows remain.
Once this is done, the matrix can be further transformed into one in reduced echelon
form as follows:
5. Add multiples of each nonzero row to the rows above so as to make the
entries above the initial 1’s equal to 0.
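Steps 1–5 can be sketched in code (a minimal floating-point implementation for small matrices; the zero tolerance is an arbitrary choice):

```python
def rref(mat):
    """Reduce a matrix (list of rows of floats) to reduced echelon form,
    following steps 1-5 above; a sketch for small matrices."""
    m = [row[:] for row in mat]
    nrows, ncols = len(m), len(m[0])
    pivot_row = 0
    for col in range(ncols):
        # Step 1: find a row (at or below pivot_row) with a nonzero entry here.
        pr = next((r for r in range(pivot_row, nrows) if abs(m[r][col]) > 1e-12), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        # Step 2: scale so the leading entry becomes 1.
        p = m[pivot_row][col]
        m[pivot_row] = [v / p for v in m[pivot_row]]
        # Steps 3 and 5: clear the entries below and above the leading 1.
        for r in range(nrows):
            if r != pivot_row:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == nrows:
            break
    return m

example = rref([[0.0, 2.0, 4.0], [1.0, 1.0, 1.0]])
```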
(A.23) All of the ideas in this section have analogues for columns in place of
rows. That is, we have the elementary column operations (multiply a column by
a nonzero scalar, add a multiple of one column to another one, interchange two
columns), which are implemented by multiplying a matrix on the right by the cor-
responding elementary matrix. They can be used to transform a matrix into one in
column-echelon form or reduced column-echelon form, whose definitions are the
obvious modifications of the ones given above for (row-)echelon forms.
A.4 Determinants
(A.24) The determinant is a function that assigns to each square matrix A a
certain number det A. For 2 × 2 and 3 × 3 matrices, the determinant is given by
(A.25)  det [ a  b ; c  d ] = ad − bc,

(A.26)  det [ a  b  c ; d  e  f ; g  h  i ] = a(ei − fh) − b(di − fg) + c(dh − eg).
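Formulas (A.25) and (A.26) translate directly into code (a sketch; the sample matrix is arbitrary):

```python
def det2(a, b, c, d):
    # (A.25)
    return a * d - b * c

def det3(a, b, c, d, e, f, g, h, i):
    # (A.26): cofactor expansion along the first row
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Sample 3x3 matrix [[1, 2, 3], [4, 5, 6], [7, 8, 10]]:
d = det3(1, 2, 3, 4, 5, 6, 7, 8, 10)
```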
For larger matrices, the explicit formula for the determinant is quite a mess. How-
ever, this formula is of little use; the important things about determinants are the
properties they possess, which lead to more efficient ways of computing them. The
following seven items constitute a list of the most fundamental properties of deter-
minants. In them, A and B denote n × n matrices.
(A.27) det In = 1.
(A.28) det(AB) = (det A)(det B).
(A.29) For each j, det A is a linear function of the jth row of A when the other
rows are kept fixed. (Thus, for example, when j = 1,
det [ ar′_1 + br″_1 ; r_2 ; … ] = a det [ r′_1 ; r_2 ; … ] + b det [ r″_1 ; r_2 ; … ],
where the rj ’s denote row vectors.) In particular, if A has a zero row, det A = 0.
(A.30) (Behavior under elementary row operations)
• If one row of A is multiplied by c and the other rows are left unchanged,
det A is multiplied by c.
• If a multiple of the kth row of A is added to the jth row and the other rows
are left unchanged, det A is unchanged.
This formula is called the cofactor expansion of det A along the jth row. (For
example, in view of equation (A.25), equation (A.26) gives the cofactor expansion
of the determinant of a 3 × 3 matrix along its first row.)
(A.32) det(A∗ ) = det A. Consequently, properties (A.29) and (A.30) remain
valid if “row” is replaced by “column,” and we can sum over j instead of k in the
cofactor expansion.
(A.33) (How to compute determinants) The cofactor expansion reduces n × n de-
terminants to determinants of smaller size and so can be used recursively to com-
pute a determinant. However, for large matrices it is much more efficient to use
row operations. That is, to compute det A, row-reduce A and keep track of what
each row operation does to the determinant, using (A.30).
Vectors x_1, . . . , x_k in Rⁿ are said to be linearly independent if
c_1x_1 + · · · + c_kx_k = 0 only when c_1 = · · · = c_k = 0.
A.7 Invertibility
(A.50) We recall from the introduction to Chapter 1 that a mapping f : X →
Y from a set X to another set Y is invertible if there is another mapping g : Y → X
such that g(f (x)) = x for all x ∈ X and f (g(y)) = y for all y ∈ Y , and that f is
invertible if and only if f maps X onto Y and f is one-to-one.
(A.51) Now let A : Rn → Rm be a linear map. We first observe that A is one-
to-one if and only if N(A) = {0}, for Ax = Ay if and only if x − y ∈ N(A).
In particular, if m < n, then by (A.47) we have dim N(A) = n − dim R(A) ≥
n − m > 0, so A cannot be one-to-one. On the other hand, if m > n, then by
(A.47) again, dim R(A) ≤ n < m, so R(A) cannot be all of Rm . Hence, A can
be invertible in the sense of (A.50) only when n = m; in this case, it is not hard to
check that the inverse of A (if it exists) is again a linear map. Thus, for linear maps
the definition of invertibility in (A.50) agrees with the one in (A.13).
(A.52) For a linear map A : Rn → Rn , the following conditions are all equiva-
lent:
a. A is invertible.
b. R(A) = Rn .
c. N(A) = {0}.
d. R(A∗ ) = Rn .
e. N(A∗ ) = {0}.
f. The columns of the matrix A are linearly independent.
g. The rows of the matrix A are linearly independent.
h. det A ̸= 0.
i. The matrix A is a product of elementary matrices.
(A.53) Let us prove (A.52). First, (a) is equivalent to the conjunction of (b) and
(c) by the discussion in (A.50–A.51). (b) and (c) are equivalent to each other by
(A.47), as are (d) and (e), and (b) and (d) are equivalent by (A.49). (f) is equivalent
to (c), for if c_j = Ae_j is the jth column of A, we have ∑_{j=1}^{n} a_j c_j = 0 if and only
if ∑_{j=1}^{n} a_j e_j ∈ N(A); similarly, (g) is equivalent to (e).
Next, we can perform elementary row operations on A to turn A into a matrix
B in reduced echelon form; since performing row operations does not change the
row space of a matrix, we have R(A∗ ) = R(B ∗ ). But by (A.21) and (A.33), either
B = I, in which case det A ̸= 0 and R(A∗ ) = R(I) = Rn ; or B contains at least
one zero row, in which case det A = 0 and dim R(A∗ ) = dim R(B ∗ ) < n; thus
(h) is equivalent to (d).
We have shown that (a)–(h) are all equivalent. Finally, we observed in (A.19)
that every elementary matrix is invertible, and hence so is every product of elementary
matrices.
(A.58) Not all matrices have eigenbases. (In fact, some matrices have no eigen-
values at all, as long as we allow only real numbers. The situation changes dramat-
ically if we consider complex matrices and complex eigenvalues, but even then A
may not have an eigenbasis when the characteristic polynomial has multiple roots.)
However, there is an important class of matrices that do have eigenbases.
The n×n matrix A is called symmetric if A = A∗ , that is, if Ajk = Akj for all
j and k. One can show that every symmetric matrix has an orthonormal eigenbasis.
This is one of the major results of linear algebra, known as the spectral theorem
or principal axis theorem.
Appendix B
SOME TECHNICAL PROOFS
By bisecting the intervals [aj , bj ], we can write B0 as the union of 2n boxes whose
side lengths are half as big as those of B0 ; we denote this collection of boxes by
B1 . By bisecting the sides of each box in B1 , we can write B0 as the union of 22n
boxes whose side length are 14 as big as those of B0 ; we denote this collection of
boxes by B2 . Continuing inductively, for each positive integer k we can write B0
as the union of 2kn boxes whose side lengths are 2−k times as big as those of B0 ,
and we denote this collection of boxes by Bk .
Since det B ̸= 0 by assumption, at least one term in this sum must be nonzero. By
reordering the variables if necessary, we can assume that the last term is nonzero,
so det M kk ̸= 0.
Now, M kk is the matrix of partial derivatives of F1 , . . . , Fk−1 with respect to
the variables y1 , . . . , yk−1 , evaluated at (a, b), so by inductive hypothesis, the k −1
equations
F1 (x, y) = F2 (x, y) = · · · = Fk−1 (x, y) = 0
We wish to use the implicit function theorem for a single equation to solve the
equation G(x, yk ) = 0 for yk as a C 1 function of x, say yk = fk (x). Then for
j < k we will have yj = fj (x) where fj (x) = gj (x, fk (x)), and the proof will
be complete. (The method for computing the partial derivatives of f stated in (b) is
just implicit differentiation, as discussed in §2.5.)
Our task is to verify that the hypothesis of the implicit function theorem, namely
∂yk G(a, bk ) ̸= 0, is satisfied. To do this we need the chain rule, some facts about
determinants, and perseverance. To begin with,
∂G/∂y_k = ∑_{j=1}^{k−1} (∂F_k/∂y_j)(∂g_j/∂y_k) + ∂F_k/∂y_k,
and hence

(B.4)  (∂G/∂y_k)(a, b_k) = ∑_{j=1}^{k−1} B_{kj} (∂g_j/∂y_k)(a, b_k) + B_{kk}.
These k −1 equations can be solved for the desired quantities (∂gj /∂yk )(a, bk )
by Cramer’s rule (see (A.54) in Appendix A). The coefficient matrix in (B.5),
(B_{ij})_{i,j=1}^{k−1}, is what we called M^{kk} above, and the matrix obtained by replacing
But this is just the matrix M kj obtained by deleting the kth row and the jth column
from B except that the column involving the Bik ’s has been multiplied by −1 and
moved from the last slot to the jth slot. The determinant of this matrix is therefore
(−1)k−j det M kj — one factor of −1 because of the minus signs on the column of
Bik ’s, and k − j − 1 more factors of −1 from interchanging that column with the
succeeding k − j − 1 columns to move it back to its rightful place on the right end.
In short, the application of Cramer’s rule to the system (B.5) yields

(∂g_j/∂y_k)(a, b_k) = (−1)^{k−j} det M^{kj} / det M^{kk}.
Now we are done. Substitute this result back into (B.4), noting that (−1)^{−j} =
(−1)^{j}, and recall (B.3):

(∂G/∂y_k)(a, b_k) = ∑_{j=1}^{k−1} (−1)^{j+k} B_{kj} (det M^{kj} / det M^{kk}) + B_{kk}

= ( ∑_{j=1}^{k} (−1)^{j+k} B_{kj} det M^{kj} ) / det M^{kk} = det B / det M^{kk}.
Since det B ̸= 0 by assumption, this completes the verification that ∂yk G(a, bk ) ̸=
0 and hence the proof of the theorem.
Proof. We consider the upper sums SP f and SP ′ f ; the argument for the lower
sums is similar. If no extra point is added in the interval (xj−1 , xj ) in passing from
P to P ′ , both sums contain the term Mj (xj − xj−1 ), where Mj is the supremum
of f on [xj−1 , xj ]. If extra points are added, the term Mj (xj − xj−1 ) in SP f is
replaced by a sum of similar terms corresponding to subintervals of [xj−1 , xj ]. Both
Mj (xj −xj−1 ) and the latter sum are bounded in absolute value by C(xj −xj−1 ) <
Cδ, so their difference is bounded by 2Cδ. The total change from SP f to SP ′ f
is the sum of these differences, of which there are at most N , so it is less than
2CN δ.
B.7 Theorem. Suppose f is integrable on [a, b]. Given ϵ > 0, there exists δ > 0
such that if P = {x_0, . . . , x_J} is any partition of [a, b] satisfying max_j (x_j − x_{j−1}) < δ,
then any Riemann sum ∑_{j=1}^{J} f(t_j)(x_j − x_{j−1}) associated to P differs from ∫_a^b f(x) dx by
at most ϵ.
Proof. It is enough to prove the result for the lower and upper sums sP f and SP f ,
as all other Riemann sums lie in between these two. Pick a partition Q of [a, b] such
that S_Q f < ∫_a^b f(x) dx + ϵ/2 and s_Q f > ∫_a^b f(x) dx − ϵ/2. Let N be the number
of subdivision points in Q, and let C be an upper bound for |f | on [a, b]; we claim
that any δ < ϵ/4N C will do the job. Indeed, suppose P = {x0 , . . . , xJ } satisfies
maxj (xj − xj−1 ) < δ. Then the partition P ∪ Q is obtained by adding at most N
points to P (namely, the points of Q that are not already in P ). By Lemma B.6 and
Lemma 4.3,
S_P f < S_{P∪Q} f + 2NCδ < S_{P∪Q} f + ϵ/2 ≤ S_Q f + ϵ/2 ≤ ∫_a^b f(x) dx + ϵ,

and likewise s_P f > ∫_a^b f(x) dx − ϵ. Since s_P f ≤ ∫_a^b f(x) dx ≤ S_P f, the proof is
complete.
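Theorem B.7 can be illustrated numerically: for a fine partition, every Riemann sum (here the left- and right-endpoint ones) is close to the integral. A sketch with f(x) = x² on [0, 1] (the partition size is an arbitrary choice):

```python
def riemann_sum(f, points, tags):
    """Riemann sum sum_j f(t_j)*(x_j - x_{j-1}) for the partition `points`
    with tag t_j chosen in the jth subinterval."""
    return sum(f(t) * (b - a) for (a, b), t in zip(zip(points, points[1:]), tags))

n = 2000
pts = [i / n for i in range(n + 1)]
f = lambda x: x * x                      # exact integral over [0, 1] is 1/3
left = riemann_sum(f, pts, pts[:-1])     # left-endpoint tags
right = riemann_sum(f, pts, pts[1:])     # right-endpoint tags
```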
In the next two sections we shall need the generalization of Theorem B.7 to
multiple integrals. The idea is exactly the same, but the notation is more compli-
cated. We give the precise statement of the result but leave the adaptation of the
one-dimensional proof to the reader.
provided that K ≥ N. (For (B.10) we are applying Theorem B.8 to the function f,
and for (B.11) we are applying Theorem B.7 to the function g(y) = ∫_a^b f(x, y) dx.)
Let us fix K to be equal to N ; then the points yk are also fixed. By Theorem B.7
again, we can choose J large enough so that
| ∫_a^b f(x, y_k) dx − ∑_{j=1}^{J} f(x_j, y_k) Δx_j | < ϵ / (3(d − c))
Therefore, by (B.10),
| ∫∫_R f dA − ∑_{k=1}^{K} ( ∫_a^b f(x, y_k) dx ) Δy_k | < 2ϵ/3,
Since ϵ is arbitrary, the double integral and the iterated integral must be equal.
As we observed in (1.3), the norms |x| and ∥x∥ are comparable to each other in
the sense that ∥x∥ ≤ |x| ≤ √n ∥x∥. The max-norm shares the following basic
properties with the Euclidean norm:
Hence, if we define

(B.12)  ∥A∥ = max_{1≤j≤m} ∑_{k=1}^{n} |A_{jk}|,

we have

∥Ax∥ ≤ ∥A∥ ∥x∥.
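Definition (B.12), the maximum absolute row sum, and the bound ∥Ax∥ ≤ ∥A∥∥x∥ can be spot-checked (the sample data is arbitrary):

```python
def max_norm(v):
    return max(abs(x) for x in v)

def op_norm(A):
    # (B.12): the maximum absolute row sum
    return max(sum(abs(a) for a in row) for row in A)

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, -2], [3, 0.5]]   # arbitrary sample matrix
x = [0.7, -1.0]           # arbitrary sample vector
lhs = max_norm(matvec(A, x))
rhs = op_norm(A) * max_norm(x)   # lhs <= rhs by the display above
```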
We shall need the variant of Theorem 2.88 that pertains to the norms just de-
fined, and an extension of it to nonconvex sets:
B.13 Lemma. Suppose F is a differentiable map from a convex set W ⊂ Rn into
Rm , and suppose that ∥DF(x)∥ ≤ M for all x ∈ W (where ∥DF(x)∥ is defined
by (B.12)). Then
∥F(x) − F(y)∥ ≤ M ∥x − y∥ for all x, y ∈ W.
Proof. Let F = (F1 , . . . , Fm ). By the mean value theorem (2.39), for each j there
is a point c on the line segment between x and y such that
F_j(x) − F_j(y) = ∇F_j(c) · (x − y) = ∑_{k=1}^{n} (∂_k F_j(c))(x_k − y_k).
But then
|F_j(x) − F_j(y)| ≤ ∑_{k=1}^{n} |∂_k F_j(c)| ∥x − y∥ ≤ ∥DF(c)∥ ∥x − y∥ ≤ M ∥x − y∥.
Proof. Since U is open, for each x ∈ R there is a positive number r such that
the cube Q(2r, x) is contained in U . By the Heine-Borel theorem, R is covered
by finitely many of the cubes Q(r, x) with side length half as large, say R ⊂
⋃_{j=1}^{J} Q(r_j, x_j). Let r_0 be the smallest of the numbers r_1, . . . , r_J. Moreover, let
C1 and C2 be the maximum values of ∥DF(x)∥ and ∥F(x)∥ as x ranges over R.
(These maxima exist since R is compact and ∥DF(x)∥ and ∥F(x)∥ are continuous
functions of x ∈ R.)
Now suppose x, y ∈ R; then either ∥x − y∥ < r0 or ∥x − y∥ ≥ r0 . In
the first case, both x and y lie in one of the cubes Q(2rj , xj ). (Indeed, x lies
in one of the cubes Q(rj , xj ) since they cover R, and then y ∈ Q(rj + r0 , xj ).)
Since Q(2r_j, x_j) is convex, we can apply Lemma B.13 to conclude that ∥F(x) −
F(y)∥ ≤ C_1 ∥x − y∥. In the second case, we simply have
∥F(x) − F(y)∥ ≤ ∥F(x)∥ + ∥F(y)∥ ≤ 2C_2 ≤ (2C_2/r_0) ∥x − y∥.
B.15 Theorem. Suppose K ⊂ U is a compact set with zero content. Then G(K)
also has zero content.
Proof. First, since U is open, for each u ∈ K there is a cube centered at u whose
vertices have rational coordinates and whose closure lies in U . Since K is compact,
finitely many of these cubes cover K; thus, K ⊂ Rint where R is a finite union of
closed cubes contained in U . Let C be the constant in Lemma B.14, with R being
the set we have just defined.
Since K has zero content, for any ϵ > 0 there is a finite collection of cubes
{Q(r_j, x_j)} such that K ⊂ ⋃_j Q(r_j, x_j) and ∑_j V(Q(r_j, x_j)) = ∑_j r_jⁿ < ϵ/Cⁿ,
and these cubes can be taken to be subsets of R. (See the remarks following Lemma
B.14.) By Lemma B.14, G(Q(r_j, x_j)) ⊂ Q(Cr_j, G(x_j)). Thus G(K) is contained
in the union of the cubes Q(Cr_j, G(x_j)), and the sum of their volumes is
∑_j (Cr_j)ⁿ = Cⁿ ∑_j r_jⁿ < ϵ. It follows that G(K) has zero content.
B.18 Lemma. Suppose that S and T are disjoint closed subsets of Rn and S is
compact. Then d(S, T ) > 0.
Proof. If the assertion is false, there exist sequences {xj } in S and {yj } in T such
that |xj − yj | → 0. Since S is compact, by passing to a subsequence we may
assume that xj converges to a point x ∈ S. But then yj → x also, so x ∈ T since
T is closed. This is impossible since S ∩ T = ∅.
B.5. Change of Variables for Multiple Integrals 429
B.19 Lemma. Suppose Q ⊂ U is a closed cube. For any invertible linear map
A : R^n → R^n,
V^n(G(Q)) ≤ |det A| ( sup_{u∈Q} ∥A^{−1} DG(u)∥ )^n V^n(Q).
Proof. Let C = supu∈Q ∥A−1 DG(u)∥ (which is finite since Q is compact), and
notice that A−1 DG(u) = D(A−1 ◦ G)(u) since A−1 is linear. We apply Lemma
B.13 to the map F = A−1 ◦ G on the set W = Q to see that A−1 (G(Q)) is
contained in a cube Q′ whose side length is C times the side length of Q, and
whose volume is therefore C^n times that of Q. Hence, by Theorem 4.35,
V^n(G(Q)) = |det A| V^n(A^{−1}(G(Q))) ≤ |det A| C^n V^n(Q),
as claimed.
B.20 Lemma. Let R be a compact subset of U. For any ϵ > 0 there is a δ > 0
such that
| ∥DG(u)^{−1} DG(v)∥ − 1 | < ϵ and | |det DG(u)|^{−1} |det DG(v)| − 1 | < ϵ
whenever u, v ∈ R and ∥u − v∥ < δ.
Proof. By (A.55) in Appendix A, the entries of the matrix DG(u)−1 DG(v) vary
continuously as u, v vary over R, so the functions ϕ(u, v) = ∥DG(u)−1 DG(v)∥
and ψ(u, v) = | det DG(u)|−1 | det DG(v)| are continuous on R × R. Moreover,
ϕ(u, u) = ψ(u, u) = 1 for all u ∈ R. (It follows easily from the definition
(B.12) that ∥I∥ = 1.) Since R × R is compact, ϕ and ψ are uniformly continuous
(Theorem 1.33). Hence, for any ϵ > 0 there is a δ > 0 such that |ϕ(u, v) −
ϕ(u′ , v′ )| < ϵ whenever ∥u − u′ ∥ + ∥v − v′ ∥ < δ, and likewise for ψ. Taking
u′ = v′ = u, we obtain the desired conclusions.
Proof. Since ∂T and ∂(G(T)) have zero content (Corollary B.16), the quantities
on either side of (B.22) are unchanged if we replace T by its closure T̄. Hence we
may as well assume that T = T̄ is compact.
We shall prove (B.22) by approximating the quantities on either side by finite
sums corresponding to a grid of small cubes. In detail, the process is as follows.
Pick a closed cube Q0 such that T ⊂ Q0 , and denote the side length of Q0 by l. By
partitioning the sides of Q0 into M equal pieces, we obtain a partition of Q0 into
M n equal subcubes of side length l/M ; denote this collection of closed cubes by
Q_M. Since the distance from T to the complement of U is strictly positive by Lemma
B.18, all of the cubes in Q_M that intersect T will be contained in U provided M is
sufficiently large, say M ≥ M0 . For each M ≥ M0 , let RM be the union of those
cubes in QM that intersect T . Then RM is a compact set such that T ⊂ RM ⊂ U ,
and V n (RM ) → V n (T ) as M → ∞.
Now, let ϵ > 0 be given. We choose δ > 0 as in Lemma B.20, and we then pick
M ≥ M0 large enough so that l/M < δ and V n (RM ) < V n (T ) + ϵ.
Let Q_1, . . . , Q_K be the cubes in Q_M that intersect T, so that R_M = ⋃_{k=1}^K Q_k,
and let xk be the center of Qk . Since l/M < δ, Lemma B.20 applies whenever
u ∈ Q_k and v = x_k. Thus, by Lemma B.19, with A = DG(x_k),
V^n(G(Q_k)) ≤ |det DG(x_k)| ( sup_{u∈Q_k} ∥DG(x_k)^{−1} DG(u)∥ )^n V^n(Q_k) < (1 + ϵ)^n |det DG(x_k)| V^n(Q_k),
so
V^n(G(T)) ≤ Σ_{k=1}^K V^n(G(Q_k)) < (1 + ϵ)^n Σ_{k=1}^K |det DG(x_k)| V^n(Q_k).
Moreover, by Lemma B.20 we have |det DG(x_k)| < |det DG(u)|/(1 − ϵ) for u ∈ Q_k, so
|det DG(x_k)| V^n(Q_k) = ∫_{Q_k} |det DG(x_k)| d^n u < (1/(1 − ϵ)) ∫_{Q_k} |det DG(u)| d^n u.
In short,
V^n(G(T)) ≤ ((1 + ϵ)^n/(1 − ϵ)) Σ_{k=1}^K ∫_{Q_k} |det DG| dV^n = ((1 + ϵ)^n/(1 − ϵ)) ∫_{R_M} |det DG| dV^n.
Finally, let C be the maximum of |det DG(u)| as u ranges over the compact set
R_{M_0}. Then
| ∫_{R_M} |det DG| dV^n − ∫_T |det DG| dV^n | = ∫_{R_M∖T} |det DG| dV^n ≤ C [V^n(R_M) − V^n(T)] < Cϵ.
Therefore,
V^n(G(T)) ≤ ((1 + ϵ)^n/(1 − ϵ)) ∫_T |det DG| dV^n + ((1 + ϵ)^n/(1 − ϵ)) Cϵ.
Since ϵ is arbitrary and C is independent of ϵ, (B.22) follows.
B.23 Lemma. Let T be a measurable set such that T̄ ⊂ U, and let f be a bounded
nonnegative function on G(T ) that is continuous except perhaps on a set of zero
content. Then
* *
n
f (x) d x ≤ f (G(u))| det DG(u)| dn u.
G(T ) T
Proof. Consider a lower Riemann sum for ∫_{G(T)} f:
s_P f = Σ_{j=1}^J m_j V^n(Q_j),
where the Qj ’s are cubes with disjoint interiors contained in G(T ) and mj =
inf x∈Qj f (x). (The hypothesis f ≥ 0 is needed so that the cubes Qj satisfy Qj ⊂
G(T ), not just Qj ∩ G(T ) ̸= ∅.) By Theorem B.15 and Corollary B.17 (applied
to G−1 ), the sets G−1 (Qj ) are measurable and overlap only in sets of zero content.
By Lemma B.21, then, we have
s_P f = Σ_j m_j V^n(Q_j) ≤ Σ_j m_j ∫_{G^{−1}(Q_j)} |det DG| dV^n ≤ Σ_j ∫_{G^{−1}(Q_j)} (f ◦ G) |det DG| dV^n
= ∫_{⋃_j G^{−1}(Q_j)} (f ◦ G) |det DG| dV^n ≤ ∫_T (f ◦ G) |det DG| dV^n.
(For the last inequality we used the fact that ⋃_j G^{−1}(Q_j) ⊂ T and the assumption
that f ≥ 0.) Taking the supremum over all lower Riemann sums sP f , we obtain
the desired conclusion.
At last we come to the main result, for which we restate the hypotheses in
full. We assume that f : G(T ) → R is bounded and continuous except on a
set of zero content (and hence is integrable on G(T )); by Corollary B.17, this
implies that f ◦ G : T → R is also bounded and continuous except on a set of
zero content (and hence is integrable on T ). It is actually enough to assume that
f is integrable on G(T ), but then an additional argument would be necessary to
establish the integrability of f ◦ G.
Proof. It suffices to show that each of these integrals is less than or equal to the
other one. For f ≥ 0, Lemma B.23 proves one of these inequalities, and the
reverse inequality follows by applying Lemma B.23 to the inverse transformation.
More precisely, if in Lemma B.23 we replace T by G(T ), G by G−1 , and f by
(f ◦ G)|det DG|, we obtain
∫_T f(G(u)) |det DG(u)| d^n u ≤ ∫_{G(T)} f(G(G^{−1}(x))) |det DG(G^{−1}(x))| |det DG^{−1}(x)| d^n x.
But by the chain rule (2.86), the matrices DG(G−1 (x)) and DG−1 (x) are inverses
of each other, so their determinants are reciprocals of each other; hence, the integral
on the right is simply ∫_{G(T)} f(x) d^n x. Thus the theorem is proved for the case
f ≥ 0. The general case follows by writing f = (f + C) − C where C ≥ 0 is
sufficiently large that f + C ≥ 0 on T . The argument just given applies to f + C
and to the constant function C; subtracting the results yields the theorem.
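As a numerical illustration of the theorem (a sketch with our own choice of map and integrand, not an example from the text), take polar coordinates G(r, t) = (r cos t, r sin t) on T = [0, 1] × [0, 2π], for which |det DG(r, t)| = r; integrating f(G(r, t)) r over T should reproduce the integral of f(x, y) = x² + y² over the unit disc, which is π/2.

```python
import math

# Change of variables checked numerically: with G(r,t) = (r cos t, r sin t),
#   ∫∫_{G(T)} f dx dy = ∫∫_T f(G(r,t)) |det DG(r,t)| dr dt,  |det DG| = r.
def right_hand_side(f, nr=400, nt=400):
    # midpoint rule over T = [0,1] x [0, 2*pi]
    dr, dt = 1.0 / nr, 2.0 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nt):
            t = (j + 0.5) * dt
            x, y = r * math.cos(t), r * math.sin(t)
            total += f(x, y) * r * dr * dt  # the factor r is |det DG|
    return total

approx = right_hand_side(lambda x, y: x * x + y * y)
exact = math.pi / 2  # closed-form value of the integral over the unit disc
```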
Then
lim_{j→∞} ∫_{U_j} f dV^n = ∫_U f dV^n,
where U = ⋃_j U_j.
(Note that the exponent 1/[(x − a)(x − b)] is negative for a < x < b and tends to
−∞ as x → a+ or x → b−.) An argument like that in Exercise 9, §2.1, shows that
f and all its derivatives vanish as x → a+ or x → b−, so f is C ∞ even at a and b.
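The flatness of this function at the endpoints is easy to see numerically. The snippet below is our own quick look, with the arbitrary choice a = 0, b = 1: the values of f collapse extremely fast as x approaches an endpoint.

```python
import math

# The bump function from the text: f(x) = exp(1/((x-a)(x-b))) for a < x < b,
# and f(x) = 0 otherwise, with the arbitrary choice a = 0, b = 1.
a, b = 0.0, 1.0

def f(x):
    if a < x < b:
        # the exponent 1/((x-a)(x-b)) is negative on (a, b) and tends to
        # -infinity at the endpoints, so f decays faster than any power there
        return math.exp(1.0 / ((x - a) * (x - b)))
    return 0.0

vals = [f(0.2), f(0.1), f(0.05)]  # values marching toward the endpoint a = 0
```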
For the n-dimensional case, then, the function
Proof. The starting point is a fact we demonstrated in the course of proving The-
orem B.1: There is a grid B of closed rectangular boxes such that each box in B
that intersects K is contained in one of the sets Uj . Let B1 , . . . , BM be the boxes
in B that intersect K, and let B_{M+1}, . . . , B_N be the additional boxes in B that in-
tersect at least one of B_1, . . . , B_M. (Thus, ⋃_{m=1}^M B_m is a compact set contained in U
whose interior contains K, and ⋃_{m=1}^N B_m is obtained by adding one additional layer
of boxes around the boundary of ⋃_{m=1}^M B_m.)
For 1 ≤ m ≤ M , the box Bm is contained in one of the Uj ’s, say Uj(m) ; let
c_m = d(B_m, (U_{j(m)})^c). (Here d(S, T) is the distance from S to T, defined before
Lemma B.18.) On the other hand, for M < m ≤ N we have Bm ∩ K = ∅; let
cm = d(Bm , K). The numbers cm are all positive by Lemma B.18. Let η be the
smallest of the side lengths of the B_m's, let
δ = (1/(2√n)) min(c_1, . . . , c_N, η),
and for 1 ≤ m ≤ N let B̃_m be the closed box with the same center as B_m whose
side lengths are larger than those of B_m by the amount δ. Then the boxes B̃_m have
the following properties: First, each point of B_m is in the interior of B̃_m. Second,
since δ < (1/2)η, for m ≤ M each point of B̃_m is in the interior of one of the B̃_l's. (It
is the points on the boundary of B̃_m that are at issue here, and it may happen that
l > M.) Third, if x ∈ B̃_m, there is a point y ∈ B_m such that |x_j − y_j| ≤ δ for all
j, and hence |x − y| ≤ δ√n. Since δ√n < (1/2)c_m, it follows that B̃_m ⊂ U_{j(m)} for
m ≤ M and B̃_m ∩ K = ∅ for m > M.
B.7. Green’s Theorem and the Divergence Theorem 435
Now, for 1 ≤ m ≤ N, choose a C^∞ function ψ_m such that ψ_m > 0 on the interior
of B̃_m and ψ_m = 0 on its complement, according to Lemma B.26, and let
ϕ_m = ψ_m / Σ_{l=1}^N ψ_l    (1 ≤ m ≤ M),
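The normalization ϕ_m = ψ_m/Σψ_l is easy to see in one dimension. The toy sketch below is our own example, not from the text: bumps ψ_m > 0 on the intervals (m − 1, m + 1) and zero elsewhere, whose supports cover [0, 3]; the normalized functions then sum to 1 on [0, 3].

```python
import math

# A one-dimensional toy partition of unity (our own example): bumps psi_m
# supported on (m-1, m+1), normalized by their sum.
def bump(x):
    # C-infinity bump supported on (-1, 1)
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

centers = [0.0, 1.0, 2.0, 3.0]

def psi(m, x):
    return bump(x - centers[m])

def phi(m, x):
    s = sum(psi(l, x) for l in range(len(centers)))
    return psi(m, x) / s  # s > 0 on [0, 3] because the supports cover it

xs = [0.03 * i for i in range(101)]  # grid on [0, 3]
sums = [sum(phi(m, x) for m in range(len(centers))) for x in xs]
```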
Proof. The starting point is the special case of Green’s theorem, proved in §5.2, in
which S is x-simple and y-simple. (What we actually need here is the case where
S is a rectangle with sides parallel to the axes.) In contrast to the method used in
§5.2 to handle more general regions, instead of cutting up the region into simple
pieces, we shall use a partition of unity to cut up the integrand into pieces that are
easily analyzed by a change of variables.
By Theorem 3.13, for every point x ∈ ∂S there is an open disc D centered at x
such that the portion of ∂S within D is the graph of a C 1 function, either y = f (x)
or x = f (y). By the Heine-Borel theorem, we can select finitely many of these
FIGURE B.1: The transformation (x, y) → (u, v). The disc D is
indicated by the dashed circle on the left; the rectangle R to which
Green’s theorem is to be applied is dotted on the right.
discs, say D_1, . . . , D_J, so that ∂S ⊂ ⋃_{j=1}^J D_j. Then D_1, . . . , D_J, and S^int form an
open covering of S.
By Theorem B.27 we can choose a partition of unity {ϕ_m}_{m=1}^M on S subordinate
to this covering. Then P = Σ_{m=1}^M ϕ_m P and Q = Σ_{m=1}^M ϕ_m Q on S, and ϕ_m P and
ϕm Q are still of class C 1 , so it suffices to prove the theorem with P and Q replaced
by ϕm P and ϕm Q for m = 1, . . . , M . In short, it is enough to prove the theorem
when supp(P ) and supp(Q) are either (a) contained in S int or (b) contained in a
disc D such that D ∩ ∂S is the graph of a C^1 function.
In case (a), P and Q both vanish on ∂S, so ∫_{∂S} P dx + Q dy = 0. Also, P
and Q remain C 1 if we extend them to be zero outside of S. But then we can apply
Green’s theorem on any rectangle R that includes S to conclude that
∬_S (∂Q/∂x − ∂P/∂y) dA = ∬_R (∂Q/∂x − ∂P/∂y) dA = ∫_{∂R} P dx + Q dy = 0.
lower half-plane if S lies below; thus, the relative orientations of T and L are the
same as those of S and ∂S. See Figure B.1.
Let R be a rectangle in the uv-plane, one of whose sides is the segment L, that
includes T . Then the functions P: = P ◦ G and Q : = Q ◦ G are C 1 functions on R
that vanish on the three sides of R other than L.
Now, dx = du and dy = f′(u) du + dv, so
∫_{∂S} P dx + Q dy = ∫_L P̃ du + Q̃ [f′(u) du + dv],
where L is oriented as a portion of ∂R. Since P̃ and Q̃ vanish on the other sides of
R, we can apply Green’s theorem on R to conclude that
∫_{∂S} P dx + Q dy = ∫_{∂R} [P̃(u, v) + Q̃(u, v) f′(u)] du + Q̃(u, v) dv
(B.29)    = ∬_R [∂Q̃/∂u − ∂P̃/∂v − (∂Q̃/∂v) f′(u)] du dv.
Let us indicate how this argument can be extended to a region S with piece-
wise smooth boundary. Recall from §5.1 that “piecewise smooth” means that ∂S
consists of curves that are smooth except at finitely many points, where they have
“corners,” i.e., where the direction of the curve changes abruptly. If x0 is such a
point, there is a small disc D centered at x0 such that ∂S ∩ D is the union of por-
tions of two smooth curves that intersect at x0 . By Theorem 3.13, by shrinking D
if necessary we may assume that these curves are the loci of the equations F(x) = 0
and G(x) = 0, where ∇F ̸= 0 and ∇G ̸= 0 on D. We shall assume that ∇F(x_0)
and ∇G(x0 ) are linearly independent. (The exceptional case where they are not —
that is, where the two curves are tangent at x0 and the region has a sharp “cusp”
rather than a “corner” at x0 — must be handled by an additional limiting argu-
ment, in which S is approximated by regions with smooth boundaries.) Then, by
the inverse mapping theorem, by shrinking D yet further we may assume that the
transformation u = F (x, y), v = G(x, y) has a C 1 inverse on D.
Now, as in the proof of Theorem B.28, we can cover ∂S by finitely many discs
D_1, . . . , D_J such that ∂S ∩ D_j is the graph of a smooth function, together with
finitely many discs D_{J+1}, . . . , D_K centered at the corners and satisfying the condi-
tions of the preceding paragraph. By using a partition of unity subordinate to
the covering {D1 , . . . , DK , S int } of S, we reduce to the case where P and Q are
supported in one of these discs. The discs Dj of the first kind (j ≤ J) are han-
dled as before. For the ones centered at a corner, we use the change of variables
u = F (x, y), v = G(x, y) described above to reduce to the case where the bound-
ary consists of a segment of the u-axis and a segment of the v-axis that meet at the
origin. (This change of variables is not as simple as the one we used before, so
the calculations are more complicated, but the idea is the same.) If S occupies the
“inside” of the corner, the calculation boils down to Green’s theorem on a rectangle
as before; if S occupies the “outside,” it boils down to Green’s theorem for two
rectangles; see Figure B.2.
Finally, we prove the divergence theorem for general regions with C 1 boundary.
The argument can be extended to handle regions with piecewise smooth boundary
in a manner similar to that in the preceding paragraphs.
Proof. The proof is very similar to that of Theorem B.28, so we shall omit many
details. By using a partition of unity, we reduce the problem to proving the theorem
when supp(F) ⊂ R^int or when supp(F) ⊂ B, where B is a ball such that ∂R ∩ B
is the graph of a C^1 function, say z = ϕ(x, y). In the first case, the integrals
∬_{∂R} F · n dA and ∭_R div F dV both vanish, as in Theorem B.28. In the second
case, we introduce a change of variables on B, (x, y, z) = G(u, v, w), defined by
where
H = [−(∂_u ϕ)F̃_1 − (∂_v ϕ)F̃_2 + F̃_3] k.
Here the ± is + or − depending on whether R (resp. Q) lies below or above the
surface z = ϕ(x, y) (resp. the uv-plane), that is, on whether the outward normal to
Q on S is +k or −k; the last equality holds because F̃ vanishes on ∂Q ∖ S. In the
vector field H, the functions F̃_j depend on (u, v, w), but ϕ depends only on (u, v).
By the divergence theorem for the box Q (proved in §5.5), then,
(B.31)    ∬_{∂R} F · n dA = ∭_Q div H dV
= ∭_Q [ −(∂ϕ/∂u)(∂F̃_1/∂w) − (∂ϕ/∂v)(∂F̃_2/∂w) + ∂F̃_3/∂w ] dV.
∂F̃_1/∂u = (∂F_1/∂x)~ + (∂F_1/∂z)~ (∂ϕ/∂u),    ∂F̃_2/∂v = (∂F_2/∂y)~ + (∂F_2/∂z)~ (∂ϕ/∂v),
and
∂F̃_j/∂w = (∂F_j/∂z)~    for j = 1, 2, 3,
where the tildes continue to denote composition with G. Substituting these formu-
las into (B.31), we obtain
(B.32)    ∬_{∂R} F · n dA = ∭_Q [ (∂F_1/∂x)~ − ∂F̃_1/∂u + (∂F_2/∂y)~ − ∂F̃_2/∂v + (∂F_3/∂z)~ ] dV.
We are almost done. On the one hand, by integrating first with respect to u or
v, we see that
∭_Q (∂F̃_1/∂u) dV = ∭_Q (∂F̃_2/∂v) dV = 0,
because F1 and F2 vanish on the vertical faces of Q. On the other hand, the trans-
formation G is volume-preserving,
∂(x, y, z)/∂(u, v, w) = 1,
so by Theorem B.24,
∭_Q [ (∂F_1/∂x)~ + (∂F_2/∂y)~ + (∂F_3/∂z)~ ] dV = ∭_Q (div F)~ dV = ∭_R div F dV.
In conclusion, we remark that these calculations appear more natural if the argu-
ment is recast in the language of differential forms as described in §5.9.
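The divergence theorem itself admits a quick numerical spot-check. The sketch below is our own illustration with an arbitrary choice of field, F = (x³, y³, z³) on the unit ball, not an example from the text: the flux of F through the unit sphere and the integral of div F = 3(x² + y² + z²) over the ball should agree (both equal 12π/5).

```python
import math

# Numeric check of the divergence theorem on the unit ball for F = (x^3, y^3, z^3).
def flux(n=300):
    # On the unit sphere the outward normal is n = (x, y, z), so
    # F . n = x^4 + y^4 + z^4, and dA = sin(phi) dphi dtheta (midpoint rule).
    dphi, dth = math.pi / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        sp, cp = math.sin(phi), math.cos(phi)
        for j in range(n):
            th = (j + 0.5) * dth
            x, y, z = sp * math.cos(th), sp * math.sin(th), cp
            total += (x ** 4 + y ** 4 + z ** 4) * sp * dphi * dth
    return total

def volume_integral(n=200):
    # div F = 3 r^2 in spherical coordinates; dV = r^2 sin(phi) dr dphi dtheta,
    # and the theta integral contributes a factor 2*pi.
    dr, dphi = 1.0 / n, math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            phi = (j + 0.5) * dphi
            total += 3.0 * r ** 2 * r ** 2 * math.sin(phi) * dr * dphi
    return total * 2.0 * math.pi

exact = 12.0 * math.pi / 5.0
```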
Answers to Selected Exercises
CHAPTER 1
Section 1.1
1. ∥x∥ = 2√3, ∥y∥ = 3, θ = 5π/6.
Section 1.2
1. (a) Not open or closed; ∂S = {(0, 0)} ∪ {(x, y) : x2 + y 2 = 4}.
(b) Closed; ∂S = {(x, 0) : 0 ≤ x ≤ 1} ∪ {(x, x2 − x) : 0 ≤ x ≤ 1}.
(c) Open; S̄ = {(x, y) : x ≥ 0, y ≥ 0, and x + y ≥ 1}.
(d) Closed; S int = ∅.
Section 1.3
3. f (0, y) = y.
5. Discontinuous only at (0, 0).
7. Continuous at every irrational.
Section 1.4
1. (a) 1/√2. (b) 0. (c) Diverges.
2. Any K ≥ (19/ϵ) + 5 will work.
3. lim xk = 0.
Section 1.5
1. (a) sup S = 1, inf S = −1.
(b) sup S = 2, inf S = −1.
(c) sup S = ∞, inf S = π/4.
5. lim xk = 2.
CHAPTER 2
Section 2.2
1. (a) ∇f(x, y) = (2xy + πy cos πxy, x² + πx cos πxy);
[∇f(1, −2)] · (3/5, 4/5) = −(1/5)(8 + 2π).
2. (a) df = e^{x−y+3z}[(2x + x²) dx − x² dy + 3x² dz];
f(1.1, 1.2, −0.1) − f(1, 1, 0) ≈ −0.2.
3. (a) dz = 0.036. (b) z.
Section 2.3
1. (Derivatives of f , g, and h are to be evaluated at the same points as f , g, and h
themselves.) (a) dw/dt = f1 (g1 h′ + g2 ) + f2 h′ + f3 .
(b) ∂x w = f1 + f2 g1 + f3 h1 , ∂y w = f2 g2 , ∂z w = f3 h2 .
(c) dw/dx = f ′ (g1 + g2 h′ ).
2. (a) ∂x w = 2f1 + (sin 3y)f2 + 4x3 f3 , ∂y w = −2yf1 + (3x cos 3y)f2 (f1 and
f2 evaluated at (2x − y 2 , x sin 3y, x4 )).
(c) ∂x w = 2(∂2 f )/(f 2 + 1), ∂y w = (2y∂1 f − ∂2 f )/(f 2 + 1) (f and its
derivatives evaluated at (y 2 , 2x − y)).
6. (a) z = 4x − 3y − 6.
(b) 2x + 4y − 6z = 12.
Section 2.5
1. (a) ∂z/∂x = (1 − 3yz)/(3xy − 3z 2 ), ∂z/∂y = (2y − 3xz)/(3xy − 3z 2 ).
3. dz/dt = (2yzt + 5y 4 t + zteyz )/(10y 4 z 3 + 2z 4 eyz − y 2 eyz − yt2 ).
4. 2x, 2x + 6xz 2 .
5. (∂V /∂h)|r = πr 2 , (∂V /∂h)|S = πr 2 − 2πr 2 h/(2r + h), (∂V /∂S)|r =
r/2, (∂S/∂V )|r = 2/r.
Section 2.6
2. r sin θ cos θ(fyy − fxx ) + r(cos2 θ − sin2 θ)fxy − (sin θ)fx + (cos θ)fy .
3. (a) ∂x2 w = 4f11 + 4(sin 3y)f12 + 16x3 f13 + (sin2 3y)f22 + 8x3 (sin 3y)f23 +
16x6 f33 +12x2 f3 , ∂x ∂y w = −4yf11 +(6x cos 3y−2y sin 3y)f12 −8x3 yf13 +
3x(sin 3y cos 3y)f22 + 12x4 (cos 3y)f23 + 3(cos 3y)f2 .
Section 2.7
1. (b) 1/24.
2. (a) P_{1,3}(h) = h − (1/2)h² + (1/3)h³, C = 4.
(b) P_{1,3}(h) = 1 + (1/2)h − (1/8)h² + (1/16)h³, C = 5 · 2^{−7/2}. (Note: These C’s come
from Lagrange’s formula and may not be optimal.)
4. 0.747.
5. (a) x² + xy − (1/6)(x⁴ + 3x³y + 3x²y² + xy³).
(b) 1 + xy − (1/2)(x⁴ + x²y² + y⁴).
6. P_{(3,1),3}(h, k) = 2 + h + 3k + hk + (1/2)(π² − 3)k² − (1/2)hk² + k³.
7. P_{(1,2,1),3}(h, k, l) = 3 + 4h + k + l + 2h² + 2hk + h²k.
Section 2.8
1. (a) (0, −2) and (0, 1) minima, (0, 0) saddle.
(b) (±1, √2) minima, (0, −√2) maximum, (±1, −√2) and (0, √2) saddles.
(c) (1, ±1) and (0, 0) saddles, (3/2, 0) minimum.
(e) (0, 0) minimum, (±1, 0) maxima, (0, ±1) saddles.
(f) ((a2 /b)1/3 , (b2 /a)1/3 ), a minimum if a and b have the same sign and a
maximum otherwise.
Section 2.9
1. min = −1/2, max = 4.
2. min = −4, max = 16/√5.
3. min = (308 − 62√31)/27, max = 2/3√3.
4. min = −85/3, max = 56.
5. A²/(1 + b² + c²).
6. min = 0, max = 2/e.
7. min = −2/e, max = 1/e.
8. 3(12)^{1/3}.
9. min = 1, max = 3.
11. (√a + √b + √c)².
12. 3V^{1/3}.
13. (1/2, 1/2, 0).
14. V_max = A^{3/2}/6√3.
15. (2, 0, 2).
16. a²b².
Section 2.10
1. ∂(u, v)/∂(x, y) = 3xy²z² − yz³ + 24y³, ∂(u, v)/∂(x, z) = −y²z² − 6xy³z,
∂(u, v)/∂(y, z) = xyz² + 8y² − 12x²y²z.
2. ∂(u, v)/∂(x, y) = 3x − 18y, ∂(v, w)/∂(x, y) = −6x² − 18y²,
∂(u, w)/∂(x, y) = −12x − 6y.
3. (b) ⎛ −15 −20 ⎞
       ⎜   3   4 ⎟
       ⎝   2   4 ⎠.
4. (b) ⎛  8  6 −21 ⎞
       ⎝ 18 10 −43 ⎠.
CHAPTER 3
Section 3.1
3. y yes; z no.
5. ∂2 F (0, 0) ̸= 0 and ∂1 F (0, 0) ̸= −1.
6. Can solve for x and y or y and z.
7. Can solve for any pair.
9. Yes.
Section 3.2
1. (a), (c), (f) are smooth curves.
3. (a), (c), (e) are smooth curves.
Section 3.3
1. (a) Plane.
(b) Elliptic cone.
(c) Hyperboloid of revolution.
(d) Paraboloid of revolution.
2. (a) 2x − y − z = 3. (b) x − y = 3.
3. (a) One possibility: f (u, v) = (u cos v, u sin v, f (u)) (a < u < b, |v| ≤ π).
4. (a) One possibility: f(t) = (1 + t, 1/3 + t, 8/3 + t).
5. (a) One possibility: f(t) = (1/2)(1 + cos t, √2 sin t, 1 − cos t). (b) One possibility:
f(t) = (1/2)(1 + t, −√2, 1 − t).
Section 3.4
1. (a) det Df = e^{2x}; x = (1/2) log(u² + v²); y is given up to multiples of 2π by
arctan(v/u) when u > 0, (1/2)π − arctan(u/v) when v > 0, π + arctan(v/u)
when u < 0, (3/2)π − arctan(u/v) when v < 0.
2. (a) (x, y) = (1/3)(2v − u, v − 2u).
4. (d) g(u, v) = (1/2)(u − √(u² + 4v), −u − √(u² − 4v)).
Section 3.5
1. One relation for (a), (c), and (e); two relations for (d).
CHAPTER 4
Section 4.3
1. (a) 5/4. (b) (2/3)(3/5)(5 − √2).
2. 1/20.
3. (a) ∫_{−2}^0 ∫_{4x}^{x³} f(x, y) dy dx, ∫_{−8}^0 ∫_{y^{1/3}}^{y/4} f(x, y) dx dy.
(b) ∫_0^2 ∫_{x/3}^x f(x, y) dy dx + ∫_2^3 ∫_{x/3}^{4−x} f(x, y) dy dx,
∫_0^1 ∫_y^{3y} f(x, y) dx dy + ∫_1^2 ∫_y^{4−y} f(x, y) dx dy.
4. (a) ∫_0^1 ∫_{y³}^{y^{1/2}} f(x, y) dx dy.
(b) ∫_{−1}^0 ∫_{−x}^1 f(x, y) dy dx + ∫_0^2 ∫_{x/2}^1 f(x, y) dy dx.
5. (a) (5/8)e⁶ − (1/7)e². (b) (1/3)(sin 2 − sin 1). (c) (1/2)e² − e.
6. ∫_0^1 f(y)√(y/2) dy + ∫_1^2 f(y)(√(y/2) − y + 1) dy.
8. (a) ∫_{−1}^1 ∫_{−√(1−x²)}^{√(1−x²)} ∫_{x²+y²}^1 f dz dy dx.
(b) ∫_{−1}^1 ∫_{x²}^1 ∫_{−√(z−x²)}^{√(z−x²)} f dy dz dx.
(c) ∫_0^1 ∫_{−√z}^{√z} ∫_{−√(z−y²)}^{√(z−y²)} f dx dy dz.
9. (b) ∫_0^1 ∫_0^{√(1−x)} ∫_0^y f dz dy dx.
(c) ∫_0^1 ∫_0^{√(1−x)} ∫_z^{√(1−x)} f dy dz dx.
10. (1/4)(a, b, c).
11. mass = 8, center of mass = (1, 4/3, 4/3).
12. −126/5.
Section 4.4
1. 3π/2.
2. (1/π, 0, 3/4).
3. 4π(8/3 − √3).
4. 2π − 32/9.
5. (1/2)πcR²h².
6. 5π/3.
7. πcR⁴/3.
8. (3/8, 3/8, 3/8).
9. (1/14)(55, −5).
10. 4/81.
11. π/3√3.
12. A = (3/2) log 4, x̄ = 14/(9 log 4), ȳ = 28/(9 log 4).
13. 3.
14. 3.
15. (1/2)π²R⁴.
Section 4.5
2. (a) (1/x) log((1 + e^x)/(1 + x)).
(b) (2x)^{−1}(5 cos x⁵ − cos x).
(c) x^{−1}(2e^{3x²} − e^{x²}).
Section 4.6
1. (a) Converges. (b) Diverges. (c) Converges. (d) Converges. (e) Diverges.
2. (a) Converges. (b) Diverges. (c) Converges. (d) Converges. (e) Diverges.
3. (a) Converges. (b) Diverges. (c) Converges. (d) Diverges. (e) Converges.
(f) Diverges.
4. (b) p > 1.
5. (b) p > 1.
10. − 21 log 3.
Section 4.7
2. (a) Diverges. (b) (1/4)π. (c) 2π/3. (d) (1/2)√π. (e) Diverges.
CHAPTER 5
Section 5.1
1. (a) 2π√(a² + b²). (b) 14/3. (c) e². (d) 24.
2. (a) 4aE(√(1 − (b/a)²)). (b) 2^{3/2}E(2^{−1/2}).
Section 5.2
1. (c) 12. (d) 0.
2. (15/2)π.
3. The circle x2 + y 2 = 1.
4. 3πR2 .
Section 5.3
1. (2π/3)[(1 + a²)^{3/2} − 1].
2. (π/6)[(1 + 4a²)^{3/2} − 1].
3. 4π²ab.
4. 2πa² + (2πab²/√(a² − b²)) log((a + √(a² − b²))/b) if a > b,
2πa² + (2πab²/√(b² − a²)) arcsin(√(b² − a²)/b) if b > a.
5. (0, 0, 1/2).
6. 20π/3.
7. 0.
8. (a) −17/9. (b) 0. (c) 2. (d) π(b² − a²). (e) π(2^{5/2} − 7/2).
Section 5.4
1. (a) curl F = xi − yj + (y − 2xy)k, div F = x + y 2 .
(b) curl F = 0, div F = −x(y 2 + z 2 ) sin yz.
(c) curl F = (1 − 4xy)i − (x2 − 3z 2 )j + 4yzk, div F = 0.
2. (a) 0. (b) 2x − 24yz. (c) a(a + n − 2)|x|a−2 . (d) 0.
Section 5.5
1. (c) 3a4 . (d) 4π(a2 b2 + b2 c2 + a2 c2 )/3abc. (e) 3A.
2. 4πa5 .
6. (a) −x/|x|3 .
Section 5.6
3. (a) 2ρ(xi + yj)/(x2 + y 2 ).
Section 5.7
1. 2π.
2. −πa²/√2.
4. 0.
5. 0.
7. 5 + 3π(r 2 − 1).
Section 5.8
1. (a) x²y + (1/3)x³ − (1/3)y³ + C.
(b) Not a gradient.
(c) e2x sin y − 3xy + 5x + C.
CHAPTER 6
Section 6.1
1. (a) −1 − 2^{−1/3} < x < −1 + 2^{−1/3}; (2x + 2)/[1 − 2(x + 1)³].
(b) x < −√2 or x > √2; 10/(x² − 2).
(c) x > 0; (1/2)(1 + x^{−1}).
(d) e^{−1} < x < e; log x/(1 − log x).
2. (a) Diverges. (b) 1. (c) Diverges. (d) Diverges.
Section 6.2
1. Converges.
2. Converges.
3. Diverges.
4. Converges.
5. Diverges.
6. Converges.
7. Diverges.
8. Diverges.
9. Converges.
10. Converges.
11. Diverges.
12. Converges.
13. Converges.
14. Diverges.
15. Diverges.
16. Converges.
17. Converges.
18. Diverges.
21. p > 1.
Section 6.4
1. Converges absolutely for −3 ≤ x ≤ −1.
2. Converges absolutely for 0 < x < 1.
3. Converges absolutely for all x.
4. Converges absolutely for −5 < x < 5, conditionally for x = −5.
5. Converges absolutely for 2 < x < 6, conditionally for x = 6.
6. Converges absolutely for x > 0, conditionally for x = 0.
7. Converges absolutely for 4 < x < 8.
8. Converges absolutely for −2 < x < 0, conditionally for x = −2 and x = 0.
9. Converges absolutely for − 32 < x < 23 , conditionally for x = − 32 .
10. Converges conditionally.
11. Converges conditionally.
12. Diverges.
13. Converges absolutely.
14. Converges conditionally.
18. Converges when |x| < 1 and θ ∈ R, when x = 1 and θ ̸= 2kπ, or when
x = −1 and θ ̸= (2k + 1)π.
CHAPTER 7
Section 7.1
1. (a) Uniform convergence on [0, 1 − δ] (δ > 0).
(b) Uniform convergence on [δ, 1] (δ > 0).
(c) Uniform convergence on [0, (1/2)π − δ] and [(1/2)π + δ, π] (δ > 0).
(d) Uniform convergence on R.
(e) Uniform convergence on [δ, ∞) (δ > 0).
(f) Uniform convergence on [0, b] (b < ∞).
(g) Uniform convergence on [0, 1 − δ] and [1 + δ, ∞) (δ > 0).
2. (a) Uniform convergence on [δ, ∞) (δ > 0).
(b) Uniform convergence on [−1, 1].
(c) Uniform convergence on [−2 + δ, 2 − δ] (δ > 0).
(d) Uniform convergence on R.
(e) Uniform convergence on R.
(f) Uniform convergence on [1 + δ, ∞) (δ > 0).
Section 7.3
5. (a) Σ_{n=0}^∞ (−1)^n x^{2n+1}/(n!(2n+1)), x ∈ R.
(b) Σ_{n=0}^∞ (−1)^n x^{4n+1}/((2n)!(4n+1)), x ∈ R.
(c) Σ_{n=1}^∞ (−1)^{n−1}(2x)^n/n², |x| ≤ 1/2.
10. (a) e^x + x^{−1}(1 − e^x).
(b) ∫_0^x t^{−2}(1 − cos t) dt.
(c) x^{−1} ∫_0^x t^{−1}(e^t − 1) dt.
(d) cos x − x sin x.
Section 7.5
4. [1·3···(2n−3)/(2·4···(2n−2))] · π/(2x^{(2n−1)/2}).
Section 7.6
3. (a) (3/8)√π. (b) (1/2)√(π/27). (c) (1/6)√π.
5. Γ((a+1)/b)Γ(c+1)/[bΓ(c+1+(a+1)/b)].
7. [1·3···(k−1)/(2·4···k)]·(π/2) if k is even, [2·4···(k−1)/(3·5···k)] if k is odd (and k > 1).
10. (a) Diverges. (b) Converges.
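Answer 5 above can be spot-checked numerically, assuming (our guess; the exercise statement is not in this excerpt) that it evaluates the beta-type integral I(a, b, c) = ∫_0^1 x^a (1 − x^b)^c dx.

```python
import math

# Spot-check of the Gamma-function formula in answer 5 against direct quadrature,
# under our assumption that it evaluates I(a,b,c) = integral_0^1 x^a (1-x^b)^c dx.
def gamma_formula(a, b, c):
    return (math.gamma((a + 1) / b) * math.gamma(c + 1)
            / (b * math.gamma(c + 1 + (a + 1) / b)))

def midpoint_integral(a, b, c, n=20000):
    # midpoint rule; the integrand is continuous on [0, 1] for a, c >= 0
    h = 1.0 / n
    return h * sum(((i + 0.5) * h) ** a * (1.0 - ((i + 0.5) * h) ** b) ** c
                   for i in range(n))
```

For instance, with a = 2, b = 3, c = 1 both sides equal 1/6, since ∫_0^1 (x² − x⁵) dx = 1/3 − 1/6.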
CHAPTER 8
Section 8.1
1. (4/π) Σ_{m=1}^∞ sin(2m−1)θ/(2m−1).
2. 1/2 − (1/2) cos 2θ.
3. 2/π − (4/π) Σ_{m=1}^∞ cos 2mθ/(4m² − 1).
4. π²/3 + 4 Σ_{n=1}^∞ ((−1)^n/n²) cos nθ.
5. (sinh bπ/π) Σ_{n=−∞}^∞ ((−1)^n/(b − in)) e^{inθ}.
6. (8/π) Σ_{m=1}^∞ sin(2m−1)θ/(2m−1)³.
7. [2/(a(π − a))] Σ_{n=1}^∞ (sin na/n) cos nθ.
8. 1/(2π) + (2/π) Σ_{n=1}^∞ ((1 − cos na)/(n²a²)) cos nθ.
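The first series above, (4/π) Σ sin((2m−1)θ)/(2m−1), is the Fourier series of the odd square wave, so its partial sums should approach 1 for 0 < θ < π. A quick numeric check at θ = π/2 (our own verification, not part of the answer key):

```python
import math

# Partial sums of (4/pi) * sum_{m>=1} sin((2m-1)*theta)/(2m-1), which converge
# to 1 on (0, pi) and to -1 on (-pi, 0).
def partial_sum(theta, M):
    return (4.0 / math.pi) * sum(math.sin((2 * m - 1) * theta) / (2 * m - 1)
                                 for m in range(1, M + 1))

s = partial_sum(math.pi / 2, 20000)
```

At θ = π/2 the series reduces to the alternating Leibniz series for π/4, so the partial sum with 20000 terms is within a few parts in 10⁵ of 1.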
Section 8.2
1. (a) Σ_{n=1}^∞ sin 2nθ/n.
(b) 1 − (2/π) Σ_{n=1}^∞ sin 2nθ/n.
2. π²/3 + 4 Σ_{n=1}^∞ ((−1)^n/n²)(cos(nπ/4) cos nθ + sin(nπ/4) sin nθ).
3. (a) 1/2 + (2/π) Σ_{m=1}^∞ sin(2m−1)θ/(2m−1).
(b) 1/π − (2/π) Σ_{m=1}^∞ cos 2mθ/(4m² − 1) + (1/2) sin θ.
(c) 1/(2π) + (1/π) Σ_{n=1}^∞ (sin na/(na)) cos nθ.
(d) (2 sinh π/π) Σ_{n=1}^∞ ((−1)^{n−1} n/(n² + 1)) sin nθ.
4. (a) 1/2, (1/4)(π − 2).
(b) π²/6, π²/12.
(c) (πb csch πb − 1)/2b², (πb coth πb − 1)/2b².
(d) π³/32.
Section 8.3
2. (b) [2/(a(π − a))] Σ_{n=1}^∞ (sin na/n²) sin nθ.
6. (a) k = 6.
(b) k = ∞.
(c) k = 0, i.e., the function is merely continuous. (It is known to be nowhere
differentiable.)
Section 8.4
1. (a) 1; (4/π) Σ_{m=1}^∞ sin(2m−1)θ/(2m−1).
(b) 2/π − (4/π) Σ_{m=1}^∞ cos 2mθ/(4m² − 1); sin θ.
(c) π²/3 + 4 Σ_{n=1}^∞ ((−1)^n/n²) cos nθ; 2π Σ_{n=1}^∞ ((−1)^{n+1}/n) sin nθ − (8/π) Σ_{m=1}^∞ sin(2m−1)θ/(2m−1)³.
(d) π/4 − (2/π) Σ_{m=1}^∞ cos(4m−2)θ/(2m−1)²; (4/π) Σ_{m=1}^∞ ((−1)^{m+1}/(2m−1)²) sin(2m−1)θ.
2. (a) (4/π) Σ_{m=1}^∞ sin((2m−1)πx)/(2m−1).
(b) (4/π) Σ_{m=1}^∞ ((−1)^{m+1}/(2m−1)) cos((m/2 − 1/4)πx).
(c) (8l²/π³) Σ_{m=1}^∞ sin((2m−1)πx/l)/(2m−1)³.
(d) (e − 1) Σ_{n=−∞}^∞ e^{2πinx}/(1 − 2πin).
Section 8.5
1. (a) u(x, t) = 50 − (400/π²) Σ_{m=1}^∞ (1/(2m−1)²) e^{−(0.00011)(2m−1)²π²t} cos((2m−1)πx/100).
2. u(x, t) = Σ_{n=−∞}^∞ c_n exp[inθ − n²kt], where f(θ) = Σ_{n=−∞}^∞ c_n e^{inθ}.
3. b_n(t) = exp(−n²π²kt/l²)[b_n(0) + ∫_0^t β_n(s) exp(n²π²ks/l²) ds].
4. (a) u(x, t) = [2l²m/(π²b(l − b))] Σ_{n=1}^∞ (1/n²) sin(nπb/l) sin(nπx/l) cos(nπct/l).
5. u(x, t) = Σ_{n=1}^∞ e^{−δt}(b_n cos ω_n t + B_n sin ω_n t) sin(nπx/l),
where ω_n² = (nπc/l)² − δ².
6. (b) u(x, y) = Σ_{n=1}^∞ [a_n^1 sinh(nπ(L − y)/l)/sinh(nπL/l) + a_n^2 sinh(nπy/l)/sinh(nπL/l)] sin(nπx/l),
where f_1(x) = Σ_{n=1}^∞ a_n^1 sin(nπx/l) and f_2(x) = Σ_{n=1}^∞ a_n^2 sin(nπx/l).
Section 8.6
3. a = −1/2, b = −1, c = 1/6.
9. (a) π 4 /90.
(b) π 6 /960.
(c) π 8 /9450.
(d) (1/2)a(π − a) if 0 ≤ a ≤ π, π-periodic as a function of a.
Bibliography
[1] H. Anton, Elementary Linear Algebra (7th ed.), John Wiley, New York, 1994.
[2] R. G. Bartle, Return to the Riemann integral, Amer. Math. Monthly 103
(1996), 625–632.
[3] H. S. Bear, A Primer of Lebesgue Integration, Academic Press, San Diego,
1995.
[4] G. Birkhoff and S. Mac Lane, A Survey of Modern Algebra (5th ed.), A K Pe-
ters, Wellesley, MA, 1997.
[5] J. D. DePree and C. W. Swartz, Introduction to Real Analysis, John Wiley,
New York, 1988.
[12] S. G. Krantz, Real Analysis and Foundations, CRC Press, Boca Raton, FL,
1991.
[14] P. D. Lax, Change of variables in multiple integrals, Amer. Math. Monthly 106
(1999), 497–501.
[15] P. D. Lax, Change of variables in multiple integrals II, Amer. Math. Monthly
108 (2001), 115–119.
[16] D. C. Lay, Linear Algebra and its Applications (2nd ed.), Addison-Wesley,
Reading, MA, 1997.
Index
uniform continuity, 39
uniform convergence
of a sequence, 314
of a series, 317
of an improper integral, 336
upper bound, 24
This appendix is a very brief summary of the basic language and princi-
ples of mathematical logic. More extensive treatments can be found in many
places, such as The Tools of Mathematical Reasoning by Tamara J. Lakins
(American Mathematical Society, 2016).
Statements. Mathematics deals with statements (or assertions or propo-
sitions) that have a definite truth value: they must be either true or false.
In this discussion we use the letters P and Q to denote such statements. For
example, P could stand for the statement“5 > 2” (true) and Q could stand
for the statement “every odd number is divisible by 3” (false). Statements
can be quite complex objects built up out of simpler statements. For exam-
ple, the statement “every real number x can be written as n + y where n is
an integer and 0 ≤ y < 1” is built from the statements “x = n + y,” “n is
an integer,” and “0 ≤ y < 1” together with a couple of quantifiers. (See (2)
below.)
The Fundamental Operations. The basic logical operations to create
new statements from old ones are defined by the English words “and,” “or,”
and “not,” which logicians like to indicate by the symbols ∧, ∨, and ¬. If P
and Q are statements, the statement P ∧ Q is true precisely when P and Q
are both true; the statement P ∨ Q is true precisely when either P or Q is
true (or both);¹ and ¬P is true precisely when P is false.
Observe that the negation ¬ interchanges ∧ and ∨. If it is not the case
that P and Q are both true, then one or the other (or both) must be false;
and if it is not the case that P is true or Q is true, then both must be false:
(1) ¬(P ∧ Q) ≡ ¬P ∨ ¬Q,    ¬(P ∨ Q) ≡ ¬P ∧ ¬Q.
Here the symbol ≡ means that the statements on either side of it are logically
equivalent: they both have the same truth value, no matter whether P and
Q are true or false.
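Since the truth value of each side of (1) is determined by those of P and Q, the equivalences can be verified exhaustively; here is a quick mechanical check (our own, not part of the text):

```python
from itertools import product

# Exhaustive verification of De Morgan's laws (1): each side depends only on
# the truth values of P and Q, so checking all four combinations suffices.
for P, Q in product([True, False], repeat=2):
    assert (not (P and Q)) == ((not P) or (not Q))
    assert (not (P or Q)) == ((not P) and (not Q))

demorgan_ok = True  # reached only if every case above agreed
```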
The symbols ∧ and ∨ will probably remind the reader of the symbols
∩ and ∪ in set theory. This is no accident. If P and Q are the statements
“x ∈ A” and “x ∈ B,” where x denotes an element of some set S and A and
B denote subsets of S, then P ∧ Q and P ∨ Q are the statements “x ∈ A ∩ B”
and “x ∈ A ∪ B.” Also, ¬P is the statement “x ∉ A.”
¹That is, the word “or” is always to be interpreted in the inclusive sense: saying that
P is true or Q is true includes the possibility that they are both true.
Many mathematical statements involve a variable that can take different
values, such as the x in the preceding statements. We can denote such
a statement by P (x) to indicate the variable object explicitly; it is always
assumed, either explicitly or implicitly, that x is an element of some specified
set. P (x) may be true for some x’s and false for others. For example,
“x² − x − 6 = 0” is a statement about real numbers; it is true for x = 3 and
x = −2 and false for all other values of x.
Quantifiers. Often we are interested not in the truth of P (x) for a
particular x but wish to say something about its truth as x ranges over
some specified set S. The two most common species of such statements
are “P (x) is true for all x 2 S” and “P (x) is true for at least one x 2 S.”
Logicians use the universal quantifier ∀ and the existential quantifier ∃ (read
as “for all” and “there exists”) for these situations. That is,
(∀x ∈ S) P(x)    and    (∃x ∈ S) P(x)
are the symbolic form of the statements “for all x in S, P(x) is true” and
“there exists an x in S such that P (x) is true.” Note that the English versions
of these statements can be reformulated in various ways such as “P (x) is true
for every x in S” and “P (x) is true for some x in S” in which the quantifying
clause follows the P (x), but in symbolic form, the quantifiers must always
precede P (x). Note also that when the set S is clearly understood, it is often
omitted from the quantifier; that is, we just say “∀x” or “∃x” rather than
“∀x ∈ S” or “∃x ∈ S.”
Example: The sentence at the end of the first paragraph can be written
symbolically as
(∀x ∈ R)(∃n ∈ Z)(∃y)[(x = n + y) ∧ (0 ≤ y < 1)].
The implication P ⇒ Q (“P implies Q,” or “if P then Q”) is used in ordinary English in several different ways, but in mathematics it
has just one precise interpretation in terms of the truth values of P and Q.
Namely, when P is true then Q must also be true, but when P is false, Q
can be either true or false. That is, the only forbidden situation is that P is
true and Q is false:
P ⇒ Q ≡ ¬(P ∧ ¬Q).
(In view of (1), this means that P ⇒ Q is logically equivalent to ¬P ∨ Q.
It is a matter of psychology rather than logic to prefer the former version to
the latter.)
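Both equivalences can be confirmed by brute force over the four rows of the truth table; a minimal Python sketch (the helper name `implies` is an illustrative choice):

```python
from itertools import product

def implies(p, q):
    """Material implication P => Q, encoded as (not P) or Q."""
    return (not p) or q

# Exhaustively check the two equivalences for every truth assignment.
for p, q in product([True, False], repeat=2):
    assert implies(p, q) == (not (p and not q))  # P => Q  ==  not(P and not Q)
    assert implies(p, q) == ((not p) or q)       # P => Q  ==  (not P) or Q
print("equivalences verified")
```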
Implications involving a variable x often implicitly contain an unexpressed
universal quantifier. For example, “if x > 0 then e^x > 1” is a
statement about real numbers, and it really should be prefaced by “for all
x 2 R.” This is rarely a source of confusion, except that the negation of
such a statement contains an existential quantifier that cannot be omitted.
The negation of the (false) statement “if x > 0 then 3x > 1” is “there is an
x such that x > 0 but 3x ≤ 1” (true; any x with 0 < x ≤ 1/3 will work).
The converse of an implication P ) Q is the implication Q ) P . These
two statements are different and must not be confused with each other. For
example, the statement “if 0 < x < 1 then x³ < x” is true; its converse “if
x³ < x then 0 < x < 1” is false (any x < −1 is a counterexample).
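The failure of the converse can be checked numerically at a single point; a sketch, using x = −2 as one such counterexample:

```python
# x = -2 satisfies x^3 < x (since -8 < -2) but not 0 < x < 1,
# so it refutes the converse "if x^3 < x then 0 < x < 1".
x = -2
print(x**3 < x)   # prints True: the hypothesis of the converse holds
print(0 < x < 1)  # prints False: the conclusion of the converse fails
```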
There is, however, a way of “reversing the order” in an implication that
yields an equivalent statement. The assertion P ) Q means that Q must
be true when P is true; thus if Q is false, P must also be false. That is,
P ⇒ Q ≡ ¬Q ⇒ ¬P.
The statement ¬Q ⇒ ¬P is called the contrapositive of the statement
P ⇒ Q. These two statements are logically equivalent; for both of them, the
forbidden situation is that P is true while Q is false. This equivalence gives
a useful strategy for proving an implication P ⇒ Q (“proof by contraposition”).
Namely, instead of assuming the hypothesis P and reasoning one’s
way to the conclusion Q, one assumes the hypothesis ¬Q and reasons one’s
way to the conclusion ¬P .
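As with the earlier identities, the equivalence of an implication and its contrapositive can be verified row by row over the truth table; a minimal Python sketch (the helper name `implies` is an illustrative choice):

```python
from itertools import product

def implies(p, q):
    """Material implication P => Q, encoded as (not P) or Q."""
    return (not p) or q

# P => Q and its contrapositive (not Q) => (not P) agree on every
# row of the truth table.
for p, q in product([True, False], repeat=2):
    assert implies(p, q) == implies(not q, not p)
print("contrapositive verified")
```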
If P ⇒ Q and Q ⇒ P are both true, we say that the statements P and
Q are equivalent and write P ⇔ Q (read as “P if and only if Q”). One can
replace the implication Q ⇒ P by its contrapositive:

P ⇔ Q ≡ (P ⇒ Q) ∧ (¬P ⇒ ¬Q).
That is, the statement P ⇔ Q means that P and Q (which, in practice, will
usually contain a variable x) always have the same truth value (no matter
what x is). Proving that P ⇔ Q is usually a matter of making two separate
arguments to show that P ⇒ Q and Q ⇒ P or that P ⇒ Q and ¬P ⇒ ¬Q.
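Both decompositions of the biconditional can be checked over the truth table; a sketch (helper name `implies` is an illustrative choice), noting that P ⇔ Q simply says P and Q have equal truth values:

```python
from itertools import product

def implies(p, q):
    """Material implication P => Q, encoded as (not P) or Q."""
    return (not p) or q

# P <=> Q via (P => Q) and (Q => P) equals P <=> Q via
# (P => Q) and (not P => not Q) on every truth-table row,
# and both reduce to "P and Q have the same truth value".
for p, q in product([True, False], repeat=2):
    iff = implies(p, q) and implies(q, p)
    alt = implies(p, q) and implies(not p, not q)
    assert iff == (p == q) == alt
print("biconditional verified")
```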
(Note: The “equivalences” ≡ and ⇔ are different. “P ≡ Q” means
that P and Q have the same truth values simply by virtue of their logical
structure; “P ⇔ Q” means that P and Q have the same truth value by
virtue of their specific content.)
Proof by Contradiction. We conclude with a few words about the proof
technique known as proof by contradiction. The underlying logical princi-
ple is that mathematical statements must be either true or false, so that a
statement that is not false must be true:
¬(¬P) ⇒ P.
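In classical two-valued logic this principle is immediate to check, since there are only two truth values to consider; a one-line Python sanity check:

```python
# In two-valued logic, not(not P) has the same truth value as P.
for p in [True, False]:
    assert (not (not p)) == p
print("double negation verified")
```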