MA20218 Analysis 2A: Lecture Notes
Roger Moser
Department of Mathematical Sciences
University of Bath
Semester 1, 2014/5
Contents
1 Riemann Integration
1.1 Lower and Upper Riemann Sums
1.2 Criteria for Integrability
1.3 Riemann Sums
1.4 Properties of the Integral
1.5 The Fundamental Theorem of Calculus
1.6 Integration Techniques
1.7 Exchanging Integrals with Limits
1.8 Improper Integrals
Index
Chapter 1
Riemann Integration
Integration deals with two different questions.
Area under a curve. Given an interval $[a, b] \subset \mathbb{R}$ and a function $f\colon [a, b] \to \mathbb{R}$ (say, continuous), we obtain a curve in the plane described by its graph $\{(x, y) \in \mathbb{R}^2 : a \le x \le b,\ y = f(x)\}$. This curve may be interpreted as a piece of the boundary of a region in the plane. (Typically, the rest of the boundary is taken to be a piece of the $x$-axis and two vertical line segments; see Fig. 1.0.1.) What is the area of such a region?
Antiderivative. Given an interval $[a, b] \subset \mathbb{R}$ and a function $f\colon [a, b] \to \mathbb{R}$, is it possible to find another function $F\colon [a, b] \to \mathbb{R}$ such that $F' = f$ in $(a, b)$? If so, how?
At first, these questions may seem unrelated. But it turns out that there is
a deep connection; in fact, they are two sides of the same coin.
1.1 Lower and Upper Riemann Sums
Throughout this chapter, let $a, b \in \mathbb{R}$ with $a < b$. We first want to find the area under a curve given by a function $f\colon [a, b] \to \mathbb{R}$. The main idea is to divide the interval $[a, b]$ into many small intervals and approximate the region under the curve by rectangles. We can both overestimate and underestimate the area this way, and when we choose increasingly fine subdivisions, then we hope that the difference decreases (cf. Fig. 1.1.1).
Definition 1.1.1. A subdivision or partition of $[a, b]$ is a finite sequence $(x_0, x_1, \dots, x_N)$ such that
$$a = x_0 < x_1 < \cdots < x_N = b.$$
If $\Delta = (x_0, x_1, \dots, x_N)$ is a subdivision of $[a, b]$, then $I_n = [x_{n-1}, x_n]$ is called the $n$-th interval of $\Delta$ for $n = 1, \dots, N$. The number $|I_n| = x_n - x_{n-1}$ is
[Figure 1.1.1: Lower and upper Riemann sum represented in terms of rectangles]
If
$$\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx,$$
then we say that $f$ is Riemann integrable (or integrable for short). The common value is then called the Riemann integral (or integral for short) and denoted by
$$\int_a^b f(x)\,dx.$$
[Figure: graph of $y = f(x)$ with regions labelled "positive area" and "negative area"]
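and
$$\sum_{n=1}^N m(x_n - x_{n-1}) \le \sum_{n=1}^N m_n(x_n - x_{n-1}) \le \sum_{n=1}^N M_n(x_n - x_{n-1}) \le \sum_{n=1}^N M(x_n - x_{n-1}).$$
But
$$\sum_{n=1}^N (x_n - x_{n-1}) = x_N - x_0 = b - a,$$
$$\omega(f, \Delta) = \sum_{n=1}^N \omega(f, I_n)\,|I_n|.$$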
Remark 1.1.3. From the formulas for $L(f, \Delta)$, $U(f, \Delta)$, and $\omega(f, \Delta)$, we immediately obtain
$$\omega(f, \Delta) = U(f, \Delta) - L(f, \Delta).$$
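Lemma 1.1.2. Let $f$ be a bounded real function on $[a, b]$.
(i) If $\Delta$ and $\Delta'$ are subdivisions of $[a, b]$ and $\Delta'$ refines $\Delta$, then
$$L(f, \Delta') \ge L(f, \Delta), \quad U(f, \Delta') \le U(f, \Delta), \quad\text{and}\quad \omega(f, \Delta') \le \omega(f, \Delta).$$
(ii) For any subdivisions $\Delta_1, \Delta_2$ of $[a, b]$,
$$L(f, \Delta_1) \le U(f, \Delta_2).$$
(iii) Furthermore,
$$\underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx.$$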
The last statement is consistent with the expectation that the lower
Riemann integral underestimates the area, while the upper Riemann integral
overestimates it in general.
Proof. (i) If $\Delta'$ refines $\Delta$, then there exists a number $\ell \in \mathbb{N} \cup \{0\}$ such that $\Delta'$ has $\ell$ points more than $\Delta$. We use induction over $\ell$.
If $\ell = 0$, then $\Delta' = \Delta$ and there is nothing to prove. If $\ell = 1$, then we have numbers $x_0, \dots, x_N$ with
$$a = x_0 < x_1 < \cdots < x_N = b$$
such that $\Delta = (x_0, \dots, x_N)$, and we have another number $x^* \in (x_{K-1}, x_K)$ for some $K \in \{1, \dots, N\}$ such that
$$\Delta' = (x_0, \dots, x_{K-1}, x^*, x_K, \dots, x_N).$$
Write $I_n = [x_{n-1}, x_n]$ for $n = 1, \dots, N$ and
$$I^- = [x_{K-1}, x^*] \quad\text{and}\quad I^+ = [x^*, x_K].$$
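Let $m_n = \inf f(I_n)$ for $n = 1, \dots, N$, and let $m^- = \inf f(I^-)$ and $m^+ = \inf f(I^+)$. Since $I^-, I^+ \subset I_K$, we have $m_K \le m^-$ and $m_K \le m^+$. Therefore
$$\begin{aligned}
L(f, \Delta) &= \sum_{n=1}^{K-1} m_n|I_n| + m_K|I_K| + \sum_{n=K+1}^N m_n|I_n| \\
&= \sum_{n=1}^{K-1} m_n|I_n| + m_K\bigl(|I^-| + |I^+|\bigr) + \sum_{n=K+1}^N m_n|I_n| \\
&\le \sum_{n=1}^{K-1} m_n|I_n| + m^-|I^-| + m^+|I^+| + \sum_{n=K+1}^N m_n|I_n| = L(f, \Delta').
\end{aligned}$$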
and
$$\underline{\int_a^b} f(x)\,dx \le U(f, \Delta_2).$$
1.2 Criteria for Integrability
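$$L(f, \Delta) = \sum_{n=1}^N 0 \cdot |I_n| = 0 \quad\text{and}\quad U(f, \Delta) = \sum_{n=1}^N 1 \cdot |I_n| = 1,$$
$$\underline{\int_0^1} f(x)\,dx = 0 \quad\text{and}\quad \overline{\int_0^1} f(x)\,dx = 1.$$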
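and simultaneously
Thus for any $\varepsilon > 0$, there exist subdivisions $\Delta_1, \Delta_2$ of $[a, b]$ such that
$$L(f, \Delta_1) > \underline{\int_a^b} f(x)\,dx - \frac{\varepsilon}{2} \quad\text{and}\quad U(f, \Delta_2) < \overline{\int_a^b} f(x)\,dx + \frac{\varepsilon}{2}.$$
$$\overline{\int_a^b} f(x)\,dx \le U(f, \Delta) \quad\text{and}\quad \underline{\int_a^b} f(x)\,dx \ge L(f, \Delta),$$
it follows that
$$\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx.$$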
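$$m_n = \inf f(I_n) = \frac{n-1}{N}, \qquad M_n = \sup f(I_n) = \frac{n}{N}.$$
Then
$$L(f, \Delta_N) = \sum_{n=1}^N m_n|I_n| = \sum_{n=1}^N \frac{n-1}{N^2} = \frac{(N-1)N}{2N^2} = \frac{1}{2} - \frac{1}{2N}$$
and
$$U(f, \Delta_N) = \sum_{n=1}^N M_n|I_n| = \sum_{n=1}^N \frac{n}{N^2} = \frac{N(N+1)}{2N^2} = \frac{1}{2} + \frac{1}{2N}.$$
Hence
$$\frac{1}{2} - \frac{1}{2N} \le \underline{\int_0^1} x\,dx \le \overline{\int_0^1} x\,dx \le \frac{1}{2} + \frac{1}{2N}$$
for every $N \in \mathbb{N}$. Letting $N \to \infty$, we obtain
$$\frac{1}{2} \le \underline{\int_0^1} x\,dx \le \overline{\int_0^1} x\,dx \le \frac{1}{2},$$
which implies the claim.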
The monotonicity is the reason why we can determine the lower and
upper Riemann sums quite easily in this example. We can draw similar
conclusions for other monotonic functions.
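The computation in the example can be reproduced exactly with rational arithmetic. The following Python sketch is only an illustration. The helper name `lower_upper_sums_increasing` is hypothetical. The sketch assumes a uniform subdivision and an increasing function, so that the infimum and supremum on each $I_n$ are attained at the endpoints, exactly as used above.

```python
from fractions import Fraction

def lower_upper_sums_increasing(f, a, b, N):
    """Lower and upper Riemann sums of an increasing f on the uniform
    subdivision of [a, b] into N intervals.

    For increasing f, inf f(I_n) = f(x_{n-1}) and sup f(I_n) = f(x_n),
    so no optimisation over each interval is needed.
    """
    h = (b - a) / N                          # common length |I_n|
    xs = [a + n * h for n in range(N + 1)]   # x_0, ..., x_N
    lower = sum(f(xs[n - 1]) * h for n in range(1, N + 1))
    upper = sum(f(xs[n]) * h for n in range(1, N + 1))
    return lower, upper

# f(x) = x on [0, 1] with N = 4: exact values 3/8 and 5/8, i.e. 1/2 -+ 1/8.
print(lower_upper_sums_increasing(lambda x: x, Fraction(0), Fraction(1), 4))
```

With `Fraction` inputs no rounding occurs, so the output can be compared directly with $\frac{1}{2} \mp \frac{1}{2N}$.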
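Corollary 1.2.1. Let $f\colon [a, b] \to \mathbb{R}$ be monotonic (and therefore bounded). Then $f$ is Riemann integrable.
Proof. We only consider the case where $f$ is increasing, as the case of a decreasing function is similar.
Consider a subdivision $\Delta = (x_0, \dots, x_N)$ of $[a, b]$. As usual, let $I_n = [x_{n-1}, x_n]$ for $n = 1, \dots, N$. Because $f$ is increasing, we have $\inf f(I_n) = f(x_{n-1})$ and $\sup f(I_n) = f(x_n)$. Thus
$$\omega(f, \Delta) = \sum_{n=1}^N \bigl(f(x_n) - f(x_{n-1})\bigr)|I_n| \le \|\Delta\|\sum_{n=1}^N \bigl(f(x_n) - f(x_{n-1})\bigr) = \|\Delta\|\bigl(f(b) - f(a)\bigr).$$
Hence for any $\varepsilon > 0$, we can achieve $\omega(f, \Delta) < \varepsilon$ by choosing the mesh small enough; more precisely, by choosing $\|\Delta\| < \varepsilon/(f(b) - f(a))$. (If $f(b) = f(a)$, then $f$ is constant and $\omega(f, \Delta) = 0$ for every $\Delta$.) Now Theorem 1.2.1 implies the claim.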
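The estimate $\omega(f, \Delta) \le \|\Delta\|(f(b) - f(a))$ holds for any subdivision, not only for uniform ones. The sketch below evaluates both sides for the increasing function $f(x) = x^2$ on $[0, 1]$ with a non-uniform subdivision. The helper names are hypothetical, and the arithmetic is done exactly with `fractions.Fraction`.

```python
from fractions import Fraction

def oscillation_sum_increasing(f, xs):
    """omega(f, Delta) for an increasing f, where xs = (x_0, ..., x_N).

    On I_n = [x_{n-1}, x_n] the oscillation sup f(I_n) - inf f(I_n)
    equals f(x_n) - f(x_{n-1}) because f is increasing.
    """
    return sum((f(xs[n]) - f(xs[n - 1])) * (xs[n] - xs[n - 1])
               for n in range(1, len(xs)))

def mesh(xs):
    """||Delta||: the length of the longest interval of the subdivision."""
    return max(xs[n] - xs[n - 1] for n in range(1, len(xs)))

def square(x):
    return x * x

xs = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)]
omega = oscillation_sum_increasing(square, xs)          # 7/16
bound = mesh(xs) * (square(xs[-1]) - square(xs[0]))     # 1/2
```

Here the bound is fairly sharp because the longest interval is also where $f$ increases most.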
$$\omega(f, \Delta) = \sum_{n=1}^N \omega(f, I_n)|I_n| < \frac{\varepsilon}{b - a}\sum_{n=1}^N |I_n| = \varepsilon.$$
ba
n=1
N
X
k=1
whereas
L(f, 0 ) = inf f (In1 )|In1 | + inf f (In2 )|In2 | +
k6=n
When we take the difference, most of the terms cancel. More precisely,
L(f, 0 ) L(f, ) = inf f (In1 )|In1 | + inf f (In2 )|In2 | inf f (In )|In |
M |In1 | + M |In2 | m|In | = (M m)|In |.
The first inequality follows. The second inequality is proved similarly.
15
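$$\omega(f, \Delta) \le \frac{\varepsilon}{2} + 2N(M - m)\|\Delta\|.$$
Choose
$$\delta = \frac{\varepsilon}{4N(M - m)}.$$
If $\|\Delta\| < \delta$, it follows that $\omega(f, \Delta) < \varepsilon$.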
1.3 Riemann Sums
$$\xi_n \in I_n, \quad n = 1, \dots, N.$$
Then the pair $(\Delta, \xi)$ is called a tagged subdivision of $[a, b]$. For a bounded function $f\colon [a, b] \to \mathbb{R}$, the expression
$$S(f, \Delta, \xi) = \sum_{n=1}^N f(\xi_n)(x_n - x_{n-1})$$
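Proof. Fix $\varepsilon > 0$ and invoke Theorem 1.2.2 to find a number $\delta > 0$ such that $\omega(f, \Delta) < \varepsilon$ for all subdivisions $\Delta$ of $[a, b]$ with $\|\Delta\| < \delta$. We have
$$L(f, \Delta) \le S(f, \Delta, \xi) \le U(f, \Delta)$$
as well as
$$L(f, \Delta) \le \int_a^b f(x)\,dx \le U(f, \Delta).$$
So the numbers
$$\int_a^b f(x)\,dx \quad\text{and}\quad S(f, \Delta, \xi)$$
both belong to the interval $[L(f, \Delta), U(f, \Delta)]$ of length less than $\varepsilon$. This implies the desired inequality.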
Example 1.3.1. Consider the function $f\colon [0, 1] \to \mathbb{R}$ given by $f(x) = x^2$. Being continuous, this function is Riemann integrable by Corollary 1.2.2. Now we want to calculate the integral.
Let $N \in \mathbb{N}$ and consider the tagged subdivision $(\Delta_N, \xi_N)$ with $\Delta_N = (0, \frac{1}{N}, \frac{2}{N}, \dots, 1)$ and $\xi_N = (\frac{1}{N}, \frac{2}{N}, \dots, 1)$. Then we compute
$$S(f, \Delta_N, \xi_N) = \sum_{n=1}^N \left(\frac{n}{N}\right)^2\frac{1}{N} = \frac{1}{N^3}\sum_{n=1}^N n^2 = \frac{N(N+1)(2N+1)}{6N^3},$$
using the well-known formula for the sum of the first $N$ squares. Since $\|\Delta_N\| \to 0$ as $N \to \infty$, we have
$$\int_0^1 x^2\,dx = \lim_{N\to\infty}\frac{N(N+1)(2N+1)}{6N^3} = \frac{1}{3}.$$
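The Riemann sums of Example 1.3.1 can be evaluated exactly and compared with the closed formula. This is a minimal sketch: the function names `riemann_sum` and `example_sum` are hypothetical, and exact `Fraction` arithmetic is used so that the comparison with $\frac{N(N+1)(2N+1)}{6N^3}$ is an equality rather than an approximation.

```python
from fractions import Fraction

def riemann_sum(f, xs, tags):
    """S(f, Delta, xi) = sum over n of f(xi_n) * (x_n - x_{n-1}).

    xs = (x_0, ..., x_N) is the subdivision Delta and tags = (xi_1, ..., xi_N)
    must satisfy x_{n-1} <= xi_n <= x_n.
    """
    if len(tags) != len(xs) - 1:
        raise ValueError("need exactly one tag per interval")
    total = 0
    for n in range(1, len(xs)):
        xi = tags[n - 1]
        if not xs[n - 1] <= xi <= xs[n]:
            raise ValueError("tag outside its interval")
        total += f(xi) * (xs[n] - xs[n - 1])
    return total

def example_sum(N):
    """The tagged subdivision (Delta_N, xi_N) of Example 1.3.1 for f(x) = x^2."""
    xs = [Fraction(n, N) for n in range(N + 1)]  # Delta_N = (0, 1/N, ..., 1)
    tags = xs[1:]                                # xi_N = (1/N, 2/N, ..., 1)
    return riemann_sum(lambda x: x * x, xs, tags)
```

For instance, `example_sum(2)` returns `Fraction(5, 8)`, and `float(example_sum(1000))` is already within $10^{-3}$ of $\frac13$.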
1.4 Properties of the Integral
$$\int_a^b \bigl(f(x) + g(x)\bigr)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$$
and
$$\int_a^b \lambda f(x)\,dx = \lambda\int_a^b f(x)\,dx.$$
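$$S(f + g, \Delta, \xi) = \sum_{n=1}^N \bigl(f(\xi_n) + g(\xi_n)\bigr)(x_n - x_{n-1}) = \sum_{n=1}^N f(\xi_n)(x_n - x_{n-1}) + \sum_{n=1}^N g(\xi_n)(x_n - x_{n-1}) = S(f, \Delta, \xi) + S(g, \Delta, \xi).$$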
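(ii) Suppose first that $\lambda > 0$. Then for any subdivision $\Delta$ of $[a, b]$ with intervals $I_1, \dots, I_N$, we have
$$\inf(\lambda f)(I_n) = \lambda\inf f(I_n) \quad\text{and}\quad \sup(\lambda f)(I_n) = \lambda\sup f(I_n).$$
Hence $L(\lambda f, \Delta) = \lambda L(f, \Delta)$ and $U(\lambda f, \Delta) = \lambda U(f, \Delta)$. Taking the supremum and the infimum, respectively, we obtain
$$\underline{\int_a^b} \lambda f(x)\,dx = \lambda\,\underline{\int_a^b} f(x)\,dx \quad\text{and}\quad \overline{\int_a^b} \lambda f(x)\,dx = \lambda\,\overline{\int_a^b} f(x)\,dx.$$
Next, for $\lambda = -1$, we have $\inf(-f)(I_n) = -\sup f(I_n)$ and $\sup(-f)(I_n) = -\inf f(I_n)$, and the same argument gives
$$\underline{\int_a^b} \bigl(-f(x)\bigr)\,dx = -\overline{\int_a^b} f(x)\,dx \quad\text{and}\quad \overline{\int_a^b} \bigl(-f(x)\bigr)\,dx = -\underline{\int_a^b} f(x)\,dx.$$
If $f$ is integrable, then
$$\underline{\int_a^b} \bigl(-f(x)\bigr)\,dx = \overline{\int_a^b} \bigl(-f(x)\bigr)\,dx = -\int_a^b f(x)\,dx.$$
Finally, in the case $\lambda < 0$, we can write $\lambda = (-1)|\lambda|$, and the claim follows from the two cases already considered. The case $\lambda = 0$ is trivial.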
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx. \tag{1.1}$$
Proof. (i) Let $\varepsilon > 0$. According to Theorem 1.2.1, there exists a subdivision $\Delta$ of $[a, b]$ such that $\omega(f, \Delta) < \varepsilon$. Let $\Delta'$ be the subdivision of $[a, b]$ obtained by adding the points $c$ and $d$ to $\Delta$ (unless they already belong to $\Delta$). Then by Lemma 1.1.2,
$$\omega(f, \Delta') \le \omega(f, \Delta) < \varepsilon.$$
Say that $\Delta' = (x_0, \dots, x_N)$. Write $I_n = [x_{n-1}, x_n]$ for $n = 1, \dots, N$. There exist some numbers $K, L$ with $1 \le K \le L \le N$, such that $c = x_{K-1}$ and $d = x_L$. Then $E = (x_{K-1}, \dots, x_L)$ is a subdivision of $[c, d]$. We have
$$\omega(f, E) = \sum_{n=K}^L \omega(f, I_n)|I_n| \le \sum_{n=1}^N \omega(f, I_n)|I_n| = \omega(f, \Delta') < \varepsilon.$$
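$$\omega(f, \Delta_1) < \frac{\varepsilon}{2} \quad\text{and}\quad \omega(f, \Delta_2) < \frac{\varepsilon}{2},$$
and
$$\omega(f, \Delta) = \omega(f, \Delta_1) + \omega(f, \Delta_2) < \varepsilon. \tag{1.2}$$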
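and
$$L(f, \Delta_2) \le \int_c^b f(x)\,dx \le U(f, \Delta_2),$$
$$L(f, \Delta) \le \int_a^b f(x)\,dx \le U(f, \Delta).$$
So the numbers
$$\int_a^c f(x)\,dx + \int_c^b f(x)\,dx \quad\text{and}\quad \int_a^b f(x)\,dx$$
both belong to the interval $[L(f, \Delta), U(f, \Delta)]$ of length less than $\varepsilon$. Therefore, we have
$$\left|\int_a^c f(x)\,dx + \int_c^b f(x)\,dx - \int_a^b f(x)\,dx\right| < \varepsilon.$$
Since $\varepsilon > 0$ is arbitrary, it follows that
$$\int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^b f(x)\,dx,$$
as required.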
Notation. If $d < c$, define
$$\int_c^d f(x)\,dx = -\int_d^c f(x)\,dx.$$
Furthermore, define
$$\int_c^c f(x)\,dx = 0$$
for any $c \in [a, b]$. Then (1.1) holds for any three numbers $a, b, c$ in the domain of an integrable function.
Theorem 1.4.3. Suppose that $f\colon [a, b] \to \mathbb{R}$ is a Riemann integrable function and $g\colon \mathbb{R} \to \mathbb{R}$ is continuous. Then $g \circ f$ is Riemann integrable.
Proof. Let $m = \inf f([a, b])$ and $M = \sup f([a, b])$. Then $g$ is uniformly continuous on $[m, M]$ by the theorem of uniform continuity. Define $\ell = \min g([m, M])$ and $L = \max g([m, M])$, both of which exist by the Weierstrass extreme value theorem.
Let $\varepsilon > 0$. Then by the uniform continuity, there exists a number $\delta > 0$ such that whenever $s, t \in [m, M]$ with $|s - t| < \delta$, we have $|g(s) - g(t)| < \varepsilon$. Therefore, if $I \subset [a, b]$ is an interval with $\omega(f, I) < \delta$, it follows that $\omega(g \circ f, I) < \varepsilon$.
By Theorem 1.2.1, there exists a subdivision $\Delta$ of $[a, b]$ with $\omega(f, \Delta) < \delta\varepsilon$. Let $I_1, \dots, I_N$ be the intervals of $\Delta$, which we now divide into two categories. Let $A$ be the set comprising all indices $n \in \{1, \dots, N\}$ such that $\omega(f, I_n) < \delta$, and let $B$ comprise all $n \in \{1, \dots, N\}$ such that $\omega(f, I_n) \ge \delta$.
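Then
$$\delta\varepsilon > \omega(f, \Delta) = \sum_{n=1}^N \omega(f, I_n)|I_n| \ge \sum_{n \in B} \omega(f, I_n)|I_n| \ge \delta\sum_{n \in B} |I_n|.$$
Therefore, we have
$$\varepsilon > \sum_{n \in B} |I_n|,$$
which implies
$$\sum_{n \in B} \omega(g \circ f, I_n)|I_n| \le (L - \ell)\sum_{n \in B} |I_n| \le (L - \ell)\varepsilon.$$
Moreover,
$$\sum_{n \in A} \omega(g \circ f, I_n)|I_n| \le \varepsilon\sum_{n \in A} |I_n| \le \varepsilon(b - a).$$
Therefore,
$$\omega(g \circ f, \Delta) = \sum_{n \in A} \omega(g \circ f, I_n)|I_n| + \sum_{n \in B} \omega(g \circ f, I_n)|I_n| \le \varepsilon(b - a + L - \ell).$$
The right-hand side can be made arbitrarily small, and thus $g \circ f$ is Riemann integrable by Theorem 1.2.1.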
Corollary 1.4.1. Let $f\colon [a, b] \to \mathbb{R}$ be Riemann integrable. Then $|f|$ is Riemann integrable.
Proof. Apply Theorem 1.4.3 with $g(y) = |y|$.
Corollary 1.4.2. Let $f, g\colon [a, b] \to \mathbb{R}$ be Riemann integrable functions. Then $fg$ is Riemann integrable.
$$fg = \frac{1}{4}\bigl((f + g)^2 - (f - g)^2\bigr)$$
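Proof. (i) If $f \le g$ on $[a, b]$, then for any interval $I \subset [a, b]$, we have $\inf f(I) \le \inf g(I)$. Hence for any subdivision $\Delta$ of $[a, b]$,
$$L(f, \Delta) \le L(g, \Delta).$$
Taking the supremum, we obtain
$$\underline{\int_a^b} f(x)\,dx \le \underline{\int_a^b} g(x)\,dx.$$
Since we have Riemann integrable functions, this implies the desired inequality.
(ii) If
$$\int_a^b f(x)\,dx \ge 0,$$
then we use the fact that $f \le |f|$ on $[a, b]$, together with part (i). The conclusion is then that
$$\left|\int_a^b f(x)\,dx\right| = \int_a^b f(x)\,dx \le \int_a^b |f(x)|\,dx.$$
If
$$\int_a^b f(x)\,dx < 0,$$
then we use $-f \le |f|$ instead and obtain $\left|\int_a^b f(x)\,dx\right| = \int_a^b \bigl(-f(x)\bigr)\,dx \le \int_a^b |f(x)|\,dx$ in this case.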
1.5 The Fundamental Theorem of Calculus
This is the section where we draw the link between the two problems at the
beginning of the chapter. So far we have calculated areas under a curve.
Now we find a connection with differentiation.
Theorem 1.5.1 (First Fundamental Theorem of Calculus). Suppose that $f\colon [a, b] \to \mathbb{R}$ is Riemann integrable and $F\colon [a, b] \to \mathbb{R}$ is continuous on $[a, b]$ and differentiable on $(a, b)$ with $F' = f$. Then
$$\int_a^b f(x)\,dx = F(b) - F(a).$$
$$\sum_{n=1}^N f(\xi_n)(x_n - x_{n-1}) = \sum_{n=1}^N \bigl(F(x_n) - F(x_{n-1})\bigr) = F(b) - F(a).$$
That is, we have
$$\lim_{x\to c}\frac{F(x) - F(c)}{x - c} = f(c),$$
which is exactly what we have to prove.
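Theorem 1.5.1 suggests a simple numerical check. Riemann sums of an integrable $f$ approach $\int_a^b f(x)\,dx$ as the mesh tends to $0$ (Section 1.3), so they should also approach $F(b) - F(a)$ for a primitive $F$. The sketch below is an illustration only. The helper name and the choice of midpoint tags are assumptions, not part of the notes.

```python
import math

def midpoint_riemann_sum(f, a, b, N):
    """Riemann sum S(f, Delta, xi) for the uniform subdivision of [a, b]
    into N intervals, with each tag xi_n at the midpoint of I_n."""
    h = (b - a) / N
    return h * sum(f(a + (n - 0.5) * h) for n in range(1, N + 1))

# f = cos has the continuous primitive F = sin, so Theorem 1.5.1 predicts
# that these Riemann sums approach sin(2) - sin(0).
approx = midpoint_riemann_sum(math.cos, 0.0, 2.0, 1000)
exact = math.sin(2.0) - math.sin(0.0)
```

Midpoint tags are used only because they converge quickly. By Section 1.3 any choice of tags converges to the same limit.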
for all $x \in I$.
Remark 1.5.2. The second statement means that primitives are unique up
to a constant.
Proof. (i) Fix $x_0 \in I$ and define
$$G(x) = \int_{x_0}^x f(t)\,dt.$$
Then for any $x > x_0$, Theorem 1.5.2 implies that $G'(x) = f(x)$, as $f$ is continuous.
Now consider $x \le x_0$. Choose a point $x_1 \in I$ with $x_1 < x$ (which exists as $I$ is open). Then
$$G(x) = \int_{x_1}^x f(t)\,dt - \int_{x_1}^{x_0} f(t)\,dt$$
for all $x \in I$.
1.6 Integration Techniques
Define $F(x) = \frac{x^{n+1}}{n+1}$ and check that $F'(x) = x^n$ for all $x \in (a, b)$. Furthermore, this function is continuous on $[a, b]$. So by Theorem 1.5.1,
$$\int_a^b x^n\,dx = F(b) - F(a) = \frac{b^{n+1}}{n+1} - \frac{a^{n+1}}{n+1}.$$
We have differentiation rules for products and compositions, and these
give rise to similar rules for integrals.
Theorem 1.6.1 (Integration by Parts). Let $f, g\colon [a, b] \to \mathbb{R}$ be Riemann integrable functions. Suppose that $F, G\colon [a, b] \to \mathbb{R}$ are continuous functions that are primitives of $f$ and $g$, respectively, in $(a, b)$. Then
$$\int_a^b f(x)G(x)\,dx + \int_a^b F(x)g(x)\,dx = F(b)G(b) - F(a)G(a).$$
u(b)
and
1.7 Exchanging Integrals with Limits
as $k \to \infty$.
Remark 1.7.1. The conclusion of the theorem can be written in the form
$$\lim_{k\to\infty}\int_a^b f_k(x)\,dx = \int_a^b \lim_{k\to\infty} f_k(x)\,dx.$$
It follows that
$$\omega(f, \Delta) = \sum_{n=1}^N \omega(f, I_n)|I_n| \le \sum_{n=1}^N \omega(f_K, I_n)|I_n| + 2\varepsilon\sum_{n=1}^N |I_n|$$
$$\int_a^b \sum_{k=1}^\infty f_k(x)\,dx = \sum_{k=1}^\infty \int_a^b f_k(x)\,dx.$$
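Corollary 1.7.2. Suppose that $\sum_{k=0}^\infty \alpha_k x^k$ is a power series with radius of convergence $R \in (0, \infty]$ and $-R < a < b < R$. Then
$$\int_a^b \sum_{k=0}^\infty \alpha_k x^k\,dx = \sum_{k=0}^\infty \frac{\alpha_k}{k+1}\bigl(b^{k+1} - a^{k+1}\bigr).$$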
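Corollary 1.7.2 can be checked numerically on the geometric series $\sum_{k=0}^\infty x^k = \frac{1}{1-x}$. This series has radius of convergence $R = 1$, and $\int_a^b \frac{dx}{1-x} = \log(1-a) - \log(1-b)$ for $-1 < a < b < 1$. In this sketch the function name is hypothetical. The series is truncated after finitely many terms, so the result is only an approximation.

```python
import math

def integrate_power_series(coeffs, a, b):
    """Term-by-term integral of sum_k alpha_k x^k over [a, b], truncated
    to the given coefficients alpha_0, ..., alpha_K:
    sum_k alpha_k / (k + 1) * (b**(k + 1) - a**(k + 1))."""
    return sum(alpha / (k + 1) * (b ** (k + 1) - a ** (k + 1))
               for k, alpha in enumerate(coeffs))

geometric = [1.0] * 60                          # alpha_k = 1 for k = 0, ..., 59
approx = integrate_power_series(geometric, 0.0, 0.5)
exact = math.log(1 - 0.0) - math.log(1 - 0.5)   # = log 2
```

On $[-\frac12, \frac12]$ the same truncated series gives $\log(3/2) - \log(1/2) = \log 3$.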
Proof. Let $y_0 = \lim_{k\to\infty} f_k(x_0)$ and let $g\colon (a, b) \to \mathbb{R}$ be the uniform limit of $f_k'$. Then $g$ is continuous, being the uniform limit of continuous functions. Define
$$f(x) = y_0 + \int_{x_0}^x g(t)\,dt, \quad x \in (a, b).$$
Then
$f'$
1.8 Improper Integrals
If we have an unbounded interval or an unbounded function, then the previous theory does not apply. When we think in terms of area under a curve, this means that we have only discussed bounded regions in $\mathbb{R}^2$ so far. But sometimes it is appropriate to assign an area to an unbounded region. This can often be done by taking a limit.
There are two distinct situations that we consider.
(i) Suppose that $f\colon [a, b] \to \mathbb{R}$ is unbounded, but is Riemann integrable (and in particular bounded) on $[c, b]$ for any $c \in (a, b)$ and the limit
$$\lim_{c\to a^+}\int_c^b f(x)\,dx$$
exists. Then we define
$$\int_a^b f(x)\,dx = \lim_{c\to a^+}\int_c^b f(x)\,dx.$$
$$\int_a^\infty f(x)\,dx = \lim_{c\to\infty}\int_a^c f(x)\,dx.$$
$$\int_{-\infty}^b f(x)\,dx = \lim_{c\to-\infty}\int_c^b f(x)\,dx,$$
We compute
$$\int_c^1 \frac{dx}{\sqrt{x}} = 2 - 2c^{1/2} \to 2$$
as $c \to 0^+$. Hence
$$\int_0^1 \frac{dx}{\sqrt{x}} = 2.$$
We have
$$\int_1^c \frac{dx}{x} = \log c - \log 1 = \log c.$$
$$\int_K^\infty f(x)\,dx < \varepsilon.$$
$$\sum_{k=1}^\infty k^{-s}.$$
We conclude that
$$\sum_{k=1}^\infty \frac{1}{k} = \infty.$$
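$$\int_1^\infty x^{-s}\,dx = \lim_{c\to\infty}\frac{c^{-s+1} - 1}{-s+1} = \begin{cases} \dfrac{1}{s-1} & \text{if } s > 1, \\ \infty & \text{if } s < 1. \end{cases}$$
Hence we have
$$\sum_{k=1}^\infty k^{-s} = \infty \quad\text{if } s \le 1 \qquad\text{and}\qquad \sum_{k=1}^\infty k^{-s} < \infty \quad\text{if } s > 1.$$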
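The dichotomy can be observed on partial sums. For the decreasing function $f(x) = x^{-s}$, comparison with the integral gives the standard bounds $\sum_{k=1}^K k^{-s} \le 1 + \frac{1}{s-1}$ for $s > 1$ and $\sum_{k=1}^K \frac{1}{k} \ge \log(K+1)$. The sketch below uses a hypothetical helper name and only checks these bounds for a few values. It illustrates the result and proves nothing.

```python
import math

def partial_sum(s, K):
    """Partial sum sum_{k=1}^{K} k^(-s) of the series in the text."""
    return sum(k ** (-s) for k in range(1, K + 1))

# s > 1: the partial sums stay below 1 + integral_1^infinity x^(-s) dx.
for s in (1.5, 2.0, 3.0):
    assert partial_sum(s, 10_000) <= 1 + 1 / (s - 1)

# s = 1: the partial sums grow at least like log(K + 1), hence without bound.
for K in (10, 1_000, 100_000):
    assert partial_sum(1, K) >= math.log(K + 1)
```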
Chapter 2
2.1
$$\sum_{n=1}^N x_n y_n = y^T x$$
2.2 Convergence
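$x_0^{(n)} = \lim_{k\to\infty} x_k^{(n)}$ for every $n = 1, \dots, N$.
$$\left|x_k^{(n)} - x_0^{(n)}\right| \le \left(\sum_{m=1}^N \left(x_k^{(m)} - x_0^{(m)}\right)^2\right)^{1/2} = \|x_k - x_0\| \to 0.$$
So $x_k^{(n)} \to x_0^{(n)}$ as $k \to \infty$.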
Conversely, suppose that $x_0^{(n)} = \lim_{k\to\infty} x_k^{(n)}$ for $n = 1, \dots, N$. Then
$$\|x_k - x_0\| = \left(\sum_{n=1}^N \left(x_k^{(n)} - x_0^{(n)}\right)^2\right)^{1/2} \to 0$$
2.3
2.4 Continuity
$\frac{1}{k}$,
Then the sequence $(x_k)_{k\in\mathbb{N}}$ evidently converges to $x_0$, but the sequence $(f(x_k))_{k\in\mathbb{N}}$ does not converge to $f(x_0)$.
Corollary 2.4.1. Let $S \subset \mathbb{R}^N$ and $x_0 \in S$.
(i) If $f, g\colon S \to \mathbb{R}^M$ are both continuous at $x_0$, then $f + g$ is continuous at $x_0$.
(ii) If $f\colon S \to \mathbb{R}^M$ is continuous at $x_0$ and $\lambda\colon S \to \mathbb{R}$ is continuous at $x_0$, then $\lambda f$ is continuous at $x_0$.
Proof. Combine Theorem 2.4.1 with Lemma 2.2.2.
Theorem 2.4.2. Let $A \subset \mathbb{R}^N$ and $B \subset \mathbb{R}^M$ and let $x_0 \in A$. Suppose that $f\colon A \to \mathbb{R}^M$ and $g\colon B \to \mathbb{R}^K$ are functions with $f(A) \subset B$, such that $f$ is continuous at $x_0$ and $g$ is continuous at $f(x_0)$. Then $g \circ f$ is continuous at $x_0$.
Proof. First note that $g \circ f$ is well-defined by the assumption $f(A) \subset B$.
Let $\varepsilon > 0$. By the continuity of $g$ at $f(x_0)$, there exists a number $\eta > 0$ such that $\|g(y) - g(f(x_0))\| < \varepsilon$ for all $y \in B$ with $\|y - f(x_0)\| < \eta$. By the continuity of $f$ at $x_0$, there exists a number $\delta > 0$ such that $\|f(x) - f(x_0)\| < \eta$ for all $x \in A$ with $\|x - x_0\| < \delta$.
Now if $x \in A$ and $\|x - x_0\| < \delta$, it follows that $\|g(f(x)) - g(f(x_0))\| < \varepsilon$. Hence $g \circ f$ is continuous at $x_0$.
Theorem 2.4.3 (Weierstrass Extreme Value Theorem). Let $C \subset \mathbb{R}^N$ be non-empty and compact. Then any continuous function $f\colon C \to \mathbb{R}$ is bounded and attains its infimum and supremum.
Proof. Let $\sigma = \sup f(C) \in (-\infty, \infty]$. Then we can choose a sequence $(\sigma_k)_{k\in\mathbb{N}}$ in $f(C)$ such that $\sigma = \lim_{k\to\infty}\sigma_k$. For each $k \in \mathbb{N}$, choose a point $x_k \in C$ with $f(x_k) = \sigma_k$.
By Corollary 2.3.1, there exists a convergent subsequence $(x_{k_j})_{j\in\mathbb{N}}$ converging to a point of $C$, say $x_{k_j} \to x_0 \in C$ as $j \to \infty$. By the continuity of $f$, we now have
$$f(x_0) = \lim_{j\to\infty} f(x_{k_j}) = \lim_{j\to\infty}\sigma_{k_j} = \lim_{k\to\infty}\sigma_k = \sup f(C).$$
2.5 Norms
Everything that we have done so far in this chapter is based on the idea that we can measure distances in $\mathbb{R}^N$ in terms of the Euclidean norm $\|\cdot\|$. This concept can be generalised.
Definition 2.5.1. A norm on a real vector space $V$ is a map $\|\cdot\|_V\colon V \to \mathbb{R}$ such that
(i) $\|x\|_V \ge 0$ for all $x \in V$ with equality if, and only if, $x = 0$,
(ii) $\|\lambda x\|_V = |\lambda|\,\|x\|_V$ for all $x \in V$ and all $\lambda \in \mathbb{R}$, and
(iii) $\|x + y\|_V \le \|x\|_V + \|y\|_V$ for all $x, y \in V$.
$$\sum_{n=1}^N |x_n|$$
for $x = (x_1, \dots, x_N)^T$.
Given a real vector space $V$ with a norm $\|\cdot\|_V$, we can define convergence, balls, and open sets in $V$ analogously to the corresponding concepts in $\mathbb{R}^N$, simply replacing the Euclidean norm by $\|\cdot\|_V$ everywhere. If we have two real vector spaces $V$ and $W$ with norms $\|\cdot\|_V$ and $\|\cdot\|_W$, respectively, then we can also define continuity of a map $f\colon V \to W$.
Definition 2.5.2. Two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on a real vector space $V$ are equivalent if there exists a number $C \ge 0$ such that $\|x\|_1 \le C\|x\|_2$ and $\|x\|_2 \le C\|x\|_1$ for all $x \in V$.
Proposition 2.5.1. Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two equivalent norms on the real vector space $V$. Then a sequence $(x_k)_{k\in\mathbb{N}}$ in $V$ converges to a limit $x_0 \in V$ with respect to the norm $\|\cdot\|_1$ if, and only if, it converges to $x_0$ with respect to $\|\cdot\|_2$.
Proof. Convergence with respect to $\|\cdot\|_1$ means that $\|x_k - x_0\|_1 \to 0$ as $k \to \infty$. But then $\|x_k - x_0\|_2 \le C\|x_k - x_0\|_1 \to 0$ as well, so we have convergence with respect to $\|\cdot\|_2$. The arguments for the converse are the same.
Remark 2.5.1. So equivalent norms give rise to the same notion of convergence. It follows that they also give rise to the same continuous functions.
Theorem 2.5.1. Any two norms on $\mathbb{R}^N$ are equivalent.
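Proof. It suffices to show that any norm $\|\cdot\|_*$ on $\mathbb{R}^N$ is equivalent to the Euclidean norm $\|\cdot\|$. Let $(e_1, \dots, e_N)$ be the standard basis in $\mathbb{R}^N$. Then for $x = (x_1, \dots, x_N)^T \in \mathbb{R}^N$, we have
$$\|x\|_* = \Bigl\|\sum_{n=1}^N x_n e_n\Bigr\|_* \le \sum_{n=1}^N \|x_n e_n\|_* = \sum_{n=1}^N |x_n|\,\|e_n\|_* \le \Bigl(\sum_{n=1}^N x_n^2\Bigr)^{1/2}\Bigl(\sum_{n=1}^N \|e_n\|_*^2\Bigr)^{1/2}$$
by the triangle inequality, property (ii) in Definition 2.5.1, and the Cauchy-Schwarz inequality. Setting $C_1 = \bigl(\sum_{n=1}^N \|e_n\|_*^2\bigr)^{1/2}$, we obtain
$$\|x\|_* \le C_1\|x\|. \tag{2.1}$$
For $x, y \in \mathbb{R}^N$, we therefore have
$$\bigl|\|x\|_* - \|y\|_*\bigr| \le \|x - y\|_* \le C_1\|x - y\|$$
by the triangle inequality and the first part of this proof (cf. Exercise 6.3). Hence $\|\cdot\|_*$ is a continuous function with respect to the Euclidean norm.
Let
$$S = \{x \in \mathbb{R}^N : \|x\| = 1\},$$
which is a closed and bounded set with respect to $\|\cdot\|$. It follows from Weierstrass's extreme value theorem (Theorem 2.4.3) that there exists a point $x_0 \in S$ such that $\|x\|_* \ge \|x_0\|_*$ for all $x \in S$. Since $x_0 \ne 0$, we have $\|x_0\|_* > 0$; let $C_2 = \frac{1}{\|x_0\|_*}$. Then for any $x \ne 0$, we have
$$\frac{x}{\|x\|} \in S.$$
Hence
$$\frac{1}{C_2} \le \Bigl\|\frac{x}{\|x\|}\Bigr\|_* = \frac{\|x\|_*}{\|x\|}.$$
That is,
$$\|x\| \le C_2\|x\|_*. \tag{2.2}$$
This inequality is trivial for $x = 0$.
The equivalence of the norms now follows from the two inequalities (2.1) and (2.2).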
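Consider the norm $x \mapsto \sum_{n=1}^N |x_n|$ mentioned earlier in this section. For this norm the proof gives $C_1 = (\sum_{n=1}^N 1^2)^{1/2} = \sqrt{N}$ in (2.1). Its minimum on the Euclidean unit sphere is $1$ (attained at $e_1$), so $C_2 = 1$ in (2.2). The following sketch checks the resulting inequalities on random vectors. The function names are hypothetical, and random sampling illustrates the inequalities but does not prove them.

```python
import math
import random

def euclidean_norm(x):
    return math.sqrt(sum(t * t for t in x))

def sum_norm(x):
    """The norm x -> sum_n |x_n| on R^N."""
    return sum(abs(t) for t in x)

def check_equivalence(N, trials=1000, seed=0):
    """Test ||x|| <= sum_norm(x) <= sqrt(N) * ||x|| on random x in R^N.

    Here C_1 = sqrt(N) from (2.1) and C_2 = 1 from (2.2); a small
    tolerance absorbs floating-point rounding."""
    rng = random.Random(seed)
    C1, C2 = math.sqrt(N), 1.0
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(N)]
        e, s = euclidean_norm(x), sum_norm(x)
        if not (e <= C2 * s + 1e-12 and s <= C1 * e + 1e-12):
            return False
    return True
```

The vector $(1, \dots, 1)^T$ shows that $C_1 = \sqrt{N}$ cannot be improved for this norm, since there $\sum_n |x_n| = N = \sqrt{N}\,\|x\|$.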
The following is another example of a norm. We will use this specific
norm later.
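Definition 2.5.3. Let $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ denote the space of all linear maps $A\colon \mathbb{R}^N \to \mathbb{R}^M$. The operator norm is the norm $\|\cdot\|$ on $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ defined by
$$\|A\| = \sup\{\|Ax\| : x \in \mathbb{R}^N \text{ with } \|x\| \le 1\}.$$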
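Remark 2.5.2. For any $A \in \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$, the operator norm $\|A\|$ is finite. Indeed, if $(a_{mn})_{m,n}$ is the transformation matrix of $A$ with respect to the standard basis, then for all $x \in \mathbb{R}^N$, we have
$$\|Ax\| = \left(\sum_{m=1}^M\left(\sum_{n=1}^N a_{mn}x_n\right)^2\right)^{1/2} \le \left(\sum_{m=1}^M\left(\sum_{n=1}^N a_{mn}^2\right)\left(\sum_{n=1}^N x_n^2\right)\right)^{1/2} = \left(\sum_{m=1}^M\sum_{n=1}^N a_{mn}^2\right)^{1/2}\|x\|$$
by the Cauchy-Schwarz inequality. Hence $\|A\| \le \left(\sum_{m=1}^M\sum_{n=1}^N a_{mn}^2\right)^{1/2} < \infty$.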
Proposition 2.5.2.
(i) If $A \in \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ and $x \in \mathbb{R}^N$, then
$$\|Ax\| \le \|A\|\,\|x\|.$$
2.6 Derivatives
$$\lim_{x\to x_0}\frac{f(x) - f(x_0)}{x - x_0}.$$
[Figure: the graph $y = f(x)$ and the tangent line $y = f(x_0) + (x - x_0)f'(x_0)$ at $x_0$]
$$\lim_{x\to x_0}\frac{f(x) - f(x_0) - (x - x_0)f'(x_0)}{x - x_0} = \lim_{x\to x_0}\left(\frac{f(x) - f(x_0)}{x - x_0} - f'(x_0)\right) = 0,$$
so
$$\lim_{x\to x_0}\frac{g(x)}{h(x)} = 0.$$
as $x \to x_0$.
Since the definition of the derivative involves a linear map, the following
information is useful.
Lemma 2.6.1. Any linear map $A\colon \mathbb{R}^N \to \mathbb{R}^M$ is continuous.
$$(Ax)_m = \sum_{n=1}^N a_{mn}x_n, \quad m = 1, \dots, M.$$
Thus the claim follows from Lemma 2.2.1 and Theorem 2.4.1.
Proposition 2.6.1. Let $f\colon \Omega \to \mathbb{R}^M$ be a map that is differentiable at $x_0$. Then $f$ is continuous at $x_0$.
Proof. We have
$$f(x) - f(x_0) = Df(x_0)(x - x_0) + o(\|x - x_0\|) \to 0$$
as $x \to x_0$.
$$\frac{\partial f}{\partial x_n}(x_0)$$
$$\frac{f(x_0 + he_n) - f(x_0)}{h} - Df(x_0)e_n = \frac{f(x_0 + he_n) - f(x_0) - Df(x_0)he_n}{h} \to 0$$
as $h \to 0$. Hence we obtain
$$\frac{\partial f_m}{\partial x_n}(x_0) = a_{mn}$$
when taking the limit.
Remark 2.6.3. This lemma implies that the derivative $Df(x_0)$, if it exists, is unique.
Remark 2.6.4. If $f$ is differentiable at $x_0$, then for any $v \in \mathbb{R}^N$, we can show with the arguments from Lemma 2.6.2 that $D_v f(x_0) = Df(x_0)v$.
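Theorem 2.6.1. Let $x_0 \in \Omega$. If $f\colon \Omega \to \mathbb{R}^M$ is a map such that all the partial derivatives $\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_N}$ exist throughout $\Omega$ and are continuous at $x_0$, then $f$ is differentiable at $x_0$.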
Proof. We first consider the case M = 1.
Let $r > 0$ such that $B_r(x_0) \subset \Omega$. Let $h = (h_1, \dots, h_N)^T \in B_r(0)$. Consider the function $t \mapsto f(x_0 + te_1)$, which is differentiable in $(-r, r)$ with derivative
$$\frac{\partial f}{\partial x_1}(x_0 + te_1).$$
By the mean value theorem, there exists a number $\theta_1 \in \mathbb{R}$ with $|\theta_1| \le |h_1|$ such that
$$f(x_0 + h_1e_1) = f(x_0) + h_1\frac{\partial f}{\partial x_1}(x_0 + \theta_1e_1).$$
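Similarly, there exists a number $\theta_2 \in \mathbb{R}$ with $|\theta_2| \le |h_2|$ such that
$$f(x_0 + h_1e_1 + h_2e_2) = f(x_0 + h_1e_1) + h_2\frac{\partial f}{\partial x_2}(x_0 + h_1e_1 + \theta_2e_2) = f(x_0) + h_1\frac{\partial f}{\partial x_1}(x_0 + \theta_1e_1) + h_2\frac{\partial f}{\partial x_2}(x_0 + h_1e_1 + \theta_2e_2).$$
Continuing in this way, we obtain
$$f(x_0 + h) = f(x_0) + \sum_{n=1}^N h_n\frac{\partial f}{\partial x_n}(x_0 + h_1e_1 + \cdots + h_{n-1}e_{n-1} + \theta_ne_n).$$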
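Setting
$$b_n = \frac{\partial f}{\partial x_n}(x_0 + h_1e_1 + \cdots + h_{n-1}e_{n-1} + \theta_ne_n) - \frac{\partial f}{\partial x_n}(x_0),$$
we obtain
$$f(x_0 + h) = f(x_0) + J_f(x_0)h + \sum_{n=1}^N b_nh_n. \tag{2.3}$$
Since the partial derivatives are continuous at $x_0$ and $|\theta_n| \le |h_n|$, we have $b_n \to 0$ as $h \to 0$. As $\bigl|\sum_{n=1}^N b_nh_n\bigr| \le \bigl(\sum_{n=1}^N b_n^2\bigr)^{1/2}\|h\|$, it follows that
$$\sum_{n=1}^N b_nh_n = o(\|h\|).$$
So (2.3) implies that $Df(x_0)$ exists and is represented by the matrix $J_f(x_0)$.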
If $M \ge 2$, then we apply these arguments to every component of $f$. The claim then follows in this case as well.
Definition 2.6.5. A map $f\colon \Omega \to \mathbb{R}^M$ is called continuously differentiable if it is differentiable throughout $\Omega$ and the map $Df\colon \Omega \to \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ is continuous.
Here continuity is meant with respect to the operator norm on the space $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$. However, it follows from Lemma 2.2.1, Theorem 2.4.1, Theorem 2.6.1, and the equivalence of all norms on $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ that $f$ is continuously differentiable if, and only if, all partial derivatives exist and are continuous in $\Omega$.
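Example 2.6.1. Let $f\colon \mathbb{R}^3 \to \mathbb{R}^2$ with
$$f(x) = \begin{pmatrix} x_1^2 + x_2^2x_3 \\ x_1x_2x_3 \end{pmatrix}$$
for $x \in \mathbb{R}^3$. We compute
$$J_f(x) = \begin{pmatrix} 2x_1 & 2x_2x_3 & x_2^2 \\ x_2x_3 & x_1x_3 & x_1x_2 \end{pmatrix}.$$
All of these expressions give rise to continuous functions, hence $f$ is continuously differentiable.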
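Lemma 2.6.2 identifies the entries of the Jacobi matrix with partial derivatives, i.e. with limits of difference quotients. The code below compares $J_f(x)$ from Example 2.6.1 with symmetric difference quotients $\frac{f(x + he_n) - f(x - he_n)}{2h}$. It is an illustrative sketch, and the function names are hypothetical. The symmetric quotient is a numerical choice made here rather than the one-sided quotient from the definition; it converges to the same limit.

```python
def f(x):
    x1, x2, x3 = x
    return [x1 ** 2 + x2 ** 2 * x3, x1 * x2 * x3]

def jacobian_exact(x):
    x1, x2, x3 = x
    return [[2 * x1, 2 * x2 * x3, x2 ** 2],
            [x2 * x3, x1 * x3, x1 * x2]]

def jacobian_fd(func, x, h=1e-6):
    """Approximate J_f(x) column by column: column n is
    (func(x + h e_n) - func(x - h e_n)) / (2h)."""
    M, N = len(func(x)), len(x)
    J = [[0.0] * N for _ in range(M)]
    for n in range(N):
        xp = list(x)
        xp[n] += h
        xm = list(x)
        xm[n] -= h
        fp, fm = func(xp), func(xm)
        for m in range(M):
            J[m][n] = (fp[m] - fm[m]) / (2 * h)
    return J
```

At $x = (1, 2, 3)^T$ the exact matrix has rows $(2, 12, 4)$ and $(6, 3, 2)$, and the difference quotients agree with it to about $10^{-5}$.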
Theorem 2.6.2 (Chain Rule). Let $U \subset \mathbb{R}^N$ and $V \subset \mathbb{R}^M$ be open sets. Let $f\colon U \to \mathbb{R}^M$ and $g\colon V \to \mathbb{R}^K$ be maps and suppose that $f(U) \subset V$. Let $x_0 \in U$. If $f$ is differentiable at $x_0$ and $g$ is differentiable at $f(x_0)$, then $g \circ f$ is differentiable at $x_0$ with
$$D(g \circ f)(x_0) = Dg(f(x_0))\,Df(x_0).$$
Proof. Define
$$\varphi(x) = f(x) - f(x_0) - Df(x_0)(x - x_0)$$
for $x \in U$ and
$$\psi(y) = g(y) - g(f(x_0)) - Dg(f(x_0))(y - f(x_0))$$
for $y \in V$. Then
$$\lim_{x\to x_0}\frac{\varphi(x)}{\|x - x_0\|} = 0 \tag{2.4}$$
and
$$\lim_{y\to f(x_0)}\frac{\psi(y)}{\|y - f(x_0)\|} = 0. \tag{2.5}$$
[Figure 2.6.2: A level set of $f$ and the gradient $\nabla f(x_0)$ perpendicular to it.]
$$0 = h'(0) = Df(x_0)\gamma'(0) = \langle\nabla f(x_0), \gamma'(0)\rangle.$$
(It is possible, however, that $S$ does not have any tangent vectors at $x_0$ except $0$.)
Notation. If $x, y \in \mathbb{R}^N$, then we write $[x, y] = \{(1 - t)x + ty : 0 \le t \le 1\}$ for the line segment connecting $x$ and $y$.
$0 \le t \le 1$.
Then by Theorem 2.4.2, the function $g$ is continuous on $[0, 1]$ and by Theorem 2.6.2, it is differentiable in $(0, 1)$ with
$$g'(t) = \langle v, Df((1 - t)x + ty)(y - x)\rangle.$$
By the mean value theorem, there exists a number $\tau \in (0, 1)$ such that
$$g(1) - g(0) = g'(\tau),$$
i.e.,
$$\langle v, f(y) - f(x)\rangle = \langle v, Df((1 - \tau)x + \tau y)(y - x)\rangle \le K\|v\|\,\|x - y\|.$$
Choose $v = f(y) - f(x)$, then we obtain
$$\|f(y) - f(x)\|^2 \le K\|f(y) - f(x)\|\,\|x - y\|.$$
If $f(x) = f(y)$, then there is nothing to prove. Otherwise, we divide by $\|f(y) - f(x)\|$ on both sides to obtain the desired inequality.
Definition 2.6.6. Suppose that $x_0 \in \Omega$. Let $f\colon \Omega \to \mathbb{R}$ be a function. If $f$ is differentiable at $x_0$ and $Df(x_0) = 0$, then $x_0$ is called a critical point (or stationary point) of $f$.
If there exists a number $r > 0$ such that $f(x_0) \le f(x)$ for all $x \in B_r(x_0)$, then $x_0$ is called a local minimum point of $f$. If there exists a number $r > 0$ such that $f(x_0) \ge f(x)$ for all $x \in B_r(x_0)$, then $x_0$ is called a local maximum point of $f$. If $x_0$ is a critical point of $f$ and neither a local minimum point nor a local maximum point, then it is called a saddle point of $f$.
Proposition 2.6.3. Suppose that $f\colon \Omega \to \mathbb{R}$ is a function. If $x_0$ is a local minimum point or local maximum point of $f$ and $f$ is differentiable at $x_0$, then $x_0$ is a critical point of $f$.
Proof. Choose $r > 0$ such that $B_r(x_0) \subset \Omega$. Fix $n \in \{1, \dots, N\}$ and let $e_n$ be the $n$-th standard unit vector in $\mathbb{R}^N$. Consider the function
$$g(t) = f(x_0 + te_n), \quad t \in (-r, r).$$
2.7
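$$\left(\frac{\partial^2 f}{\partial x_i\,\partial x_j}(x_0)\right)_{i,j}$$
$$g_1(s) = f(x_0 + se_i + he_j) - f(x_0 + se_i), \quad 0 \le s \le h,$$
$$g_2(t) = f(x_0 + he_i + te_j) - f(x_0 + te_j), \quad 0 \le t \le h.$$
Note that
$$g_1'(s) = \frac{\partial f}{\partial x_i}(x_0 + se_i + he_j) - \frac{\partial f}{\partial x_i}(x_0 + se_i).$$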
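By the mean value theorem, there exists a number $\sigma_1 \in (0, h)$ such that $g_1(h) - g_1(0) = hg_1'(\sigma_1)$. That is,
$$f(x_0 + he_i + he_j) - f(x_0 + he_i) - f(x_0 + he_j) + f(x_0) = h\left(\frac{\partial f}{\partial x_i}(x_0 + \sigma_1e_i + he_j) - \frac{\partial f}{\partial x_i}(x_0 + \sigma_1e_i)\right).$$
Applying the mean value theorem to the function $t \mapsto \frac{\partial f}{\partial x_i}(x_0 + \sigma_1e_i + te_j)$, we find a number $\tau_1 \in (0, h)$ such that
$$\frac{\partial f}{\partial x_i}(x_0 + \sigma_1e_i + he_j) - \frac{\partial f}{\partial x_i}(x_0 + \sigma_1e_i) = h\frac{\partial^2 f}{\partial x_j\,\partial x_i}(x_0 + \sigma_1e_i + \tau_1e_j).$$
That is,
$$\frac{1}{h^2}\bigl(f(x_0 + he_i + he_j) - f(x_0 + he_i) - f(x_0 + he_j) + f(x_0)\bigr) = \frac{\partial^2 f}{\partial x_j\,\partial x_i}(x_0 + \sigma_1e_i + \tau_1e_j).$$
Replacing $g_1$ with $g_2$ and applying the same arguments, we find $\sigma_2, \tau_2 \in (0, h)$ such that
$$\frac{1}{h^2}\bigl(f(x_0 + he_i + he_j) - f(x_0 + he_i) - f(x_0 + he_j) + f(x_0)\bigr) = \frac{\partial^2 f}{\partial x_i\,\partial x_j}(x_0 + \sigma_2e_i + \tau_2e_j).$$
Hence
$$\frac{\partial^2 f}{\partial x_j\,\partial x_i}(x_0 + \sigma_1e_i + \tau_1e_j) = \frac{\partial^2 f}{\partial x_i\,\partial x_j}(x_0 + \sigma_2e_i + \tau_2e_j). \tag{2.6}$$
$$\left|\frac{\partial^2 f}{\partial x_j\,\partial x_i}(x_0 + \sigma_1e_i + \tau_1e_j) - \frac{\partial^2 f}{\partial x_j\,\partial x_i}(x_0)\right| < \varepsilon$$
and
$$\left|\frac{\partial^2 f}{\partial x_i\,\partial x_j}(x_0 + \sigma_2e_i + \tau_2e_j) - \frac{\partial^2 f}{\partial x_i\,\partial x_j}(x_0)\right| < \varepsilon.$$
By (2.6) and the triangle inequality, it follows that
$$\left|\frac{\partial^2 f}{\partial x_j\,\partial x_i}(x_0) - \frac{\partial^2 f}{\partial x_i\,\partial x_j}(x_0)\right| < 2\varepsilon.$$
Since $\varepsilon$ was chosen arbitrarily, this concludes the proof.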
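The proof rests on the second difference quotient $\frac{1}{h^2}\bigl(f(x_0 + he_i + he_j) - f(x_0 + he_i) - f(x_0 + he_j) + f(x_0)\bigr)$. This quotient is unchanged when $i$ and $j$ are swapped, and it tends to both mixed partial derivatives. The sketch below evaluates it for $f(x, y) = x^2y^3$ at $(1, 2)$, where $\frac{\partial^2 f}{\partial x\,\partial y}(1, 2) = 6xy^2 = 24$. The function names are hypothetical.

```python
def mixed_difference_quotient(f, x0, i, j, h):
    """(f(x0 + h e_i + h e_j) - f(x0 + h e_i) - f(x0 + h e_j) + f(x0)) / h^2."""
    def shifted(*indices):
        # evaluate f at x0 + h * (sum of e_k for k in indices)
        x = list(x0)
        for k in indices:
            x[k] += h
        return f(x)
    return (shifted(i, j) - shifted(i) - shifted(j) + shifted()) / h ** 2

def f_example(x):
    return x[0] ** 2 * x[1] ** 3

q_xy = mixed_difference_quotient(f_example, [1.0, 2.0], 0, 1, 1e-4)
q_yx = mixed_difference_quotient(f_example, [1.0, 2.0], 1, 0, 1e-4)
```

Both values are approximately $24.0012$, which agrees with $24$ up to an error of order $h$. Any difference between `q_xy` and `q_yx` comes only from floating-point rounding.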
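and $\alpha! = \alpha_1!\cdots\alpha_N!$.
$$\frac{\partial^{|\alpha|} f}{\partial x^\alpha} = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1}\cdots\partial x_N^{\alpha_N}},$$
provided that this partial derivative exists.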
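Theorem 2.7.2 (Taylor's Theorem). Suppose that $f\colon \Omega \to \mathbb{R}$ has continuous partial derivatives up to order $m$ throughout $\Omega$. Let $x, y \in \Omega$ such that $[x, y] \subset \Omega$. Then there exists a number $\theta \in (0, 1)$ such that
$$f(y) = \sum_{|\alpha| \le m-1}\frac{1}{\alpha!}\frac{\partial^{|\alpha|} f}{\partial x^\alpha}(x)(y - x)^\alpha + \sum_{|\alpha| = m}\frac{1}{\alpha!}\frac{\partial^m f}{\partial x^\alpha}((1 - \theta)x + \theta y)(y - x)^\alpha.$$
Moreover,
$$f(y) = \sum_{|\alpha| \le m}\frac{1}{\alpha!}\frac{\partial^{|\alpha|} f}{\partial x^\alpha}(x)(y - x)^\alpha + o(\|y - x\|^m)$$
as $y \to x$.
Proof. This follows from Taylor's theorem in one variable, applied to the function $t \mapsto f((1 - t)x + ty)$. See Exercise 10.1 for the details.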
Recall that given a real $(N \times N)$-matrix $A$, we have a quadratic form $\mathbb{R}^N \to \mathbb{R}$, $x \mapsto \langle x, Ax\rangle$. We say that $A$ is
positive definite if $\langle x, Ax\rangle > 0$ for all $x \in \mathbb{R}^N \setminus \{0\}$,
negative definite if $\langle x, Ax\rangle < 0$ for all $x \in \mathbb{R}^N \setminus \{0\}$, and
indefinite if there exist two points $x_-, x_+ \in \mathbb{R}^N$ with $\langle x_-, Ax_-\rangle < 0$ and $\langle x_+, Ax_+\rangle > 0$.
For a symmetric matrix (i.e., a matrix with $A^T = A$), positive definite means that all eigenvalues are positive, negative definite means that all eigenvalues are negative, and indefinite means that there are positive and negative eigenvalues.
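For a symmetric $(2 \times 2)$-matrix $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$ the eigenvalues are $\frac{a+c}{2} \pm \sqrt{\bigl(\frac{a-c}{2}\bigr)^2 + b^2}$, so the eigenvalue criterion above can be applied directly. The sketch below does this; the function name is hypothetical. Applied to a Hessian at a critical point, its output is the input to the second derivative test. Matrices with a zero eigenvalue fall under none of the three definitions and are reported separately.

```python
import math

def classify_symmetric_2x2(a, b, c):
    """Classify the symmetric matrix [[a, b], [b, c]] via its eigenvalues
    (a + c)/2 -+ sqrt(((a - c)/2)^2 + b^2)."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam_min, lam_max = mean - radius, mean + radius
    if lam_min > 0:
        return "positive definite"
    if lam_max < 0:
        return "negative definite"
    if lam_min < 0 < lam_max:
        return "indefinite"
    # an eigenvalue is 0 and the other one does not have the opposite sign
    return "none of these"
```

For example, $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ has eigenvalues $\pm 1$ and is indefinite, while $\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ has eigenvalues $0$ and $2$ and is none of the three.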
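$$f(x) = f(x_0) + \frac{1}{2}\langle x - x_0, H_f(x_0)(x - x_0)\rangle + R(x)$$
with
$$\lim_{x\to x_0}\frac{R(x)}{\|x - x_0\|^2} = 0.$$
(i) If $H_f(x_0)$ is positive definite, then all of its eigenvalues are positive. Let $\lambda_0$ be the smallest eigenvalue. Then we have
$$\langle x - x_0, H_f(x_0)(x - x_0)\rangle \ge \lambda_0\|x - x_0\|^2$$
for all $x \in \mathbb{R}^N$. So
$$f(x) \ge f(x_0) + \|x - x_0\|^2\left(\frac{\lambda_0}{2} + \frac{R(x)}{\|x - x_0\|^2}\right).$$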
2.8
We now consider functions in two variables. That is, from now on, we have $N = 2$ and we consider an open set $\Omega \subset \mathbb{R}^2$. This is merely to avoid technical complications, and the results do have counterparts in higher dimensions. In two dimensions, it is convenient to denote the coordinates by $(x, y)^T$ rather than $(x_1, x_2)^T$.
This section is about the observation that for a reasonably regular function $f\colon \Omega \to \mathbb{R}$, the solutions of the equation
$$f(x) = 0$$
typically form a curve in $\Omega$. For example, if $f(x) = x_1^2 + x_2^2 - r^2$ with $r \in \mathbb{R}$, then we have a circle of radius $|r|$, except for $r = 0$, where we have a single point that solves the equation.
We may want to use an equation like this in order to define a specific curve in $\mathbb{R}^2$. But if we want to be certain that we actually obtain a curve, we have to worry about degenerate cases like the case $r = 0$ above. Fortunately, we can give conditions that guarantee not only that we have a curve, but even that we locally have the graph of a function. This function is implicitly defined through the equation.
Theorem 2.8.1 (Implicit Function Theorem). Suppose that the function $f\colon \Omega \to \mathbb{R}$ is continuously differentiable in $\Omega$. Let $(x_0, y_0)^T \in \Omega$ be a point with $f(x_0, y_0) = 0$ and $\frac{\partial f}{\partial y}(x_0, y_0) \ne 0$. Then there exist two numbers $r, s > 0$ such that $(x_0 - r, x_0 + r) \times (y_0 - s, y_0 + s) \subset \Omega$ and there exists a unique function $g\colon (x_0 - r, x_0 + r) \to (y_0 - s, y_0 + s)$ with $f(x, g(x)) = 0$ for all $x \in (x_0 - r, x_0 + r)$. Furthermore, this function $g$ is differentiable at $x_0$ with
$$g'(x_0) = -\frac{\frac{\partial f}{\partial x}(x_0, y_0)}{\frac{\partial f}{\partial y}(x_0, y_0)}.$$
$$g'(x) = -\frac{\frac{\partial f}{\partial x}(x, g(x))}{\frac{\partial f}{\partial y}(x, g(x))}$$
for every $x \in (x_0 - r, x_0 + r)$ such that the denominator does not vanish (which is the case at least sufficiently close to $x_0$ by the continuity). The right-hand side is continuous, so $g$ is continuously differentiable.
Proof. Without loss of generality we may assume that $x_0 = 0$ and $y_0 = 0$, because otherwise we may apply a translation in $\mathbb{R}^2$. Furthermore, we may assume without loss of generality that $\frac{\partial f}{\partial y}(0, 0) > 0$; otherwise we may replace $f$ by $-f$.
and
$$f(x, s) = f(x, 0) + \int_0^s \frac{\partial f}{\partial y}(x, \eta)\,d\eta > 0.$$
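By the mean value theorem, applied first in the $x$-direction and then in the $y$-direction, there exist $\xi$ between $0$ and $x$ and $\eta$ between $0$ and $g(x)$ such that
$$0 = f(x, g(x)) - f(0, 0) = x\,\frac{\partial f}{\partial x}(\xi, g(x)) + g(x)\,\frac{\partial f}{\partial y}(0, \eta).$$
Recall that $\frac{\partial f}{\partial y} > \kappa$ in $(-r, r) \times (-s, s)$. Furthermore, since $\frac{\partial f}{\partial x}$ is continuous on $[-r, r] \times [-s, s]$, the Weierstrass extreme value theorem implies that there exists a number $C \ge 0$ such that
$$\left|\frac{\partial f}{\partial x}(u, v)\right| \le C$$
for all $u \in [-r, r]$ and all $v \in [-s, s]$. Hence
$$|g(x)| = \frac{\left|\frac{\partial f}{\partial x}(\xi, g(x))\right|}{\frac{\partial f}{\partial y}(0, \eta)}\,|x| \le \frac{C}{\kappa}|x|.$$
Choosing $K = C/\kappa$, we obtain (2.7).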
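Step 3: $g$ is differentiable at $0$. Next we want to show that $g$ is differentiable at $0$ with the derivative given in the statement of the theorem. To this end, note that by the differentiability of $f$, we have
$$f(x, g(x)) = f(0, 0) + Df(0, 0)(x, g(x))^T + R(x)$$
for a function $R\colon (-r, r) \to \mathbb{R}$ with
$$\lim_{x\to 0}\frac{R(x)}{\sqrt{x^2 + (g(x))^2}} = 0.$$
But $f(0, 0) = f(x, g(x)) = 0$. So we have
$$0 = x\frac{\partial f}{\partial x}(0, 0) + g(x)\frac{\partial f}{\partial y}(0, 0) + R(x).$$
Thus we obtain
$$\frac{g(x)}{x} = -\frac{\frac{\partial f}{\partial x}(0, 0)}{\frac{\partial f}{\partial y}(0, 0)} - \frac{R(x)}{x\,\frac{\partial f}{\partial y}(0, 0)}.$$
Moreover, we have
$$\left|\frac{R(x)}{x}\right| = \frac{|R(x)|\sqrt{1 + \frac{(g(x))^2}{x^2}}}{\sqrt{x^2 + (g(x))^2}} \le \frac{|R(x)|\sqrt{1 + K^2}}{\sqrt{x^2 + (g(x))^2}} \to 0$$
as $x \to 0$ by (2.7). Hence
$$g'(0) = \lim_{x\to 0}\frac{g(x)}{x} = -\frac{\frac{\partial f}{\partial x}(0, 0)}{\frac{\partial f}{\partial y}(0, 0)}.$$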
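Take the circle example from the beginning of this section with $r = 1$, i.e. $f(x, y) = x^2 + y^2 - 1$ near $(x_0, y_0) = (0.6, 0.8)$. The implicitly defined function is $g(x) = \sqrt{1 - x^2}$, and the formula from Theorem 2.8.1 gives $g'(x_0) = -\frac{2x_0}{2y_0} = -0.75$. The sketch below compares this with a difference quotient of the explicit solution. The function names are hypothetical, and an explicit formula for $g$ is available only because this example is so simple.

```python
import math

def f(x, y):
    return x ** 2 + y ** 2 - 1

def f_x(x, y):
    return 2 * x

def f_y(x, y):
    return 2 * y

def g(x):
    # explicit solution of f(x, g(x)) = 0 near y0 = 0.8 (upper semicircle)
    return math.sqrt(1 - x ** 2)

def g_prime_ift(x):
    # derivative predicted by the implicit function theorem,
    # valid wherever f_y(x, g(x)) != 0
    return -f_x(x, g(x)) / f_y(x, g(x))

x0, h = 0.6, 1e-6
difference_quotient = (g(x0 + h) - g(x0 - h)) / (2 * h)
```

Near $(\pm 1, 0)$ the denominator $\frac{\partial f}{\partial y} = 2y$ vanishes, and the circle is no longer locally a graph over the $x$-axis there.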
2.9
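or
$$\frac{\partial g}{\partial y}(x_0, y_0) \ne 0.$$
If $\frac{\partial g}{\partial y}(x_0, y_0) \ne 0$, then we apply the implicit function theorem to find $r, s > 0$ with $[x_0 - r, x_0 + r] \times [y_0 - s, y_0 + s] \subset \Omega$ and a continuously differentiable function $\varphi\colon (x_0 - r, x_0 + r) \to (y_0 - s, y_0 + s)$ such that $g(x, \varphi(x)) = 0$ for all $x \in (x_0 - r, x_0 + r)$. Moreover, we have
$$\varphi'(x_0) = -\frac{\frac{\partial g}{\partial x}(x_0, y_0)}{\frac{\partial g}{\partial y}(x_0, y_0)}.$$
Now consider the function $h\colon (x_0 - r, x_0 + r) \to \mathbb{R}$ with $h(x) = f(x, \varphi(x))$. It has a minimum at $x_0$, and so by the chain rule,
$$0 = h'(x_0) = \frac{\partial f}{\partial x}(x_0, y_0) + \frac{\partial f}{\partial y}(x_0, y_0)\varphi'(x_0) = \frac{\partial f}{\partial x}(x_0, y_0) - \frac{\partial f}{\partial y}(x_0, y_0)\frac{\frac{\partial g}{\partial x}(x_0, y_0)}{\frac{\partial g}{\partial y}(x_0, y_0)}.$$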
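Hence, with
$$\lambda := \frac{\frac{\partial f}{\partial y}(x_0, y_0)}{\frac{\partial g}{\partial y}(x_0, y_0)},$$
we have
$$\frac{\partial f}{\partial x}(x_0, y_0) = \lambda\frac{\partial g}{\partial x}(x_0, y_0)$$
and
$$\frac{\partial f}{\partial y}(x_0, y_0) = \lambda\frac{\partial g}{\partial y}(x_0, y_0),$$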
[Figure: the level set $g(x) = 0$ and a point $x_0$ on it where $\nabla f(x_0) = \lambda\nabla g(x_0)$]
Index
n-th interval
Bolzano-Weierstrass theorem
bounded
Cauchy-Schwarz inequality
chain rule
closed
compact
continuity theorem
continuous
continuous at a point
continuously differentiable
convergence
critical point
directional derivative
equivalent
Euclidean inner product
Euclidean norm
Jacobi matrix
partial derivative
partition
primitive
refinement
Riemann integrable
Riemann integral
Riemann sum
saddle point
second derivative test
second fundamental theorem of calculus
stationary point
subdivision
symmetry of the Hessian
tagged subdivision
Taylor's theorem
triangle inequality
uniformly continuous
upper Riemann integral
upper Riemann sum
Weierstrass extreme value theorem