MAT 322 (Real Analysis IV)

M.E. Egwe
Department of Mathematics,
University of Ibadan,
Ibadan, NIGERIA.

April 17, 2023


Contents

1 Introduction & Preliminaries
    1.1 Functions of Two or More Variables
    1.2 Regions & Domains of Functions of Several Variables

2 Limits of Functions of Several Variables
    2.1 Definitions and Examples
        2.1.1 Iterated Limits
    2.2 Some Solved Problems
    2.3 Exercises

3 Introduction to $\mathbb{R}^n$
    3.1 Vector Spaces Revisited
    3.2 Derivatives
        3.2.1 Directional Derivatives

4 The Chain Rule, Higher Order Derivatives and Taylor's Theorem
    4.1 Introduction
        4.1.1 Mean-Value Theorem
    4.2 Higher Order Derivatives & Taylor's Theorem

5 Applications to Extremum Problems and the Method of Lagrange Multipliers
    5.1 Extremum Problems
    5.2 Lagrange Multipliers

6 The Mapping Theorems
    6.1 Inverse Function Theorem
    6.2 Implicit Function Theorem

Course synopsis
Introduction to limits of functions of several variables. Differentiation (derivatives): directional derivatives, partial derivatives and higher order derivatives. Taylor's theorem. Inverse function theorem. Implicit function theorem. Extrema and the method of Lagrange multipliers. Riemann integrals, Riemann-Stieltjes integral. Functions of bounded variation. Partial integration formula. Mean value theorems. Integration of functions of several variables.
Semester 2; LH 60; PH 0; 4 Units; Status R

Mode of Assessments
There shall be two standard tests and one assignment, together with any further assessment the course lecturer may determine depending on the response of the students to the course lectures.

References
1. Walter Rudin - Principles of Mathematical Analysis.

2. Robert G. Bartle - Mathematical Analysis.

3. S.O. Iyahen - Functions of Several Variables.

4. T.M. Apostol.

Chapter 1

Introduction & Preliminaries

1.1 Functions of Two or More Variables

Definition 1.1.1. A function $z$ is said to be a function of two variables $x$ and $y$ if for each pair $(x, y)$ we can determine one or more values of $z$.
We use the notation $f(x, y)$, $F(x, y)$, $g(x, y)$, etc. to denote the value of the function at $(x, y)$ and write $z = f(x, y)$, $z = g(x, y)$, etc.

Example 1.1.2. If $f(x, y) = x^2 + 2y^3$, then $f(3, -1) = 3^2 + 2(-1)^3 = 7$.

Remark 1.1.3. The concept is easily extended. Thus, $F(x, y, z)$ denotes the value of a function at $(x, y, z)$, a point in three-dimensional space.

1.2 Regions & Domains of Functions of Several Variables

The functions of one variable which we study in calculus are usually defined on intervals. If the interval is finite, it may contain both end points, just one, or neither. If the interval is infinite but is not the entire axis, it has just one end point, and this may or may not be counted as belonging to the interval.
One of the things that claims our attention when we begin to study functions of several variables is the nature of the region of definition of such functions. In what follows, we give some illustrations and examples, taking the number of independent variables to be two.

Example 1.2.1. Consider the function defined by $f(x, y) = \log(1 - x^2 - y^2)$.

This function is defined only when $x^2 + y^2 < 1$, since otherwise the logarithm is undefined. The region of definition is therefore the interior of the unit circle centred at the origin.

[Figure: the open unit disc $x^2 + y^2 < 1$.]

Example 1.2.2. $F(x, y) = \sqrt{x^2 + y^2 - 1} + \log(4 - x^2 - y^2)$.

Here, we must have $x^2 + y^2 \ge 1$ in order for the square root to be real, while we must have $x^2 + y^2 < 4$ for the logarithm to be defined. The region of definition of $F(x, y)$ is therefore the annular region between the circles $x^2 + y^2 = 1$ and $x^2 + y^2 = 4$. The inner circumference is part of the region of definition while the outer circumference is not.

[Figure: the annulus $1 \le x^2 + y^2 < 4$; the outer circle passes through $(2, 0)$.]

Example 1.2.3. $g(x, y) = \dfrac{x}{y^2 - 4x}$.

This function is defined except when the denominator is zero, i.e., everywhere except at the points of the parabola $y^2 = 4x$.

[Figure: the parabola $y^2 = 4x$ in the $xy$-plane.]

Example 1.2.4. $G(x, y) = \sqrt{x^2 - y^2} + \sqrt{x^2 + y^2 - 1}$.

The region of definition here is given by the inequalities $x^2 \ge y^2$, $x^2 + y^2 \ge 1$. The lines $x - y = 0$ and $x + y = 0$ divide the plane into four quadrants. The inequality $x^2 \ge y^2$ states that the point $(x, y)$ lies in (or on the edge of) one of the two quadrants which contain the $x$-axis. The other inequality states that $(x, y)$ lies outside or on the circle $x^2 + y^2 = 1$. Hence the region of definition of $G(x, y)$ is the part of the $xy$-plane satisfying both conditions.

[Figure: the region $x^2 \ge y^2$, $x^2 + y^2 \ge 1$, shaded.]
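The four regions of definition in Examples 1.2.1 to 1.2.4 can be sketched as membership tests. This is only an illustrative sketch: the function names are hypothetical and simply restate the inequalities derived in the examples.

```python
def in_domain_f(x, y):
    """Example 1.2.1: log(1 - x^2 - y^2) needs x^2 + y^2 < 1."""
    return x * x + y * y < 1


def in_domain_F(x, y):
    """Example 1.2.2: needs 1 <= x^2 + y^2 < 4 (inner circle included)."""
    r2 = x * x + y * y
    return 1 <= r2 < 4


def in_domain_g(x, y):
    """Example 1.2.3: x / (y^2 - 4x) needs y^2 != 4x."""
    return y * y != 4 * x


def in_domain_G(x, y):
    """Example 1.2.4: needs x^2 >= y^2 and x^2 + y^2 >= 1."""
    return x * x >= y * y and x * x + y * y >= 1


# Boundary behaviour of the annulus: inner circle in, outer circle out.
print(in_domain_F(1.0, 0.0), in_domain_F(2.0, 0.0))
```

Testing a point on each boundary curve is a quick way to see which boundaries belong to a region and which do not.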

Remark 1.2.5. Similar examples might be given for functions of three or more independent variables. The region of definition might be the interior of a cube, the interior and boundary of an ellipsoid, the space between concentric spheres, etc.

Definition 1.2.6. The set of all points $(x, y)$ such that $|x - x_0| < \delta$, $|y - y_0| < \delta$, where $\delta > 0$, is called a rectangular $\delta$-neighbourhood of $(x_0, y_0)$.

The set $0 < |x - x_0| < \delta$, $0 < |y - y_0| < \delta$, which excludes $(x_0, y_0)$, is called the rectangular deleted $\delta$-neighbourhood of $(x_0, y_0)$. Other neighbourhoods include the circular $\delta$-neighbourhood of $(x_0, y_0)$, e.g.,

$$(x - x_0)^2 + (y - y_0)^2 < \delta^2.$$

Definition 1.2.7. A point $(x_0, y_0)$ is called a limit point, accumulation point or cluster point of a set $S$ if every deleted $\delta$-neighbourhood of $(x_0, y_0)$ contains points of $S$.

For example, every bounded infinite set has at least one limit point (Bolzano-Weierstrass). The limit point need not be a member of the set.
Definition 1.2.8. A set is said to be closed if it contains all its limit points. A set
S is called open if each point p ∈ S has some circular neighbourhood which belongs
entirely to the set S.

Example 1.2.9.
(1) The set of all points in the plane not on the parabola $y^2 = 4x$ is an open set (verify!).
(2) The set of all points on or inside the circle $x^2 + y^2 = 1$ is not open (check!).

Exercise 1.2.10.

(1) Let $S = \{(x, y) : x^2 + y^2 < 1,\ x < 0 \text{ if } y = 0\}$. Is $S$ open, closed or neither?

(2) Let $f(x, y) = \log \sin x + y^{-1/2}$. Determine the set of points $(x, y)$ where $f$ is defined. Is the set open, closed or neither?

(3) Answer the same questions for $f(x, y) = \left(y - \sin\dfrac{1}{x}\right)^{-1}$.

Definition 1.2.11. By a region, we shall mean a set of points which is either a nonempty open set, or such a set together with some or all of the points forming its boundary.

Chapter 2

Limits of Functions of Several Variables

2.1 Definitions and Examples

Definition 2.1.1. Let $f(x, y)$ be defined in a deleted $\delta$-neighbourhood of $(x_0, y_0)$; that is, $f(x, y)$ need not be defined at $(x_0, y_0)$ itself. We say that $f$ has a limit $\ell$ as $x$ approaches $x_0$ and $y$ approaches $y_0$ if for any given $\epsilon > 0$ we can find a $\delta > 0$ (depending on $\epsilon$ and $(x_0, y_0)$) such that

$$|f(x, y) - \ell| < \epsilon \quad \text{whenever} \quad 0 < |x - x_0| < \delta \ \text{and} \ 0 < |y - y_0| < \delta.$$

One can also use the deleted circular neighbourhood $0 < (x - x_0)^2 + (y - y_0)^2 < \delta^2$.

Example 2.1.2. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by

$$f(x, y) = \begin{cases} 3xy, & \text{if } (x, y) \ne (1, 2), \\ 0, & \text{if } (x, y) = (1, 2). \end{cases}$$

Then, as $x \to 1$, $y \to 2$, $f(x, y) \to 3(1)(2) = 6$. Thus, we suspect that

$$\lim_{\substack{x \to 1 \\ y \to 2}} f(x, y) = 6.$$

Note that $\lim_{\substack{x \to 1 \\ y \to 2}} f(x, y) \ne f(1, 2)$ since $f(1, 2) = 0$. The limit would in fact be 6 even if $f(x, y)$ were not defined at $(1, 2)$. Thus, the existence of the limit as $(x, y) \to (x_0, y_0)$ is in no way dependent on the existence of a value of $f(x, y)$ at $(x_0, y_0)$.
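
The suspected value in Example 2.1.2 can be confirmed directly from Definition 2.1.1. The following is one possible verification; the substitution $a = x - 1$, $b = y - 2$ and the choice $\delta = \min(1, \epsilon/12)$ are choices made here for illustration and are not taken from the notes.

```latex
% Write x = 1 + a, y = 2 + b with 0 < |a| < \delta and 0 < |b| < \delta.
\begin{align*}
|3xy - 6| &= |3(1 + a)(2 + b) - 6| = |6a + 3b + 3ab| \\
          &\le 6|a| + 3|b| + 3|a||b| < 9\delta + 3\delta^2 .
\end{align*}
% If \delta \le 1 then 9\delta + 3\delta^2 \le 12\delta, so
% \delta = \min(1, \epsilon/12) gives |f(x, y) - 6| < \epsilon .
```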

Remark 2.1.3. If the limit exists, then it is unique. The concept of one-sided limits
of functions of one variable is easily extended to functions of several variables.
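Example 2.1.4. $\displaystyle \lim_{\substack{x \to 0^+ \\ y \to 1}} \tan^{-1}(y/x) = \frac{\pi}{2}$, and $\displaystyle \lim_{\substack{x \to 0^- \\ y \to 1}} \tan^{-1}(y/x) = -\frac{\pi}{2}$.
But $\displaystyle \lim_{\substack{x \to 0 \\ y \to 1}} \tan^{-1}(y/x)$ does not exist, since the two one-sided limits given above are not equal.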

In general, the theorems of limits, concepts of infinity, etc for functions of one
variable apply with appropriate modifications to functions of two or more variables.

2.1.1 Iterated Limits

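The iterated limits $\displaystyle \lim_{x \to x_0}\Big\{\lim_{y \to y_0} f(x, y)\Big\}$ and $\displaystyle \lim_{y \to y_0}\Big\{\lim_{x \to x_0} f(x, y)\Big\}$, also denoted by $\displaystyle \lim_{x \to x_0}\lim_{y \to y_0} f(x, y)$ and $\displaystyle \lim_{y \to y_0}\lim_{x \to x_0} f(x, y)$ respectively, are not necessarily equal.
Although they must be equal if $\displaystyle \lim_{\substack{x \to x_0 \\ y \to y_0}} f(x, y)$ exists (and the iterated limits themselves exist), their equality does not guarantee the existence of this limit.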

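Example 2.1.5. If $f(x, y) = \dfrac{x - y}{x + y}$, then

$$\lim_{x \to 0}\left(\lim_{y \to 0} \frac{x - y}{x + y}\right) = \lim_{x \to 0} (1) = 1$$

and

$$\lim_{y \to 0}\left(\lim_{x \to 0} \frac{x - y}{x + y}\right) = \lim_{y \to 0} (-1) = -1.$$

Since the iterated limits are not equal, $\displaystyle \lim_{\substack{x \to 0 \\ y \to 0}} f(x, y)$ cannot exist.
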
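The failure of the limit in Example 2.1.5 can also be seen by approaching the origin along different straight lines. The sketch below is an illustration only; the helper `along_line` and the chosen slopes are assumptions made here. On the line $y = mx$ (with $m \ne -1$) the function is constant, equal to $(1 - m)/(1 + m)$, so different lines give different values.

```python
def f(x, y):
    """The function of Example 2.1.5 (undefined on the line x + y = 0)."""
    return (x - y) / (x + y)


def along_line(m, t):
    """Value of f at the point (t, m*t) on the line y = m*x, with m != -1."""
    return f(t, m * t)


# Approach the origin along three lines; each row stays near a different value.
for m in (0.0, 1.0, 2.0):
    values = [along_line(m, t) for t in (1e-1, 1e-3, 1e-6)]
    print(m, values)
```

Since the value approached depends on the slope $m$, no single number can serve as the limit at $(0, 0)$.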
2.2 Some solved problems
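(1) If $f(x, y) = x^3 - 2xy + 3y^2$, find

(a) $f\left(\dfrac{1}{x}, \dfrac{2}{y}\right)$, and (b) $\dfrac{f(x, y + k) - f(x, y)}{k}$.

Solution:

(a)
$$f\left(\frac{1}{x}, \frac{2}{y}\right) = \left(\frac{1}{x}\right)^3 - 2\left(\frac{1}{x}\right)\left(\frac{2}{y}\right) + 3\left(\frac{2}{y}\right)^2 = \frac{1}{x^3} - \frac{4}{xy} + \frac{12}{y^2}.$$

(b)
$$\begin{aligned}
\frac{f(x, y + k) - f(x, y)}{k} &= \frac{1}{k}\left\{[x^3 - 2x(y + k) + 3(y + k)^2] - [x^3 - 2xy + 3y^2]\right\} \\
&= \frac{1}{k}\left(x^3 - 2xy - 2kx + 3y^2 + 6ky + 3k^2 - x^3 + 2xy - 3y^2\right) \\
&= \frac{1}{k}\left(-2kx + 6ky + 3k^2\right) \\
&= -2x + 6y + 3k.
\end{aligned}$$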
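
The algebra in part (b) can be spot-checked numerically. This is a minimal sketch; the helper names and the sample point are chosen here purely for illustration.

```python
def f(x, y):
    """f(x, y) = x^3 - 2xy + 3y^2 from solved problem (1)."""
    return x**3 - 2 * x * y + 3 * y**2


def difference_quotient(x, y, k):
    """(f(x, y + k) - f(x, y)) / k for k != 0."""
    return (f(x, y + k) - f(x, y)) / k


def closed_form(x, y, k):
    """The simplified expression -2x + 6y + 3k obtained in part (b)."""
    return -2 * x + 6 * y + 3 * k


# Compare both expressions at an arbitrary sample point.
print(difference_quotient(1.5, -2.0, 0.25), closed_form(1.5, -2.0, 0.25))
```

Letting $k \to 0$ in the closed form gives $-2x + 6y$, which is the partial derivative of $f$ with respect to $y$.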

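(2) Give the domain of definition of the function defined by

(a) $f(x, y) = \ln\{(16 - x^2 - y^2)(x^2 + y^2 - 4)\}$

Solution:
$\mathrm{Dom}(f) = \{(x, y) : (16 - x^2 - y^2)(x^2 + y^2 - 4) > 0\}$. The two factors cannot both be negative, since that would require $x^2 + y^2 > 16$ and $x^2 + y^2 < 4$ at the same time. Hence both are positive, i.e., $4 < x^2 + y^2 < 16$, which is the required domain of definition.

(b) $f(x, y) = \sqrt{6 - (2x + 3y)}$

Solution:
$\mathrm{Dom}(f) = \{(x, y) : 2x + 3y \le 6\}$.
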
2.3 Exercises
1. (a) Compute the following limits

   i. $\displaystyle \lim_{(x, y) \to (0, 0)} \frac{(x + y)^2 - (x - y)^2}{xy}$

   ii. $\displaystyle \lim_{(x, y) \to (0, 0)} \frac{x^3 - y^3}{x^2 + y^2}$

   iii. $\displaystyle \lim_{(x, y) \to (0, 0)} \frac{xy}{x^2 + y^2 + 2}$

   iv. $\displaystyle \lim_{(x, y) \to (0, 0)} \frac{xy^3}{x^2 + y^6}$

   (b) Determine the domain of existence and continuity of the following functions

   i. $f(x, y, z) = \dfrac{1}{x^2 + y^2 + z^2 - 4}$

   ii. $f(x, y) = \dfrac{1}{x^2 + y^2}$

   iii. $f(x, y) = 3x^2 + 3y^2 \log\left(x^2 + y^2\right)$

   iv. $f(x, y, z) = \dfrac{e^{x + y}}{1 + z^2}$

Chapter 3

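Introduction to $\mathbb{R}^n$

3.1 Vector Spaces Revisited

Recall that $\mathbb{R}^n = \mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R} = \prod_{i=1}^{n} \mathbb{R}_i$, where for each $i$, $\mathbb{R}_i = \mathbb{R}$. Let $\xi \in \mathbb{R}^n$; then $\xi = (x_1, x_2, \ldots, x_n)$, $x_i \in \mathbb{R}$ for $i = 1, \ldots, n$.
The unit coordinate vectors

$$\begin{aligned}
e_1 &= (1, 0, 0, \ldots, 0) \\
e_2 &= (0, 1, 0, \ldots, 0) \\
e_3 &= (0, 0, 1, \ldots, 0) \\
e_i &= (0, 0, \ldots, 0, \underbrace{1}_{i\text{-th place}}, 0, \ldots, 0) \\
e_n &= (0, 0, \ldots, 1)
\end{aligned}$$

form the standard orthonormal basis for $\mathbb{R}^n$. That is, $\{e_i\}_{i=1}^{n}$ is a basis for $\mathbb{R}^n$: if $x \in \mathbb{R}^n$, then there exist $\lambda_i \in \mathbb{R}$ such that

$$x = \sum_{i=1}^{n} \lambda_i e_i.$$

[Figure: a point $x$ in $\mathbb{R}^3$ and its distance from the origin.]

The distance between the point $x = (\xi_1, \xi_2, \xi_3) \in \mathbb{R}^3$ and the origin is given by

$$d(x, 0) = \sqrt{\xi_1^2 + \xi_2^2 + \xi_3^2}.$$
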
Thus, given $\xi = (\xi_1, \ldots, \xi_n)$, $\eta = (\eta_1, \ldots, \eta_n) \in \mathbb{R}^n$, the distance between them is given by

$$d(\xi, \eta) = \left(\sum_{i=1}^{n} |\xi_i - \eta_i|^2\right)^{1/2}. \tag{3.1}$$

This is called the metric distance between $\xi$ and $\eta$.

We denote by

$$\|x\| = \left(\sum_{i=1}^{n} |\xi_i|^2\right)^{1/2}$$

the length or norm of a vector $x = (\xi_1, \ldots, \xi_n) \in \mathbb{R}^n$.
If $n = 1$, then $\|x\| = |x|$, the absolute value of $x$. If $x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n) \in \mathbb{R}^n$, the inner product is

$$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i. \tag{3.2}$$

Thus

$$\langle x, x \rangle = \|x\|^2. \tag{3.3}$$

We have the following properties of a norm.

Definition 3.1.1. For vectors $x, y \in \mathbb{R}^n$, we have

(i) $\|x\| \ge 0$

(ii) $\|x\| = 0 \iff x = 0$

(iii) $\|\alpha x\| = |\alpha| \, \|x\|$, $\alpha \in \mathbb{R}$ (absolute homogeneity)

(iv) $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality)

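The definitions (3.1) to (3.3) and the norm properties translate directly into a few lines of code. The sketch below is illustrative only: the function names and the two sample vectors are assumptions made here, not part of the notes.

```python
import math


def inner(x, y):
    """Inner product <x, y> = sum of x_i * y_i, as in (3.2)."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Euclidean norm, using <x, x> = ||x||^2 from (3.3)."""
    return math.sqrt(inner(x, x))


def dist(x, y):
    """Metric distance d(x, y) = ||x - y||, as in (3.1)."""
    return norm([a - b for a, b in zip(x, y)])


# Two sample vectors in R^3, chosen only for illustration.
x = [3.0, 4.0, 0.0]
y = [1.0, -2.0, 2.0]
x_plus_y = [a + b for a, b in zip(x, y)]

print(norm(x), norm(y), norm(x_plus_y), dist(x, y))
```

For these vectors the triangle inequality reads $\|x + y\| = \sqrt{24} \le 5 + 3$.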
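Definition 3.1.2. A mapping $f$ with $\mathrm{Dom}(f) = \mathbb{R}^n$ and $\mathrm{Ran}(f) \subseteq \mathbb{R}^m$, $f : \mathbb{R}^n \to \mathbb{R}^m$, is said to be linear if for all $x, y \in \mathbb{R}^n$, $\alpha \in \mathbb{R}$, we have

1. $f(x + y) = f(x) + f(y)$

2. $f(\alpha x) = \alpha f(x)$

If $\alpha = 0$, it follows from (2) that

$$f(0) = f(0 \cdot x) = 0 f(x) = 0.$$

Lemma 3.1.3. Every linear mapping on $\mathbb{R}^n$ is bounded.

That is, if $f : \mathbb{R}^n \to \mathbb{R}^m$ is linear, then there exists a constant $M > 0$ such that $\|f(x)\| \le M \|x\|$ for all $x \in \mathbb{R}^n$.

Proof:
Let $\{e_1, e_2, \ldots, e_n\}$ be the standard basis of $\mathbb{R}^n$ and let $x \in \mathbb{R}^n$. Then

$$x = \sum_{i=1}^{n} x_i e_i, \qquad f(x) = f\left(\sum_{i=1}^{n} x_i e_i\right).$$

By linearity of $f$, we have

$$f(x) = \sum_{i=1}^{n} x_i f(e_i).$$

Therefore,

$$\begin{aligned}
\|f(x)\| &= \Big\|\sum_{i=1}^{n} x_i f(e_i)\Big\| \\
&\le \sum_{i=1}^{n} |x_i| \, \|f(e_i)\| \\
&\le \max_i |x_i| \sum_{i=1}^{n} \|f(e_i)\| \\
&\le M \|x\|,
\end{aligned}$$

where $M = \sum_{i=1}^{n} \|f(e_i)\|$, since $\max_i |x_i| \le \|x\|$. (If $f = 0$, any $M > 0$ will do.) □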
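
The constant $M = \sum_i \|f(e_i)\|$ from the proof of Lemma 3.1.3 can be computed for a concrete linear map and the bound tested on random vectors. This is only a sketch under stated assumptions: the matrix `A` is a hypothetical example, and a random test illustrates the bound without proving it.

```python
import math
import random


def norm(v):
    """Euclidean norm on R^k."""
    return math.sqrt(sum(c * c for c in v))


def apply(A, x):
    """Value f(x) = A x of the linear map with matrix A (a list of rows)."""
    return [sum(a * c for a, c in zip(row, x)) for row in A]


# Hypothetical linear map f : R^3 -> R^2, chosen only for illustration.
A = [[1.0, -2.0, 0.5],
     [3.0, 0.0, 4.0]]

# f(e_i) is the i-th column of A, so M is the sum of the column norms.
columns = list(zip(*A))
M = sum(norm(col) for col in columns)


def bound_holds(x, tol=1e-12):
    """Check ||f(x)|| <= M ||x||, allowing for rounding error."""
    return norm(apply(A, x)) <= M * norm(x) + tol


random.seed(0)
samples = [[random.uniform(-10, 10) for _ in range(3)] for _ in range(1000)]
print(all(bound_holds(x) for x in samples))
```

The constant produced by the proof is usually not the smallest possible one; it is merely a convenient explicit bound.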

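3.2 Derivatives
Definition 3.2.1. Let $f : \mathrm{Dom}(f) \subset \mathbb{R}^n \to \mathbb{R}^m$ and let $x_0 \in \mathrm{Dom}(f)$. $f$ is said to be continuous at $x_0$ if and only if given $\varepsilon > 0$ there exists $\delta = \delta(\varepsilon) > 0$ such that for all $x \in \mathrm{Dom}(f)$, $\|x - x_0\| < \delta \Rightarrow \|f(x) - f(x_0)\| < \varepsilon$.
Then $f$ is continuous on $\mathrm{Dom}(f) \subseteq \mathbb{R}^n$ if and only if it is continuous at every point of $\mathrm{Dom}(f)$.

Theorem 3.2.2. If $f$ is a linear map of $\mathbb{R}^n$ into $\mathbb{R}^m$, then $f$ is continuous on $\mathbb{R}^n$.

Proof: Choose $x_0 \in \mathbb{R}^n$ arbitrarily. By Lemma 3.1.3 there is a constant $M > 0$ such that

$$\|f(x) - f(x_0)\| = \|f(x - x_0)\| \le M \|x - x_0\|.$$

Given $\varepsilon > 0$, choose $\delta = \varepsilon / M$. Then

$$\|x - x_0\| < \delta \Rightarrow \|f(x) - f(x_0)\| < \varepsilon. \qquad □$$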

Definition 3.2.3. Suppose $U$ is an open subset of $\mathbb{R}^n$, $f : U \to \mathbb{R}^m$ and $x_0 \in U$. Then $f$ is said to be differentiable at $x_0$ if and only if there exists a linear map $T_{x_0}$ of $\mathbb{R}^n$ into $\mathbb{R}^m$ such that

$$\lim_{h \to 0} \frac{\|f(x_0 + h) - f(x_0) - T_{x_0}(h)\|_{\mathbb{R}^m}}{\|h\|_{\mathbb{R}^n}} = 0. \tag{1}$$

The linear map $T_{x_0}$ is called the derivative of $f$ at $x_0$ and is usually denoted by $Df(x_0)$.
Note that $h$ is a point in $\mathbb{R}^n$ and $f(x_0 + h) - f(x_0) - T_{x_0}(h)$ is a point in $\mathbb{R}^m$.

Remark 3.2.4. If for $x_0 + h \in U$ we set $r_{x_0}(h) = f(x_0 + h) - f(x_0) - T_{x_0}(h)$, then equation (1) can be rewritten as

$$f(x_0 + h) = f(x_0) + T_{x_0}(h) + r_{x_0}(h), \qquad \text{where } \lim_{h \to 0} \frac{\|r_{x_0}(h)\|}{\|h\|} = 0.$$

We may give an equivalent definition of differentiability as follows.

Definition 3.2.5. A function $f : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is said to be differentiable at $x_0 \in U$ if and only if there is a linear map $T_{x_0} : \mathbb{R}^n \to \mathbb{R}^m$ such that, given $\varepsilon > 0$, there exists $\delta = \delta(x_0, \varepsilon) > 0$ with

$$\|f(x_0 + h) - f(x_0) - T_{x_0}(h)\| \le \varepsilon \|h\|$$

whenever $\|h\| < \delta$. If $h = x - x_0$ then $x = x_0 + h$.

Another equivalent definition of differentiability is as follows:

There is a linear map $Df(x_0)$ such that for any $\varepsilon > 0$ there exists $\delta > 0$ such that for all $x \in U$,
$\|x - x_0\| < \delta \Rightarrow \|f(x) - f(x_0) - Df(x_0)(x - x_0)\| \le \varepsilon \|x - x_0\|$.

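Theorem 3.2.6. Let $U \subset \mathbb{R}^n$ be an open set and let $f : U \to \mathbb{R}^m$ be differentiable at $x_0 \in U$. Then $Df(x_0)$ is unique.

Proof:
Let $T_{x_0}$ and $T'_{x_0}$ be two linear maps of $\mathbb{R}^n$ into $\mathbb{R}^m$ such that, for $x_0$ and $x_0 + h$ in the open set $U$,

$$f(x_0 + h) = f(x_0) + T_{x_0}(h) + r_{x_0}(h), \qquad \text{where } \lim_{h \to 0} \frac{\|r_{x_0}(h)\|}{\|h\|} = 0,$$

and

$$f(x_0 + h) = f(x_0) + T'_{x_0}(h) + S_{x_0}(h), \qquad \text{where } \lim_{h \to 0} \frac{\|S_{x_0}(h)\|}{\|h\|} = 0.$$

Then

$$T_{x_0}(h) - T'_{x_0}(h) = S_{x_0}(h) - r_{x_0}(h)$$

and hence

$$\frac{\|T_{x_0}(h) - T'_{x_0}(h)\|}{\|h\|} = \frac{\|S_{x_0}(h) - r_{x_0}(h)\|}{\|h\|} \le \frac{\|S_{x_0}(h)\|}{\|h\|} + \frac{\|r_{x_0}(h)\|}{\|h\|} \longrightarrow 0 \quad \text{as } h \to 0.$$

So that

$$\lim_{h \to 0} \frac{\|T_{x_0}(h) - T'_{x_0}(h)\|}{\|h\|} = 0.$$

Thus, for each $\varepsilon > 0$ there exists $\delta > 0$ such that

$$\|T_{x_0}(h) - T'_{x_0}(h)\| = \|(T_{x_0} - T'_{x_0})(h)\| < \varepsilon \|h\|,$$

provided $0 < \|h\| < \delta$.

For any nonzero $x \in \mathbb{R}^n$, write $x = \alpha h$ where $0 < \|h\| < \delta$ and $\alpha \in \mathbb{R} \setminus \{0\}$. Then

$$\|(T_{x_0} - T'_{x_0})(x)\| = |\alpha| \, \|(T_{x_0} - T'_{x_0})(h)\| < |\alpha| \varepsilon \|h\| = \varepsilon \|\alpha h\| = \varepsilon \|x\|.$$

Since $\varepsilon > 0$ was arbitrary, $(T_{x_0} - T'_{x_0})(x) = 0$ for every $x \in \mathbb{R}^n$. This shows that

$$T_{x_0} = T'_{x_0}. \qquad □$$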

Definition 3.2.7. If $f : U \subset \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at each point of $U$, we say that $f$ is differentiable on $U$.

Theorem 3.2.8. Suppose $U \subset \mathbb{R}^n$ is an open set and $f : U \to \mathbb{R}^m$ is differentiable at $x_0 \in U$. Then $f$ is continuous at $x_0$.

Proof:
First, we show that there are constants $\delta_0 > 0$ and $M > 0$ such that

$$\|x - x_0\| < \delta_0 \Rightarrow \|f(x) - f(x_0)\| \le M \|x - x_0\|.$$

If $L : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map, then by Lemma 3.1.3 there exists $M_0 > 0$ such that $\|L(x)\| \le M_0 \|x\|$ for all $x \in \mathbb{R}^n$. Take $L := Df(x_0)$.
Let $\varepsilon = 1$ in the definition of differentiability. Then there exists $\delta_0 > 0$ such that

$$\|x - x_0\| < \delta_0 \Rightarrow \|f(x) - f(x_0) - Df(x_0)(x - x_0)\| \le \|x - x_0\|.$$

But

$$\begin{aligned}
\|f(x) - f(x_0)\| &= \|f(x) - f(x_0) - Df(x_0)(x - x_0) + Df(x_0)(x - x_0)\| \\
&\le \|f(x) - f(x_0) - Df(x_0)(x - x_0)\| + \|Df(x_0)(x - x_0)\| \\
&\le (1 + M_0) \|x - x_0\|.
\end{aligned}$$

Set $M := 1 + M_0$. Now, given $\varepsilon > 0$, choose $\delta = \min\{\delta_0, \varepsilon / M\}$. Then

$$\|x - x_0\| < \delta \Rightarrow \|f(x) - f(x_0)\| \le M \|x - x_0\| < \delta M \le \varepsilon. \qquad □$$

Corollary 3.2.9. Let $f$ and $x_0 \in \mathrm{Dom}(f) = U$ be as in the theorem above. If $f$ is not continuous at $x_0$, then $f$ is not differentiable at $x_0$.

Example 3.2.10. Find $Df(1, 1)$ if $f(x, y) = x^2 + y^2$, where $f : U \subseteq \mathbb{R}^2 \to \mathbb{R}$.

Solution: Let $h = (h_1, h_2)$, $x_0 = (1, 1)$. We consider

$$\begin{aligned}
f(x_0 + h) - f(x_0) &= f((1, 1) + (h_1, h_2)) - f(1, 1) \\
&= f(1 + h_1, 1 + h_2) - f(1, 1) \\
&= (1 + h_1)^2 + (1 + h_2)^2 - (1^2 + 1^2) \qquad (*) \\
&= 1 + 2h_1 + h_1^2 + 1 + 2h_2 + h_2^2 - 2 \\
&= h_1^2 + h_2^2 + 2h_1 + 2h_2.
\end{aligned}$$

$T_{x_0}(h) = T_{x_0}(h_1, h_2)$ is the linear part of $(*)$, i.e., $2h_1 + 2h_2$. We write this as a linear map $Df(1, 1) : \mathbb{R}^2 \to \mathbb{R}$, i.e.,

$$2h_1 + 2h_2 = (h_1, h_2) \begin{pmatrix} 2 \\ 2 \end{pmatrix},$$

provided the condition of differentiability in its definition is satisfied. Now we know that $\|h\| = \|(h_1, h_2)\| = \sqrt{h_1^2 + h_2^2}$, or $\|(h_1, h_2)\|^2 = h_1^2 + h_2^2$.
Then for arbitrary $\varepsilon > 0$,

$$|f(1 + h_1, 1 + h_2) - f(1, 1) - 2h_1 - 2h_2| = \|(h_1, h_2)\|^2 < \varepsilon \|(h_1, h_2)\|,$$

provided $0 < \|(h_1, h_2)\| < \delta$ with $\delta = \varepsilon$. □
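
The estimate in Example 3.2.10 says that the remainder divided by $\|h\|$ is exactly $\|h\|$, which tends to 0. A short numerical sketch makes this visible; the helper names and the sample increments are assumptions made here for illustration.

```python
import math


def f(x, y):
    """f(x, y) = x^2 + y^2 from Example 3.2.10."""
    return x * x + y * y


def T(h1, h2):
    """Candidate derivative at (1, 1): the linear map h -> 2*h1 + 2*h2."""
    return 2 * h1 + 2 * h2


def remainder_ratio(h1, h2):
    """|f(x0 + h) - f(x0) - T(h)| / ||h|| at x0 = (1, 1), for h != 0."""
    r = f(1 + h1, 1 + h2) - f(1, 1) - T(h1, h2)
    return abs(r) / math.hypot(h1, h2)


# The ratio shrinks in proportion to ||h|| as h -> 0.
for scale in (1.0, 0.1, 0.01):
    print(scale, remainder_ratio(0.3 * scale, -0.4 * scale))
```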

Example 3.2.11. Let $L : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map. Prove that $DL(x_0) = L$ for every $x_0 \in \mathbb{R}^n$.

Proof:
Given $\varepsilon > 0$ and $x_0 \in \mathrm{Dom}(L)$, we must find $\delta > 0$ such that $\|x - x_0\| < \delta$ implies

$$\|L(x) - L(x_0) - L(x - x_0)\| \le \varepsilon \|x - x_0\|.$$

But the left-hand side is zero, since $L(x - x_0) = L(x) - L(x_0)$ by linearity.

Hence, $DL(x_0) = L$ satisfies the definition, with any $\delta > 0$.

Exercise 3.2.12.

1. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $f(x, y) = (xy, x^2 + y^2)$. Find the derivative of $f$ at an arbitrary point $(a, b)$.

2. Give an example of a function which is continuous at a given point but which is not differentiable at that point.
   Hint: Consider $f : \mathrm{Dom}(f) \subseteq \mathbb{R} \to \mathbb{R}$ defined by $f(x) = |x|$.

3. Let $f(x, y) = \dfrac{\cos x + e^{xy}}{x^2 + y^2}$. Show that $f$ is differentiable at all points $(x, y) \ne (0, 0)$.

4. Verify whether the functions

   $$f(x, y) = \frac{2x}{(x^2 + y^2)^2} \quad \text{and} \quad g(x, y) = \frac{x}{y} + \frac{y}{x}$$

   are $C^1$ or $C^2$ functions.

3.2.1 Directional Derivatives

Definition 3.2.13. Let $f : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$, where $U$ is an open set, let $x_0 \in U$ and let $u$ be a vector in $\mathbb{R}^n$. Then the directional derivative of $f$ at $x_0$ in the direction of $u$ is denoted by $D_u f(x_0)$ and is defined by

$$D_u f(x_0) = \lim_{\tau \to 0} \frac{f(x_0 + \tau u) - f(x_0)}{\tau},$$

provided the limit exists.

Alternatively, we have the following:

Definition 3.2.14. $D_u f(x_0)$ is the directional derivative along the direction of $u$ if and only if for every $\varepsilon > 0$ there exists $\delta = \delta(\varepsilon) > 0$ such that if $0 < |\tau| < \delta$ then

$$\|f(x_0 + \tau u) - f(x_0) - \tau D_u f(x_0)\| \le \varepsilon |\tau|.$$
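Example 3.2.15. Find the directional derivative of $f : \mathbb{R}^2 \to \mathbb{R}$, $(x, y) \mapsto x^2 + 3xy$, at $\xi_0 = (2, 0)$ in the direction $u = (1/\sqrt{2}, -1/\sqrt{2})$.

Solution:
Notice that $u$ as given above is a unit vector, i.e., $\sqrt{(1/\sqrt{2})^2 + (-1/\sqrt{2})^2} = 1$. The direction makes an angle of $-45^\circ$ with the positive $x$-axis.

$$\begin{aligned}
f(\xi_0 + \tau u) &= f\left((2, 0) + \tau\left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)\right) \\
&= f\left(2 + \frac{\tau}{\sqrt{2}}, -\frac{\tau}{\sqrt{2}}\right) \\
&= \left(2 + \frac{\tau}{\sqrt{2}}\right)^2 + 3\left(2 + \frac{\tau}{\sqrt{2}}\right)\left(-\frac{\tau}{\sqrt{2}}\right) \\
&= 4 + \frac{4\tau}{\sqrt{2}} + \frac{\tau^2}{2} + 3\left(\frac{-2\tau}{\sqrt{2}} - \frac{\tau^2}{2}\right) \\
&= 4 + \frac{4\tau}{\sqrt{2}} + \frac{\tau^2}{2} - \frac{6\tau}{\sqrt{2}} - \frac{3\tau^2}{2} \\
&= 4 - \frac{2\tau}{\sqrt{2}} - \tau^2,
\end{aligned}$$

$$f(\xi_0) = f(2, 0) = 2^2 + 3(2)(0) = 4,$$

$$\therefore \ f(\xi_0 + \tau u) - f(\xi_0) = -\frac{2\tau}{\sqrt{2}} - \tau^2.$$

Hence,

$$D_u f(\xi_0) = \lim_{\tau \to 0} \frac{-\dfrac{2\tau}{\sqrt{2}} - \tau^2}{\tau} = -\frac{2}{\sqrt{2}} = -\sqrt{2}. \qquad □$$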
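
The limit in Example 3.2.15 can be approximated by evaluating the difference quotient for small $\tau$. This is an illustrative sketch; the function `directional_quotient` and the step sizes are choices made here. By the computation above the quotient equals $-\sqrt{2} - \tau$ exactly, so the printed values should approach $-\sqrt{2}$.

```python
import math


def f(x, y):
    """f(x, y) = x^2 + 3xy from Example 3.2.15."""
    return x * x + 3 * x * y


def directional_quotient(p, u, tau):
    """(f(p + tau*u) - f(p)) / tau for tau != 0."""
    return (f(p[0] + tau * u[0], p[1] + tau * u[1]) - f(*p)) / tau


u = (1 / math.sqrt(2), -1 / math.sqrt(2))

for tau in (1e-1, 1e-3, 1e-6):
    print(tau, directional_quotient((2.0, 0.0), u, tau))
```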
Theorem 3.2.16. If $U \subseteq \mathbb{R}^n$ is an open set, $f : U \to \mathbb{R}^m$ is differentiable at $x_0 \in U$ and $u$ is a vector in $\mathbb{R}^n$ (for instance a unit vector), then $D_u f(x_0)$ exists and

$$D_u f(x_0) = Df(x_0)(u).$$

Proof: Let $f : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$, where $U$ is an open set in $\mathbb{R}^n$, and let $x_0 \in U$.
If $u = 0$, then it is clear that $D_0 f(x_0) = 0$, and since $Df(x_0)$ is linear, $Df(x_0)(0) = 0$. Hence the result is true in this degenerate case.
Next, let $u \ne 0$. By the definition of differentiability, for any $\varepsilon > 0$ there exists $\delta > 0$ such that

$$\|f(x) - f(x_0) - Df(x_0)(x - x_0)\| \le \varepsilon \|x - x_0\|$$

provided that $\|x - x_0\| < \delta$.

Set $x = x_0 + \tau u$, $\tau \in \mathbb{R} \setminus \{0\}$. Then

$$\|f(x_0 + \tau u) - f(x_0) - Df(x_0)(\tau u)\| \le \varepsilon \|\tau u\|$$

whenever $\|\tau u\| < \delta$. Now $\|\tau u\| = |\tau| \, \|u\|$, and since $u \ne 0$ we have

$$\|\tau u\| < \delta \iff 0 < |\tau| < \frac{\delta}{\|u\|}.$$

Hence, if $0 < |\tau| < \dfrac{\delta}{\|u\|}$, then dividing by $|\tau|$ and using the linearity of $Df(x_0)$ gives

$$\left\|\frac{f(x_0 + \tau u) - f(x_0)}{\tau} - Df(x_0)(u)\right\| \le \varepsilon \|u\|.$$

Since $\varepsilon > 0$ was arbitrary, this shows that the directional derivative exists in the direction of $u$ and

$$D_u f(x_0) = Df(x_0)(u). \qquad □$$

In particular, taking $u = e_i$ shows that if the derivative $Df(x_0)$ exists, then each of the partial derivatives $D_i f(x_0)$, $1 \le i \le n$, exists.

Example 3.2.17. Find the directional derivative of $f : \mathbb{R}^2 \to \mathbb{R}$, $(x, y) \mapsto 4 - x^2 - \frac{1}{4} y^2$, at $(1, 2)$ in the direction of $u = (\cos \pi/3, \sin \pi/3)$.

Solution: $Df(x, y) = (-2x, -y/2)$. This is a continuous function and therefore $f$ is differentiable. Since $u = (\cos \pi/3, \sin \pi/3)$ is already a unit vector, we have

$$D_u f(x, y) = Df(x, y) \cdot u = (-2x, -y/2) \cdot (\cos \pi/3, \sin \pi/3) = -2x \cos \pi/3 - \frac{y}{2} \sin \pi/3.$$

At $(x, y) = (1, 2)$ we have

$$\begin{aligned}
D_u f(1, 2) &= -2 \cos \pi/3 - \sin \pi/3 \\
&= -2(1/2) + (-1)(\sqrt{3}/2) \\
&= -1 - \sqrt{3}/2.
\end{aligned}$$

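Example 3.2.18. Find the directional derivative of $f(x, y) = x^2 \sin 2y$ at $(1, \pi/2)$ in the direction of $v = (3, -4)$.

Solution:
$Df(x, y) = (2x \sin 2y, 2x^2 \cos 2y)$ is continuous. Thus $f$ is differentiable. We obtain the unit vector as

$$u = \frac{v}{\|v\|} = \left(\frac{3}{5}, \frac{-4}{5}\right).$$

Hence,

$$\begin{aligned}
D_u f(x, y) &= Df(x, y) \cdot u \\
&= (2x \sin 2y, 2x^2 \cos 2y) \cdot \left(\tfrac{3}{5}, \tfrac{-4}{5}\right) \\
&= \tfrac{6}{5} x \sin 2y - \tfrac{8}{5} x^2 \cos 2y, \\
D_u f(1, \pi/2) &= \tfrac{6}{5} \sin \pi - \tfrac{8}{5} \cos \pi \\
&= 0 \left(\tfrac{6}{5}\right) + (-1)\left(-\tfrac{8}{5}\right) \\
&= \tfrac{8}{5}.
\end{aligned}$$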
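
Examples 3.2.17 and 3.2.18 both use the formula $D_u f = Df \cdot u$ with a unit vector $u$. The following sketch packages that recipe; the helper `directional` and the names `grad_3217`, `grad_3218` are hypothetical, and the gradients are the ones computed in the two examples.

```python
import math


def directional(grad, point, v):
    """Df(point) . u, where u = v / ||v|| is the unit vector along v."""
    length = math.hypot(*v)
    u = [c / length for c in v]
    g = grad(*point)
    return sum(gi * ui for gi, ui in zip(g, u))


def grad_3217(x, y):
    """Gradient of 4 - x^2 - y^2/4 (Example 3.2.17)."""
    return (-2 * x, -y / 2)


def grad_3218(x, y):
    """Gradient of x^2 sin(2y) (Example 3.2.18)."""
    return (2 * x * math.sin(2 * y), 2 * x * x * math.cos(2 * y))


d1 = directional(grad_3217, (1.0, 2.0), (math.cos(math.pi / 3), math.sin(math.pi / 3)))
d2 = directional(grad_3218, (1.0, math.pi / 2), (3.0, -4.0))
print(d1, d2)
```

Normalising inside the helper means the direction may be given as any nonzero vector, as with $v = (3, -4)$ in Example 3.2.18.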

Exercise 3.2.19. Give an example of a function $f$ such that $D_u f(0, 0)$ exists for all vectors $u$ but $f$ is not differentiable at $(0, 0)$.

Hint: Consider the function defined by

$$f(x, y) = \begin{cases} \dfrac{x^2 y}{x^2 + y^2} & \text{if } (x, y) \ne (0, 0), \\[2mm] 0 & \text{if } (x, y) = (0, 0), \end{cases}$$

and then apply limits to show that $f$ is not differentiable.

Definition 3.2.20. The directional derivative of $f$ in the direction of $e_i$ is called the $i$-th partial derivative of $f$ and is denoted by $D_i f$.

Remark 3.2.21. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and let $f$ have components $f_1, f_2, \ldots, f_m$. Then the partial derivatives $D_i f(x_0)$, $1 \le i \le n$, of $f$ exist at a point $x_0$ if and only if $D_i f_j(x_0)$, $1 \le i \le n$, $1 \le j \le m$, exist. Moreover

$$D_i f(x_0) = (D_i f_1(x_0), D_i f_2(x_0), \ldots, D_i f_m(x_0)).$$

The existence of all partial derivatives at a point does not ensure differentiability at that point.

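Example 3.2.22. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by

$$f(x, y) = \begin{cases} x & \text{if } y = 0, \\ y & \text{if } x = 0, \\ 1 & \text{if } x \ne 0,\ y \ne 0. \end{cases}$$

Along the axes, $f(x, 0) = x$ and $f(0, y) = y$, so

$$\frac{\partial f}{\partial x}(x, 0) = 1, \qquad \frac{\partial f}{\partial y}(0, y) = 1,$$

and in particular

$$\frac{\partial f}{\partial x}(0, 0) = 1, \qquad \frac{\partial f}{\partial y}(0, 0) = 1.$$

But if $x \ne 0$ and $y \ne 0$, then

$$|f(x, y) - f(0, 0)| = |1 - 0| = 1.$$

$\therefore$ for every $n$ there are points with $\|(x, y)\| < 1/n$, $x \ne 0$, $y \ne 0$, at which $|f(x, y) - f(0, 0)| = 1$.

$\therefore$ $f$ is not continuous at the point $(0, 0)$.
Hence, $f$ is not differentiable at $(0, 0)$.
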
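Problems:

1. Show that the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = |x|$ for all $x \in \mathbb{R}$ is continuous on $\mathbb{R}$ but not differentiable at 0.

2. Let

   $$f(x, y) = \begin{cases} xy^2 & \text{if } (x, y) \ne (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases}$$

   Show that $D_u f(0, 0)$ exists in the direction of a vector $u \in \mathbb{R}^2$ but that $f$ is not differentiable at $(0, 0)$.

3. Find the directional derivative of $f(x, y) = x^3 + 3xy + 4$ at $\xi_0 = (3, 1)$ in the direction of the vector $v = (5, -5)$.
   Hint: first normalise $v$. Here $\|v\| = \sqrt{5^2 + (-5)^2} = \sqrt{50}$, so the unit vector is

   $$u = \frac{v}{\|v\|} = \left(\frac{5}{\sqrt{50}}, \frac{-5}{\sqrt{50}}\right).$$
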
Theorem 3.2.23. Suppose $f : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $x \in U$ and $u = (u_1, \ldots, u_n)$ is a vector in $\mathbb{R}^n$. If $u \ne 0$, then

$$Df(x)(u) = \sum_{i=1}^{n} u_i D_i f(x).$$

Here, since $f = (f_1, f_2, \ldots, f_m)$ with each $f_j : U \to \mathbb{R}$ on the open set $U \subseteq \mathbb{R}^n$, we have for $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$,

$$D_i f_j(x) = \lim_{\tau \to 0} \frac{f_j(x + \tau e_i) - f_j(x)}{\tau} = \lim_{\tau \to 0} \frac{f_j(x_1, \ldots, x_i + \tau, \ldots, x_n) - f_j(x_1, \ldots, x_n)}{\tau},$$

provided the limit exists. These are the partial derivatives of $f_j$ with respect to $x_i$, keeping all other components fixed. $D_i f$ and $D_i f_j$ may be denoted by $\dfrac{\partial f}{\partial x_i}$ and $\dfrac{\partial f_j}{\partial x_i}$ respectively.

Moreover, if $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$, then

$$D_i f(x) = \lim_{\tau \to 0} \frac{f(x + \tau e_i) - f(x)}{\tau} = \lim_{\tau \to 0} \frac{f(x_1, \ldots, x_i + \tau, \ldots, x_n) - f(x_1, \ldots, x_n)}{\tau}.$$

N.B. $x = (x_1, x_2, \ldots, x_n)$, $\tau e_i = (0, 0, \ldots, \tau, \ldots, 0)$ and $x + \tau e_i = (x_1, x_2, \ldots, x_i + \tau, \ldots, x_n)$.

Proof:
Suppose $Df(x)$ exists. Then by Theorem 3.2.16, which states that $D_u f(x) = Df(x)(u)$, applied to each of $e_1, e_2, \ldots, e_n$, the partial derivatives $D_1 f(x), D_2 f(x), \ldots, D_n f(x)$ exist and are equal to $Df(x)e_1, Df(x)e_2, \ldots, Df(x)e_n$ respectively.
Since $Df(x)$ is linear and $u = \sum_{i=1}^{n} u_i e_i$, we have that

$$Df(x)u = \sum_{i=1}^{n} u_i Df(x)e_i = \sum_{i=1}^{n} u_i D_i f(x). \qquad □$$

Theorem 3.2.24. Let $U \subseteq \mathbb{R}^n$ be an open set and let $x \in U$. If $f : U \to \mathbb{R}^m$ is differentiable at $x$, then $D_i f_j(x)$ exist for $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, and $Df(x)$ is the $m \times n$ matrix given by

$$\big(D_i f_j(x)\big), \qquad j = 1, 2, \ldots, m \ \text{(rows)}, \quad i = 1, 2, \ldots, n \ \text{(columns)}.$$

Proof:
Since $Df(x)$ exists, Theorem 3.2.16 shows that

$$D_i f(x) = \lim_{\tau \to 0} \frac{f(x + \tau e_i) - f(x)}{\tau} = Df(x)e_i \qquad (*)$$

exists for each $i = 1, 2, \ldots, n$.

If $(e_1^*, e_2^*, \ldots, e_m^*)$ is the standard basis for $\mathbb{R}^m$, then since $f = (f_1, f_2, \ldots, f_m)$, we have

$$f(x) = \sum_{j=1}^{m} f_j(x) e_j^*.$$

By equation $(*)$ above, we have

$$Df(x)e_i = \lim_{\tau \to 0} \sum_{j=1}^{m} \frac{[f_j(x + \tau e_i) - f_j(x)] e_j^*}{\tau} = \sum_{j=1}^{m} D_i f_j(x) e_j^*.$$

Therefore, if $f$ is differentiable at a point $x = (x_1, \ldots, x_n) \in U$, the partial derivatives $D_i f_j(x)$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, exist at $x$. The derivative $Df(x)$, if it exists, maps $u = (u_1, u_2, \ldots, u_n) \in \mathbb{R}^n$ to a point $w = (w_1, w_2, \ldots, w_m) \in \mathbb{R}^m$.
Expressing $u$ and $w$ in the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively, we have

$$\begin{aligned}
w_1 &= D_1 f_1(x) u_1 + D_2 f_1(x) u_2 + \cdots + D_n f_1(x) u_n \\
w_2 &= D_1 f_2(x) u_1 + D_2 f_2(x) u_2 + \cdots + D_n f_2(x) u_n \\
&\ \ \vdots \\
w_m &= D_1 f_m(x) u_1 + D_2 f_m(x) u_2 + \cdots + D_n f_m(x) u_n
\end{aligned}$$

that is,

$$\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{pmatrix} = \begin{pmatrix} D_1 f_1(x) & D_2 f_1(x) & \cdots & D_n f_1(x) \\ D_1 f_2(x) & D_2 f_2(x) & \cdots & D_n f_2(x) \\ \vdots & \vdots & & \vdots \\ D_1 f_m(x) & D_2 f_m(x) & \cdots & D_n f_m(x) \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.$$

Hence, with $D_i f_j(x) = \dfrac{\partial f_j(x)}{\partial x_i}$,

$$\therefore \ Df(x) = \big(D_i f_j(x)\big), \qquad j = 1, 2, \ldots, m, \quad i = 1, 2, \ldots, n. \qquad □$$

If $m = 1$, we have

$$Df(x) = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_n}\right).$$

The matrix $\big(D_i f_j(x)\big)$, $j = 1, \ldots, m$, $i = 1, 2, \ldots, n$, is called the Jacobian matrix.

Definition 3.2.25. For m = 1, Df (x) is a 1×n-matrix given by (D1 f (x), D2 f (x), . . . , Dn f (x)).
The vector whose components are the same as Df (x) is called the gradient
of f and is denoted by ▽f or grad(f ).

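Example 3.2.26. Let $f : \mathbb{R}^2 \to \mathbb{R}^3$ be defined by $f(x, y) = (x^3 y, x^4 y^2, x^3)$. Find $Df(1, 1)$.

Solution: $f_1(x, y) = x^3 y$, $f_2(x, y) = x^4 y^2$, $f_3(x, y) = x^3$.

$$D_1 f(x, y) = (3x^2 y, 4x^3 y^2, 3x^2)$$

$$D_2 f(x, y) = (x^3, 2x^4 y, 0)$$

$$Df(x, y) = \begin{pmatrix} \dfrac{\partial f_1}{\partial x} & \dfrac{\partial f_1}{\partial y} \\[2mm] \dfrac{\partial f_2}{\partial x} & \dfrac{\partial f_2}{\partial y} \\[2mm] \dfrac{\partial f_3}{\partial x} & \dfrac{\partial f_3}{\partial y} \end{pmatrix} = \begin{pmatrix} 3x^2 y & x^3 \\ 4x^3 y^2 & 2x^4 y \\ 3x^2 & 0 \end{pmatrix}$$

$$Df(1, 1) = \begin{pmatrix} 3 & 1 \\ 4 & 2 \\ 3 & 0 \end{pmatrix}$$
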
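The Jacobian matrix of Example 3.2.26 can be cross-checked with central finite differences. This is a rough numerical sketch, not a proof: the helper `jacobian_fd` and the step size are assumptions made here, and the result is only accurate up to discretisation and rounding error.

```python
def f(x, y):
    """f(x, y) = (x^3 y, x^4 y^2, x^3) from Example 3.2.26."""
    return (x**3 * y, x**4 * y**2, x**3)


def jacobian_fd(func, point, step=1e-6):
    """Approximate the m x n matrix (D_i f_j) by central differences."""
    n = len(point)
    m = len(func(*point))
    J = [[0.0] * n for _ in range(m)]
    for i in range(n):
        plus = list(point)
        minus = list(point)
        plus[i] += step
        minus[i] -= step
        f_plus = func(*plus)
        f_minus = func(*minus)
        for j in range(m):
            # Row j is the component f_j, column i is the variable x_i.
            J[j][i] = (f_plus[j] - f_minus[j]) / (2 * step)
    return J


J = jacobian_fd(f, (1.0, 1.0))
for row in J:
    print([round(entry, 6) for entry in row])
```

Rows correspond to components $f_j$ and columns to variables $x_i$, matching the layout of $Df(x)$ in Theorem 3.2.24.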
Exercise 3.2.27.

1. If $f$ is a function $f : \mathbb{R}^2 \to \mathbb{R}^3$ defined by

   $$f(x, y) = (x^2 y, xy, x^3 - 2),$$

   determine the partial derivatives of $f$. Show that it is differentiable at all points in $\mathbb{R}^2$. Determine the Jacobian matrix of $f$ and obtain $Df(1, 2)$.

2. Find the gradient at each point at which it exists for the function defined by
   (a) $f(x, y) = x^2 + y^2 (\sin xy)$
   (b) $f(x, y) = e^x \cos y$
   (c) $f(x, y, z) = \log(x^2 + 2y^2 - 3z^2)$

3. Compute the matrix of partial derivatives of the following functions

   (i) $f : \mathbb{R}^2 \to \mathbb{R}^3$: $f(x, y) = (xe^y + \cos y, x + e^y)$

   (ii) $f : \mathbb{R}^3 \to \mathbb{R}^2$: $f(x, y, z) = (x + e^z + y, yx^2)$

   (iii) $f : \mathbb{R}^2 \to \mathbb{R}^3$: $f(x, y) = (xye^{xy}, x \sin y, 5xy^2)$

   (iv) $f : \mathbb{R}^4 \to \mathbb{R}^4$: $f(x, y, z, w) = \left(e^{xyzw} \log(x^2 + y^2 + z^2 + w^2), 1, 0, xyz\right)$

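Theorem 3.2.28. Let $U \subseteq \mathbb{R}^n$ be an open set and let $f : U \to \mathbb{R}^m$. Suppose $f = (f_1, f_2, \ldots, f_m)$. If all the first partial derivatives of $f$ exist and are continuous on $U$, then $f$ is differentiable on $U$.

Proof: We prove the result for $m = 1$. We show that $Df(x_0)$ exists for an arbitrary $x_0 \in U$. Take $\varepsilon > 0$. Since $U$ is open and the partial derivatives are continuous, there exists $\delta > 0$ such that $B(x_0, \delta) \subset U$ and

$$|D_i f(x) - D_i f(x_0)| < \varepsilon / n \qquad \forall\, x \in B(x_0, \delta), \quad i = 1, 2, \ldots, n.$$

Let $h = (h_1, \ldots, h_n)$ with $\|h\| < \delta$ and let $x_0 = (a_1, a_2, \ldots, a_n)$. Then

$$\begin{aligned}
f(x_0 + h) - f(x_0) ={} & [f(a_1 + h_1, a_2, \ldots, a_n) - f(a_1, a_2, \ldots, a_n)] \\
& + [f(a_1 + h_1, a_2 + h_2, a_3, \ldots, a_n) - f(a_1 + h_1, a_2, \ldots, a_n)] \\
& + \cdots + [f(a_1 + h_1, \ldots, a_n + h_n) - f(a_1 + h_1, \ldots, a_{n-1} + h_{n-1}, a_n)].
\end{aligned}$$

Note that $D_1 f$ is the ordinary derivative of $g$ defined by $g(\tau) = f(\tau, a_2, \ldots, a_n)$. Applying the Mean Value Theorem for one real variable to $g$, we obtain

$$f(a_1 + h_1, a_2, \ldots, a_n) - f(a_1, a_2, \ldots, a_n) = h_1 D_1 f(\xi_1, a_2, \ldots, a_n),$$

where $\xi_1$ lies between $a_1$ and $a_1 + h_1$. Similarly, the $i$-th term in the sum equals

$$h_i D_i f(a_1 + h_1, \ldots, a_{i-1} + h_{i-1}, \xi_i, a_{i+1}, \ldots, a_n) = h_i D_i f(c_i),$$

where $\xi_i$ lies between $a_i$ and $a_i + h_i$, and $c_i \in B(x_0, \delta)$ because $\|c_i - x_0\| \le \|h\| < \delta$.
Taking $Df(x_0)h = \sum_{i=1}^{n} D_i f(x_0) h_i$ as the candidate derivative, we have

$$f(x_0 + h) - f(x_0) = \sum_{i=1}^{n} D_i f(c_i) h_i.$$

Therefore,

$$f(x_0 + h) - f(x_0) - \sum_{i=1}^{n} D_i f(x_0) h_i = \sum_{i=1}^{n} \big(D_i f(c_i) - D_i f(x_0)\big) h_i,$$

$$\begin{aligned}
\Big|f(x_0 + h) - f(x_0) - \sum_{i=1}^{n} D_i f(x_0) h_i\Big| &\le \sum_{i=1}^{n} |D_i f(c_i) - D_i f(x_0)| \, |h_i| \\
&\le \frac{1}{n} \sum_{i=1}^{n} |h_i| \varepsilon \qquad (\forall\, i,\ |h_i| \le \|h\|) \\
&\le \frac{1}{n} \|h\| n \varepsilon \\
&= \varepsilon \|h\|
\end{aligned}$$

whenever $\|h\| < \delta$. This shows that $f$ is differentiable. □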

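Remark 3.2.29. The converse of the above theorem is not true in general: a function may be differentiable without its partial derivatives being continuous. For example, $f(x) = x^2 \sin(1/x)$ for $x \ne 0$, $f(0) = 0$, is differentiable at every point of $\mathbb{R}$, but $f'$ is not continuous at 0.
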
Definition 3.2.30. A function $f : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is said to be continuously differentiable on $U$ if $Df(x)$ is a continuous map of $U$ into $L(\mathbb{R}^n, \mathbb{R}^m)$, where $L(\mathbb{R}^n, \mathbb{R}^m)$ is the linear space of all linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$.
More explicitly, it is required that to every $x \in U$ and to every $\varepsilon > 0$ there exists $\delta > 0$ such that $\|Df(y) - Df(x)\| < \varepsilon$ if $y \in U$ and $\|x - y\| < \delta$. It can be proved that $f : U \subset \mathbb{R}^n \to \mathbb{R}^m$ is continuously differentiable if and only if the first partial derivatives $D_i f_j$ ($i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$) exist and are continuous on $U$. If $f$ is continuously differentiable, we say that $f \in C^1(U)$.

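Exercise 3.2.31.

1. Let

   $$f(x, y) = \begin{cases} \dfrac{x^2 y^4}{x^2 + 6y^8} & \text{if } (x, y) \ne (0, 0), \\[2mm] 0 & \text{if } (x, y) = (0, 0). \end{cases}$$

2. Show that $\dfrac{\partial f}{\partial x}(0, 0)$ and $\dfrac{\partial f}{\partial y}(0, 0)$ exist.

   (i) Is $\dfrac{\partial^2 f}{\partial x \partial y} = \dfrac{\partial^2 f}{\partial y \partial x}$ true?
   (ii) Is $f$ differentiable at $(0, 0)$ or continuous at $(0, 0)$?
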
Chapter 4

The Chain Rule, Higher Order Derivatives and Taylor's Theorem

4.1 Introduction
First, we prove the Chain Rule, otherwise known as the Composite Mapping Theorem (CMT).

Theorem 4.1.1. Let $f : U \to \mathbb{R}^m$ be a differentiable function on the open set $U \subset \mathbb{R}^n$. Let $g : V \to \mathbb{R}^p$ be differentiable on the open set $V \subset \mathbb{R}^m$ and suppose that $f(U) \subset V$. Then the composite function $g \circ f$ is differentiable on $U$ and, for each $x_0 \in U$,

$$D(g \circ f)(x_0) = Dg(f(x_0)) \cdot Df(x_0).$$

Proof: We must show that

$$\lim_{x \to x_0} \frac{\|g \circ f(x) - g \circ f(x_0) - Dg(f(x_0)) \cdot Df(x_0)(x - x_0)\|}{\|x - x_0\|} = 0.$$

The estimate of the numerator is as follows:

$$\begin{aligned}
& \|g \circ f(x) - g \circ f(x_0) - Dg(f(x_0)) \cdot Df(x_0)(x - x_0)\| \\
&\quad = \|g(f(x)) - g(f(x_0)) - Dg(f(x_0))[f(x) - f(x_0)] \\
&\qquad + Dg(f(x_0))[f(x) - f(x_0)] - Dg(f(x_0)) \cdot Df(x_0)(x - x_0)\| \\
&\quad \le \|g(f(x)) - g(f(x_0)) - Dg(f(x_0))[f(x) - f(x_0)]\| \\
&\qquad + \|Dg(f(x_0))[f(x) - f(x_0) - Df(x_0)(x - x_0)]\|.
\end{aligned}$$

Since $f$ is differentiable at $x_0$, there exist $\delta_0 > 0$ and $M > 0$ such that whenever $\|x - x_0\| < \delta_0$, we have

$$\|f(x) - f(x_0)\| \le M \|x - x_0\|$$

(see the proof of Theorem 3.2.8).

Now let $\varepsilon > 0$ be given. By the definition of the derivative of $g$ at $f(x_0)$, there exists $\delta_1 > 0$ such that

$$\|y - f(x_0)\| < \delta_1 \Rightarrow \|g(y) - g(f(x_0)) - Dg(f(x_0))(y - f(x_0))\| \le \frac{\varepsilon}{2M} \|y - f(x_0)\|.$$

Let $\delta_2 = \min(\delta_0, \delta_1 / M)$. If $\|x - x_0\| < \delta_2$, then $\|f(x) - f(x_0)\| \le M \|x - x_0\| < \delta_1$, so taking $y = f(x)$ gives

$$\|g(f(x)) - g(f(x_0)) - Dg(f(x_0))[f(x) - f(x_0)]\| \le \frac{\varepsilon}{2M} \|f(x) - f(x_0)\| \le \frac{\varepsilon}{2} \|x - x_0\|,$$

that is,

$$0 < \|x - x_0\| < \delta_2 \Rightarrow \frac{\|g(f(x)) - g(f(x_0)) - Dg(f(x_0))[f(x) - f(x_0)]\|}{\|x - x_0\|} \le \frac{\varepsilon}{2}.$$

But $Dg(f(x_0))$ is linear. Therefore, there exists $K > 0$ such that

$$\|Dg(f(x_0))(y)\| \le K \|y\| \qquad \forall\, y \in \mathbb{R}^m.$$

By the definition of the derivative of $f$, there exists $\delta_3 > 0$ such that

$$\|x - x_0\| < \delta_3 \Rightarrow \|f(x) - f(x_0) - Df(x_0)(x - x_0)\| \le \frac{\varepsilon}{2K} \|x - x_0\|.$$

Therefore,

$$\|Dg(f(x_0))[f(x) - f(x_0) - Df(x_0)(x - x_0)]\| \le K \|f(x) - f(x_0) - Df(x_0)(x - x_0)\| \le \frac{\varepsilon}{2K} \cdot K \|x - x_0\|,$$

or

$$\frac{\|Dg(f(x_0))[f(x) - f(x_0) - Df(x_0)(x - x_0)]\|}{\|x - x_0\|} \le \frac{\varepsilon}{2}.$$

Therefore, if $0 < \|x - x_0\| < \delta = \min(\delta_2, \delta_3)$, we have

$$\frac{\|g \circ f(x) - g \circ f(x_0) - Dg(f(x_0)) \cdot Df(x_0)(x - x_0)\|}{\|x - x_0\|} \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

This means that

$$\lim_{x \to x_0} \frac{\|g \circ f(x) - g \circ f(x_0) - Dg(f(x_0)) \cdot Df(x_0)(x - x_0)\|}{\|x - x_0\|} = 0.$$

Therefore,

$$D(g \circ f)(x_0) = Dg(f(x_0)) \cdot Df(x_0). \qquad □$$

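We may interpret the composite mapping theorem as follows. Let $x = (x_1, x_2, \dots, x_n)$ and $y = f(x)$, so that $y = (y_1, y_2, \dots, y_m)$ with $y_j = f_j(x_1, \dots, x_n)$, $j = 1, \dots, m$, where f and g are as in the last theorem. If $h = g \circ f$, then
\[
h = g \circ f : \mathbb{R}^n \xrightarrow{\;f\;} \mathbb{R}^m \xrightarrow{\;g\;} \mathbb{R}^p.
\]
The matrix $Dh(x)$ is given by the product
\[
Dh(x) =
\begin{pmatrix}
\dfrac{\partial g_1}{\partial y_1} & \dfrac{\partial g_1}{\partial y_2} & \cdots & \dfrac{\partial g_1}{\partial y_m} \\[1ex]
\dfrac{\partial g_2}{\partial y_1} & \dfrac{\partial g_2}{\partial y_2} & \cdots & \dfrac{\partial g_2}{\partial y_m} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial g_p}{\partial y_1} & \dfrac{\partial g_p}{\partial y_2} & \cdots & \dfrac{\partial g_p}{\partial y_m}
\end{pmatrix}
\begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\[1ex]
\dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_m}{\partial x_1} & \dfrac{\partial f_m}{\partial x_2} & \cdots & \dfrac{\partial f_m}{\partial x_n}
\end{pmatrix},
\]
where the partial derivatives of g are evaluated at $y = f(x)$ and those of f at x. Writing $Dh(x) = (a_{ij})$, $i = 1, \dots, p$, $j = 1, \dots, n$, we have
\[
a_{ij} = \frac{\partial h_i}{\partial x_j} = \sum_{k=1}^{m} \frac{\partial g_i}{\partial y_k} \cdot \frac{\partial f_k}{\partial x_j}
\]
(this is the classical Chain Rule). Thus,
\[
a_{11} = \frac{\partial h_1}{\partial x_1} = \sum_{k=1}^{m} \frac{\partial g_1}{\partial y_k} \frac{\partial f_k}{\partial x_1},
\qquad
a_{12} = \frac{\partial h_1}{\partial x_2} = \sum_{k=1}^{m} \frac{\partial g_1}{\partial y_k} \frac{\partial f_k}{\partial x_2},
\]
and so on.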
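
The matrix form of the chain rule can be checked numerically. The following is a minimal sketch, not part of the original notes: the maps `f` and `g` are hypothetical test functions chosen only for illustration, and the Jacobians are approximated by central differences rather than computed exactly.

```python
import math

def jacobian(func, point, h=1e-6):
    """Approximate the Jacobian matrix of func at point by central differences."""
    rows, cols = len(func(point)), len(point)
    jac = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        forward, backward = list(point), list(point)
        forward[j] += h
        backward[j] -= h
        f_plus, f_minus = func(forward), func(backward)
        for i in range(rows):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Hypothetical test maps: f : R^2 -> R^3 and g : R^3 -> R^2.
def f(x):
    return [x[0] * x[1], math.sin(x[0]), x[0] + x[1] ** 2]

def g(y):
    return [y[0] + y[1] * y[2], math.exp(y[0]) - y[2]]

def h(x):
    return g(f(x))

x0 = [0.5, -1.2]
lhs = jacobian(h, x0)                              # Dh(x0), a 2 x 2 matrix
rhs = matmul(jacobian(g, f(x0)), jacobian(f, x0))  # Dg(f(x0)) . Df(x0)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)  # small, limited only by the finite-difference error
```

The (2 x 3) Jacobian of g multiplied by the (3 x 2) Jacobian of f gives the (2 x 2) Jacobian of h, which matches the index ranges i = 1, ..., p and j = 1, ..., n of the entries a_ij.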

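Example 4.1.2. Let $f = f(x, y)$, where $x = u(r, \theta)$ and $y = v(r, \theta)$. Choose polar coordinates $u(r, \theta) = r \cos\theta$, $v(r, \theta) = r \sin\theta$ (so that $x + iy = r e^{i\theta} = r\cos\theta + i r \sin\theta$), and set
\[
h(r, \theta) = f(u(r, \theta), v(r, \theta)) = f(r \cos\theta, r \sin\theta).
\]
By the chain rule,
\[
\frac{\partial h}{\partial r} = \frac{\partial f}{\partial x} \cdot \frac{\partial x}{\partial r} + \frac{\partial f}{\partial y} \cdot \frac{\partial y}{\partial r},
\qquad
\frac{\partial h}{\partial \theta} = \frac{\partial f}{\partial x} \cdot \frac{\partial x}{\partial \theta} + \frac{\partial f}{\partial y} \cdot \frac{\partial y}{\partial \theta}.
\]
But
\[
\frac{\partial x}{\partial r} = \cos\theta, \quad \frac{\partial x}{\partial \theta} = -r \sin\theta, \quad
\frac{\partial y}{\partial r} = \sin\theta, \quad \frac{\partial y}{\partial \theta} = r \cos\theta.
\]
Hence
\[
\frac{\partial h}{\partial r} = \frac{\partial f}{\partial x} \cos\theta + \frac{\partial f}{\partial y} \sin\theta,
\qquad
\frac{\partial h}{\partial \theta} = -r \sin\theta\, \frac{\partial f}{\partial x} + r \cos\theta\, \frac{\partial f}{\partial y}.
\]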

Example 4.1.3. Find the Jacobian of f (x, y) = (sin(x sin y), (x + y)2 ) if
f : IR2 → IR2 .

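Solution: Write $f_1(x, y) = \sin(x \sin y)$ and $f_2(x, y) = (x + y)^2$. Then
\[
\frac{\partial f_1}{\partial x} = \cos(x \sin y)\, \frac{\partial (x \sin y)}{\partial x} = \sin y \cos(x \sin y),
\]
\[
\frac{\partial f_2}{\partial x} = 2(x + y)\, \frac{\partial}{\partial x}(x + y) = 2(x + y),
\]
\[
\frac{\partial f_1}{\partial y} = \cos(x \sin y)\, \frac{\partial}{\partial y}(x \sin y) = x \cos y \cos(x \sin y),
\]
\[
\frac{\partial f_2}{\partial y} = 2(x + y).
\]
By the general rule for derivatives, the Jacobian matrix (where $x = x_1$ and $y = x_2$) is
\[
Df(x, y) =
\begin{pmatrix}
\sin y \cos(x \sin y) & x \cos y \cos(x \sin y) \\
2(x + y) & 2(x + y)
\end{pmatrix}.
\]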
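
As a sanity check on this Jacobian, the sketch below compares the four entries with central-difference approximations at one sample point. The sample point (0.7, 1.3) and the step size are arbitrary choices made for illustration.

```python
import math

def f(x, y):
    """The map of Example 4.1.3: f(x, y) = (sin(x sin y), (x + y)^2)."""
    return (math.sin(x * math.sin(y)), (x + y) ** 2)

def analytic_jacobian(x, y):
    """Jacobian matrix obtained by hand in the example."""
    c = math.cos(x * math.sin(y))
    return [[math.sin(y) * c, x * math.cos(y) * c],
            [2 * (x + y), 2 * (x + y)]]

def numeric_jacobian(x, y, h=1e-6):
    """Central-difference approximation of the same matrix."""
    fx_p, fx_m = f(x + h, y), f(x - h, y)
    fy_p, fy_m = f(x, y + h), f(x, y - h)
    return [[(fx_p[i] - fx_m[i]) / (2 * h), (fy_p[i] - fy_m[i]) / (2 * h)]
            for i in range(2)]

x, y = 0.7, 1.3  # arbitrary sample point
exact = analytic_jacobian(x, y)
approx = numeric_jacobian(x, y)
max_gap = max(abs(exact[i][j] - approx[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```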

Remark 4.1.4. The Jacobian matrices generally are not symmetric and indeed,
need not be square. Symmetry is only a property of the second derivatives.

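Exercise 4.1.5. 1. Verify the Chain Rule for the following functions.

(a) Let $f(u, v, w) = \left( e^{u - w},\ \cos(u + v) + \sin(u + v + w) \right)$ and $g(x, y) = \left( e^{x},\ \cos(y - x),\ e^{-y} \right)$. Calculate $f \circ g$ and $D(f \circ g)(0, 0)$.

(b) Let $f(u, v) = \left( \tan(u - 1) - e^{v},\ u^2 - v^2 \right)$ and $g(x, y) = \left( e^{x - y},\ x - y \right)$. Calculate $f \circ g$ and $D(f \circ g)(1, 1)$.

(c) Let $h : \mathbb{R}^3 \to \mathbb{R}^5$ and $g : \mathbb{R}^2 \to \mathbb{R}^3$ be given by $h(x, y, z) = \left( xyz,\ e^{xz},\ x \sin y,\ \dfrac{-9}{x},\ 17 \right)$ and $g(u, v) = \left( v^2 + 2u,\ \pi,\ 2\sqrt{u} \right)$. Find $D(h \circ g)(1, 1)$.

(d) Let $h(x, y) = f(u(x, y), v(x, y))$ and $f(u, v) = \dfrac{u^2 + v^2}{u^2 - v^2}$, where $u(x, y) = e^{-x - y}$ and $v(x, y) = e^{xy}$. Find $\dfrac{\partial h}{\partial x}$ and $\dfrac{\partial h}{\partial y}$.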

4.1.1 Mean-Value Theorem

Next, we consider the Mean-Value theorem for functions f : IRn → IR.

Definition 4.1.6. A subset E of a linear space X is said to be convex if and only
if, for each pair of points x, y ∈ E, the line segment joining x and y lies in E

i.e., If x, y ∈ E then L[x, y] = {z : z = (1 − t)x + ty, t ∈ [0, 1]} ⊂ E. Notice that


L[x, y] = L[y, x].
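We wish to show that the ball in $\mathbb{R}^n$ defined by $B(x_0, \delta) = \{ x \in \mathbb{R}^n : \| x - x_0 \| < \delta \}$ is convex. It suffices to treat the case $x_0 = 0$, since the general case follows by translation. Let $x, y \in B(0, \delta)$ and let $t \in [0, 1]$; then $\| x \| < \delta$ and $\| y \| < \delta$. Therefore,
\[
\| (1 - t)x + ty \| \le \| (1 - t)x \| + \| ty \| = (1 - t)\| x \| + t \| y \| < (1 - t)\delta + t\delta = \delta,
\]
so that $(1 - t)x + ty \in B(0, \delta)$. Hence, $B(0, \delta)$ is a convex set. □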

Theorem 4.1.7. (Mean Value Theorem) Let U be an open set in IRn and consider
f : U → IR. Suppose the set U contains the points x, y and the line segment L[x, y]
joining them and that f is differentiable at every point of this segment. Then there exists
a point ξ ∈ L[x, y] such that

f (y) − f (x) = Df (ξ)(y − x).

Proof: Let $\varphi : \mathbb{R} \to \mathbb{R}^n$ be defined by
\[
\varphi(t) = (1 - t)x + ty.
\]
Then $\varphi(0) = x$, $\varphi(1) = y$ and $\varphi(t) \in L[x, y] \subset U$ for $t \in [0, 1]$.

Since U is open and $\varphi$ is continuous, there exists $r > 0$ such that $\varphi$ maps the open interval $(-r, 1 + r)$ into U. Define $F : (-r, 1 + r) \to \mathbb{R}$ by
\[
F(t) = f \circ \varphi(t) = f((1 - t)x + ty).
\]
By the chain rule, for $t \in [0, 1]$,
\[
F'(t) = Df((1 - t)x + ty) \cdot \varphi'(t) = Df((1 - t)x + ty)(y - x).
\]
But $F(0) = f(\varphi(0)) = f(x)$ and $F(1) = f(\varphi(1)) = f(y)$.

By applying the MVT for functions of a single variable, there exists $t_0 \in (0, 1)$ such that
\[
F(1) - F(0) = F'(t_0)(1 - 0).
\]
Letting $\xi = \varphi(t_0) \in L[x, y]$, we obtain
\[
f(y) - f(x) = F(1) - F(0) = F'(t_0) = Df(\xi)(y - x). \qquad \square
\]
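
The intermediate point ξ of the Mean Value Theorem can be located numerically for a concrete function. The following is a rough sketch under stated assumptions: the function f(x, y) = x²y and the endpoints (0, 0) and (1, 2) are hypothetical choices not taken from the notes, and the point is found by bisection on F'(t) − (F(1) − F(0)).

```python
def f(x, y):
    """Hypothetical test function f(x, y) = x^2 * y."""
    return x * x * y

def directional_derivative(point, direction):
    """Df(point)(direction), using the analytic gradient (2xy, x^2)."""
    x, y = point
    return 2 * x * y * direction[0] + x * x * direction[1]

a, b = (0.0, 0.0), (1.0, 2.0)          # endpoints of the segment L[a, b]
step = (b[0] - a[0], b[1] - a[1])
target = f(*b) - f(*a)                 # f(b) - f(a)

def gap(t):
    """F'(t) - (F(1) - F(0)) along phi(t) = (1 - t)a + tb."""
    point = (a[0] + t * step[0], a[1] + t * step[1])
    return directional_derivative(point, step) - target

lo, hi = 0.0, 1.0                      # gap(0) < 0 < gap(1) for this example
for _ in range(60):                    # plain bisection
    mid = (lo + hi) / 2
    if gap(mid) < 0:
        lo = mid
    else:
        hi = mid
t_star = (lo + hi) / 2
xi = (a[0] + t_star * step[0], a[1] + t_star * step[1])
print(t_star, xi)
```

For this choice F(t) = 2t³, so F'(t) = 6t² and the bisection converges to t = 1/√3, giving ξ = (1/√3, 2/√3) on the segment.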

Remark 4.1.8. The theorem above is not valid for vector-valued functions, as the following example shows.

Example 4.1.9. The function $f : \mathbb{R} \to \mathbb{R}^2$ is given by $f(x) = (\cos x, \sin x)$. Prove that there are points $u, v \in \mathbb{R}$ such that
\[
f(v) - f(u) \neq Df(\xi)(v - u) \quad \forall\, \xi \in \mathbb{R}.
\]
Solution:
Take $v = u + 2\pi$, so that $v - u = 2\pi$. Since $\cos(u + 2\pi) = \cos u$ and $\sin(u + 2\pi) = \sin u$, we have
\[
f(v) = (\cos v, \sin v) = (\cos u, \sin u) = f(u),
\]
and therefore
\[
f(v) - f(u) = (0, 0).
\]
But, for all $\xi \in \mathbb{R}$,
\[
Df(\xi) = (-\sin \xi, \cos \xi), \qquad Df(\xi)(v - u) = Df(\xi)(2\pi) = (-2\pi \sin \xi,\ 2\pi \cos \xi).
\]
Hence,
\[
\| Df(\xi)(2\pi) \| = \left[ (-2\pi \sin \xi)^2 + (2\pi \cos \xi)^2 \right]^{1/2} = \left[ (2\pi)^2 (\sin^2 \xi + \cos^2 \xi) \right]^{1/2} = 2\pi.
\]
Since $\| Df(\xi)(v - u) \| = 2\pi \neq 0 = \| f(v) - f(u) \|$, no $\xi$ satisfies the mean-value equality. □

4.2 Higher Order Derivative & Taylor’s Theorem


Definition 4.2.1. Consider the function $f : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$, where $U \subseteq \mathbb{R}^n$ is open. We may take m = 1. If the partial derivative $D_s f(x)$ exists for all $x \in U$, we obtain a function $D_s f : U \subseteq \mathbb{R}^n \to \mathbb{R}$.

The r-th partial derivative of this function at x, i.e. $D_r(D_s f)(x)$, is often denoted by $D_{rs} f(x)$. The function $D_{rs} f$ is called a second-order partial derivative of f; when $r \neq s$ it is called a mixed partial derivative.

Remark 4.2.2. $D_{rs} f$ should not be assumed to be symmetric in r and s, as the following example shows.

Example 4.2.3. Define $f : \mathbb{R}^2 \to \mathbb{R}$ by
\[
f(x, y) =
\begin{cases}
\dfrac{xy(x^2 - y^2)}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0), \\[1ex]
0 & \text{if } (x, y) = (0, 0).
\end{cases}
\]
(a) Show that $D_2 f(x, 0) = x$ for all x and $D_1 f(0, y) = -y$ for all y.
(b) Show that $D_{12} f(0, 0) \neq D_{21} f(0, 0)$.

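Solution:
(a) Note that $D_{rs} f = D_r(D_s f) = \dfrac{\partial^2 f}{\partial x_r \partial x_s}$. For $(x, y) \neq (0, 0)$, the quotient rule gives
\[
D_1 f(x, y) = \frac{\partial f(x, y)}{\partial x}
= \frac{(x^2 + y^2)(3x^2 y - y^3) - 2x \cdot xy(x^2 - y^2)}{(x^2 + y^2)^2}
= \frac{y(x^4 + 4x^2 y^2 - y^4)}{(x^2 + y^2)^2}.
\]
Hence $D_1 f(0, y) = -y$ for $y \neq 0$. Since $f(x, 0) = 0$ for all x, we also have $D_1 f(0, 0) = 0$, so that $D_1 f(0, y) = -y$ for all y.

Also, differentiating with respect to y, for $(x, y) \neq (0, 0)$,
\[
D_2 f(x, y) = \frac{\partial f}{\partial y}(x, y)
= \frac{(x^2 + y^2)(x^3 - 3xy^2) - xy(x^2 - y^2)(2y)}{(x^2 + y^2)^2}
= \frac{x(x^4 - 4x^2 y^2 - y^4)}{(x^2 + y^2)^2}.
\]
Hence $D_2 f(x, 0) = x$ for $x \neq 0$, and $D_2 f(0, 0) = 0$ because $f(0, y) = 0$ for all y. Thus $D_2 f(x, 0) = x$ for all x.

(b) Since $D_1 f(0, y) = -y$ for all y,
\[
D_{21} f(0, 0) = D_2(D_1 f)(0, 0) = \lim_{k \to 0} \frac{D_1 f(0, k) - D_1 f(0, 0)}{k} = -1.
\]
Since $D_2 f(x, 0) = x$ for all x,
\[
D_{12} f(0, 0) = D_1(D_2 f)(0, 0) = \lim_{h \to 0} \frac{D_2 f(h, 0) - D_2 f(0, 0)}{h} = 1.
\]
Thus,
\[
D_{12} f(0, 0) = 1 \neq -1 = D_{21} f(0, 0). \qquad \square
\]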
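
The asymmetry of the mixed partial derivatives at the origin can also be observed numerically. This is a sketch only: it uses nested central differences with hand-picked step sizes (an inner step much smaller than the outer one), which is an assumption of the illustration rather than a general recipe.

```python
def f(x, y):
    """The function of Example 4.2.3, with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def d1(x, y, h=1e-6):
    """Central-difference approximation of D1 f = df/dx."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def d2(x, y, h=1e-6):
    """Central-difference approximation of D2 f = df/dy."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

k = 1e-3  # outer step, deliberately much larger than the inner step h
d2_d1 = (d1(0.0, k) - d1(0.0, -k)) / (2 * k)  # approximates D2(D1 f)(0, 0)
d1_d2 = (d2(k, 0.0) - d2(-k, 0.0)) / (2 * k)  # approximates D1(D2 f)(0, 0)
print(d2_d1, d1_d2)  # close to -1 and +1 respectively
```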

Theorem 4.2.4. Let $U \subset \mathbb{R}^2$ be an open set and let f be a continuous real-valued function defined on U. Suppose that the partial derivatives $D_1 f$, $D_2 f$, $D_1 D_2 f$ and $D_2 D_1 f$ exist and are continuous on U. Then $D_1 D_2 f = D_2 D_1 f$.

Proof: Let (x, y) be a point in U and let $h \neq 0$, $k \neq 0$ be so small that the rectangle with vertices (x, y), (x + h, y), (x, y + k), (x + h, y + k) lies in U. We consider the expressions
\[
g_1(x) = f(x, y + k) - f(x, y), \qquad g_2(y) = f(x + h, y) - f(x, y).
\]
Then
\[
g_1(x + h) - g_1(x) = f(x + h, y + k) - f(x + h, y) - f(x, y + k) + f(x, y),
\]
\[
g_2(y + k) - g_2(y) = f(x + h, y + k) - f(x, y + k) - f(x + h, y) + f(x, y).
\]
We observe that
\[
g_1(x + h) - g_1(x) = g_2(y + k) - g_2(y).
\]
Now $g_1$ is a function of the single variable x, differentiable because $D_1 f$ exists, and by the 1-dimensional MVT we have
\[
g_1(x + h) - g_1(x) = h\, g_1'(\xi) \quad \text{for some } \xi \text{ between } x \text{ and } x + h. \tag{1}
\]
We may write $\xi = x + \theta_1 h$, where $0 < \theta_1 < 1$. Using the definition of partial derivatives, equation (1) becomes
\[
g_1(x + h) - g_1(x) = h\left[ D_1 f(x + \theta_1 h, y + k) - D_1 f(x + \theta_1 h, y) \right].
\]
By hypothesis, $D_2 D_1 f$ exists; therefore, by the MVT in the variable y, we have
\[
D_1 f(x + \theta_1 h, y + k) - D_1 f(x + \theta_1 h, y) = k\, D_2 D_1 f(x + \theta_1 h, \eta)
\]
for some $\eta$ between y and y + k. Put $\eta = y + \phi_1 k$, where $0 < \phi_1 < 1$. Then
\[
g_1(x + h) - g_1(x) = hk\, D_2 D_1 f(x + \theta_1 h, y + \phi_1 k).
\]
Also, applying the MVT to $g_2$, we obtain
\[
g_2(y + k) - g_2(y) = k\, g_2'(y + \phi_2 k) = k\left[ D_2 f(x + h, y + \phi_2 k) - D_2 f(x, y + \phi_2 k) \right], \quad 0 < \phi_2 < 1.
\]
Applying the MVT in the variable x to $D_2 f$, we obtain
\[
D_2 f(x + h, y + \phi_2 k) - D_2 f(x, y + \phi_2 k) = h\, D_1 D_2 f(x + \theta_2 h, y + \phi_2 k), \quad 0 < \theta_2 < 1.
\]
Therefore,
\[
g_2(y + k) - g_2(y) = kh\, D_1 D_2 f(x + \theta_2 h, y + \phi_2 k).
\]
Hence,
\[
hk\, D_2 D_1 f(x + \theta_1 h, y + \phi_1 k) = kh\, D_1 D_2 f(x + \theta_2 h, y + \phi_2 k),
\]
where $0 < \theta_1, \theta_2, \phi_1, \phi_2 < 1$. Dividing by hk and letting $h, k \to 0$, the continuity of the second-order partial derivatives gives
\[
D_2 D_1 f(x + \theta_1 h, y + \phi_1 k) \to D_2 D_1 f(x, y)
\quad \text{and} \quad
D_1 D_2 f(x + \theta_2 h, y + \phi_2 k) \to D_1 D_2 f(x, y).
\]
Therefore,
\[
D_1 D_2 f(x, y) = D_2 D_1 f(x, y). \qquad \square
\]

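Exercise 4.2.5. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by
\[
f(x, y) = x y^2 \cos x^2.
\]
Show that
\[
\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}.
\]
Solution:
\[
\frac{\partial f}{\partial x} = y^2\left[ -2x^2 \sin x^2 + \cos x^2 \right] = -2x^2 y^2 \sin x^2 + y^2 \cos x^2,
\]
\[
\frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y}\left( \frac{\partial f}{\partial x} \right) = -4x^2 y \sin x^2 + 2y \cos x^2.
\]
On the other hand,
\[
\frac{\partial f}{\partial y} = x \cos x^2 \cdot [2y] = 2xy \cos x^2,
\]
\[
\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}\left( \frac{\partial f}{\partial y} \right) = 2y\left( -2x^2 \sin x^2 + \cos x^2 \right) = -4x^2 y \sin x^2 + 2y \cos x^2.
\]
The two mixed partial derivatives therefore agree.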

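Definition 4.2.6. A function $f : U \subseteq \mathbb{R}^n \to \mathbb{R}$, where U is an open set, is said to be of class $C^k(U)$ if the k-th derivative of f exists and is continuous on U. Equivalently, f is of class $C^k(U)$ if all the k-th order partial derivatives
\[
D_{i_1, \dots, i_k} f = D_{i_1} D_{i_2} \cdots D_{i_k} f = \frac{\partial^k f}{\partial x_{i_1} \partial x_{i_2} \cdots \partial x_{i_k}}
\]
exist and are continuous. The function is of class $C^\infty(U)$ if it is of class $C^k(U)$ for every $k = 1, 2, \dots$.

Each polynomial
\[
f(x) = \sum a_{i_1, \dots, i_n}\, x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}
\]
is of class $C^\infty$. Another example of a $C^\infty$ function is $f(x) = e^{-|x|^2}$.

Recall that if $f : \mathbb{R}^n \to \mathbb{R}$ and $u = (u_1, \dots, u_n) \in \mathbb{R}^n$, we have
\[
Df(x)(u) = \sum_{i=1}^{n} D_i f(x)\, u_i.
\]
If we replace $(u_1, \dots, u_n)$ by $(dx_1, \dots, dx_n)$, then we have
\[
df(x, dx) = \sum_{i=1}^{n} (D_i f)(x)\, dx_i.
\]
Then df is called the differential of f at x. If $f : \mathbb{R}^2 \to \mathbb{R}$ and we denote an element of $\mathbb{R}^2$ by (x, y) and take u = (dx, dy), then the differential df is given by
\[
df = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy.
\]
If $f = f(u, v)$, where $u = u(x, y)$ and $v = v(x, y)$, then
\[
\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u} \cdot \frac{\partial u}{\partial x} + \frac{\partial f}{\partial v} \cdot \frac{\partial v}{\partial x},
\qquad
\frac{\partial f}{\partial y} = \frac{\partial f}{\partial u} \cdot \frac{\partial u}{\partial y} + \frac{\partial f}{\partial v} \cdot \frac{\partial v}{\partial y}.
\]
Therefore, the differential df becomes
\[
df = \left( \frac{\partial f}{\partial u} \cdot \frac{\partial u}{\partial x} + \frac{\partial f}{\partial v} \cdot \frac{\partial v}{\partial x} \right) dx
+ \left( \frac{\partial f}{\partial u} \cdot \frac{\partial u}{\partial y} + \frac{\partial f}{\partial v} \cdot \frac{\partial v}{\partial y} \right) dy.
\]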

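Definition 4.2.7. Let f be a real-valued function defined on $\mathbb{R}^n$. The second-order differential $d^2 f$ is the function defined, for each $x \in \mathbb{R}^n$ at which f has continuous second-order partial derivatives and for each $t \in \mathbb{R}^n$, by
\[
d^2 f(x; t) = \sum_{i=1}^{n} \sum_{j=1}^{n} D_{ij} f(x)\, t_j t_i,
\]
where $t = (t_1, t_2, \dots, t_n)$.

The third-order differential $d^3 f$ is defined as
\[
d^3 f(x; t) = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} D_{ijk} f(x)\, t_k t_j t_i,
\]
if all the third-order partial derivatives exist at x and are continuous.

Similarly, the m-th order differential $d^m f$ is defined as
\[
d^m f(x; t) = \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \cdots \sum_{i_m=1}^{n} D_{i_1, \dots, i_m} f(x)\, t_{i_m} t_{i_{m-1}} \cdots t_{i_1},
\]
if all the m-th order partial derivatives exist and are continuous. Now, consider $f : \mathbb{R}^2 \to \mathbb{R}$. If we denote an element of $\mathbb{R}^2$ by (x, y) and if t = (dx, dy), then
\[
\begin{aligned}
d^2 f((x, y), (dx, dy)) &= \frac{\partial^2 f}{\partial x^2}(dx)^2 + \frac{\partial^2 f}{\partial x \partial y}\, dx\, dy + \frac{\partial^2 f}{\partial y \partial x}\, dy\, dx + \frac{\partial^2 f}{\partial y^2}(dy)^2 \\
&= \frac{\partial^2 f}{\partial x^2}(dx)^2 + 2\frac{\partial^2 f}{\partial x \partial y}\, dx\, dy + \frac{\partial^2 f}{\partial y^2}(dy)^2.
\end{aligned}
\]
Also,
\[
d^3 f(x, t) = \frac{\partial^3 f}{\partial x^3}(dx)^3 + 3\frac{\partial^3 f}{\partial x^2 \partial y}(dx)^2 dy + 3\frac{\partial^3 f}{\partial x \partial y^2}\, dx (dy)^2 + \frac{\partial^3 f}{\partial y^3}(dy)^3,
\]
\[
d^4 f(x, t) = \frac{\partial^4 f}{\partial x^4}(dx)^4 + 4\frac{\partial^4 f}{\partial x^3 \partial y}(dx)^3 dy + 6\frac{\partial^4 f}{\partial x^2 \partial y^2}(dx)^2 (dy)^2 + 4\frac{\partial^4 f}{\partial x \partial y^3}\, dx (dy)^3 + \frac{\partial^4 f}{\partial y^4}(dy)^4,
\]
\[
d^m f(x, t) = \frac{\partial^m f}{\partial x^m}(dx)^m + \binom{m}{1}\frac{\partial^m f}{\partial x^{m-1} \partial y}(dx)^{m-1} dy + \binom{m}{2}\frac{\partial^m f}{\partial x^{m-2} \partial y^2}(dx)^{m-2}(dy)^2 + \cdots + \frac{\partial^m f}{\partial y^m}(dy)^m.
\]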
We are now ready to state and prove Taylor’s theorem.

Theorem 4.2.8. (Taylor’s Theorem) Let m be a positive integer, and let $U \subset \mathbb{R}^n$ be an open set. Suppose that $f : U \to \mathbb{R}$ has continuous partial derivatives up to and including order m at each point of U. If $a \in U$, $b \in U$ are such that $L[a, b] \subset U$, then there exists a point $\xi \in L[a, b]$ such that
\[
\begin{aligned}
f(b) &= f(a) + df(a, b - a) + \frac{1}{2!} d^2 f(a, b - a) + \cdots + \frac{1}{(m - 1)!} d^{m-1} f(a, b - a) + \frac{1}{m!} d^m f(\xi, b - a) \\
&= f(a) + \sum_{k=1}^{m-1} \frac{1}{k!} d^k f(a, b - a) + \frac{1}{m!} d^m f(\xi, b - a).
\end{aligned}
\]

Proof:
Define a new function of a single variable by the equation
\[
g(t) = f[tb + (1 - t)a] = f[a + t(b - a)] = f(a_1 + t(b_1 - a_1), a_2 + t(b_2 - a_2), \dots, a_n + t(b_n - a_n)).
\]
Then
\[
g(0) = f(a), \qquad g(1) = f(b).
\]
Applying the single-variable Taylor’s theorem to g, we have
\[
g(1) = g(0) + \sum_{k=1}^{m-1} \frac{1}{k!} g^{(k)}(0) + \frac{1}{m!} g^{(m)}(\tau), \tag{T}
\]
where $\tau \in (0, 1)$. Clearly, g is the composite function given by
\[
g(t) = f \circ \psi(t) = f(\psi(t)),
\]
where
\[
\psi(t) = tb + (1 - t)a = a + t(b - a).
\]
The derivative of the k-th component of $\psi$ is given by $\psi_k'(t) = b_k - a_k$.

By applying the chain rule to g, we see that $g'(t)$ exists on the interval [0, 1] and is given by
\[
g'(t) = \sum_{j=1}^{n} D_j f(\psi(t))(b_j - a_j) = df(\psi(t), b - a).
\]
Applying the chain rule again, we obtain
\[
g''(t) = \sum_{i=1}^{n} \sum_{j=1}^{n} D_{ij} f(\psi(t))(b_j - a_j)(b_i - a_i) = d^2 f(\psi(t), b - a).
\]
Similarly, we find that
\[
g^{(m)}(t) = \sum_{i_1=1}^{n} \cdots \sum_{i_m=1}^{n} D_{i_1, \dots, i_m} f(\psi(t))(b_{i_1} - a_{i_1}) \cdots (b_{i_m} - a_{i_m}) = d^m f(\psi(t), b - a).
\]
Substituting these in equation (T), with $\psi(0) = a + 0(b - a) = a$ and $\xi = \psi(\tau) \in L[a, b]$, we have $g^{(k)}(0) = d^k f(a, b - a)$ and $g^{(m)}(\tau) = d^m f(\xi, b - a)$. Since $g(0) = f(a)$ and $g(1) = f(b)$, we obtain the required Taylor expansion
\[
f(b) = f(a) + \sum_{j=1}^{n} D_j f(a)(b_j - a_j) + \frac{1}{2!} \sum_{i=1}^{n} \sum_{j=1}^{n} D_{ij} f(a)(b_j - a_j)(b_i - a_i) + \cdots + \frac{1}{m!} d^m f(\xi, b - a). \qquad \square
\]
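Example 4.2.9. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(x, y) = \sin(xy)$. Obtain the Taylor expansion for f at the point (0, 0) up to order 2.

Solution:
Take a = (0, 0) and b = (x, y), so that dx = x − 0, dy = y − 0 and
\[
f(a) = f(0, 0) = \sin 0 = 0.
\]
Up to order 2,
\[
f(x, y) \approx f(0, 0) + \frac{\partial f(0, 0)}{\partial x} x + \frac{\partial f(0, 0)}{\partial y} y
+ \frac{1}{2!}\left[ \frac{\partial^2 f(0, 0)}{\partial x^2} x^2 + 2\frac{\partial^2 f(0, 0)}{\partial x \partial y} xy + \frac{\partial^2 f(0, 0)}{\partial y^2} y^2 \right].
\]
Now
\[
\frac{\partial f}{\partial x} = y \cos(xy), \qquad \frac{\partial f(0, 0)}{\partial x} = 0,
\]
\[
\frac{\partial f}{\partial y} = x \cos(xy), \qquad \frac{\partial f(0, 0)}{\partial y} = 0,
\]
\[
\frac{\partial^2 f}{\partial x^2} = -y^2 \sin(xy), \qquad \frac{\partial^2 f(0, 0)}{\partial x^2} = 0,
\]
\[
\frac{\partial^2 f}{\partial y^2} = -x^2 \sin(xy), \qquad \frac{\partial^2 f(0, 0)}{\partial y^2} = 0,
\]
\[
\frac{\partial^2 f}{\partial x \partial y} = -xy \sin(xy) + \cos(xy), \qquad \frac{\partial^2 f(0, 0)}{\partial x \partial y} = 1.
\]
Therefore $\sin(xy) \approx \frac{1}{2!}(2xy) = xy$ near the origin.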
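
The quality of the second-order approximation of sin(xy) by xy near the origin can be illustrated numerically. The sketch below evaluates the error along the diagonal x = y = r for a few hand-picked radii; the sample radii are assumptions of the illustration.

```python
import math

def f(x, y):
    return math.sin(x * y)

def p2(x, y):
    """Second-order Taylor polynomial of sin(xy) at (0, 0)."""
    return x * y

errors = []
for r in (0.4, 0.2, 0.1):
    # Evaluate on the diagonal x = y = r, at distance r * sqrt(2) from the origin.
    errors.append(abs(f(r, r) - p2(r, r)))
print(errors)
```

Halving r reduces the error by a factor of roughly 64 here, because the first neglected term of sin(xy) is −(xy)³/6, which is of degree six in (x, y).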
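Problems 4.2

1. Let f be a function of class $C^{m+1}$ in a neighborhood of $x_0 \in \mathbb{R}^n$. Show that there exists a unique polynomial $P_{m, x_0}$ of degree at most m such that
\[
\lim_{x \to x_0} \frac{f(x) - P_{m, x_0}(x)}{\| x - x_0 \|^m} = 0.
\]
Solution: Let $P_{k, x_0}$ denote the Taylor polynomial of degree k of f at $x_0$. Since the function f is of class $C^m$, all the m-th order partial derivatives exist and are continuous in a neighbourhood of $x_0$. Therefore, given $\varepsilon > 0$, there exists $\delta > 0$ such that
\[
\left| \frac{\partial^m f(\tilde{x})}{\partial x_{i_m} \partial x_{i_{m-1}} \cdots \partial x_{i_1}} - \frac{\partial^m f(x_0)}{\partial x_{i_m} \partial x_{i_{m-1}} \cdots \partial x_{i_1}} \right| < \varepsilon
\]
for all indices $i_1, \dots, i_m$, provided $\tilde{x} \in B(x_0, \delta)$, i.e. $\| \tilde{x} - x_0 \| < \delta$. Suppose that $x \in B(x_0, \delta)$; then Taylor’s formula becomes
\[
f(x) = P_{m-1, x_0}(x) + \frac{1}{m!} d^m f(\tilde{x}, x - x_0),
\]
where $\tilde{x} \in L[x_0, x]$ and hence $\tilde{x} \in B(x_0, \delta)$ (since $B(x_0, \delta)$ is a convex set). Then the expansion may be written as
\[
f(x) = P_{m, x_0}(x) + \frac{1}{m!}\left[ d^m f(\tilde{x}, x - x_0) - d^m f(x_0, x - x_0) \right],
\]
where
\[
d^m f(\tilde{x}, x - x_0) = \sum_{i_1, \dots, i_m = 1}^{n} \frac{\partial^m f(\tilde{x})}{\partial x_{i_m} \partial x_{i_{m-1}} \cdots \partial x_{i_1}} (x_{i_1} - x^0_{i_1})(x_{i_2} - x^0_{i_2}) \cdots (x_{i_m} - x^0_{i_m}),
\]
\[
d^m f(x_0, x - x_0) = \sum_{i_1, \dots, i_m = 1}^{n} \frac{\partial^m f(x_0)}{\partial x_{i_m} \partial x_{i_{m-1}} \cdots \partial x_{i_1}} (x_{i_1} - x^0_{i_1})(x_{i_2} - x^0_{i_2}) \cdots (x_{i_m} - x^0_{i_m}).
\]
Since $|x_i - x^0_i| \le \| x - x_0 \|$ for each i,
\[
\begin{aligned}
| d^m f(\tilde{x}, x - x_0) - d^m f(x_0, x - x_0) |
&\le \sum_{i_1, \dots, i_m = 1}^{n} \left| \frac{\partial^m f(\tilde{x})}{\partial x_{i_m} \cdots \partial x_{i_1}} - \frac{\partial^m f(x_0)}{\partial x_{i_m} \cdots \partial x_{i_1}} \right| |x_{i_1} - x^0_{i_1}| \cdots |x_{i_m} - x^0_{i_m}| \\
&< \sum_{i_1 = 1}^{n} \sum_{i_2 = 1}^{n} \cdots \sum_{i_m = 1}^{n} \varepsilon \| x - x_0 \|^m = \varepsilon\, n^m \| x - x_0 \|^m.
\end{aligned}
\]
Therefore,
\[
\frac{| f(x) - P_{m, x_0}(x) |}{\| x - x_0 \|^m} < \frac{n^m \varepsilon}{m!}.
\]
Since $\varepsilon$ is arbitrary,
\[
\lim_{x \to x_0} \frac{| f(x) - P_{m, x_0}(x) |}{\| x - x_0 \|^m} = 0.
\]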

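2. Obtain a second degree polynomial for estimating the value of $f(x, y) = (1 + x + 4y)^{1/2}$ near the point $\xi_0 = (-1, 1)$.

Solution: $f(x, y) = (1 + x + 4y)^{1/2}$, $\xi_0 = (x_0, y_0) = (-1, 1)$, and
\[
f(\xi_0) = f(x_0, y_0) = f(-1, 1) = 2.
\]
The second-degree Taylor polynomial is
\[
\begin{aligned}
f(x, y) \approx{}& f(\xi_0) + \left[ \frac{\partial f(\xi_0)}{\partial x}(x - x_0) + \frac{\partial f(\xi_0)}{\partial y}(y - y_0) \right] \\
&+ \frac{1}{2!}\left[ \frac{\partial^2 f(\xi_0)}{\partial x^2}(x - x_0)^2 + 2\frac{\partial^2 f(\xi_0)}{\partial x \partial y}(x - x_0)(y - y_0) + \frac{\partial^2 f(\xi_0)}{\partial y^2}(y - y_0)^2 \right].
\end{aligned}
\]
We compute
\[
\frac{\partial f}{\partial x} = \tfrac{1}{2}(1 + x + 4y)^{-1/2}, \qquad \frac{\partial f(\xi_0)}{\partial x} = \frac{1}{4},
\]
\[
\frac{\partial f}{\partial y} = \tfrac{1}{2}(1 + x + 4y)^{-1/2} \cdot 4, \qquad \frac{\partial f(\xi_0)}{\partial y} = 1,
\]
\[
\frac{\partial^2 f}{\partial x^2} = -\tfrac{1}{4}(1 + x + 4y)^{-3/2}, \qquad \frac{\partial^2 f(\xi_0)}{\partial x^2} = -\frac{1}{32},
\]
\[
\frac{\partial^2 f}{\partial x \partial y} = 2\left[ -\tfrac{1}{2}(1 + x + 4y)^{-3/2} \right], \qquad \frac{\partial^2 f(\xi_0)}{\partial x \partial y} = -\frac{1}{8},
\]
\[
\frac{\partial^2 f}{\partial y^2} = 2\left[ -\tfrac{1}{2}(1 + x + 4y)^{-3/2} \right] \cdot 4, \qquad \frac{\partial^2 f(\xi_0)}{\partial y^2} = -\frac{1}{2}.
\]
Thus,
\[
f(x, y) \approx 2 + \left[ \tfrac{1}{4}(x + 1) + (y - 1) \right] + \frac{1}{2!}\left[ -\tfrac{1}{32}(x + 1)^2 + 2\left( -\tfrac{1}{8} \right)(x + 1)(y - 1) - \tfrac{1}{2}(y - 1)^2 \right],
\]
that is,
\[
f(x, y) \approx 2 + \frac{x + 1}{4} + (y - 1) - \frac{(x + 1)^2}{64} - \frac{(x + 1)(y - 1)}{8} - \frac{(y - 1)^2}{4},
\]
or equivalently
\[
64 f(x, y) \approx 128 + 16(x + 1) + 64(y - 1) - (x + 1)^2 - 8(x + 1)(y - 1) - 16(y - 1)^2.
\]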
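
A quick numerical check of this estimate, offered as a sketch: the polynomial `p2` is assembled directly from the partial derivatives at (−1, 1) computed in the solution (1/4, 1, −1/32, −1/8, −1/2), with each second-order term carrying the factor 1/2!. The sample offsets are arbitrary.

```python
import math

def f(x, y):
    return math.sqrt(1 + x + 4 * y)

def p2(x, y):
    """Second-degree Taylor polynomial of f at (-1, 1)."""
    u, v = x + 1, y - 1
    first_order = u / 4 + v
    second_order = 0.5 * (-u * u / 32 + 2 * (-1 / 8) * u * v - v * v / 2)
    return 2 + first_order + second_order

errors = []
for step in (0.1, 0.05, 0.025):
    x, y = -1 + step, 1 + step
    errors.append(abs(f(x, y) - p2(x, y)))
print(errors)
```

Each halving of the offset shrinks the error by a factor close to 8, which is the behaviour expected of a remainder of third order.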

Example 4.2.10. Obtain the Taylor expansion of each of the following functions.

1. $f(x, y) = x^2 y + 3y - 2$, up to and including order 3, near the point (1, −2).

2. $f(x, y) = \log(1 + x + 2y)$ about the origin.

3. $f(x, y) = (e^x, e^y)$ about $\xi_0 = (0, 0)$.

4. $f(x, y) = \cos(\sin(xy))$ in powers of $(x - 1)$ and $(y - \pi/3)$.

5. Obtain the third Taylor polynomial of $f(x, y) = \cos(x^2 y)$ in powers of $(x - 2)$ and $(y - \pi/6)$.

Chapter 5

Applications to Extremum Problems and the Method of Lagrange Multipliers

5.1 Extremum Problems


Definition 5.1.1. Consider the function f : U ⊂ IRn → IR, where U is an
open set. Let x◦ ∈ U and assume that f is differentiable at x◦ . If D1 f (x◦ ) =
0, · · · , Dn f (x◦ ) = 0, the point x◦ is called a critical point (or stationary point of f ).

Example 5.1.2. Find the critical point of the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x, y) = e^{-(x^2 + y^2)}$.

Solution: Taking partial derivatives w.r.t. x and y,
\[
\frac{\partial f}{\partial x} = -2x e^{-(x^2 + y^2)}, \qquad \frac{\partial f}{\partial y} = -2y e^{-(x^2 + y^2)}.
\]
The only values of x, y for which $\dfrac{\partial f}{\partial x} = 0 = \dfrac{\partial f}{\partial y}$ are x = 0, y = 0.
Therefore the only critical point of the function is (0, 0).

Definition 5.1.3. Let f be a real-valued function (differentiable or not) defined in an open subset U of $\mathbb{R}^n$. We say that f has a local maximum at a point $x_0 \in U$ if there exists $\delta > 0$ such that $f(x) \le f(x_0)$ for every $x \in B(x_0, \delta)$.
f is said to have an absolute maximum at the point $x_0 \in U$ if $f(x) \le f(x_0)$ for every $x \in U$. The concepts of local minimum and absolute minimum are defined analogously.

Definition 5.1.4. A point at which f has either a local maximum or a local minimum is called an EXTREMUM point of f.

Remark 5.1.5. The following result gives a necessary condition for a point to be a local maximum: it must be a critical point.

Theorem 5.1.6. Let f be a real-valued function of class $C^1$ in an open subset U of $\mathbb{R}^n$. If f has a local maximum at the point $x_0 \in U$, then the derivative $Df(x_0) = 0$, i.e. $D_i f(x_0) = 0$, $i = 1, 2, \dots, n$.

Proof: Let u be a non-zero element of $\mathbb{R}^n$. Since U is open, $x_0 + tu \in U$ for all sufficiently small values of t, so that $g(t) = f(x_0 + tu)$ is defined near t = 0. Furthermore, for small values of t, tu is small and hence $x_0 + tu \in B(x_0, \delta)$, so that $f(x_0 + tu) \le f(x_0)$. Therefore, the function of a single variable $g(t) = f(x_0 + tu)$ has a local maximum at t = 0. (Recall the proof from one-variable calculus: because g(0) is a local maximum, $g(t) \le g(0)$ for small t > 0, so $g(t) - g(0) \le 0$, and hence
\[
g'(0) = \lim_{t \to 0^+} \frac{g(t) - g(0)}{t} \le 0.
\]
For small t < 0, we similarly have
\[
g'(0) = \lim_{t \to 0^-} \frac{g(t) - g(0)}{t} \ge 0.
\]
Therefore, $g'(0) = 0$.) By the chain rule, $g'(0) = [Df(x_0)]u = \nabla f(x_0) \cdot u$, so $\nabla f(x_0) \cdot u = 0$ for every u. Taking u to be each standard basis vector in turn gives $D_1 f(x_0) = 0, \dots, D_n f(x_0) = 0$.
This completes the proof. □

Definition 5.1.7. A real-valued function Q defined on $\mathbb{R}^n$ by an equation of the type
\[
Q(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j,
\]
where $x = (x_1, \dots, x_n)$ and the $a_{ij}$ are real numbers, is called a quadratic form. The quadratic form is called

1. Symmetric if $a_{ij} = a_{ji}$

2. Positive definite if $x \neq 0 \Rightarrow Q(x) > 0$

3. Negative definite if $x \neq 0 \Rightarrow Q(x) < 0$

Definition 5.1.8. A critical point x◦ is called a SADDLE point if every neighbour-


hood B(x◦ , δ) contains point x such that f (x) < f (x◦ ) and other points such that
f (x) > f (x◦ ).

The last theorem gave the necessary condition for a local maximum point of a
C 1 -function to be a critical point. The next result gives a sufficient condition for a
critical point to be a local maximum or local minimum.

Theorem 5.1.9. Let U ⊂ IRn be an open set. Suppose


(i) f : U → IR is of class C 2 on U and
(ii) x◦ ∈ U is a critical point of f . Let
n X
X n
Q(ξ) = Dij f (x◦ )ξi ξj .
i=1 j=1

(a) If Q is positive definite, then the point x◦ is a local minimum of f .


(b) If Q is negative definite, then the point x◦ is a local maximum.
(c) If Q takes both positive and negative values then the point x◦ is a saddle point
of f .

For a function f of two variables, the quadratic form is given by
\[
Q(x, y) = \frac{\partial^2 f(x_0, y_0)}{\partial x^2} x^2 + 2\frac{\partial^2 f(x_0, y_0)}{\partial x \partial y} xy + \frac{\partial^2 f(x_0, y_0)}{\partial y^2} y^2.
\]
Theorem 5.1.10. Let $U \subset \mathbb{R}^2$ be an open set, let $f : U \to \mathbb{R}$ be of class $C^2$ and let $(x_0, y_0) \in U$ be a critical point of f. Write
\[
D = \frac{\partial^2 f(x_0, y_0)}{\partial x^2} \cdot \frac{\partial^2 f(x_0, y_0)}{\partial y^2} - \left( \frac{\partial^2 f(x_0, y_0)}{\partial x \partial y} \right)^2.
\]
(a) If $\dfrac{\partial^2 f(x_0, y_0)}{\partial x^2} > 0$ and $D > 0$, it follows that $(x_0, y_0)$ is a local minimum of f.

(b) If $\dfrac{\partial^2 f(x_0, y_0)}{\partial x^2} < 0$ and $D > 0$, it follows that $(x_0, y_0)$ is a local maximum of f.

(c) If $D < 0$, then $(x_0, y_0)$ is neither a local maximum nor a local minimum. In this case, $(x_0, y_0)$ is a saddle point.

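Before giving the proof of the above theorem, we shall need the following.

Definition 5.1.11. The Hessian matrix of f is the matrix of second-order partial derivatives of f (that is, the Jacobian matrix of the gradient $\nabla f$), given by
\[
H =
\begin{pmatrix}
\dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\[1ex]
\dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2}
\end{pmatrix},
\]
and its determinant at $(x_0, y_0)$ is
\[
\det(H) =
\begin{vmatrix}
\dfrac{\partial^2 f(x_0, y_0)}{\partial x^2} & \dfrac{\partial^2 f(x_0, y_0)}{\partial x \partial y} \\[1ex]
\dfrac{\partial^2 f(x_0, y_0)}{\partial y \partial x} & \dfrac{\partial^2 f(x_0, y_0)}{\partial y^2}
\end{vmatrix}.
\]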
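Proof:
Let $A = \dfrac{\partial^2 f(x_0, y_0)}{\partial x^2}$, $B = \dfrac{\partial^2 f(x_0, y_0)}{\partial x \partial y}$ and $C = \dfrac{\partial^2 f(x_0, y_0)}{\partial y^2}$. Then
\[
D = \begin{vmatrix} A & B \\ B & C \end{vmatrix} = AC - B^2.
\]
We only need to look at the quadratic form $Q(x, y) = Ax^2 + 2Bxy + Cy^2$.

(a) Suppose $A \neq 0$. Specifically, let A > 0. Then we write
\[
\begin{aligned}
Q(x, y) &= A\left[ x^2 + \frac{2B}{A} xy + \frac{B^2}{A^2} y^2 + \left( \frac{C}{A} - \frac{B^2}{A^2} \right) y^2 \right] \\
&= A\left( x^2 + \frac{2B}{A} xy + \frac{B^2}{A^2} y^2 \right) + \left( C - \frac{B^2}{A} \right) y^2 \\
&= A\left( x + \frac{B}{A} y \right)^2 + \frac{1}{A}(CA - B^2) y^2 \\
&= A\left( x + \frac{B}{A} y \right)^2 + \frac{D}{A} y^2.
\end{aligned}
\]
If D > 0, both terms are non-negative and they vanish together only when y = 0 and x = 0, so the quadratic form Q is positive definite. Therefore, by the last theorem, the point $(x_0, y_0)$ is a local minimum.

(b) Suppose A < 0 and D > 0. Then $\dfrac{D}{A}$ is negative, and the same identity shows that the quadratic form is negative definite. Hence $(x_0, y_0)$ is a local maximum.

(c) If D < 0, then there are three possibilities:

(i) $A \neq 0$; then $Q(1, 0) = A$ and $Q\left( \dfrac{-B}{A}, 1 \right) = \dfrac{D}{A}$.

(ii) $C \neq 0$; then $Q(0, 1) = C$ and $Q(1, -B/C) = D/C$.

(iii) A = C = 0; then $B \neq 0$, and $Q(1, 1) = 2B = -Q(1, -1)$.

In each case, the two given values of Q differ in sign and so the point $(x_0, y_0)$ is neither a local maximum nor a local minimum. Therefore, it follows that the point $(x_0, y_0)$ is a saddle point of f. □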

Example 5.1.12. Investigate the nature of the critical points of the function
f (x, y) = y 3 − 2x2 − 2y 2 + y and determine whether each is a local minimum, local
maximum or saddle point.

Solution:
\[
\frac{\partial f}{\partial x} = -4x, \qquad \frac{\partial f}{\partial y} = 3y^2 - 4y + 1.
\]
At critical points,
−4x = 0 ⇒ x = 0, and
3y² − 4y + 1 = 0 ⇒ y = 1 or y = 1/3.
The critical points are (0, 1) and (0, 1/3). Moreover,
\[
A = \frac{\partial^2 f(x_0, y_0)}{\partial x^2} = -4, \qquad B = \frac{\partial^2 f(x_0, y_0)}{\partial x \partial y} = 0, \qquad C = \frac{\partial^2 f(x_0, y_0)}{\partial y^2} = 6y - 4,
\]
and D = AC − B².

         (0, 1)    (0, 1/3)
    A     −4         −4
    B      0          0
    C      2         −2
    D     −8          8

We see from the table that:

1. For the point (0, 1), D < 0. Therefore (0, 1) is a saddle point.

2. For the point (0, 1/3), we have A < 0 and D > 0. Therefore, (0, 1/3) is a local maximum.

Example: Discuss the critical points of the function $f(x, y) = 2x^4 + y^4 - 2x^2 - 2y^2$ in the open set $B := \{ (x, y) \in \mathbb{R}^2 : x^2 + y^2 < 1 \}$.

Solution:
\[
\frac{\partial f}{\partial x} = 8x^3 - 4x, \qquad \frac{\partial f}{\partial y} = 4y^3 - 4y.
\]
The necessary condition at a critical point is
\[
\frac{\partial f}{\partial x} = 0 \Rightarrow 8x^3 - 4x = 0 \Rightarrow x = 0,\ 1/\sqrt{2},\ -1/\sqrt{2},
\]
\[
\frac{\partial f}{\partial y} = 0 \Rightarrow 4y^3 - 4y = 0 \Rightarrow y = 0,\ -1,\ 1.
\]
The candidate points are

    y \ x      0          1/√2           −1/√2
     0       (0, 0)     (1/√2, 0)      (−1/√2, 0)
    −1       (0, −1)    (1/√2, −1)     (−1/√2, −1)
     1       (0, 1)     (1/√2, 1)      (−1/√2, 1)

Since we are in the open set B, where $x^2 + y^2 < 1$, the points with y = ±1 are excluded. The critical points that lie in the open set are (0, 0), (1/√2, 0) and (−1/√2, 0). Now
\[
A = \frac{\partial^2 f}{\partial x^2} = 24x^2 - 4, \qquad B = \frac{\partial^2 f}{\partial x \partial y} = 0, \qquad C = \frac{\partial^2 f}{\partial y^2} = 12y^2 - 4.
\]

         (0, 0)    (1/√2, 0)    (−1/√2, 0)
    A     −4          8             8
    B      0          0             0
    C     −4         −4            −4
    D     16        −32           −32

1. At the point (0, 0), A < 0 and D > 0 ⇒ (0, 0) is a local maximum.

2. At the points (1/√2, 0) and (−1/√2, 0), D < 0 ⇒ both are saddle points.
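
The classification rule used in these examples is mechanical enough to write down as a short routine. The following is a minimal sketch: the function name `classify` and the "inconclusive" label for D = 0 are choices made for this illustration, and the second partial derivatives of f(x, y) = 2x⁴ + y⁴ − 2x² − 2y² are entered by hand.

```python
def classify(a, b, c):
    """Second-derivative test from A = f_xx, B = f_xy, C = f_yy at a critical point."""
    d = a * c - b * b
    if d > 0 and a > 0:
        return "local minimum"
    if d > 0 and a < 0:
        return "local maximum"
    if d < 0:
        return "saddle point"
    return "inconclusive"  # D = 0: the test gives no information

def second_partials(x, y):
    """(A, B, C) for f(x, y) = 2x^4 + y^4 - 2x^2 - 2y^2."""
    return 24 * x * x - 4, 0.0, 12 * y * y - 4

r = 2 ** -0.5  # 1 / sqrt(2)
critical_points = [(0.0, 0.0), (r, 0.0), (-r, 0.0)]
results = {pt: classify(*second_partials(*pt)) for pt in critical_points}
for pt, kind in results.items():
    print(pt, kind)
```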

Exercise 5.1.13. Discuss the critical points of the function $f(x, y, z) = x + 2y + yz - x^2 - y^2 - z^2$
(i) in $\mathbb{R}^3$
(ii) in the closed ball
$S = \{ (x, y, z) \in \mathbb{R}^3 : x^2 + y^2 + z^2 \le 1 \}$

5.2 Lagrange Multipliers


In this section, we shall discuss the method of Lagrange multipliers for solving
extremum problem with constraints.
Before describing the method in general, we wish to demonstrate the use of the
method with the following example.

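Example 5.2.1. Find the extremum points of the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
\[
f(x, y) = x^2 - xy + y^2 - \frac{x^3}{3}
\]
if x and y satisfy the equation 2y − x = 4, i.e. g(x, y) = 0 with
\[
g(x, y) = 2y - x - 4.
\]
Solution:
We form a new function
\[
F(x, y, \lambda) = f(x, y) + \lambda g(x, y) = x^2 - xy + y^2 - \frac{x^3}{3} + \lambda(2y - x - 4).
\]
Then
\[
\frac{\partial F(x, y, \lambda)}{\partial x} = 2x - y - x^2 - \lambda, \qquad
\frac{\partial F(x, y, \lambda)}{\partial y} = -x + 2y + 2\lambda, \qquad
\frac{\partial F(x, y, \lambda)}{\partial \lambda} = 2y - x - 4.
\]
Equate $\dfrac{\partial F}{\partial x}$, $\dfrac{\partial F}{\partial y}$ and $\dfrac{\partial F}{\partial \lambda}$ to zero and solve the resulting simultaneous equations:
\[
2x - y - x^2 - \lambda = 0 \tag{1}
\]
\[
-x + 2y + 2\lambda = 0 \tag{2}
\]
\[
2y - x - 4 = 0 \tag{3}
\]
Subtracting (3) from (2), we get
\[
4 + 2\lambda = 0 \Rightarrow \lambda = -2.
\]
Substituting λ = −2 in (1) and pairing the result with (3), we obtain
\[
2x - y - x^2 + 2 = 0 \tag{4}
\]
\[
-x + 2y - 4 = 0 \tag{5}
\]
Multiplying (4) by 2 and (5) by 1 gives
\[
4x - 2y - 2x^2 + 4 = 0, \qquad -x + 2y - 4 = 0.
\]
Adding,
\[
3x - 2x^2 = 0 \Rightarrow x = 0,\ 3/2.
\]
Substituting x = 0, 3/2 in equation (3):
for x = 0, y = 2;
for x = 3/2, y = 11/4.
The critical points are (0, 2) and (3/2, 11/4). We have f(0, 2) = 4 and f(3/2, 11/4) = 73/16 = 4 9/16 ≈ 4.56. Along the constraint line y = (x + 4)/2 the function reduces to $4 + \tfrac{3}{4}x^2 - \tfrac{1}{3}x^3$, which has a local minimum at x = 0 and a local maximum at x = 3/2 and is unbounded as $x \to \pm\infty$. Hence, subject to the constraint, f has a local maximum 73/16 ≈ 4.56 at the point (3/2, 11/4) and a local minimum 4 at the point (0, 2); neither is an absolute extremum. □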
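
The computations of this example can be checked with a few lines of code. This sketch verifies that both candidate points satisfy the three Lagrange equations with λ = −2, and then samples f along the constraint line 2y − x = 4; the helper names and the sample abscissas are assumptions made for the illustration.

```python
def f(x, y):
    return x * x - x * y + y * y - x ** 3 / 3

def on_constraint(x):
    """Point of the line 2y - x = 4 with first coordinate x."""
    return x, (x + 4) / 2

def grad_F(x, y, lam):
    """Partial derivatives of F = f + lam * (2y - x - 4) w.r.t. x, y, lam."""
    return (2 * x - y - x * x - lam, -x + 2 * y + 2 * lam, 2 * y - x - 4)

candidates = [(0.0, 2.0), (1.5, 2.75)]
residuals = [max(abs(c) for c in grad_F(x, y, -2.0)) for x, y in candidates]
values = [f(x, y) for x, y in candidates]
print(residuals, values)  # residuals vanish; values are 4 and 73/16

# Behaviour of f along the constraint near and far from the candidates.
near_min = [f(*on_constraint(x)) for x in (-0.1, 0.1)]
near_max = [f(*on_constraint(x)) for x in (1.4, 1.6)]
far = [f(*on_constraint(x)) for x in (-3.0, 4.0)]
print(near_min, near_max, far)
```

The sampled values suggest that 4 is only a local minimum and 73/16 only a local maximum on the line, since f restricted to the constraint takes larger values at x = −3 and smaller values at x = 4.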

This example is a typical instance of the Lagrange multiplier problem, which can be described generally as follows:
Obtain the extremum points of the function f(x, y) subject to the constraint condition g(x, y) = 0. This problem is known as an extremum problem with constraints.

Solution Procedure
Form an auxiliary function
\[
F(x, y, \lambda) = f(x, y) + \lambda g(x, y).
\]
Differentiate partially w.r.t. x, y and λ in turn and equate the resulting expressions to zero. Thus, we have
\[
\frac{\partial F(x, y, \lambda)}{\partial x} = \frac{\partial f(x, y)}{\partial x} + \lambda \frac{\partial g(x, y)}{\partial x},
\]
\[
\frac{\partial F(x, y, \lambda)}{\partial y} = \frac{\partial f(x, y)}{\partial y} + \lambda \frac{\partial g(x, y)}{\partial y},
\]
\[
\frac{\partial F(x, y, \lambda)}{\partial \lambda} = g(x, y).
\]
Then
\[
\frac{\partial f(x, y)}{\partial x} + \lambda \frac{\partial g(x, y)}{\partial x} = 0, \qquad
\frac{\partial f(x, y)}{\partial y} + \lambda \frac{\partial g(x, y)}{\partial y} = 0, \qquad
g(x, y) = 0.
\]
This system of equations (called the Lagrange equations) is then solved simultaneously. The problem thus reduces to that of finding the critical points of F(x, y, λ).
The general situation is given in the following theorem, which establishes the validity of Lagrange’s method.

Theorem 5.2.2. Let f and $g_1, g_2, \dots, g_m$ be of class $C^2$ on an open subset U of $\mathbb{R}^n$, where $1 \le m \le n$, and suppose $x_0 = (x^0_1, x^0_2, \dots, x^0_n) \in U$ is an extremum point of f subject to the constraints
\[
g_i(x) = 0, \quad 1 \le i \le m. \tag{L}
\]
Suppose also that at least one of the Jacobians
\[
\left. \frac{\partial(g_1, \dots, g_m)}{\partial(x_{i_1}, x_{i_2}, \dots, x_{i_m})} \right|_{x = x_0} \qquad (1 \le i_1 < i_2 < \cdots < i_m \le n)
\]
is non-zero. Then there exist $\lambda_1, \lambda_2, \dots, \lambda_m$ such that
\[
\frac{\partial f(x_0)}{\partial x_j} + \sum_{i=1}^{m} \lambda_i \frac{\partial g_i(x_0)}{\partial x_j} = 0, \qquad j = 1, \dots, n.
\]
NOTE: The numbers $\lambda_1, \lambda_2, \dots, \lambda_m$ are called the Lagrange multipliers.

Example 5.2.3. Find the extreme points of f(x, y, z) = x + y + z subject to the conditions $x^2 + y^2 = 2$ and x + z = 1.

Solution:
Here, there are two constraints
\[
g_1(x, y, z) = x^2 + y^2 - 2 = 0, \qquad g_2(x, y, z) = x + z - 1 = 0.
\]
Thus, we must find x, y, z and $\lambda_1$, $\lambda_2$ such that
\[
\nabla f(x, y, z) + \lambda_1 \nabla g_1(x, y, z) + \lambda_2 \nabla g_2(x, y, z) = 0
\]
and $g_1(x, y, z) = 0$, $g_2(x, y, z) = 0$, i.e.
\[
1 + 2x\lambda_1 + \lambda_2 \cdot 1 = 0 \tag{1}
\]
\[
1 + 2y\lambda_1 + \lambda_2 \cdot 0 = 0 \tag{2}
\]
\[
1 + 0 \cdot \lambda_1 + \lambda_2 = 0 \tag{3}
\]
\[
x^2 + y^2 - 2 = 0, \qquad x + z - 1 = 0.
\]
These equations are then solvable for x, y, z, $\lambda_1$ and $\lambda_2$. From (3), $\lambda_2 = -1$, and then (1) and (2) give
\[
2x\lambda_1 = 0, \qquad 2y\lambda_1 = -1.
\]
Since the second implies $\lambda_1 \neq 0$, we have x = 0, and then the constraints give $y = \pm\sqrt{2}$ and z = 1. Hence, our points are
\[
(0, -\sqrt{2}, 1) \quad \text{and} \quad (0, +\sqrt{2}, 1).
\]
Since f(x, y, z) = x + y + z takes the values $1 - \sqrt{2}$ and $1 + \sqrt{2}$ at these points, we see by inspection that $(0, \sqrt{2}, 1)$ is the maximum point and $(0, -\sqrt{2}, 1)$ is the minimum point. □
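
The conclusion of this example can be cross-checked by brute force, because the constraint set is a closed curve that is easy to parametrize. The sketch below assumes the parametrization x = √2 cos t, y = √2 sin t, z = 1 − x (which satisfies both constraints) and scans it on a uniform grid; the grid size is an arbitrary choice.

```python
import math

def f(x, y, z):
    return x + y + z

samples = []
n = 3600  # number of sample points on the constraint curve
for i in range(n):
    t = 2 * math.pi * i / n
    x = math.sqrt(2) * math.cos(t)   # x^2 + y^2 = 2
    y = math.sqrt(2) * math.sin(t)
    z = 1 - x                        # x + z = 1
    samples.append((f(x, y, z), x, y, z))

best = max(samples)    # tuple with the largest value of f first
worst = min(samples)   # tuple with the smallest value of f first
print(best, worst)
```

On the curve f reduces to 1 + √2 sin t, so the scan recovers the values 1 + √2 near (0, √2, 1) and 1 − √2 near (0, −√2, 1).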

Example 5.2.4. Find the largest volume a rectangular box can have subject to the constraint that the surface area be fixed at 10 square meters.

Solution:
Here x, y, z are the lengths of the sides, and the volume is f(x, y, z) = xyz. The constraint is given by
\[
2(xy + xz + yz) = 10,
\]
i.e.,
\[
xy + xz + yz = 5.
\]
Thus our conditions are
\[
\begin{cases}
yz + \lambda(y + z) = 0 \\
xz + \lambda(x + z) = 0 \\
xy + \lambda(y + x) = 0 \\
xy + xz + yz = 5.
\end{cases}
\]
First of all, $x \neq 0$: for x = 0 would give yz = 5 from the constraint and 0 = λz from the second equation, so λ = 0 and then yz = 0 from the first equation, a contradiction. Similarly, $y \neq 0$ and $z \neq 0$, and since the sides are positive, x + y, x + z and y + z are non-zero. Elimination of λ from the first two equations gives
\[
\frac{yz}{y + z} = \frac{xz}{x + z},
\]
which gives x = y; similarly y = z. Using the last equation, $3x^2 = 5$, i.e. $x = \sqrt{5/3}$. Thus, $x = y = z = \sqrt{5/3}$ and so
\[
xyz = \left( \frac{5}{3} \right)^{3/2}.
\]
The maximum occurs when x = y = z. □
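
A coarse grid search gives an independent check of this answer. The sketch below eliminates z through the constraint xy + xz + yz = 5, i.e. z = (5 − xy)/(x + y), and scans x and y over (0, 3] with step 0.01; the grid bounds and step are assumptions made for the illustration, not part of the method of Lagrange multipliers.

```python
def volume(x, y):
    """Volume xyz of the box with xy + xz + yz = 5, solved for z."""
    z = (5 - x * y) / (x + y)
    return x * y * z if z > 0 else 0.0  # discard non-physical boxes

n = 300
best = max(
    (volume(i / 100, j / 100), i / 100, j / 100)
    for i in range(1, n + 1)
    for j in range(1, n + 1)
)
claimed = (5 / 3) ** 1.5  # volume predicted by x = y = z = sqrt(5/3)
print(best, claimed)
```

The best grid point lies next to x = y ≈ 1.29, and its volume agrees with (5/3)^(3/2) ≈ 2.15 up to the resolution of the grid.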

Exercise 5.2.5. 1. Obtain the largest and least values of 2(x + y + z) − xyz on the closed ball
$\bar{B} = \{ (x, y, z) : x^2 + y^2 + z^2 \le 9 \}$.

2. A rectangular box without a top is to have a volume of 18 m³. By using the method of Lagrange multipliers, determine the dimensions that will give its maximum surface.

3. Find the extrema of f subject to the stated constraints.

(i) f(x, y) = 3x + 2y, $g(x, y) = 2x^2 + 3y^2 - 3$
(ii) f(x, y, z) = z + y + z, $g_1(x, y, z) = 2x + 2y - 1$

Chapter 6

The Mapping Theorems

Here, we shall consider two mapping theorems viz: the inverse function theorem
and the Implicit function theorem.

6.1 Inverse Function Theorem


We now give a background for the statement of the Inverse Function Theorem.

Suppose that we have a differentiable function $f : U \to \mathbb{R}^m$, where $U \subseteq \mathbb{R}^n$ is an open set. We would like to know when f has an inverse $f^{-1} : f(U) \to \mathbb{R}^n$ which is differentiable. The inverse function $f^{-1}$ exists if and only if f is one-to-one.
If f is one-to-one, in order to discuss the differentiability of $f^{-1}$, we need to show that f(U) is an open set. Suppose for the moment that f is one-to-one, f(U) is open and $f^{-1}$ is differentiable. Then by the chain rule (at each point),
\[
(Df^{-1})(Df) = I_n, \qquad (Df)(Df^{-1}) = I_m,
\]
where $I_j$ = identity map on $\mathbb{R}^j$. Therefore, for any $x \in U$, the Jacobian matrix Df(x) has both right and left inverses. So it must be a square matrix (i.e. m = n).
We conclude two things from this brief discussion:
* First, we need to concern ourselves only with the case of a differentiable function $f : U \to \mathbb{R}^n$ with $U \subseteq \mathbb{R}^n$.

** Secondly, if f is to have a differentiable inverse, then it is necessary that the Jacobian matrix Df(x) be invertible for each $x \in U$.

Theorem 6.1.1. (Inverse Function Theorem) Let f be an $\mathbb{R}^n$-valued function of class $C^1$ on an open subset U of $\mathbb{R}^n$. Let $x_0$ be a point in the domain of f such that $Df(x_0)$ is invertible, i.e. such that $\det Df(x_0) \neq 0$. Then, there exists a neighbourhood W of $x_0$ such that
(i) f is one-to-one on W,
(ii) f(W) is an open set, and
(iii) the inverse function $f^{-1} : f(W) \to W$ is continuously differentiable and its derivative satisfies
\[
Df^{-1}(f(a)) = [Df(a)]^{-1}, \quad a \in W.
\]

Proof: For the sake of clarity we break up the proof of the theorem
into a number of steps as follows:
Step 1: Simplification to a special case
We will prove the theorem below for the case when $Df(x_\circ)$ is the identity transformation.
Here we show that this is indeed sufficient to prove the general case.
Let $\lambda = Df(x_\circ)$; then $\lambda^{-1}$ exists, and by the chain rule

$D(\lambda^{-1} \circ f)(x_\circ) = D(\lambda^{-1})(f(x_\circ)) \circ Df(x_\circ) = \lambda^{-1} \circ Df(x_\circ) = $ identity transformation,

since the derivative of the linear map $\lambda^{-1}$ at any point is $\lambda^{-1}$ itself. Now if the theorem is true for
$\lambda^{-1} \circ f$, then the theorem is also true for $f$. Indeed, if $g$ is an inverse for $\lambda^{-1} \circ f$,
the inverse for $f$ will be $g \circ \lambda^{-1}$. We can make one further simplifying assumption,
namely, that $x_\circ = 0$ and $f(x_\circ) = 0$. To see this, let us suppose we have proven the
theorem for the special case $x_\circ = 0$ and $f(x_\circ) = 0$. We now prove the general case
from this. Let $h(x) = f(x + x_\circ) - f(x_\circ)$. Then $h(0) = 0$ and $Dh(0) = Df(x_\circ)$, so
$Dh(0)$ is invertible. Then if $h$ has an inverse near $x = 0$, the required inverse for $f$
near $x_\circ$ is given by

$f^{-1}(y) = h^{-1}(y - f(x_\circ)) + x_\circ.$

Step 1 demonstrates that it is sufficient to prove the theorem under the assumptions
$x_\circ = 0$, $f(x_\circ) = 0$ and $Df(0)$ is the identity. This will be assumed in the remaining
parts of the proof.
Step 2:
To get a local inverse, we would like to choose two neighborhoods of 0 such that
given any $y$ from the first neighborhood of 0 there is a unique $x$ from the second
neighborhood such that $f(x) = y$. To do this, consider the function $g_y$ defined by
$g_y(x) = y + x - f(x)$. If for some closed neighborhood of zero this is a contracting
mapping, then it has a unique fixed point, say $x$, and so $x = y + x - f(x)$, i.e. $x$ is the
unique point belonging to the neighborhood such that $f(x) = y$. Now construct this
neighborhood: define $g(x) = x - f(x)$; then $Dg(0) = 0$. Assume $f$, and hence $g$, to be of class $C^p$
with $p \geq 1$. This means in particular that $Dg$ is a continuous function, and so by
continuity at 0 there exists an $r > 0$ such that $\|x\| \leq r$ implies $\|Dg_i(x)\| \leq 1/(2n)$ for $i = 1, \ldots, n$,
where $g = (g_1, g_2, \cdots, g_n)$. By the mean-value theorem, given $x \in \bar{D}(0, r)$, there
exist points $c_1, c_2, \cdots, c_n$ in $\bar{D}(0, r)$ such that $g_i(x) = g_i(x) - g_i(0) = Dg_i(c_i)(x - 0) = Dg_i(c_i)(x)$. Therefore,

$$\|g(x)\| \leq \sum_{i=1}^{n} |g_i(x)| = \sum_{i=1}^{n} |Dg_i(c_i)(x)| \leq \sum_{i=1}^{n} \|Dg_i(c_i)\|\,\|x\| \leq \frac{\|x\|}{2} \leq \frac{r}{2}.$$

Hence, for $\|y\| \leq r/2$, $g_y$ maps $\bar{D}(0, r)$ into itself, since $\|g_y(x)\| \leq \|y\| + \|g(x)\| \leq r$.
Let $x_1$ and $x_2$ be any two points in $\bar{D}(0, r)$. Then $\|g_y(x_1) - g_y(x_2)\| = \|g(x_1) - g(x_2)\|$
and by the mean-value theorem as above, $\|g(x_1) - g(x_2)\| \leq (1/2)\|x_1 - x_2\|$, and
so $g_y$ is a contracting map (with constant $K = 1/2$). This implies that there is
a unique fixed point $x \in \bar{D}(0, r)$ for $g_y$, and this implies $f(x) = y$. This means that
$f$ has an inverse $f^{-1} : \bar{D}(0, r/2) \subset \mathbb{R}^n \to \bar{D}(0, r) \subset \mathbb{R}^n$.
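The contraction argument in Step 2 is constructive: iterating $g_y$ from $x = 0$ converges to the solution of $f(x) = y$. The following is a minimal Python sketch of that iteration, assuming a hypothetical map $f$ on $\mathbb{R}^2$ with $f(0) = 0$ and $Df(0) = I$. The particular map and the names `local_inverse`, `tol` and `max_iter` are illustrative choices, not part of the notes.

```python
import math

def f(x):
    """Hypothetical C^1 map with f(0) = 0 and Df(0) = identity (an assumption)."""
    x1, x2 = x
    return (x1 + 0.25 * x2 * x2, x2 + 0.25 * x1 * x2)

def local_inverse(y, tol=1e-12, max_iter=200):
    """Approximate f^{-1}(y) by iterating g_y(x) = y + x - f(x) from x = 0."""
    x = (0.0, 0.0)
    for _ in range(max_iter):
        fx = f(x)
        x_new = (y[0] + x[0] - fx[0], y[1] + x[1] - fx[1])
        # Stop once successive iterates agree to within tol.
        if math.hypot(x_new[0] - x[0], x_new[1] - x[1]) < tol:
            return x_new
        x = x_new
    raise RuntimeError("iteration did not converge; y may be too far from 0")

y = (0.1, -0.05)
x = local_inverse(y)
# Distance between f(x) and the target y; small if x is (close to) the fixed point.
residual = math.hypot(f(x)[0] - y[0], f(x)[1] - y[1])
```

The iteration is only guaranteed to converge for $y$ in a small enough disc around 0, which is exactly the restriction $\|y\| \leq r/2$ in the proof.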
Step 3: The inverse is continuous.
Let $x_1, x_2 \in \bar{D}(0, r)$; then, recalling the definition of $g$ (so that $x = f(x) + g(x)$), we get

$\|x_1 - x_2\| \leq \|f(x_1) - f(x_2)\| + \|g(x_1) - g(x_2)\| \leq \|f(x_1) - f(x_2)\| + (1/2)\|x_1 - x_2\|,$

and hence $\|x_1 - x_2\| \leq 2\|f(x_1) - f(x_2)\|$. Therefore, if $y_1, y_2 \in \bar{D}(0, r/2)$, we get
$\|f^{-1}(y_1) - f^{-1}(y_2)\| \leq 2\|y_1 - y_2\|$, so $f^{-1}$ is continuous.
Step 4: The inverse is differentiable.
For suitably small $r$, the inverse is differentiable on $\bar{D}(0, r/2)$. We were given that
$Df(0)$ is invertible and that $Df : A \subset \mathbb{R}^n \to \mathbb{R}^{n^2}$ is continuous; this shows that for all $x$
in some neighborhood around 0, $[Df(x)]^{-1}$ exists. If this neighborhood does not
contain $\bar{D}(0, r)$, $r$ is restricted further until this is the case. Hence we can assume
$[Df(x)]^{-1}$ exists for all $x \in \bar{D}(0, r)$. Moreover, we can assume $\|[Df(x)]^{-1} y\| \leq M \|y\|$
for all $x \in \bar{D}(0, r)$ and $y \in \mathbb{R}^n$ by the continuity of $[Df(x)]^{-1}$.
Now, for $y_1, y_2 \in \bar{D}(0, r/2)$ with $y_1 \neq y_2$, and $x_1 = f^{-1}(y_1)$, $x_2 = f^{-1}(y_2)$,

$$\frac{\|f^{-1}(y_1) - f^{-1}(y_2) - [Df(x_2)]^{-1}(y_1 - y_2)\|}{\|y_1 - y_2\|}
= \frac{\|x_1 - x_2 - [Df(x_2)]^{-1}(f(x_1) - f(x_2))\|}{\|f(x_1) - f(x_2)\|}$$

$$= \frac{\|x_1 - x_2\|}{\|f(x_1) - f(x_2)\|} \cdot \frac{\|[Df(x_2)]^{-1}\{Df(x_2)(x_1 - x_2) - (f(x_1) - f(x_2))\}\|}{\|x_1 - x_2\|}.$$

Using $\|x_1 - x_2\| \leq 2\|f(x_1) - f(x_2)\|$ and $\|[Df(x_2)]^{-1} y\| \leq M \|y\|$ gives that the
above is

$$\leq 2M \, \frac{\|Df(x_2)(x_1 - x_2) - (f(x_1) - f(x_2))\|}{\|x_1 - x_2\|}.$$

The last expression has limit zero as $\|x_1 - x_2\| \to 0$ by the differentiability of
$f$ at $x_2$, and $x_1 \to x_2$ as $y_1 \to y_2$ by Step 3. This shows that $f^{-1}$ is differentiable at $y_2$ with derivative
$[Df(x_2)]^{-1} = [Df(f^{-1}(y_2))]^{-1}$. In the theorem, we therefore take $W = D(0, r) \cap f^{-1}(D(0, r/2))$,
so that $f(W) = D(0, r/2)$, where $D$ denotes the open disc; both sets are open.
Step 5:
We next prove that $f^{-1} : \bar{D}(0, r/2) \to \mathbb{R}^n$ is of class $C^p$. To do this, we notice
from Step 4 that $f^{-1}$ is differentiable on $\bar{D}(0, r/2)$ and that
$Df^{-1}(y) = [Df(f^{-1}(y))]^{-1}$. We have shown that $f^{-1}$ is continuous,
and $Df$ is continuous by assumption. Since matrix inversion is also continuous, this implies that $Df^{-1}$ is continuous on
$\bar{D}(0, r/2)$. Hence $f^{-1}$ is of class $C^1$. Also, looking at $Df^{-1}(y) = [Df(f^{-1}(y))]^{-1}$,
we observe that if $p \geq 2$ then, since $f^{-1}$ is of class $C^1$, $Df$ is of class $C^{p-1}$ and matrix inversion is
of class $C^\infty$, $Df^{-1}$ is of class $C^1$. Hence $f^{-1}$ is of class $C^2$. Continuing in this way
by induction we finally conclude that $f^{-1}$ is of class $C^p$. □

Example 6.1.2. Let

$u(x, y) = e^x \cos y$

$v(x, y) = e^x \sin y.$

Show that the function

$f(x, y) = (u(x, y), v(x, y))$

is locally invertible in a neighbourhood of $(0, \pi/3)$ but is not globally invertible.

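Solution:

$$\frac{\partial(u, v)}{\partial(x, y)} =
\begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[2mm] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{vmatrix}
= \begin{vmatrix} e^x \cos y & -e^x \sin y \\ e^x \sin y & e^x \cos y \end{vmatrix}
= e^{2x}[\cos^2 y + \sin^2 y] = e^{2x},$$

$$\left.\frac{\partial(u, v)}{\partial(x, y)}\right|_{(0, \pi/3)} = e^0 = 1 \neq 0.$$

By the inverse mapping theorem, the function $f$ is locally invertible in a
neighbourhood of $(0, \pi/3)$. To see that $f$ is not globally invertible, let $x, y \in \mathbb{R}$ be arbitrary. Then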

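$u(x, y + 2\pi) = u(x, y)$

$v(x, y + 2\pi) = v(x, y).$

In particular,

$$u(0, \tfrac{\pi}{3} + 2\pi) = u(0, \tfrac{7\pi}{3}) = u(0, \tfrac{\pi}{3}),$$

$$v(0, \tfrac{\pi}{3} + 2\pi) = v(0, \tfrac{7\pi}{3}) = v(0, \tfrac{\pi}{3}).$$

$$\therefore\ f(0, \tfrac{\pi}{3}) = (u(0, \tfrac{\pi}{3}), v(0, \tfrac{\pi}{3})) = (e^0 \cos(\tfrac{\pi}{3}), e^0 \sin(\tfrac{\pi}{3})) = \left(\tfrac{1}{2}, \tfrac{\sqrt{3}}{2}\right).$$

Also,

$$f(0, \tfrac{7\pi}{3}) = (e^0 \cos(\tfrac{7\pi}{3}), e^0 \sin(\tfrac{7\pi}{3})) = \left(\tfrac{1}{2}, \tfrac{\sqrt{3}}{2}\right).$$

Now $(0, \tfrac{\pi}{3}) \neq (0, \tfrac{7\pi}{3})$ but $f(0, \tfrac{\pi}{3}) = f(0, \tfrac{7\pi}{3})$.
Therefore, $f$ is not one-to-one on $\mathbb{R}^2$. Hence, $f$ is not globally invertible.
Note that any neighbourhood of $(0, \tfrac{\pi}{3})$ on which $f$ is invertible cannot
include $(0, \tfrac{7\pi}{3})$.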
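The two facts used so far can be checked numerically. The sketch below (Python, standard library only; the helper name `jacobian_det` is an illustrative choice) evaluates the Jacobian determinant at $(0, \pi/3)$ and compares the images of $(0, \pi/3)$ and $(0, 7\pi/3)$.

```python
import math

def f(x, y):
    """The map f(x, y) = (e^x cos y, e^x sin y) from Example 6.1.2."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def jacobian_det(x, y):
    """Determinant u_x * v_y - u_y * v_x, computed from the exact partials."""
    ux = math.exp(x) * math.cos(y)
    uy = -math.exp(x) * math.sin(y)
    vx = math.exp(x) * math.sin(y)
    vy = math.exp(x) * math.cos(y)
    return ux * vy - uy * vx

p = (0.0, math.pi / 3)
q = (0.0, 7 * math.pi / 3)
fp, fq = f(*p), f(*q)

# Distinct points with (numerically) the same image: f is not one-to-one.
same_image = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(fp, fq))
det_at_p = jacobian_det(*p)
```

Up to rounding, `det_at_p` equals $e^0 = 1$ and `same_image` is true, matching the computation above.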
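To find the local inverse, recall that

$u^2 + v^2 = e^{2x}(\cos^2 y + \sin^2 y) = e^{2x}$

$\log_e(u^2 + v^2) = 2x$

$\therefore\ x = \tfrac{1}{2}\log(u^2 + v^2).$

Also, to obtain $y$,

$\dfrac{v}{u} = \tan y$

$\Rightarrow\ y = \tan^{-1}\left(\dfrac{v}{u}\right).$

Here $u > 0$ near $f(0, \tfrac{\pi}{3}) = (\tfrac{1}{2}, \tfrac{\sqrt{3}}{2})$, and the principal value of $\tan^{-1}$ lies in $(-\tfrac{\pi}{2}, \tfrac{\pi}{2})$, which contains $\tfrac{\pi}{3}$.
$\therefore$ if $g$ is a local inverse of $f$ in a neighbourhood of $(0, \tfrac{\pi}{3})$, then

$g(u, v) = \left(\tfrac{1}{2}\log(u^2 + v^2),\ \tan^{-1}\left(\dfrac{v}{u}\right)\right).$ □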
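As a sanity check on the local inverse, the following Python sketch verifies numerically that $g(f(a)) = a$ at $a = (0, \pi/3)$ and that $Dg(f(a))\,Df(a)$ is close to the identity, as the formula $Df^{-1}(f(a)) = [Df(a)]^{-1}$ of Theorem 6.1.1 predicts. The Jacobians are approximated by central differences; the helper names `jacobian` and `matmul` and the step size `h` are assumptions made for this illustration.

```python
import math

def f(x, y):
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def g(u, v):
    """Candidate local inverse from the example; only valid where u > 0."""
    return (0.5 * math.log(u * u + v * v), math.atan(v / u))

def jacobian(func, a, b, h=1e-6):
    """Central-difference approximation of the 2x2 Jacobian of func at (a, b)."""
    pa, ma = func(a + h, b), func(a - h, b)
    pb, mb = func(a, b + h), func(a, b - h)
    return [[(pa[i] - ma[i]) / (2 * h), (pb[i] - mb[i]) / (2 * h)]
            for i in range(2)]

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a = (0.0, math.pi / 3)
round_trip = g(*f(*a))                                  # should return a
product = matmul(jacobian(g, *f(*a)), jacobian(f, *a))  # should be near I
wrapped = g(*f(0.0, 7 * math.pi / 3))                   # lands at (0, pi/3)
```

The value `wrapped` illustrates why $g$ is only a local inverse: applied to $f(0, 7\pi/3)$ it returns a point near $(0, \pi/3)$, not $(0, 7\pi/3)$.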

Example 6.1.3. Consider the system of equations

$ax^2 + bx + cy = u$ (i)

$\alpha x^2 + \beta x + \gamma y = v$ (ii).

If $b\gamma - \beta c \neq 0$ and $\gamma a - \alpha c = 0$, solve the equations for $(x, y)$ uniquely in terms of $(u, v)$.

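Solution:

$$\frac{\partial(u, v)}{\partial(x, y)} =
\begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[2mm] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{vmatrix}
= \begin{vmatrix} 2ax + b & c \\ 2\alpha x + \beta & \gamma \end{vmatrix}$$

$$= (2ax + b)\gamma - (2\alpha x + \beta)c$$

$$= 2x(a\gamma - \alpha c) + b\gamma - \beta c$$

$$= b\gamma - \beta c \neq 0.$$

This shows that the system (i), (ii) defining $(u, v)$ is locally invertible. Now

(i) $\times\ \gamma$: $\quad a\gamma x^2 + b\gamma x + c\gamma y = \gamma u$

(ii) $\times\ c$: $\quad \alpha c x^2 + \beta c x + c\gamma y = cv.$

Subtracting the second equation from the first gives

$(a\gamma - \alpha c)x^2 + (b\gamma - \beta c)x = \gamma u - cv$

i.e. $(b\gamma - \beta c)x = \gamma u - cv.$

$$\therefore\ x = \frac{\gamma u - cv}{b\gamma - \beta c}.$$

Substitute for $x$ in (i) to obtain $y$ (assuming $c \neq 0$):

$$ax^2 + bx + cy = u$$

$$a\left(\frac{\gamma u - cv}{b\gamma - \beta c}\right)^2 + b\left(\frac{\gamma u - cv}{b\gamma - \beta c}\right) + cy = u$$

$$y = \frac{1}{c}\left[u - a\left(\frac{\gamma u - cv}{b\gamma - \beta c}\right)^2 - b\left(\frac{\gamma u - cv}{b\gamma - \beta c}\right)\right].$$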
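The closed-form solution can be checked on concrete numbers. In the Python sketch below, the coefficient values are hypothetical and chosen only so that $\gamma a - \alpha c = 0$, $b\gamma - \beta c \neq 0$ and $c \neq 0$; the function names `forward` and `solve` are likewise illustrative.

```python
# Hypothetical coefficients: gamma*a - alpha*c = 0 and b*gamma - beta*c = 5.
a, b, c = 2.0, 3.0, 1.0
alpha, beta, gamma = 4.0, 1.0, 2.0

def forward(x, y):
    """Evaluate (u, v) from equations (i) and (ii)."""
    u = a * x * x + b * x + c * y
    v = alpha * x * x + beta * x + gamma * y
    return u, v

def solve(u, v):
    """Recover (x, y) from (u, v) using the formulas derived in the solution."""
    d = b * gamma - beta * c          # nonzero by assumption
    x = (gamma * u - c * v) / d
    y = (u - a * x * x - b * x) / c   # requires c != 0
    return x, y

x0, y0 = 1.5, -2.0
u0, v0 = forward(x0, y0)
x1, y1 = solve(u0, v0)                # should reproduce (x0, y0)
```

With these values `forward(1.5, -2.0)` gives $(u, v) = (7, 6.5)$, and `solve` maps that pair back to $(1.5, -2)$.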

6.2 Implicit function Theorem
Consider a function $F : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$. Let $x$ and $y$ be related by the equation
$F(x, y) = 0$. We want to solve $F(x, y) = 0$ so that we obtain $y = f(x)$. Such a
function $f$ is called an explicit function. We also want to compute $Df(x)$. In general,
given $F(x, y) = 0$, one may not be able to solve for $y$ in terms of $x$. Therefore, it
becomes important to verify whether such a relation exists or not. Consider the
following example. Suppose $F : \mathbb{R}^2 \to \mathbb{R}$ is defined by $F(x, y) = x^2 + y^2 - 1$. Let us
find $x$ and $y$ such that $F(x, y) = 0$. A function $f(x)$ is a solution $\iff F(x, f(x)) = 0$.
The solution of the equation is

$f(x) = \pm\sqrt{1 - x^2},$

i.e. $f(x) = +\sqrt{1 - x^2}$ and $f(x) = -\sqrt{1 - x^2}$. Thus, $y$ is defined for $|x| \leq 1$.
Therefore $f$, if it exists, is not necessarily unique. Let $(x_\circ, y_\circ)$ be such that
$F(x_\circ, y_\circ) = 0$. Can we find $f(x)$ such that $f$ is differentiable near $x_\circ$?
Not always: here $f$ is not differentiable at $x_\circ = \pm 1$.
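The circle example can be explored numerically. The Python sketch below works on the upper branch $f(x) = +\sqrt{1 - x^2}$ and compares a finite-difference slope with the slope $-x/y$ obtained by differentiating $F(x, f(x)) = 0$ (a standard implicit-differentiation step that is assumed here, not derived in the text above). It also shows the slope growing without bound as $x \to 1$, which is where differentiability fails. The function names are illustrative.

```python
import math

def F(x, y):
    return x * x + y * y - 1.0

def f_plus(x):
    """Upper branch of the solution of F(x, y) = 0."""
    return math.sqrt(1.0 - x * x)

def slope_fd(x, h=1e-6):
    """Central-difference approximation of f_plus'(x)."""
    return (f_plus(x + h) - f_plus(x - h)) / (2 * h)

def slope_implicit(x):
    """Slope -F_x / F_y = -x / y evaluated on the upper branch."""
    y = f_plus(x)
    return -(2 * x) / (2 * y)

x0 = 0.6
on_curve = F(x0, f_plus(x0))          # approximately 0
s_fd, s_imp = slope_fd(x0), slope_implicit(x0)

# Slopes at points approaching x = 1: the magnitudes keep increasing.
near_one = [abs(slope_implicit(1.0 - 10.0 ** (-k))) for k in (2, 4, 6)]
```

At $x_\circ = 0.6$ both slopes are approximately $-0.75$, while the entries of `near_one` grow as $x$ approaches 1, where $\partial F/\partial y = 2y$ vanishes.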
Now consider $F = (F_1, \ldots, F_m) : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$ and the equations

$F_1(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_m) = 0$

$\vdots$

$F_m(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_m) = 0.$

We wish to solve for the unknowns $y_1, y_2, \ldots, y_m$ in terms of $x_1, x_2, \ldots, x_n$. The following
theorem tells us how the equations may be solved.

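Theorem 6.2.1. (Implicit function theorem): Let $\Omega \subset \mathbb{R}^n \times \mathbb{R}^m$ be an open set
and let $F : \Omega \to \mathbb{R}^m$ be a function of class $C^p$, $p \geq 1$. Suppose $(x_\circ, y_\circ) \in \Omega$ and
$F(x_\circ, y_\circ) = 0$. Form the determinant of the matrix $\left(\dfrac{\partial F_j}{\partial y_i}\right)$, $i, j = 1, 2, \ldots, m$, i.e.

$$\Delta = \begin{vmatrix}
\dfrac{\partial F_1}{\partial y_1} & \cdots & \dfrac{\partial F_1}{\partial y_m} \\
\vdots & & \vdots \\
\dfrac{\partial F_m}{\partial y_1} & \cdots & \dfrac{\partial F_m}{\partial y_m}
\end{vmatrix}$$

evaluated at $(x_\circ, y_\circ)$, where $F = (F_1, \ldots, F_m)$. Suppose $\Delta \neq 0$. Then there exist
an open neighbourhood $U \subset \mathbb{R}^n$ of $x_\circ$, an open neighbourhood $V \subset \mathbb{R}^m$ of $y_\circ$ and a unique function $f : U \to V$ such that
$F(x, f(x)) = 0\ \forall\ x \in U$. Moreover, $f$ is of class $C^p$.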
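To make the statement concrete in the scalar case $n = m = 1$, the sketch below computes the implicit function numerically with Newton's method in $y$. The relation $F(x, y) = y^3 + y - x$, the base point $(x_\circ, y_\circ) = (2, 1)$ and the helper names are hypothetical choices for illustration; Newton's method is a swapped-in numerical technique, not the construction used in the proof. Here $\Delta = \partial F/\partial y = 3y^2 + 1 \neq 0$, so the theorem applies.

```python
def F(x, y):
    """Hypothetical relation with F(2, 1) = 0."""
    return y ** 3 + y - x

def dF_dy(x, y):
    """Partial derivative of F with respect to y; never zero for this F."""
    return 3.0 * y * y + 1.0

def solve_y(x, y_start=1.0, tol=1e-14, max_iter=50):
    """Approximate f(x), the solution y of F(x, y) = 0 near y_start."""
    y = y_start
    for _ in range(max_iter):
        step = F(x, y) / dF_dy(x, y)   # Newton step in the y variable only
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton iteration did not converge")

x0 = 2.0
y0 = solve_y(x0)                        # recovers y0 = 1

# Compare a finite-difference slope of f with -F_x / F_y, where F_x = -1.
h = 1e-6
slope_fd = (solve_y(x0 + h) - solve_y(x0 - h)) / (2 * h)
slope_formula = 1.0 / dF_dy(x0, y0)
```

Both slopes are approximately $0.25$ at $x_\circ = 2$, consistent with differentiating $F(x, f(x)) = 0$.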

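Proof: Let $F : \Omega \to \mathbb{R}^m$ be a function of class $C^p$. Define a function
$G : \Omega \to \mathbb{R}^n \times \mathbb{R}^m$ by $G(x, y) = (x, F(x, y))$, $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$. Since $F$ is of
class $C^p$ and the identity mapping is of class $C^\infty$, it follows that $G$ is of class $C^p$.
The Jacobian matrix of $G$ is given as

$$JG = \begin{pmatrix}
1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\
\dfrac{\partial F_1}{\partial x_1} & \dfrac{\partial F_1}{\partial x_2} & \cdots & \dfrac{\partial F_1}{\partial x_n} & \dfrac{\partial F_1}{\partial y_1} & \cdots & \dfrac{\partial F_1}{\partial y_m} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
\dfrac{\partial F_m}{\partial x_1} & \dfrac{\partial F_m}{\partial x_2} & \cdots & \dfrac{\partial F_m}{\partial x_n} & \dfrac{\partial F_m}{\partial y_1} & \cdots & \dfrac{\partial F_m}{\partial y_m}
\end{pmatrix}.$$

Expanding along the first $n$ rows, the determinant of this matrix at $(x_\circ, y_\circ)$ is

$$\Delta = \begin{vmatrix}
\dfrac{\partial F_1}{\partial y_1} & \cdots & \dfrac{\partial F_1}{\partial y_m} \\
\vdots & & \vdots \\
\dfrac{\partial F_m}{\partial y_1} & \cdots & \dfrac{\partial F_m}{\partial y_m}
\end{vmatrix}.$$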
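Since by hypothesis $\Delta \neq 0$, $\det JG(x_\circ, y_\circ) \neq 0$. Consequently, by the inverse mapping
theorem, there exist an open set $S$ containing $(x_\circ, y_\circ)$ and an open set $W$ containing
$G(x_\circ, y_\circ) = (x_\circ, 0)$ such that $G(S) = W$ and $G : S \to W$ has an inverse
$G^{-1} : W \to S$ of class $C^p$. Since $S$ is open, there exist open sets $U$ and $V$ such
that $x_\circ \in U \subset \mathbb{R}^n$, $y_\circ \in V \subset \mathbb{R}^m$ and $U \times V \subset S$. Let $G(U \times V) = Y \subset W$. Then
$G : U \times V \to Y$ is of class $C^p$ and has inverse $G^{-1} : Y \to U \times V$ of class $C^p$. Since $G$
leaves the first coordinate unchanged, so does $G^{-1}$. Hence
$G^{-1}(x, w) = (x, H(x, w))$, where $H : Y \to V$ is a $C^p$-map.
Define a map $\pi : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$ by $\pi(x, y) = y$, so that $F = \pi \circ G$ and

$F(x, H(x, w)) = \pi \circ G(x, H(x, w))$

$= \pi \circ G \circ G^{-1}(x, w) = w.$

Since $G^{-1}(x, w) = (x, H(x, w))$, it follows that if $(x, w) \in Y$, then $x \in U$. Shrinking $U$ if
necessary ($Y$ is open and contains $(x_\circ, 0)$), we may assume that $(x, 0) \in Y$ for every $x \in U$. Now
define $f : U \to V$ by $f(x) = H(x, 0)$. Since $F(x, H(x, w)) = w$, we have $F(x, f(x)) = 0$.
Since $H$ is of class $C^p$, $f$ must be of class $C^p$. By the inverse mapping theorem, $H(x, w)$
is uniquely determined, and hence $f(x) = H(x, 0)$ is also uniquely determined and
the proof is complete. □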
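The section set out to compute $Df(x)$ as well, so it may help to record the formula that follows from the theorem. As a sketch under the hypotheses of Theorem 6.2.1, write $D_xF$ and $D_yF$ for the $m \times n$ and $m \times m$ blocks of partial derivatives of $F$ with respect to $x$ and $y$ (this block notation is introduced here for illustration and is not used elsewhere in the notes). Differentiating the identity $F(x, f(x)) = 0$ on $U$ with the chain rule gives:

```latex
\begin{align*}
0 &= D\bigl[F(x, f(x))\bigr]
   = D_xF(x, f(x)) + D_yF(x, f(x))\, Df(x), \\
Df(x) &= -\bigl[D_yF(x, f(x))\bigr]^{-1} D_xF(x, f(x)).
\end{align*}
```

The inverse exists for $x$ near $x_\circ$ because $\det D_yF$ is continuous and equals $\Delta \neq 0$ at $(x_\circ, y_\circ)$. For $F(x, y) = x^2 + y^2 - 1$ the formula gives $f'(x) = -x/y$, which agrees with differentiating $f(x) = \pm\sqrt{1 - x^2}$ directly.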
