A Conceptual Introduction To Several Variable Calculus
S. Kumaresan
School of Math. and Stat.
University of Hyderabad
Hyderabad 500046
[email protected]
The right hand side of this equation involves the T v_i and the coefficients a_i of v as a linear combination of the v_i's.
We now take V = R and W = R. Let T : R → R be linear. Then as a basis of R, we take
v_1 = 1. Any real number x ∈ R can be written as x = x · 1. We therefore have,

T (x) = T (x · 1) = x T (1).

Thus any linear map T : R → R is of the form T (x) = αx where α := T (1). Conversely, if we
define T : R → R by setting T x := αx for some real scalar α, then it is easy to verify that T
is a linear map.
We extend this result to linear maps A : R^n → R. As a basis of R^n, we choose the so-called
standard basis {e_i : 1 ≤ i ≤ n} where e_i = (0, . . . , 0, 1, 0, . . . , 0) with 1 at the i-th place.
Then any x = (x_1, . . . , x_n) ∈ R^n is expressed as x = ∑_{i=1}^n x_i e_i. We apply A to both sides of
this equation to get

Ax = ∑_{i=1}^n x_i Ae_i.
Thus, if A : R^n → R is any linear map, there exists a vector α = (α_1, . . . , α_n) ∈ R^n such that
Ax = α · x where α_i = Ae_i for all i. Conversely, if α ∈ R^n is given and if we set Ax := α · x,
then A : R^n → R is linear with Ae_i = α_i. Here and in the sequel, v · w and ⟨v, w⟩ both stand for
the dot product of vectors in R^n.
The most important observation in (1) and (2) is that in the expression for T or A, the
scalars α or αi are completely determined as T (1) or as T ei .
Given two vector spaces V and W , a map of the form x 7→ Ax + w0 where A : V → W is
linear and w0 ∈ W is a fixed vector, is called an affine function.
The basic idea of differential calculus is this: Given a function f : U ⊂ Rn → R and a
point a ∈ U , we wish to find a linear map A : Rn → R such that f (x) is “approximately
equal to” f (a) + A(x − a) for points x sufficiently near to a, that is, f (a + h) ≈ f (a) + Ah
for all h near to 0 ∈ Rn . (≈ is read as approximately equal to.) In other words, if we think
of h as an increment to the independent variable, then the increment f (a + h) − f (a) in the
dependent variable is approximately equal to Ah. If we can find such an A, we shall say that
f is differentiable at a and that its derivative at a is A.
What do we mean by saying that Ah is near to f (a + h) − f (a) for h near zero? An obvious and naive
answer would be that f (a + h) − f (a) − Ah approaches zero (in R) as h approaches 0 in R^n. This is not
the correct formulation. For, if we assume that f is continuous at a, we can take A to be the
zero map Ah = 0 for all h ∈ R^n. With this choice we have

f (a + h) − f (a) − Ah = f (a + h) − f (a) → 0, as h → 0,

so the naive condition holds for any continuous f, whatever A is. The correct requirement is
that the error f (a + h) − f (a) − Ah go to zero faster than ‖h‖ itself:

|f (a + h) − f (a) − Ah| / ‖h‖ → 0, as h → 0.    (3)
The function f is differentiable at a iff there exists a linear map A : R^n → R such that for
any given ε > 0, there exists a δ > 0 with the following property:

|f (a + h) − f (a) − Ah| < ε ‖h‖ for all h with ‖h‖ < δ.    (4)
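For readers who wish to experiment, the defining condition (3)–(4) can be checked numerically. The sample function f(x, y) = x^2 + 3y, the point a = (1, 2), and the candidate map A are choices made here for illustration and are not taken from the text:

```python
import numpy as np

# A minimal numerical sketch of (3): for f(x, y) = x^2 + 3y at a = (1, 2),
# the candidate linear map is A(h) = 2*h[0] + 3*h[1] (the expected derivative).
def f(x):
    return x[0] ** 2 + 3 * x[1]

a = np.array([1.0, 2.0])

def A(h):
    return 2.0 * h[0] + 3.0 * h[1]

def error_ratio(t):
    # |f(a+h) - f(a) - A(h)| / ||h|| for h = t*(1, -1); should -> 0 as t -> 0
    h = t * np.array([1.0, -1.0])
    return abs(f(a + h) - f(a) - A(h)) / np.linalg.norm(h)

ratios = [error_ratio(t) for t in (1e-1, 1e-3, 1e-5)]
print(ratios)
```

The ratios shrink roughly in proportion to ‖h‖, as the error here is exactly t^2.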
Two remarks are in order. The first is that even though the domain U of f may be a
proper subset of R^n, the domain of A is all of R^n (as it should be, since the domain of a
linear map must be a vector space!).
The second point is that the linear map appearing in the definition is unique. (This is
a place where we need U to be open.) Let us explain the second point in detail. Our claim
is this: If B : Rn → R is a linear map satisfying (3) with A replaced by B (possibly with
different δ for a given ε), then B = A, that is, Ah = Bh for all h ∈ Rn . Using the fact that
both A and B satisfy (3), we shall prove that Ah = Bh for all h with khk = 1.
Given ε > 0 there exist δ_1 and δ_2 such that the following hold:

|f (a + h) − f (a) − Ah| < (ε/2) ‖h‖ for ‖h‖ < δ_1,
|f (a + h) − f (a) − Bh| < (ε/2) ‖h‖ for ‖h‖ < δ_2.

Let δ = min{δ_1, δ_2}. Let v ∈ R^n be given with ‖v‖ = 1. Choose any t ∈ R such that 0 < |t| < δ.
Set h := tv. We have

|Ah − Bh| ≤ |f (a + h) − f (a) − Bh| + |f (a + h) − f (a) − Ah| < ε ‖h‖,

and hence |Av − Bv| = |Ah − Bh| / |t| < ε ‖h‖ / |t| = ε ‖v‖ = ε.
It follows that |Av − Bv| < ε for every ε > 0 whenever ‖v‖ = 1. Hence we conclude that Av = Bv
whenever ‖v‖ = 1. If w ∈ R^n is any nonzero vector and we set v := w/‖w‖, then ‖v‖ = 1, so
that

Av = Bv ⟹ A(w/‖w‖) = B(w/‖w‖) ⟹ (1/‖w‖) Aw = (1/‖w‖) Bw ⟹ Aw = Bw.
This completes the proof of our claim. The unique A is called the total derivative of f at a
and is denoted by Df (a).
We now face the following question. If f : (a, b) ⊂ R → R is differentiable at c ∈ (a, b)
in the usual calculus sense, is it differentiable according to our new definition, and if so, is
there any relation between the f ′(c) of calculus and Df (c)? The answers to these questions are
contained in the following theorem.

Theorem 2. Let f : (a, b) → R be given and c ∈ (a, b). Then f is differentiable at c in the
usual calculus sense iff f is differentiable according to Def. 1, and we have

Df (c)(1) = f ′(c).    (5)
Proof. Do you understand (5)? The left hand side is the value of the linear map Df (c) at
h = 1 and the equation says that the real number f ′ (c) is Df (c)(1).
Let us assume that f is differentiable at c in the usual sense. We shall show that if we
define Df (c)(h) := f ′ (c)h, then we have
|f (c + h) − f (c) − Df (c)(h)| / |h| → 0 as h → 0.
Since f ′ (c) exists, given ε > 0 there exists δ > 0 such that
|(f (c + h) − f (c))/h − f ′(c)| < ε for 0 < |h| < δ.
This is the same as saying that

|f (c + h) − f (c) − f ′(c)h| / |h| < ε for 0 < |h| < δ,

or,

|f (c + h) − f (c) − f ′(c)h| < ε |h| for 0 < |h| < δ.
This shows that f is differentiable at c according to Def. 1 with derivative given by Df (c)(h) =
f ′ (c)h.
The converse is proved in an analogous way. We set α := Df (c)(1) and show that

(f (c + h) − f (c))/h → α as h → 0.

This will prove that f is differentiable at c with derivative α. Note that Df (c)(h) = αh.
Since f is differentiable at c according to Def. 1, given ε > 0 there exists δ > 0 such that

|f (c + h) − f (c) − αh| < ε |h| for |h| < δ.

Dividing by |h| ≠ 0, we get |(f (c + h) − f (c))/h − α| < ε for 0 < |h| < δ, as required.
Remark 3. Note that (5) brings out the intimate relation between f ′ (c) and Df (c).
Definition 4. Let f : U ⊂ R^n → R be differentiable at a ∈ U with Df (a) = A. From
(2), we know that there exists a unique vector α ∈ R^n such that Ah = α · h ≡ ∑_{i=1}^n α_i h_i
for all h ∈ R^n. This vector α is called the gradient of f at a and is denoted by grad f (a).
Example 5. Let f : U ⊂ R^n → R be a constant. Let a ∈ U be arbitrary. If we look at
f (a + h) − f (a) = 0, an obvious choice for Df (a) is the zero linear map. We see that

|f (a + h) − f (a) − 0h| / ‖h‖ = 0/‖h‖ = 0.

Hence f is differentiable at a with Df (a) = 0, the zero linear map.
Example 6. Let f be the restriction to an open U ⊂ R^n of a linear map A : R^n → R. It
should be intuitively clear that Df (a) exists and equals A for any a ∈ U . Let us verify this:

|f (a + h) − f (a) − Ah| = |Aa + Ah − Aa − Ah| = |0| = 0,

so the quotient in (3) is identically zero and Df (a) = A.
Example 7. Let f : R → R be given by f (x) = x^2. Note that from Theorem 2, we know that
f is differentiable according to our new definition and that Df (a)(h) = 2ah. Let us verify
this from first principles.

f (a + h) − f (a) = (a + h)^2 − a^2 = a^2 + 2ah + h^2 − a^2 = 2ah + h^2.

The term on the right hand side which is ‘linear’ in h is 2ah. It suggests that we may take as
Df (a) the linear map h ↦ 2ah. We do so and find that

|f (a + h) − f (a) − 2ah| / |h| = h^2 / |h| = |h| → 0 as h → 0.
Example 8. Consider f : R^n → R given by f (x) = x · x ≡ ∑_{i=1}^n x_i^2. Again, we shall compute
the derivative from first principles.

f (a + h) − f (a) = (a + h) · (a + h) − a · a = a · a + 2a · h + h · h − a · a = 2a · h + h · h.

The term ‘linear’ in h on the right hand side of the above equation is 2a · h. This suggests
that we may take Df (a)(h) := 2a · h. Using this, we see that

|f (a + h) − f (a) − 2a · h| / ‖h‖ = ‖h‖^2 / ‖h‖ = ‖h‖ → 0 as h → 0.

Hence f is differentiable with Df (a)(h) = 2a · h, that is, grad f (a) = 2a.
This follows easily if we observe that |h| ≤ √(h^2 + k^2) and |k| ≤ √(h^2 + k^2).
In particular, we find that grad f (a, b) = (1, 1) and grad g(a, b) = (b, a).
Example 10. One last example in the same vein. Let A be an n × n symmetric matrix with
real entries. Define f (x) := Ax · x. Here Ax denotes the column vector got from the matrix
multiplication of A and the column vector x. Let us show that f is differentiable at each
a ∈ R^n and compute its derivative.

f (a + h) − f (a) = A(a + h) · (a + h) − Aa · a
= (Aa + Ah) · (a + h) − Aa · a
= Aa · a + Aa · h + Ah · a + Ah · h − Aa · a
= 2Aa · h + Ah · h,

since Ah · a = h · Aa = Aa · h by the symmetry of A. An obvious choice for Df (a) is given by
setting Df (a)h := 2Aa · h, for h ∈ R^n. Let us check whether this choice works:

|f (a + h) − f (a) − Df (a)(h)| / ‖h‖ = |Ah · h| / ‖h‖ ≤ ‖Ah‖ ‖h‖ / ‖h‖ = ‖Ah‖ → 0 as h → 0.

Hence f is differentiable with Df (a)(h) = 2Aa · h, that is, grad f (a) = 2Aa.
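The conclusion of Example 10 can be checked numerically. The random matrix, point, and direction below are illustrative choices, and the finite-difference step is one reasonable value, not prescribed by the text:

```python
import numpy as np

# Sketch check of Example 10: for symmetric A, f(x) = (Ax)·x should have
# Df(a)(h) = 2(Aa)·h. We compare against a central finite difference.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2            # symmetrize
a = rng.standard_normal(4)
h = rng.standard_normal(4)

f = lambda x: A @ x @ x      # (Ax)·x
predicted = 2 * (A @ a) @ h

t = 1e-6
finite_diff = (f(a + t * h) - f(a - t * h)) / (2 * t)
print(predicted, finite_diff)
```

Because f is quadratic, the central difference agrees with 2(Aa)·h up to rounding error.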
Ex. 11. Let M (2, R) denote the set of all 2 × 2 matrices with real entries. We may identify
M (2, R) with R^4 as a vector space with the norm ‖A‖ := ‖(a11, a12, a21, a22)‖. Consider
f : M (2, R) → R given by f (X) = det(X). Show that f is differentiable at I and that
Df (I)(H) = Tr(H).
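Before attempting the exercise, it may help to see the claim numerically. The random direction H and step size below are illustrative choices:

```python
import numpy as np

# A quick numerical sanity check for Ex. 11: det at the identity has
# derivative H -> Tr(H). For small t, det(I + tH) ≈ 1 + t*Tr(H).
rng = np.random.default_rng(1)
H = rng.standard_normal((2, 2))
I = np.eye(2)

t = 1e-6
numeric = (np.linalg.det(I + t * H) - np.linalg.det(I)) / t
print(numeric, np.trace(H))
```

For 2 × 2 matrices, det(I + tH) = 1 + t Tr(H) + t^2 det(H), so the difference quotient misses Tr(H) only by a term of order t.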
Ex. 12. Keep the notation of the last exercise. Show that the map f : M (2, R) → M (2, R)
given by X 7→ X 2 is differentiable at A and Df (A)(H) = AH + HA.
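Ex. 12 can likewise be previewed numerically; the algebraic identity (A + tH)^2 − A^2 = t(AH + HA) + t^2 H^2 is what makes the check work. The matrices below are arbitrary illustrative samples:

```python
import numpy as np

# Sketch check of Ex. 12: for f(X) = X^2, the derivative at A should send
# H to AH + HA, since (A + tH)^2 - A^2 = t(AH + HA) + t^2 H^2.
rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2))
H = rng.standard_normal((2, 2))

t = 1e-6
numeric = ((A + t * H) @ (A + t * H) - A @ A) / t
predicted = A @ H + H @ A
print(np.max(np.abs(numeric - predicted)))
```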
that |f (a + hv) − f (a) − Df (a)(hv)| / ‖hv‖ < ε/‖v‖, provided 0 < ‖hv‖ < δ. Hence

|(g(h) − g(0))/h − Df (a)(v)| = |f (a + hv) − f (a) − hDf (a)(v)| / |h|
= |f (a + hv) − f (a) − Df (a)(hv)| / |h|   (using linearity of Df (a))
= (|f (a + hv) − f (a) − Df (a)(hv)| / ‖hv‖) · ‖v‖
< ε,
What we have proved just before this definition is the following theorem.
Remark 15. There exist functions f : U → R such that Dv f (a) exists for all v ∈ R^n but f is
not differentiable at a.
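The text does not name a specific function here, but the standard counterexample is f(x, y) = x^2 y / (x^4 + y^2) with f(0, 0) = 0: every directional derivative at the origin exists, yet f is not even continuous there. A small numerical sketch:

```python
import numpy as np

# Standard counterexample behind Remark 15 (a well-known one, not taken
# from this text): f(x, y) = x^2 y / (x^4 + y^2), f(0, 0) = 0.
# Every directional derivative at 0 exists, yet along the parabola
# y = x^2 the value of f is identically 1/2, so f is not continuous at 0.
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x ** 4 + y * y)

def directional_derivative(v1, v2, t=1e-8):
    # difference quotient f(tv)/t at the origin, for small t
    return f(t * v1, t * v2) / t

# Along v = (v1, v2) with v2 != 0 the quotient tends to v1^2 / v2.
print(directional_derivative(1.0, 2.0))   # close to 1/2
# But along y = x^2 the function stays at 1/2, far from f(0, 0) = 0.
print(f(1e-4, 1e-8))                      # close to 1/2
```

Since differentiability implies continuity, this f cannot be differentiable at 0, despite having all directional derivatives.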
Let us now specialize the vector v in the definition of the directional derivative by taking
v = e_i. In this case, a + te_i = (a_1, . . . , a_{i−1}, a_i + t, a_{i+1}, . . . , a_n), so that

lim_{t→0} (f (a + te_i) − f (a))/t = lim_{t→0} (f (a_1, . . . , a_{i−1}, a_i + t, a_{i+1}, . . . , a_n) − f (a_1, . . . , a_n))/t.
This limit, namely the directional derivative De_i f (a), if it exists, is called the i-th partial derivative
of f at a and is usually denoted by ∂f/∂x_i (a) or at times by D_i f (a).
We now go back to the question raised earlier. Can we find a concrete expression for
the vector grad f (a)? Recall from (2) that if T : R^n → R is linear, then T v = (T e_1, . . . , T e_n) ·
(v_1, . . . , v_n). Hence, we have

Df (a)(v) = (Df (a)(e_1), . . . , Df (a)(e_n)) · v = (De_1 f (a), . . . , De_n f (a)) · v.
Thus we have proved

grad f (a) = (∂f/∂x_1 (a), . . . , ∂f/∂x_n (a)).    (8)
Let us understand what we have achieved. Computing ∂f/∂x_i (a) is a one-variable job: it is g′(0)
where g(t) := f (a + te_i). Thus, if we somehow know that f is differentiable, say, by means of
some theoretical considerations, then we can compute Df (a) simply by finding the directional
derivatives Dv f (a) or the partial derivatives ∂f/∂x_i (a) for 1 ≤ i ≤ n. See Example 16 below.
Example 16. Let A be an n × n real matrix. Let R : R^n \ {0} → R be defined by setting

R(x) := ⟨Ax, x⟩ / ⟨x, x⟩, x ≠ 0.

Both the numerator and the denominator are differentiable functions and hence their
quotient is differentiable on R^n \ {0}. The derivative can be computed using the
algebra of differentiable functions. However, we shall show how to compute the derivative
in a simpler way. Since it is already known that R is differentiable, it is enough to compute
DR(x)(v) for x ≠ 0 and an arbitrary vector v ∈ R^n. By (7) we know DR(x)(v) is the directional
derivative Dv R(x). If we set g(t) := R(x + tv), then Dv R(x) = g′(0). Computing this is easy,
since g(t) = ϕ(t)/ψ(t) where

ϕ(t) := ⟨Ax, x⟩ + t(⟨Ax, v⟩ + ⟨Av, x⟩) + t^2 ⟨Av, v⟩

and

ψ(t) := ⟨x, x⟩ + 2t ⟨x, v⟩ + t^2 ⟨v, v⟩.
Hence from the quotient rule of one-variable calculus,

g′(0) = (ϕ′(0)ψ(0) − ϕ(0)ψ′(0)) / ψ(0)^2
= (⟨Ax, v⟩ + ⟨Av, x⟩)/⟨x, x⟩ − 2 ⟨Ax, x⟩⟨x, v⟩ / ⟨x, x⟩^2.
We hope that this simplicity impresses upon you the utility of our principle!
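The formula just derived for the quotient R(x) = ⟨Ax, x⟩/⟨x, x⟩ can be tested against a finite difference. The random matrix, point, and direction are illustrative choices:

```python
import numpy as np

# Numerical check of the formula from Example 16 for
# R(x) = <Ax, x> / <x, x>; A need not be symmetric, which is why
# <Ax, v> and <Av, x> appear separately in the formula.
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)
v = rng.standard_normal(3)

R = lambda y: (A @ y) @ y / (y @ y)

formula = ((A @ x) @ v + (A @ v) @ x) / (x @ x) \
          - 2 * ((A @ x) @ x) * (x @ v) / (x @ x) ** 2

t = 1e-6
finite_diff = (R(x + t * v) - R(x - t * v)) / (2 * t)
print(formula, finite_diff)
```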
Ex. 17. In Example 8, the function is f (x) := x_1^2 + · · · + x_n^2. From our work there we
know grad f (a) = 2a. Assuming the algebra of differentiable functions, f is differentiable.
Compute the partial derivatives and hence grad f (a). Compare your work with Example 8.
Carry out a similar investigation with f (x) := ∑_{i,j} a_{ij} x_i x_j of Example 10.
Let us look at another instance of the principle of reduction to the one-variable case. Let
f : U ⊂ R^n → R be differentiable at all points of U. Assume that p ∈ U is a point of local
maximum, that is, f (x) ≤ f (p) for all x ∈ B(p, r) for some r > 0. We claim that Df (p) = 0.
In view of (7) and (8), it suffices to show that ∂f/∂x_i (p) = 0 for 1 ≤ i ≤ n. Consider the
one-variable function g(t) := f (p + te_i) defined on (−ε, ε) for sufficiently small ε > 0. Since
g(0) = f (p) and p + te_i ∈ B(p, r), we see that t = 0 is a maximum of g on (−ε, ε). From the
one-variable result, it follows that g′(0) = 0. What is g′(0)? As done earlier,

g′(0) = lim_{t→0} (g(t) − g(0))/t = lim_{t→0} (f (p + te_i) − f (p))/t = De_i f (p) = ∂f/∂x_i (p).

Hence the claim is proved.
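A small numerical illustration of the claim Df(p) = 0 at a local maximum; the sample function and its maximizer p are choices made here for illustration:

```python
import numpy as np

# Illustration: f(x, y) = -(x - 1)^2 - 2*(y + 2)^2 has its maximum at
# p = (1, -2). Central-difference partial derivatives at p should vanish.
def f(x):
    return -(x[0] - 1.0) ** 2 - 2.0 * (x[1] + 2.0) ** 2

p = np.array([1.0, -2.0])

def partial(i, t=1e-6):
    e = np.zeros(2)
    e[i] = 1.0
    return (f(p + t * e) - f(p - t * e)) / (2 * t)

grad_p = np.array([partial(0), partial(1)])
print(grad_p)
```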
A third instance of the principle in action is seen in the proof of the mean value theorem
for differentiable functions f : U → R.
Proof. Consider g(t) := f ((1 − t)x + ty) on [0, 1]. Then g is continuous on [0, 1] and differ-
entiable on (0, 1). (To show that g is differentiable, adapt the computation of g′(t_0) below.)
Hence by the mean value theorem of one-variable calculus, there exists t_0 ∈ (0, 1) such that
g(1) − g(0) = g′(t_0)(1 − 0) = g′(t_0). What is g′(t_0)? Proceeding as in the earlier computations,
with z := (1 − t_0)x + t_0 y,

g′(t_0) = Df (z)(y − x) = grad f (z) · (y − x),

so that f (y) − f (x) = grad f (z) · (y − x). (Note that the above computation proves that g is
differentiable at t_0 and computes the derivative.)
Corollary 19. Assume that U is star-shaped at a ∈ U, that is, the line segment [a, x] ⊂ U
for all x ∈ U. Assume that f : U → R is differentiable on U and that Df (x) = 0 for all x ∈ U.
Then f is a constant.

Proof. Let x ∈ U be arbitrary. By the mean value theorem, there exists z ∈ [a, x] such that
f (x) − f (a) = grad f (z) · (x − a) = 0. Hence f (x) = f (a) for all x ∈ U, that is, f is a constant.
∂^{α_1 + ··· + α_n} f / (∂x_1^{α_1} · · · ∂x_n^{α_n})

exist and are continuous. (Note that ∂f/∂x etc. are functions from U to R, so we can speak of their
partial derivatives w.r.t. x or y.)
To make things simple, let us also assume that 0 ∈ U , U is star-shaped at 0 and that we
wish to find a ‘Taylor expansion’ of f at 0. Consider the one-variable function g(t) := f (tx).
We first observe that this is defined for t in (−ε, 1 + ε) for some sufficiently small ε > 0.
For, since U is open and 0 ∈ U, there exists an ε_1 > 0 such that B(0, ε_1) ⊂ U. Hence
0 + tx ∈ B(0, ε_1) provided that ‖tx‖ = |t| ‖x‖ < ε_1, that is, when |t| < ε_1/‖x‖. A similar
consideration will show that x + tx = (1 + t)x ∈ B(x, ε_2) ⊂ U if |t| < ε_2/‖x‖. If we
choose ε := min{ε_1/‖x‖, ε_2/‖x‖}, then tx ∈ U for t ∈ (−ε, 1 + ε). Let us show that g is
differentiable on this interval and compute its derivative.
g′(t) = lim_{h→0} (g(t + h) − g(t))/h = lim_{h→0} (f ((t + h)x) − f (tx))/h
= lim_{h→0} (f (tx + hx) − f (tx))/h
= Dx f (tx) ≡ Df (tx)(x)   by (7)
= ∑_{i=1}^n ∂f/∂x_i (tx) x_i   by (8).
In particular,

g′(0) = ∑_{i=1}^n ∂f/∂x_i (0) x_i.    (10)
Let g_i(t) := ∂f/∂x_i (tx). If we proceed as above, we find that

g_i′(t) = ∑_{j=1}^n ∂g_i/∂x_j (tx) x_j = ∑_{j=1}^n ∂^2 f/(∂x_j ∂x_i) (tx) x_j,    (11)
so that we have

g″(t) = ∑_{i=1}^n g_i′(t) x_i = ∑_{i=1}^n ∑_{j=1}^n ∂^2 f/(∂x_j ∂x_i) (tx) x_j x_i = ∑_{i,j=1}^n ∂^2 f/(∂x_j ∂x_i) (tx) x_j x_i.    (12)
Note that the above computation shows that g is twice continuously differentiable, and so we
can apply Taylor's theorem of one-variable calculus to g. We get

g(t) = g(0) + g′(0)t + (1/2) g″(0)t^2 + R,

where the remainder R is such that lim_{t→0} R/t^2 = 0. Taking t = 1 in the last displayed
equation, we deduce Taylor's formula for f:

f (x) = f (0) + ∑_{i=1}^n ∂f/∂x_i (0) x_i + (1/2) ∑_{i,j=1}^n ∂^2 f/(∂x_i ∂x_j) (0) x_i x_j + R.    (13)
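The second-order formula (13) can be tested numerically. The sample function f(x, y) = e^x sin y and its gradient and Hessian at 0 (computed by hand here, not taken from the text) give the approximation y + xy near the origin:

```python
import numpy as np

# Sketch check of (13): for f(x, y) = exp(x) * sin(y), at the origin
# grad f(0) = (0, 1) and the Hessian is H = [[0, 1], [1, 0]], so the
# second-order Taylor polynomial is y + x*y, with error O(||x||^3).
f = lambda x, y: np.exp(x) * np.sin(y)
grad0 = np.array([0.0, 1.0])
H = np.array([[0.0, 1.0], [1.0, 0.0]])

def taylor2(x, y):
    v = np.array([x, y])
    return f(0, 0) + grad0 @ v + 0.5 * v @ H @ v

errs = []
for s in (1e-1, 1e-2):
    errs.append(abs(f(s, s) - taylor2(s, s)))
print(errs)
```

Shrinking the point by a factor of 10 shrinks the error by roughly a factor of 1000, as a cubic remainder should.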
From (13), it is easy to deduce the second order conditions (in terms of second order partial
derivatives) for a local maximum/minimum of f. If we assume, for instance, that x = 0 is a point
of local maximum, then t = 0 is a point of local maximum for g. Hence g″(0) ≤ 0. Coming
back to f, this implies that

∑_{i,j=1}^n ∂^2 f/(∂x_i ∂x_j) (0) x_i x_j ≤ 0
for all choices of x in a neighbourhood of 0. This is the same as saying that the matrix

D^2 f (0) :=
[ ∂^2 f/∂x_1^2 (0)       ∂^2 f/(∂x_1 ∂x_2) (0)   . . .   ∂^2 f/(∂x_1 ∂x_n) (0) ]
[        ⋮                        ⋮                              ⋮             ]
[ ∂^2 f/(∂x_n ∂x_1) (0)  ∂^2 f/(∂x_n ∂x_2) (0)   . . .   ∂^2 f/∂x_n^2 (0)      ]

is negative semi-definite.
Proof. We do not prove this. The aim here is to make sure that you understand (14) and
(15).
In case (a), the function g ◦ f is a function from R to R and hence it is enough to know
its derivative in the usual calculus sense. (14) says that this derivative is the real number got
by letting the linear map Dg(f (t)) : Rn → R act on the vector f ′ (t).
In case (b), the function ϕ ◦ g is from Rn to R. Its derivative is known if we know the
gradient. (15) says that this gradient is grad g(p) multiplied by the scalar ϕ′ (q).
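Case (a) of the chain rule, (g ∘ f)′(t) = grad g(f(t)) · f′(t), can be illustrated numerically. The curve f and the function g below are illustrative choices, not taken from the text:

```python
import numpy as np

# Numerical illustration of (14), case (a): for f(t) = (cos t, sin t),
# a curve in R^2, and g(x, y) = x^2 + 3*x*y, we should have
# (g o f)'(t) = grad g(f(t)) · f'(t).
f = lambda t: np.array([np.cos(t), np.sin(t)])
fprime = lambda t: np.array([-np.sin(t), np.cos(t)])
g = lambda v: v[0] ** 2 + 3.0 * v[0] * v[1]
gradg = lambda v: np.array([2.0 * v[0] + 3.0 * v[1], 3.0 * v[0]])

t0 = 0.7
chain = gradg(f(t0)) @ fprime(t0)

h = 1e-6
finite_diff = (g(f(t0 + h)) - g(f(t0 - h))) / (2 * h)
print(chain, finite_diff)
```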
Proof. Let R* stand for the set of nonzero real numbers. Let ϕ : R* → R be given by
ϕ(t) = 1/t. Then ϕ is differentiable on R* and we observe that g = ϕ ◦ f. Hence g is
differentiable at a and we deduce from (15),

grad g(a) = ϕ′(f (a)) grad f (a) = −(1/f (a)^2) grad f (a).
Proof. Since f ◦ c is a constant, we see that (f ◦ c)′(t) = 0 for all t. By (14), we have

0 = (f ◦ c)′(t) = grad f (c(t)) · c′(t) for all t.
Remark 23. Let us bring out the geometric significance of the last result. Let us assume
that n = 3. Then the set of points S := {x ∈ U : f (x) = α}, if non-empty, can be considered as a
‘surface’. The hypothesis on c says that the curve c lies entirely in S. Hence c′(t) can
also be thought of as a vector tangent to the surface S. Thus, we see that grad f (a) is ‘normal’ to
the surface at a ∈ S. Take some linear and quadratic functions on R^3 and try to understand
this remark.
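Following the suggestion above, here is one such experiment with a quadratic function on R^3; the function, level set, and curve are illustrative choices:

```python
import numpy as np

# Illustration of Remark 23: f(x) = x1^2 + x2^2 + x3^2 has the unit
# sphere as its level set f = 1. The curve c(t) = (cos t, sin t, 0) lies
# on that sphere, and grad f(c(t)) = 2*c(t) should be orthogonal to the
# tangent vector c'(t) at every t.
c = lambda t: np.array([np.cos(t), np.sin(t), 0.0])
cprime = lambda t: np.array([-np.sin(t), np.cos(t), 0.0])
gradf = lambda x: 2.0 * x

dots = [gradf(c(t)) @ cprime(t) for t in (0.3, 1.1, 2.5)]
print(dots)
```

The dot products vanish (up to rounding), exhibiting grad f as a normal vector to the sphere.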
Geometric Meaning of Derivatives and Partial Derivatives
is a plane containing the two tangent lines. (Observe that, by taking y = b or x = a we get
the tangent lines!) The figure illustrates these points.
[Figures: surface plots of z = f (x, y), showing the tangent lines in the coordinate directions and the tangent plane at a point.]