Differential Multivariate Calculus
Fall 2011
Part II
You will be familiar with single variable functions such as those illustrated in Figure 1.
[Figure 1: Three single variable functions f (x), each plotted against x.]
In each of these cases, the function f (·) acts as a ‘black box’, operating on a single variable, x, and outputting one,
and only one, number, z = f (x), that depends only on x, as shown in Figure 2 (left). The set of values of x on which
f (x) is defined is termed the domain of f (x): in the three examples above, the domain of f (x) is, respectively, the real
line, the real line and the real line excluding the point −1. The range (or image) of f (x) is the set of points to which
the function maps its domain: for the three examples above, the range is, respectively, the set of non-negative real
numbers, the segment of the real line [−1, 1] and the real line excluding the point 0.
In this part of the course we shall be analyzing functions of multiple variables, of the form shown in Figure 2
(right). In this example, the function f (x, y) takes two inputs, x and y, and outputs one, and only one, number,
z = f (x, y). Whereas the domains of the functions f (x) in Figure 1 were subsets of the real line making up the
x-axis, the domain of the two variable function z = f (x, y) is a subset of the (two-dimensional) xy-plane,
as illustrated in Figure 3. The range of the function z = f (x, y) is a subset of the real line that forms the
z-axis. Therefore f (·, ·) maps each point (x∗ , y∗ ) in its domain onto the point z∗ = f (x∗ , y∗ ) in its range. In the three
dimensional space with coordinates x, y, z, the result of this mapping is the surface illustrated in Figure 3, wherein
each point on the surface satisfies z = f (x, y).
Note that what distinguishes a surface formed by a function z = f (x, y) from any other surface is that the mapping
from (x, y) to z is unique. This is in contrast to the sphere $x^2 + y^2 + z^2 = 1$, for example, where each point of the
xy-plane inside the unit circle maps onto two values of z.
If f (·) were a function of three variables, such that w = f (x, y, z), the domain of f (·) would be a subset of the three
dimensional space defined by the x-, y- and z-axes. Drawing such a function would be very difficult as it would
require four axes. We shall mainly be concerned with functions of two variables, although the techniques we shall
study are applicable to functions of any number of variables.
Figure 2: A function f (x) of one variable (left) and a function f (x, y) of multiple variables (right).
[Figure 3: The surface z = f (x, y) above the domain of f (x, y) in the xy-plane. The point (x∗ , y∗ ) in the domain maps to z∗ = f (x∗ , y∗ ).]
As an example of a multivariable function, a plot of $z = x^2 - y^2$ is shown in Figure 4. Note that the intersection
of this function with planes of the form x = constant and y = constant yields a series of parabolic curves. For
example, the intersection of the function with the plane y = 0 yields a parabolic curve in three dimensional space
in the shape of $z = x^2$.
[Figure 4: The surface $z = x^2 - y^2$, with the parabolic cross-sections $z = x^2$, $z = x^2 - 4$, $z = -y^2$ and $z = 4 - y^2$ marked.]
In a function such as z = f (x, y) the points in the function’s domain in the xy-plane map onto its range in the z-axis.
We now fix a point in the function’s range, z = c, say, and we look for all the points in the xy-plane which are such
that f (x, y) = c. The set of all such points is termed a level set of f (x, y). In the two dimensional xy-plane, such
a level set forms a curve, termed a contour line. Clearly there is a different level set of the function (and hence a
different curve in the xy-plane) for every point c in the function’s range. For functions of three variables such as
w = f (x, y, z), we have, rather than contour lines, contour surfaces in three dimensional space, where f (x, y, z) = c,
where c is constant.
As an example, consider the function $z = f(x, y) = x^2 + y^2$, shown in Figure 5. In three-dimensional space, this
function forms a bowl-shaped surface, as shown in Figure 5 (left). Level sets of this function are of the form
$x^2 + y^2 = c$. When projected onto the xy-plane in a contour plot, shown in Figure 5 (right), each level set forms
a circle of radius $\sqrt{c}$, centered on the origin.
Note that as we move radially away from the origin of the xy-plane, the circles gradually become closer together
for equally spaced values of z. What this signifies is that the sides of the bowl shown in Figure 5 (left) gradually
become steeper as the distance away from the origin of the xy-plane increases.
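The narrowing of the contours can be checked numerically. The following is a minimal sketch, using only the Python standard library, that computes the radius $\sqrt{c}$ of each level set of $x^2 + y^2$ for the equally spaced values z = 1, 2, 3, 4 and the gaps between consecutive circles. The helper name `level_set_radius` is illustrative and not part of the course material.

```python
import math

def level_set_radius(c):
    """Radius of the level set x^2 + y^2 = c (a circle centered on the origin)."""
    return math.sqrt(c)

levels = [1, 2, 3, 4]
radii = [level_set_radius(c) for c in levels]

# Gap between consecutive contour circles for equally spaced values of z.
gaps = [outer - inner for inner, outer in zip(radii, radii[1:])]

for c, radius in zip(levels, radii):
    print(f"z = {c}: radius = {radius:.4f}")
print("gaps between consecutive contours:", [round(g, 4) for g in gaps])
```

The gaps shrink as z grows, which is the numerical counterpart of the bowl becoming steeper away from the origin.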
Figure 5: The function $z = f(x, y) = x^2 + y^2$ (left) and its contour plot (right) showing the level sets of z = f (x, y) at
z = 1, 2, 3, 4.
As another example, consider the plane z = x, shown in Figure 6 (left). Here, z increases as x increases. Furthermore,
z is independent of y, and will therefore not change if we change y, so long as x remains fixed. For this
reason, the contour plot in Figure 6 (right) shows that the contour lines of this function are parallel to the y-axis.
Consider a particle moving on the plane illustrated in Figure 6 (left). If the particle moves in a
direction parallel to the y-axis, then its z coordinate will not change, and the particle will continue
moving along the same contour line. If, on the other hand, the particle moves parallel to the x-axis, it will cross the
level sets orthogonally, leading to a change in the value of z. Note that the steepness of the plane is constant. This
is reflected in the contour plot by the fact that the contour lines are equally spaced for equally spaced values of z.
The function and contour plot illustrated in Figure 7 are those of the function $z = x^2 - y^2$. Note that at the origin,
the function is locally flat, and this is reflected in the fact that the contour lines are relatively distant from each
other at that point. The steepness of the function increases along the two axes and, correspondingly, the contour
lines become closer together along the two axes.
The hill-shaped function $z = e^{-(x^2 + y^2)}$ illustrated in Figure 8 is relatively flat at the ‘foot of the hill’, and this is
illustrated in the fact that at points that are radially distant from the origin, the contour lines are far apart. The
contour lines are closest together at the sides of the hill, where the function is steepest. At the ‘top of the hill’, the
function is locally flat, which is reflected in the fact that at the origin, the contour lines are again widely spaced.
[Figure 6: The plane z = x (left) and its contour plot (right), with contour lines from z = −1.5 to z = 1.5.]
[Figure 7: The function $z = x^2 - y^2$ (left) and its contour plot (right).]
Figure 8: The function $z = e^{-(x^2 + y^2)}$ and its contour plot.
As a further example, we consider the three variable function $w = f(x, y, z) = x^2 + y^2 + z^2$. This function is difficult
to visualize, as it requires four axes, although some insight into its contour plot can be gained by analogy with the
example of Figure 5. The level sets of w = f (x, y, z) are the sets $x^2 + y^2 + z^2 = c$, where c is fixed for each level
set. As such, the level sets of this function are spherical surfaces of radius $\sqrt{c}$, centered on the origin, as shown in
Figure 9. In this case, the ‘steepness’ of the function w = f (x, y, z) roughly corresponds to how quickly w will increase
in response to an increase in the distance $\sqrt{c}$ from the origin x = y = z = 0. As with the example of Figure 5, f (x, y, z)
becomes ‘steeper’ with radial distance from the origin. To see this, consider the two level sets of f (x, y, z):
$$w_1 = x^2 + y^2 + z^2 = c_1$$
$$w_2 = x^2 + y^2 + z^2 = c_2$$
where $c_2 > c_1$. Let the radial distance between the two spheres represented by these surfaces be ε, so that
$\sqrt{c_2} - \sqrt{c_1} = \varepsilon$. Squaring $\sqrt{c_2} = \sqrt{c_1} + \varepsilon$ gives $w_2 - w_1 = c_2 - c_1 = 2\varepsilon\sqrt{c_1} + \varepsilon^2$. Therefore, for fixed ε, $w_2 - w_1$ is an increasing function
of $c_1$, meaning that as we move away from the origin, f (x, y, z) increases at an increasing rate; that is, it becomes
‘steeper’.
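The relation between the spacing of the spheres and the change in w can be verified directly. This is a minimal sketch in standard-library Python: for a fixed radial step ε it compares the change in w between the two spheres with the expression $2\varepsilon\sqrt{c_1} + \varepsilon^2$. The function name `w_increment` and the sample values of $c_1$ and ε are chosen here purely for illustration.

```python
import math

def w_increment(c1, eps):
    """Change in w = x^2 + y^2 + z^2 between the spheres of radius
    sqrt(c1) and sqrt(c1) + eps."""
    r1 = math.sqrt(c1)
    c2 = (r1 + eps) ** 2
    return c2 - c1

eps = 0.1  # fixed radial distance between the two spheres
for c1 in [1.0, 4.0, 9.0]:
    direct = w_increment(c1, eps)
    formula = 2 * eps * math.sqrt(c1) + eps ** 2
    print(f"c1 = {c1}: w2 - w1 = {direct:.4f}, formula = {formula:.4f}")
```

For the same radial step, the increment in w grows with $c_1$, as argued above.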
[Figure 9: Level sets of $w = x^2 + y^2 + z^2$ at w = 1, 4, 9 and 16, drawn as spheres centered on the origin of the x, y, z axes.]
2 Partial derivatives
We have used contour plots to show how functions of multiple variables vary across their domains. Figure 10
shows the contour lines of a function f (x, y). At the point (x∗ , y∗ ), the function will increase in the positive x
direction and decrease in the positive y direction. Clearly therefore, any notion of slope of a multivariable function
will in general be a function of direction.
Figure 10: The rate of change in f (x, y) at (x∗ , y∗ ) is dependent on direction: ∆x leads to an increase in f (x, y),
whilst ∆y leads to a decrease. The contour lines shown are f (x, y) = 1, 2, 3, 4.
This gives rise to a notion of steepness of such functions. This notion is strongly related to the concept of the
derivative of a single variable function that you will already be familiar with. In this section, we shall define this
idea in a more formal way.
As illustrated in Figure 11, the derivative $\frac{df}{dx}$ (also denoted f ′ (x)) of a single variable function f (x) at any point x
has the graphical interpretation of being the slope of f (x) at x. It is defined as
$$\frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}$$
In this definition, the derivative is the limit of the change in f (x) in response to an increment in x in the positive x
direction, in the domain of the function.
Figure 11: A function y = f (x) and its derivative $\frac{df}{dx} = f'(x)$.
Unlike single variable functions, functions of multiple variables have a multi-dimensional domain. For a function
of two variables f (x, y), we can therefore conceive of the change in f (x, y) in response to an increment in x in
the positive x direction and, separately, in response to an increment in y in the positive y direction. For f (x, y)
therefore, this leads to the definition of two partial derivatives, one in the direction of x, one in the direction of y.
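Definition 1. The partial derivative of a function f (x, y) with respect to x is
$$\frac{\partial f}{\partial x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x}$$
The partial derivative with respect to y is defined analogously:
$$\frac{\partial f}{\partial y} = \lim_{\Delta y \to 0} \frac{f(x, y + \Delta y) - f(x, y)}{\Delta y}$$
The notation $\frac{\partial f}{\partial x}$ is often shortened to $f_x$. Likewise, $\frac{\partial f}{\partial y}$ is equivalent to $f_y$.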
These two derivatives have the following interpretation:
• The partial derivative $\frac{\partial f}{\partial x}$ at a point $(x, y) = (x_0, y_0)$ is the slope of the function f (x, y) at $(x_0, y_0)$ in the
direction of increasing x, with y held fixed, as illustrated in Figure 12 (left).
• The partial derivative $\frac{\partial f}{\partial y}$ at a point $(x, y) = (x_0, y_0)$ is the slope of the function f (x, y) at $(x_0, y_0)$ in the
direction of increasing y, with x held fixed, as illustrated in Figure 12 (right).
The partial derivative of a function with respect to one of its variables can be found by differentiating the
function with respect to that variable, whilst treating all the other variables as constants.
Example 1. Evaluate the partial derivatives of $f(x, y) = (x^2 + y^3)^{1/2}$ with respect to x and y.
To find $\frac{\partial f}{\partial x}$, consider y to be a constant, and differentiate with respect to x.
$$\frac{\partial f}{\partial x} = \frac{1}{2}(x^2 + y^3)^{-1/2}(2x)$$
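[Figure 12: The partial derivatives $\frac{\partial f}{\partial x}$ (left) and $\frac{\partial f}{\partial y}$ (right) as slopes of the surface z = f (x, y) at the point $(x_0, y_0, z_0)$ above $(x_0, y_0)$.]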
To find $\frac{\partial f}{\partial y}$, consider x to be a constant, and differentiate with respect to y.
$$\frac{\partial f}{\partial y} = \frac{1}{2}(x^2 + y^3)^{-1/2}(3y^2)$$
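The results of Example 1 can be cross-checked against the limit definition of the partial derivative. The sketch below, written in standard-library Python, estimates each partial derivative with a central difference (a numerical stand-in for the limit, not the method used in these notes) and compares it with the closed-form expressions. The sample point (1, 2) and the step size h are arbitrary choices made for illustration.

```python
import math

def f(x, y):
    return math.sqrt(x**2 + y**3)

def partial_x(func, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in x, y held constant."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

def partial_y(func, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in y, x held constant."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

def fx_exact(x, y):
    return 0.5 * (x**2 + y**3) ** -0.5 * (2 * x)

def fy_exact(x, y):
    return 0.5 * (x**2 + y**3) ** -0.5 * (3 * y**2)

x0, y0 = 1.0, 2.0  # arbitrary sample point where x0**2 + y0**3 > 0
print("df/dx:", partial_x(f, x0, y0), "exact:", fx_exact(x0, y0))
print("df/dy:", partial_y(f, x0, y0), "exact:", fy_exact(x0, y0))
```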
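Example 2. For $f(x, y, z) = e^{-(x^2 + y^2 + z^2)}$, find $\frac{\partial f}{\partial z}$ at (x, y, z) = (0, 0, 1).
To find $\frac{\partial f}{\partial z}$, consider x and y to be constants, and differentiate with respect to z.
$$\frac{\partial f}{\partial z} = -2z\,e^{-(x^2 + y^2 + z^2)}$$
At (0, 0, 1), $\frac{\partial f}{\partial z} = -\frac{2}{e}$.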
Note that a partial derivative of a function of x and y is, in general, itself a function of x and y as well, as shown
in the examples above. We can therefore take further partial derivatives of partial derivatives. For example, for the
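function f (x, y) = sin(2x + y) the first partial derivatives are
$$\frac{\partial f}{\partial x} = 2\cos(2x + y) \qquad \frac{\partial f}{\partial y} = \cos(2x + y)$$
and the second partial derivatives are
$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial x^2} = f_{xx} = -4\sin(2x + y)$$
$$\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial y\,\partial x} = f_{yx} = -2\sin(2x + y)$$
$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial x\,\partial y} = f_{xy} = -2\sin(2x + y)$$
$$\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial y^2} = f_{yy} = -\sin(2x + y)$$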
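In the above example, it is no coincidence that the two mixed derivatives agree. For any function whose second
partial derivatives are continuous, it is always the case that
$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}$$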
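The equality of the mixed derivatives for f (x, y) = sin(2x + y) can be illustrated numerically. This is a rough sketch in standard-library Python that builds each mixed derivative by nesting central differences in the two possible orders; the helper names `d_dx` and `d_dy`, the step size and the sample point are assumptions made for the illustration, and finite differences only approximate the true derivatives.

```python
import math

def f(x, y):
    return math.sin(2 * x + y)

def d_dx(g, h=1e-4):
    """Return a function approximating the partial derivative of g in x."""
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, h=1e-4):
    """Return a function approximating the partial derivative of g in y."""
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)

f_yx = d_dy(d_dx(f))  # differentiate with respect to x first, then y
f_xy = d_dx(d_dy(f))  # differentiate with respect to y first, then x

x0, y0 = 0.3, 0.5  # arbitrary sample point
exact = -2 * math.sin(2 * x0 + y0)
print("f_yx ~", f_yx(x0, y0))
print("f_xy ~", f_xy(x0, y0))
print("exact:", exact)
```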
3 Differentials
Suppose that we know the value $f(x_0)$ and derivative $\frac{df}{dx}(x_0)$ of a single variable function f (x) at some $x = x_0$.
Given a small δx, a rough approximation of the change in the value of the function between $x = x_0$ and $x = x_0 + \delta x$
is, as shown in Figure 13, given by
$$f(x_0 + \delta x) - f(x_0) = \delta f \approx \frac{df}{dx}(x_0)\,\delta x$$
As δx is made smaller, the approximation clearly becomes increasingly accurate. In the limit as δx → 0,
$$\delta f \to df = \frac{df}{dx}\,dx$$
Figure 13: A local approximation of f (x): the change δf over the increment δx is approximated by $\frac{df}{dx}(x_0)\,\delta x$.
We can carry out the same approximation exercise for a multivariable function f (x, y), with the difference that in
this case we make use of the multivariable function’s partial derivatives. Therefore, if we know the value of f (x, y)
at $x = x_0$, $y = y_0$ in addition to the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ at that point, then, given an increment δx in the
x-direction, and an independent increment δy in the y-direction, the change in f due to these increments in the
xy-plane is approximately
$$f(x_0 + \delta x, y_0 + \delta y) - f(x_0, y_0) = \delta f \approx \frac{\partial f}{\partial x}(x_0, y_0)\,\delta x + \frac{\partial f}{\partial y}(x_0, y_0)\,\delta y$$
In the limits δx → 0 and δy → 0, we obtain the total differential
$$df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy$$
For functions z = f (x, y), it is implied that x and y are free to vary independently of each other, whilst z is dependent
on x and y.
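To see how the approximation by the total differential behaves, the sketch below compares the exact change in a function with its first-order estimate for shrinking increments. It is written in standard-library Python and assumes the illustrative choice $f(x, y) = x^2 + y^2$ (so $f_x = 2x$ and $f_y = 2y$) and an arbitrary base point; neither is prescribed by the notes.

```python
def f(x, y):
    return x**2 + y**2

def total_differential(x0, y0, dx, dy):
    """First-order estimate f_x*dx + f_y*dy of the change in f, with f_x = 2x, f_y = 2y."""
    return 2 * x0 * dx + 2 * y0 * dy

x0, y0 = 1.0, 2.0  # arbitrary base point
for step in [0.1, 0.01, 0.001]:
    exact = f(x0 + step, y0 + step) - f(x0, y0)
    approx = total_differential(x0, y0, step, step)
    print(f"step = {step}: exact = {exact:.6f}, approx = {approx:.6f}, "
          f"error = {exact - approx:.2e}")
```

For this function the error is exactly $\delta x^2 + \delta y^2$, so it falls a hundredfold each time the increments shrink tenfold.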
Consider a flat hot plate, and suppose that points on the plate can be described using (x, y) coordinates. Suppose
that we know the temperature T at each point on the plate, so that T = T (x, y).
Now define a curve C in the xy-plane, parameterized by t, so that the position vector of points along
the curve is r(t) = x(t)i + y(t)j. This curve can represent the path traced out by a particle moving on the surface
of the hot plate with time. As the particle moves along the curve, it will sense a change in temperature as its (x, y)
coordinates vary. The temperature T can therefore be regarded as
T = T (x(t), y(t))
in other words, T can be regarded as a function of two functions, x(t) and y(t), of a single, independent, variable t.
In another scenario, suppose we know the pressure P in a volume of air (say, in a section of the atmosphere) at each
point in the volume, so that at each point (x, y, z), we know P = P(x, y, z). Suppose we also know the temperature
T at different points in the volume, so that T = T (x, y, z). Therefore P and T are functions of three independent
variables (x, y, z). Now if we know that the density D of the air at any point is a function of its local temperature
and pressure, then we can write the density as an explicit function of P and T :
$$D = D(P, T)$$
In other words, D is a function of two functions (P and T ) which are themselves functions of three independent
variables (x, y, z).
For functions such as those above, describing the temperature on the hot plate as a function of time, T = T (x(t), y(t)),
and the density in a volume of air as a function of position, D = D(P(x, y, z), T (x, y, z)), we may wish to compute
derivatives such as
$$\frac{dT}{dt} \quad \text{and} \quad \frac{\partial D}{\partial x}$$
To find such derivatives, we turn to the chain rule.
For functions of functions of single variables such as y = f (u), with u = g(x), the chain rule allows us to find
the derivative $\frac{dy}{dx}$ thus
$$\frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx}$$
The chain rule extends to multivariable functions. Consider the temperature profile of the hot plate described
above. We know that the differential of T with respect to x and y is
$$dT = \frac{\partial T}{\partial x}\,dx + \frac{\partial T}{\partial y}\,dy$$
and since x and y are both functions of t, we know the differentials of x and y are
$$dx = \frac{dx}{dt}\,dt \quad \text{and} \quad dy = \frac{dy}{dt}\,dt$$
If we substitute these latter differentials into the differential for T and divide by dt we obtain
$$\frac{dT}{dt} = \frac{\partial T}{\partial x}\frac{dx}{dt} + \frac{\partial T}{\partial y}\frac{dy}{dt}$$
which is the chain rule for the multivariable function T = T (x, y).
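As a concrete check of this chain rule, the following sketch evaluates dT/dt along a path in two ways: from the chain rule, and by differentiating T (x(t), y(t)) numerically as a function of t alone. It uses standard-library Python. The temperature field $T = x^2 y$ and the circular path x = cos t, y = sin t are hypothetical choices made only for this illustration.

```python
import math

def T(x, y):
    # Hypothetical temperature field on the plate.
    return x**2 * y

def x_of(t):
    return math.cos(t)

def y_of(t):
    return math.sin(t)

def dT_dt_chain(t):
    """dT/dt from the chain rule: T_x * dx/dt + T_y * dy/dt."""
    x, y = x_of(t), y_of(t)
    dT_dx, dT_dy = 2 * x * y, x**2
    dx_dt, dy_dt = -math.sin(t), math.cos(t)
    return dT_dx * dx_dt + dT_dy * dy_dt

def dT_dt_numeric(t, h=1e-6):
    """Central difference of the composite function t -> T(x(t), y(t))."""
    return (T(x_of(t + h), y_of(t + h)) - T(x_of(t - h), y_of(t - h))) / (2 * h)

t0 = 0.7  # arbitrary sample time
print("chain rule:", dT_dt_chain(t0))
print("numeric   :", dT_dt_numeric(t0))
```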
As another example, consider the density D in a volume of air which, as described above, is a function of the local
pressure P and the local temperature T , both of which vary in space, and are therefore functions of position x, y, z.
That is,
$$D = D(P, T) \qquad P = P(x, y, z) \qquad T = T(x, y, z) \qquad (1)$$
Suppose we wish to find a quantity such as the rate of change of air density along the x-direction, $\frac{\partial D}{\partial x}$. Since D is
an explicit function of P and T , we can compute $\frac{\partial D}{\partial P}$ and $\frac{\partial D}{\partial T}$. We therefore know that the differential of D with
respect to P and T is
$$dD = \frac{\partial D}{\partial P}\,dP + \frac{\partial D}{\partial T}\,dT \qquad (2)$$
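At the same time, we know that the differentials of P and T with respect to x, y, and z are
$$dP = \frac{\partial P}{\partial x}\,dx + \frac{\partial P}{\partial y}\,dy + \frac{\partial P}{\partial z}\,dz \qquad dT = \frac{\partial T}{\partial x}\,dx + \frac{\partial T}{\partial y}\,dy + \frac{\partial T}{\partial z}\,dz$$
Substituting these into (2) and holding y and z fixed (dy = dz = 0) gives the chain rule
$$\frac{\partial D}{\partial x} = \frac{\partial D}{\partial P}\frac{\partial P}{\partial x} + \frac{\partial D}{\partial T}\frac{\partial T}{\partial x} \qquad (4)$$
The terms appearing in this expression are as follows:
• $\frac{\partial D}{\partial P}$: Since D is a function of P and of T , this partial derivative will also, in general, be a function of both P and T .
• $\frac{\partial P}{\partial x}$: Since P is a function of x, y, z, this partial derivative will also, in general, be a function of x, y, z.
• $\frac{\partial D}{\partial T}$: Since D is a function of P and of T , this partial derivative will also, in general, be a function of both P and T .
• $\frac{\partial T}{\partial x}$: Since T is a function of x, y, z, this partial derivative will also, in general, be a function of x, y, z.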
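Suppose we now wish to find $\frac{\partial^2 D}{\partial x^2}$. To do this, let $F = \frac{\partial D}{\partial x}$. Because $\frac{\partial D}{\partial P}$, $\frac{\partial D}{\partial T}$ are explicit functions of P and T ,
and because $\frac{\partial P}{\partial x}$, $\frac{\partial T}{\partial x}$ are explicit functions of x, y, z, we know that $\frac{\partial D}{\partial x}$ will be a function of P, T, x, y, z. Therefore
$F = \frac{\partial D}{\partial x} = F(P, T, x, y, z)$, and $\frac{\partial^2 D}{\partial x^2} = \frac{\partial F}{\partial x}$.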
Now F = F(P, T, x, y, z), and therefore we can regard it as an explicit function of P, T, x, y, z. However, at the same
time, P and T are both functions of x, y, z, and so we can also say that F = F(P(x, y, z), T (x, y, z), x, y, z), and is
therefore an explicit function of x, y, z only. Therefore when we compute ∂∂Fx , we need to draw the important
distinction between
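1. regarding F as a function of P, T, x, y, z and computing $\frac{\partial F}{\partial x}$ with P, T, y, z held constant. This is denoted as
$$\left(\frac{\partial F}{\partial x}\right)_{P,T,y,z}$$
2. regarding F as a function of x, y, z only and computing $\frac{\partial F}{\partial x}$ with y, z held constant. This is denoted as $\left(\frac{\partial F}{\partial x}\right)_{y,z}$
and therefore
$$\frac{\partial^2 D}{\partial x^2} = \left(\frac{\partial F}{\partial x}\right)_{y,z} = \frac{\partial F}{\partial P}\frac{\partial P}{\partial x} + \frac{\partial F}{\partial T}\frac{\partial T}{\partial x} + \left(\frac{\partial F}{\partial x}\right)_{P,T,y,z}$$
$$= \left(\frac{\partial}{\partial x}\left(\frac{\partial D}{\partial x}\right)\right)_{y,z} = \frac{\partial}{\partial P}\left(\frac{\partial D}{\partial x}\right)\frac{\partial P}{\partial x} + \frac{\partial}{\partial T}\left(\frac{\partial D}{\partial x}\right)\frac{\partial T}{\partial x} + \left(\frac{\partial}{\partial x}\left(\frac{\partial D}{\partial x}\right)\right)_{P,T,y,z}$$
Here $\frac{\partial}{\partial P}\left(\frac{\partial D}{\partial x}\right)$ is the derivative of the whole expression for $\frac{\partial D}{\partial x}$ with respect to P, and is not in general equal to $\frac{\partial^2 D}{\partial P^2}$ alone; the same applies to the T term.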
Consider a point mass moving through the atmosphere, having instantaneous position (x(t), y(t), z(t)) at time t.
Suppose that the temperature in the atmosphere varies over time and space. Let the temperature sensed by the point
mass as it moves through the atmosphere be τ = τ (x, y, z,t). Then the rate of change of the sensed temperature is
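$$\frac{d\tau}{dt} = \frac{\partial \tau}{\partial x}\frac{dx}{dt} + \frac{\partial \tau}{\partial y}\frac{dy}{dt} + \frac{\partial \tau}{\partial z}\frac{dz}{dt} + \frac{\partial \tau}{\partial t}$$
Note that we have used the notation $\frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}$ rather than $\frac{\partial x}{\partial t}, \frac{\partial y}{\partial t}, \frac{\partial z}{\partial t}$, which is because the variables (x(t), y(t), z(t))
are single variable functions of t.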
Note also that the difference between $\frac{\partial \tau}{\partial t}$ and $\frac{d\tau}{dt}$ is that
• $\frac{\partial \tau}{\partial t}$ is the rate of change of the temperature with the position held fixed. For example, this can represent the
temperature changes as sensed by a thermometer at a fixed point in the atmosphere, where such changes occur
due to seasonal changes. In other words, this partial derivative signifies changes in the local temperature
only.
• $\frac{d\tau}{dt}$ is the rate of change in temperature with time due to the combined effect of the change in local temperature
at each point, in addition to the change in temperature sensed by moving through the atmosphere between hot
and cold regions. For example, seasonal changes can make the temperature at all points in the atmosphere
change with time. A particle moving through the atmosphere can also move between hotter and cooler
regions whilst these seasonal changes are occurring. Therefore the change in temperature sensed by the
particle would be from the combined effect of the change in position (from hotter to cooler areas) and the
change in temperature (due to seasonal changes). Therefore we can split the chain rule for $\frac{d\tau}{dt}$ into the
following
$$\frac{d\tau}{dt} = \underbrace{\frac{\partial \tau}{\partial x}\frac{dx}{dt} + \frac{\partial \tau}{\partial y}\frac{dy}{dt} + \frac{\partial \tau}{\partial z}\frac{dz}{dt}}_{\text{Sensed change in temperature due to change in position}} + \underbrace{\frac{\partial \tau}{\partial t}}_{\text{Sensed change in temperature due to local temperature changes}}$$
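The split between the two contributions can be made concrete with a small numerical sketch in standard-library Python. The temperature field `tau` and the trajectory `position` below are hypothetical, chosen only so that both contributions are non-zero; the code computes the position term and the local term separately and checks their sum against a direct numerical derivative of the sensed temperature.

```python
import math

def tau(x, y, z, t):
    # Hypothetical temperature field: a spatial pattern plus a time-varying term.
    return x**2 + y * z + 10.0 * math.sin(t)

def position(t):
    # Hypothetical trajectory (x(t), y(t), z(t)) of the point mass.
    return t, t**2, 1.0 + t

def velocity(t):
    # Ordinary derivatives dx/dt, dy/dt, dz/dt of the trajectory above.
    return 1.0, 2.0 * t, 1.0

def local_rate(x, y, z, t):
    """Partial derivative of tau with respect to t, position held fixed."""
    return 10.0 * math.cos(t)

def convective_rate(t):
    """Contribution from the change in position: the three product terms."""
    x, y, z = position(t)
    dx, dy, dz = velocity(t)
    return 2.0 * x * dx + z * dy + y * dz

def total_rate(t):
    x, y, z = position(t)
    return convective_rate(t) + local_rate(x, y, z, t)

def total_rate_numeric(t, h=1e-6):
    """Central difference of the temperature actually sensed along the path."""
    return (tau(*position(t + h), t + h) - tau(*position(t - h), t - h)) / (2 * h)

t0 = 0.5
print("due to change in position:", convective_rate(t0))
print("due to local changes     :", local_rate(*position(t0), t0))
print("total d(tau)/dt          :", total_rate(t0), "numeric:", total_rate_numeric(t0))
```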
A ‘variable dependency graph’ shows the dependencies of functions on their variables. Suppose we have the
functions
u = u(v, w,t) v = v(x, y, z) w = w(x, y) x = x(t) y = y(t) z = z(t)
We can sort the dependence of each function on the variables in its domain as a tree, as shown in Figure 14.
[Figure 14: Variable dependency graph for u. u branches to v, w and t; v branches to x, y and z; w branches to x and y; each of x, y and z branches to t.]
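This graph can be used to quickly calculate partial derivatives associated with the above functions. As an example,
to find $\frac{\partial u}{\partial x}$:
1. Identify all the paths in the graph between u and x.
2. For each path, form the product of the partial derivatives along its branches.
3. Sum the products over all the paths.
For the example of $\frac{\partial u}{\partial x}$, there are two paths between u and x: u-v-x and u-w-x (Step 1). The product of the partial
derivatives along the branch u-v-x is $\frac{\partial u}{\partial v}\frac{\partial v}{\partial x}$, whilst for the branch u-w-x this product is $\frac{\partial u}{\partial w}\frac{\partial w}{\partial x}$ (Step 2). Finally (Step
3), the sum of these two products gives the partial derivative $\frac{\partial u}{\partial x}$ as:
$$\frac{\partial u}{\partial x} = \frac{\partial u}{\partial v}\frac{\partial v}{\partial x} + \frac{\partial u}{\partial w}\frac{\partial w}{\partial x}$$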
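Because all the sub-functions of u are in the end functions of the single variable t, we can find the ordinary
derivative $\frac{du}{dt}$ in exactly the same way as above. This yields
$$\frac{du}{dt} = \frac{\partial u}{\partial v}\frac{\partial v}{\partial x}\frac{dx}{dt} + \frac{\partial u}{\partial v}\frac{\partial v}{\partial y}\frac{dy}{dt} + \frac{\partial u}{\partial v}\frac{\partial v}{\partial z}\frac{dz}{dt} + \frac{\partial u}{\partial w}\frac{\partial w}{\partial x}\frac{dx}{dt} + \frac{\partial u}{\partial w}\frac{\partial w}{\partial y}\frac{dy}{dt} + \frac{\partial u}{\partial t}$$
Once more, note the distinction between $\frac{du}{dt}$ and $\frac{\partial u}{\partial t}$, highlighted in Section 4.4.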
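The path-based procedure lends itself to a short program. The sketch below, in standard-library Python, stores the dependency graph of u as a dictionary, lists every path between two nodes, and writes out one product of derivatives per path. The names `GRAPH`, `paths` and `chain_rule_terms` are hypothetical, and the printed `da/db` labels are only symbolic placeholders for the derivative of a with respect to b (partial or ordinary, as appropriate).

```python
# Each function maps to the variables it depends on directly,
# following u = u(v, w, t), v = v(x, y, z), w = w(x, y), x = x(t), y = y(t), z = z(t).
GRAPH = {
    "u": ["v", "w", "t"],
    "v": ["x", "y", "z"],
    "w": ["x", "y"],
    "x": ["t"],
    "y": ["t"],
    "z": ["t"],
    "t": [],
}

def paths(graph, start, end):
    """List every downward path in the graph from start to end."""
    if start == end:
        return [[start]]
    found = []
    for child in graph[start]:
        for tail in paths(graph, child, end):
            found.append([start] + tail)
    return found

def chain_rule_terms(graph, start, end):
    """One product of derivatives per path; the required derivative is their sum."""
    terms = []
    for path in paths(graph, start, end):
        factors = [f"d{a}/d{b}" for a, b in zip(path, path[1:])]
        terms.append(" * ".join(factors))
    return terms

print("du/dx =", " + ".join(chain_rule_terms(GRAPH, "u", "x")))
print("du/dt has", len(chain_rule_terms(GRAPH, "u", "t")), "terms")
```

The six terms found for the paths from u to t correspond to the six terms of the expression for du/dt.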
[Figure: Variable dependency graph for a function u that depends on t both directly and through w.]
However, following the three step procedure described above would yield the term $\frac{\partial u}{\partial t}$ on both sides of the equation.
This is clearly incorrect. What we need to do is ensure that when working down a dependency graph and a
particular partial differentiation is being performed, all other variables at the same level of the graph are kept
constant. For this example, when, in Step 2, we find $\frac{\partial u}{\partial t}$ along the path from u to t, we do this with w kept constant.
This partial derivative is denoted
$$\left(\frac{\partial u}{\partial t}\right)_w$$
and equals $3t^2$ in this example. Then the total partial derivative of u with respect to t is given by applying Step 3,
and, as before, we obtain
$$\frac{\partial u}{\partial t} = \left(\frac{\partial u}{\partial w}\right)_t \frac{\partial w}{\partial t} + \left(\frac{\partial u}{\partial t}\right)_w = 2t + 3t^2$$
Finally, if we are to use the variable dependency graph to find second (or higher order) derivatives, we repeat
Steps 1-3 above, but we need to be careful about which variables the function we are
differentiating depends on. With reference to the functions (1), the dependency graph for the function D, which is, explicitly,
only a function of P and T , is
[Figure: Dependency graph in which D branches to P and T , and each of P and T branches to x, y and z.]
However, the function $\frac{\partial D}{\partial x}$ obtained in (4) is, in general, an explicit function of each of P, T, x, y, z. Therefore its
variable dependency graph is
[Figure: Dependency graph in which $\frac{\partial D}{\partial x}$ branches to P, T , x, y and z, and each of P and T branches further to x, y and z.]
5 Gradients
We have seen that contour plots are useful for showing how the value of a function varies over its domain. Qualitatively,
we can use the contour plot to visually determine, for example, that for a function z = f (x, y), z will remain
unchanged along the contour lines, but will vary across them. We can also tell the function’s steepness by the
spacing between its contours. A contour plot therefore conveys both the direction in which the function changes and
the magnitude of its rate of change. This suggests that a vector can be
used to quantify the direction and magnitude of the rate of change of a multivariable function at each point in its
domain.
Consider a function w = f (x, y, z). For constant c, a level set of this function is a surface S on which f (x, y, z) = c, as
shown in Figure 15 (left). We can define two lines, L1 and L2 , on the surface S, that cross at a point P on the surface.
Each of these lines is parameterized by a parameter t (for example, t can represent time, which parameterizes the
motion of a particle along each of these two lines).
Since S is the level set w = f (x, y, z) = c, the value of w does not change along the two lines. Therefore on each of
L1 and L2 , $\frac{dw}{dt} = 0$.
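[Figure 15: Lines L1 and L2 crossing at the point P on the level surface S: f (x, y, z) = c (left), and the gradient vector ∇ f and the tangent plane to S at P (right).]
By the chain rule, on each of these lines
$$\frac{dw}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} = 0 \qquad (7)$$
which is the dot product of the vector with components $\frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}$ with the vector
$$\nabla f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} + \frac{\partial f}{\partial z}\mathbf{k}$$
We define ∇ f to be the gradient vector of the function f (x, y, z). For a function of n variables, the gradient is a vector of dimension
n.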
Now, if the position vector of any point on the line L1 is r(t) = x(t)i + y(t)j + z(t)k, we have seen previously that
the vector $\frac{d\mathbf{r}}{dt}$ is tangential to the line L1 at that point. From (7), the dot product between ∇ f and $\frac{d\mathbf{r}}{dt}$ is zero on lines
on the level surface S. We can therefore say that for any point on L1 , the vector ∇ f at the point is perpendicular to
the tangent to L1 at the point.
Note that this is also true of points on L2 , and of any other line running along the surface S.
If we now turn to the point P, the tangent plane to the level surface S at the point P is tangential to all the lines
(L1 , L2 , · · · ) running along S and passing through P. But since ∇ f is perpendicular to all these lines at P, we can
therefore conclude that ∇ f at the point P is perpendicular to the tangent plane to the surface S at point P.
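The perpendicularity of ∇ f to curves lying in a level surface can be checked for a specific case. The sketch below, in standard-library Python, takes $f = x^2 + y^2 + z^2$ with the unit sphere as the level surface, and a hypothetical curve `r(t)` constructed so that it stays on that sphere; it then evaluates the dot product of ∇ f with the tangent vector at several values of t.

```python
import math

def r(t):
    # Hypothetical curve chosen so that x^2 + y^2 + z^2 = 1 for every t.
    return (math.cos(t) * math.cos(2 * t), math.cos(t) * math.sin(2 * t), math.sin(t))

def dr_dt(t):
    # Tangent vector to the curve, obtained by differentiating r(t) term by term.
    return (
        -math.sin(t) * math.cos(2 * t) - 2 * math.cos(t) * math.sin(2 * t),
        -math.sin(t) * math.sin(2 * t) + 2 * math.cos(t) * math.cos(2 * t),
        math.cos(t),
    )

def grad_f(x, y, z):
    # Gradient of f(x, y, z) = x^2 + y^2 + z^2.
    return (2 * x, 2 * y, 2 * z)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

for t in [0.0, 0.4, 1.3, 2.5]:
    point = r(t)
    print(f"t = {t}: |r|^2 = {dot(point, point):.6f}, "
          f"grad f . dr/dt = {dot(grad_f(*point), dr_dt(t)):.2e}")
```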
Example 3. Consider the function $f(x, y) = x^2 + y^2$.
1 Sketch the level curves f (x, y) = 1, f (x, y) = 4 and f (x, y) = 9.
2 Calculate the gradient of f (x, y) at the points (1, 0), $(\sqrt{2}, \sqrt{2})$ and (0, 3) and plot arrows representing the
directions of the gradient at each of these points.
See Figure 16. Note that ∇ f (x, y) is a vector perpendicular to the level curves of f (x, y).
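[Figure 16: The level curves f (x, y) = 1, 4 and 9, with the gradient vectors ∇ f = 2i at (1, 0), $\nabla f = 2\sqrt{2}(\mathbf{i} + \mathbf{j})$ at $(\sqrt{2}, \sqrt{2})$ and ∇ f = 6j at (0, 3).]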
Example 4. Find the tangent vector T to the curve C at the point (1, 0, 0), formed by the intersection between the
surfaces $x^2 + y^2 + z^2 = 1$ and z = 0.
The sphere $x^2 + y^2 + z^2 = 1$ is a level surface of the function $w = x^2 + y^2 + z^2 - 1$, when w = 0, whilst the plane
z = 0 is the level surface of the function v = z, at v = 0. The gradients of each of these functions are normal to their
level surfaces in three-dimensional space:
$$\nabla w = 2x\,\mathbf{i} + 2y\,\mathbf{j} + 2z\,\mathbf{k} \qquad \nabla v = \mathbf{k}$$
The curve C lies on both of the level surfaces, and therefore the vectors ∇w and ∇v are both normal to the
curve C. The tangent vector T to the curve C is tangent to both level surfaces and therefore perpendicular to both
gradient vectors. Given two vectors in three dimensional space, a third vector perpendicular to both is given by
their cross product. Therefore the tangent vector T to the curve C is given by
$$\mathbf{T} = \nabla v \times \nabla w = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 0 & 0 & 1 \\ 2x & 2y & 2z \end{vmatrix} = -2y\,\mathbf{i} + 2x\,\mathbf{j}$$
At the point (1, 0, 0) this gives T = 2j.
The intersection of these two surfaces is the unit circle in the xy-plane, centered on the origin O. In Figure 17, we
can see that the vector ∇w points radially out of the unit sphere, which, at the point (1, 0, 0), points in the positive
x direction. The vector ∇v points ‘upwards’ in the positive z direction. The tangent to the unit circle at (1, 0, 0)
points in the positive y direction, as expected from the cross product of the two gradient vectors at that point.
Figure 17: The tangent vector to the curve C formed by the intersection of the unit sphere with the xy-plane, showing ∇w, ∇v and ∇v × ∇w.
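The cross product in Example 4 is easy to evaluate programmatically. The following standard-library Python sketch defines the two gradients from the example and a small `cross` helper (a hypothetical name, written out explicitly rather than taken from a library) and evaluates the tangent vector at (1, 0, 0).

```python
def cross(a, b):
    """Cross product a x b of two 3-component vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def grad_w(x, y, z):
    # Gradient of w = x^2 + y^2 + z^2 - 1.
    return (2 * x, 2 * y, 2 * z)

def grad_v(x, y, z):
    # Gradient of v = z.
    return (0.0, 0.0, 1.0)

point = (1.0, 0.0, 0.0)
tangent = cross(grad_v(*point), grad_w(*point))
print("T at (1, 0, 0):", tangent)
```

The result has only a y component, matching the tangent to the unit circle at that point.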
We have seen that the gradient of a function at a point in the function’s domain is a vector that points in a direction
that is perpendicular to the level surface of the function. Along the level surface, the rate of change of the function
is zero, by definition. Because it is perpendicular to the level surface, it may therefore be expected that the direction
of the gradient vector is that in which the rate of change of the function is maximal. We shall next demonstrate that
this is so.
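Consider the following three unit vectors, each aligned with one of the x, y and z axes:
$$\hat{\mathbf{T}}_x = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}^{\top} \qquad \hat{\mathbf{T}}_y = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}^{\top} \qquad \hat{\mathbf{T}}_z = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^{\top}$$
and next consider the dot product of each of these unit vectors with ∇ f , where w = f (x, y, z).
$$\nabla f \cdot \hat{\mathbf{T}}_x = \frac{\partial f}{\partial x} \qquad \nabla f \cdot \hat{\mathbf{T}}_y = \frac{\partial f}{\partial y} \qquad \nabla f \cdot \hat{\mathbf{T}}_z = \frac{\partial f}{\partial z}$$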
We therefore see that the dot product of ∇ f with a unit vector such as T̂x , T̂y , T̂z , gives the rate of change of f in
the direction of the unit vector. In fact, this is true for any unit vector T̂, and not only those aligned with the axes.
The quantity ∇ f · T̂ is a scalar quantity called the directional derivative of f in the direction of the unit vector T̂,
and gives the rate of change of f in the direction of T̂.
The directional derivative is, as the name suggests, a derivative. Suppose that for some function f (x, y), there exists
a parameterized curve r(t) on the xy-plane, the domain of the function. The directional derivative can be used to
determine how the function changes along the curve, as shown in Figure 18.
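We have seen that the unit tangent vector $\hat{\mathbf{T}}$ to the curve r(t) at any point is given by
$$\hat{\mathbf{T}} = \frac{d\mathbf{r}}{ds}$$
where s is a measure of arc length along the curve. Now the directional derivative of f along the tangent
vector $\hat{\mathbf{T}}$ is
$$\nabla f \cdot \hat{\mathbf{T}} = \nabla f \cdot \frac{d\mathbf{r}}{ds} = \frac{\partial f}{\partial x}\frac{dx}{ds} + \frac{\partial f}{\partial y}\frac{dy}{ds} + \frac{\partial f}{\partial z}\frac{dz}{ds} = \frac{df}{ds} \qquad (8)$$
(written here in its general three variable form; for a function f (x, y) the z term is absent).
Therefore the directional derivative of f (x, y) along the direction of the tangent vector $\hat{\mathbf{T}}$ is the rate of change of
f (x, y) with respect to distance s traversed in the direction of $\hat{\mathbf{T}}$, $\frac{df}{ds}$.
Figure 18: The directional derivative along a parameterized curve r(t) with tangent vector $\hat{\mathbf{T}}$, crossing the contour lines $f(x, y) = c_1, c_2, c_3, c_4$.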
From the definition of the dot product,
$$\nabla f \cdot \hat{\mathbf{T}} = |\nabla f||\hat{\mathbf{T}}| \cos\theta = |\nabla f| \cos\theta \quad \text{since } \hat{\mathbf{T}} \text{ is a unit vector}$$
where θ is the angle between ∇ f and $\hat{\mathbf{T}}$. Since −1 ≤ cos θ ≤ 1, it follows that
• the maximum rate of increase of the function f (x, y, z) is |∇ f |, in the direction aligned with ∇ f
(i.e. when ∇ f and $\hat{\mathbf{T}}$ are parallel and θ = 0),
• the maximum rate of decrease of the function f (x, y, z) is in the direction opposite to ∇ f (i.e. θ = 180°),
• the rate of change of f (x, y, z) in the direction perpendicular to ∇ f (i.e., along a contour or level set) is always
zero, as per the definition of a contour.
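Figure 19 shows a contour line of the function $f(x, y) = x^2 + y^2$, where f (x, y) = 1. At the point $(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}})$, the
gradient is $\nabla f = \sqrt{2}\,\mathbf{i} + \sqrt{2}\,\mathbf{j}$, so the directional derivative in the direction $\hat{\mathbf{T}}_1 = \frac{1}{\sqrt{2}}\mathbf{i} + \frac{1}{\sqrt{2}}\mathbf{j}$ (perpendicular to the contour line) is $\nabla f \cdot \hat{\mathbf{T}}_1 = |\nabla f| = 2$. In Figure 20, the vector $\hat{\mathbf{T}}_1$ is
similarly perpendicular to the contour at the point $A = (\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}})$, and therefore $\hat{\mathbf{T}}_1$ is the direction of the maximal rate
of increase of the function $z = f(x, y) = x^2 + y^2$ at A.
Figure 19 also shows that at $(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}})$, tangentially to the contour line and in the direction of $\hat{\mathbf{T}}_2 = -\frac{1}{\sqrt{2}}\mathbf{i} + \frac{1}{\sqrt{2}}\mathbf{j}$, the rate of
change of the function is zero (in agreement with the definition of the contour line). This is also illustrated in
Figure 20 where, at the point $B = (-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}})$, the vector $\hat{\mathbf{T}}_2$ points in a direction tangential to the contour line.
In this direction at the point B, there is therefore no increase in the value of the function z = f (x, y).
In Figure 19, direction $\hat{\mathbf{T}}_3 = -\frac{1}{\sqrt{2}}\mathbf{i} - \frac{1}{\sqrt{2}}\mathbf{j}$ is opposite to that of ∇ f , and is therefore the direction of greatest decrease of the
function, which gives a directional derivative of −2.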
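These directional derivatives can be recomputed with a few lines of standard-library Python. The sketch below evaluates $\nabla f \cdot \hat{\mathbf{T}}$ for $f = x^2 + y^2$ at the two points on the contour f (x, y) = 1 discussed above; the function `directional_derivative` is a hypothetical helper that normalizes the supplied direction before taking the dot product.

```python
import math

def grad_f(x, y):
    # Gradient of f(x, y) = x^2 + y^2.
    return (2 * x, 2 * y)

def directional_derivative(point, direction):
    """Dot product of grad f at the point with the unit vector along direction."""
    length = math.hypot(*direction)
    unit = (direction[0] / length, direction[1] / length)
    g = grad_f(*point)
    return g[0] * unit[0] + g[1] * unit[1]

s = 1 / math.sqrt(2)
A = (s, s)      # point on the contour f(x, y) = 1
B = (-s, -s)    # diametrically opposite point on the same contour

print("along T1 at A:", directional_derivative(A, (1, 1)))
print("along T2 at B:", directional_derivative(B, (-1, 1)))
print("along T3 at A:", directional_derivative(A, (-1, -1)))
```

The three values are the maximal rate of increase |∇ f | = 2, zero along the contour, and the maximal rate of decrease −2.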
[Figure 19: The contour line f (x, y) = 1 of $f(x, y) = x^2 + y^2$, for which $\nabla f(x, y) = 2x\,\mathbf{i} + 2y\,\mathbf{j}$, so that $\nabla f(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}) = \sqrt{2}\,\mathbf{i} + \sqrt{2}\,\mathbf{j}$ and $\nabla f(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}) = -\sqrt{2}\,\mathbf{i} - \sqrt{2}\,\mathbf{j}$. The unit vectors shown are $\hat{\mathbf{T}}_1 = \frac{1}{\sqrt{2}}\mathbf{i} + \frac{1}{\sqrt{2}}\mathbf{j}$, $\hat{\mathbf{T}}_2 = -\frac{1}{\sqrt{2}}\mathbf{i} + \frac{1}{\sqrt{2}}\mathbf{j}$ and $\hat{\mathbf{T}}_3 = -\frac{1}{\sqrt{2}}\mathbf{i} - \frac{1}{\sqrt{2}}\mathbf{j}$, with $\nabla f(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}) \cdot \hat{\mathbf{T}}_1 = 2$, $\nabla f(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}) \cdot \hat{\mathbf{T}}_2 = 0$ and $\nabla f(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}) \cdot \hat{\mathbf{T}}_3 = -2$.]
Figure 20: The directional derivative of the function z = f (x, y) in the direction of a vector $\hat{\mathbf{T}}_1$, parallel to ∇ f , gives
the maximal rate of change of z. In the direction $\hat{\mathbf{T}}_2$, orthogonal to ∇ f (and hence tangential to the contour line of
f (x, y)), the directional derivative is zero, indicating no change in the value of z.
As a more practical example, consider a flat hot plate on which we define an xy coordinate system. At each point
on the plate, the temperature is given by a function T (x, y). Fourier’s law of heat conduction says that heat flow
per unit area, q, in the direction of unit vector n, along which we measure distance s, is given by
$$q = -\lambda \frac{dT}{ds}$$
where λ is the thermal conductivity of the material from which the plate is made. This is illustrated in Figure 21,
where the contour lines represent isotherms, which are lines of constant temperature across the plate. As in (8), the
heat flow per unit area q can be rewritten as the directional derivative
$$q = (-\lambda \nabla T) \cdot \mathbf{n} \qquad (9)$$
Note that the negative sign is due to the fact that heat flows from hot to cold areas, whilst the temperature gradient
$\nabla T$ points in the direction of maximum temperature increase. From the properties of the dot product, Equation (9)
shows that the heat flow per unit area is maximized when $\mathbf{n} = -\nabla T / |\nabla T|$, that is, in the opposite direction to $\nabla T$, which
is the direction perpendicular to the isotherms and also the direction of maximum temperature decrease.
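[Figure 21: isotherms $T(x, y) = T_1, T_2, T_3, T_4$ running from the COLD to the HOT side of the plate, with the gradient $\nabla T$, the unit vector n, the distance s and the heat flow $q = -[\lambda \nabla T] \cdot \mathbf{n} = -\lambda\,\partial T / \partial s$.]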
Figure 21: The heat flow per unit area q, in the direction n, across a flat, hot plate. Contour lines are isotherms
(lines of constant temperature).
Taylor series provide a means of representing a function around a point in its domain using a (possibly infinite)
power series. For a single variable function f (x), the Taylor series of f (x) around the point x = x0 is
$$ f(x) = f(x_0) + (x - x_0) f'(x_0) + \frac{1}{2!}(x - x_0)^2 f''(x_0) + \cdots $$
Truncating this series gives an approximation of how a function behaves around the point of interest. Note that a
first order approximation is given by
f (x) ≈ f (x0 ) + (x − x0 ) f ′ (x0 )
which is a linear approximation to the function at the point x0 , shown in Figure 22 as a line approximating the
function y = sin x near the point x = π . As Figure 22 also shows, higher order truncations give increasingly
accurate approximations of the function y = sin x near the point x = π .
[Figure 22: plot of the function y = sin(x) for 0 ≤ x ≤ 6, together with its first order, third order and fifth order approximations.]
Figure 22: Taylor series approximations of the single variable function y = sin x.
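The behaviour shown in Figure 22 can be reproduced with a few lines of code. This is a minimal sketch, assuming the expansion point $x_0 = \pi$ used in the text; the sample point x = 4 and the function name `taylor_sin` are illustrative choices.

```python
import math

def taylor_sin(x, x0, order):
    """Taylor polynomial of sin about x0, truncated after the term of the given order."""
    # The derivatives of sin cycle through sin, cos, -sin, -cos.
    derivs = [math.sin(x0), math.cos(x0), -math.sin(x0), -math.cos(x0)]
    total = 0.0
    for k in range(order + 1):
        total += derivs[k % 4] * (x - x0) ** k / math.factorial(k)
    return total

x0 = math.pi
x = 4.0  # a sample point near x0 (an arbitrary choice)

# Absolute error of the first, third and fifth order truncations at x.
errors = {n: abs(taylor_sin(x, x0, n) - math.sin(x)) for n in (1, 3, 5)}
for n, err in errors.items():
    print(f"order {n}: error {err:.2e}")
```

The printed errors shrink as the order increases, which is the numerical counterpart of the curves in Figure 22 hugging y = sin x more closely near x = π.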
Taylor series extend to multivariable functions. For the two-variable function z = f (x, y), at the point (x0 , y0 ), the
series takes the form
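$$ f(x, y) = f(x_0, y_0) + (x - x_0)\frac{\partial f}{\partial x} + (y - y_0)\frac{\partial f}{\partial y} + \frac{1}{2!}(x - x_0)^2 \frac{\partial^2 f}{\partial x^2} + (x - x_0)(y - y_0)\frac{\partial^2 f}{\partial x \partial y} + \frac{1}{2!}(y - y_0)^2 \frac{\partial^2 f}{\partial y^2} + \cdots \tag{10} $$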
where all the partial derivatives are evaluated at (x0 , y0 ).
As with the single variable case, the Taylor series of a multivariable function truncated to its first order approximation
gives a linear approximation of the function of interest. For the function $z = f(x, y)$, the truncated, first order
Taylor series near the point $z_0 = f(x_0, y_0)$ is given by
$$ z = z_0 + (x - x_0)\frac{\partial f}{\partial x}(x_0, y_0) + (y - y_0)\frac{\partial f}{\partial y}(x_0, y_0) $$
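which can be re-written as
$$ \begin{bmatrix} -\frac{\partial f}{\partial x}(x_0, y_0) & -\frac{\partial f}{\partial y}(x_0, y_0) & 1 \end{bmatrix} \begin{bmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{bmatrix} = 0 \tag{11} $$
The set of points in (11) is such that the vector $\begin{bmatrix} (x - x_0) & (y - y_0) & (z - z_0) \end{bmatrix}^T$ is always orthogonal to the vector
$$ \begin{bmatrix} -\frac{\partial f}{\partial x}(x_0, y_0) & -\frac{\partial f}{\partial y}(x_0, y_0) & 1 \end{bmatrix}^T \tag{12} $$
because the dot product between these two vectors is zero. In other words, (11) defines a plane with normal vector
(12).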
The significance of this vector is the following: the function z = f (x, y) defines a surface in the three dimensional
xyz-space. If we define the function w = f (x, y) − z, then the level set of this function when w = 0 is precisely the
surface z = f (x, y). In other words, z = f (x, y) is the surface composed of the set of points where w = 0. This is
a three dimensional equivalent to a contour line. For any point (x0 , y0 , z0 ) on a level surface of a function such as
w = f (x, y) − z, the gradient vector of the function at that point is always, as discussed previously, perpendicular
to the tangent plane to the level surface of the function at that point. Therefore we can say that (11) is the equation
of the plane that is tangent to the level surface w = f (x, y) − z = 0, that is, the surface z = f (x, y). As with Figure
22, we can thus see that, when truncated to the first order, the Taylor series expansion of a function about a point
provides a linear approximation to the function at that point, which, in the case of a two variable function such as
f (x, y), is a plane.
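The tangent plane interpretation can be illustrated numerically. The sketch below assumes the example function $f(x, y) = x^2 + y^2$ and the point $(x_0, y_0) = (1, 2)$, neither of which comes from the notes, and estimates the partial derivatives by central differences rather than by hand.

```python
def f(x, y):
    """Example surface z = x**2 + y**2 (an illustrative choice)."""
    return x**2 + y**2

def partials(func, x0, y0, h=1e-6):
    """Central-difference estimates of df/dx and df/dy at (x0, y0)."""
    fx = (func(x0 + h, y0) - func(x0 - h, y0)) / (2 * h)
    fy = (func(x0, y0 + h) - func(x0, y0 - h)) / (2 * h)
    return fx, fy

def tangent_plane(func, x0, y0):
    """Return z(x, y) for the first order Taylor approximation about (x0, y0)."""
    z0 = func(x0, y0)
    fx, fy = partials(func, x0, y0)
    return lambda x, y: z0 + (x - x0) * fx + (y - y0) * fy

x0, y0 = 1.0, 2.0
plane = tangent_plane(f, x0, y0)
fx, fy = partials(f, x0, y0)
normal = (-fx, -fy, 1.0)  # the vector in (12)

# A displacement between two points of the plane is orthogonal to the normal.
x, y = 1.3, 1.6
disp = (x - x0, y - y0, plane(x, y) - f(x0, y0))
dot = sum(n * d for n, d in zip(normal, disp))

# Close to (x0, y0) the plane is a good approximation to the surface.
gap = abs(f(1.01, 2.01) - plane(1.01, 2.01))
print(dot, gap)
```

Here `dot` is zero up to rounding, as (11) requires, and `gap` is of second order in the displacement from $(x_0, y_0)$.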
Recall from first-year Calculus the idea of implicit differentiation: if you have an equation that relates x and y
then you can treat y as a function of x and differentiate implicitly with respect to x to find the derivative $\frac{dy}{dx}$. For
example, given the equation
$$ x^2 + y^2 = 25 \tag{13} $$
which we recognize as the equation of a circle, we differentiate both sides with respect to x, treating y as a function
of x, to obtain
$$ 2x + 2y\frac{dy}{dx} = 0 \tag{14} $$
and then solve for the derivative to obtain
$$ \frac{dy}{dx} = -\frac{x}{y} \tag{15} $$
Geometrically, this expression gives the slope of the tangent line to the circle at a point (x, y) on the circle. (Note
that the expression is undefined at y = 0. Why?)
In this particular example, we could also have computed this derivative by solving for y in terms of x but the
computation is more complicated. First of all, solving for y:
$$ y = \pm\sqrt{25 - x^2} \tag{16} $$
There are two solutions for y, namely $y_1 = \sqrt{25 - x^2}$ and $y_2 = -\sqrt{25 - x^2}$, corresponding to the top half of the circle
($y \ge 0$) and the bottom half ($y \le 0$). (Recall that the $\sqrt{\phantom{x}}$ sign returns only the positive square root.) Nevertheless,
we can differentiate this expression (we’ll differentiate both cases simultaneously) to give
$$ \frac{dy}{dx} = \pm\frac{-x}{\sqrt{25 - x^2}} \tag{17} $$
Although this result differs in form from that in (15), it is equivalent. We can re-substitute for the $y_i$ functions to obtain
(15).
In general, however, such “brute-force” computations of derivatives are difficult if not impossible because of the
complexities of the expressions.
With an eye to our discussion of several variables, let us state the first-year calculus problem of finding $\frac{dy}{dx}$ by
implicit differentiation in a more general way. Suppose that we have a relation between x and y that can be
expressed in the form
$$ F(x, y) = 0 \tag{18} $$
And now suppose that we wish to consider y as a function of x. In other words, we say that (18) defines y implicitly
as a function of x. In this case, we consider F as a function F(x, y(x)) so that its variable dependency graph has the
form
[Variable dependency graph: F depends on x and y; y depends on x.]
Now differentiate F(x, y(x)) with respect to x using the chain rule,
$$ \frac{dF}{dx} = \frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\frac{dy}{dx} \tag{19} $$
Since F(x, y(x)) = 0 for all x, the left-hand side is zero, and we can now solve for the derivative as
$$ \frac{dy}{dx} = -\frac{F_x}{F_y} \tag{20} $$
For our previous example, $F(x, y) = x^2 + y^2 - 25$, we have $F_x = 2x$ and $F_y = 2y$, thus yielding the result obtained in (15).
There is no need to memorize this equation. It is only to show you what lies behind the process of implicit
differentiation, in order to move on to higher-dimensional problems.
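As a quick numerical check that the implicit result (20) and the explicit result (17) agree for the circle (13), consider the following sketch. The sample points (3, 4) and (3, −4) and the function names are illustrative assumptions.

```python
import math

def F_x(x, y):
    """Partial derivative of F(x, y) = x**2 + y**2 - 25 with respect to x."""
    return 2.0 * x

def F_y(x, y):
    """Partial derivative of F(x, y) = x**2 + y**2 - 25 with respect to y."""
    return 2.0 * y

def implicit_slope(x, y):
    """dy/dx from (20); undefined where F_y = 0, i.e. where y = 0."""
    return -F_x(x, y) / F_y(x, y)

def explicit_slope(x, sign=1.0):
    """dy/dx from (17) for the upper (sign=+1) or lower (sign=-1) half circle."""
    return sign * (-x) / math.sqrt(25.0 - x * x)

x = 3.0
top = (implicit_slope(x, 4.0), explicit_slope(x, +1.0))      # point (3, 4)
bottom = (implicit_slope(x, -4.0), explicit_slope(x, -1.0))  # point (3, -4)
print(top, bottom)
```

Both pairs agree, and the single formula (20) covers the top and bottom halves of the circle without having to solve for y first.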
Suppose now that three variables x, y and z are related by an equation of the form
$$ F(x, y, z) = 0 \tag{21} $$
In principle, we can now consider any one variable as a function of the other two variables, e.g.,
x = f1 (y, z) (22)
y = f2 (z, x)
z = f3 (x, y)
Note that we may not be able to actually solve for the functions fi in closed form. Nevertheless, we can consider
(21) to define – at least mathematically – one of the three variables implicitly as a function of the other two.
This is quite relevant to your study of Physical Chemistry. You will recall that one often wishes to characterize the
state of a gas in terms of the three variables P (pressure), V (volume) and T (temperature). The equation of state
of the gas will then take the form
F(P,V, T ) = 0 (23)
For example, the well-known ideal gas law (one mole) will assume the form
F(P,V, T ) = PV − RT = 0 (24)
In this case, of course, we can easily solve for each of the functions fi :
$$ P(V, T) = \frac{RT}{V}, \qquad V(T, P) = \frac{RT}{P}, \qquad T(P, V) = \frac{PV}{R} \tag{25} $$
However, for more complicated equations of state, e.g., a Van der Waals gas, it may not be possible to find closed-
form expressions for these functions. That is why implicit differentiation is useful.
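Example 5. Suppose we are experimenting with an ideal gas, so that it obeys the ideal gas equation PV − RT = 0,
and suppose we consider P as a function of V and T, i.e., P = P(V, T). Find the partial derivatives $\frac{\partial P}{\partial T}$ and $\frac{\partial P}{\partial V}$.
Since $P = RT/V$ from (25), direct differentiation gives
$$ \frac{\partial P}{\partial T} = \frac{R}{V}, \qquad \frac{\partial P}{\partial V} = -\frac{RT}{V^2} \tag{27} $$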
However, if we were to use implicit differentiation of F, then
But we don’t even have to do this. We can simply partially differentiate the original given equation, (32), implicitly
with respect to x, taking into consideration that z is a function of x and y:
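$$ 2xy^2z^3 + z\sin y + 3x^2y^2z^2\frac{\partial z}{\partial x} + x\sin y\,\frac{\partial z}{\partial x} = 0 \tag{34} $$
Now solve for the partial derivative:
$$ \frac{\partial z}{\partial x} = -\frac{2xy^2z^3 + z\sin y}{3x^2y^2z^2 + x\sin y} \tag{35} $$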
This is actually Example 12.20 from the textbook, Page 837. But note that we have not relied on any fancy formulas
– simple, straightforward differentiation will achieve the desired result.
We now want to talk about implicitly differentiating F(x, y, z) = 0 in general. First of all, we have to settle on what
variable is to be considered a function of the other variables. Suppose that we wish to consider z as a function of x
and y, which we may write either as
z = z(x, y) or z = f (x, y) (36)
We’ll use the former to avoid too many variables. The variable dependency graph associated with this assumption
is as follows
[Variable dependency graph: F depends on x, y and z; z depends on x and y.]
Once again, you shouldn’t need to memorize these final formulas. They have been presented only to outline the
method behind the process. If you perform the required implicit differentiation properly, you will be able to extract
the desired partial derivative.
We have already obtained the partials of z(x, y) with respect to x and y in (40). Now let us obtain the partials of (i)
x(y, z) and (ii) y(x, z).
We now derive some relations between the various partial derivatives involving these variables – you may have
seen or will see such relations in your physical chemistry courses.
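First of all, note that
$$ \left(\frac{\partial x}{\partial y}\right)_z = \frac{1}{(\partial y/\partial x)_z} \tag{47} $$
This works for all other combinations, i.e.,
$$ \left(\frac{\partial a}{\partial b}\right)_c = \frac{1}{(\partial b/\partial a)_c} \tag{48} $$
Also note that, for example,
$$ \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x \left(\frac{\partial z}{\partial x}\right)_y = \left(-\frac{F_y}{F_x}\right)\left(-\frac{F_z}{F_y}\right)\left(-\frac{F_x}{F_z}\right) = -1 \tag{49} $$
If we now let x = P, y = V and z = T and F(P, V, T) = 0 be the ideal gas relation PV − RT = 0, then the reader can
verify by straightforward calculation that
$$ \left(\frac{\partial P}{\partial V}\right)_T \left(\frac{\partial V}{\partial T}\right)_P \left(\frac{\partial T}{\partial P}\right)_V = -1 \tag{50} $$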
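The cyclic relation (50) is easy to confirm numerically for the ideal gas. The sketch below uses the explicit functions in (25); the numerical value of R and the chosen state (T, V) are illustrative assumptions.

```python
R = 8.314  # gas constant in J/(mol K); any positive value gives the same product

def dP_dV(V, T):
    """(dP/dV) at constant T, from P = R*T/V."""
    return -R * T / V**2

def dV_dT(T, P):
    """(dV/dT) at constant P, from V = R*T/P."""
    return R / P

def dT_dP(P, V):
    """(dT/dP) at constant V, from T = P*V/R."""
    return V / R

T, V = 300.0, 0.025  # an arbitrary state: kelvin and cubic metres per mole
P = R * T / V        # the pressure fixed by the ideal gas law

product = dP_dV(V, T) * dV_dT(T, P) * dT_dP(P, V)
print(product)
```

The product comes out as −1 up to rounding, not +1 as a naive "cancellation" of the differentials would suggest.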
7 Optimization
A recurring application of differential calculus is the use of the derivative of a function to find its maximum or
minimum values over its domain (given they exist). For a continuous function of a single variable, a critical point
of the function is any point where the derivative is zero or is undefined. At the point where the derivative is zero,
the function is neither increasing nor decreasing - it is locally ‘flat’. If the derivative is defined over the entire
domain of the function, there are three possible cases in which a critical point will occur
• For the function f(x), illustrated in Figure 23, left column, the point $(x^*, f(x^*))$ is a critical point at which
$f'(x^*) = 0$. The fact that $f''(x^*)$ is positive means that this critical point is a minimum.
• The function g(x), in Figure 23, central column, depicts another critical point at $x^*$. We know that this is a
maximum because $g''(x^*) < 0$.
• The function h(x) in Figure 23, right column, depicts yet another critical point at $x^*$. This is a point of
inflexion, which we know from the fact that $h''(x^*) = 0$ and that $h''(x)$ changes sign at $x^*$.
Figure 23: The extrema of a single variable function: Left: a minimum. Center: a maximum. Right: a point of
inflexion.
As with single variable functions, there will be points in the domain of a multivariable function where the function
will reach either a maximum or a minimum. Such points are more complex to find than in the single variable case.
For a function of two variables, f (x, y), we have the following definitions:
Definition 2 (Relative minimum). A point $(x^*, y^*)$ in the domain of a function f(x, y) is a relative minimum if there
exists a circle $C_R$, of radius R > 0, centered on $(x^*, y^*)$ such that $f(x, y) \ge f(x^*, y^*)$ for all points (x, y) inside $C_R$.
Definition 3 (Relative maximum). A point $(x^*, y^*)$ in the domain of a function f(x, y) is a relative maximum if
there exists a circle $C_R$, of radius R > 0, centered on $(x^*, y^*)$ such that $f(x, y) \le f(x^*, y^*)$ for all points (x, y) inside $C_R$.
We have seen that at a critical point of a single variable function f (x), the first derivative is such that f ′ (x) = 0,
meaning that the tangent to the graph is flat (that is, horizontal). In a multivariable function, the analogous condition
is that the tangent plane to the surface defined by the function is flat. To see this, consider Figure 24, left. The
function z = x2 + y2 is such that at (x, y) = (0, 0), z = 0. Consider a circle CR with radius R > 0, centered on
(x, y) = (0, 0). Every point in CR maps onto a point on the surface at which z ≥ 0, which is no less than the value
of z at (0, 0). Therefore we can conclude that the point (0, 0) is a critical point of the function z = x2 + y2 , at which
the function attains a relative minimum. Note that at the critical point, the function is locally flat and therefore has
a horizontal tangent plane at that point.
A similar argument holds for the local maximum of the function z = 1 − x2 − y2 illustrated in Figure 24, right.
Here, for a circle CR , of radius R > 0, centered on (0, 0), every point in the circle maps onto a point z ≤ 1. At (0, 0),
z = 1. Therefore (0, 0) is a critical point at which the function attains a relative maximum. Note that, once more,
the tangent plane to the surface is horizontal.
We have therefore seen that a critical point of a multivariable function is characterized by having a horizontal
tangent plane. Consider now the surface defined by z = f (x, y). This can be regarded as the level surface of a
function w = F(x, y, z) = f (x, y) − z, at which w = 0. The tangent plane to the level surface at a critical point is
horizontal. This means that the normal to the tangent plane is parallel with the unit vector k, which points in the
positive z direction. But we have already seen (in Section 5) that the normal to the tangent plane of a level surface
of a function such as w = f (x, y) − z is given by
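$$ \frac{\partial F}{\partial x}\mathbf{i} + \frac{\partial F}{\partial y}\mathbf{j} + \frac{\partial F}{\partial z}\mathbf{k} = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} - \mathbf{k} $$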
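[Figure 24: the surfaces $z = x^2 + y^2$ (left) and $z = 1 - x^2 - y^2$ (right), each shown with its horizontal tangent plane at (0, 0) and the circle $C_R$ in the xy-plane.]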
Figure 24: Two multi-variable functions with critical points at (x, y) = (0, 0). Left: a relative minimum. Right: a
relative maximum.
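Since the normal to the tangent plane is parallel with k, we may conclude that at a critical point of a function
f(x, y), the partial derivatives $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = 0$. That is to say
$$ \nabla f(x, y) = \begin{bmatrix} 0 \\ 0 \end{bmatrix} $$
For functions of three variables, such as w = f(x, y, z), critical points are those where
$$ \nabla f(x, y, z) = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \\ \frac{\partial f}{\partial z} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} $$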
Critical points of functions also occur when any of their partial derivatives are undefined (but we will not be
considering this case).
Example 7. Show that the critical points of the functions illustrated in Figure 24 are both located at (x, y) = (0, 0).
For the case z = x2 + y2 , the gradient is 2xi + 2yj, whilst for z = 1 − x2 − y2 the gradient is −2xi − 2yj. The
components of both gradients equal zero at (x, y) = (0, 0). This is therefore a critical point of both functions.
Example 8. Find the critical point of the function $z = f(x, y) = 2x^2 + 10y^2 - 6xy - 18x - 6y + 100$.
First, compute $\frac{\partial f}{\partial x}$:
$$ \frac{\partial f}{\partial x} = 4x - 6y - 18 $$
Next, compute $\frac{\partial f}{\partial y}$:
$$ \frac{\partial f}{\partial y} = 20y - 6x - 6 $$
At a critical point, the two partial derivatives, $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, must simultaneously equal zero. Solving the system of
equations
$$ 4x - 6y - 18 = 0 $$
$$ 20y - 6x - 6 = 0 $$
we find that these equations are satisfied when x = 9, y = 3. Therefore the critical point of f(x, y) is at (x, y) = (9, 3).
In Example 8, the function f(x, y) can be written as a sum of squares: $f(x, y) = 2x^2 + 10y^2 - 6xy - 18x - 6y + 100 =
(x - 3y)^2 + (y - 3)^2 + (x - 9)^2 + 10$. At the critical point, each of the squared terms vanishes, so that f(9, 3) = 10.
Since each of the squared terms is non-negative, it follows that f(x, y) ≥ 10 for all x, y. This shows that the critical
point of this function is a minimum.
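The arithmetic of Example 8 can be checked with exact rational arithmetic. The following is a minimal sketch; the helper `solve_2x2`, which applies Cramer's Rule to a 2 × 2 system, is an illustrative name and not something defined in the notes.

```python
from fractions import Fraction

def f(x, y):
    """The function of Example 8."""
    return 2*x**2 + 10*y**2 - 6*x*y - 18*x - 6*y + 100

def sum_of_squares(x, y):
    """The same function rewritten as a sum of squares plus 10."""
    return (x - 3*y)**2 + (y - 3)**2 + (x - 9)**2 + 10

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x + a12*y = b1, a21*x + a22*y = b2 by Cramer's Rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system")
    x = Fraction(b1 * a22 - a12 * b2, det)
    y = Fraction(a11 * b2 - a21 * b1, det)
    return x, y

# df/dx = 0 and df/dy = 0 written as  4x - 6y = 18  and  -6x + 20y = 6.
x_star, y_star = solve_2x2(4, -6, -6, 20, 18, 6)
f_star = f(x_star, y_star)
print(x_star, y_star, f_star)
```

Evaluating `f` and `sum_of_squares` at a few other points is a simple way to convince oneself that the two expressions are the same function.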
Determining whether a critical point is a relative minimum or a maximum in the above manner is not always
straightforward - in this case, we were able to re-write the function as a sum of squares, but this is not always
possible. Instead, we need a more systematic way of determining the nature of critical points for multivariable
functions. We shall focus on functions of two variables, such as f (x, y).
Suppose we determine that the critical point of a function f(x, y) is at $(x, y) = (x^*, y^*)$. This implies that $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$
both equal zero at $(x^*, y^*)$.
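Now consider the Taylor series expansion of f(x, y) about the point $(x^*, y^*)$, which, from (10) is
$$ \begin{aligned} f(x, y) &= f(x^*, y^*) + \underbrace{(x - x^*)\frac{\partial f}{\partial x}(x^*, y^*)}_{=0} + \underbrace{(y - y^*)\frac{\partial f}{\partial y}(x^*, y^*)}_{=0} + \frac{1}{2!}A(x - x^*)^2 + B(x - x^*)(y - y^*) + \frac{1}{2!}C(y - y^*)^2 + \cdots \\ &= f(x^*, y^*) + \frac{1}{2}A(x - x^*)^2 + B(x - x^*)(y - y^*) + \frac{1}{2}C(y - y^*)^2 + \cdots \end{aligned} $$
where
$$ A = \frac{\partial^2 f}{\partial x^2}(x^*, y^*), \qquad B = \frac{\partial^2 f}{\partial x \partial y}(x^*, y^*), \qquad C = \frac{\partial^2 f}{\partial y^2}(x^*, y^*) $$
Assuming that $A \ne 0$, we can complete the square in $(x - x^*)$ by adding and subtracting the term $\frac{1}{2}\frac{B^2}{A}(y - y^*)^2$:
$$ \begin{aligned} f(x, y) &= f(x^*, y^*) + \frac{1}{2}A(x - x^*)^2 + B(x - x^*)(y - y^*) + \frac{1}{2}\frac{B^2}{A}(y - y^*)^2 - \frac{1}{2}\frac{B^2}{A}(y - y^*)^2 + \frac{1}{2}C(y - y^*)^2 + \cdots \\ &= f(x^*, y^*) + \underbrace{\frac{1}{2}A\left[(x - x^*) + \frac{B}{A}(y - y^*)\right]^2}_{\text{Term 1}} + \underbrace{\frac{1}{2}\left(C - \frac{B^2}{A}\right)(y - y^*)^2}_{\text{Term 2}} + \cdots \end{aligned} \tag{51} $$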
Now suppose the magnitudes of the quantities x − x∗ and y − y∗ are small enough such that Term 1 and Term 2 in
(51) dominate higher order terms in x − x∗ and y − y∗ . We can then determine the nature of the critical point (x∗ , y∗ )
as follows:
Suppose first that $A > 0$ and $AC - B^2 > 0$. The fact that A > 0 means that Term 1 in (51) is non-negative. With A > 0, the inequality $AC - B^2 > 0$ also ensures that
Term 2 in (51) is non-negative. Therefore in a small neighborhood of $(x^*, y^*)$, we have $f(x, y) \ge f(x^*, y^*)$. Therefore
under these conditions, $(x^*, y^*)$ is a relative minimum.
Suppose next that $A < 0$ and $AC - B^2 > 0$. The fact that A < 0 means that Term 1 in (51) is non-positive. With A < 0, the inequality $AC - B^2 > 0$ also ensures that
Term 2 in (51) is non-positive. Therefore in a small neighborhood of $(x^*, y^*)$, we have $f(x, y) \le f(x^*, y^*)$. Therefore
under these conditions, $(x^*, y^*)$ is a relative maximum.
Suppose now that $AC - B^2 < 0$. To analyze this case, assume first that A > 0 (a similar argument holds in the case A < 0). This means that Term 1 in
(51) is non-negative whilst Term 2 is non-positive. Therefore along the line $y - y^* = 0$, sufficiently close to $(x^*, y^*)$,
the function f(x, y) will increase since Term 2 is zero. Along the line $(x - x^*) + \frac{B}{A}(y - y^*) = 0$, sufficiently close to
$(x^*, y^*)$, the function f(x, y) will decrease since Term 1 is zero. In other words, at this critical point, f(x, y) can be
either increasing or decreasing, depending on the direction of interest in the xy-plane. This is known as a saddle
point. An example of such a point is (x, y) = (0, 0) when mapped by the function $z = f(x, y) = x^2 - y^2$, sketched
in Figure 25. Along the line y = 0, this function is increasing in both the positive and negative x directions. Along
the line x = 0, this function is decreasing in both the positive and negative y directions.
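[Figure 25: the surface $z = f(x, y) = x^2 - y^2$, increasing ("inc") away from the saddle point along the x direction and decreasing ("dec") along the y direction.]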
Figure 25: A sketch of the function z = f (x, y) = x2 − y2 , with a saddle point at (x, y) = (0, 0).
Finally, suppose that $AC - B^2 = 0$. In this case, no conclusion can be drawn regarding the nature of the critical point from the second derivative test.
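The second derivative test can be summarised as a small decision procedure on the quantities A, B and C. The sketch below is one way to write it, assuming the standard form of the test in terms of the sign of $AC - B^2$; the function name `classify_critical_point` is illustrative.

```python
def classify_critical_point(A, B, C):
    """Second derivative test at a critical point of f(x, y).

    A, B and C are the second partial derivatives f_xx, f_xy and f_yy
    evaluated at the critical point.
    """
    D = A * C - B * B
    if D > 0 and A > 0:
        return "relative minimum"
    if D > 0 and A < 0:
        return "relative maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"  # D == 0: the test gives no information

examples = {
    "z = x^2 + y^2 at (0, 0)": classify_critical_point(2, 0, 2),
    "z = 1 - x^2 - y^2 at (0, 0)": classify_critical_point(-2, 0, -2),
    "z = x^2 - y^2 at (0, 0)": classify_critical_point(2, 0, -2),
    "Example 8 at (9, 3)": classify_critical_point(4, -6, 20),
}
for name, kind in examples.items():
    print(name, "->", kind)
```

Note that D > 0 forces A to be nonzero, so the first two branches cover every case with D > 0.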
We have now arrived at the most important aspect of optimization - determining the maximum and minimum
values attained by a function f (x, y) over a region R of interest. This is analogous to the problem from first-year
calculus of finding the absolute maximum and minimum values of a function f (x) over an interval [a, b].
Let us first state the important definitions:
1. The absolute maximum of a function f(x, y) on a region $R \subseteq \mathbb{R}^2$ is the largest value M attained by f(x, y)
on R, i.e.,
$$ f(x, y) \le M \quad \text{for all } (x, y) \in R. \tag{52} $$
The point(s) (a, b) at which f attains this maximum value is (are) called absolute maximum point(s).
2. The absolute minimum of a function f(x, y) on a region $R \subseteq \mathbb{R}^2$ is the least value m attained by f(x, y) on
R, i.e.,
$$ f(x, y) \ge m \quad \text{for all } (x, y) \in R. \tag{53} $$
The point(s) (a, b) at which f attains this minimum value is (are) called absolute minimum point(s).
Recall the procedure for finding absolute maximum and minimum values of a function f (x) over an interval [a, b]:
1. Determine all critical points of f (x) in [a, b] and evaluate f at these points.
2. Evaluate f(x) at the endpoints of [a, b], i.e., f(a) and f(b).
3. Select the largest and smallest values of f (x) attained at the points examined in Steps 1 and 2.
For functions f(x, y), the region R will be a two-dimensional region of $\mathbb{R}^2$, for example, the region contained inside
a circle or a rectangle. Such regions R will generally not have endpoints but will be enclosed by boundary curves.
The analogous procedure of finding the absolute maximum and minimum values of f (x, y) over region R will be
as follows:
1. Determine all critical points of f (x, y) in R and evaluate f at these points. (Note that we really don’t need to
spend time determining whether these critical points are relative maxima, minima or saddle points - it is the
value of the function f (x, y) that is usually more important.)
2. Determine the maximum and minimum values achieved by f (x, y) over the boundary curve(s) of R.
3. Select the largest and smallest values of f (x, y) attained at the points examined in Steps 1 and 2.
Note: There is one important theoretical technicality. How do we know that f attains an absolute maximum or
absolute minimum on a region R? In most of the examples that we shall encounter, R will be a closed and bounded
region of $\mathbb{R}^2$, e.g., a rectangle, circle or ellipse together with its interior. In such cases, if f(x, y) is a continuous function of x
and y, then it must attain absolute maximum and minimum values on R. (Recall the case of continuous functions
f(x) on closed intervals [a, b] in first-year calculus.)
Example 9. Find the maximum and minimum values of the function f (x, y) = x2 + xy + y2 over the square region
R defined by −1 ≤ x ≤ 1, −1 ≤ y ≤ 1
Step 1: Solve for all critical points of f (x, y) that lie in R. They must satisfy
$$ \frac{\partial f}{\partial x} = 2x + y = 0 $$
$$ \frac{\partial f}{\partial y} = x + 2y = 0 $$
The only solution of this system is (0, 0). At this point f (0, 0) = 0.
[Sketch: the square region R in the xy-plane, −1 ≤ x ≤ 1, −1 ≤ y ≤ 1.]
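Just for interest’s sake, we shall perform a second derivative test of this point: here $A = f_{xx} = 2$, $B = f_{xy} = 1$ and $C = f_{yy} = 2$, so that $AC - B^2 = 3 > 0$ with $A > 0$, and (0, 0) is a relative minimum.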
Step 2: We now determine the maximum and minimum values achieved by f (x, y) on the boundary of R.
(a) The line y = 1, with −1 ≤ x ≤ 1. On this line, the function f (x, y) is given by f (x, 1) = x2 + x + 1,
which we shall call g(x). The problem is now to find the max and min values of g(x) on [−1, 1], a
first-year calculus problem.
Since g′ (x) = 2x + 1, the critical point of g(x) is at x = −1/2. At this point g(−1/2) = 3/4.
We must also check the endpoints x = ±1: g(−1) = 1 and g(1) = 3.
(b) The line y = −1, with −1 ≤ x ≤ 1. Here, f(x, y) is given by $f(x, -1) = x^2 - x + 1$, which we shall call
h(x). Now find the max and min values of h(x) on [−1, 1].
Since $h'(x) = 2x - 1$, the critical point of h(x) is at x = 1/2. At this point h(1/2) = 3/4.
We also check the endpoints x = ±1: h(−1) = 3 and h(1) = 1.
(c) The line x = 1, with −1 ≤ y ≤ 1. Here, f(x, y) is given by $f(1, y) = 1 + y + y^2$, which we shall call k(y).
We must now find the max and min values of k(y) on [−1, 1]. This problem turns out to be identical to
the first case, except that x is now called y. But just to be complete, we’ll work it out in detail.
Since $k'(y) = 2y + 1$, the critical point of k(y) is at y = −1/2. At this point k(−1/2) = 3/4.
We also check the endpoints y = ±1: k(−1) = 1 and k(1) = 3.
(d) The line x = −1, with −1 ≤ y ≤ 1. Here, f(x, y) is given by $f(-1, y) = 1 - y + y^2$, which we shall call
l(y). Now find the max and min values of l(y) on [−1, 1].
Since $l'(y) = 2y - 1$, the critical point of l(y) is at y = 1/2. At this point l(1/2) = 3/4.
We also check the endpoints y = ±1: l(−1) = 3 and l(1) = 1.
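Reviewing all of the above results, we have found the following: On the region R, the function $f(x, y) = x^2 + xy + y^2$
achieves an absolute minimum value of 0 at (0, 0) and an absolute maximum value of 3 at the corners (1, 1) and (−1, −1).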
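The conclusions of Example 9 can be cross-checked by brute force. The sketch below does not follow the three-step procedure of the text; it simply evaluates f on a fine grid covering the square R, with the grid resolution `n` chosen arbitrarily, and picks out the smallest and largest values.

```python
def f(x, y):
    """The function of Example 9."""
    return x * x + x * y + y * y

n = 200  # number of grid intervals per side (an arbitrary choice)
grid = [-1.0 + 2.0 * i / n for i in range(n + 1)]  # includes -1, 0 and 1

# Evaluate f at every grid point of the square -1 <= x, y <= 1.
values = [(f(x, y), x, y) for x in grid for y in grid]

f_min, x_min, y_min = min(values)
f_max, x_max, y_max = max(values)
print("min:", f_min, "at", (x_min, y_min))
print("max:", f_max, "at", (x_max, y_max))
```

A grid search of this kind only samples the region, so it supports the analytical result rather than replacing it; it reports one of the two corners at which the maximum is attained.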
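[Sketch of the region R of Example 9: f = 0 (abs min) at the origin, f = 3/4 at the critical point on each of the four edges, and f = 1 or f = 3 (abs max) at the corners.]
Example 10. Find the absolute max/min values of $f(x, y) = x^2 + xy + y^2$ over the region $D = \{(x, y) \mid x^2 + y^2 \le 1,\ y \ge 0\}$.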
This is the same function as considered in the previous example but the region D is different. D is the semicircular
region that is enclosed by the circle x2 + y2 = 1 and the x-axis.
1. Step 1: Determine critical points of f in D. (0, 0) is the only critical point of f. It also lies in the region D.
Here f (0, 0) = 0.
2. Step 2: Examine f over the boundary of D. We examine the two curves that comprise the boundary.
(a) The line y = 0, −1 ≤ x ≤ 1, on which f (x, 0) = x2 . On this line, f achieves a minimum value of 0 at
(0, 0) (the critical point of Step 1) and the maximum value of 1 at (±1, 0).
(b) The semicircular curve x2 + y2 = 1, y ≥ 0. We can parameterize this curve as
Suppose that you are given some experimental data in the form of ordered pairs (Ti , Pi ), i = 1, 2, · · · , N that, when
plotted, suggest that there is some kind of relationship between the Ti and Pi , as sketched in Figure 26.
From Figure 26, it appears that as property T increases, so does property P. (For example, this could be a plot
of pressure P vs. temperature T of a gas at fixed volume V.) Of course, one would like to be able to describe the
relationship a little better than this, perhaps in the form of a functional relationship, i.e.,
P = f (T ) (57)
[Figure 26: scatter plot of the data points $(T_i, P_i)$, with T running from $T_1$ to $T_N$ and P from $P_1$ to $P_N$.]
In practice, we have to acknowledge that there are errors in the data points, so we would not demand that the
graph of f (T ) would necessarily pass through all data points. Note that if the number of points is small, and we
allow f (T ) to be a polynomial of sufficiently high degree, we could fit a curve through these points. An example
is presented in Figure 27. We would probably expect the relationship between T and P to be much less
complicated, i.e. not as “bumpy”, which implies that the function f (T ) would be much simpler in form.
[Figure 27: the same data points $(T_i, P_i)$ with a high-degree polynomial curve passing through every point.]
One of the simplest representations which is useful in many applications is the straight line, i.e., that the following
fundamental relationship underlies the pattern seen in the experimental data,
P = f (T ) = aT + b (58)
where a and b are constants. In other words, we shall try to produce a straight line approximation to the data points
so that
Pi ≈ aTi + b, i = 1, 2, · · · , N (59)
is a good approximation, as sketched in Figure 28.
Question: “What is the line that ‘best fits’ the data points?”
Answer: There are actually many “best lines”: It all depends on the “measure of fit” that you use.
In some way, we would like our straight-line fitting of the data to minimize the error of the fit. The situation at
each value of Ti is sketched in Figure 29.
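[Figure 28: the data points $(T_i, P_i)$, from $(T_1, P_1)$ to $(T_N, P_N)$, with the fitted straight line $P = aT + b$.]
[Figure 29: at $T = T_i$, the actual data value $P_i$, the predicted value $aT_i + b$ on the line $P = aT + b$, and the error $e_i$ between them.]
Figure 29: The error $e_i$ between the data point $(T_i, P_i)$ and the prediction given by the fitted line P = aT + b.
The error at the i-th data point is therefore
$$ e_i = P_i - (aT_i + b) \tag{60} $$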
Note that this error can be either positive or negative (or zero). As such, it would not be a good idea to consider
the total error of the fit to be simply the sum of the errors, i.e.,
$$ S = \sum_{i=1}^{N} e_i \tag{61} $$
since we could have a large positive error cancelling a large negative error – the fit would be bad but the net error
would be close to zero.
Thus, in some way, we should look at the magnitudes of the errors. There are several possibilities, including
1. The sum of the absolute values of the errors,
$$ S = \sum_{i=1}^{N} |e_i| \tag{62} $$
This type of fitting is used quite often in statistical applications and is known as an “L1 fit.” Unfortunately,
it is somewhat complicated to perform because of the absolute values. (That being said, there is software
available to perform the fit.) A much easier method is to consider
2. The sum of the squares of the errors,
$$ S = \sum_{i=1}^{N} (e_i)^2 \tag{63} $$
This measure, which is very commonly employed in scientific applications, is the basis of the so-called
“method of least squares”.
Of course, the question still remains: How do we find the best-fitting line according to least squares? The answer
is that we consider the sum S of squares as a function of the parameters a and b:
$$ S(a, b) = \sum_{i=1}^{N} (e_i)^2 = \sum_{i=1}^{N} [P_i - (aT_i + b)]^2 \tag{64} $$
(Recall that the data points (Ti , Pi ), 1 ≤ i ≤ N, as well as N, are given.) The “best” values of a and b are those that
minimize the function S(a, b).
The minimum of S(a, b) must occur at a critical point of S(a, b), that is, at a point (a, b) for which the partial derivatives
$\frac{\partial S}{\partial a}(a, b)$ and $\frac{\partial S}{\partial b}(a, b)$ are both zero, or for which at least one of the derivatives fails to exist.
We can compute both partial derivatives in a straightforward way – keep in mind that the Ti and Pi are constants
and we are differentiating with respect to a and b:
$$ \frac{\partial S}{\partial a} = 2\sum_{i=1}^{N} [P_i - aT_i - b](-T_i) = -2\sum_{i=1}^{N} [P_i - aT_i - b]\,T_i \tag{65} $$
$$ \frac{\partial S}{\partial b} = 2\sum_{i=1}^{N} [P_i - aT_i - b](-1) = -2\sum_{i=1}^{N} [P_i - aT_i - b] $$
Since both partial derivatives exist, the condition for a critical point is that they both vanish. Setting $\frac{\partial S}{\partial a}(a, b) = 0$,
dividing by (−2) and expanding the sums yields the condition
$$ \sum_{i=1}^{N} T_i P_i - \sum_{i=1}^{N} aT_i^2 - \sum_{i=1}^{N} bT_i = 0 \tag{66} $$
Similarly, setting $\frac{\partial S}{\partial b}(a, b) = 0$, dividing by (−2) and expanding the sums yields the condition
$$ \sum_{i=1}^{N} P_i - \sum_{i=1}^{N} aT_i - \sum_{i=1}^{N} b = 0. \tag{67} $$
We can take a and b out of the summations above and rearrange the equations to produce the following
$$ \left(\sum_{i=1}^{N} T_i^2\right) a + \left(\sum_{i=1}^{N} T_i\right) b = \sum_{i=1}^{N} T_i P_i \tag{68} $$
$$ \left(\sum_{i=1}^{N} T_i\right) a + N b = \sum_{i=1}^{N} P_i $$
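Equation (68) is a linear system of equations in the unknowns a and b of the form
$$ A_{11}\,a + A_{12}\,b = B_1, \qquad A_{21}\,a + A_{22}\,b = B_2 $$
with $A_{11} = \sum T_i^2$, $A_{12} = A_{21} = \sum T_i$, $A_{22} = N$, $B_1 = \sum T_i P_i$ and $B_2 = \sum P_i$.
(Note that $A_{21} = A_{12}$.) The coefficients $A_{ij}$ and $B_j$ of a and b are determined from the data points $(T_i, P_i)$. This linear
system can be solved by elimination or by Cramer’s Rule, provided that the determinant of the system is nonzero,
i.e.,
$$ D = A_{11}A_{22} - A_{21}A_{12} \ne 0. \tag{71} $$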
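The normal equations (68) translate directly into code. The following is a minimal sketch that builds the coefficients from the data and solves the 2 × 2 system by Cramer's Rule; the function name `fit_line`, the variable names `A11`, ..., `B2` for the coefficients of (68), and the sample data are all illustrative assumptions.

```python
def fit_line(T, P):
    """Least squares fit of P = a*T + b via the normal equations (68)."""
    N = len(T)
    A11 = sum(t * t for t in T)   # coefficient of a in the first equation
    A12 = sum(T)                  # coefficient of b in the first equation
    A21 = A12                     # coefficient of a in the second equation
    A22 = N                       # coefficient of b in the second equation
    B1 = sum(t * p for t, p in zip(T, P))
    B2 = sum(P)
    D = A11 * A22 - A21 * A12     # determinant, as in (71)
    if D == 0:
        raise ValueError("determinant is zero: all T values coincide")
    a = (B1 * A22 - A12 * B2) / D  # Cramer's Rule for a
    b = (A11 * B2 - A21 * B1) / D  # Cramer's Rule for b
    return a, b

# Made-up data scattered about the line P = 2T + 1.
T = [0.0, 1.0, 2.0, 3.0, 4.0]
P = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b = fit_line(T, P)
print(a, b)
```

The determinant vanishes only when every $T_i$ is the same, in which case no unique line can be fitted; data that lie exactly on a line are recovered exactly.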
Notes:
1. The method of least squares is easily extended to consider higher order polynomial fits of data. For example,
suppose that we wish to fit the data points $(x_i, y_i)$, 1 ≤ i ≤ N, with a quadratic, i.e.
$$ y = ax^2 + bx + c \tag{72} $$
where a, b and c are parameters to be determined. We would then consider the following sum of squared
errors,
$$ S(a, b, c) = \sum_{i=1}^{N} [y_i - ax_i^2 - bx_i - c]^2 \tag{73} $$
and impose the conditions for a critical point,
$$ \frac{\partial S}{\partial a} = \frac{\partial S}{\partial b} = \frac{\partial S}{\partial c} = 0. \tag{74} $$
This leads to a set of three linear equations in a, b and c which then can be solved.
2. Sometimes, the relation between the $x_i$ and $y_i$ is more complicated than a linear relation, for example,
$$ y = bx^a, \tag{75} $$
once again where a and b are to be determined. This problem can be recast into a linear problem if we take
logarithms of both sides, i.e.
$$ \log y = a \log x + \log b. \tag{76} $$
We now consider our data points to be
$$ (u_i, v_i) = (\log x_i, \log y_i), \quad 1 \le i \le N. $$
A plot of $v_i$ vs. $u_i$ will be roughly approximated by a straight line with slope a and v-intercept log b. To determine
the best values of a and log b (and hence b), we employ the method of least squares on the data $(u_i, v_i)$.
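The log transformation in (76) can be sketched as follows. The code fits a straight line to the transformed data by least squares and then recovers b by exponentiating the intercept; the function name `fit_power_law` and the synthetic data, generated from $y = 3x^{1.5}$ without noise, are illustrative assumptions.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = b * x**a by least squares on (log x, log y); needs x, y > 0."""
    u = [math.log(x) for x in xs]
    v = [math.log(y) for y in ys]
    N = len(u)
    Su, Sv = sum(u), sum(v)
    Suu = sum(ui * ui for ui in u)
    Suv = sum(ui * vi for ui, vi in zip(u, v))
    D = N * Suu - Su * Su            # determinant of the normal equations
    if D == 0:
        raise ValueError("all x values coincide")
    a = (N * Suv - Su * Sv) / D      # slope of the fitted line
    log_b = (Suu * Sv - Su * Suv) / D  # intercept of the fitted line
    return a, math.exp(log_b)

# Synthetic, noise-free data from y = 3 * x**1.5.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x**1.5 for x in xs]
a, b = fit_power_law(xs, ys)
print(a, b)
```

One design point worth noting: minimizing the squared errors in log y is not the same as minimizing the squared errors in y itself, so with noisy data the two approaches generally give slightly different parameters.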
The optimization problems we have looked at so far have involved finding the maximum or minimum of a function
z = f (x, y) on its domain. We saw that at any point (x, y) in the domain of f (x, y) where ∇ f = 0, the function will
have a critical point, which may be a relative minimum, or a relative maximum or saddle point.
Suppose now that we are interested in finding the minima and maxima of f (x, y) (called the objective function),
but only on a pre-specified set of points in its domain. Suppose also that these points are those that satisfy the
constraint g(x, y) = 0. In such a case, this optimization problem is termed a constrained optimization problem.
This problem can be formally stated as
Minimize/maximize z = f (x, y)
subject to g(x, y) = 0
To visualize this, remember that for the unconstrained problem, finding maxima or minima of z when z = f (x, y)
involved finding points in the (x, y) plane where the surface defined by the function z = f (x, y) is locally flat (i.e.
has a horizontal tangent plane). In the constrained problem, note that the constraint g(x, y) = 0 defines a curve in
the xy-plane and a surface in the xyz-space. The intersection of the surfaces defined by z = f (x, y) and g(x, y) = 0
defines a curve in the xyz-space. Therefore in the constrained optimization problem, we seek the extreme (i.e.
minimum or maximum) value of z attained by the curve resulting from the intersection of the surfaces given by
z = f (x, y) and g(x, y) = 0. Rather than seeking the maxima/minima of the two dimensional surface z = f (x, y),
the dimensionality of the problem is reduced by the number of constraints (in this case, by one) so that we seek the
maxima/minima of the one dimensional curve given by the intersection of z = f (x, y) and g(x, y) = 0.
The following examples will illustrate this idea.
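Example 11. Find (1) the minimum of z = x^2 + y^2 over the xy-plane, and (2) the minimum of z = x^2 + y^2
subject to the constraint x = 1.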
In the first problem, illustrated in Figure 30, the minimum of z across the xy-plane can be found by inspection to
be at (x, y) = (0, 0). At this point, z = 0.
[Figure 30: the surface z = x^2 + y^2 over the xy-plane, with its minimum at (x, y) = (0, 0).]
For the second problem, illustrated in Figure 31, note that by substituting the constraint x = 1 into the function
z = x^2 + y^2, we obtain z = 1 + y^2, which is a parabola. This is the same parabola illustrated in Figure 31, right,
which results from the intersection of the surfaces x = 1 and z = x^2 + y^2. By inspection, this curve has a minimum
at y = 0, and therefore the minimum of z = x^2 + y^2 along the constraint x = 1 occurs at (x, y) = (1, 0), at which
z = 1.
[Figure 31: left, the surface z = x^2 + y^2 cut by the plane x = 1; right, the parabola z = 1 + y^2 in the plane x = 1. Minimum at (x, y) = (1, 0).]
Example 12. Find the shortest distance between the origin and the plane x + y + z = 1.
This can be solved in two ways. The first way is to find the length of the line segment that starts at the origin and
meets the plane perpendicularly, as shown in Figure 32. The line perpendicular to the plane is parallel to
the normal vector to the plane, n = [ 1 1 1 ]^T. Since it passes through the origin, the parametric equation of
the position vector r(t) of points on this line is
r(t) = [ x(t) y(t) z(t) ]^T = [ 1 1 1 ]^T t
This line intersects the plane x + y + z = 1 when
n^T r(t) = [ 1 1 1 ] r(t) = 3t = 1 ⇒ t = 1/3
At t = 1/3, the position vector is r(1/3) = (1/3) [ 1 1 1 ]^T. The minimum distance to the plane from the origin is the
length of the vector r(1/3), which is 1/√3. Note that at the point of intersection of this line, the sphere of radius 1/√3
just touches the plane x + y + z = 1, as shown in Figure 32.
Figure 32: The shortest distance from the origin to the plane x + y + z = 1. The vector n is normal to the plane.
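The perpendicular-line construction can be sketched in a few lines of Python. The helper name closest_point_on_plane is hypothetical, and the sketch assumes the plane is written as n · r = d with a non-zero normal n; for the plane x + y + z = 1 this means n = [ 1 1 1 ] and d = 1.

```python
import math

def closest_point_on_plane(n, d):
    """Point on the plane n . r = d closest to the origin, and its distance."""
    n_dot_n = sum(c * c for c in n)
    # The line r(t) = n t meets the plane when (n . n) t = d
    t = d / n_dot_n
    point = [c * t for c in n]
    distance = math.sqrt(sum(c * c for c in point))
    return point, distance

# The plane x + y + z = 1 from Example 12
point, distance = closest_point_on_plane([1.0, 1.0, 1.0], 1.0)
```

For this plane the sketch gives t = 1/3, so the closest point and distance should agree with the values found above.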
The second way is to pose the problem as the constrained optimization
Minimize h(x, y, z) = √(x^2 + y^2 + z^2)
Subject to x + y + z = 1
Since the square root is an increasing function, h is smallest exactly where its square is smallest, so this is equivalent to
Minimize m(x, y, z) = x^2 + y^2 + z^2
Subject to x + y + z = 1
Level sets of the function w = m(x, y, z) are spheres of radius w^(1/2), centered on the origin, as shown in Figure 33.
Therefore this minimization problem is equivalent to finding the smallest sphere w = m(x, y, z) that
just touches the plane x + y + z = 1, as in Figure 32.
[Figure 33: level sets of w = m(x, y, z) for w = 1, 4, 9 and 16, plotted in xyz-space.]
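To solve this problem, we rearrange the constraint as
z = 1 − x − y
and substitute this condition into the objective function m(x, y, z) to obtain
m(x, y, 1 − x − y) = x^2 + y^2 + (1 − x − y)^2 = 2x^2 + 2y^2 − 2x − 2y + 2xy + 1
which is to be minimized over the xy-plane. To do this, we solve for the critical point(s) in the xy-plane where
∇m(x, y, 1 − x − y) = [ 4x − 2 + 2y   4y − 2 + 2x ]^T = [ 0 0 ]^T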
Solving these simultaneous equations, we find that the only critical point is x = y = 1/3. To ensure that this is a
minimum, it can be verified that the second derivative test yields A = C = 4 > 0 and AC − B^2 = 12 > 0. On the
plane x + y + z = 1, x = y = 1/3 ⇒ z = 1/3. At the point (x, y, z) = (1/3, 1/3, 1/3), w = 1/3, which corresponds to the sphere
of radius 1/√3, which is the distance from the origin to the constraint plane, in agreement with the previous method.
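As a cross-check of the substitution method, the following Python sketch solves the two gradient equations by Cramer's rule and evaluates the second derivative test. The function name m_reduced and the variable names are chosen here for illustration; the coefficients come from setting the gradient of 2x^2 + 2y^2 − 2x − 2y + 2xy + 1 to zero.

```python
import math

def m_reduced(x, y):
    """Objective m(x, y, z) with the constraint z = 1 - x - y substituted in."""
    z = 1.0 - x - y
    return x * x + y * y + z * z

# Setting the gradient to zero gives the linear system
#   4x + 2y = 2
#   2x + 4y = 2
A, B, C = 4.0, 2.0, 4.0            # second derivatives m_xx, m_xy, m_yy
rhs_x, rhs_y = 2.0, 2.0
det = A * C - B * B                # also the discriminant AC - B^2 of the test
x = (rhs_x * C - B * rhs_y) / det  # Cramer's rule
y = (A * rhs_y - B * rhs_x) / det
z = 1.0 - x - y
w = m_reduced(x, y)
radius = math.sqrt(w)              # distance from the origin to the plane
```

Because the reduced objective is quadratic, its gradient equations are linear and a direct solve is enough; a general objective would need a nonlinear solver instead.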
Note the parallels between this optimization and that of Example 11. There, the constraint x = 1 reduced the
problem from optimizing over the two-dimensional surface z = f (x, y) = x^2 + y^2 to optimizing over the
one-dimensional curve defined by
z = 1 + y^2, x = 1
If the optimization in this example were unconstrained, then the goal would be to minimize the function w =
m(x, y, z) = x^2 + y^2 + z^2 over the entire three-dimensional space, R^3, as illustrated in Figure 33. This would yield
a minimum at the origin (that is, the sphere of radius zero). In the constrained case however, the constraint
x + y + z = 1 reduces the space over which we seek the minimum of m(x, y, z) from three dimensions to the
two-dimensional surface given by m(x, y, 1 − x − y) = 2x^2 + 2y^2 − 2x − 2y + 2xy + 1.
For constrained optimizations, we have seen examples where the constraint can be re-arranged and substituted
into the objective function to be maximized or minimized. This is not always easy or straightforward. One
example where substitution is awkward is the optimization
Maximize f (x, y) = xy
Subject to g(x, y) = x^2 + y^2 − 1 = 0
Instead, we turn to another method of finding the critical points of the objective function f (x, y) on the constraint.
Figure 34 shows some of the level sets of f (x, y) plotted on the xy-plane, along with the constraint g(x, y) =
x^2 + y^2 − 1 = 0 (which is the unit circle). Visually, we can see that the maxima occur at the points where the
highest level hyperbola just touches the unit circle, at P, (x, y) = (1/√2, 1/√2), and at R, (x, y) = (−1/√2, −1/√2). Note
also that there are two minima, at Q, (x, y) = (1/√2, −1/√2), and at S, (x, y) = (−1/√2, 1/√2).
Note that at P the vector T_P is tangent to both the level curve f (x, y) = 0.5 and the constraint curve g(x, y) = 0,
as are the vectors T_Q at Q, T_R at R and T_S at S. Note also that the gradient vectors ∇ f and ∇g are parallel at
P, Q, R, S.
Figure 34: Level curves of f (x, y) = xy and the constraint x^2 + y^2 = 1. Arrows T_P, T_Q, T_R, T_S represent tangent
vectors to both the level curves and the constraint curve at P, Q, R, S respectively.
To see why this is the case, note that for the unconstrained optimization of a function z = f (x, y), a point in the
xy-plane is a critical point of z = f (x, y) if every curve r(t) = [ x(t) y(t) ]^T passing through the point, in
any direction, satisfies there
dz/dt = ∇ f · dr(t)/dt = 0
Since dr(t)/dt can represent a vector in any direction, this condition requires that ∇ f = 0 for the point to be a critical
one.
However, in the constrained optimization case, we require that dz/dt = 0 along the constraint direction only. In other
words, for a point on a constraint to be a critical one, we require that the directional derivative of f (x, y) be zero
only along the constraint. If T̂ represents a unit tangent vector to the constraint (which is a vector along the
constraint), then a critical point on the constraint is characterized by the condition
∇ f · T̂ = 0
which means that the gradient ∇ f is orthogonal to any vector that is tangent to the constraint. Now since the
gradient of the constraint ∇g is also orthogonal to any tangent to the constraint, it must follow that ∇ f and ∇g are
parallel. Two parallel vectors can therefore be related by a scaling λ (which, in this case, we call the Lagrange
multiplier):
∇ f = λ ∇g (78)
In order to solve for the coordinates of the critical point, (78) gives two simultaneous equations:
∂f/∂x = λ ∂g/∂x
∂f/∂y = λ ∂g/∂y
and the constraint
g(x, y) = 0
gives a third equation. Since there are three unknowns, x, y and λ, these equations can be solved for the critical point(s)
(x, y).
For the above example, the simultaneous equations to solve are
∂f/∂x = λ ∂g/∂x ⇒ y = 2λx
∂f/∂y = λ ∂g/∂y ⇒ x = 2λy
g(x, y) = 0 ⇒ x^2 + y^2 − 1 = 0
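Solving these equations gives four stationary points (as expected from Figure 34) at which we can evaluate f (x, y):
P: (x, y) = (1/√2, 1/√2), λ = 1/2, f = 1/2 (maximum)
Q: (x, y) = (1/√2, −1/√2), λ = −1/2, f = −1/2 (minimum)
R: (x, y) = (−1/√2, −1/√2), λ = 1/2, f = 1/2 (maximum)
S: (x, y) = (−1/√2, 1/√2), λ = −1/2, f = −1/2 (minimum)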
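The four stationary points can be checked numerically. The Python sketch below is only a verification under stated assumptions: the candidate points and multipliers are worked out by hand (substituting x = 2λy into y = 2λx gives 4λ^2 = 1, so λ = 1/2 with y = x, or λ = −1/2 with y = −x), and the names grad_f, grad_g and candidates are chosen here for illustration.

```python
import math

def grad_f(x, y):
    return (y, x)            # gradient of f(x, y) = x*y

def grad_g(x, y):
    return (2 * x, 2 * y)    # gradient of g(x, y) = x**2 + y**2 - 1

s = 1 / math.sqrt(2)
# Hand-derived candidates: (x, y, lambda) for each labelled point
candidates = {
    "P": (s, s, 0.5),
    "Q": (s, -s, -0.5),
    "R": (-s, -s, 0.5),
    "S": (-s, s, -0.5),
}

values = {}
for name, (x, y, lam) in candidates.items():
    fx, fy = grad_f(x, y)
    gx, gy = grad_g(x, y)
    # Residuals of grad f = lambda * grad g and of the constraint g = 0
    residual = max(abs(fx - lam * gx), abs(fy - lam * gy), abs(x * x + y * y - 1))
    assert residual < 1e-12
    values[name] = x * y     # objective value at the stationary point
```

Each candidate should satisfy all three equations to rounding error, and comparing the entries of values then separates the maxima from the minima.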
The same reasoning extends to problems with more than one constraint. Consider the problem
Minimize w = f (x, y, z)
Subject to g(x, y, z) = 0 and h(x, y, z) = 0
The function w = f (x, y, z) maps points in three dimensional space onto a variable w. Each of the constraints
g(x, y, z) = 0 and h(x, y, z) = 0 defines a surface in three-dimensional space. Their intersection defines a curve C
in three-dimensional space, as shown in Figure 35. The function w = f (x, y, z) takes a value at each point on this
curve, and we wish to find the point(s) on this curve where w is at a minimum.
At each point (x, y, z), the vector ∇g is orthogonal to the tangent plane of the surface g(x, y, z) = 0. Similarly, the
vector ∇h is orthogonal to the tangent plane of the surface h(x, y, z) = 0. The curve C lies along both g(x, y, z) = 0
and h(x, y, z) = 0, and therefore ∇g and ∇h are both normal to C, and lie in a plane P that C crosses orthogonally. At
the same time, the vector ∇g × ∇h is orthogonal to both ∇g and ∇h (and hence orthogonal to plane P) and is therefore
tangential to C.
Now a critical point of w = f (x, y, z) on C is such that the directional derivative of f in a direction tangential to
C is zero. In other words
∇ f · (∇g × ∇h) = 0
This means that ∇ f is orthogonal to C and therefore must lie in the plane P. For this reason, we can write the
vector ∇ f as a linear combination of ∇g and ∇h (assuming that ∇g and ∇h are not parallel). In other words, using the
Lagrange multipliers λ and µ, we have
∇ f = λ ∇g + µ ∇h
In the five unknowns x, y, z, λ, µ we therefore have the five equations
∂f/∂x = λ ∂g/∂x + µ ∂h/∂x
∂f/∂y = λ ∂g/∂y + µ ∂h/∂y
∂f/∂z = λ ∂g/∂z + µ ∂h/∂z
g(x, y, z) = 0
h(x, y, z) = 0
Figure 35: Intersection curve C of the two surfaces defined by the constraints g(x, y, z) = 0 and h(x, y, z) = 0. A
vector tangent to the curve C and orthogonal to the plane P containing the two vectors ∇g and ∇h is given by
∇g × ∇h.
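To make the two-constraint conditions concrete, here is a small Python check on a hypothetical example that does not appear in the notes: minimize f = x^2 + y^2 + z^2 subject to g = x + y + z − 1 = 0 and h = z = 0. Solving the five equations by hand gives 2x = λ, 2y = λ, 2z = λ + µ, so x = y = 1/2, z = 0, λ = 1 and µ = −1. The sketch verifies both the tangency condition and the multiplier equation at that point.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(ai * bi for ai, bi in zip(a, b))

# Hand-derived solution of the hypothetical example
x, y, z = 0.5, 0.5, 0.0
lam, mu = 1.0, -1.0

grad_f = (2 * x, 2 * y, 2 * z)   # gradient of x^2 + y^2 + z^2
grad_g = (1.0, 1.0, 1.0)         # gradient of x + y + z - 1
grad_h = (0.0, 0.0, 1.0)         # gradient of z

tangent = cross(grad_g, grad_h)             # direction of the curve C
tangential_derivative = dot(grad_f, tangent)
combination = tuple(lam * g + mu * h for g, h in zip(grad_g, grad_h))
```

Here C is the line x + y = 1 in the plane z = 0, and the point found is the point of that line closest to the origin.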