Text Differential Calculus
This text is intended for UBC Math 100. It consists primarily of content drawn from three
open-source textbooks:
• CLP-1 Differential Calculus by Joel Feldman, Andrew Rechnitzer, and Elyse Yeager
Copyright © 2016–24 CC-BY-NC-SA 4.0
• Optimal, Integral, Likely prepared by Bruno Belevan, Parham Hamidi, Nisha Malhotra, and
Elyse Yeager Copyright © 2020-21 CC-BY-NC-SA, which is itself largely based on
CLP-3 Multivariable Calculus by Joel Feldman, Andrew Rechnitzer, and Elyse Yeager
Copyright © 2016–2024 CC-BY-NC-SA 4.0
– The unnumbered section “Using the arithmetic of derivatives – examples” (starting page
118 in the textbook) is adapted from section 2.6.
– Section 4.2 is adapted from section 2.8.
– Section 4.3 is adapted from section 2.9.
– Section 4.4 is adapted from section 2.10.
– Section 4.5 is adapted from section 2.11.
– Section 4.6 is adapted from sections 0.6 and 2.12.
– The introduction to Chapter 4.7 is adapted from the introduction to chapter 3. The rest
of section 4.7 is adapted from section 2.12.
– The introduction and Sections 8.1, 8.2, and 8.3 are adapted from CLP section 3.5.
– Section 8.4 is adapted from Keshet, chapter 7.
• Chapter 14 is adapted from OIL Chapter 1, which is itself based on CLP–3 Chapter 1; except
subsection 14.1.1, which was Appendix A.1 in OIL, and does not appear in CLP–3.
• Chapter 15 is adapted from OIL Sections 2.1-2.2, which are based on CLP–2 chapter 2.
• Chapter 16 is adapted from OIL Sections 2.3-2.5, which are based on CLP–3 chapter 2.
CONTENTS
0 Introduction
0.1 About this book
0.1.1 Origins
0.1.2 Learning objectives
0.1.3 Flavours
0.2 Writing mathematics
Pre-calculus
1 Power functions as building blocks
1.1 Power functions
1.2 First steps in graph sketching
1.3 Rate of reaction
1.4 (optional) Predator response
1.5 Familiar functions
2 Limits
2.1 Quick review of limits
2.1.1 Calculating limits with limit laws
2.1.2 Limits at infinity
2.2 Asymptotes
2.3 Limits and continuity
Differentiation
3 Introduction to the Derivative
3.1 Review: lines
3.1.1 Equations and sketches
3.1.2 Different equation forms
3.1.3 Slopes at different points
8 Optimization
8.1 Local and global maxima and minima
8.2 Finding global maxima and minima
8.3 Max/min examples
8.4 Sample optimization problems
8.4.1 Density dependent (logistic) growth in a population
8.4.2 Wine for Kepler’s wedding
8.4.3 Optimal foraging
8.5 Summary
Chapter 0
INTRODUCTION
Learning Objectives
• Solve a long question by breaking it up into smaller pieces.
INTRODUCTION 0.2 WRITING MATHEMATICS
• Understand some basic ideas about what constitutes a proof in mathematics; understand
the differences between how something is defined and how it is computed.
0.1.3 Flavours
Math 100 has three flavours. All of them use this document. Some content is shared between all
flavours, and some is not.
Content that isn’t shared by all flavours is marked. For example, Chapter 14 has “Flavour C” in
its title, and a coloured bar running along the left margin. If you’re in Flavour C, this content will be
covered in class and homework, and is examinable. If you’re in flavours A or B, this chapter won’t
show up in class or on exams. You may wish to self-study content from other flavours (especially
smaller pieces of content, like a single example) for personal interest, or to deepen your familiarity
with common concepts, but doing so is purely optional.
Sometimes content looks a lot like one flavour (for example, an exercise about finding interest
on an investment would look like Flavour C) but uses mathematics common to all flavours. These
are generally not marked as flavoured, since the content may be helpful for all flavours.
• Use the symbols given in the problem statement (e.g., x). If you need to introduce a new
symbol, clearly define it (e.g., “let y = length of desk”, or label it on a diagram).
• Make sure your statements are unambiguous. In general, the standalone fragment “= 5” is
incomplete and unclear (what is it that “equals five”?), but “x = 5” is a complete mathematical
statement (as long as your reader knows what x represents).
• Be particularly careful with ambiguity and clarity when doing multi-line calculations. For
instance, although it is reasonable to not repeat the left-hand side of an equality if it remains
the same line by line, e.g.
\begin{align*}
f(x) &= (x + 5)(x - 2) \\
     &= x^2 + 3x - 10,
\end{align*}
• Use English words to explain any reasoning that is not captured by your mathematical notation.
• If you perform multiple different calculations within one solution, label them (e.g., “finding
critical pts”, “finding minimum”).
• If you make any assumptions, state them.
If you are writing a solution more formally, such as for a written assignment, then you should follow
the above guidelines, and additionally:
• Explain what you are doing at each step. For instance, are you taking a derivative of a function
$f(x)$? Write it: “Taking the derivative of $f(x)$ reveals $f'(x) = \ldots$”.
• Be clear. Instead of “It is nonzero because…”, write “The function is nonzero because…”.
• Use appropriate spelling, grammar, and punctuation – including ending sentences with a
period – even when you are using mathematical notation. Although “x = 5” might suffice on
an exam, on an assignment you would write something like “We get x = 5.”, complete with
period.
• Format your mathematics appropriately. For instance, the exponential function should look
like $e^x$: italicized, with the $x$ as a superscript. In LaTeX, this would be typed in math mode as
“$e ^ x$.” In Word or Google Docs, you can use the equation editor, or manually format the
expression in italics and superscript the x.
• Ensure your entire solution is readable from start to finish, like a paragraph or an essay (instead
of like jot notes, which would suffice for an informal solution). You can test this by reading
the entire solution out loud to someone (even yourself). A readable solution almost always
requires including additional English sentences to explain what you are doing, and it often
involves enclosing your mathematical equations within English sentences.
Following these guidelines, one way to more formally write the informal calculation shown above is:
“Consider the function $f(x) = (x + 5)(x - 2)$, which expands to $f(x) = x^2 + 3x - 10$. We multiply
$f$ by 2 to get $2f(x) = 2x^2 + 6x - 20$.” There are of course many ways to write this formally; this is
just an example. Many solutions in this textbook are presented as an indented chunk
of properly-formatted equations surrounded by English sentences; for instance:
“Consider the function
\begin{align*}
f(x) &= (x + 5)(x - 2) \\
     &= x^2 + 3x - 10. && \text{(expanded)}
\end{align*}
Multiply by two to get:
\[ 2f(x) = 2x^2 + 6x - 20. \]”
Chapter 1
POWER FUNCTIONS AS BUILDING BLOCKS
Like tall architectural marvels that are made of simple units (beams, bricks, and tiles), many
interesting functions can be constructed from simpler building blocks. In this chapter, we study a
family of simple functions, the power functions: those of the form $f(x) = x^n$.
Our first task is to understand properties of the members of this “family”. We will see that basic
observations of power functions such as $x^2$ and $x^3$ lead to insights into significant considerations such as
the sustainability of life on planet Earth (for example). Later, we use power functions as “building
blocks” to construct polynomials and rational functions¹. We then develop important approaches to
sketch the shapes of the resulting graphs.
Learning Objectives
• Sketch functions of the form $f(x) = x^n$, where $n$ is a real number (power functions);
interpret the shapes of power functions relative to one another.
• Determine which term in a polynomial function will dominate for small x and for large
x.
Let us consider the power functions, that is, functions of the form
\[ y = f(x) = x^n, \]
where $n$ is a real number. Power functions are among the most elementary and “elegant” functions:
when $n$ is a positive integer, we only need multiplications to compute their value at any point. They are thus easy to calculate,
very predictable and smooth, and, from the point of view of calculus, very easy to handle.
1 Now would be a good time to check in with your understanding of these terms. Can you define function? Can you
give an example of a polynomial function? What about an example of a rational function?
POWER FUNCTIONS AS BUILDING BLOCKS 1.1 POWER FUNCTIONS
Click on this link and then adjust the slider on this interactive Desmos graph to see how the
power $n$ affects the shape of a power function in the first quadrant.
From Figure 1.1, we see that the power functions ($y = x^n$ for powers $n = 2, \ldots, 5$) intersect² at $x = 0$
and $x = 1$. This is true for all positive integer powers. The same figure also demonstrates another
fact helpful for curve-sketching: the greater the power $n$, the flatter the graph near the origin and
the steeper the graph for $x > 1$. This can be restated in terms of the relative size of the power
functions. We say that close to the origin, the functions with lower powers dominate, while far from
the origin, the higher powers dominate.
Figure 1.1: Graphs of a few power functions $y = x^n$ (shown: $x^2$, $x^3$, $x^4$, $x^5$). All intersect at $x = 0, 1$. As the power $n$
increases, the graphs become flatter close to the origin, $(0, 0)$, and steeper at large $x$-values.
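The dominance claim can be checked numerically. The following is a minimal Python sketch, not part of the original text: the helper name `power_values` and the sample points 0.5 and 1.5 are illustrative assumptions. It evaluates $x^n$ for $n = 2, \ldots, 5$ at one point between 0 and 1 and at one point beyond $x = 1$.

```python
def power_values(x, powers=(2, 3, 4, 5)):
    """Return {n: x**n} for each power n (hypothetical helper for illustration)."""
    return {n: x ** n for n in powers}

near = power_values(0.5)   # a sample point between 0 and 1
far = power_values(1.5)    # a sample point beyond x = 1

# Between 0 and 1 the lower powers are larger; beyond 1 the order reverses.
print("x = 0.5:", near)
print("x = 1.5:", far)
```

Trying other sample points on either side of $x = 1$ should show the same ordering.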
\[ y = f(x) = K \cdot x^n \]
where a and b are constants. You may assume that both a and b are positive.
This comparison is a slight generalization of the previous discussion. First, we note that the
coefficients $a$ and $b$ merely scale the vertical behaviour (i.e. stretch the graph along the $y$ axis). It is
still true that the two functions intersect at $x = 0$; further, as before, the higher the power, the flatter
the graph close to $x = 0$, and the steeper for large positive or negative values of $x$. However, now
another point of intersection of the graphs occurs when
\[ x = (b/a)^{1/(n-m)}. \tag{1.1.1} \]
Figure 1.2: Graphs of two power functions, $y = 5x^2$ and $y = 2x^3$.
This is shown in Figure 1.2 for the specific example of $y_1 = 5x^2$, $y_2 = 2x^3$. Close to the origin,
the quadratic power function has a larger value, whereas for large $x$, the cubic function has larger
values. The functions intersect when $5x^2 = 2x^3$, which holds for $x = 0$ or $x = 5/2 = 2.5$.
If $b/a$ is positive, then in general the value given in (1.1.1) is a real number.
Example 1.1.1
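Formula (1.1.1) is easy to try out. The Python sketch below assumes the two functions being compared are $y = ax^n$ and $y = bx^m$ (this reading is inferred from the formula and from the worked numbers); the function name `nonzero_intersection` is hypothetical.

```python
def nonzero_intersection(a, n, b, m):
    """Positive solution of a*x**n == b*x**m, assuming n != m and b/a > 0.

    Dividing both sides by a*x**m (for x != 0) gives x**(n - m) = b/a.
    """
    if n == m:
        raise ValueError("equal powers: formula (1.1.1) does not apply")
    if b / a <= 0:
        raise ValueError("b/a must be positive for this formula to give a real value")
    return (b / a) ** (1.0 / (n - m))

# The pair from Figure 1.2: y = 2x^3 (a = 2, n = 3) and y = 5x^2 (b = 5, m = 2).
x_star = nonzero_intersection(2, 3, 5, 2)
print(x_star, 2 * x_star ** 3, 5 * x_star ** 2)
```

The two printed function values should agree, which confirms that the graphs meet at that $x$.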
Example 1.1.2
Determine points of intersection for the following pairs of functions:
Following the steps outlined above in Example 1.1.1 (calculations not shown in detail here —
this is a good place for you to try the calculations yourself), we find the following intersections:
(a) Intersections occur at $x = 0$ and at $x = \pm(27/3)^{1/(4-2)} = \pm\sqrt{9} = \pm 3$.
(b) These functions intersect at x = 0, 3 but there are no other intersections at negative values of x.
Example 1.1.2
2 How comfortable are you with interpreting graphs? Check in: use Figure 1.1 to approximate when $x^5 = 2$.
3 Another good check-in point: If asked to draw a solution in the first quadrant, you should know that this means the
upper right-hand corner of the graph, which is where both x and y are positive.
Note that in many cases, the points of intersection are irrational numbers⁴ whose decimal
approximations can only be obtained by a scientific calculator or by some approximation method
(such as Newton’s Method, studied in Chapter 10).
With only these observations we can examine the issue of energy balance and the sustainability
of life on Earth — as seen next.
1. Energy input from the sun, given the Earth’s radius $r$, can be approximated as⁵
where S is incoming radiation energy per unit area (also called the solar constant), and
0 ď a ď 1 is the fraction of that energy reflected; a is also called the albedo, and depends on
cloud cover, and other planet characteristics (such as percent forest, snow, desert, and ocean).
2. Energy lost from Earth due to radiation into space depends on the current temperature of the
Earth T , and is approximated as
where ε is the emissivity of the Earth’s atmosphere, which represents the Earth’s tendency
to emit radiation energy. This constant depends on cloud cover, water vapour, as well as
on greenhouse gas concentration in the atmosphere; σ is a physical constant (the Stefan-Boltzmann
constant) which is fixed for the purpose of our discussion.
Notice there are several different symbols in Eqns. (1.1.2) and (1.1.3). Being clear about which
are constants and which are variables is critical to using any mathematical model. As the next
example points out, sometimes you have a choice to make.
Example 1.1.3 (Energy expressions are power functions)
Explain in what sense the two forms of energy above can be viewed as power functions, and what
types of power functions they represent.
Both $E_{in}$ and $E_{out}$ depend on Earth’s radius as the power $\sim r^2$. Since this radius is a
constant, it is not fruitful to consider it as an interesting variable for this problem. However, we note
that $E_{out}$ depends on temperature as $\sim T^4$. (We might also select the albedo as a variable and in that
case, we note that $E_{in}$ depends linearly on the albedo $a$.)
4 As a reminder, an irrational number is a real number that cannot be expressed as a ratio of integers. Classic
examples are $\sqrt{2}$ and $\pi$.
5 Take a close look at the formula for $E_{in}$ in equation 1.1.2. Do you think $E_{in}$ is proportional to Earth’s surface area,
or its volume? If you’re stuck, consider the formulas for the surface area and the volume of a sphere.
POWER FUNCTIONS AS BUILDING BLOCKS 1.2 FIRST STEPS IN GRAPH SKETCHING
Example 1.1.3
We observe that the factors $\pi r^2$ cancel, and we can obtain an equation that can be solved for the
temperature T . . .
. . . this is left for you (the reader) to finish! Once you have the answer (T = . . . ) it is additionally
instructive to examine how this temperature depends on the constants in the problem, and how it is
affected by cloud cover and greenhouse gas level.
Example 1.1.4
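One way to check a hand-derived answer is to find the balance temperature numerically. The Python sketch below rests on loud assumptions: the formulas for the two energies are not reproduced in the text above, so the forms $E_{in} = (1 - a) S \pi r^2$ and $E_{out} = 4 \pi r^2 \varepsilon \sigma T^4$ used here are assumed forms that match the verbal description (proportional to $r^2$, linear in the albedo, fourth power in $T$). The numerical values of $S$, $a$, $\varepsilon$ and $r$ are illustrative only, and the function names are hypothetical.

```python
import math

SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4

def energy_in(r, S, a):
    # Assumed form: sunlight intercepted by a disc of radius r, minus the reflected fraction a.
    return (1.0 - a) * S * math.pi * r ** 2

def energy_out(r, eps, T):
    # Assumed form: radiation from the whole spherical surface at temperature T.
    return 4.0 * math.pi * r ** 2 * eps * SIGMA * T ** 4

def balance_temperature(r, S, a, eps, lo=1.0, hi=1000.0, tol=1e-6):
    """Bisection on T for energy_in == energy_out (energy_out increases with T).

    Assumes the balance temperature lies between lo and hi (in kelvin).
    """
    target = energy_in(r, S, a)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if energy_out(r, eps, mid) < target:
            lo = mid      # too cold: not enough energy radiated
        else:
            hi = mid      # too hot: more energy radiated than received
    return 0.5 * (lo + hi)

# Illustrative values (assumptions): r in metres, S in W/m^2, albedo 0.3, emissivity 1.
T1 = balance_temperature(r=6.371e6, S=1361.0, a=0.3, eps=1.0)
T2 = balance_temperature(r=1.0, S=1361.0, a=0.3, eps=1.0)  # a very different radius
print(T1, T2)   # the two agree because the pi*r^2 factors cancel
```

Changing `a` or `eps` and re-running gives a quick feel for how cloud cover and greenhouse gas levels would shift the balance temperature in this simplified model.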
Learning Objectives
• Sketch two-term polynomial functions by determining which term dominates for small
$x$ and for large $x$. For example, sketch $f(x) = x^2 - 3x^4$.
Many functions are not symmetric at all, and are neither even nor odd.
Figure 1.3: Graphs of power functions. (a) A few even power functions: $y = x^2$, $y = x^4$ and $y = x^6$.
(b) Some odd power functions: $y = x$, $y = x^3$ and $y = x^5$. Note the symmetry properties.
Adjust the slider to see how the even and odd power functions behave as their power
increases.
Example 1.2.1
Show that the function $y = g(x) = x^2 - 3x^4$ is an even function.
For $g$ to be an even function, it should satisfy $g(-x) = g(x)$. Let us calculate $g(-x)$ and see if
this requirement holds. We find that
\[ g(-x) = (-x)^2 - 3(-x)^4 = x^2 - 3x^4 = g(x). \]
Here we have used the fact that $(-x)^n = (-1)^n x^n$, and that when $n$ is even, $(-1)^n = 1$.
Example 1.2.1
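A numerical spot-check can complement the algebra, although it is only evidence and not a proof. In this Python sketch the helper names `looks_even` and `looks_odd` and the sample points are illustrative assumptions.

```python
def g(x):
    return x ** 2 - 3 * x ** 4

def looks_even(f, samples=(0.5, 1.0, 2.0, 3.7)):
    """Spot-check f(-x) == f(x) at a few sample points (evidence, not a proof)."""
    return all(abs(f(-x) - f(x)) < 1e-12 for x in samples)

def looks_odd(f, samples=(0.5, 1.0, 2.0, 3.7)):
    """Spot-check f(-x) == -f(x) at the same sample points."""
    return all(abs(f(-x) + f(x)) < 1e-12 for x in samples)

print(looks_even(g), looks_odd(g))
```

The same helpers can be applied to any other function you want to classify before attempting the algebraic argument.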
All power functions are continuous and unbounded: for $x \to \infty$ both even and odd power
functions satisfy $y = x^n \to \infty$. For $x \to -\infty$, odd power functions tend to $-\infty$. Odd power functions
are one-to-one: that is, each value of $y$ is obtained from a unique value of $x$ and vice versa. This is
not true for even power functions. From Fig. 1.3 we see that all power functions go through the point
$(0, 0)$. Even power functions have a local minimum at the origin whereas odd power functions do
not.
A local minimum of a function $f(x)$ is a point $x_{min}$ such that the value of $f$ is larger at all
sufficiently close points. Formally, $f(x_{min} \pm \varepsilon) > f(x_{min})$ for $\varepsilon$ small enough.
Concept Check-In
(Grey boxes labelled “Concept Check-In” — containing questions or prompts that encourage you to check in on
your knowledge or comfort with concepts — like this one, are offered occasionally in select chapters.)
Adjust the slider to see how positive and negative values of the coefficient $a$ affect the shape
of this simple polynomial.
The polynomial in Eqn. (1.2.1) has two terms, each one a power function. Let us consider their
effects individually. Near the origin, for $x \approx 0$, the term $ax$ dominates so that, close to $x = 0$, the
function behaves as
\[ y \approx ax. \]
This is a straight line with slope $a$. Hence, near the origin, if $a > 0$ we would see a line with positive
slope, whereas if $a < 0$ the slope of the line should be negative. Far away from the origin, the cubic
term dominates, so
\[ y \approx x^3 \]
at large (positive or negative) $x$ values. Figure 1.4 illustrates these ideas.
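The two approximations can be compared against the full polynomial numerically. This Python sketch takes the polynomial to be $p(x) = x^3 + ax$, as in the caption of Figure 1.4; the coefficient value $a = 2$ and the sample points are illustrative assumptions.

```python
def p(x, a):
    return x ** 3 + a * x

a = 2.0  # illustrative value of the coefficient (an assumption)
for x in (0.01, 0.1, 10.0, 100.0):
    linear, cubic = a * x, x ** 3
    # Compare the full polynomial with each of its two terms.
    print(f"x = {x:>6}: p = {p(x, a):.6g}, a*x = {linear:.6g}, x^3 = {cubic:.6g}")
```

For the small sample points the printed value of $p$ sits close to $ax$, and for the large ones it sits close to $x^3$.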
Concept Check-In
6. Justify why the linear term dominates near the origin, while the cubic term dominates
further out.
Figure 1.4: The graph of the polynomial $y = p(x) = x^3 + ax$ can be obtained by combining its two
power function components. The cubic “arms” $y \approx x^3$ (top row) dominate for large $x$ (far from
the origin), while the linear part $y \approx ax$ (middle row) dominates near the origin. When these are
smoothly connected (bottom row) we obtain a sketch of the desired polynomial. Shown here are
three possibilities, for $a < 0$, $a = 0$, $a > 0$, left to right. The value of $a$ determines the slope of the
curve near $x = 0$ and thus also affects the presence of a local maximum and minimum (for $a < 0$).
In the first row we see the behaviour of $y = p(x) = x^3 + ax$ for large $x$, in the second for small
$x$. The last row shows the graph for an intermediate range. We might notice that for $a < 0$, the
graph has a local minimum as well as a local maximum. Such an argument already leads to a fairly
reasonable sketch of the function in Eqn. (1.2.1). We can add further details using algebra to find
zeros, that is, where $y = p(x) = 0$.
Example 1.2.3
The above equation always has a solution $x = 0$, but if $x \ne 0$, we can cancel and obtain
\[ x^2 = -a. \]
This would have no solutions if $a$ is a positive number, so that in that case, the graph crosses the $x$
axis only once, at $x = 0$, as shown in Figure 1.4. If $a$ is negative, then the minus signs cancel, so the
equation can be written in the form
\[ x^2 = |a| \]
and we would have two new zeros at
\[ x = \pm\sqrt{|a|}. \]
For example, if $a = -1$ then the function $y = x^3 - x$ has zeros at $x = 0, 1, -1$.
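The case analysis above translates directly into a few lines of code. This Python sketch assumes the polynomial is $p(x) = x^3 + ax$; the function name `zeros_of_cubic` is hypothetical.

```python
import math

def zeros_of_cubic(a):
    """Real zeros of p(x) = x**3 + a*x, using the factorisation x*(x**2 + a)."""
    if a < 0:
        root = math.sqrt(-a)        # -a equals |a| when a is negative
        return [-root, 0.0, root]
    return [0.0]                    # x**2 = -a has no nonzero real solution when a >= 0

print(zeros_of_cubic(-1))   # the case a = -1 discussed above
print(zeros_of_cubic(-4))
```

Substituting each returned value back into $p(x)$ is a simple way to confirm that it really is a zero.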
Concept Check-In
8. Find the zeros of $y = x^3 + 3x$.
Example 1.2.4
The reasoning used here is an important first step in sketching the graph of a polynomial. In the
ensuing chapters, we apply calculus tools to determine points at which the function attains local
maxima or minima (called critical points), and how it behaves for very large positive or negative
values of x. We also develop specialized methods to find zeros of more complicated functions (using
an approximation technique called Newton’s method—although this is flavour-dependent). That
said, the elementary steps described here remain useful as a quick approach for visualizing the
overall shape of a graph.
\[ y = \frac{p_1(x)}{p_2(x)}, \]
where $p_1(x)$ and $p_2(x)$ are polynomials.
POWER FUNCTIONS AS BUILDING BLOCKS 1.3 RATE OF REACTION
\[ y = \frac{A x^n}{a^n + x^n}, \quad x \ge 0. \tag{1.2.2} \]
What properties of your sketch depend on the power n? What would the graph look like for
n = 1, 2, 3?
Adjust the sliders to see how the values of $n$, $A$, and $a$ affect the shape of the rational
function in (1.2.2).
We can break up the process of sketching this function into the following steps:
• The graph of the function in Eqn. (1.2.2) goes through the origin (at x = 0, we see that y = 0).
• For very small $x$ (i.e., $x \ll a$) we can approximate the denominator by the constant term,
$a^n + x^n \approx a^n$⁶, since $x^n$ is negligible by comparison, so that
\[ y = \frac{A x^n}{a^n + x^n} \approx \frac{A x^n}{a^n} = \frac{A}{a^n} x^n \quad \text{for small } x. \]
This means that near the origin, the graph looks like a power function, $y \approx C x^n$ (where
$C = A/a^n$).
• For large $x$, i.e. $x \gg a$, we have $a^n + x^n \approx x^n$ since $x^n$ overtakes and dominates over the
constant $a^n$, so that
\[ y = \frac{A x^n}{a^n + x^n} \approx \frac{A x^n}{x^n} = A \quad \text{for large } x. \]
This reveals that the graph has a horizontal asymptote $y = A$ at large values of $x$.
• Since the function behaves like a simple power function close to the origin, we conclude
directly that the higher the value of n, the flatter is its graph near 0. Further, large n means
sharper rise to the eventual asymptote.
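The small-$x$ and large-$x$ approximations in the steps above can be compared with the exact function. In this Python sketch the constants $A = 3$, $a = 1$ and the sample points $0.01a$ and $100a$ are illustrative assumptions, and the function name `hill` is hypothetical.

```python
def hill(x, A, a, n):
    """The rational function y = A*x**n / (a**n + x**n) from Eqn. (1.2.2)."""
    return A * x ** n / (a ** n + x ** n)

A, a = 3.0, 1.0   # illustrative constants (assumptions)
for n in (1, 2, 3):
    small, large = 0.01 * a, 100.0 * a
    power_law = (A / a ** n) * small ** n    # approximation for x << a
    print(f"n = {n}: y(small) = {hill(small, A, a, n):.3e} vs {power_law:.3e}; "
          f"y(large) = {hill(large, A, a, n):.5f} vs A = {A}")
```

The agreement improves as the sample points move further from $x = a$ in either direction.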
6 Although $a^n$ looks complicated, it’s actually just a constant. Do you see why?
Figure 1.5: The rational functions in Eqn. (1.2.2) with $n = 1, 2, 3$ are compared on this graph. Close
to the origin, the function behaves like a power function, whereas for large $x$ there is a horizontal
asymptote at $y = A$. As $n$ increases, the graph becomes flatter close to the origin, and steeper in its
rise to the asymptote.
Figure 1.6: An enzyme (catalytic protein) is shown binding to a substrate molecule (circular dot)
and then processing it into a product (star shaped molecule).
In the context of this example, $x$ represents the concentration of substrate in the reaction mixture.
The speed of the reaction, $v$ (namely the rate at which product is formed), depends on $x$. When
you actually graph the speed of the reaction as a function of the concentration, you see that it is
not linear: Figure 1.7 is typical. This relationship, known as Michaelis-Menten kinetics, has the
mathematical form
\[ \text{speed of reaction} = v = \frac{Kx}{k_n + x}, \tag{1.3.1} \]
where $K, k_n > 0$ are constants specific to the enzyme and the experimental conditions.
Equation (1.3.1) is a rational function. Since $x$ is a concentration, it cannot be negative,
so we restrict attention to $x \ge 0$. The expression in Eqn. (1.3.1) is a special case of the rational
functions explored in Example 1.2.6, where $n = 1$, $A = K$, $a = k_n$. In Figure 1.7, we plot this
function for specific values of $K$, $k_n$. The following observations can be made:
1. The graph of Eqn. (1.3.1) goes through the origin. Indeed, when $x = 0$ we have $v = 0$.
2. Close to the origin, the initial rise of the graph “looks like” a straight line. We can see this by
considering values of $x$ that are much smaller than $k_n$. Then the denominator $(k_n + x)$ is well
approximated by the constant $k_n$. Thus, for small $x$, $v \approx (K/k_n)x$, so that the graph resembles
a straight line through the origin with slope $K/k_n$.
3. For large $x$, there is a horizontal asymptote. A similar argument for $x \gg k_n$ verifies that $v$ is
approximately constant at large enough $x$.
Figure 1.7: The graph of reaction speed, $v$, versus substrate concentration, $x$, in an enzyme-catalyzed
reaction, as in Eqn. (1.3.1). This behaviour is called Michaelis-Menten kinetics. Note that the graph at
first rises almost like a straight line, but then it curves and approaches a horizontal asymptote. This
graph tells us that the speed of the enzyme cannot exceed some fixed level, i.e. it cannot be faster
than $K$.
Michaelis-Menten kinetics represents one relationship in which saturation occurs: the speed
of the reaction at first increases as substrate concentration x is raised, but the enzymes saturate and
operate at a fixed constant speed K as more and more substrate is added.
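Saturation is easy to see by evaluating Eqn. (1.3.1) at a few concentrations. In this Python sketch the values $K = 10$ and $k_n = 2$ are illustrative assumptions in arbitrary units, and the function name `reaction_speed` is hypothetical.

```python
def reaction_speed(x, K, kn):
    """Michaelis-Menten speed v = K*x / (kn + x), for x >= 0."""
    return K * x / (kn + x)

K, kn = 10.0, 2.0   # illustrative constants (assumptions, arbitrary units)
print(reaction_speed(0.0, K, kn))          # the graph passes through the origin
print(reaction_speed(kn, K, kn))           # at x = kn the formula gives K*kn/(2*kn) = K/2
print(reaction_speed(1000.0 * kn, K, kn))  # close to, but still below, K
```

Doubling an already large concentration changes the speed very little, which is what saturation means here.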
symbol   units                     example
$x$      concentration             “nano Molar”, nM ≡ $10^{-9}$ moles per litre
$v$      concentration over time   nM min$^{-1}$
$k_n$
$K$
Table 1.1: Units for Michaelis-Menten kinetics, $v = \frac{Kx}{k_n + x}$. (Incomplete; see Concept Check-In,
below.)
It is worth considering the units in Eqn. (1.3.1). Given that only quantities with identical units
can be added or compared, and that the units of the two sides of the relationship must balance, fill
in Table 1.1.
Concept Check-In
9. Complete Table 1.1.
Hill functions
The Michaelis-Menten kinetics we discussed above fit into a broader class of Hill functions, which
are rational functions of the form shown in Eqn. (1.2.2) with $n > 1$ and $A, a > 0$. This function is
often referred to in the life sciences as a Hill function with coefficient n, (although the “coefficient”
is actually a power in the terminology used in this chapter). Hill functions occur when an enzyme-
catalyzed reaction benefits from cooperativity of a multi-step process. For example, the binding of
the first substrate molecule may enhance the binding of a second.
Figure 1.8: Hill function kinetics, from Eqn. (1.2.2), with $A = 3$, $a = 1$ and Hill coefficient $n = 1, 2, 3$, plotted against chemical concentration, $x$.
See also Fig. 1.5 for an analysis of the shape of this graph.
POWER FUNCTIONS AS BUILDING BLOCKS 1.4 (OPTIONAL) PREDATOR RESPONSE
speed” of the reaction. Since the Hill function behaves like a simple power function close to the
origin, the higher the value of n, the flatter is its graph near 0, and the sharper the rise to the eventual
asymptote. Hill functions with large n are often used to represent “switch-like” behaviour in genetic
networks or biochemical signal transduction pathways.
The constant $a$ is sometimes called the “half-maximal activation level” for the following reason:
when $x = a$, then
\[ v = \frac{A a^n}{a^n + a^n} = \frac{A a^n}{2 a^n} = \frac{A}{2}. \]
This shows that the level $x = a$ leads to a reaction speed of $A/2$, which is half of the maximal possible
rate.
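The half-maximal property does not depend on the power $n$, which a short computation can confirm. In this Python sketch the constants $A = 3$, $a = 2$ and the list of powers are illustrative assumptions, and the function name `hill` is hypothetical.

```python
def hill(x, A, a, n):
    """Hill function v = A*x**n / (a**n + x**n)."""
    return A * x ** n / (a ** n + x ** n)

A, a = 3.0, 2.0   # illustrative constants (assumptions)
for n in (1, 2, 3, 10):
    # Columns: n, speed at x = a, speed at x = a/2, speed at x = 2a.
    print(n, hill(a, A, a, n), hill(0.5 * a, A, a, n), hill(2.0 * a, A, a, n))
```

The middle column stays at $A/2$ for every $n$, while the other two columns move towards 0 and $A$ as $n$ grows, which is the “switch-like” behaviour mentioned above.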
Featured Problem 1.3.2
Lineweaver-Burk plots. Hill functions can be transformed to a linear relationship through a change
of variables. Consider the Hill function
\[ y = \frac{A x^3}{a^3 + x^3}. \]
Define $Y = 1/y$ and $X = 1/x^3$. Show that $Y$ and $X$ satisfy a linear relationship. Because we take the
reciprocals of $x$ and $y$, $X$ and $Y$ are sometimes called reciprocal coordinates.
Featured Problem 1.3.2
Figure 1.9: Holling’s Type I, II, and III predator response. The predation rate $P(x)$ is the number of
prey eaten by a predator per unit time. Note that the predation rate depends on the prey density $x$.
Interactions of predators and prey are often studied in ecology. Professor C.S. (“Buzz”) Holling,
(a former Director of the Institute of Animal Resource Ecology at the University of British Columbia)
described three types of predators, termed “Type I”, “Type II” and “Type III”, according to their
ability to consume prey as the prey density increases. The three Holling “predator functional
responses” are shown in Fig. 1.9.
Based on Fig. 1.9, match the predator responses to functions shown below.
Hint: One of the curves “looks like a straight line” (so which function here is linear?). One of the
choices is a power function. (Will it fit any of the other curves? why or why not?). Now consider the
saturating curves and use our description of rational functions in Section 1.3 to select appropriate
formulae for these functions.
\begin{align*}
P_1(x) &= kx, \\
P_2(x) &= K \frac{x}{a + x}, \\
P_3(x) &= K x^n, \quad n \ge 2, \\
P_4(x) &= K \frac{x^n}{a^n + x^n}, \quad n \ge 2.
\end{align*}
The generality of mathematics allows us to adapt concepts we studied in one setting (enzyme
biochemistry) to an apparently new topic (behaviour of predators).
Concept Check-In
1. Match the predator responses shown in Fig. 1.9 with the descriptions given below
1. As a predator, I get satiated and cannot keep eating more and more prey.
2. I can hardly find the prey when the prey density is low, but I also get satiated at high
prey density.
3. The more prey there is, the more I can eat.
Fig. 1.10 provides data⁷ that supports the idea that ladybugs are Type III predators.
Let $x$ be the number of aphids in some unit area (i.e., the density of the prey). Then the number
of aphids eaten by a ladybug per unit time in that unit area will be called the predation rate and
denoted $P(x)$. The predation rate usually depends on the prey density, and we approximate that
dependence by
\[ P(x) = K \frac{x^n}{a^n + x^n}, \quad \text{where } K, a > 0. \tag{1.4.1} \]
Here we consider the case that $n = 2$. The aphids reproduce at a rate proportional to their number,
so that the growth rate of the aphid population $G$ (number of new aphids per hour) is
\[ G(x) = rx, \quad \text{where } r > 0. \tag{1.4.2} \]
7 Source: Hassell, M. P., Lawton, J. H., & Beddington, J. R. (1977). Sigmoid Functional Responses by Invertebrate
Predators and Parasitoids. Journal of Animal Ecology, 46(1), 249–262. https://doi.org/10.2307/3959
Figure 1.10: The predation rate of a ladybug depends on its aphid (prey) density, $x$.
(a) For what aphid population density $x$ does the predation rate exactly balance the aphid population
growth rate?
(b) Are there situations where the predation rate cannot match the growth rate? Explain your
results in terms of the constants K, a, r.
(a) The wording “the predation rate exactly balances the reproduction rate” means that the two
functions P(x) and G(x) are exactly equal.
Use the sliders to manipulate the predation constants $K$, $a$ and the aphid growth
rate parameter $r$. How many solutions are there to $P(x) = G(x)$? Show that for some
parameter values, there is only a trivial solution at $x = 0$. Make a connection between this
observation and part (b) of Example 1.4.1.
Hence, to solve this problem, equate P(x) = G(x) and determine the value of x (i.e., the
number of aphids) at which this equality holds. You will find that one solution to this equation
is x = 0. But if x ‰ 0, you can cancel one factor of x from both sides and rearrange the
equation to obtain a quadratic equation whose solution can be written down (in terms of the
positive constants K, r, a).
POWER FUNCTIONS AS BUILDING BLOCKS 1.5 FAMILIAR FUNCTIONS
Hint:ᵃ Recall that a quadratic equation $ax^2 + bx + c = 0$ has roots
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \]
These roots are real provided
\[ b^2 - 4ac \ge 0. \]
ᵃ The solution to this problem is based on solving a quadratic equation, and so relies on the fact that we chose the
value $n = 2$ in the predation rate. To solve the same kind of problem with $n = 3, 4$, etc. generally requires numerical
approximation methods.
(b) The solution you find in (a) is only a real number (i.e. a real solution exists) if the discriminant
(the quantity inside the square root) is non-negative. Determine when this situation can occur and
interpret your answer in terms of the aphids and ladybugs.
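The hint and the discriminant condition can be sketched in a few lines of Python. This is only an illustration: the function name `real_roots` and the sample coefficients are assumptions made for the sketch, not part of the exercise. It applies the quadratic formula and reports no real roots when the discriminant is negative.

```python
import math

def real_roots(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0 (a != 0) as a sorted list."""
    disc = b * b - 4 * a * c          # the discriminant b^2 - 4ac
    if disc < 0:
        return []                     # negative discriminant: no real roots
    root = math.sqrt(disc)
    # A set removes the duplicate when the discriminant is exactly zero.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})

print(real_roots(1, -3, 2))   # positive discriminant: two real roots
print(real_roots(1, -2, 1))   # zero discriminant: one repeated root
print(real_roots(1, 0, 1))    # negative discriminant: no real roots
```

The same check, applied to the quadratic you obtain in part (a), tells you for which K, a, r a non-trivial balance point exists.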
Learning Objectives
• Know that $e^x$ eventually dominates any given power function, and any power function
with positive exponent dominates the logarithm (for large positive x). Use these facts for
sketching. For example, sketch $f(x) = e^x - x$.
• Sketch familiar functions such as $e^x$, $\log x$, $\sin x$, $\cos x$, $\tan x$, $1/x$, $\sqrt{x}$, and $|x|$.
Power functions are both common and (relatively) simple, so they were a good place to start
thinking about dominance and how it can be useful. Another common class of functions are the
exponential functions: those of the form $f(x) = a^x$, where a is a positive constant. (The one we’ll
be using the most, sometimes called the exponential function, is $e^x$.) The constant a is known as the
base.
[Figure: graphs of $y = 2^x$, $y = 3^x$, and $y = 4^x$.]
Example 1.5.1
Exponential functions with bases greater than one will, for large x, grow extremely quickly.
Indeed, they will grow more quickly than any power function, eventually.
• For large positive values of x, $e^x$ will dominate $-x$, so the function will look approximately
like $e^x$. That is, it will grow steeply.
• For $x = 0$, $e^0 - 0 = 1$.
So, all together: our function will look like the straight line $-x$ when x is strongly negative; then
it will pass through the point (0, 1); then it will grow like the classic hockey-stick graph of $e^x$ for
large x.
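The dominance reasoning above can be checked numerically. The following is a minimal sketch (the sample points are arbitrary choices) that tabulates $e^x - x$ next to the two candidate dominant terms, $-x$ and $e^x$.

```python
import math

def f(x):
    """The function being sketched: e^x - x."""
    return math.exp(x) - x

# Compare f(x) with -x (dominant for very negative x) and e^x (dominant for large x).
for x in (-20.0, -5.0, 0.0, 5.0, 20.0):
    print(f"x = {x:6.1f}   f(x) = {f(x):.6g}   -x = {-x:.6g}   e^x = {math.exp(x):.6g}")
```

For strongly negative x the first two columns should nearly agree, and for large positive x the first and third should nearly agree.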
[Figure: graph of $y = e^x - x$.]
Example 1.5.2
Below are some familiar functions whose graphs you should be able to sketch. Some salient
points are stated explicitly.
$e^x$ :
log(x) :
sin(x) :
cos(x) :
tan(x) :
$\frac{1}{x}$ :
• For values of x close to 0, $\frac{1}{x}$ is very large (positive or negative).
[Figure: graph of $y = \frac{1}{x}$.]
$\sqrt{x}$ :
• For large values of x, $\sqrt{x}$ is also large.
|x| :
• Piecewise defined: $|x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0 \end{cases}$.
• Domain all real numbers; range $[0, \infty)$.
• Looks like a straight line if you only look to one side of the y-axis.
Chapter 2
LIMITS
The concept of a limit helps us to describe the behaviour of a function close to some point of interest.
This is useful in the case of functions that are either not continuous, or not defined somewhere. We
use the notation
\[ \lim_{x \to a} f(x) \]
to denote the value that the function f approaches as x gets closer and closer to the value a.
Learning Objectives
• Explain using both words and pictures what $\lim_{x \to a} f(x) = L$, $\lim_{x \to a^-} f(x) = L$, and
$\lim_{x \to a^+} f(x) = L$ mean (including the case where L is equal to $\infty$ or $-\infty$).
• Explain using both words and pictures what $\lim_{x \to \infty} f(x) = L$ and $\lim_{x \to -\infty} f(x) = L$ mean
(including the case where L is equal to $\infty$ or $-\infty$).
• Find the limit of a function at a point given the graph of the function.
Before we come to definitions, let us start with a little notation for limits.
L IMITS 2.1 Q UICK REVIEW OF LIMITS
Notation 2.1.1.
\[ \lim_{x \to a} f(x) = L \]
The notation is just shorthand — we don’t want to have to write out long sentences as we do our
mathematics. Whenever you see these symbols you should think of that sentence.
This shorthand also has the benefit of being mathematically precise (albeit not in a way that we
will cover in this course), and (almost) independent of the language in which the author is writing.
A mathematician who does not speak English can read the above formula and understand exactly
what it means.
In mathematics, like most languages, there is usually more than one way of writing things and
we can also write the above limit as
\[ f(x) \to L \text{ as } x \to a \]
This is an example of a piecewise function. That is, a function defined in several pieces, rather than
as a single formula. We evaluate the function at a particular value of x on a case-by-case basis. Here
is a sketch of it:
Notice the two circles in the plot. One is open, $\circ$, and the other is closed, $\bullet$.
• A filled circle has quite a precise meaning — a filled circle at (x, y) means that the function
takes the value f (x) = y.
• An open circle is a little harder — an open circle at (3, 6) means that the point (3, 6) is not on
the graph of $y = f(x)$, i.e. $f(3) \neq 6$. We should only use the open circle where it is absolutely
necessary in order to avoid confusion.
This function is quite contrived, but it is a very good example to start working with limits more
systematically. Consider what the function does close to x = 3. We already know what happens
exactly at 3 (that is, $f(3) = 9$) but we want to look at how the function behaves very close to x = 3.
That is, what does the function do as we look at a point x that gets closer and closer to x = 3?
If we plug in some numbers very close to 3 (but not exactly 3) into the function we see the
following:
x       2.9    2.99    2.999    ∘    3.001    3.01    3.1
f(x)    5.8    5.98    5.998    ∘    6.002    6.02    6.2
So as x moves closer and closer to 3, without being exactly 3, we see that the function moves closer
and closer to 6. We can write this as
\[ \lim_{x \to 3} f(x) = 6 \]
That is:
The limit as x approaches 3 of f (x) is 6.
So for x very close to 3, without being exactly 3, the function is very close to 6 — which is a long
way from the value of the function exactly at 3, f (3) = 9. Note well that the behaviour of the
function as x gets very close to 3 does not depend on the value of the function at 3.
Example 2.1.2
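The table of values in this example can be reproduced with a short script. The formula for f is not restated in this part of the text, so the sketch below assumes $f(x) = 2x$ for $x \neq 3$ and $f(3) = 9$, which is consistent with the table and with the stated value at 3; treat that formula as an assumption.

```python
def f(x):
    # Assumed formula, chosen to match the table: 2x away from 3, and 9 exactly at 3.
    return 9.0 if x == 3 else 2.0 * x

# Inputs approaching 3 from both sides, never equal to 3.
for x in (2.9, 2.99, 2.999, 3.001, 3.01, 3.1):
    print(f"x = {x:<6}  f(x) = {f(x):.3f}")

print("f(3) =", f(3))   # the value at 3 plays no role in the limit
```

The printed values cluster around 6 even though the function value at 3 itself is 9.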
We now have enough to make an informal definition of a limit, which is actually sufficient for
most of what we will do in this text.
Definition 2.1.3 (Informal definition of limit).
We write
\[ \lim_{x \to a} f(x) = L \]
if the value of the function f(x) is sure to be arbitrarily close to L whenever the value of x
is close enough to a, without$^1$ being exactly a.
1 You may find the condition “without being exactly a” a little strange, but there is a good reason for it. One very
important application of limits, indeed the main reason we teach the topic, is in the definition of derivatives (see
Definition 3.3.3). In that definition we need to compute the limit $\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$. In this case the function whose
limit is being taken, namely $\frac{f(x) - f(a)}{x - a}$, is not defined at all at x = a.
• Now if we try to compute f (2) we get 0/0 which is undefined. The function is not defined
at that point — this is a good example of why we need limits. We have to sneak up on these
places where a function is not defined (or is badly behaved).
• VERY IMPORTANT POINT: the fraction $\frac{0}{0}$ is not $\infty$ and it is not 1; it is not defined. We
cannot ever divide by zero in normal arithmetic and obtain a consistent and mathematically
sensible answer. If you learned otherwise in high school, you should quickly unlearn it.
• Again, we can plug in some numbers close to 2 and see what we find
Example 2.1.4
The previous two examples are nicely behaved in that the limits we tried to compute actually
exist. We now turn to two nastier examples$^2$ in which the limits we are interested in do not exist.
Example 2.1.5 (A bad example)
Consider the following function $f(x) = \sin(\pi/x)$. Find the limit as $x \to 0$ of f(x).
We should see something interesting happening close to x = 0 because f (x) is undefined there.
Using your favourite graph-plotting software you can see that the graph looks roughly like
2 Actually, they are good examples, but the functions in them are nastier.
How to explain this? As x gets closer and closer to zero, π/x becomes larger and larger (remember
what the plot of y = 1/x looks like). So when you take sine of that number, it oscillates faster and
faster the closer you get to zero. Since the function does not approach a single number as we bring x
closer and closer to zero, the limit does not exist.
We write this as
\[ \lim_{x \to 0} \sin\left(\frac{\pi}{x}\right) \text{ does not exist} \]
It’s not very inventive notation, however it is clear. We frequently abbreviate “does not exist” to
“DNE” and rewrite the above as
\[ \lim_{x \to 0} \sin\left(\frac{\pi}{x}\right) = \text{DNE} \]
Example 2.1.5
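One way to see the oscillation concretely is to evaluate $\sin(\pi/x)$ along two sequences of inputs that both shrink to zero. The sequences below are illustrative choices, not taken from the text: along $x = 1/n$ the function is 0, while along $x = 2/(4n+1)$ it is 1 (up to floating-point round-off).

```python
import math

def f(x):
    return math.sin(math.pi / x)

for n in (1, 10, 100, 1000):
    x_zero = 1.0 / n               # pi/x = n*pi, where sine is 0
    x_one = 2.0 / (4 * n + 1)      # pi/x = 2n*pi + pi/2, where sine is 1
    print(f"f({x_zero:.6f}) = {f(x_zero):.3f}   f({x_one:.6f}) = {f(x_one):.3f}")
```

Both sequences of inputs approach 0, yet the outputs stay at two different values, so no single number can serve as the limit.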
In the following example, the limit we are interested in does not exist. However the way in
which things go wrong is quite different from what we just saw.
Example 2.1.6
Consider the function
\[ f(x) = \begin{cases} x & x < 2 \\ -1 & x = 2 \\ x + 3 & x > 2 \end{cases} \]
• This isn’t like before. Now when we approach from below, we seem to be getting closer to
2, but when we approach from above we seem to be getting closer to 5. Since we are not
approaching the same number the limit does not exist.
Example 2.1.6
While the limit in the previous example does not exist, the example serves to introduce the idea
of “one-sided limits”. For example, we can say that
As x moves closer and closer to two from below the function approaches 2.
and similarly
As x moves closer and closer to two from above the function approaches 5.
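These two statements can be checked numerically for the piecewise function of Example 2.1.6. The sketch below is only an illustration; the step sizes are arbitrary.

```python
def f(x):
    """The piecewise function from Example 2.1.6."""
    if x < 2:
        return x
    if x == 2:
        return -1
    return x + 3

# Approach 2 from below and from above with shrinking steps.
for step in (0.1, 0.01, 0.001):
    print(f"f(2 - {step}) = {f(2 - step):.3f}   f(2 + {step}) = {f(2 + step):.3f}")

print("f(2) =", f(2))   # the value at 2 affects neither one-sided limit
```

The left column heads towards 2 and the right column towards 5, matching the two one-sided statements.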
We write
\[ \lim_{x \to a^-} f(x) = K \]
when the value of f(x) gets closer and closer to K when $x < a$ and x moves closer and
closer to a. Since the x-values are always less than a, we say that x approaches a from
below. This is also often called the left-hand limit since the x-values lie to the left of a on
a sketch of the graph.
We similarly write
\[ \lim_{x \to a^+} f(x) = L \]
when the value of f(x) gets closer and closer to L when $x > a$ and x moves closer and
closer to a. For similar reasons we say that x approaches a from above, and sometimes
refer to this as the right-hand limit.
Note — be careful to include the superscript $+$ and $-$ when writing these limits. You might also
see the following notations:
\[ \lim_{x \to a^+} f(x) = \lim_{x \to a+} f(x) = \lim_{x \downarrow a} f(x) = \lim_{x \searrow a} f(x) = L \qquad \text{right-hand limit} \]
\[ \lim_{x \to a^-} f(x) = \lim_{x \to a-} f(x) = \lim_{x \uparrow a} f(x) = \lim_{x \nearrow a} f(x) = L \qquad \text{left-hand limit} \]
Notice that this is really two separate statements because of the “if and only if”
• If the limit of f (x) as x approaches a exists and is equal to L, then both the left-hand and
right-hand limits exist and are equal to L. AND,
• If the left-hand and right-hand limits as x approaches a exist and are equal, then the limit as x
approaches a exists and is equal to the one-sided limits.
That is — the limit of f (x) as x approaches a will only exist if it doesn’t matter which way we
approach a (either from left or right) AND if we get the same one-sided limits when we approach
from left and right, then the limit exists.
We can rephrase the above by writing the contrapositives$^3$ of the above statements.
• If either of the left-hand and right-hand limits as x approaches a fails to exist, or if they both
exist but are different, then the limit as x approaches a does not exist. AND,
• If the limit as x approaches a does not exist, then the left-hand and right-hand limits are either
different or at least one of them does not exist.
Here is another limit example.
Example 2.1.9
Consider the following two functions and compute their limits and one-sided limits as x approaches
1:
These are a little different from our previous examples, in that we do not have formulas, only the
sketch. But we can still compute the limits.
• Function on the left — f (x):
\[ \lim_{x \to 1^-} f(x) = 2 \qquad \lim_{x \to 1^+} f(x) = 2 \]
3 Given a statement of the form “If A then B”, the contrapositive is “If not B then not A”. They are logically
equivalent — if one is true then so is the other. We must take care not to confuse the contrapositive with the
converse. Given “If A then B”, the converse is “If B then A”. These are definitely not the same.
To see this consider the statement “If he is Shakespeare then he is dead.” The converse is “If he is dead then he is
Shakespeare” — clearly garbage since there are plenty of dead people who are not Shakespeare. The contrapositive
is “If he is not dead then he is not Shakespeare” — which makes much more sense.
Example 2.1.9
We have seen two ways in which a limit does not exist — in one case the function oscillated
wildly, and in the other there was some sort of “jump” in the function, so that the left-hand and
right-hand limits were different.
There is a third way that we must also consider. To describe this, consider the following four
functions:
Figure 2.1.1.
None of these functions are defined at x = a, nor do the limits as x approaches a exist. However
we can say more than just “the limits do not exist”.
Notice that the value of function 1 can be made bigger and bigger as we bring x closer and
closer to a. Similarly the value of the second function can be made arbitrarily large and negative (i.e.
make it as big a negative number as we want) by bringing x closer and closer to a. Based on this
observation we have the following definition.
Definition 2.1.10.
We write
\[ \lim_{x \to a} f(x) = +\infty \]
when the value of the function f(x) becomes arbitrarily large and positive as x gets closer
and closer to a, without being exactly a.
Similarly, we write
\[ \lim_{x \to a} f(x) = -\infty \]
when the value of the function f(x) becomes arbitrarily large and negative as x gets closer
and closer to a, without being exactly a.
\[ \lim_{x \to 0} \frac{1}{x^2} = +\infty \qquad \lim_{x \to 0} -\frac{1}{x^2} = -\infty \]
IMPORTANT POINT: Please do not think of “$+\infty$” and “$-\infty$” in these statements as numbers.
You should think of $\lim_{x \to a} f(x) = +\infty$ and $\lim_{x \to a} f(x) = -\infty$ as special cases of $\lim_{x \to a} f(x) = \text{DNE}$. The
statement
\[ \lim_{x \to a} f(x) = +\infty \]
does not mean “f(x) approaches infinity as x approaches a.” It means “the function f(x) becomes
arbitrarily large as x approaches a”. These are different statements; remember that $\infty$ is not a
number$^4$.
Now consider functions 3 and 4 in Figure 2.1.1. Here we can make the value of the function as
big and positive as we want (for function 3) or as big and negative as we want (for function 4) but
only when x approaches a from one side. With this in mind we can construct similar notation and a
similar definition:
4 One needs to be very careful making statements about infinity. At some point in our lives we get around to asking
ourselves “what is the biggest number?” and we realise there isn’t one. That is, we can go on counting integer
after integer forever. Indeed the set of integers is the first infinite thing we really encounter. It is an example of a
countably infinite set. The set of real numbers is actually much bigger and is uncountably infinite. In fact there are
an infinite number of different sorts of infinity! Much of the theory of infinite sets was developed by Georg Cantor,
who is well worth Googling.
Definition 2.1.11.
We write
\[ \lim_{x \to a^+} f(x) = +\infty \]
when the value of the function f(x) becomes arbitrarily large and positive as x gets closer
and closer to a from above (equivalently — from the right), without being exactly a.
Similarly, we write
\[ \lim_{x \to a^+} f(x) = -\infty \]
when the value of the function f(x) becomes arbitrarily large and negative as x gets closer
and closer to a from above (equivalently — from the right), without being exactly a.
The notation
\[ \lim_{x \to a^-} f(x) = +\infty \qquad \lim_{x \to a^-} f(x) = -\infty \]
has a similar meaning except that limits are approached from below / from the left.
Example 2.1.12
Consider the function
\[ g(x) = \frac{1}{\sin(x)} \]
Find the one-sided limits of this function as $x \to \pi$.
Probably the easiest way to do this is to first plot the graph of sin(x) and 1/x and then think
carefully about the one-sided limits:
• As $x \to \pi$ from the left, sin(x) is a small positive number that is getting closer and closer to
zero. That is, as $x \to \pi^-$, we have that $\sin(x) \to 0$ through positive numbers (i.e. from above).
Now look at the graph of 1/x, and think what happens as we move $x \to 0^+$: the function is
positive and becomes larger and larger.
So as $x \to \pi$ from the left, $\sin(x) \to 0$ from above, and so $1/\sin(x) \to +\infty$.
• By very similar reasoning, as $x \to \pi$ from the right, sin(x) is a small negative number that gets
closer and closer to zero. So as $x \to \pi$ from the right, $\sin(x) \to 0$ through negative numbers
(i.e. from below) and so $1/\sin(x) \to -\infty$.
Thus
\[ \lim_{x \to \pi^-} \frac{1}{\sin(x)} = +\infty \qquad \lim_{x \to \pi^+} \frac{1}{\sin(x)} = -\infty \]
Example 2.1.12
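The sign analysis in Example 2.1.12 can be illustrated numerically. This sketch evaluates $1/\sin(x)$ just to the left and just to the right of $\pi$; the step sizes are arbitrary choices.

```python
import math

def g(x):
    return 1.0 / math.sin(x)

# Step towards pi from the left (pi - step) and from the right (pi + step).
for step in (1e-1, 1e-3, 1e-5):
    left = g(math.pi - step)
    right = g(math.pi + step)
    print(f"g(pi - {step:g}) = {left:.4g}   g(pi + {step:g}) = {right:.4g}")
```

The left-hand values grow large and positive while the right-hand values grow large and negative, in line with the two one-sided limits.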
Up to this point we explored limits by sketching graphs or plugging values into a calculator. This
was done to help build intuition, but it is not really the basis of a systematic method for computing
limits. We have also avoided more formal approaches$^5$ since we do not have time in the course to go
into that level of detail and (arguably) we don’t need that detail to achieve the aims of the course.
Thankfully we can develop a more systematic approach based on the idea of building up complicated
limits from simpler ones by examining how limits interact with the basic operations of arithmetic.
• constants — c
These are the building blocks from which we construct functions. Soon we will add a few more
functions to this list, especially the exponential function and various inverse functions.
We then take these building blocks and piece them together using arithmetic
• addition and subtraction — $f(x) = g(x) + h(x)$ and $f(x) = g(x) - h(x)$
5 The formal approaches are typically referred to as “epsilon-delta limits” or “epsilon-delta proofs” since the symbols
ε and δ are traditionally used throughout. Your favourite search engine will tell you more, if you’re curious.
What we will learn in this section is how to compute the limits of the basic building blocks and
then how we can compute limits of sums, products and so forth using “limit laws”. This process
allows us to compute limits of complicated functions, using very simple tools and without having to
resort to “plugging in numbers” or “closer and closer” or “$\varepsilon$-$\delta$ arguments”.
In the examples we saw above, almost all the interesting limits happened at points where the
underlying function was badly behaved — where it jumped, was not defined, or blew up to infinity.
In those cases we had to be careful and think about what was happening. Thankfully most functions
we will see do not have too many points at which these sorts of things happen.
For example, polynomials do not have any nasty jumps and are defined everywhere and do not
“blow up”. If you plot them, they look smooth$^6$. Polynomials and limits behave very nicely together,
and for any polynomial P(x) and any real number a we have that
\[ \lim_{x \to a} P(x) = P(a) \]
That is — to evaluate the limit, we just plug in the number. We will build up to this result over the
next few pages.
Let us start with the two easiest limits.
Since we have not seen too many theorems yet, let us examine it carefully piece by piece.
• Let $a, c \in \mathbb{R}$ — just as was the case for definitions, we start a theorem by defining terms and
setting the scene. There is not too much scene to set: the symbols a and c are real numbers.
• The following two limits hold — this doesn’t really contribute much to the statement of the
theorem, it just makes it easier to read.
• $\lim_{x \to a} c = c$ — when we take the limit of a constant function (for example think of c = 3), the
limit is (unsurprisingly) just that same constant.
• $\lim_{x \to a} x = a$ — as we noted above for general polynomials, the limit of the function f(x) = x as
x approaches a given point a, is just a. This says something quite obvious — as x approaches
a, x approaches a (if you are not convinced then sketch the graph).
Armed with only these two limits, we cannot do very much. But combining these limits with
some arithmetic we can do quite a lot.
6 We have used this term in an imprecise way, but it does have a precise mathematical meaning.
Let $a, c \in \mathbb{R}$, let f(x) and g(x) be defined for all x’s that lie in some interval about a (but
f, g need not be defined exactly at a).
• $\lim_{x \to a} (f(x) + g(x)) = F + G$ — limit of the sum is the sum of the limits.
• $\lim_{x \to a} (f(x) - g(x)) = F - G$ — limit of the difference is the difference of the limits.
• If $G \neq 0$ then $\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{F}{G}$ and, in particular, $\lim_{x \to a} \frac{1}{g(x)} = \frac{1}{G}$.
Note — be careful with this last one — the denominator cannot be zero.
The above theorem shows that limits interact very simply with arithmetic. If you are asked to
find the limit of a sum then the answer is just the sum of the limits. Similarly the limit of a product
is just the product of the limits.
How do we apply the above theorem to the rational function? Here is a warm-up example:
Example 2.1.15
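You are given two functions f, g (not explicitly) which have the following limits as x approaches 1:
\[ \lim_{x \to 1} f(x) = 3 \qquad \text{and} \qquad \lim_{x \to 1} g(x) = 2 \]
Using the arithmetic of limits we can compute
\begin{align*}
\lim_{x \to 1} 3 f(x) &= 3 \times 3 = 9 \\
\lim_{x \to 1} 3 f(x) - g(x) &= 3 \times 3 - 2 = 7 \\
\lim_{x \to 1} f(x) g(x) &= 3 \times 2 = 6 \\
\lim_{x \to 1} \frac{f(x)}{f(x) - g(x)} &= \frac{3}{3 - 2} = 3
\end{align*}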
Example 2.1.15
Example 2.1.16
Find $\lim_{x \to 3} 4x^2 - 1$
Example 2.1.16
This is an excruciating level of detail, but when you first use a theorem, it is a good idea to do things
step by step. You can go faster when you are comfortable.
Example 2.1.17
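Yet another limit — compute $\lim_{x \to 2} \frac{x}{x - 1}$.
To apply the arithmetic of limits, we need to examine numerator and denominator separately and
make sure the limit of the denominator is non-zero. Numerator first:
\[ \lim_{x \to 2} x = 2 \qquad \text{limit of } x \]
and now the denominator:
\[ \lim_{x \to 2} (x - 1) = \lim_{x \to 2} x - \lim_{x \to 2} 1 = 2 - 1 = 1 \]
Since the limit of the denominator is non-zero we can put it back together to get
\begin{align*}
\lim_{x \to 2} \frac{x}{x - 1} &= \frac{\lim_{x \to 2} x}{\lim_{x \to 2} (x - 1)} \\
&= \frac{2}{1} \\
&= 2
\end{align*}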
Example 2.1.17
In the next example we show that many different things can happen if the limit of the denominator
is zero.
• If the limit of the numerator is non-zero then the limit of the ratio does not exist
\[ \lim_{x \to a} \frac{f(x)}{g(x)} = \text{DNE} \qquad \text{when } \lim_{x \to a} f(x) \neq 0 \text{ and } \lim_{x \to a} g(x) = 0 \]
For example, $\lim_{x \to 0} \frac{1}{x^2} = \text{DNE}$.
• If the limit of the numerator is zero then the above theorem does not give us enough information
to decide whether or not the limit exists. It is possible that
– the limit does not exist, eg. $\lim_{x \to 0} \frac{x}{x^2} = \lim_{x \to 0} \frac{1}{x} = \text{DNE}$
– the limit is $\pm\infty$, eg. $\lim_{x \to 0} \frac{x^2}{x^4} = \lim_{x \to 0} \frac{1}{x^2} = +\infty$ or $\lim_{x \to 0} \frac{-x^2}{x^4} = \lim_{x \to 0} \frac{-1}{x^2} = -\infty$.
– the limit is zero, eg. $\lim_{x \to 0} \frac{x^2}{x} = 0$
– the limit exists and is non-zero, eg. $\lim_{x \to 0} \frac{x}{x} = 1$
Now while the above examples are very simple and a little contrived they serve to illustrate the point
we are trying to make — be careful if the limit of the denominator is zero.
Example 2.1.18
Example 2.1.19
Let $h(x) = \frac{2x - 3}{x^2 + 5x - 6}$ and find its limit as x approaches 2.
Since this is the limit of a ratio, we compute the limit of the numerator and denominator
separately. Numerator first:
\begin{align*}
\lim_{x \to 2} 2x - 3 &= \lim_{x \to 2} 2x - \lim_{x \to 2} 3 && \text{difference of limits} \\
&= 2 \cdot \lim_{x \to 2} x - 3 && \text{product of limits and limit of constant} \\
&= 2 \cdot 2 - 3 && \text{limits of } x \\
&= 1
\end{align*}
Denominator next:
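\begin{align*}
\lim_{x \to 2} x^2 + 5x - 6 &= \lim_{x \to 2} x^2 + \lim_{x \to 2} 5x - \lim_{x \to 2} 6 && \text{sum of limits} \\
&= \lim_{x \to 2} x \cdot \lim_{x \to 2} x + 5 \cdot \lim_{x \to 2} x - 6 && \text{product of limits and limit of constant} \\
&= 2 \cdot 2 + 5 \cdot 2 - 6 && \text{limits of } x \\
&= 8
\end{align*}
Since the limit of the denominator is non-zero, we can obtain our result by taking the ratio of the
separate limits.
\[ \lim_{x \to 2} \frac{2x - 3}{x^2 + 5x - 6} = \frac{\lim_{x \to 2} 2x - 3}{\lim_{x \to 2} x^2 + 5x - 6} = \frac{1}{8} \]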
The above works out quite simply. However, if we were to take the limit as x Ñ 1 then things
are a bit harder. The limit of the numerator is:
\[ \lim_{x \to 1} 2x - 3 = 2 \cdot 1 - 3 = -1 \]
(we have not listed all the steps). And the limit of the denominator is
\[ \lim_{x \to 1} x^2 + 5x - 6 = 1 \cdot 1 + 5 - 6 = 0 \]
Since the limit of the numerator is non-zero, while the limit of the denominator is zero, the limit of
the ratio does not exist.
\[ \lim_{x \to 1} \frac{2x - 3}{x^2 + 5x - 6} = \text{DNE} \]
Example 2.1.19
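Both conclusions of Example 2.1.19 can be sanity-checked by evaluating h near the two points. This is a numerical illustration under arbitrary step sizes, not a replacement for the limit laws.

```python
def h(x):
    return (2 * x - 3) / (x**2 + 5 * x - 6)

# Near x = 2 the values settle down; near x = 1 they blow up with opposite signs.
for step in (1e-2, 1e-4, 1e-6):
    print(f"h(2 + {step:g}) = {h(2 + step):.6f}   "
          f"h(1 - {step:g}) = {h(1 - step):.4g}   "
          f"h(1 + {step:g}) = {h(1 + step):.4g}")
```

Near 2 the values approach 1/8 = 0.125. Near 1 they become large with opposite signs on the two sides, so no finite limit can exist there.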
It is IMPORTANT TO NOTE that it is not correct to write
\[ \lim_{x \to 1} \frac{2x - 3}{x^2 + 5x - 6} = \frac{-1}{0} = \text{DNE} \]
because we can only write
\[ \lim_{x \to a} \frac{f(x)}{g(x)} = \frac{\lim_{x \to a} f(x)}{\lim_{x \to a} g(x)} = \text{something} \]
when the limit of the denominator is non-zero (see Example 2.1.18 above).
With a little care you can use the arithmetic of limits to obtain the following rules for limits of
powers of functions and limits of roots of functions:
\[ \lim_{x \to a} f(x) = F \]
then
\[ \lim_{x \to a} (f(x))^{1/n} = \left( \lim_{x \to a} f(x) \right)^{1/n} = F^{1/n} \]
Notice that we have to be careful when taking roots of limits that might be negative numbers. To
see why, consider the case n = 2, the limit
In order to evaluate such limits properly we need to use complex numbers which are beyond the
scope of this text.
Also note that the notation $x^{1/2}$ refers to the positive square root of x. While 2 and (−2) are both
numbers whose squares are 4, the notation $4^{1/2}$ means 2. This is something we must be careful of$^8$.
So again — let us do a few examples and carefully note what we are doing.
Example 2.1.21
7 You may not know the definition of the power $b^p$ when p is not a rational number, so here it is. If $b > 0$ and p is
any real number, then $b^p$ is the limit of $b^r$ as r approaches p through rational numbers. We won’t do so here, but it
is possible to prove that the limit exists.
8 Like ending sentences in prepositions — “This is something up with which we will not put.” This quote is attributed
to Churchill though there is some dispute as to whether or not he really said it.
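\begin{align*}
\lim_{x \to 2} (4x^2 - 3)^{1/3} &= \left( \left( \lim_{x \to 2} 4x^2 \right) - \left( \lim_{x \to 2} 3 \right) \right)^{1/3} \\
&= \left( 4 \cdot 2^2 - 3 \right)^{1/3} \\
&= (16 - 3)^{1/3} \\
&= 13^{1/3}
\end{align*}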
Example 2.1.21
By combining the last few theorems we can make the evaluation of limits of polynomials and
rational functions much easier:
Theorem 2.1.22 (Limits of polynomials and rational functions).
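Let $a \in \mathbb{R}$, let P(x) be a polynomial and let R(x) be a rational function. Then
\[ \lim_{x \to a} P(x) = P(a) \]
and, provided R(x) is defined at x = a,
\[ \lim_{x \to a} R(x) = R(a). \]
If R(x) is not defined at x = a then we are not able to apply this result.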
If we try to apply the arithmetic of limits then we compute the limits of the numerator and denomi-
nator separately
\begin{align*}
\lim_{x \to 1} x^3 - x^2 &= 1 - 1 = 0 \tag{2.1.1} \\
\lim_{x \to 1} x - 1 &= 1 - 1 = 0 \tag{2.1.2}
\end{align*}
Since the denominator is zero, we cannot apply our theorem and we are, for the moment, stuck.
However, there is more that we can do here — the hint is that the numerator and denominator both
approach zero as x approaches 1. This means that there might be something we can cancel.
So let us play with the expression a little more before we take the limit:
\[ \frac{x^3 - x^2}{x - 1} = \frac{x^2 (x - 1)}{x - 1} = x^2 \qquad \text{provided } x \neq 1. \]
So what we really have here is the following function
\[ \frac{x^3 - x^2}{x - 1} = \begin{cases} x^2 & x \neq 1 \\ \text{undefined} & x = 1 \end{cases} \]
If we plot the above function the graph looks exactly the same as y = x2 except that the function is
not defined at x = 1 (since at x = 1 both numerator and denominator are zero).
When we compute a limit as $x \to a$, the value of the function exactly at x = a is irrelevant. We only
care what happens to the function as we bring x very close to a. So for the above problem we can
write
\[ \frac{x^3 - x^2}{x - 1} = x^2 \qquad \text{when $x$ is close to 1 but not at } x = 1 \]
So the limit as $x \to 1$ of the function is the same as the limit $\lim_{x \to 1} x^2$ since the functions are the same
except exactly at x = 1. By this reasoning we get
\[ \lim_{x \to 1} \frac{x^3 - x^2}{x - 1} = \lim_{x \to 1} x^2 = 1 \]
Example 2.1.23
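The cancellation argument of Example 2.1.23 can be illustrated numerically: the quotient is undefined at x = 1, yet its values near 1 behave like $x^2$. The function name `q` and the step sizes are illustrative choices.

```python
def q(x):
    # Raises ZeroDivisionError at x = 1, where both numerator and denominator vanish.
    return (x**3 - x**2) / (x - 1)

for step in (1e-1, 1e-3, 1e-5):
    print(f"q(1 - {step:g}) = {q(1 - step):.6f}   q(1 + {step:g}) = {q(1 + step):.6f}")

try:
    q(1)
except ZeroDivisionError:
    print("q(1) is undefined")
```

The values on both sides approach 1, the same as $\lim_{x \to 1} x^2$, even though the function cannot be evaluated at 1 itself.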
The reasoning in the above example can be made more general:
Theorem 2.1.24.
If f(x) = g(x) except when x = a then $\lim_{x \to a} f(x) = \lim_{x \to a} g(x)$ provided the limit of g exists.
How do we know when to use this theorem? The big clue is that when we try to compute the
limit in a naive way, we end up with $\frac{0}{0}$. We know that $\frac{0}{0}$ does not make sense, but it is an indication
that there might be a common factor between numerator and denominator that can be cancelled. In
the previous example, this common factor was (x ´ 1).
Example 2.1.25
Using this idea, compute
\[ \lim_{h \to 0} \frac{(1 + h)^2 - 1}{h} \]
• First we should check that we cannot just substitute h = 0 into this — clearly we cannot
because the denominator would be 0.
• But we should also check the numerator to see if we have $\frac{0}{0}$, and we see that the numerator
gives us $1 - 1 = 0$.
• Thus we have a hint that there is a common factor that we might be able to cancel. So now we
look for the common factor and try to cancel it.
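\begin{align*}
\frac{(1 + h)^2 - 1}{h} &= \frac{1 + 2h + h^2 - 1}{h} && \text{expand} \\
&= \frac{2h + h^2}{h} = \frac{h(2 + h)}{h} && \text{factor and then cancel} \\
&= 2 + h
\end{align*}
\begin{align*}
\lim_{h \to 0} \frac{(1 + h)^2 - 1}{h} &= \lim_{h \to 0} 2 + h \\
&= 2
\end{align*}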
Example 2.1.25
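The algebra in Example 2.1.25 can be double-checked numerically. The sketch below (the name `dq` is just an illustrative label) compares the quotient with the simplified expression $2 + h$ for small non-zero h.

```python
def dq(h):
    """The quotient ((1 + h)^2 - 1) / h, defined only for h != 0."""
    return ((1 + h)**2 - 1) / h

for h in (0.1, -0.1, 0.001, -0.001, 1e-6, -1e-6):
    print(f"h = {h:>8g}   quotient = {dq(h):.6f}   2 + h = {2 + h:.6f}")
```

The two columns agree up to round-off, and both approach 2 as h shrinks from either side.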
We have written everything out in great detail here — way more than is required for a solution to
such a problem. Let us do it again a little more succinctly.
Example 2.1.26
Compute the following limit:
\[ \lim_{h \to 0} \frac{(1 + h)^2 - 1}{h} \]
If we try to use the arithmetic of limits, then we see that the limit of the numerator and the limit of
the denominator are both zero. Hence we should try to factor them and cancel any common factor.
This gives
\begin{align*}
\lim_{h \to 0} \frac{(1 + h)^2 - 1}{h} &= \lim_{h \to 0} \frac{1 + 2h + h^2 - 1}{h} \\
&= \lim_{h \to 0} 2 + h \\
&= 2
\end{align*}
Example 2.1.26
Notice that even though we did this example carefully above, we have still written some text in our
working explaining what we have done. You should always think about the reader and, if in doubt,
put in more explanation rather than less.
Up until this point we have discussed what happens to a function as we move its input x closer and
closer to a particular point a. For a great many applications of limits we need to understand what
happens to a function when its input becomes extremely large — for example what happens to a
population at a time far in the future.
The definition of a limit at infinity has a similar flavour to the definition of limits at finite points
that we saw above, but the details are a little different. We also need to distinguish between positive
and negative infinity. As x becomes very large and positive it moves off towards $+\infty$ but when it
becomes very large and negative it moves off towards $-\infty$.
Again we give an informal definition; the full formal definition is beyond the scope of this course.
We write
\[ \lim_{x \to \infty} f(x) = L \]
when the value of the function f(x) gets closer and closer to L as we make x larger and
larger and positive.
Similarly we write
\[ \lim_{x \to -\infty} f(x) = L \]
when the value of the function f (x) gets closer and closer to L as we make x larger and
larger and negative.
Example 2.1.28
Consider the two functions depicted below
The dotted horizontal lines indicate the behaviour as x becomes very large. The function on the left
has limits as $x \to \infty$ and as $x \to -\infty$ since the function “settles down” to a particular value. On the
other hand, the function on the right does not have a limit as $x \to -\infty$ since the function just keeps
getting bigger and bigger.
Example 2.1.28
Just as was the case for limits as x Ñ a we will start with two very simple building blocks and
build other limits from those.
Theorem 2.1.29.
\[ \lim_{x \to \infty} c = c \qquad \lim_{x \to -\infty} c = c \]
\[ \lim_{x \to \infty} \frac{1}{x} = 0 \qquad \lim_{x \to -\infty} \frac{1}{x} = 0 \]
Note that, as was the case in Theorem 2.1.20, we need a little extra care with powers of functions.
We must avoid taking square roots of negative numbers, or indeed any even root of a negative
number$^9$.
Hence we have for all rational $r > 0$
\[ \lim_{x \to \infty} \frac{1}{x^r} = 0 \]
• On the other hand, $x^{4/3}$ is defined for negative values of x and $\lim_{x \to -\infty} \frac{1}{x^{4/3}} = 0$.
9 To be more precise, there is no real number x so that $x^{\text{even power}}$ is a negative number. Hence we cannot take the
even-root of a negative number and express it as a real number. This is precisely what complex numbers allow us
to do, but alas, there is not space in the course for us to explore them.
10 where we write $r = \frac{p}{q}$ with p, q integers with no common factors. For example, $r = \frac{6}{14}$ should be written as $r = \frac{3}{7}$
when considering this rule.
Our first application of limits at infinity will be to examine the behaviour of a rational function
for very large x. To do this we use a “trick”.
Example 2.1.31
Compute the following limit:
\[ \lim_{x \to \infty} \frac{x^2 - 3x + 4}{3x^2 + 8x + 1} \]
As x becomes very large, it is the x2 term that will dominate in both the numerator and denominator
and the other bits become irrelevant. (This is the asymptotic reasoning you’ve seen earlier.) That is,
for very large x, x2 is much much larger than x or any constant. So we pull out these dominant parts
x 2 1´ 3 + 4
x ´ 3x + 4
2 x x 2
2
=
3x + 8x + 1 x2 3 + 8 + 1
x x2
1 ´ 3x + x42
= remove the common factors
3 + 8x + x12
x2 ´ 3x + 4 1 ´ 3x + x42
lim = lim
xÑ8 3x2 + 8x + 1 xÑ8 3 + 8 + 1
2
x x
3 4
lim 1 ´ + 2
xÑ8 x x
= arithmetic of limits
8 1
lim 3 + + 2
xÑ8 x x
3 4
lim 1 ´ lim + lim 2
xÑ8 x xÑ8 x
= xÑ8 more arithmetic of limits
8 1
lim 3 + lim + lim 2
xÑ8 xÑ8 x xÑ8 x
1+0+0 1
= = .
3+0+0 3
Example 2.1.31
• The biggest contribution to the numerator comes from the $4x^2$ inside the square-root. When
we pull $x^2$ outside the square-root it becomes $x$, so the numerator is dominated by $x \cdot \sqrt{4} = 2x$.
Example 2.1.32
Now let us also think about the limit of the same function, $\frac{\sqrt{4x^2+1}}{5x-1}$, as $x \to -\infty$. There is
something subtle going on because of the square-root. First consider the function (see footnote 11)
\[ h(t) = \sqrt{t^2}. \]
Evaluating this at $t = 7$ gives
\[ h(7) = \sqrt{7^2} = \sqrt{49} = 7. \]
We’ll get much the same thing for any $t \ge 0$. For any $t \ge 0$, $h(t) = \sqrt{t^2}$ returns exactly $t$. However
now consider the function at $t = -3$:
\[ h(-3) = \sqrt{(-3)^2} = \sqrt{9} = 3 = -(-3); \]
that is, the function is returning $-1$ times the input.
This is because when we defined $\sqrt{\ }$, we defined it to be the positive square-root, i.e. the function
$\sqrt{t}$ can never return a negative number. So being more careful
\[ h(t) = \sqrt{t^2} = |t|, \]
where $|t|$ is the absolute value of $t$. You are perhaps used to thinking of absolute value as “remove
the minus sign”, but this is not quite correct. Let’s sketch the function:
Figure 2.1.2.
11 Just to change things up let’s use t and h(t ) instead of the ubiquitous x and f (x).
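The identity $\sqrt{t^2} = |t|$ can be checked numerically. The following is a minimal sketch in Python; the function name `h` follows the text, while the sample inputs are chosen here purely for illustration.

```python
import math

def h(t):
    """Return the non-negative square root of t squared."""
    return math.sqrt(t * t)

# Compare h(t) with the absolute value for positive, zero and negative inputs.
for t in [7, 0, -3, -0.5]:
    print(t, h(t), abs(t))
```

For the negative inputs the two right-hand columns agree with each other but not with the input itself, which is the subtlety the text describes.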
We are now ready to examine the limit as $x \to -\infty$ in our previous example. Mostly it is copy and
paste from above.
Example 2.1.33
Find the limit as $x \to -\infty$ of $\frac{\sqrt{4x^2+1}}{5x-1}$.
We use the same trick: try to work out what is the biggest term in the numerator and
denominator and pull it to one side. Since we are taking the limit as $x \to -\infty$ we should think of $x$
as a large negative number.
• The denominator is dominated by $5x$.
• The biggest contribution to the numerator comes from the $4x^2$ inside the square-root. When
we pull the $x^2$ outside a square-root it becomes $|x| = -x$ (since we are taking the limit as
$x \to -\infty$), so the numerator is dominated by $-x \cdot \sqrt{4} = -2x$.
• To see this more explicitly rewrite the numerator
\begin{align*}
\sqrt{4x^2 + 1} &= \sqrt{x^2 \left( 4 + 1/x^2 \right)} = \sqrt{x^2} \sqrt{4 + 1/x^2} \\
  &= |x| \sqrt{4 + 1/x^2} && \text{and since $x < 0$ we have} \\
  &= -x \sqrt{4 + 1/x^2}
\end{align*}
Example 2.1.33
So the limit as $x \to -\infty$ is almost the same but we gain a minus sign. This is definitely not the case
in general; you have to think about each example separately.
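As a numerical sanity check on the sign change, here is a small Python sketch. It assumes the dominant-term reasoning above: the numerator behaves like $2x$ or $-2x$ and the denominator like $5x$, so the values should settle near $2/5$ for large positive $x$ and near $-2/5$ for large negative $x$. The sample points are arbitrary.

```python
import math

def f(x):
    """The function sqrt(4x^2 + 1) / (5x - 1) from the example."""
    return math.sqrt(4 * x * x + 1) / (5 * x - 1)

# Evaluate far to the right and far to the left of the origin.
for x in [1e2, 1e4, 1e6]:
    print(x, f(x), f(-x))
```

This only suggests the limits; it does not replace the algebraic argument.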
Here is a sketch of the function in question.
Figure 2.1.3.
Example 2.1.34
Compute the following limit:
\[ \lim_{x \to \infty} \left( x^{7/5} - x \right) \]
From our asymptotic reasoning, we know the higher-power power function will dominate for
large values of $x$. So, although both $x^{7/5}$ and $x$ grow without bound as $x \to \infty$, the first term will be
much bigger. So, we expect $\lim_{x \to \infty} \left( x^{7/5} - x \right) = \infty$.
That’s a fine way of computing the limit, but for interest, let’s see how it would go using
arithmetic of limits. In this case we cannot use the arithmetic of limits to write this as
\begin{align*}
\lim_{x \to \infty} \left( x^{7/5} - x \right) &= \lim_{x \to \infty} x^{7/5} - \lim_{x \to \infty} x \\
  &= \infty - \infty
\end{align*}
because the limits do not exist. We can only use the limit laws when the limits exist. So we should
go back and think some more.
When $x$ is very large, $x^{7/5} = x \cdot x^{2/5}$ will be much larger than $x$, so the $x^{7/5}$ term will dominate
the $x$ term. So factor out $x^{7/5}$ and rewrite it as
\[ x^{7/5} - x = x^{7/5} \left( 1 - \frac{1}{x^{2/5}} \right) \]
• For large $x$, $x^{7/5} > x$ (this is actually true for any $x > 1$). In the limit as $x \to +\infty$, $x$ becomes
arbitrarily large and positive, and $x^{7/5}$ must be bigger still, so it follows that
\[ \lim_{x \to +\infty} x^{7/5} = +\infty. \]
• On the other hand, $(1 - x^{-2/5})$ becomes closer and closer to 1; we can use the arithmetic
of limits to write this as
\[ \lim_{x \to +\infty} \left( 1 - x^{-2/5} \right) = 1 - \lim_{x \to +\infty} x^{-2/5} = 1 - 0 = 1. \]
So the product of these two factors will become larger and larger (and positive) as $x$ moves off to
infinity. Hence we have
\[ \lim_{x \to \infty} x^{7/5} \left( 1 - 1/x^{2/5} \right) = +\infty. \]
Example 2.1.34
But remember $+\infty$ and $-\infty$ are not numbers; the last equation in the example is shorthand for “the
function becomes arbitrarily large”.
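The factoring step in Example 2.1.34 can be illustrated numerically. The sketch below is only an illustration: the names `difference` and `factored` and the sample values of $x$ are chosen here and do not come from the text.

```python
import math

def difference(x):
    """x^(7/5) - x, written as in the original limit."""
    return x ** (7 / 5) - x

def factored(x):
    """The same quantity after factoring out x^(7/5)."""
    return x ** (7 / 5) * (1 - x ** (-2 / 5))

# The second factor drifts toward 1 while the product keeps growing.
for x in [10.0, 1e3, 1e6]:
    print(x, difference(x), factored(x), 1 - x ** (-2 / 5))
```

The two forms agree up to rounding, and the growth comes entirely from the $x^{7/5}$ factor.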
In the previous section we saw that finite limits and arithmetic interact very nicely (see
Theorems 2.1.14 and 2.1.20). This enabled us to compute the limits of more complicated functions
in terms of simpler ones. When limits of functions go to plus or minus infinity we are quite a
bit more restricted in what we can deduce. The next theorem states some results concerning the
sum, difference, ratio and product of infinite limits. Unfortunately, in many cases we cannot make
general statements and the results will depend on the details of the problem at hand.
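Let $a, c, H \in \mathbb{R}$ and let $f, g, h$ be functions defined in an interval around $a$ (but they need
not be defined at $x = a$), so that
\[ \lim_{x \to a} f(x) = +\infty \qquad \lim_{x \to a} g(x) = +\infty \qquad \lim_{x \to a} h(x) = H \]
• $\displaystyle \lim_{x \to a} \frac{f(x)}{g(x)}$ is undetermined
• $\displaystyle \lim_{x \to a} \frac{f(x)}{h(x)} = \begin{cases} +\infty & H > 0 \\ -\infty & H < 0 \\ \text{undetermined} & H = 0 \end{cases}$
• $\displaystyle \lim_{x \to a} \frac{h(x)}{f(x)} = 0$
• $\displaystyle \lim_{x \to a} f(x)^p = \begin{cases} +\infty & p > 0 \\ 0 & p < 0 \\ 1 & p = 0 \end{cases}$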
Note that by “undetermined” we mean that the limit may or may not exist, but cannot be
determined from the information given in the theorem. See Example 2.1.18 for an example of what
we mean by “undetermined”. Additionally consider the following example.
Example 2.1.36
L IMITS 2.2 A SYMPTOTES
Say we want to compute the limit of the difference of two of the above functions as $x \to 0$. Then
the previous theorem cannot help us. This is not because it is too weak; rather, it is because the
difference of two infinite limits can be either plus infinity, minus infinity or some finite number
depending on the details of the problem. For example,
Example 2.1.36
2.2 Asymptotes
Learning Objectives
• Evaluate limits of polynomial, rational, trigonometric, exponential, and logarithmic
functions.
• Explain using both informal language and the language of limits what it means for a
function to have a horizontal or vertical asymptote.
• Given a simple function, find its vertical and horizontal asymptotes by asymptotic
reasoning or by taking limits.
• Explain why it is not true that a function cannot cross its horizontal asymptote.
Definition 2.2.1.
Let $f(x)$ be a function. If $\lim_{x \to \infty} f(x) = L$ OR $\lim_{x \to -\infty} f(x) = L$, for some real number $L$,
then we say the line $y = L$ is a horizontal asymptote of $f(x)$.
Example 2.2.2
Consider the function $f(x) = \frac{3x^4}{1 + x^4}$, pictured below.
[Figure: graph of $y = f(x)$.]
For large positive and large negative values of x, the function looks nearly flat. To investigate
this ‘flatness,’ we can take limits at infinity. This can be done using algebra, or asymptotics.
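Option 1, algebra: dividing the numerator and denominator by $x^4$,
\[ \lim_{x \to \infty} \frac{3x^4}{1 + x^4} = \lim_{x \to \infty} \frac{3}{\frac{1}{x^4} + 1} = \frac{3}{0 + 1} = 3 \]
That is: as $x$ gets larger and larger, $f(x)$ gets closer and closer to 3.
The computation is similar for $\lim_{x \to -\infty} f(x)$.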
Option 2, asymptotics: Let’s consider very large positive values of $x$. The denominator $1 + x^4$
behaves much like $x^4$ when $|x|$ is large, so the entire function behaves much like $\frac{3x^4}{x^4}$, which is
just the constant 3. That is: for very large positive values of $x$, the function looks quite a lot
like the horizontal line $y = 3$.
The computation is similar for $\lim_{x \to -\infty} f(x)$.
So, this function has a horizontal asymptote, y = 3. This is often emphasized in a sketch with
dashed lines.
[Figure: graph of $y = f(x)$ with the horizontal asymptote $y = 3$ drawn as a dashed line.]
Example 2.2.2
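The flattening toward $y = 3$ in Example 2.2.2 can also be seen by direct evaluation. This is a minimal Python sketch; the sample points are chosen here for illustration.

```python
def f(x):
    """The function 3x^4 / (1 + x^4) from Example 2.2.2."""
    return 3 * x**4 / (1 + x**4)

# Values approach 3 from below on both sides, since f(x) = 3 - 3/(1 + x^4).
for x in [1, 10, 100, -10, -100]:
    print(x, f(x))
```

Because only even powers of $x$ appear, the values at $x$ and $-x$ coincide, which matches the remark that the computation for $x \to -\infty$ is similar.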
Since 0 is a real number, we see that $e^x$ has a horizontal asymptote at $y = 0$. Since $\infty$ is not a
real number, $y = 0$ is the only horizontal asymptote of $e^x$.
[Figure: graph of $y = e^x$.]
Example 2.2.3
[Figure: graph of $y = f(x)$.]
Second example: A function might take on the value of its horizontal asymptote while the function
is busy not pretending to be constant.
A function such as $g(x) = \frac{x}{1 + x^2}$ has a horizontal asymptote of $y = 0$ both to the left, and to
the right:
\[ \lim_{x \to -\infty} \frac{x}{1 + x^2} = 0 \quad \text{and} \quad \lim_{x \to \infty} \frac{x}{1 + x^2} = 0 \]
That is, when $x$ is very large (positive or negative) $g(x)$ is nearly constant. However, $g(x)$ is
not ‘nearly constant’ everywhere. When $x$ is close to 0, $g(x)$ moves around quite a bit. And, at
the origin, we just so happen to have $g(0) = 0$.
[Figure: graph of $y = g(x)$.]
Third example: A function might cross its horizontal asymptote infinitely many times.
Consider $h(x) = \frac{\sin x}{x}$. As $|x|$ grows larger and larger, the magnitude, or absolute value, of $h(x)$
shrinks to 0:
\[ \lim_{x \to -\infty} \frac{\sin x}{x} = 0 \quad \text{and} \quad \lim_{x \to \infty} \frac{\sin x}{x} = 0 \]
However, the sign of the function changes endlessly: it’s positive for $0 < x < \pi$, negative for
$\pi < x < 2\pi$, positive again for $2\pi < x < 3\pi$, etc. That leads to an oscillating behaviour. In
particular, $h(x) = 0$ when $x$ is a nonzero integer multiple of $\pi$.
Remark: the oscillating behaviour in the sketch below has been exaggerated. In a more
accurate sketch, $h(x)$ quickly appears indistinguishable from 0.
[Figure: graph of $y = h(x)$.]
Example 2.2.4
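The third example can be explored numerically. The sketch below, in Python, evaluates $h(x) = \sin(x)/x$ at multiples of $\pi$ (where it should vanish, up to floating-point rounding) and halfway between them (where the sign alternates). The choice of sample points is an assumption made here for illustration.

```python
import math

def h(x):
    """sin(x) / x, defined for nonzero x."""
    return math.sin(x) / x

# At x = n*pi the value is zero up to rounding; at x = n*pi + pi/2 the sign
# alternates and the magnitude shrinks like 1/x.
for n in range(1, 6):
    x = n * math.pi
    print(n, h(x), h(x + math.pi / 2))
```

The printed zeros appear as tiny numbers rather than exact zeros because $\pi$ cannot be represented exactly in floating point.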
The counterpart to the horizontal asymptote is, not surprisingly, the vertical asymptote.
Definition 2.2.5.
Let a be a real number and let f (x) be a function. We say f (x) has a vertical asymptote at
a if at least one of the following is true:
That is, a function has a vertical asymptote where it has an infinite discontinuity (see section 2.3
for more about continuity).
[Figure: graph of $y = \frac{1}{x^2}$.]
Example 2.2.6
[Figure: graph of $y = \frac{1}{x}$.]
Example 2.2.7
[Figure: graph of $y = \log x$.]
Example 2.2.8
\[ \lim_{x \to 0^+} t = \lim_{x \to 0^+} \frac{1}{x} = \infty
   \implies \lim_{x \to 0^+} e^{1/x} = \lim_{t \to \infty} e^t = \infty \]
This tells us $e^{1/x}$ has a vertical asymptote at $x = 0$. Now let’s find the limit from the other side.
\[ \lim_{x \to 0^-} t = \lim_{x \to 0^-} \frac{1}{x} = -\infty
   \implies \lim_{x \to 0^-} e^{1/x} = \lim_{t \to -\infty} e^t = 0 \]
So, interestingly, the limit from the right is infinite, while the limit from the left is finite. Finally,
let’s consider large-magnitude values of $x$:
\[ \lim_{x \to \pm\infty} t = \lim_{x \to \pm\infty} \frac{1}{x} = 0
   \implies \lim_{x \to \pm\infty} e^{1/x} = e^0 = 1 \]
• When $x$ is large and positive, $e^{1/x} \approx 1$. Since $x > 0$, then $\frac{1}{x} > 0$, so $e^{1/x} > 1$. So, on the far
right of our graph, our function will be close to 1, but a little larger.
• When $x$ is large and negative, $e^{1/x} \approx 1$. Since $x < 0$, then $\frac{1}{x} < 0$, so $e^{1/x} < 1$. So, on the far
left of our graph, our function will be close to 1, but a little smaller.
• When $x$ approaches 0 from the left, $e^{1/x}$ approaches 0. So, from the left, our function will
approach the origin. Note, however, that $e^{1/x}$ is not defined at $x = 0$.
• When $x$ approaches 0 from the right, $e^{1/x}$ will blow up, increasing without bound.
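The four behaviours listed above can be observed by evaluating $e^{1/x}$ at a few points. This is a rough Python sketch; the sample values are chosen here, and they stay away from very small positive $x$ because `math.exp` raises an overflow error for very large arguments.

```python
import math

def f(x):
    """e^(1/x), defined for nonzero x."""
    return math.exp(1 / x)

# Near zero: huge from the right, nearly zero from the left.
for x in [0.5, 0.1, 0.01]:
    print(x, f(x), f(-x))

# Far from zero: slightly above 1 on the right, slightly below 1 on the left.
for x in [10.0, 1000.0]:
    print(x, f(x), f(-x))
```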
L IMITS 2.3 L IMITS AND CONTINUITY
For interest, a more accurate graph of y = f (x) is shown below. We repeat that you do not need
to know how to achieve this level of accuracy right now, but you will learn it later.
[Figure: graph of $y = e^{1/x}$.]
Example 2.2.9
Learning Objectives
• Explain informally and formally what it means for a function to be continuous on its
domain.
• Given a function defined with parameters, select parameter values that make the
function continuous.
We have seen that computing the limits of some functions (polynomials and rational functions)
is very easy because
\[ \lim_{x \to a} f(x) = f(a). \]
That is, the limit as $x$ approaches $a$ is just $f(a)$. Roughly speaking, the reason we can compute
the limit this way is that these functions do not have any abrupt jumps near $a$.
Many other functions have this property, $\sin(x)$ for example. A function with this property is
called “continuous” and there is a precise mathematical definition for it.
Definition 2.3.1.
• f (a) exists
We already know from our work above that polynomials are continuous, and that rational
functions are continuous at all points in their domains — i.e. where their denominators are non-zero.
As we did for limits, we will see that continuity interacts “nicely” with arithmetic. This will allow
us to construct complicated continuous functions from simpler continuous building blocks (like
polynomials).
But first, a few examples. . .
Example 2.3.2
Consider the functions drawn below
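These are
\[ f(x) = \begin{cases} x & x < 1 \\ x + 2 & x \ge 1 \end{cases} \qquad
   g(x) = \begin{cases} 1/x^2 & x \ne 0 \\ 0 & x = 0 \end{cases} \qquad
   h(x) = \begin{cases} \frac{x^3 - x^2}{x - 1} & x \ne 1 \\ 0 & x = 1 \end{cases} \]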
• When $x < 1$ then $f(x)$ is a straight line (and so a polynomial) and so it is continuous at every
point $x < 1$. Similarly when $x > 1$ the function is a straight line and so it is continuous at
every point $x > 1$. The only point which might be a discontinuity is at $x = 1$. We see that the
one-sided limits are different. Hence the limit at $x = 1$ does not exist and so the function is
discontinuous at $x = 1$.
But note that $f(x)$ is continuous from one side. Which?
• The middle case is much like the previous one. When $x \ne 0$, $g(x)$ is a rational function
and so is continuous everywhere on its domain (which is all reals except $x = 0$). Thus the
only point where $g(x)$ might be discontinuous is at $x = 0$. We see that neither of the one-sided
limits exists at $x = 0$, so the limit does not exist at $x = 0$. Hence the function is discontinuous
at $x = 0$.
• We have seen the function $h(x)$ before. By the same reasoning as above, we know it is
continuous except at $x = 1$ which we must check separately.
By definition of $h(x)$, $h(1) = 0$. We must compare this to the limit as $x \to 1$. We did this
before.
\[ \frac{x^3 - x^2}{x - 1} = \frac{x^2 (x - 1)}{x - 1} = x^2 \]
So $\lim_{x \to 1} \frac{x^3 - x^2}{x - 1} = \lim_{x \to 1} x^2 = 1 \ne h(1)$. Hence $h$ is discontinuous at $x = 1$.
Example 2.3.2
• The function $f(x)$ has a “jump discontinuity” because the function “jumps” from one finite
value on the left to another value on the right.
• The second function, $g(x)$, has an “infinite discontinuity” since $\lim_{x \to 0} g(x) = +\infty$.
• The third function, $h(x)$, has a “removable discontinuity” because we could make the function
continuous at that point by redefining the function at that point, i.e. setting $h(1) = 1$. That is
\[ \text{new function } h(x) = \begin{cases} \frac{x^3 - x^2}{x - 1} & x \ne 1 \\ 1 & x = 1 \end{cases} \]
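To see the removable discontinuity concretely, one can tabulate $h(x)$ near $x = 1$. The Python sketch below is an illustration only; `h_new` is a name introduced here for the redefined function, and the sample points are arbitrary.

```python
def h(x):
    """The original h from Example 2.3.2, with h(1) = 0."""
    if x == 1:
        return 0
    return (x**3 - x**2) / (x - 1)

def h_new(x):
    """The redefined function, with the value at x = 1 changed to 1."""
    if x == 1:
        return 1
    return (x**3 - x**2) / (x - 1)

# Away from x = 1 both functions equal x^2, so nearby values approach 1.
for x in [0.9, 0.99, 0.999, 1.001, 1.01, 1.1]:
    print(x, h(x))

print(h(1), h_new(1))
```

The nearby values cluster around 1, so only the redefined function takes the value that the limit predicts at $x = 1$.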
Showing a function is continuous can be a pain, but just as the limit laws help us compute
complicated limits in terms of simpler limits, we can use them to show that complicated functions
are continuous by breaking them into simpler pieces.
Let $a, c \in \mathbb{R}$ and let $f(x)$ and $g(x)$ be functions that are continuous at $a$. Then the following
functions are also continuous at $x = a$:
Above we stated that polynomials and rational functions are continuous (being careful about
domains of rational functions — we must avoid the denominators being zero) without making it a
formal statement. This is easily fixed. . .
Lemma 2.3.4.
f (x ) = x g(x ) = c
This isn’t quite the result we wanted (that’s a couple of lines below) but it is a small result that
we can combine with the arithmetic of limits to get the result we want. Such small helpful results
are called “lemmas” and they will arise more as we go along.
Now since we can obtain any polynomial and any rational function by carefully adding, subtracting,
multiplying and dividing the functions $f(x) = x$ and $g(x) = c$, the above lemma combines with
the “arithmetic of continuity” theorem to give us the result we want:
With some more work this result can be extended to wider families of functions:
Theorem 2.3.6.
\[ -1 \le \cos(x) \le 1 \]
\[ 1 \le 2 + \cos(x) \le 3 \]
12 Remember that sin and cos are defined on all real numbers, so $\tan(x) = \sin(x)/\cos(x)$ is continuous everywhere
except where $\cos(x) = 0$. This happens when $x = \frac{\pi}{2} + n\pi$ for any integer $n$. If you cannot remember where $\tan(x)$
“blows up” or $\sin(x) = 0$ or $\cos(x) = 0$ then you should definitely revise trigonometric functions. Come to think of
it, just revise them anyway.
13 If you do not know this fact then you should revise trigonometric functions. See the previous footnote.
If the function were changed to $\frac{\sin(x)}{x^2 - 5x + 6}$ much of the same reasoning can be used. Being a
little terse we could answer with:
Note that this example raises a subtle point about checking continuity when numerator and denominator
are simultaneously zero. There are quite a few possible outcomes in this case and we need
more sophisticated tools to adequately analyse the behaviour of functions near such points. We will
return to this question later in the text after we have developed Taylor expansions.
Example 2.3.7
So we know what happens when we add, subtract, multiply and divide, but what about when we
compose functions? Well, limits and compositions work nicely when things are continuous.
Our first step should be to break the functions down into pieces and study them. When we put them
back together we should be careful of dividing by zero, or falling outside the domain.
• In order for $g(x)$ to be defined and continuous we must restrict $x$ so that $\sin(x) \ge 0$.
• Hence $g(x)$ is continuous when $x \in [2n\pi, (2n + 1)\pi]$ for any integer $n$.
Example 2.3.9
Continuous functions are very nice (mathematically speaking). Functions from the “real world”
tend to be continuous (though not always). The key aspect that makes them nice is the fact that they
don’t jump about.
Differentiation
Chapter 3
Learning Objectives
• Given an equation for a line, sketch the line, and identify its slope.
• Find a line from two points; from a point and a slope; or from a clearly labelled graph.
As you’ll soon see, derivatives and lines are closely related. To make the discussion of derivatives
smoother (see footnote 1), we’ll do a quick review of lines.
1 This is a pun, but you might need to read a bit further to recognize it.
I NTRODUCTION TO THE D ERIVATIVE 3.1 R EVIEW: LINES
• A line of positive slope is increasing (going upwards as you move to the right).
• A line of negative slope is decreasing (going downwards as you move to the right).
The equation of a line is in ‘slope-intercept form’ if it has the form
y = mx + b
where $m$ and $b$ are real numbers. The slope of the line is $m$, and it passes through the point $(0, b)$.
Example 3.1.1
Sketch the line $y = 3x - 2$.
Solution: The slope of the line is 3, and the line passes through the point $(0, -2)$. So, we can
start with putting a point at $(0, -2)$. Then, move right 1 and up 3 to find another point on the line,
$(1, 1)$. Draw the line between these two points.
Example 3.1.1
Example 3.1.2
Sketch the line $y = 2(1 - x)$.
Solution: This isn’t in slope-intercept form, but we can manipulate it to be:
\[ y = -2x + 2. \]
So, the slope of the line is $-2$, and the line passes through the point $(0, 2)$.
Example 3.1.2
Example 3.1.3
Give an equation for a line passing through the point (1, 2) with slope 3.
Solution: We aren’t told which format to put it in. Since we have a point and a slope, point-slope
format is easiest:
\[ y - 2 = 3(x - 1). \]
Example 3.1.3
Example 3.1.4
Give an equation for a line passing through the points (1, 2) and (3, 3).
Solution: If we had the slope, we could use point-slope. So, let’s find the slope!
\[ m = \frac{\Delta y}{\Delta x} = \frac{3 - 2}{3 - 1} = \frac{1}{2}. \]
Now, we can write the line in point-slope form. The following two equations are equivalent to one
another (and, therefore, both correct answers):
\[ y - 2 = \frac{1}{2}(x - 1) \]
\[ y - 3 = \frac{1}{2}(x - 3) \]
Since we weren’t told which form to put the equation into, other answers are possible.
Example 3.1.4
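The claim that the two point-slope equations in Example 3.1.4 describe the same line can be checked with exact rational arithmetic. The following Python sketch uses the standard `fractions` module; the helper names `slope` and `point_slope` are introduced here for illustration.

```python
from fractions import Fraction

def slope(p, q):
    """Slope of the line through the points p and q (with different x values)."""
    (x0, y0), (x1, y1) = p, q
    return Fraction(y1 - y0, x1 - x0)

def point_slope(point, m):
    """Return the function x -> y for the line through `point` with slope m."""
    x0, y0 = point
    return lambda x: y0 + m * (x - x0)

p, q = (1, 2), (3, 3)
m = slope(p, q)
line_p = point_slope(p, m)   # y - 2 = m (x - 1)
line_q = point_slope(q, m)   # y - 3 = m (x - 3)

print(m)
print(all(line_p(x) == line_q(x) for x in range(-5, 6)))
```

Two lines that agree at two distinct points are the same line, so agreement over a range of integer inputs is more than enough here.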
Example 3.1.5
Find an equation for the line sketched below. Each gridline corresponds to a single unit.
[Figure: a line sketched on a unit grid.]
Solution: From the gridlines, we can be fairly certain that the line passes through the points
$(-4, -3)$ and $(4, 2)$. (These are the only places shown where the line intersects a point on the grid
where both the $x$- and $y$-values are integers. It’s tough to guess at the exact value of a point on the
line elsewhere, so we don’t try.) As in Example 3.1.4, we use these two points to find the slope:
\[ m = \frac{2 - (-3)}{4 - (-4)} = \frac{5}{8} \]
From here, the easiest format to use is point-slope. The two equations below are equivalent:
\[ y + 3 = \frac{5}{8}(x + 4) \]
\[ y - 2 = \frac{5}{8}(x - 4) \]
Example 3.1.5
Example 3.1.6
Each equation below describes a line. Sort them into collections of equations describing the same
line.
A. $y = 2x + 1$          D. $y - 5 = 2(x - 2)$          G. $x = \frac{y - 1}{2}$
B. $y = 2x - 1$          E. $2y + 2 = 4x$               H. $2 - y = x + 1$
C. $y = 1 - x$           F. $y - 1 = 2(x - 1)$          I. $y + 3 = 2(x + 1)$
Solution: One method is to manipulate each equation algebraically until they are in slope-
intercept form, and then see which are the same. Equations A and B are already in slope-intercept
form.
C: $y = 1 - x \iff y = -x + 1$. C is equivalent to neither A nor B.
D: $y - 5 = 2(x - 2) \iff y - 5 = 2x - 4 \iff y = 2x + 1$. D is equivalent to A.
E: $2y + 2 = 4x \iff 2y = 4x - 2 \iff y = 2x - 1$. E is equivalent to B.
F: $y - 1 = 2(x - 1) \iff y - 1 = 2x - 2 \iff y = 2x - 1$. F is equivalent to B.
G: $x = \frac{y - 1}{2} \iff 2x = y - 1 \iff y = 2x + 1$. G is equivalent to A.
H: $2 - y = x + 1 \iff -2 + y = -x - 1 \iff y = -x + 1$. H is equivalent to C.
I: $y + 3 = 2(x + 1) \iff y + 3 = 2x + 2 \iff y = 2x - 1$. I is equivalent to B.
All together: A, D, and G describe the same line; B, E, F, and I describe the same line; and C and H
describe the same line.
Example 3.1.6
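The sorting in Example 3.1.6 can be automated. The Python sketch below is one possible approach, not the method used in the text: each equation is stored as "left side minus right side", the coefficients of that linear expression are recovered by evaluating it at three points, and the equations are grouped by their slope-intercept pair. The names `equations`, `slope_intercept` and `groups` are introduced here, and the approach assumes every equation is linear with a nonzero coefficient of $y$.

```python
from collections import defaultdict
from fractions import Fraction

# Each entry returns (left side) - (right side) of the equation.
equations = {
    "A": lambda x, y: y - (2 * x + 1),
    "B": lambda x, y: y - (2 * x - 1),
    "C": lambda x, y: y - (1 - x),
    "D": lambda x, y: (y - 5) - 2 * (x - 2),
    "E": lambda x, y: (2 * y + 2) - 4 * x,
    "F": lambda x, y: (y - 1) - 2 * (x - 1),
    "G": lambda x, y: x - Fraction(y - 1, 2),
    "H": lambda x, y: (2 - y) - (x + 1),
    "I": lambda x, y: (y + 3) - 2 * (x + 1),
}

def slope_intercept(residual):
    """Write residual(x, y) = a*x + b*y + c and return (m, k) with y = m*x + k."""
    c = Fraction(residual(0, 0))        # constant term
    a = Fraction(residual(1, 0)) - c    # coefficient of x
    b = Fraction(residual(0, 1)) - c    # coefficient of y (assumed nonzero)
    return (-a / b, -c / b)

groups = defaultdict(list)
for name, residual in equations.items():
    groups[slope_intercept(residual)].append(name)

for (m, k), names in groups.items():
    print(f"y = {m}x + {k}: {names}")
```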
Suppose every piece of a piecewise-defined function is linear. Then at any point (except where the
function switches from one piece to the other), the function locally looks like a line, so we can find
its slope at that point.
Example 3.1.7
Consider the function below:
\[ f(x) = \begin{cases} 3 - x & \text{for } x \le -2 \\ 2x + 1 & \text{for } -2 < x \le 1 \\ 5 - 2x & \text{for } x > 1 \end{cases} \]
Sketch $y = f(x)$, and give the slope of the line making up the function at all points $x$ except $x = -2$
and $x = 1$. (In fact, we can call these numbers the slope of the function itself.)
Solution:
• For $x \le -2$, we sketch a line with slope $-1$. When $x$ is (say) $-3$, then $y$ is 6, so one point we
know is $(-3, 6)$.
• For $-2 < x \le 1$, the line to draw has slope 2 and passes through the point $(0, 1)$.
• For $x > 1$, we sketch a line with slope $-2$. When $x$ is (say) 2, then $y$ is 1, so one point we
know is $(2, 1)$.
I NTRODUCTION TO THE D ERIVATIVE 3.2 S LOPES AND RATES OF CHANGE
As you’ll see later, it won’t make sense to us to talk about the “slope” of the function when $x$ is $-2$
or 1. At these places, $f(x)$ doesn’t look much like a line.
Example 3.1.7
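The piecewise function of Example 3.1.7 and its slopes can be written out directly. This Python sketch is only an illustration; the helper `slope_at` is a name introduced here, and it refuses the two breakpoints because the slope is not defined there.

```python
def f(x):
    """The piecewise linear function from Example 3.1.7."""
    if x <= -2:
        return 3 - x
    elif x <= 1:
        return 2 * x + 1
    else:
        return 5 - 2 * x

def slope_at(x):
    """Slope of the linear piece containing x (undefined at the breakpoints)."""
    if x in (-2, 1):
        raise ValueError("slope is not defined at x = -2 or x = 1")
    if x < -2:
        return -1
    elif x < 1:
        return 2
    else:
        return -2

# The sample points used in the solution: (-3, 6), (0, 1) and (2, 1).
print(f(-3), f(0), f(2))
print(slope_at(-3), slope_at(0), slope_at(2))
```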
Learning Objectives
• Describe the slope of a linear function as the rate of change of that function (change in
y over change in x).
For a linear relationship f (x), we define the rate of change of y = f (x) with respect to x
as the ratio:
\[ \frac{\text{change in } y}{\text{change in } x}. \]
A graph of $y = f(x)$ versus $x$ is a straight line with slope $m$ and intercept $b$. In section 3.1, we
remembered the slope $m$ of a line as the ratio of the changes of the vertical component (or dependent
variable) to the horizontal component (or independent variable) between (any) two distinct points
along the line. We can write this as $\frac{y_1 - y_0}{x_1 - x_0}$ or $\frac{f(x_1) - f(x_0)}{x_1 - x_0}$ or $\frac{\Delta y}{\Delta x}$. This is precisely the rate of change of
$y$ per unit change of $x$ (see footnote 2), and it is a constant. This is the property that distinguishes lines from
other curves, and linear relationships from nonlinear ones:
The slope m of a straight line with equation y = mx + b is the rate of change of the linear
function y = f (x), and it is constant.
\[ \frac{\text{change in } y}{\text{change in } x} = m. \]
The following example demonstrates how we can describe the slope of a line as the rate of
change of a linear function (i.e., change in y per change in x, over any interval).
Example 3.2.3
In this example we look at the straight line
\[ y = \tfrac{1}{2} x + \tfrac{3}{2}. \]
• From the slope of $\frac{1}{2}$, we claim that if, as we walk along this straight line, our $x$-coordinate
changes by an amount $\Delta x$, then our $y$-coordinate changes by exactly $\Delta y = \frac{1}{2} \Delta x$. This is what
we mean by rate of change.
• For example, in the figure on the left below, we move from the point
\[ (x_0, y_0) = \left( 1, \; 2 = \tfrac{1}{2} \times 1 + \tfrac{3}{2} \right) \]
2 In the “real world” the phrase “rate of change” usually refers to rate of change per unit time. In science it is used
more generally.
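[Figure: two sketches of the line $y = \frac{1}{2} x + \frac{3}{2}$ with the increments $\Delta x$ and $\Delta y$ marked, one between the points $(1, 2)$ and $(5, 4)$ and one between general points $(x_0, y_0)$ and $(x_1, y_1)$.]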
Example 3.2.3
A straight line connecting any two points on the graph of a function is called a secant line
of that function.
We define the average (see footnote 3) rate of change of a function $f(x)$ over an interval $x_0 \le x \le x_1$ as the
slope of the straight line connecting the two points $(x_0, f(x_0))$ and $(x_1, f(x_1))$.
The average rate of change of $y = f(x)$ over the interval $x_0 \le x \le x_1$ is the slope of the
secant line through the two points $(x_0, f(x_0))$ and $(x_1, f(x_1))$:
\[ \frac{\Delta y}{\Delta x} = \frac{f(x_1) - f(x_0)}{x_1 - x_0}. \]
The average rate of change of a function can be interpreted in different ways, depending on what
the function represents. When the function of interest represents distance with respect to time, its
average rate of change is the average velocity:
For a moving body, the average velocity over a time interval a ď t ď b is the average rate
of change of distance over the given time interval.
\[ \text{average velocity} = \frac{\Delta \text{distance}}{\Delta \text{time}} \]
3 The word “average” sometimes causes confusion. One often speaks in a different context about the average value
of a set of numbers (e.g. the average of $\{7, 1, 3, 5\}$ is $(7 + 1 + 3 + 5)/4 = 4$.) However the term average rate of
change always means the slope of the straight line joining a pair of points.
• Look at the interval between the points $(2, 4)$ and $(4, 16)$ on the parabola. If we draw a straight
line connecting $(2, 4)$ and $(4, 16)$, this is a secant line for the parabola. This secant line has
slope $m = \frac{\Delta y}{\Delta x} = \frac{16 - 4}{4 - 2} = \frac{12}{2} = 6$.
[Figure: the parabola $y = x^2$. Caption: Secant line through points $(2, 4)$ and $(4, 16)$]
• The slope of the secant line connecting $(2, 4)$ and $(4, 16)$ is the average rate of change of the
function over the interval $2 \le x \le 4$.
• Now consider the points $(2, 4)$ and $(5, 25)$ on the parabola. We can form a different secant
line by connecting these two points with a straight line, which will have a slope of
$m = \frac{\Delta y}{\Delta x} = \frac{25 - 4}{5 - 2} = \frac{21}{3} = 7$. This slope is the average rate of change of the function over the interval
$2 \le x \le 5$.
[Figure: the parabola $y = x^2$. Caption: Secant line through points $(2, 4)$ and $(5, 25)$]
Example 3.2.7
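The two secant slopes computed for the parabola can be reproduced with a short function. This is a minimal Python sketch using exact fractions; the function name `average_rate_of_change` is introduced here, and the line $y = \frac{1}{2}x + \frac{3}{2}$ from Example 3.2.3 is included for comparison.

```python
from fractions import Fraction

def average_rate_of_change(f, x0, x1):
    """Slope of the secant line through (x0, f(x0)) and (x1, f(x1))."""
    return Fraction(f(x1) - f(x0), x1 - x0)

def parabola(x):
    return x * x

def line(x):
    return Fraction(1, 2) * x + Fraction(3, 2)

# For the parabola, the answer depends on the interval chosen.
print(average_rate_of_change(parabola, 2, 4))
print(average_rate_of_change(parabola, 2, 5))

# For the line, every interval gives the same value, namely its slope.
print(average_rate_of_change(line, 1, 5))
print(average_rate_of_change(line, -3, 10))
```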
Notice that different choices for either (or both) of the points $(x_0, y_0)$ and $(x_1, y_1)$ can result in
different values for the slope $\frac{y_1 - y_0}{x_1 - x_0}$ of the secant through those points. Thus the average rate of
change will, in general, depend on which two points we select. This is in contrast to the linear case;
see the example below.
Example 3.2.8
Consider the line $y = \frac{1}{2} x + \frac{3}{2}$. If $y = f(x)$ is linear, then the secant through any two different points
on a line is always identical to the line itself, and so always has exactly the same slope as the line
itself. This is illustrated in Figure 3.2.1 below: the (yellow) secant through $(x_0, y_0)$ and $(x_1, y_1)$
lies exactly on top of the (red) line $y = \frac{1}{2} x + \frac{3}{2}$.
Figure 3.2.1.
[Figure: the line $y = \frac{1}{2} x + \frac{3}{2}$ with the secant through $(x_0, y_0)$ and $(x_1, y_1)$. Caption: For a straight line, all secants have the same slope.]
This also means that if $y = f(x)$ is linear, then the average rate of change (from 3.2.5) is the
same as the rate of change (from 3.2.1); and both rates are equal to its slope.
Example 3.2.8
\[ x_1 = x_0 + h. \]
In other words, $h$ is the difference of the two $x$ coordinates. Then our two points are $(x_0, f(x_0))$
and $(x_0 + h, f(x_0 + h))$, and we can write the average rate of change of $f$ across the interval
$x_0 \le x \le x_0 + h$ as
\[ \frac{\Delta y}{\Delta x} = \frac{f(x_0 + h) - f(x_0)}{(x_0 + h) - x_0} = \frac{f(x_0 + h) - f(x_0)}{h}. \]
This ratio is the slope of the secant line through the two points.
Figure 3.2.2.
[Figure: graph of $y = f(x)$ with the secant line through $(x_0, f(x_0))$ and $(x_0 + h, f(x_0 + h))$.]
The slope of the secant line through the points (x0 , f (x0 )) and
(x0 + h, f (x0 + h)) is the average rate of change of f over the given interval.
A secant line between two points, $x_0$ and $x_0 + h$, on the graph of a function $f(x)$ is shown in
this link. You can change the base point $x_0$, the distance between the $x$ coordinates, $h$, or you
can input your own function for $f(x)$. The slope of the secant line is the average rate of change
of $f$ over the interval $x_0 \le x \le x_0 + h$.
To summarize, now we have an alternative to the definition of average rate of change and the
associated slope of the secant line provided in 3.2.5:
Definition 3.2.9 (Average rate of change over an interval is slope of secant (alternate)).
The average rate of change of $y = f(x)$ over the interval $x_0 \le x \le x_0 + h$ is the slope of
the secant line through the two points $(x_0, f(x_0))$ and $(x_0 + h, f(x_0 + h))$:
\[ \frac{\Delta y}{\Delta x} = \frac{f(x_0 + h) - f(x_0)}{h}. \]
Learning Objectives
• Explain using words, pictures, and the language of limits what a derivative is.
• Use the definition of derivative to find the tangent line to a function at a given point.
I NTRODUCTION TO THE D ERIVATIVE 3.3 T HE D ERIVATIVE
• Explain why the definition of a derivative is important, even if you know shortcuts for
computation.
Example 3.3.1
In this example, let us fix $(x_0, y_0)$ to be the point $(2, 4)$ on the parabola $y = x^2$. Now let $(x_1, y_1) =
(x_1, x_1^2)$ be some other point on the parabola; that is, a point with $x_1 \ne x_0$.
• The following table gives the slope, $\frac{y_1 - y_0}{x_1 - x_0}$, of the secant line through $(x_0, y_0) = (2, 4)$ and
$(x_1, y_1)$.
• So now we have a big table of numbers. What do we do with them? Look at the columns of
the table closer to the middle. As $x_1$ gets closer and closer to $x_0 = 2$, the slope, $\frac{y_1 - y_0}{x_1 - x_0}$, of the
secant through $(x_0, y_0)$ and $(x_1, y_1)$ appears to get closer and closer to the value 4.
Example 3.3.1
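A table of secant slopes of this kind is easy to generate. The Python sketch below is an illustration only: the values of $x_1$ are chosen here and are not taken from the table in the text, and `secant_slope` is a name introduced for this example.

```python
def secant_slope(x1, x0=2.0):
    """Slope of the secant to y = x^2 through (x0, x0^2) and (x1, x1^2)."""
    return (x1**2 - x0**2) / (x1 - x0)

# Values of x1 on both sides of x0 = 2, moving toward it.
for x1 in [1.0, 1.5, 1.9, 1.99, 2.01, 2.1, 2.5, 3.0]:
    print(x1, secant_slope(x1))
```

Up to floating-point rounding, each printed slope equals $x_1 + 2$, so the values close in on 4 as $x_1$ approaches 2 from either side.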
Example 3.3.2
It is very easy to generalise what is happening in Example 3.3.1.
• Fix any point $(x_0, y_0)$ on the parabola $y = x^2$. If $(x_1, y_1)$ is any other point on the parabola
$y = x^2$, then $y_1 = x_1^2$ and the slope of the secant through $(x_0, y_0)$ and $(x_1, y_1)$ is
\begin{align*}
\text{slope} = \frac{y_1 - y_0}{x_1 - x_0} &= \frac{x_1^2 - x_0^2}{x_1 - x_0} && \text{since } y = x^2 \\
  &= \frac{(x_1 - x_0)(x_1 + x_0)}{x_1 - x_0} && \text{remember } a^2 - b^2 = (a - b)(a + b) \\
  &= x_1 + x_0
\end{align*}
You should check the values given in the table of Example 3.3.1 above to convince yourself
that the slope $\frac{y_1 - y_0}{x_1 - x_0}$ of the secant line really is $x_0 + x_1 = 2 + x_1$ (since we set $x_0 = 2$).
• Now as we move $x_1$ closer and closer to $x_0$, the slope should move closer and closer to $2x_0$.
Indeed if we compute the limit carefully, we see that in the limit as $x_1 \to x_0$ the slope becomes
$2x_0$. That is
\begin{align*}
\lim_{x_1 \to x_0} \frac{y_1 - y_0}{x_1 - x_0} &= \lim_{x_1 \to x_0} (x_1 + x_0) && \text{by the work we did just above} \\
  &= 2x_0
\end{align*}
(Note: Taking this limit gives us our first derivative. Of course we haven’t yet given the
definition of a derivative, so we perhaps wouldn’t recognise it yet. We rectify this in the next
section.)
Figure 3.3.1.
[Figure: the parabola $y = x^2$ with several secants through $(x_0, y_0)$. Caption: Secants approaching a tangent line]
• So it is reasonable to say “as $x_1$ approaches $x_0$, the secant through $(x_0, y_0)$ and $(x_1, y_1)$
approaches the tangent line to the parabola $y = x^2$ at $(x_0, y_0)$”.
The figure above shows four different secants through $(x_0, y_0)$ for the curve $y = x^2$. The
four hollow circles are four different choices of $(x_1, y_1)$. As $(x_1, y_1)$ approaches $(x_0, y_0)$, the
corresponding secant does indeed approach the tangent to $y = x^2$ at $(x_0, y_0)$, which is the
heavy (red) straight line in the figure.
Using limits we determined the slope of the tangent line to $y = x^2$ at $x_0$ to be $2x_0$. Often we
will be a little sloppy with our language and instead say “the slope of the parabola $y = x^2$ at
$(x_0, y_0)$ is $2x_0$”, where we really mean the slope of the line tangent to the parabola at $x_0$.
Example 3.3.2
\[ \Delta x = x_1 - x_0 \]
We can construct a secant line through $(x_0, y_0)$ and $(x_1, y_1)$ just as we did for the parabola above. It
has slope
\[ \frac{y_1 - y_0}{x_1 - x_0} = \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x} \]
5 The idea of “smooth enough” can be made quite precise. Indeed the word “smooth” has a very precise meaning in
mathematics, which we won’t cover here. For now think of “smooth” as meaning roughly just “smooth”.
6 Again the term “reasonably smooth” can be made more precise.
7 Indeed, we don’t have to expect — it is!
When we talk of the “slope of the curve” at a point, what we really mean is the slope of the tangent
line to the curve at that point. So “the slope of the curve $y = f(x)$ at $(x_0, y_0)$” is also the limit⁸
expressed in the above equation. The derivative of $f(x)$ at $x = x_0$ is also defined to be this limit.
Which leads⁹ us to the most important definition in this text:
• When the above limit exists, the function $f(x)$ is said to be differentiable at $x = a$.
When the limit does not exist, the function $f(x)$ is said to be not differentiable at
$x = a$.
\[ f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}. \]
To see that these two definitions are the same, we set $x = a + h$ and then the limit as
$h$ goes to 0 is equivalent to the limit as $x$ goes to $a$.
• Informally, $f'(a)$ is the “slope of $f(x)$ at $a$”. Formally, $f'(a)$ is the slope of the
tangent line to $f(x)$ at $x = a$.
Let’s now compute the derivatives of some very simple functions. This is our first step towards
building up a toolbox for computing derivatives of complicated functions — this process will very
much parallel what we did in the previous chapter with limits. The two simplest functions we know
are f (x) = c and g(x) = x.
8 This is of course under the assumption that the limit exists — we will talk more about that below.
9 We will rename “x0 ” to “a” and “∆x” to “h”.
10 Maybe you remember this, but just in case you don’t: the open interval $(c, d)$ is just the set of all real numbers
obeying $c < x < d$.
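\begin{align*}
f'(a) &= \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} && \text{(the definition)} \\
&= \lim_{h \to 0} \frac{c - c}{h} && \text{(substituted in the function)} \\
&= \lim_{h \to 0} 0 && \text{(simplified things)} \\
&= 0
\end{align*}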
Example 3.3.4
That was easy! What about the next most complicated function — arguably it’s this one:
Example 3.3.5 (Derivative of g(x) = x)
Let $a \in \mathbb{R}$ and compute the derivative of $g(x) = x$ at $x = a$.
Again, we compute the derivative of $g$ by just substituting the function of interest into the formal
definition of the derivative and then evaluating the resulting limit.
\begin{align*}
g'(a) &= \lim_{h \to 0} \frac{g(a+h) - g(a)}{h} && \text{(the definition)} \\
&= \lim_{h \to 0} \frac{(a+h) - a}{h} && \text{(substituted in the function)} \\
&= \lim_{h \to 0} \frac{h}{h} && \text{(simplified things)} \\
&= \lim_{h \to 0} 1 && \text{(simplified a bit more)} \\
&= 1
\end{align*}
Example 3.3.5
That was a little harder than the first example, but still quite straightforward: start with the
definition and apply what we know about limits.
Thanks to these two examples, we have our first theorem about derivatives:
Let $a, c \in \mathbb{R}$ and let $f(x) = c$ be the constant function and $g(x) = x$. Then
\[ f'(a) = 0 \]
and
\[ g'(a) = 1. \]
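As a quick numerical sanity check of these two results, the sketch below evaluates the difference quotient from the definition for a constant function and for the identity function. The helper name `difference_quotient`, the constant c = 7, the sample point a = 2.5 and the step sizes are illustrative choices, not taken from the text.

```python
def difference_quotient(f, a, h):
    """Slope of the secant through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h

def constant(x):
    return 7.0  # f(x) = c with the arbitrary choice c = 7

def identity(x):
    return x    # g(x) = x

a = 2.5  # arbitrary sample point
for h in (0.1, 0.01, 0.001):
    # The first quotient is exactly 0; the second is 1 up to rounding.
    print(h, difference_quotient(constant, a, h), difference_quotient(identity, a, h))
```

The quotients do not depend on h here (apart from floating-point rounding), which mirrors the fact that both limits were trivial to evaluate.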
\[ \text{the slope of the secant through } \big(a, f(a)\big) \text{ and } \big(a+h, f(a+h)\big) = \frac{f(a+h) - f(a)}{h} \]
This is shown in Figure 3.3.2 below.
Figure 3.3.2.
\[ \text{the slope of the tangent line to } y = f(x) \text{ at } x = a \;=\; \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} = f'(a). \]
As $h \to 0$, the secant line approaches a tangent line. Use the slider for $h$ to show this trend,
and note that the slope of the secant line approaches the slope of the tangent line at the point $x_0$.
Let us go a little further and work out a general formula for the equation of the tangent line to
y = f (x) at x = a. We know that the tangent line
11 We are of course assuming that the curve is smooth enough to have a tangent line at a.
There are a couple of different ways to construct the equation of the tangent line from this information.
One is to observe, as in Figure 3.3.3, that if $(x, y)$ is any other point on the tangent line then the line
segment from $\big(a, f(a)\big)$ to $(x, y)$ is part of the tangent line and so also has slope $f'(a)$. That is,
\[ \frac{y - f(a)}{x - a} = \text{the slope of the tangent line} = f'(a) \]
Cross multiplying gives us the equation of the tangent line:
\[ y - f(a) = f'(a)\,(x - a) \quad\text{or}\quad y = f(a) + f'(a)\,(x - a) \]
Figure 3.3.3. A line segment of a tangent line: the curve $y = f(x)$, its tangent line $y = f(a) + f'(a)\,(x - a)$ through $\big(a, f(a)\big)$, and a second point $(x, y)$ on that line.
A second way to derive the same equation of the same tangent line is to recall that the general
equation for a line, with finite slope, is $y = mx + b$, where $m$ is the slope and $b$ is the $y$-intercept. We
already know the slope, so $m = f'(a)$. To work out $b$ we use the other piece of information:
$(a, f(a))$ is on the line. So $(x, y) = (a, f(a))$ must solve $y = f'(a)\,x + b$. That is,
\[ f(a) = f'(a) \cdot a + b \quad\text{and so}\quad b = f(a) - a f'(a) \]
Hence our equation is, once again,
\[ y = f'(a) \cdot x + \big(f(a) - a f'(a)\big) \quad\text{or, after rearranging a little,}\quad y = f(a) + f'(a)\,(x - a) \]
This is a very useful formula, so perhaps we should make it a theorem.
Theorem 3.3.7 (Tangent line).
\[ y = f(a) + f'(a)\,(x - a) \]
The caveat at the end of the above theorem is necessary — there are certainly cases in which the
derivative does not exist and so we do need to be careful.
Example 3.3.8
Find the tangent line to the curve $y = x^2$ at $x = 3$.
Rather than redoing everything from scratch, we can, and for efficiency, should, use Theorem
3.3.7. To write this up properly, we must ensure that we tell the reader what we are doing. So
something like the following:
• By Theorem 3.3.7, the tangent line to the curve $y = f(x)$ at $x = a$ is given by
\[ y = f(a) + f'(a)(x - a) \]
provided $f'(a)$ exists.
• In Example 3.3.2, we found that, for any $x_0 > 0$, the derivative of $x^2$ at $x = x_0$ is
\[ f'(x_0) = 2x_0. \]
The tangent line formula uses $a$ instead of $x_0$, so let’s use $a$ for the derivative at the point:
\[ f'(a) = 2a. \]
• With $a = 3$ we have $f(3) = 9$ and $f'(3) = 6$, so the tangent line is $y = 9 + 6(x - 3)$, that is, $y = 6x - 9$.
We don’t have to write it up using dot-points as above; we have used them here to help delineate
each step in the process of computing the tangent line.
Example 3.3.8
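The tangent line formula is mechanical enough to express as a short function. The following is a minimal sketch, assuming the derivative is supplied by hand; the names `tangent_line`, `f` and `fprime` are illustrative only.

```python
def tangent_line(f, fprime, a):
    """Return (slope, intercept) of the tangent line y = f(a) + f'(a) (x - a)."""
    slope = fprime(a)
    intercept = f(a) - a * slope   # b = f(a) - a f'(a)
    return slope, intercept

def f(x):
    return x ** 2

def fprime(x):
    return 2 * x   # the derivative of x^2 found earlier

slope, intercept = tangent_line(f, fprime, 3)
print(f"y = {slope} x + ({intercept})")   # prints "y = 6 x + (-9)"
```

The slope-intercept pair returned here is the same line as $y = f(a) + f'(a)(x - a)$, just rearranged into the form $y = mx + b$.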
In the example above, imagine “zooming in” to the point (3, 9) and watching the curve of the
function and the tangent line. As you zoom in closer and closer, the tangent line more and more
closely matches the curve at that point. In fact, if we want to use a line to approximate a curve at any
point x = a, the best we can do is indeed its tangent line at that point. (If you’re not convinced: can
you come up with a different line that passes through the point (3, 9) that is a better approximation
to $x^2$ than its tangent line there?)
The linear approximation to f (x) at a is a better approximation to f (x) near x = a than other
lines through (a, f (a)). You might start wondering: What kind of polynomial approximation might
be a better approximation than the linear approximation? Why? We return to this idea in a later
chapter when we discuss numerical approximations.
• You should go back and check that this is what we got in Example 3.3.2; only some names have
been changed.
Example 3.3.10
• If the derivative $f'(x)$ exists for all $x \in (a, b)$ we say that $f$ is differentiable on $(a, b)$.
• Note that we will sometimes be a little sloppy with our discussions and simply write
“ f is differentiable” to mean “ f is differentiable on an interval we are interested in”
or “ f is differentiable everywhere”.
Notice that we are no longer thinking of tangent lines. Instead, differentiation is an operation we
can do on a function – and moreover, the result (the derivative) is itself a function as well.
For example:
Example 3.3.12 (The derivative of $f(x) = \frac{1}{x}$)
Let $f(x) = \frac{1}{x}$ and compute its derivative with respect to $x$. Think carefully about where the derivative exists.
• Our first step is to write down the definition of the derivative — at this stage, we know of no
other strategy for computing derivatives.
\[ f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \qquad \text{(the definition)} \]
• Notice that the original function $f(x) = \frac{1}{x}$ was not defined at $x = 0$ and the derivative is also
not defined at $x = 0$. This does happen more generally: if $f(x)$ is not defined at a particular
point $x = a$, then the derivative will not exist at that point either.
Example 3.3.12
Notation 3.3.13.
The following notations are all used for “the derivative of $f(x)$ with respect to $x$”
\[ f'(x) \qquad \frac{df}{dx} \qquad \frac{d}{dx} f(x) \qquad \dot{f}(x) \qquad D f(x) \qquad D_x f(x), \]
while the following notations are all used for “the derivative of $f(x)$ at $x = a$”
\[ f'(a) \qquad \frac{df}{dx}(a) \qquad \frac{d}{dx} f(x)\,\Big|_{x=a} \qquad \dot{f}(a) \qquad D f(a) \qquad D_x f(a). \]
• We will generally use the first three, but you should recognise them all. The notation
$f'(a)$ is due to Lagrange, while the notation $\frac{df}{dx}(a)$ is due to Leibniz. They are both
very useful. Neither can be considered “better”.
• The notation $\dot{f}$ is due to Newton. In physics, it is common to use $\dot{f}(t)$ to denote the
derivative of $f$ with respect to time.
• As $x$ tends to $a$, the numerator and denominator both tend to zero. But $\frac{0}{0}$ is not defined.
So to get a well defined limit we need to exhibit a cancellation between the numerator and
denominator — just as we saw in Example 2.1.23.
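• Recall how to factor the difference of two perfect squares: set $A = \sqrt{x}$ and $B = \sqrt{a}$ in $A^2 - B^2 = (A - B)(A + B)$ to get
\[ x - a = (\sqrt{x} - \sqrt{a})(\sqrt{x} + \sqrt{a}) \]
and then substitute this little fact into our expression
\begin{align*}
\frac{\sqrt{x} - \sqrt{a}}{x - a} &= \frac{\sqrt{x} - \sqrt{a}}{(\sqrt{x} - \sqrt{a})(\sqrt{x} + \sqrt{a})} && \text{(now cancel common factors)} \\
&= \frac{1}{\sqrt{x} + \sqrt{a}}
\end{align*}
• Now we can take the limit we need:
\begin{align*}
f'(a) &= \lim_{x \to a} \frac{\sqrt{x} - \sqrt{a}}{x - a} \\
&= \lim_{x \to a} \frac{1}{\sqrt{x} + \sqrt{a}} \\
&= \frac{1}{2\sqrt{a}}
\end{align*}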
• We should think about the domain of $f'$ here. That is, for which values of $a$ is $f'(a)$ defined?
The original function $f(x)$ was defined for all $x \ge 0$, however the derivative $f'(a) = \frac{1}{2\sqrt{a}}$ is
undefined at $a = 0$.
If we draw a careful picture of $\sqrt{x}$ around $x = 0$ we can see why this has to be the case. The
figure below shows three different tangent lines to the graph of $y = f(x) = \sqrt{x}$. As the point
of tangency moves closer and closer to the origin, the tangent line gets steeper and steeper.
The slope of the tangent line at $\big(a, \sqrt{a}\big)$ blows up as $a \to 0$.
(Figure: graph of $y = \sqrt{x}$ with three tangent lines.)
Example 3.3.14
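To make the two observations of this example concrete, here is a small numerical sketch: secant slopes of $\sqrt{x}$ near a sample point approach $\frac{1}{2\sqrt{a}}$, and that value grows without bound as $a$ approaches 0. The function names and the sample points (a = 4 and the listed values) are illustrative assumptions.

```python
import math

def secant_slope(f, a, x):
    """Difference quotient (f(x) - f(a)) / (x - a)."""
    return (f(x) - f(a)) / (x - a)

def sqrt_derivative(a):
    """The formula 1 / (2 sqrt(a)), valid only for a > 0."""
    return 1.0 / (2.0 * math.sqrt(a))

# Secant slopes at a = 4 approach 1 / (2 * 2) = 0.25 as x approaches 4.
for x in (4.1, 4.01, 4.001):
    print(x, secant_slope(math.sqrt, 4.0, x))

# The tangent slope keeps growing as a approaches 0 from the right.
for a in (1.0, 0.01, 0.0001):
    print(a, sqrt_derivative(a))
```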
Example 3.3.15 ($\frac{d}{dx}\{|x|\}$)
Compute the derivative, $f'(a)$, of the function $f(x) = |x|$ at the point $x = a$.
• We should start this example by recalling the definition of $|x|$ (we saw this back in Example 2.1.32):
\[ |x| = \begin{cases} -x & \text{if } x < 0 \\ 0 & \text{if } x = 0 \\ x & \text{if } x > 0. \end{cases} \]
• This breaks our computation of the derivative into 3 cases depending on whether $x$ is positive,
negative or zero.
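• Assume $x > 0$. Then
\begin{align*}
\frac{df}{dx} &= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \\
&= \lim_{h \to 0} \frac{|x+h| - |x|}{h}
\end{align*}
Since $x > 0$ and we are interested in the behaviour of this function as $h \to 0$ we can assume $|h|$
is much smaller than $x$. This means $x + h > 0$ and so $|x + h| = x + h$.
\begin{align*}
&= \lim_{h \to 0} \frac{x + h - x}{h} \\
&= \lim_{h \to 0} \frac{h}{h} = 1 \qquad \text{as expected}
\end{align*}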
• Assume $x < 0$. Then
\begin{align*}
\frac{df}{dx} &= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \\
&= \lim_{h \to 0} \frac{|x+h| - |x|}{h}
\end{align*}
Since $x < 0$ and we are interested in the behaviour of this function as $h \to 0$ we can assume $|h|$
is much smaller than $|x|$. This means $x + h < 0$ and so $|x + h| = -(x + h)$.
\begin{align*}
&= \lim_{h \to 0} \frac{-(x + h) - (-x)}{h} \\
&= \lim_{h \to 0} \frac{-h}{h} = -1
\end{align*}
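• When $x = 0$ we have
\begin{align*}
f'(0) &= \lim_{h \to 0} \frac{f(0+h) - f(0)}{h} \\
&= \lim_{h \to 0} \frac{|0+h| - |0|}{h} \\
&= \lim_{h \to 0} \frac{|h|}{h}
\end{align*}
To proceed we need to know whether $h > 0$ or $h < 0$, so we use one-sided limits. The limit from above is:
\begin{align*}
\lim_{h \to 0^+} \frac{|h|}{h} &= \lim_{h \to 0^+} \frac{h}{h} && \text{since } h > 0,\ |h| = h \\
&= 1
\end{align*}
Whereas, the limit from below is:
\begin{align*}
\lim_{h \to 0^-} \frac{|h|}{h} &= \lim_{h \to 0^-} \frac{-h}{h} && \text{since } h < 0,\ |h| = -h \\
&= -1
\end{align*}
Since the one-sided limits differ, the limit as $h \to 0$ does not exist. And thus the derivative
does not exist at $x = 0$.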
In summary:
\[ \frac{d}{dx}|x| = \begin{cases} -1 & \text{if } x < 0 \\ \text{DNE} & \text{if } x = 0 \\ 1 & \text{if } x > 0 \end{cases} \]
Example 3.3.15
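The disagreement between the two one-sided limits at $x = 0$ is easy to see numerically. The sketch below computes the difference quotient of $|x|$ with a positive step and with a negative step; the helper name `one_sided_quotients` and the step sizes are illustrative assumptions.

```python
def one_sided_quotients(f, a, h):
    """Return the difference quotients of f at a for the steps +h and -h (h > 0)."""
    from_above = (f(a + h) - f(a)) / h
    from_below = (f(a - h) - f(a)) / (-h)
    return from_above, from_below

# At a = 0 the two quotients stay at 1 and -1 no matter how small h is.
for h in (0.1, 0.001, 0.00001):
    print(h, one_sided_quotients(abs, 0.0, h))

# Away from 0 (here a = 2) both quotients agree, matching the case x > 0.
print(one_sided_quotients(abs, 2.0, 0.001))
```

Because the two columns never approach a common value at 0, no single number can serve as the slope there, which is the numerical face of "DNE".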
• In Example 3.3.12, we considered the function $f(x) = \frac{1}{x}$. This function “blows up” (i.e.
becomes infinite) at $x = 0$. It does not have a tangent line at $x = 0$ and its derivative does not
exist at $x = 0$.
• In Example 3.3.15, we considered the function f (x) = |x|. This function does not have a
tangent line at x = 0, because there is a sharp corner in the graph of y = |x| at x = 0. (Look at
the graph in Example 2.2.10.) So the derivative of f (x) = |x| does not exist at x = 0.
Here are a few more examples.
Example 3.3.16
Visually, the function
\[ H(x) = \begin{cases} 0 & \text{if } x \le 0 \\ 1 & \text{if } x > 0 \end{cases} \]
does not have a tangent line at $(0, 0)$. Not surprisingly, when $a = 0$ and $h$ tends to 0 with $h > 0$,
Example 3.3.17 ($\frac{d}{dx} x^{1/3}$)
Visually, it looks like the function $f(x) = x^{1/3}$, sketched below, (this might be a good point to recall
that cube roots of negative numbers are negative; for example, since $(-1)^3 = -1$, the cube root
of $-1$ is $-1$),
(Figure: graph of $y = x^{1/3}$.)
has the $y$-axis as its tangent line at $(0, 0)$. So we would expect that $f'(0)$ does not exist. Let’s check.
With $a = 0$,
\[ f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} = \lim_{h \to 0} \frac{h^{1/3}}{h} = \lim_{h \to 0} \frac{1}{h^{2/3}} = \text{DNE} \]
as expected.
Example 3.3.17
Example 3.3.18 ($\frac{d}{dx}\sqrt{|x|}$)
We have already considered the derivative of the function $\sqrt{x}$ in Example 3.3.14. We’ll now look at
the function $f(x) = \sqrt{|x|}$. Recall, from Example 3.3.15, the definition of $|x|$. When $x > 0$, we have
$|x| = x$ and $f(x)$ is identical to $\sqrt{x}$. When $x < 0$, we have $|x| = -x$ and $f(x) = \sqrt{-x}$. So to graph
$y = \sqrt{|x|}$ when $x < 0$, you just have to graph $y = \sqrt{x}$ for $x > 0$ and then send $x \to -x$, i.e. reflect
the graph in the $y$-axis. Here is the graph. The pointy thing at the origin is called a cusp.
(Figure: graph of $y = \sqrt{|x|}$.)
The graph of $y = f(x)$ does not have a tangent line at $(0, 0)$ and, correspondingly, $f'(0)$ does not exist because
\[ \lim_{h \to 0^+} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^+} \frac{\sqrt{|h|}}{h} = \lim_{h \to 0^+} \frac{1}{\sqrt{h}} = \text{DNE} \]
Example 3.3.18
Theorem 3.3.19.
\[ f(a+h) - f(a) = \frac{f(a+h) - f(a)}{h}\, h \]
as $h \to 0$ exists and is zero. But if $f(x)$ is differentiable at $x = a$, then, as $h \to 0$, the first factor,
$\frac{f(a+h) - f(a)}{h}$, converges to $f'(a)$ and the second factor, $h$, converges to zero. So the product provision
of our arithmetic of limits Theorem 2.1.14 implies that the product $\frac{f(a+h) - f(a)}{h}\, h$ converges to
$f'(a) \cdot 0 = 0$ too.
Notice that while this theorem is useful as stated, it is (arguably) more often applied in its
contrapositive¹² form:
12 If you have forgotten what the contrapositive is, then quickly reread Footnote 3 in Section 2.1.
As the above examples illustrate, this statement does not tell us what happens if f is continuous
at x = a — we have to think!
13 Again — recall that we are being a little sloppy with this term — we really mean “The slope of the tangent line to
the curve”.
• We conclude that the instantaneous velocity at time $t = 1$, which is the instantaneous rate of
change of distance per unit time at time $t = 1$, is the derivative $s'(1) = 9.8$ m/sec.
Example 3.3.21
Now suppose, more generally, that you are taking a walk and that as you walk, you are continu-
ously measuring some quantity, like temperature, and that the measurement at time t is f (t ). Then
the
so the
In particular, if you are walking along the $x$-axis and your $x$-coordinate at time $t$ is $x(t)$, then $x'(a)$
is the instantaneous rate of change (per unit time) of your $x$-coordinate at time $t = a$, which is your
velocity at time $a$. If $v(t)$ is your velocity at time $t$, then $v'(a)$ is the instantaneous rate of change of
your velocity at time $a$. This is called your acceleration at time $a$.
You might expect that if the instantaneous rate of change of a function at time $c$ is strictly
positive, then, in some sense, the function is increasing at $t = c$. You would be right. Indeed, if
$f'(c) > 0$, then, by definition, the limit of $\frac{f(t) - f(c)}{t - c}$ as $t$ approaches $c$ is strictly bigger than zero. So
I NTRODUCTION TO THE D ERIVATIVE 3.4 H IGHER ORDER DERIVATIVES
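\begin{align*}
\frac{f(t) - f(c)}{t - c} > 0 &\implies f(t) - f(c) > 0 && \text{(since } t - c > 0\text{)} \\
&\implies f(t) > f(c)
\end{align*}
\begin{align*}
\frac{f(t) - f(c)}{t - c} > 0 &\implies f(t) - f(c) < 0 && \text{(since } t - c < 0\text{)} \\
&\implies f(t) < f(c)
\end{align*}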
Consequently we say that “ f (t ) is increasing at t = c”. If we wish to emphasise that the inequalities
above are the strict inequalities ą and ă, as opposed to ě and ď, we will say that “ f (t ) is strictly
increasing at t = c”.
Learning Objectives
• Understand what is meant by ‘higher-order derivatives,’ and compute them.
The operation of differentiation takes as input one function, $f(x)$, and produces as output another
function, $f'(x)$. Now $f'(x)$ is once again a function. So we can differentiate it again, assuming
that it is differentiable, to create a third function, called the second derivative of f . And we can
differentiate the second derivative again to create a fourth function, called the third derivative of f .
And so on.
Notation 3.4.1.
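• $f''(x)$ and $f^{(2)}(x)$ and $\frac{d^2 f}{dx^2}(x)$ all mean $\frac{d}{dx}\Big(\frac{d}{dx} f(x)\Big)$
• $f'''(x)$ and $f^{(3)}(x)$ and $\frac{d^3 f}{dx^3}(x)$ all mean $\frac{d}{dx}\Big(\frac{d}{dx}\Big(\frac{d}{dx} f(x)\Big)\Big)$
• $f^{(4)}(x)$ and $\frac{d^4 f}{dx^4}(x)$ both mean $\frac{d}{dx}\Big(\frac{d}{dx}\Big(\frac{d}{dx}\Big(\frac{d}{dx} f(x)\Big)\Big)\Big)$
• and so on.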
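The nested notation above says that an $n$-th derivative is just the operation $\frac{d}{dx}$ applied $n$ times. As a rough sketch of that idea, the code below stores a polynomial as a list of coefficients and differentiates it repeatedly. It assumes the term-by-term rule $\frac{d}{dx} x^k = k x^{k-1}$, which goes beyond what has been proved at this point in the text, and the polynomial $5 - 2x^2 + x^3$ is an arbitrary illustration.

```python
def differentiate(coeffs):
    """Differentiate c0 + c1 x + c2 x^2 + ... given as the list [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def nth_derivative(coeffs, n):
    """Apply d/dx n times, mirroring the nested notation d/dx(d/dx(... f(x)))."""
    for _ in range(n):
        coeffs = differentiate(coeffs)
    return coeffs

p = [5, 0, -2, 1]  # the illustrative polynomial 5 - 2x^2 + x^3
for n in range(5):
    # An empty list stands for the zero polynomial.
    print(n, nth_derivative(p, n))
```

For this cubic the third derivative is the constant 6 and every derivative after that is zero.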
14 This is typical mathematician speak: it allows us to be completely correct, without being terribly precise. In this
context, “sufficiently close” means “The following need not be true for all $t$ bigger than $c$, but there must exist
some $b > c$ so that the following is true for all $c < t < b$”. Typically we do not know what $b$ is. And typically it
does not matter what the exact value of $b$ is. All that matters is that $b$ exists and is strictly bigger than $c$.
Example 3.4.2
I NTRODUCTION TO THE D ERIVATIVE 3.5 D ERIVATIVES OF EXPONENTIAL FUNCTIONS
Learning Objectives
• Use the definition of the derivative to show that the derivative of the function $f(x) = a^x$
(where $a$ is a positive constant) is a constant times $a^x$.
• Note the useful modelling power of a function whose derivative is proportional to itself.
In this section we show how to compute the derivative of the exponential function. Let $a > 0$¹⁵
and set $f(x) = a^x$; this is what is known as an exponential function with base $a$. This function
interacts very nicely with its derivative and turns up in many “real world” examples.
Let’s see what happens when we try to compute the derivative of this function just using the
definition of the derivative.
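\begin{align*}
\frac{df}{dx} &= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h \to 0} \frac{a^{x+h} - a^x}{h} \\
&= \lim_{h \to 0} a^x \cdot \frac{a^h - 1}{h} = a^x \cdot \lim_{h \to 0} \frac{a^h - 1}{h}
\end{align*}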
We cannot yet complete this computation because we cannot evaluate the last limit directly. For the
moment, let us assume this limit exists and name it
\[ C(a) = \lim_{h \to 0} \frac{a^h - 1}{h}. \]
It depends only on $a$ and is completely independent of $x$. Using this notation (which we
will quickly improve upon below), our desired derivative is now
\[ \frac{d}{dx} a^x = C(a) \cdot a^x. \]
Thus the derivative of an exponential function $a^x$ is just $a^x$ multiplied by some constant that depends
only on the base $a$. If we can tune $a$ so that $C(a) = 1$ then the derivative would just be the original
function! This turns out to be very useful.
To try finding an $a$ that obeys $C(a) = 1$, let us first investigate how $C(a)$ changes with $a$.
Unfortunately (though this fact is not at all obvious) there is no way to write $C(a)$ as a finite
combination of any of the functions we have examined so far¹⁶. Instead, we’ll calculate approximate
values of $C(a)$ by plugging in some small values of $h$. We’ll do this for a few values of $a$.
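This kind of experiment is easy to run on a computer. The following minimal sketch evaluates the quotient $\frac{a^h - 1}{h}$ for a few bases and a few small values of $h$; the function name `c_estimate` and the particular step sizes are illustrative choices.

```python
def c_estimate(a, h):
    """Approximate C(a) by the quotient (a^h - 1) / h for a small nonzero h."""
    return (a ** h - 1.0) / h

for a in (1, 2, 3, 10):
    estimates = [c_estimate(a, h) for h in (0.1, 0.001, 0.00001)]
    print(a, estimates)
```

With the smallest step the estimate for $a = 2$ is below 1 and the estimate for $a = 3$ is above 1, which suggests that a base with $C(a) = 1$ lies somewhere between 2 and 3.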
15 Letting the base be positive is necessary because we want to ensure this function is defined for all real x.
16 To be a bit more precise, we say that a number $q$ is algebraic if we can write $q$ as the zero of a polynomial with
integer coefficients. When $a$ is any positive algebraic number other than 1, $C(a)$ is not algebraic. A number that is not
algebraic is called transcendental. The best known example of a transcendental number is $\pi$ (which follows from
the Lindemann-Weierstrass Theorem, way beyond the scope of this course).
Example 3.5.1
Let $a = 1$, then $C(1) = \lim_{h \to 0} \frac{1^h - 1}{h} = 0$. This is not surprising since $1^x = 1$ is constant, and so its
derivative must be zero everywhere.
Now let $a = 2$, then $C(2) = \lim_{h \to 0} \frac{2^h - 1}{h}$. Setting $h$ to smaller and smaller numbers gives:
So $C(2) \approx 0.6931$. (The actual value of $C(2)$ has an infinitely long decimal expansion.) Similarly
when $a = 3$ we get:
and $a = 10$:
Instead of continuing to write ‘the value of $a$ for which $C(a) = 1$’, this particular $a$ is historically
given its own name: $e$. To find a value for $e$, we begin with $C(e) = 1$:
\[ C(e) = \lim_{h \to 0} \frac{e^h - 1}{h} = 1. \]
This means that for small $h$,
\[ \frac{e^h - 1}{h} \approx 1, \]
so that
\[ e^h - 1 \approx h \implies e^h \approx h + 1 \implies e \approx (1 + h)^{1/h}. \]
More formally, we would write that
\[ e = \lim_{h \to 0} (1 + h)^{1/h}. \qquad (3.5.1) \]
We can find an approximate decimal expansion for $e$ by calculating the expression in Eqn. (3.5.1)
for some very small (but finite) value of $h$.
\[ e \approx (1.00001)^{100000} \approx 2.71826, \]
\begin{align*}
e &= 2.7182818284590452354\ldots \\
&= 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots .{}^{18}
\end{align*}
We will be able to explain this last formula once we develop Taylor polynomials later in the
course.
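Both descriptions of $e$ can be tried out numerically. The sketch below evaluates $(1 + h)^{1/h}$ for a small $h$ and sums the first few terms of the factorial series; the function names and the choice of 15 terms are illustrative assumptions, and `math.e` is printed only as a reference value.

```python
import math

def e_from_limit(h):
    """Evaluate (1 + h)^(1/h), which approaches e as h tends to 0."""
    return (1.0 + h) ** (1.0 / h)

def e_from_series(terms):
    """Partial sum 1 + 1/1! + 1/2! + ... using the given number of terms."""
    return sum(1.0 / math.factorial(k) for k in range(terms))

print(e_from_limit(0.00001))   # compare with (1.00001)^100000 above
print(e_from_series(15))
print(math.e)                  # library value, for reference
```

The series gets close to $e$ after only a handful of terms, while the limit expression needs a very small $h$ to give even five correct digits.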
To summarise:
Theorem 3.5.3.
\[ \lim_{h \to 0} \frac{e^h - 1}{h} = 1. \]
Further,
\[ \frac{d}{dx}\left(e^x\right) = e^x. \]
17 Unfortunately there is another Euler’s constant, γ, which is more properly called the Euler–Mascheroni constant.
Anyway like many mathematical discoveries, e was first found by someone else — Napier used the constant e in
order to compute logarithms but only implicitly. Bernoulli was probably the first to approximate it when examining
continuous compound interest. It first appeared explicitly in work of Leibniz, though he denoted it b. It was Euler,
though, who established the notation we now use and who showed how important the constant is to mathematics.
18 Recall $n$ factorial, written $n!$, is the product $n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$.
19 Equivalently, $e$ can be defined as $e = \lim_{h \to 0} (1 + h)^{1/h}$ or as $e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n$.
Figure 3.5.1. Graph of $y = e^x$ for $-3 \le x \le 3$.
1. $e^{x+y} = e^x e^y$.
2. $e^{-x} = \frac{1}{e^x}$.
3. $(e^x)^y = e^{xy}$.
4. $e^x$ is a function that is defined, continuous, and differentiable for all real numbers $x$.
5. $e^0 = 1$, and $e^1 = e$.
7. $\lim_{x \to \infty} e^x = \infty$, $\lim_{x \to -\infty} e^x = 0$.
8. The derivative of $e^x$ is $e^x$.
Example 3.5.4
Find the derivative of $e^x$ when $x = 0$. Then show that the tangent line at that point is the line $y = x + 1$.
• The derivative of $e^x$ is $e^x$. At $x = 0$, $e^0 = 1$.
• The slope of the tangent line at $x = 0$ is the derivative of the function at that point, which
we just found to be 1. The tangent line goes through the point $(0, e^0) = (0, 1)$. With slope 1
and an intercept at $(0, 1)$, the tangent line at $x = 0$ can be written in slope-intercept form as
$y = x + 1$.
Example 3.5.4
20 The function $e^x$ is of course the special case of the function $a^x$ with $a = e$. So it inherits all the usual algebraic
properties of $a^x$.
To see the tangent line to the exponential function: On this graph of $f(x) = e^x$, add the
tangent line $y = x + 1$. Does it touch the curve where you expect it to? As an extra step, add a
generic tangent line at any point $x_0$. Adjust a slider for $x_0$ to see how the tangent line changes as
it moves along the curve.
In the next chapter, we return to the problem of differentiating $a^x$. (If your curiosity is piqued,
take a look at Example 4.4.6 – although you’ll need the techniques introduced between here and
there in order to understand it.)
Definition 3.5.5.
The function $y = f(x) = e^x$ is equal to its own derivative, which means that it satisfies the
equation
\[ \frac{dy}{dx} = y. \]
An equation linking a function and its derivative(s) is called a differential equation.
This is a new type of equation, unlike others previously seen in this course. They feature highly
in some of the later (flavoured) chapters of this text, where we show that these differential equations
have many applications to biology, physics, chemistry, and science in general.
Chapter 4
C OMPUTING D ERIVATIVES
Learning Objectives
• Demonstrate using the limit definition of derivative that differentiation is linear.
• Use counterexamples to demonstrate that certain statements about derivatives are false.
• Demonstrate the Power Rule for integer exponents using the limit definition of derivative.
So far, we have evaluated derivatives only by applying Definition 3.3.3 to the function at hand and
then computing the required limits directly. It is quite obvious that as the function being differentiated
becomes even a little complicated, this procedure quickly becomes extremely unwieldy. It is many
orders of magnitude more efficient to have access to:
• a collection of rules for breaking down complicated derivative computations into sequences of
simple derivative computations.
C OMPUTING D ERIVATIVES 4.1 A RITHMETIC OF DERIVATIVES
This is precisely what we did to compute limits. We started with limits of simple functions and then
used “arithmetic of limits” to compute limits of complicated functions.
We have already started building our list of derivatives of simple functions. We have shown, in
Examples 3.3.4, 3.3.5, 3.3.10 and 3.3.14, that:
\[ \frac{d}{dx} 1 = 0, \qquad \frac{d}{dx} x = 1, \qquad \frac{d}{dx} x^2 = 2x, \qquad \frac{d}{dx} \sqrt{x} = \frac{1}{2\sqrt{x}}. \]
We’ll expand this list later.
We now start building a collection of tools that help reduce the problem of computing the
derivative of a complicated function to that of computing the derivatives of a number of simple
functions. In this section we give three derivative “rules” as three separate theorems. We’ll give the
proofs of these theorems in the next section and examples of how they are used in the following
section.
As was the case for limits, derivatives interact very cleanly with addition, subtraction and
multiplication by a constant. The following result actually follows very directly from the first three
points of Theorem 2.1.14.
\begin{align*}
\frac{d}{dx}\{f(x) + g(x)\} &= f'(x) + g'(x), \\
\frac{d}{dx}\{f(x) - g(x)\} &= f'(x) - g'(x), \\
\frac{d}{dx}\{c\, f(x)\} &= c\, f'(x).
\end{align*}
That is, the derivative of the sum is the sum of the derivatives, and so forth.
Following this we can combine the three statements in this lemma into a single rule which
captures the “linearity of differentiation”.
Again, let $f(x)$, $g(x)$ be differentiable functions, let $\alpha, \beta \in \mathbb{R}$ be constants and define the
“linear combination”
\[ S(x) = \alpha f(x) + \beta g(x). \]
Then the derivative of $S(x)$ exists and is given by
\[ \frac{dS}{dx} = S'(x) = \alpha f'(x) + \beta g'(x). \]
Note that we can recover the three rules in the previous lemma by setting $\alpha = \beta = 1$ or
$\alpha = 1, \beta = -1$ or $\alpha = c, \beta = 0$.
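Linearity can be checked numerically on a concrete pair of functions. The sketch below uses a symmetric (central) difference quotient, a standard numerical stand-in for the limit that is not the method used in the text, with $f(x) = x^2$ and $g(x) = \sqrt{x}$ from the list above. The constants $\alpha = 3$, $\beta = -2$, the point $x = 4$ and all function names are arbitrary illustrative choices.

```python
def central_difference(f, x, h=1e-6):
    """Numerical estimate of f'(x) using a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def f(x):
    return x ** 2      # f'(x) = 2x

def g(x):
    return x ** 0.5    # g'(x) = 1 / (2 sqrt(x)) for x > 0

alpha, beta = 3.0, -2.0   # arbitrary constants

def S(x):
    return alpha * f(x) + beta * g(x)

x = 4.0
lhs = central_difference(S, x)                                   # derivative of the combination
rhs = alpha * central_difference(f, x) + beta * central_difference(g, x)
exact = alpha * 2 * x + beta / (2 * x ** 0.5)                    # alpha f'(x) + beta g'(x)
print(lhs, rhs, exact)
```

All three printed numbers agree to several decimal places, as the linearity rule predicts.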
Unfortunately, the derivative does not act quite as simply on products or quotients. The rules for
computing derivatives of products and quotients get their own names and theorems:
Let $f(x)$, $g(x)$ be differentiable functions, then the derivative of the product $f(x)g(x)$
exists and is given by
\[ \frac{d}{dx}\{f(x)\, g(x)\} = f'(x)\, g(x) + f(x)\, g'(x). \]
Before we proceed to the derivative of the ratio of two functions, it is worth noting a special
case of the product rule when $g(x) = f(x)$. In fact, since this is a useful special case, let us call it a
corollary¹:
\[ \frac{d}{dx}\{f(x)^2\} = 2 f(x)\, f'(x). \]
With a little work this can be generalised to other powers — but that is best done once we
understand how to compute the derivative of the composition of two functions. That requires the
chain rule (see Theorem 4.3.2 below). But before we get to that, we need to see how to take the
derivative of a quotient of two functions.
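Let $f(x)$, $g(x)$ be differentiable functions. Then the derivative of their quotient is
\[ \frac{d}{dx}\left\{\frac{f(x)}{g(x)}\right\} = \frac{f'(x)\, g(x) - f(x)\, g'(x)}{g(x)^2}, \]
which exists except at those points where $g(x) = 0$.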
So we have covered sums, differences, products and quotients. This allows us to compute
derivatives of many different functions — including polynomials and rational functions. However
we are still missing trigonometric functions (for example), and a rule for computing derivatives of
compositions of functions. These will follow in the near future, but there are a couple of things to
do before that: understand where the above theorems come from, and practice using them.
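Before looking at the proofs, it can be reassuring to test the product and quotient rules on numbers. The sketch below compares each rule with a symmetric difference quotient (a numerical stand-in, not the text's method) for $f(x) = x^2$ and $g(x) = 4x + 7$. The helper names, the sample point $x = 1.5$ and the hand-supplied derivatives $2x$ and $4$ are assumptions made for illustration.

```python
def central_difference(f, x, h=1e-6):
    """Numerical estimate of f'(x) using a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def product_rule(f, df, g, dg, x):
    """f'(x) g(x) + f(x) g'(x)."""
    return df(x) * g(x) + f(x) * dg(x)

def quotient_rule(f, df, g, dg, x):
    """(f'(x) g(x) - f(x) g'(x)) / g(x)^2, valid where g(x) is nonzero."""
    return (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2

f, df = (lambda x: x ** 2), (lambda x: 2 * x)
g, dg = (lambda x: 4 * x + 7), (lambda x: 4.0)

x = 1.5
print(product_rule(f, df, g, dg, x), central_difference(lambda t: f(t) * g(t), x))
print(quotient_rule(f, df, g, dg, x), central_difference(lambda t: f(t) / g(t), x))
```

In each printed pair the rule and the numerical estimate agree to several decimal places.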
1 Recall that a corollary is an important result that follows from one or more theorems — typically without too much
extra work — as is the case here.
\[ f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \quad\text{and}\quad g'(x) = \lim_{h \to 0} \frac{g(x+h) - g(x)}{h}. \]
Our proofs, roughly speaking, involve doing algebraic manipulations to uncover the expressions that
look like the above.
\[ S'(x) = \lim_{h \to 0} \frac{S(x+h) - S(x)}{h}. \]
Let us concentrate on the numerator of the expression inside the limit and then come back to the full
limit in a moment. Substitute in the definition of S(x):
\begin{align*}
S(x+h) - S(x) &= \big[\alpha f(x+h) + \beta g(x+h)\big] - \big[\alpha f(x) + \beta g(x)\big] && \text{collect terms} \\
&= \alpha \big[f(x+h) - f(x)\big] + \beta \big[g(x+h) - g(x)\big].
\end{align*}
Now it is easy to see the structures we need: namely, we almost have the expressions for the
derivatives $f'(x)$ and $g'(x)$. Indeed, all we need to do is divide by $h$ and take the limit. So let’s finish
things off.
\begin{align*}
S'(x) &= \lim_{h \to 0} \frac{S(x+h) - S(x)}{h} && \text{from above} \\
&= \lim_{h \to 0} \frac{\alpha \big[f(x+h) - f(x)\big] + \beta \big[g(x+h) - g(x)\big]}{h} \\
&= \lim_{h \to 0} \left[\alpha \frac{f(x+h) - f(x)}{h} + \beta \frac{g(x+h) - g(x)}{h}\right] && \text{limit laws} \\
&= \alpha \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} + \beta \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} \\
&= \alpha f'(x) + \beta g'(x),
\end{align*}
as required.
\[ P'(x) = \lim_{h \to 0} \frac{P(x+h) - P(x)}{h} \]
Again we will focus on the numerator inside the limit and massage it into the form we need. To
simplify these manipulations, define
\[ F(h) = \frac{f(x+h) - f(x)}{h} \quad\text{and}\quad G(h) = \frac{g(x+h) - g(x)}{h}. \]
Then we can write
Now since $f(x)$ and $g(x)$ do not change as we send $h$ to zero, we can pull them outside. We can
also write the third term as the product of 3 limits:
\begin{align*}
P'(x) &= f(x) \lim_{h \to 0} G(h) + g(x) \lim_{h \to 0} F(h) + \lim_{h \to 0} h \cdot \lim_{h \to 0} F(h) \cdot \lim_{h \to 0} G(h) \\
&= f(x) \cdot g'(x) + g(x) \cdot f'(x) + 0 \cdot f'(x) \cdot g'(x) \\
&= f(x) \cdot g'(x) + g(x) \cdot f'(x).
\end{align*}
• In the first step, we prove the quotient rule under the assumption that $f(x)/g(x)$ is differentiable.
• In the second step, we prove that $1/g(x)$ is differentiable. Once we know that $1/g(x)$ is
differentiable, the product rule implies that $f(x)/g(x)$ is differentiable.
Step 1: the proof of the quotient rule assuming that $\frac{f(x)}{g(x)}$ is differentiable. Write $Q(x) = \frac{f(x)}{g(x)}$. Then
$f(x) = g(x)\, Q(x)$ so that $f'(x) = g'(x)\, Q(x) + g(x)\, Q'(x)$, by the product rule, and
\begin{align*}
Q'(x) &= \frac{f'(x) - g'(x)\, Q(x)}{g(x)} = \frac{f'(x) - g'(x)\, \frac{f(x)}{g(x)}}{g(x)} \\
&= \frac{f'(x)\, g(x) - f(x)\, g'(x)}{g(x)^2}.
\end{align*}
• LIN to stand for “linearity”: $\frac{d}{dx}\{\alpha f(x) + \beta g(x)\} = \alpha f'(x) + \beta g'(x)$ (Theorem 4.1.2)
• PR to stand for “product rule”: $\frac{d}{dx}\{f(x)\, g(x)\} = f'(x)\, g(x) + f(x)\, g'(x)$ (Theorem 4.1.3)
• QR to stand for “quotient rule”: $\frac{d}{dx}\left\{\frac{f(x)}{g(x)}\right\} = \frac{f'(x)\, g(x) - f(x)\, g'(x)}{g(x)^2}$ (Theorem 4.1.5)
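\begin{align*}
\frac{d}{dx}\{4x + 7\} &= 4 \cdot \frac{d}{dx}\{x\} + 7 \cdot \frac{d}{dx}\{1\} && \text{LIN} \\
&= 4 \cdot 1 + 7 \cdot 0 = 4
\end{align*}
where we have used LIN with $f(x) = x$, $g(x) = 1$, $\alpha = 4$, $\beta = 7$.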
Example 4.1.6
Example 4.1.7
Continuing on from the previous example, we can use the product rule and the previous result to
compute
\begin{align*}
\frac{d}{dx}\{x(4x + 7)\} &= x \cdot \frac{d}{dx}\{4x + 7\} + (4x + 7)\, \frac{d}{dx}\{x\} && \text{PR} \\
&= x \cdot 4 + (4x + 7) \cdot 1 \\
&= 8x + 7
\end{align*}
where we have used the product rule PR with $f(x) = x$ and $g(x) = 4x + 7$.
Example 4.1.7
Example 4.1.8
In the same vein as the previous example, we can use the quotient rule to compute
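\begin{align*}
\frac{d}{dx}\left\{\frac{x}{4x + 7}\right\} &= \frac{(4x + 7) \cdot \frac{d}{dx}\{x\} - x \cdot \frac{d}{dx}\{4x + 7\}}{(4x + 7)^2} && \text{QR} \\
&= \frac{(4x + 7) \cdot 1 - x \cdot 4}{(4x + 7)^2} \\
&= \frac{7}{(4x + 7)^2}
\end{align*}
where we have used the quotient rule QR with $f(x) = x$ and $g(x) = 4x + 7$.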
Example 4.1.8
• So we have completed breaking down $f(x)$ into easy pieces. It is now just a matter of reversing
the break down steps, putting everything back together, starting with the easy pieces and
working up to $f(x)$. Here goes.
\begin{align*}
f_5(x) &= 3x + 1 & \text{so}\quad \frac{d}{dx} f_5(x) &= 3 \frac{d}{dx} x + \frac{d}{dx} 1 = 3 \cdot 1 + 0 = 3 && \text{LIN} \\
f_4(x) &= \frac{1}{f_5(x)} & \text{so}\quad \frac{d}{dx} f_4(x) &= -\frac{f_5'(x)}{f_5(x)^2} = -\frac{3}{(3x+1)^2} && \text{QR} \\
f_2(x) &= 2 f_3(x) + f_4(x) & \text{so}\quad \frac{d}{dx} f_2(x) &= 2 f_3'(x) + f_4'(x) = 2 - \frac{3}{(3x+1)^2} && \text{LIN} \\
f(x) &= \frac{f_1(x)}{f_2(x)} & \text{so}\quad \frac{d}{dx} f(x) &= \frac{f_1'(x)\, f_2(x) - f_1(x)\, f_2'(x)}{f_2(x)^2} && \text{QR} \\
& & &= \frac{1\left[2x + \frac{1}{3x+1}\right] - x\left[2 - \frac{3}{(3x+1)^2}\right]}{\left[2x + \frac{1}{3x+1}\right]^2}
\end{align*}
Oof!
3 This is an instance of a special case of the quotient rule (Theorem 4.1.5) which is obtained by setting $f(x) = 1$. You might see this defined elsewhere as “the derivative of a reciprocal”. It can be stated as: Let $g(x)$ be a differentiable function. Then the derivative of the reciprocal of $g$ is given by
\[
\frac{d}{dx}\left\{\frac{1}{g(x)}\right\} = -\frac{g'(x)}{g(x)^2}
\]
and exists except at those points where $g(x) = 0$.
• We now have an answer. But we really should clean it up, not only to make it easier to
read, but also because invariably such computations are just small steps inside much larger
computations. Any future computations involving this expression will be a lot easier and less
error prone if we clean it up now. Cancelling the 2x and the ´2x in
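\begin{align*}
1\left[2x+\frac{1}{3x+1}\right] - x\left[2-\frac{3}{(3x+1)^2}\right] &= 2x + \frac{1}{3x+1} - 2x + \frac{3x}{(3x+1)^2}\\
&= \frac{1}{3x+1} + \frac{3x}{(3x+1)^2}
\end{align*}
and multiplying both the numerator and denominator by $(3x+1)^2$ gives
\begin{align*}
f'(x) &= \frac{\frac{1}{3x+1} + \frac{3x}{(3x+1)^2}}{\left[2x+\frac{1}{3x+1}\right]^2}\cdot\frac{(3x+1)^2}{(3x+1)^2}\\
&= \frac{(3x+1)+3x}{\left[2x(3x+1)+1\right]^2}\\
&= \frac{6x+1}{\left[6x^2+2x+1\right]^2}.
\end{align*}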
Example 4.1.9
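Because clean-up steps like the one above are easy to get wrong, it can help to compare the unsimplified and simplified answers numerically. The following sketch is an illustrative addition, not part of the original example. It assumes $f(x) = \frac{x}{2x + \frac{1}{3x+1}}$, which is the function implied by the pieces $f_1(x) = x$ and $f_2(x) = 2x + \frac{1}{3x+1}$ used in the computation, and the function names are hypothetical.

```python
def f(x):
    """The function differentiated in the example (as implied by its pieces)."""
    return x / (2 * x + 1 / (3 * x + 1))

def fprime_unsimplified(x):
    """Quotient-rule answer before any cleaning up."""
    f2 = 2 * x + 1 / (3 * x + 1)
    f2_prime = 2 - 3 / (3 * x + 1) ** 2
    return (1 * f2 - x * f2_prime) / f2 ** 2

def fprime_simplified(x):
    """Cleaned-up answer (6x + 1) / (6x^2 + 2x + 1)^2."""
    return (6 * x + 1) / (6 * x ** 2 + 2 * x + 1) ** 2

def central_diff(g, x, h=1e-6):
    """Symmetric difference quotient, used only as an independent check."""
    return (g(x + h) - g(x - h)) / (2 * h)

for x0 in (0.7, 2.0):
    print(fprime_unsimplified(x0), fprime_simplified(x0), central_diff(f, x0))
```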
While the linearity theorem (Theorem 4.1.2) is stated for a linear combination of two functions,
it is not difficult to extend it to linear combinations of three or more functions as the following
example shows.
Example 4.1.10
We’ll start by generalising linearity to three functions.
\begin{align*}
\frac{d}{dx}\{aF(x)+bG(x)+cH(x)\} &= a\cdot\frac{d}{dx}\{F(x)\} + 1\cdot\frac{d}{dx}\{bG(x)+cH(x)\}\\
&= aF'(x) + \frac{d}{dx}\{bG(x)+cH(x)\}\\
&\qquad\text{by LIN with } \alpha=a,\ f(x)=F(x),\ \beta=1,\ \text{and } g(x)=bG(x)+cH(x),\\
&= aF'(x)+bG'(x)+cH'(x)
\end{align*}
This gives us linearity for three terms, namely (just replacing upper case names by lower case names):
\[
\frac{d}{dx}\{a f(x) + b g(x) + c h(x)\} = a f'(x) + b g'(x) + c h'(x).
\]
Just by repeating the above argument many times, we may generalise to linearity for n terms, for
any natural number n:
\[
\frac{d}{dx}\{a_1 f_1(x) + a_2 f_2(x) + \cdots + a_n f_n(x)\} = a_1 f_1'(x) + a_2 f_2'(x) + \cdots + a_n f_n'(x).
\]
Example 4.1.10
Similarly, while the product rule is stated for the product of two functions, it is not difficult to
extend it to the product of three or more functions as the following example shows.
Example 4.1.11
Once again, we’ll start by generalising the product rule to three factors.
\begin{align*}
\frac{d}{dx}\{F(x)\,G(x)\,H(x)\} &= F'(x)\,G(x)\,H(x) + F(x)\,\frac{d}{dx}\{G(x)\,H(x)\}\\
&\qquad\text{by PR with } f(x)=F(x) \text{ and } g(x)=G(x)H(x)\\
&= F'(x)\,G(x)\,H(x) + F(x)\{G'(x)\,H(x)+G(x)\,H'(x)\}\\
&\qquad\text{by PR with } f(x)=G(x) \text{ and } g(x)=H(x).
\end{align*}
This gives us a product rule for three factors, namely (just replacing upper case names by lower case names)
\[
\frac{d}{dx}\{f(x)\,g(x)\,h(x)\} = f'(x)\,g(x)\,h(x) + f(x)\,g'(x)\,h(x) + f(x)\,g(x)\,h'(x).
\]
Observe that when we differentiate a product of three factors, the answer is a sum of three terms and
in each term the derivative acts on exactly one of the original factors. Just by repeating the above
argument many times, we may generalise the product rule to give the derivative of a product of n
factors, for any natural number n:
\begin{align*}
\frac{d}{dx}\{f_1(x)\,f_2(x)\cdots f_n(x)\} &= f_1'(x)\,f_2(x)\cdots f_n(x)\\
&\quad + f_1(x)\,f_2'(x)\cdots f_n(x)\\
&\quad\ \ \vdots\\
&\quad + f_1(x)\,f_2(x)\cdots f_n'(x).
\end{align*}
When we differentiate a product of n factors, the answer is a sum of n terms and in each term the
derivative acts on exactly one of the original factors. In the first term, the derivative acts on the first
of the original factors. In the second term, the derivative acts on the second of the original factors.
And so on.
If we make $f_1(x) = f_2(x) = \cdots = f_n(x) = f(x)$ then each of the $n$ terms on the right hand side of the above equation is the product of $f'(x)$ and exactly $n-1$ $f(x)$’s, and so is exactly $f(x)^{n-1} f'(x)$. So we get the following useful result:
\[
\frac{d}{dx} f(x)^n = n\cdot f(x)^{n-1}\cdot f'(x).
\]
Example 4.1.11
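The n-fold product rule is a sum of n terms, with the derivative acting on one factor at a time, and that structure translates directly into a short loop. The sketch below is an illustrative addition under stated assumptions: the function name `product_rule` is hypothetical, and it works with the numerical values $f_i(x)$ and $f_i'(x)$ at a single point rather than with symbolic functions.

```python
def product_rule(values, derivs):
    """Derivative of f1*f2*...*fn at one point.

    values[i] is f_i(x) and derivs[i] is f_i'(x) at that same point.
    """
    total = 0.0
    for j in range(len(values)):
        # Term j: the derivative acts on factor j, the others are untouched.
        term = derivs[j]
        for i, v in enumerate(values):
            if i != j:
                term *= v
        total += term
    return total

# x * (4x + 7) * x^2 at x = 2: values 2, 15, 4 and derivatives 1, 4, 4.
three_factor = product_rule([2, 15, 4], [1, 4, 4])

# Special case f1 = f2 = f3 = f4 = f with f(x) = x at x = 3,
# which should reproduce n * f(x)^(n-1) * f'(x) = 4 * 3^3.
fourth_power = product_rule([3, 3, 3, 3], [1, 1, 1, 1])

print(three_factor, fourth_power)
```

The first value can be compared against differentiating the expanded polynomial $4x^4 + 7x^3$ by hand, which gives $16x^3 + 21x^2$.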
This last result is quite useful, so let us write it as a lemma for future reference.
Lemma 4.1.12.
\[
\frac{d}{dx} x^n = n\cdot x^{n-1}\cdot 1 = n\,x^{n-1}.
\]
Example 4.1.13
Again, this is a result we will come back to quite a few times in the future, so we should make sure we can refer to it easily. However, at present this statement only holds when $n$ is a positive integer. With a little more work we can extend this to compute the derivative of $x^q$ where $q$ is any positive rational number and then any rational number at all (positive or negative). So let us hold off for a little longer. Instead we can make it a lemma, since it will be an ingredient in quite a few of the examples following below and in constructing the final corollary.
Example 4.1.15
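\begin{align*}
\frac{d}{dx}\{2x^3+4x^5\} &= 2\,\frac{d}{dx}\{x^3\} + 4\,\frac{d}{dx}\{x^5\}\\
&\qquad\text{by LIN with } \alpha=2,\ f(x)=x^3,\ \beta=4,\ \text{and } g(x)=x^5\\
&= 2\{3x^2\} + 4\{5x^4\}\\
&\qquad\text{by Lemma 4.1.14, once with } n=3, \text{ and once with } n=5\\
&= 6x^2+20x^4.
\end{align*}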
Example 4.1.15
Example 4.1.16
In this example we’ll compute $\frac{d}{dx}\{(3x+9)(x^2+4x^3)\}$ in two different ways. For the first, we’ll start with the product rule.
\begin{align*}
\frac{d}{dx}\{(3x+9)(x^2+4x^3)\} &= \left\{\frac{d}{dx}(3x+9)\right\}(x^2+4x^3) + (3x+9)\,\frac{d}{dx}\{x^2+4x^3\}\\
&= \{3\times 1+9\times 0\}(x^2+4x^3) + (3x+9)\{2x+4(3x^2)\}\\
&= 3x^2+12x^3+(18x+114x^2+36x^3)\\
&= 18x+117x^2+48x^3.
\end{align*}
For the second, we expand the product first and then differentiate.
\begin{align*}
\frac{d}{dx}\{(3x+9)(x^2+4x^3)\} &= \frac{d}{dx}\{9x^2+39x^3+12x^4\}\\
&= 9(2x)+39(3x^2)+12(4x^3)\\
&= 18x+117x^2+48x^3.
\end{align*}
Example 4.1.16
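The two routes in Example 4.1.16 can be mirrored with polynomials stored as coefficient lists. This is a minimal sketch added for illustration, not the text's own method: the helper names `poly_mul`, `poly_add` and `poly_deriv` are assumptions, and a polynomial $c_0 + c_1 x + c_2 x^2 + \cdots$ is represented as the list `[c0, c1, c2, ...]`.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Add two polynomials, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_deriv(p):
    """Differentiate term by term: d/dx of c*x^k is k*c*x^(k-1)."""
    return [k * c for k, c in enumerate(p)][1:]

u = [9, 3]        # 3x + 9
v = [0, 0, 1, 4]  # x^2 + 4x^3

# First way: product rule, u'v + uv'.
via_product_rule = poly_add(poly_mul(poly_deriv(u), v), poly_mul(u, poly_deriv(v)))
# Second way: expand the product, then differentiate.
via_expansion = poly_deriv(poly_mul(u, v))

print(via_product_rule, via_expansion)
```

Both lists describe the same polynomial, $18x + 117x^2 + 48x^3$, matching the hand computation.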
Example 4.1.17
Example 4.1.17
Example 4.1.18
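In this example, we’ll use a little trickery to find the derivative of $\sqrt[3]{x}$. The trickery consists of observing that, by the definition of the cube root,
\[
x = \left(\sqrt[3]{x}\right)^3.
\]
Since both sides of the expression are the same, they must have the same derivatives:
\[
\frac{d}{dx}\{x\} = \frac{d}{dx}\left(\sqrt[3]{x}\right)^3.
\]
We already know by Theorem 3.3.6 that
\[
\frac{d}{dx}\{x\} = 1
\]
and that, by Lemma 4.1.12 with $n = 3$ and $f(x) = \sqrt[3]{x}$,
\[
\frac{d}{dx}\left(\sqrt[3]{x}\right)^3 = 3\left(\sqrt[3]{x}\right)^2\cdot\frac{d}{dx}\left\{\sqrt[3]{x}\right\} = 3x^{2/3}\cdot\frac{d}{dx}\left\{\sqrt[3]{x}\right\}.
\]
Since we know that $\frac{d}{dx}\{x\} = \frac{d}{dx}\left(\sqrt[3]{x}\right)^3$, we must have
\[
1 = 3x^{2/3}\cdot\frac{d}{dx}\left\{\sqrt[3]{x}\right\}
\]
which we can rearrange to give the result we need
\[
\frac{d}{dx}\left\{\sqrt[3]{x}\right\} = \tfrac13 x^{-2/3}.
\]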
Example 4.1.18
Example 4.1.19
In this example, we’ll use the same trickery as in Example 4.1.18 to find the derivative of $x^{p/q}$ for any two natural numbers $p$ and $q$. By definition of the $q$th root,
\[
x^p = \left(x^{p/q}\right)^q.
\]
That is, $x^p$ and $\left(x^{p/q}\right)^q$ are the same function, and so have the same derivative. So we differentiate both of them. We already know that, by Lemma 4.1.14 with $n = p$,
\[
\frac{d}{dx}\{x^p\} = p\,x^{p-1}
\]
and that, by Lemma 4.1.12 with $n = q$ and $f(x) = x^{p/q}$,
\[
\frac{d}{dx}\left\{\left(x^{p/q}\right)^q\right\} = q\left(x^{p/q}\right)^{q-1}\frac{d}{dx}\left\{x^{p/q}\right\}.
\]
Remember that $(x^a)^b = x^{(a\cdot b)}$. Now these two derivatives must be the same. So
\[
p\,x^{p-1} = q\cdot x^{(pq-p)/q}\,\frac{d}{dx}\left\{x^{p/q}\right\}
\]
and, rearranging things,
\begin{align*}
\frac{d}{dx}\left\{x^{p/q}\right\} &= \frac{p}{q}\,x^{p-1-(pq-p)/q}\\
&= \frac{p}{q}\,x^{(pq-q-pq+p)/q}\\
&= \frac{p}{q}\,x^{p/q-1}.
\end{align*}
So finally
\[
\frac{d}{dx}\left\{x^{p/q}\right\} = \frac{p}{q}\,x^{p/q-1}. \tag{4.1.2}
\]
Notice that this has the same form as Lemma 4.1.14, above, except with n = p/q allowed to be any
positive rational number, not just a positive integer.
Example 4.1.19
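Equation (4.1.2) can be checked numerically for a few rational exponents. The sketch below is an illustrative addition, not part of the original example: it assumes a symmetric difference quotient as the approximate derivative, and the helper names `central_diff` and `power_rule`, the sample point and the chosen exponents are all arbitrary.

```python
from fractions import Fraction

def central_diff(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

def power_rule(x, exponent):
    """Right-hand side of (4.1.2): (p/q) * x^(p/q - 1)."""
    a = float(exponent)
    return a * x ** (a - 1)

x0 = 2.0  # sample point with x > 0, so every fractional power is defined
for exponent in (Fraction(1, 3), Fraction(3, 2), Fraction(5, 7)):
    a = float(exponent)
    approx = central_diff(lambda t: t ** a, x0)
    print(exponent, approx, power_rule(x0, exponent))
```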
Example 4.1.21
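In this example we’ll use the quotient rule to find the derivative of $x^{-p/q}$, for any pair of natural numbers $p$ and $q$. By the special case of the quotient rule for reciprocals, with $g(x) = x^{p/q}$ and $g'(x) = \frac{p}{q}x^{p/q-1}$,
\[
\frac{d}{dx}\left\{x^{-p/q}\right\} = \frac{d}{dx}\left\{\frac{1}{x^{p/q}}\right\} = -\frac{\frac{p}{q}\,x^{p/q-1}}{\left(x^{p/q}\right)^2} = -\frac{p}{q}\,x^{-p/q-1}.
\]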
Example 4.1.21
Note that we have found, in Examples 3.3.4, 4.1.19 and 4.1.21, the derivative of $x^a$ for any rational number $a$, whether 0, positive, negative, integer or fractional. In all cases, the answer is
\[
\frac{d}{dx}x^a = a\,x^{a-1}.
\]
We shall show, in Example 4.4.5, that the formula $\frac{d}{dx}x^a = a\,x^{a-1}$ in fact applies for all real numbers $a$, not just rational numbers.
Back in Example 3.3.14 we computed the derivative of $\sqrt{x}$ from the definition of the derivative. The above corollary (correctly) gives
\[
\frac{d}{dx}x^{1/2} = \frac12\,x^{-1/2}
\]
• As we have seen before, the best strategy for dealing with nasty expressions is to break them up into easy pieces. We can think of $f(x)$ as the five-fold product
\[
f(x) = f_1(x)\cdot f_2(x)\cdot f_3(x)\cdot\frac{1}{f_4(x)}\cdot\frac{1}{f_5(x)}
\]
with
\[
f_1(x) = \sqrt{x}-1 \qquad f_2(x) = 2-x \qquad f_3(x) = 1-x^2 \qquad f_4(x) = \sqrt{x} \qquad f_5(x) = 3+2x.
\]
The derivatives of these pieces are
\[
f_1'(x) = \frac{1}{2\sqrt{x}} \qquad f_2'(x) = -1 \qquad f_3'(x) = -2x \qquad f_4'(x) = \frac{1}{2\sqrt{x}} \qquad f_5'(x) = 2.
\]
• Now, to get the derivative $f'(x)$ we use the n-fold product rule which was developed in
C OMPUTING D ERIVATIVES 4.2 T RIGONOMETRIC FUNCTIONS AND THEIR DERIVATIVES
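\begin{align*}
f'(x) &= f_1' f_2 f_3\frac{1}{f_4}\frac{1}{f_5} + f_1 f_2' f_3\frac{1}{f_4}\frac{1}{f_5} + f_1 f_2 f_3'\frac{1}{f_4}\frac{1}{f_5} - f_1 f_2 f_3\frac{f_4'}{f_4^2}\frac{1}{f_5} - f_1 f_2 f_3\frac{1}{f_4}\frac{f_5'}{f_5^2}\\
&= \left[\frac{f_1'}{f_1} + \frac{f_2'}{f_2} + \frac{f_3'}{f_3} - \frac{f_4'}{f_4} - \frac{f_5'}{f_5}\right] f_1 f_2 f_3\frac{1}{f_4}\frac{1}{f_5}\\
&= \left[\frac{1}{2\sqrt{x}(\sqrt{x}-1)} - \frac{1}{2-x} - \frac{2x}{1-x^2} - \frac{1}{2x} - \frac{2}{3+2x}\right]\frac{(\sqrt{x}-1)(2-x)(1-x^2)}{\sqrt{x}\,(3+2x)}.
\end{align*}
The trick that we used in going from the first line to the second line, namely multiplying term number $j$ by $\frac{f_j(x)}{f_j(x)}$, is often useful in simplifying the derivative of a product of many factors4 .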
Example 4.1.23
Learning Objectives
• Review the definitions of trigonometric functions.
We are now going to compute the derivatives of the various trigonometric functions, sin x, cos x
and so on. The computations are more involved than the others that we have done so far and will
take several steps. Fortunately, the final answers will be very simple.
Observe that we only need to work out the derivatives of sin x and cos x, since the other trigono-
metric functions are really just quotients of these two functions. Recall:
\[
\tan x = \frac{\sin x}{\cos x} \qquad \cot x = \frac{\cos x}{\sin x} \qquad \csc x = \frac{1}{\sin x} \qquad \sec x = \frac{1}{\cos x}.
\]
The first step towards computing the derivatives of sin x and cos x is to find their derivatives at x = 0.
The derivatives at general points x will follow quickly from these, using trig identities. It is important
to note that we must measure angles in radians5 , rather than degrees, in what follows. Indeed —
unless explicitly stated otherwise, any number that is put into a trigonometric function is measured
in radians.
§§ Step 1: $\frac{d}{dx}\{\sin x\}\big|_{x=0}$
By definition, the derivative of sin x evaluated at x = 0 is
\[
\frac{d}{dx}\{\sin x\}\Big|_{x=0} = \lim_{h\to 0}\frac{\sin h - \sin 0}{h} = \lim_{h\to 0}\frac{\sin h}{h}.
\]
We will prove this limit by use of a theorem called the squeeze theorem6 . To get there we will first
need to do some geometry. But first we will build some intuition.
The figure below contains part of a circle of radius 1. Recall that an arc of length h on such a
circle subtends an angle of h radians at the centre of the circle. So the darkened arc in the figure
has length h and the darkened vertical line in the figure has length sin h. We must determine what
happens to the ratio of the lengths of the darkened vertical line and darkened arc as h tends to zero.
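[Figure: part of a circle of radius 1, showing the darkened arc of length h, the darkened vertical line of length sin h, and the horizontal segment of length cos h.]
Here is a magnified version of the part of the above figure that contains the darkened arc and vertical line.
[Figure: magnified view of the arc and the vertical line of length sin h, with h = 0.4.]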
6 The squeeze theorem is not part of the Math 100 content, but we do need to use its results for this proof. This theorem tells us that we can compute the limit of a function by “squeezing” or “sandwiching” it between two other functions. If the upper function and the lower function both tend to the same value, then so does the function that is squeezed between them. Formally, we would state it as: Let $a \in \mathbb{R}$ and let $f, g, h$ be three functions so that $f(x) \le g(x) \le h(x)$ for all $x$ in an interval around $a$, except possibly exactly at $x = a$. Then if $\lim_{x\to a} f(x) = \lim_{x\to a} h(x) = L$ then it is also the case that $\lim_{x\to a} g(x) = L$. (We do not prove it here.)
This particular figure has been drawn with h = .4 radians. Here are three more such blow ups.
In each successive figure, the value of h is smaller. To make the figures clearer, the degree of
magnification was increased each time h was decreased.
As we make h smaller and smaller and look at the figure with ever increasing magnification, the arc
of length h and vertical line of length sin h look more and more alike. We would guess from this that
\[
\lim_{h\to 0}\frac{\sin h}{h} = 1.
\]
The following tables of values

  h       sin h          (sin h)/h    |   h        sin h           (sin h)/h
  0.4     .3894          .9735        |   -0.4     -.3894          .9735
  0.2     .1987          .9933        |   -0.2     -.1987          .9933
  0.1     .09983         .9983        |   -0.1     -.09983         .9983
  0.05    .049979        .99958       |   -0.05    -.049979        .99958
  0.01    .00999983      .999983      |   -0.01    -.00999983      .999983
  0.001   .00099999983   .99999983    |   -0.001   -.00099999983   .99999983

suggest the same guess. Here is an argument that shows that the guess really is correct.
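A table like this is easy to regenerate, which is a useful habit when a value looks suspicious. The short sketch below is an illustrative addition using only Python's standard `math` module; the function name `ratio` and the print formatting are arbitrary choices.

```python
import math

def ratio(h):
    """The quantity (sin h)/h for a nonzero h measured in radians."""
    return math.sin(h) / h

for h in (0.4, 0.2, 0.1, 0.05, 0.01, 0.001):
    # Positive and negative h give the same ratio, since sin(-h) = -sin(h).
    print(f"{h:<6} {math.sin(h):.11f} {ratio(h):.9f} {ratio(-h):.9f}")
```

The printed ratios creep up towards 1 as h shrinks, in line with the guess; they do not prove it, which is what the geometric argument is for.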
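§§§ Proof that $\lim_{h\to 0}\frac{\sin h}{h} = 1$:
[Figure: a sector of the unit circle with angle h at the centre O, with labelled points O, P, Q, R, S and segments of length 1, cos h, sin h and tan h.]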
Now we can use a few geometric facts about this figure to establish both an upper bound and a lower bound on $\frac{\sin h}{h}$ with both the upper and lower bounds tending to 1 as h tends to 0. So the squeeze theorem7 will tell us that $\frac{\sin h}{h}$ also tends to 1 as h tends to 0.
• The triangle OPR has base 1 and height sin h, and hence
\[
\text{area of } \triangle OPR = \tfrac12\times 1\times\sin h = \frac{\sin h}{2}.
\]
• The triangle OQR has base 1 and height tan h, and hence
\[
\text{area of } \triangle OQR = \tfrac12\times 1\times\tan h = \frac{\tan h}{2}.
\]
• The “piece of pie” OPR cut out of the circle is the fraction $\frac{h}{2\pi}$ of the whole circle (since the angle at the corner of the piece of pie is h radians and the angle for the whole circle is 2π radians). Since the circle has radius 1 we have
\[
\text{area of pie } OPR = \frac{h}{2\pi}\cdot(\text{area of circle}) = \frac{h}{2\pi}\,\pi\cdot 1^2 = \frac{h}{2}
\]
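Now the triangle OPR is contained inside the piece of pie OPR, and so the area of the triangle is smaller than the area of the piece of pie. Similarly, the piece of pie OPR is contained inside the triangle OQR. Thus we have
\[
\frac{\sin h}{2} \le \frac{h}{2} \le \frac{\tan h}{2}.
\]
For $0 < h < \frac{\pi}{2}$, dividing through by $\frac{\sin h}{2}$ gives $1 \le \frac{h}{\sin h} \le \frac{1}{\cos h}$, and taking reciprocals (which reverses the inequalities) gives
\[
\cos h \le \frac{\sin h}{h} \le 1.
\]
Since $\cos h$ and $\frac{\sin h}{h}$ are both even functions of $h$, the same inequality holds for $-\frac{\pi}{2} < h < 0$.
7 Again, we aren’t proving the squeeze theorem, nor are we requiring you to know it (see the previous footnote). What you need to know here is that we are “squeezing” the function sin h/h between the upper and lower bounds.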
We know that
\[
\lim_{h\to 0}\cos h = 1.
\]
Since $\frac{\sin h}{h}$ is sandwiched between $\cos h$ and 1, we can apply the squeeze theorem for limits to deduce the following lemma:
Lemma 4.2.1.
\[
\lim_{h\to 0}\frac{\sin h}{h} = 1
\]
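The sandwich used to prove Lemma 4.2.1 can be spot-checked on a few sample values. The sketch below is an illustrative addition: the function name `squeezed` and the sample points are arbitrary, and a finite set of checks only illustrates the inequality $\cos h \le \frac{\sin h}{h} \le 1$, it does not replace the geometric proof.

```python
import math

def squeezed(h):
    """True if cos h <= (sin h)/h <= 1 holds for this nonzero h (in radians)."""
    return math.cos(h) <= math.sin(h) / h <= 1.0

samples = (1.0, 0.5, 0.1, 0.01, -0.01, -0.5)
for h in samples:
    # Lower bound, squeezed quantity, and whether the sandwich holds.
    print(h, math.cos(h), math.sin(h) / h, squeezed(h))
```

As h shrinks, the printed lower bound cos h and the ratio both close in on the upper bound 1.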
Since this argument took a bit of work, perhaps we should remind ourselves why we needed it in
the first place. We were computing
\begin{align*}
\frac{d}{dx}\{\sin x\}\Big|_{x=0} &= \lim_{h\to 0}\frac{\sin h - \sin 0}{h}\\
&= \lim_{h\to 0}\frac{\sin h}{h} && \text{(This is why!)}\\
&= 1.
\end{align*}
This concludes Step 1. We now know that $\frac{d}{dx}\sin x\big|_{x=0} = 1$. The remaining steps are easier.
§§ Step 2: $\frac{d}{dx}\{\cos x\}\big|_{x=0}$
Fortunately we don’t have to wade through geometry like we did for the previous step. Instead we can recycle our work and massage the above limit to rewrite it in terms of expressions involving $\frac{\sin h}{h}$. Thanks to Lemma 4.2.1 the work is then easy.
We’ll show you two ways to proceed — one uses a method similar to “multiplying by the
conjugate” that we have already used a few times (see Example 3.3.14 ), while the other uses a nice
trick involving the double–angle formula.
where we have used the fact that $\lim_{h\to 0}\frac{\sin h}{h} = 1$ and that the limit of a product is the product of limits (i.e. Lemma 4.2.1 and Theorem 2.1.14).
Thus we have now produced two proofs of the following lemma:
Lemma 4.2.2.
\[
\lim_{h\to 0}\frac{\cos h - 1}{h} = 0
\]
Again, there has been a bit of work to get to here, so we should remind ourselves why we needed
it. We were computing
\begin{align*}
\frac{d}{dx}\{\cos x\}\Big|_{x=0} &= \lim_{h\to 0}\frac{\cos h - \cos 0}{h}\\
&= \lim_{h\to 0}\frac{\cos h - 1}{h}\\
&= 0.
\end{align*}
Armed with these results we can now build up the derivatives of sine and cosine.
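§§ Step 3: $\frac{d}{dx}\{\sin x\}$ and $\frac{d}{dx}\{\cos x\}$ for General x
To proceed to the general derivatives of sin x and cos x we are going to use the above two results and a couple of trig identities. Remember the addition formulae
\begin{align*}
\sin(a+b) &= \sin a\cos b + \cos a\sin b\\
\cos(a+b) &= \cos a\cos b - \sin a\sin b
\end{align*}
To compute the derivative of sin(x) we just start from the definition of the derivative:
\begin{align*}
\frac{d}{dx}\sin x &= \lim_{h\to 0}\frac{\sin(x+h) - \sin x}{h}\\
&= \lim_{h\to 0}\frac{\sin x\cos h + \cos x\sin h - \sin x}{h}\\
&= \lim_{h\to 0}\left[\sin x\,\frac{\cos h - 1}{h} + \cos x\,\frac{\sin h - 0}{h}\right]\\
&= \sin x\lim_{h\to 0}\frac{\cos h - 1}{h} + \cos x\lim_{h\to 0}\frac{\sin h - 0}{h}\\
&= \sin x\underbrace{\left[\frac{d}{dx}\cos x\right]_{x=0}}_{=0} + \cos x\underbrace{\left[\frac{d}{dx}\sin x\right]_{x=0}}_{=1}\\
&= \cos x.
\end{align*}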
Lemma 4.2.3.
\[
\frac{d}{dx}\sin x = \cos x \qquad \frac{d}{dx}\cos x = -\sin x
\]
The above formulas hold provided x is measured in radians.
These formulae are pretty easy to remember: applying $\frac{d}{dx}$ to sin x and cos x just exchanges sin x and cos x, except for the minus sign8 in the derivative of cos x.
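Remark 4.2.4 (Optional: another derivation of $\frac{d}{dx}\cos x = -\sin x$). We remark that, once one knows that $\frac{d}{dx}\sin x = \cos x$, it is easy to use it and the trig identity $\cos(x) = \sin\left(\frac{\pi}{2} - x\right)$ to derive $\frac{d}{dx}\cos x = -\sin x$. Here is how9 .
\begin{align*}
\frac{d}{dx}\cos x &= \lim_{h\to 0}\frac{\cos(x+h) - \cos x}{h} = \lim_{h\to 0}\frac{\sin\left(\frac{\pi}{2} - x - h\right) - \sin\left(\frac{\pi}{2} - x\right)}{h}\\
&= -\lim_{h'\to 0}\frac{\sin(x' + h') - \sin(x')}{h'} \qquad\text{with } x' = \tfrac{\pi}{2} - x,\ h' = -h\\
&= -\frac{d}{dx'}\sin x'\Big|_{x' = \frac{\pi}{2} - x} = -\cos\left(\tfrac{\pi}{2} - x\right)\\
&= -\sin x.
\end{align*}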
Note that, if x is measured in degrees, then the formulas of Lemma 4.2.3 are wrong. There are
similar formulas, but we need the chain rule to build them — that is the subject of the next section.
But first we should find the derivatives of the other trig functions.
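The warning about degrees can be seen numerically. The sketch below is an illustrative addition, not part of the original text: it differentiates "sine of x degrees" with a symmetric difference quotient and compares the slope with the naive guess cos x and with the guess rescaled by π/180. The helper names are hypothetical, and the rescaled formula is quoted here only as a fact to test; the text derives such formulas later with the chain rule.

```python
import math

def central_diff(f, x, h=1e-4):
    """Approximate f'(x) by the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

def sin_degrees(x):
    """Sine of an angle given in degrees."""
    return math.sin(math.radians(x))

x0 = 30.0  # degrees
slope = central_diff(sin_degrees, x0)
naive = math.cos(math.radians(x0))   # what Lemma 4.2.3 would give if misapplied
rescaled = (math.pi / 180) * naive   # naive value multiplied by pi/180

print(slope, naive, rescaled)
```

The measured slope matches the rescaled value and is nowhere near the naive one, which is why the formulas of Lemma 4.2.3 require radians.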
8 There is a bad pun somewhere in here about sine errors and sign errors.
9 We thank Serban Raianu for suggesting that we include this.
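\begin{align*}
\tan x &= \frac{\sin x}{\cos x} & \cot x &= \frac{\cos x}{\sin x} = \frac{1}{\tan x}\\
\csc x &= \frac{1}{\sin x} & \sec x &= \frac{1}{\cos x}
\end{align*}
So, by the quotient rule,
\begin{align*}
\frac{d}{dx}\tan x &= \frac{d}{dx}\frac{\sin x}{\cos x} = \frac{\overbrace{\left(\tfrac{d}{dx}\sin x\right)}^{\cos x}\cos x - \sin x\overbrace{\left(\tfrac{d}{dx}\cos x\right)}^{-\sin x}}{\cos^2 x} = \sec^2 x\\
\frac{d}{dx}\csc x &= \frac{d}{dx}\frac{1}{\sin x} = -\frac{\overbrace{\left(\tfrac{d}{dx}\sin x\right)}^{\cos x}}{\sin^2 x} = -\csc x\cot x\\
\frac{d}{dx}\sec x &= \frac{d}{dx}\frac{1}{\cos x} = -\frac{\overbrace{\left(\tfrac{d}{dx}\cos x\right)}^{-\sin x}}{\cos^2 x} = \sec x\tan x\\
\frac{d}{dx}\cot x &= \frac{d}{dx}\frac{\cos x}{\sin x} = \frac{\overbrace{\left(\tfrac{d}{dx}\cos x\right)}^{-\sin x}\sin x - \cos x\overbrace{\left(\tfrac{d}{dx}\sin x\right)}^{\cos x}}{\sin^2 x} = -\csc^2 x.
\end{align*}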
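The four quotient-rule results above can be compared against difference quotients. This is a hedged sketch added for illustration: the helper `central_diff`, the dictionary layout and the sample point are assumptions, and the trig functions are written in terms of `math.sin` and `math.cos` because the standard library has no `csc`, `sec` or `cot`.

```python
import math

def central_diff(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

# name: (function, derivative claimed in the text)
checks = {
    "tan": (math.tan,                          lambda x: 1 / math.cos(x) ** 2),             # sec^2 x
    "csc": (lambda x: 1 / math.sin(x),         lambda x: -math.cos(x) / math.sin(x) ** 2),  # -csc x cot x
    "sec": (lambda x: 1 / math.cos(x),         lambda x: math.sin(x) / math.cos(x) ** 2),   # sec x tan x
    "cot": (lambda x: math.cos(x) / math.sin(x), lambda x: -1 / math.sin(x) ** 2),          # -csc^2 x
}

x0 = 0.8  # radians; sin and cos are both nonzero here
for name, (f, fprime) in checks.items():
    print(name, central_diff(f, x0), fprime(x0))
```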
§§ Summary
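To summarise all this work, we can write this up as a theorem: the derivatives of the six trigonometric functions computed above are
\begin{align*}
\frac{d}{dx}\sin x &= \cos x & \frac{d}{dx}\cos x &= -\sin x & \frac{d}{dx}\tan x &= \sec^2 x\\
\frac{d}{dx}\csc x &= -\csc x\cot x & \frac{d}{dx}\sec x &= \sec x\tan x & \frac{d}{dx}\cot x &= -\csc^2 x
\end{align*}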
Of these 6 derivatives you should really memorise those of sine, cosine and tangent. We certainly
expect you to be able to work out those of cotangent, cosecant and secant.
C OMPUTING D ERIVATIVES 4.3 T HE CHAIN RULE
Learning Objectives
• Use the chain rule to compute derivatives of compositions of functions.
We have built up most of the tools that we need to express derivatives of complicated functions
in terms of derivatives of simpler known functions. We started by learning how to evaluate
• derivatives of sums, products and quotients,
• derivatives of constants and monomials.
These tools allow us to compute derivatives of polynomials and rational functions. We have also
added exponential and trigonometric functions to our list. The final tool we add is called the chain
rule. It tells us how to take the derivative of a composition of two functions. That is, if we know $f(x)$ and $g(x)$ and their derivatives, then the chain rule tells us the derivative of $f(g(x))$.
Before we get to the statement of the rule, let us look at an example showing how such a
composition might arise (in the “real-world”).
Example 4.3.1
You are out in the woods after a long day of mathematics and are walking towards your camp fire
on a beautiful still night. The heat from the fire means that the air temperature depends on your
position. Let your position at time t be x(t ). The temperature of the air at position x is f (x). What
instantaneous rate of change of temperature do you feel at time t?
• Because your position at time $t$ is $x = x(t)$, the temperature you feel at time $t$ is $F(t) = f(x(t))$.
• The instantaneous rate of change of temperature that you feel is $F'(t)$. We have a complicated function, $F(t)$, constructed by composing two simpler functions, $x(t)$ and $f(x)$.
• We wish to compute the derivative, $F'(t) = \frac{d}{dt}f(x(t))$, of the complicated function $F(t)$ in terms of the derivatives, $x'(t)$ and $f'(x)$, of the two simple functions. This is exactly what the chain rule does.
Example 4.3.1
Let $a \in \mathbb{R}$ and let $g(x)$ be a function that is differentiable at $x = a$. Now let $f(u)$ be a function that is differentiable at $u = g(a)$. Then the function $F(x) = f(g(x))$ is differentiable at $x = a$ and
\[
F'(a) = f'(g(a))\,g'(a)
\]
Here, as was the case earlier in this chapter, we have been very careful to give the point at which
the derivative is evaluated a special name (i.e. a). But of course this evaluation point can really be
any point (where the derivative is defined). So it is very common to just call the evaluation point “x”
rather than give it a special name like “a”, like this:
Notice that when we form the composition $f(g(x))$ there is an “outside” function (namely $f(x)$) and an “inside” function (namely $g(x)$). The chain rule tells us that when we differentiate a composition we have to differentiate the outside and then multiply by the derivative of the inside.
\[
\frac{d}{dx}f(g(x)) = \underbrace{f'(g(x))}_{\text{diff outside}}\cdot\underbrace{g'(x)}_{\text{diff inside}}
\]
Here is another statement of the chain rule which makes this idea more explicit.
\[
\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx}
\]
This particular form is easy to remember because it looks like we can just “cancel” the du between the two terms.
\[
\frac{dy}{dx} = \frac{dy}{\cancel{du}}\cdot\frac{\cancel{du}}{dx}
\]
Of course, du is not, by itself, a number or variable10 that can be cancelled. But this is still a
good memory aid.
The hardest part about applying the chain rule is recognising when the function you are trying to
differentiate is really the composition of two simpler functions. This takes a little practice. We can
warm up with a couple of simple examples.
Example 4.3.5
Let $f(u) = u^5$ and $g(x) = \sin(x)$. Then set $F(x) = f(g(x)) = (\sin(x))^5$. To find the derivative of $F(x)$ we can simply apply the chain rule; the pieces of the composition have been laid out for us. Here they are:
\begin{align*}
f(u) &= u^5 & f'(u) &= 5u^4\\
g(x) &= \sin(x) & g'(x) &= \cos x.
\end{align*}
We now just put them together as the chain rule tells us:
\begin{align*}
\frac{dF}{dx} &= f'(g(x))\cdot g'(x)\\
&= 5(g(x))^4\cdot\cos(x) && \text{since } f'(u) = 5u^4\\
&= 5(\sin(x))^4\cdot\cos(x).
\end{align*}
Notice that it is quite easy to extend this to any power. Set $f(u) = u^n$. Then follow the same steps and we arrive at
\[
F(x) = (\sin(x))^n, \qquad F'(x) = n(\sin(x))^{n-1}\cos(x).
\]
Example 4.3.5
This example shows one of the ways that the chain rule appears very frequently — when we
need to differentiate the power of some simpler function. More generally we have the following.
Example 4.3.6
Let $f(u) = u^n$ and let $g(x)$ be any differentiable function. Set $F(x) = f(g(x)) = g(x)^n$. Then
\[
\frac{dF}{dx} = \frac{d}{dx}g(x)^n = n\,g(x)^{n-1}\cdot g'(x)
\]
This is precisely the result in Example 4.1.11 and Lemma 4.1.12.
Example 4.3.6
Example 4.3.7
Let $f(u) = \cos(u)$ and $g(x) = 3x - 2$. Find the derivative of
\[
F(x) = f(g(x)) = \cos(3x - 2).
\]
10 In this context du is called a differential. There are ways to understand and manipulate these in calculus but they
are beyond the scope of this course.
Again we should approach this by first writing down f and g and their derivatives and then
putting everything together as the chain rule tells us.
Example 4.3.7
This example shows a second way that the chain rule appears very frequently — when we need
to differentiate some function of ax + b. More generally we have the following.
Example 4.3.8
Let $a, b \in \mathbb{R}$ and let $f(x)$ be a differentiable function. Set $g(x) = ax + b$. Then
\begin{align*}
\frac{d}{dx}f(ax+b) &= \frac{d}{dx}f(g(x))\\
&= f'(g(x))\cdot g'(x)\\
&= f'(ax+b)\cdot a.
\end{align*}
Corollary 4.3.9.
\[
\frac{d}{dx}f(ax+b) = a\,f'(ax+b).
\]
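Corollary 4.3.9 translates into a one-line helper. The sketch below is an illustrative addition under stated assumptions: the names `chain_linear` and `central_diff` are hypothetical, and the test case uses the function $\cos(3x-2)$ from Example 4.3.7, with $f'(u) = -\sin u$ taken from Lemma 4.2.3.

```python
import math

def central_diff(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

def chain_linear(fprime, a, b):
    """Return the derivative of x -> f(a*x + b), given the derivative f' of f."""
    return lambda x: a * fprime(a * x + b)

def F(x):
    """The composition from Example 4.3.7: cos(3x - 2)."""
    return math.cos(3 * x - 2)

# f = cos, so f'(u) = -sin(u); here a = 3 and b = -2.
F_prime = chain_linear(lambda u: -math.sin(u), 3, -2)

x0 = 0.4
print(F_prime(x0), central_diff(F, x0))
```

The factor `a` in front is the derivative of the inside function $ax + b$; forgetting it is the most common slip when applying the corollary.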
Notice that the units of measurement on both sides of the equation agree, as indeed they must. To see this, let us assume that $t$ is measured in seconds, that $x(t)$ is measured in metres and that $f(x)$ is measured in degrees. Because of this $F(t) = f(x(t))$ must also be measured in degrees (since it is a temperature).
What about the derivatives? These are rates of change. So
• $F'(t)$ has units degrees/second,
• $f'(x)$ has units degrees/metre, and
• $x'(t)$ has units metre/second.
Before we can compute F 1 (a), we need to set up some ground work, and in particular the
definitions of our given derivatives:
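Set
\[
\varphi(H) = \frac{f(b+H) - f(b)}{H}.
\]
Then we have
\[
\lim_{H\to 0}\varphi(H) = f'(b) = f'(g(a)) \qquad\text{since } b = g(a), \tag{4.3.1}
\]
and we can also write (with a little juggling)
\[
f(b+H) = f(b) + H\varphi(H).
\]
Similarly set
\[
\gamma(h) = \frac{g(a+h) - g(a)}{h}
\]
which gives us
\[
\lim_{h\to 0}\gamma(h) = g'(a) \qquad\text{and}\qquad g(a+h) = g(a) + h\gamma(h).
\]
Now we can start computing
\begin{align*}
F'(a) &= \lim_{h\to 0}\frac{F(a+h) - F(a)}{h}\\
&= \lim_{h\to 0}\frac{f(g(a+h)) - f(g(a))}{h}.
\end{align*}
We know that $g(a) = b$ and $g(a+h) = g(a) + h\gamma(h)$, so
\begin{align*}
F'(a) &= \lim_{h\to 0}\frac{f(g(a) + h\gamma(h)) - f(g(a))}{h}\\
&= \lim_{h\to 0}\frac{f(b + h\gamma(h)) - f(b)}{h}.
\end{align*}
Now for the sneaky bit. We can turn $f(b + h\gamma(h))$ into $f(b+H)$ by setting
\[
H = h\gamma(h).
\]
Now notice that as $h \to 0$ we have
\begin{align*}
\lim_{h\to 0}H &= \lim_{h\to 0}h\cdot\gamma(h)\\
&= \lim_{h\to 0}h\cdot\lim_{h\to 0}\gamma(h)\\
&= 0\cdot g'(a) = 0.
\end{align*}
So as $h \to 0$ we also have $H \to 0$.
We now have
\begin{align*}
F'(a) &= \lim_{h\to 0}\frac{f(b+H) - f(b)}{h}\\
&= \lim_{h\to 0}\underbrace{\frac{f(b+H) - f(b)}{H}}_{=\varphi(H)}\cdot\underbrace{\frac{H}{h}}_{=\gamma(h)} && \text{if } H = h\gamma(h) \ne 0\\
&= \lim_{h\to 0}\varphi(H)\cdot\gamma(h)\\
&= \lim_{h\to 0}\varphi(H)\cdot\lim_{h\to 0}\gamma(h) && \text{since } H \to 0 \text{ as } h \to 0\\
&= \lim_{H\to 0}\varphi(H)\cdot\lim_{h\to 0}\gamma(h) = f'(b)\cdot g'(a)
\end{align*}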
This is exactly the RHS of the chain rule. It is possible to have $H = 0$ in the second line above. But that possibility is easy to deal with:
• If $g'(a) \ne 0$, then, since $\lim_{h\to 0}\gamma(h) = g'(a)$, $H = h\gamma(h)$ cannot be 0 for small nonzero $h$. Technically, there is an $h_0 > 0$ such that $H = h\gamma(h) \ne 0$ for all $0 < |h| < h_0$. In taking the limit $h \to 0$, above, we need only consider $0 < |h| < h_0$ and so, in this case, the above computation is completely correct.
• If $g'(a) = 0$, the above computation is still fine provided we exclude all $h$’s for which $H = h\gamma(h) = 0$. When $g'(a) = 0$, the right hand side, $f'(g(a))\cdot g'(a)$, of the chain rule is 0.
Example 4.3.11
Example 4.3.12
Find $\frac{d}{dx}\sin(x^2)$.
In this example we are to compute the derivative of sin with a (slightly) complicated argument.
So we apply the chain rule with f being sin and g(x) being the complicated argument. That is, we
set
Example 4.3.12
Example 4.3.13
Find $\frac{d}{dx}\sqrt[3]{\sin(x^2)}$.
In this example we are to compute the derivative of the cube root of a (moderately) complicated argument, namely $\sin(x^2)$. So we apply the chain rule with $f$ being “cube root” and $g(x)$ being the complicated argument. That is, we set
\begin{align*}
f(u) &= \sqrt[3]{u} = u^{1/3} & f'(u) &= \tfrac13 u^{-2/3}\\
g(x) &= \sin(x^2) & g'(x) &= 2x\cos(x^2)\\
F(x) &= f(g(x)) = \sqrt[3]{g(x)} = \sqrt[3]{\sin(x^2)}.
\end{align*}
In computing $g'(x)$ here, we have already used the chain rule once (in Example 4.3.12). By the chain rule,
\begin{align*}
F'(x) &= f'(g(x))\,g'(x) = \tfrac13\,g(x)^{-2/3}\cdot 2x\cos(x^2)\\
&= \frac{2x\cos(x^2)}{3\,[\sin(x^2)]^{2/3}}.
\end{align*}
Example 4.3.13
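The answer to Example 4.3.13 can be compared with a difference quotient at a sample point. This sketch is an illustrative addition: the function names are hypothetical, and the check is restricted to a point where $\sin(x^2) > 0$ so that the fractional powers are real numbers in floating-point arithmetic.

```python
import math

def F(x):
    """Cube root of sin(x^2); used only where sin(x^2) > 0."""
    return math.sin(x * x) ** (1 / 3)

def F_prime(x):
    """Chain-rule answer: 2x cos(x^2) / (3 [sin(x^2)]^(2/3))."""
    return 2 * x * math.cos(x * x) / (3 * math.sin(x * x) ** (2 / 3))

def central_diff(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 1.0  # sin(1) is positive, so F is well defined near this point
print(F_prime(x0), central_diff(F, x0))
```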
Example 4.3.14
Find $\frac{d}{dx}f(g(h(x)))$.
This is very similar to the previous example. Let us set $F(x) = f(g(h(x)))$ with $u = g(h(x))$. Then the chain rule tells us that
\begin{align*}
\frac{dF}{dx} &= \frac{df}{du}\cdot\frac{du}{dx}\\
&= f'(g(h(x)))\cdot\frac{d}{dx}g(h(x)).
\end{align*}
Indeed it is not too hard to generalise further (in the manner of Example 4.1.11) to find the derivative of the composition of 4 or more functions (though things start to become tedious to write down):
\begin{align*}
\frac{d}{dx}f_1(f_2(f_3(f_4(x)))) &= f_1'(f_2(f_3(f_4(x))))\cdot\frac{d}{dx}f_2(f_3(f_4(x)))\\
&= f_1'(f_2(f_3(f_4(x))))\cdot f_2'(f_3(f_4(x)))\cdot\frac{d}{dx}f_3(f_4(x))\\
&= f_1'(f_2(f_3(f_4(x))))\cdot f_2'(f_3(f_4(x)))\cdot f_3'(f_4(x))\cdot f_4'(x).
\end{align*}
Example 4.3.14
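The pattern for a composition of many functions is a product of derivatives, each evaluated at the output of everything inside it. That bookkeeping can be sketched in a few lines of code. The example below is an illustrative addition with hypothetical names: `chain_derivative` takes a list of `(function, derivative)` pairs ordered from outermost to innermost, and the test composition is the cube root of $\sin(x^2)$.

```python
import math

def chain_derivative(layers, x):
    """Derivative at x of f1(f2(...fn(x)...)).

    layers is [(f1, f1'), (f2, f2'), ..., (fn, fn')], outermost first.
    """
    # Work from the inside out, recording the input each layer receives.
    inputs = []
    value = x
    for f, _ in reversed(layers):
        inputs.append(value)
        value = f(value)
    inputs.reverse()  # inputs[k] is now the argument fed to layers[k]

    # Multiply the derivative of each layer, evaluated at its own input.
    result = 1.0
    for (_, fprime), u in zip(layers, inputs):
        result *= fprime(u)
    return result

layers = [
    (lambda u: u ** (1 / 3), lambda u: u ** (-2 / 3) / 3),  # cube root
    (math.sin, math.cos),                                   # sine
    (lambda t: t * t, lambda t: 2 * t),                     # squaring
]

print(chain_derivative(layers, 1.0))
```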
Example 4.3.15
We can also use the chain rule to calculate the derivative of the reciprocal of a function11 , and from
there we can use the product rule to recover the quotient rule.
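We want to differentiate $F(x) = \frac{1}{g(x)}$ so set $f(u) = \frac{1}{u}$ and $u = g(x)$. Then the chain rule tells us
\begin{align*}
\frac{d}{dx}\left\{\frac{1}{g(x)}\right\} &= \frac{dF}{dx} = \frac{df}{du}\cdot\frac{du}{dx}\\
&= \frac{-1}{u^2}\cdot g'(x)\\
&= -\frac{g'(x)}{g(x)^2}.
\end{align*}
Once we know this, a quick application of the product rule will give us the quotient rule.
\begin{align*}
\frac{d}{dx}\left\{\frac{f(x)}{g(x)}\right\} &= \frac{d}{dx}\left\{f(x)\cdot\frac{1}{g(x)}\right\} && \text{use PR}\\
&= f'(x)\cdot\frac{1}{g(x)} + f(x)\cdot\frac{d}{dx}\left\{\frac{1}{g(x)}\right\} && \text{use the result from above}\\
&= f'(x)\cdot\frac{1}{g(x)} - f(x)\cdot\frac{g'(x)}{g(x)^2} && \text{place over a common denominator}\\
&= \frac{f'(x)\cdot g(x) - f(x)\cdot g'(x)}{g(x)^2}
\end{align*}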
Example 4.3.16
Compute the following derivative:
\[
\frac{d}{dx}\cos\left(\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}\right)
\]
This time we are to compute the derivative of cos with a really complicated argument.
• So, to start, we apply the chain rule with $g(x) = \frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}$ being the really complicated argument and $f$ being cos. That is, $f(u) = \cos(u)$. Since $f'(u) = -\sin(u)$, the chain rule gives
\[
\frac{d}{dx}\cos\left(\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}\right) = -\sin\left(\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}\right)\frac{d}{dx}\left\{\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}\right\}.
\]
• This reduced our problem to that of computing the derivative of the really complicated argument $\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}$. We can think of the argument as being built up out of three pieces, namely $x^5$, multiplied by $\sqrt{3+x^6}$, divided by $(4+x^2)^3$, or, equivalently, multiplied by $(4+x^2)^{-3}$. So we may rewrite $\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}$ as $x^5\,(3+x^6)^{1/2}\,(4+x^2)^{-3}$, and then apply the product rule to reduce the problem to that of computing the derivatives of the three pieces.
C OMPUTING D ERIVATIVES 4.4 L OGARITHMIC DIFFERENTIATION
– differentiating $(3+x^6)^{1/2}$ to get $\frac12(3+x^6)^{-1/2}\cdot 6x^5$ is the same as multiplying $(3+x^6)^{1/2}$ by $\frac{3x^5}{3+x^6}$, and
– differentiating $(4+x^2)^{-3}$ to get $-3(4+x^2)^{-4}\cdot 2x$ is the same as multiplying $(4+x^2)^{-3}$ by $-\frac{6x}{4+x^2}$.
Using these sneaky tricks we can write our solution quite neatly:
\[
\frac{d}{dx}\cos\left(\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}\right) = -\sin\left(\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}\right)\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}\left\{\frac{5}{x} + \frac{3x^5}{3+x^6} - \frac{6x}{4+x^2}\right\}.
\]
• This method of cleaning up the derivative of a messy product is actually something more
systematic in disguise — namely logarithmic differentiation. This is our next topic.
Example 4.3.16
Learning Objectives
• Differentiate logarithmic functions.
• Use the generalized product rule to compute the derivative of products of many
functions.
The chain rule opens the way to understanding derivatives of more complicated functions. Not only compositions of known functions, as we have seen in the examples of the previous section, but also functions which are defined implicitly.
Consider the logarithm base e: $\log_e(x)$ is the power that e must be raised to in order to give x. That is, $\log_e(x)$ is defined by
\[
e^{\log_e x} = x
\]
i.e. it is the inverse of the exponential function with base e. Since this choice of base works so cleanly and easily with respect to differentiation, this base turns out to be (arguably) the most natural choice for the base of the logarithm. And as we saw in our whirlwind review of logarithms in Section 3.5, it is easy to use logarithms of one base to compute logarithms with another base:
\[
\log_q x = \frac{\log_e x}{\log_e q}
\]
So we are (relatively) free to choose a base which is convenient for our purposes.
The logarithm with base e is called the “natural logarithm”. The “naturalness” of logarithms base e is exactly that this choice of base works very nicely in calculus (and so wider mathematics) in ways that other bases do not12 . There are several different “standard” notations for the logarithm base e:
\[
\log_e x = \log x = \ln x.
\]
We recommend that you be able to recognise all of these.
In this text we will write the natural logarithm as “log” with no base. The reason for this choice is that base e is the standard choice of base for logarithms in mathematics13 . The natural logarithm inherits many properties of general logarithms14 . So, for all $x, y > 0$ the following hold:
• $e^{\log x} = x$,
• for any real number $X$, $\log e^X = X$,
• for any $a > 1$, $\log_a x = \frac{\log x}{\log a}$ and $\log x = \frac{\log_a x}{\log_a e}$,
• $\log 1 = 0$, $\log e = 1$,
• $\log(xy) = \log x + \log y$,
• $\log\frac{x}{y} = \log x - \log y$, $\log\frac{1}{y} = -\log y$,
• $\log(x^X) = X\log x$,
• $\lim_{x\to\infty}\log x = \infty$, $\lim_{x\to 0}\log x = -\infty$.
And finally we should remember that log x has domain (i.e. is defined for) $x > 0$ and range (i.e. takes all values in) $-\infty < x < \infty$.
Figure 4.4.1. [Graph of $y = \ln x$, with $x$ on the horizontal axis and $y$ on the vertical axis.]
12 The interested reader should head to Wikipedia and look up the natural logarithm.
13 In other disciplines other bases are natural; in computer science, since numbers are stored in binary it makes sense
to use the binary logarithm — i.e. base 2. While in some sciences and finance, it makes sense to use the decimal
logarithm — i.e. base 10.
14 Again take a quick look at the whirlwind review of logarithms in Section 3.5.
To compute the derivative of $\log x$ we could attempt to start with the limit definition of the derivative
\begin{align*}
\frac{d}{dx}\log x &= \lim_{h\to 0}\frac{\log(x+h) - \log(x)}{h} \\
&= \lim_{h\to 0}\frac{\log((x+h)/x)}{h} \\
&= \text{um}\dots
\end{align*}
This doesn’t look good. But all is not lost — we have the chain rule, and we know that the logarithm
satisfies the equation:
\[ x = e^{\log x} \]
Since both sides of the equation are the same function, both sides of the equation have the same derivative. i.e. we are using¹⁵
\[ \text{if } f(x) = g(x) \text{ for all } x, \text{ then } f'(x) = g'(x) \]
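So now differentiate both sides:
\[ \frac{d}{dx}x = \frac{d}{dx}e^{\log x} \]
The left-hand side is easy, and the right-hand side we can process using the chain rule with $f(u) = e^u$ and $u = \log x$.
\begin{align*}
1 &= \frac{df}{du}\cdot\frac{du}{dx} \\
&= e^u\cdot\underbrace{\frac{d}{dx}\log x}_{\text{what we want to compute}}
\end{align*}
Since $e^u = e^{\log x} = x$, this says $1 = x\cdot\frac{d}{dx}\log x$, and rearranging gives
\[ \frac{d}{dx}\log x = \frac{1}{x} \]
where $\log x$ is the logarithm base $e$.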
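The formula above can be sanity-checked numerically. The following is only a minimal sketch: the helper name `central_difference`, the step size `h` and the sample points are our own choices, not part of the text. It compares a symmetric difference quotient of the natural logarithm with $1/x$.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# math.log with one argument is the natural logarithm, matching "log" in this text.
for x in (0.5, 1.0, 2.0, 10.0):
    approx = central_difference(math.log, x)
    print(f"x = {x:5.2f}   difference quotient = {approx:.6f}   1/x = {1 / x:.6f}")
```

The two columns should agree to several decimal places. This is evidence for the formula, not a proof of it.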
15 Notice that just because the derivatives are the same, doesn’t mean the original functions are the same. Both $f(x) = x^2$ and $g(x) = x^2 + 3$ have derivative $f'(x) = g'(x) = 2x$, but $f(x) \ne g(x)$.
Example 4.4.2
Let $f(x) = \log 3x$. Find $f'(x)$.
There are two ways to approach this — we can simplify then differentiate, or differentiate and
then simplify. Neither is difficult.
Example 4.4.2
Example 4.4.3
• If $x > 0$, $|x| = x$ and so
\[ g'(x) = \frac{d}{dx}\log x = \frac{1}{x} \]
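• If $x < 0$ then $|x| = -x$. If $|h|$ is strictly smaller than $|x|$, then we also have that $x + h < 0$ and $|x+h| = -(x+h) = |x| - h$. Write $X = |x|$ and $H = -h$. Then, by the definition of the derivative,
\begin{align*}
\frac{d}{dx}\log|x| &= \lim_{h\to 0}\frac{\log|x+h| - \log|x|}{h} = \lim_{H\to 0}\frac{\log(X+H) - \log X}{-H} \\
&= -\frac{d}{dX}\log X = -\frac{1}{X} = -\frac{1}{|x|} = \frac{1}{x}
\end{align*}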
Example 4.4.4
\begin{align*}
\frac{d}{dx}x^a &= \frac{d}{dx}e^{a\log x} = e^{a\log x}\frac{d}{dx}(a\log x) && \text{by the chain rule} \\
&= \frac{a}{x}e^{a\log x} = \frac{a}{x}x^a \\
&= ax^{a-1}
\end{align*}
as expected.
Example 4.4.5
We can extend Theorem 4.4.1 to compute the derivative of logarithms of other bases in a straightforward way. Since for any positive $a \ne 1$:
\begin{align*}
\log_a x &= \frac{\log x}{\log a} = \frac{1}{\log a}\cdot\log x && \text{since $a$ is a constant} \\
\frac{d}{dx}\log_a x &= \frac{1}{\log a}\cdot\frac{1}{x}
\end{align*}
§§ Back to $\frac{d}{dx}a^x$
We can also now finally get around to computing the derivative of $a^x$ (which we started to do back in Section 3.5).
Example 4.4.6 (The derivative of $a^x$)
We show two ways to compute this derivative.
• Method 1:
\begin{align*}
f(x) &= a^x && \text{take log of both sides} \\
\log f(x) &= x\log a && \text{exponentiate both sides base $e$} \\
f(x) &= e^{x\log a} && \text{chain rule} \\
f'(x) &= e^{x\log a}\cdot\log a \\
&= a^x\cdot\log a.
\end{align*}
• Method 2:
\begin{align*}
f(x) &= a^x && \text{take log of both sides} \\
\log f(x) &= x\log a && \text{differentiate both sides} \\
\frac{d}{dx}(\log f(x)) &= \log a
\end{align*}
We then process the left-hand side using the chain rule
\begin{align*}
f'(x)\cdot\frac{1}{f(x)} &= \log a \\
f'(x) &= f(x)\cdot\log a = a^x\cdot\log a.
\end{align*}
Example 4.4.6
We will see $\frac{d}{dx}\log f(x)$ more below in the subsection on “logarithmic differentiation”.
To summarise the results above:
Corollary 4.4.7.
\begin{align*}
\frac{d}{dx}a^x &= \log a\cdot a^x && \text{for any } a > 0 \\
\frac{d}{dx}\log_a x &= \frac{1}{x\cdot\log a} && \text{for any } a > 0,\ a \ne 1
\end{align*}
where $\log x$ is the natural logarithm.
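As a rough numerical check of both formulas in the corollary, here is a short sketch. The base `a = 3.0`, the point `x = 1.7` and the helper `central_difference` are hypothetical sample choices of ours.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)

a, x = 3.0, 1.7   # hypothetical sample values with a > 0, a != 1 and x > 0

# d/dx a^x compared with log(a) * a^x
exp_numeric = central_difference(lambda t: a ** t, x)
exp_formula = math.log(a) * a ** x

# d/dx log_a(x) compared with 1 / (x * log(a)); math.log(t, a) is the logarithm base a
log_numeric = central_difference(lambda t: math.log(t, a), x)
log_formula = 1 / (x * math.log(a))

print(f"a^x:      numeric {exp_numeric:.6f}   formula {exp_formula:.6f}")
print(f"log_a x:  numeric {log_numeric:.6f}   formula {log_formula:.6f}")
```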
Recall that we need the caveat $a \ne 1$ because the logarithm base 1 is not well defined. This is because $1^x = 1$ for any $x$. We do not need a similar caveat for the derivative of the exponential because we know (recall Example 3.5.1)
\begin{align*}
\frac{d}{dx}1^x &= \frac{d}{dx}1 = 0 && \text{while the above corollary tells us} \\
&= \log 1\cdot 1^x = 0\cdot 1 = 0.
\end{align*}
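\[ P(x) = F(x)\cdot G(x)\cdot H(x) \]
Taking the logarithm of both sides gives
\[ \log P(x) = \log F(x) + \log G(x) + \log H(x). \]
Notice that the product of functions on the right-hand side has become a sum of functions. Differentiating sums is much easier than differentiating products. So when we differentiate we have
\[ \frac{d}{dx}\log P(x) = \frac{d}{dx}\log F(x) + \frac{d}{dx}\log G(x) + \frac{d}{dx}\log H(x). \]
By the chain rule, $\frac{d}{dx}\log f(x) = \frac{f'(x)}{f(x)}$, so this says $\frac{P'}{P} = \frac{F'}{F} + \frac{G'}{G} + \frac{H'}{H}$. Multiplying through by $P = FGH$ gives
\[ P'(x) = F'(x)G(x)H(x) + F(x)G'(x)H(x) + F(x)G(x)H'(x) \]
which is what we found in Example 4.1.11 by repeated application of the product rule. The above generalises quite easily to more than 3 functions.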
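To see the two routes agree on a concrete case, here is a small sketch. The three positive factors `F`, `G`, `H` are hypothetical examples we picked, with derivatives worked out by hand. It computes $P'$ once by logarithmic differentiation and once by the three-term product rule.

```python
import math

# Hypothetical positive factors and their hand-computed derivatives (not from the text).
F, dF = (lambda x: x**2 + 1), (lambda x: 2 * x)
G, dG = math.exp, math.exp
H, dH = (lambda x: 2 + math.sin(x)), math.cos

def P(x):
    return F(x) * G(x) * H(x)

def dP_log(x):
    """Logarithmic differentiation: P' = P * (F'/F + G'/G + H'/H)."""
    return P(x) * (dF(x) / F(x) + dG(x) / G(x) + dH(x) / H(x))

def dP_product(x):
    """Product rule for three factors: F'GH + FG'H + FGH'."""
    return dF(x) * G(x) * H(x) + F(x) * dG(x) * H(x) + F(x) * G(x) * dH(x)

for x in (-1.0, 0.0, 0.8, 2.5):
    print(f"x = {x:5.2f}   log-diff {dP_log(x):.10f}   product rule {dP_product(x):.10f}")
```

The factors are chosen to be strictly positive so that every logarithm is defined. The case of factors that change sign is handled with absolute values, as in the example that follows.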
Example 4.4.8
This same trick of “take a logarithm and then differentiate” — or logarithmic differentiation — will
work any time you have a product (or ratio) of functions.
Example 4.4.9
Let’s use logarithmic differentiation on the function from Example 4.1.23:
\[ f(x) = \frac{(\sqrt{x}-1)(2-x)(1-x^2)}{\sqrt{x}(3+2x)} \]
Beware however, that we may only take the logarithm of positive numbers, and this $f(x)$ is often negative. For example, if $1 < x < 2$, the factor $(1-x^2)$ in the definition of $f(x)$ is negative while all of the other factors are positive, so that $f(x) < 0$. Nonetheless, we can use logarithmic differentiation to find $f'(x)$, by exploiting the observation that $\frac{d}{dx}\log|f(x)| = \frac{f'(x)}{f(x)}$. (To see this, use the chain rule and Example 4.4.4.) So we take the logarithm of $|f(x)|$ and expand.
\begin{align*}
\log|f(x)| &= \log\frac{|\sqrt{x}-1|\,|2-x|\,|1-x^2|}{\sqrt{x}\,|3+2x|} \\
&= \log|\sqrt{x}-1| + \log|2-x| + \log|1-x^2| - \underbrace{\log(\sqrt{x})}_{=\frac12\log x} - \log|3+2x|
\end{align*}
C OMPUTING D ERIVATIVES 4.5 I MPLICIT DIFFERENTIATION
Learning Objectives
• Explain how implicit differentiation is a consequence of the Chain Rule.
• Use implicit differentiation to find slopes of tangent lines to implicitly defined curves.
Implicit differentiation is a simple trick that is used to compute derivatives of functions either
• when you don’t know an explicit formula for the function, but you know an equation that the
function obeys, or
• even when you have an explicit, but complicated, formula for the function, and the function
obeys a simple equation.
The trick is just to differentiate both sides of the equation and then solve for the derivative we
are seeking. In fact we have already done this, without using the name “implicit differentiation”,
when we found the derivative of log x in the previous section. There we knew that the function
$f(x) = \log x$ satisfied the equation $e^{f(x)} = x$ for all $x$. That is, the functions $e^{f(x)}$ and $x$ are in fact the same function and so have the same derivative. So we had
\[ \frac{d}{dx}e^{f(x)} = \frac{d}{dx}x = 1 \]
We then used the chain rule to get $\frac{d}{dx}e^{f(x)} = e^{f(x)}f'(x)$, which told us that $f'(x)$ obeys the equation
\[ e^{f(x)}f'(x) = 1. \]
• Now, to find the slope of the tangent line at $(1,-1)$, pretend that our curve is $y = f(x)$ so that $f(x)$ obeys
\[ f(x) = f(x)^3 + xf(x) + x^3 \]
Differentiating both sides, using the chain and product rules, gives
\[ f'(x) = 3f(x)^2f'(x) + f(x) + xf'(x) + 3x^2 \]
• At this point we could isolate for $f'(x)$ and write it in terms of $f(x)$ and $x$, but since we only want answers when $x = 1$, let us substitute in $x = 1$ and $f(1) = -1$ (since the curve passes through $(1,-1)$) and clean things up before doing anything else.
\[ f'(1) = 3f'(1) - 1 + f'(1) + 3 \qquad\text{and so}\qquad f'(1) = -\frac{2}{3} \]
17 In Theorem 3.3.7 we wrote the x-coordinate of the point as a. The following examples use the name x0 instead. Of
course, we could use any name we would like — a, x0 , ♥. . . etc — but the symbols that are usually chosen for this
are x0 or a.
18 This type of luck rarely happens in the “real world”. But it happens remarkably frequently in textbooks, problem
sets and tests.
\[ y = y^3 + xy + x^3 \tag{E1} \]
for all $x$. We are asked to find $y''(x)$. We cannot solve this equation to get an explicit formula for $y(x)$. So we use implicit differentiation, as we did in Example 4.5.1. That is, we apply $\frac{d}{dx}$ to both sides of (E1). This gives
\[ y'(x) = 3y(x)^2y'(x) + y(x) + xy'(x) + 3x^2 \]
Since Example 4.5.1 asked us to find the tangent line at a specific point, we substituted in some values before solving for $y'(x)$. In this example we are just finding the general derivative – not at a specific value – so there are no values to substitute in. We go directly to solving for $y'(x)$ by moving all $y'(x)$’s to the left hand side, giving
\[ \bigl[1 - x - 3y(x)^2\bigr]\,y'(x) = y(x) + 3x^2 \]
Remark 1. We have now computed $y''(x)$, sort of. The answer is in terms of $y(x)$, which we don’t know. Since we cannot get an explicit formula for $y(x)$, there’s not a great deal that we can do, in general.
Remark 2. Even though we cannot solve $y = y^3 + xy + x^3$ explicitly for $y(x)$, for general $x$, it is sometimes possible to solve equations like this for some special values of $x$. In fact, we saw in Example 4.5.1 that when $x = 1$, the given equation reduces to $y(1) = y(1)^3 + 1\cdot y(1) + 1^3$, or $y(1)^3 = -1$, which we can solve to get $y(1) = -1$. Substituting into (E2), as we did in Example 4.5.1 gives
\[ y'(1) = \frac{-1+3}{1-1-3(-1)^2} = -\frac{2}{3} \]
and substituting into (E4) gives
\[ y''(1) = \frac{6 + 2\left(-\frac23\right) + 6(-1)\left(-\frac23\right)^2}{1-1-3(-1)^2} = \frac{6-\frac43-\frac83}{-3} = -\frac{2}{3} \]
(It’s a fluke that, in this example, $y'(1)$ and $y''(1)$ happen to be equal.) So we now know that, even though we can’t solve $y = y^3 + xy + x^3$ explicitly for $y(x)$, the graph of the solution passes through $(1,-1)$ and has slope $-\frac23$ (i.e. is sloping downwards by between $30^\circ$ and $45^\circ$) there and, furthermore, the slope of the graph decreases as $x$ increases through $x = 1$.
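The arithmetic in Remark 2 can be replayed exactly with rational numbers. This is a sketch under an assumption: the formula used for $y''$ below is what we get by differentiating the first-derivative equation once more and solving for $y''$. It reproduces the numerator and denominator shown above, but the function names and the exact layout of (E4) are our guess.

```python
from fractions import Fraction

def y_prime(x, y):
    """y' = (y + 3x^2) / (1 - x - 3y^2), from solving the differentiated equation for y'."""
    return (y + 3 * x**2) / (1 - x - 3 * y**2)

def y_double_prime(x, y):
    """y'' = (6x + 2y' + 6y (y')^2) / (1 - x - 3y^2); assumed form of the second derivative."""
    yp = y_prime(x, y)
    return (6 * x + 2 * yp + 6 * y * yp**2) / (1 - x - 3 * y**2)

x0, y0 = Fraction(1), Fraction(-1)

# The point (1, -1) really is on the curve y = y^3 + x y + x^3.
print(y0 == y0**3 + x0 * y0 + x0**3)                 # True
print(y_prime(x0, y0), y_double_prime(x0, y0))       # -2/3 -2/3
```

Using `Fraction` avoids any rounding, so the output matches the hand computation digit for digit.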
[Figure: sketch in the $xy$-plane of the graph near $(1,-1)$, together with the tangent line at that point.]
Here is a sketch of the part of the graph very near $(1,-1)$. The tangent line to the graph at $(1,-1)$ is also shown. Note that the tangent line is sloping down to the right, as we expect, and that the graph lies below the tangent line near $(1,-1)$. That’s because the slope $f'(x)$ is decreasing (becoming more negative) as $x$ passes through 1.
Example 4.5.2
Warning 4.5.3.
Many people will suppress the $(x)$ in $y(x)$ when doing computations like those in Example 4.5.2. This gives shorter, easier to read formulae, like $y' = \frac{y+3x^2}{1-x-3y^2}$. If you do this, you must never forget that $y$ is a function of $x$ and is not a constant. If you do forget, you’ll make the very serious error of saying that $\frac{dy}{dx} = 0$, which is false.
Okay. The next one returns to a question involving tangent lines, and is at the same time a bit
easier (because it is a quadratic, and because we only need to take the first derivative) and a bit
harder (because we are asked for the tangent at a general point on the curve, not a specific one).
Example 4.5.4
Let $(x_0,y_0)$ be a point on the ellipse $3x^2+5y^2=7$. Find the equation for the tangent line when $x = 1$ and $y$ is positive. Then find an equation for the tangent line to the ellipse at a general point $(x_0,y_0)$.
Since we are not given a specific point $x_0$ we are going to have to be careful with the second half of this question.
• When $x = 1$ the equation of the ellipse reads
\begin{align*}
3 + 5y^2 &= 7 \\
5y^2 &= 4 \\
y &= \pm\frac{2}{\sqrt5}.
\end{align*}
We are only interested in positive $y$, so our point on the curve is $(1, 2/\sqrt5)$.
• Now we use implicit differentiation to find $\frac{dy}{dx}$ at this point. First we pretend that we have solved the curve explicitly, for some interval of $x$’s, as $y = f(x)$. The equation becomes
\begin{align*}
3x^2 + 5f(x)^2 &= 7 && \text{now differentiate} \\
6x + 10f(x)f'(x) &= 0 \\
f'(x) &= -\frac{3x}{5f(x)}.
\end{align*}
• When $x = 1$, $y = 2/\sqrt5$ this becomes
\[ f'(1) = -\frac{3}{5\cdot 2/\sqrt5} = -\frac{3}{2\sqrt5}. \]
So the tangent line passes through $(1, 2/\sqrt5)$ and has slope $-\frac{3}{2\sqrt5}$. Hence the tangent line has equation
\begin{align*}
y &= y_0 + f'(x_0)(x-x_0) \\
&= \frac{2}{\sqrt5} - \frac{3}{2\sqrt5}(x-1) \\
&= \frac{7-3x}{2\sqrt5} && \text{or equivalently} \\
3x + 2\sqrt5\,y &= 7.
\end{align*}
Now we should go back and do the same but for a general point on the curve (x0 , y0 ):
• A good first step here is to sketch the curve. Since this is an ellipse, it is pretty straight-forward.
[Figure: the ellipse $3x^2+5y^2=7$, which crosses the $x$-axis at $(-\sqrt{7/3},0)$ and $(\sqrt{7/3},0)$, with a general point $(x_0,y_0)$ marked.]
• Notice that there are two points on the ellipse, the extreme right and left points $(x_0,y_0) = (\pm\sqrt{7/3},0)$, at which the tangent line is vertical. In those two cases, the tangent line is just $x = x_0$.
• Since this is a quadratic for $y$, we could solve it explicitly to get
\[ y = \pm\sqrt{\frac{7-3x^2}{5}} \]
and choose the positive or negative branch as appropriate. Then we could differentiate to find
the slope and put things together to get the tangent line.
But even in this relatively easy case, it is computationally cleaner, and hence less vulnerable
to mechanical errors, to use implicit differentiation. So that’s what we’ll do.
• Now we could again “pretend” that we have solved the equation for the ellipse for $y = f(x)$ near $(x_0,y_0)$, but let’s not do that. Instead (as we did just before this example) just remember that when we differentiate $y$ is really a function of $x$. So starting from
\begin{align*}
3x^2+5y^2 &= 7 && \text{differentiating gives} \\
6x + 5\cdot 2y\cdot y' &= 0.
\end{align*}
We can then solve this for $y'$:
\[ y' = -\frac{3x}{5y} \]
where $y'$ and $y$ are both functions of $x$.
• Hence at the point $(x_0,y_0)$ we have
\[ y'\Big|_{(x_0,y_0)} = -\frac{3x_0}{5y_0}. \]
This is the slope of the tangent line at $(x_0,y_0)$ and so its equation is
\begin{align*}
y &= y_0 + y'\cdot(x-x_0) \\
&= y_0 - \frac{3x_0}{5y_0}(x-x_0).
\end{align*}
We can simplify this by multiplying through by $5y_0$ to get
\[ 5y_0y = 5y_0^2 - 3x_0x + 3x_0^2. \]
We can clean this up more by moving all the terms that contain $x$ or $y$ to the left-hand side and everything else to the right:
\[ 3x_0x + 5y_0y = 3x_0^2 + 5y_0^2. \]
But there is one more thing we can do: our original equation is $3x^2+5y^2=7$ for all points on the curve, so we know that $3x_0^2+5y_0^2 = 7$. This cleans up the right-hand side:
\[ 3x_0x + 5y_0y = 7. \]
• In deriving this formula for the tangent line at $(x_0,y_0)$ we have assumed that $y_0 \ne 0$. But in fact the final answer happens to also work when $y_0 = 0$ (which means $x_0 = \pm\sqrt{7/3}$), so that the tangent line is $x = x_0$.
We can also check that our answer for general $(x_0,y_0)$ reduces to our answer for $x_0 = 1$.
• When $x_0 = 1$ we worked out that $y_0 = 2/\sqrt5$.
• Plugging this into our answer above gives
\begin{align*}
3x_0x + 5y_0y &= 7 && \text{sub in $(x_0,y_0) = (1, 2/\sqrt5)$:} \\
3x + 5\frac{2}{\sqrt5}y &= 7 && \text{clean up a little} \\
3x + 2\sqrt5\,y &= 7
\end{align*}
as required.
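One way to convince ourselves that $3x_0x + 5y_0y = 7$ really is tangent to the ellipse is to watch the vertical gap between the curve and the line shrink like $h^2$ as we move a distance $h$ away from $x_0$. The sketch below does this at $x_0 = 1$ on the upper branch. The function names and the step sizes are hypothetical choices of ours.

```python
import math

def upper_branch(x):
    """Upper half of the ellipse 3x^2 + 5y^2 = 7, i.e. y = sqrt((7 - 3x^2)/5)."""
    return math.sqrt((7 - 3 * x**2) / 5)

def tangent_line_y(x, x0, y0):
    """y-value on the line 3*x0*x + 5*y0*y = 7 (needs y0 != 0)."""
    return (7 - 3 * x0 * x) / (5 * y0)

x0 = 1.0
y0 = upper_branch(x0)          # equals 2/sqrt(5)

# The line passes through (x0, y0) ...
print(tangent_line_y(x0, x0, y0) - y0)

# ... and near x0 the gap between curve and line behaves like a constant times h^2.
for h in (1e-1, 1e-2, 1e-3):
    gap = upper_branch(x0 + h) - tangent_line_y(x0 + h, x0, y0)
    print(f"h = {h:g}   gap = {gap:.3e}   gap / h^2 = {gap / h**2:.4f}")
```

A line through the point with the wrong slope would leave a gap proportional to $h$ rather than $h^2$, so the last column settling down to a constant is the signature of tangency.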
Example 4.5.4
Example 4.5.5
At which points does the curve $x^2 - xy + y^2 = 3$ cross the $x$-axis? Are the tangent lines to the curve at those points parallel?
This is a 2 part question — first the x-intercepts and then we need to examine tangent lines.
• Finding where the curve crosses the $x$-axis is straightforward. It does so when $y = 0$. This means $x$ satisfies
\[ x^2 - x\cdot 0 + 0^2 = 3 \qquad\text{so } x = \pm\sqrt3. \]
So the curve crosses the $x$-axis at two points $(\pm\sqrt3, 0)$.
• Now we need to find the tangent lines at those points. But we don’t actually need the lines, just their slopes. Again we can pretend that near one of those points the curve is $y = f(x)$. Applying $\frac{d}{dx}$ to both sides of $x^2 - xf(x) + f(x)^2 = 3$ gives
\[ 2x - f(x) - xf'(x) + 2f(x)f'(x) = 0 \]
etc etc.
• But let us stop “pretending”. Just make sure we remember that y is a function of x when we
differentiate:
\begin{align*}
x^2 - xy + y^2 &= 3 && \text{start with the curve, and differentiate} \\
2x - xy' - y + 2yy' &= 0
\end{align*}
Now substitute in the first point, $x = +\sqrt3$, $y = 0$:
\begin{align*}
2\sqrt3 - \sqrt3\,y' + 0 &= 0 \\
y' &= 2
\end{align*}
And now do the second point $x = -\sqrt3$, $y = 0$:
\begin{align*}
-2\sqrt3 + \sqrt3\,y' + 0 &= 0 \\
y' &= 2
\end{align*}
Thus the slope is the same at $x = \sqrt3$ and $x = -\sqrt3$ and the tangent lines are parallel.
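The same conclusion can be reached by first solving the differentiated equation for $y'$ in general, which gives $y' = \frac{y-2x}{2y-x}$ whenever $2y - x \ne 0$, and then evaluating at both intercepts. A minimal sketch, with the function name `slope` being our own:

```python
import math

def slope(x, y):
    """y' solved from 2x - x y' - y + 2 y y' = 0; valid when 2y - x != 0."""
    return (y - 2 * x) / (2 * y - x)

for x in (math.sqrt(3), -math.sqrt(3)):
    y = 0.0
    on_curve = math.isclose(x**2 - x * y + y**2, 3.0)
    print(f"x = {x:+.6f}   on curve: {on_curve}   slope = {slope(x, y)}")
```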
Example 4.5.5
Okay — let’s get away from curves and do something a little different.
Example 4.5.6
You are standing at the origin. At time zero a pitcher throws a ball at your head¹⁹.
Figure 4.5.1. [Right-angled triangle with angle $\theta(t)$, opposite side $r$ and hypotenuse $d - vt$.]
The position of the (centre of the) ball at time $t$ is $x(t) = d - vt$, where $d$ is the distance from your head to the pitcher’s mound and $v$ is the ball’s velocity. Your eye sees the ball filling²⁰ an angle $2\theta(t)$ with
\[ \sin\theta(t) = \frac{r}{d-vt} \]
where $r$ is the radius of the baseball. The question is “How fast is $\theta$ growing at time $t$?” That is, what is $\frac{d\theta}{dt}$?
• We don’t know (yet) how to solve this equation to find $\theta(t)$ explicitly. So we use implicit differentiation.
• To do so we apply $\frac{d}{dt}$ to both sides of our equation. This gives
\[ \cos\theta(t)\cdot\theta'(t) = \frac{rv}{(d-vt)^2} \]
• As is often the case, when using implicit differentiation, this answer is not very satisfying because it contains $\theta(t)$, for which we still do not have an explicit formula. However in this case we can get an explicit formula for $\cos\theta(t)$, without having an explicit formula for $\theta(t)$, just by looking at the right–angled triangle in Figure 4.5.1, above.
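A computer does know how to invert sine (the function `math.asin`, which is the arcsine introduced in §4.7), so we can check the implicitly differentiated equation numerically. This is a sketch only: the values of `r`, `d` and `v` below are hypothetical sample numbers, not data from the text, and the function names are ours.

```python
import math

# Hypothetical sample values in SI units (assumptions, not from the text).
r = 0.037    # radius of the ball (m)
d = 18.4     # distance from your head to the pitcher's mound (m)
v = 40.0     # speed of the ball (m/s)

def theta(t):
    """Half-angle subtended by the ball, from sin(theta) = r / (d - v t)."""
    return math.asin(r / (d - v * t))

def theta_prime_implicit(t):
    """theta'(t) read off from cos(theta) * theta' = r v / (d - v t)^2."""
    return r * v / ((d - v * t) ** 2 * math.cos(theta(t)))

def theta_prime_numeric(t, h=1e-6):
    """Symmetric difference quotient of theta, for comparison."""
    return (theta(t + h) - theta(t - h)) / (2 * h)

for t in (0.0, 0.2, 0.4):
    print(f"t = {t:.1f}   implicit {theta_prime_implicit(t):.8f}   numeric {theta_prime_numeric(t):.8f}")
```

The comparison only makes sense while $d - vt > r$, that is, before the ball arrives.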
Example 4.5.6
Okay — just one more tangent-to-the-curve example and then we’ll go on to something different.
Example 4.5.7
Let $(x_0,y_0)$ be a point on the astroid²¹
\[ x^{2/3} + y^{2/3} = 1. \]
• As was the case in examples above we can rewrite the equation of the astroid near $(x_0,y_0)$ in the form $y = f(x)$, with an explicit $f(x)$, by solving the equation $x^{2/3}+y^{2/3}=1$. But again, it is computationally cleaner, and hence less vulnerable to mechanical errors, to use implicit differentiation. So that’s what we’ll do.
• The point $(x_0,y_0)$ lies on the curve, so
\[ x_0^{2/3} + y_0^{2/3} = 1. \]
• Now, no pretending that y = f (x), this time — just make sure we remember when we
differentiate that y changes with x.
\begin{align*}
x^{2/3}+y^{2/3} &= 1 && \text{start with the curve, and differentiate} \\
\frac23x^{-1/3} + \frac23y^{-1/3}y' &= 0
\end{align*}
• Note the derivative of $x^{2/3}$, namely $\frac23x^{-1/3}$, and the derivative of $y^{2/3}$, namely $\frac23y^{-1/3}y'$, are defined only when $x \ne 0$ and $y \ne 0$. We are interested in the case that $x = x_0$ and $y = y_0$. So we had better assume that $x_0 \ne 0$ and $y_0 \ne 0$. Probably something weird happens when $x_0 = 0$ or $y_0 = 0$. We’ll come back to this shortly.
21 Here is where the astroid comes from. Imagine two circles, one of radius 1/4 and one of radius 1. Paint a red dot on the smaller circle. Then imagine the smaller circle rolling around the inside of the larger circle. The curve traced by the red dot is our astroid. Search “astroid” (be careful about the spelling) to find animations showing this. The astroid was first discussed by Johann Bernoulli in 1691–92. It also appears in the work of Leibniz.
• To continue on, we set $x = x_0$, $y = y_0$ in the equation above, and then solve for $y'$:
\[ \frac23x_0^{-1/3} + \frac23y_0^{-1/3}y'(x_0) = 0 \implies y'(x_0) = -\left(\frac{y_0}{x_0}\right)^{1/3} \]
This is the slope of the tangent line and its equation is
\[ y = y_0 + y'(x_0)(x-x_0) = y_0 - \left(\frac{y_0}{x_0}\right)^{1/3}(x-x_0) \]
Now let’s think a little bit about what the tangent line slope of $-\sqrt[3]{y_0/x_0}$ tells us about the astroid.
• First, as a preliminary observation, note that since $x_0^{2/3} \ge 0$ and $y_0^{2/3} \ge 0$ the equation $x_0^{2/3} + y_0^{2/3} = 1$ forces $0 \le x_0^{2/3}, y_0^{2/3} \le 1$, and hence $-1 \le x_0, y_0 \le 1$.
• For all $x_0, y_0 > 0$ the slope $-\sqrt[3]{y_0/x_0} < 0$. So at all points on the astroid that are in the first quadrant, the tangent line has negative slope, i.e. is “leaning backwards”.
• As $x_0$ tends to zero, $y_0$ tends to $\pm1$ and the tangent line slope tends to infinity. So at points on the astroid near $(0,\pm1)$, the tangent line is almost vertical.
• As $y_0$ tends to zero, $x_0$ tends to $\pm1$ and the tangent line slope tends to zero. So at points on the astroid near $(\pm1,0)$, the tangent line is almost horizontal.
Here is a figure illustrating all this.
[Figure: the astroid $x^{2/3}+y^{2/3}=1$, with a point $(x_0,y_0)$ marked.]
Sure enough, as we speculated earlier, something weird does happen to the astroid when x0 or y0 is
zero. The astroid is pointy, and does not have a tangent there.
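The slope formula and its limiting behaviour can be explored numerically on the part of the astroid in the first quadrant, where the curve is the graph of $y = (1 - x^{2/3})^{3/2}$. The following sketch compares the implicit-differentiation slope with a difference quotient. The function names and sample points are hypothetical choices of ours.

```python
import math

def astroid_upper(x):
    """First-quadrant branch y = (1 - x^(2/3))^(3/2) of the astroid, for 0 <= x <= 1."""
    return (1 - x ** (2 / 3)) ** 1.5

def slope_formula(x0, y0):
    """Tangent slope -(y0/x0)^(1/3) from implicit differentiation (x0, y0 > 0)."""
    return -((y0 / x0) ** (1 / 3))

def slope_numeric(x0, h=1e-6):
    """Symmetric difference quotient of the explicit branch."""
    return (astroid_upper(x0 + h) - astroid_upper(x0 - h)) / (2 * h)

# Near x0 = 0 the slope is large in size (almost vertical tangent);
# near x0 = 1 it is close to zero (almost horizontal tangent).
for x0 in (0.001, 0.125, 0.5, 0.999):
    y0 = astroid_upper(x0)
    print(f"x0 = {x0:5.3f}   formula {slope_formula(x0, y0):9.5f}   numeric {slope_numeric(x0):9.5f}")
```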
Example 4.5.7
C OMPUTING D ERIVATIVES 4.6 I NVERSE FUNCTIONS
\[ f(Y) = X. \tag{4.6.1} \]
If we’re lucky, then for each real number $X$ there is exactly one real number $Y$, that we’ll call $f^{-1}(X)$, obeying (4.6.1). Then $f^{-1}$ is called the inverse function of $f$. A (trivial) example in which this happens is given in Example 4.6.1, below.
If we’re a little less lucky, there is a set of real numbers $D$ (that does not contain all of $\mathbb{R}$) such that
• for each real number $X$ in $D$ there is exactly one real number $Y$, that we’ll again call $f^{-1}(X)$, obeying (4.6.1) but
• for each real number $X$ that is not in $D$ there is no $Y$ obeying (4.6.1).
Then $f^{-1}$ is again called the inverse function of $f$ and $D$ is called the domain of $f^{-1}$. We have already seen an example of this, namely $f(x) = e^x$. We’ll review this example in Example 4.6.2, below.
If we’re yet a little less lucky, there is at least one real number $X$ for which there is more than one real number $Y$ obeying (4.6.1). The trigonometric functions are like this. We’ll take a first quick look at this in Example 4.6.3, below and take a more thorough look in the next section, §4.7, below.
Example 4.6.1
Let $f(x) = 2x$. For this $f(x)$, equation (4.6.1) becomes
\[ 2Y = X \]
For each real number $X$, there is exactly one $Y$, namely $Y = \frac{X}{2}$, that obeys $2Y = X$. So, the function $f(x) = 2x$ has inverse function $f^{-1}(X) = \frac{X}{2}$.
Example 4.6.1
Example 4.6.2
Let $f(x) = e^x$. For this $f(x)$, equation (4.6.1) becomes
\[ e^Y = X \]
For concreteness, let’s pick a specific value of $X$, say $X = 2$. The graph of $e^Y$, as a function of $Y$, is sketched below. In that sketch, the $x$-axis has been renamed the $Y$-axis, because we are interested in $e^Y$ as a function of $Y$. (Be careful to distinguish the upper case $Y$ from the lower case $y$.)
[Figure: the graph $y = e^Y$, with horizontal axis $x = Y$, together with the horizontal lines $y = 2$ and $y = -2$.]
The number of $Y$’s obeying $e^Y = 2$ is exactly the number of times the horizontal straight line $y = 2$ intersects the graph $y = e^Y$, which is one. So for $X = 2$, there is exactly one $Y$ obeying $e^Y = X$. On the other hand, for $X = -2$, the number of $Y$’s obeying $e^Y = -2$ is exactly the number of times the horizontal straight line $y = -2$ intersects the graph $y = e^Y$, which is zero. So for $X = -2$, no $Y$’s obey $e^Y = X$.
As $Y$ runs from $-\infty$ to $+\infty$, $e^Y$ takes each strictly positive value exactly once and never takes any value zero or smaller. So the domain of $\ln x$, the inverse function of $e^x$, is exactly the interval $(0,\infty)$.
Example 4.6.2
Example 4.6.3
Let $f(x) = \sin(x)$. For this $f(x)$, equation (4.6.1) becomes
\[ \sin(Y) = X \]
For each fixed real number $X$, the number of $Y$’s that obey $\sin(Y) = X$, is exactly the number of times the horizontal straight line $y = X$ intersects the graph $y = \sin(Y)$. When $-1 \le X \le 1$, the line $y = X$ intersects the graph $y = \sin(Y)$ infinitely many times. This is illustrated in the figure below by the line $y = 0.3$. On the other hand, when $X < -1$ or $X > 1$, the line $y = X$ never intersects the graph $y = \sin(Y)$. This is illustrated in the figure below by the line $y = -1.2$. We’ll see what is normally done about this in §4.7, below.
[Figure: the graph $y = \sin(x)$ together with the horizontal lines $y = 0.3$ and $y = -1.2$.]
Example 4.6.3
It is an easy matter to construct the graph of an inverse function from the graph of the original function. We just need to remember that
\[ Y = f^{-1}(X) \iff f(Y) = X \]
[Figure: the graph $y = f(x)$.]
Now replace each $x$ by $Y$ and each $y$ by $X$ and replace the resulting label $X = f(Y)$ on the curve by the equivalent $Y = f^{-1}(X)$.
[Figure: the same graph, relabelled $Y = f^{-1}(X)$.]
Finally we just need to redraw the sketch with the $Y$ axis running vertically (with $Y$ increasing upwards) and the $X$ axis running horizontally (with $X$ increasing to the right). To do so, pretend that the sketch was on a transparency or on a very thin piece of paper that you can see through. Lift the sketch up and flip it over so that the $Y$ axis runs vertically and the $X$ axis runs horizontally. If you want, you can also convert the upper case $X$ into a lower case $x$ and the upper case $Y$ into a lower case $y$.
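[Figure: the flipped sketch, shown first with axes $X$, $Y$ and label $Y = f^{-1}(X)$, then with axes $x$, $y$ and label $y = f^{-1}(x)$.]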
\[ \frac{d}{dx}f\bigl(f^{-1}(x)\bigr) = \frac{d}{dx}x = 1 \]
By the chain rule
\[ f'\bigl(f^{-1}(x)\bigr)\cdot\frac{d}{dx}f^{-1}(x) = 1 \implies \frac{d}{dx}f^{-1}(x) = \frac{1}{f'\bigl(f^{-1}(x)\bigr)} \tag{4.6.2} \]
Example 4.6.4
The inverse function of $f(x) = e^x$ is $f^{-1}(x) = \log x$. Since $f'(x) = e^x$, (4.6.2) gives
\[ \frac{d}{dx}\log x = \frac{1}{e^{\log x}} = \frac{1}{x}. \]
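Formula (4.6.2) is useful even when the inverse function has no convenient formula. As an illustration, the sketch below takes the hypothetical example $f(x) = x^3 + x$ (our choice, not from the text; it is strictly increasing, so it has an inverse defined for all real numbers), computes $f^{-1}$ by bisection, and compares (4.6.2) with a difference quotient of $f^{-1}$.

```python
def f(x):
    return x**3 + x          # hypothetical strictly increasing example

def f_prime(x):
    return 3 * x**2 + 1

def f_inverse(X, lo=-100.0, hi=100.0):
    """Solve f(Y) = X for Y by bisection; assumes f is increasing and lo <= Y <= hi."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < X:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def inverse_derivative_formula(x):
    """Right-hand side of (4.6.2): 1 / f'(f^{-1}(x))."""
    return 1 / f_prime(f_inverse(x))

def inverse_derivative_numeric(x, h=1e-5):
    """Symmetric difference quotient of the numerically computed inverse."""
    return (f_inverse(x + h) - f_inverse(x - h)) / (2 * h)

for x in (-2.0, 0.0, 10.0):
    print(f"x = {x:5.1f}   (4.6.2) gives {inverse_derivative_formula(x):.8f}   "
          f"difference quotient {inverse_derivative_numeric(x):.8f}")
```

For instance $f(2) = 10$, so $f^{-1}(10) = 2$ and (4.6.2) predicts the slope $1/f'(2) = 1/13$ there.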
Example 4.6.4
22 There is a theorem called the Inverse Function Theorem, which we will not prove, that says that, under reasonable hypotheses on $f(x)$, $f^{-1}(x)$ is differentiable.
C OMPUTING D ERIVATIVES 4.7 I NVERSE TRIG FUNCTIONS AND THEIR DERIVATIVES
Learning Objectives
• Sketch f (x) = arctan x.
• Evaluate (at nice points) the inverse trigonometric functions arcsin(x), arccos(x) and
arctan(x).
• Use implicit differentiation / chain rule to find the derivatives of the inverse trigono-
metric functions arcsin(x), arccos(x) and arctan(x).
We are now going to consider the problem of finding the derivatives of the inverses of trigonometric functions. Most importantly, remind yourself that: given a function $f(x)$, its inverse function $f^{-1}(x)$ only exists, with domain $D$, when $f(x)$ passes the “horizontal line test”, which says that for each $Y$ in $D$ the horizontal line $y = Y$ intersects the graph $y = f(x)$ exactly once. (That is, $f(x)$ is a one-to-one function.)
Let us start by playing with the sine function and determine how to restrict the domain of sin x
so that its inverse function exists.
Example 4.7.1
Let $y = f(x) = \sin(x)$. We would like to find the inverse function which takes $y$ and returns to us a unique $x$-value so that $\sin(x) = y$.
[Figure: the graph $y = \sin(x)$ together with the horizontal lines $y = 0.3$ and $y = -1.2$.]
• For each real number $Y$, the number of $x$-values that obey $\sin(x) = Y$, is exactly the number of times the horizontal straight line $y = Y$ intersects the graph of $\sin(x)$.
• When $-1 \le Y \le 1$, the horizontal line intersects the graph infinitely many times. This is illustrated in the figure above by the line $y = 0.3$.
• On the other hand, when $Y < -1$ or $Y > 1$, the line $y = Y$ never intersects the graph of $\sin(x)$. This is illustrated in the figure above by the line $y = -1.2$.
This is exactly the horizontal line test and it shows that the sine function is not one-to-one.
Now consider the function
\[ y = \sin(x) \qquad\text{with domain } -\frac{\pi}{2} \le x \le \frac{\pi}{2} \]
This function has the same formula but the domain has been restricted so that, as we’ll now show,
the horizontal line test is satisfied.
[Figure: the graph $y = \sin(x)$ with $-\frac{\pi}{2}$ and $\frac{\pi}{2}$ marked on the $x$-axis, together with the horizontal lines $y = 0.3$ and $y = -1.2$.]
As we saw above when $|Y| > 1$ no $x$ obeys $\sin(x) = Y$ and, for each $-1 \le Y \le 1$, the line $y = Y$ (illustrated in the figure above with $y = 0.3$) crosses the curve $y = \sin(x)$ infinitely many times, so that there are infinitely many $x$’s that obey $f(x) = \sin x = Y$. However exactly one of those crossings (the dot in the figure) has $-\pi/2 \le x \le \pi/2$.
That is, for each $-1 \le Y \le 1$, there is exactly one $x$, call it $X$, that obeys both
\[ \sin X = Y \qquad\text{and}\qquad -\frac{\pi}{2} \le X \le \frac{\pi}{2} \]
That unique value, $X$, is typically denoted $\arcsin(Y)$. That is
\[ \sin(\arcsin(Y)) = Y \qquad\text{and}\qquad -\frac{\pi}{2} \le \arcsin(Y) \le \frac{\pi}{2} \]
Renaming $Y \to x$, the inverse function $\arcsin(x)$ is defined for all $-1 \le x \le 1$ and is determined by the equation
\[ \sin\bigl(\arcsin(x)\bigr) = x \qquad\text{and}\qquad -\frac{\pi}{2} \le \arcsin(x) \le \frac{\pi}{2}. \tag{4.7.1} \]
Note that many texts will use $\sin^{-1}(x)$ to denote arcsine, however we will use $\arcsin(x)$ since we feel that it is clearer²³; the reader should recognise both.
Example 4.7.1
Example 4.7.2
Since
\[ \sin\frac{\pi}{2} = 1 \qquad \sin\frac{\pi}{6} = \frac12 \]
and $-\pi/2 \le \pi/6,\ \pi/2 \le \pi/2$, we have
\[ \arcsin 1 = \frac{\pi}{2} \qquad \arcsin\frac12 = \frac{\pi}{6} \]
23 The main reason being that people frequently confuse $\sin^{-1}(x)$ with $(\sin(x))^{-1} = \frac{1}{\sin x}$. We feel that prepending the prefix “arc” is less likely to lead to such confusion. The notations asin(x) and Arcsin(x) are also used.
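Even though
\[ \sin(2\pi) = 0 \]
it is not true that $\arcsin 0 = 2\pi$, and it is not true that $\arcsin\bigl(\sin(2\pi)\bigr) = 2\pi$, because $2\pi$ is not between $-\pi/2$ and $\pi/2$. More generally
\begin{align*}
\arcsin\bigl(\sin(x)\bigr) &= \text{the unique angle $\theta$ between $-\pi/2$ and $\pi/2$ obeying $\sin\theta = \sin x$} \\
&= x \text{ if and only if } -\pi/2 \le x \le \pi/2
\end{align*}
So, for example, $\arcsin\bigl(\sin(11\pi/16)\bigr)$ cannot be $11\pi/16$ because $11\pi/16$ is bigger than $\pi/2$. So how do we find the correct answer? Start by sketching the graph of $\sin(x)$.
[Figure: the graph $y = \sin(x)$ with the horizontal line $y = \sin(11\pi/16)$; the points $x = \frac{5\pi}{16}$, $\frac{\pi}{2}$ and $\frac{11\pi}{16}$ are marked, with $\frac{5\pi}{16}$ and $\frac{11\pi}{16}$ each a distance $\frac{3\pi}{16}$ from $\frac{\pi}{2}$.]
It looks like the graph of $\sin x$ is symmetric about $x = \pi/2$. The mathematical way to say that “the graph of $\sin x$ is symmetric about $x = \pi/2$” is “$\sin(\pi/2 - \theta) = \sin(\pi/2 + \theta)$” for all $\theta$. That is indeed true²⁴.
Now $11\pi/16 = \pi/2 + 3\pi/16$ so
\[ \sin\frac{11\pi}{16} = \sin\left(\frac{\pi}{2} + \frac{3\pi}{16}\right) = \sin\left(\frac{\pi}{2} - \frac{3\pi}{16}\right) = \sin\frac{5\pi}{16} \]
and, since $5\pi/16$ is indeed between $-\pi/2$ and $\pi/2$,
\[ \arcsin\left(\sin\frac{11\pi}{16}\right) = \frac{5\pi}{16} \qquad\text{and not } \frac{11\pi}{16}. \]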
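Most programming languages follow the same convention for arcsine. As a quick illustration (a sketch using Python's standard `math.asin`, which returns values between $-\pi/2$ and $\pi/2$), we can watch $\arcsin(\sin x)$ fail to return $x$ when $x$ is outside that interval:

```python
import math

x = 11 * math.pi / 16
theta = math.asin(math.sin(x))

print(f"x                = {x:.6f}")
print(f"arcsin(sin(x))   = {theta:.6f}")
print(f"5*pi/16          = {5 * math.pi / 16:.6f}")

# The same thing happens at 2*pi: the result is (up to rounding) 0, not 2*pi.
print(f"arcsin(sin(2pi)) = {math.asin(math.sin(2 * math.pi)):.6f}")
```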
Example 4.7.2
arcsin(x) = θ (x),
so that the derivative we are seeking is $\frac{d\theta}{dx}$. The above equation is (after taking sine of both sides) equivalent to
\[ \sin(\theta) = x \]
Now differentiate this using implicit differentiation (we just have to remember that $\theta$ varies with $x$ and use the chain rule carefully):
\begin{align*}
\cos(\theta)\cdot\frac{d\theta}{dx} &= 1 \\
\frac{d\theta}{dx} &= \frac{1}{\cos(\theta)} && \text{substitute $\theta = \arcsin x$} \\
\frac{d}{dx}\arcsin x &= \frac{1}{\cos(\arcsin x)}
\end{align*}
This doesn’t look too bad, but it’s not really very satisfying because the right hand side is expressed in terms of $\arcsin(x)$ and we do not have an explicit formula for $\arcsin(x)$.
However even without an explicit formula for $\arcsin(x)$, it is a simple matter to get an explicit formula for $\cos\bigl(\arcsin(x)\bigr)$, which is all we need. Just draw a right–angled triangle with one angle being $\arcsin(x)$. This is done in the figure below²⁵.
[Figure: right-angled triangle with angle $\theta$, opposite side $x$, hypotenuse $1$ and adjacent side $\sqrt{1-x^2}$.]
Since $\sin(\theta) = x$ (see (4.7.1)), we have made the side opposite the angle $\theta$ of length $x$ and the hypotenuse of length 1. Then, by Pythagoras, the side adjacent to $\theta$ has length $\sqrt{1-x^2}$ and so
\[ \cos\bigl(\arcsin(x)\bigr) = \cos(\theta) = \sqrt{1-x^2} \]
and hence
\[ \frac{d}{dx}\arcsin(x) = \frac{1}{\sqrt{1-x^2}} \]
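Both the triangle identity and the final derivative can be spot-checked numerically. In this sketch `central_difference` is a hypothetical helper of ours and the sample points are arbitrary values strictly between $-1$ and $1$.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-0.9, -0.3, 0.0, 0.5, 0.8):
    adjacent = math.sqrt(1 - x**2)              # side adjacent to theta, by Pythagoras
    cos_of_arcsin = math.cos(math.asin(x))      # should match the adjacent side
    numeric = central_difference(math.asin, x)  # should match 1 / adjacent
    print(f"x = {x:5.2f}   cos(arcsin x) = {cos_of_arcsin:.6f}   sqrt(1-x^2) = {adjacent:.6f}   "
          f"d/dx arcsin = {numeric:.6f}   1/sqrt(1-x^2) = {1 / adjacent:.6f}")
```

The check is restricted to $|x| < 1$ because the derivative blows up as $x \to \pm1$, where the difference quotient would also step outside the domain of arcsine.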
The definitions for arccos, arctan and arccot are developed in the same way. Here are the graphs
that are used.
25 The figure is drawn for the case that $0 \le \arcsin(x) \le \pi/2$. Virtually the same argument works for the case $-\pi/2 \le \arcsin(x) \le 0$.
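[Figure: the graph $y = \cos(x)$ with $\pi$ marked on the $x$-axis, together with the horizontal lines $y = 0.3$ and $y = -1.2$.]
[Figure: the graph $y = \tan(x)$ with $-\frac{\pi}{2}$ and $\frac{\pi}{2}$ marked on the $x$-axis, together with the horizontal line $y = 0.8$.]
[Figure: the graph $y = \cot(x)$ with $\frac{\pi}{2}$ and $\pi$ marked on the $x$-axis, together with the horizontal line $y = 0.8$.]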
The definitions for the remaining two inverse trigonometric functions may also be developed in the same way²⁶ ²⁷. But it’s a little easier to use
\[ \csc x = \frac{1}{\sin x} \qquad \sec x = \frac{1}{\cos x} \]
26 In fact, there are two different widely used definitions of $\operatorname{arcsec} x$. Under our definition, below, $\theta = \operatorname{arcsec} x$ takes values in $0 \le \theta \le \pi$. Some people, perfectly legitimately, define $\theta = \operatorname{arcsec} x$ to take values in the union of $0 \le \theta < \frac{\pi}{2}$ and $\pi \le \theta < \frac{3\pi}{2}$. Our definition is sometimes called the “trigonometry friendly” definition. The definition itself has the advantage of simplicity. The other definition is sometimes called the “calculus friendly” definition. It eliminates some absolute values and hence simplifies some computations. Similarly, there are two different widely used definitions of $\operatorname{arccsc} x$.
27 One could also define $\operatorname{arccot}(x) = \arctan(1/x)$ with $\operatorname{arccot}(0) = \frac{\pi}{2}$. We have chosen not to do so, because the definition we have chosen is both continuous and standard.
Definition 4.7.3.
Example 4.7.4
To find the derivative of arccos we can follow the same steps:
• Write $\arccos(x) = \theta(x)$ so that $\cos\theta = x$ and the desired derivative is $\frac{d\theta}{dx}$.
[Figure: right-angled triangle with angle $\theta$, adjacent side $x$, hypotenuse $1$ and opposite side $\sqrt{1-x^2}$.]
• Thus
\[ \frac{d}{dx}\arccos x = -\frac{1}{\sqrt{1-x^2}}. \]
Example 4.7.4
Example 4.7.5
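Very similar steps give the derivative of $\arctan x$:
• Start with $\theta = \arctan x$, so $\tan\theta = x$.
• Differentiate implicitly:
\begin{align*}
\sec^2\theta\,\frac{d\theta}{dx} &= 1 \\
\frac{d\theta}{dx} &= \frac{1}{\sec^2\theta} = \cos^2\theta \\
\frac{d}{dx}\arctan x &= \cos^2(\arctan x).
\end{align*}
• To simplify this expression, we draw the relevant triangle
[Figure: right-angled triangle with angle $\theta$, opposite side $x$, adjacent side $1$ and hypotenuse $\sqrt{1+x^2}$.]
• Thus
\[ \frac{d}{dx}\arctan x = \frac{1}{1+x^2}. \]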
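The two forms of the answer, $\cos^2(\arctan x)$ and $\frac{1}{1+x^2}$, can be compared with each other and with a difference quotient. A brief sketch, in which the helper `central_difference` and the sample points are our own choices:

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-2.0, 0.0, 0.7, 5.0):
    numeric = central_difference(math.atan, x)
    via_cosine = math.cos(math.atan(x)) ** 2     # answer before simplifying
    simplified = 1 / (1 + x**2)                  # answer after using the triangle
    print(f"x = {x:5.2f}   numeric {numeric:.6f}   cos^2(arctan x) {via_cosine:.6f}   "
          f"1/(1+x^2) {simplified:.6f}")
```

Unlike arcsine, arctangent is defined for every real $x$, so there is no restriction on the sample points.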
[Figure: right-angled triangle with angle $\theta$ and sides labelled $1$, $x$ and $\sqrt{1+x^2}$.]
Example 4.7.5
Example 4.7.6
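To find the derivative of arccsc we can use its definition and the chain rule.
\begin{align*}
\theta &= \operatorname{arccsc} x && \text{take cosecant of both sides} \\
\csc\theta &= x && \text{but $\csc\theta = \frac{1}{\sin\theta}$, so flip both sides} \\
\sin\theta &= \frac{1}{x} && \text{now take arcsine of both sides} \\
\theta &= \arcsin\left(\frac{1}{x}\right)
\end{align*}
Now just differentiate:
\begin{align*}
\frac{d\theta}{dx} &= \frac{d}{dx}\arcsin\left(\frac{1}{x}\right) && \text{chain rule carefully} \\
&= \frac{1}{\sqrt{1-x^{-2}}}\cdot\frac{-1}{x^2}
\end{align*}
To simplify further we will factor $x^{-2}$ out of the square root. We need to be a little careful doing that. Take another look at examples 2.1.32 and 2.1.33 and the discussion between them before proceeding.
\begin{align*}
\frac{d\theta}{dx} &= \frac{1}{\sqrt{x^{-2}(x^2-1)}}\cdot\frac{-1}{x^2} \\
&= \frac{1}{|x^{-1}|\cdot\sqrt{x^2-1}}\cdot\frac{-1}{x^2} && \text{note that $x^2\cdot|x^{-1}| = |x|$.} \\
&= -\frac{1}{|x|\sqrt{x^2-1}}
\end{align*}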
In the same way, we can find the derivative of the remaining inverse trig function. We just use its
definition, a derivative we already know and the chain rule.
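\[ \frac{d}{dx}\operatorname{arcsec}(x) = \frac{d}{dx}\arccos\left(\frac{1}{x}\right) = -\frac{1}{\sqrt{1-1/x^2}}\cdot\left(-\frac{1}{x^2}\right) = \frac{1}{|x|\sqrt{x^2-1}} \]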
Example 4.7.6
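As a sanity check on the four derivative formulas obtained in Examples 4.7.4 to 4.7.6, one can compare each formula against a numerical slope. The sketch below is not part of the text: the helper names, the sample points and the step size are all illustrative assumptions, and arccsc and arcsec are evaluated through the identities arccsc x = arcsin(1/x) and arcsec x = arccos(1/x) used above.

```python
import math

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# (name, function, derivative formula from the examples, sample points in the domain)
CASES = [
    ("arccos", math.acos, lambda x: -1 / math.sqrt(1 - x**2), [-0.5, 0.3]),
    ("arctan", math.atan, lambda x: 1 / (1 + x**2), [-2.0, 0.0, 3.0]),
    ("arccsc", lambda x: math.asin(1 / x),
     lambda x: -1 / (abs(x) * math.sqrt(x**2 - 1)), [-2.0, 1.5]),
    ("arcsec", lambda x: math.acos(1 / x),
     lambda x: 1 / (abs(x) * math.sqrt(x**2 - 1)), [-2.0, 1.5]),
]

def max_errors():
    """Largest gap between the numerical slope and the formula, per function."""
    return {
        name: max(abs(central_difference(f, x) - df(x)) for x in points)
        for name, f, df, points in CASES
    }

if __name__ == "__main__":
    for name, err in max_errors().items():
        print(f"{name}: max error {err:.2e}")
```

The negative sample points are included on purpose: they are where the absolute value |x| in the arccsc and arcsec formulas matters.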
By way of summary, we have
Theorem 4.7.7.
Applications of Differentiation
In Section 3.3.2 we defined the derivative at $x = a$, $f'(a)$, of an abstract function $f(x)$, to be its instantaneous rate of change at $x = a$:
\[ f'(a) = \lim_{x\to a}\frac{f(x)-f(a)}{x-a} \]
This abstract definition, and the whole theory that we have developed to deal with it, turns out to be extremely useful simply because “instantaneous rate of change” appears in a huge number of settings. Here are a few examples.
• If you are moving along a line and $x(t)$ is your position on the line at time $t$, then your rate of change of position, $x'(t)$, is your velocity. If, instead, $v(t)$ is your velocity at time $t$, then your rate of change of velocity, $v'(t)$, is your acceleration.
• If $P(t)$ is the size of some population (say the number of humans on the earth) at time $t$, then $P'(t)$ is the rate at which the size of that population is changing. It is called the net birth rate.
• Radiocarbon dating, a procedure used to determine the age of, for example, archaeological materials, is based on an understanding of the rate at which an unstable isotope of carbon decays.
• A capacitor is an electrical component that is used to repeatedly store and release electrical charge (say electrons) in an electronic circuit. If $Q(t)$ is the charge on a capacitor at time $t$, then $Q'(t)$ is the instantaneous rate at which charge is flowing into the capacitor. That’s called the current. The standard unit of charge is the coulomb. One coulomb is the magnitude of the charge of approximately $6.241\times10^{18}$ electrons. The standard unit for current is the amp. One amp represents one coulomb per second.
Chapter 5
RELATED RATES
Learning Objectives
• Implement a sequence of steps to solve related rates problems.
1 Related rate problems are problems in which you are given the rate of change of one quantity and are to determine
the rate of change of another, related, quantity.
• Typically a little geometry (or some physics or. . . ) will allow you to relate these quantities
(above it was the formula that links the volume of a sphere to its radius).
• Implicit differentiation will then allow you to link the rate of change of one quantity to another.
Another balloon example
Example 5.0.1
Consider a helium balloon rising vertically from a fixed point 200m away from you. You are trying
to work out how fast it is rising. Now — computing the velocity directly is difficult, but you can
measure angles. You observe that when it is at an angle of π/4 its angle is changing by 0.05 radians
per second.
• Denote the angle by θ (in radians), the height of the balloon (in m) by $h$ and time (in seconds) by $t$. Then trigonometry tells us
\[ h = 200\cdot\tan\theta \]
Example 5.0.1
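The relation h = 200 · tan θ can be turned into a rate by differentiating with respect to time, which gives dh/dt = 200 · sec²θ · dθ/dt. The following is a minimal sketch of that step using the observed values from the problem statement (θ = π/4 and dθ/dt = 0.05 radians per second); the function name and argument names are illustrative assumptions, not notation from the text.

```python
import math

def balloon_speed(distance_m, theta, dtheta_dt):
    """Rate of change of height from h = distance * tan(theta).

    Differentiating in time gives dh/dt = distance * sec(theta)**2 * dtheta/dt.
    """
    return distance_m * dtheta_dt / math.cos(theta) ** 2

if __name__ == "__main__":
    speed = balloon_speed(200.0, math.pi / 4, 0.05)
    print(f"dh/dt is approximately {speed:.3f} m/s")
```

Since sec²(π/4) = 2, the computed speed should come out at about 20 m/s.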
• So now define x(t ) to be the distance between the bottom of the ladder and the wall, at time t,
and let y(t ) be the distance between the top of the ladder and the ground at time t. Measure
time in seconds, but both distances in meters.
\[ x^2 + y^2 = 5^2 \]
Example 5.0.2
The next example is complicated by the rates of change being stated not just as “the rate of change per unit time” but instead being stated as “the percentage rate of change per unit time”. If a quantity $f$ is changing with rate $\frac{df}{dt}$, then we can say that
\[ f \text{ is changing at a rate of } 100\cdot\frac{\frac{df}{dt}}{f} \text{ percent.} \]
Thus if, at time $t$, $f$ has rate of change $r\%$, then
\[ 100\,\frac{f'(t)}{f(t)} = r \implies f'(t) = \frac{r}{100}\,f(t) \]
so that if $h$ is a very small time increment
\[ \frac{f(t+h)-f(t)}{h} \approx \frac{r}{100}\,f(t) \implies f(t+h) \approx f(t) + \frac{rh}{100}\,f(t) \]
That is, over a very small time interval $h$, $f$ increases by the fraction $\frac{rh}{100}$ of its value at time $t$.
So armed with this, let’s look at the problem.
Example 5.0.3
2 Since the ladder isn’t buried in the ground, we can discard the solution $y = -4$.
The quantities $P$, $Q$ and $R$ are functions of time and are related by the equation $R = PQ$. Assume that $P$ is increasing instantaneously at the rate of 8% per year (meaning that $100\frac{P'}{P} = 8$) and that $Q$ is decreasing instantaneously at the rate of 2% per year (meaning that $100\frac{Q'}{Q} = -2$). Determine the percentage rate of change for $R$.
Solution. This one is a little different — we are given the variables and the formula, so no picture
drawing or defining required. Though we do need to define a time variable — let t denote time in
years.
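\begin{align*}
\frac{dR}{dt} &= PQ' + QP'\\
100\,\frac{R'}{R} &= 100\,\frac{PQ'+QP'}{R} && \text{but } R = PQ\text{, so rewrite it as}\\
&= 100\,\frac{PQ'+QP'}{PQ}\\
&= 100\,\frac{PQ'}{PQ} + 100\,\frac{QP'}{PQ}\\
&= 100\,\frac{Q'}{Q} + 100\,\frac{P'}{P}
\end{align*}
so we have stated the instantaneous percentage rate of change in $R$ as the sum of the percentage rates of change in $P$ and $Q$.
\[ 100\,\frac{R'}{R} = -2 + 8 = 6 \]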
That is, the instantaneous percentage rate of change of R is 6% per year.
Example 5.0.3
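The additivity of percentage rates for a product can be checked numerically. The sketch below assumes, purely for illustration, that P and Q are exponentials with the stated rates (the starting values 3.0 and 5.0 are arbitrary choices, not data from the example) and estimates 100 · R′/R with a difference quotient.

```python
import math

def percent_rate(f, t, h=1e-6):
    """Estimate 100 * f'(t) / f(t) using a symmetric difference quotient."""
    slope = (f(t + h) - f(t - h)) / (2 * h)
    return 100 * slope / f(t)

# Hypothetical quantities: P grows at 8% per year, Q shrinks at 2% per year.
def P(t):
    return 3.0 * math.exp(0.08 * t)

def Q(t):
    return 5.0 * math.exp(-0.02 * t)

def R(t):
    return P(t) * Q(t)

if __name__ == "__main__":
    for name, f in (("P", P), ("Q", Q), ("R", R)):
        print(name, round(percent_rate(f, 1.0), 4))
```

The estimate for R should agree with the sum of the estimates for P and Q, in line with the computation in the example.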
• First a diagram; the one below is perhaps a bit over the top.
• Let’s call s(t ) the distance from the shadow to the point on the ground directly underneath the
ball.
• By similar triangles we see that
\[ \frac{4.9t^2}{10} = \frac{49-4.9t^2}{s(t)} \]
We can then solve for $s(t)$ by just multiplying both sides by $\frac{10}{4.9t^2}\,s(t)$. This gives
\[ s(t) = 10\,\frac{49-4.9t^2}{4.9t^2} = \frac{100}{t^2} - 10 \]
• Differentiating with respect to $t$ will then give us the rates,
\[ s'(t) = -2\,\frac{100}{t^3} \]
• So, at $t = 1$, $s'(1) = -200$ m/sec. That is, the shadow is moving to the left at 200 m/sec.
Example 5.0.4
travelling east at 3km/h and boat B is travelling north at 4km/h. At 3pm how fast is the distance between the boats changing?
• First we draw a picture.
• Let x(t ) be the distance at time t, in km, from boat A to the original position of boat B (i.e. to
the position of boat B at noon). And let y(t ) be the distance at time t, in km, of boat B from
its original position. And let z(t ) be the distance between the two boats at time t.
• Additionally we are told that $x' = -3$ and $y' = 4$. Notice that $x' < 0$ since that distance is getting smaller with time, while $y' > 0$ since that distance is increasing with time.
• Further at 3pm boat A has travelled 9km towards the original position of boat B, so x =
15 ´ 9 = 6, while boat B has travelled 12km away from its original position, so y = 12.
• The distances $x$, $y$ and $z$ form a right-angled triangle, and Pythagoras tells us that
\[ z^2 = x^2 + y^2. \]
At 3pm we know $x = 6$, $y = 12$ so
\begin{align*}
z^2 &= 36 + 144 = 180\\
z &= \sqrt{180} = 6\sqrt{5}.
\end{align*}
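The remaining step is to link the three rates. A minimal sketch, assuming one differentiates z² = x² + y² implicitly with respect to time to obtain z · z′ = x · x′ + y · y′, and then substitutes the values at 3pm given above. The function name is an illustrative assumption.

```python
import math

def distance_rate(x, y, dx_dt, dy_dt):
    """Rate of change of z = sqrt(x^2 + y^2), from z * z' = x * x' + y * y'."""
    z = math.hypot(x, y)
    return (x * dx_dt + y * dy_dt) / z

if __name__ == "__main__":
    rate = distance_rate(6.0, 12.0, -3.0, 4.0)
    print(f"dz/dt is approximately {rate:.4f} km/h")
```

With these inputs the rate is 30/(6√5) = √5 km/h, so the boats are moving apart at 3pm.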
Example 5.0.5
Consider a cylindrical fuel tank of radius $r$ and length $L$ (in some appropriate units) that is lying on its side. Suppose that fuel is being pumped into the tank at a rate $q$. At what rate is the fuel level rising?
Solution. If the tank were vertical everything would be much easier. Unfortunately the tank is on
its side, so we are going to have to work a bit harder to establish the relation between the depth and
volume. Also notice that we have not been supplied with units for this problem — so we do not
need to state the units of our variables.
• Again — draw a picture. Here is an end view of the tank; the shaded part of the circle is filled
with fuel.
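[Figure: end view of the tank, a circle of radius r with the fuel-filled part shaded and the angle θ marked.]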
• Let us denote by $V(t)$ the volume of fuel in the tank at time $t$ and by $h(t)$ the fuel level at time $t$.
• We have been told that $V'(t) = q$ and have been asked to determine $h'(t)$. While it is possible to do so by finding a formula relating $V(t)$ and $h(t)$, it turns out to be quite a bit easier to first find a formula relating $V$ and the angle θ shown in the end view. We can then translate this back into a formula in terms of $h$ using the relation
\[ h(t) = r - r\cos\theta(t). \]
• The computation that follows below gets a little involved in places, so we will drop the “$(t)$” on the variables $V$, $h$ and θ. The reader must never forget that these three quantities are really functions of time, while $r$ and $L$ are constants that do not depend on time.
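• The volume of fuel is $L$ times the cross-sectional area filled by the fuel. That is,
\[ V = L\times\text{Area of the shaded region} \]
While we do not have a canned formula for the area of a region cut off by a chord of a circle like this, it is easy to express that area in terms of two areas that we can compute:
\[ V = L\times\Big[\text{Area(sector of angle } 2\theta) - \text{Area(triangle)}\Big] \]
Here the sector (the “piece of pie”) and the triangle both have their apex at the centre of the circle, with apex angle 2θ and two sides of length $r$.
– The piece of pie is the fraction $\frac{2\theta}{2\pi}$ of the full circle, so its area is $\frac{2\theta}{2\pi}\pi r^2 = \theta r^2$.
– The triangle has height $r\cos\theta$ and base $2r\sin\theta$ and hence has area $\frac{1}{2}(r\cos\theta)(2r\sin\theta) = r^2\sin\theta\cos\theta = \frac{r^2}{2}\sin(2\theta)$, where we have used a double-angle formula.
Subbing these two areas into the above expression for $V$ gives
\[ V = L\times\left[\theta r^2 - \frac{r^2}{2}\sin 2\theta\right] = \frac{Lr^2}{2}\Big[2\theta - \sin 2\theta\Big] \]
Oof!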
Oof!
• Now we can differentiate to find the rate of change. Recalling that $V = V(t)$ and $\theta = \theta(t)$, while $r$ and $L$ are constants,
\begin{align*}
V' &= \frac{Lr^2}{2}\big[2\theta' - 2\cos 2\theta\cdot\theta'\big]\\
&= Lr^2\cdot\theta'\cdot\big[1-\cos 2\theta\big]
\end{align*}
Solving this for $\theta'$ and using $V' = q$ gives
\[ \theta' = \frac{q}{Lr^2(1-\cos 2\theta)} \]
This is the rate at which θ is changing, but we need the rate at which $h$ is changing. We get this from
\begin{align*}
h &= r - r\cos\theta && \text{differentiating this gives}\\
h' &= r\sin\theta\cdot\theta'
\end{align*}
Substituting our expression for $\theta'$ into the expression for $h'$ gives
\[ h' = r\sin\theta\cdot\frac{q}{Lr^2(1-\cos 2\theta)} \]
• We can clean this up a bit more by recalling more double-angle formulas:
\begin{align*}
h' &= r\sin\theta\cdot\frac{q}{Lr^2(1-\cos 2\theta)} && \text{substitute } \cos 2\theta = 1 - 2\sin^2\theta\\
&= r\sin\theta\cdot\frac{q}{Lr^2\cdot 2\sin^2\theta} && \text{now cancel } r\text{’s and a } \sin\theta\\
&= \frac{q}{2Lr\sin\theta}
\end{align*}
• But we can clean this up even more. Instead of writing this rate in terms of θ it is more natural to write it in terms of $h$ (since the initial problem is stated in terms of $h$). From the right triangle with angle θ, hypotenuse $r$ and adjacent side $r - h$, we have
\[ \sin\theta = \frac{\sqrt{r^2-(r-h)^2}}{r} = \frac{\sqrt{2rh-h^2}}{r} \]
and hence
\[ h' = \frac{q}{2L\sqrt{2rh-h^2}} \]
• As a check, notice that $h'$ becomes undefined when $h < 0$ and also when $h > 2r$, because then the argument of the square root in the denominator is negative. Both make sense: the fuel level in the tank must obey $0 \le h \le 2r$.
Example 5.0.6
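The closed form for the fuel level rate can be cross-checked against the volume formula. The sketch below assumes the relations from the example, namely V = (Lr²/2)(2θ − sin 2θ) with h = r − r cos θ, and uses the chain rule V′ = (dV/dh) · h′, so that h′ = q / (dV/dh). It compares a numerically estimated dV/dh with the closed form h′ = q / (2L√(2rh − h²)); the function names and the test values of r, L, q and h are illustrative assumptions.

```python
import math

def volume(h, r, L):
    """Fuel volume at level h, via V = (L r^2 / 2)(2 theta - sin 2 theta)."""
    theta = math.acos((r - h) / r)  # inverts h = r - r cos(theta)
    return L * r**2 / 2 * (2 * theta - math.sin(2 * theta))

def level_rate_formula(h, r, L, q):
    """Closed form h' = q / (2 L sqrt(2 r h - h^2))."""
    return q / (2 * L * math.sqrt(2 * r * h - h**2))

def level_rate_numeric(h, r, L, q, dh=1e-6):
    """h' = q / (dV/dh), with dV/dh estimated by a difference quotient."""
    dV_dh = (volume(h + dh, r, L) - volume(h - dh, r, L)) / (2 * dh)
    return q / dV_dh

if __name__ == "__main__":
    r, L, q = 1.0, 3.0, 0.5
    for h in (0.4, 1.0, 1.7):
        print(h, level_rate_formula(h, r, L, q), level_rate_numeric(h, r, L, q))
```

As a further check, `volume(2 * r, r, L)` should equal the full cylinder volume πr²L.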
Chapter 6
Learning Objectives
• Recognize the two types of indeterminate forms where L’Hôpital’s rule is directly
applicable.
Let us return to limits (Chapter 2) and see how we can use derivatives to simplify certain families of limits called indeterminate forms. We know, from Theorem 2.1.14 on the arithmetic of limits, that if
\[ \lim_{x\to a} f(x) = F \qquad \lim_{x\to a} g(x) = G \]
and $G \ne 0$, then
\[ \lim_{x\to a}\frac{f(x)}{g(x)} = \frac{F}{G} \]
The requirement that $G \ne 0$ is critical. We explored this in Example 2.1.18. Please reread that example.
Of course (see footnote 1) it is not surprising that if $F \ne 0$ and $G = 0$, then
\[ \lim_{x\to a}\frac{f(x)}{g(x)} = DNE \]
1 Now it is not so surprising, but perhaps back when we started limits, this was not so obvious.
L’H ÔPITAL’ S RULE AND I NDETERMINATE F ORMS
However when both $F, G = 0$ then, as we saw in Example 2.1.18, almost anything can happen
\begin{align*}
f(x) &= x & g(x) &= x^2 & \lim_{x\to0}\frac{x}{x^2} &= \lim_{x\to0}\frac{1}{x} = DNE\\
f(x) &= x^2 & g(x) &= x & \lim_{x\to0}\frac{x^2}{x} &= \lim_{x\to0} x = 0\\
f(x) &= x & g(x) &= x & \lim_{x\to0}\frac{x}{x} &= \lim_{x\to0} 1 = 1\\
f(x) &= 7x^2 & g(x) &= 3x^2 & \lim_{x\to0}\frac{7x^2}{3x^2} &= \lim_{x\to0}\frac{7}{3} = \frac{7}{3}
\end{align*}
Indeed after exploring Example 2.1.23 and 2.1.25 we gave ourselves the rule of thumb that if we
found 0/0, then there must be something that cancels.
Because the limit that results from these 0/0 situations is not immediately obvious, but also
leads to some interesting mathematics, we should give it a name.
There are quite a number of mathematical tools for evaluating such indeterminate forms —
Taylor series for example. A simpler method, which works in quite a few cases, is L’Hôpital’s rule2 .
2 Named for the 17th century mathematician, Guillaume de l’Hôpital, who published the first textbook on differential
calculus. The eponymous rule appears in that text, but is believed to have been developed by Johann Bernoulli.
The book was the source of some controversy since it contained many results by Bernoulli, which l’Hôpital
acknowledged in the preface, but Bernoulli felt that l’Hôpital got undue credit.
Note that around that time l’Hôpital’s name was commonly spelled l’Hospital, but the spelling of silent s in French
was changed subsequently; many texts spell his name l’Hospital. If you find yourself in Paris, you can hunt along
Boulevard de l’Hôpital for older street signs carved into the sides of buildings which spell it “l’Hospital” — though
arguably there are better things to do there.
Then
\[ \lim_{x\to a}\frac{f(x)}{g(x)} = \frac{f'(a)}{g'(a)}, \]
(b) while, if $f'(x)$ and $g'(x)$ exist, with $g'(x)$ nonzero, on an open interval that contains $a$, except possibly at $a$ itself, and if the limit
\[ \lim_{x\to a}\frac{f'(x)}{g'(x)} \text{ exists or is } +\infty \text{ or is } -\infty \]
then
\[ \lim_{x\to a}\frac{f(x)}{g(x)} = \lim_{x\to a}\frac{f'(x)}{g'(x)} \]
Proof. We only give the proof for part (a). The proof of part (b) is not very difficult, but uses the Generalised Mean–Value Theorem (Theorem 9.7.1), which is optional and which most readers have not seen.
• First note that we must have $f(a) = g(a) = 0$. To see this note that since the derivative $f'(a)$ exists, we know that the limit
\[ \lim_{x\to a}\frac{f(x)-f(a)}{x-a} \text{ exists} \]
Since we know that the denominator goes to zero, we must also have that the numerator goes to zero (otherwise the limit would be undefined). Hence we must have
\[ \lim_{x\to a}\big(f(x)-f(a)\big) = \Big(\lim_{x\to a} f(x)\Big) - f(a) = 0 \]
We are told that $\lim_{x\to a} f(x) = 0$ so we must have $f(a) = 0$. Similarly we know that $g(a) = 0$.
L’H ÔPITAL’ S RULE AND I NDETERMINATE F ORMS 6.1 S TANDARD EXAMPLES
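• Since $f(a) = g(a) = 0$, we can write
\begin{align*}
\lim_{x\to a}\frac{f(x)}{g(x)} &= \lim_{x\to a}\frac{f(x)-f(a)}{g(x)-g(a)} && \text{multiply by } 1 = \frac{(x-a)^{-1}}{(x-a)^{-1}}\\
&= \lim_{x\to a}\frac{f(x)-f(a)}{g(x)-g(a)}\cdot\frac{(x-a)^{-1}}{(x-a)^{-1}} && \text{rearrange}\\
&= \lim_{x\to a}\frac{\dfrac{f(x)-f(a)}{x-a}}{\dfrac{g(x)-g(a)}{x-a}} && \text{use arithmetic of limits}\\
&= \frac{\displaystyle\lim_{x\to a}\frac{f(x)-f(a)}{x-a}}{\displaystyle\lim_{x\to a}\frac{g(x)-g(a)}{x-a}} = \frac{f'(a)}{g'(a)}
\end{align*}
We can justify this step and apply Theorem 2.1.14, since the limits in the numerator and denominator exist, because they are just $f'(a)$ and $g'(a)$.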
• Notice that
\begin{align*}
\lim_{x\to0}\sin x &= 0\\
\lim_{x\to0} x &= 0
\end{align*}
• So by l’Hôpital’s rule
\[ \lim_{x\to0}\frac{\sin x}{x} = \frac{f'(0)}{g'(0)} = \frac{1}{1} = 1. \]
Example 6.1.1
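A quick numerical look at the same limit can make the result more concrete. This is only a sketch: tabulating values suggests a limit but does not prove it, and the function name and sample points below are illustrative choices.

```python
import math

def sinc_ratio(x):
    """The ratio sin(x) / x for nonzero x."""
    return math.sin(x) / x

if __name__ == "__main__":
    for x in (0.5, 0.1, 0.01, 0.001):
        print(f"x = {x:<6} sin(x)/x = {sinc_ratio(x):.10f}")
```

The printed values should approach 1 from below as x shrinks, consistent with the limit computed above.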
Example 6.1.2
Consider the limit
\[ \lim_{x\to0}\frac{\sin(x)}{\sin(2x)} \]
• First check
\begin{align*}
\lim_{x\to0}\sin 2x &= 0\\
\lim_{x\to0}\sin x &= 0
\end{align*}
Example 6.1.2
Example 6.1.3
Let $q > 1$ and compute the limit
\[ \lim_{x\to0}\frac{q^x-1}{x} \]
This limit arose in our discussion of exponential functions in Section 3.5.
• First check
\begin{align*}
\lim_{x\to0}(q^x-1) &= 1 - 1 = 0\\
\lim_{x\to0} x &= 0
\end{align*}
\[ \lim_{h\to0}\frac{q^h-1}{h} = \log q. \]
Example 6.1.3
In this example, we shall apply L’Hôpital’s rule twice before getting the answer.
Example 6.1.4
Compute the limit
\[ \lim_{x\to0}\frac{\sin(x^2)}{1-\cos x} \]
\[ \lim_{x\to0}\frac{\sin(x^2)}{1-\cos x} = \frac{f'(0)}{g'(0)} = \frac{0}{0}. \]
• It appears that we are stuck until we remember that l’Hôpital’s rule (as stated in Theorem 6.0.2)
has a part (b) — now is a good time to reread it.
3 While it might not be immediately obvious, this example relies on circular reasoning. In order to apply l’Hôpital’s rule, we need to compute the derivative of $q^x$. However in order to compute that derivative (see Section 3.5) we needed to evaluate this limit.
A more obvious example of this sort of circular reasoning can be seen if we use l’Hôpital’s rule to compute the derivative of $f(x) = x^n$ at $x = a$ using the limit
\[ f'(a) = \lim_{x\to a}\frac{x^n-a^n}{x-a} = \lim_{x\to a}\frac{nx^{n-1}-0}{1-0} = na^{n-1}. \]
We have used the result $\frac{d}{dx}x^n = nx^{n-1}$ to prove itself!
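• It says that
\[ \lim_{x\to0}\frac{f(x)}{g(x)} = \lim_{x\to0}\frac{f'(x)}{g'(x)} \]
provided this second limit exists. In our case this requires us to compute
\[ \lim_{x\to0}\frac{2x\cos(x^2)}{\sin(x)} \]
By l’Hôpital’s rule
\[ \lim_{x\to0}\frac{2x\cos(x^2)}{\sin(x)} = \frac{h'(0)}{\ell'(0)} = 2 \]
\[ \lim_{x\to0}\frac{\sin(x^2)}{1-\cos x} = \lim_{x\to0}\frac{2x\cos(x^2)}{\sin(x)} = 2. \]
• We can succinctly summarise the two applications of L’Hôpital’s rule in this example by
\[ \lim_{x\to0}\underbrace{\frac{\sin(x^2)}{1-\cos x}}_{\substack{\text{num}\to0\\ \text{den}\to0}} = \lim_{x\to0}\underbrace{\frac{2x\cos(x^2)}{\sin x}}_{\substack{\text{num}\to0\\ \text{den}\to0}} = \lim_{x\to0}\underbrace{\frac{2\cos(x^2)-4x^2\sin(x^2)}{\cos x}}_{\substack{\text{num}\to2\\ \text{den}\to1}} = 2 \]
Here “num” and “den” are used as abbreviations of “numerator” and “denominator” respectively.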
Example 6.1.4
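The value 2 obtained after two applications of the rule can be compared with direct evaluation near x = 0. The sketch below is illustrative only (the function name and sample points are assumptions); it also avoids extremely small x, where computing 1 − cos x in floating point loses accuracy through cancellation.

```python
import math

def ratio(x):
    """sin(x^2) / (1 - cos x) for nonzero x."""
    return math.sin(x**2) / (1 - math.cos(x))

if __name__ == "__main__":
    for x in (0.5, 0.1, 0.01, 0.001):
        print(f"x = {x:<6} ratio = {ratio(x):.8f}")
```

Because both numerator and denominator are even functions of x, evaluating at negative x gives the same values.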
One must be careful to ensure that the hypotheses of l’Hôpital’s rule are satisfied before applying
it. The following “warnings” show the sorts of things that can go wrong.
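If
\[ \lim_{x\to a} f(x) = 0 \quad\text{but}\quad \lim_{x\to a} g(x) \ne 0 \]
then
\[ \lim_{x\to a}\frac{f(x)}{g(x)} \text{ need not be the same as } \frac{f'(a)}{g'(a)} \text{ or } \lim_{x\to a}\frac{f'(x)}{g'(x)}. \]
For example, take $a = 0$, $f(x) = 3x$ and $g(x) = 4 + 5x$. Then
\begin{align*}
\lim_{x\to0}\frac{f(x)}{g(x)} &= \lim_{x\to0}\frac{3x}{4+5x} = \frac{3\times0}{4+5\times0} = 0\\
\lim_{x\to0}\frac{f'(x)}{g'(x)} &= \frac{f'(0)}{g'(0)} = \frac{3}{5}
\end{align*}
If
\[ \lim_{x\to a} g(x) = 0 \quad\text{but}\quad \lim_{x\to a} f(x) \ne 0 \]
then
\[ \lim_{x\to a}\frac{f(x)}{g(x)} \text{ need not be the same as } \lim_{x\to a}\frac{f'(x)}{g'(x)}. \]
For example, take $a = 0$, $f(x) = 4 + 5x$ and $g(x) = 3x$. Then
\begin{align*}
\lim_{x\to0}\frac{f(x)}{g(x)} &= \lim_{x\to0}\frac{4+5x}{3x} = DNE\\
\lim_{x\to0}\frac{f'(x)}{g'(x)} &= \lim_{x\to0}\frac{5}{3} = \frac{5}{3}
\end{align*}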
This next one is more subtle; the limits of the original numerator and denominator functions both go to zero, but the limit of the ratio of their derivatives does not exist.
If
\[ \lim_{x\to a} f(x) = 0 \quad\text{and}\quad \lim_{x\to a} g(x) = 0 \]
but
\[ \lim_{x\to a}\frac{f'(x)}{g'(x)} \text{ does not exist} \]
then it is still possible that
\[ \lim_{x\to a}\frac{f(x)}{g(x)} \text{ exists.} \]
For example, take $a = 0$, $f(x) = x^2\sin\frac{1}{x}$ and $g(x) = x$. Then $g'(x) = 1$ and
\[ f'(x) = 2x\sin\frac{1}{x} - \cos\frac{1}{x} \]
and we then try to compute the limit
\[ \lim_{x\to0}\frac{f'(x)}{g'(x)} = \lim_{x\to0}\left(2x\sin\frac{1}{x} - \cos\frac{1}{x}\right) \]
However, this limit does not exist. The first term converges to 0 (by the squeeze theorem), but the second term $\cos(1/x)$ just oscillates wildly between ±1. All we can conclude from this is
Since the limit of the ratio of derivatives does not exist, we cannot apply l’Hôpital’s rule.
Instead we should go back to the original limit and apply the squeeze theorem:
\[ \lim_{x\to0}\frac{f(x)}{g(x)} = \lim_{x\to0}\frac{x^2\sin\frac{1}{x}}{x} = \lim_{x\to0} x\sin\frac{1}{x} = 0. \]
It is also easy to construct an example in which the limits of numerator and denominator are both zero, but the limit of the ratio and the limit of the ratio of the derivatives do not exist. A slight change of the previous example shows that it is possible that
\[ \lim_{x\to a} f(x) = 0 \quad\text{and}\quad \lim_{x\to a} g(x) = 0 \]
but neither the limit of $f(x)/g(x)$ nor the limit of $f'(x)/g'(x)$ exist. Take
\[ a = 0 \qquad f(x) = x\sin\frac{1}{x} \qquad g(x) = x \]
Then (with a quick application of the squeeze theorem)
\[ \lim_{x\to0} f(x) = 0 \quad\text{and}\quad \lim_{x\to0} g(x) = 0. \]
However,
\[ \lim_{x\to0}\frac{f(x)}{g(x)} = \lim_{x\to0}\frac{x\sin\frac{1}{x}}{x} = \lim_{x\to0}\sin\frac{1}{x} \]
does not exist. And similarly, since $g'(x) = 1$,
\[ \lim_{x\to0}\frac{f'(x)}{g'(x)} = \lim_{x\to0}\left(\sin\frac{1}{x} - \frac{1}{x}\cos\frac{1}{x}\right) \]
does not exist.
6.2 Variations
Theorem 6.0.2 is the basic form of L’Hôpital’s rule, but there are also many variations. Here are a
bunch of them.
(a) L’Hôpital’s rule also applies when the limit $\lim_{x\to a}$ is replaced by $\lim_{x\to a^+}$ or by $\lim_{x\to a^-}$ or by $\lim_{x\to+\infty}$ or by $\lim_{x\to-\infty}$.
We can justify adapting the rule to the limits to ±∞ via the following reasoning
\begin{align*}
\lim_{x\to\infty}\frac{f(x)}{g(x)} &= \lim_{y\to0^+}\frac{f(1/y)}{g(1/y)} && \text{substitute } x = 1/y\\
&= \lim_{y\to0^+}\frac{-\frac{1}{y^2}f'(1/y)}{-\frac{1}{y^2}g'(1/y)},
\end{align*}
where we have used l’Hôpital’s rule (assuming this limit exists) and the fact that $\frac{d}{dy}f(1/y) = -\frac{1}{y^2}f'(1/y)$ (and similarly for $g$). Cleaning this up and substituting $y = 1/x$ gives the required result:
\[ \lim_{x\to\infty}\frac{f(x)}{g(x)} = \lim_{y\to0^+}\frac{f'(1/y)}{g'(1/y)} = \lim_{x\to\infty}\frac{f'(x)}{g'(x)}. \]
Example 6.2.1
Consider the limit
\[ \lim_{x\to\infty}\frac{\arctan x - \frac{\pi}{2}}{1/x} \]
Example 6.2.1
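For this limit, the ratio of derivatives is (1/(1 + x²)) / (−1/x²) = −x²/(1 + x²), which tends to −1 as x → ∞. The sketch below, with an illustrative function name and sample points, simply evaluates the original quotient at increasingly large x to see whether the numbers are consistent with that value.

```python
import math

def quotient(x):
    """(arctan x - pi/2) / (1/x), written as x * (arctan x - pi/2)."""
    return x * (math.atan(x) - math.pi / 2)

if __name__ == "__main__":
    for x in (10.0, 100.0, 1e3, 1e4):
        print(f"x = {x:<8} quotient = {quotient(x):.8f}")
```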
(b) $\frac{\infty}{\infty}$ indeterminate form: L’Hôpital’s rule also applies when $\lim_{x\to a} f(x) = 0$, $\lim_{x\to a} g(x) = 0$ is replaced by $\lim_{x\to a} f(x) = \pm\infty$, $\lim_{x\to a} g(x) = \pm\infty$.
Example 6.2.2
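Consider the limit
\[ \lim_{x\to\infty}\frac{\log x}{x} \]
The numerator and denominator both blow up towards infinity so this is an ∞/∞ indeterminate form. An application of l’Hôpital’s rule gives
\begin{align*}
\lim_{x\to\infty}\underbrace{\frac{\log x}{x}}_{\substack{\text{num}\to\infty\\ \text{den}\to\infty}} &= \lim_{x\to\infty}\frac{1/x}{1}\\
&= \lim_{x\to\infty}\frac{1}{x} = 0
\end{align*}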
Example 6.2.2
Example 6.2.3
Consider the limit
\[ \lim_{x\to\infty}\frac{5x^2+3x-3}{x^2+1} \]
\[ \lim_{x\to\infty}\underbrace{\frac{5x^2+3x-3}{x^2+1}}_{\substack{\text{num}\to\infty\\ \text{den}\to\infty}} = \lim_{x\to\infty}\underbrace{\frac{10x+3}{2x}}_{\substack{\text{num}\to\infty\\ \text{den}\to\infty}} = \lim_{x\to\infty}\frac{10}{2} = 5. \]
Example 6.2.3
Example 6.2.4
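Compute the limit
\[ \lim_{x\to0^+}\frac{\log x}{\tan\left(\frac{\pi}{2}-x\right)} \]
Applying l’Hôpital’s rule twice gives
\begin{align*}
\lim_{x\to0^+}\underbrace{\frac{\log x}{\tan\left(\frac{\pi}{2}-x\right)}}_{\substack{\text{num}\to-\infty\\ \text{den}\to+\infty}} &= \lim_{x\to0^+}\frac{1/x}{-\sec^2\left(\frac{\pi}{2}-x\right)} = -\lim_{x\to0^+}\underbrace{\frac{\cos^2\left(\frac{\pi}{2}-x\right)}{x}}_{\substack{\text{num}\to0\\ \text{den}\to0}}\\
&= -\lim_{x\to0^+}\underbrace{\frac{2\cos\left(\frac{\pi}{2}-x\right)\sin\left(\frac{\pi}{2}-x\right)}{1}}_{\substack{\text{num}\to0\\ \text{den}\to1}} = 0
\end{align*}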
Example 6.2.4
Sometimes things don’t quite work out as we would like and l’Hôpital’s rule can get stuck in a
loop. Remember to think about the problem before you apply any rule.
Example 6.2.5
Consider the limit
\[ \lim_{x\to\infty}\frac{e^x+e^{-x}}{e^x-e^{-x}} \]
Clearly both numerator and denominator go to ∞, so we have an ∞/∞ indeterminate form. Naively applying l’Hôpital’s rule gives
\[ \lim_{x\to\infty}\frac{e^x+e^{-x}}{e^x-e^{-x}} = \lim_{x\to\infty}\frac{e^x-e^{-x}}{e^x+e^{-x}} \]
(c) $0\cdot\infty$ indeterminate form: When $\lim_{x\to a} f(x) = 0$ and $\lim_{x\to a} g(x) = \infty$. We can use a little algebra to manipulate this into either a $\frac{0}{0}$ or $\frac{\infty}{\infty}$ form:
\[ \lim_{x\to a}\frac{f(x)}{1/g(x)} \qquad \lim_{x\to a}\frac{g(x)}{1/f(x)} \]
Example 6.2.6
Consider the limit
\[ \lim_{x\to0^+} x\cdot\log x \]
Here the function $f(x) = x$ goes to zero, while $g(x) = \log x$ goes to −∞. If we rewrite this as the fraction
\[ x\cdot\log x = \frac{\log x}{1/x} \]
Example 6.2.6
Example 6.2.7
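In this example we’ll evaluate $\lim_{x\to+\infty} x^n e^{-x}$, for all natural numbers $n$. We’ll start with $n = 1$ and $n = 2$ and then, using what we have learned from those cases, move on to general $n$.
\[ \lim_{x\to+\infty}\underbrace{x}_{\to\infty}\,\underbrace{e^{-x}}_{\to0} = \lim_{x\to+\infty}\underbrace{\frac{x}{e^x}}_{\substack{\text{num}\to+\infty\\ \text{den}\to+\infty}} = \lim_{x\to+\infty}\underbrace{\frac{1}{e^x}}_{\substack{\text{num}\to1\\ \text{den}\to+\infty}} = \lim_{x\to+\infty} e^{-x} = 0 \]
\[ \lim_{x\to+\infty}\underbrace{x^2}_{\to\infty}\,\underbrace{e^{-x}}_{\to0} = \lim_{x\to+\infty}\underbrace{\frac{x^2}{e^x}}_{\substack{\text{num}\to+\infty\\ \text{den}\to+\infty}} = \lim_{x\to+\infty}\underbrace{\frac{2x}{e^x}}_{\substack{\text{num}\to\infty\\ \text{den}\to+\infty}} = \lim_{x\to+\infty}\underbrace{\frac{2}{e^x}}_{\substack{\text{num}\to2\\ \text{den}\to+\infty}} = \lim_{x\to+\infty} 2e^{-x} = 0 \]
\begin{align*}
\lim_{x\to+\infty}\underbrace{x^n}_{\to\infty}\,\underbrace{e^{-x}}_{\to0} &= \lim_{x\to+\infty}\underbrace{\frac{x^n}{e^x}}_{\substack{\text{num}\to+\infty\\ \text{den}\to+\infty}}\\
&= \lim_{x\to+\infty}\underbrace{\frac{nx^{n-1}}{e^x}}_{\substack{\text{num}\to\infty\\ \text{den}\to+\infty}}\\
&= \lim_{x\to+\infty}\underbrace{\frac{n(n-1)x^{n-2}}{e^x}}_{\substack{\text{num}\to\infty\\ \text{den}\to+\infty}}\\
&= \cdots = \lim_{x\to+\infty}\underbrace{\frac{n!}{e^x}}_{\substack{\text{num}\to n!\\ \text{den}\to+\infty}} = 0
\end{align*}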
Example 6.2.7
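The conclusion that the exponential eventually dominates any power can be illustrated numerically. In this sketch the function name, the choice n = 5 and the sample points are illustrative assumptions; note that for moderate x the product can still be large, and it only starts to decay once x exceeds n.

```python
import math

def power_times_decay(x, n):
    """x^n * e^(-x)."""
    return x**n * math.exp(-x)

if __name__ == "__main__":
    n = 5
    for x in (1.0, 5.0, 20.0, 50.0, 100.0):
        print(f"x = {x:<6} x^{n} e^(-x) = {power_times_decay(x, n):.3e}")
```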
L’H ÔPITAL’ S RULE AND I NDETERMINATE F ORMS 6.3 ( OPTIONAL ) E VEN MORE VARIATIONS
(d) $\infty - \infty$ indeterminate form: When $\lim_{x\to a} f(x) = \infty$ and $\lim_{x\to a} g(x) = \infty$. We rewrite the difference as a fraction using a common denominator
\[ f(x) - g(x) = \frac{h(x)}{\ell(x)} \]
Example 6.3.1
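Consider the limit
\[ \lim_{x\to\frac{\pi}{2}^-}\big(\sec x - \tan x\big) \]
Since the limit of both $\sec x$ and $\tan x$ is +∞ as $x\to\frac{\pi}{2}^-$, this is an ∞ − ∞ indeterminate form. However we can rewrite this as
\[ \sec x - \tan x = \frac{1}{\cos x} - \frac{\sin x}{\cos x} = \frac{1-\sin x}{\cos x} \]
which is then a 0/0 indeterminate form. This then gives
\[ \lim_{x\to\frac{\pi}{2}^-}\Big(\underbrace{\sec x}_{\to+\infty} - \underbrace{\tan x}_{\to+\infty}\Big) = \lim_{x\to\frac{\pi}{2}^-}\underbrace{\frac{1-\sin x}{\cos x}}_{\substack{\text{num}\to0\\ \text{den}\to0}} = \lim_{x\to\frac{\pi}{2}^-}\underbrace{\frac{-\cos x}{-\sin x}}_{\substack{\text{num}\to0\\ \text{den}\to-1}} = 0 \]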
Example 6.3.1
Example 6.3.2
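In this example, we evaluate the ∞ − ∞ indeterminate form
\[ \lim_{x\to0}\Big(\underbrace{\frac{1}{x}}_{\to\pm\infty} - \underbrace{\frac{1}{\log(1+x)}}_{\to\pm\infty}\Big) \]
Combining the two terms over a common denominator and applying l’Hôpital’s rule once gives
\begin{align*}
\lim_{x\to0}\left(\frac{1}{x} - \frac{1}{\log(1+x)}\right) &= \lim_{x\to0}\underbrace{\frac{\log(1+x)-x}{x\log(1+x)}}_{\substack{\text{num}\to0\\ \text{den}\to0}}\\
&= \lim_{x\to0}\frac{\frac{1}{1+x}-1}{\log(1+x)+\frac{x}{1+x}}\\
&= -\lim_{x\to0}\underbrace{\frac{x}{(1+x)\log(1+x)+x}}_{\substack{\text{num}\to0\\ \text{den}\to1\times0+0=0}} \tag{E2}
\end{align*}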
Example 6.3.2
The following example can be done by l’Hôpital’s rule, but it is actually far simpler to multiply
by the conjugate and take the limit using the tools of Chapter 2.
Example 6.3.3
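Consider the limit
\[ \lim_{x\to\infty}\left(\sqrt{x^2+4x} - \sqrt{x^2-3x}\right) \]
For $x > 0$ we can factor $x$ out of both square roots and write
\[ \sqrt{x^2+4x} - \sqrt{x^2-3x} = \frac{\sqrt{1+4/x} - \sqrt{1-3/x}}{1/x} \]
which is now a 0/0 form with $f(x) = \sqrt{1+4/x} - \sqrt{1-3/x}$ and $g(x) = 1/x$. Then
\[ f'(x) = \frac{-4/x^2}{2\sqrt{1+4/x}} - \frac{3/x^2}{2\sqrt{1-3/x}} \qquad g'(x) = -\frac{1}{x^2} \]
Hence
\[ \frac{f'(x)}{g'(x)} = \frac{4}{2\sqrt{1+4/x}} + \frac{3}{2\sqrt{1-3/x}} \]
And so in the limit as $x\to\infty$
\[ \lim_{x\to\infty}\frac{f'(x)}{g'(x)} = \frac{4}{2} + \frac{3}{2} = \frac{7}{2} \]
and so our original limit is also 7/2.
By comparison, if we multiply by the conjugate we have
\begin{align*}
\sqrt{x^2+4x} - \sqrt{x^2-3x} &= \left(\sqrt{x^2+4x} - \sqrt{x^2-3x}\right)\cdot\frac{\sqrt{x^2+4x} + \sqrt{x^2-3x}}{\sqrt{x^2+4x} + \sqrt{x^2-3x}}\\
&= \frac{x^2+4x-(x^2-3x)}{\sqrt{x^2+4x} + \sqrt{x^2-3x}}\\
&= \frac{7x}{\sqrt{x^2+4x} + \sqrt{x^2-3x}}\\
&= \frac{7}{\sqrt{1+4/x} + \sqrt{1-3/x}} && \text{assuming } x > 0
\end{align*}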
Now taking the limit as x Ñ 8 gives 7/2 as required. Just because we know l’Hôpital’s rule, it
does not mean we should use it everywhere it might be applied.
Example 6.3.3
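The two routes to 7/2 can also be compared numerically. The sketch below evaluates the original difference of square roots and the conjugate form at large x; the function names and sample points are illustrative assumptions. The conjugate form is also the better-behaved one in floating point, since it avoids subtracting two nearly equal large numbers.

```python
import math

def direct(x):
    """sqrt(x^2 + 4x) - sqrt(x^2 - 3x), evaluated as written."""
    return math.sqrt(x * x + 4 * x) - math.sqrt(x * x - 3 * x)

def conjugate_form(x):
    """The equivalent expression 7 / (sqrt(1 + 4/x) + sqrt(1 - 3/x)) for x > 0."""
    return 7 / (math.sqrt(1 + 4 / x) + math.sqrt(1 - 3 / x))

if __name__ == "__main__":
    for x in (10.0, 1e3, 1e6):
        print(f"x = {x:<10} direct = {direct(x):.8f}  conjugate = {conjugate_form(x):.8f}")
```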
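(e) $1^\infty$ indeterminate form: We can use l’Hôpital’s rule on limits of the form
\[ \lim_{x\to a} f(x)^{g(x)} \quad\text{with}\quad \lim_{x\to a} f(x) = 1 \text{ and } \lim_{x\to a} g(x) = \infty \]
by considering the logarithm of the limit (see footnote 4):
\[ \log\left(\lim_{x\to a} f(x)^{g(x)}\right) = \lim_{x\to a}\log\big(f(x)\big)\cdot g(x) \]
which is now a 0 · ∞ form. This can be further transformed into a 0/0 or ∞/∞ form:
\begin{align*}
\log\left(\lim_{x\to a} f(x)^{g(x)}\right) &= \lim_{x\to a}\log\big(f(x)\big)\cdot g(x)\\
&= \lim_{x\to a}\frac{\log\big(f(x)\big)}{1/g(x)}.
\end{align*}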
4 We are using the fact that the logarithm is a continuous function and Theorem 2.3.8.
Example 6.3.4
The following limit appears quite naturally when considering systems which display exponential growth or decay.
\[ \lim_{x\to0}(1+x)^{a/x} \qquad\text{with the constant } a \ne 0 \]
Since $(1+x)^{a/x} = \exp\left[\log\left((1+x)^{a/x}\right)\right]$ and the exponential function is continuous, our original limit is $e^a$.
Example 6.3.4
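The stated limit can be checked by evaluating the expression for small x. The sketch below uses two arbitrary sample constants (a = 2 and a = −1.5, chosen only for illustration) and an illustrative function name.

```python
import math

def compound(x, a):
    """(1 + x)^(a / x) for small nonzero x."""
    return (1 + x) ** (a / x)

if __name__ == "__main__":
    for a in (2.0, -1.5):
        for x in (0.1, 1e-3, 1e-6):
            print(f"a = {a:<5} x = {x:<8} value = {compound(x, a):.8f}  e^a = {math.exp(a):.8f}")
```

Taking x = 1/n with a = 1 gives the familiar sequence (1 + 1/n)^n, which is one reason this limit shows up in growth and decay problems.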
Example 6.3.5
In the limit
\[ \lim_{x\to0}\left(\frac{\sin x}{x}\right)^{1/x^2} \]
the base, $\frac{\sin x}{x}$, converges to 1 (see Example 6.1.1) and the exponent, $\frac{1}{x^2}$, goes to ∞. But if we take logarithms then
which is yet another 0/0 form. Once more with l’Hôpital’s rule:
\begin{align*}
-\lim_{x\to0}\underbrace{\frac{\sin x}{4\sin x+2x\cos x}}_{\substack{\text{num}\to0\\ \text{den}\to0}} &= -\lim_{x\to0}\underbrace{\frac{\cos x}{4\cos x+2\cos x-2x\sin x}}_{\substack{\text{num}\to1\\ \text{den}\to6}}\\
&= -\frac{1}{6}
\end{align*}
Oof! We have just shown that the logarithm of our original limit is ´1/6. Hence the original
limit itself is e´1/6 .
This was quite a complicated example. However it does illustrate the importance of cleaning up
your algebraic expressions. This will both reduce the amount of work you have to do and will
also reduce the number of errors you make.
Example 6.3.5
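Since this example involves several rounds of l’Hôpital’s rule, a numerical check of the final answer is reassuring. The sketch below evaluates the original expression for small x and compares it with e^(−1/6); the function name and sample points are illustrative assumptions.

```python
import math

def sinc_power(x):
    """(sin(x) / x)^(1 / x^2) for nonzero x."""
    return (math.sin(x) / x) ** (1 / x**2)

if __name__ == "__main__":
    target = math.exp(-1 / 6)
    for x in (0.5, 0.1, 0.01):
        print(f"x = {x:<5} value = {sinc_power(x):.8f}  e^(-1/6) = {target:.8f}")
```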
(f) $0^0$ indeterminate form: Like the $1^\infty$ form, this can be treated by considering its logarithm.
Example 6.3.6
For example, in the limit
\[ \lim_{x\to0^+} x^x \]
both the base, $x$, and the exponent, also $x$, go to zero. But if we consider the logarithm then we have
\[ \log x^x = x\log x \]
which is a 0 · ∞ indeterminate form, which we already know how to treat. In fact, we already found, in Example 6.2.6, that
\[ \lim_{x\to0^+} x\log x = 0 \]
Since the exponential function is continuous, it follows that $\lim_{x\to0^+} x^x = e^0 = 1$.
Example 6.3.6
(g) $\infty^0$ indeterminate form: Again, we can treat this form by considering its logarithm.
Example 6.3.7
For example, in the limit
\[ \lim_{x\to+\infty} x^{1/x} \]
the base, $x$, goes to infinity and the exponent, $\frac{1}{x}$, goes to zero. But if we take logarithms
\[ \log x^{1/x} = \frac{\log x}{x} \]
which is an ∞/∞ form, which we know how to treat.
\[ \lim_{x\to+\infty}\underbrace{\frac{\log x}{x}}_{\substack{\text{num}\to\infty\\ \text{den}\to\infty}} = \lim_{x\to+\infty}\underbrace{\frac{\frac{1}{x}}{1}}_{\substack{\text{num}\to0\\ \text{den}\to1}} = 0 \]
Example 6.3.7
Chapter 7
SKETCHING GRAPHS
One of the most obvious applications of derivatives is to help us understand the shape of the graph of
a function. In this section we will use our accumulated knowledge of derivatives to identify the most
important qualitative features of graphs y = f (x). The goal of this section is to highlight features of
the graph y = f (x) that are easily
Learning Objectives
• Sketch a function using information from precalculus (limits, intercepts) and the first
derivative
• Efficiently find signs of factored functions by determining where the signs change.
Given a function f (x), there are several important features that we can determine from that
expression before examining its derivatives.
• The domain of the function — take note of values where f does not exist. If the function
is rational, look for where the denominator is zero. Similarly be careful to look for roots of
negative numbers or other possible sources of discontinuities.
• Intercepts — examine where the function crosses the x-axis and the y-axis by solving f (x) = 0
and computing f (0).
S KETCHING G RAPHS 7.1 D OMAIN , INTERCEPTS AND ASYMPTOTES
• Vertical asymptotes: look for values of $x$ at which $f(x)$ blows up. If $f(x)$ approaches either +∞ or −∞ as $x$ approaches $a$ (or possibly as $x$ approaches $a$ from one side) then $x = a$ is a vertical asymptote to $y = f(x)$. When $f(x)$ is a rational function (written so that common factors are cancelled), then $y = f(x)$ has vertical asymptotes at the zeroes of the denominator.
• Horizontal asymptotes: examine the limits of $f(x)$ as $x\to+\infty$ and $x\to-\infty$. Often $f(x)$ will tend to +∞ or to −∞ or to a finite limit $L$. If, for example, $\lim_{x\to+\infty} f(x) = L$, then $y = L$ is a horizontal asymptote to $y = f(x)$ as $x\to+\infty$.
Example 7.1.1
Consider the function
\[ f(x) = \frac{x+1}{(x+3)(x-2)} \]
• We see that it is defined on all real numbers except $x = -3, +2$.
• Since $f(0) = -1/6$ and $f(x) = 0$ only when $x = -1$, the graph has y-intercept $(0, -1/6)$ and x-intercept $(-1, 0)$.
• Since the function is rational and its denominator is zero at $x = -3, +2$ it will have vertical asymptotes at $x = -3, +2$. To determine the shape around those asymptotes we need to examine the limits
\[ \lim_{x\to-3} f(x) \qquad \lim_{x\to2} f(x) \]
Notice that when $x$ is close to −3, the factors $(x+1)$ and $(x-2)$ are both negative, so the sign of $f(x) = \frac{x+1}{x-2}\cdot\frac{1}{x+3}$ is the same as the sign of $x+3$. Hence
\[ \lim_{x\to-3^+} f(x) = +\infty \qquad \lim_{x\to-3^-} f(x) = -\infty \]
A similar analysis when $x$ is near 2 gives
\[ \lim_{x\to2^+} f(x) = +\infty \qquad \lim_{x\to2^-} f(x) = -\infty \]
• Finally since the numerator has degree 1 and the denominator has degree 2, we see that as $x\to\pm\infty$, $f(x)\to0$. So $y = 0$ is a horizontal asymptote.
• Since we know the behaviour around the asymptotes and we know the locations of the intercepts (as shown in the left graph below), we can then join up the pieces and smooth them out to get a good sketch of this function (below right).
S KETCHING G RAPHS 7.2 F IRST DERIVATIVE — INCREASING OR DECREASING
Example 7.1.1
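The features found in this example (intercepts, the sign of the blow-up on each side of the vertical asymptotes, and the horizontal asymptote) can be spot-checked by evaluating the function at a few points. This is a rough sketch: the helper name `one_sided_sign` and the offset `eps` are illustrative assumptions, and sampling one nearby point only indicates the sign, it does not prove the limit.

```python
def f(x):
    """The rational function f(x) = (x + 1) / ((x + 3)(x - 2))."""
    return (x + 1) / ((x + 3) * (x - 2))

def one_sided_sign(a, side, eps=1e-6):
    """Sign of f just right (side=+1) or just left (side=-1) of x = a."""
    return "+inf" if f(a + side * eps) > 0 else "-inf"

if __name__ == "__main__":
    print("y-intercept:", f(0.0))
    print("x-intercept value f(-1):", f(-1.0))
    for a in (-3.0, 2.0):
        print(f"x -> {a}+ :", one_sided_sign(a, +1), f"  x -> {a}- :", one_sided_sign(a, -1))
    print("f(1e9) =", f(1e9), " f(-1e9) =", f(-1e9))
```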
Example 7.2.1
Consider the function
\[ f(x) = x^4 - 6x^3 \]
• Before we move on to derivatives, let us first examine the function itself as we did above.
– As $f(x)$ is a polynomial its domain is all real numbers.
– Its y-intercept is at (0, 0). We find its x-intercepts by factoring
\[ f(x) = x^4 - 6x^3 = x^3(x-6) \]
So it crosses the x-axis at $x = 0, 6$.
1 This is the extension of the definition of “singular point” that was mentioned in the footnote in Definition 3.5.6.
– Again, since the function is a polynomial it does not have any vertical asymptotes. And
since
• Since the function is a polynomial, it does not have any singular points, but it does have two
critical points at x = 0, 9/2. These two critical points split the real line into 3 open intervals
S KETCHING G RAPHS 7.3 S ECOND DERIVATIVE — CONCAVITY
Example 7.2.1
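The sign analysis of the first derivative can be mimicked in a few lines. For f(x) = x⁴ − 6x³ the derivative is f′(x) = 4x³ − 18x² = 2x²(2x − 9), which vanishes at the two critical points x = 0 and x = 9/2 named above. The sketch below tests the sign of f′ at one point inside each of the three open intervals; the helper `sign_chart` and its choice of test points are illustrative assumptions, not a procedure from the text.

```python
def f_prime(x):
    """Derivative of f(x) = x^4 - 6x^3."""
    return 4 * x**3 - 18 * x**2

def sign_chart(derivative, critical_points):
    """Sign of `derivative` at one test point in each interval between critical points."""
    pts = sorted(critical_points)
    test_points = [pts[0] - 1] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1]
    return ["+" if derivative(t) > 0 else "-" for t in test_points]

if __name__ == "__main__":
    # Intervals: (-inf, 0), (0, 9/2), (9/2, +inf)
    print(sign_chart(f_prime, [0, 4.5]))
```

A "-" entry means the function is decreasing on that interval and a "+" entry means it is increasing, so a sign change from "-" to "+" marks a local minimum.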
Learning Objectives
• Explain what it means for a twice-differentiable function to be concave up or concave
down on an interval.
• Explain how information about the graph of a function may be extracted from the
function, its derivative and its second derivative.
• Sketch the graph of a function f (x) using the function, its derivative and its second
derivative.
• Sketch the graph of a function using characteristics determined from the function and
its derivatives, without scaffolding from an external source.
The second derivative $f''(x)$ tells us the rate at which the derivative changes. Perhaps the easiest way to understand how to interpret the sign of the second derivative is to think about what it implies about the slope of the tangent line to the graph of the function. Consider the following sketches of $y = 1 + x^2$ and $y = -1 - x^2$.
• In the case of $y = f(x) = 1 + x^2$, $f''(x) = 2 > 0$. Notice that this means the slope, $f'(x)$, of the line tangent to the graph at $x$ increases as $x$ increases. Looking at the figure on the left above, we see that the graph always lies above the tangent lines.
• For $y = f(x) = -1 - x^2$, $f''(x) = -2 < 0$. The slope, $f'(x)$, of the line tangent to the graph at $x$ decreases as $x$ increases. Looking at the figure on the right above, we see that the graph always lies below the tangent lines.
Similarly consider the following sketches of $y = x^{-1/2}$ and $y = \sqrt{4-x}$:
Both of their derivatives, $-\frac{1}{2} x^{-3/2}$ and $-\frac{1}{2} (4 - x)^{-1/2}$, are negative, so they are decreasing functions.
Examining second derivatives shows some differences.
• For the first function, $y''(x) = \frac{3}{4} x^{-5/2} > 0$, so the slopes of tangent lines are increasing with $x$
and the graph lies above its tangent lines.
• However, the second function has $y''(x) = -\frac{1}{4} (4 - x)^{-3/2} < 0$ so the slopes of the tangent
lines are decreasing with $x$ and the graph lies below its tangent lines.
More generally
Definition 7.3.1.
Let $f(x)$ be a continuous function on the interval $[a, b]$ and suppose its first and second
derivatives exist on that interval.
• If $f''(x) > 0$ for all $a < x < b$, then the graph of $f$ lies above its tangent lines for
$a < x < b$ and it is said to be concave up.
• If $f''(x) < 0$ for all $a < x < b$, then the graph of $f$ lies below its tangent lines for
$a < x < b$ and it is said to be concave down.
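The sign test in Definition 7.3.1 can also be checked numerically. The following is a minimal Python sketch and is not part of the original text: the helper names `second_derivative` and `concavity`, the step size `h` and the tolerance `tol` are all illustrative assumptions. It estimates the second derivative with a central difference and reports the concavity at the sample point x = 1 for the four functions discussed above.

```python
import math


def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of f''(x) (an approximation, not exact)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def concavity(f, x, tol=1e-6):
    """Classify the concavity of f at x from the sign of the estimated f''(x)."""
    d2 = second_derivative(f, x)
    if d2 > tol:
        return "concave up"
    if d2 < -tol:
        return "concave down"
    return "undetermined"


examples = {
    "1 + x^2": lambda x: 1 + x * x,
    "-1 - x^2": lambda x: -1 - x * x,
    "x^(-1/2)": lambda x: x ** -0.5,
    "sqrt(4 - x)": lambda x: math.sqrt(4 - x),
}

for name, f in examples.items():
    print(name, "is", concavity(f, 1.0), "at x = 1")
```

A single sample point only probes the concavity locally; the definition requires the sign condition on the whole interval.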
[Figure: a curve with a concave down part and a concave up part, meeting at the inflection point $(c, f(c))$.]
To avoid confusion we recommend the reader stick with the terms “concave up” and “concave
down”.
Let’s now continue Example 7.2.1 by discussing the concavity of the curve.
Example 7.3.2 (Continuation of Example 7.2.1)
Consider again the function
\[ f(x) = x^4 - 6x^3 \]
• Thus the second derivative is zero (and potentially changes sign) at x = 0, 3. Thus we should
consider the sign of the second derivative on the following intervals
Since the concavity changes at both x = 0 and x = 3, the following are inflection points
• Putting this together with the information we obtained earlier gives us the following sketch
Example 7.3.2
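The inflection points found in Example 7.3.2 can be double-checked with a short Python sketch. This is an illustration only: the second derivative $12x^2 - 36x = 12x(x - 3)$ is computed here by hand from $f(x) = x^4 - 6x^3$, and the helper names `sign` and `changes_sign` and the sampling offset `step` are assumptions of this sketch.

```python
def f(x):
    return x**4 - 6 * x**3


def f2(x):
    # Second derivative of x^4 - 6x^3, computed by hand: 12x^2 - 36x = 12x(x - 3).
    return 12 * x * (x - 3)


def sign(v):
    return (v > 0) - (v < 0)


def changes_sign(g, c, step=0.5):
    """True if g has opposite signs at c - step and c + step."""
    return sign(g(c - step)) * sign(g(c + step)) < 0


# Keep only the zeros of f'' at which the concavity really changes.
inflection_points = [(c, f(c)) for c in (0, 3) if f2(c) == 0 and changes_sign(f2, c)]
print(inflection_points)  # [(0, 0), (3, -81)]
```

The offset `step` must be smaller than the distance between neighbouring zeros of the second derivative (here 3), so that each sample stays inside a single interval.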
Example 7.3.3 (Optional) $y = x^{1/3}$ and $y = x^{2/3}$
In our Definition 7.3.1, concerning concavity and inflection points, we considered only functions
having first and second derivatives on the entire interval of interest. In this example, we will consider
the functions
\[ f(x) = x^{1/3} \qquad g(x) = x^{2/3} \]
We shall see that $x = 0$ is a singular point for both of those functions. There is no universal agreement
as to precisely when a singular point should also be called an inflection point. We choose to extend
our definition of inflection point in Definition 7.3.1 as follows. If
• the function $f(x)$ is defined and continuous on an interval $a < x < b$ and if
• the first and second derivatives $f'(x)$ and $f''(x)$ exist on $a < x < b$ except possibly at the single
point $a < c < b$ and if
• $f$ is concave up on one side of $c$ and is concave down on the other side of $c$
then we say that $(c, f(c))$ is an inflection point of $y = f(x)$. Now let’s check out $y = f(x)$ and
$y = g(x)$ from this point of view.
(1) Features of $y = f(x)$ and $y = g(x)$ that are read off of $f(x)$ and $g(x)$:
• Since $f(0) = 0^{1/3} = 0$ and $g(0) = 0^{2/3} = 0$, the origin $(0, 0)$ lies on both $y = f(x)$ and
$y = g(x)$.
• For example, $1^3 = 1$ and $(-1)^3 = -1$ so that the cube root of $1$ is $1^{1/3} = 1$ and the cube
root of $-1$ is $(-1)^{1/3} = -1$. In general,
\[ x^{1/3} \begin{cases} < 0 & \text{if } x < 0 \\ = 0 & \text{if } x = 0 \\ > 0 & \text{if } x > 0 \end{cases} \]
Consequently the graph $y = f(x) = x^{1/3}$ lies below the x-axis when $x < 0$ and lies above
the x-axis when $x > 0$. On the other hand, the graph $y = g(x) = x^{2/3} = (x^{1/3})^2$ lies on or
above the x-axis for all $x$.
• As $x \to +\infty$, both $y = f(x) = x^{1/3}$ and $y = g(x) = x^{2/3}$ tend to $+\infty$.
• As $x \to -\infty$, $y = f(x) = x^{1/3}$ tends to $-\infty$ and $y = g(x) = x^{2/3}$ tends to $+\infty$.
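(2) Features of $y = f(x)$ and $y = g(x)$ that are read off of $f'(x)$ and $g'(x)$:
\[ f'(x) = \begin{cases} \frac{1}{3} x^{-2/3} & \text{if } x \ne 0 \\ \text{undefined} & \text{if } x = 0 \end{cases} \implies f'(x) > 0 \text{ for all } x \ne 0 \]
\[ g'(x) = \begin{cases} \frac{2}{3} x^{-1/3} & \text{if } x \ne 0 \\ \text{undefined} & \text{if } x = 0 \end{cases} \implies g'(x) \begin{cases} < 0 & \text{if } x < 0 \\ > 0 & \text{if } x > 0 \end{cases} \]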
So the graph $y = f(x)$ is increasing on both sides of the singular point $x = 0$, while the graph
$y = g(x)$ is decreasing to the left of $x = 0$ and is increasing to the right of $x = 0$. As $x \to 0$, $f'(x)$
and $g'(x)$ become infinite. That is, the slopes of the tangent lines at $(x, f(x))$ and $(x, g(x))$
become infinite and the tangent lines become vertical.
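(3) Features of $y = f(x)$ and $y = g(x)$ that are read off of $f''(x)$ and $g''(x)$:
\[ f''(x) = \begin{cases} -\frac{2}{9} x^{-5/3} = -\frac{2}{9} \left[ x^{-1/3} \right]^5 & \text{if } x \ne 0 \\ \text{undefined} & \text{if } x = 0 \end{cases} \implies f''(x) \begin{cases} > 0 & \text{if } x < 0 \\ < 0 & \text{if } x > 0 \end{cases} \]
\[ g''(x) = \begin{cases} -\frac{2}{9} x^{-4/3} = -\frac{2}{9} \left[ x^{-1/3} \right]^4 & \text{if } x \ne 0 \\ \text{undefined} & \text{if } x = 0 \end{cases} \implies g''(x) < 0 \text{ for all } x \ne 0 \]
So the graph $y = g(x)$ is concave down on both sides of the singular point $x = 0$, while the
graph $y = f(x)$ is concave up to the left of $x = 0$ and is concave down to the right of $x = 0$.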
Since the concavity changes at $x = 0$ for $y = f(x)$, but not for $y = g(x)$, $(0, 0)$ is an inflection point
for $y = f(x)$, but not for $y = g(x)$. We have the following sketch for $y = f(x) = x^{1/3}$,
[Figure: sketch of $y = f(x) = x^{1/3}$, with the inflection point at $(0, 0)$ marked.]
S KETCHING G RAPHS 7.4 ( OPTIONAL ) S YMMETRIES
[Figure: sketch of $y = g(x) = x^{2/3}$ through $(0, 0)$. To the left of the origin $g' < 0$ (decreasing) and $g'' < 0$ (concave down); to the right $g' > 0$ (increasing) and $g'' < 0$ (concave down).]
Note that the curve $y = f(x) = x^{1/3}$ looks perfectly smooth, even though $f'(x) \to \infty$ as $x \to 0$.
There is no kink or discontinuity at $(0, 0)$. The singularity at $x = 0$ has caused the y-axis to be a
vertical tangent to the curve, but has not prevented the curve from looking smooth.
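For readers who want to experiment with these two functions numerically, here is a small Python sketch. It is an illustration rather than part of the example: the helper name `cbrt` is an assumption, and it is needed because Python's `**` operator returns a complex number for a negative base with a fractional exponent, whereas this example uses the real cube root.

```python
import math


def cbrt(x):
    """Real cube root; math.copysign restores the sign of x."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def f(x):  # x^(1/3)
    return cbrt(x)


def g(x):  # x^(2/3) = (x^(1/3))^2
    return cbrt(x) ** 2


def f_prime(x):  # (1/3) x^(-2/3), valid only for x != 0
    return 1.0 / (3.0 * cbrt(x) ** 2)


# The slope of y = x^(1/3) grows without bound as x approaches 0 from either side.
for x in (1e-2, 1e-4, 1e-6):
    print(x, f_prime(x), f_prime(-x))
```

The printed slopes are positive on both sides of the origin and increase as x shrinks, which matches the vertical tangent described above.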
Example 7.3.3
Definition 7.4.2.
Example 7.4.3
Let $f(x) = x^2$ and $g(x) = x^3$. Then
Example 7.4.3
Not all even and odd functions are polynomials. For example
\[ |x| \qquad \cos x \qquad \text{and} \qquad (e^x + e^{-x}) \]
are all even, while
\[ \sin x \qquad \tan x \qquad \text{and} \qquad (e^x - e^{-x}) \]
are all odd. Indeed, given any function $f(x)$, the function
$g(x) = f(x) + f(-x)$ will be even, and
$h(x) = f(x) - f(-x)$ will be odd.
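The last claim is easy to test on a computer. The following Python sketch is an illustration only; the helper names `even_combination` and `odd_combination` are assumptions. It builds $g$ and $h$ from $f(x) = e^x$ and checks the symmetry at a few sample points.

```python
import math


def even_combination(f):
    """Return g(x) = f(x) + f(-x), which is even for any f."""
    return lambda x: f(x) + f(-x)


def odd_combination(f):
    """Return h(x) = f(x) - f(-x), which is odd for any f."""
    return lambda x: f(x) - f(-x)


g = even_combination(math.exp)  # e^x + e^(-x)
h = odd_combination(math.exp)   # e^x - e^(-x)

for x in (0.5, 1.0, 2.0):
    assert g(-x) == g(x)    # even: g(-x) = g(x)
    assert h(-x) == -h(x)   # odd:  h(-x) = -h(x)
```

Checking a few points does not prove the symmetry; the proof is the one-line substitution of $-x$ for $x$ in the definitions of $g$ and $h$.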
Now let us see how we can make use of these symmetries to make graph sketching easier. Let
$f(x)$ be an even function. Then
the point $(x_0, y_0)$ lies on the graph of $y = f(x)$
if and only if $y_0 = f(x_0) = f(-x_0)$ which is the case if and only if
the point $(-x_0, y_0)$ lies on the graph of $y = f(x)$.
Notice that the points $(x_0, y_0)$ and $(-x_0, y_0)$ are just reflections of each other across the y-axis.
Consequently, to draw the graph $y = f(x)$, it suffices to draw the part of the graph with $x \ge 0$ and
then reflect it in the y-axis. Here is an example. The part with $x \ge 0$ is on the left and the full graph
is on the right.
[Figure: on the left, the part of the graph with $x \ge 0$, through $(x_0, y_0)$; on the right, the full graph, through $(-x_0, y_0)$ and $(x_0, y_0)$.]
Now the symmetry is a little harder to interpret pictorially. To get from $(x_0, y_0)$ to $(-x_0, -y_0)$ one
can first reflect $(x_0, y_0)$ in the y-axis to get to $(-x_0, y_0)$ and then reflect the result in the x-axis to get
to $(-x_0, -y_0)$. Consequently, to draw the graph $y = f(x)$, it suffices to draw the part of the graph
with $x \ge 0$ and then reflect it first in the y-axis and then in the x-axis. Here is an example. First,
here is the part of the graph with $x \ge 0$.
[Figure: the part of the graph with $x \ge 0$, through $(x_0, y_0)$.]
Next, as an intermediate step (usually done in our heads rather than on paper), we add in the
reflection in the y–axis.
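[Figure: the part of the graph with $x \ge 0$ together with its reflection in the y-axis, through $(-x_0, y_0)$ and $(x_0, y_0)$.]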
Finally to get the full graph, we reflect the dashed line in the x–axis
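[Figure: the dashed reflection through $(-x_0, y_0)$ is reflected in the x-axis to reach $(-x_0, -y_0)$.]
[Figure: the full graph, through $(x_0, y_0)$ and $(-x_0, -y_0)$.]
\[ g(x) = \frac{x^2 - 9}{x^2 + 3} \]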
namely $x = \pm 3$. Note that we only need to establish $x = 3$ as an intercept. Then since $g$ is
even, we know that $x = -3$ is also an intercept.
• To find the horizontal asymptotes we compute the limit as $x \to +\infty$
\begin{align*}
\lim_{x \to \infty} g(x) &= \lim_{x \to \infty} \frac{x^2 - 9}{x^2 + 3} \\
&= \lim_{x \to \infty} \frac{x^2 (1 - 9/x^2)}{x^2 (1 + 3/x^2)} \\
&= \lim_{x \to \infty} \frac{1 - 9/x^2}{1 + 3/x^2} = 1
\end{align*}
• We can already produce a quite reasonable sketch just by putting in the horizontal asymptote
and the intercepts and drawing a smooth curve between them.
Note that we have drawn the function as never crossing the asymptote $y = 1$, however we have
not yet proved that. We could prove it by trying to solve $g(x) = 1$.
\begin{align*}
\frac{x^2 - 9}{x^2 + 3} &= 1 \\
x^2 - 9 &= x^2 + 3 \\
-9 &= 3 \quad \text{so no solutions.}
\end{align*}
Alternatively we could analyse the first derivative to see how the function approaches the
asymptote.
There are no singular points since the denominator is nowhere zero. The only critical point is
at $x = 0$. Thus we must find the sign of $g'(x)$ on the intervals
\[ (-\infty, 0) \qquad (0, \infty) \]
• When $x > 0$, $24x > 0$ and $(x^2 + 3) > 0$, so $g'(x) > 0$ and the function is increasing. By even
symmetry we know that when $x < 0$ the function must be decreasing. Hence the critical point
$x = 0$ is a local minimum of the function.
• Notice that, since the function is increasing for $x > 0$, it must approach the
horizontal asymptote $y = 1$ from below. Thus the sketch above is quite accurate.
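\begin{align*}
g''(x) &= \frac{d}{dx} \frac{24x}{(x^2 + 3)^2} \\
&= \frac{(x^2 + 3)^2 \cdot 24 - 24x \cdot 2 (x^2 + 3) \cdot 2x}{(x^2 + 3)^4} && \text{cancel a factor of } (x^2 + 3) \\
&= \frac{(x^2 + 3) \cdot 24 - 96 x^2}{(x^2 + 3)^3} \\
&= \frac{72 (1 - x^2)}{(x^2 + 3)^3}
\end{align*}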
• It is clear that $g''(x) = 0$ when $x = \pm 1$. Note that, again, we can infer the zero at $x = -1$
from the zero at $x = 1$ by the even symmetry. Thus we need to examine the sign of $g''(x)$ on the
intervals
\[ (-\infty, -1) \qquad (-1, 1) \qquad (1, \infty) \]
• When $|x| < 1$ we have $(1 - x^2) > 0$ so that $g''(x) > 0$ and the function is concave up. When
$|x| > 1$ we have $(1 - x^2) < 0$ so that $g''(x) < 0$ and the function is concave down. Thus the
points $x = \pm 1$ are inflection points. Their coordinates are $(\pm 1, g(\pm 1)) = (\pm 1, -2)$.
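The values quoted for this example can be verified with exact rational arithmetic. This Python sketch is only a check on the hand computation; it uses the standard `fractions` module, and the function names `g` and `g2` are choices made here. The formula for `g2` is the second derivative derived above.

```python
from fractions import Fraction


def g(x):
    return Fraction(x * x - 9, x * x + 3)


def g2(x):
    # Second derivative derived above: 72(1 - x^2) / (x^2 + 3)^3.
    return Fraction(72 * (1 - x * x), (x * x + 3) ** 3)


# Intercepts at x = -3 and 3, inflection points at x = -1 and 1, local minimum at x = 0.
for x in (-3, -1, 0, 1, 3):
    print(x, g(x), g2(x))
```

At integer inputs the arithmetic is exact, so $g(\pm 1) = -2$ and $g''(\pm 1) = 0$ come out with no rounding error.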
Example 7.4.4
S KETCHING G RAPHS 7.5 A CHECKLIST FOR SKETCHING
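[Figure: the points $x_0 - P$, $x_0$, $x_0 + P$ and $x_0 + 2P$ marked on the x-axis.]
Consequently, to draw the graph $y = f(x)$, it suffices to draw one period of the graph, say the part
with $0 \le x \le P$, and then translate it repeatedly. Here is an example. Here is a sketch of one period
[Figure: one period of the graph, for $0 \le x \le P$, through $(x_0, y_0)$, followed by the full graph with $-P$, $P$ and $2P$ marked on the x-axis.]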
§§ A Sketching Checklist.
(1) Features of y = f (x) that are read off of f (x):
S KETCHING G RAPHS 7.6 S KETCHING EXAMPLES
• $y = f(x)$ is first plotted for $x \ge 0$ if the function is even or odd. The rest of the sketch is
then created by reflections.
• $y = f(x)$ is first plotted for a single period if the function is periodic. The rest of the sketch
is then created by translations.
• Next compute $f(0)$, $\lim_{x \to \infty} f(x)$ and $\lim_{x \to -\infty} f(x)$ and look for solutions to $f(x) = 0$
that you can easily find. Then
– $y = f(x)$ has y-intercept $(0, f(0))$.
– $y = f(x)$ has x-intercept $(a, 0)$ whenever $f(a) = 0$.
– $y = f(x)$ has horizontal asymptote $y = L$ if $\lim_{x \to \infty} f(x) = L$ or $\lim_{x \to -\infty} f(x) = L$.
• Compute $f'(x)$ and determine its critical points and singular points, then
– $y = f(x)$ has a horizontal tangent at the points where $f'(x) = 0$.
– $y = f(x)$ is increasing at points where $f'(x) > 0$.
– $y = f(x)$ is decreasing at points where $f'(x) < 0$.
– $y = f(x)$ has vertical tangents or vertical asymptotes at the points where $f'(x) = \pm\infty$.
• Compute $f''(x)$ and determine where $f''(x) = 0$ or does not exist, then
– $y = f(x)$ is concave up at points where $f''(x) > 0$.
– $y = f(x)$ is concave down at points where $f''(x) < 0$.
– $y = f(x)$ may or may not have inflection points where $f''(x) = 0$.
2 With the aid of a computer we can find the x-intercepts numerically: $x \approx -1.879385242$, $0.3472963553$, and
$1.532088886$.
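One way to obtain these numerical intercepts is bisection. The Python sketch below is an assumption about method, since the text does not say how the values were computed. The function is $f(x) = x^3 - 3x + 1$ from this example, and the three brackets were chosen by hand so that $f$ changes sign on each: $f(-2) = -1$, $f(-1) = 3$, $f(0) = 1$, $f(1) = -1$, $f(2) = 3$.

```python
def f(x):
    return x**3 - 3 * x + 1


def bisect(func, lo, hi, iterations=60):
    """Bisection: assumes func(lo) and func(hi) have opposite signs."""
    if func(lo) * func(hi) > 0:
        raise ValueError("no sign change on the bracket")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if func(lo) * func(mid) <= 0:
            hi = mid  # the root lies in [lo, mid]
        else:
            lo = mid  # the root lies in [mid, hi]
    return (lo + hi) / 2.0


brackets = [(-2.0, -1.0), (0.0, 1.0), (1.0, 2.0)]
roots = [bisect(f, lo, hi) for lo, hi in brackets]
print(roots)
```

Each iteration halves the bracket, so 60 iterations shrink an interval of length 1 well below double-precision resolution.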
• For very large $x$, both positive and negative, the $x^3$ term in $f(x)$ dominates the other two
terms so that
\[ f(x) \to \begin{cases} +\infty & \text{as } x \to +\infty \\ -\infty & \text{as } x \to -\infty \end{cases} \]
• The critical points (where $f'(x) = 0$) are at $x = \pm 1$. Further since the derivative is a
polynomial it is defined everywhere and there are no singular points. The critical points
split the real line into the intervals $(-\infty, -1)$, $(-1, 1)$ and $(1, \infty)$.
So $(-1, f(-1)) = (-1, 3)$ is a local maximum and $(1, f(1)) = (1, -1)$ is a local minimum.
\[ f''(x) = 6x \]
• The second derivative is zero when $x = 0$, and the problem is quite easy to analyse. Clearly,
$f''(x) < 0$ when $x < 0$ and $f''(x) > 0$ when $x > 0$.
• Thus $f$ is concave down for $x < 0$, concave up for $x > 0$ and has an inflection point at
$x = 0$.
[Figure: sketch of $y = x^3 - 3x + 1$, with the points $(-1, 3)$, $(0, 1)$ and $(1, -1)$ marked.]
Example 7.6.1
Example 7.6.2 Sketch $f(x) = x^4 - 4x^3$
\begin{align*}
f(x) = x^4 - 4x^3 &= 0 \\
x^3 (x - 4) &= 0
\end{align*}
• The critical points are at $x = 0, 3$. Since the function is a polynomial there are no singular
points. The critical points split the real line into the intervals $(-\infty, 0)$, $(0, 3)$ and $(3, \infty)$.
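[Figure: sketch of $y = x^4 - 4x^3$, with the points $(0, 0)$, $(4, 0)$, $(2, -16)$ and $(3, -27)$ marked. From left to right the regions are labelled: $f' < 0$, $f$ decreasing; $f' < 0$, $f$ decreasing; $f' > 0$, $f$ increasing.]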
Example 7.6.2
Example 7.6.3 $f(x) = x^3 - 6x^2 + 9x - 54$
\begin{align*}
f(x) = x^3 - 6x^2 + 9x - 54 &= 0 \\
x^2 (x - 6) + 9 (x - 6) &= 0 \\
(x^2 + 9)(x - 6) &= 0
\end{align*}
• The critical points are at $x = 1, 3$. Since the function is a polynomial there are no singular
points. The critical points split the real line into the intervals $(-\infty, 1)$, $(1, 3)$ and $(3, \infty)$.
• When $x < 1$, $(x - 1) < 0$ and $(x - 3) < 0$, so $f'(x) > 0$.
• When $1 < x < 3$, $(x - 1) > 0$ and $(x - 3) < 0$, so $f'(x) < 0$.
• When $3 < x$, $(x - 1) > 0$ and $(x - 3) > 0$, so $f'(x) > 0$.
• Summarising all this
            $(-\infty, 1)$   $1$        $(1, 3)$     $3$        $(3, \infty)$
$f'(x)$     positive         0          negative     0          positive
            increasing       maximum    decreasing   minimum    increasing
So the point $(1, f(1)) = (1, -50)$ is a local maximum. The point $(3, f(3)) = (3, -54)$ is
a local minimum.
\[ f''(x) = 6x - 12 \]
• So $f''(x) = 0$ when $x = 2$. This splits the real line into the intervals $(-\infty, 2)$ and $(2, \infty)$.
• When $x < 2$, $f''(x) < 0$.
• When $x > 2$, $f''(x) > 0$.
• Thus the function is concave down for $x < 2$, then concave up for $x > 2$. Hence $(2, f(2)) =
(2, -52)$ is an inflection point.
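The sign tables in this example can be reproduced mechanically. The following Python sketch is illustrative only; the helper name `sign_chart` is an assumption. It samples one point in each interval cut out by the given breakpoints and records the sign of the function there. The derivatives are entered by hand.

```python
def f(x):
    return x**3 - 6 * x**2 + 9 * x - 54


def f1(x):  # first derivative, equal to 3(x - 1)(x - 3)
    return 3 * x**2 - 12 * x + 9


def f2(x):  # second derivative
    return 6 * x - 12


def sign_chart(func, breakpoints):
    """Sign of func at one sample point in each interval cut out by breakpoints."""
    pts = sorted(breakpoints)
    samples = [pts[0] - 1] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1]
    return ["+" if func(s) > 0 else "-" if func(s) < 0 else "0" for s in samples]


print(sign_chart(f1, [1, 3]))  # ['+', '-', '+']
print(sign_chart(f2, [2]))     # ['-', '+']
print([(x, f(x)) for x in (1, 2, 3, 6)])  # [(1, -50), (2, -52), (3, -54), (6, 0)]
```

One sample per interval is enough only because a continuous function cannot change sign between consecutive zeros, so the breakpoints must include every zero of the function being charted.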
Putting all this information together gives us the following sketch.
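[Figure: sketch of $y = x^3 - 6x^2 + 9x - 54$, with the points $(6, 0)$, $(1, -50)$, $(0, -54)$, $(2, -52)$ and $(3, -54)$ marked.]
and if we zoom in around the interesting points (minimum, maximum and inflection point), we have
[Figure: enlarged view showing the points $(1, -50)$, $(0, -54)$, $(2, -52)$ and $(3, -54)$.]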
Example 7.6.3
• The function is rational so it is defined except where its denominator is zero, namely at
$x = \pm 2$.
• Since $f(-x) = \frac{-x}{x^2 - 4} = -f(x)$, it is odd. Indeed this means that we only need to examine
what happens to the function for $x \ge 0$ and we can then infer what happens for $x \le 0$ using
$f(-x) = -f(x)$. In practice we will sketch the graph for $x \ge 0$ and then infer the rest from
this symmetry.
• The y-intercept is $y = f(0) = 0$, while the x-intercepts are given by the solution of $f(x) = 0$.
So the only x-intercept is $0$.
• Since $f$ is rational, it may have vertical asymptotes where its denominator is zero, at
$x = \pm 2$. Since the function is odd, we only have to analyse the asymptote at $x = 2$ and we
can then infer what happens at $x = -2$ by symmetry.
\begin{align*}
\lim_{x \to 2^+} f(x) &= \lim_{x \to 2^+} \frac{x}{(x - 2)(x + 2)} = +\infty \\
\lim_{x \to 2^-} f(x) &= \lim_{x \to 2^-} \frac{x}{(x - 2)(x + 2)} = -\infty
\end{align*}
\begin{align*}
f'(x) &= \frac{(x^2 - 4) \cdot 1 - x \cdot 2x}{(x^2 - 4)^2} \\
&= \frac{-(x^2 + 4)}{(x^2 - 4)^2}
\end{align*}
• Hence there are no critical points. There are singular points where the denominator is zero,
namely $x = \pm 2$. Before we proceed, notice that the numerator is always negative and the
denominator is always positive. Hence $f'(x) < 0$ except at $x = \pm 2$ where it is undefined.
• The function is decreasing except at $x = \pm 2$.
• We already know that at $x = 2$ we have a vertical asymptote and that $f'(x) < 0$ for all $x \ne \pm 2$. So
\[ \lim_{x \to 2} f'(x) = -\infty \]
• So $f''(x) = 0$ when $x = 0$ and does not exist when $x = \pm 2$. This splits the real line into the
intervals $(-\infty, -2)$, $(-2, 0)$, $(0, 2)$ and $(2, \infty)$. However we only need to consider $x \ge 0$
(because of the odd symmetry).
• When $0 < x < 2$, $x > 0$, $(x^2 + 12) > 0$ and $(x^2 - 4) < 0$ so $f''(x) < 0$.
• When $x > 2$, $x > 0$, $(x^2 + 12) > 0$ and $(x^2 - 4) > 0$ so $f''(x) > 0$.
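A quick numerical look at $f(x) = x/(x^2 - 4)$ supports both the one-sided limits at $x = 2$ and the concavity signs. In this Python sketch the closed form used for the second derivative, $2x(x^2 + 12)/(x^2 - 4)^3$, is an assumption: it was obtained by differentiating $f'(x) = -(x^2 + 4)/(x^2 - 4)^2$ by hand, and it agrees with the factors $x$, $(x^2 + 12)$ and $(x^2 - 4)$ named above.

```python
def f(x):
    return x / (x * x - 4)


def f2(x):
    # Assumed closed form, derived by hand from f'(x): 2x(x^2 + 12) / (x^2 - 4)^3.
    return 2 * x * (x * x + 12) / (x * x - 4) ** 3


# One-sided behaviour near the vertical asymptote x = 2.
for eps in (1e-1, 1e-3, 1e-6):
    print(eps, f(2 - eps), f(2 + eps))

# Concavity on each side of x = 2, and a spot check of the odd symmetry.
print(f2(1.0), f2(3.0), f(-1.5) == -f(1.5))
```

The values just to the left of 2 are large and negative and those just to the right are large and positive, as the limits predict.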
Putting all this information together gives the following sketch for x ě 0:
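[Figure: sketch of $y = \frac{x}{x^2 - 4}$ for $x \ge 0$, with $x = 2$ marked; $f'' < 0$ (concave down) to the left of $x = 2$ and $f'' > 0$ (concave up) to the right.]
[Figure: full sketch of $y = \frac{x}{x^2 - 4}$, with $x = -2$ and $x = 2$ marked and the inflection point at the origin.]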
Notice that this means that the concavity changes at x = 0, so the point (0, f (0)) = (0, 0) is a point
of inflection (as indicated).
Example 7.6.4
This final example is more substantial since the function has singular points (points where the
derivative is undefined). The analysis is more involved.
Example 7.6.5 $f(x) = \sqrt[3]{\frac{x^2}{(x - 6)^2}}$
• The function is the cube root of a rational function. The rational function is defined except
at $x = 6$, so the domain of $f$ is all reals except $x = 6$.
• Clearly the function is not periodic, and examining
\begin{align*}
f(-x) &= \sqrt[3]{\frac{1}{(1 - 6/(-x))^2}} \\
&= \sqrt[3]{\frac{1}{(1 + 6/x)^2}} \ne \pm f(x)
\end{align*}
\[ \lim_{x \to \pm\infty} f(x) = 1 \]
That is, the line $y = 1$ will be a horizontal asymptote to the graph $y = f(x)$ both for
$x \to +\infty$ and for $x \to -\infty$.
• Our function $f(x) \to +\infty$ as $x \to 6$, because of the $(1 - 6/x)^2$ in its denominator. So
$y = f(x)$ has $x = 6$ as a vertical asymptote.
• Notice that the derivative is nowhere equal to zero, so the function has no critical points.
However there are two places the derivative is undefined. The terms
\[ \left( \frac{1}{x - 6} \right)^{5/3} \qquad \frac{1}{x^{1/3}} \]
are undefined at $x = 6, 0$ respectively. Hence $x = 0, 6$ are singular points. These split the
real line into the intervals $(-\infty, 0)$, $(0, 6)$ and $(6, \infty)$.
• When $x < 0$, $(x - 6) < 0$, we have that $(x - 6)^{-5/3} < 0$ and $x^{-1/3} < 0$ and so $f'(x) =
-4 \cdot (\text{negative}) \cdot (\text{negative}) < 0$.
• When $0 < x < 6$, $(x - 6) < 0$, we have that $(x - 6)^{-5/3} < 0$ and $x^{-1/3} > 0$ and so $f'(x) > 0$.
• When $x > 6$, $(x - 6) > 0$, we have that $(x - 6)^{-5/3} > 0$ and $x^{-1/3} > 0$ and so $f'(x) < 0$.
• We should also examine the behaviour of the derivative as $x \to 0$ and $x \to 6$.
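\begin{align*}
\lim_{x \to 0^-} f'(x) &= -4 \left( \lim_{x \to 0^-} (x - 6)^{-5/3} \right) \left( \lim_{x \to 0^-} x^{-1/3} \right) = -\infty \\
\lim_{x \to 0^+} f'(x) &= -4 \left( \lim_{x \to 0^+} (x - 6)^{-5/3} \right) \left( \lim_{x \to 0^+} x^{-1/3} \right) = +\infty \\
\lim_{x \to 6^-} f'(x) &= -4 \left( \lim_{x \to 6^-} (x - 6)^{-5/3} \right) \left( \lim_{x \to 6^-} x^{-1/3} \right) = +\infty \\
\lim_{x \to 6^+} f'(x) &= -4 \left( \lim_{x \to 6^+} (x - 6)^{-5/3} \right) \left( \lim_{x \to 6^+} x^{-1/3} \right) = -\infty
\end{align*}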
Oof!
• Both of the factors $\left( \frac{1}{x - 6} \right)^{8/3} = \left( \frac{1}{\sqrt[3]{x - 6}} \right)^8$ and $\frac{1}{x^{4/3}} = \left( \frac{1}{\sqrt[3]{x}} \right)^4$ are even powers and so are
positive (though possibly infinite). So the sign of $f''(x)$ is the same as the sign of the factor
$x - 1$. Thus
            $(-\infty, 1)$   $1$                $(1, \infty)$
$f''(x)$    negative         0                  positive
            concave down     inflection point   concave up
[Figure: sketch of $y = \sqrt[3]{\frac{x^2}{(x - 6)^2}}$, with $x = 6$ marked on the x-axis.]
It is hard to see the inflection point at $x = 1$, $y = f(1) = \frac{1}{\sqrt[3]{25}}$ in the above sketch. So here is a blow
up of the part of the sketch around $x = 1$.
[Figure: enlarged view of the sketch near the inflection point $(1, 1/\sqrt[3]{25})$.]
Example 7.6.5
Chapter 8
O PTIMIZATION
Learning Objectives
• Determine the critical and singular points of a function.
• Explain how the algorithm can be used in optimization problems. (Note that finding a
critical point is not enough to identify an extremum.)
One important application of differential calculus is to find the maximum (or minimum) value of
a function. This often finds real world applications in problems such as the following.
Example 8.0.1
A farmer has 400 m of fencing materials. What is the largest rectangular paddock that can be
enclosed?
Solution. We will describe a general approach to these sorts of problems in Sections 8.2 and 8.3
below, but here we can take a stab at starting the problem.
• Begin by defining variables and their units (more generally we might draw a picture too); let
the dimensions of the paddock be x by y metres.
\[ A = x \cdot y \]
O PTIMIZATION 8.1 L OCAL AND GLOBAL MAXIMA AND MINIMA
At this stage we cannot apply the calculus we have developed since the area is a function of
two variables and we only know how to work with functions of a single variable. We need to
eliminate one variable.
• We know that the perimeter of the rectangle (and hence the dimensions $x$ and $y$) is constrained
by the amount of fencing materials the farmer has to hand:
\[ 2x + 2y \le 400 \]
and so we have
\[ y \le 200 - x \]
Clearly the area of the paddock is maximised when we use all the fencing possible, so
\[ y = 200 - x \]
• Now substitute this back into our expression for the area
\[ A = x \cdot (200 - x) \]
Since the area cannot be negative (and our lengths $x, y$ cannot be negative either), we must
also have
\[ 0 \le x \le 200 \]
• Thus the question of the largest paddock enclosed becomes the problem of finding the
maximum value of
\[ A = x \cdot (200 - x) \quad \text{subject to } 0 \le x \le 200. \]
Example 8.0.1
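Before developing the calculus, it can be reassuring to explore the paddock problem by brute force. The Python sketch below is an illustration, not a method used in the text, and the function name `area` is an assumption. It evaluates $A = x(200 - x)$ at every whole number of metres from 0 to 200 and picks the largest value.

```python
def area(x):
    """Area of the paddock when one side is x metres and the other is 200 - x."""
    return x * (200 - x)


# Brute-force check over whole-metre side lengths 0, 1, ..., 200.
best_x = max(range(0, 201), key=area)
print(best_x, 200 - best_x, area(best_x))  # 100 100 10000
```

The search only examines integer side lengths, so it suggests that a 100 m by 100 m square is best but does not prove it; ruling out non-integer values of $x$ is what the methods of this chapter are for.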
The above example is sufficiently simple that we can likely determine the answer by several different
methods. In general, we will need more systematic methods for solving problems of the form
To do this we need to examine what a function looks like near its maximum and minimum values.
Suppose that the maximum (or minimum) value of f (x) is f (c) then what does that tell
us about c?
Notice that we have not yet made the ideas of maximum and minimum very precise. For the moment
think of maximum as “the biggest value” and minimum as “the smallest value”.
Warning 8.1.1.
It is important to distinguish between “the smallest value” and “the smallest magnitude”.
For example, because
\[ -5 < -1 \]
the number $-5$ is smaller than $-1$. But the magnitude of $-1$, which is $|-1| = 1$, is
smaller than the magnitude of $-5$, which is $|-5| = 5$. Thus the smallest number in the
set $\{-1, -5\}$ is $-5$, while the number in the set $\{-1, -5\}$ that has the smallest magnitude
is $-1$.
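The same distinction shows up when computing with lists of numbers. In this short Python illustration the variable names are assumptions; the built-in `min` compares the values themselves unless it is given a `key`, such as `abs`, to compare by magnitude instead.

```python
values = [-1, -5]

smallest_value = min(values)               # compares the numbers themselves
smallest_magnitude = min(values, key=abs)  # compares |x| for each x

print(smallest_value, smallest_magnitude)  # -5 -1
```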
Now back to thinking about what happens around a maximum. Suppose that the maximum value
of f (x) is f (c), then for all “nearby” points, the function should be smaller.
\[ f'(c) = \lim_{h \to 0} \frac{f(c + h) - f(c)}{h}. \]
Split the above limit into the left and right limits:
\[ \lim_{h \to 0^+} \frac{f(c + h) - f(c)}{h} \le 0 \]
This function has only 1 maximum value (the middle green point in the graph) and 1 minimum
value (the rightmost blue point), however it has 4 points at which the derivative is zero. In the
small intervals around those points where the derivative is zero, we can see that the function has a
local maximum or minimum, even if it is not the global maximum or minimum. We clearly need to be
more careful distinguishing between these cases.
Definition 8.1.2.
Let $I$ be an interval, like $(a, b)$ or $[a, b]$ for example, and let the function $f(x)$ be defined
for all $x \in I$. Now let $c \in I$. Then
• we say that $f(x)$ has a global (or absolute) minimum on the interval $I$ at the point
$x = c$ if $f(x) \ge f(c)$ for all $x \in I$.
• We say that $f(x)$ has a local¹ minimum on $I$ at $x = c$ if $f(x) \ge f(c)$ for all $x \in I$ that
are near $c$. Precisely, if there is a $\delta > 0$ such that $f(x) \ge f(c)$ for all $x \in I$ that are
within a distance $\delta$ of $c$.
• Similarly, we say that $f(x)$ has a local maximum on $I$ at $x = c$ if $f(x) \le f(c)$ for all
$x \in I$ that are near $c$. Precisely, if there is a $\delta > 0$ such that $f(x) \le f(c)$ for all $x \in I$
that are within a distance $\delta$ of $c$.
The global maxima and minima of a function are called the global extrema of the function,
while the local maxima and minima are called the local extrema.
It has 3 local maxima and 3 local minima on the interval [a, b]. The global maximum occurs at
the middle green point (which is also a local maximum), and the global minimum occurs at the
rightmost blue point (which is also a local minimum).
Using the above definition we can summarise what we have learned above as the following
theorem²:
1 Beware that, while many textbooks use these definitions of local minimum and maximum, some textbooks exclude
the endpoints a, b of the interval [a, b] from their definitions. Our definitions allow the endpoints a and b to be local
minima and maxima. Note that, under our definitions, every global minimum (maximum) is also a local minimum
(maximum).
2 This is one of several important mathematical contributions made by Pierre de Fermat, a French government lawyer
and amateur mathematician, who lived in the first half of the seventeenth century.
Theorem 8.1.3.
Let the function $f(x)$ be defined on the interval $I$ and let $a, b, c$ be points in $I$ with
$a < c < b$. If $f(x)$ has a local maximum or local minimum at $x = c$ and if $f'(c)$ exists,
then $f'(c) = 0$.
• It is often (but not always) the case that, when $f(x)$ has a local maximum at $x = c$, the function
$f(x)$ increases strictly as $x$ approaches $c$ from the left and decreases strictly as $x$ leaves $c$ to
the right. That is, $f'(x) > 0$ for $x$ just to the left of $c$ and $f'(x) < 0$ for $x$ just to the right of $c$.
Then, it is often the case, because $f'(x)$ is decreasing as $x$ increases through $c$, that $f''(c) < 0$.
• Conversely, if $f'(c) = 0$ and $f''(c) < 0$, then, just to the right of $c$, $f'(x)$ must be negative,
so that $f(x)$ is decreasing, and just to the left of $c$, $f'(x)$ must be positive, so that $f(x)$ is
increasing. So $f(x)$ has a local maximum at $c$.
• Similarly, it is often the case that, when $f(x)$ has a local minimum at $x = c$, $f'(x) < 0$ for $x$
just to the left of $c$ and $f'(x) > 0$ for $x$ just to the right of $c$ and $f''(c) > 0$.
• Conversely, if $f'(c) = 0$ and $f''(c) > 0$, then, just to the right of $c$, $f'(x)$ must be positive,
so that $f(x)$ is increasing, and, just to the left of $c$, $f'(x)$ must be negative, so that $f(x)$ is
decreasing. So $f(x)$ has a local minimum at $c$.
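The two "conversely" statements above amount to a simple decision rule, sketched here in Python. The function name `classify` and the returned strings are assumptions made for illustration, and the derivative values are supplied by hand. When $f'(c) = 0$ and $f''(c) = 0$ the rule gives no information, which is reported as "inconclusive".

```python
def classify(f1_at_c, f2_at_c):
    """Second-derivative test at a point c, given the values f'(c) and f''(c)."""
    if f1_at_c != 0:
        return "not a critical point"
    if f2_at_c < 0:
        return "local maximum"
    if f2_at_c > 0:
        return "local minimum"
    return "inconclusive"


# f(x) = x^3 - 3x + 1 has f'(x) = 3x^2 - 3 and f''(x) = 6x.
f1 = lambda x: 3 * x * x - 3
f2 = lambda x: 6 * x
print(classify(f1(-1), f2(-1)))  # local maximum
print(classify(f1(1), f2(1)))    # local minimum

# f(x) = x^3 has f'(0) = 0 and f''(0) = 0, so the test says nothing.
print(classify(0, 0))            # inconclusive
```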
Theorem 8.1.3 says that, when $f(x)$ has a local maximum or minimum on an interval $I$ at the
point $x = c$, there are three possibilities.
• The derivative $f'(c) = 0$. This case is illustrated in the following figure.
[Figure: graphs of $y = f(x)$ and $y = f'(x)$.]
Observe that, in this example, $f'(x)$ changes continuously from negative to positive at the
local minimum, taking the value zero at the local minimum (the red dot).
• The derivative $f'(c)$ does not exist. This case is illustrated in the following figure.
[Figure: graphs of $y = f(x)$ and $y = f'(x)$ on the interval from $a$ to $b$.]
Observe that, in this example, $f'(x)$ changes discontinuously from negative to positive at the
local minimum ($x = 0$) and $f'(0)$ does not exist.
• The point $c$ is an endpoint of the interval $I = [a, b]$. This case is also illustrated in the above
figure. The endpoints $a$ and $b$ are both local maxima. But $f'(a)$ and $f'(b)$ are not zero.
This theorem demonstrates that the points at which the derivative is zero or does not exist are very
important. It simplifies the discussion that follows if we give these points names.
Definition 8.1.5.
Let $f(x)$ be a function that is defined on the interval $a < x < b$ and let $a < c < b$. Then
• if $f'(c)$ exists and is zero we call $x = c$ a critical point of the function, and
• if $f'(c)$ does not exist then we call $x = c$ a singular point³ of the function.
Warning 8.1.6.
Note that some people (and texts) will combine both of these cases and call x = c a critical
point when either the derivative is zero or does not exist. The reader should be aware of
the lack of convention on this point⁴ and should be careful to understand whether the more
inclusive definition of critical point is being used, or if the text is using the more precise
definition that distinguishes critical and singular points.
3 For c to be a local maximum or minimum of f , the function f must obviously be defined at c. So here we are
considering only points c in the domain of f . We will later, in Section 7.2, extend the definition of singular points
of f to points that are not in the domain of f .
We’ll now look at a few simple examples involving local maxima and minima, critical points
and singular points. Then we will move on to global maxima and minima.
Example 8.1.7
In this example, we’ll look for local maxima and minima of the function $f(x) = x^3 - 6x$ on the
interval $-2 \le x \le 3$.
\[ f'(x) = 3x^2 - 6. \]
Since this is a polynomial it is defined everywhere on the domain and so there will not be any
singular points. So we now look for critical points.
[Figure: sketch of $y = f(x) = x^3 - 6x$ for $-2 \le x \le 3$, with the points $(c_-, f(c_-))$ and $(c_+, f(c_+))$ marked.]
– $f(x)$ has a local minimum at the endpoint $x = -2$ (i.e. we have $f(x) \ge f(-2)$ whenever
$x \ge -2$ is close to $-2$) and
– $f(x)$ has a local minimum at $x = c_+$ (i.e. we have $f(x) \ge f(c_+)$ whenever $x$ is close to
$c_+$) and
– $f(x)$ has a local maximum at $x = c_-$ (i.e. we have $f(x) \le f(c_-)$ whenever $x$ is close to
$c_-$) and
– $f(x)$ has a local maximum at the endpoint $x = 3$ (i.e. we have $f(x) \le f(3)$ whenever
$x \le 3$ is close to $3$) and
4 No pun intended.
Example 8.1.8
In this example, we’ll look for local maxima and minima of the function $f(x) = x^3$ on the interval
$-1 < x < 1$.
• First compute the derivative:
\[ f'(x) = 3x^2. \]
Again, this is a polynomial and so defined on all of the domain. The function will not have
singular points, but may have critical points.
• The derivative is zero only when $x = 0$, so $x = c = 0$ is the only critical point of the function.
• The graph of $f(x)$ is sketched below. From that sketch we see that $f(x)$ has neither a
local maximum nor a local minimum at $x = c$ despite the fact that $f'(c) = 0$: we have
$f(x) < f(c) = 0$ for all $x < c = 0$ and $f(x) > f(c) = 0$ for all $x > c = 0$.
[Figure: sketch of $y = f(x) = x^3$ for $-1 < x < 1$, with the point $(c, f(c))$ marked at the origin.]
• Note that this example has been constructed to illustrate that a critical point (or singular point)
of a function need not be a local maximum or minimum for the function.
• Reread Theorem 8.1.3. It says⁵ “Let $\cdots$. If $f(x)$ has a local maximum/minimum at $x = c$
and if $f'(c)$ exists, then $f'(c) = 0$”. It does not say that “if $f'(c) = 0$ then $f$ has a local
maximum/minimum at $x = c$”.
5 A very common error of logic that people make is “Affirming the consequent”. When the statement “if P then Q” is
true, observing Q does not imply P. (“Affirming the consequent” eliminates “not” from the previous sentence.)
For example, “If he is Shakespeare, then he is dead,” and “That man is dead.” does not imply “He must be
Shakespeare.”. Or you may have also seen someone use this reasoning: “If a person is a genius before their time
then they are misunderstood.” “I am misunderstood.” “So I must be a genius before my time.”.
Example 8.1.8
Example 8.1.9
In this example, we’ll look for local maxima and minima of the function
\[ f(x) = |x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0 \end{cases} \]
on the interval $-1 < x < 1$ and we’ll also look for local maxima and minima of the function
\[ g(x) = x^{2/3} \]
on the interval $-1 < x < 1$.
• These derivatives never take the value 0, so the functions f (x) and g(x) do not have any
critical points. However both derivatives do not exist at the point x = 0, so that point is a
singular point for both f (x) and g(x).
[Figure: sketch of $y = f(x) = |x|$ for $-1 < x < 1$.]
O PTIMIZATION 8.2 F INDING GLOBAL MAXIMA AND MINIMA
[Figure: sketch of $y = g(x) = x^{2/3}$ for $-1 < x < 1$.]
From the figures we see that both $f(x)$ and $g(x)$ have a local (and in fact global) minimum at
$x = 0$ despite the fact that $x = 0$ is not a critical point.
• Reread Theorem 8.1.3 yet again. It says “Let $\cdots$. If $f(x)$ has a local maximum or local
minimum at $x = c$ and if $f$ is differentiable at $x = c$, then $f'(c) = 0$”. It says nothing about
what happens at points where the derivative does not exist. Indeed that is why we have to
consider both critical points and singular points when we look for maxima and minima.
Example 8.1.9
Theorem 8.2.1.
Let the function $f(x)$ be defined and continuous on the closed, finite interval⁶ $-\infty < a \le
x \le b < \infty$. Then $f(x)$ attains a maximum and a minimum at least once. That is, there
exist numbers $a \le x_m, x_M \le b$ such that
\[ f(x_m) \le f(x) \le f(x_M) \quad \text{for all } a \le x \le b. \]
Suppose that the maximum (or minimum) value of $f(x)$, for $a \le x \le b$, is $f(c)$. What
does that tell us about $c$?
6 The hypotheses that $f(x)$ be continuous and that the interval be finite and closed are all essential. We suggest that
you find three functions $f_1(x)$, $f_2(x)$ and $f_3(x)$ with $f_1$ defined but not continuous on $0 \le x \le 1$, $f_2$ defined and
continuous on $-\infty < x < \infty$, and $f_3$ defined and continuous on $0 < x < 1$, and with none of $f_1$, $f_2$ and $f_3$ attaining
either a global maximum or a global minimum.
If $c$ obeys $a < c < b$ (note the strict inequalities), then $f$ has a local maximum (or minimum) at $x = c$
and Theorem 8.1.3 tells us that either $f'(c) = 0$ or $f'(c)$ does not exist. The only other places that a
maximum or minimum can occur are the ends of the interval. We can summarise this as:
Theorem 8.2.2.
If $f(x)$ has a global maximum or global minimum, for $a \le x \le b$, at $x = c$ then there are 3
possibilities. Either
• $f'(c) = 0$, or
• $f'(c)$ does not exist, or
• $c = a$ or $c = b$.
That is, a global maximum or minimum must occur either at a critical point, a singular
point or at the endpoints of the interval.
This theorem provides the basis for a method to find the maximum and minimum values of $f(x)$
for $a \le x \le b$:
Corollary 8.2.3.
Let $f(x)$ be a function on the interval $a \le x \le b$. Then to find the global maximum and
minimum of the function:
• Make a list of all values of $c$, with $a \le c \le b$, for which
– $f'(c) = 0$, or
– $f'(c)$ does not exist, or
– $c = a$ or $c = b$.
That is, compute the function at all the critical points, singular points, and
endpoints.
• Evaluate $f(c)$ for each $c$ in that list. The largest (or smallest) of those values is the
largest (or smallest) value of $f(x)$ for $a \le x \le b$.
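The two steps of Corollary 8.2.3 can be sketched in a few lines of Python. This is only an illustration: the helper name `global_extrema` and the sample cubic are assumptions made here, and the critical and singular points must still be found by hand and passed in.

```python
def global_extrema(f, a, b, interior_candidates):
    """Return ((x_min, f_min), (x_max, f_max)) for f on [a, b].

    interior_candidates: critical and singular points found by hand
    (assumed complete); points outside a < c < b are ignored.
    """
    # Step 1: list the endpoints plus the interior candidates.
    points = [a, b] + [c for c in interior_candidates if a < c < b]
    # Step 2: evaluate f at each candidate and compare the values.
    values = [(c, f(c)) for c in points]
    smallest = min(values, key=lambda pair: pair[1])
    largest = max(values, key=lambda pair: pair[1])
    return smallest, largest


# Illustrative function (not from the text): f(x) = x^3 - 3x on [-2, 3].
# Its derivative 3x^2 - 3 vanishes at x = -1 and x = 1.
def cubic(x):
    return x**3 - 3 * x


lowest, highest = global_extrema(cubic, -2.0, 3.0, [-1.0, 1.0])
print("minimum:", lowest, "maximum:", highest)
```

The sketch only compares finitely many values, which is exactly why the corollary needs the candidate list to be complete.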
Let’s now demonstrate how to use this strategy. The function in this first example is not too
simple — but it is a good example of a function that contains both a singular point and a critical
point.
Example 8.2.4
Find the largest and smallest values of the function $f(x) = 2x^{5/3} + 3x^{2/3}$ for $-1 \le x \le 1$.
Solution. We will apply the method in Corollary 8.2.3. It is perhaps easiest to find the values at the
endpoints of the interval and then move on to the values at any critical or singular points.
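• Before we get into things, notice that we can rewrite the function by factoring it:
\[ f(x) = 2x^{5/3} + 3x^{2/3} = x^{2/3}(2x + 3) \]
• The values at the endpoints of the interval are
\[ f(1) = 2 + 3 = 5 \]
\[ f(-1) = 2\cdot(-1)^{5/3} + 3\cdot(-1)^{2/3} = -2 + 3 = 1 \]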
• To compute the function at the critical and singular points we first need to find the derivative:
\[ f'(x) = 2\cdot\frac{5}{3}x^{2/3} + 3\cdot\frac{2}{3}x^{-1/3} = \frac{10}{3}x^{2/3} + 2x^{-1/3} = \frac{10x + 6}{3x^{1/3}} \]
• Notice that the numerator and denominator are defined for all x. The only place the derivative
is undefined is when the denominator is zero. Hence the only singular point is at x = 0. The
corresponding function value is
f (0) = 0
O PTIMIZATION 8.3 M AX / MIN EXAMPLES
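• The derivative is zero when its numerator $10x + 6$ is zero, so the only critical point is $x = -3/5$.
Writing $f(x) = x^{2/3}(2x + 3)$, the corresponding function value is
\[ f(-3/5) = \left(\frac{9}{25}\right)^{1/3}\cdot\frac{9}{5} = 5\cdot\left(\frac{9}{25}\right)^{4/3}. \]
Since $0 < 9/25 < 1$, we know that $0 < \left(\frac{9}{25}\right)^{4/3} < 1$, and hence
\[ 0 < f(-3/5) = 5\cdot\left(\frac{9}{25}\right)^{4/3} < 5. \]
• We summarise the candidates in a table: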
c       $-3/5$                                          $0$               $-1$        $1$
type    critical point                                  singular point    endpoint    endpoint
f(c)    $\frac{9}{5}\sqrt[3]{\frac{9}{25}} \approx 1.28$     $0$               $1$         $5$
• The largest value of f in the table is 5 and the smallest value of f in the table is 0.
• For completeness we also sketch the graph of this function on the same interval.
[Figure: graph of $y = f(x) = 2x^{5/3} + 3x^{2/3}$ for $-1 \le x \le 1$]
Later (in Section 7) we will see how to construct such a sketch without using a calculator or
computer.
Example 8.2.4
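As a quick numerical cross-check of Example 8.2.4, the sketch below evaluates $f$ at the four candidate points. The helper `real_cbrt` is an assumption introduced here, because Python's `**` operator returns a complex number for fractional powers of negative floats.

```python
import math


def real_cbrt(x):
    """Real cube root, valid for negative x as well."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def f(x):
    # f(x) = 2x^(5/3) + 3x^(2/3), written in factored form x^(2/3) * (2x + 3)
    return real_cbrt(x) ** 2 * (2 * x + 3)


candidates = {
    "endpoint x = -1": -1.0,
    "critical point x = -3/5": -0.6,
    "singular point x = 0": 0.0,
    "endpoint x = 1": 1.0,
}
values = {name: f(c) for name, c in candidates.items()}
for name, value in values.items():
    print(f"{name}: f = {value:.4f}")
print("smallest:", min(values.values()), "largest:", max(values.values()))
```

The printed values should agree with the table in the example: the largest is at the endpoint $x = 1$ and the smallest at the singular point $x = 0$.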
(1) Read — read the problem carefully. Work out what information is given in the statement of the
problem and what we are being asked to compute.
(2) Diagram — draw a diagram. This will typically help you to identify what you know about the
problem and what quantities you need to work out.
(3) Variables — assign variables to the quantities in the problem along with their units. It is typically
a good idea to make sensible choices of variable names: A for area, h for height, t for time etc.
(4) Relations — find relations between the variables. By now you should know the quantity we
are interested in (the one we want to maximise or minimise) and we need to establish a relation
between it and the other variables.
(5) Reduce — the relation down to a function of one variable. In order to apply the calculus we
know, we must have a function of a single variable. To do this we need to use all the information
we have to eliminate variables. We should also work out the domain of the resulting function.
(6) Maximise or minimise — we can now apply the methods of Corollary 8.2.3 to find the maximum
or minimum of the quantity we need (as the problem dictates).
(7) Be careful — make sure your answer makes sense. Make sure quantities are physical. For
example, lengths and areas cannot be negative.
(8) Answer the question — be sure your answer really answers the question asked in the problem.
• We need to determine the area of the two types of materials used and the corresponding total
cost.
• In the picture we have already introduced two variables. The square base has side-length $b$
metres and the container has height $h$ metres. Let the area of the base be $A_b$ and the area of the other five
sides be $A_s$ (both in m$^2$), and the total cost be $C$ (in dollars). Finally let the volume enclosed
be $V$ m$^3$.
\[ A_b = b^2 \]
\[ A_s = 4bh + b^2 \]
\[ V = b^2 h \]
\[ C = 5\cdot A_b + 1\cdot A_s = 5b^2 + 4bh + b^2 = 6b^2 + 4bh. \]
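• To eliminate one of the variables we use the fact that the total cost is 72 dollars. Setting $C = 72$ gives
\[ 72 = 6b^2 + 4bh \quad\Longrightarrow\quad h = \frac{72 - 6b^2}{4b} = \frac{3}{2}\cdot\frac{12 - b^2}{b}, \]
so the volume becomes a function of $b$ alone:
\[ V(b) = b^2 h = \frac{3}{2}\,b\,(12 - b^2) = 18b - \frac{3}{2}b^3. \]
Since $b \ge 0$ and $h \ge 0$, we need
\[ 12 - b^2 \ge 0 \]
and so $b \le \sqrt{12}$.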
• Now we can apply Corollary 8.2.3 on the above expression for the volume with $0 \le b \le \sqrt{12}$.
The endpoints give:
\[ V(0) = 0, \qquad V(\sqrt{12}) = 0 \]
The derivative is
\[ V'(b) = 18 - \frac{9b^2}{2} \]
Since this is a polynomial there are no singular points. However we can solve $V'(b) = 0$ to
find critical points:
\[ 18 - \frac{9b^2}{2} = 0 \qquad \text{(divide by 9 and multiply by 2)} \]
\[ 4 - b^2 = 0 \]
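Hence $b = \pm 2$. Thus the only critical point in the domain is $b = 2$. The corresponding volume
is
\[ V(2) = 18 \times 2 - \frac{3}{2} \times 2^3 = 36 - 12 = 24. \]
Since $V(2) = 24$ exceeds the value 0 at both endpoints, the maximum volume occurs at $b = 2$, where the height is
\[ h = \frac{3}{2}\cdot\frac{12 - b^2}{b} = \frac{3}{2}\cdot\frac{12 - 4}{2} = 6. \]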
• All our quantities make sense; lengths, areas and volumes are all non-negative.
• Checking the question again, we see that we are asked for the dimensions of the container
(rather than its volume) so we can answer with: the largest container that can be built has
dimensions $2 \times 2 \times 6$ metres.
Example 8.3.1
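The answer to Example 8.3.1 can be sanity-checked numerically. The sketch below assumes the relations reconstructed from the cost constraint, $h = (72 - 6b^2)/(4b)$ and $V(b) = 18b - \frac{3}{2}b^3$, and scans the domain $0 < b < \sqrt{12}$ on a fine grid. The grid size is an arbitrary choice.

```python
import math


def height(b):
    # From the cost constraint 72 = 6b^2 + 4bh
    return (72 - 6 * b**2) / (4 * b)


def volume(b):
    # V = b^2 * h = 18b - (3/2) b^3
    return 18 * b - 1.5 * b**3


# Coarse scan of the open interval 0 < b < sqrt(12)
upper = math.sqrt(12)
grid = [upper * i / 10000 for i in range(1, 10000)]
best_b = max(grid, key=volume)

print("best b on grid:", round(best_b, 3))
print("volume there:  ", round(volume(best_b), 3))
print("height there:  ", round(height(best_b), 3))
```

A grid scan is no substitute for the calculus argument, but it is a cheap way to catch algebra slips when reducing to one variable.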
Example 8.3.2
A rectangular sheet of cardboard is 6 inches by 9 inches. Four identical squares are cut from the
corners of the cardboard, as shown in the figure below, and the remaining piece is folded into an
open rectangular box. What should the size of the cut out squares be in order to maximize the
volume of the box?
Solution. This one is quite similar to the previous one, so we perhaps don’t need to go into so much
detail.
• Let the height of the box be $x$ inches, and the base be $\ell \times w$ inches. The volume of the box is
then $V$ cubic inches.
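• Cutting a square of side $x$ from each corner of the 6 by 9 sheet leaves a base with $\ell = 9 - 2x$
and $w = 6 - 2x$, so
\[ V(x) = x\,\ell\,w = x(9 - 2x)(6 - 2x) = 4x^3 - 30x^2 + 54x. \]
All lengths must be non-negative,
\[ x, \ell, w \ge 0, \]
which restricts the domain to $0 \le x \le 3$.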
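• We can now apply Corollary 8.2.3. First the endpoints of the interval give
\[ V(0) = 0, \qquad V(3) = 0 \]
With $V(x) = x(9 - 2x)(6 - 2x)$, the derivative is
\[ V'(x) = 12x^2 - 60x + 54 = 6(2x^2 - 10x + 9). \]
Since this is a polynomial there are no singular points. To find critical points we solve
$V'(x) = 0$ to get
\[ x_\pm = \frac{10 \pm \sqrt{100 - 4 \times 2 \times 9}}{4} = \frac{10 \pm \sqrt{28}}{4} = \frac{10 \pm 2\sqrt{7}}{4} = \frac{5 \pm \sqrt{7}}{2} \]
We can then use a calculator to approximate
\[ x_+ \approx 3.82, \qquad x_- \approx 1.18. \]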
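• Only $x_-$ lies in the domain $0 \le x \le 3$. Since $V(x_-) > 0$ while $V$ vanishes at both endpoints,
the volume is maximised by cutting out squares of side $x_- = \frac{5 - \sqrt{7}}{2} \approx 1.18$ inches.
• Notice that since $0 < x_- < 3$ we know that the other lengths are positive, so our answer makes
sense. Further, the question only asks for the length $x$ and not the resulting volume so we have
answered the question.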
7 Say if we do not have a calculator to hand, or your instructor insists that the problem be done without one.
Example 8.3.2
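A short check of the critical points in Example 8.3.2, assuming the box has base $(9 - 2x) \times (6 - 2x)$ and height $x$ (the figure is not reproduced here, so that labelling is an assumption):

```python
import math


def volume(x):
    # Base is (9 - 2x) by (6 - 2x) inches, height is x inches
    return x * (9 - 2 * x) * (6 - 2 * x)


def volume_prime(x):
    # Derivative of V(x) = 4x^3 - 30x^2 + 54x
    return 12 * x**2 - 60 * x + 54


x_minus = (5 - math.sqrt(7)) / 2
x_plus = (5 + math.sqrt(7)) / 2

print("x_minus =", round(x_minus, 2), " V'(x_minus) =", volume_prime(x_minus))
print("x_plus  =", round(x_plus, 2), " (outside the domain 0 <= x <= 3)")
print("V(x_minus) =", round(volume(x_minus), 2), "cubic inches")
```

Only the smaller root is physically meaningful, since cutting squares of side larger than 3 inches from a 6 inch edge is impossible.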
There is a new wrinkle in the next two examples. Each involves finding the minimum value
of a function $f(x)$ with $x$ running over all real numbers, rather than just over a finite interval as in
Corollary 8.2.3. Both in Example 8.3.4 and in Example 8.3.5 the function $f(x)$ tends to $+\infty$ as $x$
tends to either $+\infty$ or $-\infty$. So the minimum value of $f(x)$ will be achieved for some finite value of
$x$, which will be a local minimum as well as a global minimum.
Theorem 8.3.3.
Let $f(x)$ be defined and continuous for all $-\infty < x < \infty$. Let $c$ be a finite real number.
(a) If $\lim_{x\to+\infty} f(x) = +\infty$ and $\lim_{x\to-\infty} f(x) = +\infty$ and if $f(x)$ has a global minimum at
$x = c$, then there are 2 possibilities. Either
• $f'(c) = 0$, or
• $f'(c)$ does not exist
That is, a global minimum must occur either at a critical point or at a singular point.
(b) If $\lim_{x\to+\infty} f(x) = -\infty$ and $\lim_{x\to-\infty} f(x) = -\infty$ and if $f(x)$ has a global maximum at
$x = c$, then there are 2 possibilities. Either
• $f'(c) = 0$, or
• $f'(c)$ does not exist
That is, a global maximum must occur either at a critical point or at a singular point.
Example 8.3.4
Find the point on the line y = 6 ´ 3x that is closest to the point (7, 5).
• A simple picture
• Some notation is already given to us. Let a point on the line have coordinates $(x, y)$, and we
do not need units. And let $\ell$ be the distance from the point $(x, y)$ to the point $(7, 5)$.
• Since the points are on the line the coordinates $(x, y)$ must obey
\[ y = 6 - 3x \]
Notice that $x$ and $y$ have no further constraints. The distance $\ell$ is given by
\[ \ell^2 = (x - 7)^2 + (y - 5)^2 \]
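• Substituting $y = 6 - 3x$ into $\ell^2 = (x - 7)^2 + (y - 5)^2$ gives
\[ \ell^2 = (x - 7)^2 + (1 - 3x)^2 = 10x^2 - 20x + 50 = 10(x^2 - 2x + 5), \qquad \ell(x) = \sqrt{10}\,\sqrt{x^2 - 2x + 5}. \]
As $x \to \pm\infty$ the distance $\ell \to +\infty$, so Theorem 8.3.3 applies.
– The derivative is
\[ \frac{d\ell}{dx} = \sqrt{10}\,\frac{x - 1}{\sqrt{x^2 - 2x + 5}}, \]
which is zero when $x = 1$ and could fail to exist only if $x^2 - 2x + 5 \le 0$. However, since
$x^2 - 2x + 5 = (x - 1)^2 + 4$,
we know that $x^2 - 2x + 5 \ge 4$. Thus the function has no singular points and the only
critical point occurs at $x = 1$. The corresponding function value is then
\[ \ell(1) = \sqrt{10}\,\sqrt{1 - 2 + 5} = 2\sqrt{10}. \]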
– Thus the minimum value of the distance is $\ell = 2\sqrt{10}$ and occurs at $x = 1$.
• This answer makes sense — the distance is not negative.
• The question asks for the point that minimises the distance, not that minimum distance. Hence
the answer is $x = 1$, $y = 6 - 3 = 3$. I.e.
The point that minimises the distance is (1, 3).
Notice that we can make the analysis easier by observing that the point that minimises the
distance also minimises the squared-distance. So instead of minimising the function $\ell$, we can
just minimise $\ell^2$:
\[ \ell^2 = 10(x^2 - 2x + 5) \]
The resulting algebra is a bit easier and we don’t have to hunt for singular points.
Example 8.3.4
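The squared-distance shortcut from Example 8.3.4 is easy to illustrate in code. The sketch below is a hedged check rather than a general method: it uses the fact that $\ell^2 = 10x^2 - 20x + 50$ is a quadratic, whose vertex formula $x = -B/(2A)$ is standard.

```python
import math


def squared_distance(x):
    # Point (x, y) on the line y = 6 - 3x, measured from (7, 5)
    y = 6 - 3 * x
    return (x - 7) ** 2 + (y - 5) ** 2


# l^2 = 10x^2 - 20x + 50 is a quadratic A x^2 + B x + C with A = 10, B = -20;
# its minimum sits at the vertex x = -B / (2A).
x_star = 20 / (2 * 10)
y_star = 6 - 3 * x_star

print("closest point:", (x_star, y_star))
print("minimum distance:", math.sqrt(squared_distance(x_star)))
print("2*sqrt(10) =", 2 * math.sqrt(10))
```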
Example 8.3.5
Find the minimum distance from $(2, 0)$ to the curve $y^2 = x^2 + 1$.
Solution. This is very much like the previous question.
• After reading the problem carefully we can draw a picture
[Figure: the curve $y^2 = x^2 + 1$ with a point $(x, y)$ on it, and the point $(2, 0)$ on the $x$-axis]
• In this problem we do not need units and the variables $x$, $y$ are supplied. We define the distance
to be $\ell$ and it is given by
\[ \ell^2 = (x - 2)^2 + y^2. \]
As noted in the previous problem, we will minimise the squared-distance since that also
minimises the distance.
• Since $x$, $y$ satisfy $y^2 = x^2 + 1$, we can write the squared-distance as a function of $x$:
\[ \ell^2 = (x - 2)^2 + (x^2 + 1) = 2x^2 - 4x + 5. \]
• Since the squared-distance is a polynomial it will not have any singular points, only critical
points. The derivative is
\[ \frac{d}{dx}\ell^2 = 2(x - 2) + 2x = 4x - 4 \]
so the only critical point occurs at $x = 1$.
• When $x = 1$, $y = \pm\sqrt{2}$ and the distance is
\[ \ell^2 = (1 - 2)^2 + (1 + 1) = 3, \qquad \ell = \sqrt{3} \]
and thus the minimum distance from the curve to $(2, 0)$ is $\sqrt{3}$.
Example 8.3.5
Example 8.3.6
A water trough is to be constructed from a metal sheet of width 45 cm by bending up one third of
the sheet on each side through an angle θ . Which θ will allow the trough to carry the maximum
amount of water?
Solution. Clearly $0 \le \theta \le \pi$, so we are back in the domain of Corollary 8.2.3.
• After reading the problem carefully we should realise that it is really asking us to maximise
the cross-sectional area. A figure really helps.
• From this we are led to define the height $h$ cm and cross-sectional area $A$ cm$^2$. Both are
functions of $\theta$. The height is
\[ h = 15\sin\theta \]
while the area can be computed as the sum of the central $15 \times h$ rectangle, plus two triangles.
Each triangle has height $h$ and base $15\cos\theta$. Hence
\[ A = 15h + 2\cdot\frac{1}{2}\cdot h\cdot 15\cos\theta = 15h\,(1 + \cos\theta) \]
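• Substituting $h = 15\sin\theta$ into $A = 15h\,(1 + \cos\theta)$ gives
\[ A(\theta) = 225\sin\theta\,(1 + \cos\theta) \]
where $0 \le \theta \le \pi$.
• The derivative is
\[ A'(\theta) = 225\left[\cos\theta\,(1 + \cos\theta) - \sin^2\theta\right] = 225\left[2\cos^2\theta + \cos\theta - 1\right]. \]
This is a continuous function, so there are no singular points. However we can still hunt for
critical points by solving $A'(\theta) = 0$. That is
\[ 2\cos^2\theta + \cos\theta - 1 = (2\cos\theta - 1)(\cos\theta + 1) = 0. \]
Hence we must have $\cos\theta = -1$ or $\cos\theta = \frac{1}{2}$. On the domain $0 \le \theta \le \pi$, this means $\theta = \pi/3$
or $\theta = \pi$.
• The endpoints and the critical point give
\[ A(0) = 0, \qquad A(\pi) = 0, \]
\[ A(\pi/3) = 225\sin(\pi/3)(1 + \cos(\pi/3)) = 225\cdot\frac{\sqrt{3}}{2}\cdot\left(1 + \frac{1}{2}\right) = 225\cdot\frac{3\sqrt{3}}{4} \approx 292.28 \]
• Thus the cross-sectional area is maximised when $\theta = \frac{\pi}{3}$.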
Example 8.3.6
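To see the trough result of Example 8.3.6 numerically, the following sketch scans the cross-sectional area $A(\theta) = 225\sin\theta\,(1 + \cos\theta)$ over $0 \le \theta \le \pi$. The grid resolution is an arbitrary assumption made for illustration.

```python
import math


def area(theta):
    # Cross-sectional area in cm^2 for a 45 cm sheet bent in thirds
    return 225 * math.sin(theta) * (1 + math.cos(theta))


steps = 10000
thetas = [math.pi * i / steps for i in range(steps + 1)]
best_theta = max(thetas, key=area)

print("best angle on grid:", best_theta)
print("pi / 3            :", math.pi / 3)
print("area at pi / 3    :", round(area(math.pi / 3), 2))
```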
Example 8.3.7
Find the points on the ellipse $\frac{x^2}{4} + y^2 = 1$ that are nearest to and farthest from the point $(1, 0)$.
Solution. While this is another distance problem, the possible values of x, y are bounded, so we
need Corollary 8.2.3 rather than Theorem 8.3.3.
[Figure: the ellipse with a point $(x, y(x))$ marked on it]
• Let $\ell$ be the distance from the point $(x, y)$ on the ellipse to the point $(1, 0)$. As was the case
above, we will work with the squared-distance.
\[ \ell^2 = (x - 1)^2 + y^2. \]
• Since $(x, y)$ lies on the ellipse we have
\[ \frac{x^2}{4} + y^2 = 1 \]
Note that this also shows that $-2 \le x \le 2$ and $-1 \le y \le 1$.
Isolating $y^2$ and substituting this into our expression for $\ell^2$ gives
\[ \ell^2 = (x - 1)^2 + \underbrace{1 - x^2/4}_{= y^2}. \]
• Now we can apply Corollary 8.2.3. The endpoints of the domain give
\[ \ell^2(-2) = 9, \qquad \ell^2(2) = 1 \]
The derivative is
\[ \frac{d}{dx}\ell^2 = 2(x - 1) - x/2 = \frac{3x}{2} - 2 \]
Thus there are no singular points, but there is a critical point at $x = 4/3$. The corresponding
squared-distance is
\[ \ell^2(4/3) = \left(\frac{4}{3} - 1\right)^2 + 1 - \frac{(4/3)^2}{4} = (1/3)^2 + 1 - (4/9) = 6/9 = 2/3. \]
x         (x, y)                          ℓ
$-2$      $(-2, 0)$                       $3$
$4/3$     $(4/3, \pm\sqrt{5}/3)$          $\sqrt{2/3}$
$2$       $(2, 0)$                        $1$
The point of maximum distance is $(-2, 0)$, and the points of minimum distance are $(4/3, \pm\sqrt{5}/3)$.
Example 8.3.7
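The table in Example 8.3.7 can be reproduced with a few lines of Python. This is only a sketch of the final comparison step; the candidate list (two endpoints and one critical point) is taken from the worked solution above.

```python
import math


def squared_distance(x):
    # l^2 = (x - 1)^2 + y^2 with y^2 = 1 - x^2/4 on the ellipse
    return (x - 1) ** 2 + 1 - x**2 / 4


candidates = [-2.0, 4 / 3, 2.0]  # endpoints and the critical point
distances = {x: math.sqrt(squared_distance(x)) for x in candidates}

nearest = min(distances, key=distances.get)
farthest = max(distances, key=distances.get)

for x, d in distances.items():
    print(f"x = {x:.4f}: distance = {d:.4f}")
print("nearest x:", nearest, " farthest x:", farthest)
```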
Example 8.3.8
Find the dimensions of the rectangle of largest area that can be inscribed in an equilateral triangle of
side a if one side of the rectangle lies on the base of the triangle.
Solution. Since the rectangle must sit inside the triangle, its dimensions are bounded and we will
end up using Corollary 8.2.3.
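[Figure: (left) the equilateral triangle of side $a$ and height $\frac{\sqrt{3}a}{2}$, with angle $\pi/3$ at the base, drawn with vertices $(-a/2, 0)$, $(a/2, 0)$ and $(0, \sqrt{3}a/2)$; (right) the inscribed rectangle with top corners $(-x, y)$ and $(x, y)$]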
• We have drawn (on the left) the triangle in the $xy$-plane with its base on the $x$-axis. The base
has been drawn running from $(-a/2, 0)$ to $(a/2, 0)$ so its centre lies at the origin. A little
Pythagoras (or a little trigonometry) tells us that the height of the triangle is
\[ \sqrt{a^2 - (a/2)^2} = \frac{\sqrt{3}}{2}\cdot a = a\cdot\sin\frac{\pi}{3} \]
Thus the vertex at the top of the triangle lies at $\left(0, \frac{\sqrt{3}}{2}\cdot a\right)$.
• If we construct a rectangle that does not touch the sides of the triangle, then we can increase
the dimensions of the rectangle until it touches the triangle and so make its area larger. Thus
we can assume that the two top corners of the rectangle touch the triangle as drawn in the
right-hand figure above.
• Now let the rectangle be $2x$ wide and $y$ high. And let $A$ denote its area. Clearly
\[ A = 2xy \]
where $0 \le x \le a/2$ and $0 \le y \le \frac{\sqrt{3}}{2}a$.
• Our construction means that the top-right corner of the rectangle will have coordinates $(x, y)$
and lie on the line joining the top vertex of the triangle at $(0, \sqrt{3}a/2)$ to the bottom-right
vertex at $(a/2, 0)$. In order to write the area as a function of $x$ alone, we need the equation for
this line since it will tell us how to write $y$ as a function of $x$. The line has slope
\[ \text{slope} = \frac{\sqrt{3}a/2 - 0}{0 - a/2} = -\sqrt{3} \]
and passes through the point $(0, \sqrt{3}a/2)$, so any point $(x, y)$ on that line satisfies:
\[ y = -\sqrt{3}x + \frac{\sqrt{3}}{2}a. \]
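• Substituting $y = -\sqrt{3}x + \frac{\sqrt{3}}{2}a$ into $A = 2xy$ gives the area as a function of $x$ alone:
\[ A(x) = 2x\left(-\sqrt{3}x + \frac{\sqrt{3}}{2}a\right) = \sqrt{3}\,x\,(a - 2x) \]
with $0 \le x \le a/2$.
• Now we can apply Corollary 8.2.3. The endpoints give
\[ A(0) = 0, \qquad A(a/2) = 0. \]
The derivative is
\[ A'(x) = \sqrt{3}\,(x\cdot(-2) + 1\cdot(a - 2x)) = \sqrt{3}\,(a - 4x). \]
Since this is a polynomial there are no singular points, but there is a critical point at $x = a/4$.
There
\[ A(a/4) = \sqrt{3}\cdot\frac{a}{4}\cdot(a - a/2) = \sqrt{3}\cdot\frac{a^2}{8}, \]
\[ y = -\sqrt{3}\cdot(a/4) + \frac{\sqrt{3}}{2}a = \sqrt{3}\cdot\frac{a}{4}. \]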
• Checking the question again, we see that we are asked for the dimensions rather than the area,
so the answer is $2x \times y$:
The largest such rectangle has dimensions $\frac{a}{2} \times \frac{\sqrt{3}a}{4}$.
Example 8.3.8
This next one is a good physics example. In it we will derive Snell’s Law (footnote 9) from Fermat’s
principle (footnote 10).
Example 8.3.9
Consider the figure below which shows the trajectory of a ray of light as it passes through two
different mediums (say air and water).
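[Figure: a ray of light travelling from $P$ (in air) to the point $O$ on the interface and on to $Q$ (in water), with angle of incidence $\theta_i$ and angle of refraction $\theta_r$]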
Let $c_a$ be the speed of light in air and $c_w$ be the speed of light in water. Fermat’s principle states
that a ray of light will always travel along a path that minimises the time taken. So if a ray of light
travels from $P$ (in air) to $Q$ (in water) then it will “choose” the point $O$ (on the interface) so as to
minimise the total time taken. Use this idea to show Snell’s law,
\[ \frac{\sin\theta_i}{\sin\theta_r} = \frac{c_a}{c_w} \]
where $\theta_i$ is the angle of incidence and $\theta_r$ is the angle of refraction (as illustrated in the figure above).
Solution. This problem is a little more abstract than the others we have examined, but we can still
apply Theorem 8.3.3.
• We are given a figure in the statement of the problem and it contains all the relevant points
and angles. However it will simplify things if we decide on a coordinate system. Let’s assume
that the point $O$ lies on the $x$-axis, at coordinates $(x, 0)$. The point $P$ then lies above the axis at
$(X_P, +Y_P)$, while $Q$ lies below the axis at $(X_Q, -Y_Q)$. This is drawn below.
Footnote 9: Snell’s law is named after the Dutch astronomer Willebrord Snellius who derived it in around 1621, though it was
first stated accurately in 984 by Ibn Sahl.
Footnote 10: Named after Pierre de Fermat who described it in a letter in 1662. The beginnings of the idea, however, go back
as far as Hero of Alexandria in around 60 CE. Hero is credited with many inventions including the first vending
machine, and a precursor of the steam engine called an aeolipile.
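[Figure: the same ray in coordinates, with $P = (X_P, +Y_P)$, $O = (x, 0)$, $Q = (X_Q, -Y_Q)$, the points $(X_P, 0)$ and $(X_Q, 0)$ marked on the axis, and the angles $\theta_i$ and $\theta_r$]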
• The statement of Snell’s law contains terms $\sin\theta_i$ and $\sin\theta_r$, so it is a good idea for us to see
how to express these in terms of the coordinates we have just introduced:
\[ \sin\theta_i = \frac{\text{opposite}}{\text{hypotenuse}} = \frac{x - X_P}{\sqrt{(X_P - x)^2 + Y_P^2}} \]
\[ \sin\theta_r = \frac{\text{opposite}}{\text{hypotenuse}} = \frac{X_Q - x}{\sqrt{(X_Q - x)^2 + Y_Q^2}} \]
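• Let $\ell_P$ denote the distance $PO$, and $\ell_Q$ denote the distance $OQ$. Then we have
\[ \ell_P = \sqrt{(X_P - x)^2 + Y_P^2} \]
\[ \ell_Q = \sqrt{(X_Q - x)^2 + Y_Q^2} \]
Since time is distance divided by speed, the total time taken is
\[ T(x) = \frac{\ell_P}{c_a} + \frac{\ell_Q}{c_w}, \qquad T'(x) = -\frac{1}{c_a}\frac{X_P - x}{\sqrt{(X_P - x)^2 + Y_P^2}} - \frac{1}{c_w}\frac{X_Q - x}{\sqrt{(X_Q - x)^2 + Y_Q^2}}. \]
Notice that the terms inside the square-roots cannot be zero or negative since they are both
sums of squares and $Y_P, Y_Q > 0$. So there are no singular points, but there is a critical point
when $T'(x) = 0$, namely when
\[ 0 = \frac{1}{c_a}\frac{X_P - x}{\sqrt{(X_P - x)^2 + Y_P^2}} + \frac{1}{c_w}\frac{X_Q - x}{\sqrt{(X_Q - x)^2 + Y_Q^2}} = \frac{-\sin\theta_i}{c_a} + \frac{\sin\theta_r}{c_w} \]
Rearranging this gives Snell’s law, $\frac{\sin\theta_i}{\sin\theta_r} = \frac{c_a}{c_w}$.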
Example 8.3.10
The Statue of Liberty has height 46m and stands on a 47m tall pedestal. How far from the statue
should an observer stand to maximize the angle subtended by the statue at the observer’s eye, which
is 1.5m above the base of the pedestal?
Solution. Obviously if we stand too close then all the observer sees is the pedestal, while if they
stand too far then everything is tiny. The best spot for taking a photograph is somewhere in between.
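• Draw a careful picture. Let $x$ be the horizontal distance from the observer to the statue, let $p$ be
the height of the top of the pedestal above the observer’s eye, and let $h$ be the height of the statue.
Let $\varphi$ be the angle subtended at the eye by the pedestal and $\theta$ the angle subtended by the statue,
so that $\tan\varphi = p/x$ and $\tan(\varphi + \theta) = (p + h)/x$.
Thus
\[ \varphi = \arctan\frac{p}{x} \]
\[ \varphi + \theta = \arctan\frac{p + h}{x} \]
and so
\[ \theta = \arctan\frac{p + h}{x} - \arctan\frac{p}{x}. \]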
• If we allow the viewer to stand at any point in front of the statue, then $0 \le x < \infty$. Further
observe that as $x \to \infty$ or $x \to 0$ the angle $\theta \to 0$, since
\[ \lim_{x\to\infty}\arctan\frac{p + h}{x} = \lim_{x\to\infty}\arctan\frac{p}{x} = 0 \]
and
\[ \lim_{x\to 0^+}\arctan\frac{p + h}{x} = \lim_{x\to 0^+}\arctan\frac{p}{x} = \frac{\pi}{2} \]
Clearly the largest value of $\theta$ will be strictly positive and so has to be taken for some $0 < x < \infty$.
(Note the strict inequalities.) This $x$ will be a local maximum as well as a global maximum.
As $\theta$ is not singular at any $0 < x < \infty$, we need only search for critical points. A careful
application of the chain rule shows that the derivative is
\[ \frac{d\theta}{dx} = \frac{1}{1 + \left(\frac{p + h}{x}\right)^2}\cdot\frac{-(p + h)}{x^2} - \frac{1}{1 + \left(\frac{p}{x}\right)^2}\cdot\frac{-p}{x^2} = \frac{-(p + h)}{x^2 + (p + h)^2} + \frac{p}{x^2 + p^2} \]
So a critical point occurs when
\begin{align*}
\frac{p + h}{x^2 + (p + h)^2} &= \frac{p}{x^2 + p^2} && \text{cross multiply} \\
(p + h)(x^2 + p^2) &= p(x^2 + (p + h)^2) && \text{collect $x$ terms} \\
x^2(p + h - p) &= p(p + h)^2 - p^2(p + h) && \text{clean up} \\
hx^2 &= p(p + h)(p + h - p) = ph(p + h) && \text{cancel common factors} \\
x^2 &= p(p + h) \\
x &= \pm\sqrt{p(p + h)} \approx \pm 64.9\text{m}
\end{align*}
• Thus the best place to stand is approximately 64.9m in front of or behind the statue. At that point
$\theta \approx 0.348$ radians or $19.9^\circ$.
Example 8.3.10
Example 8.3.11
Find the length of the longest rod that can be carried horizontally (no tilting allowed) from a corridor
3m wide into a corridor 2m wide. The two corridors are perpendicular to each other.
Solution.
• Suppose that we are carrying the rod around the corner, then if the rod is as long as possible it
must touch the corner and the outside walls of both corridors. A picture of this is shown below.
You can see that this gives rise to two similar triangles, one inside each corridor. Also the
maximum length of the rod changes with the angle it makes with the walls of the corridor.
• Suppose that the angle between the rod and the inner wall of the 3m corridor is $\theta$, as illustrated
in the figure above. At the same time it will make an angle of $\frac{\pi}{2} - \theta$ with the outer wall of the
2m corridor. Denote by $\ell_1(\theta)$ the length of the part of the rod forming the hypotenuse of the
upper triangle in the figure above. Similarly, denote by $\ell_2(\theta)$ the length of the part of the rod
forming the hypotenuse of the lower triangle in the figure above. Then
\[ \ell_1(\theta) = \frac{3}{\sin\theta}, \qquad \ell_2(\theta) = \frac{2}{\cos\theta} \]
and the total length is
\[ \ell(\theta) = \ell_1(\theta) + \ell_2(\theta) = \frac{3}{\sin\theta} + \frac{2}{\cos\theta} \]
where $0 \le \theta \le \frac{\pi}{2}$.
• The length of the longest rod we can move through the corridor in this way is the minimum of
$\ell(\theta)$. Notice that $\ell(\theta)$ is not defined at $\theta = 0, \frac{\pi}{2}$. Indeed we find that as $\theta \to 0^+$ or $\theta \to \frac{\pi}{2}^-$,
the length $\ell \to +\infty$. (You should be able to picture what happens to our rod in those two
limits). Clearly the minimum allowed $\ell(\theta)$ is going to be finite and will be achieved for some
$0 < \theta < \frac{\pi}{2}$ (note the strict inequalities) and so will be a local minimum as well as a global
minimum. So we only need to find zeroes of $\ell'(\theta)$. Differentiating $\ell$ gives
\[ \frac{d\ell}{d\theta} = -\frac{3\cos\theta}{\sin^2\theta} + \frac{2\sin\theta}{\cos^2\theta} = \frac{-3\cos^3\theta + 2\sin^3\theta}{\sin^2\theta\cos^2\theta}. \]
This does not exist at $\theta = 0, \frac{\pi}{2}$ (which we have already analysed) but does exist at every
$0 < \theta < \frac{\pi}{2}$ and is equal to zero when the numerator is zero. Namely when
\[ 2\sin^3\theta = 3\cos^3\theta, \quad\text{i.e.}\quad \tan\theta = \sqrt[3]{\frac{3}{2}}. \]
O PTIMIZATION 8.4 S AMPLE OPTIMIZATION PROBLEMS
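• From this we can recover $\sin\theta$ and $\cos\theta$, without having to compute $\theta$ itself. We can, for
example, construct a right-angle triangle with adjacent length $\sqrt[3]{2}$ and opposite length $\sqrt[3]{3}$ (so
that $\tan\theta = \sqrt[3]{3/2}$):
[Figure: right-angle triangle with angle $\theta$, adjacent side $\sqrt[3]{2}$, opposite side $\sqrt[3]{3}$ and hypotenuse $\sqrt{2^{2/3} + 3^{2/3}}$]
It has hypotenuse $\sqrt{3^{2/3} + 2^{2/3}}$, and so
\[ \sin\theta = \frac{3^{1/3}}{\sqrt{3^{2/3} + 2^{2/3}}} \]
\[ \cos\theta = \frac{2^{1/3}}{\sqrt{3^{2/3} + 2^{2/3}}} \]
• Using the above expressions for $\sin\theta$, $\cos\theta$ we find the minimum of $\ell$ (which is the longest
rod that we can move):
\[ \ell = \frac{3}{\sin\theta} + \frac{2}{\cos\theta} = \frac{3}{\frac{\sqrt[3]{3}}{\sqrt{2^{2/3} + 3^{2/3}}}} + \frac{2}{\frac{\sqrt[3]{2}}{\sqrt{2^{2/3} + 3^{2/3}}}} = \sqrt{2^{2/3} + 3^{2/3}}\left(3^{2/3} + 2^{2/3}\right) = \left(2^{2/3} + 3^{2/3}\right)^{3/2} \approx 7.02\text{m} \]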
Example 8.3.11
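The closed form in Example 8.3.11 can be compared against a direct evaluation of $\ell(\theta) = 3/\sin\theta + 2/\cos\theta$ at the critical angle. The sketch below does compute $\theta$ itself (via `math.atan`), purely as a numerical cross-check of the triangle argument.

```python
import math


def rod_length(theta):
    # Total length touching the corner and both outer walls
    return 3 / math.sin(theta) + 2 / math.cos(theta)


# Critical angle from tan(theta) = (3/2)^(1/3)
theta_star = math.atan((3 / 2) ** (1 / 3))
closed_form = (2 ** (2 / 3) + 3 ** (2 / 3)) ** 1.5

print("l(theta_star):", rod_length(theta_star))
print("closed form  :", closed_form)
```

Both numbers should agree to rounding error and round to 7.02, matching the value quoted in the example.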
Concept Check-In
1. Give an example of units for N.
Here N is the independent variable, and G(N ) is the function of interest. All other quantities are
constant:
• $K > 0$ is a constant, called the carrying capacity. It represents the population density that a
given environment can sustain.
a) Find the population density N that leads to the maximal growth rate G(N ).
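Solution. Expanding the growth law gives
\[ G(N) = rN\,\frac{K - N}{K} = rN - \frac{r}{K}N^2, \]
from which it is apparent that $G(N)$ is a polynomial in powers of $N$, with constant coefficients $r$ and
$r/K$.
a) To find critical points of $G(N)$, we find $N$ such that $G'(N) = 0$, and then test for maxima:
\[ G'(N) = r - 2\frac{r}{K}N = 0 \quad\Rightarrow\quad r = 2\frac{r}{K}N \quad\Rightarrow\quad N = \frac{K}{2}. \]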
Figure 8.1: In logistic growth, the population growth rate G depends on population size N as
shown here.
Hence, $N = K/2$ is a critical point, but is it a maximum? We check this in one of several
ways. First, a sketch in Figure 8.1 reveals a downwards-opening parabola. This confirms a local
maximum. Alternately, we can apply Theorem 8.1.4:
\[ G''(N) = -2\frac{r}{K} < 0 \quad\Rightarrow\quad G(N) \text{ concave down} \quad\Rightarrow\quad N = \frac{K}{2} \text{ is a local maximum} \]
Thus, the population density with the greatest growth rate is $K/2$.
b) The maximal growth rate is found by evaluating the function $G$ at the critical point, $N = K/2$,
\[ G\!\left(\frac{K}{2}\right) = r\,\frac{K}{2}\left(\frac{K - \frac{K}{2}}{K}\right) = r\,\frac{K}{2}\cdot\frac{1}{2} = \frac{rK}{4}. \]
c) To find the population size at which the growth rate is zero, we set $G = 0$ and solve for $N$:
\[ G(N) = rN\,\frac{K - N}{K} = 0. \]
There are two solutions. One is trivial: N = 0. (This is biologically interesting in the sense
that it rules out the ancient idea of spontaneous generation - a defunct theory that held that
life can arise on its own, from dust or air. If N = 0, the growth rate is also 0, so no population
spontaneously arises according to logistic growth.) The second solution, N = K means that the
population is at its “carrying capacity”.
♦
We return to this type of growth in Chapter 13.
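The logistic results above can be illustrated numerically. In the sketch below the values $r = 0.5$ and $K = 1000$ are hypothetical, chosen only so that the arithmetic is exact. The growth law $G(N) = rN(K - N)/K$ is the one used in part c).

```python
def growth_rate(N, r, K):
    # Logistic growth law G(N) = r N (K - N) / K
    return r * N * (K - N) / K


r, K = 0.5, 1000.0  # hypothetical intrinsic growth rate and carrying capacity

# Scan population densities between 0 and K
densities = [K * i / 1000 for i in range(1001)]
best_N = max(densities, key=lambda N: growth_rate(N, r, K))

print("density with fastest growth:", best_N, "(K/2 =", K / 2, ")")
print("maximal growth rate:", growth_rate(best_N, r, K), "(rK/4 =", r * K / 4, ")")
print("growth at N = 0 and N = K:", growth_rate(0.0, r, K), growth_rate(K, r, K))
```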
Figure 8.2: Barrels come in various shapes. But the cost of a barrel of wine was determined by the
length L (dashed blue line segment) of the wet portion of the rod inserted into the tap hole. Kepler
figured out which barrels contain the most wine for a given price.
Kepler sought the wine barrel that contains the most wine for a given cost. This is equivalent to
asking which cylinder has the largest volume for a fixed (constant) length L. Below, we solve this
optimization problem. An alternate approach is to seek the wine barrel that costs least for a given
volume, which leads to the same result.
Example 8.4.2. Find the proportions (height:radius) of the cylinder with largest volume for a fixed
length L (dashed line segment in Figure 8.2).
Solution. We make the following assumptions:
2. the tap-hole (normally sealed to avoid leaks) is half-way up the height of the barrel, and
Concept Check-In
3. Give two different examples of barrel dimensions which would both yield a volume of
160 L.
Let $r$, $h$ denote the radius and height of the barrel. These two variables uniquely determine the shape
as well as the volume of the barrel. Note that because the barrel is assumed to be full, the volume of
the cylinder is the same as the volume of wine, namely
\[ V = \pi r^2 h. \qquad (8.4.1) \]
The rod used to “measure” the amount of wine (and hence determine the cost of the barrel) is shown
as the diagonal of length L in Figure 8.3. Because the cylinder walls are perpendicular to its base, the
length L is the hypotenuse of a right-angle triangle whose other sides have lengths 2r and h/2. (This
follows from the assumption that the tap hole is half-way up the side.) Thus, by the Pythagorean
theorem,
\[ L^2 = (2r)^2 + \left(\frac{h}{2}\right)^2. \qquad (8.4.2) \]
The problem can now be stated mathematically: maximize V in Eqn. (8.4.1) subject to a fixed
value of L in Eqn. (8.4.2). The fact that L is fixed means that we have a constraint, as before, that
we use to reduce the number of variables in the problem.
Figure 8.3: We simplify the problem to a cylindrical barrel with diameter 2r and height h. We
assumed that the height of the tap-hole is h/2. Length L denotes the “wet” portion of the merchant’s
rod, used to determine the cost. We observe a Pythagorean triangle formed by the dashed line
segments.
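Using Eqn. (8.4.2) to eliminate $r$, we have $r^2 = \frac{1}{4}\left(L^2 - \frac{h^2}{4}\right)$, so the volume becomes a function
of $h$ alone:
\[ V(h) = \pi r^2 h = \frac{\pi}{4}\left(L^2 h - \frac{h^3}{4}\right). \]
The function $V(h)$ is positive for $h$ in the range $0 < h < 2L$, and $V = 0$ at the two endpoints of the
interval $0 \le h \le 2L$. We can restrict attention to this interval since otherwise $V < 0$, which makes no physical
sense. Since $V(h)$ is a smooth function, we anticipate that somewhere inside this range of values
there should be a maximal volume.
Computing first and second derivatives, we find
\[ V'(h) = \frac{\pi}{4}\left(L^2 - \frac{3}{4}h^2\right), \qquad V''(h) = \frac{\pi}{4}\left(0 - 2\cdot\frac{3}{4}h\right) = -\frac{3}{8}\pi h < 0. \]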
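Setting $V'(h) = 0$ gives $h = \frac{2L}{\sqrt{3}}$, and then Eqn. (8.4.2) gives $r = \frac{L}{\sqrt{3}\sqrt{2}}$. Since $V''(h) < 0$ for $h > 0$,
this critical point is a maximum. The proportions of the barrel are therefore
\[ \frac{h}{r} = \frac{2\frac{L}{\sqrt{3}}}{\frac{1}{\sqrt{3}\sqrt{2}}L} = 2\sqrt{2}. \qquad (8.4.3) \]
Hence, for greatest economy, Kepler would have purchased barrels with height to radius ratio of
$2\sqrt{2} \approx 2.83 \approx 3$. ♦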
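A quick numerical check of the barrel proportions, as a sketch only. The rod length $L = 1$ is a hypothetical value (the ratio $h/r$ does not depend on it), and the function below uses the constraint $L^2 = (2r)^2 + (h/2)^2$ to eliminate $r$.

```python
import math


def barrel_volume(h, L):
    # Eliminate r using L^2 = (2r)^2 + (h/2)^2, then V = pi r^2 h
    r_squared = (L**2 - h**2 / 4) / 4
    return math.pi * r_squared * h


L = 1.0  # hypothetical rod length, any positive value works
h_star = 2 * L / math.sqrt(3)
r_star = math.sqrt((L**2 - h_star**2 / 4) / 4)

print("h / r    =", h_star / r_star)
print("2*sqrt(2) =", 2 * math.sqrt(2))
print("V(h*) vs neighbours:",
      barrel_volume(h_star - 0.01, L),
      barrel_volume(h_star, L),
      barrel_volume(h_star + 0.01, L))
```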
Concept Check-In
4. If all barrels had a radius of 25cm, given the result of Example 8.4.2, what would be the best
barrel height?
6. Consider a barrel with radius 25cm and height 100cm. What is this barrel’s volume?
Figure 8.4: A bird travels daily to forage in a food patch. We want to determine how long it should
stay in the patch to optimize its overall average energy gain per unit time.
• τ = travel time between nest and food patch (this is considered to be time that is unavoidably
wasted).
• t = residence time in the patch (i.e. how long to spend foraging in one patch), also called
foraging time,
• f (t ) = total energy gained by foraging in a patch for time t.
In some patches, food is ample and found quickly, while in others, it takes time and effort to obtain.
The typical time needed to find food is reflected by various energy gain functions f (t ) shown in
Figure 8.5.
Example 8.4.3 (Energy gain versus patch residence time). For each panel in Figure 8.5, explain
what the graph of the total energy gain f (t ) is saying about the type of food patch: how easy or hard
is it to find food?
Solution. The types of food patches are as follows:
Figure 8.5: Examples of various total energy gain f (t ) for a given foraging time t. The shapes of
these functions determine how hard or easy it is to extract food from a food patch.
1. The energy gain is linearly proportional to time spent in the patch. In this case, the patch has
so much food that it is never depleted. It would make sense to stay in such a patch for as long
as possible.
2. Energy gain is independent of time spent. The animal gets the full quantity as soon as it gets
to the patch.
3. Food is gradually depleted, (the total energy gain levels off to some constant as t increases).
There is “diminishing return” for staying longer, suggesting that it is best not to stay too long.
4. The reward for staying longer in this patch increases: the net energy gain is concave up
($f''(t) > 0$), so its slope is increasing.
5. It takes time to begin to gain energy. After some time, the gain increases, but eventually, the
patch is depleted.
6. Staying too long in such a patch is disadvantageous, resulting in net loss of energy. It is
important to leave this patch early enough to avoid that loss.
Concept Check-In
8. Which model(s) can you automatically dismiss as not very biologically realistic?
b) From Chapter 1, $E_{\max}$ is the horizontal asymptote, corresponding to an upper bound for the total
amount of energy that can be extracted from the patch. The parameter $k$ has units of time and
controls the steepness of the function. Foraging for a time $t = k$ leads the animal to obtain half
of the total available energy, since $f(k) = E_{\max}/2$. ♦
Example 8.4.5 (Currency to optimize). We can assume that animals try to maximize the average
energy gain per unit time, defined by the ratio:
\[ R(t) = \frac{\text{Total energy gained}}{\text{total time spent}}, \]
Write down $R(t)$ for the assumed patch energy function Eqn. 8.4.4.
Solution. The ‘total time spent’ is a sum of the fixed amount of time τ traveling, and time t foraging.
The ‘total energy gained’ is $f(t)$. Thus, for the patch function $f(t)$ assumed in Eqn. (8.4.4),
\[ R(t) = \frac{f(t)}{\tau + t} = \frac{E_{\max}\,t}{(k + t)(\tau + t)}. \qquad (8.4.5) \]
♦
Concept Check-In
9. What units might be used in the function $R(t)$?
We can now state the mathematical problem:
Figure 8.6: In Example 8.4.6 we first compose a rough sketch of the average rate of energy gain
R(t ) in Eqn. (8.4.5). The graph is linear near the origin, and decays to zero at large t.
\[ R'(t) = E_{\max}\,\frac{k\tau - t^2}{(k + t)^2(\tau + t)^2} = 0. \qquad (8.4.6) \]
This can only be satisfied if the numerator is zero, that is
\[ k\tau - t^2 = 0 \quad\Rightarrow\quad t_{1,2} = \pm\sqrt{k\tau}. \]
Rejecting the (irrelevant) negative root, we deduce that the critical point of the function $R(t)$ is
$t_{\text{crit}} = \sqrt{k\tau}$. The sketch in Figure 8.6 verifies that this critical point is a local maximum. ♦
Example 8.4.7. For practice, use one of the calculus tests for critical points to show that $t_{\text{crit}} = \sqrt{k\tau}$
is a local maximum for the function $R(t)$ in Eqn. (8.4.5).
Solution. R(t ) is a rational function, so a second derivative is messy. Instead, we apply the first
derivative test –that is, we check the sign of R1 (t ) on both sides of the critical point.
• Eqn. (8.4.6) gives R1 (t ). Its denominator is positive, so the sign of R1 (t ) is determined by its
numerator, (kτ ´ t 2 ).
• Thus, R1 (t ) ą 0 for t ă tcrit , and R1 (t ) ă 0 for t ą tcrit .
This confirms that the function increases up to the critical point and decreases afterwards, so the
critical point is a local maximum, henceforth denoted tmax . ♦
To optimize the average rate of?energy gain, R(t ), we found that the animal should stay in the
patch for a duration of t = tmax = kτ.
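As a numerical sanity check of this result, the following sketch evaluates R(t) from Eqn. (8.4.5) on a grid and compares the best grid point with the predicted optimum. The parameter values and the names `average_energy_rate`, `grid` and `t_best` are illustrative assumptions, not values from the text.

```python
import math

def average_energy_rate(t, e_max, k, tau):
    """R(t) = E_max * t / ((k + t) * (tau + t)), as in Eqn. (8.4.5)."""
    return e_max * t / ((k + t) * (tau + t))

# Illustrative (made-up) parameter values.
e_max, k, tau = 10.0, 2.0, 8.0
t_max = math.sqrt(k * tau)  # predicted optimal residence time

# Coarse grid search over 0 < t <= 20 in steps of 0.01.
grid = [i / 100 for i in range(1, 2001)]
t_best = max(grid, key=lambda t: average_energy_rate(t, e_max, k, tau))

print("predicted optimum:", t_max)
print("best grid point:  ", t_best)
```

A grid search is crude, but it does not rely on the derivative computation, so agreement between the two values is an independent check.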
O PTIMIZATION 8.5 S UMMARY
Concept Check-In
10. Given tmax is the duration of time an animal should stay in a patch, and τ is travelling
time, explain why the constant k is also in units of time.
Example 8.4.8. Determine the average rate of energy gain at this optimal patch residence time, i.e.
find the maximal average rate of energy gain.
Solution. Computing $R(t)$ for $t = t_{max} = \sqrt{k\tau}$, we find that
\[ R(t_{max}) = \frac{E_{max}\, t_{max}}{(k + t_{max})(\tau + t_{max})} = \frac{E_{max}}{\tau} \cdot \frac{1}{\left(1 + \sqrt{k/\tau}\right)^2}. \tag{8.4.7} \]
8.5 Summary
1. Optimization is a process of finding critical points, and identifying local and global max-
ima/minima.
2. A scientific problem that addresses “biggest/smallest, best, most efficient” is often reducible to
an optimization problem.
3. As with all mathematical models, translating scientific observations and reasonable assump-
tions into mathematical terms is an important first step.
(a) Density dependent population growth. Using a given logistic growth law, the following
parameters were considered:
• population growth rate (to be maximized),
• population density,
• intrinsic growth rate (constant),
• carrying capacity (constant).
(b) Wine for Kepler’s wedding, seeking the largest barrel volume for a fixed diagonal length.
The following parameters were considered:
• barrel volume, (to be maximized)
• barrel height,
• barrel radius,
• length of the diagonal (constant).
(c) Foraging time for an animal collecting food. We considered:
• travel time between nest and food patch,
• foraging time in the patch,
• energy gained by foraging in a patch for various time durations.
where N is the density of the population, under what circumstances is the population
growing fastest?
2. When finding a global maximum, why is it always imperative to check the endpoints?
3. Demonstrate the variability of barrel dimensions by giving two different height and radius
pairs which lead to a volume of 50L.
4. Would the answer to Kepler’s wine barrel problem have changed if we had solved for h2
instead of r2 ?
Chapter 9
APPROXIMATING FUNCTIONS NEAR A SPECIFIED POINT: TAYLOR POLYNOMIALS
Learning Objectives
• Use a linear approximation to approximate a differentiable function that is difficult to
evaluate exactly. This includes choosing an appropriate centre point.
Suppose that you are interested in the values of some function f (x) for x near some fixed point
a. When the function is a polynomial or a rational function we can use some arithmetic (and maybe
some hard work) to write down the answer. For example:
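\begin{align*}
f(x) &= \frac{x^2 - 3}{x^2 - 2x + 4} \\
f(1/5) &= \frac{\frac{1}{25} - 3}{\frac{1}{25} - \frac{2}{5} + 4} = \frac{\frac{1 - 75}{25}}{\frac{1 - 10 + 100}{25}} \\
&= \frac{-74}{91}
\end{align*}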
Tedious, but we can do it. On the other hand if you are asked to compute sin(1/10) then what can
TAYLOR P OLYNOMIALS 9.1 Z EROTH APPROXIMATION — THE CONSTANT APPROXIMATION
Figure 9.0.1. [Plot of the curves y = x and y = sin x for x between −1 and 1.]
The above figure shows that the curves y = x and y = sin x are almost the same when x is close
to 0. Hence if we want the value of sin(1/10) we could just use this approximation y = x to get
\[ \sin(1/10) \approx 1/10. \]
Of course, in this case we simply observed that one function was a good approximation of the other.
We need to know how to find such approximations more systematically.
More precisely, say we are given a function f (x) that we wish to approximate close to some
point x = a, and we need to find another function F (x) that
• is simple and easy to compute2
• is a good approximation to f (x) for x values close to a.
Further, we would like to understand how good our approximation actually is. Namely we need to
be able to estimate the error $|f(x) - F(x)|$.
There are many different ways to approximate a function and we will discuss one family of
approximations: Taylor polynomials. This is an infinite family of ever improving approximations,
and our starting point is the very simplest.
1 Originally the word “calculator” referred not to the software or electronic (or even mechanical) device we think of
today, but rather to a person who performed calculations.
2 It is no good approximating a function with something that is even more difficult to work with.
3 It barely counts as an approximation at all, but it will help build intuition. Because of this, and the fact that a
constant is a polynomial of degree 0, we’ll start counting our approximations from zero rather than 1.
To ensure that F (x) is a good approximation for x close to a, we choose A so that f (x) and F (x)
take exactly the same value when x = a.
\[ f(x) \approx f(a) \]
An important point to note is that we need to know f (a) — if we cannot compute that easily then
we are not going to be able to proceed. We will often have to choose a (the point around which we
are approximating f (x)) with some care to ensure that we can compute f (a).
Here is a figure showing the graphs of a typical f (x) and approximating function F (x).
[Figure: graphs of y = f(x) and the constant approximation y = F(x) = f(a), which meet at x = a.]
At x = a, f (x) and F (x) take the same value. For x very near a, the values of f (x) and F (x) remain
close together. But the quality of the approximation deteriorates fairly quickly as x moves away
from a. Clearly we could do better with a straight line that follows the slope of the curve. That is
our next approximation.
But before then, an example:
Example 9.1.2
Use the constant approximation to estimate $e^{0.1}$.
Solution. First set $f(x) = e^x$.
• Now we first need to pick a point x = a to approximate the function. This point needs to be
close to 0.1 and we need to be able to evaluate f (a) easily. The obvious choice is a = 0.
• Then our constant approximation is just
\[ F(x) = f(0) = e^0 = 1, \qquad F(0.1) = 1 \]
Note that $e^{0.1} = 1.105170918\ldots$, so even this approximation isn’t too bad.
TAYLOR P OLYNOMIALS 9.2 L INEAR APPROXIMATION
So we must have $B = f'(a)$. Substituting this into $A + Ba = f(a)$ we get $A = f(a) - a f'(a)$. So
we can write
\begin{align*}
F(x) = A + Bx &= \overbrace{f(a) - a f'(a)}^{A} + f'(a) \cdot x \\
&= f(a) + f'(a) \cdot (x - a)
\end{align*}
We write it in this form because we can now clearly see that our first approximation is just an
extension of our zeroth approximation. This first approximation is also often called the linear
approximation of f (x) about x = a.
We should again stress that in order to form this approximation we need to know $f(a)$ and $f'(a)$;
if we cannot compute them easily then we are not going to be able to proceed.
Recall, from Theorem 3.3.7, that $y = f(a) + f'(a)(x - a)$ is exactly the equation of the tangent
line to the curve y = f (x) at a. Here is a figure showing the graphs of a typical f (x) and the
approximating function F (x).
[Figure: graphs of f(x) and the linear approximation F(x) near x = a.]
Observe that the graph of $f(a) + f'(a)(x - a)$ remains close to the
graph of f (x) for a much larger range of x than did the graph of our constant approximation, f (a).
One can also see that we can improve this approximation if we can use a function that curves down
rather than being perfectly straight. That is our next approximation.
But before then, back to our example:
Example 9.2.2
Use the linear approximation to estimate $e^{0.1}$.
Solution. First set $f(x) = e^x$ and $a = 0$ as before.
• To form the linear approximation we need $f(a)$ and $f'(a)$:
\begin{align*}
f(x) &= e^x & f(0) &= 1 \\
f'(x) &= e^x & f'(0) &= 1
\end{align*}
• Then our linear approximation is
\[ F(x) = f(0) + x f'(0) = 1 + x, \qquad F(0.1) = 1.1 \]
Recall that $e^{0.1} = 1.105170918\ldots$, so the linear approximation is almost correct to 3 digits.
TAYLOR P OLYNOMIALS 9.3 Q UADRATIC APPROXIMATION
\begin{align*}
C &= \tfrac{1}{2} f''(a) && \text{substitute} \\
B &= f'(a) - 2Ca = f'(a) - a f''(a) && \text{substitute again} \\
A &= f(a) - Ba - Ca^2 = f(a) - a\,[f'(a) - a f''(a)] - \tfrac{1}{2} f''(a)\, a^2
\end{align*}
Oof! We again write it in this form because we can now clearly see that our second approximation is
just an extension of our first approximation.
Our second approximation is called the quadratic approximation:
Here is a figure showing the graphs of a typical f (x) and approximating function F (x). This new
[Figure: graphs of y = f(x) and the quadratic approximation y = F(x) = f(a) + f'(a)(x − a) + ½ f''(a)(x − a)², near x = a.]
\[ F(x) = \alpha + \beta \cdot (x - a) + \gamma \cdot (x - a)^2 \]
Then
And from these we can clearly read off the values of α, β and γ and so recover our function
F (x). Additionally if we write things this way, then it is quite clear how to extend this to a cubic
approximation and a quartic approximation and so on.
Return to our example:
Example 9.3.2
Use the quadratic approximation to estimate $e^{0.1}$.
Solution. Set $f(x) = e^x$ and $a = 0$ as before.
\begin{align*}
f(x) &= e^x & f(0) &= 1 \\
f'(x) &= e^x & f'(0) &= 1 \\
f''(x) &= e^x & f''(0) &= 1
\end{align*}
\[ F(x) = f(0) + x f'(0) + \frac{1}{2} x^2 f''(0) = 1 + x + \frac{x^2}{2}, \qquad F(0.1) = 1.105 \]
5 Any polynomial of degree two can be written in this form. For example, when a = 1, $3 + 2x + x^2 = 6 + 4(x - 1) + (x - 1)^2$.
Recall that $e^{0.1} = 1.105170918\ldots$, so the quadratic approximation is quite accurate with very little
effort.
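To see the three approximations of $e^{0.1}$ side by side, here is a small sketch that compares the constant, linear and quadratic approximations about a = 0 with the value returned by Python's `math.exp`. The dictionary name `approximations` is just an illustrative choice.

```python
import math

x = 0.1
exact = math.exp(x)

# Constant, linear and quadratic approximations of e^x about a = 0.
approximations = {
    "constant": 1.0,
    "linear": 1.0 + x,
    "quadratic": 1.0 + x + x**2 / 2,
}

for name, value in approximations.items():
    print(f"{name:>9}: {value:.6f}   error = {abs(exact - value):.2e}")
```

Each extra term shrinks the error by a couple of orders of magnitude here because x = 0.1 is close to the centre a = 0.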
Before we go on, let us first introduce (or revise) some notation that will make our discussion
easier.
In the remainder of this section we will frequently need to write sums involving a large number of
terms. Writing out the summands explicitly can become quite impractical — for example, say we
need the sum of the first 11 squares:
\[ 1 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 + 7^2 + 8^2 + 9^2 + 10^2 + 11^2 \]
This becomes tedious. Where the pattern is clear, we will often skip the middle few terms and
instead write
\[ 1 + 2^2 + \cdots + 11^2. \]
A far more precise way to write this is using Σ (capital-sigma) notation. For example, we can write
the above sum as
\[ \sum_{k=1}^{11} k^2 \]
This is read as
More generally
Notation 9.3.3.
Let $m \le n$ be integers and let $f(x)$ be a function defined on the integers. Then we write
\[ \sum_{k=m}^{n} f(k) \]
to mean
\[ f(m) + f(m+1) + f(m+2) + \cdots + f(n-1) + f(n). \]
Similarly we write
\[ \sum_{i=m}^{n} a_i \]
to mean
\[ a_m + a_{m+1} + a_{m+2} + \cdots + a_{n-1} + a_n. \]
\[ \sum_{k=3}^{7} \frac{1}{k^2} = \frac{1}{3^2} + \frac{1}{4^2} + \frac{1}{5^2} + \frac{1}{6^2} + \frac{1}{7^2} \]
It is important to note that the right hand side of this expression evaluates to a number6 ; it does not
contain “k”. The summation index k is just a “dummy” variable and it does not have to be called k.
For example
\[ \sum_{k=3}^{7} \frac{1}{k^2} = \sum_{i=3}^{7} \frac{1}{i^2} = \sum_{j=3}^{7} \frac{1}{j^2} = \sum_{\ell=3}^{7} \frac{1}{\ell^2} \]
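The same point can be checked by machine. The sketch below, which assumes only Python's standard `fractions` module, adds up the five terms exactly and confirms that renaming the summation index does not change the result. The variable names are illustrative.

```python
from fractions import Fraction

# Sum of 1/k^2 for k = 3, 4, 5, 6, 7, using exact rational arithmetic.
total_k = sum(Fraction(1, k**2) for k in range(3, 8))

# The same sum with the dummy index renamed from k to j.
total_j = sum(Fraction(1, j**2) for j in range(3, 8))

print(total_k)             # 46181/176400
print(total_k == total_j)  # True
```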
Also the summation index has no meaning outside the sum. For example
\[ k \sum_{k=3}^{7} \frac{1}{k^2} \]
6 Some careful addition shows it is 46181/176400.
7 Or possibly gobbledygook. For a discussion of statements without meaning and why one should avoid them we
recommend the book “Bendable learnings: the wisdom of modern management” by Don Watson.
TAYLOR P OLYNOMIALS 9.4 S TILL BETTER APPROXIMATIONS — TAYLOR POLYNOMIALS
\[ n! = n \times (n - 1) \times \cdots \times 3 \times 2 \times 1 \]
0! = 1
1! = 1 2! = 2 3! = 6
4! = 24 5! = 120 6! = 720
8 Polynomials are generally a good choice for an approximating function since they are so easy to work with. De-
pending on the situation other families of functions may be more appropriate. For example if you are approximating
a periodic function, then sums of sines and cosines might be a better choice; this leads to Fourier series.
9 Any polynomial in x of degree n can also be expressed as a polynomial in (x ´ a) of the same degree n and vice
versa. So Tn (x) really still is a polynomial of degree n.
10 Furthermore when x is close to a, $(x - a)^k$ decreases very quickly as k increases, which often makes the ”high k”
terms in $T_n(x)$ very small. This can be a considerable advantage when building up approximations by adding more
and more terms. If we were to rewrite $T_n(x)$ in the form $\sum_{k=0}^{n} b_k x^k$ the ”high k” terms would typically not be very
small when x is close to a.
\begin{align*}
T_n(a) &= c_0 \\
T_n'(a) &= c_1 \\
T_n''(a) &= 2 \cdot c_2 \\
T_n'''(a) &= 6 \cdot c_3 \\
&\;\;\vdots \\
T_n^{(n)}(a) &= n! \cdot c_n
\end{align*}
So now if we want to set the coefficients of $T_n(x)$ so that it agrees with $f(x)$ at x = a then we need
\[ T_n(a) = c_0 = f(a) \qquad c_0 = f(a) = \frac{1}{0!} f(a) \]
We also want the first n derivatives of $T_n(x)$ to agree with the derivatives of $f(x)$ at x = a, so
\begin{align*}
T_n'(a) &= c_1 = f'(a) & c_1 &= f'(a) = \frac{1}{1!} f'(a) \\
T_n''(a) &= 2 \cdot c_2 = f''(a) & c_2 &= \frac{1}{2} f''(a) = \frac{1}{2!} f''(a) \\
T_n'''(a) &= 6 \cdot c_3 = f'''(a) & c_3 &= \frac{1}{6} f'''(a) = \frac{1}{3!} f'''(a)
\end{align*}
More generally, making the kth derivatives agree at x = a requires:
\[ T_n^{(k)}(a) = k! \cdot c_k = f^{(k)}(a) \qquad c_k = \frac{1}{k!} f^{(k)}(a) \]
And finally the nth derivative:
\[ T_n^{(n)}(a) = n! \cdot c_n = f^{(n)}(a) \qquad c_n = \frac{1}{n!} f^{(n)}(a) \]
Putting this all together we have
11 It is actually possible to define the factorial of positive real numbers and even negative numbers but it requires
more advanced calculus and is outside the scope of this course. The interested reader should look up the Gamma
function.
TAYLOR P OLYNOMIALS 9.5 S OME EXAMPLES
\begin{align*}
f(x) \approx T_n(x) &= f(a) + f'(a)(x - a) + \frac{1}{2} f''(a) \cdot (x - a)^2 + \cdots + \frac{1}{n!} f^{(n)}(a) \cdot (x - a)^n \\
&= \sum_{k=0}^{n} \frac{1}{k!} f^{(k)}(a) \cdot (x - a)^k
\end{align*}
Let a be a constant and let n be a non-negative integer. The nth degree Taylor polynomial
for f (x) about x = a is
\[ T_n(x) = \sum_{k=0}^{n} \frac{1}{k!} f^{(k)}(a) \cdot (x - a)^k. \]
• While we can compute a Taylor polynomial about any a-value (providing the derivatives exist),
in order to be a useful approximation, we must be able to compute $f(a), f'(a), \ldots, f^{(n)}(a)$
easily. This means we must choose the point a with care. Indeed for many functions the
choice a = 0 is very natural, hence the prominence of Maclaurin polynomials.
• If we have computed the approximation $T_n(x)$, then we can readily extend this to the next
Taylor polynomial $T_{n+1}(x)$ since
\[ T_{n+1}(x) = T_n(x) + \frac{1}{(n+1)!} f^{(n+1)}(a) \cdot (x - a)^{n+1} \]
This is very useful if we discover that $T_n(x)$ is an insufficient approximation, because then we
can produce $T_{n+1}(x)$ without having to start again from scratch.
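The second remark above translates directly into code. The following sketch builds each Taylor polynomial value from the previous one by adding a single term. The function name `taylor_polynomials` and its argument layout (a list of derivative values at a, supplied by the caller) are assumptions made for illustration.

```python
import math

def taylor_polynomials(derivatives_at_a, a, x):
    """Return [T_0(x), T_1(x), ..., T_n(x)] given [f(a), f'(a), ..., f^(n)(a)]."""
    values = []
    total = 0.0
    for k, d in enumerate(derivatives_at_a):
        # T_k(x) = T_{k-1}(x) + f^(k)(a) * (x - a)^k / k!
        total += d * (x - a) ** k / math.factorial(k)
        values.append(total)
    return values

# Hypothetical usage: f(x) = e^x about a = 0, where every derivative at 0 equals 1.
approximations = taylor_polynomials([1.0] * 8, a=0.0, x=0.1)
for n, value in enumerate(approximations):
    print(f"T_{n}(0.1) = {value:.12f}")
```

The sketch does not compute derivatives itself; supplying them is exactly the step the text warns must be easy for the chosen point a.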
12 The polynomials are named after Brook Taylor who devised a general method for constructing them in 1715.
Slightly later, Colin Maclaurin made extensive use of the special case a = 0 (with attribution of the general case to
Taylor) and it is now named after him. The special case of a = 0 was worked on previously by James Gregory
and Isaac Newton, and some specific cases were known to the 14th century Indian mathematician Madhava of
Sangamagrama.
Example 9.5.1
The constant, linear and quadratic approximations we used above were the first few Maclaurin
polynomial approximations of $e^x$. That is
\[ T_0(x) = 1 \qquad T_1(x) = 1 + x \qquad T_2(x) = 1 + x + \frac{x^2}{2} \]
Since $\frac{d}{dx} e^x = e^x$, the Maclaurin polynomials are very easy to compute. Indeed this invariance under
differentiation means that
\[ f^{(n)}(x) = e^x \quad n = 0, 1, 2, \ldots \qquad \text{so} \qquad f^{(n)}(0) = 1 \]
Thus we can write down the seventh Maclaurin polynomial very easily:
\[ T_7(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \frac{x^5}{120} + \frac{x^6}{720} + \frac{x^7}{5040} \]
The following figure contains sketches of the graphs of ex and its Taylor polynomials Tn (x) for
n = 0, 1, 2, 3, 4.
[Figure: graphs of y = e^x and its Maclaurin polynomials y = T_0(x) = 1, y = T_1(x) = 1 + x, y = T_2(x) = 1 + x + x^2/2, y = T_3(x) and y = T_4(x), for x between −1 and 2.]
Example 9.5.2
For cosine:
Example 9.5.3
Find the 4th degree Maclaurin polynomial for cos x.
Solution. We have a = 0 and we need to find the first 4 derivatives of cos x.
\[ T_8(x) = 1 - \frac{x^2}{2} + \frac{x^4}{24} - \frac{x^6}{6!} + \frac{x^8}{8!} \]
Continuing this process gives us the 2nth Maclaurin polynomial
\[ T_{2n}(x) = \sum_{k=0}^{n} \frac{(-1)^k}{(2k)!} \cdot x^{2k} \]
Warning 9.5.4.
The above formula only works when x is measured in radians, because all of our derivative
formulae for trig functions were developed under the assumption that angles are measured
in radians.
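The formula for $T_{2n}(x)$ is short enough to evaluate directly. This sketch (the function name `cos_maclaurin` is an illustrative choice) sums the terms for a given n and compares the result with `math.cos`; in line with the warning, an angle given in degrees is converted to radians first.

```python
import math

def cos_maclaurin(x, n):
    """T_{2n}(x) = sum over k = 0..n of (-1)^k x^(2k) / (2k)!, with x in radians."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(n + 1))

x = 1.0  # radians
for n in range(5):
    approx = cos_maclaurin(x, n)
    print(f"T_{2 * n}(1) = {approx:.10f}   error = {abs(math.cos(x) - approx):.2e}")

# An angle in degrees must be converted before it is passed in.
print(cos_maclaurin(math.radians(60), 4))  # close to cos(60 degrees) = 0.5
```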
Below we plot cos x against its first few Maclaurin polynomial approximations:
Example 9.5.4
The above work is quite easily recycled to get the Maclaurin polynomial for sine:
Example 9.5.5
Find the 5th degree Maclaurin polynomial for sin x.
Solution. We could simply work as before and compute the first five derivatives of sin x. But set
$g(x) = \sin x$ and notice that $g(x) = -f'(x)$, where $f(x) = \cos x$. Then we have
\begin{align*}
g(0) &= -f'(0) = 0 \\
g'(0) &= -f''(0) = 1 \\
g''(0) &= -f'''(0) = 0 \\
g'''(0) &= -f^{(4)}(0) = -1 \\
g^{(4)}(0) &= -f^{(5)}(0) = 0 \\
g^{(5)}(0) &= -f^{(6)}(0) = 1
\end{align*}
\[ T_5(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} \]
Just as we extended to the 2nth Maclaurin polynomial for cosine, we can also extend our work to
compute the (2n + 1)th Maclaurin polynomial for sine:
\[ T_{2n+1}(x) = \sum_{k=0}^{n} \frac{(-1)^k}{(2k+1)!} \cdot x^{2k+1} \]
Warning 9.5.6.
The above formula only works when x is measured in radians, because all of our derivative
formulae for trig functions were developed under the assumption that angles are measured
in radians.
Below we plot sin x against its first few Maclaurin polynomial approximations.
Example 9.5.6
To get an idea of how good these Taylor polynomials are at approximating sin and cos, let’s
concentrate on sin x and consider x’s whose magnitude $|x| \le 1$. There are tricks that you can
employ^13 to evaluate sine and cosine at values of x outside this range.
If $|x| \le 1$ radians^14, then the magnitudes of the successive terms in the Taylor polynomials for
13 If you are writing software to evaluate sin x, you can always use the trig identity $\sin(x) = \sin(x - 2n\pi)$, to easily
restrict to $|x| \le \pi$. You can then use the trig identity $\sin(x) = -\sin(x \pm \pi)$ to reduce to $|x| \le \frac{\pi}{2}$. Finally you can
use the trig identity $\sin(x) = \mp\cos(\frac{\pi}{2} \pm x)$ to reduce to $|x| \le \frac{\pi}{4} < 1$.
14 Recall that the derivative formulae that we used to derive the Taylor polynomials are valid only when x is in radians.
The restriction $-1 \le x \le 1$ radians translates to angles bounded by $\frac{180}{\pi} \approx 57^\circ$.
TAYLOR P OLYNOMIALS 9.6 (F LAVOUR A) E RROR IN TAYLOR P OLYNOMIALS
From these inequalities, and the graphs on the previous pages, it certainly looks like, for x not too
large, even relatively low degree Taylor polynomials give very good approximations. In Section 9.6
we’ll see how to get rigorous error bounds on our Taylor polynomial approximations.
Learning Objectives
• Be able to use the formula for the error in Taylor polynomial approximations, and
interpret its result. For example: determine a bound on the error of a polynomial
approximation at a point; determine a range for which a particular approximation has an
error within a certain tolerance; or determine which degree Taylor approximation will
result in an error within a certain tolerance.
Any time you make an approximation, it is desirable to have some idea of the size of the error you
introduced. That is, we would like to know the difference R(x) between the original function f (x)
and our approximation F (x):
\[ R(x) = f(x) - F(x). \]
Of course if we know R(x) exactly, then we could recover f (x) = F (x) + R(x) — so this is an
unrealistic hope. In practice we would simply like to bound R(x):
\[ |R(x)| = |f(x) - F(x)| \le M \]
where (hopefully) M is some small number. It is worth stressing that we do not need the tightest
possible value of M, we just need a relatively easily computed M that isn’t too far off the true value
of $|f(x) - F(x)|$.
We will now develop a formula for the error introduced by the constant approximation, equa-
tion (9.1.1) (developed back in Section 9.1)
The resulting formula can be used to get an upper bound on the size of the error |R(x)|.
Indeed, this equation is important in the discussion that follows, so we’ll highlight it
The coefficient $\frac{f(x) - f(a)}{x - a}$ of $(x - a)$ is the average slope of $f(t)$ as t moves from t = a to
t = x. We can picture this as the slope of the secant joining the points (a, f (a)) and (x, f (x)) in the
sketch below.
[Figure: graph of y = f(t) with the secant through (a, f(a)) and (x, f(x)); a point c is marked on the t-axis between a and x.]
Notice that this expression as it stands is not quite what we want. Let us massage this around a
little more into a more useful form
Equation 9.6.3 (The error in constant approximation).
Notice that the MVT doesn’t tell us the value of c; however, we do know that it lies strictly
between x and a. So if we can get a good bound on $f'(c)$ on this interval then we can get a good
bound on the error.
Example 9.6.4
Let us return to Example 9.1.2, and we’ll try to bound the error in our approximation of $e^{0.1}$.
know that
And while this is true, it is rather circular. We have just bounded the error in our approximation
of $e^{0.1}$ by $\frac{1}{10} e^{0.1}$; if we actually knew $e^{0.1}$ then we wouldn’t need to estimate it!
• While we don’t know $e^{0.1}$ exactly, we do know^a that $1 = e^0 < e^{0.1} < e^1 < 3$. This gives us
That is, the error in our approximation of $e^{0.1}$ is no greater than 0.3. Recall that we don’t
need the error exactly, we just need a good idea of how large it actually is.
But we can actually go a little further here — we can bound the error above and below. If we do
not take absolute values, then since
we can write
so
So while the upper bound is weak, the lower bound is quite tight.
There are formulae similar to equation (9.6.2), that can be used to bound the error in our
other approximations; all are based on generalisations of the MVT. The next one — for linear
approximations — is
It implies that the error that we make when we approximate $f(x)$ by $T_1(x) = f(a) + f'(a)(x - a)$
is exactly $\frac{1}{2} f''(c)(x - a)^2$ for some c strictly between a and x.
More generally
\[ f(x) = \underbrace{f(a) + f'(a) \cdot (x - a) + \cdots + \frac{1}{n!} f^{(n)}(a) \cdot (x - a)^n}_{= T_n(x)} + \frac{1}{(n+1)!} f^{(n+1)}(c) \cdot (x - a)^{n+1} \]
for some c strictly between a and x. Again, rewriting this in terms of Tn (x) gives
Equation 9.6.6.
\[ f(x) - T_n(x) = \frac{1}{(n+1)!} f^{(n+1)}(c) \cdot (x - a)^{n+1} \quad \text{for some } c \text{ strictly between } a \text{ and } x \]
That is, the error introduced when f (x) is approximated by its Taylor polynomial of degree n, is
precisely the last term of the Taylor polynomial of degree n + 1, but with the derivative evaluated
at some point between a and x, rather than exactly at a. These error formulae are proven in the
optional Section 9.7 later in this chapter.
Example 9.6.7
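Approximate $\sin 46^\circ$ using Taylor polynomials about $a = 45^\circ$, and estimate the resulting error.
Solution.
\[ a = 45^\circ = 45 \tfrac{\pi}{180} \text{ radians} \qquad x = 46^\circ = 46 \tfrac{\pi}{180} \text{ radians} \qquad x - a = \tfrac{\pi}{180} \text{ radians} \]
• The constant, linear and quadratic Taylor approximations for sin(x) about $\frac{\pi}{4}$ are
\begin{align*}
T_0(x) &= f(a) = \frac{1}{\sqrt{2}} \\
T_1(x) &= T_0(x) + f'(a) \cdot (x - a) = \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}} \left(x - \frac{\pi}{4}\right) \\
T_2(x) &= T_1(x) + \frac{1}{2} f''(a) \cdot (x - a)^2 = \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}} \left(x - \frac{\pi}{4}\right) - \frac{1}{2\sqrt{2}} \left(x - \frac{\pi}{4}\right)^2
\end{align*}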
• Rather than carefully estimating sin c and cos c for c in that range, we make use of a cruder
(but much easier) bound. No matter what c is, we know that $|\sin c| \le 1$ and $|\cos c| \le 1$. Hence
\begin{align*}
|\text{error in } 0.70710678| &\le \frac{\pi}{180} < 0.018 \\
|\text{error in } 0.71944812| &\le \frac{1}{2} \left(\frac{\pi}{180}\right)^2 < 0.00016 \\
|\text{error in } 0.71934042| &\le \frac{1}{3!} \left(\frac{\pi}{180}\right)^3 < 0.0000009
\end{align*}
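These numbers are easy to reproduce. The sketch below recomputes the three approximations of sin 46° about 45° and compares the actual errors (measured against Python's `math.sin`) with the bounds obtained from $|\sin c| \le 1$ and $|\cos c| \le 1$. Variable names such as `h`, `s` and `bounds` are illustrative.

```python
import math

a = math.radians(45)
x = math.radians(46)
h = x - a                 # pi/180 radians
s = 1 / math.sqrt(2)      # sin(pi/4) = cos(pi/4)

t0 = s                    # constant approximation
t1 = t0 + s * h           # linear approximation
t2 = t1 - s * h**2 / 2    # quadratic approximation

exact = math.sin(x)
# Error bounds from the remainder formula with |f^(n+1)(c)| <= 1.
bounds = [h, h**2 / 2, h**3 / 6]

for t, bound in zip((t0, t1, t2), bounds):
    print(f"{t:.8f}   actual error = {abs(exact - t):.2e}   bound = {bound:.2e}")
```

The actual errors come out smaller than the bounds, as expected, since the bounds replace the unknown derivative value by its largest possible size.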
So at x = 1 we have
\[ e \approx T_1(1) = 2 \]
So $e < 4$.
• This isn’t as tight as we would like, so now do the same with the quadratic approximation
with a = 0:
\[ e^x \approx T_2(x) = 1 + x + \frac{x^2}{2} \]
So when x = 1 we have
\[ e \approx T_2(1) = 1 + 1 + \frac{1}{2} = \frac{5}{2} \]
Example 9.6.8
• Recall that
\[ T_n(x) = \sum_{k=0}^{n} \frac{1}{k!} x^k \]
• So when n = 9 we have
\[ \frac{1}{10!} \le e - \left(1 + 1 + \frac{1}{2} + \cdots + \frac{1}{9!}\right) \le \frac{3}{10!} \]
TAYLOR P OLYNOMIALS 9.7 (O PTIONAL ) — D ERIVATION OF THE ERROR FORMULAE
• More generally we know that using $T_n(1)$ to approximate e will have an error of at most
$\frac{3}{(n+1)!}$, so it converges very quickly.
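To watch this convergence happen, the following sketch computes $T_n(1)$ for a few values of n and prints the actual error next to the bound $3/(n+1)!$. The function name `maclaurin_exp_at_one` is an illustrative assumption, and Python's `math.e` stands in for the true value of e.

```python
import math

def maclaurin_exp_at_one(n):
    """T_n(1) = 1 + 1 + 1/2! + ... + 1/n! for f(x) = e^x about a = 0."""
    return sum(1 / math.factorial(k) for k in range(n + 1))

for n in (1, 2, 5, 9):
    approx = maclaurin_exp_at_one(n)
    bound = 3 / math.factorial(n + 1)
    print(f"n = {n}: T_n(1) = {approx:.10f}   error = {math.e - approx:.2e}   bound = {bound:.2e}")
```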
Example 9.6.9
a Since the derivative of ex is ex which is positive everywhere, the function is increasing everywhere.
\[ R_n(x) = f(x) - T_n(x) = \frac{1}{(n+1)!} f^{(n+1)}(c) \cdot (x - a)^{n+1} \]
for some c strictly between a and x, and where $T_n(x)$ is the nth degree Taylor polynomial approximation
of f (x) about x = a:
\[ T_n(x) = \sum_{k=0}^{n} \frac{1}{k!} f^{(k)}(a) \cdot (x - a)^k. \]
Let the functions F (x) and G(x) both be defined and continuous on $a \le x \le b$ and both be
differentiable on $a < x < b$. Furthermore, suppose that $G'(x) \ne 0$ for all $a < x < b$. Then,
there is a number c obeying $a < c < b$ such that
\[ \frac{F(b) - F(a)}{G(b) - G(a)} = \frac{F'(c)}{G'(c)}. \]
Notice that setting G(x) = x recovers the original Mean-Value Theorem. It turns out that this
theorem is not too difficult to prove from the MVT using some sneaky algebraic manipulations:
Proof. • First we construct a new function h(x) as a linear combination of F (x) and G(x) so
that h(a) = h(b) = 0. Some experimentation yields
\[ h(x) = \big[F(b) - F(a)\big] \cdot \big[G(x) - G(a)\big] - \big[G(b) - G(a)\big] \cdot \big[F(x) - F(a)\big] \]
• Since h(a) = h(b) = 0, the Mean-Value theorem (actually Rolle’s theorem) tells us that there
is a number c obeying $a < c < b$ such that $h'(c) = 0$:
\begin{align*}
h'(x) &= \big[F(b) - F(a)\big] \cdot G'(x) - \big[G(b) - G(a)\big] \cdot F'(x) && \text{so} \\
0 &= \big[F(b) - F(a)\big] \cdot G'(c) - \big[G(b) - G(a)\big] \cdot F'(c)
\end{align*}
Now move the $G'(c)$ terms to one side and the $F'(c)$ terms to the other:
\[ \big[F(b) - F(a)\big] \cdot G'(c) = \big[G(b) - G(a)\big] \cdot F'(c). \]
• Since we have $G'(x) \ne 0$, we know that $G'(c) \ne 0$. Further the Mean-Value theorem ensures^15
that $G(a) \ne G(b)$. Hence we can move terms about to get
\begin{align*}
F(b) - F(a) &= \big[G(b) - G(a)\big] \cdot \frac{F'(c)}{G'(c)} \\
\frac{F(b) - F(a)}{G(b) - G(a)} &= \frac{F'(c)}{G'(c)}
\end{align*}
as required.
Armed with the above theorem we can now move on to the proof of the Taylor remainder
formula.
Proof of equation (9.6.6). We begin by proving the remainder formula for n = 1. That is
\[ f(x) - T_1(x) = \frac{1}{2} f''(c) \cdot (x - a)^2 \]
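• Start by setting
\[ F(x) = f(x) - T_1(x) \qquad G(x) = (x - a)^2 \]
so that
\begin{align*}
F(a) &= 0 & G(a) &= 0 \\
F'(x) &= f'(x) - f'(a) & G'(x) &= 2(x - a)
\end{align*}
• Now apply the generalised MVT with b = x: there exists a point q between a and x such that
\begin{align*}
\frac{F(x) - F(a)}{G(x) - G(a)} &= \frac{F'(q)}{G'(q)} \\
\frac{F(x) - 0}{G(x) - 0} &= \frac{f'(q) - f'(a)}{2(q - a)} \\
2 \cdot \frac{F(x)}{G(x)} &= \frac{f'(q) - f'(a)}{q - a}
\end{align*}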
15 Otherwise if G(a) = G(b) the MVT tells us that there is some point c between a and b so that $G'(c) = 0$.
• Consider the right-hand side of the above equation and set $g(x) = f'(x)$. Then we have the
term $\frac{g(q) - g(a)}{q - a}$, which is exactly the form needed to apply the MVT. So now apply the standard
MVT to the right-hand side of the above equation: there is some c between q and a so that
\[ \frac{f'(q) - f'(a)}{q - a} = \frac{g(q) - g(a)}{q - a} = g'(c) = f''(c) \]
Notice that here we have assumed that $f''(x)$ exists.
• if I can stand on the current rung, then I can step up to the next rung (if the result is true for
n = k then it is also true for n = k + 1)
Hence I can climb as high as I like.
• Now set
and apply the generalised MVT with b = x: hence there exists a q between a and x so that
\begin{align*}
\frac{F(x) - F(a)}{G(x) - G(a)} &= \frac{F'(q)}{G'(q)} && \text{which becomes} \\
\frac{F(x)}{(x - a)^{k+1}} &= \frac{F'(q)}{(k+1)(q - a)^k} && \text{rearrange} \\
F(x) &= \frac{(x - a)^{k+1}}{(k+1)(q - a)^k} \cdot F'(q) \\
&= \frac{(x - a)^{k+1}}{(k+1)(q - a)^k} \cdot \frac{1}{k!} f^{(k+1)}(c)\,(q - a)^k \\
&= \frac{1}{(k+1)\,k!} \cdot f^{(k+1)}(c) \cdot \frac{(x - a)^{k+1} (q - a)^k}{(q - a)^k} \\
&= \frac{1}{(k+1)!} \cdot f^{(k+1)}(c) \cdot (x - a)^{k+1}
\end{align*}
as required.
• if, for some k, the remainder formula (with n = k) is true for all k times differentiable functions,
• then the remainder formula is true (with n = k + 1) for all k + 1 times differentiable functions.
Repeatedly applying this for k = 1, 2, 3, 4, . . . (and recalling that we have shown the remainder
formula is true when n = 0, 1) gives equation (9.6.6) for all n = 0, 1, 2, . . . .
Chapter 10
NEWTON’S METHOD
Learning Objectives
• Given a function, find an integer that is reasonably close to the root.
• Given a differentiable function, find the x-intercept of the tangent line at a particular
point.
• Explain how Newton’s method works. That is, how you can use tangent lines to
approximate the roots of a function.
• Write down the formula for Newton’s method and explain what each term in the
equation represents.
Newton’s method^1, also known as the Newton-Raphson method, is another technique for
generating numerical approximate solutions to equations of the form f (x) = 0. For example, one
can easily get a good approximation to $\sqrt{2}$ by applying Newton’s method to the equation $x^2 - 2 = 0$.
This will be done in Example 10.0.2, below.
Here is the derivation of Newton’s method. We start by simply making a guess for the solution.
For example, we could base the guess on a sketch of the graph of f (x). Call the initial guess x1 .
Next recall, from Theorem 3.3.7, that the tangent line to y = f (x) at $x = x_1$ is y = F (x), where
\[ F(x) = f(x_1) + f'(x_1)\,(x - x_1) \]
Usually F (x) is a pretty good approximation to f (x) for x near $x_1$. So, instead of trying to solve
1 The algorithm that we are about to describe grew out of a method that Newton wrote about in 1669. But the modern
method incorporates substantial changes introduced by Raphson in 1690 and Simpson in 1740.
f (x) = 0, we solve the linear equation F (x) = 0 and call the solution x2 .
\begin{align*}
0 = F(x) = f(x_1) + f'(x_1)\,(x - x_1) &\iff x - x_1 = -\frac{f(x_1)}{f'(x_1)} \\
&\iff x = x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}
\end{align*}
Note that if f (x) were a linear function, then F (x) would be exactly f (x) and x2 would solve
f (x) = 0 exactly.
[Figure: graph of y = f(x) with its tangent line y = F(x) at the point (x_1, f(x_1)); the tangent line crosses the x-axis at x_2.]
Now we repeat, but starting with the (second) guess $x_2$ rather than $x_1$. This gives the (third)
guess $x_3 = x_2 - \frac{f(x_2)}{f'(x_2)}$. And so on. By way of summary, Newton’s method is
3. Iterate. That is, for each natural number n, once you have computed $x_n$, define
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \]
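The iteration step can be sketched in a few lines of Python. This is a minimal illustration, not a robust solver: the function name `newton` and its arguments are assumptions, the derivative is supplied by the caller, and there is no safeguard against $f'(x_n) = 0$ or against divergence.

```python
def newton(f, f_prime, x1, steps):
    """Return [x_1, x_2, ..., x_{steps+1}] using x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    xs = [x1]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / f_prime(x))
    return xs

# Hypothetical usage: solve x^2 - 2 = 0 starting from the guess x_1 = 1.5.
iterates = newton(lambda x: x**2 - 2, lambda x: 2 * x, x1=1.5, steps=4)
for n, x in enumerate(iterates, start=1):
    print(f"x{n} = {x:.9f}")
```

Returning the whole list of iterates, rather than only the last one, makes it easy to see how quickly (or whether) the guesses settle down.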
Example 10.0.2 Approximating $\sqrt{2}$
In this example we compute, approximately, the square root of two. We will of course pretend that
we do not already know that $\sqrt{2} = 1.41421\cdots$. So we cannot find it by solving, approximately, the
equation $f(x) = x - \sqrt{2} = 0$. Instead we apply Newton’s method to the equation
\[ f(x) = x^2 - 2 = 0 \]
Since f 1 (x) = 2x, Newton’s method says that we should generate approximate solutions by iteratively
applying
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{x_n^2 - 2}{2 x_n} = \frac{x_n}{2} + \frac{1}{x_n} \]
We need a starting point. Since $1^2 = 1 < 2$ and $2^2 = 4 > 2$, the square root of two must be between
1 and 2, so let’s start Newton’s method with the initial guess $x_1 = 1.5$. Here goes^2:
\begin{align*}
x_1 &= 1.5 \\
x_2 &= \frac{1}{2} x_1 + \frac{1}{x_1} = \frac{1}{2}(1.5) + \frac{1}{1.5} = 1.416666667 \\
x_3 &= \frac{1}{2} x_2 + \frac{1}{x_2} = \frac{1}{2}(1.416666667) + \frac{1}{1.416666667} = 1.414215686 \\
x_4 &= \frac{1}{2} x_3 + \frac{1}{x_3} = \frac{1}{2}(1.414215686) + \frac{1}{1.414215686} = 1.414213562 \\
x_5 &= \frac{1}{2} x_4 + \frac{1}{x_4} = \frac{1}{2}(1.414213562) + \frac{1}{1.414213562} = 1.414213562
\end{align*}
\[ f(x) = \sin x = 0 \]
starting with $x_1 = 3$. Since $f'(x) = \cos x$, Newton’s method says that we should generate approximate
solutions by iteratively applying
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{\sin x_n}{\cos x_n} = x_n - \tan x_n \]
2 The following computations have been carried out in double precision, which is computer speak for about 15
significant digits. We are displaying each xn rounded to 10 significant digits (9 decimal places). So each displayed
xn has not been impacted by roundoff error, and still contains more decimal places than are usually needed.
Here goes
\begin{align*}
x_1 &= 3 \\
x_2 &= x_1 - \tan x_1 = 3 - \tan 3 = 3.142546543 \\
x_3 &= 3.142546543 - \tan 3.142546543 = 3.141592653 \\
x_4 &= 3.141592653 - \tan 3.141592653 = 3.141592654 \\
x_5 &= 3.141592654 - \tan 3.141592654 = 3.141592654
\end{align*}
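The same computation can be reproduced in a few lines, assuming Python's double-precision `math.tan`. The sketch prints each iterate together with its distance from `math.pi`, which makes the speed of convergence visible.

```python
import math

x = 3.0
for n in range(1, 6):
    print(f"x{n} = {x:.9f}   |x{n} - pi| = {abs(x - math.pi):.2e}")
    x = x - math.tan(x)  # Newton step for f(x) = sin x
```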
\[ f(x) = \arctan x = 0 \]
starting with $x_1 = 1.5$. (Of course the solution to f (x) = 0 is just x = 0; we chose $x_1 = 1.5$ for
demonstration purposes.) Since the derivative $f'(x) = \frac{1}{1 + x^2}$, Newton’s method gives
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - (1 + x_n^2) \arctan x_n \]
So3
x1 = 1.5
x2 = 1.5 ´ (1 + 1.52 ) arctan 1.5 = ´1.69
x3 = ´1.69 ´ (1 + 1.692 ) arctan(´1.69) = 2.32
x4 = 2.32 ´ (1 + 2.322 ) arctan(2.32) = ´5.11
x5 = ´5.11 ´ (1 + 5.112 ) arctan(´5.11) = 32.3
x6 = 32.3 ´ (1 + 32.32 ) arctan(32.3) = ´1575
x7 = 3, 894, 976
3 Once again, the following computations have been carried out in double precision. This time, it is clear that the
xn ’s are growing madly as n increases. So there is not much point to displaying many decimal places and we have
not done so.
The figure below shows what went wrong. In this figure, \(y = F_1(x)\) is the tangent line to
\(y = \arctan x\) at \(x = x_1\). Under Newton’s method, this tangent line crosses the \(x\)-axis at \(x = x_2\). Then
\(y = F_2(x)\) is the tangent to \(y = \arctan x\) at \(x = x_2\). Under Newton’s method, this tangent line crosses
the \(x\)-axis at \(x = x_3\). And so on.
[Figure: the graph of \(y = f(x) = \tan^{-1} x\) with the tangent lines \(y = F_1(x)\), \(y = F_2(x)\), \(y = F_3(x)\), the point \((x_1, f(x_1))\), and \(x_4\), \(x_2\), \(x_1\), \(x_3\) marked on the \(x\)-axis.]
The problem arose because the \(x_n\)’s were far enough from the solution, \(x = 0\), that the tangent
line approximations, while good approximations to \(f(x)\) for \(x \approx x_n\), were very poor approximations
to \(f(x)\) for \(x \approx 0\). In particular, \(y = F_1(x)\) (i.e. the tangent line at \(x = x_1\)) was a bad enough
approximation to \(y = \arctan x\) for \(x \approx 0\) that \(x = x_2\) (i.e. the value of \(x\) where \(y = F_1(x)\) crosses the
\(x\)-axis) is farther from the solution \(x = 0\) than our original guess \(x = x_1\). If we had started with
\(x_1 = 0.5\) instead of \(x_1 = 1.5\), Newton’s method would have succeeded very nicely:
\[x_1 = 0.5 \qquad x_2 = -0.0796 \qquad x_3 = 0.000335 \qquad x_4 = -2.51 \times 10^{-11}\]
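The contrast between the two starting points can be checked numerically. The sketch below uses a small general-purpose helper, `newton`, which is our own illustrative name and not something defined in the text; it simply repeats the update \(x_{n+1} = x_n - f(x_n)/f'(x_n)\) a fixed number of times.

```python
import math


def newton(f, fprime, x1, steps):
    """Return the list [x1, x2, ..., x_{steps+1}] of Newton iterates."""
    xs = [x1]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs


def arctan_prime(x):
    # Derivative of arctan x.
    return 1.0 / (1.0 + x * x)


# Starting at 1.5 the iterates run away from the root x = 0 ...
diverging = newton(math.atan, arctan_prime, 1.5, 6)   # x1 .. x7
# ... while starting at 0.5 they home in on it.
converging = newton(math.atan, arctan_prime, 0.5, 3)  # x1 .. x4

print([f"{x:.3g}" for x in diverging])
print([f"{x:.3g}" for x in converging])
```

We stop after a handful of steps on purpose: in the diverging case the iterates grow so quickly that a few more steps would exceed the range of double precision numbers.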
Example 10.0.4
4 “Compounded monthly”, means that, each month, interest is paid on the accumulated interest that was paid in all
previous months.
• The second month’s interest is \([P(1 + r)] \times r\). So at the end of month #2, the account balance
is \(P(1 + r) + P(1 + r)\,r = P(1 + r)^2\).
• And so on.
• So at the end of \(n\) months, the account balance is \(P(1 + r)^n\).
In order for the balance at the end of \(n\) months, \(P(1 + r)^n\), to be $420, the initial deposit has to be
\(P = 420(1 + r)^{-n}\). That is what is meant by the statement “The present value\(^5\) of a $420 payment
made \(n\) months in the future, when the interest rate is \(100r\)% per month, compounded monthly, is
\(420(1 + r)^{-n}\).”
Now back to the original problem. We will be making 60 monthly payments of $420. The
present value of all 60 payments is\(^6\)
\[\begin{aligned}
420(1 + r)^{-1} + 420(1 + r)^{-2} + \cdots + 420(1 + r)^{-60}
&= 420\,\frac{(1 + r)^{-1} - (1 + r)^{-61}}{1 - (1 + r)^{-1}}\\
&= 420\,\frac{1 - (1 + r)^{-60}}{(1 + r) - 1} = 420\,\frac{1 - (1 + r)^{-60}}{r}
\end{aligned}\]
The interest rate \(100r\)% being charged by the car dealer is such that the present value of 60 monthly
payments of $420 is $23520. That is, the monthly interest rate being charged by the car dealer is the
solution of
\[\begin{aligned}
23520 = 420\,\frac{1 - (1 + r)^{-60}}{r} \quad&\text{or}\quad 56 = \frac{1 - (1 + r)^{-60}}{r}\\
&\text{or}\quad 56r = 1 - (1 + r)^{-60}\\
&\text{or}\quad 56r(1 + r)^{60} = (1 + r)^{60} - 1\\
&\text{or}\quad (1 - 56r)(1 + r)^{60} = 1\\
&\text{or}\quad f(r) = (1 - 56r)(1 + r)^{60} - 1 = 0
\end{aligned}\]
The derivative of this \(f\) is
\[f'(r) = \big[-56(1 + r) + 60(1 - 56r)\big](1 + r)^{59} = (4 - 3416r)(1 + r)^{59}\]
5 Inflation means that prices of goods (typically) increase with time, and hence $100 now is worth more than $100 in
10 years’ time. The term “present value” is widely used in economics and finance to mean “the current amount of
money that will have a specified value at a specified time in the future”. It takes inflation into account. If the money
is invested, it takes into account the rate of return of the investment. We recommend that the interested reader do
some search-engining to find out more.
6 Don’t worry if you don’t know how to evaluate such sums. They are called geometric sums, and will be covered in
the CLP-2 text. (See (1.1.3) in the CLP-2 text.) In any event, you can check that this is correct, by multiplying the
whole equation by \(1 - (1 + r)^{-1}\). When you simplify the left hand side, you should get the right hand side.
Apply Newton’s method with an initial guess of \(r_1 = 0.002\). (That’s 0.2% per month or 2.4% per
year.) Then
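The Newton iterates for this problem can be generated with a few lines of Python. This is only a sketch under stated assumptions: it takes \(f(r) = (1 - 56r)(1 + r)^{60} - 1\), the function whose derivative is the \(f'(r)\) given above, and the names `f`, `fprime` and `rates` are ours.

```python
def f(r):
    # Assumed form of f: the equation (1 - 56r)(1 + r)^60 = 1 rewritten as f(r) = 0.
    return (1 - 56 * r) * (1 + r) ** 60 - 1


def fprime(r):
    # Derivative quoted in the text: (4 - 3416 r)(1 + r)^59.
    return (4 - 3416 * r) * (1 + r) ** 59


r = 0.002            # initial guess r1: 0.2% per month
rates = [r]
for _ in range(6):   # six Newton steps give r2 .. r7
    r = r - f(r) / fprime(r)
    rates.append(r)

for n, value in enumerate(rates, start=1):
    print(f"r{n} = {value:.6f}  ({100 * value:.4f}% per month)")
```

Note that \(r = 0\) also solves \(f(r) = 0\), but it is not the interest rate we want; starting from \(r_1 = 0.002\) keeps the iteration near the positive root.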
Differential Equations
Chapter 11
(Flavours A, B) Introduction to Differential Equations
Learning Objectives
• Explain how a differential equation is different from an algebraic equation.
• Identify solutions to simple differential equations (of the form \(y' = ay\)) and interpret
them in context.
• Given an initial condition, find a particular solution that satisfies a differential equation.
The equation on the right (linking a function to its own derivative) is a new kind of equation called
a differential equation (abbreviated DE). We say that \(f(x) = e^x\) is a function that “satisfies” the
equation, and we call this a solution to the differential equation.
Note: The solution to an algebraic equation is a number, whereas the solution to a differential
equation is a function.
We call this a differential equation because it connects (one or more) derivatives of a function
with the function itself.
Concept Check-In
1. For what constant \(C\) does \(y = Ce^x\) satisfy the differential equation \(dy/dx = y\)?
We will be interested in applications in which a system or process varies over time. For this
reason, we will henceforth use the independent variable \(t\), for time, in place of the former generic “\(x\)”.
Observations.
Hint
Notice that we merely changed the notation very slightly. Now the derivative is “with
respect to” t rather than x.
It is interesting to ask: Is this the only function that satisfies the differential equation (11.1.1)? Are
there other possible solutions? What about a function such as \(y = 2e^{kt}\) or \(y = 400e^{kt}\)?
The reader should show that for any constant \(C\), the function \(y = Ce^{kt}\) is a solution to the
DE (11.1.1).
I NTRODUCTION TO D IFFERENTIAL E QUATIONS11.1 I NTRODUCING A NEW KIND OF EQUATION
Hint
Notice that the constant C in front will appear in both the derivative and the function, and so
will not change the equation.
To do so, differentiate the function and plug into (11.1.1). Verifying that the two sides of the equation
are then the same establishes the result. While we do not prove it here, it turns out that functions of the
form \(y = Ce^{kt}\) are the only functions that satisfy Eqn. (11.1.1).
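A numerical spot-check can complement the verification by differentiation. The sketch below assumes that Eqn. (11.1.1) has the form \(dy/dt = ky\) (the equation itself is not reproduced in this excerpt), and it approximates the derivative by a centred difference quotient; the helper names `derivative` and `max_residual` are ours.

```python
import math


def y(t, C, k):
    # Candidate solution y = C e^{kt}.
    return C * math.exp(k * t)


def derivative(func, t, h=1e-6):
    # Centred difference approximation to func'(t).
    return (func(t + h) - func(t - h)) / (2 * h)


def max_residual(C, k, times=(0.0, 0.5, 1.0, 2.0)):
    """Largest |dy/dt - k*y| over the sample times; near zero for a solution."""
    return max(
        abs(derivative(lambda s: y(s, C, k), t) - k * y(t, C, k))
        for t in times
    )


print(max_residual(C=2, k=0.3))
print(max_residual(C=400, k=-1.5))
```

A tiny residual at a few sample times is evidence, not proof; the proof is the differentiation described above.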
Let us summarize what we have found out so far:
A few comments are in order. First, unlike algebraic equations - whose solutions are numbers -
differential equations have solutions that are functions.
Concept Check-In
1. Give an example of an algebraic equation and its solution.
2. Verify that \(y = 3e^{-t}\) satisfies the differential equation \(\frac{dy}{dt} = -y\).
Second, the constant \(k\) that appears in Eqn. (11.1.2), is the same as the constant \(k\) in \(e^{kt}\). Depending
on the sign of \(k\), we get either
Third, since \(e^{kt}\) is always positive, the constant \(C\) determines the sign of the function as a whole -
whether its graph lies above or below the \(t\) axis.
A few curves of each type (\(C > 0\), \(C < 0\)) are shown in each panel of Figure 11.1. The collection
of curves in a panel is called a family of solution curves. The family shares the same value of \(k\),
but each member has a distinct value of \(C\). Next, we ask how to specify a particular member of the
family as the solution.
Figure 11.1: (a) A family of solutions to the differential equation (DE) (11.1.2). These are functions
of the form \(y = Ce^{kt}\) for \(k > 0\) and arbitrary constant \(C\). (b) Another family of solutions of a DE of
the form (11.1.2), but for \(k < 0\).
We often refer to “solution curves” - the graphs of the family of solutions of a differential
equation, as shown, for example in the panels of Figure 11.1.
So far, we found that “many” functions can be valid solutions of the differential equation (11.1.2),
since we can choose the constant \(C\) arbitrarily in the family of solutions \(y = Ce^{kt}\). Hence, in order
to distinguish one specific solution of interest, we need additional information. This additional
information is called an initial value, or initial condition, and it specifies one point belonging to
the solution curve of interest. A common way to set an initial value is to specify a fixed value of the
function (say \(y = y_0\)) at time \(t = 0\).
Definition 11.1.3 (Initial value). An initial value for a differential equation is a specified, known
value of the solution at some specific time point (usually at time t = 0).
Example 11.1.4. Given the differential Eqn. (11.1.2) and the initial value
y(0) = y0 ,
Concept Check-In
1. Given differential Eqn. (11.1.2) and the initial value y(0) = 1, find C for the solution in
Eqn (11.1.3).
2. Repeat the above but for the initial value y(0) = 10.
4. Use differentiation to verify that the function \(y = 3e^{-0.5t}\) in Example 11.1.5 is a solution
to \(dy/dt = -0.5y\) with initial condition \(y(0) = 3\).
Differential equations are important because they turn up in the study of many natural processes
that vary continuously. In this section we examine the way that a simple differential equation arises
when we study continuous uncontrolled population growth.
Here we set up a mathematical model for population growth. Let N (t ) be the number of
individuals in a population at time t. The population changes with time due to births and mortality.
(Here we ignore migration). Consider the changes that take place in the population size between
time t and t + h, where ∆t = h is a small time increment. Then
I NTRODUCTION TO D IFFERENTIAL E QUATIONS 11.2 D IFFERENTIAL EQUATION FOR
UNLIMITED POPULATION GROWTH
Concept Check-In
1. What is the dependent variable in this model? The independent variable?
2. What are the units associated with each variable in this model?
\[N(t + h) - N(t) = \big[\text{Change in } N\big] = \big[\text{Number of births}\big] - \big[\text{Number of deaths}\big] \qquad (11.2.1)\]
Eqn. (11.2.1) is just a “book-keeping” equation that keeps track of people entering and leaving the
population. It is sometimes called a balance equation. We use it to derive a differential equation
linking the derivative of N to the value of N at the given time.
Notice that dividing each term by the time interval h, we obtain
\[\frac{N(t + h) - N(t)}{h} = \frac{\text{Number of births}}{h} - \frac{\text{Number of deaths}}{h}.\]
The term on the left “looks familiar”. If we shrink the time interval, \(h \to 0\), this term is a derivative
\(dN/dt\), so
\[\frac{dN}{dt} = \big[\text{Rate of change of } N \text{ per unit time}\big] = \big[\text{Number of births per unit time}\big] - \big[\text{Number of deaths per unit time}\big]\]
For simplicity, we assume that all individuals are identical and that the number of births per unit
time is proportional to the population size. Denote by r the constant of proportionality. Similarly,
we assume that the number of deaths per unit time is proportional to population size with m the
constant of proportionality.
Both \(r\) and \(m\) have meanings: \(r\) is the average per capita birth rate, and \(m\) is the average per
capita mortality rate. Here, both are assumed to be fixed positive constants that carry units of
1/time. This is required to make the units match for every term in Eqn. (11.2.1). Then
Concept Check-In
1. If there are 10 births/year in a population of size 1000, what is the birth rate r? Give units.
2. If there are 11 deaths/year in a population of size 1000, what is the mortality rate m? Give
units.
3. Given the above conditions, what is the net growth rate k for such a population? Give
units. Is the population growing or shrinking?
Consequently, we have
We refer to constants such as r, m as parameters. In general, for a given population, these would
have specific numerical values that could be found through experiment, by collecting data, or by
making simple assumptions. In Section 11.2, we show how some elementary assumptions about
birth and mortality could help to estimate approximate values of r and m.
Taking the assumptions and the form of the balance equation (11.2.1) together we have arrived
at:
\[\frac{dN}{dt} = rN - mN = (r - m)N. \qquad (11.2.2)\]
This is a differential equation: it links the derivative of \(N(t)\) to the function \(N(t)\). By solving
the equation (i.e. identifying its solution), we are able to make a projection about how fast a
population is growing.
Define the constant \(k = r - m\). Then \(k\) is the net growth rate of the population, so
\[\frac{dN}{dt} = kN, \quad \text{for } k = (r - m).\]
Suppose we also know that at time t = 0, the population size is N0 . Then:
• The function that describes population over time is (by previous results),
\[N(t) = N_0 e^{kt}.\]
(The result is identical to what we saw previously, but with \(N\) rather than \(y\) as the time-dependent
function. We can easily check by differentiation that this function satisfies
Eqn. (11.2.2).)
• The initial condition \(N(0) = N_0\) allows us to specify the (otherwise arbitrary) constant
multiplying the exponential function.
• The population grows provided \(k > 0\), which happens when \(r - m > 0\), i.e. when birth rate
exceeds mortality rate.
• If \(k < 0\), or equivalently \(r < m\), then more people die on average than are born, so that the
population shrinks and (eventually) goes extinct.
Figure 11.2: Flat age distribution assumption (number of people plotted against age, from 0 to 80).
We assume a uniform age distribution to determine the fraction of people who are fertile (and can
give birth) or who are old (and likely to die). While slightly silly, this simplification helps estimate
the desired parameters.
Assumptions.
• The age distribution of the population is “flat”, i.e. there are as many 10-year-olds as
70-year-olds. Of course, this is quite inaccurate, but a good place to start since it is easy to estimate
some of the quantities we need. Figure 11.2 shows such a uniform age distribution.
• The sex ratio is roughly 50%. This means that half of the population is female and half male.
Figure 11.3: We assume that only women between the ages of 15 and 55 years old are fertile and can give birth.
Then, according to our uniform age distribution assumption, half of all women are between these
ages and hence fertile.
• Women are fertile and can have babies only during part of their lives: we assume that the
fertile years are between age 15 and age 55, as shown in Figure 11.3.
• A lifetime lasts 80 years. This means that for half of that time a given woman can contribute
to the birth rate, or that \(\frac{55 - 15}{80} = 50\%\) of women alive at any time are able to give birth.
• During a woman’s fertile years, we assume that on average, she has one baby every 10 years.
• We assume that deaths occur only from old age (i.e. we ignore disease, war, famine, and child
mortality.)
• We assume that everyone lives precisely to age 80, and then dies instantly.
Figure 11.4: We assume that the people in the age bracket 79-80 years old all die each year, and that those are the
only deaths. This, too, is a silly assumption, but makes it easy to estimate mortality in the population.
Based on the above assumptions, we can estimate the birthrate parameter r as follows:
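\[r = \frac{\text{number of women}}{\text{population}} \cdot \frac{\text{years fertile}}{\text{years of life}} \cdot \frac{\text{number of babies per woman}}{\text{number of years}}\]
Thus we compute that
\[r = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{10} = 0.025 \text{ births per person per year.}\]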
Concept Check-In
1. Under these assumptions, for a population size of 800, how many male 35 year-olds
would you expect? Women in their 60’s?
Note that this value is now a rate per person per year, averaged over the entire population (male and
female, of all ages). We need such an average rate since our model of Eqn. (11.2.2) assumes that
individuals “are identical”. We now have an approximate value for the average human per capita
birth rate, \(r \approx 0.025\) per year.
Next, using our assumptions, we estimate the mortality parameter, \(m\). With the flat age distribution
shown in Figure 11.2, a fraction \(1/80\) of the population would be
removed by mortality every year (i.e. only those in their 80th year). In this case, we can estimate
that the per capita mortality is:
\[m = \frac{1}{80} = 0.0125 \text{ deaths per person per year.}\]
The net per capita growth rate is \(k = r - m = 0.025 - 0.0125 = 0.0125\) per person per year. We
often refer to the constant \(k\) as a growth rate constant and we also say that the population grows at
the rate of 1.25% per year.
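The parameter estimates above are simple fractions, so they can be reproduced exactly. The following sketch uses Python’s `fractions` module; the variable names are ours, and the numbers are the ones stated in the assumptions (half the population are women, fertile between ages 15 and 55 of an 80-year life, one baby per 10 fertile years, death at age 80).

```python
from fractions import Fraction

fraction_women = Fraction(1, 2)            # sex ratio of roughly 50%
fraction_fertile = Fraction(55 - 15, 80)   # fertile years out of an 80-year life
babies_per_year = Fraction(1, 10)          # one baby every 10 fertile years

r = fraction_women * fraction_fertile * babies_per_year  # per capita birth rate
m = Fraction(1, 80)                                       # per capita mortality rate
k = r - m                                                 # net growth rate

print(f"r = {float(r)}, m = {float(m)}, k = {float(k)} per person per year")
```

Working with exact fractions makes it easy to see how each assumption enters the final value of \(k\): changing any one of the three factors in \(r\) rescales the birth rate in direct proportion.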
Example 11.2.1. Using the results of this section, find a prediction for the population size N (t ) as a
function of time t.
Figure 11.5: Projected world population (in billions) over 100 years, based on the model in Eqn. (11.2.4) and
assuming that the initial population is \(\approx 7\) billion. (Horizontal axis: \(t\) in years, from 0 to 100; vertical axis: \(N(t)\).)
Concept Check-In
1. Based on Figure 11.5, when would we expect the human population to reach 15 billion?
Example 11.2.2 (Human population in 100 years). Given the initial condition N (0) = 7 billion,
determine the size of the human population at t = 100 years predicted by the model.
Solution. At time \(t = 0\), the population is \(N(0) = N_0 = 7\) billion. Then, in billions,
\[N(t) = 7e^{0.0125t},\]
so that at \(t = 100\) years
\[N(100) = 7e^{0.0125 \cdot 100} = 7e^{1.25} \approx 24.4.\]
Thus, with a starting population of 7 billion, there would be about 24.4 billion after 100 years based
on the uncontrolled continuous growth model. ♦
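The projection in Example 11.2.2 amounts to evaluating one exponential. A minimal sketch follows; the function name `population` and its default arguments are ours, with the values \(N_0 = 7\) (billion) and \(k = 0.0125\) per year taken from the example.

```python
import math


def population(t, n0=7.0, k=0.0125):
    """Population in billions after t years under N(t) = N0 * e^{k t}."""
    return n0 * math.exp(k * t)


for years in (0, 25, 50, 100):
    print(f"t = {years:3d} years: N = {population(years):.1f} billion")
```

Changing `k` or `n0` shows how sensitive the 100-year figure is to the rough parameter estimates made earlier in this section.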
A critique. Before leaving our population model, we should remember that our projections hold
only so long as some rather restrictive assumptions are made. We have made many simplifications,
and ignored many features that would seriously affect these results. These include (among others),
• variations in birth and mortality rates that stem from competition for resources and,
We have also assumed that the age distribution is uniform (flat), but that is not accurate: the
population grows only by adding new infants, and this would skew the distribution even if it is
initially uniform. All these factors suggest that some “healthy skepticism” should be applied to
any model predictions. Predictions may cease to be valid if model assumptions are not satisfied.
This caveat will lead us to think about more realistic models for population growth. Certainly, the
uncontrolled exponential growth would not be sustainable in the long run. That said, such a model
is a good starting point for a first description of population growth, later to be adjusted.
\[\tau = \frac{\ln(2)}{k}.\]
Concept Check-In
1. What are the units associated with τ?
2. The human population hit 3 billion in 1959. How does this fit with our (imperfect) model?
Example 11.2.3 (Human population doubling time). Determine the doubling time for the human
population based on the results of our approximate growth model.
Solution. We have found a growth rate of roughly k = 0.0125 per year for the human population.
Based on this, it would take
\[\tau = \frac{\ln(2)}{0.0125} = 55.45 \text{ years}\]
for the population to double. Compare this with the graph of Fig 11.5, and note that over this time
span, the population increases from 6 to 12 billion. ♦
Note: the observant student may notice that we are simply converting back from base e to base 2
when we compute the doubling time.
We summarize an important observation:
I NTRODUCTION TO D IFFERENTIAL E QUATIONS 11.3 R ADIOACTIVE DECAY
\[\tau = \frac{\ln(2)}{k}.\]
This is shown in Figure 11.6. We have discovered that based on the uncontrolled growth model,
the population doubles every 55 years! After 110 years, for example, there have been two doublings,
or a quadrupling of the population.
Figure 11.6: Doubling time for exponential growth. (The curve rises from \(y_0\) to \(2y_0\) over a time interval of length \(\tau\).)
Example 11.2.4 (A ten year doubling time). Suppose we are told that some animal population
doubles every 10 years. What growth rate would lead to such a trend?
Solution. Rearranging
\[\tau = \frac{\ln(2)}{k},\]
we obtain
\[k = \frac{\ln(2)}{\tau} = \frac{0.6931}{10} \approx 0.07 \text{ per year.}\]
Thus, a growth rate of 7% leads to doubling roughly every 10 years. ♦
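Both directions of the doubling-time relation, from \(k\) to \(\tau\) and from \(\tau\) back to \(k\), are one-line computations. The sketch below uses function names of our own choosing and the numbers from Examples 11.2.3 and 11.2.4.

```python
import math


def doubling_time(k):
    """Doubling time tau = ln(2)/k for a growth rate k > 0."""
    return math.log(2) / k


def growth_rate(tau):
    """Growth rate k = ln(2)/tau that produces doubling time tau."""
    return math.log(2) / tau


print(f"k = 0.0125 per year  ->  tau = {doubling_time(0.0125):.2f} years")
print(f"tau = 10 years       ->  k = {growth_rate(10):.4f} per year")
```

Since the two functions are inverses of each other, `growth_rate(doubling_time(k))` returns `k` again, up to rounding.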
In this section, we use the same kind of book-keeping (keeping track of the number of radioactive
atoms remaining) as in the population growth example, to arrive at a differential equation that
describes the process. Once we have the equation, we determine its solution and make a long-term
prediction about the amount of radioactivity remaining at a future time.
(1) The process of radioactive decay is random, but on average, the probability of decay for a given
radioactive atom is \(k\) per unit time where \(k > 0\) is some constant.
(2) During each (small) time interval of length ∆t = h, a radioactive atom has probability kh of
decaying. This is merely a restatement of (1).
Concept Check-In
1. Suppose a given atom has a 1% chance of decay per 24 hours. What is this atom’s
probability of decay per week? Per hour?
Suppose that at some time \(t\), there are \(N(t)\) radioactive atoms. Then, according to our assumptions,
during the time period from \(t\) to \(t + h\), on average \(khN(t)\) atoms would decay. How many are
there at time \(t + h\)? We can write the following balance equation:
\[\big[\text{Amount left at time } t + h\big] = \big[\text{Amount present at time } t\big] - \big[\text{Amount decayed during the time interval from } t \text{ to } t + h\big]\]
or, restated:
\[N(t + h) = N(t) - khN(t). \qquad (11.3.1)\]
Here we have assumed that \(h\) is a small time period. Rearranging Eqn. (11.3.1) leads to
\[\frac{N(t + h) - N(t)}{h} = -kN(t).\]
Considering the left hand side of this equation, we let \(h\) get smaller and smaller (\(h \to 0\)) and recall
that
\[\lim_{h \to 0} \frac{N(t + h) - N(t)}{h} = \frac{dN}{dt} = N'(t)\]
where we have used the notation for a derivative of \(N\) with respect to \(t\). We have thus shown that a
description of the population of radioactive atoms reduces to
\[\frac{dN}{dt} = -kN. \qquad (11.3.2)\]
We have, once more, arrived at a differential equation that provides a link between a function of time
N (t ) and its own rate of change dN/dt. Indeed, this equation specifies that dN/dt is proportional
to N, but with a negative constant of proportionality which implies decay.
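The passage from the balance equation (11.3.1) to the differential equation (11.3.2) can be illustrated numerically: stepping the balance equation forward with a small \(h\) should track an exponentially decaying function. The sketch below is only an illustration; the function `step_decay` and the sample values \(N_0 = 1000\), \(k = 0.1\) are hypothetical choices of ours, and the comparison curve \(N_0 e^{-kt}\) is the member of the family \(Ce^{kt}\) from Section 11.1 with growth constant \(-k\).

```python
import math


def step_decay(n0, k, h, t_end):
    """Apply N(t + h) = N(t) - k*h*N(t) repeatedly from t = 0 to t = t_end."""
    steps = round(t_end / h)
    n = n0
    for _ in range(steps):
        n = n - k * h * n
    return n


n0, k, t_end = 1000.0, 0.1, 10.0        # hypothetical sample values
exact = n0 * math.exp(-k * t_end)       # exponential decay at t_end

for h in (1.0, 0.1, 0.01):
    approx = step_decay(n0, k, h, t_end)
    print(f"h = {h:5}: balance equation gives {approx:8.3f}, exponential gives {exact:8.3f}")
```

As \(h\) shrinks, the stepped values approach the exponential, which mirrors the limit \(h \to 0\) taken in the derivation.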
Above we formulated the entire model in terms of the number of radioactive atoms. However,
as shown below, the same equation holds regardless of the system of units used to measure the amount
of radioactivity.
Example 11.3.1. Define the number of moles of radioactive material by \(y(t) = N(t)/A\) where \(A\)
is Avogadro’s number (the number of molecules in 1 mole: \(\approx 6.022 \times 10^{23}\) - a dimensionless
quantity, i.e. just a number with no associated units). Determine the differential equation satisfied
by \(y(t)\).
Solution. We write y(t ) = N (t )/A in the form N (t ) = Ay(t ) and substitute this expression for N (t )
in Eqn. (11.3.2). We use the fact that A is a constant to simplify the derivative. Then
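\[\frac{dN}{dt} = -kN \quad\Rightarrow\quad \frac{d\,(Ay(t))}{dt} = -k(Ay(t)) \quad\Rightarrow\quad A\frac{dy(t)}{dt} = A(-ky(t)).\]
Cancelling the constant \(A\) from both sides of the equation leads to
\[\frac{dy(t)}{dt} = -ky(t), \quad \text{or simply} \quad \frac{dy}{dt} = -ky. \qquad (11.3.3)\]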
Thus y(t ) satisfies the same kind of differential equation (with the same negative proportionality
constant) between the derivative and the original function. We will refer to (11.3.3) as the decay
equation. ♦
[Figure: an exponential decay curve falling from \(y_0\) to \(y_0/2\) over a time interval of length \(\tau\).]
Example 11.3.4 (Chernobyl: April 1986). In 1986 the Chernobyl nuclear power plant exploded, and
scattered radioactive material over Europe. The radioactive element iodine-131 (\(I^{131}\)) has a half-life
of 8 days whereas cesium-137 (\(Cs^{137}\)) has a half-life of 30 years. Use the model for radioactive decay
to predict how much of this material would remain over time.
Solution. We first determine the decay constants for each of these two elements, by noting that
\[k = \frac{\ln(2)}{\tau},\]
and recalling that \(\ln(2) \approx 0.693\). Then for \(I^{131}\) we have
\[k = \frac{\ln(2)}{\tau} = \frac{\ln(2)}{8} = 0.0866 \text{ per day.}\]
Then the amount of \(I^{131}\) left at time \(t\) (in days) would be
\[y_I(t) = y_0 e^{-0.0866t}.\]
For \(Cs^{137}\)
\[k = \frac{\ln(2)}{30} = 0.023 \text{ per year,}\]
so that for \(T\) in years,
\[y_C(T) = y_0 e^{-0.023T}.\]
Note: we have used T rather than t to emphasize that units are different in the two calculations done
in this example.
Example 11.3.5 (Decay to 0.1% of the initial level). How long would it take for \(I^{131}\) to decay to 0.1%
of its initial level? Assume that the initial level occurred just after the explosion at Chernobyl.
Solution. We must calculate the time \(t\) such that \(y_I = 0.001y_0\):
\[0.001y_0 = y_0 e^{-0.0866t} \quad\Rightarrow\quad 0.001 = e^{-0.0866t} \quad\Rightarrow\quad \ln(0.001) = -0.0866t.\]
Therefore,
\[t = \frac{\ln(0.001)}{-0.0866} = \frac{-6.9}{-0.0866} = 79.7 \text{ days.}\]
Thus it would take about 80 days for the level of Iodine-131 to decay to 0.1% of its initial level. ♦
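The arithmetic of Examples 11.3.4 and 11.3.5 can be packaged into two small functions. This is a sketch with names of our own choosing (`decay_constant`, `time_to_fraction`); the half-lives of 8 days and 30 years are the ones quoted in Example 11.3.4.

```python
import math


def decay_constant(half_life):
    """Decay constant k = ln(2)/tau, in units of 1/(units of half_life)."""
    return math.log(2) / half_life


def time_to_fraction(fraction, k):
    """Time t at which y(t) = fraction * y0, i.e. t = ln(fraction) / (-k)."""
    return math.log(fraction) / (-k)


k_iodine = decay_constant(8)     # per day
k_cesium = decay_constant(30)    # per year

print(f"I-131:  k = {k_iodine:.4f} per day")
print(f"Cs-137: k = {k_cesium:.4f} per year")
print(f"I-131 falls to 0.1% after {time_to_fraction(0.001, k_iodine):.1f} days")
```

Because the functions carry no units themselves, the time returned by `time_to_fraction` is in whatever unit was used for the half-life: days for iodine, years for cesium.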
Concept Check-In
1. Repeat the calculation in Example 11.3.5 for Cesium.
2. Convert the Cesium decay time units to days and repeat the calculation of Example 11.3.4
with the new time units.
3. If the decay rate of a substance is 10% per day, what is its half-life?
11.4 Summary
1. A differential equation is a statement linking the rate of change of some state variable with
current values of that variable. An example is the simplest population growth model: if N (t )
is population size at time t:
\[\frac{dN}{dt} = kN.\]
2. A solution to a differential equation is a function that satisfies the equation. For instance, the
function \(N(t) = Ce^{kt}\) (for any constant \(C\)) is a solution to the unlimited population growth
model (we check this by the appropriate differentiation). Graphs of such solutions (e.g. \(N\)
versus \(t\)) are called solution curves.
3. To select a specific solution, more information (an initial condition) is needed. Given this
information, e.g. N (0) = N0 , we can fully characterize the desired solution.
4. The decay equation is one representative of the same class of problems, and has an exponentially
decaying solution.
\[\frac{dy}{dt} = -ky, \quad y(0) = y_0 \quad\Rightarrow\quad \text{Solution: } y(t) = y_0 e^{-kt}. \qquad (11.4.1)\]
5. So far, we have seen simple differential equations with simple (exponential) functions for
their solutions. In general, it may be quite challenging to make the connection between the
differential equation (stemming from some application or model) with the solution (which we
want in order to understand and predict the behaviour of the system.)
Figure 11.8: A “flow chart” showing how differential equations originate from scientific problems.
(Boxes in the chart: scientific problem or system; facts, observations, assumptions, hypotheses; “laws of
Nature” or statements about rates of change; mathematical model; differential equation(s) describing
the system; solutions to the differential equations; predictions about the system behaviour.)
In this chapter, we saw examples in which a natural phenomenon (population growth, radioactive
decay, cell growth) motivated a mathematical model that led to a differential equation. In both cases,
that equation was derived by making a statement that tracked the amount or number or mass of a
system over time. Numerous simplifications were made to derive each differential equation. For
example, we assumed that the birth and mortality rates stay fixed even as the population grows to
huge sizes.
• Our purpose was to illustrate how a simple model is created, and what such models can
predict.
• In general, differential equation models are often based on physical laws (“F = ma”) or
conservation statements (“rate in minus rate out equals net rate of change”, or “total energy =
constant”).
• In biology, where the laws governing biochemical events are less formal, the models are often
based on some mix of speculation and reasonable assumptions.
• In Figure 11.8 we illustrate how the scientific method leads to a cycle between the mathematical
models and their test and validation using observations about the natural world.
(a) \(y = 20e^{3t}\);
(b) \(y = 5e^{-3t}\);
(c) \(\frac{dy}{dt} = 3t\);
(d) \(\frac{dy}{dx} = -5x\).
3. Determine the half life of the exponential decay function \(N(t) = 500e^{-2t}\).
Chapter 12
(Flavours A, B) Solving differential equations
In Chapter 11, we introduced differential equations to keep track of continuous changes in the
growth of a population or the decay of radioactivity. We encountered a differential equation that
tracks changes in cell mass due to nutrient absorption and consumption. Finally, we learned that
the solution to a differential equation is a function. In the applications studied, that function can be
interpreted as a prediction of the behaviour of the system or process over time.
In this chapter, we further develop some of these ideas. We explore several techniques for finding
and verifying that a given function is a solution to a differential equation. We then examine a simple
class of differential equations that have many applications to processes of production and decay,
and find their solutions. Finally, we show how an approximation method provides for numerical
solutions of such problems.
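Example 12.1.1. Show that the function \(y(t) = (2t + 1)^{1/2}\) is a solution to the differential equation
and initial condition
\[\frac{dy}{dt} = \frac{1}{y}, \quad y(0) = 1.\]
Solution. First, we check the derivative, obtaining
\[\frac{dy}{dt} = \frac{1}{2}(2t + 1)^{-1/2} \cdot 2 = \frac{1}{(2t + 1)^{1/2}} = \frac{1}{y}.\]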
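\[\begin{array}{c|c}
\text{LHS} & \text{RHS}\\ \hline
\dfrac{dy}{dt} & 1 - y\\[2ex]
\dfrac{d[y_0 e^{-t}]}{dt} & 1 - y_0 e^{-t}\\[2ex]
-y_0 e^{-t} & \text{✗ (no match)}
\end{array}\]
Table 12.1: The function \(y(t) = y_0 e^{-t}\) is not a solution to the differential equation (12.1.1). Plugging
the function into each side of the DE and simplifying (down the rows) leads to expressions that do
not match.
\[\begin{array}{c|c}
\text{LHS} & \text{RHS}\\ \hline
\dfrac{dy}{dt} & 1 - y\\[2ex]
\dfrac{d}{dt}\left[1 - (1 - y_0)e^{-t}\right] & 1 - \left[1 - (1 - y_0)e^{-t}\right]\\[2ex]
-(1 - y_0)\dfrac{de^{-t}}{dt} & (1 - y_0)e^{-t}\\[2ex]
(1 - y_0)e^{-t} & \text{✓ (match)}
\end{array}\]
Table 12.2: (b) The function \(y(t) = 1 - (1 - y_0)e^{-t}\) is a solution to the differential equation (12.1.1).
The expressions we get by evaluating each side of the differential equation do match.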
Hence, the function satisfies the differential equation. We must also verify the initial condition. We
find that \(y(0) = (2 \cdot 0 + 1)^{1/2} = 1^{1/2} = 1\). Thus the initial condition is also satisfied, and \(y(t)\) is
indeed a solution. ♦
Example 12.1.2. Consider the differential equation and initial condition
\[\frac{dy}{dt} = 1 - y, \quad y(0) = y_0. \qquad (12.1.1)\]
a) Show that the function \(y(t) = y_0 e^{-t}\) is not a solution to this differential equation.
b) Show that the function \(y(t) = 1 - (1 - y_0)e^{-t}\) is a solution.
Solution.
a) To check whether \(y(t) = y_0 e^{-t}\) is a solution to the differential equation (12.1.1), we substitute
the function into each side (“left hand side”, LHS; “right hand side”, RHS) of the equation. We
show the results in the columns of Table 12.1. After some steps in the simplification, we see that
the two sides do not match, and conclude that the function is not a solution, as it fails to satisfy
the equation.
b) Similarly, we check the second function. The calculations are shown in columns of Table 12.2.
We find that RHS = LHS, so the differential equation is satisfied. Finally, let us show that the
initial condition \(y(0) = y_0\) is also satisfied. Plugging in \(t = 0\) we have
\[y(0) = 1 - (1 - y_0)e^{0} = 1 - (1 - y_0) \cdot 1 = 1 - (1 - y_0) = y_0.\]
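The LHS/RHS comparison of Example 12.1.2 can also be carried out numerically. The sketch below approximates \(dy/dt\) by a centred difference quotient and reports LHS minus RHS for each candidate; the helper `residual` and the sample value \(y_0 = 3\) are our own choices for illustration.

```python
import math


def residual(y, t, h=1e-6):
    """Return (dy/dt) - (1 - y) at time t, using a centred difference for dy/dt."""
    dydt = (y(t + h) - y(t - h)) / (2 * h)
    return dydt - (1 - y(t))


y0 = 3.0  # hypothetical initial value, chosen only for this check


def candidate_a(t):
    # Part (a): y = y0 e^{-t}
    return y0 * math.exp(-t)


def candidate_b(t):
    # Part (b): y = 1 - (1 - y0) e^{-t}
    return 1 - (1 - y0) * math.exp(-t)


for t in (0.0, 0.7, 2.0):
    print(f"t = {t}: residual (a) = {residual(candidate_a, t):+.6f}, "
          f"residual (b) = {residual(candidate_b, t):+.6f}")
```

For candidate (a) the two sides differ by the same amount at every \(t\), since \(-y_0 e^{-t} - (1 - y_0 e^{-t}) = -1\); for candidate (b) the residual is zero up to rounding, in agreement with Tables 12.1 and 12.2.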
S OLVING DIFFERENTIAL EQUATIONS 12.1 V ERIFYING THAT A FUNCTION IS A SOLUTION
Concept Check-In
1. Draw a diagram of the system described in Example 12.1.3.
2. What set of units would be reasonable for each of the parameters in Example 12.1.3?
3. Create a table to organize the calculations for this example, similar to Tables 12.1 and
12.2.
As shown in Examples 12.1.1-12.1.3, if we are told that a function is a solution to a differential
equation, we can check the assertion and verify that it is correct or incorrect. A much more difficult
task is to find the solution of a new differential equation from first principles.
In some cases, integration, learned in second semester calculus, can be used. In others, some
transformation that changes the problem to a more familiar one is helpful - an example of this type
is presented in Section 12.2. In many cases, particularly those of so-called non-linear differential
equations, great expertise and familiarity with advanced mathematical methods are required to
find the solution to such problems in an analytic form, i.e. as an explicit formula. In such cases,
approximation and numerical methods are helpful.
S OLVING DIFFERENTIAL EQUATIONS 12.2 E QUATIONS OF THE FORM y1 (t ) = a ´ by
An explanation of the way we find solutions to equations of the form $\frac{dy}{dt} = a - by$, with
$y(0) = y_0$.
\[ \frac{dy}{dt} = 0 \implies a - by = 0 \implies y = \frac{a}{b}. \]
In other words, if we were to start with the initial value $y(0) = a/b$, then that value would not
change, since it satisfies $dy/dt = 0$, so that the solution at all future times would be $y(t) = a/b$.
(Of course, this is a perfectly good function; it is simply a function that is always constant.)
We refer to such constant solutions as steady states.
Figure 12.1: $y = a/b$ is a constant solution to the differential equation in (12.2.1). We call this type
of solution a steady state.
\[ \frac{dy}{dt} = a - by \implies \frac{dy}{dt} = -b\left(y - \frac{a}{b}\right), \]
(having factored out $-b$). The advantage is that we recognize the expression $\left(y - \frac{a}{b}\right)$ as the difference,
or deviation of y away from its steady state value. (That deviation could be either positive or negative,
depending on whether y is larger or smaller than a/b.) We ask whether this deviation gets larger or
smaller as time goes by, i.e., whether y gets further away or closer to its steady state value a/b.
Figure 12.2: We define $z(t)$ as the deviation of $y$ from its steady state value. Here we show two
typical initial values of $z$, where $z_0 = y_0 - \frac{a}{b}$.
Define $z(t)$ as that deviation, that is,
\[ z(t) = y(t) - \frac{a}{b}. \]
Then, since a, b are constants, we recognize that
\[ \frac{dz}{dt} = \frac{dy}{dt}. \]
Second, the initial value of z follows simply from the initial value of y:
\[ z(0) = y(0) - \frac{a}{b} = y_0 - \frac{a}{b}. \]
Now we can transform the equation (12.2.1) into a new differential equation for the variable z
by using these two facts. We can replace the y derivative by the z derivative, and also, using
Eqn. (12.2.1), find that
\[ \frac{dz}{dt} = \frac{dy}{dt} = -b\left(y - \frac{a}{b}\right) = -bz. \]
Figure 12.3: The deviation away from steady state (blue, grey curves) is $z(t) = y(t) - a/b$. We can
solve the differential equation for $z(t)$ because it is a simple exponential decay equation. Here we
show two typical solutions for $z$.
Hence, we have transformed the original DE and IC into the new problem
\[ \frac{dz}{dt} = -bz, \qquad z(0) = z_0, \qquad \text{where } z_0 = y_0 - \frac{a}{b}. \]
But this is the familiar decay initial value problem that we have already solved before. So
\[ z(t) = z_0 e^{-bt}. \]
We have arrived at the conclusion that the deviation from steady state decays exponentially with
time, provided that $b > 0$. Hence, we already know that $y$ should get closer to the constant value
$a/b$ as time goes by!
We can do even better than this, by transforming the solution we found for z(t ) into an expression
for y(t ). To do so, use the definition once more, setting
\[ z(t) = z_0 e^{-bt} \implies y(t) - \frac{a}{b} = \left(y_0 - \frac{a}{b}\right) e^{-bt}. \]
Solving for $y(t)$ then leads to
\[ y(t) = \frac{a}{b} + \left(y_0 - \frac{a}{b}\right) e^{-bt}. \tag{12.2.2} \]
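As a quick numerical sanity check of formula (12.2.2), the following Python sketch evaluates the formula, confirms the initial condition, and compares a finite-difference estimate of $dy/dt$ with $a - by$. The function name `linear_ode_solution` and the parameter values are illustrative assumptions, not part of the text.

```python
import math

def linear_ode_solution(a, b, y0, t):
    """Evaluate y(t) = a/b + (y0 - a/b) * exp(-b t), i.e. formula (12.2.2)."""
    steady_state = a / b
    return steady_state + (y0 - steady_state) * math.exp(-b * t)

# Hypothetical parameter values, chosen only for illustration.
a, b, y0 = 3.0, 2.0, 5.0

# Initial condition: y(0) should equal y0.
assert abs(linear_ode_solution(a, b, y0, 0.0) - y0) < 1e-12

# Differential equation: a centred difference should be close to a - b*y.
h = 1e-6
for t in (0.0, 0.5, 1.0, 2.0):
    y = linear_ode_solution(a, b, y0, t)
    dydt = (linear_ode_solution(a, b, y0, t + h)
            - linear_ode_solution(a, b, y0, t - h)) / (2 * h)
    print(f"t={t:3.1f}  y={y:.6f}  dy/dt~{dydt:.6f}  a-by={a - b * y:.6f}")
```

The last two columns should agree to several decimal places, and the values of y should move toward the steady state a/b as t grows.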
Example 12.2.1 (a = b = 1). Suppose we are given the differential equation and initial condition
\[ \frac{dy}{dt} = 1 - y, \qquad y(0) = y_0. \tag{12.2.3} \]
Determine the solution to this differential equation.
Solution.
Concept Check-In
1. Find the steady state of Eqn. (12.2.3).
2. From Figure 12.5, determine what were the four different initial conditions used.
3. Rewrite these four initial conditions as the initial deviations away from steady state, that
is, give the initial values, z0 of the deviation.
[Figure: solutions to the differential equation dy/dt = 1 − y, plotted as y against time, t.]
Concept Check-In
1. What can we say about the units of T and E?
Suppose the object is warmer than its environment ($T(t) > E$). Then $T(t) - E > 0$, and $\alpha \ge 0$
implies that $dT/dt > 0$, which says that the temperature of the object should increase! But this
does not agree with our everyday experience: a hot cup of coffee cools off in a chilly room. Hence
$\alpha \ge 0$ cannot be correct. Based on this, we conclude that Newton’s Law of Cooling, written in the
form of a differential equation, should read:
\[ \frac{dT}{dt} = k\,(E - T(t)), \quad \text{where } k > 0. \tag{12.2.4} \]
Note: the sign of the term in parentheses has been switched.
Typically, given the temperature at some initial time, $T(0) = T_0$, we want to predict $T(t)$ at later
times.
Example 12.2.2. Consider the temperature T (t ) as a function of time. Solve the differential equation
for Newton’s law of cooling
\[ \frac{dT}{dt} = k\,(E - T), \]
together with the initial condition T (0) = T0 .
Solution. As before, we transform the variable to reduce the differential equation to one that we
know how to solve. This time, we select the new variable to be $z(t) = E - T(t)$. Then, by steps
similar to previous examples, we find that
\[ \frac{dz(t)}{dt} = -kz. \]
We also rewrite the initial condition in terms of $z$, leading to $z(0) = E - T(0) = E - T_0$. After
carrying out Steps 1-3 as before, we find the solution for $T(t)$,
\[ T(t) = E + (T_0 - E)\, e^{-kt}. \tag{12.2.5} \]
Concept Check-In
1. Fill in the details for Example 12.2.2.
2. In Figure 12.6, what are the five different initial temperatures, T0 corresponding to each
solution curve?
3. In Figure 12.6, how many curves represent a heating object and how many a cooling
object?
In Figure 12.6 we show a family of curves of the form of Eqn. (12.2.5) for five different initial
temperature values (we have set E = 10 and k = 0.2 for all these curves). ♦
Next, we interpret the behaviour of these solutions.
Example 12.2.3. Explain (in words) what the form of the solution in Eqn. (12.2.5) of Newton’s law
of cooling implies about the temperature of an object as it warms or cools.
[Figure: temperature, T, plotted against time, t.]
• It is straightforward to verify that the initial temperature is $T(0) = T_0$ (substitute $t = 0$ into the
solution of Eqn. (12.2.5)). Now examine the time dependence. Only one term, $e^{-kt}$, depends
on time. Since $k > 0$, this is an exponentially decaying function, whose magnitude shrinks
with time. The whole term in which it appears, $(T_0 - E) e^{-kt}$, continually shrinks. Hence,
$T(t) \to E$ as $t \to \infty$.
Thus the temperature of the object always approaches the ambient temperature. This is evident
in the solution curves shown in Figure 12.6.
• We also observe that the direction of approach (decreasing or increasing) depends on the
sign of the constant $(T_0 - E)$. If $T_0 > E$, the temperature approaches $E$ from above, whereas
if $T_0 < E$, the temperature approaches $E$ from below.
• In the specific case that $T_0 = E$, there is no change at all. $T = E$ satisfies $dT/dt = 0$, and
corresponds to a steady state of the differential equation, as previously defined.
Concept Check-In
1. Consider three cups of coffee left in a 20°C room. If one is iced, another is piping hot,
and the third is room temperature, which cup will not change temperature? Which, thus,
represents a steady state?
Solution. We assume that body temperature just before death was 37°C (normal human body
temperature). Let $t = 0$ be the time of death. Then the initial temperature is $T(0) = T_0 = 37$°C. We
want to find the time elapsed until the body was found, i.e. the time $t$ at which the temperature of the
body had cooled down to 27°C. We assume that the ambient temperature, $E = 10$, was constant.
From Newton’s law of cooling, the body temperature satisfies
\[ \frac{dT}{dt} = k\,(10 - T). \]
From previous work and Eqn. (12.2.5), the solution to this DE is
\[ T(t) = 10 + (37 - 10)\, e^{-kt}. \]
We do not know the value of the constant $k$, but we have enough information to find it. First, at
discovery, the body’s temperature was 27°C. Hence at time $t$
\[ 27 = 10 + 27 e^{-kt} \implies 17 = 27 e^{-kt}. \]
Thus,
\[ 14 = 27 e^{-k(t+1)}. \]
We have two equations for the two unknowns $t$ and $k$. To solve for $k$, take a ratio of the sides of
the equations. Then
\[ \frac{14}{17} = \frac{27 e^{-k(t+1)}}{27 e^{-kt}} = e^{-k} \implies -k = \ln\left(\frac{14}{17}\right) = -0.194. \]
This is the constant that describes the rate of cooling of the body.
To find the time of death, $t$, use
\[ 17 = 27 e^{-kt} \implies -kt = \ln\left(\frac{17}{27}\right) = -0.4626. \]
Finally, solving for $t$, we get
\[ t = \frac{0.4626}{k} = \frac{0.4626}{0.194} = 2.384 \text{ hours.} \]
♦
Concept Check-In
1. Give the concluding sentence for Example 12.2.4. Be sure to include an actual time of
death, given that the body was discovered at midnight.
3. Use your plot to estimate how long it took for the body to cool off to 33°C.
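The arithmetic of Example 12.2.4 can be reproduced in a few lines of Python. This is only a sketch: the variable names are illustrative, and the numbers 27, 17 and 14 are read directly off the equations $17 = 27e^{-kt}$ and $14 = 27e^{-k(t+1)}$ in the solution.

```python
import math

# Excess temperatures over the ambient 10 degrees, taken from the equations
# 17 = 27 e^{-kt} (at discovery) and 14 = 27 e^{-k(t+1)} (one hour later).
initial_excess = 37.0 - 10.0        # 27
excess_at_discovery = 27.0 - 10.0   # 17
excess_one_hour_later = 14.0

# Ratio of the two equations gives e^{-k} = 14/17.
k = -math.log(excess_one_hour_later / excess_at_discovery)

# Then 17 = 27 e^{-kt} gives t = -ln(17/27) / k.
t = -math.log(excess_at_discovery / initial_excess) / k

print(f"k = {k:.3f} per hour, t = {t:.3f} hours")
```

With unrounded k the elapsed time comes out at about 2.38 hours; the value 2.384 in the text uses k rounded to 0.194.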
Suppose that $a, b > 0$ in Eqn. (12.2.6). Then we can summarize the behaviour of the solutions
(12.2.7) as follows:
• The time dependence of Eqn. (12.2.7) is contained in the term $e^{-bt}$, which (for $b > 0$) is
exponentially decreasing. As time increases, $t \to \infty$, the exponential term becomes negligibly
small, so $y \to a/b$.
• If initially $y(0) = y_0 > a/b$, then $y(t)$ approaches $a/b$ from above, whereas if $y_0 < a/b$, it
approaches $a/b$ from below.
• If initially $y_0 = a/b$, there is no change at all ($dy/dt = 0$). Thus $y = a/b$ is a steady state of
the DE in Eqn. (12.2.6).
Recognizing such general structure means that we can avoid repeating similar calculations from
scratch in related examples. Newton’s law of cooling is one representative of the class of differential
equations of the form Eqn. (12.2.6). If we set a = kE, b = k and T = y in Eqn. (12.2.6), we get
back to Eqn. (12.2.4). As expected from the general case, T approaches a/b = E, the ambient
temperature, which corresponds to a steady state of Newton’s law of cooling.
Next, we describe other examples that share this structure, and hence similar dynamic behaviour.
Friction and terminal velocity A falling object accelerates under the force of gravity, but friction
slows down this acceleration.
Note
Eqn. (12.2.8) comes from a simple force balance:
\[ ma = F_{\text{gravity}} - F_{\text{drag}}, \]
and from the assumption that $F_{\text{drag}} = \mu v$, where $\mu > 0$ is the “drag coefficient”. Dividing both
sides by $m$ and replacing $a$ by $dv/dt$ leads to this equation, with $k = \mu/m$.
The differential equation satisfied by the velocity $v(t)$ of the falling object with friction is
\[ \frac{dv}{dt} = g - kv \tag{12.2.8} \]
where $g > 0$ is acceleration due to gravity and $k > 0$ is a constant representing the effect of air
resistance. Usually, a frictional force is assumed to be proportional to the velocity of the object,
and to act in a direction that slows it down. (This accounts for the negative sign in Eqn. (12.2.8).)
Parachutes operate on the principle of enhancing that frictional force to damp out the acceleration of
a skydiver. Hence, Eqn. (12.2.8) is often called the skydiver equation.
Example 12.2.5. Use the general results for Eqn. (12.2.6) to write down the solution to the differential
equation (12.2.8) for the velocity of a skydiver given the initial condition $v(0) = v_0$. Interpret
your results in a simple description of what happens over time.
Concept Check-In
1. Assign appropriate units to each of the parameters in Example 12.2.5.
2. When a skydiver steps into the void, her initial vertical velocity is zero. Write down her
velocity v(t) based on the results of Example 12.2.5.
Solution. Eqn. (12.2.8) is of the same form as Eqn. (12.2.6), and has the same type of solutions. We
merely have to adjust the notation, by identifying
\[ v(t) \to y(t), \quad g \to a, \quad k \to b, \quad v_0 \to y_0. \]
Hence, without further calculation, we can conclude that the solution of (12.2.8) together with its
initial condition is:
\[ v(t) = \frac{g}{k} - \left(\frac{g}{k} - v_0\right) e^{-kt}. \tag{12.2.9} \]
The velocity is initially $v_0$, and eventually approaches $g/k$, which is the steady state or terminal
velocity for the object. Depending on the initial speed, the object either slows down (if $v_0 > g/k$) or
speeds up (if $v_0 < g/k$) as it approaches the terminal velocity. ♦
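To see the approach to terminal velocity in numbers, here is a minimal Python sketch of formula (12.2.9). The values g = 9.8 and k = 0.2 (in units of m/s² and 1/s) and the two initial velocities are assumptions chosen for illustration; the text does not specify them.

```python
import math

def velocity(t, g=9.8, k=0.2, v0=0.0):
    """Evaluate v(t) = g/k - (g/k - v0) * exp(-k t), i.e. formula (12.2.9)."""
    terminal = g / k
    return terminal - (terminal - v0) * math.exp(-k * t)

terminal = 9.8 / 0.2  # steady state g/k for the assumed values
print(f"terminal velocity g/k = {terminal:.2f}")

# One start below and one start above the terminal velocity.
for v0 in (0.0, 80.0):
    samples = [velocity(t, v0=v0) for t in (0, 5, 10, 20, 40)]
    print(f"v0={v0:5.1f}:", ", ".join(f"{v:6.2f}" for v in samples))
```

Both rows of output should drift toward the same value g/k, one from below and one from above, in line with the description in the solution.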
Chemical production and decay. A chemical reaction inside a fixed reaction volume produces
a substance at a constant rate Kin . A second reaction results in decay of that substance at a rate
proportional to its concentration. Let c(t ) denote the time-dependent concentration of the substance,
and assume that time is measured in units of hours. Then, writing down a balance equation leads to
a differential equation of the form
\[ \frac{dc}{dt} = K_{\text{in}} - \gamma c. \tag{12.2.10} \]
Here, the first term is the rate of production and the second term is the rate of decay. The
net rate of change of the chemical concentration is then the difference of the two. The constants
$K_{\text{in}} > 0$, $\gamma > 0$ represent the rates of production and decay. Recall that the units of each term in any
equation have to match. For example, if the concentration $c$ is measured in units of milli-Molar
(mM), then $dc/dt$ has units of mM/hr, and hence $K_{\text{in}}$ must have units of mM/hr and $\gamma$ must have
units of 1/hr.
Example 12.2.6. Write down the solution to the DE (12.2.10) given the initial condition c(0) = c0 .
Determine the steady state chemical concentration.
Regardless of its initial condition, the chemical concentration will approach the steady state
concentration $c = K_{\text{in}}/\gamma$. ♦
In this section we have seen that the behaviour found in the general case of the differential
equation (12.2.1), can be reinterpreted in each specific situation of interest. This points to one
powerful aspect of mathematics, namely the ability to use results in abstract general cases to solve a
variety of seemingly unrelated scientific problems that share the same mathematical structure.
Featured Problem 12.2.7 (Greenhouse gases and atmospheric CO2)
Climate change has been attributed partly to the accumulation of greenhouse gases (such as carbon
dioxide and methane) in the atmosphere.
Figure 12.7: CO2 is produced by emissions from burning fossil fuel and other human activities
(orange arrow). The oceans and plant biomass are both sinks that absorb CO2 (light green arrows).
Here we consider a simplified illustrative model for the carbon cycle that tracks the sources and
sinks of CO2 in the atmosphere. Consider $C(t)$ as the level of atmospheric carbon dioxide. Define
the production rate of CO2 due to utilization of fossil fuel and other human activity to be $E_{FF}$, and
let the rate of absorption of CO2 by the oceans be $S_{OCEAN}$. We will also assume that living plants
absorb CO2 at a rate proportional to their biomass and to the CO2 level.
S OLVING DIFFERENTIAL EQUATIONS 12.3 E ULER ’ S METHOD AND NUMERICAL SOLUTIONS
2. Assuming that $E_{FF}$, $S_{OCEAN}$, $\gamma$, $P$ are constants, find the steady state level of CO2 in terms of
these parameters.
Hint
CO2 is usually given in units of “parts per million”, ppm ($= 10^{-6}$); 1 ppm = 2.1 GtC.
(1 GtC = 1 gigaton carbon = $10^9$ tons.)
Time is typically given in years, so rates are “per year” (yr$^{-1}$).
Approximate parameter values:
3. Find $C(t)$, that is, predict the amount of CO2 over time, assuming that $C(0) = C_0$.
4. Graph the function $C(t)$ for parameter values given in the problem, assuming that $C_0 = 400$ ppm
= 840 GtC.
5. How big an effect would be produced on the CO2 level in 50 years if 15% of the plant biomass
is removed by deforestation just prior to $t = 0$?
Learning Objectives
• Explain how a differential equation may be solved computationally using linear approximations.
That is, explain how Euler’s method works.
• Explain what each term represents in the formula for Euler’s method.
• Use Euler’s method to solve a differential equation by hand (small number of steps).
So far, we have explored ways of understanding the behaviour predicted by a differential equation
in the form of an analytic solution, namely an explicit formula for the solution as a function of
time. However, finding such a formula is typically difficult without extensive training, and occasionally
impossible even for experts. Even if we can find such a solution, it may be inconvenient to determine
its numerical values at arbitrary times, or to interpret its behaviour.
For this reason, we sometimes need a method for computing an approximation for the desired
solution. We refer to that approximation as a numerical solution. The idea is to harness a
computational device - computer, laptop, or calculator - to find numerical values of points along
the solution curve, rather than attempting to determine the formula for the solution as a function
of time. We illustrate this process using a technique called Euler’s method, which is based on an
approximation of a derivative by the slope of a secant line.
Below, we describe how Euler’s method is used to approximate the solution to a general initial
value problem (differential equation together with initial condition) of the form
\[ \frac{dy}{dt} = f(y), \qquad y(0) = y_0. \]
Set up. We first must pick a “step size,” $\Delta t$, and subdivide the $t$ axis into discrete steps of that size.
We thus have a set of time points $t_1, t_2, \ldots$, spaced $\Delta t$ apart as shown in Figure 12.8. Our procedure
starts with the known initial value $y(0) = y_0$, and uses it to generate an approximate value at the
next time point ($y_1$), then the next ($y_2$), and so on. We denote by $y_k$ the value of the dependent
variable generated at the $k$’th time step by Euler’s method as an approximation to the (unknown)
true solution $y(t_k)$.
Figure 12.8: The time axis is subdivided into steps of size ∆t (time points t0, t1, ..., t5 shown).
Concept Check-In
1. If ∆t = 0.1 and t0 = 0, what are t1 ,t2 and t3 ?
2. Explain the difference between the value y1 and the true solution y(t1 ).
3. If ∆t is not sufficiently small, why might Euler’s method give a bad approximation to the
solution?
This approximation is reasonable only when $\Delta t$, the time step size, is small. Rearranging this
equation leads to a process (also called a recurrence relation) for linking values of the solution at
successive time points,
\[ \frac{y_{k+1} - y_k}{\Delta t} = f(y_k) \implies y_{k+1} = y_k + \Delta t \cdot f(y_k). \tag{12.3.1} \]
Application. We start with the known initial value, y0 . Then (setting the index to k = 0 in
Eqn. (12.3.1)) we obtain
y1 = y0 + f (y0 )∆t.
The quantities on the right are known, so we can compute the value of y1 , which is the approximation
to the solution y(t1 ) at the time point t1 . We can then continue to generate the value at the next time
point in the same way, by approximating the derivative again as a secant slope. This leads to
y2 = y1 + f (y1 )∆t.
Concept Check-In
1. In Euler’s method, can you determine t2 directly? That is, without first computing t1 ?
2. In Euler’s method, can you determine y2 directly? That is, without first computing y1 ?
Applying this approximation repeatedly leads to an iteration method, that is, the repeated
computation
\[ \begin{aligned} y_1 &= y_0 + f(y_0)\,\Delta t, \\ y_2 &= y_1 + f(y_1)\,\Delta t, \\ &\;\;\vdots \\ y_{k+1} &= y_k + f(y_k)\,\Delta t. \end{aligned} \]
From this iteration, we obtain the approximate values of the function $y_k \approx y(t_k)$ for as many time
steps as desired, starting from $t = 0$ in increments of $\Delta t$ up to some final time $T$ of interest.
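The iteration can be written as a short loop. The following Python sketch is one possible way to do it; the function name `euler` and the test problem $dy/dt = 1 - y$, $y(0) = 0$ with $\Delta t = 0.1$ are choices made here for illustration.

```python
def euler(f, y0, dt, n_steps):
    """Return time points t_k = k*dt and Euler values y_k for dy/dt = f(y)."""
    ts, ys = [0.0], [y0]
    for k in range(n_steps):
        ys.append(ys[-1] + dt * f(ys[-1]))  # y_{k+1} = y_k + dt * f(y_k)
        ts.append((k + 1) * dt)             # t_{k+1} = (k+1) * dt
    return ts, ys

# Illustrative test problem: dy/dt = 1 - y with y(0) = 0.
ts, ys = euler(lambda y: 1.0 - y, y0=0.0, dt=0.1, n_steps=5)
for t, y in zip(ts, ys):
    print(f"t={t:.1f}  y={y:.5f}")
```

Each pass through the loop uses only the most recent value, which is why the values must be generated in order, one step at a time.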
It is customary to use the following notations:
• h = ∆t : common notation for the step size, i.e. the distance between the points along the t
axis.
• tk : the k’th time point. Note that since the points are at multiples of the step size that we have
picked, tk = k∆t = kh.
• y(t ) : the actual value of the solution to the differential equation at time t. This is usually not
known, but in the examples discussed in this chapter, we can solve the differential equation
exactly, so we have a formula for the function y(t ). In most hard scientific problems, no such
formula is known in advance.
• $y(t_k)$ : the actual value of the solution to the differential equation at one of the discrete time
points, $t_k$ (again, not usually known).
• $y_k$ : the approximate value of the solution obtained by Euler’s method. We hope that this
approximate value is fairly close to the true value, i.e. that $y_k \approx y(t_k)$, but there is always some
error in the approximation. More advanced methods that are specifically designed to reduce
such errors are discussed in courses on numerical analysis.
Example 12.3.1. Apply Euler’s method to approximating solutions for the simple exponential
growth model that was studied in Chapter 11,
\[ \frac{dy}{dt} = ay, \qquad y(0) = y_0 \]
where a is a constant (see Eqn 11.1.2).
Concept Check-In
1. Carry out Example 12.3.1 with ∆t = 0.1, a = 1, and y0 = 1.
2. Plot the first 5 points you determine. Compare with the true solution.
3. Solve the initial value problem in Example 12.3.2 analytically. Compare the points (0, 100),
(0.1, 95), (0.2, 90.25) and (0.3, 85.7375) with the true solution at the corresponding t
values.
Solution. Subdivide the $t$ axis into steps of size $\Delta t$, starting with $t_0 = 0$, and $t_1 = \Delta t$, $t_2 = 2\Delta t, \ldots$
The first value of $y$ is known from the initial condition,
\[ y_0 = y(0) = y_0. \]
Applying Eqn. (12.3.1) with $f(y) = ay$ then gives
\[ \begin{aligned} y_1 &= y_0 + a\,\Delta t\, y_0 = y_0 (1 + a\,\Delta t), \\ y_2 &= y_1 (1 + a\,\Delta t), \\ y_3 &= y_2 (1 + a\,\Delta t), \end{aligned} \]
and so on. At every stage, the quantity on the right hand side depends only on the value of $y_k$ that is
already known from the step before. ♦
The next example demonstrates Euler’s method applied to a specific differential equation.
 k    t_k    y_k
 0    0      100.00
 1    0.1     95.00
 2    0.2     90.25
 3    0.3     85.74
 4    0.4     81.45
 5    0.5     77.38
\[ y_0 = 100, \]
\[ y_1 = y_0 (1 + a\,\Delta t) = 100\,(1 + (-0.5)(0.1)) = 95, \quad \text{etc.} \]
We show the first five values in Table 12.3. Clearly, these kinds of repeated calculations are best
handled on a spreadsheet or similar computer software.
♦
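In place of a spreadsheet, the same repeated calculation can be sketched in Python. The values a = −0.5, ∆t = 0.1 and y0 = 100 are read off the worked step above; the "exact" column uses the standard exponential solution $y(t) = y_0 e^{at}$ of $dy/dt = ay$, added here for comparison.

```python
import math

a, dt, y0 = -0.5, 0.1, 100.0   # values read off the worked step above

y = y0
rows = []
for k in range(6):
    t = k * dt
    exact = y0 * math.exp(a * t)   # analytic solution of dy/dt = a*y
    rows.append((k, t, y, exact))
    y = y * (1 + a * dt)           # Euler update y_{k+1} = y_k (1 + a*dt)

print(" k  t_k      y_k    exact")
for k, t, yk, exact in rows:
    print(f"{k:2d}  {t:3.1f}  {yk:7.2f}  {exact:7.2f}")
```

The y_k column should match the tabulated Euler values after rounding to two decimals, and the gap between the last two columns is the numerical error at each step.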
Below, we use Euler’s method to compute a solution from each of several initial conditions, T (0) =
0, 5, 15, 20 degrees.
Example 12.3.4 (Euler’s method applied to Newton’s law of cooling). Write the Euler’s method
procedure for the approximate solution to the problem in Example 12.3.3.
Figure 12.9: Euler’s method applied to Newton’s law of cooling. The graph shows the true solution (red) and the
approximate solution (black).
Example 12.3.5. Use Euler’s method from Example 12.3.4 and time steps of size ∆t = 1.0 to find
a numerical solution to the cooling problem. Use a spreadsheet for the calculations. Note that
∆t = 1.0 is not a “small step;” we use it here for illustration purposes.
In Figure 12.9 we show a typical example of the method with initial value T (0) = T0 and with the
time step size ∆t = 1.0. Black dots represent the discrete values generated by the Euler method,
starting from initial conditions, T0 = 0, 5, 15, 20. Notice that the black curve is simply made up of
line segments linking points obtained by the numerical solution.
S OLVING DIFFERENTIAL EQUATIONS 12.4 S UMMARY
Concept Check-In
1. What change would you make in the process set up in Example 12.3.5 to improve the
approximation made by Euler’s method?
On the same graph, we also show the analytic solution (red curves) given by Eqn. (12.3.2) with the
same four initial temperatures. We see that the black and red curves start out at the same points (since
they both satisfy the same initial conditions). However, the approximate solution obtained with
Euler’s method is not identical to the true solution. The difference between the two (gap between
the red and black curves) is the numerical error in the approximation.
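The size of this numerical error can be measured directly when the analytic solution is known. The sketch below applies Euler’s method to $dT/dt = k(E - T)$ and compares it with $T(t) = E + (T_0 - E)e^{-kt}$ at a single time. The values E = 10 and k = 0.2 are assumed here (they are the values quoted earlier for the family of cooling curves and may differ from those used in the figure), and T0 = 0 is one of the initial temperatures listed above.

```python
import math

E, k, T0 = 10.0, 0.2, 0.0   # assumed ambient temperature, rate constant, initial value

def euler_cooling(dt, t_end):
    """Euler approximation of T(t_end) for dT/dt = k (E - T), T(0) = T0."""
    n_steps = round(t_end / dt)
    T = T0
    for _ in range(n_steps):
        T = T + dt * k * (E - T)
    return T

def exact_cooling(t):
    """Analytic solution T(t) = E + (T0 - E) exp(-k t)."""
    return E + (T0 - E) * math.exp(-k * t)

t_end = 5.0
for dt in (1.0, 0.5, 0.1):
    err = abs(euler_cooling(dt, t_end) - exact_cooling(t_end))
    print(f"dt={dt:3.1f}  error at t={t_end}: {err:.4f}")
```

Under these assumptions the printed error shrinks as the step size is reduced, at the cost of more steps.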
12.4 Summary
1. Given a function, we can check whether it is a solution to a differential equation by performing
the appropriate differentiation and algebraic simplification.
2. Solutions to differential equations in which there is no change at all (“constant solutions”) are
referred to as steady states.
4. If we define the deviation from steady state, $z(t) = y(t) - \frac{a}{b}$, we get a decay equation for $z(t)$
that has exponentially decreasing solutions provided $b > 0$. This says that the deviation from
steady state always decreases over time.
6. For some differential equations, it is not always possible to determine an analytic solution
(explicit formula). Numerical solutions can be found using Euler’s method, and serve as an
approximate solution.
7. Euler’s method takes a known initial value $y_0$ and uses the iteration scheme
\[ y_{k+1} = y_k + \Delta t \cdot f(y_k) \]
to generate successive values of $y_k$ that approximate the solution at time points $t_k = k\Delta t$.
(a) height of water draining out of a cylindrical container (verifying a solution to a differential
equation);
(b) Newton’s law of cooling (described by a linear differential equation);
(c) growth of the radius of a cell;
[Figure: solution curves, y plotted against t.]
(a) Estimate the value that these solution curves are approaching.
(b) Which solutions are approaching from above? From below?
4. Why is a large value of ∆t not a good idea when using Euler’s method?
Q UALITATIVE METHODS FOR DIFFERENTIAL EQUATIONS
Chapter 13
Concept Check-In
1. What is meant by an analytic solution to a differential equation?
Not all differential equations are easily solved analytically. Furthermore, even when we find the
analytic solution, it is not necessarily easy to interpret, graph, or understand. This situation motivates
qualitative methods that promote an overall understanding of behaviour - directly from information
in the differential equation - without the challenge of finding a full functional form of the solution.
In this chapter we expand our familiarity with differential equations and assemble new, qualitative
techniques for understanding them. We consider differential equations in which the expression on
one side, f (y), is nonlinear, i.e. equations of the form
\[ \frac{dy}{dt} = f(y) \]
in which $f$ is more complicated than the form $a - by$. Geometric techniques, rather than algebraic
calculations, form the core of the concepts we discuss.
Concept Check-In
4. What happens in the case that k = 0? Explain under what conditions this might arise and
what happens to the population N (t ) in this case.
The case of $k > 0$ is unrealistic, since real populations cannot keep growing indefinitely in an
explosive, exponential way. Eventually running out of space or resources, the population growth
dwindles, and the population attains some static level rather than expanding forever. This motivates
a revision of our previous model to depict density-dependent growth.
Concept Check-In
5. Can the differential equation $\frac{dy}{dt} = a - by$ be written in the form (13.1.2)? If so, what are
the values of $\alpha$, $\beta$, $\gamma$?
Definition 13.1.1 (Linear differential equation). A first order differential equation is said to be linear
if it is a linear combination of terms of the form
\[ \frac{dy}{dt}, \quad y, \quad 1, \]
that is, it can be written in the form
\[ \alpha \frac{dy}{dt} + \beta y + \gamma = 0 \tag{13.1.2} \]
where α, β , γ do not depend on y. Note that “first order” means that only the first derivative (or no
derivative at all) may occur in the equation.
Q UALITATIVE METHODS FOR DIFFERENTIAL EQUATIONS 13.1 L INEAR AND NONLINEAR
DIFFERENTIAL EQUATIONS
So far, we have seen several examples of this type with constant coefficients α, β , γ. For example,
$\alpha = 1$, $\beta = -k$, and $\gamma = 0$ in Eqn. 11.1.2 whereas $\alpha = 1$, $\gamma = -a$, and $\beta = b$ in Eqn. (12.2.1). A
differential equation that is not of this form is said to be nonlinear.
Example 13.1.2 (Linear versus nonlinear differential equations). Which of the following differential
equations are linear and which are nonlinear?
\[ \text{(a)}\ \frac{dy}{dt} = y^2, \qquad \text{(b)}\ \frac{dy}{dt} - y = 5, \qquad \text{(c)}\ y \frac{dy}{dt} = -1. \]
Solution. Any term of the form $y^2$, $\sqrt{y}$, $1/y$, etc. is nonlinear in $y$. A product such as $y \frac{dy}{dt}$ is also
nonlinear in the dependent variable. Hence equations (a), (c) are nonlinear, while (b) is linear. ♦
Concept Check-In
6. For what values of α, β and γ can Example 13.1.2(b) be put into the form (13.1.2)?
The significance of the distinction between linear and nonlinear differential equations is that
nonlinearities make it much harder to systematically find a solution to the given differential equation
by “analytic” methods. Most linear differential equations have solutions that are made of exponential
functions or expressions involving such functions. This is not true for nonlinear equations.
However, as we see shortly, geometric methods are very helpful in understanding the behaviour
of such nonlinear differential equations.
8. Why does the product $a \cdot b$, rather than the sum $a + b$, appear in the Law of Mass Action?
Example 13.1.3 (Differential equation for interacting chemicals). Substance A is added at a constant
rate of I moles per hour to a 1-litre vessel. Pairs of molecules of A interact chemically to form a
product P. Write down a differential equation that keeps track of the concentration of A, denoted y(t ).
Concept Check-In
1. In each of Examples 13.1.3 and 13.1.4, clearly identify the constant quantities.
Solution. First consider the case that there is no reaction. Then, the addition of A to the reactor at a
constant rate leads to changing $y(t)$, described by the differential equation
\[ \frac{dy}{dt} = I. \]
When the chemical reaction takes place, the depletion of A depends on interactions of pairs of
molecules. By the law of mass action, the rate of reaction is of the form $k \cdot y \cdot y = ky^2$, and as it
reduces the concentration, it appears with a minus sign in the DE. Hence
\[ \frac{dy}{dt} = I - ky^2. \]
Example 13.1.4 (Logistic equation reinterpreted). Rewrite the logistic equation in the form
\[ \frac{dN}{dt} = rN - bN^2. \]
a) Interpret the meaning of this restated form of the equation by explaining what each of the terms
on the right hand side could represent.
b) Which of the two terms dominates for small versus large population levels?
Solution.
a) This form of the equation has growth term $rN$ proportional to population size, as encountered
previously in unlimited population growth. However, there is also a quadratic (nonlinear) rate of
loss (note the minus sign) $-bN^2$. This term could describe interactions between individuals that
lead to mortality, e.g. through fighting or competition.
b) From familiarity with power functions (in this case, the functions of $N$ that form the two terms,
$rN$ and $bN^2$) we can deduce that the second, quadratic term dominates for larger values of $N$,
and this means that when the population is crowded, the loss of individuals is greater than the
rate of reproduction. ♦
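The comparison of the two terms can be tabulated with a few lines of Python. The values r = 1.0 and b = 0.01 are hypothetical, chosen only so that the two terms are equal at the round number N = r/b = 100.

```python
r, b = 1.0, 0.01   # hypothetical values for illustration

def growth(N):
    """Linear growth term r*N."""
    return r * N

def loss(N):
    """Quadratic loss term b*N^2."""
    return b * N ** 2

crossover = r / b   # population level at which the two terms are equal

for N in (1, 10, 50, 100, 200, 500):
    net = growth(N) - loss(N)
    print(f"N={N:4d}  rN={growth(N):7.2f}  bN^2={loss(N):8.2f}  dN/dt={net:9.2f}")
```

Below the crossover the growth term is the larger one and dN/dt is positive; above it the quadratic loss term dominates and dN/dt is negative.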
Concept Check-In
10. Suppose an environment can sustain 2000 aphids per plant, and the current population
size on a given plant is 1700. What is K, N and y based on this information?
Solution.
a) The variable, y(t ) represents a scaled version of the population density. Instead of measuring
the population in some arbitrary units - such as number of individuals per acre, or number of
bacteria per ml - y(t ) measures the population in “multiples of the carrying capacity.”
For example, if the environment can sustain 1000 aphids per plant (so K = 1000 individuals
per plant), and the current population size on a given plant is N = 950 then the value of the
scaled variable is y = 950/1000 = 0.95. We would say that “the aphid population is at 95% of
its carrying capacity on the plant.”
b) Since K is assumed constant, it follows that
\[ N(t) = K y(t) \implies \frac{dN}{dt} = K \frac{dy}{dt}. \]
Using this, we can simplify the logistic equation as follows:
\[ \frac{dN}{dt} = rN\,\frac{(K - N)}{K} \implies K \frac{dy}{dt} = r (Ky)\, \frac{(K - Ky)}{K} \implies \frac{dy}{dt} = r y (1 - y). \tag{13.1.3} \]
♦
Q UALITATIVE METHODS FOR DIFFERENTIAL EQUATIONS 13.2 T HE GEOMETRY OF CHANGE
Eqn. (13.1.3) “looks simpler” than Eqn. (13.1.1) since it depends on only one parameter, r.
Moreover, by understanding this equation, and transforming back to the original logistic in terms
of N (t ) = Ky(t ), we can interpret results for the original model. While we do not go further with
transforming variables at present, it turns out that one can also further reduce the scaled logistic to
an equation in which r = 1 by “rescaling time units”.
Concept Check-In
12. What are the units of the parameter r?
Learning Objectives
• Find linear approximations of a solution to a DE, given a point.
• Interpret slope fields for a given differential equation and use them to roughly sketch
solutions.
In this section, we introduce a new method for understanding differential equations using
graphical and geometric arguments. Such methods circumvent the solutions that we expressed in
terms of analytic formulae. We resort to concepts learned much earlier - for example, the derivative
as a slope of a tangent line - in order to use the differential equation itself to assemble a sketch
of the behaviour that it predicts. That is, rather than writing down y = F (t ) as a solution to the
differential equation (and then graphing that function) we sketch the qualitative behaviour of such
solution curves directly from information contained in the differential equation.
§§ Slope fields
Here we discuss a geometric way of understanding what a differential equation is saying using
a slope field, also called a direction field. We have already seen that solutions to a differential
equation of the form
\[ \frac{dy}{dt} = f(y) \]
are curves in the ty-plane that describe how y(t) changes over time (thus, these curves are graphs
of functions of time). Each initial condition y(0) = y0 is associated with one of these curves, so that
together, these curves form a family of solutions.
• at any point (t, y(t )) on a solution curve, the tangent line must have slope f (y), which
depends only on the y value, and not on the time t.
Note: in more general cases, the expression f (y) that appears in the differential equation might
depend on t as well as y. For our purposes, we do not consider such examples in detail.
By sketching slopes at various values of y, we obtain the slope field through which we can get a
reasonable idea of the behaviour of the solutions to the differential equation.
\[ \frac{dy}{dt} = 2y. \tag{13.2.1} \]
Compute some of the slopes for various values of y and use this to sketch a slope field for this
differential equation.
Concept Check-In
14. Solve Differential Eqn. (13.2.1) analytically.
Solution. Equation (13.2.1) states that if a solution curve passes through a point (t, y), then its
tangent line at that point has a slope 2y, regardless of the value of t. This example is simple enough
that we can state the following: for positive values of y, the slope is positive; for negative values of
y, the slope is negative; and for y = 0, the slope is zero.
We provide some tabulated values of y indicating the values of the slope f(y), its sign, and what
this implies about the local behaviour of the solution and its direction. Then, in Figure 13.1 we
combine this information to generate the direction field and the corresponding solution curves. Note
that the direction of the arrows (rather than their absolute magnitude) provides the most important
qualitative tendency for the slope field sketch. ♦
Table 13.1: Table for the slope field diagram of differential equation (13.2.1), dy/dt = 2y, described in
Example 13.2.1.
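The tabulation described in this solution can be sketched in a few lines of Python. This is only an
illustration: the sample y values below are our own choice and are not taken from Table 13.1, and
the helper names `f`, `direction` and `slope_table` are hypothetical.

```python
def f(y):
    """Right-hand side of dy/dt = 2y, Eqn. (13.2.1)."""
    return 2 * y


def direction(slope):
    """Describe the local behaviour of a solution from the sign of its slope."""
    if slope > 0:
        return "increasing"
    if slope < 0:
        return "decreasing"
    return "constant"


# Sample y values are an illustrative assumption, not the entries of Table 13.1.
sample_ys = [-2, -1, 0, 1, 2]
slope_table = [(y, f(y), direction(f(y))) for y in sample_ys]

for y, slope, behaviour in slope_table:
    print(f"y = {y:>2}   slope f(y) = {slope:>2}   {behaviour}")
```

Each row depends only on y, which is why every arrow along a horizontal line of the slope field has
the same slope.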
Figure 13.1: Direction field and solution curves for the differential equation dy/dt = 2y described in
Example 13.2.1.
In constructing the slope field and solution curves, the following basic rules should be followed:
1. By convention, time flows from left to right along the t axis in our graphs, so the direction of
all arrows (not usually indicated explicitly on the slope field) is always from left to right.
2. According to the differential equation, for any given value of the variable y, the slope is given
by the expression f (y) in the differential equation. The sign of that quantity is particularly
important in determining whether the solution is locally increasing, decreasing, or neither. In
the tables, we indicate this in the last column with the notation ↗, ↘, or →.
3. There is a single arrow at any point in the ty-plane, and consequently solution curves cannot
intersect anywhere (although they can get arbitrarily close to one another).
Solution. Based on the last example, we focus on the sign, rather than the value of the derivative f (y),
since that sign determines whether the solutions increase, decrease, or stay constant. Recall that
factoring helps to find zeros, and to identify where an expression changes sign. For example,
\[ \frac{dy}{dt} = f(y) = y - y^3 = y(1 - y^2) = y(1 + y)(1 - y). \]
The sign of f depends on the signs of the factors y, (1 + y), (1 − y).
Concept Check-In
15. Graph the function f(y) = y(1 + y)(1 − y) and indicate where it changes sign.
16. Repeat the process for the function f(y) = y^2 (1 + y)^2 (1 − y).
For y < −1, two factors, y and (1 + y), are negative, whereas (1 − y) is positive, so that the product
is positive overall. The sign of f(y) changes at each of the three points y = 0, ±1 where one or
another of the three factors changes sign, as shown in Table 13.2. Eventually, to the right of all three
(when y > 1), the sign is negative. We summarize these observations in Table 13.2 and show the
slope field and solution curves in Figure 13.2. ♦
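The sign argument above can be checked numerically by evaluating f at one point in each interval
between the zeros. The following is a minimal sketch; the test points are our own hypothetical
choices, and any other point in the same interval would give the same sign.

```python
def f(y):
    """Right-hand side of dy/dt = y - y^3 = y(1 + y)(1 - y)."""
    return y - y**3


def sign(value):
    """Return '+', '-' or '0' according to the sign of value."""
    return "+" if value > 0 else "-" if value < 0 else "0"


zeros = [-1.0, 0.0, 1.0]               # where one of the three factors vanishes
test_points = [-2.0, -0.5, 0.5, 2.0]   # one assumed test point per interval
labels = ["y < -1", "-1 < y < 0", "0 < y < 1", "y > 1"]

signs = [sign(f(y)) for y in test_points]
for label, s in zip(labels, signs):
    print(f"{label:<12} sign of f(y): {s}")
```

The alternating pattern of signs is what produces the alternating bands of increasing and
decreasing solutions in the slope field.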
Table 13.2: Table for the slope field diagram of the DE (13.2.2) described in Example 13.2.2.
Figure 13.2: Direction field and solution curves for differential equation (13.2.2) described in
Example 13.2.2.
Example 13.2.3. Sketch a slope field and solution curves for the problem of a cooling object, and
specifically for
\[ \frac{dT}{dt} = f(T) = 0.2(10 - T). \tag{13.2.3} \]
Solution. The family of curves shown in Figure 13.3 (also Figure 12.6) are solutions to (13.2.3).
The function f(T) = 0.2(10 − T) corresponds to the slopes of tangent lines to these curves. We
indicate the sign of f(T) and thereby the behaviour of T(t) in Table 13.3. Note that there is only
one change of sign, at T = 10. For smaller T, the solution is always increasing and for larger T,
the solution is always decreasing. The slope field and solution curves are shown in Figure 13.3. In
the slope field, one particular value of t is coloured to emphasize the associated changes in T, as in
Table 13.3. ♦
QUALITATIVE METHODS FOR DIFFERENTIAL EQUATIONS 13.3 (FLAVOUR B) STATE-SPACE
DIAGRAMS
Figure 13.3: Slope field and solution curves for a cooling object that satisfies the differential
equation (13.2.3) in Example 13.2.3. (Vertical axes: temperature, T; horizontal axes: t.)
Concept Check-In
17. Indicate the regions in Figure 13.3 where T is increasing.
We observe an agreement between the detailed solutions found analytically (Example 12.2.2),
found using Euler’s method (Example 12.3.4), and those sketched using the new qualitative
arguments (Example 13.2.3).
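This agreement can be illustrated with a short comparison for Eqn. (13.2.3). The sketch below uses
the exact solution T(t) = 10 + (T0 − 10)e^(−0.2t), which can be verified by substituting it into
the differential equation, and a plain Euler iteration. The step size h = 0.1 and the initial
temperatures are assumptions made for this illustration, not values taken from the earlier examples.

```python
import math


def f(T):
    """Right-hand side of dT/dt = 0.2(10 - T), Eqn. (13.2.3)."""
    return 0.2 * (10 - T)


def analytic(T0, t):
    """Exact solution T(t) = 10 + (T0 - 10) e^(-0.2 t) with T(0) = T0."""
    return 10 + (T0 - 10) * math.exp(-0.2 * t)


def euler(T0, h, n_steps):
    """Euler's method: repeat T_{k+1} = T_k + h f(T_k) for n_steps steps."""
    T = T0
    for _ in range(n_steps):
        T = T + h * f(T)
    return T


h = 0.1          # assumed step size
n_steps = 200    # 200 steps of size 0.1 reach t = 20
for T0 in (0.0, 5.0, 15.0, 20.0):   # assumed initial temperatures
    print(T0, round(euler(T0, h, n_steps), 4), round(analytic(T0, 20.0), 4))
```

Both columns end close to T = 10, from below when T0 < 10 and from above when T0 > 10, which is
the behaviour read off from the sign of f(T) alone.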
Learning Objectives
• Explain what is meant by a “steady-state” solution.
• Sketch a state-space diagram for a given differential equation and use it to describe the
behaviour of solutions.
• Explain what it means for a steady-state solution to be “stable”. Determine the stability
of a steady state.
Definition 13.3.1 (State space diagram (or phase line)). A line representing the dependent vari-
able (y) together with arrows to describe the flow along that line (increasing, decreasing, or
stationary y) satisfying Eqn. (13.3.1) is called the state space diagram or the phase line diagram
for the differential equation.
Rather than tabulating signs for f (y), we can arrive at similar conclusions by sketching f (y)
and observing where this function is positive (implying that y increases) or negative (y decreases).
Places where f (y) = 0 (“zeros of f ”) are important since these represent steady states (“static
solutions”, where there is no change in y). Along the y axis (which is now on the horizontal axis of
the sketch) increasing y means motion to the right, decreasing y means motion to the left.
As we shall see, the information contained in this type of diagram provides a qualitative
description of solutions to the differential equation, but with the explicit time behaviour suppressed.
This is illustrated by Figure 13.4, where we show the connection between the slope field diagram
and the state space diagram for a typical differential equation.
(iii) y(0) = 2.
This means that y does not change at these steady state values, so, if we start a system off with
y(0) = 0, or y(0) = ±1, the value of y is static. The three places at which this happens are marked
by heavy dots in Figure 13.5(a).
Figure 13.5: Steady states (dots) and intervals for which y increases or decreases for the differential
equation (13.3.2), shown on plots (a) and (b) of f(y) against y. See Example 13.3.2.
Flavour B
We also see that f(y) < 0 for −1 < y < 0 and for y > 1. In these intervals, y(t) must be a
decreasing function of time (dy/dt < 0). On the other hand, for 0 < y < 1 or for y < −1, we have
f(y) > 0, so y(t) is increasing. See arrows on Figure 13.5(b). We see from this figure that there is a
tendency for y to move away from the steady state value y = 0 and to approach either of the steady
states at 1 or −1. Starting from the initial values given above, we have
Example 13.3.3 (A cooling object). Sketch the same type of diagram for the problem of a cooling
object and interpret its meaning.
Figure 13.6: Plot of f(T) against T, with the value 2 marked on the f(T) axis and 10 marked on the
T axis.
Concept Check-In
19. In Figures 13.6 and 13.7, where is the function positive?
Example 13.3.4. Create a similar qualitative sketch for the more general form of linear differential
equation
\[ \frac{dy}{dt} = f(y) = a - by. \tag{13.3.4} \]
For what values of y would there be no change?
Solution. The rate of change of y is given by the function f(y) = a − by. This is shown in
Figure 13.7. The steady state at which f(y) = 0 is at y = a/b. Starting from an initial
condition y(0) = a/b, there would be no change. We also see from this figure that y approaches this
value over time. After a long time, the value of y will be approximately a/b. ♦
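The claim that y approaches a/b can be read off from the sign of f(y). The following short check
assumes b > 0, an assumption we add here because the example does not state the sign of b:

```latex
\[
f(y) = a - by = b\left(\frac{a}{b} - y\right),
\qquad
f(y) > 0 \ \text{ when } y < \frac{a}{b},
\qquad
f(y) < 0 \ \text{ when } y > \frac{a}{b}.
\]
```

So y increases when it is below a/b and decreases when it is above a/b. If b were negative instead,
both inequalities would reverse and y would move away from a/b.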
Figure 13.7: Qualitative sketch for a DE (f(y) plotted against y, with a/b marked on the y axis).
From the last few figures, we observe that wherever the function f on the right hand side of the
differential equation crosses the horizontal axis (satisfies f = 0) there is a steady state. For example,
in Figure 13.6 this takes place at T = 10. At that temperature the differential equation specifies that
dT/dt = 0, and so T = 10 is a steady state, a concept we first encountered in Chapter 12.
Definition 13.3.5 (Steady state). A steady state is a state in which a system is not changing.
Definition 13.3.7 (Stability). We say that a steady state is stable if states that are initially close
enough to that steady state will get closer to it with time. We say that a steady state is unstable, if
states that are initially very close to it eventually move away from that steady state.
Concept Check-In
21. In the state space diagram in Figure 13.4, identify the stable steady states.
Solution. From any starting value of y > 0 in this example, we see that after a long time, the
solution curves tend to approach the value y = 1. States close to y = 1 get closer to it, so this is a
stable steady state. For the steady state y = 0, we see that initial conditions near y = 0 move away
over time. Thus, this steady state is unstable. Similarly, the steady state at y = −1 is stable. In
Figure 13.5 we show the stable steady states with black dots and the unstable steady state with an
open dot. ♦
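The same classification can be sketched in code by looking at the sign of f just to the left and
just to the right of each steady state. The example below assumes f(y) = y − y^3, the function from
Example 13.2.2, whose steady states and sign pattern match those described above. The helper name
`classify` and the offset `eps` are hypothetical choices for this illustration.

```python
def f(y):
    """Assumed right-hand side dy/dt = y - y^3, with steady states y = -1, 0, 1."""
    return y - y**3


def classify(f, y_star, eps=1e-3):
    """Classify a steady state from the sign of f on either side of it.

    eps is an assumed small offset; it must be smaller than the distance
    to the neighbouring steady states.
    """
    left, right = f(y_star - eps), f(y_star + eps)
    if left > 0 and right < 0:
        return "stable"      # flow on both sides points toward y_star
    if left < 0 and right > 0:
        return "unstable"    # flow on both sides points away from y_star
    return "neither"         # flow has the same direction on both sides


stability = {y_star: classify(f, y_star) for y_star in (-1.0, 0.0, 1.0)}
print(stability)
```

This is the state-space diagram in tabular form: arrows pointing toward a steady state from both
sides mark it as stable, and arrows pointing away mark it as unstable.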
i The scaled logistic equation, its slope field, and steady state values are discussed here.
Example 13.4.1. Find the steady states of the logistic equation, Eqn. (13.1.1):
\[ \frac{dN}{dt} = rN\,\frac{(K - N)}{K}. \]
Solution. To determine the steady states of Eqn. (13.1.1), i.e. the level of population that would not
change over time, we look for values of N such that
\[ \frac{dN}{dt} = 0. \]
This leads to
\[ rN\,\frac{(K - N)}{K} = 0, \]
which has solutions N = 0 (no population at all) or N = K (the population is at its carrying capacity).
♦
We could similarly find steady states of the scaled form of the logistic equation, Eqn. (13.1.3).
Setting dy/dt = 0 leads to
\[ 0 = \frac{dy}{dt} = ry(1 - y) \quad\Rightarrow\quad y = 0, \text{ or } y = 1. \]
This comes as no surprise since these values of y correspond to the values N = 0 and N = K.
QUALITATIVE METHODS FOR DIFFERENTIAL EQUATIONS 13.4 APPLYING QUALITATIVE
ANALYSIS TO BIOLOGICAL MODELS
i A second way to analyze the scaled logistic equation, using the phase line approach, and its
connection to the slope field method as described in Example 13.4.2.
Example 13.4.2. Draw a plot of the rate of change dy/dt versus the value of y for the scaled logistic
equation, Eqn. (13.1.3):
\[ \frac{dy}{dt} = ry(1 - y). \]
Concept Check-In
22. Circle the steady states in Figure 13.8 and identify which one is stable.
Solution. In the plot of Figure 13.8 only y ≥ 0 is relevant. In the interval 0 < y < 1, the rate
of change is positive, so that y increases, whereas for y > 1, the rate of change is negative, so y
decreases. Since y refers to population size, we need not concern ourselves with behaviour for y < 0.
From Figure 13.8 we deduce that solutions that start with a positive y value approach y = 1 with
time. Solutions starting at either steady state y = 0 or y = 1 would not change. Restated in terms of
the variable N(t), any positive initial population should approach its carrying capacity K with time. ♦
Figure 13.8: Plot of dy/dt versus y for the scaled logistic equation (13.1.3). (Vertical axis: rate of
change, dy/dt.)
We now look at the same equation from the perspective of the slope field.
Example 13.4.3. Draw a slope field for the scaled logistic equation with r = 0.5, that is for
\[ \frac{dy}{dt} = f(y) = 0.5 \cdot y(1 - y). \tag{13.4.1} \]
Solution. We generate slopes for various values of y in Table 13.4 and plot the slope field in
Figure 13.9(a). ♦
Finally, we practice Euler’s method to graph the numerical solution to Eqn. (13.4.1) from several
initial conditions.
Table 13.4: Table for slope field for the logistic equation (13.4.1). See Fig 13.9(a) for the resulting
diagram.
Figure 13.9: (a) Slope field and (b) solution curves for the logistic equation (13.4.1),
dy/dt = 0.5 · y(1 − y). (Vertical axes: population.)
Example 13.4.4 (Numerical solutions to the logistic equation). Use Euler’s method to approximate
the solutions to the logistic equation (13.4.1).
Concept Check-In
24. What initial values y0 were used in drawing the different solution curves depicted in
Figure 13.9(b)?
Solution. In Figure 13.9(b) we show a set of solution curves, obtained by solving the equation
numerically using Euler’s method. To obtain these solutions, a value of h = ∆t = 0.1 was used.
The solution is plotted for various initial conditions y(0) = y0 . The successive values of y were
calculated according to
\[ y_{k+1} = y_k + 0.5\,y_k(1 - y_k)\,h, \qquad k = 0, 1, 2, \ldots \]
P Link to Google Sheets. This spreadsheet implements Euler’s method for Example 13.4.4. A
chart showing solutions from four initial conditions is included.
From Figure 13.9(b), we see that solution curves approach the steady state y = 1, meaning that the
population N (t ) approaches the carrying capacity K for all positive starting values. A link to the
spreadsheet that implements Euler’s method is included. ♦
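For readers who prefer code to a spreadsheet, here is a minimal Python sketch of the same kind of
computation. It applies the Euler update y_{k+1} = y_k + h f(y_k) to Eqn. (13.4.1) with h = 0.1.
The four initial values are hypothetical choices made for this illustration (the values used for
Figure 13.9(b) are not stated here), and the function names are our own.

```python
def f(y, r=0.5):
    """Right-hand side of the scaled logistic equation dy/dt = r y (1 - y)."""
    return r * y * (1 - y)


def euler_solution(y0, h=0.1, t_end=10.0):
    """Return the Euler iterates y_0, y_1, ... using y_{k+1} = y_k + h f(y_k)."""
    n_steps = round(t_end / h)
    ys = [y0]
    for _ in range(n_steps):
        ys.append(ys[-1] + h * f(ys[-1]))
    return ys


# Hypothetical initial conditions, chosen on both sides of the steady state y = 1.
initial_values = [0.1, 0.5, 1.0, 1.2]
solutions = {y0: euler_solution(y0) for y0 in initial_values}

for y0, ys in solutions.items():
    print(f"y0 = {y0:.1f}  ->  y(10) is approximately {ys[-1]:.3f}")
```

Plotting each list in `solutions` against the times 0, h, 2h, ... would give curves of the kind
shown in Figure 13.9(b).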
Example 13.4.5 (Inflection points). Some of the curves shown in Figure 13.9(b) have an inflection
point, but others do not. Use the differential equation to determine which of the solution curves have
an inflection point.
Solution. We have already established that all initial values in the range 0 < y0 < 1 are associated
with increasing solutions y(t). Now we consider the concavity of those solutions.
Concept Check-In
25. How do we know that initial conditions in the range 0 < y0 < 1 lead to increasing
solutions?
\[ \frac{d^2 y}{dt^2} = r\frac{dy}{dt} - 2ry\frac{dy}{dt} = r(1 - 2y)\frac{dy}{dt}. \]
An inflection point would occur at places where the second derivative changes sign. This is possible
for dy/dt = 0 or for (1 ´ 2y) = 0. We have already dismissed the first possibility because we argued
that the rate of change is nonzero in the interval of interest. Thus we conclude that an inflection point
would occur whenever y = 1/2. Any initial condition satisfying 0 < y0 < 1/2 would eventually
pass through y = 1/2 on its way to the steady state level at y = 1, and in so doing, would have an
inflection point. ♦
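This conclusion can be checked numerically by following an Euler approximation and recording the
sign of the second derivative r(1 − 2y) dy/dt along the way. The sketch below assumes r = 0.5 as
in Eqn. (13.4.1); the step size, number of steps, and initial values are assumptions made for this
illustration.

```python
def f(y, r=0.5):
    """Right-hand side of dy/dt = r y (1 - y)."""
    return r * y * (1 - y)


def second_derivative(y, r=0.5):
    """d^2y/dt^2 = r (1 - 2y) dy/dt, evaluated along a solution."""
    return r * (1 - 2 * y) * f(y, r)


def has_inflection(y0, h=0.1, n_steps=200):
    """True if the second derivative takes both signs along the Euler solution."""
    signs = set()
    y = y0
    for _ in range(n_steps + 1):
        d2 = second_derivative(y)
        if d2 != 0:
            signs.add(d2 > 0)
        y = y + h * f(y)   # Euler step
    return signs == {True, False}


for y0 in (0.2, 0.7, 1.2):   # assumed initial values
    print(y0, has_inflection(y0))
```

Only the solution starting below y = 1/2 changes concavity, in line with the argument above.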
Featured Problem 13.4.6
Consider the aphid-ladybug problem (Example 1.4.1) with aphid density x, growth rate G(x) = rx,
and predation rate by a ladybug P(x) as in (1.4.1). (a) Write down a differential equation for the
aphid population. (b) Use your equation, and a sketch of the two functions to answer the following
question: What happens to the aphid population starting from various initial population sizes?
Hint
Growth rate (number of aphids born per unit time) contributes positively, whereas predation rate
(number of aphids eaten per unit time) contributes negatively to the rate of change of aphids
with respect to time (dx/dt).
i A video summary of the model for the spread of a disease, together with its analysis.
1. The population mixes very well, so each individual is equally likely to contact and interact
with any other individual. The contact is random.
2. Other than the state (S or I), individuals are “identical,” with the same rates of recovery and
infectivity.
3. On the timescale of interest, there is no birth, death or migration, only exchange between S
and I.
Example 13.4.7. Suppose that the process can be represented by the scheme
S + I → I + I,
I → S
The first part, transmission of disease from I to S involves interaction. The second part is recovery.
Use the assumptions above to track the two populations and to formulate a set of differential
equations for I (t ) and S(t ).
Solution. The following balance equation keeps track of individuals:
[Rate of change of I(t)] = [Rate of gain due to disease transmission] − [Rate of loss due to recovery]
According to our assumption, recovery takes place at a constant rate per unit time, denoted by µ > 0.
By the law of mass action, the disease transmission rate should be proportional to the product of
the populations, (S · I). Assigning β > 0 to be the constant of proportionality leads to the following
differential equation for the infected population:
\[ \frac{dI}{dt} = \beta S I - \mu I. \]
Similarly, we can write a balance equation that tracks the population of susceptible individuals:
[Rate of change of S(t)] = − [Rate of loss due to disease transmission] + [Rate of gain due to recovery]
Observe that loss from one group leads to (exactly balanced) gain in the other group. By similar
logic, the differential equation for S(t ) is then
\[ \frac{dS}{dt} = -\beta S I + \mu I. \]
We have arrived at a system of equations that describe the changes in each of the groups,
\[ \frac{dI}{dt} = \beta S I - \mu I, \tag{13.4.2a} \]
\[ \frac{dS}{dt} = -\beta S I + \mu I. \tag{13.4.2b} \]
♦
Concept Check-In
26. Identify any constants in Eqns. (13.4.2)(a) and (b).
From Eqns. (13.4.2) it is clear that changes in one population depend on both, which means
that the differential equations are coupled (linked to one another). Hence, we cannot “solve one”
independently of the other. We must treat them as a pair. However, as we observe in the next
examples, we can simplify this system of equations using the fact that the total population does not
change.
Example 13.4.8. Use Eqns. (13.4.2) to show that the total population does not change (hint: show
that the derivative of S(t) + I(t) is zero).
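One way to carry out the hint is to add Eqns. (13.4.2a) and (13.4.2b) term by term, as in the
following sketch:

```latex
\[
\frac{d}{dt}\bigl(S(t) + I(t)\bigr)
  = \frac{dS}{dt} + \frac{dI}{dt}
  = \bigl(-\beta S I + \mu I\bigr) + \bigl(\beta S I - \mu I\bigr)
  = 0.
\]
```

Since its derivative is zero, the sum S(t) + I(t) stays equal to its initial value for all time.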
Concept Check-In
29. Redo Example 13.4.9 but eliminate I (t ) instead of S(t ) .
30. Analyze the equation you get for dS(t )/dt as done for dI/dt in Example 13.4.10.
Solution. Since N = S(t ) + I (t ) is constant, we can write S(t ) = N ´ I (t ). Then, plugging this into
the differential equation for I (t ) we obtain
\[ \frac{dI}{dt} = \beta S I - \mu I \quad\Rightarrow\quad \frac{dI}{dt} = \beta (N - I) I - \mu I. \]
♦
Example 13.4.10. a) Show that the above equation can be written in the form
\[ \frac{dI}{dt} = \beta I (K - I), \tag{13.4.3} \]
where K is a constant.
♦
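A sketch of the algebra behind this form: factoring βI out of the right-hand side of
dI/dt = β(N − I)I − µI gives

```latex
\[
\frac{dI}{dt} = \beta (N - I) I - \mu I
             = \beta I \left( N - I - \frac{\mu}{\beta} \right)
             = \beta I (K - I),
\qquad \text{where } K = N - \frac{\mu}{\beta}.
\]
```

Because N, µ and β are constants, so is K. Note that K can be negative when N < µ/β.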
Using the above process, we have reduced the system of two differential equations for the two
variables I (t ), S(t ) to a single differential equation for I (t ), together with the statement S(t ) =
N ´ I (t ). We now examine implications of this result using the qualitative methods of this chapter.
Find the steady states of the differential equation (13.4.3) and draw a state space diagram in each of
the following cases:
(a) K ≥ 0,
(b) K < 0.
Use your diagram to determine which steady state(s) are stable or unstable.
Concept Check-In
31. What is the significance of the grey shaded regions in Fig. 13.10?
Solution. Steady states of Eqn. (13.4.3) satisfy dI/dt = βI(K − I) = 0. Hence, these steady states
are I = 0 (no infected individuals) and I = K. The latter only makes sense if K ≥ 0. We plot the
function f(I) = βI(K − I) in Eqn. (13.4.3) against the state variable I in Figure 13.10 (a) for K ≥ 0
and (b) for K < 0. Since f(I) is quadratic in I, its graph is a parabola and it opens downwards. We
add arrows pointing right (→) in the regions where dI/dt > 0 and arrows pointing left (←) where
dI/dt < 0.
In case (a), when K ≥ 0, we find that arrows point toward I = K, so this steady state is stable.
Arrows point away from I = 0, so this represents an unstable steady state. In case (b), while we still
have a parabolic graph with two steady states, the state I = K is not admissible since K is negative.
Hence only one steady state, at I = 0, is relevant biologically, and all initial conditions move towards
this state. ♦
Example 13.4.12. Interpret the results of the model in terms of the disease, assuming that initially
most of the population is in the susceptible S group, and a small number of infected individuals are
present at t = 0.
Figure 13.10: State-space diagrams for differential equation (13.4.3). Plots of f(I) as a function of
I in the cases (a) K ≥ 0, and (b) K < 0. The grey regions are not biologically meaningful since I
cannot be negative.
Solution. In case (a), as long as the initial size of the infected group is positive (I > 0), with time
it approaches K, that is, I(t) → K = N − µ/β. The rest of the population is in the susceptible
group, that is S(t) → µ/β (so that S(t) + I(t) = N is always constant). This first scenario holds
provided K > 0, which is equivalent to N > µ/β. There are then some infected and some healthy
individuals in the population indefinitely, according to the model. In this case, we say that the
disease becomes endemic.
In case (b), which corresponds to N < µ/β, we see that I(t) → 0 regardless of the initial size of
the infected group. In that case, S(t) → N, so with time, the infected group shrinks and the healthy
group grows so that the whole population becomes healthy. From these two results, we conclude
that the disease is wiped out in a small population, whereas in a sufficiently large population, it can
spread until a steady state is attained where some fraction of the population is always infected. In
fact we have identified a threshold that separates these two behaviours:
\[ \frac{N\beta}{\mu} > 1 \quad\Rightarrow\quad \text{disease becomes endemic}, \]
\[ \frac{N\beta}{\mu} < 1 \quad\Rightarrow\quad \text{disease is wiped out}. \]
Concept Check-In
34. In the case that β = 0.001 per person per day and µ = 0.1 per day, how large would the
population have to be for the disease to become endemic?
35. Frequent hand-washing can be a protective measure that decreases the spread of disease.
Which parameter of the model would this affect and in what way?
i A video summarizing the interpretation of the model and the meaning of the constant
R0 = Nβ /µ.
The ratio of constants in these inequalities, R0 = Nβ/µ, is called the basic reproduction number
for the disease. Many current and much more detailed models for disease transmission also have
such threshold behaviour, and the ratio that determines whether the disease spreads or disappears, R0,
is of great interest in vaccination strategies. This ratio represents the number of infections that arise
when 1 infected individual interacts with a population of N susceptible individuals.
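The threshold can be explored with a small calculation. The sketch below computes R0 = Nβ/µ and
the long-term size of the infected group predicted by the model, K = N − µ/β when that is positive
and 0 otherwise. The parameter values and function names are hypothetical, chosen only so that both
regimes appear.

```python
def basic_reproduction_number(N, beta, mu):
    """R0 = N * beta / mu for the model dI/dt = beta*S*I - mu*I with S + I = N."""
    return N * beta / mu


def long_term_infected(N, beta, mu):
    """Level approached by I(t) when I(0) > 0: K = N - mu/beta if K > 0, else 0."""
    K = N - mu / beta
    return K if K > 0 else 0.0


# Hypothetical parameters: beta per person per day, mu per day.
beta, mu = 0.002, 0.5
for N in (100, 1000):
    R0 = basic_reproduction_number(N, beta, mu)
    outcome = "endemic" if R0 > 1 else "wiped out"
    print(N, R0, long_term_infected(N, beta, mu), outcome)
```

With these assumed values the smaller population has R0 below 1 and the larger one has R0 above 1,
so the two populations fall on opposite sides of the threshold.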
13.5 Summary
1. A differential equation of the form α dy/dt + βy + γ = 0 is linear (and “first order”). We
encountered several examples of nonlinear DEs in this chapter.
2. A (possibly nonlinear) differential equation dy/dt = f(y) can be analyzed qualitatively by
observing where f(y) is positive, negative or zero.
3. A slope field (or “direction field”) is a collection of tangent vectors for solutions to a differential
equation. Slope fields can be sketched from f (y) without the need to solve the differential
equation.
4. A solution curve drawn in a slope field corresponds to a single solution to a differential
equation, with some initial y0 value given.
5. A state space (or “phase line” diagram) for the differential equation is a y axis, together
with arrows describing the flow (increasing/decreasing/stationary) along that axis. It can be
obtained from a sketch of f (y).
6. A steady state is stable if nearby states get closer. A steady state is unstable if nearby states
get further away with time.
7. Creating/interpreting slope field and state space diagrams is helpful in understanding the
behaviour of solutions to differential equations.
8. Applications considered in this chapter included:
(a) the logistic equations for population growth (a nonlinear differential equation, scaling,
steady state and slope field demonstration);
(b) the Law of Mass Action (a nonlinear differential equation);
(c) a cooling object (state space and phase line diagram demonstration); and
(d) disease spread model (an extensive exposition on qualitative differential equation
methods).
(a) 5 dy
dt ´ y = ´0.5 (c) dy
+ πy + ρ = 3
dx
2
(b) dy
+y+1 = 0 (d) dx
dt + x + 2 = ´3x
dt
4. Circle the stable steady states in the following state space diagram
f (y)
y
Application to Multivariable Equations
Chapter 14
(FLAVOUR C) GEOMETRY IN THREE DIMENSIONS
Before we get started doing calculus in two and three dimensions we need to brush up on some basic
geometry that we will use a lot. We are already familiar with the Cartesian plane1 , but we’ll start
from the beginning.
Learning Objectives
• Label points on the x-y-z axes and identify basic planes of constant x, y, or z.
Each point in two dimensions may be labeled by two coordinates2 (x, y) which specify the
position of the point in some units with respect to some axes as in the figure below.
1 René Descartes (1596–1650) was a French scientist and philosopher, who lived in the Dutch Republic for roughly
twenty years after serving in the (mercenary) Dutch States Army. He is viewed as the father of analytic geometry,
which uses numbers to study geometry.
2 This is why the xy-plane is called “two dimensional” — the name of each point consists of two real numbers.
GEOMETRY IN THREE DIMENSIONS 14.1 POINTS AND PLANES
Similarly, each point in three dimensions may be labeled by three coordinates (x, y, z), as in the
two figures below.
The set of all points in three dimensions is denoted R^3. The plane that contains, for example, the x-
and y-axes is called the xy-plane.
More generally,
• The set of all points (x, y, z) that obey z = c is a plane that is parallel to the xy-plane and is a
distance |c| from it. If c > 0, the plane z = c is above the xy-plane. If c < 0, the plane z = c is
below the xy-plane. We say that the plane z = c is a signed distance c from the xy-plane.
• The set of all points (x, y, z) that obey y = b is a plane that is parallel to the xz-plane and is a
signed distance b from it.
• The set of all points (x, y, z) that obey x = a is a plane that is parallel to the yz-plane and is a
signed distance a from it.
3 Not surprisingly, the 2 in R^2 signifies that each point is labelled by two numbers and the R in R^2 signifies that the
numbers in question are real numbers. There are more advanced applications (for example in signal analysis and
in quantum mechanics) where complex numbers are used. The space of all pairs (z_1, z_2), with z_1 and z_2 complex
numbers is denoted C^2.
More generally, the distance from the point (x, y, z) to the point (x_1, y_1, z_1) is
\[ \sqrt{(x - x_1)^2 + (y - y_1)^2 + (z - z_1)^2} \]
Notice that this gives us the equation for a sphere quite directly. All the points on a sphere are
equidistant from the centre of the sphere. So, for example, the equation of the sphere centered on
(1, 2, 3) with radius 4, that is, the set of all points (x, y, z) whose distance from (1, 2, 3) is 4, is
\[ (x - 1)^2 + (y - 2)^2 + (z - 3)^2 = 16 \]
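As a quick numerical illustration of the distance formula and the sphere equation, the sketch
below checks whether a few points lie on the sphere centred on (1, 2, 3) with radius 4. The
function names and the tolerance `tol` are our own assumptions for this example.

```python
import math


def distance(p, q):
    """Distance between two points p and q, each given as (x, y, z)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def on_sphere(point, centre, radius, tol=1e-9):
    """True if point is at distance radius from centre, up to the tolerance tol."""
    return abs(distance(point, centre) - radius) < tol


centre, radius = (1, 2, 3), 4
print(on_sphere((5, 2, 3), centre, radius))   # 4 units from the centre along x
print(on_sphere((1, 2, 7), centre, radius))   # 4 units from the centre along z
print(on_sphere((1, 2, 3), centre, radius))   # the centre itself
```

A point satisfies the sphere equation exactly when its distance from the centre equals the radius,
which is why the centre itself fails the test.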
If you’re having a hard time picturing the three-dimensional axes, Appendix section 14.1.1 will
lead you through folding a model out of a piece of paper.
3. Your paper now has a triangle sitting on top of a rectangle. Where the triangle ends, make a
crease in the underlying rectangle shapes.
4. Your paper has four layers, with the triangle shapes on top. Open the paper so that three layers
are on top, and one is on the bottom. The result should look like the inside corner of a box.
Your octant is created! The vertical crease is the z axis, the crease to the left is the x axis, and the
crease to the right is the y axis. In the picture below, the blue sphere indicates that the octant is open
towards you: if you were to put a marble inside the paper structure, it would sit as shown.
To practice with your octant, label the following points directly on the paper:
• (1, 1, 0)
• (0, 1, 1)
• (1, 0, 1)
The next collection of points will exist out in space, not on any of the paper sides. Point to their
positions relative to your octant:
• (1, 1, 1)
• (1, 2, 3)
• (1, −1, 1)
• (1, 1, −1)
Learning Objectives
• Given a simple function of two variables, z = f (x, y), evaluate z values for given pairs
(x, y).
First, a quick review of dependent and independent variables. Independent variables are the variables
we think of as changing somehow on their own; the dependent variables are the variables whose
change we think of as being caused by the independent variables. For example, if you want to
describe the relationship between the age of a cup of cottage cheese, and the number of bacteria in
that cup, we generally choose age (time) to be the independent variable and population of bacteria to
be the dependent variable: we think of age changing on its own, then that age causing the bacterial
population to change.
We could of course go the other way, and write time as a function of bacteria. This could be
useful if we were trying to figure out how old the cheese was by counting its bacteria. So the
difference between an independent variable and a dependent variable has to do with how we want to
interpret a function.
GEOMETRY IN THREE DIMENSIONS 14.2 FUNCTIONS OF TWO VARIABLES
In a single-variable function, by convention we write
y = f(x)
where y is the dependent variable and x is the independent variable. Similarly, in a two-variable
function, we generally write
z = f (x, y)
We think of the variables x and y as independent, and the variable z as dependent.
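As a small illustration of evaluating $z = f(x, y)$ at given pairs, here is a minimal Python sketch. The particular function $f(x, y) = x^2 + 3y$ is a made-up example chosen only for illustration; it does not come from the text.

```python
def f(x, y):
    """A hypothetical two-variable function, chosen only for illustration."""
    return x**2 + 3 * y

# Each input is an ordered pair (x, y); each output is a single number z.
pairs = [(0, 0), (1, 2), (-2, 1)]
z_values = [f(x, y) for (x, y) in pairs]
print(z_values)
```

Here x and y play the role of the independent variables, and the computed z is the dependent variable.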
If we’re not too concerned with independent vs dependent variables; or if the relationship
between the dependent and independent variables is difficult (or impossible) to write explicitly in
this form; then we can also define multivariable functions implicitly. For example, in the equation
$$z^3 x + z^2 y + xyz - 1 = 0$$
we can think of z as an implicitly defined function of x and y. You’ve already seen two families of
implicitly defined functions: planes and spheres.
Example 14.2.1
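Which points (1, y, 1) in $\mathbb{R}^3$ satisfy the equation
$$z^3 x + z^2 y + xyz - 1 = 0\ ?$$
Solution. Setting x = 1 and z = 1 in the equation gives
$$1 + y + y - 1 = 0$$
so 2y = 0 and y = 0. The only such point is (1, 0, 1).
It can be tempting to look at functions like
$$f(x, y) = \sin(x + y)$$
or
$$g(x, y) = e^{x^2 + y^2}$$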
and think that the sine and exponential functions are different from the sine and exponential functions we’ve seen in two dimensions. They aren’t! When x and y are real numbers, then $(x + y)$ and $(x^2 + y^2)$ are real numbers as well. We’re taking the sine of a real number in the first equation, and e to a real power in the second equation, just as we always have.
Functions of two (or more) variables are not so different from functions of one variable in other
ways as well.
Let f (x, y) be a function that takes pairs of real numbers as inputs, and gives a real number
as its output.
The set of points (x, y) that can be input to f is the domain of that function. The set of
outputs of f over its entire domain is the range of that function.
Example 14.2.3
Find the domain and range of the function
$$f(x, y) = \sqrt{e^{x^2 + y^2} - 2}$$
Solution. There are three operations in our function: exponentiation, subtraction, and taking of a
square root. We can subtract anything from anything; and we can raise e to any power. So the only
thing that could “break” our function is if we tried to take the square root of a negative number. This
tells us that, in order for f (x, y) to be defined, we need
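$$e^{x^2 + y^2} - 2 \ge 0 \implies e^{x^2 + y^2} \ge 2 \implies x^2 + y^2 \ge \ln 2$$
One way of describing the domain of this function is to call it “all points (x, y) with $x^2 + y^2 \ge \ln 2$.” A more standard way is to describe the shape this set makes in $\mathbb{R}^2$: all points on or outside the circle of radius $\sqrt{\ln 2}$ centred at the origin.
[Figure: the domain, shaded, consisting of all points on or outside the circle of radius $\sqrt{\ln 2}$.]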
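To help you visualize what we mean, take a point in the shaded area above. For example, (1, .5). If we plug that into our function, it causes no problems:
$$f(1, .5) = \sqrt{e^{1^2 + .5^2} - 2} = \sqrt{e^{1.25} - 2} \approx \sqrt{1.49} \approx 1.22$$
On the other hand, take a point in the white area. For example, (.5, .5). If we try to plug this into our function, we end up with
$$f(.5, .5) = \sqrt{e^{.5^2 + .5^2} - 2} = \sqrt{e^{0.5} - 2} \approx \sqrt{1.65 - 2} \approx \sqrt{-0.35}$$
which is not a real number.
[Figure: the circle of radius $\sqrt{\ln 2}$, with the point (1, .5) in the shaded region outside it and the point (.5, .5) in the white region inside it.]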
Now, let’s think about range. By choosing larger and larger values of x and y, we can make $x^2 + y^2$ into larger and larger numbers. So within our restricted domain, the range of $x^2 + y^2$ is $[\ln 2, \infty)$; so the range of $e^{x^2 + y^2}$ is $[e^{\ln 2}, \infty) = [2, \infty)$; so the range of $e^{x^2 + y^2} - 2$ is $[0, \infty)$; so the range of f(x, y) is $[0, \infty)$.
Again, note that the domain of f consists of ordered pairs of real numbers, while its range
consists of real numbers.
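The domain test can be checked numerically. The following is a minimal sketch, assuming the function in this example is $f(x, y) = \sqrt{e^{x^2+y^2} - 2}$, as the computations in the solution indicate; the helper name `in_domain` is introduced here only for illustration.

```python
import math

def f(x, y):
    """f(x, y) = sqrt(e^(x^2 + y^2) - 2); raises ValueError outside the domain."""
    inside = math.exp(x**2 + y**2) - 2
    if inside < 0:
        raise ValueError(f"({x}, {y}) is outside the domain")
    return math.sqrt(inside)

def in_domain(x, y):
    """Domain condition derived in the solution: x^2 + y^2 >= ln 2."""
    return x**2 + y**2 >= math.log(2)

print(in_domain(1, 0.5), f(1, 0.5))   # a point in the shaded region
print(in_domain(0.5, 0.5))            # a point in the white region
```

The two sample points are the ones used in the solution: (1, .5) passes the test and can be evaluated, while (.5, .5) fails it.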
Example 14.2.3
Example 14.2.4
Find the domain and range of the function
$$f(x, y) = \sin\left(\frac{x}{\sqrt{y}}\right)$$
Solution. Let’s start with domain. We can take the sine of any number we like, so that part of the
function doesn’t limit the domain. The things limiting the domain are that we cannot take the square
root of a negative number, and we can’t divide by zero.
• Because we can’t take the square root of a negative number, we must have $y \ge 0$.
• Because we can’t divide by 0, we must have $\sqrt{y} \ne 0$, i.e. $y \ne 0$.
Combining these restrictions, we can only have values of y in the interval $(0, \infty)$; x can be any real number. So, our domain is the upper half of the xy plane, excluding the x-axis.
In general, the range of sin x is $[-1, 1]$. So, we certainly can’t get a larger range than this. We should check that our range is no smaller. When y = 1, our function becomes $f(x, 1) = \sin(x/1) = \sin x$. Since x can be any real number, indeed the range of our function is $[-1, 1]$.
Example 14.2.4
Example 14.2.5
Find the domain and range of the function
$$f(x, y) = \ln\big(\arctan(x + y)\big)$$
Solution. First, let’s think about the arctangent and logarithm function in the context of single-variable functions. The domain of arctangent is all real numbers, and its range is $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$. The domain of the natural logarithm is all positive numbers, and its range is all real numbers.
[Figure: the graphs of $z = \arctan t$ and $z = \ln t$.]
Since only positive numbers may be input into the natural logarithm, we require $\arctan(x + y) > 0$. That requires $(x + y) > 0$. So, our domain is the collection of all points (x, y) such that $x + y > 0$; put another way, all points above the line $y = -x$.
If our domain is points (x, y) such that $x + y > 0$, then the range of the function (x + y) is $(0, \infty)$; so the numbers being plugged into the arctangent function are $(0, \infty)$. So, the numbers coming out of the arctangent function are $\left(0, \frac{\pi}{2}\right)$. Then the numbers from $\left(0, \frac{\pi}{2}\right)$ are being input into the natural logarithm function, leading to a range of the entire function of $\left(-\infty, \ln\frac{\pi}{2}\right)$.
[Figure: two graphs. Left: if $0 < t$, then $0 < \arctan t < \frac{\pi}{2}$. Right: if $0 < t < \frac{\pi}{2}$, then $-\infty < \ln t < \ln\frac{\pi}{2}$.]
Example 14.2.5
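The range found above can be probed numerically. This is a rough sketch, assuming the function in this example is $f(x, y) = \ln(\arctan(x + y))$, as the solution indicates; the sample inputs are arbitrary choices.

```python
import math

def f(x, y):
    """f(x, y) = ln(arctan(x + y)), defined only when x + y > 0."""
    if x + y <= 0:
        raise ValueError("need x + y > 0")
    return math.log(math.atan(x + y))

# ln(pi/2) is the upper end of the range; it is approached but never attained.
upper = math.log(math.pi / 2)

# Inputs with x + y very small, moderate, and very large.
samples = [f(t, 0.0) for t in (1e-6, 1.0, 1e6)]
print(samples, upper)
```

Very small values of x + y give very negative outputs, and very large values of x + y give outputs just below $\ln\frac{\pi}{2}$, consistent with the range $\left(-\infty, \ln\frac{\pi}{2}\right)$.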
We may sometimes restrict the domain of a function more than is mathematically necessary in
order for it to make sense in a model. For example, we may have a function that only makes sense
in our model when it gives positive values. In this case, we might restrict the domain to a model
G EOMETRY IN T HREE D IMENSIONS 14.3 S KETCHING SURFACES IN 3D
domain, the set of inputs for which the function is not only defined, but sensible in the context of
our model.
Example 14.2.6
A large pharmaceutical company determines its research budget for a new vaccine according to the
formula
R(x, y) = ln(xy)
where x is the size of the customer base they expect to have and y is the revenue they expect per
dose.
Then for each variable x, y, and R, negative values don’t make sense in the model. So although we could compute $R(-1, -1) = \ln 1 = 0$, and we could compute $R(0.5, 0.5) \approx -1.39$, they wouldn’t be sensible in the context of our model.
• Since x and y need to be nonnegative, we will only consider points (x, y) in the first quadrant of the Cartesian plane: $x \ge 0$ and $y \ge 0$.
• Since R needs to be nonnegative, we will further restrict $xy \ge 1$. That is, $y \ge \frac{1}{x}$.
The two restrictions above give us the model domain shaded below.
[Figure: the model domain, the part of the first quadrant on or above the curve $y = \frac{1}{x}$.]
Depending on the specifics of how the function is being used, the model domain may be restricted
even further. For example, perhaps the firm has a maximum budget for any given project; perhaps
the amount they can charge is limited by law; etc.
Example 14.2.6
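The difference between where R is defined and where it is sensible in the model can be written out as a small check. This is only a sketch; the helper name `in_model_domain` is introduced here for illustration.

```python
import math

def R(x, y):
    """Research budget from the example: R(x, y) = ln(xy)."""
    return math.log(x * y)

def in_model_domain(x, y):
    """Model domain from the example: x >= 0, y >= 0 and xy >= 1 (so that R >= 0)."""
    return x >= 0 and y >= 0 and x * y >= 1

for point in [(-1, -1), (0.5, 0.5), (2, 1)]:
    print(point, R(*point), in_model_domain(*point))
```

The first two points can be plugged into the formula but fall outside the model domain; the third lies inside it.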
Learning Objectives
In Math 100, you won’t be asked to produce sketches of 3D surfaces, so there are no learning
objectives associated with this section. However, you will be shown such sketches. Understanding how they can be produced can help you deepen and solidify your understanding of
In practice students taking multivariable calculus regularly have great difficulty visualising
surfaces in three dimensions, despite the fact that we all live in three dimensions. We’ll now develop
some technique to help us sketch surfaces in three dimensions5 .
We all have a fair bit of experience drawing curves in two dimensions. Typically the intersection
of a surface (in three dimensions) with a plane is a curve lying in the (two dimensional) plane. Such
an intersection is usually called a cross-section. In the special case that the plane is one of the
coordinate planes, or parallel to one of the coordinate planes, the intersection is sometimes called a
trace.
Definition 14.3.1.
The trace of a surface is the intersection of that surface with a plane that is parallel to one
of the coordinate planes.
So, one kind of trace (the intersection with a plane parallel to the xy-plane) is found by setting z equal to a constant; another kind (the intersection with a plane parallel to the yz-plane) is found by setting x equal to a constant; and the final kind (the intersection with a plane parallel to the xz-plane) is found by setting y equal to a constant.
One can often get a pretty good idea of what a surface looks like by sketching a bunch of
cross-sections. Here are some examples.
Example 14.3.2 $4x^2 + y^2 - z^2 = 1$
Sketch the surface that satisfies $4x^2 + y^2 - z^2 = 1$.
Solution. We’ll start by fixing any number $z_0$ and sketching the part of the surface that lies in the horizontal plane $z = z_0$.
The intersection of our surface with that horizontal plane is a horizontal cross-section. Any point (x, y, z) lying on that horizontal cross-section satisfies both
$$z = z_0 \text{ and } 4x^2 + y^2 - z^2 = 1 \iff z = z_0 \text{ and } 4x^2 + y^2 = 1 + z_0^2$$
5 Of course you could instead use some fancy graphing software, but part of the point is to build intuition. Not to
mention that you can’t use fancy graphing software on your exam.
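This is an ellipse with x semi-axis $\frac12\sqrt{1 + z_0^2}$ and y semi-axis $\sqrt{1 + z_0^2}$.
[Figure: the ellipse $4x^2 + y^2 = 1 + z_0^2$ in the xy-plane, passing through the points $\left(\frac12\sqrt{1 + z_0^2},\, 0\right)$ and $\left(0,\, \sqrt{1 + z_0^2}\right)$.]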
Remember that this ellipse is the part of our surface that lies in the plane z = z0 . Imagine that the
sketch of the ellipse is on a single sheet of paper. Lift the sheet of paper up, move it around so that
the x- and y-axes point in the directions of the three dimensional x- and y-axes and place the sheet of
paper into the three dimensional sketch at height z0 . This gives a single horizontal ellipse in 3d, as
in the figure below.
[Figure: a single horizontal ellipse drawn at height $z = z_0$ in three dimensions.]
We can build up the full surface by stacking many of these horizontal ellipses — one for each
possible height z0 . So we now draw a few of them as in the figure below. To reduce the amount of
clutter in the sketch, we have only drawn the first octant (i.e. the part of three dimensions that has
$x \ge 0$, $y \ge 0$ and $z \ge 0$).
6 The semi-axes of an ellipse are the line segments from the centre of the ellipse to the farthest point on the curve
and to the nearest point on the curve. For a circle the lengths of both of these line segments are just the radius.
[Figure: first-octant parts of the horizontal ellipses at heights z = 1, z = 2 and z = 3.]
Here is why it is OK, in this case, to just sketch the first octant. Replacing x by −x in the equation $4x^2 + y^2 - z^2 = 1$ does not change the equation. That means that a point (x, y, z) is on the surface if and only if the point (−x, y, z) is on the surface. So the surface is invariant under reflection in the yz-plane. Similarly, the equation $4x^2 + y^2 - z^2 = 1$ does not change when y is replaced by −y or z is replaced by −z. Our surface is also invariant under reflection in the xz- and xy-planes. Once we have the part in the first octant, the remaining octants can be obtained simply by reflecting about the coordinate planes.
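The stacking of horizontal ellipses and the reflection symmetry can both be checked with a few lines of Python. This is a sketch under the assumption that the cross-section at height $z_0$ is the ellipse $4x^2 + y^2 = 1 + z_0^2$, as derived above; the function names and the angle parametrisation are illustrative choices, not part of the text.

```python
import math

def on_surface(x, y, z, tol=1e-9):
    """True when (x, y, z) satisfies 4x^2 + y^2 - z^2 = 1, up to rounding."""
    return abs(4 * x**2 + y**2 - z**2 - 1) < tol

def semi_axes(z0):
    """(x semi-axis, y semi-axis) of the ellipse 4x^2 + y^2 = 1 + z0^2."""
    r = math.sqrt(1 + z0**2)
    return (r / 2, r)

def ellipse_point(z0, theta):
    """A point on the horizontal cross-section at height z0."""
    a, b = semi_axes(z0)
    return (a * math.cos(theta), b * math.sin(theta), z0)

p = ellipse_point(2.0, 0.7)
# Flipping the sign of any coordinate keeps the point on the surface.
reflections = [(sx * p[0], sy * p[1], sz * p[2])
               for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)]
print(all(on_surface(*q) for q in reflections))
```

All eight sign combinations of a first-octant point land back on the surface, which is why drawing only the first octant loses no information.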
We can get a more visually meaningful sketch by adding in some vertical cross-sections. The
x = 0 and y = 0 cross-sections (also called traces — they are the parts of our surface that are in the
yz- and xz-planes, respectively) are
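$$x = 0,\ y^2 - z^2 = 1 \qquad\text{and}\qquad y = 0,\ 4x^2 - z^2 = 1$$
These equations describe hyperbolae⁷. If you don’t remember how to sketch them, don’t worry. We’ll do it now. We’ll first sketch them in 2d. Since
$$y^2 = 1 + z^2 \implies |y| \ge 1 \text{ and } y = \pm 1 \text{ when } z = 0 \text{ and, for large } z,\ y \approx \pm z$$
$$4x^2 = 1 + z^2 \implies |x| \ge \tfrac12 \text{ and } x = \pm\tfrac12 \text{ when } z = 0 \text{ and, for large } z,\ x \approx \pm\tfrac12 z$$
the sketches are
[Figure: the hyperbola $y^2 - z^2 = 1$ in the yz-plane, with the asymptote z = y marked, and the hyperbola $4x^2 - z^2 = 1$ in the xz-plane.]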
Now we’ll incorporate them into the 3d sketch. Once again imagine that each is a single sheet of
paper. Pick each up and move it into the 3d sketch, carefully matching up the axes. The red (blue)
parts of the hyperbolas above become the red (blue) parts of the 3d sketch below (assuming of course
that you are looking at this on a colour screen).
[Figure: the 3d sketch with the horizontal ellipses at z = 1, z = 2 and z = 3 and the two hyperbolae added.]
Now that we have a pretty good idea of what the surface looks like we can clean up and simplify the
sketch. Here are a couple of possibilities.
Example 14.3.3 $4x^2 + y^2 - z^2 = -1$
Sketch the surface that satisfies $4x^2 + y^2 - z^2 = -1$.
Solution. As in the last example, we’ll start by fixing any number $z_0$ and sketching the part of the surface that lies in the horizontal plane $z = z_0$. The intersection of our surface with that horizontal plane is
$$z = z_0 \text{ and } 4x^2 + y^2 = z_0^2 - 1$$
Think of $z_0$ as a constant.
• If $|z_0| < 1$, then $z_0^2 - 1 < 0$ and there are no solutions to $4x^2 + y^2 = z_0^2 - 1$.
• If $|z_0| = 1$, there is exactly one solution, namely x = y = 0.
• If $|z_0| > 1$ then $4x^2 + y^2 = z_0^2 - 1$ is an ellipse with x semi-axis $\frac12\sqrt{z_0^2 - 1}$ and y semi-axis $\sqrt{z_0^2 - 1}$. These semi-axes are small when $|z_0|$ is close to 1 and grow as $|z_0|$ increases.
The first octant parts of a few of these horizontal cross-sections are drawn in the figure below.
[Figure: first-octant parts of the horizontal cross-sections at z = 1.02, z = 2 and z = 3.]
Next we add in the x = 0 and y = 0 cross-sections (i.e. the parts of our surface that are in the yz-
and xz-planes, respectively)
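$$x = 0,\ z^2 = 1 + y^2 \qquad\text{and}\qquad y = 0,\ z^2 = 1 + 4x^2$$
[Figure: the first-octant sketch with the horizontal cross-sections and the x = 0 and y = 0 cross-sections added.]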
Now that we have a pretty good idea of what the surface looks like we clean up and simplify the
sketch.
Example 14.3.4 $yz = 1$
Sketch the surface $yz = 1$.
Solution. This surface has a special property that makes it relatively easy to sketch. There are no
x’s in the equation yz = 1. That means that if some $y_0$ and $z_0$ obey $y_0 z_0 = 1$, then the point $(x, y_0, z_0)$ lies on the surface yz = 1 for all values of x. As x runs from $-\infty$ to $\infty$, the point $(x, y_0, z_0)$ sweeps out a straight line parallel to the x-axis. So the surface yz = 1 is a union of lines parallel to the x-axis.
It is invariant under translations parallel to the x-axis. To sketch yz = 1, we just need to sketch its
intersection with the yz-plane and then translate the resulting curve parallel to the x-axis to sweep
out the surface.
We’ll start with a sketch of the hyperbola yz = 1 in two dimensions.
[Figure: the hyperbola yz = 1 in the yz-plane.]
Next we’ll move this 2d sketch into the yz-plane, i.e. the plane x = 0, in 3d, except that we’ll only
draw in the part in the first octant.
Example 14.3.4
Example 14.3.5 $xyz = 4$
Sketch the surface $xyz = 4$.
Solution. We’ll sketch this surface using much the same procedure as we used in Examples 14.3.2
and 14.3.3. We’ll only sketch the part of the surface in the first octant. The remaining parts (in the octants with $x, y < 0$, $z \ge 0$, with $x, z < 0$, $y \ge 0$ and with $y, z < 0$, $x \ge 0$) are just reflections of the first octant part.
As usual, we start by fixing any number z0 and sketching the part of the surface that lies in the
horizontal plane z = z0 . The intersection of our surface with that horizontal plane is the hyperbola
$$z = z_0 \text{ and } xy = \frac{4}{z_0}$$
Note that $x \to \infty$ as $y \to 0$ and that $y \to \infty$ as $x \to 0$. So the hyperbola has both the x-axis and the y-axis as asymptotes, when drawn in the xy-plane. The first octant parts of a few of these horizontal cross-sections (namely, $z_0 = 4$, $z_0 = 2$ and $z_0 = \frac12$) are drawn in the figure below.
[Figure: first-octant parts of the horizontal cross-sections at z = 4, z = 2 and z = 1/2.]
Next we add some vertical cross-sections. We can’t use x = 0 or y = 0 because any point on xyz = 4
must have all of x, y, z nonzero. So we use
x = 4, yz = 1 and y = 4, xz = 1
[Figure: the first-octant sketch with the vertical cross-sections x = 4 and y = 4 added.]
Example 14.3.5
Often the reason you are interested in a surface in 3d is that it is the graph z = f (x, y) of a
function of two variables f (x, y). Another good way to visualize the behaviour of a function f (x, y)
is to sketch what are called its level curves.
Definition 14.3.6.
A level curve of f (x, y) is a curve whose equation is f (x, y) = C, for some constant C.
A level curve is the set of points in the xy-plane where f takes the value C. Because it is a curve
in 2d, it is usually easier to sketch than the graph of f . Here are a couple of examples.
Example 14.3.7 $f(x, y) = x^2 + 4y^2 - 2x + 2$
Sketch the level curves of $f(x, y) = x^2 + 4y^2 - 2x + 2$.
Solution. Fix any real number C. Then, for the specified function f, the level curve f(x, y) = C is the set of points (x, y) that obey
$$x^2 + 4y^2 - 2x + 2 = C \iff x^2 - 2x + 1 + 4y^2 + 1 = C \iff (x - 1)^2 + 4y^2 = C - 1$$
Now $(x - 1)^2 + 4y^2$ is the sum of two squares, and so is always at least zero. So if $C - 1 < 0$, i.e. if $C < 1$, there is no curve f(x, y) = C. If $C - 1 = 0$, i.e. if C = 1, then $(x - 1)^2 + 4y^2 = C - 1 = 0$ if and only if both $(x - 1)^2 = 0$ and $4y^2 = 0$ and so the level curve consists of the single point (1, 0). If $C > 1$, then f(x, y) = C becomes $(x - 1)^2 + 4y^2 = C - 1 > 0$ which describes an ellipse centred on (1, 0). It intersects the x-axis when y = 0 and
$$(x - 1)^2 = C - 1 \iff x - 1 = \pm\sqrt{C - 1} \iff x = 1 \pm \sqrt{C - 1}$$
and it intersects the line x = 1 (i.e. the vertical line through the centre) when
$$4y^2 = C - 1 \iff 2y = \pm\sqrt{C - 1} \iff y = \pm\tfrac12\sqrt{C - 1}$$
So, when $C > 1$, f(x, y) = C is the ellipse centred on (1, 0) with x semi-axis $\sqrt{C - 1}$ and y semi-axis $\frac12\sqrt{C - 1}$. Here is a sketch of some representative level curves of $f(x, y) = x^2 + 4y^2 - 2x + 2$.
[Figure: the level curves f = 1, f = 2, f = 5, f = 10 and f = 17, centred on the point (1, 0) on the line x = 1.]
Example 14.3.7
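The case analysis in this example translates directly into code. The following is a minimal sketch; the function names and the tuple-based return format are choices made here for illustration.

```python
import math

def f(x, y):
    return x**2 + 4 * y**2 - 2 * x + 2

def level_curve(C):
    """Describe the level curve f(x, y) = C, following the three cases in the solution."""
    if C < 1:
        return ("empty",)
    if C == 1:
        return ("point", (1.0, 0.0))
    r = math.sqrt(C - 1)
    return ("ellipse", r, r / 2)   # x semi-axis, y semi-axis; centre (1, 0)

def point_on_level_curve(C, theta):
    """For C > 1, a point on the ellipse, parametrised by an angle theta."""
    _, a, b = level_curve(C)
    return (1 + a * math.cos(theta), b * math.sin(theta))

for C in (0, 1, 2, 5, 10, 17):
    print(C, level_curve(C))
```

Plugging a point returned by `point_on_level_curve(C, theta)` back into `f` should reproduce C up to rounding, which is a quick sanity check on the semi-axes.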
If you’ve ever used a topographic map, you’ve seen examples of level curves. Modelling the
z-axis as a measure of elevation, with z = 0 as sea level, the contours shown on topographic maps
show the level curves associated with different elevations. The example8 below shows the area
around Gambier, Anvil, and Keats Islands, north of UBC. The lines show level curves for z = 0
metres, z = 100 metres, z = 200 metres, etc.
8 generated by Natural Resources Canada’s Atlas of Canada - Toporama, included under an open government license
Example 14.3.8 $e^{x+y+z} = 1$
The function f(x, y) is given implicitly by the equation $e^{x+y+z} = 1$. Sketch the level curves of f.
Solution. This one is not as nasty as it appears. That “f(x, y) is given implicitly by the equation $e^{x+y+z} = 1$” means that, for each x, y, the solution z of $e^{x+y+z} = 1$ is f(x, y). So, for the specified function f and any fixed real number C, the level curve f(x, y) = C is the set of points (x, y) that obey
$$e^{x+y+C} = 1 \iff x + y + C = 0$$
This is of course a straight line. It intersects the x-axis when y = 0 and $x = -C$ and it intersects the y-axis when x = 0 and $y = -C$. Here is a sketch of some level curves.
[Figure: the level curves f = −3, f = −2, f = −1, f = 0, f = 1, f = 2 and f = 3.]
Example 14.3.8
We have just seen that sketching the level curves of a function f (x, y) can help us understand the
behaviour of f . We can generalise this to functions F (x, y, z) of three variables. A level surface of
F (x, y, z) is a surface whose equation is of the form F (x, y, z) = C for some constant C. It is the set
of points (x, y, z) at which F takes the value C.
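Example 14.3.9 $F(x, y, z) = x^2 + y^2 + z^2$
Let $F(x, y, z) = x^2 + y^2 + z^2$. If $C > 0$, then the level surface F(x, y, z) = C is the sphere of radius $\sqrt{C}$ centred on the origin. Here is a sketch of the parts of the level surfaces F = 1 (radius 1), F = 4 (radius 2) and F = 9 (radius 3) that are in the first octant.
[Figure: first-octant parts of the spheres F = 1, F = 4 and F = 9.]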
Example 14.3.9
Example 14.3.10 $F(x, y, z) = x^2 + z^2$
Let $F(x, y, z) = x^2 + z^2$ and $C > 0$. Consider the level surface $x^2 + z^2 = C$. The variable y does not appear in this equation. So for any fixed $y_0$, the intersection of our surface $x^2 + z^2 = C$ with the plane $y = y_0$ is the circle of radius $\sqrt{C}$ centred on x = z = 0. Here is a sketch of the first quadrant part of one such circle.
[Figure: the first-quadrant part of the circle F = C in the plane $y = y_0$.]
[Figure: first-octant parts of the level surfaces F = 1, F = 4 and F = 9.]
Example 14.3.10
[Figure: the level surfaces $F = e$, $F = e^2$ and $F = e^3$.]
Example 14.3.11
There are some classes of relatively simple, but commonly occurring, surfaces that are given their own names. One such class is cylindrical surfaces. You are probably used to thinking of a cylinder as being something that looks like $x^2 + y^2 = 1$.
[Figure: the cylinder $x^2 + y^2 = 1$.]
A cylinder is a surface that consists of all points that are on all lines that are parallel to a given line and pass through a given fixed curve lying in a fixed plane that is not parallel to the given line.
Example 14.3.13
Here are sketches of three cylinders. The familiar cylinder on the left below
[Figure: the cylinders $x^2 + y^2 = 1$ (left) and $x^2 + (y - z)^2 = 1$ (right).]
is called a right circular cylinder, because the given fixed plane curve (x2 + y2 = 1, z = 0) is a circle
and the given line (the z-axis) is perpendicular (i.e. at right angles) to the fixed plane curve.
The cylinder on the left above can be thought of as a vertical stack of circles. The cylinder on
the right above can also be thought of as a stack of circles, but the centre of the circle at height z has
been shifted rightward to (0, z, z). For that cylinder, the given fixed plane curve is once again the
circle x2 + y2 = 1, z = 0, but the given line is y = z, x = 0.
We have already seen the third cylinder
[Figure: the part of the surface yz = 1 with x, y, z > 0.]
in Example 14.3.4. It is called a hyperbolic cylinder. In this example, the given fixed plane curve is
the hyperbola yz = 1, x = 0 and the given line is the x-axis.
Example 14.3.13
§§ Quadric Surfaces
Another named class of relatively simple, but commonly occurring, surfaces is the quadric surfaces.
A quadric surface is a surface that consists of all points that obey Q(x, y, z) = 0, with Q being a polynomial of degree two⁹. That is,
$$Ax^2 + By^2 + Cz^2 + Dxy + Eyz + Fxz + Gx + Hy + Iz + J = 0$$
for some constants A, B, ···, J. Each constant z cross section of a quadric surface has an equation of the form
$$Ax^2 + Dxy + By^2 + gx + hy + j = 0, \qquad z = z_0$$
If A = B = D = 0 but g and h are not both zero, this is a straight line. If A, B, and D are not all
zero, then by rotating and translating our coordinate system the equation of the cross section can be
brought into one of the forms10
There are similar statements for the constant x cross sections and the constant y cross sections. Hence quadric surfaces are built by stacking these three types of curves.
We have already seen a number of quadric surfaces in the last couple of sections.
• The surface $4x^2 + y^2 - z^2 = 1$ of Example 14.3.2. Its constant z cross sections are ellipses and its x = 0 and y = 0 cross sections are hyperbolae. It is called a hyperboloid of one sheet.
9 Technically, we should also require that the polynomial can’t be factored into the product of two polynomials of
degree one.
10 This statement can be justified using a linear algebra eigenvalue/eigenvector analysis. It is beyond what we can
cover here, but is not too difficult for a standard linear algebra course.
• The surface $x^2 + y^2 = 1$ of Example 14.3.13. Its constant z cross sections are circles and its x = 0 and y = 0 cross sections are straight lines. It is called a right circular cylinder.
[Figure: the indifference curve $x\sqrt{y} = 2$.]
Let’s make a small contour map of our surface $U(x, y) = x\sqrt{y}$, plotting several indifference curves. (Note $x\sqrt{y} = c$ is equivalent to $y = \frac{c^2}{x^2}$ in our model domain.)
11 An amusing thought experiment is to propose units for measuring happiness. ”The one-point increase in GDP was
associated with an average increase of 3.7 wrinkly puppy faces of happiness nation-wide.”
[Figure: the indifference curves U = 1, U = 2, U = 3, U = 4 and U = 5.]
Not surprisingly, if we move roughly in the direction of (1, 1) (that is, increasing both x and
y), our happiness U(x, y) goes up.
Note that none of the indifference curves touch either of the x or y axes. It is clear enough
from the formula that U (0, y) = U (x, 0) = 0. This is a common feature of utility functions: that to
maximize utility, a consumer will have at least a little of both products, rather than consuming only
one type.
Example 14.3.15
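The indifference curves of $U(x, y) = x\sqrt{y}$ can be tabulated directly from the rearrangement $y = c^2/x^2$ noted above. This is a small sketch; the helper name `indifference_y` and the sample x values are illustrative choices.

```python
import math

def U(x, y):
    """Utility function from the example: U(x, y) = x * sqrt(y)."""
    return x * math.sqrt(y)

def indifference_y(c, x):
    """Height of the indifference curve U = c above a given x > 0: y = c^2 / x^2."""
    return c**2 / x**2

# A few points on each of the curves U = 1, ..., 5.
for c in range(1, 6):
    points = [(x, indifference_y(c, x)) for x in (0.5, 1.0, 2.0)]
    print(c, points)
```

Because $y = c^2/x^2$ is positive for every $x > 0$ and grows without bound as x approaches 0, the tabulated points never land on either axis.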
PARTIAL D ERIVATIVES
Chapter 15
In this chapter we are going to generalize the definition of “derivative” to functions of more than
one variable, and then we are going to use those derivatives. We can speed things up considerably
by recycling what we have already learned in the single-variable case.
Learning Objectives
• Compute partial derivatives of two-variable functions.
First, recall how we defined the derivative, $f'(a)$, of a function of one variable, f(x). We imagined that we were walking along the x-axis, in the positive direction, measuring, for example, the temperature along the way. We denoted by f(x) the temperature at x. The instantaneous rate of change of temperature that we observed as we passed through x = a was
$$\frac{df}{dx}(a) = \lim_{h\to 0}\frac{f(a + h) - f(a)}{h} = \lim_{x\to a}\frac{f(x) - f(a)}{x - a}$$
Next suppose that we are walking in the xy-plane and that the temperature at (x, y) is f (x, y).
We can pass through the point (x, y) = (a, b) moving in many different directions, and we cannot
expect the measured rate of change of temperature if we walk parallel to the x-axis, in the direction
of increasing x, to be the same as the measured rate of change of temperature if we walk parallel to
the y-axis in the direction of increasing y. We’ll start by considering just those two directions. We’ll come back to other
directions (like walking parallel to the line y = x) later.
Suppose that we are passing through the point (x, y) = (a, b) and that we are walking parallel to
the x-axis (in the positive direction). Then our y-coordinate will be constant, always taking the value
PARTIAL D ERIVATIVES 15.1 PARTIAL DERIVATIVES
y = b. So we can think of the measured temperature as the function of one variable B(x) = f(x, b) and we will observe the rate of change of temperature
$$\frac{dB}{dx}(a) = \lim_{h\to 0}\frac{B(a + h) - B(a)}{h} = \lim_{h\to 0}\frac{f(a + h, b) - f(a, b)}{h}$$
This is called the “partial derivative of f with respect to x at (a, b)” and is denoted $\left(\frac{\partial f}{\partial x}\right)_y(a, b)$. Here
◦ the symbol $\partial$, which is read “partial”, indicates that we are dealing with a function of more than one variable, and
◦ the subscript y on $\left(\frac{\partial f}{\partial x}\right)_y$ indicates that y is being held fixed, i.e. being treated as a constant, and
◦ the x in $\frac{\partial f}{\partial x}$ indicates that we are differentiating with respect to x.
◦ $\frac{\partial f}{\partial x}$ is read “partial dee f dee x”.
Do not write $\frac{d}{dx}$ when $\frac{\partial}{\partial x}$ is appropriate. (There exist situations when $\frac{d}{dx} f$ and $\frac{\partial}{\partial x} f$ are both defined and have different meanings.)
If, instead, we are passing through the point (x, y) = (a, b) and are walking parallel to the y-axis
(in the positive direction), then our x-coordinate will be constant, always taking the value x = a. So
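we can think of the measured temperature as the function of one variable A(y) = f(a, y) and we will observe the rate of change of temperature
$$\frac{dA}{dy}(b) = \lim_{h\to 0}\frac{A(b + h) - A(b)}{h} = \lim_{h\to 0}\frac{f(a, b + h) - f(a, b)}{h}$$
This is called the “partial derivative of f with respect to y at (a, b)” and is denoted $\left(\frac{\partial f}{\partial y}\right)_x(a, b)$.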
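Just as was the case for the ordinary derivative $\frac{df}{dx}(x)$, it is common to treat the partial derivatives of f(x, y) as functions of (x, y) simply by evaluating the partial derivatives at (x, y) rather than at (a, b).
Definition 15.1.1.
The x- and y-partial derivatives of the function f(x, y) are
$$\frac{\partial f}{\partial x}(x, y) = \lim_{h\to 0}\frac{f(x + h, y) - f(x, y)}{h}$$
$$\frac{\partial f}{\partial y}(x, y) = \lim_{h\to 0}\frac{f(x, y + h) - f(x, y)}{h}$$
respectively. The partial derivatives of functions of more than two variables are defined
analogously.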
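The limit definition suggests a simple numerical approximation: hold one variable fixed and form the difference quotient in the other with a small h. Here is a minimal sketch. The temperature field $x^2 y + y^3$ is a hypothetical example chosen for illustration (its partial derivatives are $2xy$ and $x^2 + 3y^2$), and the step size is an arbitrary choice.

```python
def partial_x(f, a, b, h=1e-6):
    """Difference quotient (f(a+h, b) - f(a, b)) / h, with y held fixed at b."""
    return (f(a + h, b) - f(a, b)) / h

def partial_y(f, a, b, h=1e-6):
    """Difference quotient (f(a, b+h) - f(a, b)) / h, with x held fixed at a."""
    return (f(a, b + h) - f(a, b)) / h

def temperature(x, y):
    """A hypothetical temperature field, used only for illustration."""
    return x**2 * y + y**3

# Walking through (1, 2) parallel to the x-axis, then parallel to the y-axis.
print(partial_x(temperature, 1.0, 2.0))   # close to 2xy = 4
print(partial_y(temperature, 1.0, 2.0))   # close to x^2 + 3y^2 = 13
```

The two rates differ, which is the point made above: the rate of change depends on the direction in which we walk.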
Partial derivatives are used a lot. And there are many notations for them.
Notation 15.1.2.
The partial derivative $\left(\frac{\partial f}{\partial x}\right)_y$ of a function f(x, y) is also denoted
$$\frac{\partial f}{\partial x} \qquad f_x \qquad D_x f \qquad D_1 f$$
The subscript 1 on $D_1 f$ indicates that f is being differentiated with respect to its first variable. The partial derivative $\left(\frac{\partial f}{\partial x}\right)_y(a, b)$ is also denoted
$$\left.\frac{\partial f}{\partial x}\right|_{(a,b)}$$
Remark 15.1.3 (The Geometric Interpretation of Partial Derivatives). We’ll now develop a geometric interpretation of the partial derivative
$$\left(\frac{\partial f}{\partial x}\right)_y(a, b) = \lim_{h\to 0}\frac{f(a + h, b) - f(a, b)}{h}$$
in terms of the shape of the graph z = f(x, y) of the function f(x, y). That graph appears in the figure below. It looks like the part of a deformed sphere that is in the first octant.
The definition of $\left(\frac{\partial f}{\partial x}\right)_y(a, b)$ concerns only points on the graph that have y = b. In other words, the curve of intersection of the surface z = f(x, y) with the plane y = b. That is the red curve in the figure. The two blue vertical line segments in the figure have heights f(a, b) and f(a + h, b), which are the two numbers in the numerator of $\frac{f(a+h, b) - f(a, b)}{h}$.
[Figure: the graph z = f(x, y) and its curve of intersection with the plane y = b, with vertical segments of heights f(a, b) and f(a + h, b) above the points (a, b, 0) and (a + h, b, 0).]
A side view of the curve (looking from the left side of the y-axis) is sketched in the figure below.
[Figure: side view of the curve z = f(x, b), y = b, showing the rise f(a + h, b) − f(a, b) between (a, b, 0) and (a + h, b, 0).]
Again, the two blue vertical line segments in the figure have heights f(a, b) and f(a + h, b), which are the two numbers in the numerator of $\frac{f(a+h, b) - f(a, b)}{h}$. So the numerator f(a + h, b) − f(a, b) and denominator h are the rise and run, respectively, of the curve z = f(x, b) from x = a to x = a + h. Thus $\left(\frac{\partial f}{\partial x}\right)_y(a, b)$ is exactly the slope of (the tangent to) the curve of intersection of the surface z = f(x, y) and the plane y = b at the point $\big(a, b, f(a, b)\big)$. In the same way $\left(\frac{\partial f}{\partial y}\right)_x(a, b)$ is exactly the slope of (the tangent to) the curve of intersection of the surface z = f(x, y) and the plane x = a at the point $\big(a, b, f(a, b)\big)$.
• To evaluate $\frac{\partial f}{\partial x}(x, y)$, treat the y in f(x, y) as a constant and differentiate the resulting function of x with respect to x.
• To evaluate $\frac{\partial f}{\partial y}(x, y)$, treat the x in f(x, y) as a constant and differentiate the resulting function of y with respect to y.
• To evaluate $\frac{\partial f}{\partial x}(a, b)$, treat the y in f(x, y) as a constant and differentiate the resulting function of x with respect to x. Then evaluate the result at x = a, y = b.
• To evaluate $\frac{\partial f}{\partial y}(a, b)$, treat the x in f(x, y) as a constant and differentiate the resulting function of y with respect to y. Then evaluate the result at x = a, y = b.
Example 15.1.4
Let
$$f(x, y) = x^3 + y^2 + 4xy^2$$
Then
$$\frac{\partial f}{\partial x}(x, y) = 3x^2 + 4y^2 \qquad \frac{\partial f}{\partial y}(x, y) = 2y + 8xy$$
In particular, at (x, y) = (1, 0),
$$\frac{\partial f}{\partial x}(1, 0) = 3(1)^2 + 4(0)^2 = 3$$
$$\frac{\partial f}{\partial y}(1, 0) = 2(0) + 8(1)(0) = 0$$
Example 15.1.4
Example 15.1.5
Let
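$$f(x, y) = y\cos x + xe^{xy}$$
Then, since $\frac{\partial}{\partial x}$ treats y as a constant, $\frac{\partial}{\partial x}e^{yx} = ye^{yx}$ and
$$\frac{\partial f}{\partial x}(x, y) = -y\sin x + e^{xy} + xye^{xy}$$
$$\frac{\partial f}{\partial y}(x, y) = \cos x + x^2e^{xy}$$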
Example 15.1.5
Let’s move up to a function of four variables. Things generalize in a quite straightforward way.
Example 15.1.6
Let
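$$f(x, y, z, t) = x\sin(y + 2z) + t^2 e^{3y}\ln z$$
Then
$$\frac{\partial f}{\partial x}(x, y, z, t) = \sin(y + 2z)$$
$$\frac{\partial f}{\partial y}(x, y, z, t) = x\cos(y + 2z) + 3t^2 e^{3y}\ln z$$
$$\frac{\partial f}{\partial z}(x, y, z, t) = 2x\cos(y + 2z) + \frac{t^2 e^{3y}}{z}$$
$$\frac{\partial f}{\partial t}(x, y, z, t) = 2te^{3y}\ln z$$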
Example 15.1.6
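Formulas like these are easy to sanity-check numerically: vary one coordinate at a time, hold the others fixed, and compare against the hand-computed answer. The sketch below uses a central difference, which is a choice made here for accuracy and not the forward quotient in the definition; the helper names and the sample point are arbitrary.

```python
import math

def f(x, y, z, t):
    return x * math.sin(y + 2 * z) + t**2 * math.exp(3 * y) * math.log(z)

def numerical_partial(func, point, index, h=1e-6):
    """Central difference in coordinate `index`, all other coordinates held fixed."""
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)

def exact_partials(x, y, z, t):
    """The four formulas from the example, in the order x, y, z, t."""
    return (
        math.sin(y + 2 * z),
        x * math.cos(y + 2 * z) + 3 * t**2 * math.exp(3 * y) * math.log(z),
        2 * x * math.cos(y + 2 * z) + t**2 * math.exp(3 * y) / z,
        2 * t * math.exp(3 * y) * math.log(z),
    )

point = (1.0, 0.5, 2.0, 1.5)   # an arbitrary sample point with z > 0
approx = [numerical_partial(f, point, i) for i in range(4)]
for a, e in zip(approx, exact_partials(*point)):
    print(a, e)
```

Agreement at one sample point does not prove the formulas, but a mismatch would flag an algebra slip quickly.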
Now here is a more complicated example — our function takes a special value at (0, 0). To compute
derivatives there we have to revert to the definition.
Example 15.1.7
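Set
$$f(x, y) = \begin{cases} \dfrac{\cos x - \cos y}{x - y} & \text{if } x \ne y \\ 0 & \text{if } x = y \end{cases}$$
If $b \ne a$, then for all (x, y) sufficiently close to (a, b), $f(x, y) = \frac{\cos x - \cos y}{x - y}$ and we can compute the partial derivatives of f at (a, b) using the familiar rules of differentiation. However that is not the case for (a, b) = (0, 0). To evaluate $f_x(0, 0)$, we need to set y = 0 and find the derivative of
$$f(x, 0) = \begin{cases} \dfrac{\cos x - 1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$
with respect to x at x = 0. Applying the definition of the derivative,
$$f_x(0, 0) = \lim_{h\to 0}\frac{f(h, 0) - f(0, 0)}{h} = \lim_{h\to 0}\frac{\frac{\cos h - 1}{h} - 0}{h} = \lim_{h\to 0}\frac{\cos h - 1}{h^2} = \lim_{h\to 0}\frac{-\sin h}{2h} = \lim_{h\to 0}\frac{-\cos h}{2} = -\frac12$$
by L’Hôpital’s rule, applied twice.
Example 15.1.7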
Example 15.1.8
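Again set
$$f(x, y) = \begin{cases} \dfrac{\cos x - \cos y}{x - y} & \text{if } x \ne y \\ 0 & \text{if } x = y \end{cases}$$
We’ll now compute $f_y(x, y)$ for all (x, y).
The case $y \ne x$: When $y \ne x$,
$$f_y(x, y) = \frac{\partial}{\partial y}\,\frac{\cos x - \cos y}{x - y} = \frac{(x - y)\frac{\partial}{\partial y}(\cos x - \cos y) - (\cos x - \cos y)\frac{\partial}{\partial y}(x - y)}{(x - y)^2} \quad\text{by the quotient rule}$$
$$= \frac{(x - y)\sin y + \cos x - \cos y}{(x - y)^2}$$
The case y = x: When y = x, we revert to the definition:
$$f_y(x, y) = \lim_{h\to 0}\frac{f(x, x + h) - f(x, x)}{h} = \lim_{h\to 0}\frac{\frac{\cos x - \cos(x + h)}{x - (x + h)} - 0}{h} = \lim_{h\to 0}\frac{\cos(x + h) - \cos x}{h^2}$$
Both the numerator and the denominator tend to zero as h tends to zero, so L’Hôpital’s rule gives
$$f_y(x, y) = \lim_{h\to 0}\frac{-\sin(x + h)}{2h}$$
Note that if x is not an integer multiple of π, then the numerator −sin(x + h) does not tend to zero as h tends to zero, and the limit giving $f_y(x, y)$ does not exist. On the other hand, if x is an integer multiple of π, both the numerator and denominator tend to zero as h tends to zero, and we can apply L’Hôpital’s rule a second time. Then
$$f_y(x, y) = \lim_{h\to 0}\frac{-\cos(x + h)}{2} = -\frac{\cos x}{2}$$
The conclusion:
$$f_y(x, y) = \begin{cases} \dfrac{(x - y)\sin y + \cos x - \cos y}{(x - y)^2} & \text{if } x \ne y \\ -\dfrac{\cos x}{2} & \text{if } x = y \text{ with } x \text{ an integer multiple of } \pi \\ \text{DNE} & \text{if } x = y \text{ with } x \text{ not an integer multiple of } \pi \end{cases}$$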
Example 15.1.8
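The behaviour on the diagonal y = x can be seen numerically by evaluating the difference quotient from the definition for shrinking h. This is a rough sketch; the helper name `fy_quotient` and the sample values of h are choices made here.

```python
import math

def f(x, y):
    """The piecewise function from the example."""
    if x == y:
        return 0.0
    return (math.cos(x) - math.cos(y)) / (x - y)

def fy_quotient(x, h):
    """Difference quotient (f(x, x+h) - f(x, x)) / h at a point on the diagonal y = x."""
    return (f(x, x + h) - f(x, x)) / h

for h in (1e-1, 1e-2, 1e-3):
    # x = 0 is an integer multiple of pi; x = 1 is not.
    print(h, fy_quotient(0.0, h), fy_quotient(1.0, h))
```

At x = 0 the quotients settle near $-\frac{\cos 0}{2} = -\frac12$, while at x = 1 they grow in size roughly like $\frac{\sin 1}{h}$, consistent with the limit not existing there.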
Our next example uses implicit differentiation.
Example 15.1.9
The equation
$$z^5 + y^2 e^z + e^{2x} = 0$$
implicitly determines z as a function of x and y. For example, when x = y = 0, the equation reduces to
$$z^5 = -1$$
which forces¹ $z(0, 0) = -1$. Let’s find the partial derivative $\frac{\partial z}{\partial x}(0, 0)$.
We are not going to be able to explicitly solve the equation for z(x, y). All we know is that
$$z(x, y)^5 + y^2 e^{z(x, y)} + e^{2x} = 0$$
for all x and y.
1 The only real number z which obeys z5 = ´1 is z = ´1. However there are four other complex numbers which
also obey z5 = ´1.
2 You should have already seen this technique, called implicit differentiation, in your first Calculus course.
PARTIAL D ERIVATIVES 15.2 H IGHER ORDER DERIVATIVES
Example 15.1.9
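Even though z(x, y) cannot be written explicitly, it can be computed numerically, which gives a way to check an implicit-differentiation answer. The sketch below rests on one derivation made here, not quoted from the text: differentiating $z^5 + y^2 e^z + e^{2x} = 0$ with respect to x, holding y fixed, gives $5z^4 \frac{\partial z}{\partial x} + y^2 e^z \frac{\partial z}{\partial x} + 2e^{2x} = 0$, so $\frac{\partial z}{\partial x} = -\frac{2e^{2x}}{5z^4 + y^2 e^z}$, which equals $-\frac25$ at (0, 0) where z = −1. The bisection solver and its bracket are illustrative choices.

```python
import math

def solve_z(x, y, iterations=200):
    """Solve z^5 + y^2 e^z + e^(2x) = 0 for z by bisection.

    The left-hand side is increasing in z, positive at z = 0, and negative at
    z = -(y^2 + e^(2x) + 1), so there is exactly one real root in that bracket.
    """
    def g(z):
        return z**5 + y**2 * math.exp(z) + math.exp(2 * x)

    lo, hi = -(y**2 + math.exp(2 * x) + 1), 0.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def dz_dx_implicit(x, y):
    """Formula obtained by differentiating the equation in x with y held fixed."""
    z = solve_z(x, y)
    return -2 * math.exp(2 * x) / (5 * z**4 + y**2 * math.exp(z))

# Compare with a central difference of the numerically solved z(x, 0) at x = 0.
h = 1e-5
numeric = (solve_z(h, 0.0) - solve_z(-h, 0.0)) / (2 * h)
print(solve_z(0.0, 0.0), dz_dx_implicit(0.0, 0.0), numeric)
```

The two estimates of $\frac{\partial z}{\partial x}(0, 0)$ should agree closely, one coming from the differentiated equation and the other from solving the original equation at nearby values of x.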
Next we have a partial derivative disguised as a limit.
Example 15.1.10
In this example we are going to evaluate the limit
\[
\lim_{z \to 0} \frac{(x + y + z)^3 - (x + y)^3}{(x + y)z}
\]
The critical observation is that, in taking the limit z Ñ 0, x and y are fixed. They do not change as
z is getting smaller and smaller. Furthermore this limit is exactly of the form of the limits in the
Definition 15.1.1 of partial derivative, disguised by some obfuscating changes of notation.
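Set
\[
f(x, y, z) = \frac{(x + y + z)^3}{x + y}
\]
Then
\[
\lim_{z \to 0} \frac{(x + y + z)^3 - (x + y)^3}{(x + y)z}
= \lim_{z \to 0} \frac{f(x, y, z) - f(x, y, 0)}{z}
= \frac{\partial f}{\partial z}(x, y, 0)
\]
Recalling that $\frac{\partial}{\partial z}$ treats $x$ and $y$ as constants, we are evaluating the derivative of a function of the
form $\frac{(\text{const} + z)^3}{\text{const}}$. So
\[
\frac{\partial f}{\partial z}(x, y, z) = \frac{3(x + y + z)^2}{x + y}
\qquad\text{and the limit is}\qquad
\frac{\partial f}{\partial z}(x, y, 0) = 3(x + y)
\]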
Example 15.1.10
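A quick numerical experiment can make the "partial derivative disguised as a limit" idea concrete. This is only an illustrative sketch: the sample point $x = 1$, $y = 2$ and the function names `quotient` and `limit_value` are assumptions chosen for the demonstration. It evaluates the difference quotient for shrinking $z$ and compares it with $3(x + y)$, the value of $\frac{\partial}{\partial z}\frac{(x + y + z)^3}{x + y}$ at $z = 0$.

```python
def quotient(x, y, z):
    """The expression inside the limit, for a fixed nonzero z."""
    return ((x + y + z)**3 - (x + y)**3) / ((x + y) * z)

def limit_value(x, y):
    """Derivative of (x + y + z)^3 / (x + y) with respect to z, at z = 0."""
    return 3 * (x + y)

x, y = 1.0, 2.0
# z = 0.1, 0.01, ..., 0.000001 with x and y held fixed throughout.
approximations = [quotient(x, y, 10.0**(-k)) for k in range(1, 7)]
```

As $z$ shrinks, the entries of `approximations` settle towards `limit_value(x, y)`, which is 9 for this choice of $x$ and $y$.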
Learning Objectives
• Compute the second order partial derivatives given a function of two variables.
• State without proof that the mixed partials should be equal for “nice” functions.
PARTIAL D ERIVATIVES 15.2 H IGHER ORDER DERIVATIVES
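You have already observed, in your first Calculus course, that if $f(x)$ is a function of $x$, then its
derivative, $\frac{df}{dx}(x)$, is also a function of $x$, and can be differentiated to give the second order derivative
$\frac{d^2 f}{dx^2}(x)$, which can in turn be differentiated yet again to give the third order derivative, $f^{(3)}(x)$, and
so on.
We can do the same for functions of more than one variable. If $f(x, y)$ is a function of $x$ and $y$,
then both of its partial derivatives, $\frac{\partial f}{\partial x}(x, y)$ and $\frac{\partial f}{\partial y}(x, y)$, are also functions of $x$ and $y$. They can both
be differentiated with respect to $x$ and they can both be differentiated with respect to $y$. So there are
four possible second order derivatives. Here they are, together with various alternate notations.
\begin{align*}
\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right)(x, y) &= \frac{\partial^2 f}{\partial x^2}(x, y) = f_{xx}(x, y) \\
\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)(x, y) &= \frac{\partial^2 f}{\partial y\,\partial x}(x, y) = f_{xy}(x, y) \\
\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)(x, y) &= \frac{\partial^2 f}{\partial x\,\partial y}(x, y) = f_{yx}(x, y) \\
\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right)(x, y) &= \frac{\partial^2 f}{\partial y^2}(x, y) = f_{yy}(x, y)
\end{align*}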
Warning 15.2.1.
In $\frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial}{\partial y}\frac{\partial}{\partial x} f$, the derivative closest to $f$, in this case $\frac{\partial}{\partial x}$, is applied first. So we work
through the variables in the bottom right-to-left.
In $f_{xy}$, the derivative with respect to the variable closest to $f$, in this case $x$, is applied first.
So we work through the subscript variables left-to-right.
The difference in “direction” highlighted in the warning seems confusing at first, but it stems from
the way the first partial derivative is written. In the fractional notation, if $f$ is being differentiated
with respect to $x$, we write $\frac{\partial f}{\partial x}$ or $\frac{\partial}{\partial x} f$. So the operator $\frac{\partial}{\partial x}$ is added to the left of the function. Now
suppose we want to differentiate $\frac{\partial f}{\partial x}$ with respect to $y$. By analogy, we would write $\frac{\partial}{\partial y}\left[\frac{\partial f}{\partial x}\right]$, or $\frac{\partial^2 f}{\partial y\,\partial x}$.
This leads to the order of variables being right-to-left.
With the subscript notation, if f is being differentiated with respect to x, we write fx , with the
variable on the right of the function. So now if we take the second derivative with respect to y, it
makes sense by analogy to add that new variable to the right: ( fx )y , or fxy , in left-to-right order.
Example 15.2.2
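Let $f(x, y) = e^{my}\cos(nx)$. Then
\begin{align*}
f_x &= -n e^{my}\sin(nx) & f_y &= m e^{my}\cos(nx) \\
f_{xx} &= -n^2 e^{my}\cos(nx) & f_{yx} &= -mn e^{my}\sin(nx) \\
f_{xy} &= -mn e^{my}\sin(nx) & f_{yy} &= m^2 e^{my}\cos(nx)
\end{align*}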
Example 15.2.2
Example 15.2.3
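Let $f(x, y) = e^{\alpha x + \beta y}$. Then
\begin{align*}
f_x &= \alpha e^{\alpha x + \beta y} & f_y &= \beta e^{\alpha x + \beta y} \\
f_{xx} &= \alpha^2 e^{\alpha x + \beta y} & f_{yx} &= \beta\alpha e^{\alpha x + \beta y} \\
f_{xy} &= \alpha\beta e^{\alpha x + \beta y} & f_{yy} &= \beta^2 e^{\alpha x + \beta y}
\end{align*}
More generally, for any integers $m, n \ge 0$,
\[
\frac{\partial^{m+n} f}{\partial x^m\,\partial y^n} = \alpha^m \beta^n e^{\alpha x + \beta y}
\]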
Example 15.2.3
Example 15.2.4
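If $f(x_1, x_2, x_3, x_4) = x_1^4 x_2^3 x_3^2 x_4$, then
\begin{align*}
\frac{\partial^4 f}{\partial x_1\,\partial x_2\,\partial x_3\,\partial x_4}
&= \frac{\partial^3}{\partial x_1\,\partial x_2\,\partial x_3}\left(x_1^4 x_2^3 x_3^2\right) \\
&= \frac{\partial^2}{\partial x_1\,\partial x_2}\left(2 x_1^4 x_2^3 x_3\right) \\
&= \frac{\partial}{\partial x_1}\left(6 x_1^4 x_2^2 x_3\right) \\
&= 24 x_1^3 x_2^2 x_3
\end{align*}
and
\begin{align*}
\frac{\partial^4 f}{\partial x_4\,\partial x_3\,\partial x_2\,\partial x_1}
&= \frac{\partial^3}{\partial x_4\,\partial x_3\,\partial x_2}\left(4 x_1^3 x_2^3 x_3^2 x_4\right) \\
&= \frac{\partial^2}{\partial x_4\,\partial x_3}\left(12 x_1^3 x_2^2 x_3^2 x_4\right) \\
&= \frac{\partial}{\partial x_4}\left(24 x_1^3 x_2^2 x_3 x_4\right) \\
&= 24 x_1^3 x_2^2 x_3
\end{align*}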
Example 15.2.4
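Notice that in Example 15.2.4,
\[
\frac{\partial^4 f}{\partial x_1\,\partial x_2\,\partial x_3\,\partial x_4}
= \frac{\partial^4 f}{\partial x_4\,\partial x_3\,\partial x_2\,\partial x_1}
= 24 x_1^3 x_2^2 x_3
\]
In all of these examples, it didn’t matter what order we took the derivatives in. The following
theorem$^3$ shows that this was no accident.
Theorem 15.2.5 (Clairaut’s Theorem$^4$ or Schwarz’s Theorem$^5$).
If the partial derivatives $\frac{\partial^2 f}{\partial x\,\partial y}$ and $\frac{\partial^2 f}{\partial y\,\partial x}$ exist and are continuous at $(x_0, y_0)$, then
\[
\frac{\partial^2 f}{\partial x\,\partial y}(x_0, y_0) = \frac{\partial^2 f}{\partial y\,\partial x}(x_0, y_0)
\]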
We won’t use this theorem a whole lot in Math 105. It can occasionally be useful to note that as
long as a function is continuous and differentiable, you can differentiate it in any “order.”
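The equality of mixed partials can also be observed numerically. The sketch below is an illustration under stated assumptions, not part of the theorem: it approximates partial derivatives by central differences (the helper `partial` and the step size `h` are hypothetical choices), and it uses the function of Example 15.2.2 with the arbitrary values $m = 2$, $n = 3$ and the sample point $(0.3, 0.5)$.

```python
import math

def partial(f, var, h=1e-4):
    """Return a function approximating the partial derivative of f(x, y).

    var = 0 differentiates with respect to x, var = 1 with respect to y,
    using a central difference with step h.
    """
    def df(x, y):
        if var == 0:
            return (f(x + h, y) - f(x - h, y)) / (2 * h)
        return (f(x, y + h) - f(x, y - h)) / (2 * h)
    return df

def f(x, y):
    # f(x, y) = e^(my) cos(nx) with the illustrative choice m = 2, n = 3.
    return math.exp(2 * y) * math.cos(3 * x)

f_xy = partial(partial(f, 0), 1)   # differentiate in x first, then in y
f_yx = partial(partial(f, 1), 0)   # differentiate in y first, then in x

# Exact mixed partial -m n e^(my) sin(nx) at the sample point (0.3, 0.5).
exact = -6 * math.exp(2 * 0.5) * math.sin(3 * 0.3)
```

Evaluating `f_xy(0.3, 0.5)` and `f_yx(0.3, 0.5)` should give values that agree with each other and with `exact` up to the discretization error of the finite differences.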
Example 15.2.6
Let $f(x, y) = x^5 e^x + y$. Find $f_{xxxy}$.
Solution. Since $f(x, y)$ is continuous and differentiable everywhere, the order of differentiation
doesn’t matter. Rather than starting with respect to $x$ (which is harder), we start with respect to $y$
(which is easier).
\begin{align*}
f_y &= 1 \\
f_{yx} &= 0 \implies f_{xy} = 0 \\
f_{xyx} &= 0 \implies f_{xxy} = 0 \\
f_{xxyx} &= 0 \implies f_{xxxy} = 0
\end{align*}
Example 15.2.6
3 The history of this important theorem is pretty convoluted. See “A note on the history of mixed partial derivatives”
by Thomas James Higgins which was published in Scripta Mathematica 7 (1940), 59-62.
4 Alexis Clairaut (1713–1765) was a French mathematician, astronomer, and geophysicist.
5 Hermann Schwarz (1843–1921) was a German mathematician.
O PTIMIZATION OF M ULTIVARIABLE F UNCTIONS
Chapter 16
(Flavour C) Optimization of Multivariable Functions
Definition 16.1.1.
Let the function $f(x, y)$ be defined for all $(x, y)$ in some subset $R$ of $\mathbb{R}^2$. Let $(a, b)$ be a
point in $R$.
• $(a, b)$ is a local maximum of $f(x, y)$ if $f(x, y) \le f(a, b)$ for all $(x, y)$ close to $(a, b)$.
More precisely, $(a, b)$ is a local maximum of $f(x, y)$ if there is an $r > 0$ such that
$f(x, y) \le f(a, b)$ for all points $(x, y)$ within a distance $r$ of $(a, b)$.
• $(a, b)$ is a local minimum of $f(x, y)$ if $f(x, y) \ge f(a, b)$ for all $(x, y)$ close to $(a, b)$.
• Local maximum and minimum values are also called extremal values.
Learning Objectives
• Define critical point and singular point for a function of two variables.
• Compute the critical points and singular points of a given function of two variables.
• State (without proof) that extreme values of a continuous multivariable function will
occur at critical or singular points.
One of the first things you did when you were developing the techniques used to find the maximum
and minimum values of $f(x)$ was to ask yourself
Suppose that the largest value of $f(x)$ is $f(a)$. What does that tell us about $a$?
After a little thought you answered
If the largest value of $f(x)$ is $f(a)$ and $f$ is differentiable at $a$, then $f'(a) = 0$.
[Figure: graph of $y = f(x)$.]
Let’s recall why that’s true. Suppose that the largest value of $f(x)$ is $f(a)$. Then for all $h > 0$,
\[
f(a + h) \le f(a) \implies f(a + h) - f(a) \le 0 \implies \frac{f(a + h) - f(a)}{h} \le 0 \quad\text{if } h > 0
\]
Taking the limit $h \to 0$ tells us that $f'(a) \le 0$. Similarly, for all $h < 0$,
\[
f(a + h) \le f(a) \implies f(a + h) - f(a) \le 0 \implies \frac{f(a + h) - f(a)}{h} \ge 0 \quad\text{if } h < 0
\]
Taking the limit $h \to 0$ now tells us that $f'(a) \ge 0$. So we have both $f'(a) \ge 0$ and $f'(a) \le 0$ which
forces $f'(a) = 0$.
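You also observed at the time that for this argument to work, you only need $f(x) \le f(a)$ for
all $x$’s close to $a$, not necessarily for all $x$’s in the whole world. (In the above inequalities, we only
used $f(a + h)$ with $h$ small.) Since we care only about $f(x)$ for $x$ near $a$, we can refine the above
statement.
If $f(a)$ is a local maximum for $f(x)$ and $f$ is differentiable at $a$, then $f'(a) = 0$.
Precisely the same reasoning applies to local minima.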
Let’s use the ideas of the above discourse to extend the study of local maxima and local minima
to functions of more than one variable. Suppose that the function $f(x, y)$ is defined for all $(x, y)$ in
some subset $R$ of $\mathbb{R}^2$, that $(a, b)$ is a point of $R$ that is not on the boundary of $R$, and that $f$ has a local
maximum at $(a, b)$. See the figure below.
[Figure: the surface $z = f(x, y)$ above the region $R$ in the $xy$-plane, with the point $(a, b, f(a, b))$ on the surface directly above $(a, b)$.]
Then the function $f(x, y)$ cannot increase in value as $(x, y)$ moves a short distance away from $(a, b)$ in any direction.
In particular, if we change the $x$-coordinate a little, $f(x, y)$ must not increase. So for all small $h > 0$:
\[
f(a + h, b) \le f(a, b) \implies f(a + h, b) - f(a, b) \le 0 \implies \frac{f(a + h, b) - f(a, b)}{h} \le 0 \quad\text{if } h > 0
\]
Taking the limit $h \to 0$ tells us that $f_x(a, b) \le 0$. Similarly, for all small $h < 0$:
\[
f(a + h, b) \le f(a, b) \implies f(a + h, b) - f(a, b) \le 0 \implies \frac{f(a + h, b) - f(a, b)}{h} \ge 0 \quad\text{if } h < 0
\]
Taking the limit $h \to 0$ now tells us that $f_x(a, b) \ge 0$. So we have both $f_x(a, b) \ge 0$ and $f_x(a, b) \le 0$
which forces $f_x(a, b) = 0$. The same reasoning tells us $f_y(a, b) = 0$ as well, and that these partial
derivatives are zero for minima as well as maxima.
This is an important and useful result, so let’s theoremise it.
O PTIMIZATION OF M ULTIVARIABLE F UNCTIONS 16.1 L OCAL MAXIMUM AND MINIMUM
VALUES
Theorem 16.1.2.
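Let the function $f(x, y)$ be defined for all $(x, y)$ in some subset $R$ of $\mathbb{R}^2$. Assume that
• $(a, b)$ is a point of $R$ that is not on the boundary of $R$,
• $(a, b)$ is a local maximum or local minimum of $f$, and
• the partial derivatives $f_x(a, b)$ and $f_y(a, b)$ exist.
Then
$f_x(a, b) = 0$
and $f_y(a, b) = 0$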
Definition 16.1.3.
Let f (x, y) be a function and let (a, b) be a point in its domain. Then we call (a, b) a
critical point (or a stationary point) of the function if
• fx (a, b) = fy (a, b) = 0.
Warning 16.1.4.
Note that some people (and texts) do not include the cases where one or both partial
derivatives do not exist in the definition of a critical point. Such a point would (usually)
be referred to as a singular point of the function. We do not use this terminology.
Warning 16.1.5.
Theorem 16.1.2 tells us that every local maximum or minimum (in the interior of the
domain of a differentiable function) is a critical point. Beware that it does not3 tell us that
every critical point is either a local maximum or a local minimum.
In fact, as we shall see in Example 16.1.12, there are critical points that are neither local maxima nor
local minima. Nonetheless, Theorem 16.1.2 is very useful because often functions have only a
small number of critical points. To find local maxima and minima of such functions, we only need
3 A very common error of logic that people make is “Affirming the consequent”. “If P then Q” is true, does not imply
that “If Q then P” is true . The statement “If he is Shakespeare then he is dead” is true. But concluding from “That
sheep is dead” that “He must be Shakespeare” is just silly.
to consider its critical points. We’ll return later to the question of how to tell if a critical point is a
local maximum, local minimum or neither. For now, we’ll just practice finding critical points.
Example 16.1.6 ($f(x, y) = x^2 - 2xy + 2y^2 + 2x - 6y + 12$)
Find all critical points of $f(x, y) = x^2 - 2xy + 2y^2 + 2x - 6y + 12$.
Solution. To find the critical points, we need to find the first order partial derivatives. So, as a
preliminary calculation, we find the two first order partial derivatives of $f(x, y)$.
\begin{align*}
f_x(x, y) &= 2x - 2y + 2 \\
f_y(x, y) &= -2x + 4y - 6
\end{align*}
These functions are defined everywhere. So the critical points are the solutions of the pair of
equations
\[
2x - 2y + 2 = 0 \qquad -2x + 4y - 6 = 0
\]
or equivalently (dividing by two and moving the constants to the right hand side)
\begin{align*}
x - y &= -1 && \text{(E1)} \\
-x + 2y &= 3 && \text{(E2)}
\end{align*}
This is a system of two equations in two unknowns ($x$ and $y$). One strategy for solving a system like
this is to
• First use one of the equations to solve for one of the unknowns in terms of the other unknown.
For example, (E1) tells us that $y = x + 1$. This expresses $y$ in terms of $x$. We say that we have
solved for $y$ in terms of $x$.
• Then substitute the result, $y = x + 1$ in our case, into the other equation, (E2). In our case, this
gives
\[
-x + 2(x + 1) = 3 \iff x + 2 = 3 \iff x = 1
\]
• We have now found that $x = 1$, $y = x + 1 = 2$ is the only solution. So the only critical point
is $(1, 2)$. Of course it only takes a moment to verify that $f_x(1, 2) = f_y(1, 2) = 0$. It is a good
idea to do this as a simple check of our work.
An alternative strategy for solving a system of two equations in two unknowns, like (E1) and (E2),
is to
• Add equations (E1) and (E2) together. This gives
\[
\text{(E1)} + \text{(E2)}: \quad y = 2
\]
The point here is that adding equations (E1) and (E2) together eliminates the unknown $x$,
leaving us with one equation in the unknown $y$, which is easily solved. For other systems of
equations you might have to multiply the equations by some numbers before adding them
together.
• We now know that $y = 2$. Substituting it into (E1) gives us
\[
x - 2 = -1 \implies x = 1
\]
• Once again (thankfully) we have found that the only critical point is $(1, 2)$.
Example 16.1.6
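Both strategies above are hand methods. As a cross-check, here is a small sketch that solves the same pair of linear equations with Cramer's rule, which is a third technique that the text does not discuss. The function name `solve_2x2` and its argument layout are assumptions made for this illustration; exact rational arithmetic is used so that no rounding occurs.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = Fraction(a1 * b2 - a2 * b1)
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

# (E1): x - y = -1    (E2): -x + 2y = 3
x, y = solve_2x2(1, -1, -1, -1, 2, 3)

# Check the answer against the partial derivatives of Example 16.1.6.
fx_at_solution = 2 * x - 2 * y + 2
fy_at_solution = -2 * x + 4 * y - 6
```

Here `(x, y)` comes out as $(1, 2)$, and both `fx_at_solution` and `fy_at_solution` are zero, matching the critical point found by hand.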
This was pretty easy because we only had to solve linear equations, which in turn was a consequence
of the fact that f (x, y) was a polynomial of degree two. Here is an example with some slightly more
challenging algebra.
Example 16.1.7 ($f(x, y) = 2x^3 - 6xy + y^2 + 4y$)
Find all critical points of $f(x, y) = 2x^3 - 6xy + y^2 + 4y$.
Solution. As in the last example, we need to find where the partial derivatives do not exist or are
zero.
\[
f_x = 6x^2 - 6y \qquad f_y = -6x + 2y + 4
\]
These functions are defined everywhere. So the critical points are the solutions of
\[
6x^2 - 6y = 0 \qquad -6x + 2y + 4 = 0
\]
We can rewrite the first equation as $y = x^2$, which expresses $y$ as a function of $x$. We can then
substitute $y = x^2$ into the second equation, giving
\[
-6x + 2x^2 + 4 = 0 \iff x^2 - 3x + 2 = 0 \iff (x - 1)(x - 2) = 0 \iff x = 1 \text{ or } 2
\]
When $x = 1$, $y = 1^2 = 1$ and when $x = 2$, $y = 2^2 = 4$. So, there are two critical points: $(1, 1)$, $(2, 4)$.
Alternatively, we could have also used the second equation to write $y = 3x - 2$, and then
substituted that into the first equation to get
\[
6x^2 - 6(3x - 2) = 0 \iff x^2 - 3x + 2 = 0
\]
just as above.
Example 16.1.7
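The substitution step in Example 16.1.7 reduces the problem to the quadratic $x^2 - 3x + 2 = 0$. The following sketch repeats that calculation with the quadratic formula and then checks that both partial derivatives vanish at the resulting points. The variable names are illustrative assumptions.

```python
import math

def fx(x, y):
    return 6 * x**2 - 6 * y

def fy(x, y):
    return -6 * x + 2 * y + 4

# Substituting y = x^2 into fy = 0 gives x^2 - 3x + 2 = 0.
a, b, c = 1, -3, 2
disc = b * b - 4 * a * c
roots = sorted([(-b - math.sqrt(disc)) / (2 * a),
                (-b + math.sqrt(disc)) / (2 * a)])

# Each root x gives the critical point (x, x^2).
critical_points = [(x, x**2) for x in roots]
residuals = [(fx(x, y), fy(x, y)) for x, y in critical_points]
```

The list `critical_points` contains $(1, 1)$ and $(2, 4)$, and every entry of `residuals` is a pair of zeros.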
And here is an example for which the algebra requires a bit more thought.
Example 16.1.8 ($f(x, y) = xy(5x + y - 15)$)
Find all critical points of $f(x, y) = xy(5x + y - 15)$.
Solution. The first order partial derivatives of $f(x, y) = xy(5x + y - 15)$ are
\begin{align*}
f_x(x, y) &= y(5x + y - 15) + xy(5) = y(10x + y - 15) \\
f_y(x, y) &= x(5x + y - 15) + xy(1) = x(5x + 2y - 15)
\end{align*}
These are polynomials, so the partial derivatives of the function exist everywhere in the domain of the function. The
critical points are the solutions of $f_x(x, y) = f_y(x, y) = 0$. That is, we need to find all $x$, $y$ that satisfy
the pair of equations
\begin{align*}
y(10x + y - 15) &= 0 && \text{(E1)} \\
x(5x + 2y - 15) &= 0 && \text{(E2)}
\end{align*}
The first equation, $y(10x + y - 15) = 0$, is satisfied if at least one of the two factors $y$, $(10x + y - 15)$
is zero. So the first equation is satisfied if at least one of the two equations
\begin{align*}
y &= 0 && \text{(E1a)} \\
10x + y &= 15 && \text{(E1b)}
\end{align*}
is satisfied. The second equation, $x(5x + 2y - 15) = 0$, is satisfied if at least one of the two factors $x$,
$(5x + 2y - 15)$ is zero. So the second equation is satisfied if at least one of the two equations
\begin{align*}
x &= 0 && \text{(E2a)} \\
5x + 2y &= 15 && \text{(E2b)}
\end{align*}
is satisfied.
So both critical point equations (E1) and (E2) are satisfied if and only if at least one of (E1a),
(E1b) is satisfied and in addition at least one of (E2a), (E2b) is satisfied. So both critical point
equations (E1) and (E2) are satisfied if and only if at least one of the following four possibilities
hold.
• (E1a) and (E2a) are satisfied if and only if $x = y = 0$
• (E1a) and (E2b) are satisfied if and only if $y = 0$, $5x + 2y = 15 \iff y = 0$, $5x = 15$
• (E1b) and (E2a) are satisfied if and only if $10x + y = 15$, $x = 0 \iff y = 15$, $x = 0$
• (E1b) and (E2b) are satisfied if and only if $10x + y = 15$, $5x + 2y = 15$. We can use, for
example, the second of these equations to solve for $x$ in terms of $y$: $x = \frac{1}{5}(15 - 2y)$. When we
substitute this into the first equation we get $2(15 - 2y) + y = 15$, which we can solve for $y$.
This gives $-3y = 15 - 30$ or $y = 5$ and then $x = \frac{1}{5}(15 - 2 \times 5) = 1$.
In conclusion, the critical points are $(0, 0)$, $(3, 0)$, $(0, 15)$ and $(1, 5)$.
A more compact way to write what we have just done is
\begin{align*}
& f_x(x, y) = 0 \text{ and } f_y(x, y) = 0 \\
\iff{}& y(10x + y - 15) = 0 \text{ and } x(5x + 2y - 15) = 0 \\
\iff{}& \{y = 0 \text{ or } 10x + y = 15\} \text{ and } \{x = 0 \text{ or } 5x + 2y = 15\} \\
\iff{}& \{y = 0,\ x = 0\} \text{ or } \{y = 0,\ 5x + 2y = 15\} \text{ or } \{10x + y = 15,\ x = 0\} \text{ or } \{10x + y = 15,\ 5x + 2y = 15\} \\
\iff{}& \{x = y = 0\} \text{ or } \{y = 0,\ x = 3\} \text{ or } \{x = 0,\ y = 15\} \text{ or } \{x = 1,\ y = 5\}
\end{align*}
Example 16.1.8
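The case analysis in Example 16.1.8 is a "pick one factor from each equation" enumeration, which is easy to mechanize. The sketch below is an illustration only: it encodes each linear equation $ax + by = c$ as a hypothetical tuple `(a, b, c)` and solves each of the four pairs with Cramer's rule (a technique swapped in here for convenience, not the substitution used in the text).

```python
from fractions import Fraction
from itertools import product

def solve_2x2(eq1, eq2):
    """Solve two equations a*x + b*y = c, each given as a tuple (a, b, c)."""
    (a1, b1, c1), (a2, b2, c2) = eq1, eq2
    det = Fraction(a1 * b2 - a2 * b1)
    if det == 0:
        return None  # no unique solution; does not occur for the pairs below
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

first = [(0, 1, 0), (10, 1, 15)]   # (E1a) y = 0,  (E1b) 10x + y = 15
second = [(1, 0, 0), (5, 2, 15)]   # (E2a) x = 0,  (E2b) 5x + 2y = 15

# One equation from each list, giving the four cases of the example.
critical_points = sorted(solve_2x2(e1, e2) for e1, e2 in product(first, second))
```

The four solutions are $(0, 0)$, $(0, 15)$, $(1, 5)$ and $(3, 0)$, the same critical points as in the example (listed here in sorted order).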
Let’s try a more practical example, something from the real world. Well, a mathematician’s
“real world”. The interested reader should search-engine their way to a discussion of “idealisation”,
“game theory”, “Cournot models” and “Bertrand models”. But don’t spend too long there. A
discussion of breweries is about to take place.
Example 16.1.9
In a certain community, there are two breweries in competition, so that sales of each negatively
affect the profits of the other. If brewery A produces $x$ litres of beer per month and brewery B
produces $y$ litres per month, then the profits of the two breweries are given by
\[
P = 2x - \frac{2x^2 + y^2}{10^6} \qquad Q = 2y - \frac{4y^2 + x^2}{2 \times 10^6}
\]
respectively. Find the sum of the two profits if each brewery independently sets its own production
level to maximize its own profit and assumes that its competitor does likewise. Then, assuming
cartel behaviour, find the sum of the two profits if the two breweries cooperate so as to maximize
that sum.
Solution. If A adjusts $x$ to maximize $P$ (for $y$ held fixed) and B adjusts $y$ to maximize $Q$ (for $x$ held
fixed) then we want to find the $(x, y)$ at which $P_x$ and $Q_y$ both vanish, where
\begin{align*}
P_x &= 2 - \frac{4x}{10^6} \\
Q_y &= 2 - \frac{8y}{2 \times 10^6}
\end{align*}
Note that $P_x$ and $Q_y$ exist everywhere. Then $x$ and $y$ are determined by the equations
\begin{align*}
P_x &= 0 && \text{(E1)} \\
Q_y &= 0 && \text{(E2)}
\end{align*}
Equation (E1) yields $x = \frac{1}{2} 10^6$ and equation (E2) yields $y = \frac{1}{2} 10^6$. Knowing $x$ and $y$ we can
determine $P$, $Q$ and the total profit
\begin{align*}
P + Q &= 2(x + y) - \frac{1}{10^6}\left(\tfrac{5}{2} x^2 + 3y^2\right) \\
&= 10^6\left(1 + 1 - \tfrac{5}{8} - \tfrac{3}{4}\right) = \tfrac{5}{8}\, 10^6
\end{align*}
On the other hand if (A, B) adjust $(x, y)$ to maximize $P + Q = 2(x + y) - \frac{1}{10^6}\left(\frac{5}{2} x^2 + 3y^2\right)$, then $x$ and
$y$ are determined by
\begin{align*}
(P + Q)_x &= 2 - \frac{5x}{10^6} = 0 && \text{(E1)} \\
(P + Q)_y &= 2 - \frac{6y}{10^6} = 0 && \text{(E2)}
\end{align*}
Equation (E1) yields $x = \frac{2}{5} 10^6$ and equation (E2) yields $y = \frac{1}{3} 10^6$. Again knowing $x$ and $y$ we can
determine the total profit
\begin{align*}
P + Q &= 2(x + y) - \frac{1}{10^6}\left(\tfrac{5}{2} x^2 + 3y^2\right) \\
&= 10^6\left(\tfrac{4}{5} + \tfrac{2}{3} - \tfrac{2}{5} - \tfrac{1}{3}\right) = \tfrac{11}{15}\, 10^6
\end{align*}
So cooperating really does help their profits. Unfortunately, like a very small tea-pot, consumers
will be a little poorer.
Example 16.1.9
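The two totals in the brewery example are easy to mis-add by hand, so here is a short check in exact rational arithmetic. It is a sketch that simply plugs the production levels found above into the profit formulas; the variable names (`M`, `x_c`, `x_k` and so on) are assumptions made for this illustration.

```python
from fractions import Fraction

M = 10**6

def P(x, y):
    """Profit of brewery A."""
    return 2 * x - Fraction(2 * x**2 + y**2, M)

def Q(x, y):
    """Profit of brewery B."""
    return 2 * y - Fraction(4 * y**2 + x**2, 2 * M)

# Competition: P_x = 2 - 4x/10^6 = 0 and Q_y = 2 - 8y/(2*10^6) = 0.
x_c, y_c = Fraction(M, 2), Fraction(M, 2)
total_competing = P(x_c, y_c) + Q(x_c, y_c)

# Cooperation: (P+Q)_x = 2 - 5x/10^6 = 0 and (P+Q)_y = 2 - 6y/10^6 = 0.
x_k, y_k = Fraction(2 * M, 5), Fraction(M, 3)
total_cooperating = P(x_k, y_k) + Q(x_k, y_k)
```

The competing total is $\frac{5}{8} 10^6 = 625000$ and the cooperating total is $\frac{11}{15} 10^6 \approx 733333$, in agreement with the calculation above.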
Moving swiftly away from the last pun, let’s do something a little more geometric.
Example 16.1.10
Equal angle bends are made at equal distances from the two ends of a 100 metre long fence so
the resulting three segment fence can be placed along an existing wall to make an enclosure of
trapezoidal shape. What is the largest possible area for such an enclosure?
[Figure: the three segment fence placed against the wall, with both bends making the angle $\theta$.]
Solution. This is a very geometric problem (fenced off from pun opportunities), and as such we
should start by drawing a sketch and introducing some variable names.
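[Figure: the enclosure, with two slanted end segments of length $x$ at angle $\theta$ to the wall, a middle segment of length $100 - 2x$ parallel to the wall, and height $x \sin\theta$. On the right, the same enclosure split into a rectangle and two triangles.]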
The area enclosed by the fence is the area inside the blue rectangle (in the figure on the right above)
plus the area inside the two blue triangles.
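\begin{align*}
A(x, \theta) &= (100 - 2x)\, x \sin\theta + 2 \cdot \tfrac{1}{2} \cdot x \sin\theta \cdot x \cos\theta \\
&= (100x - 2x^2) \sin\theta + x^2 \sin\theta \cos\theta
\end{align*}
To maximize the area, we need to solve
\begin{align*}
0 &= \frac{\partial A}{\partial x} = (100 - 4x) \sin\theta + 2x \sin\theta \cos\theta \\
0 &= \frac{\partial A}{\partial \theta} = (100x - 2x^2) \cos\theta + x^2 \left\{\cos^2\theta - \sin^2\theta\right\}
\end{align*}
Note that $\frac{\partial A}{\partial x}$ and $\frac{\partial A}{\partial \theta}$ are defined everywhere in their domain (so here the critical points are the points
where both partial derivatives are zero). Both terms in the first equation contain the factor $\sin\theta$ and
all terms in the second equation contain the factor $x$. If either $\sin\theta$ or $x$ is zero the area $A(x, \theta)$ will
also be zero, and so will certainly not be maximal. So we may divide the first equation by $\sin\theta$ and
the second equation by $x$, giving
\begin{align*}
(100 - 4x) + 2x \cos\theta &= 0 && \text{(E1)} \\
(100 - 2x) \cos\theta + x \left\{\cos^2\theta - \sin^2\theta\right\} &= 0 && \text{(E2)}
\end{align*}
These equations might look a little scary. But there is no need to panic. They are not as bad as they
look because $\theta$ enters only through $\cos\theta$ and $\sin^2\theta$, which we can easily write in terms of $\cos\theta$.
Furthermore we can eliminate $\cos\theta$ by observing that the first equation forces $\cos\theta = -\frac{100 - 4x}{2x}$ and
hence $\sin^2\theta = 1 - \cos^2\theta = 1 - \frac{(100 - 4x)^2}{4x^2}$.
Substituting these into the second equation gives
\begin{align*}
& -(100 - 2x)\frac{100 - 4x}{2x} + x\left[\frac{(100 - 4x)^2}{2x^2} - 1\right] = 0 \\
\implies{}& -(100 - 2x)(100 - 4x) + (100 - 4x)^2 - 2x^2 = 0 \\
\implies{}& 6x^2 - 200x = 0 \\
\implies{}& x = \frac{100}{3} \qquad \cos\theta = -\frac{-100/3}{200/3} = \frac{1}{2} \qquad \theta = 60^\circ
\end{align*}
and the corresponding area is
\[
A = \left(100 \cdot \tfrac{100}{3} - 2 \cdot \tfrac{100^2}{3^2}\right)\tfrac{\sqrt{3}}{2} + \tfrac{100^2}{3^2} \cdot \tfrac{\sqrt{3}}{2} \cdot \tfrac{1}{2} = \frac{2500}{\sqrt{3}} \approx 1443 \text{ square metres}
\]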
Example 16.1.10
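The fence calculation can be double-checked numerically. The sketch below evaluates the two partial derivatives of $A(x, \theta) = (100x - 2x^2)\sin\theta + x^2 \sin\theta\cos\theta$ at $x = \frac{100}{3}$, $\theta = 60^\circ$, and also runs a coarse grid search over $0 < x < 50$, $0 < \theta < 90^\circ$. The grid spacing and the function names are assumptions made for illustration, and a grid search is only a sanity check, not a proof of maximality.

```python
import math

def area(x, theta):
    """Area of the trapezoidal enclosure."""
    return (100 * x - 2 * x**2) * math.sin(theta) + x**2 * math.sin(theta) * math.cos(theta)

def grad(x, theta):
    """The partial derivatives (dA/dx, dA/dtheta)."""
    dA_dx = (100 - 4 * x) * math.sin(theta) + 2 * x * math.sin(theta) * math.cos(theta)
    dA_dtheta = ((100 * x - 2 * x**2) * math.cos(theta)
                 + x**2 * (math.cos(theta)**2 - math.sin(theta)**2))
    return dA_dx, dA_dtheta

x_star, theta_star = 100 / 3, math.pi / 3
gx, gt = grad(x_star, theta_star)        # both should be essentially zero
best_area = area(x_star, theta_star)     # 2500 / sqrt(3)

# Coarse grid: x in steps of 0.5 metres, theta in steps of half a degree.
grid_best = max(area(0.5 * i, math.pi * j / 360)
                for i in range(1, 100) for j in range(1, 180))
```

No grid point beats `best_area`, and the best grid value lies just below it, which is consistent with $(\frac{100}{3}, 60^\circ)$ giving the largest enclosure.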
Now here is a very useful (even practical!) statistical example — finding the line that best fits a
given collection of points.
Example 16.1.11 (Linear regression)
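An experiment yields $n$ data points $(x_i, y_i)$, $i = 1, 2, \cdots, n$. We wish to find the straight line
$y = mx + b$ which “best” fits the data. The definition of “best” is “minimizes the root mean
square error”, i.e. minimizes
\[
E(m, b) = \sum_{i=1}^{n} (m x_i + b - y_i)^2
\]
[Figure: the data points $(x_1, y_1), \cdots, (x_n, y_n)$ scattered about the line $y = mx + b$.]
Note that
• Term number $i$ in $E(m, b)$ is the square of the vertical distance from the data point $(x_i, y_i)$ to
the line $y = mx + b$.
• All terms in the sum are nonnegative, regardless of whether the points $(x_i, y_i)$ are above or below
the line.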
Our problem is to find the $m$ and $b$ that minimize $E(m, b)$. This technique for drawing a line through
a bunch of data points is called “linear regression”. It is used a lot. Even in the real world, and
not just the real world that you find in mathematics problems. The actual real world that involves
jobs.
Solution. We wish to choose $m$ and $b$ so as to minimize $E(m, b)$. So we need to determine where
the partial derivatives of $E$ do not exist, or exist and are equal to zero.
\begin{align*}
\frac{\partial E}{\partial m} &= \sum_{i=1}^{n} 2(m x_i + b - y_i) x_i
= m \Big[\sum_{i=1}^{n} 2x_i^2\Big] + b \Big[\sum_{i=1}^{n} 2x_i\Big] - \Big[\sum_{i=1}^{n} 2x_i y_i\Big] \\
\frac{\partial E}{\partial b} &= \sum_{i=1}^{n} 2(m x_i + b - y_i)
= m \Big[\sum_{i=1}^{n} 2x_i\Big] + b \Big[\sum_{i=1}^{n} 2\Big] - \Big[\sum_{i=1}^{n} 2y_i\Big]
\end{align*}
There are a lot of symbols here. But remember that all of the $x_i$’s and $y_i$’s are given constants. They
come from, for example, experimental data. The only unknowns are $m$ and $b$. To emphasize this,
and to save some writing, define the constants
\[
S_x = \sum_{i=1}^{n} x_i \qquad S_y = \sum_{i=1}^{n} y_i \qquad S_{x^2} = \sum_{i=1}^{n} x_i^2 \qquad S_{xy} = \sum_{i=1}^{n} x_i y_i
\]
The partial derivatives of $E$ exist everywhere so we only need to find where they are equal to zero.
The equations which determine the critical points are (after dividing by two)
\begin{align*}
S_{x^2}\, m + S_x\, b &= S_{xy} && \text{(E1)} \\
S_x\, m + n\, b &= S_y && \text{(E2)}
\end{align*}
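These are two linear equations in the unknowns $m$ and $b$. They may be solved in any of the usual
ways. One is to use (E2) to solve for $b$ in terms of $m$
\[
b = \frac{1}{n}\left(S_y - S_x m\right) \qquad \text{(E3)}
\]
and then substitute this into (E1) to get the equation
\[
S_{x^2}\, m + \frac{1}{n} S_x \left(S_y - S_x m\right) = S_{xy} \implies \left(n S_{x^2} - S_x^2\right) m = n S_{xy} - S_x S_y
\]
for $m$. We can then solve this equation for $m$ and substitute back into (E3) to get $b$. This gives
\[
m = \frac{n S_{xy} - S_x S_y}{n S_{x^2} - S_x^2} \qquad b = -\frac{S_x S_{xy} - S_y S_{x^2}}{n S_{x^2} - S_x^2}
\]
Another way to solve the system of equations is
\begin{align*}
n\text{(E1)} - S_x\text{(E2)}: && \left[n S_{x^2} - S_x^2\right] m &= n S_{xy} - S_x S_y \\
-S_x\text{(E1)} + S_{x^2}\text{(E2)}: && \left[n S_{x^2} - S_x^2\right] b &= -S_x S_{xy} + S_y S_{x^2}
\end{align*}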
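The closed-form expressions for $m$ and $b$ translate directly into a few lines of code. The following is a minimal sketch, assuming the data arrive as a list of $(x_i, y_i)$ pairs; the function name `fit_line` and the sample data are hypothetical. The sums $S_x$, $S_y$, $S_{x^2}$ and $S_{xy}$ are computed exactly as defined above.

```python
def fit_line(points):
    """Return (m, b) for the least squares line y = m*x + b."""
    n = len(points)
    Sx = sum(x for x, _ in points)
    Sy = sum(y for _, y in points)
    Sxx = sum(x * x for x, _ in points)   # S_{x^2}
    Sxy = sum(x * y for x, y in points)
    denom = n * Sxx - Sx**2
    if denom == 0:
        # Happens when all x values coincide: no unique line exists.
        raise ValueError("all x values are equal; the slope is undetermined")
    m = (n * Sxy - Sx * Sy) / denom
    b = (Sy * Sxx - Sx * Sxy) / denom
    return m, b

# Hypothetical data lying exactly on the line y = 2x + 1.
data = [(0, 1), (1, 3), (2, 5), (3, 7)]
m, b = fit_line(data)
```

For data that already lie on a line, the formulas recover that line, so here `m` is 2 and `b` is 1. With noisy data the same function returns the minimizer of $E(m, b)$.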
Learning Objectives
• Use the second derivative test to classify critical points as either local maximums, local
minimums, or saddle points.
Now let’s start thinking about how to tell if a critical point is a local minimum, local maximum, or
neither. We’ll start with an intuitive approach, then introduce the (multivariable) Second Derivative
Test.
You have already encountered single variable functions that have a critical point which is neither
a local max nor a local min. This can also happen for functions of two variables. We’ll start with the
simplest possible such example.
Example 16.1.12 ($f(x, y) = x^2 - y^2$)
The first partial derivatives of $f(x, y) = x^2 - y^2$ are $f_x(x, y) = 2x$ and $f_y(x, y) = -2y$. So the only
critical point of this function is $(0, 0)$. Is this a local minimum or maximum? Well let’s start with
$(x, y)$ at $(0, 0)$ and then move $(x, y)$ away from $(0, 0)$ and see if $f(x, y)$ gets bigger or smaller. At the
origin $f(0, 0) = 0$. Of course we can move $(x, y)$ away from $(0, 0)$ in many different directions.
• First consider moving $(x, y)$ along the $x$-axis. Then $(x, y) = (x, 0)$ and $f(x, y) = f(x, 0) = x^2$.
So when we start with $x = 0$ and then increase $x$, the value of the function $f$ increases,
which means that $(0, 0)$ cannot be a local maximum for $f$.
• Next let’s move $(x, y)$ away from $(0, 0)$ along the $y$-axis. Then $(x, y) = (0, y)$ and $f(x, y) =
f(0, y) = -y^2$. So when we start with $y = 0$ and then increase $y$, the value of the function $f$
decreases, which means that $(0, 0)$ cannot be a local minimum for $f$.
So moving away from $(0, 0)$ in one direction causes the value of $f$ to increase, while moving away
from $(0, 0)$ in a second direction causes the value of $f$ to decrease. Consequently $(0, 0)$ is neither
a local minimum nor a local maximum for $f$. It is called a saddle point, because the graph of $f$ looks like
a saddle. (The full definition of “saddle point” is given immediately after this example.) Here are
some figures showing the graph of $f$.
The figure below shows some level curves of $f$. Observe from the level curves that
• $f$ increases as you leave $(0, 0)$ walking along the $x$ axis
• $f$ decreases as you leave $(0, 0)$ walking along the $y$ axis
[Figure: level curves $f = 0$, $f = \pm 1$, $f = \pm 4$ and $f = \pm 9$ of $f(x, y) = x^2 - y^2$ in the $xy$-plane. The curves with positive values cross the $x$-axis and the curves with negative values cross the $y$-axis.]
Example 16.1.12
Approximately speaking, if a critical point (a, b) is neither a local minimum nor a local maximum,
then it is a saddle point. For (a, b) to not be a local minimum, f has to take values smaller than
f (a, b) at some points nearby (a, b). For (a, b) to not be a local maximum, f has to take values
bigger than f (a, b) at some points nearby (a, b). Writing this more mathematically we get the
following definition.
Definition 16.1.13.
The critical point $(a, b)$ is called a saddle point for the function $f(x, y)$ if, for each $r > 0$,
• there is at least one point $(x, y)$, within a distance $r$ of $(a, b)$, for which $f(x, y) > f(a, b)$ and
• there is at least one point $(x, y)$, within a distance $r$ of $(a, b)$, for which $f(x, y) < f(a, b)$.
Understanding what the graph of a function looks like is a powerful tool for classifying critical
points, but it can be very time-consuming. The Second Derivative Test (below) is a more algebraic
approach to classification. This test is often faster than graphing, but the drawback is that it is
sometimes inconclusive.
Theorem 16.1.14 (Second Derivative Test).
Let $r > 0$ and assume that all second order derivatives of the function $f(x, y)$ are continuous
at all points $(x, y)$ that are within a distance $r$ of $(a, b)$. Assume that $f_x(a, b) = f_y(a, b) = 0$.
Define
\[
D(x, y) = f_{xx}(x, y)\, f_{yy}(x, y) - f_{xy}(x, y)^2
\]
It is called the discriminant of $f$. Then
• if $D(a, b) > 0$ and $f_{xx}(a, b) > 0$, then $f(x, y)$ has a local minimum at $(a, b)$,
• if $D(a, b) > 0$ and $f_{xx}(a, b) < 0$, then $f(x, y)$ has a local maximum at $(a, b)$,
• if $D(a, b) < 0$, then $f(x, y)$ has a saddle point at $(a, b)$, but
• if $D(a, b) = 0$, then we cannot draw any conclusions without more work.
The proof of Theorem 16.1.14 is beyond the scope of Math 105, but there is some intuition
supporting it that is more accessible. Extremely informally, we can think of saddle points as places
with inconsistent concavity: in some directions the surface looks concave up, in other directions
it looks concave down. On the other hand, at a local extremum, the concavity is the same in all
directions.
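The Second Derivative Test is a small decision procedure, so it can be written as a function of the three numbers $f_{xx}(a, b)$, $f_{yy}(a, b)$ and $f_{xy}(a, b)$. The sketch below is one possible encoding; the function name `classify` and the returned strings are illustrative choices. It assumes the point has already been confirmed to be a critical point with continuous second order derivatives nearby.

```python
def classify(fxx, fyy, fxy):
    """Classify a critical point from its second order partial derivatives.

    The arguments are the values of f_xx, f_yy and f_xy at the critical
    point. The discriminant is D = f_xx * f_yy - f_xy^2.
    """
    D = fxx * fyy - fxy**2
    if D > 0:
        # When D > 0, f_xx cannot be zero, so one of these two cases holds.
        return "local min" if fxx > 0 else "local max"
    if D < 0:
        return "saddle point"
    return "inconclusive"   # D = 0: the test gives no information
```

For instance, `classify(24, 2, -6)` returns `"local min"` because $D = 24 \times 2 - (-6)^2 = 12 > 0$ with $f_{xx} > 0$, while `classify(12, 2, -6)` returns `"saddle point"` because $D = -12 < 0$.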
Let’s do thought experiments on a few simple cases to expand those ideas.
Example 16.1.15 (Second Derivative Test Intuition)
Let $(a, b)$ be a critical point of the function $f(x, y)$ with $f_x(a, b) = f_y(a, b) = 0$, and assume all
second-order derivatives of $f(x, y)$ are continuous.
1. Suppose at (a, b), the surface looks like a minimum if y is held constant, but it looks like a
maximum if x is held constant. (In particular, this means (a, b) is the location of a saddle
point.)
[Figure: the cross-sections $z = f(x, b)$, which has a minimum at $x = a$, and $z = f(a, y)$, which has a maximum at $y = b$.]
Since $f_{xx}(a, b)$ and $f_{yy}(a, b)$ have different signs (or at least one of them is zero):
\[
f_{xx}(a, b)\, f_{yy}(a, b) \le 0 \implies D(a, b) = f_{xx}(a, b)\, f_{yy}(a, b) - f_{xy}(a, b)^2 \le 0
\]
So in this simple saddle-point example, we expect $D(a, b) \le 0$. This accords with the third
bullet point in Theorem 16.1.14.
2. Suppose $D(a, b) > 0$.
\begin{align*}
0 &< f_{xx}(a, b)\, f_{yy}(a, b) - f_{xy}(a, b)^2 \\
f_{xy}(a, b)^2 &< f_{xx}(a, b)\, f_{yy}(a, b)
\end{align*}
This tells us that fxx (a, b) and fyy (a, b) have the same sign – either they’re both positive or
they’re both negative. So, the function’s concavity is the same whether we hold the x-value or
the y-value constant. The function might have the same concavity in all directions – unlike the
saddle point example we saw above. So, it seems plausible that critical points with positive
discriminants are local extrema, rather than saddle points.
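3. Suppose $f(x, y)$ has a local maximum at $(a, b)$. Then the cross-section $z = f(x, b)$ also has a
local maximum at $x = a$, so we expect $f_{xx}(a, b) \le 0$.
[Figure: a surface $z = f(x, y)$ with a local maximum at $(a, b)$, and its cross-section $z = f(x, b)$.]
This doesn’t go so far as to show us that $D(a, b) \ge 0$, but it does accord with the test of
$f_{xx}(a, b)$ in the second bullet point of Theorem 16.1.14.
4. Suppose $f(x, y)$ has a local minimum at $(a, b)$. Then the cross-section $z = f(x, b)$ also has a
local minimum at $x = a$, so we expect $f_{xx}(a, b) \ge 0$.
[Figure: a surface $z = f(x, y)$ with a local minimum at $(a, b)$, and its cross-section $z = f(x, b)$.]
Again, although this doesn’t go so far as to show us that $D(a, b) \ge 0$, it does accord with the
test of $f_{xx}(a, b)$ in the first bullet point of Theorem 16.1.14.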
Example 16.1.15
You might wonder why, in the local maximum/local minimum cases of Theorem 16.1.14,
$f_{xx}(a, b)$ appears rather than $f_{yy}(a, b)$. The answer is only that $x$ is before $y$ in the alphabet$^9$. You
can use $f_{yy}(a, b)$ just as well as $f_{xx}(a, b)$. The reason is that if $D(a, b) > 0$ (as in the first two bullets
of the theorem), then because $D(a, b) = f_{xx}(a, b)\, f_{yy}(a, b) - f_{xy}(a, b)^2 > 0$, we necessarily have
$f_{xx}(a, b)\, f_{yy}(a, b) > 0$ so that $f_{xx}(a, b)$ and $f_{yy}(a, b)$ must have the same sign: either both are
positive or both are negative.
You might also wonder why we cannot draw any conclusions when $D(a, b) = 0$ and what
happens then. The second derivative test for functions of two variables is derived in precisely the
same way as the second derivative test for functions of one variable: you approximate
the function by a polynomial that is of degree two in $(x - a)$, $(y - b)$ and then you analyze the
behaviour of the quadratic polynomial near $(a, b)$. For this to work, the contributions to $f(x, y)$ from
terms that are of degree two in $(x - a)$, $(y - b)$ had better be bigger than the contributions to $f(x, y)$
from terms that are of degree three and higher in $(x - a)$, $(y - b)$ when $(x - a)$, $(y - b)$ are really
small. If this is not the case, for example when the terms in $f(x, y)$ that are of degree two in $(x - a)$,
$(y - b)$ all have coefficients that are exactly zero, the analysis will certainly break down. That’s
exactly what happens when $D(a, b) = 0$. Here are some examples. The functions
\[
f_1(x, y) = x^4 + y^4 \qquad f_2(x, y) = -x^4 - y^4 \qquad f_3(x, y) = x^3 + y^3 \qquad f_4(x, y) = x^4 - y^4
\]
all have $(0, 0)$ as the only critical point and all have $D(0, 0) = 0$. The first, $f_1$, has its minimum there.
The second, $f_2$, has its maximum there. The third and fourth have a saddle point there.
Here are sketches of some level curves for each of these four functions (with all renamed to
simply $f$).
9 The shackles of convention are not limited to mathematics. Election ballots often have the candidates listed in
alphabetic order.
[Figure: four panels of level curves in the $xy$-plane, one for each function. First panel: $f = 0, 0.1, 1, 4, 9$. Second panel: $f = 0, -0.1, -1, -4, -9$. Third and fourth panels: $f = 0, \pm 1, \pm 4$.]
Example 16.1.16 ($f(x, y) = 2x^3 - 6xy + y^2 + 4y$)
Find and classify all critical points of $f(x, y) = 2x^3 - 6xy + y^2 + 4y$.
Solution. Thinking a little way ahead, to find the critical points we will need the first order partial
derivatives. To apply the second derivative test of Theorem 16.1.14 we will need all second order
partial derivatives. So we need all partial derivatives of order up to two. Here they are.
\begin{align*}
f &= 2x^3 - 6xy + y^2 + 4y \\
f_x &= 6x^2 - 6y & f_{xx} &= 12x & f_{xy} &= -6 \\
f_y &= -6x + 2y + 4 & f_{yy} &= 2 & f_{yx} &= -6
\end{align*}
(Of course, $f_{xy}$ and $f_{yx}$ have to be the same. It is still useful to compute both, as a way to catch some
mechanical errors.)
We have already found, in Example 16.1.7, that the critical points are $(1, 1)$, $(2, 4)$. The
classification is

critical point    $f_{xx} f_{yy} - f_{xy}^2$         $f_{xx}$    type
$(1, 1)$          $12 \times 2 - (-6)^2 < 0$                     saddle point
$(2, 4)$          $24 \times 2 - (-6)^2 > 0$         $24$        local min

We were able to leave the $f_{xx}$ entry in the top row blank, because
• we knew that $f_{xx}(1, 1)\, f_{yy}(1, 1) - f_{xy}^2(1, 1) < 0$, and
• we knew, from Theorem 16.1.14, that $f_{xx}(1, 1)\, f_{yy}(1, 1) - f_{xy}^2(1, 1) < 0$, by itself, was enough
to ensure that $(1, 1)$ was a saddle point.
Here is a sketch of some level curves of $f(x, y)$. They are not needed to answer this question,
but can give you some idea as to what the graph of $f$ looks like.
[Figure: level curves $f = 0, 0.25, 0.5, 1, 2, 3$ of $f$, with the saddle point $(1, 1)$, where $f(1, 1) = 1$, and the local minimum $(2, 4)$, where $f(2, 4) = 0$, marked.]
Example 16.1.16
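Example 16.1.17
Find and classify all critical points of $f(x, y) = xy(5x + y - 15)$.
Solution. We have already computed the first order partial derivatives
\[
f_x(x, y) = y(10x + y - 15) \qquad f_y(x, y) = x(5x + 2y - 15)
\]
of $f(x, y)$ in Example 16.1.8. Again, to classify the critical points we need the second order partial
derivatives. They are
\begin{align*}
f_{xx}(x, y) &= 10y \\
f_{yy}(x, y) &= 2x \\
f_{xy}(x, y) &= 10x + 2y - 15 \\
f_{yx}(x, y) &= 10x + 2y - 15
\end{align*}
(Once again, we have computed both $f_{xy}$ and $f_{yx}$ to guard against mechanical errors.) We have
already found, in Example 16.1.8, that the critical points are $(0, 0)$, $(0, 15)$, $(3, 0)$ and $(1, 5)$. The
classification is

critical point    $f_{xx} f_{yy} - f_{xy}^2$         $f_{xx}$    type
$(0, 0)$          $0 \times 0 - (-15)^2 < 0$                     saddle point
$(0, 15)$         $150 \times 0 - 15^2 < 0$                      saddle point
$(3, 0)$          $0 \times 6 - 15^2 < 0$                        saddle point
$(1, 5)$          $50 \times 2 - 5^2 > 0$            $50$        local min

Here is a sketch of some level curves of our $f(x, y)$. $f$ is negative in the shaded regions and $f$ is
positive in the unshaded regions. Again this is not needed to answer this question, but can give you
some idea as to what the graph of $f$ looks like.
[Figure: level curves $f = 0$, $f = \pm 20$ and $f = -10$ of $f$, with the saddle points $(0, 0)$, $(3, 0)$ and $(0, 15)$, where $f = 0$, and the local minimum $(1, 5)$, where $f(1, 5) = -25$, marked.]
Example 16.1.17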
Example 16.1.18
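Find and classify all of the critical points of $f(x, y) = x^3 + xy^2 - 3x^2 - 4y^2 + 4$.
Solution. We know the drill now. We start by computing all of the partial derivatives of f up to order
2.
\[ f_x = 3x^2 + y^2 - 6x \qquad f_{xx} = 6x - 6 \qquad f_{xy} = 2y \]
\[ f_y = 2xy - 8y \qquad f_{yx} = 2y \qquad f_{yy} = 2x - 8 \]
$f_x$ and $f_y$ are defined everywhere. So the critical points are then the solutions of $f_x = 0$, $f_y = 0$. That
is
\[ f_x = 3x^2 + y^2 - 6x = 0 \tag{E1} \]
\[ f_y = 2y(x - 4) = 0 \tag{E2} \]
The second equation, $2y(x - 4) = 0$, is satisfied if and only if at least one of the two equations y = 0
and x = 4 is satisfied.
• When y = 0, equation (E1) forces x to obey
\[ 0 = 3x^2 + 0^2 - 6x = 3x(x - 2) \]
so that x = 0 or x = 2.
• When x = 4, equation (E1) forces y to obey
\[ 0 = 3 \times 4^2 + y^2 - 6 \times 4 = 24 + y^2 \]
which is impossible.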
So, there are two critical points: (0, 0), (2, 0). Here is a table that classifies the critical points.
critical point    $f_{xx} f_{yy} - f_{xy}^2$            $f_{xx}$      type
(0, 0)            $(-6) \times (-8) - 0^2 > 0$          $-6 < 0$      local max
(2, 0)            $6 \times (-4) - 0^2 < 0$                           saddle point
Example 16.1.18
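The classification in Example 16.1.18 can be double-checked with a few lines of code. The following is only a minimal sketch: the helper name `classify` is a hypothetical choice made for this illustration, and the second partial derivatives were entered by hand from $f(x, y) = x^3 + xy^2 - 3x^2 - 4y^2 + 4$.

```python
# Second derivative test for f(x, y) = x^3 + x*y^2 - 3*x^2 - 4*y^2 + 4.
# The second partials below were computed by hand from this f.

def f_xx(x, y):
    return 6 * x - 6

def f_yy(x, y):
    return 2 * x - 8

def f_xy(x, y):
    return 2 * y

def classify(x, y):
    """Apply the second derivative test at the critical point (x, y).

    `classify` is a hypothetical helper name used only in this sketch.
    """
    d = f_xx(x, y) * f_yy(x, y) - f_xy(x, y) ** 2
    if d < 0:
        return "saddle point"
    if d > 0:
        return "local min" if f_xx(x, y) > 0 else "local max"
    return "test inconclusive"

# The two critical points found in the example.
for point in [(0, 0), (2, 0)]:
    print(point, classify(*point))
```

The function only encodes the sign checks of the second derivative test; it does not find the critical points, which still have to come from solving (E1) and (E2).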
Example 16.1.19
A manufacturer wishes to make an open rectangular box of given volume V using the least possible
material. Find the design specifications.
Solution. Denote by x, y and z, the length, width and height, respectively, of the box.
[Figure: sketch of the open-topped box with its edge lengths labelled.]
The box has two sides of area xz, two sides of area yz and a bottom of area xy. So the total surface
area of material used is
S = 2xz + 2yz + xy
However the three dimensions x, y and z are not independent. The requirement that the box have
volume V imposes the constraint
\[ xyz = V \]
We can use this constraint to eliminate one variable. Since z is at the end of the alphabet (poor z),
we eliminate z by substituting $z = \frac{V}{xy}$. Note that if x (or y) is equal to zero then the volume of the
box would equal zero. What is the point of a box with zero volume?! So if we assume the box has
non-zero volume then $x \ne 0$ and $y \ne 0$. So we have to find the values of x and y that minimize the
function
\[ S(x, y) = \frac{2V}{y} + \frac{2V}{x} + xy \]
Let’s start by finding the critical points of S. The first order partial derivatives are
\[ S_x(x, y) = -\frac{2V}{x^2} + y \]
\[ S_y(x, y) = -\frac{2V}{y^2} + x \]
Note that the partial derivatives are not defined for (x, y) = (0, 0) but we have already eliminated
the case where x or y is equal to zero. So (x, y) is a critical point if and only if
\[ x^2 y = 2V \tag{E1} \]
\[ x y^2 = 2V \tag{E2} \]
Solving (E1) for y gives $y = \frac{2V}{x^2}$. Substituting this into (E2) gives
\[ x\,\frac{4V^2}{x^4} = 2V \implies x^3 = 2V \implies x = \sqrt[3]{2V} \quad\text{and}\quad y = \frac{2V}{(2V)^{2/3}} = \sqrt[3]{2V} \]
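As there is only one critical point, we would expect it to give the minimum¹⁰. But let’s use the
second derivative test to verify that at least the critical point is a local minimum. The various second
partial derivatives are
\[ S_{xx}(x, y) = \frac{4V}{x^3} \qquad S_{xx}\big(\sqrt[3]{2V}, \sqrt[3]{2V}\big) = 2 \]
\[ S_{xy}(x, y) = 1 \qquad S_{xy}\big(\sqrt[3]{2V}, \sqrt[3]{2V}\big) = 1 \]
\[ S_{yy}(x, y) = \frac{4V}{y^3} \qquad S_{yy}\big(\sqrt[3]{2V}, \sqrt[3]{2V}\big) = 2 \]
So
\[ S_{xx}\big(\sqrt[3]{2V}, \sqrt[3]{2V}\big)\, S_{yy}\big(\sqrt[3]{2V}, \sqrt[3]{2V}\big) - S_{xy}\big(\sqrt[3]{2V}, \sqrt[3]{2V}\big)^2 = 3 > 0 \qquad S_{xx}\big(\sqrt[3]{2V}, \sqrt[3]{2V}\big) = 2 > 0 \]
and, by Theorem 16.1.14.b, $\big(\sqrt[3]{2V}, \sqrt[3]{2V}\big)$ is a local minimum and the desired dimensions are
\[ x = y = \sqrt[3]{2V} \qquad z = \sqrt[3]{\frac{V}{4}} \]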
10 Indeed one can use the facts that $0 < x < \infty$, that $0 < y < \infty$, and that $S \to \infty$ as $x \to 0$ and as $y \to 0$ and as $x \to \infty$
and as $y \to \infty$ to prove that the single critical point gives the global minimum.
O PTIMIZATION OF M ULTIVARIABLE F UNCTIONS 16.2 A BSOLUTE MINIMA AND MAXIMA
Note that our solution has x = y. That’s a good thing — the function S(x, y) is symmetric in x and y.
Because the box has no top, the symmetry does not extend to z.
Example 16.1.19
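As a numerical sanity check of Example 16.1.19, the sketch below evaluates the surface area at the critical point and compares it with the values on a coarse grid. The sample volume V = 4 and the grid spacing are arbitrary choices made for this illustration; they are not taken from the example.

```python
# Numerical sanity check of the open-box example for one sample volume.
# V = 4.0 is an arbitrary illustrative choice, not a value from the text.
V = 4.0

def surface(x, y):
    """Surface area S(x, y) = 2V/y + 2V/x + x*y after eliminating z."""
    return 2 * V / y + 2 * V / x + x * y

x0 = y0 = (2 * V) ** (1 / 3)   # critical point x = y = cube root of 2V
z0 = V / (x0 * y0)             # height recovered from x*y*z = V

# Compare S at the critical point with S on a coarse grid of positive (x, y).
grid = [0.25 * k for k in range(1, 41)]          # 0.25, 0.5, ..., 10.0
smallest_on_grid = min(surface(x, y) for x in grid for y in grid)

print(x0, y0, z0)
print(surface(x0, y0), smallest_on_grid)
```

A grid comparison like this is evidence, not a proof; the argument that the critical point gives the global minimum is the one in footnote 10.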
Learning Objectives
• Reduce a constrained optimization problem in 3D, where the constraint is a single
function (possibly with endpoints, possibly not), to a single-variable calculus problem.
• Understand that the global extrema of a two-variable function over a closed region
occur along the boundary and/or at critical points of the interior
• Find the extreme values for a function of two variables on a closed region in cases
where optimization on the boundary can be reduced to a single-variable calculus
problem.
Of course a local maximum or minimum of a function need not be the absolute maximum or
minimum. We’ll now consider how to find the absolute maximum and minimum. Let’s start by
reviewing how one finds the absolute maximum and minimum of a function of one variable on an
interval.
For concreteness, let’s suppose that we want to find the extremal¹¹ values of a function f (x) on
the interval $0 \le x \le 1$. If an extremal value is attained at some x = a which is in the interior of the
interval, i.e. if $0 < a < 1$, then a is also a local maximum or minimum and so has to be a critical
point of f . But if an extremal value is attained at a boundary point a of the interval, i.e. if a = 0
or a = 1, then a need not be a critical point of f . This happens, for example, when f (x) = x. The
largest value of f (x) on the interval $0 \le x \le 1$ is 1 and is attained at x = 1, but $f'(x) = 1$ is never
zero, so that f has no critical points.
[Figure: graph of y = f (x) = x on the interval $0 \le x \le 1$.]
So to find the maximum and minimum of the function f (x) on the interval [0, 1], you:
1. build up a list of all candidate points $0 \le a \le 1$ at which the maximum or minimum could be
attained, by finding all a’s for which either
11 Recall that “extremal value” means “either maximum value or minimum value”.
The second equation, $2y(x - 4) = 0$, is satisfied if and only if at least one of the two equations y = 0
and x = 4 is satisfied.
• When y = 0, equation (E1) forces x to obey
\[ 0 = 3x^2 + 0^2 - 6x = 3x(x - 2) \]
so that x = 0 or x = 2.
• When x = 4, equation (E1) forces y to obey
\[ 0 = 3 \times 4^2 + y^2 - 6 \times 4 = 24 + y^2 \]
which is impossible.
So, there are only two critical points: (0, 0), (2, 0).
Boundary: Our boundary is $x^2 + y^2 = 1$. We know that (x, y) satisfies $x^2 + y^2 = 1$, and hence
$y^2 = 1 - x^2$. Examining the formula for f (x, y), we see that it contains only even¹⁵ powers of y, so
we can eliminate y by substituting $y^2 = 1 - x^2$ into the formula.
\[ f = x^3 + x(1 - x^2) - 3x^2 - 4(1 - x^2) + 4 = x + x^2 \]
The max and min of $x + x^2$ for $-1 \le x \le 1$ must occur either
• when $x = -1$ ($\Rightarrow$ y = f = 0) or
• when $x = +1$ ($\Rightarrow$ y = 0, f = 2) or
• when $0 = \frac{d}{dx}(x + x^2) = 1 + 2x$ (so $x = -\frac12$, $y = \pm\sqrt{\frac34}$, $f = -\frac14$).
Here is a sketch showing all of the points that we have identified.
[Figure: the disk $x^2 + y^2 \le 1$ with the candidate points marked, including $(-\frac12, \frac{\sqrt3}{2})$ and $(-\frac12, -\frac{\sqrt3}{2})$.]
Note that the point (2, 0) is outside the allowed region¹⁶. So all together, we have the following
candidates for max and min, with the max and min indicated.
point         (0, 0)    $(-1, 0)$    (1, 0)    $(-\frac12, \pm\frac{\sqrt3}{2})$
value of f    4         0            2         $-\frac14$
              max                              min
Example 16.2.1
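The candidate list of Example 16.2.1 can be checked numerically. The sketch below evaluates f at each candidate and, independently, scans the disk on a polar grid; the grid resolution is an arbitrary choice made for this illustration.

```python
import math

def f(x, y):
    return x**3 + x * y**2 - 3 * x**2 - 4 * y**2 + 4

# Candidates: the interior critical point and the boundary candidates.
r = math.sqrt(3) / 2
candidates = [(0.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (-0.5, r), (-0.5, -r)]
values = {p: f(*p) for p in candidates}

# Independent brute-force scan of the disk x^2 + y^2 <= 1 on a polar grid.
scan = [
    f(rad * math.cos(t), rad * math.sin(t))
    for rad in (i / 200 for i in range(201))
    for t in (2 * math.pi * j / 720 for j in range(720))
]

print(values)
print(max(scan), min(scan))
```

The scan should agree with the largest and smallest candidate values up to the resolution of the grid.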
15 If it contained odd powers too, we could consider the cases $y \ge 0$ and $y \le 0$ separately and substitute $y = \sqrt{1 - x^2}$
in the former case and $y = -\sqrt{1 - x^2}$ in the latter case.
16 We found (2, 0) as a solution to the critical point equations (E1), (E2). That’s because, in the course of solving
those equations, we ignored the constraint that $x^2 + y^2 \le 1$.
Example 16.2.2
Find the maximum and minimum values of $f(x, y) = xy - x^3 y^2$ when (x, y) runs over the square
$0 \le x \le 1$, $0 \le y \le 1$.
Solution. As usual, let’s examine the critical points and boundary in turn.
Interior: If f takes its maximum or minimum value at a point in the interior, $0 < x < 1$, $0 < y < 1$,
then that point must be a critical point of f . To find the critical points we compute the first order
derivatives.
\[ f_x(x, y) = y - 3x^2 y^2 \qquad f_y(x, y) = x - 2x^3 y \]
Again, these functions are polynomials in two variables and they are smooth everywhere in their
domain, so the first order partial derivatives exist everywhere in the interior. This means that the
critical points are the solutions of
\[ y - 3x^2 y^2 = 0 \qquad x - 2x^3 y = 0 \]
In the interior, where x > 0 and y > 0, dividing the first equation by y and the second by x would
require both $3x^2 y = 1$ and $2x^2 y = 1$, which is impossible. So f has no critical points in the interior
of the square.
• Next, we look at the part of the boundary with y = 0. On that entire side f = 0.
• Next, we look at the part of the boundary with y = 1. There $f = f(x, 1) = x - x^3$. To find the
maximum and minimum of f (x, y) on the part of the boundary with y = 1, we must find the
maximum and minimum of $x - x^3$ when $0 \le x \le 1$.
Recall that, in general, the maximum and minimum of a function h(x) on the interval $a \le x \le b$
must occur either at x = a or at x = b or at an x for which either $h'(x) = 0$ or $h'(x)$ does not
exist. In this case, $\frac{d}{dx}(x - x^3) = 1 - 3x^2$, so the max and min of $x - x^3$ for $0 \le x \le 1$ must
occur
– either at x = 0, where f = 0,
– or at $x = \frac{1}{\sqrt3}$, where $f = \frac{2}{3\sqrt3}$,
– or at x = 1, where f = 0.
• Finally, we look at the part of the boundary with x = 1. There $f = f(1, y) = y - y^2$ and
$\frac{d}{dy}(y - y^2) = 1 - 2y$, so the max and min of $y - y^2$ for $0 \le y \le 1$ must occur
– either at y = 0, where f = 0,
– or at $y = \frac12$, where $f = \frac14$,
– or at y = 1, where f = 0.
All together, we have the following candidates for max and min, with the max and min indicated.
[Figure: the square $0 \le x \le 1$, $0 \le y \le 1$ with the points (0, 0), (1, 0), (0, 1), $(\frac{1}{\sqrt3}, 1)$, (1, 1) and $(1, \frac12)$ marked.]
Example 16.2.2
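A brute-force scan makes the outcome of Example 16.2.2 concrete. This is a minimal sketch under an arbitrary grid resolution (n = 400, chosen only for this illustration): it finds the largest and smallest values of f on the grid and compares them with the values at the four corners.

```python
import math

def f(x, y):
    return x * y - x**3 * y**2

n = 400  # grid resolution; an arbitrary illustrative choice

# Largest value on the grid, together with the grid point where it occurs.
best_val, best_pt = max(
    (f(i / n, j / n), (i / n, j / n)) for i in range(n + 1) for j in range(n + 1)
)
# Smallest value on the grid.
worst_val = min(f(i / n, j / n) for i in range(n + 1) for j in range(n + 1))
# Largest value among the four corners of the square only.
corner_max = max(f(x, y) for x in (0, 1) for y in (0, 1))

print(best_val, best_pt, 2 / (3 * math.sqrt(3)))
print(worst_val, corner_max)
```

The grid maximum sits on the side y = 1 near $x = \frac{1}{\sqrt3}$, and it is strictly larger than anything found at the corners.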
A common misconception when students are first learning about “checking boundaries” is
that the absolute extrema will occur on the “corners” of the boundaries. In the example we
just finished, Example 16.2.2, the four corners of our square boundary were indeed points
we needed to check. But if we had only checked the corners, we wouldn’t have found the
absolute maximum.
In your homework, if you notice that the extrema often occur at “corners” of boundaries,
or at points with x or y equal to 0, you should not take this to be a general rule.
To really see why corners don’t need to be important, consider the image¹⁷ below of an area
northeast of UBC. The central body of water in the image is Indian Arm. Indian Arm extends into
the ocean, so its elevation is pretty close to sea level. If we’re thinking of the z axis as height above
sea level, the surface of Indian Arm is probably the global minimum height in the rectangular region
shown. So, the global minimum along the boundary is not at a corner. It’s somewhere in the middle
of the left vertical boundary segment.
17 image generated by Natural Resources Canada’s Atlas of Canada - Toporama and shared under the open government
license
Similarly, looking at the mountains in the image, there’s no reason to imagine the absolute highest
point along the boundary must specifically happen at a corner.
Example 16.2.4
Find the high and low points of the surface $z = \sqrt{x^2 + y^2}$ with (x, y) varying over the square $|x| \le 1$,
$|y| \le 1$.
Solution. The function $f(x, y) = \sqrt{x^2 + y^2}$ has a particularly simple geometric interpretation: it
is the distance from the point (x, y) to the origin. So
• the minimum of f (x, y) is achieved at the point in the square that is nearest the origin,
namely the origin itself. So (0, 0, 0) is the lowest point on the surface and is at height 0.
• The maximum of f (x, y) is achieved at the points in the square that are farthest from the
origin, namely the four corners of the square $(\pm 1, \pm 1)$. At those four points $z = \sqrt2$. So
the highest points on the surface are $(\pm 1, \pm 1, \sqrt2)$.
Even though we have already answered this question, it will be instructive to see what we would
have found if we had followed our usual protocol. The partial derivatives of $f(x, y) = \sqrt{x^2 + y^2}$ are
defined for $(x, y) \ne (0, 0)$ and are
\[ f_x(x, y) = \frac{x}{\sqrt{x^2 + y^2}} \qquad f_y(x, y) = \frac{y}{\sqrt{x^2 + y^2}} \]
• As we mentioned above, at the point (x, y) = (0, 0) the partial derivatives are not defined. But
(0, 0) is inside the interior of the domain of our function. Therefore, (0, 0) is a critical point.
• There are no other critical points because $f_x = 0$ only when x = 0 and $f_y = 0$ only when y = 0,
while the partial derivatives are not defined at (0, 0).
• The boundary of the square consists of its four sides. One side is
\[ \{\, (x, y) \mid x = 1,\ -1 \le y \le 1 \,\} \]
On this side $f = \sqrt{1 + y^2}$. As $\sqrt{1 + y^2}$ increases with |y|, the smallest value of f on that side
is 1 (when y = 0) and the largest value of f is $\sqrt2$ (when $y = \pm 1$). The same thing happens
on the other three sides. The maximum value of f is achieved at the four corners. Note that $f_x$
and $f_y$ are both nonzero at all four corners.
Example 16.2.4
Suppose you start with the complete graph on 30 vertices. You delete edges (but not vertices)
one-by-one until the graph is broken into three parts. Every part has at least one vertex (otherwise it
wouldn’t be a part, it would be a nothing) and there are no edges between vertices of different parts.
Some possibilities are shown below to demonstrate.
What is the minimum number of edges you could have deleted, in order to break the graph into
three pieces?
Solution. Let’s name the pieces X, Y , and W , and say the numbers of vertices they contain are x, y,
and w, respectively. Then $x \ge 1$, $y \ge 1$, $w \ge 1$, and x + y + w = 30.
For every vertex in one piece of the broken graph, you must have deleted the edges connecting it
to every vertex in every other piece. So, to delete all the edges from X to Y , you deleted at least xy
edges; to delete all the edges from X to W , you deleted at least xw edges; and to delete all the edges
from Y to W , you deleted at least yw edges. So all together, you deleted at least this many edges:
\[ xy + xw + yw \]
Since x + y + w = 30, we can eliminate w from our expression, and say the minimum
number of edges deleted was:
\[ f(x, y) = xy + x(30 - x - y) + y(30 - x - y) = 30x + 30y - x^2 - xy - y^2 \]
The domain of this function is all integer pairs in the region bounded by $x \ge 1$, $y \ge 1$, and $x + y \le 29$.
[Figure: the triangular region bounded by the lines x = 1, y = 1 and x + y = 29, with x and y each running from 1 to 28.]
To find the minimum value of f (x, y) in this region, we should check for critical points, and check
all three boundary lines.
• First, let’s check for critical points.
\[ f(x, y) = 30x + 30y - x^2 - xy - y^2 \]
\[ f_x = 30 - 2x - y \qquad f_y = 30 - 2y - x \]
Solving $f_x = 0$ for y, we find $y = 30 - 2x$. Plugging into the equation $f_y = 0$, we get:
\[ 0 = f_y = 30 - 2(30 - 2x) - x = 3x - 30 \]
\[ x = 10 \]
\[ y = 30 - 2x = 10 \]
So, our only critical point is (10, 10), and this is inside our region.
\[ f(10, 10) = 300 + 300 - 100 - 100 - 100 = 300 \]
• Second, let’s check the boundary line y = 1, $1 \le x \le 28$. On this portion of the boundary:
\[ f(x, 1) = 30x + 30 - x^2 - x - 1 = -x^2 + 29x + 29 \]
• Third, we check the boundary line x = 1, $1 \le y \le 28$. On this portion of the boundary:
\[ f(1, y) = 30 + 30y - 1 - y - y^2 = -y^2 + 29y + 29 \]
• Fourth, we check the final boundary line, $y = 29 - x$, $1 \le x \le 28$. On this portion of the
boundary:
\[ f(x, 29 - x) = 30x + 30(29 - x) - x^2 - x(29 - x) - (29 - x)^2 = -x^2 + 29x + 29 \]
The one-variable function $g(x) = -x^2 + 29x + 29$ is a parabola pointing down, so its minimum
will occur at an endpoint of our interval: x = 1 or x = 28. At both endpoints, g(1) = g(28) = 57.
The same quadratic appears on each of the three boundary lines.
Comparing the values from the four bullet points, we find the minimum number of edges we could
have deleted in order to break the complete graph into 3 pieces is 57. We achieve that minimum by
having two pieces of one vertex each, and the remaining piece with all other vertices.
O PTIMIZATION OF M ULTIVARIABLE F UNCTIONS 16.3 L AGRANGE MULTIPLIERS
Remark 1: making use of sketching and symmetry can reduce the amount of work involved in
solving this problem. If we recognize that f (x, y) is a paraboloid opening down, then we know its
critical point will actually be an absolute max – not the minimum we’re looking for.
We can see that x and y are symmetric in f (x, y) and in our region, so we also could have checked
only the boundary x = 1, and not the boundary y = 1, understanding that their minimum values
would be the same.
Remark 2: Our model domain for this problem actually restricts x and y to whole-number values,
as opposed to real numbers. We showed that 57 was the minimum value of f (x, y) over all real
numbers in the sketched region. Since whole numbers are themselves reals, and the minimum
occurred at integer values of x and y (i.e. the minimum is in our model domain), we can be sure that
57 is the minimum over all whole numbers in our domain. If the minimum had occurred at, say
$x = \frac12$ and $y = \frac12$, then it wouldn’t have been in our model domain – and this would be a problem for
a different course!
Example 16.2.5
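Because the model domain of Example 16.2.5 is a finite set of whole-number triples, the answer can also be confirmed by direct enumeration. The sketch below is only a check of the calculus argument, not a replacement for it; the function name `edges_deleted` is a hypothetical label chosen for this illustration.

```python
def edges_deleted(x, y, w):
    """Edges removed between three parts with x, y and w vertices."""
    return x * y + x * w + y * w

n = 30  # number of vertices in the complete graph

# Enumerate every split x + y + w = n with x, y, w >= 1.
results = [
    (edges_deleted(x, y, n - x - y), (x, y, n - x - y))
    for x in range(1, n - 1)
    for y in range(1, n - x)
]

min_edges, min_split = min(results)
max_edges, max_split = max(results)
print(min_edges, min_split)   # fewest deletions and one split achieving it
print(max_edges, max_split)   # most deletions, at the critical point
```

The enumeration also shows the role of the critical point (10, 10): it gives the largest value of f, not the smallest.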
Learning Objectives
• Understand that solutions to a particular system of equations correspond to points
along a curve that is locally flat.
• Choose between the method of Lagrange multipliers, and simple plugging in, for
determining extrema along a constraint.
• Find the absolute extrema of a surface over a closed region, using the appropriate
method (Lagrange or plugging in) for investigating the boundary.
In the last section we had to solve a number of problems of the form “What is the maximum value
of the function f on the curve C?” In those examples, the curve C was simple enough that we could
reduce the problem to finding the maximum of a function of one variable. For more complicated
problems this reduction might not be possible. In this section, we introduce another method for
solving such problems. First some nomenclature.
Definition 16.3.1.
“Find the maximum and minimum values of the function f (x, y) for (x, y) on the curve
g(x, y) = 0.”
Such problems are quite common. As we said above, we have already encountered them in
the last section on absolute maxima and minima, when we were looking for the extreme values
of a function on the boundary of a region. In economics “utility functions” are used to model the
relative “usefulness” or “desirability” or “preference” of various economic choices. For example, a
utility function U (w, κ ) might specify the relative level of satisfaction a consumer would get from
purchasing a quantity w of wine and κ of coffee. If the consumer wants to spend $100 and wine
costs $20 per unit and coffee costs $5 per unit, then the consumer would like to maximize U (w, κ )
subject to the constraint that 20w + 5κ = 100.
To this point we have always solved such constrained optimization problems by solving g(x, y) =
0 for y as a function of x (or for x as a function of y). However, quite often the function g(x, y) is so
complicated that one cannot explicitly solve g(x, y) = 0 for y as a function of x or for x as a function
of y and one also cannot explicitly parametrize g(x, y) = 0. Or sometimes you can, for example,
solve g(x, y) = 0 for y as a function of x, but the resulting solution is so complicated that it is
really hard, or even virtually impossible, to work with. Direct attacks become even harder in higher
dimensions when, for example, we wish to optimize a function f (x, y, z) subject to a constraint
g(x, y, z) = 0.
There is another procedure called the method of “Lagrange¹⁸ multipliers” that comes to our
rescue in these scenarios. Here is the two-dimensional version of the method. There are obvious
analogues in other dimensions.
18 Joseph-Louis Lagrange was actually born Giuseppe Lodovico Lagrangia in Turin, Italy in 1736. He moved to
Berlin in 1766 and then to Paris in 1786. He eventually acquired French citizenship and then the French claimed he
was a French mathematician, while the Italians continued to claim that he was an Italian mathematician.
19 If you’re walking along hilly terrain, changing direction can cause you to change from going uphill to downhill.
Direction definitely matters!
Let f (x, y, z) and g(x, y, z) have continuous first partial derivatives in a region of $\mathbb{R}^3$
that contains the surface S given by the equation g(x, y, z) = 0. Further assume that
$\nabla g(x, y, z) \ne 0$ on S.
If f , restricted to the surface S, has a local extreme value at the point (a, b, c) on S, then
there is a real number λ such that
\[ \nabla f(a, b, c) = \lambda \nabla g(a, b, c) \]
that is
\[ f_x(a, b, c) = \lambda g_x(a, b, c) \]
\[ f_y(a, b, c) = \lambda g_y(a, b, c) \]
\[ f_z(a, b, c) = \lambda g_z(a, b, c) \]
Proof. Suppose that (a, b, c) is a point of S and that $f(x, y, z) \ge f(a, b, c)$ for all points (x, y, z) on S
that are close to (a, b, c). That is, (a, b, c) is a local minimum for f on S. Of course the argument for
a local maximum is virtually identical.
Imagine that we go for a walk on S, with the time t running, say, from $t = -1$ to $t = +1$ and
that at time t = 0 we happen to be exactly at (a, b, c). Let’s say that our position is $(x(t), y(t), z(t))$
at time t. Write
\[ F(t) = f\big(x(t), y(t), z(t)\big) \]
So F (t) is the value of f that we see on our walk at time t. Then for all t close to 0, $(x(t), y(t), z(t))$
is close to $(x(0), y(0), z(0)) = (a, b, c)$ so that
\[ F(0) = f\big(x(0), y(0), z(0)\big) = f(a, b, c) \le f\big(x(t), y(t), z(t)\big) = F(t) \]
for all t close to zero. So F (t) has a local minimum at t = 0 and consequently $F'(0) = 0$.
By the multivariable chain rule,
\[ F'(0) = \frac{d}{dt} f\big(x(t), y(t), z(t)\big)\Big|_{t=0} = f_x(a, b, c)\, x'(0) + f_y(a, b, c)\, y'(0) + f_z(a, b, c)\, z'(0) = 0 \tag{$*$} \]
We may rewrite this as a dot product:
\[ 0 = F'(0) = \nabla f(a, b, c) \cdot [x'(0), y'(0), z'(0)] \implies \nabla f(a, b, c) \perp [x'(0), y'(0), z'(0)] \]
This is true for all paths on S that pass through (a, b, c) at time 0. In particular it is true for all vectors
$[x'(0), y'(0), z'(0)]$ that are tangent to S at (a, b, c). So $\nabla f(a, b, c)$ is perpendicular to S at (a, b, c).
But we already know that $\nabla g(a, b, c)$ is also perpendicular to S at (a, b, c). So $\nabla f(a, b, c)$ and
$\nabla g(a, b, c)$ have to be parallel vectors. That is,
\[ \nabla f(a, b, c) = \lambda \nabla g(a, b, c) \]
for some number λ . That’s the Lagrange multiplier rule of our theorem.
Let f (x, y) and g(x, y) have continuous first partial derivatives in a region of $\mathbb{R}^2$ that
contains the curve S given by the equation g(x, y) = 0. Further assume that g(x, y) has
no critical points on S.
If f , restricted to the curve S, has a local extreme value at the point (a, b) on S, then
there is a real number λ such that
\[ f_x(a, b) = \lambda g_x(a, b) \]
\[ f_y(a, b) = \lambda g_y(a, b) \]
So to find the maximum and minimum values of f (x, y) on a curve g(x, y) = 0, assuming
that both the objective function f (x, y) and constraint function g(x, y) have continuous first partial
derivatives, and that g(x, y) has no critical points, you
1. build up a list of candidate points (x, y) by finding all solutions to the equations
\[ f_x(x, y) = \lambda g_x(x, y) \]
\[ f_y(x, y) = \lambda g_y(x, y) \]
\[ g(x, y) = 0 \]
Note that there are three equations and three unknowns, namely x, y, and λ .
2. Then you evaluate f (x, y) at each (x, y) on the list of candidates. The biggest of these
candidate values is the absolute maximum, if an absolute maximum exists. The smallest of
these candidate values is the absolute minimum, if an absolute minimum exists.
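The two steps above can be sketched in code for a small made-up problem. Everything specific here is an assumption chosen for illustration and does not come from the text: the objective f (x, y) = xy, the constraint $g(x, y) = x^2 + y^2 - 2$, and the helper name `satisfies_lagrange`. The sketch uses the fact that, when $\nabla g \ne 0$, a multiplier λ with $\nabla f = \lambda \nabla g$ exists exactly when $f_x g_y - f_y g_x = 0$.

```python
# Hypothetical illustration: objective f(x, y) = x*y,
# constraint g(x, y) = x^2 + y^2 - 2 = 0 (a circle of radius sqrt(2)).

def f(x, y):
    return x * y

def g(x, y):
    return x**2 + y**2 - 2

def grad_f(x, y):
    return (y, x)

def grad_g(x, y):
    return (2 * x, 2 * y)

def satisfies_lagrange(x, y, tol=1e-9):
    """True if g(x, y) = 0 and grad f is parallel to grad g at (x, y).

    With grad g nonzero, grad f = lambda * grad g for some lambda exactly
    when the 2D cross product f_x*g_y - f_y*g_x vanishes.
    """
    fx, fy = grad_f(x, y)
    gx, gy = grad_g(x, y)
    return abs(g(x, y)) < tol and abs(fx * gy - fy * gx) < tol

# Step 1: candidate points (worked out by hand for this toy problem).
candidates = [(1, 1), (-1, -1), (1, -1), (-1, 1)]

# Step 2: evaluate f at every candidate that passes the check.
values = [f(x, y) for x, y in candidates if satisfies_lagrange(x, y)]
print(max(values), min(values))
```

Eliminating λ through the cross product is just one convenient way to test a candidate; it does not replace solving the system to find the candidates in the first place.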
Theorem 16.3.3 can be extended to functions of more variables in a natural way. Using higher-
dimensional Lagrange isn’t in our learning goals, but for interest, we want you to see how easily the
method generalizes. The calculus is the same – it’s only the algebra that gets longer.
Let f (x, y, z) and g(x, y, z) have continuous first partial derivatives in a region of R3 that
contains the surface S given by the equation g(x, y, z) = 0. Further assume that g(x, y, z)
has no critical points on S.
If f , restricted to the surface S, has a local extreme value at the point (a, b, c) on S, then
there is a real number λ such that
fx (a, b, c) = λ gx (a, b, c)
fy (a, b, c) = λ gy (a, b, c)
fz (a, b, c) = λ gz (a, b, c)
\[ f_x = 2x - 10 \qquad f_y = -2y \qquad g_x = 2x \qquad g_y = 8y \]
So, according to the method of Lagrange multipliers, we need to find all solutions to the following
system of equations.
\[ f_x = \lambda g_x \implies 2x - 10 = \lambda(2x) \tag{E1} \]
\[ f_y = \lambda g_y \implies -2y = \lambda(8y) \tag{E2} \]
\[ g(x, y) = 0 \implies x^2 + 4y^2 - 16 = 0 \tag{E3} \]
(E1) In equation (E1), if 2x is nonzero, then we can divide both sides of the equation by it, to find
$\lambda = \frac{2x - 10}{2x}$, i.e. $\lambda = \frac{x - 5}{x}$. If 2x = 0, then the equation becomes $-10 = 0\lambda$, which is not
true for any λ .
(E2) In equation (E2), if 8y is nonzero, then we can divide both sides of the equation by it, to find
$\lambda = \frac{-2y}{8y}$, i.e. $\lambda = -\frac14$. If 8y = 0, then we also get a solution y = 0 for any λ .
(E1)+(E2) We need all three equations to be true at the same time (that is, for the same values of x,
y, and λ ). We’ve found two ways for both (E1) and (E2) to be true.
• First way: $\lambda = \frac{x - 5}{x}$ and $\lambda = -\frac14$
• Second way: $\lambda = \frac{x - 5}{x}$ and y = 0
(E3) Now we’ll see which points make (E1) and (E2) true while also making (E3) true.
• First way:
\[ \lambda = \frac{x - 5}{x} \text{ and } \lambda = -\frac14 \implies \frac{x - 5}{x} = -\frac14 \implies -4x + 20 = x \implies x = 4 \]
and then (E3) gives
\[ 0 = 4^2 + 4y^2 - 16 \implies 0 = y \]
• Second way: with y = 0, (E3) gives
\[ 0 = x^2 + 4 \cdot 0^2 - 16 \implies 16 = x^2 \implies x = \pm 4 \]
Now we’ve found the only possible solutions to all three equations: $(\pm 4, 0)$. (λ has to exist, but
we don’t actually care what it is.) So the method of Lagrange multipliers, Theorem 16.3.3, gives that
the only possible locations of the maximum and minimum of the function f are (4, 0) and $(-4, 0)$.
To complete the problem, we only have to compute f at those points.
\[ f(4, 0) = 4^2 - 10 \times 4 - 0^2 = -24 \qquad f(-4, 0) = (-4)^2 - 10 \times (-4) - 0^2 = 56 \]
Hence the maximum value of $x^2 - 10x - y^2$ on the ellipse is 56 and the minimum value is $-24$.
[Figure: the ellipse $x^2 + 4y^2 = 16$ with the points (4, 0) and $(-4, 0)$ marked.]
Example 16.3.5
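The conclusion of Example 16.3.5 can be cross-checked without Lagrange multipliers by walking around the ellipse. The sketch below assumes the standard parametrization $(4\cos t, 2\sin t)$ of $x^2 + 4y^2 = 16$, which is not used in the example itself, and an arbitrary number of sample points.

```python
import math

def f(x, y):
    return x**2 - 10 * x - y**2

# Parametrize the ellipse x^2 + 4y^2 = 16 as (4 cos t, 2 sin t).
n = 3600  # number of sample points; an arbitrary illustrative choice
samples = [
    (f(4 * math.cos(t), 2 * math.sin(t)), t)
    for t in (2 * math.pi * k / n for k in range(n))
]

max_val, t_max = max(samples)
min_val, t_min = min(samples)
print(max_val, (4 * math.cos(t_max), 2 * math.sin(t_max)))
print(min_val, (4 * math.cos(t_min), 2 * math.sin(t_min)))
```

This is the "plugging in" approach of the previous section, done numerically; it works here only because the ellipse is easy to parametrize.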
In the previous example, we had to make a lot of decisions about how to solve for the solutions
to the system of three equations. Actually, we can start our Lagrange system-solving the same way
every time. The first observation we make is that the partial derivatives of g can be 0, or nonzero. If
they’re zero, this may or may not lead to a solution; if they’re nonzero, this tells us something about
λ.
In the textbook and problem book, we will consistently use the same method to solve the system
of equations. It’s certainly not the only way, and you are free to use other methods. Once you
get used to the computations, you’ll probably start finding ways to make them faster based on the
specifics of individual problems.
Example 16.3.6 (Solving Lagrange in General)
Suppose you want to find all points (x, y) for which a solution exists to the system below.
f x = λ gx (E1)
f y = λ gy (E2)
g(x, y) = 0 (E3)
where λ is some real constant. Our method below will hinge on the observation from the last
example that we get different solutions for zero vs. nonzero partial derivatives of the constraint.
• If $g_x \ne 0$ and $g_y \ne 0$, then from (E1) we see $\lambda = \frac{f_x}{g_x}$, and from (E2) we see $\lambda = \frac{f_y}{g_y}$. So,
choosing a pair (x, y) such that
\[ \frac{f_x}{g_x} = \frac{f_y}{g_y} \]
means that for some λ , that pair makes (E1) and (E2) true. Simplify the equation above to
find the necessary relationship between x and y, then find which pairs with that relationship
make (E3) true.
• If $g_x = 0$, then from (E1) we see also $f_x = 0$. Then (E1) is true for any λ that we like. We can
check that there exists some λ that makes (E2) true as well. Then, we find the points (x, y)
that make (E3) true as well as $g_x = f_x = 0$.
• If $g_y = 0$, then from (E2) we see also $f_y = 0$. Then (E2) is true for any λ that we like. We can
check that there exists some λ that makes (E1) true as well. Then, we find the points (x, y)
that make (E3) true as well as $g_y = f_y = 0$.
Sometimes, one or more of these cases won’t lead to any solutions. In Example 16.3.5, we were
immediately able to discard the possibility gx = 0, because it didn’t lead to a solution. Once you’re
practiced with these types of problems, you’ll often see quite quickly which cases you get to discard.
Example 16.3.6
\[ g(x, y) = x^2 - 2x + y^2 - 4y - 20 = 0 \]
We start by setting up the equations from the method of Lagrange multipliers.
\[ f_x = \lambda g_x \implies \frac{2x - 2}{x^2 - 2x + 5} = \lambda(2x - 2) \tag{E1} \]
\[ f_y = \lambda g_y \implies \frac{2y - 4}{y^2 - 4y + 13} = \lambda(2y - 4) \tag{E2} \]
\[ g(x, y) = 0 \implies x^2 - 2x + y^2 - 4y = 20 \tag{E3} \]
• If $g_x \ne 0$ and $g_y \ne 0$, then dividing (E1) by $2x - 2$ and (E2) by $2y - 4$ gives two expressions
for λ , which must agree:
\[ \frac{1}{x^2 - 2x + 5} = \frac{1}{y^2 - 4y + 13} \]
\[ x^2 - 2x + 5 = y^2 - 4y + 13 \]
\[ x^2 - 2x = y^2 - 4y + 8 \]
This gives us the relationship between x and y that must hold for (E1) and (E2) to be true
under the assumption $g_x \ne 0$ and $g_y \ne 0$. Now, in order for (E3) to be true as well:
\[ 0 = (x^2 - 2x) + y^2 - 4y - 20 \]
\[ = (y^2 - 4y + 8) + y^2 - 4y - 20 \]
\[ = 2y^2 - 8y - 12 \]
\[ 0 = y^2 - 4y - 6 \]
\[ y = \frac{4 \pm \sqrt{16 - 4(1)(-6)}}{2} = \frac{4 \pm \sqrt{40}}{2} = 2 \pm \sqrt{10} \]
So,
\[ 0 = (x^2 - 2x) + y^2 - 4y - 20 \]
\[ = x^2 - 2x + (2 \pm \sqrt{10})^2 - 4(2 \pm \sqrt{10}) - 20 \]
\[ = x^2 - 2x + 4 \pm 4\sqrt{10} + 10 - 8 \mp 4\sqrt{10} - 20 \]
Note $\pm 4\sqrt{10} \mp 4\sqrt{10} = 0$
\[ = x^2 - 2x + 4 + 10 - 8 - 20 \]
\[ = x^2 - 2x - 14 \]
\[ x = \frac{2 \pm \sqrt{4 - 4(-14)}}{2} = \frac{2 \pm 2\sqrt{15}}{2} = 1 \pm \sqrt{15} \]
This gives us four points to consider:
$(1 + \sqrt{15}, 2 + \sqrt{10})$, $(1 - \sqrt{15}, 2 + \sqrt{10})$, $(1 + \sqrt{15}, 2 - \sqrt{10})$, and $(1 - \sqrt{15}, 2 - \sqrt{10})$.
• If $g_x = 0$, then x = 1, and (E1) is true for any λ . Then we can choose whatever λ is necessary
to make (E2) true. By (E3):
\[ 0 = x^2 - 2x + y^2 - 4y - 20 \]
\[ = 1 - 2 + y^2 - 4y - 20 \]
\[ = y^2 - 4y - 21 \]
\[ = (y - 7)(y + 3) \]
\[ y = 7, \quad y = -3 \]
• If $g_y = 0$, then y = 2, and (E2) is true for any λ . Then we can choose whatever λ is necessary
to make (E1) true. By (E3):
\[ 0 = x^2 - 2x + y^2 - 4y - 20 \]
\[ = x^2 - 2x + 4 - 8 - 20 \]
\[ = x^2 - 2x - 24 \]
\[ = (x - 6)(x + 4) \]
\[ x = 6, \quad x = -4 \]
So, all together we have eight points that satisfy our three Lagrange equations. It’s left only to
decide which of those points lead to maxima and to minima.
point         $(1 + \sqrt{15}, 2 + \sqrt{10})$    $(1 - \sqrt{15}, 2 + \sqrt{10})$    $(1 + \sqrt{15}, 2 - \sqrt{10})$    $(1 - \sqrt{15}, 2 - \sqrt{10})$
value of f    $\ln 361$                           $\ln 361$                           $\ln 361$                           $\ln 361$
              max                                 max                                 max                                 max
Example 16.3.8
Find the ends of the major and minor axes of the ellipse $3x^2 - 2xy + 3y^2 = 4$. They are the points on
the ellipse that are farthest from and nearest to the origin.
Solution. Let (x, y) be a point on $3x^2 - 2xy + 3y^2 = 4$. This point is at the end of a major axis when
it maximizes its distance from the centre of the ellipse, (0, 0). It is at the end of a minor axis when it
minimizes its distance from (0, 0). So we wish to maximize and minimize the distance $\sqrt{x^2 + y^2}$
subject to the constraint
\[ g(x, y) = 3x^2 - 2xy + 3y^2 - 4 = 0 \]
Now maximizing/minimizing $\sqrt{x^2 + y^2}$ is equivalent²⁰ to maximizing/minimizing its square $\big(\sqrt{x^2 + y^2}\big)^2 = x^2 + y^2$.
So we are free to choose the objective function
\[ f(x, y) = x^2 + y^2 \]
which we will do, because it makes the derivatives cleaner. Again, we use Lagrange multipliers to
solve this problem, so we start by finding the partial derivatives.
\[ f_x(x, y) = 2x \qquad f_y(x, y) = 2y \qquad g_x(x, y) = 6x - 2y \qquad g_y(x, y) = -2x + 6y \]
We need to find all solutions to
\[ 2x = \lambda(6x - 2y) \tag{E1} \]
\[ 2y = \lambda(-2x + 6y) \tag{E2} \]
\[ 3x^2 - 2xy + 3y^2 - 4 = 0 \tag{E3} \]
• If $g_x \ne 0$ and $g_y \ne 0$, then $\lambda = \frac{2x}{6x - 2y} = \frac{x}{3x - y}$ by (E1), and $\lambda = \frac{2y}{-2x + 6y} = \frac{y}{-x + 3y}$ by (E2).
\[ \frac{x}{3x - y} = \frac{y}{-x + 3y} \]
\[ -x^2 + 3xy = 3xy - y^2 \]
\[ x^2 = y^2 \]
\[ x = \pm y \]
So if $x = \pm y$, then the appropriate λ will make both (E1) and (E2) true. Now let’s see what
makes (E3) true.
\[ 4 = 3x^2 - 2xy + 3y^2 \]
\[ 4 = 3(\pm y)^2 - 2(\pm y)y + 3y^2 \]
\[ = 3y^2 \mp 2y^2 + 3y^2 \]
\[ = (6 \mp 2)y^2 \]
\[ 4 = (6 + 2)x^2 \implies x = \pm\frac{1}{\sqrt2} \quad\text{when } x = -y \]
\[ 4 = (6 - 2)x^2 \implies x = \pm 1 \quad\text{when } x = y \]
20 The function $S(z) = z^2$ is a strictly increasing function for $z \ge 0$. So, for $a, b \ge 0$, the statement “$a < b$” is equivalent
to the statement “$S(a) < S(b)$”.
This gives us four points to check: the two points $\pm\big(\frac{1}{\sqrt2}, -\frac{1}{\sqrt2}\big)$ and the two points $\pm(1, 1)$.
The distance from (0, 0) to $\pm(1, 1)$, namely $\sqrt2$, is larger than the distance from (0, 0) to
$\pm\big(\frac{1}{\sqrt2}, -\frac{1}{\sqrt2}\big)$, namely 1. So the ends of the minor axes are $\pm\big(\frac{1}{\sqrt2}, -\frac{1}{\sqrt2}\big)$ and the ends of the major
axes are $\pm(1, 1)$. Those ends are sketched in the figure on the left below. Once we have the ends, it
is an easy matter²¹ to sketch the ellipse as in the figure on the right below.
[Figure: left panel, the four points (1, 1), $(-1, -1)$, $(-1, 1)/\sqrt2$ and $(1, -1)/\sqrt2$; right panel, the ellipse $3x^2 - 2xy + 3y^2 = 4$ passing through those four points.]
Example 16.3.8
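The axis lengths found in Example 16.3.8 can be checked by scanning directions from the origin. This sketch relies on a step that is not in the example: substituting $(x, y) = r(\cos t, \sin t)$ into the constraint gives $r^2 (3 - \sin 2t) = 4$, so the squared distance to the ellipse along the direction t is $4 / (3 - \sin 2t)$.

```python
import math

def g(x, y):
    return 3 * x**2 - 2 * x * y + 3 * y**2 - 4

# On the ray at angle t, the ellipse point is r*(cos t, sin t) with
# r^2 * (3 - 2 cos t sin t) = 4, so r^2 = 4 / (3 - sin 2t).
n = 3600  # number of directions scanned; an arbitrary illustrative choice
dist_sq = [4 / (3 - math.sin(2 * (2 * math.pi * k / n))) for k in range(n)]

s = 1 / math.sqrt(2)
major_ends = [(1, 1), (-1, -1)]
minor_ends = [(s, -s), (-s, s)]

print(max(dist_sq), min(dist_sq))                         # squared distances
print([g(x, y) for x, y in major_ends + minor_ends])      # constraint residuals
```

The largest and smallest squared distances should match the values of $f = x^2 + y^2$ at the ends of the major and minor axes.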
In the previous examples, the objective function and the constraint were specified explicitly. That
will not always be the case. In the next example, we have to do a little geometry to extract them.
Example 16.3.9
Find the rectangle of largest area (with sides parallel to the coordinate axes) that can be inscribed in
the ellipse $x^2 + 2y^2 = 1$.
21 if you tilt your head so that the line through (1, 1) and (´1, ´1) appears horizontal
[Figure: the ellipse $x^2 + 2y^2 = 1$ with an inscribed rectangle whose upper right corner is labelled (x, y).]
Call the coordinates of the upper right corner of the rectangle (x, y), as in the figure above. Note
that x ě 0 and y ě 0; and if x = 0 or y = 0, then the area of the rectangle is 0, which is certainly
not a maximum. So the global maximum must occur at some point where x and y are both positive.
This will also be a local maximum, so we should be able to find it using the method of Lagrange
multipliers.
The four corners of the rectangle are $(\pm x, \pm y)$, so the rectangle has width $2x$ and height $2y$, and the
objective function is $f(x, y) = 4xy$. The constraint function for this problem is $g(x, y) = x^2 + 2y^2 - 1$.
Again, to use Lagrange multipliers we need the first order partial derivatives.
$f_x = 4y \qquad f_y = 4x \qquad g_x = 2x \qquad g_y = 4y$
So, according to the method of Lagrange multipliers, we need to find all solutions to
$4y = \lambda(2x)$ (E1)
$4x = \lambda(4y)$ (E2)
$x^2 + 2y^2 - 1 = 0$ (E3)
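• If $g_x \ne 0$ and $g_y \ne 0$, then $\lambda = \frac{4y}{2x} = \frac{2y}{x}$ from (E1) and $\lambda = \frac{4x}{4y} = \frac{x}{y}$ from (E2). So,
$\frac{2y}{x} = \frac{x}{y}$
$2y^2 = x^2$
$x = (\pm\sqrt{2})\,y$
From (E3),
$\left((\pm\sqrt{2})\,y\right)^2 + 2y^2 - 1 = 0$
$2y^2 + 2y^2 = 1$
$4y^2 = 1$
$y = \pm\frac{1}{2}$
$x = (\pm\sqrt{2})\,y = \pm\frac{1}{\sqrt{2}}$
So there are four points to consider: $\left(\pm\frac{1}{\sqrt{2}}, \pm\frac{1}{2}\right)$.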
• If $g_x = 0$, i.e. $2x = 0$, then $x = 0$; by (E1) also $y = 0$; but then (E3) fails. So this doesn’t give
us any more points to consider.
• If $g_y = 0$, i.e. $4y = 0$, then $y = 0$; by (E2) also $x = 0$; but then (E3) fails. So this doesn’t give
us any more points to consider either.
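We now have four possible values of $(x, y)$, namely $\left(\frac{1}{\sqrt{2}}, \frac{1}{2}\right)$, $\left(-\frac{1}{\sqrt{2}}, -\frac{1}{2}\right)$, $\left(\frac{1}{\sqrt{2}}, -\frac{1}{2}\right)$
and $\left(-\frac{1}{\sqrt{2}}, \frac{1}{2}\right)$. They are the four corners of a single rectangle. We said that we wanted $(x, y)$ to
be the upper right corner, i.e. the corner in the first quadrant. It is $\left(\frac{1}{\sqrt{2}}, \frac{1}{2}\right)$.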
How do we interpret the other three points we found? The global min of the function $4xy$ subject
to the constraint $x^2 + 2y^2 = 1$ will occur at one of these points, but those points aren’t in our model
domain. When x and y have different signs, 4xy no longer gives the area of a rectangle, since it’s
negative. Over our model domain, we effectively have “endpoints”: x = 0 and y = 0. Our maximum
occurred somewhere between our endpoints; our model minimum occurs at the endpoints.
Example 16.3.9
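As a cross-check on the example above, the following sketch compares the Lagrange solution with a brute-force search. It assumes the parametrization x = cos t, y = sin t / √2 of the first-quadrant arc of the ellipse (a choice made here, not taken from the text), and the names `area`, `N`, and `residuals` are illustrative only.

```python
import math

def area(t):
    # Upper right corner on x^2 + 2y^2 = 1, parametrized (by assumption)
    # as x = cos(t), y = sin(t)/sqrt(2) for 0 <= t <= pi/2.
    x = math.cos(t)
    y = math.sin(t) / math.sqrt(2.0)
    return 4.0 * x * y

N = 2000  # grid resolution, an arbitrary choice
ts = [0.5 * math.pi * k / N for k in range(N + 1)]
best_t = max(ts, key=area)
best_x = math.cos(best_t)
best_y = math.sin(best_t) / math.sqrt(2.0)
best_area = area(best_t)

# Residuals of (E1), (E2), (E3) at the point found by Lagrange multipliers.
x0, y0 = 1.0 / math.sqrt(2.0), 0.5
lam = 2.0 * y0 / x0  # lambda = 2y/x, from (E1)
residuals = (
    4.0 * y0 - lam * 2.0 * x0,   # (E1)
    4.0 * x0 - lam * 4.0 * y0,   # (E2)
    x0**2 + 2.0 * y0**2 - 1.0,   # (E3)
)

print("grid maximizer:", (best_x, best_y), "area:", best_area)
print("Lagrange residuals:", residuals)
```

The grid maximizer should land on the first-quadrant point found above, with area $4xy = \sqrt{2}$, and all three residuals should vanish up to rounding.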
1. If our constraint is a closed curve (circle, ellipse, square, etc.) and our objective
function is continuous over it, then there will certainly be an absolute max and absolute min
over the constraint; and these will certainly also be local extrema. So when our constraint is
a closed curve, and our objective function is continuous over it, we are guaranteed that the
absolute max and min exist, and are at points that satisfy the Lagrange equations.
In Section 16.2 we considered domains that were bounded by a closed curve, so we only
considered boundaries of this type.
2. If our constraint is not a closed curve (e.g. a line, a line segment, a curve like
xy = 1, etc.) then the system is more complicated. Assume that the objective function is
continuous over the constraint curve. Since our constraint curve is one-dimensional (like a
line, but a line that has some orientation in space), we’re in a similar position as we were in
single-variable calculus: extrema can occur at endpoints, or at “critical points.” In our case,
“critical points” translate to solutions to the Lagrange equations; “endpoints” mean pretty
much the same thing they always have.
(a) If the constraint curve is bounded, we must consider its endpoints as well as solutions
to the Lagrange system. There will be an absolute maximum and minimum, and these
will definitely occur at solutions to the Lagrange system or at the endpoints of the
constraint.
(b) If the constraint curve is unbounded, there may or may not exist absolute extrema.
This is where you’ll most heavily rely on your understanding of function shape and
behaviour. Limits can be useful here.
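To make case 2(a) concrete, here is a small hypothetical example that does not come from the text: optimize f(x, y) = x² + y² on the line segment x + y = 1, 0 ≤ x ≤ 1. The Lagrange equations 2x = λ, 2y = λ, x + y = 1 have the single solution (1/2, 1/2), and the endpoints are (0, 1) and (1, 0). The sketch below evaluates f at those candidates and compares with a brute-force sample of the segment.

```python
def f(x, y):
    # Hypothetical objective chosen for illustration only.
    return x**2 + y**2

# Candidates on the segment x + y = 1, 0 <= x <= 1.
lagrange_points = [(0.5, 0.5)]          # solves 2x = lam, 2y = lam, x + y = 1
endpoints = [(0.0, 1.0), (1.0, 0.0)]    # ends of the segment
candidates = lagrange_points + endpoints

values = {p: f(*p) for p in candidates}
abs_min = min(values, key=values.get)
abs_max = max(values, key=values.get)

# Brute-force sample of the segment, as an independent check.
samples = [(k / 1000.0, 1.0 - k / 1000.0) for k in range(1001)]
sample_values = [f(x, y) for x, y in samples]

print("absolute min at", abs_min, "with value", values[abs_min])
print("absolute max at", abs_max, "with value", values[abs_max])
```

In this example the Lagrange point gives the absolute minimum, while the absolute maximum occurs at the endpoints, so neither list of candidates could be skipped.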
Example 16.3.10
Find the values of $w \ge 0$ and $\kappa \ge 0$ that maximize the utility function
$U(w, \kappa) = 6w^{2/3}\kappa^{1/3}$
subject to the constraint $4w + 2\kappa = 12$.
Solution. The constraint $4w + 2\kappa = 12$ is simple enough that we can easily use it to express
$\kappa$ in terms of $w$, then substitute $\kappa = 6 - 2w$ into $U(w, \kappa)$, and then maximize $U(w, 6 - 2w) =
6w^{2/3}(6 - 2w)^{1/3}$ using the techniques of last semester.
However, for practice purposes, we’ll use Lagrange multipliers with the objective function
$U(w, \kappa) = 6w^{2/3}\kappa^{1/3}$ and the constraint function $g(w, \kappa) = 4w + 2\kappa - 12$. The first order derivatives
of these functions are
$U_w = 4w^{-1/3}\kappa^{1/3} \qquad U_\kappa = 2w^{2/3}\kappa^{-2/3}$
$g_w = 4 \qquad g_\kappa = 2$
The boundary values (“endpoints”) w = 0 and κ = 0 give utility 0, which is obviously not going to
be the maximum utility. So it suffices to consider only local maxima. According to the method of
Lagrange multipliers, we need to find all solutions to
$4w^{-1/3}\kappa^{1/3} = 4\lambda$ (E1)
$2w^{2/3}\kappa^{-2/3} = 2\lambda$ (E2)
$4w + 2\kappa - 12 = 0$ (E3)
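Then we see $g_w \ne 0$ and $g_\kappa \ne 0$, so we only have one of our usual three cases.
• Dividing (E1) by 4 gives $\lambda = w^{-1/3}\kappa^{1/3}$. Substituting this into (E2) gives $w^{2/3}\kappa^{-2/3} = \lambda = w^{-1/3}\kappa^{1/3}$; multiplying both sides by $w^{1/3}\kappa^{2/3}$ gives $w = \kappa$.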
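Carrying $w = \kappa$ into (E3) gives $4w + 2w = 12$, so $w = \kappa = 2$, where $U(2, 2) = 6 \cdot 2^{2/3} \cdot 2^{1/3} = 12$. As a hedged numerical cross-check of that computation, the sketch below uses the substitution $\kappa = 6 - 2w$ mentioned at the start of the solution and scans $0 \le w \le 3$ on a grid. The function name `utility` and the grid size `N` are assumptions of this example.

```python
def utility(w, kappa):
    # U(w, kappa) = 6 * w^(2/3) * kappa^(1/3), for w, kappa >= 0.
    return 6.0 * w ** (2.0 / 3.0) * kappa ** (1.0 / 3.0)

def kappa_on_constraint(w):
    # Solve 4w + 2*kappa = 12 for kappa; clamp at 0 to guard against rounding.
    return max(0.0, 6.0 - 2.0 * w)

N = 3000  # grid resolution, an arbitrary choice
ws = [3.0 * k / N for k in range(N + 1)]  # kappa >= 0 forces 0 <= w <= 3
best_w = max(ws, key=lambda w: utility(w, kappa_on_constraint(w)))
best_kappa = kappa_on_constraint(best_w)
best_u = utility(best_w, best_kappa)

# Residual of (E2) at (2, 2), with lambda taken from (E1).
lam = 2.0 ** (-1.0 / 3.0) * 2.0 ** (1.0 / 3.0)
e2_residual = 2.0 * 2.0 ** (2.0 / 3.0) * 2.0 ** (-2.0 / 3.0) - 2.0 * lam

print("grid maximizer:", (best_w, best_kappa), "utility:", best_u)
print("(E2) residual at (2, 2):", e2_residual)
```

The grid search should agree with $w = \kappa = 2$ and a maximum utility of 12, while the boundary values $w = 0$ and $\kappa = 0$ give utility 0.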
Appendix
Appendix A
Chapter 0: Introduction
• Solve a long question by breaking it up into smaller pieces.
• Understand some basic ideas about what constitutes a proof in mathematics; understand the
differences between how something is defined and how it is computed.
• Sketch functions of the form $f(x) = x^n$, where n is a real number (power functions); interpret
the shapes of power functions relative to one another.
• Determine which term in a polynomial function will dominate for small x and for large x.
• Sketch two-term polynomial functions by determining which term dominates for small x and
for large x. For example, sketch $f(x) = x^2 - 3x^4$.
L IST OF LEARNING OBJECTIVES
• Know that $e^x$ eventually dominates any given power function, and any power function with
positive exponent dominates logarithm (for large positive x). Use these facts for sketching.
For example, sketch $f(x) = e^x - x$.
• Sketch familiar functions such as $e^x$, $\log x$, $\sin x$, $\cos x$, $\tan x$, $1/x$, $\sqrt{x}$, and $|x|$.
Chapter 2: Limits
2.1: Quick review of limits
• Explain using both words and pictures what $\lim_{x \to a} f(x) = L$, $\lim_{x \to a^-} f(x) = L$, and $\lim_{x \to a^+} f(x) = L$
mean (including the case where L is equal to $\infty$ or $-\infty$).
• Explain using both words and pictures what $\lim_{x \to \infty} f(x) = L$ and $\lim_{x \to -\infty} f(x) = L$ mean (including
the case where L is equal to $\infty$ or $-\infty$).
• Find the limit of a function at a point given the graph of the function.
2.2: Asymptotes
• Explain using both informal language and the language of limits what it means for a function
to have a horizontal or vertical asymptote.
• Given a simple function, find its vertical and horizontal asymptotes by asymptotic reasoning
or by taking limits.
• Explain why it is not true that a function cannot cross its horizontal asymptote.
• Explain informally and formally what it means for a function to be continuous on its domain.
• Determine where a given function is continuous. Use formal notation as well as informal
explanation.
• Given a function defined with parameters, select parameter values that make the function
continuous.
• Given an equation for a line, sketch the line, and identify its slope.
• Describe negative / positive / zero slope as corresponding to a line that is decreasing / increasing
/ constant over an interval.
• Find a line from two points; from a point and a slope; or from a clearly labelled graph.
• Describe the slope of a linear function as the rate of change of that function (change in y over
change in x).
• Explain using words, pictures, and the language of limits what a derivative is.
• Use the definition of derivative to find the tangent line to a function at a given point.
• Explain why the definition of a derivative is important, even if you know shortcuts for
computation.
• Use the definition of the derivative to show that the derivative of the function $f(x) = a^x$ (where
a is a positive constant) is a constant times $a^x$.
• Note the useful modelling power of a function whose derivative is proportional to itself.
• Use counterexamples to demonstrate that certain statements about derivatives are false.
• Demonstrate the Power Rule for integer exponents using the limit definition of derivative.
• Use the generalized product rule to compute the derivative of products of many functions.
• Use implicit differentiation to find slopes of tangent lines to implicitly defined curves.
• Evaluate (at nice points) the inverse trigonometric functions arcsin(x), arccos(x) and arctan(x).
• Use implicit differentiation / chain rule to find the derivatives of the inverse trigonometric
functions arcsin(x), arccos(x) and arctan(x).
• Recognize the two types of indeterminate forms where L’Hôpital’s rule is directly applicable.
• Sketch a function using information from precalculus (limits, intercepts) and the first derivative
• Efficiently find signs of factored functions by determining where the signs change.
• Explain how information about the graph of a function may be extracted from the function, its
derivative and its second derivative.
• Sketch the graph of a function f (x) using the function, its derivative and its second derivative.
• Sketch the graph of a function using characteristics determined from the function and its
derivatives, without scaffolding from an external source.
Chapter 8: Optimization
• Determine the critical and singular points of a function.
• Identify local extrema of a function.
• Find the global extrema of a function on a closed interval.
• Explain how the algorithm can be used in optimization problems. (Note that finding a critical
point is not enough to identify an extremum.)
• Convert geometric information into a function optimization problem.
• Interpret model optimization problems based on real-world examples according to their
context.
• Identify solutions to simple differential equations (of the form $y' = ay$) and interpret them in
context.
• Given an initial condition, find a particular solution that satisfies a differential equation.
• Explain how a differential equation may be solved computationally using linear approximations.
That is, explain how Euler’s method works.
• Explain what each term represents in the formula for Euler’s method.
• Examine and compare computational (numerical) and exact (analytical) solutions to differential
equations.
• Use Euler’s method to solve a differential equation by hand (small number of steps).
• Interpret slope fields for a given differential equation and use them to roughly sketch solutions.
• Sketch a state-space diagram for a given differential equation and use it to describe the
behaviour of solutions.
• Explain what it means for a steady-state solution to be “stable”. Determine the stability of a
steady state.
• Label points on the x-y-z axes and identify basic planes of constant x, y, or z.
• Given a simple function of two variables, z = f (x, y), evaluate z values for given pairs (x, y).
• Compute the second order partial derivatives given a function of two variables.
• State without proof that the mixed partials should be equal for “nice” functions.
• Define critical point and singular point for a function of two variables.
• Compute the critical points and singular points of a given function of two variables.
• State (without proof) that extreme values of a continuous multivariable function will occur at
critical or singular points.
• Use the second derivative test to classify critical points as either local maximums, local
minimums, or saddle points.
• Reduce a constrained optimization problem in 3D, where the constraint is a single function
(possibly with endpoints, possibly not), to a single-variable calculus problem.
• Understand that the global extrema of a two-variable function over a closed region occur along
the boundary and/or at critical points of the interior.
• Find the extreme values for a function of two variables on a closed region in cases where
optimization on the boundary can be reduced to a single-variable calculus problem.
• Choose between the method of Lagrange multipliers, and simple plugging in, for determining
extrema along a constraint.
• Find the absolute extrema of a surface over a closed region, using the appropriate method
(Lagrange or plugging in) for investigating the boundary.