
UBC Math 100: Differential Calculus

Textbook for AY 2024/25

This text is intended for UBC Math 100. It consists primarily of content drawn from three
open-source textbooks:

• CLP-1 Differential Calculus by Joel Feldman, Andrew Rechnitzer, and Elyse Yeager
Copyright © 2016–24 CC-BY-NC-SA 4.0

• Differential Calculus for the Life Sciences by Leah Edelstein-Keshet
Copyright © 2020 CC-BY-NC-SA 4.0

• Optimal, Integral, Likely prepared by Bruno Belevan, Parham Hamidi, Nisha Malhotra, and
Elyse Yeager Copyright © 2020-21 CC-BY-NC-SA, which is itself largely based on
CLP-3 Multivariable Calculus by Joel Feldman, Andrew Rechnitzer, and Elyse Yeager
Copyright © 2016–2024 CC-BY-NC-SA 4.0

This document was typeset on Sunday 8th September, 2024.


§§ Licenses and attributions
Copyright ©2024 Kelly Paton, Elyse Yeager
Cover art: Bulbous structure with shadows by Simone Hutsch is reproduced with a Royalty-Free
Commercial License. It is not creative commons.
Except where otherwise noted, this work is licensed under the Creative Commons Attribution-
NonCommercial-ShareAlike 4.0 International License. You can view a copy of the license at
https://creativecommons.org/licenses/by-nc-sa/4.0/.

The public-facing webpage for the project is https://personal.math.ubc.ca/~elyse/Math100Text/.
Source files can be found at the repository on gitlab (public version pending).
The creation of this resource was supported by a UBC OER Grant.
This text contains new material as well as material adapted from open sources.

• Chapter 0 is original content.

• Chapter 1 is adapted from Keshet, chapter 1.

• Chapter 2 is adapted from CLP and Keshet, with new content.

– The introduction is adapted from Keshet Appendix D.


– 2.1 is adapted from CLP section 1.3.
– 2.1.1 is adapted from CLP section 1.4.
– 2.1.2 is adapted from CLP section 1.5.
– 2.2 is new content.
– 2.3 is adapted from CLP section 1.6.

• Chapter 3 is adapted from CLP and Keshet, with new content.

– Section 3.1 is new content.


– Section 3.2 is adapted from Keshet Ch 2 and CLP 2.1, with new content.
– Subsection 3.3.1 is adapted from CLP section 2.1.
– Subsection 3.3.2 is adapted from CLP sections 2.1-2.3.
– Section 3.4 is adapted from CLP section 2.14.
– Section 3.5 is adapted from CLP section 2.7 and Keshet chapter 10.

• Chapter 4 is adapted from CLP.

– Section 4.1 is adapted from section 2.4.


– The unnumbered section “Proofs of the arithmetic of derivatives” (starting on page 116)
is adapted from section 1.9.
– The unnumbered section “Using the arithmetic of derivatives – examples” (starting on page
118 in the textbook) is adapted from section 2.6.
– Section 4.2 is adapted from section 2.8.
– Section 4.3 is adapted from section 2.9.
– Section 4.4 is adapted from section 2.10.
– Section 4.5 is adapted from section 2.11.
– Section 4.6 is adapted from sections 0.6 and 2.12.
– The introduction to Section 4.7 is adapted from the introduction to chapter 3. The rest
of Section 4.7 is adapted from section 2.12.

• Chapter 5 is adapted from CLP section 3.2.

• Chapter 6 is adapted from CLP section 3.7.

• Chapter 7 is adapted from CLP section 3.6.

• Chapter 8 is adapted from both CLP and Keshet.

– The introduction and Sections 8.1, 8.2, and 8.3 are adapted from CLP section 3.5.
– Section 8.4 is adapted from Keshet, chapter 7.

• Chapter 9 is adapted from CLP section 3.4.

• Chapter 10 is adapted from CLP appendix C.1.

• Chapter 11 is adapted from Keshet chapter 11.

• Chapter 12 is adapted from Keshet chapter 12.

• Chapter 13 is adapted from Keshet chapter 13.

• Chapter 14 is adapted from OIL Chapter 1, which is itself based on CLP–3 Chapter 1; except
subsection 14.1.1, which was Appendix A.1 in OIL, and does not appear in CLP–3.

• Chapter 15 is adapted from OIL Sections 2.1-2.2, which are based on CLP–2 chapter 2.

• Chapter 16 is adapted from OIL Sections 2.3-2.5, which are based on CLP–3 chapter 2.

CONTENTS

0 Introduction
0.1 About this book
0.1.1 Origins
0.1.2 Learning objectives
0.1.3 Flavours
0.2 Writing mathematics

Pre-calculus

1 Power functions as building blocks
1.1 Power functions
1.2 First steps in graph sketching
1.3 Rate of reaction
1.4 (optional) Predator response
1.5 Familiar functions

2 Limits
2.1 Quick review of limits
2.1.1 Calculating limits with limit laws
2.1.2 Limits at infinity
2.2 Asymptotes
2.3 Limits and continuity

Differentiation

3 Introduction to the Derivative
3.1 Review: lines
3.1.1 Equations and sketches
3.1.2 Different equation forms
3.1.3 Slopes at different points
3.2 Slopes and rates of change
3.2.1 Lines and rate of change
3.2.2 Nonlinear functions and average rates of change
3.3 The Derivative
3.3.1 Slope at a point
3.3.2 Definition of the derivative (1)
3.3.3 Tangent lines and linear approximations
3.3.4 Definition of the derivative (2)
3.3.5 Instantaneous rate of change
3.4 Higher order derivatives
3.5 Derivatives of exponential functions

4 Computing Derivatives
4.1 Arithmetic of derivatives
4.2 Trigonometric functions and their derivatives
4.3 The chain rule
4.4 Logarithmic differentiation
4.5 Implicit differentiation
4.6 Inverse functions
4.7 Inverse trig functions and their derivatives

Applications of Differentiation

5 Related Rates

6 L’Hôpital’s Rule and Indeterminate Forms
6.1 Standard examples
6.2 Variations
6.3 (optional) Even more variations

7 Sketching Graphs
7.1 Domain, intercepts and asymptotes
7.2 First derivative: increasing or decreasing
7.3 Second derivative: concavity
7.4 (optional) Symmetries
7.5 A checklist for sketching
7.6 Sketching examples

8 Optimization
8.1 Local and global maxima and minima
8.2 Finding global maxima and minima
8.3 Max/min examples
8.4 Sample optimization problems
8.4.1 Density dependent (logistic) growth in a population
8.4.2 Wine for Kepler’s wedding
8.4.3 Optimal foraging
8.5 Summary

9 Approximating Functions Near a Specified Point: Taylor Polynomials
9.1 Zeroth approximation: the constant approximation
9.2 Linear approximation
9.3 Quadratic approximation
9.4 Still better approximations: Taylor polynomials
9.5 Some examples
9.6 (Flavour A) Error in Taylor Polynomials
9.7 (Optional) Derivation of the error formulae

10 (Flavour A) Newton’s Method

Differential Equations

11 (Flavours A, B) Introduction to Differential Equations
11.1 Introducing a new kind of equation
11.2 Differential equation for unlimited population growth
11.3 Radioactive decay
11.4 Summary

12 (Flavours A, B) Solving differential equations
12.1 Verifying that a function is a solution
12.2 Equations of the form $y'(t) = a - by$
12.3 Euler’s method and numerical solutions
12.4 Summary

13 (Flavours A, B) Qualitative methods for differential equations
13.1 Linear and nonlinear differential equations
13.2 The geometry of change
13.3 (Flavour B) State-space diagrams
13.4 Applying qualitative analysis to biological models
13.5 Summary

Application to Multivariable Functions

14 (Flavour C) Geometry in Three Dimensions
14.1 Points and planes
14.1.1 (optional) Folding the first octant of $\mathbb{R}^3$
14.2 Functions of two variables
14.3 Sketching surfaces in 3D

15 (Flavour C) Partial Derivatives
15.1 Partial derivatives
15.2 Higher order derivatives

16 (Flavour C) Optimization of Multivariable Functions
16.1 Local maximum and minimum values
16.1.1 Critical points
16.1.2 Classifying critical points
16.2 Absolute minima and maxima
16.3 Lagrange multipliers
16.3.1 Motivation for the method
16.3.2 Using the method
16.3.3 Bounded vs unbounded Constraints

A List of learning objectives

Chapter 0

INTRODUCTION

Welcome to UBC Math 100, Differential Calculus.

0.1 About this book


0.1.1 §§ Origins
In previous years, Math 100 students were directed to different chapters in three different books.
This document is, more-or-less, the relevant parts of those three texts stapled together, with some
added content where needed.
You may notice differences from section to section in formatting and tone. In the practice
book, some questions have solutions and some do not, because the questions come from different
source materials. In particular, textbooks that omit solutions generally do so for pedagogical
reasons; if we were to publish solutions to their questions here, it would compromise the integrity of
their work.
We are making efforts to ensure that the content here matches the Math 100 learning goals
exactly, but it is a work in progress.

0.1.2 §§ Learning objectives


The learning objectives for the class are included in this text, in grey boxes. Generally, these are
printed at the start of a subsection. You can also see a printout of objectives for the entire course in
Appendix A.
There are some overarching learning objectives for this course that are not tied to any particular
section. These are given below.

Learning Objectives
• Solve a long question by breaking it up into smaller pieces.

• Apply mathematical concepts to models of physical processes.

• Apply concepts creatively to unfamiliar contexts.


• Be able to clearly and effectively communicate mathematical content in prose.

• Understand some basic ideas about what constitutes a proof in mathematics; understand
the differences between how something is defined and how it is computed.

• Correctly and appropriately manipulate algebraic and trigonometric expressions:
simplification, solving, etc.

0.1.3 §§ Flavours
Math 100 has three flavours. All of them use this document. Some content is shared between all
flavours, and some is not.
Content that isn’t shared by all flavours is marked. For example, Chapter 14 has “Flavour C” in
its title, and a coloured bar running along the left margin. If you’re in Flavour C, this content will be
covered in class and homework, and is examinable. If you’re in flavours A or B, this chapter won’t
show up in class or on exams. You may wish to self-study content from other flavours (especially
smaller pieces of content, like a single example) for personal interest, or to deepen your familiarity
with common concepts, but doing so is purely optional.
Sometimes content looks a lot like one flavour (for example, an exercise about finding interest
on an investment would look like Flavour C) but uses mathematics common to all flavours. These
are generally not marked as flavoured, since the content may be helpful for all flavours.

0.2 Writing mathematics


Writing answers to questions in mathematics is much less about simply getting the correct result
somewhere on the page, and much more about communicating how you arrived at that result. As in
any discipline, the type of communication required will vary depending on the circumstances. Think
about the difference between writing yourself jot notes in English class, versus writing a formal
English essay. Both types of writing need to make sense, but they have different expectations in
terms of style, presentation, and level of detail.
If you are writing informally (e.g., solving an exam question under time pressure, or solving a
problem on a worksheet), your written mathematics should follow these basic guidelines:

• Use the symbols given in the problem statement (e.g., x). If you need to introduce a new
symbol, clearly define it (e.g., “let y = length of desk”, or label it on a diagram).

• Make sure your statements are unambiguous. In general, the standalone fragment “= 5” is
incomplete and unclear (what is it that “equals five”?), but “x = 5” is a complete mathematical
statement (as long as your reader knows what x represents).

• Be particularly careful with ambiguity and clarity when doing multi-line calculations. For
instance, although it is reasonable to not repeat the left-hand side of an equality if it remains
the same line by line, e.g.

\[
\begin{aligned}
f(x) &= (x + 5)(x - 2) \\
     &= x^2 + 3x - 10,
\end{aligned}
\]

ensure you do not neglect the left-hand side if it changes:

\[
\begin{aligned}
f(x) &= (x + 5)(x - 2) \\
     &= x^2 + 3x - 10 \\
2f(x) &= 2x^2 + 6x - 20.
\end{aligned}
\]

• Use English words to explain any reasoning that is not captured by your mathematical notation.
• If you perform multiple different calculations within one solution, label them (e.g., “finding
critical pts”, “finding minimum”).
• If you make any assumptions, state them.
If you are writing a solution more formally, such as for a written assignment, then you should follow
the above guidelines, and additionally:
• Explain what you are doing at each step. For instance, are you taking a derivative of a function
$f(x)$? Write it: “Taking the derivative of $f(x)$ reveals $f'(x) = \ldots$”.
• Be clear. Instead of “It is nonzero because...”, write “The function is nonzero because...”.
• Use appropriate spelling, grammar, and punctuation – including ending sentences with a
period – even when you are using mathematical notation. Although “x = 5” might suffice on
an exam, on an assignment you would write something like “We get x = 5.”, complete with
period.
• Format your mathematics appropriately. For instance, the exponential function should look
like eˣ: italicized, with the x as a superscript. In LaTeX, this would be typed in math mode as
“$e^x$.” In Word or Google Docs, you can use the equation editor, or manually format the
expression in italics and superscript the x.
• Ensure your entire solution is readable from start to finish, like a paragraph or an essay (instead
of like jot notes, which would suffice for an informal solution). You can test this by reading
the entire solution out loud to someone (even yourself). A readable solution almost always
requires including additional English sentences to explain what you are doing, and it often
involves enclosing your mathematical equations within English sentences.
Following these guidelines, one way to more formally write the informal calculation shown above is:
“Consider the function $f(x) = (x + 5)(x - 2)$, which expands to $f(x) = x^2 + 3x - 10$. We multiply
$f$ by 2 to get $2f(x) = 2x^2 + 6x - 20$.” There are of course many ways to write this formally; this is
just an example. Many solutions in this textbook are presented as an indented chunk
of properly formatted equations surrounded by English sentences; for instance:
“Consider the function
\[
\begin{aligned}
f(x) &= (x + 5)(x - 2) \\
     &= x^2 + 3x - 10. && \text{(expanded)}
\end{aligned}
\]
Multiply by two to get:
\[ 2f(x) = 2x^2 + 6x - 20. \]”
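For readers who typeset their work in LaTeX, the indented example above might be typed roughly as follows. This is only a sketch: it assumes the amsmath package (for the align* environment), which this text does not itself prescribe, and other layouts are equally acceptable.

```latex
\documentclass{article}
\usepackage{amsmath} % assumed here; provides align* and \text
\begin{document}
Consider the function
\begin{align*}
  f(x) &= (x + 5)(x - 2) \\
       &= x^2 + 3x - 10. && \text{(expanded)}
\end{align*}
Multiply by two to get:
\begin{align*}
  2f(x) &= 2x^2 + 6x - 20.
\end{align*}
\end{document}
```

Note that the left-hand side is written out again once it changes from $f(x)$ to $2f(x)$, and that each displayed equation ends with the punctuation of the sentence it belongs to.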

Chapter 1

POWER FUNCTIONS AS BUILDING BLOCKS

Like tall architectural marvels that are made of simple units (beams, bricks, and tiles), many
interesting functions can be constructed from simpler building blocks. In this chapter, we study a
family of simple functions, the power functions: those of the form $f(x) = x^n$.
Our first task is to understand properties of the members of this “family”. We will see that basic
observations of power functions such as $x^2$ and $x^3$ lead to insights into significant considerations such as
the sustainability of life on planet Earth (for example). Later, we use power functions as “building
blocks” to construct polynomials and rational functions.¹ We then develop important approaches to
sketch the shapes of the resulting graphs.

1.1 Power functions

Learning Objectives
• Sketch functions of the form $f(x) = x^n$, where $n$ is a real number (power functions);
interpret the shapes of power functions relative to one another.

• Determine which term in a polynomial function will dominate for small x and for large
x.

Let us consider the power functions, that is, functions of the form

\[ y = f(x) = x^n, \]

where $n$ is a real number. Power functions are among the most elementary and “elegant” functions:
for a positive integer power $n$, we only need multiplications to compute their value at any point.
They are thus easy to calculate, very predictable and smooth, and, from the point of view of calculus,
very easy to handle.

¹ Now would be a good time to check in with your understanding of these terms. Can you define function? Can you
give an example of a polynomial function? What about an example of a rational function?


Click on this link and then adjust the slider on this interactive Desmos graph to see how the
power $n$ affects the shape of a power function in the first quadrant.

From Figure 1.1, we see that the power functions ($y = x^n$ for powers $n = 2, \ldots, 5$) intersect² at $x = 0$
and $x = 1$. This is true for all positive integer powers. The same figure also demonstrates another
fact helpful for curve-sketching: the greater the power $n$, the flatter the graph near the origin and
the steeper the graph for $x > 1$. This can be restated in terms of the relative size of the power
functions. We say that close to the origin, the functions with lower powers dominate, while far from
the origin, the higher powers dominate.

Figure 1.1: Graphs of a few power functions $y = x^n$ (here $n = 2, 3, 4, 5$, plotted for $0 \le x \le 1.4$). All
intersect at $x = 0$ and $x = 1$. As the power $n$ increases, the graphs become flatter close to the
origin, $(0, 0)$, and steeper at large $x$-values.
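The ordering described above can be checked numerically. The following Python sketch is an illustration only: the sample points 0.5 and 1.2 and the helper names are choices made here, not part of the text. It evaluates $x^n$ for $n = 2, \ldots, 5$ on either side of $x = 1$ and reports which power gives the largest value.

```python
def power_values(x, powers=(2, 3, 4, 5)):
    """Return a dictionary {n: x**n} for each power n."""
    return {n: x ** n for n in powers}


def dominant_power(values):
    """Return the power n whose value x**n is largest."""
    return max(values, key=values.get)


# Sample points chosen for illustration: one inside (0, 1), one beyond x = 1.
near_origin = power_values(0.5)
beyond_one = power_values(1.2)

for label, values in (("x = 0.5", near_origin), ("x = 1.2", beyond_one)):
    rounded = {n: round(v, 5) for n, v in values.items()}
    print(label, rounded, "largest power term: n =", dominant_power(values))
```

For $0 < x < 1$ the lowest power gives the largest value, while for $x > 1$ the highest power does; at $x = 1$ every power function equals 1.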

More generally, a power function has the form

\[ y = f(x) = K \cdot x^n, \]

where $n$ is a real number and $K$, sometimes called the coefficient, is a constant.

So far, we have compared power functions whose coefficient is $K = 1$. We can extend our
discussion to a more general case as well.
Example 1.1.1
Find points of intersection and compare the sizes of the two power functions

\[ y_1 = a x^n \quad \text{and} \quad y_2 = b x^m, \]

where $a$ and $b$ are constants. You may assume that both $a$ and $b$ are positive.
This comparison is a slight generalization of the previous discussion. First, we note that the
coefficients $a$ and $b$ merely scale the vertical behaviour (i.e. stretch the graph along the $y$-axis). It is
still true that the two functions intersect at $x = 0$; further, as before, the higher the power, the flatter
the graph close to $x = 0$, and the steeper for large positive or negative values of $x$. However, now
another point of intersection of the graphs occurs when

\[ a x^n = b x^m \implies x^{n-m} = b/a. \]

Figure 1.2: Graphs of two power functions, $y = 5x^2$ and $y = 2x^3$, plotted for $0 \le x \le 4$.

We can solve this further to obtain a solution in the first quadrant:³

\[ x = (b/a)^{1/(n-m)}. \tag{1.1.1} \]

This is shown in Figure 1.2 for the specific example of $y_1 = 5x^2$, $y_2 = 2x^3$. Close to the origin,
the quadratic power function has a larger value, whereas for large $x$, the cubic function has larger
values. The functions intersect when $5x^2 = 2x^3$, which holds for $x = 0$ or $x = 5/2 = 2.5$. ♦
If $b/a$ is positive, then in general the value given in (1.1.1) is a real number.
Example 1.1.1
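Formula (1.1.1) is easy to evaluate and to sanity-check by substitution. The Python sketch below is illustrative only; the function name is a choice made here, and it assumes, as in the example, that $a$ and $b$ are positive and that $n \ne m$.

```python
def first_quadrant_intersection(a, n, b, m):
    """Positive solution of a*x**n == b*x**m, using x = (b/a)**(1/(n - m)).

    Assumes a > 0, b > 0 and n != m, as in Example 1.1.1.
    """
    if a <= 0 or b <= 0:
        raise ValueError("a and b are assumed to be positive")
    if n == m:
        raise ValueError("formula (1.1.1) needs different powers n and m")
    return (b / a) ** (1.0 / (n - m))


# The pair from Figure 1.2: y1 = 5x^2 and y2 = 2x^3.
x_star = first_quadrant_intersection(5, 2, 2, 3)
print("intersection at x =", x_star)

# Substitute back into both functions to confirm they agree there.
print("y1 =", 5 * x_star ** 2, " y2 =", 2 * x_star ** 3)
```

Substituting the computed value back into both functions, as in the last line, is a quick way to catch an algebra slip such as swapping $a$ and $b$.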

Example 1.1.2
Determine points of intersection for the following pairs of functions:

(a) $y_1 = 3x^4$ and $y_2 = 27x^2$,

(b) $y_1 = \frac{4}{3}\pi x^3$ and $y_2 = 4\pi x^2$.

Following the steps outlined above in Example 1.1.1 (calculations not shown in detail here;
this is a good place for you to try the calculations yourself), we find the following intersections:

(a) Intersections occur at $x = 0$ and at $x = \pm(27/3)^{1/(4-2)} = \pm\sqrt{9} = \pm 3$.

(b) These functions intersect at $x = 0$ and $x = 3$; there are no intersections at negative values of $x$.

Example 1.1.2

² How comfortable are you with interpreting graphs? Check in: use Figure 1.1 to approximate when $x^5 = 2$.
³ Another good check-in point: If asked to draw a solution in the first quadrant, you should know that this means the
upper right-hand corner of the graph, which is where both $x$ and $y$ are positive.


Note that in many cases, the points of intersection are irrational numbers⁴ whose decimal
approximations can only be obtained by a scientific calculator or by some approximation method
(such as Newton’s Method, studied in Chapter 10).
With only these observations we can examine the issue of energy balance and the sustainability
of life on Earth — as seen next.

§§ Sustainability and energy balance on Earth


The sustainability of life on planet Earth depends on a fine balance between the temperature of its
oceans and land masses and the ability of life forms to tolerate climate change. We introduce a
simple energy balance model to track incoming and outgoing energy and determine a rough estimate
for the Earth’s temperature. We use the following basic assumptions:

1. Energy input from the sun, given the Earth’s radius $r$, can be approximated as⁵

\[ E_{\text{in}} = (1 - a) S \pi r^2, \tag{1.1.2} \]

where $S$ is incoming radiation energy per unit area (also called the solar constant), and
$0 \le a \le 1$ is the fraction of that energy reflected; $a$ is also called the albedo, and depends on
cloud cover and other planet characteristics (such as percent forest, snow, desert, and ocean).

2. Energy lost from Earth due to radiation into space depends on the current temperature of the
Earth $T$, and is approximated as

\[ E_{\text{out}} = 4\pi r^2 \varepsilon \sigma T^4, \tag{1.1.3} \]

where $\varepsilon$ is the emissivity of the Earth’s atmosphere, which represents the Earth’s tendency
to emit radiation energy. This constant depends on cloud cover, water vapour, as well as
on greenhouse gas concentration in the atmosphere; $\sigma$ is a physical constant (the
Stefan-Boltzmann constant) which is fixed for the purpose of our discussion.

Notice there are several different symbols in Eqns. (1.1.2) and (1.1.3). Being clear about which
are constants and which are variables is critical to using any mathematical model. As the next
example points out, sometimes you have a choice to make.
Example 1.1.3 (Energy expressions are power functions)
Explain in what sense the two forms of energy above can be viewed as power functions, and what
types of power functions they represent.
Both $E_{\text{in}}$ and $E_{\text{out}}$ depend on Earth’s radius as the power $\sim r^2$. Since this radius is a
constant, however, it is not fruitful to consider it as an interesting variable for this problem. We do note
that $E_{\text{out}}$ depends on temperature as $\sim T^4$. (We might also select the albedo as a variable and in that
case, we note that $E_{\text{in}}$ depends linearly on the albedo $a$.)

⁴ As a reminder, an irrational number is a real number that cannot be expressed as a ratio of integers. Classic
examples are $\sqrt{2}$ and $\pi$.
⁵ Take a close look at the formula for $E_{\text{in}}$ in equation (1.1.2). Do you think $E_{\text{in}}$ is proportional to Earth’s surface area,
or its volume? If you’re stuck, consider the formulas for the surface area and the volume of a sphere.


Example 1.1.3

Example 1.1.4 (Energy equilibrium for the Earth)


Explain how the assumptions above can be used to determine the equilibrium temperature of the
Earth, that is, the temperature at which the incoming and outgoing radiation energies are balanced.
The Earth is at equilibrium when

\[ E_{\text{in}} = E_{\text{out}} \implies (1 - a) S \pi r^2 = 4\pi r^2 \varepsilon \sigma T^4. \]

We observe that the factors $\pi r^2$ cancel, and we can obtain an equation that can be solved for the
temperature $T$...
...this is left for you (the reader) to finish! Once you have the answer ($T = \ldots$) it is additionally
instructive to examine how this temperature depends on the constants in the problem, and how it is
affected by cloud cover and greenhouse gas level.
Example 1.1.4
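Once you have done the algebra, a numerical check is a useful companion. The Python sketch below locates the balance point of equations (1.1.2) and (1.1.3) by bisection, without using the closed-form answer. All numerical values are assumptions chosen for illustration (a commonly quoted solar constant, an albedo of 0.3 and an emissivity of 0.6); none of them come from this text, and the function names are hypothetical.

```python
# Illustrative values only; none of these numbers appear in the text.
S = 1361.0               # solar constant in W/m^2 (commonly quoted approximate value)
ALBEDO = 0.3             # assumed fraction of incoming energy reflected
EMISSIVITY = 0.6         # assumed effective emissivity
SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant in W/(m^2 K^4)


def net_energy(T, a=ALBEDO, eps=EMISSIVITY):
    """(E_in - E_out) divided by pi*r^2; positive means a net energy gain."""
    return (1.0 - a) * S - 4.0 * eps * SIGMA * T ** 4


def equilibrium_temperature(lo=1.0, hi=1000.0, tol=1e-9, **params):
    """Bisection on net_energy, which decreases as T increases."""
    if net_energy(lo, **params) <= 0 or net_energy(hi, **params) >= 0:
        raise ValueError("the bracket [lo, hi] does not contain the equilibrium")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_energy(mid, **params) > 0:
            lo = mid   # still gaining energy: equilibrium is warmer
        else:
            hi = mid   # losing energy: equilibrium is cooler
    return 0.5 * (lo + hi)


T_eq = equilibrium_temperature()
print(f"equilibrium temperature with the assumed values: {T_eq:.1f} K")
print(f"with higher albedo (a = 0.4): {equilibrium_temperature(a=0.4):.1f} K")
```

Changing the assumed albedo or emissivity and rerunning gives a feel for how cloud cover and greenhouse gas levels shift the equilibrium, which you can compare against the formula you derive by hand.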

1.2 First steps in graph sketching

Learning Objectives
• Sketch two-term polynomial functions by determining which term dominates for small
$x$ and for large $x$. For example, sketch $f(x) = x^2 - 3x^4$.

§§ Even and odd power functions


So far, we have considered power functions $y = x^n$ with $x > 0$. But in mathematical generality, there
is no reason to restrict the independent variable $x$ to positive values. Thus we expand the discussion
to consider all real values of $x$. We examine now some symmetry properties that arise.
In Figure 1.3(a) we see that power functions with an even power, such as $y = x^2$, $y = x^4$,
and $y = x^6$, are symmetric about the $y$-axis. In Figure 1.3(b) we notice that power functions with an
odd power, such as $y = x$, $y = x^3$ and $y = x^5$, are symmetric when rotated $180^\circ$ about the origin.
We adopt the terms even function and odd function to describe such symmetry properties. More
formally, for all $x$ in the domain of $f$,

\[ f(-x) = f(x) \implies f \text{ is an even function}, \]

\[ f(-x) = -f(x) \implies f \text{ is an odd function}. \]

Many functions are not symmetric at all, and are neither even nor odd.

Figure 1.3: Graphs of power functions, plotted for $-1.5 \le x \le 1.5$. (a) A few even power functions:
$y = x^2$, $y = x^4$ and $y = x^6$. (b) Some odd power functions: $y = x$, $y = x^3$ and $y = x^5$. Note the
symmetry properties.

Adjust the slider to see how the even and odd power functions behave as their power
increases.

Example 1.2.1
Show that the function $y = g(x) = x^2 - 3x^4$ is an even function.
For $g$ to be an even function, it should satisfy $g(-x) = g(x)$. Let us calculate $g(-x)$ and see if
this requirement holds. We find that

\[ g(-x) = (-x)^2 - 3(-x)^4 = x^2 - 3x^4 = g(x). \]

Here we have used the fact that $(-x)^n = (-1)^n x^n$, and that when $n$ is even, $(-1)^n = 1$.
Example 1.2.1
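A quick numerical spot-check can support (but never replace) an algebraic argument like the one in Example 1.2.1. The Python sketch below compares $f(-x)$ with $\pm f(x)$ at a handful of sample points; the helper names, the sample grid and the function `cube_plus_one` are hypothetical choices made for illustration.

```python
def is_even(f, xs, tol=1e-9):
    """True if f(-x) == f(x), within tol, at every sample point."""
    return all(abs(f(-x) - f(x)) <= tol for x in xs)


def is_odd(f, xs, tol=1e-9):
    """True if f(-x) == -f(x), within tol, at every sample point."""
    return all(abs(f(-x) + f(x)) <= tol for x in xs)


def g(x):
    """The function from Example 1.2.1."""
    return x ** 2 - 3 * x ** 4


def cube_plus_one(x):
    """A hypothetical function that is neither even nor odd."""
    return x ** 3 + 1


samples = [0.1 * k for k in range(1, 21)]  # x = 0.1, 0.2, ..., 2.0

print("g even?", is_even(g, samples), " g odd?", is_odd(g, samples))
print("x^3 + 1 even?", is_even(cube_plus_one, samples),
      " odd?", is_odd(cube_plus_one, samples))
```

Passing the check at finitely many points is only evidence of symmetry; a single failing point, on the other hand, is enough to show that a function is not even (or not odd).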

All power functions with positive integer powers are continuous and unbounded: for $x \to \infty$ both
even and odd power functions satisfy $y = x^n \to \infty$. For $x \to -\infty$, odd power functions tend to
$-\infty$, while even power functions tend to $\infty$. Odd power functions
are one-to-one: that is, each value of $y$ is obtained from a unique value of $x$ and vice versa. This is
not true for even power functions. From Fig 1.3 we see that all power functions go through the point
$(0, 0)$. Even power functions have a local minimum at the origin whereas odd power functions do
not.

Definition 1.2.2 (Local Minimum).

A local minimum of a function $f(x)$ is a point $x_{\min}$ such that the value of $f$ is larger at all
sufficiently close points. Formally, $f(x_{\min} \pm \varepsilon) > f(x_{\min})$ for all sufficiently small $\varepsilon > 0$.
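The inequality in Definition 1.2.2 can be probed numerically. The following Python sketch tests it for a few small values of $\varepsilon$; the function name and the particular $\varepsilon$ values are assumptions made here, and testing finitely many values can suggest, but not prove, that a point is a local minimum.

```python
def looks_like_local_min(f, x0, eps_values=(1e-1, 1e-2, 1e-3)):
    """Check f(x0 + eps) > f(x0) and f(x0 - eps) > f(x0) for each sample eps."""
    return all(f(x0 + eps) > f(x0) and f(x0 - eps) > f(x0) for eps in eps_values)


def square(x):
    return x ** 2


def cube(x):
    return x ** 3


print("x^2 at 0:", looks_like_local_min(square, 0.0))
print("x^3 at 0:", looks_like_local_min(cube, 0.0))
```

The even power $x^2$ passes the test at the origin, while the odd power $x^3$ fails because its values just to the left of 0 are negative, in line with the statement that odd power functions have no local minimum there.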


Concept Check-In
(Grey boxes labelled “Concept Check-In” — containing questions or prompts that encourage you to check in on
your knowledge or comfort with concepts — like this one, are offered occasionally in select chapters.)

1. Highlight the y-axes and circle the origins in Fig 1.3.

2. Consider Figure 1.3: where do even power functions intersect? Odd?

3. Show that $f(a) = a^5 - 3a$ is an odd function.

4. Give an example of a function which is bounded.

5. Verify $y = x^2$ is not one-to-one.

6. What graphical property do one-to-one functions share?

§§ Sketching a simple (two-term) polynomial


Based on our familiarity with power functions, we now discuss functions made up of such com-
ponents. In particular, we extend the discussion to polynomials (sums of power functions) and
rational functions (ratios of such functions). We also develop skills in sketching graphs of these
functions.
Example 1.2.3 (Sketching a simple cubic polynomial)
Sketch a graph of the polynomial
\[ y = p(x) = x^3 + ax. \tag{1.2.1} \]
How would the sketch change if the constant a changes from positive to negative?

Adjust the slider to see how positive and negative values of the coefficient a affect the shape
of this simple polynomial.

The polynomial in Eqn. (1.2.1) has two terms, each one a power function. Let us consider their
effects individually. Near the origin, for $x \approx 0$, the term $ax$ dominates so that, close to $x = 0$, the
function behaves as
\[ y \approx ax. \]
This is a straight line with slope a. Hence, near the origin, if $a > 0$ we would see a line with positive
slope, whereas if $a < 0$ the slope of the line should be negative. Far away from the origin, the cubic
term dominates, so
\[ y \approx x^3 \]
at large (positive or negative) x values. Figure 1.4 illustrates these ideas.
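To see the change of dominance in numbers, one can compare the sizes of the two terms at a few values of x. This is a minimal sketch: the helper `term_sizes` and the value a = -2 are assumptions chosen only for illustration.

```python
def term_sizes(x, a):
    """Return (|a*x|, |x**3|): sizes of the linear and cubic terms of x**3 + a*x."""
    return abs(a * x), abs(x**3)


a = -2.0  # hypothetical coefficient, chosen only for illustration
for x in (0.01, 0.1, 1.0, 10.0, 100.0):
    linear, cubic = term_sizes(x, a)
    dominant = "linear" if linear > cubic else "cubic"
    print(f"x = {x:>6}: |ax| = {linear:.3g}, |x^3| = {cubic:.3g} -> {dominant} term dominates")
```

The comparison switches where $|ax| = |x^3|$, that is, at $|x| = \sqrt{|a|}$ for $a \neq 0$.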

Concept Check-In
6. Justify why the linear term dominates near the origin, while the cubic term dominates
further out.

7. Sketch the graph of any function with horizontal asymptote y = 2.

[Figure 1.4: a three-by-three grid of sketches on x-y axes; the columns correspond to $a < 0$, $a = 0$ and $a > 0$.]

Figure 1.4: The graph of the polynomial $y = p(x) = x^3 + ax$ can be obtained by combining its two
power function components. The cubic “arms” $y \approx x^3$ (top row) dominate for large x (far from
the origin), while the linear part $y \approx ax$ (middle row) dominates near the origin. When these are
smoothly connected (bottom row) we obtain a sketch of the desired polynomial. Shown here are
three possibilities, for $a < 0$, $a = 0$, $a > 0$, left to right. The value of a determines the slope of the
curve near $x = 0$ and thus also affects the presence of a local maximum and minimum (for $a < 0$).

In the first row we see the behaviour of $y = p(x) = x^3 + ax$ for large x, in the second for small
x. The last row shows the graph for an intermediate range. We might notice that for $a < 0$, the
graph has a local minimum as well as a local maximum. Such an argument already leads to a fairly
reasonable sketch of the function in Eqn. (1.2.1). We can add further details using algebra to find
zeros, that is, where $y = p(x) = 0$.
Example 1.2.3

Example 1.2.4 (Zeros)


Find the places at which the polynomial in Eqn. (1.2.1) crosses the x axis, that is, find the zeros of the
function $y = x^3 + ax$.
The zeros of the polynomial can be found by setting
\[ y = p(x) = 0 \ \Rightarrow\ x^3 + ax = 0 \ \Rightarrow\ x^3 = -ax. \]


The above equation always has a solution $x = 0$, but if $x \neq 0$, we can cancel and obtain

\[ x^2 = -a. \]

This would have no solutions if a is a positive number, so that in that case, the graph crosses the x
axis only once, at $x = 0$, as shown in Figure 1.4. If a is negative, then the minus signs cancel, so the
equation can be written in the form
\[ x^2 = |a| \]
and we would have two new zeros at
\[ x = \pm\sqrt{|a|}. \]
For example, if $a = -1$ then the function $y = x^3 - x$ has zeros at $x = 0, 1, -1$.

Concept Check-In
8. Find the zeros of $y = x^3 + 3x$.

Example 1.2.4
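The case analysis of Example 1.2.4 can be written as a short function. This is a sketch only; the name `cubic_zeros` is introduced here for illustration and is not part of the text.

```python
import math


def cubic_zeros(a):
    """Real zeros of p(x) = x**3 + a*x, in increasing order.

    Hypothetical helper following the case analysis of Example 1.2.4.
    """
    if a >= 0:
        # x**2 = -a has no nonzero real solution, so x = 0 is the only zero.
        return [0.0]
    root = math.sqrt(-a)  # for a < 0, -a equals |a|
    return [-root, 0.0, root]


print(cubic_zeros(-1.0))  # the case a = -1 discussed in the example
print(cubic_zeros(2.0))   # a positive value of a
```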

Example 1.2.5 (A more general case)


Explain how you would use the ideas of Example 1.2.3 to sketch the polynomial
$y = p(x) = ax^n + bx^m$. Without loss of generality, you may assume that $n > m \geq 1$ are integers.
As in Example 1.2.3, this polynomial has two terms that dominate at different ranges of the
independent variable. Close to the origin, $y \approx bx^m$ (since m is the lower power) whereas for large x,
$y \approx ax^n$. The full behaviour is obtained by smoothly connecting these pieces of the graph. Finding
zeros can refine the graph.
Example 1.2.5

The reasoning used here is an important first step in sketching the graph of a polynomial. In the
ensuing chapters, we apply calculus tools to determine points at which the function attains local
maxima or minima (called critical points), and how it behaves for very large positive or negative
values of x. We also develop specialized methods to find zeros of more complicated functions (using
an approximation technique called Newton’s method—although this is flavour-dependent). That
said, the elementary steps described here remain useful as a quick approach for visualizing the
overall shape of a graph.

§§ (optional) Sketching a simple rational function


We apply similar reasoning to consider the graphs of simple rational functions. A rational function
is a function that can be written as

\[ y = \frac{p_1(x)}{p_2(x)}, \quad \text{where } p_1(x) \text{ and } p_2(x) \text{ are polynomials.} \]


Example 1.2.6 (A rational function)


Sketch the graph of the rational function

\[ y = \frac{Ax^n}{a^n + x^n}, \quad x \geq 0. \tag{1.2.2} \]
What properties of your sketch depend on the power n? What would the graph look like for
n = 1, 2, 3?

Adjust the sliders to see how the values of n, A, and a affect the shape of the rational
function in (1.2.2).

We can break up the process of sketching this function into the following steps:

• The graph of the function in Eqn. (1.2.2) goes through the origin (at x = 0, we see that y = 0).

• For very small x (i.e., $x \ll a$) we can approximate the denominator by the constant term,
$a^n + x^n \approx a^n$ (see footnote 6), since $x^n$ is negligible by comparison, so that

\[ y = \frac{Ax^n}{a^n + x^n} \approx \frac{Ax^n}{a^n} = \frac{A}{a^n}\, x^n \quad \text{for small } x. \]

This means that near the origin, the graph looks like a power function, $y \approx Cx^n$ (where
$C = A/a^n$).

• For large x, i.e. $x \gg a$, we have $a^n + x^n \approx x^n$ since x overtakes and dominates over the
constant a, so that
\[ y = \frac{Ax^n}{a^n + x^n} \approx \frac{Ax^n}{x^n} = A \quad \text{for large } x. \]
This reveals that the graph has a horizontal asymptote $y = A$ at large values of x.

• Since the function behaves like a simple power function close to the origin, we conclude
directly that the higher the value of n, the flatter is its graph near 0. Further, large n means
sharper rise to the eventual asymptote.

The results are displayed in Figure 1.5.
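The two approximations in the steps above can be checked numerically. The sketch below assumes the values A = 3 and a = 1 (the ones quoted for Figure 1.8) and n = 2; the function names `hill` and `small_x_approx` are introduced here for illustration only.

```python
def hill(x, A, a, n):
    """The rational function of Eqn. (1.2.2): A x^n / (a^n + x^n), for x >= 0."""
    return A * x**n / (a**n + x**n)


def small_x_approx(x, A, a, n):
    """Power-function approximation (A / a^n) x^n, intended for x much smaller than a."""
    return (A / a**n) * x**n


A, a, n = 3.0, 1.0, 2  # illustrative values
for x in (0.001, 0.01, 0.1):
    print(f"x = {x}: exact = {hill(x, A, a, n):.6g}, approx = {small_x_approx(x, A, a, n):.6g}")
for x in (10.0, 100.0, 1000.0):
    print(f"x = {x}: exact = {hill(x, A, a, n):.6g}, asymptote A = {A}")
```

As x shrinks, the exact value and the power-function approximation agree more closely; as x grows, the exact value approaches A from below.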


Example 1.2.6

1.3 (optional) Rate of an enzyme-catalyzed reaction


Rational functions introduced in Example 1.2.6 often play a role in biochemistry. Here we discuss
two such examples and the contexts in which they appear. In both cases, we consider the initial rise
of the function as well as its eventual saturation.

6 Although $a^n$ looks complicated, it’s actually just a constant. Do you see why?

[Figure 1.5: three panels titled “Small x”, “Large x” and “Smoothly connected”, each on x-y axes with curves labelled n = 1, n = 2 and n = 3; the value A is marked.]

Figure 1.5: The rational functions of Eqn. (1.2.2) with n = 1, 2, 3 are compared on this graph. Close
to the origin, the function behaves like a power function, whereas for large x there is a horizontal
asymptote at y = A. As n increases, the graph becomes flatter close to the origin, and steeper in its
rise to the asymptote.

§§ Saturation and Michaelis-Menten kinetics


Biochemical reactions are often based on the action of proteins known as enzymes that catalyze
reactions in living cells. Fig. 1.6 depicts an enzyme E binding to its substrate S to form a complex
C. The complex breaks apart into a product, P, and the original enzyme that can act once more.
Substrate is usually plentiful relative to the enzyme.

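[Figure 1.6: reaction scheme $E + S \rightleftharpoons C \rightarrow E + P$, with the arrows labelled by the rate constants $k_1$, $k_{-1}$ and $k_2$.]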

Figure 1.6: An enzyme (catalytic protein) is shown binding to a substrate molecule (circular dot)
and then processing it into a product (star shaped molecule).

In the context of this example, x represents the concentration of substrate in the reaction mixture.
The speed of the reaction, v (namely the rate at which product is formed), depends on x. When
you actually graph the speed of the reaction as a function of the concentration, you see that it is
not linear: Figure 1.7 is typical. This relationship, known as Michaelis-Menten kinetics, has the
mathematical form
\[ \text{speed of reaction} = v = \frac{Kx}{k_n + x}, \tag{1.3.1} \]
where $K, k_n > 0$ are constants specific to the enzyme and the experimental conditions.
Equation (1.3.1) is a rational function. Since x is a concentration, it cannot be negative,
so we restrict attention to $x \geq 0$. The expression in Eqn. (1.3.1) is a special case of the rational
functions explored in Example 1.2.6, where $n = 1$, $A = K$, $a = k_n$. In Figure 1.7, we plot this
function for specific values of $K$, $k_n$. The following observations can be made:
1. The graph of Eqn. (1.3.1) goes through the origin. Indeed, when $x = 0$ we have $v = 0$.
2. Close to the origin, the initial rise of the graph “looks like” a straight line. We can see this by
considering values of x that are much smaller than $k_n$. Then the denominator ($k_n + x$) is well
approximated by the constant $k_n$. Thus, for small x, $v \approx (K/k_n)x$, so that the graph resembles
a straight line through the origin with slope $K/k_n$.

3. For large x, there is a horizontal asymptote. A similar argument for $x \gg k_n$ verifies that v is
approximately constant at large enough x.

[Figure 1.7: plot titled “Michaelis-Menten kinetics” of v against x, with labels “initial rise” and “saturation”; K and K/2 are marked on the v axis and $k_n$ on the x axis.]

Figure 1.7: The graph of reaction speed, v, versus substrate concentration, x, in an enzyme-catalyzed
reaction, as in Eqn. (1.3.1). This behaviour is called Michaelis-Menten kinetics. Note that the graph at
first rises almost like a straight line, but then it curves and approaches a horizontal asymptote. This
graph tells us that the speed of the enzyme cannot exceed some fixed level, i.e. it cannot be faster
than K.

Michaelis-Menten kinetics represents one relationship in which saturation occurs: the speed
of the reaction at first increases as substrate concentration x is raised, but the enzymes saturate and
operate at a fixed constant speed K as more and more substrate is added.
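A short numerical check of the initial rise and the saturation may help. This is a sketch under assumed values: K = 10 and $k_n$ = 2 are hypothetical numbers picked for illustration, and `michaelis_menten` is a name introduced here.

```python
def michaelis_menten(x, K, kn):
    """Reaction speed v = K x / (kn + x) from Eqn. (1.3.1), for x >= 0."""
    return K * x / (kn + x)


K, kn = 10.0, 2.0  # hypothetical constants, for illustration only

# Initial rise: for x much smaller than kn, v is close to the line (K / kn) x.
x_small = 0.001
print(michaelis_menten(x_small, K, kn), (K / kn) * x_small)

# Saturation: v approaches K from below as x grows, but never exceeds it.
for x in (10.0, 1e3, 1e6):
    print(x, michaelis_menten(x, K, kn))

# At x = kn the speed is K / 2, as marked in Figure 1.7.
print(michaelis_menten(kn, K, kn))
```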

         units                      example
x        concentration              “nano Molar”, nM ≡ $10^{-9}$ moles per litre
v        concentration over time    nM min$^{-1}$
$k_n$
K

Table 1.1: Units for Michaelis-Menten kinetics, $v = \frac{Kx}{k_n + x}$. (Incomplete; see Concept Check-In,
below.)

It is worth considering the units in Eqn. (1.3.1). Given that only quantities with identical units
can be added or compared, and that the units of the two sides of the relationship must balance, fill in
Table 1.1.

Concept Check-In
9. Complete Table 1.1.


Featured Problem 1.3.1 (Fish population growth 1)


The Beverton-Holt model relates the number of salmon in a population this year, $N_1$, to the number
of salmon that were present last year, $N_0$, according to the relationship
\[ N_1 = k_1 \frac{N_0}{1 + k_2 N_0}, \quad k_1, k_2 > 0. \tag{1.3.2} \]
Sketch $N_1$ as a function of $N_0$ and explain how the constants $k_1$ and $k_2$ affect the shape of the graph
you obtain. Is there a population level $N_0$ that would be exactly the same from one year to the next?
Are there any restrictions on $k_1$ or $k_2$ for this kind of static (“steady state”) population to be possible?

Featured Problem 1.3.1

§§ Hill functions
The Michaelis-Menten kinetics we discussed above fit into a broader class of Hill functions, which
are rational functions of the form shown in Eqn. (1.2.2) with $n \geq 1$ and $A, a > 0$. This function is
often referred to in the life sciences as a Hill function with coefficient n, (although the “coefficient”
is actually a power in the terminology used in this chapter). Hill functions occur when an enzyme-
catalyzed reaction benefits from cooperativity of a multi-step process. For example, the binding of
the first substrate molecule may enhance the binding of a second.

[Figure 1.8: plot titled “Hill function kinetics” with curves labelled n = 1, n = 2 and n = 3; horizontal axis: chemical concentration, x.]
Figure 1.8: Hill function kinetics, from Eqn. (1.2.2), with A = 3, a = 1 and Hill coefficient n = 1, 2, 3.
See also Fig 1.5 for an analysis of the shape of this graph.

Michaelis-Menten kinetics coincides with a Hill function for $n = 1$. In biochemistry, expressions
of the form of Eqn. (1.2.2) with $n > 1$ are often denoted “sigmoidal” kinetics. Several such functions
are plotted in Figure 1.8. We examined the shapes of these functions in Example 1.2.6.
All Hill functions have a horizontal asymptote $y = A$ at large values of x. If y is the speed of a
chemical reaction (analogous to the variable we called v), then A is the “maximal rate” or “maximal
speed” of the reaction. Since the Hill function behaves like a simple power function close to the
origin, the higher the value of n, the flatter is its graph near 0, and the sharper the rise to the eventual
asymptote. Hill functions with large n are often used to represent “switch-like” behaviour in genetic
networks or biochemical signal transduction pathways.
The constant a is sometimes called the “half-maximal activation level” for the following reason:
when $x = a$ then
\[ v = \frac{Aa^n}{a^n + a^n} = \frac{Aa^n}{2a^n} = \frac{A}{2}. \]
This shows that the level $x = a$ leads to a reaction speed of $A/2$, which is half of the maximal possible
rate.
Featured Problem 1.3.2
Lineweaver-Burk plots. Hill functions can be transformed to a linear relationship through a change
of variables. Consider the Hill function

\[ y = \frac{Ax^3}{a^3 + x^3}. \]

Define $Y = 1/y$, $X = 1/x^3$. Show that Y and X satisfy a linear relationship. Because we take the
reciprocals of x and y, X and Y are sometimes called reciprocal coordinates.
Featured Problem 1.3.2

1.4 (optional) Predator response

[Figure 1.9: plot titled “Holling Predator response” with curves labelled Type I, Type II and Type III; vertical axis: predation rate, P(x); horizontal axis: prey density, x.]
Figure 1.9: Holling’s Type I, II, and III predator response. The predation rate P(x) is the number of
prey eaten by a predator per unit time. Note that the predation rate depends on the prey density x.

Interactions of predators and prey are often studied in ecology. Professor C.S. (“Buzz”) Holling,
(a former Director of the Institute of Animal Resource Ecology at the University of British Columbia)
described three types of predators, termed “Type I”, “Type II” and “Type III”, according to their
ability to consume prey as the prey density increases. The three Holling “predator functional
responses” are shown in Fig. 1.9.
Based on Fig. 1.9, match the predator responses to functions shown below.
Hint: One of the curves “looks like a straight line” (so which function here is linear?). One of the
choices is a power function. (Will it fit any of the other curves? Why or why not?) Now consider the
saturating curves and use our description of rational functions in Section 1.3 to select appropriate
formulae for these functions.
\[ P_1(x) = kx, \]
\[ P_2(x) = K \frac{x}{a + x}, \]
\[ P_3(x) = Kx^n, \quad n \geq 2, \]
\[ P_4(x) = K \frac{x^n}{a^n + x^n}, \quad n \geq 2. \]
The generality of mathematics allows us to adapt concepts we studied in one setting (enzyme
biochemistry) to an apparently new topic (behaviour of predators).
Concept Check-In
1. Match the predator responses shown in Fig. 1.9 with the descriptions given below

1. As a predator, I get satiated and cannot keep eating more and more prey.
2. I can hardly find the prey when the prey density is low, but I also get satiated at high
prey density.
3. The more prey there is, the more I can eat.

§§ A ladybug eating aphids


Here we use ideas developed so far to address a problem in population growth and biological control.
Featured Problem 1.4.1 (A balance of predation and aphid population growth)
Ladybugs are predators that love to eat aphids (their prey).
See this short video explanation of the ladybug Type III predator response to its aphid prey.

Fig. 1.10 provides data (see footnote 7) that supports the idea that ladybugs are Type III predators.
Let x be the number of aphids in some unit area (i.e., the density of the prey). Then the number
of aphids eaten by a ladybug per unit time in that unit area will be called the predation rate and
denoted P(x). The predation rate usually depends on the prey density, and we approximate that
dependence by
\[ P(x) = K \frac{x^n}{a^n + x^n}, \quad \text{where } K, a > 0. \tag{1.4.1} \]
Here we consider the case that $n = 2$. The aphids reproduce at a rate proportional to their number,
so that the growth rate of the aphid population G (number of new aphids per hour) is
\[ G(x) = rx, \quad \text{where } r > 0. \tag{1.4.2} \]

7 Source: Hassell, M. P., Lawton, J. H., & Beddington, J. R. (1977). Sigmoid Functional Responses by Invertebrate
Predators and Parasitoids. Journal of Animal Ecology, 46(1), 249–262. https://doi.org/10.2307/3959

[Figure 1.10: data plot with vertical axis “Predation rate, P(x)” (ticks from 0 to 30) and horizontal axis “Aphid density, x” (ticks from 0 to 100).]

Figure 1.10: The predation rate of a ladybug depends on its aphid (prey) density.

(a) For what aphid population density x does the predation rate exactly balance the aphid popula-
tion growth rate?

(b) Are there situations where the predation rate cannot match the growth rate? Explain your
results in terms of the constants K, a, r.

Featured Problem 1.4.1

Hints and partial solution

(a) The wording “the predation rate exactly balances the reproduction rate” means that the two
functions P(x) and G(x) are exactly equal.

Use the sliders to manipulate the predation constants K, a and the aphid growth
rate parameter r. How many solutions are there to P(x) = G(x)? Show that for some
parameter values, there is only a trivial solution at x = 0. Make a connection between this
observation and part (b) of Featured Problem 1.4.1.

Hence, to solve this problem, equate P(x) = G(x) and determine the value of x (i.e., the
number of aphids) at which this equality holds. You will find that one solution to this equation
is $x = 0$. But if $x \neq 0$, you can cancel one factor of x from both sides and rearrange the
equation to obtain a quadratic equation whose solution can be written down (in terms of the
positive constants K, r, a).

Hint (see note a below): Recall that a quadratic equation $ax^2 + bx + c = 0$ has roots
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \]
These roots are real provided
\[ b^2 - 4ac \geq 0. \]

a The solution to this problem is based on solving a quadratic equation, and so, relies on the fact that we chose the
value n = 2 in the predation rate. To solve the same kind of problem with n = 3, 4 etc generally requires numerical
approximation methods.

(b) The solution you find in (a) is only a real number (i.e. a real solution exists) if the discriminant
(quantity inside the square-root) is non-negative. Determine when this situation can occur and
interpret your answer in terms of the aphids and ladybugs.
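For readers who want to check their algebra, here is one possible numerical sketch of the hints above for n = 2. Setting P(x) = G(x) with $x \neq 0$ and cancelling one factor of x gives $Kx = r(a^2 + x^2)$, that is, $r x^2 - K x + r a^2 = 0$. The function names and the parameter values K = 30, a = 20, r = 0.5 and r = 1 are assumptions chosen here for illustration; they are not taken from the data in Fig. 1.10.

```python
import math


def predation(x, K, a):
    """Predation rate P(x) = K x^2 / (a^2 + x^2), i.e. Eqn. (1.4.1) with n = 2."""
    return K * x**2 / (a**2 + x**2)


def growth(x, r):
    """Aphid growth rate G(x) = r x from Eqn. (1.4.2)."""
    return r * x


def nonzero_balance_points(K, a, r):
    """Nonzero x with P(x) = G(x), found from r x^2 - K x + r a^2 = 0."""
    discriminant = K**2 - 4 * r**2 * a**2
    if discriminant < 0:
        return []  # no real roots: only the trivial solution x = 0 remains
    root = math.sqrt(discriminant)
    return sorted({(K - root) / (2 * r), (K + root) / (2 * r)})


# Hypothetical parameter values, for illustration only.
for r in (0.5, 1.0):
    points = nonzero_balance_points(K=30.0, a=20.0, r=r)
    print(f"r = {r}: nonzero balance points = {points}")
    for x in points:
        print(f"  P({x:.4g}) = {predation(x, 30.0, 20.0):.6g}, G({x:.4g}) = {growth(x, r):.6g}")
```

With these illustrative numbers, the smaller growth rate gives two nonzero balance points and the larger one gives none, which is the kind of contrast part (b) asks about.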

1.5 Familiar functions

Learning Objectives
• Know that $e^x$ eventually dominates any given power function, and any power function
with positive exponent dominates the logarithm (for large positive x). Use these facts for
sketching. For example, sketch $f(x) = e^x - x$.

• Sketch familiar functions such as $e^x$, $\log x$, $\sin x$, $\cos x$, $\tan x$, $1/x$, $\sqrt{x}$, and $|x|$.

Power functions are both common and (relatively) simple, so they were a good place to start
thinking about dominance and how it can be useful. Another common class of functions are the
exponential functions: those of the form $f(x) = a^x$, where a is a positive constant. (The one we’ll
be using the most, sometimes called the exponential function, is $e^x$.) The constant a is known as the
base.

Example 1.5.1 (Bases of exponential functions)


Below are graphed $y = 2^x$, $y = 3^x$, and $y = 4^x$. Note that, as the bases increase, the functions get
steeper for positive x.

[Figure: graphs of $y = 2^x$, $y = 3^x$ and $y = 4^x$ on common axes.]

Example 1.5.1

Exponential functions with bases greater than one will, for large x, grow extremely quickly.
Indeed, they will grow more quickly than any power function, eventually.
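The word “eventually” matters here: a power function can be far larger than $e^x$ for a long stretch before the exponential overtakes it. The sketch below compares $e^x$ with $x^{10}$; the choice of the power 10 and the helper name `exp_over_power` are assumptions made for illustration.

```python
import math


def exp_over_power(x, p):
    """Ratio e^x / x^p for x > 0; a ratio above 1 means e^x is the larger."""
    return math.exp(x) / x**p


for x in (10, 20, 30, 40, 50, 100):
    ratio = exp_over_power(x, 10)
    leader = "e^x" if ratio > 1 else "x^10"
    print(f"x = {x:>3}: e^x / x^10 = {ratio:.3g} ({leader} is larger)")
```

For this particular power, the exponential takes the lead somewhere between x = 30 and x = 40 and then pulls away rapidly.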

Example 1.5.2 ($e^x - x$)


Let’s sketch $y = e^x - x$.

• For large positive values of x, $e^x$ will dominate $-x$, so the function will look approximately
like $e^x$. That is, it will grow steeply.

• For large negative values of x, $e^x \approx 0$, so $e^x - x \approx -x$.

• For $x = 0$, $e^0 - 0 = 1$.

So, all together: our function will look like the straight line $y = -x$ when x is strongly negative; then
it will pass through the point (0, 1); then it will grow like the classic hockey-stick graph of $e^x$ for
large x.

[Figure: graph of $y = e^x - x$.]

Example 1.5.2

Below are some familiar functions whose graphs you should be able to sketch. Some salient
points are stated explicitly.

$e^x$:

• Domain is all real numbers. Range is all positive numbers.
• Passes through the point (0, 1).
• $e^x$ is very close to 0 for large negative values of x; $e^x$ grows rapidly and without
bound for large positive values of x.

[Figure: graph of $y = e^x$.]

$\log(x)$:

• In this course we use $\log(x) = \log_e(x) = \ln(x)$. This differs from some
calculators. When using online calculators, put in a few test points to see which
base is used for the ‘log’ button.
• Is the inverse of the exponential function.
• Passes through the point (1, 0).
• Domain is $(0, \infty)$; range is all real numbers.
• For very large positive x values, $\log x$ is very big and positive.
• For very small positive x values, $\log x$ is very big and negative.

[Figure: graph of $y = \log x$.]
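The advice about test points applies to software as well as calculators. As a sketch, the hypothetical helper `guess_log_base` below feeds a logarithm function the inputs e and 10 and looks for the one that returns 1. It is applied to Python's `math.log` and `math.log10` from the standard library.

```python
import math


def guess_log_base(log_fn, tol=1e-9):
    """Guess whether log_fn is a natural or base-10 logarithm from two test points."""
    if abs(log_fn(math.e) - 1.0) < tol:
        return "e"   # log_fn(e) = 1, so the base is e
    if abs(log_fn(10.0) - 1.0) < tol:
        return "10"  # log_fn(10) = 1, so the base is 10
    return "unknown"


print(guess_log_base(math.log))    # Python's math.log is the natural logarithm
print(guess_log_base(math.log10))  # math.log10 is the base-10 logarithm
```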

$\sin(x)$:

• Range is $[-1, 1]$. Domain is all real numbers.
• Passes through the origin.
• You should know the value of this function at reference angles, and its relation
with the unit circle.

[Figure: graph of $y = \sin x$.]

$\cos(x)$:

• Range is $[-1, 1]$. Domain is all real numbers.
• Passes through the point (0, 1).
• You should know the value of this function at reference angles, and its relation
with the unit circle.

[Figure: graph of $y = \cos x$.]

$\tan(x)$:

• Range is all real numbers. Not defined for $x = n\pi + \frac{\pi}{2}$, where n is any integer.

• Passes through the origin. Blows up near points where it isn’t defined.

• You should know the value of this function at reference angles, and its relation
with sine and cosine.

[Figure: graph of $y = \tan x$, with $-\frac{3\pi}{2}$, $-\frac{\pi}{2}$, $\frac{\pi}{2}$ and $\frac{3\pi}{2}$ marked on the x axis.]

$\frac{1}{x}$:

• Domain and range are both all nonzero real numbers.

• For values of x close to 0, $\frac{1}{x}$ is very large (positive or negative).

• For large (positive or negative) values of x, $\frac{1}{x}$ is close to 0.

[Figure: graph of $y = \frac{1}{x}$.]

$\sqrt{x}$:

• Domain and range are both $[0, \infty)$.

• For large values of x, $\sqrt{x}$ is also large.

[Figure: graph of $y = \sqrt{x}$.]

$|x|$:

• Piecewise defined: $|x| = \begin{cases} x & \text{if } x \geq 0 \\ -x & \text{if } x < 0 \end{cases}$.
• Domain all real numbers; range $[0, \infty)$.
• Looks like a straight line if you only look to one side of the y-axis.

[Figure: graph of $|x|$ on x-y axes.]

Chapter 2

LIMITS

The concept of a limit helps us to describe the behaviour of a function close to some point of interest.
This is useful in the case of functions that are either not continuous, or not defined somewhere. We
use the notation

\[ \lim_{x \to a} f(x) \]

to denote the value that the function f approaches as x gets closer and closer to the value a.

2.1 Quick review of limits

Learning Objectives
• Explain using both words and pictures what $\lim_{x \to a} f(x) = L$, $\lim_{x \to a^-} f(x) = L$, and
$\lim_{x \to a^+} f(x) = L$ mean (including the case where L is equal to $\infty$ or $-\infty$).

• Explain using both words and pictures what $\lim_{x \to \infty} f(x) = L$ and $\lim_{x \to -\infty} f(x) = L$ mean
(including the case where L is equal to $\infty$ or $-\infty$).

• Find the limit of a function at a point given the graph of the function.

• Understand when limits do and do not exist.

Before we come to definitions, let us start with a little notation for limits.


Notation 2.1.1.

We will often write

\[ \lim_{x \to a} f(x) = L \]

which should be read as

The limit of f (x) as x approaches a is L.

The notation is just shorthand — we don’t want to have to write out long sentences as we do our
mathematics. Whenever you see these symbols you should think of that sentence.
This shorthand also has the benefit of being mathematically precise (albeit not in a way that we
will cover in this course), and (almost) independent of the language in which the author is writing.
A mathematician who does not speak English can read the above formula and understand exactly
what it means.
In mathematics, like most languages, there is usually more than one way of writing things and
we can also write the above limit as

\[ f(x) \to L \text{ as } x \to a \]

This can also be read as above, but also as


f (x) goes to L as x goes to a
They mean exactly the same thing in mathematics, even though they might be written and read a
little differently.
To arrive at the definition of limit, we want to start with a very simple example.
Example 2.1.2
Consider the following function.
\[ f(x) = \begin{cases} 2x & x < 3 \\ 9 & x = 3 \\ 2x & x > 3 \end{cases} \]

This is an example of a piecewise function. That is, a function defined in several pieces, rather than
as a single formula. We evaluate the function at a particular value of x on a case-by-case basis. Here
is a sketch of it:


Notice the two circles in the plot. One is open, $\circ$, and the other is closed, $\bullet$.
• A filled circle has quite a precise meaning — a filled circle at (x, y) means that the function
takes the value f (x) = y.
• An open circle is a little harder — an open circle at (3, 6) means that the point (3, 6) is not on
the graph of $y = f(x)$, i.e. $f(3) \neq 6$. We should only use the open circle where it is absolutely
necessary in order to avoid confusion.
This function is quite contrived, but it is a very good example to start working with limits more
systematically. Consider what the function does close to $x = 3$. We already know what happens
exactly at 3 (that is, $f(3) = 9$) but we want to look at how the function behaves very close to $x = 3$.
That is, what does the function do as we look at a point x that gets closer and closer to x = 3?
If we plug in some numbers very close to 3 (but not exactly 3) into the function we see the
following:
x       2.9    2.99    2.999    3    3.001    3.01    3.1
f(x)    5.8    5.98    5.998    ∘    6.002    6.02    6.2

So as x moves closer and closer to 3, without being exactly 3, we see that the function moves closer
and closer to 6. We can write this as
\[ \lim_{x \to 3} f(x) = 6 \]

That is:
The limit as x approaches 3 of f (x) is 6.
So for x very close to 3, without being exactly 3, the function is very close to 6 — which is a long
way from the value of the function exactly at 3, f (3) = 9. Note well that the behaviour of the
function as x gets very close to 3 does not depend on the value of the function at 3.
Example 2.1.2
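The table in Example 2.1.2 is easy to reproduce. The sketch below simply encodes the piecewise definition and evaluates it at the same test points; it illustrates the approach to the limit and is not a proof of it.

```python
def f(x):
    """The piecewise function of Example 2.1.2: 2x away from x = 3, and 9 at x = 3."""
    if x == 3:
        return 9
    return 2 * x


# Values of x approaching 3 from both sides, without being exactly 3.
for x in (2.9, 2.99, 2.999, 3.001, 3.01, 3.1):
    print(f"f({x}) = {f(x):.3f}")

# The value exactly at 3 plays no role in the limit.
print(f"f(3) = {f(3)}")
```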

We now have enough to make an informal definition of a limit, which is actually sufficient for
most of what we will do in this text.
Definition 2.1.3 (Informal definition of limit).

We write

\[ \lim_{x \to a} f(x) = L \]

if the value of the function f(x) is sure to be arbitrarily close to L whenever the value of x
is close enough to a, without being exactly a (see footnote 1).

1 You may find the condition “without being exactly a” a little strange, but there is a good reason for it. One very
important application of limits, indeed the main reason we teach the topic, is in the definition of derivatives (see
Definition 3.3.3). In that definition we need to compute the limit $\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$. In this case the function whose
limit is being taken, namely $\frac{f(x) - f(a)}{x - a}$, is not defined at all at $x = a$.


Let us use the above definition to examine a more substantial example.


Example 2.1.4
Let $f(x) = \frac{x - 2}{x^2 + x - 6}$ and consider its limit as $x \to 2$.

• We are really being asked

\[ \lim_{x \to 2} \frac{x - 2}{x^2 + x - 6} = \text{what?} \]

• Now if we try to compute f (2) we get 0/0 which is undefined. The function is not defined
at that point — this is a good example of why we need limits. We have to sneak up on these
places where a function is not defined (or is badly behaved).
• VERY IMPORTANT POINT: the fraction $\frac{0}{0}$ is not $\infty$ and it is not 1; it is not defined. We
cannot ever divide by zero in normal arithmetic and obtain a consistent and mathematically
sensible answer. If you learned otherwise in high school, you should quickly unlearn it.
• Again, we can plug in some numbers close to 2 and see what we find

x       1.9        1.99       1.999      2    2.001      2.01       2.1
f(x)    0.20408    0.20040    0.20004    ∘    0.19996    0.19960    0.19608

• So it is reasonable to suppose that

\[ \lim_{x \to 2} \frac{x - 2}{x^2 + x - 6} = 0.2 \]

Example 2.1.4
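The table in Example 2.1.4 can be reproduced in a few lines. The sketch also prints $1/(x + 3)$ for comparison; this uses the factorization $x^2 + x - 6 = (x + 3)(x - 2)$, an observation added here that is not part of the example, which shows that f(x) agrees with $1/(x + 3)$ whenever $x \neq 2$.

```python
def f(x):
    """The function of Example 2.1.4; it is not defined at x = 2."""
    return (x - 2) / (x**2 + x - 6)


for x in (1.9, 1.99, 1.999, 2.001, 2.01, 2.1):
    # Away from x = 2, f(x) agrees with 1 / (x + 3) up to rounding.
    print(f"x = {x}: f(x) = {f(x):.5f}, 1/(x + 3) = {1 / (x + 3):.5f}")

# Evaluating exactly at 2 attempts 0/0, which Python reports as an error.
try:
    f(2)
except ZeroDivisionError:
    print("f(2) is not defined")
```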

The previous two examples are nicely behaved in that the limits we tried to compute actually
exist. We now turn to two nastier examples (see footnote 2) in which the limits we are interested in do not exist.
Example 2.1.5 (A bad example)
Consider the following function $f(x) = \sin(\pi/x)$. Find the limit as $x \to 0$ of f(x).
We should see something interesting happening close to x = 0 because f (x) is undefined there.
Using your favourite graph-plotting software you can see that the graph looks roughly like

2 Actually, they are good examples, but the functions in them are nastier.


How to explain this? As x gets closer and closer to zero, π/x becomes larger and larger (remember
what the plot of y = 1/x looks like). So when you take sine of that number, it oscillates faster and
faster the closer you get to zero. Since the function does not approach a single number as we bring x
closer and closer to zero, the limit does not exist.
We write this as
\[ \lim_{x \to 0} \sin\left(\frac{\pi}{x}\right) \text{ does not exist} \]
It’s not very inventive notation, however it is clear. We frequently abbreviate “does not exist” to
“DNE” and rewrite the above as
\[ \lim_{x \to 0} \sin\left(\frac{\pi}{x}\right) = \text{DNE} \]

Example 2.1.5
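One way to see the oscillation without a plot is to evaluate the function along three different sequences of points that all shrink toward 0. The particular sequences below are a choice made for this sketch: at $x = 1/k$ the sine is 0, at $x = 2/(4k + 1)$ it is 1, and at $x = 2/(4k + 3)$ it is $-1$, up to floating-point rounding.

```python
import math


def f(x):
    """f(x) = sin(pi / x), undefined at x = 0."""
    return math.sin(math.pi / x)


for k in (1, 10, 100, 1000):
    x_zero = 1 / k              # pi / x = k * pi, where sine is 0
    x_peak = 2 / (4 * k + 1)    # pi / x = (4k + 1) * pi / 2, where sine is 1
    x_trough = 2 / (4 * k + 3)  # pi / x = (4k + 3) * pi / 2, where sine is -1
    print(f"k = {k:>4}: f({x_zero:.5f}) = {f(x_zero):+.3f}, "
          f"f({x_peak:.5f}) = {f(x_peak):+.3f}, f({x_trough:.5f}) = {f(x_trough):+.3f}")
```

However close to 0 we look, the function still takes values near 0, 1 and $-1$, so it cannot be settling down to a single number.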

In the following example, the limit we are interested in does not exist. However the way in
which things go wrong is quite different from what we just saw.
Example 2.1.6
Consider the function
\[ f(x) = \begin{cases} x & x < 2 \\ -1 & x = 2 \\ x + 3 & x > 2 \end{cases} \]

• The plot of this function looks like this

• So let us plug in numbers close to 2.

x       1.9    1.99    1.999    2    2.001    2.01    2.1
f(x)    1.9    1.99    1.999    ∘    5.001    5.01    5.1

• This isn’t like before. Now when we approach from below, we seem to be getting closer to
2, but when we approach from above we seem to be getting closer to 5. Since we are not
approaching the same number the limit does not exist.

\[ \lim_{x \to 2} f(x) = \text{DNE} \]


Example 2.1.6
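The different behaviour on the two sides of x = 2 can be tabulated separately. In this sketch the helper `approach` and its step sizes are assumptions introduced for illustration; it evaluates the function at points that close in on a from one chosen side only.

```python
def f(x):
    """The piecewise function of Example 2.1.6."""
    if x < 2:
        return x
    if x == 2:
        return -1
    return x + 3


def approach(func, a, side, steps=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Values of func at points approaching a from the 'left' or the 'right'."""
    sign = -1 if side == "left" else 1
    return [func(a + sign * h) for h in steps]


print("from below:", approach(f, 2, "left"))
print("from above:", approach(f, 2, "right"))
```

The first list creeps up toward 2 and the second creeps down toward 5, matching the table in the example.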

While the limit in the previous example does not exist, the example serves to introduce the idea
of “one-sided limits”. For example, we can say that

As x moves closer and closer to two from below the function approaches 2.

and similarly

As x moves closer and closer to two from above the function approaches 5.

Definition 2.1.7 (Informal definition of one-sided limits).

We write

\[ \lim_{x \to a^-} f(x) = K \]

when the value of f(x) gets closer and closer to K when $x < a$ and x moves closer and
closer to a. Since the x-values are always less than a, we say that x approaches a from
below. This is also often called the left-hand limit since the x-values lie to the left of a on
a sketch of the graph.
We similarly write

\[ \lim_{x \to a^+} f(x) = L \]

when the value of f(x) gets closer and closer to L when $x > a$ and x moves closer and
closer to a. For similar reasons we say that x approaches a from above, and sometimes
refer to this as the right-hand limit.

Note: be careful to include the superscript $+$ and $-$ when writing these limits. You might also
see the following notations:

\[ \lim_{x \to a+} f(x) = \lim_{x \to a^+} f(x) = \lim_{x \downarrow a} f(x) = \lim_{x \searrow a} f(x) = L \quad \text{right-hand limit} \]
\[ \lim_{x \to a-} f(x) = \lim_{x \to a^-} f(x) = \lim_{x \uparrow a} f(x) = \lim_{x \nearrow a} f(x) = L \quad \text{left-hand limit} \]

but please use the notation in Definition 2.1.7 above.


Given these two similar notions of limits, when are they the same? The following theorem tells
us.
Theorem 2.1.8 (Limits and one sided limits).

\[ \lim_{x \to a} f(x) = L \quad \text{if and only if} \quad \lim_{x \to a^-} f(x) = L \ \text{ and } \ \lim_{x \to a^+} f(x) = L \]


Notice that this is really two separate statements because of the “if and only if”
• If the limit of f (x) as x approaches a exists and is equal to L, then both the left-hand and
right-hand limits exist and are equal to L. AND,
• If the left-hand and right-hand limits as x approaches a exist and are equal, then the limit as x
approaches a exists and is equal to the one-sided limits.
That is — the limit of f (x) as x approaches a will only exist if it doesn’t matter which way we
approach a (either from left or right) AND if we get the same one-sided limits when we approach
from left and right, then the limit exists.
We can rephrase the above by writing the contrapositives [3] of the above statements.
• If either of the left-hand and right-hand limits as x approaches a fails to exist, or if they both
exist but are different, then the limit as x approaches a does not exist. AND,
• If the limit as x approaches a does not exist, then the left-hand and right-hand limits are either
different or at least one of them does not exist.
Here is another limit example.
Example 2.1.9
Consider the following two functions and compute their limits and one-sided limits as x approaches
1:

These are a little different from our previous examples, in that we do not have formulas, only the
sketch. But we can still compute the limits.
• Function on the left, $f(x)$:
\[ \lim_{x \to 1^-} f(x) = 2 \qquad \lim_{x \to 1^+} f(x) = 2 \]

so by the previous theorem

\[ \lim_{x \to 1} f(x) = 2 \]

3 Given a statement of the form “If A then B”, the contrapositive is “If not B then not A”. They are logically
equivalent — if one is true then so is the other. We must take care not to confuse the contrapositive with the
converse. Given “If A then B”, the converse is “If B then A”. These are definitely not the same.
To see this consider the statement “If he is Shakespeare then he is dead.” The converse is “If he is dead then he is
Shakespeare” — clearly garbage since there are plenty of dead people who are not Shakespeare. The contrapositive
is “If he is not dead then he is not Shakespeare” — which makes much more sense.


• Function on the right, $g(t)$:

\[ \lim_{t \to 1^-} g(t) = 2 \quad \text{and} \quad \lim_{t \to 1^+} g(t) = -2 \]

so by the previous theorem

\[ \lim_{t \to 1} g(t) = \text{DNE} \]

Example 2.1.9

We have seen two ways in which a limit does not exist — in one case the function oscillated
wildly, and in the other there was some sort of “jump” in the function, so that the left-hand and
right-hand limits were different.
There is a third way that we must also consider. To describe this, consider the following four
functions:

Figure 2.1.1.

None of these functions are defined at x = a, nor do the limits as x approaches a exist. However
we can say more than just “the limits do not exist”.
Notice that the value of function 1 can be made bigger and bigger as we bring x closer and
closer to a. Similarly the value of the second function can be made arbitrarily large and negative (i.e.
make it as big a negative number as we want) by bringing x closer and closer to a. Based on this
observation we have the following definition.


Definition 2.1.10.

We write

\[ \lim_{x \to a} f(x) = +\infty \]

when the value of the function $f(x)$ becomes arbitrarily large and positive as $x$ gets closer
and closer to $a$, without being exactly $a$.
Similarly, we write

\[ \lim_{x \to a} f(x) = -\infty \]

when the value of the function $f(x)$ becomes arbitrarily large and negative as $x$ gets closer
and closer to $a$, without being exactly $a$.

A good example of the above is

\[ \lim_{x \to 0} \frac{1}{x^2} = +\infty \qquad \lim_{x \to 0} -\frac{1}{x^2} = -\infty \]

IMPORTANT POINT: Please do not think of “$+\infty$” and “$-\infty$” in these statements as numbers.
You should think of $\lim_{x \to a} f(x) = +\infty$ and $\lim_{x \to a} f(x) = -\infty$ as special cases of
$\lim_{x \to a} f(x) = \text{DNE}$. The statement

\[ \lim_{x \to a} f(x) = +\infty \]

does not mean “$f(x)$ approaches infinity as $x$ approaches $a$.” It means “the function $f(x)$ becomes
arbitrarily large as $x$ approaches $a$”. These are different statements; remember that $\infty$ is not a
number [4].
Now consider functions 3 and 4 in Figure 2.1.1. Here we can make the value of the function as
big and positive as we want (for function 3) or as big and negative as we want (for function 4) but
only when x approaches a from one side. With this in mind we can construct similar notation and a
similar definition:

4 One needs to be very careful making statements about infinity. At some point in our lives we get around to asking
ourselves “what is the biggest number?” and we realise there isn’t one. That is, we can go on counting integer
after integer forever. Indeed the set of integers is the first infinite thing we really encounter. It is an example of a
countably infinite set. The set of real numbers is actually much bigger and is uncountably infinite. In fact there are
an infinite number of different sorts of infinity! Much of the theory of infinite sets was developed by Georg Cantor,
who is well worth Googling.


Definition 2.1.11.

We write

\[ \lim_{x \to a^+} f(x) = +\infty \]

when the value of the function $f(x)$ becomes arbitrarily large and positive as $x$ gets closer
and closer to $a$ from above (equivalently, from the right), without being exactly $a$.
Similarly, we write

\[ \lim_{x \to a^+} f(x) = -\infty \]

when the value of the function $f(x)$ becomes arbitrarily large and negative as $x$ gets closer
and closer to $a$ from above (equivalently, from the right), without being exactly $a$.
The notation

\[ \lim_{x \to a^-} f(x) = +\infty \qquad \lim_{x \to a^-} f(x) = -\infty \]

has a similar meaning except that limits are approached from below / from the left.

So for function 3 we have

\[ \lim_{x \to a^-} f(x) = +\infty \qquad \lim_{x \to a^+} f(x) = \text{some positive number} \]

and for function 4

\[ \lim_{x \to a^-} f(x) = \text{some positive number} \qquad \lim_{x \to a^+} f(x) = -\infty \]

Example 2.1.12
Consider the function
\[ g(x) = \frac{1}{\sin(x)} \]
Find the one-sided limits of this function as $x \to \pi$.
Probably the easiest way to do this is to first plot the graph of $\sin(x)$ and $1/x$ and then think
carefully about the one-sided limits:


• As $x \to \pi$ from the left, $\sin(x)$ is a small positive number that is getting closer and closer to
zero. That is, as $x \to \pi^-$, we have that $\sin(x) \to 0$ through positive numbers (i.e. from above).
Now look at the graph of $1/x$, and think about what happens as $x \to 0^+$: the function is
positive and becomes larger and larger.
So as $x \to \pi$ from the left, $\sin(x) \to 0$ from above, and so $1/\sin(x) \to +\infty$.

• By very similar reasoning, as $x \to \pi$ from the right, $\sin(x)$ is a small negative number that gets
closer and closer to zero. So as $x \to \pi$ from the right, $\sin(x) \to 0$ through negative numbers
(i.e. from below) and so $1/\sin(x) \to -\infty$.

Thus
\[ \lim_{x \to \pi^-} \frac{1}{\sin(x)} = +\infty \qquad \lim_{x \to \pi^+} \frac{1}{\sin(x)} = -\infty \]

Example 2.1.12
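The sign analysis in the example above can be checked by evaluating 1/sin(x) on either side of π. This is only an illustrative sketch using Python's standard `math` module: the step sizes and the names `left_values` and `right_values` are assumptions made for the illustration, and floating-point sampling suggests the behaviour without proving it.

```python
import math

def g(x):
    """g(x) = 1/sin(x) from Example 2.1.12."""
    return 1.0 / math.sin(x)

# Step sizes shrinking toward zero; pi - step and pi + step approach pi from each side.
steps = [10**-k for k in range(1, 6)]
left_values = [g(math.pi - s) for s in steps]
right_values = [g(math.pi + s) for s in steps]

for s, lv, rv in zip(steps, left_values, right_values):
    print(f"step = {s:.0e}   g(pi - step) = {lv:12.1f}   g(pi + step) = {rv:12.1f}")
```

Values to the left of π are positive and growing, while values to the right are negative and growing in magnitude, matching the two one-sided limits.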

Up to this point we explored limits by sketching graphs or plugging values into a calculator. This
was done to help build intuition, but it is not really the basis of a systematic method for computing
limits. We have also avoided more formal approaches5 since we do not have time in the course to go
into that level of detail and (arguably) we don’t need that detail to achieve the aims of the course.
Thankfully we can develop a more systematic approach based on the idea of building up complicated
limits from simpler ones by examining how limits interact with the basic operations of arithmetic.

2.1.1 §§ Calculating limits with limit laws


Think back to the functions you know and the sorts of things you have been asked to draw, factor
and so on. They are all constructed from simple pieces, such as

• constants: $c$

• monomials or power functions: $x^n$

• trigonometric functions: $\sin(x)$, $\cos(x)$ and $\tan(x)$

These are the building blocks from which we construct functions. Soon we will add a few more
functions to this list, especially the exponential function and various inverse functions.
We then take these building blocks and piece them together using arithmetic

• addition and subtraction: $f(x) = g(x) + h(x)$ and $f(x) = g(x) - h(x)$

• multiplication: $f(x) = g(x) \cdot h(x)$

• division: $f(x) = \frac{g(x)}{h(x)}$

• substitution: $f(x) = g(h(x))$; this is also called the composition of $g$ with $h$.

5 The formal approaches are typically referred to as “epsilon-delta limits” or “epsilon-delta proofs” since the symbols
ε and δ are traditionally used throughout. Your favourite search engine will tell you more, if you’re curious.


What we will learn in this section is how to compute the limits of the basic building blocks and
then how we can compute limits of sums, products and so forth using “limit laws”. This process
allows us to compute limits of complicated functions, using very simple tools and without having to
resort to “plugging in numbers” or “closer and closer” or “ε-δ arguments”.
In the examples we saw above, almost all the interesting limits happened at points where the
underlying function was badly behaved — where it jumped, was not defined, or blew up to infinity.
In those cases we had to be careful and think about what was happening. Thankfully most functions
we will see do not have too many points at which these sorts of things happen.
For example, polynomials do not have any nasty jumps and are defined everywhere and do not
“blow up”. If you plot them, they look smooth [6]. Polynomials and limits behave very nicely together,
and for any polynomial $P(x)$ and any real number $a$ we have that

\[ \lim_{x \to a} P(x) = P(a) \]

That is, to evaluate the limit, we just plug in the number. We will build up to this result over the
next few pages.
Let us start with the two easiest limits.

Theorem 2.1.13 (Easiest limits).

Let $a, c \in \mathbb{R}$. The following two limits hold

\[ \lim_{x \to a} c = c \quad \text{and} \quad \lim_{x \to a} x = a. \]

Since we have not seen too many theorems yet, let us examine it carefully piece by piece.

• Let $a, c \in \mathbb{R}$: just as was the case for definitions, we start a theorem by defining terms and
setting the scene. There is not too much scene to set: the symbols $a$ and $c$ are real numbers.

• The following two limits hold: this doesn’t really contribute much to the statement of the
theorem, it just makes it easier to read.

• $\lim_{x \to a} c = c$: when we take the limit of a constant function (for example think of $c = 3$), the
limit is (unsurprisingly) just that same constant.

• $\lim_{x \to a} x = a$: as we noted above for general polynomials, the limit of the function $f(x) = x$ as
$x$ approaches a given point $a$, is just $a$. This says something quite obvious: as $x$ approaches
$a$, $x$ approaches $a$ (if you are not convinced then sketch the graph).

Armed with only these two limits, we cannot do very much. But combining these limits with
some arithmetic we can do quite a lot.

6 We have used this term in an imprecise way, but it does have a precise mathematical meaning.


Theorem 2.1.14 (Arithmetic of limits).

Let $a, c \in \mathbb{R}$, let $f(x)$ and $g(x)$ be defined for all $x$’s that lie in some interval about $a$ (but
$f, g$ need not be defined exactly at $a$), and suppose that the limits

\[ \lim_{x \to a} f(x) = F \qquad \lim_{x \to a} g(x) = G \]

exist with $F, G \in \mathbb{R}$. Then the following limits hold

• $\lim_{x \to a} (f(x) + g(x)) = F + G$: limit of the sum is the sum of the limits.

• $\lim_{x \to a} (f(x) - g(x)) = F - G$: limit of the difference is the difference of the limits.

• $\lim_{x \to a} c f(x) = cF$.

• $\lim_{x \to a} (f(x) \cdot g(x)) = F \cdot G$: limit of the product is the product of limits.

• If $G \ne 0$ then $\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{F}{G}$ and, in particular, $\lim_{x \to a} \frac{1}{g(x)} = \frac{1}{G}$.
Note: be careful with this last one; the limit of the denominator cannot be zero.

The above theorem shows that limits interact very simply with arithmetic. If you are asked to
find the limit of a sum then the answer is just the sum of the limits. Similarly the limit of a product
is just the product of the limits.
How do we apply the above theorem to the rational function? Here is a warm-up example:
Example 2.1.15
You are given two functions $f, g$ (not explicitly) which have the following limits as $x$ approaches 1:

\[ \lim_{x \to 1} f(x) = 3 \quad \text{and} \quad \lim_{x \to 1} g(x) = 2 \]

Using the above theorem we can compute:

\begin{align*}
\lim_{x \to 1} 3 f(x) &= 3 \times 3 = 9 \\
\lim_{x \to 1} \left( 3 f(x) - g(x) \right) &= 3 \times 3 - 2 = 7 \\
\lim_{x \to 1} f(x) g(x) &= 3 \times 2 = 6 \\
\lim_{x \to 1} \frac{f(x)}{f(x) - g(x)} &= \frac{3}{3 - 2} = 3
\end{align*}

Example 2.1.15

Example 2.1.16
Find $\lim_{x \to 3} \left( 4x^2 - 1 \right)$


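We use the arithmetic of limits:

\begin{align*}
\lim_{x \to 3} \left( 4x^2 - 1 \right) &= \left( \lim_{x \to 3} 4x^2 \right) - \lim_{x \to 3} 1 && \text{difference of limits} \\
&= \left( \lim_{x \to 3} 4 \cdot \lim_{x \to 3} x^2 \right) - \lim_{x \to 3} 1 && \text{product of limits} \\
&= 4 \cdot \left( \lim_{x \to 3} x^2 \right) - 1 && \text{limit of constant} \\
&= 4 \cdot \left( \lim_{x \to 3} x \right) \cdot \left( \lim_{x \to 3} x \right) - 1 && \text{product of limits} \\
&= 4 \cdot 3 \cdot 3 - 1 && \text{limit of } x \\
&= 36 - 1 \\
&= 35
\end{align*}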

Example 2.1.16
This is an excruciating level of detail, but when you first use a theorem, it is a good idea to do things
step by step. You can go faster when you are comfortable.
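As a sanity check on the step-by-step computation, one can evaluate the polynomial at points close to 3. This is a hedged numerical sketch, not a replacement for the limit laws: the helper `estimate_limit` and its default step sizes are hypothetical names introduced only for this illustration.

```python
def estimate_limit(func, a, steps=(1e-2, 1e-4, 1e-6)):
    """Evaluate func just to the left and right of a (never at a itself)."""
    return [(func(a - s), func(a + s)) for s in steps]

def p(x):
    """The polynomial 4x^2 - 1 from Example 2.1.16."""
    return 4 * x**2 - 1

for left, right in estimate_limit(p, 3):
    print(f"from below: {left:.8f}   from above: {right:.8f}")

# Keep the pair computed with the smallest step for comparison with 35.
closest_left, closest_right = estimate_limit(p, 3)[-1]
```

Both columns approach the value 35 obtained above from the arithmetic of limits.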
Example 2.1.17
Yet another limit: compute $\lim_{x \to 2} \frac{x}{x - 1}$.
To apply the arithmetic of limits, we need to examine numerator and denominator separately and
make sure the limit of the denominator is non-zero. Numerator first:

\[ \lim_{x \to 2} x = 2 \qquad \text{limit of } x \]

and now the denominator:

\begin{align*}
\lim_{x \to 2} (x - 1) &= \left( \lim_{x \to 2} x \right) - \left( \lim_{x \to 2} 1 \right) && \text{difference of limits} \\
&= 2 - 1 = 1 && \text{limit of } x \text{ and limit of constant}
\end{align*}

Since the limit of the denominator is non-zero we can put it back together to get

\begin{align*}
\lim_{x \to 2} \frac{x}{x - 1} &= \frac{\lim_{x \to 2} x}{\lim_{x \to 2} (x - 1)} \\
&= \frac{2}{1} \\
&= 2
\end{align*}

Example 2.1.17

In the next example we show that many different things can happen if the limit of the denominator
is zero.


Example 2.1.18 (Be careful with limits of ratios)


We must be careful when computing the limit of a ratio: it is the ratio of the limits except when
the limit of the denominator is zero. When the limit of the denominator is zero Theorem 2.1.14 does
not apply and a few interesting things can happen.

• If the limit of the numerator is non-zero then the limit of the ratio does not exist

\[ \lim_{x \to a} \frac{f(x)}{g(x)} = \text{DNE} \quad \text{when } \lim_{x \to a} f(x) \ne 0 \text{ and } \lim_{x \to a} g(x) = 0 \]

For example, $\lim_{x \to 0} \frac{1}{x^2} = \text{DNE}$.

• If the limit of the numerator is zero then the above theorem does not give us enough information
to decide whether or not the limit exists. It is possible that

– the limit does not exist, e.g. $\lim_{x \to 0} \frac{x}{x^2} = \lim_{x \to 0} \frac{1}{x} = \text{DNE}$

– the limit is $\pm\infty$, e.g. $\lim_{x \to 0} \frac{x^2}{x^4} = \lim_{x \to 0} \frac{1}{x^2} = +\infty$ or $\lim_{x \to 0} \frac{-x^2}{x^4} = \lim_{x \to 0} \frac{-1}{x^2} = -\infty$.

– the limit is zero, e.g. $\lim_{x \to 0} \frac{x^2}{x} = 0$

– the limit exists and is non-zero, e.g. $\lim_{x \to 0} \frac{x}{x} = 1$

Now while the above examples are very simple and a little contrived they serve to illustrate the point
we are trying to make: be careful if the limit of the denominator is zero.
Example 2.1.18
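The four "zero over zero" cases listed in the example can be compared side by side by evaluating each ratio at a few points near 0. This is a small sketch under the assumption that a handful of sample points is representative; the names `examples`, `samples` and `results` are illustrative only.

```python
# Each entry: (label, function). Numerator and denominator both tend to 0 as x -> 0.
examples = [
    ("x / x**2", lambda x: x / x**2),        # behaves like 1/x: no limit
    ("x**2 / x**4", lambda x: x**2 / x**4),  # behaves like 1/x^2: +infinity
    ("x**2 / x", lambda x: x**2 / x),        # behaves like x: limit 0
    ("x / x", lambda x: x / x),              # equals 1 away from 0: limit 1
]

samples = [-1e-3, -1e-6, 1e-6, 1e-3]  # points near 0, never 0 itself

results = {}
for label, func in examples:
    results[label] = [func(x) for x in samples]
    print(f"{label:12s}", results[label])
```

The same starting form leads to four different behaviours, which is why the theorem cannot decide these cases on its own.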

Example 2.1.19
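Let $h(x) = \frac{2x - 3}{x^2 + 5x - 6}$ and find its limit as $x$ approaches 2.
Since this is the limit of a ratio, we compute the limit of the numerator and denominator
separately. Numerator first:

\begin{align*}
\lim_{x \to 2} (2x - 3) &= \left( \lim_{x \to 2} 2x \right) - \left( \lim_{x \to 2} 3 \right) && \text{difference of limits} \\
&= 2 \cdot \left( \lim_{x \to 2} x \right) - 3 && \text{product of limits and limit of constant} \\
&= 2 \cdot 2 - 3 && \text{limits of } x \\
&= 1
\end{align*}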


Denominator next:

\begin{align*}
\lim_{x \to 2} \left( x^2 + 5x - 6 \right) &= \left( \lim_{x \to 2} x^2 \right) + \left( \lim_{x \to 2} 5x \right) - \left( \lim_{x \to 2} 6 \right) && \text{sum of limits} \\
&= \left( \lim_{x \to 2} x \right) \cdot \left( \lim_{x \to 2} x \right) + 5 \cdot \left( \lim_{x \to 2} x \right) - 6 && \text{product of limits and limit of constant} \\
&= 2 \cdot 2 + 5 \cdot 2 - 6 && \text{limits of } x \\
&= 8
\end{align*}

Since the limit of the denominator is non-zero, we can obtain our result by taking the ratio of the
separate limits.

\[ \lim_{x \to 2} \frac{2x - 3}{x^2 + 5x - 6} = \frac{\lim_{x \to 2} (2x - 3)}{\lim_{x \to 2} \left( x^2 + 5x - 6 \right)} = \frac{1}{8} \]

The above works out quite simply. However, if we were to take the limit as $x \to 1$ then things
are a bit harder. The limit of the numerator is:

\[ \lim_{x \to 1} (2x - 3) = 2 \cdot 1 - 3 = -1 \]

(we have not listed all the steps). And the limit of the denominator is

\[ \lim_{x \to 1} \left( x^2 + 5x - 6 \right) = 1 \cdot 1 + 5 - 6 = 0 \]

Since the limit of the numerator is non-zero, while the limit of the denominator is zero, the limit of
the ratio does not exist.

\[ \lim_{x \to 1} \frac{2x - 3}{x^2 + 5x - 6} = \text{DNE} \]

Example 2.1.19
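The contrast between the limit at 2 and the limit at 1 in the example above shows up clearly when the function is sampled near each point. The sketch below is illustrative only; the step sizes and the names `near_2` and `near_1` are assumptions, and the printed values are evidence, not a proof.

```python
def h(x):
    """h(x) = (2x - 3) / (x^2 + 5x - 6) from Example 2.1.19."""
    return (2 * x - 3) / (x**2 + 5 * x - 6)

steps = [1e-2, 1e-4, 1e-6]

# Pairs (value just left of the point, value just right of the point).
near_2 = [(h(2 - s), h(2 + s)) for s in steps]
near_1 = [(h(1 - s), h(1 + s)) for s in steps]

print("near x = 2:", near_2)
print("near x = 1:", near_1)
```

Near 2 both sides settle at 1/8 = 0.125. Near 1 the values grow without bound and have opposite signs on the two sides, consistent with the limit not existing.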
It is IMPORTANT TO NOTE that it is not correct to write

\[ \lim_{x \to 1} \frac{2x - 3}{x^2 + 5x - 6} = \frac{-1}{0} = \text{DNE} \]

because we can only write

\[ \lim_{x \to a} \frac{f(x)}{g(x)} = \frac{\lim_{x \to a} f(x)}{\lim_{x \to a} g(x)} = \text{something} \]

when the limit of the denominator is non-zero (see Example 2.1.18 above).
With a little care you can use the arithmetic of limits to obtain the following rules for limits of
powers of functions and limits of roots of functions:


Theorem 2.1.20 (More arithmetic of limits — powers and roots).

Let $n$ be a positive integer, let $a \in \mathbb{R}$ and let $f$ be a function so that

\[ \lim_{x \to a} f(x) = F \]

for some real number $F$. Then the following holds

\[ \lim_{x \to a} (f(x))^n = \left( \lim_{x \to a} f(x) \right)^n = F^n \]

so that the limit of a power is the power of the limit. Similarly, if

• $n$ is an even number and $F > 0$, or

• $n$ is an odd number and $F$ is any real number

then
\[ \lim_{x \to a} (f(x))^{1/n} = \left( \lim_{x \to a} f(x) \right)^{1/n} = F^{1/n} \]

More generally [7], if $F > 0$ and $p$ is any real number,

\[ \lim_{x \to a} (f(x))^p = \left( \lim_{x \to a} f(x) \right)^p = F^p \]

Notice that we have to be careful when taking roots of limits that might be negative numbers. To
see why, consider the case $n = 2$ and the limits

\begin{align*}
\lim_{x \to 4} x^{1/2} &= 4^{1/2} = 2 \\
\lim_{x \to 4} (-x)^{1/2} &= (-4)^{1/2} = \text{not a real number}
\end{align*}

In order to evaluate such limits properly we need to use complex numbers which are beyond the
scope of this text.
Also note that the notation $x^{1/2}$ refers to the positive square root of $x$. While 2 and $(-2)$ are both
numbers whose squares are 4, the notation $4^{1/2}$ means 2. This is something we must be careful of [8].
So again — let us do a few examples and carefully note what we are doing.
Example 2.1.21

7 You may not know the definition of the power $b^p$ when $p$ is not a rational number, so here it is. If $b > 0$ and $p$ is
any real number, then $b^p$ is the limit of $b^r$ as $r$ approaches $p$ through rational numbers. We won’t do so here, but it
is possible to prove that the limit exists.
8 Like ending sentences in prepositions — “This is something up with which we will not put.” This quote is attributed
to Churchill though there is some dispute as to whether or not he really said it.

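\begin{align*}
\lim_{x \to 2} \left( 4x^2 - 3 \right)^{1/3} &= \left( \left( \lim_{x \to 2} 4x^2 \right) - \left( \lim_{x \to 2} 3 \right) \right)^{1/3} \\
&= \left( 4 \cdot 2^2 - 3 \right)^{1/3} \\
&= (16 - 3)^{1/3} \\
&= 13^{1/3}
\end{align*}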

Example 2.1.21

By combining the last few theorems we can make the evaluation of limits of polynomials and
rational functions much easier:
Theorem 2.1.22 (Limits of polynomials and rational functions).

Let $a \in \mathbb{R}$, let $P(x)$ be a polynomial and let $R(x)$ be a rational function. Then

\[ \lim_{x \to a} P(x) = P(a) \]

and provided $R(x)$ is defined at $x = a$ then

\[ \lim_{x \to a} R(x) = R(a) \]

If $R(x)$ is not defined at $x = a$ then we are not able to apply this result.

So the previous examples are now much easier to compute:


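\begin{align*}
\lim_{x \to 2} \frac{2x - 3}{x^2 + 5x - 6} &= \frac{4 - 3}{4 + 10 - 6} = \frac{1}{8} \\
\lim_{x \to 2} \left( 4x^2 - 1 \right) &= 16 - 1 = 15 \\
\lim_{x \to 2} \frac{x}{x - 1} &= \frac{2}{2 - 1} = 2
\end{align*}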
It is clear that limits of polynomials are very easy, while those of rational functions are easy
except when the denominator might go to zero. We have seen examples where the resulting limit
does not exist, and some where it does. We now work to explain this more systematically. The
following example demonstrates that it is sometimes possible to take the limit of a rational function
to a point at which the denominator is zero. Indeed we must be able to do exactly this in order to be
able to define derivatives in the next chapter.
Example 2.1.23
Consider the limit
\[ \lim_{x \to 1} \frac{x^3 - x^2}{x - 1}. \]


If we try to apply the arithmetic of limits then we compute the limits of the numerator and denominator
separately

\begin{align*}
\lim_{x \to 1} \left( x^3 - x^2 \right) &= 1 - 1 = 0 \tag{2.1.1} \\
\lim_{x \to 1} (x - 1) &= 1 - 1 = 0 \tag{2.1.2}
\end{align*}

Since the limit of the denominator is zero, we cannot apply our theorem and we are, for the moment, stuck.
However, there is more that we can do here. The hint is that the numerator and denominator both
approach zero as $x$ approaches 1. This means that there might be something we can cancel.
So let us play with the expression a little more before we take the limit:

\[ \frac{x^3 - x^2}{x - 1} = \frac{x^2 (x - 1)}{x - 1} = x^2 \quad \text{provided } x \ne 1. \]

So what we really have here is the following function

\[ \frac{x^3 - x^2}{x - 1} = \begin{cases} x^2 & x \ne 1 \\ \text{undefined} & x = 1 \end{cases} \]

If we plot the above function the graph looks exactly the same as $y = x^2$ except that the function is
not defined at $x = 1$ (since at $x = 1$ both numerator and denominator are zero).

When we compute a limit as $x \to a$, the value of the function exactly at $x = a$ is irrelevant. We only
care what happens to the function as we bring $x$ very close to $a$. So for the above problem we can
write

\[ \frac{x^3 - x^2}{x - 1} = x^2 \quad \text{when } x \text{ is close to 1 but not at } x = 1 \]

So the limit as $x \to 1$ of the function is the same as the limit $\lim_{x \to 1} x^2$ since the functions are the same
except exactly at $x = 1$. By this reasoning we get

\[ \lim_{x \to 1} \frac{x^3 - x^2}{x - 1} = \lim_{x \to 1} x^2 = 1 \]

Example 2.1.23
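The claim that the original ratio and the simplified expression agree everywhere except at x = 1 can be checked directly. This is a minimal sketch with hypothetical function names `original` and `simplified`; agreement at a few sample points illustrates the cancellation but does not replace the algebra.

```python
def original(x):
    """(x^3 - x^2) / (x - 1), undefined at x = 1."""
    return (x**3 - x**2) / (x - 1)

def simplified(x):
    """x^2, the result of cancelling the common factor (x - 1)."""
    return x**2

samples = [0.9, 0.999, 1.001, 1.1]
pairs = [(original(x), simplified(x)) for x in samples]
for x, (o, s) in zip(samples, pairs):
    print(f"x = {x}: original = {o:.9f}, simplified = {s:.9f}")

# At x = 1 itself the original expression is 0/0, which Python refuses to evaluate.
try:
    original(1)
    defined_at_one = True
except ZeroDivisionError:
    defined_at_one = False
print("original defined at x = 1?", defined_at_one)
```

The two functions differ only at the single point x = 1, and that point does not affect the limit.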
The reasoning in the above example can be made more general:


Theorem 2.1.24.

If $f(x) = g(x)$ except when $x = a$ then $\lim_{x \to a} f(x) = \lim_{x \to a} g(x)$ provided the limit of $g$ exists.

How do we know when to use this theorem? The big clue is that when we try to compute the
limit in a naive way, we end up with $\frac{0}{0}$. We know that $\frac{0}{0}$ does not make sense, but it is an indication
that there might be a common factor between numerator and denominator that can be cancelled. In
the previous example, this common factor was $(x - 1)$.
Example 2.1.25
Using this idea, compute

\[ \lim_{h \to 0} \frac{(1 + h)^2 - 1}{h} \]

• First we should check that we cannot just substitute $h = 0$ into this. Clearly we cannot
because the denominator would be 0.

• But we should also check the numerator to see if we have $\frac{0}{0}$, and we see that the numerator
gives us $1 - 1 = 0$.

• Thus we have a hint that there is a common factor that we might be able to cancel. So now we
look for the common factor and try to cancel it.

\begin{align*}
\frac{(1 + h)^2 - 1}{h} &= \frac{1 + 2h + h^2 - 1}{h} && \text{expand} \\
&= \frac{2h + h^2}{h} = \frac{h(2 + h)}{h} && \text{factor and then cancel} \\
&= 2 + h
\end{align*}

• Thus we really have that

\[ \frac{(1 + h)^2 - 1}{h} = \begin{cases} 2 + h & h \ne 0 \\ \text{undefined} & h = 0 \end{cases} \]

and because of this

\begin{align*}
\lim_{h \to 0} \frac{(1 + h)^2 - 1}{h} &= \lim_{h \to 0} (2 + h) \\
&= 2
\end{align*}

Example 2.1.25
We have written everything out in great detail here — way more than is required for a solution to
such a problem. Let us do it again a little more succinctly.


Example 2.1.26
Compute the following limit:

\[ \lim_{h \to 0} \frac{(1 + h)^2 - 1}{h} \]

If we try to use the arithmetic of limits, then we see that the limit of the numerator and the limit of
the denominator are both zero. Hence we should try to factor them and cancel any common factor.
This gives

\begin{align*}
\lim_{h \to 0} \frac{(1 + h)^2 - 1}{h} &= \lim_{h \to 0} \frac{1 + 2h + h^2 - 1}{h} \\
&= \lim_{h \to 0} (2 + h) \\
&= 2
\end{align*}

Example 2.1.26
Notice that even though we did this example carefully above, we have still written some text in our
working explaining what we have done. You should always think about the reader and, if in doubt,
put in more explanation rather than less.
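The same quotient can be tabulated for shrinking values of h to see the cancellation at work. The following is a rough numerical sketch: the name `difference_quotient` and the chosen step sizes are assumptions for illustration, and very small steps would eventually be spoiled by floating-point rounding.

```python
def difference_quotient(h):
    """((1 + h)^2 - 1) / h, defined for h != 0."""
    return ((1 + h) ** 2 - 1) / h

steps = [10.0**-k for k in range(1, 7)]
values = [difference_quotient(h) for h in steps]

for h, v in zip(steps, values):
    # After cancelling the common factor, the quotient should equal 2 + h.
    print(f"h = {h:.0e}   quotient = {v:.10f}   2 + h = {2 + h:.10f}")
```

The quotient tracks 2 + h closely and approaches 2 as h shrinks, in agreement with the algebra above.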

2.1.2 §§ Limits at infinity

Up until this point we have discussed what happens to a function as we move its input x closer and
closer to a particular point a. For a great many applications of limits we need to understand what
happens to a function when its input becomes extremely large — for example what happens to a
population at a time far in the future.
The definition of a limit at infinity has a similar flavour to the definition of limits at finite points
that we saw above, but the details are a little different. We also need to distinguish between positive
and negative infinity. As $x$ becomes very large and positive it moves off towards $+\infty$ but when it
becomes very large and negative it moves off towards $-\infty$.
Again we give an informal definition; the full formal definition is beyond the scope of this course.


Definition 2.1.27 (Limits at infinity — informal).

We write

\[ \lim_{x \to \infty} f(x) = L \]

when the value of the function $f(x)$ gets closer and closer to $L$ as we make $x$ larger and
larger and positive.
Similarly we write

\[ \lim_{x \to -\infty} f(x) = L \]

when the value of the function $f(x)$ gets closer and closer to $L$ as we make $x$ larger and
larger and negative.

Example 2.1.28
Consider the two functions depicted below

The dotted horizontal lines indicate the behaviour as $x$ becomes very large. The function on the left
has limits as $x \to \infty$ and as $x \to -\infty$ since the function “settles down” to a particular value. On the
other hand, the function on the right does not have a limit as $x \to -\infty$ since the function just keeps
getting bigger and bigger.
Example 2.1.28

Just as was the case for limits as x Ñ a we will start with two very simple building blocks and
build other limits from those.
Theorem 2.1.29.

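Let $c \in \mathbb{R}$. Then the following limits hold

\begin{align*}
\lim_{x \to \infty} c &= c & \lim_{x \to -\infty} c &= c \\
\lim_{x \to \infty} \frac{1}{x} &= 0 & \lim_{x \to -\infty} \frac{1}{x} &= 0
\end{align*}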


Again, these limits interact nicely with standard arithmetic:

Theorem 2.1.30 (Arithmetic of limits at infinity).

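Let $f(x), g(x)$ be two functions for which the limits

\[ \lim_{x \to \infty} f(x) = F \qquad \lim_{x \to \infty} g(x) = G \]

exist. Then the following limits hold

\begin{align*}
\lim_{x \to \infty} \left( f(x) \pm g(x) \right) &= F \pm G \\
\lim_{x \to \infty} f(x) g(x) &= FG \\
\lim_{x \to \infty} \frac{f(x)}{g(x)} &= \frac{F}{G} \quad \text{provided } G \ne 0
\end{align*}

and for real numbers $p$

\[ \lim_{x \to \infty} f(x)^p = F^p \quad \text{provided } F^p \text{ and } f(x)^p \text{ are defined for all } x \]

The analogous results hold for limits to $-\infty$.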

Note that, as was the case in Theorem 2.1.20, we need a little extra care with powers of functions.
We must avoid taking square roots of negative numbers, or indeed any even root of a negative
number [9].
Hence we have for all rational $r > 0$
\[ \lim_{x \to \infty} \frac{1}{x^r} = 0 \]

but we have to be careful with

\[ \lim_{x \to -\infty} \frac{1}{x^r} = 0 \]

This is only true if the denominator of $r$ is not an even number [10].

For example
• $\lim_{x \to \infty} \frac{1}{x^{1/2}} = 0$, but $\lim_{x \to -\infty} \frac{1}{x^{1/2}}$ does not exist, because $x^{1/2}$ is not defined for $x < 0$.

• On the other hand, $x^{4/3}$ is defined for negative values of $x$ and $\lim_{x \to -\infty} \frac{1}{x^{4/3}} = 0$.

9 To be more precise, there is no real number $x$ so that $x^{\text{even power}}$ is a negative number. Hence we cannot take the
even-root of a negative number and express it as a real number. This is precisely what complex numbers allow us
to do, but alas, there is not space in the course for us to explore them.
10 where we write $r = \frac{p}{q}$ with $p, q$ integers with no common factors. For example, $r = \frac{6}{14}$
should be written as $r = \frac{3}{7}$
when considering this rule.


Our first application of limits at infinity will be to examine the behaviour of a rational function
for very large x. To do this we use a “trick”.
Example 2.1.31
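Compute the following limit:

\[ \lim_{x \to \infty} \frac{x^2 - 3x + 4}{3x^2 + 8x + 1} \]

As $x$ becomes very large, it is the $x^2$ term that will dominate in both the numerator and denominator
and the other bits become irrelevant. (This is the asymptotic reasoning you’ve seen earlier.) That is,
for very large $x$, $x^2$ is much much larger than $x$ or any constant. So we pull out these dominant parts

\begin{align*}
\frac{x^2 - 3x + 4}{3x^2 + 8x + 1} &= \frac{x^2 \left( 1 - \frac{3}{x} + \frac{4}{x^2} \right)}{x^2 \left( 3 + \frac{8}{x} + \frac{1}{x^2} \right)} \\
&= \frac{1 - \frac{3}{x} + \frac{4}{x^2}}{3 + \frac{8}{x} + \frac{1}{x^2}} && \text{remove the common factors}
\end{align*}

\begin{align*}
\lim_{x \to \infty} \frac{x^2 - 3x + 4}{3x^2 + 8x + 1} &= \lim_{x \to \infty} \frac{1 - \frac{3}{x} + \frac{4}{x^2}}{3 + \frac{8}{x} + \frac{1}{x^2}} \\
&= \frac{\lim_{x \to \infty} \left( 1 - \frac{3}{x} + \frac{4}{x^2} \right)}{\lim_{x \to \infty} \left( 3 + \frac{8}{x} + \frac{1}{x^2} \right)} && \text{arithmetic of limits} \\
&= \frac{\lim_{x \to \infty} 1 - \lim_{x \to \infty} \frac{3}{x} + \lim_{x \to \infty} \frac{4}{x^2}}{\lim_{x \to \infty} 3 + \lim_{x \to \infty} \frac{8}{x} + \lim_{x \to \infty} \frac{1}{x^2}} && \text{more arithmetic of limits} \\
&= \frac{1 - 0 + 0}{3 + 0 + 0} = \frac{1}{3}.
\end{align*}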

Example 2.1.31
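The "pull out the dominant term" trick can be checked numerically by evaluating both forms of the ratio at increasingly large x. This is only a sketch: the names `r` and `r_rewritten` and the sample values in `xs` are illustrative assumptions.

```python
def r(x):
    """Original form: (x^2 - 3x + 4) / (3x^2 + 8x + 1)."""
    return (x**2 - 3 * x + 4) / (3 * x**2 + 8 * x + 1)

def r_rewritten(x):
    """Same ratio after dividing numerator and denominator by x^2."""
    return (1 - 3 / x + 4 / x**2) / (3 + 8 / x + 1 / x**2)

xs = [10.0**k for k in range(1, 7)]
for x in xs:
    print(f"x = {x:>9.0f}   r(x) = {r(x):.8f}   rewritten = {r_rewritten(x):.8f}")
```

Both forms agree and approach 1/3, and in the rewritten form it is visible that every term other than 1 and 3 shrinks to zero.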

The following one gets a little harder.


Example 2.1.32
Find the limit as $x \to \infty$ of $\frac{\sqrt{4x^2 + 1}}{5x - 1}$.
We use the same trick: try to work out what is the biggest term in the numerator and
denominator and pull it to one side.

• The denominator is dominated by $5x$.

• The biggest contribution to the numerator comes from the $4x^2$ inside the square-root. When
we pull $x^2$ outside the square-root it becomes $x$, so the numerator is dominated by $x \cdot \sqrt{4} = 2x$


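• To see this more explicitly rewrite the numerator

\[ \sqrt{4x^2 + 1} = \sqrt{x^2 \left( 4 + 1/x^2 \right)} = \sqrt{x^2} \sqrt{4 + 1/x^2} = x \sqrt{4 + 1/x^2}. \]

• Thus the limit as $x \to \infty$ is

\begin{align*}
\lim_{x \to \infty} \frac{\sqrt{4x^2 + 1}}{5x - 1} &= \lim_{x \to \infty} \frac{x \sqrt{4 + 1/x^2}}{x (5 - 1/x)} \\
&= \lim_{x \to \infty} \frac{\sqrt{4 + 1/x^2}}{5 - 1/x} \\
&= \frac{2}{5}.
\end{align*}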

Example 2.1.32
Now let us also think about the limit of the same function, $\frac{\sqrt{4x^2 + 1}}{5x - 1}$, as $x \to -\infty$. There is
something subtle going on because of the square-root. First consider the function [11]
\[ h(t) = \sqrt{t^2}. \]
Evaluating this at $t = 7$ gives
\[ h(7) = \sqrt{7^2} = \sqrt{49} = 7. \]
We’ll get much the same thing for any $t \ge 0$. For any $t \ge 0$, $h(t) = \sqrt{t^2}$ returns exactly $t$. However
now consider the function at $t = -3$
\[ h(-3) = \sqrt{(-3)^2} = \sqrt{9} = 3 = -(-3); \]
that is, the function is returning $-1$ times the input.
This is because when we defined $\sqrt{\phantom{t}}$, we defined it to be the positive square-root, i.e. the function
$\sqrt{t}$ can never return a negative number. So being more careful
\[ h(t) = \sqrt{t^2} = |t|, \]
where the $|t|$ is the absolute value of $t$. You are perhaps used to thinking of absolute value as “remove
the minus sign”, but this is not quite correct. Let’s sketch the function:

Figure 2.1.2.

11 Just to change things up let’s use t and h(t ) instead of the ubiquitous x and f (x).


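It is a piecewise function defined by

\[ |x| = \begin{cases} x & x \ge 0 \\ -x & x < 0. \end{cases} \]

Hence our function $h(t)$ is really

\[ h(t) = \sqrt{t^2} = \begin{cases} t & t \ge 0 \\ -t & t < 0 \end{cases} \]

so that when we evaluate $h(-7)$ it is

\[ h(-7) = \sqrt{(-7)^2} = \sqrt{49} = 7 = -(-7). \]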
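The identity between the square root of a square and the absolute value is easy to confirm for a few inputs. The snippet below is a small illustration using the standard `math.sqrt` function; the list `inputs` is an arbitrary choice of test values.

```python
import math

def h(t):
    """h(t) = sqrt(t^2), which should equal |t| for every real t."""
    return math.sqrt(t**2)

inputs = [7, -3, -7, 0, 2.5, -2.5]
for t in inputs:
    # For negative t the result is -t, i.e. -1 times the input.
    print(f"t = {t:>5}   sqrt(t^2) = {h(t):>4}   |t| = {abs(t):>4}")
```

For the negative inputs the output is the input multiplied by -1, which is exactly the second branch of the piecewise definition.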

We are now ready to examine the limit as x Ñ ´8 in our previous example. Mostly it is copy and
paste from above.
Example 2.1.33
Find the limit as $x \to -\infty$ of $\frac{\sqrt{4x^2 + 1}}{5x - 1}$.
We use the same trick: try to work out what is the biggest term in the numerator and
denominator and pull it to one side. Since we are taking the limit as $x \to -\infty$ we should think of $x$
as a large negative number.
• The denominator is dominated by $5x$.
• The biggest contribution to the numerator comes from the $4x^2$ inside the square-root. When
we pull the $x^2$ outside a square-root it becomes $|x| = -x$ (since we are taking the limit as
$x \to -\infty$), so the numerator is dominated by $-x \cdot \sqrt{4} = -2x$
• To see this more explicitly rewrite the numerator
\begin{align*}
\sqrt{4x^2 + 1} &= \sqrt{x^2 \left( 4 + 1/x^2 \right)} = \sqrt{x^2} \sqrt{4 + 1/x^2} \\
&= |x| \sqrt{4 + 1/x^2} && \text{and since } x < 0 \text{ we have} \\
&= -x \sqrt{4 + 1/x^2}
\end{align*}

• Thus the limit as $x \to -\infty$ is

\begin{align*}
\lim_{x \to -\infty} \frac{\sqrt{4x^2 + 1}}{5x - 1} &= \lim_{x \to -\infty} \frac{-x \sqrt{4 + 1/x^2}}{x (5 - 1/x)} \\
&= \lim_{x \to -\infty} \frac{-\sqrt{4 + 1/x^2}}{5 - 1/x} \\
&= -\frac{2}{5}.
\end{align*}

Example 2.1.33
So the limit as x Ñ ´8 is almost the same but we gain a minus sign. This is definitely not the case
in general — you have to think about each example separately.
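The change of sign between the two limits at infinity can be seen by evaluating the function at large positive and large negative inputs. This is an illustrative sketch only; the function name `q` and the sample magnitudes are assumptions.

```python
import math

def q(x):
    """q(x) = sqrt(4x^2 + 1) / (5x - 1)."""
    return math.sqrt(4 * x**2 + 1) / (5 * x - 1)

magnitudes = [10.0**k for k in range(1, 7)]
for m in magnitudes:
    # The numerator is always positive; only the denominator changes sign.
    print(f"x = {m:>9.0f}: {q(m):+.7f}    x = {-m:>10.0f}: {q(-m):+.7f}")
```

The right-hand column approaches -2/5 because the square root stays positive while 5x - 1 becomes negative, which is the minus sign picked up from |x| = -x.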
Here is a sketch of the function in question.


Figure 2.1.3.

Example 2.1.34
Compute the following limit:
\[ \lim_{x \to \infty} \left( x^{7/5} - x \right) \]

From our asymptotic reasoning, we know the higher-power power function will dominate for
large values of $x$. So, although both $x^{7/5}$ and $x$ grow without bound as $x \to \infty$, the first term will be
much bigger. So, we expect $\lim_{x \to \infty} \left( x^{7/5} - x \right) = \infty$.
That’s a fine way of computing the limit, but for interest, let’s see how it would go using
arithmetic of limits. In this case we cannot use the arithmetic of limits to write this as
\begin{align*}
\lim_{x \to \infty} \left( x^{7/5} - x \right) &= \left( \lim_{x \to \infty} x^{7/5} \right) - \left( \lim_{x \to \infty} x \right) \\
&= \infty - \infty
\end{align*}

because the limits do not exist. We can only use the limit laws when the limits exist. So we should
go back and think some more.
When $x$ is very large, $x^{7/5} = x \cdot x^{2/5}$ will be much larger than $x$, so the $x^{7/5}$ term will dominate
the $x$ term. So factor out $x^{7/5}$ and rewrite it as
\[ x^{7/5} - x = x^{7/5} \left( 1 - \frac{1}{x^{2/5}} \right) \]
Consider what happens to each of the factors as $x \to \infty$

• For large $x$, $x^{7/5} > x$ (this is actually true for any $x > 1$). In the limit as $x \to +\infty$, $x$ becomes
arbitrarily large and positive, and $x^{7/5}$ must be bigger still, so it follows that

\[ \lim_{x \to \infty} x^{7/5} = +\infty. \]

• On the other hand, $(1 - x^{-2/5})$ becomes closer and closer to 1. We can use the arithmetic
of limits to write this as

\[ \lim_{x \to \infty} \left( 1 - x^{-2/5} \right) = \lim_{x \to \infty} 1 - \lim_{x \to \infty} x^{-2/5} = 1 - 0 = 1. \]


So the product of these two factors will become larger and larger (and positive) as $x$ moves off to infinity. Hence we have

\[ \lim_{x \to \infty} x^{7/5} \left( 1 - 1/x^{2/5} \right) = +\infty. \]

Example 2.1.34
But remember $+\infty$ and $-\infty$ are not numbers; the last equation in the example is shorthand for “the function becomes arbitrarily large”.

In the previous section we saw that finite limits and arithmetic interact very nicely (see Theorems 2.1.14 and 2.1.20). This enabled us to compute the limits of more complicated functions in terms of simpler ones. When limits of functions go to plus or minus infinity we are quite a bit more restricted in what we can deduce. The next theorem states some results concerning the sum, difference, ratio and product of infinite limits. Unfortunately, in many cases we cannot make general statements and the results will depend on the details of the problem at hand.


Theorem 2.1.35 (Arithmetic of infinite limits).

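Let $a, c, H \in \mathbb{R}$ and let $f, g, h$ be functions defined in an interval around $a$ (but they need not be defined at $x = a$), so that

\[ \lim_{x \to a} f(x) = +\infty \qquad \lim_{x \to a} g(x) = +\infty \qquad \lim_{x \to a} h(x) = H \]

• $\lim_{x \to a} \left( f(x) + g(x) \right) = +\infty$

• $\lim_{x \to a} \left( f(x) + h(x) \right) = +\infty$

• $\lim_{x \to a} \left( f(x) - g(x) \right)$ undetermined

• $\lim_{x \to a} \left( f(x) - h(x) \right) = +\infty$

• $\lim_{x \to a} c f(x) = \begin{cases} +\infty & c > 0 \\ 0 & c = 0 \\ -\infty & c < 0 \end{cases}$

• $\lim_{x \to a} \left( f(x) \cdot g(x) \right) = +\infty$

• $\lim_{x \to a} f(x) h(x) = \begin{cases} +\infty & H > 0 \\ -\infty & H < 0 \\ \text{undetermined} & H = 0 \end{cases}$

• $\lim_{x \to a} \dfrac{f(x)}{g(x)}$ undetermined

• $\lim_{x \to a} \dfrac{f(x)}{h(x)} = \begin{cases} +\infty & H > 0 \\ -\infty & H < 0 \\ \text{undetermined} & H = 0 \end{cases}$

• $\lim_{x \to a} \dfrac{h(x)}{f(x)} = 0$

• $\lim_{x \to a} f(x)^p = \begin{cases} +\infty & p > 0 \\ 0 & p < 0 \\ 1 & p = 0 \end{cases}$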

Note that by “undetermined” we mean that the limit may or may not exist, but cannot be
determined from the information given in the theorem. See Example 2.1.18 for an example of what
we mean by “undetermined”. Additionally consider the following example.
Example 2.1.36


Consider the following 3 functions:

\[ f(x) = x^{-2} \qquad g(x) = 2x^{-2} \qquad h(x) = x^{-2} - 1. \]

Their limits as $x \to 0$ are:

\[ \lim_{x \to 0} f(x) = +\infty \qquad \lim_{x \to 0} g(x) = +\infty \qquad \lim_{x \to 0} h(x) = +\infty. \]

Say we want to compute the limit of the difference of two of the above functions as $x \to 0$. Then the previous theorem cannot help us. This is not because it is too weak; rather, it is because the difference of two infinite limits can be plus infinity, minus infinity, or some finite number, depending on the details of the problem. For example,

\[
\begin{aligned}
\lim_{x \to 0} \left( f(x) - g(x) \right) &= \lim_{x \to 0} -x^{-2} = -\infty \\
\lim_{x \to 0} \left( f(x) - h(x) \right) &= \lim_{x \to 0} 1 = 1 \\
\lim_{x \to 0} \left( g(x) - h(x) \right) &= \lim_{x \to 0} \left( x^{-2} + 1 \right) = +\infty
\end{aligned}
\]
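The three different outcomes can also be seen numerically. The following is a minimal sketch (the sample values of $x$ are arbitrary choices of ours): it tabulates the three differences as $x$ shrinks towards 0.

```python
def f(x):
    return 1 / x**2

def g(x):
    return 2 / x**2

def h(x):
    return 1 / x**2 - 1

# Columns: x, f - g (heads to -infinity), f - h (stays at 1), g - h (heads to +infinity).
for x in (0.1, 0.01, 0.001):
    print(x, f(x) - g(x), f(x) - h(x), g(x) - h(x))
```

All three functions blow up near 0, yet their differences behave in three different ways, which is exactly why the theorem says "undetermined".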

Example 2.1.36

2.2 Asymptotes

Learning Objectives
• Evaluate limits of polynomial, rational, trigonometric, exponential, and logarithmic
functions.

• Explain using both informal language and the language of limits what it means for a
function to have a horizontal or vertical asymptote.

• Given a simple function, find its vertical and horizontal asymptotes by asymptotic
reasoning or by taking limits.

• Explain why it is not true that a function cannot cross its horizontal asymptote.

Definition 2.2.1.

Let $f(x)$ be a function. If $\lim_{x \to \infty} f(x) = L$ OR $\lim_{x \to -\infty} f(x) = L$, for some real number $L$, then we say the line $y = L$ is a horizontal asymptote of $f(x)$.

Example 2.2.2
Consider the function $f(x) = \frac{3x^4}{1 + x^4}$, pictured below.

[Figure: graph of $y = f(x)$]

For large positive and large negative values of x, the function looks nearly flat. To investigate
this ‘flatness,’ we can take limits at infinity. This can be done using algebra, or asymptotics.

Option 1, algebra:

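\[
\begin{aligned}
\lim_{x \to \infty} \frac{3x^4}{1 + x^4} &= \lim_{x \to \infty} \frac{3x^4}{1 + x^4} \cdot \frac{1/x^4}{1/x^4} \\
&= \lim_{x \to \infty} \frac{3}{\frac{1}{x^4} + 1} \\
&= \frac{3}{0 + 1} = 3
\end{aligned}
\]

That is: as $x$ gets larger and larger, $f(x)$ gets closer and closer to 3.
The computation is similar for $\lim_{x \to -\infty} f(x)$.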

Option 2, asymptotics: Let’s consider very large positive values of $x$. The denominator $1 + x^4$ behaves much like $x^4$ when $|x|$ is large, so the entire function behaves much like $\frac{3x^4}{x^4}$, which is just the constant 3. That is: for very large positive values of $x$, the function looks quite a lot like the horizontal line $y = 3$.
The computation is similar for $\lim_{x \to -\infty} f(x)$.
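Both options can be checked against actual values of the function. This is only a sketch, with sample inputs chosen by us: it prints $f(x)$ and the gap $3 - f(x)$ for increasingly large $x$.

```python
def f(x):
    """The function from this example: 3x^4 / (1 + x^4)."""
    return 3 * x**4 / (1 + x**4)

# The gap 3 - f(x) equals 3 / (1 + x^4), so it shrinks very quickly.
for x in (10.0, 100.0, 1000.0):
    print(f"x = {x:g}: f(x) = {f(x):.10f}, 3 - f(x) = {3 - f(x):.3e}")
```

Because only even powers of $x$ appear, $f(-x) = f(x)$, so the same table describes the behaviour for large negative $x$.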

So, this function has a horizontal asymptote, y = 3. This is often emphasized in a sketch with
dashed lines.

[Figure: graph of $y = f(x)$ with the horizontal asymptote $y = 3$ drawn as a dashed line]

Example 2.2.2

Example 2.2.3 (Horizontal asymptotes of $e^x$)

Question: Does $e^x$ have any horizontal asymptotes?
Solution: We should know the following two limits by heart:

\[ \lim_{x \to -\infty} e^x = 0 \quad \text{and} \quad \lim_{x \to \infty} e^x = \infty \]

Since 0 is a real number, we see that $e^x$ has a horizontal asymptote at $y = 0$. Since $\infty$ is not a real number, $y = 0$ is the only horizontal asymptote of $e^x$.

[Figure: graph of $y = e^x$]

Example 2.2.3

Example 2.2.4 (Functions that cross their horizontal asymptote)


In examples 2.2.2 and 2.2.3, the function never actually takes on the value of its horizontal asymptote. There is no real number $x$ for which $\frac{3x^4}{1 + x^4} = 3$, and there is no real number $x$ for which $e^x = 0$. However, there is no reason why a function in general can’t take on the value of its horizontal asymptote. We’ll show three examples below, of increasing complexity.

First example: A constant function is equal to its horizontal asymptote everywhere.


For example, consider $f(x) = 2$, shown below. Since $\lim_{x \to \infty} f(x) = 2$, $y = 2$ is a horizontal asymptote. There are lots of real numbers $x$ (in fact: all of them) where $f(x) = 2$.

[Figure: graph of $y = f(x)$]

Second example: A function might take on the value of its horizontal asymptote while the function
is busy not pretending to be constant.
A function such as $g(x) = \frac{x}{1 + x^2}$ has a horizontal asymptote of $y = 0$ both to the left, and to the right:

\[ \lim_{x \to -\infty} \frac{x}{1 + x^2} = 0 \quad \text{and} \quad \lim_{x \to \infty} \frac{x}{1 + x^2} = 0 \]

That is, when $x$ is very large (positive or negative) $g(x)$ is nearly constant. However, $g(x)$ is not ‘nearly constant’ everywhere. When $x$ is close to 0, $g(x)$ moves around quite a bit. And, at the origin, we just so happen to have $g(0) = 0$.

[Figure: graph of $y = g(x)$]

Third example: A function might cross its horizontal asymptote infinitely many times.
Consider $h(x) = \frac{\sin x}{x}$. As $|x|$ grows larger and larger, the magnitude, or absolute value, of $h(x)$ shrinks to 0:

\[ \lim_{x \to -\infty} \frac{\sin x}{x} = 0 \quad \text{and} \quad \lim_{x \to \infty} \frac{\sin x}{x} = 0 \]

However, the sign of the function changes endlessly: it’s positive for $0 < x < \pi$, negative for $\pi < x < 2\pi$, positive again for $2\pi < x < 3\pi$, etc. That leads to an oscillating behaviour. In particular, $h(x) = 0$ when $x$ is a nonzero integer multiple of $\pi$.
Remark: the oscillating behaviour in the sketch below has been exaggerated. In a more accurate sketch, $h(x)$ quickly appears indistinguishable from 0.

[Figure: graph of $y = h(x)$]
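The alternating signs and the shrinking size can both be observed directly. In the sketch below (an illustration only; sampling at the midpoints of the intervals $(k\pi, (k+1)\pi)$ is our own choice), the signs flip from one interval to the next while the magnitudes decrease.

```python
import math

def h(x):
    return math.sin(x) / x

# Midpoints of the intervals (k*pi, (k+1)*pi) for k = 0, 1, ..., 5.
midpoints = [(k + 0.5) * math.pi for k in range(6)]

# Sign of h on each interval: it should alternate, starting positive.
signs = [math.copysign(1, h(x)) for x in midpoints]
print(signs)

# Size of h at the same points: it should shrink, like 1/x.
sizes = [abs(h(x)) for x in midpoints]
print(sizes)
```

Each sign flip forces the graph to cross the line $y = 0$, which is the horizontal asymptote.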

Example 2.2.4

The counterpart to the horizontal asymptote is, not surprisingly, the vertical asymptote.


Definition 2.2.5.

Let $a$ be a real number and let $f(x)$ be a function. We say $f(x)$ has a vertical asymptote at $a$ if at least one of the following is true:

• $\lim_{x \to a} f(x) = \infty$, or $\lim_{x \to a} f(x) = -\infty$;

• $\lim_{x \to a^-} f(x) = \infty$, or $\lim_{x \to a^-} f(x) = -\infty$; or

• $\lim_{x \to a^+} f(x) = \infty$, or $\lim_{x \to a^+} f(x) = -\infty$.

That is, a function has a vertical asymptote where it has an infinite discontinuity (see section 2.3
for more about continuity).

Example 2.2.6 (Symmetrical vertical asymptote)


The function $y = \frac{1}{x^2}$ has a vertical asymptote at $x = 0$, because $\lim_{x \to 0} \frac{1}{x^2} = \infty$.
(This function also has a horizontal asymptote: $y = 0$.)

[Figure: graph of $y = \frac{1}{x^2}$]

Example 2.2.6

Example 2.2.7 (Asymmetrical vertical asymptote)


The function $y = \frac{1}{x}$ has a vertical asymptote at $x = 0$, because $\lim_{x \to 0^+} \frac{1}{x} = \infty$ and $\lim_{x \to 0^-} \frac{1}{x} = -\infty$.
(This function also has a horizontal asymptote at $y = 0$.)

[Figure: graph of $y = \frac{1}{x}$]

Example 2.2.7

Example 2.2.8 (One-sided vertical asymptote)


The function $y = \log x$ has a vertical asymptote at $x = 0$, because $\lim_{x \to 0^+} \log x = -\infty$. This function has no horizontal asymptotes.

[Figure: graph of $y = \log x$]

Example 2.2.8

Example 2.2.9 (Using limits to sketch)


Consider the function $f(x) = e^{1/x}$. Use the limits as $x$ approaches 0, and as $x$ goes to positive or negative infinity, to give a very rough sketch of $y = e^{1/x}$. Include all asymptotes.
To evaluate the one-sided limits as $x$ approaches 0, we use our limit laws. In particular, we’ll break down the function into two pieces: the exponential piece, and the $\frac{1}{x}$ piece. To make this very explicit, we’ll set $t = \frac{1}{x}$.

\[
\begin{aligned}
\lim_{x \to 0^+} t &= \lim_{x \to 0^+} \frac{1}{x} = \infty \\
\implies \lim_{x \to 0^+} e^{1/x} &= \lim_{t \to \infty} e^t = \infty
\end{aligned}
\]

This tells us $e^{1/x}$ has a vertical asymptote at $x = 0$. Now let’s find the limit from the other side.

\[
\begin{aligned}
\lim_{x \to 0^-} t &= \lim_{x \to 0^-} \frac{1}{x} = -\infty \\
\implies \lim_{x \to 0^-} e^{1/x} &= \lim_{t \to -\infty} e^t = 0
\end{aligned}
\]

So, interestingly, the limit from the right is infinite, while the limit from the left is finite. Finally, let’s consider large-magnitude values of $x$:

\[
\begin{aligned}
\lim_{x \to \pm\infty} t &= \lim_{x \to \pm\infty} \frac{1}{x} = 0 \\
\implies \lim_{x \to \pm\infty} e^{1/x} &= e^0 = 1
\end{aligned}
\]

So there is a horizontal asymptote of y = 1 on both the left and the right.


Let’s combine these observations with some pre-calculus knowledge.

• When $x$ is large and positive, $e^{1/x} \approx 1$. Since $x > 0$, then $\frac{1}{x} > 0$, so $e^{1/x} > 1$. So, on the far right of our graph, our function will be close to 1, but a little larger.

• When $x$ is large and negative, $e^{1/x} \approx 1$. Since $x < 0$, then $\frac{1}{x} < 0$, so $e^{1/x} < 1$. So, on the far left of our graph, our function will be close to 1, but a little smaller.

• When $x$ approaches 0 from the left, $e^{1/x}$ approaches 0. So, from the left, our function will approach the origin. Note, however, that $e^{1/x}$ is not defined at $x = 0$.

• When $x$ approaches 0 from the right, $e^{1/x}$ will blow up, increasing without bound.

These behaviours together can help us make a rough sketch of y = f (x).
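If you would like to double-check these four behaviours before sketching, a few sample values are enough. The sketch below is only a sanity check with inputs we picked ourselves; it is not a substitute for the limit computations above.

```python
import math

def f(x):
    """The function from this example: e^(1/x), defined for x != 0."""
    return math.exp(1 / x)

print("x -> 0 from the left: ", [f(x) for x in (-0.1, -0.01, -0.005)])
print("x -> 0 from the right:", [f(x) for x in (0.5, 0.1, 0.01)])
print("large |x|:            ", [f(x) for x in (-1000.0, 1000.0)])
```

The first list collapses towards 0, the second grows enormously, and the last two values sit just below and just above 1.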


For interest, a more accurate graph of y = f (x) is shown below. We repeat that you do not need
to know how to achieve this level of accuracy right now, but you will learn it later.

[Figure: graph of $y = e^{1/x}$]

Example 2.2.9

2.3 Limits and continuity

Learning Objectives
• Explain informally and formally what it means for a function to be continuous on its
domain.

• Identify and classify points of discontinuity (jump, infinite, removable).

• Determine where a given function is continuous. Use formal notation as well as informal explanation.

• Given a function defined with parameters, select parameter values that make the
function continuous.

We have seen that computing the limits of some functions (polynomials and rational functions) is very easy because

\[ \lim_{x \to a} f(x) = f(a). \]

That is, the limit as $x$ approaches $a$ is just $f(a)$. Roughly speaking, the reason we can compute the limit this way is that these functions do not have any abrupt jumps near $a$.
Many other functions have this property, $\sin(x)$ for example. A function with this property is called “continuous” and there is a precise mathematical definition for it.


Definition 2.3.1.

A function $f(x)$ is continuous at $a$ if

\[ \lim_{x \to a} f(x) = f(a). \]

If a function is not continuous at $a$ then it is said to be discontinuous at $a$.

When we write that $f$ is continuous without specifying a point, then typically this means that $f$ is continuous at $a$ for all $a \in \mathbb{R}$.
When we write that $f(x)$ is continuous on the open interval $(a, b)$ then the function is continuous at every point $c$ satisfying $a < c < b$.

So if a function is continuous at $x = a$ we immediately know that

• $f(a)$ exists

• $\lim_{x \to a^-} f(x)$ exists and is equal to $f(a)$, and

• $\lim_{x \to a^+} f(x)$ exists and is equal to $f(a)$.

We already know from our work above that polynomials are continuous, and that rational
functions are continuous at all points in their domains — i.e. where their denominators are non-zero.
As we did for limits, we will see that continuity interacts “nicely” with arithmetic. This will allow
us to construct complicated continuous functions from simpler continuous building blocks (like
polynomials).
But first, a few examples. . .
Example 2.3.2
Consider the functions drawn below

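These are

\[
f(x) = \begin{cases} x & x < 1 \\ x + 2 & x \ge 1 \end{cases}
\qquad
g(x) = \begin{cases} 1/x^2 & x \ne 0 \\ 0 & x = 0 \end{cases}
\qquad
h(x) = \begin{cases} \frac{x^3 - x^2}{x - 1} & x \ne 1 \\ 0 & x = 1 \end{cases}
\]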

Determine where they are continuous and discontinuous:


• When $x < 1$ then $f(x)$ is a straight line (and so a polynomial) and so it is continuous at every point $x < 1$. Similarly when $x > 1$ the function is a straight line and so it is continuous at every point $x > 1$. The only point which might be a discontinuity is at $x = 1$. We see that the one-sided limits are different. Hence the limit at $x = 1$ does not exist and so the function is discontinuous at $x = 1$.

But note that $f(x)$ is continuous from one side. Which?

• The middle case is much like the previous one. When $x \ne 0$, $g(x)$ is a rational function and so is continuous everywhere on its domain (which is all reals except $x = 0$). Thus the only point where $g(x)$ might be discontinuous is at $x = 0$. We see that neither of the one-sided limits exists at $x = 0$, so the limit does not exist at $x = 0$. Hence the function is discontinuous at $x = 0$.

• We have seen the function $h(x)$ before. By the same reasoning as above, we know it is continuous except at $x = 1$ which we must check separately.

By definition of $h(x)$, $h(1) = 0$. We must compare this to the limit as $x \to 1$. We did this before. For $x \ne 1$,

\[ \frac{x^3 - x^2}{x - 1} = \frac{x^2 (x - 1)}{x - 1} = x^2 \]

So $\lim_{x \to 1} \frac{x^3 - x^2}{x - 1} = \lim_{x \to 1} x^2 = 1 \ne h(1)$. Hence $h$ is discontinuous at $x = 1$.

Example 2.3.2

This example illustrates different sorts of discontinuities:

• The function $f(x)$ has a “jump discontinuity” because the function “jumps” from one finite value on the left to another value on the right.

• The second function, $g(x)$, has an “infinite discontinuity” since $\lim_{x \to 0} g(x) = +\infty$.

• The third function, $h(x)$, has a “removable discontinuity” because we could make the function continuous at that point by redefining the function at that point, i.e. setting $h(1) = 1$. That is

\[ \text{new function } h(x) = \begin{cases} \frac{x^3 - x^2}{x - 1} & x \ne 1 \\ 1 & x = 1 \end{cases} \]
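To see the removable discontinuity in numbers, one can compare values of $h$ near $x = 1$ with the value at $x = 1$. This is a minimal sketch; the names `h` and `h_new` and the step sizes are our own choices for illustration.

```python
def h(x):
    """h from the example: (x^3 - x^2)/(x - 1) away from x = 1, and 0 at x = 1."""
    if x == 1:
        return 0.0
    return (x**3 - x**2) / (x - 1)

def h_new(x):
    """The redefined function: same formula, but with the value 1 at x = 1."""
    if x == 1:
        return 1.0
    return (x**3 - x**2) / (x - 1)

# Values just to the left and right of x = 1 approach 1 from both sides.
for dx in (1e-1, 1e-3, 1e-5):
    print(dx, h(1 - dx), h(1 + dx))

print("h(1) =", h(1), " h_new(1) =", h_new(1))
```

The nearby values approach 1, which matches `h_new(1)` but not `h(1)`; changing that single value is all it takes to remove the discontinuity.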

Showing a function is continuous can be a pain, but just as the limit laws help us compute
complicated limits in terms of simpler limits, we can use them to show that complicated functions
are continuous by breaking them into simpler pieces.


Theorem 2.3.3 (Arithmetic of continuity).

Let $a, c \in \mathbb{R}$ and let $f(x)$ and $g(x)$ be functions that are continuous at $a$. Then the following functions are also continuous at $x = a$:

• $f(x) + g(x)$ and $f(x) - g(x)$,

• $c f(x)$ and $f(x) g(x)$, and

• $\frac{f(x)}{g(x)}$ provided $g(a) \ne 0$.

Above we stated that polynomials and rational functions are continuous (being careful about
domains of rational functions — we must avoid the denominators being zero) without making it a
formal statement. This is easily fixed. . .

Lemma 2.3.4.

Let $c \in \mathbb{R}$. The functions

\[ f(x) = x \qquad g(x) = c \]

are continuous everywhere on the real line.

This isn’t quite the result we wanted (that’s a couple of lines below) but it is a small result that
we can combine with the arithmetic of limits to get the result we want. Such small helpful results
are called “lemmas” and they will arise more as we go along.
Now since we can obtain any polynomial and any rational function by carefully adding, subtracting, multiplying and dividing the functions $f(x) = x$ and $g(x) = c$, the above lemma combines with the “arithmetic of continuity” theorem to give us the result we want:

Theorem 2.3.5 (Continuity of polynomials and rational functions).

Every polynomial is continuous everywhere. Similarly every rational function is continuous except where its denominator is zero (i.e. on all its domain).

With some more work this result can be extended to wider families of functions:


Theorem 2.3.6.

The following functions are continuous everywhere in their domains

• polynomials, rational functions

• roots and powers

• trig functions and their inverses

• exponential and the logarithm

We haven’t encountered inverse trigonometric functions, nor exponential functions or logarithms, but we will see them in the next chapter. For the moment, just file the information away.
Using a combination of the above results you can show that many complicated functions are
continuous except at a few points (usually where a denominator is equal to zero).
Example 2.3.7
Where is the function $f(x) = \frac{\sin(x)}{2 + \cos(x)}$ continuous?
We just break things down into pieces and then put them back together keeping track of where
things might go wrong.
• The function is a ratio of two pieces, so check if the numerator is continuous, the denominator is continuous, and if the denominator might be zero.
• The numerator is $\sin(x)$ which is “continuous on its domain” according to one of the above theorems. Its domain is all real numbers$^{12}$, so it is continuous everywhere. No problems here.
• The denominator is the sum of 2 and $\cos(x)$. Since 2 is a constant it is continuous everywhere. Similarly (we just checked things for the previous point) we know that $\cos(x)$ is continuous everywhere. Hence the denominator is continuous.
• So we just need to check if the denominator is zero. One of the facts that we should know$^{13}$ is that

\[ -1 \le \cos(x) \le 1 \]

and so by adding 2 we get

\[ 1 \le 2 + \cos(x) \le 3 \]

Thus no matter what value of $x$, $2 + \cos(x) \ge 1$ and so cannot be zero.

• So the numerator is continuous, the denominator is continuous and nowhere zero, so the function is continuous everywhere.

12 Remember that sin and cos are defined on all real numbers, so $\tan(x) = \sin(x)/\cos(x)$ is continuous everywhere except where $\cos(x) = 0$. This happens when $x = \frac{\pi}{2} + n\pi$ for any integer $n$. If you cannot remember where $\tan(x)$ “blows up” or $\sin(x) = 0$ or $\cos(x) = 0$ then you should definitely revise trigonometric functions. Come to think of it, just revise them anyway.
13 If you do not know this fact then you should revise trigonometric functions. See the previous footnote.


If the function were changed to $\frac{\sin(x)}{x^2 - 5x + 6}$, much of the same reasoning can be used. Being a little terse we could answer with:

• Numerator and denominator are continuous.

• Since $x^2 - 5x + 6 = (x - 2)(x - 3)$ the denominator is zero when $x = 2, 3$.

• So the function is continuous everywhere except possibly at $x = 2, 3$. In order to verify that the function really is discontinuous at those points, it suffices to verify that the numerator is non-zero at $x = 2, 3$. Indeed we know that $\sin(x)$ is zero only when $x = n\pi$ (for any integer $n$). Hence $\sin(2), \sin(3) \ne 0$. Thus the numerator is non-zero, while the denominator is zero and hence $x = 2, 3$ really are points of discontinuity.

Note that this example raises a subtle point about checking continuity when numerator and denominator are simultaneously zero. There are quite a few possible outcomes in this case and we need
more sophisticated tools to adequately analyse the behaviour of functions near such points. We will
return to this question later in the text after we have developed Taylor expansions.
Example 2.3.7

So we know what happens when we add, subtract, multiply and divide, but what about when we compose functions? Well, limits and compositions work nicely when things are continuous.

Theorem 2.3.8 (Compositions and continuity).

If $f$ is continuous at $b$ and $\lim_{x \to a} g(x) = b$ then $\lim_{x \to a} f(g(x)) = f(b)$. I.e.

\[ \lim_{x \to a} f(g(x)) = f\left( \lim_{x \to a} g(x) \right) \]

Hence if $g$ is continuous at $a$ and $f$ is continuous at $g(a)$ then the composite function $(f \circ g)(x) = f(g(x))$ is continuous at $a$.

So when we compose two continuous functions we get a new continuous function.


We can put this to use
Example 2.3.9
Where are the following functions continuous?

\[ f(x) = \sin\left( x^2 + \cos(x) \right) \qquad g(x) = \sqrt{\sin(x)} \]

Our first step should be to break the functions down into pieces and study them. When we put them
back together we should be careful of dividing by zero, or falling outside the domain.

• The function $f(x)$ is the composition of $\sin(x)$ with $x^2 + \cos(x)$.

• These pieces, $\sin(x)$, $x^2$, $\cos(x)$, are continuous everywhere.

• So the sum $x^2 + \cos(x)$ is continuous everywhere.

• And hence the composition of $\sin(x)$ and $x^2 + \cos(x)$ is continuous everywhere.

The second function is a little trickier.

• The function $g(x)$ is the composition of $\sqrt{x}$ with $\sin(x)$.

• $\sqrt{x}$ is continuous on its domain $x \ge 0$.

• $\sin(x)$ is continuous everywhere, but it is negative in many places.

• In order for $g(x)$ to be defined and continuous we must restrict $x$ so that $\sin(x) \ge 0$.

• Recall the graph of $\sin(x)$. We see that $\sin(x) \ge 0$ when $x \in [0, \pi]$ or $x \in [2\pi, 3\pi]$ or $x \in [-2\pi, -\pi]$ or so on. To be more precise, $\sin(x)$ is non-negative when $x \in [2n\pi, (2n + 1)\pi]$ for any integer $n$.

• Hence $g(x)$ is continuous when $x \in [2n\pi, (2n + 1)\pi]$ for any integer $n$.
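The interval description of the domain can be tested against the condition $\sin(x) \ge 0$ directly. The sketch below is an illustration under our own assumptions: the helper names are hypothetical, and the sample points are chosen to avoid multiples of $\pi$, where floating-point rounding makes $\sin(x)$ slightly nonzero.

```python
import math

def in_domain(x):
    """True when sin(x) >= 0, so that sqrt(sin(x)) is a real number."""
    return math.sin(x) >= 0

def in_claimed_intervals(x):
    """True when x lies in [2n*pi, (2n+1)*pi) for some integer n.

    The right endpoint is left out here only to keep the test simple.
    """
    return math.floor(x / math.pi) % 2 == 0

# Compare the two descriptions on a spread of sample points.
samples = [0.37 * k for k in range(-60, 61)]
mismatches = [x for x in samples if in_domain(x) != in_claimed_intervals(x)]
print(mismatches)
```

An empty list of mismatches means the two descriptions agree on every sample point.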

Example 2.3.9

Continuous functions are very nice (mathematically speaking). Functions from the “real world”
tend to be continuous (though not always). The key aspect that makes them nice is the fact that they
don’t jump about.

Differentiation

Chapter 3

INTRODUCTION TO THE DERIVATIVE

3.1 Review: lines

Learning Objectives
• Given an equation for a line, sketch the line, and identify its slope.

• Describe negative / positive / zero slope as corresponding to a line that is decreasing / increasing / constant over an interval.

• Find a line from two points; from a point and a slope; or from a clearly labelled graph.

• Find the slope at various points of a piecewise-linear function.

As you’ll soon see, derivatives and lines are closely related. To make the discussion of derivatives smoother$^{1}$, we’ll do a quick review of lines.

3.1.1 §§ Equations and sketches


If you know the slope of a line, and one point on that line, then you can sketch it. Recall that
the slope of a line, often remembered as “rise over run,” is the ratio of the changes of the vertical
component (or dependent variable) to the horizontal component (or independent variable) between
(any) two distinct points along the line.
For example, the line above has slope $-\frac{1}{2}$. For any two distinct points $(x_0, y_0)$ and $(x_1, y_1)$ on the line, the ratio $\frac{y_1 - y_0}{x_1 - x_0}$ is the same: $-\frac{1}{2}$.

\[ \frac{0 - 2}{-1 - (-5)} = \frac{-1 - 0}{1 - (-1)} = \frac{-2.5 - (-1)}{4 - 1} = \frac{-2.5 - 2}{4 - (-5)} = -\frac{1}{2} \]
Generally speaking, you can think of the slope of a line as its steepness.

1 This is a pun, but you might need to read a bit further to recognize it.


• A line of slope 0 is flat.

• A line whose slope has a large absolute value is steep.

• A line of positive slope is increasing (going upwards as you move to the right).

• A line of negative slope is decreasing (going downwards as you move to the right).
The equation of a line is in ‘slope-intercept form’ if it has the form

y = mx + b

where $m$ and $b$ are real numbers. The slope of the line is $m$, and it passes through the point $(0, b)$.
Example 3.1.1
Sketch the line $y = 3x - 2$.
Solution: The slope of the line is 3, and the line passes through the point $(0, -2)$. So, we can start with putting a point at $(0, -2)$. Then, move right 1 and up 3 to find another point on the line, $(1, 1)$. Draw the line between these two points.

[Figure: graph of the line $y = 3x - 2$]

Example 3.1.1

Example 3.1.2
Sketch the line $y = 2(1 - x)$.
Solution: This isn’t in slope-intercept form, but we can manipulate it to be:

\[ y = -2x + 2 . \]

So, the slope of the line is $-2$, and the line passes through the point $(0, 2)$.

Example 3.1.2

3.1.2 §§ Different equation forms


Slope-intercept form is a common form to learn in high school, but it is not the only standard
equation template for a line. Useful to us will be ‘point-slope’ form. A line passing through a point $(x_0, y_0)$, with slope $m$, can be described with the equation

\[ y - y_0 = m(x - x_0) . \]

Example 3.1.3
Give an equation for a line passing through the point (1, 2) with slope 3.
Solution: We aren’t told which format to put it in. Since we have a point and a slope, point-slope
format is easiest:
\[ y - 2 = 3(x - 1) . \]

Example 3.1.3

Example 3.1.4
Give an equation for a line passing through the points (1, 2) and (3, 3).
Solution: If we had the slope, we could use point-slope. So, let’s find the slope!
\[ m = \frac{\Delta y}{\Delta x} = \frac{3 - 2}{3 - 1} = \frac{1}{2} . \]

Now, we can write the line in point-slope form. The following two equations are equivalent to one another (and, therefore, both correct answers):

\[ y - 2 = \frac{1}{2}(x - 1) \]

\[ y - 3 = \frac{1}{2}(x - 3) \]
Since we weren’t told which form to put the equation into, other answers are possible.
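The two-step recipe (find the slope, then use point-slope form) translates directly into a short program. This is a minimal sketch using exact fractions; the helper names `slope` and `on_line` are ours and purely illustrative.

```python
from fractions import Fraction

def slope(p, q):
    """Slope of the line through points p and q (which must have different x-values)."""
    (x0, y0), (x1, y1) = p, q
    return Fraction(y1 - y0, x1 - x0)

def on_line(point, anchor, m):
    """Check the point-slope equation y - y0 = m (x - x0) at the given point."""
    (x, y), (x0, y0) = point, anchor
    return y - y0 == m * (x - x0)

p, q = (1, 2), (3, 3)
m = slope(p, q)
print(m)
print(on_line(q, p, m))  # does q satisfy the equation anchored at p?
print(on_line(p, q, m))  # does p satisfy the equation anchored at q?
```

Both checks succeed, which reflects the fact that either point can serve as $(x_0, y_0)$ in point-slope form.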
Example 3.1.4

Example 3.1.5
Find an equation for the line sketched below. Each gridline corresponds to a single unit.
[Figure: a line drawn on a grid of unit squares]

Solution: From the gridlines, we can be fairly certain that the line passes through the points $(-4, -3)$ and $(4, 2)$. (These are the only places shown where the line intersects a point on the grid where both the $x$- and $y$-values are integers. It’s tough to guess at the exact value of a point on the line elsewhere, so we don’t try.) As in Example 3.1.4, we use these two points to find the slope:

\[ m = \frac{2 - (-3)}{4 - (-4)} = \frac{5}{8} \]

From here, the easiest format to use is point-slope. The two equations below are equivalent:

\[ y + 3 = \frac{5}{8}(x + 4) \]

\[ y - 2 = \frac{5}{8}(x - 4) \]

Example 3.1.5

Example 3.1.6
Each equation below describes a line. Sort them into collections of equations describing the same
line.
A. $y = 2x + 1$        D. $y - 5 = 2(x - 2)$        G. $x = \frac{y - 1}{2}$

B. $y = 2x - 1$        E. $2y + 2 = 4x$        H. $2 - y = x + 1$

C. $y = 1 - x$        F. $y - 1 = 2(x - 1)$        I. $y + 3 = 2(x + 1)$

Solution: One method is to manipulate each equation algebraically until they are in slope-intercept form, and then see which are the same. Equations A and B are already in slope-intercept form.

C: $y = 1 - x$
$\iff y = -x + 1$        C is equivalent to neither A nor B
D: $y - 5 = 2(x - 2)$
$\iff y - 5 = 2x - 4$
$\iff y = 2x + 1$        D is equivalent to A
E: $2y + 2 = 4x$
$\iff 2y = 4x - 2$
$\iff y = 2x - 1$        E is equivalent to B
F: $y - 1 = 2(x - 1)$
$\iff y - 1 = 2x - 2$
$\iff y = 2x - 1$        F is equivalent to B
G: $x = \frac{y - 1}{2}$
$\iff 2x = y - 1$
$\iff y = 2x + 1$        G is equivalent to A
H: $2 - y = x + 1$
$\iff -2 + y = -x - 1$
$\iff y = -x + 1$        H is equivalent to C
I: $y + 3 = 2(x + 1)$
$\iff y + 3 = 2x + 2$
$\iff y = 2x - 1$        I is equivalent to B

All together:

• A, D, and G are all equations for the same line;


• B, E, F, and I are all equations for the same line; and

• C and H are both equations for the same line.
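The sorting can also be automated, which is a handy way to check the algebra. The sketch below rests on our own modelling choice: each equation is stored as a function `residual(x, y)` equal to (left side) minus (right side), and because every residual is linear in $y$ we can solve for $y$ from two evaluations. All helper names are hypothetical.

```python
from fractions import Fraction

# Each equation is stored as residual(x, y) = (left side) - (right side).
equations = {
    "A": lambda x, y: y - (2 * x + 1),
    "B": lambda x, y: y - (2 * x - 1),
    "C": lambda x, y: y - (1 - x),
    "D": lambda x, y: (y - 5) - 2 * (x - 2),
    "E": lambda x, y: (2 * y + 2) - 4 * x,
    "F": lambda x, y: (y - 1) - 2 * (x - 1),
    "G": lambda x, y: x - (y - 1) / 2,
    "H": lambda x, y: (2 - y) - (x + 1),
    "I": lambda x, y: (y + 3) - 2 * (x + 1),
}

def solve_for_y(residual, x):
    """Solve residual(x, y) = 0 for y, using that the residual is linear in y."""
    r0 = residual(x, Fraction(0))
    r1 = residual(x, Fraction(1))
    return -r0 / (r1 - r0)

def slope_intercept(residual):
    """Return (m, b) so that the line is y = m x + b."""
    b = solve_for_y(residual, Fraction(0))
    m = solve_for_y(residual, Fraction(1)) - b
    return m, b

# Group equation labels by their (slope, intercept) pair.
groups = {}
for name, residual in equations.items():
    groups.setdefault(slope_intercept(residual), []).append(name)

for (m, b), names in groups.items():
    print(f"slope {m}, intercept {b}: {', '.join(names)}")
```

Exact fractions are used so that equal lines produce exactly equal (slope, intercept) pairs, with no floating-point rounding to worry about.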

Example 3.1.6

3.1.3 §§ Slopes at different points

Suppose every piece of a piecewise-defined function is linear. Then at any point (except where the
function switches from one piece to the other), the function locally looks like a line, so we can find
its slope at that point.

Example 3.1.7
Consider the function below:

\[
f(x) = \begin{cases} 3 - x & \text{for } x \le -2 \\ 2x + 1 & \text{for } -2 < x \le 1 \\ 5 - 2x & \text{for } x > 1 \end{cases}
\]

Sketch $y = f(x)$, and give the slope of the line making up the function at all points $x$ except $x = -2$ and $x = 1$. (In fact, we can call these numbers the slope of the function itself.)
Solution:

• For $x \le -2$, we sketch a line with slope $-1$. When $x$ is (say) $-3$, then $y$ is 6, so one point we know is $(-3, 6)$.

• For $-2 < x \le 1$, the line to draw has slope 2 and passes through the point $(0, 1)$.

• For $x > 1$, we sketch a line with slope $-2$. When $x$ is (say) 2, then $y$ is 1, so one point we know is $(2, 1)$.

The slopes of the lines making up the function are:

\[
\begin{cases} -1 & \text{for } -\infty < x < -2 \\ 2 & \text{for } -2 < x < 1 \\ -2 & \text{for } 1 < x < \infty \end{cases}
\]

As you’ll see later, it won’t make sense to us to talk about the “slope” of the function when $x$ is $-2$ or 1. At these places, $f(x)$ doesn’t look much like a line.
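The slopes can be recovered from the function itself by measuring rise over run between two nearby points on the same piece. The following is a sketch under our own assumptions: the helper `local_slope` and its step size are illustrative choices, and the sample points are kept away from $x = -2$ and $x = 1$.

```python
def f(x):
    """The piecewise-linear function from Example 3.1.7."""
    if x <= -2:
        return 3 - x
    if x <= 1:
        return 2 * x + 1
    return 5 - 2 * x

def local_slope(x, step=0.25):
    """Rise over run of f between x - step and x + step.

    This equals the slope of the piece containing x only when both sample
    points lie on that same piece, so keep x at least `step` away from -2 and 1.
    """
    return (f(x + step) - f(x - step)) / (2 * step)

for x in (-3, 0, 2):
    print(x, f(x), local_slope(x))
```

If the two sample points straddle $x = -2$ or $x = 1$, the result mixes two different pieces, which is one way to see why "the slope" is not meaningful at those points.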
Example 3.1.7

3.2 Slopes and rates of change

Learning Objectives
• Describe the slope of a linear function as the rate of change of that function (change in
y over change in x).

• Compute the average rate of change of a nonlinear function over an interval.


3.2.1 §§ Lines and rate of change


So far we have talked a lot about equations for lines. Our goal now is to connect the slopes of
lines (and then eventually curves) with the concepts of change and rate of change of functions. If a
function f (x) depends linearly on another variable x, this linear relationship can be described by
the equation f (x) = mx + b.

Definition 3.2.1 (Rate of change for linear relationship).

For a linear relationship $f(x)$, we define the rate of change of $y = f(x)$ with respect to $x$ as the ratio:

\[ \frac{\text{change in } y}{\text{change in } x} . \]

A graph of $y = f(x)$ versus $x$ is a straight line with slope $m$ and intercept $b$. In section 3.1, we remembered the slope $m$ of a line as the ratio of the changes of the vertical component (or dependent variable) to the horizontal component (or independent variable) between (any) two distinct points along the line. We can write this as $\frac{y_1 - y_0}{x_1 - x_0}$ or $\frac{f(x_1) - f(x_0)}{x_1 - x_0}$ or $\frac{\Delta y}{\Delta x}$. This is precisely the rate of change$^{2}$ of $y$ per unit change of $x$, and it is a constant. This is the property that distinguishes lines from other curves, and linear relationships from nonlinear ones:

Definition 3.2.2 (Slope of line is rate of change).

The slope m of a straight line with equation y = mx + b is the rate of change of the linear
function y = f (x), and it is constant.

\[ \frac{\text{change in } y}{\text{change in } x} = m. \]

The following example demonstrates how we can describe the slope of a line as the rate of
change of a linear function (i.e., change in y per change in x, over any interval).
Example 3.2.3
In this example we look at the straight line

\[ y = \tfrac{1}{2} x + \tfrac{3}{2} . \]

• From the slope of $\frac{1}{2}$, we claim that if, as we walk along this straight line, our $x$-coordinate changes by an amount $\Delta x$, then our $y$-coordinate changes by exactly $\Delta y = \frac{1}{2} \Delta x$. This is what we mean by rate of change.
• For example, in the figure on the left below, we move from the point

\[ (x_0, y_0) = \left( 1 ,\ 2 = \tfrac{1}{2} \times 1 + \tfrac{3}{2} \right) \]

2 In the “real world” the phrase “rate of change” usually refers to rate of change per unit time. In science it is used more generally.

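on the line to the point

\[ (x_1, y_1) = \left( 5 ,\ 4 = \tfrac{1}{2} \times 5 + \tfrac{3}{2} \right) \]

on the line. In this move our $x$-coordinate changes by

\[ \Delta x = 5 - 1 = 4 \]

and our $y$-coordinate changes by

\[ \Delta y = 4 - 2 = 2 \]

which is indeed $\frac{1}{2} \times 4 = \frac{1}{2} \Delta x$, as claimed.

[Figure: two graphs of the line $y = \frac{1}{2} x + \frac{3}{2}$. Left: the points $(1, 2)$ and $(5, 4)$ with $\Delta x$ and $\Delta y$ marked. Right: general points $(x_0, y_0)$ and $(x_1, y_1)$ with $\Delta x$ and $\Delta y$ marked.]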

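• In general, when we move from the point

\[ (x_0, y_0) = \left( x_0 ,\ \tfrac{1}{2} x_0 + \tfrac{3}{2} \right) \]

on the line to the point

\[ (x_1, y_1) = \left( x_1 ,\ \tfrac{1}{2} x_1 + \tfrac{3}{2} \right) \]

on the line, our $x$-coordinate changes by

\[ \Delta x = x_1 - x_0 \]

and our $y$-coordinate changes by

\[
\begin{aligned}
\Delta y &= y_1 - y_0 \\
&= \left( \tfrac{1}{2} x_1 + \tfrac{3}{2} \right) - \left( \tfrac{1}{2} x_0 + \tfrac{3}{2} \right) \\
&= \tfrac{1}{2} (x_1 - x_0)
\end{aligned}
\]

which is indeed $\frac{1}{2} \Delta x$, as claimed.
• So, for the straight line $y = \frac{1}{2} x + \frac{3}{2}$, the ratio $\frac{\Delta y}{\Delta x} = \frac{y_1 - y_0}{x_1 - x_0}$ always takes the value $\frac{1}{2}$, regardless of the choice of initial point $(x_0, y_0)$ and final point $(x_1, y_1)$. This constant ratio is the rate of change and it is the slope of the line $y = \frac{1}{2} x + \frac{3}{2}$.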

Example 3.2.3


3.2.2 §§ Nonlinear functions and average rates of change


What if the function we are interested in is not linear, so that its graph is some kind of curve instead
of a line? It might be exponential, or quadratic, or trigonometric, or maybe something we can’t even
name. How might we use what we know about slopes of lines to help us describe how a function
$f(x)$ is changing as $x$ changes? In this case, we can pick any two points $(x_0, f(x_0))$ and $(x_1, f(x_1))$ and connect them with a line. We call this a secant line:

Definition 3.2.4 (Secant line).

A straight line connecting any two points on the graph of a function is called a secant line
of that function.

We define the average$^{3}$ rate of change of a function $f(x)$ over an interval $x_0 \le x \le x_1$ as the slope of the straight line connecting the two points $(x_0, f(x_0))$ and $(x_1, f(x_1))$.

Definition 3.2.5 (Average rate of change over an interval is slope of secant).

The average rate of change of $y = f(x)$ over the interval $x_0 \le x \le x_1$ is the slope of the
secant line through the two points $(x_0, f(x_0))$ and $(x_1, f(x_1))$:

\[ \text{Average rate of change} = \frac{\text{change in } f}{\text{change in } x} = \frac{\Delta f}{\Delta x} = \frac{f(x_1) - f(x_0)}{x_1 - x_0}. \]

The average rate of change of a function can be interpreted in different ways, depending on what
the function represents. When the function of interest represents distance with respect to time, its
average rate of change is the average velocity:

Definition 3.2.6 (Average velocity).

For a moving body, the average velocity over a time interval $a \le t \le b$ is the average rate
of change of distance over the given time interval.

\[ \text{average velocity} = \frac{\Delta\,\text{distance}}{\Delta\,\text{time}} \]

Let’s look at a nonlinear function and some of its secant lines.


Example 3.2.7
Consider the parabola $y = x^2$, which is the graph of the function $f(x) = x^2$.

3 The word “average” sometimes causes confusion. One often speaks in a different context about the average value
of a set of numbers (e.g. the average of $\{7, 1, 3, 5\}$ is $(7 + 1 + 3 + 5)/4 = 4$). However, the term average rate of
change always means the slope of the straight line joining a pair of points.


• Look at the interval between the points $(2, 4)$ and $(4, 16)$ on the parabola. If we draw a straight
line connecting $(2, 4)$ and $(4, 16)$, this is a secant line for the parabola. This secant line has
slope $m = \frac{\Delta y}{\Delta x} = \frac{16 - 4}{4 - 2} = \frac{12}{2} = 6$.

[Figure: the parabola $y = x^2$ with the secant line through the points $(2, 4)$ and $(4, 16)$.]

• The slope of the secant line connecting $(2, 4)$ and $(4, 16)$ is the average rate of change of the
function over the interval $2 \le x \le 4$.

• Now consider the points $(2, 4)$ and $(5, 25)$ on the parabola. We can form a different secant
line by connecting these two points with a straight line, which will have a slope of
$m = \frac{\Delta y}{\Delta x} = \frac{25 - 4}{5 - 2} = \frac{21}{3} = 7$. This slope is the average rate of change of the function over the interval
$2 \le x \le 5$.

[Figure: the parabola $y = x^2$ with the secant line through the points $(2, 4)$ and $(5, 25)$.]


Example 3.2.7
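
The two secant slopes found in Example 3.2.7 can be checked numerically. The following is a minimal Python sketch; the helper name `average_rate_of_change` is our own choice, introduced only for illustration.

```python
def average_rate_of_change(f, x0, x1):
    """Slope of the secant line through (x0, f(x0)) and (x1, f(x1))."""
    if x1 == x0:
        raise ValueError("the two x-values must be different")
    return (f(x1) - f(x0)) / (x1 - x0)


def square(x):
    return x ** 2


# Secant through (2, 4) and (4, 16): (16 - 4) / (4 - 2)
print(average_rate_of_change(square, 2, 4))  # 6.0
# Secant through (2, 4) and (5, 25): (25 - 4) / (5 - 2)
print(average_rate_of_change(square, 2, 5))  # 7.0
```

Changing either endpoint changes the result, which is the point of the example: for a curve, the average rate of change depends on the interval.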

Notice that different choices for either (or both) of the points $(x_0, y_0)$ and $(x_1, y_1)$ can result in
different values for the slope $\frac{y_1 - y_0}{x_1 - x_0}$ of the secant through those points. Thus the average rate of
change will, in general, depend on which two points we select. This is in contrast to the linear case;
see the example below.
Example 3.2.8
Consider the line $y = \tfrac{1}{2}x + \tfrac{3}{2}$. If $y = f(x)$ is linear, then the secant through any two different points
on the line is always identical to the line itself, and so always has exactly the same slope as the line
itself. This is illustrated in Figure 3.2.1 below: the (yellow) secant through $(x_0, y_0)$ and $(x_1, y_1)$
lies exactly on top of the (red) line $y = \tfrac{1}{2}x + \tfrac{3}{2}$.

Figure 3.2.1.

[Figure: the line $y = \tfrac{1}{2}x + \tfrac{3}{2}$ with the secant through $(x_0, y_0)$ and $(x_1, y_1)$ lying on top of it.]
For a straight line, all secants have the same slope.

This also means that if y = f (x) is linear, then the average rate of change (from 3.2.5) is the
same as the rate of change (from 3.2.1); and both rates are equal to its slope.
Example 3.2.8
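
As a quick check of Example 3.2.8, the sketch below computes the secant slope of the line $y = \tfrac{1}{2}x + \tfrac{3}{2}$ for every pair of points taken from a small list of x-values. The list `xs` is an arbitrary choice made for illustration.

```python
from itertools import combinations


def line(x):
    return 0.5 * x + 1.5


xs = [-3, 0, 1, 5, 10]  # arbitrary sample x-values

# Collect the secant slope for every pair of distinct sample points.
slopes = {(line(x1) - line(x0)) / (x1 - x0) for x0, x1 in combinations(xs, 2)}
print(slopes)  # {0.5}
```

Every pair gives the same value, the slope of the line itself.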

§§ Alternative secant notation


Suppose y = f (x) is some nonlinear function of x. Instead of picking two points and calling them
(x0 , f (x0 )) and (x1 , f (x1 )) we could (and often do) alternatively pick one point (x0 , f (x0 )), and
then pick a second x-value some distance h away from x0 ,

x1 = x0 + h.

In other words, $h$ is the difference of the two x coordinates. Then our two points are $(x_0, f(x_0))$
and $(x_0 + h, f(x_0 + h))$, and we can write the average rate of change of $f$ across the interval
$x_0 \le x \le x_0 + h$ as
\[ \frac{\Delta y}{\Delta x} = \frac{f(x_0 + h) - f(x_0)}{(x_0 + h) - x_0} = \frac{f(x_0 + h) - f(x_0)}{h}. \]
This ratio is the slope of the secant line through the two points.
Figure 3.2.2.

[Figure: the graph of $y = f(x)$ with the secant line through the points $(x_0, f(x_0))$ and $(x_0 + h, f(x_0 + h))$.]
The slope of the secant line through the points (x0 , f (x0 )) and
(x0 + h, f (x0 + h)) is the average rate of change of f over the given interval.

A secant line between two points, with x coordinates $x_0$ and $x_0 + h$, on the graph of a function $f(x)$ is shown in
this link. You can change the base point $x_0$, the distance between the x coordinates, $h$, or you
can input your own function for $f(x)$. The slope of the secant line is the average rate of change
of $f$ over the interval $x_0 \le x \le x_0 + h$.

To summarize, now we have an alternative to the definition of average rate of change and the
associated slope of the secant line provided in 3.2.5:
Definition 3.2.9 (Average rate of change over an interval is slope of secant (alternate)).

The average rate of change of $y = f(x)$ over the interval $x_0 \le x \le x_0 + h$ is the slope of
the secant line through the two points $(x_0, f(x_0))$ and $(x_0 + h, f(x_0 + h))$:

\[ \text{Average rate of change} = \frac{\text{change in } f}{\text{change in } x} = \frac{\Delta f}{\Delta x} = \frac{f(x_0 + h) - f(x_0)}{h}. \]
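
The alternate form can be sketched in Python as follows. The function name `average_rate_h` is hypothetical. The two calls reuse the secants of the parabola $y = x^2$ from Example 3.2.7, now described by a base point and a step $h$ instead of two x-values.

```python
def average_rate_h(f, x0, h):
    """Average rate of change of f between x0 and x0 + h."""
    if h == 0:
        raise ValueError("h must be nonzero")
    return (f(x0 + h) - f(x0)) / h


def square(x):
    return x ** 2


# Secant through (2, 4) and (4, 16): base point x0 = 2, step h = 2.
print(average_rate_h(square, 2, 2))  # 6.0
# Secant through (2, 4) and (5, 25): base point x0 = 2, step h = 3.
print(average_rate_h(square, 2, 3))  # 7.0
```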

3.3 The derivative

Learning Objectives
• Explain using words, pictures, and the language of limits what a derivative is.

• Use the definition of derivative to find the tangent line to a function at a given point.


• Describe the tangent line as an approximation to a function at a given point.

• Describe the derivative of a function as a function itself.

• Given the graph of a function, sketch the graph of its derivative.

• Interpret derivatives as instantaneous rates of change.

• Explain why the definition of a derivative is important, even if you know shortcuts for
computation.

3.3.1 §§ Slope at a point


In the previous section we introduced the idea of a secant line as a way to talk about the average rate
of change of a function over a given interval. However, a function can change a lot between two
given points. How does this affect the usefulness of the slope of a secant line connecting two points
as a description of how a function is changing near those points? Well, if there is a lot of curviness
in between the two points, then the average rate of change between those points might not describe
what’s really happening at either point. What happens if we choose two points that are very close to
each other?
We investigate, in Examples 3.3.1 and 3.3.2, below, the idea of a secant line to the parabola
$y = x^2$ in the limit where the two points are very close together: so close, in fact, that we get a line
that touches the curve at a single point. This is called a tangent⁴ line.
In Example 3.3.1 we find the slope of the tangent line to $y = x^2$ at a particular point. We
generalise this in Example 3.3.2, to show that we can define “the slope of the curve $y = x^2$” at an
arbitrary point $x = x_0$ by considering $\frac{\Delta y}{\Delta x} = \frac{y_1 - y_0}{x_1 - x_0}$ with $(x_1, y_1)$ very close to $(x_0, y_0)$.

Example 3.3.1
In this example, let us fix $(x_0, y_0)$ to be the point $(2, 4)$ on the parabola $y = x^2$. Now let $(x_1, y_1) =
(x_1, x_1^2)$ be some other point on the parabola; that is, a point with $x_1 \ne x_0$.

• Draw the straight line through (x0 , y0 ) and (x1 , y1 ).

• The following table gives the slope, $\frac{y_1 - y_0}{x_1 - x_0}$, of the secant line through $(x_0, y_0) = (2, 4)$ and
$(x_1, y_1)$, for various different choices of $(x_1, y_1 = x_1^2)$.

x_1                                  1    1.5    1.9    1.99     1.999    ∘    2.001    2.01     2.1    2.5    3
y_1 = x_1^2                          1    2.25   3.61   3.9601   3.9960   ∘    4.0040   4.0401   4.41   6.25   9
(y_1 - y_0)/(x_1 - x_0)
  = (y_1 - 4)/(x_1 - 2)              3    3.5    3.9    3.99     3.999    ∘    4.001    4.01     4.1    4.5    5

• So now we have a big table of numbers. What do we do with them? Look at the columns of
the table closer to the middle. As $x_1$ gets closer and closer to $x_0 = 2$, the slope, $\frac{y_1 - y_0}{x_1 - x_0}$, of the
secant through $(x_0, y_0)$ and $(x_1, y_1)$ appears to get closer and closer to the value 4.

4 tangens means touching.


Example 3.3.1
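
The table in Example 3.3.1 can be regenerated with a few lines of Python. This is only a sketch: the function name `secant_slope` is ours, and the printed values are rounded to four decimal places, so floating-point round-off is hidden.

```python
def secant_slope(x1, x0=2.0):
    """Slope of the secant of y = x^2 through (x0, x0^2) and (x1, x1^2)."""
    return (x1 ** 2 - x0 ** 2) / (x1 - x0)


# The same x1 values as in the table (x1 = 2 itself is excluded).
for x1 in [1, 1.5, 1.9, 1.99, 1.999, 2.001, 2.01, 2.1, 2.5, 3]:
    print(f"x1 = {x1:<6} y1 = {x1 ** 2:<8.4f} slope = {secant_slope(x1):.4f}")
```

The slope column gets closer to 4 as `x1` gets closer to 2 from either side.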

Example 3.3.2
It is very easy to generalise what is happening in Example 3.3.1.

• Fix any point $(x_0, y_0)$ on the parabola $y = x^2$. If $(x_1, y_1)$ is any other point on the parabola
$y = x^2$, then $y_1 = x_1^2$ and the slope of the secant through $(x_0, y_0)$ and $(x_1, y_1)$ is

\begin{align*}
\text{slope} = \frac{y_1 - y_0}{x_1 - x_0} &= \frac{x_1^2 - x_0^2}{x_1 - x_0} && \text{since } y = x^2 \\
&= \frac{(x_1 - x_0)(x_1 + x_0)}{x_1 - x_0} && \text{remember } a^2 - b^2 = (a - b)(a + b) \\
&= x_1 + x_0
\end{align*}

You should check the values given in the table of Example 3.3.1 above to convince yourself
that the slope $\frac{y_1 - y_0}{x_1 - x_0}$ of the secant line really is $x_0 + x_1 = 2 + x_1$ (since we set $x_0 = 2$).

• Now as we move $x_1$ closer and closer to $x_0$, the slope should move closer and closer to $2x_0$.
Indeed if we compute the limit carefully, we see that in the limit as $x_1 \to x_0$ the slope becomes
$2x_0$. That is
\begin{align*}
\lim_{x_1 \to x_0} \frac{y_1 - y_0}{x_1 - x_0} &= \lim_{x_1 \to x_0} (x_1 + x_0) && \text{by the work we did just above} \\
&= 2x_0
\end{align*}

(Note: Taking this limit gives us our first derivative. Of course we haven’t yet given the
definition of a derivative, so we perhaps wouldn’t recognise it yet. We rectify this in the next
section.)

Figure 3.3.1.

[Figure: the parabola $y = x^2$ with several secant lines through $(x_0, y_0)$ and the tangent line at $(x_0, y_0)$.]
Secants approaching a tangent line


• So it is reasonable to say “as $x_1$ approaches $x_0$, the secant through $(x_0, y_0)$ and $(x_1, y_1)$
approaches the tangent line to the parabola $y = x^2$ at $(x_0, y_0)$”.
The figure above shows four different secants through $(x_0, y_0)$ for the curve $y = x^2$. The
four hollow circles are four different choices of $(x_1, y_1)$. As $(x_1, y_1)$ approaches $(x_0, y_0)$, the
corresponding secant does indeed approach the tangent to $y = x^2$ at $(x_0, y_0)$, which is the
heavy (red) straight line in the figure.
Using limits we determined the slope of the tangent line to $y = x^2$ at $x_0$ to be $2x_0$. Often we
will be a little sloppy with our language and instead say “the slope of the parabola $y = x^2$ at
$(x_0, y_0)$ is $2x_0$”, where we really mean the slope of the line tangent to the parabola at $x_0$.

Example 3.3.2
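
The algebra in Example 3.3.2 can be checked with exact rational arithmetic. In the sketch below the base point $x_0 = 3/2$ is an arbitrary choice of ours; Python's `fractions.Fraction` type avoids floating-point round-off, so the identity "slope $= x_1 + x_0$" can be tested with exact equality.

```python
from fractions import Fraction


def secant_slope(x0, x1):
    """Slope of the secant of y = x^2 through the points with x-values x0 and x1."""
    return (x1 ** 2 - x0 ** 2) / (x1 - x0)


x0 = Fraction(3, 2)  # arbitrary base point, chosen only for illustration
for k in range(1, 6):
    x1 = x0 + Fraction(1, 10 ** k)   # x1 moves closer and closer to x0
    slope = secant_slope(x0, x1)
    assert slope == x1 + x0          # the simplification derived above, checked exactly
    print(float(x1), float(slope))
# The printed slopes approach 2 * x0 = 3.
```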

3.3.2 §§ Definition of the derivative (1)


We now define the “derivative” explicitly, based on the limiting slope ideas of the previous section.
Then we see how to compute some simple derivatives.
Let us now generalise what we did in the last section so as to find “the slope of the curve $y = f(x)$
at $(x_0, y_0)$” for any smooth enough⁵ function $f(x)$.
As before, let $(x_0, y_0)$ be any point on the curve $y = f(x)$. So we must have $y_0 = f(x_0)$. Now
let $(x_1, y_1)$ be any other point on the same curve. So $y_1 = f(x_1)$ and $x_1 \ne x_0$. Think of $(x_1, y_1)$ as
being pretty close to $(x_0, y_0)$ so that the difference

\[ \Delta x = x_1 - x_0 \]

in x–coordinates is pretty small. In terms of this $\Delta x$ we have

\[ x_1 = x_0 + \Delta x \quad\text{and}\quad y_1 = f(x_0 + \Delta x) \]

We can construct a secant line through $(x_0, y_0)$ and $(x_1, y_1)$ just as we did for the parabola above. It
has slope

\[ \frac{y_1 - y_0}{x_1 - x_0} = \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x} \]

If $f(x)$ is reasonably smooth⁶, then as $x_1$ approaches $x_0$, i.e. as $\Delta x$ approaches 0, we would expect
the secant through $(x_0, y_0)$ and $(x_1, y_1)$ to approach the tangent line to the curve $y = f(x)$ at $(x_0, y_0)$,
just as happened in Figure 3.3.1. And more importantly, the slope of the secant through $(x_0, y_0)$ and
$(x_1, y_1)$ should approach the slope of the tangent line to the curve $y = f(x)$ at $(x_0, y_0)$.
Thus we would expect⁷ the slope of the tangent line to the curve $y = f(x)$ at $(x_0, y_0)$ to be

\[ \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x} \]

5 The idea of “smooth enough” can be made quite precise. Indeed the word “smooth” has a very precise meaning in
mathematics, which we won’t cover here. For now think of “smooth” as meaning roughly just “smooth”.
6 Again the term “reasonably smooth” can be made more precise.
7 Indeed, we don’t have to expect — it is!


When we talk of the “slope of the curve” at a point, what we really mean is the slope of the tangent
line to the curve at that point. So “the slope of the curve $y = f(x)$ at $(x_0, y_0)$” is also the limit⁸
expressed in the above equation. The derivative of $f(x)$ at $x = x_0$ is also defined to be this limit.
Which leads⁹ us to the most important definition in this text:

Definition 3.3.3 (Derivative at a point).

Let $a \in \mathbb{R}$ and let $f(x)$ be defined on an open interval¹⁰ that contains $a$.

• The derivative of $f(x)$ at $x = a$ is denoted $f'(a)$ and is defined by

\[ f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} \]
if the limit exists.

• When the above limit exists, the function $f(x)$ is said to be differentiable at $x = a$.
When the limit does not exist, the function $f(x)$ is said to be not differentiable at
$x = a$.

• We can equivalently define the derivative $f'(a)$ by the limit

\[ f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}. \]
To see that these two definitions are the same, we set $x = a + h$ and then the limit as
$h$ goes to 0 is equivalent to the limit as $x$ goes to $a$.

• Informally, $f'(a)$ is the “slope of $f(x)$ at $a$”. Formally, $f'(a)$ is the slope of the
tangent line to $f(x)$ at $x = a$.

Let’s now compute the derivatives of some very simple functions. This is our first step towards
building up a toolbox for computing derivatives of complicated functions — this process will very
much parallel what we did in the previous chapter with limits. The two simplest functions we know
are f (x) = c and g(x) = x.

Example 3.3.4 (Derivative of f (x) = c)


Let $a, c \in \mathbb{R}$ be constants. Compute the derivative of the constant function $f(x) = c$ at $x = a$.
We compute the desired derivative by just substituting the function of interest into the formal

8 This is of course under the assumption that the limit exists — we will talk more about that below.
9 We will rename “x0 ” to “a” and “∆x” to “h”.
10 Maybe you remember this, but just in case you don’t: the open interval $(c, d)$ is just the set of all real numbers $x$
obeying $c < x < d$.


definition of the derivative.

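\begin{align*}
f'(a) &= \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} && \text{(the definition)} \\
&= \lim_{h \to 0} \frac{c - c}{h} && \text{(substituted in the function)} \\
&= \lim_{h \to 0} 0 && \text{(simplified things)} \\
&= 0
\end{align*}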

Example 3.3.4
That was easy! What about the next most complicated function — arguably it’s this one:
Example 3.3.5 (Derivative of g(x) = x)
Let $a \in \mathbb{R}$ and compute the derivative of $g(x) = x$ at $x = a$.
Again, we compute the derivative of $g$ by just substituting the function of interest into the formal
definition of the derivative and then evaluating the resulting limit.

\begin{align*}
g'(a) &= \lim_{h \to 0} \frac{g(a + h) - g(a)}{h} && \text{(the definition)} \\
&= \lim_{h \to 0} \frac{(a + h) - a}{h} && \text{(substituted in the function)} \\
&= \lim_{h \to 0} \frac{h}{h} && \text{(simplified things)} \\
&= \lim_{h \to 0} 1 && \text{(simplified a bit more)} \\
&= 1
\end{align*}

Example 3.3.5

That was a little harder than the first example, but still quite straightforward: start with the
definition and apply what we know about limits.
Thanks to these two examples, we have our first theorem about derivatives:

Theorem 3.3.6 (Easiest derivatives).

Let $a, c \in \mathbb{R}$ and let $f(x) = c$ be the constant function and $g(x) = x$. Then

\[ f'(a) = 0 \]

and

\[ g'(a) = 1. \]
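
For these two functions the difference quotient does not merely approach its limit: it equals the limit for every nonzero $h$. The sketch below illustrates this with exact rational arithmetic. The values of `c` and `a` are arbitrary choices of ours, and `difference_quotient` is a hypothetical helper name.

```python
from fractions import Fraction


def difference_quotient(f, a, h):
    """The quotient (f(a + h) - f(a)) / h from the definition of the derivative."""
    return (f(a + h) - f(a)) / h


c = Fraction(7)      # arbitrary constant
a = Fraction(5, 3)   # arbitrary base point

for k in range(1, 5):
    h = Fraction(1, 10 ** k)
    constant_q = difference_quotient(lambda x: c, a, h)   # f(x) = c
    identity_q = difference_quotient(lambda x: x, a, h)   # g(x) = x
    print(constant_q, identity_q)
# every line prints "0 1"
```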


3.3.3 §§ Tangent lines and linear approximations


Suppose that $y = f(x)$ is the equation of a curve in the xy–plane. That is, $f(x)$ is the y–coordinate
of the point on the curve whose x–coordinate is $x$. Then, as we have already seen,

\[ \text{the slope of the secant through } \big(a, f(a)\big) \text{ and } \big(a + h, f(a + h)\big) = \frac{f(a + h) - f(a)}{h} \]
This is shown in Figure 3.3.2 below.

Figure 3.3.2.

In order to create the tangent line (as we have done a few times now) we squeeze $h \to 0$. As we
do this, the secant through $\big(a, f(a)\big)$ and $\big(a + h, f(a + h)\big)$ approaches the tangent line¹¹ to $y = f(x)$
at $x = a$. Since the secant becomes the tangent line in this limit, the slope of the secant becomes the
slope of the tangent and

\begin{align*}
\text{the slope of the tangent line to } y = f(x) \text{ at } x = a &= \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} \\
&= f'(a).
\end{align*}

As $h \to 0$, the secant line approaches a tangent line. Use the slider for $h$ to show this trend,
and note that the slope of the secant line approaches the slope of the tangent line at the point $x_0$.

Let us go a little further and work out a general formula for the equation of the tangent line to
$y = f(x)$ at $x = a$. We know that the tangent line

• has slope $f'(a)$ and

• passes through the point $\big(a, f(a)\big)$.

11 We are of course assuming that the curve is smooth enough to have a tangent line at a.


There are a couple of different ways to construct the equation of the tangent line from this information.
One is to observe, as in Figure 3.3.3, that if $(x, y)$ is any other point on the tangent line then the line
segment from $\big(a, f(a)\big)$ to $(x, y)$ is part of the tangent line and so also has slope $f'(a)$. That is,
\[ \frac{y - f(a)}{x - a} = \text{the slope of the tangent line} = f'(a) \]
Cross multiplying gives us the equation of the tangent line:
\[ y - f(a) = f'(a)\,(x - a) \quad\text{or}\quad y = f(a) + f'(a)\,(x - a) \]

Figure 3.3.3.

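[Figure: the curve $y = f(x)$ and its tangent line $y = f(a) + f'(a)\,(x - a)$, with the points $\big(a, f(a)\big)$ and $(x, y)$ marked on the tangent line.]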
A line segment of a tangent line

A second way to derive the same equation of the same tangent line is to recall that the general
equation for a line, with finite slope, is $y = mx + b$, where $m$ is the slope and $b$ is the y-intercept. We
already know the slope, so $m = f'(a)$. To work out $b$ we use the other piece of information:
$(a, f(a))$ is on the line. So $(x, y) = (a, f(a))$ must solve $y = f'(a)\,x + b$. That is,
\[ f(a) = f'(a) \cdot a + b \quad\text{and so}\quad b = f(a) - a f'(a) \]
Hence our equation is, once again,
\[ y = f'(a) \cdot x + \big(f(a) - a f'(a)\big) \quad\text{or, after rearranging a little,}\quad y = f(a) + f'(a)\,(x - a) \]
This is a very useful formula, so perhaps we should make it a theorem.
Theorem 3.3.7 (Tangent line).

The tangent line to the curve $y = f(x)$ at $x = a$ is given by the equation

\[ y = f(a) + f'(a)\,(x - a) \]

provided the derivative $f'(a)$ exists.


The caveat at the end of the above theorem is necessary — there are certainly cases in which the
derivative does not exist and so we do need to be careful.
Example 3.3.8
Find the tangent line to the curve $y = x^2$ at $x = 3$.
Rather than redoing everything from scratch, we can, and for efficiency should, use Theorem
3.3.7. To write this up properly, we must ensure that we tell the reader what we are doing. So
something like the following:
• By Theorem 3.3.7, the tangent line to the curve $y = f(x)$ at $x = a$ is given by
\[ y = f(a) + f'(a)(x - a) \]
provided $f'(a)$ exists.
• In Example 3.3.2, we found that, for any $x_0$, the derivative of $x^2$ at $x = x_0$ is
\[ f'(x_0) = 2x_0. \]
The tangent line formula uses $a$ instead of $x_0$, so let’s use $a$ for the derivative at the point:
\[ f'(a) = 2a. \]

• In the current example we are taking $a = 3$ and we have
\[ f(a) = f(3) = a^2\Big|_{a=3} = 3^2 = 9 \quad\text{and}\quad f'(a) = f'(3) = 2a\Big|_{a=3} = 2 \cdot 3 = 6. \]

• So the equation of the tangent line to $y = x^2$ at $x = 3$ is
\[ y = 9 + 6\,(x - 3) \quad\text{or}\quad y = 6x - 9. \]

We don’t have to write it up using dot-points as above; we have used them here to help delineate
each step in the process of computing the tangent line.
Example 3.3.8
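
The recipe of Theorem 3.3.7 translates directly into code. The following is a minimal sketch, assuming the derivative is supplied by the caller as a second function; the name `tangent_line` is our own.

```python
def tangent_line(f, fprime, a):
    """Return (slope, intercept) of the tangent line y = f(a) + f'(a) (x - a).

    Expanding the formula gives y = f'(a) x + (f(a) - a f'(a)).
    """
    slope = fprime(a)
    intercept = f(a) - a * slope
    return slope, intercept


# f(x) = x^2 with f'(x) = 2x, at a = 3, as in Example 3.3.8.
slope, intercept = tangent_line(lambda x: x ** 2, lambda x: 2 * x, 3)
print(slope, intercept)  # 6 -9
```

The output corresponds to the line $y = 6x - 9$ found in the example.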

In the example above, imagine “zooming in” to the point (3, 9) and watching the curve of the
function and the tangent line. As you zoom in closer and closer, the tangent line more and more
closely matches the curve at that point. In fact, if we want to use a line to approximate a curve at any
point x = a, the best we can do is indeed its tangent line at that point. (If you’re not convinced: can
you come up with a different line that passes through the point (3, 9) that is a better approximation
to $x^2$ than its tangent line there?)

Definition 3.3.9 (Linear approximation).

The linear approximation to $f(x)$ at $x = a$ is

\[ L(x) = f(a) + f'(a)(x - a). \]

This is simply the tangent line to $f(x)$ at $x = a$.


The linear approximation to f (x) at a is a better approximation to f (x) near x = a than other
lines through (a, f (a)). You might start wondering: What kind of polynomial approximation might
be a better approximation than the linear approximation? Why? We return to this idea in a later
chapter when we discuss numerical approximations.
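
To make the comparison concrete, the sketch below measures how far $f(x) = x^2$ is from two lines through $(3, 9)$: the tangent line, with slope 6, and a competitor with slope 5. The competitor's slope is a hypothetical choice of ours; any slope other than 6 behaves similarly.

```python
def f(x):
    return x ** 2


def tangent(x):
    # L(x) = f(3) + f'(3) (x - 3) = 9 + 6 (x - 3)
    return 9 + 6 * (x - 3)


def other_line(x):
    # Also passes through (3, 9), but with slope 5 (an arbitrary competitor).
    return 9 + 5 * (x - 3)


for dx in [0.1, 0.01, 0.001]:
    x = 3 + dx
    print(dx, abs(f(x) - tangent(x)), abs(f(x) - other_line(x)))
```

Algebraically the tangent's error is $(x - 3)^2$ while the competitor's error is $(x - 3)^2 + (x - 3)$, so close to $x = 3$ the tangent's error is much smaller.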

3.3.4 §§ Definition of the derivative (2)


Let’s redo the example we have already done a few times: $f(x) = x^2$. To make it a little more
interesting (and gain some perspective on what the symbols represent) let’s change the names of the
function and the variable so that it is not exactly the same as Examples 3.3.1 and 3.3.2.

Example 3.3.10 Derivative of $h(t) = t^2$
Compute the derivative of
\[ h(t) = t^2 \quad\text{at } t = a \]
• This function isn’t quite like the ones we saw earlier — it’s a function of t rather than x. Recall
that a function is a rule which assigns to each input value an output value. So far, we have
usually called the input value x. But this “x” is just a dummy variable representing a generic
input value. There is nothing wrong with calling a generic input value t instead. Indeed, from
time to time you will see functions that are not written as formulas involving x, but instead
are written as formulas in t (for example representing time), or z (for example representing
height), or other symbols.
• So let us write the definition of the derivative
\[ f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} \]
and then translate it to the function names and variables at hand:
\[ h'(a) = \lim_{h \to 0} \frac{h(a + h) - h(a)}{h} \]
• But there is a problem: “h” plays two roles here. It is both the function name and the
small quantity that is going to zero in our limit. It is extremely dangerous to have a symbol
represent two different things in a single computation. We need to change one of them. So
let’s rename the small quantity that is going to zero in our limit from “$h$” to “$\Delta t$”:
\[ h'(a) = \lim_{\Delta t \to 0} \frac{h(a + \Delta t) - h(a)}{\Delta t} \]
• Now we are ready to begin. Substituting in what the function $h$ is,
\begin{align*}
h'(a) &= \lim_{\Delta t \to 0} \frac{(a + \Delta t)^2 - a^2}{\Delta t} \\
&= \lim_{\Delta t \to 0} \frac{a^2 + 2a\,\Delta t + \Delta t^2 - a^2}{\Delta t} && \big(\text{just squared out } (a + \Delta t)^2\big) \\
&= \lim_{\Delta t \to 0} \frac{2a\,\Delta t + \Delta t^2}{\Delta t} \\
&= \lim_{\Delta t \to 0} (2a + \Delta t) \\
&= 2a
\end{align*}


• You should go back and check that this is what we got in Example 3.3.2; just some names have
been changed.

Example 3.3.10

§§ An important point (and some notation)


Notice here that the answer we get depends on our choice of $a$: if we want to know the derivative
at $a = 3$ we can just substitute $a = 3$ into our answer $2a$ to get that the slope is 6. If we want to know
the derivative at $a = 1$, we substitute $a = 1$ and get that the slope is 2. The important thing here is that we can
move from the derivative being computed at a specific point to the derivative being a function itself:
input any value of $a$ and it returns the slope of the tangent line to the curve at the point $x = a$,
$y = h(a)$. The variable $a$ is a dummy variable. We can rename $a$ to anything we want, like $x$, for
example. So we can replace every $a$ in

\[ h'(a) = 2a \quad\text{by } x, \text{ giving}\quad h'(x) = 2x \]

where all we have done is replaced the symbol $a$ by the symbol $x$.


We can do this more generally and tweak the derivative at a specific point $a$ to obtain the
derivative as a function of $x$. We replace
\[ f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} \]
with
\[ f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \]
which gives us the following definition

Definition 3.3.11 (Derivative as a function).

Let $f(x)$ be a function.

• The derivative of $f(x)$ with respect to $x$ is

\[ f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \]
provided the limit exists.

• If the derivative $f'(x)$ exists for all $x \in (a, b)$ we say that $f$ is differentiable on $(a, b)$.

• Note that we will sometimes be a little sloppy with our discussions and simply write
“$f$ is differentiable” to mean “$f$ is differentiable on an interval we are interested in”
or “$f$ is differentiable everywhere”.

• Informally, the derivative is the “slope of $f(x)$”.


Notice that we are no longer thinking of tangent lines. Instead, differentiation is an operation we
can do on a function – and moreover, the result (the derivative) is itself a function as well.
For example:

Example 3.3.12 The derivative of $f(x) = \frac{1}{x}$
Let $f(x) = \frac{1}{x}$ and compute its derivative with respect to $x$. Think carefully about where the derivative
exists.
• Our first step is to write down the definition of the derivative; at this stage, we know of no
other strategy for computing derivatives.
\[ f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \qquad \text{(the definition)} \]

• And now we substitute in the function and compute the limit.

\begin{align*}
f'(x) &= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} && \text{(the definition)} \\
&= \lim_{h \to 0} \frac{1}{h}\left[\frac{1}{x + h} - \frac{1}{x}\right] && \text{(substituted in the function)} \\
&= \lim_{h \to 0} \frac{1}{h}\,\frac{x - (x + h)}{x(x + h)} && \text{(wrote over a common denominator)} \\
&= \lim_{h \to 0} \frac{1}{h}\,\frac{-h}{x(x + h)} && \text{(started cleanup)} \\
&= \lim_{h \to 0} \frac{-1}{x(x + h)} \\
&= -\frac{1}{x^2}
\end{align*}

• Notice that the original function $f(x) = \frac{1}{x}$ was not defined at $x = 0$ and the derivative is also
not defined at $x = 0$. This does happen more generally: if $f(x)$ is not defined at a particular
point $x = a$, then the derivative will not exist at that point either.

Example 3.3.12
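
The cleanup steps in Example 3.3.12 can be verified with exact rational arithmetic. In this sketch the point $x = 2$ is an arbitrary choice of ours; the `assert` checks the intermediate formula $\frac{-1}{x(x+h)}$ exactly for each $h$.

```python
from fractions import Fraction


def f(x):
    return 1 / x


x = Fraction(2)  # arbitrary nonzero point, chosen only for illustration
for k in range(1, 5):
    h = Fraction(1, 10 ** k)
    quotient = (f(x + h) - f(x)) / h
    assert quotient == -1 / (x * (x + h))   # matches the simplified expression
    print(float(quotient))
# The printed values approach -1 / x**2 = -0.25.
```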

So we now have two slightly different ideas of derivatives:


• The derivative $f'(a)$ at a specific point $x = a$, being the slope of the tangent line to the curve
at $x = a$, as defined in Definition 3.3.3, and

• The derivative as a function, $f'(x)$, as defined in Definition 3.3.11.


Of course, if we have $f'(x)$ then we can always recover the derivative at a specific point by
substituting $x = a$.
As we noted at the beginning of the chapter, the derivative was discovered independently by
Newton and Leibniz in the late 17th century. Because their discoveries were independent, Newton
and Leibniz did not have exactly the same notation. Stemming from this, and from the many different
contexts in which derivatives are used, there are quite a few alternate notations for the derivative:


Notation 3.3.13.

The following notations are all used for “the derivative of $f(x)$ with respect to $x$”

\[ f'(x) \qquad \frac{df}{dx} \qquad \frac{d}{dx} f(x) \qquad \dot{f}(x) \qquad D f(x) \qquad D_x f(x), \]
while the following notations are all used for “the derivative of $f(x)$ at $x = a$”
\[ f'(a) \qquad \frac{df}{dx}(a) \qquad \frac{d}{dx} f(x)\Big|_{x=a} \qquad \dot{f}(a) \qquad D f(a) \qquad D_x f(a). \]

Some things to note about these notations:

• We will generally use the first three, but you should recognise them all. The notation
$f'(a)$ is due to Lagrange, while the notation $\frac{df}{dx}(a)$ is due to Leibniz. They are both
very useful. Neither can be considered “better”.

• Leibniz notation writes the derivative as a “fraction”. However, it is definitely not
a fraction and should not be thought of in that way. It is just shorthand, which is
read as “the derivative of $f$ with respect to $x$”.

• You read $f'(x)$ as “f–prime of x”, and $\frac{df}{dx}$ as “dee–f–dee–x”, and $\frac{d}{dx} f(x)$ as “dee–by–dee–x of f”.

• Similarly you read $\frac{df}{dx}(a)$ as “dee–f–dee–x at a”, and $\frac{d}{dx} f(x)\Big|_{x=a}$ as “dee–by–dee–x
of f at x equals a”.

• The notation $\dot{f}$ is due to Newton. In physics, it is common to use $\dot{f}(t)$ to denote the
derivative of $f$ with respect to time.

§§ Back to computing some derivatives


At this point we could try to start working out how derivatives interact with arithmetic and make
an “Arithmetic of derivatives” theorem just like the one we saw for limits in the previous chapter.
We will get there shortly, but before that it is important that we become more comfortable with
computing derivatives using limits and then understanding what the derivative actually means. So —
more examples.
Example 3.3.14 $\frac{d}{dx}\sqrt{x}$
Compute the derivative, $f'(a)$, of the function $f(x) = \sqrt{x}$ at the point $x = a$ for any $a > 0$.

• So again we start with the definition of derivative and go from there:

\[ f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{x \to a} \frac{\sqrt{x} - \sqrt{a}}{x - a} \]

• As $x$ tends to $a$, the numerator and denominator both tend to zero. But $\frac{0}{0}$ is not defined.


So to get a well defined limit we need to exhibit a cancellation between the numerator and
denominator, just as we saw in Example 2.1.23.

• Recall how to factor the difference of two perfect squares: set $A = \sqrt{x}$ and $B = \sqrt{a}$ in $A^2 - B^2 =
(A - B)(A + B)$ to get
\[ x - a = (\sqrt{x} - \sqrt{a})(\sqrt{x} + \sqrt{a}) \]
and then substitute this little fact into our expression
\begin{align*}
\frac{\sqrt{x} - \sqrt{a}}{x - a} &= \frac{\sqrt{x} - \sqrt{a}}{(\sqrt{x} - \sqrt{a})(\sqrt{x} + \sqrt{a})} && \text{(now cancel common factors)} \\
&= \frac{1}{\sqrt{x} + \sqrt{a}}
\end{align*}
• Now we can take the limit we need:
\begin{align*}
f'(a) &= \lim_{x \to a} \frac{\sqrt{x} - \sqrt{a}}{x - a} \\
&= \lim_{x \to a} \frac{1}{\sqrt{x} + \sqrt{a}} \\
&= \frac{1}{2\sqrt{a}}
\end{align*}
• We should think about the domain of $f'$ here. That is, for which values of $a$ is $f'(a)$ defined?
The original function $f(x)$ was defined for all $x \ge 0$, however the derivative $f'(a) = \frac{1}{2\sqrt{a}}$ is
undefined at $a = 0$.
If we draw a careful picture of $\sqrt{x}$ around $x = 0$ we can see why this has to be the case. The
figure below shows three different tangent lines to the graph of $y = f(x) = \sqrt{x}$. As the point
of tangency moves closer and closer to the origin, the tangent line gets steeper and steeper.
The slope of the tangent line at $\big(a, \sqrt{a}\big)$ blows up as $a \to 0$.

[Figure: the graph of $y = \sqrt{x}$ with three tangent lines.]

Example 3.3.14
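
The steepening of the tangent lines near the origin shows up clearly in numbers. The sketch below evaluates the formula $f'(a) = \frac{1}{2\sqrt{a}}$ from Example 3.3.14 at a few sample points of our choosing; `tangent_slope` is a hypothetical helper name.

```python
import math


def tangent_slope(a):
    """Slope of the tangent line to y = sqrt(x) at x = a, valid for a > 0."""
    return 1 / (2 * math.sqrt(a))


# Sample points moving toward the origin.
for a in [1.0, 0.01, 0.0001, 0.000001]:
    print(a, tangent_slope(a))
```

Each time $a$ shrinks by a factor of 100, the slope grows by a factor of 10, so the slopes grow without bound as $a \to 0$.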

Example 3.3.15 $\frac{d}{dx}|x|$
Compute the derivative, $f'(a)$, of the function $f(x) = |x|$ at the point $x = a$.


• We should start this example by recalling the definition of $|x|$ (we saw this back in Example 2.1.32):
\[ |x| = \begin{cases} -x & \text{if } x < 0 \\ 0 & \text{if } x = 0 \\ x & \text{if } x > 0. \end{cases} \]

It is definitely not just “chop off the minus sign”.

• This breaks our computation of the derivative into 3 cases depending on whether x is positive,
negative or zero.

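• Assume $x > 0$. Then
\begin{align*}
\frac{df}{dx} &= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \\
&= \lim_{h \to 0} \frac{|x + h| - |x|}{h}
\end{align*}
Since $x > 0$ and we are interested in the behaviour of this function as $h \to 0$ we can assume $|h|$
is much smaller than $x$. This means $x + h > 0$ and so $|x + h| = x + h$.
\begin{align*}
&= \lim_{h \to 0} \frac{x + h - x}{h} \\
&= \lim_{h \to 0} \frac{h}{h} = 1 \qquad \text{as expected}
\end{align*}

• Assume $x < 0$. Then
\begin{align*}
\frac{df}{dx} &= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \\
&= \lim_{h \to 0} \frac{|x + h| - |x|}{h}
\end{align*}
Since $x < 0$ and we are interested in the behaviour of this function as $h \to 0$ we can assume $|h|$
is much smaller than $|x|$. This means $x + h < 0$ and so $|x + h| = -(x + h)$.
\begin{align*}
&= \lim_{h \to 0} \frac{-(x + h) - (-x)}{h} \\
&= \lim_{h \to 0} \frac{-h}{h} = -1
\end{align*}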


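• When $x = 0$ we have
\begin{align*}
f'(0) &= \lim_{h \to 0} \frac{f(0 + h) - f(0)}{h} \\
&= \lim_{h \to 0} \frac{|0 + h| - |0|}{h} \\
&= \lim_{h \to 0} \frac{|h|}{h}
\end{align*}

To proceed we need to know if $h > 0$ or $h < 0$, so we must use one-sided limits. The limit
from above is:
\begin{align*}
\lim_{h \to 0^+} \frac{|h|}{h} &= \lim_{h \to 0^+} \frac{h}{h} && \text{since } h > 0,\ |h| = h \\
&= 1
\end{align*}
Whereas, the limit from below is:
\begin{align*}
\lim_{h \to 0^-} \frac{|h|}{h} &= \lim_{h \to 0^-} \frac{-h}{h} && \text{since } h < 0,\ |h| = -h \\
&= -1
\end{align*}
Since the one-sided limits differ, the limit as $h \to 0$ does not exist. And thus the derivative
does not exist at $x = 0$.
In summary:
\[ \frac{d}{dx}|x| = \begin{cases} -1 & \text{if } x < 0 \\ \text{DNE} & \text{if } x = 0 \\ 1 & \text{if } x > 0 \end{cases} \]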

Example 3.3.15
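
The disagreement between the one-sided limits at $x = 0$ is easy to see numerically. The following sketch uses Python's built-in `abs`; `difference_quotient` is a hypothetical helper name and the step sizes are arbitrary.

```python
def difference_quotient(f, a, h):
    """The quotient (f(a + h) - f(a)) / h from the definition of the derivative."""
    return (f(a + h) - f(a)) / h


for h in [0.1, 0.01, 0.001]:
    right = difference_quotient(abs, 0, h)    # h > 0: approaching from above
    left = difference_quotient(abs, 0, -h)    # h < 0: approaching from below
    print(h, right, left)
# each line shows right = 1.0 and left = -1.0
```

No matter how small the step, the two sides never agree, so there is no single limiting slope at $x = 0$.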

§§ Where is the derivative undefined?


According to Definition 3.3.3, the derivative $f'(a)$ exists precisely when the limit $\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$
exists. That limit is also the slope of the tangent line to the curve $y = f(x)$ at $x = a$. That limit does
not exist when the curve $y = f(x)$ does not have a tangent line at $x = a$ or when the curve does have
a tangent line, but the tangent line has infinite slope. We have already seen some examples of this.


• In Example 3.3.12, we considered the function $f(x) = \frac{1}{x}$. This function “blows up” (i.e.
becomes infinite) at $x = 0$. It does not have a tangent line at $x = 0$ and its derivative does not
exist at $x = 0$.

• In Example 3.3.15, we considered the function $f(x) = |x|$. This function does not have a
tangent line at $x = 0$, because there is a sharp corner in the graph of $y = |x|$ at $x = 0$. (Look at
the graph in Example 2.2.10.) So the derivative of $f(x) = |x|$ does not exist at $x = 0$.
Here are a few more examples.
Example 3.3.16
Visually, the function
\[ H(x) = \begin{cases} 0 & \text{if } x \le 0 \\ 1 & \text{if } x > 0 \end{cases} \]
does not have a tangent line at $(0, 0)$.

[Figure: the graph of $y = H(x)$.]

Not surprisingly, when $a = 0$ and $h$ tends to 0 with $h > 0$,

\[ \frac{H(a + h) - H(a)}{h} = \frac{H(h) - H(0)}{h} = \frac{1}{h} \]
blows up. The same sort of computation shows that $f'(a)$ cannot possibly exist whenever the
function $f$ is not continuous at $a$. We will formalize, and prove, this statement in Theorem 3.3.19,
below.
Example 3.3.16

Example 3.3.17 ($\frac{d}{dx} x^{1/3}$)
Visually, it looks like the function $f(x) = x^{1/3}$, sketched below, has the y–axis as its tangent line at $(0, 0)$. (This might be a good point to recall that cube roots of negative numbers are negative: for example, since $(-1)^3 = -1$, the cube root of $-1$ is $-1$.)

[Figure: graph of $y = x^{1/3}$]

So we would expect that $f'(0)$ does not exist. Let’s check. With $a = 0$,
\[ f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} = \lim_{h \to 0} \frac{f(h) - f(0)}{h} = \lim_{h \to 0} \frac{h^{1/3}}{h} = \lim_{h \to 0} \frac{1}{h^{2/3}} = \text{DNE} \]

as expected.


Example 3.3.17
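To see the vertical tangent numerically, the following Python sketch (our own illustration, not from the original text; `cube_root` and `quotient_at_zero` are hypothetical helper names) evaluates the difference quotient $\frac{h^{1/3}}{h} = \frac{1}{h^{2/3}}$ at $a = 0$ for shrinking $h$.

```python
import math

def cube_root(x):
    """Real cube root, negative for negative x."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def quotient_at_zero(h):
    """Difference quotient (f(0 + h) - f(0)) / h for f(x) = x^(1/3)."""
    return (cube_root(h) - cube_root(0.0)) / h

# The quotient grows without bound as h shrinks, from either side of 0.
values = [quotient_at_zero(10.0 ** -k) for k in (3, 6, 9)]
print(values)
print(quotient_at_zero(-1e-6))
```

The slopes keep growing as $h$ shrinks, and they are positive for both signs of $h$, which matches the picture of a tangent line that stands straight up.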

Example 3.3.18 ($\frac{d}{dx} \sqrt{|x|}$)
We have already considered the derivative of the function $\sqrt{x}$ in Example 3.3.14. We’ll now look at the function $f(x) = \sqrt{|x|}$. Recall, from Example 3.3.15, the definition of $|x|$. When $x > 0$, we have $|x| = x$ and $f(x)$ is identical to $\sqrt{x}$. When $x < 0$, we have $|x| = -x$ and $f(x) = \sqrt{-x}$. So to graph $y = \sqrt{|x|}$ when $x < 0$, you just have to graph $y = \sqrt{x}$ for $x > 0$ and then send $x \to -x$, i.e. reflect the graph in the y–axis. Here is the graph. The pointy thing at the origin is called a cusp.

[Figure: graph of $y = \sqrt{|x|}$]

The graph of $y = f(x)$ does not have a tangent line at $(0, 0)$ and, correspondingly, $f'(0)$ does not exist because
\[ \lim_{h \to 0^+} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^+} \frac{\sqrt{|h|}}{h} = \lim_{h \to 0^+} \frac{1}{\sqrt{h}} = \text{DNE} \]

Example 3.3.18

Theorem 3.3.19.

If the function f (x) is differentiable at x = a, then f (x) is also continuous at x = a.

Proof. The function $f(x)$ is continuous at $x = a$ if and only if the limit of
\[ f(a+h) - f(a) = \frac{f(a+h) - f(a)}{h} \, h \]
as $h \to 0$ exists and is zero. But if $f(x)$ is differentiable at $x = a$, then, as $h \to 0$, the first factor, $\frac{f(a+h) - f(a)}{h}$, converges to $f'(a)$ and the second factor, $h$, converges to zero. So the product provision of our arithmetic of limits Theorem 2.1.14 implies that the product $\frac{f(a+h) - f(a)}{h} \, h$ converges to $f'(a) \cdot 0 = 0$ too.

Notice that while this theorem is useful as stated, it is (arguably) more often applied in its
contrapositive12 form:

12 If you have forgotten what the contrapositive is, then quickly reread Footnote 3 in Section 2.1.


Theorem 3.3.20 (The contrapositive of Theorem 3.3.19).

If f (x) is not continuous at x = a then it is not differentiable at x = a.

As the above examples illustrate, this statement does not tell us what happens if f is continuous
at x = a — we have to think!

3.3.5 §§ Instantaneous rate of change


In the previous sections we defined the derivative as the slope of a tangent line, using a particular
limit. This allows us to compute “the slope of a curve13 ” and provides us with one interpretation of
the derivative. However, the main importance of derivatives does not come from this application.
Instead, (arguably) it comes from the interpretation of the derivative as the instantaneous rate of
change of a quantity.
Just as the average rate of change can represent the average velocity, when we talk about the
instantaneous rate of change we might specify it as the instantaneous velocity (if the function is
distance with respect to time). That’s what we do in this example.
Example 3.3.21
You drop a ball from a tall building. After $t$ seconds the ball has fallen a distance of $s(t) = 4.9t^2$ metres. What is the velocity of the ball one second after it is dropped?

• In the time interval from $t = 1$ to $t = 1 + h$ the ball travels a distance
\[ s(1+h) - s(1) = 4.9(1+h)^2 - 4.9(1)^2 = 4.9\left[2h + h^2\right] \]

• So the average velocity over this time interval is
\begin{align*}
\text{average velocity from } t = 1 \text{ to } t = 1 + h
&= \frac{\text{distance travelled from } t = 1 \text{ to } t = 1 + h}{\text{length of time from } t = 1 \text{ to } t = 1 + h} \\
&= \frac{s(1+h) - s(1)}{h} \\
&= \frac{4.9\left[2h + h^2\right]}{h} \\
&= 4.9[2 + h]
\end{align*}

13 Again — recall that we are being a little sloppy with this term — we really mean “The slope of the tangent line to
the curve”.


• The instantaneous velocity at time $t = 1$ is then defined to be the limit
\begin{align*}
\text{instantaneous velocity at time } t = 1
&= \lim_{h \to 0} \left[\text{average velocity from } t = 1 \text{ to } t = 1 + h\right] \\
&= \lim_{h \to 0} \frac{s(1+h) - s(1)}{h} = s'(1) \\
&= \lim_{h \to 0} 4.9[2 + h] \\
&= 9.8 \text{ m/sec}
\end{align*}

• We conclude that the instantaneous velocity at time $t = 1$, which is the instantaneous rate of change of distance per unit time at time $t = 1$, is the derivative $s'(1) = 9.8$ m/sec.

Example 3.3.21
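The passage from average to instantaneous velocity can also be watched numerically. The Python sketch below is our own illustration (the function names `s` and `average_velocity` are chosen here, not taken from the text). It compares the computed average velocity over $[1, 1+h]$ with the simplified expression $4.9[2 + h]$ from the example.

```python
def s(t):
    """Distance fallen (in metres) after t seconds, as in Example 3.3.21."""
    return 4.9 * t ** 2

def average_velocity(t0, h):
    """Average velocity over the time interval from t0 to t0 + h."""
    return (s(t0 + h) - s(t0)) / h

for h in (1.0, 0.1, 0.01, 0.001):
    # The two printed values should agree up to rounding error.
    print(h, average_velocity(1.0, h), 4.9 * (2 + h))
```

As $h$ shrinks, both columns approach $9.8$, the value of $s'(1)$ found above.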

Now suppose, more generally, that you are taking a walk and that as you walk, you are continu-
ously measuring some quantity, like temperature, and that the measurement at time t is f (t ). Then
the

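\begin{align*}
\text{average rate of change of } f(t) \text{ from } t = a \text{ to } t = a + h
&= \frac{\text{change in } f(t) \text{ from } t = a \text{ to } t = a + h}{\text{length of time from } t = a \text{ to } t = a + h} \\
&= \frac{f(a+h) - f(a)}{h}
\end{align*}

so the

\begin{align*}
\text{instantaneous rate of change of } f(t) \text{ at } t = a
&= \lim_{h \to 0} \left[\text{average rate of change of } f(t) \text{ from } t = a \text{ to } t = a + h\right] \\
&= \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} \\
&= f'(a)
\end{align*}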

In particular, if you are walking along the x–axis and your x–coordinate at time $t$ is $x(t)$, then $x'(a)$ is the instantaneous rate of change (per unit time) of your x–coordinate at time $t = a$, which is your velocity at time $a$. If $v(t)$ is your velocity at time $t$, then $v'(a)$ is the instantaneous rate of change of your velocity at time $a$. This is called your acceleration at time $a$.
You might expect that if the instantaneous rate of change of a function at time $c$ is strictly positive, then, in some sense, the function is increasing at $t = c$. You would be right. Indeed, if $f'(c) > 0$, then, by definition, the limit of $\frac{f(t) - f(c)}{t - c}$ as $t$ approaches $c$ is strictly bigger than zero. So


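• for all $t > c$ that are sufficiently close (see footnote 14) to $c$
\begin{align*}
\frac{f(t) - f(c)}{t - c} > 0 &\implies f(t) - f(c) > 0 \qquad (\text{since } t - c > 0) \\
&\implies f(t) > f(c)
\end{align*}

• for all $t < c$ that are sufficiently close to $c$
\begin{align*}
\frac{f(t) - f(c)}{t - c} > 0 &\implies f(t) - f(c) < 0 \qquad (\text{since } t - c < 0) \\
&\implies f(t) < f(c)
\end{align*}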

Consequently we say that “ f (t ) is increasing at t = c”. If we wish to emphasise that the inequalities
above are the strict inequalities ą and ă, as opposed to ě and ď, we will say that “ f (t ) is strictly
increasing at t = c”.

3.4 IJ Higher order derivatives

Learning Objectives
• Understand what is meant by ‘higher-order derivatives,’ and compute them.

The operation of differentiation takes as input one function, $f(x)$, and produces as output another function, $f'(x)$. Now $f'(x)$ is once again a function. So we can differentiate it again, assuming that it is differentiable, to create a third function, called the second derivative of $f$. And we can differentiate the second derivative again to create a fourth function, called the third derivative of $f$.
And so on.

Notation 3.4.1.

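• $f''(x)$ and $f^{(2)}(x)$ and $\frac{d^2 f}{dx^2}(x)$ all mean $\frac{d}{dx}\left(\frac{d}{dx} f(x)\right)$

• $f'''(x)$ and $f^{(3)}(x)$ and $\frac{d^3 f}{dx^3}(x)$ all mean $\frac{d}{dx}\left(\frac{d}{dx}\left(\frac{d}{dx} f(x)\right)\right)$

• $f^{(4)}(x)$ and $\frac{d^4 f}{dx^4}(x)$ both mean $\frac{d}{dx}\left(\frac{d}{dx}\left(\frac{d}{dx}\left(\frac{d}{dx} f(x)\right)\right)\right)$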

• and so on.

14 This is typical mathematician speak — it allows us to be completely correct, without being terribly precise. In this
context, “sufficiently close” means “The following need not be true for all t bigger than c, but there must exist
some b ą c so that the following is true for all c ă t ă b”. Typically we do not know what b is. And typically it
does not matter what the exact value of b is. All that matters is that b exists and is strictly bigger than c.


Here is a simple example.


Example 3.4.2
Let $n$ be a natural number and let $f(x) = x^n$. Then
\begin{align*}
\frac{d}{dx} x^n &= n x^{n-1} \\
\frac{d^2}{dx^2} x^n &= \frac{d}{dx}\left(n x^{n-1}\right) = n(n-1) x^{n-2} \\
\frac{d^3}{dx^3} x^n &= \frac{d}{dx}\left(n(n-1) x^{n-2}\right) = n(n-1)(n-2) x^{n-3}
\end{align*}
Each time we differentiate, we bring down the exponent, which is exactly one smaller than the previous exponent brought down, and we reduce the exponent by one. By the time we have differentiated $n - 1$ times, the exponent has decreased to $n - (n-1) = 1$ and we have brought down the factors $n(n-1)(n-2) \cdots 2$. So
\[ \frac{d^{n-1}}{dx^{n-1}} x^n = n(n-1)(n-2) \cdots 2 \, x \]
and
\[ \frac{d^n}{dx^n} x^n = n(n-1)(n-2) \cdots 1 \]
The product of the first $n$ natural numbers, $1 \cdot 2 \cdot 3 \cdots n$, is called “$n$ factorial” and is denoted $n!$. So we can also write
\[ \frac{d^n}{dx^n} x^n = n! \]
If $m > n$, then
\[ \frac{d^m}{dx^m} x^n = 0 \]

Example 3.4.2
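The pattern in this example is mechanical enough to automate. Here is a small Python sketch (our own, with hypothetical helper names) that represents a monomial $c\,x^p$ by the pair `(c, p)` and applies the rule $\frac{d}{dx} c\,x^p = c\,p\,x^{p-1}$ repeatedly.

```python
from math import factorial

def differentiate_monomial(coeff, power):
    """Return (coefficient, exponent) of the derivative of coeff * x**power."""
    if power == 0:
        return (0, 0)  # the derivative of a constant is zero
    return (coeff * power, power - 1)

def kth_derivative_of_power(n, k):
    """Differentiate x**n a total of k times; return (coefficient, exponent)."""
    term = (1, n)
    for _ in range(k):
        term = differentiate_monomial(*term)
    return term

print(kth_derivative_of_power(5, 2))  # coefficient n(n-1), exponent n-2
print(kth_derivative_of_power(5, 5), factorial(5))
print(kth_derivative_of_power(5, 6))  # one derivative too many: zero
```

With $n = 5$, the fifth derivative has coefficient $5! = 120$ and exponent $0$, and any further derivative is zero, in line with the formulas above.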

What is the significance of higher order derivatives?


Here is a bit of thinking about second order derivatives in the context of rates of change.
Example 3.4.3
Recall that the derivative $v'(a)$ is the (instantaneous) rate of change of the function $v(t)$ at $t = a$. Suppose that you are walking on the x–axis and that $x(t)$ is your x–coordinate at time $t$. Also suppose, for simplicity, that you are moving from left to right. Then $v(t) = x'(t)$ is your velocity at time $t$ and $v'(a) = x''(a)$ is the rate at which your velocity is changing at time $t = a$. It is called your acceleration. In particular, if $x''(a) > 0$, then your velocity is increasing, i.e. you are speeding up, at time $a$. If $x''(a) < 0$, then your velocity is decreasing, i.e. you are slowing down, at time $a$. That’s one interpretation of the second derivative.
Example 3.4.3


3.5 IJ Derivatives of exponential functions

Learning Objectives
• Use the definition of the derivative to show that the derivative of the function $f(x) = a^x$ (where $a$ is a positive constant) is a constant times $a^x$.

• Describe the exponential function $e^x$ in terms of its derivative.

• Note the useful modelling power of a function whose derivative is proportional to itself.

In this section we show how to compute the derivative of the exponential function. Let $a > 0$ (see footnote 15) and set $f(x) = a^x$; this is what is known as an exponential function with base $a$. This function interacts very nicely with its derivative and turns up in many “real world” examples.
Let’s see what happens when we try to compute the derivative of this function just using the definition of the derivative.
\begin{align*}
\frac{df}{dx} &= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h \to 0} \frac{a^{x+h} - a^x}{h} \\
&= \lim_{h \to 0} a^x \cdot \frac{a^h - 1}{h} = a^x \cdot \lim_{h \to 0} \frac{a^h - 1}{h}
\end{align*}
We cannot yet complete this computation because we cannot evaluate the last limit directly. For the moment, let us assume this limit exists and name it
\[ C(a) = \lim_{h \to 0} \frac{a^h - 1}{h}. \]
It depends only on $a$ and is completely independent of $x$ (the variable $h$ disappears once the limit is taken). Using this notation (which we will quickly improve upon below), our desired derivative is now
\[ \frac{d}{dx} a^x = C(a) \cdot a^x. \]
Thus the derivative of an exponential function $a^x$ is just $a^x$ multiplied by some constant that depends only on the base $a$. If we can tune $a$ so that $C(a) = 1$ then the derivative would just be the original function! This turns out to be very useful.
To try finding an a that obeys C (a) = 1, let us first investigate how C (a) changes with a.
Unfortunately (though this fact is not at all obvious) there is no way to write C (a) as a finite
combination of any of the functions we have examined so far16 . Instead, we’ll calculate approximate
values of C (a) by plugging in some small values of h. We’ll do this for a few values of a.

15 Letting the base be positive is necessary because we want to ensure this function is defined for all real x.
16 To be a bit more precise, we say that a number $q$ is algebraic if we can write $q$ as the zero of a polynomial with integer coefficients. When $a$ is any positive algebraic number other than 1, $C(a)$ is not algebraic. A number that is not algebraic is called transcendental. The best known example of a transcendental number is $\pi$ (which follows from the Lindemann-Weierstrass Theorem, way beyond the scope of this course).


Example 3.5.1
Let $a = 1$, then $C(1) = \lim_{h \to 0} \frac{1^h - 1}{h} = 0$. This is not surprising since $1^x = 1$ is constant, and so its derivative must be zero everywhere.
Now let $a = 2$, then $C(2) = \lim_{h \to 0} \frac{2^h - 1}{h}$. Setting $h$ to smaller and smaller numbers gives:

h               0.1      0.01     0.001    0.0001   0.00001  0.000001  0.0000001
(2^h − 1)/h     0.7177   0.6956   0.6934   0.6932   0.6931   0.6931    0.6931

So $C(2) \approx 0.6931$. (The actual value of $C(2)$ has an infinitely long decimal expansion.) Similarly when $a = 3$ we get:

h               0.1      0.01     0.001    0.0001   0.00001  0.000001  0.0000001
(3^h − 1)/h     1.1612   1.1047   1.0992   1.0987   1.0986   1.0986    1.0986

and $a = 10$:

h               0.1      0.01     0.001    0.0001   0.00001  0.000001  0.0000001
(10^h − 1)/h    2.5893   2.3293   2.3052   2.3028   2.3026   2.3026    2.3026

So $C(3) \approx 1.0986$ and $C(10) \approx 2.3026$.


From our calculations it appears that C (a) increases as we increase a, and we expect that
C (a) = 1 for some value of a between 2 and 3.
Example 3.5.1
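The tables in this example are easy to regenerate. The Python sketch below is our own illustration: `C_estimate` and `base_with_unit_constant` are hypothetical names, and the bisection step assumes (as the tables suggest, but as we have not proved) that the estimate increases with $a$. It recomputes the rows and then searches between $2$ and $3$ for the base whose estimated constant equals $1$.

```python
def C_estimate(a, h):
    """Difference-quotient estimate of C(a) = lim_{h->0} (a**h - 1) / h."""
    return (a ** h - 1) / h

# Recompute the rows of the tables (rounded to 4 decimal places).
step_sizes = [10.0 ** -k for k in range(1, 8)]
table = {a: [round(C_estimate(a, h), 4) for h in step_sizes] for a in (2, 3, 10)}
for a, row in table.items():
    print(a, row)

def base_with_unit_constant(h=1e-6, lo=2.0, hi=3.0, iterations=50):
    """Bisection for the base a in [lo, hi] with C_estimate(a, h) equal to 1.

    Assumption: C_estimate(a, h) increases with a on [lo, hi].
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if C_estimate(mid, h) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(base_with_unit_constant())
```

Because the search uses a small but nonzero $h$, the base it returns is only an approximation to the value of $a$ for which $C(a) = 1$ exactly.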

Instead of continuing to write ‘the value of a for which C (a) = 1’, this particular a is historically
given its own name: e. To find a value for e, we begin with C (e) = 1:

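\[ C(e) = \lim_{h \to 0} \frac{e^h - 1}{h} = 1. \]
This means that for small $h$,
\[ \frac{e^h - 1}{h} \approx 1, \]
so that
\[ e^h - 1 \approx h \implies e^h \approx h + 1 \implies e \approx (1 + h)^{1/h}. \]
More formally, we would write that
\[ e = \lim_{h \to 0} (1 + h)^{1/h}. \tag{3.5.1} \]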

We can find an approximate decimal expansion for $e$ by calculating the expression in Eqn. (3.5.1) for some very small (but nonzero) values of $h$.

h               0.1         0.01        0.001       0.0001      0.00001
(1 + h)^{1/h}   2.5937425   2.7048138   2.7169239   2.7181459   2.7182682


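We find (e.g. for $h = 0.00001$) that
\[ e \approx (1.00001)^{100000} \approx 2.71826, \]
which is not too bad. In fact, $e$ is called Euler’s constant (see footnote 17):

Equation 3.5.2 (Euler’s constant).

\begin{align*}
e &= 2.7182818284590452354\ldots \\
  &= 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots
\end{align*}
(See footnote 18 for the factorial notation.)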

We will be able to explain this last formula once we develop Taylor polynomials later in the
course.
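Both descriptions of $e$ given here can be compared numerically. The following Python sketch is our own (the names `e_from_limit` and `e_from_series` are hypothetical). It evaluates the expression in Eqn. (3.5.1) at a small $h$ and a partial sum of the series in Equation 3.5.2.

```python
from math import factorial

def e_from_limit(h):
    """The expression (1 + h)**(1/h) from Eqn. (3.5.1), at a small nonzero h."""
    return (1 + h) ** (1 / h)

def e_from_series(terms):
    """Partial sum 1/0! + 1/1! + ... + 1/(terms-1)! of the series for e."""
    return sum(1 / factorial(n) for n in range(terms))

for h in (0.1, 0.01, 0.001, 0.0001, 0.00001):
    print(h, e_from_limit(h))

for terms in (5, 10, 15):
    print(terms, e_from_series(terms))
```

In this experiment the series settles down far faster than the limit: fifteen terms already agree with the decimal expansion above to about twelve places, while $h = 0.00001$ in the limit gives only four or five.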
To summarise:

Theorem 3.5.3.

The constant $e$ is the unique real number that satisfies (see footnote 19)
\[ \lim_{h \to 0} \frac{e^h - 1}{h} = 1. \]
Further,
\[ \frac{d}{dx}\left(e^x\right) = e^x. \]

We plot $e^x$ in the graph below.

17 Unfortunately there is another Euler’s constant, γ, which is more properly called the Euler–Mascheroni constant.
Anyway like many mathematical discoveries, e was first found by someone else — Napier used the constant e in
order to compute logarithms but only implicitly. Bernoulli was probably the first to approximate it when examining
continuous compound interest. It first appeared explicitly in work of Leibniz, though he denoted it b. It was Euler,
though, who established the notation we now use and who showed how important the constant is to mathematics.
18 Recall $n$ factorial, written $n!$, is the product $n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$.
19 Equivalently, $e$ can be defined as $e = \lim_{h \to 0} (1 + h)^{1/h}$ or as $e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n$.


Figure 3.5.1. [Graph of $y = e^x$ for $-3 \le x \le 3$.]

And just a reminder of some of its properties (see footnote 20):

1. $e^{x+y} = e^x e^y$.

2. $e^{-x} = \frac{1}{e^x}$.

3. $\left(e^x\right)^y = e^{xy}$.

4. $e^x$ is a function that is defined, continuous, and differentiable for all real numbers $x$.

5. $e^0 = 1$, and $e^1 = e$.

6. $e^x > 0$ for all values of $x$.

7. $\lim_{x \to \infty} e^x = \infty$, $\lim_{x \to -\infty} e^x = 0$.

8. The derivative of $e^x$ is $e^x$.

Example 3.5.4
Find the derivative of $e^x$ when $x = 0$. Then show that the tangent line at that point is the line $y = x + 1$.

• The derivative of $e^x$ is $e^x$. At $x = 0$, $e^0 = 1$.

• The slope of the tangent line at $x = 0$ is the derivative of the function at that point, which we just found to be 1. The tangent line goes through the point $(0, e^0) = (0, 1)$. With slope 1 and an intercept at $(0, 1)$, the tangent line at $x = 0$ can be written in slope-intercept form as $y = x + 1$.

Example 3.5.4
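A short numerical check makes the tangency visible without a graph. This Python sketch is our own illustration (`tangent_line` and `gaps` are names chosen here). It measures the vertical gap between $y = e^x$ and the line $y = x + 1$ at a few points near $x = 0$.

```python
import math

def tangent_line(x):
    """The tangent line to y = e^x at x = 0, as found in Example 3.5.4."""
    return x + 1

# Vertical gap between the curve and the line at points near x = 0.
gaps = {x: math.exp(x) - tangent_line(x) for x in (-0.1, -0.01, 0.0, 0.01, 0.1)}
for x, gap in gaps.items():
    print(x, gap)
```

The gap is zero at $x = 0$ and shrinks much faster than $|x|$ as $x$ approaches $0$, which is the behaviour one expects of a curve and its tangent line.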

20 The function $e^x$ is of course the special case of the function $a^x$ with $a = e$. So it inherits all the usual algebraic properties of $a^x$.


To see the tangent line to the exponential function: On this graph of $f(x) = e^x$, add the tangent line $y = x + 1$. Does it touch the curve where you expect it to? As an extra step, add a generic tangent line at any point $x_0$. Adjust a slider for $x_0$ to see how the tangent line changes as it moves along the curve.

In the next chapter, we return to the problem of differentiating $a^x$. (If your curiosity is piqued,
take a look at Example 4.4.6 – although you’ll need the techniques introduced between here and
there in order to understand it.)

§§ The function $e^x$ satisfies a new kind of equation


Before closing this chapter, we divert our attention momentarily to an interesting observation. We have seen that the function
\[ y = f(x) = e^x \]
satisfies the relationship
\[ \frac{dy}{dx} = f'(x) = f(x) = y. \]
In other words, when differentiating, we get the same function back again. We can summarize this observation:

Definition 3.5.5.

The function $y = f(x) = e^x$ is equal to its own derivative, which means that it satisfies the equation
\[ \frac{dy}{dx} = y. \]
An equation linking a function and its derivative(s) is called a differential equation.

This is a new type of equation, unlike others previously seen in this course. They feature highly
in some of the later (flavoured) chapters of this text, where we show that these differential equations
have many applications to biology, physics, chemistry, and science in general.

Chapter 4

C OMPUTING D ERIVATIVES

4.1 IJ Arithmetic of derivatives - a differentiation toolbox

Learning Objectives
• Demonstrate using the limit definition of derivative that differentiation is linear.

• Use linearity to “break down” derivatives of sums and constant multiples.

• Use counterexamples to demonstrate that certain statements about derivatives are false.

• Explain why an example does not constitute a “proof”.

• Demonstrate the Power Rule for integer exponents using the limit definition of derivative.

• State and apply the Power Rule.

• Use the Product Rule to differentiate the product of functions.

• Use the Quotient Rule to differentiate the quotient of functions.

So far, we have evaluated derivatives only by applying Definition 3.3.3 to the function at hand and
then computing the required limits directly. It is quite obvious that as the function being differentiated
becomes even a little complicated, this procedure quickly becomes extremely unwieldy. It is many
orders of magnitude more efficient to have access to:

• a list of derivatives of some simple functions, and

• a collection of rules for breaking down complicated derivative computations into sequences of
simple derivative computations.


This is precisely what we did to compute limits. We started with limits of simple functions and then
used “arithmetic of limits” to compute limits of complicated functions.
We have already started building our list of derivatives of simple functions. We have shown, in
Examples 3.3.4, 3.3.5, 3.3.10 and 3.3.14, that:
\[ \frac{d}{dx} 1 = 0, \qquad \frac{d}{dx} x = 1, \qquad \frac{d}{dx} x^2 = 2x, \qquad \frac{d}{dx} \sqrt{x} = \frac{1}{2\sqrt{x}}. \]
We’ll expand this list later.
We now start building a collection of tools that help reduce the problem of computing the
derivative of a complicated function to that of computing the derivatives of a number of simple
functions. In this section we give three derivative “rules” as three separate theorems. We’ll give the
proofs of these theorems in the next section and examples of how they are used in the following
section.
As was the case for limits, derivatives interact very cleanly with addition, subtraction and
multiplication by a constant. The following result actually follows very directly from the first three
points of Theorem 2.1.14.

Lemma 4.1.1 (Derivative of sum and difference).

Let $f(x)$, $g(x)$ be differentiable functions and let $c \in \mathbb{R}$ be a constant. Then
\begin{align*}
\frac{d}{dx}\left\{ f(x) + g(x) \right\} &= f'(x) + g'(x), \\
\frac{d}{dx}\left\{ f(x) - g(x) \right\} &= f'(x) - g'(x), \\
\frac{d}{dx}\left\{ c f(x) \right\} &= c f'(x).
\end{align*}
That is, the derivative of the sum is the sum of the derivatives, and so forth.

Following this we can combine the three statements in this lemma into a single rule which
captures the “linearity of differentiation”.

Theorem 4.1.2 (Linearity of differentiation).

Again, let $f(x)$, $g(x)$ be differentiable functions, let $\alpha, \beta \in \mathbb{R}$ be constants and define the “linear combination”
\[ S(x) = \alpha f(x) + \beta g(x). \]
Then the derivative of $S(x)$ exists and is
\[ \frac{dS}{dx} = S'(x) = \alpha f'(x) + \beta g'(x). \]
Note that we can recover the three rules in the previous lemma by setting $\alpha = \beta = 1$ or $\alpha = 1, \beta = -1$ or $\alpha = c, \beta = 0$.


Unfortunately, the derivative does not act quite as simply on products or quotients. The rules for
computing derivatives of products and quotients get their own names and theorems:

Theorem 4.1.3 (The product rule).

Let $f(x)$, $g(x)$ be differentiable functions, then the derivative of the product $f(x) g(x)$ exists and is given by
\[ \frac{d}{dx}\left\{ f(x) \, g(x) \right\} = f'(x) \, g(x) + f(x) \, g'(x). \]

Before we proceed to the derivative of the ratio of two functions, it is worth noting a special
case of the product rule when g(x) = f (x). In fact, since this is a useful special case, let us call it a
corollary1 :

Corollary 4.1.4 (Derivative of a square).

Let $f(x)$ be a differentiable function, then the derivative of its square is
\[ \frac{d}{dx}\left\{ f(x)^2 \right\} = 2 f(x) \, f'(x). \]

With a little work this can be generalised to other powers — but that is best done once we
understand how to compute the derivative of the composition of two functions. That requires the
chain rule (see Theorem 4.3.2 below). But before we get to that, we need to see how to take the
derivative of a quotient of two functions.

Theorem 4.1.5 (The quotient rule).

Let $f(x)$, $g(x)$ be differentiable functions. Then the derivative of their quotient is
\[ \frac{d}{dx}\left\{ \frac{f(x)}{g(x)} \right\} = \frac{f'(x) \, g(x) - f(x) \, g'(x)}{g(x)^2}. \]
This derivative exists except at points where $g(x) = 0$.

So we have covered sums, differences, products and quotients. This allows us to compute
derivatives of many different functions — including polynomials and rational functions. However
we are still missing trigonometric functions (for example), and a rule for computing derivatives of
compositions of functions. These will follow in the near future, but there are a couple of things to
do before that: understand where the above theorems come from, and practice using them.

1 Recall that a corollary is an important result that follows from one or more theorems — typically without too much
extra work — as is the case here.


IJ Proofs of the arithmetic of derivatives


The theorems of the previous section are not too difficult to prove from the definition of the derivative
(which we know) and the arithmetic of limits (which we also know). In this section we show how to
construct these rules.
Throughout this section we will use our two functions f (x) and g(x). Since the theorems we are
going to prove all express derivatives of linear combinations, products and quotients in terms of f , g
and their derivatives, it is helpful to recall the definitions of the derivatives of f and g:

\[ f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \qquad \text{and} \qquad g'(x) = \lim_{h \to 0} \frac{g(x+h) - g(x)}{h}. \]

Our proofs, roughly speaking, involve doing algebraic manipulations to uncover the expressions that
look like the above.

§§ Proof of the linearity of differentiation (Theorem 4.1.2)


Recall that in Theorem 4.1.2 we defined $S(x) = \alpha f(x) + \beta g(x)$, where $\alpha, \beta \in \mathbb{R}$ are constants. We wish to compute $S'(x)$, so we start with the definition:
\[ S'(x) = \lim_{h \to 0} \frac{S(x+h) - S(x)}{h}. \]
Let us concentrate on the numerator of the expression inside the limit and then come back to the full limit in a moment. Substitute in the definition of $S(x)$:
\begin{align*}
S(x+h) - S(x) &= \left[ \alpha f(x+h) + \beta g(x+h) \right] - \left[ \alpha f(x) + \beta g(x) \right] && \text{collect terms} \\
&= \alpha \left[ f(x+h) - f(x) \right] + \beta \left[ g(x+h) - g(x) \right].
\end{align*}

Now it is easy to see the structures we need — namely, we almost have the expressions for the
derivatives f 1 (x) and g1 (x). Indeed, all we need to do is divide by h and take the limit. So let’s finish
things off.

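\begin{align*}
S'(x) &= \lim_{h \to 0} \frac{S(x+h) - S(x)}{h} \\
&= \lim_{h \to 0} \frac{\alpha \left[ f(x+h) - f(x) \right] + \beta \left[ g(x+h) - g(x) \right]}{h} && \text{from above} \\
&= \lim_{h \to 0} \left[ \alpha \frac{f(x+h) - f(x)}{h} + \beta \frac{g(x+h) - g(x)}{h} \right] \\
&= \alpha \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} + \beta \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} && \text{limit laws} \\
&= \alpha f'(x) + \beta g'(x),
\end{align*}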

as required.


§§ Proof of the product rule (Theorem 4.1.3)


After the warm-up above, we will just jump straight in. Let P(x) = f (x) g(x), the product of our
two functions. The derivative of the product is given by

\[ P'(x) = \lim_{h \to 0} \frac{P(x+h) - P(x)}{h} \]
Again we will focus on the numerator inside the limit and massage it into the form we need. To simplify these manipulations, define
\[ F(h) = \frac{f(x+h) - f(x)}{h} \qquad \text{and} \qquad G(h) = \frac{g(x+h) - g(x)}{h}. \]
Then we can write
\[ f(x+h) = f(x) + h F(h) \qquad \text{and} \qquad g(x+h) = g(x) + h G(h). \]
We can also write
\[ f'(x) = \lim_{h \to 0} F(h) \qquad \text{and} \qquad g'(x) = \lim_{h \to 0} G(h). \]
So back to that numerator:
\begin{align*}
P(x+h) - P(x) &= f(x+h) \cdot g(x+h) - f(x) \cdot g(x) && \text{substitute} \\
&= \left[ f(x) + h F(h) \right] \left[ g(x) + h G(h) \right] - f(x) \cdot g(x) && \text{expand} \\
&= f(x) g(x) + f(x) \cdot h G(h) + h F(h) \cdot g(x) + h^2 F(h) \cdot G(h) - f(x) \cdot g(x) \\
&= f(x) \cdot h G(h) + h F(h) \cdot g(x) + h^2 F(h) \cdot G(h).
\end{align*}

Armed with this we return to the definition of the derivative:


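\begin{align*}
P'(x) &= \lim_{h \to 0} \frac{P(x+h) - P(x)}{h} \\
&= \lim_{h \to 0} \frac{f(x) \cdot h G(h) + h F(h) \cdot g(x) + h^2 F(h) \cdot G(h)}{h} \\
&= \lim_{h \to 0} \left( \frac{f(x) \cdot h G(h)}{h} \right) + \lim_{h \to 0} \left( \frac{h F(h) \cdot g(x)}{h} \right) + \lim_{h \to 0} \left( \frac{h^2 F(h) \cdot G(h)}{h} \right) \\
&= \lim_{h \to 0} \left( f(x) \cdot G(h) \right) + \lim_{h \to 0} \left( F(h) \cdot g(x) \right) + \lim_{h \to 0} \left( h F(h) \cdot G(h) \right).
\end{align*}
Now since $f(x)$ and $g(x)$ do not change as we send $h$ to zero, we can pull them outside. We can also write the third term as the product of 3 limits:
\begin{align*}
P'(x) &= f(x) \left( \lim_{h \to 0} G(h) \right) + g(x) \left( \lim_{h \to 0} F(h) \right) + \left( \lim_{h \to 0} h \right) \cdot \left( \lim_{h \to 0} F(h) \right) \cdot \left( \lim_{h \to 0} G(h) \right) \\
&= f(x) \cdot g'(x) + g(x) \cdot f'(x) + 0 \cdot f'(x) \cdot g'(x) \\
&= f(x) \cdot g'(x) + g(x) \cdot f'(x).
\end{align*}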

And so we recover the product rule.
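The decomposition used in this proof can be checked on a concrete pair of functions. In the Python sketch below, which is our own illustration, the choices $f(x) = x^2$ and $g(x) = 4x + 7$ are hypothetical examples, as is the helper name `pieces`. It compares the difference quotient of $P$ with the three-term expression $f(x) G(h) + F(h) g(x) + h F(h) G(h)$ and shows the last term fading as $h \to 0$.

```python
def f(x):
    return x ** 2        # hypothetical example function, f'(x) = 2x

def g(x):
    return 4 * x + 7     # hypothetical example function, g'(x) = 4

def P(x):
    return f(x) * g(x)

def pieces(x, h):
    """Difference quotient of P, its three-term decomposition, and the h*F*G term."""
    F = (f(x + h) - f(x)) / h
    G = (g(x + h) - g(x)) / h
    quotient = (P(x + h) - P(x)) / h
    decomposition = f(x) * G + F * g(x) + h * F * G
    return quotient, decomposition, h * F * G

x = 1.5
exact = 2 * x * g(x) + f(x) * 4   # f'(x) g(x) + f(x) g'(x)
for h in (0.1, 0.01, 0.001):
    quotient, decomposition, leftover = pieces(x, h)
    print(h, quotient, decomposition, leftover, exact)
```

The first two printed columns agree up to rounding for every $h$ (the decomposition is an algebraic identity), while the leftover term $h F(h) G(h)$ shrinks in proportion to $h$, leaving the product rule value.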


§§ (optional) — Proof of the quotient rule (Theorem 4.1.5)


We now give the proof of the quotient rule in two steps (see footnote 2). We assume throughout that $g(x) \ne 0$ and that $f(x)$ and $g(x)$ are differentiable, meaning that the limits defining $f'(x)$, $g'(x)$ exist.

• In the first step, we prove the quotient rule under the assumption that $f(x)/g(x)$ is differentiable.

• In the second step, we prove that $1/g(x)$ is differentiable. Once we know that $1/g(x)$ is differentiable, the product rule implies that $f(x)/g(x)$ is differentiable.

Step 1: the proof of the quotient rule assuming that $\frac{f(x)}{g(x)}$ is differentiable. Write $Q(x) = \frac{f(x)}{g(x)}$. Then $f(x) = g(x) \, Q(x)$ so that $f'(x) = g'(x) \, Q(x) + g(x) \, Q'(x)$, by the product rule, and
\begin{align*}
Q'(x) &= \frac{f'(x) - g'(x) \, Q(x)}{g(x)} = \frac{f'(x) - g'(x) \frac{f(x)}{g(x)}}{g(x)} \\
&= \frac{f'(x) g(x) - f(x) g'(x)}{g(x)^2}.
\end{align*}

Step 2: the proof that $1/g(x)$ is differentiable. By definition
\begin{align*}
\frac{d}{dx} \frac{1}{g(x)} &= \lim_{h \to 0} \frac{1}{h} \left[ \frac{1}{g(x+h)} - \frac{1}{g(x)} \right] = \lim_{h \to 0} \frac{g(x) - g(x+h)}{h \, g(x) \, g(x+h)} \\
&= - \lim_{h \to 0} \frac{1}{g(x)} \, \frac{1}{g(x+h)} \, \frac{g(x+h) - g(x)}{h} \\
&= - \frac{1}{g(x)} \lim_{h \to 0} \frac{1}{g(x+h)} \, \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} \\
&= - \frac{1}{g(x)^2} \, g'(x).
\end{align*}
The last step uses the fact that $g$, being differentiable, is continuous at $x$ (Theorem 3.3.19), so $g(x+h) \to g(x) \ne 0$ as $h \to 0$.

IJ Using the arithmetic of derivatives: Examples


In this section we illustrate the computation of derivatives using the arithmetic of derivatives —
Theorems 4.1.2, 4.1.3 and 4.1.5. To make it clear which rules we are using during the examples we
will note which theorem we are using:

• LIN to stand for “linearity”: $\frac{d}{dx}\left\{ \alpha f(x) + \beta g(x) \right\} = \alpha f'(x) + \beta g'(x)$ (Theorem 4.1.2)

• PR to stand for “product rule”: $\frac{d}{dx}\left\{ f(x) \, g(x) \right\} = f'(x) \, g(x) + f(x) \, g'(x)$ (Theorem 4.1.3)

• QR to stand for “quotient rule”: $\frac{d}{dx}\left\{ \frac{f(x)}{g(x)} \right\} = \frac{f'(x) \, g(x) - f(x) \, g'(x)}{g(x)^2}$ (Theorem 4.1.5)

2 We thank Serban Raianu for suggesting this approach.


We’ll start with a really easy example.


Example 4.1.6

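\begin{align*}
\frac{d}{dx}\left\{ 4x + 7 \right\} &= 4 \cdot \frac{d}{dx}\left\{ x \right\} + 7 \cdot \frac{d}{dx}\left\{ 1 \right\} && \text{LIN} \\
&= 4 \cdot 1 + 7 \cdot 0 = 4
\end{align*}
where we have used LIN with $f(x) = x$, $g(x) = 1$, $\alpha = 4$, $\beta = 7$.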
Example 4.1.6

Example 4.1.7
Continuing on from the previous example, we can use the product rule and the previous result to compute
\begin{align*}
\frac{d}{dx}\left\{ x(4x + 7) \right\} &= x \cdot \frac{d}{dx}\left\{ 4x + 7 \right\} + (4x + 7) \frac{d}{dx}\left\{ x \right\} && \text{PR} \\
&= x \cdot 4 + (4x + 7) \cdot 1 \\
&= 8x + 7
\end{align*}
where we have used the product rule PR with $f(x) = x$ and $g(x) = 4x + 7$.
Example 4.1.7

Example 4.1.8
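In the same vein as the previous example, we can use the quotient rule to compute
\begin{align*}
\frac{d}{dx}\left\{ \frac{x}{4x + 7} \right\} &= \frac{(4x + 7) \cdot \frac{d}{dx}\left\{ x \right\} - x \cdot \frac{d}{dx}\left\{ 4x + 7 \right\}}{(4x + 7)^2} && \text{QR} \\
&= \frac{(4x + 7) \cdot 1 - x \cdot 4}{(4x + 7)^2} \\
&= \frac{7}{(4x + 7)^2}
\end{align*}
where we have used the quotient rule QR with $f(x) = x$ and $g(x) = 4x + 7$.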
Example 4.1.8
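A symmetric difference quotient gives a cheap sanity check on the answers to Examples 4.1.6, 4.1.7 and 4.1.8. The Python sketch below is our own illustration, not a method from the text: `central_difference` is a hypothetical helper, and the step size and sample points are arbitrary choices.

```python
def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

examples = [
    # (label, function, derivative claimed in the example)
    ("4.1.6", lambda x: 4 * x + 7, lambda x: 4.0),
    ("4.1.7", lambda x: x * (4 * x + 7), lambda x: 8 * x + 7),
    ("4.1.8", lambda x: x / (4 * x + 7), lambda x: 7 / (4 * x + 7) ** 2),
]

for label, func, claimed in examples:
    for x in (0.0, 1.0, 2.5):
        # The last two printed numbers should agree to several decimal places.
        print(label, x, central_difference(func, x), claimed(x))
```

Agreement at a handful of points is not a proof, but a disagreement would immediately flag an algebra slip.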

Now for a messier example.


Example 4.1.9
Differentiate:
\[ f(x) = \frac{x}{2x + \frac{1}{3x + 1}}. \]
This problem looks nasty. But it isn’t so hard if we just build it up a bit at a time.


• First, $f(x)$ is the ratio of
\[ f_1(x) = x \qquad \text{and} \qquad f_2(x) = 2x + \frac{1}{3x + 1}. \]
If we can find the derivatives of $f_1(x)$ and $f_2(x)$, we will be able to get the derivative of $f(x)$ just by applying the quotient rule. The derivative, $f_1'(x) = 1$, of $f_1(x)$ is easy, so let’s work on $f_2(x)$.

• The function $f_2(x)$ is the linear combination
\[ f_2(x) = 2 f_3(x) + f_4(x) \qquad \text{with} \qquad f_3(x) = x \quad \text{and} \quad f_4(x) = \frac{1}{3x + 1}. \]
If we can find the derivatives of $f_3(x)$ and $f_4(x)$, we will be able to get the derivative of $f_2(x)$ just by applying linearity (Theorem 4.1.2). The derivative, $f_3'(x) = 1$, of $f_3(x)$ is easy. So let’s work on $f_4(x)$.

• The function $f_4(x)$ is the ratio
\[ f_4(x) = \frac{1}{f_5(x)} \qquad \text{with} \qquad f_5(x) = 3x + 1. \]
If we can find the derivative of $f_5(x)$, we will be able to get the derivative of $f_4(x)$ by applying the quotient rule to $\frac{1}{f_5(x)}$ (see footnote 3). The derivative of $f_5(x)$ is easy.

• So we have completed breaking down $f(x)$ into easy pieces. It is now just a matter of reversing the break down steps, putting everything back together, starting with the easy pieces and working up to $f(x)$. Here goes.
\begin{align*}
f_5(x) &= 3x + 1 & \text{so} \quad \frac{d}{dx} f_5(x) &= 3 \frac{d}{dx} x + \frac{d}{dx} 1 = 3 \cdot 1 + 0 = 3 && \text{LIN} \\
f_4(x) &= \frac{1}{f_5(x)} & \text{so} \quad \frac{d}{dx} f_4(x) &= - \frac{f_5'(x)}{f_5(x)^2} = - \frac{3}{(3x + 1)^2} && \text{QR} \\
f_2(x) &= 2 f_3(x) + f_4(x) & \text{so} \quad \frac{d}{dx} f_2(x) &= 2 f_3'(x) + f_4'(x) = 2 - \frac{3}{(3x + 1)^2} && \text{LIN} \\
f(x) &= \frac{f_1(x)}{f_2(x)} & \text{so} \quad \frac{d}{dx} f(x) &= \frac{f_1'(x) f_2(x) - f_1(x) f_2'(x)}{f_2(x)^2} && \text{QR} \\
& & &= \frac{1 \left[ 2x + \frac{1}{3x + 1} \right] - x \left[ 2 - \frac{3}{(3x + 1)^2} \right]}{\left[ 2x + \frac{1}{3x + 1} \right]^2}
\end{align*}
Oof!

3 This is an instance of a special case of the quotient rule (Theorem 4.1.5) which is obtained by setting $f(x) = 1$. You might see this defined elsewhere as “the derivative of a reciprocal”. It can be stated as: Let $g(x)$ be a differentiable function. Then the derivative of the reciprocal of $g$ is given by
\[ \frac{d}{dx}\left\{ \frac{1}{g(x)} \right\} = - \frac{g'(x)}{g(x)^2} \]
and exists except at those points where $g(x) = 0$.


• We now have an answer. But we really should clean it up, not only to make it easier to read, but also because invariably such computations are just small steps inside much larger computations. Any future computations involving this expression will be a lot easier and less error prone if we clean it up now. Cancelling the $2x$ and the $-2x$ in
\begin{align*}
1 \left[ 2x + \frac{1}{3x + 1} \right] - x \left[ 2 - \frac{3}{(3x + 1)^2} \right] &= 2x + \frac{1}{3x + 1} - 2x + \frac{3x}{(3x + 1)^2} \\
&= \frac{1}{3x + 1} + \frac{3x}{(3x + 1)^2}
\end{align*}
and multiplying both the numerator and denominator by $(3x + 1)^2$ gives
\begin{align*}
f'(x) &= \frac{\frac{1}{3x + 1} + \frac{3x}{(3x + 1)^2}}{\left[ 2x + \frac{1}{3x + 1} \right]^2} \cdot \frac{(3x + 1)^2}{(3x + 1)^2} \\
&= \frac{(3x + 1) + 3x}{\left[ 2x(3x + 1) + 1 \right]^2} \\
&= \frac{6x + 1}{\left[ 6x^2 + 2x + 1 \right]^2}.
\end{align*}

Example 4.1.9
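Since the clean-up in this example involves several algebraic steps, it is worth checking that the simplified and unsimplified answers agree with each other and with a numerical derivative. The Python sketch below is our own illustration (all helper names are hypothetical, and the sample points are arbitrary, chosen away from $x = -\frac{1}{3}$ where $f$ is undefined).

```python
def f(x):
    return x / (2 * x + 1 / (3 * x + 1))

def f_prime_unsimplified(x):
    """The quotient-rule answer before clean-up."""
    f2 = 2 * x + 1 / (3 * x + 1)
    f2_prime = 2 - 3 / (3 * x + 1) ** 2
    return (1 * f2 - x * f2_prime) / f2 ** 2

def f_prime_simplified(x):
    """The cleaned-up answer (6x + 1) / (6x^2 + 2x + 1)^2."""
    return (6 * x + 1) / (6 * x ** 2 + 2 * x + 1) ** 2

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for the derivative."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (0.0, 0.5, 1.0, 2.0):
    print(x, f_prime_unsimplified(x), f_prime_simplified(x), central_difference(f, x))
```

At each sample point the three printed values should agree to several decimal places.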

While the linearity theorem (Theorem 4.1.2) is stated for a linear combination of two functions,
it is not difficult to extend it to linear combinations of three or more functions as the following
example shows.
Example 4.1.10
We’ll start by generalising linearity to three functions.

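\begin{align*}
\frac{d}{dx}\big\{aF(x) + bG(x) + cH(x)\big\}
 &= \frac{d}{dx}\big\{a\cdot[F(x)] + 1\cdot[bG(x) + cH(x)]\big\}\\
 &= aF'(x) + \frac{d}{dx}\big\{bG(x) + cH(x)\big\}\\
 &\qquad\text{by LIN with } \alpha = a,\ f(x) = F(x),\ \beta = 1, \text{ and } g(x) = bG(x) + cH(x),\\
 &= aF'(x) + bG'(x) + cH'(x)\\
 &\qquad\text{by LIN with } \alpha = b,\ f(x) = G(x),\ \beta = c, \text{ and } g(x) = H(x).
\end{align*}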

This gives us linearity for three terms, namely (just replacing upper case names by lower case
names):

\[ \frac{d}{dx}\big\{a f(x) + b g(x) + c h(x)\big\} = a f'(x) + b g'(x) + c h'(x). \]


Just by repeating the above argument many times, we may generalise to linearity for n terms, for
any natural number n:

\[ \frac{d}{dx}\big\{a_1 f_1(x) + a_2 f_2(x) + \cdots + a_n f_n(x)\big\} = a_1 f_1'(x) + a_2 f_2'(x) + \cdots + a_n f_n'(x). \]

Example 4.1.10

Similarly, while the product rule is stated for the product of two functions, it is not difficult to
extend it to the product of three or more functions as the following example shows.
Example 4.1.11
Once again, we’ll start by generalising the product rule to three factors.

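\begin{align*}
\frac{d}{dx}\big\{F(x)\,G(x)\,H(x)\big\}
 &= F'(x)\,G(x)\,H(x) + F(x)\,\frac{d}{dx}\big\{G(x)\,H(x)\big\}\\
 &\qquad\text{by PR with } f(x) = F(x) \text{ and } g(x) = G(x)H(x)\\
 &= F'(x)\,G(x)\,H(x) + F(x)\big\{G'(x)\,H(x) + G(x)\,H'(x)\big\}\\
 &\qquad\text{by PR with } f(x) = G(x) \text{ and } g(x) = H(x).
\end{align*}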

This gives us a product rule for three factors, namely (just replacing upper case names by lower case
names)

\[ \frac{d}{dx}\big\{f(x)\,g(x)\,h(x)\big\} = f'(x)\,g(x)\,h(x) + f(x)\,g'(x)\,h(x) + f(x)\,g(x)\,h'(x). \]
Observe that when we differentiate a product of three factors, the answer is a sum of three terms and
in each term the derivative acts on exactly one of the original factors. Just by repeating the above
argument many times, we may generalise the product rule to give the derivative of a product of n
factors, for any natural number n:

\begin{align*}
\frac{d}{dx}\big\{f_1(x)\,f_2(x)\cdots f_n(x)\big\} &= f_1'(x)\,f_2(x)\cdots f_n(x)\\
 &\quad + f_1(x)\,f_2'(x)\cdots f_n(x)\\
 &\quad\ \ \vdots\\
 &\quad + f_1(x)\,f_2(x)\cdots f_n'(x).
\end{align*}

We can also write the above as

\[ \frac{d}{dx}\big\{f_1(x)\,f_2(x)\cdots f_n(x)\big\} = \left[\frac{f_1'(x)}{f_1(x)} + \frac{f_2'(x)}{f_2(x)} + \cdots + \frac{f_n'(x)}{f_n(x)}\right]\cdot f_1(x)\,f_2(x)\cdots f_n(x). \]

When we differentiate a product of n factors, the answer is a sum of n terms and in each term the
derivative acts on exactly one of the original factors. In the first term, the derivative acts on the first
of the original factors. In the second term, the derivative acts on the second of the original factors.
And so on.
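The "one factor differentiated per term" pattern translates directly into a double loop. The sketch below is illustrative only: the function product_rule and the three sample factors (x + 1, x^2 and 2x - 3) are assumptions chosen for this demonstration, not something defined in the text.

```python
def product_rule(factors, x):
    """Derivative at x of a product of n factors.

    `factors` is a list of (f, f_prime) pairs of callables. The result is a
    sum of n terms; in term j only factor j is replaced by its derivative.
    """
    total = 0.0
    for j in range(len(factors)):
        term = 1.0
        for i, (f, f_prime) in enumerate(factors):
            term *= f_prime(x) if i == j else f(x)
        total += term
    return total


# Hypothetical sample factors: (x + 1), x^2 and (2x - 3).
factors = [
    (lambda x: x + 1, lambda x: 1.0),
    (lambda x: x**2, lambda x: 2 * x),
    (lambda x: 2 * x - 3, lambda x: 2.0),
]


def expanded_derivative(x):
    """Derivative of the expanded product 2x^4 - x^3 - 3x^2, for comparison."""
    return 8 * x**3 - 3 * x**2 - 6 * x


if __name__ == "__main__":
    print(product_rule(factors, 2.0), expanded_derivative(2.0))
```

Both routes differentiate the same polynomial, so they should agree at every x.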


If we make $f_1(x) = f_2(x) = \cdots = f_n(x) = f(x)$ then each of the $n$ terms on the right hand side of
the above equation is the product of $f'(x)$ and exactly $n-1$ copies of $f(x)$, and so is exactly $f(x)^{n-1} f'(x)$.
So we get the following useful result:

\[ \frac{d}{dx} f(x)^n = n\cdot f(x)^{n-1}\cdot f'(x). \]

Example 4.1.11

This last result is quite useful, so let us write it as a lemma for future reference.

Lemma 4.1.12.

Let n be a natural number and f be a differentiable function. Then


\[ \frac{d}{dx} f(x)^n = n\cdot f(x)^{n-1}\cdot f'(x) \]

This immediately gives us another useful result.


Example 4.1.13
We can now compute the derivative of $x^n$ for any natural number $n$. Start with Lemma 4.1.12 and
substitute $f(x) = x$ and $f'(x) = 1$:

\[ \frac{d}{dx} x^n = n\cdot x^{n-1}\cdot 1 = n\,x^{n-1}. \]

Example 4.1.13

Again, this is a result we will come back to quite a few times in the future, so we should make
sure we can refer to it easily. However, at present this statement only holds when $n$ is a positive
integer. With a little more work we can extend it to the derivative of $x^q$ where $q$ is any positive rational
number, and then any rational number at all (positive or negative). So let us hold off on the general
statement for a little longer. Instead we can record the present version as a lemma, since it will be an
ingredient in quite a few of the examples following below and in constructing the final corollary.

Lemma 4.1.14 (Derivative of $x^n$).

Let $n$ be a positive integer. Then

\[ \frac{d}{dx} x^n = n x^{n-1} \tag{4.1.1} \]
dx

Back to more examples.


Example 4.1.15

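\begin{align*}
\frac{d}{dx}\big\{2x^3 + 4x^5\big\} &= 2\frac{d}{dx}\{x^3\} + 4\frac{d}{dx}\{x^5\}\\
 &\qquad\text{by LIN with } \alpha = 2,\ f(x) = x^3,\ \beta = 4, \text{ and } g(x) = x^5\\
 &= 2\{3x^2\} + 4\{5x^4\}\\
 &\qquad\text{by Lemma 4.1.14, once with } n = 3, \text{ and once with } n = 5\\
 &= 6x^2 + 20x^4.
\end{align*}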

Example 4.1.15

Example 4.1.16
In this example we'll compute $\frac{d}{dx}\big\{(3x+9)(x^2+4x^3)\big\}$ in two different ways. For the first, we'll
start with the product rule.

\begin{align*}
\frac{d}{dx}\big\{(3x+9)(x^2+4x^3)\big\}
 &= \Big\{\frac{d}{dx}(3x+9)\Big\}(x^2+4x^3) + (3x+9)\frac{d}{dx}\{x^2+4x^3\}\\
 &= \{3\times 1 + 9\times 0\}(x^2+4x^3) + (3x+9)\{2x + 4(3x^2)\}\\
 &= 3(x^2+4x^3) + (3x+9)(2x+12x^2)\\
 &= 3x^2 + 12x^3 + (6x^2 + 18x + 36x^3 + 108x^2)\\
 &= 18x + 117x^2 + 48x^3.
\end{align*}

For the second, we expand the product first and then differentiate.

\begin{align*}
\frac{d}{dx}\big\{(3x+9)(x^2+4x^3)\big\} &= \frac{d}{dx}\big\{9x^2 + 39x^3 + 12x^4\big\}\\
 &= 9(2x) + 39(3x^2) + 12(4x^3)\\
 &= 18x + 117x^2 + 48x^3.
\end{align*}

Example 4.1.16
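The two routes in Example 4.1.16 can be replayed mechanically on coefficient lists. This is a small sketch under the assumption that a polynomial is stored as a list of coefficients with the lowest degree first; the helper names poly_mul, poly_add and poly_diff are made up for this illustration.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result


def poly_add(p, q):
    """Add two polynomials, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def poly_diff(p):
    """Differentiate term by term, using d/dx x^n = n x^(n-1) and linearity."""
    return [n * a for n, a in enumerate(p)][1:]


u = [9, 3]        # 3x + 9
v = [0, 0, 1, 4]  # x^2 + 4x^3

# Way 1: product rule, u'v + uv'.
way1 = poly_add(poly_mul(poly_diff(u), v), poly_mul(u, poly_diff(v)))
# Way 2: expand the product first, then differentiate.
way2 = poly_diff(poly_mul(u, v))

if __name__ == "__main__":
    print(way1, way2)
```

Both lists should read as the coefficients of 18x + 117x^2 + 48x^3, lowest degree first.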

Example 4.1.17


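\begin{align*}
\frac{d}{dx}\left\{\frac{4x^3 - 7x}{4x^2 + 1}\right\}
 &= \frac{(12x^2 - 7)(4x^2 + 1) - (4x^3 - 7x)(8x)}{(4x^2 + 1)^2}\\
 &\qquad\text{by QR with } f(x) = 4x^3 - 7x,\ f'(x) = 12x^2 - 7, \text{ and } g(x) = 4x^2 + 1,\ g'(x) = 8x\\
 &= \frac{(48x^4 - 16x^2 - 7) - (32x^4 - 56x^2)}{(4x^2 + 1)^2}\\
 &= \frac{16x^4 + 40x^2 - 7}{(4x^2 + 1)^2}.
\end{align*}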

Example 4.1.17

Example 4.1.18
In this example, we'll use a little trickery to find the derivative of $\sqrt[3]{x}$. The trickery consists of
observing that, by the definition of the cube root,
\[ x = \big(\sqrt[3]{x}\big)^3. \]
Since both sides of the expression are the same, they must have the same derivatives:
\[ \frac{d}{dx}\{x\} = \frac{d}{dx}\big(\sqrt[3]{x}\big)^3. \]
We already know by Theorem 3.3.6 that
\[ \frac{d}{dx}\{x\} = 1 \]
and that, by Lemma 4.1.12 with $n = 3$ and $f(x) = \sqrt[3]{x}$,
\[ \frac{d}{dx}\big(\sqrt[3]{x}\big)^3 = 3\big(\sqrt[3]{x}\big)^2\cdot\frac{d}{dx}\big\{\sqrt[3]{x}\big\} = 3x^{2/3}\cdot\frac{d}{dx}\big\{\sqrt[3]{x}\big\}. \]
Since we know that $\frac{d}{dx}\{x\} = \frac{d}{dx}\big(\sqrt[3]{x}\big)^3$, we must have
\[ 1 = 3x^{2/3}\cdot\frac{d}{dx}\big\{\sqrt[3]{x}\big\} \]
which we can rearrange to give the result we need
\[ \frac{d}{dx}\big\{\sqrt[3]{x}\big\} = \tfrac{1}{3}x^{-2/3}. \]

Example 4.1.18

Example 4.1.19
In this example, we'll use the same trickery as in Example 4.1.18 to find the derivative of $x^{p/q}$ for any
two natural numbers $p$ and $q$. By definition of the $q$th root,
\[ x^p = \big(x^{p/q}\big)^q. \]
That is, $x^p$ and $\big(x^{p/q}\big)^q$ are the same function, and so have the same derivative. So we differentiate
both of them. We already know that, by Lemma 4.1.14 with $n = p$,
\[ \frac{d}{dx}\{x^p\} = p x^{p-1} \]
and that, by Lemma 4.1.12 with $n = q$ and $f(x) = x^{p/q}$,
\[ \frac{d}{dx}\Big\{\big(x^{p/q}\big)^q\Big\} = q\big(x^{p/q}\big)^{q-1}\frac{d}{dx}\big\{x^{p/q}\big\}. \]
Remember that $(x^a)^b = x^{(a\cdot b)}$. Now these two derivatives must be the same. So
\[ p x^{p-1} = q\cdot x^{(pq-p)/q}\,\frac{d}{dx}\big\{x^{p/q}\big\} \]
and, rearranging things,
\begin{align*}
\frac{d}{dx}\big\{x^{p/q}\big\} &= \frac{p}{q}\,x^{p-1-(pq-p)/q}\\
 &= \frac{p}{q}\,x^{(pq-q-pq+p)/q}\\
 &= \frac{p}{q}\,x^{p/q-1}.
\end{align*}
So finally
\[ \frac{d}{dx}\big\{x^{p/q}\big\} = \frac{p}{q}\,x^{p/q-1}. \tag{4.1.2} \]
Notice that this has the same form as Lemma 4.1.14, above, except with $n = p/q$ allowed to be any
positive rational number, not just a positive integer.
Example 4.1.19

Example 4.1.20 (Derivative of $x^{-m}$)

In this example we'll use the quotient rule to find the derivative of $x^{-m}$, for any natural number $m$.
By the special case of the quotient rule
\[ \frac{d}{dx}\big\{x^{-m}\big\} = \frac{d}{dx}\left\{\frac{1}{x^m}\right\} = -\frac{m x^{m-1}}{(x^m)^2} = -m x^{-m-1} \]
Again, notice that this has the same form as Lemma 4.1.14, above, except with $n = -m$ being a
negative integer.
Example 4.1.20

Example 4.1.21
In this example we'll use the quotient rule to find the derivative of $x^{-p/q}$, for any pair of natural
numbers $p$ and $q$. By a special case of the quotient rule with $g(x) = x^{p/q}$ and $g'(x) = \frac{p}{q}x^{p/q-1}$,
\[ \frac{d}{dx}\big\{x^{-p/q}\big\} = \frac{d}{dx}\left\{\frac{1}{x^{p/q}}\right\} = -\frac{\frac{p}{q}\,x^{p/q-1}}{\big(x^{p/q}\big)^2} = -\frac{p}{q}\,x^{-p/q-1} \]

Example 4.1.21

Note that we have found, in Examples 3.3.4, 4.1.19 and 4.1.21, the derivative of $x^a$ for any
rational number $a$, whether 0, positive, negative, integer or fractional. In all cases, the answer is

Corollary 4.1.22 (Derivative of $x^a$).

Let $a$ be a rational number. Then

\[ \frac{d}{dx} x^a = a x^{a-1} \tag{4.1.3} \]

We shall show, in Example 4.4.5, that the formula $\frac{d}{dx}x^a = a x^{a-1}$ in fact applies for all real numbers
$a$, not just rational numbers.
Back in Example 3.3.14 we computed the derivative of $\sqrt{x}$ from the definition of the derivative.
The above corollary (correctly) gives

\[ \frac{d}{dx} x^{1/2} = \frac{1}{2}\,x^{-1/2} \]

but with far less work.
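As a quick numerical illustration of Corollary 4.1.22, one can compare the formula against a difference quotient for a handful of rational exponents. This is only a sketch: the exponents, the sample point x = 2 and the helper names are arbitrary choices, and floating-point powers restrict the check to x > 0.

```python
from fractions import Fraction


def power_rule(a, x):
    """Corollary 4.1.22: the derivative of x**a at x > 0, for rational a."""
    return float(a) * x ** (float(a) - 1)


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) by a symmetric difference quotient (illustrative helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Arbitrary sample exponents: positive, negative, integer and fractional.
exponents = [Fraction(1, 2), Fraction(1, 3), Fraction(-2, 5), Fraction(7, 4), Fraction(-3)]

if __name__ == "__main__":
    x = 2.0
    for a in exponents:
        approx = central_difference(lambda t: t ** float(a), x)
        print(a, power_rule(a, x), approx)
```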


Here’s an (optional) messy example.
Example 4.1.23 (Optional messy example)
Find the derivative of
\[ f(x) = \frac{(\sqrt{x} - 1)(2 - x)(1 - x^2)}{\sqrt{x}\,(3 + 2x)}. \]

• As we have seen before, the best strategy for dealing with nasty expressions is to break them up
into easy pieces. We can think of $f(x)$ as the five-fold product

\[ f(x) = f_1(x)\cdot f_2(x)\cdot f_3(x)\cdot\frac{1}{f_4(x)}\cdot\frac{1}{f_5(x)} \]

with

\[ f_1(x) = \sqrt{x} - 1 \qquad f_2(x) = 2 - x \qquad f_3(x) = 1 - x^2 \qquad f_4(x) = \sqrt{x} \qquad f_5(x) = 3 + 2x. \]

• By now, the derivatives of the $f_j$'s should be easy to find:

\[ f_1'(x) = \frac{1}{2\sqrt{x}} \qquad f_2'(x) = -1 \qquad f_3'(x) = -2x \qquad f_4'(x) = \frac{1}{2\sqrt{x}} \qquad f_5'(x) = 2. \]

• Now, to get the derivative of $f(x)$ we use the $n$-fold product rule which was developed in
Example 4.1.11, together with the quotient rule.

\begin{align*}
f'(x) &= f_1' f_2 f_3 \frac{1}{f_4}\frac{1}{f_5} + f_1 f_2' f_3 \frac{1}{f_4}\frac{1}{f_5} + f_1 f_2 f_3' \frac{1}{f_4}\frac{1}{f_5} - f_1 f_2 f_3 \frac{f_4'}{f_4^2}\frac{1}{f_5} - f_1 f_2 f_3 \frac{1}{f_4}\frac{f_5'}{f_5^2}\\
 &= \left[\frac{f_1'}{f_1} + \frac{f_2'}{f_2} + \frac{f_3'}{f_3} - \frac{f_4'}{f_4} - \frac{f_5'}{f_5}\right] f_1 f_2 f_3 \frac{1}{f_4}\frac{1}{f_5}\\
 &= \left[\frac{1}{2\sqrt{x}(\sqrt{x} - 1)} - \frac{1}{2 - x} - \frac{2x}{1 - x^2} - \frac{1}{2x} - \frac{2}{3 + 2x}\right]\frac{(\sqrt{x} - 1)(2 - x)(1 - x^2)}{\sqrt{x}\,(3 + 2x)}.
\end{align*}

The trick that we used in going from the first line to the second line, namely multiplying term
number $j$ by $\frac{f_j(x)}{f_j(x)}$, is often useful in simplifying the derivative of a product of many factors (see footnote 4).

Example 4.1.23
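Because the final expression in Example 4.1.23 is long, a numerical spot check is a reasonable safeguard against sign slips. The sketch below compares the bracketed formula with a difference quotient. The sample points are arbitrary, chosen away from x = 1 and x = 2 where individual factors vanish, and the helper central_difference is an illustrative assumption.

```python
import math


def f(x):
    """The function from the example, defined for x > 0."""
    return (math.sqrt(x) - 1) * (2 - x) * (1 - x**2) / (math.sqrt(x) * (3 + 2 * x))


def f_prime(x):
    """The derivative in the 'sum of f_j'/f_j, times f' form derived above."""
    s = math.sqrt(x)
    bracket = (
        1 / (2 * s * (s - 1))
        - 1 / (2 - x)
        - 2 * x / (1 - x**2)
        - 1 / (2 * x)
        - 2 / (3 + 2 * x)
    )
    return bracket * f(x)


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) by a symmetric difference quotient (illustrative helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)


if __name__ == "__main__":
    for x in (0.5, 3.0, 5.0):
        print(x, f_prime(x), central_difference(f, x))
```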

4.2 Trigonometric functions and their derivatives

Learning Objectives
• Review the definitions of trigonometric functions.

• Determine derivatives of trigonometric functions using the limit definition of derivative,
trigonometric limits, addition formulas, and Product and Quotient Rules.

We are now going to compute the derivatives of the various trigonometric functions, sin x, cos x
and so on. The computations are more involved than the others that we have done so far and will
take several steps. Fortunately, the final answers will be very simple.
Observe that we only need to work out the derivatives of $\sin x$ and $\cos x$, since the other trigonometric
functions are really just quotients of these two functions. Recall:

\[ \tan x = \frac{\sin x}{\cos x} \qquad \cot x = \frac{\cos x}{\sin x} \qquad \csc x = \frac{1}{\sin x} \qquad \sec x = \frac{1}{\cos x}. \]

The first step towards computing the derivatives of $\sin x$ and $\cos x$ is to find their derivatives at $x = 0$.
The derivatives at general points $x$ will follow quickly from these, using trig identities. It is important
to note that we must measure angles in radians (see footnote 5), rather than degrees, in what follows. Indeed,
unless explicitly stated otherwise, any number that is put into a trigonometric function is measured
in radians.

4 Also take a look at “logarithmic differentiation” in Section 4.4.


5 In science, the radian is the standard unit for measuring angles. While you may be more familiar with degrees, radians
should be used in any computation involving calculus. Using degrees will cause errors. Thankfully it is easy to
translate between these two measures since $360^\circ = 2\pi$ radians.


§§ These proofs are optional; the results are not.


While we expect you to read and follow these proofs, we do not expect you to be able to reproduce
them. You will be required to know the results, in particular Theorem 4.2.5 below.

§§ Step 1: $\frac{d}{dx}\{\sin x\}\big|_{x=0}$

By definition, the derivative of $\sin x$ evaluated at $x = 0$ is
\[ \frac{d}{dx}\{\sin x\}\Big|_{x=0} = \lim_{h\to 0}\frac{\sin h - \sin 0}{h} = \lim_{h\to 0}\frac{\sin h}{h}. \]
We will prove this limit by use of a theorem called the squeeze theorem (see footnote 6). To get there we will first
need to do some geometry. But first we will build some intuition.
The figure below contains part of a circle of radius 1. Recall that an arc of length h on such a
circle subtends an angle of h radians at the centre of the circle. So the darkened arc in the figure
has length h and the darkened vertical line in the figure has length sin h. We must determine what
happens to the ratio of the lengths of the darkened vertical line and darkened arc as h tends to zero.

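[Figure: part of a circle of radius 1, showing an angle of h radians at the centre, the darkened arc of length h, the darkened vertical line of length sin h, and the horizontal length cos h.]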

Here is a magnified version of the part of the above figure that contains the darkened arc and vertical
line.

[Figure: magnified view of the darkened arc and the vertical line of length sin h, drawn with h = 0.4.]

6 The squeeze theorem is not part of the Math 100 content, but we do need to use its results for this proof. This theorem
tells us that we can compute the limit of a function by “squeezing” or “sandwiching” it between two other functions.
If the upper function and the lower function both tend to the same value, then so does the function that is squeezed
between them. Formally, we would state it as: Let $a \in \mathbb{R}$ and let $f$, $g$, $h$ be three functions so that $f(x) \le g(x) \le h(x)$
for all $x$ in an interval around $a$, except possibly exactly at $x = a$. Then if $\lim_{x\to a} f(x) = \lim_{x\to a} h(x) = L$ then it is
also the case that $\lim_{x\to a} g(x) = L$. (We do not prove it here.)


This particular figure has been drawn with h = .4 radians. Here are three more such blow ups.
In each successive figure, the value of h is smaller. To make the figures clearer, the degree of
magnification was increased each time h was decreased.

[Figures: three further magnified views of the arc and the vertical line of length sin h, drawn with h = 0.2, h = 0.1 and h = 0.05.]

As we make h smaller and smaller and look at the figure with ever increasing magnification, the arc
of length h and vertical line of length sin h look more and more alike. We would guess from this that
\[ \lim_{h\to 0}\frac{\sin h}{h} = 1. \]
The following tables of values

    h        sin h            (sin h)/h     |     h         sin h             (sin h)/h
    0.4      0.3894           0.9735        |    -0.4      -0.3894            0.9735
    0.2      0.1987           0.9933        |    -0.2      -0.1987            0.9933
    0.1      0.09983          0.9983        |    -0.1      -0.09983           0.9983
    0.05     0.049979         0.99958       |    -0.05     -0.049979          0.99958
    0.01     0.00999983       0.999983      |    -0.01     -0.00999983        0.999983
    0.001    0.00099999983    0.99999983    |    -0.001    -0.00099999983     0.99999983

suggest the same guess. Here is an argument that shows that the guess really is correct.

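§§§ Proof that $\lim_{h\to 0}\frac{\sin h}{h} = 1$:

[Figure: sector of the unit circle centred at O with angle h, showing the points P, Q, R and S, the radius 1, and segments of length sin h, cos h and tan h.]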


The circle in the figure above has radius 1. Hence

\begin{align*}
|OP| &= |OR| = 1 & |PS| &= \sin h\\
|OS| &= \cos h & |QR| &= \tan h
\end{align*}

Now we can use a few geometric facts about this figure to establish both an upper bound and a lower
bound on $\frac{\sin h}{h}$ with both the upper and lower bounds tending to 1 as $h$ tends to 0. So the squeeze
theorem (see footnote 7) will tell us that $\frac{\sin h}{h}$ also tends to 1 as $h$ tends to 0.

• The triangle $OPR$ has base 1 and height $\sin h$, and hence
\[ \text{area of } \triangle OPR = \tfrac{1}{2}\times 1\times\sin h = \frac{\sin h}{2}. \]

• The triangle $OQR$ has base 1 and height $\tan h$, and hence
\[ \text{area of } \triangle OQR = \tfrac{1}{2}\times 1\times\tan h = \frac{\tan h}{2}. \]

• The “piece of pie” $OPR$ cut out of the circle is the fraction $\frac{h}{2\pi}$ of the whole circle (since the
angle at the corner of the piece of pie is $h$ radians and the angle for the whole circle is $2\pi$
radians). Since the circle has radius 1 we have
\[ \text{area of pie } OPR = \frac{h}{2\pi}\cdot(\text{area of circle}) = \frac{h}{2\pi}\,\pi\cdot 1^2 = \frac{h}{2} \]

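Now the triangle $OPR$ is contained inside the piece of pie $OPR$, and so the area of the triangle is
smaller than the area of the piece of pie. Similarly, the piece of pie $OPR$ is contained inside the
triangle $OQR$. Thus we have

\[ \text{area of triangle } OPR \le \text{area of pie } OPR \le \text{area of triangle } OQR \]

Substituting in the areas we worked out gives

\[ \frac{\sin h}{2} \le \frac{h}{2} \le \frac{\tan h}{2} \]
which cleans up to give
\[ \sin h \le h \le \frac{\sin h}{\cos h}. \]
We rewrite these two inequalities so that $\frac{\sin h}{h}$ appears in both. Here $0 < h < \frac{\pi}{2}$, as in the figure, so that $h$ and $\cos h$ are both positive.
• Since $\sin h \le h$, we have that $\frac{\sin h}{h} \le 1$.
• Since $h \le \frac{\sin h}{\cos h}$ we have that $\cos h \le \frac{\sin h}{h}$.
Because $\frac{\sin(-h)}{-h} = \frac{\sin h}{h}$ and $\cos(-h) = \cos h$, the same two bounds also hold for $-\frac{\pi}{2} < h < 0$.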

7 Again, we aren’t proving the squeeze theorem, nor are we requiring you to know it — see the previous footnote.
What you need to know here is that we are “squeezing” the function sin h/h between the upper and lower bounds.


Thus we arrive at the “squeezable” inequality:

\[ \cos h \le \frac{\sin h}{h} \le 1. \]

We know that

\[ \lim_{h\to 0}\cos h = 1. \]

Since $\frac{\sin h}{h}$ is sandwiched between $\cos h$ and 1, we can apply the squeeze theorem for limits to deduce
the following lemma:

Lemma 4.2.1.

\[ \lim_{h\to 0}\frac{\sin h}{h} = 1 \]
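The sandwich cos h ≤ (sin h)/h ≤ 1 behind Lemma 4.2.1 can be watched closing in numerically. The following is a minimal sketch; the function names and the sample values of h are illustrative choices, and the angles are in radians throughout.

```python
import math


def squeeze_bounds(h):
    """Return the triple (cos h, sin h / h, 1) for a nonzero h in radians."""
    return math.cos(h), math.sin(h) / h, 1.0


def bounds_hold(h):
    """Check the 'squeezable' inequality cos h <= sin h / h <= 1 at a single h."""
    lower, middle, upper = squeeze_bounds(h)
    return lower <= middle <= upper


if __name__ == "__main__":
    for h in (0.4, 0.2, 0.1, 0.05, 0.01, 0.001, -0.1):
        lower, middle, upper = squeeze_bounds(h)
        print(f"h = {h:>6}: {lower:.8f} <= {middle:.8f} <= {upper:.1f}")
```

As h shrinks, the lower bound cos h rises towards 1 and drags the middle value with it, which is exactly the mechanism the squeeze theorem formalises.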

Since this argument took a bit of work, perhaps we should remind ourselves why we needed it in
the first place. We were computing

\begin{align*}
\frac{d}{dx}\{\sin x\}\Big|_{x=0} &= \lim_{h\to 0}\frac{\sin h - \sin 0}{h}\\
 &= \lim_{h\to 0}\frac{\sin h}{h} && \text{(This is why!)}\\
 &= 1.
\end{align*}

This concludes Step 1. We now know that $\frac{d}{dx}\sin x\big|_{x=0} = 1$. The remaining steps are easier.

§§ Step 2: $\frac{d}{dx}\{\cos x\}\big|_{x=0}$

By definition, the derivative of $\cos x$ evaluated at $x = 0$ is

\[ \lim_{h\to 0}\frac{\cos h - \cos 0}{h} = \lim_{h\to 0}\frac{\cos h - 1}{h}. \]

Fortunately we don't have to wade through geometry like we did for the previous step. Instead we
can recycle our work and massage the above limit to rewrite it in terms of expressions involving $\frac{\sin h}{h}$.
Thanks to Lemma 4.2.1 the work is then easy.
We'll show you two ways to proceed: one uses a method similar to “multiplying by the
conjugate” that we have already used a few times (see Example 3.3.14), while the other uses a nice
trick involving the double-angle formula.


§§§ Method 1 — Multiply by the “Conjugate”


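Start by multiplying the expression inside the limit by 1, written as $\frac{\cos h + 1}{\cos h + 1}$:
\begin{align*}
\frac{\cos h - 1}{h} &= \frac{\cos h - 1}{h}\cdot\frac{\cos h + 1}{\cos h + 1}\\
 &= \frac{\cos^2 h - 1}{h(1 + \cos h)} && \big(\text{since } (a - b)(a + b) = a^2 - b^2\big)\\
 &= -\frac{\sin^2 h}{h(1 + \cos h)} && \big(\text{since } \sin^2 h + \cos^2 h = 1\big)\\
 &= -\frac{\sin h}{h}\cdot\frac{\sin h}{1 + \cos h}.
\end{align*}
Now we can take the limit as $h \to 0$ via Lemma 4.2.1:
\begin{align*}
\lim_{h\to 0}\frac{\cos h - 1}{h} &= \lim_{h\to 0}\left[-\frac{\sin h}{h}\cdot\frac{\sin h}{1 + \cos h}\right]\\
 &= -\left[\lim_{h\to 0}\frac{\sin h}{h}\right]\cdot\left[\lim_{h\to 0}\frac{\sin h}{1 + \cos h}\right]\\
 &= -1\cdot\frac{0}{2}\\
 &= 0.
\end{align*}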

§§§ Method 2 — via the Double Angle Formula


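The other way involves the double angle formula,
\[ \cos 2\theta = 1 - 2\sin^2(\theta) \qquad\text{or}\qquad \cos 2\theta - 1 = -2\sin^2(\theta). \]
Setting $\theta = h/2$, we have
\[ \frac{\cos h - 1}{h} = \frac{-2\big(\sin\frac{h}{2}\big)^2}{h}. \]
Now this begins to look like $\frac{\sin h}{h}$, except that inside the $\sin(\cdot)$ we have $h/2$. So, setting $\theta = h/2$,
\begin{align*}
\frac{\cos h - 1}{h} &= -\frac{\sin^2\theta}{\theta} = -\theta\cdot\frac{\sin^2\theta}{\theta^2}\\
 &= -\theta\cdot\frac{\sin\theta}{\theta}\cdot\frac{\sin\theta}{\theta}.
\end{align*}
When we take the limit as $h \to 0$, we are also taking the limit as $\theta = h/2 \to 0$, and so
\begin{align*}
\lim_{h\to 0}\frac{\cos h - 1}{h} &= \lim_{\theta\to 0}\left[-\theta\cdot\frac{\sin\theta}{\theta}\cdot\frac{\sin\theta}{\theta}\right]\\
 &= \lim_{\theta\to 0}[-\theta]\cdot\left[\lim_{\theta\to 0}\frac{\sin\theta}{\theta}\right]\cdot\left[\lim_{\theta\to 0}\frac{\sin\theta}{\theta}\right]\\
 &= 0\cdot 1\cdot 1\\
 &= 0,
\end{align*}
where we have used the fact that $\lim_{h\to 0}\frac{\sin h}{h} = 1$ and that the limit of a product is the product of limits
(i.e. Lemma 4.2.1 and Theorem 2.1.14).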
Thus we have now produced two proofs of the following lemma:

Lemma 4.2.2.

\[ \lim_{h\to 0}\frac{\cos h - 1}{h} = 0 \]
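Lemma 4.2.2 and the Method 1 rewriting can both be seen numerically. The sketch below evaluates the quotient directly and in the rewritten form -(sin h / h)(sin h / (1 + cos h)); the function names direct and rewritten are hypothetical labels for this illustration.

```python
import math


def direct(h):
    """The difference quotient (cos h - 1) / h, evaluated as written."""
    return (math.cos(h) - 1) / h


def rewritten(h):
    """The same quantity in the Method 1 form -(sin h / h) * (sin h / (1 + cos h))."""
    return -(math.sin(h) / h) * (math.sin(h) / (1 + math.cos(h)))


if __name__ == "__main__":
    for h in (0.4, 0.1, 0.01, 0.001):
        print(h, direct(h), rewritten(h))
```

Both columns shrink towards 0 roughly like -h/2, which is consistent with the lemma. The rewritten form also avoids subtracting two nearly equal numbers when h is tiny.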

Again, there has been a bit of work to get to here, so we should remind ourselves why we needed
it. We were computing

\begin{align*}
\frac{d}{dx}\{\cos x\}\Big|_{x=0} &= \lim_{h\to 0}\frac{\cos h - \cos 0}{h}\\
 &= \lim_{h\to 0}\frac{\cos h - 1}{h}\\
 &= 0.
\end{align*}

Armed with these results we can now build up the derivatives of sine and cosine.

§§ Step 3: $\frac{d}{dx}\{\sin x\}$ and $\frac{d}{dx}\{\cos x\}$ for General $x$
To proceed to the general derivatives of $\sin x$ and $\cos x$ we are going to use the above two results and
a couple of trig identities. Remember the addition formulae

\begin{align*}
\sin(a + b) &= \sin(a)\cos(b) + \cos(a)\sin(b),\\
\cos(a + b) &= \cos(a)\cos(b) - \sin(a)\sin(b).
\end{align*}

To compute the derivative of sin(x) we just start from the definition of the derivative:

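\begin{align*}
\frac{d}{dx}\sin x &= \lim_{h\to 0}\frac{\sin(x + h) - \sin x}{h}\\
 &= \lim_{h\to 0}\frac{\sin x\cos h + \cos x\sin h - \sin x}{h}\\
 &= \lim_{h\to 0}\left[\sin x\,\frac{\cos h - 1}{h} + \cos x\,\frac{\sin h - 0}{h}\right]\\
 &= \sin x\lim_{h\to 0}\frac{\cos h - 1}{h} + \cos x\lim_{h\to 0}\frac{\sin h - 0}{h}\\
 &= \sin x\underbrace{\left[\frac{d}{dx}\cos x\right]_{x=0}}_{=0} + \cos x\underbrace{\left[\frac{d}{dx}\sin x\right]_{x=0}}_{=1}\\
 &= \cos x.
\end{align*}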


The computation of the derivative of cos x is very similar.


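\begin{align*}
\frac{d}{dx}\cos x &= \lim_{h\to 0}\frac{\cos(x + h) - \cos x}{h}\\
 &= \lim_{h\to 0}\frac{\cos x\cos h - \sin x\sin h - \cos x}{h}\\
 &= \lim_{h\to 0}\left[\cos x\,\frac{\cos h - 1}{h} - \sin x\,\frac{\sin h - 0}{h}\right]\\
 &= \cos x\lim_{h\to 0}\frac{\cos h - 1}{h} - \sin x\lim_{h\to 0}\frac{\sin h - 0}{h}\\
 &= \cos x\underbrace{\left[\frac{d}{dx}\cos x\right]_{x=0}}_{=0} - \sin x\underbrace{\left[\frac{d}{dx}\sin x\right]_{x=0}}_{=1}\\
 &= -\sin x.
\end{align*}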
We have now found the derivatives of both sin x and cos x, provided x is measured in radians.

Lemma 4.2.3.

\[ \frac{d}{dx}\sin x = \cos x \qquad\qquad \frac{d}{dx}\cos x = -\sin x \]
dx dx
The above formulas hold provided x is measured in radians.
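The radian proviso in the lemma can be made concrete with a small experiment. This sketch differentiates "sine of an angle in degrees" numerically and compares the slope with the cosine: the two differ by the conversion factor π/180 (a standard fact, stated here without derivation). The helper names sin_deg, cos_deg and central_difference are illustrative assumptions.

```python
import math


def sin_deg(x):
    """Sine of an angle x measured in degrees."""
    return math.sin(math.radians(x))


def cos_deg(x):
    """Cosine of an angle x measured in degrees."""
    return math.cos(math.radians(x))


def central_difference(func, x, h=1e-5):
    """Approximate func'(x) by a symmetric difference quotient (illustrative helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)


if __name__ == "__main__":
    x = 30.0
    print("slope of sin (degrees):", central_difference(sin_deg, x))
    print("cos (degrees):         ", cos_deg(x))
    print("(pi/180) * cos:        ", math.pi / 180 * cos_deg(x))
    print("slope of sin (radians):", central_difference(math.sin, 0.5), "vs", math.cos(0.5))
```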

These formulae are pretty easy to remember: applying $\frac{d}{dx}$ to $\sin x$ and $\cos x$ just exchanges
$\sin x$ and $\cos x$, except for the minus sign (see footnote 8) in the derivative of $\cos x$.
Remark 4.2.4 (Optional: another derivation of $\frac{d}{dx}\cos x = -\sin x$). We remark that, once one
knows that $\frac{d}{dx}\sin x = \cos x$, it is easy to use it and the trig identity $\cos(x) = \sin\big(\frac{\pi}{2} - x\big)$ to derive
$\frac{d}{dx}\cos x = -\sin x$. Here is how (see footnote 9).

\begin{align*}
\frac{d}{dx}\cos x &= \lim_{h\to 0}\frac{\cos(x + h) - \cos x}{h} = \lim_{h\to 0}\frac{\sin\big(\frac{\pi}{2} - x - h\big) - \sin\big(\frac{\pi}{2} - x\big)}{h}\\
 &= -\lim_{h'\to 0}\frac{\sin(x' + h') - \sin(x')}{h'} && \text{with } x' = \tfrac{\pi}{2} - x,\ h' = -h\\
 &= -\frac{d}{dx'}\sin x'\Big|_{x' = \frac{\pi}{2} - x} = -\cos\big(\tfrac{\pi}{2} - x\big)\\
 &= -\sin x.
\end{align*}
Note that, if x is measured in degrees, then the formulas of Lemma 4.2.3 are wrong. There are
similar formulas, but we need the chain rule to build them — that is the subject of the next section.
But first we should find the derivatives of the other trig functions.

8 There is a bad pun somewhere in here about sine errors and sign errors.
9 We thank Serban Raianu for suggesting that we include this.


§§ Step 4: The remaining trigonometric functions


It is now an easy matter to get the derivatives of the remaining trigonometric functions using basic
trig identities and the quotient rule. Remember that

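\begin{align*}
\tan x &= \frac{\sin x}{\cos x} & \cot x &= \frac{\cos x}{\sin x} = \frac{1}{\tan x}\\
\csc x &= \frac{1}{\sin x} & \sec x &= \frac{1}{\cos x}
\end{align*}
So, by the quotient rule,

\begin{align*}
\frac{d}{dx}\tan x = \frac{d}{dx}\frac{\sin x}{\cos x}
 &= \frac{\overbrace{\big(\tfrac{d}{dx}\sin x\big)}^{\cos x}\cos x - \sin x\overbrace{\big(\tfrac{d}{dx}\cos x\big)}^{-\sin x}}{\cos^2 x} = \sec^2 x\\
\frac{d}{dx}\csc x = \frac{d}{dx}\frac{1}{\sin x}
 &= -\frac{\overbrace{\big(\tfrac{d}{dx}\sin x\big)}^{\cos x}}{\sin^2 x} = -\csc x\cot x\\
\frac{d}{dx}\sec x = \frac{d}{dx}\frac{1}{\cos x}
 &= -\frac{\overbrace{\big(\tfrac{d}{dx}\cos x\big)}^{-\sin x}}{\cos^2 x} = \sec x\tan x\\
\frac{d}{dx}\cot x = \frac{d}{dx}\frac{\cos x}{\sin x}
 &= \frac{\overbrace{\big(\tfrac{d}{dx}\cos x\big)}^{-\sin x}\sin x - \cos x\overbrace{\big(\tfrac{d}{dx}\sin x\big)}^{\cos x}}{\sin^2 x} = -\csc^2 x.
\end{align*}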

§§ Summary
To summarise all this work, we can write this up as a theorem:

Theorem 4.2.5 (Derivatives of trigonometric functions).

The derivatives of sin x and cos x are


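\[ \frac{d}{dx}\sin x = \cos x \qquad\qquad \frac{d}{dx}\cos x = -\sin x \]
Consequently the derivatives of the other trigonometric functions are
\begin{align*}
\frac{d}{dx}\tan x &= \sec^2 x & \frac{d}{dx}\cot x &= -\csc^2 x\\
\frac{d}{dx}\csc x &= -\csc x\cot x & \frac{d}{dx}\sec x &= \sec x\tan x
\end{align*}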

Of these 6 derivatives you should really memorise those of sine, cosine and tangent. We certainly
expect you to be able to work out those of cotangent, cosecant and secant.
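When reconstructing the less familiar formulas from the quotient rule, a numerical spot check can catch a dropped minus sign. The sketch below pairs each function with its claimed derivative from Theorem 4.2.5 and compares against a difference quotient. The dictionary name trig_table, the helper names and the sample points are illustrative assumptions.

```python
import math


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) by a symmetric difference quotient (illustrative helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# (function, claimed derivative) pairs, written in terms of sin and cos.
trig_table = {
    "sin": (math.sin, math.cos),
    "cos": (math.cos, lambda x: -math.sin(x)),
    "tan": (math.tan, lambda x: 1 / math.cos(x) ** 2),                              # sec^2 x
    "cot": (lambda x: 1 / math.tan(x), lambda x: -1 / math.sin(x) ** 2),            # -csc^2 x
    "csc": (lambda x: 1 / math.sin(x), lambda x: -math.cos(x) / math.sin(x) ** 2),  # -csc x cot x
    "sec": (lambda x: 1 / math.cos(x), lambda x: math.sin(x) / math.cos(x) ** 2),   # sec x tan x
}


def max_error(x):
    """Largest gap at x between a claimed derivative and its numerical estimate."""
    return max(abs(central_difference(f, x) - df(x)) for f, df in trig_table.values())


if __name__ == "__main__":
    for x in (0.3, 0.7, 1.2):
        print(x, max_error(x))
```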

C OMPUTING D ERIVATIVES 4.3 T HE CHAIN RULE

4.3 The chain rule

Learning Objectives
• Use the chain rule to compute derivatives of compositions of functions.

We have built up most of the tools that we need to express derivatives of complicated functions
in terms of derivatives of simpler known functions. We started by learning how to evaluate
• derivatives of sums, products and quotients,
• derivatives of constants and monomials.
These tools allow us to compute derivatives of polynomials and rational functions. We have also
added exponential and trigonometric functions to our list. The final tool we add is called the chain
rule. It tells us how to take the derivative of a composition of two functions. That is, if we know $f(x)$
and $g(x)$ and their derivatives, then the chain rule tells us the derivative of $f\big(g(x)\big)$.
Before we get to the statement of the rule, let us look at an example showing how such a
composition might arise (in the “real-world”).
Example 4.3.1
You are out in the woods after a long day of mathematics and are walking towards your camp fire
on a beautiful still night. The heat from the fire means that the air temperature depends on your
position. Let your position at time t be x(t ). The temperature of the air at position x is f (x). What
instantaneous rate of change of temperature do you feel at time t?


• Because your position at time $t$ is $x = x(t)$, the temperature you feel at time $t$ is $F(t) = f\big(x(t)\big)$.
• The instantaneous rate of change of temperature that you feel is $F'(t)$. We have a complicated
function, $F(t)$, constructed by composing two simpler functions, $x(t)$ and $f(x)$.
• We wish to compute the derivative, $F'(t) = \frac{d}{dt} f\big(x(t)\big)$, of the complicated function $F(t)$ in
terms of the derivatives, $x'(t)$ and $f'(x)$, of the two simple functions. This is exactly what the
chain rule does.

Example 4.3.1


§§ Statement of the chain rule


Theorem 4.3.2 (The chain rule — version 1).

Let $a \in \mathbb{R}$ and let $g(x)$ be a function that is differentiable at $x = a$. Now let $f(u)$ be a
function that is differentiable at $u = g(a)$. Then the function $F(x) = f\big(g(x)\big)$ is differentiable
at $x = a$ and

\[ F'(a) = f'\big(g(a)\big)\,g'(a) \]

Here, as was the case earlier in this chapter, we have been very careful to give the point at which
the derivative is evaluated a special name (i.e. a). But of course this evaluation point can really be
any point (where the derivative is defined). So it is very common to just call the evaluation point “x”
rather than give it a special name like “a”, like this:

Theorem 4.3.3 (The chain rule — version 2).

Let $f$ and $g$ be differentiable functions. Then

\[ \frac{d}{dx} f\big(g(x)\big) = f'\big(g(x)\big)\cdot g'(x) \]

Notice that when we form the composition $f\big(g(x)\big)$ there is an “outside” function (namely
$f(x)$) and an “inside” function (namely $g(x)$). The chain rule tells us that when we differentiate
a composition we have to differentiate the outside and then multiply by the derivative of the
inside.
\[ \frac{d}{dx} f\big(g(x)\big) = \underbrace{f'\big(g(x)\big)}_{\text{diff outside}}\cdot\underbrace{g'(x)}_{\text{diff inside}} \]

Here is another statement of the chain rule which makes this idea more explicit.

Theorem 4.3.4 (The chain rule — version 3).

Let y = f (u) and u = g(x) be differentiable functions, then

\[ \frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx} \]

This particular form is easy to remember because it looks like we can just “cancel” the $du$
between the two terms.
\[ \frac{dy}{dx} = \frac{dy}{\cancel{du}}\cdot\frac{\cancel{du}}{dx} \]


Of course, du is not, by itself, a number or variable10 that can be cancelled. But this is still a
good memory aid.
The hardest part about applying the chain rule is recognising when the function you are trying to
differentiate is really the composition of two simpler functions. This takes a little practice. We can
warm up with a couple of simple examples.
Example 4.3.5
Let $f(u) = u^5$ and $g(x) = \sin(x)$. Then set $F(x) = f\big(g(x)\big) = \big(\sin(x)\big)^5$. To find the derivative of
$F(x)$ we can simply apply the chain rule: the pieces of the composition have been laid out for us.
Here they are:
\begin{align*}
f(u) &= u^5 & f'(u) &= 5u^4\\
g(x) &= \sin(x) & g'(x) &= \cos x.
\end{align*}

We now just put them together as the chain rule tells us:
\begin{align*}
\frac{dF}{dx} &= f'\big(g(x)\big)\cdot g'(x)\\
 &= 5\big(g(x)\big)^4\cdot\cos(x) && \text{since } f'(u) = 5u^4\\
 &= 5\big(\sin(x)\big)^4\cdot\cos(x).
\end{align*}
Notice that it is quite easy to extend this to any power. Set $f(u) = u^n$. Then follow the same
steps and we arrive at
\[ F(x) = \big(\sin(x)\big)^n, \qquad F'(x) = n\big(\sin(x)\big)^{n-1}\cos(x). \]

Example 4.3.5

This example shows one of the ways that the chain rule appears very frequently — when we
need to differentiate the power of some simpler function. More generally we have the following.
Example 4.3.6

Let $f(u) = u^n$ and let $g(x)$ be any differentiable function. Set $F(x) = f\big(g(x)\big) = g(x)^n$. Then
\[ \frac{dF}{dx} = \frac{d}{dx}\big(g(x)^n\big) = n\,g(x)^{n-1}\cdot g'(x) \]
This is precisely the result in Example 4.1.11 and Lemma 4.1.12.
Example 4.3.6

Example 4.3.7
Let $f(u) = \cos(u)$ and $g(x) = 3x - 2$. Find the derivative of

\[ F(x) = f\big(g(x)\big) = \cos(3x - 2). \]

10 In this context du is called a differential. There are ways to understand and manipulate these in calculus but they
are beyond the scope of this course.


Again we should approach this by first writing down f and g and their derivatives and then
putting everything together as the chain rule tells us.

\begin{align*}
f(u) &= \cos(u) & f'(u) &= -\sin(u)\\
g(x) &= 3x - 2 & g'(x) &= 3.
\end{align*}

So the chain rule says

\begin{align*}
F'(x) &= f'\big(g(x)\big)\cdot g'(x)\\
 &= -\sin\big(g(x)\big)\cdot 3\\
 &= -3\sin(3x - 2).
\end{align*}

Example 4.3.7

This example shows a second way that the chain rule appears very frequently — when we need
to differentiate some function of ax + b. More generally we have the following.
Example 4.3.8
Let $a, b \in \mathbb{R}$ and let $f(x)$ be a differentiable function. Set $g(x) = ax + b$. Then

\begin{align*}
\frac{d}{dx} f(ax + b) &= \frac{d}{dx} f\big(g(x)\big)\\
 &= f'\big(g(x)\big)\cdot g'(x)\\
 &= f'(ax + b)\cdot a.
\end{align*}

So the derivative of $f(ax + b)$ with respect to $x$ is just $a f'(ax + b)$.


Example 4.3.8
The above is a very useful result that follows from the chain rule, so let’s make it a corollary to
highlight it.

Corollary 4.3.9.

Let $a, b \in \mathbb{R}$ and let $f(x)$ be a differentiable function. Then

\[ \frac{d}{dx} f(ax + b) = a f'(ax + b). \]

Example 4.3.10 (Example 4.3.1, continued)


Let us now go back to our motivating campfire example. There we had

f (x) = temperature at position x,


x(t ) = position at time t,
F (t ) = f (x(t )) = temperature at time t.


The chain rule gave

\[ F'(t) = f'\big(x(t)\big)\cdot x'(t). \]

Notice that the units of measurement on both sides of the equation agree, as indeed they must.
To see this, let us assume that $t$ is measured in seconds, that $x(t)$ is measured in metres and that
$f(x)$ is measured in degrees. Because of this $F(t) = f\big(x(t)\big)$ must also be measured in degrees (since it is a
temperature).
What about the derivatives? These are rates of change. So
• $F'(t)$ has units $\frac{\text{degrees}}{\text{second}}$,
• $f'(x)$ has units $\frac{\text{degrees}}{\text{metre}}$, and
• $x'(t)$ has units $\frac{\text{metre}}{\text{second}}$.

Hence the product

\[ f'\big(x(t)\big)\cdot x'(t) \quad\text{has units}\quad \frac{\text{degrees}}{\text{metre}}\cdot\frac{\text{metre}}{\text{second}} = \frac{\text{degrees}}{\text{second}}, \]
which are the same units as those of $F'(t)$. So the units on both sides of the equation agree. Checking that the units
on both sides of an equation agree is a good check of consistency, but of course it does not prove
that both sides are in fact the same.
Example 4.3.10
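To see the campfire computation with actual numbers, here is a small sketch. The temperature profile and the walking path below are invented purely for illustration (the text does not specify them): f(x) = 10 + 40/(1 + x^2) degrees at x metres from the fire, and x(t) = 20 - 1.5t metres at time t seconds.

```python
def temperature(x):
    """Hypothetical air temperature f(x), in degrees, x metres from the fire."""
    return 10 + 40 / (1 + x**2)


def temperature_slope(x):
    """f'(x), in degrees per metre, for the hypothetical profile above."""
    return -80 * x / (1 + x**2) ** 2


def position(t):
    """Hypothetical position x(t), in metres, while walking towards the fire."""
    return 20 - 1.5 * t


def velocity(t):
    """x'(t), in metres per second."""
    return -1.5


def felt_rate(t):
    """Chain rule: F'(t) = f'(x(t)) * x'(t), in degrees per second."""
    return temperature_slope(position(t)) * velocity(t)


def felt_temperature(t):
    """F(t) = f(x(t)), the temperature felt at time t."""
    return temperature(position(t))


def central_difference(func, t, h=1e-5):
    """Approximate func'(t) by a symmetric difference quotient (illustrative helper)."""
    return (func(t + h) - func(t - h)) / (2 * h)


if __name__ == "__main__":
    t = 10.0
    print(felt_rate(t), central_difference(felt_temperature, t))
```

The two negative factors (temperature falls with distance, and distance falls with time) multiply to a positive rate: the walker feels the air warming.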

§§ (optional) — Derivation of the chain rule


First, let's review what our goal is. We have been given a function $g(x)$, that is differentiable at
some point $x = a$, and another function $f(u)$, that is differentiable at the point $u = b = g(a)$. We
have defined the composite function $F(x) = f\big(g(x)\big)$ and we wish to show that

\[ F'(a) = f'\big(g(a)\big)\cdot g'(a). \]

Before we can compute $F'(a)$, we need to set up some ground work, and in particular the
definitions of our given derivatives:

\[ f'(b) = \lim_{H\to 0}\frac{f(b + H) - f(b)}{H} \qquad\text{and}\qquad g'(a) = \lim_{h\to 0}\frac{g(a + h) - g(a)}{h}. \]
We are going to use similar manipulation tricks as we did back in the proofs of the arithmetic of
derivatives in Section 4.1. Unfortunately, we have already used up the symbols “F” and “H”, so we
are going to make use of the Greek letters $\gamma$, $\varphi$.
As was the case in our derivation of the product rule it is convenient to introduce a couple of
new functions. Set
\[ \varphi(H) = \frac{f(b + H) - f(b)}{H}. \]


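Then we have

\[ \lim_{H\to 0}\varphi(H) = f'(b) = f'\big(g(a)\big) \qquad\text{since } b = g(a), \tag{4.3.1} \]
and we can also write (with a little juggling)
\[ f(b + H) = f(b) + H\varphi(H). \]
Similarly set
\[ \gamma(h) = \frac{g(a + h) - g(a)}{h} \]
which gives us
\[ \lim_{h\to 0}\gamma(h) = g'(a) \qquad\text{and}\qquad g(a + h) = g(a) + h\gamma(h). \]
Now we can start computing
\begin{align*}
F'(a) &= \lim_{h\to 0}\frac{F(a + h) - F(a)}{h}\\
 &= \lim_{h\to 0}\frac{f\big(g(a + h)\big) - f\big(g(a)\big)}{h}.
\end{align*}
We know that $g(a) = b$ and $g(a + h) = g(a) + h\gamma(h)$, so
\begin{align*}
F'(a) &= \lim_{h\to 0}\frac{f\big(g(a) + h\gamma(h)\big) - f\big(g(a)\big)}{h}\\
 &= \lim_{h\to 0}\frac{f\big(b + h\gamma(h)\big) - f(b)}{h}.
\end{align*}
Now for the sneaky bit. We can turn $f\big(b + h\gamma(h)\big)$ into $f(b + H)$ by setting
\[ H = h\gamma(h). \]
Now notice that as $h \to 0$ we have
\begin{align*}
\lim_{h\to 0} H &= \lim_{h\to 0} h\cdot\gamma(h)\\
 &= \lim_{h\to 0} h\cdot\lim_{h\to 0}\gamma(h)\\
 &= 0\cdot g'(a) = 0.
\end{align*}
So as $h \to 0$ we also have $H \to 0$.
We now have
\begin{align*}
F'(a) &= \lim_{h\to 0}\frac{f(b + H) - f(b)}{h}\\
 &= \lim_{h\to 0}\underbrace{\frac{f(b + H) - f(b)}{H}}_{=\varphi(H)}\cdot\underbrace{\frac{H}{h}}_{=\gamma(h)} && \text{if } H = h\gamma(h) \ne 0\\
 &= \lim_{h\to 0}\varphi(H)\cdot\gamma(h)\\
 &= \lim_{h\to 0}\varphi(H)\cdot\lim_{h\to 0}\gamma(h)\\
 &= \lim_{H\to 0}\varphi(H)\cdot\lim_{h\to 0}\gamma(h) && \text{since } H \to 0 \text{ as } h \to 0\\
 &= f'(b)\cdot g'(a)
\end{align*}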


This is exactly the RHS of the chain rule. It is possible to have H = 0 in the second line above. But
that possibility is easy to deal with:

• If $g'(a) \ne 0$, then, since $\lim_{h\to 0}\gamma(h) = g'(a)$, $H = h\gamma(h)$ cannot be 0 for small nonzero $h$.
Technically, there is an $h_0 > 0$ such that $H = h\gamma(h) \ne 0$ for all $0 < |h| < h_0$. In taking the limit
$h \to 0$, above, we need only consider $0 < |h| < h_0$ and so, in this case, the above computation
is completely correct.
• If $g'(a) = 0$, the above computation is still fine provided we exclude all $h$'s for which
$H = h\gamma(h) = 0$. When $g'(a) = 0$, the right hand side, $f'\big(g(a)\big)\cdot g'(a)$, of the chain rule is 0.
So the above computation gives

\[ \lim_{\substack{h\to 0\\ \gamma(h)\ne 0}}\frac{f(b + H) - f(b)}{h} = f'\big(g(a)\big)\cdot g'(a) = 0. \]

On the other hand, when $H = 0$, we have $f(b + H) - f(b) = 0$. So

\[ \lim_{\substack{h\to 0\\ \gamma(h) = 0}}\frac{f(b + H) - f(b)}{h} = 0 \]

too. That’s all we need.

§§ Chain rule examples


We’ll now use the chain rule to compute some more derivatives.
Example 4.3.11
Find $\dfrac{d}{dx}\left(1+3x\right)^{75}$.
This is a concrete version of Example 4.3.8. We are to find the derivative of a function that is
built up by first computing $1+3x$ and then taking the 75th power of the result. So we set

\begin{align*}
f(u) &= u^{75} & f'(u) &= 75u^{74} \\
g(x) &= 1+3x & g'(x) &= 3 \\
F(x) &= f\big(g(x)\big) = g(x)^{75} = \left(1+3x\right)^{75}.
\end{align*}

By the chain rule,

\begin{align*}
F'(x) &= f'\big(g(x)\big)\,g'(x) = 75\,g(x)^{74}\,g'(x) = 75\left(1+3x\right)^{74}\cdot 3 \\
&= 225\left(1+3x\right)^{74}.
\end{align*}

Example 4.3.11

Example 4.3.12
Find $\dfrac{d}{dx}\sin(x^2)$.


In this example we are to compute the derivative of sin with a (slightly) complicated argument.
So we apply the chain rule with f being sin and g(x) being the complicated argument. That is, we
set

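\begin{align*}
f(u) &= \sin u & f'(u) &= \cos u \\
g(x) &= x^2 & g'(x) &= 2x \\
F(x) &= f\big(g(x)\big) = \sin\big(g(x)\big) = \sin(x^2).
\end{align*}

By the chain rule,

\begin{align*}
F'(x) &= f'\big(g(x)\big)\,g'(x) = \cos\big(g(x)\big)\,g'(x) = \cos(x^2)\cdot 2x \\
&= 2x\cos(x^2).
\end{align*}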

Example 4.3.12
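
As a sanity check on Example 4.3.12, the following minimal sketch compares the chain rule answer $2x\cos(x^2)$ with a symmetric difference quotient. The helper name `central_difference` and the step size `h` are illustrative choices, not something taken from the text.

```python
import math

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

def F(x):
    # The composite function F(x) = sin(x^2) from Example 4.3.12
    return math.sin(x ** 2)

def F_prime(x):
    # Chain rule result: cos(x^2) * 2x
    return 2 * x * math.cos(x ** 2)

for x in (0.0, 0.5, 1.3, 2.0):
    approx = central_difference(F, x)
    exact = F_prime(x)
    print(f"x = {x}: difference quotient = {approx:.6f}, chain rule = {exact:.6f}")
```

The two columns should agree to several decimal places; agreement at a handful of points is evidence, not a proof.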

Example 4.3.13
Find $\dfrac{d}{dx}\sqrt[3]{\sin(x^2)}$.
In this example we are to compute the derivative of the cube root of a (moderately) complicated
argument, namely $\sin(x^2)$. So we apply the chain rule with f being “cube root” and g(x) being the
complicated argument. That is, we set
\begin{align*}
f(u) &= \sqrt[3]{u} = u^{1/3} & f'(u) &= \tfrac13 u^{-2/3} \\
g(x) &= \sin(x^2) & g'(x) &= 2x\cos(x^2) \\
F(x) &= f\big(g(x)\big) = \sqrt[3]{g(x)} = \sqrt[3]{\sin(x^2)}.
\end{align*}

In computing $g'(x)$ here, we have already used the chain rule once (in Example 4.3.12). By the
chain rule,
\begin{align*}
F'(x) &= f'\big(g(x)\big)\,g'(x) = \tfrac13\,g(x)^{-2/3}\cdot 2x\cos(x^2) \\
&= \frac{2x\cos(x^2)}{3\left[\sin(x^2)\right]^{2/3}}.
\end{align*}

Example 4.3.13

Example 4.3.14
Find $\dfrac{d}{dx} f\big(g(h(x))\big)$.
This is very similar to the previous example. Let us set $F(x) = f\big(g(h(x))\big)$ with $u = g(h(x))$.
Then the chain rule tells us that
\begin{align*}
\frac{dF}{dx} &= \frac{df}{du}\cdot\frac{du}{dx} \\
&= f'\big(g(h(x))\big)\cdot\frac{d}{dx}\,g\big(h(x)\big).
\end{align*}


We now just apply the chain rule again

\[ \frac{dF}{dx} = f'\big(g(h(x))\big)\cdot g'\big(h(x)\big)\cdot h'(x). \]

Indeed it is not too hard to generalise further (in the manner of Example 4.1.11) to find the derivative
of the composition of 4 or more functions (though things start to become tedious to write down):

\begin{align*}
\frac{d}{dx} f_1(f_2(f_3(f_4(x)))) &= f_1'(f_2(f_3(f_4(x))))\cdot\frac{d}{dx} f_2(f_3(f_4(x))) \\
&= f_1'(f_2(f_3(f_4(x))))\cdot f_2'(f_3(f_4(x)))\cdot\frac{d}{dx} f_3(f_4(x)) \\
&= f_1'(f_2(f_3(f_4(x))))\cdot f_2'(f_3(f_4(x)))\cdot f_3'(f_4(x))\cdot f_4'(x).
\end{align*}

Example 4.3.14
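
The pattern in Example 4.3.14 (evaluate each derivative at the value fed into that function, then multiply) can be sketched in a few lines. The function name `chain_derivative` and its list-based interface are hypothetical, chosen only for illustration; the check at the end reuses the functions of Example 4.3.13.

```python
import math

def chain_derivative(funcs, derivs, x):
    """Derivative at x of funcs[0](funcs[1](...funcs[-1](x)...)).

    funcs and derivs are parallel lists, outermost function first.
    """
    # Work from the inside out, recording the input each function receives.
    inputs = []
    value = x
    for f in reversed(funcs):
        inputs.append(value)
        value = f(value)
    inputs.reverse()  # now inputs[i] is the argument fed to funcs[i]

    # Multiply the derivatives, each evaluated at its own input.
    result = 1.0
    for d, u in zip(derivs, inputs):
        result *= d(u)
    return result

# Example 4.3.13 as a three-fold composition: cube root of sin of x^2.
funcs = [lambda u: u ** (1 / 3), math.sin, lambda t: t ** 2]
derivs = [lambda u: u ** (-2 / 3) / 3, math.cos, lambda t: 2 * t]

x = 1.0  # sin(1) > 0, so the real cube root is unproblematic here
from_product = chain_derivative(funcs, derivs, x)
from_formula = 2 * x * math.cos(x ** 2) / (3 * math.sin(x ** 2) ** (2 / 3))
print(from_product, from_formula)
```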

Example 4.3.15
We can also use the chain rule to calculate the derivative of the reciprocal of a function11 , and from
there we can use the product rule to recover the quotient rule.
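We want to differentiate $F(x) = \frac{1}{g(x)}$ so set $f(u) = \frac{1}{u}$ and $u = g(x)$. Then the chain rule tells us
\begin{align*}
\frac{d}{dx}\left\{\frac{1}{g(x)}\right\} = \frac{dF}{dx} &= \frac{df}{du}\cdot\frac{du}{dx} \\
&= \frac{-1}{u^2}\cdot g'(x) \\
&= -\frac{g'(x)}{g(x)^2}.
\end{align*}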

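Once we know this, a quick application of the product rule will give us the quotient rule.
\begin{align*}
\frac{d}{dx}\left\{\frac{f(x)}{g(x)}\right\} &= \frac{d}{dx}\left\{f(x)\cdot\frac{1}{g(x)}\right\} && \text{use the product rule} \\
&= f'(x)\cdot\frac{1}{g(x)} + f(x)\cdot\frac{d}{dx}\left\{\frac{1}{g(x)}\right\} && \text{use the result from above} \\
&= f'(x)\cdot\frac{1}{g(x)} - f(x)\cdot\frac{g'(x)}{g(x)^2} && \text{place over a common denominator} \\
&= \frac{f'(x)\cdot g(x) - f(x)\cdot g'(x)}{g(x)^2}
\end{align*}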

which is exactly the quotient rule.


Example 4.3.15

11 We glimpsed this case earlier, in Example 4.1.9.


Example 4.3.16
Compute the following derivative:
\[ \frac{d}{dx}\cos\left(\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}\right) \]
This time we are to compute the derivative of cos with a really complicated argument.

• So, to start, we apply the chain rule with $g(x) = \frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}$ being the really complicated argument
and f being cos. That is, $f(u) = \cos(u)$. Since $f'(u) = -\sin(u)$, the chain rule gives
\[ \frac{d}{dx}\cos\left(\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}\right) = -\sin\left(\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}\right)\frac{d}{dx}\left\{\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}\right\}. \]

• This reduced our problem to that of computing the derivative of the really complicated
argument $\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}$. We can think of the argument as being built up out of three pieces, namely
$x^5$, multiplied by $\sqrt{3+x^6}$, divided by $(4+x^2)^3$, or, equivalently, multiplied by $(4+x^2)^{-3}$.
So we may rewrite $\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}$ as $x^5\,(3+x^6)^{1/2}\,(4+x^2)^{-3}$, and then apply the product rule to
reduce the problem to that of computing the derivatives of the three pieces.

• Here goes (recall Example 4.1.11):

\begin{align*}
\frac{d}{dx}\Big[x^5(3+x^6)^{1/2}(4+x^2)^{-3}\Big] &= \frac{d}{dx}\big[x^5\big]\cdot(3+x^6)^{1/2}\cdot(4+x^2)^{-3} \\
&\quad + x^5\cdot\frac{d}{dx}\big[(3+x^6)^{1/2}\big]\cdot(4+x^2)^{-3} \\
&\quad + x^5\cdot(3+x^6)^{1/2}\cdot\frac{d}{dx}\big[(4+x^2)^{-3}\big].
\end{align*}
This has reduced our problem to computing the derivatives of $x^5$, which is easy, and of
$(3+x^6)^{1/2}$ and $(4+x^2)^{-3}$, both of which can be done by the chain rule. Doing so,
\begin{align*}
\frac{d}{dx}\Big[x^5(3+x^6)^{1/2}(4+x^2)^{-3}\Big] &= \overbrace{\frac{d}{dx}\big[x^5\big]}^{5x^4}\cdot(3+x^6)^{1/2}\cdot(4+x^2)^{-3} \\
&\quad + x^5\cdot\overbrace{\frac{d}{dx}\big[(3+x^6)^{1/2}\big]}^{\frac12(3+x^6)^{-1/2}\cdot 6x^5}\cdot(4+x^2)^{-3} \\
&\quad + x^5\cdot(3+x^6)^{1/2}\cdot\overbrace{\frac{d}{dx}\big[(4+x^2)^{-3}\big]}^{-3(4+x^2)^{-4}\cdot 2x}.
\end{align*}

• Now we can clean things up in a sneaky way by observing

– differentiating $x^5$, to get $5x^4$, is the same as multiplying $x^5$ by $\frac{5}{x}$, and

– differentiating $(3+x^6)^{1/2}$ to get $\frac12(3+x^6)^{-1/2}\cdot 6x^5$ is the same as multiplying $(3+x^6)^{1/2}$
by $\frac{3x^5}{3+x^6}$, and

– differentiating $(4+x^2)^{-3}$ to get $-3(4+x^2)^{-4}\cdot 2x$ is the same as multiplying $(4+x^2)^{-3}$
by $-\frac{6x}{4+x^2}$.

Using these sneaky tricks we can write our solution quite neatly:
\[ \frac{d}{dx}\cos\left(\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}\right) = -\sin\left(\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}\right)\frac{x^5\sqrt{3+x^6}}{(4+x^2)^3}\left\{\frac{5}{x} + \frac{3x^5}{3+x^6} - \frac{6x}{4+x^2}\right\}. \]

• This method of cleaning up the derivative of a messy product is actually something more
systematic in disguise — namely logarithmic differentiation. This is our next topic.

Example 4.3.16
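
With this many pieces it is easy to drop a factor, so a quick numerical cross-check of the final formula of Example 4.3.16 can be reassuring. The sketch below is only an illustration: the names `A`, `F`, `F_prime` and `central_difference`, the test point and the step size are all choices made here, not part of the text.

```python
import math

def A(x):
    # The "really complicated argument" x^5 sqrt(3 + x^6) / (4 + x^2)^3
    return x**5 * math.sqrt(3 + x**6) / (4 + x**2) ** 3

def F(x):
    return math.cos(A(x))

def F_prime(x):
    # Final formula: -sin(A) * A * {5/x + 3x^5/(3 + x^6) - 6x/(4 + x^2)}
    bracket = 5 / x + 3 * x**5 / (3 + x**6) - 6 * x / (4 + x**2)
    return -math.sin(A(x)) * A(x) * bracket

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.5  # any x > 0 works; the 5/x term excludes x = 0 in this form
print(F_prime(x), central_difference(F, x))
```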

4.4 Logarithmic differentiation

Learning Objectives
• Differentiate logarithmic functions.

• Determine when to use logarithmic differentiation to simplify derivatives.

• Use logarithmic differentiation.

• Use the generalized product rule to compute the derivative of products of many
functions.

The chain rule opens the way to understanding derivatives of more complicated functions: not
only compositions of known functions, as we have seen in the examples of the previous section, but
also functions which are defined implicitly.
Consider the logarithm base e: $\log_e(x)$ is the power that e must be raised to in order to give x. That is,
$\log_e(x)$ is defined by

\[ e^{\log_e x} = x \]

i.e. — it is the inverse of the exponential function with base e. Since this choice of base works
so cleanly and easily with respect to differentiation, this base turns out to be (arguably) the most
natural choice for the base of the logarithm. And as we saw in our whirlwind review of logarithms
in Section 3.5, it is easy to use logarithms of one base to compute logarithms with another base:
\[ \log_q x = \frac{\log_e x}{\log_e q} \]
So we are (relatively) free to choose a base which is convenient for our purposes.


The logarithm with base e is called the “natural logarithm”. The “naturalness” of logarithms
base e is exactly that this choice of base works very nicely in calculus (and so wider mathematics)
in ways that other bases do not12 . There are several different “standard” notations for the logarithm
base e:
\[ \log_e x = \log x = \ln x. \]
We recommend that you be able to recognise all of these.
In this text we will write the natural logarithm as “log” with no base. The reason for this choice
is that base e is the standard choice of base for logarithms in mathematics13 . The natural logarithm
inherits many properties of general logarithms14 . So, for all $x, y > 0$ the following hold:
• $e^{\log x} = x$,

• for any real number $X$, $\log e^X = X$,

• for any $a > 1$, $\log_a x = \frac{\log x}{\log a}$ and $\log x = \frac{\log_a x}{\log_a e}$

• $\log 1 = 0$, $\log e = 1$

• $\log(xy) = \log x + \log y$

• $\log\left(\frac{x}{y}\right) = \log x - \log y$, $\log\left(\frac{1}{y}\right) = -\log y$

• $\log(x^X) = X\log x$

• $\lim_{x\to\infty}\log x = \infty$, $\lim_{x\to 0^+}\log x = -\infty$

And finally we should remember that $\log x$ has domain (i.e. is defined for) $x > 0$ and range (i.e. takes
all values in) $-\infty < x < \infty$.

Figure 4.4.1. [Plot of $y = \ln x$; horizontal axis marked from 1 to 4, vertical axis from −1.5 to 1.5.]

12 The interested reader should head to Wikipedia and look up the natural logarithm.
13 In other disciplines other bases are natural; in computer science, since numbers are stored in binary it makes sense
to use the binary logarithm — i.e. base 2. While in some sciences and finance, it makes sense to use the decimal
logarithm — i.e. base 10.
14 Again take a quick look at the whirlwind review of logarithms in Section 3.5.


To compute the derivative of log x we could attempt to start with the limit definition of the
derivative
\begin{align*}
\frac{d}{dx}\log x &= \lim_{h\to 0}\frac{\log(x+h) - \log(x)}{h} \\
&= \lim_{h\to 0}\frac{\log\big((x+h)/x\big)}{h} \\
&= \text{um}\ldots
\end{align*}
This doesn’t look good. But all is not lost — we have the chain rule, and we know that the logarithm
satisfies the equation:
\[ x = e^{\log x} \]
Since both sides of the equation are the same function, both sides of the equation have the same
derivative. i.e. we are using15
\[ \text{if } f(x) = g(x) \text{ for all } x, \text{ then } f'(x) = g'(x) \]
So now differentiate both sides:
\[ \frac{d}{dx}x = \frac{d}{dx}e^{\log x} \]
The left-hand side is easy, and the right-hand side we can process using the chain rule with $f(u) = e^u$
and $u = \log x$.
\begin{align*}
1 &= \frac{df}{du}\cdot\frac{du}{dx} \\
&= e^u\cdot\underbrace{\frac{d}{dx}\log x}_{\text{what we want to compute}}
\end{align*}

Recall that $e^u = e^{\log x} = x$, so

\[ 1 = x\cdot\underbrace{\frac{d}{dx}\log x}_{\text{now what?}} \]
We can now just rearrange this equation to make the thing we want the subject:
\[ \frac{d}{dx}\log x = \frac{1}{x} \]
Thus we have proved:
Theorem 4.4.1.

\[ \frac{d}{dx}\log x = \frac{1}{x} \]
where log x is the logarithm base e.
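
The limit definition that stalled at “um...” above can at least be explored numerically. The short sketch below (the function name `difference_quotient`, the sample point and the step sizes are arbitrary choices made here) evaluates the difference quotient of the natural logarithm for shrinking $h$ and prints it next to $1/x$.

```python
import math

def difference_quotient(x, h):
    """The quotient (log(x + h) - log(x)) / h from the limit definition."""
    return (math.log(x + h) - math.log(x)) / h

x = 2.0
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"h = {h:g}: quotient = {difference_quotient(x, h):.6f}, 1/x = {1 / x:.6f}")
```

The quotients should drift towards $1/x$ as $h$ shrinks, which is consistent with Theorem 4.4.1 but of course does not replace the argument given above.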

15 Notice that just because the derivatives are the same, doesn’t mean the original functions are the same. Both
$f(x) = x^2$ and $g(x) = x^2 + 3$ have derivative $f'(x) = g'(x) = 2x$, but $f(x) \ne g(x)$.


Example 4.4.2
Let $f(x) = \log 3x$. Find $f'(x)$.
There are two ways to approach this: we can simplify then differentiate, or differentiate and
then simplify. Neither is difficult.

• Simplify and then differentiate:

\begin{align*}
f(x) &= \log 3x && \text{log of a product} \\
&= \log 3 + \log x \\
f'(x) &= \frac{d}{dx}\log 3 + \frac{d}{dx}\log x \\
&= \frac{1}{x}.
\end{align*}

• Differentiate and then simplify:

\begin{align*}
f'(x) &= \frac{d}{dx}\log(3x) && \text{chain rule} \\
&= \frac{1}{3x}\cdot 3 \\
&= \frac{1}{x}
\end{align*}

Example 4.4.2

Example 4.4.3 (The derivative of log cx)


Notice that we can extend the previous example to any positive constant, not just 3. Let $c > 0$ be
a constant, then
\begin{align*}
\frac{d}{dx}\log cx &= \frac{d}{dx}\left(\log c + \log x\right) \\
&= \frac{1}{x}
\end{align*}

Example 4.4.3

Example 4.4.4 (The derivative of log |x|)


We can push this further still. Let $g(x) = \log|x|$, then16

• If $x > 0$, $|x| = x$ and so

\[ g'(x) = \frac{d}{dx}\log x = \frac{1}{x} \]

16 It’s probably a good moment to go back and look at Example 3.3.15.


• If $x < 0$ then $|x| = -x$. If $|h|$ is strictly smaller than $|x|$, then we also have that $x + h < 0$
and $|x+h| = -(x+h) = |x| - h$. Write $X = |x|$ and $H = -h$. Then, by the definition of the
derivative,

\begin{align*}
g'(x) &= \lim_{h\to 0}\frac{\log|x+h| - \log|x|}{h} = \lim_{h\to 0}\frac{\log(|x| - h) - \log|x|}{h} \\
&= \lim_{H\to 0}\frac{\log(X+H) - \log X}{-H} = -\lim_{H\to 0}\frac{\log(X+H) - \log X}{H} \\
&= -\frac{d}{dX}\log X = -\frac{1}{X} = -\frac{1}{|x|} \\
&= \frac{1}{x}
\end{align*}

• Since $\log 0$ is undefined, $g'(0)$ does not exist.

Putting this together gives:

\[ \frac{d}{dx}\log|x| = \frac{1}{x} \]

Example 4.4.4

Example 4.4.5 (The derivative of $x^a$)


Just after Corollary 4.1.22, we said that we would, in the future, find the derivative of $x^a$ for all real
numbers. The future is here. Let $x > 0$ and $a$ be any real number. Exponentiating both sides of
$\log x^a = a\log x$ gives us $x^a = e^{a\log x}$ and then

\begin{align*}
\frac{d}{dx}x^a &= \frac{d}{dx}e^{a\log x} = e^{a\log x}\,\frac{d}{dx}(a\log x) && \text{by the chain rule} \\
&= \frac{a}{x}\,e^{a\log x} = \frac{a}{x}\,x^a \\
&= a x^{a-1}
\end{align*}

as expected.
Example 4.4.5

We can extend Theorem 4.4.1 to compute the derivative of logarithms of other bases in a
straightforward way. Since for any positive $a \ne 1$:

\begin{align*}
\log_a x &= \frac{\log x}{\log a} = \frac{1}{\log a}\cdot\log x && \text{since } a \text{ is a constant} \\
\frac{d}{dx}\log_a x &= \frac{1}{\log a}\cdot\frac{1}{x}
\end{align*}


§§ Back to $\frac{d}{dx}a^x$
We can also now finally get around to computing the derivative of $a^x$ (which we started to do back in
Section 3.5).
Example 4.4.6 (The derivative of $a^x$)
We show two ways to compute this derivative.
• Method 1:
\begin{align*}
f(x) &= a^x && \text{take log of both sides} \\
\log f(x) &= x\log a && \text{exponentiate both sides base } e \\
f(x) &= e^{x\log a} && \text{chain rule} \\
f'(x) &= e^{x\log a}\cdot\log a \\
&= a^x\cdot\log a.
\end{align*}
• Method 2:
\begin{align*}
f(x) &= a^x && \text{take log of both sides} \\
\log f(x) &= x\log a && \text{differentiate both sides} \\
\frac{d}{dx}\big(\log f(x)\big) &= \log a
\end{align*}
We then process the left-hand side using the chain rule
\begin{align*}
f'(x)\cdot\frac{1}{f(x)} &= \log a \\
f'(x) &= f(x)\cdot\log a = a^x\cdot\log a.
\end{align*}

Example 4.4.6
We will see $\frac{d}{dx}\log f(x)$ more below in the subsection on “logarithmic differentiation”.
To summarise the results above:
Corollary 4.4.7.

\begin{align*}
\frac{d}{dx}a^x &= \log a\cdot a^x && \text{for any } a > 0 \\
\frac{d}{dx}\log_a x &= \frac{1}{x\cdot\log a} && \text{for any } a > 0,\ a \ne 1
\end{align*}
where log x is the natural logarithm.
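
Both formulas of Corollary 4.4.7 can be spot-checked against difference quotients. This is a minimal sketch with arbitrarily chosen values $a = 3$ and $x = 1.7$; the helper `central_difference` is an illustrative name, and `math.log(t, a)` is Python's built-in logarithm of `t` to base `a`.

```python
import math

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

a, x = 3.0, 1.7

# d/dx a^x = log(a) * a^x
exp_numeric = central_difference(lambda t: a ** t, x)
exp_formula = math.log(a) * a ** x

# d/dx log_a(x) = 1 / (x * log(a))
log_numeric = central_difference(lambda t: math.log(t, a), x)
log_formula = 1 / (x * math.log(a))

print(exp_numeric, exp_formula)
print(log_numeric, log_formula)
```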

Recall that we need the caveat $a \ne 1$ because the logarithm base 1 is not well defined. This
is because $1^x = 1$ for any x. We do not need a similar caveat for the derivative of the exponential
because we know (recall Example 3.5.1)
\begin{align*}
\frac{d}{dx}1^x &= \frac{d}{dx}1 = 0 && \text{while the above corollary tells us} \\
&= \log 1\cdot 1^x = 0\cdot 1 = 0.
\end{align*}


§§ Revisiting examples, this time through logarithmic differentiation


We want to go back to some previous slightly messy examples (Examples 4.1.11 and 4.1.23) and
now show you how they can be done more easily.
Example 4.4.8
Consider again the derivative of the product of 3 functions:

\[ P(x) = F(x)\cdot G(x)\cdot H(x) \]

Start by taking the logarithm of both sides:

\begin{align*}
\log P(x) &= \log\big(F(x)\cdot G(x)\cdot H(x)\big) \\
&= \log F(x) + \log G(x) + \log H(x).
\end{align*}

Notice that the product of functions on the right-hand side has become a sum of functions.
Differentiating sums is much easier than differentiating products. So when we differentiate we have

\[ \frac{d}{dx}\log P(x) = \frac{d}{dx}\log F(x) + \frac{d}{dx}\log G(x) + \frac{d}{dx}\log H(x). \]

A quick application of the chain rule shows that $\frac{d}{dx}\log f(x) = f'(x)/f(x)$:

\[ \frac{P'(x)}{P(x)} = \frac{F'(x)}{F(x)} + \frac{G'(x)}{G(x)} + \frac{H'(x)}{H(x)}. \]

Multiply through by $P(x) = F(x)G(x)H(x)$:

\begin{align*}
P'(x) &= \left(\frac{F'(x)}{F(x)} + \frac{G'(x)}{G(x)} + \frac{H'(x)}{H(x)}\right)\cdot F(x)G(x)H(x) \\
&= F'(x)G(x)H(x) + F(x)G'(x)H(x) + F(x)G(x)H'(x),
\end{align*}

which is what we found in Example 4.1.11 by repeated application of the product rule. The above
generalises quite easily to more than 3 functions.
Example 4.4.8
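
The generalisation to many factors amounts to $P' = P\cdot\sum_i f_i'/f_i$, valid wherever no factor vanishes. A minimal sketch follows; the function name `product_derivative` and the three sample factors are hypothetical choices made for illustration.

```python
import math

def product_derivative(values, derivatives):
    """P'(x) for P = f_1 * ... * f_n, via P' = P * sum(f_i' / f_i).

    values[i] is f_i(x) and derivatives[i] is f_i'(x); every value must be nonzero.
    """
    product = 1.0
    for v in values:
        product *= v
    return product * sum(d / v for v, d in zip(values, derivatives))

# Three factors x^2, sin(x), e^x evaluated at x = 1
x = 1.0
values = [x**2, math.sin(x), math.exp(x)]
derivatives = [2 * x, math.cos(x), math.exp(x)]

via_logs = product_derivative(values, derivatives)
# Direct three-term product rule, as in Example 4.1.11
direct = (derivatives[0] * values[1] * values[2]
          + values[0] * derivatives[1] * values[2]
          + values[0] * values[1] * derivatives[2])
print(via_logs, direct)
```

Note the restriction to nonzero factors: the three-term product rule itself has no such restriction, so this form is a convenience rather than a replacement.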
This same trick of “take a logarithm and then differentiate” — or logarithmic differentiation — will
work any time you have a product (or ratio) of functions.
Example 4.4.9
Let’s use logarithmic differentiation on the function from Example 4.1.23:
\[ f(x) = \frac{(\sqrt{x} - 1)(2-x)(1-x^2)}{\sqrt{x}\,(3+2x)} \]

Beware however, that we may only take the logarithm of positive numbers, and this $f(x)$ is often
negative. For example, if $1 < x < 2$, the factor $(1-x^2)$ in the definition of $f(x)$ is negative while
all of the other factors are positive, so that $f(x) < 0$. Nonetheless, we can use logarithmic
differentiation to find $f'(x)$, by exploiting the observation that $\frac{d}{dx}\log|f(x)| = \frac{f'(x)}{f(x)}$. (To see this, use
the chain rule and Example 4.4.4.) So we take the logarithm of $|f(x)|$ and expand.
\begin{align*}
\log|f(x)| &= \log\frac{|\sqrt{x}-1|\,|2-x|\,|1-x^2|}{\sqrt{x}\,|3+2x|} \\
&= \log|\sqrt{x}-1| + \log|2-x| + \log|1-x^2| - \underbrace{\log(\sqrt{x})}_{=\frac12\log x} - \log|3+2x|
\end{align*}

Now we can essentially just differentiate term-by-term:

\begin{align*}
\frac{d}{dx}\log|f(x)| &= \frac{d}{dx}\Big(\log|\sqrt{x}-1| + \log|2-x| + \log|1-x^2| - \tfrac12\log(x) - \log|3+2x|\Big) \\
\frac{f'(x)}{f(x)} &= \frac{1/(2\sqrt{x})}{\sqrt{x}-1} + \frac{-1}{2-x} + \frac{-2x}{1-x^2} - \frac{1}{2x} - \frac{2}{3+2x} \\
f'(x) &= f(x)\cdot\left(\frac{1}{2\sqrt{x}(\sqrt{x}-1)} - \frac{1}{2-x} - \frac{2x}{1-x^2} - \frac{1}{2x} - \frac{2}{3+2x}\right) \\
&= \frac{(\sqrt{x}-1)(2-x)(1-x^2)}{\sqrt{x}\,(3+2x)}\cdot\left(\frac{1}{2\sqrt{x}(\sqrt{x}-1)} - \frac{1}{2-x} - \frac{2x}{1-x^2} - \frac{1}{2x} - \frac{2}{3+2x}\right)
\end{align*}

just as we found previously.


Example 4.4.9
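
The point that the formula remains valid where $f(x) < 0$ can be checked numerically. The sketch below evaluates the formula of Example 4.4.9 at $x = 1.5$, where $f$ is negative, and compares it with a difference quotient; the helper names and the sample point are choices made here for illustration only.

```python
import math

def f(x):
    return (math.sqrt(x) - 1) * (2 - x) * (1 - x**2) / (math.sqrt(x) * (3 + 2 * x))

def f_prime(x):
    # f(x) times the bracket obtained by differentiating log|f(x)| term by term
    s = math.sqrt(x)
    bracket = (1 / (2 * s * (s - 1)) - 1 / (2 - x) - 2 * x / (1 - x**2)
               - 1 / (2 * x) - 2 / (3 + 2 * x))
    return f(x) * bracket

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.5  # here f(x) < 0, so log f(x) itself would be undefined
print(f(x), f_prime(x), central_difference(f, x))
```

The bracket is undefined at $x = 1$ and $x = 2$, where factors of $f$ vanish, so those points need separate treatment.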

4.5 Implicit differentiation

Learning Objectives
• Explain how implicit differentiation is a consequence of the Chain Rule.

• Use implicit differentiation to find slopes of tangent lines to implicitly defined curves.

Implicit differentiation is a simple trick that is used to compute derivatives of functions either

• when you don’t know an explicit formula for the function, but you know an equation that the
function obeys, or

• even when you have an explicit, but complicated, formula for the function, and the function
obeys a simple equation.

The trick is just to differentiate both sides of the equation and then solve for the derivative we
are seeking. In fact we have already done this, without using the name “implicit differentiation”,
when we found the derivative of log x in the previous section. There we knew that the function


$f(x) = \log x$ satisfied the equation $e^{f(x)} = x$ for all x. That is, the functions $e^{f(x)}$ and $x$ are in fact
the same function and so have the same derivative. So we had
\[ \frac{d}{dx}e^{f(x)} = \frac{d}{dx}x = 1 \]
We then used the chain rule to get $\frac{d}{dx}e^{f(x)} = e^{f(x)}f'(x)$, which told us that $f'(x)$ obeys the equation

\begin{align*}
e^{f(x)}f'(x) &= 1 && \text{and we can now solve for } f'(x) \\
f'(x) &= e^{-f(x)} = e^{-\log x} = \frac{1}{x}.
\end{align*}
The typical way to get used to implicit differentiation is to play with problems involving tangent
lines to curves. So here are a few examples finding the equations of tangent lines to curves. 
Recall, from Theorem 3.3.7 (see footnote 17), that, in general, the tangent line to the curve $y = f(x)$ at $(x_0, y_0)$ is
\[ y = f(x_0) + f'(x_0)(x - x_0) = y_0 + f'(x_0)(x - x_0). \]
Example 4.5.1
Find the equation of the tangent line to $y = y^3 + xy + x^3$ at $x = 1$.
This is a very standard sounding example, but made a little complicated by the fact that the curve
is given by a cubic equation, which means we cannot solve directly for y in terms of x or vice
versa. So we really do need implicit differentiation.

• First notice that when $x = 1$ the equation, $y = y^3 + xy + x^3$, of the curve simplifies to
$y = y^3 + y + 1$ or $y^3 = -1$, which we can solve18 : $y = -1$. So we know that the curve passes
through $(1, -1)$ when $x = 1$.

• Now, to find the slope of the tangent line at $(1, -1)$, pretend that our curve is $y = f(x)$ so that
$f(x)$ obeys

\[ f(x) = f(x)^3 + x f(x) + x^3 \]

for all x. Differentiating both sides gives

\[ f'(x) = 3f(x)^2 f'(x) + f(x) + x f'(x) + 3x^2 \]

• At this point we could isolate for $f'(x)$ and write it in terms of $f(x)$ and $x$, but since we only
want answers when $x = 1$, let us substitute in $x = 1$ and $f(1) = -1$ (since the curve passes
through $(1, -1)$) and clean things up before doing anything else.

• Subbing in $x = 1$, $f(1) = -1$ gives

\[ f'(1) = 3f'(1) - 1 + f'(1) + 3 \quad\text{and so}\quad f'(1) = -\frac{2}{3} \]
17 In Theorem 3.3.7 we wrote the x-coordinate of the point as a. The following examples use the name x0 instead. Of
course, we could use any name we would like — a, x0 , ♥. . . etc — but the symbols that are usually chosen for this
are x0 or a.
18 This type of luck rarely happens in the “real world”. But it happens remarkably frequently in textbooks, problem
sets and tests.


• The equation of the tangent line is

\[ y = y_0 + f'(x_0)(x - x_0) = -1 - \frac{2}{3}(x - 1) = -\frac{2}{3}x - \frac{1}{3} \]

We can further clean up the equation of the line to write it as $2x + 3y = -1$.


Example 4.5.1
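
Although the curve of Example 4.5.1 cannot be solved for $y$ by hand, nearby points on it can be found numerically, which gives an independent check of the slope $-\frac23$. The sketch below uses Newton's method in $y$ for fixed $x$; Newton's method is a tool brought in here for illustration and is not part of the text's argument, and the function name `solve_y`, the tolerance and the step `h` are arbitrary choices.

```python
def solve_y(x, y_guess=-1.0, tol=1e-14, max_iter=50):
    """Find y near y_guess with y = y^3 + x*y + x^3, by Newton's method in y."""
    y = y_guess
    for _ in range(max_iter):
        residual = y**3 + x * y + x**3 - y   # zero exactly on the curve
        slope_in_y = 3 * y**2 + x - 1        # derivative of the residual in y
        step = residual / slope_in_y
        y -= step
        if abs(step) < tol:
            break
    return y

h = 1e-5
y_at_1 = solve_y(1.0)
numeric_slope = (solve_y(1 + h) - solve_y(1 - h)) / (2 * h)

print(y_at_1)          # the point found in the example has y = -1
print(numeric_slope)   # compare with f'(1) = -2/3
print(2 * 1 + 3 * y_at_1)  # the tangent line 2x + 3y = -1 passes through the point
```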
In the previous example we replaced y by f(x) in the middle of the computation. We don’t actually
have to do this. When we are writing out our solution we can remember that y is a function of x. So
we can start with

\[ y = y^3 + xy + x^3 \]

and differentiate remembering that $y \equiv y(x)$

\[ y' = 3y^2y' + xy' + y + 3x^2 \]

And now substitute $x = 1$, $y = -1$ to get

\[ y'(1) = 3\cdot y'(1) + y'(1) - 1 + 3 \quad\text{and so}\quad y'(1) = -\frac{2}{3} \]
3
As a brief interlude to the tangent line problems: note that implicit differentiation can be used for
higher-order derivatives, too. Consider the same function as in the above example, and the problem
of finding its second derivative.
Example 4.5.2 (Example 4.5.1, continued)
Find $y''$ if $y = y^3 + xy + x^3$.
Solution. Again, this problem concerns some function $y(x)$ that is not given to us explicitly. All
that we are told is that $y(x)$ satisfies

\[ y(x) = y(x)^3 + x\,y(x) + x^3 \tag{E1} \]

for all x. We are asked to find $y''(x)$. We cannot solve this equation to get an explicit formula for
$y(x)$. So we use implicit differentiation, as we did in Example 4.5.1. That is, we apply $\frac{d}{dx}$ to both
sides of (E1). This gives

\[ y'(x) = 3y(x)^2\,y'(x) + y(x) + x\,y'(x) + 3x^2 \tag{E2} \]

Since Example 4.5.1 asked us to find the tangent line at a specific point, we substituted in some
values before solving for $y'(x)$. In this example we are just finding the general derivative, not its value at a
specific point, so there are no values to substitute in. We go directly to solving for $y'(x)$ by moving
all $y'(x)$’s to the left hand side, giving

\[ \big[1 - x - 3y(x)^2\big]\,y'(x) = y(x) + 3x^2 \]

and then dividing across to get

\[ y'(x) = \frac{y(x) + 3x^2}{1 - x - 3y(x)^2}. \tag{E3} \]


To get $y''(x)$ from here, we have two options.

Method 1. Apply $\frac{d}{dx}$ to both sides of (E2). This gives

\[ y''(x) = 3y(x)^2\,y''(x) + 6y(x)\,y'(x)^2 + 2y'(x) + x\,y''(x) + 6x \]

We can now solve for $y''(x)$, giving

\[ y''(x) = \frac{6x + 2y'(x) + 6y(x)\,y'(x)^2}{1 - x - 3y(x)^2} \tag{E4} \]
Then we can substitute in (E3), giving
\begin{align*}
y''(x) &= 2\,\frac{3x + \frac{y(x)+3x^2}{1-x-3y(x)^2} + 3y(x)\left(\frac{y(x)+3x^2}{1-x-3y(x)^2}\right)^2}{1 - x - 3y(x)^2} \\
&= 2\,\frac{3x\,[1-x-3y(x)^2]^2 + [y(x)+3x^2][1-x-3y(x)^2] + 3y(x)\,[y(x)+3x^2]^2}{[1-x-3y(x)^2]^3}
\end{align*}
Method 2. Alternatively, we can also differentiate (E3).

\begin{align*}
y''(x) &= \frac{[y'(x)+6x][1-x-3y(x)^2] - [y(x)+3x^2][-1-6y(x)\,y'(x)]}{[1-x-3y(x)^2]^2} \\
&= \frac{\left[\frac{y(x)+3x^2}{1-x-3y(x)^2} + 6x\right][1-x-3y(x)^2] - [y(x)+3x^2]\left[-1-6y(x)\frac{y(x)+3x^2}{1-x-3y(x)^2}\right]}{[1-x-3y(x)^2]^2} \\
&= \frac{2[y(x)+3x^2][1-x-3y(x)^2] + 6x\,[1-x-3y(x)^2]^2 + 6y(x)\,[y(x)+3x^2]^2}{[1-x-3y(x)^2]^3}
\end{align*}

Remark 1. We have now computed $y''(x)$, sort of. The answer is in terms of $y(x)$, which we
don’t know. Since we cannot get an explicit formula for $y(x)$, there’s not a great deal that we can do,
in general.
Remark 2. Even though we cannot solve $y = y^3 + xy + x^3$ explicitly for $y(x)$, for general x, it
is sometimes possible to solve equations like this for some special values of x. In fact, we saw
in Example 4.5.1 that when $x = 1$, the given equation reduces to $y(1) = y(1)^3 + 1\cdot y(1) + 1^3$, or
$y(1)^3 = -1$, which we can solve to get $y(1) = -1$. Substituting into (E3) gives, in agreement with
Example 4.5.1,
\[ y'(1) = \frac{-1+3}{1 - 1 - 3(-1)^2} = -\frac{2}{3} \]
and substituting into (E4) gives
\[ y''(1) = \frac{6 + 2\left(-\frac23\right) + 6(-1)\left(-\frac23\right)^2}{1 - 1 - 3(-1)^2} = \frac{6 - \frac43 - \frac83}{-3} = -\frac{2}{3} \]

(It’s a fluke that, in this example, $y'(1)$ and $y''(1)$ happen to be equal.) So we now know that,
even though we can’t solve $y = y^3 + xy + x^3$ explicitly for $y(x)$, the graph of the solution passes
through $(1, -1)$ and has slope $-\frac23$ (i.e. is sloping downwards by between 30° and 45°) there and,
furthermore, the slope of the graph decreases as x increases through $x = 1$.

[Figure: sketch of the graph near $(1, -1)$, together with its tangent line.]

Here is a sketch of the part of the graph very near $(1, -1)$. The tangent line to the graph at $(1, -1)$ is
also shown. Note that the tangent line is sloping down to the right, as we expect, and that the graph
lies below the tangent line near $(1, -1)$. That’s because the slope $f'(x)$ is decreasing (becoming
more negative) as x passes through 1.
Example 4.5.2
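
The arithmetic in Remark 2 is easy to get wrong by a sign, so here is a small exact-arithmetic sketch that evaluates (E3) and (E4) at the point $(1, -1)$. The function names `y_prime` and `y_double_prime` are illustrative; `fractions.Fraction` is Python's standard exact rational type.

```python
from fractions import Fraction

def y_prime(x, y):
    # Formula (E3): y' = (y + 3x^2) / (1 - x - 3y^2)
    return (y + 3 * x**2) / (1 - x - 3 * y**2)

def y_double_prime(x, y):
    # Formula (E4): y'' = (6x + 2y' + 6 y y'^2) / (1 - x - 3y^2)
    yp = y_prime(x, y)
    return (6 * x + 2 * yp + 6 * y * yp**2) / (1 - x - 3 * y**2)

x, y = Fraction(1), Fraction(-1)
print(y_prime(x, y))         # -2/3
print(y_double_prime(x, y))  # -2/3
```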

Warning 4.5.3.

Many people will suppress the $(x)$ in $y(x)$ when doing computations like those in Example
4.5.2. This gives shorter, easier to read formulae, like $y' = \frac{y+3x^2}{1-x-3y^2}$. If you do this, you
must never forget that y is a function of x and is not a constant. If you do forget, you’ll
make the very serious error of saying that $\frac{dy}{dx} = 0$, which is false.

Okay. The next one returns to a question involving tangent lines, and is at the same time a bit
easier (because it is a quadratic, and because we only need to take the first derivative) and a bit
harder (because we are asked for the tangent at a general point on the curve, not a specific one).
Example 4.5.4
Let $(x_0, y_0)$ be a point on the ellipse $3x^2 + 5y^2 = 7$. Find the equation for the tangent line when
$x = 1$ and y is positive. Then find an equation for the tangent line to the ellipse at a general point
$(x_0, y_0)$.
Since we are not given a specific point $x_0$ we are going to have to be careful with the second
half of this question.

• When $x = 1$ the equation simplifies to

\begin{align*}
3 + 5y^2 &= 7 \\
5y^2 &= 4 \\
y &= \pm\frac{2}{\sqrt{5}}.
\end{align*}
We are only interested in positive y, so our point on the curve is $(1, 2/\sqrt{5})$.


• Now we use implicit differentiation to find $\frac{dy}{dx}$ at this point. First we pretend that we have
solved the curve explicitly, for some interval of x’s, as $y = f(x)$. The equation becomes
\begin{align*}
3x^2 + 5f(x)^2 &= 7 && \text{now differentiate} \\
6x + 10f(x)f'(x) &= 0 \\
f'(x) &= -\frac{3x}{5f(x)}.
\end{align*}
• When $x = 1$, $y = 2/\sqrt{5}$ this becomes
\[ f'(1) = -\frac{3}{5\cdot 2/\sqrt{5}} = -\frac{3}{2\sqrt{5}}. \]
So the tangent line passes through $(1, 2/\sqrt{5})$ and has slope $-\frac{3}{2\sqrt{5}}$. Hence the tangent line has
equation
\begin{align*}
y &= y_0 + f'(x_0)(x - x_0) \\
&= \frac{2}{\sqrt{5}} - \frac{3}{2\sqrt{5}}(x - 1) \\
&= \frac{7 - 3x}{2\sqrt{5}} \quad\text{or equivalently} \\
3x + 2\sqrt{5}\,y &= 7.
\end{align*}

Now we should go back and do the same but for a general point on the curve (x0 , y0 ):
• A good first step here is to sketch the curve. Since this is an ellipse, it is pretty straightforward.

[Figure: two sketches of the ellipse $3x^2 + 5y^2 = 7$, marking the points $(-\sqrt{7/3}, 0)$ and $(\sqrt{7/3}, 0)$ and a general point $(x_0, y_0)$.]

• Notice that there are two points on the ellipse, the extreme right and left points
$(x_0, y_0) = (\pm\sqrt{7/3}, 0)$, at which the tangent line is vertical. In those two cases, the tangent line is just
$x = x_0$.
• Since this is a quadratic for y, we could solve it explicitly to get
\[ y = \pm\sqrt{\frac{7 - 3x^2}{5}} \]
and choose the positive or negative branch as appropriate. Then we could differentiate to find
the slope and put things together to get the tangent line.
But even in this relatively easy case, it is computationally cleaner, and hence less vulnerable
to mechanical errors, to use implicit differentiation. So that’s what we’ll do.


• Now we could again “pretend” that we have solved the equation for the ellipse for y = f (x)
near (x0 , y0 ), but let’s not do that. Instead (as we did just before this example) just remember
that when we differentiate y is really a function of x. So starting from
\begin{align*}
3x^2 + 5y^2 &= 7 && \text{differentiating gives} \\
6x + 5\cdot 2y\cdot y' &= 0.
\end{align*}
We can then solve this for $y'$:
\[ y' = -\frac{3x}{5y} \]
where $y'$ and $y$ are both functions of x.
• Hence at the point $(x_0, y_0)$ we have
\[ y'\Big|_{(x_0, y_0)} = -\frac{3x_0}{5y_0}. \]
This is the slope of the tangent line at $(x_0, y_0)$ and so its equation is
\begin{align*}
y &= y_0 + y'\cdot(x - x_0) \\
&= y_0 - \frac{3x_0}{5y_0}(x - x_0).
\end{align*}
We can simplify this by multiplying through by $5y_0$ to get
\[ 5y_0 y = 5y_0^2 - 3x_0 x + 3x_0^2. \]
We can clean this up more by moving all the terms that contain x or y to the left-hand side and
everything else to the right:
\[ 3x_0 x + 5y_0 y = 3x_0^2 + 5y_0^2. \]

But there is one more thing we can do: our original equation is $3x^2 + 5y^2 = 7$ for all points on
the curve, so we know that $3x_0^2 + 5y_0^2 = 7$. This cleans up the right-hand side:
\[ 3x_0 x + 5y_0 y = 7. \]

• In deriving this formula for the tangent line at $(x_0, y_0)$ we have assumed that $y_0 \ne 0$. But in
fact the final answer happens to also work when $y_0 = 0$ (which means $x_0 = \pm\sqrt{7/3}$), so that
the tangent line is $x = x_0$.
We can also check that our answer for general $(x_0, y_0)$ reduces to our answer for $x_0 = 1$.

• When $x_0 = 1$ we worked out that $y_0 = 2/\sqrt{5}$.
• Plugging this into our answer above gives
\begin{align*}
3x_0 x + 5y_0 y &= 7 && \text{sub in } (x_0, y_0) = (1, 2/\sqrt{5}): \\
3x + 5\,\frac{2}{\sqrt{5}}\,y &= 7 && \text{clean up a little} \\
3x + 2\sqrt{5}\,y &= 7
\end{align*}
as required.


Example 4.5.4
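
The general answer $3x_0 x + 5y_0 y = 7$ is compact enough to wrap in a small helper. This sketch (the function name `ellipse_tangent` and the tolerance are hypothetical choices) returns the coefficients of the tangent line and confirms both the $x_0 = 1$ case and the vertical tangent at the rightmost point.

```python
import math

def ellipse_tangent(x0, y0):
    """Coefficients (A, B, C) of the tangent line A*x + B*y = C
    to the ellipse 3x^2 + 5y^2 = 7 at the point (x0, y0)."""
    if not math.isclose(3 * x0**2 + 5 * y0**2, 7.0, rel_tol=1e-9):
        raise ValueError("(x0, y0) is not on the ellipse 3x^2 + 5y^2 = 7")
    return (3 * x0, 5 * y0, 7.0)

# The specific point of the example: expect 3x + 2*sqrt(5)*y = 7
A, B, C = ellipse_tangent(1.0, 2 / math.sqrt(5))
print(A, B, C, 2 * math.sqrt(5))

# The rightmost point: B = 0, so the line is vertical, x = C / A = sqrt(7/3)
A_v, B_v, C_v = ellipse_tangent(math.sqrt(7 / 3), 0.0)
print(C_v / A_v, math.sqrt(7 / 3))
```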

Example 4.5.5
At which points does the curve $x^2 - xy + y^2 = 3$ cross the x-axis? Are the tangent lines to the curve
at those points parallel?
This is a 2 part question: first the x-intercepts and then we need to examine tangent lines.
• Finding where the curve crosses the x-axis is straightforward. It does so when $y = 0$. This
means x satisfies
\[ x^2 - x\cdot 0 + 0^2 = 3 \quad\text{so}\quad x = \pm\sqrt{3}. \]
So the curve crosses the x-axis at two points $(\pm\sqrt{3}, 0)$.
• Now we need to find the tangent lines at those points. But we don’t actually need the lines,
just their slopes. Again we can pretend that near one of those points the curve is $y = f(x)$.
Applying $\frac{d}{dx}$ to both sides of $x^2 - xf(x) + f(x)^2 = 3$ gives
\[ 2x - f(x) - xf'(x) + 2f(x)f'(x) = 0 \]
etc etc.
• But let us stop “pretending”. Just make sure we remember that y is a function of x when we
differentiate:
\begin{align*}
x^2 - xy + y^2 &= 3 && \text{start with the curve, and differentiate} \\
2x - xy' - y + 2yy' &= 0
\end{align*}
Now substitute in the first point, $x = +\sqrt{3}$, $y = 0$:
\begin{align*}
2\sqrt{3} - \sqrt{3}\,y' + 0 &= 0 \\
y' &= 2
\end{align*}
And now do the second point $x = -\sqrt{3}$, $y = 0$:
\begin{align*}
-2\sqrt{3} + \sqrt{3}\,y' + 0 &= 0 \\
y' &= 2
\end{align*}
Thus the slope is the same at $x = \sqrt{3}$ and $x = -\sqrt{3}$ and the tangent lines are parallel.


Example 4.5.5
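
Solving the differentiated equation $2x - xy' - y + 2yy' = 0$ for $y'$ gives $y' = \frac{y - 2x}{2y - x}$ wherever $2y \ne x$; that rearrangement is not written out in the example, so treat it as an assumption of this sketch. Evaluating it at the two intercepts reproduces the common slope.

```python
import math

def slope(x, y):
    """y' on the curve x^2 - x*y + y^2 = 3, from 2x - x*y' - y + 2*y*y' = 0.

    Only meaningful at points of the curve where 2*y != x.
    """
    return (y - 2 * x) / (2 * y - x)

root3 = math.sqrt(3)
right = slope(root3, 0.0)
left = slope(-root3, 0.0)
print(right, left)  # both intercepts give the same slope, so the tangents are parallel
```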

Okay — let’s get away from curves and do something a little different.
Example 4.5.6
You are standing at the origin. At time zero a pitcher throws a ball at your head19 .

Figure 4.5.1. [Right-angled triangle with angle $\theta(t)$, opposite side $r$ and hypotenuse $d - vt$.]

The position of the (centre of the) ball at time t is $x(t) = d - vt$, where d is the distance from
your head to the pitcher’s mound and v is the ball’s velocity. Your eye sees the ball filling20 an angle
$2\theta(t)$ with
\[ \sin\big(\theta(t)\big) = \frac{r}{d - vt} \]
where r is the radius of the baseball. The question is “How fast is θ growing at time t?” That is,
what is $\frac{d\theta}{dt}$?

• We don’t know (yet) how to solve this equation to find θ (t ) explicitly. So we use implicit
differentiation.
• To do so we apply $\frac{d}{dt}$ to both sides of our equation. This gives
\[ \cos\big(\theta(t)\big)\cdot\theta'(t) = \frac{rv}{(d - vt)^2} \]

• Then we solve for $\theta'(t)$:

\[ \theta'(t) = \frac{rv}{(d - vt)^2\cos\big(\theta(t)\big)} \]

• As is often the case, when using implicit differentiation, this answer is not very satisfying
because it contains $\theta(t)$, for which we still do not have an explicit formula. However in this
case we can get an explicit formula for $\cos\big(\theta(t)\big)$, without having an explicit formula for
$\theta(t)$, just by looking at the right-angled triangle in Figure 4.5.1, above.

19 It seems that it is not a friendly game today.


20 This is the “visual angle” or “angular size”.

• The hypotenuse of that triangle has length $d - vt$. By Pythagoras, the length of the side of the
triangle adjacent to the angle $\theta(t)$ is $\sqrt{(d - vt)^2 - r^2}$. So
\[ \cos\big(\theta(t)\big) = \frac{\sqrt{(d - vt)^2 - r^2}}{d - vt} \]
and
\[ \theta'(t) = \frac{rv}{(d - vt)\sqrt{(d - vt)^2 - r^2}} \]

Example 4.5.6
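
For readers who already know the inverse sine, $\theta(t) = \arcsin\big(r/(d - vt)\big)$ gives an independent numerical check of the final formula. The sketch below is only an illustration: the values of `R`, `D` and `V` are made-up numbers of roughly baseball scale, not data from the text, and the helper names are arbitrary.

```python
import math

# Assumed illustrative values (not from the text): ball radius, distance, speed.
R = 0.037   # metres
D = 18.4    # metres
V = 40.0    # metres per second

def theta(t):
    """Half the visual angle, from sin(theta) = r / (d - v t)."""
    return math.asin(R / (D - V * t))

def theta_prime(t):
    # Final formula of the example: r v / ((d - v t) sqrt((d - v t)^2 - r^2))
    gap = D - V * t
    return R * V / (gap * math.sqrt(gap**2 - R**2))

def central_difference(func, t, h=1e-6):
    """Approximate func'(t) with a symmetric difference quotient."""
    return (func(t + h) - func(t - h)) / (2 * h)

t = 0.2  # a time before the ball arrives, so that d - v t > r
print(theta_prime(t), central_difference(theta, t))
```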

Okay — just one more tangent-to-the-curve example and then we’ll go on to something different.

Example 4.5.7
Let $(x_0, y_0)$ be a point on the astroid21

\[ x^{2/3} + y^{2/3} = 1. \]

Find an equation for the tangent line to the astroid at (x0 , y0 ).

• As was the case in examples above we can rewrite the equation of the astroid near $(x_0, y_0)$ in
the form $y = f(x)$, with an explicit $f(x)$, by solving the equation $x^{2/3} + y^{2/3} = 1$. But again,
it is computationally cleaner, and hence less vulnerable to mechanical errors, to use implicit
differentiation. So that’s what we’ll do.

• First up, since $(x_0, y_0)$ lies on the curve, it satisfies

\[ x_0^{2/3} + y_0^{2/3} = 1. \]

• Now, no pretending that $y = f(x)$ this time: just make sure we remember when we
differentiate that y changes with x.

\begin{align*}
x^{2/3} + y^{2/3} &= 1 && \text{start with the curve, and differentiate} \\
\frac23 x^{-1/3} + \frac23 y^{-1/3}\,y' &= 0
\end{align*}

• Note the derivative of $x^{2/3}$, namely $\frac23 x^{-1/3}$, and the derivative of $y^{2/3}$, namely $\frac23 y^{-1/3}y'$, are
defined only when $x \ne 0$ and $y \ne 0$. We are interested in the case that $x = x_0$ and $y = y_0$. So
we better assume that $x_0 \ne 0$ and $y_0 \ne 0$. Probably something weird happens when $x_0 = 0$ or
$y_0 = 0$. We’ll come back to this shortly.

21 Here is where the astroid comes from. Imagine two circles, one of radius 1/4 and one of radius 1. Paint a red
dot on the smaller circle. Then imagine the smaller circle rolling around the inside of the larger circle. The curve
traced by the red dot is our astroid. Search “astroid” (be careful about the spelling) to find animations showing this.
The astroid was first discussed by Johann Bernoulli in 1691–92. It also appears in the work of Leibniz.


• To continue on, we set $x = x_0$, $y = y_0$ in the equation above, and then solve for $y'$:
$$\frac{2}{3}x_0^{-1/3} + \frac{2}{3}y_0^{-1/3}\,y'(x_0) = 0 \implies y'(x_0) = -\left(\frac{y_0}{x_0}\right)^{1/3}$$
This is the slope of the tangent line and its equation is
$$y = y_0 + y'(x_0)(x - x_0) = y_0 - \left(\frac{y_0}{x_0}\right)^{1/3}(x - x_0)$$

Now let’s think a little bit about what the tangent line slope of $-\sqrt[3]{y_0/x_0}$ tells us about the astroid.
• First, as a preliminary observation, note that since $x_0^{2/3} \ge 0$ and $y_0^{2/3} \ge 0$ the equation $x_0^{2/3} + y_0^{2/3} = 1$ of the astroid forces $0 \le x_0^{2/3}, y_0^{2/3} \le 1$ and hence $-1 \le x_0, y_0 \le 1$.

• For all $x_0, y_0 > 0$ the slope $-\sqrt[3]{y_0/x_0} < 0$. So at all points on the astroid that are in the first quadrant, the tangent line has negative slope, i.e. is “leaning backwards”.

• As $x_0$ tends to zero, $y_0$ tends to $\pm 1$ and the tangent line slope becomes infinite in magnitude. So at points on the astroid near $(0, \pm 1)$, the tangent line is almost vertical.

• As $y_0$ tends to zero, $x_0$ tends to $\pm 1$ and the tangent line slope tends to zero. So at points on the astroid near $(\pm 1, 0)$, the tangent line is almost horizontal.
Here is a figure illustrating all this.

(Figure: the astroid $x^{2/3} + y^{2/3} = 1$ with the tangent line at a point $(x_0, y_0)$.)

Sure enough, as we speculated earlier, something weird does happen to the astroid when x0 or y0 is
zero. The astroid is pointy, and does not have a tangent there.
Example 4.5.7
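The slope formula $y'(x_0) = -(y_0/x_0)^{1/3}$ can be checked numerically on the first-quadrant branch of the astroid, where the curve can be solved explicitly as $y = (1 - x^{2/3})^{3/2}$. The following is only a sketch: the sample point `x0 = 0.3` and the helper names are assumptions made for illustration.

```python
def astroid_y(x):
    """First-quadrant branch of x^(2/3) + y^(2/3) = 1, valid for 0 < x < 1."""
    return (1.0 - x ** (2.0 / 3.0)) ** 1.5

def tangent_slope(x0, y0):
    """Slope from implicit differentiation, -(y0/x0)^(1/3), for x0, y0 > 0."""
    return -((y0 / x0) ** (1.0 / 3.0))

x0 = 0.3                      # assumed sample point
y0 = astroid_y(x0)

h = 1e-6                      # step for the central difference
numeric_slope = (astroid_y(x0 + h) - astroid_y(x0 - h)) / (2 * h)
implicit_slope = tangent_slope(x0, y0)
print(f"finite difference: {numeric_slope:.6f}   implicit formula: {implicit_slope:.6f}")
```

Both numbers are negative, in line with the observation that the tangent line “leans backwards” in the first quadrant.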

4.6 Inverse functions


One very useful application of implicit differentiation is to find the derivatives of inverse functions.
We have already used this approach to find the derivative of the inverse of the exponential function
— the logarithm — in §4.4.
In this section we will first describe what an inverse function is, and work through some examples,
and then show how to use implicit differentiation to compute the derivative of an inverse function.


In Example 4.5.6 we encountered the problem of trying to solve the equation
$$\sin\theta(t) = \frac{r}{d - vt}$$
for $\theta(t)$. We’re now going to consider, more generally, problems in which
• we have a given function, that we’ll call $f$, and

• for each number $X$

• we wish to find a number $Y$ satisfying
$$f(Y) = X. \qquad (4.6.1)$$

If we’re lucky, then for each real number $X$ there is exactly one real number $Y$, that we’ll call $f^{-1}(X)$, obeying (4.6.1). Then $f^{-1}$ is called the inverse function of $f$. A (trivial) example in which this happens is given in Example 4.6.1, below.
If we’re a little less lucky, there is a set of real numbers $D$ (that does not contain all of $\mathbb{R}$) such that
• for each real number $X$ in $D$ there is exactly one real number $Y$, that we’ll again call $f^{-1}(X)$, obeying (4.6.1) but
• for each real number $X$ that is not in $D$ there is no $Y$ obeying (4.6.1).
Then $f^{-1}$ is again called the inverse function of $f$ and $D$ is called the domain of $f^{-1}$. We have already seen an example of this, namely $f(x) = e^x$. We’ll review this example in Example 4.6.2, below.
If we’re yet a little less lucky, there is at least one real number $X$ for which there is more than one real number $Y$ obeying (4.6.1). The trigonometric functions are like this. We’ll take a first quick look at this in Example 4.6.3, below, and take a more thorough look in the next section, §4.7.
Example 4.6.1
Let $f(x) = 2x$. For this $f(x)$, equation (4.6.1) becomes
$$2Y = X$$
For each real number $X$, there is exactly one $Y$, namely $Y = \frac{X}{2}$, that obeys $2Y = X$. So, the function $f(x) = 2x$ has inverse function $f^{-1}(X) = \frac{X}{2}$.
Example 4.6.1

Example 4.6.2
Let $f(x) = e^x$. For this $f(x)$, equation (4.6.1) becomes
$$e^Y = X$$
For concreteness, let’s pick a specific value of $X$, say $X = 2$. The graph of $e^Y$, as a function of $Y$, is sketched below. In that sketch, the $x$-axis has been renamed the $Y$-axis, because we are interested in $e^Y$ as a function of $Y$. (Be careful to distinguish the upper case $Y$ from the lower case $y$.)

(Figure: the graph $y = e^Y$ against the horizontal $Y$-axis, together with the horizontal lines $y = 2$ and $y = -2$.)

The number of $Y$’s obeying $e^Y = 2$ is exactly the number of times the horizontal straight line $y = 2$ intersects the graph $y = e^Y$, which is one. So for $X = 2$, there is exactly one $Y$ obeying $e^Y = X$. On the other hand, for $X = -2$, the number of $Y$’s obeying $e^Y = -2$ is exactly the number of times the horizontal straight line $y = -2$ intersects the graph $y = e^Y$, which is zero. So for $X = -2$, no $Y$’s obey $e^Y = X$.
As $Y$ runs from $-\infty$ to $+\infty$, $e^Y$ takes each strictly positive value exactly once and never takes any value zero or smaller. So the domain of $\ln x$, the inverse function of $e^x$, is exactly the interval $(0, \infty)$.
Example 4.6.2

Example 4.6.3
Let $f(x) = \sin(x)$. For this $f(x)$, equation (4.6.1) becomes
$$\sin(Y) = X$$
For each fixed real number $X$, the number of $Y$’s that obey $\sin(Y) = X$ is exactly the number of times the horizontal straight line $y = X$ intersects the graph $y = \sin(Y)$. When $-1 \le X \le 1$, the line $y = X$ intersects the graph $y = \sin(Y)$ infinitely many times. This is illustrated in the figure below by the line $y = 0.3$. On the other hand, when $X < -1$ or $X > 1$, the line $y = X$ never intersects the graph $y = \sin(Y)$. This is illustrated in the figure below by the line $y = -1.2$. We’ll see what is normally done about this in §4.7, below.

(Figure: the graph $y = \sin(x)$ together with the horizontal lines $y = 0.3$ and $y = -1.2$.)


Example 4.6.3

It is an easy matter to construct the graph of an inverse function from the graph of the original function. We just need to remember that
$$Y = f^{-1}(X) \iff f(Y) = X$$
which is $y = f(x)$ with $x$ renamed to $Y$ and $y$ renamed to $X$.

Start by drawing the graph of $f$, labelling the $x$- and $y$-axes and labelling the curve $y = f(x)$.

(Figure: the graph $y = f(x)$.)

Now replace each $x$ by $Y$ and each $y$ by $X$ and replace the resulting label $X = f(Y)$ on the curve by the equivalent $Y = f^{-1}(X)$.

(Figure: the same curve, relabelled $Y = f^{-1}(X)$.)

Finally we just need to redraw the sketch with the $Y$ axis running vertically (with $Y$ increasing upwards) and the $X$ axis running horizontally (with $X$ increasing to the right). To do so, pretend that the sketch was on a transparency or on a very thin piece of paper that you can see through. Lift the sketch up and flip it over so that the $Y$ axis runs vertically and the $X$ axis runs horizontally. If you want, you can also convert the upper case $X$ into a lower case $x$ and the upper case $Y$ into a lower case $y$.

(Figure: the flipped sketch, labelled $Y = f^{-1}(X)$ and, in lower case, $y = f^{-1}(x)$.)

§§ Derivatives of inverse functions


It is an easy matter to use implicit differentiation to find a formula for the derivative[22] of $f^{-1}$ in terms of the derivative of $f$. Substitute $Y = f^{-1}(X)$ into $f(Y) = X$ to give
$$f\big(f^{-1}(X)\big) = X$$
Rename $X$ to $x$ and apply $\frac{d}{dx}$ to both sides.
$$\frac{d}{dx} f\big(f^{-1}(x)\big) = \frac{d}{dx} x = 1$$
By the chain rule
$$f'\big(f^{-1}(x)\big) \cdot \frac{d}{dx} f^{-1}(x) = 1 \implies \frac{d}{dx} f^{-1}(x) = \frac{1}{f'\big(f^{-1}(x)\big)} \qquad (4.6.2)$$

Example 4.6.4
The inverse function of $f(x) = e^x$ is $f^{-1}(x) = \log x$. Since $f'(x) = e^x$, (4.6.2) gives
$$\frac{d}{dx} \log x = \frac{1}{e^{\log x}} = \frac{1}{x}.$$

Example 4.6.4
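Formula (4.6.2) can also be tested on a function whose inverse has no convenient closed form. The sketch below uses the assumed example $f(y) = y^3 + y$, which is strictly increasing, computes $f^{-1}$ by bisection, and compares a finite-difference derivative of $f^{-1}$ with $1/f'\big(f^{-1}(x)\big)$. The function, the bracketing interval and the sample point are all illustrative choices, not part of the text above.

```python
import math

def f(y):
    return y ** 3 + y          # strictly increasing, so it has an inverse

def f_prime(y):
    return 3 * y ** 2 + 1

def f_inverse(x, lo=-10.0, hi=10.0):
    """Solve f(Y) = x for Y by bisection; assumes f(lo) <= x <= f(hi)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = 2.0                        # f(1) = 2, so f_inverse(2) should be close to 1
h = 1e-5
numeric = (f_inverse(x + h) - f_inverse(x - h)) / (2 * h)
formula = 1 / f_prime(f_inverse(x))          # right-hand side of (4.6.2)
print(f"finite difference: {numeric:.8f}   formula (4.6.2): {formula:.8f}")

# The same formula for f(x) = e^x reproduces d/dx log x = 1/x.
log_derivative = 1 / math.exp(math.log(3.0))
print(f"1 / e^(log 3) = {log_derivative:.8f}")
```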

4.7 Inverse trigonometric functions and their derivatives

22 There is a theorem called the Inverse Function Theorem, which we will not prove, that says that, under reasonable hypotheses on $f(x)$, $f^{-1}(x)$ is differentiable.

Learning Objectives
• Sketch f (x) = arctan x.

• Evaluate (at nice points) the inverse trigonometric functions arcsin(x), arccos(x) and
arctan(x).

• Use implicit differentiation / chain rule to find the derivatives of the inverse trigonometric functions arcsin(x), arccos(x) and arctan(x).

We are now going to consider the problem of finding the derivatives of the inverses of trigonometric functions. Most importantly, remind yourself that: given a function $f(x)$, its inverse function $f^{-1}(x)$ only exists, with domain $D$, when $f(x)$ passes the “horizontal line test”, which says that for each $Y$ in $D$ the horizontal line $y = Y$ intersects the graph $y = f(x)$ exactly once. (That is, $f(x)$ is a one-to-one function.)
Let us start by playing with the sine function and determine how to restrict the domain of sin x
so that its inverse function exists.
Example 4.7.1
Let $y = f(x) = \sin(x)$. We would like to find the inverse function which takes $y$ and returns to us a unique $x$-value so that $\sin(x) = y$.

(Figure: the graph $y = \sin(x)$ together with the horizontal lines $y = 0.3$ and $y = -1.2$.)

• For each real number $Y$, the number of $x$-values that obey $\sin(x) = Y$ is exactly the number of times the horizontal straight line $y = Y$ intersects the graph of $\sin(x)$.
• When $-1 \le Y \le 1$, the horizontal line intersects the graph infinitely many times. This is illustrated in the figure above by the line $y = 0.3$.
• On the other hand, when $Y < -1$ or $Y > 1$, the line $y = Y$ never intersects the graph of $\sin(x)$. This is illustrated in the figure above by the line $y = -1.2$.
This is exactly the horizontal line test and it shows that the sine function is not one-to-one.
Now consider the function
$$y = \sin(x) \quad\text{with domain}\quad -\frac{\pi}{2} \le x \le \frac{\pi}{2}$$


This function has the same formula but the domain has been restricted so that, as we’ll now show,
the horizontal line test is satisfied.

(Figure: the graph $y = \sin(x)$ with $x = -\frac{\pi}{2}$ and $x = \frac{\pi}{2}$ marked, together with the horizontal lines $y = 0.3$ and $y = -1.2$.)

As we saw above when $|Y| > 1$ no $x$ obeys $\sin(x) = Y$ and, for each $-1 \le Y \le 1$, the line $y = Y$ (illustrated in the figure above with $y = 0.3$) crosses the curve $y = \sin(x)$ infinitely many times, so that there are infinitely many $x$’s that obey $f(x) = \sin x = Y$. However exactly one of those crossings (the dot in the figure) has $-\pi/2 \le x \le \pi/2$.
That is, for each $-1 \le Y \le 1$, there is exactly one $x$, call it $X$, that obeys both
$$\sin X = Y \quad\text{and}\quad -\frac{\pi}{2} \le X \le \frac{\pi}{2}$$
That unique value, $X$, is typically denoted $\arcsin(Y)$. That is
$$\sin(\arcsin(Y)) = Y \quad\text{and}\quad -\frac{\pi}{2} \le \arcsin(Y) \le \frac{\pi}{2}$$
Renaming $Y \to x$, the inverse function $\arcsin(x)$ is defined for all $-1 \le x \le 1$ and is determined by the equation
$$\sin\big(\arcsin(x)\big) = x \quad\text{and}\quad -\frac{\pi}{2} \le \arcsin(x) \le \frac{\pi}{2}. \qquad (4.7.1)$$
Note that many texts will use $\sin^{-1}(x)$ to denote arcsine, however we will use $\arcsin(x)$ since we feel that it is clearer[23]; the reader should recognise both.
Example 4.7.1

Example 4.7.2
Since
$$\sin\frac{\pi}{2} = 1 \qquad \sin\frac{\pi}{6} = \frac{1}{2}$$
and $-\pi/2 \le \pi/6, \pi/2 \le \pi/2$, we have
$$\arcsin 1 = \frac{\pi}{2} \qquad \arcsin\frac{1}{2} = \frac{\pi}{6}$$

23 The main reason being that people frequently confuse $\sin^{-1}(x)$ with $(\sin(x))^{-1} = \frac{1}{\sin x}$. We feel that prepending the prefix “arc” is less likely to lead to such confusion. The notations asin(x) and Arcsin(x) are also used.

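Even though
$$\sin(2\pi) = 0$$
it is not true that $\arcsin 0 = 2\pi$, and it is not true that $\arcsin\big(\sin(2\pi)\big) = 2\pi$, because $2\pi$ is not between $-\pi/2$ and $\pi/2$. More generally
$$\begin{aligned}
\arcsin\big(\sin(x)\big) &= \text{the unique angle } \theta \text{ between } -\pi/2 \text{ and } \pi/2 \text{ obeying } \sin\theta = \sin x \\
&= x \text{ if and only if } -\pi/2 \le x \le \pi/2
\end{aligned}$$
So, for example, $\arcsin\big(\sin(11\pi/16)\big)$ cannot be $11\pi/16$ because $11\pi/16$ is bigger than $\pi/2$. So how do we find the correct answer? Start by sketching the graph of $\sin(x)$.

(Figure: the graph $y = \sin(x)$ with the horizontal line $y = \sin(11\pi/16)$; the points $x = 5\pi/16$ and $x = 11\pi/16$ lie a distance $3\pi/16$ on either side of $x = \pi/2$.)

It looks like the graph of $\sin x$ is symmetric about $x = \pi/2$. The mathematical way to say that “the graph of $\sin x$ is symmetric about $x = \pi/2$” is “$\sin(\pi/2 - \theta) = \sin(\pi/2 + \theta)$” for all $\theta$. That is indeed true[24].
Now $11\pi/16 = \pi/2 + 3\pi/16$ so
$$\sin\left(\frac{11\pi}{16}\right) = \sin\left(\frac{\pi}{2} + \frac{3\pi}{16}\right) = \sin\left(\frac{\pi}{2} - \frac{3\pi}{16}\right) = \sin\left(\frac{5\pi}{16}\right)$$
and, since $5\pi/16$ is indeed between $-\pi/2$ and $\pi/2$,
$$\arcsin\left(\sin\left(\frac{11\pi}{16}\right)\right) = \frac{5\pi}{16} \quad \left(\text{and not } \frac{11\pi}{16}\right).$$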

Example 4.7.2
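The “folding” of angles back into $[-\pi/2, \pi/2]$ is easy to see numerically. The sketch below uses Python’s `math.asin`, which returns values in that same interval, and also a hand-written reflection. The helper name `arcsin_of_sin` is an assumption introduced only for this illustration.

```python
import math

x = 11 * math.pi / 16
folded = math.asin(math.sin(x))
print(f"arcsin(sin(11*pi/16)) = {folded:.6f},  5*pi/16 = {5 * math.pi / 16:.6f}")

def arcsin_of_sin(x):
    """arcsin(sin x), computed by reflecting x into [-pi/2, pi/2] by hand."""
    x = math.remainder(x, 2 * math.pi)   # shift by multiples of 2*pi into [-pi, pi]
    if x > math.pi / 2:
        return math.pi - x               # uses sin(pi - x) = sin(x)
    if x < -math.pi / 2:
        return -math.pi - x              # uses sin(-pi - x) = sin(x)
    return x

print(f"by reflection: {arcsin_of_sin(x):.6f}")
print(f"arcsin(sin(2*pi)) by reflection: {arcsin_of_sin(2 * math.pi):.6f}")
```

The reflection about $x = \pi/2$ in the first branch is exactly the symmetry $\sin(\pi/2 - \theta) = \sin(\pi/2 + \theta)$ used in the example.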

§§ Derivatives of inverse trig functions


Now that we have explored the arcsine function we are ready to find its derivative. Let’s call
$$\arcsin(x) = \theta(x),$$

24 Indeed both are equal to cos θ .



so that the derivative we are seeking is $\frac{d\theta}{dx}$. The above equation is (after taking sine of both sides) equivalent to
$$\sin(\theta) = x$$
Now differentiate this using implicit differentiation (we just have to remember that $\theta$ varies with $x$ and use the chain rule carefully):
$$\begin{aligned}
\cos(\theta) \cdot \frac{d\theta}{dx} &= 1 \\
\frac{d\theta}{dx} &= \frac{1}{\cos(\theta)} && \text{substitute } \theta = \arcsin x \\
\frac{d}{dx} \arcsin x &= \frac{1}{\cos(\arcsin x)}
\end{aligned}$$

This doesn’t look too bad, but it’s not really very satisfying because the right hand side is expressed in terms of $\arcsin(x)$ and we do not have an explicit formula for $\arcsin(x)$.
However even without an explicit formula for $\arcsin(x)$, it is a simple matter to get an explicit formula for $\cos\big(\arcsin(x)\big)$, which is all we need. Just draw a right-angled triangle with one angle being $\arcsin(x)$. This is done in the figure below[25].

(Figure: a right-angled triangle with angle $\theta$, hypotenuse $1$, opposite side $x$ and adjacent side $\sqrt{1 - x^2}$.)

Since $\sin(\theta) = x$ (see (4.7.1)), we have made the side opposite the angle $\theta$ of length $x$ and the hypotenuse of length $1$. Then, by Pythagoras, the side adjacent to $\theta$ has length $\sqrt{1 - x^2}$ and so
$$\cos\big(\arcsin(x)\big) = \cos(\theta) = \sqrt{1 - x^2}$$
which in turn gives us the answer we need:
$$\frac{d}{dx} \arcsin(x) = \frac{1}{\sqrt{1 - x^2}}$$

The definitions for arccos, arctan and arccot are developed in the same way. Here are the graphs
that are used.

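25 The figure is drawn for the case that $0 \le \arcsin(x) \le \pi/2$. Virtually the same argument works for the case $-\pi/2 \le \arcsin(x) \le 0$.

(Figure: the graph $y = \cos(x)$ with $x = \pi$ marked, together with the horizontal lines $y = 0.3$ and $y = -1.2$.)

(Figure: the graph $y = \tan(x)$ with $x = -\frac{\pi}{2}$ and $x = \frac{\pi}{2}$ marked, together with the horizontal line $y = 0.8$.)

(Figure: the graph $y = \cot(x)$ with $x = \frac{\pi}{2}$ and $x = \pi$ marked, together with the horizontal line $y = 0.8$.)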

The definitions for the remaining two inverse trigonometric functions may also be developed in the same way[26][27]. But it’s a little easier to use
$$\csc x = \frac{1}{\sin x} \qquad \sec x = \frac{1}{\cos x}$$
26 In fact, there are two different widely used definitions of $\operatorname{arcsec} x$. Under our definition, below, $\theta = \operatorname{arcsec} x$ takes values in $0 \le \theta \le \pi$. Some people, perfectly legitimately, define $\theta = \operatorname{arcsec} x$ to take values in the union of $0 \le \theta < \frac{\pi}{2}$ and $\pi \le \theta < \frac{3\pi}{2}$. Our definition is sometimes called the “trigonometry friendly” definition. The definition itself has the advantage of simplicity. The other definition is sometimes called the “calculus friendly” definition. It eliminates some absolute values and hence simplifies some computations. Similarly, there are two different widely used definitions of $\operatorname{arccsc} x$.
27 One could also define $\operatorname{arccot}(x) = \arctan(1/x)$ with $\operatorname{arccot}(0) = \frac{\pi}{2}$. We have chosen not to do so, because the definition we have chosen is both continuous and standard.


Definition 4.7.3.

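$\arcsin x$ is defined for $|x| \le 1$. It is the unique number obeying
$$\sin\big(\arcsin(x)\big) = x \quad\text{and}\quad -\frac{\pi}{2} \le \arcsin(x) \le \frac{\pi}{2}$$
$\arccos x$ is defined for $|x| \le 1$. It is the unique number obeying
$$\cos\big(\arccos(x)\big) = x \quad\text{and}\quad 0 \le \arccos(x) \le \pi$$
$\arctan x$ is defined for all $x \in \mathbb{R}$. It is the unique number obeying
$$\tan\big(\arctan(x)\big) = x \quad\text{and}\quad -\frac{\pi}{2} < \arctan(x) < \frac{\pi}{2}$$
$\operatorname{arccsc} x = \arcsin\frac{1}{x}$ is defined for $|x| \ge 1$. It is the unique number obeying
$$\csc\big(\operatorname{arccsc}(x)\big) = x \quad\text{and}\quad -\frac{\pi}{2} \le \operatorname{arccsc}(x) \le \frac{\pi}{2}$$
Because $\csc(0)$ is undefined, $\operatorname{arccsc}(x)$ never takes the value $0$.

$\operatorname{arcsec} x = \arccos\frac{1}{x}$ is defined for $|x| \ge 1$. It is the unique number obeying
$$\sec\big(\operatorname{arcsec}(x)\big) = x \quad\text{and}\quad 0 \le \operatorname{arcsec}(x) \le \pi$$
Because $\sec(\pi/2)$ is undefined, $\operatorname{arcsec}(x)$ never takes the value $\pi/2$.

$\operatorname{arccot} x$ is defined for all $x \in \mathbb{R}$. It is the unique number obeying
$$\cot\big(\operatorname{arccot}(x)\big) = x \quad\text{and}\quad 0 < \operatorname{arccot}(x) < \pi$$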
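Python’s standard `math` module provides `asin`, `acos` and `atan` with the same ranges as in the definition, but not the other three functions. A minimal sketch of how they could be written to match the ranges above follows. The helper names are our own, and the identity $\operatorname{arccot}(x) = \frac{\pi}{2} - \arctan(x)$ is used as one convenient way to obtain values in $(0, \pi)$.

```python
import math

def arccsc(x):
    """arcsin(1/x) for |x| >= 1; values in [-pi/2, pi/2], never 0."""
    return math.asin(1 / x)

def arcsec(x):
    """arccos(1/x) for |x| >= 1; values in [0, pi], never pi/2."""
    return math.acos(1 / x)

def arccot(x):
    """The continuous choice with values in (0, pi), defined for all real x."""
    return math.pi / 2 - math.atan(x)

for x in (-3.0, -1.0, 1.0, 2.5):
    print(f"x = {x:5.1f}  arccsc = {arccsc(x): .4f}  "
          f"arcsec = {arcsec(x): .4f}  arccot = {arccot(x): .4f}")
```

Note that the alternative definition $\arctan(1/x)$ would give negative values for negative $x$, which is why it is not used here.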

Example 4.7.4
To find the derivative of arccos we can follow the same steps:

• Write $\arccos(x) = \theta(x)$ so that $\cos\theta = x$ and the desired derivative is $\frac{d\theta}{dx}$.

• Differentiate implicitly, remembering that $\theta$ is a function of $x$:
$$\begin{aligned}
-\sin\theta \cdot \frac{d\theta}{dx} &= 1 \\
\frac{d\theta}{dx} &= -\frac{1}{\sin\theta} \\
\frac{d}{dx} \arccos x &= -\frac{1}{\sin(\arccos x)}.
\end{aligned}$$

• To simplify this expression, again draw the relevant triangle

(Figure: a right-angled triangle with angle $\theta$, hypotenuse $1$, adjacent side $x$ and opposite side $\sqrt{1 - x^2}$.)

from which we see
$$\sin(\arccos x) = \sin\theta = \sqrt{1 - x^2}.$$

• Thus
$$\frac{d}{dx} \arccos x = -\frac{1}{\sqrt{1 - x^2}}.$$

Example 4.7.4

Example 4.7.5
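Very similar steps give the derivative of $\arctan x$:
• Start with $\theta = \arctan x$, so $\tan\theta = x$.
• Differentiate implicitly:
$$\begin{aligned}
\sec^2\theta \cdot \frac{d\theta}{dx} &= 1 \\
\frac{d\theta}{dx} &= \frac{1}{\sec^2\theta} = \cos^2\theta \\
\frac{d}{dx} \arctan x &= \cos^2(\arctan x).
\end{aligned}$$
• To simplify this expression, we draw the relevant triangle

(Figure: a right-angled triangle with angle $\theta$, adjacent side $1$, opposite side $x$ and hypotenuse $\sqrt{1 + x^2}$.)

from which we see
$$\cos^2(\arctan x) = \cos^2\theta = \frac{1}{1 + x^2}$$

• Thus
$$\frac{d}{dx} \arctan x = \frac{1}{1 + x^2}.$$

An almost identical computation gives the derivative of $\operatorname{arccot} x$:

• Start with $\theta = \operatorname{arccot} x$, so $\cot\theta = x$.
• Differentiate implicitly:
$$-\csc^2\theta \cdot \frac{d\theta}{dx} = 1$$
$$\frac{d}{dx} \operatorname{arccot} x = \frac{d\theta}{dx} = -\frac{1}{\csc^2\theta} = -\sin^2\theta = -\frac{1}{1 + x^2}$$
from the triangle

(Figure: a right-angled triangle with angle $\theta$, adjacent side $x$, opposite side $1$ and hypotenuse $\sqrt{1 + x^2}$.)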

Example 4.7.5

Example 4.7.6
To find the derivative of arccsc we can use its definition and the chain rule.
$$\begin{aligned}
\theta &= \operatorname{arccsc} x && \text{take cosecant of both sides} \\
\csc\theta &= x && \text{but } \csc\theta = \frac{1}{\sin\theta}, \text{ so flip both sides} \\
\sin\theta &= \frac{1}{x} && \text{now take arcsine of both sides} \\
\theta &= \arcsin\left(\frac{1}{x}\right)
\end{aligned}$$
Now just differentiate:
$$\begin{aligned}
\frac{d\theta}{dx} &= \frac{d}{dx} \arcsin\left(\frac{1}{x}\right) && \text{chain rule carefully} \\
&= \frac{1}{\sqrt{1 - x^{-2}}} \cdot \frac{-1}{x^2}
\end{aligned}$$
To simplify further we will factor $x^{-2}$ out of the square root. We need to be a little careful doing that. Take another look at examples 2.1.32 and 2.1.33 and the discussion between them before proceeding.
$$\begin{aligned}
\frac{d\theta}{dx} &= \frac{1}{\sqrt{x^{-2}(x^2 - 1)}} \cdot \frac{-1}{x^2} \\
&= \frac{1}{|x^{-1}| \cdot \sqrt{x^2 - 1}} \cdot \frac{-1}{x^2} && \text{note that } x^2 \cdot |x^{-1}| = |x| \\
&= -\frac{1}{|x|\sqrt{x^2 - 1}}
\end{aligned}$$

In the same way, we can find the derivative of the remaining inverse trig function. We just use its definition, a derivative we already know and the chain rule.
$$\frac{d}{dx} \operatorname{arcsec}(x) = \frac{d}{dx} \arccos\left(\frac{1}{x}\right) = -\frac{1}{\sqrt{1 - 1/x^2}} \cdot \left(-\frac{1}{x^2}\right) = \frac{1}{|x|\sqrt{x^2 - 1}}$$

Example 4.7.6
By way of summary, we have

Theorem 4.7.7.

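The derivatives of the inverse trigonometric functions are
$$\begin{aligned}
\frac{d}{dx} \arcsin(x) &= \frac{1}{\sqrt{1 - x^2}} & \frac{d}{dx} \operatorname{arccsc}(x) &= -\frac{1}{|x|\sqrt{x^2 - 1}} \\
\frac{d}{dx} \arccos(x) &= -\frac{1}{\sqrt{1 - x^2}} & \frac{d}{dx} \operatorname{arcsec}(x) &= \frac{1}{|x|\sqrt{x^2 - 1}} \\
\frac{d}{dx} \arctan(x) &= \frac{1}{1 + x^2} & \frac{d}{dx} \operatorname{arccot}(x) &= -\frac{1}{1 + x^2}
\end{aligned}$$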
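Each entry of the theorem can be spot-checked with a central difference. The sketch below is only a numerical sanity check at a few assumed sample points, not a proof; the functions arccsc, arcsec and arccot are written out using the definitions of this section, since Python’s `math` module does not provide them.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# name: (function, derivative claimed by the theorem, assumed sample point)
table = {
    "arcsin": (math.asin, lambda x: 1 / math.sqrt(1 - x * x), 0.4),
    "arccos": (math.acos, lambda x: -1 / math.sqrt(1 - x * x), 0.4),
    "arctan": (math.atan, lambda x: 1 / (1 + x * x), 0.4),
    "arccsc": (lambda x: math.asin(1 / x),
               lambda x: -1 / (abs(x) * math.sqrt(x * x - 1)), -2.0),
    "arcsec": (lambda x: math.acos(1 / x),
               lambda x: 1 / (abs(x) * math.sqrt(x * x - 1)), -2.0),
    "arccot": (lambda x: math.pi / 2 - math.atan(x),
               lambda x: -1 / (1 + x * x), 0.4),
}

errors = {}
for name, (func, claimed, x) in table.items():
    errors[name] = abs(central_difference(func, x) - claimed(x))
    print(f"{name}: |finite difference - formula| = {errors[name]:.2e}")
```

The negative sample point for arccsc and arcsec is deliberate: it is where the absolute value $|x|$ in the formulas matters.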

Applications of Differentiation


In Section 3.3.2 we defined the derivative at $x = a$, $f'(a)$, of an abstract function $f(x)$, to be its instantaneous rate of change at $x = a$:
$$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}$$
This abstract definition, and the whole theory that we have developed to deal with it, turns out to be extremely useful simply because “instantaneous rate of change” appears in a huge number of settings. Here are a few examples.

• If you are moving along a line and $x(t)$ is your position on the line at time $t$, then your rate of change of position, $x'(t)$, is your velocity. If, instead, $v(t)$ is your velocity at time $t$, then your rate of change of velocity, $v'(t)$, is your acceleration.

• If $P(t)$ is the size of some population (say the number of humans on the earth) at time $t$, then $P'(t)$ is the rate at which the size of that population is changing. It is called the net birth rate.

• Radiocarbon dating, a procedure used to determine the age of, for example, archaeological materials, is based on an understanding of the rate at which an unstable isotope of carbon decays.

• A capacitor is an electrical component that is used to repeatedly store and release electrical charge (say electrons) in an electronic circuit. If $Q(t)$ is the charge on a capacitor at time $t$, then $Q'(t)$ is the instantaneous rate at which charge is flowing into the capacitor. That’s called the current. The standard unit of charge is the coulomb. One coulomb is the magnitude of the charge of approximately $6.241 \times 10^{18}$ electrons. The standard unit for current is the amp. One amp represents one coulomb per second.

Chapter 5

RELATED RATES

Learning Objectives
• Implement a sequence of steps to solve related rates problems.

Consider the following problem


A spherical balloon is being inflated at a rate of $13\,\mathrm{cm}^3/\mathrm{sec}$. How fast is the radius changing when the balloon has radius $15\,\mathrm{cm}$?
There are several pieces of information in the statement:
• The balloon is spherical.
• The volume is changing at a rate of $13\,\mathrm{cm}^3/\mathrm{sec}$, so we need variables for volume (in $\mathrm{cm}^3$) and time (in sec). Good choices are $V$ and $t$.
• We are asked for the rate at which the radius is changing, so we need a variable for radius and units. A good choice is $r$, measured in cm, since volume is measured in $\mathrm{cm}^3$.
Since the balloon is a sphere we know that
$$V = \frac{4}{3}\pi r^3$$
Since both the volume and radius are changing with time, both $V$ and $r$ are implicitly functions of time; we could really write
$$V(t) = \frac{4}{3}\pi r(t)^3.$$
We are told the rate at which the volume is changing and we need to find the rate at which the radius is changing. That is, from a knowledge of $\frac{dV}{dt}$, find the related rate[1] $\frac{dr}{dt}$.

1 Related rate problems are problems in which you are given the rate of change of one quantity and are to determine
the rate of change of another, related, quantity.


In this case, we can just differentiate our equation with respect to $t$ to get
$$\frac{dV}{dt} = 4\pi r^2 \frac{dr}{dt}$$
This can then be rearranged to give
$$\frac{dr}{dt} = \frac{1}{4\pi r^2} \frac{dV}{dt}.$$
Now we were told that $\frac{dV}{dt} = 13$, so
$$\frac{dr}{dt} = \frac{13}{4\pi r^2}.$$
We were also told that the radius is 15 cm, so at that moment in time
$$\frac{dr}{dt} = \frac{13}{4\pi \times 15^2}.$$
This is a very typical example of a related rate problem. This section is really just a collection of
problems, but all will follow a similar pattern.
• The statement of the problem will tell you quantities that must be related (above it was volume,
radius and, implicitly, time).

• Typically a little geometry (or some physics or. . . ) will allow you to relate these quantities
(above it was the formula that links the volume of a sphere to its radius).

• Implicit differentiation will then allow you to link the rate of change of one quantity to another.
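The balloon computation above can be reproduced numerically. The sketch below evaluates $\frac{dr}{dt} = \frac{13}{4\pi \times 15^2}$ and compares it with a finite-difference estimate obtained by adding and removing a small amount of volume. The helper name and the size of the time step are assumptions made for illustration.

```python
import math

dV_dt = 13.0    # cm^3/sec, from the problem statement
r = 15.0        # cm, the radius at the moment of interest

# Related-rates formula obtained by differentiating V = (4/3) pi r^3.
dr_dt = dV_dt / (4 * math.pi * r ** 2)

def radius_from_volume(V):
    """Invert V = (4/3) pi r^3 for r."""
    return (3 * V / (4 * math.pi)) ** (1 / 3)

# Finite-difference check: let a short time dt pass on either side of "now".
V_now = 4 / 3 * math.pi * r ** 3
dt = 1e-3
numeric = (radius_from_volume(V_now + dV_dt * dt)
           - radius_from_volume(V_now - dV_dt * dt)) / (2 * dt)

print(f"formula: {dr_dt:.6f} cm/sec   finite difference: {numeric:.6f} cm/sec")
```

The radius grows slowly, a small fraction of a centimetre per second, because at radius 15 cm each extra cubic centimetre is spread over a large surface.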
Another balloon example
Example 5.0.1
Consider a helium balloon rising vertically from a fixed point 200m away from you. You are trying
to work out how fast it is rising. Now — computing the velocity directly is difficult, but you can
measure angles. You observe that when it is at an angle of π/4 its angle is changing by 0.05 radians
per second.

• Start by drawing a picture with the relevant variables

• So denote the angle to be $\theta$ (in radians), the height of the balloon (in m) by $h$ and time (in seconds) by $t$. Then trigonometry tells us
$$h = 200 \cdot \tan\theta$$

• Differentiating allows us to relate the rates of change
$$\frac{dh}{dt} = 200 \sec^2\theta \cdot \frac{d\theta}{dt}$$

• We are told that when $\theta = \pi/4$ we observe $\frac{d\theta}{dt} = 0.05$, so
$$\begin{aligned}
\frac{dh}{dt} &= 200 \cdot \sec^2(\pi/4) \cdot 0.05 \\
&= 200 \cdot 0.05 \cdot \left(\sqrt{2}\right)^2 \\
&= 200 \cdot \frac{5}{100} \cdot 2 = 20\ \mathrm{m/s}
\end{aligned}$$
• So the balloon is rising at a rate of 20 m/s.

Example 5.0.1
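A short numerical sketch of the same computation follows. It evaluates $200 \sec^2\theta \cdot \frac{d\theta}{dt}$ at $\theta = \pi/4$ and compares it with a finite difference of $h = 200\tan\theta$, under the assumption that the angle changes at the constant rate 0.05 rad/s over the tiny time step used.

```python
import math

distance = 200.0        # m, horizontal distance to the launch point
theta = math.pi / 4     # rad, observed angle
dtheta_dt = 0.05        # rad/s, observed rate of change of the angle

# dh/dt = 200 * sec^2(theta) * dtheta/dt, written with 1/cos^2.
dh_dt = distance * dtheta_dt / math.cos(theta) ** 2

def height(angle):
    return distance * math.tan(angle)

dt = 1e-6               # s, small time step (assumed)
numeric = (height(theta + dtheta_dt * dt)
           - height(theta - dtheta_dt * dt)) / (2 * dt)

print(f"formula: {dh_dt:.6f} m/s   finite difference: {numeric:.6f} m/s")
```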

The following problem is perhaps the classic related rate problem.


Example 5.0.2
A 5m ladder is leaning against a wall. The floor is quite slippery and the base of the ladder slides
out from the wall at a rate of 1m/s. How fast is the top of the ladder sliding down the wall when the
base of the ladder is 3m from the wall?
• A good first step is to draw a picture stating all relevant quantities. This will also help us
define variables and units.

• So now define x(t ) to be the distance between the bottom of the ladder and the wall, at time t,
and let y(t ) be the distance between the top of the ladder and the ground at time t. Measure
time in seconds, but both distances in meters.

• We can relate the quantities using Pythagoras:
$$x^2 + y^2 = 5^2$$

• Differentiating with respect to time then gives
$$2x \frac{dx}{dt} + 2y \frac{dy}{dt} = 0$$
• We know that $\frac{dx}{dt} = 1$ and $x = 3$, so
$$6 \cdot 1 + 2y \frac{dy}{dt} = 0$$
but we need to determine $y$ before we can go further. Thankfully we know that $x^2 + y^2 = 25$ and $x = 3$, so $y^2 = 25 - 9 = 16$ and[2] so $y = 4$.
• So finally putting everything together
$$6 \cdot 1 + 8 \frac{dy}{dt} = 0 \implies \frac{dy}{dt} = -\frac{3}{4}\ \mathrm{m/s}.$$
Thus the top of the ladder is sliding towards the floor at a rate of $\frac{3}{4}$ m/s.

Example 5.0.2
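The ladder computation can be sketched in a few lines. The code solves $2x\,x' + 2y\,y' = 0$ for $y'$ and compares the result with a finite difference of $y = \sqrt{25 - x^2}$. The helper name `top_height` and the time step are assumptions introduced for illustration.

```python
import math

ladder = 5.0            # m, length of the ladder
x, dx_dt = 3.0, 1.0     # m and m/s, position and speed of the base

y = math.sqrt(ladder ** 2 - x ** 2)     # height of the top, from Pythagoras
dy_dt = -x * dx_dt / y                  # from 2x x' + 2y y' = 0

def top_height(base_distance):
    return math.sqrt(ladder ** 2 - base_distance ** 2)

dt = 1e-6               # s, small time step (assumed)
numeric = (top_height(x + dx_dt * dt) - top_height(x - dx_dt * dt)) / (2 * dt)

print(f"y = {y} m, dy/dt = {dy_dt} m/s, finite difference = {numeric:.6f} m/s")
```

The negative sign records that $y$ is decreasing, that is, the top of the ladder is moving down the wall.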

The next example is complicated by the rates of change being stated not just as “the rate of change per unit time” but instead being stated as “the percentage rate of change per unit time”. If a quantity $f$ is changing with rate $\frac{df}{dt}$, then we can say that
$$f \text{ is changing at a rate of } 100 \cdot \frac{\frac{df}{dt}}{f} \text{ percent.}$$
Thus if, at time $t$, $f$ has rate of change $r\%$, then
$$100 \frac{f'(t)}{f(t)} = r \implies f'(t) = \frac{r}{100} f(t)$$
so that if $h$ is a very small time increment
$$\frac{f(t + h) - f(t)}{h} \approx \frac{r}{100} f(t) \implies f(t + h) \approx f(t) + \frac{rh}{100} f(t)$$
That is, over a very small time interval $h$, $f$ increases by the fraction $\frac{rh}{100}$ of its value at time $t$.
So armed with this, let’s look at the problem.
Example 5.0.3
The quantities P, Q and R are functions of time and are related by the equation R = PQ. Assume

2 Since the ladder isn’t buried in the ground, we can discard the solution $y = -4$.

that $P$ is increasing instantaneously at the rate of 8% per year (meaning that $100\frac{P'}{P} = 8$) and that $Q$ is decreasing instantaneously at the rate of 2% per year (meaning that $100\frac{Q'}{Q} = -2$). Determine the percentage rate of change for $R$.
Solution. This one is a little different — we are given the variables and the formula, so no picture
drawing or defining required. Though we do need to define a time variable — let t denote time in
years.

• Since $R(t) = P(t) \cdot Q(t)$ we can differentiate with respect to $t$ to get
$$\frac{dR}{dt} = PQ' + QP'$$

• But we need the percentage change in $R$, namely
$$100 \frac{R'}{R} = 100 \frac{PQ' + QP'}{R}$$
but $R = PQ$, so rewrite it as
$$\begin{aligned}
100 \frac{R'}{R} &= 100 \frac{PQ' + QP'}{PQ} \\
&= 100 \frac{PQ'}{PQ} + 100 \frac{QP'}{PQ} \\
&= 100 \frac{Q'}{Q} + 100 \frac{P'}{P}
\end{aligned}$$
so we have stated the instantaneous percentage rate of change in $R$ as the sum of the percentage rates of change in $P$ and $Q$.

• We know the percentage rate of change of $P$ and $Q$, so
$$100 \frac{R'}{R} = -2 + 8 = 6$$
That is, the instantaneous percentage rate of change of $R$ is 6% per year.

Example 5.0.3
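To see the rule “percentage rates add for a product” in action, here is a small numerical sketch. It assumes, purely for illustration, that $P$ and $Q$ are exponentials with constant percentage rates of 8% and −2% per year; the example itself only specifies the instantaneous rates, and the starting values 50 and 7 are arbitrary.

```python
import math

# Assumed models with constant percentage rates (not given in the example).
def P(t):
    return 50.0 * math.exp(0.08 * t)     # 100 * P'/P = 8

def Q(t):
    return 7.0 * math.exp(-0.02 * t)     # 100 * Q'/Q = -2

def R(t):
    return P(t) * Q(t)

def percent_rate(f, t, h=1e-6):
    """Approximate 100 * f'(t) / f(t) using a central difference."""
    return 100 * (f(t + h) - f(t - h)) / (2 * h) / f(t)

t = 3.0
rate_P, rate_Q, rate_R = percent_rate(P, t), percent_rate(Q, t), percent_rate(R, t)
print(f"P: {rate_P:.4f}%   Q: {rate_Q:.4f}%   R: {rate_R:.4f}%")
```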

Yet another falling object example.


Example 5.0.4
A ball is dropped from a height of 49 m above level ground. The height of the ball at time $t$ is $h(t) = 49 - 4.9t^2$ m. A light, which is also 49 m above the ground, is 10 m to the left of the ball’s original position. As the ball descends, the shadow of the ball caused by the light moves across the ground. How fast is the shadow moving one second after the ball is dropped?
Solution. There is quite a bit going on in this example, so read carefully.

• First a diagram; the one below is perhaps a bit over the top.

• Let’s call $s(t)$ the distance from the shadow to the point on the ground directly underneath the ball.
• By similar triangles we see that
$$\frac{4.9t^2}{10} = \frac{49 - 4.9t^2}{s(t)}$$
We can then solve for $s(t)$ by just multiplying both sides by $\frac{10}{4.9t^2} s(t)$. This gives
$$s(t) = 10\,\frac{49 - 4.9t^2}{4.9t^2} = \frac{100}{t^2} - 10$$
• Differentiating with respect to $t$ will then give us the rates,
$$s'(t) = -2\,\frac{100}{t^3}$$

• So, at $t = 1$, $s'(1) = -200$ m/sec. That is, the shadow is moving to the left at 200 m/sec.

Example 5.0.4
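The shadow’s position and speed can be checked with a few lines of code. This is a sketch in which `shadow_distance` encodes the similar-triangles relation and `shadow_speed` the derivative found above; both names are assumptions introduced for illustration.

```python
def shadow_distance(t):
    """s(t) from similar triangles: (4.9 t^2) / 10 = (49 - 4.9 t^2) / s."""
    fallen = 4.9 * t ** 2                 # distance the ball has dropped
    return 10 * (49 - fallen) / fallen

def shadow_speed(t):
    """s'(t) = -200 / t^3, the derivative of 100 / t^2 - 10."""
    return -200 / t ** 3

t = 1.0
h = 1e-6
numeric = (shadow_distance(t + h) - shadow_distance(t - h)) / (2 * h)
print(f"s(1) = {shadow_distance(t):.4f} m")
print(f"s'(1): formula {shadow_speed(t):.4f} m/sec, finite difference {numeric:.4f} m/sec")
```

Note that $s(t)$ blows up as $t \to 0^+$: just after release the ball is level with the light, so the shadow is very far away and moving very fast.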

A more nautical example.


Example 5.0.5
Two boats spot each other in the ocean at midday — Boat A is 15km west of Boat B. Boat A is

travelling east at 3 km/h and boat B is travelling north at 4 km/h. At 3pm how fast is the distance between the boats changing?
• First we draw a picture.

• Let $x(t)$ be the distance at time $t$, in km, from boat A to the original position of boat B (i.e. to the position of boat B at noon). And let $y(t)$ be the distance at time $t$, in km, of boat B from its original position. And let $z(t)$ be the distance between the two boats at time $t$.
• Additionally we are told that $x' = -3$ and $y' = 4$. Notice that $x' < 0$ since that distance is getting smaller with time, while $y' > 0$ since that distance is increasing with time.
• Further at 3pm boat A has travelled 9 km towards the original position of boat B, so $x = 15 - 9 = 6$, while boat B has travelled 12 km away from its original position, so $y = 12$.
• The distances $x$, $y$ and $z$ form a right-angled triangle, and Pythagoras tells us that
$$z^2 = x^2 + y^2.$$
At 3pm we know $x = 6$, $y = 12$ so
$$z^2 = 36 + 144 = 180$$
$$z = \sqrt{180} = 6\sqrt{5}.$$

• Differentiating then gives
$$\begin{aligned}
2z \frac{dz}{dt} &= 2x \frac{dx}{dt} + 2y \frac{dy}{dt} \\
&= 12 \cdot (-3) + 24 \cdot (4) \\
&= 60.
\end{aligned}$$
Dividing through by $2z = 12\sqrt{5}$ then gives
$$\frac{dz}{dt} = \frac{60}{12\sqrt{5}} = \frac{5}{\sqrt{5}} = \sqrt{5}$$
So the distance between the boats is increasing at $\sqrt{5}$ km/h.


Example 5.0.5
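A quick numerical sketch confirms the value $\sqrt{5}$ km/h. It models the two boats with $x(t) = 15 - 3t$ and $y(t) = 4t$, where $t$ is measured in hours after noon, and compares the related-rates answer with a finite difference of the distance. The function name `distance_between` is an assumption introduced for illustration.

```python
import math

def distance_between(t):
    """Distance z(t) in km between the boats, t hours after noon."""
    x = 15 - 3 * t      # boat A's distance from boat B's noon position
    y = 4 * t           # boat B's distance from its noon position
    return math.hypot(x, y)

t = 3.0                 # 3pm
x, y = 15 - 3 * t, 4 * t
z = distance_between(t)

# From 2 z z' = 2 x x' + 2 y y' with x' = -3 and y' = 4.
dz_dt = (x * (-3) + y * 4) / z

h = 1e-6
numeric = (distance_between(t + h) - distance_between(t - h)) / (2 * h)
print(f"dz/dt = {dz_dt:.6f} km/h, finite difference = {numeric:.6f} km/h, "
      f"sqrt(5) = {math.sqrt(5):.6f}")
```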

One last one before we move on to another topic.


Example 5.0.6

Consider a cylindrical fuel tank of radius $r$ and length $L$ (in some appropriate units) that is lying on its side. Suppose that fuel is being pumped into the tank at a rate $q$. At what rate is the fuel level rising?

(Figure: a cylinder of radius $r$ and length $L$ lying on its side.)
Solution. If the tank were vertical everything would be much easier. Unfortunately the tank is on
its side, so we are going to have to work a bit harder to establish the relation between the depth and
volume. Also notice that we have not been supplied with units for this problem — so we do not
need to state the units of our variables.

• Again, draw a picture. Here is an end view of the tank; the shaded part of the circle is filled with fuel.

(Figure: end view of the tank, a circle of radius $r$ whose shaded lower part is filled with fuel; $\theta$ is the angle at the centre between the downward vertical and the radius to the edge of the fuel surface.)

• Let us denote by $V(t)$ the volume of fuel in the tank at time $t$ and by $h(t)$ the fuel level at time $t$.

• We have been told that $V'(t) = q$ and have been asked to determine $h'(t)$. While it is possible to do so by finding a formula relating $V(t)$ and $h(t)$, it turns out to be quite a bit easier to first find a formula relating $V$ and the angle $\theta$ shown in the end view. We can then translate this back into a formula in terms of $h$ using the relation
$$h(t) = r - r\cos\theta(t).$$
Once we know $\theta'(t)$, we can easily obtain $h'(t)$ by differentiating the above equation.

• The computation that follows below gets a little involved in places, so we will drop the “$(t)$” on the variables $V$, $h$ and $\theta$. The reader must never forget that these three quantities are really functions of time, while $r$ and $L$ are constants that do not depend on time.

• The volume of fuel is $L$ times the cross-sectional area filled by the fuel. That is,
$$V = L \times \text{Area}$$

While we do not have a canned formula for the area of a chord of a circle like this, it is easy to express the area of the chord in terms of two areas that we can compute.
$$V = L \times \text{Area} = L \times \Big[\text{Area(piece of pie of angle } 2\theta) - \text{Area(triangle of angle } 2\theta)\Big]$$
– The piece of pie is the fraction $\frac{2\theta}{2\pi}$ of the full circle, so its area is $\frac{2\theta}{2\pi}\pi r^2 = \theta r^2$.

– The triangle has height $r\cos\theta$ and base $2r\sin\theta$ and hence has area $\frac{1}{2}(r\cos\theta)(2r\sin\theta) = r^2 \sin\theta\cos\theta = \frac{r^2}{2}\sin(2\theta)$, where we have used a double-angle formula.
Subbing these two areas into the above expression for $V$ gives
$$V = L \times \left[\theta r^2 - \frac{r^2}{2}\sin 2\theta\right] = \frac{Lr^2}{2}\Big[2\theta - \sin 2\theta\Big]$$
Oof!
• Now we can differentiate to find the rate of change. Recalling that $V = V(t)$ and $\theta = \theta(t)$, while $r$ and $L$ are constants,
$$\begin{aligned}
V' &= \frac{Lr^2}{2}\big[2\theta' - 2\cos 2\theta \cdot \theta'\big] \\
&= Lr^2 \cdot \theta' \cdot \big[1 - \cos 2\theta\big]
\end{aligned}$$
Solving this for $\theta'$ and using $V' = q$ gives
$$\theta' = \frac{q}{Lr^2(1 - \cos 2\theta)}$$
This is the rate at which $\theta$ is changing, but we need the rate at which $h$ is changing. We get this from
$$h = r - r\cos\theta \qquad \text{differentiating this gives}$$
$$h' = r\sin\theta \cdot \theta'$$
Substituting our expression for $\theta'$ into the expression for $h'$ gives
$$h' = r\sin\theta \cdot \frac{q}{Lr^2(1 - \cos 2\theta)}$$
• We can clean this up a bit more. Recall more double-angle formulas:
$$\begin{aligned}
h' &= r\sin\theta \cdot \frac{q}{Lr^2(1 - \cos 2\theta)} && \text{substitute } \cos 2\theta = 1 - 2\sin^2\theta \\
&= r\sin\theta \cdot \frac{q}{Lr^2 \cdot 2\sin^2\theta} && \text{now cancel } r\text{’s and a } \sin\theta \\
&= \frac{q}{2Lr\sin\theta}
\end{aligned}$$


• But we can clean this up even more: instead of writing this rate in terms of $\theta$ it is more
natural to write it in terms of $h$ (since the initial problem is stated in terms of $h$). From the
right triangle with hypotenuse $r$, angle $\theta$ at the centre of the circle and adjacent side $r - h$,
Pythagoras gives

\[ \sin\theta = \frac{\sqrt{r^2 - (r - h)^2}}{r} = \frac{\sqrt{2rh - h^2}}{r} \]
and hence
\[ h' = \frac{q}{2L\sqrt{2rh - h^2}}. \]

• As a check, notice that $h'$ becomes undefined when $h < 0$ and also when $h > 2r$, because then
the argument of the square root in the denominator is negative. Both make sense: the fuel
level in the tank must obey $0 \le h \le 2r$.

Example 5.0.6
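
As a numerical sanity check of Example 5.0.6, the following minimal sketch compares the closed form for $h'$ with a finite-difference estimate of $q / (dV/dh)$. The function names and the sample values of r, L, q and h are illustrative assumptions, not part of the example.

```python
import math

def volume(h, r, L):
    """Fuel volume V = (L r^2 / 2)(2*theta - sin(2*theta)), where h = r - r cos(theta)."""
    theta = math.acos((r - h) / r)
    return 0.5 * L * r**2 * (2 * theta - math.sin(2 * theta))

def level_rate(h, r, L, q):
    """Closed form h' = q / (2 L sqrt(2rh - h^2)), valid for 0 < h < 2r."""
    return q / (2 * L * math.sqrt(2 * r * h - h * h))

# Illustrative values (assumed): radius, length, fill rate, current fuel level.
r, L, q, h = 1.0, 3.0, 0.2, 0.4

# Chain rule: V'(t) = (dV/dh) * h'(t), so h' = q / (dV/dh).
eps = 1e-6
dV_dh = (volume(h + eps, r, L) - volume(h - eps, r, L)) / (2 * eps)
numeric_rate = q / dV_dh

print(level_rate(h, r, L, q), numeric_rate)
```

The two printed numbers should agree to many digits, and `volume(2 * r, r, L)` should reproduce the volume $\pi r^2 L$ of the full cylinder.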

Chapter 6

L’Hôpital’s Rule and Indeterminate Forms

Learning Objectives
• Recognize the two types of indeterminate forms where L’Hôpital’s rule is directly
applicable.

• Use L’Hôpital’s rule to evaluate limits; compare/contrast with asymptotics.

Let us return to limits (Chapter 2) and see how we can use derivatives to simplify certain families of
limits called indeterminate forms. We know, from Theorem 2.1.14 on the arithmetic of limits, that if

\[ \lim_{x\to a} f(x) = F \qquad \lim_{x\to a} g(x) = G \]

and $G \ne 0$, then

\[ \lim_{x\to a} \frac{f(x)}{g(x)} = \frac{F}{G} \]

The requirement that $G \ne 0$ is critical; we explored this in Example 2.1.18. Please reread that
example.
Of course$^1$ it is not surprising that if $F \ne 0$ and $G = 0$, then

\[ \lim_{x\to a} \frac{f(x)}{g(x)} = \text{DNE} \]

and if $F = 0$ but $G \ne 0$ then

\[ \lim_{x\to a} \frac{f(x)}{g(x)} = 0 \]

1 Now it is not so surprising, but perhaps back when we started limits, this was not so obvious.


However when both $F = 0$ and $G = 0$ then, as we saw in Example 2.1.18, almost anything can happen:

\begin{align*}
f(x) &= x     & g(x) &= x^2  & \lim_{x\to 0} \frac{x}{x^2} &= \lim_{x\to 0} \frac{1}{x} = \text{DNE} \\
f(x) &= x^2   & g(x) &= x    & \lim_{x\to 0} \frac{x^2}{x} &= \lim_{x\to 0} x = 0 \\
f(x) &= x     & g(x) &= x    & \lim_{x\to 0} \frac{x}{x} &= \lim_{x\to 0} 1 = 1 \\
f(x) &= 7x^2  & g(x) &= 3x^2 & \lim_{x\to 0} \frac{7x^2}{3x^2} &= \lim_{x\to 0} \frac{7}{3} = \frac{7}{3}
\end{align*}

Indeed after exploring Examples 2.1.23 and 2.1.25 we gave ourselves the rule of thumb that if we
found 0/0, then there must be something that cancels.
Because the limit that results from these 0/0 situations is not immediately obvious, but also
leads to some interesting mathematics, we should give it a name.

Definition 6.0.1 (First indeterminate forms).

Let $a \in \mathbb{R}$ and let $f(x)$ and $g(x)$ be functions. If

\[ \lim_{x\to a} f(x) = 0 \quad\text{and}\quad \lim_{x\to a} g(x) = 0 \]

then the limit

\[ \lim_{x\to a} \frac{f(x)}{g(x)} \]

is called a 0/0 indeterminate form.

There are quite a number of mathematical tools for evaluating such indeterminate forms —
Taylor series for example. A simpler method, which works in quite a few cases, is L’Hôpital’s rule2 .

2 Named for the 17th century mathematician, Guillaume de l’Hôpital, who published the first textbook on differential
calculus. The eponymous rule appears in that text, but is believed to have been developed by Johann Bernoulli.
The book was the source of some controversy since it contained many results by Bernoulli, which l’Hôpital
acknowledged in the preface, but Bernoulli felt that l’Hôpital got undue credit.
Note that around that time l’Hôpital’s name was commonly spelled l’Hospital, but the spelling of silent s in French
was changed subsequently; many texts spell his name l’Hospital. If you find yourself in Paris, you can hunt along
Boulevard de l’Hôpital for older street signs carved into the sides of buildings which spell it “l’Hospital” — though
arguably there are better things to do there.


Theorem 6.0.2 (L’Hôpital’s Rule).

Let $a \in \mathbb{R}$ and assume that

\[ \lim_{x\to a} f(x) = \lim_{x\to a} g(x) = 0 \]

Then

(a) if $f'(a)$ and $g'(a)$ exist and $g'(a) \ne 0$, then

\[ \lim_{x\to a} \frac{f(x)}{g(x)} = \frac{f'(a)}{g'(a)}, \]

(b) while, if $f'(x)$ and $g'(x)$ exist, with $g'(x)$ nonzero, on an open interval that contains $a$,
except possibly at $a$ itself, and if the limit

\[ \lim_{x\to a} \frac{f'(x)}{g'(x)} \quad\text{exists or is } +\infty \text{ or is } -\infty \]

then
\[ \lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)} \]

Proof. We only give the proof for part (a). The proof of part (b) is not very difficult, but uses the
Generalised Mean–Value Theorem (Theorem 9.7.1), which is optional and most readers have not
seen it.

• First note that we must have $f(a) = g(a) = 0$. To see this note that since the derivative $f'(a)$
exists, we know that the limit

\[ \lim_{x\to a} \frac{f(x) - f(a)}{x - a} \quad\text{exists} \]

Since we know that the denominator goes to zero, we must also have that the numerator goes
to zero (otherwise the limit would be undefined). Hence we must have

\[ \lim_{x\to a} \big( f(x) - f(a) \big) = \Big( \lim_{x\to a} f(x) \Big) - f(a) = 0 \]

We are told that $\lim_{x\to a} f(x) = 0$ so we must have $f(a) = 0$. Similarly we know that $g(a) = 0$.


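• Now consider the indeterminate form

\begin{align*}
\lim_{x\to a} \frac{f(x)}{g(x)}
  &= \lim_{x\to a} \frac{f(x) - 0}{g(x) - 0} && \text{use } 0 = f(a) = g(a) \\
  &= \lim_{x\to a} \frac{f(x) - f(a)}{g(x) - g(a)} && \text{multiply by } 1 = \frac{(x-a)^{-1}}{(x-a)^{-1}} \\
  &= \lim_{x\to a} \frac{f(x) - f(a)}{g(x) - g(a)} \cdot \frac{(x-a)^{-1}}{(x-a)^{-1}} && \text{rearrange} \\
  &= \lim_{x\to a} \left[ \frac{\dfrac{f(x) - f(a)}{x - a}}{\dfrac{g(x) - g(a)}{x - a}} \right] && \text{use arithmetic of limits} \\
  &= \frac{\displaystyle\lim_{x\to a} \frac{f(x) - f(a)}{x - a}}{\displaystyle\lim_{x\to a} \frac{g(x) - g(a)}{x - a}} = \frac{f'(a)}{g'(a)}
\end{align*}
We can justify this step and apply Theorem 2.1.14, since the limits in the numerator and
denominator exist, because they are just $f'(a)$ and $g'(a)$, and $g'(a) \ne 0$ by assumption.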

6.1 Standard examples


Here are some simple examples using L’Hôpital’s rule.
Example 6.1.1
Consider the limit
\[ \lim_{x\to 0} \frac{\sin x}{x} \]

• Notice that

\[ \lim_{x\to 0} \sin x = 0 \qquad \lim_{x\to 0} x = 0 \]

so this is a 0/0 indeterminate form, and suggests we try l’Hôpital’s rule.

• To apply the rule we must first check the derivatives at $x = 0$.

\begin{align*}
f(x) &= \sin x & f'(x) &= \cos x & \text{and } f'(0) &= 1 \\
g(x) &= x      & g'(x) &= 1      & \text{and } g'(0) &= 1
\end{align*}

• So by l’Hôpital’s rule
\[ \lim_{x\to 0} \frac{\sin x}{x} = \frac{f'(0)}{g'(0)} = \frac{1}{1} = 1. \]


Example 6.1.1

Example 6.1.2
Consider the limit
\[ \lim_{x\to 0} \frac{\sin(x)}{\sin(2x)} \]

• First check

\[ \lim_{x\to 0} \sin 2x = 0 \qquad \lim_{x\to 0} \sin x = 0 \]

so we again have a 0/0 indeterminate form.

• Set $f(x) = \sin x$ and $g(x) = \sin 2x$, then

\begin{align*}
f'(x) &= \cos x     & f'(0) &= 1 \\
g'(x) &= 2\cos 2x   & g'(0) &= 2
\end{align*}

• And by l’Hôpital’s rule

\[ \lim_{x\to 0} \frac{\sin x}{\sin 2x} = \frac{f'(0)}{g'(0)} = \frac{1}{2}. \]

Example 6.1.2
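
The limits in Examples 6.1.1 and 6.1.2 can also be checked numerically by evaluating each ratio at points approaching 0. The following is a minimal sketch; the helper name `ratio_table` and the choice of sample points are assumptions made for illustration, and a table of values only suggests a limit rather than proving it.

```python
import math

def ratio_table(f, g, xs):
    """Return the values f(x)/g(x) for each sample point x."""
    return [f(x) / g(x) for x in xs]

# Sample points 0.1, 0.01, ..., 1e-5 approaching 0 from the right.
xs = [10.0 ** (-k) for k in range(1, 6)]

sin_over_x = ratio_table(math.sin, lambda x: x, xs)
sin_over_sin2x = ratio_table(math.sin, lambda x: math.sin(2 * x), xs)

for x, a, b in zip(xs, sin_over_x, sin_over_sin2x):
    print(f"x = {x:g}: sin(x)/x = {a:.10f}, sin(x)/sin(2x) = {b:.10f}")
```

The first column of ratios settles near 1 and the second near 1/2, matching the values $f'(0)/g'(0)$ found above.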

Example 6.1.3
Let $q > 1$ and compute the limit
\[ \lim_{x\to 0} \frac{q^x - 1}{x} \]
This limit arose in our discussion of exponential functions in Section 3.5.

• First check

\[ \lim_{x\to 0} (q^x - 1) = 1 - 1 = 0 \qquad \lim_{x\to 0} x = 0 \]

so we have a 0/0 indeterminate form.

• Set $f(x) = q^x - 1$ and $g(x) = x$, then (maybe after a quick review of Section 3.5)
\begin{align*}
f'(x) &= \frac{d}{dx}(q^x - 1) = q^x \cdot \log q & f'(0) &= \log q \\
g'(x) &= 1 & g'(0) &= 1
\end{align*}

• And by l’Hôpital’s rule$^3$

\[ \lim_{x\to 0} \frac{q^x - 1}{x} = \log q. \]

Example 6.1.3

In this example, we shall apply L’Hôpital’s rule twice before getting the answer.
Example 6.1.4
Compute the limit

\[ \lim_{x\to 0} \frac{\sin(x^2)}{1 - \cos x} \]

• Again we should check

\[ \lim_{x\to 0} \sin(x^2) = \sin 0 = 0 \qquad \lim_{x\to 0} (1 - \cos x) = 1 - \cos 0 = 0 \]

and we have a 0/0 indeterminate form.

• Let $f(x) = \sin(x^2)$ and $g(x) = 1 - \cos x$ then

\begin{align*}
f'(x) &= 2x\cos(x^2) & f'(0) &= 0 \\
g'(x) &= \sin x      & g'(0) &= 0
\end{align*}

So if we try to apply l’Hôpital’s rule naively we will get

\[ \lim_{x\to 0} \frac{\sin(x^2)}{1 - \cos x} = \frac{f'(0)}{g'(0)} = \frac{0}{0}, \]

which is another 0/0 indeterminate form.

• It appears that we are stuck until we remember that l’Hôpital’s rule (as stated in Theorem 6.0.2)
has a part (b) — now is a good time to reread it.

3 While it might not be immediately obvious, this example relies on circular reasoning. In order to apply l’Hôpital’s
rule, we need to compute the derivative of $q^x$. However in order to compute that derivative (see Section 3.5) we needed
to evaluate this limit.
A more obvious example of this sort of circular reasoning can be seen if we use l’Hôpital’s rule to compute the
derivative of $f(x) = x^n$ at $x = a$ using the limit

\[ f'(a) = \lim_{x\to a} \frac{x^n - a^n}{x - a} = \lim_{x\to a} \frac{nx^{n-1} - 0}{1 - 0} = na^{n-1}. \]

We have used the result $\frac{d}{dx} x^n = nx^{n-1}$ to prove itself!


• It says that

\[ \lim_{x\to 0} \frac{f(x)}{g(x)} = \lim_{x\to 0} \frac{f'(x)}{g'(x)} \]

provided this second limit exists. In our case this requires us to compute

\[ \lim_{x\to 0} \frac{2x\cos(x^2)}{\sin(x)} \]

which we can do using l’Hôpital’s rule again. Now

\begin{align*}
h(x) &= 2x\cos(x^2) & h'(x) &= 2\cos(x^2) - 4x^2\sin(x^2) & h'(0) &= 2 \\
\ell(x) &= \sin(x)  & \ell'(x) &= \cos(x) & \ell'(0) &= 1
\end{align*}

By l’Hôpital’s rule

\[ \lim_{x\to 0} \frac{2x\cos(x^2)}{\sin(x)} = \frac{h'(0)}{\ell'(0)} = 2 \]

• Thus our original limit is

\[ \lim_{x\to 0} \frac{\sin(x^2)}{1 - \cos x} = \lim_{x\to 0} \frac{2x\cos(x^2)}{\sin(x)} = 2. \]

• We can succinctly summarise the two applications of L’Hôpital’s rule in this example by

\[
\lim_{x\to 0} \underbrace{\frac{\sin(x^2)}{1 - \cos x}}_{\substack{\text{num}\to 0 \\ \text{den}\to 0}}
= \lim_{x\to 0} \underbrace{\frac{2x\cos(x^2)}{\sin x}}_{\substack{\text{num}\to 0 \\ \text{den}\to 0}}
= \lim_{x\to 0} \underbrace{\frac{2\cos(x^2) - 4x^2\sin(x^2)}{\cos x}}_{\substack{\text{num}\to 2 \\ \text{den}\to 1}}
= 2
\]

Here “num” and “den” are used as abbreviations of “numerator” and “denominator” respectively.

Example 6.1.4
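
To see the chain of equalities in Example 6.1.4 numerically, the sketch below evaluates the original quotient and the quotients obtained after one and two applications of l’Hôpital’s rule at a small value of $x$. The function names and the sample point $x = 10^{-3}$ are illustrative assumptions.

```python
import math

def original(x):
    """sin(x^2) / (1 - cos x): a 0/0 form at x = 0."""
    return math.sin(x * x) / (1 - math.cos(x))

def after_one_step(x):
    """Ratio of first derivatives: still a 0/0 form at x = 0."""
    return 2 * x * math.cos(x * x) / math.sin(x)

def after_two_steps(x):
    """Ratio of second derivatives: defined at x = 0."""
    return (2 * math.cos(x * x) - 4 * x * x * math.sin(x * x)) / math.cos(x)

x = 1e-3
values = (original(x), after_one_step(x), after_two_steps(x))
print(values)
```

All three values lie close to 2, and only the last expression can be evaluated directly at $x = 0$.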

One must be careful to ensure that the hypotheses of l’Hôpital’s rule are satisfied before applying
it. The following “warnings” show the sorts of things that can go wrong.


Warning 6.1.5 (Denominator limit nonzero).

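If

\[ \lim_{x\to a} f(x) = 0 \quad\text{but}\quad \lim_{x\to a} g(x) \ne 0 \]

then
\[ \lim_{x\to a} \frac{f(x)}{g(x)} \quad\text{need not be the same as}\quad \frac{f'(a)}{g'(a)} \quad\text{or}\quad \lim_{x\to a} \frac{f'(x)}{g'(x)}. \]

Here is an example. Take

\[ a = 0 \qquad f(x) = 3x \qquad g(x) = 4 + 5x \]

Then
\begin{align*}
\lim_{x\to 0} \frac{f(x)}{g(x)} &= \lim_{x\to 0} \frac{3x}{4 + 5x} = \frac{3 \times 0}{4 + 5 \times 0} = 0 \\
\lim_{x\to 0} \frac{f'(x)}{g'(x)} &= \frac{f'(0)}{g'(0)} = \frac{3}{5}
\end{align*}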

Warning 6.1.6 (Numerator limit nonzero).

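If

\[ \lim_{x\to a} g(x) = 0 \quad\text{but}\quad \lim_{x\to a} f(x) \ne 0 \]

then
\[ \lim_{x\to a} \frac{f(x)}{g(x)} \quad\text{need not be the same as}\quad \lim_{x\to a} \frac{f'(x)}{g'(x)}. \]

Here is an example. Take

\[ a = 0 \qquad f(x) = 4 + 5x \qquad g(x) = 3x \]

Then
\begin{align*}
\lim_{x\to 0} \frac{f(x)}{g(x)} &= \lim_{x\to 0} \frac{4 + 5x}{3x} = \text{DNE} \\
\lim_{x\to 0} \frac{f'(x)}{g'(x)} &= \lim_{x\to 0} \frac{5}{3} = \frac{5}{3}
\end{align*}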

This next one is more subtle; the limits of the original numerator and denominator functions
both go to zero, but the limit of the ratio of their derivatives does not exist.


Warning 6.1.7 (Limit of ratio of derivatives DNE).

If

\[ \lim_{x\to a} f(x) = 0 \quad\text{and}\quad \lim_{x\to a} g(x) = 0 \]

but
\[ \lim_{x\to a} \frac{f'(x)}{g'(x)} \quad\text{does not exist} \]

then it is still possible that

\[ \lim_{x\to a} \frac{f(x)}{g(x)} \quad\text{exists.} \]

Here is an example. Take

\[ a = 0 \qquad f(x) = x^2 \sin\frac{1}{x} \qquad g(x) = x \]
Then (with an application of the squeeze theorem)

\[ \lim_{x\to 0} f(x) = 0 \quad\text{and}\quad \lim_{x\to 0} g(x) = 0. \]

If we attempt to apply l’Hôpital’s rule then we have $g'(x) = 1$ and

\[ f'(x) = 2x\sin\frac{1}{x} - \cos\frac{1}{x} \]
and we then try to compute the limit
\[ \lim_{x\to 0} \frac{f'(x)}{g'(x)} = \lim_{x\to 0} \Big( 2x\sin\frac{1}{x} - \cos\frac{1}{x} \Big) \]

However, this limit does not exist. The first term converges to 0 (by the squeeze theorem),
but the second term $\cos(1/x)$ just oscillates wildly between $\pm 1$. All we can conclude
from this is

Since the limit of the ratio of derivatives does not exist, we cannot apply
l’Hôpital’s rule.

Instead we should go back to the original limit and apply the squeeze theorem:

\[ \lim_{x\to 0} \frac{f(x)}{g(x)} = \lim_{x\to 0} \frac{x^2 \sin\frac{1}{x}}{x} = \lim_{x\to 0} x\sin\frac{1}{x} = 0, \]

since $|x\sin(1/x)| \le |x|$ and $|x| \to 0$ as $x \to 0$.
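
The contrast in this warning can be observed numerically. In the sketch below (function names and sample points are illustrative assumptions), the ratio $f/g$ shrinks with $x$, while the ratio of derivatives keeps jumping between values near $-1$ and $+1$ along the points $x_k = 1/(k\pi)$, where $\cos(1/x_k) = (-1)^k$.

```python
import math

def ratio(x):
    """f(x)/g(x) = x sin(1/x) for f(x) = x^2 sin(1/x) and g(x) = x."""
    return x * math.sin(1 / x)

def derivative_ratio(x):
    """f'(x)/g'(x) = 2x sin(1/x) - cos(1/x), since g'(x) = 1."""
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# Sample at x_k = 1/(k*pi): the ratio is tiny, the derivative ratio alternates.
for k in (1000, 1001, 1002, 1003):
    x = 1 / (k * math.pi)
    print(k, ratio(x), derivative_ratio(x))
```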

It is also easy to construct an example in which the limits of numerator and denominator are
both zero, but the limit of the ratio and the limit of the ratio of the derivatives do not exist. A slight
change of the previous example shows that it is possible that
\[ \lim_{x\to a} f(x) = 0 \quad\text{and}\quad \lim_{x\to a} g(x) = 0 \]

but neither of the limits

\[ \lim_{x\to a} \frac{f(x)}{g(x)} \quad\text{or}\quad \lim_{x\to a} \frac{f'(x)}{g'(x)} \]

exist. Take
\[ a = 0 \qquad f(x) = x\sin\frac{1}{x} \qquad g(x) = x \]
Then (with a quick application of the squeeze theorem)
\[ \lim_{x\to 0} f(x) = 0 \quad\text{and}\quad \lim_{x\to 0} g(x) = 0. \]

However,
\[ \lim_{x\to 0} \frac{f(x)}{g(x)} = \lim_{x\to 0} \frac{x\sin\frac{1}{x}}{x} = \lim_{x\to 0} \sin\frac{1}{x} \]
does not exist. And similarly, since $g'(x) = 1$,
\[ \lim_{x\to 0} \frac{f'(x)}{g'(x)} = \lim_{x\to 0} \Big( \sin\frac{1}{x} - \frac{1}{x}\cos\frac{1}{x} \Big) \]
does not exist.

6.2 Variations
Theorem 6.0.2 is the basic form of L’Hôpital’s rule, but there are also many variations. Here are a
bunch of them.
(a) L’Hôpital’s rule also applies when the limit $\lim_{x\to a}$ is replaced by $\lim_{x\to a^+}$ or by $\lim_{x\to a^-}$ or by
$\lim_{x\to +\infty}$ or by $\lim_{x\to -\infty}$.
We can justify adapting the rule to limits at $\pm\infty$ via the following reasoning
\begin{align*}
\lim_{x\to\infty} \frac{f(x)}{g(x)} &= \lim_{y\to 0^+} \frac{f(1/y)}{g(1/y)} && \text{substitute } x = 1/y \\
  &= \lim_{y\to 0^+} \frac{-\frac{1}{y^2} f'(1/y)}{-\frac{1}{y^2} g'(1/y)},
\end{align*}
where we have used l’Hôpital’s rule (assuming this limit exists) and the fact that $\frac{d}{dy} f(1/y) = -\frac{1}{y^2} f'(1/y)$
(and similarly for $g$). Cleaning this up and substituting $y = 1/x$ gives the required
result:
\[ \lim_{x\to\infty} \frac{f(x)}{g(x)} = \lim_{y\to 0^+} \frac{f'(1/y)}{g'(1/y)} = \lim_{x\to\infty} \frac{f'(x)}{g'(x)}. \]


Example 6.2.1
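Consider the limit
\[ \lim_{x\to\infty} \frac{\arctan x - \frac{\pi}{2}}{1/x} \]

Both numerator and denominator go to 0 as $x \to \infty$, so this is a 0/0 indeterminate form. We
find
\[
\lim_{x\to+\infty} \underbrace{\frac{\arctan x - \frac{\pi}{2}}{\frac{1}{x}}}_{\substack{\text{num}\to 0 \\ \text{den}\to 0}}
= \lim_{x\to+\infty} \frac{\frac{1}{1 + x^2}}{-\frac{1}{x^2}}
= -\lim_{x\to+\infty} \underbrace{\frac{1}{1 + \frac{1}{x^2}}}_{\substack{\text{num}\to 1 \\ \text{den}\to 1}}
= -1
\]

We have applied L’Hôpital’s rule with

\begin{align*}
f(x) &= \arctan x - \frac{\pi}{2} & g(x) &= \frac{1}{x} \\
f'(x) &= \frac{1}{1 + x^2} & g'(x) &= -\frac{1}{x^2}
\end{align*}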

Example 6.2.1

(b) $\frac{\infty}{\infty}$ indeterminate form: L’Hôpital’s rule also applies when $\lim_{x\to a} f(x) = 0$, $\lim_{x\to a} g(x) = 0$ is
replaced by $\lim_{x\to a} f(x) = \pm\infty$, $\lim_{x\to a} g(x) = \pm\infty$.
xÑa xÑa

Example 6.2.2
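Consider the limit
\[ \lim_{x\to\infty} \frac{\log x}{x} \]

The numerator and denominator both blow up towards infinity so this is an $\infty/\infty$ indeterminate
form. An application of l’Hôpital’s rule gives
\begin{align*}
\lim_{x\to\infty} \underbrace{\frac{\log x}{x}}_{\substack{\text{num}\to\infty \\ \text{den}\to\infty}}
  &= \lim_{x\to\infty} \frac{1/x}{1} \\
  &= \lim_{x\to\infty} \frac{1}{x} = 0
\end{align*}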

Example 6.2.2

Example 6.2.3
Consider the limit
\[ \lim_{x\to\infty} \frac{5x^2 + 3x - 3}{x^2 + 1} \]

Then by two applications of l’Hôpital’s rule we get

\[
\lim_{x\to\infty} \underbrace{\frac{5x^2 + 3x - 3}{x^2 + 1}}_{\substack{\text{num}\to\infty \\ \text{den}\to\infty}}
= \lim_{x\to\infty} \underbrace{\frac{10x + 3}{2x}}_{\substack{\text{num}\to\infty \\ \text{den}\to\infty}}
= \lim_{x\to\infty} \frac{10}{2} = 5.
\]

Example 6.2.3

Example 6.2.4
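Compute the limit

\[ \lim_{x\to 0^+} \frac{\log x}{\tan\big(\frac{\pi}{2} - x\big)} \]

We can compute this using l’Hôpital’s rule twice:

\begin{align*}
\lim_{x\to 0^+} \underbrace{\frac{\log x}{\tan\big(\frac{\pi}{2} - x\big)}}_{\substack{\text{num}\to -\infty \\ \text{den}\to +\infty}}
  &= \lim_{x\to 0^+} \frac{\frac{1}{x}}{-\sec^2\big(\frac{\pi}{2} - x\big)}
   = -\lim_{x\to 0^+} \underbrace{\frac{\cos^2\big(\frac{\pi}{2} - x\big)}{x}}_{\substack{\text{num}\to 0 \\ \text{den}\to 0}} \\
  &= -\lim_{x\to 0^+} \underbrace{\frac{2\cos\big(\frac{\pi}{2} - x\big)\sin\big(\frac{\pi}{2} - x\big)}{1}}_{\substack{\text{num}\to 0 \\ \text{den}\to 1}}
   = 0
\end{align*}

The first application of l’Hôpital’s rule was with

\begin{align*}
f(x) &= \log x & g(x) &= \tan\Big(\frac{\pi}{2} - x\Big) \\
f'(x) &= \frac{1}{x} & g'(x) &= -\sec^2\Big(\frac{\pi}{2} - x\Big)
\end{align*}
and the second time with
\begin{align*}
f(x) &= \cos^2\Big(\frac{\pi}{2} - x\Big) & g(x) &= x \\
f'(x) &= 2\cos\Big(\frac{\pi}{2} - x\Big)\Big[ -\sin\Big(\frac{\pi}{2} - x\Big) \Big](-1) & g'(x) &= 1
\end{align*}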

Example 6.2.4

Sometimes things don’t quite work out as we would like and l’Hôpital’s rule can get stuck in a
loop. Remember to think about the problem before you apply any rule.


Example 6.2.5
Consider the limit
\[ \lim_{x\to\infty} \frac{e^x + e^{-x}}{e^x - e^{-x}} \]

Clearly both numerator and denominator go to $\infty$, so we have an $\infty/\infty$ indeterminate form. Naively
applying l’Hôpital’s rule gives
\[ \lim_{x\to\infty} \frac{e^x + e^{-x}}{e^x - e^{-x}} = \lim_{x\to\infty} \frac{e^x - e^{-x}}{e^x + e^{-x}} \]

which is again an $\infty/\infty$ indeterminate form. So apply l’Hôpital’s rule again:

\[ \lim_{x\to\infty} \frac{e^x - e^{-x}}{e^x + e^{-x}} = \lim_{x\to\infty} \frac{e^x + e^{-x}}{e^x - e^{-x}} \]

which is right back where we started!

The correct approach to such a limit is to apply the methods we learned in Chapter 2 and rewrite
\[ \frac{e^x + e^{-x}}{e^x - e^{-x}} = \frac{e^x (1 + e^{-2x})}{e^x (1 - e^{-2x})} = \frac{1 + e^{-2x}}{1 - e^{-2x}} \]
and then take the limit.
A similar sort of l’Hôpital-rule-loop will occur if you naively apply l’Hôpital’s rule to the limit
\[ \lim_{x\to\infty} \frac{\sqrt{4x^2 + 1}}{5x - 1} \]

which appeared in Example 2.1.32.


Example 6.2.5
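
The rewriting used in Example 6.2.5 is also the numerically sensible way to evaluate this quotient. The sketch below (function names are illustrative assumptions) compares the quotient as written with the rewritten form; the two agree for moderate $x$, and the rewritten form tends to 1 as $x$ grows.

```python
import math

def naive(x):
    """The quotient exactly as written; math.exp(x) overflows once x is very large."""
    return (math.exp(x) + math.exp(-x)) / (math.exp(x) - math.exp(-x))

def rewritten(x):
    """The same quotient after factoring e^x out of numerator and denominator."""
    t = math.exp(-2 * x)
    return (1 + t) / (1 - t)

for x in (1.0, 5.0, 20.0):
    print(x, naive(x), rewritten(x))
```

Since $e^{-2x} \to 0$ as $x \to \infty$, the rewritten form makes the value of the limit, 1, visible without any use of l’Hôpital’s rule.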

(c) $0 \cdot \infty$ indeterminate form: When $\lim_{x\to a} f(x) = 0$ and $\lim_{x\to a} g(x) = \infty$. We can use a little algebra
to manipulate this into either a $\frac{0}{0}$ or $\frac{\infty}{\infty}$ form:

\[ \lim_{x\to a} \frac{f(x)}{1/g(x)} \qquad \lim_{x\to a} \frac{g(x)}{1/f(x)} \]

Example 6.2.6
Consider the limit

\[ \lim_{x\to 0^+} x \cdot \log x \]

Here the function $f(x) = x$ goes to zero, while $g(x) = \log x$ goes to $-\infty$. If we rewrite this as
the fraction
\[ x \cdot \log x = \frac{\log x}{1/x} \]

then the $0 \cdot \infty$ form has become an $\infty/\infty$ form.

The result is then
\[
\lim_{x\to 0^+} \underbrace{x}_{\to 0} \, \underbrace{\log x}_{\to -\infty}
= \lim_{x\to 0^+} \underbrace{\frac{\log x}{\frac{1}{x}}}_{\substack{\text{num}\to -\infty \\ \text{den}\to \infty}}
= \lim_{x\to 0^+} \frac{\frac{1}{x}}{-\frac{1}{x^2}}
= -\lim_{x\to 0^+} x = 0
\]

Example 6.2.6

Example 6.2.7
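In this example we’ll evaluate $\lim_{x\to+\infty} x^n e^{-x}$, for all natural numbers $n$. We’ll start with $n = 1$ and
$n = 2$ and then, using what we have learned from those cases, move on to general $n$.

\[
\lim_{x\to+\infty} \underbrace{x}_{\to\infty} \, \underbrace{e^{-x}}_{\to 0}
= \lim_{x\to+\infty} \underbrace{\frac{x}{e^x}}_{\substack{\text{num}\to +\infty \\ \text{den}\to +\infty}}
= \lim_{x\to+\infty} \underbrace{\frac{1}{e^x}}_{\substack{\text{num}\to 1 \\ \text{den}\to +\infty}}
= \lim_{x\to+\infty} e^{-x} = 0
\]

Applying l’Hôpital twice,

\[
\lim_{x\to+\infty} \underbrace{x^2}_{\to\infty} \, \underbrace{e^{-x}}_{\to 0}
= \lim_{x\to+\infty} \underbrace{\frac{x^2}{e^x}}_{\substack{\text{num}\to +\infty \\ \text{den}\to +\infty}}
= \lim_{x\to+\infty} \underbrace{\frac{2x}{e^x}}_{\substack{\text{num}\to \infty \\ \text{den}\to +\infty}}
= \lim_{x\to+\infty} \underbrace{\frac{2}{e^x}}_{\substack{\text{num}\to 2 \\ \text{den}\to +\infty}}
= \lim_{x\to+\infty} 2e^{-x} = 0
\]

Indeed, for any natural number $n$, applying l’Hôpital $n$ times gives

\begin{align*}
\lim_{x\to+\infty} \underbrace{x^n}_{\to\infty} \, \underbrace{e^{-x}}_{\to 0}
  &= \lim_{x\to+\infty} \underbrace{\frac{x^n}{e^x}}_{\substack{\text{num}\to +\infty \\ \text{den}\to +\infty}} \\
  &= \lim_{x\to+\infty} \underbrace{\frac{nx^{n-1}}{e^x}}_{\substack{\text{num}\to \infty \\ \text{den}\to +\infty}} \\
  &= \lim_{x\to+\infty} \underbrace{\frac{n(n-1)x^{n-2}}{e^x}}_{\substack{\text{num}\to \infty \\ \text{den}\to +\infty}} \\
  &= \cdots = \lim_{x\to+\infty} \underbrace{\frac{n!}{e^x}}_{\substack{\text{num}\to n! \\ \text{den}\to +\infty}} = 0
\end{align*}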

Example 6.2.7
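
The conclusion of Example 6.2.7, that $e^{-x}$ eventually beats any fixed power of $x$, can be illustrated numerically. The sketch below is a minimal check under assumed sample values of $n$ and $x$; the function name is hypothetical, and the product is evaluated as $\exp(n\log x - x)$ to avoid very large intermediate numbers.

```python
import math

def x_pow_n_times_exp_neg_x(x, n):
    """Evaluate x^n e^{-x} for x > 0 as exp(n log x - x)."""
    return math.exp(n * math.log(x) - x)

for n in (1, 2, 5, 10):
    print(n, [x_pow_n_times_exp_neg_x(x, n) for x in (10.0, 100.0, 1000.0)])
```

For each $n$ the printed values may be large at first (for example at $x = 10$ with $n = 10$) but then collapse towards 0 as $x$ increases.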


6.3 (optional) Even more variations


These next forms aren’t explicitly part of the learning goals, but they’re a good opportunity to
practice algebra skills, such as those used in logarithmic differentiation.

(d) $\infty - \infty$ indeterminate form: When $\lim_{x\to a} f(x) = \infty$ and $\lim_{x\to a} g(x) = \infty$. We rewrite the difference
as a fraction using a common denominator

\[ f(x) - g(x) = \frac{h(x)}{\ell(x)} \]

which is then a 0/0 or $\infty/\infty$ form.

Example 6.3.1
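Consider the limit

\[ \lim_{x\to\frac{\pi}{2}^-} (\sec x - \tan x) \]

Since the limit of both $\sec x$ and $\tan x$ is $+\infty$ as $x \to \frac{\pi}{2}^-$, this is an $\infty - \infty$ indeterminate form.
However we can rewrite this as
\[ \sec x - \tan x = \frac{1}{\cos x} - \frac{\sin x}{\cos x} = \frac{1 - \sin x}{\cos x} \]
which is then a 0/0 indeterminate form. This then gives
\[
\lim_{x\to\frac{\pi}{2}^-} \Big( \underbrace{\sec x}_{\to +\infty} - \underbrace{\tan x}_{\to +\infty} \Big)
= \lim_{x\to\frac{\pi}{2}^-} \underbrace{\frac{1 - \sin x}{\cos x}}_{\substack{\text{num}\to 0 \\ \text{den}\to 0}}
= \lim_{x\to\frac{\pi}{2}^-} \underbrace{\frac{-\cos x}{-\sin x}}_{\substack{\text{num}\to 0 \\ \text{den}\to -1}}
= 0
\]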

Example 6.3.1

In the last example, Example 6.3.1, we converted an $\infty - \infty$ indeterminate form into a $\frac{0}{0}$
indeterminate form by exploiting the fact that the two terms, $\sec x$ and $\tan x$, in the $\infty - \infty$
indeterminate form shared a common denominator, namely $\cos x$. In the “real world” that will,
of course, almost never happen. However as the next couple of examples show, you can often
massage these expressions into suitable forms.
Here is another, much more complicated, example, where it doesn’t happen.

Example 6.3.2
In this example, we evaluate the $\infty - \infty$ indeterminate form
\[ \lim_{x\to 0} \Big( \underbrace{\frac{1}{x}}_{\to\pm\infty} - \underbrace{\frac{1}{\log(1 + x)}}_{\to\pm\infty} \Big) \]

We convert it into a $\frac{0}{0}$ indeterminate form simply by putting the two fractions, $\frac{1}{x}$ and $\frac{1}{\log(1+x)}$,
over a common denominator.
\[
\lim_{x\to 0} \Big( \underbrace{\frac{1}{x}}_{\to\pm\infty} - \underbrace{\frac{1}{\log(1 + x)}}_{\to\pm\infty} \Big)
= \lim_{x\to 0} \underbrace{\frac{\log(1 + x) - x}{x\log(1 + x)}}_{\substack{\text{num}\to 0 \\ \text{den}\to 0}}
\tag{E1}
\]

Now we apply L’Hôpital’s rule, and simplify

\begin{align*}
\lim_{x\to 0} \underbrace{\frac{\log(1 + x) - x}{x\log(1 + x)}}_{\substack{\text{num}\to 0 \\ \text{den}\to 0}}
  &= \lim_{x\to 0} \frac{\frac{1}{1+x} - 1}{\log(1 + x) + \frac{x}{1+x}}
   = \lim_{x\to 0} \frac{1 - (1 + x)}{(1 + x)\log(1 + x) + x} \\
  &= -\lim_{x\to 0} \underbrace{\frac{x}{(1 + x)\log(1 + x) + x}}_{\substack{\text{num}\to 0 \\ \text{den}\to 1\times 0 + 0 = 0}}
  \tag{E2}
\end{align*}

Then we apply L’Hôpital’s rule a second time

\[
-\lim_{x\to 0} \underbrace{\frac{x}{(1 + x)\log(1 + x) + x}}_{\substack{\text{num}\to 0 \\ \text{den}\to 1\times 0 + 0 = 0}}
= -\lim_{x\to 0} \underbrace{\frac{1}{\log(1 + x) + \frac{1+x}{1+x} + 1}}_{\substack{\text{num}\to 1 \\ \text{den}\to 0 + 1 + 1 = 2}}
= -\frac{1}{2}
\tag{E3}
\]

Combining (E1), (E2) and (E3) gives our final answer

\[ \lim_{x\to 0} \Big( \frac{1}{x} - \frac{1}{\log(1 + x)} \Big) = -\frac{1}{2} \]

Example 6.3.2
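
A quick numerical check of Example 6.3.2 is sketched below. The function name and the sample points are illustrative assumptions; `math.log1p(x)` computes $\log(1 + x)$ accurately for small $x$, which matters here because two large, nearly equal quantities are being subtracted.

```python
import math

def difference(x):
    """1/x - 1/log(1+x), using log1p for accuracy near x = 0."""
    return 1 / x - 1 / math.log1p(x)

# Approach 0 from both sides; the values should settle near -1/2.
for x in (0.1, 1e-2, 1e-3, -1e-3, 1e-4):
    print(x, difference(x))
```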

The following example can be done by l’Hôpital’s rule, but it is actually far simpler to multiply
by the conjugate and take the limit using the tools of Chapter 2.

Example 6.3.3
Consider the limit
\[ \lim_{x\to\infty} \Big( \sqrt{x^2 + 4x} - \sqrt{x^2 - 3x} \Big) \]

Neither term is a fraction, but we can write

\begin{align*}
\sqrt{x^2 + 4x} - \sqrt{x^2 - 3x}
  &= x\sqrt{1 + 4/x} - x\sqrt{1 - 3/x} && \text{assuming } x > 0 \\
  &= x\Big( \sqrt{1 + 4/x} - \sqrt{1 - 3/x} \Big) \\
  &= \frac{\sqrt{1 + 4/x} - \sqrt{1 - 3/x}}{1/x}
\end{align*}

which is now a 0/0 form with $f(x) = \sqrt{1 + 4/x} - \sqrt{1 - 3/x}$ and $g(x) = 1/x$. Then
\[ f'(x) = \frac{-4/x^2}{2\sqrt{1 + 4/x}} - \frac{3/x^2}{2\sqrt{1 - 3/x}} \qquad g'(x) = -\frac{1}{x^2} \]
Hence
\[ \frac{f'(x)}{g'(x)} = \frac{4}{2\sqrt{1 + 4/x}} + \frac{3}{2\sqrt{1 - 3/x}} \]
And so in the limit as $x \to \infty$
\[ \lim_{x\to\infty} \frac{f'(x)}{g'(x)} = \frac{4}{2} + \frac{3}{2} = \frac{7}{2} \]
and so our original limit is also 7/2.
By comparison, if we multiply by the conjugate we have
\begin{align*}
\sqrt{x^2 + 4x} - \sqrt{x^2 - 3x}
  &= \Big( \sqrt{x^2 + 4x} - \sqrt{x^2 - 3x} \Big) \cdot \frac{\sqrt{x^2 + 4x} + \sqrt{x^2 - 3x}}{\sqrt{x^2 + 4x} + \sqrt{x^2 - 3x}} \\
  &= \frac{x^2 + 4x - (x^2 - 3x)}{\sqrt{x^2 + 4x} + \sqrt{x^2 - 3x}} \\
  &= \frac{7x}{\sqrt{x^2 + 4x} + \sqrt{x^2 - 3x}} \\
  &= \frac{7}{\sqrt{1 + 4/x} + \sqrt{1 - 3/x}} && \text{assuming } x > 0
\end{align*}

Now taking the limit as $x \to \infty$ gives 7/2 as required. Just because we know l’Hôpital’s rule, it
does not mean we should use it everywhere it might be applied.
Example 6.3.3

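(e) $1^\infty$ indeterminate form: We can use l’Hôpital’s rule on limits of the form
\[ \lim_{x\to a} f(x)^{g(x)} \quad\text{with}\quad \lim_{x\to a} f(x) = 1 \quad\text{and}\quad \lim_{x\to a} g(x) = \infty \]

by considering the logarithm of the limit$^4$:

\[ \log\Big( \lim_{x\to a} f(x)^{g(x)} \Big) = \lim_{x\to a} \log\Big( f(x)^{g(x)} \Big) = \lim_{x\to a} \log(f(x)) \cdot g(x) \]

which is now a $0 \cdot \infty$ form. This can be further transformed into a 0/0 or $\infty/\infty$ form:
\begin{align*}
\log\Big( \lim_{x\to a} f(x)^{g(x)} \Big) &= \lim_{x\to a} \log(f(x)) \cdot g(x) \\
  &= \lim_{x\to a} \frac{\log(f(x))}{1/g(x)}.
\end{align*}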

4 We are using the fact that the logarithm is a continuous function and Theorem 2.3.8.


Example 6.3.4
The following limit appears quite naturally when considering systems which display exponential
growth or decay.

\[ \lim_{x\to 0} (1 + x)^{a/x} \quad\text{with the constant } a \ne 0 \]

Since $(1 + x) \to 1$ and $a/x \to \infty$ this is a $1^\infty$ indeterminate form.

By considering its logarithm we have
\begin{align*}
\log\Big( \lim_{x\to 0} (1 + x)^{a/x} \Big) &= \lim_{x\to 0} \log\Big( (1 + x)^{a/x} \Big) \\
  &= \lim_{x\to 0} \frac{a}{x}\log(1 + x) \\
  &= \lim_{x\to 0} \frac{a\log(1 + x)}{x}
\end{align*}
which is now a 0/0 form. Applying l’Hôpital’s rule gives
\[
\lim_{x\to 0} \underbrace{\frac{a\log(1 + x)}{x}}_{\substack{\text{num}\to 0 \\ \text{den}\to 0}}
= \lim_{x\to 0} \underbrace{\frac{\frac{a}{1+x}}{1}}_{\substack{\text{num}\to a \\ \text{den}\to 1}}
= a
\]

Since $(1 + x)^{a/x} = \exp\Big[ \log\Big( (1 + x)^{a/x} \Big) \Big]$ and the exponential function is continuous, our
original limit is $e^a$.
Example 6.3.4

Here is a more complicated example of a 18 indeterminate form.

Example 6.3.5
In the limit
\[ \lim_{x\to 0} \Big( \frac{\sin x}{x} \Big)^{1/x^2} \]
the base, $\frac{\sin x}{x}$, converges to 1 (see Example 6.1.1) and the exponent, $\frac{1}{x^2}$, goes to $\infty$. But if we
take logarithms then

\[ \log\Big( \frac{\sin x}{x} \Big)^{1/x^2} = \frac{\log\frac{\sin x}{x}}{x^2} \]
then, in the limit $x \to 0$, we have a 0/0 indeterminate form. One application of l’Hôpital’s rule
gives
\[
\lim_{x\to 0} \underbrace{\frac{\log\frac{\sin x}{x}}{x^2}}_{\substack{\text{num}\to 0 \\ \text{den}\to 0}}
= \lim_{x\to 0} \frac{\frac{x}{\sin x} \cdot \frac{x\cos x - \sin x}{x^2}}{2x}
= \lim_{x\to 0} \frac{\frac{x\cos x - \sin x}{x\sin x}}{2x}
= \lim_{x\to 0} \frac{x\cos x - \sin x}{2x^2\sin x}
\]

which is another 0/0 form. Applying l’Hôpital’s rule again gives:

\begin{align*}
\lim_{x\to 0} \underbrace{\frac{x\cos x - \sin x}{2x^2\sin x}}_{\substack{\text{num}\to 0 \\ \text{den}\to 0}}
  &= \lim_{x\to 0} \frac{\cos x - x\sin x - \cos x}{4x\sin x + 2x^2\cos x} \\
  &= -\lim_{x\to 0} \frac{x\sin x}{4x\sin x + 2x^2\cos x} = -\lim_{x\to 0} \frac{\sin x}{4\sin x + 2x\cos x}
\end{align*}

which is yet another 0/0 form. Once more with l’Hôpital’s rule:
\begin{align*}
-\lim_{x\to 0} \underbrace{\frac{\sin x}{4\sin x + 2x\cos x}}_{\substack{\text{num}\to 0 \\ \text{den}\to 0}}
  &= -\lim_{x\to 0} \underbrace{\frac{\cos x}{4\cos x + 2\cos x - 2x\sin x}}_{\substack{\text{num}\to 1 \\ \text{den}\to 6}} \\
  &= -\frac{1}{6}
\end{align*}
Oof! We have just shown that the logarithm of our original limit is $-1/6$. Hence the original
limit itself is $e^{-1/6}$.
This was quite a complicated example. However it does illustrate the importance of cleaning up
your algebraic expressions. This will both reduce the amount of work you have to do and will
also reduce the number of errors you make.
Example 6.3.5
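
After three applications of l’Hôpital’s rule it is reassuring to confirm the answer of Example 6.3.5 numerically. The sketch below (the function name and sample points are illustrative assumptions) evaluates the expression through its logarithm, exactly as in the example, and compares it with $e^{-1/6}$.

```python
import math

def power_form(x):
    """(sin x / x)^(1/x^2), computed as exp(log(sin x / x) / x^2)."""
    return math.exp(math.log(math.sin(x) / x) / (x * x))

target = math.exp(-1 / 6)
for x in (0.5, 0.1, 0.01):
    print(x, power_form(x), target)
```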

(f) $0^0$ indeterminate form: Like the $1^\infty$ form, this can be treated by considering its logarithm.

Example 6.3.6
For example, in the limit
\[ \lim_{x\to 0^+} x^x \]

both the base, $x$, and the exponent, also $x$, go to zero. But if we consider the logarithm then we
have
\[ \log x^x = x\log x \]
which is a $0 \cdot \infty$ indeterminate form, which we already know how to treat. In fact, we already
found, in Example 6.2.6, that
\[ \lim_{x\to 0^+} x\log x = 0 \]

Since the exponential is a continuous function

\[ \lim_{x\to 0^+} x^x = \lim_{x\to 0^+} \exp\big( x\log x \big) = \exp\Big( \lim_{x\to 0^+} x\log x \Big) = e^0 = 1 \]

Example 6.3.6


(g) $\infty^0$ indeterminate form: Again, we can treat this form by considering its logarithm.

Example 6.3.7
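For example, in the limit

\[ \lim_{x\to+\infty} x^{1/x} \]

the base, $x$, goes to infinity and the exponent, $\frac{1}{x}$, goes to zero. But if we take logarithms

\[ \log x^{1/x} = \frac{\log x}{x} \]
which is an $\infty/\infty$ form, which we know how to treat.
\[
\lim_{x\to+\infty} \underbrace{\frac{\log x}{x}}_{\substack{\text{num}\to\infty \\ \text{den}\to\infty}}
= \lim_{x\to+\infty} \underbrace{\frac{\frac{1}{x}}{1}}_{\substack{\text{num}\to 0 \\ \text{den}\to 1}}
= 0
\]

Since the exponential is a continuous function

\[ \lim_{x\to+\infty} x^{1/x} = \lim_{x\to+\infty} \exp\Big( \frac{\log x}{x} \Big) = \exp\Big( \lim_{x\to\infty} \frac{\log x}{x} \Big) = e^0 = 1 \]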

Example 6.3.7
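
The three exponential forms treated in this section ($1^\infty$, $0^0$ and $\infty^0$) all use the same device: take the logarithm, find its limit, then exponentiate. The sketch below mirrors that device numerically for Examples 6.3.4, 6.3.6 and 6.3.7. The helper name `via_log`, the constant `a = 3.0` and the sample points are assumptions chosen for illustration.

```python
import math

def via_log(base, exponent):
    """Evaluate base**exponent as exp(exponent * log(base)); requires base > 0."""
    return math.exp(exponent * math.log(base))

a = 3.0  # illustrative value of the constant in Example 6.3.4

one_to_inf = [via_log(1 + x, a / x) for x in (1e-2, 1e-4, 1e-6)]   # expect e^a
zero_to_zero = [via_log(x, x) for x in (1e-2, 1e-4, 1e-8)]         # expect 1
inf_to_zero = [via_log(x, 1 / x) for x in (1e2, 1e4, 1e8)]         # expect 1

print(one_to_inf, math.exp(a))
print(zero_to_zero)
print(inf_to_zero)
```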

Chapter 7

Sketching Graphs

One of the most obvious applications of derivatives is to help us understand the shape of the graph of
a function. In this section we will use our accumulated knowledge of derivatives to identify the most
important qualitative features of graphs y = f (x). The goal of this section is to highlight features of
the graph y = f (x) that are easily

• determined from $f(x)$ itself, and

• deduced from $f'(x)$, and

• read from $f''(x)$.

We will then use the ideas to sketch several examples.

7.1 Domain, intercepts and asymptotes

Learning Objectives
• Sketch a function using information from precalculus (limits, intercepts) and the first
derivative

• Efficiently find signs of factored functions by determining where the signs change.

Given a function f (x), there are several important features that we can determine from that
expression before examining its derivatives.

• The domain of the function — take note of values where f does not exist. If the function
is rational, look for where the denominator is zero. Similarly be careful to look for roots of
negative numbers or other possible sources of discontinuities.

• Intercepts — examine where the function crosses the x-axis and the y-axis by solving f (x) = 0
and computing f (0).


• Vertical asymptotes: look for values of $x$ at which $f(x)$ blows up. If $f(x)$ approaches either
$+\infty$ or $-\infty$ as $x$ approaches $a$ (or possibly as $x$ approaches $a$ from one side) then $x = a$ is a
vertical asymptote to $y = f(x)$. When $f(x)$ is a rational function (written so that common
factors are cancelled), then $y = f(x)$ has vertical asymptotes at the zeroes of the denominator.
• Horizontal asymptotes: examine the limits of $f(x)$ as $x \to +\infty$ and $x \to -\infty$. Often $f(x)$
will tend to $+\infty$ or to $-\infty$ or to a finite limit $L$. If, for example, $\lim_{x\to+\infty} f(x) = L$, then $y = L$ is
a horizontal asymptote to $y = f(x)$ as $x \to \infty$.

Example 7.1.1
Consider the function
\[ f(x) = \frac{x + 1}{(x + 3)(x - 2)} \]
• We see that it is defined on all real numbers except $x = -3, +2$.
• Since $f(0) = -1/6$ and $f(x) = 0$ only when $x = -1$, the graph has y-intercept $(0, -1/6)$
and x-intercept $(-1, 0)$.
• Since the function is rational and its denominator is zero at $x = -3, +2$ it will have vertical
asymptotes at $x = -3, +2$. To determine the shape around those asymptotes we need to
examine the limits
\[ \lim_{x\to -3} f(x) \qquad \lim_{x\to 2} f(x) \]

Notice that when $x$ is close to $-3$, the factors $(x + 1)$ and $(x - 2)$ are both negative, so the
sign of $f(x) = \frac{x+1}{x-2} \cdot \frac{1}{x+3}$ is the same as the sign of $x + 3$. Hence
\[ \lim_{x\to -3^+} f(x) = +\infty \qquad \lim_{x\to -3^-} f(x) = -\infty \]
A similar analysis when $x$ is near 2 gives
\[ \lim_{x\to 2^+} f(x) = +\infty \qquad \lim_{x\to 2^-} f(x) = -\infty \]

• Finally since the numerator has degree 1 and the denominator has degree 2, we see that as
$x \to \pm\infty$, $f(x) \to 0$. So $y = 0$ is a horizontal asymptote.
• Since we know the behaviour around the asymptotes and we know the locations of the
intercepts (as shown in the left graph below), we can then join up the pieces and smooth them
out to get a good sketch of this function (below right).


Example 7.1.1
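
The sign analysis near the vertical asymptotes in Example 7.1.1 can be double-checked by evaluating the function just to either side of $x = -3$ and $x = 2$. The following is a minimal sketch; the helper `sign` and the offset `eps` are illustrative assumptions.

```python
def f(x):
    """The rational function of Example 7.1.1."""
    return (x + 1) / ((x + 3) * (x - 2))

def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

eps = 1e-6  # small offset used to probe each side of an asymptote
one_sided_signs = {
    "x -> -3+": sign(f(-3 + eps)),
    "x -> -3-": sign(f(-3 - eps)),
    "x -> 2+": sign(f(2 + eps)),
    "x -> 2-": sign(f(2 - eps)),
}

print(f(0), f(-1), one_sided_signs, f(1e6), f(-1e6))
```

A sign of $+1$ next to an asymptote corresponds to the graph heading to $+\infty$ there, and $-1$ to $-\infty$; the tiny values at $x = \pm 10^6$ reflect the horizontal asymptote $y = 0$.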

7.2 First derivative: increasing or decreasing


Now we move on to the first derivative, $f'(x)$. Consider any function $f(x)$ that is continuous on an
interval $A \le x \le B$ and is differentiable on $A < x < B$. Then
• if $f'(x) > 0$ for all $A < x < B$, then $f(x)$ is increasing on $(A, B)$;
that is, for all $A < a < b < B$, $f(a) < f(b)$.
• if $f'(x) < 0$ for all $A < x < B$, then $f(x)$ is decreasing on $(A, B)$;
that is, for all $A < a < b < B$, $f(a) > f(b)$.
Thus the sign of the derivative indicates to us whether the function is increasing or decreasing.
Further, as we discussed in Section 8.1, we should also examine points at which the derivative is
zero (critical points) and points where the derivative does not exist. These points may indicate a
local maximum or minimum.
We will now consider a function $f(x)$ that is defined on an interval $I$, except possibly at finitely
many points of $I$. If $f$ or its derivative $f'$ is not defined at a point $a$ of $I$, then we call $a$ a singular
point$^1$ of $f$.
After studying the function $f(x)$ as described above, we should compute its derivative $f'(x)$.
• Critical points: determine where $f'(x) = 0$. At a critical point, $f$ has a horizontal tangent.
• Singular points: determine where $f'(x)$ is not defined. If $f'(x)$ approaches $\pm\infty$ as $x$
approaches a singular point $a$, then $f$ has a vertical tangent there when $f$ approaches a finite
value as $x$ approaches $a$ (or possibly approaches $a$ from one side) and a vertical asymptote
when $f(x)$ approaches $\pm\infty$ as $x$ approaches $a$ (or possibly approaches $a$ from one side).
• Increasing and decreasing: where is the derivative positive and where is it negative. Notice
that in order for the derivative to change sign, it must either pass through zero (a critical point)
or have a singular point. Thus neighbouring regions of increase and decrease will be separated
by critical and singular points.

Example 7.2.1
Consider the function
\[ f(x) = x^4 - 6x^3 \]
• Before we move on to derivatives, let us first examine the function itself as we did above.
– As $f(x)$ is a polynomial its domain is all real numbers.
– Its y-intercept is at $(0, 0)$. We find its x-intercepts by factoring
\[ f(x) = x^4 - 6x^3 = x^3(x - 6) \]
So it crosses the x-axis at $x = 0, 6$.

1 This is the extension of the definition of “singular point” that was mentioned in the footnote in Definition 3.5.6.


– Again, since the function is a polynomial it does not have any vertical asymptotes. And
since

\[ \lim_{x\to\pm\infty} f(x) = \lim_{x\to\pm\infty} x^4(1 - 6/x) = +\infty \]

it does not have horizontal asymptotes: it blows up to $+\infty$ as $x$ goes to $\pm\infty$.

– We can also determine where the function is positive or negative since we know it is
continuous everywhere and zero at $x = 0, 6$. Thus we must examine the intervals

\[ (-\infty, 0) \qquad (0, 6) \qquad (6, \infty) \]

When $x < 0$, $x^3 < 0$ and $x - 6 < 0$ so $f(x) = x^3(x - 6) = (\text{negative})(\text{negative}) > 0$.
Similarly when $x > 6$, $x^3 > 0$, $x - 6 > 0$ we must have $f(x) > 0$. Finally when $0 < x < 6$,
$x^3 > 0$ but $x - 6 < 0$ so $f(x) < 0$. Thus
\[
\begin{array}{c|c|c|c|c|c}
\text{interval} & (-\infty, 0) & 0 & (0, 6) & 6 & (6, \infty) \\ \hline
f(x) & \text{positive} & 0 & \text{negative} & 0 & \text{positive}
\end{array}
\]
– Based on this information we can already construct a rough sketch.

• Now we compute its derivative

\[ f'(x) = 4x^3 - 18x^2 = 2x^2(2x - 9) \]

• Since the function is a polynomial, it does not have any singular points, but it does have two
critical points at $x = 0, 9/2$. These two critical points split the real line into 3 open intervals

\[ (-\infty, 0) \qquad (0, 9/2) \qquad (9/2, \infty) \]

We need to determine the sign of the derivative in each interval.

– When $x < 0$, $x^2 > 0$ but $(2x - 9) < 0$, so $f'(x) < 0$ and the function is decreasing.

– When $0 < x < 9/2$, $x^2 > 0$ but $(2x - 9) < 0$, so $f'(x) < 0$ and the function is still
decreasing.
– When $x > 9/2$, $x^2 > 0$ and $(2x - 9) > 0$, so $f'(x) > 0$ and the function is increasing.
We can then summarise this in the following table

\[
\begin{array}{c|c|c|c|c|c}
\text{interval} & (-\infty, 0) & 0 & (0, 9/2) & 9/2 & (9/2, \infty) \\ \hline
f'(x) & \text{negative} & 0 & \text{negative} & 0 & \text{positive} \\ \hline
 & \text{decreasing} & \text{horizontal tangent} & \text{decreasing} & \text{minimum} & \text{increasing}
\end{array}
\]
Since the derivative changes sign from negative to positive at the critical point $x = 9/2$, this
point is a minimum. Its y-value is
\begin{align*}
y = f(9/2) &= \frac{9^3}{2^3}\Big( \frac{9}{2} - 6 \Big) \\
  &= \frac{3^6}{2^3} \cdot \Big( \frac{-3}{2} \Big) = -\frac{3^7}{2^4}
\end{align*}
On the other hand, at $x = 0$ the derivative does not change sign; while this point has a
horizontal tangent line it is not a minimum or maximum.
• Putting this information together we arrive at a quite reasonable sketch.

To improve upon this further we will examine the second derivative.

Example 7.2.1
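The sign analysis in Example 7.2.1 can be double-checked with a few lines of Python. This is only a sketch: the test points $-1$, $1$ and $5$ are arbitrary choices (any point inside each interval would do), and the helper name `sign` is ours. One sample per interval is enough because $f'$ is continuous and vanishes only at the critical points, so it cannot change sign inside an interval.

```python
from fractions import Fraction

def f(x):
    return x**4 - 6 * x**3

def fprime(x):
    return 4 * x**3 - 18 * x**2

def sign(v):
    return "positive" if v > 0 else "negative" if v < 0 else "zero"

# One arbitrary test point inside each interval cut out by x = 0 and x = 9/2.
test_points = {"(-inf, 0)": -1, "(0, 9/2)": 1, "(9/2, inf)": 5}
signs = {interval: sign(fprime(x)) for interval, x in test_points.items()}

for interval, s in signs.items():
    print(interval, s)

# Exact y-value at the minimum, using rational arithmetic.
y_min = f(Fraction(9, 2))
print(y_min)
```

The exact value printed for `y_min` can be compared with $-3^7/2^4$ from the example.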

7.3 Second derivative — concavity

Learning Objectives
• Explain what it means for a twice-differentiable function to be concave up or concave
down on an interval.

• Determine whether a twice-differentiable function is concave up or concave down on
an interval.

• Explain how information about the graph of a function may be extracted from the
function, its derivative and its second derivative.


• Sketch the graph of a function f (x) using the function, its derivative and its second
derivative.

• Sketch the graph of a function using characteristics determined from the function and
its derivatives, without scaffolding from an external source.

The second derivative $f''(x)$ tells us the rate at which the derivative changes. Perhaps the easiest
way to understand how to interpret the sign of the second derivative is to think about what it implies
about the slope of the tangent line to the graph of the function. Consider the following sketches of
$y = 1 + x^2$ and $y = -1 - x^2$.

• In the case of $y = f(x) = 1 + x^2$, $f''(x) = 2 > 0$. Notice that this means the slope, $f'(x)$, of
the line tangent to the graph at $x$ increases as $x$ increases. Looking at the figure on the left
above, we see that the graph always lies above the tangent lines.

• For $y = f(x) = -1 - x^2$, $f''(x) = -2 < 0$. The slope, $f'(x)$, of the line tangent to the graph
at $x$ decreases as $x$ increases. Looking at the figure on the right above, we see that the graph
always lies below the tangent lines.

Similarly consider the following sketches of $y = x^{-1/2}$ and $y = \sqrt{4 - x}$:


Both of their derivatives, $-\frac{1}{2} x^{-3/2}$ and $-\frac{1}{2} (4 - x)^{-1/2}$, are negative, so they are decreasing functions.
Examining second derivatives shows some differences.
• For the first function, $y''(x) = \frac{3}{4} x^{-5/2} > 0$, so the slopes of tangent lines are increasing with $x$
and the graph lies above its tangent lines.
• However, the second function has $y''(x) = -\frac{1}{4} (4 - x)^{-3/2} < 0$ so the slopes of the tangent
lines are decreasing with $x$ and the graph lies below its tangent lines.
More generally
Definition 7.3.1.

Let f (x) be a continuous function on the interval [a, b] and suppose its first and second
derivatives exist on that interval.

• If $f''(x) > 0$ for all $a < x < b$, then the graph of $f$ lies above its tangent lines for
$a < x < b$ and it is said to be concave up.

• If $f''(x) < 0$ for all $a < x < b$, then the graph of $f$ lies below its tangent lines for
$a < x < b$ and it is said to be concave down.

• If $f''(c) = 0$ for some $a < c < b$, and the concavity of $f$ changes across $x = c$, then
we call $(c, f(c))$ an inflection point.

[Figure: a curve with a concave down portion and a concave up portion, meeting at the inflection point $(c, f(c))$.]

Note that one might also see the terms


• “convex” or “convex up” used in place of “concave up”, and

• “concave” or “convex down” used to mean “concave down”.

To avoid confusion we recommend the reader stick with the terms “concave up” and “concave
down”.
Let’s now continue Example 7.2.1 by discussing the concavity of the curve.
Example 7.3.2 (Continuation of Example 7.2.1)
Consider again the function

\[ f(x) = x^4 - 6x^3 \]

• Its first derivative is $f'(x) = 4x^3 - 18x^2$, so
\[ f''(x) = 12x^2 - 36x = 12x(x - 3) \]

• Thus the second derivative is zero (and potentially changes sign) at $x = 0, 3$. Thus we should
consider the sign of the second derivative on the following intervals
\[ (-\infty, 0) \qquad (0, 3) \qquad (3, \infty) \]
A little algebra gives us

interval     $(-\infty, 0)$   $0$          $(0, 3)$   $3$          $(3, \infty)$
$f''(x)$     positive         $0$          negative   $0$          positive
concavity    up               inflection   down       inflection   up

Since the concavity changes at both $x = 0$ and $x = 3$, the following are inflection points
\[ (0, 0) \qquad (3, 3^4 - 6 \times 3^3) = (3, -3^4) \]

• Putting this together with the information we obtained earlier gives us the following sketch


Example 7.3.2
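As a rough numerical companion to Example 7.3.2, the following sketch tests whether $f''$ actually changes sign across each of its zeros. The helper `changes_sign` and the offset `h = 0.5` are illustrative assumptions; the offset only needs to be smaller than the gap between neighbouring zeros of $f''$.

```python
def f(x):
    return x**4 - 6 * x**3

def fsecond(x):
    return 12 * x**2 - 36 * x

def changes_sign(g, c, h=0.5):
    """True if g has opposite signs at c - h and c + h."""
    return g(c - h) * g(c + h) < 0

candidates = [0, 3]  # zeros of f''
inflection_points = [(c, f(c)) for c in candidates if changes_sign(fsecond, c)]
print(inflection_points)
```

A zero of $f''$ at which `changes_sign` returned `False` would not be an inflection point, which is why the sign test is needed in addition to solving $f''(x) = 0$.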


Example 7.3.3 Optional — $y = x^{1/3}$ and $y = x^{2/3}$

In our Definition 7.3.1, concerning concavity and inflection points, we considered only functions
having first and second derivatives on the entire interval of interest. In this example, we will consider
the functions
\[ f(x) = x^{1/3} \qquad g(x) = x^{2/3} \]
We shall see that $x = 0$ is a singular point for both of those functions. There is no universal agreement
as to precisely when a singular point should also be called an inflection point. We choose to extend
our definition of inflection point in Definition 7.3.1 as follows. If
• the function $f(x)$ is defined and continuous on an interval $a < x < b$ and if
• the first and second derivatives $f'(x)$ and $f''(x)$ exist on $a < x < b$ except possibly at the single
point $a < c < b$ and if
• $f$ is concave up on one side of $c$ and is concave down on the other side of $c$

then we say that $(c, f(c))$ is an inflection point of $y = f(x)$. Now let’s check out $y = f(x)$ and
$y = g(x)$ from this point of view.
(1) Features of $y = f(x)$ and $y = g(x)$ that are read off of $f(x)$ and $g(x)$:
• Since $f(0) = 0^{1/3} = 0$ and $g(0) = 0^{2/3} = 0$, the origin $(0, 0)$ lies on both $y = f(x)$ and
$y = g(x)$.
• For example, $1^3 = 1$ and $(-1)^3 = -1$ so that the cube root of $1$ is $1^{1/3} = 1$ and the cube
root of $-1$ is $(-1)^{1/3} = -1$. In general,
\[ x^{1/3} \begin{cases} < 0 & \text{if } x < 0 \\ = 0 & \text{if } x = 0 \\ > 0 & \text{if } x > 0 \end{cases} \]
Consequently the graph $y = f(x) = x^{1/3}$ lies below the x-axis when $x < 0$ and lies above
the x-axis when $x > 0$. On the other hand, the graph $y = g(x) = x^{2/3} = \left[ x^{1/3} \right]^2$ lies on or
above the x-axis for all $x$.
• As $x \to +\infty$, both $y = f(x) = x^{1/3}$ and $y = g(x) = x^{2/3}$ tend to $+\infty$.
• As $x \to -\infty$, $y = f(x) = x^{1/3}$ tends to $-\infty$ and $y = g(x) = x^{2/3}$ tends to $+\infty$.
(2) Features of $y = f(x)$ and $y = g(x)$ that are read off of $f'(x)$ and $g'(x)$:
\[ f'(x) = \begin{cases} \frac{1}{3} x^{-2/3} & \text{if } x \ne 0 \\ \text{undefined} & \text{if } x = 0 \end{cases} \implies f'(x) > 0 \text{ for all } x \ne 0 \]
\[ g'(x) = \begin{cases} \frac{2}{3} x^{-1/3} & \text{if } x \ne 0 \\ \text{undefined} & \text{if } x = 0 \end{cases} \implies g'(x) \begin{cases} < 0 & \text{if } x < 0 \\ > 0 & \text{if } x > 0 \end{cases} \]


So the graph $y = f(x)$ is increasing on both sides of the singular point $x = 0$, while the graph
$y = g(x)$ is decreasing to the left of $x = 0$ and is increasing to the right of $x = 0$. As $x \to 0$, $f'(x)$
and $g'(x)$ become infinite. That is, the slopes of the tangent lines at $(x, f(x))$ and $(x, g(x))$
become infinite and the tangent lines become vertical.

(3) Features of $y = f(x)$ and $y = g(x)$ that are read off of $f''(x)$ and $g''(x)$:
\[ f''(x) = \begin{cases} -\frac{2}{9} x^{-5/3} = -\frac{2}{9} \left[ x^{-1/3} \right]^5 & \text{if } x \ne 0 \\ \text{undefined} & \text{if } x = 0 \end{cases} \implies f''(x) \begin{cases} > 0 & \text{if } x < 0 \\ < 0 & \text{if } x > 0 \end{cases} \]
\[ g''(x) = \begin{cases} -\frac{2}{9} x^{-4/3} = -\frac{2}{9} \left[ x^{-1/3} \right]^4 & \text{if } x \ne 0 \\ \text{undefined} & \text{if } x = 0 \end{cases} \implies g''(x) < 0 \text{ for all } x \ne 0 \]

So the graph y = g(x) is concave down on both sides of the singular point x = 0, while the
graph y = f (x) is concave up to the left of x = 0 and is concave down to the right of x = 0.

By way of summary, we have, for f (x),

interval    $(-\infty, 0)$   $0$          $(0, \infty)$
$f(x)$      negative         $0$          positive
$f'(x)$     positive         undefined    positive
            increasing                    increasing
$f''(x)$    positive         undefined    negative
            concave up       inflection   concave down

and for $g(x)$,

interval    $(-\infty, 0)$   $0$          $(0, \infty)$
$g(x)$      positive         $0$          positive
$g'(x)$     negative         undefined    positive
            decreasing                    increasing
$g''(x)$    negative         undefined    negative
            concave down                  concave down

Since the concavity changes at $x = 0$ for $y = f(x)$, but not for $y = g(x)$, $(0, 0)$ is an inflection point
for $y = f(x)$, but not for $y = g(x)$. We have the following sketch for $y = f(x) = x^{1/3}$,

[Figure: sketch of $y = f(x) = x^{1/3}$ with the inflection point $(0, 0)$ marked. For $x < 0$: $f' > 0$, $f$ increasing; $f'' > 0$, $f$ concave up. For $x > 0$: $f' > 0$, $f$ increasing; $f'' < 0$, $f$ concave down.]

and the following sketch for $y = g(x) = x^{2/3}$.

[Figure: sketch of $y = g(x) = x^{2/3}$ with the point $(0, 0)$ marked. For $x < 0$: $g' < 0$, $g$ decreasing; $g'' < 0$, $g$ concave down. For $x > 0$: $g' > 0$, $g$ increasing; $g'' < 0$, $g$ concave down.]

Note that the curve $y = f(x) = x^{1/3}$ looks perfectly smooth, even though $f'(x) \to \infty$ as $x \to 0$.
There is no kink or discontinuity at $(0, 0)$. The singularity at $x = 0$ has caused the y-axis to be a
vertical tangent to the curve, but has not prevented the curve from looking smooth.
Example 7.3.3
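Readers who want to experiment with Example 7.3.3 numerically should be aware that, in Python, raising a negative float to the power `1/3` returns a complex number rather than the real cube root. The sketch below works around this with a small helper `cbrt` (our own name, not part of the text) and uses a central second difference, with an assumed step `h = 1e-3`, to estimate the sign of the second derivative on either side of $x = 0$.

```python
import math

def cbrt(x):
    """Real cube root, valid for negative x as well."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def f(x):
    return cbrt(x)          # x^(1/3)

def g(x):
    return cbrt(x) ** 2     # x^(2/3) = (x^(1/3))^2

def second_difference(func, x, h=1e-3):
    """Central second difference; its sign approximates the sign of func''(x)."""
    return func(x + h) - 2 * func(x) + func(x - h)

for x in (-1.0, 1.0):
    print(x, second_difference(f, x) > 0, second_difference(g, x) > 0)
```

A positive second difference suggests concave up and a negative one concave down, so the output can be compared with the two summary tables in the example.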

7.4 (optional) Symmetries


Before we proceed to some examples, we should examine some simple symmetries possessed by
some functions. We’ll look at three symmetries — evenness, oddness and periodicity. If a function
possesses one of these symmetries then it can be exploited to reduce the amount of work required to
sketch the graph of the function. (You can, however, still sketch even and odd graphs without taking
advantage of evenness and oddness.)
Let us start with even and odd functions.
Definition 7.4.1.

A function f (x) is said to be even if f (´x) = f (x) for all x.

Definition 7.4.2.

A function f (x) is said to be odd if f (´x) = ´ f (x) for all x.

Example 7.4.3
Let $f(x) = x^2$ and $g(x) = x^3$. Then
\begin{align*}
f(-x) &= (-x)^2 = x^2 = f(x) \\
g(-x) &= (-x)^3 = -x^3 = -g(x)
\end{align*}
Hence $f(x)$ is even and $g(x)$ is odd.


Notice any polynomial involving only even powers of x will be even


\begin{align*}
f(x) &= 7x^6 + 2x^4 - 3x^2 + 5 && \text{remember that } 5 = 5x^0 \\
f(-x) &= 7(-x)^6 + 2(-x)^4 - 3(-x)^2 + 5 \\
&= 7x^6 + 2x^4 - 3x^2 + 5 = f(x)
\end{align*}
Similarly any polynomial involving only odd powers of $x$ will be odd
\begin{align*}
g(x) &= 2x^5 - 8x^3 - 3x \\
g(-x) &= 2(-x)^5 - 8(-x)^3 - 3(-x) \\
&= -2x^5 + 8x^3 + 3x = -g(x)
\end{align*}

Example 7.4.3
Not all even and odd functions are polynomials. For example
\[ |x|, \quad \cos x \quad \text{and} \quad (e^x + e^{-x}) \]
are all even, while
\[ \sin x, \quad \tan x \quad \text{and} \quad (e^x - e^{-x}) \]
are all odd. Indeed, given any function f (x), the function
g(x) = f (x) + f (´x) will be even, and
h(x) = f (x) ´ f (´x) will be odd.
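The claim that $f(x) + f(-x)$ is even and $f(x) - f(-x)$ is odd can be spot-checked numerically. The sketch below takes $f(x) = e^x$ as an illustrative choice; the helper names `even_part` and `odd_part` are ours, and a check at a handful of sample points is of course evidence, not a proof.

```python
import math

def even_part(f):
    return lambda x: f(x) + f(-x)

def odd_part(f):
    return lambda x: f(x) - f(-x)

g = even_part(math.exp)   # e^x + e^(-x)
h = odd_part(math.exp)    # e^x - e^(-x)

samples = [0.0, 0.5, 1.0, 2.5]
is_even = all(math.isclose(g(-x), g(x)) for x in samples)
is_odd = all(math.isclose(h(-x), -h(x), abs_tol=1e-12) for x in samples)
print(is_even, is_odd)
```

The proof itself is one line: replacing $x$ by $-x$ in $f(x) + f(-x)$ just swaps the two terms, while in $f(x) - f(-x)$ it swaps them and flips the sign.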
Now let us see how we can make use of these symmetries to make graph sketching easier. Let
f (x) be an even function. Then
the point $(x_0, y_0)$ lies on the graph of $y = f(x)$
if and only if $y_0 = f(x_0) = f(-x_0)$, which is the case if and only if
the point $(-x_0, y_0)$ lies on the graph of $y = f(x)$.

[Figure: the points $(-x_0, y_0)$ and $(x_0, y_0)$ at the same height $y_0$ on opposite sides of the y-axis.]

Notice that the points $(x_0, y_0)$ and $(-x_0, y_0)$ are just reflections of each other across the y-axis.
Consequently, to draw the graph $y = f(x)$, it suffices to draw the part of the graph with $x \ge 0$ and
then reflect it in the y-axis. Here is an example. The part with $x \ge 0$ is on the left and the full graph
is on the right.

[Figure: left panel, the part of the graph with $x \ge 0$ with the point $(x_0, y_0)$ marked; right panel, the full graph with $(-x_0, y_0)$ and $(x_0, y_0)$ marked.]

Very similarly, when f (x) is an odd function then

$(x_0, y_0)$ lies on the graph of $y = f(x)$

if and only if

$(-x_0, -y_0)$ lies on the graph of $y = f(x)$

[Figure: the points $(x_0, y_0)$, $(-x_0, y_0)$ and $(-x_0, -y_0)$.]

Now the symmetry is a little harder to interpret pictorially. To get from $(x_0, y_0)$ to $(-x_0, -y_0)$ one
can first reflect $(x_0, y_0)$ in the y-axis to get to $(-x_0, y_0)$ and then reflect the result in the x-axis to get
to $(-x_0, -y_0)$. Consequently, to draw the graph $y = f(x)$, it suffices to draw the part of the graph
with $x \ge 0$ and then reflect it first in the y-axis and then in the x-axis. Here is an example. First,
here is the part of the graph with $x \ge 0$.

[Figure: the part of the graph with $x \ge 0$, with the point $(x_0, y_0)$ marked.]

Next, as an intermediate step (usually done in our heads rather than on paper), we add in the
reflection in the y-axis.

[Figure: the same graph together with its reflection in the y-axis (dashed), with $(-x_0, y_0)$ and $(x_0, y_0)$ marked.]

Finally to get the full graph, we reflect the dashed line in the x-axis

[Figure: the graph, the dashed reflection, and the reflection of the dashed line in the x-axis, with $(-x_0, y_0)$, $(x_0, y_0)$ and $(-x_0, -y_0)$ marked.]

and then remove the dashed line.

[Figure: the full graph of the odd function, with $(x_0, y_0)$ and $(-x_0, -y_0)$ marked.]

Let’s do a more substantial example of an even function


Example 7.4.4
Consider the function

\[ g(x) = \frac{x^2 - 9}{x^2 + 3} \]


• The function is even since
\[ g(-x) = \frac{(-x)^2 - 9}{(-x)^2 + 3} = \frac{x^2 - 9}{x^2 + 3} = g(x) \]
Thus it suffices to study the function for $x \ge 0$ because we can then use the even symmetry to
understand what happens for $x < 0$.
• The function is defined on all real numbers since its denominator $x^2 + 3$ is never zero. Hence
it has no vertical asymptotes.
• The y-intercept is $g(0) = \frac{-9}{3} = -3$. And x-intercepts are given by the solution of $x^2 - 9 = 0$,
namely $x = \pm 3$. Note that we only need to establish $x = 3$ as an intercept. Then since $g$ is
even, we know that $x = -3$ is also an intercept.
• To find the horizontal asymptotes we compute the limit as $x \to +\infty$
\begin{align*}
\lim_{x \to \infty} g(x) &= \lim_{x \to \infty} \frac{x^2 - 9}{x^2 + 3} \\
&= \lim_{x \to \infty} \frac{x^2 (1 - 9/x^2)}{x^2 (1 + 3/x^2)} \\
&= \lim_{x \to \infty} \frac{1 - 9/x^2}{1 + 3/x^2} = 1
\end{align*}
Thus $y = 1$ is a horizontal asymptote. Indeed, this is also the asymptote as $x \to -\infty$ since by
the even symmetry
\[ \lim_{x \to -\infty} g(x) = \lim_{x \to \infty} g(-x) = \lim_{x \to \infty} g(x). \]

• We can already produce a quite reasonable sketch just by putting in the horizontal asymptote
and the intercepts and drawing a smooth curve between them.

Note that we have drawn the function as never crossing the asymptote $y = 1$; however, we have
not yet proved that. We could do so by trying to solve $g(x) = 1$:
\begin{align*}
\frac{x^2 - 9}{x^2 + 3} &= 1 \\
x^2 - 9 &= x^2 + 3 \\
-9 &= 3 \qquad \text{so no solutions.}
\end{align*}
Alternatively we could analyse the first derivative to see how the function approaches the
asymptote.

• Now we turn to the first derivative:
\begin{align*}
g'(x) &= \frac{(x^2 + 3)(2x) - (x^2 - 9)(2x)}{(x^2 + 3)^2} \\
&= \frac{24x}{(x^2 + 3)^2}
\end{align*}
There are no singular points since the denominator is nowhere zero. The only critical point is
at $x = 0$. Thus we must find the sign of $g'(x)$ on the intervals
\[ (-\infty, 0) \qquad (0, \infty) \]

• When $x > 0$, $24x > 0$ and $(x^2 + 3) > 0$, so $g'(x) > 0$ and the function is increasing. By even
symmetry we know that when $x < 0$ the function must be decreasing. Hence the critical point
$x = 0$ is a local minimum of the function.

• Notice that since the function is increasing for $x > 0$, it must approach the
horizontal asymptote $y = 1$ from below. Thus the sketch above is quite accurate.

• Now consider the second derivative:
\begin{align*}
g''(x) &= \frac{d}{dx} \frac{24x}{(x^2 + 3)^2} \\
&= \frac{(x^2 + 3)^2 \cdot 24 - 24x \cdot 2(x^2 + 3) \cdot 2x}{(x^2 + 3)^4} && \text{cancel a factor of } (x^2 + 3) \\
&= \frac{(x^2 + 3) \cdot 24 - 96x^2}{(x^2 + 3)^3} \\
&= \frac{72(1 - x^2)}{(x^2 + 3)^3}
\end{align*}

• It is clear that $g''(x) = 0$ when $x = \pm 1$. Note that, again, we can infer the zero at $x = -1$
from the zero at $x = 1$ by the even symmetry. Thus we need to examine the sign of $g''(x)$ on the
intervals
\[ (-\infty, -1) \qquad (-1, 1) \qquad (1, \infty) \]

• When $|x| < 1$ we have $(1 - x^2) > 0$ so that $g''(x) > 0$ and the function is concave up. When
$|x| > 1$ we have $(1 - x^2) < 0$ so that $g''(x) < 0$ and the function is concave down. Thus the
points $x = \pm 1$ are inflection points. Their coordinates are $(\pm 1, g(\pm 1)) = (\pm 1, -2)$.

• Putting this together gives the following sketch:


Example 7.4.4
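Several claims in Example 7.4.4 can be checked with exact rational arithmetic. The sketch below is illustrative only: the sample grid $0, \tfrac12, 1, \dots, 20$ is an arbitrary choice. It confirms that the gap $1 - g(x)$ equals $12/(x^2 + 3)$ at the sample points (so it is positive and the graph stays below $y = 1$ there), that the gap shrinks as $x$ grows, and that $g$ is even.

```python
from fractions import Fraction

def g(x):
    return (x * x - 9) / (x * x + 3)

xs = [Fraction(k, 2) for k in range(0, 41)]      # 0, 1/2, ..., 20
gaps = [1 - g(x) for x in xs]                     # distance below y = 1

print(all(gap == 12 / (x * x + 3) for gap, x in zip(gaps, xs)))
print(all(a > b > 0 for a, b in zip(gaps, gaps[1:])))   # shrinking but positive
print(all(g(-x) == g(x) for x in xs))                   # even symmetry
print(g(Fraction(1)), g(Fraction(0)))                   # inflection height, y-intercept
```

The identity $1 - g(x) = \frac{12}{x^2 + 3}$ follows directly from putting $1$ over the common denominator, and gives a second proof that the graph never touches the asymptote.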

Another symmetry we should consider is periodicity.


Definition 7.4.5.

A function $f(x)$ is said to be periodic, with period $P > 0$, if $f(x + P) = f(x)$ for all $x$.

Note that if $f(x + P) = f(x)$ for all $x$, then replacing $x$ by $x + P$, we have
\[ f(x + 2P) = f(x + P + P) = f(x + P) = f(x). \]
More generally $f(x + kP) = f(x)$ for all integers $k$. Thus if $f$ has period $P$, then it also has period
$nP$ for all natural numbers $n$. The smallest period is called the fundamental period.
Example 7.4.6
The classic example of a periodic function is $f(x) = \sin x$, which has period $2\pi$ since $f(x + 2\pi) =
\sin(x + 2\pi) = \sin x = f(x)$.
Example 7.4.6
If f (x) has period P then
(x0 , y0 ) lies on the graph of y = f (x)
if and only if y0 = f (x0 ) = f (x0 + P) which is the case if and only if
(x0 + P, y0 ) lies on the graph of y = f (x)
and, more generally,
(x0 , y0 ) lies on the graph of y = f (x)
if and only if
(x0 + nP, y0 ) lies on the graph of y = f (x)
for all integers n.
Note that the point (x0 + P, y0 ) can be obtained by translating (x0 , y0 ) horizontally by P. Similarly
the point (x0 + nP, y0 ) can be found by repeatedly translating (x0 , y0 ) horizontally by P.
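The translation property can be tested numerically on a grid of sample points. In the sketch below the helper name `is_periodic_on_samples`, the tolerance and the grid are all illustrative assumptions, and passing the test at finitely many points only suggests periodicity rather than proving it.

```python
import math

def is_periodic_on_samples(f, period, samples, tol=1e-9):
    """Check f(x + k*period) == f(x), up to tol, for a few integers k."""
    return all(abs(f(x + k * period) - f(x)) <= tol
               for x in samples for k in (-2, -1, 1, 2, 3))

samples = [0.1 * i for i in range(-30, 31)]
two_pi_ok = is_periodic_on_samples(math.sin, 2 * math.pi, samples)
pi_ok = is_periodic_on_samples(math.sin, math.pi, samples)
print(two_pi_ok, pi_ok)
```

The second check fails because $\sin(x + \pi) = -\sin x$, so $\pi$ is not a period of $\sin x$ even though $\sin$ vanishes at every multiple of $\pi$.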

[Figure: the points $(x_0 - P, y_0)$, $(x_0, y_0)$, $(x_0 + P, y_0)$ and $(x_0 + 2P, y_0)$, all at height $y_0$ and spaced $P$ apart horizontally.]

Consequently, to draw the graph $y = f(x)$, it suffices to draw one period of the graph, say the part
with $0 \le x \le P$, and then translate it repeatedly. Here is an example. First, a sketch of one period

[Figure: one period of the graph, $0 \le x \le P$, with the point $(x_0, y_0)$ marked.]

and here is the full sketch.

[Figure: the full graph, with $(x_0 - P, y_0)$, $(x_0, y_0)$, $(x_0 + P, y_0)$ and $(x_0 + 2P, y_0)$ marked and $-P$, $P$, $2P$ on the x-axis.]

7.5 A checklist for sketching


Above we have described how we can use our accumulated knowledge of derivatives to quickly
identify the most important qualitative features of graphs y = f (x). Here we give the reader a quick
checklist of things to examine in order to produce an accurate sketch based on properties that are
easily read off from f (x), f 1 (x) and f 2 (x).

§§ A Sketching Checklist.
(1) Features of y = f (x) that are read off of f (x):

• First check where $f(x)$ is defined. Then
– $y = f(x)$ is plotted only for $x$’s in the domain of $f(x)$, i.e. where $f(x)$ is defined.
– $y = f(x)$ has vertical asymptotes at the points where $f(x)$ blows up to $\pm\infty$.
• Next determine whether the function is even, odd, or periodic.
– $y = f(x)$ is first plotted for $x \ge 0$ if the function is even or odd. The rest of the sketch is
then created by reflections.
– $y = f(x)$ is first plotted for a single period if the function is periodic. The rest of the sketch
is then created by translations.
• Next compute $f(0)$, $\lim_{x \to \infty} f(x)$ and $\lim_{x \to -\infty} f(x)$ and look for solutions to $f(x) = 0$
that you can easily find. Then
– $y = f(x)$ has y-intercept $(0, f(0))$.
– $y = f(x)$ has x-intercept $(a, 0)$ whenever $f(a) = 0$.
– $y = f(x)$ has horizontal asymptote $y = L$ if $\lim_{x \to \infty} f(x) = L$ or $\lim_{x \to -\infty} f(x) = L$.

(2) Features of $y = f(x)$ that are read off of $f'(x)$:

• Compute $f'(x)$ and determine its critical points and singular points, then
– $y = f(x)$ has a horizontal tangent at the points where $f'(x) = 0$.
– $y = f(x)$ is increasing at points where $f'(x) > 0$.
– $y = f(x)$ is decreasing at points where $f'(x) < 0$.
– $y = f(x)$ has vertical tangents or vertical asymptotes at the points where $f'(x) = \pm\infty$.

(3) Features of $y = f(x)$ that are read off of $f''(x)$:

• Compute $f''(x)$ and determine where $f''(x) = 0$ or does not exist, then
– $y = f(x)$ is concave up at points where $f''(x) > 0$.
– $y = f(x)$ is concave down at points where $f''(x) < 0$.
– $y = f(x)$ may or may not have inflection points where $f''(x) = 0$.
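Steps (2) and (3) of the checklist both come down to building a sign chart. A minimal sketch of that idea follows; the function name `sign_chart` and its sampling scheme are our own illustration, and it assumes the function being tested is continuous and nonzero between consecutive break points, so that one sample per interval determines the sign there.

```python
def sign_chart(func, break_points):
    """Sign of func on each open interval cut out by the break points.

    Assumes func is continuous and nonzero between consecutive break
    points, so a single sample per interval is enough.
    """
    pts = sorted(break_points)
    # One unit left of the first point, midpoints, one unit right of the last.
    samples = ([pts[0] - 1]
               + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
               + [pts[-1] + 1])
    return ["+" if func(x) > 0 else "-" for x in samples]

# Illustrative usage with f(x) = x^3 - 3x + 1.
fprime = lambda x: 3 * x**2 - 3     # zeros at x = -1, 1
fsecond = lambda x: 6 * x           # zero at x = 0
print(sign_chart(fprime, [-1, 1]))
print(sign_chart(fsecond, [0]))
```

For a function with singular points, those points must be included among the break points as well, since the sign may change across them.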

7.6 Sketching examples



Example 7.6.1 Sketch $f(x) = x^3 - 3x + 1$

(1) Reading from $f(x)$:

• The function is a polynomial so it is defined everywhere.
• Since $f(-x) = -x^3 + 3x + 1 \ne \pm f(x)$, it is not even or odd. Nor is it periodic.
• The y-intercept is $y = 1$. The x-intercepts are not easily computed since it is a cubic
polynomial that does not factor nicely [2]. So for this example we don’t worry about finding
them.
• Since it is a polynomial it has no vertical asymptotes.

[2] With the aid of a computer we can find the x-intercepts numerically: $x \approx -1.879385242$, $0.3472963553$, and
$1.532088886$.

• For very large $x$, both positive and negative, the $x^3$ term in $f(x)$ dominates the other two
terms so that
\[ f(x) \to \begin{cases} +\infty & \text{as } x \to +\infty \\ -\infty & \text{as } x \to -\infty \end{cases} \]
and there are no horizontal asymptotes.

(2) We now compute the derivative:
\[ f'(x) = 3x^2 - 3 = 3(x^2 - 1) = 3(x + 1)(x - 1) \]

• The critical points (where $f'(x) = 0$) are at $x = \pm 1$. Further since the derivative is a
polynomial it is defined everywhere and there are no singular points. The critical points
split the real line into the intervals $(-\infty, -1)$, $(-1, 1)$ and $(1, \infty)$.

• When $x < -1$, both factors $(x + 1), (x - 1) < 0$ so $f'(x) > 0$.

• Similarly when $x > 1$, both factors $(x + 1), (x - 1) > 0$ so $f'(x) > 0$.

• When $-1 < x < 1$, $(x - 1) < 0$ but $(x + 1) > 0$ so $f'(x) < 0$.

• Summarising all this

            $(-\infty, -1)$   $-1$      $(-1, 1)$    $1$       $(1, \infty)$
$f'(x)$     positive          $0$       negative     $0$       positive
            increasing        maximum   decreasing   minimum   increasing

So $(-1, f(-1)) = (-1, 3)$ is a local maximum and $(1, f(1)) = (1, -1)$ is a local minimum.

(3) Compute the second derivative:
\[ f''(x) = 6x \]

• The second derivative is zero when $x = 0$, and the problem is quite easy to analyse. Clearly,
$f''(x) < 0$ when $x < 0$ and $f''(x) > 0$ when $x > 0$.

• Thus $f$ is concave down for $x < 0$, concave up for $x > 0$ and has an inflection point at
$x = 0$.

Putting this all together gives:

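[Figure: sketch of $y = x^3 - 3x + 1$ with the local maximum $(-1, 3)$, the inflection point $(0, 1)$ and the local minimum $(1, -1)$ marked. For $x < -1$: $f' > 0$, $f$ increasing; for $-1 < x < 1$: $f' < 0$, $f$ decreasing; for $x > 1$: $f' > 0$, $f$ increasing. For $x < 0$: $f'' < 0$, $f$ concave down; for $x > 0$: $f'' > 0$, $f$ concave up.]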

Example 7.6.1
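The footnote to Example 7.6.1 quotes numerical values for the x-intercepts. One simple way to obtain such values, sketched below, is bisection. The brackets $(-2, -1)$, $(0, 1)$ and $(1, 2)$ are our choice: they work because $f(-2) = -1$, $f(-1) = 3$, $f(0) = 1$, $f(1) = -1$ and $f(2) = 3$, so $f$ changes sign across each of them. The tolerance is likewise an assumption, and this is not necessarily the method the authors used.

```python
def f(x):
    return x**3 - 3 * x + 1

def bisect(func, lo, hi, tol=1e-10):
    """Bisection on [lo, hi]; func(lo) and func(hi) must have opposite signs."""
    flo = func(lo)
    if flo * func(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = func(mid)
        if flo * fmid <= 0:
            hi = mid              # sign change lies in [lo, mid]
        else:
            lo, flo = mid, fmid   # sign change lies in [mid, hi]
    return (lo + hi) / 2

brackets = [(-2, -1), (0, 1), (1, 2)]
roots = [bisect(f, lo, hi) for lo, hi in brackets]
print(roots)
```

Note how the sketch itself supplies the brackets: the local maximum lies above the x-axis and the local minimum below it, so the curve must cross the axis once on each of the three monotone pieces.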


Example 7.6.2 Sketch $f(x) = x^4 - 4x^3$

(1) Reading from $f(x)$:

• The function is a polynomial so it is defined everywhere.
• Since $f(-x) = x^4 + 4x^3 \ne \pm f(x)$, it is not even or odd. Nor is it periodic.
• The y-intercept is $y = f(0) = 0$, while the x-intercepts are given by the solution of
\begin{align*}
f(x) = x^4 - 4x^3 &= 0 \\
x^3 (x - 4) &= 0
\end{align*}
Hence the x-intercepts are $0, 4$.
• Since $f$ is a polynomial it does not have any vertical asymptotes.
• For very large $x$, both positive and negative, the $x^4$ term in $f(x)$ dominates the other term
so that
\[ f(x) \to \begin{cases} +\infty & \text{as } x \to +\infty \\ +\infty & \text{as } x \to -\infty \end{cases} \]
and the function has no horizontal asymptotes.

(2) Now compute the derivative $f'(x)$:
\[ f'(x) = 4x^3 - 12x^2 = 4(x - 3)x^2 \]

• The critical points are at $x = 0, 3$. Since the function is a polynomial there are no singular
points. The critical points split the real line into the intervals $(-\infty, 0)$, $(0, 3)$ and $(3, \infty)$.


• When $x < 0$, $x^2 > 0$ and $x - 3 < 0$, so $f'(x) < 0$.
• When $0 < x < 3$, $x^2 > 0$ and $x - 3 < 0$, so $f'(x) < 0$.
• When $3 < x$, $x^2 > 0$ and $x - 3 > 0$, so $f'(x) > 0$.
• Summarising all this

            $(-\infty, 0)$   $0$                  $(0, 3)$     $3$       $(3, \infty)$
$f'(x)$     negative         $0$                  negative     $0$       positive
            decreasing       horizontal tangent   decreasing   minimum   increasing

So the point $(3, f(3)) = (3, -27)$ is a local minimum. The point $(0, f(0)) = (0, 0)$ is
neither a minimum nor a maximum, even though $f'(0) = 0$.
(3) Now examine $f''(x)$:
\[ f''(x) = 12x^2 - 24x = 12x(x - 2) \]
• So $f''(x) = 0$ when $x = 0, 2$. This splits the real line into the intervals $(-\infty, 0)$, $(0, 2)$ and
$(2, \infty)$.
• When $x < 0$, $x - 2 < 0$ and so $f''(x) > 0$.
• When $0 < x < 2$, $x > 0$ and $x - 2 < 0$ and so $f''(x) < 0$.
• When $2 < x$, $x > 0$ and $x - 2 > 0$ and so $f''(x) > 0$.
• Thus the function is concave up for $x < 0$, then concave down for $0 < x < 2$, and finally
concave up again for $x > 2$. Hence $(0, f(0)) = (0, 0)$ and $(2, f(2)) = (2, -16)$ are inflection
points.
Putting all this information together gives us the following sketch.

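[Figure: sketch of $y = x^4 - 4x^3$ with the points $(0, 0)$, $(4, 0)$, $(2, -16)$ and $(3, -27)$ marked. For $x < 0$ and for $0 < x < 3$: $f' < 0$, $f$ decreasing; for $x > 3$: $f' > 0$, $f$ increasing. For $x < 0$: $f'' > 0$, $f$ concave up; for $0 < x < 2$: $f'' < 0$, $f$ concave down; for $x > 2$: $f'' > 0$, $f$ concave up.]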

Example 7.6.2


Example 7.6.3 $f(x) = x^3 - 6x^2 + 9x - 54$


(1) Reading from f (x):

• The function is a polynomial so it is defined everywhere.


• Since $f(-x) = -x^3 - 6x^2 - 9x - 54 \ne \pm f(x)$, it is not even or odd. Nor is it periodic.
• The y-intercept is $y = f(0) = -54$, while the x-intercepts are given by the solution of
\begin{align*}
f(x) = x^3 - 6x^2 + 9x - 54 &= 0 \\
x^2 (x - 6) + 9 (x - 6) &= 0 \\
(x^2 + 9)(x - 6) &= 0
\end{align*}
Hence the only x-intercept is 6.
• Since $f$ is a polynomial it does not have any vertical asymptotes.
• For very large $x$, both positive and negative, the $x^3$ term in $f(x)$ dominates the other terms
so that
\[ f(x) \to \begin{cases} +\infty & \text{as } x \to +\infty \\ -\infty & \text{as } x \to -\infty \end{cases} \]
and the function has no horizontal asymptotes.

(2) Now compute the derivative $f'(x)$:
\begin{align*}
f'(x) &= 3x^2 - 12x + 9 \\
&= 3(x^2 - 4x + 3) = 3(x - 3)(x - 1)
\end{align*}

• The critical points are at $x = 1, 3$. Since the function is a polynomial there are no singular
points. The critical points split the real line into the intervals $(-\infty, 1)$, $(1, 3)$ and $(3, \infty)$.
• When $x < 1$, $(x - 1) < 0$ and $(x - 3) < 0$, so $f'(x) > 0$.
• When $1 < x < 3$, $(x - 1) > 0$ and $(x - 3) < 0$, so $f'(x) < 0$.
• When $3 < x$, $(x - 1) > 0$ and $(x - 3) > 0$, so $f'(x) > 0$.
• Summarising all this

            $(-\infty, 1)$   $1$       $(1, 3)$     $3$       $(3, \infty)$
$f'(x)$     positive         $0$       negative     $0$       positive
            increasing       maximum   decreasing   minimum   increasing

So the point $(1, f(1)) = (1, -50)$ is a local maximum. The point $(3, f(3)) = (3, -54)$ is
a local minimum.

(3) Now examine f 2 (x):

\[ f''(x) = 6x - 12 \]

• So $f''(x) = 0$ when $x = 2$. This splits the real line into the intervals $(-\infty, 2)$ and $(2, \infty)$.
• When $x < 2$, $f''(x) < 0$.
• When $x > 2$, $f''(x) > 0$.
• Thus the function is concave down for $x < 2$, then concave up for $x > 2$. Hence $(2, f(2)) =
(2, -52)$ is an inflection point.
Putting all this information together gives us the following sketch.

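[Figure: sketch of $y = x^3 - 6x^2 + 9x - 54$ with the points $(6, 0)$, $(0, -54)$, $(1, -50)$, $(2, -52)$ and $(3, -54)$ marked.]

and if we zoom in around the interesting points (minimum, maximum and inflection point), we have

[Figure: close-up of the graph near $(0, -54)$, $(1, -50)$, $(2, -52)$ and $(3, -54)$. For $x < 1$: $f' > 0$, $f$ increasing; for $1 < x < 3$: $f' < 0$, $f$ decreasing; for $x > 3$: $f' > 0$, $f$ increasing. For $x < 2$: $f'' < 0$, $f$ concave down; for $x > 2$: $f'' > 0$, $f$ concave up.]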

Example 7.6.3

An example of sketching a simple rational function.

Example 7.6.4 $f(x) = \dfrac{x}{x^2 - 4}$

(1) Reading from $f(x)$:

• The function is rational so it is defined except where its denominator is zero, namely at
$x = \pm 2$.


• Since $f(-x) = \dfrac{-x}{x^2 - 4} = -f(x)$, it is odd. Indeed this means that we only need to examine
what happens to the function for $x \ge 0$ and we can then infer what happens for $x \le 0$ using
$f(-x) = -f(x)$. In practice we will sketch the graph for $x \ge 0$ and then infer the rest from
this symmetry.
• The y-intercept is $y = f(0) = 0$, while the x-intercepts are given by the solution of $f(x) = 0$.
So the only x-intercept is 0.
• Since $f$ is rational, it may have vertical asymptotes where its denominator is zero, at
$x = \pm 2$. Since the function is odd, we only have to analyse the asymptote at $x = 2$ and we
can then infer what happens at $x = -2$ by symmetry.
\begin{align*}
\lim_{x \to 2^+} f(x) &= \lim_{x \to 2^+} \frac{x}{(x - 2)(x + 2)} = +\infty \\
\lim_{x \to 2^-} f(x) &= \lim_{x \to 2^-} \frac{x}{(x - 2)(x + 2)} = -\infty
\end{align*}
• We now check for horizontal asymptotes:
\begin{align*}
\lim_{x \to +\infty} f(x) &= \lim_{x \to +\infty} \frac{x}{x^2 - 4} \\
&= \lim_{x \to +\infty} \frac{1}{x - 4/x} = 0
\end{align*}

(2) Now compute the derivative $f'(x)$:
\begin{align*}
f'(x) &= \frac{(x^2 - 4) \cdot 1 - x \cdot 2x}{(x^2 - 4)^2} \\
&= \frac{-(x^2 + 4)}{(x^2 - 4)^2}
\end{align*}
• Hence there are no critical points. There are singular points where the denominator is zero,
namely $x = \pm 2$. Before we proceed, notice that the numerator is always negative and the
denominator is always positive. Hence $f'(x) < 0$ except at $x = \pm 2$ where it is undefined.
• The function is decreasing except at $x = \pm 2$.
• We already know that at $x = 2$ we have a vertical asymptote and that $f'(x) < 0$ for all $x \ne \pm 2$. So
\[ \lim_{x \to 2} f'(x) = -\infty \]
• Summarising all this

            $[0, 2)$     $2$                  $(2, \infty)$
$f'(x)$     negative     DNE                  negative
            decreasing   vertical asymptote   decreasing

Remember: we will draw the graph for $x \ge 0$ and then use the odd symmetry to infer the
graph for $x < 0$.


(3) Now examine $f''(x)$:
\begin{align*}
f''(x) &= -\frac{(x^2 - 4)^2 \cdot (2x) - (x^2 + 4) \cdot 2 \cdot 2x \cdot (x^2 - 4)}{(x^2 - 4)^4} \\
&= -\frac{(x^2 - 4) \cdot (2x) - (x^2 + 4) \cdot 4x}{(x^2 - 4)^3} \\
&= -\frac{2x^3 - 8x - 4x^3 - 16x}{(x^2 - 4)^3} \\
&= \frac{2x(x^2 + 12)}{(x^2 - 4)^3}
\end{align*}

• So $f''(x) = 0$ when $x = 0$ and does not exist when $x = \pm 2$. This splits the real line into the
intervals $(-\infty, -2)$, $(-2, 0)$, $(0, 2)$ and $(2, \infty)$. However we only need to consider $x \ge 0$
(because of the odd symmetry).
• When $0 < x < 2$, $x > 0$, $(x^2 + 12) > 0$ and $(x^2 - 4) < 0$ so $f''(x) < 0$.
• When $x > 2$, $x > 0$, $(x^2 + 12) > 0$ and $(x^2 - 4) > 0$ so $f''(x) > 0$.

Putting all this information together gives the following sketch for x ě 0:

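[Figure: sketch of $y = \frac{x}{x^2 - 4}$ for $x \ge 0$, with the vertical asymptote at $x = 2$. For $0 < x < 2$: $f'' < 0$, concave down; for $x > 2$: $f'' > 0$, concave up.]

We can then draw in the graph for $x < 0$ using $f(-x) = -f(x)$:

[Figure: full sketch of $y = \frac{x}{x^2 - 4}$, with vertical asymptotes at $x = -2$ and $x = 2$ and the inflection point at the origin marked.]

Notice that this means that the concavity changes at $x = 0$, so the point $(0, f(0)) = (0, 0)$ is a point
of inflection (as indicated).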


Example 7.6.4
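Quotient-rule computations like those in Example 7.6.4 are easy to get wrong by a sign, so a numerical sanity check can be worthwhile. The sketch below compares the closed forms for $f'$ and $f''$ against central difference quotients; the step size `h`, the sample points (chosen away from $x = \pm 2$) and the tolerance are all illustrative assumptions.

```python
def f(x):
    return x / (x**2 - 4)

def fprime(x):
    return -(x**2 + 4) / (x**2 - 4) ** 2

def fsecond(x):
    return 2 * x * (x**2 + 12) / (x**2 - 4) ** 3

def central_diff(func, x, h=1e-5):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

samples = [-5.0, -3.0, -1.0, 0.5, 1.0, 3.0, 5.0]   # avoid x = -2 and x = 2

err_first = max(abs(central_diff(f, x) - fprime(x)) for x in samples)
err_second = max(abs(central_diff(fprime, x) - fsecond(x)) for x in samples)
print(err_first < 1e-6, err_second < 1e-6)

# Odd symmetry: f(-x) = -f(x) at every sample point.
print(all(f(-x) == -f(x) for x in samples))
```

Agreement at sample points does not prove the formulas, but a dropped minus sign would show up immediately as a large discrepancy.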

This final example is more substantial since the function has singular points (points where the
derivative is undefined). The analysis is more involved.

Example 7.6.5 $f(x) = \sqrt[3]{\dfrac{x^2}{(x - 6)^2}}$

(1) Reading from $f(x)$:

• First notice that we can rewrite
\[ f(x) = \sqrt[3]{\frac{x^2}{(x - 6)^2}} = \sqrt[3]{\frac{x^2}{x^2 \cdot (1 - 6/x)^2}} = \sqrt[3]{\frac{1}{(1 - 6/x)^2}} \]

• The function is the cube root of a rational function. The rational function is defined except
at $x = 6$, so the domain of $f$ is all reals except $x = 6$.
• Clearly the function is not periodic, and examining
\begin{align*}
f(-x) &= \sqrt[3]{\frac{1}{(1 - 6/(-x))^2}} \\
&= \sqrt[3]{\frac{1}{(1 + 6/x)^2}} \ne \pm f(x)
\end{align*}
shows the function is neither even nor odd.

• To compute horizontal asymptotes we examine the limit of the portion of the function
inside the cube-root
\[ \lim_{x \to \pm\infty} \frac{1}{\left( 1 - \frac{6}{x} \right)^2} = 1 \]
This means we have
\[ \lim_{x \to \pm\infty} f(x) = 1 \]
That is, the line $y = 1$ will be a horizontal asymptote to the graph $y = f(x)$ both for
$x \to +\infty$ and for $x \to -\infty$.
• Our function $f(x) \to +\infty$ as $x \to 6$, because of the $(1 - 6/x)^2$ in its denominator. So
$y = f(x)$ has $x = 6$ as a vertical asymptote.

(2) Now compute $f'(x)$. Since we rewrote
\[ f(x) = \sqrt[3]{\frac{1}{(1 - 6/x)^2}} = \left( 1 - \frac{6}{x} \right)^{-2/3} \]


we can use the chain rule
\begin{align*}
f'(x) &= -\frac{2}{3} \left( 1 - \frac{6}{x} \right)^{-5/3} \frac{6}{x^2} \\
&= -4 \left( \frac{x - 6}{x} \right)^{-5/3} \frac{1}{x^2} \\
&= -4 \left( \frac{1}{x - 6} \right)^{5/3} \frac{1}{x^{1/3}}
\end{align*}

• Notice that the derivative is nowhere equal to zero, so the function has no critical points.
However there are two places the derivative is undefined. The terms
\[ \left( \frac{1}{x - 6} \right)^{5/3} \qquad \frac{1}{x^{1/3}} \]
are undefined at $x = 6, 0$ respectively. Hence $x = 0, 6$ are singular points. These split the
real line into the intervals $(-\infty, 0)$, $(0, 6)$ and $(6, \infty)$.
• When $x < 0$, $(x - 6) < 0$, we have that $(x - 6)^{-5/3} < 0$ and $x^{-1/3} < 0$ and so $f'(x) =
-4 \cdot (\text{negative}) \cdot (\text{negative}) < 0$.
• When $0 < x < 6$, $(x - 6) < 0$, we have that $(x - 6)^{-5/3} < 0$ and $x^{-1/3} > 0$ and so $f'(x) > 0$.
• When $x > 6$, $(x - 6) > 0$, we have that $(x - 6)^{-5/3} > 0$ and $x^{-1/3} > 0$ and so $f'(x) < 0$.
• We should also examine the behaviour of the derivative as $x \to 0$ and $x \to 6$.
\begin{align*}
\lim_{x \to 0^-} f'(x) &= -4 \left( \lim_{x \to 0^-} (x - 6)^{-5/3} \right) \left( \lim_{x \to 0^-} x^{-1/3} \right) = -\infty \\
\lim_{x \to 0^+} f'(x) &= -4 \left( \lim_{x \to 0^+} (x - 6)^{-5/3} \right) \left( \lim_{x \to 0^+} x^{-1/3} \right) = +\infty \\
\lim_{x \to 6^-} f'(x) &= -4 \left( \lim_{x \to 6^-} (x - 6)^{-5/3} \right) \left( \lim_{x \to 6^-} x^{-1/3} \right) = +\infty \\
\lim_{x \to 6^+} f'(x) &= -4 \left( \lim_{x \to 6^+} (x - 6)^{-5/3} \right) \left( \lim_{x \to 6^+} x^{-1/3} \right) = -\infty
\end{align*}
We already know that $x = 6$ is a vertical asymptote of the function, so it is not surprising
that the lines tangent to the graph become vertical as we approach 6. The behaviour around
$x = 0$ is less standard, since the lines tangent to the graph become vertical, but $x = 0$ is not
a vertical asymptote of the function. Indeed the function takes a finite value $y = f(0) = 0$.
• Summarising all this

            $(-\infty, 0)$   $0$                 $(0, 6)$     $6$                  $(6, \infty)$
$f'(x)$     negative         DNE                 positive     DNE                  negative
            decreasing       vertical tangents   increasing   vertical asymptote   decreasing


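(3) Now look at $f''(x)$:
\begin{align*}
f''(x) &= -4 \frac{d}{dx} \left[ \left( \frac{1}{x - 6} \right)^{5/3} \frac{1}{x^{1/3}} \right]
= -4 \left[ -\frac{5}{3} \left( \frac{1}{x - 6} \right)^{8/3} \frac{1}{x^{1/3}} - \frac{1}{3} \left( \frac{1}{x - 6} \right)^{5/3} \frac{1}{x^{4/3}} \right] \\
&= \frac{4}{3} \left( \frac{1}{x - 6} \right)^{8/3} \frac{1}{x^{4/3}} \, [5x + (x - 6)] \\
&= 8 \left( \frac{1}{x - 6} \right)^{8/3} \frac{1}{x^{4/3}} \, [x - 1]
\end{align*}
Oof!

• Both of the factors $\left( \frac{1}{x - 6} \right)^{8/3} = \left( \frac{1}{\sqrt[3]{x - 6}} \right)^8$ and $\frac{1}{x^{4/3}} = \left( \frac{1}{\sqrt[3]{x}} \right)^4$ are even powers and so are
positive (though possibly infinite). So the sign of $f''(x)$ is the same as the sign of the factor
$x - 1$. Thus

            $(-\infty, 1)$   $1$                $(1, \infty)$
$f''(x)$    negative         $0$                positive
            concave down     inflection point   concave up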

Here is a sketch of the graph y = f (x).

q
x2
y= 3
(x−6)2

6 x

f ′ <0, f decreasing f ′ >0 f ′ <0, f decreasing

1
It is hard to see the inflection point at x = 1, y = f (1) = ?
3 in the above sketch. So here is a blow
25
up of the part of the sketch around x = 1.

241
S KETCHING G RAPHS 7.6 S KETCHING EXAMPLES


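[Figure: enlarged view of the graph near the inflection point $(1, 1/\sqrt[3]{25})$, with $x = 6$ marked on the $x$-axis.]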

And if we zoom in even more we have


[Figure: further enlargement of the graph around the inflection point $(1, 1/\sqrt[3]{25})$.]

Example 7.6.5

Chapter 8

OPTIMIZATION

Learning Objectives
• Determine the critical and singular points of a function.

• Identify local extrema of a function.

• Find the global extrema of a function on a closed interval.

• Explain how the algorithm can be used in optimization problems. (Note that finding a
critical point is not enough to identify an extremum.)

• Convert geometric information into a function optimization problem.

• Interpret model optimization problems based on real-world examples according to
their context.

One important application of differential calculus is to find the maximum (or minimum) value of
a function. This often finds real world applications in problems such as the following.
Example 8.0.1
A farmer has 400 m of fencing materials. What is the largest rectangular paddock that can be
enclosed?
Solution. We will describe a general approach to these sorts of problems in Sections 8.2 and 8.3
below, but here we can take a stab at starting the problem.

• Begin by defining variables and their units (more generally we might draw a picture too); let
the dimensions of the paddock be x by y metres.

• The area enclosed is then $A$ m$^2$ where

\[ A = x \cdot y \]


At this stage we cannot apply the calculus we have developed since the area is a function of
two variables and we only know how to work with functions of a single variable. We need to
eliminate one variable.

• We know that the perimeter of the rectangle (and hence the dimensions $x$ and $y$) is constrained
by the amount of fencing materials the farmer has to hand:

\[ 2x + 2y \le 400 \]

and so we have

\[ y \le 200 - x \]

Clearly the area of the paddock is maximised when we use all the fencing possible, so

\[ y = 200 - x \]

• Now substitute this back into our expression for the area

\[ A = x \cdot (200 - x) \]

Since the area cannot be negative (and our lengths $x, y$ cannot be negative either), we must
also have

\[ 0 \le x \le 200 \]

• Thus the question of the largest paddock enclosed becomes the problem of finding the
maximum value of

\[ A = x \cdot (200 - x) \quad \text{subject to the constraint } 0 \le x \le 200. \]

Example 8.0.1
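As a quick numerical sanity check of the reduced problem, the sketch below samples $A = x(200 - x)$ on a grid over $0 \le x \le 200$ and reports the best sample. The grid spacing of 0.5 m and the names `area`, `samples` and `best_x` are illustrative assumptions, not part of the text; a grid search only suggests where the maximum lies and is no substitute for the calculus developed in this chapter.

```python
def area(x):
    """Area in square metres of a paddock with one side x, using all 400 m of fencing."""
    return x * (200 - x)

# Sample the constraint interval 0 <= x <= 200 in steps of 0.5 m.
samples = [i * 0.5 for i in range(401)]

# Pick the sampled side length that gives the largest area.
best_x = max(samples, key=area)
print(best_x, area(best_x))
```

On this grid the largest sampled area occurs for the square paddock with $x = y = 100$.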
The above example is sufficiently simple that we can likely determine the answer by several different
methods. In general, we will need more systematic methods for solving problems of the form

Find the maximum value of $y = f(x)$ subject to $a \le x \le b$

To do this we need to examine what a function looks like near its maximum and minimum values.

8.1 Local and global maxima and minima


We start by asking:

Suppose that the maximum (or minimum) value of f (x) is f (c) then what does that tell
us about c?

Notice that we have not yet made the ideas of maximum and minimum very precise. For the moment
think of maximum as “the biggest value” and minimum as “the smallest value”.


Warning 8.1.1.

It is important to distinguish between “the smallest value” and “the smallest magnitude”.
For example, because

\[ -5 < -1 \]

the number $-5$ is smaller than $-1$. But the magnitude of $-1$, which is $|-1| = 1$, is
smaller than the magnitude of $-5$, which is $|-5| = 5$. Thus the smallest number in the
set $\{-1, -5\}$ is $-5$, while the number in the set $\{-1, -5\}$ that has the smallest magnitude
is $-1$.

Now back to thinking about what happens around a maximum. Suppose that the maximum value
of f (x) is f (c), then for all “nearby” points, the function should be smaller.

Consider the derivative $f'(c)$:

\[ f'(c) = \lim_{h\to 0} \frac{f(c+h) - f(c)}{h}. \]

Split the above limit into the left and right limits:

• Consider points to the right of $x = c$. For all $h > 0$,

\[ f(c+h) \le f(c) \quad \text{which implies that} \]
\[ f(c+h) - f(c) \le 0 \quad \text{which also implies} \]
\[ \frac{f(c+h) - f(c)}{h} \le 0 \quad \text{since } \frac{\text{negative}}{\text{positive}} = \text{negative}. \]

But now if we squeeze $h \to 0$ we get

\[ \lim_{h\to 0^+} \frac{f(c+h) - f(c)}{h} \le 0 \]

(provided the limit exists).

• Consider points to the left of $x = c$. For all $h < 0$,

\[ f(c+h) \le f(c) \quad \text{which implies that} \]
\[ f(c+h) - f(c) \le 0 \quad \text{which also implies} \]
\[ \frac{f(c+h) - f(c)}{h} \ge 0 \quad \text{since } \frac{\text{negative}}{\text{negative}} = \text{positive}. \]

But now if we squeeze $h \to 0$ we get

\[ \lim_{h\to 0^-} \frac{f(c+h) - f(c)}{h} \ge 0 \]

(provided the limit exists).
• So if the derivative $f'(c)$ exists, then the above right- and left-hand limits must agree. Their common
value is both $\le 0$ and $\ge 0$, which forces $f'(c) = 0$.
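The sign pattern in this argument can be seen numerically. The sketch below uses a hypothetical function $f(x) = 4 - (x-1)^2$, chosen only because it has a maximum at $c = 1$, and computes one-sided difference quotients for a few step sizes. The function, the step sizes and the helper name `difference_quotient` are assumptions made for illustration.

```python
def difference_quotient(f, c, h):
    """Return (f(c + h) - f(c)) / h for a nonzero step h."""
    return (f(c + h) - f(c)) / h

# Hypothetical example with a maximum at c = 1 (not taken from the text).
f = lambda x: 4 - (x - 1) ** 2
c = 1.0
steps = (0.1, 0.01, 0.001)

# h > 0: numerator <= 0 and denominator > 0, so each quotient is <= 0.
right = [difference_quotient(f, c, h) for h in steps]
# h < 0: numerator <= 0 and denominator < 0, so each quotient is >= 0.
left = [difference_quotient(f, c, -h) for h in steps]

print(right)
print(left)
```

Both lists shrink towards zero as the step shrinks, one from below and one from above, which is consistent with $f'(1) = 0$.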
Thus we can conclude that
If the maximum value of $f(x)$ is $f(c)$ and $f'(c)$ exists, then $f'(c) = 0$.
Using similar reasoning one can also see that
If the minimum value of $f(x)$ is $f(c)$ and $f'(c)$ exists, then $f'(c) = 0$.
Notice two things about the above reasoning:
• Firstly, in order for the argument to work we only need that $f(x) \le f(c)$ for $x$ close to $c$; it
does not matter what happens for $x$ values far from $c$.
• Secondly, in the above argument we needed to consider $f(x)$ for $x$ both to the left of and to the
right of $c$. If the function $f(x)$ is defined on a closed interval $[a, b]$, then the above argument
only applies when $a < c < b$, not when $c$ is either of the endpoints $a$ and $b$.
Consider the function below

This function has only 1 maximum value (the middle green point in the graph) and 1 minimum
value (the rightmost blue point), however it has 4 points at which the derivative is zero. In the
small intervals around those points where the derivative is zero, we can see that the function has a local
maximum or minimum, even if it is not the global maximum or minimum. We clearly need to be
more careful distinguishing between these cases.


Definition 8.1.2.

Let $I$ be an interval, like $(a, b)$ or $[a, b]$ for example, and let the function $f(x)$ be defined
for all $x \in I$. Now let $c \in I$. Then

• we say that $f(x)$ has a global (or absolute) minimum on the interval $I$ at the point
$x = c$ if $f(x) \ge f(c)$ for all $x \in I$.

• Similarly, we say that $f(x)$ has a global (or absolute) maximum on $I$ at $x = c$ if
$f(x) \le f(c)$ for all $x \in I$.

• We say that $f(x)$ has a local¹ minimum on $I$ at $x = c$ if $f(x) \ge f(c)$ for all $x \in I$ that
are near $c$. Precisely, if there is a $\delta > 0$ such that $f(x) \ge f(c)$ for all $x \in I$ that are
within a distance $\delta$ of $c$.

• Similarly, we say that $f(x)$ has a local maximum on $I$ at $x = c$ if $f(x) \le f(c)$ for all
$x \in I$ that are near $c$. Precisely, if there is a $\delta > 0$ such that $f(x) \le f(c)$ for all $x \in I$
that are within a distance $\delta$ of $c$.

The global maxima and minima of a function are called the global extrema of the function,
while the local maxima and minima are called the local extrema.

Consider again the function we showed in the figure above

It has 3 local maxima and 3 local minima on the interval [a, b]. The global maximum occurs at
the middle green point (which is also a local maximum), and the global minimum occurs at the
rightmost blue point (which is also a local minimum).
Using the above definition we can summarise what we have learned above as the following
theorem²:

1 Beware that, while many textbooks use these definitions of local minimum and maximum, some textbooks exclude
the endpoints a, b of the interval [a, b] from their definitions. Our definitions allow the endpoints a and b to be local
minima and maxima. Note that, under our definitions, every global minimum (maximum) is also a local minimum
(maximum).
2 This is one of several important mathematical contributions made by Pierre de Fermat, a French government lawyer
and amateur mathematician, who lived in the first half of the seventeenth century.


Theorem 8.1.3.

Let the function $f(x)$ be defined on the interval $I$ and let $a, b, c$ be points in $I$ with
$a < c < b$. If $f(x)$ has a local maximum or local minimum at $x = c$ and if $f'(c)$ exists,
then $f'(c) = 0$.

• It is often (but not always) the case that, when $f(x)$ has a local maximum at $x = c$, the function
$f(x)$ increases strictly as $x$ approaches $c$ from the left and decreases strictly as $x$ leaves $c$ to
the right. That is, $f'(x) > 0$ for $x$ just to the left of $c$ and $f'(x) < 0$ for $x$ just to the right of $c$.
Then, it is often the case, because $f'(x)$ is decreasing as $x$ increases through $c$, that $f''(c) < 0$.
• Conversely, if $f'(c) = 0$ and $f''(c) < 0$, then, just to the right of $c$, $f'(x)$ must be negative,
so that $f(x)$ is decreasing, and just to the left of $c$, $f'(x)$ must be positive, so that $f(x)$ is
increasing. So $f(x)$ has a local maximum at $c$.
• Similarly, it is often the case that, when $f(x)$ has a local minimum at $x = c$, $f'(x) < 0$ for $x$
just to the left of $c$ and $f'(x) > 0$ for $x$ just to the right of $c$ and $f''(c) > 0$.
• Conversely, if $f'(c) = 0$ and $f''(c) > 0$, then, just to the right of $c$, $f'(x)$ must be positive,
so that $f(x)$ is increasing, and, just to the left of $c$, $f'(x)$ must be negative, so that $f(x)$ is
decreasing. So $f(x)$ has a local minimum at $c$.

Theorem 8.1.4 (Second Derivative Test).

Let $f(x)$ be defined on the interval $I$ and let $a, b, c \in I$ with $a < c < b$.

If $f'(c) = 0$ and $f''(c) < 0$, then $f(x)$ has a local maximum at $c$.
If $f'(c) = 0$ and $f''(c) > 0$, then $f(x)$ has a local minimum at $c$.
Note the strict inequalities.
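The test can be written as a small decision procedure. The sketch below is one possible encoding, assuming the derivatives are supplied by hand as Python callables; the function name `classify`, the tolerance `tol` and the label "inconclusive" for the case $f''(c) = 0$ are illustrative choices. It is tried on $f(x) = x^3 - 6x$, whose derivative vanishes at $\pm\sqrt{2}$, and on $f(x) = x^3$ at $0$, where the strict inequalities fail.

```python
import math

def classify(f1, f2, c, tol=1e-9):
    """Apply the second derivative test at c, given callables for f' and f''."""
    if abs(f1(c)) > tol:
        return "not a critical point"
    if f2(c) < 0:
        return "local maximum"
    if f2(c) > 0:
        return "local minimum"
    # f''(c) = 0: the theorem gives no information in this case.
    return "inconclusive"

f1 = lambda x: 3 * x**2 - 6   # derivative of x^3 - 6x
f2 = lambda x: 6 * x          # second derivative of x^3 - 6x

print(classify(f1, f2, -math.sqrt(2)))
print(classify(f1, f2, math.sqrt(2)))
print(classify(lambda x: 3 * x**2, lambda x: 6 * x, 0.0))
```

The last call illustrates why the inequalities must be strict: when $f''(c) = 0$ the test says nothing either way.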

Theorem 8.1.3 says that, when f (x) has a local maximum or minimum on an interval I at the
point x = c, there are three possibilities.
• The derivative $f'(c) = 0$. This case is illustrated in the following figure.

[Figure: side-by-side graphs of $y = f(x)$ and $y = f'(x)$.]


Observe that, in this example, $f'(x)$ changes continuously from negative to positive at the
local minimum, taking the value zero at the local minimum (the red dot).

• The derivative $f'(c)$ does not exist. This case is illustrated in the following figure.

[Figure: side-by-side graphs of $y = f(x)$ and $y = f'(x)$ on the interval from $a$ to $b$.]

Observe that, in this example, $f'(x)$ changes discontinuously from negative to positive at the
local minimum ($x = 0$) and $f'(0)$ does not exist.

• The point $c$ is an endpoint of the interval $I = [a, b]$. This case is also illustrated in the above
figure. The endpoints $a$ and $b$ are both local maxima. But $f'(a)$ and $f'(b)$ are not zero.

This theorem demonstrates that the points at which the derivative is zero or does not exist are very
important. It simplifies the discussion that follows if we give these points names.

Definition 8.1.5.

Let $f(x)$ be a function that is defined on the interval $a < x < b$ and let $a < c < b$. Then

• if $f'(c)$ exists and is zero we call $x = c$ a critical point of the function, and

• if $f'(c)$ does not exist then we call $x = c$ a singular point³ of the function.

Warning 8.1.6.

Note that some people (and texts) will combine both of these cases and call $x = c$ a critical
point when either the derivative is zero or does not exist. The reader should be aware of
the lack of convention on this point⁴ and should be careful to understand whether the more
inclusive definition of critical point is being used, or if the text is using the more precise
definition that distinguishes critical and singular points.

3 For $c$ to be a local maximum or minimum of $f$, the function $f$ must obviously be defined at $c$. So here we are
considering only points $c$ in the domain of $f$. Section 7.2 extends the definition of singular points
of $f$ to points that are not in the domain of $f$.


We’ll now look at a few simple examples involving local maxima and minima, critical points
and singular points. Then we will move on to global maxima and minima.
Example 8.1.7
In this example, we’ll look for local maxima and minima of the function $f(x) = x^3 - 6x$ on the
interval $-2 \le x \le 3$.

• First compute the derivative

\[ f'(x) = 3x^2 - 6. \]

Since this is a polynomial it is defined everywhere on the domain and so there will not be any
singular points. So we now look for critical points.

• To do so we look for zeroes of the derivative

\[ f'(x) = 3x^2 - 6 = 3(x^2 - 2) = 3(x - \sqrt{2})(x + \sqrt{2}). \]

This derivative takes the value 0 at two different values of $x$. Namely $x = c_- = -\sqrt{2}$ and
$x = c_+ = \sqrt{2}$. Here is a sketch of the graph of $f(x)$.

[Figure: graph of $y = f(x) = x^3 - 6x$ for $-2 \le x \le 3$, with the points $(c_-, f(c_-))$ and $(c_+, f(c_+))$ marked.]

From the figure we see that

– $f(x)$ has a local minimum at the endpoint $x = -2$ (i.e. we have $f(x) \ge f(-2)$ whenever
$x \ge -2$ is close to $-2$) and
– $f(x)$ has a local minimum at $x = c_+$ (i.e. we have $f(x) \ge f(c_+)$ whenever $x$ is close to
$c_+$) and
– $f(x)$ has a local maximum at $x = c_-$ (i.e. we have $f(x) \le f(c_-)$ whenever $x$ is close to
$c_-$) and
– $f(x)$ has a local maximum at the endpoint $x = 3$ (i.e. we have $f(x) \le f(3)$ whenever
$x \le 3$ is close to 3) and

4 No pun intended.


– the global minimum of $f(x)$, for $x$ in the interval $-2 \le x \le 3$, is at $x = c_+$ (i.e. we have
$f(x) \ge f(c_+)$ whenever $-2 \le x \le 3$) and
– the global maximum of $f(x)$, for $x$ in the interval $-2 \le x \le 3$, is at $x = 3$ (i.e. we have
$f(x) \le f(3)$ whenever $-2 \le x \le 3$).
• Note that we have carefully constructed this example to illustrate that the global maximum (or
minimum) of a function on an interval may or may not also be a critical point of the function.
Example 8.1.7

Example 8.1.8
In this example, we’ll look for local maxima and minima of the function $f(x) = x^3$ on the interval
$-1 < x < 1$.
• First compute the derivative:
\[ f'(x) = 3x^2. \]
Again, this is a polynomial and so defined on all of the domain. The function will not have
singular points, but may have critical points.
• The derivative is zero only when $x = 0$, so $x = c = 0$ is the only critical point of the function.
• The graph of $f(x)$ is sketched below. From that sketch we see that $f(x)$ has neither a
local maximum nor a local minimum at $x = c$ despite the fact that $f'(c) = 0$: we have
$f(x) < f(c) = 0$ for all $x < c = 0$ and $f(x) > f(c) = 0$ for all $x > c = 0$.

[Figure: graph of $y = f(x) = x^3$ for $-1 < x < 1$, with the point $(c, f(c))$ marked at the origin.]

• Note that this example has been constructed to illustrate that a critical point (or singular point)
of a function need not be a local maximum or minimum for the function.
• Reread Theorem 8.1.3. It says⁵ “Let $\cdots$. If $f(x)$ has a local maximum/minimum at $x = c$
and if $f'(c)$ exists, then $f'(c) = 0$”. It does not say that “if $f'(c) = 0$ then $f$ has a local
maximum/minimum at $x = c$”.

5 A very common error of logic that people make is “Affirming the consequent”. When the statement “if P then Q” is
true, observing Q does not imply P. (“Affirming the consequent” eliminates “not” from the previous sentence.)
For example, “If he is Shakespeare, then he is dead,” and “That man is dead.” does not imply “He must be
Shakespeare.”. Or you may have also seen someone use this reasoning: “If a person is a genius before their time
then they are misunderstood.” “I am misunderstood.” “So I must be a genius before my time.”.

Example 8.1.8

Example 8.1.9
In this example, we’ll look for local maxima and minima of the function

\[ f(x) = |x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0 \end{cases} \]

on the interval $-1 < x < 1$ and we’ll also look for local maxima and minima of the function

\[ g(x) = x^{2/3} \]

on the interval $-1 < x < 1$.

• Again, start by computing the derivatives (reread Example 3.3.15):

\[ f'(x) = \begin{cases} 1 & \text{if } x > 0 \\ \text{undefined} & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases} \]
\[ g'(x) = \begin{cases} \frac{2}{3} x^{-1/3} & \text{if } x \ne 0 \\ \text{undefined} & \text{if } x = 0 \end{cases} \]

• These derivatives never take the value 0, so the functions $f(x)$ and $g(x)$ do not have any
critical points. However both derivatives do not exist at the point $x = 0$, so that point is a
singular point for both $f(x)$ and $g(x)$.

• Here is a sketch of the graph of $f(x)$

[Figure: graph of $y = f(x) = |x|$ for $-1 < x < 1$.]

and a sketch of the graph of $g(x)$.

[Figure: graph of $y = g(x) = x^{2/3}$ for $-1 < x < 1$.]

From the figures we see that both $f(x)$ and $g(x)$ have a local (and in fact global) minimum at
$x = 0$ despite the fact that $x = 0$ is not a critical point.

• Reread Theorem 8.1.3 yet again. It says “Let $\cdots$. If $f(x)$ has a local maximum or local
minimum at $x = c$ and if $f$ is differentiable at $x = c$, then $f'(c) = 0$”. It says nothing about
what happens at points where the derivative does not exist. Indeed that is why we have to
consider both critical points and singular points when we look for maxima and minima.

Example 8.1.9

8.2 Finding global maxima and minima


We now have a technique for finding local maxima and minima: just look at endpoints of the
interval of interest and for values of $x$ for which either $f'(x) = 0$ or $f'(x)$ does not exist. What
about finding global maxima and minima? We’ll start by stating explicitly that, under appropriate
hypotheses, global maxima and minima are guaranteed to exist.

Theorem 8.2.1.

Let the function $f(x)$ be defined and continuous on the closed, finite interval⁶ $-\infty < a \le
x \le b < \infty$. Then $f(x)$ attains a maximum and a minimum at least once. That is, there
exist numbers $a \le x_m, x_M \le b$ such that

\[ f(x_m) \le f(x) \le f(x_M) \quad \text{for all } a \le x \le b \]

So let’s again consider the question

Suppose that the maximum (or minimum) value of $f(x)$, for $a \le x \le b$, is $f(c)$. What
does that tell us about $c$?

6 The hypotheses that $f(x)$ be continuous and that the interval be finite and closed are all essential. We suggest that
you find three functions $f_1(x)$, $f_2(x)$ and $f_3(x)$ with $f_1$ defined but not continuous on $0 \le x \le 1$, $f_2$ defined and
continuous on $-\infty < x < \infty$, and $f_3$ defined and continuous on $0 < x < 1$, and with none of $f_1$, $f_2$ and $f_3$ attaining
either a global maximum or a global minimum.


If $c$ obeys $a < c < b$ (note the strict inequalities), then $f$ has a local maximum (or minimum) at $x = c$
and Theorem 8.1.3 tells us that either $f'(c) = 0$ or $f'(c)$ does not exist. The only other places that a
maximum or minimum can occur are the ends of the interval. We can summarise this as:

Theorem 8.2.2.

If $f(x)$ has a global maximum or global minimum, for $a \le x \le b$, at $x = c$ then there are 3
possibilities. Either

• $f'(c) = 0$, or

• $f'(c)$ does not exist, or

• $c = a$ or $c = b$.

That is, a global maximum or minimum must occur either at a critical point, a singular
point or at the endpoints of the interval.

This theorem provides the basis for a method to find the maximum and minimum values of f (x)
for a ď x ď b:

Corollary 8.2.3.

Let $f(x)$ be a function on the interval $a \le x \le b$. Then to find the global maximum and
minimum of the function:

• Make a list of all values of $c$, with $a \le c \le b$, for which

– $f'(c) = 0$, or
– $f'(c)$ does not exist, or
– $c = a$ or $c = b$.

That is, list all the critical points, singular points, and endpoints.

• Evaluate $f(c)$ for each $c$ in that list. The largest (or smallest) of those values is the
largest (or smallest) value of $f(x)$ for $a \le x \le b$.
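The two steps of the corollary translate directly into a short routine. The sketch below assumes the candidate list (critical points, singular points and endpoints) has already been found by hand; the helper name `global_extrema` is hypothetical. It is tried on $f(x) = x^3 - 6x$ for $-2 \le x \le 3$, the function of Example 8.1.7.

```python
import math

def global_extrema(f, candidates):
    """Evaluate f at each candidate and return ((x_min, f_min), (x_max, f_max))."""
    values = [(c, f(c)) for c in candidates]
    lowest = min(values, key=lambda pair: pair[1])
    highest = max(values, key=lambda pair: pair[1])
    return lowest, highest

f = lambda x: x**3 - 6 * x

# Candidates on [-2, 3]: the endpoints and the zeros of f'(x) = 3x^2 - 6.
candidates = [-2.0, 3.0, -math.sqrt(2), math.sqrt(2)]

lowest, highest = global_extrema(f, candidates)
print(lowest, highest)
```

The routine only compares values at the listed points, so its answer is only as good as the candidate list: a missed singular point or endpoint would go unnoticed.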

Let’s now demonstrate how to use this strategy. The function in this first example is not too
simple — but it is a good example of a function that contains both a singular point and a critical
point.
Example 8.2.4
Find the largest and smallest values of the function $f(x) = 2x^{5/3} + 3x^{2/3}$ for $-1 \le x \le 1$.
Solution. We will apply the method in Corollary 8.2.3. It is perhaps easiest to find the values at the
endpoints of the intervals and then move on to the values at any critical or singular points.


• Before we get into things, notice that we can rewrite the function by factoring it:

\[ f(x) = 2x^{5/3} + 3x^{2/3} = x^{2/3} \cdot (2x + 3) \]

• Let’s compute the function at the endpoints of the interval:

\[ f(1) = 2 + 3 = 5 \]
\[ f(-1) = 2 \cdot (-1)^{5/3} + 3 \cdot (-1)^{2/3} = -2 + 3 = 1 \]

• To compute the function at the critical and singular points we first need to find the derivative:

\[ f'(x) = 2 \cdot \frac{5}{3} x^{2/3} + 3 \cdot \frac{2}{3} x^{-1/3} = \frac{10}{3} x^{2/3} + 2x^{-1/3} = \frac{10x + 6}{3x^{1/3}} \]
• Notice that the numerator and denominator are defined for all x. The only place the derivative
is undefined is when the denominator is zero. Hence the only singular point is at x = 0. The
corresponding function value is

f (0) = 0

• To find the critical points we need to solve $f'(x) = 0$:

\[ 0 = \frac{10x + 6}{3x^{1/3}} \]

Hence we must have $10x = -6$ or $x = -3/5$. The corresponding function value is

\[ f(x) = x^{2/3} \cdot (2x + 3) \quad \text{recall this from above, then} \]
\[ f(-3/5) = (-3/5)^{2/3} \cdot \left( 2 \cdot \frac{-3}{5} + 3 \right) = \left(\frac{9}{25}\right)^{1/3} \cdot \frac{-6 + 15}{5} = \left(\frac{9}{25}\right)^{1/3} \cdot \frac{9}{5} \approx 1.28 \]

Note that if we do not want to approximate the root (if, for example, we do not have a
calculator handy), then we can also write

\[ f(-3/5) = \left(\frac{9}{25}\right)^{1/3} \cdot \frac{9}{5} = \left(\frac{9}{25}\right)^{1/3} \cdot \frac{9}{25} \cdot 5 = 5 \cdot \left(\frac{9}{25}\right)^{4/3} \]

Since $0 < 9/25 < 1$, we know that $0 < \left(\frac{9}{25}\right)^{4/3} < 1$, and hence

\[ 0 < f(-3/5) = 5 \cdot \left(\frac{9}{25}\right)^{4/3} < 5. \]

• We summarise our work in this table

  $c$      $-\frac{3}{5}$                                      $0$              $-1$       $1$
  type     critical point                                      singular point   endpoint   endpoint
  $f(c)$   $\frac{9}{5}\sqrt[3]{\frac{9}{25}} \approx 1.28$    $0$              $1$        $5$

• The largest value of f in the table is 5 and the smallest value of f in the table is 0.

• Thus on the interval $-1 \le x \le 1$ the global maximum of $f$ is 5, and is taken at $x = 1$, while
the global minimum value of $f(x)$ is 0, and is taken at $x = 0$.

• For completeness we also sketch the graph of this function on the same interval.

[Figure: graph of $y = f(x) = 2x^{5/3} + 3x^{2/3}$ for $-1 \le x \le 1$.]

Chapter 7 shows how to construct such a sketch without using a calculator or
computer.

Example 8.2.4
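The table in this example can be reproduced numerically. One practical wrinkle, which is an implementation detail and not part of the text, is that Python's `**` operator returns a complex number for a negative base with a fractional exponent, so the sketch below builds a real cube root from `math.copysign`. The helper name `cbrt` and the dictionary layout are illustrative assumptions.

```python
import math

def cbrt(x):
    """Real cube root, valid for negative x as well."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def f(x):
    """f(x) = 2x^(5/3) + 3x^(2/3), written as x^(2/3) * (2x + 3)."""
    return cbrt(x) ** 2 * (2 * x + 3)

# Candidate points from Corollary 8.2.3, labelled by type.
candidates = {
    "critical": -3 / 5,
    "singular": 0.0,
    "left endpoint": -1.0,
    "right endpoint": 1.0,
}

for kind, c in candidates.items():
    print(f"{kind:>14}: c = {c:5.2f}, f(c) = {f(c):.4f}")
```

The printed values agree with the table: the largest is at the right endpoint and the smallest is at the singular point.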

8.3 Max/min examples


As noted at the beginning of this section, the problem of finding maxima and minima is a very
important application of differential calculus in the real world. We now turn to a number of examples
of this process. But to guide the reader we will describe a general procedure to follow for these
problems.

(1) Read — read the problem carefully. Work out what information is given in the statement of the
problem and what we are being asked to compute.

(2) Diagram — draw a diagram. This will typically help you to identify what you know about the
problem and what quantities you need to work out.


(3) Variables — assign variables to the quantities in the problem along with their units. It is typically
a good idea to make sensible choices of variable names: A for area, h for height, t for time etc.

(4) Relations — find relations between the variables. By now you should know the quantity we
are interested in (the one we want to maximise or minimise) and we need to establish a relation
between it and the other variables.

(5) Reduce — the relation down to a function of one variable. In order to apply the calculus we
know, we must have a function of a single variable. To do this we need to use all the information
we have to eliminate variables. We should also work out the domain of the resulting function.

(6) Maximise or minimise — we can now apply the methods of Corollary 8.2.3 to find the maximum
or minimum of the quantity we need (as the problem dictates).

(7) Be careful — make sure your answer makes sense. Make sure quantities are physical. For
example, lengths and areas cannot be negative.

(8) Answer the question — be sure your answer really answers the question asked in the problem.

Let us start with a relatively simple problem:


Example 8.3.1
A closed rectangular container with a square base is to be made from two different materials. The
material for the base costs $5 per square meter, while the material for the other five sides costs $1
per square meter. Find the dimensions of the container which has the largest possible volume if the
total cost of materials is $72.
Solution. We can follow the steps we outlined above to find the solution.

• We need to determine the area of the two types of materials used and the corresponding total
cost.

• Draw a picture of the box.

The more useful picture is the unfolded box on the right.


• In the picture we have already introduced two variables. The square base has side-length $b$
metres and it has height $h$ metres. Let the area of the base be $A_b$ and the area of the other five
sides be $A_s$ (both in m$^2$), and the total cost be $C$ (in dollars). Finally let the volume enclosed
be $V$ m$^3$.

• Some simple geometry tells us that

\[ A_b = b^2 \]
\[ A_s = 4bh + b^2 \]
\[ V = b^2 h \]
\[ C = 5 \cdot A_b + 1 \cdot A_s = 5b^2 + 4bh + b^2 = 6b^2 + 4bh. \]

• To eliminate one of the variables we use the fact that the total cost is \$72.

\[ C = 6b^2 + 4bh = 72 \quad \text{rearrange} \]
\[ 4bh = 72 - 6b^2 \quad \text{isolate } h \]
\[ h = \frac{72 - 6b^2}{4b} = \frac{3}{2} \cdot \frac{12 - b^2}{b} \]

Substituting this into the volume gives

\[ V = b^2 h = \frac{3b}{2} (12 - b^2) = 18b - \frac{3}{2} b^3 \]

Now note that since $b$ is a length it cannot be negative, so $b \ge 0$. Further since volume cannot
be negative, we must also have

\[ 12 - b^2 \ge 0 \]

and so $b \le \sqrt{12}$.

• Now we can apply Corollary 8.2.3 on the above expression for the volume with $0 \le b \le \sqrt{12}$.
The endpoints give:

\[ V(0) = 0 \]
\[ V(\sqrt{12}) = 0 \]

The derivative is

\[ V'(b) = 18 - \frac{9b^2}{2} \]

Since this is a polynomial there are no singular points. However we can solve $V'(b) = 0$ to
find critical points:

\[ 18 - \frac{9b^2}{2} = 0 \quad \text{divide by 9 and multiply by 2} \]
\[ 4 - b^2 = 0 \]


Hence $b = \pm 2$. Thus the only critical point in the domain is $b = 2$. The corresponding volume
is

\[ V(2) = 18 \times 2 - \frac{3}{2} \times 2^3 = 36 - 12 = 24. \]

So by Corollary 8.2.3, the maximum volume is 24 when $b = 2$ and

\[ h = \frac{3}{2} \cdot \frac{12 - b^2}{b} = \frac{3}{2} \cdot \frac{12 - 4}{2} = 6. \]

• All our quantities make sense; lengths, areas and volumes are all non-negative.

• Checking the question again, we see that we are asked for the dimensions of the container
(rather than its volume) so we can answer with

The container with dimensions $2 \times 2 \times 6$ m will be the largest possible.

Example 8.3.1
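The arithmetic in this example is easy to double-check by machine. The sketch below evaluates the volume at the two endpoints and the critical point, then confirms that the resulting dimensions use exactly the \$72 budget. The function names `height` and `volume` are illustrative; the formulas are the ones derived above.

```python
import math

def height(b):
    """Height fixed by the cost constraint 6b^2 + 4bh = 72."""
    return (72 - 6 * b**2) / (4 * b)

def volume(b):
    """V(b) = 18b - (3/2) b^3 for 0 <= b <= sqrt(12)."""
    return 18 * b - 1.5 * b**3

# Endpoints of the domain and the critical point b = 2 where V'(b) = 0.
candidates = [0.0, 2.0, math.sqrt(12)]
best_b = max(candidates, key=volume)

# Recompute the cost from the dimensions as a consistency check.
cost = 6 * best_b**2 + 4 * best_b * height(best_b)
print(best_b, height(best_b), volume(best_b), cost)
```

Recomputing the cost from $b$ and $h$ is a cheap way to catch an algebra slip when eliminating a variable.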

Example 8.3.2
A rectangular sheet of cardboard is 6 inches by 9 inches. Four identical squares are cut from the
corners of the cardboard, as shown in the figure below, and the remaining piece is folded into an
open rectangular box. What should the size of the cut out squares be in order to maximize the
volume of the box?
Solution. This one is quite similar to the previous one, so we perhaps don’t need to go into so much
detail.

• After reading carefully we produce the following picture:

• Let the height of the box be $x$ inches, and the base be $\ell \times w$ inches. The volume of the box is
then $V$ cubic inches.


• Some simple geometry tells us that $\ell = 9 - 2x$, $w = 6 - 2x$ and so

\[ V = x(9 - 2x)(6 - 2x) \text{ cubic inches} \]
\[ = 54x - 30x^2 + 4x^3. \]

Notice that since all lengths must be non-negative, we must have

\[ x, \ell, w \ge 0 \]

and so $0 \le x \le 3$ (if $x > 3$ then $w < 0$).

• We can now apply Corollary 8.2.3. First the endpoints of the interval give

\[ V(0) = 0 \qquad V(3) = 0 \]

The derivative is

\[ V'(x) = 54 - 60x + 12x^2 \]
\[ = 6(9 - 10x + 2x^2) \]

Since this is a polynomial there are no singular points. To find critical points we solve
$V'(x) = 0$ to get

\[ x_\pm = \frac{10 \pm \sqrt{100 - 4 \times 2 \times 9}}{4} = \frac{10 \pm \sqrt{28}}{4} = \frac{10 \pm 2\sqrt{7}}{4} = \frac{5 \pm \sqrt{7}}{2} \]

We can then use a calculator to approximate

\[ x_+ \approx 3.82 \qquad x_- \approx 1.18. \]

So $x_-$ is inside the domain, while $x_+$ lies outside.

Alternatively⁷, we can bound $x_\pm$ by first noting that $2 \le \sqrt{7} \le 3$. From this we know that

\[ 1 = \frac{5 - 3}{2} \le x_- = \frac{5 - \sqrt{7}}{2} \le \frac{5 - 2}{2} = 1.5 \]
\[ 3.5 = \frac{5 + 2}{2} \le x_+ = \frac{5 + \sqrt{7}}{2} \le \frac{5 + 3}{2} = 4 \]

• Since the volume is zero when $x = 0, 3$, and $x_-$ is the only critical point in the domain, it must be
the case that the volume is maximised when $x = x_- = \frac{5 - \sqrt{7}}{2}$.

• Notice that since $0 < x_- < 3$ we know that the other lengths are positive, so our answer makes
sense. Further, the question only asks for the length $x$ and not the resulting volume so we have
answered the question.

7 Say if we do not have a calculator to hand, or your instructor insists that the problem be done without one.


Example 8.3.2
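The quadratic-formula step and the domain check can be mirrored in a few lines. The sketch below computes both roots of $V'(x) = 0$, keeps the ones lying in $0 \le x \le 3$, and compares the volume there with the endpoint values. The variable names `disc`, `roots`, `inside` and `best_x` are illustrative assumptions.

```python
import math

def volume(x):
    """Volume in cubic inches of the box with cut-out squares of side x."""
    return x * (9 - 2 * x) * (6 - 2 * x)

# Roots of V'(x) = 6(2x^2 - 10x + 9) from the quadratic formula.
disc = 10**2 - 4 * 2 * 9
roots = [(10 - math.sqrt(disc)) / 4, (10 + math.sqrt(disc)) / 4]

# Keep only the critical points inside the domain 0 <= x <= 3.
inside = [r for r in roots if 0 <= r <= 3]

# Compare the endpoints with the surviving critical points.
best_x = max([0.0, 3.0] + inside, key=volume)
print(roots, best_x, volume(best_x))
```

Filtering the roots against the domain before comparing values is the step that discards $x_+$.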

There is a new wrinkle in the next two examples. Each involves finding the minimum value
of a function $f(x)$ with $x$ running over all real numbers, rather than just over a finite interval as in
Corollary 8.2.3. Both in Example 8.3.4 and in Example 8.3.5 the function $f(x)$ tends to $+\infty$ as $x$
tends to either $+\infty$ or $-\infty$. So the minimum value of $f(x)$ will be achieved for some finite value of
$x$, which will be a local minimum as well as a global minimum.

Theorem 8.3.3.

Let $f(x)$ be defined and continuous for all $-\infty < x < \infty$. Let $c$ be a finite real number.

(a) If $\lim_{x\to+\infty} f(x) = +\infty$ and $\lim_{x\to-\infty} f(x) = +\infty$ and if $f(x)$ has a global minimum at
$x = c$, then there are 2 possibilities. Either

• $f'(c) = 0$, or
• $f'(c)$ does not exist

That is, a global minimum must occur either at a critical point or at a singular point.

(b) If $\lim_{x\to+\infty} f(x) = -\infty$ and $\lim_{x\to-\infty} f(x) = -\infty$ and if $f(x)$ has a global maximum at
$x = c$, then there are 2 possibilities. Either

• $f'(c) = 0$, or
• $f'(c)$ does not exist

That is, a global maximum must occur either at a critical point or at a singular point.

Example 8.3.4
Find the point on the line y = 6 ´ 3x that is closest to the point (7, 5).

Solution. In this problem

• A simple picture


• Some notation is already given to us. Let a point on the line have coordinates $(x, y)$, and we
do not need units. And let $\ell$ be the distance from the point $(x, y)$ to the point $(7, 5)$.
• Since the points are on the line the coordinates $(x, y)$ must obey
\[ y = 6 - 3x \]
Notice that $x$ and $y$ have no further constraints. The distance $\ell$ is given by
\[ \ell^2 = (x - 7)^2 + (y - 5)^2 \]

• We can now eliminate the variable $y$:

\[ \ell^2 = (x - 7)^2 + (y - 5)^2 \]
\[ = (x - 7)^2 + (6 - 3x - 5)^2 = (x - 7)^2 + (1 - 3x)^2 \]
\[ = x^2 - 14x + 49 + 1 - 6x + 9x^2 = 10x^2 - 20x + 50 \]
\[ = 10(x^2 - 2x + 5) \]
\[ \ell = \sqrt{10} \cdot \sqrt{x^2 - 2x + 5} \]
Notice that as $x \to \pm\infty$ the distance $\ell \to +\infty$.
• We can now apply Theorem 8.3.3
– Since the distance is defined for all real $x$, we do not have to check the endpoints of the
domain, because there are none.
– Form the derivative:
\[ \frac{d\ell}{dx} = \sqrt{10} \, \frac{2x - 2}{2\sqrt{x^2 - 2x + 5}} \]
It is zero when $x = 1$, and undefined if $x^2 - 2x + 5 \le 0$. However, since
\[ x^2 - 2x + 5 = (x^2 - 2x + 1) + 4 = \underbrace{(x - 1)^2}_{\ge 0} + 4 \]

we know that $x^2 - 2x + 5 \ge 4$. Thus the function has no singular points and the only
critical point occurs at $x = 1$. The corresponding function value is then
\[ \ell(1) = \sqrt{10} \sqrt{1 - 2 + 5} = 2\sqrt{10}. \]

– Thus the minimum value of the distance is $\ell = 2\sqrt{10}$ and occurs at $x = 1$.
• This answer makes sense: the distance is not negative.
• The question asks for the point that minimises the distance, not that minimum distance. Hence
the answer is $x = 1$, $y = 6 - 3 = 3$. I.e.
The point that minimises the distance is $(1, 3)$.

Notice that we can make the analysis easier by observing that the point that minimises the
distance also minimises the squared-distance. So that instead of minimising the function $\ell$, we can
just minimise $\ell^2$:

\[ \ell^2 = 10(x^2 - 2x + 5) \]

The resulting algebra is a bit easier and we don’t have to hunt for singular points.
Example 8.3.4
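The squared-distance shortcut is convenient to check numerically. The sketch below works with $\ell^2$ directly, takes the critical point $x = 1$ found above, and spot-checks that a few nearby points on the line are farther from $(7, 5)$. The sample offsets and the names `x_star`, `y_star` and `neighbours` are arbitrary illustrative choices; a spot check of this kind does not prove minimality.

```python
import math

def squared_distance(x):
    """Squared distance from the point (x, 6 - 3x) on the line to (7, 5)."""
    y = 6 - 3 * x
    return (x - 7) ** 2 + (y - 5) ** 2

# d/dx [10x^2 - 20x + 50] = 20x - 20 vanishes at x = 1.
x_star = 1.0
y_star = 6 - 3 * x_star
distance = math.sqrt(squared_distance(x_star))

# Spot check: a few nearby points on the line are no closer.
neighbours = [x_star + step for step in (-0.5, -0.01, 0.01, 0.5)]
assert all(squared_distance(x) > squared_distance(x_star) for x in neighbours)

print((x_star, y_star), distance)
```

Working with $\ell^2$ keeps everything polynomial, so no square roots appear until the final distance is reported.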

Example 8.3.5
Find the minimum distance from $(2, 0)$ to the curve $y^2 = x^2 + 1$.
Solution. This is very much like the previous question.
• After reading the problem carefully we can draw a picture

[Figure: the curve $y^2 = x^2 + 1$ with a general point $(x, y)$ on it and the point $(2, 0)$ on the $x$-axis.]

• In this problem we do not need units and the variables $x, y$ are supplied. We define the distance
to be $\ell$ and it is given by

\[ \ell^2 = (x - 2)^2 + y^2. \]

As noted in the previous problem, we will minimise the squared-distance since that also
minimises the distance.
• Since $x, y$ satisfy $y^2 = x^2 + 1$, we can write the distance as a function of $x$:

\[ \ell^2 = (x - 2)^2 + y^2 = (x - 2)^2 + (x^2 + 1) \]

Notice that as $x \to \pm\infty$ the squared-distance $\ell^2 \to +\infty$.


• Since the squared-distance is a polynomial it will not have any singular points, only critical
points. The derivative is

\[ \frac{d}{dx} \ell^2 = 2(x - 2) + 2x = 4x - 4 \]

so the only critical point occurs at $x = 1$.

• When $x = 1$, $y = \pm\sqrt{2}$ and the distance is

\[ \ell^2 = (1 - 2)^2 + (1 + 1) = 3 \qquad \ell = \sqrt{3} \]

and thus the minimum distance from the curve to $(2, 0)$ is $\sqrt{3}$.

Example 8.3.5

Example 8.3.6
A water trough is to be constructed from a metal sheet of width 45 cm by bending up one third of
the sheet on each side through an angle θ . Which θ will allow the trough to carry the maximum
amount of water?
Solution. Clearly $0 \le \theta \le \pi$, so we are back in the domain⁸ of Corollary 8.2.3.

• After reading the problem carefully we should realise that it is really asking us to maximise
the cross-sectional area. A figure really helps.

• From this we are led to define the height h cm and cross-sectional area A cm². Both are
functions of θ.

$$h = 15 \sin\theta$$

while the area can be computed as the sum of the central 15 × h rectangle, plus two triangles.
Each triangle has height h and base 15 cos θ. Hence

$$A = 15h + 2 \cdot \tfrac{1}{2} \cdot h \cdot 15\cos\theta = 15h\,(1 + \cos\theta)$$

8 Again, no pun intended.


• Since h = 15 sin θ we can rewrite the area as a function of just θ:

$$A(\theta) = 225 \sin\theta\,(1 + \cos\theta)$$

where $0 \le \theta \le \pi$.

• Now we use Corollary 8.2.3. The ends of the interval give

$$A(0) = 225 \sin 0\,(1 + \cos 0) = 0$$
$$A(\pi) = 225 \sin\pi\,(1 + \cos\pi) = 0$$

The derivative is

$$A'(\theta) = 225\cos\theta \cdot (1 + \cos\theta) + 225 \sin\theta \cdot (-\sin\theta)$$
$$= 225\left[\cos\theta + \cos^2\theta - \sin^2\theta\right] \qquad \text{recall } \sin^2\theta = 1 - \cos^2\theta$$
$$= 225\left[\cos\theta + 2\cos^2\theta - 1\right]$$

This is a continuous function, so there are no singular points. However we can still hunt for
critical points by solving $A'(\theta) = 0$. That is

$$2\cos^2\theta + \cos\theta - 1 = 0 \qquad \text{factor carefully}$$
$$(2\cos\theta - 1)(\cos\theta + 1) = 0$$

Hence we must have cos θ = −1 or cos θ = 1/2. On the domain $0 \le \theta \le \pi$, this means θ = π/3
or θ = π.

$$A(\pi) = 0$$
$$A(\pi/3) = 225 \sin(\pi/3)(1 + \cos(\pi/3)) = 225 \cdot \frac{\sqrt{3}}{2} \cdot \left(1 + \frac{1}{2}\right) = 225 \cdot \frac{3\sqrt{3}}{4} \approx 292.28$$

• Thus the cross-sectional area is maximised when $\theta = \frac{\pi}{3}$.
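
The trough calculation can be spot-checked numerically. The sketch below, which is only an illustration, evaluates A(θ) = 225 sin θ (1 + cos θ) on an evenly spaced grid over [0, π]; the grid size `n` is an arbitrary assumption.

```python
import math

def area(theta):
    """Cross-sectional area in cm^2 for bend angle theta (radians)."""
    return 225 * math.sin(theta) * (1 + math.cos(theta))

n = 100_000  # arbitrary grid resolution
thetas = [math.pi * i / n for i in range(n + 1)]
best_theta = max(thetas, key=area)
print(best_theta, math.pi / 3, area(best_theta))
```

The printed angle should sit within one grid step of π/3, and the printed area should agree with 225 · 3√3/4 to several decimal places.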

Example 8.3.6

Example 8.3.7
Find the points on the ellipse $\frac{x^2}{4} + y^2 = 1$ that are nearest to and farthest from the point (1, 0).
Solution. While this is another distance problem, the possible values of x, y are bounded, so we
need Corollary 8.2.3 rather than Theorem 8.3.3.

• We start by drawing a picture:


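[Figure: the ellipse in the xy-plane, with the points (−2, 0), (1, 0) and (2, 0) marked on the x-axis and a point (x, y(x)) marked on the ellipse.]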

• Let $\ell$ be the distance from the point (x, y) on the ellipse to the point (1, 0). As was the case
above, we will work with the squared-distance.

$$\ell^2 = (x - 1)^2 + y^2.$$

• Since (x, y) lies on the ellipse we have

$$\frac{x^2}{4} + y^2 = 1$$

Note that this also shows that $-2 \le x \le 2$ and $-1 \le y \le 1$.
Isolating $y^2$ and substituting this into our expression for $\ell^2$ gives

$$\ell^2 = (x - 1)^2 + \underbrace{1 - x^2/4}_{= y^2}.$$

• Now we can apply Corollary 8.2.3. The endpoints of the domain give

$$\ell^2(-2) = (-2 - 1)^2 + 1 - (-2)^2/4 = 3^2 + 1 - 1 = 9$$
$$\ell^2(2) = (2 - 1)^2 + 1 - 2^2/4 = 1 + 1 - 1 = 1$$

The derivative is

$$\frac{d}{dx}\ell^2 = 2(x - 1) - x/2 = \frac{3x}{2} - 2$$

Thus there are no singular points, but there is a critical point at x = 4/3. The corresponding
squared-distance is

$$\ell^2(4/3) = \left(\frac{4}{3} - 1\right)^2 + 1 - \frac{(4/3)^2}{4} = (1/3)^2 + 1 - (4/9) = 6/9 = 2/3.$$

• To summarise (and giving distances and coordinates of points):

x        (x, y)                                   ℓ
−2       (−2, 0)                                  3
4/3      $\left(4/3, \pm\sqrt{5}/3\right)$        $\sqrt{2/3}$
2        (2, 0)                                   1

The point of maximum distance is (−2, 0), and the point of minimum distance is $\left(4/3, \pm\sqrt{5}/3\right)$.
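
The comparison of candidate points can be mirrored in a few lines of Python. This is a minimal sketch: it simply evaluates the distance at the two endpoints and the critical point found above, and the variable names are illustrative.

```python
import math

def squared_distance(x):
    """Squared distance from (1, 0) to the point (x, y) on x^2/4 + y^2 = 1."""
    return (x - 1) ** 2 + (1 - x ** 2 / 4)

# Endpoints of the domain and the single critical point.
candidates = [-2.0, 4 / 3, 2.0]
distances = {x: math.sqrt(squared_distance(x)) for x in candidates}
nearest_x = min(distances, key=distances.get)
farthest_x = max(distances, key=distances.get)
print(distances, nearest_x, farthest_x)
```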

Example 8.3.7

Example 8.3.8
Find the dimensions of the rectangle of largest area that can be inscribed in an equilateral triangle of
side a if one side of the rectangle lies on the base of the triangle.
Solution. Since the rectangle must sit inside the triangle, its dimensions are bounded and we will
end up using Corollary 8.2.3.

• Carefully draw a picture:


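[Figure: on the left, the equilateral triangle of side a with base angle π/3, half-base a/2 and height $\frac{\sqrt{3}a}{2}$; on the right, the same triangle with vertices (−a/2, 0), (a/2, 0) and $(0, \sqrt{3}a/2)$ and an inscribed rectangle with top corners (−x, y) and (x, y).]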

We have drawn (on the left) the triangle in the xy-plane with its base on the x-axis. The base
has been drawn running from (−a/2, 0) to (a/2, 0) so its centre lies at the origin. A little
Pythagoras (or a little trigonometry) tells us that the height of the triangle is

$$\sqrt{a^2 - (a/2)^2} = \frac{\sqrt{3}}{2} \cdot a = a \cdot \sin\frac{\pi}{3}$$

Thus the vertex at the top of the triangle lies at $\left(0, \frac{\sqrt{3}}{2} \cdot a\right)$.

• If we construct a rectangle that does not touch the sides of the triangle, then we can increase
the dimensions of the rectangle until it touches the triangle and so make its area larger. Thus
we can assume that the two top corners of the rectangle touch the triangle as drawn in the
right-hand figure above.


• Now let the rectangle be 2x wide and y high. And let A denote its area. Clearly

$$A = 2xy$$

where $0 \le x \le a/2$ and $0 \le y \le \frac{\sqrt{3}}{2} a$.

• Our construction means that the top-right corner of the rectangle will have coordinates (x, y)
and lie on the line joining the top vertex of the triangle at $(0, \sqrt{3}a/2)$ to the bottom-right
vertex at (a/2, 0). In order to write the area as a function of x alone, we need the equation for
this line since it will tell us how to write y as a function of x. The line has slope

$$\text{slope} = \frac{\sqrt{3}a/2 - 0}{0 - a/2} = -\sqrt{3}$$

and passes through the point $(0, \sqrt{3}a/2)$, so any point (x, y) on that line satisfies:

$$y = -\sqrt{3}x + \frac{\sqrt{3}}{2} a.$$

• We can now write the area as a function of x alone

$$A(x) = 2x\left(-\sqrt{3}x + \frac{\sqrt{3}}{2} a\right) = \sqrt{3}x(a - 2x)$$

with $0 \le x \le a/2$.

• The ends of the domain give:

$$A(0) = 0 \qquad A(a/2) = 0.$$

The derivative is

$$A'(x) = \sqrt{3}\left(x \cdot (-2) + 1 \cdot (a - 2x)\right) = \sqrt{3}(a - 4x).$$

Since this is a polynomial there are no singular points, but there is a critical point at x = a/4.
There

$$A(a/4) = \sqrt{3} \cdot \frac{a}{4} \cdot (a - a/2) = \sqrt{3} \cdot \frac{a^2}{8}$$
$$y = -\sqrt{3} \cdot (a/4) + \frac{\sqrt{3}}{2} a = \sqrt{3} \cdot \frac{a}{4}.$$

• Checking the question again, we see that we are asked for the dimensions rather than the area,
so the answer is 2x × y:

The largest such rectangle has dimensions $\frac{a}{2} \times \frac{\sqrt{3}a}{4}$.
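
To see the result for a concrete triangle, the following sketch fixes a hypothetical side length a = 1 (an assumption made purely for illustration) and searches a grid of x values for the largest rectangle area.

```python
import math

a = 1.0  # hypothetical side length, chosen only for illustration

def area(x):
    """Area of the inscribed rectangle of half-width x."""
    y = -math.sqrt(3) * x + math.sqrt(3) / 2 * a
    return 2 * x * y

n = 100_000  # arbitrary grid resolution on 0 <= x <= a/2
xs = [(a / 2) * i / n for i in range(n + 1)]
best_x = max(xs, key=area)
width = 2 * best_x
height = -math.sqrt(3) * best_x + math.sqrt(3) / 2 * a
print(width, height)
```

For any other positive value of `a` the width and height scale proportionally, which matches the dimensions stated above.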


Example 8.3.8

This next one is a good physics example. In it we will derive Snell’s Law⁹ from Fermat’s
principle¹⁰.
Example 8.3.9
Consider the figure below which shows the trajectory of a ray of light as it passes through two
different mediums (say air and water).

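[Figure: a ray of light travelling from P through the point O on the interface to Q, with the angle of incidence θi and the angle of refraction θr marked.]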

Let $c_a$ be the speed of light in air and $c_w$ be the speed of light in water. Fermat’s principle states
that a ray of light will always travel along a path that minimises the time taken. So if a ray of light
travels from P (in air) to Q (in water) then it will “choose” the point O (on the interface) so as to
minimise the total time taken. Use this idea to show Snell’s law,

$$\frac{\sin\theta_i}{\sin\theta_r} = \frac{c_a}{c_w}$$

where $\theta_i$ is the angle of incidence and $\theta_r$ is the angle of refraction (as illustrated in the figure above).

Solution. This problem is a little more abstract than the others we have examined, but we can still
apply Theorem 8.3.3.

• We are given a figure in the statement of the problem and it contains all the relevant points
and angles. However it will simplify things if we decide on a coordinate system. Let’s assume
that the point O lies on the x-axis, at coordinates (x, 0). The point P then lies above the axis at
$(X_P, +Y_P)$, while Q lies below the axis at $(X_Q, -Y_Q)$. This is drawn below.

9 Snell’s law is named after the Dutch astronomer Willebrord Snellius who derived it in around 1621, though it was
first stated accurately in 984 by Ibn Sahl.
10 Named after Pierre de Fermat who described it in a letter in 1662. The beginnings of the idea, however, go back
as far as Hero of Alexandria in around 60CE. Hero is credited with many inventions including the first vending
machine, and a precursor of the steam engine called an aeolipile.


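[Figure: the ray from $(X_P, +Y_P)$ through (x, 0) on the x-axis to $(X_Q, -Y_Q)$, with the points $(X_P, 0)$ and $(X_Q, 0)$ and the angles $\theta_i$ and $\theta_r$ marked.]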

• The statement of Snell’s law contains terms $\sin\theta_i$ and $\sin\theta_r$, so it is a good idea for us to see
how to express these in terms of the coordinates we have just introduced:

$$\sin\theta_i = \frac{\text{opposite}}{\text{hypotenuse}} = \frac{x - X_P}{\sqrt{(X_P - x)^2 + Y_P^2}}$$
$$\sin\theta_r = \frac{\text{opposite}}{\text{hypotenuse}} = \frac{X_Q - x}{\sqrt{(X_Q - x)^2 + Y_Q^2}}$$

• Let $\ell_P$ denote the distance PO, and $\ell_Q$ denote the distance OQ. Then we have

$$\ell_P = \sqrt{(X_P - x)^2 + Y_P^2}$$
$$\ell_Q = \sqrt{(X_Q - x)^2 + Y_Q^2}$$

If we then denote the total time taken by T, then

$$T = \frac{\ell_P}{c_a} + \frac{\ell_Q}{c_w} = \frac{1}{c_a}\sqrt{(X_P - x)^2 + Y_P^2} + \frac{1}{c_w}\sqrt{(X_Q - x)^2 + Y_Q^2}$$

which is written as a function of x since all the other terms are constants.
• Notice that as $x \to +\infty$ or $x \to -\infty$ the total time $T \to \infty$ and so we can apply Theorem 8.3.3.
The derivative is

$$\frac{dT}{dx} = \frac{1}{c_a} \cdot \frac{-2(X_P - x)}{2\sqrt{(X_P - x)^2 + Y_P^2}} + \frac{1}{c_w} \cdot \frac{-2(X_Q - x)}{2\sqrt{(X_Q - x)^2 + Y_Q^2}}$$

Notice that the terms inside the square-roots cannot be zero or negative since they are both
sums of squares and $Y_P, Y_Q > 0$. So there are no singular points, but there is a critical point
when $T'(x) = 0$, namely when

$$0 = \frac{1}{c_a} \cdot \frac{X_P - x}{\sqrt{(X_P - x)^2 + Y_P^2}} + \frac{1}{c_w} \cdot \frac{X_Q - x}{\sqrt{(X_Q - x)^2 + Y_Q^2}} = \frac{-\sin\theta_i}{c_a} + \frac{\sin\theta_r}{c_w}$$


Rearrange this to get

$$\frac{\sin\theta_i}{c_a} = \frac{\sin\theta_r}{c_w} \qquad \text{move sines to one side}$$
$$\frac{\sin\theta_i}{\sin\theta_r} = \frac{c_a}{c_w}$$

which is exactly Snell’s law.
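
The derivation can be illustrated numerically. The sketch below picks hypothetical speeds and coordinates (all values are assumptions chosen for illustration), locates the zero of dT/dx by bisection, and then compares the ratio of sines with the ratio of speeds.

```python
import math

# Hypothetical values for illustration only.
c_a, c_w = 1.0, 0.75   # speeds in the two media (arbitrary units)
XP, YP = 0.0, 1.0      # P = (XP, +YP), above the interface
XQ, YQ = 1.0, 1.0      # Q = (XQ, -YQ), below the interface

def dT_dx(x):
    """Derivative of the travel time T with respect to the crossing point x."""
    lP = math.hypot(XP - x, YP)
    lQ = math.hypot(XQ - x, YQ)
    return (x - XP) / (c_a * lP) + (x - XQ) / (c_w * lQ)

# dT/dx is negative at x = XP and positive at x = XQ, so bisect between them.
lo, hi = XP, XQ
for _ in range(100):
    mid = (lo + hi) / 2
    if dT_dx(mid) < 0:
        lo = mid
    else:
        hi = mid
x = (lo + hi) / 2

sin_i = (x - XP) / math.hypot(XP - x, YP)
sin_r = (XQ - x) / math.hypot(XQ - x, YQ)
print(sin_i / sin_r, c_a / c_w)
```

Bisection is used here rather than a direct minimisation of T because the time function is very flat near its minimum, so solving T'(x) = 0 gives a more accurate crossing point.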
Example 8.3.9

Example 8.3.10
The Statue of Liberty has height 46m and stands on a 47m tall pedestal. How far from the statue
should an observer stand to maximize the angle subtended by the statue at the observer’s eye, which
is 1.5m above the base of the pedestal?
Solution. Obviously if the observer stands too close then all they see is the pedestal, while if they
stand too far away then everything is tiny. The best spot for taking a photograph is somewhere in between.
• Draw a careful picture¹¹ and we can put in the relevant lengths and angles.


• The height of the statue is h = 46 m, and the height of the pedestal (above the eye) is p =
47 − 1.5 = 45.5 m. The horizontal distance from the statue to the eye is x. There are two
relevant angles. First θ is the angle subtended by the statue, while ϕ is the angle subtended by
the portion of the pedestal above the eye.
• Some trigonometry gives us

$$\tan\varphi = \frac{p}{x}$$
$$\tan(\varphi + \theta) = \frac{p + h}{x}$$

11 And make some healthy use of public domain clip art.


Thus

$$\varphi = \arctan\frac{p}{x}$$
$$\varphi + \theta = \arctan\frac{p + h}{x}$$

and so

$$\theta = \arctan\frac{p + h}{x} - \arctan\frac{p}{x}.$$
• If we allow the viewer to stand at any point in front of the statue, then $0 \le x < \infty$. Further
observe that as $x \to \infty$ or $x \to 0$ the angle $\theta \to 0$, since

$$\lim_{x \to \infty} \arctan\frac{p + h}{x} = \lim_{x \to \infty} \arctan\frac{p}{x} = 0$$

and

$$\lim_{x \to 0^+} \arctan\frac{p + h}{x} = \lim_{x \to 0^+} \arctan\frac{p}{x} = \frac{\pi}{2}$$

Clearly the largest value of θ will be strictly positive and so has to be taken for some $0 < x < \infty$.
(Note the strict inequalities.) This x will be a local maximum as well as a global maximum.
As θ is not singular at any $0 < x < \infty$, we need only search for critical points. A careful
application of the chain rule shows that the derivative is

$$\frac{d\theta}{dx} = \frac{1}{1 + \left(\frac{p + h}{x}\right)^2} \cdot \left(\frac{-(p + h)}{x^2}\right) - \frac{1}{1 + \left(\frac{p}{x}\right)^2} \cdot \left(\frac{-p}{x^2}\right) = \frac{-(p + h)}{x^2 + (p + h)^2} + \frac{p}{x^2 + p^2}$$
So a critical point occurs when

$$\frac{p + h}{x^2 + (p + h)^2} = \frac{p}{x^2 + p^2} \qquad \text{cross multiply}$$
$$(p + h)(x^2 + p^2) = p(x^2 + (p + h)^2) \qquad \text{collect } x \text{ terms}$$
$$x^2(p + h - p) = p(p + h)^2 - p^2(p + h) \qquad \text{clean up}$$
$$hx^2 = p(p + h)(p + h - p) = ph(p + h) \qquad \text{cancel common factors}$$
$$x^2 = p(p + h)$$
$$x = \pm\sqrt{p(p + h)} = \pm\sqrt{45.5 \times 91.5} \approx \pm 64.5\text{ m}$$

• Thus the best place to stand is approximately 64.5 m in front of or behind the statue. At that point
$\theta \approx 0.342$ radians or $19.6^\circ$.
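
The closing arithmetic is easy to reproduce. This short sketch evaluates $x = \sqrt{p(p + h)}$ and the corresponding angle using the values of p and h stated in the solution; the function name `theta` is an illustrative choice.

```python
import math

h = 46.0        # height of the statue (m)
p = 47.0 - 1.5  # height of the pedestal above the eye (m)

def theta(x):
    """Angle (radians) subtended by the statue at horizontal distance x."""
    return math.atan((p + h) / x) - math.atan(p / x)

x_best = math.sqrt(p * (p + h))
theta_best = theta(x_best)
print(x_best, theta_best, math.degrees(theta_best))
```

Evaluating `theta` a metre to either side of `x_best` gives a slightly smaller angle, as expected at a maximum.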
Example 8.3.10

Example 8.3.11
Find the length of the longest rod that can be carried horizontally (no tilting allowed) from a corridor
3m wide into a corridor 2m wide. The two corridors are perpendicular to each other.
Solution.


• Suppose that we are carrying the rod around the corner. If the rod is as long as possible then it
must touch the corner and the outside walls of both corridors. A picture of this is shown below.

You can see that this gives rise to two similar triangles, one inside each corridor. Also the
maximum length of the rod changes with the angle it makes with the walls of the corridor.
• Suppose that the angle between the rod and the inner wall of the 3m corridor is θ, as illustrated
in the figure above. At the same time it will make an angle of $\frac{\pi}{2} - \theta$ with the outer wall of the
2m corridor. Denote by $\ell_1(\theta)$ the length of the part of the rod forming the hypotenuse of the
upper triangle in the figure above. Similarly, denote by $\ell_2(\theta)$ the length of the part of the rod
forming the hypotenuse of the lower triangle in the figure above. Then

$$\ell_1(\theta) = \frac{3}{\sin\theta} \qquad \ell_2(\theta) = \frac{2}{\cos\theta}$$

and the total length is

$$\ell(\theta) = \ell_1(\theta) + \ell_2(\theta) = \frac{3}{\sin\theta} + \frac{2}{\cos\theta}$$

where $0 \le \theta \le \frac{\pi}{2}$.
• The length of the longest rod we can move through the corridor in this way is the minimum of
$\ell(\theta)$. Notice that $\ell(\theta)$ is not defined at $\theta = 0, \frac{\pi}{2}$. Indeed we find that as $\theta \to 0^+$ or $\theta \to \left(\frac{\pi}{2}\right)^-$,
the length $\ell \to +\infty$. (You should be able to picture what happens to our rod in those two
limits). Clearly the minimum allowed $\ell(\theta)$ is going to be finite and will be achieved for some
$0 < \theta < \frac{\pi}{2}$ (note the strict inequalities) and so will be a local minimum as well as a global
minimum. So we only need to find zeroes of $\ell'(\theta)$. Differentiating $\ell$ gives

$$\frac{d\ell}{d\theta} = -\frac{3\cos\theta}{\sin^2\theta} + \frac{2\sin\theta}{\cos^2\theta} = \frac{-3\cos^3\theta + 2\sin^3\theta}{\sin^2\theta\cos^2\theta}.$$

This does not exist at $\theta = 0, \frac{\pi}{2}$ (which we have already analysed) but does exist at every
$0 < \theta < \frac{\pi}{2}$ and is equal to zero when the numerator is zero. Namely when

$$2\sin^3\theta = 3\cos^3\theta \qquad \text{divide by } \cos^3\theta$$
$$2\tan^3\theta = 3$$
$$\tan\theta = \sqrt[3]{\frac{3}{2}}$$

O PTIMIZATION 8.4 S AMPLE OPTIMIZATION PROBLEMS

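• From this we can recover sin θ and cos θ, without having to compute θ itself. We can, for
example, construct a right-angle triangle with adjacent length $\sqrt[3]{2}$ and opposite length $\sqrt[3]{3}$ (so
that $\tan\theta = \sqrt[3]{3/2}$):

[Figure: a right-angle triangle with angle θ, adjacent side $\sqrt[3]{2}$, opposite side $\sqrt[3]{3}$ and hypotenuse $\sqrt{2^{2/3} + 3^{2/3}}$.]

It has hypotenuse $\sqrt{3^{2/3} + 2^{2/3}}$, and so

$$\sin\theta = \frac{3^{1/3}}{\sqrt{3^{2/3} + 2^{2/3}}}$$
$$\cos\theta = \frac{2^{1/3}}{\sqrt{3^{2/3} + 2^{2/3}}}$$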

Alternatively we could use the identities:

$$1 + \tan^2\theta = \sec^2\theta \qquad 1 + \cot^2\theta = \csc^2\theta$$

to obtain expressions for 1/cos θ and 1/sin θ.

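• Using the above expressions for sin θ, cos θ we find the minimum of $\ell$ (which is the longest
rod that we can move):

$$\ell = \frac{3}{\sin\theta} + \frac{2}{\cos\theta} = \frac{3}{\frac{\sqrt[3]{3}}{\sqrt{2^{2/3} + 3^{2/3}}}} + \frac{2}{\frac{\sqrt[3]{2}}{\sqrt{2^{2/3} + 3^{2/3}}}}$$
$$= \sqrt{2^{2/3} + 3^{2/3}}\left[3^{2/3} + 2^{2/3}\right]$$
$$= \left[2^{2/3} + 3^{2/3}\right]^{3/2} \approx 7.02\text{ m}$$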
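
As a check on the algebra, the sketch below compares three numbers: the rod length at the critical angle, the closed-form expression, and a brute-force minimum over a grid of angles. The grid size is an arbitrary assumption.

```python
import math

def rod_length(theta):
    """Length of a rod touching the corner and both outer walls at angle theta."""
    return 3 / math.sin(theta) + 2 / math.cos(theta)

theta_star = math.atan((3 / 2) ** (1 / 3))       # tan(theta) = cube root of 3/2
closed_form = (2 ** (2 / 3) + 3 ** (2 / 3)) ** 1.5

n = 100_000  # arbitrary grid resolution on the open interval (0, pi/2)
grid_min = min(rod_length((math.pi / 2) * i / n) for i in range(1, n))
print(rod_length(theta_star), closed_form, grid_min)
```

All three printed values should agree to several decimal places.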

Example 8.3.11

8.4 Sample optimization problems

A new challenge in this section is translating a word-problem into a mathematical problem. We
start with elementary examples, and work to more complex situations with biological motivation.


In the first examples, the function to optimize is specified, making the problem simply one of
carefully applying calculus methods.


8.4.1 §§ Density dependent (logistic) growth in a population


Biologists often notice that the growth rate of a population depends not only on the size of the
population, but also on how crowded it is. Constant growth is not sustainable. When individuals
have to compete for resources, nesting sites, mates, or food, they cannot invest time nor energy in
reproduction, leading to a decline in the rate of growth of the population. Such population growth is
called density dependent growth.
One common example of density dependent growth is called the logistic growth law. Here it is
assumed that the growth rate of the population, G, depends on the density of the population, N, as
follows:

$$G(N) = rN\left(\frac{K - N}{K}\right).$$

Concept Check-In
1. Give an example of units for N.

2. What units might G carry?

Here N is the independent variable, and G(N ) is the function of interest. All other quantities are
constant:

• r > 0 is a constant, called the intrinsic growth rate, and

• K > 0 is a constant, called the carrying capacity. It represents the population density that a
given environment can sustain.

Importantly, when differentiating G, we treat r and K as “numbers”. A generic sketch of G as a
function of N is shown in Figure 8.1.

Example 8.4.1 (Logistic growth rate). Answer the following questions:

a) Find the population density N that leads to the maximal growth rate G(N ).

b) Find the value of the maximal growth in terms of r, K.

c) For what population size is the growth rate zero?

Solution. We can expand G(N):

$$G(N) = rN\left(\frac{K - N}{K}\right) = rN - \frac{r}{K}N^2,$$

from which it is apparent that G(N) is a polynomial in powers of N, with constant coefficients r and
r/K.

a) To find critical points of G(N), we find N such that $G'(N) = 0$, and then test for maxima:

$$G'(N) = r - 2\frac{r}{K}N = 0 \quad\Rightarrow\quad r = 2\frac{r}{K}N \quad\Rightarrow\quad N = \frac{K}{2}.$$


Figure 8.1: In logistic growth, the population growth rate G depends on population size N as
shown here. The N-axis is marked at 0, K/2 and K.

Hence, N = K/2 is a critical point, but is it a maximum? We check this in one of several
ways. First, a sketch in Figure 8.1 reveals a downwards-opening parabola. This confirms a local
maximum. Alternatively, we can apply Theorem 8.1.4:

$$G''(N) = -2\frac{r}{K} < 0 \quad\Rightarrow\quad G(N) \text{ concave down} \quad\Rightarrow\quad N = \frac{K}{2} \text{ is a local maximum}$$

Thus, the population density with the greatest growth rate is K/2.
b) The maximal growth rate is found by evaluating the function G at the critical point, N = K/2,

$$G\left(\frac{K}{2}\right) = r\left(\frac{K}{2}\right)\left(\frac{K - \frac{K}{2}}{K}\right) = r\frac{K}{2} \cdot \frac{1}{2} = \frac{rK}{4}.$$

c) To find the population size at which the growth rate is zero, we set G = 0 and solve for N:

$$G(N) = rN\left(\frac{K - N}{K}\right) = 0.$$
There are two solutions. One is trivial: N = 0. (This is biologically interesting in the sense
that it rules out the ancient idea of spontaneous generation - a defunct theory that held that
life can arise on its own, from dust or air. If N = 0, the growth rate is also 0, so no population
spontaneously arises according to logistic growth.) The second solution, N = K means that the
population is at its “carrying capacity”.
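
The three answers can be checked for specific parameter values. In the sketch below, r and K are hypothetical numbers chosen only for illustration; the grid search stands in for the calculus.

```python
r = 0.5     # hypothetical intrinsic growth rate
K = 1000.0  # hypothetical carrying capacity

def G(N):
    """Logistic growth rate at population density N."""
    return r * N * (K - N) / K

# Evaluate G on an evenly spaced grid from 0 to K.
Ns = [K * i / 1000 for i in range(1001)]
N_best = max(Ns, key=G)
print(N_best, G(N_best), r * K / 4)
print(G(0), G(K))
```

The grid maximum lands at K/2 with value rK/4, and the growth rate vanishes at N = 0 and N = K, in line with parts (a) to (c).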

We return to this type of growth in Chapter 13.


8.4.2 §§ Wine for Kepler’s wedding


In 1613, Kepler set out to purchase a few barrels of wine for his wedding party. To compute the
cost, the merchant would plunge a measuring rod through the tap hole, as shown in Figure 8.2, and
measure the length L of the “wet” part of the rod. The cost would be set at a value proportional to L.
Kepler noticed that barrels come in different shapes. Some are tall and skinny, while others are
squat and fat. He conjectured that some shapes would contain larger volumes for a given length L,
i.e. would contain more wine for the same price. Knowing mathematics, he set out to determine
which barrel shape would be the best bargain for his wedding.

Figure 8.2: Barrels come in various shapes. But the cost of a barrel of wine was determined by the
length L (dashed blue line segment) of the wet portion of the rod inserted into the tap hole. Kepler
figured out which barrels contain the most wine for a given price.

Kepler sought the wine barrel that contains the most wine for a given cost. This is equivalent to
asking which cylinder has the largest volume for a fixed (constant) length L. Below, we solve this
optimization problem. An alternative approach is to seek the wine barrel that costs least for a given
volume, which leads to the same result.

Example 8.4.2. Find the proportions (height:radius) of the cylinder with largest volume for a fixed
length L (dashed line segment in Figure 8.2).
Solution. We make the following assumptions:

1. the barrel is a simple cylinder, as shown in Figure 8.3,

2. the tap-hole (normally sealed to avoid leaks) is half-way up the height of the barrel, and

3. the barrel is full to the top with wine.

Concept Check-In
3. Give two different examples of barrel dimensions which would both yield a volume of
160 litres.

Let r, h denote the radius and height of the barrel. These two variables uniquely determine the shape
as well as the volume of the barrel. Note that because the barrel is assumed to be full, the volume of
the cylinder is the same as the volume of wine, namely

$$V = \text{base area} \times \text{height} \quad\Rightarrow\quad V = \pi r^2 h. \tag{8.4.1}$$


The rod used to “measure” the amount of wine (and hence determine the cost of the barrel) is shown
as the diagonal of length L in Figure 8.3. Because the cylinder walls are perpendicular to its base, the
length L is the hypotenuse of a right-angle triangle whose other sides have lengths 2r and h/2. (This
follows from the assumption that the tap hole is half-way up the side.) Thus, by the Pythagorean
theorem,

$$L^2 = (2r)^2 + \left(\frac{h}{2}\right)^2. \tag{8.4.2}$$
The problem can now be stated mathematically: maximize V in Eqn. (8.4.1) subject to a fixed
value of L in Eqn. (8.4.2). The fact that L is fixed means that we have a constraint, as before, that
we use to reduce the number of variables in the problem.

Figure 8.3: We simplify the problem to a cylindrical barrel with diameter 2r and height h. We
assumed that the height of the tap-hole is h/2. Length L denotes the “wet” portion of the merchant’s
rod, used to determine the cost. We observe a Pythagorean triangle formed by the dashed line
segments.

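Expanding the squares in the constraint and solving for $r^2$ leads to

$$L^2 = 4r^2 + \frac{h^2}{4} \quad\Rightarrow\quad r^2 = \frac{1}{4}\left(L^2 - \frac{h^2}{4}\right).$$

When we use this to eliminate r from the expression for V, we obtain

$$V = \pi r^2 h = \frac{\pi}{4}\left(L^2 - \frac{h^2}{4}\right)h = \frac{\pi}{4}\left(L^2 h - \frac{1}{4}h^3\right).$$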


The mathematical problem to solve is now: find h that maximizes

$$V(h) = \frac{\pi}{4}\left(L^2 h - \frac{1}{4}h^3\right).$$

The function V(h) is positive for h in the range $0 < h < 2L$, and V = 0 at the two endpoints, h = 0
and h = 2L. We can restrict attention to this interval since otherwise V < 0, which makes no physical
sense. Since V(h) is a smooth function, we anticipate that somewhere inside this range of values
there should be a maximal volume.
Computing first and second derivatives, we find

$$V'(h) = \frac{\pi}{4}\left(L^2 - \frac{3}{4}h^2\right), \qquad V''(h) = \frac{\pi}{4}\left(0 - 2 \cdot \frac{3}{4}h\right) = -\frac{3}{8}\pi h < 0.$$

Setting $V'(h) = 0$ to find critical points, we then solve for h:

$$V'(h) = 0 \;\Rightarrow\; L^2 - \frac{3}{4}h^2 = 0 \;\Rightarrow\; 3h^2 = 4L^2 \;\Rightarrow\; h^2 = 4\frac{L^2}{3} \;\Rightarrow\; h = 2\frac{L}{\sqrt{3}}.$$
We verify that this solution is a local maximum by the following reasoning.
The second derivative $V''(h) = -\frac{3}{8}\pi h$ is negative for any positive value of h, so V(h)
is concave down for h > 0, which confirms a local maximum. We also noted that V(h) is smooth,
positive within the range of interest and zero at the endpoints. As there is only one critical point in
that range, it must be a local maximum.
Finally, we find the radius of the barrel by plugging the optimal h into the constraint equation,
i.e. using

$$r^2 = \frac{1}{4}\left(L^2 - \frac{h^2}{4}\right) = \frac{1}{4}\left(L^2 - \frac{L^2}{3}\right) = \frac{1}{4}\left(\frac{2}{3}L^2\right) \quad\Rightarrow\quad r = \frac{1}{\sqrt{3}\sqrt{2}}L.$$

The shape of the optimal barrel can now be characterized. One way to do so is to specify the ratio
of its height to its radius. (Tall skinny barrels have a large h/r ratio, and squat fat ones have a low
ratio.) By the above reasoning, the ratio of h/r for the optimal barrel is

$$\frac{h}{r} = \frac{2\frac{L}{\sqrt{3}}}{\frac{1}{\sqrt{3}\sqrt{2}}L} = 2\sqrt{2}. \tag{8.4.3}$$

Hence, for greatest economy, Kepler would have purchased barrels with height to radius ratio of
$2\sqrt{2} \approx 2.83 \approx 3$. ♦
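
The optimal proportions can be confirmed with a short numerical experiment. The sketch below fixes a hypothetical rod length L = 1 (an assumption; the optimal ratio h/r does not depend on it) and searches a grid of heights for the largest volume.

```python
import math

L = 1.0  # hypothetical rod length, chosen only for illustration

def volume(h):
    """Barrel volume as a function of height h, with r eliminated via the constraint."""
    return math.pi / 4 * (L ** 2 * h - h ** 3 / 4)

n = 200_000  # arbitrary grid resolution on 0 <= h <= 2L
hs = [2 * L * i / n for i in range(n + 1)]
h_best = max(hs, key=volume)
r_best = math.sqrt((L ** 2 - h_best ** 2 / 4) / 4)
print(h_best, 2 * L / math.sqrt(3))
print(h_best / r_best, 2 * math.sqrt(2))
```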

Concept Check-In
4. If all barrels had a radius of 25cm, given the result Example 8.4.2, what would be the best
barrel height?

5. What would the volume of such a barrel be?


6. Consider a barrel with radius 25cm and height 100cm. What is this barrel’s volume?

8.4.3 §§ Optimal foraging


Animals spend much of their time foraging - searching for food. Time is limited, since when the
sun goes down, the risk of becoming food (to a predator) increases, and the likelihood of finding
food decreases. Individuals who are most successful at finding food over that limited time have
the greatest chance of surviving. It is argued by biologists that evolution tends to optimize animal
behaviour by selecting those that are faster, stronger, or more fit, or - in this case - most efficient at
finding food.
In this section, we investigate a model for optimal foraging. We follow the basic principles of
(?) and (?).

[Figure: a food patch, labelled with travel time τ and time t.]

Figure 8.4: A bird travels daily to forage in a food patch. We want to determine how long it should
stay in the patch to optimize its overall average energy gain per unit time.

Notation. We define the following notation:

• τ= travel time between nest and food patch (this is considered to be time that is unavoidably
wasted).
• t = residence time in the patch (i.e. how long to spend foraging in one patch), also called
foraging time,
• f (t ) = total energy gained by foraging in a patch for time t.

Energy gain in food patches.


Concept Check-In
7. Which of the energy gain functions in Figure 8.5 are strictly increasing?

In some patches, food is ample and found quickly, while in others, it takes time and effort to obtain.
The typical time needed to find food is reflected by various energy gain functions f (t ) shown in
Figure 8.5.
Example 8.4.3 (Energy gain versus patch residence time). For each panel in Figure 8.5, explain
what the graph of the total energy gain f (t ) is saying about the type of food patch: how easy or hard
is it to find food?
Solution. The types of food patches are as follows:


[Figure: six panels, (a) to (f), each plotting total energy gain f(t) against time t.]

Figure 8.5: Examples of various total energy gain f (t ) for a given foraging time t. The shapes of
these functions determine how hard or easy it is to extract food from a food patch.

1. The energy gain is linearly proportional to time spent in the patch. In this case, the patch has
so much food that it is never depleted. It would make sense to stay in such a patch for as long
as possible.

2. Energy gain is independent of time spent. The animal gets the full quantity as soon as it gets
to the patch.

3. Food is gradually depleted, (the total energy gain levels off to some constant as t increases).
There is “diminishing return” for staying longer, suggesting that it is best not to stay too long.

4. The reward for staying longer in this patch increases: the net energy gain is concave up
($f''(t) > 0$), so its slope is increasing.

5. It takes time to begin to gain energy. After some time, the gain increases, but eventually, the
patch is depleted.

6. Staying too long in such a patch is disadvantageous, resulting in net loss of energy. It is
important to leave this patch early enough to avoid that loss.

Concept Check-In


8. Which model(s) can you automatically dismiss as not very biologically realistic?

Example 8.4.4. Consider the hypothetical patch energy function

$$f(t) = \frac{E_{max}\,t}{k + t} \quad \text{where } E_{max}, k > 0 \text{ are constants.} \tag{8.4.4}$$
a) Match this function to one of the panels in Figure 8.5.

b) Interpret the meanings of the constants Emax , k.


Solution.
a) The function resembles Michaelis-Menten kinetics (Figure 1.7). In Figure 8.5, panel (c), described in
item 3 of the list in Example 8.4.3, is the closest match.

b) From Chapter 1, $E_{max}$ is the horizontal asymptote, corresponding to an upper bound for the total
amount of energy that can be extracted from the patch. The parameter k has units of time and
controls the steepness of the function. Foraging for a time t = k leads the animal to obtain half
of the total available energy, since $f(k) = E_{max}/2$. ♦
Example 8.4.5 (Currency to optimize). We can assume that animals try to maximize the average
energy gain per unit time, defined by the ratio:

$$R(t) = \frac{\text{Total energy gained}}{\text{total time spent}}.$$

Write down R(t) for the assumed patch energy function Eqn. (8.4.4).
Solution. The ‘total time spent’ is a sum of the fixed amount of time τ traveling, and time t foraging.
The ‘total energy gained’ is f(t). Thus, for the patch function f(t) assumed in Eqn. (8.4.4),

$$R(t) = \frac{f(t)}{\tau + t} = \frac{E_{max}\,t}{(k + t)(\tau + t)}. \tag{8.4.5}$$

Concept Check-In
9. What units might be used in the function R(t)?

We can now state the mathematical problem:

Find the time t that maximizes R(t ).

In finding such a t we are determining the optimal residence time.


Example 8.4.6. Use tools of calculus and curve-sketching to find and classify the critical points of
R(t ) in Eqn. (8.4.5).
Solution. We first sketch R(t), focusing on t > 0 for biological relevance.
• For t ≈ 0, we have $R(t) \approx \frac{E_{max}}{k\tau}\,t$, which is a linearly increasing function.


[Figure: two panels, each plotting R(t) against t.]

Figure 8.6: In Example 8.4.6 we first compose a rough sketch of the average rate of energy gain
R(t ) in Eqn. (8.4.5). The graph is linear near the origin, and decays to zero at large t.

• As $t \to \infty$, $R(t) \approx E_{max}\,t/t^2 = E_{max}/t \to 0$, so the graph eventually decreases to zero.


These two conclusions are shown in Figure 8.6 (left panel), and strongly suggest that there
should be a local maximum in the range $0 < t < \infty$, as shown in the right panel of Figure 8.6. Since the
function is continuous for t > 0, this sketch verifies that there is a local maximum for some positive
t value.
To find a local maximum, we compute $R'(t)$ using the quotient rule, and set $R'(t) = 0$:

$$R'(t) = E_{max}\frac{k\tau - t^2}{(k + t)^2(\tau + t)^2} = 0. \tag{8.4.6}$$

This can only be satisfied if the numerator is zero, that is

$$k\tau - t^2 = 0 \quad\Rightarrow\quad t_{1,2} = \pm\sqrt{k\tau}.$$

Rejecting the (irrelevant) negative root, we deduce that the critical point of the function R(t) is
$t_{crit} = \sqrt{k\tau}$. The sketch in Figure 8.6 verifies that this critical point is a local maximum. ♦
Example 8.4.7. For practice, use one of the calculus tests for critical points to show that $t_{crit} = \sqrt{k\tau}$
is a local maximum for the function R(t) in Eqn. (8.4.5).
Solution. R(t) is a rational function, so a second derivative is messy. Instead, we apply the first
derivative test, that is, we check the sign of $R'(t)$ on both sides of the critical point.
• Eqn. (8.4.6) gives $R'(t)$. Its denominator is positive, so the sign of $R'(t)$ is determined by its
numerator, $(k\tau - t^2)$.
• Thus, $R'(t) > 0$ for $t < t_{crit}$, and $R'(t) < 0$ for $t > t_{crit}$.
This confirms that the function increases up to the critical point and decreases afterwards, so the
critical point is a local maximum, henceforth denoted $t_{max}$. ♦
To optimize the average rate of energy gain, R(t), we found that the animal should stay in the
patch for a duration of $t = t_{max} = \sqrt{k\tau}$.

O PTIMIZATION 8.5 S UMMARY

Concept Check-In
10. Given tmax is the duration of time an animal should stay in a patch, and τ is travelling
time, explain why the constant k is also in units of time.

Example 8.4.8. Determine the average rate of energy gain at this optimal patch residence time, i.e.
find the maximal average rate of energy gain.
Solution. Computing R(t) for $t = t_{max} = \sqrt{k\tau}$, we find that

$$R(t_{max}) = \frac{E_{max}\,t_{max}}{(k + t_{max})(\tau + t_{max})} = \frac{E_{max}}{\tau} \cdot \frac{1}{\left(1 + \sqrt{k/\tau}\right)^2}. \tag{8.4.7}$$
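
To make the foraging result concrete, the sketch below plugs in hypothetical parameter values (the numbers for E_max, k and τ are assumptions chosen for illustration) and compares R evaluated at the optimal residence time with the simplified expression in Eqn. (8.4.7).

```python
import math

E_max, k, tau = 10.0, 2.0, 8.0  # hypothetical parameter values

def R(t):
    """Average energy gain per unit time for residence time t."""
    return E_max * t / ((k + t) * (tau + t))

t_max = math.sqrt(k * tau)
R_formula = (E_max / tau) / (1 + math.sqrt(k / tau)) ** 2
print(t_max, R(t_max), R_formula)
print(R(t_max - 0.1), R(t_max + 0.1))
```

The two values on the first line should coincide, and both values on the second line should be slightly smaller, consistent with a local maximum at $t = \sqrt{k\tau}$.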

8.5 Summary
1. Optimization is a process of finding critical points, and identifying local and global maxima/minima.

2. A scientific problem that addresses “biggest/smallest, best, most efficient” is often reducible to
an optimization problem.

3. As with all mathematical models, translating scientific observations and reasonable assumptions
into mathematical terms is an important first step.

4. The following applications were considered:

(a) Density dependent population growth. Using a given logistic growth law, the following
parameters were considered:
• population growth rate (to be maximized),
• population density,
• intrinsic growth rate (constant),
• carrying capacity (constant).
(b) Wine for Kepler’s wedding, seeking the largest barrel volume for a fixed diagonal length.
The following parameters were considered:
• barrel volume, (to be maximized)
• barrel height,
• barrel radius,
• length of the diagonal (constant).
(c) Foraging time for an animal collecting food. We considered:
• travel time between nest and food patch,
• foraging time in the patch,
• energy gained by foraging in a patch for various time durations.


Quick Concept Check


1. If the growth rate of a population follows the following logistic equation:

$$G(N) = 1.2N\left(\frac{50000 - N}{50000}\right),$$

where N is the density of the population, under what circumstances is the population
growing fastest?

2. When finding a global maximum, why is it always imperative to check the endpoints?

3. Demonstrate the variability of barrel dimensions by giving two different height and radius
pairs which lead to a volume of 50 litres.

4. Would the answer to Kepler’s wine barrel problem have changed if we had solved for $h^2$
instead of $r^2$?

Chapter 9

APPROXIMATING FUNCTIONS NEAR A SPECIFIED POINT: TAYLOR POLYNOMIALS

Learning Objectives
• Use a linear approximation to approximate a differentiable function that is difficult to
evaluate exactly. This includes choosing an appropriate centre point.

• Use a linear approximation to approximate an irrational number with a rational number.
This may include choosing an appropriate centre point as well as an appropriate
function.

• Explain what a degree n approximation of a function is.

• Determine degree n approximations for appropriately differentiable functions.


• State the Maclaurin polynomials for the standard functions: $\frac{1}{1-x}$, $e^x$, $\cos x$, $\sin x$,
$\log(1 + x)$.

Suppose that you are interested in the values of some function f (x) for x near some fixed point
a. When the function is a polynomial or a rational function we can use some arithmetic (and maybe
some hard work) to write down the answer. For example:

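\[
\begin{aligned}
f(x) &= \frac{x^2 - 3}{x^2 - 2x + 4} \\
f(1/5) &= \frac{\frac{1}{25} - 3}{\frac{1}{25} - \frac{2}{5} + 4} = \frac{\frac{1 - 75}{25}}{\frac{1 - 10 + 100}{25}} \\
&= \frac{-74}{91}
\end{aligned}
\]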
Tedious, but we can do it. On the other hand if you are asked to compute sin(1/10) then what can
we do? We know that a calculator can work it out


sin(1/10) = 0.09983341 . . .
but how does the calculator do this? How did people compute this before calculators¹? A hint comes
from the following sketch of sin(x) for x around 0.

Figure 9.0.1. [Graphs of y = x and y = sin x for x between −1 and 1.]

The above figure shows that the curves y = x and y = sin x are almost the same when x is close
to 0. Hence if we want the value of sin(1/10) we could just use this approximation y = x to get
sin(1/10) ≈ 1/10.
Of course, in this case we simply observed that one function was a good approximation of the other.
We need to know how to find such approximations more systematically.
More precisely, say we are given a function f (x) that we wish to approximate close to some
point x = a, and we need to find another function F (x) that
• is simple and easy to compute²
• is a good approximation to $f(x)$ for $x$ values close to $a$.
Further, we would like to understand how good our approximation actually is. Namely we need to
be able to estimate the error $|f(x) - F(x)|$.
There are many different ways to approximate a function and we will discuss one family of
approximations: Taylor polynomials. This is an infinite family of ever improving approximations,
and our starting point is the very simplest.

9.1 Zeroth approximation: the constant approximation


The simplest functions are those that are constants. And our zeroth³ approximation will be by
a constant function. That is, the approximating function will have the form F (x) = A, for some
constant A. Notice that this function is a polynomial of degree zero.

1 Originally the word “calculator” referred not to the software or electronic (or even mechanical) device we think of
today, but rather to a person who performed calculations.
2 It is no good approximating a function with something that is even more difficult to work with.
3 It barely counts as an approximation at all, but it will help build intuition. Because of this, and the fact that a
constant is a polynomial of degree 0, we’ll start counting our approximations from zero rather than 1.


To ensure that F (x) is a good approximation for x close to a, we choose A so that f (x) and F (x)
take exactly the same value when x = a.

\[ F(x) = A \quad\text{so}\quad F(a) = A = f(a) \implies A = f(a) \]

Our first, and crudest, approximation rule is

Equation 9.1.1 (Constant approximation).

\[ f(x) \approx f(a) \]

An important point to note is that we need to know f (a) — if we cannot compute that easily then
we are not going to be able to proceed. We will often have to choose a (the point around which we
are approximating f (x)) with some care to ensure that we can compute f (a).
Here is a figure showing the graphs of a typical $f(x)$ and approximating function $F(x)$.

[Figure: the graph of $y = f(x)$ together with the horizontal line $y = F(x) = f(a)$; the two meet at $x = a$.]

At $x = a$, $f(x)$ and $F(x)$ take the same value. For $x$ very near $a$, the values of $f(x)$ and $F(x)$ remain
close together. But the quality of the approximation deteriorates fairly quickly as x moves away
from a. Clearly we could do better with a straight line that follows the slope of the curve. That is
our next approximation.
But before then, an example:
Example 9.1.2
Use the constant approximation to estimate $e^{0.1}$.
Solution. First set $f(x) = e^x$.
• Now we first need to pick a point $x = a$ to approximate the function. This point needs to be
close to 0.1 and we need to be able to evaluate $f(a)$ easily. The obvious choice is $a = 0$.
• Then our constant approximation is just

\[
\begin{aligned}
F(x) &= f(0) = e^0 = 1 \\
F(0.1) &= 1
\end{aligned}
\]

Note that $e^{0.1} = 1.105170918\ldots$, so even this approximation isn’t too bad.
Example 9.1.2


9.2 First approximation: the linear approximation


Our first⁴ approximation improves on our zeroth approximation by allowing the approximating
function to be a linear function of x rather than just a constant function. That is, we allow F (x) to be
of the form A + Bx, for some constants A and B.
To ensure that F (x) is a good approximation for x close to a, we still require that f (x) and F (x)
have the same value at x = a (that was our zeroth approximation). Our additional requirement is
that their tangent lines at x = a have the same slope — that the derivatives of f (x) and F (x) are the
same at x = a. Hence

\[
\begin{aligned}
F(x) &= A + Bx & &\implies & F(a) &= A + Ba = f(a) \\
F'(x) &= B & &\implies & F'(a) &= B = f'(a)
\end{aligned}
\]

So we must have $B = f'(a)$. Substituting this into $A + Ba = f(a)$ we get $A = f(a) - a f'(a)$. So
we can write
\[
\begin{aligned}
F(x) = A + Bx &= \overbrace{f(a) - a f'(a)}^{A} + f'(a)\cdot x \\
&= f(a) + f'(a)\cdot(x - a)
\end{aligned}
\]

We write it in this form because we can now clearly see that our first approximation is just an
extension of our zeroth approximation. This first approximation is also often called the linear
approximation of f (x) about x = a.

Equation 9.2.1 (Linear approximation).

\[ f(x) \approx f(a) + f'(a)(x - a) \]

We should again stress that in order to form this approximation we need to know $f(a)$ and $f'(a)$;
if we cannot compute them easily then we are not going to be able to proceed.
Recall, from Theorem 3.3.7, that $y = f(a) + f'(a)(x - a)$ is exactly the equation of the tangent
line to the curve $y = f(x)$ at $a$. Here is a figure showing the graphs of a typical $f(x)$ and the
approximating function $F(x)$.

[Figure: the graph of $y = f(x)$ together with its tangent line $y = F(x) = f(a) + f'(a)(x - a)$ at $x = a$.]

Observe that the graph of $f(a) + f'(a)(x - a)$ remains close to the
graph of $f(x)$ for a much larger range of $x$ than did the graph of our constant approximation, $f(a)$.
One can also see that we can improve this approximation if we can use a function that curves down
rather than being perfectly straight. That is our next approximation.
But before then, back to our example:

4 Recall that we started counting from zero.
Example 9.2.2
Use the linear approximation to estimate $e^{0.1}$.
Solution. First set $f(x) = e^x$ and $a = 0$ as before.
• To form the linear approximation we need $f(a)$ and $f'(a)$:
\[
\begin{aligned}
f(x) &= e^x & f(0) &= 1 \\
f'(x) &= e^x & f'(0) &= 1
\end{aligned}
\]
• Then our linear approximation is
\[
\begin{aligned}
F(x) &= f(0) + x f'(0) = 1 + x \\
F(0.1) &= 1.1
\end{aligned}
\]

Recall that $e^{0.1} = 1.105170918\ldots$, so the linear approximation is almost correct to 3 digits.
Example 9.2.2

It is worth doing another simple example here.


Example 9.2.3
Use a linear approximation to estimate $\sqrt{4.1}$.
Solution. First set $f(x) = \sqrt{x}$. Hence $f'(x) = \frac{1}{2\sqrt{x}}$. Then we are trying to approximate $f(4.1)$.
Now we need to choose a sensible $a$ value.
• We need to choose $a$ so that $f(a)$ and $f'(a)$ are easy to compute.
– We could try $a = 4.1$, but then we need to compute $f(4.1)$ and $f'(4.1)$, which is
our original problem and more!
– We could try $a = 0$; then $f(0) = 0$ but $f'(0)$ does not exist.
– Setting $a = 1$ gives us $f(1) = 1$ and $f'(1) = \frac{1}{2}$. This would work, but we can get a better
approximation by choosing $a$ closer to 4.1.
– Indeed we can set $a$ to be the square of any rational number and we’ll get a result that is
easy to compute.
– Setting $a = 4$ gives $f(4) = 2$ and $f'(4) = \frac{1}{4}$. This seems good enough.
• Substitute this into equation (9.2.1) to get
\[
\begin{aligned}
f(4.1) &\approx f(4) + f'(4)\cdot(4.1 - 4) \\
&= 2 + \frac{0.1}{4} = 2 + 0.025 = 2.025
\end{aligned}
\]
Notice that the true value is $\sqrt{4.1} = 2.024845673\ldots$.
Example 9.2.3
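The arithmetic in Example 9.2.3 can be checked numerically. The following is a minimal sketch, not part of the original text; the function name `linear_approx` is our own choice, and `math.sqrt` is used only to measure how far off the estimate is.

```python
import math

def linear_approx(f_a, df_a, a, x):
    """Evaluate the tangent-line approximation f(a) + f'(a) * (x - a)."""
    return f_a + df_a * (x - a)

# sqrt(4.1) about a = 4: f(4) = 2 and f'(4) = 1 / (2 * sqrt(4)) = 1/4
estimate = linear_approx(2.0, 0.25, 4.0, 4.1)

# Compare against the library value of sqrt(4.1)
error = abs(math.sqrt(4.1) - estimate)
print(estimate, error)
```

The estimate agrees with the hand computation of 2.025, and the gap to the true value is roughly 0.00015.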


9.3 Second approximation: the quadratic approximation


We next develop a still better approximation by now allowing the approximating function to be a
quadratic function of $x$. That is, we allow $F(x)$ to be of the form $A + Bx + Cx^2$, for some constants
$A$, $B$ and $C$. To ensure that $F(x)$ is a good approximation for $x$ close to $a$, we choose $A$, $B$ and $C$ so
that

• $f(a) = F(a)$ (just as in our zeroth approximation),

• $f'(a) = F'(a)$ (just as in our first approximation), and

• $f''(a) = F''(a)$; this is a new condition.

These conditions give us the following equations

\[
\begin{aligned}
F(x) &= A + Bx + Cx^2 & &\implies & F(a) &= A + Ba + Ca^2 = f(a) \\
F'(x) &= B + 2Cx & &\implies & F'(a) &= B + 2Ca = f'(a) \\
F''(x) &= 2C & &\implies & F''(a) &= 2C = f''(a)
\end{aligned}
\]

Solve these for $C$ first, then $B$ and finally $A$.

\[
\begin{aligned}
C &= \tfrac{1}{2} f''(a) & &\text{substitute} \\
B &= f'(a) - 2Ca = f'(a) - a f''(a) & &\text{substitute again} \\
A &= f(a) - Ba - Ca^2 = f(a) - a\left[f'(a) - a f''(a)\right] - \tfrac{1}{2} f''(a) a^2
\end{aligned}
\]

Then put things back together to build up F (x):

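\[
\begin{aligned}
F(x) &= f(a) - f'(a)a + \tfrac{1}{2} f''(a) a^2 & &\text{(this line is $A$)} \\
&\quad + f'(a)x - f''(a)ax & &\text{(this line is $Bx$)} \\
&\quad + \tfrac{1}{2} f''(a) x^2 & &\text{(this line is $Cx^2$)} \\
&= f(a) + f'(a)(x - a) + \tfrac{1}{2} f''(a)(x - a)^2
\end{aligned}
\]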

Oof! We again write it in this form because we can now clearly see that our second approximation is
just an extension of our first approximation.
Our second approximation is called the quadratic approximation:

Equation 9.3.1 (Quadratic approximation).

\[ f(x) \approx f(a) + f'(a)(x - a) + \tfrac{1}{2} f''(a)(x - a)^2 \]

Here is a figure showing the graphs of a typical $f(x)$ and approximating function $F(x)$.

[Figure: the graph of $y = f(x)$ together with the parabola $y = F(x) = f(a) + f'(a)(x - a) + \frac{1}{2} f''(a)(x - a)^2$, which touches it at $x = a$.]

This new approximation looks better than both the zeroth and first.


Now there is actually an easier way to derive this approximation, which we show you now. Let
us rewrite⁵ $F(x)$ so that it is easy to evaluate it and its derivatives at $x = a$:

\[ F(x) = \alpha + \beta\cdot(x - a) + \gamma\cdot(x - a)^2 \]

Then

\[
\begin{aligned}
F(x) &= \alpha + \beta\cdot(x - a) + \gamma\cdot(x - a)^2 & F(a) &= \alpha = f(a) \\
F'(x) &= \beta + 2\gamma\cdot(x - a) & F'(a) &= \beta = f'(a) \\
F''(x) &= 2\gamma & F''(a) &= 2\gamma = f''(a)
\end{aligned}
\]

And from these we can clearly read off the values of α, β and γ and so recover our function
F (x). Additionally if we write things this way, then it is quite clear how to extend this to a cubic
approximation and a quartic approximation and so on.
Return to our example:
Example 9.3.2
Use the quadratic approximation to estimate $e^{0.1}$.
Solution. Set $f(x) = e^x$ and $a = 0$ as before.

• To form the quadratic approximation we need $f(a)$, $f'(a)$ and $f''(a)$:

\[
\begin{aligned}
f(x) &= e^x & f(0) &= 1 \\
f'(x) &= e^x & f'(0) &= 1 \\
f''(x) &= e^x & f''(0) &= 1
\end{aligned}
\]

• Then our quadratic approximation is

\[
\begin{aligned}
F(x) &= f(0) + x f'(0) + \frac{1}{2} x^2 f''(0) = 1 + x + \frac{x^2}{2} \\
F(0.1) &= 1.105
\end{aligned}
\]

5 Any polynomial of degree two can be written in this form. For example, when $a = 1$, $3 + 2x + x^2 = 6 + 4(x - 1) + (x - 1)^2$.


Recall that $e^{0.1} = 1.105170918\ldots$, so the quadratic approximation is quite accurate with very little
effort.
Example 9.3.2
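To see the improvement from one approximation to the next, the three estimates of $e^{0.1}$ from Examples 9.1.2, 9.2.2 and 9.3.2 can be compared side by side. This is an illustrative sketch only; the dictionary names are our own, and `math.exp` stands in for the true value.

```python
import math

x = 0.1
# f(x) = e^x about a = 0, where f(0) = f'(0) = f''(0) = 1
approximations = {
    "constant": 1.0,
    "linear": 1.0 + x,
    "quadratic": 1.0 + x + x**2 / 2,
}

# Absolute error of each approximation against the library value of e^0.1
errors = {name: abs(math.exp(x) - value) for name, value in approximations.items()}

for name in approximations:
    print(f"{name:9s} {approximations[name]:.6f}  error {errors[name]:.2e}")
```

Each extra term shrinks the error by a factor of roughly 20 to 30 at this value of $x$.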

Before we go on, let us first introduce (or revise) some notation that will make our discussion
easier.

Whirlwind tour of summation notation

In the remainder of this section we will frequently need to write sums involving a large number of
terms. Writing out the summands explicitly can become quite impractical — for example, say we
need the sum of the first 11 squares:

\[ 1 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 + 7^2 + 8^2 + 9^2 + 10^2 + 11^2 \]

This becomes tedious. Where the pattern is clear, we will often skip the middle few terms and
instead write

\[ 1 + 2^2 + \cdots + 11^2. \]

A far more precise way to write this is using Σ (capital-sigma) notation. For example, we can write
the above sum as

\[ \sum_{k=1}^{11} k^2 \]

This is read as

The sum from $k$ equals 1 to 11 of $k^2$.

More generally


Notation 9.3.3.

Let $m \le n$ be integers and let $f(x)$ be a function defined on the integers. Then we write
\[ \sum_{k=m}^{n} f(k) \]

to mean the sum of $f(k)$ for $k$ from $m$ to $n$:

\[ f(m) + f(m + 1) + f(m + 2) + \cdots + f(n - 1) + f(n). \]

Similarly we write
\[ \sum_{i=m}^{n} a_i \]

to mean

\[ a_m + a_{m+1} + a_{m+2} + \cdots + a_{n-1} + a_n \]

for some set of coefficients $\{a_m, \ldots, a_n\}$.

Consider the example

\[ \sum_{k=3}^{7} \frac{1}{k^2} = \frac{1}{3^2} + \frac{1}{4^2} + \frac{1}{5^2} + \frac{1}{6^2} + \frac{1}{7^2} \]

It is important to note that the right hand side of this expression evaluates to a number⁶; it does not
contain “k”. The summation index k is just a “dummy” variable and it does not have to be called k.
For example

\[ \sum_{k=3}^{7} \frac{1}{k^2} = \sum_{i=3}^{7} \frac{1}{i^2} = \sum_{j=3}^{7} \frac{1}{j^2} = \sum_{\ell=3}^{7} \frac{1}{\ell^2} \]

Also the summation index has no meaning outside the sum. For example

\[ k \sum_{k=3}^{7} \frac{1}{k^2} \]

has no mathematical meaning; it is gibberish⁷.

6 Some careful addition shows it is $\frac{46181}{176400}$.
7 Or possibly gobbledygook. For a discussion of statements without meaning and why one should avoid them we
recommend the book “Bendable learnings: the wisdom of modern management” by Don Watson.
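Σ notation maps directly onto a loop over the index. As a small sketch (the variable names are ours, not the text's), Python's built-in `sum` with exact `Fraction` arithmetic reproduces the two sums discussed above and shows that the index name is irrelevant.

```python
from fractions import Fraction

# Sum of the first 11 squares: k^2 for k = 1, ..., 11
sum_of_squares = sum(k**2 for k in range(1, 12))

# Sum of 1/k^2 for k = 3, ..., 7, kept exact with Fraction
reciprocal_sum = sum(Fraction(1, k**2) for k in range(3, 8))

# The index is a dummy variable: renaming k to j changes nothing
same_sum = sum(Fraction(1, j**2) for j in range(3, 8))

print(sum_of_squares, reciprocal_sum, reciprocal_sum == same_sum)
```

Note that `range(3, 8)` stops before 8, so it covers exactly the index values 3 through 7.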


9.4 Still better approximations: Taylor polynomials


We can use the same strategy to generate still better approximations by polynomials⁸ of any degree
we like. As was the case with the approximations above, we determine the coefficients of the
polynomial by requiring that, at the point $x = a$, the approximation and its first $n$ derivatives agree
with those of the original function.
Rather than simply moving to a cubic polynomial, let us try to write things in a more general
way. We will consider approximating the function $f(x)$ using a polynomial, $T_n(x)$, of degree $n$,
where $n$ is a non-negative integer. As we discussed above, the algebra is easier if we write
\[
\begin{aligned}
T_n(x) &= c_0 + c_1(x - a) + c_2(x - a)^2 + \cdots + c_n(x - a)^n \\
&= \sum_{k=0}^{n} c_k (x - a)^k & &\text{using $\Sigma$ notation}
\end{aligned}
\]
The above form⁹ ¹⁰ makes it very easy to evaluate this polynomial and its derivatives at $x = a$.
Before we proceed, we remind the reader of some notation (see Notation 3.3.13):
• Let $f(x)$ be a function and $k$ be a positive integer. We can denote its $k$th derivative with respect
to $x$ by
\[ \frac{d^k f}{dx^k} \qquad \left(\frac{d}{dx}\right)^k f(x) \qquad f^{(k)}(x) \]
Additionally we will need
Definition 9.4.1 (Factorial).

Let $n$ be a positive integer¹¹, then $n$-factorial, denoted $n!$, is the product

\[ n! = n \times (n - 1) \times \cdots \times 3 \times 2 \times 1 \]

Further, we use the convention that

0! = 1

The first few factorials are

1! = 1 2! = 2 3! = 6
4! = 24 5! = 120 6! = 720

8 Polynomials are generally a good choice for an approximating function since they are so easy to work with. De-
pending on the situation other families of functions may be more appropriate. For example if you are approximating
a periodic function, then sums of sines and cosines might be a better choice; this leads to Fourier series.
9 Any polynomial in x of degree n can also be expressed as a polynomial in (x ´ a) of the same degree n and vice
versa. So Tn (x) really still is a polynomial of degree n.
10 Furthermore when $x$ is close to $a$, $(x - a)^k$ decreases very quickly as $k$ increases, which often makes the "high $k$"
terms in $T_n(x)$ very small. This can be a considerable advantage when building up approximations by adding more
and more terms. If we were to rewrite $T_n(x)$ in the form $\sum_{k=0}^{n} b_k x^k$ the "high $k$" terms would typically not be very
small when $x$ is close to $a$.


Now consider Tn (x) and its derivatives:

\[
\begin{aligned}
T_n(x) &= c_0 + c_1(x - a) + c_2(x - a)^2 + c_3(x - a)^3 + \cdots + c_n(x - a)^n \\
T_n'(x) &= c_1 + 2c_2(x - a) + 3c_3(x - a)^2 + \cdots + n c_n(x - a)^{n-1} \\
T_n''(x) &= 2c_2 + 6c_3(x - a) + \cdots + n(n - 1)c_n(x - a)^{n-2} \\
T_n'''(x) &= 6c_3 + \cdots + n(n - 1)(n - 2)c_n(x - a)^{n-3} \\
&\;\;\vdots \\
T_n^{(n)}(x) &= n! \cdot c_n
\end{aligned}
\]
Now notice that when we substitute $x = a$ into the above expressions only the constant terms survive
and we get

\[
\begin{aligned}
T_n(a) &= c_0 \\
T_n'(a) &= c_1 \\
T_n''(a) &= 2 \cdot c_2 \\
T_n'''(a) &= 6 \cdot c_3 \\
&\;\;\vdots \\
T_n^{(n)}(a) &= n! \cdot c_n
\end{aligned}
\]

So now if we want to set the coefficients of $T_n(x)$ so that it agrees with $f(x)$ at $x = a$ then we need
\[ T_n(a) = c_0 = f(a) \qquad c_0 = f(a) = \frac{1}{0!} f(a) \]
We also want the first $n$ derivatives of $T_n(x)$ to agree with the derivatives of $f(x)$ at $x = a$, so
\[
\begin{aligned}
T_n'(a) &= c_1 = f'(a) & c_1 &= f'(a) = \frac{1}{1!} f'(a) \\
T_n''(a) &= 2 \cdot c_2 = f''(a) & c_2 &= \frac{1}{2} f''(a) = \frac{1}{2!} f''(a) \\
T_n'''(a) &= 6 \cdot c_3 = f'''(a) & c_3 &= \frac{1}{6} f'''(a) = \frac{1}{3!} f'''(a)
\end{aligned}
\]
More generally, making the $k$th derivatives agree at $x = a$ requires:
\[ T_n^{(k)}(a) = k! \cdot c_k = f^{(k)}(a) \qquad c_k = \frac{1}{k!} f^{(k)}(a) \]
And finally the $n$th derivative:
\[ T_n^{(n)}(a) = n! \cdot c_n = f^{(n)}(a) \qquad c_n = \frac{1}{n!} f^{(n)}(a) \]
Putting this all together we have

11 It is actually possible to define the factorial of positive real numbers and even negative numbers but it requires
more advanced calculus and is outside the scope of this course. The interested reader should look up the Gamma
function.


Equation 9.4.2 (Taylor polynomial).

\[
\begin{aligned}
f(x) \approx T_n(x) &= f(a) + f'(a)(x - a) + \frac{1}{2} f''(a) \cdot (x - a)^2 + \cdots + \frac{1}{n!} f^{(n)}(a) \cdot (x - a)^n \\
&= \sum_{k=0}^{n} \frac{1}{k!} f^{(k)}(a) \cdot (x - a)^k
\end{aligned}
\]

Let us formalise this definition.

Definition 9.4.3 (Taylor polynomial).

Let $a$ be a constant and let $n$ be a non-negative integer. The $n$th degree Taylor polynomial
for $f(x)$ about $x = a$ is
\[ T_n(x) = \sum_{k=0}^{n} \frac{1}{k!} f^{(k)}(a) \cdot (x - a)^k. \]

The special case $a = 0$ is called a Maclaurin¹² polynomial.

Before we proceed with some examples, a couple of remarks are in order.

• While we can compute a Taylor polynomial about any $a$-value (providing the derivatives exist),
in order to be a useful approximation, we must be able to compute $f(a), f'(a), \ldots, f^{(n)}(a)$
easily. This means we must choose the point $a$ with care. Indeed for many functions the
choice $a = 0$ is very natural, hence the prominence of Maclaurin polynomials.

• If we have computed the approximation $T_n(x)$, then we can readily extend this to the next
Taylor polynomial $T_{n+1}(x)$ since

\[ T_{n+1}(x) = T_n(x) + \frac{1}{(n + 1)!} f^{(n+1)}(a) \cdot (x - a)^{n+1} \]

This is very useful if we discover that $T_n(x)$ is an insufficient approximation, because then we
can produce $T_{n+1}(x)$ without having to start again from scratch.
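Definition 9.4.3 translates almost word for word into code once the derivative values at $a$ are known. The sketch below is one possible way to write it; the helper `taylor_polynomial` and its argument layout (a list holding $f(a), f'(a), \ldots, f^{(n)}(a)$) are assumptions made for illustration.

```python
import math

def taylor_polynomial(derivatives_at_a, a, x):
    """Evaluate T_n(x) = sum over k of f^(k)(a) / k! * (x - a)^k.

    derivatives_at_a is the list [f(a), f'(a), ..., f^(n)(a)],
    so the degree n is len(derivatives_at_a) - 1.
    """
    return sum(d / math.factorial(k) * (x - a) ** k
               for k, d in enumerate(derivatives_at_a))

# e^x about a = 0: every derivative equals 1 at 0
t2 = taylor_polynomial([1, 1, 1], 0.0, 0.1)

# Extending to T_3 adds exactly one more term to T_2
t3 = taylor_polynomial([1, 1, 1, 1], 0.0, 0.1)
print(t2, t3, t3 - t2)
```

The difference `t3 - t2` is the single extra term $\frac{1}{3!}(0.1)^3$, which matches the second remark above.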

9.5 Some examples


Let us return to our running example of ex :

12 The polynomials are named after Brook Taylor who devised a general method for constructing them in 1715.
Slightly later, Colin Maclaurin made extensive use of the special case a = 0 (with attribution of the general case to
Taylor) and it is now named after him. The special case of a = 0 was worked on previously by James Gregory
and Isaac Newton, and some specific cases were known to the 14th century Indian mathematician Madhava of
Sangamagrama.


Example 9.5.1
The constant, linear and quadratic approximations we used above were the first few Maclaurin
polynomial approximations of $e^x$. That is

\[ T_0(x) = 1 \qquad T_1(x) = 1 + x \qquad T_2(x) = 1 + x + \frac{x^2}{2} \]

Since $\frac{d}{dx} e^x = e^x$, the Maclaurin polynomials are very easy to compute. Indeed this invariance under
differentiation means that

\[ f^{(n)}(x) = e^x \quad n = 0, 1, 2, \ldots \qquad\text{so}\qquad f^{(n)}(0) = 1 \]

Substituting this into equation (9.4.2) we get

\[ T_n(x) = \sum_{k=0}^{n} \frac{1}{k!} x^k \]

Thus we can write down the seventh Maclaurin polynomial very easily:

\[ T_7(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \frac{x^5}{120} + \frac{x^6}{720} + \frac{x^7}{5040} \]

The following figure contains sketches of the graphs of $e^x$ and its Taylor polynomials $T_n(x)$ for
$n = 0, 1, 2, 3, 4$.

[Figure: graphs of $y = e^x$, $y = T_0(x) = 1$, $y = T_1(x) = 1 + x$, $y = T_2(x) = 1 + x + \frac{x^2}{2}$, $y = T_3(x)$ and $y = T_4(x)$ for $x$ between $-1$ and $2$.]


Also notice that if we use $T_7(1)$ to approximate the value of $e^1$ we obtain:

\[
\begin{aligned}
e^1 \approx T_7(1) &= 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \frac{1}{120} + \frac{1}{720} + \frac{1}{5040} \\
&= \frac{685}{252} = 2.718253968\ldots
\end{aligned}
\]
The true value of $e$ is $2.718281828\ldots$, so the approximation has an error of about $3 \times 10^{-5}$.
Under the assumption that the accuracy of the approximation improves with $n$ (an assumption
we examine in Subsection 9.6 below) we can see that the approximation of $e$ above can be improved
by adding more and more terms. Indeed this is how the expression for $e$ in equation (3.5.2) in
Section 3.5 comes about.
Example 9.5.1
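The fraction $\frac{685}{252}$ in Example 9.5.1 can be confirmed with exact rational arithmetic. This is a hedged sketch; the function name `maclaurin_exp` is our own, and the comparison uses Python's built-in constant `math.e`.

```python
from fractions import Fraction
import math

def maclaurin_exp(n, x):
    """n-th Maclaurin polynomial of e^x: the sum of x^k / k! for k = 0, ..., n."""
    return sum(Fraction(x) ** k / math.factorial(k) for k in range(n + 1))

# Exact value of T_7(1) as a fraction
t7 = maclaurin_exp(7, 1)

# Gap between e and the degree 7 approximation
error = math.e - float(t7)
print(t7, float(t7), error)
```

Raising `n` in `maclaurin_exp(n, 1)` adds further terms $\frac{1}{k!}$ and moves the result closer to $e$.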
Now that we have examined Maclaurin polynomials for $e^x$ we should take a look at $\log x$. Notice
that we cannot compute a Maclaurin polynomial for $\log x$ since it is not defined at $x = 0$.
Example 9.5.2
Compute the 5th Taylor polynomial for $\log x$ about $x = 1$.
Solution. We have been told $a = 1$ and fifth degree, so we should start by writing down the function
and its first five derivatives:

\[
\begin{aligned}
f(x) &= \log x & f(1) &= \log 1 = 0 \\
f'(x) &= \frac{1}{x} & f'(1) &= 1 \\
f''(x) &= \frac{-1}{x^2} & f''(1) &= -1 \\
f'''(x) &= \frac{2}{x^3} & f'''(1) &= 2 \\
f^{(4)}(x) &= \frac{-6}{x^4} & f^{(4)}(1) &= -6 \\
f^{(5)}(x) &= \frac{24}{x^5} & f^{(5)}(1) &= 24
\end{aligned}
\]
Substituting this into equation (9.4.2) gives
\[
\begin{aligned}
T_5(x) &= 0 + 1 \cdot (x - 1) + \frac{1}{2} \cdot (-1) \cdot (x - 1)^2 + \frac{1}{6} \cdot 2 \cdot (x - 1)^3 + \frac{1}{24} \cdot (-6) \cdot (x - 1)^4 + \frac{1}{120} \cdot 24 \cdot (x - 1)^5 \\
&= (x - 1) - \frac{1}{2}(x - 1)^2 + \frac{1}{3}(x - 1)^3 - \frac{1}{4}(x - 1)^4 + \frac{1}{5}(x - 1)^5
\end{aligned}
\]
Again, it is not too hard to generalise the above work to find the Taylor polynomial of degree $n$:
With a little work one can show that
\[ T_n(x) = \sum_{k=1}^{n} \frac{(-1)^{k+1}}{k} (x - 1)^k. \]

Example 9.5.2
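The general formula at the end of Example 9.5.2 is easy to try out. The sketch below evaluates it at $x = 1.1$, a value we picked for illustration because it is close to the centre $a = 1$; the name `log_taylor` is likewise our own.

```python
import math

def log_taylor(n, x):
    """n-th Taylor polynomial of log x about a = 1 (natural logarithm)."""
    return sum((-1) ** (k + 1) / k * (x - 1) ** k for k in range(1, n + 1))

x = 1.1
t5 = log_taylor(5, x)

# Compare with the library's natural logarithm
error = abs(math.log(x) - t5)
print(t5, math.log(x), error)
```

The sum starts at $k = 1$ because the constant term $f(1) = \log 1$ is zero.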
For cosine:


Example 9.5.3
Find the 4th degree Maclaurin polynomial for $\cos x$.
Solution. We have $a = 0$ and we need to find the first 4 derivatives of $\cos x$.

\[
\begin{aligned}
f(x) &= \cos x & f(0) &= 1 \\
f'(x) &= -\sin x & f'(0) &= 0 \\
f''(x) &= -\cos x & f''(0) &= -1 \\
f'''(x) &= \sin x & f'''(0) &= 0 \\
f^{(4)}(x) &= \cos x & f^{(4)}(0) &= 1
\end{aligned}
\]

Substituting this into equation (9.4.2) gives

\[
\begin{aligned}
T_4(x) &= 1 + 1 \cdot (0) \cdot x + \frac{1}{2} \cdot (-1) \cdot x^2 + \frac{1}{6} \cdot 0 \cdot x^3 + \frac{1}{24} \cdot (1) \cdot x^4 \\
&= 1 - \frac{x^2}{2} + \frac{x^4}{24}
\end{aligned}
\]
Notice that since the 4th derivative of $\cos x$ is $\cos x$ again, we also have that the fifth derivative is the
same as the first derivative, and the sixth derivative is the same as the second derivative and so on.
Hence the next four derivatives are

\[
\begin{aligned}
f^{(4)}(x) &= \cos x & f^{(4)}(0) &= 1 \\
f^{(5)}(x) &= -\sin x & f^{(5)}(0) &= 0 \\
f^{(6)}(x) &= -\cos x & f^{(6)}(0) &= -1 \\
f^{(7)}(x) &= \sin x & f^{(7)}(0) &= 0 \\
f^{(8)}(x) &= \cos x & f^{(8)}(0) &= 1
\end{aligned}
\]

Using this we can find the 8th degree Maclaurin polynomial:

\[ T_8(x) = 1 - \frac{x^2}{2} + \frac{x^4}{24} - \frac{x^6}{6!} + \frac{x^8}{8!} \]
Continuing this process gives us the $2n$th Maclaurin polynomial
\[ T_{2n}(x) = \sum_{k=0}^{n} \frac{(-1)^k}{(2k)!} \cdot x^{2k} \]

Warning 9.5.4.

The above formula only works when x is measured in radians, because all of our derivative
formulae for trig functions were developed under the assumption that angles are measured
in radians.

Below we plot $\cos x$ against its first few Maclaurin polynomial approximations:

[Figure: four panels comparing $\cos x$ with $1$, $1 - \frac{1}{2!}x^2$, $1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4$ and $1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \frac{1}{6!}x^6$.]

Example 9.5.4
The above work is quite easily recycled to get the Maclaurin polynomial for sine:
Example 9.5.5
Find the 5th degree Maclaurin polynomial for $\sin x$.
Solution. We could simply work as before and compute the first five derivatives of $\sin x$. But set
$g(x) = \sin x$ and notice that $g(x) = -f'(x)$, where $f(x) = \cos x$. Then we have

\[
\begin{aligned}
g(0) &= -f'(0) = 0 \\
g'(0) &= -f''(0) = 1 \\
g''(0) &= -f'''(0) = 0 \\
g'''(0) &= -f^{(4)}(0) = -1 \\
g^{(4)}(0) &= -f^{(5)}(0) = 0 \\
g^{(5)}(0) &= -f^{(6)}(0) = 1
\end{aligned}
\]

Hence the required Maclaurin polynomial is

\[ T_5(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} \]


Just as we extended to the $2n$th Maclaurin polynomial for cosine, we can also extend our work to
compute the $(2n + 1)$th Maclaurin polynomial for sine:
\[ T_{2n+1}(x) = \sum_{k=0}^{n} \frac{(-1)^k}{(2k + 1)!} \cdot x^{2k+1} \]

Warning 9.5.6.

The above formula only works when x is measured in radians, because all of our derivative
formulae for trig functions were developed under the assumption that angles are measured
in radians.

Below we plot $\sin x$ against its first few Maclaurin polynomial approximations.

[Figure: four panels comparing $\sin x$ with $x$, $x - \frac{1}{3!}x^3$, $x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5$ and $x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \frac{1}{7!}x^7$.]

Example 9.5.6
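The two summation formulas for cosine and sine can be evaluated directly. The following sketch uses our own function names and a sample point $x = 0.1$ radians; it is meant only to illustrate the formulas, not to suggest how a library computes these functions.

```python
import math

def maclaurin_cos(n, x):
    """Maclaurin polynomial T_{2n}(x) of cos x, with x in radians."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k)
               for k in range(n + 1))

def maclaurin_sin(n, x):
    """Maclaurin polynomial T_{2n+1}(x) of sin x, with x in radians."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n + 1))

x = 0.1
sin_error = abs(math.sin(x) - maclaurin_sin(2, x))  # degree 5 polynomial
cos_error = abs(math.cos(x) - maclaurin_cos(2, x))  # degree 4 polynomial
print(sin_error, cos_error)
```

With `n = 0` the sine polynomial reduces to $x$ itself, which is the approximation $\sin(1/10) \approx 1/10$ used at the start of the chapter.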

To get an idea of how good these Taylor polynomials are at approximating sin and cos, let’s
concentrate on $\sin x$ and consider $x$’s whose magnitude $|x| \le 1$. There are tricks that you can
employ¹³ to evaluate sine and cosine at values of $x$ outside this range.
If $|x| \le 1$ radians¹⁴, then the magnitudes of the successive terms in the Taylor polynomials for
$\sin x$ are bounded by

\[
\begin{aligned}
|x| &\le 1 & \tfrac{1}{3!}|x|^3 &\le \tfrac{1}{6} & \tfrac{1}{5!}|x|^5 &\le \tfrac{1}{120} \approx 0.0083 \\
\tfrac{1}{7!}|x|^7 &\le \tfrac{1}{7!} \approx 0.0002 & \tfrac{1}{9!}|x|^9 &\le \tfrac{1}{9!} \approx 0.000003 & \tfrac{1}{11!}|x|^{11} &\le \tfrac{1}{11!} \approx 0.000000025
\end{aligned}
\]

From these inequalities, and the graphs on the previous pages, it certainly looks like, for $x$ not too
large, even relatively low degree Taylor polynomials give very good approximations. In Section 9.6
we’ll see how to get rigorous error bounds on our Taylor polynomial approximations.

13 If you are writing software to evaluate $\sin x$, you can always use the trig identity $\sin(x) = \sin(x - 2n\pi)$, to easily
restrict to $|x| \le \pi$. You can then use the trig identity $\sin(x) = -\sin(x \pm \pi)$ to reduce to $|x| \le \frac{\pi}{2}$. Finally you can
use the trig identity $\sin(x) = \mp\cos(\frac{\pi}{2} \pm x)$ to reduce to $|x| \le \frac{\pi}{4} < 1$.
14 Recall that the derivative formulae that we used to derive the Taylor polynomials are valid only when $x$ is in radians.
The restriction $-1 \le x \le 1$ radians translates to angles bounded by $\frac{180}{\pi} \approx 57^\circ$.

9.6 (Flavour A) Error in Taylor Polynomials

Learning Objectives
• Be able to use the formula for the error in Taylor polynomial approximations, and
interpret its result. For example: determine a bound on the error of a polynomial
approximation at a point; determine a range for which a particular approximation has an
error within a certain tolerance; or determine which degree Taylor approximation will
result in an error within a certain tolerance.

Any time you make an approximation, it is desirable to have some idea of the size of the error you
introduced. That is, we would like to know the difference R(x) between the original function f (x)
and our approximation F (x):

\[ R(x) = f(x) - F(x). \]

Of course if we know $R(x)$ exactly, then we could recover $f(x) = F(x) + R(x)$, so this is an
unrealistic hope. In practice we would simply like to bound $R(x)$:

\[ |R(x)| = |f(x) - F(x)| \le M \]

where (hopefully) $M$ is some small number. It is worth stressing that we do not need the tightest
possible value of $M$, we just need a relatively easily computed $M$ that isn’t too far off the true value
of $|f(x) - F(x)|$.
We will now develop a formula for the error introduced by the constant approximation,
equation (9.1.1) (developed back in Section 9.1)

\[ f(x) \approx f(a) = T_0(x) \qquad\text{0th Taylor polynomial} \]

The resulting formula can be used to get an upper bound on the size of the error $|R(x)|$.


Consider the following obvious statement:

\[
\begin{aligned}
f(x) &= f(x) & &\text{now some sneaky manipulations} \\
&= f(a) + (f(x) - f(a)) \\
&= \underbrace{f(a)}_{=T_0(x)} + (f(x) - f(a)) \cdot \underbrace{\frac{x - a}{x - a}}_{=1} \\
&= T_0(x) + \underbrace{\frac{f(x) - f(a)}{x - a}}_{\text{looks familiar}} \cdot (x - a)
\end{aligned}
\]

Indeed, this equation is important in the discussion that follows, so we’ll highlight it

Equation 9.6.1 (We will need it again soon).


 
\[ f(x) = T_0(x) + \left[\frac{f(x) - f(a)}{x - a}\right](x - a) \]

The coefficient $\frac{f(x) - f(a)}{x - a}$ of $(x - a)$ is the average slope of $f(t)$ as $t$ moves from $t = a$ to
$t = x$. We can picture this as the slope of the secant joining the points $(a, f(a))$ and $(x, f(x))$ in the
sketch below.

[Figure: the graph of $y = f(t)$ with the secant line joining $(a, f(a))$ and $(x, f(x))$; a point $c$ is marked on the $t$-axis between $a$ and $x$.]

As $t$ moves from $a$ to $x$, the instantaneous slope $f'(t)$ keeps changing. Sometimes $f'(t)$ might
be larger than the average slope $\frac{f(x) - f(a)}{x - a}$, and sometimes $f'(t)$ might be smaller than the average
slope $\frac{f(x) - f(a)}{x - a}$.
However, by the Mean-Value Theorem (see 2 – not assessable), there must be some
number $c$, strictly between $a$ and $x$, for which $f'(c) = \frac{f(x) - f(a)}{x - a}$ exactly.
x´a
Substituting this into formula (9.6.1) gives

Equation 9.6.2 (Towards the error).

\[ f(x) = T_0(x) + f'(c)(x - a) \qquad\text{for some $c$ strictly between $a$ and $x$} \]


Notice that this expression as it stands is not quite what we want. Let us massage this around a
little more into a more useful form
Equation 9.6.3 (The error in constant approximation).

\[ f(x) - T_0(x) = f'(c) \cdot (x - a) \qquad\text{for some $c$ strictly between $a$ and $x$} \]

Notice that the MVT doesn’t tell us the value of $c$, however we do know that it lies strictly
between $x$ and $a$. So if we can get a good bound on $f'(c)$ on this interval then we can get a good
bound on the error.
Example 9.6.4
Let us return to Example 9.1.2, and we’ll try to bound the error in our approximation of $e^{0.1}$.

• Recall that $f(x) = e^x$, $a = 0$ and $T_0(x) = e^0 = 1$.

• Then by equation (9.6.3)

\[ e^{0.1} - T_0(0.1) = f'(c) \cdot (0.1 - 0) \qquad\text{with } 0 < c < 0.1 \]

• Now $f'(c) = e^c$, so we need to bound $e^c$ on $(0, 0.1)$. Since $e^c$ is an increasing function, we
know that

\[ e^0 < f'(c) < e^{0.1} \qquad\text{when } 0 < c < 0.1 \]

So one is tempted to write that

\[
\begin{aligned}
|e^{0.1} - T_0(0.1)| = |R(0.1)| &= |f'(c)| \cdot (0.1 - 0) \\
&< e^{0.1} \cdot 0.1
\end{aligned}
\]

And while this is true, it is rather circular. We have just bounded the error in our approximation
of $e^{0.1}$ by $\frac{1}{10} e^{0.1}$; if we actually knew $e^{0.1}$ then we wouldn’t need to estimate it!

• While we don’t know $e^{0.1}$ exactly, we do knowᵃ that $1 = e^0 < e^{0.1} < e^1 < 3$. This gives us

\[ |R(0.1)| < 3 \times 0.1 = 0.3 \]

That is, the error in our approximation of $e^{0.1}$ is no greater than 0.3. Recall that we don’t
need the error exactly, we just need a good idea of how large it actually is.

• In fact the real error here is

\[ |e^{0.1} - T_0(0.1)| = |e^{0.1} - 1| = 0.1051709\ldots \]

so we have over-estimated the error by a factor of 3.


But we can actually go a little further here: we can bound the error above and below. If we do
not take absolute values, then since

\[ e^{0.1} - T_0(0.1) = f'(c) \cdot 0.1 \qquad\text{and}\qquad 1 < f'(c) < 3 \]

we can write

\[ 1 \times 0.1 \le \left(e^{0.1} - T_0(0.1)\right) \le 3 \times 0.1 \]

so

\[
\begin{aligned}
T_0(0.1) + 0.1 &\le e^{0.1} \le T_0(0.1) + 0.3 \\
1.1 &\le e^{0.1} \le 1.3
\end{aligned}
\]

So while the upper bound is weak, the lower bound is quite tight.
Example 9.6.4
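The two-sided bound from Example 9.6.4 can be sanity-checked numerically. This sketch uses `math.exp` purely as a reference value, which is of course the very quantity the example avoids assuming; the variable names are ours.

```python
import math

x, a = 0.1, 0.0

# f(x) - T_0(x), where T_0(x) = e^0 = 1
actual_error = math.exp(x) - 1.0

# f'(c) = e^c with 0 < c < 0.1, and 1 < e^c < 3 on that interval
lower_bound = 1.0 * (x - a)
upper_bound = 3.0 * (x - a)
print(lower_bound, actual_error, upper_bound)
```

The actual error sits just above the lower bound and well below the upper bound, in line with the remark that the lower bound is tight and the upper bound is weak.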

There are formulae similar to equation (9.6.2), that can be used to bound the error in our
other approximations; all are based on generalisations of the MVT. The next one, for linear
approximations, is

\[ f(x) = \underbrace{f(a) + f'(a)(x - a)}_{=T_1(x)} + \tfrac{1}{2} f''(c)(x - a)^2 \qquad\text{for some $c$ strictly between $a$ and $x$} \]

which we can rewrite in terms of $T_1(x)$:

Equation 9.6.5 (The error in linear approximation).

\[ f(x) - T_1(x) = \tfrac{1}{2} f''(c)(x - a)^2 \qquad\text{for some $c$ strictly between $a$ and $x$} \]

a Oops! Do we really know that $e < 3$? We haven’t proved it. We will do so soon.

It implies that the error that we make when we approximate $f(x)$ by $T_1(x) = f(a) + f'(a)(x - a)$
is exactly $\frac{1}{2} f''(c)(x - a)^2$ for some $c$ strictly between $a$ and $x$.
More generally
\[ f(x) = \underbrace{f(a) + f'(a) \cdot (x - a) + \cdots + \frac{1}{n!} f^{(n)}(a) \cdot (x - a)^n}_{=T_n(x)} + \frac{1}{(n + 1)!} f^{(n+1)}(c) \cdot (x - a)^{n+1} \]

for some $c$ strictly between $a$ and $x$. Again, rewriting this in terms of $T_n(x)$ gives


Equation 9.6.6.

\[ f(x) - T_n(x) = \frac{1}{(n + 1)!} f^{(n+1)}(c) \cdot (x - a)^{n+1} \qquad\text{for some $c$ strictly between $a$ and $x$} \]

That is, the error introduced when f (x) is approximated by its Taylor polynomial of degree n, is
precisely the last term of the Taylor polynomial of degree n + 1, but with the derivative evaluated
at some point between a and x, rather than exactly at a. These error formulae are proven in the
optional Section 9.7 later in this chapter.
Example 9.6.7
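Approximate $\sin 46^\circ$ using Taylor polynomials about $a = 45^\circ$, and estimate the resulting error.
Solution.

• Start by defining $f(x) = \sin x$ and

\[ a = 45^\circ = 45\tfrac{\pi}{180} \text{ radians} \qquad x = 46^\circ = 46\tfrac{\pi}{180} \text{ radians} \qquad x - a = \tfrac{\pi}{180} \text{ radians} \]

• The first few derivatives of $f$ at $a$ are

\begin{align*}
f(x) &= \sin x & f(a) &= \tfrac{1}{\sqrt{2}} \\
f'(x) &= \cos x & f'(a) &= \tfrac{1}{\sqrt{2}} \\
f''(x) &= -\sin x & f''(a) &= -\tfrac{1}{\sqrt{2}} \\
f^{(3)}(x) &= -\cos x & f^{(3)}(a) &= -\tfrac{1}{\sqrt{2}}
\end{align*}

• The constant, linear and quadratic Taylor approximations for $\sin(x)$ about $\tfrac{\pi}{4}$ are

\begin{align*}
T_0(x) &= f(a) = \tfrac{1}{\sqrt{2}} \\
T_1(x) &= T_0(x) + f'(a)\cdot(x-a) = \tfrac{1}{\sqrt{2}} + \tfrac{1}{\sqrt{2}}\left(x - \tfrac{\pi}{4}\right) \\
T_2(x) &= T_1(x) + \tfrac{1}{2} f''(a)\cdot(x-a)^2 = \tfrac{1}{\sqrt{2}} + \tfrac{1}{\sqrt{2}}\left(x - \tfrac{\pi}{4}\right) - \tfrac{1}{2\sqrt{2}}\left(x - \tfrac{\pi}{4}\right)^2
\end{align*}

• So the approximations for $\sin 46^\circ$ are

\begin{align*}
\sin 46^\circ &\approx T_0\left(\tfrac{46\pi}{180}\right) = \tfrac{1}{\sqrt{2}} = 0.70710678 \\
\sin 46^\circ &\approx T_1\left(\tfrac{46\pi}{180}\right) = \tfrac{1}{\sqrt{2}} + \tfrac{1}{\sqrt{2}}\left(\tfrac{\pi}{180}\right) = 0.71944812 \\
\sin 46^\circ &\approx T_2\left(\tfrac{46\pi}{180}\right) = \tfrac{1}{\sqrt{2}} + \tfrac{1}{\sqrt{2}}\left(\tfrac{\pi}{180}\right) - \tfrac{1}{2\sqrt{2}}\left(\tfrac{\pi}{180}\right)^2 = 0.71934042
\end{align*}

• The errors in those approximations are (respectively)

\begin{align*}
\text{error in } 0.70710678 &= f'(c)(x-a) = \cos c \cdot \left(\tfrac{\pi}{180}\right) \\
\text{error in } 0.71944812 &= \tfrac{1}{2} f''(c)(x-a)^2 = -\tfrac{1}{2}\cdot \sin c \cdot \left(\tfrac{\pi}{180}\right)^2 \\
\text{error in } 0.71934042 &= \tfrac{1}{3!} f^{(3)}(c)(x-a)^3 = -\tfrac{1}{3!}\cdot \cos c \cdot \left(\tfrac{\pi}{180}\right)^3
\end{align*}

In each of these three cases $c$ must lie somewhere between $45^\circ$ and $46^\circ$.

• Rather than carefully estimating $\sin c$ and $\cos c$ for $c$ in that range, we make use of a cruder
(but much easier) bound. No matter what $c$ is, we know that $|\sin c| \le 1$ and $|\cos c| \le 1$. Hence

\begin{align*}
\big|\text{error in } 0.70710678\big| &\le \left(\tfrac{\pi}{180}\right) < 0.018 \\
\big|\text{error in } 0.71944812\big| &\le \tfrac{1}{2}\left(\tfrac{\pi}{180}\right)^2 < 0.00016 \\
\big|\text{error in } 0.71934042\big| &\le \tfrac{1}{3!}\left(\tfrac{\pi}{180}\right)^3 < 0.0000009
\end{align*}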

Example 9.6.7
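The numbers in the example above can be checked numerically. The following is a minimal Python sketch, not part of the original text; the variable names (`t0`, `t1`, `t2`, `bounds`) are our own, and the bounds are the ones obtained from equation (9.6.6) using $|\sin c| \le 1$ and $|\cos c| \le 1$.

```python
import math

a = math.radians(45)   # expansion point, 45 degrees in radians
x = math.radians(46)   # evaluation point, 46 degrees in radians
h = x - a              # x - a = pi/180

s = 1 / math.sqrt(2)   # sin(a) = cos(a) = 1/sqrt(2)

# Constant, linear and quadratic Taylor polynomials of sin about a, evaluated at x
t0 = s
t1 = t0 + s * h
t2 = t1 - s * h**2 / 2

exact = math.sin(x)

# Error bounds from equation (9.6.6) with |sin c| <= 1 and |cos c| <= 1
bounds = [h, h**2 / 2, h**3 / 6]

for n, (approx, bound) in enumerate(zip([t0, t1, t2], bounds)):
    err = abs(exact - approx)
    print(f"T{n}: {approx:.8f}  actual error {err:.2e}  bound {bound:.2e}")
```

In each row the actual error should come out smaller than the corresponding bound, as the error formula promises.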

Example 9.6.8 (Showing $e < 3$)

In Example 9.6.4 above we used the fact that $e < 3$ without actually proving it. Let’s do so now.

• Consider the linear approximation of $e^x$ about $a = 0$.

\[ T_1(x) = f(0) + f'(0)\cdot x = 1 + x \]

So at $x = 1$ we have

\[ e \approx T_1(1) = 2 \]

• The error in this approximation is

\[ e^x - T_1(x) = \tfrac{1}{2} f''(c)\cdot x^2 = \tfrac{e^c}{2}\cdot x^2 \]

So at $x = 1$ we have

\[ e - T_1(1) = \tfrac{e^c}{2} \]

where $0 < c < 1$.

• Now since $e^x$ is an increasing$^a$ function, it follows that $e^c < e$. Hence

\[ e - T_1(1) = \tfrac{e^c}{2} < \tfrac{e}{2} \]

Moving the $\tfrac{e}{2}$ to the left hand side and the $T_1(1)$ to the right hand side gives

\[ \tfrac{e}{2} < T_1(1) = 2 \]

So $e < 4$.

• This isn’t as tight as we would like, so now do the same with the quadratic approximation
with $a = 0$:

\[ e^x \approx T_2(x) = 1 + x + \tfrac{x^2}{2} \]

So when $x = 1$ we have

\[ e \approx T_2(1) = 1 + 1 + \tfrac{1}{2} = \tfrac{5}{2} \]

• The error in this approximation is

\[ e^x - T_2(x) = \tfrac{1}{3!} f'''(c)\cdot x^3 = \tfrac{e^c}{6}\cdot x^3 \]

So at $x = 1$ we have

\[ e - T_2(1) = \tfrac{e^c}{6} \]

where $0 < c < 1$.

• Again since $e^x$ is an increasing function we have $e^c < e$. Hence

\[ e - T_2(1) = \tfrac{e^c}{6} < \tfrac{e}{6} \]

That is

\[ \tfrac{5e}{6} < T_2(1) = \tfrac{5}{2} \]

So $e < 3$ as required.

Example 9.6.8

Example 9.6.9 (More on $e^x$)

We wrote down the general $n$th degree Maclaurin polynomial approximation of $e^x$ in Example 9.5.1
above.

• Recall that

\[ T_n(x) = \sum_{k=0}^{n} \frac{1}{k!} x^k \]

• The error in this approximation is (by equation (9.6.6))

\[ e^x - T_n(x) = \frac{1}{(n+1)!} e^c x^{n+1} \]

where $c$ is some number between 0 and $x$.

• So setting $x = 1$ in this gives

\[ e - T_n(1) = \frac{1}{(n+1)!} e^c \]

where $0 < c < 1$.

• Since $e^x$ is an increasing function we know that $1 = e^0 < e^c < e^1 < 3$, so the above expression
becomes

\[ \frac{1}{(n+1)!} \le e - T_n(1) = \frac{1}{(n+1)!} e^c \le \frac{3}{(n+1)!} \]

• So when $n = 9$ we have

\[ \frac{1}{10!} \le e - \left(1 + 1 + \frac{1}{2} + \cdots + \frac{1}{9!}\right) \le \frac{3}{10!} \]

• Now $1/10! < 3/10! < 10^{-6}$, so the approximation of $e$ by

\[ e \approx 1 + 1 + \frac{1}{2} + \cdots + \frac{1}{9!} = \frac{98641}{36288} = 2.718281\ldots \]

is correct to 6 decimal places.

• More generally we know that using $T_n(1)$ to approximate $e$ will have an error of at most
$\frac{3}{(n+1)!}$, so it converges very quickly.

Example 9.6.9
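The bounds on $e$ from this example can be reproduced with exact rational arithmetic. This is an illustrative Python sketch only; the helper name `maclaurin_exp_at_one` is our own, and `math.e` is used purely as a reference value for comparison.

```python
from fractions import Fraction
from math import factorial, e

def maclaurin_exp_at_one(n):
    """Return T_n(1) = sum_{k=0}^{n} 1/k! as an exact fraction."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))

t9 = maclaurin_exp_at_one(9)
lower = t9 + Fraction(1, factorial(10))   # T_9(1) + 1/10!
upper = t9 + Fraction(3, factorial(10))   # T_9(1) + 3/10!

print(t9, float(t9))
print(float(lower), "<= e <=", float(upper))
```

The exact fraction printed for `t9` should agree with the value $\frac{98641}{36288}$ quoted above, and the reference value of $e$ should fall between the two bounds.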

a Since the derivative of $e^x$ is $e^x$, which is positive everywhere, the function is increasing everywhere.

9.7 (Optional) Derivation of the error formulae

In this section we will derive the formula for the error that we gave in equation (9.6.6), namely

\[ R_n(x) = f(x) - T_n(x) = \frac{1}{(n+1)!} f^{(n+1)}(c)\cdot(x-a)^{n+1} \]

for some $c$ strictly between $a$ and $x$, and where $T_n(x)$ is the $n$th degree Taylor polynomial
approximation of $f(x)$ about $x = a$:

\[ T_n(x) = \sum_{k=0}^{n} \frac{1}{k!} f^{(k)}(a)\cdot(x-a)^k. \]

Theorem 9.7.1 (Generalised Mean-Value Theorem).

Let the functions $F(x)$ and $G(x)$ both be defined and continuous on $a \le x \le b$ and both be
differentiable on $a < x < b$. Furthermore, suppose that $G'(x) \ne 0$ for all $a < x < b$. Then,
there is a number $c$ obeying $a < c < b$ such that

\[ \frac{F(b) - F(a)}{G(b) - G(a)} = \frac{F'(c)}{G'(c)} \]

Notice that setting $G(x) = x$ recovers the original Mean-Value Theorem. It turns out that this
theorem is not too difficult to prove from the MVT using some sneaky algebraic manipulations:

Proof. • First we construct a new function $h(x)$ as a linear combination of $F(x)$ and $G(x)$ so
that $h(a) = h(b) = 0$. Some experimentation yields

\[ h(x) = \big[F(b) - F(a)\big]\cdot\big[G(x) - G(a)\big] - \big[G(b) - G(a)\big]\cdot\big[F(x) - F(a)\big] \]

• Since $h(a) = h(b) = 0$, the Mean-Value theorem (actually Rolle’s theorem) tells us that there
is a number $c$ obeying $a < c < b$ such that $h'(c) = 0$:

\begin{align*}
h'(x) &= \big[F(b) - F(a)\big]\cdot G'(x) - \big[G(b) - G(a)\big]\cdot F'(x) && \text{so} \\
0 &= \big[F(b) - F(a)\big]\cdot G'(c) - \big[G(b) - G(a)\big]\cdot F'(c)
\end{align*}

Now move the $G'(c)$ terms to one side and the $F'(c)$ terms to the other:

\[ \big[F(b) - F(a)\big]\cdot G'(c) = \big[G(b) - G(a)\big]\cdot F'(c). \]

• Since we have $G'(x) \ne 0$, we know that $G'(c) \ne 0$. Further the Mean-Value theorem ensures$^{15}$
that $G(a) \ne G(b)$. Hence we can move terms about to get

\begin{align*}
F(b) - F(a) &= \big[G(b) - G(a)\big]\cdot\frac{F'(c)}{G'(c)} \\
\frac{F(b) - F(a)}{G(b) - G(a)} &= \frac{F'(c)}{G'(c)}
\end{align*}

as required.

Armed with the above theorem we can now move on to the proof of the Taylor remainder
formula.

Proof of equation (9.6.6). We begin by proving the remainder formula for $n = 1$. That is

\[ f(x) - T_1(x) = \tfrac{1}{2} f''(c)\cdot(x-a)^2 \]

• Start by setting

\[ F(x) = f(x) - T_1(x) \qquad G(x) = (x-a)^2 \]

Notice that, since $T_1(a) = f(a)$ and $T_1'(x) = f'(a)$,

\begin{align*}
F(a) &= 0 & G(a) &= 0 \\
F'(x) &= f'(x) - f'(a) & G'(x) &= 2(x-a)
\end{align*}

• Now apply the generalised MVT with $b = x$: there exists a point $q$ between $a$ and $x$ such that

\begin{align*}
\frac{F(x) - F(a)}{G(x) - G(a)} &= \frac{F'(q)}{G'(q)} \\
\frac{F(x) - 0}{G(x) - 0} &= \frac{f'(q) - f'(a)}{2(q-a)} \\
2\cdot\frac{F(x)}{G(x)} &= \frac{f'(q) - f'(a)}{q-a}
\end{align*}

15 Otherwise if $G(a) = G(b)$ the MVT tells us that there is some point $c$ between $a$ and $b$ so that $G'(c) = 0$.

• Consider the right-hand side of the above equation and set $g(x) = f'(x)$. Then we have the
term $\frac{g(q) - g(a)}{q-a}$, which is exactly the form needed to apply the MVT. So now apply the standard
MVT to the right-hand side of the above equation: there is some $c$ between $q$ and $a$ so that

\[ \frac{f'(q) - f'(a)}{q-a} = \frac{g(q) - g(a)}{q-a} = g'(c) = f''(c) \]

Notice that here we have assumed that $f''(x)$ exists.

• Putting this together we have that

\begin{align*}
2\cdot\frac{F(x)}{G(x)} &= \frac{f'(q) - f'(a)}{q-a} = f''(c) \\
2\,\frac{f(x) - T_1(x)}{(x-a)^2} &= f''(c) \\
f(x) - T_1(x) &= \frac{1}{2!} f''(c)\cdot(x-a)^2
\end{align*}

as required.
Oof! We have now proved the case $n = 1$ (and we did $n = 0$ earlier).
To proceed, assume we have proved our result for $n = 1, 2, \ldots, k-1$. We realise that we haven’t
done this yet, but bear with us. Using that assumption we will prove the result is true for $n = k$.
Once we have done that, then
• we have proved the result is true for $n = 1$, and

• we have shown if the result is true for $n = k-1$ then it is true for $n = k$

Hence it must be true for all $n \ge 1$. This style of proof is called mathematical induction. You can
think of the process as something like climbing a ladder:
• prove that you can get onto the ladder (the result is true for $n = 1$), and

• if I can stand on the current rung, then I can step up to the next rung (if the result is true for
$n = k-1$ then it is also true for $n = k$)
Hence I can climb as high as I like.

• Let $k > 1$ and assume we have proved (for every sufficiently differentiable function $f$)

\[ f(x) - T_{k-1}(x) = \frac{1}{k!} f^{(k)}(c)\cdot(x-a)^{k} \]

for some $c$ between $a$ and $x$.

• Now set

\[ F(x) = f(x) - T_k(x) \qquad G(x) = (x-a)^{k+1} \]

and notice that, since $T_k(a) = f(a)$,

\[ F(a) = f(a) - T_k(a) = 0 \qquad G(a) = 0 \qquad G'(x) = (k+1)(x-a)^k \]

and apply the generalised MVT with $b = x$: hence there exists a $q$ between $a$ and $x$ so that

\begin{align*}
\frac{F(x) - F(a)}{G(x) - G(a)} &= \frac{F'(q)}{G'(q)} && \text{which becomes} \\
\frac{F(x)}{(x-a)^{k+1}} &= \frac{F'(q)}{(k+1)(q-a)^k} && \text{rearrange} \\
F(x) &= \frac{(x-a)^{k+1}}{(k+1)(q-a)^k}\cdot F'(q)
\end{align*}

• We now examine $F'(q)$. First carefully differentiate $F(x)$:

\begin{align*}
F'(x) &= \frac{d}{dx}\left[ f(x) - \left( f(a) + f'(a)(x-a) + \tfrac{1}{2} f''(a)(x-a)^2 + \cdots + \tfrac{1}{k!} f^{(k)}(a)(x-a)^k \right)\right] \\
&= f'(x) - \left( f'(a) + \tfrac{2}{2} f''(a)(x-a) + \tfrac{3}{3!} f'''(a)(x-a)^2 + \cdots + \tfrac{k}{k!} f^{(k)}(a)(x-a)^{k-1} \right) \\
&= f'(x) - \left( f'(a) + f''(a)(x-a) + \tfrac{1}{2} f'''(a)(x-a)^2 + \cdots + \tfrac{1}{(k-1)!} f^{(k)}(a)(x-a)^{k-1} \right)
\end{align*}

Now notice that if we set $f'(x) = g(x)$ then this becomes

\[ F'(x) = g(x) - \left( g(a) + g'(a)(x-a) + \tfrac{1}{2} g''(a)(x-a)^2 + \cdots + \tfrac{1}{(k-1)!} g^{(k-1)}(a)(x-a)^{k-1} \right) \]

So $F'(x)$ is then exactly the remainder formula but for a degree $k-1$ approximation to the
function $g(x) = f'(x)$.
• Hence the function $F'(q)$ is the remainder when we approximate $f'(q)$ with a degree $k-1$
Taylor polynomial. The remainder formula, equation (9.6.6), then tells us that there is a
number $c$ between $a$ and $q$ so that

\begin{align*}
F'(q) &= g(q) - \left( g(a) + g'(a)(q-a) + \tfrac{1}{2} g''(a)(q-a)^2 + \cdots + \tfrac{1}{(k-1)!} g^{(k-1)}(a)(q-a)^{k-1} \right) \\
&= \frac{1}{k!} g^{(k)}(c)(q-a)^k = \frac{1}{k!} f^{(k+1)}(c)(q-a)^k
\end{align*}

Notice that here we have assumed that $f^{(k+1)}(x)$ exists.
• Now substitute this back into our equation above

\begin{align*}
F(x) &= \frac{(x-a)^{k+1}}{(k+1)(q-a)^k}\cdot F'(q) \\
&= \frac{(x-a)^{k+1}}{(k+1)(q-a)^k}\cdot\frac{1}{k!} f^{(k+1)}(c)(q-a)^k \\
&= \frac{1}{(k+1)k!}\cdot f^{(k+1)}(c)\cdot\frac{(x-a)^{k+1}(q-a)^k}{(q-a)^k} \\
&= \frac{1}{(k+1)!}\cdot f^{(k+1)}(c)\cdot(x-a)^{k+1}
\end{align*}

as required.


So we now know that

• if, for some k, the remainder formula (with n = k) is true for all k times differentiable functions,

• then the remainder formula is true (with n = k + 1) for all k + 1 times differentiable functions.

Repeatedly applying this for $k = 1, 2, 3, 4, \ldots$ (and recalling that we have shown the remainder
formula is true when $n = 0, 1$) gives equation (9.6.6) for all $n = 0, 1, 2, \ldots$.
N EWTON ’ S M ETHOD

Chapter 10

(F LAVOUR A) N EWTON ’ S M ETHOD

Learning Objectives
• Given a function, find an integer that is reasonably close to the root.

• Given a differentiable function, find the x-intercept of the tangent line at a particular
point.

• Explain how Newton’s method works. That is, how you can use tangent lines to
approximate the roots of a function.

• Write down the formula for Newton’s method and explain what each term in the
equation represents.

• Use Newton’s method to estimate the root(s) of a function.

• Recognize pathological cases where Newton’s Method doesn’t converge to a root.

Newton’s method$^1$, also known as the Newton-Raphson method, is another technique for
generating numerical approximate solutions to equations of the form $f(x) = 0$. For example, one
can easily get a good approximation to $\sqrt{2}$ by applying Newton’s method to the equation $x^2 - 2 = 0$.
This will be done in Example 10.0.2, below.
Here is the derivation of Newton’s method. We start by simply making a guess for the solution.
For example, we could base the guess on a sketch of the graph of $f(x)$. Call the initial guess $x_1$.
Next recall, from Theorem 3.3.7, that the tangent line to $y = f(x)$ at $x = x_1$ is $y = F(x)$, where

\[ F(x) = f(x_1) + f'(x_1)\,(x - x_1) \]

Usually $F(x)$ is a pretty good approximation to $f(x)$ for $x$ near $x_1$. So, instead of trying to solve
$f(x) = 0$, we solve the linear equation $F(x) = 0$ and call the solution $x_2$.

\begin{align*}
0 = F(x) = f(x_1) + f'(x_1)\,(x - x_1) &\iff x - x_1 = -\frac{f(x_1)}{f'(x_1)} \\
&\iff x = x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}
\end{align*}

Note that if $f(x)$ were a linear function, then $F(x)$ would be exactly $f(x)$ and $x_2$ would solve
$f(x) = 0$ exactly.

1 The algorithm that we are about to describe grew out of a method that Newton wrote about in 1669. But the modern
method incorporates substantial changes introduced by Raphson in 1690 and Simpson in 1740.

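[Figure: the graph of $y = f(x)$ together with its tangent line $y = F(x)$ at the point $(x_1, f(x_1))$; the tangent line crosses the $x$-axis at $x_2$.]

Now we repeat, but starting with the (second) guess $x_2$ rather than $x_1$. This gives the (third)
guess $x_3 = x_2 - \frac{f(x_2)}{f'(x_2)}$. And so on. By way of summary, Newton’s method is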

1. Make a preliminary guess $x_1$.

2. Define $x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}$.

3. Iterate. That is, for each natural number $n$, once you have computed $x_n$, define

Equation 10.0.1 (Newton’s method).

\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \]
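The three steps above can be sketched in a few lines of Python. This is only an illustration under our own naming: the function `newton` and its parameters `f`, `fprime`, `x1` and `steps` are hypothetical and do not come from the text. The sketch does no error handling, so it fails if $f'(x_n) = 0$ at some step.

```python
def newton(f, fprime, x1, steps):
    """Return the list [x1, x2, ..., x_{steps+1}] of Newton iterates."""
    xs = [x1]
    for _ in range(steps):
        xn = xs[-1]
        # Equation (10.0.1): x_{n+1} = x_n - f(x_n) / f'(x_n)
        xs.append(xn - f(xn) / fprime(xn))
    return xs

# Hypothetical usage: a linear function is solved exactly in one step.
iterates = newton(lambda x: 3 * x - 6, lambda x: 3.0, x1=10.0, steps=1)
print(iterates)
```

The usage line illustrates the remark made in the derivation: when $f$ is linear, the tangent line is $f$ itself, so $x_2$ is already the exact root.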

Example 10.0.2 (Approximating $\sqrt{2}$)
In this example we compute, approximately, the square root of two. We will of course pretend that
we do not already know that $\sqrt{2} = 1.41421\cdots$. So we cannot find it by solving, approximately, the
equation $f(x) = x - \sqrt{2} = 0$. Instead we apply Newton’s method to the equation

\[ f(x) = x^2 - 2 = 0 \]

Since $f'(x) = 2x$, Newton’s method says that we should generate approximate solutions by iteratively
applying

\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{x_n^2 - 2}{2x_n} = \frac{x_n}{2} + \frac{1}{x_n} \]

We need a starting point. Since $1^2 = 1 < 2$ and $2^2 = 4 > 2$, the square root of two must be between
1 and 2, so let’s start Newton’s method with the initial guess $x_1 = 1.5$. Here goes$^2$:

\begin{align*}
x_1 &= 1.5 \\
x_2 &= \tfrac{1}{2} x_1 + \tfrac{1}{x_1} = \tfrac{1}{2}(1.5) + \tfrac{1}{1.5} = 1.416666667 \\
x_3 &= \tfrac{1}{2} x_2 + \tfrac{1}{x_2} = \tfrac{1}{2}(1.416666667) + \tfrac{1}{1.416666667} = 1.414215686 \\
x_4 &= \tfrac{1}{2} x_3 + \tfrac{1}{x_3} = \tfrac{1}{2}(1.414215686) + \tfrac{1}{1.414215686} = 1.414213562 \\
x_5 &= \tfrac{1}{2} x_4 + \tfrac{1}{x_4} = \tfrac{1}{2}(1.414213562) + \tfrac{1}{1.414213562} = 1.414213562
\end{align*}

It looks like the $x_n$’s, rounded to nine decimal places, have stabilized to 1.414213562. So it is
reasonable to guess that $\sqrt{2}$, rounded to nine decimal places, is exactly 1.414213562. Recalling
that all numbers $1.4142135615 \le y < 1.4142135625$ round to 1.414213562, we can check our
guess by evaluating $f(1.4142135615)$ and $f(1.4142135625)$. Since $f(1.4142135615) = -2.5 \times 10^{-9} < 0$
and $f(1.4142135625) = 3.6 \times 10^{-10} > 0$ the square root of two must indeed be between
1.4142135615 and 1.4142135625.
Example 10.0.2
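The iteration in this example is easy to reproduce. The following Python sketch is our own illustration (the names `history` and `f` are not from the text); it uses ordinary double-precision floats, as the footnote describes, and prints each iterate to nine decimal places.

```python
# Newton's method for f(x) = x^2 - 2 reduces to x_{n+1} = x_n/2 + 1/x_n.
x = 1.5
history = [x]                 # history[0] is x1
for _ in range(4):            # compute x2, ..., x5
    x = x / 2 + 1 / x
    history.append(x)

for n, xn in enumerate(history, start=1):
    print(f"x{n} = {xn:.9f}")

# Sign check used in the text to confirm the nine-decimal rounding
f = lambda y: y * y - 2
print(f(1.4142135615) < 0 < f(1.4142135625))
```

The sign check at the end is the same bracketing argument as in the text: $f$ changes sign between the two endpoints, so the root lies between them.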

Example 10.0.3 (Approximating $\pi$)

In this example we compute, approximately, $\pi$ by applying Newton’s method to the equation

\[ f(x) = \sin x = 0 \]

starting with $x_1 = 3$. Since $f'(x) = \cos x$, Newton’s method says that we should generate approximate
solutions by iteratively applying

\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{\sin x_n}{\cos x_n} = x_n - \tan x_n \]

2 The following computations have been carried out in double precision, which is computer speak for about 15
significant digits. We are displaying each $x_n$ rounded to 10 significant digits (9 decimal places). So each displayed
$x_n$ has not been impacted by roundoff error, and still contains more decimal places than are usually needed.

Here goes

\begin{align*}
x_1 &= 3 \\
x_2 &= x_1 - \tan x_1 = 3 - \tan 3 = 3.142546543 \\
x_3 &= 3.142546543 - \tan 3.142546543 = 3.141592653 \\
x_4 &= 3.141592653 - \tan 3.141592653 = 3.141592654 \\
x_5 &= 3.141592654 - \tan 3.141592654 = 3.141592654
\end{align*}

Since $f(3.1415926535) = 9.0 \times 10^{-11} > 0$ and $f(3.1415926545) = -9.1 \times 10^{-10} < 0$, $\pi$ must be
between 3.1415926535 and 3.1415926545. Of course to compute $\pi$ in this way, we (or at least our
computers) have to be able to evaluate $\tan x$ for various values of $x$. Taylor expansions can help us
do that.
Example 10.0.3
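As a quick check of this example, here is a short Python sketch (our own illustration; the list name `iterates` is not from the text). It relies on the standard library's `math.tan`, which is exactly the kind of built-in evaluation the last sentence of the example alludes to.

```python
import math

# Newton's method for f(x) = sin x reduces to x_{n+1} = x_n - tan(x_n).
iterates = [3.0]              # x1 = 3
for _ in range(4):            # compute x2, ..., x5
    x = iterates[-1]
    iterates.append(x - math.tan(x))

for n, x in enumerate(iterates, start=1):
    print(f"x{n} = {x:.9f}")
```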

Example 10.0.4 (wild instability)


This example illustrates how Newton’s method can go badly wrong if your initial guess is not good
enough. We’ll try to solve the equation

f (x) = arctan x = 0

starting with $x_1 = 1.5$. (Of course the solution to $f(x) = 0$ is just $x = 0$; we chose $x_1 = 1.5$ for
demonstration purposes.) Since the derivative $f'(x) = \frac{1}{1+x^2}$, Newton’s method gives

\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - (1 + x_n^2)\arctan x_n \]

So$^3$

\begin{align*}
x_1 &= 1.5 \\
x_2 &= 1.5 - (1 + 1.5^2)\arctan 1.5 = -1.69 \\
x_3 &= -1.69 - (1 + 1.69^2)\arctan(-1.69) = 2.32 \\
x_4 &= 2.32 - (1 + 2.32^2)\arctan(2.32) = -5.11 \\
x_5 &= -5.11 - (1 + 5.11^2)\arctan(-5.11) = 32.3 \\
x_6 &= 32.3 - (1 + 32.3^2)\arctan(32.3) = -1575 \\
x_7 &= 3{,}894{,}976
\end{align*}

Looks pretty bad! Our $x_n$’s are not settling down at all!

3 Once again, the following computations have been carried out in double precision. This time, it is clear that the
$x_n$’s are growing madly as $n$ increases. So there is not much point to displaying many decimal places and we have
not done so.

The figure below shows what went wrong. In this figure, $y = F_1(x)$ is the tangent line to
$y = \arctan x$ at $x = x_1$. Under Newton’s method, this tangent line crosses the $x$-axis at $x = x_2$. Then
$y = F_2(x)$ is the tangent to $y = \arctan x$ at $x = x_2$. Under Newton’s method, this tangent line crosses
the $x$-axis at $x = x_3$. And so on.

[Figure: the graph of $y = f(x) = \tan^{-1} x$ with the tangent lines $y = F_1(x)$, $y = F_2(x)$, $y = F_3(x)$ and $y = F_4(x)$; the points $(x_1, f(x_1))$, $(x_2, f(x_2))$ and $(x_4, f(x_4))$ are marked, and along the $x$-axis the iterates appear in the order $x_4$, $x_2$, $x_1$, $x_3$.]

The problem arose because the $x_n$’s were far enough from the solution, $x = 0$, that the tangent
line approximations, while good approximations to $f(x)$ for $x \approx x_n$, were very poor approximations
to $f(x)$ for $x \approx 0$. In particular, $y = F_1(x)$ (i.e. the tangent line at $x = x_1$) was a bad enough
approximation to $y = \arctan x$ for $x \approx 0$ that $x = x_2$ (i.e. the value of $x$ where $y = F_1(x)$ crosses the
$x$-axis) is farther from the solution $x = 0$ than our original guess $x = x_1$. If we had started with
$x_1 = 0.5$ instead of $x_1 = 1.5$, Newton’s method would have succeeded very nicely:

\[ x_1 = 0.5 \qquad x_2 = -0.0796 \qquad x_3 = 0.000335 \qquad x_4 = -2.51 \times 10^{-11} \]

Example 10.0.4
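The contrast between the two starting points can be seen side by side in a small experiment. The sketch below is illustrative only; the helper `newton_arctan` and the names `bad` and `good` are our own.

```python
import math

def newton_arctan(x1, steps):
    """Newton iterates for f(x) = arctan x, i.e. x - (1 + x^2) * arctan(x)."""
    xs = [x1]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - (1 + x * x) * math.atan(x))
    return xs

bad = newton_arctan(1.5, 6)    # x1 = 1.5: the iterates run away from 0
good = newton_arctan(0.5, 3)   # x1 = 0.5: the iterates collapse onto 0

print([f"{x:.3g}" for x in bad])
print([f"{x:.3g}" for x in good])
```

With the starting guess 1.5 each iterate lands farther from the root than the one before, while with 0.5 the distance to the root shrinks rapidly.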

Example 10.0.5 (interest rate)


A car dealer sells a new car for $23,520. He also offers to finance the same car for payments of $420
per month for five years. What interest rate is this dealer charging?
Solution. By way of preparation, we’ll start with a simpler problem. Suppose that you will have
to make a single $420 payment n months in the future. The simpler problem is to determine how
much money you have to deposit now in an account that pays an interest rate of 100r% per month,
compounded monthly4 , in order to be able to make the $420 payment in n months.
Let’s denote by $P$ the initial deposit. Because the interest rate is $100r\%$ per month, compounded
monthly,
• the first month’s interest is $P \times r$. So at the end of month #1, the account balance is $P + Pr = P(1+r)$.
• The second month’s interest is $[P(1+r)] \times r$. So at the end of month #2, the account balance
is $P(1+r) + P(1+r)\,r = P(1+r)^2$.
• And so on.
• So at the end of $n$ months, the account balance is $P(1+r)^n$.

In order for the balance at the end of $n$ months, $P(1+r)^n$, to be \$420, the initial deposit has to be
$P = 420(1+r)^{-n}$. That is what is meant by the statement “The present value$^5$ of a \$420 payment
made $n$ months in the future, when the interest rate is $100r\%$ per month, compounded monthly, is
$420(1+r)^{-n}$.”

4 “Compounded monthly” means that, each month, interest is paid on the accumulated interest that was paid in all
previous months.
Now back to the original problem. We will be making 60 monthly payments of \$420. The
present value of all 60 payments is$^6$

\begin{align*}
420(1+r)^{-1} + 420(1+r)^{-2} + \cdots + 420(1+r)^{-60} &= 420\,\frac{(1+r)^{-1} - (1+r)^{-61}}{1 - (1+r)^{-1}} \\
&= 420\,\frac{1 - (1+r)^{-60}}{(1+r) - 1} = 420\,\frac{1 - (1+r)^{-60}}{r}
\end{align*}

The interest rate $100r\%$ being charged by the car dealer is such that the present value of 60 monthly
payments of \$420 is \$23520. That is, the monthly interest rate being charged by the car dealer is the
solution of

\begin{align*}
23520 = 420\,\frac{1 - (1+r)^{-60}}{r} \quad &\text{or} \quad 56 = \frac{1 - (1+r)^{-60}}{r} \\
&\text{or} \quad 56r = 1 - (1+r)^{-60} \\
&\text{or} \quad 56r(1+r)^{60} = (1+r)^{60} - 1 \\
&\text{or} \quad (1 - 56r)(1+r)^{60} = 1
\end{align*}

Set $f(r) = (1 - 56r)(1+r)^{60} - 1$. Then

\[ f'(r) = -56(1+r)^{60} + 60(1 - 56r)(1+r)^{59} \]

or

\[ f'(r) = \big[-56(1+r) + 60(1 - 56r)\big](1+r)^{59} = (4 - 3416r)(1+r)^{59} \]

5 Inflation means that prices of goods (typically) increase with time, and hence \$100 now is worth more than \$100 in
10 years time. The term “present value” is widely used in economics and finance to mean “the current amount of
money that will have a specified value at a specified time in the future”. It takes inflation into account. If the money
is invested, it takes into account the rate of return of the investment. We recommend that the interested reader do
some search-engining to find out more.
6 Don’t worry if you don’t know how to evaluate such sums. They are called geometric sums, and will be covered in
the CLP-2 text. (See (1.1.3) in the CLP-2 text.) In any event, you can check that this is correct, by multiplying the
whole equation by $1 - (1+r)^{-1}$. When you simplify the left hand side, you should get the right hand side.

Apply Newton’s method with an initial guess of $r_1 = .002$. (That’s 0.2% per month or 2.4% per
year.) Then

\begin{align*}
r_2 &= r_1 - \frac{(1 - 56r_1)(1 + r_1)^{60} - 1}{(4 - 3416r_1)(1 + r_1)^{59}} = 0.002344 \\
r_3 &= r_2 - \frac{(1 - 56r_2)(1 + r_2)^{60} - 1}{(4 - 3416r_2)(1 + r_2)^{59}} = 0.002292 \\
r_4 &= r_3 - \frac{(1 - 56r_3)(1 + r_3)^{60} - 1}{(4 - 3416r_3)(1 + r_3)^{59}} = 0.002290 \\
r_5 &= r_4 - \frac{(1 - 56r_4)(1 + r_4)^{60} - 1}{(4 - 3416r_4)(1 + r_4)^{59}} = 0.002290
\end{align*}
So the interest rate is 0.229% per month or 2.75% per year.
Example 10.0.5
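The interest-rate computation can be sketched in Python as follows. This is an illustration under our own naming (`f`, `fprime`, `rates`, `present_value`), using the function $f(r)$ and its derivative exactly as derived in the example. The last lines substitute the final iterate back into the present-value formula as a sanity check.

```python
def f(r):
    # f(r) = (1 - 56 r)(1 + r)^60 - 1, whose root is the monthly rate
    return (1 - 56 * r) * (1 + r) ** 60 - 1

def fprime(r):
    # f'(r) = (4 - 3416 r)(1 + r)^59
    return (4 - 3416 * r) * (1 + r) ** 59

rates = [0.002]                       # r1 = 0.2% per month
for _ in range(4):                    # compute r2, ..., r5
    r = rates[-1]
    rates.append(r - f(r) / fprime(r))

for n, r in enumerate(rates, start=1):
    print(f"r{n} = {r:.6f}")

# Sanity check: present value of 60 payments of $420 at the final rate
r = rates[-1]
present_value = 420 * (1 - (1 + r) ** -60) / r
print(f"present value = {present_value:.2f}")
```

If the iteration has converged, the printed present value should agree with the cash price of \$23,520.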
Differential Equations
I NTRODUCTION TO D IFFERENTIAL E QUATIONS

Chapter 11

(F LAVOURS A, B) I NTRODUCTION
TO D IFFERENTIAL E QUATIONS

Learning Objectives
• Explain how a differential equation is different from an algebraic equation.

• Check whether a given function satisfies a differential equation.

• Understand basic differential-equation models of exponential growth and decay.

• Identify solutions to simple differential equations (of the form y1 = ay ) and interpret
them in context.

• Given an initial condition, find a particular solution that satisfies a differential equation.

11.1 Introducing a new kind of equation


i A screencast summary of the introduction: differential equations for exponential growth and
decay. Edu.Cr.

§§ Observations about the exponential function


Earlier, we introduced the exponential function $y = f(x) = e^x$, and noted that it satisfies the
relationship

\[ \frac{de^x}{dx} = e^x, \quad\Rightarrow\quad \frac{dy}{dx} = y. \]

The equation on the right (linking a function to its own derivative) is a new kind of equation called
a differential equation (abbreviated DE). We say that $f(x) = e^x$ is a function that “satisfies” the
equation, and we call this a solution to the differential equation.
Note: The solution to an algebraic equation is a number, whereas the solution to a differential
equation is a function.
We call this a differential equation because it connects (one or more) derivatives of a function
with the function itself.
Concept Check-In
1. For what constant $C$ does $y = Ce^x$ satisfy the differential equation $dy/dx = y$?

2. What function satisfies the DE $dy/dz = y$?

Definition 11.1.1 (Differential equation). A differential equation is a mathematical equation that


relates one or more derivatives of some function to the function itself. Solving the differential
equation is the process of identifying the function(s) that satisfies the given relationship.

We will be interested in applications in which a system or process varies over time. For this
reason, we will henceforth use the independent variable t, for time in place of the former generic “x”.
Observations.

1. Consider the function of time: $y = f(t) = e^t$.

Hint
Notice that we merely changed the notation very slightly. Now the derivative is “with
respect to” $t$ rather than $x$.

Show that this function satisfies the differential equation

\[ \frac{dy}{dt} = y. \]

2. The functions $y = e^{kt}$ (for $k$ constant) satisfy the differential equation

\[ \frac{dy}{dt} = ky. \qquad (11.1.1) \]

We can verify this by differentiating $y = e^{kt}$, using the chain rule. Setting $u = kt$, and $y = e^u$, we
have

\[ \frac{dy}{dt} = \frac{dy}{du}\frac{du}{dt} = e^u \cdot k = ke^{kt} = ky \quad\Rightarrow\quad \frac{dy}{dt} = ky \]

Hence, we have established that $y = e^{kt}$ satisfies the DE (11.1.1).

It is interesting to ask: Is this the only function that satisfies the differential equation (11.1.1)? Are
there other possible solutions? What about a function such as $y = 2e^{kt}$ or $y = 400e^{kt}$?
The reader should show that for any constant $C$, the function $y = Ce^{kt}$ is a solution to the
DE (11.1.1).

Hint
Notice that the constant $C$ in front will appear in both the derivative and the function, and so
will not change the equation.

To do so, differentiate the function and plug into (11.1.1). Verifying that the two sides of the equation
are then the same establishes the result. While we do not prove it here, it turns out that $y = Ce^{kt}$ are
the only functions that satisfy Eqn. (11.1.1).
Let us summarize what we have found out so far:

Solutions to the differential equation

\[ \frac{dy}{dt} = ky \qquad (11.1.2) \]

are the functions

\[ y = Ce^{kt} \qquad (11.1.3) \]

for $C$ an arbitrary constant.

A few comments are in order. First, unlike algebraic equations - whose solutions are numbers -
differential equations have solutions that are functions.

Concept Check-In
1. Give an example of an algebraic equation and its solution.

2. Verify that $y = 3e^{-t}$ satisfies the differential equation $\frac{dy}{dt} = -y$.

3. Why is $e^{kt}$ always positive?

4. Sketch $y = Ce^{t}$ for each of $C = -4, -2, 2$ and 4.

5. Sketch $y = Ce^{-t}$ for each of $C = -4, -2, 2$, and 4.

Second, the constant $k$ that appears in Eqn. (11.1.2) is the same as the constant $k$ in $e^{kt}$. Depending
on the sign of $k$, we get either

a) exponential growth for $k > 0$, as illustrated in Figure 11.1(a), or

b) exponential decay for $k < 0$, as illustrated in Figure 11.1(b).

Third, since $e^{kt}$ is always positive, the constant $C$ determines the sign of the function as a whole:
whether its graph lies above or below the $t$ axis.
A few curves of each type ($C > 0$, $C < 0$) are shown in each panel of Figure 11.1. The collection
of curves in a panel is called a family of solution curves. The family shares the same value of $k$,
but each member has a distinct value of $C$. Next, we ask how to specify a particular member of the
family as the solution.

[Figure 11.1: two panels, (a) $k > 0$ and (b) $k < 0$, each plotting $y$ against $t$.]

Figure 11.1: (a) A family of solutions to the differential equation (DE) (11.1.2). These are functions
of the form $y = Ce^{kt}$ for $k > 0$ and arbitrary constant $C$. (b) Another family of solutions of a DE of
the form (11.1.2), but for $k < 0$.

§§ The solution to a differential equation


Definition 11.1.2 (Solution to a differential equation). By a solution to a differential equation, we
mean a function that satisfies that equation.

We often refer to “solution curves” - the graphs of the family of solutions of a differential
equation, as shown, for example in the panels of Figure 11.1.
So far, we found that “many” functions can be valid solutions of the differential equation (11.1.2),
since we can choose the constant $C$ arbitrarily in the family of solutions $y = Ce^{kt}$. Hence, in order
to distinguish one specific solution of interest, we need additional information. This additional
information is called an initial value, or initial condition, and it specifies one point belonging to
the solution curve of interest. A common way to set an initial value is to specify a fixed value of the
function (say $y = y_0$) at time $t = 0$.

Definition 11.1.3 (Initial value). An initial value for a differential equation is a specified, known
value of the solution at some specific time point (usually at time t = 0).

Adjust the sliders in this interactive graph to see how the values of $k$ and $C$ affect the shape
of the graph of the function $y = Ce^{kt}$ as well as its initial value $y(0) = y_0$.
Note the transitions that take place when $k$ changes from positive to negative.

Example 11.1.4. Given the differential Eqn. (11.1.2) and the initial value

\[ y(0) = y_0, \]

find the value of $C$ for the solution in Eqn. (11.1.3).

Concept Check-In
1. Given differential Eqn. (11.1.2) and the initial value $y(0) = 1$, find $C$ for the solution in
Eqn. (11.1.3).

2. Repeat the above but for the initial value $y(0) = 10$.

3. Draw the $ty$-plane with the points $(0, y_0)$ for $y_0 = 1, 10$.

4. Use differentiation to verify that the function $y = 3e^{-0.5t}$ in Example 11.1.5 is a solution
to $dy/dt = -0.5y$ with initial condition $y(0) = 3$.

Solution. We proceed as follows:

\[ y(t) = Ce^{kt}, \quad\text{so}\quad y(0) = Ce^{k\cdot 0} = Ce^{0} = C \cdot 1 = C. \]

But, by the initial condition, $y(0) = y_0$. So,

\[ C = y_0 \]

and we have established that

\[ y(t) = y_0 e^{kt}, \quad\text{where $y_0$ is the initial value.} \]

For example, in Figure 11.1, the initial value specifies that the solution we want passes through a
specific point in the $ty$-plane, namely the point $(0, y_0)$. Only one curve in the family of curves has
that property. Hence, the initial value picks out a unique solution.
Example 11.1.5. Find the solution to the differential equation

\[ \frac{dy}{dt} = -0.5y \]

that satisfies the initial condition $y(0) = 3$. Describe the behaviour of the solution you have found.
Solution. The DE indicates that $k = -0.5$, so solutions are exponential functions $y = Ce^{-0.5t}$. The
initial condition sets the value of $C$. From previous discussion, we know that $C = y(0) = 3$. Hence,
the solution is $y = 3e^{-0.5t}$. This is a decaying exponential. ♦
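One way to gain confidence in a proposed solution is to test it numerically. The Python sketch below is an illustration only: the names `y` and `dydt`, the step size `h`, and the sample times are our own choices. It approximates the derivative of $y = 3e^{-0.5t}$ with a central difference and compares it with $-0.5y$ at a few times.

```python
import math

k = -0.5     # rate constant from the DE dy/dt = -0.5 y
y0 = 3.0     # initial value y(0) = 3

def y(t):
    """Proposed solution y(t) = y0 * e^(k t)."""
    return y0 * math.exp(k * t)

def dydt(t, h=1e-6):
    """Central-difference approximation to the derivative of y at t."""
    return (y(t + h) - y(t - h)) / (2 * h)

for t in [0.0, 1.0, 2.0, 5.0]:
    print(f"t = {t}: dy/dt ~ {dydt(t):.6f}, k*y = {k * y(t):.6f}")
```

The two columns should agree to the displayed precision, and the values shrink toward zero as $t$ grows, which is the decaying behaviour described in the solution. This is a numerical spot check, not a proof; differentiating by hand, as in the text, is what actually establishes the result.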

11.2 Differential equation for unlimited population growth


i A screencast summary of the model for (unlimited) human population growth.

Differential equations are important because they turn up in the study of many natural processes
that vary continuously. In this section we examine the way that a simple differential equation arises
when we study continuous uncontrolled population growth.
Here we set up a mathematical model for population growth. Let $N(t)$ be the number of
individuals in a population at time $t$. The population changes with time due to births and mortality.
(Here we ignore migration). Consider the changes that take place in the population size between
time $t$ and $t + h$, where $\Delta t = h$ is a small time increment. Then

\[ N(t+h) - N(t) = \big[\text{Change in } N\big] = \big[\text{Number of births}\big] - \big[\text{Number of deaths}\big] \qquad (11.2.1) \]

Concept Check-In
1. What is the dependent variable in this model? The independent variable?

2. What are the units associated with each variable in this model?

3. What does “$x$ is proportional to $y$” mean?

Eqn. (11.2.1) is just a “book-keeping” equation that keeps track of people entering and leaving the
population. It is sometimes called a balance equation. We use it to derive a differential equation
linking the derivative of $N$ to the value of $N$ at the given time.
Notice that dividing each term by the time interval $h$, we obtain

\[ \frac{N(t+h) - N(t)}{h} = \frac{\text{Number of births}}{h} - \frac{\text{Number of deaths}}{h}. \]

The term on the left “looks familiar”. If we shrink the time interval, $h \to 0$, this term is a derivative
$dN/dt$, so

\[ \frac{dN}{dt} = \big[\text{Rate of change of $N$ per unit time}\big] = \big[\text{Number of births per unit time}\big] - \big[\text{Number of deaths per unit time}\big] \]
For simplicity, we assume that all individuals are identical and that the number of births per unit
time is proportional to the population size. Denote by r the constant of proportionality. Similarly,
we assume that the number of deaths per unit time is proportional to population size with m the
constant of proportionality.
Both r and m have meanings: r is the average per capita birth rate, and m is the average per
capita mortality rate. Here, both are assumed to be fixed positive constants that carry units of
1/time. This is required to make the units match for every term in Eqn. (11.2.1). Then
$$
r = \text{per capita birth rate} = \frac{\text{number of births per unit time}}{\text{population size}},
$$
$$
m = \text{per capita mortality rate} = \frac{\text{number of deaths per unit time}}{\text{population size}}.
$$

Consequently, we have
$$
\text{Number of births per unit time} = rN,
$$
$$
\text{Number of deaths per unit time} = mN.
$$

Concept Check-In
1. If there are 10 births/year in a population of size 1000, what is the birth rate r? Give units.

2. If there are 11 deaths/year in a population of size 1000, what is the mortality rate m? Give
units.

3. Given the above conditions, what is the net growth rate k for such a population? Give
units. Is the population growing or shrinking?

We refer to constants such as r, m as parameters. In general, for a given population, these would
have specific numerical values that could be found through experiment, by collecting data, or by
making simple assumptions. Later in this section (under “A simple model for human population
growth”), we show how some elementary assumptions about birth and mortality could help to
estimate approximate values of r and m.
Taking the assumptions and the form of the balance equation (11.2.1) together, we have arrived
at:
$$
\frac{dN}{dt} = rN - mN = (r - m)N. \tag{11.2.2}
$$
This is a differential equation: it links the derivative of $N(t)$ to the function $N(t)$. By solving
the equation (i.e. identifying its solution), we are able to make a projection about how fast a
population is growing.
Define the constant $k = r - m$. Then $k$ is the net growth rate of the population, so
$$
\frac{dN}{dt} = kN, \quad \text{for } k = (r - m).
$$
Suppose we also know that at time $t = 0$, the population size is $N_0$. Then:

• The function that describes population over time is (by previous results),
$$
N(t) = N_0 e^{kt} = N_0 e^{(r-m)t}. \tag{11.2.3}
$$
(The result is identical to what we saw previously, but with $N$ rather than $y$ as the time-dependent
function. We can easily check by differentiation that this function satisfies
Eqn. (11.2.2).)

• Since $N(t)$ represents a population size, it has to be non-negative to have biological relevance.
This is true so long as $N_0 \ge 0$.

• The initial condition $N(0) = N_0$ allows us to specify the (otherwise arbitrary) constant
multiplying the exponential function.

• The population grows provided $k > 0$, which happens when $r - m > 0$, i.e. when the birth rate
exceeds the mortality rate.

• If $k < 0$, or equivalently $r < m$, then more people die on average than are born, so that the
population shrinks and (eventually) goes extinct.
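
The claims in the list above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names and the sample values of N0 and k are illustrative assumptions. It compares a centred finite-difference estimate of dN/dt with kN for one positive and one negative value of k.

```python
import math

def population(t, n0, k):
    """Exponential solution N(t) = N0 * exp(k t) of dN/dt = k N."""
    return n0 * math.exp(k * t)

def derivative_estimate(t, n0, k, h=1e-6):
    """Centred finite-difference approximation of dN/dt at time t."""
    return (population(t + h, n0, k) - population(t - h, n0, k)) / (2 * h)

n0 = 1000.0  # illustrative initial population (an assumption)
t = 10.0     # time at which the equation is checked
for k in (0.0125, -0.0125):  # growth (k > 0) and decline (k < 0)
    lhs = derivative_estimate(t, n0, k)  # estimate of dN/dt
    rhs = k * population(t, n0, k)       # right-hand side k N
    print(f"k = {k:+.4f}: dN/dt is about {lhs:.6f}, kN = {rhs:.6f}")
```

The two printed columns should agree to several decimal places, and the sign of k decides whether N(t) rises above or falls below N0.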

§§ A simple model for human population growth


The differential equation (11.2.2) and its initial condition led us to predict that a population grows or
decays exponentially in time, according to Eqn. (11.2.3). We can make this prediction quantitative
by estimating the values of parameters r and m. To this end, let us consider the example of a human
population and make further simplifying assumptions. We measure time in years.

Figure 11.2: Flat age distribution assumption. [Plot of number of people against age, from 0 to 80.]

We assume a uniform age distribution to determine the fraction of people who are fertile (and can
give birth) or who are old (and likely to die). While slightly silly, this simplification helps estimate
the desired parameters.
Assumptions.

• The age distribution of the population is “flat”, i.e. there are as many 10-year-olds as
70-year-olds. Of course, this is quite inaccurate, but it is a good place to start since it is easy
to estimate some of the quantities we need. Figure 11.2 shows such a uniform age distribution.

• The sex ratio is roughly 50%. This means that half of the population is female and half male.

• Women are fertile and can have babies only during part of their lives: we assume that the
fertile years are between age 15 and age 55, as shown in Figure 11.3.

• A lifetime lasts 80 years. This means that for half of that time a given woman can contribute
to the birth rate, or that $(55 - 15)/80 = 50\%$ of women alive at any time are able to give birth.

• During a woman’s fertile years, we assume that on average, she has one baby every 10 years.

• We assume that deaths occur only from old age (i.e. we ignore disease, war, famine, and child
mortality).

• We assume that everyone lives precisely to age 80, and then dies instantly.

Figure 11.3: Simple assumption about fertility. [Plot of number of people against age, from 0 to 80, with the fertile ages 15 to 55 marked.]

We assume that only women between the ages of 15 and 55 years old are fertile and can give birth.
Then, according to our uniform age distribution assumption, half of all women are between these
ages and hence fertile.

Figure 11.4: Simple assumption about mortality. [Plot of number of people against age, from 0 to 80, with mortality marked at age 80.]

We assume that the people in the age bracket 79-80 years old all die each year, and that those are the
only deaths. This, too, is a silly assumption, but makes it easy to estimate mortality in the population.
Based on the above assumptions, we can estimate the birthrate parameter r as follows:
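$$
r = \frac{\text{number of women}}{\text{population}} \cdot \frac{\text{years fertile}}{\text{years of life}} \cdot \frac{\text{number of babies per woman}}{\text{number of years}}
$$
Thus we compute that
$$
r = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{10} = 0.025 \text{ births per person per year.}
$$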

Concept Check-In
1. Under these assumptions, for a population size of 800, how many male 35 year-olds
would you expect? Women in their 60’s?

2. Is the fertility assumption reasonable? Why or why not?

3. Explain the units attached to the birthrate parameter r.

Note that this value is now a rate per person per year, averaged over the entire population (male and
female, of all ages). We need such an average rate since our model of Eqn. (11.2.2) assumes that
individuals “are identical”. We now have an approximate value for the average human per capita
birth rate, $r \approx 0.025$ per year.
Next, using our assumptions, we estimate the mortality parameter, $m$. With the flat age distribution
shown in Figure 11.2, a fraction $1/80$ of the population would be removed by mortality every
year (namely, those in their 80th year). In this case, we can estimate that the per capita mortality is:
$$
m = \frac{1}{80} = 0.0125 \text{ deaths per person per year.}
$$
The net per capita growth rate is $k = r - m = 0.025 - 0.0125 = 0.0125$ per person per year. We
often refer to the constant $k$ as a growth rate constant and we also say that the population grows at
the rate of 1.25% per year.
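
The arithmetic behind these estimates can be collected in a few lines. This is a minimal Python sketch under the flat age distribution assumptions stated above; the function name `per_capita_rates` and its parameter names are illustrative and do not come from the text.

```python
def per_capita_rates(lifespan=80.0, fertile_from=15.0, fertile_to=55.0,
                     female_fraction=0.5, years_per_baby=10.0):
    """Return (r, m, k) per year under the flat age distribution assumptions."""
    # Fraction of a lifetime spent in the fertile years: (55 - 15) / 80.
    fertile_fraction = (fertile_to - fertile_from) / lifespan
    # Births per person per year: women x fertile fraction x babies per year.
    r = female_fraction * fertile_fraction / years_per_baby
    # With a flat age distribution, 1/lifespan of the population dies each year.
    m = 1.0 / lifespan
    return r, m, r - m

r, m, k = per_capita_rates()
print(f"r = {r:.4f}, m = {m:.4f}, k = {k:.4f} per year")
```

Changing one assumption at a time (for instance the lifespan or the number of years per baby) shows how sensitive the net growth rate k is to each of them.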
Example 11.2.1. Using the results of this section, find a prediction for the population size N (t ) as a
function of time t.

Solution. We have found that our population satisfies the equation
$$
\frac{dN}{dt} = (r - m)N = kN = 0.0125N,
$$
so that
$$
N(t) = N_0 e^{0.0125t}, \tag{11.2.4}
$$
where N0 is the starting population size. Figure 11.5 illustrates how this function behaves, using a
starting value of N (0) = N0 = 7 billion. ♦

Figure 11.5: Projected world population (in billions) over 100 years, based on the model in Eqn. (11.2.4) and
assuming that the initial population is approximately 7 billion. [Plot of N(t), in billions, against t in years.]

Concept Check-In
1. Based on Figure 11.5, when would we expect the human population to reach 15 billion?

Example 11.2.2 (Human population in 100 years). Given the initial condition N (0) = 7 billion,
determine the size of the human population at t = 100 years predicted by the model.
Solution. At time $t = 0$, the population is $N(0) = N_0 = 7$ billion. Then, in billions,
$$
N(t) = 7e^{0.0125t}
$$
so that when $t = 100$ we would have
$$
N(100) = 7e^{0.0125 \cdot 100} = 7e^{1.25} \approx 7 \cdot 3.49 = 24.43.
$$

Thus, with a starting population of 7 billion, there would be about 24.4 billion after 100 years based
on the uncontrolled continuous growth model. ♦
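
The same projection can be evaluated directly. This is a minimal Python sketch; the function name `projected_population` is an illustrative assumption, and the default values are the ones used in Example 11.2.2.

```python
import math

def projected_population(t_years, n0_billion=7.0, k=0.0125):
    """N(t) = N0 * exp(k t), in billions, for the unlimited growth model."""
    return n0_billion * math.exp(k * t_years)

# Tabulate the projection every 25 years over one century.
for t in (0, 25, 50, 75, 100):
    print(f"t = {t:3d} years: N is about {projected_population(t):.2f} billion")
```

The value at t = 100 agrees with the hand calculation up to the rounding of e^1.25 to 3.49.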
A critique. Before leaving our population model, we should remember that our projections hold
only so long as some rather restrictive assumptions are made. We have made many simplifications,
and ignored many features that would seriously affect these results. These include (among others):

• variations in birth and mortality rates that stem from competition for resources,

• epidemics that take hold when crowding occurs, and

• uneven distributions of resources or space.

We have also assumed that the age distribution is uniform (flat), but that is not accurate: the
population grows only by adding new infants, and this would skew the distribution even if it is
initially uniform. All these factors suggest that some “healthy skepticism” should be applied to
any model predictions. Predictions may cease to be valid if model assumptions are not satisfied.
This caveat will lead us to think about more realistic models for population growth. Certainly, the
uncontrolled exponential growth would not be sustainable in the long run. That said, such a model
is a good starting point for a first description of population growth, later to be adjusted.

§§ Growth and doubling


The doubling time. How long would it take a population to double, given that it is growing
exponentially with growth rate $k$? We seek a time $t$ such that $N(t) = 2N_0$. Then
$$
N(t) = 2N_0 \quad \text{and} \quad N(t) = N_0 e^{kt}
$$
imply that the population has doubled when $t$ satisfies
$$
2N_0 = N_0 e^{kt} \quad \Rightarrow \quad 2 = e^{kt} \quad \Rightarrow \quad \ln(2) = \ln(e^{kt}) = kt.
$$
We solve for $t$. Thus, the doubling time, denoted $\tau$, is:
$$
\tau = \frac{\ln(2)}{k}.
$$

Concept Check-In
1. What are the units associated with τ?

2. The human population hit 3 billion in 1959. How does this fit with our (imperfect) model?

Example 11.2.3 (Human population doubling time). Determine the doubling time for the human
population based on the results of our approximate growth model.

Solution. We have found a growth rate of roughly $k = 0.0125$ per year for the human population.
Based on this, it would take
$$
\tau = \frac{\ln(2)}{0.0125} \approx 55.45 \text{ years}
$$
for the population to double. Compare this with the graph of Figure 11.5, and note that over this time
span, the population increases from 7 to 14 billion. ♦
Note: the observant student may notice that we are simply converting back from base e to base 2
when we compute the doubling time.
We summarize an important observation:

In general, an equation of the form
$$
\frac{dy}{dt} = ky
$$
that represents exponential growth has a doubling time of
$$
\tau = \frac{\ln(2)}{k}.
$$

This is shown in Figure 11.6. We have discovered that based on the uncontrolled growth model,
the population doubles roughly every 55 years! After 110 years, for example, there have been two
doublings, or a quadrupling of the population.

Figure 11.6: Doubling time for exponential growth. [Plot of y against t, rising from y0 to 2y0 over a time τ.]
Example 11.2.4 (A ten year doubling time). Suppose we are told that some animal population
doubles every 10 years. What growth rate would lead to such a trend?

Solution. In this case, $\tau = 10$ years. Rearranging
$$
\tau = \frac{\ln(2)}{k},
$$
we obtain
$$
k = \frac{\ln(2)}{\tau} = \frac{0.6931}{10} \approx 0.07 \text{ per year.}
$$
Thus, a growth rate of 7% per year leads to doubling roughly every 10 years. ♦
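
The relation between growth rate and doubling time works in both directions. A minimal Python sketch, with illustrative function names that are not part of the text:

```python
import math

def doubling_time(k):
    """Doubling time tau = ln(2) / k for dy/dt = k y with k > 0."""
    if k <= 0:
        raise ValueError("doubling only occurs for a positive growth rate k")
    return math.log(2) / k

def rate_from_doubling_time(tau):
    """Growth rate k = ln(2) / tau that gives doubling time tau > 0."""
    if tau <= 0:
        raise ValueError("the doubling time tau must be positive")
    return math.log(2) / tau

print(doubling_time(0.0125))          # human population model, in years
print(rate_from_doubling_time(10.0))  # animal population of Example 11.2.4, per year
```

Because the two functions are inverses of each other, applying one after the other returns the starting value (up to floating-point rounding).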

11.3 Radioactive decay


A radioactive material consists of atoms that undergo a spontaneous change. Every so often, some
radioactive atom emits a particle, and decays into an inert form. We call this a process of radioactive
decay. For any one atom, it is impossible to predict when this event would occur exactly, but based
on the behaviour of a large number of atoms decaying spontaneously, we can assign a probability k
of decay per unit time.

In this section, we use the same kind of book-keeping (keeping track of the number of radioactive
atoms remaining) as in the population growth example, to arrive at a differential equation that
describes the process. Once we have the equation, we determine its solution and make a long-term
prediction about the amount of radioactivity remaining at a future time.

§§ Deriving the model


We start by letting $N(t)$ be the number of radioactive atoms at time $t$. Generally, we would
know $N(0)$, the number present initially. Our goal is to make simple assumptions about the process
of decay that allow us to arrive at a mathematical model to predict values of $N(t)$ at any later
time $t > 0$.
Assumptions.

(1) The process of radioactive decay is random, but on average, the probability of decay for a given
radioactive atom is $k$ per unit time where $k > 0$ is some constant.

(2) During each (small) time interval of length $\Delta t = h$, a radioactive atom has probability $kh$ of
decaying. This is merely a restatement of (1).

Concept Check-In
1. Suppose a given atom has a 1% chance of decay per 24 hours. What is this atom’s
probability of decay per week? Per hour?

Suppose that at some time $t$, there are $N(t)$ radioactive atoms. Then, according to our assumptions,
during the time interval from $t$ to $t + h$, on average $khN(t)$ atoms would decay. How many are
there at time $t + h$? We can write the following balance equation:
$$
\left[\text{Amount left at time } t + h\right] = \left[\text{Amount present at time } t\right] - \left[\text{Amount decayed during the time interval from } t \text{ to } t + h\right]
$$
or, restated:
$$
N(t+h) = N(t) - khN(t). \tag{11.3.1}
$$
Here we have assumed that $h$ is a small time period. Rearranging Eqn. (11.3.1) leads to
$$
\frac{N(t+h) - N(t)}{h} = -kN(t).
$$
Considering the left hand side of this equation, we let $h$ get smaller and smaller ($h \to 0$) and recall
that
$$
\lim_{h \to 0} \frac{N(t+h) - N(t)}{h} = \frac{dN}{dt} = N'(t)
$$
where we have used the notation for a derivative of $N$ with respect to $t$. We have thus shown that a
description of the population of radioactive atoms reduces to
$$
\frac{dN}{dt} = -kN. \tag{11.3.2}
$$

We have, once more, arrived at a differential equation that provides a link between a function of time
N (t ) and its own rate of change dN/dt. Indeed, this equation specifies that dN/dt is proportional
to N, but with a negative constant of proportionality which implies decay.
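
The passage from the balance equation to the differential equation can be illustrated numerically. The following minimal Python sketch repeatedly applies the bookkeeping step N(t + h) = N(t) − khN(t) and compares the result with the exponential function N0 e^(−kt), which the text verifies later as the solution of the decay equation. The function name `simulate_decay` and the sample values are illustrative assumptions.

```python
import math

def simulate_decay(n0, k, t_end, h):
    """Apply the balance equation N(t + h) = N(t) - k h N(t) up to time t_end."""
    steps = round(t_end / h)  # number of time steps of length h
    n = n0
    for _ in range(steps):
        n = n - k * h * n     # atoms left after one more interval of length h
    return n

n0, k, t_end = 1000.0, 0.1, 10.0          # illustrative values (assumptions)
exact = n0 * math.exp(-k * t_end)          # exponential decay N0 * exp(-k t)
for h in (1.0, 0.1, 0.001):
    approx = simulate_decay(n0, k, t_end, h)
    print(f"h = {h}: balance equation gives {approx:.3f}, exponential gives {exact:.3f}")
```

As h shrinks, the bookkeeping result approaches the exponential value, which is the numerical counterpart of letting h tend to zero in the derivation.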
Above we formulated the entire model in terms of the number of radioactive atoms. However,
as shown below, the same equation holds regardless of the system of units used to measure the amount
of radioactivity.
Example 11.3.1. Define the number of moles of radioactive material by $y(t) = N(t)/A$ where $A$
is Avogadro’s number (the number of molecules in 1 mole: $\approx 6.022 \times 10^{23}$, a dimensionless
quantity, i.e. just a number with no associated units). Determine the differential equation satisfied
by $y(t)$.
Solution. We write $y(t) = N(t)/A$ in the form $N(t) = Ay(t)$ and substitute this expression for $N(t)$
in Eqn. (11.3.2). We use the fact that $A$ is a constant to simplify the derivative. Then
$$
\frac{dN}{dt} = -kN \quad \Rightarrow \quad \frac{d(Ay(t))}{dt} = -k(Ay(t)) \quad \Rightarrow \quad A\frac{dy(t)}{dt} = A(-ky(t)).
$$
Cancelling the constant $A$ from both sides of the equation leads to
$$
\frac{dy(t)}{dt} = -ky(t), \quad \text{or simply} \quad \frac{dy}{dt} = -ky. \tag{11.3.3}
$$
Thus $y(t)$ satisfies the same kind of differential equation (with the same negative proportionality
constant) between the derivative and the original function. We will refer to (11.3.3) as the decay
equation. ♦

§§ Solution to the decay equation (11.3.3)


Suppose that initially, there was an amount $y_0$. Then, together, the differential equation and initial
condition are
$$
\frac{dy}{dt} = -ky, \quad y(0) = y_0. \tag{11.3.4}
$$
We often refer to this pairing between a differential equation and an initial condition as an initial
value problem. Next, we show that an exponential function is an appropriate solution to this
problem.
Example 11.3.2 (Checking a solution). Show that the function
$$
y(t) = y_0 e^{-kt} \tag{11.3.5}
$$
is a solution to the initial value problem (11.3.4).
Solution. We compute the derivative of the candidate function (11.3.5), and rearrange, obtaining
$$
\frac{dy(t)}{dt} = \frac{d}{dt}\left[y_0 e^{-kt}\right] = y_0 \frac{de^{-kt}}{dt} = -ky_0 e^{-kt} = -ky(t).
$$
This verifies that the derivative of the function is $-k$ times the original function, so the function
satisfies the DE in (11.3.4). We can also check that the initial condition is satisfied:
$$
y(0) = y_0 e^{-k \cdot 0} = y_0 e^{0} = y_0 \cdot 1 = y_0.
$$
Hence, Eqn. (11.3.5) is the solution to the initial value problem for radioactive decay. For $k > 0$ a
constant, this is a decreasing function of time that we refer to as exponential decay. ♦

§§ The half life


Given a process of exponential decay, how long would it take for half of the original amount to
remain? Let us recall that the “original amount” (at time $t = 0$) is $y_0$. Then we are looking for the
time $t$ such that $y_0/2$ remains. We must solve for $t$ in
$$
y(t) = \frac{y_0}{2}.
$$
We refer to the value of $t$ that satisfies this as the half life.
Example 11.3.3 (Half life). Determine the half life in the exponential decay described by Eqn. (11.3.5).
Solution. We compute:
$$
\frac{y_0}{2} = y_0 e^{-kt} \quad \Rightarrow \quad \frac{1}{2} = e^{-kt}.
$$
Now taking reciprocals:
$$
2 = \frac{1}{e^{-kt}} = e^{kt}.
$$
Thus we find the same result as in our calculation for doubling times, namely,
$$
\ln(2) = \ln(e^{kt}) = kt,
$$
so that the half life is
$$
\tau = \frac{\ln(2)}{k}.
$$
This is shown in Figure 11.7.

Figure 11.7: Half-life in an exponentially decreasing process. [Plot of y against t, falling from y0 to y0/2 over a time τ.]

Example 11.3.4 (Chernobyl: April 1986). In 1986 the Chernobyl nuclear power plant exploded, and
scattered radioactive material over Europe. The radioactive element iodine-131 (I$^{131}$) has a half-life
of 8 days whereas cesium-137 (Cs$^{137}$) has a half-life of 30 years. Use the model for radioactive decay
to predict how much of this material would remain over time.

Solution. We first determine the decay constants for each of these two elements, by noting that
$$
k = \frac{\ln(2)}{\tau},
$$
and recalling that $\ln(2) \approx 0.693$. Then for I$^{131}$ we have
$$
k = \frac{\ln(2)}{\tau} = \frac{\ln(2)}{8} \approx 0.0866 \text{ per day.}
$$
Then the amount of I$^{131}$ left at time $t$ (in days) would be
$$
y_I(t) = y_0 e^{-0.0866t}.
$$
For Cs$^{137}$,
$$
k = \frac{\ln(2)}{30} \approx 0.023 \text{ per year,}
$$
so that for $T$ in years,
$$
y_C(T) = y_0 e^{-0.023T}.
$$
Note: we have used $T$ rather than $t$ to emphasize that units are different in the two calculations done
in this example.
Example 11.3.5 (Decay to 0.1% of the initial level). How long would it take for I$^{131}$ to decay to
0.1% of its initial level? Assume that the initial level occurred just after the explosion at Chernobyl.
Solution. We must calculate the time $t$ such that $y_I = 0.001y_0$:
$$
0.001y_0 = y_0 e^{-0.0866t} \quad \Rightarrow \quad 0.001 = e^{-0.0866t} \quad \Rightarrow \quad \ln(0.001) = -0.0866t.
$$
Therefore,
$$
t = \frac{\ln(0.001)}{-0.0866} \approx \frac{-6.9}{-0.0866} \approx 79.7 \text{ days.}
$$
Thus it would take about 80 days for the level of iodine-131 to decay to 0.1% of its initial level. ♦
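
The two steps used in Examples 11.3.4 and 11.3.5 (half-life to decay constant, then decay constant to the time needed to reach a given fraction) can be sketched in Python as follows. The function names are illustrative assumptions; times are returned in whatever units the half-life is given in.

```python
import math

def decay_constant(half_life):
    """Decay constant k = ln(2) / tau, in units of 1 / (units of half_life)."""
    return math.log(2) / half_life

def time_to_fraction(fraction, half_life):
    """Time t at which y(t) / y0 = fraction, from fraction = exp(-k t)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must lie in (0, 1]")
    return -math.log(fraction) / decay_constant(half_life)

k_iodine = decay_constant(8.0)          # per day, half-life of 8 days
t_iodine = time_to_fraction(0.001, 8.0) # days until 0.1% remains
print(f"k = {k_iodine:.4f} per day, t = {t_iodine:.1f} days")
```

The result differs from the hand calculation only in the last digit, because the code does not round k or ln(0.001) along the way.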

Concept Check-In
1. Repeat the calculation in Example 11.3.5 for Cesium.

2. Convert the Cesium decay time units to days and repeat the calculation of Example 11.3.4
with the new time units.

3. If the decay rate of a substance is 10% per day, what is its half-life?

11.4 Summary
1. A differential equation is a statement linking the rate of change of some state variable with
current values of that variable. An example is the simplest population growth model: if $N(t)$
is population size at time $t$:
$$
\frac{dN}{dt} = kN.
$$

2. A solution to a differential equation is a function that satisfies the equation. For instance, the
function $N(t) = Ce^{kt}$ (for any constant $C$) is a solution to the unlimited population growth
model (we check this by the appropriate differentiation). Graphs of such solutions (e.g. $N$
versus $t$) are called solution curves.

3. To select a specific solution, more information (an initial condition) is needed. Given this
information, e.g. $N(0) = N_0$, we can fully characterize the desired solution.

4. The decay equation is one representative of the same class of problems, and has an exponentially
decaying solution.
$$
\frac{dy}{dt} = -ky, \quad y(0) = y_0 \quad \Rightarrow \quad \text{Solution: } y(t) = y_0 e^{-kt}. \tag{11.4.1}
$$

5. So far, we have seen simple differential equations with simple (exponential) functions for
their solutions. In general, it may be quite challenging to make the connection between the
differential equation (stemming from some application or model) with the solution (which we
want in order to understand and predict the behaviour of the system.)

Figure 11.8: A “flow chart” showing how differential equations originate from scientific problems.
[Boxes in the chart: scientific problem or system; facts, observations, assumptions, hypotheses;
“Laws of Nature” or statements about rates of change; mathematical model; differential equation(s)
describing the system; solutions to the differential equations; predictions about the system behaviour.]

In this chapter, we saw examples in which a natural phenomenon (population growth, radioactive
decay, cell growth) motivated a mathematical model that led to a differential equation. In each case,
that equation was derived by making a statement that tracked the amount or number or mass of a
system over time. Numerous simplifications were made to derive each differential equation. For
example, we assumed that the birth and mortality rates stay fixed even as the population grows to
huge sizes.

With regard to a larger context.

• Our purpose was to illustrate how a simple model is created, and what such models can
predict.

• In general, differential equation models are often based on physical laws (“F = ma”) or
conservation statements (“rate in minus rate out equals net rate of change”, or “total energy =
constant”).

• In biology, where the laws governing biochemical events are less formal, the models are often
based on some mix of speculation and reasonable assumptions.

• In Figure 11.8 we illustrate how the scientific method leads to a cycle between the mathematical
models and their test and validation using observations about the natural world.

Quick Concept Check


1. Identify each of the following with either exponential growth or exponential decay:

(a) $y = 20e^{3t}$;
(b) $y = 5e^{-3t}$;
(c) $\frac{dy}{dt} = 3t$;
(d) $\frac{dy}{dx} = -5x$.

2. Determine the doubling time of the exponential growth function $N(t) = 500e^{2t}$.

3. Determine the half life of the exponential decay function $N(t) = 500e^{-2t}$.

4. Consider the following figure depicting exponential growth:

[Figure: plot of y against t for t from 0 to 100; the y-axis shows ticks at 10, 15 and 20.]

What is the doubling time of this function?


Chapter 12

(Flavours A, B) Solving differential equations

In Chapter 11, we introduced differential equations to keep track of continuous changes in the
growth of a population or the decay of radioactivity. We encountered a differential equation that
tracks changes in cell mass due to nutrient absorption and consumption. Finally, we learned that
the solution to a differential equation is a function. In the applications studied, that function can be
interpreted as a prediction of the behaviour of the system or process over time.
In this chapter, we further develop some of these ideas. We explore several techniques for finding
solutions and for verifying that a given function is a solution to a differential equation. We then
examine a simple class of differential equations that have many applications to processes of
production and decay, and find their solutions. Finally, we show how an approximation method
provides for numerical solutions of such problems.

12.1 Verifying that a function is a solution

In this section we concentrate on analytic solutions to a differential equation. By analytic solution,
we mean a “formula” such as $y = f(x)$ that satisfies the given differential equation. We saw in
Chapter 11 that we can check whether a function satisfies a differential equation (e.g., Example 11.3.2)
by simple differentiation. In this section, we further demonstrate this process.

Example 12.1.1. Show that the function $y(t) = (2t + 1)^{1/2}$ is a solution to the differential equation
and initial condition
$$
\frac{dy}{dt} = \frac{1}{y}, \quad y(0) = 1.
$$
Solution. First, we check the derivative, obtaining
$$
\frac{dy(t)}{dt} = \frac{d(2t+1)^{1/2}}{dt} = \frac{1}{2}(2t+1)^{-1/2} \cdot 2 = (2t+1)^{-1/2} = \frac{1}{(2t+1)^{1/2}} = \frac{1}{y}.
$$
Hence, the function satisfies the differential equation. We must also verify the initial condition. We
find that $y(0) = (2 \cdot 0 + 1)^{1/2} = 1^{1/2} = 1$. Thus the initial condition is also satisfied, and $y(t)$ is
indeed a solution. ♦

$$
\begin{array}{c|c}
\text{LHS} & \text{RHS} \\ \hline
\dfrac{dy}{dt} & 1 - y \\[2ex]
\dfrac{d\left[y_0 e^{-t}\right]}{dt} & 1 - y_0 e^{-t} \\[2ex]
-y_0 e^{-t} & \text{✗ (no match)}
\end{array}
$$
Table 12.1: The function $y(t) = y_0 e^{-t}$ is not a solution to the differential equation (12.1.1). Plugging
the function into each side of the DE and simplifying (down the rows) leads to expressions that do
not match.

$$
\begin{array}{c|c}
\text{LHS} & \text{RHS} \\ \hline
\dfrac{dy}{dt} & 1 - y \\[2ex]
\dfrac{d}{dt}\left[1 - (1 - y_0)e^{-t}\right] & 1 - \left[1 - (1 - y_0)e^{-t}\right] \\[2ex]
-(1 - y_0)\dfrac{de^{-t}}{dt} & (1 - y_0)e^{-t} \\[2ex]
(1 - y_0)e^{-t} & \text{✓ (match)}
\end{array}
$$
Table 12.2: The function $y(t) = 1 - (1 - y_0)e^{-t}$ is a solution to the differential equation (12.1.1).
The expressions we get by evaluating each side of the differential equation do match.

Example 12.1.2. Consider the differential equation and initial condition
$$
\frac{dy}{dt} = 1 - y, \quad y(0) = y_0. \tag{12.1.1}
$$
a) Show that the function $y(t) = y_0 e^{-t}$ is not a solution to this differential equation.
b) Show that the function $y(t) = 1 - (1 - y_0)e^{-t}$ is a solution.
Solution.
a) To check whether $y(t) = y_0 e^{-t}$ is a solution to the differential equation (12.1.1), we substitute
the function into each side (“left hand side”, LHS; “right hand side”, RHS) of the equation. We
show the results in the columns of Table 12.1. After some steps in the simplification, we see that
the two sides do not match, and conclude that the function is not a solution, as it fails to satisfy
the equation.
b) Similarly, we check the second function. The calculations are shown in the columns of Table 12.2.
We find that RHS = LHS, so the differential equation is satisfied. Finally, let us show that the
initial condition $y(0) = y_0$ is also satisfied. Plugging in $t = 0$ we have
$$
y(0) = 1 - (1 - y_0)e^{0} = 1 - (1 - y_0) \cdot 1 = 1 - (1 - y_0) = y_0.
$$
Thus, both differential equation and initial condition are satisfied. ♦
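
The LHS/RHS comparison of Example 12.1.2 can also be carried out numerically. The following is a minimal Python sketch, not part of the original text: it estimates dy/dt by a centred finite difference and reports the residual dy/dt − (1 − y), which is zero (up to rounding) only for a genuine solution. The function names and the choice y0 = 3 are illustrative assumptions.

```python
import math

Y0 = 3.0  # illustrative initial value (an assumption)

def candidate_a(t):
    """Candidate from part a): y(t) = y0 * exp(-t)."""
    return Y0 * math.exp(-t)

def candidate_b(t):
    """Candidate from part b): y(t) = 1 - (1 - y0) * exp(-t)."""
    return 1 - (1 - Y0) * math.exp(-t)

def residual(y, t, h=1e-6):
    """LHS minus RHS of dy/dt = 1 - y at time t, using a centred difference."""
    dydt = (y(t + h) - y(t - h)) / (2 * h)
    return dydt - (1 - y(t))

for name, y in (("a", candidate_a), ("b", candidate_b)):
    print(f"candidate {name}: residual at t = 1 is {residual(y, 1.0):.6f}")
```

A numerical check of this kind does not replace the algebra in Tables 12.1 and 12.2, but it is a quick way to catch a wrong candidate before differentiating by hand.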



Example 12.1.3 (Height of water draining out of a cylindrical container). A cylindrical container
with cross-sectional area $A$ has a small hole of area $a$ at its base, through which water leaks out. It
can be shown that the height of water $h(t)$ in the container satisfies the differential equation
$$
\frac{dh}{dt} = -k\sqrt{h}, \tag{12.1.2}
$$
(where $k$ is a constant that depends on the size and shape of the cylinder and its hole: $k = \frac{a}{A}\sqrt{2g} > 0$
and $g$ is acceleration due to gravity). Show that the function
$$
h(t) = \left(\sqrt{h_0} - k\frac{t}{2}\right)^2 \tag{12.1.3}
$$
is a solution to the differential equation (12.1.2) and initial condition $h(0) = h_0$.

Concept Check-In
1. Draw a diagram of the system described in Example 12.1.3.

2. What set of units would be reasonable for each of the parameters in Example 12.1.3?

3. Create a table to organize the calculations for this example, similar to Tables 12.1 and
12.2.


Solution. We first verify that the initial condition is satisfied. Substitute $t = 0$ into the function
(12.1.3). Then we find $h(0) = h_0$, verifying the initial condition.
To show that the differential equation (12.1.2) is satisfied, we differentiate the function in
Eqn. (12.1.3):
$$
\frac{dh(t)}{dt} = \frac{d}{dt}\left(\sqrt{h_0} - k\frac{t}{2}\right)^2 = 2\left(\sqrt{h_0} - k\frac{t}{2}\right) \cdot \left(\frac{-k}{2}\right) = -k\left(\sqrt{h_0} - k\frac{t}{2}\right) = -k\sqrt{h(t)}.
$$
Here we have used the power law and the chain rule, remembering that $h_0$, $k$ are constants. In the
last step we used Eqn. (12.1.3): as long as $\sqrt{h_0} - k\frac{t}{2} \ge 0$ (that is, until the container is empty),
$\sqrt{h(t)} = \sqrt{h_0} - k\frac{t}{2}$, so the expression we computed for $dh/dt$ is exactly $-k\sqrt{h(t)}$.
Thus, we have shown that the function in Eqn. (12.1.3) satisfies both the initial condition
and the differential equation. ♦
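
As a numerical companion to Example 12.1.3, the following minimal Python sketch evaluates the residual dh/dt + k√h for the candidate function (12.1.3). The values of h0 and k and the function names are illustrative assumptions. The sketch also shows that the formula only describes the draining container up to the time 2√h0/k at which the height reaches zero: beyond that time the squared expression starts to increase again, so it no longer satisfies Eqn. (12.1.2).

```python
import math

def height(t, h0, k):
    """Candidate solution h(t) = (sqrt(h0) - k t / 2)^2 from Eqn. (12.1.3)."""
    return (math.sqrt(h0) - k * t / 2) ** 2

def residual(t, h0, k, dt=1e-6):
    """dh/dt + k sqrt(h), with dh/dt estimated by a centred difference."""
    dhdt = (height(t + dt, h0, k) - height(t - dt, h0, k)) / (2 * dt)
    return dhdt + k * math.sqrt(height(t, h0, k))

def emptying_time(h0, k):
    """Time at which sqrt(h0) - k t / 2 reaches zero, i.e. h = 0."""
    return 2 * math.sqrt(h0) / k

h0, k = 4.0, 0.5  # illustrative values (assumptions, not from the text)
print("container empties at t =", emptying_time(h0, k))
print("residual before emptying (t = 2): ", residual(2.0, h0, k))
print("residual after emptying (t = 10):", residual(10.0, h0, k))
```

Before the emptying time the residual is zero up to rounding; after it, the residual is clearly nonzero, which is why the physical solution is taken to be h = 0 from that time on.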

As shown in Examples 12.1.1 to 12.1.3, if we are told that a function is a solution to a differential
equation, we can check the assertion and verify that it is correct or incorrect. A much more difficult
task is to find the solution of a new differential equation from first principles.
In some cases, integration, learned in second semester calculus, can be used. In others, some
transformation that changes the problem to a more familiar one is helpful; an example of this type
is presented in Section 12.2. In many cases, particularly those of so-called non-linear differential
equations, great expertise and familiarity with advanced mathematical methods are required to
find the solution to such problems in an analytic form, i.e. as an explicit formula. In such cases,
approximation and numerical methods are helpful.

12.2 Equations of the form $y'(t) = a - by$


In this section we introduce an important class of differential equations that have many applications
in physics, chemistry, biology, and other fields. All share a similar structure, namely all are of
the form
$$
\frac{dy}{dt} = a - by, \quad y(0) = y_0. \tag{12.2.1}
$$
First, we show how a solution to such an equation can be found. Then, we examine a number of
applications.

§§ Special solutions: steady states


We first ask about “special solutions” to the differential equation (12.2.1) in which there is no change
over time. That is, we ask whether there are values of $y$ for which $dy/dt = 0$.

From (12.2.1), we find that such solutions would satisfy
$$
\frac{dy}{dt} = 0 \quad \Rightarrow \quad a - by = 0 \quad \Rightarrow \quad y = \frac{a}{b}.
$$
In other words, if we were to start with the initial value $y(0) = a/b$, then that value would not
change, since it satisfies $dy/dt = 0$, so that the solution at all future times would be $y(t) = a/b$.
(Of course, this is a perfectly good function; it is simply a function that is always constant.)
We refer to such constant solutions as steady states.

Figure 12.1: $y = a/b$ is a constant solution to the differential equation in (12.2.1). We call this type
of solution a steady state.

§§ Other solutions: away from steady state


What happens if we start with a value of y that is not exactly at the “special” steady state? Let us
rewrite the DE in a more suggestive form,

dy dy  a
= a ´ by ñ = ´b y ´ ,
dt dt b
S OLVING DIFFERENTIAL EQUATIONS 12.2 E QUATIONS OF THE FORM y1 (t ) = a ´ by

(having factored out ´b). The advantage is that we recognize the expression (y ´ ab ) as the difference,
or deviation of y away from its steady state value. (That deviation could be either positive or negative,
depending on whether y is larger or smaller than a/b.) We ask whether this deviation gets larger or
smaller as time goes by, i.e., whether y gets further away or closer to its steady state value a/b.
Define z(t ) as that deviation, that is

Figure 12.2: We define z(t ) as the deviation of y from its steady state value. Here we show two
typical initial values of z, where z0 = y0 ´ ab .

a
z(t ) = y(t ) ´ ,
b
Then, since a, b are constants, we recognize that
dz dy
= .
dt dt
Second, the initial value of z follows simply from the initial value of y:
a a
z(0) = y(0) ´ = y0 ´ .
b b
Now we can transform the equation (12.2.1) into a new differential equation for the variable z
by using these two facts. We can replace the y derivative by the z derivative, and also, using
Eqn. (12.2.1), find that
dz dy  a
= = ´b y ´ = ´bz.
dt dt b
Hence, we have transformed the original DE and IC into the new problem

Figure 12.3: The deviation away from steady state (blue, grey curves) is z(t ) = y(t ) ´ a/b. We can
solve the differential equation for z(t ) because it is a simple exponential decay equation. Here we
show two typical solutions for z.

\[ \frac{dz}{dt} = -bz, \quad z(0) = z_0, \quad \text{where } z_0 = \left[ y_0 - \frac{a}{b} \right]. \]

Figure 12.4: Finally, we can determine the solution y(t ).

But this is the familiar decay initial value problem that we have already solved before. So
\[ z(t) = z_0 e^{-bt}. \]
We have arrived at the conclusion that the deviation from steady state decays exponentially with
time, provided that $b > 0$. Hence, we already know that y should get closer to the constant value
a/b as time goes by!
We can do even better than this, by transforming the solution we found for z(t ) into an expression
for y(t ). To do so, use the definition once more, setting

\[ z(t) = z_0 e^{-bt} \quad\Rightarrow\quad y(t) - \frac{a}{b} = \left(y_0 - \frac{a}{b}\right) e^{-bt}. \]
Solving for y(t) then leads to
\[ y(t) = \frac{a}{b} + \left(y_0 - \frac{a}{b}\right) e^{-bt}. \tag{12.2.2} \]
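As a quick sanity check, the formula (12.2.2) can be tested numerically. The following is a minimal Python sketch, not part of the original text: the function names `solution` and `residual` and the parameter values a = 2, b = 0.5, y0 = 10 are illustrative assumptions. It compares a finite-difference estimate of dy/dt with the right-hand side a − by.

```python
import math

def solution(t, a, b, y0):
    """Closed-form solution (12.2.2): y(t) = a/b + (y0 - a/b) * exp(-b t)."""
    steady = a / b
    return steady + (y0 - steady) * math.exp(-b * t)

def residual(t, a, b, y0, h=1e-6):
    """Centred-difference estimate of dy/dt minus the right-hand side a - b*y(t)."""
    dydt = (solution(t + h, a, b, y0) - solution(t - h, a, b, y0)) / (2 * h)
    return dydt - (a - b * solution(t, a, b, y0))

# Illustrative (assumed) parameter values, chosen only for this check
a, b, y0 = 2.0, 0.5, 10.0
for t in (0.0, 1.0, 5.0):
    print(t, solution(t, a, b, y0), residual(t, a, b, y0))
```

The residuals should be very close to zero (up to rounding and finite-difference error), and the values of y(t) should move from y0 towards the steady state a/b.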
Example 12.2.1 (a = b = 1). Suppose we are given the differential equation and initial condition
\[ \frac{dy}{dt} = 1 - y, \quad y(0) = y_0. \tag{12.2.3} \]
Determine the solution to this differential equation.
Solution.
Concept Check-In
1. Find the steady state of Eqn. (12.2.3).

2. From Figure 12.5, determine what were the four different initial conditions used.

3. Rewrite these four initial conditions as the initial deviations away from steady state, that
is, give the initial values, z0 of the deviation.

By substituting a = 1, b = 1 in the solution found above, we observe that
\[ y(t) = 1 - (1 - y_0) e^{-t}. \]
Representative curves in this family of solutions are shown in Figure 12.5 for various initial values
$y_0$. ♦
We now apply the methods to a number of examples.

[Plot: solutions to the differential equation dy/dt = 1 − y; vertical axis y, horizontal axis time, t.]

Figure 12.5: Solutions to Eqn. (12.2.3) are functions that approach y = 1.

§§ Newton’s law of cooling


Note: Newton’s law of cooling is another nice example of the modelling we can do using differential
equations. For Math 100, you don’t have to memorize the specifics of this law – it’s shown here
only as an example.
Consider an object at temperature T (t ) in an environment whose ambient temperature is E.
Depending on whether the object is cooler or warmer than the environment, it heats up or cools
down. From common experience we know that, after a long time, the temperature of the object
equilibrates with its environment.
Isaac Newton formulated a hypothesis to describe the rate of change of temperature of an object.
He assumed that
The rate of change of temperature T of an object is proportional to the difference between its
temperature and the ambient temperature, E.
To rephrase this statement mathematically, we write
\[ \frac{dT}{dt} \text{ is proportional to } (T(t) - E). \]
This implies that the derivative dT/dt is some constant multiple of the term $(T(t) - E)$. However,
the sign of that constant requires some discussion. Denote the constant of proportionality by α
temporarily, and suppose $\alpha \ge 0$. Let us check whether the differential equation
\[ \frac{dT}{dt} = \alpha (T(t) - E) \]
makes physical sense.

Concept Check-In
1. What can we say about the units of T and E?

Suppose the object is warmer than its environment ($T(t) > E$). Then $T(t) - E > 0$ and $\alpha \ge 0$
imply that $dT/dt \ge 0$, which says that the object never cools down (and, for $\alpha > 0$, actually gets warmer)! But this
does not agree with our everyday experience: a hot cup of coffee cools off in a chilly room. Hence
$\alpha \ge 0$ cannot be correct. Based on this, we conclude that Newton’s law of cooling, written in the
form of a differential equation, should read:
\[ \frac{dT}{dt} = k(E - T(t)), \quad \text{where } k > 0. \tag{12.2.4} \]
Note: the sign of the term in parentheses has been switched.
Typically, given the temperature at some initial time T (0) = T0 , we want to predict T (t ) for later
time.
Example 12.2.2. Consider the temperature T(t) as a function of time. Solve the differential equation
for Newton’s law of cooling
\[ \frac{dT}{dt} = k(E - T), \]
together with the initial condition $T(0) = T_0$.
Solution. As before, we transform the variable to reduce the differential equation to one that we
know how to solve. This time, we select the new variable to be $z(t) = E - T(t)$. Then, by steps
similar to previous examples, we find that
\[ \frac{dz}{dt} = -kz. \]
We also rewrite the initial condition in terms of z, leading to $z(0) = E - T(0) = E - T_0$. After
carrying out Steps 1-3 as before, we find the solution for T(t),
\[ T(t) = E + (T_0 - E) e^{-kt}. \tag{12.2.5} \]

Concept Check-In
1. Fill in the details for Example 12.2.2.

2. In Figure 12.6, what are the five different initial temperatures, T0 corresponding to each
solution curve?

3. In Figure 12.6, how many curves represent a heating object and how many a cooling
object?

In Figure 12.6 we show a family of curves of the form of Eqn. (12.2.5) for five different initial
temperature values (we have set E = 10 and k = 0.2 for all these curves). ♦
Next, we interpret the behaviour of these solutions.
Example 12.2.3. Explain (in words) what the form of the solution in Eqn. (12.2.5) of Newton’s law
of cooling implies about the temperature of an object as it warms or cools.

[Plot: temperature, T (vertical axis, 0 to 20) versus time, t (horizontal axis, 0 to 14).]

Figure 12.6: Temperature versus time, T (t ), for a cooling object.

Solution. We make the following remarks:

• It is straightforward to verify that the initial temperature is $T(0) = T_0$ (substitute t = 0 into the
solution in Eqn. (12.2.5)). Now examine the time dependence. Only one term, $e^{-kt}$, depends
on time. Since $k > 0$, this is an exponentially decaying function, whose magnitude shrinks
with time. The whole term that contains it, $(T_0 - E)e^{-kt}$, continually shrinks in magnitude. Hence,

\[ T(t) = E + (T_0 - E)e^{-kt} \quad\Rightarrow\quad \text{as } t \to \infty, \; e^{-kt} \to 0, \text{ so } T(t) \to E. \]

Thus the temperature of the object always approaches the ambient temperature. This is evident
in the solution curves shown in Figure 12.6.

• We also observe that the direction of approach (decreasing or increasing) depends on the
sign of the constant $(T_0 - E)$. If $T_0 > E$, the temperature approaches E from above, whereas
if $T_0 < E$, the temperature approaches E from below.

• In the specific case that $T_0 = E$, there is no change at all. T = E satisfies dT/dt = 0, and
corresponds to a steady state of the differential equation, as previously defined.

Steady states are studied in more detail in Chapter 13.
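The three remarks above can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than part of the original text: it evaluates Eqn. (12.2.5) with E = 10 and k = 0.2 (the values quoted for Figure 12.6), while the function name `temperature` and the three starting temperatures 20, 10 and 0 are chosen here purely for demonstration.

```python
import math

def temperature(t, T0, E=10.0, k=0.2):
    """Eqn. (12.2.5): T(t) = E + (T0 - E) * exp(-k t)."""
    return E + (T0 - E) * math.exp(-k * t)

times = [0, 5, 10, 20, 40]
# Assumed starting temperatures: above, at, and below the ambient temperature E
for T0 in (20.0, 10.0, 0.0):
    row = [round(temperature(t, T0), 3) for t in times]
    print(f"T0 = {T0:5.1f}: {row}")
```

The first row decreases towards 10, the last row increases towards 10, and the middle row stays at 10 throughout, which is the steady state.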

Concept Check-In
1. Consider three cups of coffee left in a 20 °C room. If one is iced, another is piping hot,
and the third is room temperature, which cup will not change temperature? Which, thus,
represents a steady state?

2. Convert the temperatures in Example 12.2.4 to Fahrenheit and repeat.



§§ Using Newton’s law of cooling to solve a mystery


Now that we have a detailed solution to the differential equation representing Newton’s law of
cooling, we can apply it to making exact determinations of temperature over time, or of time at
which a certain temperature was attained. The following example illustrates an application of this
idea.
Example 12.2.4 (Murder mystery). It is a dark clear night. The air temperature is 10 °C. A body is
discovered at midnight. Its temperature is 27 °C. One hour later, the body has cooled to 24 °C. Use
Newton’s law of cooling to determine the time of death.


Solution. We assume that body temperature just before death was 37 °C (normal human body
temperature). Let t = 0 be the time of death. Then the initial temperature is $T(0) = T_0 = 37$ °C. We
want to find the time elapsed until the body was found, i.e. the time t at which the temperature of the
body had cooled down to 27 °C. We assume that the ambient temperature, E = 10, was constant.
From Newton’s law of cooling, the body temperature satisfies
\[ \frac{dT}{dt} = k(10 - T). \]
From previous work and Eqn. (12.2.5), the solution to this DE is
\[ T(t) = 10 + (37 - 10)e^{-kt}. \]
We do not know the value of the constant k, but we have enough information to find it. First, at
discovery, the body’s temperature was 27 °C. Hence at time t
\[ 27 = 10 + 27e^{-kt} \quad\Rightarrow\quad 17 = 27e^{-kt}. \]
Also at t + 1 (one hour after discovery), the temperature was 24 °C, so
\[ T(t+1) = 10 + (37 - 10)e^{-k(t+1)} = 24 \quad\Rightarrow\quad 24 = 10 + 27e^{-k(t+1)}. \]
Thus,
\[ 14 = 27e^{-k(t+1)}. \]
We have two equations for the two unknowns t and k. To solve for k, take the ratio of the two
equations. Then
\[ \frac{14}{17} = \frac{27e^{-k(t+1)}}{27e^{-kt}} = e^{-k} \quad\Rightarrow\quad -k = \ln\left(\frac{14}{17}\right) \approx -0.194. \]
Thus $k \approx 0.194$ per hour is the constant that describes the rate of cooling of the body.
To find the time elapsed since death, t, use
\[ 17 = 27e^{-kt} \quad\Rightarrow\quad -kt = \ln\left(\frac{17}{27}\right) \approx -0.4626. \]
Finally, solving for t, we get
\[ t = \frac{0.4626}{k} = \frac{0.4626}{0.194} \approx 2.384 \text{ hours}. \]

Concept Check-In
1. Give the concluding sentence for Example 12.2.4. Be sure to include an actual time of
death, given that the body was discovered at midnight.

2. Use a plotting program to graph T(t) for Example 12.2.4.

3. Use your plot to estimate how long it took for the body to cool off to 33 °C.
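The arithmetic in Example 12.2.4 can be reproduced with a few lines of Python. This is a minimal sketch using only the numbers stated in the example; the variable names are chosen here for illustration. Because it carries full precision for k rather than the rounded value 0.194, the elapsed time it prints may differ from the value in the text in the third decimal place.

```python
import math

E = 10.0          # ambient temperature (deg C), from the example
T0 = 37.0         # assumed body temperature at death (deg C), as in the example
T_found = 27.0    # temperature at discovery (midnight)
T_later = 24.0    # temperature one hour after discovery

# Ratio of the two conditions: (T_later - E) / (T_found - E) = exp(-k * 1 hour)
k = -math.log((T_later - E) / (T_found - E))

# Condition at discovery: (T_found - E) / (T0 - E) = exp(-k * t)
t = -math.log((T_found - E) / (T0 - E)) / k

print(f"k = {k:.4f} per hour")
print(f"t = {t:.3f} hours between death and discovery")
```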

§§ Related applications and further examples


Having gained familiarity with specific examples, we now return to the general case and summarize
the results.

The differential equation and initial condition


\[ \frac{dy}{dt} = a - by, \quad y(0) = y_0 \tag{12.2.6} \]
has the solution
\[ y(t) = \frac{a}{b} - \left(\frac{a}{b} - y_0\right) e^{-bt}. \tag{12.2.7} \]

Suppose that $a, b > 0$ in Eqn. (12.2.6). Then we can summarize the behaviour of the solutions
(12.2.7) as follows:

• The time dependence of Eqn. (12.2.7) is contained in the term $e^{-bt}$, which (for $b > 0$) is
exponentially decreasing. As time increases, $t \to \infty$, the exponential term becomes negligibly
small, so $y \to a/b$.

• If initially $y(0) = y_0 > a/b$, then y(t) approaches a/b from above, whereas if $y_0 < a/b$, it
approaches a/b from below.

• If initially $y_0 = a/b$, there is no change at all (dy/dt = 0). Thus y = a/b is a steady state of
the DE in Eqn. (12.2.6).

Recognizing such general structure means that we can avoid repeating similar calculations from
scratch in related examples. Newton’s law of cooling is one representative of the class of differential
equations of the form Eqn. (12.2.6). If we set a = kE, b = k and T = y in Eqn. (12.2.6), we get
back to Eqn. (12.2.4). As expected from the general case, T approaches a/b = E, the ambient
temperature, which corresponds to a steady state of NLC.
Next, we describe other examples that share this structure, and hence similar dynamic behaviour.
Friction and terminal velocity. A falling object accelerates under the force of gravity, but friction
reduces this acceleration.

Note
Eqn. (12.2.8) comes from a simple force balance:

\[ ma = F_{\text{gravity}} - F_{\text{drag}}, \]

and from the assumption that $F_{\text{drag}} = \mu v$, where $\mu > 0$ is the “drag coefficient”. Dividing both
sides by m and replacing a by dv/dt leads to this equation, with $k = \mu/m$.

The differential equation satisfied by the velocity v(t) of the falling object with friction is
\[ \frac{dv}{dt} = g - kv \tag{12.2.8} \]
where $g > 0$ is acceleration due to gravity and $k > 0$ is a constant representing the effect of air
resistance. Usually, a frictional force is assumed to be proportional to the velocity of the object,
and to act in a direction that slows it down. (This accounts for the negative sign in Eqn. (12.2.8).)
Parachutes operate on the principle of enhancing that frictional force to damp out the acceleration of
a skydiver. Hence, Eqn. (12.2.8) is often called the skydiver equation.

Example 12.2.5. Use the general results for Eqn. (12.2.6) to write down the solution to the differential
equation (12.2.8) for the velocity of a skydiver given the initial condition $v(0) = v_0$. Interpret
your results in a simple description of what happens over time.

Concept Check-In
1. Assign appropriate units to each of the parameters in Example 12.2.5.

2. When a sky-diver steps into the void, her initial vertical velocity is zero. Write down her
velocity v(t ) based on results of Example 12.2.5 .

Solution. Eqn. (12.2.8) is of the same form as Eqn. (12.2.6), and has the same type of solutions. We
merely have to adjust the notation, by identifying

\[ v(t) \to y(t), \quad g \to a, \quad k \to b, \quad v_0 \to y_0. \]

Hence, without further calculation, we can conclude that the solution of (12.2.8) together with its
initial condition is:
\[ v(t) = \frac{g}{k} - \left(\frac{g}{k} - v_0\right) e^{-kt}. \tag{12.2.9} \]
The velocity is initially $v_0$, and eventually approaches g/k, which is the steady state or terminal
velocity for the object. Depending on the initial speed, the object either slows down (if $v_0 > g/k$) or
speeds up (if $v_0 < g/k$) as it approaches the terminal velocity. ♦
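To get a feel for the time scale in Eqn. (12.2.9), the sketch below computes the terminal velocity and the time needed to reach 95% of it when starting from rest. The values g = 9.8 m/s² and k = 0.2 s⁻¹ are assumptions made only for this illustration (the text does not specify k), and the function name `velocity` is hypothetical. With $v_0 = 0$, Eqn. (12.2.9) gives $v(t) = (g/k)(1 - e^{-kt})$, so 95% of g/k is reached when $e^{-kt} = 0.05$, i.e. at $t = \ln(20)/k$.

```python
import math

def velocity(t, v0, g=9.8, k=0.2):
    """Eqn. (12.2.9): v(t) = g/k - (g/k - v0) * exp(-k t)."""
    v_term = g / k
    return v_term - (v_term - v0) * math.exp(-k * t)

g, k = 9.8, 0.2                 # assumed values, in m/s^2 and 1/s
v_terminal = g / k              # steady state (terminal) velocity
t95 = math.log(20) / k          # time at which exp(-k t) = 0.05 when v0 = 0

print(f"terminal velocity: {v_terminal:.1f} m/s")
print(f"time to reach 95% of it from rest: {t95:.2f} s")
print(f"check: v(t95) / v_terminal = {velocity(t95, 0.0) / v_terminal:.3f}")
```

Note that the time to reach a given fraction of the terminal velocity depends only on k, not on g.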
Chemical production and decay. A chemical reaction inside a fixed reaction volume produces
a substance at a constant rate $K_{in}$. A second reaction results in decay of that substance at a rate
proportional to its concentration. Let c(t) denote the time-dependent concentration of the substance,
and assume that time is measured in units of hours. Then, writing down a balance equation leads to
a differential equation of the form
\[ \frac{dc}{dt} = K_{in} - \gamma c. \tag{12.2.10} \]

Here, the first term is the rate of production and the second term is the rate of decay. The
net rate of change of the chemical concentration is then the difference of the two. The constants
$K_{in} > 0$, $\gamma > 0$ represent the rates of production and decay. Recall that the units of each term in any
equation have to match. For example, if the concentration c is measured in units of milli-Molar
(mM), then dc/dt has units of mM/hr, and hence $K_{in}$ must have units of mM/hr and γ must have
units of 1/hr.

Example 12.2.6. Write down the solution to the DE (12.2.10) given the initial condition c(0) = c0 .
Determine the steady state chemical concentration.

Solution. Translating notation from the general case to this example,

\[ c(t) \to y(t), \quad K_{in} \to a, \quad \gamma \to b. \]

Then we can immediately write down the solution:

\[ c(t) = \frac{K_{in}}{\gamma} - \left(\frac{K_{in}}{\gamma} - c_0\right) e^{-\gamma t}. \tag{12.2.11} \]

Regardless of its initial condition, the chemical concentration will approach the steady state
concentration $c = K_{in}/\gamma$. ♦
In this section we have seen that the behaviour found in the general case of the differential
equation (12.2.1), can be reinterpreted in each specific situation of interest. This points to one
powerful aspect of mathematics, namely the ability to use results in abstract general cases to solve a
variety of seemingly unrelated scientific problems that share the same mathematical structure.
Featured Problem 12.2.7 (Greenhouse Gasses and atmospheric CO2 )
Climate change has been attributed partly to the accumulation of greenhouse gasses (such as carbon
dioxide and methane) in the atmosphere.


Figure 12.7: CO2 is produced by emissions from burning fossil fuel and other human activities
(orange arrow). The oceans and plant biomass are both sinks that absorb CO2 (light green arrows).

Here we consider a simplified illustrative model for the carbon cycle that tracks the sources and
sinks of CO2 in the atmosphere. Consider C(t) as the level of atmospheric carbon dioxide. Define
the production rate of CO2 due to utilization of fossil fuel and other human activity to be $E_{FF}$, and
let the rate of absorption of CO2 by the oceans be $S_{OCEAN}$. We will also assume that living plants
absorb CO2 at a rate proportional to their biomass and to the CO2 level.
S OLVING DIFFERENTIAL EQUATIONS 12.3 E ULER ’ S METHOD AND NUMERICAL SOLUTIONS

1. Explain the following differential equation for atmospheric CO2 :


\[ \frac{dC}{dt} = E_{FF} - S_{OCEAN} - \gamma P C. \tag{12.2.12} \]

2. Assuming that $E_{FF}$, $S_{OCEAN}$, γ, P are constants, find the steady state level of CO2 in terms of
these parameters.

Hint
CO2 is usually given in units of “parts per million”, ppm ($= 10^{-6}$); 1 ppm = 2.1 GtC.
(1 GtC = 1 gigaton carbon = $10^9$ tons.)
Time is typically given in years, so rates are “per year” (yr$^{-1}$).
Approximate parameter values:

• $E_{FF} \approx 10$ GtC yr$^{-1}$,
• $S_{OCEAN} \approx 3$ GtC yr$^{-1}$,
• $P \approx 560$ Gt plant biomass,
• $\gamma \approx 1.35 \cdot 10^{-5}$ yr$^{-1}$ Gt$^{-1}$.

3. Find C(t), that is, predict the amount of CO2 over time, assuming that $C(0) = C_0$.

4. Graph the function C(t) for parameter values given in the problem, assuming that $C_0 = 400$ ppm $= 840$ GtC.

5. How big an effect would be produced on the CO2 level in 50 years if 15% of the plant biomass
is lost to deforestation just prior to t = 0?



Note: Information for Problem 12.2.7 is adapted from the literature, and may reflect many simplifications
and approximations. In actual fact, most “constants” in the problem are time-dependent,
making the real problem of predicting CO2 levels much more challenging.

12.3 Euler’s method and numerical solutions

Learning Objectives
• Explain how a differential equation may be solved computationally using linear approximations.
That is, explain how Euler’s method works.

• Explain what each term represents in the formula for Euler’s method.

• Examine and compare computational (numerical) and exact (analytical) solutions to
differential equations.

• Use Euler’s method to solve a differential equation by hand (small number of steps).

So far, we have explored ways of understanding the behaviour predicted by a differential equation
in the form of an analytic solution, namely an explicit formula for the solution as a function of
time. However, in reality this is typically difficult without extensive training, and occasionally,
impossible even for experts. Even if we can find such a solution, it may be inconvenient to determine
its numerical values at arbitrary times, or to interpret its behaviour.
For this reason, we sometimes need a method for computing an approximation for the desired
solution. We refer to that approximation as a numerical solution. The idea is to harness a
computational device - computer, laptop, or calculator - to find numerical values of points along
the solution curve, rather than attempting to determine the formula for the solution as a function
of time. We illustrate this process using a technique called Euler’s method, which is based on an
approximation of a derivative by the slope of a secant line.
Below, we describe how Euler’s method is used to approximate the solution to a general initial
value problem (differential equation together with initial condition) of the form
\[ \frac{dy}{dt} = f(y), \quad y(0) = y_0. \]

Set up. We first must pick a “step size,” ∆t, and subdivide the t axis into discrete steps of that size.
We thus have a set of time points $t_1, t_2, \ldots$, spaced ∆t apart as shown in Figure 12.8. Our procedure
starts with the known initial value $y(0) = y_0$, and uses it to generate an approximate value at the
next time point ($y_1$), then the next ($y_2$), and so on. We denote by $y_k$ the value of the dependent
variable generated at the k’th time step by Euler’s method as an approximation to the (unknown)
true solution $y(t_k)$.

[Diagram: a time axis marked at $t_0, t_1, t_2, t_3, t_4, t_5$, with adjacent points a distance ∆t apart.]

Figure 12.8: The time axis is subdivided into steps of size ∆t.

Method. We approximate the differential equation by a finite difference equation

\[ \frac{dy}{dt} = f(y) \quad \text{approximated by} \quad \frac{y_{k+1} - y_k}{\Delta t} = f(y_k). \]

Concept Check-In
1. If ∆t = 0.1 and t0 = 0, what are t1 ,t2 and t3 ?

2. Explain the difference between the value y1 and the true solution y(t1 ).

3. If ∆t is not sufficiently small, why might Euler’s method give a bad approximation to the
solution?

This approximation is reasonable only when ∆t, the time step size, is small. Rearranging this
equation leads to a process (also called a recurrence relation) for linking values of the solution at
successive time points,
\[ \frac{y_{k+1} - y_k}{\Delta t} = f(y_k) \quad\Rightarrow\quad y_{k+1} = y_k + \Delta t \cdot f(y_k). \tag{12.3.1} \]
Application. We start with the known initial value, $y_0$. Then (setting the index to k = 0 in
Eqn. (12.3.1)) we obtain
\[ y_1 = y_0 + f(y_0)\Delta t. \]
The quantities on the right are known, so we can compute the value of $y_1$, which is the approximation
to the solution $y(t_1)$ at the time point $t_1$. We can then continue to generate the value at the next time
point in the same way, by approximating the derivative again as a secant slope. This leads to

\[ y_2 = y_1 + f(y_1)\Delta t. \]

The approximation so generated, leading to values $y_1, y_2, \ldots$, is called Euler’s method.

Concept Check-In
1. In Euler’s method, can you determine t2 directly? That is, without first computing t1 ?

2. In Euler’s method, can you determine y2 directly? That is, without first computing y1 ?

Applying this approximation repeatedly leads to an iteration method, that is, the repeated
computation

\[ \begin{aligned} y_1 &= y_0 + f(y_0)\Delta t, \\ y_2 &= y_1 + f(y_1)\Delta t, \\ &\;\;\vdots \\ y_{k+1} &= y_k + f(y_k)\Delta t. \end{aligned} \]

From this iteration, we obtain the approximate values of the function $y_k \approx y(t_k)$ for as many time
steps as desired starting from t = 0 in increments of ∆t up to some final time T of interest.
It is customary to use the following notations:

• t0 : the initial time point, usually at t = 0.

• h = ∆t : common notations for the step size, i.e. the distance between the points along the t
axis.

• tk : the k’th time point. Note that since the points are at multiples of the step size that we have
picked, tk = k∆t = kh.

• y(t ) : the actual value of the solution to the differential equation at time t. This is usually not
known, but in the examples discussed in this chapter, we can solve the differential equation
exactly, so we have a formula for the function y(t ). In most hard scientific problems, no such
formula is known in advance.

• y(tk ) : the actual value of the solution to the differential equation at one of the discrete time
points, tk (again, not usually known).

• $y_k$ : the approximate value of the solution obtained by Euler’s method. We hope that this
approximate value is fairly close to the true value, i.e. that $y_k \approx y(t_k)$, but there is always some
error in the approximation. More advanced methods that are specifically designed to reduce
such errors are discussed in courses on numerical analysis.
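The iteration (12.3.1) translates almost line for line into code. The following is a minimal Python sketch, not taken from the text: the function name `euler` and the test problem f(y) = 1 − y with $y_0 = 0$ and ∆t = 0.1 are assumptions chosen for illustration.

```python
def euler(f, y0, dt, n_steps):
    """Return lists (t_k, y_k) from y_{k+1} = y_k + dt * f(y_k), starting at t_0 = 0."""
    ts, ys = [0.0], [y0]
    for k in range(n_steps):
        ys.append(ys[-1] + dt * f(ys[-1]))   # Euler update, Eqn. (12.3.1)
        ts.append((k + 1) * dt)              # t_k = k * dt, as in the notation list
    return ts, ys

# Usage with an assumed right-hand side f(y) = 1 - y and initial value y0 = 0
ts, ys = euler(lambda y: 1.0 - y, y0=0.0, dt=0.1, n_steps=3)
for t, y in zip(ts, ys):
    print(f"{t:.1f}  {y:.4f}")
```

Each new value depends only on the previous one, which is why the values must be computed in order, while the time points $t_k = k\Delta t$ can be written down directly.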

§§ Euler’s method applied to population growth


We illustrate how Euler’s method is used in a familiar example, that of unlimited population growth.

Example 12.3.1. Apply Euler’s method to approximating solutions for the simple exponential
growth model that was studied in Chapter 11,

\[ \frac{dy}{dt} = ay, \quad y(0) = y_0 \]
where a is a constant (see Eqn. (11.1.2)).

Concept Check-In
1. Carry out Example 12.3.1 with ∆t = 0.1, a = 1, and $y_0 = 1$.

2. Plot the first 5 points you determine. Compare with the true solution.

3. Solve the initial value problem in Example 12.3.2 analytically. Compare the points (0, 100),
(0.1, 95), (0.2, 90.25) and (0.3, 85.7375) with the true solution at the corresponding t
values.

Solution. Subdivide the t axis into steps of size ∆t, starting with $t_0 = 0$, and $t_1 = \Delta t$, $t_2 = 2\Delta t$, . . .
The first value of y is known from the initial condition,

\[ y_0 = y(0). \]

We replace the differential equation by the approximation

\[ \frac{y_{k+1} - y_k}{\Delta t} = a y_k \quad\Rightarrow\quad y_{k+1} = y_k + a \Delta t \, y_k, \quad k = 0, 1, 2, \ldots \]
In particular,

\[ \begin{aligned} y_1 &= y_0 + a\Delta t \, y_0 = y_0(1 + a\Delta t), \\ y_2 &= y_1(1 + a\Delta t), \\ y_3 &= y_2(1 + a\Delta t), \end{aligned} \]

and so on. At every stage, the quantity on the right hand side depends only on the value of $y_k$ that is
already known from the step before. ♦
The next example demonstrates Euler’s method applied to a specific differential equation.

k tk yk
0 0 100.00
1 0.1 95.00
2 0.2 90.25
3 0.3 85.74
4 0.4 81.45
5 0.5 77.38

Table 12.3: Euler’s method applied to Example 12.3.2.

Example 12.3.2. Use Euler’s method to find the solution to

\[ \frac{dy}{dt} = -0.5y, \quad y(0) = 100. \]
Use step size ∆t = 0.1 to approximate the solution for the first two time steps.
Solution. Euler’s method applied to this example would lead to

\[ \begin{aligned} y_0 &= 100, \\ y_1 &= y_0(1 + a\Delta t) = 100(1 + (-0.5)(0.1)) = 95, \text{ etc.} \end{aligned} \]

We show the first five values in Table 12.3. Clearly, these kinds of repeated calculations are best
handled on a spreadsheet or similar computer software.
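As an alternative to a spreadsheet, the same repeated calculation can be sketched in a few lines of Python. This is an illustrative sketch only; the variable names are chosen here, and the exact solution $y(t) = 100e^{-0.5t}$ used for comparison is the exponential solution of this decay equation. The Euler values are printed to four decimals; rounded to two decimals they correspond to the entries of Table 12.3.

```python
import math

a, dt = -0.5, 0.1        # rate constant and step size from Example 12.3.2
ys = [100.0]             # y_0 = 100
for k in range(5):
    ys.append(ys[-1] * (1 + a * dt))   # y_{k+1} = y_k (1 + a*dt)

print(" k   t_k   Euler y_k   exact y(t_k)")
for k, yk in enumerate(ys):
    t = k * dt
    exact = 100.0 * math.exp(a * t)    # analytic solution of dy/dt = -0.5 y
    print(f"{k:2d}   {t:.1f}   {yk:9.4f}   {exact:9.4f}")
```

Comparing the last two columns shows that the Euler values lie slightly below the exact ones, with a gap that grows as more steps are taken.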


§§ Euler’s method applied to Newton’s law of cooling


We apply Euler’s method to Newton’s law of cooling. Upon completion, we can directly compare
the approximate numerical solution generated by Euler’s method to the true (analytic) solution,
(12.2.5), that we determined earlier in this chapter.
Example 12.3.3 (Newton’s law of cooling). Consider the temperature of an object T(t) in an
ambient temperature of E = 10°. Assume that k = 0.2/min. Write the exact solution of the initial value problem
\[ \frac{dT}{dt} = k(E - T), \quad T(0) = T_0 \]
in terms of the initial value $T_0$, using Eqn. (12.2.5).
Solution. In this case, the differential equation has the form
\[ \frac{dT}{dt} = 0.2(10 - T), \]
and its analytic solution, from Eqn. (12.2.5), is

\[ T(t) = 10 + (T_0 - 10)e^{-0.2t}. \tag{12.3.2} \]




Below, we use Euler’s method to compute a solution from each of several initial conditions, T (0) =
0, 5, 15, 20 degrees.

Example 12.3.4 (Euler’s method applied to Newton’s law of cooling). Write the Euler’s method
procedure for the approximate solution to the problem in Example 12.3.3.

Solution. Euler’s method approximates the differential equation by

\[ \frac{T_{k+1} - T_k}{\Delta t} = 0.2(10 - T_k), \]
or, in simplified form,
\[ T_{k+1} = T_k + 0.2(10 - T_k)\Delta t. \]

[Plot: Euler’s method and true solution curves; vertical axis y (5 to 20), horizontal axis time, t (2 to 10); ∆t = 1.0.]

time tk    approx. solution Tk    exact solution T(tk)
0.0000     0.0000                 0.0000
1.0000     2.0000                 1.8127
2.0000     3.6000                 3.2968
3.0000     4.8800                 4.5119
4.0000     5.9040                 5.5067
5.0000     6.7232                 6.3212
6.0000     7.3786                 6.9881
7.0000     7.9028                 7.5340
8.0000     8.3223                 7.9810

Figure 12.9: Euler’s method applied to Newton’s law of cooling. The graph shows the true solution (red) and the
approximate solution (black).

Example 12.3.5. Use Euler’s method from Example 12.3.4 and time steps of size ∆t = 1.0 to find
a numerical solution to the cooling problem. Use a spreadsheet for the calculations. Note that
∆t = 1.0 is not a “small step;” we use it here for illustration purposes.

Solution. The procedure to implement is

\[ T_{k+1} = T_k + 0.2(10 - T_k)\Delta t. \]

In Figure 12.9 we show a typical example of the method with initial value T (0) = T0 and with the
time step size ∆t = 1.0. Black dots represent the discrete values generated by the Euler method,
starting from initial conditions, T0 = 0, 5, 15, 20. Notice that the black curve is simply made up of
line segments linking points obtained by the numerical solution.
S OLVING DIFFERENTIAL EQUATIONS 12.4 S UMMARY

Concept Check-In
1. What change would you make in the process set up in Example 12.3.5 to improve the
approximation made by Euler’s method?

On the same graph, we also show the analytic solution (red curves) given by Eqn. (12.3.2) with the
same four initial temperatures. We see that the black and red curves start out at the same points (since
they both satisfy the same initial conditions). However, the approximate solution obtained with
Euler’s method is not identical to the true solution. The difference between the two (gap between
the red and black curves) is the numerical error in the approximation.
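The size of this numerical error, and its dependence on the step size, can be explored with a short computation. The sketch below is an illustration under stated assumptions, not the spreadsheet used in the text: the function names `euler_cooling` and `exact_cooling` are hypothetical, and the smaller step sizes 0.5 and 0.1 are added here for comparison. It uses the initial value $T_0 = 0$ and compares both solutions at t = 8.

```python
import math

def euler_cooling(T0, dt, n_steps, E=10.0, k=0.2):
    """Iterate T_{k+1} = T_k + k*(E - T_k)*dt and return the list of T_k values."""
    Ts = [T0]
    for _ in range(n_steps):
        Ts.append(Ts[-1] + k * (E - Ts[-1]) * dt)
    return Ts

def exact_cooling(t, T0, E=10.0, k=0.2):
    """Eqn. (12.3.2) for the default E and k: T(t) = 10 + (T0 - 10) * exp(-0.2 t)."""
    return E + (T0 - E) * math.exp(-k * t)

# Compare at t = 8 for T0 = 0: the step size of Figure 12.9 and two smaller ones
exact = exact_cooling(8.0, 0.0)
for dt in (1.0, 0.5, 0.1):
    n = round(8 / dt)
    approx = euler_cooling(0.0, dt, n)[-1]
    print(f"dt = {dt}: Euler {approx:.4f}, exact {exact:.4f}, error {approx - exact:.4f}")
```

With ∆t = 1.0 the values agree with the last row of the table in Figure 12.9, and the error shrinks as ∆t is reduced.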

12.4 Summary
1. Given a function, we can check whether it is a solution to a differential equation by performing
the appropriate differentiation and algebraic simplification.

2. Solutions to differential equations in which there is no change at all (“constant solutions”) are
referred to as steady states.

3. The differential equation

\[ \frac{dy}{dt} = a - by, \quad y(0) = y_0 \]
has a steady state solution y = a/b.

4. If we define the deviation from steady state, $z(t) = y(t) - \frac{a}{b}$, we get a decay equation for z(t)
that has exponentially decreasing solutions provided $b > 0$. This says that the deviation from
steady state always decreases in magnitude over time.

5. The resulting solution for y(t ) is


\[ y(t) = \frac{a}{b} - \left(\frac{a}{b} - y_0\right) e^{-bt}. \]

6. For some differential equations, it is not always possible to determine an analytic solution
(explicit formula). Numerical solutions can be found using Euler’s method, and serve as an
approximate solution.

7. Euler’s method takes a known initial value $y_0$ and uses the iteration scheme

\[ y_{k+1} = y_k + f(y_k)\Delta t \]

to generate successive values of $y_k$ that approximate the solution at time points $t_k = k\Delta t$.

8. Applications considered in this chapter included:

(a) height of water draining out of a cylindrical container (verifying a solution to a differential
equation);
(b) Newton’s law of cooling (described by a linear differential equation);
(c) growth of the radius of a cell;

(d) the accumulation of greenhouse gasses in the atmosphere;

(e) friction and terminal velocity; and

(f) chemical production and decay.

Quick Concept Checks

1. Explain why an object at room temperature is at a steady state for Newton’s
law of cooling.

2. The following graph depicts solution curves to a particular differential equation
of the form $dy/dt = a - by$.
[Plot: solution curves; vertical axis y (10 to 30), horizontal axis t (2 to 14).]

(a) Estimate the value that these solution curves are approaching.
(b) Which solutions are approaching from above? From below?

3. Consider the following initial value problem:


\[ \frac{dy}{dt} = 2 - 4y, \quad y(0) = 4. \]
(a) What value does its solution curve approach?
(b) Does its solution approach from above or below?

4. Why is a large value of ∆t not a good idea when using Euler’s method?
Q UALITATIVE METHODS FOR DIFFERENTIAL EQUATIONS

Chapter 13

(Flavours A, B) Qualitative methods for differential equations

Concept Check-In
1. What is meant by an analytic solution to a differential equation?

2. What other kind of solutions are possible?

3. Give an example of a nonlinear function f (y).

Not all differential equations are easily solved analytically. Furthermore, even when we find the
analytic solution, it is not necessarily easy to interpret, graph, or understand. This situation motivates
qualitative methods that promote an overall understanding of behaviour - directly from information
in the differential equation - without the challenge of finding a full functional form of the solution.
In this chapter we expand our familiarity with differential equations and assemble new, qualitative
techniques for understanding them. We consider differential equations in which the expression on
one side, f(y), is nonlinear, i.e. equations of the form
\[ \frac{dy}{dt} = f(y) \]
in which f is more complicated than the form $a - by$. Geometric techniques, rather than algebraic
calculations, form the core of the concepts we discuss.

13.1 Linear and nonlinear differential equations


In the model for population growth in Chapter 11, we encountered the differential equation
\[ \frac{dN}{dt} = kN, \]
where N(t) is population size at time t and k is a constant per capita growth rate. We showed that
this differential equation has exponential solutions. It means that two behaviours are generically
obtained: explosive growth if $k > 0$ or extinction if $k < 0$.
Q UALITATIVE METHODS FOR DIFFERENTIAL EQUATIONS 13.1 L INEAR AND NONLINEAR
DIFFERENTIAL EQUATIONS

Concept Check-In
4. What happens in the case that k = 0? Explain under what conditions this might arise and
what happens to the population N (t ) in this case.

The case of $k > 0$ is unrealistic, since real populations cannot keep growing indefinitely in an
explosive, exponential way. Eventually running out of space or resources, the population growth
dwindles, and the population attains some static level rather than expanding forever. This motivates
a revision of our previous model to depict density-dependent growth.

§§ The logistic equation for population growth


Let N (t ) represent the size of a population at time t, as before. Consider the differential equation
\[ \frac{dN}{dt} = rN \, \frac{(K - N)}{K}. \tag{13.1.1} \]
We call this differential equation the logistic equation. The logistic equation has a long history in
modelling population growth of humans, microorganisms, and animals. Here the parameter r is
the intrinsic growth rate and K is the carrying capacity. Both r, K are assumed to be positive
constants for a given population in a given environment.
In the form written above, we could interpret the logistic equation as
\[ \frac{dN}{dt} = R(N) \cdot N, \quad \text{where} \quad R(N) = r \, \frac{(K - N)}{K}. \]
The term R(N ) is a function of N that replaces the constant rate of growth k (found in the unrealistic,
unlimited population growth model). R is called the density dependent growth rate.

§§ Linear versus nonlinear


The logistic equation introduces the first example of a nonlinear differential equation. We explain
the distinction between linear and nonlinear differential equations and why it matters.

Concept Check-In
5. Can the differential equation dy/dt = a − by be written in the form (13.1.2)? If so, what are
the values of α, β, γ?

Definition 13.1.1 (Linear differential equation). A first order differential equation is said to be linear
if it is a linear combination of terms of the form
\[ \frac{dy}{dt}, \quad y, \quad 1, \]
that is, it can be written in the form
\[ \alpha \frac{dy}{dt} + \beta y + \gamma = 0 \tag{13.1.2} \]
where α, β, γ do not depend on y. Note that “first order” means that only the first derivative (or no
derivative at all) may occur in the equation.

So far, we have seen several examples of this type with constant coefficients α, β, γ. For example,
α = 1, β = −k, and γ = 0 in Eqn. (11.1.2) whereas α = 1, γ = −a, and β = b in Eqn. (12.2.1). A
differential equation that is not of this form is said to be nonlinear.
Example 13.1.2 (Linear versus nonlinear differential equations). Which of the following differential
equations are linear and which are nonlinear?
\[ \text{(a)} \ \frac{dy}{dt} = y^2, \qquad \text{(b)} \ \frac{dy}{dt} - y = 5, \qquad \text{(c)} \ y \frac{dy}{dt} = -1. \]
Solution. Any term of the form y², √y, 1/y, etc. is nonlinear in y. A product such as y dy/dt is also
nonlinear in the dependent variable. Hence equations (a), (c) are nonlinear, while (b) is linear. ♦
Concept Check-In
6. For what values of α, β and γ can Example 13.1.2(b) be put into the form (13.1.2)?

The significance of the distinction between linear and nonlinear differential equations is that
nonlinearities make it much harder to systematically find a solution to the given differential equation
by “analytic” methods. Most linear differential equations have solutions that are made of exponential
functions or expressions involving such functions. This is not true for nonlinear equations.
However, as we see shortly, geometric methods are very helpful in understanding the behaviour
of such nonlinear differential equations.

§§ Law of Mass Action


Nonlinear terms in differential equations arise naturally in various ways. One common source comes
from describing interactions between individuals, as the following example illustrates.
In a chemical reaction, molecules of types A and B bind and react to form product P. Let
a(t ), b(t ) denote the concentrations of A and B. These concentrations depend on time because the
chemical reaction uses up both types in producing the product.
The reaction only occurs when A and B molecules “collide” and stick to one another. Collisions
occur randomly, but if concentrations are larger, more collisions take place, and the reaction is faster.
If either the concentration a or b is doubled, then the reaction rate doubles. But if both a and b are
doubled, then the reaction rate should be four times faster, based on the higher chances of collisions
between A and B.
Concept Check-In
7. If the concentration of A is tripled, and that of B is doubled, how much faster would we
expect the reaction rate to be?

8. Why does the product a · b, rather than the sum a + b, appear in the Law of Mass Action?

The simplest assumption that captures this dependence is


\[ \text{rate of reaction is proportional to } a \cdot b \quad \Rightarrow \quad \text{rate of reaction} = k \cdot a \cdot b \]
where k is some constant that represents the reactivity of the molecules.
We can formally state this result, known as the Law of Mass Action as follows:
The Law of Mass Action: The rate of a chemical reaction involving an interaction of two or more
chemical species is proportional to the product of the concentrations of the given species.
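The scaling argument behind the Law of Mass Action can be checked with a few lines of code. The following is a minimal sketch, not part of the original text: the function name `reaction_rate`, the rate constant k = 1, and the concentrations 2 and 3 are illustrative assumptions.

```python
def reaction_rate(a, b, k=1.0):
    """Law of Mass Action for A + B -> P: rate = k * a * b.

    k = 1.0 is a hypothetical reactivity constant chosen for illustration.
    """
    return k * a * b


# Illustrative (assumed) baseline concentrations of A and B.
base = reaction_rate(2.0, 3.0)

# Doubling a alone doubles the rate; doubling both a and b quadruples it.
double_a = reaction_rate(4.0, 3.0) / base
double_both = reaction_rate(4.0, 6.0) / base

print(double_a, double_both)  # 2.0 4.0
```

A sum a + b would not behave this way: doubling only one concentration would not double a rate based on the sum.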

Example 13.1.3 (Differential equation for interacting chemicals). Substance A is added at a constant
rate of I moles per hour to a 1-litre vessel. Pairs of molecules of A interact chemically to form a
product P. Write down a differential equation that keeps track of the concentration of A, denoted y(t ).

Concept Check-In
9. In each of Examples 13.1.3 and 13.1.4, clearly identify the constant quantities.

Solution. First consider the case that there is no reaction. Then, the addition of A to the reactor at a
constant rate leads to changing y(t ), described by the differential equation

\[ \frac{dy}{dt} = I. \]

When the chemical reaction takes place, the depletion of A depends on interactions of pairs of
molecules. By the law of mass action, the rate of reaction is of the form k · y · y = ky², and as it
reduces the concentration, it appears with a minus sign in the DE. Hence

\[ \frac{dy}{dt} = I - ky^2. \]

This is a nonlinear differential equation - it contains a term of the form y². ♦

Example 13.1.4 (Logistic equation reinterpreted). Rewrite the logistic equation in the form

\[ \frac{dN}{dt} = rN - bN^2 \]

(where b = r/K is a positive quantity).

a) Interpret the meaning of this restated form of the equation by explaining what each of the terms
on the right hand side could represent.

b) Which of the two terms dominates for small versus large population levels?

Solution.

a) This form of the equation has growth term rN proportional to population size, as encountered
previously in unlimited population growth. However, there is also a quadratic (nonlinear) rate of
loss (note the minus sign) −bN². This term could describe interactions between individuals that
lead to mortality, e.g. through fighting or competition.

b) From familiarity with power functions (in this case, the functions of N that form the two terms,
rN and bN²) we can deduce that the second, quadratic term dominates for larger values of N,
and this means that when the population is crowded, the loss of individuals is greater than the
rate of reproduction. ♦
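The comparison of the two terms in part (b) can be made concrete numerically. This is a small sketch under assumed parameter values (r = 0.5 and K = 1000 are hypothetical, as are the helper names `growth` and `loss`). Since b = r/K, the two terms balance exactly when rN = bN², that is, at N = K.

```python
def growth(N, r):
    """Linear growth term r*N."""
    return r * N


def loss(N, b):
    """Quadratic loss term b*N^2 (enters the equation with a minus sign)."""
    return b * N**2


# Hypothetical parameter values chosen only for illustration.
r, K = 0.5, 1000.0
b = r / K

for N in (10.0, 500.0, 1000.0, 1500.0):
    print(f"N = {N:7.1f}   rN = {growth(N, r):8.2f}   bN^2 = {loss(N, b):8.2f}")
```

For N below K the growth term is the larger of the two, and for N above K the loss term is larger, which matches the sign of dN/dt in the logistic equation.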

§§ Scaling the logistic equation


Consider the units involved in the logistic equation (13.1.1):
\[ \frac{dN}{dt} = rN \, \frac{(K - N)}{K}. \]
This equation has two parameters, r and K. Since units on each side of an equation must balance,
and must be the same for terms that are added or subtracted, we can infer that K has the same units as
N, and thus it is a population density. When N = K, the population growth rate is zero (dN/dt = 0).
It turns out that we can understand the behaviour of the logistic equation by converting it to a
“generic” form that does not depend on the constant K. We do so by transforming variables, which
amounts to choosing a convenient way to measure the population size.
Example 13.1.5 (Rescaling). Define a new variable
\[ y(t) = \frac{N(t)}{K}, \]
with N (t ) and K as in the logistic equation. Then N (t ) = Ky(t ).
a) Interpret what the transformed variable y represents.
b) Rewrite the logistic equation in terms of this variable.

Concept Check-In
10. Suppose an environment can sustain 2000 aphids per plant, and the current population
size on a given plant is 1700. What is K, N and y based on this information?

11. This population is at what percent of its carrying capacity?

Solution.
a) The variable, y(t ) represents a scaled version of the population density. Instead of measuring
the population in some arbitrary units - such as number of individuals per acre, or number of
bacteria per ml - y(t ) measures the population in “multiples of the carrying capacity.”
For example, if the environment can sustain 1000 aphids per plant (so K = 1000 individuals
per plant), and the current population size on a given plant is N = 950 then the value of the
scaled variable is y = 950/1000 = 0.95. We would say that “the aphid population is at 95% of
its carrying capacity on the plant.”
b) Since K is assumed constant, it follows that
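\[ N(t) = K y(t) \quad \Rightarrow \quad \frac{dN}{dt} = K \frac{dy}{dt}. \]
Using this, we can simplify the logistic equation as follows:
\[
\begin{aligned}
\frac{dN}{dt} = rN \, \frac{(K - N)}{K}
  \quad &\Rightarrow \quad K \frac{dy}{dt} = r (Ky) \, \frac{(K - Ky)}{K} \\
  &\Rightarrow \quad \frac{dy}{dt} = r y (1 - y).
\end{aligned}
\tag{13.1.3}
\]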


Eqn. (13.1.3) “looks simpler” than Eqn. (13.1.1) since it depends on only one parameter, r.
Moreover, by understanding this equation, and transforming back to the original logistic in terms
of N (t ) = Ky(t ), we can interpret results for the original model. While we do not go further with
transforming variables at present, it turns out that one can also further reduce the scaled logistic to
an equation in which r = 1 by “rescaling time units”.
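One way to see that the scaled equation carries the same information as the original is to step both forward with Euler's method (from Chapter 12) and compare. The sketch below is illustrative only: the helper `euler`, the values r = 0.5, K = 1000, the initial population of 100, and the step size are all assumptions.

```python
def euler(f, y0, h, n_steps):
    """Euler's method for dy/dt = f(y): returns [y_0, y_1, ..., y_n]."""
    ys = [y0]
    for _ in range(n_steps):
        ys.append(ys[-1] + h * f(ys[-1]))
    return ys


# Hypothetical parameters and initial condition, for illustration only.
r, K = 0.5, 1000.0
N0, h, n_steps = 100.0, 0.1, 200

# Original logistic equation in N, and scaled equation in y = N/K.
N_path = euler(lambda N: r * N * (K - N) / K, N0, h, n_steps)
y_path = euler(lambda y: r * y * (1 - y), N0 / K, h, n_steps)

# Largest difference between N/K and y along the two paths.
max_gap = max(abs(N / K - y) for N, y in zip(N_path, y_path))
print(max_gap)
```

The gap is at the level of floating-point rounding: dividing the Euler update for N by K gives exactly the Euler update for y.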

Concept Check-In
12. What are the units of the parameter r?

13. How might we use the parameter r to define a time-scale?

13.2 The geometry of change

Learning Objectives
• Find linear approximations of a solution to a DE, given a point.

• Sketch a linear approximation of a solution to a DE at a point.

• Interpret slope fields for a given differential equation and use them to roughly sketch
solutions.

In this section, we introduce a new method for understanding differential equations using
graphical and geometric arguments. Such methods circumvent the solutions that we expressed in
terms of analytic formulae. We resort to concepts learned much earlier - for example, the derivative
as a slope of a tangent line - in order to use the differential equation itself to assemble a sketch
of the behaviour that it predicts. That is, rather than writing down y = F (t ) as a solution to the
differential equation (and then graphing that function) we sketch the qualitative behaviour of such
solution curves directly from information contained in the differential equation.

§§ Slope fields
Here we discuss a geometric way of understanding what a differential equation is saying using
a slope field, also called a direction field. We have already seen that solutions to a differential
equation of the form
\[ \frac{dy}{dt} = f(y) \]
are curves in the ty-plane that describe how y(t) changes over time (thus, these curves are graphs
of functions of time). Each initial condition y(0) = y0 is associated with one of these curves, so that
together, these curves form a family of solutions.

What do these curves have in common, geometrically?


• the slope of the tangent line (dy/dt) at any point on any of the curves is related to the
value of the y-coordinate of that point - as stated in the differential equation.

• at any point (t, y(t )) on a solution curve, the tangent line must have slope f (y), which
depends only on the y value, and not on the time t.
Note: in more general cases, the expression f (y) that appears in the differential equation might
depend on t as well as y. For our purposes, we do not consider such examples in detail.
By sketching slopes at various values of y, we obtain the slope field through which we can get a
reasonable idea of the behaviour of the solutions to the differential equation.

Example 13.2.1. Consider the differential equation

\[ \frac{dy}{dt} = 2y. \tag{13.2.1} \]
Compute some of the slopes for various values of y and use this to sketch a slope field for this
differential equation.

Concept Check-In
14. Solve Differential Eqn. (13.2.1) analytically.

Solution. Equation (13.2.1) states that if a solution curve passes through a point (t, y), then its
tangent line at that point has a slope 2y, regardless of the value of t. This example is simple enough
that we can state the following: for positive values of y, the slope is positive; for negative values of
y, the slope is negative; and for y = 0, the slope is zero.
We provide some tabulated values of y indicating the values of the slope f(y), its sign, and what
this implies about the local behaviour of the solution and its direction. Then, in Figure 13.1 we
combine this information to generate the direction field and the corresponding solution curves. Note
that the direction of the arrows (rather than their absolute magnitude) provides the most important
qualitative tendency for the slope field sketch. ♦

y     f(y)    slope of tangent line    behaviour of y    direction of arrow
-2    -4      -ve                      decreasing        ↘
-1    -2      -ve                      decreasing        ↘
 0     0      0                        no change         →
 1     2      +ve                      increasing        ↗
 2     4      +ve                      increasing        ↗

Table 13.1: Table for the slope field diagram of differential equation (13.2.1), dy/dt = 2y, described in
Example 13.2.1.
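The tabulation used for the slope field can be generated mechanically. The following is a minimal sketch for dy/dt = 2y; the helper names `f` and `behaviour` are illustrative choices, not notation from the text.

```python
def f(y):
    """Right-hand side of dy/dt = 2y, i.e. the slope at height y."""
    return 2 * y


def behaviour(slope):
    """Translate the sign of the slope into the local behaviour of y."""
    if slope > 0:
        return "increasing"
    if slope < 0:
        return "decreasing"
    return "no change"


# One row per sampled value of y: (y, slope f(y), behaviour of y).
rows = [(y, f(y), behaviour(f(y))) for y in (-2, -1, 0, 1, 2)]

for y, slope, text in rows:
    print(f"{y:3d} {slope:4d}  {text}")
```

Because f depends only on y, the same row applies at every value of t, which is why each horizontal line of the slope field carries identical arrows.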
Figure 13.1: Direction field and solution curves for differential equation dy/dt = 2y described in
Example 13.2.1. Axes: t (horizontal) and y (vertical).

In constructing the slope field and solution curves, the following basic rules should be followed:

1. By convention, time flows from left to right along the t axis in our graphs, so the direction of
all arrows (not usually indicated explicitly on the slope field) is always from left to right.

2. According to the differential equation, for any given value of the variable y, the slope is given
by the expression f (y) in the differential equation. The sign of that quantity is particularly
important in determining whether the solution is locally increasing, decreasing, or neither. In
the tables, we indicate this in the last column with the notation ↗, ↘, or →.

3. There is a single arrow at any point in the ty-plane, and consequently solution curves cannot
intersect anywhere (although they can get arbitrarily close to one another).

We see some implications of these rules in our examples.

Example 13.2.2. Consider the differential equation


\[ \frac{dy}{dt} = f(y) = y - y^3. \tag{13.2.2} \]
Create a slope field diagram for this differential equation.

i A summary of steps in creating the slope field for Example 13.2.2.

Solution. Based on the last example, we focus on the sign, rather than the value of the derivative f (y),
since that sign determines whether the solutions increase, decrease, or stay constant. Recall that
factoring helps to find zeros, and to identify where an expression changes sign. For example,
\[ \frac{dy}{dt} = f(y) = y - y^3 = y(1 - y^2) = y(1 + y)(1 - y). \]
The sign of f depends on the signs of the factors y, (1 + y), (1 − y).

Concept Check-In
15. Graph the function f(y) = y(1 + y)(1 − y) and indicate where it changes sign.

16. Repeat the process for the function f(y) = y²(1 + y)²(1 − y).

For y < −1, two factors, y and (1 + y), are negative, whereas (1 − y) is positive, so that the product
is positive overall. The sign of f(y) changes at each of the three points y = 0, ±1 where one or
another of the three factors changes sign, as shown in Table 13.2. Eventually, to the right of all three
(when y > 1), the sign is negative. We summarize these observations in Table 13.2 and show the
slope field and solution curves in Figure 13.2. ♦

y         sign of f(y)    behaviour of y    direction of arrow
y < -1    +ve             increasing        ↗
-1        0               no change         →
-0.5      -ve             decreasing        ↘
0         0               no change         →
0.5       +ve             increasing        ↗
1         0               no change         →
y > 1     -ve             decreasing        ↘

Table 13.2: Table for the slope field diagram of the DE (13.2.2) described in Example 13.2.2.
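The sign argument based on the factored form can be mirrored in code by multiplying the signs of the three factors. This is a sketch only; the function names `sign` and `sign_of_f` and the sample points are illustrative assumptions.

```python
def sign(x):
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_of_f(y):
    """Sign of f(y) = y(1 + y)(1 - y), found from the signs of its factors."""
    result = 1
    for factor in (y, 1 + y, 1 - y):
        result *= sign(factor)
    return result


# Sample points on either side of the zeros y = -1, 0, 1 (chosen for illustration).
samples = (-2, -1, -0.5, 0, 0.5, 1, 2)
signs = [sign_of_f(y) for y in samples]

print(signs)  # [1, 0, -1, 0, 1, 0, -1]
```

Working with the factors rather than with y − y³ directly follows the reasoning in the text: the overall sign can only change where one of the factors changes sign.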

Figure 13.2: Direction field and solution curves for differential equation (13.2.2) described in
Example 13.2.2.

Example 13.2.3. Sketch a slope field and solution curves for the problem of a cooling object, and
specifically for
\[ \frac{dT}{dt} = f(T) = 0.2(10 - T). \tag{13.2.3} \]
Solution. The family of curves shown in Figure 13.3 (also Figure 12.6) are solutions to (13.2.3).
The function f(T) = 0.2(10 − T) corresponds to the slopes of tangent lines to these curves. We
indicate the sign of f(T) and thereby the behaviour of T(t) in Table 13.3. Note that there is only
one change of sign, at T = 10. For smaller T, the solution is always increasing and for larger T,
the solution is always decreasing. The slope field and solution curves are shown in Figure 13.3. In
the slope field, one particular value of t is coloured to emphasize the associated changes in T, as in
Table 13.3. ♦

T         sign of f(T)    behaviour of T    direction of arrow
T < 10    +ve             increasing        ↗
T = 10    0               no change         →
T > 10    -ve             decreasing        ↘

Table 13.3: Table for the slope field diagram of dT/dt = 0.2(10 − T) described in Example 13.2.3.

Figure 13.3: Slope field and solution curves for a cooling object that satisfies the differential
equation (13.2.3) in Example 13.2.3. Axes: t (horizontal) and temperature, T (vertical).

Concept Check-In
17. Indicate the regions in Figure 13.3 where T is increasing.

18. Where is T not changing in Figure 13.3?

We observe an agreement between the detailed solutions found analytically (Example 12.2.2),
found using Euler’s method (Example 12.3.4), and those sketched using the new qualitative
arguments (Example 13.2.3).
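The agreement between the analytic and numerical approaches can be illustrated directly. The sketch below assumes the standard analytic solution of this linear equation, T(t) = 10 + (T(0) − 10)e^(−0.2t), and compares it with Euler's method; the initial temperatures 0 and 20, the step size h = 0.1, and the function names are illustrative assumptions.

```python
import math


def cooling_exact(t, T0, k=0.2, T_env=10.0):
    """Analytic solution of dT/dt = k (T_env - T) with T(0) = T0."""
    return T_env + (T0 - T_env) * math.exp(-k * t)


def cooling_euler(T0, h=0.1, n_steps=200, k=0.2, T_env=10.0):
    """Euler approximation of the same equation; returns [T_0, ..., T_n]."""
    Ts = [T0]
    for _ in range(n_steps):
        Ts.append(Ts[-1] + h * k * (T_env - Ts[-1]))
    return Ts


# Compare the two along the whole path for an object that starts colder
# (T0 = 0) and one that starts hotter (T0 = 20) than the ambient T = 10.
h = 0.1
max_gap = 0.0
for T0 in (0.0, 20.0):
    for i, T in enumerate(cooling_euler(T0, h=h)):
        max_gap = max(max_gap, abs(T - cooling_exact(i * h, T0)))

print(max_gap)
```

Both curves rise toward T = 10 from below or fall toward it from above, which is the behaviour read off from the sign of f(T) in Table 13.3.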

13.3 (Flavour B) State-space diagrams



Learning Objectives
• Explain what is meant by a “steady-state” solution.

• Find steady-state solutions to simple differential equations.

• Sketch a state-space diagram for a given differential equation and use it to describe the
behaviour of solutions.

• Explain what it means for a steady-state solution to be “stable”. Determine the stability
of a steady state.

• Use a state-space diagram to identify stability of steady states

In Examples 13.2.1-13.2.3, we saw that we can understand qualitative features of solutions to
the differential equation
\[ \frac{dy}{dt} = f(y), \tag{13.3.1} \]
by examining the expression f (y). We used the sign of f (y) to assemble a slope field diagram and
sketch solution curves. The slope field informed us about which initial values of y would increase,
decrease or stay constant. We next show another way of determining the same information.
First, let us define a state space, also called phase line or phase diagram, which is essentially
the y-axis with superimposed arrows representing the direction of flow.

Definition 13.3.1 (State space diagram (or phase line)). A line representing the dependent
variable (y) together with arrows to describe the flow along that line (increasing, decreasing, or
stationary y) satisfying Eqn. (13.3.1) is called the state space diagram or the phase line diagram
for the differential equation.

Rather than tabulating signs for f (y), we can arrive at similar conclusions by sketching f (y)
and observing where this function is positive (implying that y increases) or negative (y decreases).
Places where f (y) = 0 (“zeros of f ”) are important since these represent steady states (“static
solutions”, where there is no change in y). Along the y axis (which is now on the horizontal axis of
the sketch) increasing y means motion to the right, decreasing y means motion to the left.
As we shall see, the information contained in this type of diagram provides a qualitative
description of solutions to the differential equation, but with the explicit time behaviour suppressed.
This is illustrated by Figure 13.4, where we show the connection between the slope field diagram
and the state space diagram for a typical differential equation.

Figure 13.4: Slope field related to state space. Panels: (a) slope field in the ty-plane, (b) the y-axis
with flow arrows, (c) f(y) plotted against y.

The relationship of the slope field and state space diagrams. (a) A typical slope field. A few arrows
have been added to indicate the direction of time flow along the tangent vectors. Now consider
“looking down the time axis” as shown by the “eye” in this diagram. Then the t axis points towards
us, and we see only the y-axis as in (b). Arrows on the y-axis indicate the directions of flow for
various values of y as determined in (a). Now “rotate” the y axis so it is horizontal, as shown in (c).
The direction of the arrows corresponds exactly to places where f(y), in (c), is positive (which
implies increasing y, →), or negative (which implies decreasing y, ←). The state space diagram is
the y-axis in (b) or (c).


Example 13.3.2. Consider the differential equation
\[ \frac{dy}{dt} = f(y) = y - y^3. \tag{13.3.2} \]
Sketch f (y) versus y and use your sketch to determine where y is static, and where y increases or
decreases. Then describe what this predicts starting from each of the three initial conditions:

(i) y(0) = −0.5,

(ii) y(0) = 0.3, or

(iii) y(0) = 2.

Solution. From Example 13.2.2, we know that f(y) = 0 at y = −1, 0, 1.

i Video explanation of the steps in the solution to Example 13.3.2.

This means that y does not change at these steady state values, so, if we start a system off with
y(0) = 0, or y(0) = ±1, the value of y is static. The three places at which this happens are marked
by heavy dots in Figure 13.5(a).

Figure 13.5: Where y is increasing, decreasing, or static. Panels (a) and (b) each plot f(y) against y.
Steady states (dots) and intervals for which y increases or decreases for the differential equation
(13.3.2). See Example 13.3.2.

We also see that f(y) < 0 for −1 < y < 0 and for y > 1. In these intervals, y(t) must be a
decreasing function of time (dy/dt < 0). On the other hand, for 0 < y < 1 or for y < −1, we have
f(y) > 0, so y(t) is increasing. See arrows on Figure 13.5(b). We see from this figure that there is a
tendency for y to move away from the steady state value y = 0 and to approach either of the steady
states at 1 or −1. Starting from the initial values given above, we have

(i) y(0) = −0.5 results in y → −1,

(ii) y(0) = 0.3 leads to y → 1, and

(iii) y(0) = 2 implies y → 1. ♦
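These three predictions can be checked numerically with Euler's method from Chapter 12. The following is a sketch under assumed settings (step size h = 0.01 and 2000 steps, so a final time of t = 20); the function name `long_time_value` is hypothetical.

```python
def f(y):
    """Right-hand side of dy/dt = y - y^3, Eqn. (13.3.2)."""
    return y - y**3


def long_time_value(y0, h=0.01, n_steps=2000):
    """Follow Euler's method from y(0) = y0 and return the final value."""
    y = y0
    for _ in range(n_steps):
        y = y + h * f(y)
    return y


# The three initial conditions (i)-(iii) from Example 13.3.2.
finals = {y0: long_time_value(y0) for y0 in (-0.5, 0.3, 2.0)}

for y0, y_end in finals.items():
    print(f"y(0) = {y0:5.2f}  ->  y(20) is approximately {y_end:.4f}")
```

The numerical values settle near −1, 1, and 1 respectively, in agreement with the arrows on the phase line.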

Example 13.3.3 (A cooling object). Sketch the same type of diagram for the problem of a cooling
object and interpret its meaning.

Solution. Here, the differential equation is


\[ \frac{dT}{dt} = f(T) = 0.2(10 - T). \tag{13.3.3} \]
A sketch of the rate of change, f(T), versus the temperature T is shown in Figure 13.6. We deduce
the direction of the flow directly from this sketch. ♦

Figure 13.6: Determining flow direction. Plot of f(T) against T; the line meets the vertical axis
at 2 and crosses the T-axis at T = 10. Figure for Example 13.3.3, the differential equation (13.3.3).

Concept Check-In
19. In Figures 13.6 and 13.7, where is the function positive?

20. Consider Eqn. (13.3.4) analytically: what value does y approach?

Example 13.3.4. Create a similar qualitative sketch for the more general form of linear differential
equation
\[ \frac{dy}{dt} = f(y) = a - by. \tag{13.3.4} \]
For what values of y would there be no change?

Solution. The rate of change of y is given by the function f(y) = a − by. This is shown in
Figure 13.7. The steady state at which f(y) = 0 is at y = a/b. Starting from an initial condition
y(0) = a/b, there would be no change. We also see from this figure that, when b > 0, y approaches
this value over time. After a long time, the value of y will be approximately a/b. ♦

Figure 13.7: Qualitative sketch for a DE. Plot of f(y) against y, crossing the y-axis at y = a/b.
Qualitative sketch for Eqn. (13.3.4) in Example 13.3.4.

§§ Steady states and stability



From the last few figures, we observe that wherever the function f on the right hand side of the
differential equation crosses the horizontal axis (satisfies f = 0) there is a steady state. For example,
in Figure 13.6 this takes place at T = 10. At that temperature the differential equation specifies that
dT /dt = 0 and so, T = 10 is a steady state, a concept we first encountered in Chapter 12.

Definition 13.3.5 (Steady state). A steady state is a state in which a system is not changing.

Example 13.3.6. Identify steady states of Eqn. (13.3.2),


\[ \frac{dy}{dt} = y - y^3. \]
Solution. Steady states are points that satisfy f(y) = 0. We already found those to be y = 0 and
y = ±1 in Example 13.3.2. ♦
From Figure 13.5, we see that solutions starting close to y = 1 tend to get closer and closer to
this value. We refer to this behaviour as stability of the steady state.

Definition 13.3.7 (Stability). We say that a steady state is stable if states that are initially close
enough to that steady state will get closer to it with time. We say that a steady state is unstable, if
states that are initially very close to it eventually move away from that steady state.

Example 13.3.8. Determine the stability of steady states of Eqn. (13.3.2):


\[ \frac{dy}{dt} = y - y^3. \]

Concept Check-In
21. In the state space diagram in Figure 13.4, identify the stable steady states.

Solution. From any starting value of y > 0 in this example, we see that after a long time, the
solution curves tend to approach the value y = 1. States close to y = 1 get closer to it, so this is a
stable steady state. For the steady state y = 0, we see that initial conditions near y = 0 move away
over time. Thus, this steady state is unstable. Similarly, the steady state at y = −1 is stable. In
Figure 13.5 we show the stable steady states with black dots and the unstable steady state with an
open dot. ♦
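The phase-line reasoning used here (flow toward a steady state from both sides means stable, flow away on both sides means unstable) can be expressed as a short check on the sign of f just to the left and right of each steady state. This is a sketch only: the function `classify` and the offset `eps` are illustrative assumptions, and the check is meaningful only when no other zero of f lies within `eps` of the steady state.

```python
def f(y):
    """Right-hand side of dy/dt = y - y^3, Eqn. (13.3.2)."""
    return y - y**3


def classify(f, y_star, eps=1e-3):
    """Classify a steady state from the flow direction on either side of it."""
    left, right = f(y_star - eps), f(y_star + eps)
    if left > 0 and right < 0:
        # y increases below y_star and decreases above it: flow points inward.
        return "stable"
    if left < 0 and right > 0:
        # y decreases below y_star and increases above it: flow points outward.
        return "unstable"
    return "neither"


labels = {y_star: classify(f, y_star) for y_star in (-1.0, 0.0, 1.0)}
print(labels)  # {-1.0: 'stable', 0.0: 'unstable', 1.0: 'stable'}
```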

13.4 Applying qualitative analysis to biological models


The qualitative ideas developed so far will now be applied to problems from biology. In the
following sections we first use these methods to obtain a thorough understanding of logistic
population growth. We then derive a model for the spread of a disease, and use qualitative arguments to
analyze the predictions of that differential equation model.

§§ Qualitative analysis of the logistic equation


We apply the new methods to the logistic equation.

Example 13.4.1. Find the steady states of the logistic equation, Eqn. (13.1.1):
\[ \frac{dN}{dt} = rN \, \frac{(K - N)}{K}. \]

i The scaled logistic equation, its slope field, and steady state values are discussed here.

Solution. To determine the steady states of Eqn. (13.1.1), i.e. the level of population that would not
change over time, we look for values of N such that
\[ \frac{dN}{dt} = 0. \]
This leads to
\[ rN \, \frac{(K - N)}{K} = 0, \]
which has solutions N = 0 (no population at all) or N = K (the population is at its carrying capacity).

We could similarly find steady states of the scaled form of the logistic equation, Eqn. (13.1.3).
Setting dy/dt = 0 leads to
\[ 0 = \frac{dy}{dt} = r y (1 - y) \quad \Rightarrow \quad y = 0, \ \text{or} \ y = 1. \]
This comes as no surprise since these values of y correspond to the values N = 0 and N = K.

i A second way to analyze the scaled logistic equation, using the phase line approach, and its
connection to the slope field method as described in Example 13.4.2.

Example 13.4.2. Draw a plot of the rate of change dy/dt versus the value of y for the scaled logistic
equation, Eqn. (13.1.3):
\[ \frac{dy}{dt} = r y (1 - y). \]

Concept Check-In
22. Circle the steady states in Figure 13.8 and identify which one is stable.

23. Why is y < 0 not relevant in Example 13.4.2?

Solution. In the plot of Figure 13.8 only y ≥ 0 is relevant. In the interval 0 < y < 1, the rate
of change is positive, so that y increases, whereas for y > 1, the rate of change is negative, so y
decreases. Since y refers to population size, we need not concern ourselves with behaviour for y < 0.
From Figure 13.8 we deduce that solutions that start with a positive y value approach y = 1 with
time. Solutions starting at either steady state y = 0 or y = 1 would not change. Restated in terms of
the variable N(t), any positive initial population should approach its carrying capacity K with time. ♦

Figure 13.8: Plot of dy/dt versus y for the scaled logistic equation (13.1.3).

We now look at the same equation from the perspective of the slope field.
Example 13.4.3. Draw a slope field for the scaled logistic equation with r = 0.5, that is for
\[ \frac{dy}{dt} = f(y) = 0.5 \cdot y(1 - y). \tag{13.4.1} \]
Solution. We generate slopes for various values of y in Table 13.4 and plot the slope field in
Figure 13.9(a). ♦
Finally, we practice Euler’s method to graph the numerical solution to Eqn. (13.4.1) from several
initial conditions.

y            sign of f(y)    behaviour of y    direction of arrow
0            0               no change         →
0 < y < 1    +ve             increasing        ↗
1            0               no change         →
y > 1        -ve             decreasing        ↘

Table 13.4: Table for slope field for the logistic equation (13.4.1). See Fig 13.9(a) for the resulting
diagram.

Figure 13.9: (a) Slope field and (b) solution curves for the logistic equation (13.4.1),
dy/dt = 0.5 · y(1 − y). Axes in both panels: time (horizontal) and population (vertical).

Example 13.4.4 (Numerical solutions to the logistic equation). Use Euler’s method to approximate
the solutions to the logistic equation (13.4.1).

Concept Check-In
24. What initial values y0 were used in drawing the different solution curves depicted in
Figure 13.9(b)?

Solution. In Figure 13.9(b) we show a set of solution curves, obtained by solving the equation
numerically using Euler’s method. To obtain these solutions, a value of h = ∆t = 0.1 was used.
The solution is plotted for various initial conditions y(0) = y0 . The successive values of y were
calculated according to

\[ y_{k+1} = y_k + 0.5 \, y_k (1 - y_k) h, \qquad k = 0, \ldots, 100. \]



P Link to Google Sheets. This spreadsheet implements Euler’s method for Example 13.4.4. A
chart showing solutions from four initial conditions is included.

From Figure 13.9(b), we see that solution curves approach the steady state y = 1, meaning that the
population N (t ) approaches the carrying capacity K for all positive starting values. A link to the
spreadsheet that implements Euler’s method is included. ♦
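The same Euler iteration can also be written in a few lines of code as an alternative to the spreadsheet. The sketch below uses h = 0.1 and k = 0, ..., 100 as in the example; the initial values 0.1, 0.5, 1.0 and 1.2 are hypothetical choices made for illustration and are not claimed to be those used for Figure 13.9(b).

```python
def logistic_euler(y0, r=0.5, h=0.1, n_updates=101):
    """Euler's method for dy/dt = r y (1 - y).

    Applies y_{k+1} = y_k + r * y_k * (1 - y_k) * h for k = 0, ..., 100
    (101 updates) and returns the list [y_0, y_1, ..., y_101].
    """
    ys = [y0]
    for _ in range(n_updates):
        y = ys[-1]
        ys.append(y + r * y * (1 - y) * h)
    return ys


# Hypothetical initial conditions, chosen only to illustrate the behaviour.
solutions = {y0: logistic_euler(y0) for y0 in (0.1, 0.5, 1.0, 1.2)}

for y0, ys in solutions.items():
    print(f"y0 = {y0:.1f}  ->  final value is approximately {ys[-1]:.3f}")
```

Solutions that start between 0 and 1 increase toward 1, the solution starting at 1 stays there, and the solution starting above 1 decreases toward 1.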
Example 13.4.5 (Inflection points). Some of the curves shown in Figure 13.9(b) have an inflection
point, but others do not. Use the differential equation to determine which of the solution curves have
an inflection point.
Solution. We have already established that all initial values in the range 0 < y0 < 1 are associated
with increasing solutions y(t ). Now we consider the concavity of those solutions.

Concept Check-In
25. How do we know that initial conditions in the range 0 < y0 < 1 lead to increasing
solutions?

The logistic equation has the form


\[ \frac{dy}{dt} = r y (1 - y) = r y - r y^2. \]
Differentiate both sides using the chain rule and factor, to get
\[ \frac{d^2 y}{dt^2} = r \frac{dy}{dt} - 2 r y \frac{dy}{dt} = r \frac{dy}{dt} (1 - 2y). \]
An inflection point would occur at places where the second derivative changes sign. This is possible
for $dy/dt = 0$ or for $(1 - 2y) = 0$. We have already dismissed the first possibility because we argued
that the rate of change is nonzero in the interval of interest. Thus we conclude that an inflection point
would occur whenever $y = 1/2$. Any initial condition satisfying $0 < y_0 < 1/2$ would eventually
pass through $y = 1/2$ on its way to the steady state level at $y = 1$, and in so doing, would have an
inflection point. Solutions that start above $y = 1/2$ never cross that level, so they have no
inflection point. ♦
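
The conclusion of Example 13.4.5 can be checked numerically: an increasing solution is steepest exactly where its concavity changes sign, so the largest Euler increment should occur near y = 1/2. The sketch below assumes r = 0.5 and h = 0.1; the helper name `steepest_point` and the starting values are hypothetical choices made only for illustration.

```python
def steepest_point(y0, r=0.5, h=0.1, steps=200):
    """Return (k, y_k) where the Euler increment |y_{k+1} - y_k| is largest."""
    ys = [y0]
    for _ in range(steps):
        y = ys[-1]
        ys.append(y + r * y * (1 - y) * h)
    increments = [abs(b - a) for a, b in zip(ys, ys[1:])]
    k = max(range(len(increments)), key=lambda i: increments[i])
    return k, ys[k]


# Starting below 1/2: the curve is steepest as it passes y = 1/2.
print(steepest_point(0.1))
# Starting above 1/2: the curve is steepest at the very start (k = 0),
# which reflects the absence of an inflection point.
print(steepest_point(0.7))
```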

§§ A changing aphid population


In Chapter 1, we investigated a situation in which the predation and growth rates of an aphid population
exactly balanced. But what happens if these two rates do not balance? We are now ready to tackle
this question.
Featured Problem 13.4.6 (aphids)

Hint
Growth rate (number of aphids born per unit time) contributes positively, whereas predation rate
(number of aphids eaten per unit time) contributes negatively to the rate of change of aphids
with respect to time (dx/dt).

Consider the aphid-ladybug problem (Example 1.4.1) with aphid density x, growth rate G(x) = rx,
and predation rate by a ladybug P(x) as in (1.4.1). (a) Write down a differential equation for the
aphid population. (b) Use your equation, and a sketch of the two functions to answer the following
question: What happens to the aphid population starting from various initial population sizes?
Featured Problem 13.4.6

§§ A model for the spread of a disease


In the era of human immunodeficiency virus (HIV), Severe Acute Respiratory Syndrome (SARS),
Avian influenza (“bird flu”) and similar emerging infectious diseases, it is prudent to consider how
infection spreads, and how it could be controlled or suppressed. This motivates the following
example.
For a given disease, let us subdivide the population into two classes: healthy individuals who are
susceptible to catching the infection, and those that are currently infected and able to transmit the
infection to others. We consider an infection that is mild enough that individuals recover at some
constant rate, and that they become susceptible once recovered.
Note: usually, recovery from an illness leads to partial temporary immunity. While this, too, can be
modelled, we restrict attention to the simpler case which is tractable using mathematics we have just
introduced.
The simplest case to understand is that of a fixed population (with no birth, death or migration
during the timescale of interest). A goal is to predict whether the infection spreads and persists
(becomes endemic) in the population or whether it runs its course and disappears.

i A video summary of the model for the spread of a disease, together with its analysis.

We use the following notation:

S(t ) = size of population of susceptible (healthy) individuals,


I (t ) = size of population of infected individuals,
N (t ) = S(t ) + I (t ) = total population size.

We add a few simplifying assumptions.

1. The population mixes very well, so each individual is equally likely to contact and interact
with any other individual. The contact is random.

2. Other than the state (S or I), individuals are “identical,” with the same rates of recovery and
infectivity.

3. On the timescale of interest, there is no birth, death or migration, only exchange between S
and I.

Example 13.4.7. Suppose that the process can be represented by the scheme

\[ S + I \to I + I, \qquad I \to S \]

The first part, transmission of disease from I to S involves interaction. The second part is recovery.
Use the assumptions above to track the two populations and to formulate a set of differential
equations for I (t ) and S(t ).
Solution. The following balance equation keeps track of infected individuals:
\[ \left[\text{Rate of change of } I(t)\right] = \left[\text{Rate of gain due to disease transmission}\right] - \left[\text{Rate of loss due to recovery}\right] \]
According to our assumption, recovery takes place at a constant rate per unit time, denoted by $\mu > 0$.
By the law of mass action, the disease transmission rate should be proportional to the product of
the populations, $S \cdot I$. Assigning $\beta > 0$ to be the constant of proportionality leads to the following
differential equation for the infected population:
\[ \frac{dI}{dt} = \beta SI - \mu I. \]
Similarly, we can write a balance equation that tracks the population of susceptible individuals:
\[ \left[\text{Rate of change of } S(t)\right] = -\left[\text{Rate of loss due to disease transmission}\right] + \left[\text{Rate of gain due to recovery}\right] \]
Observe that loss from one group leads to (exactly balanced) gain in the other group. By similar
logic, the differential equation for S(t ) is then
\[ \frac{dS}{dt} = -\beta SI + \mu I. \]
We have arrived at a system of equations that describe the changes in each of the groups,
\[ \frac{dI}{dt} = \beta SI - \mu I, \tag{13.4.2a} \]
\[ \frac{dS}{dt} = -\beta SI + \mu I. \tag{13.4.2b} \]

Concept Check-In
26. Identify any constants in Eqns. (13.4.2)(a) and (b).

27. What are the units of those constants?

28. Why does the hint given in Example 13.4.8 help?

From Eqns. (13.4.2) it is clear that changes in one population depend on both, which means
that the differential equations are coupled (linked to one another). Hence, we cannot “solve one”
independently of the other. We must treat them as a pair. However, as we observe in the next
examples, we can simplify this system of equations using the fact that the total population does not
change.
Example 13.4.8. Use Eqns. (13.4.2) to show that the total population does not change (hint: show
that the derivative of $S(t) + I(t)$ is zero).

i Video showing that the population N (t ) = I (t ) + S(t ) is constant.

Solution. Add the equations to one another. Then we obtain
\[ \frac{d}{dt}\left[I(t) + S(t)\right] = \frac{dI}{dt} + \frac{dS}{dt} = \beta SI - \mu I - \beta SI + \mu I = 0. \]
Hence
\[ \frac{d}{dt}\left[I(t) + S(t)\right] = \frac{dN}{dt} = 0, \]
which means that $N(t) = I(t) + S(t) = N$ is constant, so the total population does not change. (In
Eqn. (13.1.1), here $N$ is a constant and $I(t)$, $S(t)$ are the variables.) ♦
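
The conservation of the total population can also be observed numerically. The sketch below applies Euler's method to both equations in (13.4.2) and records S + I at every step. The parameter values (β = 0.002, µ = 0.5, N = 1000), the step size and the function name `simulate_si` are assumptions chosen only for illustration.

```python
def simulate_si(S0, I0, beta, mu, h=0.01, steps=1000):
    """Euler's method for dI/dt = beta*S*I - mu*I, dS/dt = -beta*S*I + mu*I."""
    S, I = S0, I0
    totals = [S + I]
    for _ in range(steps):
        dI = beta * S * I - mu * I       # gain by transmission, loss by recovery
        dS = -beta * S * I + mu * I      # the exact opposite of dI
        S, I = S + dS * h, I + dI * h
        totals.append(S + I)
    return S, I, totals


# Hypothetical values: 990 susceptible and 10 infected individuals at t = 0.
S, I, totals = simulate_si(990.0, 10.0, beta=0.002, mu=0.5)
drift = max(abs(t - 1000.0) for t in totals)
print(f"S = {S:.2f}, I = {I:.2f}, largest deviation of S + I from N: {drift:.2e}")
```

Any deviation of S + I from N in this computation comes from floating-point rounding only, since the two increments cancel at every step.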
Example 13.4.9. Use the fact that N is constant to express S(t ) in terms of I (t ) and N, and
eliminate S(t ) from the differential equation for I (t ). Your equation should only contain the
constants N, β , µ.

Concept Check-In
29. Redo Example 13.4.9 but eliminate I (t ) instead of S(t ) .

30. Analyze the equation you get for dS(t )/dt as done for dI/dt in Example 13.4.10.

Solution. Since $N = S(t) + I(t)$ is constant, we can write $S(t) = N - I(t)$. Then, plugging this into
the differential equation for $I(t)$ we obtain
\[ \frac{dI}{dt} = \beta SI - \mu I \implies \frac{dI}{dt} = \beta (N - I) I - \mu I. \]

Example 13.4.10. a) Show that the above equation can be written in the form
\[ \frac{dI}{dt} = \beta I (K - I), \]
where $K$ is a constant.

b) Determine how this constant K depends on N, β , and µ.

c) Is the constant K positive or negative?


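Solution.
a) We rewrite the differential equation for $I(t)$ as follows:
\[ \frac{dI}{dt} = \beta (N - I) I - \mu I = \beta I \left[ (N - I) - \frac{\mu}{\beta} \right] = \beta I \left( N - \frac{\mu}{\beta} - I \right). \]

b) We identify the constant,
\[ K = \left( N - \frac{\mu}{\beta} \right). \]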

c) Evidently, $K$ could be either positive or negative, that is
\[ \begin{cases} N \ge \dfrac{\mu}{\beta} \implies K \ge 0, \\[1ex] N < \dfrac{\mu}{\beta} \implies K < 0. \end{cases} \]


Using the above process, we have reduced the system of two differential equations for the two
variables I (t ), S(t ) to a single differential equation for I (t ), together with the statement S(t ) =
N ´ I (t ). We now examine implications of this result using the qualitative methods of this chapter.

Example 13.4.11. Consider the differential equation for $I(t)$ given by
\[ \frac{dI}{dt} \equiv f(I) = \beta I (K - I), \quad \text{where } K = \left( N - \frac{\mu}{\beta} \right). \tag{13.4.3} \]

Find the steady states of the differential equation (13.4.3) and draw a state space diagram in each of
the following cases:

(a) $K \ge 0$,

(b) $K < 0$.

Use your diagram to determine which steady state(s) are stable or unstable.

Concept Check-In
31. What is the significance of the grey shaded regions in Fig. 13.10?

32. Draw Fig. 13.10 for $K = 0$.

33. Why is $I = K$ not an admissible steady state if $K < 0$?

Solution. Steady states of Eqn. (13.4.3) satisfy $dI/dt = \beta I (K - I) = 0$. Hence, these steady states
are $I = 0$ (no infected individuals) and $I = K$. The latter only makes sense if $K \ge 0$. We plot the
function $f(I) = \beta I (K - I)$ in Eqn. (13.4.3) against the state variable $I$ in Figure 13.10 (a) for $K \ge 0$
and (b) for $K < 0$. Since $f(I)$ is quadratic in $I$, its graph is a parabola and it opens downwards. We
add arrows pointing right ($\to$) in the regions where $dI/dt > 0$ and arrows pointing left ($\leftarrow$) where
$dI/dt < 0$.
In case (a), when $K \ge 0$, we find that arrows point toward $I = K$, so this steady state is stable.
Arrows point away from $I = 0$, so this represents an unstable steady state. In case (b), while we still
have a parabolic graph with two steady states, the state $I = K$ is not admissible since $K$ is negative.
Hence only one steady state, at $I = 0$, is relevant biologically, and all initial conditions move towards
this state. ♦
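
The arrow-drawing argument of Example 13.4.11 amounts to checking the sign of f(I) just to the left and just to the right of each steady state. The sketch below does this mechanically. The names `f` and `classify`, the offset `eps`, and the sample values of β and K are hypothetical and serve only as an illustration.

```python
def f(I, beta, K):
    """Right-hand side of dI/dt = beta * I * (K - I)."""
    return beta * I * (K - I)


def classify(I_star, beta, K, eps=1e-3):
    """Classify a steady state from the sign of f on either side of it."""
    left = f(I_star - eps, beta, K)
    right = f(I_star + eps, beta, K)
    if left > 0 > right:
        return "stable"      # arrows on both sides point toward I_star
    if left < 0 < right:
        return "unstable"    # arrows on both sides point away from I_star
    return "undetermined"


beta = 0.002  # assumed value
print(classify(0.0, beta, K=750.0))    # case (a): I = 0
print(classify(750.0, beta, K=750.0))  # case (a): I = K
# Case (b): only I = 0 is admissible. The test point to its left has I < 0,
# which is not biologically meaningful but is used here for the sign check.
print(classify(0.0, beta, K=-50.0))
```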

Example 13.4.12. Interpret the results of the model in terms of the disease, assuming that initially
most of the population is in the susceptible S group, and a small number of infected individuals are
present at t = 0.

Figure 13.10: State-space diagrams for differential equation (13.4.3). Plots of $f(I)$ as a function of
$I$ in the cases (a) $K \ge 0$, and (b) $K < 0$. The grey regions are not biologically meaningful since $I$
cannot be negative.

Solution. In case (a), as long as the initial size of the infected group is positive ($I > 0$), with time
it approaches $K$, that is, $I(t) \to K = N - \mu/\beta$. The rest of the population is in the susceptible
group, that is $S(t) \to \mu/\beta$ (so that $S(t) + I(t) = N$ is always constant). This first scenario holds
provided $K > 0$, which is equivalent to $N > \mu/\beta$. There are then some infected and some healthy
individuals in the population indefinitely, according to the model. In this case, we say that the
disease becomes endemic.
In case (b), which corresponds to $N < \mu/\beta$, we see that $I(t) \to 0$ regardless of the initial size of
the infected group. In that case, $S(t) \to N$ so with time, the infected group shrinks and the healthy
group grows so that the whole population becomes healthy. From these two results, we conclude
that the disease is wiped out in a small population, whereas in a sufficiently large population, it can
spread until a steady state is attained where some fraction of the population is always infected. In
fact we have identified a threshold that separates these two behaviours:
Concept Check-In
34. In the case that $\beta = 0.001$ per person per day and $\mu = 0.1$ per day, how large would the
population have to be for the disease to become endemic?

35. Frequent hand-washing can be a protective measure that decreases the spread of disease.
Which parameter of the model would this affect and in what way?


\[ \frac{N\beta}{\mu} > 1 \implies \text{disease becomes endemic}, \]
\[ \frac{N\beta}{\mu} < 1 \implies \text{disease is wiped out}. \]

i A video summarizing the interpretation of the model and the meaning of the constant
R0 = Nβ /µ.

The ratio of constants in these inequalities, $R_0 = N\beta/\mu$, is called the basic reproduction number
for the disease. Many current and much more detailed models for disease transmission also have
such threshold behaviour, and the ratio that determines whether the disease spreads or disappears, $R_0$,
is of great interest in vaccination strategies. This ratio represents the number of infections that arise
when 1 infected individual interacts with a population of $N$ susceptible individuals.
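
The threshold can be evaluated directly for any choice of parameters. The sketch below computes R0 = Nβ/µ and the long-term infected level predicted by the model. The function names and the numerical values (β = 0.002, µ = 0.5) are hypothetical, chosen only to show one case on each side of the threshold.

```python
def basic_reproduction_number(N, beta, mu):
    """R0 = N * beta / mu for the model of this section."""
    return N * beta / mu


def long_term_infected(N, beta, mu):
    """Level that I(t) approaches: K = N - mu/beta if positive, otherwise 0."""
    K = N - mu / beta
    return max(K, 0.0)


for N in (1000, 100):  # assumed population sizes
    R0 = basic_reproduction_number(N, beta=0.002, mu=0.5)
    level = long_term_infected(N, beta=0.002, mu=0.5)
    outcome = "endemic" if R0 > 1 else "wiped out"
    print(f"N = {N}: R0 = {R0:.2f}, long-term infected = {level:.1f} ({outcome})")
```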

13.5 Summary
1. A differential equation of the form $\alpha \frac{dy}{dt} + \beta y + \gamma = 0$ is linear (and “first order”). We
encountered several examples of nonlinear DEs in this chapter.
2. A (possibly nonlinear) differential equation $\frac{dy}{dt} = f(y)$ can be analyzed qualitatively by
observing where $f(y)$ is positive, negative or zero.
3. A slope field (or “direction field”) is a collection of tangent vectors for solutions to a differential
equation. Slope fields can be sketched from f (y) without the need to solve the differential
equation.
4. A solution curve drawn in a slope field corresponds to a single solution to a differential
equation, with some initial y0 value given.
5. A state space (or “phase line” diagram) for the differential equation is a y axis, together
with arrows describing the flow (increasing/decreasing/stationary) along that axis. It can be
obtained from a sketch of f (y).
6. A steady state is stable if nearby states get closer. A steady state is unstable if nearby states
get further away with time.
7. Creating/interpreting slope field and state space diagrams is helpful in understanding the
behavior of solutions to differential equations.
8. Applications considered in this chapter included:
(a) the logistic equations for population growth (a nonlinear differential equation, scaling,
steady state and slope field demonstration);
(b) the Law of Mass Action (a nonlinear differential equation);
(c) a cooling object (state space and phase line diagram demonstration); and
(d) a disease spread model (an extensive exposition on qualitative differential equation
methods).

Quick Concept Check


1. Why is it helpful to rescale an equation?

2. Identify which of the following differential equations are linear:

(a) $5\frac{dy}{dt} - y = -0.5$
(b) $\left(\frac{dy}{dt}\right)^2 + y + 1 = 0$
(c) $\frac{dy}{dx} + \pi y + \rho = 3$
(d) $\frac{dx}{dt} + x + 2 = -3x$

3. Consider the following slope field:

[Figure: slope field with $t$ on the horizontal axis and $y$ on the vertical axis, each marked from 5 to 20.]

(a) Where is $y$ decreasing?

(b) What is $y$ approaching?

4. Circle the stable steady states in the following state space diagram.

[Figure: graph of $f(y)$ against $y$.]
Application to Multivariable Equations


Chapter 14

(FLAVOUR C) GEOMETRY IN THREE DIMENSIONS

Before we get started doing calculus in two and three dimensions we need to brush up on some basic
geometry that we will use a lot. We are already familiar with the Cartesian plane1 , but we’ll start
from the beginning.

14.1 Points and planes

Learning Objectives
• Label points on the x-y-z axes and identify basic planes of constant x, y, or z.

Each point in two dimensions may be labeled by two coordinates2 (x, y) which specify the
position of the point in some units with respect to some axes as in the figure below.

[Figure: the point $(x, y)$ plotted against the $x$- and $y$-axes.]

1 René Descartes (1596–1650) was a French scientist and philosopher, who lived in the Dutch Republic for roughly
twenty years after serving in the (mercenary) Dutch States Army. He is viewed as the father of analytic geometry,
which uses numbers to study geometry.
2 This is why the xy-plane is called “two dimensional” — the name of each point consists of two real numbers.
G EOMETRY IN T HREE D IMENSIONS 14.1 P OINTS AND PLANES

The set of all points in two dimensions is denoted³ $\mathbb{R}^2$. Observe that

• the distance from the point $(x, y)$ to the $x$-axis is $|y|$
• the distance from the point $(x, y)$ to the $y$-axis is $|x|$
• the distance from the point $(x, y)$ to the origin $(0, 0)$ is $\sqrt{x^2 + y^2}$

Similarly, each point in three dimensions may be labeled by three coordinates (x, y, z), as in the
two figures below.

[Figure: two views of the point $(x, y, z)$ plotted against the $x$-, $y$- and $z$-axes.]

The set of all points in three dimensions is denoted $\mathbb{R}^3$. The plane that contains, for example, the $x$-
and $y$-axes is called the $xy$-plane.

• The xy-plane is the set of all points (x, y, z) that satisfy z = 0.


• The xz-plane is the set of all points (x, y, z) that satisfy y = 0.
• The yz-plane is the set of all points (x, y, z) that satisfy x = 0.

More generally,

• The set of all points $(x, y, z)$ that obey $z = c$ is a plane that is parallel to the $xy$-plane and is a
distance $|c|$ from it. If $c > 0$, the plane $z = c$ is above the $xy$-plane. If $c < 0$, the plane $z = c$ is
below the $xy$-plane. We say that the plane $z = c$ is a signed distance $c$ from the $xy$-plane.
• The set of all points (x, y, z) that obey y = b is a plane that is parallel to the xz-plane and is a
signed distance b from it.
• The set of all points (x, y, z) that obey x = a is a plane that is parallel to the yz-plane and is a
signed distance a from it.

3 Not surprisingly, the 2 in $\mathbb{R}^2$ signifies that each point is labelled by two numbers and the $\mathbb{R}$ in $\mathbb{R}^2$ signifies that the
numbers in question are real numbers. There are more advanced applications (for example in signal analysis and
in quantum mechanics) where complex numbers are used. The space of all pairs $(z_1, z_2)$, with $z_1$ and $z_2$ complex
numbers, is denoted $\mathbb{C}^2$.

[Figure: three sketches showing the planes $z = c$, $y = b$ and $x = a$.]

Observe that our 2d distances extend quite easily to 3d.

• the distance from the point $(x, y, z)$ to the $xy$-plane is $|z|$
• the distance from the point $(x, y, z)$ to the $xz$-plane is $|y|$
• the distance from the point $(x, y, z)$ to the $yz$-plane is $|x|$
• the distance from the point $(x, y, z)$ to the origin $(0, 0, 0)$ is $\sqrt{x^2 + y^2 + z^2}$

To see that the distance from the point $(x, y, z)$ to the origin $(0, 0, 0)$ is indeed $\sqrt{x^2 + y^2 + z^2}$,
• apply Pythagoras to the right-angled triangle with vertices $(0, 0, 0)$, $(x, 0, 0)$ and $(x, y, 0)$ to see
that the distance from $(0, 0, 0)$ to $(x, y, 0)$ is $\sqrt{x^2 + y^2}$ and then
• apply Pythagoras to the right-angled triangle with vertices $(0, 0, 0)$, $(x, y, 0)$ and $(x, y, z)$ to see
that the distance from $(0, 0, 0)$ to $(x, y, z)$ is $\sqrt{\left(\sqrt{x^2 + y^2}\right)^2 + z^2} = \sqrt{x^2 + y^2 + z^2}$.

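[Figure: the point $(x, y, z)$ together with the points $(x, 0, 0)$ and $(x, y, 0)$, forming the two right-angled triangles described above.]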

More generally, the distance from the point $(x, y, z)$ to the point $(x_1, y_1, z_1)$ is
\[ \sqrt{(x - x_1)^2 + (y - y_1)^2 + (z - z_1)^2} \]
Notice that this gives us the equation for a sphere quite directly. All the points on a sphere are
equidistant from the centre of the sphere. So, for example, the equation of the sphere centered on
$(1, 2, 3)$ with radius 4, that is, the set of all points $(x, y, z)$ whose distance from $(1, 2, 3)$ is 4, is
\[ (x - 1)^2 + (y - 2)^2 + (z - 3)^2 = 16 \]
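
The distance formula and the sphere equation translate directly into code. The following Python sketch is only an illustration; the function names `distance` and `on_sphere`, the tolerance, and the sample points are assumptions made here and are not part of the text.

```python
import math


def distance(p, q):
    """Distance between two points of R^3, given as (x, y, z) tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def on_sphere(p, centre, radius, tol=1e-9):
    """True if p lies (up to rounding) on the sphere with this centre and radius."""
    return abs(distance(p, centre) - radius) <= tol


print(distance((0, 0, 0), (1, 2, 2)))          # sqrt(1 + 4 + 4)
print(on_sphere((5, 2, 3), (1, 2, 3), 4))      # 4 units from the centre in x
print(on_sphere((0, 0, 0), (1, 2, 3), 4))      # the origin is sqrt(14) away
```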
If you’re having a hard time picturing the three-dimensional axes, Appendix section 14.1.1 will
lead you through folding a model out of a piece of paper.

14.1.1 §§ (optional) Folding the first octant of $\mathbb{R}^3$


This text, whether you’re reading it on a computer screen or a printed page, exists in two dimensions.
So, anything we draw in three dimensions is going to require a little bit of imagination. If you’re
struggling to understand the figures with three coordinates, it might help to make your own model of
these axes.
In the Cartesian plane, the first quadrant is the part of the plane where both $x$ and $y$ are positive.
The coordinate planes divide three-dimensional space into eight regions, called octants. The first octant is the region
where all of $x$, $y$, and $z$ are positive.
Following the instructions below, you can fold a piece of paper into an octant.
1. Fold your paper in half “hamburger style” (so that the fold goes along the shorter dimension
of the paper). Position it so that it opens like a book4 .

2. Bring the corner of your folded paper up to the side.

3. Your paper now has a triangle sitting on top of a rectangle. Where the triangle ends, make a
crease in the underlying rectangle shapes.


4. Your paper has four layers, with the triangle shapes on top. Open the paper so that three layers
are on top, and one is on the bottom. The result should look like the inside corner of a box.


Your octant is created! The vertical crease is the z axis, the crease to the left is the x axis, and the
crease to the right is the y axis. In the picture below, the blue sphere indicates that the octant is open
towards you: if you were to put a marble inside the paper structure, it would sit as shown.

4 in a language written left-to-right


G EOMETRY IN T HREE D IMENSIONS 14.2 F UNCTIONS OF TWO VARIABLES

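[Figure: the folded paper octant with a blue sphere resting inside; the $x$-axis crease is on the left and the $y$-axis crease is on the right.]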

To practice with your octant, label the following points directly on the paper:
• (1, 1, 0)
• (0, 1, 1)
• (1, 0, 1)
The next collection of points will exist out in space, not on any of the paper sides. Point to their
positions relative to your octant:
• (1, 1, 1)
• (1, 2, 3)
• $(1, -1, 1)$
• $(1, 1, -1)$

14.2 Functions of two variables

Learning Objectives
• Given a simple function of two variables, z = f (x, y), evaluate z values for given pairs
(x, y).

First, a quick review of dependent and independent variables. Independent variables are the variables
we think of as changing somehow on their own; the dependent variables are the variables whose
change we think of as being caused by the independent variables. For example, if you want to
describe the relationship between the age of a cup of cottage cheese, and the number of bacteria in
that cup, we generally choose age (time) to be the independent variable and population of bacteria to
be the dependent variable: we think of age changing on its own, then that age causing the bacterial
population to change.
We could of course go the other way, and write time as a function of bacteria. This could be
useful if we were trying to figure out how old the cheese was by counting its bacteria. So the
difference between an independent variable and a dependent variable has to do with how we want to
interpret a function.
In a single-variable function, by convention we write
\[ y = f(x) \]
where $y$ is the dependent variable and $x$ is the independent variable. Similarly, in a two-variable
function, we generally write
\[ z = f(x, y) \]
We think of the variables $x$ and $y$ as independent, and the variable $z$ as dependent.
If we’re not too concerned with independent vs dependent variables, or if the relationship
between the dependent and independent variables is difficult (or impossible) to write explicitly in
this form, then we can also define multivariable functions implicitly. For example, in the equation
\[ z^3 x + z^2 y + xyz - 1 = 0 \]
we can think of $z$ as an implicitly defined function of $x$ and $y$. You’ve already seen two families of
implicitly defined functions: planes and spheres.
Example 14.2.1
Which points $(1, y, 1)$ in $\mathbb{R}^3$ satisfy the equation
\[ z^3 x + z^2 y + xyz - 1 = 0 \, ? \]

Solution. If $x = z = 1$, then the equation becomes
\[ 1 + y + y - 1 = 0 \]
which has solution $y = 0$. So the only such point is $(1, 0, 1)$.


Example 14.2.1
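
A quick way to test whether a point satisfies an implicit equation is to evaluate its left-hand side there. In the sketch below, the function name `F` is a hypothetical label for the left-hand side of the equation in Example 14.2.1.

```python
def F(x, y, z):
    """Left-hand side of z^3 x + z^2 y + x y z - 1 = 0."""
    return z**3 * x + z**2 * y + x * y * z - 1


# With x = z = 1 the expression reduces to 2y, so only y = 0 gives zero.
for y in (-1, 0, 1, 2):
    print(y, F(1, y, 1))
```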

It’s common to see a multivariable equation like
\[ f(x, y) = \sin(x + y) \]
or
\[ g(x, y) = e^{x^2 + y^2} \]
and think that the sine and exponential functions are different from the sine and exponential
functions we’ve seen in two dimensions. They aren’t! When $x$ and $y$ are real numbers, then $(x + y)$
and $(x^2 + y^2)$ are real numbers as well. We’re taking the sine of a real number in the first equation,
and $e$ to a real power in the second equation, just as we always have.
Functions of two (or more) variables are not so different from functions of one variable in other
ways as well.

Definition 14.2.2 (Domain and Range).

Let f (x, y) be a function that takes pairs of real numbers as inputs, and gives a real number
as its output.
The set of points (x, y) that can be input to f is the domain of that function. The set of
outputs of f over its entire domain is the range of that function.

Example 14.2.3 (Domain and Range)


Find the domain and range of the function
\[ f(x, y) = \sqrt{e^{x^2 + y^2} - 2} \]

Solution. There are three operations in our function: exponentiation, subtraction, and taking of a
square root. We can subtract anything from anything; and we can raise $e$ to any power. So the only
thing that could “break” our function is if we tried to take the square root of a negative number. This
tells us that, in order for $f(x, y)$ to be defined, we need
\[ e^{x^2 + y^2} - 2 \ge 0 \implies e^{x^2 + y^2} \ge 2 \implies x^2 + y^2 \ge \ln 2 \]

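One way of describing the domain of this function is to call it “all points $(x, y)$ with $x^2 + y^2 \ge \ln 2$.”
A more standard way is to describe the shape this set makes in $\mathbb{R}^2$: all points on or outside the circle
centred at the origin with radius $\sqrt{\ln 2} \approx 0.83$.

[Figure: the $xy$-plane, shaded on and outside the circle of radius $\sqrt{\ln 2}$ centred at the origin.]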

To help you visualize what we mean, take a point in the shaded area above. For example, $(1, .5)$.
If we plug that into our function, it causes no problems:
\[ f(1, .5) = \sqrt{e^{1^2 + .5^2} - 2} = \sqrt{e^{1.25} - 2} \approx \sqrt{1.49} \approx 1.22 \]
On the other hand, take a point in the white area. For example, $(.5, .5)$. If we try to plug this into our
function, we end up with
\[ f(.5, .5) = \sqrt{e^{.5^2 + .5^2} - 2} = \sqrt{e^{0.5} - 2} \approx \sqrt{1.65 - 2} \approx \sqrt{-0.35} \]
which is not a real number.


[Figure: the domain of $f$, with the point $(1, .5)$ in the shaded region outside the circle of radius $\sqrt{\ln 2}$ and the point $(.5, .5)$ in the white region inside it.]

Now, let’s think about range. By choosing larger and larger values of $x$ and $y$, we can make
$x^2 + y^2$ into larger and larger numbers. So within our restricted domain, the range of $x^2 + y^2$ is
$[\ln 2, \infty)$; so the range of $e^{x^2 + y^2}$ is $[e^{\ln 2}, \infty) = [2, \infty)$; so the range of $e^{x^2 + y^2} - 2$ is $[0, \infty)$; so the
range of $f(x, y)$ is $[0, \infty)$.
Again, note that the domain of $f$ consists of ordered pairs of real numbers, while its range
consists of real numbers.
Example 14.2.3
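
The domain condition from Example 14.2.3 can be written as a simple test that is applied before the function is evaluated. The sketch below is one possible way to do this; the helper name `in_domain` and the choice to raise a `ValueError` outside the domain are assumptions made for illustration.

```python
import math


def in_domain(x, y):
    """True when x^2 + y^2 >= ln 2, so the square root is of a nonnegative number."""
    return x * x + y * y >= math.log(2)


def f(x, y):
    """f(x, y) = sqrt(e^(x^2 + y^2) - 2), defined only on the domain above."""
    if not in_domain(x, y):
        raise ValueError(f"({x}, {y}) is outside the domain of f")
    # max(...) guards against a tiny negative rounding error on the boundary circle.
    return math.sqrt(max(0.0, math.exp(x * x + y * y) - 2))


print(in_domain(1, 0.5), round(f(1, 0.5), 2))
print(in_domain(0.5, 0.5))
```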

Example 14.2.4
Find the domain and range of the function
\[ f(x, y) = \sin\left( \frac{x}{\sqrt{y}} \right) \]

Solution. Let’s start with domain. We can take the sine of any number we like, so that part of the
function doesn’t limit the domain. The things limiting the domain are that we cannot take the square
root of a negative number, and we can’t divide by zero.

• Because we can’t take the square root of a negative number, we must have $y \ge 0$.

• Because we can’t divide by 0, we must have $\sqrt{y} \ne 0$, i.e. $y \ne 0$.

Combining these restrictions, we can only have values of $y$ in the interval $(0, \infty)$; $x$ can be any real
number. So, our domain is the upper half of the $xy$-plane, excluding the $x$-axis:

In general, the range of $\sin x$ is $[-1, 1]$. So, we certainly can’t get a larger range than this. We
should check that our range is no smaller. When $y = 1$, our function becomes $f(x, 1) = \sin(x/1) =
\sin x$. Since $x$ can be any real number, indeed the range of our function is $[-1, 1]$.
Example 14.2.4

Example 14.2.5
Find the domain and range of the function

f (x, y) = ln(arctan(x + y))

Solution. First, let’s think about the arctangent and logarithm functions in the context of single-variable
functions. The domain of arctangent is all real numbers, and its range is $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$. The
domain of the natural logarithm is all positive numbers, and its range is all real numbers.

[Figure: graphs of $z = \arctan t$ and $z = \ln t$.]

Since only positive numbers may be input into the natural logarithm, we require $\arctan(x + y) > 0$.
That requires $(x + y) > 0$. So, our domain is the collection of all points $(x, y)$ such that $x + y > 0$;
put another way, all points above the line $y = -x$.

If our domain is points $(x, y)$ such that $x + y > 0$, then the range of the function $(x + y)$ is $(0, \infty)$;
so the numbers being plugged into the arctangent function are $(0, \infty)$. So, the numbers coming out
of the arctangent function are $\left(0, \frac{\pi}{2}\right)$. Then the numbers from $\left(0, \frac{\pi}{2}\right)$ are being input into the natural
logarithm function, leading to a range of the entire function of $\left(-\infty, \ln\frac{\pi}{2}\right)$.

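[Figure: graphs of $z = \arctan t$ and $z = \ln t$, illustrating that if $0 < t$, then $0 < \arctan t < \frac{\pi}{2}$, and if $0 < t < \frac{\pi}{2}$, then $-\infty < \ln t < \ln\frac{\pi}{2}$.]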

Example 14.2.5
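
The domain and range found in Example 14.2.5 can be probed numerically by evaluating the function at points where x + y is very small and very large. In this sketch the function name `g`, the sample inputs, and the decision to raise a `ValueError` outside the domain are illustrative assumptions.

```python
import math


def g(x, y):
    """g(x, y) = ln(arctan(x + y)), defined only when x + y > 0."""
    if x + y <= 0:
        raise ValueError("g is only defined for points above the line y = -x")
    return math.log(math.atan(x + y))


upper_bound = math.log(math.pi / 2)   # the value ln(pi/2) that is approached but never reached
print(g(1e-9, 0.0))                    # very negative when x + y is close to 0
print(g(1e6, 0.0), upper_bound)        # just below ln(pi/2) when x + y is large
```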

We may sometimes restrict the domain of a function more than is mathematically necessary in
order for it to make sense in a model. For example, we may have a function that only makes sense
in our model when it gives positive values. In this case, we might restrict the domain to a model
domain, the set of inputs for which the function is not only defined, but sensible in the context of
our model.
Example 14.2.6
A large pharmaceutical company determines its research budget for a new vaccine according to the
formula
R(x, y) = ln(xy)
where x is the size of the customer base they expect to have and y is the revenue they expect per
dose.
Then for each variable $x$, $y$, and $R$, negative values don’t make sense in the model. So although
we could compute $R(-1, -1) = 0$, and we could compute $R(0.5, 0.5) \approx -1.39$, they wouldn’t be
sensible in the context of our model.
• Since $x$ and $y$ need to be nonnegative, we will only consider points $(x, y)$ in the first quadrant
of the Cartesian plane: $x \ge 0$ and $y \ge 0$.
• Since $R$ needs to be nonnegative, we will further restrict $xy \ge 1$. That is, $y \ge \frac{1}{x}$.
The two restrictions above give us the model domain shaded below.
[Figure: the model domain, the region of the first quadrant on and above the curve $y = \frac{1}{x}$.]

Depending on the specifics of how the function is being used, the model domain may be restricted
even further. For example, perhaps the firm has a maximum budget for any given project; perhaps
the amount they can charge is limited by law; etc.
Example 14.2.6
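
The difference between the mathematical domain and the model domain in Example 14.2.6 can be made concrete with a small check. The sketch below assumes only the two restrictions stated in the example; the function name `in_model_domain` and the sample points are hypothetical.

```python
import math


def R(x, y):
    """Research budget formula R(x, y) = ln(xy), defined whenever xy > 0."""
    return math.log(x * y)


def in_model_domain(x, y):
    """Both inputs nonnegative, and xy >= 1 so that R(x, y) is nonnegative."""
    return x >= 0 and y >= 0 and x * y >= 1


for point in [(2.0, 1.0), (0.5, 0.5), (-1.0, -1.0)]:
    print(point, round(R(*point), 2), in_model_domain(*point))
```

The last two points show inputs where the formula can be evaluated but the result is not sensible in the model.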

14.3 Sketching surfaces in 3D

Learning Objectives
In Math 100, you won’t be asked to produce sketches of 3D surfaces, so there are no learning
objectives associated with this section. However, you will be shown such sketches. Understanding
how they can be produced can help you deepen and solidify your understanding of
the behaviour of multivariable functions.

In practice, students taking multivariable calculus regularly have great difficulty visualising
surfaces in three dimensions, despite the fact that we all live in three dimensions. We’ll now develop
some techniques to help us sketch surfaces in three dimensions⁵.
We all have a fair bit of experience drawing curves in two dimensions. Typically the intersection
of a surface (in three dimensions) with a plane is a curve lying in the (two dimensional) plane. Such
an intersection is usually called a cross-section. In the special case that the plane is one of the
coordinate planes, or parallel to one of the coordinate planes, the intersection is sometimes called a
trace.
Definition 14.3.1.

The trace of a surface is the intersection of that surface with a plane that is parallel to one
of the coordinate planes.

So, one trace (the intersection with a plane parallel to the $xy$-plane) is found by setting $z$ equal to a constant; another
trace (the intersection with a plane parallel to the $yz$-plane) is found by setting $x$ equal to a constant; and the final trace
(the intersection with a plane parallel to the $xz$-plane) is found by setting $y$ equal to a constant.
One can often get a pretty good idea of what a surface looks like by sketching a bunch of
cross-sections. Here are some examples.

Example 14.3.2 ($4x^2 + y^2 - z^2 = 1$)
Sketch the surface that satisfies $4x^2 + y^2 - z^2 = 1$.

Solution. We’ll start by fixing any number $z_0$ and sketching the part of the surface that lies in the
horizontal plane $z = z_0$.

[Figure: the horizontal plane $z = z_0$.]

The intersection of our surface with that horizontal plane is a horizontal cross-section. Any point
$(x, y, z)$ lying on that horizontal cross-section satisfies both
\[ z = z_0 \quad \text{and} \quad 4x^2 + y^2 - z^2 = 1 \]
\[ \iff z = z_0 \quad \text{and} \quad 4x^2 + y^2 = 1 + z_0^2 \]

5 Of course you could instead use some fancy graphing software, but part of the point is to build intuition. Not to
mention that you can’t use fancy graphing software on your exam.

Think of $z_0$ as a constant. Then $4x^2 + y^2 = 1 + z_0^2$ is a curve in the $xy$-plane. As $1 + z_0^2$ is a
constant, the curve is an ellipse. To determine its semi-axes⁶, we observe that when $y = 0$, we have
$x = \pm\frac{1}{2}\sqrt{1 + z_0^2}$ and when $x = 0$, we have $y = \pm\sqrt{1 + z_0^2}$. So the curve is just an ellipse with $x$
semi-axis $\frac{1}{2}\sqrt{1 + z_0^2}$ and $y$ semi-axis $\sqrt{1 + z_0^2}$. It’s easy to sketch.
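
The semi-axes of each horizontal cross-section can be computed for any height, and points on the resulting ellipse can be checked against the surface equation. The sketch below is an illustration under these assumptions: the names `semi_axes` and `surface_residual` and the parametrisation by an angle are choices made here, not part of the text.

```python
import math


def semi_axes(z0):
    """Return the (x, y) semi-axes of the ellipse 4x^2 + y^2 = 1 + z0^2."""
    s = math.sqrt(1 + z0**2)
    return 0.5 * s, s


def surface_residual(x, y, z):
    """Value of 4x^2 + y^2 - z^2 - 1, which is zero exactly on the surface."""
    return 4 * x**2 + y**2 - z**2 - 1


for z0 in (0, 1, 2, 3):
    a, b = semi_axes(z0)
    # A sample point on the ellipse at height z0, using the angle pi/3.
    x, y = a * math.cos(math.pi / 3), b * math.sin(math.pi / 3)
    print(z0, round(a, 3), round(b, 3), abs(surface_residual(x, y, z0)) < 1e-9)
```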

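[Figure: an ellipse in the $xy$-plane passing through $\left(0, \sqrt{1 + z_0^2}\right)$ and $\left(\frac{1}{2}\sqrt{1 + z_0^2}, 0\right)$.]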

Remember that this ellipse is the part of our surface that lies in the plane $z = z_0$. Imagine that the
sketch of the ellipse is on a single sheet of paper. Lift the sheet of paper up, move it around so that
the x- and y-axes point in the directions of the three dimensional x- and y-axes and place the sheet of
paper into the three dimensional sketch at height $z_0$. This gives a single horizontal ellipse in 3d, as
in the figure below.

[Figure: a single horizontal ellipse in the plane $z = z_0$.]

We can build up the full surface by stacking many of these horizontal ellipses, one for each
possible height $z_0$. So we now draw a few of them as in the figure below. To reduce the amount of
clutter in the sketch, we have only drawn the first octant (i.e. the part of three dimensions that has
$x \ge 0$, $y \ge 0$ and $z \ge 0$).

6 The semi-axes of an ellipse are the line segments from the centre of the ellipse to the farthest point on the curve
and to the nearest point on the curve. For a circle the lengths of both of these line segments are just the radius.
[Figure: first-octant parts of the horizontal cross-sections at $z = 1$, $z = 2$ and $z = 3$.]

Here is why it is OK, in this case, to just sketch the first octant. Replacing $x$ by $-x$ in the equation
$4x^2 + y^2 - z^2 = 1$ does not change the equation. That means that a point $(x, y, z)$ is on the surface if
and only if the point $(-x, y, z)$ is on the surface. So the surface is invariant under reflection in the
yz-plane. Similarly, the equation $4x^2 + y^2 - z^2 = 1$ does not change when $y$ is replaced by $-y$ or $z$
is replaced by $-z$. Our surface is also invariant under reflection in the xz- and xy-planes. Once we
have the part in the first octant, the remaining octants can be obtained simply by reflecting in the
coordinate planes.
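The horizontal cross-sections described above can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `on_surface` and `semi_axes` are made up for illustration. It computes the semi-axes of the ellipse at a few heights and confirms that sample points of each ellipse satisfy $4x^2 + y^2 - z^2 = 1$.

```python
import math

def on_surface(x, y, z, tol=1e-9):
    """True if (x, y, z) satisfies 4x^2 + y^2 - z^2 = 1 up to rounding."""
    return abs(4 * x**2 + y**2 - z**2 - 1) < tol

def semi_axes(z0):
    """Semi-axes (x, y) of the ellipse 4x^2 + y^2 = 1 + z0^2 at height z0."""
    r = math.sqrt(1 + z0**2)
    return 0.5 * r, r

for z0 in (1, 2, 3):
    a, b = semi_axes(z0)
    # The endpoints of the semi-axes lie on the surface at height z0.
    assert on_surface(a, 0, z0) and on_surface(0, b, z0)
    # So does a general point of the ellipse, parametrised by an angle t.
    t = 0.7
    assert on_surface(a * math.cos(t), b * math.sin(t), z0)
    # Reflecting in the coordinate planes keeps the point on the surface.
    assert on_surface(-a * math.cos(t), -b * math.sin(t), -z0)
    print(f"z0 = {z0}: x semi-axis = {a:.4f}, y semi-axis = {b:.4f}")
```

The last assertion illustrates the reflection symmetry that justifies drawing only the first octant.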
We can get a more visually meaningful sketch by adding in some vertical cross-sections. The
$x = 0$ and $y = 0$ cross-sections (also called traces; they are the parts of our surface that are in the
yz- and xz-planes, respectively) are
\[ x = 0,\ y^2 - z^2 = 1 \quad\text{and}\quad y = 0,\ 4x^2 - z^2 = 1 \]
These equations describe hyperbolae (footnote 7). If you don’t remember how to sketch them, don’t worry.
We’ll do it now. We’ll first sketch them in 2d. Since
\[ y^2 = 1 + z^2 \implies |y| \ge 1, \quad y = \pm 1 \text{ when } z = 0, \quad\text{and for large } |z|,\ y \approx \pm z \]
\[ 4x^2 = 1 + z^2 \implies |x| \ge \tfrac{1}{2}, \quad x = \pm\tfrac{1}{2} \text{ when } z = 0, \quad\text{and for large } |z|,\ x \approx \pm\tfrac{1}{2} z \]
the sketches are

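[Figure: the hyperbola $y^2 - z^2 = 1$ in the yz-plane, with the line $z = y$ marked, and the hyperbola $4x^2 - z^2 = 1$ in the xz-plane.]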

Now we’ll incorporate them into the 3d sketch. Once again imagine that each is a single sheet of
paper. Pick each up and move it into the 3d sketch, carefully matching up the axes. The red (blue)
parts of the hyperbolas above become the red (blue) parts of the 3d sketch below (assuming of course
that you are looking at this on a colour screen).

7 It’s not just a figure of speech!


[Figure: the first-octant sketch with the $x = 0$ and $y = 0$ cross-sections added to the horizontal cross-sections at $z = 1$, $z = 2$ and $z = 3$.]

Now that we have a pretty good idea of what the surface looks like we can clean up and simplify the
sketch. Here are a couple of possibilities.

This type of surface is called a hyperboloid of one sheet.


There are also hyperboloids of two sheets. For example, replacing the $+1$ on the right hand side
of $x^2 + y^2 - z^2 = 1$ by $-1$ gives $x^2 + y^2 - z^2 = -1$, which is a hyperboloid of two sheets. We’ll sketch
a similar surface quickly in the next example.
Example 14.3.2


Example 14.3.3 ($4x^2 + y^2 - z^2 = -1$)
Sketch the surface that satisfies $4x^2 + y^2 - z^2 = -1$.

Solution. As in the last example, we’ll start by fixing any number $z_0$ and sketching the part of the
surface that lies in the horizontal plane $z = z_0$. The intersection of our surface with that horizontal
plane is
\[ z = z_0 \text{ and } 4x^2 + y^2 = z_0^2 - 1 \]
Think of $z_0$ as a constant.

• If $|z_0| < 1$, then $z_0^2 - 1 < 0$ and there are no solutions to $4x^2 + y^2 = z_0^2 - 1$.

• If $|z_0| = 1$ there is exactly one solution, namely $x = y = 0$.

• If $|z_0| > 1$ then $4x^2 + y^2 = z_0^2 - 1$ is an ellipse with x semi-axis $\frac{1}{2}\sqrt{z_0^2 - 1}$ and y semi-axis
$\sqrt{z_0^2 - 1}$. These semi-axes are small when $|z_0|$ is close to 1 and grow as $|z_0|$ increases.

The first octant parts of a few of these horizontal cross-sections are drawn in the figure below.

[Figure: first-octant parts of the horizontal cross-sections at $z = 1.02$, $z = 2$ and $z = 3$.]

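Next we add in the $x = 0$ and $y = 0$ cross-sections (i.e. the parts of our surface that are in the yz-
and xz-planes, respectively)
\[ x = 0,\ z^2 = 1 + y^2 \quad\text{and}\quad y = 0,\ z^2 = 1 + 4x^2 \]

[Figure: the first-octant sketch with the $x = 0$ and $y = 0$ cross-sections added to the horizontal cross-sections at $z = 1.05$, $z = 2$ and $z = 3$.]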

Now that we have a pretty good idea of what the surface looks like we clean up and simplify the
sketch.

This type of surface is called a hyperboloid of two sheets.


Example 14.3.3

Example 14.3.4 (yz = 1)


Sketch the surface yz = 1.

Solution. This surface has a special property that makes it relatively easy to sketch. There are no
$x$’s in the equation $yz = 1$. That means that if some $y_0$ and $z_0$ obey $y_0 z_0 = 1$, then the point $(x, y_0, z_0)$
lies on the surface $yz = 1$ for all values of $x$. As $x$ runs from $-\infty$ to $\infty$, the point $(x, y_0, z_0)$ sweeps
out a straight line parallel to the x-axis. So the surface $yz = 1$ is a union of lines parallel to the x-axis.
It is invariant under translations parallel to the x-axis. To sketch $yz = 1$, we just need to sketch its
intersection with the yz-plane and then translate the resulting curve parallel to the x-axis to sweep
out the surface.
We’ll start with a sketch of the hyperbola yz = 1 in two dimensions.

[Figure: the hyperbola $yz = 1$ in the yz-plane.]

Next we’ll move this 2d sketch into the yz-plane, i.e. the plane x = 0, in 3d, except that we’ll only
draw in the part in the first octant.

Then we’ll draw in $x = x_0$ cross-sections for a couple more values of $x_0$

and clean up the sketch a bit.

Example 14.3.4

Example 14.3.5 (xyz = 4)


Sketch the surface xyz = 4.

Solution. We’ll sketch this surface using much the same procedure as we used in Examples 14.3.2
and 14.3.3. We’ll only sketch the part of the surface in the first octant. The remaining parts (in the
octants with $x, y < 0$, $z \ge 0$, with $x, z < 0$, $y \ge 0$ and with $y, z < 0$, $x \ge 0$) are just reflections of the
first octant part.
As usual, we start by fixing any number $z_0$ and sketching the part of the surface that lies in the
horizontal plane $z = z_0$. The intersection of our surface with that horizontal plane is the hyperbola
\[ z = z_0 \text{ and } xy = \frac{4}{z_0} \]
Note that $x \to \infty$ as $y \to 0$ and that $y \to \infty$ as $x \to 0$. So the hyperbola has both the x-axis and the
y-axis as asymptotes, when drawn in the xy-plane. The first octant parts of a few of these horizontal
cross-sections (namely, $z_0 = 4$, $z_0 = 2$ and $z_0 = \frac{1}{2}$) are drawn in the figure below.

[Figure: first-octant parts of the horizontal cross-sections at $z = 4$, $z = 2$ and $z = 1/2$.]

Next we add some vertical cross-sections. We can’t use x = 0 or y = 0 because any point on xyz = 4
must have all of x, y, z nonzero. So we use

x = 4, yz = 1 and y = 4, xz = 1

instead. They are again hyperbolae.

[Figure: the first-octant sketch with the vertical cross-sections $x = 4$ and $y = 4$ added.]

Finally, we clean up and simplify the sketch.

Example 14.3.5

Often the reason you are interested in a surface in 3d is that it is the graph z = f (x, y) of a
function of two variables f (x, y). Another good way to visualize the behaviour of a function f (x, y)
is to sketch what are called its level curves.

Definition 14.3.6.

A level curve of f (x, y) is a curve whose equation is f (x, y) = C, for some constant C.

A level curve is the set of points in the xy-plane where f takes the value C. Because it is a curve
in 2d, it is usually easier to sketch than the graph of f . Here are a couple of examples.


Example 14.3.7 ($f(x, y) = x^2 + 4y^2 - 2x + 2$)
Sketch the level curves of $f(x, y) = x^2 + 4y^2 - 2x + 2$.

Solution. Fix any real number $C$. Then, for the specified function $f$, the level curve $f(x, y) = C$ is
the set of points $(x, y)$ that obey
\[ x^2 + 4y^2 - 2x + 2 = C \iff x^2 - 2x + 1 + 4y^2 + 1 = C \iff (x - 1)^2 + 4y^2 = C - 1 \]
Now $(x - 1)^2 + 4y^2$ is the sum of two squares, and so is always at least zero. So if $C - 1 < 0$, i.e.
if $C < 1$, there is no curve $f(x, y) = C$. If $C - 1 = 0$, i.e. if $C = 1$, then $(x - 1)^2 + 4y^2 = C - 1 = 0$ if and
only if both $(x - 1)^2 = 0$ and $4y^2 = 0$, and so the level curve consists of the single point $(1, 0)$. If
$C > 1$, then $f(x, y) = C$ becomes $(x - 1)^2 + 4y^2 = C - 1 > 0$, which describes an ellipse centred on
$(1, 0)$. It intersects the x-axis when $y = 0$ and
\[ (x - 1)^2 = C - 1 \iff x - 1 = \pm\sqrt{C - 1} \iff x = 1 \pm \sqrt{C - 1} \]
and it intersects the line $x = 1$ (i.e. the vertical line through the centre) when
\[ 4y^2 = C - 1 \iff 2y = \pm\sqrt{C - 1} \iff y = \pm\tfrac{1}{2}\sqrt{C - 1} \]
So, when $C > 1$, $f(x, y) = C$ is the ellipse centred on $(1, 0)$ with x semi-axis $\sqrt{C - 1}$ and y semi-axis
$\frac{1}{2}\sqrt{C - 1}$. Here is a sketch of some representative level curves of $f(x, y) = x^2 + 4y^2 - 2x + 2$.

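[Figure: the level curves $f = 1$ (the single point $(1, 0)$), $f = 2$, $f = 5$, $f = 10$ and $f = 17$, nested ellipses centred on $(1, 0)$, with the line $x = 1$ marked.]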

It is often easier to develop an understanding of the behaviour of a function $f(x, y)$ by looking at
a sketch of its level curves, than it is by looking at a sketch of its graph. On the other hand, you
can also use a sketch of the level curves of $f(x, y)$ as the first step in building a sketch of the graph
$z = f(x, y)$. The next step would be to redraw, for each $C$, the level curve $f(x, y) = C$, in the plane
$z = C$, as we did in Example 14.3.2.

Example 14.3.7
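As a quick numerical check of Example 14.3.7, here is a minimal sketch (the helper name `level_curve_point` is an assumption made for illustration, not something from the text). It parametrises the ellipse with centre $(1, 0)$, x semi-axis $\sqrt{C - 1}$ and y semi-axis $\frac{1}{2}\sqrt{C - 1}$, and confirms that $f$ takes the value $C$ at every sampled point.

```python
import math

def f(x, y):
    return x**2 + 4 * y**2 - 2 * x + 2

def level_curve_point(C, t):
    """Point on the level curve f(x, y) = C (for C > 1) at parameter angle t."""
    r = math.sqrt(C - 1)
    return 1 + r * math.cos(t), 0.5 * r * math.sin(t)

for C in (2, 5, 10, 17):
    for k in range(8):
        x, y = level_curve_point(C, k * math.pi / 4)
        # Every sampled point of the ellipse should give f = C.
        assert abs(f(x, y) - C) < 1e-9
    print(f"C = {C}: x semi-axis = {math.sqrt(C - 1):.4f}, "
          f"y semi-axis = {0.5 * math.sqrt(C - 1):.4f}")

# The minimum value f = 1 is attained only at the centre (1, 0).
assert f(1, 0) == 1
```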

If you’ve ever used a topographic map, you’ve seen examples of level curves. Modelling the
z-axis as a measure of elevation, with z = 0 as sea level, the contours shown on topographic maps
show the level curves associated with different elevations. The example8 below shows the area
around Gambier, Anvil, and Keats Islands, north of UBC. The lines show level curves for z = 0
metres, z = 100 metres, z = 200 metres, etc.

8 generated by Natural Resources Canada’s Atlas of Canada - Toporama, included under an open government license

Example 14.3.8 ($e^{x+y+z} = 1$)

The function $f(x, y)$ is given implicitly by the equation $e^{x+y+z} = 1$. Sketch the level curves of $f$.

Solution. This one is not as nasty as it appears. That “$f(x, y)$ is given implicitly by the equation
$e^{x+y+z} = 1$” means that, for each $x, y$, the solution $z$ of $e^{x+y+z} = 1$ is $f(x, y)$. So, for the specified
function $f$ and any fixed real number $C$, the level curve $f(x, y) = C$ is the set of points $(x, y)$ that
obey
\[ e^{x+y+C} = 1 \iff x + y + C = 0 \ \text{(by taking the ln of both sides)} \iff x + y = -C \]
This is of course a straight line. It intersects the x-axis when $y = 0$ and $x = -C$ and it intersects the
y-axis when $x = 0$ and $y = -C$. Here is a sketch of some level curves.

[Figure: the level curves $f = -3, -2, -1, 0, 1, 2, 3$, which are the parallel lines $x + y = -C$.]

Example 14.3.8

We have just seen that sketching the level curves of a function f (x, y) can help us understand the
behaviour of f . We can generalise this to functions F (x, y, z) of three variables. A level surface of
F (x, y, z) is a surface whose equation is of the form F (x, y, z) = C for some constant C. It is the set
of points (x, y, z) at which F takes the value C.


Example 14.3.9 ($F(x, y, z) = x^2 + y^2 + z^2$)

Let $F(x, y, z) = x^2 + y^2 + z^2$. If $C > 0$, then the level surface $F(x, y, z) = C$ is the sphere of radius
$\sqrt{C}$ centred on the origin. Here is a sketch of the parts of the level surfaces $F = 1$ (radius 1), $F = 4$
(radius 2) and $F = 9$ (radius 3) that are in the first octant.

[Figure: first-octant parts of the spheres $F = 1$, $F = 4$ and $F = 9$.]

Example 14.3.9


Example 14.3.10 ($F(x, y, z) = x^2 + z^2$)
Let $F(x, y, z) = x^2 + z^2$ and $C > 0$. Consider the level surface $x^2 + z^2 = C$. The variable $y$ does not
appear in this equation. So for any fixed $y_0$, the intersection of our surface $x^2 + z^2 = C$ with the
plane $y = y_0$ is the circle of radius $\sqrt{C}$ centred on $x = z = 0$. Here is a sketch of the first quadrant
part of one such circle.

[Figure: the first quadrant part of the circle $F = C$ in the plane $y = y_0$.]

The full surface is the horizontal stack of all of those circles with $y_0$ running over $\mathbb{R}$. It is the cylinder
of radius $\sqrt{C}$ centred on the y-axis. Here is a sketch of the parts of the level surfaces $F = 1$ (radius
1), $F = 4$ (radius 2) and $F = 9$ (radius 3) that are in the first octant.

[Figure: first-octant parts of the cylinders $F = 1$, $F = 4$ and $F = 9$.]

Example 14.3.10

Example 14.3.11 ($F(x, y, z) = e^{x+y+z}$)

Let $F(x, y, z) = e^{x+y+z}$ and $C > 0$. Consider the level surface $e^{x+y+z} = C$, or equivalently, $x + y + z = \ln C$.
It is the plane that contains the intercepts $(\ln C, 0, 0)$, $(0, \ln C, 0)$ and $(0, 0, \ln C)$. Here is a
sketch of the parts of the level surfaces

• $F = e$ (intercepts $(1, 0, 0)$, $(0, 1, 0)$, $(0, 0, 1)$),

• $F = e^2$ (intercepts $(2, 0, 0)$, $(0, 2, 0)$, $(0, 0, 2)$) and

• $F = e^3$ (intercepts $(3, 0, 0)$, $(0, 3, 0)$, $(0, 0, 3)$)

that are in the first octant.


[Figure: first-octant parts of the planes $F = e$, $F = e^2$ and $F = e^3$.]

Example 14.3.11

There are some classes of relatively simple, but commonly occurring, surfaces that are given their
own names. One such class is cylindrical surfaces. You are probably used to thinking of a cylinder
as being something that looks like $x^2 + y^2 = 1$.

[Figure: the cylinder $x^2 + y^2 = 1$.]

In mathematics the word “cylinder” is given a more general meaning.

Definition 14.3.12 (Cylinder).

A cylinder is a surface that consists of all points that are on all lines that are

• parallel to a given line and


• pass through a given fixed plane curve (in a plane not parallel to the given line).

Example 14.3.13
Here are sketches of three cylinders. The familiar cylinder on the left below

[Figure: the cylinders $x^2 + y^2 = 1$ (left) and $x^2 + (y - z)^2 = 1$ (right).]

is called a right circular cylinder, because the given fixed plane curve ($x^2 + y^2 = 1$, $z = 0$) is a circle
and the given line (the z-axis) is perpendicular (i.e. at right angles) to the fixed plane curve.
The cylinder on the left above can be thought of as a vertical stack of circles. The cylinder on
the right above can also be thought of as a stack of circles, but the centre of the circle at height $z$ has
been shifted rightward to $(0, z, z)$. For that cylinder, the given fixed plane curve is once again the
circle $x^2 + y^2 = 1$, $z = 0$, but the given line is $y = z$, $x = 0$.
We have already seen the third cylinder

[Figure: the part of the cylinder $yz = 1$ with $x, y, z > 0$.]

in Example 14.3.4. It is called a hyperbolic cylinder. In this example, the given fixed plane curve is
the hyperbola $yz = 1$, $x = 0$ and the given line is the x-axis.
Example 14.3.13

§§ Quadric Surfaces
Another named class of relatively simple, but commonly occurring, surfaces is the quadric surfaces.

Definition 14.3.14 (Quadrics).

A quadric surface is a surface that consists of all points that obey $Q(x, y, z) = 0$, with $Q$
being a polynomial of degree two (footnote 9).

For $Q(x, y, z)$ to be a polynomial of degree two, it must be of the form
\[ Q(x, y, z) = Ax^2 + By^2 + Cz^2 + Dxy + Eyz + Fxz + Gx + Hy + Iz + J \]
for some constants $A, B, \cdots, J$. Each constant $z$ cross section of a quadric surface has an equation of
the form
\[ Ax^2 + Dxy + By^2 + gx + hy + j = 0, \quad z = z_0 \]
where the constants $g$, $h$ and $j$ depend on $z_0$. If $A = B = D = 0$ but $g$ and $h$ are not both zero, this is a
straight line. If $A$, $B$, and $D$ are not all zero, then by rotating and translating our coordinate system
the equation of the cross section can be brought into one of the forms (footnote 10)

• $\alpha x^2 + \beta y^2 = \gamma$ with $\alpha, \beta > 0$, which, if $\gamma > 0$, is an ellipse (or a circle),

• $\alpha x^2 - \beta y^2 = \gamma$ with $\alpha, \beta > 0$, which, if $\gamma \ne 0$, is a hyperbola, and if $\gamma = 0$ is two lines,

• $x^2 = \delta y$, which, if $\delta \ne 0$ is a parabola, and if $\delta = 0$ is a straight line.

There are similar statements for the constant $x$ cross sections and the constant $y$ cross sections. Hence
quadric surfaces are built by stacking these three types of curves.
We have already seen a number of quadric surfaces in the last couple of sections.

• We saw the quadric surface $4x^2 + y^2 - z^2 = 1$ in Example 14.3.2.

Its constant $z$ cross sections are ellipses and its $x = 0$ and $y = 0$ cross sections are hyperbolae.
It is called a hyperboloid of one sheet.

• We saw the quadric surface $x^2 + y^2 = 1$ in Example 14.3.13.

9 Technically, we should also require that the polynomial can’t be factored into the product of two polynomials of
degree one.
10 This statement can be justified using a linear algebra eigenvalue/eigenvector analysis. It is beyond what we can
cover here, but is not too difficult for a standard linear algebra course.

Its constant $z$ cross sections are circles and its $x = 0$ and $y = 0$ cross sections are straight lines.
It is called a right circular cylinder.

• We saw the quadric surface $x^2 + (y - z)^2 = 1$ in Example 14.3.13, and

• We saw the quadric surface $yz = 1$ in Example 14.3.4.

Example 14.3.15 (Indifference curves)


Suppose a function $U(x, y)$ gives the happiness (footnote 11), or utility, a consumer gains when they purchase
$x$ units of Good X and $y$ units of Good Y. The level curves of the surface $z = U(x, y)$ are called
indifference curves, because every point along that curve results in the same benefit to the consumer.
Suppose $U(x, y) = x\sqrt{y}$. Then purchasing 2 units of Good X and one unit of Good Y produces the
same benefit as purchasing 1 unit of Good X and 4 units of Good Y, because both these combinations
are on the level curve $U(x, y) = 2$.

[Figure: the indifference curve $x\sqrt{y} = 2$ in the first quadrant of the xy-plane.]

Let’s make a small contour map of our surface $U(x, y) = x\sqrt{y}$, plotting several indifference
curves. (Note $x\sqrt{y} = c$ is equivalent to $y = \frac{c^2}{x^2}$ in our model domain.)

11 An amusing thought experiment is to propose units for measuring happiness. ”The one-point increase in GDP was
associated with an average increase of 3.7 wrinkly puppy faces of happiness nation-wide.”
[Figure: contour map showing the indifference curves $U = 1$, $U = 2$, $U = 3$, $U = 4$ and $U = 5$.]

Not surprisingly, if we move roughly in the direction of the vector $(1, 1)$ (that is, increasing both $x$ and
$y$), our happiness $U(x, y)$ goes up.
Note that none of the indifference curves touch either the x-axis or the y-axis. It is clear enough
from the formula that $U(0, y) = U(x, 0) = 0$, so no point on an axis can lie on a curve $U = c$ with $c > 0$.
This is a common feature of utility functions: to maximize utility, a consumer will have at least a
little of both products, rather than consuming only one type.
Example 14.3.15
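The claims in Example 14.3.15 are easy to confirm numerically. The following is a minimal sketch (the helper name `indifference_y` is made up for illustration): it checks that the two bundles in the text give the same utility, and that the curve $y = c^2/x^2$ really is the level curve $U = c$.

```python
import math

def U(x, y):
    """Utility U(x, y) = x * sqrt(y), for x, y >= 0."""
    return x * math.sqrt(y)

def indifference_y(x, c):
    """The y with U(x, y) = c, for x > 0: y = c^2 / x^2."""
    return c**2 / x**2

# Both bundles from the text lie on the indifference curve U = 2.
assert U(2, 1) == 2 and U(1, 4) == 2

# Points of y = c^2 / x^2 give utility c, for several c and x.
for c in (1, 2, 3, 4, 5):
    for x in (0.5, 1.0, 2.0, 4.0):
        assert abs(U(x, indifference_y(x, c)) - c) < 1e-9

# On either axis the utility is zero, so no curve U = c > 0 touches an axis.
assert U(0, 7) == 0 and U(7, 0) == 0
```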
PARTIAL DERIVATIVES

Chapter 15

(FLAVOUR C) PARTIAL DERIVATIVES

In this chapter we are going to generalize the definition of “derivative” to functions of more than
one variable, and then we are going to use those derivatives. We can speed things up considerably
by recycling what we have already learned in the single-variable case.

15.1 Partial derivatives

Learning Objectives
• Compute partial derivatives of two-variable functions.

• Provide a physical interpretation of a partial derivative in terms of directional steepness
at a point on a surface.

First, recall how we defined the derivative, $f'(a)$, of a function of one variable, $f(x)$. We imagined
that we were walking along the x-axis, in the positive direction, measuring, for example, the
temperature along the way. We denoted by $f(x)$ the temperature at $x$. The instantaneous rate of
change of temperature that we observed as we passed through $x = a$ was
\[ \frac{df}{dx}(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \]

Next suppose that we are walking in the xy-plane and that the temperature at $(x, y)$ is $f(x, y)$.
We can pass through the point $(x, y) = (a, b)$ moving in many different directions, and we cannot
expect the measured rate of change of temperature if we walk parallel to the x-axis, in the direction
of increasing $x$, to be the same as the measured rate of change of temperature if we walk parallel to
the y-axis in the direction of increasing $y$. We’ll start by considering just those two directions, and
leave other directions (like walking parallel to the line $y = x$) for later.
Suppose that we are passing through the point (x, y) = (a, b) and that we are walking parallel to
the x-axis (in the positive direction). Then our y-coordinate will be constant, always taking the value
PARTIAL DERIVATIVES 15.1 PARTIAL DERIVATIVES

$y = b$. So we can think of the measured temperature as the function of one variable $B(x) = f(x, b)$
and we will observe the rate of change of temperature
\[ \frac{dB}{dx}(a) = \lim_{h \to 0} \frac{B(a + h) - B(a)}{h} = \lim_{h \to 0} \frac{f(a + h, b) - f(a, b)}{h} \]
This is called the “partial derivative of $f$ with respect to $x$ at $(a, b)$” and is denoted $\left(\frac{\partial f}{\partial x}\right)_y (a, b)$. Here

◦ the symbol $\partial$, which is read “partial”, indicates that we are dealing with a function of more
than one variable,
◦ the subscript $y$ on $\left(\ \right)_y$ indicates that $y$ is being held fixed, i.e. being treated as a constant,
◦ the $x$ in $\frac{\partial f}{\partial x}$ indicates that we are differentiating with respect to $x$, and
◦ $\frac{\partial f}{\partial x}$ is read “partial dee f dee x”.

Do not write $\frac{d}{dx}$ when $\frac{\partial}{\partial x}$ is appropriate. (There exist situations when $\frac{d}{dx} f$ and $\frac{\partial}{\partial x} f$ are both defined
and have different meanings.)
If, instead, we are passing through the point $(x, y) = (a, b)$ and are walking parallel to the y-axis
(in the positive direction), then our x-coordinate will be constant, always taking the value $x = a$. So
we can think of the measured temperature as the function of one variable $A(y) = f(a, y)$ and we
will observe the rate of change of temperature
\[ \frac{dA}{dy}(b) = \lim_{h \to 0} \frac{A(b + h) - A(b)}{h} = \lim_{h \to 0} \frac{f(a, b + h) - f(a, b)}{h} \]
This is called the “partial derivative of $f$ with respect to $y$ at $(a, b)$” and is denoted $\left(\frac{\partial f}{\partial y}\right)_x (a, b)$.
Just as was the case for the ordinary derivative $\frac{df}{dx}(x)$, it is common to treat the partial derivatives
of $f(x, y)$ as functions of $(x, y)$ simply by evaluating the partial derivatives at $(x, y)$ rather than at
$(a, b)$.

Definition 15.1.1 (Partial Derivatives).

The x- and y-partial derivatives of the function $f(x, y)$ are
\[ \left(\frac{\partial f}{\partial x}\right)_y (x, y) = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h} \]
\[ \left(\frac{\partial f}{\partial y}\right)_x (x, y) = \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h} \]
respectively. The partial derivatives of functions of more than two variables are defined
analogously.

Partial derivatives are used a lot, and there are many notations for them.

Notation 15.1.2.

The partial derivative $\left(\frac{\partial f}{\partial x}\right)_y$ of a function $f(x, y)$ is also denoted
\[ \frac{\partial f}{\partial x} \qquad f_x \qquad D_x f \qquad D_1 f \]
The subscript 1 on $D_1 f$ indicates that $f$ is being differentiated with respect to its first
variable. The partial derivative $\left(\frac{\partial f}{\partial x}\right)_y (a, b)$ is also denoted
\[ \frac{\partial f}{\partial x}\bigg|_{(a,b)} \]
with the subscript $(a, b)$ indicating that $\frac{\partial f}{\partial x}$ is being evaluated at $(x, y) = (a, b)$. The
abbreviated notation $\frac{\partial f}{\partial x}$ for $\left(\frac{\partial f}{\partial x}\right)_y$ is extremely commonly used. But it is dangerous to do
so when it is not clear from the context that it is the variable $y$ that is being held fixed.

Remark 15.1.3 (The Geometric Interpretation of Partial Derivatives). We’ll now develop a
geometric interpretation of the partial derivative
\[ \left(\frac{\partial f}{\partial x}\right)_y (a, b) = \lim_{h \to 0} \frac{f(a + h, b) - f(a, b)}{h} \]
in terms of the shape of the graph $z = f(x, y)$ of the function $f(x, y)$. That graph appears in the
figure below. It looks like the part of a deformed sphere that is in the first octant.
The definition of $\left(\frac{\partial f}{\partial x}\right)_y (a, b)$ concerns only points on the graph that have $y = b$, in other words,
the curve of intersection of the surface $z = f(x, y)$ with the plane $y = b$. That is the red curve in the
figure. The two blue vertical line segments in the figure have heights $f(a, b)$ and $f(a + h, b)$, which
are the two numbers in the numerator of $\frac{f(a + h, b) - f(a, b)}{h}$.

[Figure: the graph $z = f(x, y)$ in the first octant, with its curve of intersection with the plane $y = b$ and vertical segments of heights $f(a, b)$ and $f(a + h, b)$ above the points $(a, b, 0)$ and $(a + h, b, 0)$.]

A side view of the curve (looking from the left side of the y-axis) is sketched in the figure below.

[Figure: side view of the curve $z = f(x, b)$, $y = b$, with heights $f(a, b)$ and $f(a + h, b)$ above $(a, b, 0)$ and $(a + h, b, 0)$, and the rise $f(a + h, b) - f(a, b)$ marked.]

Again, the two blue vertical line segments in the figure have heights $f(a, b)$ and $f(a + h, b)$, which
are the two numbers in the numerator of $\frac{f(a + h, b) - f(a, b)}{h}$. So the numerator $f(a + h, b) - f(a, b)$ and
denominator $h$ are the rise and run, respectively, of the curve $z = f(x, b)$ from $x = a$ to $x = a + h$.
Thus $\left(\frac{\partial f}{\partial x}\right)_y (a, b)$ is exactly the slope of (the tangent to) the curve of intersection of the surface
$z = f(x, y)$ and the plane $y = b$ at the point $\big(a, b, f(a, b)\big)$. In the same way $\left(\frac{\partial f}{\partial y}\right)_x (a, b)$ is exactly
the slope of (the tangent to) the curve of intersection of the surface $z = f(x, y)$ and the plane $x = a$
at the point $\big(a, b, f(a, b)\big)$.

§§§ Evaluation of Partial Derivatives


From the above discussion, we see that we can readily compute partial derivatives $\frac{\partial}{\partial x}$ by using what
we already know about ordinary derivatives $\frac{d}{dx}$. More precisely,

• To evaluate $\frac{\partial f}{\partial x}(x, y)$, treat the $y$ in $f(x, y)$ as a constant and differentiate the resulting function
of $x$ with respect to $x$.

• To evaluate $\frac{\partial f}{\partial y}(x, y)$, treat the $x$ in $f(x, y)$ as a constant and differentiate the resulting function
of $y$ with respect to $y$.

• To evaluate $\frac{\partial f}{\partial x}(a, b)$, treat the $y$ in $f(x, y)$ as a constant and differentiate the resulting function
of $x$ with respect to $x$. Then evaluate the result at $x = a$, $y = b$.

• To evaluate $\frac{\partial f}{\partial y}(a, b)$, treat the $x$ in $f(x, y)$ as a constant and differentiate the resulting function
of $y$ with respect to $y$. Then evaluate the result at $x = a$, $y = b$.

Now for some examples.


Example 15.1.4
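Let
\[ f(x, y) = x^3 + y^2 + 4xy^2 \]
Then, since $\frac{\partial}{\partial x}$ treats $y$ as a constant,
\begin{align*}
\frac{\partial f}{\partial x} = \left(\frac{\partial f}{\partial x}\right)_y
&= \frac{\partial}{\partial x}(x^3) + \frac{\partial}{\partial x}(y^2) + \frac{\partial}{\partial x}(4xy^2) \\
&= 3x^2 + 0 + 4y^2 \frac{\partial}{\partial x}(x) \\
&= 3x^2 + 4y^2
\end{align*}
and, since $\frac{\partial}{\partial y}$ treats $x$ as a constant,
\begin{align*}
\frac{\partial f}{\partial y} = \left(\frac{\partial f}{\partial y}\right)_x
&= \frac{\partial}{\partial y}(x^3) + \frac{\partial}{\partial y}(y^2) + \frac{\partial}{\partial y}(4xy^2) \\
&= 0 + 2y + 4x \frac{\partial}{\partial y}(y^2) \\
&= 2y + 8xy
\end{align*}
In particular, at $(x, y) = (1, 0)$ these partial derivatives take the values
\[ \frac{\partial f}{\partial x}(1, 0) = 3(1)^2 + 4(0)^2 = 3 \]
\[ \frac{\partial f}{\partial y}(1, 0) = 2(0) + 8(1)(0) = 0 \]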

Example 15.1.4
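One way to sanity-check partial derivatives computed by hand is to compare them with difference quotients. The sketch below is illustrative only: it uses a symmetric (central) difference rather than the one-sided quotient in Definition 15.1.1, and the helper names `partial_x` and `partial_y` are made up here. It applies the idea to the function of Example 15.1.4.

```python
def f(x, y):
    return x**3 + y**2 + 4 * x * y**2

def partial_x(g, x, y, h=1e-6):
    """Central-difference estimate of the x-partial of g at (x, y); y is held fixed."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def partial_y(g, x, y, h=1e-6):
    """Central-difference estimate of the y-partial of g at (x, y); x is held fixed."""
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

# Compare with the hand-computed formulas 3x^2 + 4y^2 and 2y + 8xy.
for (x, y) in [(1.0, 0.0), (0.5, 2.0), (-1.5, 0.25)]:
    assert abs(partial_x(f, x, y) - (3 * x**2 + 4 * y**2)) < 1e-6
    assert abs(partial_y(f, x, y) - (2 * y + 8 * x * y)) < 1e-6

print(partial_x(f, 1.0, 0.0), partial_y(f, 1.0, 0.0))  # approximately 3 and 0
```

Note that each helper changes only one argument of `g`, which is exactly what "treating the other variable as a constant" means.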

Example 15.1.5
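Let
\[ f(x, y) = y \cos x + x e^{xy} \]
Then, since $\frac{\partial}{\partial x}$ treats $y$ as a constant, $\frac{\partial}{\partial x} e^{yx} = y e^{yx}$ and
\[ \frac{\partial f}{\partial x}(x, y) = -y \sin x + e^{xy} + xy e^{xy} \]
\[ \frac{\partial f}{\partial y}(x, y) = \cos x + x^2 e^{xy} \]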

Example 15.1.5
Let’s move up to a function of four variables. Things generalize in a quite straightforward way.
Example 15.1.6
Let
\[ f(x, y, z, t) = x \sin(y + 2z) + t^2 e^{3y} \ln z \]
Then
\begin{align*}
\frac{\partial f}{\partial x}(x, y, z, t) &= \sin(y + 2z) \\
\frac{\partial f}{\partial y}(x, y, z, t) &= x \cos(y + 2z) + 3t^2 e^{3y} \ln z \\
\frac{\partial f}{\partial z}(x, y, z, t) &= 2x \cos(y + 2z) + t^2 e^{3y} / z \\
\frac{\partial f}{\partial t}(x, y, z, t) &= 2t e^{3y} \ln z
\end{align*}

Example 15.1.6
Now here is a more complicated example: our function takes a special value at $(0, 0)$. To compute
derivatives there we have to revert to the definition.
Example 15.1.7
Set
\[ f(x, y) = \begin{cases} \dfrac{\cos x - \cos y}{x - y} & \text{if } x \ne y \\ 0 & \text{if } x = y \end{cases} \]
If $b \ne a$, then for all $(x, y)$ sufficiently close to $(a, b)$, $f(x, y) = \frac{\cos x - \cos y}{x - y}$ and we can compute the
partial derivatives of $f$ at $(a, b)$ using the familiar rules of differentiation. However that is not the
case for $(a, b) = (0, 0)$. To evaluate $f_x(0, 0)$, we need to set $y = 0$ and find the derivative of
\[ f(x, 0) = \begin{cases} \dfrac{\cos x - 1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases} \]
with respect to $x$ at $x = 0$. To do so, we basically have to apply the definition
\begin{align*}
f_x(0, 0) &= \lim_{h \to 0} \frac{f(h, 0) - f(0, 0)}{h} \\
&= \lim_{h \to 0} \frac{\frac{\cos h - 1}{h} - 0}{h} && \text{(Recall that } h \ne 0 \text{ in the limit.)} \\
&= \lim_{h \to 0} \frac{\cos h - 1}{h^2} \\
&= \lim_{h \to 0} \frac{-\sin h}{2h} && \text{(By l’Hôpital’s rule.)} \\
&= \lim_{h \to 0} \frac{-\cos h}{2} && \text{(By l’Hôpital again.)} \\
&= -\frac{1}{2}
\end{align*}
Example 15.1.7
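The limit in Example 15.1.7 can be watched converging numerically. This is only an illustrative sketch (the name `difference_quotient` is an assumption, not from the text): it evaluates the quotient $\frac{f(h, 0) - f(0, 0)}{h}$ from the definition for shrinking $h$ and compares it with the value $-\frac{1}{2}$ found above.

```python
import math

def f(x, y):
    """The piecewise function of Example 15.1.7."""
    if x == y:
        return 0.0
    return (math.cos(x) - math.cos(y)) / (x - y)

def difference_quotient(h):
    """(f(h, 0) - f(0, 0)) / h, the quotient whose limit is f_x(0, 0)."""
    return (f(h, 0.0) - f(0.0, 0.0)) / h

for h in (1e-1, 1e-2, 1e-3):
    print(h, difference_quotient(h))

# For small h the quotient is close to the limit -1/2.
assert abs(difference_quotient(1e-3) + 0.5) < 1e-6
assert abs(difference_quotient(-1e-3) + 0.5) < 1e-6
```

Very small values of `h` (say below about `1e-7`) are avoided on purpose, because the subtraction $\cos h - 1$ then loses most of its significant digits in floating point.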

Example 15.1.8
Again set
\[ f(x, y) = \begin{cases} \dfrac{\cos x - \cos y}{x - y} & \text{if } x \ne y \\ 0 & \text{if } x = y \end{cases} \]
We’ll now compute $f_y(x, y)$ for all $(x, y)$.
The case $y \ne x$: When $y \ne x$,
\begin{align*}
f_y(x, y) &= \frac{\partial}{\partial y} \frac{\cos x - \cos y}{x - y} \\
&= \frac{(x - y) \frac{\partial}{\partial y}(\cos x - \cos y) - (\cos x - \cos y) \frac{\partial}{\partial y}(x - y)}{(x - y)^2} && \text{by the quotient rule} \\
&= \frac{(x - y) \sin y + \cos x - \cos y}{(x - y)^2}
\end{align*}
The case $y = x$: When $y = x$,
\begin{align*}
f_y(x, y) &= \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h} = \lim_{h \to 0} \frac{f(x, x + h) - f(x, x)}{h} \\
&= \lim_{h \to 0} \frac{\frac{\cos x - \cos(x + h)}{x - (x + h)} - 0}{h} && \text{(Recall that } h \ne 0 \text{ in the limit.)} \\
&= \lim_{h \to 0} \frac{\cos(x + h) - \cos x}{h^2}
\end{align*}
Now we apply L’Hôpital’s rule, remembering that, in this limit, $x$ is a constant and $h$ is the variable,
so we differentiate with respect to $h$.
\[ f_y(x, y) = \lim_{h \to 0} \frac{-\sin(x + h)}{2h} \]
Note that if $x$ is not an integer multiple of $\pi$, then the numerator $-\sin(x + h)$ does not tend to zero
as $h$ tends to zero, and the limit giving $f_y(x, y)$ does not exist. On the other hand, if $x$ is an integer
multiple of $\pi$, both the numerator and denominator tend to zero as $h$ tends to zero, and we can apply
L’Hôpital’s rule a second time. Then
\[ f_y(x, y) = \lim_{h \to 0} \frac{-\cos(x + h)}{2} = -\frac{\cos x}{2} \]
The conclusion:
\[ f_y(x, y) = \begin{cases}
\dfrac{(x - y) \sin y + \cos x - \cos y}{(x - y)^2} & \text{if } x \ne y \\[1ex]
-\dfrac{\cos x}{2} & \text{if } x = y \text{ with } x \text{ an integer multiple of } \pi \\[1ex]
\text{DNE} & \text{if } x = y \text{ with } x \text{ not an integer multiple of } \pi
\end{cases} \]

Example 15.1.8
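The two behaviours on the diagonal $y = x$ found in Example 15.1.8 show up clearly in a small numerical experiment. This is a rough sketch under the assumption that looking at a few small values of $h$ is informative; the name `quotient_y` is made up for illustration.

```python
import math

def f(x, y):
    """The piecewise function of Examples 15.1.7 and 15.1.8."""
    if x == y:
        return 0.0
    return (math.cos(x) - math.cos(y)) / (x - y)

def quotient_y(x, h):
    """Difference quotient (f(x, x + h) - f(x, x)) / h on the diagonal y = x."""
    return (f(x, x + h) - f(x, x)) / h

# At x = pi (an integer multiple of pi) the quotient settles near -cos(pi)/2 = 1/2.
assert abs(quotient_y(math.pi, 1e-3) - 0.5) < 1e-6

# At x = 1 (not a multiple of pi) the quotient behaves like -sin(1)/h and blows up.
for h in (1e-1, 1e-2, 1e-3):
    print(h, quotient_y(1.0, h))
assert abs(quotient_y(1.0, 1e-3)) > 100
assert abs(quotient_y(1.0, 1e-3)) > abs(quotient_y(1.0, 1e-2))
```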
Our next example uses implicit differentiation.
Example 15.1.9
The equation
\[ z^5 + y^2 e^z + e^{2x} = 0 \]
implicitly determines $z$ as a function of $x$ and $y$. For example, when $x = y = 0$, the equation reduces
to
\[ z^5 = -1 \]
which forces (footnote 1) $z(0, 0) = -1$. Let’s find the partial derivative $\frac{\partial z}{\partial x}(0, 0)$.
We are not going to be able to explicitly solve the equation for $z(x, y)$. All we know is that
\[ z(x, y)^5 + y^2 e^{z(x, y)} + e^{2x} = 0 \]
for all $x$ and $y$. We can turn this into an equation for $\frac{\partial z}{\partial x}(0, 0)$ by differentiating (footnote 2) the whole equation
with respect to $x$, giving
\[ 5 z(x, y)^4 \frac{\partial z}{\partial x}(x, y) + y^2 e^{z(x, y)} \frac{\partial z}{\partial x}(x, y) + 2 e^{2x} = 0 \]
and then setting $x = y = 0$, giving
\[ 5 z(0, 0)^4 \frac{\partial z}{\partial x}(0, 0) + 2 = 0 \]
As we already know that $z(0, 0) = -1$,
\[ \frac{\partial z}{\partial x}(0, 0) = -\frac{2}{5 z(0, 0)^4} = -\frac{2}{5} \]

1 The only real number $z$ which obeys $z^5 = -1$ is $z = -1$. However there are four other complex numbers which
also obey $z^5 = -1$.
2 You should have already seen this technique, called implicit differentiation, in your first Calculus course.
PARTIAL DERIVATIVES 15.2 HIGHER ORDER DERIVATIVES

Example 15.1.9
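The answer of Example 15.1.9 can be checked without ever solving for $z(x, y)$ in closed form. The sketch below is an illustration under stated assumptions: it finds $z$ numerically by bisection (which works here because the left-hand side is increasing in $z$), and then estimates $\frac{\partial z}{\partial x}(0, 0)$ with a central difference. The helper name `solve_z` and the bracket $[-10, 10]$ are choices made for this sketch, not part of the text.

```python
import math

def solve_z(x, y, lo=-10.0, hi=10.0):
    """Bisection for the real root z of z^5 + y^2 e^z + e^(2x) = 0.

    The left-hand side is increasing in z, so the real root is unique;
    the bracket [lo, hi] is assumed to contain it (true for x, y near 0).
    """
    def g(z):
        return z**5 + y**2 * math.exp(z) + math.exp(2 * x)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

z00 = solve_z(0.0, 0.0)
h = 1e-5
dz_dx = (solve_z(h, 0.0) - solve_z(-h, 0.0)) / (2 * h)

print(z00, dz_dx)  # approximately -1 and -0.4
assert abs(z00 + 1.0) < 1e-9
assert abs(dz_dx + 2 / 5) < 1e-6
```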
Next we have a partial derivative disguised as a limit.
Example 15.1.10
In this example we are going to evaluate the limit
\[ \lim_{z \to 0} \frac{(x + y + z)^3 - (x + y)^3}{(x + y) z} \]
The critical observation is that, in taking the limit $z \to 0$, $x$ and $y$ are fixed. They do not change as
$z$ is getting smaller and smaller. Furthermore this limit is exactly of the form of the limits in the
Definition 15.1.1 of partial derivative, disguised by some obfuscating changes of notation.
Set
\[ f(x, y, z) = \frac{(x + y + z)^3}{(x + y)} \]
Then
\begin{align*}
\lim_{z \to 0} \frac{(x + y + z)^3 - (x + y)^3}{(x + y) z}
&= \lim_{z \to 0} \frac{f(x, y, z) - f(x, y, 0)}{z} = \lim_{h \to 0} \frac{f(x, y, 0 + h) - f(x, y, 0)}{h} \\
&= \frac{\partial f}{\partial z}(x, y, 0) \\
&= \left[ \frac{\partial}{\partial z} \frac{(x + y + z)^3}{x + y} \right]_{z = 0}
\end{align*}
Recalling that $\frac{\partial}{\partial z}$ treats $x$ and $y$ as constants, we are evaluating the derivative of a function of the
form $\frac{(\text{const} + z)^3}{\text{const}}$. So
\begin{align*}
\lim_{z \to 0} \frac{(x + y + z)^3 - (x + y)^3}{(x + y) z}
&= 3 \frac{(x + y + z)^2}{x + y} \bigg|_{z = 0} \\
&= 3(x + y)
\end{align*}

Example 15.1.10
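As a cross-check on Example 15.1.10, the same limit can also be found by expanding the cube directly, without recognising it as a partial derivative. The following worked computation is an added illustration and assumes $x + y \ne 0$, which the original limit already requires.

```latex
\begin{align*}
(x + y + z)^3 - (x + y)^3 &= 3(x + y)^2 z + 3(x + y) z^2 + z^3 \\
\frac{(x + y + z)^3 - (x + y)^3}{(x + y) z} &= 3(x + y) + 3z + \frac{z^2}{x + y}
  \qquad (z \ne 0) \\
\lim_{z \to 0} \frac{(x + y + z)^3 - (x + y)^3}{(x + y) z} &= 3(x + y)
\end{align*}
```

Both routes agree, but the partial derivative route scales better: for a higher power or a less friendly function, expanding by hand quickly becomes impractical.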

15.2 Higher order derivatives

Learning Objectives
• Compute the second order partial derivatives given a function of two variables.

• State without proof that the mixed partials should be equal for “nice” functions.

You have already observed, in your first Calculus course, that if $f(x)$ is a function of $x$, then its
derivative, $\frac{df}{dx}(x)$, is also a function of $x$, and can be differentiated to give the second order derivative
$\frac{d^2 f}{dx^2}(x)$, which can in turn be differentiated yet again to give the third order derivative, $f^{(3)}(x)$, and
so on.
We can do the same for functions of more than one variable. If $f(x, y)$ is a function of $x$ and $y$,
then both of its partial derivatives, $\frac{\partial f}{\partial x}(x, y)$ and $\frac{\partial f}{\partial y}(x, y)$ are also functions of $x$ and $y$. They can both
be differentiated with respect to $x$ and they can both be differentiated with respect to $y$. So there are
four possible second order derivatives. Here they are, together with various alternate notations.
\begin{align*}
\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right)(x, y) &= \frac{\partial^2 f}{\partial x^2}(x, y) = f_{xx}(x, y) \\
\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)(x, y) &= \frac{\partial^2 f}{\partial y \partial x}(x, y) = f_{xy}(x, y) \\
\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)(x, y) &= \frac{\partial^2 f}{\partial x \partial y}(x, y) = f_{yx}(x, y) \\
\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right)(x, y) &= \frac{\partial^2 f}{\partial y^2}(x, y) = f_{yy}(x, y)
\end{align*}

Warning 15.2.1.

In $\frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y} \frac{\partial}{\partial x} f$, the derivative closest to $f$, in this case $\frac{\partial}{\partial x}$, is applied first. So we work
through the variables in the bottom right-to-left.
In $f_{xy}$, the derivative with respect to the variable closest to $f$, in this case $x$, is applied first.
So we work through the subscript variables left-to-right.

The difference in “direction” highlighted in the warning seems confusing at first, but it stems from
the way the first partial derivative is written. In the fractional notation, if $f$ is being differentiated
with respect to $x$, we write $\frac{\partial f}{\partial x}$ or $\frac{\partial}{\partial x} f$. So the operator $\frac{\partial}{\partial x}$ is added to the left of the function. Now
suppose we want to differentiate $\frac{\partial f}{\partial x}$ with respect to $y$. By analogy, we would write $\frac{\partial}{\partial y}\left[\frac{\partial f}{\partial x}\right]$, or $\frac{\partial^2 f}{\partial y \partial x}$.
This leads to the order of variables being right-to-left.
With the subscript notation, if $f$ is being differentiated with respect to $x$, we write $f_x$, with the
variable on the right of the function. So now if we take the second derivative with respect to $y$, it
makes sense by analogy to add that new variable to the right: $(f_x)_y$, or $f_{xy}$, in left-to-right order.
Example 15.2.2
Let $f(x,y) = e^{my}\cos(nx)$. Then
\begin{align*}
f_x &= -n e^{my}\sin(nx) & f_y &= m e^{my}\cos(nx) \\
f_{xx} &= -n^2 e^{my}\cos(nx) & f_{yx} &= -mn\, e^{my}\sin(nx) \\
f_{xy} &= -mn\, e^{my}\sin(nx) & f_{yy} &= m^2 e^{my}\cos(nx)
\end{align*}

Example 15.2.2

Example 15.2.3
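Let $f(x,y) = e^{\alpha x+\beta y}$. Then
\begin{align*}
f_x &= \alpha e^{\alpha x+\beta y} & f_y &= \beta e^{\alpha x+\beta y} \\
f_{xx} &= \alpha^2 e^{\alpha x+\beta y} & f_{yx} &= \beta\alpha\, e^{\alpha x+\beta y} \\
f_{xy} &= \alpha\beta\, e^{\alpha x+\beta y} & f_{yy} &= \beta^2 e^{\alpha x+\beta y}
\end{align*}
More generally, for any integers $m, n \ge 0$,
\[
\frac{\partial^{m+n} f}{\partial x^m\,\partial y^n} = \alpha^m \beta^n e^{\alpha x+\beta y}
\]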

Example 15.2.3

Example 15.2.4
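If $f(x_1, x_2, x_3, x_4) = x_1^4 x_2^3 x_3^2 x_4$, then
\begin{align*}
\frac{\partial^4 f}{\partial x_1\,\partial x_2\,\partial x_3\,\partial x_4}
  &= \frac{\partial^3}{\partial x_1\,\partial x_2\,\partial x_3}\left(x_1^4 x_2^3 x_3^2\right) \\
  &= \frac{\partial^2}{\partial x_1\,\partial x_2}\left(2\, x_1^4 x_2^3 x_3\right) \\
  &= \frac{\partial}{\partial x_1}\left(6\, x_1^4 x_2^2 x_3\right) \\
  &= 24\, x_1^3 x_2^2 x_3
\end{align*}
and
\begin{align*}
\frac{\partial^4 f}{\partial x_4\,\partial x_3\,\partial x_2\,\partial x_1}
  &= \frac{\partial^3}{\partial x_4\,\partial x_3\,\partial x_2}\left(4\, x_1^3 x_2^3 x_3^2 x_4\right) \\
  &= \frac{\partial^2}{\partial x_4\,\partial x_3}\left(12\, x_1^3 x_2^2 x_3^2 x_4\right) \\
  &= \frac{\partial}{\partial x_4}\left(24\, x_1^3 x_2^2 x_3 x_4\right) \\
  &= 24\, x_1^3 x_2^2 x_3
\end{align*}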

Example 15.2.4
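The bookkeeping in Example 15.2.4 can be replayed mechanically. The sketch below is only an illustration: it stores a monomial $c\,x_1^{e_1}x_2^{e_2}x_3^{e_3}x_4^{e_4}$ as a pair (coefficient, exponents), and the helper names `partial` and `differentiate` are our own, not part of the text. It applies the power rule one variable at a time, in both of the orders used above.

```python
def partial(term, i):
    """Differentiate the monomial c * x1^e1 * ... * xk^ek with respect to variable number i (0-based)."""
    coeff, exps = term
    if exps[i] == 0:
        # The monomial does not depend on this variable, so the derivative is zero.
        return (0, tuple(0 for _ in exps))
    new_exps = exps[:i] + (exps[i] - 1,) + exps[i + 1:]
    return (coeff * exps[i], new_exps)


def differentiate(term, order):
    """Apply partial derivatives one after another, in the given order of variable indices."""
    for i in order:
        term = partial(term, i)
    return term


f = (1, (4, 3, 2, 1))  # x1^4 * x2^3 * x3^2 * x4

a = differentiate(f, [3, 2, 1, 0])  # x4 first, then x3, x2, x1
b = differentiate(f, [0, 1, 2, 3])  # x1 first, then x2, x3, x4
print(a)  # (24, (3, 2, 1, 0)), i.e. 24 * x1^3 * x2^2 * x3
print(b)  # (24, (3, 2, 1, 0))
```

Both orders end at the same monomial, matching the two computations in the example.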
Notice that in Example 15.2.2,
\[
f_{xy} = f_{yx} = -mn\, e^{my}\sin(nx)
\]
and in Example 15.2.3
\[
f_{xy} = f_{yx} = \alpha\beta\, e^{\alpha x+\beta y}
\]
and in Example 15.2.4
\[
\frac{\partial^4 f}{\partial x_1\,\partial x_2\,\partial x_3\,\partial x_4}
  = \frac{\partial^4 f}{\partial x_4\,\partial x_3\,\partial x_2\,\partial x_1}
  = 24\, x_1^3 x_2^2 x_3
\]
In all of these examples, it didn’t matter what order we took the derivatives in. The following
theorem³ shows that this was no accident.

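Theorem 15.2.5 (Clairaut’s Theorem⁴ or Schwarz’s Theorem⁵).

If the partial derivatives $\frac{\partial^2 f}{\partial x\,\partial y}$ and $\frac{\partial^2 f}{\partial y\,\partial x}$ exist and are continuous at $(x_0, y_0)$, then
\[
\frac{\partial^2 f}{\partial x\,\partial y}(x_0, y_0) = \frac{\partial^2 f}{\partial y\,\partial x}(x_0, y_0)
\]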

We won’t use this theorem a whole lot in Math 105. It can occasionally be useful to note that, as
long as the mixed partial derivatives involved exist and are continuous, you can take them in any “order.”
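As a numerical sanity check of the theorem (not a proof), the sketch below revisits Example 15.2.2 with the illustrative values $m = 2$, $n = 3$, which are our own choice. It takes the first partials $f_x$ and $f_y$ from that example, differentiates each once more with a central difference, and compares both results with the formula $-mn\,e^{my}\sin(nx)$. The step size and the sample point are also assumptions.

```python
import math

M_CONST, N_CONST = 2.0, 3.0  # illustrative values of m and n (our assumption)


def f_x(x, y):
    # First partial of e^(m y) cos(n x) with respect to x, from Example 15.2.2.
    return -N_CONST * math.exp(M_CONST * y) * math.sin(N_CONST * x)


def f_y(x, y):
    # First partial of e^(m y) cos(n x) with respect to y, from Example 15.2.2.
    return M_CONST * math.exp(M_CONST * y) * math.cos(N_CONST * x)


def central_difference(g, t, h=1e-5):
    """Approximate g'(t) by the symmetric difference quotient."""
    return (g(t + h) - g(t - h)) / (2 * h)


x0, y0 = 0.4, 0.1  # arbitrary sample point

f_xy = central_difference(lambda y: f_x(x0, y), y0)  # x first, then y
f_yx = central_difference(lambda x: f_y(x, y0), x0)  # y first, then x
exact = -M_CONST * N_CONST * math.exp(M_CONST * y0) * math.sin(N_CONST * x0)

print(f_xy, f_yx, exact)
```

The three printed numbers should agree to several decimal places; the small discrepancies come from the finite step size and floating point rounding.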
Example 15.2.6
Let $f(x,y) = x^5 e^x + y$. Find $f_{xxxy}$.

Solution. Since the partial derivatives of $f(x,y)$ of every order exist and are continuous everywhere, the order of differentiation
doesn’t matter. Rather than starting with respect to $x$ (which is harder), we start with respect to $y$
(which is easier).
\begin{align*}
f_y &= 1 \\
f_{yx} &= 0 \implies f_{xy} = 0 \\
f_{xyx} &= 0 \implies f_{xxy} = 0 \\
f_{xxyx} &= 0 \implies f_{xxxy} = 0
\end{align*}

Example 15.2.6

3 The history of this important theorem is pretty convoluted. See “A note on the history of mixed partial derivatives”
by Thomas James Higgins which was published in Scripta Mathematica 7 (1940), 59-62.
4 Alexis Clairaut (1713–1765) was a French mathematician, astronomer, and geophysicist.
5 Hermann Schwarz (1843–1921) was a German mathematician.
O PTIMIZATION OF M ULTIVARIABLE F UNCTIONS

Chapter 16

(Flavour C) Optimization of Multivariable Functions

16.1 Local maximum and minimum values


One of the core topics in single variable calculus courses is finding the maxima and minima of
functions of one variable. We’ll now extend that discussion to functions of more than one variable1 .
To keep things simple, we’ll focus on functions with two variables. It’s worth noting, though, that
many of the techniques we use will generalize to functions with even more. To start, we have the
following natural extensions to some familiar definitions.

Definition 16.1.1.

Let the function $f(x,y)$ be defined for all $(x,y)$ in some subset $R$ of $\mathbb{R}^2$. Let $(a,b)$ be a
point in $R$.

• $(a,b)$ is a local maximum of $f(x,y)$ if $f(x,y) \le f(a,b)$ for all $(x,y)$ close to $(a,b)$.
More precisely, $(a,b)$ is a local maximum of $f(x,y)$ if there is an $r > 0$ such that
$f(x,y) \le f(a,b)$ for all points $(x,y)$ within a distance $r$ of $(a,b)$.

• $(a,b)$ is a local minimum of $f(x,y)$ if $f(x,y) \ge f(a,b)$ for all $(x,y)$ close to $(a,b)$.

• Local maximum and minimum values are also called extremal values.

• $(a,b)$ is an absolute maximum or global maximum of $f(x,y)$ if $f(x,y) \le f(a,b)$ for
all $(x,y)$ in $R$.

• $(a,b)$ is an absolute minimum or global minimum of $f(x,y)$ if $f(x,y) \ge f(a,b)$ for
all $(x,y)$ in $R$.

1 Life is not (always) one-dimensional and sometimes we have to embrace it.


O PTIMIZATION OF M ULTIVARIABLE F UNCTIONS 16.1 L OCAL MAXIMUM AND MINIMUM
VALUES

16.1.1 §§ Critical points

Learning Objectives
• Define critical point and singular point for a function of two variables.

• Compute the critical points and singular points of a given function of two variables.

• State (without proof) that extreme values of a continuous multivariable function will
occur at critical or singular points.

• Be able to visualize critical points as ‘flat spots.’

One of the first things you did when you were developing the techniques used to find the maximum
and minimum values of $f(x)$ was to ask yourself²
Suppose that the largest value of $f(x)$ is $f(a)$. What does that tell us about $a$?
After a little thought you answered
If the largest value of $f(x)$ is $f(a)$ and $f$ is differentiable at $a$, then $f'(a) = 0$.

[Figure: graph of $y = f(x)$.]

Let’s recall why that’s true. Suppose that the largest value of $f(x)$ is $f(a)$. Then for all $h > 0$,
\[
f(a+h) \le f(a) \implies f(a+h) - f(a) \le 0 \implies \frac{f(a+h) - f(a)}{h} \le 0 \quad \text{if } h > 0
\]
Taking the limit $h \to 0$ tells us that $f'(a) \le 0$. Similarly, for all $h < 0$,
\[
f(a+h) \le f(a) \implies f(a+h) - f(a) \le 0 \implies \frac{f(a+h) - f(a)}{h} \ge 0 \quad \text{if } h < 0
\]
Taking the limit $h \to 0$ now tells us that $f'(a) \ge 0$. So we have both $f'(a) \ge 0$ and $f'(a) \le 0$, which
forces $f'(a) = 0$.
You also observed at the time that for this argument to work, you only need $f(x) \le f(a)$ for
all $x$’s close to $a$, not necessarily for all $x$’s in the whole world. (In the above inequalities, we only
used $f(a+h)$ with $h$ small.) Since we care only about $f(x)$ for $x$ near $a$, we can refine the above
statement.

2 Or perhaps your instructor asked you.



If $f(a)$ is a local maximum for $f(x)$ and $f$ is differentiable at $a$, then $f'(a) = 0$.

Precisely the same reasoning applies to minima.

If $f(a)$ is a local minimum for $f(x)$ and $f$ is differentiable at $a$, then $f'(a) = 0$.

Let’s use the ideas of the above discourse to extend the study of local maxima and local minima
to functions of more than one variable. Suppose that the function $f(x,y)$ is defined for all $(x,y)$ in
some subset $R$ of $\mathbb{R}^2$, that $(a,b)$ is a point of $R$ that is not on the boundary of $R$, and that $f$ has a local
maximum at $(a,b)$. See the figure below.

[Figure: the surface $z = f(x,y)$ above the region $R$ in the $xy$-plane, with the point $(a,b)$ in $R$ and the point $(a, b, f(a,b))$ on the surface.]

Then the function $f(x,y)$ cannot increase in value as $(x,y)$ moves a small distance away from $(a,b)$ in any direction.
In particular, if we change the $x$-coordinate a little, $f(x,y)$ must not increase. So for all sufficiently small $h > 0$:
\[
f(a+h, b) \le f(a,b) \implies f(a+h, b) - f(a,b) \le 0 \implies \frac{f(a+h, b) - f(a,b)}{h} \le 0 \quad \text{if } h > 0
\]
Taking the limit $h \to 0$ tells us that $f_x(a,b) \le 0$.

Similarly, for all $h < 0$ that are sufficiently close to zero,
\[
f(a+h, b) \le f(a,b) \implies f(a+h, b) - f(a,b) \le 0 \implies \frac{f(a+h, b) - f(a,b)}{h} \ge 0 \quad \text{if } h < 0
\]
Taking the limit $h \to 0$ now tells us that $f_x(a,b) \ge 0$. So we have both $f_x(a,b) \ge 0$ and $f_x(a,b) \le 0$,
which forces $f_x(a,b) = 0$. The same reasoning tells us $f_y(a,b) = 0$ as well, and that these partial
derivatives are zero for minima as well as maxima.
This is an important and useful result, so let’s theoremise it.

Theorem 16.1.2.

Let the function $f(x,y)$ be defined for all $(x,y)$ in some subset $R$ of $\mathbb{R}^2$. Assume that

◦ $(a,b)$ is a point of $R$ that is not on the boundary of $R$ and

◦ $(a,b)$ is a local maximum or local minimum of $f$ and that

◦ the partial derivatives of $f$ exist at $(a,b)$.

Then
\[
f_x(a,b) = 0 \quad \text{and} \quad f_y(a,b) = 0
\]

Definition 16.1.3.

Let f (x, y) be a function and let (a, b) be a point in its domain. Then we call (a, b) a
critical point (or a stationary point) of the function if

• fx (a, b) does not exist, or

• fy (a, b) does not exist, or

• fx (a, b) = fy (a, b) = 0.

Warning 16.1.4.

Note that some people (and texts) do not include the cases where one or both partial
derivatives do not exist in the definition of a critical point. These points would (usually)
be referred to as singular points of the function. We do not use this terminology.

Warning 16.1.5.

Theorem 16.1.2 tells us that every local maximum or minimum (in the interior of the
domain of a differentiable function) is a critical point. Beware that it does not3 tell us that
every critical point is either a local maximum or a local minimum.

In fact, as we shall see in Example 16.1.12, there are critical points that are neither local maxima nor
local minima. Nonetheless, Theorem 16.1.2 is very useful because often functions have only a
small number of critical points. To find local maxima and minima of such functions, we only need
to consider their critical points. We’ll return later to the question of how to tell if a critical point is a
local maximum, local minimum or neither. For now, we’ll just practice finding critical points.

3 A very common error of logic that people make is “affirming the consequent”. If “If P then Q” is true, it does not follow
that “If Q then P” is true. The statement “If he is Shakespeare then he is dead” is true. But concluding from “That
sheep is dead” that “He must be Shakespeare” is just silly.

Example 16.1.6 $f(x,y) = x^2 - 2xy + 2y^2 + 2x - 6y + 12$
Find all critical points of $f(x,y) = x^2 - 2xy + 2y^2 + 2x - 6y + 12$.
Solution. To find the critical points, we need to find the first order partial derivatives. So, as a
preliminary calculation, we find the two first order partial derivatives of $f(x,y)$.
\begin{align*}
f_x(x,y) &= 2x - 2y + 2 \\
f_y(x,y) &= -2x + 4y - 6
\end{align*}
These functions are defined everywhere. So the critical points are the solutions of the pair of
equations
\[
2x - 2y + 2 = 0 \qquad -2x + 4y - 6 = 0
\]
or equivalently (dividing by two and moving the constants to the right hand side)
\begin{align*}
x - y &= -1 \tag{E1} \\
-x + 2y &= 3 \tag{E2}
\end{align*}

This is a system of two equations in two unknowns ($x$ and $y$). One strategy for solving a system like
this is to
• First use one of the equations to solve for one of the unknowns in terms of the other unknown.
For example, (E1) tells us that $y = x + 1$. This expresses $y$ in terms of $x$. We say that we have
solved for $y$ in terms of $x$.

• Then substitute the result, $y = x + 1$ in our case, into the other equation, (E2). In our case, this
gives
\[
-x + 2(x+1) = 3 \iff x + 2 = 3 \iff x = 1
\]

• We have now found that $x = 1$, $y = x + 1 = 2$ is the only solution. So the only critical point
is $(1, 2)$. Of course it only takes a moment to verify that $f_x(1,2) = f_y(1,2) = 0$. It is a good
idea to do this as a simple check of our work.
An alternative strategy for solving a system of two equations in two unknowns, like (E1) and (E2),
is to

• add equations (E1) and (E2) together. This gives
\[
\text{(E1)} + \text{(E2)}: \quad (1-1)x + (-1+2)y = -1 + 3 \iff y = 2
\]
The point here is that adding equations (E1) and (E2) together eliminates the unknown $x$,
leaving us with one equation in the unknown $y$, which is easily solved. For other systems of
equations you might have to multiply the equations by some numbers before adding them
together.

• We now know that $y = 2$. Substituting it into (E1) gives us
\[
x - 2 = -1 \implies x = 1
\]

• Once again (thankfully) we have found that the only critical point is (1, 2).

Example 16.1.6
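Both strategies above are hand versions of solving a $2\times 2$ linear system. As a cross-check, here is a minimal sketch that solves (E1), (E2) with Cramer's rule, a technique the example itself does not use, in exact rational arithmetic. The helper name `solve_2x2` is hypothetical.

```python
from fractions import Fraction


def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = Fraction(b1 * a22 - a12 * b2) / det
    y = Fraction(a11 * b2 - b1 * a21) / det
    return x, y


def fx(x, y):
    return 2 * x - 2 * y + 2


def fy(x, y):
    return -2 * x + 4 * y - 6


# (E1): x - y = -1      (E2): -x + 2y = 3
x, y = solve_2x2(1, -1, -1, -1, 2, 3)
print(x, y)              # 1 2
print(fx(x, y), fy(x, y))  # 0 0, so (1, 2) really is a critical point
```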
This was pretty easy because we only had to solve linear equations, which in turn was a consequence
of the fact that f (x, y) was a polynomial of degree two. Here is an example with some slightly more
challenging algebra.

Example 16.1.7 $f(x,y) = 2x^3 - 6xy + y^2 + 4y$
Find all critical points of $f(x,y) = 2x^3 - 6xy + y^2 + 4y$.
Solution. As in the last example, we need to find where the partial derivatives do not exist or are
zero.
\[
f_x = 6x^2 - 6y \qquad f_y = -6x + 2y + 4
\]
These functions are defined everywhere. So the critical points are the solutions of
\[
6x^2 - 6y = 0 \qquad -6x + 2y + 4 = 0
\]
We can rewrite the first equation as $y = x^2$, which expresses $y$ as a function of $x$. We can then
substitute $y = x^2$ into the second equation, giving
\begin{align*}
-6x + 2y + 4 = 0 &\iff -6x + 2x^2 + 4 = 0 \iff x^2 - 3x + 2 = 0 \iff (x-1)(x-2) = 0 \\
&\iff x = 1 \text{ or } 2
\end{align*}
When $x = 1$, $y = 1^2 = 1$ and when $x = 2$, $y = 2^2 = 4$. So, there are two critical points: $(1,1)$, $(2,4)$.
Alternatively, we could have also used the second equation to write $y = 3x - 2$, and then
substituted that into the first equation to get
\[
6x^2 - 6(3x - 2) = 0 \iff x^2 - 3x + 2 = 0
\]

just as above.
Example 16.1.7
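The substitution above reduces the problem to the quadratic $x^2 - 3x + 2 = 0$. The short sketch below is our own check, not part of the original solution: it solves that quadratic with the quadratic formula, rebuilds the points via $y = x^2$, and confirms that both partial derivatives vanish there.

```python
import math


def fx(x, y):
    return 6 * x**2 - 6 * y


def fy(x, y):
    return -6 * x + 2 * y + 4


# Roots of x^2 - 3x + 2 = 0 by the quadratic formula.
a, b, c = 1, -3, 2
disc = b * b - 4 * a * c
roots = sorted([(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)])

# Each root x gives the critical point (x, x^2), because the first equation says y = x^2.
critical_points = [(x, x**2) for x in roots]

for point in critical_points:
    print(point, fx(*point), fy(*point))
```

For each of the two points the printed partial derivatives are zero, as expected.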

And here is an example for which the algebra requires a bit more thought.
Example 16.1.8 ($f(x,y) = xy(5x + y - 15)$)
Find all critical points of $f(x,y) = xy(5x + y - 15)$.
Solution. The first order partial derivatives of $f(x,y) = xy(5x + y - 15)$ are
\begin{align*}
f_x(x,y) &= y(5x + y - 15) + xy(5) = y(5x + y - 15) + y(5x) = y(10x + y - 15) \\
f_y(x,y) &= x(5x + y - 15) + xy(1) = x(5x + y - 15) + x(y) = x(5x + 2y - 15)
\end{align*}
Therefore the partial derivatives of the function exist everywhere in the domain of the function. The
critical points are the solutions of $f_x(x,y) = f_y(x,y) = 0$. That is, we need to find all $x$, $y$ that satisfy
the pair of equations
\begin{align*}
y(10x + y - 15) &= 0 \tag{E1} \\
x(5x + 2y - 15) &= 0 \tag{E2}
\end{align*}

The first equation, y(10x + y ´ 15) = 0, is satisfied if at least one of the two factors y, (10x + y ´ 15)
is zero. So the first equation is satisfied if at least one of the two equations
y=0 (E1a)
10x + y = 15 (E1b)
is satisfied. The second equation, x(5x + 2y ´ 15) = 0, is satisfied if at least one of the two factors x,
(5x + 2y ´ 15) is zero. So the second equation is satisfied if at least one of the two equations
x=0 (E2a)
5x + 2y = 15 (E2b)
is satisfied.
So both critical point equations (E1) and (E2) are satisfied if and only if at least one of (E1a),
(E1b) is satisfied and in addition at least one of (E2a), (E2b) is satisfied. So both critical point
equations (E1) and (E2) are satisfied if and only if at least one of the following four possibilities
holds.
• (E1a) and (E2a) are satisfied if and only if $x = y = 0$
• (E1a) and (E2b) are satisfied if and only if $y = 0$, $5x + 2y = 15 \iff y = 0$, $5x = 15 \iff y = 0$, $x = 3$
• (E1b) and (E2a) are satisfied if and only if $10x + y = 15$, $x = 0 \iff y = 15$, $x = 0$
• (E1b) and (E2b) are satisfied if and only if $10x + y = 15$, $5x + 2y = 15$. We can use, for
example, the second of these equations to solve for $x$ in terms of $y$: $x = \frac{1}{5}(15 - 2y)$. When we
substitute this into the first equation we get $2(15 - 2y) + y = 15$, which we can solve for $y$.
This gives $-3y = 15 - 30$ or $y = 5$ and then $x = \frac{1}{5}(15 - 2 \times 5) = 1$.
In conclusion, the critical points are $(0,0)$, $(3,0)$, $(0,15)$ and $(1,5)$.
A more compact way to write what we have just done is
\begin{align*}
& f_x(x,y) = 0 \text{ and } f_y(x,y) = 0 \\
\iff{}& y(10x + y - 15) = 0 \text{ and } x(5x + 2y - 15) = 0 \\
\iff{}& \{y = 0 \text{ or } 10x + y = 15\} \text{ and } \{x = 0 \text{ or } 5x + 2y = 15\} \\
\iff{}& \{y = 0,\ x = 0\} \text{ or } \{y = 0,\ 5x + 2y = 15\} \text{ or } \{10x + y = 15,\ x = 0\} \text{ or } \\
& \{10x + y = 15,\ 5x + 2y = 15\} \\
\iff{}& \{x = y = 0\} \text{ or } \{y = 0,\ x = 3\} \text{ or } \{x = 0,\ y = 15\} \text{ or } \{x = 1,\ y = 5\}
\end{align*}

Example 16.1.8
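The four-way case split above can be mirrored in a few lines of code. In this sketch (our own illustration) each factor of (E1) and (E2) is stored as a linear equation $ax + by = c$ in the form `(a, b, c)`, every pairing of one factor from (E1) with one from (E2) is solved by Cramer's rule, and the resulting points are collected. The names `solve_pair`, `E1_factors` and `E2_factors` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def solve_pair(eq1, eq2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    (a1, b1, c1), (a2, b2, c2) = eq1, eq2
    det = a1 * b2 - b1 * a2
    if det == 0:
        return None  # this pairing has no unique solution
    return (Fraction(c1 * b2 - b1 * c2, det), Fraction(a1 * c2 - c1 * a2, det))


E1_factors = [(0, 1, 0), (10, 1, 15)]  # (E1a) y = 0,  (E1b) 10x + y = 15
E2_factors = [(1, 0, 0), (5, 2, 15)]   # (E2a) x = 0,  (E2b) 5x + 2y = 15

critical_points = set()
for eq1, eq2 in product(E1_factors, E2_factors):
    point = solve_pair(eq1, eq2)
    if point is not None:
        critical_points.add(point)


def fx(x, y):
    return y * (10 * x + y - 15)


def fy(x, y):
    return x * (5 * x + 2 * y - 15)


for x, y in sorted(critical_points):
    print((int(x), int(y)), fx(x, y), fy(x, y))
```

The loop visits exactly the four possibilities listed in the solution, so it finds the same four points, and both partial derivatives evaluate to zero at each of them.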

Let’s try a more practical example, something from the real world. Well, a mathematician’s
“real world”. The interested reader should search-engine their way to a discussion of “idealisation”,
“game theory”, “Cournot models” and “Bertrand models”. But don’t spend too long there. A
discussion of breweries is about to take place.
Example 16.1.9
In a certain community, there are two breweries in competition⁴, so that sales of each negatively
affect the profits of the other. If brewery A produces $x$ litres of beer per month and brewery B
produces $y$ litres per month, then the profits of the two breweries are given by
\[
P = 2x - \frac{2x^2 + y^2}{10^6} \qquad Q = 2y - \frac{4y^2 + x^2}{2 \times 10^6}
\]
respectively. Find the sum of the two profits if each brewery independently sets its own production
level to maximize its own profit and assumes that its competitor does likewise. Then, assuming
cartel behaviour, find the sum of the two profits if the two breweries cooperate so as to maximize
that sum⁵.

4 We have both types of music here: country and western.
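Solution. If A adjusts $x$ to maximize $P$ (for $y$ held fixed) and B adjusts $y$ to maximize $Q$ (for $x$ held
fixed), then we want the point $(x,y)$ at which $P_x$ and $Q_y$ both vanish, where
\[
P_x = 2 - \frac{4x}{10^6} \qquad Q_y = 2 - \frac{8y}{2 \times 10^6}
\]
Note that $P_x$ and $Q_y$ exist everywhere. Then $x$ and $y$ are determined by the equations
\begin{align*}
P_x &= 0 \tag{E1} \\
Q_y &= 0 \tag{E2}
\end{align*}
Equation (E1) yields $x = \frac{1}{2} 10^6$ and equation (E2) yields $y = \frac{1}{2} 10^6$. Knowing $x$ and $y$ we can
determine $P$, $Q$ and the total profit
\begin{align*}
P + Q &= 2(x + y) - \frac{1}{10^6}\left(\tfrac{5}{2} x^2 + 3y^2\right) \\
&= 10^6 \left(1 + 1 - \tfrac{5}{8} - \tfrac{3}{4}\right) = \tfrac{5}{8}\, 10^6
\end{align*}
On the other hand if (A, B) adjust $(x,y)$ to maximize $P + Q = 2(x+y) - \frac{1}{10^6}\left(\tfrac{5}{2} x^2 + 3y^2\right)$, then $x$ and
$y$ are determined by
\begin{align*}
(P + Q)_x &= 2 - \frac{5x}{10^6} = 0 \tag{E1} \\
(P + Q)_y &= 2 - \frac{6y}{10^6} = 0 \tag{E2}
\end{align*}
Equation (E1) yields $x = \frac{2}{5} 10^6$ and equation (E2) yields $y = \frac{1}{3} 10^6$. Again knowing $x$ and $y$ we can
determine the total profit
\begin{align*}
P + Q &= 2(x + y) - \frac{1}{10^6}\left(\tfrac{5}{2} x^2 + 3y^2\right) \\
&= 10^6 \left(\tfrac{4}{5} + \tfrac{2}{3} - \tfrac{2}{5} - \tfrac{1}{3}\right) = \tfrac{11}{15}\, 10^6
\end{align*}
So cooperating really does help their profits. Unfortunately, like a very small tea-pot, consumers
will be a little poorer⁶.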
Example 16.1.9
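The two totals in this example are easy to double-check with exact fractions. The sketch below simply evaluates the profit formulas $P$ and $Q$ from the problem statement at the two production levels found in the solution; the variable names are our own.

```python
from fractions import Fraction

M = 10**6


def P(x, y):
    # Profit of brewery A.
    return 2 * x - (2 * x**2 + y**2) / Fraction(M)


def Q(x, y):
    # Profit of brewery B.
    return 2 * y - (4 * y**2 + x**2) / Fraction(2 * M)


# Competing: x solves P_x = 0 and y solves Q_y = 0.
x_c, y_c = Fraction(M, 2), Fraction(M, 2)
total_compete = P(x_c, y_c) + Q(x_c, y_c)

# Cooperating: (x, y) solves (P + Q)_x = 0 and (P + Q)_y = 0.
x_k, y_k = Fraction(2 * M, 5), Fraction(M, 3)
total_coop = P(x_k, y_k) + Q(x_k, y_k)

print(total_compete / M)  # 5/8
print(total_coop / M)     # 11/15
```

Since $\frac{11}{15} > \frac{5}{8}$, the cooperative total is indeed the larger one.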
Moving swiftly away from the last pun, let’s do something a little more geometric.
Example 16.1.10
Equal angle bends are made at equal distances from the two ends of a 100 metre long fence so
the resulting three segment fence can be placed along an existing wall to make an enclosure of
trapezoidal shape. What is the largest possible area for such an enclosure?

5 This sort of thing is generally illegal.


6 The authors extend their deepest apologies.

[Figure: the three-segment fence placed against the wall, with both bend angles labelled $\theta$.]

Solution. This is a very geometric problem (fenced off from pun opportunities), and as such we
should start by drawing a sketch and introducing some variable names.

[Figure: the trapezoidal enclosure, with two slanted sides of length $x$ meeting the wall at angle $\theta$, a middle segment of length $100 - 2x$, and height $x \sin\theta$.]

The area enclosed by the fence is the area inside the blue rectangle (in the figure on the right above)
plus the area inside the two blue triangles.
\begin{align*}
A(x, \theta) &= (100 - 2x)\, x \sin\theta + 2 \cdot \tfrac{1}{2} \cdot x \sin\theta \cdot x \cos\theta \\
&= (100x - 2x^2) \sin\theta + x^2 \sin\theta \cos\theta
\end{align*}
To maximize the area, we need to solve
\begin{align*}
0 &= \frac{\partial A}{\partial x} = (100 - 4x) \sin\theta + 2x \sin\theta \cos\theta \\
0 &= \frac{\partial A}{\partial \theta} = (100x - 2x^2) \cos\theta + x^2 \left\{\cos^2\theta - \sin^2\theta\right\}
\end{align*}
Note that $\frac{\partial A}{\partial x}$ and $\frac{\partial A}{\partial \theta}$ are defined everywhere in their domain (so here the critical points are the points
where both partial derivatives are zero). Both terms in the first equation contain the factor $\sin\theta$ and
all terms in the second equation contain the factor $x$. If either $\sin\theta$ or $x$ is zero, the area $A(x, \theta)$ will
also be zero, and so will certainly not be maximal. So we may divide the first equation by $\sin\theta$ and
the second equation by $x$, giving
\begin{align*}
(100 - 4x) + 2x \cos\theta &= 0 \tag{E1} \\
(100 - 2x) \cos\theta + x \left\{\cos^2\theta - \sin^2\theta\right\} &= 0 \tag{E2}
\end{align*}
These equations might look a little scary. But there is no need to panic. They are not as bad as they
look because $\theta$ enters only through $\cos\theta$ and $\sin^2\theta$, which we can easily write in terms of $\cos\theta$.
Furthermore we can eliminate $\cos\theta$ by observing that the first equation forces $\cos\theta = -\frac{100 - 4x}{2x}$ and
hence $\sin^2\theta = 1 - \cos^2\theta = 1 - \frac{(100 - 4x)^2}{4x^2}$.
Substituting these into the second equation gives
\begin{align*}
& -(100 - 2x)\, \frac{100 - 4x}{2x} + x \left[\frac{(100 - 4x)^2}{2x^2} - 1\right] = 0 \\
\implies{}& -(100 - 2x)(100 - 4x) + (100 - 4x)^2 - 2x^2 = 0 \\
\implies{}& 6x^2 - 200x = 0 \\
\implies{}& x = \frac{100}{3} \qquad \cos\theta = -\frac{-100/3}{200/3} = \frac{1}{2} \qquad \theta = 60^\circ
\end{align*}
and the maximum area enclosed is
\[
A = \left(100 \cdot \frac{100}{3} - 2 \cdot \frac{100^2}{3^2}\right) \frac{\sqrt{3}}{2} + \frac{1}{2} \cdot \frac{100^2}{3^2} \cdot \frac{\sqrt{3}}{2} = \frac{2500}{\sqrt{3}}
\]

Example 16.1.10
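A quick numerical check can build confidence in this answer. The sketch below evaluates the area function $A(x,\theta)$ from the solution at $x = 100/3$, $\theta = 60^\circ$, and compares it with the best value found on a coarse grid over $0 \le x \le 50$, $0^\circ \le \theta \le 180^\circ$. The grid spacing is an arbitrary choice of ours, and a grid search is only a plausibility check, not a proof of maximality.

```python
import math


def area(x, theta):
    """Area of the trapezoidal enclosure; x in metres, theta in radians."""
    return (100 * x - 2 * x**2) * math.sin(theta) + x**2 * math.sin(theta) * math.cos(theta)


x_star, theta_star = 100 / 3, math.pi / 3
best = area(x_star, theta_star)

# Coarse grid: x = 0, 0.5, ..., 50 and theta = 0, 1, ..., 180 degrees.
grid_best = max(
    area(0.5 * i, math.radians(j))
    for i in range(101)
    for j in range(181)
)

print(best, 2500 / math.sqrt(3))  # the two values agree
print(grid_best)                  # slightly smaller than best
```

No grid point beats the critical point, and the best grid value is within a fraction of a square metre of $2500/\sqrt{3} \approx 1443.4$.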

Now here is a very useful (even practical!) statistical example — finding the line that best fits a
given collection of points.
Example 16.1.11 (Linear regression)
An experiment yields $n$ data points $(x_i, y_i)$, $i = 1, 2, \cdots, n$. We wish to find the straight line
$y = mx + b$ which “best” fits the data. The definition of “best” is “minimizes the root mean
square error”, i.e. minimizes
\[
E(m, b) = \sum_{i=1}^{n} (m x_i + b - y_i)^2
\]

[Figure: data points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, $\ldots$, $(x_n, y_n)$ scattered about the line $y = mx + b$.]

Note that
• term number $i$ in $E(m, b)$ is the square of the difference between $y_i$, which is the $i^{\text{th}}$ measured
value of $y$, and $\big[mx + b\big]_{x = x_i}$, which is the approximation to $y_i$ given by the line $y = mx + b$.

• All terms in the sum are nonnegative, regardless of whether the points $(x_i, y_i)$ are above or below
the line.

Our problem is to find the $m$ and $b$ that minimize $E(m, b)$. This technique for drawing a line through
a bunch of data points is called “linear regression”. It is used a lot⁷ ⁸. Even in the real world, and
not just the real world that you find in mathematics problems. The actual real world that involves
jobs.

7 Proof by search engine.


8 And has been used for a long time. It was introduced by the French mathematician Adrien-Marie Legendre,
1752–1833, in 1805, and by the German mathematician and physicist Carl Friedrich Gauss, 1777–1855, in 1809.

Solution. We wish to choose $m$ and $b$ so as to minimize $E(m, b)$. So we need to determine where
the partial derivatives of $E$ do not exist, or exist and are equal to zero.
\begin{align*}
\frac{\partial E}{\partial m} &= \sum_{i=1}^{n} 2(m x_i + b - y_i)\, x_i
  = m \left[\sum_{i=1}^{n} 2x_i^2\right] + b \left[\sum_{i=1}^{n} 2x_i\right] - \left[\sum_{i=1}^{n} 2x_i y_i\right] \\
\frac{\partial E}{\partial b} &= \sum_{i=1}^{n} 2(m x_i + b - y_i)
  = m \left[\sum_{i=1}^{n} 2x_i\right] + b \left[\sum_{i=1}^{n} 2\right] - \left[\sum_{i=1}^{n} 2y_i\right]
\end{align*}
There are a lot of symbols here. But remember that all of the $x_i$’s and $y_i$’s are given constants. They
come from, for example, experimental data. The only unknowns are $m$ and $b$. To emphasize this,
and to save some writing, define the constants
\[
S_x = \sum_{i=1}^{n} x_i \qquad S_y = \sum_{i=1}^{n} y_i \qquad S_{x^2} = \sum_{i=1}^{n} x_i^2 \qquad S_{xy} = \sum_{i=1}^{n} x_i y_i
\]
The partial derivatives of $E$ exist everywhere so we only need to find where they are equal to zero.
The equations which determine the critical points are (after dividing by two)
\begin{align*}
0 = S_{x^2}\, m + S_x\, b - S_{xy} &\implies S_{x^2}\, m + S_x\, b = S_{xy} \tag{E1} \\
0 = S_x\, m + n\, b - S_y &\implies S_x\, m + n\, b = S_y \tag{E2}
\end{align*}
These are two linear equations in the unknowns $m$ and $b$. They may be solved in any of the usual
ways. One is to use (E2) to solve for $b$ in terms of $m$
\[
b = \frac{1}{n}\left(S_y - S_x\, m\right) \tag{E3}
\]
and then substitute this into (E1) to get the equation
\[
S_{x^2}\, m + \frac{1}{n} S_x \left(S_y - S_x\, m\right) = S_{xy} \implies \left(n S_{x^2} - S_x^2\right) m = n S_{xy} - S_x S_y
\]
for $m$. We can then solve this equation for $m$ and substitute back into (E3) to get $b$. This gives
\[
m = \frac{n S_{xy} - S_x S_y}{n S_{x^2} - S_x^2} \qquad b = -\frac{S_x S_{xy} - S_y S_{x^2}}{n S_{x^2} - S_x^2}
\]
Another way to solve the system of equations is
\begin{align*}
n\,\text{(E1)} - S_x\,\text{(E2)}: &\quad \left[n S_{x^2} - S_x^2\right] m = n S_{xy} - S_x S_y \\
-S_x\,\text{(E1)} + S_{x^2}\,\text{(E2)}: &\quad \left[n S_{x^2} - S_x^2\right] b = -S_x S_{xy} + S_y S_{x^2}
\end{align*}
which gives the same solution.

So given a bunch of data points, it only takes a quick bit of arithmetic (no calculus required)
to apply the above formulae and so to find the best fitting line. Of course while you don’t need
any calculus to apply the formulae, you do need calculus to understand where they came from. The
same technique can be extended to other types of curve fitting problems. For example, polynomial
regression.
Example 16.1.11
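The closing formulae for $m$ and $b$ translate directly into code. The following is a minimal sketch: the function name `best_fit_line` and the sample data are our own, and the guard against a zero denominator (which occurs when all the $x_i$ are equal) is an addition that the example does not discuss.

```python
def best_fit_line(points):
    """Return (m, b) for the least-squares line y = m*x + b through the given (x, y) pairs."""
    n = len(points)
    Sx = sum(x for x, _ in points)
    Sy = sum(y for _, y in points)
    Sx2 = sum(x * x for x, _ in points)
    Sxy = sum(x * y for x, y in points)

    denom = n * Sx2 - Sx**2
    if denom == 0:
        # All x-values coincide, so (E1) and (E2) do not determine a unique line.
        raise ValueError("need at least two distinct x-values")

    m = (n * Sxy - Sx * Sy) / denom
    b = (Sy * Sx2 - Sx * Sxy) / denom
    return m, b


# Points lying exactly on y = 2x + 1: the fitted line recovers it.
exact_fit = best_fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])
print(exact_fit)  # (2.0, 1.0)

# Points that are not collinear: here m = 1/2 and b = 1/6.
noisy_fit = best_fit_line([(0, 0), (1, 1), (2, 1)])
print(noisy_fit)
```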

16.1.2 §§ Classifying critical points

Learning Objectives
• Use the second derivative test to classify critical points as either local maximums, local
minimums, or saddle points.

• Explain using words or pictures what a saddle point is.

Now let’s start thinking about how to tell if a critical point is a local minimum, local maximum, or
neither. We’ll start with an intuitive approach, then introduce the (multivariable) Second Derivative
Test.
You have already encountered single variable functions that have a critical point which is neither
a local max nor a local min. This can also happen for functions of two variables. We’ll start with the
simplest possible such example.

Example 16.1.12 $f(x,y) = x^2 - y^2$
The first partial derivatives of $f(x,y) = x^2 - y^2$ are $f_x(x,y) = 2x$ and $f_y(x,y) = -2y$. So the only
critical point of this function is $(0,0)$. Is this a local minimum or maximum? Well let’s start with
$(x,y)$ at $(0,0)$ and then move $(x,y)$ away from $(0,0)$ and see if $f(x,y)$ gets bigger or smaller. At the
origin $f(0,0) = 0$. Of course we can move $(x,y)$ away from $(0,0)$ in many different directions.
• First consider moving $(x,y)$ along the $x$-axis. Then $(x,y) = (x,0)$ and $f(x,y) = f(x,0) = x^2$.
So when we start with $x = 0$ and then increase $x$, the value of the function $f$ increases,
which means that $(0,0)$ cannot be a local maximum for $f$.
• Next let’s move $(x,y)$ away from $(0,0)$ along the $y$-axis. Then $(x,y) = (0,y)$ and $f(x,y) =
f(0,y) = -y^2$. So when we start with $y = 0$ and then increase $y$, the value of the function $f$
decreases, which means that $(0,0)$ cannot be a local minimum for $f$.
So moving away from $(0,0)$ in one direction causes the value of $f$ to increase, while moving away
from $(0,0)$ in a second direction causes the value of $f$ to decrease. Consequently $(0,0)$ is neither
a local minimum nor a local maximum for $f$. It is called a saddle point, because the graph of $f$ looks like
a saddle. (The full definition of “saddle point” is given immediately after this example.) Here are
some figures showing the graph of $f$.

The figure below shows some level curves of $f$. Observe from the level curves that
• $f$ increases as you leave $(0,0)$ walking along the $x$ axis
• $f$ decreases as you leave $(0,0)$ walking along the $y$ axis

[Figure: level curves of $f(x,y) = x^2 - y^2$ for $f = 0, \pm 1, \pm 4, \pm 9$. The curves $f = 1, 4, 9$ cross the $x$-axis and the curves $f = -1, -4, -9$ cross the $y$-axis.]

Example 16.1.12

Approximately speaking, if a critical point (a, b) is neither a local minimum nor a local maximum,
then it is a saddle point. For (a, b) to not be a local minimum, f has to take values smaller than
f (a, b) at some points nearby (a, b). For (a, b) to not be a local maximum, f has to take values
bigger than f (a, b) at some points nearby (a, b). Writing this more mathematically we get the
following definition.

Definition 16.1.13.

The critical point $(a,b)$ is called a saddle point for the function $f(x,y)$ if, for each $r > 0$,

• there is at least one point $(x,y)$, within a distance $r$ of $(a,b)$, for which $f(x,y) >
f(a,b)$ and

• there is at least one point $(x,y)$, within a distance $r$ of $(a,b)$, for which $f(x,y) <
f(a,b)$.

Understanding what the graph of a function looks like is a powerful tool for classifying critical
points, but it can be very time-consuming. The Second Derivative Test (below) is a more algebraic
approach to classification. This test is often faster than graphing, but the drawback is that it is
sometimes inconclusive.

Theorem 16.1.14 (Second Derivative Test).

Let $r > 0$ and assume that all second order derivatives of the function $f(x,y)$ are continuous
at all points $(x,y)$ that are within a distance $r$ of $(a,b)$. Assume that $f_x(a,b) = f_y(a,b) = 0$.
Define
\[
D(x,y) = f_{xx}(x,y)\, f_{yy}(x,y) - f_{xy}(x,y)^2
\]
It is called the discriminant of $f$. Then

• if $D(a,b) > 0$ and $f_{xx}(a,b) > 0$, then $f(x,y)$ has a local minimum at $(a,b)$,

• if $D(a,b) > 0$ and $f_{xx}(a,b) < 0$, then $f(x,y)$ has a local maximum at $(a,b)$,

• if $D(a,b) < 0$, then $f(x,y)$ has a saddle point at $(a,b)$, but

• if $D(a,b) = 0$, then we cannot draw any conclusions without more work.
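The four bullets of the theorem amount to a small decision procedure. As an illustrative sketch, the hypothetical helper `classify` below takes the three second order partial derivatives evaluated at a critical point and returns the verdict of the test. It assumes that the hypotheses of the theorem (continuous second derivatives near the point, and $f_x = f_y = 0$ there) have already been checked by hand.

```python
def classify(fxx, fyy, fxy):
    """Second derivative test at a critical point, given the second partials evaluated there."""
    D = fxx * fyy - fxy**2
    if D > 0:
        # D > 0 forces fxx and fyy to be nonzero and of the same sign.
        return "local minimum" if fxx > 0 else "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"


# f = x^2 - y^2 at (0, 0): fxx = 2, fyy = -2, fxy = 0.
print(classify(2, -2, 0))   # saddle point

# Example 16.1.6 at (1, 2): fxx = 2, fyy = 4, fxy = -2.
print(classify(2, 4, -2))   # local minimum

# f = x^4 + y^4 at (0, 0): all second partials vanish.
print(classify(0, 0, 0))    # inconclusive
```

The second call uses the second derivatives of the polynomial in Example 16.1.6, which are constant, so its only critical point $(1,2)$ is a local minimum.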

The proof of Theorem 16.1.14 is beyond the scope of Math 105, but there is some intuition
supporting it that is more accessible. Extremely informally, we can think of saddle points as places
with inconsistent concavity: in some directions the surface looks concave up, in other directions
it looks concave down. On the other hand, at a local extremum, the concavity is the same in all
directions.
Let’s do thought experiments on a few simple cases to expand those ideas.
Example 16.1.15 (Second Derivative Test Intuition)
Let $(a,b)$ be a critical point of the function $f(x,y)$ with $f_x(a,b) = f_y(a,b) = 0$, and assume all
second-order derivatives of $f(x,y)$ are continuous.
1. Suppose at $(a,b)$, the surface looks like a minimum if $y$ is held constant, but it looks like a
maximum if $x$ is held constant. (In particular, this means $(a,b)$ is the location of a saddle
point.)

[Figure: a saddle-shaped surface with the point $(a, b, f(a,b))$ marked.]

Holding $y = b$ constant, we can think of $z = f(x,b)$ as a one-variable function, in which case
$f_{xx}(a,b) \ge 0$ by the single-variable second derivative test. Holding $x = a$ constant, we can
think of $z = f(a,y)$ as a one-variable function (whose variable is $y$). In that case, $f_{yy}(a,b) \le 0$
by the single-variable second derivative test.

[Figure: the cross-section curves $z = f(x,b)$ and $z = f(a,y)$.]

Since $f_{xx}(a,b)$ and $f_{yy}(a,b)$ have different signs (or at least one of them is zero):
\begin{align*}
f_{xx}(a,b)\, f_{yy}(a,b) &\le 0 \\
f_{xx}(a,b)\, f_{yy}(a,b) - f_{xy}(a,b)^2 &\le -f_{xy}(a,b)^2 \le 0 \\
D(a,b) &\le 0
\end{align*}
So in this simple saddle-point example, we expect $D(a,b) \le 0$. This accords with the third
bullet point in Theorem 16.1.14.

2. Suppose $D(a,b) > 0$.
\begin{align*}
0 &< f_{xx}(a,b)\, f_{yy}(a,b) - f_{xy}(a,b)^2 \\
f_{xy}(a,b)^2 &< f_{xx}(a,b)\, f_{yy}(a,b)
\end{align*}
Since $f_{xy}(a,b)$ is squared, $f_{xy}(a,b)^2$ is nonnegative.
\begin{align*}
0 \le f_{xy}(a,b)^2 &< f_{xx}(a,b)\, f_{yy}(a,b) \\
0 &< f_{xx}(a,b)\, f_{yy}(a,b)
\end{align*}
This tells us that $f_{xx}(a,b)$ and $f_{yy}(a,b)$ have the same sign: either they’re both positive or
they’re both negative. So, the function’s concavity is the same whether we hold the $x$-value or
the $y$-value constant. The function might have the same concavity in all directions, unlike the
saddle point example we saw above. So, it seems plausible that critical points with positive
discriminants are local extrema, rather than saddle points.

3. Suppose the surface has a local maximum at $(a,b)$.

Holding $y = b$ constant, we can think of $z = f(x,b)$ as a one-variable function, in which case
$f_{xx}(a,b) \le 0$ by the single-variable second derivative test.

[Figure: the surface $z = f(x,y)$ with a local maximum above $(a,b)$, next to its cross-section $z = f(x,b)$.]

This doesn’t go so far as to show us that $D(a,b) \ge 0$, but it does accord with the test of
$f_{xx}(a,b)$ in the second bullet point of Theorem 16.1.14.

4. Similarly, suppose the surface has a local minimum at $(a,b)$.

Holding $y = b$ constant, we can think of $z = f(x,b)$ as a one-variable function, in which case
$f_{xx}(a,b) \ge 0$ by the single-variable second derivative test.

[Figure: the surface $z = f(x,y)$ with a local minimum above $(a,b)$, next to its cross-section $z = f(x,b)$.]

Again, although this doesn’t go so far as to show us that $D(a,b) \ge 0$, it does accord with the
test of $f_{xx}(a,b)$ in the first bullet point of Theorem 16.1.14.

Example 16.1.15

You might wonder why, in the local maximum/local minimum cases of Theorem 16.1.14,
fxx (a, b) appears rather than fyy (a, b). The answer is only that x is before y in the alphabet9 . You
can use fyy (a, b) just as well as fxx (a, b). The reason is that if D(a, b) ą 0 (as in the first two bullets
of the theorem), then because D(a, b) = fxx (a, b) fyy (a, b) ´ fxy (a, b)2 ą 0, we necessarily have
fxx (a, b) fyy (a, b) ą 0 so that fxx (a, b) and fyy (a, b) must have the same sign — either both are
positive or both are negative.
You might also wonder why we cannot draw any conclusions when $D(a, b) = 0$ and what
happens then. The second derivative test for functions of two variables is derived in precisely the
same way as the second derivative test for functions of one variable: you approximate
the function by a polynomial that is of degree two in $(x - a)$, $(y - b)$ and then you analyze the
behaviour of the quadratic polynomial near $(a, b)$. For this to work, the contributions to $f(x, y)$ from
terms that are of degree two in $(x - a)$, $(y - b)$ had better be bigger than the contributions to $f(x, y)$
from terms that are of degree three and higher in $(x - a)$, $(y - b)$ when $(x - a)$, $(y - b)$ are really
small. If this is not the case, for example when the terms in $f(x, y)$ that are of degree two in $(x - a)$,
$(y - b)$ all have coefficients that are exactly zero, the analysis will certainly break down. That’s
exactly what happens when $D(a, b) = 0$. Here are some examples. The functions

$f_1(x, y) = x^4 + y^4$    $f_2(x, y) = -x^4 - y^4$    $f_3(x, y) = x^3 + y^3$    $f_4(x, y) = x^4 - y^4$

all have $(0, 0)$ as the only critical point and all have $D(0, 0) = 0$. The first, $f_1$, has its minimum there.
The second, $f_2$, has its maximum there. The third and fourth have a saddle point there.
Here are sketches of some level curves for each of these four functions (with all renamed to
simply $f$).

9 The shackles of convention are not limited to mathematics. Election ballots often have the candidates listed in
alphabetical order.

[Figure: four panels of level curves. Top left: $f(x, y) = x^4 + y^4$ (levels $f = 0, 0.1, 1, 4, 9$). Top right: $f(x, y) = -x^4 - y^4$ (levels $f = 0, -0.1, -1, -4, -9$). Bottom left: $f(x, y) = x^3 + y^3$ (levels $f = -4, -1, 0, 1, 4$). Bottom right: $f(x, y) = x^4 - y^4$ (levels $f = -4, -1, 0, 1, 4$).]
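As a rough numerical companion to these four examples, the following sketch samples each function on a small grid around $(0, 0)$ and compares the sampled values with $f(0, 0) = 0$. The helper name `classify_near_origin`, the grid radius and the grid resolution are illustrative choices, not part of the text, and a finite sample can only suggest (not prove) the behaviour near the critical point.

```python
def classify_near_origin(f, radius=0.1, steps=20):
    """Compare f at grid points around (0, 0) with the value f(0, 0)."""
    base = f(0.0, 0.0)
    diffs = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            if i == 0 and j == 0:
                continue  # skip the origin itself
            diffs.append(f(radius * i / steps, radius * j / steps) - base)
    if all(d > 0 for d in diffs):
        return "local min"      # every sampled neighbour is higher
    if all(d < 0 for d in diffs):
        return "local max"      # every sampled neighbour is lower
    if any(d > 0 for d in diffs) and any(d < 0 for d in diffs):
        return "saddle point"   # higher in some directions, lower in others
    return "inconclusive"


# The four functions from the text; all of them have D(0, 0) = 0.
functions = {
    "f1": lambda x, y: x**4 + y**4,
    "f2": lambda x, y: -x**4 - y**4,
    "f3": lambda x, y: x**3 + y**3,
    "f4": lambda x, y: x**4 - y**4,
}

results = {name: classify_near_origin(f) for name, f in functions.items()}
for name, kind in results.items():
    print(name, kind)
```

The point of the sketch is that all three behaviours occur even though the second derivative test reports $D(0, 0) = 0$ in every case.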


Example 16.1.16 ($f(x, y) = 2x^3 - 6xy + y^2 + 4y$)
Find and classify all critical points of $f(x, y) = 2x^3 - 6xy + y^2 + 4y$.
Solution. Thinking a little way ahead, to find the critical points we will need the first order partial
derivatives. To apply the second derivative test of Theorem 16.1.14 we will need all second order
partial derivatives. So we need all partial derivatives of order up to two. Here they are.

$f = 2x^3 - 6xy + y^2 + 4y$
$f_x = 6x^2 - 6y$    $f_{xx} = 12x$    $f_{xy} = -6$
$f_y = -6x + 2y + 4$    $f_{yy} = 2$    $f_{yx} = -6$

(Of course, $f_{xy}$ and $f_{yx}$ have to be the same. It is still useful to compute both, as a way to catch some
mechanical errors.)
We have already found, in Example 16.1.7, that the critical points are $(1, 1)$ and $(2, 4)$. The
classification is

critical point    $f_{xx} f_{yy} - f_{xy}^2$          $f_{xx}$    type
$(1, 1)$          $12 \times 2 - (-6)^2 < 0$                      saddle point
$(2, 4)$          $24 \times 2 - (-6)^2 > 0$          $24$        local min
We were able to leave the $f_{xx}$ entry in the top row blank, because
• we knew that $f_{xx}(1, 1) f_{yy}(1, 1) - f_{xy}(1, 1)^2 < 0$, and

• we knew, from Theorem 16.1.14, that $f_{xx}(1, 1) f_{yy}(1, 1) - f_{xy}(1, 1)^2 < 0$, by itself, was enough
to ensure that $(1, 1)$ was a saddle point.


Here is a sketch of some level curves of our $f(x, y)$. They are not needed to answer this question,
but can give you some idea as to what the graph of $f$ looks like.

[Figure: level curves $f = 0, 0.25, 0.5, 1, 2, 3$, with the critical points marked: $(1, 1)$, where $f(1, 1) = 1$, and $(2, 4)$, where $f(2, 4) = 0$.]
Example 16.1.16
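The classification table of Example 16.1.16 can be reproduced mechanically. The sketch below hard-codes the partial derivatives computed in the example and applies the second derivative test; the function names `gradient`, `second_partials` and `classify` are illustrative choices rather than anything defined in the text.

```python
def gradient(x, y):
    """First order partials of f(x, y) = 2x^3 - 6xy + y^2 + 4y."""
    return 6 * x**2 - 6 * y, -6 * x + 2 * y + 4


def second_partials(x, y):
    """Return (f_xx, f_yy, f_xy) as computed in the example."""
    return 12 * x, 2, -6


def classify(fxx, fyy, fxy):
    """Second derivative test: look at D = f_xx f_yy - f_xy^2, then at f_xx."""
    D = fxx * fyy - fxy**2
    if D > 0:
        return "local min" if fxx > 0 else "local max"
    if D < 0:
        return "saddle point"
    return "inconclusive"


classification = {}
for point in [(1, 1), (2, 4)]:
    # Both partial derivatives should vanish at a critical point.
    assert gradient(*point) == (0, 0)
    classification[point] = classify(*second_partials(*point))

print(classification)
```

Note that `classify` never needs $f_{xx}$ when $D < 0$, which mirrors the blank entry in the top row of the table.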

Example 16.1.17 ($f(x, y) = xy(5x + y - 15)$)

Find and classify all critical points of $f(x, y) = xy(5x + y - 15)$.
Solution. We have already computed the first order partial derivatives

$f_x(x, y) = y(10x + y - 15)$    $f_y(x, y) = x(5x + 2y - 15)$

of $f(x, y)$ in Example 16.1.8. Again, to classify the critical points we need the second order partial
derivatives. They are

$f_{xx}(x, y) = 10y$
$f_{yy}(x, y) = 2x$
$f_{xy}(x, y) = (1)(10x + y - 15) + y(1) = 10x + 2y - 15$
$f_{yx}(x, y) = (1)(5x + 2y - 15) + x(5) = 10x + 2y - 15$

(Once again, we have computed both fxy and fyx to guard against mechanical errors.) We have
already found, in Example 16.1.8, that the critical points are (0, 0), (0, 15), (3, 0) and (1, 5). The
classification is
critical point    $f_{xx} f_{yy} - f_{xy}^2$           $f_{xx}$    type
$(0, 0)$          $0 \times 0 - (-15)^2 < 0$                       saddle point
$(0, 15)$         $150 \times 0 - 15^2 < 0$                        saddle point
$(3, 0)$          $0 \times 6 - 15^2 < 0$                          saddle point
$(1, 5)$          $50 \times 2 - 5^2 > 0$              $50$        local min

Here is a sketch of some level curves of our $f(x, y)$. $f$ is negative in the shaded regions and $f$ is
positive in the unshaded regions. Again this is not needed to answer this question, but can give you
some idea as to what the graph of $f$ looks like.

[Figure: level curves $f = -20, -10, 0, 20$, with the critical points marked: $(0, 0)$, $(3, 0)$ and $(0, 15)$, where $f = 0$, and $(1, 5)$, where $f(1, 5) = -25$.]


Example 16.1.17
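Computing both $f_{xy}$ and $f_{yx}$ is one guard against mechanical errors. Another, sketched below, is to approximate the second order partial derivatives numerically straight from the formula for $f$ and compare with the table. The central-difference helper `hessian_entries` and the step size `h` are assumptions made for this illustration; for a cubic polynomial such as this $f$ the central differences are exact up to rounding.

```python
def f(x, y):
    return x * y * (5 * x + y - 15)


def hessian_entries(f, x, y, h=1e-3):
    """Approximate (f_xx, f_yy, f_xy) at (x, y) with central differences."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fyy, fxy


def classify(fxx, fyy, fxy):
    """Second derivative test based on D = f_xx f_yy - f_xy^2."""
    D = fxx * fyy - fxy**2
    if D > 0:
        return "local min" if fxx > 0 else "local max"
    if D < 0:
        return "saddle point"
    return "inconclusive"


critical_points = [(0, 0), (0, 15), (3, 0), (1, 5)]
numeric_D = {}
numeric_type = {}
for p in critical_points:
    fxx, fyy, fxy = hessian_entries(f, *p)
    numeric_D[p] = fxx * fyy - fxy**2
    numeric_type[p] = classify(fxx, fyy, fxy)
    print(p, round(numeric_D[p], 3), numeric_type[p])
```

The numerical values of $D$ should agree with the hand-computed entries $-225$, $-225$, $-225$ and $75$ to within rounding error.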

Example 16.1.18
Find and classify all of the critical points of $f(x, y) = x^3 + xy^2 - 3x^2 - 4y^2 + 4$.
Solution. We know the drill now. We start by computing all of the partial derivatives of $f$ up to order
2.

$f = x^3 + xy^2 - 3x^2 - 4y^2 + 4$
$f_x = 3x^2 + y^2 - 6x$    $f_{xx} = 6x - 6$    $f_{xy} = 2y$
$f_y = 2xy - 8y$    $f_{yy} = 2x - 8$    $f_{yx} = 2y$

$f_x$ and $f_y$ are defined everywhere. So the critical points are the solutions of $f_x = 0$, $f_y = 0$. That
is

$f_x = 3x^2 + y^2 - 6x = 0$    (E1)
$f_y = 2y(x - 4) = 0$    (E2)

The second equation, $2y(x - 4) = 0$, is satisfied if and only if at least one of the two equations $y = 0$
and $x = 4$ is satisfied.

• When $y = 0$, equation (E1) forces $x$ to obey

$0 = 3x^2 + 0^2 - 6x = 3x(x - 2)$

so that $x = 0$ or $x = 2$.

• When $x = 4$, equation (E1) forces $y$ to obey

$0 = 3 \times 4^2 + y^2 - 6 \times 4 = 24 + y^2$

which is impossible.

So, there are two critical points: $(0, 0)$ and $(2, 0)$. Here is a table that classifies the critical points.

critical point    $f_{xx} f_{yy} - f_{xy}^2$              $f_{xx}$      type
$(0, 0)$          $(-6) \times (-8) - 0^2 > 0$            $-6 < 0$      local max
$(2, 0)$          $6 \times (-4) - 0^2 < 0$                             saddle point

Example 16.1.18
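The case analysis in Example 16.1.18 (either $y = 0$ or $x = 4$) translates directly into a short computation. In the sketch below, `real_roots_quadratic` is an illustrative helper for solving the one-variable equation left over in each case; it is not something defined in the text.

```python
import math


def real_roots_quadratic(a, b, c):
    """Real solutions of a t^2 + b t + c = 0 (a != 0), as a sorted list."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real solutions
    r = math.sqrt(disc)
    return sorted({(-b - r) / (2 * a), (-b + r) / (2 * a)})


critical_points = []

# Case y = 0: (E1) becomes 3x^2 - 6x = 0.
for x in real_roots_quadratic(3, -6, 0):
    critical_points.append((x, 0.0))

# Case x = 4: (E1) becomes y^2 + 24 = 0, which has no real solution.
for y in real_roots_quadratic(1, 0, 24):
    critical_points.append((4.0, y))

# Classify with f_xx = 6x - 6, f_yy = 2x - 8, f_xy = 2y.
types = {}
for (x, y) in critical_points:
    fxx, fyy, fxy = 6 * x - 6, 2 * x - 8, 2 * y
    D = fxx * fyy - fxy**2
    if D > 0:
        types[(x, y)] = "local min" if fxx > 0 else "local max"
    elif D < 0:
        types[(x, y)] = "saddle point"
    else:
        types[(x, y)] = "inconclusive"

print(critical_points)
print(types)
```

The empty result in the case $x = 4$ corresponds to the "which is impossible" step of the worked solution.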

Example 16.1.19
A manufacturer wishes to make an open rectangular box of given volume V using the least possible
material. Find the design specifications.
Solution. Denote by $x$, $y$ and $z$ the length, width and height, respectively, of the box.

[Figure: an open rectangular box with base dimensions $x$ and $y$.]

The box has two sides of area $xz$, two sides of area $yz$ and a bottom of area $xy$. So the total surface
area of material used is
$S = 2xz + 2yz + xy$

However the three dimensions $x$, $y$ and $z$ are not independent. The requirement that the box have
volume $V$ imposes the constraint
$xyz = V$
We can use this constraint to eliminate one variable. Since $z$ is at the end of the alphabet (poor $z$),
we eliminate $z$ by substituting $z = \frac{V}{xy}$. Note that if $x$ (or $y$) is equal to zero then the volume of the
box would equal zero. What is the point of a box with zero volume?! So if we assume the box has
non-zero volume then $x \ne 0$ and $y \ne 0$. So we have to find the values of $x$ and $y$ that minimize the
function
$S(x, y) = \frac{2V}{y} + \frac{2V}{x} + xy$
Let’s start by finding the critical points of $S$. The first order partial derivatives are
$S_x(x, y) = -\frac{2V}{x^2} + y$
$S_y(x, y) = -\frac{2V}{y^2} + x$
Note that the partial derivatives are not defined when $x = 0$ or $y = 0$, but we have already eliminated
the case where $x$ or $y$ is equal to zero. So $(x, y)$ is a critical point if and only if

$x^2 y = 2V$    (E1)
$x y^2 = 2V$    (E2)

Solving (E1) for $y$ gives $y = \frac{2V}{x^2}$. Substituting this into (E2) gives

$x \frac{4V^2}{x^4} = 2V \implies x^3 = 2V \implies x = \sqrt[3]{2V}$ and $y = \frac{2V}{(2V)^{2/3}} = \sqrt[3]{2V}$

As there is only one critical point, we would expect it to give the minimum¹⁰. But let’s use the
second derivative test to verify that the critical point is at least a local minimum. The various second
partial derivatives are

$S_{xx}(x, y) = \frac{4V}{x^3}$    $S_{xx}(\sqrt[3]{2V}, \sqrt[3]{2V}) = 2$
$S_{xy}(x, y) = 1$    $S_{xy}(\sqrt[3]{2V}, \sqrt[3]{2V}) = 1$
$S_{yy}(x, y) = \frac{4V}{y^3}$    $S_{yy}(\sqrt[3]{2V}, \sqrt[3]{2V}) = 2$

So

$S_{xx}(\sqrt[3]{2V}, \sqrt[3]{2V}) \, S_{yy}(\sqrt[3]{2V}, \sqrt[3]{2V}) - S_{xy}(\sqrt[3]{2V}, \sqrt[3]{2V})^2 = 3 > 0$    $S_{xx}(\sqrt[3]{2V}, \sqrt[3]{2V}) = 2 > 0$

and, by Theorem 16.1.14.b, $(\sqrt[3]{2V}, \sqrt[3]{2V})$ is a local minimum and the desired dimensions are

$x = y = \sqrt[3]{2V}$    $z = \sqrt[3]{\frac{V}{4}}$

10 Indeed one can use the facts that $0 < x < \infty$, that $0 < y < \infty$, and that $S \to \infty$ as $x \to 0$ and as $y \to 0$ and as $x \to \infty$
and as $y \to \infty$ to prove that the single critical point gives the global minimum.
O PTIMIZATION OF M ULTIVARIABLE F UNCTIONS 16.2 A BSOLUTE MINIMA AND MAXIMA

Note that our solution has x = y. That’s a good thing — the function S(x, y) is symmetric in x and y.
Because the box has no top, the symmetry does not extend to z.
Example 16.1.19
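To get a feel for the answer of Example 16.1.19, one can plug in a concrete volume. The sketch below assumes $V = 4$ (a value chosen only for illustration, since the example leaves $V$ general), evaluates the formulas for the optimal dimensions, and compares the resulting surface area with a coarse grid search over other base dimensions.

```python
V = 4.0  # hypothetical volume, chosen only for illustration


def surface_area(x, y, V):
    """S(x, y) = 2V/y + 2V/x + xy, after eliminating z = V / (xy)."""
    return 2 * V / y + 2 * V / x + x * y


# Dimensions predicted by the worked solution.
side = (2 * V) ** (1 / 3)      # x = y = cube root of 2V
height = V / (side * side)     # z = V / (xy)
best_area = surface_area(side, side, V)

# Both first order partial derivatives should (nearly) vanish there.
S_x = -2 * V / side**2 + side
S_y = -2 * V / side**2 + side

# Coarse grid search over other base dimensions, for comparison.
grid = [0.1 * k for k in range(1, 61)]
grid_min = min(surface_area(x, y, V) for x in grid for y in grid)

print(side, height, best_area, grid_min)
```

With this choice of $V$ the base is a square of side $2$ and the height is $1$, in agreement with $z = \sqrt[3]{V/4}$; no grid point should do better than the critical point.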

16.2 Absolute minima and maxima

Learning Objectives
• Reduce a constrained optimization problem in 3D, where the constraint is a single
function (possibly with endpoints, possibly not), to a single-variable calculus problem.

• Understand that the global extrema of a two-variable function over a closed region
occur along the boundary and/or at critical points of the interior.

• Find the extreme values for a function of two variables on a closed region in cases
where optimization on the boundary can be reduced to a single-variable calculus
problem.

Of course a local maximum or minimum of a function need not be the absolute maximum or
minimum. We’ll now consider how to find the absolute maximum and minimum. Let’s start by
reviewing how one finds the absolute maximum and minimum of a function of one variable on an
interval.
For concreteness, let’s suppose that we want to find the extremal¹¹ values of a function $f(x)$ on
the interval $0 \le x \le 1$. If an extremal value is attained at some $x = a$ which is in the interior of the
interval, i.e. if $0 < a < 1$, then $a$ is also a local maximum or minimum and so has to be a critical
point of $f$. But if an extremal value is attained at a boundary point $a$ of the interval, i.e. if $a = 0$
or $a = 1$, then $a$ need not be a critical point of $f$. This happens, for example, when $f(x) = x$. The
largest value of $f(x)$ on the interval $0 \le x \le 1$ is $1$ and is attained at $x = 1$, but $f'(x) = 1$ is never
zero, so that $f$ has no critical points.

[Figure: graph of $y = f(x) = x$ on the interval $0 \le x \le 1$.]

So to find the maximum and minimum of the function $f(x)$ on the interval $[0, 1]$, you:
1. build up a list of all candidate points $0 \le a \le 1$ at which the maximum or minimum could be
attained, by finding all $a$’s for which either

11 Recall that “extremal value” means “either maximum value or minimum value”.

(a) $0 < a < 1$ and $f'(a)$ does not exist or
(b) $0 < a < 1$ and $f'(a) = 0$ or
(c) $a$ is a boundary point, i.e. $a = 0$ or $a = 1$;
2. and then you evaluate f (a) at each a on the list of candidates. The biggest of these candidate
values of f (a) is the absolute maximum and the smallest of these candidate values is the
absolute minimum.
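The two-step procedure above can be sketched in a few lines. The helper `extrema_on_interval` is an illustrative name, and it assumes that the interior candidates (critical points and points where the derivative fails to exist) have already been found by hand. It is applied to $f(x) = x$ from the discussion above and to $x - x^3$ on $[0, 1]$, a function that comes up again later in this section.

```python
import math


def extrema_on_interval(f, a, b, interior_candidates):
    """Evaluate f at the endpoints and at the supplied interior candidates.

    Returns ((max value, where), (min value, where)).
    """
    # Step 1: the candidate list is the endpoints plus interior candidates.
    candidates = [a, b] + [c for c in interior_candidates if a < c < b]
    # Step 2: evaluate f at every candidate and compare.
    values = [(f(c), c) for c in candidates]
    return max(values), min(values)


# f(x) = x has no critical points, so only the endpoints are candidates.
identity_max, identity_min = extrema_on_interval(lambda x: x, 0, 1, [])

# x - x^3 has derivative 1 - 3x^2, which vanishes at x = +/- 1/sqrt(3);
# only the positive root lies inside [0, 1].
cubic_max, cubic_min = extrema_on_interval(
    lambda x: x - x**3, 0, 1, [1 / math.sqrt(3), -1 / math.sqrt(3)]
)

print(identity_max, identity_min)
print(cubic_max, cubic_min)
```

The first call finds both extrema at boundary points, which is exactly the situation a search restricted to critical points would miss.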
The procedure for finding the maximum and minimum of a function of two variables $f(x, y)$ in a
set like, for example, the unit disk $x^2 + y^2 \le 1$, is similar. You again:
1. build up a list of all candidate points $(a, b)$ in the set at which the maximum or minimum
could be attained, by finding all $(a, b)$’s for which either¹²
(a) $(a, b)$ is in the interior of the set and $f_x(a, b)$ or $f_y(a, b)$ does not exist or
(b) $(a, b)$ is in the interior of the set (for our example, $a^2 + b^2 < 1$) and $f_x(a, b) = f_y(a, b) = 0$
or
(c) $(a, b)$ is a boundary¹³ point (for our example, $a^2 + b^2 = 1$), and could give the maximum
or minimum on the boundary (more about this shortly),
2. and then you evaluate $f(a, b)$ at each $(a, b)$ on the list of candidates. The biggest of these
candidate values of $f(a, b)$ is the absolute maximum and the smallest of these candidate values
is the absolute minimum.
The boundary of a set in $\mathbb{R}^2$ (like $x^2 + y^2 \le 1$) is a curve (like $x^2 + y^2 = 1$). This curve is a one
dimensional set, meaning that it is like a deformed $x$-axis. We can find the maximum and minimum
of $f(x, y)$ on this curve by converting $f(x, y)$ into a function of one variable (on the curve) and using
the standard function of one variable techniques. This is best explained by some examples.
Example 16.2.1
Find the maximum and minimum values of $f(x, y) = x^3 + xy^2 - 3x^2 - 4y^2 + 4$ on the disk $x^2 + y^2 \le 1$.
Solution. Again, we first find all critical points, and then we analyze the boundary.
Interior: If $f$ takes its maximum or minimum value at a point in the interior, $x^2 + y^2 < 1$, then that
point must be a critical point of $f$. To find the critical points¹⁴ we compute the first order derivatives.
$f_x = 3x^2 + y^2 - 6x$    $f_y = 2xy - 8y$
These are polynomials (in two variables) and they are defined everywhere. So the critical points are
the solutions of
$f_x = 3x^2 + y^2 - 6x = 0$    (E1)
$f_y = 2y(x - 4) = 0$    (E2)

12 This is probably a good time to review the statement of Theorem 16.1.2.


13 It should be intuitively obvious from a sketch that the boundary of the disk $x^2 + y^2 \le 1$ is the circle $x^2 + y^2 = 1$. But if
you really need a formal definition, here it is. A point $(a, b)$ is on the boundary of a set $S$ if there is a sequence of
points in $S$ that converges to $(a, b)$ and there is also a sequence of points in the complement of $S$ that converges to
$(a, b)$.
14 We actually found the critical points in Example 16.1.18. But, for the convenience of the reader, we’ll repeat that
here.

The second equation, $2y(x - 4) = 0$, is satisfied if and only if at least one of the two equations $y = 0$
and $x = 4$ is satisfied.
• When $y = 0$, equation (E1) forces $x$ to obey
$0 = 3x^2 + 0^2 - 6x = 3x(x - 2)$
so that $x = 0$ or $x = 2$.
• When $x = 4$, equation (E1) forces $y$ to obey
$0 = 3 \times 4^2 + y^2 - 6 \times 4 = 24 + y^2$
which is impossible.
So, there are only two critical points: $(0, 0)$ and $(2, 0)$.
Boundary: Our boundary is $x^2 + y^2 = 1$. We know that $(x, y)$ satisfies $x^2 + y^2 = 1$, and hence
$y^2 = 1 - x^2$. Examining the formula for $f(x, y)$, we see that it contains only even¹⁵ powers of $y$, so
we can eliminate $y$ by substituting $y^2 = 1 - x^2$ into the formula.
$f = x^3 + x(1 - x^2) - 3x^2 - 4(1 - x^2) + 4 = x + x^2$
The max and min of $x + x^2$ for $-1 \le x \le 1$ must occur either
• when $x = -1$ ($\Rightarrow y = 0$, $f = 0$) or
• when $x = +1$ ($\Rightarrow y = 0$, $f = 2$) or
• when $0 = \frac{d}{dx}(x + x^2) = 1 + 2x$ (so $x = -\frac{1}{2}$, $y = \pm\sqrt{\frac{3}{4}}$, $f = -\frac{1}{4}$).
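Here is a sketch showing all of the points that we have identified.

[Figure: the disk $x^2 + y^2 \le 1$ with the points $(-1, 0)$, $(0, 0)$, $(1, 0)$, $(2, 0)$, $(-\frac{1}{2}, \frac{\sqrt{3}}{2})$ and $(-\frac{1}{2}, -\frac{\sqrt{3}}{2})$ marked.]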

Note that the point $(2, 0)$ is outside the allowed region¹⁶. So all together, we have the following
candidates for max and min, with the max and min indicated.

point           $(0, 0)$    $(-1, 0)$    $(1, 0)$    $(-\frac{1}{2}, \pm\frac{\sqrt{3}}{2})$
value of $f$    $4$         $0$          $2$         $-\frac{1}{4}$
                max                                  min

Example 16.2.1
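The candidate list of Example 16.2.1 can be checked numerically. The sketch below evaluates $f$ at the candidates found in the example and, as an independent check that is not part of the worked solution, scans the boundary circle using the parametrization $x = \cos t$, $y = \sin t$. The number of sample points `n` is an arbitrary choice.

```python
import math


def f(x, y):
    return x**3 + x * y**2 - 3 * x**2 - 4 * y**2 + 4


# Interior candidates: critical points that actually lie inside the disk.
critical_points = [(0.0, 0.0), (2.0, 0.0)]
interior = [(x, y) for (x, y) in critical_points if x**2 + y**2 < 1]

# Boundary candidates found by substituting y^2 = 1 - x^2.
root3_over_2 = math.sqrt(3) / 2
boundary = [(-1.0, 0.0), (1.0, 0.0), (-0.5, root3_over_2), (-0.5, -root3_over_2)]

values = {p: f(*p) for p in interior + boundary}
abs_max = max(values.values())
abs_min = min(values.values())

# Independent check: sample f around the whole boundary circle.
n = 3600
scan = [f(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k in range(n)]

for point, value in values.items():
    print(point, value)
print("boundary scan:", min(scan), max(scan))
```

The scan should report a boundary minimum close to $-\frac{1}{4}$ and a boundary maximum close to $2$, so the overall maximum $4$ comes from the interior critical point $(0, 0)$.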

15 If it contained odd powers too, we could consider the cases $y \ge 0$ and $y \le 0$ separately and substitute $y = \sqrt{1 - x^2}$
in the former case and $y = -\sqrt{1 - x^2}$ in the latter case.
16 We found $(2, 0)$ as a solution to the critical point equations (E1), (E2). That’s because, in the course of solving
those equations, we ignored the constraint that $x^2 + y^2 \le 1$.

Example 16.2.2
Find the maximum and minimum values of $f(x, y) = xy - x^3 y^2$ when $(x, y)$ runs over the square
$0 \le x \le 1$, $0 \le y \le 1$.
Solution. As usual, let’s examine the critical points and boundary in turn.
Interior: If $f$ takes its maximum or minimum value at a point in the interior, $0 < x < 1$, $0 < y < 1$,
then that point must be a critical point of $f$. To find the critical points we compute the first order
derivatives.
$f_x(x, y) = y - 3x^2 y^2$    $f_y(x, y) = x - 2x^3 y$
Again, these functions are polynomials in two variables and they are smooth everywhere in their
domain, so the first order partial derivatives exist everywhere in the interior. This means that the
critical points are the solutions of

$f_x = 0 \iff y(1 - 3x^2 y) = 0 \iff y = 0$ or $3x^2 y = 1$
$f_y = 0 \iff x(1 - 2x^2 y) = 0 \iff x = 0$ or $2x^2 y = 1$

• If $y = 0$, we cannot have $2x^2 y = 1$, so we must have $x = 0$.

• If $3x^2 y = 1$, we cannot have $x = 0$, so we must have $2x^2 y = 1$. Dividing gives $1 = \frac{3x^2 y}{2x^2 y} = \frac{3}{2}$,
which is impossible.
So the only critical point in the square is $(0, 0)$. There $f = 0$.
Boundary: The region is a square, so its boundary consists of its four sides.
• First, we look at the part of the boundary with $x = 0$. On that entire side $f = 0$.

• Next, we look at the part of the boundary with $y = 0$. On that entire side $f = 0$.

• Next, we look at the part of the boundary with $y = 1$. There $f = f(x, 1) = x - x^3$. To find the
maximum and minimum of $f(x, y)$ on the part of the boundary with $y = 1$, we must find the
maximum and minimum of $x - x^3$ when $0 \le x \le 1$.
Recall that, in general, the maximum and minimum of a function $h(x)$ on the interval $a \le x \le b$
must occur either at $x = a$ or at $x = b$ or at an $x$ for which either $h'(x) = 0$ or $h'(x)$ does not
exist. In this case, $\frac{d}{dx}(x - x^3) = 1 - 3x^2$, so the max and min of $x - x^3$ for $0 \le x \le 1$ must
occur

– either at $x = 0$, where $f = 0$,
– or at $x = \frac{1}{\sqrt{3}}$, where $f = \frac{2}{3\sqrt{3}}$,
– or at $x = 1$, where $f = 0$.

• Finally, we look at the part of the boundary with $x = 1$. There $f = f(1, y) = y - y^2$. As
$\frac{d}{dy}(y - y^2) = 1 - 2y$, the only critical point of $y - y^2$ is at $y = \frac{1}{2}$. So the max and min of $y - y^2$
for $0 \le y \le 1$ must occur

– either at $y = 0$, where $f = 0$,
– or at $y = \frac{1}{2}$, where $f = \frac{1}{4}$,
– or at $y = 1$, where $f = 0$.

All together, we have the following candidates for max and min, with the max and min indicated.

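point                             value of $f$
$(0, 0)$                          $0$                                      min
$(0, y)$, $0 \le y \le 1$         $0$                                      min
$(x, 0)$, $0 \le x \le 1$         $0$                                      min
$(1, 0)$                          $0$                                      min
$(1, \frac{1}{2})$                $\frac{1}{4}$
$(1, 1)$                          $0$                                      min
$(0, 1)$                          $0$                                      min
$(\frac{1}{\sqrt{3}}, 1)$         $\frac{2}{3\sqrt{3}} \approx 0.385$      max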

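[Figure: the square $0 \le x \le 1$, $0 \le y \le 1$ with the points $(0, 0)$, $(1, 0)$, $(1, \frac{1}{2})$, $(1, 1)$, $(\frac{1}{\sqrt{3}}, 1)$ and $(0, 1)$ marked.]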

Example 16.2.2
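A brute-force grid search gives a quick sanity check on Example 16.2.2. This is only a numerical sketch: the grid resolution `n` is an arbitrary choice, and a grid search approximates the maximizer rather than finding it exactly.

```python
import math


def f(x, y):
    return x * y - x**3 * y**2


n = 200  # grid resolution (arbitrary choice)
grid = [(i / n, j / n) for i in range(n + 1) for j in range(n + 1)]

best_point = max(grid, key=lambda p: f(*p))
best_value = f(*best_point)
worst_value = min(f(*p) for p in grid)

exact_max = 2 / (3 * math.sqrt(3))  # value found in the worked solution

print(best_point, best_value, exact_max, worst_value)
```

The best grid point lies on the side $y = 1$, near $x = \frac{1}{\sqrt{3}}$ and away from the corners, which is the point of the warning that follows this example.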

Warning 16.2.3 (Checking Entire Boundaries).

A common misconception when students are first learning about “checking boundaries” is
that the absolute extrema will occur on the “corners” of the boundaries. In the example we
just finished, Example 16.2.2, the four corners of our square boundary were indeed points
we needed to check. But if we had only checked the corners, we wouldn’t have found the
absolute maximum.
In your homework, if you notice that the extrema often occur at “corners” of boundaries,
or at points with x or y equal to 0, you should not take this to be a general rule.

To really see why corners don’t need to be important, consider the image¹⁷ below of an area
northeast of UBC. The central body of water in the image is Indian Arm. Indian Arm extends into
the ocean, so its elevation is pretty close to sea level. If we’re thinking of the z axis as height above
sea level, the surface of Indian Arm is probably the global minimum height in the rectangular region
shown. So, the global minimum along the boundary is not at a corner. It’s somewhere in the middle
of the left vertical boundary segment.

17 image generated by Natural Resources Canada’s Atlas of Canada - Toporama and shared under the open government
license

Similarly, looking at the mountains in the image, there’s no reason to imagine the absolute highest
point along the boundary must specifically happen at a corner.
Example 16.2.4
Find the high and low points of the surface $z = \sqrt{x^2 + y^2}$ with $(x, y)$ varying over the square $|x| \le 1$,
$|y| \le 1$.
Solution. The function $f(x, y) = \sqrt{x^2 + y^2}$ has a particularly simple geometric interpretation: it
is the distance from the point $(x, y)$ to the origin. So

• the minimum of $f(x, y)$ is achieved at the point in the square that is nearest the origin,
namely the origin itself. So $(0, 0, 0)$ is the lowest point on the surface and is at height $0$.

• The maximum of $f(x, y)$ is achieved at the points in the square that are farthest from the
origin, namely the four corners of the square $(\pm 1, \pm 1)$. At those four points $z = \sqrt{2}$. So
the highest points on the surface are $(\pm 1, \pm 1, \sqrt{2})$.

Even though we have already answered this question, it will be instructive to see what we would
have found if we had followed our usual protocol. The partial derivatives of $f(x, y) = \sqrt{x^2 + y^2}$ are
defined for $(x, y) \ne (0, 0)$ and are
$f_x(x, y) = \frac{x}{\sqrt{x^2 + y^2}}$    $f_y(x, y) = \frac{y}{\sqrt{x^2 + y^2}}$

• As we mentioned above, at the point $(x, y) = (0, 0)$ the partial derivatives are not defined. But
$(0, 0)$ is in the interior of the domain of our function. Therefore, $(0, 0)$ is a critical point.
• There are no other critical points because

– $f_x = 0$ only for $x = 0$, and
– $f_y = 0$ only for $y = 0$.
– So $(0, 0)$ is the only critical point, and it is a critical point because $f_x$ and $f_y$ are not defined there.

• The boundary of the square consists of its four sides. One side is
$\{ (x, y) \mid x = 1, \ -1 \le y \le 1 \}$

On this side $f = \sqrt{1 + y^2}$. As $\sqrt{1 + y^2}$ increases with $|y|$, the smallest value of $f$ on that side
is $1$ (when $y = 0$) and the largest value of $f$ is $\sqrt{2}$ (when $y = \pm 1$). The same thing happens
on the other three sides. The maximum value of $f$ is achieved at the four corners. Note that $f_x$
and $f_y$ are both nonzero at all four corners.

Example 16.2.4
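Both features of Example 16.2.4 (extrema at a point where the partial derivatives do not exist, and at corners where they are nonzero) can be seen numerically. The sketch below uses a coarse grid and one-sided difference quotients at the origin; the grid spacing and the step `h` are arbitrary illustrative choices.

```python
import math


def f(x, y):
    return math.hypot(x, y)  # distance from (x, y) to the origin


# Coarse grid on the square |x| <= 1, |y| <= 1 (spacing 0.1).
points = [(i / 10, j / 10) for i in range(-10, 11) for j in range(-10, 11)]

lowest = min(points, key=lambda p: f(*p))
top = max(f(*p) for p in points)
highest = sorted(p for p in points if abs(f(*p) - top) < 1e-12)

# One-sided difference quotients in the x direction at the origin.
# They disagree, which is why f_x(0, 0) does not exist.
h = 1e-6
right_quotient = (f(h, 0.0) - f(0.0, 0.0)) / h
left_quotient = (f(-h, 0.0) - f(0.0, 0.0)) / (-h)

print(lowest, highest, top)
print(right_quotient, left_quotient)
```

The lowest grid point is the origin, and the highest grid points are exactly the four corners.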

Example 16.2.5 (Disconnecting a Complete Graph)


In graph theory, a complete graph is a collection of n vertices (visualized as dots), every pair of
which is connected by an edge (visualized as lines). The complete graphs on 10 vertices and on 30
vertices are shown below.

Suppose you start with the complete graph on 30 vertices. You delete edges (but not vertices)
one-by-one until the graph is broken into three parts. Every part has at least one vertex (otherwise it
wouldn’t be a part, it would be a nothing) and there are no edges between vertices of different parts.
Some possibilities are shown below to demonstrate.

What is the minimum number of edges you could have deleted, in order to break the graph into
three pieces?

Solution. Let’s name the pieces $X$, $Y$, and $W$, and say the numbers of vertices they contain are $x$, $y$,
and $w$, respectively. Then $x \ge 1$, $y \ge 1$, $w \ge 1$, and $x + y + w = 30$.
For every vertex in one piece of the broken graph, you must have deleted the edges connecting it
to every vertex in every other piece. So, to delete all the edges from $X$ to $Y$, you deleted at least $xy$
edges; to delete all the edges from $X$ to $W$, you deleted at least $xw$ edges; and to delete all the edges
from $Y$ to $W$, you deleted at least $yw$ edges. So all together, you deleted at least this many edges:
$xy + xw + yw$
Since $x + y + w = 30$, we can eliminate one of these from our expression, and say the minimum
number of edges deleted was:
$f(x, y) = xy + x(30 - x - y) + y(30 - x - y)$
$= 30x + 30y - x^2 - xy - y^2$
The domain of this function is all integer pairs in the region bounded by $x \ge 1$, $y \ge 1$, and $x + y \le 29$.
[Figure: the triangular region bounded by the lines $x = 1$, $y = 1$ and $x + y = 29$, with $x$ and $y$ each running from $1$ to $28$.]

To find the minimum value of $f(x, y)$ in this region, we should check for critical points, and check
all three boundary lines.
• First, let’s check for critical points.
$f(x, y) = 30x + 30y - x^2 - xy - y^2$
$f_x = 30 - 2x - y$    $f_y = 30 - 2y - x$
Solving $f_x = 0$ for $y$, we find $y = 30 - 2x$. Plugging into the equation $f_y = 0$, we get:
$0 = f_y = 30 - 2(30 - 2x) - x$
$= 3x - 30$
$x = 10$
$y = 30 - 2x = 10$
So, our only critical point is $(10, 10)$, and this is inside our region.
$f(10, 10) = 300 + 300 - 100 - 100 - 100 = 300$

• Second, let’s check the boundary line $y = 1$, $1 \le x \le 28$. On this portion of the boundary:

$f(x, y) = 30x + 30y - x^2 - xy - y^2$
$= 30x + 30 - x^2 - x - 1$
$= -x^2 + 29x + 29$
This is a parabola pointing down, so its minimum will occur at an endpoint of our interval:
$x = 1$ or $x = 28$.
$f(1, 1) = 57$    $f(28, 1) = 57$

• Third, we check the boundary line $x = 1$, $1 \le y \le 28$. On this portion of the boundary:

$f(x, y) = 30x + 30y - x^2 - xy - y^2$
$= 30 + 30y - 1 - y - y^2$
$= -y^2 + 29y + 29$
This is again a parabola pointing down, so its minimum will occur at an endpoint of our interval:
$y = 1$ or $y = 28$.
$f(1, 1) = 57$    $f(1, 28) = 57$

• Fourth, we check the final boundary line, $y = 29 - x$, $1 \le x \le 28$. On this portion of the
boundary:

$f(x, y) = 30x + 30y - x^2 - xy - y^2$
$= 30x + 30(29 - x) - x^2 - x(29 - x) - (29 - x)^2$
$= -x^2 + 29x + 29$

The one-variable function $g(x) = -x^2 + 29x + 29$ is a parabola pointing down, so its minimum
will occur at an endpoint of our interval: $x = 1$ or $x = 28$.

$f(1, 28) = 57$    $f(28, 1) = 57$

Comparing the values from the four bullet points, we find the minimum number of edges we could
have deleted in order to break the complete graph into 3 pieces is 57. We achieve that minimum by
having two pieces of one vertex each, and the remaining piece with all other vertices.
O PTIMIZATION OF M ULTIVARIABLE F UNCTIONS 16.3 L AGRANGE MULTIPLIERS

Remark 1: making use of sketching and symmetry can reduce the amount of work involved in
solving this problem. If we recognize that f (x, y) is a paraboloid opening down, then we know its
critical point will actually be an absolute max – not the minimum we’re looking for.
We can see the x and y are symmetric in f (x, y) and in our region, so we also could have checked
only the boundary x = 1, and not the boundary y = 1, understanding that their minimum values
would be the same.
Remark 2: Our model domain for this problem actually restricts $x$ and $y$ to whole-number values,
as opposed to real numbers. We showed that 57 was the minimum value of $f(x, y)$ over all real
numbers in the sketched region. Since whole numbers are themselves reals, and the minimum
occurred at integer values of $x$ and $y$ (i.e. the minimum is in our model domain), we can be sure that
57 is the minimum over all whole numbers in our domain. If the minimum had occurred at, say,
$x = \frac{1}{2}$ and $y = \frac{1}{2}$, then it wouldn’t have been in our model domain, and this would be a problem for
a different course!
Example 16.2.5
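Because the model domain of Example 16.2.5 consists of finitely many integer triples, the answer can also be confirmed by direct enumeration. The sketch below is a brute-force check rather than the calculus method used in the solution; the variable names are illustrative.

```python
n = 30  # number of vertices in the complete graph

# deleted[(x, y, w)] = number of edges running between different pieces
deleted = {}
for x in range(1, n - 1):
    for y in range(1, n - x):
        w = n - x - y  # w >= 1 by the choice of ranges
        deleted[(x, y, w)] = x * y + x * w + y * w

fewest = min(deleted.values())
minimizers = sorted(sizes for sizes, count in deleted.items() if count == fewest)
most = max(deleted.values())

print(fewest, minimizers)
print(most)
```

The enumeration agrees with the solution: the minimum is 57, attained by two pieces of one vertex each and one piece of 28 vertices, while the critical point $(10, 10)$ gives the largest count, 300, as Remark 1 anticipates.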

16.3 The method of Lagrange multipliers

Learning Objectives
• Understand that solutions to a particular system of equations correspond to points
along a curve that is locally flat.

• Use the method of Lagrange multipliers to find extrema along a constraint.

• Choose between the method of Lagrange multipliers, and simple plugging in, for
determining extrema along a constraint.

• Find the absolute extrema of a surface over a closed region, using the appropriate
method (Lagrange or plugging in) for investigating the boundary.

In the last section we had to solve a number of problems of the form “What is the maximum value
of the function f on the curve C?” In those examples, the curve C was simple enough that we could
reduce the problem to finding the maximum of a function of one variable. For more complicated
problems this reduction might not be possible. In this section, we introduce another method for
solving such problems. First some nomenclature.

Definition 16.3.1.

A problem of the form

“Find the maximum and minimum values of the function f (x, y) for (x, y) on the curve
g(x, y) = 0.”

is one type of constrained optimization problem. The function being maximized or


minimized, f (x, y), is called the objective function. The function, g(x, y), whose zero set
is the curve of interest, is called the constraint function.

Such problems are quite common. As we said above, we have already encountered them in
the last section on absolute maxima and minima, when we were looking for the extreme values
of a function on the boundary of a region. In economics “utility functions” are used to model the
relative “usefulness” or “desirability” or “preference” of various economic choices. For example, a
utility function $U(w, \kappa)$ might specify the relative level of satisfaction a consumer would get from
purchasing a quantity $w$ of wine and $\kappa$ of coffee. If the consumer wants to spend \$100 and wine
costs \$20 per unit and coffee costs \$5 per unit, then the consumer would like to maximize $U(w, \kappa)$
subject to the constraint that $20w + 5\kappa = 100$.
To this point we have always solved such constrained optimization problems by solving $g(x, y) = 0$
for $y$ as a function of $x$ (or for $x$ as a function of $y$). However, quite often the function $g(x, y)$ is so
complicated that one cannot explicitly solve $g(x, y) = 0$ for $y$ as a function of $x$ or for $x$ as a function
of $y$ and one also cannot explicitly parametrize $g(x, y) = 0$. Or sometimes you can, for example,
solve $g(x, y) = 0$ for $y$ as a function of $x$, but the resulting solution is so complicated that it is
really hard, or even virtually impossible, to work with. Direct attacks become even harder in higher
dimensions when, for example, we wish to optimize a function $f(x, y, z)$ subject to a constraint
$g(x, y, z) = 0$.
There is another procedure called the method of “Lagrange¹⁸ multipliers” that comes to our
rescue in these scenarios. Here is the two-dimensional version of the method. There are obvious
analogues in other dimensions.

16.3.1 §§ Motivation for the method


First, some intuition. When we talk about derivatives on a surface, we need to think about the
derivatives in a particular direction.¹⁹ Consider in particular the surface formed by all points $(x, y, z)$
such that $f(x, y) = z$, for some function $f(x, y)$. The directions giving zero rate of increase are those
that keep you on a level curve. Those directions are perpendicular to $\nabla f(a, b)$.
The corresponding statement in three dimensions is that ∇ F (a, b, c) is perpendicular to the level
surface F (x, y, z) = F (a, b, c) at (a, b, c). Hence a good way to find a vector normal to the surface
F (x, y, z) = 0 at the point (a, b, c) is to compute the gradient ∇ F (a, b, c).

18 Joseph-Louis Lagrange was actually born Giuseppe Lodovico Lagrangia in Turin, Italy in 1736. He moved to
Berlin in 1766 and then to Paris in 1786. He eventually acquired French citizenship and then the French claimed he
was a French mathematician, while the Italians continued to claim that he was an Italian mathematician.

19 If you’re walking along hilly terrain, changing direction can cause you to change from going uphill to downhill.
Direction definitely matters!

Theorem 16.3.2 (Lagrange Multipliers).

Let $f(x, y, z)$ and $g(x, y, z)$ have continuous first partial derivatives in a region of $\mathbb{R}^3$
that contains the surface $S$ given by the equation $g(x, y, z) = 0$. Further assume that
$\nabla g(x, y, z) \ne 0$ on $S$.
If $f$, restricted to the surface $S$, has a local extreme value at the point $(a, b, c)$ on $S$, then
there is a real number $\lambda$ such that

$\nabla f(a, b, c) = \lambda \nabla g(a, b, c)$

that is

fx (a, b, c) = λ gx (a, b, c)
fy (a, b, c) = λ gy (a, b, c)
fz (a, b, c) = λ gz (a, b, c)

The number λ is called a Lagrange multiplier.

Proof. Suppose that $(a, b, c)$ is a point of $S$ and that $f(x, y, z) \ge f(a, b, c)$ for all points $(x, y, z)$ on $S$
that are close to $(a, b, c)$. That is, $(a, b, c)$ is a local minimum for $f$ on $S$. Of course the argument for
a local maximum is virtually identical.
Imagine that we go for a walk on $S$, with the time $t$ running, say, from $t = -1$ to $t = +1$ and
that at time $t = 0$ we happen to be exactly at $(a, b, c)$. Let’s say that our position is $(x(t), y(t), z(t))$
at time $t$. Write
$F(t) = f(x(t), y(t), z(t))$

So $F(t)$ is the value of $f$ that we see on our walk at time $t$. Then for all $t$ close to $0$, $(x(t), y(t), z(t))$
is close to $(x(0), y(0), z(0)) = (a, b, c)$ so that
$F(0) = f(x(0), y(0), z(0)) = f(a, b, c) \le f(x(t), y(t), z(t)) = F(t)$
for all $t$ close to zero. So $F(t)$ has a local minimum at $t = 0$ and consequently $F'(0) = 0$.
By the multivariable chain rule,
$F'(0) = \frac{d}{dt} f(x(t), y(t), z(t)) \Big|_{t=0}$
$= f_x(a, b, c) \, x'(0) + f_y(a, b, c) \, y'(0) + f_z(a, b, c) \, z'(0) = 0$    ($*$)
We may rewrite this as a dot product:
$0 = F'(0) = \nabla f(a, b, c) \cdot [x'(0), y'(0), z'(0)]$
$\implies \nabla f(a, b, c) \perp [x'(0), y'(0), z'(0)]$
This is true for all paths on $S$ that pass through $(a, b, c)$ at time $0$. In particular it is true for all vectors
$[x'(0), y'(0), z'(0)]$ that are tangent to $S$ at $(a, b, c)$. So $\nabla f(a, b, c)$ is perpendicular to $S$ at $(a, b, c)$.
But we already know that $\nabla g(a, b, c)$ is also perpendicular to $S$ at $(a, b, c)$. So $\nabla f(a, b, c)$ and
$\nabla g(a, b, c)$ have to be parallel vectors. That is,
$\nabla f(a, b, c) = \lambda \nabla g(a, b, c)$
for some number $\lambda$. That’s the Lagrange multiplier rule of our theorem.

16.3.2 §§ Using the method

Theorem 16.3.3 (Lagrange Multipliers).

Let $f(x, y)$ and $g(x, y)$ have continuous first partial derivatives in a region of $\mathbb{R}^2$ that
contains the curve $S$ given by the equation $g(x, y) = 0$. Further assume that $g(x, y)$ has
no critical points on $S$.
If $f$, restricted to the curve $S$, has a local extreme value at the point $(a, b)$ on $S$, then
there is a real number $\lambda$ such that

fx (a, b) = λ gx (a, b)
fy (a, b) = λ gy (a, b)

The number λ is called a Lagrange multiplier.

So to find the maximum and minimum values of $f(x, y)$ on a surface $g(x, y) = 0$, assuming
that both the objective function $f(x, y)$ and constraint function $g(x, y)$ have continuous first partial
derivatives, and that $g(x, y)$ has no critical points, you

1. build up a list of candidate points $(x, y)$ by finding all solutions to the equations

\begin{align*}
f_x(x, y) &= \lambda\, g_x(x, y) \\
f_y(x, y) &= \lambda\, g_y(x, y) \\
g(x, y) &= 0
\end{align*}

Note that there are three equations and three unknowns, namely x, y, and λ .

2. Then you evaluate $f(x, y)$ at each $(x, y)$ on the list of candidates. The biggest of these
candidate values is the absolute maximum, if an absolute maximum exists. The smallest of
these candidate values is the absolute minimum, if an absolute minimum exists.
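
The two steps can be tried out on a very small problem. The sketch below uses a toy objective and constraint of our own choosing, $f(x, y) = x + y$ on the circle $x^2 + y^2 = 2$; neither comes from the text, and the helper name `lagrange_residuals` is hypothetical. It checks that the hand-computed candidates really do satisfy all three equations, then compares the values of $f$.

```python
# Toy problem (an assumption for illustration, not from the text):
# optimize f(x, y) = x + y on the circle g(x, y) = x^2 + y^2 - 2 = 0.
def f(x, y):
    return x + y


def lagrange_residuals(x, y, lam):
    """Left side minus right side of the three Lagrange equations."""
    fx, fy = 1.0, 1.0            # partial derivatives of f
    gx, gy = 2.0 * x, 2.0 * y    # partial derivatives of g
    return (fx - lam * gx, fy - lam * gy, x * x + y * y - 2.0)


# Step 1: candidates found by hand. 1 = 2*lam*x and 1 = 2*lam*y force x = y,
# and then the constraint gives x = y = 1 (lam = 1/2) or x = y = -1 (lam = -1/2).
candidates = [(1.0, 1.0, 0.5), (-1.0, -1.0, -0.5)]
for x, y, lam in candidates:
    assert all(abs(r) < 1e-12 for r in lagrange_residuals(x, y, lam))

# Step 2: evaluate f at every candidate and compare the values.
values = {(x, y): f(x, y) for x, y, _ in candidates}
best_max = max(values, key=values.get)
best_min = min(values, key=values.get)
print("largest candidate value:", values[best_max], "at", best_max)
print("smallest candidate value:", values[best_min], "at", best_min)
```

Because a circle is a closed curve, the largest and smallest candidate values here are the absolute maximum and minimum.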

Theorem 16.3.3 can be extended to functions of more variables in a natural way. Using higher-
dimensional Lagrange isn’t in our learning goals, but for interest, we want you to see how easily the
method generalizes. The calculus is the same – it’s only the algebra that gets longer.

Theorem 16.3.4 ((Optional) Lagrange Multipliers for Functions of Three Variables).

Let $f(x, y, z)$ and $g(x, y, z)$ have continuous first partial derivatives in a region of $\mathbb{R}^3$ that
contains the surface $S$ given by the equation $g(x, y, z) = 0$. Further assume that $g(x, y, z)$
has no critical points on $S$.
If $f$, restricted to the surface $S$, has a local extreme value at the point $(a, b, c)$ on $S$, then
there is a real number $\lambda$ such that

\begin{align*}
f_x(a, b, c) &= \lambda\, g_x(a, b, c) \\
f_y(a, b, c) &= \lambda\, g_y(a, b, c) \\
f_z(a, b, c) &= \lambda\, g_z(a, b, c)
\end{align*}

The number λ is called a Lagrange multiplier.

Now for a bunch of examples.


Example 16.3.5
Find the maximum and minimum of the function $x^2 - 10x - y^2$ on the ellipse whose equation is
$x^2 + 4y^2 = 16$.
Solution. For this first example, we’ll do out the algebra in truly gory detail. Once you get the hang
of it, it’ll go much faster.
Our objective function (the one we want to maximize and/or minimize) is $f(x, y) = x^2 - 10x - y^2$
and the constraint function is $g(x, y) = x^2 + 4y^2 - 16$. To apply the method of Lagrange multipliers
we start by computing the first-order derivatives of these functions.

\[ f_x = 2x - 10 \qquad f_y = -2y \qquad g_x = 2x \qquad g_y = 8y \]

So, according to the method of Lagrange multipliers, we need to find all solutions to the following
system of equations.

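\begin{align*}
f_x = \lambda g_x &\implies 2x - 10 = \lambda(2x) \tag{E1} \\
f_y = \lambda g_y &\implies -2y = \lambda(8y) \tag{E2} \\
g(x, y) = 0 &\implies x^2 + 4y^2 - 16 = 0 \tag{E3}
\end{align*}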

(E1) In equation (E1), if $2x$ is nonzero, then we can divide both sides of the equation by it, to find
$\lambda = \frac{2x - 10}{2x}$, i.e. $\lambda = \frac{x - 5}{x}$. If $2x = 0$, then the equation becomes $-10 = 0\lambda$, which is not
true for any $\lambda$.

(E2) In equation (E2), if $8y$ is nonzero, then we can divide both sides of the equation by it, to find
$\lambda = \frac{-2y}{8y}$, i.e. $\lambda = -\frac{1}{4}$. If $8y = 0$, then we also get a solution $y = 0$ for any $\lambda$.

(E1)+(E2) We need all three equations to be true at the same time (that is, for the same values of $x$,
$y$, and $\lambda$). We’ve found two ways for both (E1) and (E2) to be true.

• First way: $\lambda = \frac{x - 5}{x}$ and $\lambda = -\frac{1}{4}$
• Second way: $\lambda = \frac{x - 5}{x}$ and $y = 0$

(E3) Now we’ll see which points make (E1) and (E2) true while also making (E3) true.

• First way: $\lambda = \frac{x - 5}{x}$ and $\lambda = -\frac{1}{4}$.

\begin{align*}
\frac{x - 5}{x} = -\frac{1}{4} &\implies -4x + 20 = x \\
&\implies x = 4
\end{align*}

In order to satisfy (E3):

\begin{align*}
0 &= 4^2 + 4y^2 - 16 \\
0 &= y
\end{align*}

So, the point $(x, y) = (4, 0)$ satisfies all three equations.

• Second way: $\lambda = \frac{x - 5}{x}$ and $y = 0$. If $y = 0$, then from (E3), we see

\begin{align*}
0 &= x^2 + 4 \cdot 0^2 - 16 \\
16 &= x^2 \\
x &= \pm 4
\end{align*}

So the points to consider are $(x, y) = (\pm 4, 0)$.

Now we’ve found the only possible solutions to all three equations: $(\pm 4, 0)$. ($\lambda$ has to exist, but
we don’t actually care what it is.) So the method of Lagrange multipliers, Theorem 16.3.3, gives that
the only possible locations of the maximum and minimum of the function $f$ are $(4, 0)$ and $(-4, 0)$.
To complete the problem, we only have to compute $f$ at those points.

point           $(4, 0)$    $(-4, 0)$
value of $f$    $-24$       $56$
                min         max

Hence the maximum value of $x^2 - 10x - y^2$ on the ellipse is 56 and the minimum value is $-24$.

[Figure: the ellipse $x^2 + 4y^2 = 16$ in the $xy$-plane, with the points $(4, 0)$ and $(-4, 0)$ marked.]

Example 16.3.5
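
As a sanity check on Example 16.3.5, we can walk around the ellipse numerically and compare what we see with the two candidate values. The sketch below is only a brute-force check, not part of the method; the parametrization $x = 4\cos t$, $y = 2\sin t$ and the sample count are our own choices.

```python
import math


def f(x, y):
    return x**2 - 10 * x - y**2


def on_ellipse(t):
    # x = 4 cos t, y = 2 sin t satisfies x^2 + 4y^2 = 16 for every t.
    return 4.0 * math.cos(t), 2.0 * math.sin(t)


# The two candidates produced by the Lagrange system.
candidates = [(4.0, 0.0), (-4.0, 0.0)]
candidate_values = [f(x, y) for x, y in candidates]

# Sample f at many points around the ellipse.
n = 100_000
samples = [f(*on_ellipse(2.0 * math.pi * k / n)) for k in range(n)]

print("values at the candidates:", candidate_values)
print("smallest and largest sampled values:", min(samples), max(samples))
```

The sampled extremes should agree with the candidate values $-24$ and $56$ up to rounding.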

In the previous example, we had to make a lot of decisions about how to solve for the solutions
to the system of three equations. Actually, we can start our Lagrange system-solving the same way
every time. The first observation we make is that the partial derivatives of g can be 0, or nonzero. If
they’re zero, this may or may not lead to a solution; if they’re nonzero, this tells us something about
λ.
In the textbook and problem book, we will consistently use the same method to solve the system
of equations. It’s certainly not the only way, and you are free to use other methods. Once you
get used to the computations, you’ll probably start finding ways to make them faster based on the
specifics of individual problems.
Example 16.3.6 (Solving Lagrange in General)
Suppose you want to find all points (x, y) for which a solution exists to the system below.

\begin{align*}
f_x &= \lambda g_x \tag{E1} \\
f_y &= \lambda g_y \tag{E2} \\
g(x, y) &= 0 \tag{E3}
\end{align*}

where λ is some real constant. Our method below will hinge on the observation from the last
example that we get different solutions for zero vs. nonzero partial derivatives of the constraint.

• If $g_x \ne 0$ and $g_y \ne 0$, then from (E1) we see $\lambda = \frac{f_x}{g_x}$, and from (E2) we see $\lambda = \frac{f_y}{g_y}$. So,
choosing a pair $(x, y)$ such that
\[ \frac{f_x}{g_x} = \frac{f_y}{g_y} \]
means that for some $\lambda$, that pair makes (E1) and (E2) true. Simplify the equation above to
find the necessary relationship between $x$ and $y$, then find which pairs with that relationship
make (E3) true.

• If $g_x = 0$, then from (E1) we see also $f_x = 0$. Then (E1) is true for any $\lambda$ that we like. We can
check that there exists some $\lambda$ that makes (E2) true as well. Then, we find the points $(x, y)$
that make (E3) true as well as $g_x = f_x = 0$.

• If $g_y = 0$, then from (E2) we see also $f_y = 0$. Then (E2) is true for any $\lambda$ that we like. We can
check that there exists some $\lambda$ that makes (E1) true as well. Then, we find the points $(x, y)$
that make (E3) true as well as $g_y = f_y = 0$.

Sometimes, one or more of these cases won’t lead to any solutions. In Example 16.3.5, we were
immediately able to discard the possibility gx = 0, because it didn’t lead to a solution. Once you’re
practiced with these types of problems, you’ll often see quite quickly which cases you get to discard.

Example 16.3.6
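
As a side remark (this is not the method the text uses), the three cases can be summarized in a single equation. Provided $g_x$ and $g_y$ are not both zero, which the theorem assumes, a number $\lambda$ satisfying both (E1) and (E2) exists exactly when the cross-multiplied equation below holds.

```latex
\[
  f_x\, g_y - f_y\, g_x = 0
\]
% Case g_x \ne 0 and g_y \ne 0: this is f_x / g_x = f_y / g_y after cross-multiplying.
% Case g_x = 0 (so g_y \ne 0):  it reduces to f_x g_y = 0, i.e. f_x = 0.
% Case g_y = 0 (so g_x \ne 0):  it reduces to f_y g_x = 0, i.e. f_y = 0.
```

Pairing this one equation with (E3) gives the same candidate points as working through the cases separately.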

We’ll apply our three-case breakdown in subsequent examples.


Example 16.3.7
Find the minimum and maximum values of the objective function
\[ f(x, y) = \ln\big(x^2 - 2x + 5\big) + \ln\big(y^2 - 4y + 13\big) \]
subject to the constraint
\[ x^2 - 2x + y^2 - 4y = 20 \]

Solution. Our constraint function is
\[ g(x, y) = x^2 - 2x + y^2 - 4y - 20 = 0 \]
We start by setting up the equations from the method of Lagrange multipliers.

\begin{align*}
f_x = \lambda g_x &\qquad \frac{2x - 2}{x^2 - 2x + 5} = \lambda(2x - 2) \tag{E1} \\
f_y = \lambda g_y &\qquad \frac{2y - 4}{y^2 - 4y + 13} = \lambda(2y - 4) \tag{E2} \\
g(x, y) = 0 &\qquad x^2 - 2x + y^2 - 4y = 20 \tag{E3}
\end{align*}

Now we consider our three cases.


• $g_x \ne 0$ and $g_y \ne 0$. From (E1), this means $\lambda = \frac{1}{x^2 - 2x + 5}$. From (E2), $\lambda = \frac{1}{y^2 - 4y + 13}$.

\begin{align*}
\frac{1}{x^2 - 2x + 5} &= \frac{1}{y^2 - 4y + 13} \\
x^2 - 2x + 5 &= y^2 - 4y + 13 \\
x^2 - 2x &= y^2 - 4y + 8
\end{align*}

This gives us the relationship between $x$ and $y$ that must hold for (E1) and (E2) to be true
under the assumption $g_x \ne 0$ and $g_y \ne 0$. Now, in order for (E3) to be true as well:

\begin{align*}
0 &= (x^2 - 2x) + y^2 - 4y - 20 \\
  &= (y^2 - 4y + 8) + y^2 - 4y - 20 \\
  &= 2y^2 - 8y - 12 \\
0 &= y^2 - 4y - 6 \\
y &= \frac{4 \pm \sqrt{16 - 4(1)(-6)}}{2} = \frac{4 \pm \sqrt{40}}{2} = 2 \pm \sqrt{10}
\end{align*}

So,

\begin{align*}
0 &= (x^2 - 2x) + y^2 - 4y - 20 \\
  &= x^2 - 2x + \big(2 \pm \sqrt{10}\big)^2 - 4\big(2 \pm \sqrt{10}\big) - 20 \\
  &= x^2 - 2x + \big(4 \pm 4\sqrt{10} + 10\big) - 8 \mp 4\sqrt{10} - 20
     && \text{(note } \pm 4\sqrt{10} \mp 4\sqrt{10} = 0 \text{)} \\
  &= x^2 - 2x + 4 + 10 - 8 - 20 \\
  &= x^2 - 2x - 14 \\
x &= \frac{2 \pm \sqrt{4 - 4(-14)}}{2} = \frac{2 \pm 2\sqrt{15}}{2} = 1 \pm \sqrt{15}
\end{align*}

This gives us four points to consider:
$\big(1 + \sqrt{15},\, 2 + \sqrt{10}\big)$, $\big(1 - \sqrt{15},\, 2 + \sqrt{10}\big)$, $\big(1 + \sqrt{15},\, 2 - \sqrt{10}\big)$, and $\big(1 - \sqrt{15},\, 2 - \sqrt{10}\big)$.

• If $g_x = 0$, then $x = 1$, and (E1) is true for any $\lambda$. Then we can choose whatever $\lambda$ is necessary
to make (E2) true. By (E3):

\begin{align*}
0 &= x^2 - 2x + y^2 - 4y - 20 \\
  &= 1 - 2 + y^2 - 4y - 20 \\
  &= y^2 - 4y - 21 \\
  &= (y - 7)(y + 3) \\
y &= 7, \quad y = -3
\end{align*}

This gives us two points to consider: $(1, 7)$ and $(1, -3)$.

• If $g_y = 0$, then $y = 2$, and (E2) is true for any $\lambda$. Then we can choose whatever $\lambda$ is necessary
to make (E1) true. By (E3):

\begin{align*}
0 &= x^2 - 2x + y^2 - 4y - 20 \\
  &= x^2 - 2x + 4 - 8 - 20 \\
  &= x^2 - 2x - 24 \\
  &= (x - 6)(x + 4) \\
x &= 6, \quad x = -4
\end{align*}

This gives us two points to consider: $(-4, 2)$ and $(6, 2)$.

So, all together we have eight points that satisfy our three Lagrange equations. It’s left only to
decide which of those points lead to maxima and to minima.

point                                          value of $f$    
$\big(1 + \sqrt{15},\, 2 + \sqrt{10}\big)$     $\ln 361$       max
$\big(1 - \sqrt{15},\, 2 + \sqrt{10}\big)$     $\ln 361$       max
$\big(1 + \sqrt{15},\, 2 - \sqrt{10}\big)$     $\ln 361$       max
$\big(1 - \sqrt{15},\, 2 - \sqrt{10}\big)$     $\ln 361$       max
$(-4, 2)$                                      $\ln 261$
$(6, 2)$                                       $\ln 261$
$(1, 7)$                                       $\ln 136$       min
$(1, -3)$                                      $\ln 136$       min

Our maximum value is $\ln 361$, and our minimum value is $\ln 136$.


Example 16.3.7
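
With eight candidates it is easy to slip on the arithmetic, so a quick numerical check of Example 16.3.7 can be reassuring. The sketch below confirms that each candidate lies on the constraint and prints $e^{f}$, which should be close to 361, 261, or 136. The brute-force sampling at the end relies on our own observation that the constraint can be rewritten as the circle $(x - 1)^2 + (y - 2)^2 = 25$.

```python
import math


def f(x, y):
    return math.log(x * x - 2 * x + 5) + math.log(y * y - 4 * y + 13)


def g(x, y):
    return x * x - 2 * x + y * y - 4 * y - 20


s15, s10 = math.sqrt(15.0), math.sqrt(10.0)
candidates = [
    (1 + s15, 2 + s10), (1 - s15, 2 + s10),
    (1 + s15, 2 - s10), (1 - s15, 2 - s10),
    (-4.0, 2.0), (6.0, 2.0), (1.0, 7.0), (1.0, -3.0),
]

for x, y in candidates:
    assert abs(g(x, y)) < 1e-9          # every candidate satisfies (E3)
    print((round(x, 4), round(y, 4)), "exp(f) =", round(math.exp(f(x, y)), 6))

# Brute-force check: completing the square turns the constraint into a
# circle of radius 5 centred at (1, 2).
n = 100_000
samples = [
    f(1 + 5 * math.cos(2 * math.pi * k / n), 2 + 5 * math.sin(2 * math.pi * k / n))
    for k in range(n)
]
print("sampled min and max of f:", min(samples), max(samples))
```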

Example 16.3.8
Find the ends of the major and minor axes of the ellipse $3x^2 - 2xy + 3y^2 = 4$. They are the points on
the ellipse that are farthest from and nearest to the origin.
Solution. Let $(x, y)$ be a point on $3x^2 - 2xy + 3y^2 = 4$. This point is at the end of a major axis when
it maximizes its distance from the centre of the ellipse, $(0, 0)$. It is at the end of a minor axis when it
minimizes its distance from $(0, 0)$. So we wish to maximize and minimize the distance $\sqrt{x^2 + y^2}$
subject to the constraint
\[ g(x, y) = 3x^2 - 2xy + 3y^2 - 4 = 0 \]
Now maximizing/minimizing $\sqrt{x^2 + y^2}$ is equivalent (see footnote 20) to maximizing/minimizing its square
$\big(\sqrt{x^2 + y^2}\big)^2 = x^2 + y^2$. So we are free to choose the objective function
\[ f(x, y) = x^2 + y^2 \]
which we will do, because it makes the derivatives cleaner. Again, we use Lagrange multipliers to
solve this problem, so we start by finding the partial derivatives.
\[ f_x(x, y) = 2x \qquad f_y(x, y) = 2y \qquad g_x(x, y) = 6x - 2y \qquad g_y(x, y) = -2x + 6y \]
We need to find all solutions to
\begin{align*}
2x &= \lambda(6x - 2y) \tag{E1} \\
2y &= \lambda(-2x + 6y) \tag{E2} \\
3x^2 - 2xy + 3y^2 - 4 &= 0 \tag{E3}
\end{align*}
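• If $g_x \ne 0$ and $g_y \ne 0$, then $\lambda = \frac{2x}{6x - 2y} = \frac{x}{3x - y}$ by (E1), and $\lambda = \frac{2y}{-2x + 6y} = \frac{y}{-x + 3y}$ by (E2).
\begin{align*}
\frac{x}{3x - y} &= \frac{y}{-x + 3y} \\
-x^2 + 3xy &= 3xy - y^2 \\
x^2 &= y^2 \\
x &= \pm y
\end{align*}
So if $x = \pm y$, then the appropriate $\lambda$ will make both (E1) and (E2) true. Now let’s see what
makes (E3) true.
\begin{align*}
4 &= 3x^2 - 2xy + 3y^2 \\
4 &= 3(\pm y)^2 - 2(\pm y)y + 3y^2 \\
  &= 3y^2 \mp 2y^2 + 3y^2 \\
  &= (6 \mp 2)y^2
\end{align*}
\begin{align*}
4 = (6 + 2)x^2 &\implies x = \pm\frac{1}{\sqrt{2}} && \text{when } x = -y \\
4 = (6 - 2)x^2 &\implies x = \pm 1 && \text{when } x = y
\end{align*}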

20 The function $S(z) = z^2$ is a strictly increasing function for $z \ge 0$. So, for $a, b \ge 0$, the statement “$a < b$” is equivalent
to the statement “$S(a) < S(b)$”.

 
This gives us four points to check: the two points $\pm\big(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\big)$ and the two points $\pm(1, 1)$.

• If $g_x = 0$, then $6x - 2y = 0$, i.e. $y = 3x$. By (E1), $x = 0$, so $y = 0$. Then (E3) doesn’t hold, so
this leads to no solutions.

• If $g_y = 0$, then $-2x + 6y = 0$, i.e. $x = 3y$. By (E2), $y = 0$, so $x = 0$. Then (E3) doesn’t hold,
so this leads to no solutions.

The distance from $(0, 0)$ to $\pm(1, 1)$, namely $\sqrt{2}$, is larger than the distance from $(0, 0)$ to
$\pm\big(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\big)$, namely 1. So the ends of the minor axes are $\pm\big(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\big)$ and the ends of the major
axes are $\pm(1, 1)$. Those ends are sketched in the figure on the left below. Once we have the ends, it
is an easy matter (see footnote 21) to sketch the ellipse as in the figure on the right below.

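[Figures: on the left, the points $(1, 1)$, $(-1, -1)$, $(-1, 1)/\sqrt{2}$ and $(1, -1)/\sqrt{2}$ plotted in the $xy$-plane; on the right, the same points with the ellipse $3x^2 - 2xy + 3y^2 = 4$ drawn through them.]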

Example 16.3.8
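
The four points found in Example 16.3.8 can be checked numerically. The sketch below is only a brute-force check, and it relies on a polar-coordinate substitution of our own: putting $x = r\cos\theta$, $y = r\sin\theta$ into the constraint gives $r^2(3 - \sin 2\theta) = 4$, so the distance from the origin in direction $\theta$ is $r = 2/\sqrt{3 - \sin 2\theta}$.

```python
import math


def g(x, y):
    return 3 * x * x - 2 * x * y + 3 * y * y - 4


def radius(theta):
    # Distance from the origin to the ellipse in direction theta,
    # from r^2 * (3 - sin(2*theta)) = 4.
    return 2.0 / math.sqrt(3.0 - math.sin(2.0 * theta))


h = 1.0 / math.sqrt(2.0)
candidates = [(1.0, 1.0), (-1.0, -1.0), (h, -h), (-h, h)]
distances = [math.hypot(x, y) for x, y in candidates]

# Sweep all directions and record the distance to the ellipse.
n = 100_000
radii = [radius(2.0 * math.pi * k / n) for k in range(n)]

print("distances at the candidates:", distances)
print("smallest and largest distance found by sweeping:", min(radii), max(radii))
```

The sweep should report distances between 1 and $\sqrt{2}$, matching the minor and major axis ends.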

In the previous examples, the objective function and the constraint were specified explicitly. That
will not always be the case. In the next example, we have to do a little geometry to extract them.

Example 16.3.9
Find the rectangle of largest area (with sides parallel to the coordinate axes) that can be inscribed in
the ellipse $x^2 + 2y^2 = 1$.

Solution. Since this question is so geometric, it is best to start by drawing a picture.

21 if you tilt your head so that the line through $(1, 1)$ and $(-1, -1)$ appears horizontal

[Figure: the ellipse $x^2 + 2y^2 = 1$ with an inscribed rectangle; the labelled corners are $(x, y)$, $(-x, -y)$ and $(x, -y)$.]

Call the coordinates of the upper right corner of the rectangle $(x, y)$, as in the figure above. Note
that $x \ge 0$ and $y \ge 0$; and if $x = 0$ or $y = 0$, then the area of the rectangle is 0, which is certainly
not a maximum. So the global maximum must occur at some point where $x$ and $y$ are both positive.
This will also be a local maximum, so we should be able to find it using the method of Lagrange
multipliers.
The four corners of the rectangle are $(\pm x, \pm y)$ so the rectangle has width $2x$ and height $2y$ and the
objective function is $f(x, y) = 4xy$. The constraint function for this problem is $g(x, y) = x^2 + 2y^2 - 1$.
Again, to use Lagrange multipliers we need the first order partial derivatives.

\[ f_x = 4y \qquad f_y = 4x \qquad g_x = 2x \qquad g_y = 4y \]

So, according to the method of Lagrange multipliers, we need to find all solutions to

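\begin{align*}
4y &= \lambda(2x) \tag{E1} \\
4x &= \lambda(4y) \tag{E2} \\
x^2 + 2y^2 - 1 &= 0 \tag{E3}
\end{align*}
• If $g_x \ne 0$ and $g_y \ne 0$, then $\lambda = \frac{4y}{2x} = \frac{2y}{x}$ from (E1) and $\lambda = \frac{4x}{4y} = \frac{x}{y}$ from (E2). So,

\begin{align*}
\frac{2y}{x} &= \frac{x}{y} \\
2y^2 &= x^2 \\
x &= (\pm\sqrt{2})\,y
\end{align*}

From (E3),
\begin{align*}
\big((\pm\sqrt{2})\,y\big)^2 + 2y^2 - 1 &= 0 \\
2y^2 + 2y^2 &= 1 \\
4y^2 &= 1 \\
y &= \pm\frac{1}{2} \\
x = (\pm\sqrt{2})\,y &= \pm\frac{1}{\sqrt{2}}
\end{align*}
So there are four points to consider: $\big(\pm\frac{1}{\sqrt{2}}, \pm\frac{1}{2}\big)$.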

• If gx = 0, i.e. 2x = 0, then x = 0; by (E1) also y = 0; but then (E3) fails. So this doesn’t give
us any more points to consider.

• If gy = 0, i.e. 4y = 0, then y = 0; by (E2) also x = 0; but then (E3) fails. So this doesn’t give
us any more points to consider either.
We now have four possible values of $(x, y)$, namely $\big(1/\sqrt{2}, 1/2\big)$, $\big(-1/\sqrt{2}, -1/2\big)$, $\big(1/\sqrt{2}, -1/2\big)$
and $\big(-1/\sqrt{2}, 1/2\big)$. They are the four corners of a single rectangle. We said that we wanted $(x, y)$ to
be the upper right corner, i.e. the corner in the first quadrant. It is $\big(1/\sqrt{2}, 1/2\big)$.
How do we interpret the other three points we found? The global min of the function $4xy$ subject
to the constraint $x^2 + 2y^2 = 1$ will occur at one of these points, but those points aren’t in our model
domain. When $x$ and $y$ have different signs, $4xy$ no longer gives the area of a rectangle, since it’s
negative. Over our model domain, we kind of have “endpoints:” $x = 0$ and $y = 0$. Our maximum
occurred somewhere between our endpoints; our model minimum occurs at the endpoints.
Example 16.3.9
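
To see the "endpoints versus interior" picture from Example 16.3.9 numerically, we can slide the upper right corner along the first-quadrant arc of the ellipse and watch the area. This is a brute-force sketch under our own parametrization $x = \cos t$, $y = \sin t/\sqrt{2}$ for $0 \le t \le \pi/2$, not part of the Lagrange method.

```python
import math


def area(x, y):
    return 4.0 * x * y


def corner(t):
    # (cos t, sin t / sqrt 2) satisfies x^2 + 2y^2 = 1; 0 <= t <= pi/2
    # keeps the corner in the first quadrant.
    return math.cos(t), math.sin(t) / math.sqrt(2.0)


n = 100_000
best_area, best_corner = max(
    (area(*corner(0.5 * math.pi * k / n)), corner(0.5 * math.pi * k / n))
    for k in range(n + 1)
)

lagrange_corner = (1.0 / math.sqrt(2.0), 0.5)
print("corner from the Lagrange system:", lagrange_corner)
print("best sampled corner and area:", best_corner, best_area)
print("area at the two ends of the arc:", area(*corner(0.0)), area(*corner(0.5 * math.pi)))
```

The area at both ends of the arc is essentially zero, and the largest sampled area, $4 \cdot \frac{1}{\sqrt{2}} \cdot \frac{1}{2} = \sqrt{2}$, occurs at the corner found by the Lagrange system.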

16.3.3 §§ Bounded vs unbounded Constraints


In the last example, we had to think a little extra about whether the solution to the Lagrange equations
gave a maximum or minimum. Take a closer look at Theorem 16.3.3: all local extrema will occur at
a solution point. So when do the solution points definitely also include all absolute extrema?

1. If our constraint function is a closed curve (circle, ellipse, square, etc.) and our objective
function is continuous over it, then there will certainly be an absolute max and absolute min
over the constraint; and these will certainly also be local extrema. So when our constraint is
a closed curve, and our objective function is continuous over it, we are guaranteed that the
absolute max and min exist, and are at points that satisfy the Lagrange equations.

In Section 16.2 we considered domains that were bounded by a closed curve, so we only
considered boundaries of this type.

2. If our constraint function is not a closed curve (e.g. a line, a line segment, a function like
xy = 1, etc.) then the system is more complicated. Assume that the objective function is
continuous over the constraint curve. Since our constraint curve is one-dimensional (like a
line, but a line that has some orientation in space), we’re in a similar position as we were in
single-variable calculus: extrema can occur at endpoints, or at “critical points.” In our case,
“critical points” translate to solutions to the Lagrange equations; “endpoints” mean pretty
much the same thing they always have.

(a) If the constraint function is bounded, we must consider its endpoints as well as solutions
to the Lagrange system. There will be an absolute maximum and minimum, and these
will definitely occur at solutions to the Lagrange system or at the endpoints of the
constraint.

(b) If the constraint function is unbounded, there may or may not exist absolute extrema.
This is where you’ll most heavily rely on your understanding of function shape and
behaviour. Limits can be useful here.
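
To illustrate case (b), here is a small numerical sketch on an unbounded constraint. The objective $f(x, y) = x + y$ and the restriction to the branch of $xy = 1$ with $x > 0$ are our own choices for illustration, not an example from the text. On this branch the Lagrange system $1 = \lambda y$, $1 = \lambda x$, $xy = 1$ has the single solution $(1, 1)$.

```python
# Hypothetical illustration (not from the text):
# f(x, y) = x + y on the branch of xy = 1 with x > 0.
def f(x, y):
    return x + y


# Value of f at the only Lagrange solution on this branch, (x, y) = (1, 1).
lagrange_value = f(1.0, 1.0)

# Walk along the curve y = 1/x over a wide range of x values.
xs = [10.0 ** (k / 100.0) for k in range(-300, 301)]   # from 0.001 up to 1000
values = [f(x, 1.0 / x) for x in xs]

print("value at the Lagrange point:", lagrange_value)
print("smallest sampled value:", min(values))
print("values far out along the curve:", values[0], values[-1])
```

The smallest sampled value matches the Lagrange point, while the values at either end of the walk keep growing. This is consistent with $f$ having an absolute minimum but no absolute maximum on this curve, since $x + 1/x \to \infty$ as $x \to 0^+$ and as $x \to \infty$.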

Example 16.3.10
Find the values of $w \ge 0$ and $\kappa \ge 0$ that maximize the utility function

\[ U(w, \kappa) = 6w^{2/3}\kappa^{1/3} \qquad \text{subject to the constraint} \qquad 4w + 2\kappa = 12 \]

Solution. The constraint $4w + 2\kappa = 12$ is simple enough that we can easily use it to express
$\kappa$ in terms of $w$, then substitute $\kappa = 6 - 2w$ into $U(w, \kappa)$, and then maximize $U(w, 6 - 2w) =
6w^{2/3}(6 - 2w)^{1/3}$ using the techniques of last semester.
However, for practice purposes, we’ll use Lagrange multipliers with the objective function
$U(w, \kappa) = 6w^{2/3}\kappa^{1/3}$ and the constraint function $g(w, \kappa) = 4w + 2\kappa - 12$. The first order derivatives
of these functions are

\[ U_w = 4w^{-1/3}\kappa^{1/3} \qquad U_\kappa = 2w^{2/3}\kappa^{-2/3} \qquad g_w = 4 \qquad g_\kappa = 2 \]

The boundary values (“endpoints”) $w = 0$ and $\kappa = 0$ give utility 0, which is obviously not going to
be the maximum utility. So it suffices to consider only local maxima. According to the method of
Lagrange multipliers, we need to find all solutions to

\begin{align*}
4w^{-1/3}\kappa^{1/3} &= 4\lambda \tag{E1} \\
2w^{2/3}\kappa^{-2/3} &= 2\lambda \tag{E2} \\
4w + 2\kappa - 12 &= 0 \tag{E3}
\end{align*}

Then we see $g_w \ne 0$ and $g_\kappa \ne 0$, so we only have one of our usual three cases.

• Equation (E1) gives $\lambda = w^{-1/3}\kappa^{1/3}$.

• Substituting this into (E2) gives $w^{2/3}\kappa^{-2/3} = \lambda = w^{-1/3}\kappa^{1/3}$ and hence $w = \kappa$.

• Then substituting $w = \kappa$ into (E3) gives $6\kappa = 12$.

So $w = \kappa = 2$ and the maximum utility is $U(2, 2) = 12$.


Note in this example we had a bounded (but not closed) curve. It has endpoints (0, 6) and
(3, 0). Since the maximum didn’t occur at the endpoints, then the global maximum was also a local
maximum, and so it showed up as a solution to the system of Lagrange equations.
Example 16.3.10
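
Since the constraint in Example 16.3.10 is a line segment, the answer is easy to double-check by substitution. The sketch below scans $w$ from 0 to 3 with $\kappa = 6 - 2w$ and records the largest utility; the grid size is an arbitrary choice of ours, and the scan is a brute-force check rather than part of the Lagrange method.

```python
def utility(w, kappa):
    return 6.0 * w ** (2.0 / 3.0) * kappa ** (1.0 / 3.0)


n = 3_000
best_w, best_u = 0.0, 0.0
for k in range(n + 1):
    w = 3.0 * k / n
    kappa = max(0.0, 6.0 - 2.0 * w)   # guard against tiny negative round-off
    u = utility(w, kappa)
    if u > best_u:
        best_w, best_u = w, u

print("utility at the endpoints:", utility(0.0, 6.0), utility(3.0, 0.0))
print("best w, kappa, utility:", best_w, 6.0 - 2.0 * best_w, best_u)
```

The scan should land on $w = \kappa = 2$ with utility close to 12, and it shows utility 0 at both endpoints of the segment.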
Appendix

Appendix A

LIST OF LEARNING OBJECTIVES

Chapter 0: Introduction
• Solve a long question by breaking it up into smaller pieces.

• Apply mathematical concepts to models of physical processes.

• Apply concepts creatively to unfamiliar contexts.

• Be able to clearly and effectively communicate mathematical content in prose.

• Understand some basic ideas about what constitutes a proof in mathematics; understand the
differences between how something is defined and how it is computed.

• Correctly and appropriately manipulate algebraic and trigonometric expressions: simplification,
solving, etc.

Chapter 1: Power functions as building blocks


1.1: Power functions

• Sketch functions of the form $f(x) = x^n$, where $n$ is a real number (power functions); interpret
the shapes of power functions relative to one another.

• Determine which term in a polynomial function will dominate for small x and for large x.

1.2: First steps in graph sketching

• Sketch two-term polynomial functions by determining which term dominates for small $x$ and
for large $x$. For example, sketch $f(x) = x^2 - 3x^4$.

1.3: Rate of reaction


1.5: Familiar functions

• Know that $e^x$ eventually dominates any given power function, and any power function with
positive exponent dominates logarithm (for large positive $x$). Use these facts for sketching.
For example, sketch $f(x) = e^x - x$.

• Sketch familiar functions such as $e^x$, $\log x$, $\sin x$, $\cos x$, $\tan x$, $1/x$, $\sqrt{x}$, and $|x|$.

Chapter 2: Limits
2.1: Quick review of limits

• Explain using both words and pictures what $\lim_{x \to a} f(x) = L$, $\lim_{x \to a^-} f(x) = L$, and $\lim_{x \to a^+} f(x) = L$
mean (including the case where $L$ is equal to $\infty$ or $-\infty$).

• Explain using both words and pictures what $\lim_{x \to \infty} f(x) = L$ and $\lim_{x \to -\infty} f(x) = L$ mean
(including the case where $L$ is equal to $\infty$ or $-\infty$).

• Find the limit of a function at a point given the graph of the function.

• Understand when limits do and do not exist.

2.2: Asymptotes

• Evaluate limits of polynomial, rational, trigonometric, exponential, and logarithmic functions.

• Explain using both informal language and the language of limits what it means for a function
to have a horizontal or vertical asymptote.

• Given a simple function, find its vertical and horizontal asymptotes by asymptotic reasoning
or by taking limits.

• Explain why it is not true that a function cannot cross its horizontal asymptote.

2.3: Limits and continuity

• Explain informally and formally what it means for a function to be continuous on its domain.

• Identify and classify points of discontinuity (jump, infinite, removable).

• Determine where a given function is continuous. Use formal notation as well as informal
explanation.

• Given a function defined with parameters, select parameter values that make the function
continuous.


Chapter 3: Introduction to the Derivative


3.1: Review: lines

• Given an equation for a line, sketch the line, and identify its slope.

• Describe negative / positive / zero slope as corresponding to a line that is decreasing / increasing
/ constant over an interval.

• Find a line from two points; from a point and a slope; or from a clearly labelled graph.

• Find the slope at various points of a piecewise-linear function

3.2: Slopes and rates of change

• Describe the slope of a linear function as the rate of change of that function (change in y over
change in x).

• Compute the average rate of change of a nonlinear function over an interval.

3.3: The Derivative

• Explain using words, pictures, and the language of limits what a derivative is.

• Use the definition of derivative to find the tangent line to a function at a given point.

• Describe the tangent line as an approximation to a function at a given point.

• Describe the derivative of a function as a function itself.

• Given the graph of a function, sketch the graph of its derivative.

• Interpret derivatives as instantaneous rates of change

• Explain why the definition of a derivative is important, even if you know shortcuts for
computation.

3.4: Higher order derivatives

• Understand what is meant by ‘higher-order derivatives,’ and compute them.

3.5: Derivatives of exponential functions

• Use the definition of the derivative to show that the derivative of the function $f(x) = a^x$ (where
$a$ is a positive constant) is a constant times $a^x$.

• Describe the exponential function $e^x$ in terms of its derivative.

• Note the useful modelling power of a function whose derivative is proportional to itself.


Chapter 4: Computing Derivatives


4.1: Arithmetic of derivatives

• Demonstrate using the limit definition of derivative that differentiation is linear.

• Use linearity to “break down” derivatives of sums and constant multiples.

• Use counterexamples to demonstrate that certain statements about derivatives are false.

• Explain why an example does not constitute a “proof”.

• Demonstrate the Power Rule for integer exponents using the limit definition of derivative.

• State and apply the Power Rule.

• Use the Product Rule to differentiate the product of functions.

• Use the Quotient Rule to differentiate the quotient of functions.

4.2: Trigonometric functions and their derivatives

• Review the definitions of trigonometric functions.

• Determine derivatives of trigonometric functions using the limit definition of derivative,
trigonometric limits, addition formulas, and Product and Quotient Rules.

4.3: The chain rule

• Use the chain rule to compute derivatives of compositions of functions.

4.4: Logarithmic differentiation

• Differentiate logarithmic functions.

• Determine when to use logarithmic differentiation to simplify derivatives.

• Use logarithmic differentiation.

• Use the generalized product rule to compute the derivative of products of many functions.

4.5: Implicit differentiation

• Explain how implicit differentiation is a consequence of the Chain Rule.

• Use implicit differentiation to find slopes of tangent lines to implicitly defined curves.

4.6: Inverse functions


4.7: Inverse trig functions and their derivatives

• Sketch f (x) = arctan x.

• Evaluate (at nice points) the inverse trigonometric functions arcsin(x), arccos(x) and arctan(x).

• Use implicit differentiation / chain rule to find the derivatives of the inverse trigonometric
functions arcsin(x), arccos(x) and arctan(x).

Chapter 5: Related Rates

• Implement a sequence of steps to solve related rates problems.

Chapter 6: L’Hôpital’s Rule and Indeterminate Forms

• Recognize the two types of indeterminate forms where L’Hôpital’s rule is directly applicable.

• Use L’Hôpital’s rule to evaluate limits; compare/contrast with asymptotics.

Chapter 7: Sketching Graphs


7.1: Domain, intercepts and asymptotes

• Sketch a function using information from precalculus (limits, intercepts) and the first derivative

• Efficiently find signs of factored functions by determining where the signs change.

7.3: Second derivative — concavity

• Explain what it means for a twice-differentiable function to be concave up or concave down
on an interval.

• Determine whether a twice-differentiable function is concave up or concave down on an
interval.

• Explain how information about the graph of a function may be extracted from the function, its
derivative and its second derivative.

• Sketch the graph of a function f (x) using the function, its derivative and its second derivative.

• Sketch the graph of a function using characteristics determined from the function and its
derivatives, without scaffolding from an external source.


Chapter 8: Optimization
• Determine the critical and singular points of a function.
• Identify local extrema of a function.
• Find the global extrema of a function on a closed interval.
• Explain how the algorithm can be used in optimization problems. (Note that finding a critical
point is not enough to identify an extremum.)
• Convert geometric information into a function optimization problem.
• Interpret model optimization problems based on real-world examples according to their
context.

Chapter 9: Approximating Functions Near a Specified Point – Taylor Polynomials
• Use a linear approximation to approximate a differentiable function that is difficult to evaluate
exactly. This includes choosing an appropriate centre point.
• Use a linear approximation to approximate an irrational number with a rational number. This
may include choosing an appropriate centre point as well as an appropriate function.
• Explain what a degree n approximation of a function is.
• Determine degree n approximations for appropriately differentiable functions.
• State the Maclaurin polynomials for the standard functions: $\frac{1}{1 - x}$, $e^x$, $\cos x$, $\sin x$, $\log(1 + x)$.
9.6: (Flavour A) Error in Taylor Polynomials
• Be able to use the formula for the error in Taylor polynomial approximations, and interpret
its result. For example: determine a bound on the error of a polynomial approximation at a
point; determine a range for which a particular approximation has an error within a certain
tolerance; or determine which degree Taylor approximation will result in an error within a
certain tolerance.

Chapter 10: (Flavour A) Newton’s Method


• Given a function, find an integer that is reasonably close to the root.
• Given a differentiable function, find the x-intercept of the tangent line at a particular point.
• Explain how Newton’s method works. That is, how you can use tangent lines to approximate
the roots of a function.
• Write down the formula for Newton’s method and explain what each term in the equation
represents.
• Use Newton’s method to estimate the root(s) of a function.
• Recognize pathological cases where Newton’s Method doesn’t converge to a root.


Chapter 11: (Flavours A, B) Introduction to Differential Equations


• Explain how a differential equation is different from an algebraic equation.

• Check whether a given function satisfies a differential equation.

• Understand basic differential-equation models of exponential growth and decay.

• Identify solutions to simple differential equations (of the form $y' = ay$) and interpret them in
context.

• Given an initial condition, find a particular solution that satisfies a differential equation.

Chapter 12: (Flavours A, B) Solving differential equations


12.3: Euler’s method and numerical solutions

• Explain how a differential equation may be solved computationally using linear approxima-
tions. That is, explain how Euler’s method works.

• Explain what each term represents in the formula for Euler’s method.

• Examine and compare computational (numerical) and exact (analytical) solutions to differen-
tial equations.

• Use Euler’s method to solve a differential equation by hand (small number of steps)

Chapter 13: (Flavours A, B) Qualitative methods for differential equations


13.2: The geometry of change

• Find linear approximations of a solution to a DE, given a point.

• Sketch a linear approximation of a solution to a DE at a point.

• Interpret slope fields for a given differential equation and use them to roughly sketch solutions.

13.3: (Flavour B) State-space diagrams

• Explain what is meant by a “steady-state” solution.

• Find steady-state solutions to simple differential equations.

• Sketch a state-space diagram for a given differential equation and use it to describe the
behaviour of solutions.

• Explain what it means for a steady-state solution to be “stable”. Determine the stability of a
steady state.

• Use a state-space diagram to identify stability of steady states


Chapter 14: (Flavour C) Geometry in Three Dimensions


14.1: Points and planes

• Label points on the x-y-z axes and identify basic planes of constant x, y, or z.

14.2: Functions of two variables

• Given a simple function of two variables, z = f (x, y), evaluate z values for given pairs (x, y).

14.3: Sketching surfaces in 3D

Chapter 15: (Flavour C) Partial Derivatives


15.1: Partial derivatives

• Compute partial derivatives of two-variable functions.

• Provide a physical interpretation of a partial derivative in terms of directional steepness at a
point on a surface.

15.2: Higher order derivatives

• Compute the second order partial derivatives given a function of two variables.

• State without proof that the mixed partials should be equal for “nice” functions.

Chapter 16: (Flavour C) Optimization of Multivariable Functions


16.1: Local maximum and minimum values

• Define critical point and singular point for a function of two variables.

• Compute the critical points and singular points of a given function of two variables.

• State (without proof) that extreme values of a continuous multivariable function will occur at
critical or singular points.

• Be able to visualize critical points as ‘flat spots.’

• Use the second derivative test to classify critical points as either local maximums, local
minimums, or saddle points.

• Explain using words or pictures what a saddle point is.


16.2: Absolute minima and maxima

• Reduce a constrained optimization problem in 3D, where the constraint is a single function
(possibly with endpoints, possibly not), to a single-variable calculus problem.

• Understand that the global extrema of a two-variable function over a closed region occur along
the boundary and/or at critical points of the interior

• Find the extreme values for a function of two variables on a closed region in cases where
optimization on the boundary can be reduced to a single-variable calculus problem.

16.3: Lagrange multipliers

• Understand that solutions to a particular system of equations correspond to points along a
curve that is locally flat.

• Use the method of Lagrange multipliers to find extrema along a constraint.

• Choose between the method of Lagrange multipliers, and simple plugging in, for determining
extrema along a constraint.

• Find the absolute extrema of a surface over a closed region, using the appropriate method
(Lagrange or plugging in) for investigating the boundary.
